What is wrong with the following code in Vivado HLS?

The following code should read a value from DDR, decrement it, write the result back to the same address, and read the next value, repeating 256 times.
Instead, on the first run it decrements the first two values (axi_ddr[0] and axi_ddr[1]), and on subsequent runs it only decrements the first value (axi_ddr[0]).
#include "ap_cint.h"
#include <stdio.h>
#include "string.h"
void hls_test(volatile int256 axi_ddr[256], uint32 *axi_lite_status_control){
#pragma HLS INTERFACE s_axilite port=axi_lite_status_control register bundle=BUS_A
#pragma HLS INTERFACE s_axilite port=return bundle=BUS_A
#pragma HLS INTERFACE m_axi depth=256 port=axi_ddr bundle=DDR
int256 axi_ddr_reg;
int256 diff = 1;
uint9 i = 0;
if (*axi_lite_status_control == 1){
for(i = 0; i < 256; i++){
axi_ddr_reg = axi_ddr[i];
axi_ddr[i] = axi_ddr_reg -diff;
}
*axi_lite_status_control = 2;
}
}
Both simulation and cosimulation pass as intended, and I cannot figure out what is causing the issue.
I also tried C++, but it ended in the same behavior. The only time it behaved differently was when I forgot to give the variable diff an initial value, and then the value in all 256 DDR locations became 0x0.
Could somebody please point out what I am missing?

The code looks fine to me and it should work flawlessly. However, if you're saying that both simulation and cosimulation pass, then something might be wrong with either your test code or your hardware implementation.
Also, for the C++ version of the code, you should use the ap_uint<N> types defined in ap_int.h instead of ap_cint.h.
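For reference, a minimal sketch of what the C++ version might look like with those types (untested; the pragmas are copied from the question, and your tool version may restrict the volatile qualifier on the arbitrary precision class types, so check the documentation):

#include "ap_int.h"

void hls_test(volatile ap_int<256> axi_ddr[256], ap_uint<32> *axi_lite_status_control){
#pragma HLS INTERFACE s_axilite port=axi_lite_status_control register bundle=BUS_A
#pragma HLS INTERFACE s_axilite port=return bundle=BUS_A
#pragma HLS INTERFACE m_axi depth=256 port=axi_ddr bundle=DDR
    ap_int<256> axi_ddr_reg;   // ap_int<256> replaces the C-only int256
    ap_int<256> diff = 1;
    ap_uint<9> i = 0;          // ap_uint<9> replaces uint9

    if (*axi_lite_status_control == 1){
        for(i = 0; i < 256; i++){
            axi_ddr_reg = axi_ddr[i];
            axi_ddr[i] = axi_ddr_reg - diff;
        }
        *axi_lite_status_control = 2;
    }
}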

Related

CUDD BDDs: building a boolean as disjunction of conjunctions but get runtime error: segmentation fault

Does anyone with experience using CUDD (not to be confused with CUDA) for manipulating BDDs know why I keep getting the dreaded "segmentation fault (core dumped)"? I suspect it could be related to referencing/de-referencing, which I confess I don't fully understand. Any hints or pointers appreciated. (I commented out some things I have been trying):
#include <stdio.h>
#include <stdlib.h>
#include "cudd.h"
int main(int argc, char* argv[])
{
    /*char filename[30];*/
    DdManager * gbm; /* Global BDD manager. */
    gbm = Cudd_Init(0, 0, CUDD_UNIQUE_SLOTS, CUDD_CACHE_SLOTS, 0); /* Initialize a new BDD manager with defaults. */
    int const n = 3;
    int i, j;
    DdNode *var, *tmp, *tmp2, *BDD, *BDD_t;
    BDD_t = Cudd_ReadLogicZero(gbm);
    /*Cudd_Ref(BDD_t);*/
    /* Outer loop: disjunction of the n terms */
    for (j = 0; j <= n - 1; j++) {
        BDD = Cudd_ReadOne(gbm); /* Returns the logic one constant of the manager */
        /* Cudd_Ref(BDD);*/
        /* Inner loop: assemble each of the n conjunctions */
        for (i = j * (n - 1); i >= (j - 1) * (n - 1); i--) {
            var = Cudd_bddIthVar(gbm, i); /* Create a new BDD variable */
            tmp = Cudd_bddAnd(gbm, var, BDD); /* Perform AND boolean operation */
            BDD = tmp;
        }
        tmp2 = Cudd_bddOr(gbm, BDD, BDD_t); /* Perform OR boolean operation */
        /*Cudd_RecursiveDeref(gbm, tmp);*/
        BDD_t = tmp2;
    }
    Cudd_PrintSummary(gbm, BDD_t, 4, 0);
    /* Cudd_bddPrintCover(mgr, BDD_t, BDD);*/
    /* BDD = Cudd_BddToAdd(gbm, BDD_t);*/
    /* printf(gbm,BDD_t, 2, 4);*/
    Cudd_Quit(gbm);
    return 0;
}
While you are correct that the Cudd_Ref'ing and Cudd_RecursiveDeref'ing in your code is not correct (yet), the current and first problem is actually a different one.
You never check the return values of the CUDD functions. Some of them return NULL (0) on error, and your code does not detect such cases. In fact, the call to Cudd_bddIthVar returns NULL (0) at least once, and then the subsequent call to the BDD AND function makes the CUDD library access the memory at address 0+4, causing the segmentation fault.
There are multiple ways to fix this:
The best way is to always check for NULL return values and then notify the user of the program of the problem. Since this is your main() function, this could mean printing an error message and then returning 1.
At the very bare minimum, you can add assert(...) statements so that, at least in debug mode, the problem becomes obvious. This is not recommended in general, as such problems may go unnoticed when not compiling in debug mode.
In C++, there is also the possibility to work with exceptions - but you don't seem to be using C++.
Now why does Cudd_bddIthVar(gbm, i) return NULL? Because in the second iteration, the loop variable i has value -1, which is not a valid variable index.
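For illustration, the first approach might look like this at one of the call sites (a sketch; the error message wording is mine):

var = Cudd_bddIthVar(gbm, i); /* may return NULL on error, e.g. for i < 0 */
if (var == NULL) {
    fprintf(stderr, "Cudd_bddIthVar(%d) failed\n", i);
    Cudd_Quit(gbm);
    return 1;
}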
Now as far as Ref'ing and Deref'ing is concerned:
You need to call Cudd_Ref(...) on every BDD node that you want to keep using beyond the next call to a Cudd function. The exceptions are constants and variables.
You need to call Cudd_RecursiveDeref(...) on every BDD node that you previously Ref'd and that is no longer needed.
This is because every BDD node has a counter telling the library how often the node is currently in use. Once the counter hits 0, the BDD node may be recycled. Ref'ing a node makes sure that this does not happen while the node is still needed.
Your program could be fixed as follows:
#include <stdio.h>
#include <stdlib.h>
#include <assert.h>
#include "cudd.h"

int main(int argc, char* argv[])
{
    /*char filename[30];*/
    DdManager * gbm; /* Global BDD manager. */
    gbm = Cudd_Init(0, 0, CUDD_UNIQUE_SLOTS, CUDD_CACHE_SLOTS, 0); /* Initialize a new BDD manager with defaults. */
    assert(gbm != 0);
    int const n = 3;
    int i, j;
    DdNode *var, *tmp, *tmp2, *BDD, *BDD_t;
    BDD_t = Cudd_ReadLogicZero(gbm);
    assert(BDD_t != 0);
    Cudd_Ref(BDD_t);
    /* Outer loop: disjunction of the n terms */
    for (j = 0; j <= n - 1; j++) {
        BDD = Cudd_ReadOne(gbm); /* Returns the logic one constant of the manager */
        assert(BDD != 0);
        Cudd_Ref(BDD);
        /* Inner loop: assemble each of the n conjunctions */
        for (i = j * (n - 1); i >= (j) * (n - 1); i--) {
            var = Cudd_bddIthVar(gbm, i); /* Create a new BDD variable */
            assert(var != 0);
            tmp = Cudd_bddAnd(gbm, var, BDD); /* Perform AND boolean operation */
            assert(tmp != 0);
            Cudd_Ref(tmp);
            Cudd_RecursiveDeref(gbm, BDD);
            BDD = tmp;
        }
        tmp2 = Cudd_bddOr(gbm, BDD, BDD_t); /* Perform OR boolean operation */
        assert(tmp2 != 0);
        Cudd_Ref(tmp2);
        Cudd_RecursiveDeref(gbm, BDD_t);
        Cudd_RecursiveDeref(gbm, BDD);
        BDD_t = tmp2;
    }
    Cudd_PrintSummary(gbm, BDD_t, 4, 0);
    /* Cudd_bddPrintCover(mgr, BDD_t, BDD);*/
    /* BDD = Cudd_BddToAdd(gbm, BDD_t);*/
    /* printf(gbm,BDD_t, 2, 4);*/
    Cudd_RecursiveDeref(gbm, BDD_t);
    assert(Cudd_CheckZeroRef(gbm) == 0);
    Cudd_Quit(gbm);
    return 0;
}
For brevity, I used assert(...) statements to check the conditions. Don't use this in production code - this is only to keep the code shorter during learning. Also look up in the CUDD documentation which calls can actually return NULL. Those that cannot do not need such a check. But most calls can return 0.
Note that:
The return value of Cudd_bddIthVar is not Cudd_Ref'd - it doesn't need to be.
The return value of Cudd_ReadLogicZero(gbm) is Cudd_Ref'd - this is because the variable BDD_t is later overwritten with nodes that do have to be Ref'd, and hence the code needs a matching Cudd_RecursiveDeref(...) call in every case. To keep the Ref's and Deref's symmetric, the constant node is needlessly Ref'd (which is allowed).
The last assert statement checks if there are any nodes still in use -- if that is the case before calling Cudd_Quit, this tells you that your code doesn't Deref correctly, which should be fixed. If you comment out any RecursiveDeref line and run the code, the assert statement should halt execution then.
I've rewritten your for-loop condition to ensure that no negative variable numbers occur, but as a result, your code may no longer do what it is supposed to.
Thanks, @DCTLib. I've found that the C++ interface is much more convenient for formulating Boolean expressions. The only problem is how to go back and forth between the C and C++ interfaces, since ultimately I still need C for printing out the minterms (called cut sets in the world I inhabit, Reliability Engineering). Let me pose that question in a separate entry. It seems you know CUDD quite well - you should be maintaining that repo! It's a great product, but sparsely documented and sparsely interacted with.

SHA1 error while using TinyECC in tinyos-2.x

I am using TinyOS-2.1.2, and to implement security techniques I am using TinyECC-2.0. I want to use the SHA1 available in TinyECC. But when I take the hash of a value, say,
uint8_t data = 123;
I use the three functions of SHA1 given in SHA1.nc, namely SHA1.reset, SHA1.update and SHA1.digest, to obtain the result. But each time I run the code (i.e. do "make micaz sim") I get different hash results for the same data.
How do I get a consistent hash value for the same data?
The code is:
#include "sha1.h"
module DisseminationC {
uses {
interface SHA1;
}
implementation{
void hash(){
uint8_t x=123;
call SHA1.context(context);
call SHA1.update(context, x, sizeof(x));
call SHA1.digest(context, Message_Digest[SHA1HashSize]);
dbg("All", "%s Hash is : %d \n", sim_time_string(), Message_Digest);
}
I made modifications to the code as shown below. Now I am getting a hash output, but the problem is that for every different number given as input I get the same answer. How do I solve this issue?
Please help me.
#include "sha1.h"
module SecurityC{
uses interface Boot;
uses interface SHA1;
}
implementation{
uint8_t Message_Digest[SHA1HashSize];
SHA1Context context;
uint8_t num=123;
uint32_t length=3;
uint8_t i;
event void Boot.booted()
{
dbg("Boot", "Application booted.\n");
call SHA1.reset(&context);
while(length>0)
{
length=length/10;
call SHA1.update(&context, &num, length);
}
call SHA1.digest(&context, Message_Digest);
for(i = 0; i < SHA1HashSize; i++) {
dbg("Boot", "%s KEY IS: %x \n", sim_time_string(), Message_Digest[i]);
}
}
}
First of all, the code you posted is broken: it lacks two braces, and the function SHA1.context doesn't exist in this library (it should be SHA1.reset, I think). Moreover, Message_Digest and context aren't declared. Please provide the full code you actually use.
However, I can see at least two serious bugs.
Firstly, you pass the value of x to SHA1.update, but you should pass a pointer to the message. As written, the function processes a message that lies at address 123 in memory (you should get a compiler warning about this). If you want to calculate a hash of the value of x, try this:
call SHA1.update(context, &x, sizeof(x));
Secondly, Message_Digest seems to be a uint8_t array of size SHA1HashSize. In the last statement you print a pointer to this array instead of its contents (and again, the compiler should warn you), so you get the address of the array in memory. You may want to process the array in a loop:
uint8_t i;
for(i = 0; i < SHA1HashSize; ++i) {
    // process Message_Digest[i], for instance print it
}

CUDA - Generating a random array on the GPU and modifying it with a kernel

In this code I'm generating a 1D array of floats on a GPU using CUDA. The numbers are between 0 and 1. For my purpose I need them to be between -1 and 1, so I have made a simple kernel to multiply each element by 2 and then subtract 1 from it. However, something is going wrong here. When I print my original array into a .bmp I get this http://i.imgur.com/IS5dvSq.png (a typical noise pattern). But when I try to modify that array with my kernel I get a blank black picture http://imgur.com/cwTVPTG . The program is executable, but in the debugger I get this:
First-chance exception at 0x75f0c41f in Midpoint_CUDA_Alpha.exe:
Microsoft C++ exception: cudaError_enum at memory location
0x003cfacc..
First-chance exception at 0x75f0c41f in Midpoint_CUDA_Alpha.exe:
Microsoft C++ exception: cudaError_enum at memory location
0x003cfb08..
First-chance exception at 0x75f0c41f in Midpoint_CUDA_Alpha.exe:
Microsoft C++ exception: [rethrow] at memory location 0x00000000..
I would be thankful for any help or even a little hint in this matter. Thanks!
(edited)
#include <device_functions.h>
#include <time.h>
#include <stdio.h>
#include <stdlib.h>
#include "stdafx.h"
#include "EasyBMP.h"
#include <curand.h> // curand.lib must be added in project properties > linker > input
#include "device_launch_parameters.h"

float *heightMap_cpu;
float *randomArray_gpu;
int randCount = 0;
int rozmer = 513;

void createRandoms(int size){
    curandGenerator_t generator;
    cudaMalloc((void**)&randomArray_gpu, size*size*sizeof(float));
    curandCreateGenerator(&generator,CURAND_RNG_PSEUDO_XORWOW);
    curandSetPseudoRandomGeneratorSeed(generator,(int)time(NULL));
    curandGenerateUniform(generator,randomArray_gpu,size*size);
}

__global__ void polarizeRandoms(int size, float *randomArray_gpu){
    int index = threadIdx.x + blockDim.x * blockIdx.x;
    if(index<size*size){
        randomArray_gpu[index] = randomArray_gpu[index]*2.0f - 1.0f;
    }
}

// helper function for getting an address in 1D using 2D coords
int ad(int x,int y){
    return x*rozmer+y;
}

void printBmp(){
    BMP AnImage;
    AnImage.SetSize(rozmer,rozmer);
    AnImage.SetBitDepth(24);
    int i,j;
    for(i=0;i<=rozmer-1;i++){
        for(j=0;j<=rozmer-1;j++){
            AnImage(i,j)->Red = (int)((heightMap_cpu[ad(i,j)]*127)+128);
            AnImage(i,j)->Green = (int)((heightMap_cpu[ad(i,j)]*127)+128);
            AnImage(i,j)->Blue = (int)((heightMap_cpu[ad(i,j)]*127)+128);
            AnImage(i,j)->Alpha = 0;
        }
    }
    AnImage.WriteToFile("HeightMap.bmp");
}

int main(){
    createRandoms(rozmer);
    polarizeRandoms<<<((rozmer*rozmer)/1024)+1,1024>>>(rozmer,randomArray_gpu);
    heightMap_cpu = (float*)malloc((rozmer*rozmer)*sizeof(float));
    cudaMemcpy(heightMap_cpu,randomArray_gpu,rozmer*rozmer*sizeof(float),cudaMemcpyDeviceToHost);
    printBmp();
    //cleanup
    cudaFree(randomArray_gpu);
    free(heightMap_cpu);
    return 0;
}
This is wrong:
cudaMalloc((void**)&randomArray_gpu, size*size*sizeof(float));
We don't use cudaMalloc with __device__ variables. If you do proper cuda error checking, I'm pretty sure you'll find that that line throws an error.
If you really want to use a __device__ pointer this way, you need to create a separate, ordinary pointer, cudaMalloc it, then copy the pointer value to the device pointer using cudaMemcpyToSymbol:
float *my_dev_pointer;
cudaMalloc((void**)&my_dev_pointer, size*size*sizeof(float));
cudaMemcpyToSymbol(randomArray_gpu, &my_dev_pointer, sizeof(float *));
Whenever you are having trouble with your CUDA programs, you should do proper cuda error checking. It will likely focus your attention on what is wrong.
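A common way to surface such errors is an error-checking wrapper around every runtime call (a sketch - the cudaCheck macro name below is my own, not a CUDA API):

// Hypothetical error-checking helper: report failures at the call site.
#include <cstdio>
#include <cstdlib>

#define cudaCheck(call)                                           \
    do {                                                          \
        cudaError_t err = (call);                                 \
        if (err != cudaSuccess) {                                 \
            fprintf(stderr, "CUDA error '%s' at %s:%d\n",         \
                    cudaGetErrorString(err), __FILE__, __LINE__); \
            exit(EXIT_FAILURE);                                   \
        }                                                         \
    } while (0)

Used as, e.g., cudaCheck(cudaMemcpy(...)). Kernel launches need a cudaCheck(cudaGetLastError()) afterwards, since the launch itself does not return an error code.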
And, yes, kernels can access __device__ variables without the variable being passed explicitly as a parameter to the kernel.
The programming guide covers the proper usage of __device__ variables and the api functions that should be used to access them from the host.
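For illustration, a minimal sketch of that pattern (hypothetical names; it assumes the __device__ declaration this answer refers to):

// A __device__ pointer that kernels can reference directly.
__device__ float *dev_array; // hypothetical __device__ pointer

__global__ void scale(int n){
    int idx = threadIdx.x + blockDim.x * blockIdx.x;
    if (idx < n)
        dev_array[idx] = dev_array[idx]*2.0f - 1.0f; // no pointer parameter needed
}

// Host side: allocate via an ordinary pointer, then publish it:
// float *tmp;
// cudaMalloc((void**)&tmp, n*sizeof(float));
// cudaMemcpyToSymbol(dev_array, &tmp, sizeof(float *));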

Question regarding the clock() function

Why does the time to execute function f1() change from one run to another in debug mode? Why is it always zero in release mode?
I didn't include stdio.h or cstdio, yet the code compiled. How?
#include <iostream>
#include <ctime>

void f1()
{
    for( int i = 0; i < 10000; i++ );
}

int main()
{
    clock_t start, finish;
    start = clock();
    for( int i = 0; i < 100000; i++ ) f1();
    finish = clock();
    double duration = (double)(finish - start) / CLOCKS_PER_SEC;
    printf( "Duration = %6.2f seconds\n", duration);
}
Possibly the machine you're running your test code on is too fast. Try increasing the loop count to a really huge number.
Another thing to try is to test with the sleep() function.
This should confirm the behavior of your clock() measurements.
I believe the reason you are seeing zero runtime for f1() in release mode is that the compiler is optimizing the function away. Since your for loop has an empty body, the whole loop can effectively be removed during compilation.
I'm guessing that this optimization is not performed in debug mode, which would explain why you see a longer execution time there. The time varies between runs simply because your OS scheduler (almost certainly) does not guarantee a fixed time slice for processes.
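One way to check this theory is to give the loop a side effect the optimizer must preserve - a sketch, assuming typical compiler behavior:

void f1()
{
    volatile int sink = 0;               // volatile: every store must actually happen
    for( int i = 0; i < 10000; i++ )
        sink = i;                        // the loop can no longer be removed
}

With this version, release mode should show a nonzero duration as well.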
As for why you can use printf() when you have not explicitly included <cstdio>: it comes in through the <iostream> include.
Looking at my headers in C:\Program Files\Microsoft Visual Studio 10.0\VC\include, I can see that iostream includes istream and ostream, both of which include ios, which includes xlocnum, which in turn includes both cstdlib and cstdio.

Thread-safe random number generation for Monte-Carlo integration

I'm trying to write something which very quickly calculates random numbers and can be applied on multiple threads. My current code is:
/* Approximating PI using a Monte-Carlo method. */
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <time.h>
#include <omp.h>

#define N 1000000000 /* As large as possible for increased accuracy */

double random_function(void);

int main(void)
{
    int i = 0;
    double X, Y;
    double count_inside_temp = 0.0, count_inside = 0.0;
    unsigned int th_id = omp_get_thread_num();
    #pragma omp parallel private(i, X, Y) firstprivate(count_inside_temp)
    {
        srand(th_id);
        #pragma omp for schedule(static)
        for (i = 0; i <= N; i++) {
            X = 2.0 * random_function() - 1.0;
            Y = 2.0 * random_function() - 1.0;
            if ((X * X) + (Y * Y) < 1.0) {
                count_inside_temp += 1.0;
            }
        }
        #pragma omp atomic
        count_inside += count_inside_temp;
    }
    printf("Approximation to PI is = %.10lf\n", (count_inside * 4.0) / N);
    return 0;
}

double random_function(void)
{
    return ((double) rand() / (double) RAND_MAX);
}
This works, but from observing a resource manager I know it's not using all the threads. Does rand() work for multithreaded code? And if not, is there a good alternative? Many thanks. Jack
Is rand() thread safe? Maybe, maybe not:
"The rand() function need not be reentrant. A function that is not required to be reentrant is not required to be thread-safe."
One test and good learning exercise would be to replace the call to rand() with, say, a fixed integer and see what happens.
The way I think of pseudo-random number generators is as a black box which takes an integer as input and returns an integer as output. For any given input the output is always the same, but there is no pattern in the sequence of numbers and the sequence is uniformly distributed over the range of possible outputs. (This model isn't entirely accurate, but it'll do.) The way you use this black box is to choose a starting number (the seed), use the output value in your application, and also feed it back in as the input for the next call to the random number generator. There are two common approaches to designing an API:
Two functions, one to set the initial seed (e.g. srand(seed)) and one to retrieve the next value from the sequence (e.g. rand()). The state of the PRNG is stored internally in some sort of global variable. Generating a new random number either will not be thread safe (hard to tell, but the output stream won't be reproducible) or will be slow in multithreaded code (you end up with some serialization around the state value).
An interface where the PRNG state is exposed to the application programmer. Here you typically have three functions: init_prng(seed), which returns some opaque representation of the PRNG state; get_prng(state), which returns a random number and changes the state variable; and destroy_prng(state), which just cleans up allocated memory and so on. PRNGs with this type of API should all be thread safe and run in parallel with no locking, because you are in charge of managing the (now thread-local) state variable.
I generally write in Fortran and use Ladd's implementation of the Mersenne Twister PRNG (that link is worth reading). There are lots of suitable PRNGs in C which expose the state to your control. This PRNG looks good, and using it (with the initialization and destroy calls inside the parallel region and private state variables) should give you a decent speedup; a minimal sketch of the pattern follows.
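As a sketch of that explicit-state pattern applied to the code above (assuming the POSIX rand_r() function is available - its state is a single unsigned int that each thread keeps private):

/* Monte-Carlo PI with a per-thread PRNG state; no locking needed. */
#include <stdio.h>
#include <stdlib.h>
#include <omp.h>

#define N 1000000000

int main(void)
{
    long long count_inside = 0;
    #pragma omp parallel reduction(+:count_inside)
    {
        /* Per-thread state, seeded from the thread id. */
        unsigned int state = 1234u + (unsigned int) omp_get_thread_num();
        #pragma omp for
        for (int i = 0; i < N; i++) {
            double X = 2.0 * ((double) rand_r(&state) / RAND_MAX) - 1.0;
            double Y = 2.0 * ((double) rand_r(&state) / RAND_MAX) - 1.0;
            if (X * X + Y * Y < 1.0)
                count_inside++;
        }
    }
    printf("Approximation to PI is = %.10f\n", (count_inside * 4.0) / N);
    return 0;
}

Note that rand_r has the same statistical weaknesses as rand; a Mersenne Twister is a better generator, but the state-handling pattern is the same.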
Finally, it's often the case that PRNGs can be made to perform better if you ask for a whole sequence of random numbers in one go (e.g. the compiler can vectorize the PRNG internals). Because of this, libraries often have something like get_prng_array(state) functions which give you back an array full of random numbers, as if you had put get_prng in a loop filling the array elements - they just do it more quickly. This would be a second optimization (and would need an added for loop inside the parallel for loop). Obviously, you don't want to run out of per-thread stack space doing this!
