PyCuda program keeps on running - python-3.x

answer_array = np.zeros_like(self.redarray)
answer_array_gpu = cuda.mem_alloc(answer_array.nbytes)
redarray_gpu = cuda.mem_alloc(self.redcont.nbytes)
greenarray_gpu = cuda.mem_alloc(self.greencont.nbytes)
bluearray_gpu = cuda.mem_alloc(self.bluecont.nbytes)
cuda.memcpy_htod(redarray_gpu, self.redcont)
cuda.memcpy_htod(greenarray_gpu, self.greencont)
cuda.memcpy_htod(bluearray_gpu, self.bluecont)
cuda.memcpy_htod(answer_array_gpu, answer_array)
desaturate_mod = SourceModule("""
__global__ void array_desaturation(float *a, float *b, float *c, float *d){
    int index = blockIdx.x * blockDim.x + threadIdx.x;
    d[index] = ((a[index] + b[index] + c[index])/3);
}
""")
func = desaturate_mod.get_function("array_desaturation")
func(redarray_gpu, greenarray_gpu, bluearray_gpu, answer_array_gpu,
block=(self.gpu_threads, self.gpu_threads, self.blocks_to_use))
desaturated = np.empty_like(self.redarray)
cuda.memcpy_dtoh(desaturated, answer_array_gpu)
print(desaturated)
print("Up to here")
I wrote this piece of code to find the average of the values in three arrays and save it into a fourth array. The code prints neither the result nor the line saying "Up to here". What could be the error?
Additional info: redarray, greenarray, and bluearray are float32 NumPy arrays

I know getting started with arrays in C, and especially in PyCUDA, can be pretty tricky; it took me months to get a 2D sliding-max algorithm working.
In this example, you can't access array elements the way you can in Python, where you just provide an index, because you are passing a pointer to the first element of each array. A useful example of how this works in C can be found here. You will also have to pass in the length of the arrays (assuming they are all equal, so that we do not go out of bounds), or, if they differ, each of their lengths respectively.
Hopefully you can understand how to access your array elements via pointers in C from that link. @talonmies also provides a nice example here of how to pass in a 2D array (which is the same as a 1D array, since the 2D array gets flattened to 1D in GPU memory). However, when I was working with this, I never passed in the strides that @talonmies does; indexing the way the TutorialsPoint tutorial shows, *(pointer_to_array + index), works correctly. Providing a memory stride here will cause you to go out of bounds.
Therefore my code for this would look more like:
C_Code = """
__global__ void array_desaturation(float *array_A, float *array_B, float *array_C, float *outputArray, int arrayLengths){
    int index = blockIdx.x * blockDim.x + threadIdx.x;
    if(index >= arrayLengths){ // Guard against threads whose index falls outside our memory bounds; writing past them would cause serious, unpredictable problems
        return;
    }
    // These variables get the correct values from the arrays at the appropriate index relative to their base addresses (you could leave this part out, but I like the legibility)
    float aValue = *(array_A + index);
    float bValue = *(array_B + index);
    float cValue = *(array_C + index);
    *(outputArray + index) = ((aValue + bValue + cValue)/3); // Set the (output array's pointer + offset)'s value to our average value
}"""
desaturate_mod = SourceModule(C_Code)
desaturate_kernel = desaturate_mod.get_function("array_desaturation")
desaturate_kernel(cuda.In(array_A),            # Input
                  cuda.In(array_B),            # Input
                  cuda.In(array_C),            # Input
                  cuda.Out(outputArray),       # Output
                  numpy.int32(len(array_A)),   # Array size, if they are all the same length
                  block=(blockSize[0], blockSize[1], 1), # Choose these two parameters however you like, but change your index computation accordingly
                  grid=(gridSize[0], gridSize[1], 1)
                  )
print(outputArray) # Done! Make sure you have defined all these arrays beforehand, of course
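For completeness, here is a minimal host-side sketch of how array_A, array_B, array_C, blockSize, and gridSize used above might be set up for a 1D launch; the array length and block size are illustrative assumptions, not part of the original answer:
import numpy
import pycuda.autoinit              # creates a CUDA context on import
import pycuda.driver as cuda
from pycuda.compiler import SourceModule

n = 512 * 512                       # hypothetical array length
array_A = numpy.random.rand(n).astype(numpy.float32)
array_B = numpy.random.rand(n).astype(numpy.float32)
array_C = numpy.random.rand(n).astype(numpy.float32)
outputArray = numpy.empty_like(array_A)

threads = 256                       # threads per block; the device limit is typically 1024
blockSize = (threads, 1)            # plugged into block=(blockSize[0], blockSize[1], 1)
gridSize = ((n + threads - 1) // threads, 1)  # ceiling division so every element gets a thread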

Related

Retrieving depth value of a sample position using depth map

I have been trying to implement SSAO following the LearnOpenGL implementation. Their implementation uses the positions g-buffer to obtain the sample position's depth value. Since I already have a depth buffer ready to use, I am wondering how I could use it instead of the positions g-buffer to retrieve the depth value. Below I show the LearnOpenGL approach using the positions texture and my attempt at using the depth buffer. I think I might be missing a step required to use the depth buffer, but I am unsure.
LearnOpenGL SSAO
Using Positions g-buffer
layout(binding = 7) uniform sampler2D positionsTexture;
layout(binding = 6) uniform sampler2D depthMap;
// ...
vec4 offset = vec4(samplePos, 1.0);
offset = camera.proj * offset; //transform sample to clip space
offset.xyz /= offset.w; // perspective divide
offset.xyz = offset.xyz * 0.5 + 0.5; // transform to range 0-1
float sampleDepth = texture(positionsTexture, offset.xy).z;
I want to use the depth buffer instead, but this approach did not seem to work for me:
float sampleDepth = texture(depthMap, offset.xy).x;
Update: 8/01
I have implemented a linearization function for the depth result, but I am still unable to obtain the right result. Am I missing something more?
float linearize_depth(float d, float zNear, float zFar)
{
    return zNear * zFar / (zFar + d * (zNear - zFar));
}
float sampleDepth = linearize_depth(texture(depthMap, offset.xy).z, zNear, zFar);
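One detail worth checking: a depth texture stores its value in the first component, so it should be read with .r (or .x) rather than .z, and the linearized result is a positive eye-space distance while view-space positions have negative Z. A minimal sketch of the comparison under those assumptions (treating samplePos as a view-space position, as in LearnOpenGL):
float storedDepth = texture(depthMap, offset.xy).r;             // depth lives in the first component
float sampleDepth = -linearize_depth(storedDepth, zNear, zFar); // negate: view space looks down -Z
// sampleDepth is now comparable with samplePos.z in the SSAO range/occlusion test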

Finding the binary composition of a binary number

Very new to C#, so this could be a silly question.
I am working with a lot of UInt64s. These are expressed as hex, right? If we look at a number's binary representation, can we return an array such that, if we apply the 'or' operation to all of its elements, we arrive back at the original UInt64?
For example, let's say
x = 1011
Then, I am looking for an efficient way to arrive at,
f(x) = {1000, 0010, 0001}
where these numbers would be in hex rather than binary. Sorry, I am new to hex too.
I have a method already, but it feels inefficient. I first convert to a binary string, and loop over that string to find each '1'. I then add the corresponding binary number to an array.
Any thoughts?
Here is a better example. I have a hexadecimal number x, in the form of,
UInt64 x = 0x00000000000000FF
Where the binary representation of x is
0000000000000000000000000000000000000000000000000000000011111111
I wish to find an array consisting of hexadecimal numbers (UInt64??) such that the or operation applied to all members of that array would result in x again. For example,
f(x) = {0x0000000000000080, // 00000....10000000
0x0000000000000040, // 00000....01000000
0x0000000000000020, // 00000....00100000
0x0000000000000010, // 00000....00010000
0x0000000000000008, // 00000....00001000
0x0000000000000004, // 00000....00000100
0x0000000000000002, // 00000....00000010
0x0000000000000001 // 00000....00000001
}
I think the question comes down to finding an efficient way to find the index of the '1's in the binary expansion...
public static UInt64[] findOccupiedSquares(UInt64 pieces){
    UInt64[] toReturn = new UInt64[BitOperations.PopCount(pieces)];
    if (BitOperations.PopCount(pieces) == 1){
        toReturn[0] = pieces;
    }
    else{
        int i = 0;
        int index = 0;
        while (pieces != 0){
            i += 1;
            pieces = pieces >> 1;
            if (BitOperations.TrailingZeroCount(pieces) == 0){ // One
                int rank = (int)(i / 8);
                int file = i - (rank * 8);
                toReturn[index] = LUTable.MaskRank[rank] & LUTable.MaskFile[file];
                index += 1;
            }
        }
    }
    return toReturn;
}
Your question still confuses me, as you seem to be mixing the concepts of numbers and number representations; i.e., there is an integer, and then there is a hexadecimal representation of that integer.
You can very simply break any integer into its base-2 components.
ulong input = 16094009876; // example input
ulong x = 1;
var bits = new List<ulong>();
do
{
    if ((input & x) == x)
    {
        bits.Add(x);
    }
    x <<= 1;
} while (x != 0);
bits is now a list of integers, each of which represents one of the binary 1 bits within the input. This can be verified by adding (or ORing, which is the same thing here since no two elements share a set bit) all the values, so this expression is true:
bits.Aggregate((a, b) => a | b) == input
If you want hexadecimal representations of those integers in the list, you can simply use ToString():
var hexBits = bits.Select(b => b.ToString("X16"));
If you want the binary representations of the integers, you can use Convert:
var binaryBits = bits.Select(b => Convert.ToString((long)b, 2).PadLeft(64, '0'));
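On the efficiency question: since you are already using System.Numerics.BitOperations, here is a minimal sketch (assuming .NET Core 3.0 or later) that peels off the lowest set bit on each pass, so the loop runs once per set bit instead of once per bit position:
using System.Numerics;

public static ulong[] Decompose(ulong pieces)
{
    var result = new ulong[BitOperations.PopCount(pieces)];
    int i = 0;
    while (pieces != 0)
    {
        ulong lowest = pieces & (~pieces + 1); // two's-complement trick: isolates the lowest set bit
        result[i++] = lowest;
        pieces ^= lowest;                      // clear that bit before the next pass
    }
    return result;
}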

Open Scene Graph - Usage of DrawElementsUInt: Drawing a cloth without duplicating vertices

I am currently working on simulating a cloth-like material and then displaying the results via Open Scene Graph.
I've gotten the setup to display something cloth-like by just dumping all the vertices into one Vec3Array and then displaying them with a standard point-based DrawArrays. However, I am now looking into adding the faces between the vertices so that a later part of my application can visually see the cloth.
This is what I am currently attempting for the PrimitiveSet:
// create and add a DrawArray Primitive (see include/osg/Primitive). The first
// parameter passed to the DrawArrays constructor is the Primitive::Mode which
// in this case is POINTS (which has the same value GL_POINTS), the second
// parameter is the index position into the vertex array of the first point
// to draw, and the third parameter is the number of points to draw.
unsigned int k = CLOTH_SIZE_X;
unsigned int n = CLOTH_SIZE_Y;
osg::ref_ptr<osg::DrawElementsUInt> indices = new osg::DrawElementsUInt(GL_QUADS, (k) * (n));
for (uint y_i = 0; y_i < n - 1; y_i++) {
    for (uint x_i = 0; x_i < k - 1; x_i++) {
        (*indices)[y_i * k + x_i] = y_i * k + x_i;
        (*indices)[y_i * (k + 1) + x_i] = y_i * (k + 1) + x_i;
        (*indices)[y_i * (k + 1) + x_i + 1] = y_i * (k + 1) + x_i + 1;
        (*indices)[y_i * k + x_i] = y_i * k + x_i + 1;
    }
}
geom->addPrimitiveSet(indices.get());
This does, however, cause memory corruption at runtime, and I am not fluent enough in assembly to decipher what is going wrong when CLion shows me the disassembled code.
My thought was that I would iterate over each of the faces of my cloth and then select the 4 indices of the vertices that belong to it. The vertices are inputted from top left to bottom right in order. So:
0 1 2 3 ... k-1
k k+1 k+2 k+3 ... 2k-1
2k 2k+1 2k+2 2k+3 ... 3k-1
...
Has anyone come across this specific use case before, and do they perhaps have a solution for my problem? Any help would be greatly appreciated.
You might want to look into using DrawArrays with QUAD_STRIP (or TRIANGLE_STRIP because quads are frowned upon these days). There's an example here:
http://openscenegraph.sourceforge.net/documentation/OpenSceneGraph/examples/osggeometry/osggeometry.cpp
It's slightly less efficient than Elements/indices, but it's also less complicated, since you avoid managing the relationship between the two related containers (the vertices and the indices).
If you really want to go the Elements/indices route, we'd probably need to see more repro code to tell what's going on.
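That said, a likely culprit in the posted loop is the index arithmetic: a k x n grid has (k-1)*(n-1) quads and a quad list needs four consecutive indices per cell, so the buffer holds too few entries, and the (k+1) strides can write past its end. A minimal, untested sketch of the loop under the row-major layout described above:
unsigned int k = CLOTH_SIZE_X;
unsigned int n = CLOTH_SIZE_Y;
osg::ref_ptr<osg::DrawElementsUInt> indices =
    new osg::DrawElementsUInt(GL_QUADS, 4 * (k - 1) * (n - 1));
unsigned int idx = 0;
for (unsigned int y = 0; y < n - 1; y++) {
    for (unsigned int x = 0; x < k - 1; x++) {
        // the four corners of the cell, in winding order
        (*indices)[idx++] = y * k + x;           // top left
        (*indices)[idx++] = (y + 1) * k + x;     // bottom left
        (*indices)[idx++] = (y + 1) * k + x + 1; // bottom right
        (*indices)[idx++] = y * k + x + 1;       // top right
    }
}
geom->addPrimitiveSet(indices.get());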

Taking mean of images for background subtraction - incorrect results

When I try to sum up the N previous frames stored in a list and then divide by the number of frames, the background model produced is not as expected. I can tell because I tried the algorithm in Matlab earlier on the same video.
class TemporalMeanFilter {
private:
    int learningCount;
    list<Mat> history;
    int height, width;

    Mat buildModel(){
        if(history.size() == 0)
            return Mat();
        Mat image_avg(height, width, CV_8U, Scalar(0));
        double alpha = (1.0/history.size());
        list<Mat>::iterator it = history.begin();
        cout << "History size: " << history.size() << " Weight per cell: " << alpha << endl;
        while(it != history.end()){
            image_avg += (*it * alpha);
            it++;
        }
        return image_avg;
    }

public:
    TemporalMeanFilter(int height, int width, int learningCount){
        this->learningCount = learningCount;
        this->height = height;
        this->width = width;
    }

    void applyFrameDifference(Mat& frame_diff, Mat& bg_model, Mat& input_frame){
        if(history.size() == learningCount)
            history.pop_front();
        history.push_back(input_frame);
        bg_model = buildModel();
        frame_diff = bg_model - input_frame;
    }
};
//The main looks like this
// ... reading video from file
TemporalMeanFilter meanFilter(height, width, 50); //background subtraction algorithm
meanFilter.applyFrameDifference(diff_frame, bg_model, curr_frame);
//... displaying on screen ... prog ends
Image:
http://imagegur.com/images/untitled.png
The left one is the bg_model, the middle is the curr_frame, and the right one is the output.
Maybe it's because of the rounding off done on CV_8U? I tried changing to CV_32FC1, but then the program just crashed, because for some reason it couldn't add two CV_32FC1 matrices.
Any insight would be greatly appreciated. Thanks!
More info:
Inside the class, I now keep the average in a CV_16UC1 Mat to prevent clipping; however, this results in an error after successive additions.
The add function and operator+ both change the result type from CV_16UC1 to CV_8UC1, which is what causes the error. Any suggestion on how to make it preserve the original datatype? (PS: I asked politely... didn't work)
background_model += *it;
OpenCV Error: Bad argument (When the input arrays in add/subtract/multiply/divide functions have different types, the output array type must be explicitly specified) in unknown function, file C:\buildslave64\win64_amdocl\2_4_PackSlave-win32-vc11-shared\opencv\modules\core\src\arithm.cpp, line 1313
You're right that it's almost certainly the rounding errors you get by accumulating scaled greyscale values. There's no reason why it should crash using floating point pixels though, so you should try something like this:
Mat buildModel()
{
    if (history.size() == 0)
        return Mat();

    Mat image_avg = Mat::zeros(height, width, CV_32FC1);
    double alpha = (1.0 / history.size());

    list<Mat>::iterator it = history.begin();
    while (it != history.end())
    {
        Mat image_temp;
        (*it).convertTo(image_temp, CV_32FC1);
        image_avg += image_temp;
        it++;
    }

    image_avg *= alpha;
    return image_avg;
}
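Incidentally, the error from your update can also be fixed directly: add() accepts an explicit output type as its last parameter, which is exactly what the error message asks for. A minimal sketch, assuming background_model is CV_16UC1 and *it is CV_8UC1:
// give add() an explicit output depth so mixed-type inputs are allowed
add(background_model, *it, background_model, noArray(), CV_16UC1);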
Depending on what you want to do with the result, you may need to normalize it, rescale it, or convert it back to greyscale before display, etc.
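For example, a minimal sketch of bringing the CV_32FC1 average back to 8-bit for display, assuming the accumulated values still lie in the 0..255 range:
Mat display;
image_avg.convertTo(display, CV_8U); // out-of-range values are saturated to 0..255
imshow("Background model", display);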

How can I store and access images in Mat of OpenCV

I am trying to use:
cv::Mat source;
const int histSize[] = {intialframes, initialWidth, initialHeight};
source.create(3, histSize, CV_8U);
for saving multiple images in one matrix. However, when I do so, it gives me dims = 3 and -1 for rows and cols.
Is this correct?
If not, what is the bug?
If yes, how can I access my images one by one?
Reading the documentation of the class cv::Mat (doc), you can see that cv::Mat.rows and cv::Mat.cols hold the number of rows and columns for a 2D array, and -1 otherwise.
With source.create(3, histSize, CV_8U); you are creating a 3D array, so dims = 3 with rows and cols set to -1 is exactly what you should expect.
The cv::Mat documentation also describes how to access the elements.
With the create method the matrix is continuous, organized plane by plane.
EDIT
The first part of the documentation text, just after the class definition code, tells you how to access each element of the matrix using the matrix's step[] parameter:
If you want to access pixel (u, v) of image i, you need to get a pointer to the data and use pointer arithmetic to reach the desired pixel:
int sizes[] = { 10, 200, 100 };
cv::Mat M(3, sizes, CV_8UC1);
//get a pointer to the pixel
uchar *px = M.data + M.step[0] * i + M.step[1] * u + M.step[2] * v;
//get the pixel intensity
uchar intensity = *px;
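Alternatively, a minimal sketch (with a hypothetical image index and pixel coordinates) of wrapping image i as an ordinary 2D Mat header over the same memory, so the usual 2D accessors work without copying:
int sizes[] = { 10, 200, 100 };
cv::Mat M(3, sizes, CV_8UC1);
int i = 3, u = 50, v = 20; // hypothetical image index and pixel coordinates
// a 200x100 2D view into the i-th plane; no data is copied
cv::Mat image_i(200, 100, CV_8UC1, M.data + M.step[0] * i);
uchar intensity = image_i.at<uchar>(u, v);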
