Weird values when passing an array of structs as an openCL kernel argument

Weird values when passing an array of structs as an openCL kernel argument - struct

When passing an array of structs to my kernel as an argument, I get weird values for the items after the first (array[1], array[2], etc). It seems to be an alignment issue maybe?
Here is the struct:
typedef struct Sphere
{
float3 color;
float3 position;
float3 reflectivity;
float radius;
int phong;
bool isReflective;
} Sphere;
Here is the host side init code:
cl::Buffer cl_spheres = cl::Buffer(context, CL_MEM_READ_ONLY, sizeof(Sphere) * MAX_SPHERES, NULL, &err);
err = queue.enqueueWriteBuffer(cl_spheres, CL_TRUE, 0, sizeof(Sphere) * MAX_SPHERES, spheres, NULL, &event);
err = kernel.setArg(3, cl_spheres);
What happens is that the color for the second Sphere struct in the array will actually have the last value of what I set color to on the host side (s3 or z), a non initialized zero value, and the first value of what I set position to on the host side (s0 or x). I noticed that the float3 datatype actually still has a fourth value (s3) that does not get initialized. I think that is where the non initialized zero value is coming from. So it seems that it is an alignment issue. I really am at a loss as to what I could do to fix it. I was hoping maybe someone could shed some light on this problem. I have ensured that my struct definitions are exactly the same on both sides.

From the OpenCL 1.2 specs, section 6.11.1:
Note that the alignment of any given struct or union type is required
by the ISO C standard to be at least a perfect multiple of the lowest
common multiple of the alignments of all of the members of the struct
or union in question and must also be a power of two.
Also cl_float3 counts as a cl_float4, see section 6.1.5.
Finally, in section 6.9.k:
Arguments to kernel functions in a program cannot be declared with the
built-in scalar types bool, half, size_t, ptrdiff_t, intptr_t, and
uintptr_t or a struct and/or union that contain fields declared to be
one of these built-in scalar types.
To comply with these rules, and probably make accesses faster, you can try (OpenCL C side; on the host use cl_float4):
typedef struct Sphere
{
float4 color;
float4 position;
float4 reflectivity;
float4 radiusPhongReflective; // each value uses 1 float
} Sphere;

Related

Undefined number of TEXCOORDs

How to declare an array of TEXCOORDs?
In different struct I have :
float2 foo : TEXCOORD0
float3 bar : TEXCOORD1
And now I need
float4 Positions[NUMBER_OF_FLOATS]
float3 OtherPositions[NUMBER_OF_FLOATS_2]
I want these arrays to consist of TEXCOORDs (if I omit the TEXCOORD semantic, I get an error because of it). But no matter how I write it, I get a duplicate error, that I use TEXCOORD0 and TEXCOORD1 multiple times.
Any help is appreciated.

The problem is that the predefined semantics like TEXCOORD have a specific type (seen in doc). So the compiler expects you TEXCOORD to be float vector and not an array of float vectors. Maybe it works with custom semantics, but didn't found any references and never tested it by myself.
I also stumbled over this problematic and solved it (quite ugly) with the preprocessor. In your case it would look like
#if NUMBER_OF_FLOATS > 0
float4 Position_1 : TEXCOORD0;
#endif
#if NUMBER_OF_FLOATS > 1
float4 Position_2 : TEXCOORD1;
#endif
#if NUMBER_OF_FLOATS > 2
float4 Position_3 : TEXCOORD2;
#endif
...
Of course this would need a recompling of the shader if the number changes and your vertex layout must have to fit, but despite it is not the best solution it works for me :)

What is the point of using arrays of one element in ddk structures?

Here is an excerpt from ntdddisk.h
typedef struct _DISK_GEOMETRY_EX {
DISK_GEOMETRY Geometry; // Standard disk geometry: may be faked by driver.
LARGE_INTEGER DiskSize; // Must always be correct
UCHAR Data[1]; // Partition, Detect info
} DISK_GEOMETRY_EX, *PDISK_GEOMETRY_EX;
What is the point of UCHAR Data[1];? Why not just UCHAR Data; ?
And there are a lot of structures in DDK which have arrays of one element in declarations.
Thanks, thats clear now. The one thing is not clear the implementation of offsetof.
It's defined as
#ifdef _WIN64
#define offsetof(s,m) (size_t)( (ptrdiff_t)&(((s *)0)->m) )
#else
#define offsetof(s,m) (size_t)&(((s *)0)->m)
#endif
How this works:
((s *)0)->m ???
This
(size_t)&((DISK_GEOMETRY_EX *)0)->Data
is like
sizeof (DISK_GEOMETRY) + sizeof( LARGE_INTEGER);
But there is two additional questions:
1)
What type is this? And why we should use & for this?
((DISK_GEOMETRY_EX *)0)->Data
2) ((DISK_GEOMETRY_EX *)0)
This gives me 00000000. Is it convering to the address alignment? interpret it like an address?

Very common in the winapi as well, these are variable length structures. The array is always the last element in the structure and it always includes a field that indicates the actual array size. A bitmap for example is declared that way:
typedef struct tagBITMAPINFO {
BITMAPINFOHEADER bmiHeader;
RGBQUAD bmiColors[1];
} BITMAPINFO, FAR *LPBITMAPINFO, *PBITMAPINFO;
The color table has a variable number of entries, 2 for a monochrome bitmap, 16 for a 4bpp and 256 for a 8bpp bitmap. Since the actual length of the structure varies, you cannot declare a variable of that type. The compiler won't reserve enough space for it. So you always need the free store to allocate it using code like this:
#include <stddef.h> // for offsetof() macro
....
size_t len = offsetof(BITMAPINFO, bmiColors) + 256 * sizeof(RGBQUAD);
BITMAPINFO* bmp = (BITMAPINFO*)malloc(len);
bmp->bmiHeader.biClrUsed = 256;
// etc...
//...
free(bmp);

Why is the transitive closure of matrix multiplication not working in this vertex shader?

This can probably be filed under premature optimization, but since the vertex shader executes on each vertex for each frame, it seems like something worth doing (I have a lot of vars I need to multiply before going to the pixel shader).
Essentially, the vertex shader performs this operation to convert a vector to projected space, like so:
// Transform the vertex position into projected space.
pos = mul(pos, model);
pos = mul(pos, view);
pos = mul(pos, projection);
output.pos = pos;
Since I'm doing this operation to multiple vectors in the shader, it made sense to combine those matrices into a cumulative matrix on the CPU and then flush that to the GPU for calculation, like so:
// VertexShader.hlsl
cbuffer ModelViewProjectionConstantBuffer : register (b0)
{
matrix model;
matrix view;
matrix projection;
matrix cummulative;
float3 eyePosition;
};
...
// Transform the vertex position into projected space.
pos = mul(pos, cummulative);
output.pos = pos;
And on the CPU:
// Renderer.cpp
// now is also the time to update the cummulative matrix
m_constantMatrixBufferData->cummulative =
m_constantMatrixBufferData->model *
m_constantMatrixBufferData->view *
m_constantMatrixBufferData->projection;
// NOTE: each of the above vars is an XMMATRIX
My intuition was that there was some mismatch of row-major/column-major, but XMMATRIX is a row-major struct (and all of its operators treat it as such) and mul(...) interprets its matrix parameter as row-major. So that doesn't seem to be the problem but perhaps it still is in a way that I don't understand.
I've also checked the contents of the cumulative matrix and they appear correct, further adding to the confusion.
Thanks for reading, I'll really appreciate any hints you can give me.
EDIT (additional information requested in comments):
This is the struct that I am using as my matrix constant buffer:
// a constant buffer that contains the 3 matrices needed to
// transform points so that they're rendered correctly
struct ModelViewProjectionConstantBuffer
{
DirectX::XMMATRIX model;
DirectX::XMMATRIX view;
DirectX::XMMATRIX projection;
DirectX::XMMATRIX cummulative;
DirectX::XMFLOAT3 eyePosition;
// and padding to make the size divisible by 16
float padding;
};
I create the matrix stack in CreateDeviceResources (along with my other constant buffers), like so:
void ModelRenderer::CreateDeviceResources()
{
Direct3DBase::CreateDeviceResources();
// Let's take this moment to create some constant buffers
... // creation of other constant buffers
// and lastly, the all mighty matrix buffer
CD3D11_BUFFER_DESC constantMatrixBufferDesc(sizeof(ModelViewProjectionConstantBuffer), D3D11_BIND_CONSTANT_BUFFER);
DX::ThrowIfFailed(
m_d3dDevice->CreateBuffer(
&constantMatrixBufferDesc,
nullptr,
&m_constantMatrixBuffer
)
);
... // and the rest of the initialization (reading in the shaders, loading assets, etc)
}
I write into the matrix buffer inside a matrix stack class I created. The client of the class calls Update() once they are done modifying the matrices:
void MatrixStack::Update()
{
// then update the buffers
m_constantMatrixBufferData->model = model.front();
m_constantMatrixBufferData->view = view.front();
m_constantMatrixBufferData->projection = projection.front();
// NOTE: the eye position has no stack, as it's kept updated by the trackball
// now is also the time to update the cummulative matrix
m_constantMatrixBufferData->cummulative =
m_constantMatrixBufferData->model *
m_constantMatrixBufferData->view *
m_constantMatrixBufferData->projection;
// and flush
m_d3dContext->UpdateSubresource(
m_constantMatrixBuffer.Get(),
0,
NULL,
m_constantMatrixBufferData,
0,
0
);
}

Given your code snippets, it should work.
Possible causes of your problem:
Have you tried the inverted multiplication: projection * view * model?
Are you sure you set correctly the cummulative (register index = 12, offset = 192) in your constant buffer
Same for eyePosition (register index = 12, offset = 256)?

GLSL - Front vs. Back faces of polygons

I made some simple shading in GLSL of a checkers board:
f(P) = [ floor(Px)+floor(Py)+floor(Pz) ] mod 2
It seems to work well except the fact that i see the interior of the objects but i want to see only the front face.
Any ideas how to fix this? Thanks!
Teapot (glutSolidTeapot()):
Cube (glutSolidCube):
The vertex shader file is:
varying float x,y,z;
void main(){
gl_Position = gl_ProjectionMatrix * gl_ModelViewMatrix * gl_Vertex;
x = gl_Position.x;
y = gl_Position.y;
z = gl_Position.z;
}
And the fragment shader file is:
varying float x,y,z;
void main(){
float _x=x;
float _y=y;
float _z=z;
_x=floor(_x);
_y=floor(_y);
_z=floor(_z);
float sum = (_x+_y+_z);
sum = mod(sum,2.0);
gl_FragColor = vec4(sum,sum,sum,1.0);
}

The shaders are not the problem - the face culling is.
You should either disable the face culling (which is not recommended, since it's bad for performance reasons):
glDisable(GL_CULL_FACE);
or use glCullFace and glFrontFace to set the culling mode, i.e.:
glEnable(GL_CULL_FACE); // enables face culling
glCullFace(GL_BACK); // tells OpenGL to cull back faces (the sane default setting)
glFrontFace(GL_CW); // tells OpenGL which faces are considered 'front' (use GL_CW or GL_CCW)
The argument to glFrontFace depends on application conventions, i.e. the matrix handedness.

How do you count registers in HLSL?

With shader model 2.0, you can have 256 constant registers. I have been looking at various shaders, and trying to figure out what constitutes a single register?
For example, in my instancing shader, I have the following variables declared at the top, outside of functions:
float4x4 InstanceTransforms[40];
float4 InstanceDiffuses[40];
float4x4 View;
float4x4 Projection;
float3 LightDirection = normalize(float3(-1, -1, -1));
float3 DiffuseLight = 1;
float3 AmbientLight = 0.66;
float Alpha;
texture Texture;
How many registers have I consumed? How do I count them?

Each constant register is a float4.
float3, float2 and float will each allocate a whole register. float4x4 will use 4 registers. Arrays will simply multiply the number of registers allocated by the number of elements. And the compiler will probably allocate a few registers itself to use as constants in various calculations.
The only way to really tell what the shader is using is to disassemble it. To that end you may be interested in this question that I asked a while ago: HLSL: Enforce Constant Register Limit at Compile Time
You might also find this one worth a look: HLSL: Index to unaligned/packed floats. It explains why an array of 40 floats will use 40 registers, and how you can make it use 10 instead.
Your texture will use a texture sampler (you have 16 of these), not a constant register.
For reference, here are the list of ps_2_0 registers and vs_2_0 registers.

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string

Weird values when passing an array of structs as an openCL kernel argument - struct

Related

Undefined number of TEXCOORDs

What is the point of using arrays of one element in ddk structures?

Why is the transitive closure of matrix multiplication not working in this vertex shader?

GLSL - Front vs. Back faces of polygons

How do you count registers in HLSL?

Categories

Resources