Windows C++ struct with bit fields and packing

union AP
{
    UCHAR bin[28];
    struct {
        ULONGLONG TA   : 42;
        UINT St        : 6;
        UINT Reserved1 : 3;
        UINT fo        : 4;
        UINT P         : 9;
        UINT cy        : 17;
        UINT Reserved2 : 3;
        UINT A         : 12;
        UINT Fg        : 8;
        UINT P2        : 24;
        UINT Fp        : 10;
        UINT SChNum    : 22;
        UINT ItAdrs    : 32;
        UINT IEAdrs    : 32;
    } stt;
};
I want stt to be 28 bytes, but with this code sizeof(stt) is 32 bytes.
I think the struct needs packing because of 'ULONGLONG TA : 42'.
#pragma pack(push,1)
#pragma pack(1)
Neither of them worked.
How do I pack a struct that contains bit fields?
I am building with Visual Studio 2012, C++, on Windows 7.

The C/C++ standards allow the compiler to insert padding between bit fields when their base types differ, so the padding is most likely being added between your ULONGLONG and the subsequent UINT members. Try splitting the ULONGLONG into two UINT fields; that should eliminate the padding.
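For illustration, a minimal sketch of that split (the TA_lo/TA_hi field names are my own; verify the resulting bit layout against your data format, since MSVC allocates bit fields per base-type unit):

#include <windows.h>

#pragma pack(push, 1)
union AP
{
    UCHAR bin[28];
    struct {
        UINT TA_lo     : 32;  // low 32 bits of the original 42-bit TA
        UINT TA_hi     : 10;  // high 10 bits of TA
        UINT St        : 6;
        UINT Reserved1 : 3;
        UINT fo        : 4;
        UINT P         : 9;   // 10+6+3+4+9 = 32 bits, second UINT is full
        UINT cy        : 17;
        UINT Reserved2 : 3;
        UINT A         : 12;
        UINT Fg        : 8;
        UINT P2        : 24;
        UINT Fp        : 10;
        UINT SChNum    : 22;
        UINT ItAdrs    : 32;
        UINT IEAdrs    : 32;  // 7 * 32 bits = 224 bits = 28 bytes
    } stt;
};
#pragma pack(pop)

// Recombining TA when needed:
// ULONGLONG ta = ((ULONGLONG)ap.stt.TA_hi << 32) | ap.stt.TA_lo;

With every field declared as UINT and each 32-bit unit fully used, sizeof(AP) should come out to 28 on MSVC; the #pragma pack is kept only as a belt-and-braces measure.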

Related

using bitfields as a sorting key in modern C (C99/C11 union)

Requirement:
For my tiny graphics engine, I need an array of all objects to draw. For performance reasons this array needs to be sorted on the attributes. In short:
Store a lot of attributes per struct, add the struct to an array of structs
Efficiently sort the array
Walk over the array and perform operations (modesetting and drawing) depending on the attributes
Approach: bitfields in a union (i.e.: let the compiler do the masking and shifting for me)
I thought I had an elegant plan to accomplish this, based on this article: http://realtimecollisiondetection.net/blog/?p=86. The idea is as follows: each attribute is a bitfield, which can be read and written to (step 1). After writing, the sorting procedure looks at the bitfield struct as an integer and sorts on it (step 2). Afterwards (step 3), the bitfields are read again.
Sometimes code says more than a thousand words; here is a high-level view:
union key {
    /* useful for accessing */
    struct {
        unsigned int some_attr    : 2;
        unsigned int another_attr : 3;
        /* ... */
    } bitrep;
    /* useful for sorting */
    uint64_t intrep;
};
I would just make sure that the bit-representation was as large as the integer representation (64 bits in this case). My first approach went like this:
union key {
    /* useful for accessing */
    struct {
        /* generic part: 11 bits */
        unsigned int layer         : 2;
        unsigned int viewport      : 3;
        unsigned int viewportLayer : 3;
        unsigned int translucency  : 2;
        unsigned int type          : 1;
        /* depends on type-bit: 64 - 11 bits = 53 bits */
        union {
            struct {
                unsigned int sequence : 8;
                unsigned int id       : 32;
                unsigned int padding  : 13;
            } cmd;
            struct {
                unsigned int depth    : 24;
                unsigned int material : 29;
            } normal;
        };
    };
    /* useful for sorting */
    uint64_t intrep;
};
Note that in this case there is a decision bitfield called type; based on it, either the cmd struct or the normal struct gets filled in, just like in the article mentioned above. However, this failed horribly: with clang 3.3 on OS X 10.9 (x86 MacBook Pro), the key union is 16 bytes, while it should be 8.
Unable to coerce clang into packing the struct better, I took another approach based on some other Stack Overflow answers, using the preprocessor to avoid repeating myself:
/* 2 + 3 + 3 + 2 + 1 + 5 = 16 bits */
#define GENERIC_FIELDS \
    unsigned int layer         : 2; \
    unsigned int viewport      : 3; \
    unsigned int viewportLayer : 3; \
    unsigned int translucency  : 2; \
    unsigned int type          : 1; \
    unsigned int               : 5;

/* 8 + 32 + 8 = 48 bits */
#define COMMAND_FIELDS \
    unsigned int sequence : 8; \
    unsigned int id       : 32; \
    unsigned int          : 8;

/* 24 + 24 = 48 bits */
#define MODEL_FIELDS \
    unsigned int depth    : 24; \
    unsigned int material : 24;

struct generic {
    /* 16 bits */
    GENERIC_FIELDS
};

struct command {
    /* 16 bits */
    GENERIC_FIELDS
    /* 48 bits */
    COMMAND_FIELDS
} __attribute__((packed));

struct model {
    /* 16 bits */
    GENERIC_FIELDS
    /* 48 bits */
    MODEL_FIELDS
} __attribute__((packed));

union alkey {
    struct generic gen;
    struct command cmd;
    struct model   mod;
    uint64_t       intrep;
};
Without __attribute__((packed)), the command and model structs are 12 bytes. But with __attribute__((packed)) they are 8 bytes, exactly what I wanted! So it would seem that I have found my solution. However, my small experience with bitfields has taught me to be leery, which is why I have a few questions:
Can I get this to be cleaner (i.e.: more like my first big union-within-struct-within-union) and still keep it 8 bytes for the key, for fast sorting?
Is there a better way to accomplish this?
Is this safe? Will this fail on x86/ARM? (Really exotic architectures are not much of a concern; I'm targeting the two most prevalent ones.) What about setting one bitfield and clobbering an adjacent one that has already been written? Will different compilers vary wildly on this?
What issues can I expect from different compilers? Currently I'm just aiming for clang 3.3+ and gcc 4.9+ with -std=c11. However it would be quite nice if I could use MSVC as well in the future.
Related question and webpages I've looked up:
Variable-sized bitfields with aliasing
Unions within unions
For those (like me) scratching their heads about what happens with bitfields that are not byte-aligned, and about endianness, look no further: http://mjfrazer.org/mjfrazer/bitfields/
Sadly no answer got me the entirety of the way there.
EDIT: While experimenting (setting some values and reading the integer representation), I noticed something that I had forgotten about: endianness. This opens up another can of worms. Is it even possible to do what I want using bitfields, or will I have to go for bit-shifting operations?
The layout of bitfields is highly implementation (i.e. compiler) dependent. In essence, compilers are free to place consecutive bitfields in the same byte/word, or not, as they see fit. Thus, without extensions like the packed attribute that you mention, you can never be sure that your bitfields are squeezed into one word.
Then, if the bitfields are not squeezed into one word, or if you simply have some spare bits that you don't use, you may be in even more trouble: these so-called padding bits can have arbitrary values, so your sorting idea could never work in a portable setting.
For all these reasons, bitfields are relatively rarely used in real code. What you see more often is the use of macros over the bits of a uint64_t. For each bitfield that you have now, you'd need two macros: one to extract the bits and one to set them. Such code is then portable without problems to all platforms that have a C99/C11 compiler.
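For illustration, a minimal sketch of such a macro pair for a single field (placing layer at bits 62..63 is my own choice, not something your layout dictates):

#include <stdint.h>

/* 'layer' occupies 2 bits at positions 62..63 of the 64-bit key */
#define LAYER_SHIFT 62
#define LAYER_MASK  UINT64_C(0x3)

#define KEY_GET_LAYER(key) \
    (((key) >> LAYER_SHIFT) & LAYER_MASK)

#define KEY_SET_LAYER(key, val) \
    (((key) & ~(LAYER_MASK << LAYER_SHIFT)) | \
     (((uint64_t)(val) & LAYER_MASK) << LAYER_SHIFT))

/* usage: k = KEY_SET_LAYER(k, 2);  unsigned layer = KEY_GET_LAYER(k); */

Sorting then compares the uint64_t keys directly, with no padding bits and no endianness surprises.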
Minor point:
In the declaration of the union it is better to put the basic integer field first. The default initializer for a union uses the first member, so this ensures that such an initializer sets the union to all bits zero. An initializer of the struct member would only guarantee that the individual fields are set to 0; the padding bits, if any, would be unspecified.
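A small sketch of that reordering, using the first union from the question:

#include <stdint.h>

union key {
    uint64_t intrep;    /* first member: '= {0}' now zeroes all 64 bits */
    struct {
        unsigned int some_attr    : 2;
        unsigned int another_attr : 3;
        /* ... */
    } bitrep;
};

union key k = {0};      /* initializes intrep, i.e. every bit of the key */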

Compiler Support of members of SSE vector types like m128_f32[x]

This might sound stupid, but is there a way to activate support for the inner members of an SSE vector type?
I know this works fine on MSVC, and I've found some comments on forums and SO like this.
The question is: can I activate this on Clang, at least, without creating my own unions?
Thank you
[edit, workaround]
For now I have decided to create a vec4 type to help me. Here is the code:
#include <emmintrin.h>
#include <cstdint>

#ifdef _WIN32
typedef __m128  vec4;
typedef __m128i vec4i;
typedef __m128d vec4d;
#else
/* __declspec(align(16)) is MSVC-only; use the GCC/Clang attribute here */
typedef union __attribute__((aligned(16))) vec4 {
    float    m128_f32[4];
    uint64_t m128_u64[2];
    int8_t   m128_i8[16];
    int16_t  m128_i16[8];
    int32_t  m128_i32[4];
    int64_t  m128_i64[2];
    uint8_t  m128_u8[16];
    uint16_t m128_u16[8];
    uint32_t m128_u32[4];
} vec4;

typedef union __attribute__((aligned(16))) vec4i {
    uint64_t m128i_u64[2];
    int8_t   m128i_i8[16];
    int16_t  m128i_i16[8];
    int32_t  m128i_i32[4];
    int64_t  m128i_i64[2];
    uint8_t  m128i_u8[16];
    uint16_t m128i_u16[8];
    uint32_t m128i_u32[4];
} vec4i;

typedef union __attribute__((aligned(16))) vec4d {
    double m128d_f64[2];
} vec4d;
#endif
On recent clangs, this Just Works without you needing to do anything at all:
#include <immintrin.h>

float foo(__m128 x) {
    return x[1];
}
AFAIK it Just Works in recent GCC builds as well.
However, I should note the following:
Consider carefully whether or not you really need to do element-wise access in your vector code. If you can keep your operations in-lane, they will almost certainly be significantly more efficient.
If you really do need to do a significant number of lanewise or horizontal operations, and you don't need portability, consider using Clang extended vectors (or "OpenCL vectors") instead of the basic SSE intrinsic types. You can pass them to intrinsics just like __m128 and friends, but they also have much nicer syntax for vector-scalar operations, lanewise operations, vector literals, etc.
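For example, here is a minimal sketch of a Clang extended-vector type used next to an ordinary SSE intrinsic (the float4 typedef and the function are my own, not part of any library):

#include <immintrin.h>

typedef float float4 __attribute__((ext_vector_type(4)));

float4 scale_plus_one(float4 v, float s) {
    float4 r = v * s;                      // vector * scalar, no intrinsic needed
    r = _mm_add_ps(r, _mm_set1_ps(1.0f));  // still usable with __m128 intrinsics
    r.x = 0.0f;                            // OpenCL-style named element access
    return r;
}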

Passing struct to device driver through IOCTL

I am trying to pass a struct from user space to kernel space. I have been trying for many hours and it isn't working. Here is what I have done so far:
int device_ioctl(struct inode *inode, struct file *filep, unsigned int cmd, unsigned long arg){
    int ret, SIZE;
    switch(cmd){
    case PASS_STRUCT_ARRAY_SIZE:
        SIZE = (int *)arg;
        if(ret < 0){
            printk("Error in PASS_STRUCT_ARRAY_SIZE\n");
            return -1;
        }
        printk("Struct Array Size : %d\n",SIZE);
        break;
    case PASS_STRUCT:
        struct mesg{
            int pIDs[SIZE];
            int niceVal;
        };
        struct mesg data;
        ret = copy_from_user(&data, arg, sizeof(*data));
        if(ret < 0){
            printk("PASS_STRUCT\n");
            return -1;
        }
        printk("Message PASS_STRUCT : %d\n",data.niceVal);
        break;
    default :
        return -ENOTTY;
    }
    return 0;
}
I have trouble defining the struct. What is the correct way to define it? I want to have int pIDs[SIZE]. Will int *pIDs do it (in user space it is defined as pIDs[SIZE])?
EDIT:
With the above change I get this error: error: expected expression before 'struct'. Any ideas?
There are two variants of the structure in your question.
struct mesg1{
    int *pIDs;
    int niceVal;
};

struct mesg2{
    int pIDs[SIZE];
    int niceVal;
};
They are different: in the case of mesg1 you have a pointer to an int array (which lives outside the struct). In the other case (mesg2) the int array is inside the struct.
If your SIZE is fixed (in the API of your module, i.e. the same value is used in user and kernel space), you can use the second variant (mesg2).
To use the first variant of the structure (mesg1) you can add a size field to the structure itself, like:
struct mesg1{
    int pIDs_size;
    int *pIDs;
    int niceVal;
};
and fill it with the count of ints pointed to by pIDs.
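For illustration, a minimal sketch of how the kernel side could then copy such a struct in two steps; the helper function and its name are my own, not part of your module:

#include <linux/slab.h>
#include <linux/uaccess.h>

/* hypothetical helper: copy a struct mesg1 plus the array it points to */
static int fetch_mesg1(unsigned long arg, struct mesg1 *km, int **pids_out)
{
    int *pids;

    /* step 1: copy the fixed-size header (size, user pointer, niceVal) */
    if (copy_from_user(km, (void __user *)arg, sizeof(*km)))
        return -EFAULT;

    pids = kmalloc(km->pIDs_size * sizeof(int), GFP_KERNEL);
    if (!pids)
        return -ENOMEM;

    /* step 2: km->pIDs still holds a user-space address; copy the array */
    if (copy_from_user(pids, (int __user *)km->pIDs,
                       km->pIDs_size * sizeof(int))) {
        kfree(pids);
        return -EFAULT;
    }

    *pids_out = pids;   /* caller must kfree() when done */
    return 0;
}

A real driver should also sanity-check pIDs_size before trusting it for the allocation.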
PS: And please, never use structures with a variable-sized array in the middle of the struct (aka VLAIS). That is a proprietary, weird, buggy and poorly documented extension to the C language by the GCC compiler. According to the international C standard, only the last field of a struct may be an array of unspecified size (a flexible array member). Some examples here: 1 2
PPS:
You can declare your struct with a flexible array member (if there is only a single array of variable size):
struct mesg2{
    int niceVal;
    int pIDs[];
};
but you should be careful when allocating memory for such a struct: sizeof does not include the flexible array member, so you must allocate extra space for it.
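A minimal sketch of that allocation (the function name is mine):

#include <linux/slab.h>

static struct mesg2 *alloc_mesg2(int n)
{
    /* sizeof(*m) covers niceVal only; pIDs[] needs n * sizeof(int) extra */
    struct mesg2 *m = kmalloc(sizeof(*m) + n * sizeof(int), GFP_KERNEL);

    if (m)
        m->niceVal = 0;
    return m;
}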

C++/CLI from tracking reference to (native) reference - wrapping

I need a C# interface to call some native C++ code via the CLI dialect. The C# interface uses the out attribute specifier in front of the required parameters. That translates to a % tracking reference in C++/CLI.
The method has the following signature and body (it calls another native method to do the job):
virtual void __clrcall GetMetrics(unsigned int %width, unsigned int %height, unsigned int %colourDepth, int %left, int %top) sealed
{
    mRenderWindow->getMetrics(width, height, colourDepth, left, top);
}
Now the code won't compile because of a few compile-time errors, all related to not being able to convert parameter 1 from 'unsigned int' to 'unsigned int &'.
As a modest C++ programmer, C++/CLI looks to me like Dutch would to a German speaker. What can be done to make this wrapper work properly in C++/CLI?
As was also suggested in a (since deleted) answer, I did the obvious and used local variables to pass the relevant values around:
virtual void __clrcall GetMetrics(unsigned int %width, unsigned int %height, unsigned int %colourDepth, int %left, int %top) sealed
{
    unsigned int w = width, h = height, c = colourDepth;
    int l = left, t = top;
    mRenderWindow->getMetrics(w, h, c, l, t);
    width = w; height = h; colourDepth = c; left = l; top = t;
}
In hindsight this is fairly intuitive, given how tracking references work: they are managed by the garbage collector and, unlike normal & references, their referents can be moved around in memory, so they cannot be handed directly to native code. Copying through local variables is the most reliable way to overcome the issue. Thanks to the initial answer.
If your parameters use 'out' on the C# side, you need to define your C++/CLI parameters like this: [Out] unsigned int ^%width
Here's an example:
virtual void __clrcall GetMetrics([Out] unsigned int ^%width)
{
    width = gcnew UInt32(42);
}
Then on your C# side, you'll get back 42:
ValueType vt;
cppClass.GetMetrics(out vt);
// vt == 42
In order to use the [Out] parameter on the C++/CLI side you'll need to include:
using namespace System::Runtime::InteropServices;
Hope this helps!
You can use pin_ptr so that 'width' doesn't move while native code writes to it. The managed side pays a cost for pin_ptr, but I don't think you can get around that if you want the native code to access it directly, without a local like 'w'.
virtual void __clrcall GetMetrics(unsigned int %width, unsigned int %height, unsigned int %colourDepth, int %left, int %top) sealed
{
    // pin each tracking reference so the GC cannot move it while native code writes to it
    pin_ptr<unsigned int> pw = &width;
    pin_ptr<unsigned int> ph = &height;
    pin_ptr<unsigned int> pc = &colourDepth;
    pin_ptr<int> pl = &left;
    pin_ptr<int> pt = &top;
    mRenderWindow->getMetrics(*pw, *ph, *pc, *pl, *pt);
}

Why can't I use global float constants in device code? [duplicate]

I am using CUDA 5.0. I noticed that the compiler will allow me to use host-declared int constants within kernels. However, it refuses to compile any kernels that use host-declared float constants. Does anyone know the reason for this seeming discrepancy?
For example, the following code runs just fine as is, but it will not compile if the final line in the kernel is uncommented.
#include <cstdio>
#include <cuda_runtime.h>

static int   __constant__ DEV_INT_CONSTANT   = 1;
static float __constant__ DEV_FLOAT_CONSTANT = 2.0f;
static int   const        HST_INT_CONSTANT   = 3;
static float const        HST_FLOAT_CONSTANT = 4.0f;

__global__ void uselessKernel(float * val)
{
    *val = 0.0f;
    // Use device int and float constants
    *val += DEV_INT_CONSTANT;
    *val += DEV_FLOAT_CONSTANT;
    // Use host int and float constants
    *val += HST_INT_CONSTANT;
    //*val += HST_FLOAT_CONSTANT; // won't compile if uncommented
}

int main(void)
{
    float * d_val;
    cudaMalloc((void **)&d_val, sizeof(float));
    uselessKernel<<<1, 1>>>(d_val);
    cudaFree(d_val);
}
Using a constant number in device code is OK, but using a number stored in host memory in device code is NOT.
Every reference to the static const int in your code can be replaced with the value 3 by the compiler/optimizer as long as the address of that variable is never taken. In that case it behaves like #define HST_INT_CONSTANT 3, and no host memory is allocated for the variable.
But for the float variable, host memory is always allocated even though it is a static const float. Since the kernel cannot access host memory directly, your code with the static const float won't compile.
In C/C++, a const int can be optimized away more aggressively than a const float.
That your code compiles and runs at all when only the host int constant is used (the float line commented out) can be seen as a quirk of CUDA C, I think: the static const int is a host-side object and should not be directly accessible to the device.
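For completeness, a minimal sketch of two workarounds that keep the value usable on the device (the names are mine; the __constant__ variant is the same mechanism as DEV_FLOAT_CONSTANT in the question):

// Option 1: make it a compile-time literal, folded into the kernel
#define HST_FLOAT_AS_DEFINE 4.0f

// Option 2: give the value a copy in device constant memory
static float __constant__ FLOAT_IN_CONSTANT_MEM = 4.0f;

__global__ void fixedKernel(float *val)
{
    *val  = 0.0f;
    *val += HST_FLOAT_AS_DEFINE;
    *val += FLOAT_IN_CONSTANT_MEM;
}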

Resources