using bitfields as a sorting key in modern C (C99/C11 union) - struct

Requirement:
For my tiny graphics engine, I need an array of all objects to draw. For performance reasons this array needs to be sorted on the attributes. In short:
Store a lot of attributes per struct, add the struct to an array of structs
Efficiently sort the array
walk over the array and perform operations (modesetting and drawing) depending on the attributes
Approach: bitfields in a union (i.e.: let the compiler do the masking and shifting for me)
I thought I had an elegant plan to accomplish this, based on this article: http://realtimecollisiondetection.net/blog/?p=86. The idea is as follows: each attribute is a bitfield, which can be read and written to (step 1). After writing, the sorting procedure look at the bitfield struct as an integer, and sorts on it (step 2). Afterwards (step 3), the bitfields are read again.
Sometimes code says more than a 1000 words, a high-level view:
union key {
/* useful for accessing */
struct {
unsigned int some_attr : 2;
unsigned int another_attr : 3;
/* ... */
} bitrep;
/* useful for sorting */
uint64_t intrep;
};
I would just make sure that the bit-representation was as large as the integer representation (64 bits in this case). My first approach went like this:
union key {
/* useful for accessing */
struct {
/* generic part: 11 bits */
unsigned int layer : 2;
unsigned int viewport : 3;
unsigned int viewportLayer : 3;
unsigned int translucency : 2;
unsigned int type : 1;
/* depends on type-bit: 64 - 11 bits = 53 bits */
union {
struct {
unsigned int sequence : 8;
unsigned int id : 32;
unsigned int padding : 13;
} cmd;
struct {
unsigned int depth : 24;
unsigned int material : 29;
} normal;
};
};
/* useful for sorting */
uint64_t intrep;
};
Note that in this case, there is a decision bitfield called type. Based on that, either the cmd struct or the normal struct gets filled in, just like in the mentioned article. However this failed horribly. With clang 3.3 on OSX 10.9 (x86 macbook pro), the key union is 16 bytes, while it should be 8.
Unable to coerce clang to pack the struct better, I took another approach based on some other stack overflow answers and the preprocessor to avoid me having to repeat myself:
/* 2 + 3 + 3 + 2 + 1 + 5 = 16 bits */
#define GENERIC_FIELDS \
unsigned int layer : 2; \
unsigned int viewport : 3; \
unsigned int viewportLayer : 3; \
unsigned int translucency : 2; \
unsigned int type : 1; \
unsigned int : 5;
/* 8 + 32 + 8 = 48 bits */
#define COMMAND_FIELDS \
unsigned int sequence : 8; \
unsigned int id : 32; \
unsigned int : 8;
/* 24 + 24 = 48 bits */
#define MODEL_FIELDS \
unsigned int depth : 24; \
unsigned int material : 24;
struct generic {
/* 16 bits */
GENERIC_FIELDS
};
struct command {
/* 16 bits */
GENERIC_FIELDS
/* 48 bits */
COMMAND_FIELDS
} __attribute__((packed));
struct model {
/* 16 bits */
GENERIC_FIELDS
/* 48 bits */
MODEL_FIELDS
} __attribute__((packed));
union alkey {
struct generic gen;
struct command cmd;
struct model mod;
uint64_t intrep;
};
Without including the __attribute__((packed)), the command and model structs are 12 bytes. But with the __attribute__((packed)), they are 8 bytes, exactly what I wanted! So it would seems that I have found my solution. However, my small experience with bitfields has taught me to be leery. Which is why I have a few questions:
My questions are:
Can I get this to be cleaner (i.e.: more like my first big union-within-struct-within-union) and still keep it 8 bytes for the key, for fast sorting?
Is there a better way to accomplish this?
Is this safe? Will this fail on x86/ARM? (really exotic architectures are not much of a concern, I'm targeting the 2 most prevalent ones). What about setting a bitfield and then finding out that an adjacent one has already been written to. Will different compilers vary wildly on this?
What issues can I expect from different compilers? Currently I'm just aiming for clang 3.3+ and gcc 4.9+ with -std=c11. However it would be quite nice if I could use MSVC as well in the future.
Related question and webpages I've looked up:
Variable-sized bitfields with aliasing
Unions within unions
for those (like me) scratching their heads about what happens with bitfields that are not byte-aligned and endianness, look no further: http://mjfrazer.org/mjfrazer/bitfields/
Sadly no answer got me the entirety of the way there.
EDIT: While experimenting, setting some values and reading the integer representation. I noticed something that I had forgotten about: endianness. This opens up another can of worms. Is it even possible to do what I want using bitfields or will I have to go for bitshifting operations?

The layout for bitfields is highly implementation (=compiler) dependent. In essence, compilers are free to place consecutive bitfields in the same byte/word if it sees fit, or not. Thus without extensions like the packed attribute that you mention, you can never be sure that your bitfields are squeezed into one word.
Then, if the bitfields are not squeezed into one word, or if you have just some spare bits that you don't use, you may be even more in trouble. These so-called padding bits can have arbitrary values, thus your sorting idea could never work in a portable setting.
For all these reasons, bitfields are relatively rarely used in real code. What you can see more often is the use of macros for the bits of your uint64_t that you need. For each of your bitfields that you have now, you'd need two macros, one to extract the bits and one to set them. Such a code then would be portable on all platforms that have a C99/C11 compiler without problems.
Minor point:
In the declaration of a union it is better to but the basic integer field first. The default initializer for a union uses the first field, so this would then ensure that your union would be initialized to all bits zero by such an initializer. The initializer of the struct would only guarantee that the individual fields are set to 0, the padding bits, if any, would be unspecific.

Related

Was: How does BPF calculate number of CPU for PERCPU_ARRAY?

I have encountered an interesting issue where a PERCPU_ARRAY created on one system with 2 processors creates an array with 2 per-CPU elements and on another system with 2 processors, an array with 128 per-CPU elements. The latter was rather unexpected to me!
The way I discovered this behavior is that a program that allocated an array for the number of CPUs (using get_nprocs_conf(3)) and then read in the PERCPU_ARRAY into it (using bpf_map_lookup_elem()) ended up writing past the end of the array and crashing.
I would like to find out what is the proper way to determine in a program that reads BPF maps the number of elements in a PERCPU_ARRAY used on a system.
Failing that, I think the second best approach is to pick a buffer for reading in that is "large enough." Here, the problem is similar: what is that number and is there way to learn it at runtime?
The question comes from reading the source of bpftool, which figures this out:
unsigned int get_possible_cpus(void)
{
int cpus = libbpf_num_possible_cpus();
if (cpus < 0) {
p_err("Can't get # of possible cpus: %s", strerror(-cpus));
exit(-1);
}
return cpus;
}
int libbpf_num_possible_cpus(void)
{
static const char *fcpu = "/sys/devices/system/cpu/possible";
static int cpus;
int err, n, i, tmp_cpus;
bool *mask;
/* ---8<--- snip */
}
So that's how they do it!

Dynamic memory in a struct

So, my struct is like this:
struct player{
char name[20];
int time;
}s[50];
I don't know how many players i am going to add to the struct, and i also have to use dynamic memory for this. So how can i allocate and reallocate more space when i add a player to my struct?
I am inexperienced programmer, but i have been googling for this for a long time and i also don't perfectly understand structs.
This site doesn't accept my question so let's put some more text to this post
I assume you are programming in C/C++.
Your struct player has static-allocated fields, so, when you use malloc on it, you are asking space for a 20-byte char array a for one integer.
My suggestion is to store in a variable (or #define a symbol) the initial number of structures you can accept. Then, use malloc to allocate a static array that contains these structures.
Also you have to think about a strategy to store new players coming. The simplest one could be have an index variable to store last free position and use it to add over that position.
A short example follows:
#define init_cap 50
struct player {
char name[20];
int time;
};
int main() {
int index;
struct player* players;
players = (struct player*) malloc(init_cap * sizeof(struct player));
for(i = 0; i < init_cap; i++) {
strcpy(players[i].name, "peppe");
players[i].time = i;
}
free(players);
return 0;
}
At this point you should also think about reallocating memory if the number of players you get a runtime exceeds your initial capacity. You can use:
players = (struct player*) realloc(2 * init_cap * sizeof(struct player));
in order to double the initial capacity.
At the end, always remember to free the requested memory.

What is the point of using arrays of one element in ddk structures?

Here is an excerpt from ntdddisk.h
typedef struct _DISK_GEOMETRY_EX {
DISK_GEOMETRY Geometry; // Standard disk geometry: may be faked by driver.
LARGE_INTEGER DiskSize; // Must always be correct
UCHAR Data[1]; // Partition, Detect info
} DISK_GEOMETRY_EX, *PDISK_GEOMETRY_EX;
What is the point of UCHAR Data[1];? Why not just UCHAR Data; ?
And there are a lot of structures in DDK which have arrays of one element in declarations.
Thanks, thats clear now. The one thing is not clear the implementation of offsetof.
It's defined as
#ifdef _WIN64
#define offsetof(s,m) (size_t)( (ptrdiff_t)&(((s *)0)->m) )
#else
#define offsetof(s,m) (size_t)&(((s *)0)->m)
#endif
How this works:
((s *)0)->m ???
This
(size_t)&((DISK_GEOMETRY_EX *)0)->Data
is like
sizeof (DISK_GEOMETRY) + sizeof( LARGE_INTEGER);
But there is two additional questions:
1)
What type is this? And why we should use & for this?
((DISK_GEOMETRY_EX *)0)->Data
2) ((DISK_GEOMETRY_EX *)0)
This gives me 00000000. Is it convering to the address alignment? interpret it like an address?
Very common in the winapi as well, these are variable length structures. The array is always the last element in the structure and it always includes a field that indicates the actual array size. A bitmap for example is declared that way:
typedef struct tagBITMAPINFO {
BITMAPINFOHEADER bmiHeader;
RGBQUAD bmiColors[1];
} BITMAPINFO, FAR *LPBITMAPINFO, *PBITMAPINFO;
The color table has a variable number of entries, 2 for a monochrome bitmap, 16 for a 4bpp and 256 for a 8bpp bitmap. Since the actual length of the structure varies, you cannot declare a variable of that type. The compiler won't reserve enough space for it. So you always need the free store to allocate it using code like this:
#include <stddef.h> // for offsetof() macro
....
size_t len = offsetof(BITMAPINFO, bmiColors) + 256 * sizeof(RGBQUAD);
BITMAPINFO* bmp = (BITMAPINFO*)malloc(len);
bmp->bmiHeader.biClrUsed = 256;
// etc...
//...
free(bmp);

In bitmap.h, why does bitmap_zero need memset?

In include/linux/bitmap.h, in the bitmap_zero(), why use memset?
static inline void bitmap_zero(unsigned long *dst, int nbits)
{
if (small_const_nbits(nbits))
*dst = 0UL;
else {
int len = BITS_TO_LONGS(nbits) * sizeof(unsigned long);
memset(dst, 0, len);
}
}
IS the *det = OUL not enough?
The definition of small_const_nbits is:
#define small_const_nbits(nbits) \
(__builtin_constant_p(nbits) && (nbits) <= BITS_PER_LONG)
BITS_PER_LONG is normally 32 or 64, depending on what machine you're on.
So, if you're trying to clear fewer than that many bits, you can certainly do it in a single operation - that's the first half of the if statement. If it's longer than 32 or 64 bits, you need to set multiple words, and that's done by the memset call.

allocating enough memory using typedef struct object whose size varies in another typedef struct

I have defined two typedef structs, and the second has the first as an object:
typedef struct
{
int numFeatures;
float* levelNums;
} Symbol;
typedef struct
{
int numSymbols;
Symbol* symbols;
} Data_Set;
I then defined numFeatures and numSymbols and allocate memory for both symbols and levelNums, then fill levelNums inside a for loop with value of the inner loop index just to verify it is working as expected.
Data_Set lung_cancer;
lung_cancer.numSymbols = 5;
lung_cancer.symbols = (Symbol*)malloc( lung_cancer.numSymbols * sizeof( Symbol ) );
lung_cancer.symbols->numFeatures = 3;
lung_cancer.symbols->levelNums = (float*)malloc( lung_cancer.symbols->numFeatures * sizeof( float ) );
for(int symbol = 0; symbol < lung_cancer.numSymbols; symbol++ )
for( int feature = 0; feature < lung_cancer.symbols->numFeatures; feature++ )
*(lung_cancer.symbols->levelNums + symbol * lung_cancer.symbols->numFeatures + feature ) = feature;
for(int symbol = 0; symbol < lung_cancer.numSymbols; symbol++ )
for( int feature = 0; feature < lung_cancer.symbols->numFeatures; feature++ )
cout << *(lung_cancer.symbols->levelNums + symbol * lung_cancer.symbols->numFeatures + feature ) << endl;
return 0;
When levelNums are int I get what I expect( i.e. 0,1,2,0,1,2,...) but when they are float, only the first 3 are correct and the remaining are very small or very large values, not 0,1,2 like expected. I then have two questions:
When allocating memory for symbols, how does it know how big a Symbol is since I have not yet defined how large levelNums will be yet.
How do I get float values into levelNums correctly.
The reason I am doing it like this is this is a data structure that will be sent to a GPU for GPGPU programming in CUDA and arrays are not recognized. I can only send in a continuous block of memory explicitly and the typedef structs are only there for conveying/defining the memory struture of the data.
A couple thing jump out at meet. For one thing, you only allocated a buffer for levelNums of the first symbol. Similarly, your inner loops always loop over the numFeatures of the first symbol.
You're doing a whole lot of dereferencing of arrays, which is fine in general, but the assignment in particular (inside the first set of loops) looks very strange. It's entirely possible I just don't understand what you're trying to do there, but I think it'd be a lot less confusing if you used some square bracket array accessors.

Resources