Why is VkShaderStageFlagBits a bitmask?

In Vulkan you pass an array of VkPipelineShaderStageCreateInfo structures to the VkGraphicsPipelineCreateInfo structure, and presumably there is supposed to be one VkPipelineShaderStageCreateInfo for each shader stage (for example, the vertex and fragment shaders).
So why exactly is the stage field of type VkShaderStageFlagBits? Is this just because it follows some kind of Vulkan convention?
My confusion is that I was led to believe the only reason to use a bitmask in this way is if you need to combine bits together (for example, the general flags field in all Vulkan structures). I was trying to find the answer, so I looked at the Vulkan spec, and this confused me even more! It defines two values, VK_SHADER_STAGE_ALL_GRAPHICS and VK_SHADER_STAGE_ALL, as follows:
VK_SHADER_STAGE_ALL_GRAPHICS is a combination of bits used as shorthand to specify all graphics stages defined above (excluding the compute stage).
VK_SHADER_STAGE_ALL is a combination of bits used as shorthand to specify all shader stages supported by the device, including all additional stages which are introduced by extensions.
Well, if they are supposed to be "shorthand" for specifying all bits, does this mean one shader stage is supposed to be able to represent all the stages at once?
Thanks in advance!

Exactly, this is mostly to keep the API consistent. VkShaderStageFlagBits is used in several spots where a bitmask makes more sense than at pipeline creation time.
One example where it makes sense is descriptor set layout bindings, where you use the flag mask to specify which stages can access your descriptors (samplers, uniform buffer objects, etc.).
So if you want one UBO to be accessible from the vertex and fragment stages and another one from the geometry and tessellation stages, you'd use different stage flag bit combinations when setting up the VkDescriptorSetLayoutBinding. Pipeline stage combinations are pretty common here.
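A minimal sketch of how such stage masks combine and are tested. The enum below is a stand-in that mirrors the bit values in the Vulkan headers so the block compiles without the Vulkan SDK, and `binding_visible_to` is a hypothetical helper, not a Vulkan function.

```cpp
#include <cassert>
#include <cstdint>

// Stand-ins mirroring the Vulkan header values (assumption: real code would
// include <vulkan/vulkan.h> and use the VK_SHADER_STAGE_* names instead).
enum ShaderStageBits : std::uint32_t {
    STAGE_VERTEX_BIT                  = 0x00000001,
    STAGE_TESSELLATION_CONTROL_BIT    = 0x00000002,
    STAGE_TESSELLATION_EVALUATION_BIT = 0x00000004,
    STAGE_GEOMETRY_BIT                = 0x00000008,
    STAGE_FRAGMENT_BIT                = 0x00000010,
    STAGE_COMPUTE_BIT                 = 0x00000020,
    STAGE_ALL_GRAPHICS                = 0x0000001F, // OR of the five graphics bits
};
using ShaderStageFlags = std::uint32_t; // a combination of zero or more bits

// Hypothetical helper: does the mask include the given single stage?
inline bool binding_visible_to(ShaderStageFlags mask, ShaderStageBits stage) {
    const std::uint32_t bit = static_cast<std::uint32_t>(stage);
    assert(bit != 0 && (bit & (bit - 1)) == 0); // caller must pass one bit
    return (mask & bit) != 0;
}

// One UBO for vertex + fragment, another for geometry + tessellation.
const ShaderStageFlags ubo_a_stages = STAGE_VERTEX_BIT | STAGE_FRAGMENT_BIT;
const ShaderStageFlags ubo_b_stages = STAGE_GEOMETRY_BIT |
                                      STAGE_TESSELLATION_CONTROL_BIT |
                                      STAGE_TESSELLATION_EVALUATION_BIT;
```

Seen this way, the "all graphics" shorthand is just a mask with every graphics bit set, which only makes sense in fields that accept a combination.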

Vulkan uses fields of type Vk*FlagBits (e.g. VkShaderStageFlagBits) when exactly one of the defined values is expected, and uses the corresponding Vk*Flags type when a combination of zero, one, or more of the defined values is expected. The Vk*Flags type is always a typedef for VkFlags, which is itself a typedef for uint32_t (e.g. typedef VkFlags VkShaderStageFlags).
There are two reasons for this:
1. It gives a signal (albeit subtle) about whether exactly one value is expected/allowed or some combination of values is expected.
2. Many compilers will give warnings when assigning a combination of bit values to a field of enum type, which in practice helps enforce (1). This is because to do bitwise operations on enum values, they're first promoted to an integer type, and the result is an integer type, and typical settings for most compilers yield a warning (often promoted to error) when doing an implicit conversion from integer to enum type, since the integer may not be one of the enumerated values.
So VkPipelineShaderStageCreateInfo::stage is VkShaderStageFlagBits because exactly one shader stage is valid there, and you'll probably get a warning if you try to set it to something silly like VK_SHADER_STAGE_VERTEX_BIT | VK_SHADER_STAGE_FRAGMENT_BIT.
But VkDescriptorSetLayoutBinding::stageFlags is VkShaderStageFlags because it's common and expected to include multiple stages there, and you won't get a compiler warning if you set it to VK_SHADER_STAGE_VERTEX_BIT | VK_SHADER_STAGE_FRAGMENT_BIT.
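The distinction can be sketched with stand-in types that mimic the two Vulkan field types. None of the names below are the real Vulkan declarations; they only reproduce the enum-versus-integer typing. In C++ the implicit integer-to-enum conversion is rejected outright rather than merely warned about, so the offending call is left commented out.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for VkShaderStageFlagBits: an enum, so a field of this type is
// expected to hold exactly one enumerator.
enum ShaderStageFlagBits { VERTEX_BIT = 0x01, FRAGMENT_BIT = 0x10 };

// Stand-in for VkShaderStageFlags: a plain integer, so combinations are fine.
using ShaderStageFlags = std::uint32_t;

struct PipelineStageInfo { ShaderStageFlagBits stage; };      // one stage only
struct LayoutBinding     { ShaderStageFlags    stageFlags; }; // any combination

inline PipelineStageInfo make_stage(ShaderStageFlagBits stage) {
    const unsigned bits = static_cast<unsigned>(stage);
    assert(bits != 0 && (bits & (bits - 1)) == 0); // exactly one bit set
    return PipelineStageInfo{stage};
}

inline LayoutBinding make_binding(ShaderStageFlags flags) {
    return LayoutBinding{flags};
}

// make_stage(VERTEX_BIT | FRAGMENT_BIT);
//   ^ does not compile in C++: the OR yields an int, and int does not
//     implicitly convert to the enum type. A C compiler may accept it,
//     possibly with a warning.
```

Usage: `make_stage(VERTEX_BIT)` and `make_binding(VERTEX_BIT | FRAGMENT_BIT)` both compile, while the commented-out call is stopped by the type system.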

How do you approach creating a complete new datatype on the "bit-level"?

I would like to create a new data type in Rust on the "bit-level".
For example, a quadruple-precision float. I could create a structure that has two double-precision floats and arbitrarily increase the precision by splitting the quad into two doubles, but I don't want to do that (that's what I mean by on the "bit-level").
I thought about using a u8-array or a bool-array but in both cases, I waste 7 bits of memory (because also bool is a byte large). I know there are several crates that implement something like bit-arrays or bit-vectors, but looking through their source code didn't help me to understand their implementation.
How would I create such a bit-array without wasting memory, and is this the way I would want to choose when implementing something like a quad-precision type?
I don't know how to implement new data types that don't use the basic types or are structures that combine the basic types, and I haven't been able to find a solution on the internet yet; maybe I'm not searching with the right keywords.
The question you are asking has no direct answer: Just like any other programming language, Rust has a basic set of rules for type layouts. This is due to the fact that (most) real-world CPUs can't address individual bits, need certain alignments when referencing memory, have rules regarding how pointer arithmetic works etc. etc.
For instance, if you create a type of just two bits, you'll still need an 8-bit byte to represent that type, because there is simply no way to address two individual bits on most CPU's opcodes; there is also no way to take the address of such a type because addressing works at least on the byte-level. More useful information regarding this can be found here, section 2, The Anatomy of a Type. Be aware that the non-wasting bit-level type you are thinking about needs to fulfill all the rules mentioned there.
It's a perfectly reasonable approach to represent what you want as a single wrapped u128 and implement all arithmetic on top of that type. Another, more generic, approach would be to use a Vec<u8>. Either way, you'll always do a relatively large amount of bit-masking, indirection and such.
Having a look at rust_decimal or similar crates might also be a good idea.
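The bit-masking involved in packing eight flags into each byte is the same in any language. Below is a minimal sketch in C++ of a byte-backed bit array; `PackedBits` is a hypothetical name, and the same shifts and masks would apply to a Rust Vec<u8>.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical byte-backed bit array: bit i lives in byte i / 8, at
// position i % 8, so only the final byte can contain unused bits.
class PackedBits {
public:
    explicit PackedBits(std::size_t bit_count)
        : bits_(bit_count), bytes_((bit_count + 7) / 8, 0) {}

    bool get(std::size_t i) const {
        assert(i < bits_);
        return ((bytes_[i / 8] >> (i % 8)) & 1u) != 0;
    }

    void set(std::size_t i, bool value) {
        assert(i < bits_);
        const std::uint8_t mask = static_cast<std::uint8_t>(1u << (i % 8));
        if (value) {
            bytes_[i / 8] |= mask;                             // turn the bit on
        } else {
            bytes_[i / 8] &= static_cast<std::uint8_t>(~mask); // turn it off
        }
    }

    std::size_t byte_size() const { return bytes_.size(); }

private:
    std::size_t bits_;
    std::vector<std::uint8_t> bytes_;
};
```

A 128-bit value then occupies 16 bytes, but every single-bit access still goes through a whole-byte load and a mask, which is the addressing constraint the answer describes.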

Vulkan Shader & Resources: Why Uniform and not Const Resources

We usually use const in C++ to imply that a value does not change (read-only). Why did GLSL/Vulkan choose the word uniform for shader or resource definitions? Wouldn't it be more consistent to use the keyword borrowed from C/C++?
Besides that, the uniform keyword in shader definitions probably gives the compiler a hint to place those resources as close to the hardware as possible, perhaps in shared memory or registers? I'm not sure about that.
That is also probably why the Vulkan spec mentions that we should use small amounts of data for those types of resources, for example values of cosmological constants.
Is there anything that I'm missing, or some bit of history that has been forgotten?
Uniforms in GPU programming and const in C++ are focused on different things.
C++ const documents that a variable is not intended to be changed, with some compiler enforcement. As such it's more about using the type system to improve clarity and enforce intended usage -- important for large-project software engineering. You can still get around it with const_cast or other tricks, and the compiler can't assume you didn't, so it's not strictly enforced.
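A small sketch of why the compiler cannot treat a const reference as a guarantee. The function names are hypothetical; the cast is well-defined here only because the underlying object was not itself declared const.

```cpp
#include <cassert>

// Takes a read-only view but modifies the object anyway. This is legal only
// when the referenced object was not declared const; otherwise the write
// would be undefined behaviour.
inline void bump(const int& value) {
    const_cast<int&>(value) += 1;
}

inline int demo_const_cast() {
    int counter = 41;           // the object itself is not const
    const int& view = counter;  // a const view of the same object
    bump(view);
    assert(view == 42);         // the "read-only" view observes the change
    return counter;
}
```

Because code like `bump` is allowed, a value read through `view` before the call cannot simply be assumed unchanged after it.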
The important thing about uniforms is that they are, well, uniform: they have the same value whenever they are read within a draw call. Since there might be hundreds to millions of reads of that value in a single draw call, this allows several optimizations: a single copy of the value can be cached, it can be preloaded into registers (or cache) before shaders run, it can be kept in a non-coherent cache, a single read result can be broadcast across all SIMD lanes in a core, and so on. For this to work, the fact that the contents can't change must be strictly enforced (with memory aliasing you can now get around even this, but results are very much undefined if you do). So uniform really isn't about declaring intent to other programmers for software engineering benefits like const is; it's about declaring intent to the compiler and driver so they can optimize based on it.
D3D uses "const" and "constant buffer" rather than uniform, so clearly there is some overlap. Though that does lead to saying things like "how many times do you update constants per frame?" which when you think about it is kind of a weird thing to say :). The values are constant within shader code, but very much aren't constant at the API level.
The etymology of the word is important here. The term "uniform" is derived from GLSL, which was inspired by the Renderman standard's shader terminology. In Renderman, "uniform" was used for values "whose values are constant over whatever portion of the surface being shaded". This was an alternative to "varying", which represented values interpolated across the surface.
"Constant" would imply that the value never changes. Uniform values do change; they simply don't change at the same frequency as other values. Input values change per-invocation, uniform values change per-draw call, and constant values don't change. Note that in GLSL, const usually means "compile-time constant": a value that is set at compile time and is never changed.
A uniform variable in Vulkan ultimately comes from a resource that exists outside of the shader. Blocks of uniform variables fed by buffers and uniforms in push constants fed by push constant state are both external resources, set by the user. That's a fundamentally different concept from having a compile-time constant struct.
Since it's different from a constant struct, it needs a different term to request it.

How to use ApplicationDataTypes in C code

As I understand it, the ApplicationDataType was introduced in AUTOSAR version 4 to design software components that are independent of the underlying platform and are therefore reusable in different projects and applications.
But how about the implementation behind such a SW-C to be platform independent?
Use-case example: You want to design and implement a SW-C that works as a FIFO. You have one port for input data, an internal buffer and one port for output data. You could implement this without knowing the data type of the data by using the "abstract" ApplicationDataType.
By using an ApplicationDataType for a variable as part of a PortInterface sooner or later you have to map this ApplicationDataType to an ImplementationDataType for the RTE-Generator.
Finally, the code created by the RTE-Generator only uses the ImplementationDataType. The ApplicationDataType is nowhere to be found in the generated code.
Is this intended behavior or a bug of the RTE-Generator?
(Or maybe I'm missing something?)
It is intended that ApplicationDataTypes do not directly appear in code, they are represented by their ImplementationDataType counterparts.
The motivation for the definition of data types on different levels of abstraction is explained in the AUTOSAR specifications, namely the TPS Software Component Template.
You will never find an ApplicationDataType in the C code, because it's defined on a physical level with a physical unit and might have a (completely) different representation on the implementation level in C.
Imagine a battery control sensor that measures the voltage. The value can range from 0.0 V to 14.0 V with one digit after the decimal point (physical). You could map it to a float in C, but floating point operations are expensive. Instead, you use fixed-point arithmetic where you map the physical value 0.0 to 0, 0.1 to 1, 0.2 to 2 and so on. This mapping is described by a so-called compuMethod.
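The mapping in this example can be sketched as a linear scaling with a resolution of 0.1 V per count. The function names and the choice of an 8-bit internal type are assumptions for illustration; in an AUTOSAR project the scaling comes from the compuMethod and the tooling, not from hand-written helpers like these, and the floating-point arithmetic below only illustrates the mapping itself.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Assumed scaling from the example: physical = raw * 0.1 V.
constexpr double kResolutionVolts = 0.1;

// Physical value (volts) -> internal fixed-point representation.
inline std::uint8_t volts_to_raw(double volts) {
    assert(volts >= 0.0 && volts <= 14.0); // range given in the example
    // Round to the nearest count so 10.6 V becomes exactly 106.
    return static_cast<std::uint8_t>(std::lround(volts / kResolutionVolts));
}

// Internal representation -> physical value, e.g. for a measurement tool.
inline double raw_to_volts(std::uint8_t raw) {
    return raw * kResolutionVolts;
}
```

The component itself would only ever handle the integer counts (0 to 140), while tools that know the scaling can display volts.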
The software component will always use the internal representation. So, why do you need the ApplicationDataType then? There are many reasons to use them, some of them are:
Methodology: The software component designer doesn't need to worry about the implementation in C. Somebody else can define that in a later stage.
Measurement: If you measure the value, you have a well-defined compuMethod and know the physical interpretation of the value in C.
Data conversion: If you connect software component with different units e.g. km/h vs mph, the Rte could automatically convert the internal representation between them.
Constant conversion: You can specify an initial value on the physical value (e.g. 10.6V) and the Rte will convert it to the internal representation.
Variable Size Arrays: Without dynamic memory allocation, you cannot have a variable size array in C. But you could reserve some (max) memory in an array and store the actual length in a separate field. On the implementation level you then have a struct with two members (value, length). But on the application level you just have an array.
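The last item, a variable size array on the implementation level, might look roughly like the sketch below. The capacity, element type and helper names are hypothetical; only the two-member shape (value, length) comes from the description above.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kMaxSamples = 8; // hypothetical reserved maximum

// Implementation-level view: fixed storage plus the number of valid entries.
struct SampleArray {
    std::array<std::uint16_t, kMaxSamples> value{};
    std::size_t length = 0;
};

// Appends one element; returns false when the reserved storage is full.
inline bool push_sample(SampleArray& a, std::uint16_t sample) {
    if (a.length >= kMaxSamples) {
        return false;
    }
    a.value[a.length++] = sample;
    return true;
}

// Reads an element, treating only the first `length` entries as valid.
inline std::uint16_t sample_at(const SampleArray& a, std::size_t index) {
    assert(index < a.length);
    return a.value[index];
}
```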
From AUTOSAR_TPS_SoftwareComponentTemplate.pdf:
ApplicationDataType defines a data type from the application point of
view. Especially it should be used whenever something "physical" is at
stake.
An ApplicationDataType represents a set of values as seen in the
application model, such as measurement units. It does not consider
implementation details such as bit-size, endianess, etc.
It should be possible to model the application level aspects of a VFB
system by using ApplicationDataTypes only.

What does Int use three bits for?

Why is GHC's Int type not guaranteed to use exactly 32 bits of precision? This document claims it has at least 30-bit signed precision. Is it somehow related to fitting Maybe Int or similar into 32 bits?
It is to allow implementations of Haskell that use tagging. When using tagging you need a few bits as tags (at least one, two is better). I'm not sure there currently are any such implementations, but I seem to remember Yale Haskell used it.
Tagging can somewhat avoid the disadvantages of boxing, since you no longer have to box everything; instead the tag bit will tell you if it's evaluated etc.
The Haskell language definition states that the type Int covers at least the range [−2^29, 2^29−1].
There are other compilers/interpreters that use this property to boost the execution time of the resulting program.
All internal references to (aligned) Haskell data point to memory addresses that are multiples of 4 (8) on 32-bit (64-bit) systems. So references need only 30 bits (61 bits), which leaves 2 (3) bits for "pointer tagging".
In case of data, the GHC uses those tags to store information about that referenced data, i.e. whether that value is already evaluated and if so which constructor it has.
In case of 30-bit Ints (so, not GHC), you could use one bit to decide if it is either a pointer to an unevaluated Int or that Int itself.
Pointer tagging could be used for one-bit reference counting, which can speed up the garbage collection process. That can be useful in cases where a direct one-to-one producer-consumer relationship was created at runtime: It would result directly in memory reuse instead of a garbage collector feeding.
So, using 2 bits for pointer tagging, there could be some wild combination of intense optimisation...
In case of Ints I could imagine these 4 tags:
a singular reference to an unevaluated Int
one of many references to the same possibly still unevaluated Int
30 bits of that Int itself
a reference (of possibly many references) to an evaluated 32-bit Int.
I think this is because of early ways to implement GC and all that stuff. If you have 32 bits available and you only need 30, you could use those two spare bits to implement interesting things, for instance using a zero in the least significant bit to denote a value and a one for a pointer.
Today the implementations don't use those bits so an Int has at least 32 bits on GHC. (That's not entirely true. IIRC one can set some flags to have 30 or 31 bit Ints)
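The one-bit scheme mentioned above (a zero in the least significant bit for an immediate value, a one for a pointer) can be sketched as follows. This is an illustration of the general tagging idea with hypothetical names, not GHC's actual representation, and it assumes pointers to the boxed type are at least 2-byte aligned.

```cpp
#include <cassert>
#include <cstdint>

using Word = std::uintptr_t; // one machine word holding either kind of value

// Immediate integer: shift left by one, leaving the low bit 0.
// The value loses one bit of range in exchange for the tag.
inline Word tag_int(std::intptr_t v) {
    return static_cast<Word>(v) << 1;
}

// Pointer to a boxed integer: alignment guarantees the low bit is free.
inline Word tag_ptr(const std::int64_t* p) {
    const Word raw = reinterpret_cast<Word>(p);
    assert((raw & 1u) == 0);
    return raw | 1u;
}

inline bool is_ptr(Word w) { return (w & 1u) != 0; }

inline std::intptr_t untag_int(Word w) {
    assert(!is_ptr(w));
    // The word is even, so dividing by two undoes the shift exactly,
    // including for negative values on two's complement machines.
    return static_cast<std::intptr_t>(w) / 2;
}

inline const std::int64_t* untag_ptr(Word w) {
    assert(is_ptr(w));
    return reinterpret_cast<const std::int64_t*>(w & ~static_cast<Word>(1));
}
```

With two tag bits instead of one, the usable integer range shrinks to 30 bits on a 32-bit machine, which matches the minimum range the language definition guarantees.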

What's going on in the 'offsetof' macro?

The Visual C++ 2008 C runtime offers an operator 'offsetof', which is actually a macro defined like this:
#define offsetof(s,m) (size_t)&reinterpret_cast<const volatile char&>((((s *)0)->m))
This allows you to calculate the offset of the member variable m within the class s.
What I don't understand in this declaration is:
Why are we casting m to anything at all and then dereferencing it? Wouldn't this have worked just as well:
&(((s*)0)->m)
?
What's the reason for choosing char reference (char&) as the cast target?
Why use volatile? Is there a danger of the compiler optimizing the loading of m? If so, in what exact way could that happen?
An offset is in bytes. So to get a number expressed in bytes, you have to cast the addresses to char, because that is the same size as a byte (on this platform).
The use of volatile is perhaps a cautious step to ensure that no compiler optimisations (either that exist now or may be added in the future) will change the precise meaning of the cast.
Update:
If we look at the macro definition:
(size_t)&reinterpret_cast<const volatile char&>((((s *)0)->m))
With the cast-to-char removed it would be:
(size_t)&((((s *)0)->m))
In other words, get the address of member m in an object at address zero, which does look okay at first glance. So there must be some way that this would potentially cause a problem.
One thing that springs to mind is that the operator & may be overloaded on whatever type m happens to be. If so, this macro would be executing arbitrary code on an "artificial" object that is somewhere quite close to address zero. This would probably cause an access violation.
This kind of abuse may be outside the applicability of offsetof, which is supposed to only be used with POD types. Perhaps the idea is that it is better to return a junk value instead of crashing.
(Update 2: As Steve pointed out in the comments, there would be no similar problem with operator ->)
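The operator& concern can be demonstrated on a real object rather than a null pointer. `Sneaky` and `address_via_char` are hypothetical names for this sketch; std::addressof is the standard facility that exists for the same purpose.

```cpp
#include <cassert>
#include <memory>

// A type whose overloaded operator& deliberately hides the real address.
struct Sneaky {
    int payload = 0;
    Sneaky* operator&() { return nullptr; }
    const Sneaky* operator&() const { return nullptr; }
};

// Same trick as the macro: view the object as a char reference first, so
// the & that follows is the built-in one (char cannot overload operators).
// Casting to const volatile also accepts any cv-qualified argument.
template <typename T>
const volatile char* address_via_char(const volatile T& obj) {
    return &reinterpret_cast<const volatile char&>(obj);
}

// Compares the result against std::addressof, which also bypasses overloads.
inline bool finds_real_address(const Sneaky& s) {
    const Sneaky* via_overload = &s; // calls the overload, yields nullptr
    assert(via_overload == nullptr);
    return address_via_char(s) ==
           reinterpret_cast<const volatile char*>(std::addressof(s));
}
```

With the plain `&(((s*)0)->m)` form, a member of a type like `Sneaky` would run the overloaded operator on an object that does not exist.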
offsetof is something to be very careful with in C++. It's a relic from C. These days we are supposed to use member pointers. That said, I believe that member pointers to data members are overdesigned and broken - I actually prefer offsetof.
Even so, offsetof is full of nasty surprises.
First, for your specific questions, I suspect the real issue is that they've adapted the traditional C macro (which I thought was mandated in the C++ standard). They probably use reinterpret_cast for "it's C++!" reasons (so why the (size_t) cast?), and a char& rather than a char* to try to simplify the expression a little.
Casting to char looks redundant in this form, but probably isn't. (size_t) is not equivalent to reinterpret_cast, and if you try to cast pointers to other types into integers, you run into problems. I don't think the compiler even allows it, but to be honest, I'm suffering memory failure ATM.
The fact that char is a single byte type has some relevance in the traditional form, but that may only be why the cast is correct again. To be honest, I seem to remember casting to void*, then char*.
Incidentally, having gone to the trouble of using C++-specific stuff, they really should be using std::ptrdiff_t for the final cast.
Anyway, coming back to the nasty surprises...
VC++ and GCC probably won't use that macro. IIRC, they have a compiler intrinsic, depending on options.
The reason is to do what offsetof is intended to do, rather than what the macro does, which is reliable in C but not in C++. To understand this, consider what would happen if your struct uses multiple or virtual inheritance. In the macro, when you dereference a null pointer, you end up trying to access a virtual table pointer that isn't there at address zero, meaning that your app probably crashes.
For this reason, some compilers have an intrinsic that just uses the specified struct's layout instead of trying to deduce a run-time type. But the C++ standard doesn't mandate or even suggest this - it's only there for C compatibility reasons. And you still have to be careful if you're working with class hierarchies, because as soon as you use multiple or virtual inheritance, you cannot assume that the layout of the derived class matches the layout of the base class - you have to ensure that the offset is valid for the exact run-time type, not just a particular base.
If you're working on a data structure library, maybe using single inheritance for nodes, but apps cannot see or use your nodes directly, offsetof works well. But strictly speaking, even then, there's a gotcha. If your data structure is in a template, the nodes may have fields with types from template parameters (the contained data type). If that isn't POD, technically your structs aren't POD either. And all the standard demands for offsetof is that it works for POD. In practice, it will work - your type hasn't gained a virtual table or anything just because it has a non-POD member - but you have no guarantees.
If you know the exact run-time type when you dereference using a field offset, you should be OK even with multiple and virtual inheritance, but ONLY if the compiler provides an intrinsic implementation of offsetof to derive that offset in the first place. My advice - don't do it.
Why use inheritance in a data structure library? Well, how about...
class node_base { ... };
class leaf_node : public node_base { ... };
class branch_node : public node_base { ... };
The fields in the node_base are automatically shared (with identical layout) in both the leaf and branch, avoiding a common error in C with accidentally different node layouts.
BTW - offsetof is avoidable with this kind of stuff. Even if you are using offsetof for some jobs, node_base can still have virtual methods and therefore a virtual table, so long as it isn't needed to dereference member variables. Therefore, node_base can have pure virtual getters, setters and other methods. Normally, that's exactly what you should do. Using offsetof (or member pointers) is a complication, and should only be used as an optimisation if you know you need it. If your data structure is in a disk file, for instance, you definitely don't need it - a few virtual call overheads will be insignificant compared with the disk access overheads, so any optimisation efforts should go into minimising disk accesses.
Hmmm - went off on a bit of a tangent there. Whoops.
char is guaranteed to be the smallest number of bits the architecture can "bite" (aka byte).
All pointers are actually numbers, so cast address 0 to that type because it's the beginning.
Take the address of member starting from 0 (resulting into 0 + location_of_m).
Cast that back to size_t.
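The same arithmetic can be checked against the standard offsetof from <cstddef> without dereferencing a null pointer, by measuring the byte distance inside a real object. `Packet` and the helper function are hypothetical names for this sketch, which assumes a standard-layout struct.

```cpp
#include <cassert>
#include <cstddef>

// A plain standard-layout struct; padding after `tag` is up to the compiler.
struct Packet {
    char   tag;
    int    length;
    double value;
};

// Byte distance from the start of a real object to its `length` member.
inline std::size_t manual_offset_of_length() {
    Packet p{};
    const char* base   = reinterpret_cast<const char*>(&p);
    const char* member = reinterpret_cast<const char*>(&p.length);
    assert(member >= base);
    return static_cast<std::size_t>(member - base);
}
```

Casting both addresses to `const char*` is what makes the subtraction count bytes rather than elements of some larger type.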
1) I also do not know why it is done in this way.
2) The char type is special in two ways.
No other type has weaker alignment restrictions than the char type. This is important for reinterpret cast between pointers and between expression and reference.
It is also the only type (together with its unsigned variant) for which the specification defines behavior in case the char is used to access stored value of variables of different type. I do not know if this applies to this specific situation.
3) I think that the volatile modifier is used to ensure that no compiler optimization will result in attempt to read the memory.
2. What's the reason for choosing char reference (char&) as the cast target?
If the type of m overloads operator&, applying & to the member would call that overload instead of yielding its address. So the member is first reinterpret_cast to a reference to the primitive type char, which cannot have an overloaded operator&, and the address is taken from that. In C, which has no operator overloading, the cast is not required.
3. Why use volatile? Is there a danger of the compiler optimizing the loading of m? If so, in what exact way could that happen?
Here volatile has nothing to do with compiler optimization. If the member (or the class type s itself) is const-qualified, volatile-qualified, or both, a reinterpret_cast to plain char& fails, because reinterpret_cast cannot remove cv-qualifiers. Casting to const volatile char& works for any combination of qualifiers.
