An input attachment can be accessed by the subpassLoad GLSL function which samples the input attachment at the current fragment position, i.e. the interface doesn't provide random access. The consequence of this that input attachments cannot be accessed at arbitrary fragment locations.
This practically means [1]:
If a rendering technique requires reading values outside the current fragment area (which on a tiler would mean accessing rendered data outside the currently-rendering tile), separate render passes must be used.
Then, about VK_DEPENDENCY_BY_REGION_BIT the specification says [2]:
If a synchronization command includes a dependencyFlags parameter, and specifies the VK_DEPENDENCY_BY_REGION_BIT flag, then it defines framebuffer-local dependencies for the framebuffer-space pipeline stages in that synchronization command, for all framebuffer regions. If no dependencyFlags parameter is included, or the VK_DEPENDENCY_BY_REGION_BIT flag is not specified, then a framebuffer-global dependency is specified for those stages.
Hans-Kristian Arntzen from ARM [3] suggests that on tiled architectures multi-subpass renderpasses should be used only in conjuction with VK_DEPENDENCY_BY_REGION_BIT:
Next, we try to merge adjacent render passes together. This is particularly important on tile-based renderers. We try to merge passes together if:
They are both graphics passes
They share some color/depth/input attachments
Not more than one unique depth/stencil attachment exists
Their dependencies can be implemented with BY_REGION_BIT, i.e. no “texture” dependency, which allows sampling for arbitrary locations.
Now the questions are:
If you cannot access fragments outside of the current fragment location anyway, what is the point of VK_DEPENDENCY_BY_REGION_BIT?
On tiled architectures does a multi-subpass render pass where subpass dependencies cannot be declared with VK_DEPENDENCY_BY_REGION_BIT provide any performance advantage over functionally equivalent properly-synchronized series of separate single-subpass render passes?
Well, the specification gives one example. If you want to access a sample of the input attachment that is not covered by the fragment, then you have to use framebuffer-global dependency (i.e. dependencyFlags = 0, or one of the vendor extension fixes that).
Though the most obvious example are non-attachment resources, which are naturally random access (where you can access any pixel). With VK_DEPENDENCY_BY_REGION_BIT only the part that was written for the same fragment can ever be certain to be visible. While with framebuffer-global dependency (dependencyFlags=0), you could access a location in a storage buffer written by any fragment shader invocation of the previous subpass.
dependencyFlags=0 is sort of a soft-restart of the Render Pass. So everything being the same I would grade the performance this way:
single Subpass ≥ multiple Subpasses with VK_DEPENDENCY_BY_REGION_BIT ≥ multiple subpasses without VK_DEPENDENCY_BY_REGION_BIT ≥ multiple render passes.
Whether framebuffer-global subpasses actually provide any performance advantage I cannot say without measurement of a particular implementation (and that would potentially be a perishable information, changing with new GPUs, or perhaps even driver versions). Though the case should not be worse than a separate render pass, which would likely be the worst demotion the driver itself would do if it cannot do anything with those special subpasses.
Related
In Vulkan you specify the VkPipelineShaderStageCreateInfo's to the VkGraphicsPipelineCreateInfo structure, and presumably there is supposed to be one VkPipelineShaderStageCreateInfo for each shader stage (for example the vertex, and fragment shaders).
So why exactly is the field stage field of type vkShaderStageFlagBits is this just because it sits closer to some kind of Vulkan convention?
My confusion is I am led to believe that the only reason you would use a Bitmask in this way, is if you need to combine bits together. (For example for the general flags field in all Vulkan structures). I was trying to find the answer for this, so I looked at the Vulkan Spec, and this confused me even more! This is because they have two bits VK_SHADER_STAGE_ALL_GRAPHICS and VK_SHADER_STAGE_ALL these are defined as:
VK_SHADER_STAGE_ALL_GRAPHICS is a combination of bits used as shorthand to specify all graphics stages defined above (excluding the compute stage).
VK_SHADER_STAGE_ALL is a combination of bits used as shorthand to specify all shader stages supported by the device, including all additional stages which are introduced by extensions.
Well if they are supposed to be "shorthand" for specifying all bits, does this mean one shader stage, is supposed to be able to represent a version of all the stages?
Thanks in advance!
Exactly, this is mostly to keep the api consistent. VkShaderStageFlagBits is used in several spots where a bit mask makes more sense than at pipeline creation time.
An example where it makes sense are descriptor set layout bindings where you use the flag mask to specify what stages can access your descriptors (samplers, uniform buffer object, etc.).
So if you want one UBO to be accessible from the vertex and fragment stage and another one from the geometry and tessellation stage you'd use different stage flag bit combinations when setting up the VkDescriptorSetLayoutBinding. Pipeline state combinations are pretty common here.
Vulkan uses fields of type Vk*FlagBits (e.g. VkShaderStageFlagBits) when exactly one of the defined values is expected, and uses the corresponding Vk*Flags type (always a typedef for VkFlags which is just a typedef for uint32_t (e.g. typedef VkFlags VkShaderStageFlags) when a combination zero, one, or more of the defined values is expected.
There are two reasons for this:
It gives a signal (albeit subtle) about whether exactly one value is expected/allowed or some combination of values is expected.
Many compilers will give warnings when assigning a combination of bit values to a field of enum type, which in practice helps enforce (1). This is because to do bitwise operations on enum values, they're first promoted to an integer type, and the result is an integer type, and typical settings for most compilers yield a warning (often promoted to error) when doing an implicit conversion from integer to enum type, since the integer may not be one of the enumerated values.
So VkPipelineShaderStageCreateInfo::stage is VkShaderStageFlagBits because exactly one shader stage is valid there, and you'll probably get a warning if you try to set it to something silly like VK_SHADER_STAGE_VERTEX_BIT | VK_SHADER_STAGE_FRAGMENT_BIT.
But VkDescriptorSetLayoutBinding::stageFlags is VkShaderStageFlags because it's common and expected to include multiple stages there, and you won't get a compiler warning if you set it to VK_SHADER_STAGE_VERTEX_BIT | VK_SHADER_STAGE_FRAGMENT_BIT.
We usually use const in c++ to imply that the value does not change (read only), why in GLSL/VK in the shader or resource definition they choose the word uniform ? Wodn`t be more consistent and use the keyword borrowed from c/c++
Beside that probably the uniform keyword in shader definitions give clues to the compiler to attach those resources as close to the hardware as possible, probably shared memory or registers ? Not sure on that.
That also probably why they mention in the VkSpec. that we need small ammounts of data for those type of resources. Like for eg: values of cosmological constants..etc
Is anything that I`m missing, or some bit of history that passed away ?
Uniforms in GPU programming and const in C++ are focused on different things.
C++ const documents that a variable is not intended to be changed, with some compiler enforcement. As such it's more about using the type system to improve clarity and enforce intended usage -- important for large-project software engineering. You can still get around it with const_cast or other tricks, and the compiler can't assume you didn't, so it's not strictly enforced.
The important thing about uniforms is that they're, well, uniform. Meaning they have the same value whenever they are read within a draw call. Since there might be hundreds to millions of reads of that value in a single draw call, this allows it to be cached, and just one copy of it to be cached, or that it can be preloaded into registers (or cache) before shaders run, that it can be cached in a non-coherent cache, that a single read result can be broadcast across all SIMD lanes in a core, etc. For this to work, the fact that the contents can't change must be strictly enforced (with memory aliasing you can get around even this, now, but results are very much undefined if you do). So uniform really isn't about declaring intent to other programmers for software engineering benefits like const is, it's about declaring intent to the compiler and driver so they can optimize based on it.
D3D uses "const" and "constant buffer" rather than uniform, so clearly there is some overlap. Though that does lead to saying things like "how many times do you update constants per frame?" which when you think about it is kind of a weird thing to say :). The values are constant within shader code, but very much aren't constant at the API level.
The etymology of the word is important here. The term "uniform" is derived from GLSL, which was inspired by the Renderman standard's shader terminology. In Renderman, "uniform" was used for values "whose values are constant over whatever portion of the surface begin shaded". This was an alternative to "varying" which represented values interpolated across the surface.
"Constant" would imply that the value never changes. Uniform values do change; they simply don't change at the same frequency as other values. Input values change per-invocation, uniform values change per-draw call, and constant values don't change. Note that in GLSL, const usually means "compile-time constant": a value that is set at compile time and is never changed.
A uniform variable in Vulkan ultimately comes from a resource that exists outside of the shader. Blocks of uniform variables fed by buffers, uniforms in push constants fed by push constant state are both external resources, set by the user. That's a fundamentally different concept from having a compile-time constant struct.
Since it's different from a constant struct, it needs a different term to request it.
For my understanding, the ApplicationDataType was introduced to AUTOSAR Version 4 to design Software-Components that are independent of the underlying platform and are therefore re-usable in different projects and applications.
But how about the implementation behind such a SW-C to be platform independent?
Use-case example: You want to design and implement a SW-C that works as a FiFo. You have one Port for Input-Data, an internal buffer and one Port for Output-Data. You could implement this without knowing about the data type of the data by using the “abstract” ApplicationDataType.
By using an ApplicationDataType for a variable as part of a PortInterface sooner or later you have to map this ApplicationDataType to an ImplementationDataType for the RTE-Generator.
Finally, the code created by the RTE-Generator only uses the ImplementationDataType. The ApplicationDataType is nowhere to be found in the generated code.
Is this intended behavior or a bug of the RTE-Generator?
(Or maybe I'm missing something?)
It is intended that ApplicationDataTypes do not directly appear in code, they are represented by their ImplementationDataType counterparts.
The motivation for the definition of data types on different levels of abstraction is explained in the AUTOSAR specifications, namely the TPS Software Component Template.
You will never find an ApplicationDataType in the C code, because it's defined on a physical level with a physical unit and might have a (completly) different representation on the implementation level in C.
Imagine a battery control sensor that measures the voltage. The value can be in range 0.0V and 14.0V with one digit after the decimal point (physical). You could map it to a float in C but floating point operations are expensive. Instead, you use a fixed point arithmetic where you map the phyiscal value 0.0 to 0, 0.1 to 1, 0.2 to 2 and so on. This mapping is described by a so called compuMethod.
The software component will always use the internal representation. So, why do you need the ApplicationDataType then? There are many reasons to use them, some of them are:
Methodology: The software component designer doesn't need to worry about the implementation in C. Somebody else can define that in a later stage.
Measurement If you measure the value, you have a well defined compuMethod and know the physical interpretation of the value in C.
Data conversion: If you connect software component with different units e.g. km/h vs mph, the Rte could automatically convert the internal representation between them.
Constant conversion: You can specify an initial value on the physical value (e.g. 10.6V) and the Rte will convert it to the internal representation.
Variable Size Arrays: Without dynamic memory allocation, you cannot have a variable size array in C. But you could reserve some (max) memory in an array and store the actual length in a seperate field. On the implementation level you have then a struct with two members (value, length). But on the application level you just have an array.
from AUTOSAR_TPS_SoftwareComponentTemplate.pdf
ApplicationDataType defines a data type from the application point of
view. Especially it should be used whenever something "physical" is at
stake.
An ApplicationDataType represents a set of values as seen in the
application model, such as measurement units. It does not consider
implementation details such as bit-size, endianess, etc.
It should be possible to model the application level aspects of a VFB
system by using ApplicationDataTypes only.
Is there a faster way to search data in JavaScript (specifically on V8 via node.js, but without c/c++ modules) than using the JavaScript Object?
This may be outdated but it suggests a new class is dynamically generated for every single property. Which made me wonder if a binary tree implementation might be faster, however this does not appear to be the case.
The binary tree implementation isn't well balanced so it might get better with balancing (only the first 26 values are roughly balanced by hand.)
Does anyone have an idea on why or how it might be improved? On another note: does the dynamic class notion mean there are actually ~260,000 properties (in the jsperf benchmark test of the second link) and subsequently chains of dynamic class definitions held in memory?
V8 uses the concepts of 'maps', which describe the layout of the data in an object.
These maps can be "fast maps" which specify a fixed offset from the start of the object at which a particular property can be found, or they can be "dictionary map", which use a hashtable to provide a lookup mechanism.
Each object has a pointer to the map that describes it.
Generally, objects start off with a fast map. When a property is added to an object with a fast map, the map is transitioned to a new one which describes the location of the new property within the object. The object is re-allocated with enough space for the new data item if necessary, and the object's map pointer is set to the new map.
The old map keeps a record of the transitions from it, including a pointer to the new map and a description of the property whose addition caused the map transition.
If another object which has the old map gets the same property added (which is very common, since objects of the same type tend to get used in the same way), that object will just use the new map - V8 doesn't create a new map in this case.
However, once the number of properties goes over a certain theshold (in fact, the current metric is to do with the storage space used, not the actual number of properties), the object is changed to use a dictionary map. At this point the object is re-written using a hashtable. In general, it won't undergo any more map transitions - any more properties that are added will just go in the hashtable.
Fast maps allow V8 to generate optimized code (using Crankshaft) where the offset of a property within an object is hard-coded into the machine code. This makes it very fast for cases where it can do this - it avoids the need for doing any lookup.
Obviously, the generated machine code is then dependent on the map - if the object's data layout changes, the code has to be discarded and re-optimized when necessary. V8 has a type profiling mechanism which collects information about what the types of various objects are during execution of unoptimized code. It doesn't trigger optimization of the code until certain stability constraints are met - one of these is that the maps of objects used in the function aren't changing frequently.
Here's a more detailed description of how this stuff works.
Here's a video where one of the lead developers of V8 describes stuff like map transitions and lots more.
For your particular test case, I would think that it goes through a few hundred map transitions while properties are being added in the preparation loop, then it will eventually transition to a dictionary based object. It certainly won't go through 260,000 of them.
Regarding your question about binary trees: a properly sized hashtable (with a sensible hash function and a significant number of objects in it) will always outperform a binary tree for a use-case where you're just searching, as your test code seems to do (all of the insertion is done in the setup phase).
everyone what is the difference between those 4 terms, can You give please examples?
Static and dynamic are jargon words that refer to the point in time at which some programming element is resolved. Static indicates that resolution takes place at the time a program is constructed. Dynamic indicates that resolution takes place at the time a program is run.
Static and Dynamic Typing
Typing refers to changes in program structure that are due to the differences between data values: integers, characters, floating point numbers, strings, objects and so on. These differences can have many effects, for example:
memory layout (e.g. 4 bytes for an int, 8 bytes for a double, more for an object)
instructions executed (e.g. primitive operations to add small integers, library calls to add large ones)
program flow (simple subroutine calling conventions versus hash-dispatch for multi-methods)
Static typing means that the executable form of a program generated at build time will vary depending upon the types of data values found in the program. Dynamic typing means that the generated code will always be the same, irrespective of type -- any differences in execution will be determined at run-time.
Note that few real systems are either purely one or the other, it is just a question of which is the preferred strategy.
Static and Dynamic Binding
Binding refers to the association of names in program text to the storage locations to which they refer. In static binding, this association is predetermined at build time. With dynamic binding, this association is not determined until run-time.
Truly static binding is almost extinct. Earlier assemblers and FORTRAN, for example, would completely precompute the exact memory location of all variables and subroutine locations. This situation did not last long, with the introduction of stack and heap allocation for variables and dynamically-loaded libraries for subroutines.
So one must take some liberty with the definitions. It is the spirit of the concept that counts here: statically bound programs precompute as much as possible about storage layout as is practical in a modern virtual memory, garbage collected, separately compiled application. Dynamically bound programs wait as late as possible.
An example might help. If I attempt to invoke a method MyClass.foo(), a static-binding system will verify at build time that there is a class called MyClass and that class has a method called foo. A dynamic-binding system will wait until run-time to see whether either exists.
Contrasts
The main strength of static strategies is that the program translator is much more aware of the programmer's intent. This makes it easier to:
catch many common errors early, during the build phase
build refactoring tools
incur a significant amount of the computational cost required to determine the executable form of the program only once, at build time
The main strength of dynamic strategies is that they are much easier to implement, meaning that:
a working dynamic environment can be created at a fraction of the cost of a static one
it is easier to add language features that might be very challenging to check statically
it is easier to handle situations that require self-modifying code
Typing - refers to variable tyes and if variables are allowed to change type during program execution
http://en.wikipedia.org/wiki/Type_system#Type_checking
Binding - this, as you can read below can refer to variable binding, or library binding
http://en.wikipedia.org/wiki/Binding_%28computer_science%29#Language_or_Name_binding