Many of the Rust compiler's target definitions use "p270:32:32-p271:32:32-p272:64:64" inside the data layout - what does it mean?

A few days ago, my basic questions about the data layout strings of the Rust compiler, or more specifically of the underlying LLVM, were mostly resolved on Stack Overflow. Unfortunately, one thing is still unclear.
Many Rust compiler targets include p270:32:32-p271:32:32-p272:64:64 inside their data layout string. Examples are i686-unknown-uefi, x86_64-uwp-windows-msvc, x86_64-unknown-uefi, x86_64-unknown-linux-gnu, x86_64-fuchsia, and x86_64-apple-darwin.
(These targets can be found here https://github.com/rust-lang/rust/tree/1.52.1/compiler/rustc_target/src/spec.)
The LLVM Language Reference explains:
p[n]:<size>:<abi>:<pref>:<idx>
This specifies the size of a pointer and its ABI and preferred alignments for address space n. The fourth parameter is the size of the index used for address calculation. If not specified, the default index size is equal to the pointer size. All sizes are in bits. The address space, n, is optional, and if not specified, denotes the default address space 0. The value of n must be in the range [1, 2^23).
I don't understand this. What is so special about p270 to p272? Which "address space" do they refer to?

These data layout strings were committed to Rust on 2020-01-07. The commit message says "Update data layouts to include new X86 address spaces". After more research I found that the underlying functionality was merged into LLVM in 2019. These new address spaces model MSVC's __ptr32, __ptr64, __sptr, and __uptr extensions [1].
Quote from the LLVM discussion:
The numbers 270-272 are more or less arbitrary; I picked them because they're near 256-258, which are the current existing address spaces.
If you look into X86.h in LLVM's source code, you can see that these numbers are used as identifiers and were chosen at will rather than for technical reasons.
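For illustration, this is roughly how those identifiers appear in llvm/lib/Target/X86/X86.h (paraphrased from memory, so check the actual source for the exact wording; the values themselves are the point):

    // Address-space identifiers used by the x86 backend (sketch, not a
    // verbatim copy of LLVM's X86.h).
    namespace X86AS {
    enum : unsigned {
      GS = 256,         // gs-relative addressing
      FS = 257,         // fs-relative addressing
      SS = 258,         // ss-relative addressing
      PTR32_SPTR = 270, // 32-bit sign-extended pointer (__ptr32 __sptr)
      PTR32_UPTR = 271, // 32-bit zero-extended pointer (__ptr32 __uptr)
      PTR64 = 272       // 64-bit pointer (__ptr64)
    };
    } // namespace X86AS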

Related

Conflicting alignment rules for structs vs. arrays

As the title implies, the question is regarding alignment of aggregate types in x86-64 on Linux.
In our lecture, the professor introduced alignment of structs (and of their elements) with the attached slide. Hence, I would assume (in accordance with Wikipedia and other lecture material) that for any aggregate type the alignment follows its largest member. Unfortunately, this does not seem to be the case in a former exam question, which said:
"Assuming that each page table [4kB, each PTE 64b] is stored in memory
at a “naturally aligned” physical address (i.e. an address which is an
integer multiple of the size of the table), ..."
How come that for a page table (which AFAIK is basically an array of 8-byte values in memory) the alignment rules are not according to the largest element, but to the size of the whole table?
Clarification is greatly appreciated!
Felix
Why page tables are aligned on their size
For a given level in the process of translating a virtual address, requiring the current page table to be aligned on its size in bytes speeds up the indexing operation.
The CPU doesn't need to perform an actual addition to find the base of the next-level page table; it can scale the index and then replace the lowest bits of the current level's base.
You can convince yourself this is indeed the case with a few examples.
It's not a coincidence that x86 CPUs follow this alignment too.
For example, regarding the 4-level paging for 4KiB pages of the x86 CPUs, the Page Directory Pointer field of a 64-bit address is 9 bits wide.
Each entry in that table (a PDPTE) is 64 bits, so the table is 512 * 8 = 4096 bytes (4 KiB) and the last entry has offset 511 * 8 = 4088 (0xff8, so at most 12 bits are used).
The address of a Page Directory Pointer table is given by a PML4 entry; these entries don't specify the lower 12 bits of the base (which are used for other purposes), only the upper bits.
The CPU can then simply replace the lower 12 bits of the PML4 entry's base with the offset of the PDPTE, since, as we have seen, that offset fits in 12 bits.
This is fast and cheap to do in hardware (no carry, easy to do with registers).
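A small sketch of that bit replacement (hypothetical helper, ignoring the flag bits above bit 11 that a real PML4 entry also carries):

    #include <cstdint>

    // Because a page table is 4 KiB aligned, its base address has twelve
    // zero low bits, so ORing in the scaled index is the same as adding it:
    // no carry can propagate out of the low 12 bits.
    uint64_t pdpte_address(uint64_t pml4_entry, uint64_t pdp_index) {
        uint64_t table_base = pml4_entry & ~0xFFFULL; // low 12 bits hold flags, not address
        return table_base | (pdp_index * 8);          // index < 512, so index * 8 < 4096
    }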
Assume that a country has ZIP codes made of two fields: a city code (C) and a block code (D), added together.
Also, assume that there can be at most 100 block codes for a given city, so D is 2 digits long.
Requiring that the city code is aligned on 100 (which means that the last two digits of C are zero) makes C + D like replacing the last two digits of C with D.
(1200 + 34 = 12|34).
Relation with the alignment of aggregates
A page table is not regarded as an aggregate, i.e. as an array of 8-byte elements. It is regarded as a type of its own, defined by the CPU's ISA, which must satisfy the requirements of the particular part of the CPU that uses it.
The page walker finds it convenient to have page tables aligned on their size, so that is the requirement.
The alignment of aggregates is a set of rules used by the compiler to lay out objects in memory; it guarantees that every element's alignment requirement is satisfied, so that instructions can access any element without alignment penalties/faults.
The execution units for loads and stores are a different part of the CPU than the page walker, so they have different needs.
You should use the aggregate alignment rules to know how the compiler will align your structs and then check whether that's enough for your use case.
Exceptions exist
Note that the professor took care to explain what alignment on their natural boundary means for page tables.
Exceptions exist: if you are told that a datum must be aligned on X, you can assume there's some hardware trick/simplification involved and try to work out which one, but in the end you just do the alignment and move on.
Margaret explained why page tables are special; I'm only answering this other part of the question:
"according to the largest element."
That's not the rule for normal structs either. You want max(alignof(member)) not max(sizeof(member)). So "according to the most-aligned element" would be a better way to describe the required alignment of a normal struct.
e.g. in the i386 System V ABI, double has sizeof = 8 but alignof = 4, so alignof(struct S1) = 4 (see footnote 1).
Even if the char member had been last, sizeof(struct S1) still has to be padded to a multiple of its alignof(), so all the usual invariants are maintained (e.g. sizeof( array ) = N * sizeof(struct S1)), and so stepping by sizeof always gets you to a sufficiently-aligned boundary for the start of a new struct.
Footnote 1: That ABI was designed before CPUs could efficiently load/store 8 bytes at once. Modern compilers try to give double and [u]int64_t 8-byte alignment, e.g. as globals or locals outside of structs. But the ABI's struct layout rules fix the layout based on the minimum guaranteed alignment for any double or int64_t object, which is alignof(T) = 4 for those types.
x86-64 System V has alignof(T) = sizeof(T) for all the primitive types, including the 8-byte ones. This makes atomic operations on any properly-aligned int64_t possible, for example, simplifying the implementation of C++20 std::atomic_ref to not have to check for sufficient alignment. (Why is integer assignment on a naturally aligned variable atomic on x86?)
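To make those numbers concrete, a small sketch (the struct is a guess at the S1 from the slide, one double plus one char; the exact values depend on the ABI you compile for):

    #include <iostream>

    struct S1 {
        double d;
        char c;
    };

    int main() {
        // i386 System V (e.g. g++ -m32): double gets 4-byte alignment inside
        // structs, so typically sizeof(S1) == 12 and alignof(S1) == 4.
        // x86-64 System V: alignof(double) == 8, so sizeof(S1) == 16 and
        // alignof(S1) == 8.
        std::cout << "sizeof(S1)  = " << sizeof(S1) << '\n'
                  << "alignof(S1) = " << alignof(S1) << '\n';
    }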

Vulkan Shader & Resources: Why Uniform and not Const Resources

We usually use const in C++ to imply that a value does not change (read only). Why did GLSL/Vulkan choose the word uniform in the shader or resource definition? Wouldn't it be more consistent to use the keyword borrowed from C/C++?
Besides that, the uniform keyword in shader definitions probably gives the compiler a clue to attach those resources as close to the hardware as possible, perhaps in shared memory or registers? Not sure about that.
That's also probably why the Vulkan spec mentions that we need small amounts of data for those types of resources, e.g. values of cosmological constants, etc.
Is there anything I'm missing, or some bit of history that has passed away?
Uniforms in GPU programming and const in C++ are focused on different things.
C++ const documents that a variable is not intended to be changed, with some compiler enforcement. As such it's more about using the type system to improve clarity and enforce intended usage -- important for large-project software engineering. You can still get around it with const_cast or other tricks, and the compiler can't assume you didn't, so it's not strictly enforced.
The important thing about uniforms is that they're, well, uniform. Meaning they have the same value whenever they are read within a draw call. Since there might be hundreds to millions of reads of that value in a single draw call, this allows it to be cached (and just one copy of it cached), preloaded into registers (or cache) before shaders run, kept in a non-coherent cache, broadcast from a single read result across all SIMD lanes in a core, etc. For this to work, the fact that the contents can't change must be strictly enforced (with memory aliasing you can get around even this now, but results are very much undefined if you do). So uniform really isn't about declaring intent to other programmers for software-engineering benefits like const is; it's about declaring intent to the compiler and driver so they can optimize based on it.
D3D uses "const" and "constant buffer" rather than uniform, so clearly there is some overlap. Though that does lead to saying things like "how many times do you update constants per frame?" which when you think about it is kind of a weird thing to say :). The values are constant within shader code, but very much aren't constant at the API level.
The etymology of the word is important here. The term "uniform" is derived from GLSL, which was inspired by the Renderman standard's shader terminology. In Renderman, "uniform" was used for values "whose values are constant over whatever portion of the surface being shaded". This was an alternative to "varying", which represented values interpolated across the surface.
"Constant" would imply that the value never changes. Uniform values do change; they simply don't change at the same frequency as other values. Input values change per-invocation, uniform values change per-draw call, and constant values don't change. Note that in GLSL, const usually means "compile-time constant": a value that is set at compile time and is never changed.
A uniform variable in Vulkan ultimately comes from a resource that exists outside of the shader. Blocks of uniform variables fed by buffers and uniforms in push constants fed by push-constant state are both external resources, set by the user. That's a fundamentally different concept from having a compile-time constant struct.
Since it's different from a constant struct, it needs a different term to request it.

What does Int use three bits for? [duplicate]

Why is GHC's Int type not guaranteed to use exactly 32 bits of precision? This document claims it has at least 30 bits of signed precision. Is it somehow related to fitting Maybe Int or similar into 32 bits?
It is to allow implementations of Haskell that use tagging. When using tagging you need a few bits as tags (at least one, two is better). I'm not sure there currently are any such implementations, but I seem to remember Yale Haskell used it.
Tagging can somewhat avoid the disadvantages of boxing, since you no longer have to box everything; instead the tag bit will tell you if it's evaluated etc.
The Haskell language definition states that the type Int covers at least the range [-2^29, 2^29 - 1].
There are other compilers/interpreters that use this property to boost the execution time of the resulting program.
All internal references to (aligned) Haskell data point to memory addresses that are multiples of 4 (8) on 32-bit (64-bit) systems. So references need only 30 (61) bits, which leaves 2 (3) bits for "pointer tagging".
In the case of data, GHC uses those tags to store information about the referenced data, i.e. whether that value is already evaluated and, if so, which constructor it has.
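As an illustration of the general technique (plain C++ showing low-bit tagging on 8-byte-aligned pointers, not GHC's actual runtime code):

    #include <cassert>
    #include <cstdint>

    // Heap objects aligned to 8 bytes leave the low 3 bits of every pointer
    // free to carry a small tag, e.g. "already evaluated" plus a constructor
    // number.
    constexpr std::uintptr_t TAG_MASK = 0x7;

    void* tag_pointer(void* p, unsigned tag) {
        assert((reinterpret_cast<std::uintptr_t>(p) & TAG_MASK) == 0); // must be aligned
        return reinterpret_cast<void*>(reinterpret_cast<std::uintptr_t>(p) | tag);
    }

    unsigned get_tag(void* p) {
        return reinterpret_cast<std::uintptr_t>(p) & TAG_MASK;
    }

    void* untag_pointer(void* p) {
        return reinterpret_cast<void*>(reinterpret_cast<std::uintptr_t>(p) & ~TAG_MASK);
    }

    int main() {
        alignas(8) static int object = 42;
        void* tagged = tag_pointer(&object, 0x2);   // e.g. "evaluated, constructor 2"
        assert(get_tag(tagged) == 0x2);
        assert(untag_pointer(tagged) == &object);
    }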
In case of 30-bit Ints (so, not GHC), you could use one bit to decide if it is either a pointer to an unevaluated Int or that Int itself.
Pointer tagging could also be used for one-bit reference counting, which can speed up garbage collection. That can be useful in cases where a direct one-to-one producer-consumer relationship is created at runtime: it would result directly in memory reuse instead of leaving the work to the garbage collector.
So, using 2 bits for pointer tagging, there could be some wild combination of intense optimisation...
In case of Ints I could imagine these 4 tags:
a singular reference to an unevaluated Int
one of many references to the same possibly still unevaluated Int
30 bits of that Int itself
one of possibly many references to an evaluated 32-bit Int.
I think this is because of early ways to implement GC and all that stuff. If you have 32 bits available and you only need 30, you could use those two spare bits to implement interesting things, for instance using a zero in the least significant bit to denote a value and a one for a pointer.
Today the implementations don't use those bits so an Int has at least 32 bits on GHC. (That's not entirely true. IIRC one can set some flags to have 30 or 31 bit Ints)

How is a syscall defined in the Linux kernel? What's the relation between compat_sys_xxx and sys_xxx?

In /include/linux/compat.h, I see a lot of compat_sys_xxx. Also, there is sys_xxx defined somewhere else. What's the relation between compat_sys_xxx and sys_xxx?
If there's a compat entry, it almost certainly means that the system call prototype was changed and a version of the previous prototype was maintained for compatibility. Often you'll see that compat_sys_xxx just calls sys_xxx with the arguments converted appropriately (or both call a common function with slightly different conversions).
As a more or less random example, compat_sys_msgsnd takes three int arguments followed by a pointer to a compat_msgbuf structure (wherein the first, ostensibly "long", field is forced to a 32-bit size). OTOH, sys_msgsnd lists the arguments in a different order and with argument types chosen to morph appropriately for the architecture (i.e. long fields follow the natural long integer size, size_t replaces int in one place, etc.).
No doubt the syscall interface was changed because the original interface was ambiguous in some way, when moved to a different (non-i386) architecture. The compat_ version allows existing binaries to continue working without modification.
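A schematic of the pattern (hypothetical names and simplified types, not the actual kernel sources):

    #include <cstddef>
    #include <cstdint>

    // A 32-bit process lays out msgbuf with a 32-bit "long" mtype field;
    // the native structure uses the full native long.
    struct compat_msgbuf_sketch {
        std::int32_t mtype;
        char mtext[1];
    };

    struct msgbuf_sketch {
        long mtype;      // 64-bit on x86-64
        char mtext[1];
    };

    // Common implementation shared by the native and compat entry points
    // (stubbed out here).
    long do_msgsnd_sketch(int msqid, long mtype, const char* text,
                          std::size_t msgsz, int msgflg) {
        (void)msqid; (void)text; (void)msgsz; (void)msgflg;
        return mtype >= 0 ? 0 : -1;
    }

    // The compat entry point widens the 32-bit fields and forwards to the
    // common implementation.
    long compat_sys_msgsnd_sketch(int msqid, const compat_msgbuf_sketch* msgp,
                                  std::size_t msgsz, int msgflg) {
        return do_msgsnd_sketch(msqid, static_cast<long>(msgp->mtype),
                                msgp->mtext, msgsz, msgflg);
    }

    int main() {
        compat_msgbuf_sketch m{1, {'x'}};
        return static_cast<int>(compat_sys_msgsnd_sketch(0, &m, 1, 0));
    }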

Dynamic Languages and Variable Allocation

How does a dynamic language decide how much memory to allocate for a variable?
E.g. how does the compiler change variable = 5 to variable = "xxx" without too much memory overhead? When does it use the hardware stack and when does it use the memory heap?
The compiler allocates enough memory for each variable to hold a pointer plus whatever metadata the language runtime requires. But I think you mean to be asking how much memory is allocated for each object. In that case the answer is that it depends on the type of object. When a variable gets assigned to a different object, the pointer associated with that variable changes what it points to.
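A minimal sketch of that idea (illustrative C++ only; real runtimes such as Perl's SV or CPython's object headers differ in the details): the variable itself is a fixed-size handle, and rebinding it just swaps the payload it refers to.

    #include <cstdint>
    #include <memory>
    #include <string>

    // A fixed-size handle: a type tag plus either an inline number or a
    // pointer to heap-allocated data.
    struct DynValue {
        enum class Tag { Int, Str } tag = Tag::Int;
        std::int64_t num = 0;                 // payload when tag == Int
        std::shared_ptr<std::string> str;     // heap payload when tag == Str
    };

    int main() {
        DynValue variable;
        variable.tag = DynValue::Tag::Int;    // variable = 5
        variable.num = 5;

        variable.tag = DynValue::Tag::Str;    // variable = "xxx": same handle,
        variable.str = std::make_shared<std::string>("xxx"); // new heap payload
    }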
The answer, of course, varies by language, both the hosted dynamic language and the lower-level implementation language. What applies to Perl does not necessarily apply to Python, nor does what applies in Tcl apply in Java or LISP or ... well, do those even count as dynamic languages?
In Perl, there's a C-level structure that goes by the name SV (scalar variable) that contains different storage for different versions of the variable's value. These are often heap-based; the storage for strings always ends up being heap-based, though a pure numeric value that has never been converted to a string might be in an SV that lives strictly on the stack. In Perl, these things are reference-counted (and mortalized, or immortalized, and all sorts of other interesting terms). More complicated types (AV, HV, RV, etc.) are based on SV.
