Conflicting alignment rules for structs vs. arrays

As the title implies, the question is regarding alignment of aggregate types in x86-64 on Linux.
In our lecture, the professor introduced alignment of structs (and the elements thereof) with the attached slide. Hence, I would assume (in accordance with Wikipedia and other lecture material) that for any aggregate type, the alignment is that of its largest member. Unfortunately, this does not seem to be the case in a former exam question, which said:
"Assuming that each page table [4kB, each PTE 64b] is stored in memory
at a “naturally aligned” physical address (i.e. an address which is an
integer multiple of the size of the table), ..."
How come that for a page table (which afaik is basically an array of 8-byte values in memory), the alignment rule follows not the largest element, but the size of the whole table?
Clarification is greatly appreciated!
Felix

Why page tables are aligned on their size
At a given level of the virtual-address translation process, requiring the current page table to be aligned on its size in bytes speeds up the indexing operation.
The CPU doesn't need to perform an actual addition to find the base of the next-level page table; it can scale the index and then replace the lowest bits of the current level's base.
You can convince yourself this is indeed the case with a few examples.
It's no coincidence that x86 CPUs follow this alignment too.
For example, in the 4-level paging for 4 KiB pages on x86 CPUs, the Page Directory Pointer field of a 64-bit address is 9 bits wide.
Each entry in that table (a PDPTE) is 64 bits, so the table size is 512 * 8 = 4096 bytes (4 KiB) and the last entry has offset 511 * 8 = 4088 (0xff8, so at most 12 bits are used).
The address of a Page Directory Pointer table is given by a PML4 entry; these entries don't specify the lower 12 bits of the base (those bits are used for other purposes), only the upper bits.
The CPU can then simply replace the lower 12 bits of the PML4 entry's base with the offset of the PDPTE since, as we have seen, that offset fits in 12 bits.
This is fast and cheap to do in hardware (no carry, easy to do with registers).
As an analogy, assume that a country has ZIP codes made of two fields: a city code (C) and a block code (D), added together.
Also, assume that there can be at most 100 block codes for a given city, so D is 2 digits long.
Requiring that the city code is aligned on 100 (which means that the last two digits of C are zero) makes C + D like replacing the last two digits of C with D.
(1200 + 34 = 12|34).
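In C terms, the trick looks something like this (a minimal sketch using the x86-64 field positions above; the function name and mask constant are illustrative, not a real CPU interface):

#include <stdint.h>

/* Sketch: forming the address of a PDPT entry from a PML4 entry.
   Because the table is 4 KiB-aligned, the low 12 bits of its base are
   zero, so the scaled index can be OR-ed in; no addition (and hence no
   carry chain) is needed. */
static uint64_t pdpte_address(uint64_t pml4e, unsigned index) /* index: 0..511 */
{
    uint64_t table_base = pml4e & 0x000FFFFFFFFFF000ULL; /* base, bits 51:12 */
    return table_base | ((uint64_t)index << 3);          /* index * 8, OR not ADD */
}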
Relation with the alignment of aggregates
A page table is not regarded as an aggregate, i.e. as an array of 8-byte elements. It is regarded as a type of its own, defined by the CPU's ISA, which must satisfy the requirements of the particular part of the CPU that uses it.
The page walker finds it convenient to have page tables aligned on their size, so that is the requirement.
The alignment of aggregates is a set of rules used by the compiler to allocate objects in memory; it guarantees that every element's alignment is satisfied, so that instructions can access any element without alignment penalties/faults.
The execution units for loads and stores are a different part of the CPU than the page walker, so they have different needs.
You should use the aggregate alignment rules to know how the compiler will align your structs, and then check whether that's enough for your use case.
Exceptions exist
Note that the professor took care to explain what alignment on their natural boundary means for page tables.
Exceptions exist: if you are told that a datum must be aligned on X, you can assume there's some hardware trick/simplification involved and try to work out which one, but in the end you just do the alignment and move on.

Margaret explained why page tables are special; I'm only answering this other part of the question:
"according to the largest element."
That's not the rule for normal structs either. You want max(alignof(member)), not max(sizeof(member)). So "according to the most-aligned element" would be a better way to describe the required alignment of a normal struct.
e.g. in the i386 System V ABI, double has sizeof = 8 but alignof = 4, so alignof(struct S1) = 4 (see footnote 1)
Even if the char member had been last, sizeof(struct S1) still has to be padded to a multiple of its alignof(), so all the usual invariants are maintained (e.g. sizeof( array ) = N * sizeof(struct S1)), and so stepping by sizeof always gets you to a sufficiently-aligned boundary for the start of a new struct.
Footnote 1: That ABI was designed before CPUs could efficiently load/store 8 bytes at once. Modern compilers try to give double and [u]int64_t 8-byte alignment, e.g. as globals or locals outside of structs. But the ABI's struct layout rules fix the layout based on the minimum guaranteed alignment for any double or int64_t object, which is alignof(T) = 4 for those types.
x86-64 System V has alignof(T) = sizeof(T) for all the primitive types, including the 8-byte ones. This makes atomic operations on any properly-aligned int64_t possible, for example, simplifying the implementation of C++20 std::atomic_ref to not have to check for sufficient alignment. (Why is integer assignment on a naturally aligned variable atomic on x86?)
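A quick way to check these figures (a sketch; the members of struct S1 are not shown in the excerpt, so a char followed by a double is assumed):

#include <stdio.h>
#include <stdalign.h>

struct S1 { char c; double d; };   /* assumed members, for illustration */

int main(void) {
    /* x86-64 System V: alignof(double) == 8, so alignof(struct S1) == 8
       and sizeof(struct S1) == 16 (7 bytes of padding after c).
       i386 System V (compile with -m32): alignof(double) == 4, so
       alignof(struct S1) == 4 and sizeof(struct S1) == 12. */
    printf("alignof = %zu, sizeof = %zu\n",
           alignof(struct S1), sizeof(struct S1));
    return 0;
}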

RISC-V how to use each lb, sa, li and how to properly use its syntax

I don't know how to access each character and know which one I'm accessing. In terms of the C language, they use a char array for strings and use indexes to know which one is being accessed. What is the equivalent of this in RISC-V?
C has logical variables — assembly language has physical storage.
(Logical) variables have names & types & scope/lifetime, and hold values at runtime.  In assembly, we have physical storage: registers & memory — physical storage has size, but names are often not implemented except in comments (and for globals), while types are implemented by individual instructions of the machine code program (and by storage directives in assembly), and variable lifetimes are implemented by repurposing of physical storage as per the machine code program.
Compilers and assembly language programmers map logical variables of our algorithms onto available physical storage.
C has arrays — assembly language has arrays.
Arrays are consecutive storage locations, whether consecutive bytes as in a string, or consecutive words as in an array of integers/words.  An array has a base address: the address of its lowest element (e.g. at index position 0).
In some sense, we can refer to the whole array by that base address, though usually that would be paired with a length indication of some kind: either a terminal value, as in C-style strings, which are nul-terminated, or an explicit length variable or constant, as commonly done with arrays of integers.
C has indexes — assembly language has indexes.
Indexes are just simple integers.  With regard to arrays, indexes usually start at zero and increment by 1 to refer to the next element/index position, no matter the size of the elements.
C has pointers — assembly language has pointers.
In C we have the concept of pointers.  A pointer variable holds a memory address — an address in assembly language is just an (unsigned) integer.  In C, pointers have types, so they know what type of element they are pointing to.  In assembly language, pointers are just (unsigned) integers, so the program must know what they are pointing to.
The simplest use of a pointer is as an immutable variable, e.g. simply a copy of the base address of the array.
Pointers can be dereferenced, in C and assembly.  We can dereference for read or for write.  In C a dereference is written with * as in *p, or p[0], which is equivalent.  If that appears as the left-hand-side operand of an assignment operator (e.g. *p = ...), that is a dereference for write.  In assembly language we would use sb or sw for that operation.  If that *p appears in any other context (e.g. c = *p), that is a dereference for read; assembly language would use the lb and lw instructions for that operation.
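For example (a sketch in C; the comments note the matching RISC-V instructions):

/* Dereference for write maps to a store, dereference for read to a load. */
void poke(char *p, char v) { *p = v; }        /* sb v, 0(p)      */
char peek(const char *p)   { return *p; }     /* lb result, 0(p) */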
A function that takes an array as a parameter would see such a pointer.
C has array & pointer indexing — assembly language has pointer indexing.
In C we can write str[i] or arr[i] and this will access the ith element of the string or integer array.
A single integer/word, alone, stored in memory occupies 4 bytes.  As we refer to an array by its base address (the lowest address in the array), we also refer to a multi-byte value, like an integer/word, by the lowest address in that word.
In C, arr[i] can be written as *(arr+i) (or *(arr+(i)) if i represents an expression) and, by definition of the C language, these are equivalent.  The + in the expanded form (not visible in the arr[i] form) is called pointer arithmetic.
Further, hidden from view in C but visible in assembly language, the assembly programmer and the processor deal with a byte-addressable memory, meaning that each address refers to one byte, and multi-byte items therefore occupy not only multiple bytes but also multiple addresses.  The assembly programmer & processor also deal with byte offsets, which are scaled indexes — these offsets are not seen in C, as that language sticks to indexes and pointers.
In assembly & machine code, indexing, as in arr[i], involves scaling by the element size.  For an integer array, the element at index position 0 has the same address as the base address of the array, let's say 0x1000.  Since an integer/word takes 4 individual bytes, the element at index position 1 has address 0x1004, and the element at position i has address 0x1000 + i * 4.  This scaling is done explicitly in assembly language.
The scale factor for an array of bytes (as in str[i]) is 1, meaning no scaling of the index is really needed: index & byte offset are equal in value.
In RISC V assembly language, array indexing is instead done as pointer indexing, in two steps: first, create a pointer variable (in a CPU register) that refers to the base address of the array, then perform pointer indexing.
In RISC V, there is only one addressing mode, base + displacement, where base is a register containing a pointer value, and displacement is a compile time constant.  This can accomplish constant array element access: if the base array address is in register s0, then 0(s0) represents the addressing mode to access element 0 of the array (this can be used with an lb, lw, sb, or sw instruction).  If it is an array of bytes, then 1(s0) represents str[1], while if it is an array of integers/words, then 4(s0) represents arr[1].
To do indexing with a variable, as in str[i] or arr[i], we need to form a new pointer that refers directly to the memory address of the ith element position.  Then we can use RISC V's standard base + displacement where the displacement is simply 0, since the base we use will already hold the complete address.  For str[i], form a new pointer value computed as the base address of the array plus the value of i.  For arr[i], form a new pointer value computed as the base address of the array plus the value of i appropriately scaled, e.g. multiplied by 4 or, more commonly, shifted left by 2.
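Spelled out in C, the two steps look roughly like this (function names are illustrative):

#include <stdint.h>

/* Sketch of the two-step pointer indexing described above. */
char load_byte(const char *str, long i)
{
    const char *p = str + i;           /* scale factor 1: add i directly */
    return *p;                         /* lb rd, 0(p)                    */
}

int load_word(const int *arr, long i)
{
    const int *p = (const int *)((const char *)arr + (i << 2)); /* i * 4 */
    return *p;                         /* lw rd, 0(p)                    */
}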
C has mutable pointer variables — assembly language has them, too.
More advanced usages of pointers involve updating them to refer to the next (or some other) element of the array.  When used like that, a pointer is logically equivalent to a base array address + an index combined into one variable.  Pointer addition: adding to the pointer advances it to refer to subsequent elements, subtraction (of an index/offset) moves it backwards, and subtracting a pointer from a pointer yields an index in C (and an offset in assembly).
In C we can advance a pointer, for example, modify the pointer variable to point to the next element, as in p++, which might also be written as p += 1, or p = p + 1 — all equivalent by the definition of the C language.  This can also be done in assembly language, as long as we are also aware of the scaling: if the pointer refers to a byte, then incrementing by 1 makes it refer to the next byte, whereas for integers/words we must increment by 4 to accomplish what is just +1 in C.
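For instance, in C (a sketch; the comments show the equivalent RISC-V increments):

/* "p + 1" in C advances by one element, i.e. by the element size in bytes. */
const char *next_byte(const char *p) { return p + 1; } /* addi p, p, 1 */
const int  *next_word(const int  *p) { return p + 1; } /* addi p, p, 4 */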
Many algorithms (language independent) involve pointers.  When taking an algorithm from C to assembly language, try to stay true to the C code rather than performing optimization during translation.
If you have an array version of an algorithm and want to optimize it to use pointers in assembly, then do that in C first, and make sure it works there by testing it, then take that pointer version to assembly most literally, and you won't have algorithmic problems to debug in assembly language.
la is used in some environments, like RARS, to put the address of a (usually data) label into a register — for example, taking the address of an array and putting it into a register, e.g. for use as a pointer.
li is used to put a constant integer value into a register.
There is no sa instruction that I'm aware of (perhaps a typo for sb, store byte?).
lb instructs the processor to use the (only available) addressing mode, base + displacement, to access the byte-sized memory item at the effective address and retrieve a copy of that value into a CPU register.  (The effective address is the address computed by adding the base register to the displacement.)
lw does the same but instructs the processor to fetch a word-sized (4-byte) item that starts at the effective address.

What does Int use three bits for? [duplicate]

Why is GHC's Int type not guaranteed to use exactly 32 bits of precision? This document claims it has at least 30 bits of signed precision. Is it somehow related to fitting Maybe Int or similar into 32 bits?
It is to allow implementations of Haskell that use tagging. When using tagging you need a few bits as tags (at least one, two is better). I'm not sure there currently are any such implementations, but I seem to remember Yale Haskell used it.
Tagging can somewhat avoid the disadvantages of boxing, since you no longer have to box everything; instead the tag bit will tell you if it's evaluated etc.
The Haskell language definition states that the type Int covers at least the range [-2^29, 2^29 - 1].
There are other compilers/interpreters that use this property to boost the execution time of the resulting program.
All internal references to (aligned) Haskell data point to memory addresses that are multiples of 4 (on 32-bit systems) or 8 (on 64-bit systems). So references need only 30 (or 61) bits, which leaves 2 (or 3) bits free for "pointer tagging".
In the case of data, GHC uses those tags to store information about the referenced data, i.e. whether that value is already evaluated and, if so, which constructor it has.
In the case of 30-bit Ints (so, not GHC), you could use one bit to decide whether a word is a pointer to an unevaluated Int or that Int itself.
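A rough sketch in C of that one-bit scheme (all names are hypothetical; a real Haskell implementation does this inside its runtime representation, not through an API like this):

#include <stdint.h>

/* One machine word that is either a small integer or an aligned pointer.
   Heap objects are at least 4-byte aligned, so the low bit is free:
   low bit 0 = a 31/63-bit integer stored shifted left by one,
   low bit 1 = a pointer to a (possibly unevaluated) heap object. */
typedef uintptr_t tagged;

static tagged   tag_int(intptr_t n)  { return (uintptr_t)n << 1; }
static tagged   tag_ptr(void *p)     { return (uintptr_t)p | 1u; }
static int      is_int(tagged t)     { return (t & 1u) == 0; }
static intptr_t untag_int(tagged t)  { return (intptr_t)t >> 1; } /* assumes arithmetic shift */
static void    *untag_ptr(tagged t)  { return (void *)(t & ~(uintptr_t)1); }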
Pointer tagging could also be used for one-bit reference counting, which can speed up the garbage collection process. That can be useful where a direct one-to-one producer-consumer relationship is created at runtime: it would result directly in memory reuse instead of feeding the garbage collector.
So, using 2 bits for pointer tagging, there could be some wild combination of intense optimisation...
In case of Ints I could imagine these 4 tags:
a singular reference to an unevaluated Int
one of many references to the same possibly still unevaluated Int
30 bits of that Int itself
one of possibly many references to an evaluated 32-bit Int.
I think this is because of early ways to implement GC and all that stuff. If you have 32 bits available and you only need 30, you could use those two spare bits to implement interesting things, for instance using a zero in the least significant bit to denote a value and a one for a pointer.
Today the implementations don't use those bits so an Int has at least 32 bits on GHC. (That's not entirely true. IIRC one can set some flags to have 30 or 31 bit Ints)

Minimal and maximal magnitude in Fortran

I'm trying to rewrite the MINPACK Fortran 77 library in Java (for my own needs), and I met this in the minpack.f source code:
integer mcheps(4)
integer minmag(4)
integer maxmag(4)
double precision dmach(3)
equivalence (dmach(1),mcheps(1))
equivalence (dmach(2),minmag(1))
equivalence (dmach(3),maxmag(1))
...
data dmach(1) /2.22044604926d-16/
data dmach(2) /2.22507385852d-308/
data dmach(3) /1.79769313485d+308/
dpmpar = dmach(i)
return
What are the minmag and maxmag functions, and why do dmach(2) and dmach(3) have these values?
There is an explanation in comments:
c dpmpar(1) = b**(1 - t), the machine precision,
c dpmpar(2) = b**(emin - 1), the smallest magnitude,
c dpmpar(3) = b**emax*(1 - b**(-t)), the largest magnitude.
What are the smallest and largest magnitude? There must be a way to compute these values at runtime; machine constants in source code are bad style.
EDIT:
I suppose that the static fields Double.MIN_NORMAL and Double.MAX_VALUE are the values I was looking for. (Note that Double.MIN_VALUE is the smallest subnormal double, 4.9e-324, not the smallest magnitude dmach(2) above.)
minmag and maxmag (and mcheps too) are not functions; they are declared to be rank-1 integer arrays with 4 elements each. Likewise, dmach is a rank-1, 3-element array of double precision values. It is very likely, but not certain, that each integer value occupies 4 bytes and each double precision value 8 bytes. Bear this in mind as the answer progresses.
So an expression such as mcheps(1) is not a function call but a reference to the 1st element of an array.
equivalence is an old FORTRAN feature, now deprecated both by language standards and by software engineering practices. A statement such as
equivalence (dmach(1),mcheps(1))
states that the first element of dmach is located, in memory, at the same address as the first element of mcheps. By implication, this also means that the 24 bytes of dmach occupy the same addresses as the 16 bytes of mcheps, and another 8 bytes too. I'll leave you to draw a picture of what is going on. Note that it is conceivable that the code originally (and perhaps still) uses 8 byte integers so that the elements of the equivalenced arrays match 1:1.
Note that equivalence gives, essentially, more than one name, and more than one interpretation, to the same memory locations. mcheps(1) is the name of an integer stored in 4 bytes of memory which form part of the storage for dmach(1). Equivalencing used to be used to implement all sorts of 'clever' tricks back in the days when every byte was precious.
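In C terms, the effect is similar to this sketch (assuming 4-byte int and 8-byte double), which views the raw bytes of dmach(1) as two integers the way mcheps(1..2) does:

#include <stdio.h>
#include <string.h>

int main(void) {
    double d = 2.22044604926e-16;     /* dmach(1)                          */
    int words[2];
    memcpy(words, &d, sizeof d);      /* overlay the storage, EQUIVALENCE-style */
    printf("%08x %08x\n", (unsigned)words[0], (unsigned)words[1]);
    return 0;
}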
Then the data statements assign values to the elements of dmach. To me those values look to be just what the comment tells us they are.
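Indeed, for IEEE-754 binary64 (b = 2, t = 53, emin = -1021, emax = 1024): b**(1-t) = 2^-52 ≈ 2.2204460493e-16, b**(emin-1) = 2^-1022 ≈ 2.2250738585e-308, and b**emax * (1 - b**(-t)) ≈ 1.7976931349e+308 -- exactly the three data values above.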
EDIT: The comment indicates that those magnitudes are the smallest and largest representable double precision numbers on the platform for which the code was last compiled. I think that in Java they are probably called doubles. I don't know Java, so I don't know what facilities it has for returning the value of the largest and smallest doubles; if you don't know this either, hit the 'net or ask another SO question -- to which you'll probably get responses along the lines of "search the net".
Most of this you should be able to ignore entirely. As you write, a better approach would be to find out those values at run time by enquiry using intrinsic functions. Fortran 90 (and later) has such functions (epsilon, tiny and huge); I imagine Java has too, but that's your domain, not mine.
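For comparison, C exposes the same three constants in <float.h>; a minimal check (in Java, the analogues would be Math.ulp(1.0), Double.MIN_NORMAL and Double.MAX_VALUE):

#include <stdio.h>
#include <float.h>

/* The dpmpar constants, obtained from the environment instead of
   being hard-coded. */
int main(void) {
    printf("machine precision  = %.12g\n", DBL_EPSILON); /* b**(1-t)            */
    printf("smallest magnitude = %.12g\n", DBL_MIN);     /* b**(emin-1)         */
    printf("largest magnitude  = %.12g\n", DBL_MAX);     /* b**emax*(1-b**(-t)) */
    return 0;
}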

Most efficient data structure to add styles to text

I'm looking for the best data structure to add styles to a text (say in a text editor). The structure should allow the following operations:
Quick lookup of all styles at absolute position X
Quick insert of text at any position (styles after that position must be moved).
Every position of the text must support an arbitrary number of styles (overlapping).
I've considered lists/arrays which contain text ranges but they don't allow quick insert without recalculating the positions of all styles after the insert point.
A tree structure with relative offsets supports #2 but the tree will degenerate fast when I add lots of styles to the text.
Any other options?
I have never developed an editor, but how about this:
I believe it would be possible to expand the scheme that is used to store the text characters themselves, depending of course on the details of your implementation (language, toolkits etc.) and your performance and resource usage requirements.
Rather than use a separate data structure for the styles, I'd prefer having a reference accompany each character and point to an array or list with the applicable styles. Characters with the same set of styles could point to the same array or list, so that one copy could be shared.
Character insertions and deletions would not affect the styles themselves, apart from changing the number of references to them, which could be handled with a bit of reference counting.
Depending on your programming language you could even compress things a bit more by pointing halfway into a list, although the additional bookkeeping for this might in fact make it more inefficient.
The main issue with this suggestion is the memory usage. In an ASCII editor written in C, bundling a pointer with each char would raise its effective memory usage from 1 byte to 16 bytes on a 64-bit system, due to struct alignment padding.
I would look at breaking the text into small variable-size blocks that allow you to compress the pointers efficiently. E.g. a 32-character block might look like this in C:
struct _BLK_ {
    unsigned char size;    /* number of characters stored in content[]            */
    unsigned int  styles;  /* 32-bit mask: which chars have their own style pointer */
    char content[];        /* the characters, followed by the style pointers       */
};
The interesting part is the metadata processing on the variable part of the struct, which contains both the stored text and any style pointers. The size element would indicate the number of characters. The styles integer (hence the 32-character limit) would be seen as a set of 32 1-bit fields, with each one indicating whether a character has its own style pointer, or whether it should use the same style as the previous character. This way a 32-char block with a single style would only have the additional overhead of the size char, the styles mask and a single pointer, along with any padding bytes. Inserting and deleting characters into a small array like this should be quite fast.
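As a sketch of how a per-character style lookup inside such a block might work (the helper name and layout details are my own assumptions; the alignment of the pointer area is glossed over):

#include <stdint.h>

struct style;                  /* opaque style record */

struct _BLK_ {
    unsigned char size;        /* number of characters in content[]         */
    unsigned int  styles;      /* bit i set: content[i] has its own pointer */
    char content[];            /* text, followed by the style pointers      */
};

/* Find the style of character k: count the style pointers belonging to
   characters 0..k (population count); the last of them is the one in
   force.  Bit 0 is assumed always set, so idx never underflows. */
static const struct style *style_of(const struct _BLK_ *b, unsigned k)
{
    uint32_t below = b->styles & (uint32_t)(((uint64_t)2 << k) - 1);
    unsigned idx = (unsigned)__builtin_popcount(below) - 1; /* GCC/Clang builtin */
    const struct style *const *ptrs =
        (const struct style *const *)(const void *)(b->content + b->size);
    return ptrs[idx];
}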
As for the text storage itself, a tree sounds like a good idea. Perhaps a binary tree where each node's value is the sum of its children's values, with the leaf nodes pointing to text blocks and holding the block size as their node value? The root node's value would be the total size of the text, with each subtree ideally holding half of your text. You'd still have to auto-balance it, though, and sometimes merge half-empty text blocks.
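A node for such a tree might look like this (a sketch; field names are illustrative):

#include <stddef.h>

struct _BLK_;                      /* the text block from the sketch above */

/* Weighted binary tree over text blocks (rope-like): an internal node's
   value is the sum of its children's; a leaf's value is its block's size. */
struct node {
    size_t        value;           /* total characters in this subtree     */
    struct node  *left, *right;    /* children; both NULL for a leaf       */
    struct _BLK_ *block;           /* leaf only: the block holding the text */
};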
And in case you missed it, I am no expert in trees :-)
EDIT:
Apparently what I suggested is a modified version of this data structure:
http://en.wikipedia.org/wiki/Rope_%28computer_science%29
as referenced in this post:
Data structure for text editor
EDIT 2:
Deletion in the proposed data structure should be relatively fast, as it would come down to byte shifting in an array and a few bitwise operations on the styles mask. Insertion is pretty much the same, unless a block fills up. It might make sense to reserve some space (i.e. some bits in the styles mask) within each block to allow for future insertions directly in the blocks, without having to alter the tree itself for relatively small amounts of new text.
Another advantage of bundling characters and styles in blocks like this is that its inherent data locality should allow for more efficient use of the CPU cache than other alternatives, thus improving the processing speed to some extent.
Much like any complex data structure, though, you'd probably need either profiling with representative test cases or an adaptive algorithm to determine the optimal parameters for its operation (block size, any reserved space etc).
