Word, Doubleword, Quadword - linux

This is my second question in a row, and it is also about assembly (x86, 32-bit).
"Programming from the Ground Up" says that 4 bytes are 32 bits and that this is a word.
But Intel's "Basic Architecture" guide says that a word is 16 bits (2 bytes) and that 4 bytes are a doubleword.
Memory uses 4-byte words: to get to the next word I have to skip 4 bytes, and within each word I can use 4 offsets (0-3) to read a byte. That contradicts Intel's naming, yet this memory model also comes from Intel, so what is wrong here?
And how do I operate on words, doublewords and quadwords in assembly? How do I define a number as a quadword?

To answer your first question: the processor word size is a function of the architecture, so a 32-bit processor has a 32-bit word. In software, including assembly, there is usually a need to identify sizes unambiguously, so for historical reasons the word type is fixed at 16 bits. Both sources are therefore probably correct if you read them in context: the first refers to the processor word, while the Intel guide refers to the word type.
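A minimal sketch of the Intel naming, written in Rust rather than assembly. The type aliases Word, Doubleword and Quadword and the helper intel_sizes are hypothetical names introduced only for this illustration. In GNU assembler syntax for x86, the corresponding data directives are .word (2 bytes), .long (4 bytes) and .quad (8 bytes).

```rust
use std::mem::size_of;

// Hypothetical aliases that follow Intel's operand-size names.
type Word = u16; // 2 bytes
type Doubleword = u32; // 4 bytes
type Quadword = u64; // 8 bytes

/// Returns the sizes in bytes of (word, doubleword, quadword).
fn intel_sizes() -> (usize, usize, usize) {
    (
        size_of::<Word>(),
        size_of::<Doubleword>(),
        size_of::<Quadword>(),
    )
}

/// Splits a quadword into its low doubleword and its low word.
fn low_parts(q: Quadword) -> (Doubleword, Word) {
    (q as Doubleword, q as Word)
}

fn main() {
    let q: Quadword = 0x1122_3344_5566_7788;
    let (low_dword, low_word) = low_parts(q);
    println!("{:?}", intel_sizes());
    println!("{:#010x} {:#06x}", low_dword, low_word);
}
```

Whatever the "natural" word of the machine is, these three sizes stay the same, which is the point of having fixed type names.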

We've got different "word"s: program words, memory words, OS-specific words, architecture-specific words (program space word, flash word, eeprom word), even address words.
What size the word "word" refers to is just a matter of convention.
I usually find the size of the word by looking at the number of hex digits the context is using to show them. Intel's most common type, 4 digits (0x0000), is two bytes.
And for further information, even byte is a convention. In many systems in the past bytes have been 7 or 9 bits. Most architectures nowadays have 8-bit bytes. The correct name for an always-8-bit structure is an octet.

Related

Conflicting alignment rules for structs vs. arrays

As the title implies, the question is regarding alignment of aggregate types in x86-64 on Linux.
In our lecture, the professor introduced alignment of structs (and the elements thereof) with the attached slide. Hence, I would assume (in line with Wikipedia and other lecture material) that for any aggregate type, the alignment is determined by its largest member. Unfortunately, this does not seem to be the case in a former exam question, which said:
"Assuming that each page table [4kB, each PTE 64b] is stored in memory
at a “naturally aligned” physical address (i.e. an address which is an
integer multiple of the size of the table), ..."
How come that for a page table (which afaik is basically an array of 8 byte values in memory), alignment rules are not according to the largest element, but to the size of the whole table?
Clarification is greatly appreciated!
Felix
Why page tables are aligned on their size
At a given level of the virtual address translation, requiring the current page table to be aligned on its size in bytes speeds up the indexing operation.
The CPU doesn't need to perform an actual addition to locate the entry that holds the base of the next-level page table; it can scale the index and then substitute it for the lowest bits of the current-level base.
You can convince yourself this is indeed the case with a few examples.
It's not a coincidence x86s follow this alignment too.
For example, regarding the 4-level paging for 4KiB pages of the x86 CPUs, the Page Directory Pointer field of a 64-bit address is 9 bits wide.
Each entry in that table (a PDPTE) is 64 bits, so the table size is 512 * 8 = 4096 bytes (4KiB) and the last entry has offset 511 * 8 = 4088 (0xff8 in hex, so only 12 bits used at most).
The address of a Page Directory Pointer table is given by a PML4 entry; these entries don't specify the lower 12 bits of the base (which are used for other purposes), only the upper bits.
The CPU can then simply replace the lower 12 bits of the PML4 entry with the offset of the PDPTE, since we have seen that the offset fits in 12 bits.
This is fast and cheap to do in hardware (no carry, easy to do with registers).
As an analogy, assume that a country has ZIP codes made of two fields: a city code (C) and a block code (D), added together.
Also, assume that there can be at most 100 block codes for a given city, so D is 2 digits long.
Requiring that the city code is aligned on 100 (which means that the last two digits of C are zero) makes C + D like replacing the last two digits of C with D.
(1200 + 34 = 12|34).
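The "replace the low bits instead of adding" idea can be checked with a short Rust sketch. The function names and the example base address are hypothetical; the only facts taken from the text are the 8-byte entries, the 9-bit index and the 12-bit offset.

```rust
/// Size of one page-table entry in bytes (64 bits).
const ENTRY_SIZE: u64 = 8;
/// Mask selecting the low 12 bits, i.e. the offset inside a 4 KiB table.
const OFFSET_MASK: u64 = 0xFFF;

/// Address of entry `index`, computed with a full addition.
fn entry_addr_add(table_base: u64, index: u64) -> u64 {
    table_base + index * ENTRY_SIZE
}

/// Same address, computed by replacing the low 12 bits of the base.
/// Only equivalent when `table_base` is 4 KiB aligned and `index` < 512.
fn entry_addr_splice(table_base: u64, index: u64) -> u64 {
    (table_base & !OFFSET_MASK) | (index << 3)
}

fn main() {
    let base = 0x0012_3000; // hypothetical 4 KiB-aligned table base
    for index in [0u64, 1, 255, 511] {
        assert_eq!(entry_addr_add(base, index), entry_addr_splice(base, index));
    }
    println!("{:#x}", entry_addr_splice(base, 511));
}
```

With a base that is not 4 KiB aligned, the two functions disagree as soon as the addition carries out of the low 12 bits, which is why the hardware requires the alignment.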
Relation with the alignment of aggregates
A page table is not regarded as an aggregate, i.e. as an array of 8 byte elements. It is regarded as a type of its own, defined by the ISA of the CPU and that must satisfy the requirement of the particular part of the CPU that uses it.
The page walker finds it convenient to have each page table aligned on its size, so this is the requirement.
The alignment of aggregates is a set of rules used by the compiler to allocate objects in memory. It guarantees that the alignment of every element is satisfied, so that instructions can access any element without alignment penalties or faults.
The execution units for loads and stores are a different part of the CPU than the page walker, so different needs.
You should use the aggregates alignment to know how the compiler will align your structs and then check if that's enough for your use case.
Exceptions exist
Note that the professor went to some length to explain what alignment on a natural boundary means for page tables.
Exceptions exist. If you are told that a datum must be aligned on X, you can assume there's some hardware trick or simplification involved and try to work out which one, but in the end you just do the alignment and move on.
Margaret explained why page tables are special; I'm only answering this other part of the question:
"according to the largest element."
That's not the rule for normal structs either. You want max(alignof(member)) not max(sizeof(member)). So "according to the most-aligned element" would be a better way to describe the required alignment of a normal struct.
e.g. in the i386 System V ABI, double has sizeof = 8 but alignof = 4, so alignof(struct S1) = 4 (see footnote 1).
Even if the char member had been last, sizeof(struct S1) still has to be padded to a multiple of its alignof(), so all the usual invariants are maintained (e.g. sizeof( array ) = N * sizeof(struct S1)), and so stepping by sizeof always gets you to a sufficiently-aligned boundary for the start of a new struct.
Footnote 1: That ABI was designed before CPUs could efficiently load/store 8 bytes at once. Modern compilers try to give double and [u]int64_t 8-byte alignment, e.g. as globals or locals outside of structs. But the ABI's struct layout rules fix the layout based on the minimum guaranteed alignment for any double or int64_t object, which is alignof(T) = 4 for those types.
x86-64 System V has alignof(T) = sizeof(T) for all the primitive types, including the 8-byte ones. This makes atomic operations on any properly-aligned int64_t possible, for example, simplifying the implementation of C++20 std::atomic_ref to not have to check for sufficient alignment. (Why is integer assignment on a naturally aligned variable atomic on x86?)
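The max(alignof(member)) rule and the padding of sizeof to a multiple of alignof can be checked with a small Rust sketch. The structs S1 and S2 below are hypothetical stand-ins (a one-byte member plus a double, in both orders) and are not the struct from the original answer. #[repr(C)] asks Rust to apply the platform's C layout rules.

```rust
use std::mem::{align_of, size_of};

// Hypothetical struct: one-byte member first, then a double.
#[allow(dead_code)]
#[repr(C)]
struct S1 {
    c: u8,
    d: f64,
}

// Same members in the opposite order; padding moves to the end.
#[allow(dead_code)]
#[repr(C)]
struct S2 {
    d: f64,
    c: u8,
}

/// Returns (size, alignment) of `T` in bytes.
fn layout<T>() -> (usize, usize) {
    (size_of::<T>(), align_of::<T>())
}

/// Largest alignment among the members used above.
fn max_member_align() -> usize {
    align_of::<u8>().max(align_of::<f64>())
}

fn main() {
    for (name, (size, align)) in [("S1", layout::<S1>()), ("S2", layout::<S2>())] {
        // The struct alignment equals the most-aligned member's alignment,
        // and the size is rounded up to a multiple of that alignment.
        assert_eq!(align, max_member_align());
        assert_eq!(size % align, 0);
        println!("{name}: size = {size}, align = {align}");
    }
}
```

On x86-64 Linux both structs should report size 16 and alignment 8. The assertions avoid hard-coding those numbers because the alignment of an 8-byte float differs on targets such as i386.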

What is an efficient way to search a string for the first of a set of delimiters?

I have a UTF-8 encoded string and I would like to iterate through it, splitting it at one of multiple delimiters. I also need to know which delimiter matched, as each delimiter has a specific meaning.
An example usage:
algorithm("one, two; three") => Match("one")
algorithm(", two; three") => Delimiter(",")
algorithm(" two; three") => Match(" two")
algorithm("; three") => Delimiter(";")
algorithm(" three") => Match(" three")
Additional information:
My delimiters are all single ASCII characters, so optimized algorithms that require that are possible.
A solution that handles UTF-8 substrings would also be appreciated, but isn't required.
I plan to call the method many times and potentially in a tight loop, so an ideal algorithm would not need to allocate any memory.
The algorithm should return the first matching string or delimiter and I can handle restarting the search on the next iteration.
An ideal algorithm would innately know if it is returning a match or a delimiter, but it's possible to check that after the fact.
My target language is Rust, but I would appreciate answers in any language with a similar lower-level focus. Pseudocode is fine as well, as long as it recognizes the realities of UTF-8 text. Solutions that use esoteric hex tricks or SIMD instructions are also suitable, but may require more explanation for me to understand ^_^.
For a processor-specific solution, x86-64 processors with SSE4.2 contain the PCMPxSTRx family of instructions. One of the modes available with these instructions is Equal Any:
arg1 is a character set, arg2 is the string to search in. IntRes1[i] is set to 1 if arg2[i] is in the set represented by arg1
The basic algorithm is straightforward:
1. Fill an XMM register with up to 16 single bytes to search for (the needle).
2. Set the count of needle bytes in rax.
3. Calculate the memory address of the start of the string, including an offset.
4. Set the count of haystack bytes in rdx.
5. Call PCMPxSTRx with the appropriate control byte.
6. Check the result in ecx (for the PCMPxSTRI forms) or one of the condition flags.
7. If there was no match and there is still string left to search, increment the offset and loop.
There is a complication around page boundaries, however: the PCMPxSTRx instructions always read 16 bytes of data. This can cause a segmentation fault if the read extends into a protected page of memory. A solution is to align all the reads to 16-byte boundaries, so that no read crosses a page boundary, and to handle the leftover bytes at the beginning separately. Before starting the above algorithm, use something like:
1. Mask the address of the start of the string with ~0xF. This clears the low 4 bits and rounds the address down to a 16-byte boundary.
2. Use a PCMPxSTRM instruction (with a similar setup as the above algorithm) for the first 16 bytes. This returns a mask of matching characters. You can shift the mask to ignore leading characters that are not part of your string.
3. If there was no match and there is more string left to search, start the above algorithm.
You can see the complete example of this algorithm in my Rust library Jetscii. Inline assembly is used to call out to the PCMPxSTRx instructions.
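For comparison, here is a portable scalar sketch in Rust of the interface the question asks for. It is not the SSE4.2 approach described above and not Jetscii's code; Token, DELIMITERS and next_token are hypothetical names, and the delimiter set is assumed to be the comma and semicolon from the question's example.

```rust
/// One step of the scan: a run of ordinary text or a single delimiter.
#[derive(Debug, PartialEq)]
enum Token<'a> {
    Match(&'a str),
    Delimiter(char),
}

/// Hypothetical delimiter set; every entry must be ASCII.
const DELIMITERS: &[u8] = b",;";

/// Returns the first token of `input` and the unparsed remainder,
/// or `None` when `input` is empty. Does not allocate.
fn next_token(input: &str) -> Option<(Token<'_>, &str)> {
    let bytes = input.as_bytes();
    let first = *bytes.first()?;
    if DELIMITERS.contains(&first) {
        // An ASCII byte is a whole character, so index 1 is a char boundary.
        return Some((Token::Delimiter(first as char), &input[1..]));
    }
    // Every byte of a multi-byte UTF-8 sequence is >= 0x80, so it can never
    // compare equal to an ASCII delimiter; scanning bytes is therefore safe.
    let end = bytes
        .iter()
        .position(|b| DELIMITERS.contains(b))
        .unwrap_or(bytes.len());
    Some((Token::Match(&input[..end]), &input[end..]))
}

fn main() {
    let mut rest = "one, two; three";
    while let Some((token, tail)) = next_token(rest) {
        println!("{:?}", token);
        rest = tail;
    }
}
```

The caller restarts the search on the returned remainder, and the enum variant says directly whether a match or a delimiter was found. A SIMD version would replace only the byte scan inside next_token.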

What's the difference between a "bit" and "octet"?

What's the difference between a "bit" and "octet"? Some Python books, depending on the author, seem to use the terms interchangeably. I asked a PhD-level guy and he said there was a difference but didn't explain what the difference was.
A bit is a single binary digit.
An octet is a collection of 8 bits, sometimes called a "byte". There is no formal definition of a byte as 8 bits (though it is the generally accepted standard). The term octet is used when it is necessary to unambiguously specify that there are only 8 bits in the collection.
An octet is always eight bits. A byte is typically eight bits, or the width of a character in a given architecture. Some older computers represented characters in six bits. See http://en.wikipedia.org/wiki/Byte .
On virtually all current systems, an octet and a byte are the same.
A bit is just one 0 or 1. Eight bits make a byte.

Decoding 68k instructions

I'm writing an interpreted 68k emulator as a personal/educational project. Right now I'm trying to develop a simple, general decoding mechanism.
As I understand it, the first two bytes of each instruction are enough to uniquely identify the operation (with two rare exceptions) and the number of words left to be read, if any.
Here is what I would like to accomplish in my decoding phase:
1. read two bytes
2. determine which instruction it is
3. extract the operands
4. pass the opcode and the operands on to the execute phase
I can't just pass the first two bytes into a lookup table like I could with the first few bits in a RISC arch, because operands are "in the way". How can I accomplish part 2 in a general way?
Broadly, my question is: How do I remove the variability of operands from the decoding process?
More background:
Here is a partial table from section 8.2 of the Programmer's Reference Manual:
Table 8.2. Operation Code Map
Bits 15-12 Operation
0000 Bit Manipulation/MOVEP/Immediate
0001 Move Byte
...
1110 Shift/Rotate/Bit Field
1111 Coprocessor Interface...
This made great sense to me, but then I looked at the bit patterns for each instruction and noticed that there isn't a single instruction where bits 15-12 are 0001, 0010, or 0011. There must be some big piece of the picture that I'm missing.
This Decoding Z80 Opcodes site explains decoding explicitly, which is something I haven't found in the 68k programmer's reference manual or by googling.
I've decided to simply create a look-up table with every possible pattern for each instruction. It was my first idea, but I discarded it as "wasteful, inelegant". Now, I'm accepting it as "really fast".
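A minimal sketch in Rust of how such a look-up table could be generated from mask/value pairs instead of being written out by hand. The names Op, PATTERNS, build_table and moveq_operands are hypothetical. Only three encodings are listed (NOP = 0x4E71, RTS = 0x4E75, MOVEQ = 0111 rrr0 dddddddd); a real decoder would need a pattern for every instruction.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
enum Op {
    Nop,
    Rts,
    Moveq,
    Unknown,
}

/// (mask, value, operation): a word `w` matches when `(w & mask) == value`.
/// The mask blanks out the operand bits, which removes their variability.
const PATTERNS: &[(u16, u16, Op)] = &[
    (0xFFFF, 0x4E71, Op::Nop),   // NOP: fixed word, no operand bits
    (0xFFFF, 0x4E75, Op::Rts),   // RTS: fixed word, no operand bits
    (0xF100, 0x7000, Op::Moveq), // MOVEQ: register and data bits masked out
];

/// Builds a 65,536-entry table indexed by the first instruction word.
fn build_table() -> Vec<Op> {
    (0..=u16::MAX)
        .map(|w| {
            PATTERNS
                .iter()
                .find(|(mask, value, _)| (w & mask) == *value)
                .map_or(Op::Unknown, |&(_, _, op)| op)
        })
        .collect()
}

/// Extracts (data register number, sign-extended 8-bit immediate) from MOVEQ.
fn moveq_operands(w: u16) -> (u8, i8) {
    (((w >> 9) & 0x7) as u8, (w & 0xFF) as u8 as i8)
}

fn main() {
    let table = build_table();
    let word: u16 = 0x76FF; // MOVEQ #-1, D3
    println!("{:?} {:?}", table[word as usize], moveq_operands(word));
}
```

Decoding then costs one indexed load per instruction, and operand extraction moves into the per-operation code, so the table only has to answer "which instruction is this?".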

Does a string's length equal the byte size?

Exactly that: Does a string's length equal the byte size? Does it depend on the language?
I think it does, but I just want to make sure.
Additional Info: I'm just wondering in general. My specific situation was PHP with MySQL.
As the answer is no, that's all I need to know.
Nope. A zero-terminated string has one extra byte. A Pascal string (the Delphi shortstring) has an extra byte for the length. And Unicode strings can have more than one byte per character.
With Unicode it depends on the encoding. It could be 2 or 4 bytes per character, or even a mix of 1, 2 and 4 bytes.
It entirely depends on the platform and representation.
For example, in .NET a string takes two bytes in memory per UTF-16 code unit. However, surrogate pairs require two UTF-16 code units for a full Unicode character in the range U+10000 to U+10FFFF. The in-memory form also has an overhead for the length of the string and possibly some padding, as well as the normal object overhead of a type pointer etc.
Now, when you write a string out to disk (or the network, etc) from .NET, you specify the encoding (with most classes defaulting to UTF-8). At that point, the size depends very much on the encoding. ASCII always takes a single byte per character, but is very limited (no accents etc); UTF-8 gives the full Unicode range with a variable encoding (all ASCII characters are represented in a single byte, but others take up more). UTF-32 always uses exactly 4 bytes for any Unicode character - the list goes on.
As you can see, it's not a simple topic. To work out how much space a string is going to take up you'll need to specify exactly what the situation is - whether it's an object in memory on some platform (and if so, which platform - potentially even down to the implementation and operating system settings), or whether it's a raw encoded form such as a text file, and if so using which encoding.
It depends on what you mean by "length". If you mean "number of characters" then, no, many languages/encoding methods use more than one byte per character.
Not always, it depends on the encoding.
There's no single answer; it depends on language and implementation (remember that some languages have multiple implementations!)
Zero-terminated ASCII strings occupy at least one more byte than the "content" of the string. (More may be allocated, depending on how the string was created.)
Non-zero-terminated strings use a descriptor (or similar structure) to record length, which takes extra memory somewhere.
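Unicode strings in languages that use UTF-16 internally (e.g. Java and C#) use two bytes per code unit, so characters outside the Basic Multilingual Plane take four bytes.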
Strings in an object store may be referenced via handles, which adds a layer of indirection (and more data) in order to simplify memory management.
You are correct. If you encode as ASCII, there is one byte per character. Otherwise, it is one or more bytes per character.
In particular, it is important to know how this affects substring operations. If you don't have one byte per character, does s[n] get the nth byte or the nth char? Getting the nth char takes time proportional to n, instead of the constant time it takes with one byte per character.
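The difference between the three notions of "length" can be seen in a few lines of Rust. The helper name lengths is hypothetical. Rust strings are UTF-8, so len() counts bytes; the UTF-16 figure corresponds to what .NET or Java would report as the string length.

```rust
/// Returns (UTF-8 bytes, UTF-16 code units, Unicode scalar values) for `s`.
fn lengths(s: &str) -> (usize, usize, usize) {
    (s.len(), s.encode_utf16().count(), s.chars().count())
}

fn main() {
    // "abc", "héllo" and an emoji outside the Basic Multilingual Plane.
    for s in ["abc", "h\u{e9}llo", "\u{1F600}"] {
        println!("{:?} -> {:?}", s, lengths(s));
    }
    // Finding the nth character has to walk the string from the start,
    // so it is O(n) rather than O(1).
    let third = "h\u{e9}llo".chars().nth(2);
    println!("{:?}", third);
}
```

Only for pure ASCII text do all three numbers agree. The emoji needs 4 bytes in UTF-8 and a surrogate pair (2 code units) in UTF-16, yet it is a single character.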
