Are there 16 registers in x64?

On the one hand I read that x64 CPUs have 16 registers, but when I took a look at the list, a lot seemed to be missing, like the EFLAGS register. Why is that?
Why is it not counted in the register total? It seems like a contradiction to me.
https://learn.microsoft.com/en-us/windows-hardware/drivers/debugger/x64-architecture

It has 16 general-purpose registers, which can be the target of most instructions and be used as address pointers and array indices. The flags register, program counter, floating-point registers, 8 MMX registers and 16 SSE registers all require different sets of instructions to operate and can't be used as address pointers or array indices.

From the link you provided, there is an excerpt:
x64 extends x86's 8 general-purpose registers to be 64-bit, and adds 8 new 64-bit registers.
As you can see, it isn't talking about all registers here; it's only talking about general-purpose registers (the list after this excerpt, which I think you are referring to, also shows only general-purpose registers). As the name implies, these registers have no specific purpose and are mostly used to store temporary data or for addressing. Flags do not count as general-purpose registers: they each have a specific purpose and you can't access them directly. Instead, they are set through various instructions and are checked mostly by conditional instructions. Note, however, that the page does mention the flags register; it just isn't counted among the general-purpose registers:
The instruction pointer, eip, and flags register have been extended to 64 bits (rip and rflags, respectively) as well.
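
To make that concrete, here is a minimal sketch (assuming GCC or Clang on x86-64): RFLAGS has no mov-style encoding, so the usual way to inspect it is to push it with pushfq and pop it into a general-purpose register.

#include <stdio.h>
#include <stdint.h>

/* Read RFLAGS indirectly: pushfq puts it on the stack, popq moves it
   into a general-purpose register that C can see. */
static uint64_t read_rflags(void)
{
    uint64_t flags;
    __asm__ volatile ("pushfq\n\tpopq %0" : "=r"(flags));
    return flags;
}

int main(void)
{
    uint64_t f = read_rflags();
    printf("RFLAGS = 0x%llx, ZF = %d\n",
           (unsigned long long)f, (int)((f >> 6) & 1)); /* bit 6 is ZF */
    return 0;
}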

What is the modern usage of the global descriptor table (GDT)?

After a long read, I am really confused.
From what I read:
A modern OS does not use segments at all.
The GDT is used to define a segment in memory (including constraints).
The page table has a supervisor bit that indicates whether the current location is for the kernel.
Wikipedia says that "The GDT is still present in 64-bit mode; a GDT must be defined but is generally never changed or used for segmentation."
Why do we need it at all? And how does Linux use it?
A modern OS does not use segments at all.
A modern OS (for 64-bit 80x86) still uses segment registers; it's just that their use is mostly hidden from user-space (and most user-space code can ignore them). Specifically: the CPU determines whether code is 64-bit (or 32-bit or 16-bit) from whatever the OS loads (from the GDT or LDT) into CS; interrupts still save CS and SS for the interrupted code (and load them again at iret); GS and/or FS are typically used for thread-local and/or CPU-local storage; etc.
The GDT is used to define a segment in memory (including constraints).
Code and data segments are just one of the things the GDT is used for. The other main use is defining where the Task State Segment is (which is used to find the IO port permission map, the values to load into CS, SS and RSP when there's a privilege-level change caused by an interrupt, etc.). It's also still possible for 64-bit code (and 32-bit code/processes running under a 64-bit kernel) to use call gates defined in the GDT, but most operating systems don't use that feature for 64-bit code (they use syscall instead). A sketch of what a single GDT entry looks like follows below.
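
Here is a minimal sketch of one 8-byte GDT descriptor, following the layout documented in the Intel SDM (the struct and field names are illustrative, not a real kernel's):

#include <stdint.h>

/* One 8-byte segment descriptor as stored in the GDT. */
struct gdt_descriptor {
    uint16_t limit_low;        /* limit bits 0..15  */
    uint16_t base_low;         /* base  bits 0..15  */
    uint8_t  base_mid;         /* base  bits 16..23 */
    uint8_t  access;           /* present bit, DPL, type (code/data, R/W/X) */
    uint8_t  limit_high_flags; /* limit bits 16..19 + flags (G, D/B, L) */
    uint8_t  base_high;        /* base  bits 24..31 */
} __attribute__((packed));

/* A flat 64-bit kernel code segment: in long mode the base and limit
   are ignored; only the access byte and the L (long mode) flag matter. */
static const struct gdt_descriptor kernel_code64 = {
    .limit_low        = 0xFFFF,
    .access           = 0x9A,  /* present, DPL=0, executable, readable */
    .limit_high_flags = 0xAF,  /* G=1, L=1, limit bits 16..19 = 0xF */
};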
The page table has a supervisor bit that indicates whether the current location is for the kernel.
Yes. The page table's supervisor bit determines whether code running at CPL=3 can access the page (or whether the code must be at CPL=2, CPL=1 or CPL=0 to access it).
Wikipedia says that "The GDT is still present in 64-bit mode; a GDT must be defined but is generally never changed or used for segmentation."
Yes, Wikipedia is right. Typically an OS will set up a GDT early during boot (for the TSS, CS, SS, etc.) and then have no reason to modify it after boot; and the segment registers aren't used for "segmented memory protection" (but are used for other things: determining code size, whether an interrupt handler should return to CPL=0 or not, etc.).

networking system call multiplexing on x86 but not on x64

I was reading an article on how networking-related system calls are made on x86, and I saw that the calls were multiplexed through a single system call, "socketcall". The reason for this additional level of hierarchy seems to be to conserve system call numbers.
Taking a quick look at x64, this does not seem to be the case. Why is this so? Each register in an x86 processor is 32 bits long and should have no trouble storing larger system call numbers, so what is the reason socketcall was not carried over to x64?
Pure speculation, but on architectures with a small number of registers, like x86, functions beyond a certain number of parameters cannot efficiently pass all of them directly in registers (for x86 system calls the limit is 6). For example, sendto and recvfrom take 6 parameters, plus 1 for the syscall number. At that point it is more efficient to pass a pointer to an array of longs; as for the calls with fewer parameters than the threshold, I am guessing it was a matter of convenience and code sharing between related functions. The sketch below contrasts the two conventions.
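
A minimal sketch of the difference (assuming glibc on Linux; the constant 1 for SYS_SOCKET comes from linux/net.h). On 32-bit x86, socket() historically reduced to one multiplexed syscall number plus a small operation code and a pointer to an argument array; on x86-64 each networking call has its own number.

#define _GNU_SOURCE
#include <stdio.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <sys/socket.h>

int main(void)
{
#ifdef SYS_socketcall            /* 32-bit x86: multiplexed entry point */
    long args[3] = { AF_INET, SOCK_STREAM, 0 };
    long fd = syscall(SYS_socketcall, 1 /* SYS_SOCKET */, args);
#else                            /* x86-64: dedicated syscall number */
    long fd = syscall(SYS_socket, AF_INET, SOCK_STREAM, 0);
#endif
    printf("socket fd = %ld\n", fd);
    return 0;
}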

great size pointer in gcc

I want to define a large pointer (64-bit or 128-bit) in gcc that does not depend on the platform.
I think there is something like __ptr128 or __ptr64 in MSDN.
sizeof(__ptr128) is 16 bytes.
sizeof(__ptr64) is 8 bytes.
Is it possible?
It could be useful when you use kernel functions in a 64-bit OS that require 8-byte pointer arguments, but you have a 32-bit application which uses 32-bit addresses and you want to call those kernel functions.
Your question makes no sense. Pointers, by definition, are a memory address of something; the size must depend upon the platform. How would you dereference a 128-bit pointer on a hardware platform supporting 64-bit addressing?!
You can create 64- or 128-bit values, but a pointer is directly related to the memory addressing scheme of the underlying hardware.
EDIT
With your additional statement, I think I see what you're trying to do. Unfortunately, I doubt it's possible. If the kernel function you want to use takes a 64-bit pointer argument, it's highly likely to be a 64-bit function (unless you're developing for some unusual hardware).
Even though it's technically possible to mix 64-bit instructions into a 32-bit executable, no compiler will actually let you do this. A 64-bit API call will use 64-bit code, 64-bit registers and a 64-bit stack - it would be extremely awkward for the compiler and operating system to manage arbitrary switching from a 32-bit environment to a 64-bit environment.
You should look at finding the equivalent API for a 32-bit environment. Perhaps you could post the kernel function prototype (name+parameters) you want to use and someone could help you find a better solution.
Just so there's no confusion, __ptr64 in MSDN is not platform-independent:
On a 32-bit system, a pointer declared with __ptr64 is truncated to a 32-bit pointer.
Can't comment, but the statement that you can't use 64-bit instructions in a "32-bit executable" is misleading, since the definition of "32-bit executable" is subject to interpretation. If you mean an executable that uses 32-bit pointers, then nothing says you can't use instructions that manipulate 64-bit values while using 32-bit pointers. The processor doesn't know the difference.
Linux even supports a mode where you have a 32-bit userspace and a 64-bit kernel space. Thus each app has access to 4GB of RAM, but the system can access much more. This keeps the size of your pointers down to 4 bytes while not restricting the use of 64-bit data manipulation, as the sketch below illustrates.
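
A minimal sketch of that point, assuming GCC on Linux compiled with -m32: the pointers are 4 bytes, yet 64-bit arithmetic still works because the compiler emits register-pair operations.

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    uint64_t big = 0x123456789ABCDEF0ULL;
    big += 0x1000000000ULL;          /* 64-bit add even on a 32-bit target */
    printf("sizeof(void *) = %zu, big = 0x%llx\n",
           sizeof(void *), (unsigned long long)big);
    return 0;
}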
I'm late to the party, but the question makes quite a lot of sense on embedded platforms.
If you combine a CPU with some additional accelerators in the same SoC, they don't necessarily have the same address space or address-space size.
For the firmware in the accelerator you would want pointers that cover its address space from both the CPU's and the accelerator's perspective. They are not necessarily the same size.
For example, with a 64-bit CPU and a 32-bit accelerator, the pointer for the firmware can cover a 32-bit address space while the pointer for the CPU covers a 64-bit address space. C does not have two or more void * types depending on the address space you want to talk to.
People generally solve this by casting void * to uintN_t with N as large as needed and passing that around between the different parts of the system, as sketched below.
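
A minimal sketch of that workaround (the dev_addr_t name and the 64-bit width are illustrative assumptions; pick N to cover the widest address space in the system):

#include <stdint.h>

typedef uint64_t dev_addr_t;   /* hypothetical cross-core address type */

/* Widen a host pointer into the shared fixed-width representation. */
static dev_addr_t to_dev_addr(void *p)
{
    return (dev_addr_t)(uintptr_t)p;   /* via uintptr_t, so no truncation */
}

/* Narrow it back; only valid on the side that owns the address. */
static void *from_dev_addr(dev_addr_t a)
{
    return (void *)(uintptr_t)a;
}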
There is no such thing in gcc, because gcc was not designed around such embedded architectures. There are architectures where multiple pointer sizes exist, for example the M16C: RAM has 16-bit addresses and ROM (flash) has 20-bit addresses in the same address space. Performance and size usage are better with the smaller pointers.

program life in terms of paged segmentation memory

I have a confused notion of the process of segmentation & paging in x86 Linux machines. I'll be glad if someone clarifies all the steps involved from start to end.
x86 uses a paged segmentation memory technique for memory management.
Can anyone please explain what happens from the moment an executable ELF-format file is loaded from hard disk into main memory to the time it dies? When compiled, the executable has different sections in it (text, data, stack, heap, bss). How will these be loaded? How will they be set up under the paged segmentation memory technique?
I want to know how the page tables get set up for the loaded program, how the GDT gets set up, and how the registers are loaded. Also, why is it said that the logical addresses processed by the segmentation unit of the MMU are 48 bits (a 16-bit segment selector + a 32-bit offset) when it is a 32-bit machine? How are the other 16 bits stored? Anything accessed from RAM is 32 bits or 4 bytes, so how are the remaining 16 bits accessed (to be loaded into the segment registers)?
Thanks in advance. The question touches a lot of things, but I wanted clarification about the entire life cycle of an executable. I'll be glad if someone answers and starts a discussion on this.
Unix has traditionally implemented protection via paging. The 286+ provides segmentation, and the 386+ provides paging. Everyone uses paging; few make any real use of segmentation.
In x86, every memory operand has an implicit segment (so the address is really 16 bit selector + 32 bit offset), depending on the register used. So if you access [ESP + 8] the implied segment register is SS, if you access [ESI] the implied segment register is DS, if you access [EDI+4] the implied segment register is ES,... You can override this via segment prefix overrides.
Linux, and virtually every modern x86 OS, uses a flat memory model (or something similar). Under a flat memory model each segment provides access to the whole memory, with a base of 0 and a limit of 4Gb, so you don't have to worry about the complications segmentation brings about. Basically there are 4 segments: kernelspace code (RX), kernelspace data (RW), userspace code (RX), userspace data (RW).
An ELF file consists of some headers that point to "program segments" and "sections". Sections are used for linking; program segments are used for loading. Program segments are mapped into memory via mmap(), which sets up page-table entries with the appropriate permissions. The sketch below dumps those loadable segments for a given binary.
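
A minimal sketch (assuming a 64-bit ELF on Linux, using the system elf.h; error handling kept minimal): it prints the PT_LOAD entries, which are exactly what the loader mmap()s.

#include <elf.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    if (argc < 2) return 1;
    FILE *f = fopen(argv[1], "rb");
    if (!f) return 1;

    Elf64_Ehdr eh;
    if (fread(&eh, sizeof eh, 1, f) != 1) return 1;

    for (int i = 0; i < eh.e_phnum; i++) {
        Elf64_Phdr ph;
        fseek(f, eh.e_phoff + (long)i * eh.e_phentsize, SEEK_SET);
        if (fread(&ph, sizeof ph, 1, f) != 1) return 1;
        if (ph.p_type == PT_LOAD)          /* a loadable program segment */
            printf("LOAD vaddr=0x%lx filesz=0x%lx flags=%c%c%c\n",
                   (unsigned long)ph.p_vaddr, (unsigned long)ph.p_filesz,
                   (ph.p_flags & PF_R) ? 'R' : '-',
                   (ph.p_flags & PF_W) ? 'W' : '-',
                   (ph.p_flags & PF_X) ? 'X' : '-');
    }
    fclose(f);
    return 0;
}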
Now, older x86 CPUs' paging mechanism only provided RW access control (read permission implies execute permission), while segmentation provided RWX access control. The final permission takes both segmentation and paging into account (e.g. RW (data segment) + R (read-only page) = R (read only), while RX (code segment) + R (read-only page) = RX (read and execute)).
So there are some patches that provide execution prevention via segmentation: e.g. OpenWall provided a non-executable stack by shrinking the code segment (the one with execute permission), and having special emulation in the page fault handler for anything that needed execution from a high memory address (e.g. GCC trampolines, self-modifying code created on the stack to efficiently implement nested functions).
There's no such thing as paged segmentation, not in the official documentation at least. There are two different mechanisms working together and more or less independently of each other:
Translation of a logical address of the form 16-bit segment selector : 16/32/64-bit segment offset (that is, a pair of two numbers) into a 32/64-bit virtual address.
Translation of the virtual address into a 32/64-bit physical address.
Logical addresses are what your applications operate with directly. Then follows the above two-step translation into what the RAM will understand: physical addresses.
In the first step the GDT (or the LDT, depending on the selector value) is indexed by the selector to find the relevant segment's base address and size. The virtual address is the sum of the segment base address and the offset. The segment size and other things in the segment descriptor are there to provide protection.
In the second step the page tables are indexed by different parts of the virtual address, and the last indexed table in the hierarchy gives the final, physical address that goes out on the address bus for the RAM to see. Just like segment descriptors, page table entries contain not only addresses but also protection control bits. The sketch below shows how the virtual address is sliced up for that indexing.
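A minimal sketch of that slicing for x86-64 with 4 KiB pages and four levels (the address value is made up; the table names follow common usage):

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    uint64_t vaddr = 0x00007f1234567abcULL;   /* hypothetical user address */

    unsigned pml4 = (vaddr >> 39) & 0x1FF;    /* level 4 index (9 bits) */
    unsigned pdpt = (vaddr >> 30) & 0x1FF;    /* level 3 index */
    unsigned pd   = (vaddr >> 21) & 0x1FF;    /* level 2 index */
    unsigned pt   = (vaddr >> 12) & 0x1FF;    /* level 1 index */
    unsigned off  =  vaddr        & 0xFFF;    /* offset within the 4 KiB page */

    printf("PML4=%u PDPT=%u PD=%u PT=%u offset=0x%x\n",
           pml4, pdpt, pd, pt, off);
    return 0;
}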
That's about it on the mechanisms.
Now, in many x86 OSes the segment selectors used for applications are fixed: they are the same in all of them, they never change, and they point to segment descriptors that have base addresses of 0 and sizes equal to the possible maximum (e.g. 4GB in non-64-bit modes). Such a GDT setup effectively means that the first step does no useful work and the offset part of the logical address translates into a numerically equal virtual address.
This makes the segment selector values practically useless. They still have to be loaded into the CPU's segment registers (in non-64-bit modes into at least CS, SS, DS and ES), but beyond that point they can be forgotten about.
This all (except Linux-related details and the ELF format) is explained in or directly follows from Intel's and AMD's x86 CPU manuals. You'll find many more details there.
Perhaps read the Assembly HOWTO. When a Linux process starts executing an ELF executable via the execve system call, the kernel is essentially (sort of) mmap-ing some segments (and initializing registers, and a tiny part of the stack). Read also the SVR4 x86 ABI supplement and its x86-64 variant. Don't forget that a Linux process only sees the memory mappings of its own address space and only deals in virtual memory.
There are many good books on operating system kernels, notably by A. Tanenbaum and by M. Bach, and some on the Linux kernel.
NB: segment registers are nearly unused on Linux.

What are 16, 32 and 64-bit architectures?

What do 16-bit, 32-bit and 64-bit architectures mean in the case of microprocessors and/or operating systems?
In the case of microprocessors, does it mean the maximum size of the general-purpose registers, or the size of an integer, or the number of address lines, or the number of data bus lines, or what?
What do we mean by saying "DOS is a 16-bit OS", "Windows is a 32-bit OS", etc.?
My original answer is below, if you want to understand the comments.
New Answer
As you say, there are a variety of measures. Luckily, for many CPUs a lot of the measures are the same, so there is no confusion. Let's look at some data (sorry for the image upload; I couldn't see a good way to do a table in markdown).
As you can see, many columns are good candidates. However, I would argue that the size of the general purpose registers (green) is the most commonly understood answer.
When a processor varies widely in size across different registers, it will often be described in more detail, e.g. the Motorola 68k being described as a 16/32-bit chip.
Others have argued it is the instruction bus width (yellow), which also matches in the table. However, in today's world of pipelining, I would argue this is a much less relevant measure for most applications than the size of the general-purpose registers.
Original answer
Different people can mean different things, because as you say there are several measures. So, for example, someone talking about memory addressing might mean something different from someone talking about integer arithmetic. However, I'll try to define what I think is the common understanding.
My take is that for a CPU it means "The size of the typical register used for standard operations" or "the size of the data bus" (the two are normally equivalent).
I justify this with the following logic. The Z80 has an 8-bit accumulator and an 8-bit data bus, while having 16-bit memory-addressing registers (IX, IY, SP, PC) and a 16-bit memory address bus. And the Z80 is called an 8-bit microprocessor. So people must normally mean the main integer arithmetic size, or the data bus size, not the memory addressing size.
It is not the size of instructions, as the Z80 (again) had 1-, 2- and 3-byte instructions, though of course the multi-byte ones were read in multiple reads. In the other direction, the 8086 is a 16-bit microprocessor and can read 8-bit or 16-bit instructions. So I would have to disagree with the answers that say it is the instruction size.
For Operating systems, I would define it as "the code is compiled to run on a CPU of that size", so a 32bit OS has code compiled to run on a 32 bit CPU (as per the definition above).
How many bits a CPU "is" means what its instruction word length is.
On a 32-bit CPU, the instruction word length is 32 bits, meaning that this is the width the CPU can handle as instructions or data, often resulting in a bus line of that width.
For a similar reason, registers have the size of the CPU's word length, but you often have larger registers for different purposes.
Take the PDP-8 computer as an example. This was a 12-bit computer. Each instruction was 12 bits long. To handle data of the same width, the accumulator was also 12 bits.
But what made the PDP-8 a 12-bit machine was its instruction word length. It had twelve switches on the front panel with which it could be programmed, instruction by instruction.
This is a good example to break out of the 8/16/32 bit focus.
The bit count is also typically the size of the address bus, so it usually tells you the maximum addressable memory (e.g. 32 address lines give 2^32 bytes = 4 GiB).
There's a good explanation of this at Wikipedia:
In computer architecture, 32-bit integers, memory addresses, or other data units are those that are at most 32 bits (4 octets) wide. Also, 32-bit CPU and ALU architectures are those that are based on registers, address buses, or data buses of that size. 32-bit is also a term given to a generation of computers in which 32-bit processors were the norm.
Now let's talk about OSes.
With OSes, this is far less bound to the actual "bitness" of the CPU; it usually reflects how opcodes are assembled (for which word length of the CPU), how registers are addressed (you can't load a 32-bit value into a 16-bit register), and how memory is addressed. Think of it as the completed, compiled program: it is stored as binary instructions and therefore has to fit the CPU's word length. Task-wise, it has to be able to address the whole memory, otherwise it couldn't do proper memory management.
What it comes down to is that whether a program is 32-bit or 64-bit (an OS is essentially a program here) is a matter of how its binary instructions are stored and how registers and memory are addressed. All in all, this applies to all kinds of programs, not just OSes. That's why you have programs compiled for 32-bit or 64-bit.
The difference comes down to the width of the data a general-purpose register operates on: 16 bits can operate on 2 bytes at a time, 64 bits on 8 bytes. You can often increase the throughput of a processor by executing denser instructions per clock cycle.
The definitions are marketing terms more than precise technical terms.
In fuzzy technical terms they are more related to architecturally visible widths than to any real implementation register or bus width. For instance, the 68008 was classed as a 32-bit CPU, but had 16-bit registers in the silicon and only an 8-bit data bus and 20-odd address bits.
See http://en.wikipedia.org/wiki/64-bit#64-bit_data_models: the data models define what bitness means for the language, as the sketch below shows.
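A minimal sketch (C, any hosted compiler): the common data models differ only in the type widths this prints. ILP32 gives 4/4/4, LP64 (typical 64-bit Linux) gives 4/8/8, LLP64 (64-bit Windows) gives 4/4/8.

#include <stdio.h>

int main(void)
{
    printf("int=%zu long=%zu void*=%zu\n",
           sizeof(int), sizeof(long), sizeof(void *));
    return 0;
}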
The "OS is x-bit" phrase usually means that the OS was written for x-bit cpu mode, that is, 64-bit Windows uses long mode on x86-64, where registers are 64 bits and address space is 64-bits large and there are other distinct differences from 32-bits mode, where typically registers are 32-bits wide and address space is 32-bits large. On x86 a major difference between 32 and 64 bits modes is presence of segmentation in 32-bits for historical compatibility.
Usually the OS is written with CPU bitness in mind, x86-64 being a notable example of decades of backwards compatibility - you can have everything from 16-bit real-mode programs through 32-bits protected-mode programs to 64-bits long-mode programs.
Plus there are different ways to virtualise, so your program may run as if in 32-bits mode, but in reality it is executed by a non-x86 core at all.
When we talk about 2^n-bit architectures in computer science, we are basically talking about register width, address bus size or data bus size. The basic concept behind the term is that 2^n bits of data can be used to address or transport data of size 2^n.
As far as I know, technically it's the width of the integer pathways. I've heard of 16-bit chips with 32-bit addressing. However, in reality it is the address width: sizeof(void*) is 16 bits on a 16-bit chip, 32 bits on a 32-bit chip, and 64 bits on a 64-bit chip.
This leads to problems because C and C++ allow conversions between void* and integral types, which is only safe if the integral type is large enough (the same size as the pointer). This led to all sorts of unsafe code along the lines of
void* p = something;   /* 'something' stands in for any pointer value */
int i = (int)p;        /* truncates where int is 32 bits but void* is 64 */
which will horrifically crash and burn in 64-bit code (it works in 32-bit) because void* is now twice as big as int. The portable fix is sketched below.
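A minimal sketch of the portable alternative: intptr_t and uintptr_t (from stdint.h) are defined to be exactly wide enough to hold a pointer on whatever platform you compile for.

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    int x = 42;
    void *p = &x;

    intptr_t i = (intptr_t)p;   /* same width as the pointer, no truncation */
    void *q = (void *)i;        /* round-trips back to the original pointer */

    printf("%d\n", *(int *)q);  /* prints 42 */
    return 0;
}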
In most languages, you have to work hard to care about the width of the system you're working on.
