What is the corresponding register in the SPARC architecture for x86's CR3? - linux

I know that on the x86 architecture I can read the CR3 register in kernel context
to follow the kernel's page directory.
Now I am trying to do the same from Linux on the SPARC architecture.
How can I access the kernel's page directory on SPARC?
What is the register in SPARC corresponding to x86's CR3?
Are their paging mechanisms the same?
P.S. What about ARM? I have some documents about these, but I need more...
Thank you in advance.

On SPARC, TLB faults are handled in software, so there is nothing like CR3. You'll have to check the current process's data structure to find the required information, as sketched below.
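A minimal sketch of that lookup on Linux, assuming kernel context (current, struct mm_struct and init_mm are real kernel symbols; the helper name is mine):

    #include <linux/sched.h>     /* current, struct task_struct */
    #include <linux/mm_types.h>  /* struct mm_struct */

    /* Return the page global directory that x86 would reach via CR3.
     * Kernel threads have no mm of their own, so fall back to active_mm;
     * the kernel's own page tables live in init_mm.pgd (swapper_pg_dir).
     */
    static pgd_t *task_pgd(void)
    {
            struct mm_struct *mm = current->mm ? current->mm : current->active_mm;

            return mm->pgd;
    }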
ARM, on the other hand, uses hardware translation; the MMU is handled as a coprocessor, using MRC/MCR to access the Translation Table Base Register. See ARM's website for more information: http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0056d/BABIDADG.html
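For illustration, reading TTBR0 on an ARMv7 core looks roughly like this (privileged code only; the coprocessor encoding is from ARM's manuals, the function name is mine):

    /* MRC p15, 0, <Rt>, c2, c0, 0 reads Translation Table Base
     * Register 0 on ARMv7. */
    static inline unsigned long read_ttbr0(void)
    {
            unsigned long ttbr0;

            __asm__ __volatile__("mrc p15, 0, %0, c2, c0, 0" : "=r" (ttbr0));
            return ttbr0;
    }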

The SPARC specification as such does not mandate the use of an MMU nor require a specific implementation of one; it merely defines the interface between CPU and MMU (largely concerning the traps that MMU activity needs to generate).
That said - and I have to state here that I only know about Sun's / Fujitsu's SPARC CPUs, not about the embedded stuff (LEON and its predecessors) - SPARC CPUs as far back as the sun4 workstation CPUs of 1990 have had MMUs.
As with many non-mandated SPARC CPU features, control of the MMU happens through what's called Address Space Identifiers (ASIs).
These ASIs are a feature of the SPARC architecture that can best be described as a mix between x86 segmentation and memory-mapped registers. Their use changes what an "address" means to a SPARC CPU (just as the use of a segment register in x86 changes the meaning of an "address") - except that there's no configurable "descriptor table" behind ASI address ranges, but usually hardware-specific control registers. I.e. they're not "mapped" into the ordinary physical address space, but into alternate address ranges.
First - in the sun4, sun4c, sun4d and sun4m architectures (32-bit sparcv7), the MMU was called srmmu (SPARC reference MMU) and implemented a multi-level hardware table walk. This is deprecated and I don't remember off the top of my head what the control registers for it were.
When Sun created the sun4u architecture, a hardware-implemented translation table walk was considered both too high an overhead and too memory-intensive; the table-walking MMU implementation was therefore yanked completely, in favour of implementing most (but not all) of the MMU functionality in software. In particular, the only thing programmable into the hardware is the contents of the TLB, the translation lookaside buffer - meaning that if a mapping isn't readily cached in the TLB, an MMU miss trap occurs and the trap handler performs a table lookup, reprogramming the TLB at the end (so that re-issuing the instruction afterwards will succeed). That's where the name sfmmu (software MMU) comes from. The MMU is controlled through ASI_MMU, while the actual context register is CPU-specific ...
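As a concrete illustration, on UltraSPARC (sun4u) the primary context register is read through the data-MMU ASI; the ASI number (0x58) and register offset (0x08) below come from the UltraSPARC manuals referenced next, so treat them as CPU-specific assumptions rather than portable constants:

    /* Read the D-MMU primary context register on sun4u:
     * an ldxa (load from alternate space) through ASI_DMMU (0x58)
     * at virtual address 0x08. */
    static inline unsigned long read_primary_context(void)
    {
            unsigned long ctx;

            __asm__ __volatile__("ldxa [%1] 0x58, %0"
                                 : "=r" (ctx)
                                 : "r" (0x08UL));
            return ctx;
    }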
See, for reference:
pg. 351ff of the 2007 UltraSPARC architecture reference manual - this lists the ASI numbers involved in MMU control.
OpenSolaris source code: the use of ASI_MMU_CTX.
OpenSolaris source code: sfmmu_asm.s (beware of eyes bleeding and brain becoming mushy).
OpenSolaris source code: hat_sfmmu.c - that's the software-side hash chain / translation table walk; it might have the dubious honor of being the largest source file in the Solaris kernel ...
Re ARM: I suggest you ask that again, as a separate question. Over the existence of ARM, multiple implementations (both MMU-less and with MMU) have evolved (there is, for example, the ARMv7-instruction-set-based Cortex-A8/M3, with/without an MMU). The MMU specifications by ARM themselves are usually called VMSA (Virtual Memory Systems Architecture), and there are several revisions of it. This posting is already too long to contain more details ;-)

Related

What is the modern usage of the global descriptor table (GDT)?

After a long read, I am really confused.
From what I read:
Modern OSes do not use segments at all.
The GDT is used to define a segment in memory (including constraints).
The page table has a supervisor bit that indicates whether the current location is for the kernel.
Wikipedia says that "The GDT is still present in 64-bit mode; a GDT must be defined but is generally never changed or used for segmentation."
Why do we need it at all? And how does Linux use it?
Modern OSes do not use segments at all.
A modern OS (for 64-bit 80x86) still uses segment registers; it's just that their use is mostly hidden from user-space (and most user-space code can ignore them). Specifically: the CPU determines whether code is 64-bit (or 32-bit or 16-bit) from whatever the OS loads (from the GDT or LDT) into CS; interrupts still save CS and SS for the interrupted code (and load them again at iret); GS and/or FS are typically used for thread-local and/or CPU-local storage; etc.
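One of those hidden uses is visible from plain user code: with gcc on x86-64 Linux, the thread-local access below typically compiles to an %fs-relative load (a sketch; nothing here is an OS API):

    #include <stdio.h>

    /* __thread places 'counter' in thread-local storage; on x86-64
     * Linux the compiler addresses it relative to the FS segment base. */
    static __thread int counter;

    int main(void)
    {
            counter++;
            printf("counter = %d\n", counter);
            return 0;
    }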
The GDT is used to define a segment in memory (including constraints).
Code and data segments are just one of the things the GDT is used for. The other main use is defining where the Task State Segment is (which is used to find the IO port permission map, and the values to load into CS, SS and RSP when there's a privilege-level change caused by an interrupt, etc.). It's also still possible for 64-bit code (and 32-bit code/processes running under a 64-bit kernel) to use call gates defined in the GDT, but most operating systems don't use that feature for 64-bit code (they use syscall instead).
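For reference, the 64-bit TSS that such a GDT descriptor points at is laid out roughly like this (a sketch from the Intel manuals; the field names are descriptive, not taken from any particular kernel header):

    #include <stdint.h>

    struct tss64 {
            uint32_t reserved0;
            uint64_t rsp[3];       /* stacks loaded on a privilege change
                                      to rings 0-2 */
            uint64_t reserved1;
            uint64_t ist[7];       /* interrupt stack table entries */
            uint64_t reserved2;
            uint16_t reserved3;
            uint16_t iomap_base;   /* offset from the TSS start to the
                                      I/O permission bitmap */
    } __attribute__((packed));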
The page table has a supervisor bit that indicates if the current location is for the kernel.
Yes. The page table's supervisor bit determines if code running at CPL=3 can/can't access the page (or if the code must be CPL=2, CPL=1 or CPL=0 to access the page).
Wikipedia says that "The GDT is still present in 64-bit mode; a GDT must be defined but is generally never changed or used for segmentation."
Yes - Wikipedia is right. Typically an OS will set up a GDT early during boot (for the TSS, CS, SS, etc.) and then have no reason to modify it after boot; and the segment registers aren't used for "segmented memory protection" (but are used for other things - determining code size, whether an interrupt handler should return to CPL=0 or not, etc.).

Which type of memory model (i.e. flat / segmentation) is used by linux kernel?

I am reading about x86 protected-mode operation, and I have seen both the flat memory model and the segmented memory model.
If the Linux kernel uses the flat memory model, how does it protect critical data from access by unprivileged applications?
Linux generally uses neither. On x86, Linux has separate page tables for userspace processes and the kernel. The userspace page tables do not contain user mappings to kernel memory, which makes it impossible for user-space processes to access kernel memory directly.
Technically, "virtual addresses" on x86 pass through segmentation first (and are converted from logical addresses to linear addresses) before being remapped from linear addresses to physical addresses through the page tables. Except in unusual cases, segmentation won't change the resulting physical address in 64 bit mode (segmentation is just used to store traits like the current privilege level, and enforce features like SMEP).
One well known "unusual case" is the implementation of Thread Local Storage by most compilers on x86, which uses the FS and GS segments to define per logical processor offsets into the address space. Other segments can not have non-zero bases, and therefore cannot shift addresses through segmentation.
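On x86-64 Linux, the FS base behind that mechanism is even visible from user space via arch_prctl(2); a minimal sketch (error handling omitted):

    #define _GNU_SOURCE
    #include <asm/prctl.h>      /* ARCH_GET_FS */
    #include <sys/syscall.h>    /* SYS_arch_prctl */
    #include <unistd.h>
    #include <stdio.h>

    int main(void)
    {
            unsigned long fs_base = 0;

            /* Ask the kernel for this thread's FS segment base. */
            syscall(SYS_arch_prctl, ARCH_GET_FS, &fs_base);
            printf("FS base: %#lx\n", fs_base);
            return 0;
    }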

Linux uses Paging or Segmentation or Both? [duplicate]

I'm reading "Understanding Linux Kernel". This is the snippet that explains how Linux uses Segmentation which I didn't understand.
Segmentation has been included in 80x86 microprocessors to encourage programmers to split their applications into logically related entities, such as subroutines or global and local data areas. However, Linux uses segmentation in a very limited way. In fact, segmentation and paging are somewhat redundant, because both can be used to separate the physical address spaces of processes: segmentation can assign a different linear address space to each process, while paging can map the same linear address space into different physical address spaces. Linux prefers paging to segmentation for the following reasons:
Memory management is simpler when all processes use the same segment register values - that is, when they share the same set of linear addresses.
One of the design objectives of Linux is portability to a wide range of architectures; RISC architectures in particular have limited support for segmentation.
All Linux processes running in User Mode use the same pair of segments to address instructions and data. These segments are called user code segment and user data segment, respectively. Similarly, all Linux processes running in Kernel Mode use the same pair of segments to address instructions and data: they are called kernel code segment and kernel data segment, respectively. Table 2-3 shows the values of the Segment Descriptor fields for these four crucial segments.
I'm unable to understand the first and last paragraphs.
The 80x86 family of CPUs generates a real address by adding a base taken from a CPU register called a segment register to the offset produced by the instruction. Thus, by changing the segment register contents, you can change the physical addresses that the program accesses. Paging does something similar by mapping the same virtual address to different real addresses. Linux uses the latter - the segment registers for Linux processes always have the same, unchanging contents.
Segmentation and Paging are not at all redundant. The Linux OS fully incorporates demand paging, but it does not use memory segmentation. This gives all tasks a flat, linear, virtual address space of 32/64 bits.
Paging adds another layer of abstraction to memory address translation. With paging, linear memory addresses are mapped to pages of memory, instead of being translated directly to physical memory. Since pages can be swapped in and out of physical RAM, paging allows more memory to be allocated than is physically available. Only pages that are being actively used need to be mapped into physical memory.
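To make that translation concrete, here is a toy model of the classic 32-bit x86 two-level walk (4 KiB pages); read_phys() is a hypothetical helper standing in for a physical-memory read, and present/permission bits are ignored:

    #include <stdint.h>

    extern uint32_t read_phys(uint32_t paddr);  /* hypothetical */

    /* CR3 -> page directory entry -> page table entry -> frame + offset. */
    uint32_t translate(uint32_t cr3, uint32_t vaddr)
    {
            uint32_t pde = read_phys((cr3 & 0xFFFFF000u) + ((vaddr >> 22) << 2));
            uint32_t pte = read_phys((pde & 0xFFFFF000u) + (((vaddr >> 12) & 0x3FFu) << 2));

            return (pte & 0xFFFFF000u) | (vaddr & 0xFFFu);
    }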
An alternative to page swapping is segment swapping, but it is generally much less efficient given that segments are usually larger than pages.
Segmentation of memory is a method of allocating multiple chunks of memory (per task) for different purposes and allowing those chunks to be protected from each other. In Linux a task's code, data, and stack sections are all mapped to a single segment of memory.
The 32-bit processors do not have a mode bit for disabling segmentation, but the same effect can be achieved by mapping the stack, code, and data spaces to the same range of linear addresses. The 32-bit offsets used by 32-bit processor instructions can cover a four-gigabyte linear address space.
Additionally, the Intel documentation states:
A flat model without paging minimally requires a GDT with one code and one data segment descriptor. A null descriptor in the first GDT entry is also required. A flat model with paging may provide code and data descriptors for supervisor mode and another set of code and data descriptors for user mode.
This is the reason for having one pair of CS/DS for kernel-privilege execution (ring 0), and one pair of CS/DS for user-privilege execution (ring 3).
Summary: segmentation provides a means to isolate and protect sections of memory. Paging provides a means to allocate more memory than is physically available.
Windows uses the FS segment for thread-local storage.
Therefore, Wine has to use it, and the Linux kernel needs to support it.
Modern operating systems (i.e. Linux, other Unixen, Windows NT, etc.) do not use the segmentation facility provided by the x86 processor. Instead, they use a flat 32-bit memory model, in which each user-mode process has its own 32-bit virtual address space.
(Naturally, the widths are expanded to 64 bits on x86_64 systems.)
Intel first added protected-mode segmentation (descriptors, limits, privilege checks) with the 80286, and then paging with the 80386. Unix-like OSes typically use paging for virtual memory.
Anyway, since paging on x86 didn't support execute permissions until relatively recently (the NX bit), Openwall Linux used segmentation to provide non-executable stack regions: it set the code segment limit to a lower value than the other segments' limits, and did some emulation to support trampolines on the stack.

Linux Page Table Management and MMU

I have a question about the relationship between the Linux kernel and the MMU.
I understand that the Linux kernel manages page tables that map virtual memory addresses to physical memory addresses.
At the same time, the x86 architecture has an MMU that uses those page tables to translate virtual memory addresses to physical memory addresses.
If an MMU is present next to the CPU, does the kernel still need to take care of page tables?
This question may be stupid, but here is another one: if the MMU takes care of the memory space, who manages high memory and low memory? I believe the kernel knows the size of the virtual address space (4GB on 32-bit) and splits it into user space and kernel space.
Am I correct, or completely wrong?
Thanks a lot in advance!
The OS's and the MMU's page-management responsibilities are two sides of the same mechanism, one that lives on the boundary between architecture and microarchitecture.
The first side defines the "contract" between the hardware and the software that runs on it (in this case, the OS) - if you want to use virtual memory, you must build and maintain a page table as described in that contract.
The MMU side, on the other hand, is a hardware unit responsible for performing the actual address translation. This may or may not include hardware optimizations; these are usually hidden and may be implemented in various ways under the hood, as long as the hardware side of the contract is maintained.
In theory, the MMU could issue a full set of memory accesses for each translation (a page walk) in order to achieve the required behavior. However, since it's a performance-critical element, most MMUs optimize this by caching the results of previous page walks in the TLB, just as a cache stores the results of previous accesses (indeed, on some implementations the caches themselves may also hold some of the accesses to the page table, since it usually resides in cacheable memory). The MMU can manage multiple TLBs (most implementations separate the ones for data and code pages, and some have second-level TLBs), and provide the translation from there without you noticing anything except the faster access time.
It should also be noted that the hardware must guard against many corner cases that can harm the coherency of such TLB "caching" of previous translations, for example page aliasing, or remaps while a page is in use. On some machines, the nastier cases even require a massive flush flow called a TLB shootdown.
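On x86, the single-CPU building block of such a flush is the invlpg instruction; a shootdown is essentially this plus an inter-processor interrupt asking every other CPU to do the same (a sketch; privileged code only):

    /* Invalidate the TLB entry covering one virtual address. */
    static inline void invlpg(void *vaddr)
    {
            __asm__ __volatile__("invlpg (%0)" : : "r" (vaddr) : "memory");
    }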

Questions on Linux kernel internals

I was reading "Linux device drivers, 3rd edition" and faced a few kernel items I don't quite understand. Hope gurus on this forum will help me out.
1) Does the Linux kernel internally operate with virtual or physical addresses? What especially confuses me is that there are several types of addresses (logical, virtual, bus and physical) and they are all valid and operable by the kernel.
2) Is that correct that CPU instructions can't directly address data stored in peripheral devices and therefore addressable memory is used, i.e. buffers, for these purposes?
4) Can a process sleep when requesting semaphore (which has a value 0) and has to wait for it?
4) Atomic operations -- are these guaranteed by specific CPU instructions?
Some special bits (e.g. the initial bootstrap) operate in real mode on physical addresses, but most kernel code (all the parts written in C) runs in a virtual address space. You'll see pointers into other address spaces carrying annotations that remind you not to dereference them directly.
There are special functions for performing copies between various other address spaces (e.g. PCI devices' configuration space) and the kernel's memory. Depending on the architecture, some parts are directly mappable.
Yes.
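For memory-mapped devices, the usual kernel pattern is to map the device's registers and go through accessor functions rather than plain pointers; a sketch (BAR_PHYS and REG_OFF are hypothetical placeholders for a real device's addresses):

    #include <linux/errno.h>
    #include <linux/io.h>   /* ioremap, readl, writel */

    static int poke_device(void)
    {
            void __iomem *regs = ioremap(BAR_PHYS, 0x1000);
            u32 v;

            if (!regs)
                    return -ENOMEM;
            v = readl(regs + REG_OFF);       /* read a device register */
            writel(v | 1, regs + REG_OFF);   /* set bit 0 and write back */
            iounmap(regs);
            return 0;
    }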
Not necessarily. On architectures lacking atomic operations, atomicity can be guaranteed by disabling interrupts (which is sufficient on a single-core machine) and, on multiprocessors, halting all other processors.
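Where the CPU does provide them, the guarantee comes from instructions such as x86's lock-prefixed read-modify-writes; a sketch of what an atomic increment boils down to (gcc-style inline assembly):

    /* The lock prefix makes the read-modify-write of *v indivisible
     * across all CPUs. */
    static inline void atomic_inc_sketch(volatile int *v)
    {
            __asm__ __volatile__("lock; addl $1, %0" : "+m" (*v));
    }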
Three - yes.
Four - architecture-dependent.
