I already know that the mprotect() syscall has four protection modes in BSD, but my question is how this protection is implemented (in hardware or in software)?
Say we set the protection of specific pages to PROT_NONE: does it really depend on the hardware I'm using, or is it some software trick of setting flags on that page's page-table entry?
It seems that hardware protection depends on the MMU we have, but I'm not sure about it.
You can find more information about mprotect and paging at:
BSD man page
Paging - Wiki
Page protection is implemented in hardware with software assistance. Basically, you want to achieve the following:
Enter kernel context automatically when a user process tries to access a specific memory page (the hardware is responsible for this).
Let kernel code do something to the accessing process in order to uphold the mprotect guarantee (this happens in software, invoked from the hardware trap handler triggered in step 1).
And yes, without an MMU step 1 would not work, so on uClinux (a version of Linux designed to support processors without an MMU) mprotect is not implemented (as it would be impossible to invoke the code from step 2 transparently).
I am working on a QEMU-KVM hypervisor, and I'd like to understand the purpose of tdp_page_fault.
In fact, I need to count the page faults caused by the virtual machine's execution, and it seems that tdp_page_fault handles more page faults than just those. So what is tdp_page_fault used for?
On simple processors, a lot of the kernel's work needs to be emulated because the guest is actually running in user space. On high-end x86 we can do TDP (two-dimensional paging), where the page-table lookups for both the guest-virtual->guest-physical and the guest-physical->host-physical translations are done in hardware, which is much faster than emulation.
tdp_page_fault handles a page fault in the guest address space.
I mean when data is updated directly in memory, without using write().
In Linux, I thought all the data specified in an msync call was flushed.
But on Windows, the documentation of FlushViewOfFile says "writing of dirty pages", so somehow the OS knows which pages have been updated.
How does that work? Do we have to use WriteFile to update mapped memory?
If we use write() in Linux, does msync only sync dirty pages?
On most (perhaps all) modern-day computers running either Linux or Windows, the CPU keeps track of dirty pages on the operating system's behalf. This information is stored in the page table.
(See, for example, section 4.8 of the Intel® 64 and IA-32 Architectures Software Developer's Manual, Volume 3A, and section 5.4.2 of the AMD64 Architecture Programmer's Manual, Volume 2.)
If that functionality isn't available on a particular CPU, an operating system could instead use page faults to detect the first write to a page, as described in datenwolf's answer.
When flushing pages (i.e. cleaning them up) the OS internally removes the "writeable" flag. After that, when a program attempts to write to a memory location in such a page, the kernel's page fault handler is invoked. The page fault handler then sets the page access permissions to allow the actual write and marks the page dirty, then returns control to the program to let it perform the actual write.
I know that on the x86 architecture I can read the CR3 register in kernel context
to follow the kernel's page directory.
Now I am trying to do the same in Linux on the SPARC architecture.
How can I access the kernel's page directory on SPARC?
What is the SPARC register corresponding to x86's CR3?
Are their paging mechanisms the same?
P.S. What about ARM? I have some documents about these, but I need more...
Thank you in advance.
On SPARC, TLB faulting is handled in software, so there is nothing like CR3. You'll have to check the current process's data structures to find the required information.
ARM on the other hand uses hardware translation, the MMU is handled as a coprocessor using MRC/MCR for accessing the Translation Table Base Register. See ARMs website for more information: http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0056d/BABIDADG.html
The SPARC specification as such does not mandate the use of an MMU, nor does it require a specific MMU implementation; it merely defines the interface between the CPU and the MMU (largely concerned with the traps that need to be generated by MMU activity).
That said - and I have to state here that I only know about Sun's/Fujitsu's SPARC CPUs, not about the embedded stuff (LEON and its predecessors) - SPARC CPUs as far back as the sun4 workstation CPUs of 1990 have had MMUs.
As with many non-mandated SPARC CPU features, control of the MMU happens through what's called Address Space Identifiers (ASIs).
These ASIs are a feature of the SPARC architecture that can best be described as a mix between x86 segmentation and memory-mapped registers. Their use changes what an "address" means to a SPARC CPU (just like the use of a segment register in x86 changes the meaning of an "address") - except that there is no configurable "descriptor table" behind ASI address ranges, but usually hardware-specific control registers. I.e., they are not "mapped" into the ordinary physical address space, but into alternate address ranges.
First - in the sun4, sun4c, sun4d and sun4m architectures (32-bit sparcv7), the MMU was called srmmu (SPARC Reference MMU) and implemented a two-level hardware table walk. This is deprecated, and I don't remember off the top of my head what the control registers for it were.
When Sun created the sun4u architecture, a hardware-implemented translation-table walk was considered both too high an overhead and too memory-intensive; therefore the table-walking MMU implementation was dropped entirely in favour of implementing most (but not all) of the MMU functionality in software. In particular, the only thing programmable in the hardware is the contents of the TLB (translation lookaside buffer) - meaning that if a mapping isn't readily cached in the TLB, an MMU miss trap occurs, and the trap handler performs a table lookup, reprogramming the TLB at the end (so that re-issuing the instruction afterwards succeeds). That's where the name sfmmu (Software MMU) comes from. The MMU is controlled through ASI_MMU, while the actual context register is CPU-specific ...
See, for reference:
p. 351ff of the 2007 UltraSPARC architecture reference manual - this lists the ASI numbers involved in MMU control.
OpenSolaris sourcecode, use of ASI_MMU_CTX.
OpenSolaris sourcecode, sfmmu_asm.s (beware of eyes bleeding and brain becoming mushy)
OpenSolaris sourcecode, hat_sfmmu.c
that's the software-side hash-chain / translation-table walk; it might have the dubious honor of being the largest source file in the Solaris kernel ...
Re ARM: I suggest you ask that again as a separate post. Over the existence of ARM, multiple implementations (both MMU-less and with an MMU) have evolved (there are, for example, the ARMv7-instruction-set-based Cortex-A8 with an MMU and Cortex-M3 without one). The MMU specifications by ARM themselves are usually called VMSA (Virtual Memory Systems Architecture), and there are several revisions of it. This posting is already too long to contain more details ;-)
I read the following statement:
The x86 architecture includes a specific segment type called the Task State Segment (TSS), to store hardware contexts. Although Linux doesn't use hardware context switches, it is nonetheless forced to set up a TSS for each distinct CPU in the system.
I am wondering:
Why doesn't Linux use the hardware support for context switches?
Isn't the hardware approach much faster than the software approach?
Is there any OS which does take advantage of hardware context switches? Does Windows use it?
Lastly, as always, thanks for your patience and replies.
-----------Added--------------
http://wiki.osdev.org/Context_Switching has some explanation.
People as confused as me could take a look at it. 8^)
The x86 TSS is very slow for hardware multitasking and offers almost no benefit compared to software task switching. (In fact, I think doing it manually beats the TSS most of the time.)
The TSS is also known for being annoying and tedious to work with, and it is not portable, even to x86-64. Linux aims to work on multiple architectures, so they probably opted for software task switching because it can be written in a machine-independent way. Also, software task switching provides much more control over what can be done and is generally easier to set up than the TSS.
I believe Windows 3.1 used the TSS, but at least the NT >5 kernel does not. I do not know of any Unix-like OS that uses the TSS.
Do note that the TSS is mandatory. What OSs do, though, is create a single TSS (per processor), and every time they need to switch tasks, they just reuse that single TSS. Also, the only fields of the TSS used by software task switching are ESP0 and SS0, which are used to get from ring 3 to ring 0 on interrupts. Without a TSS, there would be no known ring-0 stack, which would of course lead to a GPF and eventually a triple fault.
Linux used to use hardware-based switching, in the pre-1.3 timeframe IIRC. I believe software-based context switching turned out to be faster, and it is more flexible.
Another reason may have been minimizing architecture-specific code. The first port of Linux to a non-x86 architecture was Alpha. Alpha didn't have a TSS, so more code could be shared if all architectures used software switching. (Just a guess.) Unfortunately, the kernel changelogs for the 1.2-1.3 kernel period are not well preserved, so I can't be more specific.
Linux doesn't use a segmented memory model, so this segmentation specific feature isn't used.
x86 CPUs have many different kinds of hardware support for context switching, so the distinction isn't hardware vs. software, but rather how an OS uses the various hardware features available. It isn't necessary to use them all.
Linux is so efficiency-focused that you can bet someone has profiled every possible option, and that the options currently used are the best available compromise.
I am wondering whether Intel's processors provide instructions in their instruction set
to turn the multithreading or hyperthreading capability on and off. Basically, I want to
know whether an operating system can control these features via instructions somehow.
Thank you so much.
Mareike
Most operating systems have a facility for changing a process's CPU affinity, thereby restricting it to a single physical or virtual core. But multithreading is a program architecture, not a CPU facility.
I think that what you are trying to ask is, "Is there a way to prevent the OS from utilizing hyperthreading and/or multiple cores?"
The answer is: definitely. This isn't governed by a single instruction, and indeed it's not as if you could just write a device driver that would automagically disable all of that hardware. Most of this depends on how the kernel configures the interrupt controllers at boot time.
When a machine is first started, there is a designated processor that is used for bootstrapping. It is the responsibility of the OS to configure the multiprocessor hardware accordingly. On PC platforms this involves reading information about the multiprocessor configuration from in-memory tables provided by the boot firmware. This data would likely conform to either the ACPI or the Intel MultiProcessor specification. The kernel then uses that data to configure the APIC hardware accordingly.
Multithreading and multitasking are not special instructions or modes in the CPU. They're just fancy ways in which people who write operating systems use interrupts. There is a hardware timer, basically a counter incremented by a clock signal, that triggers an interrupt when it overflows. The exact interrupt is platform-specific. In the olden days this timer was actually a separate chip/circuit on the motherboard, simply attached to one of the CPU's interrupt pins. Modern CPUs have this timer built in. So, to turn off multithreading and multitasking, the OS can simply disable the interrupt signal.
Alternatively, since it's the OS's job to actually schedule processes/threads, the OS can simply decide to ignore all threads and not run them.
Hyperthreading is a different thing. It essentially lets the OS see a second virtual CPU that it can execute code on. I've never had to deal with it directly, so I'm not sure how to turn it off (or even whether that is possible).
There is no x86 instruction that disables Hyper-Threading or additional cores. But there are BIOS settings that can turn off these features. Because they are set in the BIOS, changing them requires a reboot, and they are generally beyond the OS's control. There is a Windows boot option that limits the number of active cores, but Hyper-Threading can be turned on or off only in the BIOS. Intel's current Hyper-Threading implementation doesn't allow turning it on and off dynamically (and that won't be easy to implement any time soon).
I have assumed that 'multithreading' in your question means 'hardware multithreading', which is technically identical to Hyper-Threading. However, if you really meant software-level multithreading (i.e., multitasking), then it's a totally different question. Disabling that is (almost) impossible on modern operating systems, since they support multitasking by default, so the question hardly makes sense. It could make sense if you wanted to run MS-DOS (in x86 real mode, where a single task runs at a time).
P.S. Please note that 'multithreading' can be either hardware or software. I also agree with the other answers regarding processor/thread affinity.