How is user address memory organized?

How is user address memory organized? - linux

I always read that, at any given time, the processor can only run one process at a time. So one and only one process is in state running.
However, we can have a number of runnable processes. These are all of these processes who are waiting for the scheduler to schedule their execution.
At any given time, do all these runnable processes exist in user address space? Or has the currently running process in user address space, and it is only once they are scheduled that they are brought back to RAM from disk. In which case, does it mean that the kernel keeps the process task descriptor in its list of all runnable processes even if they are in disk? I guess you can tell I am confused.

If CPU supports virtual memory addressing, each process has a unique view of the memory. Two different processes that tries to read from the same memory address, will map to different location in physical memory, unless the memory maps tells otherwize (shared memory, like DLL files are mapped read only like this for instance)
If CPU does not support virtual memory, but only memory protection, the memory from the other processes will be protected away, so that the running process can only access its own memory.

Related

How process and thread related to virtual memory

I'm new to Linux and computer architecture, just some questions on how process and thread related to virtual memory and physical memory RAM.Below is my questions.
Q1-When there is two processes(process A and process B) running concurrently, if process A is running now, the process B's states like register values, heap objects etc have to be pushed to store on disk (Virtual Memory), and when the next context switch happens, process B will be "recovery" from disk to RAM, process A's state will be pushed to disk, is my understanding correct?
Q2- If my understanding in Q1 is correct, why not just save all processes on RAM too? normally we have large RAM like 16gb,32gb etc, how about just store every process's state on RAM, and when there is too many processes and RAM is going to run out, then further processes' states will be stored to disk?
Q3-How about threads? if there is multiple threads (e.g thread A and thread B), when thread A is running, does thread B's state will be pushed to stored on disk too?

is my understanding correct?
No, it's wrong. Waiting or blocked processes don't get swapped to disc. They wait in memory. Virtual memory is not on disc.
Also on a system with two processors, two processes are running concurrently, so both processes A and B can be running at the same time.
why not just save all processes on RAM too?
This is exactly what happens. All processes memory kindly waits in RAM until scheduler switches to this process.
Side note: If there is no RAM available and the system has swap available and this process is idle for some defined time, than it may get swapped on disc, ie. the processes memory may get moved to disc. But this doesn't happen immediately, it happens after a long time and in certain situation
will be pushed to stored on disk too?
No.
Virtual memory is not about physical location of the memory. It's the other way round - virtual memory is a of abstraction that allows system to modify the physical (maybe if any) location of the memory. A simplest explanation I give: there is a special cpu register that is added to each address upon dereferencing. A user space program does *(int*)4 but he doesn't get the value behind 4th byte in RAM, the special cpu register value is added to the pointer value upon dereferencing. The register value is configured by the system, can be different in different programs. So you can have exact same pointer values in two programs, but they both point to different locations. Of cause, this is over-over-simplification.

What happens to allocated memory of other threads when forking

I have a huge application that needs to fork itself at some point. The application is multithreaded and has about 200MB of allocated memory. What I want to do now to ensure that the data allocated by the process wont get duplicated is to start a new thread and fork inside of this thread. From what I have read, only the thread that calls fork will be duplicated, but what will happen to the allocated memory? Will that still be there? The purpose of this is to restart the application with other startup parameters, when its forked, it will call main with my new parameters, thus getting hopefully a new process of the same program. Now before you ask: I cannot assure that the binary of that process will still be in the same place as when I started the process, otherwise I could just fork and exec whats in /proc/self/exe.

Threads are execution units inside the big bag of resources that a process is. A process is the whole thing that you can access from any thread in the process: all the threads, all the file descriptors, all the other resources. So memory is absolutely not tied to a thread, and forking from a thread has no useful effect. Everything still needs to be copied over since the point of forking is creating a new process.
That said, Linux has some tricks to make it faster. Copying 2 gigabytes worth of RAM is neither fast or efficient. So when you fork, Linux actually gives the new process the same memory (at first), but it uses the virtual memory system to mark it as copy-on-write: as soon as one process needs to write to that memory, the kernel intercepts it and allocates distinct memory so that the other process isn't affected.

Is a process kept in main memory while it is in the blocked/suspended state?

When a process P1 is in a blocked or suspended state, will the memory management system swap it out of main memory for room for an active process?
And if the process is determined to come back where is the Program's procedure call stack, Contents of program counter (PC) and Contents of program status word (PSW) stored? Does the OS keep it all in secondary memory or is part of the suspended/blocked process of P1 kept in main memory?

So I'm guessing when a process is swapped out of memory and put in a
suspended state, all of its resident pages are moved out. When the
process is resumed, all of the pages that were previously in main
memory are returned to main memory
Think in terms of pages, not processes.
Even an active process may have many pages evicted out of physical memory and into swap if the system is under memory pressure.
So, sure, a suspended process may have effectively all of its pages swapped out entirely.
But it is unlikely to have all pages swapped in simply because the process woke up. Doing so would be a waste of CPU, I/O and memory. Instead, pages will be brought back as needed (general case -- some pagers may bring back sets of pages heuristically).
If a process is active, then it won't be swapped out, so the dynamic state of the lowest call stack (all the register noise, red zone on stack, etc... ) isn't in play when the swap happens.
I.e. for a process to be swapped out the threads need to be blocked on something, typically a call into the kernel or into a system library that is blocking. Registers will be out of play, etc... Thus, the execution state that needs to be swapped out is pretty straightforward as the call return state will be preserved in the thread state itself (as the thread is blocked).
In fact, things like the PC and the PSW are preserved more as a part of the context switching subsystem than paging. I.e. on a typical system, you'll likely have several hundred, maybe thousands, of threads running at once across the N physical cores of the CPU. The concurrency support of the architecture is where you'll find how that state is maintained.

In Linux, when a process is about to be swapped or terminated, what state should its threads be in?

By swapped and terminated, I mean, if the process is about to be swapped to a swap space or terminated(by OOM killer) to free up memory.
What algorithm does the linux kernel follow?
For instance, Process A needs extra memory and Process B has been chosen to be swapped or killed(if swap space is already occupied), but process B still has a blocking thread.
a.) Does process B gets swapped or killed regardless of the blocking thread?
b.) If not, how is this kind of case handled?
If my example is an unlikely case, any insights would be appreciated.

Yeah - you need to read up on paged virtual memory, as suggested by #CL. Processes are not swapped out in their entirety and swapping!=termination.
If the OS needs to terminate a process, either because of a specific API request or because of its OOM algorithm, the OS stops all its threads first. Blocked threads are easy to 'stop' because they are not running anyway - it's only necessary to change their state to ensure that they are never run again. Thread/s that are actually running on cores have to be stopped by means of an inter-core comms driver that can hardware-interrupt the cores running the threads. Once all threads are not running, the resources, including all user-space memory, allocated to the process can be freed and OS thread/process management structs released. The process then no longer exists.

Thread context switch Vs. process context switch

Could any one tell me what is exactly done in both situations? What is the main cost each of them?

The main distinction between a thread switch and a process switch is that during a thread switch, the virtual memory space remains the same, while it does not during a process switch.
Both types involve handing control over to the operating system kernel to perform the context switch. The process of switching in and out of the OS kernel along with the cost of switching out the registers is the largest fixed cost of performing a context switch.
A more fuzzy cost is that a context switch messes with the processors cacheing mechanisms. Basically, when you context switch, all of the memory addresses that the processor "remembers" in its cache effectively become useless. The one big distinction here is that when you change virtual memory spaces, the processor's Translation Lookaside Buffer (TLB) or equivalent gets flushed making memory accesses much more expensive for a while. This does not happen during a thread switch.

Process context switching involves switching the memory address space. This includes memory addresses, mappings, page tables, and kernel resources—a relatively expensive operation. On some architectures, it even means flushing various processor caches that aren't sharable across address spaces. For example, x86 has to flush the TLB and some ARM processors have to flush the entirety of the L1 cache!
Thread switching is context switching from one thread to another in the same process (switching from thread to thread across processes is just process switching).Switching processor state (such as the program counter and register contents) is generally very efficient.

First of all, operating system brings outgoing thread in a kernel mode if it is not already there, because thread switch can be performed only between threads, that runs in kernel mode. Then the scheduler is invoked to make a decision about thread to which will be performed switching. After decision is made, kernel saves part of the thread context that is located in CPU (CPU registers) into the dedicated place in memory (frequently on the top of the kernel stack of outgoing thread). Then the kernel performs switch from kernel stack of outgoing thread on to kernel stack of the incoming thread. After that, kernel loads previously stored context of incoming thread from memory into CPU registers. And finally returns control back into user mode, but in user mode of the new thread.
In the case when OS has determined that incoming thread runs in another process, kernel performs one additional step: sets new active virtual address space.
The main cost in both scenarios is related to a cache pollution. In most cases, the working set used by the outgoing thread will differ significantly from working set which is used by the incoming thread. As a result, the incoming thread will start its life with avalanche of cache misses, thus flushing old and useless data from the caches and loading the new data from memory. The same is true for TLB (Translation Look Aside Buffer, which is on the CPU). In the case of reset of virtual address space (threads run in different processes) the penalty is even worse, because reset of virtual address space leads to the flushing of the entire TLB, even if new thread actually needs to load only few new entries. As a result, the new thread will start its time quantum with lots TLB misses and frequent page walking. Direct cost of threads switch is also not negligible (from ~250 and up to ~1500-2000 cycles) and depends on the CPU complexity, states of both threads and sets of registers which they actually use.
P.S.: Good post about context switch overhead: http://blog.tsunanet.net/2010/11/how-long-does-it-take-to-make-context.html

process switching: it is a transition between two memory resident of process in a multiprogramming environment;
context switching: it is a changing context from an executing program to an interrupt service routine (ISR).

In Thread Context Switching, the virtual memory space remains the same while it is not in the case of Process Context Switch. Also, Process Context Switch is costlier than Thread Context Switch.

I think main difference is when calling switch_mm() which handles memory descriptors of old and new task. In the case of threads, the virtual memory address space is unchanged (threads share virtual memory), so very little has to be done, and therefore less costly.

Though thread context switching needs to change the execution context (registers, stack pointers, program counters), they don't need to change address space as processes context switches do. There's an additional cost when you switch address space, more memory access (paging, segmentation, etc) and you have to flush TLB when entering or exiting a new process...

In short, the thread context switch does not assign a brand new set of memory and pid, it uses the same as the parent since it is running within the same process. A process one spawns a new process and thus assigns new mem and pid.
There is a loooooot more to it. They have written books on it.
As for cost, a process context switch >>>> thread as you have to reset all of the stack counters etc.

Assuming that The CPU the OS runs has got Some High Latency Devices Attached,
It makes sense to run another thread Of the Process's Address Space, while the high latency device responds back.
But, if the High Latency Device is responding faster than the time to need do set up of table + translation of Virtual To Physical memories for a NEW Process, then it is questionable if a switch is essential at all.
Also, HOT cache(data needed for running the process/thread is reachable in less time) is better choice.

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string