I just read a text about threads that says "address space does not need protection boundaries", and I was wondering what protection boundaries mean in this context.
It's probably referring to memory protection. This relates to threading because separate threads share one address space, so they may overwrite each other's data when they access the same memory.
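For illustration (my own minimal sketch, not from the text you quoted), here are two threads in one process incrementing the same counter; because the address space is shared with no protection boundary between the threads, their unsynchronized writes can clobber each other:

    /* Minimal sketch: two threads sharing one address space update the same
     * counter with no protection boundary between them, so increments get
     * lost. Compile with: gcc -pthread race.c */
    #include <pthread.h>
    #include <stdio.h>

    static long counter = 0;          /* shared, unprotected between threads */

    static void *worker(void *arg)
    {
        (void)arg;
        for (int i = 0; i < 1000000; i++)
            counter++;                /* unsynchronized read-modify-write */
        return NULL;
    }

    int main(void)
    {
        pthread_t a, b;
        pthread_create(&a, NULL, worker, NULL);
        pthread_create(&b, NULL, worker, NULL);
        pthread_join(a, NULL);
        pthread_join(b, NULL);
        /* Usually prints less than 2000000 because updates overwrote each other. */
        printf("counter = %ld\n", counter);
        return 0;
    }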
Is there a way to allocate contiguous physical memory from userspace in Linux? At least a few guaranteed contiguous memory pages. One huge page isn't the answer.
No. There is not. You need to do this from kernel space.
If you say "we need to do this from user space" without anything going on in kernel space, it makes little sense, because a user-space program has no way of controlling, or even knowing, whether the underlying memory is contiguous.
The only situation in which you would need this is when you are working in conjunction with a piece of hardware, or some other low-level (i.e. kernel) service, that has this requirement. So again, you would have to deal with it at that level.
So the answer isn't just "you can't", but "you should never need to".
I have written memory managers that do allow this, but it was always because of some underlying issue at the kernel level, which had to be addressed at the kernel level. Generally it was because some other agent on the bus (a PCI card, the BIOS, or even another computer over an RDMA interface) had the physically contiguous memory requirement. Again, all of this had to be addressed in kernel space.
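For context, here is a minimal, hedged sketch of how that is usually handled at the kernel level; dma_alloc_coherent() hands back memory that is physically contiguous and usable by the device, and the device pointer and buffer size below are just placeholders:

    /* Hedged sketch: allocating physically contiguous, DMA-able memory from a
     * driver (kernel space). 'dev' would be the struct device of the PCI or
     * platform device that actually needs the contiguous buffer. */
    #include <linux/dma-mapping.h>
    #include <linux/gfp.h>

    #define BUF_SIZE (4 * PAGE_SIZE)   /* example size: a few contiguous pages */

    static void *cpu_addr;
    static dma_addr_t dma_handle;

    static int example_alloc(struct device *dev)
    {
        cpu_addr = dma_alloc_coherent(dev, BUF_SIZE, &dma_handle, GFP_KERNEL);
        if (!cpu_addr)
            return -ENOMEM;
        /* dma_handle is the bus address the hardware can use;
         * cpu_addr is the kernel virtual mapping of the same buffer. */
        return 0;
    }

    static void example_free(struct device *dev)
    {
        dma_free_coherent(dev, BUF_SIZE, cpu_addr, dma_handle);
    }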
When you talk about "cache lines", you don't need to worry. You can be assured that each page of your user-space memory is contiguous, and each page is much larger than a cache line (no matter what architecture you're talking about).
Yes, if all you need is a few pages, this may indeed be possible.
The file /proc/[pid]/pagemap now allows programs to inspect the mapping of their virtual memory to physical memory.
While you cannot explicitly modify the mapping, you can allocate a virtual page, lock it into memory via a call to mlock, record its physical address via a lookup into /proc/self/pagemap, and repeat until you happen to get enough blocks touching each other to create a large enough contiguous block. Then unlock and free your excess blocks.
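Here is a rough sketch of that lookup (my own illustration; pagemap stores one 64-bit entry per virtual page, with the page frame number in bits 0-54 and a "present" flag in bit 63, and unprivileged processes may read the frame number as zero on recent kernels):

    /* Sketch: mlock() a page and look up its physical frame in
     * /proc/self/pagemap. May require root/CAP_SYS_ADMIN to see the PFN. */
    #include <fcntl.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <unistd.h>

    int main(void)
    {
        long page = sysconf(_SC_PAGESIZE);
        void *buf = mmap(NULL, page, PROT_READ | PROT_WRITE,
                         MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        mlock(buf, page);                 /* pin it so it stays resident */
        *(volatile char *)buf = 1;        /* touch it so it is actually mapped */

        int fd = open("/proc/self/pagemap", O_RDONLY);
        uint64_t entry;
        off_t offset = ((uintptr_t)buf / page) * sizeof(entry);
        pread(fd, &entry, sizeof(entry), offset);

        if (entry & (1ULL << 63))         /* bit 63: page is present in RAM */
            printf("virtual %p -> physical 0x%llx\n", buf,
                   (unsigned long long)((entry & ((1ULL << 55) - 1)) * page));
        close(fd);
        return 0;
    }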
It's hackish, clunky and potentially slow, but it's worth a try. On the other hand, there's a decently large chance that this isn't actually what you really need.
The DPDK library's memory allocator uses the approach @Wallacoloo described; see eal_memory.c. The code is BSD licensed.
If a specific device driver exports a DMA buffer that is physically contiguous, user space can access it through the dma-buf APIs. So a user task can access such a buffer, but cannot allocate one directly. That is because the physically contiguous constraint comes not from user applications but only from the device, so only device drivers should care.
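A hedged sketch of the user-space side, assuming the driver has already handed you a dma-buf file descriptor through some driver-specific ioctl (how you get the fd is not shown, since it depends entirely on the driver):

    /* Sketch: mapping a physically contiguous dma-buf that a driver exported.
     * The fd is assumed to come from a driver-specific ioctl (not shown). */
    #include <fcntl.h>
    #include <linux/dma-buf.h>
    #include <string.h>
    #include <sys/ioctl.h>
    #include <sys/mman.h>
    #include <unistd.h>

    int main(void)
    {
        size_t len = 4096;               /* assumed buffer size */
        int dmabuf_fd = -1;              /* in reality: from a driver-specific ioctl */
        if (dmabuf_fd < 0)
            return 1;

        void *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED,
                       dmabuf_fd, 0);

        /* Bracket CPU access with the standard dma-buf sync ioctls. */
        struct dma_buf_sync sync = { .flags = DMA_BUF_SYNC_START | DMA_BUF_SYNC_RW };
        ioctl(dmabuf_fd, DMA_BUF_IOCTL_SYNC, &sync);

        memset(p, 0, len);               /* CPU access to the device buffer */

        sync.flags = DMA_BUF_SYNC_END | DMA_BUF_SYNC_RW;
        ioctl(dmabuf_fd, DMA_BUF_IOCTL_SYNC, &sync);

        munmap(p, len);
        close(dmabuf_fd);
        return 0;
    }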
Why do we need separation between kernel space and user space, when every data structure and every memory resource is managed by the kernel even if we create a process in user space? Why do we need such a big architecture?
All the page tables, context switching etc. are managed by the kernel, so why user space? Can we not use only one space and develop everything on top of it?
This question may sound weird, but I really want to understand why it is required. If I had to design a completely new OS and did not make such a division between kernel space and user space, what would be the problem?
Thanks
Your question is not weird at all -- you are absolutely correct in your intuition that kernel space may represent unnecessary overhead. Indeed, it is overhead that many operating systems do not have. Your OS most likely won't have it either.
The rationale for the extra complexity goes something like this:
We want process/task separation.
We want to make sure that one task doing bad things does not affect another one. This requires that the tasks execute in their respective isolated spaces.
We want similar protection for shared hardware controllers.
A task cannot set up its own boundaries; if that were the case, a misbehaving task could purposely break the rules. So we need a third party with higher privileges than the tasks have, and there is your kernel (or privileged) space.
If the cost of the extra kernel space and separation is too high, either in size or in computation overhead, the operating system will not have those features. Many real-time or embedded systems run in a single space with a single (same) privilege level (the privilege levels are sometimes referred to as protection rings) exactly as you described it.
If there is no protection between kernel space and user space, or between different user processes, then anybody can write code which will intentionally or accidentally modify either memory in kernel space or memory in the space of another user process. Look at good old MS-DOS, which lacked this protection.
Separating the spaces is not strictly necessary, but if you need to protect OS structures from user code to prevent incorrect or harmful manipulation of them, then you need some kind of protection. Otherwise such manipulation would break the OS's inner coherence, which is usually something you don't want.
Kernel space versus user space is about system protection, to make the system more robust. Kernel space runs in privileged mode and can do things (like directly interacting with hardware and system resources) which user space cannot. All user-space interaction with the hardware has to go through kernel space.
So wrongly written user code cannot crash the system or make it unstable. Without this separation, wrongly written user code could crash the whole system.
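As a small illustration (my own sketch, not part of the answer above), with the split in place a user process that pokes a kernel address simply takes a fault and is killed, instead of corrupting the kernel:

    /* Sketch: user space cannot touch kernel memory, so this write is caught
     * by the MMU/kernel and the process gets SIGSEGV; the rest of the system
     * keeps running. 0xffffffff81000000 is just a typical x86-64 kernel-space
     * address, used here purely as an example. */
    #include <stdio.h>

    int main(void)
    {
        volatile char *kernel_addr = (volatile char *)0xffffffff81000000UL;
        printf("about to write to %p...\n", (void *)kernel_addr);
        *kernel_addr = 0;        /* -> SIGSEGV: process dies, system is unharmed */
        printf("never reached\n");
        return 0;
    }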
We just ran out of semaphores on our Linux box, due to too many WebSphere Message Broker instances or some such.
A colleague and I got to wondering why this is even limited - it's just a bit of memory, right?
I thoroughly googled and found nothing.
Anyone know why this is?
cheers
Semaphores, when being used, require frequent access with very, very low overhead.
Having an expandable system where memory for each newly requested semaphore structure is allocated on the fly would introduce complexity that slows down access to them: the kernel would first have to look up where the particular semaphore is stored, then fetch that memory and check the value. It is easier and faster to keep them in one compact block of fixed memory that is readily at hand.
Having them dispersed throughout memory via dynamic allocation would also make it more difficult to efficiently use memory pages that are locked (that is, not subject to being swapped out when there are high demands on memory). The use of "locked in" memory pages for kernel data is especially important for time-sensitive and/or critical kernel functions.
Having the limit be a tunable parameter (see the links in the comments of the original question) allows it to be increased at runtime if needed, via an "expensive" reallocation and relocation of the block. But typically this is done once at system initialization, before much of anything is using semaphores.
That said, the amount of memory used by a semaphore set is rather tiny. With modern systems having many gigabytes of memory, the original default limits on the number of them might seem a bit stingy. But keep in mind that on many systems semaphores are rarely used by user-space processes, and the Linux kernel finds its way into lots of small embedded systems with rather limited memory, so setting the default limit arbitrarily high just in case seems wasteful.
The few software packages, such as Oracle database for example, that do depend on having many semaphores available, typically do recommend in their installation and/or system tuning advice to increase the system limits.
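As a rough illustration (my own sketch), the System V limits live in /proc/sys/kernel/sem (SEMMSL, SEMMNS, SEMOPM, SEMMNI), and running out shows up as ENOSPC from semget():

    /* Sketch: print the System V semaphore limits, then create semaphore sets
     * until semget() fails (typically ENOSPC when SEMMNI or SEMMNS is hit),
     * and remove them again afterwards. */
    #include <errno.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/ipc.h>
    #include <sys/sem.h>

    int main(void)
    {
        int semmsl, semmns, semopm, semmni;
        FILE *f = fopen("/proc/sys/kernel/sem", "r");
        if (f && fscanf(f, "%d %d %d %d",
                        &semmsl, &semmns, &semopm, &semmni) == 4)
            printf("SEMMSL=%d SEMMNS=%d SEMOPM=%d SEMMNI=%d\n",
                   semmsl, semmns, semopm, semmni);
        if (f)
            fclose(f);

        enum { MAX_SETS = 65536 };        /* safety cap for this sketch */
        static int ids[MAX_SETS];
        int n = 0;
        while (n < MAX_SETS) {
            int id = semget(IPC_PRIVATE, 1, IPC_CREAT | 0600);
            if (id < 0) {
                printf("semget failed after %d sets: %s\n", n, strerror(errno));
                break;
            }
            ids[n++] = id;
        }
        while (n-- > 0)
            semctl(ids[n], 0, IPC_RMID);  /* remove everything we created */
        return 0;
    }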
In order to write to a read-only memory location (an example of such a memory location would be the sys_call_table) from a kernel module, is it sufficient to disable page protection by clearing bit 16 (the WP bit) of the CR0 register?
Or do we need something more to write to a read-only memory location?
If you disable page write protection, you may break something that depends on it (e.g. any copy-on-write occurring on kernel pages). If you do it that way, you probably want to temporarily disable interrupts/scheduling so that the memory modification looks atomic on that CPU; this will also avoid the thread being moved to a different CPU if you have more than one.
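A hedged sketch of what that looks like in a module, along the lines described above (clear CR0.WP, keep interrupts off around the write, then restore); note that recent x86 kernels pin CR0.WP in write_cr0(), so treat this purely as an illustration:

    /* Hedged sketch: temporarily clear CR0.WP (bit 16) around a write to a
     * read-only kernel page, with interrupts disabled so the window is small
     * and local to this CPU. Recent kernels may reject clearing WP this way. */
    #include <linux/kernel.h>
    #include <linux/irqflags.h>
    #include <asm/special_insns.h>
    #include <asm/processor-flags.h>

    static void write_readonly(unsigned long *target, unsigned long value)
    {
        unsigned long flags;
        unsigned long cr0;

        local_irq_save(flags);          /* no interrupts, no migration window */
        cr0 = read_cr0();
        write_cr0(cr0 & ~X86_CR0_WP);   /* drop write protection */

        *target = value;                /* the actual patch */

        write_cr0(cr0);                 /* restore WP */
        local_irq_restore(flags);
    }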
I'm not sure that using hard-coded addresses like 0xc12c9e90 is a good idea. I don't know exactly how Linux lays out things in the kernel portion of the address space, but addresses may change from one boot to another, either because of dynamic memory allocation or for security reasons (moving things around is useful, as it reduces the chances of exploitation of kernel bugs).