I just learned about mlock() functions. I know that it allows you to lock program memory into RAM (allowing the physical address to change but not allowing the memory to be evicted). I've read that newer Linux kernel versions have a mlock limit (ulimit -l), but that this is only applicable to unprivileged processes. If this is a per-process limit, could an unprivileged process spawn a ton of processes by fork()-ing and have each call mlock(), until all memory is locked up and the OS slows to a crawl because of tons of swapping or OOM killer calls?
It is possible that an attacker could cause problems with this, but not materially more problems than they could cause otherwise.
The default limit for this on my system is about 2 MB. That means a typical process won't be able to lock more than 2 MB of data into memory. Note that this is just normal memory that won't be swapped out; it's not an independent, special resource.
It is possible that a malicious process could spawn many other processes to use more locked memory, but because a process usually requires more than 2 MB of memory to run anyway, they're not really exhausting memory more efficiently by locking it; in fact, starting a new process itself is actually going to more effective at using memory than locking it. It is true that a process could simply fork, lock memory, and sleep, in which case its other pages would likely be shared because of copy-on-write, but it could just also allocate a decent chunk of memory and cause many more problems, and in fact it will generally have permission to do so since many processes require non-trivial amounts of memory.
So, yes, it's possible that an attacker could use this technique to cause problems, but because there are many easier and more effective ways to exhaust memory or cause other problems, this seems like a silly way to go about doing it. I, for one, am not worried about this as a practical security problem.
Related
Does threading a lot leads to thrashing if each new thread wants to access the memory (specifically the same database in my case) and perform read/write operations through out its lifetime?
I assume that this is true. If my assumption is true, then what is the best way to maximize the CPU utilization? And how can i determine that some specific number of threads will give good CPU utilization?
If my assumption is wrong, please do give proper illustrations to let me understand the scenario clearly.
Trashy code causes trashing. Not thread. All code is ran by some threads, even the main(). Temp objects are garbage collected the same way on any thread.
The subtle part is when each thread preloads its own objects to perform the work, which can duplicate a lot of same classes. It's usually a small sacrifice to make to get the power of concurrency. But it's not trash (no leak, no deterioration).
There is one exception: when some 3rd party code caches material in thread locals... You could end up caching the same stuff on each thread. Not really a leak, but not efficient.
Rule of thumb for number of threads? Depends on the task.
If the tasks are pure computation like math, then you should not exceed the number of non-hyperthreaded cores.
If the job is memory intensive along with pure computation work (most cases), then the number of hyperthreaded cores is your target (because the CPU will use the idle time of memory access for another core computations).
If the job is mostly large sequential disk i/o, then you number of threads should be not to much above the number of disk spindle available to read. This is VERY approximative since the disk caches, DMA, SSD, raids and such are completely affecting how the disk layer can service your thread without idling. When using random access, this is also valid. However, the virtualization these days will throw all your estimates out the window. Disk i/o could be much more available than you think, but also much worse.
If the jobs are mostly network i/o waits, then it is not really limited from your side; I would go with about 3x the number of cores to start. This multiplier is simply presuming that such thread wait on network for 2/3 of its time. Which is very low in practice. Could be 99% of its time waiting for nw i/o (100x). Which is why you see NIO sockets everywhere, to deal with many connections with fewer busier threads.
No, you could have 100's of idle threads waiting for work and not see any thrashing, which is caused by application working set size exceeding available memory size, so active pages need to be reloaded from disk (even written out to disk to when temporary variable storage needs saving to be relaoded later).
Threads share an address space, having many active leads to diminishing returns due to lock contention. So in the DB case, many processes reading tables can proceed simultaneously, yet updates of dependant data need to be serialised to keep data consistent which may cause lock contention and limit parallel processing.
Poorly written queries which need to load & sort large tables into memory, may cause thrashing when they exceed free RAM (perhaps poor choice of indexs). You can increase the query throughput, to utilise CPUs more, by having large RAM disk caches and using SSDs to reduce random data access times.
On memory intensive computations, cache sizes may become important, fewer threads whose data stays in cache and CPU pre-fetches minimise stalls, work better than threads competing to load their data from main memory.
After thinking about the the whole concept of shared memory , a question came up:
can two processes share the same shared memory segment ? can two threads share the same shared memory ?
After thinking about it a little more clearly , I'm almost positive that two processes can share the same shared memory segment , where the first is the father and the second is the son , that was created with a fork() , but what about two threads ?
Thanks
can two processes share the same shared memory segment?
Yes and no. Typically with modern operating systems, when another process is forked from the first, they share the same memory space with a copy-on-write set on all pages. Any updates made to any of the read-write memory pages causes a copy to be made for the page so there will be two copies and the memory page will no longer be shared between the parent and child process. This means that only read-only pages or pages that have not been written to will be shared.
If a process has not been forked from another then they typically do not share any memory. One exception is if you are running two instances of the same program then they may share code and maybe even static data segments but no other pages will be shared. Another is how some operating systems allow applications to share the code pages for dynamic libraries that are loaded by multiple applications.
There are also specific memory-map calls to share the same memory segment. The call designates whether the map is read-only or read-write. How to do this is very OS dependent.
can two threads share the same shared memory?
Certainly. Typically all of the memory inside of a multi-threaded process is "shared" by all of the threads except for some relatively small stack spaces which are per-thread. That is usually the definition of threads in that they all are running within the same memory space.
Threads also have the added complexity of having cached memory segments in high speed memory tied to the processor/core. This cached memory is not shared and updates to memory pages are flushed into central storage depending on synchronization operations.
In general, a major point of processes is to prevent memory being shared! Inter-process comms via a shared memory segment is certainly possible on the most common OS, but the mechanisms are not there by default. Failing to set up, and manage, the shared area correctly will likely result in a segFault/AV if you're lucky and UB if not.
Threads belonging to the same process, however, do not have such hardware memory-management protection can pretty much share whatever they like, the obvious downside being that they can corrupt pretty much whatever they like. I've never actually found this to be a huge problem, esp. with modern OO languages that tend to 'structure' pointers as object instances, (Java, C#, Delphi).
Yes, two processes can both attach to a shared memory segment. A shared memory segment wouldn't be much use if that were not true, as that is the basic idea behind a shared memory segment - that's why it's one of several forms of IPC (inter-Process communication).
Two threads in the same process could also both attach to a shared memory segment, but given that they already share the entire address space of the process they are part of, there likely isn't much point (although someone will probably see that as a challenge to come up with a more-or-less valid use case for doing so).
In general terms, each process occupies a memory space isolated from all others in order to avoid unwanted interactions (including those which would represent security issues). However, there is usually a means for processes to share portions of memory. Sometimes this is done to reduce RAM footprint ("installed files" in VAX/VMS is/was one such example). It can also be a very efficient way for co-operating processes to communicate. How that sharing is implemented/structured/managed (e.g. parent/child) depends on the features provided by the specific operating system and design choices implemented in the application code.
Within a process, each thread has access to exactly the same memory space as all other threads of the same process. The only thing a thread has unique to itself is "execution context", part of which is its stack (although nothing prevents one thread from accessing or manipulating the stack "belonging to" another thread of the same process).
How expensive is a OS native thread ? The host OS allocates some virtual memory for a thread stack and a little bit of the kernel memory for the thread control structures. Am I missing something?
It can increase the scheduler workload, depending how busy the thread is, and the kind of scheduler. It will also allocate physical memory for the first page of the stack.
The main cost in many cases is cache pollution. Having too many active concurrent tasks kills performance because too many threads are sharing too little cache, and they just keep shoving each other back onto main memory, which is a far worse indignity for a thread to suffer than simply being put to sleep, since sleeping incurs a single penalty of several hundred cycles, while retrieving main memory incurs a similar overhead several times during a single time-slice, and also means proportionally more context-switching since much less work gets done during that time-slice.
I've heard that under linux on multicore server it would be impossible to reach top performance when you have just 1 process but multiple threads because Linux have some limitations on the IO, so that 1 process with 8 threads on 8-core server might be slower than 8 processes.
Any comments? Are there other limitation which might slow the applications?
The applications is a network C++ application, serving 100s of clients, with some disk IO.
Update: I am concerned that there are some more IO-related issues other than the locking I implement myself... Aren't there any issues doing simultanious network/disk IO in several threads?
Drawbacks of Threads
Threads:
Serialize on memory operations. That is the kernel, and in turn the MMU must service operations such as mmap() that perform page allocations.
Share the same file descriptor table. There is locking involved making changes and performing lookups in this table, which stores stuff like file offsets, and other flags. Every system call made that uses this table such as open(), accept(), fcntl() must lock it to translate fd to internal file handle, and when make changes.
Share some scheduling attributes. Processes are constantly evaluated to determine the load they're putting on the system, and scheduled accordingly. Lots of threads implies a higher CPU load, which the scheduler typically dislikes, and it will increase the response time on events for that process (such as reading incoming data on a socket).
May share some writable memory. Any memory being written to by multiple threads (especially slow if it requires fancy locking), will generate all kinds of cache contention and convoying issues. For example heap operations such as malloc() and free() operate on a global data structure (that can to some degree be worked around). There are other global structures also.
Share credentials, this might be an issue for service-type processes.
Share signal handling, these will interrupt the entire process while they're handled.
Processes or Threads?
If you want to make debugging easier, use threads.
If you are on Windows, use threads. (Processes are extremely heavyweight in Windows).
If stability is a huge concern, try to use processes. (One SIGSEGV/PIPE is all it takes...).
If threads aren't available, use processes. (Not so common now, but it did happen).
If your threads share resources that can't be use from multiple processes, use threads. (Or provide an IPC mechanism to allow communicating with the "owner" thread of the resource).
If you use resources that are only available on a one-per-process basis (and you one per context), obviously use processes.
If your processing contexts share absolutely nothing (such as a socket server that spawns and forgets connections as it accept()s them), and CPU is a bottleneck, use processes and single-threaded runtimes (which are devoid of all kinds of intense locking such as on the heap and other places).
One of the biggest differences between threads and processes is this: Threads use software constructs to protect data structures, processes use hardware (which is significantly faster).
Links
pthreads(7)
About Processes and Threads (MSDN)
Threads vs. Processes
it really should make no difference but is probably about design.
A multi process app may have to do less locking but may use more memory. Sharing data between processes may be harder.
On the other hand multi process can be more robust. You can call exit() and quit the child safely mostly without affecting others.
It depends how dependent the clients are. I usually recommend the simplest solution.
Okay, so we support per-process memory paging/protection today. I've been wondering for years what sort of benefit is gained by offering page-level protections to what is arguably the smallest execution unit our OSes support today: threads. This question on Software Transactional Memory brought it back to the forefront for me.
Benefits to having page-level thread-ownership
OS support for locking the page when accessed
In theory, protection against memory corruption if the OS had a mechanism to take ownership for the lifetime of a thread.
Downsides:
Deadlock detection with standard
locking techniques is already
difficult enough
debugger/OS
support for determining page-level
ownership
Any other downsides, upsides that you can see from supporting such a model?
This kind of programming model is already possible with processes and shared memory. It isn't used much, for good reason: interprocess message passing is far safer and easier to reason about.
Per-thread per-page memory protection can be used to efficiently implement parallel garbage collection.
The problem to be solved is that in order to collect a region of memory, the garbage collector needs exclusive access to that region, otherwise other threads (so-called "mutator" threads) would be able to read and write objects that are not in a consistent state (for example, halfway through being copied from oldspace to newspace).
With per-thread memory protection, the garbage collector can control access to the region of memory so that only the collector thread can access it; attempts by other threads to access the region of memory will result in segmentation faults that can be handled by the collector (for example, by blocking the thread until the collector is finished with that region).