Linux kernel thread serialization

I'm writing a Linux kernel module (it is an LSM).
It is easy to hook several Linux kernel operations, but I'm wondering how the hooks are called from multiple threads (and processes).
I'm going to use shared data that may be read/written from multiple threads.
Is there any rule for how we should lock the shared data?
If the LSM hook calls are serialized outside the module, we don't need to lock the shared data, but I couldn't find any document that specifically says so.
I also want to know the same thing for procfs or securityfs interface operations.
Are they serialized outside?
My guess is that they are NOT serialized, because no documentation says they are. But I'm still not sure.
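For concreteness, what I have in mind is something like the following sketch (the hook name, its exact signature and the counter are just placeholders; the point is only that the shared data gets its own lock, since I assume the hooks can run concurrently on several CPUs):

#include <linux/fs.h>
#include <linux/spinlock.h>

/* Placeholder shared state, guarded by its own lock because LSM hooks
 * run in the context of whichever task triggered them, possibly on
 * several CPUs at once. */
static DEFINE_SPINLOCK(my_lsm_lock);
static unsigned long my_lsm_open_count;

/* Hypothetical file-open hook; the real signature depends on the
 * kernel version. A mutex would also do if the hook is allowed to sleep. */
static int my_lsm_file_open(struct file *file)
{
    spin_lock(&my_lsm_lock);
    my_lsm_open_count++;            /* shared data touched only under the lock */
    spin_unlock(&my_lsm_lock);
    return 0;
}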

Related

File Access (read/write) synchronization between 'n' processes in Linux

I am studying Operating Systems this semester and was wondering how Linux handles file access (read/write) synchronization. What does the default implementation use: semaphores, mutexes, or monitors? And can you please tell me where I would find this in the source code (or in my own copy of Ubuntu) and how to disable it?
I need to disable it so I can check whether my own implementation works, and I'd also like to know how to add my own implementation to the system.
Here's my current plan; please tell me if it's okay:
Disable the default implementation and add my own (recompiling the kernel if need be).
My own version would keep track of every incoming process and maintain a list of the files each one is using, and whenever a file repeats I would check whether the accessing process is a reader or a writer.
I will be going with a reader-preferred solution to the readers-writers problem.
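Roughly, the reader-preferred part would behave like the classic textbook scheme below (shown as user-space C only to illustrate the idea; the function names are made up and the real thing would have to live in the kernel):

#include <pthread.h>
#include <semaphore.h>

static pthread_mutex_t count_lock = PTHREAD_MUTEX_INITIALIZER; /* protects reader_count */
static sem_t resource;        /* binary semaphore guarding the file; sem_init(&resource, 0, 1) once */
static int reader_count = 0;

void reader_enter(void)
{
    pthread_mutex_lock(&count_lock);
    if (++reader_count == 1)
        sem_wait(&resource);              /* first reader locks writers out */
    pthread_mutex_unlock(&count_lock);
}

void reader_exit(void)
{
    pthread_mutex_lock(&count_lock);
    if (--reader_count == 0)
        sem_post(&resource);              /* last reader lets writers in */
    pthread_mutex_unlock(&count_lock);
}

void writer_enter(void) { sem_wait(&resource); }  /* writers wait while any reader is active */
void writer_exit(void)  { sem_post(&resource); }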
The kernel doesn't impose process synchronization (that should be performed by the processes themselves; the kernel only provides tools for it), but it can guarantee atomicity of some operations: an atomic operation cannot be interrupted, and its result cannot be altered by another operation running in parallel.
Speaking of writing to a file, it has some atomicity guarantees. From man -s3 write:
Atomic/non-atomic: A write is atomic if the whole amount written in one operation is not interleaved with data from any other process. This is useful when there are multiple writers sending data to a single reader. Applications need to know how large a write request can be expected to be performed atomically. This maximum is called {PIPE_BUF}. This volume of IEEE Std 1003.1-2001 does not say whether write requests for more than {PIPE_BUF} bytes are atomic, but requires that writes of {PIPE_BUF} or fewer bytes shall be atomic.
Some discussion on SO: Atomicity of write(2) to a local filesystem.
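For example (a made-up logger, error handling kept minimal), several processes can write to the same pipe without their messages being interleaved, as long as each message is written with a single write() of at most PIPE_BUF bytes:

#include <limits.h>    /* PIPE_BUF */
#include <string.h>
#include <unistd.h>

/* Hypothetical log helper: one write() per message, and the message fits
 * in PIPE_BUF, so the kernel guarantees it is not interleaved with
 * writes from other processes on the same pipe. */
int log_line(int pipe_fd, const char *line)
{
    size_t len = strlen(line);
    if (len > PIPE_BUF)
        return -1;                       /* no atomicity guarantee beyond PIPE_BUF */
    return write(pipe_fd, line, len) == (ssize_t)len ? 0 : -1;
}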
To maintain atomicity, various kernel routines hold the i_mutex mutex of an inode, e.g. in generic_file_write_iter():
mutex_lock(&inode->i_mutex);
ret = __generic_file_write_iter(iocb, from);
mutex_unlock(&inode->i_mutex);
So other write() calls won't mess with your call. Readers, however, don't take i_mutex, so they may observe partially written data. Actual locking for readers is performed in the page cache, so a page (4096 bytes on x86) is the minimum amount of data for which the kernel guarantees atomicity.
Speaking of recompiling the kernel to test your own implementation, there are two ways of doing that: download a vanilla kernel from http://kernel.org/ (or from Git), patch it and build it -- that is easy. Recompiling Ubuntu kernels is harder -- it requires working with the Debian build tools: https://help.ubuntu.com/community/Kernel/Compile
I'm not clear about what you're trying to achieve with your own implementation. If you want to apply stricter synchronization rules, maybe it is time to look at TxOS?

Do all types of interprocess/interthread communication need system calls?

In Linux, do all types of interprocess communication need system calls?
Types of interprocess communication include:
Pipes
Signals
Message Queues
Semaphores
Shared Memory
Sockets
Do all types of interthread communication need system calls?
I would like to know if all interprocess communications and interthread communications involve switching from user mode to kernel mode, so that the OS kernel runs to perform the communication. Since system calls all involve such a switch, I am asking whether these communications need system calls.
For example, "Shared memory" can be used for both interprocess and interthread communications, but I am not sure whether it requires system calls or the involvement of the OS kernel taking over the CPU to perform something.
Thanks.
For interprocess communication I am pretty sure you cannot avoid system calls.
For interthread communication I cannot give you a definitive answer, but my educated guess would be "yes-and-no". You see, you can communicate between threads using thread-safe queues, and the only thing that a thread-safe queue needs in order to work is a lock. If a lock is unavailable at the moment that a thread wants to obtain it, then of course the system must be involved in order to put the thread in a waiting mode. But if the lock is available to obtain, then the thread should be able to proceed without the need for any system call.
That's what I would guess, and I would be quite disappointed to find out that things do not actually work this way, because that would mean that code which I have up until now been considering pretty innocent in fact has a tremendous additional hidden overhead.
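To make that concrete (names are made up, and this assumes a futex-based mutex like glibc's): in the sketch below, lock and unlock are plain atomic operations in user space as long as nobody else holds the lock; the kernel is entered via futex(2) only when a thread actually has to sleep or be woken.

#include <pthread.h>

static pthread_mutex_t q_lock = PTHREAD_MUTEX_INITIALIZER;
static int queue_depth;                  /* stand-in for a real thread-safe queue */

void producer_push(void)
{
    pthread_mutex_lock(&q_lock);         /* user-space fast path when uncontended */
    queue_depth++;
    pthread_mutex_unlock(&q_lock);       /* futex wake only if someone is waiting */
}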
Yes, every IPC mechanism is set up by some syscalls(2).
It might happen that the IPC was set up by a previous program (e.g. the program running in the same process before execve); for example, when running a pipeline like ls | ./yourprog it is the shell which has called pipe(2), not yourprog.
Since threads -in the same process- (by definition) share a common address space, they can communicate using shared data. However, they often need some syscall for synchronization (e.g. with mutexes), see e.g. futex(7) - because you want to avoid spinlocks (i.e. wasting CPU power while waiting). But in practice you should use pthreads(7).
In practice you cannot use shared memory (like shm_overview(7)) without synchronization (e.g. with semaphores, see sem_overview(7)). Notice that cache coherence is tricky and can make the memory model non-intuitive (and processor-specific).
At least, you do not need a system call for each read/write to shared memory. Setting up shared memory will involve system calls for sure, and synchronizing the threads/processes will often involve them too.
You could use flags in shared memory for synchronization, but note that reads and writes of such flags may not be atomic operations.
(For example, you could set up a location in shared memory to be 0 in the beginning, have the other process set it to non-zero when it is ready for something, and then check for it to become non-zero.)
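A rough sketch of such a flag, using C11 atomics so the loads and stores are well-defined and properly ordered (the structure layout is just an example, and it assumes atomic_int is lock-free, which it is on common platforms):

#include <stdatomic.h>

/* Lives inside a shared memory segment mapped by both processes. */
struct shared_area {
    atomic_int ready;                    /* 0 initially; set to 1 by the producer */
    char payload[4096];
};

void producer_publish(struct shared_area *s)
{
    /* ... fill s->payload first ... */
    atomic_store_explicit(&s->ready, 1, memory_order_release);
}

int consumer_ready(struct shared_area *s)
{
    return atomic_load_explicit(&s->ready, memory_order_acquire);
}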

Is it feasible to implement Linux concurrency primitives that give better isolation than threads but comparable performance?

Consider the following application: a web search server that, upon start, creates a large in-memory index of web pages based on data read from disk. Once initialized, the in-memory index cannot be modified, and multiple threads are started to serve user queries. Assume the server is compiled to native code and uses OS threads.
Now, the threading model gives no isolation between threads. A buggy thread, or any non-thread-safe code, can corrupt the index or corrupt memory that was allocated by and logically belongs to some other thread. Such problems are difficult to detect and debug.
Theoretically, Linux allows enforcing better isolation. Once the index is initialized, the memory it occupies can be marked read-only. Threads can be replaced with processes that share the index (shared memory) but otherwise have separate heaps and cannot corrupt each other. Illegal operations are automatically detected by the hardware and the operating system. No mutexes or other synchronization primitives are needed. Memory-related data races are completely eliminated.
Is such a model feasible in practice? Are you aware of any real-life applications that do such things? Or maybe there are some fundamental difficulties that make such a model impractical? Do you think this approach would introduce a performance overhead compared to traditional threads? Theoretically, the memory used is the same, but are there implementation-related issues that would make things slower?
The obvious solution is to not use threads at all. Use separate processes. Since each process has much in common with the others (the code and the read-only structures), making the read-only data shared is trivial: format it as needed for in-memory use within a file and map the file into memory.
Using this scheme, only the variable per-process data would be independent. The code would be shared, and statically initialized data would be shared until written. If a process croaks, there is zero impact on the other processes. No concurrency issues at all.
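A rough sketch of the mapping step (the path and names are made up, error handling kept minimal): every worker process maps the pre-built index file read-only, the kernel shares the underlying page-cache pages between them, and any stray write through the mapping faults.

#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

void *map_index(const char *path, size_t *len_out)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return NULL;

    struct stat st;
    if (fstat(fd, &st) < 0) {
        close(fd);
        return NULL;
    }

    void *idx = mmap(NULL, st.st_size, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);                           /* the mapping stays valid after close */
    if (idx == MAP_FAILED)
        return NULL;

    *len_out = st.st_size;
    return idx;
}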
You can use mprotect() to make your index read-only. On a 64-bit system you can map the local memory for each thread at a random address (see this Wikipedia article on address space layout randomization), which makes the odds of memory corruption from one thread touching another astronomically small (and of course any corruption that misses mapped memory altogether will cause a segfault). Obviously you'll need to have different heaps for each thread.
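For example (a minimal sketch; the region must be page-aligned, which mmap()-allocated memory is): once the index has been built, freeze it so any later stray write raises SIGSEGV instead of silently corrupting it.

#include <stddef.h>
#include <sys/mman.h>

int freeze_index(void *index_base, size_t index_len)
{
    return mprotect(index_base, index_len, PROT_READ);
}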
I think you might find memcached interesting. Also, you can create a shared memory and open it as read-only and then create your threads. This should not cause much performance degradation.

Linux system call for creating process and thread

I read in a paper that the underlying system call to create processes and threads is actually the same, and thus the cost of creating processes over threads is not that great.
First, I want to know what the system call is that creates processes/threads (possibly with some sample code or a link?).
Second, is the author correct to assume that creating processes instead of threads is inexpensive?
EDIT:
Quoting article:
Replacing pthreads with processes is surprisingly inexpensive, especially on Linux where both pthreads and processes are invoked using the same underlying system call.
Processes are usually created with fork; threads (lightweight processes) are usually created with clone nowadays. However, anecdotally, 1:N thread models also exist, which do neither.
Both fork and clone map to the same internal kernel function, do_fork. This function can create a lightweight process that shares the address space with the old one, or a separate process (and many other variants), depending on the flags you feed to it. The clone syscall is more or less a direct forwarding of that kernel function (and is used by the higher-level threading libraries), whereas fork wraps do_fork into the functionality of the 50-year-old traditional Unix function.
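A small illustration of how the flags make the difference (this uses the glibc clone() wrapper; the shared variable and stack size are arbitrary): with CLONE_VM the child shares the parent's address space, thread-style, while dropping CLONE_VM would give a fork-style separate copy.

#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/wait.h>

static int child_fn(void *arg)
{
    *(int *)arg = 42;                    /* visible to the parent only because of CLONE_VM */
    return 0;
}

int main(void)
{
    int shared = 0;
    char *stack = malloc(64 * 1024);     /* the child needs its own stack */

    pid_t pid = clone(child_fn, stack + 64 * 1024,
                      CLONE_VM | SIGCHLD, &shared);
    waitpid(pid, NULL, 0);
    free(stack);
    return shared == 42 ? 0 : 1;
}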
The important difference is that fork guarantees that a complete, separate copy of the address space is made. This, as Basil points out correctly, is done with copy-on-write nowadays and therefore is not nearly as expensive as one would think.
When you create a thread, it just reuses the original address space and the same memory.
However, one should not assume that creating processes is generally "lightweight" on unix-like systems because of copy-on-write. It is somewhat less heavy than for example under Windows, but it's nowhere near free.
One reason is that although the actual pages are not copied, the new process still needs a copy of the page table. This can be several kilobytes to megabytes of memory for processes that use larger amounts of memory.
Another reason is that although copy-on-write is invisible and a clever optimization, it is not free, and it cannot do magic. When data is modified by either process, which inevitably happens, the affected pages take a fault and have to be copied.
Redis is a good example where you can see that fork is everything but lightweight (it uses fork to do background saves).
The underlying system call to create threads is clone(2) (it is Linux-specific). BTW, the list of Linux system calls is in syscalls(2), and you could use the strace(1) command to understand the syscalls made by some process or command. Processes are usually created with fork(2) (or vfork(2), which is not very useful these days). However, you could (and some C standard libraries might do that) create them with some particular form of clone. I guess that the kernel shares some code to implement clone, fork, etc. (since some functionality, e.g. management of the virtual address space, is common).
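For example (yourprog is just a placeholder), you can watch which of these syscalls a program actually makes with:

strace -f -e trace=fork,vfork,clone ./yourprog

On a recent glibc even a plain fork() will typically show up in the trace as a clone(...) call, which illustrates that both end up in the same place inside the kernel.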
Indeed, process creation (and also thread creation) is generally quite fast on most Unix systems (because they use copy-on-write machinery for the virtual memory), typically a small fraction of a millisecond. But you can have pathological cases (e.g. thrashing) which make it take much longer.
Since most C standard library implementations are free software on Linux, you could study the source code of the one on your system (often GNU glibc, but sometimes musl-libc or something else).

Where can I find documentation on the kflushd?

I cannot find any documentation on kflushd, such as what it does exactly, how it is involved in network I/O, and how I could use it or call it from my own code.
kflushd, AFAIK, handles writing out pending I/O buffered in memory to the corresponding devices. If you want to flush pending I/O you can always call fsync, fflush, or sync to force a write to the I/O device.
To call it from your code simply use one of the calls I mentioned (although I think there might be one more I'm forgetting).
Kernel processes like kflushd are started by the kernel on its own (they are not descendants of the init process created by fork-ing) and exist only for the kernel's needs. User applications may invisibly need them (because they need some feature offered by the kernel which the kernel implements with the help of its own kernel processes), but they don't actively use them.
You definitely should use the fflush(3) library function appropriately (it just happens to make the relevant write(2) syscalls).
You may want to use the fsync(2) and related syscalls.
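For instance, a sketch combining the two (names are made up): fflush(3) pushes the stdio buffer into the kernel with write(2), and fsync(2) then asks the kernel to push its dirty pages for that file down to the device, which is the work that kernel threads like kflushd would otherwise do lazily in the background.

#include <stdio.h>
#include <unistd.h>

int write_durably(FILE *fp, const char *data, size_t len)
{
    if (fwrite(data, 1, len, fp) != len)
        return -1;
    if (fflush(fp) != 0)                 /* stdio buffer -> kernel page cache */
        return -1;
    return fsync(fileno(fp));            /* kernel page cache -> storage device */
}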
Regarding networking, you may be interested by Nagle's algorithm. See this answer.
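If Nagle's algorithm turns out to be what is delaying your small writes, it can be disabled per socket, e.g. (a minimal sketch):

#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

int disable_nagle(int sock_fd)
{
    int one = 1;
    return setsockopt(sock_fd, IPPROTO_TCP, TCP_NODELAY, &one, sizeof(one));
}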
