Fork in critical section - linux

Recently, in an interview, we were discussing critical sections.
The question was: "What happens when we execute fork() inside a critical section? Will the resulting child process also execute the critical section simultaneously?"
We discussed these possibilities:
1. Yes, both might execute the critical section simultaneously.
2. The fork() system call might block the child process, allowing only the parent to execute the critical section.
3. The compiler may be intelligent enough to identify this problem and throw a compilation error.
Unfortunately, I could not find more details about this on the internet. TIA.
Edited:
Adding the pseudocode for reference:
semaphore s;
s.wait(); // lock
/* critical-section */
pid = fork(); /* what will happen here in child/parent process? */
s.signal(); // unlock

For Linux semaphores, the second parameter of sem_init determines if it's a cross-process semaphore. You place those in shared memory, which is inherited by fork.
fork does not try to check existing semaphores, nor does it try to adjust the semaphore count. Semaphores can have counts >1, and will allow that many running threads. So a count of 2 would allow two threads to run - fork isn't going to guess.
[edit] The old answer below assumed a Linux futex, which is more like a critical section.
"The" critical section is misleading. After the fork, both processes have their own critical sections. As a result, none of your 3 options apply.

what happens when we execute fork inside critical section? Will the resulting child process also execute the critical section simultaneously?
1. Yes, both might execute the critical section simultaneously
2. fork() system call might be blocking for child process, allowing only parent to execute the critical section.
3. Compiler may be intelligent enough to identify this problem and might throw compilation error.
Any of those and more is possible in principle, but neither (2) nor (3) is implemented by any system I know. In particular, since you tagged Linux, GLibc's fork() doesn't have any special provision for interaction with semaphores, and GCC will not reject code on the grounds you suggest. (There is a matter of the (System V) semaphore adjustment, but that's not directly relevant.)
But (1) is not completely correct, either. It is true that if the fork() is successful then the child will start execution by returning from the fork into the critical section. Nothing specifically prevents that from happening while the parent is still inside the critical section itself, so it may be that both run in the critical section at the same time. On the other hand, it is possible that one of the two resulting processes does not get scheduled to actually execute any instructions until a time that happens to be after the other has exited the critical section. That would look a lot like (2) even though technically, neither process was blocked.
That is barely the tip of the iceberg of issues involved in such a situation, however.

Related

Multi-threaded fork()

In a multi-threaded application, if a thread calls fork(), the child copies the state of only that thread, so the child process starts out single-threaded. If some other thread held a lock needed by the thread that called fork(), that lock will never be released in the child process. This is a problem.
To counter this, we could modify fork() in two ways: either copy all the threads instead of only the calling one, or make sure that any lock held by the (other) non-copied threads is released. What would the modified fork() system call look like in each of these cases? And which of the two would be better, or what are the advantages and disadvantages of each option?
This is a thorny question.
POSIX has pthread_atfork() to work through the mess of mixing forks and thread creation. The NOTES section of that man page discusses mutexes etc. However, it acknowledges that getting it right is hard.
The function isn't so much an alternative to fork() as it is a way to explain to the pthread library how your program needs to be prepared for the use of fork().
In general, not launching threads in the child of fork(), and instead either exiting the child or calling exec as soon as possible, will minimize problems.
This post has a good discussion of pthread_atfork().
...Or we can make sure that any lock held by the (other) non-copied threads will be released.
That's going to be harder than you realize because a program can implement "locks" entirely in user-mode code, in which case, the OS would have no knowledge of them.
Even if you were careful only to use locks that were known to the OS you still have a more general problem: Creating a new process with just the one thread would effectively be no different from creating a new process with all of the threads and then immediately killing all but one of them.
Read about why we don't kill threads. In a nutshell: locks aren't the only state that needs to be cleaned up. Any of the threads that existed in the parent but not in the child could, at the moment of the fork call, have been in the middle of making a mess that needs to be cleaned up. If that thread doesn't exist in the child, then you've lost the knowledge of what needs to be cleaned up.
we can copy all the threads instead of only that single one...
That also is a potential problem. The one thread that calls fork() knows when and why fork() was called, and it can prepare for the fork call. None of the other threads would have any warning. And if any of those threads is interacting with something outside of the process (e.g., talking to a remote service), then where you previously had one client talking to the service, you suddenly have two clients talking to the same service, and each thinks it is the only one. That's not going to end well.
Don't call fork() from multi-threaded programs.
In one project I worked on, we had a big multi-threaded program that needed to spawn other processes. We had it spawn a simple, single-threaded "helper" program before it created any new threads. Then, whenever it needed to spawn another process, it sent a message to the helper, and the helper did it.

Can a Linux process/thread terminate without pass through do_exit()?

To verify the behavior of a third party binary distributed software I'd like to use, I'm implementing a kernel module whose objective is to keep track of each child this software produces and terminates.
The target binary is a Golang produced one, and it is heavily multi thread.
The kernel module I wrote installs hooks on the kernel functions _do_fork() and do_exit() to keep track of each process/thread this binary produces and terminates.
The LKM works, more or less.
During some conditions, however, I have a scenario I'm not able to explain.
It seems like a process/thread could terminate without passing through do_exit().
The evidence I collected by putting printk() shows the process creation but does not indicate the process termination.
I'm aware that printk() can be slow, and I'm also aware that messages can be lost in such situations.
Trying to prevent message loss due to a slow console (for this particular application, a 115200 serial tty is used), I switched to a quicker console and collected the messages using netconsole.
The described setup seems to confirm that a process can terminate without passing through the do_exit() function.
But because I wasn't sure my messages couldn't be lost in the printk() infrastructure, I repeated the same test with trace_printk() in place of printk(), since it is a leaner alternative.
Still the same result: occasionally I see processes that never pass through do_exit(), and when I verify whether the PID is currently running, I have to face the fact that it is not.
Also note that I put my hook in the do_exit() kernel function as the first instruction to ensure the function flow does not terminate inside a called function.
My question is then the following:
Can a Linux process terminate without its flow passing through the do_exit() function?
If so, can someone give me a hint of what this scenario can be?
After a long debug session, I'm finally able to answer my own question.
That's not all; I'm also able to explain why I saw the strange behavior I described in my scenario.
Let's start from the beginning: while monitoring a heavily multithreaded application, I observed rare cases of a PID that suddenly stopped existing without its flow ever passing through the Linux kernel's do_exit() function.
Hence my original question:
Can a Linux process terminate without passing through the do_exit() function?
To the best of my current knowledge, which I would by now consider reasonably extensive, a Linux process cannot end its execution without passing through the do_exit() function.
But this answer is in contrast with my observations, and the problem leading me to this question is still there.
Someone here suggested that the strange behavior I watched was due to my observations being somehow wrong, implying that my method, and therefore my conclusions, were inaccurate.
My observations were correct: the process I watched terminated without ever passing through do_exit().
To explain this phenomenon, I want to put another question on the table, one that internet searchers may find useful:
Can two processes share the same PID?
If you'd asked me this a month ago, I'd surely have answered: "definitely not, two processes cannot share the same PID."
Linux is more complex, though.
There's a situation in which, in a Linux system, two different processes can share the same PID!
https://elixir.bootlin.com/linux/v4.19.20/source/fs/exec.c#L1141
Surprisingly, this behavior does not harm anyone; when this happens, one of these two processes is a zombie.
updated to correct an error
The circumstances behind this duplicated PID are more intricate than I first described. The scenario arises when a thread other than the thread group leader calls execve(). Before the new program can run, the kernel must flush the old execution context: execve() calls flush_old_exec(), which in turn calls de_thread(). de_thread() eliminates every thread in the process except the one performing the exec, and the execing thread then takes over the leader's PID. The old leader is not destroyed immediately; it remains a zombie until release_task() is called, and during that window it still carries the same PID the execing thread now uses.
end of the update
That is what I was seeing: the PID I was monitoring never showed up in do_exit() because, by the time the corresponding thread terminated, it no longer had the PID it had when it started; it had its leader's.
For people who know the Linux kernel's mechanics very well, this is nothing to be surprised by; this behavior is intended and hasn't changed since 2.6.17.
The current 5.10.3 still behaves this way.
Hoping this is useful to internet searchers, I'd like to add that it also answers the following:
Question: Can a Linux process/thread terminate without passing through do_exit()? Answer: NO, do_exit() is the only way a process can end its execution, whether intentional or unintentional.
Question: Can two processes share the same PID? Answer: Normally, no. There are rare cases in which two schedulable entities have the same PID.
Question: Does the Linux kernel have scenarios where a process changes its PID? Answer: Yes, there is at least one scenario where a process changes its PID.
Can a Linux process terminate without its flow passing through the do_exit() function?
Probably not, but you should study the source code of the Linux kernel to be sure. Ask on KernelNewbies. Kernel threads and udev or systemd related things (or perhaps modprobe or the older hotplug) are probable exceptions. When your /sbin/init of pid 1 terminates (that should not happen) strange things would happen.
The LKM works, more or less.
What does that mean? How can a kernel module half-work?
And in real life, it does happen sometimes that your Linux kernel panics or crashes (and it could happen with your LKM, if it has not been peer-reviewed by the Linux kernel community). In such a case, there is no longer any notion of processes, since they are an abstraction provided by a living Linux kernel.
See also dmesg(1), strace(1), proc(5), syscalls(2), ptrace(2), clone(2), fork(2), execve(2), waitpid(2), elf(5), credentials(7), pthreads(7)
Look also inside the source code of your libc, e.g. GNU libc or musl-libc
Of course, see Linux From Scratch and Advanced Linux Programming
And verifying if the PID is currently running,
This can be done in user land with /proc/, or using kill(2) with a 0 signal (and maybe also pidfd_send_signal(2)...)
PS. I still don't understand why you need to write a kernel module or change the kernel code. My intuition would be to avoid doing that when possible.

How to ensure a signal handler never yields to a thread within the same process group?

This is a bit of a meta question since I think I have a solution that works for me, but it has its own downsides and upsides. I need to do a fairly common thing, catch SIGSEGV on a thread (no dedicated crash handling thread), dump some debug information and exit.
The catch here is the fact that upon crash, my application runs llvm-symbolizer which takes a while (relatively speaking) and causes a yield (either because of clone + execve or exceeding the time quanta for the thread, I've seen latter happen when doing symbolication myself in-process using libLLVM). The reason for doing all this is to get a stack trace with demangled symbols and with line/file information (stored in a separate DWP file). For obvious reasons I do not want a yield happening across my SIGSEGV handler since I intend to terminate the application (thread group) after it has executed and never return from the signal handler.
I'm not that familiar with Linux signal handling and with glibc's wrappers doing magic around them, though, I know the basic gotchas but there isn't much information on the specifics of handling signals like whether synchronous signal handlers get any kind of special priority in terms of scheduling.
Brainstorming, I had a few ideas and downsides to them:
pthread_kill(<every other thread>, SIGSTOP) - Cumbersome with more threads, interacts with signal handlers which seems like it could have unintended side effects. Also requires intercepting thread creation from other libraries to keep track of the thread list and an increasing chance of pre-emption with every system call. Possibly even change their contexts once they're stopped to point to a syscall exit stub or flat out use SIGKILL.
Global flag to serve as a cancellation point for all threads (kinda like pthread_cancel/pthread_testcancel). Safer, but requires a lot of maintenance, and across a large codebase it can be hellish, in addition to a mild performance overhead. A global flag could also cause the error to cascade: the program is already in an unpredictable state, so letting any other thread run there is already not great.
"Abusing" the scheduler which is my current pick, with my implementation as one of the answers. Switching to FIFO scheduling policy and raising priority therefore becoming the only runnable thread in that group.
Core dumps are not an option, since the goal here was to avoid them in the first place. I would also prefer not to require a helper program aside from the symbolizer.
Environment is a typical glibc based Linux (4.4) distribution with NPTL.
I know that crash handlers are fairly common now, so I believe none of the ways I picked are that great, especially considering I've never seen the scheduler "hack" used that way. So with that, does anyone have a better alternative that is cleaner and less risky than the scheduler "hack", and am I missing any important points in my general ideas about signals?
Edit: It seems that I haven't really considered MP in this equation (as per comments): other threads are still runnable in an MP situation and can happily continue running alongside the FIFO thread on a different processor. I can, however, change the affinity of the process to execute only on the same core as the crashing thread, which will effectively freeze all other threads at schedule boundaries. However, that still leaves the "FIFO thread yielding due to blocking IO" scenario open.
It seems like the FIFO + SIGSTOP option is the best one, though I do wonder if there are any other tricks that can make a thread unschedulable short of using SIGSTOP. From the documentation it seems it's not possible to set a thread's CPU affinity mask to zero (leaving it in a limbo state where it's technically runnable except no processors are available for it to run on).
upon crash, my application runs llvm-symbolizer
That is likely to cause deadlocks. I can't find any statement about llvm-symbolizer being async-signal safe. It's likely to call malloc, and if so will surely deadlock if the crash also happens inside malloc (e.g. due to heap corruption elsewhere).
Switching to FIFO scheduling policy and raising priority therefore becoming the only runnable thread in that group.
I believe you are mistaken: a SCHED_FIFO thread will run so long as it is runnable (i.e. does not issue any blocking system calls). If the thread does issue such a call (which it has to: to e.g. open the separate .dwp file), it will block and other threads will become runnable.
TL;DR: there is no easy way to achieve what you want, and it seems unnecessary anyway: what do you care that other threads continue running while the crashing thread finishes its business?
This is the best solution I could come up with (parts omitted for brevity, but it shows the principle). My basic assumption is that in this situation the process runs as root. This approach can lead to resource starvation if things go really bad, and it requires privileges (if I understand the sched(7) man page correctly). I run the part of the signal handler that causes preemptions under the OSSplHigh guard and exit the scope as soon as I can. This is not strictly C++ related, since the same could be done in C or any other native language.
void spl_get(spl_t& O)
{
    os_assert(syscall(__NR_sched_getattr,
                      0, &O, sizeof(spl_t), 0) == 0);
}
void spl_set(spl_t& N)
{
    os_assert(syscall(__NR_sched_setattr,
                      0, &N, 0) == 0);
}
void splx(uint32_t PRI, spl_t& O)
{
    spl_get(O); // save the old scheduling attributes before overwriting them
    spl_t PL = {0};
    PL.size = sizeof(PL);
    PL.sched_policy = SCHED_FIFO;
    PL.sched_priority = PRI;
    spl_set(PL);
}
class OSSplHigh {
    os::spl_t OldPrioLevel;
public:
    OSSplHigh() {
        os::splx(2, OldPrioLevel);
    }
    ~OSSplHigh() {
        os::spl_set(OldPrioLevel);
    }
};
The handler itself is quite trivial using sigaltstack and sigaction, though I do not block SIGSEGV on any thread. Also, oddly enough, the sched_setattr and sched_getattr syscalls and the struct sched_attr definition aren't exposed through glibc, contrary to the documentation.
Late edit: The best solution involved sending SIGSTOP to all threads (intercepting pthread_create via the linker's --wrap option to keep a ledger of all running threads). Thanks to the suggestion in the comments.

Correct way of calling fork() after parent has created threads?

I'm implementing a complex application that takes third-party plug-ins, and I want to run the plug-in code in child processes for isolation. The parent process needs to be multithreaded, but I have read that fork may be unsafe in multithreaded processes, particularly if you do not immediately call execve, and that pthread_atfork is not a complete solution.
What do other complex applications do about this? I know Chrome uses both subprocesses and multithreading simultaneously, so it must be possible.
The behavior of fork() in a multithreaded program is well-defined. On success, the child process has exactly one thread -- the same one that called fork() in the parent program. Although this can be a problem, whether it actually is a problem depends on the circumstances.
When is fork()ing a problem for a multithreaded program?
The main reason for fork()ing to present a problem in a multithreaded program is that the child process depends on mutexes, condition variables, etc. that other threads can no longer be relied upon to manipulate. For example, if the child needs to acquire a process-private mutex that it does not already hold, then it may be that that mutex was held by a different thread at the time of the fork. In that case, it will never be released in the child process, because no thread that could release it exists in the child.
When is fork()ing not a problem for a multithreaded program?
One of the common idioms involving fork() is to immediately follow it up by execing another program. That's no problem, regardless of the threadedness of the parent.
Alternatively, if the child process does not depend on any problematic resources, then nothing special need be done. Note that process-shared interthread objects are not "problematic" in this sense. This situation is fairly common, and it sounds like it might be your case.
Otherwise, it's not a problem if the parent's forking thread can and does acquire all the process-private interthread resources that the child will need before it forks. Handlers registered by pthread_atfork() can help with this under some circumstances, but under others, it makes more sense for that to be done in the immediate environs of the fork call.
Overall
You've presented the question as if fork()ing was a deep and troublesome problem for multithreaded programs. It is certainly a problem that should be considered, and it is typically best to avoid using both multiple threads and multiple processes. Therefore, inasmuch as you want multiple processes so as to have separate address spaces and perhaps name spaces into which to load plugins, perhaps you should consider using separate processes wherever you now use threads. On the other hand, if you exercise some thought and care, you can probably make it work just fine for your multi-threaded process to fork children and interact with them.
If you cannot ensure that fork is only used under safe circumstances, as described in John Bollinger's answer, a general workaround is to use a "fork server". Before creating any threads, the original process forks once. The child process is the fork server; it remains single-threaded. The parent process now goes ahead and creates its threads. Whenever the parent would want to call fork, it instead sends a message to the fork server asking it to do so.
If the (ultimate) child processes also need to communicate with the parent, the easiest way to accomplish this is to have the parent create pipes for each child's stdin and stdout, and then transfer the child sides of those pipes to the fork server, using a SCM_RIGHTS special message. You can send file descriptors and data simultaneously. The communication protocol between the fork server and the parent might need to get pretty fancy — look at the posix_spawn API for a more-or-less complete list of all the knobs you might want. (Note: posix_spawn is just a library wrapper around fork; using it will not avoid the original problem.)
The fork server is also responsible for calling waitpid and relaying exit statuses back to the parent. This is trickier than it ought to be, because the standard APIs for waiting for the next of several possible events (select and poll) do not accept a process ID as one of the things to wait for. (BSD's kqueue does, but you're probably not on a BSD.) You have to do a messy dance with SIGCHLD and a pipe-to-self instead.

Why pthread_mutex_lock is used, when the same can be done in a programmable way?

We all know about semaphores and the critical-section problem.
In pthreads, this can be sorted by using pthread_mutex_lock( ) and pthread_mutex_unlock( ).
But why do we need these system calls, when the same can be implemented in the code, by doing something like:
flag = 1;
if (flag) // Thread1 enters and makes flag = 0
{
flag = 0; // On entering critical section, flag is made 0 so that others can't enter
// do some critical section operation
flag = 1;
}
// Thread1 exits
Doing the same as above, will it solve the critical section problem? If no, then why?
There are probably many reasons why pthread_mutex objects and the APIs that manipulate them are used instead of everyone coding up their own synchronization primitives. A couple of the more important ones:
much like many other objects and functionality that are useful to a wide audience, it makes sense to standardize that functionality so that people don't have to reinvent the wheel and so they can use and recognize standard patterns and idioms. In other words, there are pthread mutex APIs for the same reason there are standard string manipulation functions.
synchronization techniques are notoriously complicated and difficult to get right. So it's best to have a vetted library of code that performs this functionality. Even if it were OK to reinvent the wheel umpteen million times, having 99% of those implementations with serious flaws isn't a great situation. For example, pthreads handles issues like memory barriers and atomicity which are not addressed properly in the example you have in your question. Considering the example in the question: there's at least one serious problem; it has a race condition where two threads could enter the critical section concurrently since the test of the flag and setting it to 0 aren't performed atomically.
First, even if your code worked, the second thread would skip the critical section entirely rather than wait. You'd have to place a loop or something similar there.
Also, take into account that the scheduler may preempt your thread anywhere. What happens if thread A does the test and is preempted before changing flag, and thread B is then allowed to do the test and enter the critical section, being preempted soon afterwards? You'll have two threads in there.
Firstly, you should use atomic memory operations (see InterlockedCompareExchange() in MSVC and __sync_val_compare_and_swap() in GCC).
Secondly, this code will work, but only if the second thread doesn't need to wait while the first sets flag back to 1. If it does, you will end up with a spin loop that consumes all your CPU. In that case you should use something that puts the waiting thread to sleep (for example, pthread_mutex_lock()).
Since you have tagged your question with "linux", one might add that pthreads mutexes are built on top of something called "futexes", or "fast userspace mutexes". As the name implies, the fast path, namely locking and unlocking an uncontended mutex, does NOT require a syscall; it's all done in userspace. FWIW, AFAIK Windows does something similar as well.
