Correct way of calling fork() after parent has created threads?

I'm implementing a complex application that takes third-party plug-ins, and I want to run the plug-in code in child processes for isolation. The parent process needs to be multithreaded, but I have read that fork may be unsafe in multithreaded processes, particularly if you do not immediately call execve, and that pthread_atfork is not a complete solution.
What do other complex applications do about this? I know Chrome uses both subprocesses and multithreading simultaneously, so it must be possible.

The behavior of fork() in a multithreaded program is well-defined. On success, the child process has exactly one thread -- the same one that called fork() in the parent program. Although this can be a problem, whether it actually is a problem depends on the circumstances.
When is fork()ing a problem for a multithreaded program?
The main reason for fork()ing to present a problem in a multithreaded program is that the child process depends on mutexes, condition variables, etc. that other threads can no longer be relied upon to manipulate. For example, if the child needs to acquire a process-private mutex that it does not already hold, then it may be that that mutex was held by a different thread at the time of the fork. In that case, it will never be released in the child process, because no thread that could release it exists in the child.
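To make that hazard concrete, here is a small, self-contained C sketch (not taken from the answer; names and timings are invented for illustration) in which a worker thread holds a mutex at the moment of the fork, so the child blocks forever trying to acquire its copy of that mutex:

    /* Demonstration of the fork-while-a-lock-is-held problem. Compile with -pthread. */
    #include <pthread.h>
    #include <signal.h>
    #include <stdio.h>
    #include <sys/wait.h>
    #include <unistd.h>

    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

    static void *worker(void *arg)
    {
        (void)arg;
        pthread_mutex_lock(&lock);    /* hold the lock across the parent's fork() */
        sleep(2);
        pthread_mutex_unlock(&lock);
        return NULL;
    }

    int main(void)
    {
        pthread_t t;
        pthread_create(&t, NULL, worker, NULL);
        sleep(1);                     /* make it likely the worker holds the lock now */

        pid_t pid = fork();
        if (pid == 0) {
            /* Only the forking thread exists here. The thread that locked the
             * mutex does not exist in the child, so the child's copy of the
             * mutex stays locked forever and this call never returns. */
            pthread_mutex_lock(&lock);
            printf("child: got the lock (never printed)\n");
            _exit(0);
        }

        pthread_join(t, NULL);        /* the parent's worker releases its own copy */
        sleep(1);                     /* give the child time to (not) make progress */
        kill(pid, SIGKILL);           /* reclaim the deadlocked child */
        waitpid(pid, NULL, 0);
        puts("parent: child never acquired the lock; it had to be killed");
        return 0;
    }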
When is fork()ing not a problem for a multithreaded program?
One of the common idioms involving fork() is to immediately follow it up by execing another program. That's no problem, regardless of the threadedness of the parent.
Alternatively, if the child process does not depend on any problematic resources, then nothing special need be done. Note that process-shared interthread objects are not "problematic" in this sense. This situation is fairly common, and it sounds like it might be your case.
Otherwise, it's not a problem if the parent's forking thread can and does acquire all the process-private interthread resources that the child will need before it forks. Handlers registered by pthread_atfork() can help with this under some circumstances, but under others, it makes more sense for that to be done in the immediate environs of the fork call.
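As a rough illustration of the "forking thread acquires what the child will need first" approach, here is a minimal sketch; the lock name and wrapper function are made up for the example:

    #include <pthread.h>
    #include <unistd.h>

    /* A process-private lock the child will need; illustrative only. */
    static pthread_mutex_t work_queue_lock = PTHREAD_MUTEX_INITIALIZER;

    pid_t fork_with_lock_held(void)
    {
        pthread_mutex_lock(&work_queue_lock);   /* no other thread holds it now */
        pid_t pid = fork();
        if (pid == 0) {
            /* Child: the (only) thread here holds the lock, so it can release
             * it and proceed safely. */
            pthread_mutex_unlock(&work_queue_lock);
            return 0;
        }
        /* Parent (or fork failure): release and continue. */
        pthread_mutex_unlock(&work_queue_lock);
        return pid;
    }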
Overall
You've presented the question as if fork()ing was a deep and troublesome problem for multithreaded programs. It is certainly a problem that should be considered, and it is typically best to avoid using both multiple threads and multiple processes. Therefore, inasmuch as you want multiple processes so as to have separate address spaces and perhaps name spaces into which to load plugins, perhaps you should consider using separate processes wherever you now use threads. On the other hand, if you exercise some thought and care, you can probably make it work just fine for your multi-threaded process to fork children and interact with them.

If you cannot ensure that fork is only used under safe circumstances, as described in John Bollinger's answer, a general workaround is to use a "fork server". Before creating any threads, the original process forks once. The child process is the fork server; it remains single-threaded. The parent process now goes ahead and creates its threads. Whenever the parent would want to call fork, it instead sends a message to the fork server asking it to do so.
If the (ultimate) child processes also need to communicate with the parent, the easiest way to accomplish this is to have the parent create pipes for each child's stdin and stdout, and then transfer the child sides of those pipes to the fork server, using a SCM_RIGHTS special message. You can send file descriptors and data simultaneously. The communication protocol between the fork server and the parent might need to get pretty fancy — look at the posix_spawn API for a more-or-less complete list of all the knobs you might want. (Note: posix_spawn is just a library wrapper around fork; using it will not avoid the original problem.)
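For reference, a sendmsg()-based sketch of passing one descriptor with SCM_RIGHTS might look roughly like this; the function name, and the assumption that the fork server sits on the other end of an AF_UNIX socketpair, are mine rather than part of the answer:

    #include <string.h>
    #include <sys/socket.h>
    #include <sys/uio.h>

    /* Send one file descriptor plus some request data over a Unix-domain socket. */
    int send_fd(int sock, int fd_to_send, const void *data, size_t len)
    {
        struct iovec iov = { .iov_base = (void *)data, .iov_len = len };
        union {                          /* aligned buffer for the control message */
            struct cmsghdr hdr;
            char buf[CMSG_SPACE(sizeof(int))];
        } ctrl;
        memset(&ctrl, 0, sizeof ctrl);

        struct msghdr msg = {
            .msg_iov = &iov, .msg_iovlen = 1,
            .msg_control = ctrl.buf, .msg_controllen = sizeof ctrl.buf,
        };

        struct cmsghdr *cm = CMSG_FIRSTHDR(&msg);
        cm->cmsg_level = SOL_SOCKET;
        cm->cmsg_type  = SCM_RIGHTS;
        cm->cmsg_len   = CMSG_LEN(sizeof(int));
        memcpy(CMSG_DATA(cm), &fd_to_send, sizeof(int));

        return sendmsg(sock, &msg, 0) < 0 ? -1 : 0;   /* data and fd travel together */
    }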
The fork server is also responsible for calling waitpid and relaying exit statuses back to the parent. This is trickier than it ought to be, because the standard APIs for waiting for the next of several possible events (select and poll) do not accept a process ID as one of the things to wait for. (BSD's kqueue does, but you're probably not on a BSD.) You have to do a messy dance with SIGCHLD and a pipe-to-self instead.
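A rough sketch of that SIGCHLD-plus-self-pipe dance, as the fork server's main loop might implement it (the names and the simplified protocol are illustrative, and most error handling is omitted):

    #include <errno.h>
    #include <poll.h>
    #include <signal.h>
    #include <string.h>
    #include <sys/wait.h>
    #include <unistd.h>

    static int self_pipe[2];                 /* [0] read end, [1] write end */

    static void on_sigchld(int sig)
    {
        (void)sig;
        char c = 0;
        (void)write(self_pipe[1], &c, 1);    /* async-signal-safe wake-up */
    }

    void serve_children(int request_fd)      /* request_fd: spawn requests from the parent */
    {
        pipe(self_pipe);                     /* error handling omitted */

        struct sigaction sa;
        memset(&sa, 0, sizeof sa);
        sa.sa_handler = on_sigchld;
        sigemptyset(&sa.sa_mask);
        sigaction(SIGCHLD, &sa, NULL);

        struct pollfd fds[2] = {
            { .fd = self_pipe[0], .events = POLLIN },
            { .fd = request_fd,   .events = POLLIN },
        };

        for (;;) {
            int n = poll(fds, 2, -1);
            if (n < 0) {
                if (errno == EINTR)
                    continue;                /* interrupted by SIGCHLD; retry */
                break;
            }
            if (fds[0].revents & POLLIN) {
                char buf[64];
                (void)read(self_pipe[0], buf, sizeof buf);   /* drain wake-ups */
                int status; pid_t pid;
                while ((pid = waitpid(-1, &status, WNOHANG)) > 0) {
                    /* ... relay (pid, status) back to the parent here ... */
                }
            }
            if (fds[1].revents & POLLIN) {
                /* ... read a spawn request and fork()/exec() the new child ... */
            }
        }
    }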

Related

Multi-threaded fork()

In a multi-threaded application, if a thread calls fork(), it will copy the state of only that thread. So the child process created would be a single-thread process. If some other thread were to hold a lock required by the thread which called the fork(), that lock would never be released in the child process. This is a problem.
To counter this, we can modify fork() in two ways. Either we can copy all the threads instead of only that single one, or we can make sure that any lock held by the (other) non-copied threads will be released. So what would the modified fork() system call look like in each of these cases? And which of the two would be better, or what would be the advantages and disadvantages of either option?
This is a thorny question.
POSIX has pthread_atfork() to work through the mess of mixing forks and thread creation. The NOTES section of that man page discusses mutexes etc. However, it acknowledges that getting it right is hard.
The function isn't so much an alternative to fork() as it is a way to explain to the pthread library how your program needs to be prepared for the use of fork().
In general, not launching a thread from the child of fork, and instead either exiting that child or calling exec as soon as possible, will minimize problems.
This post has a good discussion of pthread_atfork().
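For reference, the usual shape of those handlers is something like the following minimal sketch (the lock and function names are made up): the prepare handler takes the lock before the fork, and both the parent and the child release it afterwards, so the child never inherits it in a locked state.

    #include <pthread.h>

    static pthread_mutex_t lib_lock = PTHREAD_MUTEX_INITIALIZER;

    static void prepare(void) { pthread_mutex_lock(&lib_lock); }    /* runs before fork() */
    static void parent(void)  { pthread_mutex_unlock(&lib_lock); }  /* runs after, in the parent */
    static void child(void)   { pthread_mutex_unlock(&lib_lock); }  /* runs after, in the child */

    /* Call once, e.g. from the library's initialization routine. */
    void lib_install_fork_handlers(void)
    {
        pthread_atfork(prepare, parent, child);
    }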
...Or we can make sure that any lock held by the (other) non-copied threads will be released.
That's going to be harder than you realize because a program can implement "locks" entirely in user-mode code, in which case, the OS would have no knowledge of them.
Even if you were careful only to use locks that were known to the OS you still have a more general problem: Creating a new process with just the one thread would effectively be no different from creating a new process with all of the threads and then immediately killing all but one of them.
Read about why we don't kill threads. In a nutshell: locks aren't the only state that needs to be cleaned up. Any of the threads that existed in the parent but not in the child could, at the moment of the fork call, have been in the middle of making a mess that needs to be cleaned up. If that thread doesn't exist in the child, then you've lost the knowledge of what needs to be cleaned up.
we can copy all the threads instead of only that single one...
That also is a potential problem. The one thread that calls fork() would know when and why fork() was called, and it would be prepared for the fork call. None of the other threads would have any warning. And, if any of those threads is interacting with something outside of the process (e.g., talking to a remote service), then, where you previously had one client talking to the service, you suddenly have two clients talking to the same service, and they both think that they are the only one. That's not going to end well.
Don't call fork() from multi-threaded programs.
In one project I worked on: we had a big multi-threaded program that needed to spawn other processes. What we did was have it spawn a simple, single-threaded "helper" program before it created any new threads. Then, whenever it needed to spawn another process, it sent a message to the helper, and the helper did it.
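A stripped-down sketch of that helper-process idea might look roughly like this; the line-per-command protocol and all of the names are invented for illustration, and most error handling is omitted:

    #include <stdio.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <sys/wait.h>
    #include <unistd.h>

    static int helper_fd = -1;      /* the multi-threaded parent writes requests here */

    static void run_helper(int fd)                  /* stays single-threaded forever */
    {
        FILE *in = fdopen(fd, "r");
        char line[4096];
        while (in && fgets(line, sizeof line, in)) {
            line[strcspn(line, "\n")] = '\0';
            if (fork() == 0) {                      /* safe: no threads in this process */
                execl("/bin/sh", "sh", "-c", line, (char *)NULL);
                _exit(127);                         /* exec failed */
            }
            while (waitpid(-1, NULL, WNOHANG) > 0)  /* reap any finished children */
                ;
        }
        _exit(0);
    }

    /* Call this before creating any threads in the main program. */
    int start_helper(void)
    {
        int sv[2];
        if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
            return -1;
        pid_t pid = fork();
        if (pid < 0)
            return -1;
        if (pid == 0) {
            close(sv[0]);
            run_helper(sv[1]);                      /* never returns */
        }
        close(sv[1]);
        helper_fd = sv[0];
        return 0;
    }

    /* Later, from any thread of the (now multi-threaded) parent: */
    int spawn_via_helper(const char *cmdline)
    {
        return dprintf(helper_fd, "%s\n", cmdline) < 0 ? -1 : 0;
    }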

Does kill(SIGSTOP) take effect by the time kill() returns?

Suppose I have a parent process and a child process (started with e.g. fork() or clone()) running on Linux. Further suppose that there is some shared memory that both the parent and the child can modify.
Within the context of the parent process, I would like to stop the child process and know that it has actually stopped, and moreover that any shared memory writes made by the child are visible to the parent (including whatever synchronization or cache flushes that may require in a multi-processor system).
This answer, which speaks of using kill(SIGSTOP) to stop a child process, contains an interesting tidbit:
When the first kill() call succeeds, you can safely assume that the child has stopped.
Is this statement actually true, and if so, can anyone expound on it, or point me to some more detailed documentation (e.g. a Linux manpage)? Otherwise, is there another mechanism that I can use to ensure that the child process is completely stopped and is not going to be doing any more writes to the shared memory?
I'm imagining something along the lines of:
the parent sends a different signal (e.g. SIGUSR1), which the child can handle
the child handles the SIGUSR1 and does something like a pthread_cond_wait() in the signal handler to safely "stop" (though still running from the kernel perspective) -- this is not fully fleshed out in my mind yet, just an idea
I'd like to avoid reinventing the wheel if there's already an established solution to this problem. Note that the child process needs to be stopped preemptively; adding some kind of active polling to the child process is not an option in this case.
If it only existed on Linux, pthread_suspend() would be perfect ...
It definitely sounds like you should be using a custom signal with a handler, and not SIGSTOP.
It's rare not to care about the state of the child at all, e.g. to be fine with it having stored 32 bits out of a single non-atomic 64-bit write, or being caught logically between two dependent writes.
Even if you are, POSIX allows the OS to not make shared writes immediately visible to other processes, so the child should have a chance to call msync for portability, to ensure that writes are completely synced.
The POSIX documentation on Signal Concepts strongly suggests, but does not explicitly say, that the targeted process will be STOPped by the time kill() returns:
A signal is said to be "generated" for (or sent to) a process or thread when the event that causes the signal first occurs... Examples of such events include ... invocations of the kill() and sigqueue() functions.
The documentation is at pains to distinguish signal generation from delivery (when the signal action takes effect) or acceptance. Unfortunately, it sometimes mentions actions taken in response to a stop signal upon generation, and sometimes upon delivery. Given that something must happen upon generation per se, I'd agree that the target process must be STOPped by the time your call returns.
However, at the cost of another syscall, you can be sure. Since you have a parent/child relationship in your design, you can call waitpid() with WUNTRACED to receive notification that your child process has, indeed, STOPped.
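A minimal sketch of that belt-and-braces approach (the wrapper name is made up):

    #include <signal.h>
    #include <sys/wait.h>

    /* Send SIGSTOP, then block until the kernel reports the child as stopped. */
    int stop_and_confirm(pid_t child)
    {
        if (kill(child, SIGSTOP) < 0)
            return -1;

        int status;
        if (waitpid(child, &status, WUNTRACED) < 0)   /* returns once the child stops */
            return -1;

        if (WIFSTOPPED(status))
            return 0;      /* the kernel reports the child as stopped */
        return -1;         /* the child exited or was killed instead of stopping */
    }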
Edit
See the other answer from that other guy [sic] for reasons why you might not want to do this.

Linux/POSIX: Why doesn't fork() fork *all* threads

It is well-known that the default way to create a new process under POSIX is to use fork() (under Linux this internally maps to clone(...))
What I want to know is the following: It is well-known that when one calls fork() "The child process is created with a single thread--the one that called fork()"
(cf. https://linux.die.net/man/2/fork). This can of course cause problems if, for example, some other thread currently holds a lock. To me, not also forking all the threads that exist in the process intuitively feels like a "leaky abstraction".
So I would like to know: What is the reason why only the thread calling fork() will exist in the child process instead of all threads of the process? Is there a good technical reason for this?
I know that on Multithreaded fork there is a related question, but the answers given there don't answer mine.
Of these two possibilities:
only the thread calling fork() continues running in the child process
Downside: if another thread was holding on to an internal resource such as a lock, it will not be released.
after fork(), all threads are duplicated into the child process
Downside: threads that were interacting with external resources continue running in parallel. If a thread was appending data to a file: now it happens twice.
Both are bad, but the first choice only deadlocks the new child process, while the second choice results in corruption outside of the process. This could be described as "bad".
POSIX did standardize pthread_atfork to try to allow automatic cleanup in the first case, but it cannot possibly work.
tl;dr Don't use both threads and forks. Use posix_spawn if you have to.
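For completeness, a minimal posix_spawn-style sketch of launching an external program without calling fork() yourself might look like this (the program being run is just an example):

    #include <spawn.h>
    #include <stdio.h>
    #include <sys/wait.h>

    extern char **environ;

    int run_ls(void)
    {
        pid_t pid;
        char *argv[] = { "ls", "-l", NULL };

        /* Search PATH for "ls" and spawn it with default file actions/attributes. */
        int err = posix_spawnp(&pid, "ls", NULL, NULL, argv, environ);
        if (err != 0) {
            fprintf(stderr, "posix_spawnp failed: %d\n", err);
            return -1;
        }

        int status;
        waitpid(pid, &status, 0);     /* reap the child */
        return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
    }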

Forking vs Threading

I have used threading before in my applications and know its concepts well, but recently in my operating system lecture I came across fork(), which is somewhat similar to threading.
I googled the difference between them, and this is what I learned:
Fork creates a new process that looks exactly like the old (parent) process, but it is still a different process, with a different process ID and its own memory.
Threads are light-weight processes which have less overhead.
But, there are still some questions in my mind.
When should you prefer fork() over threading and vice versa?
If I want to call an external application as a child, then should I use fork() or threads to do it?
While doing a Google search I found people saying it is a bad thing to call fork() inside a thread. Why do people want to call fork() inside a thread when the two do similar things?
Is it true that fork() cannot take advantage of a multiprocessor system because the parent and child processes don't run simultaneously?
The main difference between the forking and threading approaches is one of operating system architecture. Back in the days when Unix was designed, forking was an easy, simple mechanism that best answered mainframe- and server-type requirements, so it was popularized on Unix systems. When Microsoft re-architected the NT kernel from scratch, it focused more on the threading model. As such, there is still a notable difference today, with Unix systems being efficient at forking and Windows more efficient with threads. You can most notably see this in Apache, which uses the prefork strategy on Unix and thread pooling on Windows.
Specifically to your questions:
When should you prefer fork() over threading and vice versa?
On a Unix system where you're doing a far more complex task than just instantiating a worker, or you want the implicit security sandboxing of separate processes.
If I want to call an external application as a child, then should I use fork() or threads to do it?
If the child will do an identical task to the parent, with identical code, use fork. For smaller subtasks use threads. For separate external processes use neither, just call them with the proper API calls.
While doing a Google search I found people saying it is a bad thing to call fork() inside a thread. Why do people want to call fork() inside a thread when the two do similar things?
Not entirely sure but I think it's computationally rather expensive to duplicate a process and a lot of subthreads.
Is it true that fork() cannot take advantage of a multiprocessor system because the parent and child processes don't run simultaneously?
This is false; fork creates a new process, which then takes advantage of all the features the OS task scheduler makes available to processes.
A forked process is called a heavy-weight process, whereas a thread is called a light-weight process.
The following are the differences between them:
A forked process is considered a child of the parent process, whereas the threads within a process are peers (siblings) of one another.
A forked process gets its own copy of the parent's data, heap, and stack (shared copy-on-write by the OS), whereas threads share the process's code and data but each has its own stack.
Switching between processes requires a full context switch by the OS, whereas switching between threads of the same process is much cheaper.
Creating multiple processes is resource-intensive, whereas creating multiple threads is much less so.
Each process runs in its own address space and cannot directly touch another process's memory, whereas one thread can read and write another thread's data.
Thread and process lecture
fork() spawns a new copy of the process, as you've noted. What isn't mentioned above is the exec() call which often follows. This replaces the existing process with a new process (a new executable) and as such, fork()/exec() is the standard means of spawning a new process from an old one.
e.g. that's how your shell will invoke a process from the command line. You specify your process (ls, say) and the shell forks and then execs ls.
Note that this operates at a very different level from threading. Threading runs multiple lines of execution intra-process. Forking is a means of creating new processes.
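A bare-bones sketch of that fork()/exec() pattern, roughly what a shell does for a foreground command, might look like this (the command is just an example):

    #include <stdio.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int run_command(void)
    {
        pid_t pid = fork();
        if (pid < 0) {
            perror("fork");
            return -1;
        }
        if (pid == 0) {
            /* Child: replace this process image with the new program. */
            execlp("ls", "ls", "-l", (char *)NULL);
            _exit(127);               /* only reached if exec failed */
        }

        /* Parent: wait for the child to finish, like a foreground shell job. */
        int status;
        waitpid(pid, &status, 0);
        return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
    }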
As #2431234123412341234123 said, on Linux, thanks to COW, processes are not much heavier than threads, and the choice boils down to the use case. COW (copy-on-write) means that a memory page of the forked process gets copied only when the forked process (or the parent) writes to it; until then, the OS keeps pointing both processes at the same physical page.
From a programming point of view, say you have a big data structure in heap memory, a 2D array[2000000][100] (about 200 MB), and the kernel's page size is around 4 KB. When the process is forked, no new memory for this array is allocated. If one particular row (100 bytes) is changed (in either the parent process or the child), only the corresponding page (4 KB, or 8 KB if the row straddles two pages) is copied and updated for the forked process.
Otherwise, memory in forked processes behaves much as it does with threads: the code is the same, while registers and the call stack are separate.
On Windows, as #Niels Keurentjes said, threads might be better from a performance point of view, but on Linux it is more a matter of the use case.

Twisted: use of multiple threads and processes together

The Twisted documentation led me to believe that it was OK to combine techniques such as reactor.spawnProcess() and threads.deferToThread() in the same application, that the reactor would handle this elegantly under the covers. Upon actually trying it, I found that my application deadlocks. Using multiple threads by themselves, or child processes by themselves, everything is fine.
Looking into the reactor source, I find that the SelectReactor.spawnProcess() method simply calls os.fork() without any consideration for multiple threads that might be running. This explains the deadlocks, because starting with the call to os.fork() you will have two processes with multiple concurrent threads running and doing who knows what with the same file descriptors.
My question for SO is, what is the best strategy for solving this problem?
What I have in mind is to subclass SelectReactor, so that it is a singleton and calls os.fork() only once, immediately when instantiated. The child process will run in the background and act as a server for the parent (using object serialization over pipes to communicate back and forth). The parent continues to run the application and may use threads as desired. Calls to spawnProcess() in the parent will be delegated to the child process, which will be guaranteed to have only one thread running and can therefore call os.fork() safely.
Has anyone done this before? Is there a faster way?
What is the best strategy for solving this problem?
File a ticket (perhaps after registering) describing the issue, preferably with a reproducible test case (for maximum accuracy). Then there can be some discussion about the best way (or ways; different platforms may demand different solutions) to implement it.
The idea of immediately creating a child process to help with further child process creation has been raised before, to solve the performance issue surrounding child process reaping. If that approach now resolves two issues, it starts to look a little more attractive. One potential difficulty with this approach is that spawnProcess synchronously returns an object which supplies the child's PID and allows signals to be sent to it. This is a little more work to implement if there is an intermediate process in the way, since the PID will need to be communicated back to the main process before spawnProcess returns. A similar challenge will be supporting the childFDs argument, since it will no longer be possible to merely inherit the file descriptors in the child process.
An alternate solution (which may be somewhat more hackish, but which may also have fewer implementation challenges) might be to call sys.setcheckinterval with a very large number before calling os.fork, and then restore the original check interval in the parent process only. This should suffice to avoid any thread switching in the process until the os.execvpe takes place, destroying all the extra threads. This isn't entirely correct, since it will leave certain resources (such as mutexes and conditions) in a bad state, but your use of these with deferToThread isn't very common, so maybe that doesn't affect your case.
The advice Jean-Paul gives in his answer is good, but this should work (and does in most cases).
First, Twisted uses threads for hostname resolution as well, and I've definitely used subprocesses in Twisted processes that also make client connections. So this can work in practice.
Second, fork() does not create multiple threads in the child process. According to the standard describing fork(),
A process shall be created with a single thread. If a multi-threaded process calls fork(), the new process shall contain a replica of the calling thread ...
Now, that's not to say that there are no potential multithreading issues with spawnProcess; the standard also says:
... to avoid errors, the child process may only execute async-signal-safe operations until such time as one of the exec functions is called ...
and I don't think there's anything to ensure that only async-signal-safe operations are used.
So, please be more specific as to your exact problem, since it isn't a subprocess with threads being cloned.
Returning to this issue after some time, I found that if I do this:
reactor.callFromThread(reactor.spawnProcess, *spawnargs)
instead of this:
reactor.spawnProcess(*spawnargs)
then the problem goes away in my small test case. There is a remark in the Twisted documentation "Using Processes" that led me to try this: "Most code in Twisted is not thread-safe. For example, writing data to a transport from a protocol is not thread-safe."
I suspect that the other people Jean-Paul mentioned were having this problem may be making a similar mistake. The responsibility is on the application to enforce that reactor and other API calls are being made within the correct thread. And apparently, with very narrow exceptions, the "correct thread" is nearly always the main reactor thread.
fork() on Linux definitely leaves the child process with only one thread.
I assume you are aware that, when using threads in Twisted, the ONLY Twisted API that threads are permitted to call is callFromThread? All other Twisted APIs must only be called from the main, reactor thread.
