I'm writing a mono-thread, memory-heavy proof-of-concept application.
This application doesn't manipulate much data per se; it will mainly load GBs of data and then do some data analysis on it.
I don't want to manage concurrency via a multi-threaded implementation and don't want to have to implement locks (e.g. mutexes, spinlocks, ...), so I've decided this time around to use the dear old fork().
On Linux, where forked memory is copy-on-write, I should be able to efficiently analyse the same datasets without having to copy them explicitly and with simple parallel mono-thread logic (again, this is a proof of concept).
Now that I spawn child processes with fork(), it is very easy to set up input parameters for a sub-task (a sub-process in this case), but then I have to get the results back to the main process, and sometimes these results are tens of GB large. The IPC mechanisms I have in mind are:
PIPEs/Sockets (and then epoll equivalent to wait for results in a mono-thread fashion)
Hybrid PIPEs/Shared Memory (epoll equivalent to wait for results with reference to Shared Memory, then copy data from Shared Memory into parent process, destroy Shared Memory)
What else could I use? Apart from the obvious "go multi-thread", I really would like to leverage the CoW and single-thread multi-process architecture for this proof of concept. Any ideas?
Thanks
After some experimenting, the conclusion I came to is the following:
When a child process has to communicate with the parent, before spawning that child process I create a segment of shared memory (e.g. 16 MB)
If coordination is needed, a semaphore is created in the shared memory segment
Then, upon forking, I create a pipe with pipe2() in non-blocking mode so the child can notify the parent when some data is available
The read end of the pipe is then registered with epoll
epoll is used level-triggered so I can interleave requests if the child processes are really fast at sending data
The shared memory segment is used to communicate data directly if the structures are POD, or via simple template<...> binary read/write functions if they are not
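A minimal sketch of that pattern (using an anonymous MAP_SHARED mapping instead of a named segment, leaving out the semaphore and most error handling; the 16 MB size and the payload are illustrative):

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/epoll.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

#define SEG_SIZE (16u * 1024 * 1024)         /* 16 MB result segment, as described above */

int main(void)
{
    /* Shared, anonymous mapping: visible to both parent and child after fork(). */
    void *seg = mmap(NULL, SEG_SIZE, PROT_READ | PROT_WRITE,
                     MAP_SHARED | MAP_ANONYMOUS, -1, 0);

    int pfd[2];
    pipe2(pfd, O_NONBLOCK | O_CLOEXEC);      /* child -> parent notification channel */

    pid_t pid = fork();
    if (pid == 0) {                          /* child: do the work, then notify */
        memcpy(seg, "result bytes...", 16);  /* write a POD result into shared memory */
        uint64_t nbytes = 16;
        write(pfd[1], &nbytes, sizeof nbytes);   /* tell the parent how much is ready */
        _exit(0);
    }

    /* parent: wait for the notification with epoll (level-triggered by default) */
    int ep = epoll_create1(0);
    struct epoll_event ev = { .events = EPOLLIN, .data.fd = pfd[0] };
    epoll_ctl(ep, EPOLL_CTL_ADD, pfd[0], &ev);

    struct epoll_event out;
    epoll_wait(ep, &out, 1, -1);
    uint64_t nbytes = 0;
    read(pfd[0], &nbytes, sizeof nbytes);
    printf("child produced %llu bytes in shared memory: %.15s\n",
           (unsigned long long)nbytes, (char *)seg);

    waitpid(pid, NULL, 0);
    return 0;
}

With more children, you would add one pipe read end per child to the same epoll instance and use the epoll event data to index into per-child segments.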
I believe this is a good solution.
Cheers
You could also use a regular file.
The parent process could wait for the child process to exit (the child analyses the data in memory and then writes its result to a file); once it exits, the parent can read the data from the file. As you mentioned, input parameters are not a problem: you can just pass the file name to write to as one of the input parameters. This way, there is no locking required, except for the wait() on the child process's exit status.
If each of your child processes returns data tens of GB large, it is much better to use regular files this way, as you will have enough time to process each child process's result. But is this 10s-of-GB data shared across child processes? If it were, you would have preferred to use locks, so I assume it isn't.
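A sketch of this file-based variant (the file name, the analysis step and the result type are placeholders): the parent passes the output path as the child's input parameter, wait()s for it, then reads the file.

#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
    const char *result_path = "/tmp/result.bin";  /* the "input parameter" given to the child */

    pid_t pid = fork();
    if (pid == 0) {
        /* child: analyse the (CoW-shared) data set, then dump the result to the file */
        FILE *out = fopen(result_path, "wb");
        int answer = 42;                          /* placeholder for the real analysis result */
        fwrite(&answer, sizeof answer, 1, out);
        fclose(out);
        _exit(0);
    }

    int status;
    waitpid(pid, &status, 0);                     /* the only synchronization needed */

    FILE *in = fopen(result_path, "rb");
    int answer;
    fread(&answer, sizeof answer, 1, in);
    fclose(in);
    printf("child result: %d\n", answer);
    return 0;
}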
Related
I am designing a program in C++ running in Ubuntu Linux and developed with Eclipse.
The program is implemented as an infinite cycle (main loop) communicating with several interfaces (GUI, RS485, etc.).
One of the tasks of the main cycle is to read information from a DSP via the I2C interface every millisecond. Each read operation through I2C takes around 0.5 ms.
My idea is to have a separate process or thread manage the I2C interface, while the main controller simply reads the data written by that separate thread. This avoids blocking the main cycle for a long time.
The question is: is it better to implement this separate process as a separate program or as a thread?
My idea is that a thread would be much better because it can use the same memory space as the main program and read/write the same memory buffer directly. The only precaution to take is to use a mutex to access the shared memory buffer safely.
With a separate program, instead, the communication would need shared memory (shmget), sockets, pipes, etc.
So the choice seems almost too easy, but I want to be sure I am not missing any details. In which cases would it be better to use a separate program instead of a thread?
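For comparison, a rough sketch of the thread option described above (the I2C read is stubbed out and the names are made up; compile with -pthread): one thread polls the device and updates a buffer under a mutex, while the main loop only copies the latest sample.

#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static double latest_sample;            /* shared buffer written by the I2C thread */

static double read_dsp_over_i2c(void)   /* placeholder for the real ~0.5 ms I2C read */
{
    return 1.234;
}

static void *i2c_thread(void *arg)
{
    (void)arg;
    for (;;) {
        double v = read_dsp_over_i2c();
        pthread_mutex_lock(&lock);      /* protect the shared buffer */
        latest_sample = v;
        pthread_mutex_unlock(&lock);
        usleep(1000);                   /* roughly once per millisecond */
    }
    return NULL;
}

int main(void)
{
    pthread_t tid;
    pthread_create(&tid, NULL, i2c_thread, NULL);

    for (int i = 0; i < 5; ++i) {       /* stand-in for the main loop */
        pthread_mutex_lock(&lock);
        double v = latest_sample;       /* cheap: just copy the latest value */
        pthread_mutex_unlock(&lock);
        printf("main loop sees %f\n", v);
        usleep(10000);
    }
    return 0;
}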
http://stackabuse.com/setting-up-a-node-js-cluster/
says
" To be clear, forking in Node is very different than a POISIX fork in that it doesn't actually clone the current process, but it does start up a new V8 instance.
Although this is one of the easiest ways to multi-thread, it should be used with caution. Just because you're able to spawn 1,000 workers doesn't mean you should. Each worker takes up system resources, so only spawn those that are really needed. The Node docs state that since each child process is a new V8 instance, you need to expect a 30ms startup time for each and at least 10mb of memory per instance."
But https://nodejs.org/api/cluster.html says
"There is no routing logic in Node.js, or in your program, and no shared state between the workers. Therefore, it is important to design your program such that it does not rely too heavily on in-memory data objects for things like sessions and login."
If the workers (forked processes) aren't actually clones of the master process, then how is it that there is also no shared state?
I was under the impression that if the master process has a one-gigabyte JSON string, then all the child processes would also have clones of that one-gigabyte JSON string. So with two children there would be 3 GB of memory used. What actually happens?
On Linux et al. fork() uses copy-on-write semantics, i.e. all of the memory pages of the forked process are shared (not copied) and only those pages that the process wants to modify are copied before the modifications are done. So it's possible to use very little memory even if you have a lot of forked processes, if your modified data is close together, i.e. it uses a small number of actual memory pages.
See:
http://man7.org/linux/man-pages/man2/fork.2.html
https://en.wikipedia.org/wiki/Fork_(system_call)
https://en.wikipedia.org/wiki/Copy-on-write
http://obvious.services.net/2011/01/history-of-copy-on-write-memory.html
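A tiny illustration of these semantics (the array size is arbitrary): the child modifies a single element of a large array, so only the touched page is copied, and the parent still sees the original value.

#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
    /* ~80 MB of data; after fork() these pages are shared until someone writes. */
    size_t n = 10 * 1000 * 1000;
    long *data = malloc(n * sizeof *data);
    for (size_t i = 0; i < n; ++i) data[i] = 7;

    pid_t pid = fork();
    if (pid == 0) {
        data[0] = 99;                   /* touches a single page: only that page is copied */
        printf("child sees  data[0] = %ld\n", data[0]);
        _exit(0);
    }
    waitpid(pid, NULL, 0);
    printf("parent sees data[0] = %ld\n", data[0]);   /* still 7: copies, not shared */
    free(data);
    return 0;
}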
I have used threading before in my applications and know its concepts well, but recently in my operating systems lecture I came across fork(), which is somewhat similar to threading.
I googled the difference between them and learned that:
Fork creates a new process that looks exactly like the old (parent) process, but it is still a different process with a different process ID and its own memory.
Threads are light-weight processes which have less overhead.
But, there are still some questions in my mind.
When should you prefer fork() over threading and vice-verse?
If I want to call an external application as a child, then should I use fork() or threads to do it?
While googling I found people saying it is a bad thing to call fork() inside a thread. Why do people want to call fork() inside a thread when they do similar things?
Is it true that fork() cannot take advantage of a multiprocessor system because the parent and child processes don't run simultaneously?
The main difference between the forking and threading approaches is one of operating system architecture. Back in the days when Unix was designed, forking was an easy, simple mechanism that answered mainframe- and server-type requirements best, and as such it was popularized on Unix systems. When Microsoft re-architected the NT kernel from scratch, it focused more on the threading model. As such, there is still a notable difference today, with Unix systems being efficient at forking and Windows more efficient with threads. You can most notably see this in Apache, which uses the prefork strategy on Unix and thread pooling on Windows.
Specifically to your questions:
When should you prefer fork() over threading and vice-verse?
On a Unix system where you're doing a far more complex task than just instantiating a worker, or you want the implicit security sandboxing of separate processes.
If I want to call an external application as a child, then should I use fork() or threads to do it?
If the child will do an identical task to the parent, with identical code, use fork. For smaller subtasks use threads. For separate external processes use neither, just call them with the proper API calls.
While googling I found people saying it is a bad thing to call fork() inside a thread. Why do people want to call fork() inside a thread when they do similar things?
Not entirely sure, but the core problem is that fork() in a multi-threaded process duplicates only the calling thread; any locks held by the other threads at that moment stay locked forever in the child, so the child can safely do little more than call exec().
Is it true that fork() cannot take advantage of a multiprocessor system because the parent and child processes don't run simultaneously?
This is false; fork creates a new process which then takes advantage of all the features available to processes in the OS task scheduler.
A forked process is called a heavy-weight process, whereas a threaded process is called a light-weight process.
The following are the difference between them:
A forked process is considered a child process, whereas a threaded process is called a sibling.
A forked process shares no resources such as code, data, or stack with the parent process, whereas a thread shares code with its parent but has its own stack.
Process switching requires the help of the OS, but thread switching does not.
Creating multiple processes is a resource-intensive task, whereas creating multiple threads is less resource intensive.
Each process runs independently, whereas one thread can read/write another thread's data.
Thread and process lecture
fork() spawns a new copy of the process, as you've noted. What isn't mentioned above is the exec() call which often follows. This replaces the existing process with a new process (a new executable) and as such, fork()/exec() is the standard means of spawning a new process from an old one.
e.g. that's how your shell will invoke a process from the command line. You specify your process (ls, say) and the shell forks and then execs ls.
Note that this operates at a very different level from threading. Threading runs multiple lines of execution intra-process. Forking is a means of creating new processes.
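A minimal fork()/exec()/wait() sketch of what a shell does for ls (the argument list is illustrative):

#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
    pid_t pid = fork();
    if (pid == 0) {
        /* child: replace this process image with /bin/ls */
        execl("/bin/ls", "ls", "-l", (char *)NULL);
        _exit(127);                     /* only reached if exec failed */
    }
    int status;
    waitpid(pid, &status, 0);           /* the shell waits for the command to finish */
    printf("ls exited with status %d\n", WEXITSTATUS(status));
    return 0;
}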
As #2431234123412341234123 said, on Linux, thanks to COW, processes are not much heavier than threads, and it boils down to how they are used. COW (copy-on-write) means that a memory page of the forked process gets copied only when the forked process makes changes to it; otherwise the OS keeps pointing it at the pages of the parent process.
From a programming point of view, let's say you have a big data structure in heap memory, a 2D array[2000000][100] (200 MB), and the kernel page size is around 4 KB. When the process is forked, no new memory for this array will be allocated. If one particular row (100 bytes) is changed (in either the parent process or the child), only the corresponding page (4 KB, or 8 KB if the row overlaps two pages) will be copied and updated for the forked process.
Other portions of memory work in forked processes much as they do with threads (the code is the same; registers and the call stack are separate).
On Windows, as #Niels Keurentjes said, threads might be better from a performance point of view, but on Linux it is more a question of the use case.
I am a beginner in this area.
I have studied fork(), vfork(), clone() and pthreads.
I have noticed that pthread_create() will create a thread, which is less overhead than creating a new process with fork(). Additionally the thread will share file descriptors, memory, etc with parent process.
But when is fork() and clone() better than pthreads? Can you please explain it to me by giving real world example?
Thanks in Advance.
clone(2) is a Linux-specific syscall mostly used to implement threads (in particular, it is used for pthread_create). With various arguments, clone can also have a fork(2)-like behavior. Very few people use clone directly; using the pthread library is more portable. You probably need to call the clone(2) syscall directly only if you are implementing your own thread library - a competitor to POSIX threads - and this is very tricky (in particular because locking may require using the futex(2) syscall in machine-tuned, assembly-coded routines; see futex(7)). You don't want to use clone or futex directly because pthreads are much simpler to use.
(The other pthread functions require some book-keeping to be done internally in libpthread.so after a clone during a pthread_create)
As Jonathon answered, processes have their own address space and file descriptor set. And a process can execute a new executable program with the execve syscall, which basically initializes the address space, the stack and the registers for starting a new program (but file descriptors may be kept, unless the close-on-exec flag is set, e.g. through O_CLOEXEC for open).
On Unix-like systems, all processes (except the very first process, usually init, of pid 1) are created by fork (or variants like vfork; you could, but don't want to, use clone in such a way that it behaves like fork).
(Technically, on Linux, there are a few weird exceptions which you can ignore, notably kernel processes or threads and some rare kernel-initiated process starts like /sbin/hotplug ....)
The fork and execve syscalls are central to Unix process creation (with waitpid and related syscalls).
A multi-threaded process has several threads (usually created by pthread_create) all sharing the same address space and file descriptors. You use threads when you want to work in parallel on the same data within the same address space, but then you should care about synchronization and locking. Read a pthread tutorial for more.
I suggest you read a good Unix programming book like Advanced Unix Programming and/or the (freely available) Advanced Linux Programming
The strength and weakness of fork (and company) is that they create a new process that's a clone of the existing process.
This is a weakness because, as you pointed out, creating a new process has a fair amount of overhead. It also means communication between the processes has to be done via some "approved" channel (pipes, sockets, files, shared-memory region, etc.)
This is a strength because it provides (much) greater isolation between the parent and the child. If, for example, a child process crashes, you can kill it and start another fairly easily. By contrast, if a child thread dies, killing it is problematic at best -- it's impossible to be certain what resources that thread held exclusively, so you can't clean up after it. Likewise, since all the threads in a process share a common address space, one thread that ran into a problem could overwrite data being used by all the other threads, so just killing that one thread wouldn't necessarily be enough to clean up the mess.
In other words, using threads is a little bit of a gamble. As long as your code is all clean, you can gain some efficiency by using multiple threads in a single process. Using multiple processes adds a bit of overhead, but can make your code quite a bit more robust, because it limits the damage a single problem can cause and makes it much easier to shut down and replace a process if it does run into a major problem.
As far as concrete examples go, Apache might be a pretty good one. It will use multiple threads per process, but to limit the damage in case of problems (among other things), it limits the number of threads per process and can/will spawn several separate processes running concurrently as well. On a decent server you might have, for example, 8 processes with 8 threads each. The large number of threads helps it service a large number of clients in a mostly I/O-bound task, and breaking it up into processes means that if a problem does arise, it doesn't suddenly become completely unresponsive, and it can shut down and restart a process without losing much.
These are totally different things. fork() creates a new process. pthread_create() creates a new thread, which runs under the context of the same process.
Threads share the same virtual address space, memory (for good or for bad), and set of open file descriptors, among other things.
Processes are (essentially) totally separate from each other and cannot modify each other.
You should read this question:
What is the difference between a process and a thread?
As for an example, if I am your shell (e.g. bash), when you enter a command like ls, I am going to fork() a new process and then exec() the ls executable. (And then I wait() on the child process, but that's getting out of scope.) This happens in an entirely different address space, and if ls blows up, I don't care, because I am still executing in my own process.
On the other hand, say I am a math program, and I have been asked to multiply two 100x100 matrices. We know that matrix multiplication is an embarrassingly parallel problem. So, I have the matrices in memory. I spawn N threads, each of which operates on the same source matrices, putting its results in the appropriate location in the result matrix. Remember, these operate in the context of the same process, so I need to make sure they are not stamping on each other's data. If N is 8 and I have an eight-core CPU, I can effectively calculate each part of the matrix simultaneously.
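A sketch of that idea with pthreads (matrix size and thread count hard-coded for brevity; compile with -pthread): each thread writes only its own rows of the result, so the threads never stamp on each other's data and no locking is needed.

#include <pthread.h>
#include <stdio.h>

#define N 100
#define NTHREADS 4

static double a[N][N], b[N][N], c[N][N];

struct slice { int row_begin, row_end; };

static void *multiply_slice(void *arg)
{
    struct slice *s = arg;
    for (int i = s->row_begin; i < s->row_end; ++i)   /* each thread owns a disjoint  */
        for (int j = 0; j < N; ++j) {                 /* block of result rows, so no  */
            double sum = 0.0;                         /* synchronization is required  */
            for (int k = 0; k < N; ++k)
                sum += a[i][k] * b[k][j];
            c[i][j] = sum;
        }
    return NULL;
}

int main(void)
{
    for (int i = 0; i < N; ++i)
        for (int j = 0; j < N; ++j) { a[i][j] = 1.0; b[i][j] = 2.0; }

    pthread_t tid[NTHREADS];
    struct slice s[NTHREADS];
    for (int t = 0; t < NTHREADS; ++t) {
        s[t].row_begin = t * N / NTHREADS;
        s[t].row_end   = (t + 1) * N / NTHREADS;
        pthread_create(&tid[t], NULL, multiply_slice, &s[t]);
    }
    for (int t = 0; t < NTHREADS; ++t)
        pthread_join(tid[t], NULL);

    printf("c[0][0] = %f\n", c[0][0]);   /* 1*2 summed over 100 terms = 200 */
    return 0;
}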
The process creation mechanism on Unix using fork() (and family) is very efficient.
Moreover, most Unix systems do not support kernel-level threads, i.e. a thread is not an entity recognized by the kernel. Hence a thread on such a system cannot benefit from CPU scheduling at the kernel level; the scheduling is done by the pthread library in user space, not by the kernel.
Also, on such systems pthreads are implemented using vfork() and as light-weight processes only.
So on such a system using threads has no point except portability.
As per my understanding, Sun Solaris and Windows have kernel-level threads, and the Linux family doesn't support kernel threads.
With processes, pipes and Unix domain sockets are very efficient IPC mechanisms without synchronization issues.
I hope this clarifies why and when threads should be used in practice.
I'm writing a basic UNIX program that involves processes sending messages to each other. My idea to synchronize the processes is to simply have an array of flags to indicate whether or not a process has reached a certain point in the code.
For example, I want all the processes to wait until they've all been created. I also want them to wait until they've all finished sending messages to each other before they begin reading their pipes.
I'm aware that a process performs a copy-on-write operation when it writes to a previously defined variable.
What I'm wondering is, if I make an array of flags, will the pointer to that array be copied, or will the entire array be copied (thus making my idea useless).
I'd also like any tips on inter-process communication and process synchronization.
EDIT: The processes are writing to each other process's pipe. Each process will send the following information:
typedef struct MessageCDT {
    pid_t destination;
    pid_t source;
    int num;
} Message;
So, just the source of the message and some random number. Then each process will print out the message to stdout: Something along the lines of "process 20 received 5724244 from process 3".
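Assuming that fixed-size struct, writing and reading it over a pipe looks roughly like this (one pipe and one message, for illustration):

#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

typedef struct MessageCDT {
    pid_t destination;
    pid_t source;
    int num;
} Message;

int main(void)
{
    int fd[2];
    pipe(fd);

    pid_t pid = fork();
    if (pid == 0) {                     /* child: send one fixed-size message */
        Message m = { .destination = getppid(), .source = getpid(), .num = 5724244 };
        write(fd[1], &m, sizeof m);     /* sizeof(Message) < PIPE_BUF, so the write is atomic */
        _exit(0);
    }

    Message m;
    read(fd[0], &m, sizeof m);          /* parent: read exactly one message */
    printf("process %d received %d from process %d\n",
           (int)getpid(), m.num, (int)m.source);
    waitpid(pid, NULL, 0);
    return 0;
}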
Unix processes have independent address spaces. This means that the memory in one is totally separate from the memory in another. When you call fork(), you get a new copy of the process. Immediately on return from fork(), the only thing different between the two processes is fork()'s return value. All of the data in the two processes are the same, but they are copies. Updating memory in one cannot be known by the other, unless you take steps to share the memory.
There are many choices for interprocess communication (IPC) in Unix, including shared memory, semaphores, pipes (named and unnamed), sockets, message queues and signals. If you Google these things you will find lots to read.
In your particular case, trying to make several processes wait until they all reach a certain point, I might use a semaphore or shared memory, depending on whether there is some master process that started them all or not.
If there is a master process that launches the others, then the master could setup the semaphore with a count equal to the number of processes to synchronize and then launch them. Each child could then decrement the semaphore value and wait for the semaphore value to reach zero.
If there is no master process, then I might create a shared memory segment that contains a count of processes and a flag for each process. But when you have two or more processes using shared memory, then you also need some kind of locking mechanism (probably a semaphore again) to ensure that two processes do not try to update the shared memory simultaneously.
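One way to get the "everyone waits until all have arrived" behaviour is a process-shared barrier placed in a shared mapping; this is only a sketch of that alternative (the semaphore scheme described above works as well), with error handling omitted; compile with -pthread.

#define _GNU_SOURCE
#include <pthread.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

#define NPROC 4

int main(void)
{
    /* The barrier must live in memory visible to all processes. */
    pthread_barrier_t *bar = mmap(NULL, sizeof *bar, PROT_READ | PROT_WRITE,
                                  MAP_SHARED | MAP_ANONYMOUS, -1, 0);

    pthread_barrierattr_t attr;
    pthread_barrierattr_init(&attr);
    pthread_barrierattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
    pthread_barrier_init(bar, &attr, NPROC);

    for (int i = 0; i < NPROC; ++i) {
        if (fork() == 0) {
            printf("child %d ready\n", (int)getpid());
            pthread_barrier_wait(bar);   /* blocks until all NPROC children arrive */
            printf("child %d past the barrier\n", (int)getpid());
            _exit(0);
        }
    }
    for (int i = 0; i < NPROC; ++i)
        wait(NULL);
    return 0;
}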
Keep in mind that reading a pipe that nobody is writing to will block the reader until data appears. I don't know what your processes do, but perhaps that is synchronization enough? One other thing to consider: if you have multiple processes writing to a given pipe, their data may become interleaved if the writes are larger than PIPE_BUF. The value and location of this macro are system dependent.
-Kevin
The entire array of flags will seem to be copied. It will not actually be copied until one process or another writes to it of course. But that's an implementation detail and transparent to the individual processes. As far as each process is concerned, they each get a copy of the array.
There are ways to make this not happen. You can use mmap with the MAP_SHARED option for the memory used for your flags. Then each sub-process will share the same region of memory. There's also Posix shared memory (which I, BTW, think is an awful hack). To find out about Posix shared memory, look at the shm_overview(7) man page.
But using memory in this way isn't really a good idea. On multi-core systems it's not always the case that when one process (or thread) writes to an area of shared memory, all other processes will see the written value right away; without proper synchronization the compiler and CPU are free to reorder or delay the stores.
If you want to communicate using shared memory, you will have to use mutexes or the C++11 atomic operations to ensure that writes are properly seen by the other processes.
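A sketch of the MAP_SHARED flag array, using C11 atomics (the C counterpart of the C++11 atomic operations mentioned above) so the cross-process writes are well-defined; the flag meaning and process count are made up.

#define _GNU_SOURCE
#include <stdatomic.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

#define NPROC 3

int main(void)
{
    /* One atomic flag per child, in memory shared by parent and children. */
    atomic_int *flags = mmap(NULL, NPROC * sizeof *flags, PROT_READ | PROT_WRITE,
                             MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    for (int i = 0; i < NPROC; ++i)
        atomic_init(&flags[i], 0);

    for (int i = 0; i < NPROC; ++i) {
        if (fork() == 0) {
            /* ... child i does its work here ... */
            atomic_store(&flags[i], 1);   /* "I have reached this point" */
            _exit(0);
        }
    }

    /* Parent polls until every child has set its flag (a real program would
       block on a semaphore or a pipe instead of spinning). */
    for (int i = 0; i < NPROC; ++i)
        while (atomic_load(&flags[i]) == 0)
            usleep(1000);

    printf("all %d children checked in\n", NPROC);
    for (int i = 0; i < NPROC; ++i)
        wait(NULL);
    return 0;
}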