How does a child process modify or read data in the parent process after vfork()?
Are the variables declared in the parent process directly accessible to the child?
I have a process which creates some data structures. I then need to fork a child process
which needs to read/write these data structures. The child will be an exec'ed process different from the parent.
Once the child has exec'ed, one process cannot directly modify another's memory. What you would typically do is create a pipe or another mechanism that can cross process boundaries. Descriptors open at the time of fork() are inherited by the child process (and survive exec unless marked close-on-exec). The child can then send messages to the parent instructing it to modify the data structures as required.
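For illustration, here is a minimal sketch of that approach; the update_msg protocol and the table it updates are invented for the example:

```c
/* Minimal sketch of the pipe approach described above.
 * The message format (struct update_msg) is invented for the example. */
#include <stdio.h>
#include <unistd.h>
#include <sys/wait.h>

struct update_msg {           /* hypothetical protocol message */
    int  index;               /* which slot to modify */
    long value;               /* new value for that slot */
};

int main(void)
{
    long table[16] = {0};     /* the parent's data structure */
    int fds[2];

    if (pipe(fds) == -1) { perror("pipe"); return 1; }

    pid_t pid = fork();
    if (pid == 0) {                       /* child */
        close(fds[0]);                    /* child only writes */
        /* An exec'ed child would inherit fds[1]; here we just write. */
        struct update_msg m = { .index = 3, .value = 42 };
        write(fds[1], &m, sizeof m);
        close(fds[1]);
        _exit(0);
    }

    close(fds[1]);                        /* parent only reads */
    struct update_msg m;
    while (read(fds[0], &m, sizeof m) == sizeof m)
        table[m.index] = m.value;         /* apply the child's instruction */
    close(fds[0]);
    wait(NULL);
    printf("table[3] = %ld\n", table[3]); /* prints 42 */
    return 0;
}
```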
The form of the messages can be the difficult part of this design. You can:
Design a protocol that carries values and instructions on what to do with them.
Use an existing marshaling tool such as Google Protocol Buffers.
Use Remote Procedure Calls with one of the existing RPC mechanisms (e.g. ONC RPC, also known as Sun RPC).
You can also use a manually set-up shared memory scheme that allows both processes to access common memory. The parent process would allocate storage for its data structures in that shared memory; the child process would also map it into its address space and access those structures. You would need some sort of synchronization mechanism, depending on how you use the data.
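Below is a minimal sketch of this, assuming a named POSIX shared-memory object (so an exec'ed child could open it by name) and a process-shared semaphore as the sync mechanism; the name "/my_shm" and the struct layout are invented for the example:

```c
/* Minimal sketch of the shared-memory approach, using a named POSIX
 * shared-memory object. Error handling omitted; link with -lrt on
 * older glibc. */
#include <fcntl.h>
#include <stdio.h>
#include <semaphore.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

struct shared_data {
    sem_t lock;          /* sync mechanism: process-shared semaphore */
    long  counter;       /* the data both processes access */
};

int main(void)
{
    int fd = shm_open("/my_shm", O_CREAT | O_RDWR, 0600);
    ftruncate(fd, sizeof(struct shared_data));
    struct shared_data *d = mmap(NULL, sizeof *d, PROT_READ | PROT_WRITE,
                                 MAP_SHARED, fd, 0);
    sem_init(&d->lock, 1 /* shared between processes */, 1);

    if (fork() == 0) {                /* child; an exec'ed child could
                                         instead shm_open("/my_shm") */
        sem_wait(&d->lock);
        d->counter += 1;              /* modify the shared structure */
        sem_post(&d->lock);
        _exit(0);
    }
    wait(NULL);
    printf("counter = %ld\n", d->counter);   /* prints 1 */
    shm_unlink("/my_shm");
    return 0;
}
```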
What resources are shared and what resources are created anew during new process and new thread creation in Linux?
I searched for this, but nowhere is it mentioned which resources are created anew and which are shared.
When you call fork() and create a child, all descriptors open in the parent before the call to fork() are shared between the parent and the child. For instance, suppose the parent has a listening socket, calls accept(), and then calls fork(). The connected socket is then shared between the parent and the child: normally the child reads and writes the connected socket, while the parent closes its copy of it.
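A minimal sketch of that accept-then-fork pattern (the port number and the echo behaviour are invented for the example):

```c
/* Minimal sketch of the accept-then-fork pattern described above. */
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
    int lfd = socket(AF_INET, SOCK_STREAM, 0);
    struct sockaddr_in a = { .sin_family = AF_INET,
                             .sin_port = htons(9000),
                             .sin_addr.s_addr = htonl(INADDR_ANY) };
    bind(lfd, (struct sockaddr *)&a, sizeof a);
    listen(lfd, 8);

    for (;;) {
        int cfd = accept(lfd, NULL, NULL);   /* connected socket */
        if (fork() == 0) {                   /* child: serve this client */
            close(lfd);                      /* child doesn't listen */
            char buf[512];
            ssize_t n;
            while ((n = read(cfd, buf, sizeof buf)) > 0)
                write(cfd, buf, n);          /* echo back */
            _exit(0);
        }
        close(cfd);   /* parent closes its copy of the connected socket */
        while (waitpid(-1, NULL, WNOHANG) > 0) ;  /* reap finished children */
    }
}
```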
In the traditional UNIX model, when a parent process needs something performed by another entity, it forks a child process and lets the child do the processing. While this paradigm has served well for many years, it has its issues:
fork is expensive. Memory is copied from the parent to the child, all descriptors are duplicated in the child, and so on. Modern implementations mitigate this with copy-on-write, which avoids copying a page until the child needs its own copy.
While passing information from parent to child is easy, the reverse takes some work, and passing information back requires IPC (inter-process communication).
So Linux introduced clone(). clone() allows the child process to share parts of its execution context with the calling process, such as the memory space, the table of file descriptors, and the table of signal handlers.
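A minimal sketch of clone() sharing the memory space via CLONE_VM (the stack size and the child function are invented for the example):

```c
/* Minimal sketch of clone() with a shared memory space (CLONE_VM). */
#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>

static int counter = 0;            /* visible to both tasks with CLONE_VM */

static int child_fn(void *arg)
{
    counter = 42;                  /* modifies the parent's memory */
    return 0;
}

int main(void)
{
    const size_t STACK_SIZE = 64 * 1024;
    char *stack = malloc(STACK_SIZE);

    /* The stack grows down on most architectures, so pass its top. */
    pid_t pid = clone(child_fn, stack + STACK_SIZE,
                      CLONE_VM | SIGCHLD, NULL);
    waitpid(pid, NULL, 0);
    printf("counter = %d\n", counter);   /* prints 42: memory was shared */
    free(stack);
    return 0;
}
```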
Then come threads, also known as lightweight processes. Thread creation can be 10-100 times faster than process creation. All threads within a process share the same global memory. This makes sharing information between threads easy, but with it comes the requirement to synchronize access.
To sum up, all threads share the following:
Process information
Most data
Open files (e.g. descriptors)
Signal handlers
Current working directory
User and group IDs
But each thread has its own:
Thread ID
Set of registers
Stack for local variables and return addresses
errno
Signal mask
Priority
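A minimal pthread sketch illustrating these rules: the global is shared between the threads (and so needs a mutex), while each thread's local variable lives on its own stack. Compile with -pthread.

```c
/* Minimal sketch: shared global memory vs. per-thread stack. */
#include <pthread.h>
#include <stdio.h>

static int shared_total = 0;               /* shared: global memory */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

static void *worker(void *arg)
{
    int local = *(int *)arg;               /* per-thread: lives on this
                                              thread's own stack */
    pthread_mutex_lock(&lock);             /* sharing requires sync */
    shared_total += local;
    pthread_mutex_unlock(&lock);
    return NULL;
}

int main(void)
{
    pthread_t t1, t2;
    int a = 1, b = 2;
    pthread_create(&t1, NULL, worker, &a);
    pthread_create(&t2, NULL, worker, &b);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    printf("shared_total = %d\n", shared_total);   /* prints 3 */
    return 0;
}
```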
Say I have some process calling a file device operation like read. Before this read, the process also called a syscall (defined by me), providing some information relevant to the read (and possibly to future reads done by this process). What is the best way of achieving this sort of information flow in the kernel? Is there any good way to store process-specific information other than keeping some pid-indexed list?
I'd like the syscall information stored in the kernel to be inherited by that process's children too. Would it be possible to achieve that without (somehow) traversing the parent-child process tree? (Traversal wouldn't give me the inheritance I want anyway, because after forking I don't want changes in the parent to affect the child.)
Just as we have the init_task variable, which gives the initial task (from which the task list can be traversed) and is accessible anywhere in kernel space, you can add a global variable that is set to the appropriate value by your system call and then read by your read (or other appropriate) methods.
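A rough kernel-side sketch of that idea, with all names (set_read_hint, read_hint, mydev_read) invented and error handling omitted; note that a single global like this holds one value for the whole system, so making it truly process-specific would still require keying it per task:

```c
/* Rough sketch: a kernel-global hint set by a custom syscall and
 * consumed by a driver's read path. */
#include <linux/fs.h>
#include <linux/spinlock.h>
#include <linux/syscalls.h>

static DEFINE_SPINLOCK(hint_lock);
static unsigned long read_hint;            /* information for later reads */

SYSCALL_DEFINE1(set_read_hint, unsigned long, hint)
{
	spin_lock(&hint_lock);
	read_hint = hint;                  /* stored by the custom syscall */
	spin_unlock(&hint_lock);
	return 0;
}

static ssize_t mydev_read(struct file *f, char __user *buf,
			  size_t len, loff_t *off)
{
	unsigned long hint;

	spin_lock(&hint_lock);
	hint = read_hint;                  /* consumed by the read method */
	spin_unlock(&hint_lock);
	/* ... perform the read using 'hint' ... */
	return 0;
}
```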
In Linux, if the parent process has any data structures (e.g., trees, lists), are those data structures inherited by the child? I mean, does the child get access to the same data structure (any kind of pointer to that data structure)?
If you're talking about Linux/Unix processes after a fork(), yes. They get their own copy of the data of the parent process, so whatever one of them does after the fork is not seen by the other (which is normally implemented by copy-on-write, so the memory pages won't get copied until written to, but that's a detail the user program doesn't see).
If you're talking about Windows starting a new process with CreateProcess(), no, the new process does not inherit any data structure from the parent.
Both of these have much more to do with which OS you're using than with any specific programming language.
Assuming you are using something like fork() to create the child processes, they'll inherit everything that is part of the parent process's context at that moment:
Environment variable settings
Opened file descriptors
etc.
Global-scope variables will be copied into the child process's context in whatever state they are in at the time of the fork. Changes the child makes to these variables will not be reflected in the parent process.
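A minimal sketch demonstrating this: the child's write lands in its own copy-on-write copy, so the parent still sees the old value.

```c
/* Minimal sketch: the child's write to a global is not seen by the
 * parent, because the child works on its own (copy-on-write) copy. */
#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

int value = 1;                      /* global, copied into the child */

int main(void)
{
    if (fork() == 0) {              /* child */
        value = 99;                 /* modifies only the child's copy */
        _exit(0);
    }
    wait(NULL);
    printf("parent still sees %d\n", value);   /* prints 1 */
    return 0;
}
```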
If you want to communicate between parent and child processes, consider using pipes or shared memory.
I'm writing a mono-thread, memory-heavy proof-of-concept application.
This application doesn't manipulate much data per se; it will mainly load GBs of data and then do some data analysis on it.
I don't want to manage concurrency via an MT implementation and don't want to have to implement locks (e.g. mutexes, spinlocks, ...), so I've decided this time around to use the dear old fork().
On Linux, where forked memory is CoW, I should be able to efficiently analyse the same datasets without having to copy them explicitly, using simple parallel mono-thread logic (again, this is a proof of concept).
Now that I spawn child processes, with fork() it is very easy to set up input parameters for a sub-task (a sub-process in this case), but then I have to get the results back to the main process. And sometimes these results are tens of GB large. The IPC mechanisms I have in mind are:
Pipes/sockets (and then epoll or an equivalent to wait for results in a mono-thread fashion)
Hybrid pipes/shared memory (epoll or an equivalent to wait for results carrying a reference to shared memory, then copy the data from shared memory into the parent process and destroy the shared memory)
What else could I use? Apart from the obvious "go multi-threaded", I would really like to leverage CoW and the single-thread, multi-process architecture for this proof of concept. Any ideas?
Thanks
After some experimenting, the conclusion I came to is the following:
When a child process has to communicate with the parent, before spawning it I create a segment of shared memory (e.g. 16 MB)
If coordination is needed, a semaphore is created in the shared-memory segment
Upon forking, I also create a non-blocking pipe with pipe2() so the child can notify the parent when some data is available
The pipe's read end is then registered with epoll
epoll is used level-triggered, so I can interleave requests if the child processes are really fast at sending data
The shared-memory segment is used to communicate the data itself: directly if the structures are POD, or with simple template<...> binary read/write functions if they are not (a sketch of the whole scheme follows this list)
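A rough sketch of the scheme for a single child; the segment size, the one-byte notification, and the message contents are invented for the example:

```c
/* Rough sketch: shared memory for the payload, a non-blocking pipe for
 * notification, and epoll (level-triggered by default) in the parent. */
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/epoll.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
    /* Shared-memory segment created before forking (e.g. 16 MB). */
    size_t shm_size = 16 * 1024 * 1024;
    char *shm = mmap(NULL, shm_size, PROT_READ | PROT_WRITE,
                     MAP_SHARED | MAP_ANONYMOUS, -1, 0);

    int fds[2];
    pipe2(fds, O_NONBLOCK);          /* non-blocking notification pipe */

    if (fork() == 0) {               /* child: produce a result */
        close(fds[0]);
        strcpy(shm, "result of the analysis");  /* data goes via shm */
        write(fds[1], "1", 1);       /* notify: one byte on the pipe */
        _exit(0);
    }
    close(fds[1]);

    int ep = epoll_create1(0);
    struct epoll_event ev = { .events = EPOLLIN, .data.fd = fds[0] };
    epoll_ctl(ep, EPOLL_CTL_ADD, fds[0], &ev);   /* level-triggered */

    struct epoll_event out;
    if (epoll_wait(ep, &out, 1, -1) == 1) {
        char b;
        read(out.data.fd, &b, 1);                /* consume notification */
        printf("parent read from shm: %s\n", shm);
    }
    wait(NULL);
    munmap(shm, shm_size);
    return 0;
}
```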
I believe this is a good solution.
Cheers
You could also use a regular file.
The parent process can wait for the child (which analyses the data in memory and writes its result to a file) to exit; once it does, the parent can read the result from the file. As you mentioned, input parameters are not a problem: you can pass the output file name as one of them. This way there is no locking required, except for wait()ing on the exit status of the child process.
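A minimal sketch of this approach, with the file name and the "result" invented for the example:

```c
/* Minimal sketch of the regular-file approach: wait() is the only
 * synchronisation needed. */
#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
    const char *out = "/tmp/child_result.bin";   /* passed as a parameter */

    if (fork() == 0) {                /* child: analyse, then write result */
        FILE *f = fopen(out, "wb");
        long result = 42;             /* stand-in for the real analysis */
        fwrite(&result, sizeof result, 1, f);
        fclose(f);
        _exit(0);
    }

    wait(NULL);                       /* child has exited: file is complete */
    FILE *f = fopen(out, "rb");
    long result;
    fread(&result, sizeof result, 1, f);
    fclose(f);
    printf("child's result: %ld\n", result);
    return 0;
}
```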
If each of your child processes returns tens of GB of data, regular files are a much better fit, since you will have enough time to process each child's result. But is this 10 GB of data shared across the child processes? If it were, you would have preferred to use locks, so I assume it isn't.
Is there any way that I can access the data created by my FUSE filesystem process?
e.g.
in prefix_write() I store some data in memory and would like to access those data from another process.
Shared memory should work, but I'm looking for a more elegant solution, such as a custom field in fuse_operations that I could access as a function from other processes. But as far as I know, the fields in fuse_operations have to be the POSIX operations, so that's probably impossible. Please correct me if I'm wrong.
thanks
The other process that you are speaking of: is it forked from the FUSE filesystem process? If yes, then it should be pretty easy to send data. Create a pipe before forking, so the descriptors returned by pipe() are inherited by the child process. You can then use these descriptors for data transfer (a single pipe is one-way on Linux; create two pipes, or use socketpair(), for bidirectional transfer).
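A minimal sketch of the create-the-channel-then-fork idea, using socketpair() so the inherited descriptors really are bidirectional:

```c
/* Minimal sketch: a socketpair() created before fork() gives both
 * processes a bidirectional channel. */
#include <stdio.h>
#include <sys/socket.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
    int sv[2];
    socketpair(AF_UNIX, SOCK_STREAM, 0, sv);   /* created before fork */

    if (fork() == 0) {                /* child inherits sv[1] */
        close(sv[0]);
        char buf[64];
        ssize_t n = read(sv[1], buf, sizeof buf - 1);
        buf[n] = '\0';
        printf("child got: %s\n", buf);
        write(sv[1], "ack", 3);       /* reply on the same descriptor */
        _exit(0);
    }

    close(sv[1]);
    write(sv[0], "hello", 5);
    char buf[64];
    ssize_t n = read(sv[0], buf, sizeof buf - 1);
    buf[n] = '\0';
    printf("parent got: %s\n", buf);
    wait(NULL);
    return 0;
}
```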
If your use case is not this, can you illustrate why you want a foreign process to access another process's data?