When you set a CPU limit for a process and that process creates several child processes, do the CPU shares of the parent process increase as the child processes increase their CPU shares?
The same question goes for memory, although memory seems different to me, since I've learned that each process has its own heap. Would it then be correct to say that the memory limit of a process isn't influenced by the amount of memory its child processes use?
Related
When you start a system, the first process started is init, which gets PID 1. Afterwards, all other processes are children of init (PID 1), and each of them can have child processes of its own, and so on.
My question is: how are processes related when it comes to memory and CPU? I am aware that each process can have multiple threads and that each process has its own heap. In that case, how are parent and child processes related when it comes to resources?
If, for example, a process has a limit on the amount of CPU it can use, and that limit is set by the cgroup the process is in, how does that affect its child processes? Does the CPU usage of the parent process increase as the CPU usage of its child processes increases?
My node.js application takes a few minutes to initialize and allocates a significant amount of memory, and I would like to do some work in child processes.
The standard solution is to use the Cluster module, but this has two disadvantages:
First, all initialization is repeated in the child processes, wasting CPU and memory;
Second, I lose the huge copy-on-write memory savings provided by forking (the majority of the allocated memory is only ever read).
I also looked at child_process.fork, but it seems to do the same, i.e. it does fork+exec, which doesn't help me.
The standard Unix way is to do the heavy initialization and only fork() after that. Is it possible to do this with node.js?
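For reference, that classic Unix pattern looks roughly like the C sketch below; `load_big_dataset()` and `handle_requests()` are placeholder names for the slow initialization and the per-worker work, not part of any real API.

```c
/* Sketch of the "initialize once, then fork workers" pattern.
 * The workers share the initialized, read-mostly data via copy-on-write. */
#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>

#define NWORKERS 4

static char *dataset;                     /* large, read-mostly data */

static void load_big_dataset(void)        /* placeholder for the slow init */
{
    dataset = malloc(512UL * 1024 * 1024); /* e.g. 512 MB */
    /* ... fill it in ... */
}

static void handle_requests(int worker_id) /* placeholder worker loop */
{
    printf("worker %d sees dataset at %p\n", worker_id, (void *)dataset);
}

int main(void)
{
    load_big_dataset();                   /* expensive init, done once */

    for (int i = 0; i < NWORKERS; i++) {
        pid_t pid = fork();
        if (pid == 0) {                   /* child: shares pages copy-on-write */
            handle_requests(i);
            _exit(0);
        }
    }
    while (wait(NULL) > 0)                /* reap all workers */
        ;
    return 0;
}
```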
Interview question:
I was asked about the disadvantages of threads, and in what scenarios we shouldn't use threads but should use processes instead.
I couldn't think of much except invalid memory access in some cases.
Threads spawned by the same process all share the same memory. Processes each run in their own memory context.
In Linux (I don't know what the behavior under Windows is like), a newly spawned child process usually receives a copy of certain parts of the parent process's memory context and is therefore more expensive memory-wise at runtime and CPU/MMU-wise at creation. Context switching, i.e. (off)loading a process from or onto the CPU (this happens when a process or thread has nothing to do and is pushed to a queue in favor of processes or threads with actual work), might also be more expensive for a process.
On the other hand, processes might be much more secure, since their memory is isolated from the memory of their sibling processes.
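To make the difference concrete, here is a small C sketch (compile with -pthread); the variable and function names are made up for the example. A write made by a thread is visible to the rest of its process, while a write made by a forked child only touches the child's own copy-on-write copy.

```c
#include <pthread.h>
#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

static int shared_value = 0;

static void *thread_body(void *arg)
{
    (void)arg;
    shared_value = 1;          /* same address space: the main thread sees this */
    return NULL;
}

int main(void)
{
    pthread_t t;
    pthread_create(&t, NULL, thread_body, NULL);
    pthread_join(t, NULL);
    printf("after thread:  shared_value = %d\n", shared_value);   /* prints 1 */

    pid_t pid = fork();
    if (pid == 0) {
        shared_value = 2;      /* child writes to its private copy-on-write copy */
        _exit(0);
    }
    waitpid(pid, NULL, 0);
    printf("after process: shared_value = %d\n", shared_value);   /* still 1 */
    return 0;
}
```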
I have a huge application that needs to fork itself at some point. The application is multithreaded and has about 200MB of allocated memory. What I want to do, to ensure that the data allocated by the process won't get duplicated, is to start a new thread and fork inside of that thread. From what I have read, only the thread that calls fork will be duplicated, but what will happen to the allocated memory? Will it still be there? The purpose of this is to restart the application with different startup parameters: when it's forked, it will call main with my new parameters, hopefully giving me a new process of the same program. Now before you ask: I cannot guarantee that the binary of that process will still be in the same place as when I started the process, otherwise I could just fork and exec what's in /proc/self/exe.
Threads are execution units inside the big bag of resources that a process is. A process is the whole thing that you can access from any thread in the process: all the threads, all the file descriptors, all the other resources. So memory is absolutely not tied to a thread, and forking from a thread has no useful effect. Everything still needs to be copied over since the point of forking is creating a new process.
That said, Linux has some tricks to make it faster. Copying 2 gigabytes worth of RAM is neither fast nor efficient. So when you fork, Linux actually gives the new process the same memory (at first), but it uses the virtual memory system to mark it as copy-on-write: as soon as one process needs to write to that memory, the kernel intercepts it and allocates distinct memory so that the other process isn't affected.
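Here is a minimal C sketch of the situation described in the question (compile with -pthread, Linux assumed): a second thread calls fork(), and the child, which contains only that one thread, can still read the memory allocated by the parent thanks to copy-on-write. The buffer size and names are placeholders standing in for the real 200 MB of state.

```c
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>

static char *big;                        /* allocated before the fork */

static void *forker(void *arg)
{
    (void)arg;
    pid_t pid = fork();
    if (pid == 0) {
        /* Child: only this thread exists here, but the parent's memory
         * is still reachable via copy-on-write pages. */
        printf("child sees big[0] = %c\n", big[0]);
        /* e.g. re-exec with new arguments instead of calling main() again */
        _exit(0);
    }
    waitpid(pid, NULL, 0);
    return NULL;
}

int main(void)
{
    big = malloc(200UL * 1024 * 1024);   /* stand-in for the 200 MB of state */
    big[0] = 'X';

    pthread_t t;
    pthread_create(&t, NULL, forker, NULL);
    pthread_join(t, NULL);
    return 0;
}
```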
I'm writing a mono-thread, memory-heavy proof-of-concept application.
This application doesn't manipulate much data per se; it will mainly load GBs of data and then do some analysis on it.
I don't want to manage concurrency via a multi-threaded implementation and don't want to have to implement locks (e.g. mutexes, spinlocks, ...), so I've decided this time around to use the dear old fork().
On Linux, where forked memory is CoW, I should be able to efficiently analyse the same datasets without having to copy them explicitly, and with simple parallel mono-thread logic (again, this is a proof of concept).
Now that I spawn child processes, with fork() it is very easy to set up input parameters for a sub-task (a sub-process in this case), but then I have to get the results back to the main process, and sometimes those results are tens of GB large. All the IPC mechanisms I have in mind are:
PIPEs/Sockets (and then epoll equivalent to wait for results in a mono-thread fashion)
Hybrid PIPEs/Shared Memory (epoll equivalent to wait for results with reference to Shared Memory, then copy data from Shared Memory into parent process, destroy Shared Memory)
What else could I use? Apart from the obvious "go multi-threaded", I would really like to leverage the CoW and single-thread, multi-process architecture for this proof of concept. Any ideas?
Thanks
After some experimenting, the conclusion I came to is the following:
When a child process has to communicate with the parent, before spawning that child process I create a segment of shared memory (e.g. 16 MB)
If coordination is needed, a semaphore is created in the shared memory segment
Then, upon forking, I use pipe2 with non-blocking descriptors so the child can notify the parent when some data is available
The pipe fd is then registered with epoll
epoll is used in level-triggered mode so I can interleave requests if the child processes are really fast at sending data
The shared memory segment is used to communicate data directly if the structures are POD, or with simple template<...> binary read/write functions if they are not
I believe this is a good solution.
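For illustration, a rough C sketch of this setup on Linux, with a single child, error handling left out, and the semaphore coordination omitted; the 16 MB size and the message contents are placeholders:

```c
#define _GNU_SOURCE              /* for pipe2() */
#include <fcntl.h>
#include <stdio.h>
#include <sys/epoll.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

#define SHM_SIZE (16 * 1024 * 1024)

int main(void)
{
    /* Shared segment created before forking so both sides see the same pages. */
    char *shm = mmap(NULL, SHM_SIZE, PROT_READ | PROT_WRITE,
                     MAP_SHARED | MAP_ANONYMOUS, -1, 0);

    int fds[2];
    pipe2(fds, O_NONBLOCK);              /* child writes fds[1], parent polls fds[0] */

    pid_t pid = fork();
    if (pid == 0) {                      /* child: produce a result, then notify */
        size_t len = (size_t)snprintf(shm, SHM_SIZE, "result of the analysis");
        write(fds[1], &len, sizeof len); /* "data is ready" notification */
        _exit(0);
    }

    int ep = epoll_create1(0);           /* level-triggered is the default mode */
    struct epoll_event ev = { .events = EPOLLIN, .data.fd = fds[0] };
    epoll_ctl(ep, EPOLL_CTL_ADD, fds[0], &ev);

    struct epoll_event got;
    epoll_wait(ep, &got, 1, -1);         /* wake up once the child has written */

    size_t len = 0;
    read(got.data.fd, &len, sizeof len);
    printf("parent got %zu bytes in shared memory: %s\n", len, shm);

    waitpid(pid, NULL, 0);
    munmap(shm, SHM_SIZE);
    return 0;
}
```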
Cheers
You could also use a regular file.
The parent process could wait for the child process to exit (the child analyses the data in memory and then writes its result to a file), and once it does, the parent can read the result back from the file. As you mentioned, input parameters are not a problem: you can just pass the file name to write to as one of the input parameters. This way, no locking is required, except for wait() on the exit status of the child process.
If each of your child processes returns tens of GB of data, it is much better to use regular files this way, as you will have enough time to process each child's result. But is that 10s of GB of data shared across the child processes? If it were, you would have preferred to use locks, so I assume it isn't.
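As an illustration, a small C sketch of this file-based approach; the result path and the result contents are made-up placeholders, and error handling is omitted.

```c
#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
    const char *result_path = "/tmp/child-0.result";   /* the "input parameter" */

    pid_t pid = fork();
    if (pid == 0) {
        /* Child: analyse the (copy-on-write) data, write the result, exit. */
        FILE *out = fopen(result_path, "w");
        fprintf(out, "42\n");                /* placeholder result */
        fclose(out);
        _exit(0);
    }

    waitpid(pid, NULL, 0);                   /* the only synchronisation needed */

    FILE *in = fopen(result_path, "r");      /* parent reads the result back */
    char line[64];
    fgets(line, sizeof line, in);
    fclose(in);
    printf("child result: %s", line);
    return 0;
}
```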