How do different processes share file descriptors? - linux

When forking a process, consider the following scenario:
1) We open two pipes for bidirectional IPC communication.
2) Suppose these have (3,4) and (5,6) as file descriptors.
3) We fork the process somewhere in the middle.
4) We exec a new program in the child process.
Now the two processes are completely independent of each other: the former child has its own address space and is effectively a new process.
My question is: how do the pipes (file descriptors) survive in the exec'ed process? After all, pipes opened like this are exactly what the exec'ed child and the parent use to communicate.
The only way I can see this working is if file descriptors were global to the whole machine, which seems impossible, since the numbers would conflict.
And when running this code in the IDE:
import os
from multiprocessing import Process, Pipe

def sender(pipe):
    """
    send object to parent on anonymous pipe
    """
    pipe.send(['spam'] + [42, 'eggs'])
    pipe.close()

def talker(pipe):
    """
    send and receive objects on a pipe
    """
    pipe.send(dict(name='Bob', spam=42))
    reply = pipe.recv()
    print('talker got: ', reply)

if __name__ == '__main__':
    (parentEnd, childEnd) = Pipe()
    Process(target=sender, args=(childEnd,)).start()
    print("parent got: ", parentEnd.recv())
    parentEnd.close()

    (parentEnd, childEnd) = Pipe()
    child = Process(target=talker, args=(childEnd,))
    ############################## from here
    child.start()
    print('From talker Parent got:', parentEnd.recv())
    parentEnd.send({x * 2 for x in 'spam'})
    child.join()
    ############################## to here
    print('parent exit')
Two processes are started, but in IDLE only one process's output is visible, not both. In a terminal, however, it looks as if stdout is shared between the processes.

The actual copying of the process file descriptor table is controlled by the more general clone() syscall flag CLONE_FILES (which is not set by fork()):
CLONE_FILES (since Linux 2.0)
...
If CLONE_FILES is not set, the child process inherits a copy of all file descriptors opened in the calling process at the time of clone(). (The duplicated file descriptors in the child refer to the same open file descriptions (see open(2)) as the corresponding file descriptors in the calling process.) Subsequent operations that open or close file descriptors, or change file descriptor flags, performed by either the calling process or the child process do not affect the other process.
execve() doesn't touch file descriptors, except those opened with or marked by the O_CLOEXEC or FD_CLOEXEC flags, which are closed across the exec:
* By default, file descriptors remain open across an execve(). File descriptors that are marked close-on-exec are closed; see the description of FD_CLOEXEC in fcntl(2).
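
In other words, the fd numbers in the child refer to the same kernel pipe objects, and exec() leaves them alone unless close-on-exec is set. A minimal sketch of that behavior using Python's os module (note that since Python 3.4, os.pipe() marks its descriptors non-inheritable, so the read end is duplicated onto fd 0, which dup2() creates as inheritable):

import os

r, w = os.pipe()                  # e.g. fds 3 and 4; same kernel pipe after fork()

pid = os.fork()
if pid == 0:
    # child: fds 3 and 4 still refer to the same open file descriptions
    os.close(w)
    os.dup2(r, 0)                 # put the pipe's read end on stdin (inheritable)
    os.execvp('cat', ['cat'])     # fd 0 survives the exec; cat reads from the pipe
else:
    # parent: write into the pipe and wait for the exec'ed child
    os.close(r)
    os.write(w, b'hello from the parent\n')
    os.close(w)
    os.waitpid(pid, 0)

Running this prints "hello from the parent" from the exec'ed cat process, showing that descriptors created before fork() are how the two independent processes keep communicating.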

Related

Is there a reason for this difference between Python threads and processes?

When a list object is passed to a Python (3.9) Process and Thread, the additions to the list made in the thread are seen in the parent, but not the additions made in the process. E.g.,
from multiprocessing import Process
from threading import Thread
def job(x, out):
    out.append(f'f({x})')
out = []
pr = Process(target=job, args=('process', out))
th = Thread(target=job, args=('thread', out))
pr.start(), th.start()
pr.join(), th.join()
print(out)
This prints ['f(thread)']. I expected it to be (disregard the order) ['f(thread)', 'f(process)'].
Could someone explain the reason for this?
There's nothing Python-specific about it; that's just how processes work.
Specifically, all threads running within a given process share the process's memory-space -- so e.g. if thread A changes the state of a variable, thread B will "see" that change.
Processes, OTOH, each get their own private memory space that is inaccessible to all other processes. That's done deliberately as a way to prevent process A from accidentally (or deliberately) reading or corrupting the memory of process B.
When you spawn a child process, the new child process gets its own memory-space that initially contains a copy of all the data in the parent's memory space, but it is a separate space, so changes made by the child will not be visible to the parent (and vice-versa).
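
If the child process does need to get results back into the parent's list, the data has to go through an explicitly shared object or a pipe/queue. A minimal sketch using a Manager list (the function and variable names just mirror the question's example):

from multiprocessing import Process, Manager

def job(x, out):
    out.append(f'f({x})')

if __name__ == '__main__':
    with Manager() as mgr:
        out = mgr.list()          # proxy object; appends are forwarded to the manager process
        pr = Process(target=job, args=('process', out))
        pr.start()
        pr.join()
        print(list(out))          # ['f(process)']

A multiprocessing.Queue or Pipe works just as well if the child only needs to send results rather than share a mutable container.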

File descriptor for ioctl call to make a controlling terminal

On linux to be able to control lifetime of processes forked off of my main process I'm making the main process be the session and group leader by calling setsid(). Then it looks like I need to have the main process make a controlling terminal for the process group, and then, once the main process terminates, all other processes in the process group will receive a SIGHUP. I tried calling open() for a regular file on the filesystem, but ioctl() refuses to accept this fd with 'Inappropriate file descriptor'. Is posix_openpt() what I should be using instead? The man page says that it'll create a pseudo-terminal and return a file descriptor for it. Do I even need an ioctl(fd, TIOCSCTTY, 0) call after posix_openpt(), or not using O_NOCTTY is all I really need? Thanks!
Do I even need an ioctl(fd, TIOCSCTTY, 0) call after posix_openpt(), or not using O_NOCTTY is all I really need?
I just tried on Ubuntu 18.04.5:
If you don't do that and the controlling process is closed, the systemd process becomes the new controlling process of the child process and the child process does not receive SIGHUP.
I'm not sure if this behavior is the same for other Linux distributions, too.
Is posix_openpt() what I should be using instead?
Try the following code:
int master, tty;
master = posix_openpt(O_RDWR);
grantpt(master);
unlockpt(master);
tty = open(ptsname(master), O_RDWR);
ioctl(tty, TIOCSCTTY, 0);
This must be done in the same process that called setsid().
Note: As soon as you completely close the master file, the processes will receive a SIGHUP.
("Completely" means: When you close all copies created by dup() or by creating a child process inheriting the handle.)
If you really want to use the pseudo-TTY, you should not let child processes inherit the master handle (or you should close() the handle in the child process). However, in your case you only want to use the pseudo-TTY as a "workaround", so this is not that important.
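
For reference, a rough Python equivalent of the same sequence, using the standard os/fcntl/termios modules (Linux-specific sketch; os.setsid() must succeed, i.e. the process must not already be a process group leader, and os.openpty() takes care of the grantpt()/unlockpt() steps internally):

import os, fcntl, termios

os.setsid()                                 # become session leader (fails if already a group leader)
master, slave = os.openpty()                # like posix_openpt() + grantpt() + unlockpt() + open()
fcntl.ioctl(slave, termios.TIOCSCTTY, 0)    # make the pty the controlling terminal of this session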

How to share a variable with forked process in python?

In Python 3.6 I have the following code, which forks a process; the child changes a variable, but the variable of the same name in the parent remains unchanged.
import os, sys, time

var = 42
child_pid = os.fork()
if child_pid == 0:
    print(f"Child Process start {os.getpid()}.")
    var = 20
    print(f"Child variable {var}")
    time.sleep(10)
    print(f"Child Process end {os.getpid()}.")
    sys.exit(0)
else:
    print(f"Parent Process start {os.getpid()}.")
    for x in range(20):
        time.sleep(2)
        print(f"Parent variable {var}")
    print(f"Parent Process end {os.getpid()}.")
How can I share the variable var in the example between the child and parent process?
Forking a process creates a new process with a new PID and a separate memory space, so you basically cannot share plain variables, even global ones.
If you create a thread instead, you can share global variables.
Otherwise, with two (or more) processes you can use IPC (Inter Process Communication): https://docs.python.org/fr/3.5/library/ipc.html.
Common IPC mechanisms are sockets (even local ones), but you can choose another one (e.g. memory mapping, message queues, shared memory ...).
Here is a post about the same problem in C; since the IPC is handled by the OS, the principle remains the same: How to share memory between process fork()?
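
One option that keeps the os.fork() structure of the question is a multiprocessing.Value, which is backed by shared (mmap'd) memory and therefore stays shared across the fork. A minimal sketch, reusing the names from the question:

import os, sys
from multiprocessing import Value

var = Value('i', 42)              # 'i' = C int, stored in shared memory

child_pid = os.fork()
if child_pid == 0:
    var.value = 20                # visible to the parent, unlike a plain int
    sys.exit(0)
else:
    os.waitpid(child_pid, 0)
    print(f"Parent sees {var.value}")   # prints: Parent sees 20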

Preventing threaded subprocess.popen from terminating my main script when child is killed?

Python 2.7.3 on Solaris 10
Questions
When my subprocess segfaults (dumps core) internally, or a user kills it externally from the shell with SIGTERM or SIGKILL, my main program's signal handler handles a SIGTERM (-15) and my parent program exits. Is this expected, or is it a bad Python build?
Background and Code
I have a python script that first spawns a worker management thread. The worker management thread then spawns one or more worker threads. I have other stuff going on in my main thread that I cannot block. My management thread stuff and worker threads are rock-solid. My services run for years without restarts but then we have this subprocess.Popen scenario:
In the run method of the worker thread, I am using:
class workerThread(threading.Thread):
    def __init__(self):
        super(workerThread, self).__init__()
        ...
    def run(self):
        ...
        atempfile = tempfile.NamedTemporaryFile(delete=False)
        myprocess = subprocess.Popen(['third-party-cmd', 'with', 'arguments'], shell=False,
                                     stdin=subprocess.PIPE, stdout=atempfile,
                                     stderr=subprocess.STDOUT, close_fds=True)
        ...
I need to use myprocess.poll() to check for process termination because I need to scan the atempfile until I find relevant information (the file may be > 1 GiB) and I need to terminate the process because of user request or because the process has been running too long. Once I find what I am looking for, I will stop checking the stdout temp file. I will clean it up after the external process is dead and before the worker thread terminates. I need the stdin PIPE in case I need to inject a response to something interactive in the child's stdin stream.
In my main program, I set SIGINT and SIGTERM handlers so that I can perform cleanup if my main Python program is terminated with SIGTERM, or with SIGINT (Ctrl-C) when running from the shell.
Does anyone have a solid 2.x recipe for child signal handling in threads?
ctypes sigprocmask, etc.
Any help would be very appreciated. I am just looking for an 'official' recipe or the BEST hack, if one even exists.
Notes
I am using a restricted build of Python. I must use 2.7.3. Third-party-cmd is a program I do not have source for - modifying it is not possible.
There are many things in your description that look strange. First: you have several different threads and processes. Which one is crashing, which one is receiving SIGTERM, which one is receiving SIGKILL, and as a result of which operations?
Second: why does your parent receive SIGTERM? It can't be sent implicitly. Someone is calling kill on your parent process, either directly or indirectly (for example, by killing the whole parent group).
Third: how does your program terminate when you're handling SIGTERM? By definition, the program terminates if the signal is not handled; if it is handled, the program is not terminated. What's really happening?
Suggestions:
$ cat crsh.c
#include <stdio.h>
int main(void)
{
    int *f = 0x0;
    puts("Crashing");
    *f = 0;
    puts("Crashed");
    return 0;
}
$ cat a.py
import subprocess, sys
print('begin')
p = subprocess.Popen('./crsh')
a = raw_input()
print(a)
p.wait()
print('end')
$ python a.py
begin
Crashing
abcd
abcd
end
This works. No signal is delivered to the parent. Did you isolate the problem in your program?
If the problem is a signal sent to multiple processes: can you use setpgid to set up a separate process group for the child? (See the sketch after the notes below.)
Is there any reason for creating the temporary file? That's a 1 GB file being created in your temporary directory. Why not pipe stdout?
If you're really sure you need to handle signals in your parent program (why not try/except KeyboardInterrupt, for example?): could signal()'s unspecified behavior in multithreaded programs be causing these problems (for example, a signal being dispatched to a thread that does not handle signals)?
NOTES
The effects of signal() in a multithreaded process are unspecified.
Anyway, try to explain more precisely what the threads and processes of your program are, what they do, how the signal handlers were set up and why, who is sending signals, who is receiving them, and so on.
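
A minimal sketch of the setpgid suggestion above, for Python 2.7 (where Popen has no start_new_session argument), using preexec_fn to put the child in its own process group; the command line is just the placeholder from the question:

import os, subprocess

# The child runs in its own process group, so group-wide signals aimed at the
# parent (e.g. Ctrl-C from the shell) are not delivered to the child, and a
# signal delivered to the child's group does not reach the parent.
myprocess = subprocess.Popen(['third-party-cmd', 'with', 'arguments'],
                             shell=False,
                             stdin=subprocess.PIPE,
                             stdout=subprocess.PIPE,
                             stderr=subprocess.STDOUT,
                             close_fds=True,
                             preexec_fn=os.setpgrp)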

Prevent fork() from copying sockets

I have the following situation (pseudocode):
function f:
    pid = fork()
    if pid == 0:
        exec to another long-running executable (no communication needed to that process)
    else:
        return "something"
f is exposed over a XmlRpc++ server. When the function is called over XML-RPC, the parent process prints "done closing socket" after the function returned "something". But the XML-RPC client hangs as long as the child process is still running. When I kill the child process, the XML-RPC client correctly finishes the RPC call.
It seems to me that I'm having a problem with fork() copying socket descriptors to the child process (parent called closesocket but child still owns a reference -> connection still established). How can I circumvent this?
EDIT: I read about FD_CLOEXEC already, but can't I force all descriptors to be closed on exec?
No, you can't force all file descriptors to be closed on exec. You will need to loop over all unwanted file descriptors in the child after the fork() and close them. Unfortunately, there isn't an easy, portable way to do that - the usual approach is to use getrlimit() to get the current value of RLIMIT_NOFILE and loop from 3 to that number, trying close() on each candidate.
If you are happy to be Linux-only, you can read the /proc/self/fd/ directory to determine the open file descriptors and close them (except 0, 1 and 2 - which should either be left alone or reopened to /dev/null).
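
On Linux, that /proc/self/fd loop looks roughly like this (sketched in Python to match the other examples on this page; the same loop translates directly to C with opendir()/readdir()):

import os

def close_inherited_fds(keep=(0, 1, 2)):
    # Linux-only: every entry in /proc/self/fd is an open descriptor of this process.
    for name in os.listdir('/proc/self/fd'):
        fd = int(name)
        if fd in keep:
            continue
        try:
            os.close(fd)
        except OSError:
            pass   # already closed, e.g. the descriptor listdir() itself was using

This would be called in the child between fork() and exec(), so the exec'ed program starts with only stdin, stdout, and stderr open.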
