How to share a variable with a forked process in Python? - python-3.x

In Python 3.6 I have the following code, which forks a process, and the child changes a variable. However, the variable of the same name remains unchanged in the parent:
import os, sys, time

var = 42
child_pid = os.fork()
if child_pid == 0:
    print(f"Child Process start {os.getpid()}.")
    var = 20
    print(f"Child variable {var}")
    time.sleep(10)
    print(f"Child Process end {os.getpid()}.")
    sys.exit(0)
else:
    print(f"Parent Process start {os.getpid()}.")
    for x in range(20):
        time.sleep(2)
        print(f"Parent variable {var}")
    print(f"Parent Process end {os.getpid()}.")
How can I share the variable var in the example between the child and parent process?

Forking a process creates a new process, with a new PID and a separate memory space. So basically you cannot share variables between the two, even if they are globals.
If you created a thread instead, you could share global variables.
Otherwise, with two (or more) processes you can use IPC (Inter-Process Communication): https://docs.python.org/fr/3.5/library/ipc.html.
Common IPC mechanisms are sockets (even local ones), but you can choose another one (e.g. memory mapping, message queues, shared memory...).
Here is a post on the same problem in C; since the IPC is handled by the OS, the principle remains the same: How to share memory between process fork()?.
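Since the example only needs to share a single integer, one concrete option (my own sketch, not part of the original answer) is the shared-memory wrapper multiprocessing.Value, which hides the IPC behind a normal-looking object:

from multiprocessing import Process, Value

def child(shared_var):
    # The write lands in shared memory, so the parent sees it.
    shared_var.value = 20

if __name__ == "__main__":
    var = Value("i", 42)   # 'i' = C int, allocated in shared memory
    p = Process(target=child, args=(var,))
    p.start()
    p.join()
    print(var.value)       # prints 20, not 42

The same idea works with multiprocessing.Array for small buffers; for anything larger, the queue or socket approaches above scale better.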

Related

Is there a reason for this difference between Python threads and processes?

When a list object is passed to a Python (3.9) Process and Thread, the additions to the list done in the thread are seen in the parent, but not the additions done in the process. E.g.:
from multiprocessing import Process
from threading import Thread

def job(x, out):
    out.append(f'f({x})')

out = []
pr = Process(target=job, args=('process', out))
th = Thread(target=job, args=('thread', out))
pr.start(), th.start()
pr.join(), th.join()
print(out)
This prints ['f(thread)']. I expected it to be (disregard the order) ['f(thread)', 'f(process)'].
Could someone explain the reason for this?
There's nothing Python-specific about it; that's just how processes work.
Specifically, all threads running within a given process share the process's memory-space -- so e.g. if thread A changes the state of a variable, thread B will "see" that change.
Processes, OTOH, each get their own private memory space that is inaccessible to all other processes. That's done deliberately as a way to prevent process A from accidentally (or deliberately) reading or corrupting the memory of process B.
When you spawn a child process, the new child process gets its own memory-space that initially contains a copy of all the data in the parent's memory space, but it is a separate space, so changes made by the child will not be visible to the parent (and vice-versa).
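If you want the process's append to be visible in the parent as well, one common approach (my own sketch, not part of the original answer) is to replace the plain list with a manager proxy, which forwards mutations to a server process:

from multiprocessing import Manager, Process
from threading import Thread

def job(x, out):
    out.append(f'f({x})')

if __name__ == '__main__':
    with Manager() as manager:
        out = manager.list()   # proxy list shared through a manager process
        pr = Process(target=job, args=('process', out))
        th = Thread(target=job, args=('thread', out))
        pr.start(); th.start()
        pr.join(); th.join()
        print(list(out))       # both 'f(process)' and 'f(thread)' appear

A multiprocessing.Queue would work just as well if you only need to pass results back rather than share a mutable container.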

Why does calling the multiprocessing module with Process create the same instance?

My platform info:
uname -a
Linux debian 5.10.0-9-amd64 #1 SMP Debian 5.10.70-1 (2021-09-30) x86_64 GNU/Linux
python3 --version
Python 3.9.2
Note: adding a lock can make the Singleton class work correctly, but that is not my issue; there is no need to discuss process locks here.
The common parts, which will be executed in the two different setups:
Singleton class
class Singleton(object):
    def __init__(self):
        pass

    @classmethod
    def instance(cls, *args, **kwargs):
        if not hasattr(Singleton, "_instance"):
            Singleton._instance = Singleton(*args, **kwargs)
        return Singleton._instance
working function in the process
import time, os

def task():
    print("start the process %d" % os.getpid())
    time.sleep(2)
    obj = Singleton.instance()
    print(hex(id(obj)))
    print("end the process %d" % os.getpid())
Creating multiple processes with the processing pool approach:
from multiprocessing.pool import Pool

with Pool(processes=4) as pool:
    [pool.apply_async(func=task) for item in range(4)]
    # same effect with pool.apply, pool.map, pool.map_async in this example; I have verified it, you can try it yourself
    pool.close()
    pool.join()
The result is as below:
start the process 11986
start the process 11987
start the process 11988
start the process 11989
0x7fd8764e04c0
end the process 11986
0x7fd8764e05b0
end the process 11987
0x7fd8764e0790
end the process 11989
0x7fd8764e06a0
end the process 11988
Each sub-process has its own memory; they don't share address space with each other and don't know whether another process has already created an instance, so the output shows different instances.
Creating multiple processes with the Process approach:
import multiprocessing

for i in range(4):
    t = multiprocessing.Process(target=task)
    t.start()
The result is as below:
start the process 12012
start the process 12013
start the process 12014
start the process 12015
0x7fb288c21910
0x7fb288c21910
end the process 12014
end the process 12012
0x7fb288c21910
end the process 12013
0x7fb288c21910
end the process 12015
Why does it create the same instance this way? What is the working principle of the multiprocessing module here?
@Reed Jones, I have read the related post you provided many times.
In lmjohns3's answer:
So the net result is actually the same, but in the first case you're guaranteed to run foo and bar on different processes.
The first case is the Process sub-module; Process guarantees running on different processes, so in my case:
import multiprocessing

for i in range(4):
    t = multiprocessing.Process(target=task)
    t.start()
it should result in several instances (maybe 4, maybe not, but at least more than 1) instead of the same instance.
I am sure that material can't explain my case.
As already explained in this answer, id implementation is platform specific and is not a good method to guarantee unique identifiers across multiple processes.
In CPython specifically, id returns the pointer to the object within its own process address space. Most modern OSes abstract the computer's memory using a technique known as virtual memory.
What you are observing are actual different objects. Nevertheless, they appear to have the same identifiers as each process allocated that object in the same offset of its own memory address space.
The reason why this does not happen with the Pool is most likely that the Pool allocates several resources in the worker process (pipes, counters, etc.) before running the task function. This randomizes the use of the process address space enough that the object IDs appear different across sibling processes.
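One way to convince yourself that the objects really are distinct despite identical id() values (my own illustration, not from the original answer) is to report each worker's PID alongside the id:

import os
from multiprocessing import Process, Queue

def task(q):
    obj = object()                      # stand-in for the Singleton instance
    q.put((os.getpid(), hex(id(obj))))  # the hex value may repeat across PIDs

if __name__ == "__main__":
    q = Queue()
    procs = [Process(target=task, args=(q,)) for _ in range(4)]
    for p in procs:
        p.start()
    results = [q.get() for _ in procs]  # drain before joining to avoid blocking
    for p in procs:
        p.join()
    for pid, obj_id in results:
        print(pid, obj_id)

Identical id strings paired with different PIDs show that each address is only meaningful inside its own process's virtual address space.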

How to run os.system to start a process through a different pid?

import os
import signal
import sys
from kazoo.client import KazooClient

# process_id is assumed to be set elsewhere in the original script
def signal_handler(signal, frame):
    print('\nYou pressed Ctrl+C!')
    print("Stopping...Process " + process_id)
    print()
    children = zk.get_children("/assign2/root")
    l = [int(i[3:]) for i in children]
    l.remove(int(process_id))
    print("Min process id : ", min(l))
    zk.delete("/assign2/root/pid" + process_id, recursive=True)
    # To run a command on terminal --> os.system("python3 zook.py")
    if int(process_id) == min(l):
        print("Starting new process through :", min(l))
        os.system("python3 zook.py")
        os.kill(int(process_id), 0)
    zk.stop()
    sys.exit(0)

signal.signal(signal.SIGINT, signal_handler)

zk = KazooClient(hosts='127.0.0.1:2181')
zk.start()
zk.ensure_path("/assign2/root")
zk.create("/assign2/root/pid" + process_id, bytes(process_id, encoding='utf-8'), ephemeral=True)
On killing a process, I want to find the smallest of the remaining pids and start a process through the smallest pid.
I am using zookeeper to store pids as ephemeral znodes and upon terminating the python script in terminal, I'm trying to create another process.
I am able to create a process through the pid of the process I am terminating, but not from another pid of my choice.
Please let me know if there is a way to do this.
So, what you need here is monitoring all your different processes (each process monitoring all the others) and waiting for the failure of a process. In the zookeeper world, this can be achieved via:
1) Zookeeper watches: each zookeeper client (process) can set up a watch on the parent znode (in your case /assign2/root/) and watch for CHILD_DELETE events. Have a look at the zookeeper documentation and the documentation of your specific library (kazoo) to see how to do that. When setting up the watch, you can specify a callback which is executed asynchronously each time a znode under your parent znode disappears. This can happen because:
The child znode is ephemeral and the zk client which created that znode died.
The client explicitly deleted the child node. For example, in your case you can delete the child node in the signal handler.
In this callback, each of the alive zookeeper clients/processes will determine whether it is the lowest-ranked process id (by fetching the list of all existing children under the parent znode), and if it is indeed the smallest pid, it will spawn the new process (python3 zook.py).
Note that zookeeper watches are a one-time fire concept. This means that after the watch has fired (i.e. the callback has executed), you need to reset the watch at the very end of the callback execution, so that the next child delete event will also fire the callback.
2) The alternative approach is polling. In each of your processes, you can have a dedicated thread which periodically monitors the children of your parent znode; each time a process detects that a node has disappeared, it can again determine whether it is the lowest alive pid and spawn a new process if it is.
Hope this helps. Let me know if you have any doubts. Would have liked to post the code you needed, but have to run for an errand.
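For illustration only, a rough sketch of the watch-based approach with kazoo's ChildrenWatch might look like this (my own sketch, untested against your setup; kazoo re-arms the watch for you, so the one-time-fire caveat is handled by the library):

import os
from kazoo.client import KazooClient

zk = KazooClient(hosts='127.0.0.1:2181')
zk.start()

my_pid = str(os.getpid())
zk.ensure_path("/assign2/root")
zk.create("/assign2/root/pid" + my_pid, my_pid.encode(), ephemeral=True)

@zk.ChildrenWatch("/assign2/root")
def on_children_change(children):
    # Fires whenever the set of child znodes changes (including once at
    # registration); real code would compare against the previously seen
    # set and act only on deletions.
    pids = sorted(int(c[3:]) for c in children)
    if pids and pids[0] == int(my_pid):
        # Only the lowest surviving pid spawns the replacement process.
        os.system("python3 zook.py")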

How do different processes share file descriptors?

When forking a process, consider the following scenario:
1) We open two pipes for bidirectional IPC communication.
2) Suppose these have (3,4) and (5,6) as file descriptors.
3) We fork the process somewhere in the middle.
4) We exec the child process
Now, what happens is that these two processes are completely independent of each other; the child process now has its own address space and is a completely new process.
Now, my question is: how do pipes (file descriptors) survive in an exec'd process? Because pipes opened like this are used for the exec'd process and the parent to communicate.
The only way I can see this working is if file descriptors were global to the machine, which I think is impossible, as that would cause conflicts.
And in the IDE for this code:
import os
from multiprocessing import Process, Pipe

def sender(pipe):
    """
    send object to parent on anonymous pipe
    """
    pipe.send(['spam'] + [42, 'eggs'])
    pipe.close()

def talker(pipe):
    """
    send and receive objects on a pipe
    """
    pipe.send(dict(name='Bob', spam=42))
    reply = pipe.recv()
    print('talker got: ', reply)

if __name__ == '__main__':
    (parentEnd, childEnd) = Pipe()
    Process(target=sender, args=(childEnd,)).start()
    print("parent got: ", parentEnd.recv())
    parentEnd.close()

    (parentEnd, childEnd) = Pipe()
    child = Process(target=talker, args=(childEnd,))
    ############################## from here
    child.start()
    print('From talker Parent got:', parentEnd.recv())
    parentEnd.send({x * 2 for x in 'spam'})
    child.join()
    ############################## to here
    print('parent exit')
Two processes run, but only the output from one of them can be seen in IDLE, not both. In a terminal, however, it looks as if stdout is shared.
The job of copying the process file descriptor table is actually regulated by the more generic clone() syscall flag CLONE_FILES (which is not set by fork()):
CLONE_FILES (since Linux 2.0)
...
If CLONE_FILES is not set, the child process inherits a copy of all file descriptors opened in the calling process at the time of clone(). (The duplicated file descriptors in the child refer to the same open file descriptions (see open(2)) as the corresponding file descriptors in the calling process.) Subsequent operations that open or close file descriptors, or change file descriptor flags, performed by either the calling process or the child process do not affect the other process.
execve() doesn't touch file descriptors except those opened or marked with the O_CLOEXEC or FD_CLOEXEC flags, in which case those descriptors are closed:
* By default, file descriptors remain open across an execve(). File
descriptors that are marked close-on-exec are closed; see the
description of FD_CLOEXEC in fcntl(2).
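To see that inheritance in action, here is a small sketch (my own example, not from the quoted man pages) using os.pipe() and os.fork(); the child writes on the descriptor it inherited and the parent reads the data:

import os

# r and w are small integers indexing this process's file descriptor table;
# they refer to the two ends of one kernel pipe object.
r, w = os.pipe()

pid = os.fork()
if pid == 0:
    # Child: its descriptor table was copied at fork time, so the same
    # integers refer to the same underlying open pipe.
    os.close(r)
    os.write(w, b"hello from child")
    os.close(w)
    os._exit(0)
else:
    # Parent: reads what the child wrote through the shared pipe.
    os.close(w)
    print(os.read(r, 1024))   # b'hello from child'
    os.close(r)
    os.waitpid(pid, 0)

If the child went on to exec another program, the descriptors would still be open there (unless marked close-on-exec), which is how a parent and an exec'd child can keep communicating over a pipe set up before the exec.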

Preventing threaded subprocess.popen from terminating my main script when child is killed?

Python 2.7.3 on Solaris 10
Questions
When my subprocess hits an internal segmentation fault (core) or a user externally kills it from the shell with SIGTERM or SIGKILL, my main program's signal handler handles a SIGTERM (-15) and my parent program exits. Is this expected behavior, or is it a bad Python build?
Background and Code
I have a python script that first spawns a worker management thread. The worker management thread then spawns one or more worker threads. I have other stuff going on in my main thread that I cannot block. My management thread stuff and worker threads are rock-solid. My services run for years without restarts but then we have this subprocess.Popen scenario:
In the run method of the worker thread, I am using:
import subprocess
import tempfile
import threading

class workerThread(threading.Thread):
    def __init__(self):
        super(workerThread, self).__init__()
        ...
    def run(self):
        ...
        atempfile = tempfile.NamedTemporaryFile(delete=False)
        myprocess = subprocess.Popen(['third-party-cmd', 'with', 'arguments'],
                                     shell=False, stdin=subprocess.PIPE,
                                     stdout=atempfile, stderr=subprocess.STDOUT,
                                     close_fds=True)
        ...
I need to use myprocess.poll() to check for process termination because I need to scan the atempfile until I find relevant information (the file may be > 1 GiB) and I need to terminate the process because of user request or because the process has been running too long. Once I find what I am looking for, I will stop checking the stdout temp file. I will clean it up after the external process is dead and before the worker thread terminates. I need the stdin PIPE in case I need to inject a response to something interactive in the child's stdin stream.
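For illustration, a minimal sketch of such a poll-and-scan loop might look like the following (the relevant() predicate and the timeout are hypothetical placeholders, not part of the original code):

import time

def wait_for_output(myprocess, atempfile, relevant, timeout=3600):
    """Poll the child while scanning its stdout temp file for a marker."""
    deadline = time.time() + timeout
    with open(atempfile.name) as log:
        while True:
            line = log.readline()          # returns '' at the current end of file
            if line:
                if relevant(line):
                    return line            # found what we were looking for
                continue
            if myprocess.poll() is not None:
                return None                # child exited without the marker
            if time.time() > deadline:
                myprocess.terminate()      # ran too long; ask it to stop
                return None
            time.sleep(1)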
In my main program, I set a SIGINT and SIGTERM handler to perform cleanup if my main Python program is terminated with SIGTERM or SIGINT (Ctrl-C) when running from the shell.
Does anyone have a solid 2.x recipe for child signal handling in threads?
ctypes sigprocmask, etc.
Any help would be very appreciated. I am just looking for an 'official' recipe or the BEST hack, if one even exists.
Notes
I am using a restricted build of Python. I must use 2.7.3. Third-party-cmd is a program I do not have source for - modifying it is not possible.
There are many things in your description that look strange. First: you have a couple of different threads and processes. Who is crashing, who's receiving SIGTERM and who's receiving SIGKILL, and due to which operations?
Second: why does your parent receive SIGTERM? It can't be sent implicitly. Someone is sending a kill to your parent process, either directly or indirectly (for example, by killing the whole parent group).
Third: how is your program terminating if you're handling SIGTERM? By definition, the program terminates if the signal is not handled; if it's handled, the program is not terminated. What is really happening?
Suggestions:
$ cat crsh.c
#include <stdio.h>

int main(void)
{
    int *f = 0x0;
    puts("Crashing");
    *f = 0;
    puts("Crashed");
    return 0;
}
$ cat a.py
import subprocess, sys
print('begin')
p = subprocess.Popen('./crsh')
a = raw_input()
print(a)
p.wait()
print('end')
$ python a.py
begin
Crashing
abcd
abcd
end
This works. No signal is delivered to the parent. Did you isolate the problem in your program?
If the problem is a signal being sent to multiple processes: can you use setpgid to set up a separate process group for the child? (A sketch follows at the end of this answer.)
Is there any reason for creating the temporary file? That is a 1 GB file being created in your temporary directory. Why not pipe stdout?
If you're really sure you need to handle signals in your parent program (why not try/except KeyboardInterrupt, for example?): could signal()'s unspecified behavior in multithreaded programs be causing those problems (for example, dispatching a signal to a thread that does not handle signals)?
NOTES
The effects of signal() in a multithreaded process are unspecified.
Anyway, try to explain more precisely what the threads and processes of your program are, what they do, how the signal handlers were set up and why, who is sending signals, who is receiving them, and so on.
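Regarding the setpgid suggestion above, a minimal sketch of putting the child into its own process group (my own illustration, adapted to the Popen call from the question) might look like this; on Python 2.7 this is usually done through preexec_fn:

import os
import subprocess

# Run the child in its own process group so a signal sent to the parent's
# group (e.g. Ctrl-C in the shell) is not also delivered to the child, and
# killing the child's group does not touch the parent.
myprocess = subprocess.Popen(
    ['third-party-cmd', 'with', 'arguments'],
    stdin=subprocess.PIPE,
    stdout=open('child.out', 'w'),
    stderr=subprocess.STDOUT,
    close_fds=True,
    preexec_fn=os.setpgrp,   # the child calls setpgrp() just before exec
)

On Python 3.2+ the equivalent, and generally safer, option is start_new_session=True instead of preexec_fn.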
