Issue with spawning multiple processes - python-3.x

I am running some code and getting this error:
RuntimeError:
An attempt has been made to start a new process before the
current process has finished its bootstrapping phase.
This probably means that you are not using fork to start your
child processes and you have forgotten to use the proper idiom
in the main module:
Could someone help me with this?
Here is the code I am running:
import time
import multiprocessing

def heavy(n, myid):
    for x in range(1, n):
        for y in range(1, n):
            x**y
    print(myid, "is done")

start = time.perf_counter()
big = []
for i in range(50):
    p = multiprocessing.Process(target=heavy, args=(500, i))
    big.append(p)
    p.start()
for i in big:
    i.join()
end = time.perf_counter()
print(end - start)

Depending on the OS, the new process will be either forked from the current one (Linux) or spawned (Windows, macOS). A simplistic view of forking is that a copy of the current process is made: all the objects already created will exist in the new copy, but they will not be the same objects as in the initial process. Indeed, objects in one process are not directly accessible from the other process. Spawning starts a new Python interpreter, re-runs the file (with a __name__ different from '__main__'), and then runs the target passed when creating the Process. The re-running of the file is part of the 'bootstrapping phase'.
In your case, since you do not use the if __name__ == '__main__': guard, when the file is re-run it tries to start new processes, which would lead to an infinite tree of new processes. The multiprocessing module takes care of this case and detects that you are trying to start new processes even before the target is run.
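A minimal sketch of the fix (my own, not part of the original answer) is to move the process-spawning code under the guard, so that it only runs in the original process:
import time
import multiprocessing

def heavy(n, myid):
    for x in range(1, n):
        for y in range(1, n):
            x**y
    print(myid, "is done")

if __name__ == '__main__':
    # only the original process runs this block; re-imports of the file
    # in spawned children skip it, so there is no infinite tree of processes
    start = time.perf_counter()
    big = []
    for i in range(50):
        p = multiprocessing.Process(target=heavy, args=(500, i))
        big.append(p)
        p.start()
    for i in big:
        i.join()
    end = time.perf_counter()
    print(end - start)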
For a simple test of what happens when you use multiprocessing, see the code in this answer.

Related

Multiprocessing with Multiple Functions: Need to add a function to the pool from within another function

I am measuring the metrics of an encryption algorithm that I designed. I have declared 2 functions and a brief sample is as follows:
import sys, random, timeit, psutil, os, time
from multiprocessing import Process
from subprocess import check_output

pid = 0

def cpuUsage():
    global running
    while pid == 0:
        time.sleep(1)
    running = True
    p = psutil.Process(pid)
    while running:
        print(f'PID: {pid}\t|\tCPU Usage: {p.memory_info().rss/(1024*1024)} MB')
        time.sleep(1)

def Encryption():
    global pid, running
    pid = os.getpid()
    myList = []
    for i in range(1000):
        myList.append(random.randint(-sys.maxsize, sys.maxsize) + random.random())

    print('Now running timeit function for speed metrics.')
    p1 = Process(target=metric_collector())
    p1.start()
    p1.join()

    number = 1000
    unit = 'msec'
    setup = '''
import homomorphic,random,sys,time,os,timeit
myList={myList}
'''
    enc_code = '''
for x in range(len(myList)):
    myList[x] = encryptMethod(a, b, myList[x], d)
'''
    dec_code = '''
\nfor x in range(len(myList)):
    myList[x] = decryptMethod(myList[x])
'''
    time = timeit.timeit(setup=setup,
                         stmt=(enc_code + dec_code),
                         number=number)
    running = False
    print(f'''Average Time:\t\t\t {time/number*.0001} seconds
Total time for {number} Iters:\t\t\t {time} {unit}s
Total Encrypted/Decrypted Values:\t {number*len(myList)}''')
    sys.exit()

if __name__ == '__main__':
    print('Beginning Metric Evaluation\n...\n')
    p2 = Process(target=Encryption())
    p2.start()
    p2.join()
I am sure there's an implementation error in my code, I'm just having trouble grabbing the PID for the encryption method and I am trying to make the overhead from other calls as minimal as possible so I can get an accurate reading of just the functionality of the methods being called by timeit. If you know a simpler implementation, please let me know. Trying to figure out how to measure all of the metrics has been killing me softly.
I've tried acquiring the pid a few different ways, but I only want to measure performance when timeit is run. Good chance I'll have to break this out separately and run it that way (instead of multiprocessing) to evaluate the function properly, I'm guessing.
There are at least three major problems with your code. The net result is that you are not actually doing any multiprocessing.
The first problem is here, and in a couple of other similar places:
p2 = Process(target=Encryption())
What this code passes to Process is not the function Encryption but the returned value from Encryption(). It is exactly the same as if you had written:
x = Encryption()
p2 = Process(target=x)
What you want is this:
p2 = Process(target=Encryption)
This code tells Python to create a new Process and execute the function Encryption() in that Process.
The second problem has to do with the way Python handles memory for Processes. Each Process lives in its own memory space. Each Process has its own local copy of global variables, so you cannot set a global variable in one Process and have another Process be aware of this change. There are mechanisms to handle this important situation, documented in the multiprocessing module. See the section titled "Sharing state between processes." The bottom line here is that you cannot simply set a global variable inside a Process and expect other Processes to see the change, as you are trying to do with pid. You have to use one of the approaches described in the documentation.
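For example, a minimal sketch (my own, not the asker's code) of sharing a PID between processes using multiprocessing.Value, one of the documented mechanisms:
import os
import time
from multiprocessing import Process, Value

def worker(shared_pid):
    # Write this process's PID into shared memory that the parent can see
    shared_pid.value = os.getpid()
    time.sleep(1)

if __name__ == '__main__':
    shared_pid = Value('i', 0)           # 'i' = C int, initial value 0
    p = Process(target=worker, args=(shared_pid,))
    p.start()
    while shared_pid.value == 0:         # wait until the child has published its PID
        time.sleep(0.1)
    print('child PID seen by parent:', shared_pid.value)
    p.join()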
The third problem is this code pattern, which occurs for both p1 and p2.
p2 = Process(target=Encryption)
p2.start()
p2.join()
This tells Python to create a Process and to start it. Then you immediately wait for it to finish, which means that your current Process must stop at that point until the new Process is finished. You never allow two Processes to run at once, so there is no performance benefit. The only reason to use multiprocessing is to run two things at the same time, which you never do. You might as well not bother with multiprocessing at all since it is only making your life more difficult.
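If the goal is real concurrency, both processes have to be started before either one is joined. A sketch of that pattern (with placeholder task functions, not the asker's actual workloads):
from multiprocessing import Process

def task_a():
    print('task_a running')

def task_b():
    print('task_b running')

if __name__ == '__main__':
    p1 = Process(target=task_a)
    p2 = Process(target=task_b)
    p1.start()
    p2.start()    # both processes are now running at the same time
    p1.join()     # only wait after both have been started
    p2.join()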
Finally, I am not sure why you have decided to use multiprocessing in the first place. The functions that measure memory usage and execution time are almost certainly very fast, and I would expect them to be much faster than any method of synchronizing one Process with another. If you're worried about errors due to the time used by the diagnostic functions themselves, I doubt that you can make things better with multiprocessing. Why not just start with a simple program and see what results you get?
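As an illustration of that suggestion, a minimal sketch (my own, with a hypothetical placeholder encrypt function standing in for the real one) that measures time with timeit and memory with psutil in a single process:
import os
import timeit
import psutil

def encrypt(x):
    # hypothetical placeholder for the real encryption routine
    return x + 1

if __name__ == '__main__':
    data = list(range(1000))
    proc = psutil.Process(os.getpid())

    elapsed = timeit.timeit(lambda: [encrypt(x) for x in data], number=1000)
    rss_mb = proc.memory_info().rss / (1024 * 1024)

    print(f'average time per iteration: {elapsed / 1000:.6f} s')
    print(f'resident memory: {rss_mb:.1f} MB')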

Is there a reason for this difference between Python threads and processes?

When a list object is passed to a Python (3.9) Process and Thread, the additions to the list made in the thread are seen in the parent, but not the additions made in the process. E.g.,
from multiprocessing import Process
from threading import Thread

def job(x, out):
    out.append(f'f({x})')

out = []
pr = Process(target=job, args=('process', out))
th = Thread(target=job, args=('thread', out))
pr.start(), th.start()
pr.join(), th.join()
print(out)
This prints ['f(thread)']. I expected it to be (disregard the order) ['f(thread)', 'f(process)'].
Could someone explain the reason for this?
There's nothing Python-specific about it; that's just how processes work.
Specifically, all threads running within a given process share the process's memory-space -- so e.g. if thread A changes the state of a variable, thread B will "see" that change.
Processes, OTOH, each get their own private memory space that is inaccessible to all other processes. That's done deliberately as a way to prevent process A from accidentally (or deliberately) reading or corrupting the memory of process B.
When you spawn a child process, the new child process gets its own memory-space that initially contains a copy of all the data in the parent's memory space, but it is a separate space, so changes made by the child will not be visible to the parent (and vice-versa).
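If the intent is to get results back from a child process, one option (a sketch of my own, not from the original answer) is a managed list from multiprocessing.Manager, whose proxy forwards changes back to the parent:
from multiprocessing import Process, Manager
from threading import Thread

def job(x, out):
    out.append(f'f({x})')

if __name__ == '__main__':
    with Manager() as manager:
        out = manager.list()             # proxy list shared across processes
        pr = Process(target=job, args=('process', out))
        th = Thread(target=job, args=('thread', out))
        pr.start(); th.start()
        pr.join(); th.join()
        print(list(out))                 # both 'f(process)' and 'f(thread)' appear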

Why does using the multiprocessing module with Process create the same instance?

My platform info:
uname -a
Linux debian 5.10.0-9-amd64 #1 SMP Debian 5.10.70-1 (2021-09-30) x86_64 GNU/Linux
python3 --version
Python 3.9.2
Note: adding a lock can make the Singleton class work correctly; that is not my issue, so there is no need to discuss process locks.
The common parts that will be executed in the different scenarios:
Singleton class
class Singleton(object):
    def __init__(self):
        pass

    @classmethod
    def instance(cls, *args, **kwargs):
        if not hasattr(Singleton, "_instance"):
            Singleton._instance = Singleton(*args, **kwargs)
        return Singleton._instance
The worker function run in each process:
import time, os

def task():
    print("start the process %d" % os.getpid())
    time.sleep(2)
    obj = Singleton.instance()
    print(hex(id(obj)))
    print("end the process %d" % os.getpid())
Creating multiple processes with a process pool:
from multiprocessing.pool import Pool

with Pool(processes=4) as pool:
    [pool.apply_async(func=task) for item in range(4)]
    # same effect with pool.apply, pool.map, and pool.map_async in this example; I have verified it, you can try it yourself
    pool.close()
    pool.join()
The result is as below:
start the process 11986
start the process 11987
start the process 11988
start the process 11989
0x7fd8764e04c0
end the process 11986
0x7fd8764e05b0
end the process 11987
0x7fd8764e0790
end the process 11989
0x7fd8764e06a0
end the process 11988
Each sub-process has its own memory; they don't share address space with each other and don't know that another process has already created an instance, so each one outputs a different instance.
Creating multiple processes with Process directly:
import multiprocessing

for i in range(4):
    t = multiprocessing.Process(target=task)
    t.start()
The result is as below:
start the process 12012
start the process 12013
start the process 12014
start the process 12015
0x7fb288c21910
0x7fb288c21910
end the process 12014
end the process 12012
0x7fb288c21910
end the process 12013
0x7fb288c21910
end the process 12015
Why does it create the same instance this way? What is the working principle of the multiprocessing module?
@Reed Jones, I have read the related post you provided many times.
In lmjohns3's answer:
So the net result is actually the same, but in the first case you're guaranteed to run foo and bar on different processes.
The first case is the Process approach; Process is guaranteed to run on different processes, so in my case:
import multiprocessing

for i in range(4):
    t = multiprocessing.Process(target=task)
    t.start()
It should result in several different instances (maybe 4, maybe not, but at least more than 1) instead of the same instance.
I am sure that material doesn't explain my case.
As already explained in this answer, the id implementation is platform-specific and is not a good way to guarantee unique identifiers across multiple processes.
In CPython specifically, id returns the pointer of the object within its own process address space. Most modern OSes abstract the computer's memory using a technique known as virtual memory.
What you are observing are actually different objects. Nevertheless, they appear to have the same identifiers because each process allocated that object at the same offset of its own memory address space.
The reason this does not happen in the pool is most likely that the Pool allocates several resources in the worker process (pipes, counters, etc.) before running the task function. Hence, it randomizes the process address space utilization enough that the object IDs appear different across their sibling processes.
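A small sketch (an illustration of my own, not from the original answer) showing that objects whose id() values coincide across processes are still independent:
import os
from multiprocessing import Process

class Singleton:
    _instance = None

    @classmethod
    def instance(cls):
        if cls._instance is None:
            cls._instance = cls()
        return cls._instance

def task():
    obj = Singleton.instance()
    obj.marker = os.getpid()              # mutate "the" singleton in this process only
    print(os.getpid(), hex(id(obj)))

if __name__ == '__main__':
    procs = [Process(target=task) for _ in range(4)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    # The parent's singleton never saw any child's marker, even if the
    # printed ids happened to be identical across processes.
    print('parent instance has marker:', hasattr(Singleton.instance(), 'marker'))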

How can you create an independent process with Python running in the background

I am trying to create an independent, behind-the-scenes process which runs completely on its own with Python. Is it possible to create such a process where, even after exiting the Python script, the process still keeps running in the background? I have created an .exe file with PyInstaller and I want that file to run in the background without popping up a console, so that the user is not aware of the process unless he opens his Task Manager quite often.
The multiprocessing module helps to create processes but they are terminated after the script execution is completed. Same with the threading module.
Is it in any way possible to keep some particular piece of code running in the background even after the script has finished execution, or to start the whole script in the background altogether without displaying any console?
I have tried using the Process class from the multiprocessing module and the Thread class from the threading module, but all of them exit after script execution. Even the subprocess and os modules prove to be of no help.
from multiprocessing import Process
from threading import Thread

def bg_func():
    ...  # stuff

def main():
    proc = Process(target=bg_func, args=())  # Thread alternatively
    proc.start()
When I ran into this problem, the only solution I could see was to do a double fork. Here is an example that works.
import os
import time
from multiprocessing import Process

def bg():
    # The parent of the fork returns immediately, so the Process can exit
    # and be joined; only the forked child keeps running in the background.
    if os.fork() != 0:
        return
    print('sub process is running')
    time.sleep(5)
    print('sub process finished')

if __name__ == '__main__':
    p = Process(target=bg)
    p.start()
    p.join()
    print('exiting main')
    exit(0)
The output shows the subprocess starting, the main program exiting with status 0, and my prompt coming back before the background program printed its last line. We are running in the background. :D
[0] 10:58 ~ $ python3 ./scratch_1.py
exiting main
sub process is running
[0] 10:58 ~ $ sub process finished

using dask distributed computing via jupyter notebook

I am seeing strange behavior from dask when using it from a Jupyter notebook. I am initiating a local client and giving it a list of jobs to do. My real code is a bit complex, so here is a simple example:
from dask.distributed import Client

def inc(x):
    return x + 1

if __name__ == '__main__':
    c = Client()
    futures = [c.submit(inc, i) for i in range(1,10)]
    result = c.gather(futures)
    print(len(result))
The problems I notice are:
1. Dask initiates more than 9 processes for this example.
2. After the code has run and is done (nothing in the notebook is running), the processes created by dask are not killed (and the client is not shut down). When I run top, I can see all those processes still alive.
I saw in the documentation that there is a client.close() option, but interestingly enough, such functionality does not exist in 0.15.2.
The only time the dask processes are killed is when I stop the Jupyter notebook. This issue is causing strange and unpredictable performance behavior. Is there any way the processes can be killed, or the client shut down, when no code is running in the notebook?
The default Client accepts optional parameters which are passed to LocalCluster (see the docs) and allow you to specify, for example, the number of processes you want. It is also a context manager, which will close itself and end its processes when you are done.
with Client(n_workers=2) as c:
    futures = [c.submit(inc, i) for i in range(1,10)]
    result = c.gather(futures)
    print(len(result))
When this ends, the processes will be terminated.
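Alternatively, in newer versions of dask.distributed (not the 0.15.2 mentioned in the question), the client can be closed explicitly; a sketch under that assumption:
from dask.distributed import Client

def inc(x):
    return x + 1

if __name__ == '__main__':
    c = Client(n_workers=2)              # limit the number of worker processes
    try:
        futures = [c.submit(inc, i) for i in range(1, 10)]
        print(len(c.gather(futures)))
    finally:
        c.close()                        # shut down the workers when done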
