Python multiprocessing: with and without pooling - python-3.x

I'm trying to understand Python's multiprocessing, and have devised the following code to test it:
import multiprocessing

def F(n):
    if n == 0: return 0
    elif n == 1: return 1
    else: return F(n-1)+F(n-2)

def G(n):
    print(f'Fibbonacci of {n}: {F(n)}')

processes = []
for i in range(25, 35):
    processes.append(multiprocessing.Process(target=G, args=(i, )))
for pro in processes:
    pro.start()
When I run it, it tells me that the computing time was roughly 6.65s.
I then wrote the following code, which I thought to be functionally equivalent to the first one:
from multiprocessing.dummy import Pool as ThreadPool

def F(n):
    if n == 0: return 0
    elif n == 1: return 1
    else: return F(n-1)+F(n-2)

def G(n):
    print(f'Fibbonacci of {n}: {F(n)}')

in_data = [i for i in range(25, 35)]
pool = ThreadPool(10)
results = pool.map(G, in_data)
pool.close()
pool.join()
and its running time was almost 12s.
Why does the second one take almost twice as long as the first? Aren't they supposed to be equivalent?
(NB: I'm running Python 3.6, and also tested similar code on 3.5.2 with the same results.)

The second version most likely takes twice as long as the first because of the CPython Global Interpreter Lock (GIL).
From http://python-notes.curiousefficiency.org/en/latest/python3/multicore_python.html:
[...] the GIL effectively restricts bytecode execution to a single core, thus rendering pure Python threads an ineffective tool for distributing CPU bound work across multiple cores.
As you know, multiprocessing.dummy is a wrapper around the threading module, so you're creating threads, not processes. With a CPU-bound task like this one, the Global Interpreter Lock means you are not much better off than simply executing your Fibonacci calculations sequentially in a single thread (except that you've added some thread-management/context-switching overhead).
With the "true multiprocessing" version, you only have a single thread in each process, each of which is using its own GIL. Hence, you can actually make use of multiple processors to improve the speed.
For this particular processing task, there is no significant advantage to using multiple threads over multiple processes. If you only have a single processor, there is no advantage to using either multiple processes or multiple threads over a single thread/process (in fact, both merely add context-switching overhead to your task).
(FWIW: a join in the true multiprocessing version is apparently done automatically by the Python runtime, so adding an explicit join doesn't seem to make any difference in my tests using time(1). And, by the way, if you did want to add join, you should add a second loop for the join processing; adding join to the existing start loop would simply serialize your processes.)
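For illustration, a minimal sketch of that two-loop start/join pattern, reusing the question's Fibonacci worker:
import multiprocessing

def F(n):
    return n if n < 2 else F(n-1) + F(n-2)

def G(n):
    print(f'Fibonacci of {n}: {F(n)}')

if __name__ == '__main__':
    processes = [multiprocessing.Process(target=G, args=(i,)) for i in range(25, 35)]
    for pro in processes:
        pro.start()   # start every process first so they all run concurrently
    for pro in processes:
        pro.join()    # then wait for all of them in a second, separate loop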

Related

Multiprocessing with Multiple Functions: Need to add a function to the pool from within another function

I am measuring the metrics of an encryption algorithm that I designed. I have declared 2 functions and a brief sample is as follows:
import sys, random, timeit, psutil, os, time
from multiprocessing import Process
from subprocess import check_output

pid=0

def cpuUsage():
    global running
    while pid == 0:
        time.sleep(1)
    running=true
    p = psutil.Process(pid)
    while running:
        print(f'PID: {pid}\t|\tCPU Usage: {p.memory_info().rss/(1024*1024)} MB')
        time.sleep(1)

def Encryption():
    global pid, running
    pid = os.getpid()
    myList=[]
    for i in range(1000):
        myList.append(random.randint(-sys.maxsize,sys.maxsize)+random.random())
    print('Now running timeit function for speed metrics.')
    p1 = Process(target=metric_collector())
    p1.start()
    p1.join()
    number=1000
    unit='msec'
    setup = '''
import homomorphic,random,sys,time,os,timeit
myList={myList}
'''
    enc_code='''
for x in range(len(myList)):
    myList[x] = encryptMethod(a, b, myList[x], d)
'''
    dec_code='''
\nfor x in range(len(myList)):
    myList[x] = decryptMethod(myList[x])
'''
    time=timeit.timeit(setup=setup,
                       stmt=(enc_code+dec_code),
                       number=number)
    running=False
    print(f'''Average Time:\t\t\t {time/number*.0001} seconds
Total time for {number} Iters:\t\t\t {time} {unit}s
Total Encrypted/Decrypted Values:\t {number*len(myList)}''')
    sys.exit()

if __name__ == '__main__':
    print('Beginning Metric Evaluation\n...\n')
    p2 = Process(target=Encryption())
    p2.start()
    p2.join()
I am sure there's an implementation error in my code; I'm just having trouble grabbing the PID for the encryption method, and I am trying to keep the overhead from other calls as low as possible so I can get an accurate reading of just the functionality of the methods being called by timeit. If you know a simpler implementation, please let me know. Trying to figure out how to measure all of these metrics has been killing me softly.
I've tried acquiring the PID a few different ways, but I only want to measure performance while timeit is running. There's a good chance I'll have to break this out separately and run it that way (instead of with multiprocessing) to evaluate the function properly, I'm guessing.
There are at least three major problems with your code. The net result is that you are not actually doing any multiprocessing.
The first problem is here, and in a couple of other similar places:
p2 = Process(target=Encryption())
What this code passes to Process is not the function Encryption but the returned value from Encryption(). It is exactly the same as if you had written:
x = Encryption()
p2 = Process(target=x)
What you want is this:
p2 = Process(target=Encryption)
This code tells Python to create a new Process and execute the function Encryption() in that Process.
The second problem has to do with the way Python handles memory for Processes. Each Process lives in its own memory space. Each Process has its own local copy of global variables, so you cannot set a global variable in one Process and have another Process be aware of this change. There are mechanisms to handle this important situation, documented in the multiprocessing module. See the section titled "Sharing state between processes." The bottom line here is that you cannot simply set a global variable inside a Process and expect other Processes to see the change, as you are trying to do with pid. You have to use one of the approaches described in the documentation.
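For illustration, a minimal sketch of one such mechanism: a multiprocessing.Value lives in shared memory, so a PID written by the child is visible to the parent, unlike a plain global.
from multiprocessing import Process, Value
import os, time

def worker(shared_pid):
    shared_pid.value = os.getpid()   # written in the child, visible to the parent
    time.sleep(2)                    # stand-in for the real work

if __name__ == '__main__':
    shared_pid = Value('i', 0)       # an int stored in shared memory
    p = Process(target=worker, args=(shared_pid,))
    p.start()
    time.sleep(0.5)                  # crude wait so the child has written its PID
    print('child PID:', shared_pid.value)
    p.join()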
The third problem is this code pattern, which occurs for both p1 and p2.
p2 = Process(target=Encryption)
p2.start()
p2.join()
This tells Python to create a Process and to start it. Then you immediately wait for it to finish, which means that your current Process must stop at that point until the new Process is finished. You never allow two Processes to run at once, so there is no performance benefit. The only reason to use multiprocessing is to run two things at the same time, which you never do. You might as well not bother with multiprocessing at all since it is only making your life more difficult.
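A minimal sketch of the shape you want instead, with placeholder workers standing in for your functions: start both processes before joining either of them.
from multiprocessing import Process
import time

def worker_a():
    time.sleep(1)   # placeholder, e.g. for the monitoring loop

def worker_b():
    time.sleep(1)   # placeholder, e.g. for the timed encryption run

if __name__ == '__main__':
    p1 = Process(target=worker_a)   # pass the function itself, without parentheses
    p2 = Process(target=worker_b)
    p1.start()
    p2.start()      # both processes are now running at the same time
    p1.join()       # only wait after everything has been started
    p2.join()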
Finally, I am not sure why you decided to try multiprocessing in the first place. The functions that measure memory usage and execution time are almost certainly very fast, and I would expect them to be much faster than any method of synchronizing one Process to another. If you're worried about errors due to the time used by the diagnostic functions themselves, I doubt that you can make things better with multiprocessing. Why not just start with a simple program and see what results you get?

How to pass a shared value to processes that have a jit/njit function which reads and modifies the shared value?

I am trying to have an integer value that is passed to a multiprocessing program, where each process has a jit function that reads and modifies the value.
I came across multiprocessing.Manager().Value, which can pass a shared value to each process, but numba.jit does not accept this type.
Is there any solution to work around it?
import numba
import multiprocessing

@numba.jit()
def jj(o, ii):
    print(o.value)
    o.value = ii
    print(o.value)

if __name__ == '__main__':
    o = multiprocessing.Manager().Value('i', 0, lock=False)
    y1 = multiprocessing.Process(target=jj, args=(o, 10))
    y1.daemon = True
    y2 = multiprocessing.Process(target=jj, args=(o, 20))
    y2.daemon = True
    y1.start()
    y2.start()
    y1.join()
    y2.join()
You cannot modify a CPython object from an njit function, so the function will (almost) not benefit from Numba (the only optimization Numba could do is loop-lifting, but it cannot be used here anyway). What you are trying to achieve is not possible with multiprocessing plus njitted Numba functions. Numba can be fast because it operates on native types rather than CPython types, but multiprocessing's managers operate only on CPython types. You can use Numba's very experimental objmode scope to execute pure Python inside a Numba function, but be aware that this is slow (and it currently sometimes just crashes).
Another big issue is that shared CPython objects are protected by the global interpreter lock (GIL), which basically prevents any parallel speed-up inside a process (except for I/O-bound code and similar things). The GIL is designed to protect the interpreter from race conditions on the internal state of objects. AFAIK, managers can transfer pure-Python objects between processes thanks to pickling (which is slow), but using lock=False is unsafe and can also cause a race condition (not at the interpreter level, thanks to the GIL).
Note that the Numba function has to be recompiled in each process, which is slow (caching can help subsequent runs, but not the first one, because of concurrent compilation in multiple processes).
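If it helps, a minimal sketch of a possible workaround, assuming the shared value only needs to be updated rather than mutated inside the jitted code: keep the njit function working on plain native values and do the shared-value update in ordinary Python outside it.
import multiprocessing
import numba

@numba.njit
def compute(x):
    return x * 2                    # pure native computation, no CPython objects here

def worker(shared, ii):
    result = compute(ii)            # the jitted part only sees a plain int
    with shared.get_lock():         # update the shared value in regular Python code
        shared.value = result

if __name__ == '__main__':
    shared = multiprocessing.Value('i', 0)   # lock=True by default, unlike the question
    procs = [multiprocessing.Process(target=worker, args=(shared, n)) for n in (10, 20)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    print(shared.value)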

Python: running many subprocesses from different threads is slow

I have a program with 1 process that starts a lot of threads.
Each thread might use subprocess.Popen to run some command.
I see that the time to run the command increases with the number of threads.
Example:
>>> def foo():
...     s = time.time()
...     subprocess.Popen('ip link show'.split(), stdout=subprocess.PIPE, stderr=subprocess.PIPE, universal_newlines=True).communicate()
...     print(time.time() - s)
...
>>> foo()
0.028950929641723633
>>> [threading.Thread(target=foo).start() for _ in range(10)]
0.058995723724365234
0.07323050498962402
0.09158825874328613
0.11541390419006348 # !!!
0.08147192001342773
0.05238771438598633
0.0950784683227539
0.10175108909606934 # !!!
0.09703755378723145
0.06497764587402344
Is there another way of executing a lot of commands from a single process in parallel that doesn't degrade performance?
Python's threads are, of course, concurrent, but they do not really run in parallel because of the GIL. Therefore, they are not suitable for CPU-bound applications. If you need to truly parallelize something and allow it to run on all CPU cores, you will need to use multiple processes. Here is a nice answer discussing this in more detail: What are the differences between the threading and multiprocessing modules?.
For the above example, multiprocessing.pool may be a good choice (note that there is also a ThreadPool available in this module).
from multiprocessing.pool import Pool
import subprocess
import time

def foo(*args):
    s = time.time()
    subprocess.Popen('ip link show'.split(), stdout=subprocess.PIPE, stderr=subprocess.PIPE, universal_newlines=True).communicate()
    return time.time() - s

if __name__ == "__main__":
    with Pool(10) as p:
        result = p.map(foo, range(10))
        print(result)
        # [0.018695592880249023, 0.009021520614624023, 0.01150059700012207, 0.02113938331604004, 0.014114856719970703, 0.01342153549194336, 0.011168956756591797, 0.014746427536010742, 0.013572454452514648, 0.008752584457397461]
        result = p.map_async(foo, range(10))
        print(result.get())
        # [0.00636744499206543, 0.011589527130126953, 0.010645389556884766, 0.0070612430572509766, 0.013571739196777344, 0.009610414505004883, 0.007040739059448242, 0.010993719100952148, 0.012415409088134766, 0.0070383548736572266]
However, if your function is similar to the example in that it mostly just launches other processes and doesn't do a lot of calculations - I doubt parallelizing it will make much of a difference because the subprocesses can already run in parallel. Perhaps the slowdown occurs because your whole system gets overwhelmed for a moment because of all those processes (could be CPU usage is high or too many disk reads/writes are attempted within a short time). I would suggest taking a close look at system resources (Task Manager etc.) while running the program.
Maybe it has nothing to do with Python: opening a new shell means opening new files, since basically everything is a file on Linux.
Take a look at your limit for open files with this command (the default is 1024):
ulimit -n
and try to raise it with this command to see if your code gets faster:
ulimit -n 2048

How to use multi-threading correctly in python?

Recently, I started learning about threading and I wanted to implement it in the following code.
import timeit

start = timeit.default_timer()

def func(num):
    s = [(i, j, k) for i in range(num) for j in range(num) for k in range(num)]
    return s

z = 150
a, b = func(z), func(z)
print(a[:5], b[:5])
stop = timeit.default_timer()
print("time: ", stop - start)
the time it took was:
time: 3.7628489000000003
So I tried to use the threading module and modified the code as follows:
import timeit
from threading import Thread

start = timeit.default_timer()

def func(num):
    s = [(i, j, k) for i in range(num) for j in range(num) for k in range(num)]
    print(s[:5])

a = Thread(target=func, args=(150,))
b = Thread(target=func, args=(150,))
a.start()
b.start()
a.join()
b.join()
stop = timeit.default_timer()
print("time: ", stop - start)
the time it took was:
time: 4.2522736
But it's supposed to be halved; instead it increased. Is there anything wrong with my implementation?
Please explain what went wrong or is there a better way to achieve this.
You have encountered what is known as the Global Interpreter Lock, GIL for short.
Threads in Python are not "real" threads; that is to say, they do not execute simultaneously. Instead, their atomic operations are computed in sequence, in some order that is often hard to predetermine.
This means that threads from the threading library are useful when you need to wait for many blocking things simultaneously. The usual example is listening on a network connection, where one thread sits in a receive() call until something is received.
Other threads can keep doing other things and don't have to keep constantly checking the connection.
Real performance gains, however, cannot be achieved with threading.
There is another library, called multiprocessing, which uses real processes that actually execute simultaneously. Using multiprocessing is in many ways similar to the threading library, but it requires a little more work and care. I've come to realise that this divide between threading and multiprocessing is a good and useful thing. Threads in threading all have access to the same complete namespace, and as long as race conditions are taken care of, they operate in the same universe.
Processes in multiprocessing, on the other hand, are separated by the chasm of different namespaces once the child process is started. One has to use specialized communication queues and shared-namespace objects when transmitting information between them. This can quickly require hundreds of lines of boilerplate code.
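For the code in the question, a minimal sketch of what that looks like with a small process pool, so the two func calls can use two cores:
from multiprocessing import Pool
import timeit

def func(num):
    s = [(i, j, k) for i in range(num) for j in range(num) for k in range(num)]
    print(s[:5])      # print inside the worker to avoid sending the big list back

if __name__ == '__main__':
    start = timeit.default_timer()
    with Pool(2) as pool:
        pool.map(func, [150, 150])    # each call runs in its own process
    print("time: ", timeit.default_timer() - start)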

Python threads making them take 100% of my cpu

All,
How do I make this script take 100% of my CPU? If this post is bad, please explain why! Any help would be much appreciated.
import threading
import sys

def isPrime(number):
    # isPrime is to check if an int is prime
    if not isinstance(number,int):
        # check if number is an int
        raise Exception("Please enter a int. Function: isPrime.")
    # create array of numbers to check
    rangeOfNumbers=range(1,number+1,1)
    # count of how many multiplications; if it is a prime number it would be 2
    multiplicationCount=0
    # two for loops to loop through all possibilities
    for n1 in rangeOfNumbers:
        for n2 in rangeOfNumbers:
            if (n1*n2==number):
                multiplicationCount +=1
    if (multiplicationCount==2):
        print(number)
        return True
    else:
        return False

if __name__ == "__main__":
    if not sys.version_info[0] == 3:
        raise Exception("Please Upgrade or Downgrade your python to python 3.")
    number=0
    while True:
        threads=[]
        for i in range(100):
            number+=1
            thread=threading.Thread(target=isPrime,args=[number])
            thread.start()
            threads=[]
            threads.append(thread)
        for thread in threads:
            thread.join()
isPrime does no I/O or other operation that could potentially relinquish the CPU (except print). It therefore consumes 100% of one CPU core. Since enough such jobs are kicked off, measured CPU usage should stay at about 100% of one core. Note, though, that since Python only lets one thread execute bytecode at a time (the Global Interpreter Lock), no parallelism is achieved here.
Look into Python's multiprocessing module to achieve real concurrency. It spawns new Python processes, thus allowing multiple primality tests to execute at the same time.
Lastly, your code does not properly wait for all threads:
while True:
    threads=[]
    for i in range(100):
        number+=1
        thread=threading.Thread(target=isPrime,args=[number])
        thread.start()
        threads=[] # threads being reset here!
        threads.append(thread)
    for thread in threads:
        thread.join()
(This is probably not intentional.) It means that you keep creating threads in an infinite loop but effectively only wait for the last one created in each batch of 100 to finish. This is going to run you out of memory at some point. It would be much more catastrophic if Python had real threading, though ...
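A rough sketch of the multiprocessing suggestion above, using a simpler trial-division check in place of your isPrime: a Pool spreads the primality tests across all cores and waits for them properly.
from multiprocessing import Pool

def isPrime(number):
    # trial division up to the square root
    if number < 2:
        return False
    for d in range(2, int(number ** 0.5) + 1):
        if number % d == 0:
            return False
    return True

if __name__ == "__main__":
    numbers = range(2, 100000)
    with Pool() as pool:                      # one worker process per CPU core by default
        results = pool.map(isPrime, numbers)  # blocks until every test has finished
    print(sum(results), "primes found")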
