threading does not speed up the process in python

I am just learning Python's threading module. The threaded implementation of the test code below takes more time than the sequential implementation. Am I missing an underlying concept of threading in Python?
from time import sleep, perf_counter
from threading import Thread

def task(id):
    print(f'Starting the task {id}...')
    for i in range(1, 1000):
        for j in range(1, 1000):
            b = (i**2) / (i * j**3)
    print(f'The task {id} completed')

############## sequential ##############
start_time = perf_counter()
for n in range(1, 11):
    task(n)
end_time = perf_counter()
print(f'sequential took {end_time - start_time: f} second(s) to complete.')

############## multi-threading ##############
start_time = perf_counter()
threads = []
for n in range(1, 11):
    t = Thread(target=task, args=(n,))
    threads.append(t)
    t.start()
for t in threads:
    t.join()
end_time = perf_counter()
print(f'multi-threaded took {end_time - start_time: f} second(s) to complete.')

Your threading solution looks correct, but there is a pitfall to using threading in Python: it still runs on a single core. Check this tutorial, which I found really helpful for understanding why: https://www.quantstart.com/articles/Parallelising-Python-with-Threading-and-Multiprocessing/
The essence from that resource:
The GIL is necessary because the Python interpreter is not thread
safe. This means that there is a globally enforced lock when trying to
safely access Python objects from within threads. At any one time only
a single thread can acquire a lock for a Python object or C API. The
interpreter will reacquire this lock for every 100 bytecodes of Python
instructions and around (potentially) blocking I/O operations. Because
of this lock CPU-bound code will see no gain in performance when using
the Threading library, but it will likely gain performance increases
if the Multiprocessing library is used.
In other words: Use multiprocessing instead.
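As a rough sketch (not part of the original question), the same benchmark can be rewritten with multiprocessing; the task function is unchanged, each worker runs in its own interpreter, so the GIL no longer serializes the work:

from time import perf_counter
from multiprocessing import Process

def task(id):
    print(f'Starting the task {id}...')
    for i in range(1, 1000):
        for j in range(1, 1000):
            b = (i**2) / (i * j**3)
    print(f'The task {id} completed')

if __name__ == '__main__':
    start_time = perf_counter()
    # one process per task; each has its own interpreter and its own GIL
    processes = [Process(target=task, args=(n,)) for n in range(1, 11)]
    for p in processes:
        p.start()
    for p in processes:
        p.join()
    end_time = perf_counter()
    print(f'multi-process took {end_time - start_time: f} second(s) to complete.')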

Related

How to use Release and Acquire in Lock in python multithreading?

I was trying to implement a program that can simultaneously change the elements of an array and print it, using multithreading in python.
from threading import Thread
import threading
import random

def print_ele(array):
    count = True
    while count:
        print(array)

def change_ele():
    array = [1, 2, 3, 4]
    t1 = Thread(target=print_ele, args=(array,))
    t1.start()
    lock = threading.Lock()
    random.seed(10)
    count = True
    while count:
        lock.acquire()
        for i in range(5):
            array[i] = random.random()
        lock.release()

change_ele()
I expect to get different random numbers printed in each iteration. But instead it seems that the array gets updated only once.
I know that we can do the same thing without multithreading. But I was wondering if we could do the same thing using multithreading.
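For what it's worth, one likely culprit in the snippet above is the for i in range(5) loop: the list has only four elements, so the updating loop raises an IndexError on its first pass while the printer thread keeps running with the half-updated list. A minimal sketch of one possible fix (indexing by the list's length and taking the lock in both threads); the names follow the question's code, and the finite update count is just for the sketch:

import random
import threading
from threading import Thread

lock = threading.Lock()

def print_ele(array):
    while True:
        with lock:                        # never read a half-updated list
            print(array)

def change_ele():
    array = [1, 2, 3, 4]
    Thread(target=print_ele, args=(array,), daemon=True).start()
    random.seed(10)
    for _ in range(10):                   # a finite number of updates for the sketch
        with lock:
            for i in range(len(array)):   # stay inside the list's bounds
                array[i] = random.random()

change_ele()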

Python: running many subprocesses from different threads is slow

I have a program with 1 process that starts a lot of threads.
Each thread might use subprocess.Popen to run some command.
I see that the time to run the command increases with the number of threads.
Example:
>>> def foo():
...     s = time.time()
...     subprocess.Popen('ip link show'.split(), stdout=subprocess.PIPE, stderr=subprocess.PIPE, universal_newlines=True).communicate()
...     print(time.time() - s)
...
>>> foo()
0.028950929641723633
>>> [threading.Thread(target=foo).start() for _ in range(10)]
0.058995723724365234
0.07323050498962402
0.09158825874328613
0.11541390419006348 # !!!
0.08147192001342773
0.05238771438598633
0.0950784683227539
0.10175108909606934 # !!!
0.09703755378723145
0.06497764587402344
Is there another way of executing a lot of commands from a single process in parallel that doesn't degrade performance?
Python's threads are, of course, concurrent, but they do not really run in parallel because of the GIL. Therefore, they are not suitable for CPU-bound applications. If you need to truly parallelize something and allow it to run on all CPU cores, you will need to use multiple processes. Here is a nice answer discussing this in more detail: What are the differences between the threading and multiprocessing modules?.
For the above example, multiprocessing.pool may be a good choice (note that there is also a ThreadPool available in this module).
from multiprocessing.pool import Pool
import subprocess
import time

def foo(*args):
    s = time.time()
    subprocess.Popen('ip link show'.split(), stdout=subprocess.PIPE, stderr=subprocess.PIPE, universal_newlines=True).communicate()
    return time.time() - s

if __name__ == "__main__":
    with Pool(10) as p:
        result = p.map(foo, range(10))
        print(result)
        # [0.018695592880249023, 0.009021520614624023, 0.01150059700012207, 0.02113938331604004, 0.014114856719970703, 0.01342153549194336, 0.011168956756591797, 0.014746427536010742, 0.013572454452514648, 0.008752584457397461]
        result = p.map_async(foo, range(10))
        print(result.get())
        # [0.00636744499206543, 0.011589527130126953, 0.010645389556884766, 0.0070612430572509766, 0.013571739196777344, 0.009610414505004883, 0.007040739059448242, 0.010993719100952148, 0.012415409088134766, 0.0070383548736572266]
However, if your function is similar to the example in that it mostly just launches other processes and doesn't do much computation itself, I doubt parallelizing it will make a big difference, because the subprocesses can already run in parallel. Perhaps the slowdown occurs because your whole system gets overwhelmed for a moment by all those processes (high CPU usage, or too many disk reads/writes attempted within a short time). I would suggest taking a close look at system resources (Task Manager etc.) while running the program.
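Since launching a subprocess is mostly I/O from Python's point of view, the ThreadPool mentioned above can be dropped in the same way; a minimal sketch, reusing the foo from the example:

from multiprocessing.pool import ThreadPool
import subprocess
import time

def foo(*args):
    s = time.time()
    subprocess.Popen('ip link show'.split(), stdout=subprocess.PIPE, stderr=subprocess.PIPE, universal_newlines=True).communicate()
    return time.time() - s

if __name__ == "__main__":
    # same map interface as Pool, but the workers are threads in this process
    with ThreadPool(10) as p:
        print(p.map(foo, range(10)))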
Maybe it has nothing to do with Python: each command opens a new process and new file descriptors, and basically everything is a file on Linux.
Check your limit for open files with this command (the default is often 1024):
ulimit -n
and try raising it with this command to see if your code gets faster:
ulimit -n 2048
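If it helps, the same limit can also be inspected and raised from inside Python through the standard resource module (Unix only); a small sketch:

import resource

# current soft/hard limits on the number of open file descriptors
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print(f'open files: soft={soft}, hard={hard}')

# raise the soft limit; it cannot exceed the hard limit without extra privileges
new_soft = 2048 if hard == resource.RLIM_INFINITY else min(2048, hard)
resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft, hard))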

How to use multi-threading correctly in python?

Recently, I started learning about threading and I wanted to use it in the following code.
import timeit

start = timeit.default_timer()

def func(num):
    s = [(i, j, k) for i in range(num) for j in range(num) for k in range(num)]
    return s

z = 150
a, b = func(z), func(z)
print(a[:5], b[:5])
stop = timeit.default_timer()
print("time: ", stop - start)
the time it took was:
time: 3.7628489000000003
So I tried to use the Threading module and modified the code as:
import timeit
from threading import Thread

start = timeit.default_timer()

def func(num):
    s = [(i, j, k) for i in range(num) for j in range(num) for k in range(num)]
    print(s[:5])

a = Thread(target=func, args=(150,))
b = Thread(target=func, args=(150,))
a.start()
b.start()
a.join()
b.join()
stop = timeit.default_timer()
print("time: ", stop - start)
the time it took was:
time: 4.2522736
But it's supposed to be roughly halved; instead it increases. Is there anything wrong with my implementation?
Please explain what went wrong, or whether there is a better way to achieve this.
You have encountered what is known as the Global Interpreter Lock, GIL for short.
Threads in Python are not "real" threads in the sense that they do not execute simultaneously; their atomic operations are interleaved in some sequence (an order that is often hard to predict).
This means that threads from the threading library are useful when you need to wait on many blocking things at once. A typical case is listening on a network connection: one thread sits in a receive() call until something arrives,
while the other threads keep doing their work and don't have to constantly poll the connection.
Real performance gains for CPU-bound work, however, cannot be achieved with threading.
There is another library, multiprocessing, which runs separate processes that actually execute simultaneously. Using multiprocessing is in many ways similar to the threading library but requires a little more work and care. I've come to see this divide between threading and multiprocessing as a good and useful thing. Threads in threading all share the same namespace, and as long as race conditions are taken care of, they operate in the same universe.
Processes in multiprocessing, on the other hand, are separated by the chasm of different namespaces once the child process is started. You have to use dedicated communication queues and shared objects to pass information between them, which can add a fair amount of boilerplate code.
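As a rough sketch of what the multiprocessing route could look like for the question's workload (a Pool rather than hand-rolled queues, so the extra boilerplate stays small):

import timeit
from multiprocessing import Pool

def func(num):
    # the same CPU-bound triple comprehension as in the question
    return [(i, j, k) for i in range(num) for j in range(num) for k in range(num)]

if __name__ == "__main__":
    start = timeit.default_timer()
    with Pool(2) as pool:
        a, b = pool.map(func, [150, 150])   # each call runs in its own process
    print(a[:5], b[:5])
    print("time: ", timeit.default_timer() - start)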

Python multiprocessing: with and without pooling

I'm trying to understand Python's multiprocessing, and have devised the following code to test it:
import multiprocessing

def F(n):
    if n == 0: return 0
    elif n == 1: return 1
    else: return F(n-1) + F(n-2)

def G(n):
    print(f'Fibonacci of {n}: {F(n)}')

processes = []
for i in range(25, 35):
    processes.append(multiprocessing.Process(target=G, args=(i, )))
for pro in processes:
    pro.start()
When I run it, it tells me that the computing time was roughly 6.65 s.
I then wrote the following code, which I thought to be functionally equivalent to the previous one:
from multiprocessing.dummy import Pool as ThreadPool

def F(n):
    if n == 0: return 0
    elif n == 1: return 1
    else: return F(n-1) + F(n-2)

def G(n):
    print(f'Fibonacci of {n}: {F(n)}')

in_data = [i for i in range(25, 35)]
pool = ThreadPool(10)
results = pool.map(G, in_data)
pool.close()
pool.join()
and its running time was almost 12s.
Why does the second take almost twice as long as the first one? Aren't they supposed to be equivalent?
(NB. I'm running Python 3.6, but I also tested similar code on 3.5.2 with the same results.)
The reason the second takes twice as long as the first is likely due to the CPython Global Interpreter Lock.
From http://python-notes.curiousefficiency.org/en/latest/python3/multicore_python.html:
[...] the GIL effectively restricts bytecode execution to a single core, thus rendering pure Python threads an ineffective tool for distributing CPU bound work across multiple cores.
As you know, multiprocessing.dummy is a wrapper around the threading module, so you're creating threads, not processes. With a CPU-bound task like this one, the Global Interpreter Lock makes that not much different from simply executing your Fibonacci calculations sequentially in a single thread (except that you've added some thread-management/context-switching overhead).
With the "true multiprocessing" version, you only have a single thread in each process, each of which is using its own GIL. Hence, you can actually make use of multiple processors to improve the speed.
For this particular processing task, there is no significant advantage to using multiple threads over multiple processes. If you only have a single processor, there is no advantage to using either multiple processes or multiple threads over a single thread/process (in fact, both merely add context-switching overhead to your task).
(FWIW: a join in the true multiprocessing version is apparently done automatically by the Python runtime, so adding an explicit join didn't make any difference in my tests using time(1). And, by the way, if you did want to add join, you should add a second loop for the joins. Adding join to the existing loop would simply serialize your processes.)
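For illustration, a sketch of the start-then-join pattern described above, with a trivial worker in place of the question's Fibonacci function: start every process first, then join them in a separate loop so they overlap.

import multiprocessing

def G(n):
    print(n)

if __name__ == "__main__":
    processes = [multiprocessing.Process(target=G, args=(i,)) for i in range(25, 35)]
    for pro in processes:
        pro.start()   # launch all processes before waiting on any of them
    for pro in processes:
        pro.join()    # joining inside the start loop would serialize them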

Python threads making them take 100% of my cpu

All,
How do I make this script take 100% of my cpu? If this post is bad please explain why! Any help would be much appreciated.
import threading
import sys

def isPrime(number):
    # isPrime checks whether an int is prime
    if not isinstance(number, int):
        # check that number is an int
        raise Exception("Please enter an int. Function: isPrime.")
    # create a range of numbers to check
    rangeOfNumbers = range(1, number+1, 1)
    # count of how many multiplications; for a prime number it would be 2
    multiplicationCount = 0
    # two for loops to loop through all possibilities
    for n1 in rangeOfNumbers:
        for n2 in rangeOfNumbers:
            if (n1*n2 == number):
                multiplicationCount += 1
    if (multiplicationCount == 2):
        print(number)
        return True
    else:
        return False

if __name__ == "__main__":
    if not sys.version_info[0] == 3:
        raise Exception("Please Upgrade or Downgrade your python to python 3.")
    number = 0
    while True:
        threads = []
        for i in range(100):
            number += 1
            thread = threading.Thread(target=isPrime, args=[number])
            thread.start()
            threads = []
            threads.append(thread)
        for thread in threads:
            thread.join()
isPrime does no I/O or other operation that could relinquish the CPU (except print). It therefore consumes 100% of one CPU core. Since enough such jobs are kicked off, measured CPU usage should stay at about 100% of one core. Note that, since Python has the additional limitation that only one thread can execute bytecode at a time (the Global Interpreter Lock), no parallelism is achieved here.
Look into Python's multiprocessing module to achieve real parallelism. It spawns new Python processes, allowing multiple primality tests to execute at the same time.
Lastly, your code does not properly wait for all threads:
while True:
    threads=[]
    for i in range(100):
        number+=1
        thread=threading.Thread(target=isPrime,args=[number])
        thread.start()
        threads=[]  # threads being reset here!
        threads.append(thread)
    for thread in threads:
        thread.join()
(This is probably not intentional.) It means that you keep creating threads in an infinite loop but only wait for one of them to finish. This is going to run you out of memory at some point. It would be much more catastrophic if Python had real threading, though...
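For completeness, a minimal sketch of the multiprocessing route suggested above; note it uses a simple trial-division primality test rather than the pair-counting approach in the question, and a Pool so the start/join bookkeeping disappears:

from multiprocessing import Pool

def is_prime(number):
    # simple trial division; not the pair-counting check from the question
    if number < 2:
        return False
    for d in range(2, int(number ** 0.5) + 1):
        if number % d == 0:
            return False
    return True

if __name__ == "__main__":
    with Pool() as pool:   # one worker per CPU core by default
        flags = pool.map(is_prime, range(1, 101))
    print([n for n, prime in zip(range(1, 101), flags) if prime])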
