I have sample code that uses map_async in Multiprocessing using Python 3. What I'm trying to figure out is how I can run map_async(a, c) and map_async(b, d) concurrently. But it seems like to second map_async(b, d) statement seems to run when the first one is about to finish. Is there a way I can run two map_async functions to run at the same time? I tried to search online but didn't get the answer that I wanted. Below is the sample code. If you have other suggestions, I'm very happy to listen to that as well. Thank you all for the help!
from multiprocessing import Pool
import time
import os
def a(i):
print('First': i)
return
def b(i):
print('Second': i)
return
if __name__ = '__main__':
c = range(100)
d = range(100)
pool = Pool(os.cpu_count())
pool.map_async(a, c)
pool.map_async(b, d)
pool.close()
pool.join()
map_async simply splits the iterable in a set of chunks and sends those chunks via a os.pipe to the workers. Therefore, two subsequent calls to map_async will appear to the workers as a single list composed by the join of the two above mentioned sets.
This is the correct behaviour as the workers really don't care about which map_async call a chunk belongs. Running two map_async in parallel would not bring any improvement in terms of speed or throughput.
If for any reason you really need the two call to be executed in parallel, the only way is to create two different Pool objects. I would nevertheless recommend against such approach as it would make things much more unpredictable.
Related
Concretely, I'm using Flask to process a request, pseudocode like this:
from flask import Flask, request
app = Flask(__name__)
#app.route("/foo", methods=["POST"])
def foo():
data = request.get_json() # {"request_id": "abc", "data": "some text"}
result_a = do_task_a(data) # returns {"result_a": "a"}, maybe about 1 second to finish
result_b = do_task_b(data) # returns {"result_b": "b"}, maybe about 1 second to finish
result_c = do_task_c(data) # returns {"result_c": "c"}, maybe about 1 second to finish
result = {
"result_a": result_a["result_a"],
"result_b": result_b["result_b"],
"result_c": result_c["result_c"]}
return result
app.run(host='0.0.0.0', port=4000, threaded=False)
Here, do_task_a, do_task_b, do_task_c are completely independent subtasks, I know I can use multiprocessing.Process to create processes to finish these three subtasks, and use join() to wait for subtask done, But I don't know it's proper way to create Process for every request?
Maybe I can use multiprocessing.Queue to help, But I don't find a good way.
I search for multiprocessing, but can't figure out a good solution.
I'm not a python guy, but indeed creating processes is sn expensive operation
If its possible - create threads they're cheaper than processes.
If you run the request multiple times - you can do even better than that, because creating threads per request is still quite expensive
Even more advanced setup is to create a "pre-loaded" thread pool. Like N threads that you always keep in memory ready for running arriving task.
In terms of technical solution I've found This article that explains how to create thread pools in python 3.2+
Recently, I started learning about threading and i wanted to implement it in the following code.
import timeit
start = timeit.default_timer()
def func(num):
s = [(i, j, k) for i in range(num) for j in range(num) for k in range(num)]
return s
z = 150
a,b = func(z),func(z)
print(a[:5], b[:5])
stop = timeit.default_timer()
print("time: ", stop - start)
the time it took was:
time: 3.7628489000000003
So I tried to use the Threading module and modified the code as:
import timeit
from threading import Thread
start = timeit.default_timer()
def func(num):
s = [(i, j, k) for i in range(num) for j in range(num) for k in range(num)]
print(s[:5])
a = Thread(target=func, args=(150,))
b = Thread(target=func, args=(150,))
a.start()
b.start()
a.join()
b.join()
stop = timeit.default_timer()
print("time: ", stop - start)
the time it took was:
time: 4.2522736
But, its supposed to get halved instead it increases. Is there anything wrong in my implementation?
Please explain what went wrong or is there a better way to achieve this.
You have encountered what is known as Global Interpreter Lock, GIL for short.
Threads in python are not "real" threads, that is to say that they do not execute simultaneously, but atomic operations in them are computed in sequence in some order (that order is often hard to predetermine)
This means that threads of threading library are useful when you need to wait for many blocking things simultaneously. This is usually listening to a network connection when one thread sits at receive() -method until something is received.
Other threads can keep doing other things and don't have to keep constantly checking the connection.
Real performance gains however cannot be achieved with threading
There is another library, called multiprocessing which does implement real threads that actually execute simultaneously. Using multiprocessing is in many ways similar to threading library but requires a little bit more work and care. I've come to realise that this divide between threading and multiprocessing is a good and useful thing. Threads in threading all have access to the same complete namespace, and as long as race conditions are taken care of, they operate in the same universe.
Threads in multiprocessing (I should use term process here) on the other hand are separated by the chasm of different namespaces after the child process is started. One has to use specialized communication queues and shared namespace objects when transmitting information between them. This will quickly require hundreds of lines of boilerplate code.
I write Python 3 code, in which I have 2 functions. The first function insertBlock() inserts data in MongoDB collection 1, the second function insertTransactionData() takes data from collection 1 and inserts it into collection 2. Data is in very large amount so I use threading to increase performance. But when I use threading it is taking more time to insert data than without threading. I am so confused that exactly how threading will work in my code and how to increase performance? Here is the main function :
if __name__ == '__main__':
t1 = threading.Thread(target=insertBlock())
t1.start()
t2 = threading.Thread(target=insertTransactionData())
t2.start()
From the python documentation for threading:
target is the callable object to be invoked by the run() method. Defaults to None, meaning nothing is called.
So the correct usage is
threading.Thread(target=insertBlock)
(without the () after insertBlock), because otherwise insertBlock is called, executed normally (blocking the main thread) and target is set to it's return value None. This causes t1.start() not to do anything and you don't get any performance improvement.
Warning:
Be aware that multithreading gives you no guarantee on what the order of execution in different threads will be. You can not rely on the data that insertBlock has inserted into the database inside the insertTransactionData function, because at the time insertTransactionData uses this data, you can not be sure that it was already inserted. So, maybe multithreading does not work at all for this code or you need to restructure your code and only parallelize those parts that do not depend on each other.
I solved this problem by merging these two functionalities into one new function
insertBlockAndTransaction(startrange,endrange). As these two functionalities depend on each other so what I did is I insert transaction information immediately below where block information is inserted (block number was common and needed for both functionalities).Then did multithreading by creating 10 threads for single function:
for i in range(10):
print('thread:',i)
t1 = threading.Thread(target=insertBlockAndTransaction,args(5000000+i*10000,5000000+(i+1)*10000))
t1.start()
It helps me to deal with increasing execution time for more than 1lakh data.
I'm trying to understand Python's multiprocessing, and have devised the following code to test it:
import multiprocessing
def F(n):
if n == 0: return 0
elif n == 1: return 1
else: return F(n-1)+F(n-2)
def G(n):
print(f'Fibbonacci of {n}: {F(n)}')
processes = []
for i in range(25, 35):
processes.append(multiprocessing.Process(target=G, args=(i, )))
for pro in processes:
pro.start()
When I run it, I tells me that the computing time was roughly of 6.65s.
I then wrote the following code, which I thought to be functionally equivalent to the latter:
from multiprocessing.dummy import Pool as ThreadPool
def F(n):
if n == 0: return 0
elif n == 1: return 1
else: return F(n-1)+F(n-2)
def G(n):
print(f'Fibbonacci of {n}: {F(n)}')
in_data = [i for i in range(25, 35)]
pool = ThreadPool(10)
results = pool.map(G, in_data)
pool.close()
pool.join()
and its running time was almost 12s.
Why is it that the second takes almost twice as the first one? Aren't they supposed to be equivalent?
(NB. I'm running Python 3.6, but also tested a similar code on 3.52 with same results.)
The reason the second takes twice as long as the first is likely due to the CPython Global Interpreter Lock.
From http://python-notes.curiousefficiency.org/en/latest/python3/multicore_python.html:
[...] the GIL effectively restricts bytecode execution to a single core, thus rendering pure Python threads an ineffective tool for distributing CPU bound work across multiple cores.
As you know, multiprocessing.dummy is a wrapper around the threading module, so you're creating threads, not processes. The Global Interpreter Lock, with a CPU-bound task as here, is not much different than simply executing your Fibonacci calculations sequentially in a single thread (except that you've added some thread-management/context-switching overhead).
With the "true multiprocessing" version, you only have a single thread in each process, each of which is using its own GIL. Hence, you can actually make use of multiple processors to improve the speed.
For this particular processing task, there is no significant advantage to using multiple threads over multiple processes. If you only have a single processor, there is no advantage to using either multiple processes or multiple threads over a single thread/process (in fact, both merely add context-switching overhead to your task).
(FWIW: A join in the true multiprocessing version is apparently being done automatically by the python runtime so adding an explicit join doesn't seem to make any difference in my tests using time(1). And, by the way, if you did want to add join, you should add a second loop for the join processing. Adding join to the existing loop will simply serialize your processes.)
I need to run two consecutive pool.maps in python. But the second one is dependent on the results of the first map. Thus, before running the second pool.map I need to make sure the function1 is executed for all args. Can anyone show me how to do that?
# The first multiprocessing unit
pool = Pool(processes=num_p)
new_args=dict(pool.map(function1, args))
# The second multiprocessing unit
pool.map(function2, new_args)
Thanks
Surely pool.map will block until the results are done. How else could it return them?
You can also confirm this fact from the documentation.
It blocks until the result is ready.