Suppose I have 3 functions which need to run in the background every "t" seconds (t can vary for each function) and for n times (n can vary for each function). How can we do this using threading?
I have written the following:
import threading
import time
def func2(iterations=2, duration=10):
    print("Hello, World!")
    t = threading.Timer(duration, func2)
    t.start()

def func3(iterations=3, duration=5):
    print("Hi, World!")
    t = threading.Timer(duration, func3)
    t.start()

for func in [func2(), func3()]:
    func
This does the job of waiting t seconds between calls, but I cannot stop it after "n" iterations.
For example: I want func2 to run 2 times and exit, and func3 to run 3 times and exit.
How can we achieve this in threading in Python?
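One minimal sketch of how the iteration limit could be handled (this is only an illustration of the idea, not the original code): pass the remaining count along to each rescheduled Timer and stop rescheduling once it runs out. The helper repeat_in_background is hypothetical.

import threading

def repeat_in_background(func, iterations, duration):
    # Hypothetical helper: run func() every `duration` seconds, `iterations` times in total.
    def wrapper(remaining):
        func()
        if remaining > 1:
            threading.Timer(duration, wrapper, args=(remaining - 1,)).start()
    threading.Timer(duration, wrapper, args=(iterations,)).start()

def func2():
    print("Hello, World!")

def func3():
    print("Hi, World!")

repeat_in_background(func2, iterations=2, duration=10)
repeat_in_background(func3, iterations=3, duration=5)

Because Timer threads are non-daemon by default, the program stays alive until the last scheduled call has run.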
From my understanding, both code blocks are doing the same thing. Why is there a difference in execution time?
import asyncio
import time
...
# Block 1:
start_time = time.time()
tasks = [
    get_from_knowledge_v2(...),
    get_from_knowledge_v2(...),
    get_from_knowledge_v2(...),
]
data_list = await asyncio.gather(*tasks)
print("TIME TAKEN::", time.time() - start_time)
# Block 2:
start_time = time.time()
data1 = await get_from_knowledge_v2(...)
data2 = await get_from_knowledge_v2(...)
data3 = await get_from_knowledge_v2(...)
print("WITHOUT ASYNCIO GATHER TIME TAKEN::", time.time() - start_time)
Result:
TIME TAKEN:: 0.6016566753387451
WITHOUT ASYNCIO GATHER TIME TAKEN:: 1.7620849609375
The asyncio.gather function runs the awaitables you pass to it concurrently. That means that if I/O is happening in at least one of them, the event loop can use the wait time for useful context switches, which in turn leads to a certain degree of parallelism.
In this case I assume that get_from_knowledge_v2 does some HTTP request in a way that supports asynchronous execution.
In the second code block you have no concurrency between the three get_from_knowledge_v2 calls. Instead you just execute them sequentially (with respect to each other). In other words, while you are awaiting the first one, the second one will not even start; the context they share is blocked at each await.
Note: This does not mean that outside of that code block no concurrency is happening/possible. If this sequential code block is inside an async function (i.e. coroutine), you can execute that concurrently with some other coroutine. It is just that inside that code block, those get_from_knowledge_v2 coroutines are executed sequentially.
The time you measured confirms this rather nicely since you have three coroutines and gather allows them to be executed almost in parallel, while the other code block executes them sequentially, thus leading to an almost three times longer execution time.
PS
Maybe a minimal concrete example will help illustrate what I mean:
from asyncio import gather, run, sleep
from time import time

async def sleep_and_print(seconds: float) -> None:
    await sleep(seconds)
    print("slept", seconds, "seconds")

async def concurrent_sleeps() -> None:
    await gather(
        sleep_and_print(3),
        sleep_and_print(2),
        sleep_and_print(1),
    )

async def sequential_sleeps() -> None:
    await sleep_and_print(3)
    await sleep_and_print(2)
    await sleep_and_print(1)

async def something_else() -> None:
    print("Doing something else that takes 4 seconds...")
    await sleep(4)
    print("Done with something else!")

async def main() -> None:
    start = time()
    await concurrent_sleeps()
    print("concurrent_sleeps took", round(time() - start, 1), "seconds\n")

    start = time()
    await sequential_sleeps()
    print("sequential_sleeps took", round(time() - start, 1), "seconds\n")

    start = time()
    await gather(
        sequential_sleeps(),
        something_else(),
    )
    print("sequential_sleeps & something_else together took", round(time() - start, 1), "seconds")

if __name__ == '__main__':
    run(main())
Running that script gives the following output:
slept 1 seconds
slept 2 seconds
slept 3 seconds
concurrent_sleeps took 3.0 seconds
slept 3 seconds
slept 2 seconds
slept 1 seconds
sequential_sleeps took 6.0 seconds
Doing something else that takes 4 seconds...
slept 3 seconds
Done with something else!
slept 2 seconds
slept 1 seconds
sequential_sleeps & something_else together took 6.0 seconds
This illustrates that the sleeping was done almost in parallel inside concurrent_sleeps, with the 1 second sleep finishing first, then the 2 second sleep, then the 3 second sleep.
It shows that the sleeping is done sequentially inside sequential_sleeps and in the call order, meaning it first slept 3 seconds, then it slept 2 seconds, then 1 second.
And finally, executing sequential_sleeps concurrently with something_else shows that they are executed almost in parallel, with the 3-second-sleep finishing first (after 3 seconds), then one second later something_else finished, then another second later the 2-second-sleep, then after another second the 1-second-sleep. Together they still took approximately 6 seconds.
That last part is what I meant when I said you can still execute another coroutine concurrently with the sequential block of code. In itself, the code block will always remain sequential.
I hope this is clearer now.
PPS
Just to throw another option into the mix, you can also achieve concurrency by using Tasks. Calling asyncio.create_task will immediately schedule the coroutine for execution on the event loop. The task it creates should be awaited at some point, but the underlying coroutine will start running almost immediately after calling create_task. You can add this to the example script above:
from asyncio import create_task
...
async def task_sleeps() -> None:
    t3 = create_task(sleep_and_print(3))
    t2 = create_task(sleep_and_print(2))
    t1 = create_task(sleep_and_print(1))
    await t3
    await t2
    await t1

async def main() -> None:
    ...
    start = time()
    await task_sleeps()
    print("task_sleeps took", round(time() - start, 1), "seconds\n")
And you'll see the following again:
...
slept 1 seconds
slept 2 seconds
slept 3 seconds
task_sleeps took 3.0 seconds
Tasks are a nice option to decouple the execution of some coroutine from its surrounding context to an extent, but you need to keep track of them in some way.
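For instance, a rough sketch of one way to keep track of them (my own illustration, reusing sleep_and_print from above): collect the tasks in a list and await them together at the end.

from asyncio import create_task, gather

async def tracked_sleeps() -> None:
    # Keep references to the tasks so they can all be awaited (and are not lost).
    tasks = [create_task(sleep_and_print(s)) for s in (3, 2, 1)]
    await gather(*tasks)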
I want to do a job as fast as possible, so I should parallelize it using processes (not threads, because of the GIL). My problem is that I can't start the processes at the same time: it always starts p1, then when p1 ends, p2, and so on... How can I start all my processes at the same time? My simplified code:
import multiprocessing
import time
if __name__ == '__main__':
    def work(data, num):
        if num == 0:
            time.sleep(5)
        print("starts:", num)
        # ***** heavy work that lasts a random number of seconds *****
        print("ends", num)

    for k in range(0, 2):
        p = multiprocessing.Process(target=work(data, k))
        p.daemon = True
        p.start()
result:
starts 0
ends 0
starts 1
ends 1
starts 2
ends 2
What i expected:
starts 0
starts 1
starts 2
ends 1 or 2
ends 1 or 2
ends 0 (because of time.sleep)
Why does my script always wait until the first process is finished before starting the next one?
First of all, making your program parallel/concurrent does not always make it faster, as Amdahl's law suggests.
Secondly, the main problem is how you pass the target: target=work(data, k) calls work immediately in the main process (blocking on time.sleep(5) and the heavy work) and only passes its return value to Process. You need to pass the function itself as target and its arguments via the args parameter. You can then start all the processes and use the join() method afterwards to wait for them to finish, like so:
process_pool = []
for k in range(0, 5):
    p = multiprocessing.Process(target=work, args=('you_data', k))
    p.daemon = True
    process_pool.append(p)

for process in process_pool:
    process.start()

for process in process_pool:
    process.join()
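For reference, here is a minimal self-contained sketch of that pattern; the body of work and 'your_data' are just placeholders for your heavy computation. Note that work is defined at module level so the spawned child processes can find it (this matters on Windows and macOS, where processes are started with spawn).

import multiprocessing
import time

def work(data, num):
    if num == 0:
        time.sleep(5)
    print("starts:", num)
    time.sleep(1)  # stand-in for the heavy work
    print("ends", num)

if __name__ == '__main__':
    process_pool = []
    for k in range(0, 3):
        p = multiprocessing.Process(target=work, args=('your_data', k))
        process_pool.append(p)

    for process in process_pool:
        process.start()

    for process in process_pool:
        process.join()

All the "starts" lines appear immediately, and "ends 0" comes last because of its extra sleep.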
Tried 2 code examples from first answer here: Python sharing a lock between processes. Result is the same.
import multiprocessing
import time
from threading import Lock
def target(arg):
    if arg == 1:
        lock.acquire()
        time.sleep(1.1)
        print('hi')
        lock.release()
    elif arg == 2:
        while True:
            print('not locked')
            time.sleep(0.5)

def init(lock_: Lock):
    global lock
    lock = lock_

if __name__ == '__main__':
    lock_ = multiprocessing.Lock()
    with multiprocessing.Pool(initializer=init, initargs=[lock_], processes=2) as pool:
        pool.map(target, [1, 2])
Why does this code print:
not locked
not locked
not locked
hi
not locked
instead of:
hi
not locked
Well, call your worker processes "1" and "2". They both start. 2 prints "not locked", sleeps half a second, and loops around to print "not locked" again. But note that what 2 is printing has nothing to do with whether lock is locked. Nothing in the code 2 executes even references lock, let alone synchronizes on lock. After another half second, 2 wakes up to print "not locked" for a third time, and goes to sleep again.
While that's going on, 1 starts, acquires the lock, sleeps for 1.1 seconds, and then prints "hi". It then releases the lock and ends. At the time 1 gets around to printing "hi", 2 has already printed "not locked" three times, and is about 0.1 seconds into its latest half-second sleep.
After "hi" is printed, 2 will continue printing "not locked" about twice per second forever more.
So the code appears to be doing what it was told to do.
What I can't guess, though, is how you expected to see "hi" first and then "not locked". That would require some kind of timing miracle, where 2 didn't start executing at all before 1 had been running for over 1.1 seconds. Not impossible, but extremely unlikely.
Changes
Here's one way to get the output you want, although I'm making many guesses about your intent.
If you don't want 2 to start before 1 ends, then you have to force that. One way is to have 2 begin by acquiring lock at the start of what it does. That also requires guaranteeing that lock is in the acquired state before any worker begins.
So acquire it before map() is called. Then there's no point left to having 1 acquire it at all - 1 can just start at once, and release it when it ends, so that 2 can proceed.
There are only a few changes to the code, but I'll paste all of it here for convenience:
import multiprocessing
import time
from threading import Lock
def target(arg):
    if arg == 1:
        time.sleep(1.1)
        print('hi')
        lock.release()
    elif arg == 2:
        lock.acquire()
        print('not locked')
        time.sleep(0.5)

def init(lock_: Lock):
    global lock
    lock = lock_

if __name__ == '__main__':
    lock_ = multiprocessing.Lock()
    lock_.acquire()
    with multiprocessing.Pool(initializer=init, initargs=[lock_], processes=2) as pool:
        pool.map(target, [1, 2])
I have code in which there are two functions (fun1() and fun2()). Now I want to execute these two functions simultaneously, but with some time offset.
Note: The execution time of fun1() and fun2() is the same.
Explanation:
The code is running.
fun1() is running and doing some tasks.
After a particular time offset (say after 10 seconds) I want to run fun2() along with fun1().
When fun1() is done it should stop (Here fun2() is still running).
Again after 10 seconds fun1() should run and when fun2() is done it should stop (Here fun1() is still running).
And this process should repeat.
For parallel execution, I tried Multiprocess in python.
Below is a sample code.
from multiprocessing import Process
from datetime import datetime
def fun1():
    pass  # do something

def fun2():
    pass  # do something

def main():
    t = 0           # initially time = 0
    t_offset = 10   # time offset
    processes = []
    p1 = Process(target=fun1)  # Process p1 for fun1
    p2 = Process(target=fun2)  # Process p2 for fun2
    processes.append(p1)
    processes.append(p2)
    while True:
        dt = datetime.now()
        t = datetime.now().second
        if dt.second == 0:  # Here process p1 is started at the beginning of a minute.
            p1.start()
        if t == t_offset:   # Here, after a 10 second offset, process p2 should start.
            p2.start()
Is there any solution to the above problem? Can I have two processes running together with a time offset between them?
Just add a time delay to the process you want by adding some lines of code at its start, e.g.:

import time

n = 0
while n < 300:
    time.sleep(1)
    n = n + 1
    print(str(300 - n) + ' seconds until the process starts')

This causes a delay of 300 seconds before the body of the process starts.
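Applied to the question's setup, a minimal sketch (with hypothetical placeholder bodies for fun1 and fun2) could instead start p1, sleep for the offset in the main process, and then start p2:

import time
from multiprocessing import Process

def fun1():
    print("fun1 running")
    time.sleep(20)  # placeholder work
    print("fun1 done")

def fun2():
    print("fun2 running")
    time.sleep(20)  # placeholder work
    print("fun2 done")

if __name__ == '__main__':
    t_offset = 10
    p1 = Process(target=fun1)
    p2 = Process(target=fun2)
    p1.start()
    time.sleep(t_offset)  # wait for the offset before starting the second process
    p2.start()
    p1.join()
    p2.join()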
I previously asked Repeatedly run a function in parallel on how to run a function in parallel. The function that I am wanting to run has a stochastic element, where random integers are drawn.
When I use the code in that answer it returns repeated numbers within one process (and also between runs if I add an outer loop to repeat the process). For example,
import numpy as np
from multiprocessing.pool import Pool
def f(_):
    x = np.random.uniform()
    return x*x

if __name__ == "__main__":
    processes = 3
    p = Pool(processes)
    print(p.map(f, range(6)))
returns
[0.8484870744666029, 0.8484870744666029, 0.04019012715175054, 0.04019012715175054, 0.7741414835156634, 0.7741414835156634]
Another run may give
[0.17390735240615365, 0.17390735240615365, 0.5188673758527017, 1.308159884267618e-08, 0.09140498447418667, 0.021537291489524404]
It seems as if there is some internal seed that is being used -- how can I generate random numbers similar to what would be returned from np.random.uniform(size=6) please?
Same output in different workers in multiprocessing indicates that the seed needs to be passed into the function. Python multiprocessing pool.map for multiple arguments provides a way to pass multiple arguments to Pool -- one for the repeats and one for a list of seeds. This allows a new seed for each call, and it is reproducible.
import numpy as np
from multiprocessing.pool import Pool
def f(reps, seed):
    np.random.seed(seed)
    x = np.random.uniform()
    return x*x

#np.random.seed(1)

if __name__ == "__main__":
    processes = 3
    p = Pool(processes)
    print(p.starmap(f, zip(range(6), range(6))))
Here the second argument is the vector of seeds (to see this, change the print line to print(p.starmap(f, zip(range(0, 6), np.repeat(1, 6))))).
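As an aside (not from the original answer), numpy's newer Generator API can achieve the same per-task seeding; a rough sketch using SeedSequence.spawn, with an arbitrary base seed of 12345:

import numpy as np
from multiprocessing.pool import Pool

def f(seed):
    rng = np.random.default_rng(seed)  # independent generator per task
    x = rng.uniform()
    return x * x

if __name__ == "__main__":
    # spawn() produces statistically independent child seeds from one base seed
    child_seeds = np.random.SeedSequence(12345).spawn(6)
    with Pool(3) as p:
        print(p.map(f, child_seeds))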