Multiprocessing in Python with time offset - python-3.x

I have code in which there are two functions (fun1(), fun2()). I want to execute these two functions simultaneously, but with some time offset.
Note: The execution times of fun1() and fun2() are the same.
Explanation:
The code is running.
fun1() is running and doing some tasks.
After a particular time offset (say after 10 seconds) I want to run
fun2() along with fun1().
When fun1() is done, it should stop (at this point fun2() is still running).
Again, after 10 seconds, fun1() should run, and when fun2() is done it should stop (at this point fun1() is still running).
And this process should repeat.
For parallel execution, I tried multiprocessing in Python.
Below is sample code.
from multiprocessing import Process
from datetime import datetime

def fun1():
    pass  # do something

def fun2():
    pass  # do something

def main():
    t = 0          # initially time = 0
    t_offset = 10  # time offset in seconds
    processes = []
    p1 = Process(target=fun1)  # Process p1 for fun1
    p2 = Process(target=fun2)  # Process p2 for fun2
    processes.append(p1)
    processes.append(p2)
    while True:
        dt = datetime.now()
        t = dt.second
        if dt.second == 0:  # Here process p1 is started at the beginning of a minute.
            p1.start()
        if t == t_offset:   # Here, after the 10-second offset, process p2 should start.
            p2.start()

if __name__ == '__main__':
    main()
Is there any solution to the above problem? Can I have two processes running together with a time offset between them?

Just add a time delay to the process you want by adding a few lines of code at its start, e.g.:
import time

n = 0
while n < 300:
    time.sleep(1)
    n = n + 1
    print(str(300 - n) + ' seconds until the process starts')
This causes a delay of 300 seconds before the body of the process starts (a plain time.sleep(300) would do the same; the loop just lets you print a countdown).
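For the original fun1()/fun2() question, a rough sketch of the alternating pattern could look like the following. It assumes, as stated in the question, that both functions take the same (and longer than 10 seconds) amount of time, and it produces the 10-second offset simply by sleeping between the start() calls instead of polling the clock:

import time
from multiprocessing import Process

def fun1():
    pass  # placeholder for the real work

def fun2():
    pass  # placeholder for the real work

if __name__ == '__main__':
    t_offset = 10  # seconds between the two starts

    p1 = Process(target=fun1)
    p1.start()                 # fun1 starts immediately

    time.sleep(t_offset)       # wait 10 seconds
    p2 = Process(target=fun2)
    p2.start()                 # fun2 starts while fun1 is still running

    while True:
        p1.join()                  # fun1 finishes (fun2 keeps running)
        time.sleep(t_offset)       # 10 seconds later...
        p1 = Process(target=fun1)
        p1.start()                 # ...fun1 runs again

        p2.join()                  # fun2 finishes (fun1 keeps running)
        time.sleep(t_offset)       # 10 seconds later...
        p2 = Process(target=fun2)
        p2.start()                 # ...fun2 runs again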

Related

Asyncio gather difference

From my understanding, both code blocks are doing the same thing. Why is there a difference in execution time?
import asyncio
import time
...
# Block 1:
start_time = time.time()
tasks = [
    get_from_knowledge_v2(...),
    get_from_knowledge_v2(...),
    get_from_knowledge_v2(...),
]
data_list = await asyncio.gather(*tasks)
print("TIME TAKEN::", time.time() - start_time)
# Block 2:
start_time = time.time()
data1 = await get_from_knowledge_v2(...)
data2 = await get_from_knowledge_v2(...)
data3 = await get_from_knowledge_v2(...)
print("WITHOUT ASYNCIO GATHER TIME TAKEN::", time.time() - start_time)
Result:
TIME TAKEN:: 0.6016566753387451
WITHOUT ASYNCIO GATHER TIME TAKEN:: 1.7620849609375
The asyncio.gather function runs the awaitables you pass to it concurrently. If I/O is happening in at least one of them, the event loop can make useful context switches while waiting, which in turn leads to a certain degree of parallelism.
In this case I assume that get_from_knowledge_v2 does some HTTP request in a way that supports asynchronous execution.
In the second code block you have no concurrency between the three get_from_knowledge_v2 calls. Instead you just execute them sequentially (with respect to each other). In other words, while you are awaiting the first one of them, the second one will not start. Their context is blocked.
Note: This does not mean that outside of that code block no concurrency is happening/possible. If this sequential code block is inside an async function (i.e. coroutine), you can execute that concurrently with some other coroutine. It is just that inside that code block, those get_from_knowledge_v2 coroutines are executed sequentially.
The time you measured confirms this rather nicely since you have three coroutines and gather allows them to be executed almost in parallel, while the other code block executes them sequentially, thus leading to an almost three times longer execution time.
PS
Maybe a minimal concrete example will help illustrate what I mean:
from asyncio import gather, run, sleep
from time import time

async def sleep_and_print(seconds: float) -> None:
    await sleep(seconds)
    print("slept", seconds, "seconds")

async def concurrent_sleeps() -> None:
    await gather(
        sleep_and_print(3),
        sleep_and_print(2),
        sleep_and_print(1),
    )

async def sequential_sleeps() -> None:
    await sleep_and_print(3)
    await sleep_and_print(2)
    await sleep_and_print(1)

async def something_else() -> None:
    print("Doing something else that takes 4 seconds...")
    await sleep(4)
    print("Done with something else!")

async def main() -> None:
    start = time()
    await concurrent_sleeps()
    print("concurrent_sleeps took", round(time() - start, 1), "seconds\n")

    start = time()
    await sequential_sleeps()
    print("sequential_sleeps took", round(time() - start, 1), "seconds\n")

    start = time()
    await gather(
        sequential_sleeps(),
        something_else(),
    )
    print("sequential_sleeps & something_else together took", round(time() - start, 1), "seconds")

if __name__ == '__main__':
    run(main())
Running that script gives the following output:
slept 1 seconds
slept 2 seconds
slept 3 seconds
concurrent_sleeps took 3.0 seconds
slept 3 seconds
slept 2 seconds
slept 1 seconds
sequential_sleeps took 6.0 seconds
Doing something else that takes 4 seconds...
slept 3 seconds
Done with something else!
slept 2 seconds
slept 1 seconds
sequential_sleeps & something_else together took 6.0 seconds
This illustrates that the sleeping was done almost in parallel inside concurrent_sleeps, with the 1 second sleep finishing first, then the 2 second sleep, then the 3 second sleep.
It shows that the sleeping is done sequentially inside sequential_sleeps and in the call order, meaning it first slept 3 seconds, then it slept 2 seconds, then 1 second.
And finally, executing sequential_sleeps concurrently with something_else shows that they are executed almost in parallel, with the 3-second-sleep finishing first (after 3 seconds), then one second later something_else finished, then another second later the 2-second-sleep, then after another second the 1-second-sleep. Together they still took approximately 6 seconds.
That last part is what I meant when I said you can still execute another coroutine concurrently with the sequential block of code. In itself, the code block will still always remain sequential.
I hope this is clearer now.
PPS
Just to throw another option into the mix, you can also achieve concurrency by using Tasks. Calling asyncio.create_task will immediately schedule the coroutine for execution on the event loop. The task it creates should be awaited at some point, but the underlying coroutine will start running almost immediately after calling create_task. You can add this to the example script above:
from asyncio import create_task
...

async def task_sleeps() -> None:
    t3 = create_task(sleep_and_print(3))
    t2 = create_task(sleep_and_print(2))
    t1 = create_task(sleep_and_print(1))
    await t3
    await t2
    await t1

async def main() -> None:
    ...
    start = time()
    await task_sleeps()
    print("task_sleeps took", round(time() - start, 1), "seconds\n")
And you'll see the following again:
...
slept 1 seconds
slept 2 seconds
slept 3 seconds
task_sleeps took 3.0 seconds
Tasks are a nice option to decouple the execution of some coroutine from its surrounding context to an extent, but you need to keep track of them in some way.
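One simple way to do that bookkeeping, as a small illustrative sketch rather than part of the original answer, is to collect the tasks in a list and await them together (reusing sleep_and_print from the example above):

from asyncio import create_task, gather

async def tracked_sleeps() -> None:
    # keep references to the tasks so they can be awaited (or cancelled) later
    tasks = [create_task(sleep_and_print(s)) for s in (3, 2, 1)]
    await gather(*tasks)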

Repeat different threads for every "t" seconds "n" times

Suppose I have 3 functions which need to run in the background every "t" seconds (t can vary for every function) and for n times (n can vary for every function). How can we do this using threading?
I have written the following:
import threading
import time

def func2(iterations=2, duration=10):
    print("Hello, World!")
    t = threading.Timer(duration, func2)
    t.start()

def func3(iterations=3, duration=5):
    print("Hi, World!")
    t = threading.Timer(duration, func3)
    t.start()

for func in [func2, func3]:
    func()
This does the job of waiting t seconds, but I cannot stop it after "n" iterations.
For example: I want func2 to execute 2 times and exit, and func3 to execute 3 times and exit.
How can we achieve this in threading in Python?
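A minimal sketch of one possible way to bound the repetitions, keeping the Timer pattern from the question and passing the remaining count along on each re-scheduling (the repeat helper below is illustrative, not an existing API):

import threading

def repeat(func, interval, times):
    """Run func every `interval` seconds, `times` times in total."""
    def runner(remaining):
        func()
        if remaining > 1:
            threading.Timer(interval, runner, args=[remaining - 1]).start()
    threading.Timer(interval, runner, args=[times]).start()

# e.g. run func2's work 2 times every 10 seconds, func3's work 3 times every 5 seconds
repeat(lambda: print("Hello, World!"), interval=10, times=2)
repeat(lambda: print("Hi, World!"), interval=5, times=3)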

How to create new non-blocking processes in Python? (examples do not work)

I want to do a job as fast as possible, so I should parallelize it using processes (not threads, because of the GIL). My problem is that I can't start the processes at the same time; it always starts p1, and when p1 ends, p2, and so on. How can I start all my processes at the same time? My simplified code:
import multiprocessing
import time

if __name__ == '__main__':
    def work(data, num):
        if num == 0:
            time.sleep(5)
        print("starts:", num)
        # ****** heavy work that lasts a random number of seconds ******
        print("ends", num)

    for k in range(0, 3):
        p = multiprocessing.Process(target=work(data, k))
        p.daemon = True
        p.start()
result:
starts 0
ends 0
starts 1
ends 1
starts 2
ends 2
What I expected:
starts 0
starts 1
starts 2
ends 1 or 2
ends 1 or 2
ends 0 (because of time.sleep)
Why does my script always wait until the first process is finished before starting the next one?
First of all, making your program parallel/concurrent does not always make it faster, as Amdahl's law suggests.
Secondly, you need to pass the arguments with the args parameter: writing target=work(data, k) calls the function immediately in the parent process (blocking on time.sleep(5) each time) and passes its return value as the target instead of the function itself. Pass the function and its arguments separately, start all the processes, and then use join() to wait for them:
process_pool = []
for k in range(0, 5):
    p = multiprocessing.Process(target=work, args=('you_data', k))
    p.daemon = True
    process_pool.append(p)

for process in process_pool:
    process.start()

for process in process_pool:
    process.join()

Why doesn't multiprocessing Lock acquiring work?

I tried the two code examples from the first answer here: Python sharing a lock between processes. The result is the same.
import multiprocessing
import time
from threading import Lock

def target(arg):
    if arg == 1:
        lock.acquire()
        time.sleep(1.1)
        print('hi')
        lock.release()
    elif arg == 2:
        while True:
            print('not locked')
            time.sleep(0.5)

def init(lock_: Lock):
    global lock
    lock = lock_

if __name__ == '__main__':
    lock_ = multiprocessing.Lock()
    with multiprocessing.Pool(initializer=init, initargs=[lock_], processes=2) as pool:
        pool.map(target, [1, 2])
Why does this code print:
not locked
not locked
not locked
hi
not locked
instead of:
hi
not locked
Well, call your worker processes "1" and "2". They both start. 2 prints "not locked", sleeps half a second, and loops around to print "not locked" again. But note that what 2 is printing has nothing to do with whether lock is locked. Nothing in the code 2 executes even references lock, let alone synchronizes on lock. After another half second, 2 wakes up to print "not locked" for a third time, and goes to sleep again.
While that's going on, 1 starts, acquires the lock, sleeps for 1.1 seconds, and then prints "hi". It then releases the lock and ends. At the time 1 gets around to printing "hi", 2 has already printed "not locked" three times, and is about 0.1 seconds into its latest half-second sleep.
After "hi" is printed, 2 will continue printing "not locked" about twice per second forever more.
So the code appears to be doing what it was told to do.
What I can't guess, though, is how you expected to see "hi" first and then "not locked". That would require some kind of timing miracle, where 2 didn't start executing at all before 1 had been running for over 1.1 seconds. Not impossible, but extremely unlikely.
Changes
Here's one way to get the output you want, although I'm making many guesses about your intent.
If you don't want 2 to start before 1 ends, then you have to force that. One way is to have 2 begin by acquiring lock at the start of what it does. That also requires guaranteeing that lock is in the acquired state before any worker begins.
So acquire it before map() is called. Then there's no point left to having 1 acquire it at all - 1 can just start at once, and release it when it ends, so that 2 can proceed.
There are only a few changes to the code, but I'll paste all of it here for convenience:
import multiprocessing
import time
from threading import Lock

def target(arg):
    if arg == 1:
        time.sleep(1.1)
        print('hi')
        lock.release()
    elif arg == 2:
        lock.acquire()
        print('not locked')
        time.sleep(0.5)

def init(lock_: Lock):
    global lock
    lock = lock_

if __name__ == '__main__':
    lock_ = multiprocessing.Lock()
    lock_.acquire()
    with multiprocessing.Pool(initializer=init, initargs=[lock_], processes=2) as pool:
        pool.map(target, [1, 2])

Python fails to parallelize buffer reads

I'm having performance issues with multi-threading.
I have a code snippet that reads 8MB buffers in parallel:
import copy
import itertools
import threading
import time

# Basic implementation of a thread pool.
# Based on multiprocessing.Pool
class ThreadPool:
    def __init__(self, nb_threads):
        self.nb_threads = nb_threads

    def map(self, fun, iter):
        if self.nb_threads <= 1:
            return map(fun, iter)
        nb_threads = min(self.nb_threads, len(iter))
        # ensure 'iter' does not evaluate lazily
        # (generator or xrange...)
        iter = list(iter)
        # map to results list
        results = [None] * nb_threads
        def wrapper(i):
            def f(args):
                results[i] = map(fun, args)
            return f
        # slice iter in chunks
        chunks = [iter[i::nb_threads] for i in range(nb_threads)]
        # create threads
        threads = [threading.Thread(target=wrapper(i), args=[chunk]) \
                   for i, chunk in enumerate(chunks)]
        # start and join threads
        [thread.start() for thread in threads]
        [thread.join() for thread in threads]
        # reorder results
        r = list(itertools.chain.from_iterable(map(None, *results)))
        return r

payload = [0] * (1000 * 1000)  # 8 MB
payloads = [copy.deepcopy(payload) for _ in range(40)]

def process(i):
    for i in payloads[i]:
        j = i + 1

if __name__ == '__main__':
    for nb_threads in [1, 2, 4, 8, 20]:
        t = time.time()
        c = time.clock()
        pool = ThreadPool(nb_threads)
        pool.map(process, xrange(40))
        t = time.time() - t
        c = time.clock() - c
        print nb_threads, t, c
Output:
1 1.04805707932 1.05
2 1.45473504066 2.23
4 2.01357698441 3.98
8 1.56527090073 3.66
20 1.9085559845 4.15
Why does the threading module miserably fail at parallelizing mere buffer reads?
Is it because of the GIL? Or is it because of some weird configuration on my machine where one process is allowed only one access to RAM at a time? (I get a decent speed-up if I swap ThreadPool for multiprocessing.Pool in the code above.)
I'm using CPython 2.7.8 on a Linux distro.
Yes, Python's GIL prevents Python code from running in parallel across multiple threads. You describe your code as doing "buffer reads", but it's really running arbitrary Python code (in this case, iterating over a list adding 1 to other integers). If your threads were making blocking system calls (like reading from a file, or from a network socket), then the GIL would usually be released while the thread blocked waiting on the external data. But since most operations on Python objects can have side effects, you can't do several of them in parallel.
One important reason for this is that CPython's garbage collector uses reference counting as its main way to know when an object can be cleaned up. If several threads try to update the reference count of the same object at the same time, they might end up in a race condition and leave the object with the wrong count. The GIL prevents that from happening, as only one thread can be making such internal changes at a time. Every time your process code does j = i + 1, it's going to be updating the reference counts of the integer objects 0 and 1 a couple of times each. That's exactly the kind of thing the GIL exists to guard.
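As a rough illustration of this answer's point (a sketch written for Python 3 rather than the question's Python 2.7, and not taken from the original post), CPU-bound work like this does parallelize once each chunk runs in its own process, because every process has its own interpreter and its own GIL. The payload is rebuilt inside each worker here to sidestep questions about sharing the list between processes:

import time
from multiprocessing import Pool

PAYLOAD_SIZE = 1000 * 1000

def process(_):
    # CPU-bound loop, analogous to the question's `process` function
    payload = [0] * PAYLOAD_SIZE
    for i in payload:
        j = i + 1

if __name__ == '__main__':
    for nb_workers in [1, 2, 4, 8]:
        start = time.time()
        with Pool(nb_workers) as pool:
            pool.map(process, range(40))
        print(nb_workers, "workers:", round(time.time() - start, 2), "s")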
