Let's say I have a program in Python which looks like this:
import time
def send_message_realtime(s):
print("Real Time: ", s)
def send_message_delay(s):
time.sleep(5)
print("Delayed Message ", s)
for i in range(10):
send_message_realtime(str(i))
time.sleep(1)
send_message_delay(str(i))
What I am trying to do here is some sort of multithreading, so that the contents of my main for loop continues to execute without having to wait for the delay caused by time.sleep(5) in the delayed function.
Ideally, the piece of code that I am working upon looks something like below. I get a message from some API endpoint which I want to send to a particular telegram channel in real-time(paid subscribers), but I also want to send it to another channel by delaying it exactly 10 minutes or 600 seconds since they're free members. The problem I am facing is, I want to keep sending the message in real-time to my paid subscribers and kind of create a new thread/process for the delayed message which runs independently of the main while loop.
def send_message_realtime(my_realtime_message):
telegram.send(my_realtime_message)
def send_message_delayed(my_realtime_message):
time.sleep(600)
telegram.send(my_realtime_message)
while True:
my_realtime_message = api.get()
send_message_realtime(my_realtime_message)
send_message_delayed(my_realtime_message)
I think something like ThreadPoolExecutor does what you are look for:
import time
from concurrent.futures.thread import ThreadPoolExecutor
def send_message_realtime(s):
print("Real Time: ", s)
def send_message_delay(s):
time.sleep(5)
print("Delayed Message ", s)
def work_to_do(i):
send_message_realtime(str(i))
time.sleep(1)
send_message_delay(str(i))
with ThreadPoolExecutor(max_workers=4) as executor:
for i in range(10):
executor.submit(work_to_do, i)
The max_workers would be the number of parallel messages that could have at a given moment in time.
Instead of a multithread solution, you can also use a multiprocessing solution, for instance
from multiprocessing import Pool
...
with Pool(4) as p:
print(p.map(work_to_do, range(10)))
Related
I have an event that I am listening to every minute that returns a list ; it could be empty, 1 element, or more. And with those elements in that list, I'd like to run a function that would monitor an event on that element every minute for 10 minute.
For that I wrote that script
from concurrent.futures import ThreadPoolExecutor
from time import sleep
import asyncio
import Client
client = Client()
def handle_event(event):
for i in range(10):
client.get_info(event)
sleep(60)
async def main():
while True:
entires = client.get_new_entry()
if len(entires) > 0:
with ThreadPoolExecutor(max_workers=len(entires)) as executor:
executor.map(handle_event, entires)
await asyncio.sleep(60)
if __name__ == "__main__":
loop = asyncio.new_event_loop()
loop.run_until_complete(main())
However, instead of keep monitoring the entries, it blocks while the previous entries are still being monitors.
Any idea how I could do that please?
First let me explain why your program doesn't work the way you want it to: It's because you use the ThreadPoolExecutor as a context manager, which will not close until all the threads started by the call to map are finished. So main() waits there, and the next iteration of the loop can't happen until all the work is finished.
There are ways around this. Since you are using asyncio already, one approach is to move the creation of the Executor to a separate task. Each iteration of the main loop starts one copy of this task, which runs as long as it takes to finish. It's a async def function so many copies of this task can run concurrently.
I changed a few things in your code. Instead of Client I just used some simple print statements. I pass a list of integers, of random length, to handle_event. I increment a counter each time through the while True: loop, and add 10 times the counter to every integer in the list. This makes it easy to see how old calls continue for a time, mixing with new calls. I also shortened your time delays. All of these changes were for convenience and are not important.
The important change is to move ThreadPoolExecutor creation into a task. To make it cooperate with other tasks, it must contain an await expression, and for that reason I use executor.submit rather than executor.map. submit returns a concurrent.futures.Future, which provides a convenient way to await the completion of all the calls. executor.map, on the other hand, returns an iterator; I couldn't think of any good way to convert it to an awaitable object.
To convert a concurrent.futures.Future to an asyncio.Future, an awaitable, there is a function asyncio.wrap_future. When all the futures are complete, I exit from the ThreadPoolExecutor context manager. That will be very fast since all of the Executor's work is finished, so it does not block other tasks.
import random
from concurrent.futures import ThreadPoolExecutor
from time import sleep
import asyncio
def handle_event(event):
for i in range(10):
print("Still here", event)
sleep(2)
async def process_entires(counter, entires):
print("Counter", counter, "Entires", entires)
x = [counter * 10 + a for a in entires]
with ThreadPoolExecutor(max_workers=len(entires)) as executor:
futs = []
for z in x:
futs.append(executor.submit(handle_event, z))
await asyncio.gather(*(asyncio.wrap_future(f) for f in futs))
async def main():
counter = 0
while True:
entires = [0, 1, 2, 3, 4][:random.randrange(5)]
if len(entires) > 0:
counter += 1
asyncio.create_task(process_entires(counter, entires))
await asyncio.sleep(3)
if __name__ == "__main__":
asyncio.run(main())
I have the following code (simplified):
from multiprocessing import Process, Queue
def f1(queue):
while True:
# do some stuff and get a variable called data
# ...
queue.put(data)
def f2(queue):
while True:
if not queue.empty():
data = queue.get(timeout=300)
print('queue data: ' + str(data))
if __name__ == '__main__':
q = Queue()
p1 = Process(target=f1, args=(q,))
p2 = Process(target=f2, args=(q,))
p1.start()
p2.start()
p1.join()
p2.join()
The problem I'm facing is that I don't know how to lock the queue in f1 in order to keep putting data for n seconds, before f2 is able to read it.
I tried with timeouts but of course, it didn't work. Basically, the expected behaviour would be that f1 keeps appending data into the queue and after n seconds, f2 can get what's in that queue. So, summarising, f1 should be running continuously, f2 should be running continuously too but accessing the queue every n seconds.
I can think of not so elegant ways of doing this with the time library, but I guess it has to be other way. Maybe the code's approach is wrong and I shouldn't be using Process and Queue but Pipelines or something else.
Thanks in advance!
For this particular case in which I was using the multiprocessing library, instead of threading or asyncio, I found the best way to do this by using a simple sleep, so the f2() will end up like:
def f2(queue):
while True:
time.sleep(300) # sleep for 5 minutes before POSTing
if not queue.empty():
data = queue.get(timeout=300)
print('queue data: ' + str(data))
Of course, after importing time.
As I said, maybe not the most elegant solution but I couldn't come up with anything better for the time being (and this particular use case).
I am dealing with thousands of image urls and want to use concurrent.futures.ProcessPoolExecutor to speed up.
Since some of the urls are broken or images are large, the process function may hang or unexpectedly consume a lot of time during processing. I want to add a timeout on the process function like 10 seconds to get rid of these invalid images.
I tried to set the timeout param in futures .as_completed, the TimeoutException could be successfully raised. However, it seems that the main process will still wait until the timeout child process is completed. Is there any approach to immediately kill the timeout child process and put next url into the pool?
from concurrent import futures
def process(url):
### Some time consuming operation
return result
def main():
urls = ['url1','url2','url3',...,'url100']
with futures.ProcessPoolExecutor(max_workers=10) as executor:
future_list = {executor.submit(process, url):url for url in urls}
results = []
try:
for future in futures.as_completed(future_list, timeout=10):
results.append(future.result())
except futures._base.TimeoutException:
print("timeout")
print(results)
if __name__ == '__main__':
main()
In above example, suppose that I have 100 urls, 10 of them are invalid and may cost a lot of time ,how to get the rest 90 urls' processed result list?
Not with the concurrent.futures library.
The pebble module has been developed to overcome such limitation.
from pebble import ProcessPool
from concurrent.futures import TimeoutError
with process.ProcessPool() as pool:
future = pool.schedule(function, args=(1,2), timeout=5)
try:
result = future.result() # blocks until results are ready
except TimeoutError as error:
print("Function took longer than %d seconds" % error.args[1])
I try to find a simple way to "speed up" simple functions for a big script so I googled for it and found 3 ways to do that.
but it seems the time they need is always the same.
so what I am doing wrong testing them?
file1:
from concurrent.futures import ThreadPoolExecutor as PoolExecutor
from threading import Thread
import time
import os
import math
#https://dev.to/rhymes/how-to-make-python-code-concurrent-with-3-lines-of-code-2fpe
def benchmark():
start = time.time()
for i in range (0, 40000000):
x = math.sqrt(i)
print(x)
end = time.time()
print('time', end - start)
with PoolExecutor(max_workers=3) as executor:
for _ in executor.map((benchmark())):
pass
file2:
#the basic way
from threading import Thread
import time
import os
import math
def calc():
start = time.time()
for i in range (0, 40000000):
x = math.sqrt(i)
print(x)
end = time.time()
print('time', end - start)
calc()
file3:
import asyncio
import uvloop
import time
import math
#https://github.com/magicstack/uvloop
async def main():
start = time.time()
for i in range (0, 40000000):
x = math.sqrt(i)
print(x)
end = time.time()
print('time', end - start)
uvloop.install()
asyncio.run(main())
every file needs about 180-200 sec
so i 'can't see' a difference.
I googled for it and found 3 ways to [speed up a function], but it seems the time they need is always the same. so what I am doing wrong testing them?
You seemed to have found strategies to speed up some code by parallelizing it, but you failed to implement them correctly. First, the speedup is supposed to come from running multiple instances of the function in parallel, and the code snippets make no attempt to do that. Then, there are other problems.
In the first example, you pass the result benchmark() to executor.map, which means all of benchmark() is immediately executed to completion, thus effectively disabling parallelization. (Also, executor.map is supposed to receive an iterable, not None, and this code must have printed a traceback not shown in the question.) The correct way would be something like:
# run the benchmark 5 times in parallel - if that takes less
# than 5x of a single benchmark, you've got a speedup
with ThreadPoolExecutor(max_workers=5) as executor:
for _ in range(5):
executor.submit(benchmark)
For this to actually produce a speedup, you should try to use ProcessPoolExecutor, which runs its tasks in separate processes and is therefore unaffected by the GIL.
The second code snippet never actually creates or runs a thread, it just executes the function in the main thread, so it's unclear how that's supposed to speed things up.
The last snippet doesn't await anything, so the async def works just like an ordinary function. Note that asyncio is an async framework based on switching between tasks blocked on IO, and as such can never speed CPU-bound calculations.
I am writing an application using python3 and am trying out asyncio for the first time. One issue I have encountered is that some of my coroutines block the event loop for longer than I like. I am trying to find something along the lines of top for the event loop that will show how much wall/cpu time is being spent running each of my coroutines. If there isn't anything already existing does anyone know of a way to add hooks to the event loop so that I can take measurements?
I have tried using cProfile which gives some helpful output, but I am more interested in time spent blocking the event loop, rather than total execution time.
Event loop can already track if coroutines take much CPU time to execute. To see it you should enable debug mode with set_debug method:
import asyncio
import time
async def main():
time.sleep(1) # Block event loop
if __name__ == "__main__":
loop = asyncio.get_event_loop()
loop.set_debug(True) # Enable debug
loop.run_until_complete(main())
In output you'll see:
Executing <Task finished coro=<main() [...]> took 1.016 seconds
By default it shows warnings for coroutines that blocks for more than 0.1 sec. It's not documented, but based on asyncio source code, looks like you can change slow_callback_duration attribute to modify this value.
You can use call_later. Periodically run callback that will log/notify the difference of loop's time and period interval time.
class EventLoopDelayMonitor:
def __init__(self, loop=None, start=True, interval=1, logger=None):
self._interval = interval
self._log = logger or logging.getLogger(__name__)
self._loop = loop or asyncio.get_event_loop()
if start:
self.start()
def run(self):
self._loop.call_later(self._interval, self._handler, self._loop.time())
def _handler(self, start_time):
latency = (self._loop.time() - start_time) - self._interval
self._log.error('EventLoop delay %.4f', latency)
if not self.is_stopped():
self.run()
def is_stopped(self):
return self._stopped
def start(self):
self._stopped = False
self.run()
def stop(self):
self._stopped = True
example
import time
async def main():
EventLoopDelayMonitor(interval=1)
await asyncio.sleep(1)
time.sleep(2)
await asyncio.sleep(1)
await asyncio.sleep(1)
loop = asyncio.get_event_loop()
loop.run_until_complete(main())
output
EventLoop delay 0.0013
EventLoop delay 1.0026
EventLoop delay 0.0014
EventLoop delay 0.0015
For anyone reading this in 2019, this might be a better answer: yappi. With Yappi version 1.2.1>=, you can natively profile coroutines and see exactly how much wall or cpu time is spent inside a coroutine.
See here for details on this coroutine profiling.
To expand a bit on one of the answers, if you want to monitor your loop and detect hangs, here's a snippet to do just that. It launches a separate thread that checks whether the loop's tasks yielded execution recently enough.
def monitor_loop(loop, delay_handler):
loop = loop
last_call = loop.time()
INTERVAL = .5 # How often to poll the loop and check the current delay.
def run_last_call_updater():
loop.call_later(INTERVAL, last_call_updater)
def last_call_updater():
nonlocal last_call
last_call = loop.time()
run_last_call_updater()
run_last_call_updater()
def last_call_checker():
threading.Timer(INTERVAL / 2, last_call_checker).start()
if loop.time() - last_call > INTERVAL:
delay_handler(loop.time() - last_call)
threading.Thread(target=last_call_checker).start()