Writing an EventLoop without using asyncio - python-3.x

I'm getting very familiar with Python's asyncio, asynchronous programming, coroutines, etc.
I want to be able to execute several coroutines with my own custom-made event loop.
I'm curious whether I can write my own event loop without importing asyncio at all.

I want to be able to execute several coroutines with my own custom-made event loop.
The asyncio event loop is well-tested and can be easily extended to acknowledge non-asyncio events. If you describe the actual use case, it might be easier to help. But if your goal is to learn about async programming and coroutines, read on.
I'm curious whether I can write my own event loop without importing asyncio at all.
It's definitely possible - asyncio itself is just a library, after all - but it will take some work for your event loop to be useful. See this excellent talk by David Beazley where he demonstrates writing an event loop in front of a live audience. (Don't be put off by David using the older yield from syntax - await works exactly the same way.)

OK, so I found an example somewhere (sorry, I don't remember where, so no link) and changed it a little bit.
An event loop and coroutines without even importing asyncio:
import datetime
import heapq
import types
import time


class Task:
    def __init__(self, wait_until, coro):
        self.coro = coro
        self.waiting_until = wait_until

    def __eq__(self, other):
        return self.waiting_until == other.waiting_until

    def __lt__(self, other):
        return self.waiting_until < other.waiting_until


class SleepingLoop:
    def __init__(self, *coros):
        self._new = coros
        self._waiting = []

    def run_until_complete(self):
        # Start all the coroutines.
        for coro in self._new:
            wait_for = coro.send(None)
            heapq.heappush(self._waiting, Task(wait_for, coro))
        # Keep running until there is no more work to do.
        while self._waiting:
            now = datetime.datetime.now()
            # Get the coroutine with the soonest resumption time.
            task = heapq.heappop(self._waiting)
            if now < task.waiting_until:
                # We're ahead of schedule; wait until it's time to resume.
                delta = task.waiting_until - now
                time.sleep(delta.total_seconds())
                now = datetime.datetime.now()
            try:
                # It's time to resume the coroutine.
                wait_until = task.coro.send(now)
                heapq.heappush(self._waiting, Task(wait_until, task.coro))
            except StopIteration:
                # The coroutine is done.
                pass


@types.coroutine
def async_sleep(seconds):
    now = datetime.datetime.now()
    wait_until = now + datetime.timedelta(seconds=seconds)
    actual = yield wait_until
    return actual - now


async def countdown(label, total_seconds_wait, *, delay=0):
    print(label, 'waiting', delay, 'seconds before starting countdown')
    delta = await async_sleep(delay)
    print(label, 'starting after waiting', delta)
    while total_seconds_wait:
        print(label, 'T-minus', total_seconds_wait)
        waited = await async_sleep(1)
        total_seconds_wait -= 1
    print(label, 'lift-off!')


def main():
    loop = SleepingLoop(countdown('A', 5, delay=0),
                        countdown('B', 3, delay=2),
                        countdown('C', 4, delay=1))
    start = datetime.datetime.now()
    loop.run_until_complete()
    print('Total elapsed time is', datetime.datetime.now() - start)


if __name__ == '__main__':
    main()
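To see why this works without asyncio, it helps to drive one of the coroutines by hand; the loop above does nothing more than this send()/StopIteration handshake. A small illustration of my own (not part of the original example), reusing the definitions above:

coro = countdown('X', 1, delay=0)
try:
    wake_at = coro.send(None)                         # prime the coroutine; it yields its wake-up time
    while True:
        wake_at = coro.send(datetime.datetime.now())  # pretend it is already time to resume
except StopIteration:
    pass                                              # the coroutine ran to completion

The key piece is that async_sleep is decorated with @types.coroutine, which is what makes a plain generator awaitable from the async def functions.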


Python Timer object lagging behind system time [duplicate]

I'm trying to schedule a repeating event to run every minute in Python 3.
I've seen the sched.scheduler class, but I'm wondering if there's another way to do it. I've heard mentions that I could use multiple threads for this, which I wouldn't mind doing.
I'm basically requesting some JSON and then parsing it; its value changes over time.
To use sched.scheduler I have to create a loop to request that it schedule the event to run for one hour:
scheduler = sched.scheduler(time.time, time.sleep)

# Schedule the event. THIS IS UGLY!
for i in range(60):
    scheduler.enter(3600 * i, 1, query_rate_limit, ())

scheduler.run()
What other ways to do this are there?
You could use threading.Timer, but that also schedules a one-off event, similarly to the .enter method of scheduler objects.
The normal pattern (in any language) to transform a one-off scheduler into a periodic scheduler is to have each event re-schedule itself at the specified interval. For example, with sched, I would not use a loop like you're doing, but rather something like:
def periodic(scheduler, interval, action, actionargs=()):
    scheduler.enter(interval, 1, periodic,
                    (scheduler, interval, action, actionargs))
    action(*actionargs)
and initiate the whole "forever periodic schedule" with a call
periodic(scheduler, 3600, query_rate_limit)
Or, I could use threading.Timer instead of scheduler.enter, but the pattern's quite similar.
If you need a more refined variation (e.g., stop the periodic rescheduling at a given time or upon certain conditions), that's not too hard to accommodate with a few extra parameters.
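For example, one minimal way to make the rescheduling stoppable is to guard it with a flag or an Event. This is a sketch of my own, not part of the answer above, and the stop_event name is made up:

import sched
import time
import threading

stop_event = threading.Event()

def periodic(scheduler, interval, action, actionargs=()):
    if stop_event.is_set():
        return                      # condition met: stop rescheduling
    scheduler.enter(interval, 1, periodic,
                    (scheduler, interval, action, actionargs))
    action(*actionargs)

Calling stop_event.set() from the action itself (or from another thread) lets the chain of scheduled events wind down after the current run.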
You could use schedule. It works on Python 2.7 and 3.3 and is rather lightweight:
import schedule
import time

def job():
    print("I'm working...")

schedule.every(10).minutes.do(job)
schedule.every().hour.do(job)
schedule.every().day.at("10:30").do(job)

while 1:
    schedule.run_pending()
    time.sleep(1)
My humble take on the subject:
from threading import Timer

class RepeatedTimer(object):
    def __init__(self, interval, function, *args, **kwargs):
        self._timer = None
        self.function = function
        self.interval = interval
        self.args = args
        self.kwargs = kwargs
        self.is_running = False
        self.start()

    def _run(self):
        self.is_running = False
        self.start()
        self.function(*self.args, **self.kwargs)

    def start(self):
        if not self.is_running:
            self._timer = Timer(self.interval, self._run)
            self._timer.start()
            self.is_running = True

    def stop(self):
        self._timer.cancel()
        self.is_running = False
Usage:
from time import sleep

def hello(name):
    print("Hello %s!" % name)

print("starting...")
rt = RepeatedTimer(1, hello, "World")  # it auto-starts, no need of rt.start()
try:
    sleep(5)  # your long-running job goes here...
finally:
    rt.stop()  # better in a try/finally block to make sure the program ends!
Features:
Standard library only, no external dependencies
Uses the pattern suggested by Alex Martelli
start() and stop() are safe to call multiple times even if the timer has already started/stopped
function to be called can have positional and named arguments
You can change interval at any time; it will take effect after the next run. The same goes for args, kwargs and even function (see the sketch below)!
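A small sketch of that last point, assuming the RepeatedTimer class and the hello function from the usage example above are in scope:

rt = RepeatedTimer(1, hello, "World")
sleep(3)
rt.interval = 2          # takes effect after the next run
rt.args = ("Python",)    # the arguments can be swapped the same way
sleep(5)
rt.stop()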
Based on MestreLion's answer, this version solves a small problem with multithreading:
from threading import Timer, Lock

class Periodic(object):
    """
    A periodic task running in threading.Timers
    """

    def __init__(self, interval, function, *args, **kwargs):
        self._lock = Lock()
        self._timer = None
        self.function = function
        self.interval = interval
        self.args = args
        self.kwargs = kwargs
        self._stopped = True
        if kwargs.pop('autostart', True):
            self.start()

    def start(self, from_run=False):
        self._lock.acquire()
        if from_run or self._stopped:
            self._stopped = False
            self._timer = Timer(self.interval, self._run)
            self._timer.start()
        self._lock.release()

    def _run(self):
        self.start(from_run=True)
        self.function(*self.args, **self.kwargs)

    def stop(self):
        self._lock.acquire()
        self._stopped = True
        self._timer.cancel()
        self._lock.release()
You could use the Advanced Python Scheduler. It even has a cron-like interface.
Use Celery.
from celery.task import PeriodicTask
from datetime import timedelta

class ProcessClicksTask(PeriodicTask):
    run_every = timedelta(minutes=30)

    def run(self, **kwargs):
        # do something
        pass
Based on Alex Martelli's answer, I have implemented a decorator version that is easier to integrate.
import sched
import time
import datetime
from functools import wraps
from threading import Thread

def run_async(func):
    # note: this decorator was originally called `async`, which is a reserved
    # word since Python 3.7, so it is renamed here
    @wraps(func)
    def async_func(*args, **kwargs):
        func_hl = Thread(target=func, args=args, kwargs=kwargs)
        func_hl.start()
        return func_hl
    return async_func

def schedule(interval):
    def decorator(func):
        def periodic(scheduler, interval, action, actionargs=()):
            scheduler.enter(interval, 1, periodic,
                            (scheduler, interval, action, actionargs))
            action(*actionargs)

        @wraps(func)
        def wrap(*args, **kwargs):
            scheduler = sched.scheduler(time.time, time.sleep)
            periodic(scheduler, interval, func)
            scheduler.run()
        return wrap
    return decorator

@run_async
@schedule(1)
def periodic_event():
    print(datetime.datetime.now())

if __name__ == '__main__':
    print('start')
    periodic_event()
    print('end')
Doc: Advanced Python Scheduler
@sched.cron_schedule(day='last sun')
def some_decorated_task():
    print("I am printed at 00:00:00 on the last Sunday of every month!")
Available fields:
| Field | Description |
|-------------|----------------------------------------------------------------|
| year | 4-digit year number |
| month | month number (1-12) |
| day | day of the month (1-31) |
| week | ISO week number (1-53) |
| day_of_week | number or name of weekday (0-6 or mon,tue,wed,thu,fri,sat,sun) |
| hour | hour (0-23) |
| minute | minute (0-59) |
| second | second (0-59) |
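The decorator shown above is from an older APScheduler release; with the current 3.x API the same idea looks roughly like this (a sketch based on the APScheduler docs, not part of the original answer; adjust to the version you actually install):

from apscheduler.schedulers.blocking import BlockingScheduler

scheduler = BlockingScheduler()

@scheduler.scheduled_job('interval', minutes=1)
def query_rate_limit():
    print("querying the rate limit...")

scheduler.start()   # blocks and keeps firing the job every minute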
Here's a quick and dirty non-blocking loop with Thread:
#!/usr/bin/env python3
import threading, time

def worker():
    print(time.time())
    time.sleep(5)
    # the worker re-schedules itself by starting a new thread
    t = threading.Thread(target=worker)
    t.start()

threads = []
t = threading.Thread(target=worker)
threads.append(t)
t.start()
time.sleep(7)
print("Hello World")
There's nothing particularly special about it: the worker creates a new thread of itself with a delay. It might not be the most efficient approach, but it's simple enough. northtree's answer would be the way to go if you need a more sophisticated solution.
And based on this, we can do the same, just with Timer:
#!/usr/bin/env python3
import threading, time

def hello():
    t = threading.Timer(10.0, hello)
    t.start()
    print("hello, world", time.time())

t = threading.Timer(10.0, hello)
t.start()
time.sleep(12)
print("Oh, hai", time.time())
time.sleep(4)
print("How's it going?", time.time())
There is a new package, called ischedule. For this case, the solution could be as follows:
from ischedule import schedule, run_loop
from datetime import timedelta

def query_rate_limit():
    print("query_rate_limit")

schedule(query_rate_limit, interval=60)
run_loop(return_after=timedelta(hours=1))
Everything runs on the main thread and there is no busy waiting inside the run_loop. The startup time is very precise, usually within a fraction of a millisecond of the specified time.
Taking the original threading.Timer() class implementation and fixing the run() method I get something like:
from threading import Thread, Event

class PeriodicTimer(Thread):
    """A periodic timer that runs indefinitely until cancel() is called."""

    def __init__(self, interval, function, args=None, kwargs=None):
        Thread.__init__(self)
        self.interval = interval
        self.function = function
        self.args = args if args is not None else []
        self.kwargs = kwargs if kwargs is not None else {}
        self.finished = Event()

    def cancel(self):
        """Stop the timer if it hasn't finished yet."""
        self.finished.set()

    def run(self):
        """Run until canceled."""
        while not self.finished.wait(self.interval):
            self.function(*self.args, **self.kwargs)
The wait() call uses a condition variable internally, so it should be rather efficient.
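A minimal usage sketch of the class above (the "tick" message is just for illustration):

import time

timer = PeriodicTimer(1.0, print, args=("tick",))
timer.start()      # Thread.start() invokes run(), which loops until cancel()
time.sleep(5)
timer.cancel()
timer.join()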
See my sample:
import sched, time

def myTask(m, n):
    print(n + ' ' + m)

def periodic_queue(interval, func, args=(), priority=1):
    s = sched.scheduler(time.time, time.sleep)
    periodic_task(s, interval, func, args, priority)
    s.run()

def periodic_task(scheduler, interval, func, args, priority):
    func(*args)
    scheduler.enter(interval, priority, periodic_task,
                    (scheduler, interval, func, args, priority))

periodic_queue(1, myTask, ('world', 'hello'))
I ran into a similar issue a while back so I made a python module event-scheduler to address this. It has a very similar API to the sched library with a few differences:
It utilizes a background thread and is always able to accept and run jobs in the background until the scheduler is stopped explicitly (no need for a while loop).
It comes with an API to schedule recurring events at a user specified interval until explicitly cancelled.
It can be installed by pip install event-scheduler
from event_scheduler import EventScheduler
event_scheduler = EventScheduler()
event_scheduler.start()
# Schedule the recurring event to print "hello world" every 60 seconds with priority 1
# You can use the event_id to cancel the recurring event later
event_id = event_scheduler.enter_recurring(60, 1, print, ("hello world",))

Multiprocessing with multiprocessing.Pool

I am having an issue with even the most basic task using the multiprocessing.Pool method.
It seems to be working but never finishes even the simplest task.
Could you please help me figure out what I am doing wrong?
I read some articles and tried to understand it, but couldn't figure it out. I added a short example (with list(map(squared, range(2_000_000))) it works, but the code below does not).
Thanks in advance,
Roland
"""
from multiprocessing import Pool
import time
process_pool = Pool(processes = 4)
def squared(n):
return n ** 2
start = time.perf_counter()
process_pool.apply(squared, range(2_000_000))
end = time.perf_counter() - start
print(f"Run time: {end}")
"""
This seems like a case for multithreading. Have you tried something like:
from concurrent.futures import ThreadPoolExecutor, as_completed

num_of_threads = 50  # Number of threads executing at the same time

with ThreadPoolExecutor(max_workers=num_of_threads) as executor:
    tasks = []
    for i in i_list:
        tasks.append(
            executor.submit(
                <Function_to_execute>, i
            )
        )
    for future in as_completed(tasks):
        if future.result():
            yield future.result()  # This can just be a return; with yield the enclosing function returns a generator
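Filling in those placeholders with the question's squared function gives something runnable (a sketch of my own, not the answerer's code; note that for CPU-bound work like this, threads are limited by the GIL, which is why the next answer reaches for multiprocessing instead):

from concurrent.futures import ThreadPoolExecutor, as_completed

def squared(n):
    return n ** 2

with ThreadPoolExecutor(max_workers=8) as executor:
    tasks = [executor.submit(squared, i) for i in range(10)]
    for future in as_completed(tasks):
        print(future.result())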
I think you want imap() (and move squared() before you define the Pool):
from multiprocessing import Pool
import time

def squared(n):
    return n ** 2

process_pool = Pool(processes=4)

start = time.perf_counter()
process_pool.imap(squared, range(2))
end = time.perf_counter() - start
print(f"Run time: {end}")
Just keep in mind this is not a very representative example, since you don't do anything with the results; something better would be:
with Pool(4) as pool:
    results = pool.imap(squared, range(2_000_000))
    for result in results:
        pass  # do something here with the result

Sharing asyncio.Queue with another thread or process

I've recently converted my old template matching program to asyncio, and I have a situation where one of my coroutines relies on a blocking method (processing_frame).
I want to run that method in a separate thread or process whenever the coroutine that calls it (analyze_frame) gets an item from the shared asyncio.Queue().
I'm not sure if that's possible or worth it performance-wise, since I have very little experience with threading and multiprocessing.
import cv2
import datetime
import argparse
import os
import asyncio

# Making CLI
if not os.path.exists("frames"):
    os.makedirs("frames")

t0 = datetime.datetime.now()
ap = argparse.ArgumentParser()
ap.add_argument("-v", "--video", required=True,
                help="path to our file")
args = vars(ap.parse_args())

threshold = .2
death_count = 0
was_found = False
template = cv2.imread('youdied.png')
vidcap = cv2.VideoCapture(args["video"])

loop = asyncio.get_event_loop()
frames_to_analyze = asyncio.Queue()


def main():
    length = int(vidcap.get(cv2.CAP_PROP_FRAME_COUNT))
    tasks = []
    for _ in range(int(length / 50)):
        tasks.append(loop.create_task(read_frame(50, frames_to_analyze)))
        tasks.append(loop.create_task(analyze_frame(threshold, template, frames_to_analyze)))
    final_task = asyncio.gather(*tasks)
    loop.run_until_complete(final_task)
    dt = datetime.datetime.now() - t0
    print("App exiting, total time: {:,.2f} sec.".format(dt.total_seconds()))
    print(f"Deaths registered: {death_count}")


async def read_frame(frames, frames_to_analyze):
    global vidcap
    for _ in range(frames - 1):
        vidcap.grab()
    else:
        current_frame = vidcap.read()[1]
        print("Read 50 frames")
        await frames_to_analyze.put(current_frame)


async def analyze_frame(threshold, template, frames_to_analyze):
    global vidcap
    global was_found
    global death_count
    frame = await frames_to_analyze.get()
    is_found = processing_frame(frame)
    if was_found and not is_found:
        death_count += 1
        await writing_to_file(death_count, frame)
    was_found = is_found


def processing_frame(frame):
    res = cv2.matchTemplate(frame, template, cv2.TM_CCOEFF_NORMED)
    max_val = cv2.minMaxLoc(res)[1]
    is_found = max_val >= threshold
    print(is_found)
    return is_found


async def writing_to_file(death_count, frame):
    cv2.imwrite(f"frames/frame{death_count}.jpg", frame)


if __name__ == '__main__':
    main()
I've tried using unsync, but without much success.
I would get something along the lines of:
with self._rlock:
PermissionError: [WinError 5] Access is denied
If processing_frame is a blocking function, you should call it with await loop.run_in_executor(None, processing_frame, frame). That will submit the function to a thread pool and allow the event loop to proceed with doing other things until the called function completes.
The same goes for calls such as cv2.imwrite. As written, writing_to_file is not truly asynchronous, despite being defined with async def. This is because it doesn't await anything, so once its execution starts, it will proceed to the end without ever suspending. In that case one could as well make it a normal function in the first place, to make it obvious what's going on.
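A rough sketch of what that could look like in analyze_frame (my own adaptation of the question's code, untested):

async def analyze_frame(threshold, template, frames_to_analyze):
    global was_found
    global death_count
    loop = asyncio.get_event_loop()
    frame = await frames_to_analyze.get()
    # Run the blocking OpenCV call in the default thread pool executor.
    is_found = await loop.run_in_executor(None, processing_frame, frame)
    if was_found and not is_found:
        death_count += 1
        # cv2.imwrite blocks too, so offload it the same way.
        await loop.run_in_executor(None, cv2.imwrite,
                                   f"frames/frame{death_count}.jpg", frame)
    was_found = is_found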

Python multiprocessing script partial output

I am following the principles laid down in this post to safely output the results, which will eventually be written to a file. Unfortunately, the code only prints 1 and 2, and not 3 to 6.
import os
import argparse
import pandas as pd
import multiprocessing
from multiprocessing import Process, Queue
from time import sleep


def feed(queue, parlist):
    for par in parlist:
        queue.put(par)
    print("Queue size", queue.qsize())


def calc(queueIn, queueOut):
    while True:
        try:
            par = queueIn.get(block=False)
            res = doCalculation(par)
            queueOut.put((res))
            queueIn.task_done()
        except:
            break


def doCalculation(par):
    return par


def write(queue):
    while True:
        try:
            par = queue.get(block=False)
            print("response:", par)
        except:
            break


if __name__ == "__main__":
    nthreads = 2
    workerQueue = Queue()
    writerQueue = Queue()
    considerperiod = [1, 2, 3, 4, 5, 6]
    feedProc = Process(target=feed, args=(workerQueue, considerperiod))
    calcProc = [Process(target=calc, args=(workerQueue, writerQueue)) for i in range(nthreads)]
    writProc = Process(target=write, args=(writerQueue,))
    feedProc.start()
    feedProc.join()
    for p in calcProc:
        p.start()
    for p in calcProc:
        p.join()
    writProc.start()
    writProc.join()
On running the code, it prints:
$ python3 tst.py
Queue size 6
response: 1
response: 2
Also, is it possible to ensure that the write function always outputs 1,2,3,4,5,6 i.e. in the same order in which the data is fed into the feed queue?
The error comes from the task_done() call: a plain multiprocessing.Queue has no task_done() method (only JoinableQueue does), so the call raises an exception that the bare except then swallows, ending each worker after its first item. If you remove that call, it works, but only because the queueIn.get(block=False) call then throws an exception once the queue is empty. That might be just enough for your use case; a better way, though, would be to use sentinels (as suggested in the multiprocessing docs, see the last example there). Here's a little rewrite so your program uses sentinels:
import os
import argparse
import multiprocessing
from multiprocessing import Process, Queue
from time import sleep


def feed(queue, parlist, nthreads):
    for par in parlist:
        queue.put(par)
    for i in range(nthreads):
        queue.put(None)
    print("Queue size", queue.qsize())


def calc(queueIn, queueOut):
    while True:
        par = queueIn.get()
        if par is None:
            break
        res = doCalculation(par)
        queueOut.put((res))


def doCalculation(par):
    return par


def write(queue):
    while not queue.empty():
        par = queue.get()
        print("response:", par)


if __name__ == "__main__":
    nthreads = 2
    workerQueue = Queue()
    writerQueue = Queue()
    considerperiod = [1, 2, 3, 4, 5, 6]
    feedProc = Process(target=feed, args=(workerQueue, considerperiod, nthreads))
    calcProc = [Process(target=calc, args=(workerQueue, writerQueue)) for i in range(nthreads)]
    writProc = Process(target=write, args=(writerQueue,))
    feedProc.start()
    feedProc.join()
    for p in calcProc:
        p.start()
    for p in calcProc:
        p.join()
    writProc.start()
    writProc.join()
A few things to note:
the sentinel is putting a None into the queue. Note that you need one sentinel for every worker process.
for the write function you don't need the sentinel handling, as there's only one process and no concurrency to deal with (if you did the empty()-then-get() thing in your calc function, you would run into a problem if, e.g., there's only one item left in the queue and both workers check empty() at the same time, then both try to get(), and one of them blocks forever)
you don't need to put feed and write into processes; just call them from your main function, as you don't want to run them in parallel anyway
how can I have the same order in output as in input? [...] I guess multiprocessing.map can do this
Yes, map keeps the order. Rewriting your program into something simpler (you don't need the workerQueue and writerQueue) and adding random sleeps to prove that the output is still in order:
from multiprocessing import Pool
import time
import random


def calc(val):
    time.sleep(random.random())
    return val


if __name__ == "__main__":
    considerperiod = [1, 2, 3, 4, 5, 6]
    with Pool(processes=2) as pool:
        print(pool.map(calc, considerperiod))
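If the result list is too large to hold in memory at once, imap gives the same ordering lazily (a small variation of my own on the snippet above, reusing calc and considerperiod):

with Pool(processes=2) as pool:
    for res in pool.imap(calc, considerperiod):
        print("response:", res)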

What does yield None (tornado.gen.moment) mean?

I need an async subprocess lock in my web application.
I wrote the following code:
r = redis.Redis('localhost')
pipe = r.pipeline()
is_locked = False
while not is_locked:
    try:
        pipe.watch(lock_name)
        current_locked = int(pipe.get(lock_name))
        if current_locked == 0:
            pipe.multi()
            pipe.incr(lock_name)
            pipe.execute()
            is_locked = True
        else:
            yield None
    except redis.WatchError:
        yield None
return True
The documentation says that tornado.gen.moment (yield None since version 4.5) is a special object which may be yielded to allow the IOLoop to run for one iteration. How does it work? Is the next iteration run with another Future object (from another request) or not? Is this a correct use of yield None?
The gen.moment is just a resolved Future object added to the ioloop with a callback. This allows one iteration of the ioloop to run.
The yield None is converted to gen.moment using convert_yielded in the coroutine's gen.Runner.
The ioloop (basically a while True loop) does the following with each iteration:
run callbacks scheduled with the ioloop's add_callback or add_callback_from_signal
run callbacks scheduled with the ioloop's add_timeout
poll for fd events (e.g. wait for a file descriptor to be ready to write or read); of course, to avoid blocking the ioloop, the poll has a timeout
run the handlers of the ready fds
So, getting to the point, yield gen.moment allows all of the above to happen once (one iteration).
As an example, let's schedule an async task - an httpclient fetch that needs a running ioloop to finish. Alongside it there will also be a blocking function (time.sleep).
import time
from tornado import gen
from tornado.ioloop import IOLoop
from tornado.httpclient import AsyncHTTPClient


@gen.coroutine
def fetch_task():
    client = AsyncHTTPClient()
    yield client.fetch('http://google.com')
    print('fetch_task finished')


@gen.coroutine
def blocking():
    start_time = time.time()
    counter = 1
    while True:
        time.sleep(5)
        print('blocking for %f' % (time.time() - start_time))
        yield gen.moment
        print('gen.moment counter %d' % counter)
        counter += 1


@gen.coroutine
def main():
    fetch_task()
    yield blocking()


IOLoop.instance().run_sync(main)
Observations:
without a yield gen.moment, the fetch_task won't finish
increasing/decreasing the value of time.sleep does not affect the number of ioloop iterations required for the fetch_task to complete; this also means that an AsyncHTTPClient.fetch amounts to N + 1 (gen.moments plus the task scheduling) interactions with the ioloop (handling callbacks, polling fds, handling events)
gen.moment does not always mean the other tasks will be finished; rather, they get an opportunity to move one step closer to completion
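For readers coming from asyncio rather than Tornado: yield gen.moment plays roughly the same role as await asyncio.sleep(0) - it hands control back to the event loop for a single iteration so other callbacks can make progress. An analogy only, not Tornado code:

import asyncio

async def chunked_work():
    for chunk in range(3):
        # ... do one chunk of otherwise blocking work ...
        await asyncio.sleep(0)   # let the loop run other tasks for one step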
