How to create a global connection with asyncio and redis - python-3.x

I am new to Python 3 and asyncio, coming from gevent and Python 2.7.
How do I create a global connection for Redis that will be used by everything? E.g. I will have 1 process with e.g. 10 asyncio threads, but I don't want a separate connection per thread. Why? I will have e.g. 100 cores with 10 threads per core and don't want that many connections to Redis.
import asyncio
import asyncio_redis

async def worker():
    while True:
        data = await connection.brpop(['queue'], timeout=0)
        print(data)
        res = blocking_code(data)
        await connection.set('test', res)

# Process raw data here and all code is blocking
def blocking_code(data):
    results = {}
    return results

if __name__ == '__main__':
    connection = asyncio_redis.Connection.create(host='127.0.0.1', port=6379, poolsize=2)
    loop = asyncio.get_event_loop()
    tasks = [asyncio.ensure_future(worker()), asyncio.ensure_future(worker())]
    loop.run_until_complete(asyncio.gather(*tasks))
    connection.close()
Traceback (most recent call last):
  File "/Users//worker.py", line 14, in <module>
    loop.run_until_complete(example())
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/asyncio/base_events.py", line 466, in run_until_complete
    return future.result()
  File "/Users//worker.py", line 7, in example
    data = yield from connection.brpop(['queue'], timeout=0)
AttributeError: 'generator' object has no attribute 'brpop'
So in the above I have two tasks but I want only 1 redis connection

10 asyncio threads
Just in case: asyncio coroutines run in one thread. Concurrency is achieved by switching between coroutines during I/O operations.
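For instance, a minimal sketch showing two coroutines interleaving in a single thread (each await hands control back to the event loop):
import asyncio

async def tick(name):
    for i in range(3):
        print(name, i)
        await asyncio.sleep(0.1)  # control returns to the event loop here

loop = asyncio.get_event_loop()
loop.run_until_complete(asyncio.gather(tick('a'), tick('b')))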
Why doesn't your code work?
asyncio_redis.Connection.create is a coroutine; you should await it using yield from to get its result:
connection = yield from asyncio_redis.Connection.create(host='127.0.0.1', port=6379)
How to create a global connection
If you have only one connection, you'll probably get little benefit from using asyncio: concurrent requests need a pool of connections they can use. asyncio_redis has an easy way to do this, for example:
import asyncio
import asyncio_redis

@asyncio.coroutine
def main():
    connection = yield from asyncio_redis.Pool.create(host='127.0.0.1', port=6379, poolsize=10)
    try:
        # 3 requests running concurrently in a single thread, using connections from the pool:
        yield from asyncio.gather(
            connection.brpop(['queue:pixel'], timeout=0),
            connection.brpop(['queue:pixel'], timeout=0),
            connection.brpop(['queue:pixel'], timeout=0),
        )
    finally:
        connection.close()

if __name__ == '__main__':
    loop = asyncio.get_event_loop()
    loop.run_until_complete(main())
    loop.close()
Python 3.5+
If you're working with Python 3.5+, consider using the newer syntax for defining and awaiting coroutines.
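For example, the pool snippet above could be rewritten with async def and await roughly like this:
import asyncio
import asyncio_redis

async def main():
    connection = await asyncio_redis.Pool.create(host='127.0.0.1', port=6379, poolsize=10)
    try:
        # 3 requests running concurrently, each using a connection from the pool:
        await asyncio.gather(
            connection.brpop(['queue:pixel'], timeout=0),
            connection.brpop(['queue:pixel'], timeout=0),
            connection.brpop(['queue:pixel'], timeout=0),
        )
    finally:
        connection.close()

if __name__ == '__main__':
    loop = asyncio.get_event_loop()
    loop.run_until_complete(main())
    loop.close()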
Upd:
Blocking code (for example, code that needs a lot of CPU time) can't be used inside coroutines directly: it will freeze your event loop and you'll get no benefit from asyncio. This is not related to the number of connections.
You can use run_in_executor to run this code in a separate process without blocking the event loop:
from concurrent.futures import ProcessPoolExecutor

executor = ProcessPoolExecutor(max_workers=10)  # use number of cores here

async def worker():
    while True:
        data = await connection.brpop(['queue'], timeout=0)
        print(data)
        # await blocking_code from a separate process:
        loop = asyncio.get_event_loop()
        result = await loop.run_in_executor(executor, blocking_code, data)

Related

Asyncio big list of Task with sequential combine run_in_executor and standard Coroutine in each

I need to handle a list of 2500 IP addresses from a csv file, so I need to create_task from a coroutine 2500 times. Inside every coroutine, I first need to quickly check access to IP:PORT via the Python module "socket"; this is a synchronous function that I want to run in loop.run_in_executor(). Secondly, if IP:PORT is open, I need to connect to that socket via asyncssh.connect() to run some bash commands, and this is a standard asyncio coroutine. Then I need to collect the results of these bash commands into another csv file.
Additionally, there is an issue on Linux: the system cannot open more than 1024 connections at the same time. I think it may be solved by making a list of lists[1000] with asyncio.sleep(1) in between, or something like that.
I expected my tasks to be executed 1000 per second, but it is only about 20 per second. Why?
A small working code snippet with comments:
#!/usr/bin/env python3
import asyncio
import csv
import time
from pathlib import Path
import asyncssh
import socket
from concurrent.futures import ThreadPoolExecutor as Executor

PARALLEL_SESSIONS_COUNT = 1000
LEASES_ALL = Path("ip_list.csv")
PORT = 22
TIMEOUT = 1
USER = "testuser1"
PASSWORD = "123"

def is_open(ip, port, timeout):
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.settimeout(timeout)
    try:
        s.connect((ip, int(port)))
        s.shutdown(socket.SHUT_RDWR)
        return {"result": True, "error": "NoErr"}
    except Exception as ex:
        return {"result": False, "error": str(ex)}
    finally:
        s.close()

def get_leases_list():
    # Minimal csv content:
    # header must contain "IPAddress"
    # every other line is a concrete ip-address.
    result = []
    with open(LEASES_ALL, newline="") as csvfile_1:
        reader_1 = csv.DictReader(csvfile_1)
        result = list(reader_1)
    return result

def split_list(some_list, sublist_count):
    result = []
    while len(some_list) > sublist_count:
        result.append(some_list[:sublist_count])
        some_list = some_list[sublist_count:]
    result.append(some_list)
    return result

async def do_single_host(one_lease_dict):  # Function for each Task
    # Firstly
    IP = one_lease_dict["IPAddress"]
    loop = asyncio.get_event_loop()
    socket_check = await loop.run_in_executor(None, is_open, IP, PORT, TIMEOUT)
    print(socket_check, IP)
    # Secondly
    if socket_check["result"] == True:
        async with asyncssh.connect(host=IP, port=PORT, username=USER, password=PASSWORD, known_hosts=None) as conn:
            result = await conn.run("uname -r", check=True)
            print(result.stdout, end="")  # Just print without write in file at this point.

def aio_root():
    leases_list = get_leases_list()
    list_of_lists = split_list(leases_list, PARALLEL_SESSIONS_COUNT)
    r = []
    loop = asyncio.get_event_loop()
    for i in list_of_lists:
        for j in i:
            task = loop.create_task(do_single_host(j))
            r.append(task)
    group = asyncio.wait(r)
    loop.run_until_complete(group)  # At this line execute only by 20 in 1sec. Can't understand why :(
    loop.close()

def main():
    aio_root()

if __name__ == '__main__':
    main()
loop.run_in_executor signature:
awaitable loop.run_in_executor(executor, func, *args)
The default ThreadPoolExecutor is used if executor is None.
ThreadPoolExecutor document:
Changed in version 3.5: If max_workers is None or not given, it will default to the number of processors on the machine, multiplied by 5, assuming that ThreadPoolExecutor is often used to overlap I/O instead of CPU work and the number of workers should be higher than the number of workers for ProcessPoolExecutor.
Changed in version 3.8: Default value of max_workers is changed to min(32, os.cpu_count() + 4). This default value preserves at least 5 workers for I/O bound tasks. It utilizes at most 32 CPU cores for CPU bound tasks which release the GIL. And it avoids using very large resources implicitly on many-core machines.
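With those defaults, a typical 4-core machine gets a default executor of only about 20 threads, which likely explains the roughly 20 completions per second you observe: each is_open call holds a worker thread for up to TIMEOUT seconds. A rough sketch of passing a larger, explicit executor instead of None (reusing is_open, PORT, TIMEOUT and PARALLEL_SESSIONS_COUNT from the snippet above; the sizing is just an example to tune):
import asyncio
from concurrent.futures import ThreadPoolExecutor

# Sized so a whole batch of blocking socket checks can run at once (example value).
CHECK_EXECUTOR = ThreadPoolExecutor(max_workers=PARALLEL_SESSIONS_COUNT)

async def do_single_host(one_lease_dict):
    IP = one_lease_dict["IPAddress"]
    loop = asyncio.get_event_loop()
    # Pass the explicit executor instead of None (the default pool).
    socket_check = await loop.run_in_executor(CHECK_EXECUTOR, is_open, IP, PORT, TIMEOUT)
    print(socket_check, IP)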

aiohttp slowness with threading

I copied the code from How to run an aiohttp server in a thread?. It runs fine, so I added a one-second sleep. When I launch 10 requests at the same time, the average response time is 9 seconds. Why is that? Shouldn't all requests come back in a little over 1 second?
import asyncio
import threading
from aiohttp import web
import time

loop = asyncio.get_event_loop()

def say_hello(request):
    time.sleep(1)
    return web.Response(text='Hello, world')

app = web.Application(debug=True)
app.add_routes([web.get('/', say_hello)])
handler = app.make_handler()
server = loop.create_server(handler, host='127.0.0.1', port=8080)

def aiohttp_server():
    loop.run_until_complete(server)
    loop.run_forever()

t = threading.Thread(target=aiohttp_server)
t.start()
Use asyncio.sleep instead. Your setup is running coroutines that hard-sleep for 1 second before they yield to the event loop, so if you fire a bunch of them you have to wait that 1 second for each one serially.
You are starting the server in a second thread, but all of the requests are served from the same thread. The call to time.sleep blocks this thread and does not yield to the event loop, so the requests are effectively processed serially.
If you genuinely want to use sleep for a delay in the response you could use asyncio.sleep instead, which yields to the event loop.
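For instance, a non-blocking version of the handler above could look like this:
async def say_hello(request):
    await asyncio.sleep(1)  # yields to the event loop instead of blocking the thread
    return web.Response(text='Hello, world')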
However, I expect you are using it as a placeholder for another blocking function. In that case you need to run it in a different thread from the main server. The example below shows how to do this using run_in_executor and asyncio.wait.
import asyncio
from aiohttp import web
from concurrent.futures import ThreadPoolExecutor
import time

def blocking_func(seconds: int) -> int:
    time.sleep(seconds)
    return seconds

async def view_page(request: web.Request):
    seconds = int(request.query.get("seconds", 5))
    executor = request.app["executor"]
    loop = asyncio.get_event_loop()
    task = loop.run_in_executor(executor, blocking_func, seconds)
    completed, pending = await asyncio.wait([task])
    result = task.result()
    return web.Response(text=f"Waited {result} second(s).")

def create_app():
    app = web.Application()
    app.add_routes([web.get("/", view_page)])
    executor = ThreadPoolExecutor(max_workers=3)
    app["executor"] = executor
    return app

if __name__ == "__main__":
    app = create_app()
    web.run_app(app)
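To see the difference, a small client sketch (assuming the server above is running locally on port 8080) fires ten requests concurrently and prints the total wall-clock time, so you can compare the blocking and executor-backed versions:
import asyncio
import time
import aiohttp

async def fetch(session):
    async with session.get("http://localhost:8080/?seconds=1") as resp:
        return await resp.text()

async def main():
    start = time.time()
    async with aiohttp.ClientSession() as session:
        await asyncio.gather(*[fetch(session) for _ in range(10)])
    print(f"10 requests took {time.time() - start:.1f}s")

asyncio.run(main())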

Tornado + aioredis: why are my redis calls blocking?

I am trying to build, with Tornado and Redis, a simple system with two API endpoints:
an API reading a value from Redis, or waiting until this value exists (with BRPOP: value = yield from redis.brpop("test"))
an API writing this value (with LPUSH: redis.lpush("test", "the value")).
So I expect to be able to call these APIs in any order. Indeed, if I call 2. then 1., it works as expected: the call to 1. returns immediately with the value.
The problem is, if I call 1. then 2., both requests block and never return.
Meanwhile, while the requests block, I can still LPUSH/BRPOP directly in Redis, even on the same key. Similarly, I can call other handlers in Tornado. So I guess the blockage is neither in Redis nor in Tornado, but in my use of aioredis? Maybe the asyncio loop? But I can't see where I'm mistaken. Any tip?
Thanks for any help.
Here is my code:
import tornado.ioloop
import tornado.web
from tornado import web, gen
from tornado.options import options, define
import aioredis
import asyncio

class WaitValueHandler(tornado.web.RequestHandler):
    @asyncio.coroutine
    def get(self):
        redis = self.application.redis
        value = yield from redis.brpop("test")
        self.write("I received a value: %s" % value)

class WriteValueHandler(tornado.web.RequestHandler):
    @asyncio.coroutine
    def get(self):
        redis = self.application.redis
        res = yield from redis.lpush("test", "here is the value")
        self.write("Ok ")

class Application(tornado.web.Application):
    def __init__(self):
        tornado.ioloop.IOLoop.configure('tornado.platform.asyncio.AsyncIOMainLoop')
        handlers = [
            (r"/get", WaitValueHandler),
            (r"/put", WriteValueHandler)
        ]
        super().__init__(handlers, debug=True)

    def init_with_loop(self, loop):
        self.redis = loop.run_until_complete(
            aioredis.create_redis(('localhost', 6379), loop=loop)
        )

if __name__ == "__main__":
    application = Application()
    application.listen(8888)
    loop = asyncio.get_event_loop()
    application.init_with_loop(loop)
    loop.run_forever()
OK, I saw why; as the docs state:
Blocking operations (like blpop, brpop or long-running LUA scripts) in shared mode will block the connection and thus may lead to whole-program malfunction.
This blocking issue can easily be solved by using an exclusive connection for such operations:
redis = await aioredis.create_redis_pool(
    ('localhost', 6379),
    minsize=1,
    maxsize=1)

async def task():
    # Exclusive mode
    with await redis as r:
        await r.set('key', 'val')

asyncio.ensure_future(task())
asyncio.ensure_future(task())
# Both tasks will first acquire connection.
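Applied to the handler from the question, that could look roughly like this (a sketch, assuming self.application.redis is now created with aioredis.create_redis_pool and a Tornado version that supports native coroutines):
class WaitValueHandler(tornado.web.RequestHandler):
    async def get(self):
        redis = self.application.redis  # assumed to be a pool from create_redis_pool
        # Take an exclusive connection so the blocking BRPOP cannot stall
        # commands issued by other handlers sharing the pool.
        with await redis as r:
            value = await r.brpop("test")
        self.write("I received a value: %s" % value)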

Why coroutines cannot be used with run_in_executor?

I want to run a service that requests urls using coroutines and multiple threads. However, I cannot pass coroutines to the workers in the executor. See the code below for a minimal example of this issue:
import time
import asyncio
import concurrent.futures

EXECUTOR = concurrent.futures.ThreadPoolExecutor(max_workers=5)

async def async_request(loop):
    await asyncio.sleep(3)

def sync_request(_):
    time.sleep(3)

async def main(loop):
    futures = [loop.run_in_executor(EXECUTOR, async_request, loop)
               for x in range(10)]
    await asyncio.wait(futures)

loop = asyncio.get_event_loop()
loop.run_until_complete(main(loop))
Resulting in the following error:
Traceback (most recent call last):
File "co_test.py", line 17, in <module>
loop.run_until_complete(main(loop))
File "/usr/lib/python3.5/asyncio/base_events.py", line 387, in run_until_complete
return future.result()
File "/usr/lib/python3.5/asyncio/futures.py", line 274, in result
raise self._exception
File "/usr/lib/python3.5/asyncio/tasks.py", line 239, in _step
result = coro.send(None)
File "co_test.py", line 10, in main
futures = [loop.run_in_executor(EXECUTOR, req,loop) for x in range(10)]
File "co_test.py", line 10, in <listcomp>
futures = [loop.run_in_executor(EXECUTOR, req,loop) for x in range(10)]
File "/usr/lib/python3.5/asyncio/base_events.py", line 541, in run_in_executor
raise TypeError("coroutines cannot be used with run_in_executor()")
TypeError: coroutines cannot be used with run_in_executor()
I know that I could use the sync_request function instead of async_request; in that case I would get awaitable futures by sending the blocking function to another thread.
I also know I could call async_request ten times in the event loop. Something like the code below:
loop = asyncio.get_event_loop()
futures = [async_request(loop) for i in range(10)]
loop.run_until_complete(asyncio.wait(futures))
But in this case I would be using a single thread.
How could I use both scenarios, with the coroutines working across multiple threads? As you can see from the code, I am passing (and not using) the pool to async_request in the hope that I can write something that tells the worker to make a future, send it to the pool, and asynchronously (freeing the worker) wait for the result.
The reason I want to do that is to make the application scalable. Is it an unnecessary step? Should I simply have a thread per url and be done with it? Something like:
LEN = len(list_of_urls)
EXECUTOR = concurrent.futures.ThreadPoolExecutor(max_workers=LEN)
Is that good enough?
You have to create and set a new event loop in the thread context in order to run coroutines:
import asyncio
from concurrent.futures import ThreadPoolExecutor

def run(corofn, *args):
    loop = asyncio.new_event_loop()
    try:
        coro = corofn(*args)
        asyncio.set_event_loop(loop)
        return loop.run_until_complete(coro)
    finally:
        loop.close()

async def main():
    loop = asyncio.get_event_loop()
    executor = ThreadPoolExecutor(max_workers=5)
    futures = [
        loop.run_in_executor(executor, run, asyncio.sleep, 1, x)
        for x in range(10)]
    print(await asyncio.gather(*futures))
    # Prints: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

if __name__ == '__main__':
    loop = asyncio.get_event_loop()
    loop.run_until_complete(main())
From what I understood of the question, you are trying to use each thread to:
trigger a coroutine execution
be free to receive more coroutines to trigger
wait for everything to end asynchronously
However, as soon as you call the loop (be it the main one or a new one) to wait for results, it blocks the thread while waiting.
And, by using run_in_executor with a bunch of sync functions, the thread doesn't actually know whether there are more coroutines to dispatch in one go before it reaches the point where it waits on the loop.
If you want to dispatch a bunch of coroutines in such a way that each thread manages its own group of coroutines in its own event loop, the following code achieves a total time of about 1 second for a multithreaded wait on 10 async sleeps of 1 second each.
import asyncio
import threading
from asyncio import AbstractEventLoop
from concurrent.futures import ThreadPoolExecutor
from time import perf_counter
from typing import Dict, Set
import _asyncio

event_loops_for_each_thread: Dict[int, AbstractEventLoop] = {}

def run(corofn, *args):
    curr_thread_id = threading.current_thread().ident
    if curr_thread_id not in event_loops_for_each_thread:
        event_loops_for_each_thread[curr_thread_id] = asyncio.new_event_loop()
    thread_loop = event_loops_for_each_thread[curr_thread_id]
    coro = corofn(*args)
    return thread_loop.create_task(coro)

async def async_gather_tasks(all_tasks: Set[_asyncio.Task]):
    return await asyncio.gather(*all_tasks)

def wait_loops():
    # each thread will block waiting for all async calls of its specific async loop
    curr_thread_id = threading.current_thread().ident
    threads_event_loop = event_loops_for_each_thread[curr_thread_id]
    # I print the following to prove that each thread is waiting on its loop
    print(f'Thread {curr_thread_id} will wait its tasks.')
    return threads_event_loop.run_until_complete(async_gather_tasks(asyncio.all_tasks(threads_event_loop)))

async def main():
    loop = asyncio.get_event_loop()
    max_workers = 5
    executor = ThreadPoolExecutor(max_workers=max_workers)

    # dispatching async tasks for each thread.
    futures = [
        loop.run_in_executor(executor, run, asyncio.sleep, 1, x)
        for x in range(10)]
    # waiting for the threads to finish dispatching the async executions to their own event loops
    await asyncio.wait(futures)

    # at this point the async events were dispatched to each thread's event loop
    # in the lines below, you tell each worker thread to wait for all its async tasks to complete.
    futures = [
        loop.run_in_executor(executor, wait_loops)
        for _ in range(max_workers)
    ]
    print(await asyncio.gather(*futures))
    # it will print something like:
    # [[1, 8], [0], [6, 3, 9, 7], [4], [2, 5]]
    # each sub-set is the result of the tasks of a thread
    # it is non-deterministic, so it will return a different array of arrays each time you run.

if __name__ == '__main__':
    loop = asyncio.get_event_loop()
    start = perf_counter()
    loop.run_until_complete(main())
    end = perf_counter()
    duration_s = end - start
    # the print below proves that all threads are waiting for their tasks asynchronously
    print(f'duration_s={duration_s:.3f}')
I just wanted to write an answer similar to Tonsic's on how asyncio should actually be used in this situation, but much more succinctly (using some newer asyncio features as well).
What you're really looking for in this case is asyncio.gather, which lets you run many coroutines concurrently.
From your example, it thus becomes:
async def async_request():
    await asyncio.sleep(3)

async def main():
    await asyncio.gather(*[async_request() for _ in range(10)])
Now when we time it, it takes about 3 seconds, as desired, instead of 30 seconds:
>>> from time import time
>>> start = time()
>>> asyncio.run(main())
>>> time() - start
3.00907039642334
Furthermore, when using concurrent.futures alongside asyncio, you should identify which blocking code needs an executor and apply it only there to turn it into asynchronous code.
async def async_request():
    # The default executor is a `ThreadPoolExecutor`.
    # In python >= 3.9, this can be shortened to `asyncio.to_thread(sync_request)`.
    await asyncio.get_running_loop().run_in_executor(None, sync_request)
From that point, you can then manage your executors by treating these as coroutines with asyncio, using things like asyncio.gather, as originally shown.
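Putting the two pieces together, a rough end-to-end sketch (sync_request here is just a stand-in for whatever blocking call you actually have):
import asyncio
import time

def sync_request(_=None):
    # stand-in for a blocking HTTP request
    time.sleep(3)

async def async_request():
    # off-load the blocking call onto the default ThreadPoolExecutor
    await asyncio.get_running_loop().run_in_executor(None, sync_request)

async def main():
    # ten blocking calls run in worker threads and are awaited concurrently
    await asyncio.gather(*[async_request() for _ in range(10)])

asyncio.run(main())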

What does yield None (tornado.gen.moment) mean?

I need an async subprocess lock in my web application.
I wrote the following code:
r = redis.Redis('localhost')
pipe = r.pipeline()

is_locked = False
while not is_locked:
    try:
        pipe.watch(lock_name)
        current_locked = int(pipe.get(lock_name))
        if current_locked == 0:
            pipe.multi()
            pipe.incr(lock_name)
            pipe.execute()
            is_locked = True
        else:
            yield None
    except redis.WatchError:
        yield None
return True
The documentation says that tornado.gen.moment (yield None since version 4.5) is a special object which may be yielded to allow the IOLoop to run for one iteration. How does it work? Is it the next iteration with another Future object (from another request) or not? Is this correct yield None usage?
gen.moment is just a resolved Future object added to the ioloop with a callback; this allows one iteration of the ioloop to run.
The yield None is converted to gen.moment using convert_yielded in the coroutine's gen.Runner.
Each iteration of the ioloop (basically a while True) does things like:
run callbacks scheduled with the ioloop's add_callback or add_callback_from_signal
run callbacks scheduled with the ioloop's add_timeout
poll for fd events (e.g. wait for a file descriptor to become ready to read or write); of course, to not block the ioloop, the poll has a timeout
run the handlers of the ready fds
So, getting to the point: yield gen.moment allows all of the above to happen once (one iteration).
As an example, let's schedule an async task - an httpclient fetch - that requires a running ioloop to finish. On the other hand, there will also be a blocking function (time.sleep).
import time

from tornado import gen
from tornado.ioloop import IOLoop
from tornado.httpclient import AsyncHTTPClient

@gen.coroutine
def fetch_task():
    client = AsyncHTTPClient()
    yield client.fetch('http://google.com')
    print('fetch_task finished')

@gen.coroutine
def blocking():
    start_time = time.time()
    counter = 1
    while True:
        time.sleep(5)
        print('blocking for %f' % (time.time() - start_time))
        yield gen.moment
        print('gen.moment counter %d' % counter)
        counter += 1

@gen.coroutine
def main():
    fetch_task()
    yield blocking()

IOLoop.instance().run_sync(main)
Observations:
without a yield gen.moment, fetch_task won't finish
increasing or decreasing the value of time.sleep does not affect the number of ioloop iterations required for fetch_task to complete. This also means that an AsyncHTTPClient.fetch is N + 1 (gen.moments + the task schedule) interactions with the ioloop (handling callbacks, polling fds, handling events).
gen.moment does not always mean that the other tasks will be finished; rather, they get an opportunity to move one step closer to completion.
