aiohttp slowness with threading

I copied the code from How to run an aiohttp server in a thread?. It runs fine. Then I added a one-second sleep to the handler. When I launch 10 requests at the same time, the average response time is 9 seconds. Why is that? Shouldn't all requests come back in a little over 1 second?
import asyncio
import threading
from aiohttp import web
import time

loop = asyncio.get_event_loop()

def say_hello(request):
    time.sleep(1)
    return web.Response(text='Hello, world')

app = web.Application(debug=True)
app.add_routes([web.get('/', say_hello)])
handler = app.make_handler()
server = loop.create_server(handler, host='127.0.0.1', port=8080)

def aiohttp_server():
    loop.run_until_complete(server)
    loop.run_forever()

t = threading.Thread(target=aiohttp_server)
t.start()

Use asyncio.sleep instead. Your setup runs coroutines that block for a full second before they yield to the event loop, so if you fire a bunch of requests at it, each one's 1-second sleep is served serially.
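For illustration, a minimal sketch of the non-blocking version of the question's handler (same say_hello and imports as above):

async def say_hello(request):
    # asyncio.sleep suspends this coroutine and lets the event loop serve
    # other requests, instead of blocking the whole thread like time.sleep
    await asyncio.sleep(1)
    return web.Response(text='Hello, world')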

You are starting the server in a second thread, but all of the requests are served from that same thread. The call to time.sleep blocks the thread and does not yield to the event loop, so the requests are effectively processed serially.
If you genuinely want a sleep-based delay in the response, you can use asyncio.sleep instead, which yields to the event loop.
However, I expect you are using it as a placeholder for some other blocking function. In that case you need to run the blocking work in a thread other than the server's thread. The example below shows how to do this using run_in_executor and asyncio.wait.
import asyncio
from aiohttp import web
from concurrent.futures import ThreadPoolExecutor
import time

def blocking_func(seconds: int) -> int:
    time.sleep(seconds)
    return seconds

async def view_page(request: web.Request):
    seconds = int(request.query.get("seconds", 5))
    executor = request.app["executor"]
    loop = asyncio.get_event_loop()
    task = loop.run_in_executor(executor, blocking_func, seconds)
    completed, pending = await asyncio.wait([task])
    result = task.result()
    return web.Response(text=f"Waited {result} second(s).")

def create_app():
    app = web.Application()
    app.add_routes([web.get("/", view_page)])
    executor = ThreadPoolExecutor(max_workers=3)
    app["executor"] = executor
    return app

if __name__ == "__main__":
    app = create_app()
    web.run_app(app)
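As a side note on the design, asyncio.wait on a single task followed by task.result() can be collapsed: awaiting the executor future directly behaves the same. A sketch of just the view function under that simplification:

async def view_page(request: web.Request):
    seconds = int(request.query.get("seconds", 5))
    loop = asyncio.get_event_loop()
    # awaiting the executor future directly is equivalent to wait() + result()
    result = await loop.run_in_executor(request.app["executor"], blocking_func, seconds)
    return web.Response(text=f"Waited {result} second(s).")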

Related

Can I stop waiting for threads to finish if one of them produced results?

I'm making GET requests to a few hundred different API endpoints on different servers. One of these endpoints has some information that I want to fetch and return.
As soon as any of these requests returns something to me, I want to terminate the other threads and exit. Some requests are almost instant, some can take up to 20 seconds to finish.
If I happen to find the info in 2 seconds, I don't want to wait 20 seconds before I can resume work.
Currently I'm doing things like this:
threads = list()
for s in silos:  # here I create all the requests
    t = Thread(target=process_request, args=(my, args, here))
    t.name = "{} - {}".format(some, name)
    threads.append(t)
Then I do:
print("Threads: {}".format(len(threads))) # 100 - 250 of them
[ t.start() for t in threads ]
[ t.join() for t in threads ]
process_request() simply makes the GET request and stores the result inside a dict if the status_code == 200.
I'm using the requests and threading modules.
If you use a multiprocessing pool, then you can terminate the pool as soon as the first response arrives:
import multiprocessing as mp
import time

pool = None

def make_get_request(inputs):
    print('Making get request with inputs ' + str(inputs))
    time.sleep(2)
    return 'dummy response for inputs ' + str(inputs)

def log_response(response):
    print("Got response = " + response)
    pool.terminate()

def main():
    global pool
    pool = mp.Pool()
    for i in range(10):
        pool.apply_async(make_get_request, args=(i,), callback=log_response)
    pool.close()
    pool.join()

if __name__ == '__main__':
    main()
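If the real work is the question's GET requests, a rough sketch of how they could be plugged into this pattern; the endpoint URLs, the requests usage, and the raise_for_status/error_callback handling are my assumptions, not part of the answer:

import multiprocessing as mp
import requests

endpoints = ['http://127.0.0.1:8000/a', 'http://127.0.0.1:8000/b']  # placeholder URLs

pool = None

def make_get_request(url):
    resp = requests.get(url, timeout=20)
    resp.raise_for_status()              # failures go to error_callback instead
    return url, resp.text

def log_response(result):
    url, body = result
    print('First successful response came from ' + url)
    pool.terminate()                     # stop waiting for the slower endpoints

def main():
    global pool
    pool = mp.Pool()
    for url in endpoints:
        pool.apply_async(make_get_request, args=(url,),
                         callback=log_response,
                         error_callback=lambda exc: None)  # ignore failed endpoints
    pool.close()
    pool.join()

if __name__ == '__main__':
    main()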

Why does calling datetime hang a thread?

I am attempting to make use of concurrent.futures.ThreadPoolExecutor for the first time. One of my threads (level_monitor) consistently hangs on a call to datetime.now().strftime(), and on another hardware-specific function. For now I am assuming it is the same fundamental problem in both cases.
I've created a reproducible minimum example.
from concurrent.futures import ThreadPoolExecutor
import socket
from time import sleep

status = 'TRY AGAIN\n'

def get_level():
    print('starting get_level()')
    while True:
        sleep(2)
        now = datetime.now().strftime('%d-%b-%Y %H:%M:%S')
        print('get_level woken...')

# report status when requested
def serve_level():
    print('starting serve_level()')
    si = socket.socket()
    port = 12345
    si.bind(('127.0.0.1', port))
    si.listen()
    print('socket is listening')
    while True:
        ci, addr = si.accept()
        print('accepted client connection from ', addr)
        with ci:
            req = ci.recv(1024)
            print(req)
            str = status.encode('utf-8')
            ci.send(str)
        ci.close()

if __name__ == '__main__':
    nthreads = 5
    with ThreadPoolExecutor(nthreads) as executor:
        level_monitor = executor.submit(get_level)
        server = executor.submit(serve_level)
When I run it I see the serve_level thread works fine. I can talk to that thread using telnet. I can see the level_monitor thread starts too, but then it hangs before print('get_level woken...'). If I comment out the call to datetime then the thread behaves as expected.
I am sure that when I find out why I will have found out a lot.

How to correctly use async-await with thread pool in Python 3

I want to achieve the same effect as
# Code 1
from multiprocessing.pool import ThreadPool as Pool
from time import sleep, time

def square(a):
    print('start', a)
    sleep(a)
    print('end', a)
    return a * a

def main():
    p = Pool(2)
    queue = list(range(4))
    start = time()
    results = p.map(square, queue)
    print(results)
    print(time() - start)

if __name__ == "__main__":
    main()
with async functions like
# Code 2
from multiprocessing.pool import ThreadPool as Pool
from time import sleep, time
import asyncio

async def square(a):
    print('start', a)
    sleep(a)  # await asyncio.sleep same effect
    print('end', a)
    return a * a

async def main():
    p = Pool(2)
    queue = list(range(4))
    start = time()
    results = p.map_async(square, queue)
    results = results.get()
    results = [await result for result in results]
    print(results)
    print(time() - start)

if __name__ == "__main__":
    loop = asyncio.get_event_loop()
    loop.run_until_complete(main())
    loop.close()
Currently Code 1 takes 4 seconds and Code 2 takes 6 seconds, which means it is not running in parallel. What is the correct and cleanest way to run multiple async functions in parallel?
It should preferably be Python 3.6 compatible. Thank you!
map_async() is not the same "async" as in async def. If you feed it an async def function, it won't actually run the coroutine but will return a coroutine object immediately (try calling such a function without await). You then awaited the 4 coroutines one by one, which amounts to sequential execution, and ended up with 6 seconds.
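To see the first point concretely, calling an async def function without awaiting it only produces a coroutine object; nothing inside it runs:

>>> async def square(a):
...     return a * a
...
>>> square(2)  # not executed, just a coroutine object
<coroutine object square at 0x...>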
Please see following example:
from time import time
import asyncio
from asyncio.locks import Semaphore

semaphore = Semaphore(2)

async def square(a):
    async with semaphore:
        print('start', a)
        await asyncio.sleep(a)
        print('end', a)
        return a * a

async def main():
    start = time()
    tasks = []
    for a in range(4):
        tasks.append(asyncio.ensure_future(square(a)))
    await asyncio.wait(tasks)
    print([t.result() for t in tasks])
    print(time() - start)

if __name__ == "__main__":
    loop = asyncio.get_event_loop()
    loop.run_until_complete(main())
    loop.close()
The Semaphore acts much like the ThreadPool: it allows only 2 coroutines at a time inside the async with semaphore: block, so the four tasks finish in about 4 seconds, matching Code 1.

How to create a global connection with asyncio and redis

I am new to Python 3 and asyncio, coming from gevent and 2.7...
How do I create a global connection to Redis that everything will use? E.g. I will have one process with, say, 10 asyncio threads, but I don't want a separate connection per thread. Why? I will have e.g. 100 cores with 10 threads per core and don't want that many connections to Redis.
import asyncio
import asyncio_redis

async def worker():
    while True:
        data = await connection.brpop(['queue'], timeout=0)
        print(data)
        res = blocking_code(data)
        await connection.set('test', res)

# Process raw data here and all code is blocking
def blocking_code(data):
    results = {}
    return results

if __name__ == '__main__':
    connection = asyncio_redis.Connection.create(host='127.0.0.1', port=6379, poolsize=2)
    loop = asyncio.get_event_loop()
    tasks = [asyncio.ensure_future(worker()), asyncio.ensure_future(worker())]
    loop.run_until_complete(asyncio.gather(*tasks))
    connection.close()
Traceback (most recent call last):
  File "/Users//worker.py", line 14, in <module>
    loop.run_until_complete(example())
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/asyncio/base_events.py", line 466, in run_until_complete
    return future.result()
  File "/Users//worker.py", line 7, in example
    data = yield from connection.brpop(['queue'], timeout=0)
AttributeError: 'generator' object has no attribute 'brpop'
So in the above I have two tasks, but I want only 1 Redis connection.
10 asyncio threads
Just in case: asyncio coroutines run in one thread. Concurrency is achieved by switching between coroutines during I/O operations.
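A tiny illustration of that point (my example, not from the answer): two coroutines interleave, yet both report the same thread:

import asyncio
import threading

async def tick(name):
    for _ in range(2):
        # both coroutines print the same thread name, e.g. MainThread
        print(name, 'running in', threading.current_thread().name)
        await asyncio.sleep(0.1)

loop = asyncio.get_event_loop()
loop.run_until_complete(asyncio.gather(tick('a'), tick('b')))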
Why doesn't your code work?
asyncio_redis.Connection.create is a coroutine; you should await it using yield from to get the result:
connection = yield from asyncio_redis.Connection.create(host='127.0.0.1', port=6379)
How to create a global connection
If you have only one connection, you'll probably get little benefit from using asyncio: concurrent requests need a pool of connections they can draw from. asyncio_redis has an easy way to do this, for example:
import asyncio
import asyncio_redis

@asyncio.coroutine
def main():
    connection = yield from asyncio_redis.Pool.create(host='127.0.0.1', port=6379, poolsize=10)
    try:
        # 3 requests running concurrently in single thread using connections from pool:
        yield from asyncio.gather(
            connection.brpop(['queue:pixel'], timeout=0),
            connection.brpop(['queue:pixel'], timeout=0),
            connection.brpop(['queue:pixel'], timeout=0),
        )
    finally:
        connection.close()

if __name__ == '__main__':
    loop = asyncio.get_event_loop()
    loop.run_until_complete(main())
    loop.close()
Python 3.5+
If you're working with Python 3.5+, consider using the newer syntax for defining and awaiting coroutines.
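For instance, the pool example above rewritten with async/await; the asyncio_redis calls are unchanged, only the syntax differs:

import asyncio
import asyncio_redis

async def main():
    connection = await asyncio_redis.Pool.create(host='127.0.0.1', port=6379, poolsize=10)
    try:
        # the same 3 concurrent requests, awaited with the newer syntax
        await asyncio.gather(
            connection.brpop(['queue:pixel'], timeout=0),
            connection.brpop(['queue:pixel'], timeout=0),
            connection.brpop(['queue:pixel'], timeout=0),
        )
    finally:
        connection.close()

if __name__ == '__main__':
    loop = asyncio.get_event_loop()
    loop.run_until_complete(main())
    loop.close()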
Upd:
Blocking code (for example, code that needs a lot of CPU time) can't be used inside coroutines directly: it will freeze your event loop and you'll get no benefit from asyncio. This isn't related to the number of connections.
You can use run_in_executor to run this code in a separate process without blocking the event loop:
from concurrent.futures import ProcessPoolExecutor

executor = ProcessPoolExecutor(max_workers=10)  # use number of cores here

async def worker():
    while True:
        data = await connection.brpop(['queue'], timeout=0)
        print(data)
        # await blocking_code from separate process:
        loop = asyncio.get_event_loop()
        result = await loop.run_in_executor(executor, blocking_code, data)
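Putting the pieces together, a rough sketch of one shared pool connection feeding both workers while the blocking work runs in the process executor; the wiring in main() is my own, not part of the original answer:

import asyncio
import asyncio_redis
from concurrent.futures import ProcessPoolExecutor

executor = ProcessPoolExecutor(max_workers=10)
connection = None

def blocking_code(data):
    # CPU-heavy processing; runs in a separate process
    return {}

async def worker():
    while True:
        data = await connection.brpop(['queue'], timeout=0)
        loop = asyncio.get_event_loop()
        result = await loop.run_in_executor(executor, blocking_code, data)
        print(result)

async def main():
    global connection
    # one shared pool used by every worker; Pool.create must be awaited
    connection = await asyncio_redis.Pool.create(host='127.0.0.1', port=6379, poolsize=2)
    await asyncio.gather(worker(), worker())

if __name__ == '__main__':
    asyncio.get_event_loop().run_until_complete(main())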

What does yield None mean (tornado.gen.moment)?

I need an async inter-process lock in my web application.
I wrote the following code:
r = redis.Redis('localhost')
pipe = r.pipeline()
is_locked = False
while not is_locked:
    try:
        pipe.watch(lock_name)
        current_locked = int(pipe.get(lock_name))
        if current_locked == 0:
            pipe.multi()
            pipe.incr(lock_name)
            pipe.execute()
            is_locked = True
        else:
            yield None
    except redis.WatchError:
        yield None
return True
The documentation says that tornado.gen.moment (yield None since version 4.5) is a special object which may be yielded to allow the IOLoop to run for one iteration. How does it work? Is it the next iteration with another Future object (from another request) or not? Is this a correct use of yield None?
gen.moment is just a resolved Future object added to the ioloop with a callback; yielding it allows one iteration of the ioloop to run.
The yield None is converted to gen.moment using convert_yielded in the coroutine's gen.Runner.
The ioloop (basically a while True loop) does the following things on each iteration:
run callbacks scheduled with the ioloop's add_callback or add_callback_from_signal
run callbacks scheduled with the ioloop's add_timeout
poll for fd events (e.g. wait for a file descriptor to be ready to write or read); of course, so as not to block the ioloop, the poll has a timeout
run the handlers of ready fds
So, getting to the point: yield gen.moment allows all of the things above to happen once (one iteration).
As an example, let's schedule an async task - an httpclient fetch that needs a running ioloop to finish. Alongside it there is also a blocking function (time.sleep).
import time
from tornado import gen
from tornado.ioloop import IOLoop
from tornado.httpclient import AsyncHTTPClient

@gen.coroutine
def fetch_task():
    client = AsyncHTTPClient()
    yield client.fetch('http://google.com')
    print('fetch_task finished')

@gen.coroutine
def blocking():
    start_time = time.time()
    counter = 1
    while True:
        time.sleep(5)
        print('blocking for %f' % (time.time() - start_time))
        yield gen.moment
        print('gen.moment counter %d' % counter)
        counter += 1

@gen.coroutine
def main():
    fetch_task()
    yield blocking()

IOLoop.instance().run_sync(main)
Observations:
without a yield gen.moment, the fetch_task will never finish
increasing or decreasing the value of time.sleep does not affect the number of ioloop iterations required for fetch_task to complete; this also means that an AsyncHTTPClient.fetch needs N + 1 (gen.moments plus the task scheduling) interactions with the ioloop (handling callbacks, polling fds, handling events)
gen.moment does not always mean that the other tasks will be finished; rather, they get an opportunity to move one step closer to completion
