Need access to flask request context & db session after they exit

I'm writing a flask app that streams content out to the user, and I'm trying to manipulate the db while this streaming is happening. Here's some example code (simplified):
def work_hard(obj):
    yield 'About to do a lot of work...'
    obj.status = do_a_lot_of_work_very_slowly()
    yield obj.status
    db.session.commit()
    obj.more = more_slow_stuff()
    yield obj.more
    db.session.commit()
    yield 'Hard work is done!'

@app.route('/log/<int:objid>/work_hard', methods=['POST'])
def perform_action(objid):
    obj = MyModel.query.get(objid)
    return Response(work_hard(obj), mimetype='text/html')
This code gives the error Instance <MyModel at 0x7f5555f046a0> is not bound to a Session; attribute refresh operation cannot proceed, but if I call db.session.commit() inside perform_action() instead of work_hard(), it works. Similarly, if I try to access flask's request instance, it works in perform_action() but not work_hard() (it complains that I'm trying to access the request outside of a request context).
I assume that these are both because work_hard() is executing after perform_action() has returned. Is it possible to somehow prolong the request context to include work_hard()? So far I've just been passing individual values from request to work_hard(), and that worked to a point, but now I need to commit to the db and I'm not sure how to fix the db session here.
I can't just call db.session.commit() from perform_action(), I really do need to be able to make multiple updates in the db in real-time as the output is streaming to the HTTP client.
If this turns out not to be possible, my backup plan is to stream output from a subprocess, and then in the subprocess I'll connect to the db from there, but I'd prefer to do it all within the same process if possible.
Thanks!

You can use stream_with_context() to keep the context (and hence the session) around while the generator runs:
return Response(stream_with_context(work_hard(obj)), ...)
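For reference, a minimal sketch of the question's route with that applied (assuming the same work_hard() generator and MyModel as above):

from flask import Response, stream_with_context

@app.route('/log/<int:objid>/work_hard', methods=['POST'])
def perform_action(objid):
    obj = MyModel.query.get(objid)
    # stream_with_context keeps the request context (and the session bound
    # to it) alive until the wrapped generator is exhausted
    return Response(stream_with_context(work_hard(obj)), mimetype='text/html')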

Access the session within an application context, using with app.app_context():
def work_hard(obj):
    yield 'About to do a lot of work...'
    obj.status = "blub"
    yield obj.status
    with app.app_context():
        db.session.commit()
    obj.more = "fish"
    yield obj.more
    with app.app_context():
        db.session.commit()
    yield 'Hard work is done!'

@app.route('/log/<int:objid>/work_hard', methods=['POST'])
def perform_action(objid):
    obj = MyModel.query.get(objid)
    return Response(work_hard(obj), mimetype='text/html')

Related

Using asyncio for doing a/b testing in Python

Let's say there's some API that's running in production already, and you created another API which you kind of want to A/B test using the incoming requests that are hitting the production API. (I am aware of people doing traffic splits by keeping two different API versions for A/B testing, etc.)
As soon as you get the incoming request for your production API, you make an async request to your new API and then carry on with the rest of the code for the production API; then, just before returning the final response to the caller, you check whether you have the result computed for that async task you created before. If it's available, you return that instead of the current API's response.
I am wondering, what's the best way to do something like this? Do we try to write a decorator for this, or something else? I am a bit worried about the many edge cases that can happen if we use async here. Anyone have any pointers on making the code or the whole approach better?
Thanks for your time!
Some pseudo-code for the approach above,
import asyncio

def call_old_api():
    pass

async def call_new_api():
    pass

async def main():
    task = asyncio.Task(call_new_api())
    oldResp = call_old_api()
    resp = await task
    if task.done():
        return resp
    else:
        task.cancel()  # maybe
        return oldResp

asyncio.run(main())
You can't just execute call_old_api() inside asyncio's coroutine. There's a detailed explanation of why here. Please ensure you understand it, because depending on how your server works you may not be able to do what you want (to run an async API on a sync server while preserving the point of writing async code, for example).
In case you understand what you're doing and you have an async server, you can call the old sync API in a thread and use a task to run the new API:
task = asyncio.Task(call_new_api())
oldResp = await in_thread(call_old_api)  # pass the function itself; in_thread calls it
if task.done():
    return task.result()  # keep in mind that task.result() may raise an exception if the new API request failed, but that's probably ok for you
else:
    task.cancel()  # yes, but you should take care of the cancelling, see - https://stackoverflow.com/a/43810272/1113207
    return oldResp
I think you can go even further: instead of always waiting for the old API to complete, you can run both APIs concurrently and return whichever finishes first (in case the new API works faster than the old one). With all the checks and suggestions above, it should look something like this:
import asyncio
import random
import time
from contextlib import suppress

def call_old_api():
    time.sleep(random.randint(0, 2))
    return "OLD"

async def call_new_api():
    await asyncio.sleep(random.randint(0, 2))
    return "NEW"

async def in_thread(func):
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(None, func)

async def ensure_cancelled(task):
    task.cancel()
    with suppress(asyncio.CancelledError):
        await task

async def main():
    old_api_task = asyncio.Task(in_thread(call_old_api))
    new_api_task = asyncio.Task(call_new_api())
    done, pending = await asyncio.wait(
        [old_api_task, new_api_task], return_when=asyncio.FIRST_COMPLETED
    )
    if pending:
        for task in pending:
            await ensure_cancelled(task)
    finished_task = done.pop()
    res = finished_task.result()
    print(res)

asyncio.run(main())

How to terminate a REST request gracefully

I want to terminate a REST request coming to the server without further processing if input params are missing.
Currently this is the implementation, which I think is not very good for verify_required_params().
I want to terminate this request without returning any value from verify_required_params() in case of missing params; otherwise the flow should continue.
Running this on a flask server, and open to including any new package for the best/optimized approach.
Can someone please suggest an optimized way to do this?
@app.route('/is_registered', methods=['POST'])
def is_registered():
    _json = request.get_json()
    keys = _json.keys()
    customer = Customer()
    if verify_required_params(['mobile'], keys) is True:
        _mobile = _json['mobile']
        validated = validate_mobile(_mobile)
        registered = customer.find(_mobile)
        if not validated:
            response = get_response('MOBILE_NUMBER_NOT_VALID')
            return jsonify(response)
        if not registered:
            response = get_response('MOBILE_NUMBER_NOT_REGISTERED')
            return jsonify(response)
        response = get_response('MOBILE_NUMBER_REGISTERED')
        return jsonify(response)
    else:
        return verify_required_params(['mobile'], keys)
def verify_required_params(required, received):
    required = set(required)
    received = set(received)
    missing = list(sorted(required - received))
    data = {"missing_key(s)": missing}
    # response = app.response_class(
    #     response=json.dumps(data),
    #     status=200,
    #     mimetype='application/json'
    # )
    if missing:
        return jsonify(data)
    return True
🎶 You say it works in a RESTful way, then your errors come back as 200 OK 🎶
In REST, your URL should encode all the information about your entity. In your case, you are identifying a client by their phone number, and you are getting rather than updating information about them, so your endpoint should look like GET /client/<phonenumber>/registered. That way, a request can't omit this information without going to a different endpoint.
In short, your code will be replaced with:
@app.route('/client/<mobile>/registered', methods=['GET'])
def is_registered(mobile):
    if not mobile.isdecimal():
        return jsonify({'error': 'mobile is not a number'}), 400  # Bad Request
    customer = Customer()
    registered = bool(customer.find(mobile))
    # does it make sense to have a customer who is not registered yet?
    # if not, use:
    if not registered:
        return jsonify({'error': 'client not found'}), 404  # Not Found
    validated = validate_mobile(mobile)
    return jsonify({'validated': validated, 'registered': registered})
In addition, it's better to make the validation function a decorator. That way it gets called before the actual business logic of the function. For your example of checking whether request.get_json() contains the proper fields, this is how it would look:
import functools

def requires_fields(fields):
    required_fields = set(fields)
    def wrapper(func):
        @functools.wraps(func)
        def decorated(*args, **kwargs):
            current_fields = set(request.get_json().keys())
            missing_fields = required_fields - current_fields
            if missing_fields:
                return jsonify({'error': 'missing fields', 'fields': list(missing_fields)}), 400  # Bad Request
            resp = func(*args, **kwargs)
            return resp
        return decorated
    return wrapper

# usage:
@app.route('/comment', methods=['POST'])
@requires_fields(['author', 'post_id', 'body'])
def create_comment():
    data = request.get_json()
    id = FoobarDB.insert('comment', author=data['author'], post_id=data['post_id'], body=data['body'])
    return jsonify({'new_id': id}), 201  # Created
If you must leave it the way it is now, then in order to not return data from the validation function, you must raise an HTTPException. The standard helper for that is flask.abort(code).
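For illustration, a sketch of that abort-based variant, keeping the OP's response shape (the reworked helper signature is a choice here, not something required by Flask):

from flask import abort, jsonify, make_response, request

def verify_required_params(required):
    """Abort with a 400 and a JSON body if any required key is missing."""
    received = set(request.get_json().keys())
    missing = sorted(set(required) - received)
    if missing:
        # abort() raises an HTTPException, so the calling view never resumes
        abort(make_response(jsonify({"missing_key(s)": missing}), 400))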

How to manage sessions with aiohttp?

I'm using aiohttp with asyncio to make batches of requests. My first approach was to create a session inside the fetch() function (which starts an asyncio.gather job), and then to pass the session object around to the functions that perform the POST requests (get_info):
def batch_starter(item_list):
    return_value = loop.run_until_complete(fetch(item_list))
    return return_value

async def fetch(item_list):
    async with aiohttp.ClientSession() as session:  # <- session started here
        results = await asyncio.gather(*[asyncio.ensure_future(get_info(session, item)) for item in item_list])
        return results

async def get_info(session, item):  # <- session passed to the function
    async with session.post("some_url", data={"id": item}) as resp:
        html = await resp.json()
        some_info = html.get('info')
        return some_info
but thanks to my confusion, I am now leaning towards instantiating the session right away once the script is imported, like below, at the top of the file:
import asyncio
import aiohttp
import json

loop = asyncio.new_event_loop()
asyncio.set_event_loop(loop)
session = aiohttp.ClientSession()  # <- session started at top of file

def batch_starter(item_list):
    return_value = loop.run_until_complete(fetch(item_list))
    return return_value

async def fetch(item_list):
    results = await asyncio.gather(*[asyncio.ensure_future(get_info(item)) for item in item_list])
    return results

async def get_info(item):
    async with session.post("some_url", data={"id": item}) as resp:  # <- session from outer scope is used
        html = await resp.json()
        some_info = html.get('info')
        return some_info
The docs explain that opening a session with every request is a "very bad" idea (obviously). But this is stated right after an example which apparently does exactly that (the first approach)? Which one of these is correct, and how does the session behave when it is used like in the second approach, at the top of the file? Wouldn't the session just stay open forever with the second approach?
The batch_starter() function is not going to be called a lot, but with 9000+ items in the item_list. I assumed this was already reducing the number of sessions to one (per gather job), but apparently this is the "bad idea" example and needs to be corrected? The docs are a bit unclear about this...
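No answer is quoted here, but two points can be hedged from the docs' own framing: the warning targets a session per request, whereas one session shared across a whole batch (what the first approach's async with block already does) is the usual pattern; and yes, the module-level session of the second approach stays open until the process exits unless it is closed explicitly, which has to happen on the loop since ClientSession.close() is a coroutine. A sketch of that cleanup, reusing the question's loop and session names:

def shutdown():
    # ClientSession.close() is a coroutine, so it must be driven by the loop;
    # without this, the module-level session (and its connector) stays open
    loop.run_until_complete(session.close())
    loop.close()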

Failure on unit tests with pytest, tornado and aiopg: any query fails

I've a REST API running on Python 3.7 + Tornado 5, with postgresql as database, using aiopg with SQLAlchemy core (via the aiopg.sa binding). For the unit tests, I use py.test with pytest-tornado.
All the tests go OK as long as no query to the database is involved; whenever one is, I get this:
RuntimeError: Task <Task pending coro=<...> cb=[IOLoop.add_future.<locals>.<lambda>() at venv/lib/python3.7/site-packages/tornado/ioloop.py:719]> got Future <Future pending> attached to a different loop
The same code works fine outside of the tests; I'm capable of handling hundreds of requests so far.
This is part of an @auth decorator which will check the Authorization header for a JWT token, decode it and get the user's data and attach it to the request; this is the part for the query:
partner_id = payload['partner_id']
provided_scopes = payload.get("scope", [])
for scope in scopes:
    if scope not in provided_scopes:
        logger.error(
            'Authentication failed, scopes are not compliant - '
            'required: {} - '
            'provided: {}'.format(scopes, provided_scopes)
        )
        raise ForbiddenException(
            "insufficient permissions or wrong user."
        )
db = self.settings['db']
partner = await Partner.get(db, username=partner_id)
# The user is authenticated at this stage, let's add
# the user info to the request so it can be used
if not partner:
    raise UnauthorizedException('Unknown user from token')
p = Partner(**partner)
setattr(self.request, "partner_id", p.uuid)
setattr(self.request, "partner", p)
The .get() async method from Partner comes from the Base class for all models in the app. This is the .get method implementation:
@classmethod
async def get(cls, db, order=None, limit=None, offset=None, **kwargs):
    """
    Get one instance that will match the criteria
    :param db:
    :param order:
    :param limit:
    :param offset:
    :param kwargs:
    :return:
    """
    if len(kwargs) == 0:
        return None
    if not hasattr(cls, '__tablename__'):
        raise InvalidModelException()
    tbl = cls.__table__
    instance = None
    clause = cls.get_clause(**kwargs)
    query = (tbl.select().where(text(clause)))
    if order:
        query = query.order_by(text(order))
    if limit:
        query = query.limit(limit)
    if offset:
        query = query.offset(offset)
    logger.info(f'GET query executing:\n{query}')
    try:
        async with db.acquire() as conn:
            async with conn.execute(query) as rows:
                instance = await rows.first()
    except DataError as de:
        [...]
    return instance
The .get() method above will either return a model instance (row representation) or None.
It uses the db.acquire() context manager, as described in aiopg's doc here: https://aiopg.readthedocs.io/en/stable/sa.html.
As described in this same doc, the sa.create_engine() method returns a connection pool, so the db.acquire() just uses one connection from the pool. I'm sharing this pool to every request in Tornado, they use it to perform the queries when they need it.
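For orientation, a minimal sketch of that setup with aiopg.sa (connection parameters are placeholders):

from aiopg.sa import create_engine

async def setup_db():
    # create_engine() returns an Engine wrapping a connection pool;
    # db.acquire() later checks a single connection out of that pool
    return await create_engine(
        user='user', password='secret', host='127.0.0.1', database='mydb'
    )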
So this is the fixture I've set up in my conftest.py:
@pytest.fixture
async def db():
    dbe = await setup_db()
    return dbe

@pytest.fixture
def app(db, event_loop):
    """
    Returns a valid testing Tornado Application instance.
    :return:
    """
    app = make_app(db)
    settings.JWT_SECRET = 'its_secret_one'
    return app
I can't find an explanation of why this is happening; Tornado's docs and source make it clear that the asyncio event loop is used by default, and by debugging I can see the event loop is indeed the same one, but for some reason it seems to get closed or stopped abruptly.
This is one test that fails:
@pytest.mark.gen_test(timeout=2)
def test_score_returns_204_empty(app, http_server, http_client, base_url):
    score_url = '/'.join([base_url, URL_PREFIX, 'score'])
    token = create_token('test', scopes=['score:get'])
    headers = {
        'Authorization': f'Bearer {token}',
        'Accept': 'application/json',
    }
    response = yield http_client.fetch(score_url, headers=headers, raise_error=False)
    assert response.code == 204
This test fails, returning 401 instead of 204: the query in the auth decorator fails due to the RuntimeError, which then produces an Unauthorized response.
Any idea from the async experts here will be very appreciated, I'm quite lost on this!!!
Well, after a lot of digging, testing and, of course, learning quite a lot about asyncio, I made it work myself. Thanks for the suggestions so far.
The issue was that the event loop from asyncio was not running; as @hoefling mentioned, pytest itself does not support coroutines, but pytest-asyncio brings such a useful feature to your tests. This is very well explained here: https://medium.com/ideas-at-igenius/testing-asyncio-python-code-with-pytest-a2f3628f82bc
So, without pytest-asyncio, your async code that needs to be tested will look like this:
def test_this_is_an_async_test():
    loop = asyncio.get_event_loop()
    result = loop.run_until_complete(my_async_function(param1, param2, param3))
    assert result == 'expected'
We use loop.run_until_complete() because, otherwise, the loop will never be running; that's how asyncio works by default (and pytest does nothing to make it work differently).
With pytest-asyncio, your test works with the well-known async / await parts:
async def test_this_is_an_async_test(event_loop):
    result = await my_async_function(param1, param2, param3)
    assert result == 'expected'
pytest-asyncio in this case wraps the run_until_complete() call above, summarizing it heavily, so the event loop will run and be available for your async code to use it.
Please note: the event_loop parameter in the second case is not even necessary here, pytest-asyncio gives one available for your test.
On the other hand, when you are testing your Tornado app, you usually need to get an HTTP server up and running during your tests, listening on a well-known port, etc., so the usual way goes by writing fixtures to get an HTTP server, base_url (usually http://localhost:, with an unused port), etc.
pytest-tornado comes up as a very useful one, as it offers several of these fixtures for you: http_server, http_client, unused_port, base_url, etc.
Also worth mentioning: it provides the pytest mark gen_test(), which converts any standard test to use coroutines via yield, and can even assert that it runs within a given timeout, like this:
@pytest.mark.gen_test(timeout=3)
def test_fetch_my_data(http_client, base_url):
    result = yield http_client.fetch('/'.join([base_url, 'result']))
    assert len(result) == 1000
But this way it does not support async / await, and only Tornado's ioloop will be available via the io_loop fixture (although Tornado's IOLoop uses asyncio underneath by default from Tornado 5.0), so you'd need to combine both pytest.mark.gen_test and pytest.mark.asyncio, and in the right order! (which is where I failed).
Once I understood better what could be the problem, this was the next approach:
@pytest.mark.gen_test(timeout=2)
@pytest.mark.asyncio
async def test_score_returns_204_empty(http_client, base_url):
    score_url = '/'.join([base_url, URL_PREFIX, 'score'])
    token = create_token('test', scopes=['score:get'])
    headers = {
        'Authorization': f'Bearer {token}',
        'Accept': 'application/json',
    }
    response = await http_client.fetch(score_url, headers=headers, raise_error=False)
    assert response.code == 204
But this is utterly wrong, if you understand how Python's decorator wrappers work. With the code above, pytest-asyncio's coroutine is wrapped in a pytest-tornado yield gen.coroutine, which won't get the event loop running... so my tests were still failing with the same problem. Any query to the database was returning a Future, waiting for an event loop to be running.
My updated code, once I realized the silly mistake:
@pytest.mark.asyncio
@pytest.mark.gen_test(timeout=2)
async def test_score_returns_204_empty(http_client, base_url):
    score_url = '/'.join([base_url, URL_PREFIX, 'score'])
    token = create_token('test', scopes=['score:get'])
    headers = {
        'Authorization': f'Bearer {token}',
        'Accept': 'application/json',
    }
    response = await http_client.fetch(score_url, headers=headers, raise_error=False)
    assert response.code == 204
In this case, the gen.coroutine is wrapped inside the pytest-asyncio coroutine, and the event_loop runs the coroutines as expected!
But there was still a minor issue that took me a little while to realize, too: pytest-asyncio's event_loop fixture creates a new event loop for every test, while pytest-tornado also creates a new IOLoop. So the tests were still failing, but this time with a different error.
The conftest.py file now looks like this; please note I've re-declared the event_loop fixture to use the event loop from pytest-tornado's io_loop fixture itself (recall pytest-tornado creates a new io_loop for each test function):
@pytest.fixture(scope='function')
def event_loop(io_loop):
    loop = io_loop.current().asyncio_loop
    yield loop
    loop.stop()

@pytest.fixture(scope='function')
async def db():
    dbe = await setup_db()
    yield dbe

@pytest.fixture
def app(db):
    """
    Returns a valid testing Tornado Application instance.
    :return:
    """
    app = make_app(db)
    settings.JWT_SECRET = 'its_secret_one'
    yield app
Now all my tests work, I'm back a happy man and very proud of my now better understanding of the asyncio way of life. Cool!

Unexpected AssertionError: single test not using logged in user from previous step

I am following the tutorial by http://www.patricksoftwareblog.com/flask-tutorial/, which I believe is based on https://blog.miguelgrinberg.com/post/the-flask-mega-tutorial-part-i-hello-world. Great stuff for a beginner.
I am getting different results when testing my code through the frontend manually (which works fine) vs. through pytest.
My test tries to show the "groups" endpoint, which requires a login (standard @login_required decorator).
I initially test the user getting a login page ("Knock knock") when trying to get the endpoint without a login. This works manually and through pytest.
I log in a user. If I inspect the response from the login I can clearly see a "Welcome back Pete!" success message.
My second assert receives a response from URL /login?next=%2Fgroups, indicating the /groups endpoint is called without a login/authentication preceding it, and the assert fails. Testing this manually works as expected. Why is that single test not using the same user/session combination in the next step(s)?
Test with the problem is the first snippet below:
def test_groups(app):
    assert b'Knock knock' in get(app, "/groups").data
    login(app, "pete@testmail.com", "pete123")
    assert b'Test group 1' in get(app, "/groups").data
My "get" function for reference:
def get(app, endpoint: str):
    return app.test_client().get(endpoint, follow_redirects=True)
My "login" function for reference:
def login(app, email="testuser@testmail.com", password="testing"):
    return app.test_client().post('/login', data=dict(email=email, password=password), follow_redirects=True)
The app (from a conftest fixture imported in the test module by @pytest.mark.usefixtures('app')) for reference:
@pytest.fixture
def app():
    """An application for the tests."""
    _app = create_app(DevConfig)
    ctx = _app.test_request_context()
    ctx.push()
    yield _app
    ctx.pop()
The login route for reference:
@app.route('/login', methods=['GET', 'POST'])
def login():
    form = LoginForm(request.form)
    if request.method == 'POST':
        if form.validate_on_submit():
            user = User.query.filter_by(email=form.email.data).first()
            if user is not None and user.is_correct_password(form.password.data):
                user.authenticated = True
                user.last_login = user.current_login
                user.current_login = datetime.now()
                user.insert_user()
                login_user(user)
                flash(f'Welcome back {user.name}!', 'success')
                return redirect(url_for('our_awesome_group.index'))
            else:
                flash('Incorrect credentials! Did you already register?', 'error')
        else:
            flash_errors(form)
    return render_template('login.html', form=form)
The groups route for reference:
@app.route('/groups')
@login_required
def groups():
    groups_and_users = dict()
    my_group_uuids = Membership.list_groups_per_user(current_user)
    my_groups = [Group.query.filter_by(uuid=group).first() for group in my_group_uuids]
    for group in my_groups:
        user_uuids_in_group = Membership.list_users_per_group(group)
        users_in_group = [User.query.filter_by(uuid=user).first() for user in user_uuids_in_group]
        groups_and_users[group] = users_in_group
    return render_template('groups.html', groups_and_users=groups_and_users)
I'm going to sum up the comments I made that gave the answer on how to solve this issue.
When creating a test app using Pytest and Flask there are a few different ways to go about it.
The suggested way to create a test client with proper app context is to use something like:
@pytest.fixture
def client():
    """ Creates the app from TestConfig, activates the test client and context,
    then makes the db and allows the test client to be used """
    app = create_app(TestConfig)
    client = app.test_client()
    ctx = app.app_context()
    ctx.push()
    db.create_all()
    yield client
    db.session.close()
    db.drop_all()
    ctx.pop()
That creates the client while pushing the app context, so you can register things like your database and create the tables for the test client.
The second way is shown in the OP's question, using app.test_request_context():
@pytest.fixture
def app():
    """An application for the tests."""
    _app = create_app(DevConfig)
    ctx = _app.test_request_context()
    ctx.push()
    yield _app
    ctx.pop()
and then create the test client in another pytest fixture:
@pytest.fixture
def client(app):
    return app.test_client()
Creating a test client allows you to use various testing features and gives access to flask requests with the proper app context.
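Worth noting for the OP's specific failure: get() and login() each call app.test_client(), so every request runs on a brand-new client with an empty cookie jar, and the login session never carries over. Reusing a single client (e.g. the client fixture above) keeps the session cookie across steps; a sketch of the test rewritten that way:

def test_groups(client):
    # one client instance, so the cookie set by the login persists
    assert b'Knock knock' in client.get('/groups', follow_redirects=True).data
    client.post('/login',
                data=dict(email='pete@testmail.com', password='pete123'),
                follow_redirects=True)
    assert b'Test group 1' in client.get('/groups', follow_redirects=True).data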
