Why would my pytest tests hang before dropping my SQLAlchemy DB? - python-3.x

Here's my conftest.py (some code deleted for brevity)
from trip_planner import create_app, db as _db
from trip_planner.models import User
from test import TestConfig, test_instance_dir
#pytest.fixture(scope='session', autouse=True)
def app(session_mocker: pytest_mock.MockerFixture):
static_folder = mkdtemp(prefix='static')
_app = create_app(TestConfig(), instance_path=test_instance_dir,
static_folder=static_folder)
ctx = _app.app_context()
ctx.push()
session_mocker.patch('trip_planner.assets.manifest',
new=defaultdict(str))
yield _app
ctx.pop()
os.rmdir(static_folder)
#pytest.fixture(scope='session')
def db(app):
_db.create_all()
seed_db(_db)
yield _db
_db.drop_all()
def seed_db(db) -> User:
sessionmaker = db.create_session({'autocommit': False})
session = sessionmaker()
user = User(username='username',
password_digest=bcrypt.hash('password'))
session.add(user)
session.commit()
session.close()
return user
#pytest.fixture(scope='function')
def db_session(db):
session = db.create_scoped_session(options=dict(
autocommit=False, autoflush=False
))
db.session = session
with session.begin_nested():
yield session
session.rollback()
session.remove()
#pytest.fixture(scope='function')
def app_client(app):
with app.test_client() as c:
yield c
#pytest.fixture(scope='function')
def session_user(db_session, app_client) -> int:
user_id, = db_session.query(User.id).filter_by(username='username').one()
with app_client.session_transaction() as sess:
sess['user_id'] = user_id
return user_id
When my tests pass, pytest hangs. I'm only able to stop it with killall. Inspection of the test database reveals that the relationships were not, in fact, dropped.
How do I remedy this?

Apparently, it's a well-known problem with PostgreSQL specifically, here's the discussion.
The way I solved it was adding _db.close_all_sessions() before dropping all tables:
#pytest.fixture(scope='session')
def db(app):
_db.create_all()
seed_db(_db)
yield _db
_db.close_all_sessions()
_db.drop_all()

Another reason that might have been the case before, not sure. But it's worth checking if you check pg_stat_activity and see that your queries hang on obtaining advisory locks.
Apparently advisory locks can be session-level or transaction-level in PostgreSQL. A session-level advisory lock is not released upon the end of transaction, only on disconnect. It can cause overlapping sessions to hang while one is trying to roll everything back, and the other trying to take an advisory lock.
A session-level advisory lock is obtained via pg_advisory_lock functions and a transaction-level advisory lock is obtained via pg_advisory_xact_lock functions.

Related

How to implement "relationship" caching system in a similar query?

I noticed that when having a Model such as :
class User(Model):
id = ...
books = relationship('Book')
When calling user.books for the first time, SQLAlchemy query the database (when lazy='select' for instance, which is the default), but sub-sequent call to user.books don't call the database. The results seems to have been cached.
I'd like to have the same feature from SQLAlchemy when using a method that query, for instance:
class User:
def get_books(self):
return Book.query.filter(Book.user_id == self.id).all()
But when doing that, if I call 3 times get_books(), SQLAlchemy does call the database 3 times (when setting the ECHO property to True).
How can I change get_books() to use the caching system from SQLAlchemy ?
I insist to mention "from SQLAlchemy" because I believe they handle the refresh/expunge/flush system and changes are then re-queried to the DB if one of these happened. Opposed to if I were to simply create a caching property in the model with a simple:
def get_books(self):
if self._books is None:
self._books = Book.query.filter(Book.user_id == self.id).all()
return self._books
This does not work well with flush/refresh/expunge from SQLAlchemy.
So, How can I change get_books() to use the caching system from SQLAlchemy ?
Edit 1:
I realized that the solution provided under is not perfect, because it caches for the current object. If you have two instances of the same user, and call get_books on both, two queries will be made because the caching applies only on the instance, not globally, contrary to SQLAlchemy.
The reason is simple - I believe - but still unclear how to apply it in my case: The object is defined at the class level, not the instance (books = relationship()), and they build their own query internally, so they can cache it based on the query.
In the solution I gave, the memoize_getter is unaware of the query made, and as such, cannot cache it for the same value accros multiple instance, so any identical call made to another instance will query the database.
Original answer:
I've been trying to wrap my head around SQLAlchemy's code (wow that's dense!), and I think I figured it out!
A relationship, at least when being set as "lazy='select'" (default), is a InstrumentedAttribute, which contains a get function that does the following :
def __get__(self, instance, owner):
if instance is None:
return self
dict_ = instance_dict(instance)
if self._supports_population and self.key in dict_:
return dict_[self.key]
else:
try:
state = instance_state(instance)
except AttributeError as err:
util.raise_(
orm_exc.UnmappedInstanceError(instance),
replace_context=err,
)
return self.impl.get(state, dict_)
So, a basic caching system, respecting SQLAlchemy, would be something like:
from sqlalchemy.orm.base import instance_dict
def get_books(self):
dict_ = instance_dict(self)
if 'books' not in dict_:
dict_['books'] = Book.query.filter(Book.user_id == self.id).all()
return dict_['books']
Now, we can push the vice a bit further, and do ... a decorator (oh sweet):
def memoize_getter(f):
#functools.wraps(f)
def decorator(instance, *args, **kwargs):
property_name = f.__name__.replace('get_', '')
dict_ = instance_dict(instance)
if property_name not in dict_:
dict_[property_name] = f(instance, *args, **kwargs)
return dict_[property_name]
return decorator
Thus transforming the original method to :
class User:
#memoize_getter
def get_books(self):
return Book.query.filter(Book.user_id == self.id).all()
If someone has a better solution, I'm eagerly interested!

flask-sqlalchemy - how to obtain a request-independent db session

I am looking at the best (and correct way) to obtain a request-independent db session.
The problem is the following: I am building a web application that has to access the database. The endpoint exposed accepts a request, performs the first work, then create a thread (that will perform the hard work), starts it, and replies to the client with a unique id for the "job". Meanwhile the thread goes on with its work (and it has to access the database) and the client can perform polling to check the status. I am not using dedicated framework to perform this background job, but only a simple thread. I can only have one single background thread going on at any time, for this reason I am maintaining the state in a singleton.
The application is created with the application factory design https://flask.palletsprojects.com/en/1.1.x/patterns/appfactories/
I am using Gunicorn as WSGI server and sqlite as database.
The basic structure of the code is the following (I am removing the business logic and imports, but the concept remain):
api_jobs.py
#bp.route('/jobs', methods=['POST'])
def create_job():
data = request.get_json(force=True) or {}
name = data['name']
job_controller = JobController() # This is a singleton
job_process = job_controller.start_job(name)
job_process_dict = job_process.to_dict()
return jsonify(job_process_dict)
controller.py
class Singleton(type):
_instances = {}
def __call__(cls, *args, **kwargs):
if cls not in cls._instances:
cls._instances[cls] = super(Singleton, cls).__call__(*args, **kwargs)
return cls._instances[cls]
class JobController(object):
__metaclass__ = Singleton
def __init__(self):
self.job_thread = None
def start_job(self, name):
if self.job_thread is not None:
job_id = self.job_thread.job_id
job_process = JobProcess.query.get(job_id)
if job_process.status != 'end':
raise ValueError('A job process is already ongoing!')
else:
self.job_thread = None
job_process = JobProcess(name)
db.session.add(job_process)
db.session.commit() # At this step I create the ID
self.job_thread = JobThread(db.session, job_process.id)
self.job_thread.start()
return job_process
class JobThread(threading.Thread):
def __init__(self, db_session, job_id):
self.job_id = job_id
self.db_session = db_session
self.session = self.db_session()
def run(self):
self.job_process = self.session.query(JobProcess).get(self.job_id)
self.job_process.status = 'working'
self.session.commit()
i = 0
while True:
sleep(1)
print('working hard')
i = i +1
if i > 10:
break
self.job_process.status = 'end'
self.session.commit()
self.db_session.remove()
models.py
class JobProcess(db.Model):
id = db.Column(db.Integer, primary_key=True)
status = db.Column(db.String(64))
name = db.Column(db.String(64))
def to_dict(self):
data = {
'id': self.id,
'status': self.status,
'name': self.name,
}
return data
From my understanding, calling self.session = self.db_session() is actually doing nothing (due to the fact that sqlalchemy is using a registry, that is also a proxy, if I am not wrong), however that was the best attempt that I found to create a "new/detached/useful" session.
I checked out https://docs.sqlalchemy.org/en/13/orm/contextual.html#using-thread-local-scope-with-web-applications in order to obtain a request-independent db-session, however even using the suggested method of creating a new session factory (sessionmaker + scoped_session), does not work.
The errors that I obtain, with slight changes to the code, are multiple, in this configuration the error is
DetachedInstanceError: Instance <JobProcess at 0x7f875f81c350> is not bound to a Session; attribute refresh operation cannot proceed (Background on this error at: http://sqlalche.me/e/bhk3)
The basic question remains: Is it possible to create a session that will live inside the thread and that I will take care of creating/tearing down?
The reason that you are encountering the DetachedInstanceError is that you are attempting to pass the session from your main thread to your job thread. Sqlalchemy is using thread local storage to manage the sessions and thus a single session cannot be shared between two threads. You just need to create a new session in the run method of your job thread.

Python neo4j bolt driver loses connection after context switch

I have a backend written in django which uses the neo4j bolt driver to communicate with the neo4j graph db.
I use a Singleton to handle the connection and the bolt driver closes the connection, whenever I access it from another location than where the connection was initially established (e.g. I open the connection in a view, access it in a signal and when I try to save in the view the connection is lost).
I’ve tried to extract the main problem I have come up with and break it down to a small piece of example code below.
I would appreciate any explanation of behavior to or even better a solution ;)
from neo4j.v1 import Driver, GraphDatabase, basic_auth, Session, Transaction
def main():
gm = GraphMapper()
gm.begin_atomic_transaction()
print(f"graph connection closed before method? {gm.is_connection_closed()}") # -> false
fill_transaction() #the context switch
print(f"graph connection closed after method? {gm.is_connection_closed()}") # -> true
if not gm.is_connection_closed():
print(f"graph connection open - try to commit") # -> is never called
gm.commit_atomic_transaction_and_close_session()
def fill_transaction():
gm = GraphMapper()
print(f"graph connection closed in method? {gm.is_connection_closed()}") # -> true
gm.create_update_node("TestNode")
class GraphMapper:
__instance = None
__transaction = None # type: Transaction
__session = None # type: Session
__connection = None # type: Connection
__driver = None # type: Driver
def __new__(cls, *args, **kwargs):
if not isinstance(cls.__instance, cls):
cls.__instance = object.__new__(cls, *args, **kwargs)
return cls.__instance
def __init__(self):
self.__driver = GraphDatabase.driver("bolt://localhost:7687", auth=basic_auth("neo4j", "password"))
def is_connection_closed(self):
return self.__transaction.session._connection._closed
def begin_atomic_transaction(self):
self.__session = self.__driver.session()
self.__transaction = self.__session.begin_transaction()
self.__connection = self.__transaction.session._connection
return self.__transaction
def commit_atomic_transaction_and_close_session(self):
result = self.__transaction.commit()
self.__transaction = None
return result
def create_update_node(self, label):
# Add Cypher statement to transaction
Implementation details: I have a wrapper object "GraphMapper" which encapsulates the connection, session, and transaction of the driver. and is designed as a singleton instance. A transaction is established at a point (A, e.g. a view) but I cannot complete the transaction here. I need to add additional values from location (B, e.g. a post-save signal). However, I cannot pass a reference to the "GraphMapper" A to B. Thus, I came up with the singleton implementation as explained above.
I have ensured that the singleton is exact the same instance on all locations (within one request). But at the moment I exit the context (package, class or method) through a method call and retrieve the "GraphMapper" instance at the very next location, the connection is closed. I even checked the reference count to the "GraphMapper" and its connection and the garbage collector should not delete it. Seldom it says the connection is not closed. But writing to the graph results in a connection refused error.
P.S.: I know there is some useless and unnecessary code, this is for illustrative purposes only and I wanted to make sure that the garbage collector did not kill some objects.

Proper way to share sqlachemy session and egine in context of gevent

My code tried to create an engine first:
def createEngine(connectionstring):
engine = create_engine(connectionstring,
#pool_size = DEFAULT_POOL_SIZE,
#max_overflow = DEFAULT_MAX_OVERFLOW,
echo = False)
return engine
Then get a session from the engine:
#contextmanager
def getOrmSession(engine):
try:
Session.configure(bind=engine)
session = Session()
yield session
finally:
pass
The client code is as follows:
def composeItems(keyword, itemList):
with getOrmSession(engine) as session:
for i in itemList:
item = QueryItem(query=keyword,
......
active = 0)
session.add(item)
session.commit()
Then when i call composeItems within gevent spawn. Obviously, mysql deadlocks. What had happened? What is wrong with the above usage?
Find the answer by myself.
I need to patch threading when import gevent. So scoped_session will be able to use greenlet's threading local. change the patching and everhything works fine now.

Python nose unit tests generating too many clients already

I'm using python 3.3, pyramid, sqlalchemy, psygopg2. I'm using a test postgres db for the unit tests. I have 101 unit tests set up for nose to run. On test 101 I get:
nose.proxy.OperationalError: (OperationalError) FATAL: sorry, too many clients already
It seems from the traceback that the exception is being thrown in
......./venv/lib/python3.3/site-packages/SQLAlchemy-0.8.2-py3.3.egg/sqlalchemy/pool.py", line 368, in __connect
connection = self.__pool._creator()
Perhaps tearDown() is not running after each test? Isn't the connection pool limit for Postgresql 100 at one time?
Here's my BaseTest class:
class BaseTest(object):
def setup(self):
self.request = testing.DummyRequest()
self.config = testing.setUp(request=self.request)
self.config.scan('../models')
sqlalchemy_url = 'postgresql://<user>:<pass>#localhost:5432/<db>'
engine = create_engine(sqlalchemy_url)
DBSession = scoped_session(sessionmaker(extension=ZopeTransactionExtension()))
DBSession.configure(bind=engine)
Base.metadata.bind = engine
Base.metadata.create_all(engine)
self.dbsession = DBSession
def tearDown(self):
testing.teardown()
My test classes inherit from BaseTest:
class TestUser(BaseTest):
def __init__(self, dbsession = None):
if dbsession:
self.dbsession = dbsession
def test_create_user(self):
......
......
One of the test classes tests a many-to-many relationship, so in that test class I first create the records needed to satisfy the foreign key relationships:
from tests.test_user import TestUser
from tests.test_app import TestApp
class TestAppUser(BaseTest):
def __init__(self, dbsession = None):
if dbsession:
self.dbsession = dbsession
def create_app_user(self):
test_app = TestApp(self.dbsession)
test_user = TestUser(self.dbsession)
test_app.request = testing.DummyRequest()
test_user.request = testing.DummyRequest()
app = test_app.create_app()
user = test_user.create_user()
......
I'm passing the dbsession into the TestApp and TestUser classes...I'm thinking that is the source of the problem, but I'm not sure.
Any help is greatly appreciated. Thanks.
Pyramid has nothing to do with SQLAlchemy. There is nowhere in Pyramid's API where you would link any of your SQLAlchemy configuration in a way that Pyramid would actually care. Therefore, Pyramid's testing.tearDown() does not do anything with connections. How could it? It doesn't know they exist.
You're using scoped sessions with a unit test, which really doesn't make a lot of sense because your unit tests are probably not threaded. So now you're creating threadlocal sessions and not cleaning them up. They aren't garbage collected because they're threadlocal. You also aren't manually closing those connections so the connection pool thinks they're still being used.
Is there a reason you need the ZopeTransactionExtension in your tests? Are you using the transaction package in your tests, or pyramid_tm? In a test if you don't know what something does then it shouldn't be there. You're calling create_all() from your setUp() method? That's going to be slow as hell introspecting the database and creating tables on every request. Ouch.
class BaseTest(object):
def setUp(self):
self.request = testing.DummyRequest()
self.config = testing.setUp(request=self.request)
self.config.scan('../models')
sqlalchemy_url = 'postgresql://<user>:<pass>#localhost:5432/<db>'
self.engine = create_engine(sqlalchemy_url)
Base.metadata.create_all(bind=self.engine)
self.sessionmaker = sessionmaker(bind=self.engine)
self.sessions = []
def makeSession(self, autoclose=True):
session = self.sessionmaker()
if autoclose:
self.sessions.append(session)
def tearDown(self):
for session in self.sessions:
session.close()
self.engine.dispose()
testing.teardown()

Resources