flask-sqlalchemy - how to obtain a request-independent db session - multithreading

I am looking at the best (and correct way) to obtain a request-independent db session.
The problem is the following: I am building a web application that has to access the database. The endpoint exposed accepts a request, performs the first work, then create a thread (that will perform the hard work), starts it, and replies to the client with a unique id for the "job". Meanwhile the thread goes on with its work (and it has to access the database) and the client can perform polling to check the status. I am not using dedicated framework to perform this background job, but only a simple thread. I can only have one single background thread going on at any time, for this reason I am maintaining the state in a singleton.
The application is created with the application factory design https://flask.palletsprojects.com/en/1.1.x/patterns/appfactories/
I am using Gunicorn as WSGI server and sqlite as database.
The basic structure of the code is the following (I am removing the business logic and imports, but the concept remain):
api_jobs.py
#bp.route('/jobs', methods=['POST'])
def create_job():
data = request.get_json(force=True) or {}
name = data['name']
job_controller = JobController() # This is a singleton
job_process = job_controller.start_job(name)
job_process_dict = job_process.to_dict()
return jsonify(job_process_dict)
controller.py
class Singleton(type):
_instances = {}
def __call__(cls, *args, **kwargs):
if cls not in cls._instances:
cls._instances[cls] = super(Singleton, cls).__call__(*args, **kwargs)
return cls._instances[cls]
class JobController(object):
__metaclass__ = Singleton
def __init__(self):
self.job_thread = None
def start_job(self, name):
if self.job_thread is not None:
job_id = self.job_thread.job_id
job_process = JobProcess.query.get(job_id)
if job_process.status != 'end':
raise ValueError('A job process is already ongoing!')
else:
self.job_thread = None
job_process = JobProcess(name)
db.session.add(job_process)
db.session.commit() # At this step I create the ID
self.job_thread = JobThread(db.session, job_process.id)
self.job_thread.start()
return job_process
class JobThread(threading.Thread):
def __init__(self, db_session, job_id):
self.job_id = job_id
self.db_session = db_session
self.session = self.db_session()
def run(self):
self.job_process = self.session.query(JobProcess).get(self.job_id)
self.job_process.status = 'working'
self.session.commit()
i = 0
while True:
sleep(1)
print('working hard')
i = i +1
if i > 10:
break
self.job_process.status = 'end'
self.session.commit()
self.db_session.remove()
models.py
class JobProcess(db.Model):
id = db.Column(db.Integer, primary_key=True)
status = db.Column(db.String(64))
name = db.Column(db.String(64))
def to_dict(self):
data = {
'id': self.id,
'status': self.status,
'name': self.name,
}
return data
From my understanding, calling self.session = self.db_session() is actually doing nothing (due to the fact that sqlalchemy is using a registry, that is also a proxy, if I am not wrong), however that was the best attempt that I found to create a "new/detached/useful" session.
I checked out https://docs.sqlalchemy.org/en/13/orm/contextual.html#using-thread-local-scope-with-web-applications in order to obtain a request-independent db-session, however even using the suggested method of creating a new session factory (sessionmaker + scoped_session), does not work.
The errors that I obtain, with slight changes to the code, are multiple, in this configuration the error is
DetachedInstanceError: Instance <JobProcess at 0x7f875f81c350> is not bound to a Session; attribute refresh operation cannot proceed (Background on this error at: http://sqlalche.me/e/bhk3)
The basic question remains: Is it possible to create a session that will live inside the thread and that I will take care of creating/tearing down?

The reason that you are encountering the DetachedInstanceError is that you are attempting to pass the session from your main thread to your job thread. Sqlalchemy is using thread local storage to manage the sessions and thus a single session cannot be shared between two threads. You just need to create a new session in the run method of your job thread.

Related

Dart: Store heavy object in an isolate and access its method from main isolate without reinstatiating it

is it possible in Dart to instantiate a class in an isolate, and then send message to this isolate to receive a return value from its methods (instead of spawning a new isolate and re instantiate the same class every time)? I have a class with a long initialization, and heavy methods. I want to initialize it once and then access its methods without compromising the performance of the main isolate.
Edit: I mistakenly answered this question thinking python rather than dart. snakes on the brain / snakes on a plane
I am not familiar with dart programming, but it would seem the concurrency model has a lot of similarities (isolated memory, message passing, etc..). I was able to find an example of 2 way message passing with a dart isolate. There's a little difference in how it gets set-up, and the streams are a bit simpler than python Queue's, but in general the idea is the same.
Basically:
Create a port to receive data from the isolate
Create the isolate passing it the port it will send data back on
Within the isolate, create the port it will listen on, and send the other end of it back to main (so main can send messages)
Determine and implement a simple messaging protocol for remote method call on an object contained within the isolate.
This is basically duplicating what a multiprocessing.Manager class does, however it can be helpful to have a simplified example of how it can work:
from multiprocessing import Process, Lock, Queue
from time import sleep
class HeavyObject:
def __init__(self, x):
self._x = x
sleep(5) #heavy init
def heavy_method(self, y):
sleep(.2) #medium weight method
return self._x + y
def HO_server(in_q, out_q):
ho = HeavyObject(5)
#msg format for remote method call: ("method_name", (arg1, arg2, ...), {"kwarg1": 1, "kwarg2": 2, ...})
#pass None to exit worker cleanly
for msg in iter(in_q.get, None): #get a remote call message from the queue
out_q.put(getattr(ho, msg[0])(*msg[1], **msg[2])) #call the method with the args, and put the result back on the queue
class RMC_helper: #remote method caller for convienience
def __init__(self, in_queue, out_queue, lock):
self.in_q = in_queue
self.out_q = out_queue
self.l = lock
self.method = None
def __call__(self, *args, **kwargs):
if self.method is None:
raise Exception("no method to call")
with self.l: #isolate access to queue so results don't pile up and get popped off in possibly wrong order
print("put to queue: ", (self.method, args, kwargs))
self.in_q.put((self.method, args, kwargs))
return self.out_q.get()
def __getattr__(self, name):
if not name.startswith("__"):
self.method = name
return self
else:
super().__getattr__(name)
def child_worker(remote):
print("child", remote.heavy_method(5)) #prints 10
sleep(3) #child works on something else
print("child", remote.heavy_method(2)) #prints 7
if __name__ == "__main__":
in_queue = Queue()
out_queue = Queue()
lock = Lock() #lock is used as to not confuse which reply goes to which request
remote = RMC_helper(in_queue, out_queue, lock)
Server = Process(target=HO_server, args=(in_queue, out_queue))
Server.start()
Worker = Process(target=child_worker, args=(remote, ))
Worker.start()
print("main", remote.heavy_method(3)) #this will *probably* start first due to startup time of child
Worker.join()
with lock:
in_queue.put(None)
Server.join()
print("done")

Why would my pytest tests hang before dropping my SQLAlchemy DB?

Here's my conftest.py (some code deleted for brevity)
from trip_planner import create_app, db as _db
from trip_planner.models import User
from test import TestConfig, test_instance_dir
#pytest.fixture(scope='session', autouse=True)
def app(session_mocker: pytest_mock.MockerFixture):
static_folder = mkdtemp(prefix='static')
_app = create_app(TestConfig(), instance_path=test_instance_dir,
static_folder=static_folder)
ctx = _app.app_context()
ctx.push()
session_mocker.patch('trip_planner.assets.manifest',
new=defaultdict(str))
yield _app
ctx.pop()
os.rmdir(static_folder)
#pytest.fixture(scope='session')
def db(app):
_db.create_all()
seed_db(_db)
yield _db
_db.drop_all()
def seed_db(db) -> User:
sessionmaker = db.create_session({'autocommit': False})
session = sessionmaker()
user = User(username='username',
password_digest=bcrypt.hash('password'))
session.add(user)
session.commit()
session.close()
return user
#pytest.fixture(scope='function')
def db_session(db):
session = db.create_scoped_session(options=dict(
autocommit=False, autoflush=False
))
db.session = session
with session.begin_nested():
yield session
session.rollback()
session.remove()
#pytest.fixture(scope='function')
def app_client(app):
with app.test_client() as c:
yield c
#pytest.fixture(scope='function')
def session_user(db_session, app_client) -> int:
user_id, = db_session.query(User.id).filter_by(username='username').one()
with app_client.session_transaction() as sess:
sess['user_id'] = user_id
return user_id
When my tests pass, pytest hangs. I'm only able to stop it with killall. Inspection of the test database reveals that the relationships were not, in fact, dropped.
How do I remedy this?
Apparently, it's a well-known problem with PostgreSQL specifically, here's the discussion.
The way I solved it was adding _db.close_all_sessions() before dropping all tables:
#pytest.fixture(scope='session')
def db(app):
_db.create_all()
seed_db(_db)
yield _db
_db.close_all_sessions()
_db.drop_all()
Another reason that might have been the case before, not sure. But it's worth checking if you check pg_stat_activity and see that your queries hang on obtaining advisory locks.
Apparently advisory locks can be session-level or transaction-level in PostgreSQL. A session-level advisory lock is not released upon the end of transaction, only on disconnect. It can cause overlapping sessions to hang while one is trying to roll everything back, and the other trying to take an advisory lock.
A session-level advisory lock is obtained via pg_advisory_lock functions and a transaction-level advisory lock is obtained via pg_advisory_xact_lock functions.

In The Observer Design Pattern How Is The Subject Class Stored?

I've been reading up on the Observer Design Pattern.
Take the below code from the following article:
https://sourcemaking.com/design_patterns/observer/python/1
I understand how the code works and how the two classes, subject and observer, interrelate. What I can't quite get my head round is how this would all work in reality.
In the real world examples of the observer pattern I have read about, the interaction between the subject and observer tends to be open ended.
For example, if you subscribe to a question on Quora and get notified of answers via email, theoretically, you could receive updates indefinitely.
How then if the code below was applied to a real world scenario (like Quora) do the classes persist? Are the classes and their states stored somewhere on the server?
import abc
class Subject:
"""
Know its observers. Any number of Observer objects may observe a
subject.
Send a notification to its observers when its state changes.
"""
def __init__(self):
self._observers = set()
self._subject_state = None
def attach(self, observer):
observer._subject = self
self._observers.add(observer)
def detach(self, observer):
observer._subject = None
self._observers.discard(observer)
def _notify(self):
for observer in self._observers:
observer.update(self._subject_state)
#property
def subject_state(self):
return self._subject_state
#subject_state.setter
def subject_state(self, arg):
self._subject_state = arg
self._notify()
class Observer(metaclass=abc.ABCMeta):
"""
Define an updating interface for objects that should be notified of
changes in a subject.
"""
def __init__(self):
self._subject = None
self._observer_state = None
#abc.abstractmethod
def update(self, arg):
pass
class ConcreteObserver(Observer):
"""
Implement the Observer updating interface to keep its state
consistent with the subject's.
Store state that should stay consistent with the subject's.
"""
def update(self, arg):
self._observer_state = arg
# ...
def main():
subject = Subject()
concrete_observer = ConcreteObserver()
subject.attach(concrete_observer)
subject.subject_state = 123
if __name__ == "__main__":
main()
You do not need to persist the classes. All you need to do is persist the data. In your real-world example, Quora just stores the list of subscribers for each questions. If there is any activity, it just needs to call some Notification service, passing along the list, which eventually will send the mail to subscribed users.

Python neo4j bolt driver loses connection after context switch

I have a backend written in django which uses the neo4j bolt driver to communicate with the neo4j graph db.
I use a Singleton to handle the connection and the bolt driver closes the connection, whenever I access it from another location than where the connection was initially established (e.g. I open the connection in a view, access it in a signal and when I try to save in the view the connection is lost).
I’ve tried to extract the main problem I have come up with and break it down to a small piece of example code below.
I would appreciate any explanation of behavior to or even better a solution ;)
from neo4j.v1 import Driver, GraphDatabase, basic_auth, Session, Transaction
def main():
gm = GraphMapper()
gm.begin_atomic_transaction()
print(f"graph connection closed before method? {gm.is_connection_closed()}") # -> false
fill_transaction() #the context switch
print(f"graph connection closed after method? {gm.is_connection_closed()}") # -> true
if not gm.is_connection_closed():
print(f"graph connection open - try to commit") # -> is never called
gm.commit_atomic_transaction_and_close_session()
def fill_transaction():
gm = GraphMapper()
print(f"graph connection closed in method? {gm.is_connection_closed()}") # -> true
gm.create_update_node("TestNode")
class GraphMapper:
__instance = None
__transaction = None # type: Transaction
__session = None # type: Session
__connection = None # type: Connection
__driver = None # type: Driver
def __new__(cls, *args, **kwargs):
if not isinstance(cls.__instance, cls):
cls.__instance = object.__new__(cls, *args, **kwargs)
return cls.__instance
def __init__(self):
self.__driver = GraphDatabase.driver("bolt://localhost:7687", auth=basic_auth("neo4j", "password"))
def is_connection_closed(self):
return self.__transaction.session._connection._closed
def begin_atomic_transaction(self):
self.__session = self.__driver.session()
self.__transaction = self.__session.begin_transaction()
self.__connection = self.__transaction.session._connection
return self.__transaction
def commit_atomic_transaction_and_close_session(self):
result = self.__transaction.commit()
self.__transaction = None
return result
def create_update_node(self, label):
# Add Cypher statement to transaction
Implementation details: I have a wrapper object "GraphMapper" which encapsulates the connection, session, and transaction of the driver. and is designed as a singleton instance. A transaction is established at a point (A, e.g. a view) but I cannot complete the transaction here. I need to add additional values from location (B, e.g. a post-save signal). However, I cannot pass a reference to the "GraphMapper" A to B. Thus, I came up with the singleton implementation as explained above.
I have ensured that the singleton is exact the same instance on all locations (within one request). But at the moment I exit the context (package, class or method) through a method call and retrieve the "GraphMapper" instance at the very next location, the connection is closed. I even checked the reference count to the "GraphMapper" and its connection and the garbage collector should not delete it. Seldom it says the connection is not closed. But writing to the graph results in a connection refused error.
P.S.: I know there is some useless and unnecessary code, this is for illustrative purposes only and I wanted to make sure that the garbage collector did not kill some objects.

Python nose unit tests generating too many clients already

I'm using python 3.3, pyramid, sqlalchemy, psygopg2. I'm using a test postgres db for the unit tests. I have 101 unit tests set up for nose to run. On test 101 I get:
nose.proxy.OperationalError: (OperationalError) FATAL: sorry, too many clients already
It seems from the traceback that the exception is being thrown in
......./venv/lib/python3.3/site-packages/SQLAlchemy-0.8.2-py3.3.egg/sqlalchemy/pool.py", line 368, in __connect
connection = self.__pool._creator()
Perhaps tearDown() is not running after each test? Isn't the connection pool limit for Postgresql 100 at one time?
Here's my BaseTest class:
class BaseTest(object):
def setup(self):
self.request = testing.DummyRequest()
self.config = testing.setUp(request=self.request)
self.config.scan('../models')
sqlalchemy_url = 'postgresql://<user>:<pass>#localhost:5432/<db>'
engine = create_engine(sqlalchemy_url)
DBSession = scoped_session(sessionmaker(extension=ZopeTransactionExtension()))
DBSession.configure(bind=engine)
Base.metadata.bind = engine
Base.metadata.create_all(engine)
self.dbsession = DBSession
def tearDown(self):
testing.teardown()
My test classes inherit from BaseTest:
class TestUser(BaseTest):
def __init__(self, dbsession = None):
if dbsession:
self.dbsession = dbsession
def test_create_user(self):
......
......
One of the test classes tests a many-to-many relationship, so in that test class I first create the records needed to satisfy the foreign key relationships:
from tests.test_user import TestUser
from tests.test_app import TestApp
class TestAppUser(BaseTest):
def __init__(self, dbsession = None):
if dbsession:
self.dbsession = dbsession
def create_app_user(self):
test_app = TestApp(self.dbsession)
test_user = TestUser(self.dbsession)
test_app.request = testing.DummyRequest()
test_user.request = testing.DummyRequest()
app = test_app.create_app()
user = test_user.create_user()
......
I'm passing the dbsession into the TestApp and TestUser classes...I'm thinking that is the source of the problem, but I'm not sure.
Any help is greatly appreciated. Thanks.
Pyramid has nothing to do with SQLAlchemy. There is nowhere in Pyramid's API where you would link any of your SQLAlchemy configuration in a way that Pyramid would actually care. Therefore, Pyramid's testing.tearDown() does not do anything with connections. How could it? It doesn't know they exist.
You're using scoped sessions with a unit test, which really doesn't make a lot of sense because your unit tests are probably not threaded. So now you're creating threadlocal sessions and not cleaning them up. They aren't garbage collected because they're threadlocal. You also aren't manually closing those connections so the connection pool thinks they're still being used.
Is there a reason you need the ZopeTransactionExtension in your tests? Are you using the transaction package in your tests, or pyramid_tm? In a test if you don't know what something does then it shouldn't be there. You're calling create_all() from your setUp() method? That's going to be slow as hell introspecting the database and creating tables on every request. Ouch.
class BaseTest(object):
def setUp(self):
self.request = testing.DummyRequest()
self.config = testing.setUp(request=self.request)
self.config.scan('../models')
sqlalchemy_url = 'postgresql://<user>:<pass>#localhost:5432/<db>'
self.engine = create_engine(sqlalchemy_url)
Base.metadata.create_all(bind=self.engine)
self.sessionmaker = sessionmaker(bind=self.engine)
self.sessions = []
def makeSession(self, autoclose=True):
session = self.sessionmaker()
if autoclose:
self.sessions.append(session)
def tearDown(self):
for session in self.sessions:
session.close()
self.engine.dispose()
testing.teardown()

Resources