I am trying to implement a database in my Python 3 program, using SQLite 3, and I don't really understand how to use my DBHelper class.
In order to use my DBHelper I need to instantiate a DBHelper object and call a function (insert, etc.). However, each time I instantiate an object, a new connection is made to my database.
I am confused because it looks like I am connecting to the database multiple times, when I feel like I should only be connecting once at the start of the program. But if I don't instantiate a DBHelper object, I cannot use the functions that I need.
Having multiple connections like this also sometimes locks my database.
What is the correct way to implement SQLite in my program?
Edit: I need to use the same SQLite db file across multiple other classes.
import sqlite3

class DBHelper:
    def __init__(self, dbname="db.sqlite"):
        self.dbname = dbname
        try:
            self.conn = sqlite3.connect(dbname)
        except sqlite3.Error as e:
            log().critical('local database initialisation error: "%s"', e)

    def setup(self):
        stmt = "CREATE TABLE IF NOT EXISTS users (id integer PRIMARY KEY)"
        self.conn.execute(stmt)
        self.conn.commit()

    def add_item(self, item):
        stmt = "INSERT INTO users (id) VALUES (?)"
        args = (item,)
        try:
            self.conn.execute(stmt, args)
            self.conn.commit()
        except sqlite3.IntegrityError as e:
            log().critical('user id ' + str(item) + ' already exists in database')

    def delete_item(self, item):
        stmt = "DELETE FROM users WHERE id = (?)"
        args = (item,)
        self.conn.execute(stmt, args)
        self.conn.commit()

    def get_items(self):
        stmt = "SELECT id FROM users"
        return [x[0] for x in self.conn.execute(stmt)]
You can use the singleton design pattern in your code: you create your connection once, and every later instantiation of DBHelper hands back that same object, and therefore the same connection.
Remember, if you are accessing the connection from concurrent workflows, you have to implement safe access to the database connection inside DBHelper yourself. Read the SQLite documentation for more information.
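For example, here is a minimal sketch of that idea, keeping your DBHelper name (the Singleton metaclass below is my own addition, not something from your code):

import sqlite3

class Singleton(type):
    """Metaclass: the first instantiation creates the object, later ones reuse it."""
    _instances = {}

    def __call__(cls, *args, **kwargs):
        if cls not in cls._instances:
            cls._instances[cls] = super().__call__(*args, **kwargs)
        return cls._instances[cls]

class DBHelper(metaclass=Singleton):
    def __init__(self, dbname="db.sqlite"):
        self.dbname = dbname
        self.conn = sqlite3.connect(dbname)

    # setup(), add_item(), delete_item(), get_items() stay exactly as you wrote them

# Any class in your program can now call DBHelper() and get the same connection:
helper_a = DBHelper()
helper_b = DBHelper()
assert helper_a is helper_b
assert helper_a.conn is helper_b.conn

Note that a plain sqlite3 connection can only be used from the thread that created it, so if your other classes run in threads you will still need to serialise access or create per-thread connections.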
I noticed that when I have a model such as:
class User(Model):
    id = ...
    books = relationship('Book')
When calling user.books for the first time, SQLAlchemy queries the database (with lazy='select', for instance, which is the default), but subsequent calls to user.books don't hit the database. The results seem to have been cached.
I'd like to get the same behaviour from SQLAlchemy when using a method that queries, for instance:
class User:
    def get_books(self):
        return Book.query.filter(Book.user_id == self.id).all()
But when doing that, if I call get_books() 3 times, SQLAlchemy does call the database 3 times (as seen when setting the ECHO property to True).
How can I change get_books() to use the caching system from SQLAlchemy ?
I insist on saying "from SQLAlchemy" because I believe SQLAlchemy handles the refresh/expunge/flush system, so changes are re-queried from the DB when one of those happens. That is opposed to simply creating a caching property in the model with something like:
def get_books(self):
    if self._books is None:
        self._books = Book.query.filter(Book.user_id == self.id).all()
    return self._books
This does not work well with flush/refresh/expunge from SQLAlchemy.
So, how can I change get_books() to use the caching system from SQLAlchemy?
Edit 1:
I realized that the solution provided below is not perfect, because it caches per instance. If you have two instances of the same user and call get_books on both, two queries will be made, because the caching applies only to the instance, not globally, contrary to SQLAlchemy.
The reason is simple, I believe, but it is still unclear to me how to apply it in my case: the relationship is defined at the class level, not the instance level (books = relationship()), and relationships build their own query internally, so they can cache based on that query.
In the solution I gave, memoize_getter is unaware of the query being made and, as such, cannot cache the result for the same value across multiple instances, so an identical call made on another instance will query the database again.
Original answer:
I've been trying to wrap my head around SQLAlchemy's code (wow that's dense!), and I think I figured it out!
A relationship, at least when set with lazy='select' (the default), is an InstrumentedAttribute, which has a __get__ method that does the following:
def __get__(self, instance, owner):
    if instance is None:
        return self

    dict_ = instance_dict(instance)
    if self._supports_population and self.key in dict_:
        return dict_[self.key]
    else:
        try:
            state = instance_state(instance)
        except AttributeError as err:
            util.raise_(
                orm_exc.UnmappedInstanceError(instance),
                replace_context=err,
            )
        return self.impl.get(state, dict_)
So, a basic caching system, respecting SQLAlchemy, would be something like:
from sqlalchemy.orm.base import instance_dict

def get_books(self):
    dict_ = instance_dict(self)
    if 'books' not in dict_:
        dict_['books'] = Book.query.filter(Book.user_id == self.id).all()
    return dict_['books']
Now, we can take it a step further and write... a decorator (oh sweet):
import functools

def memoize_getter(f):
    @functools.wraps(f)
    def decorator(instance, *args, **kwargs):
        property_name = f.__name__.replace('get_', '')
        dict_ = instance_dict(instance)
        if property_name not in dict_:
            dict_[property_name] = f(instance, *args, **kwargs)
        return dict_[property_name]
    return decorator
Thus transforming the original method into:
class User:
    @memoize_getter
    def get_books(self):
        return Book.query.filter(Book.user_id == self.id).all()
If someone has a better solution, I'm eagerly interested!
I'm trying to run a postgres db in a docker container, running a small python program with a class to call the db.
When I run code with a query, it seems to work fine but gives no results.
I can see that I have in fact hit the database, because one of the tables has an id constraint, resulting in an error when I try to insert something that already exists.
Using the db from TablePlus works fine.
Code:
import psycopg2

class postgres():
    def __init__(self, db="foo", user="bar", password='baz', host='127.0.0.1', port=5432):
        self.conn = psycopg2.connect(
            database=db, user=user, password=password, host=host, port=port)
        self.cur = self.conn.cursor()

    def query(self, query):
        self.cur.execute(query)
        # self.cur.execute("CREATE TABLE test (id serial PRIMARY KEY, num integer, data varchar);")

    def close(self):
        self.cur.close()
        self.conn.close()

db = postgres()
db.query("INSERT INTO test (id) values('test2')")
db.close()
results in:
"""
Traceback (most recent call last):
File "/Users/myname/projects/myproject/dataGathering/postgres.py", line 21, in
db.query("INSERT INTO test (id) values('test2')")
File "/Users/myname/projects/myproject/dataGathering/postgres.py", line 11, in query
self.cur.execute(query)
psycopg2.errors.UniqueViolation: duplicate key value violates unique constraint "id_pkey"
DETAIL: Key (id)=(test) already exists.
"""
Inserting something without a conflicting id raises no error, but doesn't give any results in the db either. SELECT queries suffer the same fate.
You are not doing:
conn.commit()
So your changes are not committed.
From the psycopg2 documentation (https://www.psycopg.org/docs/connection.html):

close()
    Close the connection now (rather than whenever del is executed). The connection will be unusable from this point forward; an InterfaceError will be raised if any operation is attempted with the connection. The same applies to all cursor objects trying to use the connection. Note that closing a connection without committing the changes first will cause any pending change to be discarded as if a ROLLBACK was performed (unless a different isolation level has been selected: see set_isolation_level()).
I am looking at the best (and correct) way to obtain a request-independent db session.
The problem is the following: I am building a web application that has to access the database. The exposed endpoint accepts a request, performs the first part of the work, then creates a thread (which will perform the hard work), starts it, and replies to the client with a unique id for the "job". Meanwhile the thread goes on with its work (and it has to access the database), and the client can poll to check the status. I am not using a dedicated framework for this background job, only a simple thread. I can only have one single background thread going at any time; for this reason I am maintaining the state in a singleton.
The application is created with the application factory design https://flask.palletsprojects.com/en/1.1.x/patterns/appfactories/
I am using Gunicorn as the WSGI server and SQLite as the database.
The basic structure of the code is the following (I am removing the business logic and imports, but the concept remains):
api_jobs.py
@bp.route('/jobs', methods=['POST'])
def create_job():
    data = request.get_json(force=True) or {}
    name = data['name']
    job_controller = JobController()  # This is a singleton
    job_process = job_controller.start_job(name)
    job_process_dict = job_process.to_dict()
    return jsonify(job_process_dict)
controller.py
class Singleton(type):
    _instances = {}

    def __call__(cls, *args, **kwargs):
        if cls not in cls._instances:
            cls._instances[cls] = super(Singleton, cls).__call__(*args, **kwargs)
        return cls._instances[cls]


class JobController(object):
    __metaclass__ = Singleton

    def __init__(self):
        self.job_thread = None

    def start_job(self, name):
        if self.job_thread is not None:
            job_id = self.job_thread.job_id
            job_process = JobProcess.query.get(job_id)
            if job_process.status != 'end':
                raise ValueError('A job process is already ongoing!')
            else:
                self.job_thread = None
        job_process = JobProcess(name)
        db.session.add(job_process)
        db.session.commit()  # At this step I create the ID
        self.job_thread = JobThread(db.session, job_process.id)
        self.job_thread.start()
        return job_process


class JobThread(threading.Thread):
    def __init__(self, db_session, job_id):
        self.job_id = job_id
        self.db_session = db_session
        self.session = self.db_session()

    def run(self):
        self.job_process = self.session.query(JobProcess).get(self.job_id)
        self.job_process.status = 'working'
        self.session.commit()
        i = 0
        while True:
            sleep(1)
            print('working hard')
            i = i + 1
            if i > 10:
                break
        self.job_process.status = 'end'
        self.session.commit()
        self.db_session.remove()
models.py
class JobProcess(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    status = db.Column(db.String(64))
    name = db.Column(db.String(64))

    def to_dict(self):
        data = {
            'id': self.id,
            'status': self.status,
            'name': self.name,
        }
        return data
From my understanding, calling self.session = self.db_session() actually does nothing (because SQLAlchemy's scoped_session is a registry that also acts as a proxy, if I am not wrong); however, that was the best attempt I found at creating a "new/detached/useful" session.
I checked out https://docs.sqlalchemy.org/en/13/orm/contextual.html#using-thread-local-scope-with-web-applications in order to obtain a request-independent db session; however, even using the suggested method of creating a new session factory (sessionmaker + scoped_session) does not work.
The errors that I obtain, with slight changes to the code, are multiple; in this configuration the error is:
DetachedInstanceError: Instance <JobProcess at 0x7f875f81c350> is not bound to a Session; attribute refresh operation cannot proceed (Background on this error at: http://sqlalche.me/e/bhk3)
The basic question remains: Is it possible to create a session that will live inside the thread and that I will take care of creating/tearing down?
The reason you are encountering the DetachedInstanceError is that you are attempting to pass the session from your main thread to your job thread. SQLAlchemy uses thread-local storage to manage sessions, so a single session cannot be shared between two threads. You just need to create a new session in the run method of your job thread.
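A sketch of what that could look like, assuming db is your Flask-SQLAlchemy object (so db.engine is available in the request that creates the thread) and JobProcess is the model from your models.py:

from sqlalchemy.orm import sessionmaker


class JobThread(threading.Thread):
    def __init__(self, engine, job_id):
        super().__init__()
        self.job_id = job_id  # pass only the id, never a session or a model instance
        self.session_factory = sessionmaker(bind=engine)  # the engine is safe to share

    def run(self):
        session = self.session_factory()  # this session lives only inside this thread
        try:
            job_process = session.query(JobProcess).get(self.job_id)
            job_process.status = 'working'
            session.commit()
            # ... the hard work goes here ...
            job_process.status = 'end'
            session.commit()
        finally:
            session.close()

In start_job you would then create the thread with JobThread(db.engine, job_process.id) instead of handing it db.session.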
I have the following code structure written in Python 3.6, which I need to test using sqlite3 (because of standards defined in my project):
class BigSecretService:
    """ Class designed to make calculations based on data stored in MySQL. """

    def load_data(self):
        # load some data using sqlalchemy ORM

    def get_values_from_fields(self, fields):
        # here's getting values via sqlalchemy execute with raw query:
        self.sql_service.execute(SOME_QUERY)

    def process_data(self, data, values):
        # again execute some raw query
        # process data and put into result list
        return result_list

    def make_calculations(self, params):
        data = self.load_data()
        values = self.get_values_from_fields(fields)
        result_vector = self.process_data(data, values)
SOME_QUERY lives in a separate module and its format looks like this:
"SELECT SUM(some_field) FROM some_table WHERE col1 = :col1 AND col2 = :col2"
To cover make_calculations in my component test I designed these awful patches:
class PatchedConnection:
    """ Class is used to transform queries to sqlite format before executing. """

    def __init__(self, connection, engine):
        self.connection = connection
        self.engine = engine

    def __call__(self):
        conn = self.connection()
        conn.execute = self.patched_execute(conn.execute)
        return conn

    def transform_date(self, date):
        try:
            # quick check just for testing
            if '+00:00' in date:
                date = date.replace('T', ' ').replace('+00:00', '.000000')
        finally:
            return date

    def patched_execute(self, f_execute):
        def prepare_args_for_sqlite(query, *args):
            # check if query is in sqlite format
            if args:
                if '?' in str(query):
                    args = list(map(self.transform_date, list(args[0].values())))
                    return self.engine.execute(str(query), args)
                return f_execute(query, args[0])
            else:
                return f_execute(query)
        return prepare_args_for_sqlite
Then in the test it looks like this:
QUERY_TEMPLATE_SQLITE = 'SELECT SUM(some_field) FROM some_table WHERE col1 = ? AND col2 = ?'

with mock.patch('path_to_my_service.SOME_QUERY', QUERY_TEMPLATE_SQLITE):
    self.sql_service.get_connection = PatchedConnection(self.sql_service.get_connection, self.engine)
    response = self.client.simulate_post("/v1/secret_service/make_calculations",
                                         headers=self.auth_header,
                                         body=json.dumps(payload))
    self.assertEqual(response.status_code, 200)
    # then check response.text
It works so far, but I believe there must be a much better solution. Moreover, in patched_execute the args are converted from a dict to a list, and who knows whether the order of the dict values will be the same every time.
So, my question is how to perform such testing in a correct way with given tools?
If you need to intercept and manipulate the SQL being sent to the database, then using core events (https://docs.sqlalchemy.org/en/13/core/events.html) would be the most straightforward way of doing this. The before_cursor_execute event would suit your purposes, as outlined in the following example from the SQLAlchemy documentation.
@event.listens_for(engine, "before_cursor_execute", retval=True)
def before_cursor_execute(conn, cursor, statement, parameters, context, executemany):
    # do something with statement, parameters
    return statement, parameters
From the example you have given, however, I'm not sure that this is necessary. The MySQL query you have listed is also a valid SQLite query and needs no manipulation. Also, if you pass your parameters as Python objects rather than as strings, then again no manipulation should be needed, as SQLAlchemy will map them correctly to the backend.
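If you do end up needing the hook, here is a sketch of how it could slot into your test setup (engine being the SQLite engine you already build for the test; the parameter tweak below just mirrors what your transform_date did and is only an illustration):

from sqlalchemy import event


@event.listens_for(engine, "before_cursor_execute", retval=True)
def adapt_for_sqlite(conn, cursor, statement, parameters, context, executemany):
    # Normalise ISO timestamps the way transform_date did; the statement itself is untouched.
    if isinstance(parameters, dict):
        parameters = {
            key: (value.replace('T', ' ').replace('+00:00', '.000000')
                  if isinstance(value, str) and '+00:00' in value else value)
            for key, value in parameters.items()
        }
    return statement, parameters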
I have this method on my class:
def exec_query(self, dbms, connection_string, sql):
    self.dbms = dbms
    self.connection_string = connection_string
    self.sql = sql
    self._connection, self._cursor = self._mydb(self.dbms, self.connection_string)
    self._result = None
    self.query_result = []
    try:
        self._cursor.execute(self.sql)
        self.collected_data = self._cursor
    except Exception as e:
        raise e
    self._cursor.close()
    self._connection.close()
    return self.collected_data
Then I tried to get its return value outside the class, and I got this error:
pyodbc.ProgrammingError: Attempt to use a closed cursor.
Can't I assign the cursor to a variable? And why?
What I want to do is process the cursor outside the class. Basically, I could just do .fetchall(), get the data, and then close the cursor, but .fetchall() eats memory, so I want to process the cursor outside instead.
self.collected_data is just another name for the exact same object as self._cursor. If you close self._cursor then self.collected_data is also closed.
You need to either use
self.collected_data = self._cursor.fetchall()
to save the actual data, or leave the connection and cursor open, process your data outside the class, and then call a close method.
You may want to look at making your class usable as a context manager, then use it something like this:
with db_connection.exec_query(dbms, connection_string, sql) as data:
    # process rows one at a time instead of fetching everything at once
    row = data.fetchone()
    while row:
        do_something_with_row(row)
        row = data.fetchone()
# on exit, cursor is closed
The particulars of the fetch loop may need adjusting for your driver, but hopefully you get the idea. Your exec_query would look something like this:
class exec_query:
    def __init__(self, dbms, connection_string, sql):
        self.dbms = dbms
        self.connection_string = connection_string
        self.sql = sql
        self._connection, self._cursor = self._mydb(self.dbms, self.connection_string)
        self._result = None
        self.query_result = []
        self._cursor.execute(self.sql)

    def __enter__(self):
        return self._cursor

    def __exit__(self, *args):
        self._cursor.close()
        self._connection.close()