When to use a SQL foreign key with Peewee? - python-3.x

I'm currently using Peewee together with Python and I have managed to create a decent beginner schema:
CREATE TABLE stores (
    id SERIAL PRIMARY KEY,
    store_name TEXT
);

CREATE TABLE products (
    id SERIAL,
    store_id INTEGER NOT NULL,
    title TEXT,
    image TEXT,
    url TEXT UNIQUE,
    added_date timestamp without time zone NOT NULL DEFAULT NOW(),
    PRIMARY KEY(id, store_id)
);

ALTER TABLE products
    ADD CONSTRAINT "FK_products_stores" FOREIGN KEY ("store_id")
    REFERENCES stores (id) MATCH SIMPLE
    ON UPDATE NO ACTION
    ON DELETE RESTRICT;
which I have converted to Peewee with the following code:
# ------------------------------------------------------------------------------- #
class Stores(Model):
    id = IntegerField(column_name='id')
    store_name = TextField(column_name='store_name')

    class Meta:
        database = postgres_pool
        db_table = "stores"

    @classmethod
    def get_all(cls):
        try:
            return cls.select(cls.id, cls.store_name).order_by(cls.store_name)
        except peewee.IntegrityError:
            return None
# ------------------------------------------------------------------------------- #
class Products(Model):
    id = IntegerField(column_name='id')
    store_id = IntegerField(column_name='store_id')
    title = TextField(column_name='title')
    url = TextField(column_name='url')
    image = TextField(column_name='image')
    store = ForeignKeyField(Stores, backref='products')

    class Meta:
        database = postgres_pool
        db_table = "products"

    @classmethod
    def get_all_products(cls, given_id):
        try:
            return cls.select().where(cls.store_id == given_id)
        except peewee.IntegrityError:
            return None

    @classmethod
    def add_product(cls, pageData, store_id):
        """
        INSERT INTO public.products(store_id, title, image, url)
        VALUES((SELECT id FROM stores WHERE store_name = 'footish'), 'Teva Flatform Universal Pride',
        'https://www.footish.se/sneakers/teva-flatform-universal-pride-t51116376',
        'https://www.footish.se/pub_images/large/teva-flatform-universal-pride-t1116376-p77148.jpg?timestamp=1623417840')
        """
        try:
            return cls.insert(
                store_id=store_id,
                title=pageData.title,
                url=pageData.url,
                image=pageData.image,
            ).execute()
        except Products.DoesNotExist:
            return None
        except peewee.IntegrityError as err:
            print(f"error: {err}")
            return None
My idea is that when I start my application, I would have a constant variable with a store_id already set, e.g. 1. That would make the queries faster, since I would not need an extra select to look up the store_id by store_name. However, looking at my code, I have the field store = ForeignKeyField(Stores, backref='products') and I am starting to wonder why I need it in my application at all.
I am aware that I do have a FK from my ALTER query, but in the application I have written I cannot see a reason to use the foreign key anywhere. I would like some help understanding why and how I could use the store field in my application. Could it be, as I suspect, that I do not need it at all?

Hello! Reading your initial idea about making "the execution of queries faster" with a constant variable, the first thing that came to mind was the hassle of always having to edit that variable by hand. That is poor practice and not something you'd want in a professional application. To obtain the value you should use, I suggest running a query programmatically and fetching the id's highest value with SQL's MAX() function.
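In Peewee that lookup can be a one-liner; a minimal sketch, assuming the Stores model from your question and peewee's fn helper:
from peewee import fn

# Fetch the highest store id once at startup instead of hard-coding it:
last_store_id = Stores.select(fn.MAX(Stores.id)).scalar()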
As for the foreign key, you don't have to use it, but it is good practice where it matters. In this case, look at your FK constraint: it has an ON DELETE RESTRICT clause, which cancels any delete on the parent table while rows in the child table still reference it. To delete a store, you would first have to go to the child table and delete every product row that references it.
In general, if you have two tables with information linked in any way, I'd highly suggest using keys. It increases organization and, with proper constraints, it improves readability for external users and reduces errors.
When it comes to using the store field you mentioned, you might want an API that returns all products belonging to a single store, or all products except those from a specific one.
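For example, a minimal sketch assuming the models above (footish_store is just an illustrative variable name):
footish_store = Stores.get(Stores.store_name == 'footish')

# All products for one store, via the backref on the ForeignKeyField:
for product in footish_store.products:
    print(product.title)

# All products except those from that store:
others = Products.select().where(Products.store != footish_store)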
I tried to keep things simple due to not being fully confident I understood the question. I hope this was helpful.

Related

How to INSERT into a database using JOIN

I'm currently using Peewee together with Python and I have managed to create a cool application with the following schema:
CREATE TABLE stores (
    id SERIAL PRIMARY KEY,
    store_name TEXT
);

CREATE TABLE products (
    id SERIAL,
    store_id INTEGER NOT NULL,
    title TEXT,
    image TEXT,
    url TEXT UNIQUE,
    added_date timestamp without time zone NOT NULL DEFAULT NOW(),
    PRIMARY KEY(id, store_id)
);

ALTER TABLE products
    ADD CONSTRAINT "FK_products_stores" FOREIGN KEY ("store_id")
    REFERENCES stores (id) MATCH SIMPLE
    ON UPDATE NO ACTION
    ON DELETE RESTRICT;
which I have converted to Peewee with the following code:
# ------------------------------------------------------------------------------- #
class Stores(Model):
    id = IntegerField(column_name='id')
    store_name = TextField(column_name='store_name')

    class Meta:
        database = postgres_pool
        db_table = "stores"

    @classmethod
    def get_all(cls):
        try:
            return cls.select(cls.id, cls.store_name).order_by(cls.store_name)
        except peewee.IntegrityError:
            return None
# ------------------------------------------------------------------------------- #
class Products(Model):
    id = IntegerField(column_name='id')
    title = TextField(column_name='title')
    url = TextField(column_name='url')
    image = TextField(column_name='image')
    store = ForeignKeyField(Stores, backref='products')

    class Meta:
        database = postgres_pool
        db_table = "products"

    @classmethod
    def add_product(cls, pageData, store_name):
        """
        INSERT INTO public.products(store_id, title, image, url)
        VALUES((SELECT id FROM stores WHERE store_name = 'footish'), 'Teva Flatform Universal Pride',
        'https://www.footish.se/sneakers/teva-flatform-universal-pride-t1116376',
        'https://www.footish.se/pub_images/large/teva-flatform-universal-pride-t1116376-p77148.jpg?timestamp=1623417840')
        """
        try:
            return cls.insert(
                store_id=cls.select(cls.store.id).join(Stores).where(cls.store.store_name == store_name).get().store.id,
                title=pageData.title,
                url=pageData.url,
                image=pageData.image,
            ).execute()
        except Products.DoesNotExist:
            return None
However, I have realized that working with ids is quite a bit faster than working with text, and I am trying to figure out the best way to insert the ID. I received a comment on my code today:
    your insert isn't referencing "stores" at all, so I'm not sure what you're hoping to get from that, since you have a subquery there
I am a bit confused about what that means, but my question is: which approach is the correct way to insert?
1. Is it better, on application start, to store the id in a variable and pass that variable into an insert function (as an argument)?
2. Or to call store_id=cls.select(cls.store.id).join(Stores).where(cls.store.store_name == store_name).get().store.id, where I instead pass the store_name and it returns the correct id?
My first thought is that option 2 is like doing two queries instead of one, but I might be wrong. Looking forward to knowing!
This is quite incorrect:
# Wrong
store_id=cls.select(cls.store.id).join(Stores).where(cls.store.store_name == store_name).get().store.id,
Correct:
try:
    store = Stores.select().where(Stores.store_name == store_name).get()
except Stores.DoesNotExist:
    # the store name does not exist. do whatever?
    return
Products.insert(store=store, ...rest-of-fields...).execute()
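If you prefer option 1, the lookup can still happen just once at startup; a hedged sketch, assuming the store row already exists (FOOTISH_ID is a hypothetical name):
# Run once at application start:
FOOTISH_ID = Stores.get(Stores.store_name == 'footish').id

# Later inserts can pass the cached id directly; peewee accepts either a
# model instance or a plain id value for a ForeignKeyField:
Products.insert(store=FOOTISH_ID, title=pageData.title, url=pageData.url, image=pageData.image).execute()
Either way it is one lookup plus one insert; option 2 simply repeats the lookup on every call.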

How to Save all models changes in one query on Django

I am trying to modify many instances of a model (like the User model), and the changes differ per instance (I don't want to use the update() QuerySet method; it doesn't work for my scenario).
For example, some users need first_name changed and some users need last_name changed, and I get the users like: all_user = User.objects.all()
If I call the save() method on each instance after a change, Django sends one query per save!
How can I save all the changes to the database in one query, instead of looping over the models and saving them one by one?
Given the comment from @iklinac, I would thoroughly recommend implementing Django's own approach to bulk updates, detailed here.
It's quite similar to my original answer, below, but it looks like the functionality is now built in.
# bulk_update(objs, fields, batch_size=None)
>>> objs = [
... Entry.objects.create(headline='Entry 1'),
... Entry.objects.create(headline='Entry 2'),
... ]
>>> objs[0].headline = 'This is entry 1'
>>> objs[1].headline = 'This is entry 2'
>>> Entry.objects.bulk_update(objs, ['headline'])
Original answer
There's a package called django-bulk-update, which is similar to bulk_create, which is built in to Django.
An example of where I use this is part of an action in an admin class:
@admin.register(Token)
class TokenAdmin(admin.ModelAdmin):
    list_display = (
        'id',
        'type'
    )
    actions = (
        'set_type_charity',
    )

    def set_type_charity(self, request, queryset):
        for token in queryset:
            token.type = Token.Type.CHARITY
        bulk_update(
            queryset,
            update_fields=['type', 'modified'],
            batch_size=1000
        )
Usage, taken from their readme:
With manager:
import random
from django_bulk_update.manager import BulkUpdateManager
from tests.models import Person

class Person(models.Model):
    ...
    objects = BulkUpdateManager()

random_names = ['Walter', 'The Dude', 'Donny', 'Jesus']
people = Person.objects.all()
for person in people:
    person.name = random.choice(random_names)

Person.objects.bulk_update(people, update_fields=['name'])  # updates only name column
Person.objects.bulk_update(people, exclude_fields=['username'])  # updates all columns except username
Person.objects.bulk_update(people)  # updates all columns
Person.objects.bulk_update(people, batch_size=50000)  # updates all columns by 50000 sized chunks
With helper:
import random
from django_bulk_update.helper import bulk_update
from tests.models import Person

random_names = ['Walter', 'The Dude', 'Donny', 'Jesus']
people = Person.objects.all()
for person in people:
    person.name = random.choice(random_names)

bulk_update(people, update_fields=['name'])  # updates only name column
bulk_update(people, exclude_fields=['username'])  # updates all columns except username
bulk_update(people, using='someotherdb')  # updates all columns using the given db
bulk_update(people)  # updates all columns using the default db
bulk_update(people, batch_size=50000)  # updates all columns by 50000 sized chunks using the default db

How to add a default filter parameter to every query in mongoengine?

I've been researching a lot, but I haven't found a way.
I have Document classes with an _owner attribute that holds the ObjectID of the owner. The owner is a per-request value, so it's globally available. I would like part of the query to be set by default.
For example, doing this query
MyClass.objects(id='12345')
should be the same as doing
MyClass.objects(id='12345', _owner=global.owner)
because _owner=global.owner is always added by default
I haven't found a way to override objects, and using a queryset_class is somewhat confusing, because I still have to remember to call an .owned() manager method to add the filter every time I want to query something.
It ends up like this:
MyClass.objects(id='12345').owned()
# same as ...
MyClass.objects(id='12345', _owner=global.owner)
Any Idea? Thanks!
The following should do the trick for querying (example is simplified by using a constant owned=True but it can easily be extended to use your global):
class OwnedHouseWrapper(object):
    # Implements descriptor protocol
    def __get__(self, instance, owner):
        return House.objects.filter(owned=True)

    def __set__(self, instance, value):
        raise Exception("can't set .objects")

class House(Document):
    address = StringField()
    owned = BooleanField(default=False)

class OwnedHouse:
    objects = OwnedHouseWrapper()

House(address='garbage 12', owned=True).save()
print(OwnedHouse.objects())  # [<House: House object>]
print(len(OwnedHouse.objects))  # 1
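To use your per-request owner instead of the constant, the descriptor just has to read it from wherever your global lives; a minimal sketch, assuming a contextvars-based request context (current_owner is a hypothetical name, MyClass is the Document from the question):
import contextvars

current_owner = contextvars.ContextVar('current_owner')

class OwnedWrapper(object):
    # Descriptor that injects the per-request owner into every query
    def __get__(self, instance, owner):
        return MyClass.objects.filter(_owner=current_owner.get())
Call current_owner.set(...) at the start of each request, and every query through the wrapper is scoped automatically.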

Commit error after uploading data

I have a simple program that stores some inputs in a database. I use Flask-SQLAlchemy as an ORM and didn't have any issues until now. Due to some issues, I had to save my data to CSV files and erase everything. After that, I uploaded the data back again using the df.to_sql method from pandas.
NOTE: I'm using df.to_sql to load the previously saved CSV back into the database. The idea is to recover the data I had stored.
Now, with everything back to normal (or so I thought), when I try to upload data using my usual method (filling a form) and commit the changes to the database, I get the following error:
sqlalchemy.exc.IntegrityError: (psycopg2.IntegrityError) duplicate key violates \
uniqueness restriction
Detail: The key already exists (id) = (#).
Every time I repeat the process the error stays the same, except that # changes to #+1 (e.g. from 2 it goes to 3, and so on).
Sorry for my English; if you need any clarification please ask, and I'll edit this post as best I can.
Thanks for your time!
EDIT 1:
The process is adding a new line to the database and committing:
new_observation = Observations(var1 = new_var1, var2 = new_var2)
db.session.add(new_observation)
db.session.commit()
EDIT 2:
The model of the database is:
class Observations(db.Model):
    __tablename__ = 'observations'
    id = db.Column(db.Integer, primary_key=True)
    user_id = db.Column(db.Integer, db.ForeignKey('user.id'))
    # pass the callable, so the timestamp is evaluated per row
    timestamp = db.Column(db.DateTime, index=True, default=datetime.today)
    var1 = db.Column(db.Numeric)
    var2 = db.Column(db.Numeric)
EDIT 3:
As suggested by @mad_, I tried filling in the primary key directly:
new_observation = Observations(id=some_number, var1=new_var1, var2=new_var2)
db.session.add(new_observation)
db.session.commit()
The problem now is that I get this new error:
sqlalchemy.orm.exc.FlushError: New instance <observations at 0x47deb50> \
with identity key (<class 'app.models.observations '>, (368,), None) \
conflicts with persistent instance <observations at 0x4ab8a90>
Thanks to the comments from @mad_ I was able to solve my problem. The issue appeared after I uploaded a table back into my database: when I then tried to commit a new observation, I got the duplicate-key error.
A workaround is to explicitly declare the primary key on insert. With this I got a new error, which was solved by disabling the autoincrement property of the primary key (autoincrement=False).
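For reference, the usual root cause here is that pandas' df.to_sql does not advance the Postgres sequence behind a SERIAL id column, so the next autogenerated id collides with a restored row. An alternative fix that keeps autoincrement enabled is to resynchronize the sequence after the bulk load; a sketch, assuming Postgres and the observations table above:
from sqlalchemy import text

# Move the sequence past the highest existing id so new inserts don't collide:
db.session.execute(text(
    "SELECT setval(pg_get_serial_sequence('observations', 'id'), "
    "(SELECT COALESCE(MAX(id), 1) FROM observations))"
))
db.session.commit()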

Multi-threading SQLAlchemy and return results to ObjectListView

I have run into another issue with a program I am working on. Basically what my program does is it takes up to 4 input files, processes them and stores the information I collect from them in a SQLite3 database on my computer. This has allowed me to view the data any time I want without having to run the input files again. The program uses a main script that is essentially just an AUI Notebook that imports an input script, and output scripts to use as panels.
To add the data to the database I am able to use threading, since I am not returning the results directly to my output screen(s). However, when I need to view the entire contents of my main table, 25,000 records end up being loaded. While they load, my GUI is locked and almost always displays "Program not responding".
I would like to use threading/multiprocessing to grab the 25k records from the database and load them into my ObjectListView widget(s), so that my GUI remains usable during the process. When I attempted to use a threading class similar to the one that adds the data to the database, I got nothing returned. When I say I got nothing, I am not exaggerating.
So here is my big question: is there a way to thread the query and return the results without using global variables? I have not been able to find a solution with an example I could understand, but I may be using the wrong search terms.
Here are the snippets of code pertaining to the issue at hand:
This is what I use to make sure the data is ready for my ObjectListView widget.
class OlvMainDisplay(object):
    def __init__(self, id, name, col01, col02, col03, col04, col05,
                 col06, col07, col08, col09, col10, col11,
                 col12, col13, col14, col15):
        self.id = id
        self.name = name
        self.col01 = col01
        self.col02 = col02
        self.col03 = col03
        self.col04 = col04
        self.col05 = col05
        self.col06 = col06
        self.col07 = col07
        self.col08 = col08
        self.col09 = col09
        self.col10 = col10
        self.col11 = col11
        self.col12 = col12
        self.col13 = col13
        self.col14 = col14
        self.col15 = col15
The 2 tables I am pulling data from:
class TableMeta(base):
    __tablename__ = 'meta_extra'
    id = Column(String(20), ForeignKey('main_data.id'), primary_key=True)
    col06 = Column(String)
    col08 = Column(String)
    col02 = Column(String)
    col03 = Column(String)
    col04 = Column(String)
    col09 = Column(String)
    col10 = Column(String)
    col11 = Column(String)
    col12 = Column(String)
    col13 = Column(String)
    col14 = Column(String)
    col15 = Column(String)

class TableMain(base):
    __tablename__ = 'main_data'
    id = Column(String(20), primary_key=True)
    name = Column(String)
    col01 = Column(String)
    col05 = Column(String)
    col07 = Column(String)
    extra_data = relation(
        TableMeta, uselist=False, backref=backref('main_data', order_by=id))
I use two queries to collect from these two tables: one grabs all records, while the other is part of a function that takes multiple dictionaries and applies filters based on their contents. Both queries are part of my main "worker" script, which is imported by each of my notebook panels.
Here is the function that applies the filter(s):
def multiFilter(theFilters, table, anOutput, qType):
    session = Session()
    anOutput = session.query(table)
    try:
        for x in theFilters:
            for attr, value in x.items():
                anOutput = anOutput.filter(getattr(table, attr).in_(value))
    except AttributeError:
        for attr, value in theFilters.items():
            anOutput = anOutput.filter(getattr(table, attr).in_(value))
    anOutput = convertResults(anOutput.all())
    session.close()  # close the session before returning the converted results
    return anOutput
theFilters can be either a single dictionary or a list of dictionaries, hence the try/except. Once the function has applied the filters, it runs the returned results through another function that passes each result through the OlvMainDisplay class and appends it to a list handed on to the OLV widget.
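For illustration, a hypothetical call of the function above (the anOutput and qType arguments are unused by the snippet as written):
# Single filter dict: rows of TableMain whose col01 is 'A' or 'B'
rows = multiFilter({'col01': ['A', 'B']}, TableMain, None, None)

# List of dicts: both filters are applied in sequence
rows = multiFilter([{'col01': ['A', 'B']}, {'col05': ['X']}], TableMain, None, None)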
Again, the big question: is there a way to thread the query (or queries) and return the results without using global variables? Or possibly grab around 200 records at a time and add the data "in chunks" to the OLV widget?
Thank you in advance.
-MikeS
--UPDATE--
I have reviewed "how to get the return value from a thread in python", and the accepted answer either returned nothing or still locked the GUI (I'm not sure what caused the variance). I would like to limit the number of threads created to about 5 at most.
--New Update--
I made some corrections to the filter function.
You probably don't want to load the entire database into memory at once. That is usually a bad idea. Because ObjectListView is a wrapper of the ListCtrl, I would recommend using the virtual version of the underlying widget. The flag is wx.LC_VIRTUAL. Take a look at the wxPython demo for an example, but basically you load data on demand via the virtual methods OnGetItemText(), OnGetItemImage(), and OnGetItemAttr(). Note that those are the ListCtrl methods; that may be different in OLV land. Anyway, I know that the OLV version is called VirtualObjectListView and works in much the same way. I'm pretty sure there's an example in the source download.
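A minimal sketch of the plain ListCtrl flavor, assuming the row data is already in memory as a list of tuples:
import wx

class VirtualList(wx.ListCtrl):
    def __init__(self, parent, data):
        super().__init__(parent, style=wx.LC_REPORT | wx.LC_VIRTUAL)
        self.data = data  # list of tuples, one per row
        self.InsertColumn(0, "id")
        self.InsertColumn(1, "name")
        # Tell the control how many rows exist; cells are fetched on demand.
        self.SetItemCount(len(data))

    def OnGetItemText(self, item, col):
        # Called by wx only for the rows currently scrolled into view.
        return str(self.data[item][col])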
Ok, I finally managed to get the query to run in a thread and be able to display the results in a standard ObjectListView. I used the answer HERE with some modifications.
I added the code to my main worker script which is imported into my output panel as EW.
Since I am not passing arguments to my query, these lines were changed from:
def start(self, params):
    self.thread = threading.Thread(target=self.func, args=params)
to:
def start(self):
    self.thread = threading.Thread(target=self.func)
In my output panel I changed how I call upon my default query, the one that returns 25,000+ records. In my output panel's init I added self.worker = () as a placeholder and in my function that runs the default query:
def defaultView(self, evt):
    self.worker = EW.ThreadWorker(EW.defaultQuery)
    self.worker.start()
    pub.sendMessage('update.statusbar', msg='Full query started.')
I also added:
def threadUpdateOLV(self):
    time.sleep(10)
    anOutput = self.worker.get_results()
    self.dataOLV.SetObjects(anOutput)

pub.subscribe(self.threadUpdateOLV, 'thread.completed')
The time.sleep(10) was added after trial and error to reliably get the full 25,000+ results; I found a 10-second delay worked fine.
And finally, at the end of my default query I added the PubSub send right before closing the session and returning the output:
wx.CallAfter(pub.sendMessage, 'thread.completed')
session.close()
return anOutput
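One way to drop the fixed sleep entirely, sketched on the assumption that pypubsub and wx are used as above, is to send the results along with the completion message instead of polling the worker:
# In the worker thread, when the query finishes:
wx.CallAfter(pub.sendMessage, 'thread.completed', results=anOutput)

# In the panel; the subscriber's signature must match the message kwargs:
def threadUpdateOLV(self, results):
    self.dataOLV.SetObjects(results)

pub.subscribe(self.threadUpdateOLV, 'thread.completed')
wx.CallAfter already marshals the call onto the GUI thread, so no delay is needed.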
To be honest, I am sure there is a better way to accomplish this, but as of right now it serves the purpose. I will keep working on a better solution, though.
Thanks
-Mike S
