How to save all model changes in one query in Django

I am trying to modify many instances of a model (for example, the User model), and the changes differ from instance to instance, so the QuerySet update() method does not work for my scenario.
For example, some users need a new first_name and others need a new last_name. I fetch the users with: all_user = User.objects.all()
I think that if I call save() on each instance after changing it, Django sends one query per save.
How can I save all the changes to the database in one query, instead of looping over the models and saving them one by one?

Given the comment from @iklinac, I would thoroughly recommend using Django's own approach to bulk updates, QuerySet.bulk_update(), which is described in the Django QuerySet API documentation.
It's quite similar to my original answer, below, but it looks like the functionality is now built in.
# bulk_update(objs, fields, batch_size=None)
>>> objs = [
... Entry.objects.create(headline='Entry 1'),
... Entry.objects.create(headline='Entry 2'),
... ]
>>> objs[0].headline = 'This is entry 1'
>>> objs[1].headline = 'This is entry 2'
>>> Entry.objects.bulk_update(objs, ['headline'])
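To see how one statement can carry a different value for each row, here is a minimal sketch in plain SQL, run through Python's built-in sqlite3 module. It is not Django's implementation: the `users` table, its columns, and the helper `bulk_update_first_names` are made up for illustration. The helper builds a single UPDATE whose CASE expression maps each primary key to its new value.

```python
import sqlite3


def bulk_update_first_names(conn, changes):
    """Apply {pk: new_first_name} in a single UPDATE statement.

    Hypothetical helper for illustration; not part of Django.
    """
    if not changes:
        return 0
    # One "WHEN ? THEN ?" arm per row, so every row can get a different value.
    arms = " ".join("WHEN ? THEN ?" for _ in changes)
    placeholders = ", ".join("?" for _ in changes)
    sql = (
        f"UPDATE users SET first_name = CASE id {arms} END "
        f"WHERE id IN ({placeholders})"
    )
    params = []
    for pk, name in changes.items():
        params.extend([pk, name])
    params.extend(changes.keys())  # values for the IN (...) list
    cur = conn.execute(sql, params)
    conn.commit()
    return cur.rowcount


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT)"
)
conn.executemany(
    "INSERT INTO users (id, first_name, last_name) VALUES (?, ?, ?)",
    [(1, "a", "x"), (2, "b", "y"), (3, "c", "z")],
)
updated = bulk_update_first_names(conn, {1: "Alice", 3: "Carol"})
rows = conn.execute("SELECT id, first_name FROM users ORDER BY id").fetchall()
```

Each changed row adds bound parameters to the statement, and databases cap how many a single statement may have, which is one reason a batch_size option is useful for large updates.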
Original answer
There's a package called django-bulk-update, which works much like the bulk_create that is built into Django.
Here is an example where I use it as part of an action in an admin class:
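from django.contrib import admin
from django_bulk_update.helper import bulk_update

@admin.register(Token)
class TokenAdmin(admin.ModelAdmin):
    list_display = (
        'id',
        'type'
    )
    actions = (
        'set_type_charity',
    )

    def set_type_charity(self, request, queryset):
        # Change each instance in memory; nothing is written yet.
        for token in queryset:
            token.type = Token.Type.CHARITY
        # Write the changed rows in batches instead of one save() per row.
        bulk_update(
            queryset,
            update_fields=['type', 'modified'],
            batch_size=1000
        )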
Usage, taken from their readme;
With manager:
import random
from django_bulk_update.manager import BulkUpdateManager
from tests.models import Person

class Person(models.Model):
    ...
    objects = BulkUpdateManager()

random_names = ['Walter', 'The Dude', 'Donny', 'Jesus']
people = Person.objects.all()
for person in people:
    person.name = random.choice(random_names)

Person.objects.bulk_update(people, update_fields=['name'])  # updates only name column
Person.objects.bulk_update(people, exclude_fields=['username'])  # updates all columns except username
Person.objects.bulk_update(people)  # updates all columns
Person.objects.bulk_update(people, batch_size=50000)  # updates all columns by 50000 sized chunks
With helper:
import random
from django_bulk_update.helper import bulk_update
from tests.models import Person

random_names = ['Walter', 'The Dude', 'Donny', 'Jesus']
people = Person.objects.all()
for person in people:
    person.name = random.choice(random_names)

bulk_update(people, update_fields=['name'])  # updates only name column
bulk_update(people, exclude_fields=['username'])  # updates all columns except username
bulk_update(people, using='someotherdb')  # updates all columns using the given db
bulk_update(people)  # updates all columns using the default db
bulk_update(people, batch_size=50000)  # updates all columns by 50000 sized chunks using the default db

Related

How to handle foreign-keys during the iteration through attributes of Django model (Python)?

Dear Django/Python experts, I have a Django model (a Python class) that contains standard fields and also fields that are foreign keys. It is easy to iterate through the attributes of a model, but I have no idea how to handle the foreign keys.
Here is model no. 1, Employee, which contains a foreign key referring to another model, EmployeeLocation:
class Employee(models.Model):
    firstname = models.CharField(max_length=128)
    lastname = models.CharField(max_length=128)
    location = models.ForeignKey(EmployeeLocation, on_delete=models.CASCADE)
and here is a model nr.2 EmployeeLocation:
class EmployeeLocation(models.Model):
    id = models.BinaryField(primary_key=True, max_length=16, null=False)
    city = models.CharField(max_length=32)
and now I iterate via attributes of Employee in the following way:
# Collecting names of class fields.
field_names = [f.name for f in Employee._meta.get_fields()]
for current_attribute in field_names:
    field_value = str(getattr(my_employee, current_attribute))
This solution works fine for standard attributes but does not return a value when it reaches location, which is a foreign key.
To tackle this issue I did the following stunt :)
I made a dictionary that has the names of the foreign keys as keys and, as values, the result of a Django queryset that fetches the value. This is not an elegant hack :) When the iteration encounters an attribute that is a foreign key, it takes the value from the dictionary (which is filled by the queryset):
FKs = {'location': EmployeeLocation.objects.filter(id=my_employee.location_id)[0].city,
...
...}
# Collecting names of class fields.
field_names = [f.name for f in Employee._meta.get_fields()]
for current_attribute in field_names:
    if current_attribute in FKs.keys():
        field_value = FKs[current_attribute]
    else:
        field_value = str(getattr(my_employee, current_attribute))
Please tell me, in a simple way, how I should do this properly. Thank you so much in advance :)

When to use SQL Foreign key using peewee?

I'm currently using PeeWee together with Python and I have managed to create a decent beginner
CREATE TABLE stores (
    id SERIAL PRIMARY KEY,
    store_name TEXT
);
CREATE TABLE products (
    id SERIAL,
    store_id INTEGER NOT NULL,
    title TEXT,
    image TEXT,
    url TEXT UNIQUE,
    added_date timestamp without time zone NOT NULL DEFAULT NOW(),
    PRIMARY KEY(id, store_id)
);
ALTER TABLE products
    ADD CONSTRAINT "FK_products_stores" FOREIGN KEY ("store_id")
    REFERENCES stores (id) MATCH SIMPLE
    ON UPDATE NO ACTION
    ON DELETE RESTRICT;
which has been converted to peewee by following code:
# ------------------------------------------------------------------------------- #
class Stores(Model):
    id = IntegerField(column_name='id')
    store_name = TextField(column_name='store_name')

    class Meta:
        database = postgres_pool
        db_table = "stores"

    @classmethod
    def get_all(cls):
        try:
            return cls.select(cls.id, cls.store_name).order_by(cls.store)
        except Stores.IntegrityError:
            return None

# ------------------------------------------------------------------------------- #
class Products(Model):
    id = IntegerField(column_name='id')
    store_id = TextField(column_name='store_id')
    title = TextField(column_name='title')
    url = TextField(column_name='url')
    image = TextField(column_name='image')
    store = ForeignKeyField(Stores, backref='products')

    class Meta:
        database = postgres_pool
        db_table = "products"

    @classmethod
    def get_all_products(cls, given_id):
        try:
            return cls.select().where(cls.store_id == given_id)
        except Stores.IntegrityError:
            return None

    @classmethod
    def add_product(cls, pageData, store_id):
        """
        INSERT
        INTO
        public.products(store_id, title, image, url)
        VALUES((SELECT id FROM stores WHERE store_name = 'footish'), 'Teva Flatform Universal Pride',
        'https://www.footish.se/sneakers/teva-flatform-universal-pride-t51116376',
        'https://www.footish.se/pub_images/large/teva-flatform-universal-pride-t1116376-p77148.jpg?timestamp=1623417840')
        """
        try:
            return cls.insert(
                store_id=store_id,
                title=pageData.title,
                url=pageData.url,
                image=pageData.image,
            ).execute()
        except Products.DoesNotExist:
            return None
        except peewee.IntegrityError as err:
            print(f"error: {err}")
            return None
My idea is that when I start my application, I would have a constant that already holds the store_id, e.g. 1. That would make the queries faster, because I would not need another SELECT to get the store_id from a store_name. However, looking at my code, I have the field store = ForeignKeyField(Stores, backref='products'), and I am starting to wonder what I need it for in my application.
I am aware that I have a foreign key from my ALTER query, but in the application I have written I cannot see a reason why I would need to declare the foreign key at all. I would like some help understanding why and how I could use the "store" field in my application. Could it be that I do not need it at all?
Hello! By reading your initial idea about making "the execution of queries faster" from having a constant variable, the first thing that came to mind was the hassle of always having to manually edit the variable. This is poor practice and not something you'd want to do on a professional application. To obtain the value you should use, I suggest running a query programmatically and fetching the id's highest value using SQL's MAX() function.
As for the foreign key, you don't have to use it, but it can be good practice when it matters. In this case, look at your FK constraint: it has an ON DELETE RESTRICT statement, which cancels any delete operation on the parent table if it has data being used as a foreign key in another table. This would require going to the other table, the one with the foreign key, and deleting every row related to the one on the previous table before being able to delete it.
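The effect of ON DELETE RESTRICT described above can be tried out with a small sketch. It uses Python's built-in sqlite3 module rather than PostgreSQL and peewee, and the schema is a simplified stand-in for the one in the question (single-column primary key, fewer columns), so treat it as an illustration of the constraint only. The helper `try_delete_store` is a made-up name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE stores (id INTEGER PRIMARY KEY, store_name TEXT)")
conn.execute(
    """
    CREATE TABLE products (
        id INTEGER PRIMARY KEY,
        store_id INTEGER NOT NULL REFERENCES stores (id) ON DELETE RESTRICT,
        title TEXT
    )
    """
)
conn.execute("INSERT INTO stores (id, store_name) VALUES (1, 'footish')")
conn.execute(
    "INSERT INTO products (store_id, title) VALUES (1, 'Teva Flatform Universal Pride')"
)


def try_delete_store(store_id):
    """Return True if the store row was deleted, False if the constraint blocked it."""
    try:
        conn.execute("DELETE FROM stores WHERE id = ?", (store_id,))
        return True
    except sqlite3.IntegrityError:
        return False


first_attempt = try_delete_store(1)   # a product still references store 1
conn.execute("DELETE FROM products WHERE store_id = 1")
second_attempt = try_delete_store(1)  # no referencing rows remain
```

The first delete is refused while a product still points at the store; once the referencing rows are gone, the same delete goes through.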
In general, if you have two tables whose information is linked in any way, I'd highly suggest using keys. Keys keep the data organized and, if proper constraints are added, they make the schema easier for external users to read and reduce errors.
As for using the store field you mentioned, you might want an API that returns all products related to a single store, or all products except those from a specific store.
I tried to keep things simple due to not being fully confident I understood the question. I hope this was helpful.

How to INSERT into a database using JOIN

I'm currently using PeeWee together with Python and I have managed to create a cool application
CREATE TABLE stores (
    id SERIAL PRIMARY KEY,
    store_name TEXT
);
CREATE TABLE products (
    id SERIAL,
    store_id INTEGER NOT NULL,
    title TEXT,
    image TEXT,
    url TEXT UNIQUE,
    added_date timestamp without time zone NOT NULL DEFAULT NOW(),
    PRIMARY KEY(id, store_id)
);
ALTER TABLE products
    ADD CONSTRAINT "FK_products_stores" FOREIGN KEY ("store_id")
    REFERENCES stores (id) MATCH SIMPLE
    ON UPDATE NO ACTION
    ON DELETE RESTRICT;
which has been converted to peewee by following code:
# ------------------------------------------------------------------------------- #
class Stores(Model):
    id = IntegerField(column_name='id')
    store_name = TextField(column_name='store_name')

    class Meta:
        database = postgres_pool
        db_table = "stores"

    @classmethod
    def get_all(cls):
        try:
            return cls.select(cls.id, cls.store_name).order_by(cls.store)
        except Stores.IntegrityError:
            return None

# ------------------------------------------------------------------------------- #
class Products(Model):
    id = IntegerField(column_name='id')
    title = TextField(column_name='title')
    url = TextField(column_name='url')
    image = TextField(column_name='image')
    store = ForeignKeyField(Stores, backref='products')

    class Meta:
        database = postgres_pool
        db_table = "products"

    @classmethod
    def add_product(cls, pageData, store_name):
        """
        INSERT
        INTO
        public.products(store_id, title, image, url)
        VALUES((SELECT id FROM stores WHERE store_name = 'footish'), 'Teva Flatform Universal Pride',
        'https://www.footish.se/sneakers/teva-flatform-universal-pride-t1116376',
        'https://www.footish.se/pub_images/large/teva-flatform-universal-pride-t1116376-p77148.jpg?timestamp=1623417840')
        """
        try:
            return cls.insert(
                store_id=cls.select(cls.store.id).join(Stores).where(cls.store.store_name == store_name).get().store.id,
                title=pageData.title,
                url=pageData.url,
                image=pageData.image,
            ).execute()
        except Products.DoesNotExist:
            return None
However, I have realized that working with ids is considerably faster than working with text, and I am trying to figure out the best way to insert the id. I got this comment on my code as it stands today:
your insert isn't referencing "stores" at all so not sure what your hoping to get from that since you have a sub query there
I am a bit confused about what that means. My question is which approach is the correct way to insert:
1. On application start, store the id in a variable and pass that variable to the insert function as an argument, or
2. call store_id=cls.select(cls.store.id).join(Stores).where(cls.store.store_name == store_name).get().store.id, passing the store_name so that it returns the correct id?
My first thought is that option 2 is like doing two queries instead of one, but I might be wrong. Looking forward to finding out!
This is quite incorrect:
# Wrong
store_id=cls.select(cls.store.id).join(Stores).where(cls.store.store_name == store_name).get().store.id,
Correct:
try:
    store = Stores.select().where(Stores.store_name == store_name).get()
except Stores.DoesNotExist:
    # the store name does not exist. do whatever?
    return
Products.insert(store=store, ...rest-of-fields...).execute()
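On the "two queries instead of one" concern: the SQL in the question's docstring resolves the store id inside the INSERT itself. A minimal sketch of that idea follows, using Python's built-in sqlite3 module instead of peewee and PostgreSQL. The schema is simplified, the example.com URLs are placeholders, and this `add_product` is a stand-alone illustration rather than the peewee method from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE stores (id INTEGER PRIMARY KEY, store_name TEXT UNIQUE)")
conn.execute(
    "CREATE TABLE products ("
    "id INTEGER PRIMARY KEY, "
    "store_id INTEGER NOT NULL REFERENCES stores (id), "
    "title TEXT, url TEXT UNIQUE)"
)
conn.execute("INSERT INTO stores (store_name) VALUES ('footish')")


def add_product(store_name, title, url):
    """Insert a product for the named store in one statement.

    Returns the number of inserted rows, which is 0 when no store has that name.
    """
    cur = conn.execute(
        "INSERT INTO products (store_id, title, url) "
        "SELECT id, ?, ? FROM stores WHERE store_name = ?",
        (title, url, store_name),
    )
    return cur.rowcount


inserted = add_product(
    "footish", "Teva Flatform Universal Pride", "https://example.com/teva"
)
missing = add_product("no-such-store", "Other", "https://example.com/other")
product_count = conn.execute("SELECT COUNT(*) FROM products").fetchone()[0]
```

With INSERT ... SELECT, an unknown store name simply inserts nothing, so the caller checks the row count instead of catching a lookup error.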

Insert a nested schema into a database with fastAPI?

I have recently come to know about fastAPI and worked my way through the tutorial and other docs. Although fastAPI is pretty well documented, I couldn't find information about how to process a nested input when working with a database.
For testing, I wrote a very small family API with two models:
class Member(Base):
    __tablename__ = 'members'
    id = Column(Integer, primary_key=True, server_default=text("nextval('members_id_seq'::regclass)"))
    name = Column(String(128), nullable=False)
    age = Column(Integer, nullable=True)
    family_id = Column(Integer, ForeignKey('families.id', deferrable=True, initially='DEFERRED'), nullable=False, index=True)
    family = relationship("Family", back_populates="members")

class Family(Base):
    __tablename__ = 'families'
    id = Column(Integer, primary_key=True, server_default=text("nextval('families_id_seq'::regclass)"))
    family_name = Column(String(128), nullable=False)
    members = relationship("Member", back_populates="family")
and I created a Postgres database with two tables and the relations described here. With schema definitions and a crud file as in the fastAPI tutorial, I can create individual families and members and view them in a nested fashion with a get request. Here is the nested schema:
class Family(FamilyBase):
    id: int
    members: List[Member]

    class Config:
        orm_mode = True
So far, so good. Now, I would like to add a post view which accepts the nested structure as input and populates the database accordingly. The documentation at https://fastapi.tiangolo.com/tutorial/body-nested-models/ shows how to do this in principle, but it misses the database (i.e. crud) part.
As the input will not have id fields and obviously doesn't need to specify family_id, I have a MemberStub schema and the NestedFamilyCreate schema as follows:
class MemberStub(BaseModel):
    name: str
    age: int

class NestedFamilyCreate(BaseModel):
    family_name: str
    members: List[MemberStub]
In my routing routine families.py I have:
@app.post('/nested-families/', response_model=schemas.Family)
def create_family(family: schemas.NestedFamilyCreate, db: Session = Depends(get_db)):
    # no check for previous existence as names can be duplicates
    return crud.create_nested_family(db=db, family=family)
(the response_model points to the nested view of a family with all members including all ids; see above).
What I cannot figure out is how to write the crud.create_nested_family routine. Based on the simple create as in the tutorial, this looks like:
def create_nested_family(db: Session, family: schemas.NestedFamilyCreate):
    # split information in family and members
    members = family.members
    core_family = None  # ??? This is where I get stuck
    db_family = models.Family(**family.dict())  # This fails
    db.add(db_family)
    db.commit()
    db.refresh(db_family)
    return db_family
So, I can extract the members and can loop through them, but I would first need to create a new db_family record which must not contain the members. Then, with db.refresh, I would get the new family_id back, which I could add to each record of members. But how can I do this? If I understand what is required here, I would need to achieve some mapping of my nested schema onto a plain schema for FamilyCreate (which works by itself) and a plain schema for MemberCreate (which also works by itself). But how can I do this?
I found a solution after re-reading about Pydantic models and their mapping to dict.
in crud.py:
def create_nested_family(db: Session, family: schemas.NestedFamilyCreate):
    # split information in family and members
    family_data = family.dict()
    member_data = family_data.pop('members', None)  # ToDo: handle error if no members
    db_family = models.Family(**family_data)
    db.add(db_family)
    db.commit()
    db.refresh(db_family)
    # get family_id
    family_id = db_family.id
    # add members
    for m in member_data:
        m['family_id'] = family_id
        db_member = models.Member(**m)
        db.add(db_member)
        db.commit()
        db.refresh(db_member)
    return db_family
I hope this is useful to someone else.
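The split-then-insert pattern in this answer can also be run as a single transaction, so a failure while adding members does not leave a family row behind. The sketch below shows that variation with Python's built-in sqlite3 module instead of SQLAlchemy. The table layout mirrors the models above, while the payload values and the plain-dict input are assumptions made for the example.

```python
import sqlite3


def create_nested_family(conn, family):
    """Insert a family and its members, committing once at the end.

    `family` is a plain dict shaped like the NestedFamilyCreate payload.
    """
    family_data = dict(family)  # shallow copy so the caller's dict keeps its keys
    member_data = family_data.pop("members", [])
    with conn:  # one transaction: commit on success, roll back on error
        cur = conn.execute(
            "INSERT INTO families (family_name) VALUES (?)",
            (family_data["family_name"],),
        )
        family_id = cur.lastrowid  # id generated for the new family row
        conn.executemany(
            "INSERT INTO members (name, age, family_id) VALUES (?, ?, ?)",
            [(m["name"], m.get("age"), family_id) for m in member_data],
        )
    return family_id


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE families (id INTEGER PRIMARY KEY, family_name TEXT NOT NULL)"
)
conn.execute(
    "CREATE TABLE members (id INTEGER PRIMARY KEY, name TEXT NOT NULL, age INTEGER, "
    "family_id INTEGER NOT NULL REFERENCES families (id))"
)
payload = {
    "family_name": "Smith",
    "members": [{"name": "Ann", "age": 40}, {"name": "Bob", "age": 12}],
}
new_id = create_nested_family(conn, payload)
members = conn.execute(
    "SELECT name, age, family_id FROM members ORDER BY id"
).fetchall()
```

Defaulting the popped value to an empty list also covers the "no members" case that the ToDo comment in the answer mentions.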

Column-name mapping in SQLAlchemy

I'm just two days into learning SQLAlchemy, and have a question regarding mapping of column names. First off, the following statement in the official docs threw me off balance:
Matching columns on name works for simple cases but can become
unwieldy when dealing with complex statements that contain duplicate
column names or when using anonymized ORM constructs that don’t easily
match to specific names. Additionally, there is typing behavior
present in our mapped columns that we might find necessary when
handling result rows.
Ugh, what?! An example or two of what this paragraph is trying to say would have been great, but never mind; it's a task for another day.
I kind of understood that sometimes the field list generated by our query will not map perfectly with the model, so we can specify expected columns positionally. This got me trying to think up an example where this mapping might be useful. So I thought, how about selecting a column from the table that is calculated on the fly? I simulated this in a sqlite database using the random() function like this:
from sqlalchemy import create_engine
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy import Column, Integer, String, text
from sqlalchemy.orm import sessionmaker

engine = create_engine('sqlite:///:memory:', echo=True)
Base = declarative_base()
Session = sessionmaker(bind=engine)
session = Session()

class User(Base):
    __tablename__ = 'users'
    id = Column(Integer, primary_key=True)
    name = Column(String)
    fullname = Column(String)
    password = Column(String)

    def __repr__(self):
        return "User<name={}, fullname={}, password={}>".format(self.name, self.fullname, self.password)

# Create all tables
Base.metadata.create_all(engine)
ed_user = User(name='ed', fullname='Ed Jones', password='edspassword')
session.add(ed_user)
session.add_all([
    User(name='wendy', fullname='Wendy Williams', password='foobar'),
    User(name='mary', fullname='Mary Contrary', password='xxg527'),
    User(name='fred', fullname='Fred Flinstone', password='blah')
])
session.commit()
stmt = text("SELECT name, id, fullname, random() FROM users WHERE name=:name")
stmt = stmt.columns(User.name, User.id, User.fullname ...
Now what? I can't write a column name because none exists, and I can't add it to the model definition because that would create an unnecessary column in the table. All I wanted was to get some extra information from the database (like NOW()) that I can use in code.
How can this be done?
You can use column labels
from sqlalchemy import func
users = session.query(User.id, User.name, User.fullname, func.random().label('random_value'))
for user in users:
    print(user.id, user.name, user.fullname, user.random_value)
Try explicitly listing column objects. This way "random()" can be any arbitrary SQL calculated column...
# You may have already imported column and text, but for completeness:
from sqlalchemy import column, text
# As an example:
find_name = 'wendy'
stmt = text("SELECT name, id, fullname, (random()) as random_value FROM users WHERE name=:name")
stmt = stmt.columns(column('name'), column('id'), column('fullname'), column('random_value'))
users = session.query(column('name'), column('id'), column('fullname'), column('random_value')).from_statement(stmt).params(name=find_name)
for user in users:
    print(user.id, user.name, user.fullname, user.random_value)
All right, so I seem to have figured this out myself (damn, that's twice in a day! The SQLAlchemy community here seems dead, which is forcing me to search for answers myself ... not a very bad thing, though!). Here's how I did it:
from sqlalchemy import func
session.query(User.name, User.id, User.fullname, func.current_timestamp()).filter(User.name=='ed').all()
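What the answers above have in common is that the computed expression gets a name in the result set, either through a label or an SQL alias. A minimal sketch of that idea at the SQL level follows, using Python's built-in sqlite3 module rather than SQLAlchemy. The aliases `random_value` and `fetched_at` are illustrative names, and the table is a cut-down version of the one in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # result rows allow access by column name
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, fullname TEXT)")
conn.executemany(
    "INSERT INTO users (name, fullname) VALUES (?, ?)",
    [("ed", "Ed Jones"), ("wendy", "Wendy Williams")],
)

# The aliases give the computed expressions names that the result row exposes,
# even though no such columns exist in the table.
row = conn.execute(
    "SELECT name, id, fullname, random() AS random_value, "
    "CURRENT_TIMESTAMP AS fetched_at "
    "FROM users WHERE name = :name",
    {"name": "wendy"},
).fetchone()

column_names = row.keys()
print(row["name"], row["random_value"], row["fetched_at"])
```

Nothing is added to the table definition: the extra values exist only in the query result, which is what the question was after.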
