Django: aggregate if all boolean model fields are True - python-3.x

from django.db import models

class Car(models.Model):
    sold = models.BooleanField(default=False)
I can determine whether all cars were sold by making two queries:
sold_count = Car.objects.filter(sold=True).count()
all_count = Car.objects.count()
are_all_sold = (all_count - sold_count) == 0
Since this operation is very frequent in my app, I am wondering if it is possible to do it in just one DB query, e.g. using aggregation or query expressions.
Just an update: I can get the stats on how many were sold and unsold in one single query:
Car.objects.values("sold").annotate(count=Count("sold"))
<QuerySet [{'sold': False, 'count': 1}, {'sold': True, 'count': 9}]>
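Building on that, the yes/no check itself can also be collapsed into one query with a filtered aggregate (a sketch, assuming Django 2.0+, where aggregate functions accept a filter argument):
from django.db.models import Count, Q

# One SELECT: count all rows and sold rows in the same query.
stats = Car.objects.aggregate(
    total=Count('pk'),
    sold=Count('pk', filter=Q(sold=True)),
)
are_all_sold = stats['total'] == stats['sold']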

Yes. You can do that by using a model manager.
In models.py:
from django.db import models

class CarManager(models.Manager):
    def are_all_sold_or_not(self):
        sold_count = Car.objects.filter(sold=True).count()
        all_count = Car.objects.count()
        return all_count == sold_count

class Car(models.Model):
    sold = models.BooleanField(default=False)

    objects = CarManager()
In views.py:
def myview(request):
    ...
    are_all_sold = Car.objects.are_all_sold_or_not()
    ...
You can read more about Model Manager in the documentation:
https://docs.djangoproject.com/en/4.0/topics/db/managers/
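Note that the manager above still issues two queries. A single-query variant is possible with an EXISTS check; a sketch (the method name are_all_sold is hypothetical, not from the answer above):
from django.db import models

class CarManager(models.Manager):
    def are_all_sold(self):
        # One EXISTS query: every car is sold iff no unsold car exists.
        return not self.get_queryset().filter(sold=False).exists()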

Related

psycopg2 list of DictRow to nested pydantic objects

I am working on a small Python CRUD game using FastAPI. For personal reasons, I don't want to use an ORM, so I am using psycopg2 as a DB connector and pydantic for schema validation (for any CRUD operations).
models/villages.py
from pydantic import BaseModel

class Village(BaseModel):
    village_id: int
    name: str
    owner_id: int
    location_id: int

class UserVillages(BaseModel):
    villages: list[Village]
crud/villages.py
def get_villages(session: Database, user_id: str):
    sql = """
        SELECT *
        FROM villages
        WHERE owner_id = (%s)
    """
    params = [user_id]
    records = session.select_rows_dict_cursor(sql, params)
    print(records)  ## [[1, 'paris', 145, 4, 41], [3, 'milan', 16, 4, 15]]
Instead of printing records, I would like to convert each DictRow into a Village object and all Villages into a UserVillages object. Would it be possible to do that without a lot of extra data structures and loops?
**I found a way to do it but it's not really efficient and there are probably built-in functions to do it**
def get_villages(session: Database, user_id: str):
    sql = """
        SELECT *
        FROM villages
        WHERE owner_id = (%s)
    """
    params = [user_id]
    records = session.select_rows_dict_cursor(sql, params)
    villages = []
    for record in records:
        villages.append(
            Village(
                village_id=record[0],
                name=record[1],
                owner_id=record[2],
                location_id=record[3],
            )
        )
    return UserVillages(villages=villages)
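For what it's worth, if the rows come back as mappings rather than positional lists, the loop collapses into a single parsing call. A minimal sketch, assuming psycopg2's RealDictCursor and pydantic v1 (where parse_obj_as exists; pydantic v2 uses TypeAdapter instead); the raw-connection wiring is an assumption, since the Database wrapper isn't shown:
from psycopg2.extras import RealDictCursor
from pydantic import parse_obj_as

def get_villages(conn, user_id: str) -> UserVillages:
    sql = "SELECT village_id, name, owner_id, location_id FROM villages WHERE owner_id = %s"
    # RealDictCursor returns each row as a dict keyed by column name.
    with conn.cursor(cursor_factory=RealDictCursor) as cur:
        cur.execute(sql, [user_id])
        records = cur.fetchall()
    # pydantic validates and converts the whole list in one call.
    return UserVillages(villages=parse_obj_as(list[Village], records))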

How to bulk create or update in Django

I have to process an item report CSV file every hour. The CSV contains 150k+ records for one account, and there are multiple accounts in my system. I previously worked on Rails, where there was an ActiveRecord gem to handle this use case very efficiently. I am looking for an alternative to that gem in Django, or any built-in method that would be helpful for importing such large data in bulk.
So far I have tried this code.
class ItemReportService:

    def call(self, file_url):
        with open(file_url, 'r') as file:
            reader = csv.DictReader(file)
            products = []
            for row in reader:
                product = self.process_product(row)
                products.append(product)
            self.update_products(products)

    def process_product(self, row):
        print(f'Processing sku: {row["SKU"]}')
        product = Product.objects.filter(
            sku=row['SKU']).first() or Product(sku=row['SKU'])
        product.listing_title = row['Product Name']
        product.listed_price = row['Price']
        product.buy_box_price = row['Buy Box Item Price'] + \
            row['Buy Box Shipping Price']
        product.status = row['Lifecycle Status']
        return product

    def update_products(self, products):
        Product.objects.bulk_update(
            products,
            [
                'listing_title',
                'listed_price',
                'buy_box_price',
                'status',
            ]
        )
It raises this exception because a new product doesn't have a primary key assigned to it yet:
ValueError: All bulk_update() objects must have a primary key set.
Django 4.1 has new parameters for bulk_create(): update_conflicts=bool and update_fields=[].
If your model has a UNIQUE field, Django would normally refuse to insert a conflicting row. But if you set the update_conflicts parameter to True, the fields listed in update_fields will be updated on conflict instead.
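Applied to the question's CSV import, a sketch (assuming Django 4.1+ and that sku is declared unique on Product) could look like this:
products = [
    Product(
        sku=row['SKU'],
        listing_title=row['Product Name'],
        listed_price=row['Price'],
        buy_box_price=row['Buy Box Item Price'],
        status=row['Lifecycle Status'],
    )
    for row in reader
]
# One statement: insert new SKUs, update the listed fields for existing ones.
Product.objects.bulk_create(
    products,
    update_conflicts=True,
    unique_fields=['sku'],
    update_fields=['listing_title', 'listed_price', 'buy_box_price', 'status'],
)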
You are not saving the products to the database before calling bulk_update.
I have checked your code; for this purpose, you can use bulk_create with an additional parameter:
Model.objects.bulk_create(self.data, ignore_conflicts=True)
or
columns = ['column1', 'column2']
items_to_be_inserted = []
for row in rows:  # e.g. rows from csv.DictReader
    obj = Model.objects.filter(column1="sku").first()
    if not obj:
        obj = Model.objects.create(column1="sku")
    obj.column1 = row["column1"] or obj.column1
    obj.column2 = row["column2"] or obj.column2
    items_to_be_inserted.append(obj)
In the end, you can do a bulk update like:
Model.objects.bulk_update(items_to_be_inserted, columns)
This will solve your problem.
I made this classmethod, which can be used on any Django model in a project.
from django.db import models

class BaseModel(models.Model):

    class Meta:
        abstract = True  # so Django doesn't create a table for the base class

    @classmethod
    def bulk_create_or_update(
        cls, uniques: list[str],
        defaults: list[str],
        data: list[dict]
    ):
        # Get existing object list
        data_dict, select = {}, None
        for entry in data:
            sub_entry, key = {}, ''
            for uniq in uniques:
                sub_entry[uniq] = entry[uniq]
                key += str(entry[uniq])
            data_dict[key] = entry
            if not select:
                select = models.Q(**sub_entry)
                continue
            select |= models.Q(**sub_entry)
        records = cls.objects.filter(select).values('pk', *uniques)
        existing = {}
        for rec in records:
            key = ''
            for uniq in uniques:
                key += str(rec[uniq])
            existing[key] = rec

        # Split new objects from existing ones
        to_create, to_update = [], []
        for key, entry in data_dict.items():
            obj = cls(**entry)
            if key not in existing:
                to_create.append(obj)
                continue
            obj.pk = existing[key]['pk']
            to_update.append(obj)

        cls.objects.bulk_create(to_create, batch_size=1000)
        cls.objects.bulk_update(to_update, defaults, batch_size=1000)
Let's take a usage example:
class Product(BaseModel):
    price = models.IntegerField()
    name = models.CharField(max_length=128, unique=True)
    status = models.CharField(max_length=128)

if __name__ == '__main__':
    data = [
        {'price': 50, 'name': 'p1', 'status': 'New'},
        {'price': 33, 'name': 'p2', 'status': 'Old'},
    ]
    Product.bulk_create_or_update(uniques=['name'], defaults=['price', 'status'], data=data)
Any suggestions for improving the code are welcome.

Error creating partition key using MergeTree engine, Clickhouse

I've been trying to create a model using infi.clickhouse_orm, but there has been an issue with the partition key.
My model:
from infi.clickhouse_orm import Model, UInt16Field, Float32Field, StringField, MergeTree, DateField

class OHLC(Model):
    __tablename__ = 'ohlc'
    id = UInt16Field()
    min = Float32Field()
    max = Float32Field()
    start_date = DateField()
    interval = StringField()
    engine = MergeTree(partition_key=['id'])
I get the error:
DB::Exception: Syntax error: .. SETTINGS index_granularity=8192.
Expected one of: Arrow, token, non-empty parenthesized list of
expressions
Creating my DB:
""" SqlAlchemy ClickHouse database session maker """
db = Database('test', db_url=os.environ['TEST_CONNECTION'],
              username=os.environ['CLICKHOUSE_USER'], password=os.environ['CLICKHOUSE_PASSWORD'])
db.create_database()
db.create_table(OHLC)
The MergeTree engine requires the primary key in the table declaration, which is passed in the order_by parameter:
..
engine = MergeTree(partition_key=['id'], order_by=['id'])
..
from infi.clickhouse_orm.engines import MergeTree
from infi.clickhouse_orm.fields import UInt16Field, Float32Field, StringField, DateField
from infi.clickhouse_orm.models import Model
from sqlalchemy import create_engine

class OHLC(Model):
    __tablename__ = 'ohlc'
    id = UInt16Field()
    min = Float32Field()
    max = Float32Field()
    start_date = DateField()
    interval = StringField()
    engine = MergeTree(partition_key=['id'], order_by=['id'])

engine = create_engine('clickhouse://default:@localhost/test_001')
with engine.connect() as conn:
    conn.connection.create_database()
    conn.connection.create_table(OHLC)
requirements.txt
sqlalchemy==1.3.18
sqlalchemy-clickhouse==0.1.5.post0
infi.clickhouse_orm==1.3.0
Using id as the partition key looks pretty suspicious; consider defining it as toYYYYMM(start_date) or something like this:
class OHLC(Model):
    __tablename__ = 'ohlc'
    id = UInt16Field()
    min = Float32Field()
    max = Float32Field()
    start_date = DateField()
    interval = StringField()
    engine = MergeTree(partition_key=['toYYYYMM(start_date)'], order_by=['id'])

SQLAlchemy Order joined table by field in another joined table

My project requires that Orders are split into their individual Lines, which can be displayed in their own views. I want these views to order the Lines by eta, which is a value in the Order table.
I have 3 tables, with a 1:1 join on tables 1 & 2 and a many:many join on tables 2 and 3 defined by table 4, as follows:
class Order(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    eta = db.Column(db.DateTime())
    order_lines = db.relationship('Line', backref='order', order_by=lambda: Line.id)

    def __repr__(self):
        return '<Order No. {}>'.format(self.increment_id)

class Line(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    line_name = db.Column(db.String())
    order_id = db.Column(db.Integer, db.ForeignKey('order.id'))
    product_id = db.Column(db.String, db.ForeignKey('product.product_id'))

    def __repr__(self):
        return '<Line SKU: {}>'.format(self.line_sku)

class Line_view(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    view_name = db.Column(db.String())
    view_lines = relationship('Line',
                              secondary='line_view_join',
                              backref='views',
                              lazy='dynamic',
                              order_by=***???***)  # order by eta on Order table

    def __repr__(self):
        return '<View: {}>'.format(self.view_name)

class Line_view_join(db.Model):
    __tablename__ = 'line_view_join'
    id = db.Column(db.Integer(), primary_key=True)
    line_id = db.Column(db.Integer(), db.ForeignKey('line.id', ondelete='CASCADE'))
    view_id = db.Column(db.Integer(), db.ForeignKey('line_view.id', ondelete='CASCADE'))
I am trying to work out how to query table 3, Line_view, and have the joined Lines ordered by the eta of the Order table.
Such that when querying:
chosen_view = Line_view.query.filter_by(id=1).one()
chosen_view.view_lines is ordered by Order.eta.
I have tried:
class Line_view(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    view_name = db.Column(db.String())
    view_lines = relationship('Line',
                              secondary='line_view_join',
                              backref='views',
                              lazy='dynamic',
                              order_by=lambda: asc(Line.order.eta))  # <-- the problematic line

    def __repr__(self):
        return '<View: {}>'.format(self.view_name)
But this results in the error:
AttributeError: Neither 'InstrumentedAttribute' object nor 'Comparator' object associated with Line.order has an attribute 'eta'
Do you need to store the Line_views in the database? If not, you can query the Lines sorted by the eta attribute of the related order. Below, I create two orders with one line each, and then query the lines sorted by the eta attribute of their order:
eta = datetime(2019, 10, 10)
o = Order(eta=eta)
l = Line(order=o, line_name="sample")
db.session.add(o)
db.session.add(l)

eta = datetime(2019, 11, 11)
o1 = Order(eta=eta)
l1 = Line(order=o1, line_name="sample1")
db.session.add(o1)
db.session.add(l1)
db.session.commit()

lines = Line.query.join(Order).order_by(Order.eta)
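If the views do need to be stored, the same join works per view, because view_lines is declared with lazy='dynamic', which makes the relationship attribute a query that can be extended. A sketch, assuming the models above:
chosen_view = Line_view.query.filter_by(id=1).one()
# The dynamic relationship is itself a query: join Order and sort by its eta.
lines = chosen_view.view_lines.join(Order).order_by(Order.eta).all()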

Peewee - update issue with records with ForeignKeyField('self')

I have a database that I am filling from a pd.DataFrame. One of the classes has a ForeignKeyField('self').
from peewee import SqliteDatabase, Model
from peewee import IntegerField, CharField, ForeignKeyField, BooleanField
import pandas as pd

db = SqliteDatabase(':memory:', pragmas=(('foreign_keys', 'on'),))

class BaseModel(Model):
    class Meta:
        database = db

class Team(BaseModel):
    id = IntegerField(unique=True, primary_key=True)
    name = CharField()
    reserve_team = BooleanField()
    parent_team = ForeignKeyField('self', related_name='reserve_teams', null=True)

    class Meta:
        db_table = 'team_team'

Team.create_table()
The DataFrame I am filling from looks something like this:
df = pd.DataFrame({'ID': [1, 2, 3, 4, 5],
                   'Name': ['A', 'A2', 'B', 'C', 'C2'],
                   'Reserve': [False, True, False, False, True],
                   'Parent': [None, 'A', None, None, 'C']})
I use the following code to fill the table. The parent_team is set to None and when the table is filled I intend to go back and update this field where appropriate.
data = []
for row in df.itertuples():
    data.append((row.ID,
                 row.Name,
                 row.Reserve == True,
                 None))

fields = [Team.id,
          Team.name,
          Team.reserve_team,
          Team.parent_team]

with db.atomic():
    Team.insert_many(data, fields=fields).execute()
My problem is that I don't understand how to do this without looping over the dataframe/table combination. The documentation seems pretty clear that this should never be done.
for row in df.itertuples():
    if row.Reserve:
        r = row.ID
        p = row.Parent
        Team.update(parent_team=Team.get(Team.name == p)).where(Team.id == r).execute()
You could do a topo-sort of the data and then insert the rows directly with their parent IDs.
As far as looping and updating go, some ideas (a sketch of the first one follows below):
- wrap the updates in a transaction
- use a ValuesList() to provide the mapping of id -> parent id and update all at once
- insert the id -> parent id pairs into a temp table and update using the temp table (all at once)
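A minimal sketch of the transaction idea, assuming the df and Team model from the question; the name -> ID lookup is built from the DataFrame itself, since the Parent column holds names rather than IDs:
# Map each team name to its ID so updates can set the FK column directly.
parent_ids = {row.Name: row.ID for row in df.itertuples()}

with db.atomic():  # a single transaction instead of one commit per UPDATE
    for row in df.itertuples():
        if row.Reserve:
            (Team
             .update(parent_team=parent_ids[row.Parent])
             .where(Team.id == row.ID)
             .execute())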
