Django: aggregate if all boolean model fields are True - python-3.x

from django.db import models

class Car(models.Model):
    sold = models.BooleanField(default=False)
I can determine whether all cars were sold by making two queries:
sold_count = Car.objects.filter(sold=True).count()
all_count = Car.objects.count()
are_all_sold = (all_count - sold_count) == 0
Since this operation is very frequent in my app, I am wondering if it is possible to do it in just one DB query, e.g. using aggregation or query expressions.
Just an update: I can get the stats on how many were sold and unsold in one single query:
Car.objects.values("sold").annotate(count=Count("sold"))
<QuerySet [{'sold': False, 'count': 1}, {'sold': True, 'count': 9}]>
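Building on that, the yes/no check itself can also be collapsed into one query with a filtered aggregate (a sketch, assuming Django 2.0+, where aggregate functions accept a filter argument):
from django.db.models import Count, Q

# One SELECT: count all rows and sold rows in the same query.
stats = Car.objects.aggregate(
    total=Count('pk'),
    sold=Count('pk', filter=Q(sold=True)),
)
are_all_sold = stats['total'] == stats['sold']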

Yes. You can do that by using a model manager.
In models.py:
from django.db import models

class CarManager(models.Manager):
    def are_all_sold_or_not(self):
        sold_count = Car.objects.filter(sold=True).count()
        all_count = Car.objects.count()
        return all_count == sold_count

class Car(models.Model):
    sold = models.BooleanField(default=False)

    objects = CarManager()
In views.py:
def myview(request):
    ...
    are_all_sold = Car.objects.are_all_sold_or_not()
    ...
You can read more about Model Manager in the documentation:
https://docs.djangoproject.com/en/4.0/topics/db/managers/
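Note that the manager above still issues two queries. A single-query variant is possible with an EXISTS check; a sketch (the method name are_all_sold is hypothetical, not from the answer above):
from django.db import models

class CarManager(models.Manager):
    def are_all_sold(self):
        # One EXISTS query: every car is sold iff no unsold car exists.
        return not self.get_queryset().filter(sold=False).exists()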

Related

psycopg2 list of DictRow to nested pydantic objects

I am working on a small Python CRUD game using FastAPI. For personal reasons, I don't want to use an ORM, so I am using psycopg2 as a DB connector and pydantic for schema validation (for any CRUD operations).
models/villages.py
from pydantic import BaseModel

class Village(BaseModel):
    village_id: int
    name: str
    owner_id: int
    location_id: int

class UserVillages(BaseModel):
    villages: list[Village]
crud/villages.py
def get_villages(session: Database, user_id: str):
    sql = """
        SELECT *
        FROM villages
        WHERE owner_id = (%s)
    """
    params = [user_id]
    records = session.select_rows_dict_cursor(sql, params)
    print(records)  ## [[1, 'paris', 145, 4, 41], [3, 'milan', 16, 4, 15]]
Instead of printing records, I would like to convert each DictRow into a Village object and all Villages into a UserVillages object. Would it be possible to do that without a lot of extra data structures and loops?
**I found a way to do it but it's not really efficient and there are probably built-in functions to do it**
def get_villages(session: Database, user_id: str):
    sql = """
        SELECT *
        FROM villages
        WHERE owner_id = (%s)
    """
    params = [user_id]
    records = session.select_rows_dict_cursor(sql, params)
    villages = []
    for record in records:
        villages.append(
            Village(
                village_id=record[0],
                name=record[1],
                owner_id=record[2],
                location_id=record[3],
            )
        )
    return UserVillages(villages=villages)
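For what it's worth, if the rows come back as mappings rather than positional lists, the loop collapses into a single parsing call. A minimal sketch, assuming psycopg2's RealDictCursor and pydantic v1 (where parse_obj_as exists; pydantic v2 uses TypeAdapter instead); the raw-connection wiring is an assumption, since the Database wrapper isn't shown:
from psycopg2.extras import RealDictCursor
from pydantic import parse_obj_as

def get_villages(conn, user_id: str) -> UserVillages:
    sql = "SELECT village_id, name, owner_id, location_id FROM villages WHERE owner_id = %s"
    # RealDictCursor returns each row as a dict keyed by column name.
    with conn.cursor(cursor_factory=RealDictCursor) as cur:
        cur.execute(sql, [user_id])
        records = cur.fetchall()
    # pydantic validates and converts the whole list in one call.
    return UserVillages(villages=parse_obj_as(list[Village], records))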

How to bulk create or update in Django

I have to process an item report CSV file every hour. The CSV contains 150k+ records for one account, and there are multiple accounts in my system. I previously worked on Rails, where there was an ActiveRecord gem to handle this use case very efficiently. I am looking for an alternative to that gem in Django, or any built-in method that would be helpful for importing such large data in bulk.
So far I have tried this code.
class ItemReportService:

    def call(self, file_url):
        with open(file_url, 'r') as file:
            reader = csv.DictReader(file)
            products = []
            for row in reader:
                product = self.process_product(row)
                products.append(product)
            self.update_products(products)

    def process_product(self, row):
        print(f'Processing sku: {row["SKU"]}')
        product = Product.objects.filter(
            sku=row['SKU']).first() or Product(sku=row['SKU'])
        product.listing_title = row['Product Name']
        product.listed_price = row['Price']
        product.buy_box_price = row['Buy Box Item Price'] + \
            row['Buy Box Shipping Price']
        product.status = row['Lifecycle Status']
        return product

    def update_products(self, products):
        Product.objects.bulk_update(
            products,
            [
                'listing_title',
                'listed_price',
                'buy_box_price',
                'status',
            ]
        )
It raises this exception because a new product doesn't have a primary key assigned to it yet:
ValueError: All bulk_update() objects must have a primary key set.
Django 4.1 has new parameters for bulk_create(): update_conflicts=bool and update_fields=[].
If your model has a UNIQUE field, Django would normally refuse to insert a conflicting row. But if you set the update_conflicts parameter to True, the fields listed in update_fields will be updated on conflict instead.
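Applied to the question's CSV import, a sketch (assuming Django 4.1+ and that sku is declared unique on Product) could look like this:
products = [
    Product(
        sku=row['SKU'],
        listing_title=row['Product Name'],
        listed_price=row['Price'],
        buy_box_price=row['Buy Box Item Price'],
        status=row['Lifecycle Status'],
    )
    for row in reader
]
# One statement: insert new SKUs, update the listed fields for existing ones.
Product.objects.bulk_create(
    products,
    update_conflicts=True,
    unique_fields=['sku'],
    update_fields=['listing_title', 'listed_price', 'buy_box_price', 'status'],
)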
You are not saving the products to the database before calling bulk_update.
I have checked your code; for this purpose, you can use bulk_create with an additional parameter:
Model.objects.bulk_create(self.data, ignore_conflicts=True)
or
columns = ['column1', 'column2']
items_to_be_inserted = []
for row in rows:  # e.g. rows from csv.DictReader
    obj = Model.objects.filter(column1="sku").first()
    if not obj:
        obj = Model.objects.create(column1="sku")
    obj.column1 = row["column1"] or obj.column1
    obj.column2 = row["column2"] or obj.column2
    items_to_be_inserted.append(obj)
In the end, you can do a bulk update like:
Model.objects.bulk_update(items_to_be_inserted, columns)
This will solve your problem.
I made this classmethod, which can be used on any Django model in a project.
from django.db import models

class BaseModel(models.Model):

    class Meta:
        abstract = True  # so Django doesn't create a table for the base class

    @classmethod
    def bulk_create_or_update(
        cls, uniques: list[str],
        defaults: list[str],
        data: list[dict]
    ):
        # Get existing object list
        data_dict, select = {}, None
        for entry in data:
            sub_entry, key = {}, ''
            for uniq in uniques:
                sub_entry[uniq] = entry[uniq]
                key += str(entry[uniq])
            data_dict[key] = entry
            if not select:
                select = models.Q(**sub_entry)
                continue
            select |= models.Q(**sub_entry)
        records = cls.objects.filter(select).values('pk', *uniques)
        existing = {}
        for rec in records:
            key = ''
            for uniq in uniques:
                key += str(rec[uniq])
            existing[key] = rec

        # Split new objects from existing ones
        to_create, to_update = [], []
        for key, entry in data_dict.items():
            obj = cls(**entry)
            if key not in existing:
                to_create.append(obj)
                continue
            obj.pk = existing[key]['pk']
            to_update.append(obj)

        cls.objects.bulk_create(to_create, batch_size=1000)
        cls.objects.bulk_update(to_update, defaults, batch_size=1000)
Let's take a usage example:
class Product(BaseModel):
    price = models.IntegerField()
    name = models.CharField(max_length=128, unique=True)
    status = models.CharField(max_length=128)

if __name__ == '__main__':
    data = [
        {'price': 50, 'name': 'p1', 'status': 'New'},
        {'price': 33, 'name': 'p2', 'status': 'Old'},
    ]
    Product.bulk_create_or_update(uniques=['name'], defaults=['price', 'status'], data=data)
Any suggestions for improving the code are welcome.

Error creating partition key using MergeTree engine, Clickhouse

I've been trying to create a model using infi.clickhouse_orm, but there has been an issue with the partition key.
My model:
from infi.clickhouse_orm import Model, UInt16Field, Float32Field, StringField, MergeTree, DateField

class OHLC(Model):
    __tablename__ = 'ohlc'
    id = UInt16Field()
    min = Float32Field()
    max = Float32Field()
    start_date = DateField()
    interval = StringField()
    engine = MergeTree(partition_key=['id'])
I get the error:
DB::Exception: Syntax error: .. SETTINGS index_granularity=8192.
Expected one of: Arrow, token, non-empty parenthesized list of
expressions
Creating my DB:
""" SqlAlchemy ClickHouse database session maker """
db = Database('test', db_url=os.environ['TEST_CONNECTION'],
              username=os.environ['CLICKHOUSE_USER'], password=os.environ['CLICKHOUSE_PASSWORD'])
db.create_database()
db.create_table(OHLC)
The MergeTree engine requires the primary key in the table declaration, which is passed in the order_by parameter:
..
engine = MergeTree(partition_key=['id'], order_by=['id'])
..
from infi.clickhouse_orm.engines import MergeTree
from infi.clickhouse_orm.fields import UInt16Field, Float32Field, StringField, DateField
from infi.clickhouse_orm.models import Model
from sqlalchemy import create_engine

class OHLC(Model):
    __tablename__ = 'ohlc'
    id = UInt16Field()
    min = Float32Field()
    max = Float32Field()
    start_date = DateField()
    interval = StringField()
    engine = MergeTree(partition_key=['id'], order_by=['id'])

engine = create_engine('clickhouse://default:@localhost/test_001')
with engine.connect() as conn:
    conn.connection.create_database()
    conn.connection.create_table(OHLC)
requirements.txt
sqlalchemy==1.3.18
sqlalchemy-clickhouse==0.1.5.post0
infi.clickhouse_orm==1.3.0
Using id as the partition key looks pretty suspicious; consider defining it as toYYYYMM(start_date) or something like this:
class OHLC(Model):
    __tablename__ = 'ohlc'
    id = UInt16Field()
    min = Float32Field()
    max = Float32Field()
    start_date = DateField()
    interval = StringField()
    engine = MergeTree(partition_key=['toYYYYMM(start_date)'], order_by=['id'])

SQLAlchemy Order joined table by field in another joined table

My project requires that Orders are split into their individual Lines, which can be displayed in their own views. I want these views to order the Lines by eta, which is a value in the Order table.
I have 3 tables, with a 1:1 join on tables 1 & 2 and a many:many join on tables 2 and 3 defined by table 4, as follows:
class Order(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    eta = db.Column(db.DateTime())
    order_lines = db.relationship('Line', backref='order', order_by=lambda: Line.id)

    def __repr__(self):
        return '<Order No. {}>'.format(self.increment_id)

class Line(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    line_name = db.Column(db.String())
    order_id = db.Column(db.Integer, db.ForeignKey('order.id'))
    product_id = db.Column(db.String, db.ForeignKey('product.product_id'))

    def __repr__(self):
        return '<Line SKU: {}>'.format(self.line_sku)

class Line_view(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    view_name = db.Column(db.String())
    view_lines = relationship('Line',
                              secondary='line_view_join',
                              backref='views',
                              lazy='dynamic',
                              order_by=***???***)  # order by eta on Order table

    def __repr__(self):
        return '<View: {}>'.format(self.view_name)

class Line_view_join(db.Model):
    __tablename__ = 'line_view_join'
    id = db.Column(db.Integer(), primary_key=True)
    line_id = db.Column(db.Integer(), db.ForeignKey('line.id', ondelete='CASCADE'))
    view_id = db.Column(db.Integer(), db.ForeignKey('line_view.id', ondelete='CASCADE'))
I am trying to work out how to query table 3, Line_view, and have the joined Lines ordered by the eta of the Order table.
Such that when querying:
chosen_view = Line_view.query.filter_by(id=1).one()
chosen_view.view_lines is ordered by Order.eta.
I have tried:
class Line_view(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    view_name = db.Column(db.String())
    view_lines = relationship('Line',
                              secondary='line_view_join',
                              backref='views',
                              lazy='dynamic',
                              order_by=lambda: asc(Line.order.eta))  # <-- the problematic line

    def __repr__(self):
        return '<View: {}>'.format(self.view_name)
But this results in the error:
AttributeError: Neither 'InstrumentedAttribute' object nor 'Comparator' object associated with Line.order has an attribute 'eta'
Do you need to store the Line_views in the database? If not, you can query the Lines sorted by the eta attribute of the related order. Below, I create two orders with one line each, and then query the lines sorted by the eta attribute of their order:
eta = datetime(2019, 10, 10)
o = Order(eta=eta)
l = Line(order=o, line_name="sample")
db.session.add(o)
db.session.add(l)

eta = datetime(2019, 11, 11)
o1 = Order(eta=eta)
l1 = Line(order=o1, line_name="sample1")
db.session.add(o1)
db.session.add(l1)
db.session.commit()

lines = Line.query.join(Order).order_by(Order.eta)
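If the views do need to be stored, the same join works per view, because view_lines is declared with lazy='dynamic', which makes the relationship attribute a query that can be extended. A sketch, assuming the models above:
chosen_view = Line_view.query.filter_by(id=1).one()
# The dynamic relationship is itself a query: join Order and sort by its eta.
lines = chosen_view.view_lines.join(Order).order_by(Order.eta).all()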

Peewee - update issue with records with ForeignKeyField('self')

I have a database that I am filling from a pd.DataFrame. One of the classes has a ForeignKeyField('self').
from peewee import SqliteDatabase, Model
from peewee import IntegerField, CharField, ForeignKeyField, BooleanField
import pandas as pd

db = SqliteDatabase(':memory:', pragmas=(('foreign_keys', 'on'),))

class BaseModel(Model):
    class Meta:
        database = db

class Team(BaseModel):
    id = IntegerField(unique=True, primary_key=True)
    name = CharField()
    reserve_team = BooleanField()
    parent_team = ForeignKeyField('self', related_name='reserve_teams', null=True)

    class Meta:
        db_table = 'team_team'

Team.create_table()
The DataFrame I am filling from looks something like this:
df = pd.DataFrame({'ID': [1, 2, 3, 4, 5],
                   'Name': ['A', 'A2', 'B', 'C', 'C2'],
                   'Reserve': [False, True, False, False, True],
                   'Parent': [None, 'A', None, None, 'C']})
I use the following code to fill the table. The parent_team is set to None and when the table is filled I intend to go back and update this field where appropriate.
data = []
for row in df.itertuples():
    data.append((row.ID,
                 row.Name,
                 row.Reserve == True,
                 None))

fields = [Team.id,
          Team.name,
          Team.reserve_team,
          Team.parent_team]

with db.atomic():
    Team.insert_many(data, fields=fields).execute()
My problem is that I don't understand how to do this without looping over the dataframe/table combination. The documentation seems pretty clear that this should never be done.
for row in df.itertuples():
    if row.Reserve:
        r = row.ID
        p = row.Parent
        Team.update(parent_team=Team.get(Team.name == p)).where(Team.id == r).execute()
You could do a topo-sort of the data and then insert the rows directly with their parent IDs.
As far as looping and updating go, some ideas (a sketch of the first one follows below):
- wrap the updates in a transaction
- use a ValuesList() to provide the mapping of id -> parent id and update all at once
- insert the id -> parent id pairs into a temp table and update using the temp table (all at once)
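A minimal sketch of the transaction idea, assuming the df and Team model from the question; the name -> ID lookup is built from the DataFrame itself, since the Parent column holds names rather than IDs:
# Map each team name to its ID so updates can set the FK column directly.
parent_ids = {row.Name: row.ID for row in df.itertuples()}

with db.atomic():  # a single transaction instead of one commit per UPDATE
    for row in df.itertuples():
        if row.Reserve:
            (Team
             .update(parent_team=parent_ids[row.Parent])
             .where(Team.id == row.ID)
             .execute())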
