psycopg2 list of DictRow to nested pydantic objects - psycopg2

I am working on a small Python CRUD game using FastAPI. For personal reasons, I don't want to use an ORM, so I am using psycopg2 as the DB connector and pydantic for schema validation (for all CRUD operations).
models/villages.py
from pydantic import BaseModel

class Village(BaseModel):
    village_id: int
    name: str
    owner_id: int
    location_id: int

class UserVillages(BaseModel):
    villages: list[Village]
crud/villages.py
def get_villages(session: Database, user_id: str):
    sql = """
        SELECT *
        FROM villages
        WHERE owner_id = (%s)
    """
    params = [user_id]
    records = session.select_rows_dict_cursor(sql, params)
    print(records)  # [[1, 'paris', 145, 4, 41], [3, 'milan', 16, 4, 15]]
Instead of printing records, I would like to convert each DictRow into a Village object and wrap all of the Villages in a UserVillages object. Would it be possible to do that without a lot of extra data structures and loops?
**I found a way to do it, but it's not really efficient and there are probably built-in functions to do it.**
def get_villages(session: Database, user_id: str):
    sql = """
        SELECT *
        FROM villages
        WHERE owner_id = (%s)
    """
    params = [user_id]
    records = session.select_rows_dict_cursor(sql, params)
    villages = []
    for record in records:
        villages.append(
            Village(
                village_id=record[0],
                name=record[1],
                owner_id=record[2],
                location_id=record[3],
            )
        )
    return UserVillages(villages=villages)
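For reference, a more compact sketch, assuming select_rows_dict_cursor returns dict-like rows (e.g. psycopg2.extras.DictRow) whose keys match the Village field names:

def get_villages(session: Database, user_id: str) -> UserVillages:
    sql = """
        SELECT *
        FROM villages
        WHERE owner_id = (%s)
    """
    records = session.select_rows_dict_cursor(sql, [user_id])
    # dict(row) works because DictRow exposes keys(); columns that are not
    # declared on Village are ignored by pydantic's default config.
    return UserVillages(villages=[Village(**dict(r)) for r in records])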

Related

SQLAlchemy - insert from a result object?

The query below builds a result set in the variable result.
I need to insert that into iconndest (the new MySQL server), but I have no idea how to insert the query result into the new table. I just want to do INSERT INTO DB.TBL SELECT * FROM RESULT, but I am not sure how.
import mysql.connector
import pandas as pd
from sqlalchemy import create_engine
import multiprocessing as mp
from multiprocessing import cpu_count

try:
    engine_source = create_engine("CONN STRING")
    iconn = engine_source.connect()
    result = iconn.execute('SELECT QUERY')
    print('EXTRACT COMPLETE')
    engine_dest = create_engine("CONN STRING")
    iconndest = engine_dest.connect()
    iconndest.execute('SELECT * from ')
    engine_source.dispose()
    engine_dest.dispose()
except Exception as e:
    print('extract: ' + str(e))
What you describe is very simple if we use .mappings() to convert the list of Row objects to a list of RowMapping objects when we retrieve the results. RowMapping objects behave like dict objects when passed as parameter values:
import sqlalchemy as sa

source_engine = sa.create_engine("mssql+pyodbc://scott:tiger^5HHH@mssql_199")
destination_engine = sa.create_engine("sqlite://")

with source_engine.begin() as conn:
    results = (
        conn.exec_driver_sql(
            """\
            SELECT 1 AS id, N'foo' AS txt
            UNION ALL
            SELECT 2 AS id, N'bar' AS txt
            """
        )
        .mappings()
        .all()
    )
print(results)
# [{'id': 1, 'txt': 'foo'}, {'id': 2, 'txt': 'bar'}]

destination_engine.echo = True
with destination_engine.begin() as conn:
    conn.exec_driver_sql("CREATE TABLE t (id int, txt varchar(10))")
    conn.execute(
        sa.text("INSERT INTO t (id, txt) VALUES (:id, :txt)"), results
    )
"""SQL emitted:
INSERT INTO t (id, txt) VALUES (?, ?)
[generated in 0.00038s] ((1, 'foo'), (2, 'bar'))
"""

Django: aggregate if all boolean model fields are True

from django.db import models

class Car(models.Model):
    sold = models.BooleanField(default=False)
I can determine if all cars were sold by making two queries:
sold_count = Car.objects.filter(sold=True).count()
all_count = Car.objects.count()
are_all_sold = (all_count - sold_count) == 0
Since this operation is very frequent on my app, I am wondering if it is possible to do it in just one DB query? e.g. using Aggregation or Query Expressions, etc.
Just an update: I can get the stats on how many were sold and unsold with one single query:
Car.objects.values("sold").annotate(count=Count("sold"))
<QuerySet [{'sold': False, 'count': 1}, {'sold': True, 'count': 9}]>
Yes. You can do that by using a Model Manager.
In models.py:
from django.db import models

class CarManager(models.Manager):
    def are_all_sold_or_not(self):
        sold_count = Car.objects.filter(sold=True).count()
        all_count = Car.objects.count()
        return all_count == sold_count

class Car(models.Model):
    sold = models.BooleanField(default=False)
    objects = CarManager()
In views.py:
def myview(request):
    ...
    are_all_sold = Car.objects.are_all_sold_or_not()
    ...
You can read more about Model Manager in the documentation:
https://docs.djangoproject.com/en/4.0/topics/db/managers/
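If you specifically want a single database query, a sketch using conditional aggregation (Count with a filter argument, available since Django 2.0):

from django.db.models import Count, Q

stats = Car.objects.aggregate(
    total=Count('pk'),
    sold=Count('pk', filter=Q(sold=True)),  # counted only when sold is True
)
are_all_sold = stats['total'] == stats['sold']

Both counts come back in one round trip, and the comparison happens in Python.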

How to bulk create or update in Django

I have to process an item report CSV file every hour. The CSV contains 150k+ records for one account, and there are multiple accounts in my system. I was previously working in Rails, where there was an Active Record gem to handle this use case very efficiently. I am looking for an alternative to that gem in Django, or any built-in method that will help import such large data in bulk.
So far I have tried this code.
import csv

class ItemReportService:

    def call(self, file_url):
        with open(file_url, 'r') as file:
            reader = csv.DictReader(file)
            products = []
            for row in reader:
                product = self.process_product(row)
                products.append(product)
            self.update_products(products)

    def process_product(self, row):
        print(f'Processing sku: {row["SKU"]}')
        product = Product.objects.filter(
            sku=row['SKU']).first() or Product(sku=row['SKU'])
        product.listing_title = row['Product Name']
        product.listed_price = row['Price']
        product.buy_box_price = row['Buy Box Item Price'] + \
            row['Buy Box Shipping Price']
        product.status = row['Lifecycle Status']
        return product

    def update_products(self, products):
        Product.objects.bulk_update(
            products,
            [
                'listing_title',
                'listed_price',
                'buy_box_price',
                'status',
            ]
        )
It is raising this exception because, when there is a new product, it doesn't have a primary key assigned to it:
ValueError: All bulk_update() objects must have a primary key set.
Django 4.1 added new parameters to bulk_create(): update_conflicts=bool and update_fields=[].
If your model has a UNIQUE field, Django would usually ignore the conflicting row when creating new data. But if you set the update_conflicts parameter to True, the fields listed in update_fields will be updated instead.
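Applied to the model in the question, a sketch might look like this (assuming sku is declared unique on Product, and products is the list of unsaved instances built from the CSV):

Product.objects.bulk_create(
    products,
    update_conflicts=True,
    unique_fields=['sku'],
    update_fields=['listing_title', 'listed_price', 'buy_box_price', 'status'],
    batch_size=1000,
)

Rows whose sku already exists are updated in place on the listed fields; everything else is inserted, all in one bulk operation.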
You are not saving the product in the database before applying bulk_update.
I have checked your code; you can use bulk_create with an additional parameter:
Model.objects.bulk_create(self.data, ignore_conflicts=True)
or
columns = ['column1', 'column2']

obj = Model.objects.filter(column1="sku").first()
if not obj:
    obj = Model.objects.create(column1="sku")
obj.column1 = row["column1"] or obj.column1
obj.column2 = row["column2"] or obj.column2
items_to_be_inserted.append(obj)
In the end, you can do the bulk update like this:
Model.objects.bulk_update(items_to_be_inserted, columns)
This will solve your problem.
I made this classmethod, which can be used on any Django model in a project.
from django.db import models

class BaseModel(models.Model):

    @classmethod
    def bulk_create_or_update(
        cls, uniques: list[str],
        defaults: list[str],
        data: list[dict]
    ):
        # Get existing object list
        data_dict, select = {}, None
        for entry in data:
            sub_entry, key = {}, ''
            for uniq in uniques:
                sub_entry[uniq] = entry[uniq]
                key += str(entry[uniq])
            data_dict[key] = entry
            if not select:
                select = models.Q(**sub_entry)
                continue
            select |= models.Q(**sub_entry)
        records = cls.objects.filter(select).values('pk', *uniques)
        existing = {}
        for rec in records:
            key = ''
            for uniq in uniques:
                key += str(rec[uniq])
            existing[key] = rec

        # Split new objects from existing ones
        to_create, to_update = [], []
        for key, entry in data_dict.items():
            obj = cls(**entry)
            if key not in existing:
                to_create.append(obj)
                continue
            obj.pk = existing[key]['pk']
            to_update.append(obj)

        cls.objects.bulk_create(to_create, batch_size=1000)
        cls.objects.bulk_update(to_update, defaults, batch_size=1000)
Let's take a usage example:
class Product(BaseModel):
    price = models.IntegerField()
    name = models.CharField(max_length=128, unique=True)
    status = models.CharField(max_length=128)


if __name__ == '__main__':
    data = [
        {'price': 50, 'name': 'p1', 'status': 'New'},
        {'price': 33, 'name': 'p2', 'status': 'Old'},
    ]
    Product.bulk_create_or_update(uniques=['name'], defaults=['price', 'status'], data=data)
Any suggestions for improving the code are welcome.

Peewee - update issue with records with ForeignKeyField('self')

I have a database that I am filling from a pd.DataFrame. One of the classes has a ForeignKeyField('self').
from peewee import SqliteDatabase, Model
from peewee import IntegerField, CharField, ForeignKeyField, BooleanField
import pandas as pd

db = SqliteDatabase(':memory:', pragmas=(('foreign_keys', 'on'),))

class BaseModel(Model):
    class Meta:
        database = db

class Team(BaseModel):
    id = IntegerField(unique=True, primary_key=True)
    name = CharField()
    reserve_team = BooleanField()
    parent_team = ForeignKeyField('self', related_name='reserve_teams', null=True)

    class Meta:
        db_table = 'team_team'

Team.create_table()
The dataframe I am filling from looks something like this
df = pd.DataFrame({'ID': [1, 2, 3, 4, 5],
                   'Name': ['A', 'A2', 'B', 'C', 'C2'],
                   'Reserve': [False, True, False, False, True],
                   'Parent': [None, 'A', None, None, 'C']})
I use the following code to fill the table. The parent_team is set to None and when the table is filled I intend to go back and update this field where appropriate.
data = []
for row in df.itertuples():
    data.append((row.ID,
                 row.Name,
                 row.Reserve == True,
                 None))

fields = [Team.id,
          Team.name,
          Team.reserve_team,
          Team.parent_team]

with db.atomic():
    Team.insert_many(data, fields=fields).execute()
My problem is that I don't understand how to do this without looping over the dataframe/table combination. The documentation seems pretty clear that this should never be done.
for row in df.itertuples():
    if row.Reserve:
        r = row.ID
        p = row.Parent
        Team.update(parent_team=Team.get(Team.name == p)).where(Team.id == r).execute()
You could do a topo-sort of the data and then insert the rows directly with the parent IDs already set.
As far as looping and updating, some ideas:
- wrap the loop in a transaction
- use ValuesList() to provide the mapping of id -> parent id and update all at once (a sketch follows below)
- insert id -> parent id into a temp table and update using the temp table (all at once)
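A rough sketch of the ValuesList() idea, assuming the parent names are resolved to ids in Python first and that the backend supports UPDATE ... FROM (PostgreSQL, or SQLite 3.33+); the column and alias names here are illustrative:

from peewee import ValuesList

# Resolve parent names to primary keys with one query.
name_to_id = {t.name: t.id for t in Team.select(Team.id, Team.name)}

# (team id, parent team id) pairs for the reserve teams only.
pairs = [(row.ID, name_to_id[row.Parent])
         for row in df.itertuples() if row.Reserve]

vl = ValuesList(pairs, columns=('team_id', 'parent_id'), alias='new_parents')

with db.atomic():
    (Team
     .update(parent_team=vl.c.parent_id)
     .from_(vl)
     .where(Team.id == vl.c.team_id)
     .execute())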

SQLAlchemy: column_property jsonb operation

My Test table has a JSONB column data:
class Test(Base):
    __tablename__ = 'test'
    data = Column(JSONB)
A typical document has two lists:
{'percentage': [10, 20, 50, 80, 90],
'age': [1.21, 2.65, 5.23, 8.65, 11.78]
}
With a column_property I would like to combine these two lists so they are available as a dictionary. In plain Python this is straightforward:
dict(zip(Test.data['percentage'], Test.data['age']))
But with a column_property:
Test.data_dict = column_property(
    dict(zip(Test.data['percentage'], Test.data['age']))
)
this gives:
AttributeError: 'dict' object has no attribute 'label'
Is this actually possible, and how should it be done?
Does this solve your problem?
@property
def data_dict(self):
    return dict(zip(self.data['percentage'], self.data['age']))
In PostgreSQL it would be something like this (for PostgreSQL >= 9.4):
SELECT json_object(array_agg(ARRAY[p, a]))
FROM (
    SELECT unnest(ARRAY(SELECT jsonb_array_elements_text(data->'percentage'))) p,
           unnest(ARRAY(SELECT jsonb_array_elements_text(data->'age'))) a
    FROM test
) x;
In SQLAlchemy:
from sqlalchemy.orm import column_property
from sqlalchemy import select, alias, text

class Test(Base):
    __tablename__ = 'test'
    data = db.Column(JSONB)
    data_dict = column_property(
        select([text('json_object(array_agg(ARRAY[p,a]))')]).select_from(
            alias(select([
                text("unnest(ARRAY(select jsonb_array_elements_text(data->'percentage'))) p, \
                      unnest(ARRAY(select jsonb_array_elements_text(data->'age'))) a")
            ]).select_from(text('test')))
        )
    )
