My Test table has a JSONB column data:
class Test(Base):
    __tablename__ = 'test'
    data = Column(JSONB)
A typical document has two lists:
{'percentage': [10, 20, 50, 80, 90],
 'age': [1.21, 2.65, 5.23, 8.65, 11.78]}
With a column_property I would like to combine these two lists so they are available as a dictionary. In plain Python this is straightforward:
dict(zip(Test.data['percentage'], Test.data['age']))
But with a column_property:
Test.data_dict = column_property(
    dict(zip(Test.data['percentage'], Test.data['age']))
)
this gives:
AttributeError: 'dict' object has no attribute 'label'
Is this actually possible, and how should it be done?
Does this solve your problem?
@property
def data_dict(self):
    return dict(zip(self.data['percentage'], self.data['age']))
In PostgreSQL it would be something like this (for PostgreSQL >= 9.4):
SELECT json_object(array_agg(ARRAY[p, a]))
FROM (
    SELECT unnest(ARRAY(SELECT jsonb_array_elements_text(data->'percentage'))) p,
           unnest(ARRAY(SELECT jsonb_array_elements_text(data->'age'))) a
    FROM test
) x;
In SQLAlchemy
from sqlalchemy.orm import column_property
from sqlalchemy import select, alias, text, Column
from sqlalchemy.dialects.postgresql import JSONB

class Test(Base):
    __tablename__ = 'test'
    data = Column(JSONB)
    data_dict = column_property(
        select([text('json_object(array_agg(ARRAY[p, a]))')]).select_from(
            alias(select([
                text("unnest(ARRAY(SELECT jsonb_array_elements_text(data->'percentage'))) p, "
                     "unnest(ARRAY(SELECT jsonb_array_elements_text(data->'age'))) a")
            ]).select_from(text('test')))
        )
    )
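For completeness, a minimal usage sketch (the engine/session setup is an assumption and the connection string is a placeholder, not part of the original answer); the property is computed by the database and loaded with each row:

from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker

engine = create_engine('postgresql+psycopg2://user:pass@localhost/mydb')  # placeholder connection string
Session = sessionmaker(bind=engine)
session = Session()

# data_dict is computed in the SELECT alongside the other columns
for test in session.query(Test):
    print(test.data_dict)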
The query below produces a result set in the variable result. I need to insert that into iconndest (the new MySQL server), but I have no idea how to insert the query result into the new table. I just want to do INSERT INTO DB.TBL SELECT * FROM RESULT, but I am not sure how.
import mysql.connector
import pandas as pd
from sqlalchemy import create_engine
import multiprocessing as mp
from multiprocessing import cpu_count

try:
    engine_source = create_engine("CONN STRING")
    iconn = engine_source.connect()
    result = iconn.execute('SELECT QUERY')
    print('EXTRACT COMPLETE')
    engine_dest = create_engine("CONN STRING")
    iconndest = engine_dest.connect()
    iconndest.execute('SELECT * from ')
    engine_source.dispose()
    engine_dest.dispose()
except Exception as e:
    print('extract: ' + str(e))
What you describe is very simple if we use .mappings() to convert the list of Row objects to a list of RowMapping objects when we retrieve the results. RowMapping objects behave like dict objects when passed as parameter values:
import sqlalchemy as sa

source_engine = sa.create_engine("mssql+pyodbc://scott:tiger^5HHH@mssql_199")
destination_engine = sa.create_engine("sqlite://")

with source_engine.begin() as conn:
    results = (
        conn.exec_driver_sql(
            """\
            SELECT 1 AS id, N'foo' AS txt
            UNION ALL
            SELECT 2 AS id, N'bar' AS txt
            """
        )
        .mappings()
        .all()
    )
print(results)
# [{'id': 1, 'txt': 'foo'}, {'id': 2, 'txt': 'bar'}]

destination_engine.echo = True
with destination_engine.begin() as conn:
    conn.exec_driver_sql("CREATE TABLE t (id int, txt varchar(10))")
    conn.execute(
        sa.text("INSERT INTO t (id, txt) VALUES (:id, :txt)"), results
    )

"""SQL emitted:
INSERT INTO t (id, txt) VALUES (?, ?)
[generated in 0.00038s] ((1, 'foo'), (2, 'bar'))
"""
I am working on a small Python CRUD game using FastAPI. For personal reasons, I don't want to use an ORM, so I am using psycopg2 as a DB connector and pydantic for schema validation (for all CRUD operations).
models/villages.py
from pydantic import BaseModel

class Village(BaseModel):
    village_id: int
    name: str
    owner_id: int
    location_id: int

class UserVillages(BaseModel):
    villages: list[Village]
crud/villages.py
def get_villages(session: Database, user_id: str):
    sql = """
        SELECT *
        FROM villages
        WHERE owner_id = (%s)
    """
    params = [user_id]
    records = session.select_rows_dict_cursor(sql, params)
    print(records)  # [[1, 'paris', 145, 4, 41], [3, 'milan', 16, 4, 15]]
Instead of printing records, I would like to convert each DictRow into a Village object and wrap all Villages in a UserVillages object. Would it be possible to do that without a lot of extra data structures and loops?
**I found a way to do it, but it's not really efficient and there are probably built-in functions for this**
def get_villages(session: Database, user_id: str):
    sql = """
        SELECT *
        FROM villages
        WHERE owner_id = (%s)
    """
    params = [user_id]
    records = session.select_rows_dict_cursor(sql, params)
    villages = []
    for record in records:
        villages.append(
            Village(
                village_id=record[0],
                name=record[1],
                owner_id=record[2],
                location_id=record[3],
            )
        )
    return UserVillages(villages=villages)
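A possible shortcut (a sketch, assuming select_rows_dict_cursor returns psycopg2 DictRow objects and the selected column names match the Village fields) is to unpack each row directly into the model:

def get_villages(session: Database, user_id: str):
    sql = """
        SELECT village_id, name, owner_id, location_id
        FROM villages
        WHERE owner_id = (%s)
    """
    records = session.select_rows_dict_cursor(sql, [user_id])
    # DictRow behaves like a mapping, so it can be unpacked straight into the pydantic model
    return UserVillages(villages=[Village(**dict(record)) for record in records])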
I have to process an item report CSV file every hour. The CSV contains 150k+ records for one account, and there are multiple accounts in my system. I was previously working in Rails, where the Active Record gem handled this use case very efficiently. I am looking for an alternative to that gem in Django, or any built-in method that would help import such large data in bulk.
So far I have tried this code.
class ItemReportService:

    def call(self, file_url):
        with open(file_url, 'r') as file:
            reader = csv.DictReader(file)
            products = []
            for row in reader:
                product = self.process_product(row)
                products.append(product)
            self.update_products(products)

    def process_product(self, row):
        print(f'Processing sku: {row["SKU"]}')
        product = Product.objects.filter(
            sku=row['SKU']).first() or Product(sku=row['SKU'])
        product.listing_title = row['Product Name']
        product.listed_price = row['Price']
        product.buy_box_price = row['Buy Box Item Price'] + \
            row['Buy Box Shipping Price']
        product.status = row['Lifecycle Status']
        return product

    def update_products(self, products):
        Product.objects.bulk_update(
            products,
            [
                'listing_title',
                'listed_price',
                'buy_box_price',
                'status',
            ]
        )
It is raising this exception because, when there is a new product, it doesn't have a primary key assigned to it:
ValueError: All bulk_update() objects must have a primary key set.
Django 4.1 has new parameters for bulk_create() (update_conflicts=bool and update_fields=[]).
If your model has a field marked UNIQUE, Django would usually ignore it when creating new data. But if you set the update_conflicts parameter to True, the fields listed in update_fields will be updated.
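A sketch of how that could look for the product import above; the field names are taken from the question, and the unique_fields argument (required on PostgreSQL and SQLite) assumes sku is declared unique on the model:

products = [self.process_product(row) for row in reader]
Product.objects.bulk_create(
    products,
    update_conflicts=True,
    unique_fields=['sku'],  # assumes Product.sku is declared with unique=True
    update_fields=['listing_title', 'listed_price', 'buy_box_price', 'status'],
)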
You are not saving the product in the database before applying bulk_update.
I have checked your code; for this purpose you can use bulk_create with an additional parameter:
Model.objects.bulk_create(self.data, ignore_conflicts=True)
or
columns = ['column1', 'column2']
items_to_be_inserted = []
# inside the loop over CSV rows:
obj = Model.objects.filter(column1="sku").first()
if not obj:
    obj = Model.objects.create(column1="sku")
obj.column1 = row["column1"] or obj.column1
obj.column2 = row["column2"] or obj.column2
items_to_be_inserted.append(obj)
In the end, you can do the bulk update like this:
Model.objects.bulk_update(items_to_be_inserted, columns)
This will solve your problem.
I made this classmethod, which can be used on any Django model in a project.
from django.db import models

class BaseModel(models.Model):

    class Meta:
        abstract = True  # abstract base, so no table is created for BaseModel itself

    @classmethod
    def bulk_create_or_update(
        cls, uniques: list[str],
        defaults: list[str],
        data: list[dict]
    ):
        # Get existing object list
        data_dict, select = {}, None
        for entry in data:
            sub_entry, key = {}, ''
            for uniq in uniques:
                sub_entry[uniq] = entry[uniq]
                key += str(entry[uniq])
            data_dict[key] = entry
            if not select:
                select = models.Q(**sub_entry)
                continue
            select |= models.Q(**sub_entry)
        records = cls.objects.filter(select).values('pk', *uniques)
        existing = {}
        for rec in records:
            key = ''
            for uniq in uniques:
                key += str(rec[uniq])
            existing[key] = rec

        # Split new objects from existing ones
        to_create, to_update = [], []
        for key, entry in data_dict.items():
            obj = cls(**entry)
            if key not in existing:
                to_create.append(obj)
                continue
            obj.pk = existing[key]['pk']
            to_update.append(obj)

        cls.objects.bulk_create(to_create, batch_size=1000)
        cls.objects.bulk_update(to_update, defaults, batch_size=1000)
Let's take a usage example:

class Product(BaseModel):
    price = models.IntegerField()
    name = models.CharField(max_length=128, unique=True)
    status = models.CharField(max_length=128)

if __name__ == '__main__':
    data = [
        {'price': 50, 'name': 'p1', 'status': 'New'},
        {'price': 33, 'name': 'p2', 'status': 'Old'}
    ]
    Product.bulk_create_or_update(uniques=['name'], defaults=['price', 'status'], data=data)
Any suggestions for improving the code are welcome.
I have found only a write method:
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions
from beam_nuggets.io import relational_db

with beam.Pipeline(options=PipelineOptions()) as p:
    months = p | "Reading month records" >> beam.Create([
        {'name': 'Jan', 'num': 1},
        {'name': 'Feb', 'num': 2},
    ])
    source_config = relational_db.SourceConfiguration(
        drivername='postgresql+pg8000',
        host='localhost',
        port=5432,
        username='postgres',
        password='password',
        database='calendar',
        create_if_missing=True,
    )
    table_config = relational_db.TableConfiguration(
        name='months',
        create_if_missing=True
    )
    months | 'Writing to DB' >> relational_db.Write(
        source_config=source_config,
        table_config=table_config
    )
The above method can be used for writing to the database, but I need to update rows based on some value (UPDATE table SET value WHERE some_condition).
Thanks
I had the same question and have figured out a good (albeit undocumented) solution.
The TableConfiguration class takes an optional parameter create_insert_f, which expects a function used to generate the SQL that is run for each record. While there are no examples in the main Beam Nuggets codebase, you can supply a function that builds an update statement instead of an insert (see the create_insert_f docstring).
For example, for Postgres you could write:
from sqlalchemy import update

def create_update_some_value(table, record):
    update_statement = (
        update(table)
        .where(table.c.uuid == str(record["uuid"]))
        .values(another_value=str(record["some_value"]))
    )
    return update_statement
and then pass it into your table configuration like
config = TableConfiguration(
    name=MY_TABLE,
    create_if_missing=False,
    primary_key_columns=["uuid"],
    create_insert_f=create_update_some_value,
)
Best of luck!
I have a database that I am filling from a pd.DataFrame. One of the classes has a ForeignKeyField('self').
from peewee import SqliteDatabase, Model
from peewee import IntegerField, CharField, ForeignKeyField, BooleanField
import pandas as pd

db = SqliteDatabase(':memory:', pragmas=(('foreign_keys', 'on'),))

class BaseModel(Model):
    class Meta:
        database = db

class Team(BaseModel):
    id = IntegerField(unique=True, primary_key=True)
    name = CharField()
    reserve_team = BooleanField()
    parent_team = ForeignKeyField('self', related_name='reserve_teams', null=True)

    class Meta:
        db_table = 'team_team'

Team.create_table()
The dataframe I am filling from looks something like this
df = pd.DataFrame({'ID': [1, 2, 3, 4, 5],
                   'Name': ['A', 'A2', 'B', 'C', 'C2'],
                   'Reserve': [False, True, False, False, True],
                   'Parent': [None, 'A', None, None, 'C']})
I use the following code to fill the table. The parent_team is set to None and when the table is filled I intend to go back and update this field where appropriate.
data = []
for row in df.itertuples():
    data.append((row.ID,
                 row.Name,
                 row.Reserve == True,
                 None))

fields = [Team.id,
          Team.name,
          Team.reserve_team,
          Team.parent_team]

with db.atomic():
    Team.insert_many(data, fields=fields).execute()
My problem is that I don't understand how to do this without looping over the dataframe/table combination. The documentation seems pretty clear that this should never be done.
for row in df.itertuples():
    if row.Reserve:
        r = row.ID
        p = row.Parent
        Team.update(parent_team=Team.get(Team.name == p)).where(Team.id == r).execute()
You could do a topo-sort of the data and then insert them directly with the parent IDs.
As far as looping and updating, some ideas:
- wrap it in a transaction (see the sketch below)
- use ValuesList() to provide the mapping of id -> parent id and update all at once
- insert id -> parent id into a temp table and update using the temp table (all at once)
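A minimal sketch of the first idea, assuming Name uniquely identifies a team; the name-to-id mapping is built from the dataframe, so there is no Team.get() query per row:

# Map team names to primary keys using the dataframe itself
name_to_id = dict(zip(df.Name, df.ID))

# One transaction keeps the per-row updates atomic and fast
with db.atomic():
    for row in df.itertuples():
        if row.Reserve:
            (Team
             .update(parent_team=name_to_id[row.Parent])
             .where(Team.id == row.ID)
             .execute())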