Very simple question: I am trying to run a query in Python 3 SQLAlchemy to delete some records, given string names of the table and field to query against.
1. How do you get the table object from a string?
2. Given 1., how do you run a query via the ORM with just a string of the field name?
I would assume all ORMs have an internal mapping or a get-style method keyed by name.
json_config = [
    {"table": "tableA",
     "field": "modified_on",
     "expires": 30},
    {"table": "tableB",
     "field": "event_on",
     "expires": 30}
]
for table_conf_item in self.json_config:
    table_name = table_conf_item["table"]
    field_name = table_conf_item["field"]
    expire_after = table_conf_item["expires"]
    # Hypothetical API; this is what I am hoping exists:
    table_obj = self.orm_session.TABLES[table_name]
    field_obj = self.orm_session.TABLES[table_name].FIELDS[field_name]
    result = self.orm_session.delete(table_obj).where(field_obj < expire_after)
    self.orm_session.commit()
    print(f"{table_name}: removed {result.row_count} objects")
Given the table's name, you can use reflection to get a Table object. Using SQLAlchemy's core layer, this is reasonably straightforward:
import sqlalchemy as sa

engine = sa.create_engine(...)
metadata = sa.MetaData()
tbl = sa.Table(name_of_table, metadata, autoload_with=engine)
If you want to work with multiple tables, it may be more efficient to store them in a MetaData instance for later access:
metadata = sa.MetaData()
metadata.reflect(engine, only=list_of_table_names)
tbl = metadata.tables[name_of_table]
Once you have a Table object you can reference columns by name like this: tbl.c[name_of_field].
Full example:
import sqlalchemy as sa

# Setup
engine = sa.create_engine('sqlite://', echo=True, future=True)
tbl = sa.Table(
    't',
    sa.MetaData(),
    sa.Column('id', sa.Integer, primary_key=True),
    sa.Column('foo', sa.Integer),
)
tbl.create(engine)
with engine.begin() as conn:
    vals = [42, 43, 42, 43, 56, 87, 89]
    conn.execute(tbl.insert(), [{'foo': v} for v in vals])
del tbl

# Reflect the table.
metadata = sa.MetaData()
metadata.reflect(engine, only=['t'])
tbl = metadata.tables['t']

# Define some statements.
q1 = sa.select(tbl).where(tbl.c['foo'] == 42)
q2 = sa.select(tbl.c['id'], tbl.c['foo']).where(tbl.c['foo'] == 43)
q3 = sa.delete(tbl).where(tbl.c['foo'] != 42)

# Execute the statements.
with engine.connect() as conn:
    rows = conn.execute(q1)
    for row in rows:
        print(row)
    print()
    rows = conn.execute(q2)
    for row in rows:
        print(row)
    print()

with engine.begin() as conn:
    conn.execute(q3)

with engine.connect() as conn:
    rows = conn.execute(q1)
    for row in rows:
        print(row)
    print()
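Tying this back to the question's json_config, here is a minimal sketch of the expiry-delete loop over reflected tables. It assumes expires is a number of days and that the field holds a datetime to compare against a cutoff; both are assumptions, since the question doesn't say.

import datetime
import sqlalchemy as sa

metadata = sa.MetaData()
metadata.reflect(engine, only=[item["table"] for item in json_config])

with engine.begin() as conn:
    for item in json_config:
        tbl = metadata.tables[item["table"]]
        col = tbl.c[item["field"]]
        # Assumption: "expires" means days before now.
        cutoff = datetime.datetime.utcnow() - datetime.timedelta(days=item["expires"])
        result = conn.execute(sa.delete(tbl).where(col < cutoff))
        print(f"{item['table']}: removed {result.rowcount} rows")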
Doing the same through the ORM layer is more complicated, as table and column names must be mapped to ORM entity classes (models) and their attributes. This replicates the previous example for a simple mapping (it assumes the same initial data as above).
import sqlalchemy as sa
from sqlalchemy import orm

Base = orm.declarative_base()


class Thing(Base):
    __tablename__ = 't'

    id = sa.Column(sa.Integer, primary_key=True)
    thing_foo = sa.Column('foo', sa.Integer)


engine = sa.create_engine(...)
Base.metadata.create_all(engine)
Session = orm.sessionmaker(engine, future=True)

tablename = 't'
columnname = 'foo'

with Session.begin() as s:
    # Get the mappers for the Base class.
    mappers = Base.registry.mappers
    # Get the mapper for our table.
    mapper = next(m for m in mappers if m.entity.__tablename__ == tablename)
    # Get the entity class (Thing).
    entity = mapper.entity
    # Get the column from the Table.
    table_column = mapper.selectable.c[columnname]
    # Get the mapper property that corresponds to the column
    # (the entity attribute may have a different name to the
    # column in the database).
    mapper_property = mapper.get_property_by_column(table_column)
    # Get the queryable entity attribute (Thing.thing_foo).
    attr = mapper.all_orm_descriptors[mapper_property.key]

    q = sa.select(entity).where(attr != 42)
    for obj in s.scalars(q):
        s.delete(obj)

with Session() as s:
    for thing in s.scalars(sa.select(Thing)):
        print(thing.id, thing.thing_foo)
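If the name stored in your config is the ORM attribute name rather than the database column name, the mapper lookup above collapses to a plain getattr. A minimal sketch under that assumption:

# Assumption: the config stores the attribute name ('thing_foo'),
# not the underlying column name ('foo').
attr = getattr(Thing, 'thing_foo')

with Session.begin() as s:
    # ORM-enabled bulk delete of every Thing where the attribute is not 42.
    s.execute(sa.delete(Thing).where(attr != 42))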
I am not able to iterate through my query as I would like using Peewee
These are the related objects in Models.py:
class Conversation(peewee.Model):
    id = peewee.AutoField(unique=True, index=True)
    creation_date = peewee.DateTimeField(default=datetime.now)
    contact_id = ForeignKeyField(Contact, backref='conversation')
    launch_id = ForeignKeyField(Launch, backref='conversation')
    request_data = peewee.TextField(null=True)
    status = peewee.TextField(null=True)


class Contact(peewee.Model):
    id = peewee.AutoField(unique=True, index=True)
    uuid = peewee.CharField(default=shortuuid.uuid, index=True)
    whatsapp_phone = peewee.CharField(index=True, default='')
    status = peewee.CharField(default='init')
    conversationId = peewee.CharField(null=True)
Here's how I am trying to iterate:
for conversation in Conversation.select().where(Conversation.launch_id == str(launch_id)):
    print(conversation.contact.id)
And this is the error that I am getting:
print(conversation.contact.id)
AttributeError: 'Conversation' object has no attribute 'contact'
I've tried to change the way I do my query:
query = Conversation.select(Contact).join(Contact).where(Conversation.launch_id == str(launch_id))
But I get the exact same error if I iterate in the same way.
The issue is that you are trying to access .contact when you've named your foreign key .contact_id. The peewee docs are clear about foreign-key naming; you want this:
class Conversation(peewee.Model):
    id = peewee.AutoField(unique=True, index=True)
    creation_date = peewee.DateTimeField(default=datetime.now)
    # Data will be stored in a column named "contact_id":
    contact = ForeignKeyField(Contact, backref='conversations')
    # Data will be stored in a column named "launch_id":
    launch = ForeignKeyField(Launch, backref='conversations')
    request_data = peewee.TextField(null=True)
    status = peewee.TextField(null=True)
This allows:
query = (Conversation
         .select()
         .where(Conversation.launch == str(launch_id)))

for conversation in query:
    # Access the underlying foreign-key value.
    print(conversation.contact_id)
Or, if you intend to access other fields on the Contact:
query = (Conversation
         .select(Conversation, Contact)
         .join(Contact)
         .where(Conversation.launch == str(launch_id)))

for conversation in query:
    # We now have a "full" Contact instance we can access efficiently:
    print(conversation.contact.id)
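As a side note on the two query shapes: without the join, accessing conversation.contact still works, but peewee then issues one extra SELECT per row to fetch each Contact (the classic N+1 pattern). A tiny sketch of that behaviour, assuming the corrected models above:

# One query for the conversations...
for conversation in Conversation.select().where(Conversation.launch == str(launch_id)):
    # ...plus one query per row here to load the related Contact.
    print(conversation.contact.id)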
Please read the docs:
http://docs.peewee-orm.com/en/latest/peewee/quickstart.html#lists-of-records
http://docs.peewee-orm.com/en/latest/peewee/relationships.html
http://docs.peewee-orm.com/en/latest/peewee/models.html#foreignkeyfield
How do I express a relationship which depends on the length of a child collection in a joined entity?
In the example below, the parent entity is AlgoOrder, the child entity is Order, and PrivateTrade is a child entity of Order.
AlgoOrder --> Order --> PrivateTrade
The problem I am having is with "orders_pending_private_trade_update".
class AlgoOrder(DbModel):
    __tablename__ = "algo_order"
    id = sa.Column(sa.Integer, primary_key=True)
    ... stuff ...

    # https://docs.sqlalchemy.org/en/14/orm/loading_relationships.html
    open_orders = orm.relation(Order, primaryjoin=and_(Order.algo_order_id == id, Order.status == 'OPEN'), lazy='select')
    orders_pending_private_trade_update = orm.relation(Order, primaryjoin=and_(Order.algo_order_id == id, Order.status == 'CLOSED', len(Order.private_trades) == 0), lazy='select')

    @property
    def pending_orders(self):
        return self.open_orders + self.orders_pending_private_trade_update


class Order(DbModel):
    __tablename__ = "order_hist"
    algo_order_id = sa.Column(sa.Integer, sa.ForeignKey("algo_order.id"))
    ... stuff ...
    private_trades = orm.relation(PrivateTrade, primaryjoin=and_(PrivateTrade.order_id == order_id))


class PrivateTrade(DbModel):
    __tablename__ = "private_trade"
    id = sa.Column(sa.Integer, primary_key=True)
    order_id = sa.Column(sa.String, sa.ForeignKey("order_hist.order_id"))
In particular, the error at "orders_pending_private_trade_update" was with "len" on Order.private_trades:
Exception has occurred: TypeError (note: full exception trace is shown but execution is paused at: _run_module_as_main) object of type 'InstrumentedAttribute' has no len()
So, I tried:
from sqlalchemy.sql.expression import func
orders_pending_private_trade_update = orm.relation(Order, primaryjoin=and_(Order.algo_order_id == id, Order.status == 'CLOSED', func.count(Order.private_trades)==0), lazy='select', viewonly=True)
But then the error was "foreign key columns are present in neither the parent nor the child's mapped tables":
Can't determine relationship direction for relationship 'AlgoOrder.orders_pending_private_trade_update' - foreign key columns are present in neither the parent nor the child's mapped tables <class 'sqlalchemy.exc.ArgumentError'>
I checked my tables; I do have them:
op.create_table(
    'algo_order',
    sa.Column('id', sa.Integer(), primary_key=True),
    ...

op.create_table(
    'order_hist',
    sa.Column('id', sa.Integer(), primary_key=True),
    sa.Column('algo_order_id', sa.Integer, sa.ForeignKey("algo_order.id")),
    ...

op.create_table(
    'private_trade',
    sa.Column('id', sa.Integer(), primary_key=True),
    sa.Column('order_id', sa.String(), sa.ForeignKey("order_hist.order_id"))
    ...
Thanks in advance.
I think I found it, but the syntax is pretty ugly: I used closed_orders.session to run a new query.
import sqlalchemy as sa
import sqlalchemy.orm as orm
from sqlalchemy.sql.expression import func
import sqlalchemy.dialects.postgresql as psql
from sqlalchemy.ext.mutable import MutableDict
from sqlalchemy.sql.expression import and_


class AlgoOrder(DbModel):
    __tablename__ = "algo_order"
    id = sa.Column(sa.Integer, primary_key=True)
    ... other stuff ...

    open_orders = orm.relation(Order, primaryjoin=and_(Order.algo_order_id == id, Order.status == 'OPEN'), lazy='select')
    closed_orders = orm.relation(Order, primaryjoin=and_(Order.algo_order_id == id, Order.status == 'CLOSED'), lazy='dynamic', viewonly=True)

    @property
    def orders_pending_private_trade_update(self):
        # Ids of closed orders that have no private trades.
        order_ids_with_no_private_trades = [
            order.id
            for order in self.closed_orders.session.query(
                Order.id, func.count(PrivateTrade.id).label('count_private_trades')
            ).join(PrivateTrade, isouter=True).group_by(Order.id).having(
                func.count(PrivateTrade.id) == 0
            ).all()
        ]
        # Fetch the orders themselves.
        orders_with_no_private_trades = self.closed_orders.session.query(Order).filter(
            Order.id.in_(order_ids_with_no_private_trades)
        ).order_by(Order.id.desc()).limit(1000).all()
        return orders_with_no_private_trades

    @property
    def pending_orders(self):
        return list(self.open_orders) + list(self.orders_pending_private_trade_update)
I also don't like limit(1000) as an attempt to cap the number of rows fetched. And how/when do you dispose of the list to prevent a memory leak? I think the above approach is bad; it should use a generator instead of returning a list.
Essentially, in raw SQL, what I am looking for is a generator that yields the results of the query below:
select
    order_id,
    cnt
from (
    select
        order_hist.id,
        order_hist.order_id,
        count(private_trade.id) cnt
    from order_hist
    left join private_trade on private_trade.order_id = order_hist.order_id
    where order_hist.status in ('CLOSED', 'CANCELLED')
    group by order_hist.id, order_hist.order_id
) src
where
    cnt = 0
Is there a better way to do this? I think my solution shows the SQLAlchemy syntax, but it's computationally inefficient.
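For what it's worth, that raw SQL can also be built with SQLAlchemy constructs. A minimal sketch using the mapped classes above (it assumes Order.order_id and Order.status are mapped columns, which the other snippets imply):

import sqlalchemy as sa
from sqlalchemy.sql.expression import func

# Inner query: count private trades per closed/cancelled order.
src = (
    sa.select(Order.id, Order.order_id, func.count(PrivateTrade.id).label("cnt"))
    .join(PrivateTrade, PrivateTrade.order_id == Order.order_id, isouter=True)
    .where(Order.status.in_(["CLOSED", "CANCELLED"]))
    .group_by(Order.id, Order.order_id)
    .subquery()
)

# Outer query: keep only the orders with zero private trades.
stmt = sa.select(src.c.order_id, src.c.cnt).where(src.c.cnt == 0)

Executed with stream_results (as in the generator below), this keeps memory bounded as well.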
Here's a solution using a generator instead, to avoid a MemoryError:
def order_hist_missing_private_trade_get(engine):
    order_hist_missing_private_trade_sql = '''
        select
            order_id,
            cnt
        from (
            select
                order_hist.id,
                order_hist.order_id,
                count(private_trade.id) cnt
            from order_hist
            left join private_trade on private_trade.order_id = order_hist.order_id
            where order_hist.status in ('CLOSED', 'CANCELLED')
            group by order_hist.id, order_hist.order_id
        ) src
        where
            cnt = 0
    '''
    with engine.connect() as conn:
        # https://stackoverflow.com/questions/7389759/memory-efficient-built-in-sqlalchemy-iterator-generator
        conn = conn.execution_options(stream_results=True)
        rs = conn.execute(order_hist_missing_private_trade_sql)
        while True:
            batch = rs.fetchmany(10000)
            if not batch:
                # No more rows; stop the generator.
                break
            for row in batch:
                yield row['order_id']
Usage:
from sqlalchemy import create_engine

connstr: str = "postgresql://postgres:your_secret@localhost/postgres"
engine = create_engine(connstr)

generator = order_hist_missing_private_trade_get(engine)
for order_id in generator:
    print(f"order_id: {order_id}")
I understand that the documented way to insert data into a table looks like this:
```
class Table(db.Model):
    __tablename__ = 'table'
    id = db.Column(db.Integer, primary_key=True)
    data = db.Column(db.String(50))
    ...

insert = Table(id='0', data='new data')
```
However, I am working on a project that has multiple tables all with different columns, lengths, and data. I have worked out how to get the dynamic data into a dict, prepped to create rows. Below is my actual code:
def load_csv_data(self, ctx):
    data_classes = [Locations, Scents, Classes]
    data_tables = ['locations', 'scents', 'classes']
    tables = len(data_tables)
    for i in range(tables):
        with open('./development/csv/{}.csv'.format(data_tables[i]), newline='') as times_file:
            times_reader = csv.reader(times_file, delimiter=',', quotechar='|')
            for row in times_reader:
                data_columns = data_classes[i].__table__.columns
                columns = len(data_columns)
                insert_data = {}
                for col in range(columns):
                    row_key = data_columns[col].key
                    row_value = row[col]
                    insert_data.update({row_key: row_value})
The challenge I am having is finding a way to do the actual insert based on these dynamic params. So if the above returns:
insert_data = {val1: val2, val3: val4, val5: val6}
I would like to convert this to:
insert = Table(val1='val2', val3='val4', val5='val6')
Everything I have tried so far has raised an "__init__() missing 2 required positional arguments" error.
Anyone have any thoughts on how I might accomplish this?
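For reference, Python's keyword-argument unpacking does exactly this conversion. A minimal sketch, assuming the insert_data and data_classes from the code above, and the usual Flask-SQLAlchemy db.session (an assumption here):

# Equivalent to Table(val1='val2', val3='val4', val5='val6').
record = data_classes[i](**insert_data)
db.session.add(record)

# Committing once per file, outside the row loop, is usually cheaper.
db.session.commit()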
Does anyone know what this error means? Below is the class method that raises the error sqlalchemy.exc.CompileError: The 'oracle' dialect with current database version settings does not support empty inserts.
I'm using the same class method with SQL Server and it works. How do I get around this in Oracle using the SQLAlchemy ORM?
def insertr(self, tablename, data, schema=None):
    def convert_nan(v):
        if pd.isnull(v) or pd.isna(v):
            v = None
        return v

    class DbTable(object):
        pass

    engine = self.engine
    metadata = MetaData(bind=engine)
    table = Table(tablename, metadata, autoload=True, quote=True, schema=schema)
    mapper(DbTable, table)
    DbTable.__getattribute__(DbTable, self.primary_key)
    insert_rows = [{k: convert_nan(v) for k, v in check_ir.items()} for check_ir in data]
    session = sessionmaker(bind=engine)()
    session.bulk_insert_mappings(DbTable, insert_rows)
    session.commit()
    session.flush()
and the data that I'm trying to insert looks like:
[{'coll': 10, 'col2': 'value'}, {'col1': 20, 'col2': 'value'}]
I found a workaround for this problem, so if anyone stumbles upon the same issue, here is the code (lower-casing the keys so they match the reflected column names appears to be what avoids the empty inserts):
def insertr(self, tablename, data, schema=None):
    def convert_nan(v):
        if pd.isnull(v) or pd.isna(v):
            v = None
        return v

    engine = self.engine
    metadata = MetaData(bind=engine)
    table = Table(tablename, metadata, autoload=True, quote=True, schema=schema)
    insert_rows = [{k.lower(): convert_nan(v) for k, v in check_ir.items()} for check_ir in data]
    with engine.begin() as connection:
        for insert_row in insert_rows:
            connection.execute(table.insert().values(**insert_row))
I followed the checklist below to resolve the same error:
1. Check that the column names and the dictionary keys (of the dict to be inserted) match (case-sensitive match); a sketch of this check follows the list.
2. Some database object, such as a trigger or procedure, is trying to insert a value into a column which does not exist.
3. Trying to insert a None value into a NOT-NULL column.
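For the first item, here is a minimal sketch of that check, assuming the reflected table and insert_rows from the workaround code above:

# Compare the keys of the first row against the reflected column names.
row_keys = set(insert_rows[0])
column_names = set(table.columns.keys())
unmatched = row_keys - column_names
if unmatched:
    raise ValueError(f"keys with no matching column (check case): {unmatched}")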
I can, for example, get BigQuery data into local Python with:
import os
from google.cloud import bigquery
project_id = "example-project"
dataset_id = "example_dataset"
table_id = "table_id"
os.environ["GOOGLE_CLOUD_PROJECT"] = project_id
bq = bigquery.Client()
query = "SELECT * FROM {}.{} LIMIT 5".format(dataset_id, table_id)
resp = bq.run_sync_query(query)
resp.run()
data_list = resp.rows
The result:
print(data_list)
>>> [('BEDD', '1',), ('A75', '1',), ('CE3F', '1',), ('0D8C', '1',), ('3E9C', '1',)]
How do I then go and get the schema for this table, such that, for example:
headings = ('heading1', 'heading2')
# or
schema_dict = {'fields': [{'name': 'heading1', 'type': 'STRING'}, {'name': 'heading2', 'type': 'STRING'}]}
You can use the schema attribute of your resp variable.
After running the query you can retrieve it:
schema = resp.schema
schema will be a list containing the definition for each column in your query.
As an example, let's say this is your query:
query = "select '1' as fv, STRUCT<i INT64, j INT64> (1, 2) t from `dataset.table` limit 1"
The schema will be a list containing 2 entries:
[<google.cloud.bigquery.schema.SchemaField at 0x7ffa64fe6e50>,
<google.cloud.bigquery.schema.SchemaField at 0x7ffa64fe6b10>]
For each object in schema, you have the attributes field_type, fields, mode, and name, so if you run:
schema[0].field_type, schema[0].mode, schema[0].name
The result is "STRING", "NULLABLE", "fv".
As the second column is a record, then if you run:
schema[1].field_type, schema[1].mode, schema[1].name, schema[1].fields
The result is:
"RECORD", "NULLABLE", "t", [google schema 1, google schema 2]
Where google schema 1 contains the definition for the inner fields within the record.
As far as I know, there's no way of getting a dictionary like the one you showed in your question, which means you'll have to loop over the entries in schema and build it yourself. It should be simple, though. I'm not sure this works, as I haven't fully tested it, but it might give you an idea of how to do it:
def extract_schema(schema_resp):
    l = []
    for schema_obj in schema_resp:
        r = {}
        r['name'] = schema_obj.name
        r['type'] = schema_obj.field_type
        r['mode'] = schema_obj.mode
        if schema_obj.fields:
            r['fields'] = extract_schema(schema_obj.fields)
        l.append(r)
    return l
So you'd just have to run schema = extract_schema(resp.schema) and (hopefully) you'll be good to go.
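A quick usage sketch, building both of the shapes from the question (assuming the resp from the code above):

schema_list = extract_schema(resp.schema)

# e.g. ('heading1', 'heading2')
headings = tuple(col['name'] for col in schema_list)

# e.g. {'fields': [{'name': 'heading1', 'type': 'STRING'}, ...]}
schema_dict = {'fields': schema_list}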