I have the following query that contains a common table expression:
WITH example AS (
SELECT unnest(ARRAY['foo', 'bar', 'baz']) as col
)
SELECT *
FROM example
Trying to use it in database.select(query) throws pony.orm.dbapiprovider.ProgrammingError: syntax error at or near "WITH", and database.select(raw_sql(query)) throws TypeError: expected string or bytes-like object.
How can I select data using a CTE with ponyorm?
To use a query containing a CTE, call the execute function on the database and fetch the rows with the returned cursor:
cursor = database.execute("""
WITH example AS (
SELECT unnest(ARRAY['foo', 'bar', 'baz']) as col
)
SELECT *
FROM example
""")
rows = cursor.fetchall()
Note: The cursor is a class from psycopg2, so while this solution does use the pony library the solution may differ depending on the database being used.
Related
In my mapping dataflow I have simplified this down to dimdate just for the test
My parameters are
The source even tells you exactly how to enter the select query if you are using parameters which is what I'm trying to achieve
Then I import but get two different errors
for parameterizing a table`
SELECT * FROM {$df_TableName}
I get
This error from a select * or invidiual columns
I've tried just the WHERE clause (what I actually need) as a parameter but keep getting datatype mismatch errors
I then started testing multiple ways and it only allows the schema to be parameterised from my queries below
all of these other options seem to fail no matter what I do
SELECT * FROM [{$df_Schema}].[{$df_TableName}] Where [Period] = {$df_incomingPeriod}
SELECT * FROM [dbo].[DimDate] Where [Period] = {$df_incomingPeriod}
SELECT * FROM [dbo].[{$df_TableName}] Where [Period] = {$df_incomingPeriod}
SELECT * FROM [dbo].[DimDate] Where [Period] = 2106
I know there's an issue with the Integer datatype but don't know how to pass this to the query within the parameter without changing its type as the sql engine cannot run [period] as a string
Use CONCAT function in expression builder to build the Query in Dataflow.
concat(<this> : string, <that> : string, ...) => string
Note: Concatenates a variable number of strings together. All the variables should be in form of strings.
Example 1:
concat(toString("select * from "), toString($df_tablename))
Example 2:
concat(toString("select * from "), toString($df_tablename), ' ', toString(" where incomingperiod = "), toString($df_incomingPeriod))
Awesome, it worked like magic for me. I was struggling with parameterizing tables= names which I was passing through Array list.
Created a data flow parameter and gave this value:
#item().TABLE_NAME
I have an empty table defined in snowflake as;
CREATE OR REPLACE TABLE db1.schema1.table(
ACCOUNT_ID NUMBER NOT NULL PRIMARY KEY,
PREDICTED_PROBABILITY FLOAT,
TIME_PREDICTED TIMESTAMP
);
And it creates the correct table, which has been checked using desc command in sql. Then using a snowflake python connector we are trying to execute following query;
insert_query = f'INSERT INTO DATA_LAKE.CUSTOMER.ACT_PREDICTED_PROBABILITIES(ACCOUNT_ID, PREDICTED_PROBABILITY, TIME_PREDICTED) VALUES ({accountId}, {risk_score},{ct});'
ctx.cursor().execute(insert_query)
Just before this query the variables are defined, The main challenge is getting the current time stamp written into snowflake. Here the value of ct is defined as;
import datetime
ct = datetime.datetime.now()
print(ct)
2021-04-30 21:54:41.676406
But when we try to execute this INSERT query we get the following errr message;
ProgrammingError: 001003 (42000): SQL compilation error:
syntax error line 1 at position 157 unexpected '21'.
Can I kindly get some help on ow to format the date time value here? Help is appreciated.
In addition to the answer #Lukasz provided you could also think about defining the current_timestamp() as default for the TIME_PREDICTED column:
CREATE OR REPLACE TABLE db1.schema1.table(
ACCOUNT_ID NUMBER NOT NULL PRIMARY KEY,
PREDICTED_PROBABILITY FLOAT,
TIME_PREDICTED TIMESTAMP DEFAULT current_timestamp
);
And then just insert ACCOUNT_ID and PREDICTED_PROBABILITY:
insert_query = f'INSERT INTO DATA_LAKE.CUSTOMER.ACT_PREDICTED_PROBABILITIES(ACCOUNT_ID, PREDICTED_PROBABILITY) VALUES ({accountId}, {risk_score});'
ctx.cursor().execute(insert_query)
It will automatically assign the insert time to TIME_PREDICTED
Educated guess. When performing insert with:
insert_query = f'INSERT INTO ...(ACCOUNT_ID, PREDICTED_PROBABILITY, TIME_PREDICTED)
VALUES ({accountId}, {risk_score},{ct});'
It is a string interpolation. The ct is provided as string representation of datetime, which does not match a timestamp data type, thus error.
I would suggest using proper variable binding instead:
ctx.cursor().execute("INSERT INTO DATA_LAKE.CUSTOMER.ACT_PREDICTED_PROBABILITIES "
"(ACCOUNT_ID, PREDICTED_PROBABILITY, TIME_PREDICTED) "
"VALUES(:1, :2, :3)",
(accountId,
risk_score,
("TIMESTAMP_LTZ", ct)
)
);
Avoid SQL Injection Attacks
Avoid binding data using Python’s formatting function because you risk SQL injection. For example:
# Binding data (UNSAFE EXAMPLE)
con.cursor().execute(
"INSERT INTO testtable(col1, col2) "
"VALUES({col1}, '{col2}')".format(
col1=789,
col2='test string3')
)
Instead, store the values in variables, check those values (for example, by looking for suspicious semicolons inside strings), and then bind the parameters using qmark or numeric binding style.
You forgot to place the quotes before and after the {ct}. The code should be :
insert_query = "INSERT INTO DATA_LAKE.CUSTOMER.ACT_PREDICTED_PROBABILITIES(ACCOUNT_ID, PREDICTED_PROBABILITY, TIME_PREDICTED) VALUES ({accountId}, {risk_score},'{ct}');".format(accountId=accountId,risk_score=risk_score,ct=ct)
ctx.cursor().execute(insert_query)
I need to query my database and return the result by applying Distinct on only date part of datetime field.
My code is:
#blueprint.route('/<field_id>/timeline', methods=['GET'])
#blueprint.response(field_timeline_paged_schema)
def get_field_timeline(
field_id,
page=1,
size=10,
order_by=['capture_datetime desc'],
**kwargs
):
session = flask.g.session
field = fetch_field(session, parse_uuid(field_id))
if field:
query = session.query(
func.distinct(cast(Capture.capture_datetime, Date)),
Capture.capture_datetime.label('event_date'),
Capture.tags['visibility'].label('visibility')
).filter(Capture.field_id == parse_uuid(field_id))
return paginate(
query=query,
order_by=order_by,
page=page,
size=size
)
However this returns the following error:
(psycopg2.errors.InvalidColumnReference) for SELECT DISTINCT, ORDER BY expressions must appear in select list
The resulting query is:
SELECT distinct(CAST(tenant_resson.capture.capture_datetime AS DATE)) AS distinct_1, CAST(tenant_resson.capture.capture_datetime AS DATE) AS event_date, tenant_resson.capture.tags -> %(tags_1)s AS visibility
FROM tenant_resson.capture
WHERE tenant_resson.capture.field_id = %(field_id_1)s
Error is:
Query error - {'error': ProgrammingError('(psycopg2.errors.InvalidColumnReference) SELECT DISTINCT ON expressions must match initial ORDER BY expressions\nLINE 2: FROM (SELECT DISTINCT ON (CAST(tenant_resson.capture.capture...\n ^\n',)
How to resolve this issue? Cast is not working for order_by.
I am not familiar with sqlalchemy but this resulting query works as you expect. Please note the DISTINCT ON.
Maybe there is a way in sqlalchemy to execute non-trivial parameterized queries? This would give you the extra benefit to be able to test and optimize the query upfront.
SELECT DISTINCT ON (CAST(tenant_resson.capture.capture_datetime AS DATE))
CAST(tenant_resson.capture.capture_datetime AS DATE) AS event_date,
tenant_resson.capture.tags -> %(tags_1)s AS visibility
FROM tenant_resson.capture
WHERE tenant_resson.capture.field_id = %(field_id_1)s;
You can order by event_date if your business logic needs.
The query posted by #Stefanov.sm is correct. In SQLAlchemy terms it would be
query = (
session.query(
Capture.capture_datetime.label('event_date'),
Capture.tags['visibility'].label('visibility')
).distinct(cast(Capture.capture_datetime, Date))\
.filter(Capture.field_id == parse_uuid(field_id))
)
See the docs for more information
I needed to add order_by to my query. Now it works fine.
query = session.query(
cast(Capture.capture_datetime, Date).label('event_date'),
Capture.tags['visibility'].label('visibility')
).filter(Capture.field_id == parse_uuid(field_id)) \
.distinct(cast(Capture.capture_datetime, Date)) \
.order_by(cast(Capture.capture_datetime, Date).desc())
I have a self-referencing db table with notes (id, title, parent).
[Note: the self-referencing property of the table is not relevant to the question]
I would like to order them alphabetically by title and find the rank (= order) of a specific note by its id. I'm using Peewee 3 as ORM.
DB Model:
class Note(Model):
title = CharField()
parent = ForeignKeyField('self', backref='children', null = True)
Code:
noteAlias = Note.alias()
subquery = (noteAlias.select(noteAlias.id, fn.RANK().over(partition_by=[noteAlias.title], order_by=[noteAlias.title]).alias('rank')).where(noteAlias.parent.is_null()).alias('subq'))
query = (Note.select(subquery.c.id, subquery.c.rank).from_(subquery).where(subquery.c.id == 5))
print("Rank of element is: " + str(query.rank))
This code gives me the following error:
cursor.execute(sql, params or ())
sqlite3.OperationalError: near "(": syntax error
SQLite test
If I simply run this sqlite code directly against my db:
SELECT (title, ROW_NUMBER() OVER (ORDER BY title) AS placement) FROM note
I get the error: near ",": syntax error:
Your version of SQLite perhaps doesn't support window functions, as this was added relatively recently in 3.25 I believe.
I'm trying to figure out how to replicate the below query in SQLAlchemy
SELECT c.company_id AS company_id,
(SELECT policy_id FROM associative_table at WHERE at.company_id = c.company_id) AS policy_id_ref,
(SELECT `default` FROM policy p WHERE p.policy_id = policy_id_ref) AS `default`,
FROM company c;
Note that this is a stripped down, basic example of what I'm really dealing with. The actual schema supports data and relationship versioning that requires the subqueries to include additional conditions, sorting, and limiting, making it impractical (if not impossible) for them to be joins.
The crux of the problem is in how the second subquery relies on policy_id_ref -- the value obtained from the first subquery. In SQLAlchemy, this is effectively what I have now:
ct = aliased(classes.company)
at = aliased(classes.associative_table)
pt = aliased(classes.policy)
policy_id_ref = session.query(at.policy_id).\
filter(at.company_id == ct.company_id).\
label('policy_id_ref')
policy_default = session.query(pt.default).\
filter(pt.id == 'policy_id_ref').\
label('default')
query = session.query(ct.company_id,policy_id_ref,policy_default)
The pull from the "company" table works fine as does the first subquery that retrieves the "policy_id_ref" column. The problem is the second subquery that has to reference that "policy_id_ref" column. I don't know how to write its filter in such a way that it literally renders "policy_id_ref" in the resulting query, to match the label of the first subquery.
Suggestions?
Thanks in advance
You can write your query as
select(
Companies.company_id,
AssociativeTable.policy_id.label('policy_id_ref'),
Policy.default.label('policy_default'),
).select_from(
Companies,
).join(
AssociativeTable,
AssociativeTable.company_id == Companies.company_id,
).join(
Policy,
AssociativeTable.policy_id == Policy.id
)
but in case you need reference to label from subquery => use literal_column
from sqlalchemy import func, select, literal_column
session.query(
func.array_agg(
literal_column('batch_info'),
JSONB
).label('history')
).select_from(
select(
func.jsonb_build_object(
'batch_id', AccountingQueueBatch.id,
'batch_label', AccountingQueueBatch.label,
).label('batch_info')
).select_from(
AccountingQueueBatch,
)
)