How to apply DISTINCT on only the date part of a datetime field in SQLAlchemy (Python)?

I need to query my database and return the results, applying DISTINCT to only the date part of a datetime field.
My code is:
@blueprint.route('/<field_id>/timeline', methods=['GET'])
@blueprint.response(field_timeline_paged_schema)
def get_field_timeline(
    field_id,
    page=1,
    size=10,
    order_by=['capture_datetime desc'],
    **kwargs
):
    session = flask.g.session
    field = fetch_field(session, parse_uuid(field_id))
    if field:
        query = session.query(
            func.distinct(cast(Capture.capture_datetime, Date)),
            Capture.capture_datetime.label('event_date'),
            Capture.tags['visibility'].label('visibility')
        ).filter(Capture.field_id == parse_uuid(field_id))
        return paginate(
            query=query,
            order_by=order_by,
            page=page,
            size=size
        )
However this returns the following error:
(psycopg2.errors.InvalidColumnReference) for SELECT DISTINCT, ORDER BY expressions must appear in select list
The resulting query is:
SELECT distinct(CAST(tenant_resson.capture.capture_datetime AS DATE)) AS distinct_1, CAST(tenant_resson.capture.capture_datetime AS DATE) AS event_date, tenant_resson.capture.tags -> %(tags_1)s AS visibility
FROM tenant_resson.capture
WHERE tenant_resson.capture.field_id = %(field_id_1)s
Error is:
Query error - {'error': ProgrammingError('(psycopg2.errors.InvalidColumnReference) SELECT DISTINCT ON expressions must match initial ORDER BY expressions\nLINE 2: FROM (SELECT DISTINCT ON (CAST(tenant_resson.capture.capture...\n ^\n',)
How can I resolve this issue? The cast is not being applied in the ORDER BY.

I am not familiar with SQLAlchemy, but this resulting query works as you expect. Please note the DISTINCT ON.
Maybe there is a way in SQLAlchemy to execute non-trivial parameterized queries? That would give you the extra benefit of being able to test and optimize the query up front.
SELECT DISTINCT ON (CAST(tenant_resson.capture.capture_datetime AS DATE))
CAST(tenant_resson.capture.capture_datetime AS DATE) AS event_date,
tenant_resson.capture.tags -> %(tags_1)s AS visibility
FROM tenant_resson.capture
WHERE tenant_resson.capture.field_id = %(field_id_1)s;
You can order by event_date if your business logic needs it.

The query posted by @Stefanov.sm is correct. In SQLAlchemy terms it would be
query = (
    session.query(
        Capture.capture_datetime.label('event_date'),
        Capture.tags['visibility'].label('visibility')
    )
    .distinct(cast(Capture.capture_datetime, Date))
    .filter(Capture.field_id == parse_uuid(field_id))
)
See the docs for more information; on PostgreSQL, passing expressions to Query.distinct() renders a DISTINCT ON clause.

I needed to add order_by to my query; PostgreSQL requires the ORDER BY list to begin with the DISTINCT ON expressions. Now it works fine.
query = session.query(
    cast(Capture.capture_datetime, Date).label('event_date'),
    Capture.tags['visibility'].label('visibility')
).filter(Capture.field_id == parse_uuid(field_id)) \
 .distinct(cast(Capture.capture_datetime, Date)) \
 .order_by(cast(Capture.capture_datetime, Date).desc())
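With this, the emitted SQL takes the shape SELECT DISTINCT ON (CAST(capture_datetime AS DATE)) ... ORDER BY CAST(capture_datetime AS DATE) DESC, so the DISTINCT ON expressions match the initial ORDER BY expressions, which is exactly what the earlier error message demanded.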

Related

psycopg2 SELECT query with inbuilt functions

I have the following SQL statement, where I am reading the database to get the records for one day. Here is what I tried in the pgAdmin console:
SELECT * FROM public.orders WHERE createdat >= now()::date AND type='t_order'
I want to convert this to psycopg2 syntax, but somehow it throws errors:
Database connection failed due to invalid input syntax for type timestamp: "now()::date"
Here is what I am doing:
query = f"SELECT * FROM {table} WHERE (createdat>=%s AND type=%s)"
cur.execute(query, ("now()::date", "t_order"))
records = cur.fetchall()
Any help is deeply appreciated.
DO NOT use f-strings. Use proper Parameter Passing.
now()::date is better expressed as current_date. See Current Date/Time.
You want:
query = "SELECT * FROM public.orders WHERE (createdat>=current_date AND type=%s)"
cur.execute(query, ["t_order"])
If you want dynamic identifiers, table/column names then:
from psycopg2 import sql
query = sql.SQL("SELECT * FROM {} WHERE (createdat>=current_date AND type=%s)").format(sql.Identifier(table))
cur.execute(query, ["t_order"])
For more information see sql.
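If the cutoff date ever needs to come from Python instead of the database clock, bind a datetime.date object rather than a string; psycopg2 adapts Python date/datetime values natively. A minimal sketch, with placeholder connection details:
import datetime

import psycopg2

# Placeholder DSN for illustration; substitute real connection details.
conn = psycopg2.connect("dbname=mydb user=me")
cur = conn.cursor()

# psycopg2 adapts datetime.date to a SQL date automatically.
query = "SELECT * FROM public.orders WHERE (createdat >= %s AND type = %s)"
cur.execute(query, (datetime.date.today(), "t_order"))
records = cur.fetchall()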

Inserting Timestamp Into Snowflake Using Python 3.8

I have an empty table defined in Snowflake as:
CREATE OR REPLACE TABLE db1.schema1.table(
ACCOUNT_ID NUMBER NOT NULL PRIMARY KEY,
PREDICTED_PROBABILITY FLOAT,
TIME_PREDICTED TIMESTAMP
);
It creates the correct table, which has been checked using the DESC command in SQL. Then, using the Snowflake Python connector, we are trying to execute the following query:
insert_query = f'INSERT INTO DATA_LAKE.CUSTOMER.ACT_PREDICTED_PROBABILITIES(ACCOUNT_ID, PREDICTED_PROBABILITY, TIME_PREDICTED) VALUES ({accountId}, {risk_score},{ct});'
ctx.cursor().execute(insert_query)
The variables are defined just before this query; the main challenge is getting the current timestamp written into Snowflake. Here the value of ct is defined as:
import datetime
ct = datetime.datetime.now()
print(ct)
2021-04-30 21:54:41.676406
But when we try to execute this INSERT query we get the following error message:
ProgrammingError: 001003 (42000): SQL compilation error:
syntax error line 1 at position 157 unexpected '21'.
Can I kindly get some help on how to format the datetime value here? Help is appreciated.
In addition to the answer @Lukasz provided, you could also think about defining current_timestamp() as the default for the TIME_PREDICTED column:
CREATE OR REPLACE TABLE db1.schema1.table(
ACCOUNT_ID NUMBER NOT NULL PRIMARY KEY,
PREDICTED_PROBABILITY FLOAT,
TIME_PREDICTED TIMESTAMP DEFAULT current_timestamp
);
And then just insert ACCOUNT_ID and PREDICTED_PROBABILITY:
insert_query = f'INSERT INTO DATA_LAKE.CUSTOMER.ACT_PREDICTED_PROBABILITIES(ACCOUNT_ID, PREDICTED_PROBABILITY) VALUES ({accountId}, {risk_score});'
ctx.cursor().execute(insert_query)
It will automatically assign the insert time to TIME_PREDICTED.
Educated guess. When performing the insert with:
insert_query = f'INSERT INTO ...(ACCOUNT_ID, PREDICTED_PROBABILITY, TIME_PREDICTED)
VALUES ({accountId}, {risk_score},{ct});'
it is string interpolation: ct is supplied as the string representation of a datetime, which does not match the timestamp data type, hence the error.
I would suggest using proper variable binding instead:
ctx.cursor().execute(
    "INSERT INTO DATA_LAKE.CUSTOMER.ACT_PREDICTED_PROBABILITIES "
    "(ACCOUNT_ID, PREDICTED_PROBABILITY, TIME_PREDICTED) "
    "VALUES (:1, :2, :3)",
    (
        accountId,
        risk_score,
        ("TIMESTAMP_LTZ", ct)
    )
)
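One caveat, based on the connector documentation rather than the original answer: numeric binding with :1, :2, ... only takes effect if the connector's paramstyle is switched before the connection is created, since the default is pyformat. A minimal sketch with placeholder credentials:
import snowflake.connector

# Must be set before connect(); the connector's default paramstyle is 'pyformat'.
snowflake.connector.paramstyle = 'numeric'

# Placeholder credentials for illustration.
ctx = snowflake.connector.connect(
    user='USER',
    password='PASSWORD',
    account='ACCOUNT'
)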
Avoid SQL Injection Attacks
Avoid binding data using Python’s formatting function because you risk SQL injection. For example:
# Binding data (UNSAFE EXAMPLE)
con.cursor().execute(
"INSERT INTO testtable(col1, col2) "
"VALUES({col1}, '{col2}')".format(
col1=789,
col2='test string3')
)
Instead, store the values in variables, check those values (for example, by looking for suspicious semicolons inside strings), and then bind the parameters using qmark or numeric binding style.
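For comparison, a safe counterpart to the unsafe example above, assuming snowflake.connector.paramstyle = 'qmark' was set before connecting:
# Binding data (safe): the values travel separately from the SQL text.
con.cursor().execute(
    "INSERT INTO testtable(col1, col2) VALUES (?, ?)",
    (789, 'test string3')
)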
You forgot to place quotes before and after {ct}. The code should be:
insert_query = "INSERT INTO DATA_LAKE.CUSTOMER.ACT_PREDICTED_PROBABILITIES(ACCOUNT_ID, PREDICTED_PROBABILITY, TIME_PREDICTED) VALUES ({accountId}, {risk_score},'{ct}');".format(accountId=accountId,risk_score=risk_score,ct=ct)
ctx.cursor().execute(insert_query)

RedShift Correlated Sub-query

Need your help. I am trying to convert the SQL query below to Redshift, but I am getting the error message "Invalid operation: This type of correlated subquery pattern is not supported yet".
SELECT
Comp_Key,
Comp_Reading_Key,
Row_Num,
Prev_Reading_Date,
( SELECT MAX(X) FROM (
SELECT CAST(dateadd(day, 1, Prev_Reading_Date) AS DATE) AS X
UNION ALL
SELECT dim_date.calendar_date
) a
) as start_dt
FROM stage5
JOIN dim_date ON calendar_date BETWEEN '2020-04-01' and '2020-04-15'
WHERE Comp_Key =50906055
The same query works fine in SQL Server. Could you please help me to run it in RedShift?
Regards,
Kiru
Kiru - you need to convert the correlated query into a join structure. Not knowing the data content of your tables or the exact expected output, I'm just guessing, but here's a swag:
SELECT
Comp_Key,
Comp_Reading_Key,
Row_Num,
Prev_Reading_Date,
Max_X
FROM stage5
JOIN dim_date ON calendar_date BETWEEN '2020-04-01' and '2020-04-15'
JOIN ( SELECT MAX(X) AS Max_X, MAX(calendar_date) AS date FROM (
        SELECT CAST(dateadd(day, 1, Prev_Reading_Date) AS DATE) AS X,
               dim_date.calendar_date
        FROM stage5
        CROSS JOIN dim_date
    ) a
) AS start_dt ON start_dt.date = dim_date.calendar_date
WHERE Comp_Key =50906055
This is just a starting guess but might get you started.
However, you are likely better off rewriting this query to use window functions (LEAD(), or MAX() OVER a window, for example), as they are the fastest way to perform these types of looping queries in Redshift.
Thanks Bill. It won't work in Redshift as it still has a correlated sub-query.
However, I have rewritten the query another way and it works fine.
I am closing the ticket.

SQLAlchemy: Referencing labels in SELECT subqueries

I'm trying to figure out how to replicate the below query in SQLAlchemy
SELECT c.company_id AS company_id,
(SELECT policy_id FROM associative_table at WHERE at.company_id = c.company_id) AS policy_id_ref,
(SELECT `default` FROM policy p WHERE p.policy_id = policy_id_ref) AS `default`
FROM company c;
Note that this is a stripped down, basic example of what I'm really dealing with. The actual schema supports data and relationship versioning that requires the subqueries to include additional conditions, sorting, and limiting, making it impractical (if not impossible) for them to be joins.
The crux of the problem is in how the second subquery relies on policy_id_ref -- the value obtained from the first subquery. In SQLAlchemy, this is effectively what I have now:
ct = aliased(classes.company)
at = aliased(classes.associative_table)
pt = aliased(classes.policy)
policy_id_ref = session.query(at.policy_id).\
    filter(at.company_id == ct.company_id).\
    label('policy_id_ref')
policy_default = session.query(pt.default).\
    filter(pt.id == 'policy_id_ref').\
    label('default')
query = session.query(ct.company_id, policy_id_ref, policy_default)
The pull from the "company" table works fine as does the first subquery that retrieves the "policy_id_ref" column. The problem is the second subquery that has to reference that "policy_id_ref" column. I don't know how to write its filter in such a way that it literally renders "policy_id_ref" in the resulting query, to match the label of the first subquery.
Suggestions?
Thanks in advance
You can write your query as
select(
Companies.company_id,
AssociativeTable.policy_id.label('policy_id_ref'),
Policy.default.label('policy_default'),
).select_from(
Companies,
).join(
AssociativeTable,
AssociativeTable.company_id == Companies.company_id,
).join(
Policy,
AssociativeTable.policy_id == Policy.id
)
but if you need to reference a label from a subquery, use literal_column:
from sqlalchemy import func, select, literal_column
from sqlalchemy.dialects.postgresql import JSONB

session.query(
    func.array_agg(
        literal_column('batch_info'),
        type_=JSONB
    ).label('history')
).select_from(
    select(
        func.jsonb_build_object(
            'batch_id', AccountingQueueBatch.id,
            'batch_label', AccountingQueueBatch.label,
        ).label('batch_info')
    ).select_from(
        AccountingQueueBatch,
    ).subquery()
)
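Applied to the shape of the question, a minimal sketch might look like the following; it assumes the aliased classes from the question, and that the backend resolves the select-list alias the way the original raw SQL does:
from sqlalchemy import literal_column

policy_id_ref = session.query(at.policy_id).\
    filter(at.company_id == ct.company_id).\
    label('policy_id_ref')

# literal_column renders the alias name verbatim in the generated SQL
# instead of quoting it as a string value.
policy_default = session.query(pt.default).\
    filter(pt.id == literal_column('policy_id_ref')).\
    label('default')

query = session.query(ct.company_id, policy_id_ref, policy_default)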

Dynamic variable parameter in static mysql query using groovy soap ui

I would like to generate a query that returns the results for BeneficiaryID 'ABC123', together with any other inputs that were also given. For example, if a currency value is given, I would like to include the currency condition in the JOIN query as well, and likewise for category. I have the following code snippet in a SoapUI Groovy script.
query= " CORR.BeneficiaryID LIKE 'ABC123'"
if (currencyValue!=""){
query=query + " and CORR.Currency LIKE '${currencyValue}'"
}
if (CategoryValue!=""){
query=query + " and CORR.Category LIKE '${CategoryValue}'"
}
log.info("Query" + query)
Outputrows = sql.rows("""select CORR.Preferred as preferred, CORR.Category as category, CORR.Currency as currency
    from BENEFICIARY CORR
    JOIN LOCATION LOC on CORR.UID=LOC.UID and ${query}""")
log.info("Output rows size" + Outputrows.size())
When currency and category are not given, I would like the following query to run and get me the results.
select CORR.Preferred as preferred, CORR.Category as category, CORR.Currency as currency
from BENEFICIARY CORR
JOIN LOCATION LOC on CORR.UID=LOC.UID and CORR.BeneficiaryID LIKE 'ABC123'
and when the currency and category are given (say USD and Commercial), then the following query:
select CORR.Preferred as preferred, CORR.Category as category, CORR.Currency as currency
from BENEFICIARY CORR
JOIN LOCATION LOC on CORR.UID=LOC.UID and CORR.BeneficiaryID LIKE 'ABC123' and CORR.Currency LIKE 'USD' and CORR.Category LIKE 'Commercial'
All I can see as the result of Outputrows.size() is zero (0).
Can you please point out where I am going wrong?
Thanks.
Here is the changed script.
Since the issue is just building the query, only that part is shown; the SQL execution part is removed, as that is not really the issue.
// Define the values, or remove this if you get the values from somewhere else.
// They are set here just to demonstrate.
// You may also try an empty value to make sure you are getting the right query.
def currencyValue = 'USD'
def categoryValue = 'Commercial'
def query = 'select CORR.Preferred as preferred, CORR.Category as category, CORR.Currency as currency from BENEFICIARY CORR JOIN LOCATION LOC on CORR.UID = LOC.UID and CORR.BeneficiaryID LIKE \'ABC123\''
if (currencyValue) { query += " and CORR.Currency LIKE '${currencyValue}'" }
if (categoryValue) { query += " and CORR.Category LIKE '${categoryValue}'" }
log.info "Final query is \n ${query}"
You can then pass query on to wherever you need to run the SQL, say sql.rows(query).
You may quickly try it in an online Groovy console (Demo).
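Note that this still interpolates currencyValue and categoryValue directly into the SQL string, which is fine for trusted inputs but open to SQL injection otherwise; groovy.sql.Sql also accepts parameterized queries (for example, sql.rows(queryWithPlaceholders, paramsList), where those two names are illustrative) if the values come from user input.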
