I am trying to query data from BigQuery using WHERE conditions so that only rows within a specified input date range are extracted. I have tried the following two solutions, but neither of them is working for me.
SELECT Count(*)
FROM `my-one-330114.results_out.results_validation_out` t
WHERE date(parse_datetime('%d/%m/%Y', t.{col})) AS date_conv1 >= {end_time}
AND date(parse_datetime('%d/%m/%Y', t.{col})) AS date_conv2 <= {start_time}
SELECT *
FROM `my-one-330114.results_out.results_validation_out`
WHERE parse({column} AS int64) <= format_date("%Y%m%d", parse({column} AS int64))
#AND CAST('%d/%m/%Y', DATE) <= CAST('%Y-%m-%d', '2021-11-01')
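For reference, two issues stand out: an AS alias is not allowed inside a WHERE clause (first query), and PARSE() is not a BigQuery function (CAST or SAFE_CAST would be, second query); the start/end comparisons in the first query also appear swapped. Below is a minimal sketch of a corrected first query, assuming (this is not confirmed by the post) that {col} holds dates as 'dd/mm/YYYY' strings and that start_time/end_time are 'YYYY-MM-DD' strings:

# Hypothetical sketch: build the BigQuery query in Python.
# Assumes col, start_time and end_time are defined, and that col names a
# STRING column holding 'dd/mm/YYYY' dates.
query = f"""
SELECT COUNT(*)
FROM `my-one-330114.results_out.results_validation_out` t
WHERE PARSE_DATE('%d/%m/%Y', t.{col})
      BETWEEN DATE '{start_time}' AND DATE '{end_time}'
"""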
I have a Cosmos DB instance with over 1 million JSON documents stored in it.
I am trying to pull documents created within a certain time frame, based on the _ts property, which is auto-generated when a document is inserted and represents the UNIX timestamp of that moment.
I am unable to understand why these two queries produce drastically different results:
Query 1:
Select *
from c
where c._ts > TimeStamp1
AND c._ts < TimeStamp2
Produces 0 results
Query 2:
Select *
from c
where c._ts > TimeStamp1
AND c._ts < TimeStamp2
order by c._ts desc
Produces the correct number of results.
What I have tried
I suspected this might be because of the default Cosmos DB indexing of the data, so I rewrote the indexing policy to index only that property. Still the same problem.
Since my end goal is to GROUP BY the returned data, I then tried to use GROUP BY with ORDER BY, alone and in a subquery. Surprisingly, according to the docs, Cosmos DB does not yet support combining GROUP BY with ORDER BY.
What I need help with
Why am I observing this behavior?
Is there a way to index the DB in such a way that the rows are returned?
Beyond this, is there a way to still use GROUP BY and ORDER BY together? (Please don't link this question to another one because of this point; I have gone through them and their answers are not valid in my case.)
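As an aside on that last point, a common workaround is to let Cosmos DB do the GROUP BY and perform the ORDER BY client-side. A minimal sketch with the Python SDK follows; the account URL, key, database/container names, the category field and the t1/t2 timestamps are all placeholders, not values from the original post.

from azure.cosmos import CosmosClient

# Hypothetical sketch: the GROUP BY runs server-side, the ORDER BY step
# is done in Python. url, key, names and t1/t2 are placeholders.
client = CosmosClient(url, credential=key)
container = client.get_database_client("mydb").get_container_client("mycoll")

query = """
SELECT c.category, COUNT(1) AS cnt
FROM c
WHERE c._ts > @t1 AND c._ts < @t2
GROUP BY c.category
"""
items = list(container.query_items(
    query=query,
    parameters=[{"name": "@t1", "value": t1},
                {"name": "@t2", "value": t2}],
    enable_cross_partition_query=True,
))
items.sort(key=lambda r: r["cnt"], reverse=True)  # the ORDER BY, client-side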
@Andy and @Tiny-wa, thanks for replying.
I was able to track down the unintended behavior: it was caused by the GetCurrentTimestamp() calls I used to calculate the timestamps. The documentation states:
This system function will not utilize the index. If you need to
compare values to the current time, obtain the current time before
query execution and use that constant string value in the WHERE
clause.
Although I don't fully understand what this means, I was able to solve it by creating a stored procedure in which the timestamp is fetched before the SQL API query is formed and executed, and with that I got the rows as expected.
The stored procedure pseudocode looks like this:
function FetchData() {
    ..
    ..
    ..
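    // Fetch the current time BEFORE building the query, so the WHERE clause
    // compares c._ts against a constant rather than GetCurrentTimestamp().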
    var Current_TimeStamp = Date.now();
    var CDbQuery =
        `SELECT *
         FROM c
         WHERE (c._ts * 10000000) > DateTimeToTicks(DateTimeAdd("day", -1, TicksToDateTime(` + Current_TimeStamp + ` * 10000)))
         AND (c._ts * 10000000) < (` + Current_TimeStamp + ` * 10000)`;
    var isAccepted = collection.queryDocuments(
        collection.getSelfLink(),
        CDbQuery,
        function (err, feed, options) {
            ..
            ..
            ..
        });
}
Disclaimer: I am very new to Azure development.
In an Azure Data Factory Data Flow source option, when I hardcode the date strings and use the query below, it gives the expected results for Cosmos DB:
select c.column1,c.column2 from c where c.time_stamp >= '2010-01-01T20:28:45Z' and c.time_stamp <= '2020-09-11T20:28:45Z'
When I pass the parameters that I mapped in the pipeline and use the query with those parameters, I am not getting any results.
"oldwatermark": "'2010-01-01T20:28:45Z'",
"newwatermark": "'2020-09-11T20:28:45Z'"
select c.column1,c.column2 from c where c.time_stamp >= '$oldwatermark' and c.time_stamp <= '$oldwatermark'
Could you please suggest what I am doing wrong here, as my parameter values and the hardcoded values are the same?
Judging from your working statement, your query should be:
select c.column1,c.column2 from c where c.time_stamp >= $oldwatermark and c.time_stamp <= $newwatermark
not where c.time_stamp >= $oldwatermark and c.time_stamp <= $oldwatermark.
Please don't wrap the parameters in quotes inside the query.
Please try this query:
concat('select c.column1,c.column2 from c where c.time_stamp >= ',$oldwatermark,' and c.time_stamp <= ',$newwatermark)
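Note that this works because the pipeline parameter values shown above already include the surrounding single quotes, so the concat expression does not need to add any.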
I need to query my database and return the result by applying DISTINCT to only the date part of a datetime field.
My code is:
@blueprint.route('/<field_id>/timeline', methods=['GET'])
@blueprint.response(field_timeline_paged_schema)
def get_field_timeline(
    field_id,
    page=1,
    size=10,
    order_by=['capture_datetime desc'],
    **kwargs
):
    session = flask.g.session
    field = fetch_field(session, parse_uuid(field_id))
    if field:
        query = session.query(
            func.distinct(cast(Capture.capture_datetime, Date)),
            Capture.capture_datetime.label('event_date'),
            Capture.tags['visibility'].label('visibility')
        ).filter(Capture.field_id == parse_uuid(field_id))
        return paginate(
            query=query,
            order_by=order_by,
            page=page,
            size=size
        )
However, this returns the following error:
(psycopg2.errors.InvalidColumnReference) for SELECT DISTINCT, ORDER BY expressions must appear in select list
The resulting query is:
SELECT distinct(CAST(tenant_resson.capture.capture_datetime AS DATE)) AS distinct_1, CAST(tenant_resson.capture.capture_datetime AS DATE) AS event_date, tenant_resson.capture.tags -> %(tags_1)s AS visibility
FROM tenant_resson.capture
WHERE tenant_resson.capture.field_id = %(field_id_1)s
Error is:
Query error - {'error': ProgrammingError('(psycopg2.errors.InvalidColumnReference) SELECT DISTINCT ON expressions must match initial ORDER BY expressions\nLINE 2: FROM (SELECT DISTINCT ON (CAST(tenant_resson.capture.capture...\n ^\n',)
How can I resolve this issue? The cast is not working for order_by.
I am not familiar with SQLAlchemy, but the following query works as you expect. Please note the DISTINCT ON.
Maybe there is a way in SQLAlchemy to execute non-trivial parameterized queries? That would give you the extra benefit of being able to test and optimize the query upfront.
SELECT DISTINCT ON (CAST(tenant_resson.capture.capture_datetime AS DATE))
CAST(tenant_resson.capture.capture_datetime AS DATE) AS event_date,
tenant_resson.capture.tags -> %(tags_1)s AS visibility
FROM tenant_resson.capture
WHERE tenant_resson.capture.field_id = %(field_id_1)s;
You can order by event_date if your business logic needs it.
The query posted by @Stefanov.sm is correct. In SQLAlchemy terms it would be
query = (
    session.query(
        Capture.capture_datetime.label('event_date'),
        Capture.tags['visibility'].label('visibility')
    )
    .distinct(cast(Capture.capture_datetime, Date))
    .filter(Capture.field_id == parse_uuid(field_id))
)
See the docs for more information.
I needed to add order_by to my query. Now it works fine.
query = session.query(
    cast(Capture.capture_datetime, Date).label('event_date'),
    Capture.tags['visibility'].label('visibility')
).filter(Capture.field_id == parse_uuid(field_id)) \
 .distinct(cast(Capture.capture_datetime, Date)) \
 .order_by(cast(Capture.capture_datetime, Date).desc())
The following HiveQL code takes about 3 to 4 hours to run, and I am trying to convert it into efficient PySpark DataFrame code. Any input from DataFrame experts is much appreciated.
INSERT overwrite table dlstage.DIBQtyRank_C11 PARTITION(fiscalyearmonth)
SELECT * FROM
(SELECT a.matnr, a.werks, a.periodstartdate, a.fiscalyear, a.fiscalmonth,b.dy_id, MaterialType,(COALESCE(a.salk3,0)) salk3,(COALESCE(a.lbkum,0)) lbkum, sum(a.valuatedquantity) AS valuatedquantity, sum(a.InventoryValue) AS InventoryValue,
rank() over (PARTITION by dy_id, werks, matnr order by a.max_date DESC) rnk, sum(stprs) stprs, max(peinh) peinh, fcurr,fiscalyearmonth
FROM dlstage.DIBmsegFinal a
LEFT JOIN dlaggr.dim_fiscalcalendar b ON a.periodstartdate=b.fmth_begin_dte WHERE a.max_date >= b.fmth_begin_dte AND a.max_date <= b.dy_id and
fiscalYearmonth = concat(fyr_id,lpad(fmth_nbr,2,0))
GROUP BY a.matnr, a.werks,dy_id, max_date, a.periodstartdate, a.fiscalyear, a.fiscalmonth, MaterialType, fcurr, COALESCE(a.salk3,0), COALESCE(a.lbkum,0),fiscalyearmonth) a
WHERE a.rnk=1 and a.fiscalYear = '%s'" %(year) + " and a.fiscalmonth ='%s'" %(mnth)
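Not a full answer, but a rough, untested PySpark sketch of the same logic might look like the following. Table and column names are taken from the HiveQL above; spark, year and mnth are assumed to already be defined, and the partition-overwrite behavior of the final write depends on your Spark configuration.

from pyspark.sql import functions as F
from pyspark.sql.window import Window

a = spark.table("dlstage.DIBmsegFinal")
b = spark.table("dlaggr.dim_fiscalcalendar")

# Join and filter, mirroring the LEFT JOIN + WHERE of the HiveQL.
joined = (
    a.join(b, a["periodstartdate"] == b["fmth_begin_dte"], "left")
     .where((a["max_date"] >= b["fmth_begin_dte"]) &
            (a["max_date"] <= b["dy_id"]) &
            (a["fiscalyearmonth"] ==
             F.concat(b["fyr_id"], F.lpad(b["fmth_nbr"], 2, "0"))))
)

# GROUP BY with the same grouping keys and aggregates.
grouped = (
    joined.groupBy("matnr", "werks", "dy_id", "max_date", "periodstartdate",
                   "fiscalyear", "fiscalmonth", "MaterialType", "fcurr",
                   "fiscalyearmonth",
                   F.coalesce(a["salk3"], F.lit(0)).alias("salk3"),
                   F.coalesce(a["lbkum"], F.lit(0)).alias("lbkum"))
          .agg(F.sum("valuatedquantity").alias("valuatedquantity"),
               F.sum("InventoryValue").alias("InventoryValue"),
               F.sum("stprs").alias("stprs"),
               F.max("peinh").alias("peinh"))
)

# rank() over (partition by dy_id, werks, matnr order by max_date desc),
# then keep only rnk = 1 for the requested fiscal year and month.
w = Window.partitionBy("dy_id", "werks", "matnr").orderBy(F.col("max_date").desc())
result = (grouped.withColumn("rnk", F.rank().over(w))
                 .where((F.col("rnk") == 1) &
                        (F.col("fiscalyear") == year) &
                        (F.col("fiscalmonth") == mnth)))

result.write.insertInto("dlstage.DIBQtyRank_C11", overwrite=True)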
I am trying to use cx_Oracle to query a table in an Oracle DB (version 11.2) and get rows where the values in a column fall within a datetime range.
I have tried the following approaches:
Tried a BETWEEN clause as described here, but the cursor gets 0 rows:
parameters = (startDateTime, endDateTime)
query = "select * from employee where joining_date between :1 and :2"
cur = con.cursor()
cur.execute(query, parameters)
Tried the TO_DATE() function and DATE'' literals. Still no result with BETWEEN or the >= operator. Noteworthy is that the < operator works. I also took the same query and tried it in a SQL client, where it returns results. Code:
#returns no rows:
query = "select * from employee where joining_date >= TO_DATE('" + startDateTime.strftime("%Y-%m-%d") + "','yyyy-mm-dd')"
cur = con.cursor()
cur.execute(query)
#tried the following just to ensure that some query runs fine; it returns results:
query = query.replace(">=", "<")
cur.execute(query)
Any pointers as to why the BETWEEN and >= operators are failing for me? (My second approach was in line with the answer in Oracle date comparison in where clause, but it still doesn't work for me.)
I am using Python 3.4.3, and tried cx_Oracle 5.3 and 5.2 with Oracle Client 11g on a Windows 7 machine.
Assume that your employee table contains the field emp_id and that the row with emp_id=1234567 should be retrieved by your query.
Make two copies of your program that execute the following queries:
query = "select to_char(:1,'YYYY-MM-DD HH24:MI:SS')||' >= '||to_char(joining_date,'YYYY-MM-DD HH24:MI:SS')||' >= '||to_char(:2,'YYYY-MM-DD HH24:MI:SS') resultstring from employee where emp_id=1234567"
and
query="select to_char(joining_date,'YYYY-MM-DD HH24:MI:SS')||' >= '||to_char(TO_DATE('" + startDateTime.strftime("%Y-%m-%d") + "','yyyy-mm-dd'),'YYYY-MM-DD HH24:MI:SS') resultstring from employee where emp_id=1234567"
Show us the code and the value of the resultstring column.
You are constructing SQL queries as strings when you should be using parameterized queries. You can't use parameterization to substitute the comparison operators, but you should use it for the dates.
Also, note that the referenced answer uses the PostgreSQL parameterization format, whereas Oracle requires you to use the ":name" format.
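For example, a minimal parameterized sketch (assuming con is an open cx_Oracle connection and joining_date is a DATE column; the date values are placeholders) might look like this:

import datetime

# Bind Python datetime objects directly; cx_Oracle maps them to Oracle
# DATE values, so no TO_DATE() or manual string formatting is needed.
start_dt = datetime.datetime(2015, 1, 1)
end_dt = datetime.datetime(2015, 12, 31, 23, 59, 59)

cur = con.cursor()
cur.execute(
    "select * from employee where joining_date between :start_dt and :end_dt",
    start_dt=start_dt,
    end_dt=end_dt,
)
rows = cur.fetchall()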