Disclaimer: I am very new to Azure development.
In an Azure Data Factory Data Flow source option, when I hardcode the date strings and use the query below, it returns the expected results from Cosmos DB.
"select c.column1,c.column2 from c where c.time_stamp >= '2010-01-01T20:28:45Z' and c.time_stamp <= '2020-09-11T20:28:45Z'"
When I pass the parameters that I have mapped in the pipeline and use the parameterized query below, I get no results.
"oldwatermark": "'2010-01-01T20:28:45Z'",
"newwatermark": "'2020-09-11T20:28:45Z'"
"select c.column1,c.column2 from c where c.time_stamp >= '$oldwatermark' and c.time_stamp <= '$oldwatermark'"
Could you please suggest what I am doing wrong here, as my parameter values and the hardcoded values are the same?
Judging from the statement that worked, your query should be:
select c.column1,c.column2 from c where c.time_stamp >= $oldwatermark and c.time_stamp <= $newwatermark
not where c.time_stamp >= $oldwatermark and c.time_stamp <= $oldwatermark.
Please don't wrap the parameters in quotes inside the query.
Please try this query:
concat('select c.column1,c.column2 from c where c.time_stamp >= ', $oldwatermark, ' and c.time_stamp <= ', $newwatermark)
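With the sample parameter values above, which already carry their own single quotes, this expression should evaluate to the same string as the hardcoded query that worked:

select c.column1,c.column2 from c where c.time_stamp >= '2010-01-01T20:28:45Z' and c.time_stamp <= '2020-09-11T20:28:45Z'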
I am trying to query data from BigQuery with WHERE conditions so that only rows within the specified input dates are extracted. I have tried the following two solutions, but neither of them is working for me.
SELECT Count(*)
FROM `my-one-330114.results_out.results_validation_out` t
WHERE date(parse_datetime('%d/%m/%Y', t.{col})) AS date_conv1 >= {end_time}
AND date(parse_datetime('%d/%m/%Y', t.{col})) AS date_conv2 <= {start_time}
SELECT *
FROM `my-one-330114.results_out.results_validation_out`
WHERE parse({column} AS int64) <= format_date("%Y%m%d", parse({column} AS int64)) """ #AND CAST('%d/%m/%Y', DATE) <= CAST('%Y-%m-%d', '2021-11-01')
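For what it's worth, a sketch of how the first attempt could be repaired, assuming the table is queried from Python with the google-cloud-bigquery client: the alias cannot appear in the WHERE clause, the bounds appear swapped (>= {end_time} and <= {start_time}), and the input dates are better bound as query parameters. The column name my_date_col and the date values are placeholders for {col}, {start_time}, and {end_time}.

from google.cloud import bigquery

client = bigquery.Client()
sql = """
    SELECT COUNT(*) AS n
    FROM `my-one-330114.results_out.results_validation_out` t
    WHERE DATE(PARSE_DATETIME('%d/%m/%Y', t.my_date_col)) >= @start_time
      AND DATE(PARSE_DATETIME('%d/%m/%Y', t.my_date_col)) <= @end_time
"""
job_config = bigquery.QueryJobConfig(
    query_parameters=[
        bigquery.ScalarQueryParameter("start_time", "DATE", "2021-01-01"),
        bigquery.ScalarQueryParameter("end_time", "DATE", "2021-11-01"),
    ]
)
count = list(client.query(sql, job_config=job_config))[0]["n"]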
I have a Cosmos Db instance with > 1 Million JSON Documents stored in it.
I am trying to pull documents created within a certain time frame, based on the _ts property, which is auto-generated when a document is inserted and holds the UNIX timestamp of that moment.
I am unable to understand why these two queries produce drastically different results:
Query 1:
Select *
from c
where c._ts > TimeStamp1
AND c._ts < TimeStamp2
Produces 0 results
Query 2:
Select *
from c
where c._ts > TimeStamp1
AND c._ts < TimeStamp2
order by c._ts desc
Produces the correct number of results.
What have I tried?
I suspected that it might be because of the default Cosmos DB index on the data, so I rewrote the index policy to index only that property. Still the same problem.
Since my end goal is to GROUP BY the returned data, I then tried using GROUP BY with ORDER BY, alone and in a subquery. Surprisingly, according to the docs, Cosmos DB does not yet support using GROUP BY with ORDER BY.
What do I need help with?
Why am I observing such behavior?
Is there a way to index the DB in such a way that the rows are returned?
Beyond this, is there a way to still use GROUP BY and ORDER BY together? (Please don't close this as a duplicate because of this point; I have gone through the other questions and their answers are not valid in my case.)
@Andy and @Tiny-wa, thanks for replying.
I was able to understand the unintended behavior: it was showing up because of the GetCurrentTimestamp() call used to calculate the timestamps. The documentation states:
This system function will not utilize the index. If you need to compare values to the current time, obtain the current time before query execution and use that constant string value in the WHERE clause.
Although I don't fully understand what this means, I was able to solve it by creating a stored procedure in which the timestamp is fetched before the SQL API query is formed and executed, and I was able to get the rows as expected.
Stored Procedure Pseudocode for that is like:
function FetchData() {
    ..
    ..
    ..
    // Capture the current time once, before the query is built, so the WHERE
    // clause compares c._ts against constant values and can use the index.
    var Current_TimeStamp = Date.now(); // milliseconds since the epoch
    // c._ts is in seconds, so multiply by 10,000,000 to get ticks (100 ns units);
    // Current_TimeStamp is in milliseconds, so multiply by 10,000.
    var CDbQuery =
        `SELECT *
        FROM c
        WHERE (c._ts * 10000000) > DateTimeToTicks(DateTimeAdd("day", -1, TicksToDateTime(` + Current_TimeStamp + ` * 10000)))
        AND (c._ts * 10000000) < (` + Current_TimeStamp + ` * 10000)`;
    var isAccepted = collection.queryDocuments(
        collection.getSelfLink(),
        CDbQuery, // the original post passed an undefined "XQuery" here
        function (err, feed, options) {
            ..
            ..
            ..
        });
}
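For completeness, the same idea works client side without a stored procedure: compute the boundary timestamps before the query runs so that constants end up in the WHERE clause. A minimal sketch with the azure-cosmos Python SDK; the account URL, key, database, and container names are placeholders.

import time
from azure.cosmos import CosmosClient

client = CosmosClient("https://<account>.documents.azure.com:443/", credential="<key>")
container = client.get_database_client("mydb").get_container_client("mycoll")

now = int(time.time())            # c._ts is seconds since the epoch
one_day_ago = now - 24 * 60 * 60  # same one-day window as the stored procedure

items = list(container.query_items(
    query="SELECT * FROM c WHERE c._ts > @start AND c._ts < @end",
    parameters=[{"name": "@start", "value": one_day_ago},
                {"name": "@end", "value": now}],
    enable_cross_partition_query=True,
))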
I have this SQL query that I confirmed works in SQLite. It updates two columns in the table. I have 144 columns that need to be updated using the same query. How can I, using Python, pass variables along so I can use the same query to update all of them?
Here is my query to update one column:
UPDATE GBPAUD_TA AS t1
SET _1m_L3_Time = COALESCE(
(
SELECT
MIN(
CASE t1.Action
WHEN 'Buy' THEN CASE WHEN (t2._1M_55 >= t2.Low AND t2._1M_55 < t2.Open) THEN t2.Date_Time END
WHEN 'Sell' THEN CASE WHEN (t2._1M_55 <= t2.High AND t2._1M_55 < t2.Open) THEN t2.Date_Time END
END
)
FROM GBPAUD_DATA t2
WHERE t2.Date_Time >= t1.Open_Date AND t2.Date_Time <= t1.New_Closing_Time
),
t1._1m_L3_Time
);
UPDATE GBPAUD_TA
SET _1m_L3_Price = (SELECT _1M_55
FROM GBPAUD_DATA
WHERE Date_Time = GBPAUD_TA._1m_L3_Time)
where EXISTS (SELECT _1M_55
FROM GBPAUD_DATA
WHERE Date_Time = GBPAUD_TA._1m_L3_Time)
Here is my query showing the variables that I would need to automatically insert:
UPDATE GBPAUD_TA AS t1
SET Variable1 = COALESCE(
(
SELECT
MIN(
CASE t1.Action
WHEN 'Buy' THEN CASE WHEN (t2.Variable2 >= t2.Low AND t2.Variable2 < t2.Open) THEN t2.Date_Time END
WHEN 'Sell' THEN CASE WHEN (t2.Variable2 <= t2.High AND t2.Variable2 < t2.Open) THEN t2.Date_Time END
END
)
FROM GBPAUD_DATA t2
WHERE t2.Date_Time >= t1.Open_Date AND t2.Date_Time <= t1.New_Closing_Time
),
t1.Variable1
);
UPDATE GBPAUD_TA
SET Variable3 = (SELECT Variable2
FROM GBPAUD_DATA
WHERE Date_Time = GBPAUD_TA.Variable1)
where EXISTS (SELECT Variable2
FROM GBPAUD_DATA
WHERE Date_Time = GBPAUD_TA.Variable1)
I have a total of 3 variables.
Based on googling and reading, I found a possible way using host variables: use "?" in place of each variable, combine the variables into a tuple, and then use executemany().
I tried this, but it did not work. It gave me an error:
"cursor.executemany(sql_update_query, SLTuple)
OperationalError: near "?": syntax error"
So what should I do? Any guidance is much appreciated!
Found the answer after I figured out the proper terminology: string formatting and interpolation. Found the answer here.
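For future readers, a minimal sketch of that pattern: SQLite's "?" placeholders can only bind values, never identifiers, which is why executemany() raised a syntax error here; column names have to be interpolated into the SQL string before execution. The database file name and the list of column pairs are placeholders.

import sqlite3

conn = sqlite3.connect("mydata.db")  # placeholder file name
cur = conn.cursor()

# One (time_col, source_col) pair per column set to update; fill in all 144.
column_sets = [
    ("_1m_L3_Time", "_1M_55"),
    # ...
]

template = """
UPDATE GBPAUD_TA AS t1
SET {time_col} = COALESCE(
    (
        SELECT MIN(
            CASE t1.Action
                WHEN 'Buy' THEN CASE WHEN (t2.{src} >= t2.Low AND t2.{src} < t2.Open) THEN t2.Date_Time END
                WHEN 'Sell' THEN CASE WHEN (t2.{src} <= t2.High AND t2.{src} < t2.Open) THEN t2.Date_Time END
            END
        )
        FROM GBPAUD_DATA t2
        WHERE t2.Date_Time >= t1.Open_Date AND t2.Date_Time <= t1.New_Closing_Time
    ),
    t1.{time_col}
)
"""

for time_col, src in column_sets:
    cur.execute(template.format(time_col=time_col, src=src))
conn.commit()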
accountBal.createOrReplaceTempView("accntBal")
var finalDf = spark.sql(
" SELECT CTC_ID, ACCNT_BAL, PAID_THRU_DT, DAYS(CURRENT_DATE) - DAYS(PAID_THRU_DT) AS DEL_DAYS FROM accntBal WHERE ACCNT_BAL > 0 AND PAID_THRU_DT <= CURRENT_DATE AND PAID_THRU_DT > '01/01/2000' AND PAID_THRU_DT is not null "
)
org.apache.spark.sql.AnalysisException: Undefined function: 'DAYS'. This function is neither a registered temporary function nor a permanent function registered in the database.
You should be using DATEDIFF to get the difference in days between two dates:
SELECT
CTC_ID,
ACCNT_BAL,
PAID_THRU_DT,
DATEDIFF(CURRENT_DATE, PAID_THRU_DT) AS DEL_DAYS
FROM accntBal
WHERE
ACCNT_BAL > 0 AND
PAID_THRU_DT > '2000-01-01' AND PAID_THRU_DT <= CURRENT_DATE;
Note: The NULL check on PAID_THRU_DT is probably not necessary, since a NULL value would fail the range check already.
In Spark, a UDF has to be registered before it can be used in your queries.
Register a function as a UDF
Example:
val squared = (s: Long) => {
s * s
}
spark.udf.register("square", squared)
Since you have not registered days, it throws this error. I assume you have written a custom UDF to compute the number of days between two dates.
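If such a UDF is genuinely needed, here is a hedged sketch of registering one (shown in PySpark; the name DAYS and its days-since-epoch meaning are assumptions about what the query intends), though the built-in datediff covered below is the simpler fix.

from datetime import date
from pyspark.sql import SparkSession
from pyspark.sql.types import IntegerType

spark = SparkSession.builder.getOrCreate()

# Days since the Unix epoch for a DATE value; None passes through.
spark.udf.register(
    "DAYS",
    lambda d: (d - date(1970, 1, 1)).days if d is not None else None,
    IntegerType(),
)

spark.sql("SELECT DAYS(CURRENT_DATE) - DAYS(TO_DATE('2000-01-01')) AS diff").show()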
How to debug?
To check whether your UDF is among the functions registered with Spark, query the available standard and user-defined functions using the Catalog interface (available through the SparkSession.catalog attribute):
val spark: SparkSession = ...
spark.catalog.listFunctions.show(false)
It will display all the functions defined within the Spark session.
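The equivalent check from Python, assuming a PySpark session:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
print([f.name for f in spark.catalog.listFunctions()])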
Further reading: UDFs — User-Defined Functions
If not, you can try datediff, which is already present in Spark's functions.scala:
static Column datediff(Column end, Column start): Returns the number of days from start to end.
I am trying to use cx_Oracle to query a table in oracle DB (version 11.2) and get rows with values in a column between a datetime range.
I have tried the following approaches:
Tried the BETWEEN clause as described here, but the cursor gets 0 rows:
parameters = (startDateTime, endDateTime)
query = "select * from employee where joining_date between :1 and :2"
cur = con.cursor()
cur.execute(query, parameters)
Tried the TO_DATE() function and DATE'' literals. Still no result with BETWEEN or the >= operator. Notably, the < operator works. I also took the same query and ran it in a SQL client, where it returns results. Code:
#returns no rows:
query = "select * from employee where joining_date >= TO_DATE('" + startDateTime.strftime("%Y-%m-%d") + "','yyyy-mm-dd')"
cur = con.cursor()
cur.execute(query)
#tried following just to ensure that some query runs fine, it returns results:
query = query.replace(">=", "<")
cur.execute(query)
Any pointers on why the BETWEEN and >= operators are failing for me? (My second approach was in line with the answer in "Oracle date comparison in where clause", but it still doesn't work for me.)
I am using Python 3.4.3, and tried cx_Oracle 5.3 and 5.2, with Oracle client 11g, on a Windows 7 machine.
Assume that your employee table contains the field emp_id, and that the row with emp_id=1234567 should be retrieved by your query.
Make two copies of your program that execute the following queries:
query = "select to_char(:1,'YYYY-MM-DD HH24:MI:SS')||' >= '||to_char(joining_date,'YYYY-MM-DD HH24:MI:SS')||' >= '||to_char(:2,'YYYY-MM-DD HH24:MI:SS') resultstring from employee where emp_id=1234567"
and
query="select to_char(joining_date,'YYYY-MM-DD HH24:MI:SS')||' >= '||to_char(TO_DATE('" + startDateTime.strftime("%Y-%m-%d") + "','yyyy-mm-dd'),'YYYY-MM-DD HH24:MI:SS') resultstring from employee where emp_id=1234567"
Show us the code and the value of the column resultstring.
You are constructing SQL queries as strings when you should be using parameterized queries. You can't use parameterization to substitute the comparison operators, but you should use it for the dates.
Also, note that the referenced answer uses the PostgreSQL parameterisation format, whereas Oracle requires you to use the ":name" format.
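A minimal sketch of that advice with named binds; the connection string is a placeholder, and cx_Oracle converts Python datetime values to Oracle DATE automatically:

import datetime
import cx_Oracle

con = cx_Oracle.connect("user/password@host:1521/service")  # placeholder DSN
cur = con.cursor()

cur.execute(
    "select * from employee where joining_date between :start_dt and :end_dt",
    start_dt=datetime.datetime(2017, 1, 1),
    end_dt=datetime.datetime(2017, 12, 31, 23, 59, 59),
)
rows = cur.fetchall()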