I have a timestamp value in a varchar column. The value looks like this:
2020-10-31T23:36:03.000+0000
I want to convert it to the following so I can use it in my query filter:
2020-10-31
I tried using date_parse and split_part:
SELECT date_parse('2020-06-30T17:17:35.000+0000','%Y-%m-%d %H:%i:%s:%f') as xy
and
where cast(split_part('2020-06-30T17:17:35.000+0000', ' ', 1) as date) >= date '2020-06-30'
Both return an error:
presto error: Invalid format: "2020-06-30T17:17:35.000+0000" is malformed at "T17:17:35.000+0000"
Can someone point me in the right direction?
Using this solved my problem:
cast(from_iso8601_timestamp('2020-06-30T17:17:35.000+0000') as DATE) >= date '2020-06-30'
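If you are running this from Python, here is a minimal sketch of the same filter through the presto-python-client package (an assumption on my part; the host, catalog, and the events/ts names are placeholders):

import prestodb

conn = prestodb.dbapi.connect(host='coordinator-host', port=8080,
                              user='user', catalog='hive', schema='default')
cur = conn.cursor()
# 'events' and its varchar column 'ts' are hypothetical stand-ins
cur.execute(
    "SELECT * FROM events "
    "WHERE cast(from_iso8601_timestamp(ts) as DATE) >= date '2020-06-30'"
)
rows = cur.fetchall()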
I need to join dataframes on dates in the format '%Y%m%d'. Some of the data is wrong or missing, and when I parse it with pandas:
try:
    df['data'] = pd.to_datetime(df['data'], format='%Y%m%d')
except:
    pass
If one row is wrong, the whole column fails to convert. I would like it to skip only the rows with errors and convert the rest.
I could solve this by looping with datetime, but my question is: is there a better solution for this in pandas?
Pass errors='coerce' to pd.to_datetime to convert values with a wrong date format to NaT, then use Series.fillna to replace those NaT with the original input values:
df['data'] = (
    pd.to_datetime(df['data'], format='%Y%m%d', errors='coerce')
      .fillna(df['data'])
)
From the docs
errors : {‘ignore’, ‘raise’, ‘coerce’}, default ‘raise’
If 'raise', then invalid parsing will raise an exception.
If 'coerce', then invalid parsing will be set as NaT.
If 'ignore', then invalid parsing will return the input.
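For concreteness, a small self-contained demo of the coerce-then-fillna pattern on made-up data (it relies on fillna upcasting the datetime column back to object, exactly as the snippet above does):

import pandas as pd

# made-up data: the middle value is malformed on purpose
df = pd.DataFrame({'data': ['20210101', 'bad-date', '20210301']})

df['data'] = (
    pd.to_datetime(df['data'], format='%Y%m%d', errors='coerce')
      .fillna(df['data'])
)
print(df)
# rows 0 and 2 become timestamps; 'bad-date' survives as the original string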
I have an MS SQL Server DateTime field, and I'm trying to find all records within a date range:
mySqlString = "select * from users where signupDate >=#from and signupdate <=#to"
The two variables containing the date range (dataFrom and dataTo) arrive in MM/dd/yyyy format, so I'm replacing #from and #to in the string as follows:
datefrom = new Date(dataFrom);
dateto = new Date(dataTo);
req.input('from', sql.DateTime, datefrom )
req.input('to', sql.DateTime, dateto )
But I do not get any results.
What's the best approach to get this working properly?
You can always use CONVERT to adapt your SQL query to your input format. In your case that's format 101: select convert(varchar, getdate(), 101) ---> mm/dd/yyyy
So your query should look like
where (signupdate >= CONVERT(date, #from, 101)) AND (signupdate <= CONVERT(date, #to, 101))
This way you don't have to worry about the time portion of the stored date:
req.input('from', sql.Date, dataFrom)
req.input('to', sql.Date, dataTo)
This assumes you have checked that dataFrom and dataTo contain valid dates.
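To make that validation step concrete, here is a minimal sketch in Python with pyodbc (my illustration, not the poster's node code) that parses the MM/dd/yyyy strings into real date objects before binding them:

import datetime
import pyodbc

def parse_mdy(s):
    # raises ValueError on malformed input instead of silently matching nothing
    return datetime.datetime.strptime(s, '%m/%d/%Y').date()

conn = pyodbc.connect('connection_str')  # placeholder connection string
rows = conn.execute(
    "select * from users where signupDate >= ? and signupDate <= ?",
    parse_mdy(dataFrom), parse_mdy(dataTo)
).fetchall()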
I have an empty table defined in Snowflake as:
CREATE OR REPLACE TABLE db1.schema1.table(
ACCOUNT_ID NUMBER NOT NULL PRIMARY KEY,
PREDICTED_PROBABILITY FLOAT,
TIME_PREDICTED TIMESTAMP
);
This creates the correct table, which has been verified using the DESC command in SQL. Then, using the Snowflake Python connector, we try to execute the following query:
insert_query = f'INSERT INTO DATA_LAKE.CUSTOMER.ACT_PREDICTED_PROBABILITIES(ACCOUNT_ID, PREDICTED_PROBABILITY, TIME_PREDICTED) VALUES ({accountId}, {risk_score},{ct});'
ctx.cursor().execute(insert_query)
The variables are defined just before this query. The main challenge is getting the current timestamp written into Snowflake. The value of ct is defined as:
import datetime
ct = datetime.datetime.now()
print(ct)
2021-04-30 21:54:41.676406
But when we try to execute this INSERT query, we get the following error message:
ProgrammingError: 001003 (42000): SQL compilation error:
syntax error line 1 at position 157 unexpected '21'.
Can I kindly get some help on how to format the datetime value here? Help is appreciated.
In addition to the answer @Lukasz provided, you could also consider defining current_timestamp() as the default for the TIME_PREDICTED column:
CREATE OR REPLACE TABLE db1.schema1.table(
ACCOUNT_ID NUMBER NOT NULL PRIMARY KEY,
PREDICTED_PROBABILITY FLOAT,
TIME_PREDICTED TIMESTAMP DEFAULT current_timestamp
);
And then just insert ACCOUNT_ID and PREDICTED_PROBABILITY:
insert_query = f'INSERT INTO DATA_LAKE.CUSTOMER.ACT_PREDICTED_PROBABILITIES(ACCOUNT_ID, PREDICTED_PROBABILITY) VALUES ({accountId}, {risk_score});'
ctx.cursor().execute(insert_query)
Snowflake will then automatically assign the insert time to TIME_PREDICTED.
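A quick way to confirm the default fired (this check is my own; it uses the connector's default pyformat binding):

# read the row back; TIME_PREDICTED is populated by the column default
cur = ctx.cursor()
cur.execute(
    "SELECT TIME_PREDICTED FROM DATA_LAKE.CUSTOMER.ACT_PREDICTED_PROBABILITIES "
    "WHERE ACCOUNT_ID = %s",
    (accountId,),
)
print(cur.fetchone())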
Educated guess. When performing the insert with:
insert_query = f'INSERT INTO ...(ACCOUNT_ID, PREDICTED_PROBABILITY, TIME_PREDICTED) VALUES ({accountId}, {risk_score}, {ct});'
it is string interpolation: ct is rendered as the string representation of a datetime, unquoted, which does not parse as a timestamp literal, hence the error.
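You can see the problem by printing what the f-string actually produces; the datetime lands in the SQL unquoted (illustrative values):

import datetime

ct = datetime.datetime.now()
print(f"... VALUES (42, 0.97, {ct});")
# ... VALUES (42, 0.97, 2021-04-30 21:54:41.676406);
# the bare 21:54:41... token is what triggers "unexpected '21'"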
I would suggest using proper variable binding instead:
ctx.cursor().execute("INSERT INTO DATA_LAKE.CUSTOMER.ACT_PREDICTED_PROBABILITIES "
"(ACCOUNT_ID, PREDICTED_PROBABILITY, TIME_PREDICTED) "
"VALUES(:1, :2, :3)",
(accountId,
risk_score,
("TIMESTAMP_LTZ", ct)
)
);
Avoid SQL Injection Attacks
Avoid binding data using Python’s formatting function because you risk SQL injection. For example:
# Binding data (UNSAFE EXAMPLE)
con.cursor().execute(
"INSERT INTO testtable(col1, col2) "
"VALUES({col1}, '{col2}')".format(
col1=789,
col2='test string3')
)
Instead, store the values in variables, check those values (for example, by looking for suspicious semicolons inside strings), and then bind the parameters using qmark or numeric binding style.
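A minimal sketch of the qmark style mentioned above (paramstyle must be set before the connection is created; table and variable names as in the question):

import snowflake.connector

snowflake.connector.paramstyle = 'qmark'  # must be set before connect()
# ctx = snowflake.connector.connect(...)  # connection details omitted

ctx.cursor().execute(
    "INSERT INTO DATA_LAKE.CUSTOMER.ACT_PREDICTED_PROBABILITIES "
    "(ACCOUNT_ID, PREDICTED_PROBABILITY, TIME_PREDICTED) "
    "VALUES (?, ?, ?)",
    (accountId, risk_score, ("TIMESTAMP_LTZ", ct)),
)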
You forgot to place quotes around {ct}. The code should be:
insert_query = "INSERT INTO DATA_LAKE.CUSTOMER.ACT_PREDICTED_PROBABILITIES(ACCOUNT_ID, PREDICTED_PROBABILITY, TIME_PREDICTED) VALUES ({accountId}, {risk_score},'{ct}');".format(accountId=accountId,risk_score=risk_score,ct=ct)
ctx.cursor().execute(insert_query)
I have a widget called 'filedate' where you enter a date in the format 'yyyy-mm-dd'; my example will use '2019-10-01'.
I get the value from the widget with the following:
val fileloaddate = dbutils.widgets.get("filedate")
If I print fileloaddate, it shows 2019-10-01. I need to use it in a query, but if I do a select to_timestamp(${fileloaddate}, "yyyy-mm-dd") it errors, as it evaluates the variable as '((2019 - 8) -18)'. If I cast the string to a date, for example
select to_timestamp(to_date(${prundate}), "yyyy-mm-dd")
it fails with the error 'cannot resolve 'CAST(((2019 - 8) - 18) AS DATE)'', whereas
select to_timestamp(to_date('2019-10-01'), "yyyy-mm-dd")
works fine. I have googled around for the answer but can't seem to see what I'm doing wrong.
thanks
In Azure Databricks you can use getArgument to convert the date into the desired output:
dbutils.widgets.text("x","2018-09-12")
select to_timestamp(to_date(getArgument("x")), "yyyy-mm-dd")
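For reference, the equivalent from a Python cell (my sketch; spark and dbutils are the notebook-provided objects, and the single quotes around the substituted value are what keep it from being evaluated as arithmetic):

x = dbutils.widgets.get("x")  # e.g. '2018-09-12'
spark.sql("select to_timestamp(to_date('{0}'), 'yyyy-mm-dd')".format(x)).show()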
Hope this helps you.
If Scala is disabled, then you need to use the legacy method for retrieving SQL parameters from widgets:
select * from database where parameter='$file'
For SQL you can pass the date as text:
%sql
CREATE WIDGET TEXT StartDate DEFAULT "YYYY-MM-DD"
-- the default is simply text to remind the user to input the date in a format the SQL query can use;
-- my_date_column below is a placeholder for your date column
SELECT * FROM my_table WHERE my_date_column BETWEEN '$StartDate' AND '$EndDate'
https://docs.databricks.com/notebooks/widgets.html#legacy-input
https://docs.databricks.com/notebooks/widgets.html
I have a table ServerHistory with multiple varchar and datetime fields in a Sybase ASE database, and I am using the pyodbc library to connect to the database from my Python codebase.
The table allows null datetimes and has a few records where updDate is null, but when I execute a SQL query from Python to read such records, I never get null in the updDate field.
Instead it picks up the value of the previous datetime field, liveDate, and fills it in as updDate. If I select just updDate, it gives me a ValueError "year 0/any no. is out of range".
import pyodbc

connection = pyodbc.connect('connection_str')
results = connection.execute("select updDate from ServerHistory where server='srvr1'").fetchone()
ValueError: year 31728 is out of range
I believe the error may be happening because Python does not allow years below the minimum of 1, as described in "sqlalchemy can't read null dates from sqlite3 (0000-00-00): ValueError: year is out of range".
Nevertheless, I still don't understand how to resolve my issue to get null as the datetime when I execute the query.
Hack-ish:
Use coalesce(updDate, date('0001/01/01')) where you select updDate, ... from ..., and treat any such updDate as None when you use the data gathered from pyodbc.
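A sketch of that treat-as-None step, assuming the coalesce sentinel comes back from pyodbc as a year-1 date (it reuses the answer's coalesce expression and the names from the question):

import datetime

SENTINEL = datetime.date(1, 1, 1)  # the coalesce() placeholder above

row = connection.execute(
    "select coalesce(updDate, date('0001/01/01')) "
    "from ServerHistory where server='srvr1'"
).fetchone()
updDate = None if row[0] == SENTINEL else row[0]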