Convert date string with AM/PM to 24-hour timestamp in Impala

I am trying to convert a date string with AM/PM to a timestamp in Impala to check data conversion.
My date string is as below:
10/07/2017 02:04:01.575000000 PM
I tried to convert this in Impala through below query:
select from_unixtime(unix_timestamp((Y_date), "MM/dd/yyyy HH:mm:ss.SSSSSSSSS 'ZZ'"), "yyyy-MM-dd HH:mm:ss.SSSSSS 'ZZ'") from table
The result I get is:
2017-10-07 02:04:01.000000
I only lose the AM/PM marker; the hour part "02" is not getting converted to "14". I need to get the result below:
2017-10-07 14:04:01.000000
I use Impala as my interface for querying Hadoop.
Any inputs would be helpful.
Thanks,
Vishu

I haven't found a built-in function for this. You have to do an inefficient double query, adding 12 hours for the PM rows:
SELECT cast(unix_timestamp(Y_date, "MM/dd/yyyy HH:mm:ss") + 43200 as timestamp) as action_time
FROM table_a
where instr(Y_date, 'PM') > 0
union all
SELECT cast(unix_timestamp(Y_date, "MM/dd/yyyy HH:mm:ss") as timestamp) as action_time
FROM table_a
where instr(Y_date, 'AM') > 0
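A single-query variant of the same idea is also possible with a CASE expression. This is an untested sketch that relies on the same lenient parsing as the queries above and assumes the hour always sits at a fixed position in the string; it also handles the 12 AM / 12 PM edge cases that a flat +12 hours would get wrong:
SELECT cast(unix_timestamp(Y_date, "MM/dd/yyyy HH:mm:ss")
            + CASE WHEN instr(Y_date, 'PM') > 0 AND substr(Y_date, 12, 2) <> '12' THEN 43200
                   WHEN instr(Y_date, 'AM') > 0 AND substr(Y_date, 12, 2) = '12' THEN -43200
                   ELSE 0 END
       as timestamp) as action_time
FROM table_a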

Related

How to extract the year and quarter from a String date in Databricks SQL

Can someone show me how to extract the year from a String date in Databricks SQL?
I am based in the UK and our date format is normally as follows:
dd/mm/yyyy
The field containing the dates is set as StringType()
I am trying to extract the year from the string as follows:
select year(cast(financials_0_accountsDate as Date)) from `financiallimited_csv`
I'm using the following code to extract the quarter:
select quarter(cast(financials_0_accountsDate as Date)) from `financiallimited_csv`
However, both result in NULL values.
Any thoughts on how to extract the year and quarter from dates with StringType() dd/mm/yyyy?
Could you try the to_date function?
select year(to_date(financials_0_accountsDate, 'dd/MM/yyyy')) from `financiallimited_csv`
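The same to_date conversion should work for the quarter as well, for example:
select quarter(to_date(financials_0_accountsDate, 'dd/MM/yyyy')) from `financiallimited_csv`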

Why is Spark not recognizing this time format?

I get null for the timestamp 27-04-2021 14:11 with this code. What mistake am I making? Why is the timestamp format string DD-MM-yyyy HH:mm not correct here?
from pyspark.sql.functions import to_timestamp
df = spark.createDataFrame([('27-04-2021 14:11',)], ['t'])
df = df.select(to_timestamp(df.t, 'DD-MM-yyyy HH:mm').alias('dt'))
display(df)
D is for day of the year, and d is for day of the month.
Try this:
from pyspark.sql import functions as F
df = df.select(F.to_timestamp(df.t, "dd-MM-yyyy HH:mm").alias("dt"))

Hive: WHERE date filter by x days back (string format)?

Our DBAs set up our Hive table with the date column as the partition column, but as a string in YYYYMMDD format.
How can I filter this "date" column in a WHERE clause for something like the last 30 days?
Use date_format to format the system date minus 30 days into YYYYMMDD and then compare it with your partition column. Note that you should use the partition column as-is so Hive can choose the correct partitions.
To pick just the data from the 30th day back:
select *
from mytable
where partition_col = date_format( current_date() - interval '30' days, 'yyyyMMdd')
If you want all data from the last 30 days:
select *
from mytable
where cast(partition_col as INT) >= cast(date_format( current_date() - interval '30' days, 'yyyyMMdd') as INT)
Casting shouldn't impact the partition benefits, but check the performance before using it.
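Alternatively, since YYYYMMDD strings sort in the same order as the dates they encode, a plain string comparison should also work and leaves the partition column untouched:
select *
from mytable
where partition_col >= date_format( current_date() - interval '30' days, 'yyyyMMdd')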

Spark SQL: How to convert time string column in "yyyy-MM-dd HH:mm:ss.SSSSSSSSS" format to timestamp preserving nanoseconds?

I am trying to convert a String type column holding timestamp strings in "yyyy-MM-dd HH:mm:ss.SSSSSSSSS" format to Timestamp type. This cast operation should preserve the nanosecond values.
I tried the unix_timestamp() and to_timestamp() methods, specifying the timestamp format, but they return NULL values.
using cast:
hive> select cast('2019-01-01 12:10:10.123456789' as timestamp);
OK
2019-01-01 12:10:10.123456789
Time taken: 0.611 seconds, Fetched: 1 row(s)
using timestamp():
hive> select timestamp('2019-01-01 12:10:10.123456789','yyyy-MM-dd HH:mm:ss.SSSSSSSSS');
OK
2019-01-01 12:10:10.123456789
Time taken: 12.845 seconds, Fetched: 1 row(s)
As per the descriptions in the source code of the TimestampType and DateTimeUtils classes, they only support timestamps up to microsecond precision.
So we cannot store timestamps with nanosecond precision in Spark SQL's TimestampType column.
References:
https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/types/TimestampType.scala
https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala
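If microsecond precision is acceptable, one workaround is to truncate the string before casting and keep the trailing digits in a separate column. A minimal Spark SQL sketch, assuming the string column is called ts_str and lives in a table named events:
select cast(substr(ts_str, 1, 26) as timestamp) as ts_micros,  -- first 26 characters = up to microseconds
       cast(substr(ts_str, 27, 3) as int) as extra_nanos       -- remaining three nanosecond digits
from events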

Cassandra: extract month from timestamp

In Postgres, we have the EXTRACT function to pull out MONTH, YEAR, etc. from a timestamp. See below.
SELECT EXTRACT(MONTH FROM TIMESTAMP '2001-02-16 20:38:40');
Is it possible to do the same in Cassandra? Is there a function for this?
If it is possible, I can then run queries such as finding all entries in year "2015" and month "May". This is possible in Postgres using the EXTRACT function.
In case you still need the answer: you can create a function in your keyspace like this:
cqlsh> use keyspace;
cqlsh:keyspace> CREATE OR REPLACE FUNCTION YEAR (input TIMESTAMP)
RETURNS NULL ON NULL INPUT RETURNS TEXT
LANGUAGE java AS 'return input.toString().substring(0,4);';
cqlsh:keyspace> SELECT YEAR('2001-02-16 20:38:40') as year FROM ...;
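Depending on the Cassandra version, user-defined functions may need to be enabled in cassandra.yaml before this will run. Once created, the function can also be applied to a timestamp column; using the mytable/date names from the next answer as an example:
cqlsh:keyspace> SELECT YEAR(date) AS year FROM mytable;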
In Cassandra you would handle that a little differently. You can have fields in your table of type timestamp or timeuuid, and then use that field in a time range query.
For example, if you wanted all entries for May 2015 and you have a timestamp field called 'date', you could run a query like this:
SELECT * from mytable where date > '2015-05-01' and date < '2015-06-01' allow filtering;
You can use a number of different date formats to specify the time more precisely (such as down to fractions of a second).
Cassandra converts date strings using the org.apache.commons.lang3.time.DateUtils class, and allows the following date formats:
private static final String[] dateStringPatterns = new String[] {
"yyyy-MM-dd HH:mm",
"yyyy-MM-dd HH:mm:ss",
"yyyy-MM-dd HH:mmX",
"yyyy-MM-dd HH:mmXX",
"yyyy-MM-dd HH:mmXXX",
"yyyy-MM-dd HH:mm:ssX",
"yyyy-MM-dd HH:mm:ssXX",
"yyyy-MM-dd HH:mm:ssXXX",
"yyyy-MM-dd HH:mm:ss.SSS",
"yyyy-MM-dd HH:mm:ss.SSSX",
"yyyy-MM-dd HH:mm:ss.SSSXX",
"yyyy-MM-dd HH:mm:ss.SSSXXX",
"yyyy-MM-dd'T'HH:mm",
"yyyy-MM-dd'T'HH:mmX",
"yyyy-MM-dd'T'HH:mmXX",
"yyyy-MM-dd'T'HH:mmXXX",
"yyyy-MM-dd'T'HH:mm:ss",
"yyyy-MM-dd'T'HH:mm:ssX",
"yyyy-MM-dd'T'HH:mm:ssXX",
"yyyy-MM-dd'T'HH:mm:ssXXX",
"yyyy-MM-dd'T'HH:mm:ss.SSS",
"yyyy-MM-dd'T'HH:mm:ss.SSSX",
"yyyy-MM-dd'T'HH:mm:ss.SSSXX",
"yyyy-MM-dd'T'HH:mm:ss.SSSXXX",
"yyyy-MM-dd",
"yyyy-MM-ddX",
"yyyy-MM-ddXX",
"yyyy-MM-ddXXX"
};
But note that Cassandra is not as good at ad hoc queries as a relational database like Postgres, so typically you would design your table schema to group the time ranges you want to query into separate partitions within a table.
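For example, a table laid out for the "all entries in May 2015" query above might look roughly like this (a sketch; events, id and payload are hypothetical names):
CREATE TABLE events (
    year int,
    month int,
    date timestamp,
    id uuid,
    payload text,
    PRIMARY KEY ((year, month), date, id)
);
-- all of May 2015 comes from a single partition, no ALLOW FILTERING needed
SELECT * FROM events WHERE year = 2015 AND month = 5;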
