Cassandra: extract month from timestamp

In Postgres, we have a function to extract MONTH, YEAR, etc. from a timestamp: the EXTRACT function. See below.
SELECT EXTRACT(MONTH FROM TIMESTAMP '2001-02-16 20:38:40');
Is it possible to do the same in Cassandra? Is there a function for this?
If so, I could then run queries such as finding all entries in year 2015 and month May, which is possible in Postgres using the EXTRACT function.

I hope you already have your answer, but you can also create a function in your keyspace like this:
cqlsh> use keyspace;
cqlsh:keyspace> CREATE OR REPLACE FUNCTION YEAR (input TIMESTAMP)
RETURNS NULL ON NULL INPUT RETURNS TEXT
LANGUAGE java AS 'return input.toInstant().toString().substring(0,4);';
cqlsh:keyspace> SELECT YEAR('2001-02-16 20:38:40') as year FROM ...;
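For a quick sanity check of what such a UDF should return, the same year-extraction can be sketched in Python (illustrative only; nothing here is Cassandra API):

```python
from datetime import datetime, timezone

def year(ts: datetime) -> str:
    """Return the four-digit year of a timestamp as text,
    mirroring what the CQL YEAR() function above is meant to do."""
    return ts.strftime("%Y")

print(year(datetime(2001, 2, 16, 20, 38, 40, tzinfo=timezone.utc)))  # 2001
```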

In Cassandra you would handle that a little differently. You can have fields in your table of type timestamp or timeuuid, and then use that field in a time range query.
For example, if you wanted all entries for May 2015 and had a timestamp field called date, you could run a query like this:
SELECT * FROM mytable WHERE date >= '2015-05-01' AND date < '2015-06-01' ALLOW FILTERING;
You can use a number of different formats for the date string to specify the time more precisely (down to fractions of a second).
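Rather than hard-coding the two bounds, you can compute them for any year/month. A small sketch in Python (the helper name `month_range` is made up, not a Cassandra function):

```python
from datetime import date

def month_range(year: int, month: int) -> tuple:
    """Return (inclusive start, exclusive end) ISO date strings for one month,
    suitable for a "date >= start AND date < end" range query."""
    start = date(year, month, 1)
    end = date(year + 1, 1, 1) if month == 12 else date(year, month + 1, 1)
    return start.isoformat(), end.isoformat()

print(month_range(2015, 5))   # ('2015-05-01', '2015-06-01')
print(month_range(2015, 12))  # ('2015-12-01', '2016-01-01')
```

Using an inclusive lower bound and an exclusive upper bound avoids dropping rows stamped exactly at midnight on the first of the month.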
Cassandra converts date strings using the org.apache.commons.lang3.time.DateUtils class, and allows the following date formats:
private static final String[] dateStringPatterns = new String[] {
"yyyy-MM-dd HH:mm",
"yyyy-MM-dd HH:mm:ss",
"yyyy-MM-dd HH:mmX",
"yyyy-MM-dd HH:mmXX",
"yyyy-MM-dd HH:mmXXX",
"yyyy-MM-dd HH:mm:ssX",
"yyyy-MM-dd HH:mm:ssXX",
"yyyy-MM-dd HH:mm:ssXXX",
"yyyy-MM-dd HH:mm:ss.SSS",
"yyyy-MM-dd HH:mm:ss.SSSX",
"yyyy-MM-dd HH:mm:ss.SSSXX",
"yyyy-MM-dd HH:mm:ss.SSSXXX",
"yyyy-MM-dd'T'HH:mm",
"yyyy-MM-dd'T'HH:mmX",
"yyyy-MM-dd'T'HH:mmXX",
"yyyy-MM-dd'T'HH:mmXXX",
"yyyy-MM-dd'T'HH:mm:ss",
"yyyy-MM-dd'T'HH:mm:ssX",
"yyyy-MM-dd'T'HH:mm:ssXX",
"yyyy-MM-dd'T'HH:mm:ssXXX",
"yyyy-MM-dd'T'HH:mm:ss.SSS",
"yyyy-MM-dd'T'HH:mm:ss.SSSX",
"yyyy-MM-dd'T'HH:mm:ss.SSSXX",
"yyyy-MM-dd'T'HH:mm:ss.SSSXXX",
"yyyy-MM-dd",
"yyyy-MM-ddX",
"yyyy-MM-ddXX",
"yyyy-MM-ddXXX"
};
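As a rough cross-check, a few of those Java patterns have direct Python `strptime` equivalents, and the first-match-wins loop below mirrors how a multi-pattern parse works (the Java-to-Python mapping here is my own, not anything Cassandra ships):

```python
from datetime import datetime

# strptime equivalents of a subset of the patterns above
PATTERNS = [
    "%Y-%m-%d %H:%M",
    "%Y-%m-%d %H:%M:%S",
    "%Y-%m-%d %H:%M:%S.%f",
    "%Y-%m-%dT%H:%M:%S",
    "%Y-%m-%d",
]

def parse_loose(s: str) -> datetime:
    """Try each pattern in turn; return the first successful parse."""
    for pattern in PATTERNS:
        try:
            return datetime.strptime(s, pattern)
        except ValueError:
            continue
    raise ValueError("unparseable date: " + repr(s))

print(parse_loose("2001-02-16 20:38:40"))  # 2001-02-16 20:38:40
```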
But note that Cassandra is not as good at ad hoc queries as a relational database like Postgres. So typically you would set up your table schema to group the time ranges you wanted to query into separate partitions within a table.

Related

spark dataframe: date formatting not working

I have a csv file in which a date column has values like 01080600, basically MM-dd-HH-mm.
I want to add a column in dataframe which shows this in a more readable format.
I do :
spark.sql("SELECT date...")
.withColumn("readable date", to_date(col("date"), "MM:dd HH:mm"))
.show(10)
But readable date comes back as null.
What am I missing here?
When formatting or converting to a date or timestamp, the format string you provide must follow the pattern of your input. In your case you need to modify the format as shown below; the result can then be reshaped into whatever final form you want for your date column using date_format.
See the Spark documentation on datetime patterns for the full list of format letters.
To Timestamp
spark.sql("""
SELECT
  TO_TIMESTAMP('01080600','MMddHHmm') as date,
  DATE_FORMAT(TO_TIMESTAMP('01080600','MMddHHmm'),'MM/dd HH:mm') as formatted_date
""").show()
+-------------------+--------------+
|               date|formatted_date|
+-------------------+--------------+
|1970-01-08 06:00:00|   01/08 06:00|
+-------------------+--------------+
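Given that the question describes the layout as MM-dd-HH-mm, the parse can be checked outside Spark with plain Python (note `strptime` defaults the missing year to 1900, while Spark defaults to 1970):

```python
from datetime import datetime

raw = "01080600"  # MM dd HH mm per the question

dt = datetime.strptime(raw, "%m%d%H%M")  # year defaults to 1900 in Python
print(dt.strftime("%m/%d %H:%M"))  # 01/08 06:00
```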

HIVE where date filter by x days back? string format

Our DBAs set up our Hive table with the date column as the partition column, but as a string in YYYYMMDD format.
How can I WHERE filter this "date" column for something like last 30 days?
Use date_format to format the system date minus 30 days into yyyyMMdd, then compare it with your partition column. Note that you should use the partition column as-is so Hive can choose the correct partitions.
If you want the data from exactly 30 days back -
select *
from mytable
where partition_col = date_format( current_date() - interval '30' days, 'yyyyMMdd')
If you want all data from the last 30 days -
select *
from mytable
where cast(partition_col as INT) >= cast(date_format( current_date() - interval '30' days, 'yyyyMMdd') as INT)
Casting shouldn't affect the partition benefits, but do check the performance before using it.
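The cutoff itself is ordinary date arithmetic. Here is a Python sketch of what the date_format expression computes (the helper name is made up):

```python
from datetime import date, timedelta

def cutoff_yyyymmdd(today: date, days_back: int = 30) -> str:
    """Format (today - days_back) the way the partition column is stored."""
    return (today - timedelta(days=days_back)).strftime("%Y%m%d")

# fixed "today" so the result is deterministic
print(cutoff_yyyymmdd(date(2020, 1, 31)))  # 20200101
```

Because yyyyMMdd strings sort the same way the dates do, a plain string comparison against the partition column would also work for the >= case.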

In Cassandra cql query how to convert string containing timestamp and then using in where clause for time series queries?

In a Cassandra table, created_date is a string column containing values like "2018-04-26 12:59:38 UTC". I need to use this column in a time-series query like:
Select * from dynamic_data where toTimeStamp(created_date) >=? and created_date <=?;
Is there any inbuilt function in Cassandra to convert string to timestamp and then use in time series query?

Parsing non iso datetime string to just date part in presto

I have a table which stores a datetime as varchar.
The format looks like this: 2018-07-16 15:00:00.0.
I want to parse out only the date part so that I can compare it with a date string such as '2018-07-20' in the WHERE clause. What is the best way to achieve this in Presto?
This particular format (based on the example value 2018-07-16 15:00:00.0 in the question) is understood by a cast from varchar to timestamp. You then extract the date part with another cast:
presto> SELECT CAST(CAST('2018-07-16 15:00:00.0' AS timestamp) AS date);
_col0
------------
2018-07-16
(1 row)
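The same two-step conversion can be mimicked in Python to see why the comparison then works (a sketch; the '.0' fractional part parses as microseconds here):

```python
from datetime import datetime

raw = "2018-07-16 15:00:00.0"

# step 1: varchar -> timestamp; step 2: timestamp -> date
d = datetime.strptime(raw, "%Y-%m-%d %H:%M:%S.%f").date()
print(d.isoformat())                 # 2018-07-16
print(d.isoformat() < "2018-07-20")  # True
```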

Convert date string with AM PM to 24 Hour timestamp in Impala

I am trying to convert a date string with AM/PM to timestamp in impala to check data conversion.
My date string is as below:
10/07/2017 02:04:01.575000000 PM
I tried to convert this in Impala through below query:
select from_unixtime(unix_timestamp((Y_date), "MM/dd/yyyy HH:mm:ss.SSSSSSSSS 'ZZ'"), "yyyy-MM-dd HH:mm:ss.SSSSSS 'ZZ'") from table
The result I get is:
2017-10-07 02:04:01.000000
The AM/PM marker is simply dropped; the hour part "02" is not converted to "14". I need the result to be:
2017-10-07 14:04:01.000000
I use Impala as my interface for querying Hadoop.
Any inputs would be helpful.
Thanks,
Vishu
I haven't found a built-in function for this, so you have to do an inefficient double query, adding 12 hours for the PM rows:
SELECT cast(unix_timestamp(Y_date, "MM/dd/yyyy HH:mm:ss") + 43200 as timestamp) as action_time
FROM table_a
where instr(Y_date, 'PM') > 0
union
SELECT cast(unix_timestamp(Y_date, "MM/dd/yyyy HH:mm:ss") as timestamp) as action_time
FROM table_a
where instr(Y_date, 'AM') > 0
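To see what the conversion should yield, the 12-hour to 24-hour parse can be done directly in Python with %I/%p (the fractional seconds are dropped here for simplicity; this is a sketch, not Impala syntax):

```python
from datetime import datetime

raw = "10/07/2017 02:04:01 PM"  # fractional seconds omitted

dt = datetime.strptime(raw, "%m/%d/%Y %I:%M:%S %p")
print(dt.strftime("%Y-%m-%d %H:%M:%S"))  # 2017-10-07 14:04:01
```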
