DataStax Bulk Loader - change date format in Cassandra

Does anyone know this tool (DataStax Bulk Loader)? I'd like to change the date format in some rows from 2020-05-18T14:18:45.878Z to 1593402243336 (epoch milliseconds, like the Java Instant type) because of an error in the code.
The date in this "column" is stored in Cassandra as text. Is this possible? I tried to write a suitable script but without any success.

No. All of the available time/date conversion options apply only to columns of type time, date, or timestamp. As I suggested in the previous answer, you need to create a column of type timestamp, unload the data from the text column, load it into the new column, and then, if necessary, unload it in milliseconds format.
P.S. Could you add more information about why you need such a complex approach?
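For comparison, the value conversion the question asks for (ISO-8601 text to epoch milliseconds) is simple once it happens outside DSBulk. The sketch below is not a DSBulk feature; it is a minimal Python illustration that assumes the text values are ISO-8601 strings in UTC like 2020-05-18T14:18:45.878Z, and the function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def iso_to_epoch_millis(text: str) -> int:
    """Convert an ISO-8601 string such as '2020-05-18T14:18:45.878Z' to epoch milliseconds."""
    # datetime.fromisoformat() accepts a trailing 'Z' only from Python 3.11 on,
    # so replace it with an explicit UTC offset for older versions.
    if text.endswith("Z"):
        text = text[:-1] + "+00:00"
    dt = datetime.fromisoformat(text)
    if dt.tzinfo is None:
        # Assumption: values without an offset are UTC.
        dt = dt.replace(tzinfo=timezone.utc)
    # Integer division of timedeltas avoids float rounding errors.
    return (dt - EPOCH) // timedelta(milliseconds=1)


def epoch_millis_to_iso(millis: int) -> str:
    """Convert epoch milliseconds back to an ISO-8601 string with a 'Z' suffix."""
    dt = EPOCH + timedelta(milliseconds=millis)
    return dt.isoformat(timespec="milliseconds").replace("+00:00", "Z")
```

For example, iso_to_epoch_millis("2020-05-18T14:18:45.878Z") returns 1589811525878, and epoch_millis_to_iso() turns that number back into the original string.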

Related

Remove Duplicates based on latest date in power query

I have a dataset that I am loading into my sheet via Power Query, and I want to transform the data a little before loading it.
To give a little more context: I have some IDs, and for each ID I want the older rows removed and only the row with the newest date loaded.
The solution is described at https://exceleratorbi.com.au/remove-duplicates-keep-last-record-power-query/
"Remove Duplicates and Keep the Last Record with Power Query"
In short: sort by date in a buffered table, then remove duplicate IDs.
Another way, I think, would be to group by ID and take the MAX date, but which approach is better depends on the data size.
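The "group by ID and take the MAX date" idea is language-agnostic. Here is a minimal Python sketch of it, not Power Query M; it assumes each row is a dict with hypothetical "id" and "date" keys.

```python
from datetime import date


def keep_latest_per_id(rows):
    """Keep only the row with the newest date for each id (group by id, take MAX date)."""
    latest = {}
    for row in rows:
        current = latest.get(row["id"])
        # Replace the stored row only when this one has a strictly newer date.
        if current is None or row["date"] > current["date"]:
            latest[row["id"]] = row
    return list(latest.values())


rows = [
    {"id": 1, "date": date(2020, 1, 1), "value": "old"},
    {"id": 1, "date": date(2020, 3, 1), "value": "new"},
    {"id": 2, "date": date(2020, 2, 1), "value": "only"},
]
result = keep_latest_per_id(rows)
```

A single pass with a dictionary avoids sorting the whole table. The sort-then-remove-duplicates approach from the linked article gives the same result but relies on the sorted order being preserved, which is why that article buffers the table first.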

Unable to coerce to a formatted date - Cassandra timestamp type

I have values stored in a timestamp column of a Cassandra table in the format
2018-10-27 11:36:37.950000+0000 (GMT date).
I get Unable to coerce '2018-10-27 11:36:37.950000+0000' to a formatted date (long) when I run the query below to get the data.
select create_date from test_table where create_date='2018-10-27 11:36:37.950000+0000' allow filtering;
How can I get the query working if the data is already stored in the table (in the format 2018-10-27 11:36:37.950000+0000), and also perform range (>= or <=) operations on the create_date column?
I also tried create_date='2018-10-27 11:36:37.95Z' and create_date='2018-10-27 11:36:37.95'.
Is it possible to perform filtering on this kind of timestamp type data?
P.S. Using cqlsh to run query on cassandra table.
In the first case, the problem is that you specify the timestamp with microseconds, while Cassandra operates with milliseconds. Try removing the last three digits: .950 instead of .950000 (see this document for details). Timestamps are stored inside Cassandra as 64-bit numbers and are formatted only when results are printed, using the format specified by the datetimeformat option in cqlshrc (see doc). Dates without an explicit time zone require a default time zone to be specified in cqlshrc.
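To make the millisecond point concrete, the Python sketch below trims the microsecond literal from the question to millisecond precision and computes the 64-bit millisecond count that the timestamp column would hold. It does not talk to Cassandra, and both function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
FORMAT = "%Y-%m-%d %H:%M:%S.%f%z"  # matches '2018-10-27 11:36:37.950000+0000'


def to_millis_literal(value: str) -> str:
    """Trim a microsecond timestamp literal to millisecond precision."""
    dt = datetime.strptime(value, FORMAT)
    millis = dt.microsecond // 1000  # drop the last three digits
    return dt.strftime("%Y-%m-%d %H:%M:%S") + f".{millis:03d}" + dt.strftime("%z")


def to_epoch_millis(value: str) -> int:
    """Return the millisecond count since the epoch that a timestamp column stores."""
    dt = datetime.strptime(value, FORMAT)
    return (dt - EPOCH) // timedelta(milliseconds=1)
```

to_millis_literal('2018-10-27 11:36:37.950000+0000') gives '2018-10-27 11:36:37.950+0000', which is the form the answer suggests using in the WHERE clause.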
Regarding your question about filtering the data: this query will work only for small amounts of data. On bigger data sets it will most probably time out, because it needs to scan all the data in the cluster. Also, the data won't be sorted correctly, because sorting happens only inside a single partition.
If you want to perform such queries, the Spark Cassandra Connector may be the better choice, as it can efficiently select the required data, after which you can sort it, etc. This will require much more resources, though.
I recommend taking the DS220 course from DataStax Academy to understand how to model data for Cassandra.
This works for me:
var datetime = DateTime.UtcNow.ToString("yyyy-MM-dd HH:mm:ss"); // mm = minutes, MM = month
var query = $"SET updatedat = '{datetime}' WHERE ...

Cassandra time not saved in UTC

I need to split my timestamp into a date and a time and insert them into columns with the CQL types 'date' and 'time'.
I was trying to insert a time value as a string into a Cassandra table. The time was converted to UTC (05:27:00). But when I checked the table using DataStax DevCenter, the column was populated with the value '09:37:54.935541808'. When I tried to retrieve the value in Spring using a repository, it returned '3473746674935541808'.
How do I get the correct time value from the table?
It looks like a limitation of Spring Data. In Cassandra, a time value is encoded as a 64-bit signed integer representing the number of nanoseconds since midnight. I don't see the time type listed as supported in the spring-data-cassandra documentation, so you may need to write a custom converter for it, as described in the documentation.
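Here is what "nanoseconds since midnight" means for a custom converter. This Python sketch only illustrates the encoding; an actual Spring converter would be written in Java. The function names are hypothetical. Python's datetime.time keeps only microseconds, so decoding drops the last three digits.

```python
from datetime import time

NANOS_PER_SECOND = 1_000_000_000
NANOS_PER_DAY = 86_400 * NANOS_PER_SECOND


def time_to_nanos(t: time) -> int:
    """Encode a time of day as nanoseconds since midnight (the CQL time representation)."""
    seconds = t.hour * 3600 + t.minute * 60 + t.second
    return seconds * NANOS_PER_SECOND + t.microsecond * 1000


def nanos_to_time(nanos: int) -> time:
    """Decode nanoseconds since midnight into a time, truncating to microseconds."""
    if not 0 <= nanos < NANOS_PER_DAY:
        raise ValueError("value must lie within a single day")
    seconds, remainder = divmod(nanos, NANOS_PER_SECOND)
    hours, rest = divmod(seconds, 3600)
    minutes, secs = divmod(rest, 60)
    return time(hours, minutes, secs, remainder // 1000)
```

For example, 05:27:00 encodes to 19620000000000, and a raw value outside the 0 to 86399999999999 range cannot be a valid time of day.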

Change date format on category axis in my PowerPivot column chart

I'm trying to show values by date in a PivotChart (column chart) in Excel. The problem is that the category labels don't seem to follow the number formatting I set. Yes, the underlying table column is formatted as a date type. I've Googled a bit, and the fix that seems to work for most people is to have a complete data set (no gaps or empty cells). My data set is complete, so that doesn't help me.
Any ideas?
If you want guaranteed consistency of display format, make a display date string in your date dimension that is formatted the way you like. Use this for labels in pivot tables and charts.
You will also have to sort this field by your date field in the model to make sure they display in chronological order.
This is not possible with data linked from a PowerPivot data model. Microsoft has been aware of this for a long while, but unfortunately they haven't implemented the functionality you want.
There is, however, one workaround: create a calculated column using =FORMAT(<date>,"<date format>"). If you use the formatted column instead of your usual date column, the axis will be formatted the way you need.
This is old, but note that if you create a calculated column using =FORMAT(<date>,"<date format>"), you lose the ability to sort the axis in proper date order.
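The sorting problem with a formatted label column, and the "sort this field by your date field" fix, can be illustrated outside Excel. This is a Python sketch, not DAX. It assumes text labels in a day/month/year style. Sorting the labels compares characters, so the order is only correct when you sort by the underlying dates.

```python
from datetime import date

dates = [date(2021, 1, 10), date(2020, 1, 15), date(2020, 2, 1)]

# Text labels, like those a formatted calculated column would produce.
labels = {d: d.strftime("%d/%m/%Y") for d in dates}

# Sorting the label text compares characters, not dates.
by_text = sorted(labels.values())

# Sorting by the underlying date restores chronological order.
by_date = [labels[d] for d in sorted(dates)]
```

Here by_text puts "10/01/2021" before "15/01/2020", while by_date lists the labels chronologically.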

Accessing timestamp of a Cassandra column

I am new to Cassandra.
I have a column family whose columns are sorted by "LexicalUUIDType".
How can I access the timestamp of each column in such a column family?
I need the timestamp because I have to read the oldest entry.
I cannot use "TimeUUIDType" for sorting columns.
Thanks,
It depends on the library you are using. But if you are using the raw Thrift API, it's something like this (unreleased 0.7/trunk):
column.column.clock.timestamp
(To get all the data you will have to use get_range_slices: start with "", and after each call use the last key as the start key of the next call.)
You would have to get back all of the columns using get_slice (http://wiki.apache.org/cassandra/API06#get_slice) and then look at the timestamp field of each one.
Or you can create another column family, sorted by TimeUUID, whose values are the corresponding column names in the first CF. Query CF #2 with the time you want, and use the result to fetch from CF #1.
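The two-column-family idea from the last answer can be modeled in memory to show the lookup order: read the oldest entry from the time-ordered index, then fetch the value from the primary CF. This is a hypothetical Python sketch, not the Thrift API. A sorted list of (timestamp, name) pairs stands in for the TimeUUID comparator, and overwrites of an existing name are not handled.

```python
import bisect


class TimeIndexedColumns:
    """Hypothetical model of the two-column-family approach.

    primary: column name -> value (CF #1, sorted by LexicalUUID)
    index:   sorted list of (timestamp, column name) (CF #2, sorted by time)
    """

    def __init__(self):
        self.primary = {}
        self.index = []

    def insert(self, name, value, timestamp):
        self.primary[name] = value
        # Keep the index ordered by time, as a TimeUUID comparator would.
        bisect.insort(self.index, (timestamp, name))

    def oldest(self):
        # First entry of the time-ordered index, then a lookup in the primary CF.
        _, name = self.index[0]
        return name, self.primary[name]


cols = TimeIndexedColumns()
cols.insert("b-column", "second", 200)
cols.insert("a-column", "first", 100)
```

Here cols.oldest() returns the column inserted with the smallest timestamp, even though it was written second.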
