I have a date field in my Elasticsearch documents named timestamp, containing microseconds.
When I read the dataframe as below, I lose the last 3 digits of each timestamp.
Do you know what is wrong?
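For reference, a hypothetical sketch of reading an Elasticsearch index into a Spark dataframe with the elasticsearch-hadoop connector; the index name, node address, and options here are assumptions:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("es-read").getOrCreate()

# Read the index through the elasticsearch-hadoop data source.
df = (spark.read
      .format("org.elasticsearch.spark.sql")
      .option("es.nodes", "localhost:9200")
      .load("my-index"))

df.select("timestamp").show(truncate=False)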
I am reading a Parquet file into a dataframe. My goal is to verify that my time data (column type in Parquet: timestamp) are ISO 8601.
The dates in the time column look like this: 2021-03-13T05:34:27.100Z or 2021-03-13T05:34:27.100+0000
But when I read my dataframe, PySpark transforms 2021-03-13T05:34:27.100Z into 2021-03-13 05:34:27.100
I want to keep the original format, but I can't figure out how to stop PySpark from doing this. I tried to use a custom schema with string for the dates, but I get this error: Parquet column cannot be converted in file file.snappy.parquet. Column: [time], Expected: string, Found: INT96
I also tried using conf parameters, but that didn't work for me.
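Since the Parquet column is physically a timestamp (INT96) rather than a string, one option is to let Spark read it as a timestamp and then render it back to an ISO 8601 string. A minimal sketch, assuming a UTC session time zone and a column named time:

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()
spark.conf.set("spark.sql.session.timeZone", "UTC")  # render timestamps in UTC

df = spark.read.parquet("file.snappy.parquet")

# date_format turns the timestamp back into an ISO 8601 string;
# the "XXX" pattern prints "Z" for a zero offset.
df = df.withColumn("time_iso", F.date_format("time", "yyyy-MM-dd'T'HH:mm:ss.SSSXXX"))
df.select("time", "time_iso").show(truncate=False)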
I have created a table in Hive with one column of timestamp data type. While inserting into Hive, I am getting a different value than the existing one.
My column's expected value: 2021-11-03 16:57:10.842 UTC (I am getting this as a string). How can I store the above output in the Hive table (column with data type timestamp)?
You need to use cast to convert this to a timestamp after removing the word UTC. Since Hive intentionally doesn't care about time zones and displays data in UTC, you should be in good shape.
select cast(substr('2021-11-03 16:57:10.842 UTC', 1, 23) as timestamp) as ts
Please note you need to have the data in the yyyy-MM-dd HH:mm:ss.SSS format shown above.
Also note that you cannot use from_unixtime(unix_timestamp(string_col, 'yyyy-MM-dd HH:mm:ss.SSS')), because you will lose the millisecond part.
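As an end-to-end illustration, a minimal sketch run through a Hive-enabled Spark session; the table names events and raw_events are hypothetical:

from pyspark.sql import SparkSession

spark = SparkSession.builder.enableHiveSupport().getOrCreate()

# Strip the trailing " UTC" and cast the remaining
# yyyy-MM-dd HH:mm:ss.SSS string to a timestamp.
spark.sql("""
    INSERT INTO events
    SELECT cast(substr(ts_str, 1, 23) AS timestamp) AS ts
    FROM raw_events
""")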
I want to create a table with date and timezone in different columns.
For example:
Date 20170311 Time 10:32:24+1300
The format has to be the same as above.
When I created the table, Date was set as type date and Time as type timestamp.
When I insert the date, I have to follow a certain format like 2017-03-11; how can I make it look the same as in the example above?
When inserting the time and time zone, I have to insert the date along with it, like '2017-03-22T10:37:50+1300'. Is there any way I can reformat it?
After inserting with this format '2017-03-22T10:37:50+1300', the time and time zone changed in the table; how can I keep them the same as the input?
CREATE TABLE example (id int PRIMARY KEY, work_date date, sequence timestamp);
INSERT INTO example (id, work_date, sequence) VALUES (1, '2017-03-22', '2017-03-22T10:37:50+1300');
expected result:
1 20170322 10:37:50+1300
actual result:
1 2017-03-22 2017-03-21 21:37:50.000000+0000
Cassandra has several data types related to date & time: date, time, and timestamp, and only the last one has a notion of the time zone.
The formatting of timestamps is your responsibility: internally the data is stored as a long (8 bytes) representing the number of milliseconds since the epoch, and it is converted into a textual representation by the corresponding driver; in the case of cqlsh, the formatting is controlled by the datetimeformat parameter. Similarly, the date and time data types are kept as numbers inside the database, not as strings.
If you're accessing the data from your own program, then you can format the time however you want.
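A minimal sketch of doing that formatting client-side with the Python driver, using the example table above (the contact point, keyspace name my_keyspace, and target zone Pacific/Auckland are assumptions); timestamp columns come back as UTC datetime objects and date columns as cassandra.util.Date:

from datetime import timezone
from zoneinfo import ZoneInfo
from cassandra.cluster import Cluster

session = Cluster(["127.0.0.1"]).connect("my_keyspace")
row = session.execute("SELECT id, work_date, sequence FROM example WHERE id = 1").one()

# work_date is a cassandra.util.Date; sequence is a naive datetime in UTC.
work_date = row.work_date.date()
local_ts = row.sequence.replace(tzinfo=timezone.utc).astimezone(ZoneInfo("Pacific/Auckland"))

print(work_date.strftime("%Y%m%d"))      # 20170322
print(local_ts.strftime("%H:%M:%S%z"))   # 10:37:50+1300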
I use the logstash-input-jdbc plugin to sync my data from MySQL to Elasticsearch. However, when I looked at the data in Elasticsearch, I found that all fields of date type changed format from "yyyy-MM-dd" to "yyyy-MM-dd'T'HH:mm:ss.SSSZ". I have nearly 200 fields of type date, so I want to know how to configure Logstash so that it outputs the "yyyy-MM-dd" format instead of "yyyy-MM-dd'T'HH:mm:ss.SSSZ".
Elasticsearch stores dates as UTC timestamps:
Internally, dates are converted to UTC (if the time-zone is specified) and stored as a long number representing milliseconds-since-the-epoch.
Queries on dates are internally converted to range queries on this long representation, and the result of aggregations and stored fields is converted back to a string depending on the date format that is associated with the field.
So if you want to retain the yyyy-MM-dd format, you'll have to store it as a keyword (which you then won't be able to do range queries on).
You can change Kibana's display to only show the yyyy-MM-dd format, but note that it will convert the date to the viewer's time zone, which may result in a different day than the one you entered in the input field.
If you want to ingest the date as a string, you'll need to create a mapping for the index in question to prevent default date processing.
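A minimal sketch of such a mapping, created here through the plain REST API with the requests library (the index name my-index and field name created_date are assumptions):

import requests

# Map the field as keyword so the "yyyy-MM-dd" string is stored and returned
# verbatim, with no date detection and no conversion to epoch milliseconds.
resp = requests.put(
    "http://localhost:9200/my-index",
    json={
        "mappings": {
            "properties": {
                "created_date": {"type": "keyword"}
            }
        }
    },
)
print(resp.json())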
I need to split my timestamp into date and time separately and insert them into DB columns with the 'date' and 'time' CQL types.
I was trying to insert a time value as a string into a Cassandra table. The time was converted to UTC (05:27:00). But when I checked the table using DataStax DevCenter, the column was populated with the value '09:37:54.935541808'. When I tried to retrieve the value in Spring using a repository, it returned the value '3473746674935541808'.
How do I get the correct time value from the table?
It looks like a limitation of Spring Data. In Cassandra, a time value is encoded as a 64-bit signed integer representing the number of nanoseconds since midnight. But I don't see the time type listed as supported in the spring-data-cassandra documentation, so you may need to write your own custom converter for it, as described in the documentation.
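To illustrate the encoding described above, a small plain-Python sketch (not the Spring converter itself) that turns a raw nanoseconds-since-midnight value back into a wall-clock time; the sample value corresponds to the '09:37:54.935541808' shown in DevCenter:

from datetime import time

def decode_cql_time(nanos_since_midnight: int) -> time:
    # Cassandra's time type is a 64-bit count of nanoseconds since midnight.
    seconds, nanos = divmod(nanos_since_midnight, 1_000_000_000)
    minutes, second = divmod(seconds, 60)
    hour, minute = divmod(minutes, 60)
    return time(hour, minute, second, microsecond=nanos // 1000)

print(decode_cql_time(34_674_935_541_808))   # 09:37:54.935541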