Azure Data Factory Mapping Data Flow: Epoch timestamp to Datetime

I have a JSON-based source I'd like to transform using an ADF Mapping Data Flow. It contains a string with an epoch timestamp value that I want to transform into a Datetime value and later sink into a Parquet file.
Do you know a way? The docs for the expression language are here.
Source file:
{
  "timestamp": "1574127407",
  "name": "D.A."
}

Use toTimestamp() and set the format you want as the second parameter.
toTimestamp(1574127407*1000l)
From string:
toTimestamp(toInteger(toString(byName('timestamp')))*1000l,'yyyy-MM-dd HH:mm:ss')
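The *1000l scales the epoch-seconds value up to milliseconds for the call above. As a minimal illustration of that scaling outside of ADF (plain Scala with java.time, not data flow expression syntax; the object name is made up):

import java.time.Instant

// Plain JVM sketch of the same conversion: the 10-digit value from the
// "timestamp" field is epoch seconds, so multiplying by 1000 gives the
// equivalent epoch milliseconds for millisecond-based APIs.
object EpochSecondsDemo extends App {
  val epochSeconds = 1574127407L            // value from the source JSON
  val epochMillis  = epochSeconds * 1000L   // same instant, in milliseconds

  println(Instant.ofEpochSecond(epochSeconds)) // 2019-11-19T01:36:47Z
  println(Instant.ofEpochMilli(epochMillis))   // same instant
}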

I have come across epoch timestamp values that are 13 digits long, i.e., they also carry millisecond information.
In that case, converting with toInteger won't work; the value overflows a 32-bit integer and the result is NULL. To fix this, convert it to a long using toLong instead:
toTimestamp(toLong(toString(created)),'yyyy-MM-dd HH:mm:ss')
In the above expression, 'created' is a field whose value is a 13-digit epoch timestamp, e.g. created='1635359043307'.
Here, toTimestamp returns a timestamp in the date format given above.
FYI, you can use https://www.epochconverter.com/ to check what human-readable date an epoch timestamp corresponds to.
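To see why the 13-digit value has to go through toLong rather than toInteger, note that it no longer fits in a 32-bit integer. A small plain-Scala sketch of the same idea (again not ADF syntax; the object name is illustrative):

import java.time.Instant

// A 13-digit epoch value is in milliseconds and exceeds Int.MaxValue
// (2147483647), so it must be handled as a 64-bit long.
object EpochMillisDemo extends App {
  val created = "1635359043307"                   // sample value from the answer above
  println(created.toLong > Int.MaxValue)          // true: the value overflows a 32-bit int
  println(Instant.ofEpochMilli(created.toLong))   // 2021-10-27T18:24:03.307Z
}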

Related

The best way to store dates in ArangoDB

I need to store the resource creation date in ArangoDB. Furthermore, I need to filter resources by creation date:
FOR doc in Docs
FILTER doc.created < #some_date
RETURN doc
What is the best way to do that?
Possible solutions:
{
// one
created: "<iso8601-date>", // Persistent index, use DATE_TIMESTAMP()
// two
created: "<timestamp-str>", // Persistent index, use TO_NUMBER
// three
created: <timestamp-int>, // Persistent index
}
Just found this documentation. In brief: a date can be stored either as a string in ISO 8601 format or as a numeric timestamp. All comparison operations will work correctly and efficiently.
I'd recommend storing dates as Unix epoch seconds, as this helps with date range queries and ensures your clients can reliably convert the date into a usable format.
When the date is just a number, range queries are easy, since 1 hour is just 3600 seconds.
If you need more granularity, storing dates as Unix epoch milliseconds also works.
You can store dates earlier than 1970-01-01 in this format; they will just be negative integers.
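To make the "an hour is just 3600 seconds" arithmetic concrete, here is a small Scala sketch computing range bounds in epoch seconds; the bind-parameter names are hypothetical and would feed a filter such as FILTER doc.created >= @rangeStart AND doc.created < @rangeEnd:

import java.time.Instant
import java.time.temporal.ChronoUnit

// Range bounds in Unix epoch seconds: subtracting 3600 moves back one hour,
// and java.time can compute larger offsets (e.g. one day) the same way.
object EpochRangeDemo extends App {
  val now        = Instant.now().getEpochSecond
  val oneHourAgo = now - 3600                                        // 1 hour = 3600 seconds
  val oneDayAgo  = Instant.now().minus(1, ChronoUnit.DAYS).getEpochSecond

  println(s"last hour: $oneHourAgo .. $now")
  println(s"last day:  $oneDayAgo .. $now")
}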

How to convert a column from one timestamp format to another in Azure Data Factory

I have a column ABC where the timestamp is in the format dd/MM/yyyy HH:mm:SS (11/04/2020 1:17:40). I want to create another column ABC_NEW with the same data as the old column, but with a different timestamp format, 'yyyy-MM-dd HH:mm:SS'. I tried doing this in an Azure Data Factory derived column using
toTimestamp(column_name,'yyyy-MM-dd HH:mm:SS')
but it did not work; the value comes out as NULL. Can anyone help?
It's a 2-step process. You first need to tell ADF what each field in your timestamp column represents, then you can use string conversions to manipulate that timestamp into the output string as you like:
toString(toTimestamp('11/04/2020 1:17:40','MM/dd/yyyy HH:mm:ss'),'yyyy-MM-dd HH:mm:SS')
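The same parse-then-format idea, shown outside ADF as a plain Scala sketch with java.time (a rough illustration only; note that in java.time patterns lowercase 'ss' means seconds, unlike the 'SS' quoted in the question):

import java.time.LocalDateTime
import java.time.format.DateTimeFormatter

// Step 1: parse with a pattern describing the *input* string.
// Step 2: format with the pattern you want for the *output*.
object ReformatTimestampDemo extends App {
  val inputPattern  = DateTimeFormatter.ofPattern("MM/dd/yyyy H:mm:ss")
  val outputPattern = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss")

  val parsed = LocalDateTime.parse("11/04/2020 1:17:40", inputPattern)
  println(parsed.format(outputPattern))   // 2020-11-04 01:17:40
}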
Data Factory doesn't support the date format 'dd/MM/yyyy', so we cannot convert it to 'yyyy-MM-dd' directly.
I use a Derived Column to generate a new column ABC_NEW from the original column DateTime and enter the expression below:
toTimestamp(concat(split(substring(DateTime,1, 10), '/')[3], '-',split(substring(DateTime,1, 10), '/')[2],'-',split(substring(DateTime,1, 10), '/')[1],substring(DateTime,11, length(DateTime))))
This is a trick that was a blocker for me, but try this:
Go to the sink
Open Mapping
Click on the output format
Select the date or time format you prefer for storing the data in the sink.

Presto epoch string to timestamp

I need your help, as I'm stuck on a time conversion in Presto.
I have an epoch column named timestamp with a string datatype, and I want to convert it into a date timestamp.
I have used the query below after reading through various blogs:
SELECT date_parse(to_iso8601(from_unixtime(CAST(timestamp AS bigint))),
                  '%Y-%m-%dT%H:%i:%s.%fZ') AS date
FROM wqmparquet;
Every time I run this query, I get an error:
INVALID_FUNCTION_ARGUMENT: Invalid format: "2020-04-27T19:49:50.000Z" is malformed at "T19:49:50.000Z"
Can somebody please help me with this?
I might be oversimplifying this, but if you want to convert an epoch string to a timestamp datatype, you can just do:
from_unixtime(cast(timestamp as bigint))
You can generate a timestamp with time zone by passing a time zone string as a second argument to from_unixtime().

Logstash convert the "yyyy-MM-dd" to "yyyy-MM-dd'T'HH:mm:ss.SSSZ"

I use the logstash-input-jdbc plugin to sync my data from MySQL to Elasticsearch. However, when I looked at the data in Elasticsearch, I found that all fields of date type had changed format from "yyyy-MM-dd" to "yyyy-MM-dd'T'HH:mm:ss.SSSZ". I have nearly 200 fields whose type is date, so I want to know how to configure Logstash so that it outputs the format "yyyy-MM-dd" instead of "yyyy-MM-dd'T'HH:mm:ss.SSSZ".
Elasticsearch stores dates as UTC timestamps:
Internally, dates are converted to UTC (if the time-zone is specified) and stored as a long number representing milliseconds-since-the-epoch.
Queries on dates are internally converted to range queries on this long representation, and the result of aggregations and stored fields is converted back to a string depending on the date format that is associated with the field.
So if you want to retain the yyyy-MM-dd format, you'll have to store it as a keyword (which you then won't be able to do range queries on).
You can change Kibana's display to only show the yyyy-MM-dd format, but note that it will convert the date to the timezone of the viewer which may result in a different day than you entered in the input field.
If you want to ingest the date as a string, you'll need to create a mapping for the index in question to prevent default date processing.

Timestamp casting makes value null

When I cast the column datatype from string to timestamp, the value becomes null.
I have values in the following format
20070811T00789.167861+0100
I want to cast the type to "timestamp", but when I do the following
df.withColumn('arrivetime', df['arrivetime'].cast('timestamp'))
the value becomes null. How can I cast the column to timestamp without affecting the value and its format?
I don't know exactly what format you are going for with the 5 digits for the time and the 6 (nanoseconds?) at the end, but I do know that timestamps in Spark are milliseconds, not nanoseconds, so you are going to lose information.
That being said, you can use Spark's unix_timestamp method to convert strings to timestamps using SimpleDateFormat syntax.
First, you probably have to get rid of the last 3 digits of the fractional seconds, using Spark's regexp_replace.
In Scala that would look like:
regexp_replace(df("arrivetime"), """(\.\d{3})\d*""", """$1""")
Then you could use the unix_timestamp like so:
unix_timestamp([replaced string], "yyyyMMdd'T'HHmmss.SSSz")
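Putting the two steps together as a single derived column, here is a rough Scala sketch. The sample value, the column name arrivetime_ts, and the use of 'Z' for the '+0100' offset are assumptions for illustration, and unix_timestamp returns whole seconds, so the fractional part is dropped:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, regexp_replace, unix_timestamp}

// Sketch of the full pipeline: trim the fraction to 3 digits, parse the string,
// and cast the resulting seconds value back to a timestamp column.
object ArriveTimeDemo extends App {
  val spark = SparkSession.builder().master("local[*]").appName("arrivetime").getOrCreate()
  import spark.implicits._

  // A value shaped like the question's, but with a 6-digit HHmmss time so the
  // pattern below can actually parse it.
  val df = Seq("20070811T120745.167861+0100").toDF("arrivetime")

  val cleaned = regexp_replace(col("arrivetime"), """(\.\d{3})\d*""", "$1")
  df.withColumn("arrivetime_ts",
      unix_timestamp(cleaned, "yyyyMMdd'T'HHmmss.SSSZ").cast("timestamp"))
    .show(false)
}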
