I have a datetime column in an Azure Data Lake Analytics table.
All of my incoming data is UTC (+00:00). When I use the code below, all of the CSV outputs convert the dates to -08:00.
OUTPUT @data
TO "/data.csv"
USING Outputters.Text(quoting : false, delimiter : '|');
An example datetime in the output:
2018-01-15T12:20:13.0000000-08:00
Are there any options for controlling the output format of the dates? I don't really understand why everything is suddenly in -0800 when the incoming data isn't.
Currently, ADLA does not store time zone information in DateTime, meaning it always defaults to the local time of the cluster machine when reading (-08:00 in your case). Therefore, you can either normalize your DateTime to this local time by running
DateTime.SpecifyKind(myDate, DateTimeKind.Local)
or use
DateTime.ConvertToUtc()
to output in UTC form (but note that the next time you ingest that same data, ADLA will still default to reading it at offset -08:00). Examples below:
@getDates =
    EXTRACT
        id int,
        date DateTime
    FROM "/test/DateTestUtc.csv"
    USING Extractors.Csv();

@formatDates =
    SELECT
        id,
        DateTime.SpecifyKind(date, DateTimeKind.Local) AS localDate,
        date.ConvertToUtc() AS utcDate
    FROM @getDates;

OUTPUT @formatDates
TO "/test/dateTestUtcKind_AllUTC.csv"
USING Outputters.Csv();
You can file a feature request for DateTime with offset on our ADL feedback site. Let me know if you have other questions!
I am using pyspark to load data via JDBC from an MSSQL database. I have a problem reading datetimes: I want to treat the fetched datetime as UTC without applying any timezone conversion, because the datetime is already in UTC.
On MSSQL I have this table:
CREATE TABLE [config].[some_temp_table](
    [id] [int] NULL,
    [dt] [datetime] NULL
)
-- select * from [config].[some_temp_table]
-- id   dt
-- 1    2022-07-18 23:11:26.613
And I want to read it with pyspark by jdbc.
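For reference, the read looks roughly like this (a minimal sketch; the host, database name, and credentials are placeholders):
# minimal sketch of the JDBC read; host, database and credentials are placeholders
df = (
    spark.read.format("jdbc")
    .option("url", "jdbc:sqlserver://<host>;databaseName=<db>")
    .option("dbtable", "config.some_temp_table")
    .option("user", "<user>")
    .option("password", "<password>")
    .load()
)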
After reading it I have this in my DataFrame
>>> df.show(truncate=False)
+---+-----------------------+
|id |dt |
+---+-----------------------+
|1 |2022-07-18 21:11:26.613|
+---+-----------------------+
Time zone of the server where Spark runs:
# date +"%Z %z"
CEST +0200
So, as I understand it, Spark reads the datetime and treats it as a datetime in the local timezone of the server it runs on. It gets '2022-07-18 23:11:26.613', assumes it is a datetime in the +0200 timezone, and therefore converts it to UTC as '2022-07-18 21:11:26.613'. Correct me if my thinking is wrong.
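To illustrate the shift I suspect (a small sketch; Europe/Warsaw is only an assumed CEST zone):
# reproduce the suspected behaviour: the naive value is interpreted in the
# server's zone (CEST, +02:00) and then shifted to UTC
from datetime import datetime
from zoneinfo import ZoneInfo

naive = datetime(2022, 7, 18, 23, 11, 26, 613000)
as_cest = naive.replace(tzinfo=ZoneInfo("Europe/Warsaw"))
print(as_cest.astimezone(ZoneInfo("UTC")))  # 2022-07-18 21:11:26.613000+00:00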
What I want to do:
Read the datetime from the MSSQL database with Spark and save it into Parquet without any conversion.
For example, if Spark reads the datetime '2022-07-18 23:11:26.613', it should save the same value into Parquet, so that after reading the Parquet file I see the value '2022-07-18 23:11:26.613'.
Is there any option to tell spark to treat datetime from jdbc connection as UTC already or not to do any conversion ?
What I tried:
spark.conf.set("spark.sql.session.timeZone", "UTC") - does nothing
serverTimezone=UTC added to the JDBC URI - also does nothing
Note:
I don't know if this is needed, but the time zone on the MSSQL database is:
--select current_timezone()
(UTC+01:00)
I am trying to read some data from the SharePoint API via the older _vti_bin/client.svc endpoint.
I can't seem to find out what type of date format this is or how I can parse it via C#.
The timestamp being returned is:
"LastContentModifiedDate": "/Date(2022,3,18,13,12,28,990)/"
The year and month are obvious, so I could parse it myself if I knew what all of the values mean. Is there a formal definition for this, or a way to parse it reliably? Is this a DateTime or DateTimeOffset or something else?
I just get an exception when trying to deserialize to a DateTime or DateTimeOffset.
The /Date(...)/ format is Microsoft's built-in JSON date format.
You can try to parse it using the code below. You can also check out this post, which provides a lot of methods.
using System.Web.Script.Serialization;

// Deserialize Microsoft's JSON date format into a DateTime
JavaScriptSerializer json_serializer = new JavaScriptSerializer();
DateTime ddate = json_serializer.Deserialize<DateTime>(@"""\/Date(1326038400000)\/""").ToLocalTime();
I have a TDMS file with a bunch of DateTime values alongside the relevant instrumentation data.
The issue I am having is:
TDMS file >>>> Python Reads
4/20/2021 12:00:01 AM >>>> 2021-04-20 04:00:00.597573
4/20/2021 8:00:01 PM >>>> 2021-04-21 00:00:00.570708
This is messing up transfers to the database because it is not accurate.
This is my code:
dfscaled = tdmsfile["Data (Scaled)"].as_dataframe()
for index, row in dfscaled.iterrows():
    print(row["Timestamp"])
I am using the npTDMS library. Any ideas on how to fix this?
So I ended up contacting the author and he was helpful enough to send a response:
TDMS files internally store times in UTC. You don't say how you're
getting the values in the "TDMS file" column but I assume this is from
a program that converts times into your local timezone to display
them. There is an example of how to convert UTC times from a TDMS file
into your local timezone in the documentation.
If you are storing these times in a database though, I would strongly
recommend you store times in UTC rather than your local timezone as
local times can be ambiguous due to daylight savings changes, and UTC
is easily understood from anywhere in the world without having to know
the local timezone where the data originated.
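If you follow that advice and keep the values in UTC, a minimal sketch (assuming pandas and reusing row["Timestamp"] from the question) is to simply attach the UTC zone before storing the value:
import pandas as pd

# npTDMS timestamps are already UTC, just naive; attach the zone and store as-is
utc_ts = pd.to_datetime(row["Timestamp"]).tz_localize("UTC")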
If you still feel like making the change from UTC to EST then this should do it:
import pandas as pd

# interpret the naive npTDMS timestamp as UTC, then convert it to US Eastern
fmt = "%Y-%m-%d %H:%M:%S.%f"
x = pd.to_datetime(row["Timestamp"]).tz_localize('UTC').tz_convert('US/Eastern').strftime(fmt)
dtm = pd.to_datetime(x[:-3])
print(dtm)
I have the following code, which I am using to monitor Azure ADF pipeline runs. The code uses 'RunFilterParameters' to apply a date range filter when extracting run results:
filter_params = RunFilterParameters(
    last_updated_after=datetime.now() - timedelta(1),
    last_updated_before=datetime.now() + timedelta(1))
query_response = adf_client.activity_runs.query_by_pipeline_run(
    resource_group, adf_name, row.latest_runid, filter_params)
The above works OK; however, it throws a warning:
Datetime with no tzinfo will be considered UTC
I am not sure how to add a time zone to this, or how to just suppress the warning.
Please help.
"no tzinfo" means that naive datetime is used, i.e. datetime with no defined time zone. Since Python assumes local time by default for naive datetime, this can cause unexpected behavior.
In the code example given in the question, you can create an aware datetime object (with tz specified as UTC) like this:
from datetime import datetime, timezone
# and use
datetime.now(timezone.utc)
If you need to use another time zone than UTC, have a look at zoneinfo (Python 3.9+ standard library).
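Applied to the snippet from the question, this looks roughly as follows (a sketch; the import path for RunFilterParameters is assumed to be the Data Factory management SDK models):
from datetime import datetime, timedelta, timezone
from azure.mgmt.datafactory.models import RunFilterParameters  # assumed import path

# aware (UTC) datetimes, so the "no tzinfo" warning no longer applies
filter_params = RunFilterParameters(
    last_updated_after=datetime.now(timezone.utc) - timedelta(days=1),
    last_updated_before=datetime.now(timezone.utc) + timedelta(days=1))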
We are sending the datetime as a string in the format 2018-03-20 10:50:037.996, and we have written the Stream Analytics query below.
SELECT
    powerscout.Device_Id AS PowerScout,
    powerscout.[kW System],
    CAST(powerscout.[TimeStamp] AS datetime) AS [TimeStamp]
INTO
    [PowergridView]
FROM
    [IoTHubIn]
When we send data through Stream Analytics, the job fails.
Any suggestions please? Thanks in advance.
ASA can parse DATETIME fields represented in one of the formats described in ISO 8601. The format you are sending is not supported. You can try using a custom JavaScript function to parse it.
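Alternatively, if you control the sender, you could emit the timestamp in ISO 8601 before it reaches the IoT Hub. A minimal Python sketch, assuming the sending side can be changed:
from datetime import datetime, timezone

# format the current UTC time as ISO 8601 with millisecond precision,
# e.g. 2018-03-20T10:50:03.996Z, which an ISO 8601-based CAST can parse
ts = datetime.now(timezone.utc)
iso_timestamp = ts.strftime("%Y-%m-%dT%H:%M:%S.%f")[:-3] + "Z"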