Change datetime format generated with make-series operation in Kusto - azure

Introduction:
In Azure Data Explorer there is a make-series operator which allows us to create a series of specified aggregated values along a specified axis.
The problem:
The operator works well, except for the format of the timestamps it generates.
For example:
let resolution = 1d;
let timeframe = 3d;
let start_ts = datetime_add('second', offset, ago(timeframe));
let end_ts = datetime_add('second', offset, now());
Table
| make-series max(value) default=0 on timestamp from start_ts to end_ts step resolution by col_1, col_2
Current results:
The result I get contains the timestamps in UTC, like the following:
"max_value": [
-2.69,
-2.79,
-2.69
],
"timestamp": [
"2020-03-29T18:01:08.0552135Z",
"2020-03-30T18:01:08.0552135Z",
"2020-03-31T18:01:08.0552135Z"
],
Expected result:
The result should look like the following:
"max_value": [
-2.69,
-2.79,
-2.69
],
"timestamp": [
"2020-03-29 18:01:08",
"2020-03-30 18:01:08",
"2020-03-31 18:01:08"
],
Question:
Is there any way to change the datetime format generated by the make-series operation in Kusto so that it is NOT in UTC format?

It's not clear what you define as "UTC format". Kusto/ADX uses the ISO 8601 standard, and timestamps are always in UTC. You can see that in your original message, e.g. 2020-03-29T18:01:08.0552135Z.
If, for whatever reason, you want to present datetime values in a different format inside a dynamic column (array or property bag), you could achieve that using mv-apply and format_datetime():
print arr = dynamic(
[
"2020-03-29T18:01:08.0552135Z",
"2020-03-30T18:01:08.0552135Z",
"2020-03-31T18:01:08.0552135Z"
])
| mv-apply arr on (
summarize make_list(format_datetime(todatetime(arr), "yyyy-MM-dd HH:mm:ss"))
)

Related

Bigquery failed to parse input string as TIMESTAMP

I'm trying to load a CSV from Google Cloud Storage into BigQuery using schema autodetect.
However, I'm getting stumped by a parsing error on one of my columns. I'm perplexed as to why BigQuery can't parse the field. According to the documentation, it should be able to parse fields that look like YYYY-MM-DD HH:MM:SS.SSSSSS (which is exactly what my BQInsertTimeUTC column contains).
Here's my code:
from google.cloud import bigquery
from google.oauth2 import service_account
project_id = "<my_project_id>"
table_name = "<my_table_name>"
gs_link = "gs://<my_bucket_id>/my_file.csv"
creds = service_account.Credentials.from_service_account_info(gcs_creds)
bq_client = bigquery.Client(project=project_id, credentials=creds)
dataset_ref = bq_client.dataset("<my_dataset_id>")
# create job_config object
job_config = bigquery.LoadJobConfig(
autodetect=True,
skip_leading_rows=1,
source_format="CSV",
write_disposition="WRITE_TRUNCATE",
)
# prepare the load_job
load_job = bq_client.load_table_from_uri(
gs_link,
dataset_ref.table(table_name),
job_config=job_config,
)
# execute the load_job
result = load_job.result()
Error Message:
Could not parse '2021-07-07 23:10:47.989155' as TIMESTAMP for field BQInsertTimeUTC (position 4) starting at location 64 with message 'Failed to parse input string "2021-07-07 23:10:47.989155"'
And here's the csv file that is living in GCS:
first_name, last_name, date, number_col, BQInsertTimeUTC, ModifiedBy
lisa, simpson, 1/2/2020T12:00:00, 2, 2021-07-07 23:10:47.989155, tim
bart, simpson, 1/2/2020T12:00:00, 3, 2021-07-07 23:10:47.989155, tim
maggie, simpson, 1/2/2020T12:00:00, 4, 2021-07-07 23:10:47.989155, tim
marge, simpson, 1/2/2020T12:00:00, 5, 2021-07-07 23:10:47.989155, tim
homer, simpson, 1/3/2020T12:00:00, 6, 2021-07-07 23:10:47.989155, tim
Loading CSV files into BigQuery assumes that all the timestamp fields follow the same format. In your CSV file, the first timestamp value is "1/2/2020T12:00:00", so it assumes the timestamp format the file uses is [M]M/[D]D/YYYYT[H]H:[M]M:[S]S[.F][time zone].
Therefore, it complains that the value "2021-07-07 23:10:47.989155" could not be parsed. If you change "2021-07-07 23:10:47.989155" to "7/7/2021T23:10:47.989155", it will work.
To fix this, you can either:
Create a table with the date column's type and the BQInsertTimeUTC column's type as STRING, load the CSV into it, and then expose a view that has the expected TIMESTAMP column types for date and BQInsertTimeUTC, using SQL to transform the data from the base table.
Open the CSV file and transform either the "date" values or the "BQInsertTimeUTC" values to make their formats consistent (a small preprocessing sketch follows below).
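If you go with the second route, here is a minimal preprocessing sketch in Python (the file names are illustrative, and it assumes the file is small enough to rewrite locally and that the "date" values always look like "1/2/2020T12:00:00"); it rewrites the "date" column into the dash-separated layout that BigQuery expects for timestamps:
import csv
from datetime import datetime

# Rewrite the CSV so the "date" column uses the same dash-separated layout as BQInsertTimeUTC.
# skipinitialspace=True also absorbs the stray space after each comma.
with open("my_file.csv", newline="") as src, open("my_file_fixed.csv", "w", newline="") as dst:
    reader = csv.DictReader(src, skipinitialspace=True)
    writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
    writer.writeheader()
    for row in reader:
        parsed = datetime.strptime(row["date"], "%m/%d/%YT%H:%M:%S")
        row["date"] = parsed.strftime("%Y-%m-%d %H:%M:%S")
        writer.writerow(row)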
By the way, the CSV sample you pasted here has an extra space after the delimiter ",".
Working version:
first_name,last_name,date,number_col,BQInsertTimeUTC,ModifiedBy
lisa,simpson,1/2/2020T12:00:00,2,7/7/2021T23:10:47.989155,tim
bart,simpson,1/2/2020T12:00:00,3,7/7/2021T23:10:47.989155,tim
maggie,simpson,1/2/2020T12:00:00,4,7/7/2021T23:10:47.989155,tim
marge,simpson,1/2/2020T12:00:00,5,7/7/2021T23:10:47.989155,tim
homer,simpson,1/3/2020T12:00:00,6,7/7/2021T23:10:47.989155,tim
As per the limitations mentioned here,
When you load JSON or CSV data, values in TIMESTAMP columns must use a dash - separator for the date portion of the timestamp, and the date must be in the following format: YYYY-MM-DD (year-month-day). The hh:mm:ss (hour-minute-second) portion of the timestamp must use a colon : separator.
So can you try passing the BQInsertTimeUTC as 2021-07-07 23:10:47, without the milliseconds, instead of 2021-07-07 23:10:47.989155?
If you still want to use different date formats, you can do the following:
Load the CSV file as-is into BigQuery (i.e. your schema should be modified so that BQInsertTimeUTC is STRING).
Create a BigQuery view that transforms the string field into a recognized date format.
Do a PARSE_DATE (or PARSE_TIMESTAMP) on BQInsertTimeUTC and use that view for your analysis, as sketched below.
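A rough sketch of that approach with the Python client from the question (the raw table and view names are placeholders, and the PARSE_TIMESTAMP format string assumes values like "2021-07-07 23:10:47.989155"):
from google.cloud import bigquery

bq_client = bigquery.Client(project="<my_project_id>")

# Load the CSV with an explicit schema so BQInsertTimeUTC stays a STRING.
job_config = bigquery.LoadJobConfig(
    schema=[
        bigquery.SchemaField("first_name", "STRING"),
        bigquery.SchemaField("last_name", "STRING"),
        bigquery.SchemaField("date", "STRING"),
        bigquery.SchemaField("number_col", "INT64"),
        bigquery.SchemaField("BQInsertTimeUTC", "STRING"),
        bigquery.SchemaField("ModifiedBy", "STRING"),
    ],
    skip_leading_rows=1,
    source_format=bigquery.SourceFormat.CSV,
    write_disposition="WRITE_TRUNCATE",
)
bq_client.load_table_from_uri(
    "gs://<my_bucket_id>/my_file.csv",
    "<my_project_id>.<my_dataset_id>.raw_table",  # placeholder table name
    job_config=job_config,
).result()

# Expose a view that parses the string column into a TIMESTAMP for analysis.
bq_client.query("""
    CREATE OR REPLACE VIEW `<my_project_id>.<my_dataset_id>.parsed_view` AS
    SELECT
      * EXCEPT (BQInsertTimeUTC),
      PARSE_TIMESTAMP('%Y-%m-%d %H:%M:%E*S', BQInsertTimeUTC) AS BQInsertTimeUTC
    FROM `<my_project_id>.<my_dataset_id>.raw_table`
""").result()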

How to compare and get data between given dates in mongodb?

I need to get the data from MongoDB between two given dates. The same MongoDB query works for the (yyyy-mm-dd hh:mm:ss.ms) format but does not work for the (dd-mm-yyyy hh:mm:ss) format.
Sample Data in DB
{
"name":"user1",
"Value":"Success",
"Date": "02-06-2020 00:00:00",
"Status":"available",
"Updated_on":"2021-01-09 00:00:00.0000"
}
Python:
start_date = "02-06-2020 00:00:00"
end_date = "11-06-2020 10:16:41"
data = list(db.collection.find({"Date": {"$gte": start_date, "$lte": end_date}, "Value": "Success"}, {'_id': False, "Date": 1, "name": 1, "Value": 1}))
print(data)
I need to get the data based on the "Date" field.
The problem is that it returns extra data outside the start_date and end_date range.
Example: if my start_date is "02-06-2020 00:00:00" and my end_date is "11-06-2020 10:16:41", it returns data from "02-04-2020 00:00:00" to "11-06-2020 10:16:41".
Any idea how to achieve this? Please also explain why it is not comparing the dates correctly.
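One likely factor (sketched below): the Date values are stored as strings, and string comparison is lexicographic, so a dd-mm-yyyy layout does not order chronologically the way the yyyy-mm-dd Updated_on values do. The date pair below is illustrative; the formats match the sample document above:
from datetime import datetime

fmt = "%d-%m-%Y %H:%M:%S"

# Compared as plain strings (what the range query does), dd-mm-yyyy values order lexicographically:
print("05-05-2020 00:00:00" >= "02-06-2020 00:00:00")  # True, even though 5 May is before 2 June

# Compared as real datetime values, the same pair orders chronologically:
a = datetime.strptime("05-05-2020 00:00:00", fmt)
b = datetime.strptime("02-06-2020 00:00:00", fmt)
print(a >= b)  # False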

How to extract and flatten a JSON array as well as specify an Array Value for 'TIMESTAMP BY' in Stream Analytics Query

I get the following input stream data into Stream Analytics:
[
{
"timestamp": 1559529369274,
"values": [
{
"id": "SimCh01.Device01.Ramp1",
"v": 39,
"q": 1,
"t": 1559529359833
},
{
"id": "SimCh01.Device01.Ramp2",
"v": 183.5,
"q": 1,
"t": 1559529359833
}
],
"EventProcessedUtcTime": "2019-06-03T02:37:29.5824231Z",
"PartitionId": 3,
"EventEnqueuedUtcTime": "2019-06-03T02:37:29.4390000Z",
"IoTHub": {
"MessageId": null,
"CorrelationId": null,
"ConnectionDeviceId": "ew-IoT-01-KepServer",
"ConnectionDeviceGenerationId": "636948080712635859",
"EnqueuedTime": "2019-06-03T02:37:29.4260000Z",
"StreamId": null
}
}
]
I am trying to extract the "values" array and use the "t" field within each array element for TIMESTAMP BY.
I was able to read the input and route it to the output with a simple SAQL statement within Stream Analytics. However, I am only interested in the "values" array above.
This is my first attempt. It does not like my TIMESTAMP BY clause when I try to restart the Stream Analytics job:
SELECT
KepValues.ArrayValue.id,
KepValues.ArrayValue.v,
KepValues.ArrayValue.q,
KepValues.ArrayValue.t
INTO
[PowerBI-DS]
FROM
[IoTHub-Input] as event
CROSS APPLY GetArrayElements(event.[values]) as KepValues
TIMESTAMP BY KepValues.ArrayValue.t
==============================================================================
This is my 2nd attempt. It still does not like my 'TIMESTAMP BY' statement.
With [PowerBI-Modified-DS] As (
SELECT
KepValues.ArrayValue.id as ID,
KepValues.ArrayValue.v as V,
KepValues.ArrayValue.q as Q,
KepValues.ArrayValue.t as T
FROM
[IoTHub-Input] as event
CROSS APPLY GetArrayElements(event.[values]) as KepValues
)
SELECT
ID, V, Q, T
INTO
[PowerBI-DS]
FROM
[PowerBI-Modified-DS] TIMESTAMP BY T
After extraction, this is what I expect: a table with columns "id", "v", "q", "t", where each row holds a single array element, e.g.:
"SimCh01.Device01.Ramp1", 39, 1, 1559529359833
"SimCh01.Device01.Ramp2", 183.5, 1, 1559529359833
Added
Since then, I have modified the query as below to create a DateTime by converting the Unix time t into a DateTime:
With [PowerBI-Modified-DS] As (
SELECT
arrayElement.ArrayValue.id as ID,
arrayElement.ArrayValue.v as V,
arrayElement.ArrayValue.q as Q,
arrayElement.ArrayValue.t as TT
FROM
[IoTHub-Input] as iothubAlias
CROSS APPLY GetArrayElements(iothubAlias.data) as arrayElement
)
SELECT
ID, V, Q, DATEADD(millisecond, TT, '1970-01-01T00:00:00Z') as T
INTO
[SAJ-01-PowerBI]
FROM
[PowerBI-Modified-DS]
I managed to add DATEADD() to convert the Unix time into a DateTime and call it T. Now how can I add TIMESTAMP BY? I did try to add it after [PowerBI-Modified-DS], but the editor complains that it is invalid. What else can I do, or is this the best I can do? I understand I need to set TIMESTAMP BY so Power BI understands this is streaming data.
The TIMESTAMP BY clause in Stream Analytics requires a value to be of type DATETIME. String values conforming to ISO 8601 formats are supported. In your example, the value of 't' does not conform to this standard.
To use TIMESTAMP BY clause in your case, you will have to pre-process the data before sending it to Stream Analytics or change the source to create the event (specifically the field 't') using this format.
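As an illustration of that pre-processing (a sketch only; where you reshape the message depends on what produces it), converting each "t" to an ISO 8601 UTC string before the event reaches the input could look roughly like this:
import json
from datetime import datetime, timezone

def to_iso8601(epoch_ms):
    # Convert Unix epoch milliseconds to an ISO 8601 UTC string that TIMESTAMP BY accepts.
    return datetime.fromtimestamp(epoch_ms / 1000, tz=timezone.utc).strftime("%Y-%m-%dT%H:%M:%S.%fZ")

def preprocess(message):
    for item in message.get("values", []):
        item["t"] = to_iso8601(item["t"])
    return message

event = {"timestamp": 1559529369274,
         "values": [{"id": "SimCh01.Device01.Ramp1", "v": 39, "q": 1, "t": 1559529359833}]}
print(json.dumps(preprocess(event)))
# ... "t": "2019-06-03T02:35:59.833000Z" ...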
Stream Analytics assigns the TIMESTAMP before the query is executed, so the TIMESTAMP BY expression can only refer to fields in the input payload. You have two options:
You can have two ASA jobs: the first does the CROSS APPLY and the second does the TIMESTAMP BY.
You can implement a deserializer in C# (sign up for preview access). This way you can have one job that uses your implementation to read incoming events. Your deserializer will convert the Unix time to a DateTime, and that field can then be used in your TIMESTAMP BY clause.

Getting the time frame in which a series of message is received in stream analytics

I am streaming event messages which contain a POSIX/epoch time field. I am trying to calculate the time frame in which I received a series of messages from a device.
Let's assume the following (simplified) input:
[
{ "deviceid":"device01", "epochtime":1500975613660 },
{ "deviceid":"device01", "epochtime":1500975640194 },
{ "deviceid":"device01", "epochtime":1500975649627 },
{ "deviceid":"device01", "epochtime":1500994473225 },
{ "deviceid":"device01", "epochtime":1500994486725 }
]
The result of my calculation should be a message like {deviceid, start, end} for each device id. I assume that a new time frame starts if the time interval between two events is longer than one hour. In my example this would result in two transmissions:
[
{"deviceid":"device01", "start":1500975613660, "end"=1500975649627},
{"deviceid":"device01", "start":500994473225, "end"=1500994486725}
]
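For reference, a small Python sketch of the grouping described above (assuming a window is closed once the gap between consecutive events from the same device exceeds one hour) reproduces these two rows from the sample input:
from collections import defaultdict

ONE_HOUR_MS = 60 * 60 * 1000

events = [
    {"deviceid": "device01", "epochtime": 1500975613660},
    {"deviceid": "device01", "epochtime": 1500975640194},
    {"deviceid": "device01", "epochtime": 1500975649627},
    {"deviceid": "device01", "epochtime": 1500994473225},
    {"deviceid": "device01", "epochtime": 1500994486725},
]

# Collect the (sorted) event times per device.
per_device = defaultdict(list)
for e in sorted(events, key=lambda e: (e["deviceid"], e["epochtime"])):
    per_device[e["deviceid"]].append(e["epochtime"])

# Close a time frame whenever the gap to the next event exceeds one hour.
frames = []
for device, times in per_device.items():
    start = prev = times[0]
    for t in times[1:]:
        if t - prev > ONE_HOUR_MS:
            frames.append({"deviceid": device, "start": start, "end": prev})
            start = t
        prev = t
    frames.append({"deviceid": device, "start": start, "end": prev})

print(frames)
# [{'deviceid': 'device01', 'start': 1500975613660, 'end': 1500975649627},
#  {'deviceid': 'device01', 'start': 1500994473225, 'end': 1500994486725}]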
I can convert the epoch time according to example 2 in the documentation https://msdn.microsoft.com/en-us/library/azure/mt573293.aspx. However, I cannot use the converted timestamp with the LAG function in a subquery. All values for previousTime are null in the output.
WITH step1 AS (
SELECT
[deviceid] AS deviceId,
System.Timestamp AS ts,
LAG([ts]) OVER (LIMIT DURATION(hour, 24)) as previousTime
FROM
input TIMESTAMP BY DATEADD(millisecond, epochtime, '1970-01-01T00:00:00Z')
)
I am not sure how I can perform my calculation and what's the best way to do it. I need to figure out the beginning and end of an event series.
Any help is very much appreciated.
I slightly modified your query below in order to get the expected result:
WITH STEP1 AS (
SELECT
[deviceid] AS deviceId,
System.Timestamp AS ts,
LAG(DATEADD(millisecond, epochtime, '1970-01-01T00:00:00Z') ) OVER (LIMIT DURATION(hour, 24)) as previousTime
FROM
input TIMESTAMP BY DATEADD(millisecond, epochtime, '1970-01-01T00:00:00Z')
)
SELECT * from STEP1
The problem is "ts" was defined in the current step, but when using LAG you are looking at the original message coming from the FROM statement, and it doesn't contain the "ts" variable.
Let me know if you have any question.
Thanks,
JS - Azure Stream Analytics team

Change field to timestamp

I have a CSV file which stores CPU usage. There is a field with a date format like this: "20150101-00:15:00". How can I change it to @timestamp in Logstash, as shown in Kibana?
Use the date filter on that field:
date {
match => [ "dateField" , "yyyyMMdd-HH:mm:ss"]
}
It will add the @timestamp field.
See documentation here: https://www.elastic.co/guide/en/logstash/current/plugins-filters-date.html
