How to convert a datetime format in an Azure Logs query

I want to reformat a datetime in Azure Logs. This is the datetime format:
DATETIME = 01/Sep/2022:04:48:11 +0000
I tried splitting out 01/Sep/2022, but it won't convert:
SampleLog_CL
| extend raw = parse_json(RawData).log
| parse raw with DATETIME
| extend dt = split(DATETIME, ':')
| project DATE=format_datetime(todatetime(dt[0]), 'yyyy-MM-dd')
When I put in the actual value 01/Sep/2022, it converts fine:
SampleLog_CL
| extend raw = parse_json(RawData).log
| parse raw with DATETIME
| extend dt = split(DATETIME, ':')
| project DATE=format_datetime(todatetime("01/Sep/2022"), 'yyyy-MM-dd')
How can I convert it?

Well...
// Data sample generation. Not part of the solution
let SampleLog_CL = datatable(RawData:dynamic)
[
dynamic({"log":"01/Sep/2022:04:48:11 +0000"})
];
// Solution starts here
let months = dynamic({"Jan":"01", "Feb":"02", "Mar":"03", "Apr":"04", "May":"05", "Jun":"06", "Jul":"07", "Aug":"08", "Sep":"09", "Oct":"10", "Nov":"11", "Dec":"12"});
SampleLog_CL
| extend raw = parse_json(RawData).log
| parse raw with d "/" M "/" y ":" h ":" m ":" s " " o
| extend Timestamp = todatetime(strcat(y, "-", months[M], "-", d, " ", h, ":", m, ":", s, o))
RawData | raw | d | M | y | h | m | s | o | Timestamp
{"log":"01/Sep/2022:04:48:11 +0000"} | 01/Sep/2022:04:48:11 +0000 | 01 | Sep | 2022 | 04 | 48 | 11 | +0000 | 2022-09-01T04:48:11Z
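Outside of Kusto, the same abbreviated-month layout can be parsed without a hand-built month map; a Python sketch for comparison, using the sample value from the question (illustrative only, not part of the KQL solution):

```python
from datetime import datetime

# The raw log value in the question's format
raw = "01/Sep/2022:04:48:11 +0000"

# %b handles the abbreviated month name, %z the numeric UTC offset
ts = datetime.strptime(raw, "%d/%b/%Y:%H:%M:%S %z")
print(ts.strftime("%Y-%m-%d"))  # 2022-09-01
```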

Related

Kusto number of overlapping intervals in a time range

I'm trying to write a Kusto query that needs to count how many intervals overlap for a certain date range. This is how my table looks:
userID | interval1 | interval2
24 | 21.1.2012 10:40 | 21.1.2012 11:00
25 | 21.1.2012 9:55 | 21.1.2012 10:50
I would like to consider the time range given by [min(interval1), max(interval2)] with a 1s step, and for each instant in this range I would like to know how many intervals from the table above overlap it. For example, at 21.1.2012 10:00 there is only one interval, but at 10:45 there are two overlapping intervals.
Thank you
Every interval1 indicates additional user's session start (+1).
Every interval2 indicates additional user's session end (-1).
The accumulated sum indicates the number of active sessions.
Solution 1 (Rendering level): render timechart with (accumulate=True)
let t = (datatable (userID:int,interval1:datetime,interval2:datetime)
[
24 ,datetime(2012-01-21 10:40) ,datetime(2012-01-21 11:00)
,25 ,datetime(2012-01-21 09:55) ,datetime(2012-01-21 10:50)
]);
let from_dttm = datetime(2012-01-21 09:30);
let to_dttm = datetime(2012-01-21 11:30);
let sessions_starts = (t | project delta = 1, dttm = interval1);
let sessions_ends = (t | project delta = -1, dttm = interval2);
union sessions_starts, sessions_ends
| make-series delta = sum(delta) on dttm from from_dttm to to_dttm step 1s
| render timechart with (accumulate=True)
Solution 2 (Data level): mv-apply + row_cumsum
let t = (datatable (userID:int,interval1:datetime,interval2:datetime)
[
24 ,datetime(2012-01-21 10:40) ,datetime(2012-01-21 11:00)
,25 ,datetime(2012-01-21 09:55) ,datetime(2012-01-21 10:50)
]);
let from_dttm = datetime(2012-01-21 09:30);
let to_dttm = datetime(2012-01-21 11:30);
let sessions_starts = (t | project delta = 1, dttm = interval1);
let sessions_ends = (t | project delta = -1, dttm = interval2);
union sessions_starts, sessions_ends
| make-series delta = sum(delta) on dttm from from_dttm to to_dttm step 1s
| mv-apply delta to typeof(long), dttm to typeof(datetime) on (project active_users = row_cumsum(delta), dttm)
| render timechart with (xcolumn=dttm, ycolumns=active_users)
Take a look at this sample from the Kusto docs:
https://learn.microsoft.com/en-us/azure/data-explorer/kusto/query/samples?pivots=azuredataexplorer#chart-concurrent-sessions-over-time
X
| mv-expand samples = range(bin(interval1, 1m), interval2, 1m)
| summarize count_userID = count() by bin(todatetime(samples), 1m)
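The +1/-1 technique in the answers above is language-agnostic: emit +1 at every session start and -1 at every session end, sort the events, and a running sum gives the number of active sessions at each moment. A minimal Python sketch of the same idea, using the question's sample intervals:

```python
from datetime import datetime

# Sample intervals from the question
intervals = [
    (datetime(2012, 1, 21, 10, 40), datetime(2012, 1, 21, 11, 0)),
    (datetime(2012, 1, 21, 9, 55), datetime(2012, 1, 21, 10, 50)),
]

# +1 at each start, -1 at each end, sorted by time
events = sorted([(s, 1) for s, _ in intervals] + [(e, -1) for _, e in intervals])

# Running sum = number of concurrently active sessions
active = 0
timeline = []
for t, delta in events:
    active += delta
    timeline.append((t, active))

print(timeline[1])  # at 10:40 two sessions overlap
```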

Use regexprep with cell array for colons to format

I have a cell array formatted as:
t = {'23:34:22.959511';
'22:34:11.885113';
'12:34:08.995146';
'11:34:02.383092'}
I am trying to format the output as 4 column vectors as:
a = 23
22
12
11
b = 34
34
34
34
c = 22
11
08
02
d = 959511
885113
995146
383092
I am using regexprep to operate on the data:
a = regexprep(t,':34:22.959511', '')
However, this only applies to that one specific string, not to all strings in the data set.
How do I divide the strings into 4 column vectors -- using regexprep on the colons -- and display the output below?
If you're willing to use solutions other than regexp: strsplit can split on any desired character:
a = zeros(numel(t),1);
b = zeros(numel(t),1);
c = zeros(numel(t),1);
d = zeros(numel(t),1);
for ii = 1:numel(t)
C = strsplit(t{ii}, ':');
a(ii) = str2double(C{1});
b(ii) = str2double(C{2});
tmp = strsplit(C{3},'.'); % Additional split for dot
c(ii) = str2double(tmp{1});
d(ii) = str2double(tmp{2});
end
Of course, this only works when your data always has this structure (two colons, then one dot).
Here's a way:
r = cell2mat(cellfun(@str2double, regexp(t, ':|\.', 'split'), 'uniformoutput', false));
This gives
r =
23 34 22 959511
22 34 11 885113
12 34 8 995146
11 34 2 383092
If you really need four separate variables, you can use:
r = num2cell(r,1);
[a, b, c, d] = r{:};
I would recommend using split instead of strsplit. split operates on whole arrays, and if you use the string datatype you can simply call double on the result to get the numeric values.
>> profFunc
Adriaan's Solution: 5.299892
Luis Mendo's Solution: 3.449811
My Solution: 0.094535
function profFunc()
n = 1e4; % Loop to get measurable timings
t = ["23:34:22.959511";
"22:34:11.885113";
"12:34:08.995146";
"11:34:02.383092"];
tic
for i = 1:n
a = zeros(numel(t),1);
b = zeros(numel(t),1);
c = zeros(numel(t),1);
d = zeros(numel(t),1);
for ii = 1:numel(t)
C = strsplit(t{ii}, ':');
a(ii) = str2double(C{1});
b(ii) = str2double(C{2});
tmp = strsplit(C{3},'.'); % Additional split for dot
c(ii) = str2double(tmp{1});
d(ii) = str2double(tmp{2});
end
end
fprintf('Adriaan''s Solution: %f\n',toc);
tic
for i = 1:n
r = cell2mat(cellfun(@str2double, regexp(t, ':|\.', 'split'), 'uniformoutput', false));
r = num2cell(r,1);
[a, b, c, d] = r{:};
end
fprintf('Luis Mendo''s Solution: %f\n',toc);
tic
for i = 1:n
x = split(t,[":" "."]);
x = double(x);
a = x(:,1);
b = x(:,2);
c = x(:,3);
d = x(:,4);
end
fprintf('My Solution: %f\n',toc);
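For readers outside MATLAB, the same split-on-either-delimiter-and-transpose idea can be sketched in Python with re.split (a comparison sketch, not one of the benchmarked answers):

```python
import re

t = ["23:34:22.959511", "22:34:11.885113",
     "12:34:08.995146", "11:34:02.383092"]

# Split each string on ':' or '.', convert to int, transpose into columns
cols = list(zip(*(map(int, re.split(r"[:.]", s)) for s in t)))
a, b, c, d = cols
print(a)  # (23, 22, 12, 11)
```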

How to calculate hours active between two timestamps

If I have a DataFrame with two timestamp columns, 'start' and 'end', how can I calculate a list of all the hours between 'start' and 'end'?
Another way to say this might be: "during which hours was the record active?"
For example:
// Input
| start| end|
|2017-06-01 09:30:00|2017-06-01 11:30:00|
|2017-06-01 14:00:00|2017-06-01 14:30:00|
// Result
| start| end|hours_active|
|2017-06-01 09:30:00|2017-06-01 11:30:00| (9,10,11)|
|2017-06-01 14:00:00|2017-06-01 14:30:00| (14)|
Thanks
If the difference between the start and end is always less than 24 hours, you can use the following UDF. Assuming the type of the columns is Timestamp:
val getActiveHours = udf((s: Long, e: Long) => {
  if (e >= s) {
    (s to e).toSeq
  } else {
    // the end is in the next day; hours run 0-23, so wrap at midnight
    (s to 23).toSeq ++ (0L to e).toSeq
  }
})
df.withColumn("hours_active", getActiveHours(hour($"start"), hour($"end")))
Using the example data in the question gives:
+---------------------+---------------------+------------+
|start |end |hours_active|
+---------------------+---------------------+------------+
|2017-06-01 09:30:00.0|2017-06-01 11:30:00.0|[9, 10, 11] |
|2017-06-01 14:00:00.0|2017-06-01 14:30:00.0|[14] |
+---------------------+---------------------+------------+
Note: For larger differences between the timestamps the above code can be adjusted to take that into account. It would then be necessary to look at other fields in addition to the hour, e.g. day/month/year.
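The hour-range logic inside the UDF is independent of Spark; a plain-Python sketch of the same computation, including the wrap past midnight (assumes, like the answer, that the interval is under 24 hours):

```python
def hours_active(start_hour, end_hour):
    """Hours touched by an interval; wraps when the end is in the next day."""
    if end_hour >= start_hour:
        return list(range(start_hour, end_hour + 1))
    # end is in the next day: hours run 0-23, so wrap at midnight
    return list(range(start_hour, 24)) + list(range(0, end_hour + 1))

print(hours_active(9, 11))   # [9, 10, 11]
print(hours_active(14, 14))  # [14]
print(hours_active(23, 1))   # [23, 0, 1]
```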

Urlencode/decode, different representation of the same string

I am a bit out of my comfort zone here, so I'm not even sure I'm approaching the problem appropriately. Anyhow, here goes:
I have a problem where I need to hash some info with SHA-1, and that hash will serve as the info's ID.
When a client wants to signal which info is currently being used, it sends a percent-encoded SHA-1 string.
So one example is, my server hashes some info and gets a hex representation like so:
44 c1 b1 0d 6a de ce 01 09 fd 27 bc 81 7f 0e 90 e3 b7 93 08
and the client sends me
D%c1%b1%0dj%de%ce%01%09%fd%27%bc%81%7f%0e%90%e3%b7%93%08
Removing the % we get
D c1 b1 0dj de ce 01 09 fd 27 bc 81 7f 0e 90 e3 b7 93 08
which matches my hash except for the D at the beginning and the j after the 0d; replacing those with their ASCII hex values gives an identical hash.
So, as I read the URL-encoding standard, it allows a client to send the D as either D or %44? Different clients could then send different representations of the same hash, and I would not be able to simply compare them for equality?
I would prefer to be able to compare the urlencoded strings as they are when they are sent, but one way to do it would be to decode them, removing all '%' and get the ascii hex value for whatever mismatch I get, much like the D and the j in my above example.
This all seems to be a very annoying way to do things, am I missing something, please tell me I am? :)
I am doing this in node.js but I suppose the solution would be language/platform agnostic.
I made this crude solution for now:
var unreserved = 'A B C D E F G H I J K L M N O P Q R S T U V W X Y Z a b c d e f g h i j k l m n o p q r s t u v w x y z + 1 2 3 4 5 6 7 8 9 0 - _ . ~';
function hexToPercent(hex){
var index = 0,
end = hex.length,
delimiter = '%',
step = 2,
result = '',
tmp = '';
if(end % step !== 0){
console.log('\'' + hex + '\' must be dividable by ' + step + '.');
return result;
}
while(index < end){
tmp = hex.slice(index, index + step);
if(unreserved.indexOf(String.fromCharCode('0x' + tmp)) !== -1){
result = result + String.fromCharCode('0x' + tmp);
}
else{
result = result + delimiter + tmp;
}
index = index + step;
}
return result;
}
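An alternative to canonicalizing the encoded form, as the question itself hints, is to decode both strings down to raw bytes and compare those; any mix of literal and percent-encoded octets then compares equal. A Python sketch of that approach (the two strings below are the fully-encoded and mixed encodings of the question's sample digest):

```python
from urllib.parse import unquote_to_bytes

# Two encodings of the same 20-byte SHA-1 digest: one fully percent-encoded,
# one with unreserved bytes (D, j) sent literally, as in the question
full = "%44%c1%b1%0d%6a%de%ce%01%09%fd%27%bc%81%7f%0e%90%e3%b7%93%08"
mixed = "D%c1%b1%0dj%de%ce%01%09%fd%27%bc%81%7f%0e%90%e3%b7%93%08"

# Decoding to raw bytes canonicalizes both for comparison
print(unquote_to_bytes(full) == unquote_to_bytes(mixed))  # True
```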

GAWK: Inverse of strftime() - Convert date string to seconds since epoc timestamp using format pattern

GNU AWK provides the built-in function strftime(), which can convert a timestamp like 1359210984 into Sat 26. Jan 15:36:24 CET 2013.
I couldn't find a function that would do this:
seconds = timefromdate("Sat 26. Jan 15:36:24 CET 2013", "%a %d. %b %H:%M:%S CET %Y")
or
seconds = timefromdate("2013-01-26 15:36:24", "%Y-%m-%d %H:%M:%S")
Whereas seconds then is 1359210984.
So, the date string should be convertable by a format pattern.
I'd like to do this in gawk only.
Edit 1:
I'd like to convert the date only in gawk for further processing of the stream.
Edit 2:
I've clarified my question. It was a bit sloppy in the "would do this" code example.
The function you're looking for is called mktime(). You should use the gensub() function to manipulate the datespec into the format that can be read by mktime().
To format the second example that you give, consider:
BEGIN {
t = "2013-01-26 15:36:24"
f = "\\1 \\2 \\3 \\4 \\5 \\6"
s = mktime(gensub(/(....)-(..)-(..) (..):(..):(..)/, f, "g", t))
print s
}
Results on my machine:
1359178584
To format the first example that you give, consider:
BEGIN {
t = "Sat 26. Jan 15:36:24 CET 2013"
gsub(/\.|:/, FS, t)
split(t,a)
Y = a[8]
M = convert(a[3])
D = a[2]
h = a[4]
m = a[5]
s = a[6]
x = mktime(sprintf("%d %d %d %d %d %d", Y, M, D, h, m, s))
print x
}
function convert(month) {
return(((index("JanFebMarAprMayJunJulAugSepOctNovDec", month) - 1) / 3) + 1)
}
Results on my machine:
1359178584
For more information, please consult the manual, specifically time functions and string functions. HTH.
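The index-arithmetic month lookup in convert() ports directly to other languages; a Python sketch of the same trick, for illustration:

```python
def month_number(abbr):
    # Position of the 3-letter abbreviation in the packed string,
    # divided by 3, gives a zero-based month index
    return "JanFebMarAprMayJunJulAugSepOctNovDec".index(abbr) // 3 + 1

print(month_number("Jan"))  # 1
print(month_number("Sep"))  # 9
print(month_number("Dec"))  # 12
```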
You could use the python method strptime from datetime:
from datetime import datetime
date_string = "21 June, 2018"
print("date_string =", date_string)
print("type of date_string =", type(date_string))
date_object = datetime.strptime(date_string, "%d %B, %Y")
print("date_object =", date_object)
print("type of date_object =", type(date_object))
for more info please see here:
https://www.programiz.com/python-programming/datetime/strptime
