Change tracking using Kusto queries - azure

I have data in a Kusto table that gets updated with every deployment. I want to check what changed in a particular deployment.
ColumnA ColumnB ColumnC ColumnD Modifiedat
Row1 Value1 C6 D6 Dec15
Row2 Value2 C2 D2 Dec15
Row3 Value6 C3 D5 Dec15
Row4 Value4 C4 D4 Dec15
Row1 Value1 C1 D1 Dec14
Row2 Value2 C2 D2 Dec14
Row3 Value6 C3 D5 Dec14
Row4 Value4 C5 D4 Dec14
Row1 Value1 C1 D1 Dec13
Row2 Value2 C2 D2 Dec13
Row3 Value3 C3 D3 Dec13
Row4 Value4 C4 D4 Dec13
Now, if we track changes by the ColumnA values: in ColumnB, Row3 has a value change in the Dec 14 load (from Value3 to Value6), and Row4 has a value change in the Dec 15 load (from Value4 to Value5).
For each date, I want to extract what has changed, that is, project only the rows that changed on that date, so that I can run this query daily for change tracking.
ColumnA ColumnB ColumnC ColumnD Modified at
Row3 Value6 C3 D6 Dec14
Row4 Value4 C5 D4 Dec14
Row2 Value2 C6 D7 Dec15
Row1 Value1 C6 D6 Dec15

As mentioned in the comments, it would help if you posted your datasets as text.
If you would like to query the current data (the row with the latest modification for each key), you can do this with the arg_max() aggregation function:
datatable(Fruit: string, Value : int, timestamp : datetime ) [
"Apple", 1, datetime(2022-01-01),
"Banana", 2, datetime(2022-01-01),
"Apple", 3, datetime(2022-01-03),
"Banana", 6, datetime(2022-01-05)
]
| summarize arg_max(timestamp, *) by Fruit
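The arg_max query above returns only the latest row per key. To list the rows that changed between consecutive loads, one option is to compare each row with the previous load for the same key. The following is a minimal pandas sketch of that comparison logic, not a Kusto query; it uses a subset of the question's rows and assumes the load labels sort chronologically as strings (real data would use a datetime column). The names `tracked`, `previous`, and `changed` are illustrative.

```python
import pandas as pd

# Subset of the question's data: two keys across three loads.
data = pd.DataFrame(
    [
        ("Row3", "Value3", "C3", "D3", "Dec13"),
        ("Row4", "Value4", "C4", "D4", "Dec13"),
        ("Row3", "Value6", "C3", "D5", "Dec14"),
        ("Row4", "Value4", "C5", "D4", "Dec14"),
        ("Row3", "Value6", "C3", "D5", "Dec15"),
        ("Row4", "Value4", "C4", "D4", "Dec15"),
    ],
    columns=["ColumnA", "ColumnB", "ColumnC", "ColumnD", "Modifiedat"],
)

tracked = ["ColumnB", "ColumnC", "ColumnD"]

# Order each key's rows by load so that shift() yields the previous load.
data = data.sort_values(["ColumnA", "Modifiedat"])
previous = data.groupby("ColumnA")[tracked].shift()

# The first load of a key has no predecessor and is not reported as a change.
has_previous = previous.notna().all(axis=1)
changed = data[has_previous & data[tracked].ne(previous).any(axis=1)]

print(changed)
```

Filtering `changed` on a single `Modifiedat` value gives the daily change report the question asks for.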

How To Insert Data With Matching Parameter Values From One Table to Another?

I have 2 tables. The first one is the data table that I get daily from a source, and the second one is a static table that holds the parameter information for every 15 minutes.
The problem is that, as you can see in the first table, I didn't get Value1 or Value2 for 00:15 and 00:30 from the source. I want to insert the values from table1 into table2 wherever all parameters match. If there is no match, I want to insert 0.
Table1:
Parameter1 Parameter2 Parameter3 Parameter4 Value1 Value2
00:00 1434 A10 B10 1 1
00:45 1434 A10 B10 2 2
01:00 1434 A10 B10 3 3
01:15 1434 A10 B10 4 4

Table2 (Value1 and Value2 are empty):
Parameter1 Parameter2 Parameter3 Parameter4 Value1 Value2
00:00 1434 A10 B10
00:15 1434 A10 B10
00:30 1434 A10 B10
00:45 1434 A10 B10
01:00 1434 A10 B10
01:15 1434 A10 B10
00:00 1434 A11 B11
00:15 1434 A11 B11

The final table should look like this.
Parameter1 Parameter2 Parameter3 Parameter4 Value1 Value2
00:00 1434 A10 B10 1 1
00:15 1434 A10 B10 0 0
00:30 1434 A10 B10 0 0
00:45 1434 A10 B10 2 2
01:00 1434 A10 B10 3 3
01:15 1434 A10 B10 4 4
00:00 1434 A11 B11 0 0
00:15 1434 A11 B11 0 0

I tried to use the VLOOKUP function, but I couldn't figure out how to use it with multiple parameters and values.
The reason I'm trying to do this in Excel is that I don't want to do all these steps one by one in SQL. Here is how I currently do it in SQL:
1. I import the raw data into a SQL table.
2. I import the parameter values into another SQL table. (I import table2, so Value1 and Value2 come in as NULL.)
3. Then I update the parameter value table.
4. Now that I have the table I want, I replace all the NULL values with 0, which is yet another step :)
Here is the update code for step 3:
UPDATE a
SET
    a.Value1 = b.Value1,
    a.Value2 = b.Value2
FROM Table2 a
INNER JOIN Table1 b
    ON a.Parameter1 = b.Parameter1
    AND a.Parameter2 = b.Parameter2
    AND a.Parameter3 = b.Parameter3
You can try the FILTER() function together with BYROW() to iterate over each row dynamically and spill the results:
=BYROW(H2:K7,LAMBDA(x,FILTER(E2:E5,A2:A5&B2:B5&C2:C5&D2:D5=CONCAT(x),0)))
=BYROW(H2:K7,LAMBDA(x,FILTER(F2:F5,A2:A5&B2:B5&C2:C5&D2:D5=CONCAT(x),0)))
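If the data ends up in Python rather than Excel, the same "match on all parameters, otherwise 0" logic can be sketched with a pandas left merge. This is only an illustration under the assumption that both tables are loaded as DataFrames with the column names shown in the question; `table1`, `table2`, and `final` are illustrative names.

```python
import pandas as pd

keys = ["Parameter1", "Parameter2", "Parameter3", "Parameter4"]

# Table1: values received from the source (some time slots are missing).
table1 = pd.DataFrame(
    [
        ("00:00", 1434, "A10", "B10", 1, 1),
        ("00:45", 1434, "A10", "B10", 2, 2),
        ("01:00", 1434, "A10", "B10", 3, 3),
        ("01:15", 1434, "A10", "B10", 4, 4),
    ],
    columns=keys + ["Value1", "Value2"],
)

# Table2: the static list of every parameter combination.
table2 = pd.DataFrame(
    [
        ("00:00", 1434, "A10", "B10"),
        ("00:15", 1434, "A10", "B10"),
        ("00:30", 1434, "A10", "B10"),
        ("00:45", 1434, "A10", "B10"),
        ("01:00", 1434, "A10", "B10"),
        ("01:15", 1434, "A10", "B10"),
        ("00:00", 1434, "A11", "B11"),
        ("00:15", 1434, "A11", "B11"),
    ],
    columns=keys,
)

# Keep every row of table2; rows without a match get NaN, then 0.
final = (
    table2.merge(table1, on=keys, how="left")
    .fillna({"Value1": 0, "Value2": 0})
    .astype({"Value1": int, "Value2": int})
)

print(final)
```

The left merge does the lookup and the NULL-to-0 replacement in one pass, which covers steps 3 and 4 of the SQL workflow described in the question.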

Combine 2 related DataFrames into one multiple indexes dataFrame

I have 2 related DataFrames. Is there an easy way to combine them into a single multi-index DataFrame?
import pandas as pd
df = pd.DataFrame(
    [[1, 0, 1, 0],
     [1, 1, 0, 0],
     [1, 0, 0, 1],
     [0, 1, 0, 1]],
    columns=["c1", "c2", "c3", "c4"],
)
idx = pd.Index(['p1', 'p2', 'p3', 'p4'])
df = df.set_index(idx)
df output is:
c1 c2 c3 c4
p1 1 0 1 0
p2 1 1 0 0
p3 1 0 0 1
p4 0 1 0 1
df2 = pd.DataFrame(
    [[0, 10, 30, 0],
     [20, 10, 0, 0],
     [0, 10, 0, 6],
     [15, 0, 18, 5]],
    columns=["c1", "c2", "c3", "c4"],
)
idx2 = pd.Index(['a1', 'a2', 'a3', 'a4'])
df2 = df2.set_index(idx2)
df2 output is:
c1 c2 c3 c4
a1 0 10 30 0
a2 20 10 0 0
a3 0 10 0 6
a4 15 0 18 5
The final DataFrame should have a multi-index (p, c, a) and a single column (value):
          value
p1 c1 a2     20
      a4     15
   c3 a1     30
      a4     18
p2 c1 a2     20
      a4     15
   c2 a1     10
      a2     10
      a3     10
p3 c1 a2     20
      a4     15
   c4 a3      6
      a4      5
p4 c2 a1     10
      a2     10
      a3     10
   c4 a3      6
      a4      5
You can reshape and merge:
(df.reset_index().melt('index')
   .loc[lambda x: x.pop('value').eq(1)]
   .merge(df2.reset_index().melt('index').query('value != 0'),
          on='variable')
   .set_index(['index_x', 'variable', 'index_y'])
   .rename_axis([None, None, None])
)
output:
          value
p1 c1 a2     20
      a4     15
p2 c1 a2     20
      a4     15
p3 c1 a2     20
      a4     15
p2 c2 a1     10
      a2     10
      a3     10
p4 c2 a1     10
      a2     10
      a3     10
p1 c3 a1     30
      a4     18
p3 c4 a3      6
      a4      5
p4 c4 a3      6
      a4      5
If order matters:
(df.stack().reset_index()
   .loc[lambda x: x.pop(0).eq(1)]
   .set_axis(['index', 'variable'], axis=1)
   .merge(df2.reset_index().melt('index').query('value != 0'),
          on='variable', how='left')
   .set_index(['index_x', 'variable', 'index_y'])
   .rename_axis([None, None, None])
)
output:
          value
p1 c1 a2     20
      a4     15
   c3 a1     30
      a4     18
p2 c1 a2     20
      a4     15
   c2 a1     10
      a2     10
      a3     10
p3 c1 a2     20
      a4     15
   c4 a3      6
      a4      5
p4 c2 a1     10
      a2     10
      a3     10
   c4 a3      6
      a4      5
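For readers who find the chained version hard to follow, the same reshape-and-merge idea can be written step by step. This is a sketch rather than a replacement for the answer above: it names the index levels `p`, `c`, and `a` (an assumption taken from the question's description) and sorts the index instead of relying on merge order. The intermediate names `flags` and `values` are illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    [[1, 0, 1, 0], [1, 1, 0, 0], [1, 0, 0, 1], [0, 1, 0, 1]],
    columns=["c1", "c2", "c3", "c4"],
    index=["p1", "p2", "p3", "p4"],
)
df2 = pd.DataFrame(
    [[0, 10, 30, 0], [20, 10, 0, 0], [0, 10, 0, 6], [15, 0, 18, 5]],
    columns=["c1", "c2", "c3", "c4"],
    index=["a1", "a2", "a3", "a4"],
)

# Step 1: long format of df, keeping only the (p, c) pairs flagged with 1.
flags = df.rename_axis("p").reset_index().melt(
    id_vars="p", var_name="c", value_name="flag"
)
flags = flags[flags["flag"].eq(1)].drop(columns="flag")

# Step 2: long format of df2, keeping only the non-zero (a, c) values.
values = df2.rename_axis("a").reset_index().melt(
    id_vars="a", var_name="c", value_name="value"
)
values = values[values["value"].ne(0)]

# Step 3: pair every flagged (p, c) with every non-zero (a, c) on the shared c.
out = flags.merge(values, on="c").set_index(["p", "c", "a"]).sort_index()

print(out)
```

Sorting the index yields the p, then c, then a ordering shown in the question's desired output.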

How to merge pandas dataframes with different column names

Can someone please tell me how I can achieve results like the image above, but with the following differences:
# Note the column names
df1 = pd.DataFrame({"A": ["A0", "A1", "A2", "A3"],
"B": ["B0", "B1", "B2", "B3"],
"C": ["C0", "C1", "C2", "C3"],
"D": ["D0", "D1", "D2", "D3"],
},
index = [0, 1, 2, 3],
)
# Note the column names
df2 = pd.DataFrame({"AA": ["A4", "A5", "A6", "A7"],
"BB": ["B4", "B5", "B6", "B7"],
"CC": ["C4", "C5", "C6", "C7"],
"DD": ["D4", "D5", "D6", "D7"],
},
index = [4, 5, 6, 7],
)
# Note the column names
df3 = pd.DataFrame({"AAA": ["A8", "A9", "A10", "A11"],
"BBB": ["B8", "B9", "B10", "B11"],
"CCC": ["C8", "C9", "C10", "C11"],
"DDD": ["D8", "D9", "D10", "D11"],
},
index = [8, 9, 10, 11],
)
Every kind of merge I do results in this:
Here's what I'm trying to accomplish:
I'm doing my Capstone Project, and the use case uses the SpaceX data set. I've web-scraped the tables found here: SpaceX Falcon 9 Wikipedia.
Now I'm trying to combine them into one large table. However, there are slight differences in the column names, between each table, and so I have to do more logic to merge properly. There are 10 tables in total, I've checked 5. 3 have unique column names, so the simple merging doesn't work.
I've searched around at the other questions, but the use case is different than mine, so I haven't found an answer that works for me.
I'd really appreciate someone's help, or pointing me where I can find more info on the subject. So far I've had no luck in my searches.
Let us just use np.concatenate, which stacks the raw values and ignores the column names:
import numpy as np
out = pd.DataFrame(np.concatenate([df1.values, df2.values, df3.values]), columns=df1.columns)
Out[346]:
A B C D
0 A0 B0 C0 D0
1 A1 B1 C1 D1
2 A2 B2 C2 D2
3 A3 B3 C3 D3
4 A4 B4 C4 D4
5 A5 B5 C5 D5
6 A6 B6 C6 D6
7 A7 B7 C7 D7
8 A8 B8 C8 D8
9 A9 B9 C9 D9
10 A10 B10 C10 D10
11 A11 B11 C11 D11
IIUC, you could just modify the column names and concatenate:
df2.columns = df2.columns.str[0]
df3.columns = df3.columns.str[0]
out = pd.concat([df1, df2, df3])
or if you're into one-liners, you could do:
out = pd.concat([df1, df2.rename(columns=lambda x:x[0]), df3.rename(columns=lambda x:x[0])])
Output:
A B C D
0 A0 B0 C0 D0
1 A1 B1 C1 D1
2 A2 B2 C2 D2
3 A3 B3 C3 D3
4 A4 B4 C4 D4
5 A5 B5 C5 D5
6 A6 B6 C6 D6
7 A7 B7 C7 D7
8 A8 B8 C8 D8
9 A9 B9 C9 D9
10 A10 B10 C10 D10
11 A11 B11 C11 D11
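Taking the first character works here only because the sample column names are repeated letters. For scraped tables whose headers differ in less regular ways, one option is an explicit mapping from each variant name to a canonical name before concatenating. The sketch below assumes such a mapping can be written by hand; the `canonical` dictionary is hypothetical and would need to list the real header variants, and the frames are shortened to two rows and two columns each.

```python
import pandas as pd

df1 = pd.DataFrame({"A": ["A0", "A1"], "B": ["B0", "B1"]}, index=[0, 1])
df2 = pd.DataFrame({"AA": ["A4", "A5"], "BB": ["B4", "B5"]}, index=[4, 5])
df3 = pd.DataFrame({"AAA": ["A8", "A9"], "BBB": ["B8", "B9"]}, index=[8, 9])

# Hypothetical mapping: every known header variant -> canonical column name.
canonical = {
    "AA": "A", "AAA": "A",
    "BB": "B", "BBB": "B",
}

# rename() ignores mapping keys that a frame does not contain,
# so the same dictionary can be applied to every table.
frames = [frame.rename(columns=canonical) for frame in (df1, df2, df3)]
out = pd.concat(frames)

print(out)
```

Unlike np.concatenate, this matches columns by name, so tables whose columns appear in a different order still line up, and any header missing from the mapping shows up as an extra column with NaN values instead of being silently misaligned.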

Recode A String Variable

input VAR1 VAR2
A1 1
A2 0
A3 1
A4 1
A5 1
A6 1
A7 1
A8 1
A9 1
A10 1
A15 1
B7 0
A1 0
A16 1
A17 1
A18 1
A19 1
A20 0
A21 1
end
Say you have data such as that shown above. I have VAR1 and wish to create VAR2 from it, which takes the value 1 if VAR1 begins with A1, A3-A10, A15-A19, or A21, and 0 otherwise. I believe you can use strpos(VAR1) for this, but is it possible to write, for example, strpos(VAR1, "A1, A3/A10, A15/A19, A21")?
The following works if you have a small number of strings of interest. You may need an alternate approach if you are searching for a larger number of strings, where writing out the ranges of strings (e.g. A3-A10) is unfeasible.
clear
input str3 VAR1 VAR2
A1 1
A2 0
A3 1
A4 1
A5 1
A6 1
A7 1
A8 1
A9 1
A10 1
A15 1
B7 0
A1 1
A16 1
A17 1
A18 1
A19 1
A20 0
A21 1
end
gen wanted = 0
local mystrings = "A1 A3 A4 A5 A6 A7 A8 A9 A10 A15 A16 A17 A18 A19 A21"
foreach string in `mystrings' {
replace wanted = 1 if strpos(VAR1, "`string'") == 1
}
assert wanted == VAR2
Note that in your example input, the second occurrence of A1 had a value of 0 but should have a value of 1 according to your post.
Here is a more generalisable solution for larger ranges of strings:
gen A = 0
replace A = 1 if strpos(VAR1,"A") == 1
gen newvar = substr(VAR1,2,.)
destring newvar, replace
gen wanted = 0
replace wanted = 1 if A == 1 & (inlist(newvar,1,21) | inrange(newvar,3,10) | inrange(newvar,15,19))
assert wanted == VAR2

Merging 2 data frames on 3 columns where data sometimes exists

I am attempting to merge two data frames and fill in missing values in one from the other. Hopefully this isn't too long an explanation; I have been racking my brain over this for too long. I am working with 2 huge CSV files, so I made a small example here. I have included the entire code at the end in case you are curious. Thank you so much in advance. Here we go!
print(df1)
A B C D E
0 1 B1 D1 E1
1 C1 D1 E1
2 1 B1 D1 E1
3 2 B2 D2 E2
4 B2 C2 D2 E2
5 3 D3 E3
6 3 B3 C3 D3 E3
7 4 C4 D4 E4
print(df2)
A B C F G
0 1 C1 F1 G1
1 B2 C2 F2 G2
2 3 B3 F3 G3
3 4 B4 C4 F4 G4
I would essentially like to merge df2 into df1 on 3 different columns. I understand that you can merge on multiple column names, but that does not seem to give me the desired result. I would like to KEEP all data in df1 and fill in the data from df2, so I use how='left'.
I am fairly new to Python and have done a lot of research, but I am stuck. Here is what I have tried.
data3 = df1.merge(df2, how='left', on=['A'])
print(data3)
A B_x C_x D E B_y C_y F G
0 1 B1 D1 E1 C1 F1 G1
1 C1 D1 E1 B2 C2 F2 G2
2 1 B1 D1 E1 C1 F1 G1
3 2 B2 D2 E2 NaN NaN NaN NaN
4 B2 C2 D2 E2 B2 C2 F2 G2
5 3 D3 E3 B3 F3 G3
6 3 B3 C3 D3 E3 B3 F3 G3
7 4 C4 D4 E4 B4 C4 F4 G4
As you can see, it sort of worked with just A. However, since this comes from a CSV file with blank values, the blank values seem to match each other, which I do not want. Because A was blank in the second row of df2, that row's data was filled in wherever A was blank in df1. It should be NaN if no match is found.
Whenever I add more columns to on, e.g. on=['A', 'B'], it does not help. In fact, A no longer matches.
data3 = df1.merge(df2, how='left', on=['A', 'B'])
print(data3)
A B C_x D E C_y F G
0 1 B1 D1 E1 NaN NaN NaN
1 C1 D1 E1 NaN NaN NaN
2 1 B1 D1 E1 NaN NaN NaN
3 2 B2 D2 E2 NaN NaN NaN
4 B2 C2 D2 E2 C2 F2 G2
5 3 D3 E3 NaN NaN NaN
6 3 B3 C3 D3 E3 F3 G3
7 4 C4 D4 E4 NaN NaN NaN
Columns A, B, and C are the values I want to match and merge on. Between the two data frames, there should be enough information to fill in all the gaps. My final df should look like:
print(desired_output)
A B C D E F G
0 1 B1 C1 D1 E1 F1 G1
1 1 B1 C1 D1 E1 F1 G1
2 1 B1 C1 D1 E1 F1 G1
3 2 B2 C2 D2 E2 F2 G2
4 2 B2 C2 D2 E2 F2 G2
5 3 B3 C3 D3 E3 F3 G3
6 3 B3 C3 D3 E3 F3 G3
7 4 B4 C4 D4 E4 F4 G4
Even though A, B, and C have repeating rows, I want to keep ALL the data and just fill in the data from df2 where it fits, even if it is repeated data. I also do not want all of the _x and _y suffixes from merging. I know how to rename, but doing 3 different merges and then merging those merges gets really complicated really fast with repeated rows and suffixes...
Long story short, how can I merge both data frames by A, and then B, and then C? The order in which it happens is irrelevant.
Here is a sample of the actual data. I have my own data with additional fields, and I relate it to this data by certain identifiers, basically MMSI, VesselName, and IMO. I want to keep duplicates because they aren't actually duplicates, just additional data points for each vessel.
MMSI BaseDateTime LAT LON VesselName IMO CallSign
366940480.0 2017-01-04T11:39:36 52.48730 -174.02316 EARLY DAWN 7821130 WDB7319
366940480.0 2017-01-04T13:51:07 52.41575 -174.60041 EARLY DAWN 7821130 WDB7319
273898000.0 2017-01-06T16:55:33 63.83668 -174.41172 MYS CHUPROVA NaN UAEZ
352844000.0 2017-01-31T22:51:31 51.89778 -176.59334 JACHA 8512920 3EFC4
352844000.0 2017-01-31T23:06:31 51.89795 -176.59333 JACHA 8512920 3EFC4
