Produce output from an empty window - Azure

Is it possible to produce output from a stream analytics query, using the "group by window" expression, when the window is empty?
For instance, in this example, the query:
SELECT System.Timestamp as WindowEnd, SwitchNum, COUNT(*) as CallCount
FROM CallStream TIMESTAMP BY CallRecTime
GROUP BY TUMBLINGWINDOW(s, 5), SwitchNum
produces the output:
2015-04-15T22:10:40.000Z UK 1
2015-04-15T22:10:40.000Z US 1
2015-04-15T22:10:45.000Z China 1
2015-04-15T22:10:45.000Z Germany 1
2015-04-15T22:10:45.000Z UK 3
2015-04-15T22:10:45.000Z US 1
2015-04-15T22:10:50.000Z Australia 2
...
Is it possible to make it produce something like the following instead?
2015-04-15T22:10:40.000Z China 0
2015-04-15T22:10:40.000Z Germany 0
2015-04-15T22:10:40.000Z UK 1
2015-04-15T22:10:40.000Z US 1
2015-04-15T22:10:40.000Z Australia 0
2015-04-15T22:10:45.000Z China 1
2015-04-15T22:10:45.000Z Germany 1
2015-04-15T22:10:45.000Z UK 3
2015-04-15T22:10:45.000Z US 1
2015-04-15T22:10:45.000Z Australia 0
...
The objective is to detect, using a hopping window, if there were no events in the last x seconds.

Use a LEFT JOIN with a lookup table of SwitchNum values; switches that have no events in the window will then appear in the result with a NULL count.
This blog post explains in more detail: http://blogs.msdn.com/b/streamanalytics/archive/2014/12/09/how-to-query-for-all-events-and-no-event-scenarios.aspx

Related

How to list row headers from matrix based on binary value (Excel)?

I would like to extract/list the row headers of a matrix based on the binary values in each column. Basically, FROM something like this:
Country Product1 Product2 Product3
Germany 1 0 1
France 1 1 0
Spain 0 1 0
Italy 1 0 1
Belgium 0 1 0
OBTAIN something like this:
Product1 Product2 Product3
Germany France Germany
France Spain Italy
Italy Belgium
So basically list the values based on column and binary value.
Better if no VBA is involved.
Any suggestion is welcome!
Assuming your data is in a table named Table1, for Office 365:
=T(SORT(IF(Table1[Product1],Table1[[Country]:[Country]])))
and use the fill handle to drag right.
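If the same reshaping is ever needed outside Excel, here is a rough pandas equivalent (a sketch; the DataFrame and column names simply mirror the sample matrix above):

```python
import pandas as pd

df = pd.DataFrame({
    "Country": ["Germany", "France", "Spain", "Italy", "Belgium"],
    "Product1": [1, 1, 0, 1, 0],
    "Product2": [0, 1, 1, 0, 1],
    "Product3": [1, 0, 0, 1, 0],
})

# For each product column, keep the countries whose flag is 1
lists = {col: df.loc[df[col] == 1, "Country"].tolist()
         for col in ["Product1", "Product2", "Product3"]}
```

Each entry of `lists` then holds the row headers for one column, matching the desired output above.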

Compare three dataframe and create a new column in one of the dataframe based on a condition

I am comparing two region data frames against a master_df and want to create a new column based on a condition.
For example, I have master_df and two region data frames, asia_df and europe_df. I want to check if each company in master_df is present in either of the region data frames and create a new region column with the value Europe or Asia.
master_df
company product
ABC Apple
BCA Mango
DCA Apple
ERT Mango
NFT Oranges
europe_df
account sales
ABC 12
BCA 13
DCA 12
asia_df
account sales
DCA 15
ERT 34
My final output dataframe is expected to be
company product region
ABC Apple Europe
BCA Mango Europe
DCA Apple Europe
DCA Apple Asia
ERT Mango Asia
NFT Oranges Others
When I try to merge and compare, some rows are removed. I need help with how to fix this issue:
final_df = europe_df.merge(master_df, left_on='company', right_on='account', how='left').drop_duplicates()
final1_df = asia_df.merge(master_df, left_on='company', right_on='account', how='left').drop_duplicates()
final['region'] = np.where(final_df['account'] == final_df['company'] ,'Europe','Others')
final['region'] = np.where(final1_df['account'] == final1_df['company'] ,'Asia','Others')
First use pd.concat to concatenate asia_df and europe_df, then use DataFrame.merge to merge the result with master_df, and finally use Series.fillna to fill the NaN values in Region with Others:
r = pd.concat([europe_df.assign(Region='Europe'), asia_df.assign(Region='Asia')]) \
      .rename(columns={'account': 'company'})[['company', 'Region']]
df = master_df.merge(r, on='company', how='left')
df['Region'] = df['Region'].fillna('Others')
Result:
print(df)
company product Region
0 ABC Apple Europe
1 BCA Mango Europe
2 DCA Apple Europe
3 DCA Apple Asia
4 ERT Mango Asia
5 NFT Oranges Others
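For reference, here is the same approach as a self-contained script, with the three sample frames reconstructed from the question:

```python
import pandas as pd

master_df = pd.DataFrame({'company': ['ABC', 'BCA', 'DCA', 'ERT', 'NFT'],
                          'product': ['Apple', 'Mango', 'Apple', 'Mango', 'Oranges']})
europe_df = pd.DataFrame({'account': ['ABC', 'BCA', 'DCA'], 'sales': [12, 13, 12]})
asia_df = pd.DataFrame({'account': ['DCA', 'ERT'], 'sales': [15, 34]})

# Tag each region frame, stack them, and keep only the join key and the tag
r = pd.concat([europe_df.assign(Region='Europe'),
               asia_df.assign(Region='Asia')]) \
      .rename(columns={'account': 'company'})[['company', 'Region']]

# Left-merge onto the master list; companies in no region get NaN -> 'Others'
df = master_df.merge(r, on='company', how='left')
df['Region'] = df['Region'].fillna('Others')
```

Because DCA appears in both region frames, the left merge naturally produces two rows for it, one per region, as in the expected output.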

How to calculate common elements in a dataframe depending on another column

I have a dataframe like this.
sport Country(s)
Foot_ball brazil
Foot_ball UK
Volleyball UK
Volleyball South_Africa
Volleyball brazil
Rugger UK
Rugger South_africa
Rugger Australia
Carrom UK
Carrom Australia
Chess UK
Chess Australia
I want to calculate the number of sports shared by two countries. For example:
Foot_ball and Volleyball are common to brazil and UK, so the number of common sports played by brazil and UK is 2.
Carrom, Chess and Rugger are common to Australia and UK, so the number of sports shared by Australia and UK is 3.
Is there any way I can get such a count for every pair of countries in the dataframe, e.g.
brazil, South_Africa
brazil, Australia
South_Africa, UK
etc.?
Can anybody suggest how to do this in pandas, or any other way?
With the sample data you provided you can generate the desired output with below code:
import pandas as pd

df = pd.DataFrame(
    [["Foot_ball", "brazil"],
     ["Foot_ball", "UK"],
     ["Volleyball", "UK"],
     ["Volleyball", "South_Africa"],
     ["Volleyball", "brazil"],
     ["Rugger", "UK"],
     ["Rugger", "South_Africa"],
     ["Rugger", "Australia"],
     ["Carrom", "UK"],
     ["Carrom", "Australia"],
     ["Chess", "UK"],
     ["Chess", "Australia"]],
    columns=["sport", "Country"])
# Function to get the number of sports in common
def countCommonSports(row):
    sports1 = df["sport"][df["Country"] == row["Country 1"]]
    sports2 = df["sport"][df["Country"] == row["Country 2"]]
    return len(set(sports1).intersection(sports2))

# Generate the combinations of countries from the original dataframe
from itertools import combinations
comb = combinations(df["Country"].unique(), 2)
out = pd.DataFrame(list(comb), columns=["Country 1", "Country 2"])

# Find the sports in common between countries
out["common Sports count"] = out.apply(countCommonSports, axis=1)
output is then:
>>> out
Country 1 Country 2 common Sports count
0 brazil UK 2
1 brazil South_Africa 1
2 brazil Australia 0
3 UK South_Africa 2
4 UK Australia 3
5 South_Africa Australia 1
pd.factorize and itertools.combinations
import pandas as pd
import numpy as np
from itertools import combinations, product

# Fix capitalization
df['Country(s)'] = ['_'.join(map(str.title, x.split('_'))) for x in df['Country(s)']]

c0, c1 = zip(*[(a, b)
               for s, c in df.groupby('sport')['Country(s)']
               for a, b in combinations(c, 2)])

i, r = pd.factorize(c0)
j, c = pd.factorize(c1)
n, m = len(r), len(c)

o = np.zeros((n, m), np.int64)
np.add.at(o, (i, j), 1)

result = pd.DataFrame(o, r, c)
result
Australia Uk South_Africa Brazil
Uk 3 0 2 1
Brazil 0 1 0 0
South_Africa 1 0 0 1
Make symmetrical
result = result.align(result.T, fill_value=0)[0]
result
Australia Brazil South_Africa Uk
Australia 0 0 0 0
Brazil 0 0 0 1
South_Africa 1 1 0 0
Uk 3 1 2 0
pd.crosstab
This will be slower... almost certainly.
c0, c1 = map(pd.Series, zip(*[(a, b)
                              for s, c in df.groupby('sport')['Country(s)']
                              for a, b in combinations(c, 2)]))

pd.crosstab(c0, c1).rename_axis(None).rename_axis(None, axis=1).pipe(
    lambda d: d.align(d.T, fill_value=0)[0]
)
Australia Brazil South_Africa Uk
Australia 0 0 0 0
Brazil 0 0 0 1
South_Africa 1 1 0 0
Uk 3 1 2 0
Or including all sports within a single country
c0, c1 = map(pd.Series, zip(*[(a, b)
                              for s, c in df.groupby('sport')['Country(s)']
                              for a, b in product(c, c)]))

pd.crosstab(c0, c1).rename_axis(None).rename_axis(None, axis=1)
Australia Brazil South_Africa Uk
Australia 3 0 1 3
Brazil 0 2 1 2
South_Africa 1 1 2 2
Uk 3 2 2 5

python: groupby - function called when grouping

All I would like to do is a groupby and call a function at the same time.
Here is the function
def dollar_wtd_two(DF, kw_param, kw_param2):
    return np.sum(DF[kw_param] * DF[kw_param2]) / DF[kw_param2].sum()
The function dollar_wtd_two has 3 parameters: DF (the dataframe) and two column names of that same dataframe. Conceptually, here is what I would like to do:
DF.groupby(['prime_broker_id', 'country_name'], as_index=False).agg(
    {"notional_current": np.sum,
     "new_column": dollar_wtd_two(DF, kw_param, kw_param2)})
Basically the groupby would do simple operations like sum or average, and also more involved operations where I would call functions similar to dollar_wtd_two.
Here is how the output would look without the "new_column":
DF.groupby(['prime_broker_id', 'country_name'],
           as_index=False).agg({"notional_current": np.sum})
Output 1:
prime_broker_id country_name notional_current
0 BARCAP AUSTRIA 2.616735e+07
1 BARCAP BELGIUM 6.327196e+07
2 BARCAP DENMARK 1.286309e+07
3 BARCAP FINLAND 4.181843e+07
4 BARCAP FRANCE 1.579292e+08
5 BARCAP GERMANY 2.653451e+08
6 BARCAP IRELAND 1.037968e+07
I am not able to show the output of DF with "new_column":dollar_wtd_two(DF, kw_param, kw_param2). However individually the output of dollar_wtd_two(DF, kw_param, kw_param2) should look like this:
Output 2:
prime_broker_id country_name
BARCAP AUSTRIA 25.009402
BELGIUM 25.083404
DENMARK 25.000000
FINLAND 25.034493
FRANCE 25.000000
GERMANY 25.007943
IRELAND 25.000000
ISRAEL 399.242997
The idea is to combine Output 1 and Output 2 in one operation. Please let me know if anything is unclear.
Any help is more than welcome.
Thanks a lot.
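One way to get both outputs in a single pass (a sketch, not a definitive answer: the sample data and the 'price' column are hypothetical stand-ins, since the actual kw_param columns are not named in the question) is groupby().apply with a function that returns one Series per group:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; 'price' and 'notional_current' stand in for
# kw_param and kw_param2 from the question
DF = pd.DataFrame({
    'prime_broker_id':  ['BARCAP', 'BARCAP', 'BARCAP', 'BARCAP'],
    'country_name':     ['AUSTRIA', 'AUSTRIA', 'BELGIUM', 'BELGIUM'],
    'price':            [25.0, 25.0, 20.0, 30.0],
    'notional_current': [1e6, 2e6, 1e6, 1e6],
})

def summarize(g, kw_param='price', kw_param2='notional_current'):
    # One row per group: the plain sum plus the dollar-weighted average
    return pd.Series({
        'notional_current': g[kw_param2].sum(),
        'new_column': np.sum(g[kw_param] * g[kw_param2]) / g[kw_param2].sum(),
    })

out = DF.groupby(['prime_broker_id', 'country_name']).apply(summarize).reset_index()
```

Unlike agg, which applies each function to a single column, apply hands the whole group frame to the function, so it can combine several columns the way dollar_wtd_two does.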

python - count elements pandas dataframe

I have a table with some info about districts. I have converted it into a pandas dataframe and my question is how can I count how many times SOUTHERN, BAYVIEW etc. appear in the table below? I want to add an extra column next to District with the total number of each district.
District
0 SOUTHERN
1 BAYVIEW
2 CENTRAL
3 NORTH
Here you can use groupby and the size method (you can also use some other aggregations such as count).
With this dataframe:
import pandas as pd
df = pd.DataFrame({'DISTRICT': ['SOUTHERN', 'SOUTHERN', 'BAYVIEW', 'BAYVIEW', 'BAYVIEW', 'CENTRAL', 'NORTH']})
Represented as below
DISTRICT
0 SOUTHERN
1 SOUTHERN
2 BAYVIEW
3 BAYVIEW
4 BAYVIEW
5 CENTRAL
6 NORTH
You can use
df.groupby(['DISTRICT']).size().reset_index(name='counts')
You get this output:
DISTRICT counts
0 BAYVIEW 3
1 CENTRAL 1
2 NORTH 1
3 SOUTHERN 2
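Since the question literally asks for an extra column next to District rather than a separate summary table, a closely related option is transform('size'), sketched here on the same sample frame:

```python
import pandas as pd

df = pd.DataFrame({'DISTRICT': ['SOUTHERN', 'SOUTHERN', 'BAYVIEW', 'BAYVIEW',
                                'BAYVIEW', 'CENTRAL', 'NORTH']})

# Broadcast each district's row count back onto every row of that district
df['counts'] = df.groupby('DISTRICT')['DISTRICT'].transform('size')
```

transform keeps the original row count, so every SOUTHERN row gets 2, every BAYVIEW row gets 3, and so on.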
