Output Multiple Aggregates From One Table

I have a single table EMPLOYEE. I need to count EMP_NO so that I get multiple columns, each aggregate based on a different restriction. I am not sure how to write the query to get the output below. My attempts so far:
SELECT DEP_NO, COUNT(EMP_NO) AS Active
FROM EMPLOYEE
WHERE STATUS = 'active'
GROUP BY DEP_NO;

SELECT DEP_NO, COUNT(EMP_NO) AS "On Leave"
FROM EMPLOYEE
WHERE STATUS = 'on leave'
GROUP BY DEP_NO;
dep_no | Active | On Leave | Female | Male
-------|--------|----------|--------|-----
1      | 236    | 10       | 136    | 100
2      | 500    | 26       | 250    | 250
3      | 130    | 2        | 80     | 50
4      | 210    | 7        | 60     | 150

One possible answer is to use SUM with CASE expressions (conditional aggregation):
SELECT DEP_NO,
       SUM(CASE WHEN STATUS = 'active' THEN 1 ELSE 0 END) AS Active,
       SUM(CASE WHEN STATUS = 'on leave' THEN 1 ELSE 0 END) AS [On Leave],
       SUM(CASE WHEN STATUS = 'female' THEN 1 ELSE 0 END) AS Female,
       SUM(CASE WHEN STATUS = 'male' THEN 1 ELSE 0 END) AS Male
FROM EMPLOYEE
GROUP BY DEP_NO
(This assumes the gender values also live in STATUS; if gender is stored in a separate column, test that column in the last two CASE expressions instead.)
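As a quick way to check the conditional aggregation end to end, here is a minimal, self-contained sketch using Python's sqlite3 module; the sample rows are made up for illustration, and SQLite's double-quoted "On Leave" replaces the bracketed alias:

import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE EMPLOYEE (EMP_NO INTEGER, DEP_NO INTEGER, STATUS TEXT);
    INSERT INTO EMPLOYEE VALUES
        (1, 1, 'active'), (2, 1, 'active'), (3, 1, 'on leave'),
        (4, 2, 'active'), (5, 2, 'on leave');
""")

# One pass over the table; each SUM counts only the rows its CASE matches.
rows = conn.execute("""
    SELECT DEP_NO,
           SUM(CASE WHEN STATUS = 'active'   THEN 1 ELSE 0 END) AS Active,
           SUM(CASE WHEN STATUS = 'on leave' THEN 1 ELSE 0 END) AS "On Leave"
    FROM EMPLOYEE
    GROUP BY DEP_NO
""").fetchall()
print(rows)  # [(1, 2, 1), (2, 1, 1)]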


How to exclude rows from a groupby operation

I am working on a groupby operation using the attribute column, but I want to exclude the desc_type1 and desc_type2 rows, which should instead be used to calculate the total discount inside each attrib.
import pandas as pd

df = pd.DataFrame({'ID': [10, 10, 10, 20, 30, 30],
                   'attribute': ['attrib_1', 'desc_type1', 'desc_type2',
                                 'attrib_1', 'attrib_2', 'desc_type1'],
                   'value': [100, 0, 0, 100, 30, 0],
                   'discount': [0, 6, 2, 0, 0, 13.3]})
output:
ID  attribute   value  discount
10  attrib_1      100     0
10  desc_type1      0     6
10  desc_type2      0     2
20  attrib_1      100     0
30  attrib_2       30     0
30  desc_type1      0    13.3
I want to group this dataframe by attribute, but excluding the desc_type1 and desc_type2 rows.
The desired output:
attribute  ID_count  value_sum  discount_sum
attrib_1          2        200           8
attrib_2          1         30          13.3
Explanations:
attrib_1 has discount_sum=8 because ID 10, which belongs to attrib_1, has two desc_type rows (6 + 2).
attrib_2 has discount_sum=13.3 because ID 30 has one desc_type row.
ID=20 has no discount rows.
What I did so far:
df.groupby('attribute').agg({'ID':'count','value':'sum','discount':'sum'})
But the line above does not exclude desc_type1 and desc_type2 from the groupby.
Important: an ID may have a discount or not.
You can fill in the attribute per ID, then use groupby.agg:
# mark the desc_type rows
m = df['attribute'].str.startswith('desc_type')
# blank them out and forward-fill the attribute within each ID
group = df['attribute'].mask(m).groupby(df['ID']).ffill()

out = (df
       .groupby(group, as_index=False)
       .agg(**{'ID_count': ('ID', 'nunique'),
               'value_sum': ('value', 'sum'),
               'discount_sum': ('discount', 'sum')
               })
       )
output:
  attribute  ID_count  value_sum  discount_sum
0  attrib_1         2        200           8.0
1  attrib_2         1         30          13.3
Hello, I think this helps:
(df.loc[(df['attribute'] != 'desc_type1') & (df['attribute'] != 'desc_type2')]
   .groupby('attribute')
   .agg({'ID': 'count', 'value': 'sum', 'discount': 'sum'}))
Output:
           ID  value  discount
attribute
attrib_1    2    200       0.0
attrib_2    1     30       0.0
Note that filtering the rows out also drops their discounts, so the discount sums come out 0.0 instead of the desired 8 and 13.3; the discounts have to be re-attached to each ID's attribute row first, as in the answer above.
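If you want to keep the filtering style and still get the desired discounts, here is a minimal sketch that first totals each ID's discount and re-attaches it to the ID's attribute row before grouping; this variant is not from the original answers, and it assumes each ID has a single attribute row, as in the sample:

import pandas as pd

df = pd.DataFrame({'ID': [10, 10, 10, 20, 30, 30],
                   'attribute': ['attrib_1', 'desc_type1', 'desc_type2',
                                 'attrib_1', 'attrib_2', 'desc_type1'],
                   'value': [100, 0, 0, 100, 30, 0],
                   'discount': [0, 6, 2, 0, 0, 13.3]})

m = df['attribute'].str.startswith('desc_type')
disc = df.groupby('ID')['discount'].sum()  # total discount per ID

out = (df[~m]                                        # keep only attribute rows
       .assign(discount=df.loc[~m, 'ID'].map(disc))  # re-attach each ID's discount
       .groupby('attribute', as_index=False)
       .agg(ID_count=('ID', 'nunique'),
            value_sum=('value', 'sum'),
            discount_sum=('discount', 'sum')))
print(out)
#   attribute  ID_count  value_sum  discount_sum
# 0  attrib_1         2        200           8.0
# 1  attrib_2         1         30          13.3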

Aggregate Total count

I want to merge two columns (Sender and Receiver) and get a count per transaction Type.
Sender     Receiver   Type  Amount  Date
773787639  777611388     1     300  2/1/2019
773631898  776806843     4     450  8/20/2019
773761571  777019819     6     369  2/11/2019
774295511  777084440    34    1000  1/22/2019
774263079  776816905    45     678  6/27/2019
774386894  777202863    12    2678  2/10/2019
773671537  777545555    14   38934  9/29/2019
774288117  777035194    18      21  4/22/2019
774242382  777132939    21    1275  9/30/2019
774144715  777049859    30    6309  7/4/2019
773911674  776938987    10    3528  5/1/2019
773397863  777548054    15   35892  7/6/2019
776816905  772345091     6    1234  7/7/2019
777035194  775623065     4  453454  7/20/2019
I am trying to get a table like this:
Sender/Receiver  Type_1  Type_4  Type_12  ...  Type_45
773787639             3       2        0            0
773631898             1       0        1            2
773397863             2       2        0            0
772345091             1       1        0            3
You are looking for a pivot query. The only twist here is that we need to first take a union of the table to combine the sender/receiver data into a single column.
SELECT
    SenderReceiver,
    COUNT(CASE WHEN Type = 1 THEN 1 END) AS Type_1,
    COUNT(CASE WHEN Type = 2 THEN 1 END) AS Type_2,
    COUNT(CASE WHEN Type = 3 THEN 1 END) AS Type_3,
    ...
    COUNT(CASE WHEN Type = 45 THEN 1 END) AS Type_45
FROM
(
    SELECT Sender AS SenderReceiver, Type FROM yourTable
    UNION ALL
    SELECT Receiver, Type FROM yourTable
) t
GROUP BY
    SenderReceiver;
If you don't want to type out 45 separate CASE expressions, you could probably automate it to some degree using Python.
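For example, a short sketch that generates the 45 repetitive CASE lines and splices them into the query; yourTable and the 1-45 type range come from the answer above, the rest is illustrative:

# Build the conditional-count expressions instead of typing them out.
cases = ",\n    ".join(
    f"COUNT(CASE WHEN Type = {t} THEN 1 END) AS Type_{t}"
    for t in range(1, 46)
)
query = f"""SELECT
    SenderReceiver,
    {cases}
FROM
(
    SELECT Sender AS SenderReceiver, Type FROM yourTable
    UNION ALL
    SELECT Receiver, Type FROM yourTable
) t
GROUP BY SenderReceiver;"""
print(query)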

pandas assign value in multiple columns based on value in one

I have a dataset like this,
import pandas as pd

sample = {'Theme': ['never give a ten', 'interaction speed', 'no feedback,premium'],
          'cat1': [0, 0, 0],
          'cat2': [0, 0, 0],
          'cat3': [0, 0, 0],
          'cat4': [0, 0, 0]}
df = pd.DataFrame(sample, columns=['Theme', 'cat1', 'cat2', 'cat3', 'cat4'])
                 Theme  cat1  cat2  cat3  cat4
0     never give a ten     0     0     0     0
1    interaction speed     0     0     0     0
2  no feedback,premium     0     0     0     0
Now I need to set the cat columns based on the value in Theme: if Theme contains 'never give a ten', set cat1 to 1; if it contains 'interaction speed', set cat2 to 1; if it contains 'no feedback', set cat3 to 1; and for 'premium', set cat4 to 1.
This sample has 4 categories, but I have 21 in total. I could write 21 if word in string checks, but I am looking for an efficient way to write this as a function that loops over every row and updates the corresponding columns. Can anyone help, please?
Thanks in advance.
You can create the columns, named by category, with Series.str.get_dummies (the column names come out sorted):
df1 = df['Theme'].str.get_dummies(',')
print(df1)
   interaction speed  never give a ten  no feedback  premium
0                  0                 1            0        0
1                  1                 0            0        0
2                  0                 0            1        1
If you need the Theme column in the output, add DataFrame.join:
df11 = df[['Theme']].join(df['Theme'].str.get_dummies(','))
print(df11)
                 Theme  interaction speed  never give a ten  no feedback  \
0     never give a ten                  0                 1            0
1    interaction speed                  1                 0            0
2  no feedback,premium                  0                 0            1

   premium
0        0
1        0
2        1
If the order of the columns is important, add DataFrame.reindex:
# remove possible duplicates while preserving order
cols = dict.fromkeys([y for x in df['Theme'] for y in x.split(',')]).keys()
df2 = df['Theme'].str.get_dummies(',').reindex(cols, axis=1)
print(df2)
   never give a ten  interaction speed  no feedback  premium
0                 1                  0            0        0
1                 0                  1            0        0
2                 0                  0            1        1
cols = dict.fromkeys([y for x in df['Theme'] for y in x.split(',')]).keys()
df2 = df[['Theme']].join(df['Theme'].str.get_dummies(',').reindex(cols, axis=1))
print(df2)
                 Theme  never give a ten  interaction speed  no feedback  \
0     never give a ten                 1                  0            0
1    interaction speed                 0                  1            0
2  no feedback,premium                 0                  0            1

   premium
0        0
1        0
2        1
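To write the results back into the existing cat1..cat4 columns, as the question asks, here is one possible sketch; the theme-to-column mapping is assumed from the question's four examples and would need all 21 entries in practice:

# Assumed mapping from theme keyword to target column (per the question).
mapping = {'never give a ten': 'cat1',
           'interaction speed': 'cat2',
           'no feedback': 'cat3',
           'premium': 'cat4'}

cats = list(mapping.values())
dummies = df['Theme'].str.get_dummies(',').rename(columns=mapping)
# reindex guards against keywords that never occur in the data
df[cats] = dummies.reindex(columns=cats, fill_value=0).to_numpy()
print(df)
#                  Theme  cat1  cat2  cat3  cat4
# 0     never give a ten     1     0     0     0
# 1    interaction speed     0     1     0     0
# 2  no feedback,premium     0     0     1     1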

If the hotel_name value already exists, add 1 in the matching flag column; otherwise add a new row to the hotel_name column

I have 4 columns: hotel_name, is_pool, is_wifi, is_gym. The first time the loop runs, 10 hotels are added to the hotel_name column and 1 is set in the is_pool column. The second time the loop runs, for wifi, it should check whether the hotel already exists in the hotel_name column; if it does, it sets 1 in is_wifi for that hotel (like Avari hotels in the example below), and if it does not exist, it inserts a new hotel row with 1 in is_wifi. The same goes for is_gym, etc.
Hotel_name       Is_pool  Is_wifi  Is_gym
Grand_hayat            1        0       0
Royal-marria           1        0       0
Peart-continent        1        0       0
Sub-hotelways          1        0       0
Grand_marqs            1        0       0
Avari hotels           1        1       0
Chenone hotels         1        0       0
Savoey hotels          1        0       0
The grand              1        0       0
Hotel-range            1        0       0
Sub-marry              0        1       0
Royal-reside           0        1       0
Xyz                    0        1       0
Abc                    0        1       0
...
How can I achieve this task? Kindly help :) Thanks in advance.
CREATE TABLE "Hotels" (
`hotel_id` INTEGER PRIMARY KEY AUTOINCREMENT,
`hotel_name` TEXT NOT NULL,
`is_pool` INTEGER DEFAULT 0,
`is_wifi` INTEGER DEFAULT 0,
`is_gym` INTEGER DEFAULT 0,
)
if(prefrence[i]=='pool'):
c.execute("INSERT INTO hotels (Hotel_name,is_pool) VALUES (?,?)" , [hotel], 1)
Just try to update the row. If the row was not found, insert it:
c.execute("UPDATE hotels SET is_wifi = 1 WHERE hotel_name = ?", [hotel])
if c.rowcount == 0:
c.execute("INSERT INTO hotels..."...)

Spotfire Consecutive Count

I am new to Spotfire so I hope that I ask this question correctly.
My table contains Corp_ID, Date and Flagged columns. The Flagged column is either "1" or "0" based on whether that Corp_ID had production on that date.
I need a custom expression that returns "0" if the Flagged column is "0", BUT if the Flagged column is "1", it should return how many consecutive "1"s are in that run for that Corp_ID.
Corp_ID  Date       Flagged  New Column
101      1/1/2016   1        1
101      1/2/2016   0        0
101      1/3/2016   1        4
101      1/4/2016   1        4
101      1/5/2016   1        4
101      1/6/2016   1        4
101      1/7/2016   0        0
101      1/8/2016   0        0
101      1/9/2016   1        2
101      1/10/2016  1        2
102      1/2/2016   1        3
102      1/3/2016   1        3
102      1/4/2016   1        3
102      1/5/2016   0        0
102      1/6/2016   0        0
102      1/7/2016   0        0
102      1/8/2016   1        4
102      1/9/2016   1        4
102      1/10/2016  1        4
102      1/11/2016  1        4
Thanks in advance for any assistance!
KC
This would be a lot easier to implement as part of the query you’re using to return the data, but if you have to do it in Spotfire, I suggest the steps below (a pandas sketch of the same logic follows the steps).
1- Create a hierarchy column containing [Corp_ID] and [Date] (Named ‘DateHr’)
2- Add a calculated column named ‘Concat Flags’ which concatenates all the previous flag values: Concatenate([Flagged]) OVER (Intersect(Parent([Hierarchy.DateHr]),allPrevious([Hierarchy.DateHr])))
3- Add a calculated column which will return the number of 0’s in the Concat Flags field (Named ‘# of 0s’): Len([Concat Flags]) - Len(Substitute([Concat Flags],"0",""))
4- Add a hierarchy column containing [Corp_ID] and [# of 0s] (Named ‘CorpHr’)
5- Add a calculated column to return your desired value: case when [Flagged]=1 then Sum([Flagged]) OVER (Intersect([Hierarchy.CorpHr])) else 0 end
Note: the above assumes you are working in Spotfire version 7.5; the syntax for using hierarchies in calculated columns differs slightly in earlier versions.
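Not Spotfire, but for reference, here is a minimal pandas sketch of the same run-length logic; it assumes the table is loaded as a DataFrame df with the three columns above:

import pandas as pd

df = pd.DataFrame({'Corp_ID': [101, 101, 101, 101, 102, 102, 102],
                   'Date': pd.to_datetime(['1/1/2016', '1/2/2016', '1/3/2016',
                                           '1/4/2016', '1/2/2016', '1/3/2016',
                                           '1/4/2016']),
                   'Flagged': [1, 0, 1, 1, 1, 1, 1]})

df = df.sort_values(['Corp_ID', 'Date'])
# a new run starts whenever Flagged changes within a Corp_ID
change = df['Flagged'] != df.groupby('Corp_ID')['Flagged'].shift()
run = change.groupby(df['Corp_ID']).cumsum()
# each flagged row gets its run's length; unflagged runs sum to 0
df['New Column'] = df.groupby([df['Corp_ID'], run])['Flagged'].transform('sum')
print(df)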
