I have a single table 'EMPLOYEE'. I need to count 'emp_no' into multiple columns, with each aggregate based on a different restriction. I'm not sure how to write a query that produces the output below.
SELECT DEP_NO, COUNT(EMP_NO) Active
FROM EMPLOYEE
WHERE STATUS = 'active'
GROUP BY DEP_NO;
SELECT DEP_NO, COUNT(EMP_NO) "On Leave"
FROM EMPLOYEE
WHERE STATUS = 'on leave'
GROUP BY DEP_NO;
dep_no| Active On Leave Female Male
------|------------------------------
1 | 236 10 136 100
2 | 500 26 250 250
3 | 130 2 80 50
4 | 210 7 60 150
One possible answer is to use SUM + CASE
SELECT DEP_NO, SUM(CASE WHEN STATUS = 'active' THEN 1 ELSE 0 END) AS Active,
SUM(CASE WHEN STATUS = 'on leave' THEN 1 ELSE 0 END) AS [On Leave],
SUM(CASE WHEN STATUS = 'female' THEN 1 ELSE 0 END) AS Female,
SUM(CASE WHEN STATUS = 'male' THEN 1 ELSE 0 END) AS Male
FROM EMPLOYEE
GROUP BY DEP_NO
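To try the SUM + CASE idea end to end, here is a minimal sketch using Python's built-in sqlite3 module and a few made-up rows. It assumes the gender lives in its own column, named GENDER here purely for illustration (the question does not show the table layout), and it uses the standard "On Leave" quoting instead of the SQL Server style brackets.

```python
import sqlite3

# Toy rows: (EMP_NO, DEP_NO, STATUS, GENDER).
# The GENDER column is an assumption, not something shown in the question.
rows = [
    (1, 1, "active", "female"),
    (2, 1, "active", "male"),
    (3, 1, "on leave", "female"),
    (4, 2, "active", "male"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE EMPLOYEE (EMP_NO INTEGER, DEP_NO INTEGER, STATUS TEXT, GENDER TEXT)"
)
conn.executemany("INSERT INTO EMPLOYEE VALUES (?, ?, ?, ?)", rows)

# One pass over the table; each SUM counts only the rows matching its CASE.
query = """
SELECT DEP_NO,
       SUM(CASE WHEN STATUS = 'active'   THEN 1 ELSE 0 END) AS Active,
       SUM(CASE WHEN STATUS = 'on leave' THEN 1 ELSE 0 END) AS "On Leave",
       SUM(CASE WHEN GENDER = 'female'   THEN 1 ELSE 0 END) AS Female,
       SUM(CASE WHEN GENDER = 'male'     THEN 1 ELSE 0 END) AS Male
FROM EMPLOYEE
GROUP BY DEP_NO
ORDER BY DEP_NO
"""
result = conn.execute(query).fetchall()
print(result)
```

Each tuple in result is one department row in the order (DEP_NO, Active, On Leave, Female, Male).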
I am working on a groupby operation using the attribute column, but I want to exclude the desc_type1 and desc_type2 rows, which should only be used to calculate the total discount inside each attribute.
df = pd.DataFrame({'ID':[10,10,10,20,30,30],'attribute':['attrib_1','desc_type1','desc_type2','attrib_1','attrib_2','desc_type1'],'value':[100,0,0,100,30,0],'discount':[0,6,2,0,0,13.3]})
output:
ID attribute value discount
10 attrib_1 100 0
10 desc_type1 0 6
10 desc_type2 0 2
20 attrib_1 100 0
30 attrib_2 30 0
30 desc_type1 0 13.3
I want to group this dataframe by attribute, excluding desc_type1 and desc_type2.
The desired output:
attribute ID_count value_sum discount_sum
attrib_1 2 200 8
attrib_2 1 30 13.3
Explanations:
attrib_1 has discount_sum=8 because ID 10, which belongs to attrib_1, has two desc_type rows (6 + 2).
attrib_2 has discount_sum=13.3 because ID 30 has one desc_type row.
ID=20 has no discount types.
What I did so far:
df.groupby('attribute').agg({'ID':'count','value':'sum','discount':'sum'})
But the line above does not exclude desc_type1 and desc_type2 from the groupby.
Important: an ID may have a discount or not.
link to the realdataset: realdataset
You can fill the attributes per ID, then groupby.agg:
m = df['attribute'].str.startswith('desc_type')
group = df['attribute'].mask(m).groupby(df['ID']).ffill()
out = (df
.groupby(group, as_index=False)
.agg(**{'ID_count': ('ID', 'nunique'),
'value_sum': ('value', 'sum'),
'discount_sum': ('discount', 'sum')
})
)
output:
ID_count value_sum discount_sum
0 2 200 8.0
1 1 30 13.3
Hello, I think this helps:
df.loc[(df['attribute'] != 'desc_type1') & (df['attribute'] != 'desc_type2')].groupby('attribute').agg({'ID':'count','value':'sum','discount':'sum'})
Output:
ID value discount
attribute
attrib_1 2 200 0.0
attrib_2 1 30 0.0
I want to merge two columns (Sender and Receiver) and get the count of each transaction Type.
Sender Receiver Type Amount Date
773787639 777611388 1 300 2/1/2019
773631898 776806843 4 450 8/20/2019
773761571 777019819 6 369 2/11/2019
774295511 777084440 34 1000 1/22/2019
774263079 776816905 45 678 6/27/2019
774386894 777202863 12 2678 2/10/2019
773671537 777545555 14 38934 9/29/2019
774288117 777035194 18 21 4/22/2019
774242382 777132939 21 1275 9/30/2019
774144715 777049859 30 6309 7/4/2019
773911674 776938987 10 3528 5/1/2019
773397863 777548054 15 35892 7/6/2019
776816905 772345091 6 1234 7/7/2019
777035194 775623065 4 453454 7/20/2019
I am trying to get a table like this:
Sender/Receiver Type_1 Type_4 Type_12...... Type_45
773787639 3 2 0 0
773631898 1 0 1 2
773397863 2 2 0 0
772345091 1 1 0 3
You are looking for a pivot query. The only twist here is that we need to first take a union of the table to combine the sender/receiver data into a single column.
SELECT
SenderReceiver,
COUNT(CASE WHEN Type = 1 THEN 1 END) AS Type_1,
COUNT(CASE WHEN Type = 2 THEN 1 END) AS Type_2,
COUNT(CASE WHEN Type = 3 THEN 1 END) AS Type_3,
...
COUNT(CASE WHEN Type = 45 THEN 1 END) AS Type_45
FROM
(
SELECT Sender AS SenderReceiver, Type FROM yourTable
UNION ALL
SELECT Receiver, Type FROM yourTable
) t
GROUP BY
SenderReceiver;
If you don't want to type out 45 separate CASE expressions, you could probably automate it to some degree using Python.
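As a rough sketch of that automation, the snippet below builds the query text in Python. The function name build_pivot_query is made up for this example, yourTable is the same placeholder as in the query above, and it assumes the type codes are plain integers from 1 to 45.

```python
def build_pivot_query(types, table="yourTable"):
    """Return the pivot SQL text with one COUNT(CASE ...) per type code."""
    # int() guards against anything but integer codes ending up in the SQL text.
    cases = ",\n".join(
        f"    COUNT(CASE WHEN Type = {int(t)} THEN 1 END) AS Type_{int(t)}"
        for t in types
    )
    return (
        "SELECT\n"
        "    SenderReceiver,\n"
        f"{cases}\n"
        "FROM\n"
        "(\n"
        f"    SELECT Sender AS SenderReceiver, Type FROM {table}\n"
        "    UNION ALL\n"
        f"    SELECT Receiver, Type FROM {table}\n"
        ") t\n"
        "GROUP BY\n"
        "    SenderReceiver;"
    )


# Assumption: type codes run from 1 to 45 inclusive.
sql = build_pivot_query(range(1, 46))
print(sql)
```

If only some type codes actually occur, you could pass just those (for example the result of a SELECT DISTINCT Type query) instead of the full range.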
I have a dataset like this,
sample = {'Theme': ['never give a ten','interaction speed','no feedback,premium'],
'cat1': [0,0,0],
'cat2': [0,0,0],
'cat3': [0,0,0],
'cat4': [0,0,0]
}
pd.DataFrame(sample,columns = ['Theme','cat1','cat2','cat3','cat4'])
Theme cat1 cat2 cat3 cat4
0 never give a ten 0 0 0 0
1 interaction speed 0 0 0 0
2 no feedback,premium 0 0 0 0
Now, I need to replace the values in the cat columns based on the value in Theme. If the Theme column contains 'never give a ten', set cat1 to 1; similarly, if it contains 'interaction speed', set cat2 to 1; if it contains 'no feedback', set cat3 to 1; and for 'premium', set cat4 to 1.
In this sample I have provided 4 categories, but I have 21 categories in total. I could write `if word in string` 21 times, once per category, but I am looking for an efficient way to put this in a function that goes through every row, applies the logic, and updates the corresponding columns. Can anyone help, please?
Thanks in advance.
You can create the columns from the categories with Series.str.get_dummies; note that the column names come out sorted:
df1 = df['Theme'].str.get_dummies(',')
print (df1)
interaction speed never give a ten no feedback premium
0 0 1 0 0
1 1 0 0 0
2 0 0 1 1
If you need the first column in the output, add DataFrame.join:
df11 = df[['Theme']].join(df['Theme'].str.get_dummies(','))
print (df11)
Theme interaction speed never give a ten no feedback \
0 never give a ten 0 1 0
1 interaction speed 1 0 0
2 no feedback,premium 0 0 1
premium
0 0
1 0
2 1
If the order of the columns is important, add DataFrame.reindex:
# remove possible duplicates while preserving the original order
cols = dict.fromkeys([y for x in df['Theme'] for y in x.split(',')]).keys()
df2 = df['Theme'].str.get_dummies(',').reindex(cols, axis=1)
print (df2)
never give a ten interaction speed no feedback premium
0 1 0 0 0
1 0 1 0 0
2 0 0 1 1
cols = dict.fromkeys([y for x in df['Theme'] for y in x.split(',')]).keys()
df2 = df[['Theme']].join(df['Theme'].str.get_dummies(',').reindex(cols, axis=1))
print (df2)
Theme never give a ten interaction speed no feedback \
0 never give a ten 1 0 0
1 interaction speed 0 1 0
2 no feedback,premium 0 0 1
premium
0 0
1 0
2 1
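If the columns really have to be called cat1, cat2 and so on, one option is to rename the dummy columns through a lookup table. This is only a sketch: the theme_to_cat dictionary is a hypothetical mapping that you would fill in yourself for all 21 categories.

```python
import pandas as pd

df = pd.DataFrame(
    {'Theme': ['never give a ten', 'interaction speed', 'no feedback,premium']}
)

# Hypothetical mapping from theme text to category column; extend it to all
# 21 categories. The insertion order defines the output column order.
theme_to_cat = {
    'never give a ten': 'cat1',
    'interaction speed': 'cat2',
    'no feedback': 'cat3',
    'premium': 'cat4',
}

dummies = df['Theme'].str.get_dummies(',')
cats = (dummies
        .rename(columns=theme_to_cat)
        # Keep only mapped categories, in order; categories that never
        # occur in the data still get a column filled with 0.
        .reindex(columns=list(theme_to_cat.values()), fill_value=0))
out = df[['Theme']].join(cats)
print(out)
```

Themes that are missing from the mapping are silently dropped by the reindex step, so it is worth checking the mapping against the distinct values in your data.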
I have 4 columns: hotel_name, is_pool, is_wifi, is_gym. The first time the loop runs (for pool), 10 hotels are added to the hotel_name column and 1 is set in the is_pool column. The second time the loop runs (for wifi), it should check whether each hotel already exists in the hotel_name column. If it does, it should set is_wifi to 1 on that existing row (like Avari hotels in the example below); if it does not, it should add the new hotel to hotel_name and set is_wifi to 1. The same applies to is_gym, and so on.
Hotel_name Is_pool Is_wifi Is_gym
Grand_hayat 1 0 0
Royal-marria 1 0 0
Peart-continent 1 0 0
Sub-hotelways 1 0 0
Grand_marqs 1 0 0
Avari hotels 1 1 0
Chenone hotels 1 0 0
Savoey hotels 1 0 0
The grand 1 0 0
Hotel-range 1 0 0
Sub-marry 0 1 0
Royal-reside 0 1 0
Xyz 0 1 0
Abc 0 1 0
. . . .
. . . .
How can I achieve this? Thanks in advance :)
CREATE TABLE "Hotels" (
`hotel_id` INTEGER PRIMARY KEY AUTOINCREMENT,
`hotel_name` TEXT NOT NULL,
`is_pool` INTEGER DEFAULT 0,
`is_wifi` INTEGER DEFAULT 0,
`is_gym` INTEGER DEFAULT 0
)
if(prefrence[i]=='pool'):
    c.execute("INSERT INTO hotels (Hotel_name,is_pool) VALUES (?,?)", [hotel, 1])
Just try to update the row. If the row was not found, insert it:
c.execute("UPDATE hotels SET is_wifi = 1 WHERE hotel_name = ?", [hotel])
if c.rowcount == 0:
    c.execute("INSERT INTO hotels..."...)
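Here is a small self-contained sketch of the update-then-insert idea using an in-memory SQLite database. The set_flag helper and the FLAG_COLUMNS lookup are names invented for this example; the table layout follows the CREATE TABLE from the question (without the trailing comma).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
c = conn.cursor()
c.execute("""
    CREATE TABLE hotels (
        hotel_id   INTEGER PRIMARY KEY AUTOINCREMENT,
        hotel_name TEXT NOT NULL,
        is_pool    INTEGER DEFAULT 0,
        is_wifi    INTEGER DEFAULT 0,
        is_gym     INTEGER DEFAULT 0
    )
""")

# Column names cannot be bound as "?" parameters, so they are taken from a
# fixed whitelist instead of being built from arbitrary input.
FLAG_COLUMNS = {"pool": "is_pool", "wifi": "is_wifi", "gym": "is_gym"}


def set_flag(cursor, hotel, preference):
    """Set the flag on an existing hotel row, or insert the hotel if missing."""
    column = FLAG_COLUMNS[preference]
    cursor.execute(f"UPDATE hotels SET {column} = 1 WHERE hotel_name = ?", [hotel])
    if cursor.rowcount == 0:  # no row was updated, so the hotel is new
        cursor.execute(
            f"INSERT INTO hotels (hotel_name, {column}) VALUES (?, 1)", [hotel]
        )


set_flag(c, "Avari hotels", "pool")
set_flag(c, "Avari hotels", "wifi")  # existing row: only the flag is updated
set_flag(c, "Xyz", "wifi")           # new hotel: a row is inserted
conn.commit()

rows = c.execute(
    "SELECT hotel_name, is_pool, is_wifi, is_gym FROM hotels ORDER BY hotel_id"
).fetchall()
print(rows)
```

This relies on hotel_name identifying a hotel uniquely; if two hotels can share a name, the UPDATE would flag both rows.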
I am new to Spotfire so I hope that I ask this question correctly.
My table contains Corp_ID, Date and Flagged columns. The flagged column is either "1" or "0" based on if that Corp_ID had production on that date.
I need a custom expression that will return "0" if the flagged column is "0", BUT if the flagged column is "1" then I need it to return how many consecutive "1"s are in that string for that Corp_ID.
Corp_ID Date Flagged New Column
101 1/1/2016 1 1
101 1/2/2016 0 0
101 1/3/2016 1 4
101 1/4/2016 1 4
101 1/5/2016 1 4
101 1/6/2016 1 4
101 1/7/2016 0 0
101 1/8/2016 0 0
101 1/9/2016 1 2
101 1/10/2016 1 2
102 1/2/2016 1 3
102 1/3/2016 1 3
102 1/4/2016 1 3
102 1/5/2016 0 0
102 1/6/2016 0 0
102 1/7/2016 0 0
102 1/8/2016 1 4
102 1/9/2016 1 4
102 1/10/2016 1 4
102 1/11/2016 1 4
Thanks in advance for any assistance!
KC
This would be a lot easier to implement as part of the query you’re using to return the data, but if you have to do it in Spotfire, I suggest this:
1- Create a hierarchy column containing [Corp_ID] and [Date] (Named ‘DateHr’)
2- Add a calculated column named ‘Concat Flags’ which concatenates all the previous flag values: Concatenate([Flagged]) OVER (Intersect(Parent([Hierarchy.DateHr]),allPrevious([Hierarchy.DateHr])))
3- Add a calculated column which will return the number of 0’s in the Concat Flags field (Named ‘# of 0s’): Len([Concat Flags]) - Len(Substitute([Concat Flags],"0",""))
4- Add a hierarchy column containing [Corp_ID] and [# of 0s] (Named ‘CorpHr’)
5- Add a calculated column to return your desired value: case when [Flagged]=1 then Sum([Flagged]) OVER (Intersect([Hierarchy.CorpHr])) else 0 end
Note: the above assumes you are working in Spotfire version 7.5. The syntax for using hierarchies in calculated columns differs slightly in earlier versions.
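If it helps to check the logic outside Spotfire, the same idea (number each run of 1s by how many 0s came before it, then sum the flags within each run) can be sketched in pandas. This is not Spotfire syntax, just an illustration on the Corp_ID 101 rows from the question; the zeros_so_far name is made up for this example.

```python
import pandas as pd

df = pd.DataFrame({
    'Corp_ID': [101] * 10,
    'Date': pd.date_range('2016-01-01', periods=10),
    'Flagged': [1, 0, 1, 1, 1, 1, 0, 0, 1, 1],
})
df = df.sort_values(['Corp_ID', 'Date'])

# Running count of 0s per Corp_ID (steps 2 and 3 above): it stays constant
# across a run of consecutive 1s, so it can serve as a run identifier.
zeros_so_far = (df['Flagged'].eq(0).astype(int)
                .groupby(df['Corp_ID']).cumsum()
                .rename('zeros_so_far'))

# Sum of the flags within each (Corp_ID, run) group (steps 4 and 5 above).
run_total = df.groupby(['Corp_ID', zeros_so_far])['Flagged'].transform('sum')

# Rows flagged 0 get 0; rows flagged 1 get the length of their run.
df['New Column'] = run_total.where(df['Flagged'].eq(1), 0)
print(df)
```

Doing this upstream, in the query or in a data function, follows the remark at the top of this answer that the problem is easier to solve before the data reaches Spotfire.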