In Pandas, how to compute value counts on bins and sum value in 1 other column - python-3.x

I have a Pandas dataframe like:
df =
col1 col2
23 75
25 78
22 120
I want to specify bins 0-100 and 100-200, divide col2 into those bins, compute the value counts per bin, and sum col1 for the rows whose col2 falls in each bin.
So:
df_output:
col2_range count col1_cum
0-100 2 48
100-200 1 22
Getting the col2_range and count is pretty simple:
import numpy as np
bins = np.arange(0, 300, 100).tolist()  # [0, 100, 200] -> two bins
counts = df['col2'].value_counts(bins=bins, sort=False)
How do I sum col1 per bin, though?

IIUC, try using pd.cut to create bins and groupby those bins:
g = pd.cut(df['col2'],
           bins=[0, 100, 200, 300, 400],
           labels=['0-99', '100-199', '200-299', '300-399'])
df.groupby(g, observed=True)['col1'].agg(['count', 'sum']).reset_index()
Output:
col2 count sum
0 0-99 2 48
1 100-199 1 22
I think I misread the original post:
g = pd.cut(df['col2'],
           bins=[0, 100, 200, 300, 400],
           labels=['0-99', '100-199', '200-299', '300-399'])
df.groupby(g, observed=True).agg(col1_count=('col1', 'count'),
                                 col2_sum=('col2', 'sum'),
                                 col1_sum=('col1', 'sum')).reset_index()
Output:
col2 col1_count col2_sum col1_sum
0 0-99 2 153 48
1 100-199 1 120 22
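For reference, a self-contained version of the pd.cut-plus-groupby approach above, using the three-row df from the question (the column names count and col1_cum match the asker's expected output; they are chosen here, not produced automatically):

```python
import pandas as pd

df = pd.DataFrame({"col1": [23, 25, 22], "col2": [75, 78, 120]})

# Bin col2, then aggregate col1 within each bin.
g = pd.cut(df["col2"], bins=[0, 100, 200], labels=["0-100", "100-200"])
out = (df.groupby(g, observed=False)["col1"]
         .agg(count="count", col1_cum="sum")
         .reset_index()
         .rename(columns={"col2": "col2_range"}))
print(out)
```

observed=False keeps empty bins in the result, which is usually what you want for fixed bin edges.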

Related

How to filter a dataframe using a cumulative sum of a column as parameter

I have this df:
df=pd.DataFrame({'Name':['John','Mike','Lucy','Mary','Andy'],
'Age':[10,23,13,12,15],
'%':[20,20,10,25,25]})
I want to filter this df by taking rows 0 through n until the sum of the % column reaches 50.
I don't want to sort the % column or the df; I just need the first rows whose % values sum to 50.
The output is:
filtered=pd.DataFrame({'Name':['John','Mike','Lucy'],'Age':[10,23,13],'%':[20,20,10]})
cumsum, build a boolean index, and slice with the iloc accessor:
df.iloc[:(df['%'].cumsum()==50).idxmax()+1,:]
Name Age %
0 John 10 20
1 Mike 23 20
2 Lucy 13 10
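A boolean-mask variant of the same idea, as a sketch: it keeps rows while the running total stays at or below 50, which matches the idxmax version here because the cumulative sum hits exactly 50 on Lucy's row.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Mike', 'Lucy', 'Mary', 'Andy'],
                   'Age': [10, 23, 13, 12, 15],
                   '%': [20, 20, 10, 25, 25]})

# Keep rows while the running total of % is <= 50.
filtered = df[df['%'].cumsum().le(50)]
print(filtered)
```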

Applying "percentage of group total" to a column in a grouped dataframe

I have a dataframe from which I generate another dataframe using the following code:
df.groupby(['Cat','Ans']).agg({'col1':'count','col2':'sum'})
This gives me following result:
Cat Ans col1 col2
A Y 100 10000.00
N 40 15000.00
B Y 80 50000.00
N 40 10000.00
Now, I need percentage of group totals for each group (level=0, i.e. "Cat") instead of count or sum.
For getting count percentage instead of count value, I could do this:
df['Cat'].value_counts(normalize=True)
But here I have sub-group "Ans" under the "Cat" group. And I need the percentage to be on each Cat group level and not the whole total.
So, expectation is:
Cat Ans col1 .. col3
A Y 100 .. 71.43 #(100/(100+40))*100
N 40 .. 28.57
B Y 80 .. 66.67
N 40 .. 33.33
Similarly, col4 will be percentage of group-total for col2.
Is there a function or method available for this?
How do we do this in an efficient way for large data?
You can use the level argument of DataFrame.sum (to perform a groupby) and have pandas take care of the index alignment for the division.
df['col3'] = df['col1']/df['col1'].sum(level='Cat')*100
col1 col2 col3
Cat Ans
A Y 100 10000.0 71.428571
N 40 15000.0 28.571429
B Y 80 50000.0 66.666667
N 40 10000.0 33.333333
For multiple columns you can loop the above, or have pandas align those too. I add a suffix to distinguish the new columns from the original columns when joining back with concat.
df = pd.concat([df, (df/df.sum(level='Cat')*100).add_suffix('_pct')], axis=1)
col1 col2 col1_pct col2_pct
Cat Ans
A Y 100 10000.0 71.428571 40.000000
N 40 15000.0 28.571429 60.000000
B Y 80 50000.0 66.666667 83.333333
N 40 10000.0 33.333333 16.666667
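Note that Series.sum(level=...) and DataFrame.sum(level=...) were deprecated and later removed (pandas 2.0). A sketch of the same computation on current pandas, using groupby(...).transform('sum') to broadcast the group totals back to the original shape so the division aligns row by row:

```python
import pandas as pd

df = pd.DataFrame(
    {"col1": [100, 40, 80, 40],
     "col2": [10000.0, 15000.0, 50000.0, 10000.0]},
    index=pd.MultiIndex.from_tuples(
        [("A", "Y"), ("A", "N"), ("B", "Y"), ("B", "N")],
        names=["Cat", "Ans"]))

# transform('sum') returns a frame the same shape as df, filled with
# each row's group total, so df / totals is a plain aligned division.
pct = df.div(df.groupby(level="Cat").transform("sum")).mul(100).add_suffix("_pct")
df = pd.concat([df, pct], axis=1)
print(df)
```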

In pandas dataframe, how to make one column act on all the others?

Consider the small following dataframe:
import pandas as pd
value1 = [15, 20, 50, 70]
value2 = [15, 80, 45, 30]
base = [175, 150, 200, 125]
df = pd.DataFrame({"val1": value1, "val2": value2, "base": base})
df
val1 val2 base
0 15 15 175
1 20 80 150
2 50 45 200
3 70 30 125
Actually, there are many more rows and many more val*** columns...
I would like to express the figures in the val*** columns as a percentage of their corresponding base (in the same row). For example, 70 (last in val1) should become (70/125)*100 = 56, and 30 (last in val2) should become (30/125)*100 = 28; and so on for every figure.
I am sure the solution lies in a correct use of assign or apply and lambda, but I can't find how to do it ...
We can filter the val-like columns, divide them by the base column along axis=0, then multiply by 100 to get the percentage:
df.filter(like='val').div(df['base'], axis=0).mul(100).add_suffix('%')
val1% val2%
0 8.571429 8.571429
1 13.333333 53.333333
2 25.000000 22.500000
3 56.000000 24.000000

Get total of Pandas column and row

I have a Pandas data frame, as shown below,
a b c
A 100 60 60
B 90 44 44
A 70 50 50
Now, I would like to get the totals by column and by row, skipping c, as shown below,
a b sum
A 170 110 280
B 90 44 134
I don't know how to do this; any help is appreciated.
My example dataframe is:
df = pd.DataFrame(dict(a=[100, 90,70], b=[60, 44,50],c=[60, 44,50]),index=["A", "B","A"])
(
df.groupby(level=0)[['a', 'b']].sum()
  .assign(sum=lambda x: x.sum(axis=1))
)
Use:
#remove unnecessary column
df = df.drop('c', axis=1)
#get sum of rows
df['sum'] = df.sum(1)
#get sum per index
df = df.groupby(level=0).sum()
print (df)
a b sum
A 170 110 280
B 90 44 134
df["sum"] = df[["a","b"]].sum(axis=1) #Column-wise sum of "a" and "b"
df[["a", "b", "sum"]] #show all columns but not "c"
The pandas way is:
#create sum column
df['sum'] = df['a']+df['b']
#remove column c
df = df[['a', 'b', 'sum']]
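Putting the first answer together as a self-contained sketch (double brackets select the list of columns, which avoids the long-deprecated tuple indexing on a groupby):

```python
import pandas as pd

df = pd.DataFrame({"a": [100, 90, 70], "b": [60, 44, 50],
                   "c": [60, 44, 50]}, index=["A", "B", "A"])

# Sum the duplicate index labels first, then add a row-wise
# total of a and b as a new 'sum' column.
out = (df.groupby(level=0)[["a", "b"]].sum()
         .assign(sum=lambda x: x.sum(axis=1)))
print(out)
```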

How to take values in the column as the columns in the DataFrame in pandas

My current DataFrame is:
Term value
Name
A 1 35
A 2 40
A 3 50
B 1 20
B 2 45
B 3 50
I want to get a dataframe as:
Term 1 2 3
Name
A 35 40 50
B 20 45 50
How can I get it? I've tried using pivot_table but didn't get my expected output. Is there any way to get it?
Use:
df = df.set_index('Term', append=True)['value'].unstack()
Or:
df = df.reset_index().pivot(index='Name', columns='Term', values='value')
print (df)
Term 1 2 3
Name
A 35 40 50
B 20 45 50
EDIT: If there are duplicate Name/Term pairs, aggregation is necessary, e.g. sum or mean:
df = df.groupby(['Name','Term'])['value'].sum().unstack(fill_value=0)
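A self-contained version of the first approach, assuming Name is the index and Term/value are columns as shown in the question:

```python
import pandas as pd

df = pd.DataFrame({"Term": [1, 2, 3, 1, 2, 3],
                   "value": [35, 40, 50, 20, 45, 50]},
                  index=pd.Index(["A", "A", "A", "B", "B", "B"], name="Name"))

# Move Term into the index alongside Name, then unstack it into columns.
out = df.set_index("Term", append=True)["value"].unstack()
print(out)
```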
