Creating aggregate columns in a pandas DataFrame - python-3.x

I have a pandas dataframe as below:
import pandas as pd
import numpy as np
df = pd.DataFrame({'ORDER':["A", "A", "B", "B"], 'var1':[2, 3, 1, 5],'a1_bal':[1,2,3,4], 'a1c_bal':[10,22,36,41], 'b1_bal':[1,2,33,4], 'b1c_bal':[11,22,3,4], 'm1_bal':[15,2,35,4]})
df
ORDER var1 a1_bal a1c_bal b1_bal b1c_bal m1_bal
0 A 2 1 10 1 11 15
1 A 3 2 22 2 22 2
2 B 1 3 36 33 3 35
3 B 5 4 41 4 4 4
I want to create new columns as below:
a1_final_bal = sum(a1_bal, a1c_bal)
b1_final_bal = sum(b1_bal, b1c_bal)
m1_final_bal = m1_bal (there is only an m1_bal field and no m1c_bal, so it stays as it is)
I don't want to hardcode this step because there might be more such columns, such as "c_bal", "m2_bal", "m2c_bal", etc.
My final data should look something like below
ORDER var1 a1_bal a1c_bal b1_bal b1c_bal m1_bal a1_final_bal b1_final_bal m1_final_bal
0 A 2 1 10 1 11 15 11 12 15
1 A 3 2 22 2 22 2 24 24 2
2 B 1 3 36 33 3 35 39 36 35
3 B 5 4 41 4 4 4 45 8 4

You could try something like this. I am not sure if it's exactly what you are looking for, but I think it should work.
dfforgroup = df.set_index(['ORDER', 'var1'])  # moves the non-balance columns into a MultiIndex
dfforgroup.columns = dfforgroup.columns.str[:2]  # keeps the first two characters (a1, b1, m1); assumes two-character prefixes
# transpose, group identically named rows and sum them, then transpose back
# (groupby(..., axis=1) does the same but is deprecated since pandas 2.1)
df2 = (dfforgroup.T.groupby(level=0).sum().T
       .reset_index(drop=True)
       .add_suffix('_final_bal'))
df = pd.concat([df, df2], axis=1)  # concatenates the new columns to the original df
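Taking the first two characters breaks as soon as a prefix is longer (for example a hypothetical a10_bal). A sketch that instead derives the prefix with a regular expression, assuming every balance column is named <prefix>_bal or <prefix>c_bal as in the question; it transposes rather than grouping along axis=1:

```python
import pandas as pd

df = pd.DataFrame({'ORDER': ["A", "A", "B", "B"], 'var1': [2, 3, 1, 5],
                   'a1_bal': [1, 2, 3, 4], 'a1c_bal': [10, 22, 36, 41],
                   'b1_bal': [1, 2, 33, 4], 'b1c_bal': [11, 22, 3, 4],
                   'm1_bal': [15, 2, 35, 4]})

# Assumption: balance columns all end in "_bal", optionally preceded by "c"
bal_cols = df.filter(regex=r'_bal$').columns
# Strip the optional "c" and the "_bal" suffix: 'a1c_bal' -> 'a1', 'm1_bal' -> 'm1'
prefixes = bal_cols.str.replace(r'c?_bal$', '', regex=True)

# Columns become rows after .T; group those rows by prefix, sum, and transpose back
final = df[bal_cols].T.groupby(prefixes.to_numpy()).sum().T.add_suffix('_final_bal')
df = pd.concat([df, final], axis=1)
print(df)
```

A prefix that itself ends in "c" (e.g. a hypothetical abc_bal) would be mis-grouped, so the pattern may need tightening for real column names.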

Related

Stack rows from two different dataframes one under the other using Python? [duplicate]

df1 = pd.DataFrame({'a':[1,2,3],'x':[4,5,6],'y':[7,8,9]})
df2 = pd.DataFrame({'b':[10,11,12],'x':[13,14,15],'y':[16,17,18]})
I'm trying to merge the two data frames using the keys from df1. I think I should use pd.merge for this, but how can I tell pandas to place the values in the b column of df2 in the a column of df1? This is the output I'm trying to achieve:
a x y
0 1 4 7
1 2 5 8
2 3 6 9
3 10 13 16
4 11 14 17
5 12 15 18
Just use concat and rename the column for df2 so it aligns:
In [92]:
pd.concat([df1,df2.rename(columns={'b':'a'})], ignore_index=True)
Out[92]:
a x y
0 1 4 7
1 2 5 8
2 3 6 9
3 10 13 16
4 11 14 17
5 12 15 18
Similarly, you can use merge, but you'd need to rename the column as above:
In [103]:
df1.merge(df2.rename(columns={'b':'a'}),how='outer')
Out[103]:
a x y
0 1 4 7
1 2 5 8
2 3 6 9
3 10 13 16
4 11 14 17
5 12 15 18
Use numpy to concatenate the dataframes, so you don't have to rename all of the columns (or explicitly ignore indexes). np.concatenate also works on an arbitrary number of dataframes.
df = pd.DataFrame( np.concatenate( (df1.values, df2.values), axis=0 ) )
df.columns = [ 'a', 'x', 'y' ]
df
You can rename the columns and then use concat (DataFrame.append did the same, but it was deprecated in pandas 1.4 and removed in 2.0):
df2.columns = df1.columns
pd.concat([df1, df2], ignore_index=True)
# pandas < 2.0 only: df1.append(df2, ignore_index=True)
You can also stack both dataframes with numpy's vstack and convert the resulting ndarray to a DataFrame:
pd.DataFrame(np.vstack([df1, df2]), columns=df1.columns)
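With more than two frames whose columns line up by position but not by name, a sketch that relabels each frame with df1's column names before concatenating, assuming every frame has the same number of columns in the same order. Unlike the numpy routes, this keeps each column's own dtype:

```python
import pandas as pd

df1 = pd.DataFrame({'a': [1, 2, 3], 'x': [4, 5, 6], 'y': [7, 8, 9]})
df2 = pd.DataFrame({'b': [10, 11, 12], 'x': [13, 14, 15], 'y': [16, 17, 18]})

frames = [df1, df2]  # any number of frames with matching column order
# set_axis relabels the columns positionally and returns a new frame (originals are untouched)
stacked = pd.concat([f.set_axis(df1.columns, axis=1) for f in frames],
                    ignore_index=True)
print(stacked)
```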

Applying a loop across dataframe columns to find an output

I have the following data:
data = {'A':[1,2,3,4,5], 'B':[10,20,233,29,2], 'C':[10,20,3040,230,238], ...}
and
df = pd.DataFrame(data)
In this way I have 20 columns with 5 numerical entries in each column.
I want a new column whose values follow this logic:
0 A[0]*B[0] + A[0]*C[0] + A[0]*D[0] + ...
1 A[1]*B[1] + A[1]*C[1] + A[1]*D[1] + ...
2 A[2]*B[2] + A[2]*C[2] + A[2]*D[2] + ...
I tried the following, but I can't write out 20 columns by hand, so I'd like to know how to use a loop to get the desired output:
lst = []
for i in range(0, 5):
    j = df.A[i]*df.B[i] + df.A[i]*df.C[i] + .......
    lst.append(j)
A potential solution is the following. I am only using the example you posted, but it works fine for more columns. Your data is df:
A B C
0 1 10 10
1 2 20 20
2 3 233 3040
3 4 29 230
4 5 2 238
You can create a new column D by first selecting every column except A:
add = df.loc[:, df.columns != 'A']
and then, since A*B + A*C + ... equals A*(B + C + ...), multiply column A by the row-wise sum of those columns:
df['D'] = df['A']*add.sum(axis=1)
which returns
A B C D
0 1 10 10 20
1 2 20 20 80
2 3 233 3040 9819
3 4 29 230 1036
4 5 2 238 1200
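If you prefer to compute the sum of products literally (each other column times A, then summed across the row), a sketch using DataFrame.mul with axis=0 so A is broadcast down the rows. It gives the same values and does not depend on how many columns there are. The column name result is a placeholder, chosen because with 20 columns a name like D may already exist:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4, 5],
                   'B': [10, 20, 233, 29, 2],
                   'C': [10, 20, 3040, 230, 238]})

others = df.drop(columns='A')  # every column except A
# Multiply each column by A row by row (axis=0 aligns A on the row index), then sum across columns
df['result'] = others.mul(df['A'], axis=0).sum(axis=1)
print(df)
```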

How can I combine 3 columns into one in Python pandas?

I have a dataframe:
df = pd.DataFrame({'A':[1,1,1], 'B':[2012,3014,3343], 'C':[12,13,45], 'D':[111,222,444]})
but I need to stack the last 3 columns, row by row in consecutive order, into a single column aligned with the first column, something like this:
df2 = pd.DataFrame({'A':[1,1,1,2,2,2], 'Fusion3':[2012,12,111,3014,13,222]})
I have tried .melt, but I am struggling to get it right and would be grateful for any ideas.
From the desired output I'm making the assumption that the initial dataframe should have 1,2,3 in the A column rather than 1,1,1:
import pandas as pd
df= pd.DataFrame({'A':[1,2,3], 'B':[2012,3014,3343], 'C':[12,13,45], 'D':[111,222,444]})
df = df.set_index('A')
df = df.stack().droplevel(1)
will give you this series:
A
1 2012
1 12
1 111
2 3014
2 13
2 222
3 3343
3 45
3 444
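To turn that Series into the two-column frame from the question, name the values and move A back out of the index. A sketch; the column name Fusion3 is taken from the question's desired output:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [2012, 3014, 3343],
                   'C': [12, 13, 45], 'D': [111, 222, 444]})

df2 = (df.set_index('A')
         .stack()            # one row per (A, original column) pair, in row order
         .droplevel(1)       # drop the original column names (B, C, D)
         .rename('Fusion3')  # name the values so reset_index creates a Fusion3 column
         .reset_index())
print(df2)
```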
Check out melt:
out = df.melt('A').drop(columns='variable')
Out[15]:
A value
0 1 2012
1 2 3014
2 3 3343
3 1 12
4 2 13
5 3 45
6 1 111
7 2 222
8 3 444
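melt stacks column by column, so the rows come out grouped by B, then C, then D, rather than interleaved per A value as in the question. A sketch that restores the row-by-row order with a stable sort on A (mergesort is stable, so B, C, D keep their relative order within each A):

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [2012, 3014, 3343],
                   'C': [12, 13, 45], 'D': [111, 222, 444]})

out = (df.melt('A')                             # columns: A, variable, value
         .drop(columns='variable')              # keep only A and the stacked values
         .sort_values('A', kind='mergesort')    # stable sort: B, C, D order is preserved within each A
         .reset_index(drop=True))
print(out)
```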

How to copy values from another dataframe based on a condition (matching values in a specific column)?

I have two dataframes (df1 and df2) and they look like this:
data1 = {'col1':[1,2,3,4,1,2,3,4,1,2,3,4], 'col2':np.arange(1,13)*2}
df1 = pd.DataFrame(data1)
data2 = {'x': [1,2,3,4], 'y': [10,20,40,5]}
df2 = pd.DataFrame(data2)
I would like to add a new column 'col3' to df1 with the values of df2['y'] where df1['col1'] is equal to df2['x']. So my df1 would look like this:
col1 col2 col3
1 2 10
2 4 20
3 6 40
4 8 5
1 10 10
2 12 20
3 14 40
4 16 5
1 18 10
2 20 20
3 22 40
4 24 5
Could anyone help me?
Use map with a dictionary created from df2:
df1['col3'] = df1.col1.map(dict(df2[['x', 'y']].values))
or
df1['col3'] = df1.col1.map(dict(zip(df2.x, df2.y)))
Out[886]:
col1 col2 col3
0 1 2 10
1 2 4 20
2 3 6 40
3 4 8 5
4 1 10 10
5 2 12 20
6 3 14 40
7 4 16 5
8 1 18 10
9 2 20 20
10 3 22 40
11 4 24 5
Use a merge:
df1.merge(df2, how='left', left_on='col1', right_on='x') \
[['col1', 'col2', 'y']] \
.rename(columns={'y': 'col3'})
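map also accepts a Series, so the lookup table can stay in pandas instead of going through a dict. A sketch assuming the x values in df2 are unique (map needs a uniquely valued index to look up against); col1 values with no match in x would become NaN:

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'col1': [1, 2, 3, 4] * 3, 'col2': np.arange(1, 13) * 2})
df2 = pd.DataFrame({'x': [1, 2, 3, 4], 'y': [10, 20, 40, 5]})

# Index df2 by x so each col1 value is looked up against it and replaced by y
df1['col3'] = df1['col1'].map(df2.set_index('x')['y'])
print(df1)
```

Unlike the merge approach, this adds the column to df1 in place and keeps df1's original index.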

Subset and Loop to create a new column [duplicate]

With the DataFrame below as an example,
In [83]:
df = pd.DataFrame({'A':[1,1,2,2],'B':[1,2,1,2],'values':np.arange(10,30,5)})
df
Out[83]:
A B values
0 1 1 10
1 1 2 15
2 2 1 20
3 2 2 25
What would be a simple way to generate a new column containing some aggregation of the data over one of the columns?
For example, if I sum values over items in A
In [84]:
df.groupby('A').sum()['values']
Out[84]:
A
1 25
2 45
Name: values
How can I get the following?
A B values sum_values_A
0 1 1 10 25
1 1 2 15 25
2 2 1 20 45
3 2 2 25 45
In [20]: df = pd.DataFrame({'A':[1,1,2,2],'B':[1,2,1,2],'values':np.arange(10,30,5)})
In [21]: df
Out[21]:
A B values
0 1 1 10
1 1 2 15
2 2 1 20
3 2 2 25
In [22]: df['sum_values_A'] = df.groupby('A')['values'].transform('sum')
In [23]: df
Out[23]:
A B values sum_values_A
0 1 1 10 25
1 1 2 15 25
2 2 1 20 45
3 2 2 25 45
I found a way using join:
In [101]:
aggregated = df.groupby('A').sum()['values']
aggregated.name = 'sum_values_A'
df.join(aggregated,on='A')
Out[101]:
A B values sum_values_A
0 1 1 10 25
1 1 2 15 25
2 2 1 20 45
3 2 2 25 45
Does anyone have a simpler way to do it?
This is not as direct, but I find it very intuitive (using map to create a new column from another column), and it can be applied to many other cases:
gb = df.groupby('A').sum()['values']
def getvalue(x):
    return gb[x]
df['sum'] = df['A'].map(getvalue)
df
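The helper function isn't strictly needed: map also accepts the aggregated Series directly and looks each A value up in its index. A sketch of that shortcut, reusing the column name sum_values_A from the question:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 1, 2, 2], 'B': [1, 2, 1, 2], 'values': np.arange(10, 30, 5)})

# Per-group totals, indexed by the A values
totals = df.groupby('A')['values'].sum()
# Look up each row's A in the totals' index
df['sum_values_A'] = df['A'].map(totals)
print(df)
```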
In [15]: def sum_col(df, col, new_col):
   ....:     df[new_col] = df[col].sum()
   ....:     return df
In [16]: df.groupby("A", group_keys=False).apply(sum_col, 'values', 'sum_values_A')
Out[16]:
A B values sum_values_A
0 1 1 10 25
1 1 2 15 25
2 2 1 20 45
3 2 2 25 45
