I have a data frame where I need a totals row at the bottom that shows the count of non-null values for the string columns, the sum of one column, and the average of another column.
import numpy as np
import pandas as pd

df2 = pd.DataFrame({'Name': ['John', 'Tom', 'Tom', 'Ole', 'Ole', 'Tom'],
                    'To_Count': ['Yes', 'Yes', 'Yes', 'No', np.nan, np.nan],
                    'To_Count1': ['Yes', 'Yes', 'Yes', 'No', np.nan, np.nan],
                    'To_Sum': [100, 200, 300, 500, 600, 400],
                    'To_Avg': [100, 200, 300, 500, 600, 400]})
This is the code I use to get this result:
df2.loc["Totals",'To_Count':'To_Count1'] = df2.loc[:,'To_Count':'To_Count1'].count(axis=0)
df2.loc["Totals",'To_Sum'] = df2.loc[:,'To_Sum'].sum(axis=0)
df2.loc["Totals",'To_Avg'] = df2.loc[:,'To_Avg'].mean(axis=0)
However, if I accidentally run this code again, the values get duplicated.
Is there a better way to get this result?
Expected result:
Use DataFrame.agg with a dictionary:
df2.loc["Totals"] = df2.agg({'To_Sum': 'sum',
                             'To_Avg': 'mean',
                             'To_Count': 'count',
                             'To_Count1': 'count'})
print (df2)
Name To_Count To_Count1 To_Sum To_Avg
0 John Yes Yes 100.0 100.0
1 Tom Yes Yes 200.0 200.0
2 Tom Yes Yes 300.0 300.0
3 Ole No No 500.0 500.0
4 Ole NaN NaN 600.0 600.0
5 Tom NaN NaN 400.0 400.0
Totals NaN 4.0 4.0 2100.0 350.0
A more dynamic solution, if there are many columns between To_Count and To_Count1:
d = dict.fromkeys(df2.loc[:,'To_Count':'To_Count1'].columns, 'count')
print (d)
df2.loc["Totals"] = df2.agg({**{'To_Sum': 'sum', 'To_Avg': 'mean'}, **d})
print (df2)
Name To_Count To_Count1 To_Sum To_Avg
0 John Yes Yes 100.0 100.0
1 Tom Yes Yes 200.0 200.0
2 Tom Yes Yes 300.0 300.0
3 Ole No No 500.0 500.0
4 Ole NaN NaN 600.0 600.0
5 Tom NaN NaN 400.0 400.0
Totals NaN 4.0 4.0 2100.0 350.0
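A note on re-running: once the Totals row exists, aggregating over the whole frame again would include it in the sums. One sketch that keeps the operation safe to repeat is to drop any existing "Totals" label before aggregating (this assumes "Totals" is never used as a data-row label; the helper name add_totals is just illustrative):

```python
import numpy as np
import pandas as pd

df2 = pd.DataFrame({'Name': ['John', 'Tom', 'Tom', 'Ole', 'Ole', 'Tom'],
                    'To_Count': ['Yes', 'Yes', 'Yes', 'No', np.nan, np.nan],
                    'To_Count1': ['Yes', 'Yes', 'Yes', 'No', np.nan, np.nan],
                    'To_Sum': [100, 200, 300, 500, 600, 400],
                    'To_Avg': [100, 200, 300, 500, 600, 400]})

def add_totals(df):
    # Drop any previous Totals row so repeated calls do not double-count
    body = df.drop(index='Totals', errors='ignore')
    out = body.copy()
    out.loc['Totals'] = body.agg({'To_Sum': 'sum', 'To_Avg': 'mean',
                                  'To_Count': 'count', 'To_Count1': 'count'})
    return out

df2 = add_totals(df2)
df2 = add_totals(df2)  # running again leaves the totals unchanged
print(df2)
```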
For a df like the one below, I use pct_change() to calculate period-over-period percentage changes:
import numpy as np
import pandas as pd

price = [np.NaN, 10, 13, np.NaN, np.NaN, 9]
df = pd.DataFrame(price, columns=['price'])
df
Out[75]:
price
0 NaN
1 10.0
2 13.0
3 NaN
4 NaN
5 9.0
But I get these unexpected results:
df.price.pct_change(periods = 1, fill_method='bfill')
Out[76]:
0 NaN
1 0.000000
2 0.300000
3 -0.307692
4 0.000000
5 0.000000
Name: price, dtype: float64
df.price.pct_change(periods = 1, fill_method='pad')
Out[77]:
0 NaN
1 NaN
2 0.300000
3 0.000000
4 0.000000
5 -0.307692
Name: price, dtype: float64
df.price.pct_change(periods = 1, fill_method='ffill')
Out[78]:
0 NaN
1 NaN
2 0.300000
3 0.000000
4 0.000000
5 -0.307692
Name: price, dtype: float64
I would like calculations involving NaNs to yield NaN, instead of the NaNs being filled forward or backward and then used in the calculation.
May I ask how to achieve this? Thanks.
The expected result:
0 NaN
1 NaN
2 0.300000
3 NaN
4 NaN
5 NaN
Name: price, dtype: float64
Reference:
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.pct_change.html
Maybe you can compute the pct manually with diff and shift:
period = 1
pct = df.price.diff(period).div(df.price.shift(period))
print(pct)
# Output
0 NaN
1 NaN
2 0.3
3 NaN
4 NaN
5 NaN
Name: price, dtype: float64
Update: you can pass fill_method=None:
period = 1
pct = df.price.pct_change(periods=period, fill_method=None)
print(pct)
# Output
0 NaN
1 NaN
2 0.3
3 NaN
4 NaN
5 NaN
Name: price, dtype: float64
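As a quick sanity check (a small sketch re-using the question's series), the manual diff/shift computation and fill_method=None should agree: the NaN positions match, and the defined values agree numerically up to floating-point rounding:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([np.nan, 10, 13, np.nan, np.nan, 9], columns=['price'])

period = 1
manual = df.price.diff(period).div(df.price.shift(period))
builtin = df.price.pct_change(periods=period, fill_method=None)

# Compare NaN masks exactly, values approximately
same_nans = manual.isna().equals(builtin.isna())
close_vals = np.allclose(manual.dropna(), builtin.dropna())
print(same_nans, close_vals)
```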
I have 3 dfs as shown below:
df1:
ID March_Number March_Amount
A 10 200
B 4 300
C 2 100
df2:
ID Feb_Number Feb_Amount
A 1 100
B 8 500
E 4 400
F 8 100
H 4 200
df3:
ID Jan_Number Jan_Amount
A 6 800
H 3 500
B 1 50
G 8 100
I tried the code below and it worked well:
df_outer = pd.merge(df1, df2, on='ID', how='outer')
df_outer = pd.merge(df_outer , df3, on='ID', how='outer')
But I would like to pass all the dfs together and merge them in one go. I tried the code below and got the error shown:
df_outer = pd.merge(df1, df2, df3, on='ID', how='outer')
Please guide me on how to merge 12 months of data, i.e. 12 dfs.
Error:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-32-a63627da7233> in <module>
----> 1 df_outer = pd.merge(df1, df2, df3, on='ID', how='outer')
TypeError: merge() got multiple values for argument 'how'
Expected output:
ID March_Number March_Amount Feb_Number Feb_Amount Jan_Number Jan_Amount
A 10.0 200.0 1.0 100.0 6.0 800.0
B 4.0 300.0 8.0 500.0 1.0 50.0
C 2.0 100.0 NaN NaN NaN NaN
E NaN NaN 4.0 400.0 NaN NaN
F NaN NaN 8.0 100.0 NaN NaN
H NaN NaN 4.0 200.0 3.0 500.0
G NaN NaN NaN NaN 8.0 100.0
We can create a list of the dfs we want to merge, in this case dfl, and then merge them together.
We can add as many dfs as we want: dfl = [df1, df2, df3, ..., dfn]
from functools import reduce
dfl=[df1, df2, df3]
df_merged = reduce(lambda left, right: pd.merge(left, right, on=['ID'],
                                                how='outer'), dfl)
Output
ID March_Number March_Amount Feb_Number Feb_Amount Jan_Number Jan_Amount
0 A 10.0 200.0 1.0 100.0 6.0 800.0
1 B 4.0 300.0 8.0 500.0 1.0 50.0
2 C 2.0 100.0 NaN NaN NaN NaN
3 E NaN NaN 4.0 400.0 NaN NaN
4 F NaN NaN 8.0 100.0 NaN NaN
5 H NaN NaN 4.0 200.0 3.0 500.0
6 G NaN NaN NaN NaN 8.0 100.0
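An alternative sketch for many monthly frames: since every frame is keyed by ID, the outer join can also be written as a single pd.concat over the ID-indexed frames, which avoids repeated pairwise merges (the data below just mirrors the question's three frames):

```python
import pandas as pd

df1 = pd.DataFrame({'ID': ['A', 'B', 'C'],
                    'March_Number': [10, 4, 2],
                    'March_Amount': [200, 300, 100]})
df2 = pd.DataFrame({'ID': ['A', 'B', 'E', 'F', 'H'],
                    'Feb_Number': [1, 8, 4, 8, 4],
                    'Feb_Amount': [100, 500, 400, 100, 200]})
df3 = pd.DataFrame({'ID': ['A', 'H', 'B', 'G'],
                    'Jan_Number': [6, 3, 1, 8],
                    'Jan_Amount': [800, 500, 50, 100]})

dfl = [df1, df2, df3]  # extend with the remaining months as needed

# Outer-align all frames on ID in one step
df_merged = (pd.concat([d.set_index('ID') for d in dfl], axis=1)
               .reset_index())
print(df_merged)
```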
I want to do a left join of two pandas dataframes, d1 and d2. After the join, I want the values from one frame to replace the NULL values in the corresponding columns of the other. Here are my datasets:
vehicle_type vehicle_id sales margin
a 11 200 0.1
b 22 150 0.2
c NaN NaN NaN
d NaN NaN NaN
vehicle_type vehicle_id sales alignment
c 33 210 x
d 44 300 y
I would like the final result to look like the following, where the left join replaces the null vehicle_id and sales values in d1:
vehicle_type vehicle_id sales margin alignment
a 11 200 0.1 NaN
b 22 150 0.2 NaN
c 33 210 NaN x
d 44 300 NaN y
I'm using the following code, but it is not working:
D3 = D1.merge(D2, on='vehicle_type',how='left')
Use DataFrame.combine_first with DataFrame.set_index to correctly align the DataFrames by the vehicle_type column:
df3 = (df1.set_index('vehicle_type')
.combine_first(df2.set_index('vehicle_type'))
.reset_index())
print (df3)
vehicle_type alignment margin sales vehicle_id
0 a NaN 0.1 200.0 11.0
1 b NaN 0.2 150.0 22.0
2 c x NaN 210.0 33.0
3 d y NaN 300.0 44.0
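If keeping the original column order of the left frame matters (combine_first sorts columns alphabetically), a merge-based sketch is another option: left-join, then fill the NaNs in the shared columns from the right frame's values. The '_r' suffix here is just an illustrative choice:

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'vehicle_type': ['a', 'b', 'c', 'd'],
                    'vehicle_id': [11, 22, np.nan, np.nan],
                    'sales': [200, 150, np.nan, np.nan],
                    'margin': [0.1, 0.2, np.nan, np.nan]})
df2 = pd.DataFrame({'vehicle_type': ['c', 'd'],
                    'vehicle_id': [33, 44],
                    'sales': [210, 300],
                    'alignment': ['x', 'y']})

# Left join; overlapping right-hand columns get the '_r' suffix
df3 = df1.merge(df2, on='vehicle_type', how='left', suffixes=('', '_r'))

# Fill the NaNs in the shared columns from df2's values, then drop helpers
for col in ['vehicle_id', 'sales']:
    df3[col] = df3[col].fillna(df3[col + '_r'])
df3 = df3.drop(columns=['vehicle_id_r', 'sales_r'])
print(df3)
```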
I want to merge the content of certain rows, but only where some specific conditions are met.
Here is the test dataframe I am working on
Date Desc Debit Credit Bal
0 04-08-2019 abcdef 45654 NaN 345.0
1 NaN jklmn NaN NaN 6
2 04-08-2019 pqr NaN 23 368.06
3 05-08-2019 abd 23 NaN 345.06
4 06-08-2019 xyz NaN 350.0 695.06
in which I want to join the rows whose Date is NaN into the previous row.
Output required:
Date Desc Debit Credit Bal
0 04-08-2019 abcdefjklmn 45654 NaN 345.06
1 NaN jklmn NaN NaN 6
2 04-08-2019 pqr NaN 23 368.06
3 05-08-2019 abd 23 NaN 345.0
4 06-08-2019 xyz NaN 350.0 695.06
Can anybody help me out with this? I have tried the following:
for j in [x for x in range(lst[0], lst[-1]+1) if x not in lst]:
print (test.loc[j-1:j, ].apply(lambda x: ''.join(str(x)), axis=1))
But could not get the expected result.
You can use:
d = df["Date"].fillna(method='ffill')
df.update(df.groupby(d).transform('sum'))
print(df)
Output:
Date Desc Debit Credit Bal
0 04-08-2019 abcdefjklmn 45654.0 0.0 351.0
1 NaN abcdefjklmn 45654.0 0.0 351.0
2 05-08-2019 abd 45.0 0.0 345.0
3 06-08-2019 xyz 0.0 345.0 54645.0
idx = test.loc[test["Date"].isna()].index
test.loc[idx-1, "Desc"] = test.loc[idx-1]["Desc"].str.cat(test.loc[idx]["Desc"])
test.loc[idx-1, "Bal"] = (test.loc[idx-1]["Bal"].astype(str)
.str.cat(test.loc[idx]["Bal"].astype(str)))
## I tried to add two values but it didn't work as expected, giving 351.0
# test.loc[idx-1, "Bal"] = test.loc[idx-1]["Bal"].values + test.loc[idx]["Bal"].values
Date Desc Debit Credit Bal
0 04-08-2019 abcdefjklmn 45654.0 NaN 345.06.0
1 NaN jklmn NaN NaN 6
2 05-08-2019 abd 45.0 NaN 345
3 06-08-2019 xyz NaN 345.0 54645
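Another sketch, grouping with per-column aggregation so each column gets its own rule: text columns are concatenated, numeric columns summed (min_count=1 keeps NaN when a group is all-NaN), and Bal kept from the first row. Note that keeping the first Bal and joining Desc are assumptions about the intended output; adjust the dict to taste:

```python
import numpy as np
import pandas as pd

test = pd.DataFrame({'Date': ['04-08-2019', np.nan, '04-08-2019',
                              '05-08-2019', '06-08-2019'],
                     'Desc': ['abcdef', 'jklmn', 'pqr', 'abd', 'xyz'],
                     'Debit': [45654, np.nan, np.nan, 23, np.nan],
                     'Credit': [np.nan, np.nan, 23, np.nan, 350.0],
                     'Bal': [345.0, 6, 368.06, 345.06, 695.06]})

# Each non-NaN Date starts a new group, so a NaN-date row
# attaches only to the row directly above it
key = test['Date'].notna().cumsum().rename('grp')

nsum = lambda s: s.sum(min_count=1)  # keep NaN when a group is all-NaN
out = (test.groupby(key, sort=False)
           .agg({'Date': 'first', 'Desc': ''.join,
                 'Debit': nsum, 'Credit': nsum, 'Bal': 'first'})
           .reset_index(drop=True))
print(out)
```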
I have a pandas dataframe structured as follows:
In[1]: df = pd.DataFrame({"A":[10, 15, 13, 18, 0.6],
"B":[20, 12, 16, 24, 0.5],
"C":[23, 22, 26, 24, 0.4],
"D":[9, 12, 17, 24, 0.8 ]})
Out[1]: df
A B C D
0 10.0 20.0 23.0 9.0
1 15.0 12.0 22.0 12.0
2 13.0 16.0 26.0 17.0
3 18.0 24.0 24.0 24.0
4 0.6 0.5 0.4 0.8
From here my goal is to filter multiple columns based on the values in the last row (index 4). More specifically, I need to keep the columns that have a value < 0.6 in the last row. The output should be a df structured as follows:
B C
0 20.0 23.0
1 12.0 22.0
2 16.0 26.0
3 24.0 24.0
4 0.5 0.4
I'm trying this:
In[2]: df[(df[["A", "B", "C", "D"]] < 0.6)]
but I get the following:
Out[2]:
A B C D
0 NaN NaN NaN NaN
1 NaN NaN NaN NaN
2 NaN NaN NaN NaN
3 NaN NaN NaN NaN
4 NaN 0.5 0.4 NaN
I even tried:
df[(df[["A", "B", "C", "D"]] < 0.6).all(axis=0)]
but it gives me an error; it doesn't work.
Can anybody help me?
Use DataFrame.loc with : to return all rows, selecting columns by a condition on the last row via DataFrame.iloc:
df1 = df.loc[:, df.iloc[-1] < 0.6]
print (df1)
B C
0 20.0 23.0
1 12.0 22.0
2 16.0 26.0
3 24.0 24.0
4 0.5 0.4
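The same selection can be written with the boolean mask held in a variable, which makes it easy to inspect which columns pass the test before filtering (a small sketch of the same idea):

```python
import pandas as pd

df = pd.DataFrame({"A": [10, 15, 13, 18, 0.6],
                   "B": [20, 12, 16, 24, 0.5],
                   "C": [23, 22, 26, 24, 0.4],
                   "D": [9, 12, 17, 24, 0.8]})

mask = df.iloc[-1].lt(0.6)          # boolean Series indexed by column name
print(mask[mask].index.tolist())    # columns that satisfy the condition
df1 = df.loc[:, mask]
print(df1)
```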