Perform arithmetic operations (mainly subtraction and division) over a pandas Series with null values - python-3.x

Simply put, I want subtraction and division involving a null value to return the non-null operand unchanged, e.g. 3 / np.nan = 3 or 2 - np.nan = 2.
Using np.nansum and np.nanprod I have handled addition and multiplication, but I don't know how to do the same for subtraction and division.
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3, 4], "b": [1, 2, np.nan, np.nan]})
df
Out[6]:
   a    b  c=a-b  d=a/b
0  1  1.0    0.0    1.0
1  2  2.0    0.0    1.0
2  3  NaN    3.0    3.0
3  4  NaN    4.0    4.0
The c and d columns above show the result I am looking for.

# Use a fill value of 0 for the subtraction (x - 0 == x)
df['c'] = df.a.sub(df.b, fill_value=0)
# Use a fill value of 1 for the division (x / 1 == x)
df['d'] = df.a.div(df.b, fill_value=1)

IIUC, using sub with fill_value:
df.a.sub(df.b, fill_value=0)
Out[251]:
0    0.0
1    0.0
2    3.0
3    4.0
dtype: float64
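For reference, fill_value works by substituting the missing operand before the operation, so it is equivalent to filling NaN with the operation's identity element first. A minimal sketch of that idea, assuming (as in the example) that only column b contains NaNs:
# 0 is the identity for subtraction and 1 for division,
# so filling b's NaNs with them leaves a unchanged
df['c'] = df['a'] - df['b'].fillna(0)
df['d'] = df['a'] / df['b'].fillna(1)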

Related

python pandas data frame: single column to multiple columns based on values

I am new to pandas.
I am trying to split a single column into multiple columns based on an index value using groupby. Below is the program I wrote.
import pandas as pd

data = [(0, 1.1),
        (1, 1.2),
        (2, 1.3),
        (0, 2.1),
        (1, 2.2),
        (0, 3.1),
        (1, 3.2),
        (2, 3.3),
        (3, 3.4)]
df = pd.DataFrame(data, columns=['ID', 'test_data'])
df = df.groupby('ID', sort=True).apply(lambda g: pd.Series(g['test_data'].values))
print(df)
df = df.unstack(level=-1).rename(columns=lambda x: 'test_data%s' % x)
print(df)
I have to use unstack(level=-1) because, when the groups have uneven sizes, the groupby/Series step stores the result as shown below.
ID
0  0    1.1
   1    2.1
   2    3.1
1  0    1.2
   1    2.2
   2    3.2
2  0    1.3
   1    3.3
3  0    3.4
dtype: float64
The end result I am getting after unstack is:
    test_data0  test_data1  test_data2
ID
0          1.1         2.1         3.1
1          1.2         2.2         3.2
2          1.3         3.3         NaN
3          3.4         NaN         NaN
but what I am expecting is:
    test_data0  test_data1  test_data2
ID
0          1.1         2.1         3.1
1          1.2         2.2         3.2
2          1.3         NaN         3.3
3          NaN         NaN         3.4
Let me know if there is any better way to do this other than groupby.
This will work if your dataframe is sorted as you show:
# Each time ID resets to 0, a new block of values starts
df['num_zeros_seen'] = df['ID'].eq(0).cumsum()
# Reshape the table, one column per block
df = df.pivot(
    index='ID',
    columns='num_zeros_seen',
    values='test_data',
)
print(df)
Output:
num_zeros_seen    1    2    3
ID
0               1.1  2.1  3.1
1               1.2  2.2  3.2
2               1.3  NaN  3.3
3               NaN  NaN  3.4
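As a sketch of an equivalent reshape, set_index plus unstack produces the same table and makes it easy to rename the columns to the test_data0-style names from the question (block is a hypothetical helper column name):
import pandas as pd

data = [(0, 1.1), (1, 1.2), (2, 1.3),
        (0, 2.1), (1, 2.2),
        (0, 3.1), (1, 3.2), (2, 3.3), (3, 3.4)]
df = pd.DataFrame(data, columns=['ID', 'test_data'])
# A new block of values starts each time ID resets to 0
df['block'] = df['ID'].eq(0).cumsum()
# set_index + unstack reshapes the same way as pivot
out = df.set_index(['ID', 'block'])['test_data'].unstack()
# Blocks are numbered 1..n; rename them to test_data0, test_data1, ...
out.columns = ['test_data%s' % (c - 1) for c in out.columns]
print(out)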

DataFrame of Dates into sequential dates

I would like to turn a dataframe like the following into a dataframe of sequential dates.
Date
01/25/1995
01/20/1995
01/20/1995
01/23/1995
into
Date        Value  Cumsum
01/20/1995      2       2
01/21/1995      0       2
01/22/1995      0       2
01/23/1995      1       3
01/24/1995      0       3
01/25/1995      1       4
Try this:
df['Date'] = pd.to_datetime(df['Date'])
df_out = df.assign(Value=1).set_index('Date').resample('D')['Value'].sum().to_frame()
df_out['Cumsum'] = df_out['Value'].cumsum()
print(df_out)
Output:
            Value  Cumsum
Date
1995-01-20      2       2
1995-01-21      0       2
1995-01-22      0       2
1995-01-23      1       3
1995-01-24      0       3
1995-01-25      1       4
Note the resample('D').sum(): it both fills in the missing calendar days with 0 and counts duplicate dates (01/20/1995 appears twice), which a plain asfreq() would not.
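An alternative sketch of the same idea, counting occurrences with value_counts and reindexing onto a complete daily range (full_range is a hypothetical name):
# Count how many times each date appears, then fill in missing days with 0
counts = pd.to_datetime(df['Date']).value_counts().sort_index()
full_range = pd.date_range(counts.index.min(), counts.index.max(), freq='D')
out = counts.reindex(full_range, fill_value=0).rename('Value').to_frame()
out['Cumsum'] = out['Value'].cumsum()
print(out)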

Pandas: GroupBy Shift And Cumulative Sum

I want to do a groupby, shift and cumsum, which seems like a pretty trivial task, but I'm still banging my head over the result I'm getting. Can someone please tell me what I am doing wrong? All the results I found online show the same approach or a variation of what I am doing. Below is my implementation.
temp = pd.DataFrame(data=[['a', 1], ['a', 1], ['a', 1], ['b', 1],
                          ['b', 1], ['b', 1], ['c', 1], ['c', 1]],
                    columns=['ID', 'X'])
temp['transformed'] = temp.groupby('ID')['X'].cumsum().shift()
print(temp)
  ID  X  transformed
0  a  1          NaN
1  a  1          1.0
2  a  1          2.0
3  b  1          3.0
4  b  1          1.0
5  b  1          2.0
6  c  1          3.0
7  c  1          1.0
This is wrong; what I am actually looking for is:
  ID  X  transformed
0  a  1          NaN
1  a  1          1.0
2  a  1          2.0
3  b  1          NaN
4  b  1          1.0
5  b  1          2.0
6  c  1          NaN
7  c  1          1.0
Thanks a lot in advance.
You could use transform() to feed the separate groups that are created at each level of groupby into the cumsum() and shift() methods.
temp['transformed'] = \
    temp.groupby('ID')['X'].transform(lambda x: x.cumsum().shift())
  ID  X  transformed
0  a  1          NaN
1  a  1          1.0
2  a  1          2.0
3  b  1          NaN
4  b  1          1.0
5  b  1          2.0
6  c  1          NaN
7  c  1          1.0
For more info on transform() please see here:
https://jakevdp.github.io/PythonDataScienceHandbook/03.08-aggregation-and-grouping.html#Transformation
https://pandas.pydata.org/pandas-docs/version/0.22/groupby.html#transformation
You need apply here: in your code, only cumsum runs on the groupby object, while shift runs on the already-concatenated result, so it shifts values across group boundaries. Putting both calls inside the lambda keeps them per group:
temp['transformed'] = temp.groupby('ID')['X'].apply(lambda x: x.cumsum().shift())
temp
Out[287]:
  ID  X  transformed
0  a  1          NaN
1  a  1          1.0
2  a  1          2.0
3  b  1          NaN
4  b  1          1.0
5  b  1          2.0
6  c  1          NaN
7  c  1          1.0
While working on this problem, I found that as the DataFrame grows, using lambdas in transform gets very slow, and that the built-in DataFrameGroupBy methods (like cumsum and shift) are much faster than lambdas.
So here is my proposed solution: create a 'temp' column to hold the cumsum for each ID, then shift it in a separate groupby:
df['temp'] = df.groupby("ID")['X'].cumsum()
df['transformed'] = df.groupby("ID")['temp'].shift()
df = df.drop(columns=["temp"])
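The same idea can also be sketched without the helper column, by grouping the cumulative sums again by the original keys and shifting within each group (assuming df has the ID and X columns from above):
# Cumulative sum per ID, then a per-ID shift of that result;
# grouping the cumsum Series by df['ID'] avoids the temp column
cumsums = df.groupby('ID')['X'].cumsum()
df['transformed'] = cumsums.groupby(df['ID']).shift()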

How to fill NaN with user defined value in pandas dataframe

How do I fill NaN with a user-defined value in a pandas dataframe?
For text columns like A and B, user-defined text like 'Missing' should be imputed. For discrete numeric variables like C and D, the median should be imputed. I have many columns like these, and I would like to apply the rule to all variables in the dataframe.
DF:
A        B         C    D
A0A1     Railway   10   NaN
A1A1     Shipping  NaN  1
NaN      Shipping  3    2
B1A1     NaN       1    7
DF out:
A        B         C    D
A0A1     Railway   10   2
A1A1     Shipping  3    1
Missing  Shipping  3    2
B1A1     Missing   1    7
You can fillna by passing a dict of per-column fill values (the values here are just for demonstration; use df.C.median() etc. to match your rule):
df.fillna({'A': 'Miss', 'B': 'Your2', 'C': df.C.median(), 'D': df.D.mean()})
Out[373]:
      A         B     C         D
0  A0A1   Railway  10.0  3.333333
1  A1A1  Shipping   3.0  1.000000
2  Miss  Shipping   3.0  2.000000
3  B1A1     Your2   1.0  7.000000
Fun way!
d = {np.dtype('O'): 'Missing'}
df.fillna(df.dtypes.map(d).fillna(df.median()))
         A         B     C    D
0     A0A1   Railway  10.0  2.0
1     A1A1  Shipping   3.0  1.0
2  Missing  Shipping   3.0  2.0
3     B1A1   Missing   1.0  7.0
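To unpack the trick: df.dtypes.map(d) yields 'Missing' for the object columns and NaN for the numeric ones, and the inner fillna(df.median()) replaces those NaNs with the per-column medians, producing one fill value per column. A sketch of the intermediate Series (fill_values is a hypothetical name):
# One fill value per column: 'Missing' for text, the median for numbers
fill_values = df.dtypes.map(d).fillna(df.median())
print(fill_values)
# A    Missing
# B    Missing
# C        3.0
# D        2.0
# dtype: object
df.fillna(fill_values)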
First fill the numeric columns with their medians, then fillna the remaining non-numeric ones:
df = df.fillna(df.median()).fillna('Missing')
print(df)
         A         B     C    D
0     A0A1   Railway  10.0  2.0
1     A1A1  Shipping   3.0  1.0
2  Missing  Shipping   3.0  2.0
3     B1A1   Missing   1.0  7.0
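One caveat: since pandas 2.0, DataFrame.median() raises on object columns instead of silently skipping them, so a version-robust variant of the last answer restricts the median to the numeric columns explicitly:
# numeric_only=True skips the text columns A and B,
# so this works on pandas 2.0+ as well as older versions
df = df.fillna(df.median(numeric_only=True)).fillna('Missing')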

Get number of rows for all combinations of attribute levels in Pandas

I have a dataframe with a bunch of categorical variables; each row corresponds to a product. I wanted to find the number of rows for every combination of attribute levels and ran the following:
att1 = list(frame_base.columns.values)
f1 = att.groupby(att1, as_index=False).size().rename('counts').to_frame()
att1 is the list of all attributes. f1 does not seem to give the correct result, since f1.counts.sum() is not equal to the number of rows before the groupby. Why doesn't this work?
One possible problem is the NaN row — groupby drops rows with NaN in any of the grouping keys, so they are excluded from the counts. (There may also be a typo: you need att instead of frame_base.)
att = pd.DataFrame({'A': [1, 1, 3, np.nan],
                    'B': [1, 1, 6, np.nan],
                    'C': [2, 2, 9, np.nan],
                    'D': [1, 1, 5, np.nan],
                    'E': [1, 1, 6, np.nan],
                    'F': [1, 1, 3, np.nan]})
print(att)
     A    B    C    D    E    F
0  1.0  1.0  2.0  1.0  1.0  1.0
1  1.0  1.0  2.0  1.0  1.0  1.0
2  3.0  6.0  9.0  5.0  6.0  3.0
3  NaN  NaN  NaN  NaN  NaN  NaN
att1 = list(att.columns.values)
f1 = att.groupby(att1).size().reset_index(name='counts')
print(f1)
     A    B    C    D    E    F  counts
0  1.0  1.0  2.0  1.0  1.0  1.0       2
1  3.0  6.0  9.0  5.0  6.0  3.0       1
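If you are on pandas 1.1 or later, another option (sketched under that version assumption) is to keep the NaN keys instead of dropping them, so the counts add back up to the original row count:
# dropna=False keeps the all-NaN row as its own group (pandas >= 1.1)
f1 = att.groupby(att1, dropna=False).size().reset_index(name='counts')
print(f1)
assert f1['counts'].sum() == len(att)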
