Pandas Apply Function returns numpy.nan instead of None - python-3.x

My DataFrame has null values that I would like to replace with None before sending the data to a database. If I use an apply function, the None gets written back as numpy.nan in pandas.
import pandas as pd
import numpy as np
df = pd.DataFrame([1,2, 4, 5, np.nan], columns = ['a'])
df.a.apply(lambda x: x if x==x else None)
Output:
0 1.0
1 2.0
2 4.0
3 5.0
4 NaN
Name: a, dtype: float64
If I run the function below, it writes None into the DataFrame.
df.a.apply(lambda x: None)
0 None
1 None
2 None
3 None
4 None
Name: a, dtype: object
This might be because the column's dtype is float and not object. Is there a workaround for that? Thank you.
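A commonly cited workaround (not part of the original question, and behaviour can vary slightly between pandas versions) is to cast the column to object before filling, so pandas does not coerce the None back to NaN; a minimal sketch:
import pandas as pd
import numpy as np
df = pd.DataFrame([1, 2, 4, 5, np.nan], columns=['a'])
# Casting to object first lets the column hold a real Python None.
out = df.a.astype(object).where(df.a.notna(), None)
print(out)  # the last element is None and the dtype is object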

Related

Convert floats to ints of a column with numbers and nans

I'm working with Python 3.6 and Pandas 1.0.3.
I would like to convert the floats from column "A" to int. This column has some NaN values.
So I followed this post with the solution from #jezrael.
But I get the following error:
"TypeError: cannot safely cast non-equivalent float64 to int64"
This is my code:
import pandas as pd
import numpy as np
data = {'timestamp': [1588757760.0000, 1588757760.0161, 1588757764.7339, 1588757764.9234], 'A':[9087.6000, 9135.8000, np.nan, 9102.1000], 'B':[0.1648, 0.1649, '', 5.3379], 'C':['b', 'a', '', 'a']}
df = pd.DataFrame(data)
df['A'] = pd.to_numeric(df['A'], errors='coerce').astype('Int64')
print(df)
Did I miss something?
Your problem is that you have true float numbers, not integers stored in float form. For safety reasons pandas will not convert them, because you would get different values.
So you first need to explicitly round them to integers, and only then use the .astype() method:
df['A'] = pd.to_numeric(df['A'].round(), errors='coerce').astype('Int64')
Test:
print(df)
      timestamp     A       B  C
0  1.588758e+09  9088  0.1648  b
1  1.588758e+09  9136  0.1649  a
2  1.588758e+09   NaN
3  1.588758e+09  9102  5.3379  a
Another way to do it is to temporarily convert NaN to a sentinel integer:
df['A'] = df['A'].fillna(99999999).astype(np.int64, errors='ignore')
df['A'] = df['A'].replace(99999999, np.nan)
df
      timestamp     A       B  C
0  1.588758e+09  9087  0.1648  b
1  1.588758e+09  9135  0.1649  a
2  1.588758e+09   NaN
3  1.588758e+09  9102  5.3379  a
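A side note not in the original answer: replacing the sentinel with np.nan upcasts column A back to float64, because np.nan is itself a float; if real integers with missing values are needed, the nullable Int64 approach above avoids this. A quick check, assuming the same df:
df['A'] = df['A'].fillna(99999999).astype(np.int64)
df['A'] = df['A'].replace(99999999, np.nan)
print(df['A'].dtype)  # float64 again, since np.nan forces a float column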

Replacing NaN with None fills in the value from the previous row in a DataFrame (pandas 1.0.3)

Not sure if it is a bug, but I am unable to replace a NaN value with None using the latest pandas library. When I use the DataFrame.replace() method to replace NaN with None, the DataFrame takes the value from the previous row instead of None. For example,
import numpy as np
import pandas as pd
df = pd.DataFrame({'x': [10, 20, np.nan], 'y': [30, 40, 50]})
print(df)
Outputs
x y
0 10.0 30
1 20.0 40
2 NaN 50
And if I apply replace method
print(df.replace(np.NaN, None))
Output follows. Cell (x, 2) should be None instead of 20.0.
x y
0 10.0 30
1 20.0 40
2 20.0 50
Any help is appreciated.
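A likely explanation, based on the pandas 1.0.3 documentation rather than the original post: when DataFrame.replace() is called with value=None and a scalar to_replace, pandas treats the value as not supplied and falls back to method='pad', which forward-fills from the previous row. Passing the mapping as a dict sidesteps that fallback; a hedged sketch:
import numpy as np
import pandas as pd
df = pd.DataFrame({'x': [10, 20, np.nan], 'y': [30, 40, 50]})
# With a dict, None is used as the actual replacement value, and the
# column is upcast to object so it can hold None.
print(df.replace({np.nan: None}))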

Delete row from dataframe having "None" value in all the columns - Python

I need to delete the row completely in a dataframe having "None" value in all the columns. I am using the following code -
df.dropna(axis=0,how='all',thresh=None,subset=None,inplace=True)
This makes no difference to the DataFrame; the rows with "None" values are still there.
How can I achieve this?
Those Nones are probably strings, so use replace first:
df = df.replace('None', np.nan).dropna(how='all')
df = pd.DataFrame({
    'a': ['None', 'a', 'None'],
    'b': ['None', 'g', 'None'],
    'c': ['None', 'v', 'b'],
})
print (df)
a b c
0 None None None
1 a g v
2 None None b
df1 = df.replace('None', np.nan).dropna(how='all')
print (df1)
a b c
1 a g v
2 NaN NaN b
Or test for the 'None' values with not-equal (ne) and DataFrame.any:
df1 = df[df.ne('None').any(axis=1)]
print (df1)
a b c
1 a g v
2 None None b
You can also drop along axis 1. Use the how keyword to drop columns with any or all NaN values. Check the docs.
import pandas as pd
import numpy as np
df = pd.DataFrame({'a':[1,2,3], 'b':[-1, 0, np.nan], 'c':[np.nan, np.nan, np.nan]})
df
a b c
0 1 -1.0 NaN
1 2 0.0 NaN
2 3 NaN NaN
df.dropna(axis=1, how='any')
a
0 1
1 2
2 3
df.dropna(axis=1, how='all')
a b
0 1 -1.0
1 2 0.0
2 3 NaN
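For the original question, which asks about rows rather than columns, the same how keyword works along axis=0 once the 'None' strings have been converted to real NaN (a sketch combining the two answers, not from the original post):
df = df.replace('None', np.nan).dropna(axis=0, how='all')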

Logarithm calculation in python

I am trying to compute a logarithm on a Series, but I get the following error:
TypeError: cannot convert the series to <class 'float'>
I have a DataFrame with two columns, A and B:
A B
------
1 5
2 6
3 7
I am trying the following:
O/p = 10*math.log(10,df['A']+df['B'])
Required output:
row1 = 10*math.log(10,6)
row2 = 10*math.log(10,8)
row3 = 10*math.log(10,10)
But I am getting TypeError: cannot convert the series to <class 'float'>.
math.log is meant to work with a scalar of type float. To compute log10 of a DataFrame column, which is a Series, use numpy.log10 (see the NumPy documentation).
Example:
import numpy
10*numpy.log10(df['A']+df['B'])
Here's a reproducible example:
>>> import pandas as pd
>>> import numpy as np
>>>
>>> df = pd.DataFrame([[1,5],[2,6],[3,7]], columns=["A","B"])
>>> df
A B
0 1 5
1 2 6
2 3 7
>>> np.log10(df["A"]+df["B"])
0 0.778151
1 0.903090
2 1.000000
dtype: float64
>>>
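One caveat, which is my reading rather than part of the original answer: math.log(10, 6) is the logarithm of 10 to base 6, so if the required output really is base (A+B) rather than base 10, the vectorised equivalent would use the change-of-base formula:
# Hypothetical alternative if log base (A+B) of 10 was intended:
10 * np.log(10) / np.log(df['A'] + df['B'])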

Element-wise Maximum of Two DataFrames Ignoring NaNs

I have two dataframes (df1 and df2) that each have the same rows and columns. I would like to take the maximum of these two dataframes, element-by-element. In addition, the result of any element-wise maximum with a number and NaN should be the number. The approach I have implemented so far seems inefficient:
def element_max(df1, df2):
    import pandas as pd
    cond = df1 >= df2
    res = pd.DataFrame(index=df1.index, columns=df1.columns)
    res[(df1==df1)&(df2==df2)&(cond)] = df1[(df1==df1)&(df2==df2)&(cond)]
    res[(df1==df1)&(df2==df2)&(~cond)] = df2[(df1==df1)&(df2==df2)&(~cond)]
    res[(df1==df1)&(df2!=df2)&(~cond)] = df1[(df1==df1)&(df2!=df2)]
    res[(df1!=df1)&(df2==df2)&(~cond)] = df2[(df1!=df1)&(df2==df2)]
    return res
Any other ideas? Thank you for your time.
A more readable way to do this in recent versions of pandas is concat-and-max:
import numpy as np
import pandas as pd
A = pd.DataFrame([[1., 2., 3.]])
B = pd.DataFrame([[3., np.nan, 1.]])
pd.concat([A, B]).max(level=0)
#
# 0 1 2
# 0 3.0 2.0 3.0
#
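Note, not part of the original answer: the level argument to max() was deprecated in later pandas releases and removed in pandas 2.0; on current versions the equivalent, as far as I know, is the groupby form:
pd.concat([A, B]).groupby(level=0).max()
#
#      0    1    2
# 0  3.0  2.0  3.0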
You can use where to test your df against another df: where the condition is True, the values from df are returned, and where it is False, the values from df1 are returned. Additionally, in the case where NaN values are in df1, a follow-up call to fillna(df) will use the values from df to fill those NaN and return the desired DataFrame:
In [178]:
df = pd.DataFrame(np.random.randn(5,3))
df.iloc[1,2] = np.NaN
print(df)
df1 = pd.DataFrame(np.random.randn(5,3))
df1.iloc[0,0] = np.NaN
print(df1)
0 1 2
0 2.671118 1.412880 1.666041
1 -0.281660 1.187589 NaN
2 -0.067425 0.850808 1.461418
3 -0.447670 0.307405 1.038676
4 -0.130232 -0.171420 1.192321
0 1 2
0 NaN -0.244273 -1.963712
1 -0.043011 -1.588891 0.784695
2 1.094911 0.894044 -0.320710
3 -1.537153 0.558547 -0.317115
4 -1.713988 -0.736463 -1.030797
In [179]:
df.where(df > df1, df1).fillna(df)
Out[179]:
0 1 2
0 2.671118 1.412880 1.666041
1 -0.043011 1.187589 0.784695
2 1.094911 0.894044 1.461418
3 -0.447670 0.558547 1.038676
4 -0.130232 -0.171420 1.192321
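Another option worth mentioning, taken from NumPy rather than either answer: numpy.fmax computes an element-wise maximum that returns the non-NaN operand whenever exactly one value is NaN, which matches the requirement directly. A sketch, assuming df and df1 share the same index and columns:
import numpy as np
import pandas as pd
# np.fmax ignores NaN unless both operands are NaN.
res = pd.DataFrame(np.fmax(df.to_numpy(), df1.to_numpy()),
                   index=df.index, columns=df.columns)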
