Logarithm calculation in Python - python-3.x

I am trying to take the logarithm of a pandas Series, but I get the following error:
TypeError: cannot convert the series to class 'float'
I have a DataFrame with two columns, A and B:
A B
------
1 5
2 6
3 7
I am trying the following:
output = 10*math.log(10, df['A']+df['B'])
Required output:
row1 = 10*math.log(10,6)
row2 = 10*math.log(10,8)
row3 = 10*math.log(10,10)
But I get TypeError: cannot convert the series to class 'float'.

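math.log is meant to work with a single scalar of type float, so passing it a whole Series raises this TypeError. To compute the base-10 logarithm of a DataFrame column (a Series), use numpy.log10, which operates element-wise. Note also that math.log(x, base) takes the value first and the base second, so math.log(10, 6) is the base-6 logarithm of 10; the base-10 logarithm of 6 is math.log10(6) (or math.log(6, 10)). The examples below assume the base-10 logarithm of A + B is what is wanted.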
Example:
import numpy
10*numpy.log10(df['A']+df['B'])
Here's a reproducible example:
>>> import pandas as pd
>>> import numpy as np
>>>
>>> df = pd.DataFrame([[1,5],[2,6],[3,7]], columns=["A","B"])
>>> df
A B
0 1 5
1 2 6
2 3 7
>>> np.log10(df["A"]+df["B"])
0 0.778151
1 0.903090
2 1.000000
dtype: float64
>>>
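A slightly longer sketch using the same A/B values: it stores 10*log10(A + B) as a new column and cross-checks it against the scalar math.log10 applied one value at a time. The column names out and out_math are illustrative only.

```python
import math

import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [5, 6, 7]})
total = df["A"] + df["B"]  # element-wise sum: 6, 8, 10

# Vectorised: np.log10 is a NumPy ufunc, so it is applied to every element of the Series
df["out"] = 10 * np.log10(total)

# Scalar alternative: math.log10 only accepts one number, so apply it value by value
# (works, but is slower than the vectorised version on large frames)
df["out_math"] = total.apply(lambda v: 10 * math.log10(v))

print(df)
```

The vectorised form is usually preferable; the apply version is shown only to make clear why math.log fails on a Series but works on each of its values.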

Related

Pandas Apply Function returns numpy.nan instead of None

My DataFrame has null values, and I would like to replace them with None before sending the data to a database. If I use the apply function, None gets written as numpy.nan in pandas:
import pandas as pd
import numpy as np
df = pd.DataFrame([1,2, 4, 5, np.nan], columns = ['a'])
df.a.apply(lambda x: x if x==x else None)
Output:
0 1.0
1 2.0
2 4.0
3 5.0
4 NaN
Name: a, dtype: float64
If I run the function below, it does write None into the DataFrame:
df.a.apply(lambda x: None)
0 None
1 None
2 None
3 None
4 None
Name: a, dtype: object
This might be because the column's dtype is float rather than object. Is there any workaround for that? Thank you.
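The two outputs above are consistent with the dtype guess: apply infers the result dtype, so a mix of numbers and None comes back as float64 (None stored as NaN), while an all-None result stays object. A minimal workaround sketch, assuming the goal is simply an object column holding None (which Python database drivers typically map to NULL), is to build the Series with dtype=object explicitly so pandas does not re-infer float64:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([1, 2, 4, 5, np.nan], columns=["a"])

# Build an object-dtype Series explicitly: with dtype=object pandas keeps None
# as-is instead of inferring a float64 column (which would turn None into NaN).
df["a"] = pd.Series(
    [v if pd.notna(v) else None for v in df["a"]],
    index=df.index,
    dtype=object,
)

print(df["a"].tolist())
print(df["a"].dtype)
```

The list comprehension is used deliberately instead of apply, because apply would re-infer the dtype and put NaN back.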

How to access list of list values in columns in dataset

In my DataFrame, one column holds a list of values in each row. For example, I have columns A, B and C, plus my output column. Column A holds 12, column B holds 30, and column C holds a list of values such as [0.01, 1.234, 2.31]. When I try to compute the mean of these lists, I get an error saying the list object has no attribute mean. How can I convert every list in the column to its mean?
You can transform the column that contains the lists into another DataFrame and calculate the mean.
import pandas as pd
df = ... # Original df
pd.DataFrame(df['column_with_lists'].values.tolist()).mean(1)
This returns a pandas Series with one mean per row, which looks like the following:
0 mean_of_list_row_0
1 mean_of_list_row_1
. .
. .
. .
n mean_of_list_row_n
You can use apply(np.mean) on the column with the lists in it to get the mean. For example:
Build a dataframe:
import numpy as np
import pandas as pd
df = pd.DataFrame([[2,4],[4,6]])
df[3] = [[5,7],[8,9,10]]
print(df)
0 1 3
0 2 4 [5, 7]
1 4 6 [8, 9, 10]
Use apply(np.mean)
print(df[3].apply(np.mean))
0 6.0
1 9.0
Name: 3, dtype: float64
If you want to convert that column into the mean of the lists:
df[3] = df[3].apply(np.mean)
print(df)
0 1 3
0 2 4 6.0
1 4 6 9.0
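The error in the question arises because a plain Python list has no .mean method. A sketch combining both answers on concrete data: A and B follow the question, the list in row 0 is the one from the question, and row 1 is made up to show a list of a different length. Passing index=df.index keeps the first approach's result aligned with the original rows; the column names C_mean and C_mean_apply are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical data: row 0 follows the question, row 1 is invented
df = pd.DataFrame({
    "A": [12, 7],
    "B": [30, 4],
    "C": [[0.01, 1.234, 2.31], [1.0, 3.0]],
})

# Approach 1: expand the lists into columns (shorter lists are padded with NaN),
# keeping df's index so the result lines up row for row
expanded = pd.DataFrame(df["C"].tolist(), index=df.index)
df["C_mean"] = expanded.mean(axis=1)  # skipna=True by default, so the padding is ignored

# Approach 2: call np.mean on each list directly
df["C_mean_apply"] = df["C"].apply(np.mean)

print(df)
```

Both approaches give the same means here; apply(np.mean) is simpler, while the expand-then-mean version is convenient if you also want the individual list elements as columns.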

How to change the format for values in a dataframe?

I need to change the format of the values in a DataFrame column. If I have a DataFrame in this format:
df =
sector funding_total_usd
1 NaN
2 10,00,000
3 3,90,000
4 34,06,159
5 2,17,50,000
6 20,00,000
How can I change it to this format:
df =
sector funding_total_usd
1 NaN
2 10000.00
3 3900.00
4 34061.59
5 217500.00
6 20000.00
This is my code:
for row in df['funding_total_usd']:
    dt1 = row.replace(',', '')
    print(dt1)
This is the error I got: "AttributeError: 'float' object has no attribute 'replace'"
How can I do this? I would really appreciate your help.
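The AttributeError occurs because the missing value in the first row is NaN, which is a float, and floats have no replace method. Here's a way to get the decimal places, starting from the values as numbers: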
import pandas as pd
import numpy as np
df= pd.DataFrame({'funding_total_usd': [np.nan, 1000000, 390000, 3406159,21750000,2000000]})
print(df)
df['funding_total_usd'] /= 100
print(df)
funding_total_usd
0 NaN
1 1000000.0
2 390000.0
3 3406159.0
4 21750000.0
5 2000000.0
funding_total_usd
0 NaN
1 10000.00
2 3900.00
3 34061.59
4 217500.00
5 20000.00
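To display every float with two decimal places, run this as your first command before you print. Note that this option only changes how floats are displayed; it does not strip commas from string values, which must be converted to numbers first.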
pd.options.display.float_format = '{:.2f}'.format
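The answer above starts from numbers that are already parsed. If the column really holds strings such as "10,00,000" alongside NaN (the situation that raised the AttributeError), a sketch of the conversion, assuming the intended result is the numeric value divided by 100 as in the desired output: the vectorised .str.replace skips non-string entries, so NaN passes through instead of raising.

```python
import numpy as np
import pandas as pd

# Data as shown in the question: comma-grouped strings plus one missing value
df = pd.DataFrame({
    "sector": [1, 2, 3, 4, 5, 6],
    "funding_total_usd": [np.nan, "10,00,000", "3,90,000", "34,06,159",
                          "2,17,50,000", "20,00,000"],
})

# .str.replace leaves NaN as NaN, unlike calling str.replace on a float
cleaned = df["funding_total_usd"].str.replace(",", "", regex=False)

# Parse the digit strings; errors="coerce" turns anything unparseable into NaN.
# Then scale by 100 to match the desired output (assumed from the question).
df["funding_total_usd"] = pd.to_numeric(cleaned, errors="coerce") / 100

print(df)
```

Combined with the display.float_format option, this prints the values with two decimal places.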

Convert floats to ints of a column with numbers and nans

I'm working with Python 3.6 and Pandas 1.0.3.
I would like to convert the floats in column "A" to int, but this column has some NaN values.
So I followed this post, using the solution by @jezrael.
But I get the following error:
"TypeError: cannot safely cast non-equivalent float64 to int64"
This is my code:
import pandas as pd
import numpy as np
data = {'timestamp': [1588757760.0000, 1588757760.0161, 1588757764.7339, 1588757764.9234], 'A':[9087.6000, 9135.8000, np.nan, 9102.1000], 'B':[0.1648, 0.1649, '', 5.3379], 'C':['b', 'a', '', 'a']}
df = pd.DataFrame(data)
df['A'] = pd.to_numeric(df['A'], errors='coerce').astype('Int64')
print(df)
Did I miss something?
Your problem is that you have true float numbers (such as 9087.6), not integers stored as floats. For safety reasons pandas will not convert them, because the cast would change their values.
So you first need to round them to integers explicitly, and only then use the .astype() method:
df['A'] = pd.to_numeric(df['A'].round(), errors='coerce').astype('Int64')
Test:
print(df)
timestamp A B C
0 1.588758e+09 9088 0.1648 b
1 1.588758e+09 9136 0.1649 a
2 1.588758e+09 NaN
3 1.588758e+09 9102 5.3379 a
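One way to do it is to replace NaN with a sentinel integer (one that never appears in the real data), cast, and then restore NaN:
df['A'] = df['A'].fillna(99999999).astype(np.int64, errors='ignore')
df['A'] = df['A'].replace(99999999, np.nan)
df
Note that astype truncates rather than rounds (9087.6 becomes 9087), and restoring np.nan turns the column back into float64, so the values display with a trailing .0:
timestamp A B C
0 1.588758e+09 9087.0 0.1648 b
1 1.588758e+09 9135.0 0.1649 a
2 1.588758e+09 NaN
3 1.588758e+09 9102.0 5.3379 a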
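A small sketch on the column A values from the question, contrasting the two answers: rounding before casting to pandas' nullable Int64 dtype (capital I, which stores missing values as <NA>), versus truncating toward zero. np.trunc is swapped in here for the sentinel trick, so the NaN never has to be replaced and restored and the column stays integer-typed; the names rounded and truncated are illustrative.

```python
import numpy as np
import pandas as pd

# Column A from the question
s = pd.Series([9087.6, 9135.8, np.nan, 9102.1])

# Round to whole numbers first, then cast; the cast is now "safe" because
# every non-missing value is integral
rounded = s.round().astype("Int64")

# Truncate toward zero instead of rounding (same values as the sentinel answer)
truncated = np.trunc(s).astype("Int64")

print(rounded)
print(truncated)
```

Which of the two is right depends on whether 9087.6 should become 9088 or 9087; both keep the missing value as <NA> in an integer column.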

Sort pandas dataframe by a column

I have a pandas dataframe as below:
import pandas as pd
import numpy as np
import datetime
# initialise data of lists.
data = {'A' :[1,1,1,1,2,2,2,2],
'B' :[2,3,1,5,7,7,1,6]}
# Create DataFrame
df = pd.DataFrame(data)
df
I want to sort 'B' within each group of 'A'.
Expected Output:
A B
0 1 1
1 1 2
2 1 3
3 1 5
4 2 1
5 2 6
6 2 7
7 2 7
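You can sort a DataFrame with the sort_values method. Passing both columns sorts by A first and then by B within each group of A, as requested. Note that sort_values keeps the original index labels; chain .reset_index(drop=True) if you want the 0 to 7 index shown in the expected output.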
df.sort_values(by=['A', 'B'])
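To reproduce the expected output exactly, including the renumbered index, a sketch that chains reset_index(drop=True) after sort_values (drop=True discards the old index instead of adding it as a column); the name out is illustrative.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 1, 1, 1, 2, 2, 2, 2],
                   "B": [2, 3, 1, 5, 7, 7, 1, 6]})

# Sort by A first, then by B within each A group, and renumber the rows 0..7
out = df.sort_values(by=["A", "B"]).reset_index(drop=True)
print(out)
```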