How to call a different attribute of a Pandas dataframe using a variable? - python-3.x

Consider df, a pandas data frame with 10 different columns and 500 rows. The user is asked to pick a column name, which is stored in var1.
I am trying to access the column corresponding to var1 and change its data type, but I get an error.
Is there any way to solve this problem?
var1=input('Enter the file name:').lower().capitalize()
df[var1]=df.var1.astype(float)
error:
'DataFrame' object has no attribute 'file_name'

The approach you're taking, df.var1, makes pandas look literally for a column/attribute named var1. The correct way to access the column whose name is stored in var1 is df[var1], which looks up whatever string var1 contains. See the example below for more detail:
>>> import pandas as pd
>>> var1 = 'hello'
>>> df = pd.DataFrame({'hello': [1]})
>>> df
   hello
0      1
>>> df.var1
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Users/blacksite/Documents/envs/dsenv/lib/python3.6/site-packages/pandas/core/generic.py", line 5067, in __getattr__
return object.__getattribute__(self, name)
AttributeError: 'DataFrame' object has no attribute 'var1'
>>> df.hello
0    1
Name: hello, dtype: int64
>>> df[var1]
0    1
Name: hello, dtype: int64
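Applied to the code in the question, the fix is to use bracket lookup with the variable on both sides. A minimal sketch, where the column name and sample values are made up for illustration:

import pandas as pd

# Hypothetical stand-in for the questioner's 10-column, 500-row dataframe
df = pd.DataFrame({'Price': ['1.5', '2.0', '3.25']})

var1 = 'Price'  # whatever the user typed, after any normalization
df[var1] = df[var1].astype(float)  # bracket lookup uses the string stored in var1
print(df[var1].dtype)  # float64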

Related

How to select a row in a pandas DataFrame datetime index using a datetime variable?

I am not a professional programmer at all and am slowly accumulating experience in Python.
This is the issue I encountered.
On my dev machine I have Python 3.7 installed with pandas version 0.24.4,
and the following sequence works perfectly fine.
>>> import pandas as pd
>>> df = pd.Series(range(3), index=pd.date_range("2000", freq="D", periods=3))
>>> df
2000-01-01    0
2000-01-02    1
2000-01-03    2
Freq: D, dtype: int64
>>> import datetime
>>> D = datetime.date(2000,1,1)
>>> df[D]
0
In the production environment the pandas version is 1.1.4, and the sequence described above no longer works.
>>> df[D]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/ec2-user/.local/lib/python3.7/site-packages/pandas/core/series.py", line 882, in __getitem__
return self._get_value(key)
File "/home/ec2-user/.local/lib/python3.7/site-packages/pandas/core/series.py", line 989, in _get_value
loc = self.index.get_loc(label)
File "/home/ec2-user/.local/lib/python3.7/site-packages/pandas/core/indexes/datetimes.py", line 622, in get_loc
raise KeyError(key)
KeyError: datetime.date(2000, 1, 1)
Then, unexpectedly, converting D to a string made the following command work:
>>> df[str(D)]
0
Any idea why this behaviour changed between versions?
Is this behaviour a bug, or is it here to stay?
Should I convert all the datetime-variable selections in the code to strings, or is there a more robust, future-proof way to do this?
It depends on the pandas version. For a more robust solution, use a datetime.datetime (rather than a datetime.date) to match the DatetimeIndex:
import datetime
D = datetime.datetime(2000,1,1)
print (df[D])
0
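For context, a self-contained sketch of the different lookups (the Series construction is copied from the question; the pd.Timestamp and string alternatives are extras I am adding, not part of the answer above):

import datetime
import pandas as pd

df = pd.Series(range(3), index=pd.date_range("2000", freq="D", periods=3))

# In newer pandas a datetime.date no longer matches a DatetimeIndex label,
# but a datetime.datetime (or pd.Timestamp, or a date string) does.
print(df[datetime.datetime(2000, 1, 1)])  # 0
print(df[pd.Timestamp("2000-01-01")])     # 0
print(df["2000-01-01"])                   # 0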

TypeError: 'tuple' object is not callable when converting a list to tuple as return value

I want to convert the final list to a tuple. However, I am receiving an error. How can I get rid of it?
li = [(19343160,), (39343169,)]

def render_list_sql(li):
    l = []
    for index, tuple in enumerate(li):
        idd = str(tuple[0])
        l.append(idd)
    return tuple(l)

print(render_list_sql(li))
Expected value to be returned is:
(19343160,39343169)
Error
Traceback (most recent call last):
File "test.py", line 20, in <module>
print(render_list_sql(list))
File "test.py", line 14, in render_list_sql
return tuple(l)
TypeError: 'tuple' object is not callable
As commented, don't use names for variables that mean other things to Python. This is called "shadowing" and you lose the meaning of the original name.
Example:
>>> tuple # This is the class used to create tuples.
<class 'tuple'>
>>> for index,tuple in enumerate([1,2,3]): # This is similar to your code
... print(index,tuple)
...
0 1
1 2
2 3
>>> tuple # tuple is no longer a class, but an instance of an integer.
3
>>> tuple([1,2,3]) # so this fails
Traceback (most recent call last):
File "<interactive input>", line 1, in <module>
TypeError: 'int' object is not callable
>>> 3([1,2,3]) # You are basically doing this:
<interactive input>:1: SyntaxWarning: 'int' object is not callable; perhaps you missed a comma?
Traceback (most recent call last):
File "<interactive input>", line 1, in <module>
TypeError: 'int' object is not callable
So don't do that:
li = [(19343160,), (39343169,)]  # don't reassign list

def render_list_sql(li):
    l = []
    for index, tup in enumerate(li):  # don't reassign tuple
        idd = str(tup[0])
        l.append(idd)
    return tuple(l)  # now this will work

print(render_list_sql(li))
Output:
('19343160', '39343169')
FYI, a shorter version using a generator:
li = [(19343160,),(39343169,)]
tup = tuple(str(i[0]) for i in li)
print(tup)

Pandas Dataframe: I want to use round() while dividing columns, but I'm getting an error

TypeError: loop of ufunc does not support argument 0 of type float which has no callable rint method
state_trend['Percent Change'].round(2)
There's a bunch of floats and ints in that column. Any idea what the problem is?
The error message is a bit misleading here and actually has nothing to do with the zeros.
The quick answer is:
state_trend['Percent Change'].astype(float).round(2)
I managed to replicate the error when the series is of dtype object. For float dtype the round function works well, and in the object case you get the same error with or without zeros.
So all this code works as expected:
>>> import pandas as pd
>>> s1 = pd.Series([0, 1, 2]) / pd.Series([1,2,3])
>>> s2 = pd.Series([1, 1, 2]) / pd.Series([1,2,3])
>>> s1.round(2)
0    0.00
1    0.50
2    0.67
dtype: float64
>>> s2.round(2)
0    1.00
1    0.50
2    0.67
dtype: float64
But when you convert the series to object you get the rounding error:
>>> t1 = s1.astype(object)
>>> t2 = s2.astype(object)
>>> t1.round(2)
AttributeError: 'float' object has no attribute 'rint'
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/lib/python3.8/site-packages/pandas/core/series.py", line 2218, in round
result = self._values.round(decimals)
TypeError: loop of ufunc does not support argument 0 of type float which has no callable rint method
And this is with and without the zero:
>>> t2.round(2)
AttributeError: 'float' object has no attribute 'rint'
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/lib/python3.8/site-packages/pandas/core/series.py", line 2218, in round
result = self._values.round(decimals)
TypeError: loop of ufunc does not support argument 0 of type float which has no callable rint method
So I just converted to float before the rounding:
>>> t1.astype(float).round(2)
0    0.00
1    0.50
2    0.67
dtype: float64
A final question is how state_trend['Percent Change'] became of dtype object in the first place. Generally this happens if the column contained None values, or if the series used to compute the percentage were of dtype object themselves.
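If the column might contain entries that cannot be cast directly, a slightly more defensive variant (my suggestion, not part of the answer above) is pd.to_numeric with errors='coerce', which turns unparseable values into NaN:

import pandas as pd

# Hypothetical object-dtype column mixing floats and a missing value
state_trend = pd.DataFrame({'Percent Change': pd.Series([0.5, None, 2 / 3], dtype=object)})

rounded = pd.to_numeric(state_trend['Percent Change'], errors='coerce').round(2)
print(rounded)  # 0.50, NaN, 0.67 as float64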

What is the correct call for lists

Why is it that every time I set a variable as a list, I get a Traceback error when I try to append to the list:
>>> a = list
>>> a.append('item1')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: descriptor 'append' requires a 'list' object but received a 'str'
>>> type(a)
<class 'type'>
list is not an attribute that defines a list; it is the built-in used to convert objects to lists. It is explained here.
What is the Python way to define a variable as a list?
To initialize a var with a certain type, use () after the name of the class:
a = list()
You can also instantiate a list like this:
mylist = []
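A quick check that appending works once the variable is an actual list instance rather than the list class itself:

a = list()          # or simply: a = []
a.append('item1')   # works now: a is a list instance, not the list type
print(a)            # ['item1']
print(type(a))      # <class 'list'>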

I am trying to read a CSV file with pandas, then search for a string in the first column, and use the whole row for calculations

I am reading a CSV file with pandas, and then I try to find a word like "Net income" in the first column. Then I want to use the whole row which has this structure: string/number/number/number/... to do some calculations with the numbers.
The problem is that find is not working.
data = pd.read_csv(name)
data.str.find('Net income')
I am using CSV files from here: Income Statement for Deutsche Lufthansa AG (DLAKF) from Morningstar.com
I found this: Python | Pandas Series.str.find() - GeeksforGeeks
Traceback (most recent call last):
File "C:\Users\thoma\Desktop\python programme\manage.py", line 16, in <module>
data.str.find('Net income')
File "C:\Users\thoma\AppData\Roaming\Python\Python37\site-packages\pandas\core\generic.py", line 5067, in __getattr__
return object.__getattribute__(self, name)
AttributeError: 'DataFrame' object has no attribute 'str'
So, it works now. But I still have a question. After using the describe function with pandas I get this:
<bound method NDFrame.describe of 2014-12 615
2015-12 612
2016-12 636
2017-12 713
2018-12 736
Name: Goodwill, dtype: object>
I have problems using the data. How can I, for example, use the second column here? I tried to create a new table:
new_Table['Goodwill'] = data1['Goodwill'].describe
but this does not work.
I also would like to add more "second" columns to new_Table.
You should select the column first, e.g. df['col name'].str.find(x); str.find requires a Series, not a DataFrame.
I recommend setting your header row if pandas isn't recognizing named rows in your CSV file.
Something like:
new_header = data.iloc[0] #grab the first row for the header
data = data[1:] #take the data less the header row
data.columns = new_header
From there you can summarize each column by name:
data['Net Income'].describe()
Edit: I looked at the CSV file; I recommend reshaping the data before analyzing columns. Something like...
data = data.transpose()
So in summation:
data = pd.read_csv(name)
data = data.transpose()  # flip the columns/rows
new_header = data.iloc[0] #grab the first row for the header
data = data[1:] #take the data less the header row
data.columns = new_header
data['Net Income'].describe()  # analyze
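For the original goal of locating the "Net income" row by matching a string in the first column, something along these lines should also work (a sketch, not from the answer above; the header=None layout and numeric columns are assumptions about the Morningstar CSV, and name stands for the file path variable used in the question):

import pandas as pd

name = 'lufthansa_income_statement.csv'   # hypothetical path; the question reads it from a variable

data = pd.read_csv(name, header=None)     # keep the line-item labels in column 0

mask = data.iloc[:, 0].astype(str).str.contains('Net income', case=False, na=False)
net_income_row = data[mask]               # the whole matching row(s)

# numeric part of the row, coerced so stray text becomes NaN
values = net_income_row.iloc[:, 1:].apply(pd.to_numeric, errors='coerce')
print(values)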
