TypeError: unhashable type: 'Int64Index' - python-3.x

The section of my code that is causing me problems is
def Half_Increase(self):
    self.keg_count=summer17.iloc[self.result_rows,2].values[0]
    self.keg_count +=1
    summer17[self.result_rows,2] = self.keg_count
    print(keg_count)
So this function is to be executed when a button widget is pressed. It's supposed to get the value from a specific cell in a dataframe, add 1 to it, and then return the new value to the dataframe. (I'm not entirely sure if this is the proper way to do this.)
I get the following error
Exception in Tkinter callback
Traceback (most recent call last):
  File "C:\Python3.6\lib\tkinter\__init__.py", line 1699, in __call__
    return self.func(*args)
  File "beerfest_program_v0.3.py", line 152, in Half_Increase
    summer17[self.result_rows,2] = self.keg_count
  File "C:\Python3.6\lib\site-packages\pandas\core\frame.py", line 2331, in __setitem__
    self._set_item(key, value)
  File "C:\Python3.6\lib\site-packages\pandas\core\frame.py", line 2397, in _set_item
    value = self._sanitize_column(key, value)
  File "C:\Python3.6\lib\site-packages\pandas\core\frame.py", line 2596, in _sanitize_column
    if broadcast and key in self.columns and value.ndim == 1:
  File "C:\Python3.6\lib\site-packages\pandas\core\indexes\base.py", line 1640, in __contains__
    hash(key)
  File "C:\Python3.6\lib\site-packages\pandas\core\indexes\base.py", line 1667, in __hash__
    raise TypeError("unhashable type: %r" % type(self).__name__)
TypeError: unhashable type: 'Int64Index'
I'm guessing this has something to do with the variable types not matching, but I've looked and can't find how to remedy this.

I think you need iloc:
summer17.iloc[result_rows,2] += 1
Sample:
import pandas as pd

summer17 = pd.DataFrame({'a':[1,2,3],
                         'b':[3,4,5],
                         'c':[5,9,7]})
#if result_rows is scalar
result_rows = 1
print(summer17)
   a  b  c
0  1  3  5
1  2  4  9
2  3  5  7
summer17.iloc[result_rows,2] += 1
print(summer17)
   a  b   c
0  1  3   5
1  2  4  10
2  3  5   7
It is the same as:
#get value
keg_count=summer17.iloc[result_rows,2]
#increment
keg_count +=1
#set value
summer17.iloc[result_rows,2] = keg_count
print(summer17)
   a  b   c
0  1  3   5
1  2  4  10
2  3  5   7
But if result_rows is a list or 1-D array:
result_rows = [1,2]
#get all values at the positions defined in result_rows
#keep only the first value with values[0]
keg_count=summer17.iloc[result_rows,2].values[0]
#increment
keg_count +=1
#set all rows in result_rows to the incremented value
summer17.iloc[result_rows,2] = keg_count
print(summer17)
   a  b   c
0  1  3   5
1  2  4  10
2  3  5  10
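The traceback in the question suggests that result_rows was an index object (row labels) rather than a list of positions. A minimal sketch of that case, assuming the labels come from a filter on column 'a' (the filter is made up for illustration): .loc takes the labels directly, while .iloc needs them converted to positions first.

```python
import pandas as pd

summer17 = pd.DataFrame({"a": [1, 2, 3], "b": [3, 4, 5], "c": [5, 9, 7]})

# Hypothetical lookup: result_rows holds the index *labels* of matching rows
result_rows = summer17.index[summer17["a"] == 2]

# .loc selects by label, so the index object can be used with a column name
summer17.loc[result_rows, "c"] += 1

# .iloc selects by position, so translate labels to positions first
positions = summer17.index.get_indexer(result_rows)
summer17.iloc[positions, 2] += 1

print(summer17)
```

Plain summer17[rows, 2] = value is different: pandas treats the whole tuple as a single column key and tries to hash it, which is where the "unhashable type" error comes from.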

Related

What am I doing wrong in passing the parameters?

My function:
def buttons_for_country(main_master,datasets):
    country_list = every_country_in_datasets(datasets)
    rows = 0
    columns = 0
    for i in range(1,len(country_list)):
        name = "button",i
        name = tkinter.Button(master = main_master,
                              command = lambda: plot(10000),
                              height = 2,
                              width=10,
                              text=country_list[i-1])
        if rows == 12:
            rows = 0
            columns += 1
        name.grid(rows,columns)
        rows += 1
        name.pack()
An error is raised at name.grid(rows,columns):
Traceback (most recent call last):
  File "c:/Python/Covid/Cov_predict.py", line 93, in <module>
    buttons_for_country(window,df)
  File "c:/Python/Covid/Cov_predict.py", line 75, in buttons_for_country
    name.grid(rows,columns)
TypeError: grid_configure() takes from 1 to 2 positional arguments but 3 were given
It seems fine to me, since I am passing 2 parameters, rows and columns, but the error says I gave 3. What am I doing wrong here?
You should specify the row and column as keyword arguments.
name.grid(row=rows, column=columns)
You also need to remove name.pack() - a widget can only be controlled by a single geometry manager, and the last one you use is the one that is in control. Calling pack() after calling grid() removes all of the benefits of calling grid().
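A minimal sketch of the corrected loop, assuming labels stands in for the country list and on_click for the plot callback (buttons_for_labels, grid_position and both parameter names are hypothetical). The row/column arithmetic lives in a small helper so it can be checked without opening a window. Note that enumerate also covers the last label, which range(1, len(country_list)) with country_list[i-1] skips.

```python
def grid_position(index, max_rows=12):
    """Return (row, column) for the index-th button, filling each column top to bottom."""
    column, row = divmod(index, max_rows)
    return row, column


def buttons_for_labels(master, labels, on_click, max_rows=12):
    """Create one button per label and place it with grid() only (no pack())."""
    import tkinter  # imported lazily so grid_position can be used without a display

    buttons = []
    for index, label in enumerate(labels):
        row, column = grid_position(index, max_rows)
        button = tkinter.Button(master, text=label, height=2, width=10,
                                command=on_click)
        # row and column must be passed as keyword arguments
        button.grid(row=row, column=column)
        buttons.append(button)
    return buttons
```

It would be called as buttons_for_labels(window, country_list, lambda: plot(10000)), followed by window.mainloop().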

Creating two columns from an unstructured file of IDs and sequences

Problem: Working with Python 3.x, I have a file called input.txt with the content below
2345673 # First ID
0100121102020211111002 # first sequence (seq) which is long and goes to several lines
0120102100211001101200
6758442 #Second ID
0202111100011111022222 #second sequence (seq) which is long and goes to several lines
0202111110001120211210
0102101011211001101200
What I want: to process input.txt and save the results in output.csv, so that when I read it in pandas the result is a data frame like the one below.
ID       Seq
2345673  0 1 0 0 1 2 1 1 0 2 …
6758442  0 2 0 2 1 1 1 1 0 0 …
Below is my code:
with open("input.txt") as f:
    with open("out.csv", "w") as f1:
        for i, line in enumerate(f): #read each line in file
            if(len(line) < 15 ): #check if length line is say < 15
                id = line # if yes, make line ID
            else:
                seq = line # if not make it a sequence
                #print(id)
                lines = []
                lines.append(','.join([str(id),str(seq)]))
                for l in lines:
                    f1.write('('+l+'),\n') #write to file f1
When I read out.csv in pandas, the output is not what I want (see below). I would appreciate your help; I am really stuck.
(2345673
,0100121102020211111002
),
(2345673
,0120102100211001101200
),
(6758442
,0202111100011111022222
),
(6758442
,0202111110001120211210
),
(6758442
,0102101011211001101200),
import pandas as pd
### idea is to create two lists: one with ids and another with sequences
with open("input.txt") as f:
    ids=[]
    seqs=[]
    seq=""
    for i, line in enumerate(f):
        if (len(line) < 15 ) :
            seqs.append(seq)
            id=line
            id=id.rstrip('\n')
            id=id.rstrip(' ')
            ids.append(id)
            seq=""
        else:
            #next three lines combine all sequences that correspond to the same id into one
            additional_seq = line.rstrip('\n')
            additional_seq = additional_seq.rstrip(' ')
            seq+=additional_seq
    #store the last sequence and drop the empty one appended before the first id
    seqs.append(seq)
    seqs=seqs[1:]
df = pd.DataFrame(list(zip(ids, seqs)), columns =['id', 'seq'])
df.to_csv("out.csv",index=False)
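The requested output shows the digits of each sequence separated by spaces, which the code above does not do. A minimal sketch of that variant, using only the standard library and an in-memory sample instead of input.txt. It keeps the question's assumption that any line shorter than 15 characters is an ID; parse_records and id_max_len are hypothetical names.

```python
import csv
import io


def parse_records(lines, id_max_len=15):
    """Group sequence lines under the most recent ID line."""
    records = {}
    current_id = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if len(line) < id_max_len:      # short line: treat it as an ID
            current_id = line
            records[current_id] = ""
        elif current_id is not None:    # long line: extend the current sequence
            records[current_id] += line
    return records


sample = io.StringIO(
    "2345673\n"
    "0100121102020211111002\n"
    "0120102100211001101200\n"
    "6758442\n"
    "0202111100011111022222\n"
)
records = parse_records(sample)

out = io.StringIO()  # stands in for open("out.csv", "w", newline="")
writer = csv.writer(out)
writer.writerow(["ID", "Seq"])
for record_id, seq in records.items():
    writer.writerow([record_id, " ".join(seq)])  # "0100..." becomes "0 1 0 0 ..."

print(out.getvalue())
```

A repeated ID would overwrite the earlier entry in this sketch, so it only suits files where each ID appears once.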

Drop similar text rows of one column in Python

import pandas as pd
from difflib import SequenceMatcher
df = pd.DataFrame({"id":[9,12,13,14],
                   "text":["Error number 609 at line 10", "Error number 609 at line 22", "Error string 'foo' at line 11", "Error string 'bar' at line 14"]})
Output:
   id                           text
0   9    Error number 609 at line 10
1  12    Error number 609 at line 22
2  13  Error string 'foo' at line 11
3  14  Error string 'bar' at line 14
I want to use difflib.SequenceMatcher to find rows whose similarity score is higher than 80 and keep only one row from each group of similar rows.
a = "Error number 609 at line 10"
b = "Error number 609 at line 22"
c = "Error string 'foo' at line 11"
d = "Error string 'bar' at line 14"
print(SequenceMatcher(None, a, b).ratio()*100) #92.5925925925926
print(SequenceMatcher(None, b, c).ratio()*100) #60.71428571428571
print(SequenceMatcher(None, c, d).ratio()*100) #86.20689655172413
print(SequenceMatcher(None, a, c).ratio()*100) #64.28571428571429
How can I get the expected result shown below in Python? You can use difflib or other Python packages. Thank you.
   id                           text
0   9    Error number 609 at line 10
2  13  Error string 'foo' at line 11
You can use:
import numpy as np

#cross join with only the text column
df = df.assign(a=1).merge(df[['text']].assign(a=1), on='a')
#filter out rows where both text columns are the same
df = df[df['text_x'] != df['text_y']]
#sort the two text columns within each row
df[['text_x','text_y']] = pd.DataFrame(np.sort(df[['text_x','text_y']],axis=1), index=df.index)
#remove duplicates
df = df.drop_duplicates(subset=['text_x','text_y'])
#get similarity
df['r'] = df.apply(lambda x: SequenceMatcher(None, x.text_x, x.text_y).ratio(), axis=1)
#filtering
df = df[df['r'] > 0.8].drop(['a','r'], axis=1)
print (df)
    id                         text_x                         text_y
1    9    Error number 609 at line 10    Error number 609 at line 22
11  13  Error string 'bar' at line 14  Error string 'foo' at line 11
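The cross join above returns the similar pairs rather than the two-row frame asked for in the question. One way to get that frame is a greedy pass that keeps a row only if it is not too similar to any row already kept. This is a sketch of a different technique, not the method used above; drop_similar and threshold are hypothetical names, and the result depends on row order.

```python
from difflib import SequenceMatcher

import pandas as pd

df = pd.DataFrame({
    "id": [9, 12, 13, 14],
    "text": ["Error number 609 at line 10", "Error number 609 at line 22",
             "Error string 'foo' at line 11", "Error string 'bar' at line 14"],
})


def drop_similar(frame, column, threshold=0.8):
    """Keep each row only if its text is at most `threshold` similar to every kept row."""
    kept_labels = []
    kept_texts = []
    for label, text in frame[column].items():
        if all(SequenceMatcher(None, other, text).ratio() <= threshold
               for other in kept_texts):
            kept_labels.append(label)
            kept_texts.append(text)
    return frame.loc[kept_labels]


result = drop_similar(df, "text")
print(result)
```

Each row is compared against every kept row, so the cost grows quadratically with the number of distinct messages.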

checking range of number and writing a value in a new column in pandas dataframe

I need to iterate over the column 'movies_rated', check each value against the conditions, and write a value in a newly created column 'expert_level'. When I test on a subset of the data, it works. But when I run it against my whole dataset, the column only gets filled with the value 1.
for num in df_merge['movies_rated']:
    if num in range(20,31):
        df_merge['expert_level'] = 1
    elif num in range(31,53):
        df_merge['expert_level'] = 2
    elif num in range(53,99):
        df_merge['expert_level'] = 3
    elif num in range(99,202):
        df_merge['expert_level'] = 4
    else:
        df_merge['expert_level'] = 5
Here's a sample dataframe:
movies = [88,20,35,55,1203,99,2222,847]
name = ['angie','chris','pine','benedict','alice','spock','tony','xena']
df = pd.DataFrame(movies,name,columns=['movies_rated'])
Surely there's a less verbose way of doing this?
You could build an IntervalIndex and then apply pd.cut. I'm sure this is a duplicate, but I can't find one right now which uses both closed='left' and .codes, though I'm sure it exists.
bins = pd.IntervalIndex.from_breaks([0, 20, 31, 53, 99, 202, np.inf], closed='left')
df["expert_level"] = pd.cut(movies, bins).codes
which gives me
In [242]: bins
Out[242]:
IntervalIndex([[0.0, 20.0), [20.0, 31.0), [31.0, 53.0), [53.0, 99.0), [99.0, 202.0), [202.0, inf)]
              closed='left',
              dtype='interval[float64]')
and
In [243]: df
Out[243]:
          movies_rated  expert_level
angie               88             3
chris               20             1
pine                35             2
benedict            55             3
alice             1203             5
spock               99             4
tony              2222             5
xena               847             5
Note that I've set this up so that scores below 20 get a 0 value, so they can be distinguished from really high rankings. If you really want everything outside the bins to get 5, it'd be straightforward to remap 0 to 5, or just pass breaks of [20, 31, 53, 99, 202] and then map anything with a code of -1 (which means 'not binned') to 5.
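The second option mentioned above (only the four real bins, with everything unbinned mapped to 5) could be sketched as follows. It assumes the sample frame from the question. Since the codes now start at 0 for the first bin, they are shifted by one.

```python
import numpy as np
import pandas as pd

movies = [88, 20, 35, 55, 1203, 99, 2222, 847]
name = ["angie", "chris", "pine", "benedict", "alice", "spock", "tony", "xena"]
df = pd.DataFrame({"movies_rated": movies}, index=name)

# Four left-closed bins: [20, 31), [31, 53), [53, 99), [99, 202)
bins = pd.IntervalIndex.from_breaks([20, 31, 53, 99, 202], closed="left")

# Codes are 0..3 for the four bins and -1 for values outside every bin
codes = pd.cut(df["movies_rated"], bins).cat.codes

# Shift the codes to levels 1..4 and send anything unbinned to level 5
df["expert_level"] = np.where(codes == -1, 5, codes + 1)
print(df)
```

With this variant, values below 20 also end up at level 5, which matches the else branch of the loop in the question.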
I think np.select with the pandas function between is a good choice for you:
conds = [df.movies_rated.between(20,30), df.movies_rated.between(31,52),
         df.movies_rated.between(53,98), df.movies_rated.between(99,201)]
choices = [1,2,3,4]
df['expert_level'] = np.select(conds,choices, 5)
>>> df
          movies_rated  expert_level
angie               88             3
chris               20             1
pine                35             2
benedict            55             3
alice             1203             5
spock               99             4
tony              2222             5
xena               847             5
You could do it with apply and a function:
def expert_level_check(num):
    if 20<= num < 31:
        return 1
    elif 31<= num < 53:
        return 2
    elif 53<= num < 99:
        return 3
    elif 99<= num < 202:
        return 4
    else:
        return 5
df['expert_level'] = df['movies_rated'].apply(expert_level_check)
Manually iterating over a DataFrame is slow, so prefer a vectorized approach where you can; I recommend reading up on this.
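As for why the loop in the question fills the whole column with one value: assigning a scalar to df_merge['expert_level'] sets every row at once, so each pass overwrites the previous one and only the last value of movies_rated decides the result. A minimal sketch on a three-row frame (the data is made up for illustration):

```python
import pandas as pd

df = pd.DataFrame({"movies_rated": [25, 40, 60]})

for num in df["movies_rated"]:
    if num in range(20, 31):
        df["expert_level"] = 1  # scalar assignment: sets EVERY row, not just this one
    elif num in range(31, 53):
        df["expert_level"] = 2
    else:
        df["expert_level"] = 3

# Only the last value (60) matters, so every row ends up with level 3
levels = df["expert_level"].tolist()
print(levels)
```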

Python pandas DataFrame column insert call

I have a simple txt file and I am reading it as follows:
data=pd.read_csv("data1.txt",sep=',', header = None)
data.columns=['X1', 'Y']
When I print this I get:
       X1        Y
0  6.1101  17.5920
1  5.5277   9.1302
2  8.5186  13.6620
3  7.0032  11.8540
4  5.8598   6.8233
Now I want to insert a column X0 in front of X1 (to its left) and give this column a value of 1, so I added this code:
data = data.insert(0,'X0',1)
print(type(data))
print(len(data))
But I get the following output and error message:
<class 'NoneType'>
TypeError: object of type 'NoneType' has no len()
Is my data.insert call correct? Why is the type of the dataframe NoneType? What am I doing wrong here?
Instead of using insert, which acts in place, you can use assign:
data = data.assign(X0=1)[['X0'] + data.columns.tolist()]
print(data)
   X0      X1        Y
0   1  6.1101  17.5920
1   1  5.5277   9.1302
2   1  8.5186  13.6620
3   1  7.0032  11.8540
4   1  5.8598   6.8233
You cannot assign the result of DataFrame.insert to a new DataFrame, because it works in place and returns None:
data.insert(0,'X0',1)
print (data)
   X0      X1        Y
0   1  6.1101  17.5920
1   1  5.5277   9.1302
2   1  8.5186  13.6620
3   1  7.0032  11.8540
4   1  5.8598   6.8233
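A small self-contained check of that behaviour, using the first two rows of the data from the question instead of data1.txt:

```python
import pandas as pd

data = pd.DataFrame({"X1": [6.1101, 5.5277], "Y": [17.5920, 9.1302]})

# insert() modifies data itself and returns None
result = data.insert(0, "X0", 1)

print(result)               # None
print(list(data.columns))   # the new column is already in data
```

This is why data = data.insert(...) in the question replaces the frame with None.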
