The section of my code that is causing me problems is
def Half_Increase(self):
    self.keg_count = summer17.iloc[self.result_rows,2].values[0]
    self.keg_count += 1
    summer17[self.result_rows,2] = self.keg_count
    print(keg_count)
So this function is to be executed when a button widget is pressed. It's supposed to get the value from a specific cell in a dataframe, add 1 to it, and then return the new value to the dataframe. (I'm not entirely sure if this is the proper way to do this.)
I get the following error:
Exception in Tkinter callback
Traceback (most recent call last):
  File "C:\Python3.6\lib\tkinter\__init__.py", line 1699, in __call__
    return self.func(*args)
  File "beerfest_program_v0.3.py", line 152, in Half_Increase
    summer17[self.result_rows,2] = self.keg_count
  File "C:\Python3.6\lib\site-packages\pandas\core\frame.py", line 2331, in __setitem__
    self._set_item(key, value)
  File "C:\Python3.6\lib\site-packages\pandas\core\frame.py", line 2397, in _set_item
    value = self._sanitize_column(key, value)
  File "C:\Python3.6\lib\site-packages\pandas\core\frame.py", line 2596, in _sanitize_column
    if broadcast and key in self.columns and value.ndim == 1:
  File "C:\Python3.6\lib\site-packages\pandas\core\indexes\base.py", line 1640, in __contains__
    hash(key)
  File "C:\Python3.6\lib\site-packages\pandas\core\indexes\base.py", line 1667, in __hash__
    raise TypeError("unhashable type: %r" % type(self).__name__)
TypeError: unhashable type: 'Int64Index'
I'm guessing this has something to do with the variable types not matching, but I've looked and can't find how to remedy this.
I think you need iloc:
summer17.iloc[result_rows,2] += 1
Sample:
summer17 = pd.DataFrame({'a':[1,2,3],
                         'b':[3,4,5],
                         'c':[5,9,7]})

#if result_rows is scalar
result_rows = 1
print(summer17)
a b c
0 1 3 5
1 2 4 9
2 3 5 7
summer17.iloc[result_rows,2] += 1
print(summer17)
a b c
0 1 3 5
1 2 4 10
2 3 5 7
It is the same as:
#get value
keg_count=summer17.iloc[result_rows,2]
#increment
keg_count +=1
#set value
summer17.iloc[result_rows,2] = keg_count
print(summer17)
a b c
0 1 3 5
1 2 4 10
2 3 5 7
But if result_rows is a list or 1d array:
result_rows = [1,2]
#get all values per positions defined in result_rows
#filter only first value by values[0]
keg_count=summer17.iloc[result_rows,2].values[0]
#increment
keg_count +=1
#set all values of result_rows by incremented value
summer17.iloc[result_rows,2] = keg_count
print(summer17)
a b c
0 1 3 5
1 2 4 10
2 3 5 10
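Putting this together for the original method, a minimal sketch (the DataFrame contents and the result_rows value here are stand-ins for illustration; it assumes result_rows selects a single row, so the value can be read and written as a scalar):

```python
import pandas as pd

# Stand-in for the question's DataFrame; in the real code summer17 and
# result_rows come from elsewhere in the program.
summer17 = pd.DataFrame({'a': [1, 2, 3], 'b': [3, 4, 5], 'c': [5, 9, 7]})

class Counter:
    def __init__(self, result_rows):
        self.result_rows = result_rows

    def Half_Increase(self):
        # Read with iloc, increment, and write back with iloc as well,
        # instead of the plain summer17[...] indexing from the question.
        self.keg_count = summer17.iloc[self.result_rows, 2]
        self.keg_count += 1
        summer17.iloc[self.result_rows, 2] = self.keg_count
        print(self.keg_count)

Counter(result_rows=1).Half_Increase()  # prints 10
```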
My function:
def buttons_for_country(main_master,datasets):
    country_list = every_country_in_datasets(datasets)
    rows = 0
    columns = 0
    for i in range(1,len(country_list)):
        name = "button",i
        name = tkinter.Button(master = main_master,
                              command = lambda: plot(10000),
                              height = 2,
                              width=10,
                              text=country_list[i-1])
        if rows == 12:
            rows = 0
            colums += 1
        name.grid(rows,columns)
        rows += 1
        name.pack()
It raises an error at name.grid(rows,columns):
Traceback (most recent call last):
  File "c:/Python/Covid/Cov_predict.py", line 93, in <module>
    buttons_for_country(window,df)
  File "c:/Python/Covid/Cov_predict.py", line 75, in buttons_for_country
    name.grid(rows,columns)
TypeError: grid_configure() takes from 1 to 2 positional arguments but 3 were given
It seems fine since I'm giving 2 params, rows and columns, but it says I gave 3. Where did I go wrong here?
You should specify the row and column as keyword arguments.
name.grid(row=rows, column=columns)
You also need to remove name.pack() - a widget can only be controlled by a single geometry manager, and the last one you use is the one that is in control. Calling pack() after calling grid() removes all of the benefits of calling grid().
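Folding both fixes into the question's function, one possible sketch (grid_position is a hypothetical helper added here; country_list and plot_command stand in for the question's every_country_in_datasets(datasets) and plot(10000) lambda):

```python
import tkinter

def grid_position(i, max_rows=12):
    # Hypothetical helper (not in the original): wrap to a new
    # column after every max_rows buttons.
    return i % max_rows, i // max_rows

def buttons_for_country(main_master, country_list, plot_command):
    # Sketch of the corrected loop from the question.
    for i, country in enumerate(country_list):
        row, column = grid_position(i)
        button = tkinter.Button(master=main_master,
                                command=plot_command,
                                height=2,
                                width=10,
                                text=country)
        # row/column as keyword arguments, and no pack() afterwards.
        button.grid(row=row, column=column)
```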
Problem: working with Python 3.x, I have a file called input.txt with content as below:
2345673 # First ID
0100121102020211111002 # first sequence (seq) which is long and goes to several lines
0120102100211001101200
6758442 #Second ID
0202111100011111022222 #second sequence (seq) which is long and goes to several lines
0202111110001120211210
0102101011211001101200
What I want: to process input.txt and save the results in output.csv so that when I read it in pandas, the result is a data frame like below.
ID Seq
2345673 0 1 0 0 1 2 1 1 0 2 …
6758442 0 2 0 2 1 1 1 1 0 0 …
Below is my code:
with open("input.txt") as f:
    with open("out.csv", "w") as f1:
        for i, line in enumerate(f): #read each line in file
            if(len(line) < 15 ): #check if length line is say < 15
                id = line # if yes, make line ID
            else:
                seq = line # if not make it a sequence
            #print(id)
            lines = []
            lines.append(','.join([str(id),str(seq)]))
            for l in lines:
                f1.write('('+l+'),\n') #write to file f1
When I read out.csv in pandas, the output is not what I want; see below. I will appreciate your help, I am really stuck.
(2345673
,0100121102020211111002
),
(2345673
,0120102100211001101200
),
(6758442
,0202111100011111022222
),
(6758442
,0202111110001120211210
),
(6758442
,0102101011211001101200),
import pandas as pd

### idea is to create two lists: one with ids and another with sequences
with open("input.txt") as f:
    ids=[]
    seqs=[]
    seq=""
    for i, line in enumerate(f):
        if (len(line) < 15 ) :
            seqs.append(seq)
            id=line
            id=id.rstrip('\n')
            id=id.rstrip(' ')
            ids.append(id)
            seq=""
        else:
            #next three lines combine all sequences that correspond to the same id into one
            additional_seq = line.rstrip('\n')
            additional_seq = additional_seq.rstrip(' ')
            seq+=additional_seq
    seqs.append(seq)
    seqs=seqs[1:]

df = pd.DataFrame(list(zip(ids, seqs)), columns =['id', 'seq'])
df.to_csv("out.csv",index=False)
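To sanity-check the approach without a file, the same logic can be run on the sample lines directly (a sketch; the inline "# ..." comments shown in the question's input are assumed absent in the real input.txt):

```python
import pandas as pd

# The question's sample data as in-memory lines.
lines = ["2345673\n",
         "0100121102020211111002\n",
         "0120102100211001101200\n",
         "6758442\n",
         "0202111100011111022222\n",
         "0202111110001120211210\n",
         "0102101011211001101200\n"]

ids, seqs, seq = [], [], ""
for line in lines:
    if len(line) < 15:          # short line -> ID
        seqs.append(seq)        # close off the previous sequence
        ids.append(line.strip())
        seq = ""
    else:                       # long line -> part of the current sequence
        seq += line.strip()
seqs.append(seq)                # close off the last sequence
seqs = seqs[1:]                 # drop the empty entry pushed before the first ID

df = pd.DataFrame(list(zip(ids, seqs)), columns=['id', 'seq'])
print(df)
```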
import pandas as pd
from difflib import SequenceMatcher
df = pd.DataFrame({"id":[9,12,13,14],
                   "text":["Error number 609 at line 10", "Error number 609 at line 22",
                           "Error string 'foo' at line 11", "Error string 'bar' at line 14"]})
Output:
id text
0 9 Error number 609 at line 10
1 12 Error number 609 at line 22
2 13 Error string 'foo' at line 11
3 14 Error string 'bar' at line 14
I want to use difflib.SequenceMatcher to remove similarity score lower than 80 rows and only keep one.
a = "Error number 609 at line 10"
b = "Error number 609 at line 22"
c = "Error string 'foo' at line 11"
d = "Error string 'bar' at line 14"
print(SequenceMatcher(None, a, b).ratio()*100) #92.5925925925926
print(SequenceMatcher(None, b, c).ratio()*100) #60.71428571428571
print(SequenceMatcher(None, c, d).ratio()*100) #86.20689655172413
print(SequenceMatcher(None, a, c).ratio()*100) #64.28571428571429
How can I get the expected result, as follows, in Python? You can use difflib or other python packages. Thank you.
id text
0 9 Error number 609 at line 10
2 13 Error string 'foo' at line 11
You can use:
import numpy as np

#cross join, using only the text column on the right side
df = df.assign(a=1).merge(df[['text']].assign(a=1), on='a')
#filter out rows comparing a text with itself
df = df[df['text_x'] != df['text_y']]
#sort the pair of texts within each row
df[['text_x','text_y']] = pd.DataFrame(np.sort(df[['text_x','text_y']],axis=1), index=df.index)
#remove duplicate pairs
df = df.drop_duplicates(subset=['text_x','text_y'])
#get similarity
df['r'] = df.apply(lambda x: SequenceMatcher(None, x.text_x, x.text_y).ratio(), axis=1)
#filtering
df = df[df['r'] > 0.8].drop(['a','r'], axis=1)
print (df)
id text_x text_y
1 9 Error number 609 at line 10 Error number 609 at line 22
11 13 Error string 'bar' at line 14 Error string 'foo' at line 11
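The cross join yields the similar pairs rather than the two expected rows. If the goal is literally the expected output (keep only the first row of each similar group), a greedy alternative sketch is to compare each row only against texts already kept:

```python
import pandas as pd
from difflib import SequenceMatcher

df = pd.DataFrame({"id": [9, 12, 13, 14],
                   "text": ["Error number 609 at line 10",
                            "Error number 609 at line 22",
                            "Error string 'foo' at line 11",
                            "Error string 'bar' at line 14"]})

kept_idx, kept_texts = [], []
for idx, text in df["text"].items():
    # Keep a row only if it is not >80% similar to any already-kept text.
    if all(SequenceMatcher(None, text, t).ratio() * 100 <= 80 for t in kept_texts):
        kept_idx.append(idx)
        kept_texts.append(text)

result = df.loc[kept_idx]
print(result)
```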
I need to iterate over column 'movies_rated', check the value against the conditions, and write a value into a newly created column 'expert_level'. When I test on a subset of data, it works. But when I run it against my whole dataset, it only gets filled with the value 1.
for num in df_merge['movies_rated']:
    if num in range(20,31):
        df_merge['expert_level'] = 1
    elif num in range(31,53):
        df_merge['expert_level'] = 2
    elif num in range(53,99):
        df_merge['expert_level'] = 3
    elif num in range(99,202):
        df_merge['expert_level'] = 4
    else:
        df_merge['expert_level'] = 5
here's a sample dataframe.
movies = [88,20,35,55,1203,99,2222,847]
name = ['angie','chris','pine','benedict','alice','spock','tony','xena']
df = pd.DataFrame(movies,name,columns=['movies_rated'])
Surely there's a less verbose way of doing this?
You could build an IntervalIndex and then apply pd.cut. I'm sure this is a duplicate, but I can't find one right now which uses both closed='left' and .codes, though I'm sure it exists.
bins = pd.IntervalIndex.from_breaks([0, 20, 31, 53, 99, 202, np.inf], closed='left')
df["expert_level"] = pd.cut(movies, bins).codes
which gives me
In [242]: bins
Out[242]:
IntervalIndex([[0.0, 20.0), [20.0, 31.0), [31.0, 53.0), [53.0, 99.0), [99.0, 202.0), [202.0, inf)]
closed='left',
dtype='interval[float64]')
and
In [243]: df
Out[243]:
movies_rated expert_level
angie 88 3
chris 20 1
pine 35 2
benedict 55 3
alice 1203 5
spock 99 4
tony 2222 5
xena 847 5
Note that I've set this up so that scores below 20 get a 0 value, so they can be distinguished from really high rankings. If you really want everything outside the bins to get 5, it'd be straightforward to remap 0 to 5, or just pass breaks of [20, 31, 53, 99, 202] and then map anything with a code of -1 (which means 'not binned') to 5.
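The remap mentioned above can be applied to the codes directly; a sketch assuming the same breaks:

```python
import numpy as np
import pandas as pd

movies = [88, 20, 35, 55, 1203, 99, 2222, 847]
bins = pd.IntervalIndex.from_breaks([0, 20, 31, 53, 99, 202, np.inf], closed='left')

codes = pd.cut(movies, bins).codes
# Send the sub-20 bin (code 0) to level 5; the other codes already
# line up with the question's levels 1..5.
expert_level = np.where(codes == 0, 5, codes)
print(expert_level)  # [3 1 2 3 5 4 5 5]

# A rating below 20 now also maps to 5:
low = pd.cut([10], bins).codes
print(np.where(low == 0, 5, low))  # [5]
```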
I think np.select with the pandas function between is a good choice for you:
conds = [df.movies_rated.between(20,30), df.movies_rated.between(31,52),
         df.movies_rated.between(53,98), df.movies_rated.between(99,202)]
choices = [1,2,3,4]
df['expert_level'] = np.select(conds,choices, 5)
>>> df
movies_rated expert_level
angie 88 3
chris 20 1
pine 35 2
benedict 55 3
alice 1203 5
spock 99 4
tony 2222 5
xena 847 5
You could do it with apply and a function:
def expert_level_check(num):
    if 20 <= num < 31:
        return 1
    elif 31 <= num < 53:
        return 2
    elif 53 <= num < 99:
        return 3
    elif 99 <= num < 202:
        return 4
    else:
        return 5
df['expert_level'] = df['movies_rated'].apply(expert_level_check)
It is slower to manually iterate over a df; I recommend reading this.
I have a simple txt file and I am reading it as follows:
data=pd.read_csv("data1.txt",sep=',', header = None)
data.columns=['X1', 'Y']
When I print this I get:
X1 Y
0 6.1101 17.5920
1 5.5277 9.1302
2 8.5186 13.6620
3 7.0032 11.8540
4 5.8598 6.8233
Now I want to insert a column X0 in front of X1 (to its left) and give this column a value of 1, so I added this code:
data = data.insert(0,'X0',1)
print(type(data))
print(len(data))
But I get the following error message:
<class 'NoneType'>
TypeError: object of type 'NoneType' has no len()
The question is: is my data.insert correct? Why is the type of the dataframe coming out as NoneType? What am I doing wrong here?
Instead of using insert, which acts in place, you can use assign:
data = data.assign(X0=1)[['X0'] + data.columns.tolist()]
print(data)
X0 X1 Y
0 1 6.1101 17.5920
1 1 5.5277 9.1302
2 1 8.5186 13.6620
3 1 7.0032 11.8540
4 1 5.8598 6.8233
You cannot assign the result of DataFrame.insert to a new DataFrame, because it works in place:
data.insert(0,'X0',1)
print (data)
X0 X1 Y
0 1 6.1101 17.5920
1 1 5.5277 9.1302
2 1 8.5186 13.6620
3 1 7.0032 11.8540
4 1 5.8598 6.8233
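A quick demonstration of the in-place behaviour, with illustrative data:

```python
import pandas as pd

data = pd.DataFrame({'X1': [6.1101, 5.5277], 'Y': [17.5920, 9.1302]})

result = data.insert(0, 'X0', 1)  # modifies data itself and returns None
print(result)                     # None -> assigning it back would lose the frame
print(list(data.columns))         # ['X0', 'X1', 'Y']
```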