tst = pd.read_csv('/Users/me/Desktop/stuff/Et2Load.csv', header=0,delimiter="\t", quoting=3)
print(tst.head(2)) # ok
#print(tst['date'][0])
I made up this file, one line header 2 lines
3 columns, 2 lines
id,date,coldata
0 1,August 18 2016,"With all this stuff going do...
1 2,August 19 2016,this is a great movie. The mu...
i cannot access a specific "cell"
print(tst['date'][0]) error
return self._engine.get_loc(self._maybe_cast_indexer(key))
File "pandas/index.pyx", line 137, in pandas.index.IndexEngine.get_loc (pandas/index.c:4154)
File "pandas/index.pyx", line 159, in pandas.index.IndexEngine.get_loc (pandas/index.c:4018)
File "pandas/hashtable.pyx", line 675, in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:12368)
File "pandas/hashtable.pyx", line 683, in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:12322)
KeyError: 'date'
well this is the big secret:
set_index('name the column that going to be used as 'id'')
It was difficult to find this one.
Related
I am trying to develop model using PLSR (Partial Least Squares Regression) in Python3 using code provided https://github.com/pgbrodrick/ensemblePLSR. Sample data is also provided.
When I try to run code, it gives me error
>>> python3 ensemble_plsr.py example_settings.txt
I am using Python (3.7.3), python modules scikit-learn (0.20.2) and pandas (0.23.3).
/usr/lib/python3/dist-packages/sklearn/externals/joblib.py:1: DeprecationWarning: the imp module is deprecated in favour of importlib; see the module's documentation for alternative uses
import imp
n bad bands 57
Traceback (most recent call last):
File "/usr/lib/python3/dist-packages/pandas/core/indexes/base.py", line 3078, in get_loc
return self._engine.get_loc(key)
File "pandas/_libs/index.pyx", line 140, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/index.pyx", line 162, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/hashtable_class_helper.pxi", line 1492, in pandas._libs.hashtable.PyObjectHashTable.get_item
File "pandas/_libs/hashtable_class_helper.pxi", line 1500, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: -1
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "ensemble_plsr.py", line 173, in <module>
df.pop(col)
File "/usr/lib/python3/dist-packages/pandas/core/generic.py", line 760, in pop
result = self[item]
File "/usr/lib/python3/dist-packages/pandas/core/frame.py", line 2688, in __getitem__
return self._getitem_column(key)
File "/usr/lib/python3/dist-packages/pandas/core/frame.py", line 2695, in _getitem_column
return self._get_item_cache(key)
File "/usr/lib/python3/dist-packages/pandas/core/generic.py", line 2491, in _get_item_cache
values = self._data.get(item)
File "/usr/lib/python3/dist-packages/pandas/core/internals.py", line 4115, in get
loc = self.items.get_loc(item)
File "/usr/lib/python3/dist-packages/pandas/core/indexes/base.py", line 3080, in get_loc
return self._engine.get_loc(self._maybe_cast_indexer(key))
File "pandas/_libs/index.pyx", line 140, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/index.pyx", line 162, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/hashtable_class_helper.pxi", line 1492, in pandas._libs.hashtable.PyObjectHashTable.get_item
File "pandas/_libs/hashtable_class_helper.pxi", line 1500, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: -1
In short, you're attempting to remove columns for which there is no reference at row 173 of ensemble_plsr.py, hence the KeyError during execution. What's happening under the hood is that when Python attempts to execute the pop method on the DataFrame for the unspecified/non-existent column, it raises that error. There are different ways to resolve this but this solution will resolve the error you are seeing:
Replace rows 172 & 173 in ensemble_plsr.py with the following:
for col in sf.get_setting('ignore columns'):
if col != 'nothing_here':
df.pop(col)
Replace row 16 in example_settings.txt with the following:
ignore columns(any other columns to remove) = nothing_here
Good news, you're done with this issue. Bad news, you're going to hit the next error down the line but you're on your way!
I have the data file which looks like this -
And I have another data file which has all the correct country names.
For matching both the files that, I am using below:
from fuzzywuzzy import process
import pandas as pd
names_array=[]
ratio_array=[]
def match_names(wrong_names,correct_names):
for row in wrong_names:
x=process.extractOne(row, correct_names)
names_array.append(x[0])
ratio_array.append(x[1])
return names_array,ratio_array
#Wrong country names dataset
df=pd.read_csv("wrong-country-names.csv",encoding="ISO-8859-1")
wrong_names=df['name'].dropna().values
#Correct country names dataset
choices_df=pd.read_csv("country-names.csv",encoding="ISO-8859-1")
correct_names=choices_df['name'].values
name_match,ratio_match=match_names(wrong_names,correct_names)
df['correct_country_name']=pd.Series(name_match)
df['country_names_ratio']=pd.Series(ratio_match)
df.to_csv("string_matched_country_names.csv")
print(df[['name','correct_country_name','country_names_ratio']].head(10))
I get the below error:
runfile('C:/Users/Drashti Bhatt/Desktop/untitled0.py', wdir='C:/Users/Drashti Bhatt/Desktop')
Traceback (most recent call last):
File "<ipython-input-155-a1fd87d9f661>", line 1, in <module>
runfile('C:/Users/Drashti Bhatt/Desktop/untitled0.py', wdir='C:/Users/Drashti Bhatt/Desktop')
File "C:\Users\Drashti Bhatt\Anaconda3\lib\site-packages\spyder_kernels\customize\spydercustomize.py", line 827, in runfile
execfile(filename, namespace)
File "C:\Users\Drashti Bhatt\Anaconda3\lib\site-packages\spyder_kernels\customize\spydercustomize.py", line 110, in execfile
exec(compile(f.read(), filename, 'exec'), namespace)
File "C:/Users/Drashti Bhatt/Desktop/untitled0.py", line 17, in <module>
wrong_names=df['name'].dropna().values
File "C:\Users\Drashti Bhatt\Anaconda3\lib\site-packages\pandas\core\frame.py", line 2927, in __getitem__
indexer = self.columns.get_loc(key)
File "C:\Users\Drashti Bhatt\Anaconda3\lib\site-packages\pandas\core\indexes\base.py", line 2659, in get_loc
return self._engine.get_loc(self._maybe_cast_indexer(key))
File "pandas/_libs/index.pyx", line 108, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/index.pyx", line 132, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/hashtable_class_helper.pxi", line 1601, in pandas._libs.hashtable.PyObjectHashTable.get_item
File "pandas/_libs/hashtable_class_helper.pxi", line 1608, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'name'
Any help on this will be much appreciated! Thanks much!
Your code contains e.g. wrong_names=df['name'].dropna().values (it is
mentioned in your Traceback as the "offending" line).
And now look at the picture presenting your DataFrame:
it does not contain name column,
it contains Country column.
Go back to the Traceback: At the very end there is the error message:
KeyError: 'name'.
So you attempt to access a non-existing column.
I noticed also another detail: values attribute contains the underlying
Numpy array, whereas process.extractOne requires "ordinary" Python lists (of strings, to perform the match).
So probably you should change the instruction above to:
wrong_names=df['Country'].dropna().values.tolist()
The same for the other instruction.
I can selected one column from a DataFrame, for example: the code like print(df['201809']) works:
df = pd.read_csv('xxxx.csv', low_memory=False)
print(df.info()]
<class 'pandas.core.frame.DataFrame'>
Int64Index: 11 entries, 0 to 10
Data columns (total 4 columns):
BO_product2 11 non-null object
201808 11 non-null float64
201809 11 non-null float64
4 11 non-null float64
dtypes: float64(3), object(1)
memory usage: 440.0+ bytes
print(df['201809']) # works fine
None
0 1.634931e+06
1 2.653640e+08
2 7.475315e+07
3 9.710830e+06
4 3.023899e+08
5 1.087862e+08
6 2.031106e+08
7 3.556234e+08
8 5.830665e+06
9 8.766841e+08
10 7.544689e+07
Name: 201809, dtype: float64
However print(df['4']) don't. Any tips or ideas is here?
PS: if i save the df.to_csv('yy.csv) to local file in csv format, print(a['4'])works after `df = pd.read_csv('yy.csv').
print(df['4'])
Traceback (most recent call last):
File "C:\Users\AppData\Local\Programs\Python\Python36\lib\site-packages\pandas\core\indexes\base.py", line 3063, in get_loc
return self._engine.get_loc(key)
File "pandas\_libs\index.pyx", line 140, in pandas._libs.index.IndexEngine.get_loc
File "pandas\_libs\index.pyx", line 162, in pandas._libs.index.IndexEngine.get_loc
File "pandas\_libs\hashtable_class_helper.pxi", line 1492, in pandas._libs.hashtable.PyObjectHashTable.get_item
File "pandas\_libs\hashtable_class_helper.pxi", line 1500, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: '4'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "E:/Python/2.py", line 45, in <module>
he()
File "E:/Python/2.py", line 26, in he
print(a['4'])
File "C:\Users\AppData\Local\Programs\Python\Python36\lib\site-packages\pandas\core\frame.py", line 2685, in __getitem__
return self._getitem_column(key)
File "C:\Users\AppData\Local\Programs\Python\Python36\lib\site-packages\pandas\core\frame.py", line 2692, in _getitem_column
return self._get_item_cache(key)
File "C:\Users\AppData\Local\Programs\Python\Python36\lib\site-packages\pandas\core\generic.py", line 2486, in _get_item_cache
values = self._data.get(item)
File "C:\Users\AppData\Local\Programs\Python\Python36\lib\site-packages\pandas\core\internals.py", line 4115, in get
loc = self.items.get_loc(item)
File "C:\Users\AppData\Local\Programs\Python\Python36\lib\site-packages\pandas\core\indexes\base.py", line 3065, in get_loc
return self._engine.get_loc(self._maybe_cast_indexer(key))
File "pandas\_libs\index.pyx", line 140, in pandas._libs.index.IndexEngine.get_loc
File "pandas\_libs\index.pyx", line 162, in pandas._libs.index.IndexEngine.get_loc
File "pandas\_libs\hashtable_class_helper.pxi", line 1492, in pandas._libs.hashtable.PyObjectHashTable.get_item
File "pandas\_libs\hashtable_class_helper.pxi", line 1500, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: '4'
If you execute the below:
[type(i) for i in df.columns]
#[str, str, str, int]
For columns having type int you should call the column as df[4] and not df['4']
Probably the reason why it is getting written as string is due to the quoting builtin function. From the docs:
quoting : optional constant from csv module
defaults to csv.QUOTE_MINIMAL. If you have set a float_format then floats are >>converted to strings and thus csv.QUOTE_NONNUMERIC will treat them as non->>numeric
Hope this helps.
I am trying to set up Python (3.4) code to sort a time-series by date.
In python shell, I key in the following
>>>data = quandl.get("YAHOO/INDEX_GSPC", start_date="2017-01-01", end_date="2017-01-20")
>>>print(data)
So, I can load in the data. But when I try to use sort by the command
>>>data = data.sort_values(by='Date')
I get the following list of errors messages. I can't seem to understand/get the syntax for date sort from http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.sort_values.html
Experts out there......., many thanks for advice.
Traceback (most recent call last):
File "C:\Python34\lib\site-packages\pandas\indexes\base.py", line 2134, in get_loc
return self._engine.get_loc(key)
File "pandas\index.pyx", line 132, in pandas.index.IndexEngine.get_loc (pandas\index.c:4433)
File "pandas\index.pyx", line 154, in pandas.index.IndexEngine.get_loc (pandas\index.c:4279)
File "pandas\src\hashtable_class_helper.pxi", line 732, in pandas.hashtable.PyObjectHashTable.get_item (pandas\hashtable.c:13742)
File "pandas\src\hashtable_class_helper.pxi", line 740, in pandas.hashtable.PyObjectHashTable.get_item (pandas\hashtable.c:13696)
KeyError: 'Date'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "<pyshell#37>", line 1, in <module>
data = data.sort_values(by='Date')
File "C:\Python34\lib\site-packages\pandas\core\frame.py", line 3230, in sort_values
k = self.xs(by, axis=other_axis).values
File "C:\Python34\lib\site-packages\pandas\core\generic.py", line 1770, in xs
return self[key]
File "C:\Python34\lib\site-packages\pandas\core\frame.py", line 2059, in __getitem__
return self._getitem_column(key)
File "C:\Python34\lib\site-packages\pandas\core\frame.py", line 2066, in _getitem_column
return self._get_item_cache(key)
File "C:\Python34\lib\site-packages\pandas\core\generic.py", line 1386, in _get_item_cache
values = self._data.get(item)
File "C:\Python34\lib\site-packages\pandas\core\internals.py", line 3543, in get
loc = self.items.get_loc(item)
File "C:\Python34\lib\site-packages\pandas\indexes\base.py", line 2136, in get_loc
return self._engine.get_loc(self._maybe_cast_indexer(key))
File "pandas\index.pyx", line 132, in pandas.index.IndexEngine.get_loc (pandas\index.c:4433)
File "pandas\index.pyx", line 154, in pandas.index.IndexEngine.get_loc (pandas\index.c:4279)
File "pandas\src\hashtable_class_helper.pxi", line 732, in pandas.hashtable.PyObjectHashTable.get_item (pandas\hashtable.c:13742)
File "pandas\src\hashtable_class_helper.pxi", line 740, in pandas.hashtable.PyObjectHashTable.get_item (pandas\hashtable.c:13696)
KeyError: 'Date'
quandl.get loads a DataFrame with the date as index.
So if you sort by index, you're good to go:
data = data.sort_index()
Make sure you look at the error. You are getting a KeyError which means that the column Date does not exist in your DataFrame. It's like that the dates are stored in the index which requires the sort_index method instead. The 'Date' name that you see in your DataFrame is the name of the index and not a column.
data.sort_index()
I started working with pandas in python 3.4 for couple of days. I chose to work on Book-Crossing data set.
The book information table is like this:
The Book rating table is like this:
I want to grab the "ISBN","Book-title" from the book information table and merge it with the book-rating table in which both match the "ISBN" and after that write the results in another csv file.
I used the code below:
udata = pd.read_csv('1', names = ('User_ID', 'ISBN', 'Book-Rating'), encoding="ISO-8859-1", sep=';', usecols=[0,1,2])
uitem = pd.read_csv('2', names = ('ISBN', 'Book-Title'), encoding="ISO-8859-1", sep=';', usecols=[0,1])
ratings = pd.merge(udata, uitem, on='ISBN')
ratings.to_csv('ratings.csv', index=False)
Unfortunately it doesn't work and it gives an error:
Traceback (most recent call last):
File "C:\Users\masoud\Desktop\Dataset\data2\a.py", line 2, in <module>
udata = pd.read_csv('2.csv', names = ('User_ID', 'ISBN', 'Book-Rating'),encoding="ISO-8859-1", sep=';', usecols=[0,1,2])
File "C:\WinPython-64bit-3.4.3.6\python-3.4.3.amd64\lib\site-packages\pandas\io\parsers.py", line 491, in parser_f
return _read(filepath_or_buffer, kwds)
File "C:\WinPython-64bit-3.4.3.6\python-3.4.3.amd64\lib\site-packages\pandas\io\parsers.py", line 278, in _read
return parser.read()
File "C:\WinPython-64bit-3.4.3.6\python-3.4.3.amd64\lib\site-packages\pandas\io\parsers.py", line 740, in read
ret = self._engine.read(nrows)
File "C:\WinPython-64bit-3.4.3.6\python-3.4.3.amd64\lib\site-packages\pandas\io\parsers.py", line 1187, in read
data = self._reader.read(nrows)
File "pandas\parser.pyx", line 758, in pandas.parser.TextReader.read (pandas\parser.c:7919)
File "pandas\parser.pyx", line 780, in pandas.parser.TextReader._read_low_memory (pandas\parser.c:8175)
File "pandas\parser.pyx", line 833, in pandas.parser.TextReader._read_rows (pandas\parser.c:8868)
File "pandas\parser.pyx", line 820, in pandas.parser.TextReader._tokenize_rows (pandas\parser.c:8736)
File "pandas\parser.pyx", line 1732, in pandas.parser.raise_parser_error (pandas\parser.c:22105)
pandas.parser.CParserError: Error tokenizing data. C error: Expected 8 fields in line 6452, saw 9
I was wondering if anybody could fix the error?
In the first and second row, change sep to ;.
sep=';'