i am getting this error when i try to run quora duplicates files on my feature python file,
the part of code i am running is below
data = pd.read_csv('train.csv', sep='\t')
data = data.drop(['id', 'qid1', 'qid2'], axis=1)
and the output is
unfile('/Volumes/Macintosh HD/chrome/is_that_a_duplicate_quora_question-master/feature_engineering.py', wdir='/Volumes/Macintosh HD/chrome/is_that_a_duplicate_quora_question-master')
Traceback (most recent call last):
File "<ipython-input-31-e29a1095cc40>", line 1, in <module>
runfile('/Volumes/Macintosh HD/chrome/is_that_a_duplicate_quora_question-master/feature_engineering.py', wdir='/Volumes/Macintosh HD/chrome/is_that_a_duplicate_quora_question-master')
File "/Users/Yash/anaconda3/lib/python3.6/site-packages/spyder/utils/site/sitecustomize.py", line 705, in runfile
execfile(filename, namespace)
File "/Users/Yash/anaconda3/lib/python3.6/site-packages/spyder/utils/site/sitecustomize.py", line 102, in execfile
exec(compile(f.read(), filename, 'exec'), namespace)
File "/Volumes/Macintosh HD/chrome/is_that_a_duplicate_quora_question-master/feature_engineering.py", line 55, in <module>
data = data.drop(['id','qid1','qid2'], axis=1)
File "/Users/Yash/anaconda3/lib/python3.6/site-packages/pandas/core/generic.py", line 2530, in drop
obj = obj._drop_axis(labels, axis, level=level, errors=errors)
File "/Users/Yash/anaconda3/lib/python3.6/site-packages/pandas/core/generic.py", line 2562, in _drop_axis
new_axis = axis.drop(labels, errors=errors)
File "/Users/Yash/anaconda3/lib/python3.6/site-packages/pandas/core/indexes/base.py", line 3744, in drop
labels[mask])
ValueError: labels ['id' 'qid1' 'qid2'] not contained in axis
my csv file is like this
"id","qid1","qid2","question1","question2","is_duplicate"
"0","1","2","What is the step by step guide to invest in share market in india?","What is the step by step guide to invest in share market?","0"
"1","3","4","What is the story of Kohinoor (Koh-i-Noor) Diamond?","What would happen if the Indian government stole the Kohinoor (Koh-i-Noor) diamond back?","0"
please help me in trying to figure out the problem
you need to remove the separator argument \ because content in csv already has , as a separator:
# sample.csv file contains following data
"id","qid1","qid2","question1","question2","is_duplicate"
"0","1","2","What is the step by step guide to invest in share market in india?","What is the step by step guide to invest in share ,"0"
"1","3","4","What is the story of Kohinoor (Koh-i-Noor) Diamond?","What would happen if the Indian government stole the Kohinoor(-i-Noor) diamond back?","0"
df = pd.read_csv('sample.csv')
data = df.drop(['id', 'qid1', 'qid2'], axis=1)
print data
#output will be like this:
"question1","question2","is_duplicate"
"What is the step by step guide to invest in share market in india?","What is the step by step guide to invest in share ,"0"
"What is the story of Kohinoor (Koh-i-Noor) Diamond?","What would happen if the Indian government stole the Kohinoor(-i-Noor) diamond back?","0"
Related
I am writing python script in which i am generating two different csv files and then reading these file by using pandas. I am able to read file1 with pandas but getting error while reading file2 which in same format(same column name) as file1 but different/same values. Please find the below error that i am getting and sample code that i am using.
Error:
Traceback (most recent call last):
File "MSReport.py", line 168, in <module>
fail = pd.read_csv('/home/cisapp/msLogFailure.csv', sep=',')
File "/home/cisapp/.local/lib/python3.6/site-packages/pandas/io/parsers.py", line 676, in parser_f
return _read(filepath_or_buffer, kwds)
File "/home/cisapp/.local/lib/python3.6/site-packages/pandas/io/parsers.py", line 448, in _read
parser = TextFileReader(fp_or_buf, **kwds)
File "/home/cisapp/.local/lib/python3.6/site-packages/pandas/io/parsers.py", line 880, in __init__
self._make_engine(self.engine)
File "/home/cisapp/.local/lib/python3.6/site-packages/pandas/io/parsers.py", line 1114, in _make_engine
self._engine = CParserWrapper(self.f, **self.options)
File "/home/cisapp/.local/lib/python3.6/site-packages/pandas/io/parsers.py", line 1891, in __init__
self._reader = parsers.TextReader(src, **kwds)
File "pandas/_libs/parsers.pyx", line 532, in pandas._libs.parsers.TextReader.__cinit__
pandas.errors.EmptyDataError: No columns to parse from file
Code:
df = pd.read_csv(BASE_LOCATION+'/msLog_Success.csv', engine='python')
f_output = df.groupby('MSISDN').last()
#print(df)
print(f_output)
fail = pd.read_csv(BASE_LOCATION+'/msLogFailure.csv', engine='python')
fail = fail['MSISDN']
fail = fail.tolist()
for i in fail:
succ = f_output[f_output.MSISDN != i]
In above sample code there is no error while reading file df = pd.read_csv(BASE_LOCATION+'/msLog_Success.csv', engine='python') but while reading file fail = pd.read_csv(BASE_LOCATION+'/msLogFailure.csv', engine='python') i am facing the error as mentioned above. Please help to resolve.
Note: I am running code by using python3.
I faced the same problem and resolved. So you can check using below idea.
Check the delimitator and mention like below examples
pd.read_csv(BASE_LOCATION+'/msLog_Success.csv', encoding='utf-16', sep='\t')
pd.read_csv(BASE_LOCATION+'/msLog_Success.csv', delim_whitespace=True)
You can also add 'r' before file path.
Otherwise share the file image
Your sample of msLogFailure file looks OK - 6 column names and 6 data fields.
I looked for posts concerning just this error message and I found an advice to:
read the input file into a string variable,
read_csv from this string, e.g. pd.read_csv(io.StringIO(txt),...).
Maybe this will help.
I have a list of tickers that I want to retrieve the prices for by running the following:
from yahoo_fin import stock_info as si
for x in watchlist:
print(si.get_live_price(x))
When I run this I get the following error:
File "", line 1, in
runfile('C:/Users/User/OneDrive/Documents/Stuff/fluff 2.py', wdir='C:/Users/User/OneDrive/Documents/Stuff')
File
"D:\Anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py",
line 705, in runfile
execfile(filename, namespace)
File
"D:\Anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py",
line 102, in execfile
exec(compile(f.read(), filename, 'exec'), namespace)
File "C:/Users/User/OneDrive/Documents/Stuff/fluff 2.py", line 46,
in
print(si.get_live_price(x))
File "D:\Anaconda3\lib\site-packages\yahoo_fin\stock_info.py", line
338, in get_live_price
df = get_data(ticker, end_date = pd.Timestamp.today() + pd.DateOffset(10))
File "D:\Anaconda3\lib\site-packages\yahoo_fin\stock_info.py", line
68, in get_data
temp = loads(needed)
ValueError: Expected object or value
However, when I refer to a ticker directly, it runs normally:
print(si.get_live_price('tsla'))
348.8399963378906
What could be causing this issue? Is it due to me using a different html parser than that used with yahoo_fin in an earlier part of the code?
Try this out, It gives you complete dataframe for last 6 months data
import yfinance as yf
for x in ['TSLA','AAPL']:
data = yf.download( tickers = x)
print(data['Close'][-1])
Output :
348.8399963378906
268.4800109863281
If you want last 6 month data then you can store individual dataframe. In above case I have printed only last index as you wanted LTP.
This issue should be fixed now in the latest version of yahoo_fin (0.8.4). It was due to a change in Yahoo Finance's structure. See here for news about recent updates: http://theautomatic.net/2019/12/16/updates-to-yahoo_fin-package/
I am using training spacy NER to extract skills information from resume.But error is
Could not find a transition with the name 'U-SKILL' in the NER model
TRAINING DATA:
[(u"I have 2 years of experience in Python", {"entities": [(30, 35, "SKILL")]})]
CODE :
other_pipes = [pipe for pipe in nlp.pipe_names if pipe != "ner"]
with nlp.disable_pipes(*other_pipes):
optimizer = nlp.begin_training()
for i in range(10):
random.shuffle(train_data)
for text, annotations in train_data:
nlp.update([text], [annotations], sgd=optimizer)```
Error Traceback:
```Traceback (most recent call last):
File "<ipython-input-1-b5f869eaaf43>", line 1, in <module>
runfile('/home/abhishek/Desktop/Monster/Resume_Parser/MI_Resume/skills_ner.py', wdir='/home/abhishek/Desktop/Monster/Resume_Parser/MI_Resume')
File "/usr/lib/python3/dist-packages/spyder/utils/site/sitecustomize.py", line 705, in runfile
execfile(filename, namespace)
File "/usr/lib/python3/dist-packages/spyder/utils/site/sitecustomize.py", line 102, in execfile
exec(compile(f.read(), filename, 'exec'), namespace)
File "/home/abhishek/Desktop/Monster/Resume_Parser/MI_Resume/skills_ner.py", line 234, in <module>
nlp.update([text], [annotations], sgd=optimizer)
File "/home/abhishek/.local/lib/python3.6/site-packages/spacy/language.py", line 452, in update
proc.update(docs, golds, sgd=get_grads, losses=losses, **kwargs)
File "nn_parser.pyx", line 413, in spacy.syntax.nn_parser.Parser.update
File "nn_parser.pyx", line 516, in spacy.syntax.nn_parser.Parser._init_gold_batch
File "ner.pyx", line 106, in spacy.syntax.ner.BiluoPushDown.preprocess_gold
File "ner.pyx", line 165, in spacy.syntax.ner.BiluoPushDown.lookup_transition
KeyError: "[E022] Could not find a transition with the name 'U-SKILL' in the NER model."```
I recently faced the same error message while training my own custom NER model. Since you didn't show your entire code snippet, I'm not certain if it was caused by the same problem. For my case, it was actually a very silly mistake where the new labels I introduced into the entity recognizer were all in lowercase.
for label in entity_types:
ner.add_label(label.upper())
The error went away once I made sure that all my new labels added were in uppercase (i.e. 'SKILL' instead of 'skill') using str.upper().
You should probably refer to https://spacy.io/usage/training#ner as well for the example given on adding new entity types.
In my training data. I escaped special characters and it worked.
For example : from 1/1/2020 to 1///1///2020
I'm new to Python and more specifically, Alpha Vantage. My issue here is that my code is collecting the data for the stock, but it is not graphing it. I ran this code on my cmd with 3.7 python and already updated all my packages. I have heard people have been having problems with matplotlib and 3.7 version of python, but I really want to figure out this API. This is my code below:
from alpha_vantage.timeseries
import TimeSeries
import matplotlib.pyplot as plt
import sys
def stockchart(symbol):
ts = TimeSeries(key='1ORS1XLM1YK1GK9Y', output_format='pandas')
data, meta_data = ts.get_intraday(symbol=symbol, interval='1min', outputsize='full')
print (data)
data['close'].plot()
plt.title('Stock chart')
plt.show()
symbol=input("Enter symbol name:") stockchart(symbol)
I got this in response on my cmd after entering MSFT for the symbol name... which means I am pulling from the API but the data['close'] function is not working correctly with PANDAS
1. open 2. high 3. low 4. close 5. volume
date
2018-09-05 09:30:00 111.1900 111.4000 111.1200 111.3500 673119.0
File "C:\Users\davis\AppData\Local\Programs\Python\Python37\lib\site-packages\pandas\core\internals.py", line 4115, in get
loc = self.items.get_loc(item)
File "C:\Users\davis\AppData\Local\Programs\Python\Python37\lib\site-packages\pandas\core\indexes\base.py", line 3080, in get_loc
return self._engine.get_loc(self._maybe_cast_indexer(key))
File "pandas\_libs\index.pyx", line 140, in pandas._libs.index.IndexEngine.get_loc
File "pandas\_libs\index.pyx", line 162, in pandas._libs.index.IndexEngine.get_loc
File "pandas\_libs\hashtable_class_helper.pxi", line 1492, in pandas._libs.hashtable.PyObjectHashTable.get_item
File "pandas\_libs\hashtable_class_helper.pxi", line 1500, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'close'
I had the same problem.
Pay attention in this line, that contain the names of columns:
1. open 2. high 3. low 4. close 5. volume
So, change this line with name of column: For example:
data['close'].plot() change for data['4. close'].plot()
So I'm trying to edit a csv file by writing to a temporary file and eventually replacing the original with the temp file. I'm going to have to edit the csv file multiple times so I need to be able to reference it. I've never used the NamedTemporaryFile command before and I'm running into a lot of difficulties. The most persistent problem I'm having is writing over the edited lines.
This part goes through and writes over rows unless specific values are in a specific column and then it just passes over.
I have this:
office = 3
temp = tempfile.NamedTemporaryFile(delete=False)
with open(inFile, "rb") as oi, temp:
r = csv.reader(oi)
w = csv.writer(temp)
for row in r:
if row[office] == "R00" or row[office] == "ALC" or row[office] == "RMS":
pass
else:
w.writerow(row)
and I get this error:
Traceback (most recent call last):
File "H:\jcatoe\Practice Python\pract.py", line 86, in <module>
cleanOfficeCol()
File "H:\jcatoe\Practice Python\pract.py", line 63, in cleanOfficeCol
for row in r:
_csv.Error: iterator should return strings, not bytes (did you open the file in text mode?)
So I searched for that error and the general consensus was that "rb" needs to be "rt" so I tried that and got this error:
Traceback (most recent call last):
File "H:\jcatoe\Practice Python\pract.py", line 86, in <module>
cleanOfficeCol()
File "H:\jcatoe\Practice Python\pract.py", line 67, in cleanOfficeCol
w.writerow(row)
File "C:\Users\jcatoe\AppData\Local\Programs\Python\Python35-32\lib\tempfile.py", line 483, in func_wrapper
return func(*args, **kwargs)
TypeError: a bytes-like object is required, not 'str'
I'm confused because the errors seem to be saying to do the opposite thing.
If you read the tempfile docs you'll see that by default it's opening the file in 'w+b' mode. If you take a closer look at your errors, you'll see that you're getting one on read, and one on write. What you need to be doing is making sure that you're opening your input and output file in the same mode.
You can do it like this:
import csv
import tempfile
office = 3
temp = tempfile.NamedTemporaryFile(delete=False)
with open(inFile, 'r') as oi, tempfile.NamedTemporaryFile(delete=False, mode='w') as temp:
reader = csv.reader(oi)
writer = csv.writer(temp)
for row in reader:
if row[office] == "R00" or row[office] == "ALC" or row[office] == "RMS":
pass
else:
writer.writerow(row)