Converting a supposed Excel file to CSV in Python - excel

I am having trouble converting a file to CSV.
I am using the code below as a starting point:
directory = 'C:\OI Data'
filename = 'OpenInterest08-24-16'
data_xls = pd.read_excel(os.path.join(directory,filename), 'Sheet1', index_col=None)
data_xls.to_csv(os.path.join(directory,filename +'.csv'), encoding='utf-8')
and I am getting the following error:
Traceback (most recent call last):
File "", line 1, in
File "C:\Anaconda2\lib\site-packages\spyderlib\widgets\externalshell\sitecustomize.py", line 714, in runfile
execfile(filename, namespace)
File "C:\Anaconda2\lib\site-packages\spyderlib\widgets\externalshell\sitecustomize.py", line 74, in execfile
exec(compile(scripttext, filename, 'exec'), glob, loc)
File "C:/Users/Public/Documents/Python Scripts/work.py", line 26, in
data_xls = pd.read_excel(os.path.join(directory,filename), 'Sheet1', index_col=None)
File "C:\Anaconda2\lib\site-packages\pandas\io\excel.py", line 170, in read_excel
io = ExcelFile(io, engine=engine)
File "C:\Anaconda2\lib\site-packages\pandas\io\excel.py", line 227, in __init__
self.book = xlrd.open_workbook(io)
File "C:\Anaconda2\lib\site-packages\xlrd\__init__.py", line 441, in open_workbook
ragged_rows=ragged_rows,
File "C:\Anaconda2\lib\site-packages\xlrd\book.py", line 91, in open_workbook_xls
biff_version = bk.getbof(XL_WORKBOOK_GLOBALS)
File "C:\Anaconda2\lib\site-packages\xlrd\book.py", line 1230, in getbof
bof_error('Expected BOF record; found %r' % self.mem[savpos:savpos+8])
File "C:\Anaconda2\lib\site-packages\xlrd\book.py", line 1224, in bof_error
raise XLRDError('Unsupported format, or corrupt file: ' + msg)
xlrd.biffh.XLRDError: Unsupported format, or corrupt file: Expected BOF record; found '\n\n\n\n\n '
I am struggling to figure out what format the file is actually in. It comes from:
https://www.theice.com/marketdata/reports/icefuturesus/PreliminaryOpenInterest.shtml?futuresExcel=&tradeDate=8%2F24%2F16
Opening the file myself, I get the following:
[screenshot of the opened file]
I am still a beginner at Python, and some help would be much appreciated.
Thanks

You can start by fixing this part:
data_xls.to_csv(os.path.join(directory,filename,'.csv'), encoding='utf-8')
What happens when you do that is:
'C:\\OI Data\\OpenInterest08-24-16\\.csv'
Which is not what you want. Instead do:
os.path.join(directory,filename+'.csv')
Which will give you:
'C:\\OI Data\\OpenInterest08-24-16.csv'
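The difference between the two calls can be checked without touching the disk. This is a minimal sketch that uses ntpath (the Windows flavour of os.path) so the result is the same on any platform; the directory and file name are the ones from the question.

```python
import ntpath  # Windows implementation of os.path, importable on any OS

directory = 'C:\\OI Data'
filename = 'OpenInterest08-24-16'

# Passing '.csv' as a separate argument makes it its own path component.
as_component = ntpath.join(directory, filename, '.csv')

# Concatenating first keeps the extension attached to the file name.
as_extension = ntpath.join(directory, filename + '.csv')

print(as_component)  # C:\OI Data\OpenInterest08-24-16\.csv
print(as_extension)  # C:\OI Data\OpenInterest08-24-16.csv
```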
Also, although it is not a problem here, be careful with single backslashes in string literals in general, because a backslash followed by a character can form an escape sequence (for example, \n is a newline):
directory = 'C:\OI Data'
Instead, escape the backslash, or use a raw string:
directory = 'C:\\OI Data'
directory = r'C:\OI Data'
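As for the traceback itself: xlrd expected a binary BOF record but found '\n\n\n\n\n ', which suggests the download is text rather than a real workbook. One way to check is to look at the first bytes of the file. The sketch below is only a heuristic; the helper names sniff_bytes and sniff_format are hypothetical, and the HTML guess is an assumption based on the file coming from a web page.

```python
def sniff_bytes(head):
    """Guess a file's real format from its leading bytes (heuristic only)."""
    if head.startswith(b'\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1'):
        return 'xls'    # OLE2 compound document, used by legacy .xls
    if head.startswith(b'PK\x03\x04'):
        return 'xlsx'   # ZIP container, used by .xlsx (and other ZIP files)
    text = head.lstrip().lower()
    if (text.startswith(b'<!doctype html')
            or b'<html' in text or b'<table' in text):
        return 'html'   # assumption: a web page saved with an Excel name
    return 'unknown'


def sniff_format(path, size=2048):
    """Read the first bytes of path and classify them with sniff_bytes."""
    with open(path, 'rb') as fh:
        return sniff_bytes(fh.read(size))


print(sniff_bytes(b'\n\n\n\n\n <html><body>'))  # html
```

If this reports 'html' for the downloaded file, pandas.read_html may be a better fit than read_excel. It returns a list of DataFrames and needs an HTML parser such as lxml installed.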

Related

Python passing a string value from a table to a function

I'm trying to go through a list of items and pass each one to a function, one by one, to create an Excel file with the same name as the argument passed. I am getting the error below, which I believe is related to the '/' in the string. Can anyone advise how to get around this?
>>> test.createExcel(filename)
Traceback (most recent call last):
File "<pyshell#97>", line 1, in <module>
test.createExcel(filename)
File "C:\Users\danie\OneDrive\JVC\project1.py", line 52, in createExcel
wb2.save(modelname+'.xlsx')
File "C:\Users\danie\AppData\Local\Programs\Python\Python37\lib\site-packages\openpyxl\workbook\workbook.py", line 392, in save
save_workbook(self, filename)
File "C:\Users\danie\AppData\Local\Programs\Python\Python37\lib\site-packages\openpyxl\writer\excel.py", line 291, in save_workbook
archive = ZipFile(filename, 'w', ZIP_DEFLATED, allowZip64=True)
File "C:\Users\danie\AppData\Local\Programs\Python\Python37\lib\zipfile.py", line 1240, in __init__
self.fp = io.open(file, filemode)
FileNotFoundError: [Errno 2] No such file or directory: '14 A4/32GB BLU.xlsx'
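On Windows, a filename cannot contain any of the following characters: \ / : * ? " < > |
Here the '/' in '14 A4/32GB BLU.xlsx' is treated as a path separator, so Python looks for a directory named '14 A4' that does not exist.
In your case, you could replace the offending character with str.replace('/', '-'), or with any other character you like.
For example:
wb.save(filename.replace('/', '-'))
Alternatively, a regular expression can replace all of the invalid characters at once.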
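A minimal sketch of the regular-expression variant, assuming the goal is to swap every character from the list above for a dash. The helper name safe_filename is hypothetical.

```python
import re

# The characters that are rejected in Windows file names.
_INVALID = re.compile(r'[\\/:*?"<>|]')


def safe_filename(name, replacement='-'):
    """Return name with every invalid character replaced."""
    return _INVALID.sub(replacement, name)


print(safe_filename('14 A4/32GB BLU') + '.xlsx')  # 14 A4-32GB BLU.xlsx
```

Sanitize only the name itself, not a full path, or the directory separators will be replaced as well.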

Not able to parse csv file from pandas

I am writing a Python script that generates two different CSV files and then reads them with pandas. I can read file1, but I get an error when reading file2, which has the same format (same column names) as file1 with different or identical values. Below are the error I am getting and the sample code I am using.
Error:
Traceback (most recent call last):
File "MSReport.py", line 168, in <module>
fail = pd.read_csv('/home/cisapp/msLogFailure.csv', sep=',')
File "/home/cisapp/.local/lib/python3.6/site-packages/pandas/io/parsers.py", line 676, in parser_f
return _read(filepath_or_buffer, kwds)
File "/home/cisapp/.local/lib/python3.6/site-packages/pandas/io/parsers.py", line 448, in _read
parser = TextFileReader(fp_or_buf, **kwds)
File "/home/cisapp/.local/lib/python3.6/site-packages/pandas/io/parsers.py", line 880, in __init__
self._make_engine(self.engine)
File "/home/cisapp/.local/lib/python3.6/site-packages/pandas/io/parsers.py", line 1114, in _make_engine
self._engine = CParserWrapper(self.f, **self.options)
File "/home/cisapp/.local/lib/python3.6/site-packages/pandas/io/parsers.py", line 1891, in __init__
self._reader = parsers.TextReader(src, **kwds)
File "pandas/_libs/parsers.pyx", line 532, in pandas._libs.parsers.TextReader.__cinit__
pandas.errors.EmptyDataError: No columns to parse from file
Code:
df = pd.read_csv(BASE_LOCATION+'/msLog_Success.csv', engine='python')
f_output = df.groupby('MSISDN').last()
#print(df)
print(f_output)
fail = pd.read_csv(BASE_LOCATION+'/msLogFailure.csv', engine='python')
fail = fail['MSISDN']
fail = fail.tolist()
for i in fail:
    succ = f_output[f_output.MSISDN != i]
In the sample code above, reading the first file with df = pd.read_csv(BASE_LOCATION+'/msLog_Success.csv', engine='python') raises no error, but reading the second with fail = pd.read_csv(BASE_LOCATION+'/msLogFailure.csv', engine='python') fails with the error shown above. Please help me resolve this.
Note: I am running the code with Python 3.
I faced the same problem and resolved it, so you can try the ideas below.
Check the delimiter and specify it explicitly, as in these examples:
pd.read_csv(BASE_LOCATION+'/msLog_Success.csv', encoding='utf-16', sep='\t')
pd.read_csv(BASE_LOCATION+'/msLog_Success.csv', delim_whitespace=True)
You can also add an 'r' prefix to the file path to make it a raw string.
Otherwise, please share a sample of the file.
Your sample of msLogFailure file looks OK - 6 column names and 6 data fields.
I looked for posts about this specific error message and found a suggestion to:
read the input file into a string variable,
call read_csv on this string, e.g. pd.read_csv(io.StringIO(txt),...).
Maybe this will help.
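Another thing worth ruling out, since the script generates both CSV files itself before reading them: EmptyDataError is what pandas reports when the file has nothing to parse, which can happen if the file was never written, or was still open and unflushed when read_csv ran. Whether that is the cause here is an assumption. The sketch below checks for that case first; both helper names are hypothetical.

```python
import os


def has_content(path):
    """True if path exists and holds something other than whitespace."""
    if not os.path.isfile(path) or os.path.getsize(path) == 0:
        return False
    with open(path, 'rb') as fh:
        return bool(fh.read().strip())


def read_csv_or_none(path, **kwargs):
    """Read path with pandas, or return None when there is nothing to parse."""
    if not has_content(path):
        return None
    import pandas as pd  # imported late so has_content() works without pandas
    return pd.read_csv(path, **kwargs)
```

If has_content returns False for msLogFailure.csv at the point where read_csv is called, the place to look is the code that writes the file, for example a file handle that is not closed before the read.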

Getting `EOFError: Compressed file ended before the end-of-stream marker was reached` error

I wrote a python script to download a file, extract it and train the AI. Code is as given below:
def maybe_download_and_extract(data_url):
    dest_directory = FLAGS.model_dir
    if not os.path.exists(dest_directory):
        os.makedirs(dest_directory)
    filename = data_url.split('/')[-1]
    filepath = os.path.join(dest_directory, filename)
    if not os.path.exists(filepath):
        def _progress(count, block_size, total_size):
            sys.stdout.write('\r>> Downloading %s %.1f%%' %
                             (filename, float(count * block_size) / float(total_size) * 100.0))
            sys.stdout.flush()
        filepath, _ = urllib.request.urlretrieve(data_url, filepath, _progress)
        print()
        statinfo = os.stat(filepath)
        tf.logging.info('Successfully downloaded', filename, statinfo.st_size, 'bytes.')
    tarfile.open(filepath, 'r:gz').extractall(dest_directory)
When I run it, I get this error:
Traceback (most recent call last):
File "scripts/retrain.py", line 1326, in <module>
tf.app.run(main=main, argv=[sys.argv[0]] + unparsed)
File "C:\Users\kulkaa\AppData\Local\conda\conda\envs\mlcc\lib\site-packages\tensorflow\python\platform\app.py", line 125, in run
_sys.exit(main(argv))
File "scripts/retrain.py", line 982, in main
maybe_download_and_extract(model_info['data_url'])
File "scripts/retrain.py", line 340, in maybe_download_and_extract
tarfile.open(filepath, 'r:gz').extractall(dest_directory)
File "C:\Users\kulkaa\AppData\Local\conda\conda\envs\mlcc\lib\tarfile.py", line 2010, in extractall
numeric_owner=numeric_owner)
File "C:\Users\kulkaa\AppData\Local\conda\conda\envs\mlcc\lib\tarfile.py", line 2052, in extract
numeric_owner=numeric_owner)
File "C:\Users\kulkaa\AppData\Local\conda\conda\envs\mlcc\lib\tarfile.py", line 2122, in _extract_member
self.makefile(tarinfo, targetpath)
File "C:\Users\kulkaa\AppData\Local\conda\conda\envs\mlcc\lib\tarfile.py", line 2171, in makefile
copyfileobj(source, target, tarinfo.size, ReadError, bufsize)
File "C:\Users\kulkaa\AppData\Local\conda\conda\envs\mlcc\lib\tarfile.py", line 249, in copyfileobj
buf = src.read(bufsize)
File "C:\Users\kulkaa\AppData\Local\conda\conda\envs\mlcc\lib\gzip.py", line 276, in read
return self._buffer.read(size)
File "C:\Users\kulkaa\AppData\Local\conda\conda\envs\mlcc\lib\_compression.py", line 68, in readinto
data = self.read(len(byte_view))
File "C:\Users\kulkaa\AppData\Local\conda\conda\envs\mlcc\lib\gzip.py", line 482, in read
raise EOFError("Compressed file ended before the "
EOFError: Compressed file ended before the end-of-stream marker was reached
It seems that the file was only partially downloaded. Where is that file? I deleted the contents of the tmp folder and ran the program again, but got the same error.
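Judging from the code in the question, the archive is saved as os.path.join(FLAGS.model_dir, filename), and the download is skipped whenever that path already exists, so a truncated file left there would be reused on every run. Where FLAGS.model_dir points depends on how the script is invoked and is not shown here. A minimal sketch for detecting and removing such a file follows; the helper names are hypothetical.

```python
import gzip
import io
import os
import zlib


def gzip_stream_is_complete(fileobj, chunk_size=1 << 20):
    """Decompress fileobj to its end; False if the stream is cut or corrupt."""
    try:
        with gzip.GzipFile(fileobj=fileobj, mode='rb') as gz:
            while gz.read(chunk_size):
                pass
    except (EOFError, OSError, zlib.error):
        return False
    return True


def remove_if_truncated(path):
    """Delete path if it fails the check, so the next run downloads it again."""
    with open(path, 'rb') as fh:
        ok = gzip_stream_is_complete(fh)
    if not ok:
        os.remove(path)
    return ok


# In-memory check: a full stream passes, one cut in half fails.
payload = gzip.compress(b'example' * 10000)
print(gzip_stream_is_complete(io.BytesIO(payload)))                      # True
print(gzip_stream_is_complete(io.BytesIO(payload[:len(payload) // 2])))  # False
```

Calling remove_if_truncated(filepath) before the os.path.exists(filepath) test would let the existing download logic fetch a fresh copy.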

How can I read the csv file in pandas which is separated with ";"?

I started working with pandas in Python 3.4 a couple of days ago. I chose to work with the Book-Crossing data set.
The book information table is like this:
The Book rating table is like this:
I want to take "ISBN" and "Book-Title" from the book information table and merge them with the book rating table on matching "ISBN", and then write the result to another CSV file.
I used the code below:
udata = pd.read_csv('1', names = ('User_ID', 'ISBN', 'Book-Rating'), encoding="ISO-8859-1", sep=';', usecols=[0,1,2])
uitem = pd.read_csv('2', names = ('ISBN', 'Book-Title'), encoding="ISO-8859-1", sep=';', usecols=[0,1])
ratings = pd.merge(udata, uitem, on='ISBN')
ratings.to_csv('ratings.csv', index=False)
Unfortunately it doesn't work and it gives an error:
Traceback (most recent call last):
File "C:\Users\masoud\Desktop\Dataset\data2\a.py", line 2, in <module>
udata = pd.read_csv('2.csv', names = ('User_ID', 'ISBN', 'Book-Rating'),encoding="ISO-8859-1", sep=';', usecols=[0,1,2])
File "C:\WinPython-64bit-3.4.3.6\python-3.4.3.amd64\lib\site-packages\pandas\io\parsers.py", line 491, in parser_f
return _read(filepath_or_buffer, kwds)
File "C:\WinPython-64bit-3.4.3.6\python-3.4.3.amd64\lib\site-packages\pandas\io\parsers.py", line 278, in _read
return parser.read()
File "C:\WinPython-64bit-3.4.3.6\python-3.4.3.amd64\lib\site-packages\pandas\io\parsers.py", line 740, in read
ret = self._engine.read(nrows)
File "C:\WinPython-64bit-3.4.3.6\python-3.4.3.amd64\lib\site-packages\pandas\io\parsers.py", line 1187, in read
data = self._reader.read(nrows)
File "pandas\parser.pyx", line 758, in pandas.parser.TextReader.read (pandas\parser.c:7919)
File "pandas\parser.pyx", line 780, in pandas.parser.TextReader._read_low_memory (pandas\parser.c:8175)
File "pandas\parser.pyx", line 833, in pandas.parser.TextReader._read_rows (pandas\parser.c:8868)
File "pandas\parser.pyx", line 820, in pandas.parser.TextReader._tokenize_rows (pandas\parser.c:8736)
File "pandas\parser.pyx", line 1732, in pandas.parser.raise_parser_error (pandas\parser.c:22105)
pandas.parser.CParserError: Error tokenizing data. C error: Expected 8 fields in line 6452, saw 9
I was wondering if anybody could fix the error?
In the first and second lines of your code, set sep to ';':
sep=';'
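The message "Expected 8 fields in line 6452, saw 9" means the parser met a row with more ';'-separated fields than the rows before it. Before deciding how to handle such rows, it can help to list them. This is a minimal sketch using the standard csv module; find_ragged_rows is a hypothetical helper, and the sample text is made up for illustration, not taken from the Book-Crossing files.

```python
import csv
import io


def find_ragged_rows(text, delimiter=';', expected=None):
    """Return (line_number, field_count) for rows whose width is unexpected.

    If expected is None, the width of the first row is used.
    """
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    ragged = []
    for row in reader:
        if expected is None:
            expected = len(row)
        if len(row) != expected:
            ragged.append((reader.line_num, len(row)))
    return ragged


sample = (
    '"ISBN";"Book-Title";"Book-Author"\n'
    '"0001";"A Title";"An Author"\n'
    '"0002";"Half; Title";"Another"\n'    # quoted ';' is still one field
    '"0003";Stray;Semicolon;"Someone"\n'  # unquoted ';' adds a field
)
print(find_ragged_rows(sample))  # [(4, 4)]
```

pandas 1.3 and later also accept on_bad_lines='skip' in read_csv to drop such rows, at the cost of silently losing data.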

Uppercases convert to lowercase when loading a file with h5py

Hello, I can't load an HDF5 file with h5py:
$ python verif.py
Traceback (most recent call last):
File "verif.py", line 4, in <module>
h5f = h5py.File("../DeepFISH-Github_projects/DeepFISH/dataset/'+'LowRes_13434_overlapping_pairs.h5",'r')
File "/home/jeanpat/VirtualEnv/venv3/lib/python3.5/site-packages/h5py/_hl/files.py", line 272, in __init__
fid = make_fid(name, mode, userblock_size, fapl, swmr=swmr)
File "/home/jeanpat/VirtualEnv/venv3/lib/python3.5/site-packages/h5py/_hl/files.py", line 92, in make_fid
fid = h5f.open(name, flags, fapl=fapl)
File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper (/tmp/pip-at6d2npe-build/h5py/_objects.c:2684)
File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper (/tmp/pip-at6d2npe-build/h5py/_objects.c:2642)
File "h5py/h5f.pyx", line 76, in h5py.h5f.open (/tmp/pip-at6d2npe-build/h5py/h5f.c:1930)
OSError: Unable to open file (Unable to open file: name = '../deepfish-github_projects/deepfish/dataset/'+'lowres_13434_overlapping_pairs.h5', errno = 2, error message = 'no such file or directory', flags = 0, o_flags = 0
The string containing the path to the file:
../DeepFISH-Github_projects/DeepFISH/dataset'+'LowRes_13434_overlapping_pairs.h5
seems to be modified by h5py
../deepfish-github_projects/deepfish/dataset/lowres_13434_overlapping_pairs.h5
I could modify the directory name, but it's weird.
In this line
h5f = h5py.File("../DeepFISH-Github_projects/DeepFISH/dataset/'+'LowRes_13434_overlapping_pairs.h5",'r')
you're trying to open a file with a literal '+' in its name. The outer quotes are double quotes, so the single quotes within the string are just part of the name. What you probably wanted to use is:
h5f = h5py.File("../DeepFISH-Github_projects/DeepFISH/dataset/" + "LowRes_13434_overlapping_pairs.h5",'r')
I don't know why the error message is all lower case. Maybe the library retries the lookup in a case-insensitive way when it cannot find the file by its original name, or the underlying file system is case insensitive and this is just how the OS reports the missing-file error.
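The quoting issue is easy to reproduce without h5py. In the sketch below the path is shortened to a hypothetical dataset/ directory; only the file name comes from the question.

```python
# Single quotes inside a double-quoted literal are ordinary characters.
wrong = "dataset/'+'LowRes_13434_overlapping_pairs.h5"

# Closing and reopening the double quotes makes + a real concatenation.
right = "dataset/" + "LowRes_13434_overlapping_pairs.h5"

print(wrong)  # dataset/'+'LowRes_13434_overlapping_pairs.h5
print(right)  # dataset/LowRes_13434_overlapping_pairs.h5
print("'+'" in wrong, "'+'" in right)  # True False
```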
