Uppercases convert to lowercase when loading a file with h5py - python-3.x

Hello, I can't load an HDF5 file with h5py:
$ python verif.py
Traceback (most recent call last):
File "verif.py", line 4, in <module>
h5f = h5py.File("../DeepFISH-Github_projects/DeepFISH/dataset/'+'LowRes_13434_overlapping_pairs.h5",'r')
File "/home/jeanpat/VirtualEnv/venv3/lib/python3.5/site-packages/h5py/_hl/files.py", line 272, in __init__
fid = make_fid(name, mode, userblock_size, fapl, swmr=swmr)
File "/home/jeanpat/VirtualEnv/venv3/lib/python3.5/site-packages/h5py/_hl/files.py", line 92, in make_fid
fid = h5f.open(name, flags, fapl=fapl)
File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper (/tmp/pip-at6d2npe-build/h5py/_objects.c:2684)
File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper (/tmp/pip-at6d2npe-build/h5py/_objects.c:2642)
File "h5py/h5f.pyx", line 76, in h5py.h5f.open (/tmp/pip-at6d2npe-build/h5py/h5f.c:1930)
OSError: Unable to open file (Unable to open file: name = '../deepfish-github_projects/deepfish/dataset/'+'lowres_13434_overlapping_pairs.h5', errno = 2, error message = 'no such file or directory', flags = 0, o_flags = 0
The string containing the path to the file:
../DeepFISH-Github_projects/DeepFISH/dataset'+'LowRes_13434_overlapping_pairs.h5
seems to be modified by h5py
../deepfish-github_projects/deepfish/dataset/lowres_13434_overlapping_pairs.h5
I could modify the directory name, but it's weird.

In this line
h5f = h5py.File("../DeepFISH-Github_projects/DeepFISH/dataset/'+'LowRes_13434_overlapping_pairs.h5",'r')
you're trying to open a file with a literal '+' in its name. The outer quotes are double quotes, so the single quotes within the string are just part of the name. What you probably wanted to use is:
h5f = h5py.File("../DeepFISH-Github_projects/DeepFISH/dataset/" + "LowRes_13434_overlapping_pairs.h5",'r')
I don't know why the error message is all lowercase. Maybe the library retries the lookup case-insensitively when it can't find the file under the original name, or the underlying file system is case-insensitive and this is simply how the OS reports the missing file.
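To make the difference visible without touching any file, here is a minimal sketch that only builds the strings. The variable names are mine, not from the question, and os.path.join is just one reasonable way to assemble the path:

```python
import os

directory = "../DeepFISH-Github_projects/DeepFISH/dataset/"
filename = "LowRes_13434_overlapping_pairs.h5"

# What the question's code passes: a single string literal in which the
# three characters '+' (quote, plus, quote) are part of the file name.
literal_path = "../DeepFISH-Github_projects/DeepFISH/dataset/'+'LowRes_13434_overlapping_pairs.h5"

# What was probably intended: two strings concatenated at run time.
concatenated_path = directory + filename

# os.path.join produces the same path here and takes care of the separator.
joined_path = os.path.join(directory, filename)

print(repr(literal_path))
print(repr(concatenated_path))
print("'+'" in literal_path)       # True: the quotes and plus are in the name
print("'+'" in concatenated_path)  # False
```

Printing the path with repr() before handing it to h5py.File is a cheap way to spot this kind of quoting mistake.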

Related

Generated Ebook throws error when trying to read it with ebook readers using Ebooklib

The EPUB is generated successfully, but when I try to open it with readers such as Calibre or Sigil, they throw errors saying that certain files are missing.
Here's my code to generate the epub file:
book = epub.EpubBook()
book.set_title(novelName)
book.set_language("en")
book.set_cover('temp.jpg', content=open('temp.jpg','rb').read())
book.set_identifier("test")
for i in authorNames:
    book.add_author(i)
for i in range(1):
    driver.get(chapterLinks[i])
    try:
        content=driver.find_element_by_id('chr-content').get_attribute("innerHTML")
        time.sleep(5)
    except Exception as e:
        driver.close()
        driver = webdriver.Chrome(ChromeDriverManager().install(),options=options)
        driver.get(chapterLinks[i])
        content=driver.find_element_by_id('chr-content').get_attribute("innerHTML")
        time.sleep(5)
    soup = BeautifulSoup(content)
    ads=soup.find("div", class_="ads-holder")
    if(ads!=None):
        ads.decompose()
    print(chapterNames[i], chapterLinks[i])
    chapterName=chapterNames[i].replace("-","")
    c=epub.EpubHtml(title=chapterName,
                    file_name='{}.xhtml'.format(chapterName),
                    lang='en')
    c.set_content(str(soup).encode('utf-8'))
    book.add_item(c)
    chapterList.append(c)
book.toc = chapterList
book.spine = chapterList
book.add_item(epub.EpubNcx())
book.add_item(epub.EpubNav())
epub.write_epub('test.epub', book)
and here are the errors:
Calibre :
calibre, version 5.20.0
ERROR: Loading book failed: Failed to open the book at C:\Users\xxxxx\Documents\Visual Studio 2019\PersonalProjects\Novel Grabber\test.epub. Click "Show details" for more info.
Failed to convert book: C:\Users\xxxxx\Documents\Visual Studio 2019\PersonalProjects\Novel Grabber\test.epub with error:
InputFormatPlugin: EPUB Input running
on C:\Users\xxxxx\Documents\Visual Studio 2019\PersonalProjects\Novel Grabber\test.epub
Failed to run pipe worker with command: from calibre.srv.render_book import viewer_main; viewer_main()
Traceback (most recent call last):
File "runpy.py", line 194, in _run_module_as_main
File "runpy.py", line 87, in _run_code
File "site.py", line 82, in <module>
File "site.py", line 77, in main
File "site.py", line 49, in run_entry_point
File "calibre\utils\ipc\worker.py", line 197, in main
File "<string>", line 1, in <module>
File "calibre\srv\render_book.py", line 824, in viewer_main
File "calibre\srv\render_book.py", line 815, in render_for_viewer
File "calibre\srv\render_book.py", line 793, in render
File "calibre\srv\render_book.py", line 601, in process_exploded_book
File "calibre\srv\render_book.py", line 604, in <setcomp>
File "calibre\ebooks\oeb\polish\container.py", line 561, in has_name_and_is_not_empty
File "genericpath.py", line 50, in getsize
FileNotFoundError: [WinError 2] The system cannot find the file specified: 'C:\\Users\\xxxxxx\\AppData\\Local\\calibre-cache\\ev2\\t\\c0-vdo66nim\\EPUB\\Chapter 2 '
Sigil:
Files exist in epub that are not listed in manifest, they will be ignored
Does anybody know what could be the cause for this?

Python passing a string value from a table to a function

I'm trying to go through a list of items and pass each one to a function, one by one, to create an Excel file with the same name as the argument passed. I am getting the error below, which I believe is related to the '/' in the string. Can anyone advise how to work around this?
>>> test.createExcel(filename)
Traceback (most recent call last):
File "<pyshell#97>", line 1, in <module>
test.createExcel(filename)
File "C:\Users\danie\OneDrive\JVC\project1.py", line 52, in createExcel
wb2.save(modelname+'.xlsx')
File "C:\Users\danie\AppData\Local\Programs\Python\Python37\lib\site-packages\openpyxl\workbook\workbook.py", line 392, in save
save_workbook(self, filename)
File "C:\Users\danie\AppData\Local\Programs\Python\Python37\lib\site-packages\openpyxl\writer\excel.py", line 291, in save_workbook
archive = ZipFile(filename, 'w', ZIP_DEFLATED, allowZip64=True)
File "C:\Users\danie\AppData\Local\Programs\Python\Python37\lib\zipfile.py", line 1240, in __init__
self.fp = io.open(file, filemode)
FileNotFoundError: [Errno 2] No such file or directory: '14 A4/32GB BLU.xlsx'
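On Windows, a filename cannot contain any of the following characters: \ / : * ? " < > |
The '/' is also treated as a directory separator, so Python looks for a folder named '14 A4' and fails because it does not exist.
In your case, you could replace the character with str.replace('/','-'), or with any other character you like.
For example:
wb.save(filename.replace('/','-'))
Alternatively, a regular expression can replace all of the invalid characters at once.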
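A minimal sketch of the regular-expression variant. The helper name safe_filename and the choice of '-' as the replacement are assumptions for illustration, not anything openpyxl provides:

```python
import re

# The characters Windows does not allow in file names: \ / : * ? " < > |
INVALID_CHARS = re.compile(r'[\\/:*?"<>|]')


def safe_filename(name, replacement="-"):
    """Return name with every invalid character replaced."""
    return INVALID_CHARS.sub(replacement, name)


modelname = "14 A4/32GB BLU"
print(safe_filename(modelname) + ".xlsx")  # 14 A4-32GB BLU.xlsx
```

The sanitised name would then be passed to save() in place of the raw one, for example wb2.save(safe_filename(modelname) + '.xlsx').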

How can I compile the example xml file from the Open62541 tutorial?

I'm on chapter 11 of the official guide to the open62541 library. The html version is here. Before trying anything custom, I just want to try this feature in the most basic way by "compiling" their example xml file into C code, which can then be compiled with the GCC and run as an OPC server. (If you would like to follow along, download the full source code from the main page—the nodeset compiler tool is in there.)
I'm in a Debian-based environment (CLI only). I made a copy of myNS.xml and saved it directly in the path ~/code/open62541-open62541-6249bb2/tools/nodeset_compiler/, which is also my current working directory in this example. I tried to use the nodeset compiler with exactly the same command that they use in the tutorial: python ./nodeset_compiler.py --types-array=UA_TYPES --existing ../../deps/ua-nodeset/Schema/Opc.Ua.NodeSet2.xml --xml myNS.xml myNS
The error message I got is this:
Traceback (most recent call last):
File "./nodeset_compiler.py", line 126, in <module>
ns.addNodeSet(xmlfile, True, typesArray=getTypesArray(nsCount))
File "/root/code/open62541-open62541-6249bb2/tools/nodeset_compiler/nodeset.py", line 224, in addNodeSet
nodesets = dom.parseString(fileContent).getElementsByTagName("UANodeSet")
File "/usr/lib/python2.7/xml/dom/minidom.py", line 1928, in parseString
return expatbuilder.parseString(string)
File "/usr/lib/python2.7/xml/dom/expatbuilder.py", line 940, in parseString
return builder.parseString(string)
File "/usr/lib/python2.7/xml/dom/expatbuilder.py", line 223, in parseString
parser.Parse(string, True)
xml.parsers.expat.ExpatError: syntax error: line 1, column 0
Any idea what I might be doing wrong?
UPDATE:
Alright, I found out there was a problem with my Opc.Ua.NodeSet2.xml file, which I corrected. If you are following along and would like to grab the version of the file I have, you can get it here.
But now I have this issue:
INFO:__main__:Preprocessing (existing) ../../deps/ua-nodeset/Schema/Opc.Ua.NodeSet2.xml
INFO:__main__:Preprocessing myNS.xml
Traceback (most recent call last):
File "./nodeset_compiler.py", line 178, in <module>
ns.allocateVariables()
File "/root/code/open62541-open62541-6249bb2/tools/nodeset_compiler/nodeset.py", line 322, in allocateVariables
n.allocateValue(self)
File "/root/code/open62541-open62541-6249bb2/tools/nodeset_compiler/nodes.py", line 291, in allocateValue
self.value.parseXMLEncoding(self.xmlValueDef, dataTypeNode, self)
File "/root/code/open62541-open62541-6249bb2/tools/nodeset_compiler/datatypes.py", line 161, in parseXMLEncoding
val = self.__parseXMLSingleValue(el, parentDataTypeNode, parent)
File "/root/code/open62541-open62541-6249bb2/tools/nodeset_compiler/datatypes.py", line 281, in __parseXMLSingleValue
extobj.value.append(extobj.__parseXMLSingleValue(ebodypart, parentDataTypeNode, parent, alias=None, encodingPart=e))
File "/root/code/open62541-open62541-6249bb2/tools/nodeset_compiler/datatypes.py", line 223, in __parseXMLSingleValue
alias=alias, encodingPart=enc[1], valueRank=enc[2] if len(enc)>2 else None)
File "/root/code/open62541-open62541-6249bb2/tools/nodeset_compiler/datatypes.py", line 198, in __parseXMLSingleValue
t.parseXML(xmlvalue)
File "/root/code/open62541-open62541-6249bb2/tools/nodeset_compiler/datatypes.py", line 330, in parseXML
self.value = int(unicode(xmlvalue.firstChild.data))
ValueError: invalid literal for int() with base 10: ''
UPDATE_2:
I tried doing the same thing on my Windows laptop, and here is the error I got:
INFO:__main__:Preprocessing (existing) ../../deps/ua-nodeset/Schema/Opc.Ua.NodeSet2.xml
INFO:__main__:Preprocessing myNS.xml
Traceback (most recent call last):
File "./nodeset_compiler.py", line 178, in <module>
ns.allocateVariables()
File "C:\Users\ekstraaa\Source\open62541\open62541-open62541-6249bb2\tools\nodeset_compiler\nodeset.py", line 322, in allocateVariables
n.allocateValue(self)
File "C:\Users\ekstraaa\Source\open62541\open62541-open62541-6249bb2\tools\nodeset_compiler\nodes.py", line 291, in allocateValue
self.value.parseXMLEncoding(self.xmlValueDef, dataTypeNode, self)
File "C:\Users\ekstraaa\Source\open62541\open62541-open62541-6249bb2\tools\nodeset_compiler\datatypes.py", line 161, in parseXMLEncoding
val = self.__parseXMLSingleValue(el, parentDataTypeNode, parent)
File "C:\Users\ekstraaa\Source\open62541\open62541-open62541-6249bb2\tools\nodeset_compiler\datatypes.py", line 281, in __parseXMLSingleValue
extobj.value.append(extobj.__parseXMLSingleValue(ebodypart, parentDataTypeNode, parent, alias=None, encodingPart=e))
File "C:\Users\ekstraaa\Source\open62541\open62541-open62541-6249bb2\tools\nodeset_compiler\datatypes.py", line 223, in __parseXMLSingleValue
alias=alias, encodingPart=enc[1], valueRank=enc[2] if len(enc)>2 else None)
File "C:\Users\ekstraaa\Source\open62541\open62541-open62541-6249bb2\tools\nodeset_compiler\datatypes.py", line 198, in __parseXMLSingleValue
t.parseXML(xmlvalue)
File "C:\Users\ekstraaa\Source\open62541\open62541-open62541-6249bb2\tools\nodeset_compiler\datatypes.py", line 330, in parseXML
self.value = int(unicode(xmlvalue.firstChild.data))
ValueError: invalid literal for int() with base 10: '\n '
The complete documentation for the open62541 nodeset compiler can be found here:
https://open62541.org/doc/current/nodeset_compiler.html
The command you are using also seems to be fine.
The last issue you describe, invalid literal for int(), is caused by a newline inside the value tag of a variable.
This will be fixed with
https://github.com/open62541/open62541/pull/2768
As a workaround, you can change your .xml from
<Value>
    <Int32>
    </Int32>
</Value>
to (no newline):
<Value>
    <Int32></Int32>
</Value>
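If the nodeset contains many such values, the same edit could be scripted. The following is only a sketch under some assumptions: it handles tags without attributes, it treats every whitespace-only element the same way (which may be wrong for string values where whitespace matters), and the function name is made up for illustration. It is not part of the open62541 tooling.

```python
import re

# An element whose content is whitespace only, e.g. an Int32 tag that is
# opened on one line and closed on the next.
WHITESPACE_ONLY = re.compile(r"<([\w:.-]+)>\s+</\1>")


def collapse_whitespace_only_elements(xml_text):
    """Remove whitespace-only content between an opening tag and its close."""
    return WHITESPACE_ONLY.sub(r"<\1></\1>", xml_text)


before = "<Value>\n<Int32>\n</Int32>\n</Value>"
after = collapse_whitespace_only_elements(before)
print(after)
```

Only the innermost element is touched; the newlines around it, between the Value tags, stay as they were.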

OSError: [Errno 12] Cannot allocate memory pytesseract

I am facing an issue. I am running a Python script that converts PDF pages to images and then runs Tesseract on them.
for filename in path_list:
    print(filename)
    pdfFile = wi(filename = filename, resolution = 300)
    image = pdfFile.convert('jpeg')
    imageBlobs = []
    for img in image.sequence:
        imgPage = wi(image = img)
        imageBlobs.append(imgPage.make_blob('jpeg'))
    extract = []
    for imgBlob in imageBlobs:
        image = Image.open(io.BytesIO(imgBlob))
        text = pytesseract.image_to_string(image, lang = 'eng')
After extracting content from 11 PDFs, I get the following error.
The PDF file itself is not the problem: when I pass that particular PDF on its own, its content is extracted.
I am running the script on Ubuntu 16.04.
Any help would be appreciated.
Error:
File "/home/steve/.local/lib/python3.5/site-packages/pytesseract/pytesseract.py", line 170, in run_tesseract
proc = subprocess.Popen(cmd_args, **subprocess_args())
File "/usr/lib/python3.5/subprocess.py", line 947, in __init__
restore_signals, start_new_session)
File "/usr/lib/python3.5/subprocess.py", line 1490, in _execute_child
restore_signals, start_new_session, preexec_fn)
OSError: [Errno 12] Cannot allocate memory
Traceback (most recent call last):
File "ocr_script.py", line 466, in <module>
gather_details(path_list)
File "ocr_script.py", line 45, in gather_details
discover_data('Indexing',discoveryPath,final_meta,start_time)
File "ocr_script.py", line 165, in discover_data
text = pytesseract.image_to_string(image, lang='eng')
File "/home/steve/.local/lib/python3.5/site-packages/pytesseract/pytesseract.py", line 294, in image_to_string
return run_and_get_output(*args)
File "/home/steve/.local/lib/python3.5/site-packages/pytesseract/pytesseract.py", line 202, in run_and_get_output
run_tesseract(**kwargs)
File "/home/steve/.local/lib/python3.5/site-packages/pytesseract/pytesseract.py", line 172, in run_tesseract
raise TesseractNotFoundError()
pytesseract.pytesseract.TesseractNotFoundError: /usr/bin/tesseract is not installed or it's
After further analysis and tweaking, I came to the conclusion that the problem was with my Tesseract setup rather than the OS.
The change I made:
In /etc/ImageMagic..(version ), edit the policy.xml file.
I increased the memory limits set in that file.

Converting a supposed excel file in csv in python

I am having trouble with code that converts a file to CSV.
I am using the code below as a starting point:
directory = 'C:\OI Data'
filename = 'OpenInterest08-24-16'
data_xls = pd.read_excel(os.path.join(directory,filename), 'Sheet1', index_col=None)
data_xls.to_csv(os.path.join(directory,filename +'.csv'), encoding='utf-8')
and I am getting the following error:
Traceback (most recent call last):
File "", line 1, in
File "C:\Anaconda2\lib\site-packages\spyderlib\widgets\externalshell\sitecustomize.py", line 714, in runfile
execfile(filename, namespace)
File "C:\Anaconda2\lib\site-packages\spyderlib\widgets\externalshell\sitecustomize.py", line 74, in execfile
exec(compile(scripttext, filename, 'exec'), glob, loc)
File "C:/Users/Public/Documents/Python Scripts/work.py", line 26, in
data_xls = pd.read_excel(os.path.join(directory,filename), 'Sheet1', index_col=None)
File "C:\Anaconda2\lib\site-packages\pandas\io\excel.py", line 170, in read_excel
io = ExcelFile(io, engine=engine)
File "C:\Anaconda2\lib\site-packages\pandas\io\excel.py", line 227, in __init__
self.book = xlrd.open_workbook(io)
File "C:\Anaconda2\lib\site-packages\xlrd\__init__.py", line 441, in open_workbook
ragged_rows=ragged_rows,
File "C:\Anaconda2\lib\site-packages\xlrd\book.py", line 91, in open_workbook_xls
biff_version = bk.getbof(XL_WORKBOOK_GLOBALS)
File "C:\Anaconda2\lib\site-packages\xlrd\book.py", line 1230, in getbof
bof_error('Expected BOF record; found %r' % self.mem[savpos:savpos+8])
File "C:\Anaconda2\lib\site-packages\xlrd\book.py", line 1224, in bof_error
raise XLRDError('Unsupported format, or corrupt file: ' + msg)
xlrd.biffh.XLRDError: Unsupported format, or corrupt file: Expected BOF record; found '\n\n\n\n\n '
I am struggling to figure out the file format I am using
https://www.theice.com/marketdata/reports/icefuturesus/PreliminaryOpenInterest.shtml?futuresExcel=&tradeDate=8%2F24%2F16
Opening the file myself, I get the following: [screenshot not preserved]
I am still a beginner at python and some help would be much appreciated.
Thanks
You can start by fixing this part:
data_xls.to_csv(os.path.join(directory,filename,'.csv'), encoding='utf-8')
With three arguments, os.path.join treats '.csv' as a separate path component, so the result is:
'C:\OI Data\\OpenInterest08-24-16\\.csv'
which is not what you want. Instead do:
os.path.join(directory,filename+'.csv')
which will give you:
'C:\OI Data\\OpenInterest08-24-16.csv'
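The two calls can be compared side by side. This sketch uses ntpath, the Windows implementation behind os.path, so that it behaves the same on any platform; on Windows itself, os.path.join gives identical results:

```python
import ntpath  # the Windows flavour of os.path, importable everywhere

directory = r"C:\OI Data"
filename = "OpenInterest08-24-16"

# '.csv' passed as its own argument becomes a separate path component.
as_component = ntpath.join(directory, filename, ".csv")

# Appending the extension to the file name first keeps it in one component.
as_extension = ntpath.join(directory, filename + ".csv")

print(as_component)  # C:\OI Data\OpenInterest08-24-16\.csv
print(as_extension)  # C:\OI Data\OpenInterest08-24-16.csv
```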
Also, although it is not a problem here, be careful with single backslashes in general: a backslash followed by certain characters forms an escape sequence (for example, \n is a newline):
directory = 'C:\OI Data'
Instead escape the backslash like so:
directory = 'C:\\OI Data'
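A raw string literal is another common way to write Windows paths. The short sketch below compares the spellings; the 'C:\new data' path is a made-up example chosen because it contains \n:

```python
escaped = "C:\\OI Data"  # backslash escaped explicitly
raw = r"C:\OI Data"      # raw string: the backslash is kept as typed
risky = "C:\new data"    # here \n is read as a newline character

print(escaped == raw)  # True: both spell the same 10-character path
print("\n" in risky)   # True: the path silently contains a newline
print(repr(risky))
```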
