OSError: [Errno 12] Cannot allocate memory pytesseract - python-3.x

I am running a Python script that converts PDF pages to images and then extracts their text with Tesseract (through pytesseract).
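# Imports are assumed from how the names are used below
import io

import pytesseract
from PIL import Image
from wand.image import Image as wi  # assumed: `wi` looks like Wand's Image class

for filename in path_list:
    print(filename)
    # Render the PDF at 300 dpi and convert its pages to JPEG
    pdfFile = wi(filename = filename, resolution = 300)
    image = pdfFile.convert('jpeg')
    imageBlobs = []
    for img in image.sequence:
        imgPage = wi(image = img)
        imageBlobs.append(imgPage.make_blob('jpeg'))
    extract = []
    # Run OCR on each page image
    for imgBlob in imageBlobs:
        image = Image.open(io.BytesIO(imgBlob))
        text = pytesseract.image_to_string(image, lang = 'eng')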
After extracting the content of 11 PDFs, I get the error below.
The PDF file itself is not the problem: when I pass that particular PDF on its own, its content is extracted.
I am running the script on Ubuntu 16.04.
Any help would be appreciated.
Error:
File "/home/steve/.local/lib/python3.5/site-packages/pytesseract/pytesseract.py", line 170 ,in run_tesseract
proc = subprocess.Popen(cmd_args, **subprocess_args())
File "/usr/lib/python3.5/subprocess.py", line 947, in __init__
restore_signals, start_new_session)
File "/usr/lib/python3.5/subprocess.py", line 1490, in _execute_child
restore_signals, start_new_session, preexec_fn)
OSError: [Errno 12] Cannot allocate memory
Traceback (most recent call last):
File "ocr_script.py", line 466, in <module>
gather_details(path_list)
File "ocr_script.py", line 45, in gather_details
discover_data('Indexing',discoveryPath,final_meta,start_time)
File "ocr_script.py", line 165, in discover_data
text = pytesseract.image_to_string(image, lang='eng')
File "/home/steve/.local/lib/python3.5/site-packages/pytesseract/pytesseract.py", line 294, in image_to_string
return run_and_get_output(*args)
File "/home/steve/.local/lib/python3.5/site-packages/pytesseract/pytesseract.py", line 202, in run_and_get_output
run_tesseract(**kwargs)
File "/home/steve/.local/lib/python3.5/site-packages/pytesseract/pytesseract.py", line 172, in run_tesseract
raise TesseractNotFoundError()
pytesseract.pytesseract.TesseractNotFoundError: /usr/bin/tesseract is not installed or it's

After further analysis and tweaks, I came to the conclusion that the problem was with my tesseract rather than the OS.
Changes I made:
In /etc/ImageMagic..(version ), edit the policy.xml file.
These are the parameters where I increased the memory.
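For reference, the memory-related settings in ImageMagick's policy.xml are the entries with domain="resource" (for example memory, map, and disk). The sketch below lists those entries from a policy document, so the current limits can be inspected before raising any of them. The sample XML and its values are illustrative assumptions, not the values used in the answer above, and `resource_limits` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Illustrative policy.xml content; the values are assumptions, not the
# settings used in the answer above.
SAMPLE_POLICY = """\
<policymap>
  <policy domain="resource" name="memory" value="256MiB"/>
  <policy domain="resource" name="map" value="512MiB"/>
  <policy domain="resource" name="disk" value="1GiB"/>
  <policy domain="coder" rights="none" pattern="PS"/>
</policymap>
"""


def resource_limits(policy_xml):
    """Return {name: value} for every resource policy in the XML text."""
    root = ET.fromstring(policy_xml)
    return {
        policy.get("name"): policy.get("value")
        for policy in root.iter("policy")
        if policy.get("domain") == "resource"
    }


limits = resource_limits(SAMPLE_POLICY)
for name, value in sorted(limits.items()):
    print(name, "=", value)
```

To check a real installation, pass the text of your own policy.xml to `resource_limits` instead of the sample string.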

Related

Generated Ebook throws error when trying to read it with ebook readers using Ebooklib

The EPUB is generated successfully, but when I try to open it in readers like Calibre or Sigil, they throw errors saying that certain files are missing.
Here's my code to generate the epub file:
book = epub.EpubBook()
book.set_title(novelName)
book.set_language("en")
book.set_cover('temp.jpg', content=open('temp.jpg','rb').read())
book.set_identifier("test")
for i in authorNames:
    book.add_author(i)
for i in range(1):
    driver.get(chapterLinks[i])
    try:
        content=driver.find_element_by_id('chr-content').get_attribute("innerHTML")
        time.sleep(5)
    except Exception as e:
        driver.close()
        driver = webdriver.Chrome(ChromeDriverManager().install(),options=options)
        driver.get(chapterLinks[i])
        content=driver.find_element_by_id('chr-content').get_attribute("innerHTML")
        time.sleep(5)
    soup = BeautifulSoup(content)
    ads=soup.find("div", class_="ads-holder")
    if(ads!=None):
        ads.decompose()
    print(chapterNames[i], chapterLinks[i])
    chapterName=chapterNames[i].replace("-","")
    c=epub.EpubHtml(title=chapterName,
                    file_name='{}.xhtml'.format(chapterName),
                    lang='en')
    c.set_content(str(soup).encode('utf-8'))
    book.add_item(c)
    chapterList.append(c)
book.toc = chapterList
book.spine = chapterList
book.add_item(epub.EpubNcx())
book.add_item(epub.EpubNav())
epub.write_epub('test.epub', book)
and here are the errors:
Calibre :
calibre, version 5.20.0
ERROR: Loading book failed: Failed to open the book at C:\Users\xxxxx\Documents\Visual Studio 2019\PersonalProjects\Novel Grabber\test.epub. Click "Show details" for more info.
Failed to convert book: C:\Users\xxxxx\Documents\Visual Studio 2019\PersonalProjects\Novel Grabber\test.epub with error:
InputFormatPlugin: EPUB Input running
on C:\Users\xxxxx\Documents\Visual Studio 2019\PersonalProjects\Novel Grabber\test.epub
Failed to run pipe worker with command: from calibre.srv.render_book import viewer_main; viewer_main()
Traceback (most recent call last):
File "runpy.py", line 194, in _run_module_as_main
File "runpy.py", line 87, in _run_code
File "site.py", line 82, in <module>
File "site.py", line 77, in main
File "site.py", line 49, in run_entry_point
File "calibre\utils\ipc\worker.py", line 197, in main
File "<string>", line 1, in <module>
File "calibre\srv\render_book.py", line 824, in viewer_main
File "calibre\srv\render_book.py", line 815, in render_for_viewer
File "calibre\srv\render_book.py", line 793, in render
File "calibre\srv\render_book.py", line 601, in process_exploded_book
File "calibre\srv\render_book.py", line 604, in <setcomp>
File "calibre\ebooks\oeb\polish\container.py", line 561, in has_name_and_is_not_empty
File "genericpath.py", line 50, in getsize
FileNotFoundError: [WinError 2] The system cannot find the file specified: 'C:\\Users\\xxxxxx\\AppData\\Local\\calibre-cache\\ev2\\t\\c0-vdo66nim\\EPUB\\Chapter 2 '
Sigil:
Files exist in epub that are not listed in manifest, they will be ignored
Does anybody know what could be the cause for this?
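One thing worth checking, judging only from the path in the Calibre traceback (it ends in 'Chapter 2 ' with a trailing space and no .xhtml extension): the chapter title is used directly as `file_name`, so characters that are unsafe in file names may be ending up inside the EPUB. This is a guess at the cause, not a confirmed diagnosis. A minimal sketch of deriving a conservative file name from a title, where `safe_file_name` is a hypothetical helper:

```python
import re


def safe_file_name(title, index, extension=".xhtml"):
    """Build a conservative file name from a chapter title.

    Keeps ASCII letters, digits, and hyphens; every other run of
    characters becomes a single underscore. The index keeps names
    unique even if two titles collapse to the same slug.
    """
    slug = re.sub(r"[^A-Za-z0-9-]+", "_", title).strip("_")
    if not slug:
        slug = "chapter"
    return "{:04d}_{}{}".format(index, slug, extension)


print(safe_file_name("Chapter 2 : The Beginning?", 2))
# prints: 0002_Chapter_2_The_Beginning.xhtml
```

The result would be passed as `file_name` to `epub.EpubHtml`, while `title` keeps the original chapter name.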

PythonPDF: FileNotFoundError: [WinError 2] The system cannot find the file specified

I am getting the below error trace when I am using PDFJinja's example for filling form fields in an existing PDF file.
Code Snippet:
dir_name = os.path.dirname("P:\\Project\\pdfjinja_services\\resources\\sample.pdf")
template_pdf_file = os.path.join(dir_name, 'sample.pdf')
template_pdf = PdfJinja(template_pdf_file, current_app.jinja_env)
print(type(template_pdf))
rendered_pdf = template_pdf({
    'firstName': 'Faye',
    'lastName': 'Valentine'
})
output_file = os.path.join(dir_name, 'output.pdf')
rendered_pdf.write(open(output_file, 'wb'))
Error:
Traceback (most recent call last):
File "P:\Professional\Python\CR\workspace\workspace-local\myplaybook\pdf_pdfjinja.py", line 10, in <module>
rendered_pdf = template_pdf({
File "P:\Professional\Python\Softwares\python-3.9.6\lib\site-packages\pdfjinja.py", line 240, in __call__
filled = PdfFileReader(self.exec_pdftk(self.rendered))
File "P:\Professional\Python\Softwares\python-3.9.6\lib\site-packages\pdfjinja.py", line 212, in exec_pdftk
p = Popen(args, stdin=PIPE, stdout=PIPE, stderr=PIPE)
File "P:\Professional\Python\Softwares\python-3.9.6\lib\subprocess.py", line 951, in __init__
self._execute_child(args, executable, preexec_fn, close_fds,
File "P:\Professional\Python\Softwares\python-3.9.6\lib\subprocess.py", line 1420, in _execute_child
hp, ht, pid, tid = _winapi.CreateProcess(executable, args,
FileNotFoundError: [WinError 2] The system cannot find the file specified
I am using python-3.9.6 and PDFJinja-1.1.0.
Please let me know if I am missing any other dependencies.
You are missing a binary (per the update, pdftk), or the binary is not on your search path.
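A quick way to confirm this from Python is to ask the standard library whether the executable can be found on the search path. This is a minimal sketch; `find_binary` is a hypothetical helper name, and "pdftk" is the binary named in the traceback's `exec_pdftk` call.

```python
import shutil


def find_binary(name):
    """Return the full path of an executable on PATH, or None if absent."""
    return shutil.which(name)


pdftk_path = find_binary("pdftk")
if pdftk_path is None:
    print("pdftk was not found on PATH; install it or add its folder to PATH")
else:
    print("pdftk found at", pdftk_path)
```

If the check reports that pdftk is missing, `Popen` will fail with the same FileNotFoundError regardless of how the PDF paths are written.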

Getting `EOFError: Compressed file ended before the end-of-stream marker was reached` error

I wrote a python script to download a file, extract it and train the AI. Code is as given below:
def maybe_download_and_extract(data_url):
    dest_directory = FLAGS.model_dir
    if not os.path.exists(dest_directory):
        os.makedirs(dest_directory)
    filename = data_url.split('/')[-1]
    filepath = os.path.join(dest_directory, filename)
    if not os.path.exists(filepath):
        def _progress(count, block_size, total_size):
            sys.stdout.write('\r>> Downloading %s %.1f%%' %
                             (filename, float(count * block_size) / float(total_size) * 100.0))
            sys.stdout.flush()
        filepath, _ = urllib.request.urlretrieve(data_url, filepath, _progress)
        print()
        statinfo = os.stat(filepath)
        tf.logging.info('Successfully downloaded', filename, statinfo.st_size, 'bytes.')
    tarfile.open(filepath, 'r:gz').extractall(dest_directory)
When I run it, I get this error:
Traceback (most recent call last):
File "scripts/retrain.py", line 1326, in <module>
tf.app.run(main=main, argv=[sys.argv[0]] + unparsed)
File "C:\Users\kulkaa\AppData\Local\conda\conda\envs\mlcc\lib\site-packages\tensorflow\python\platform\app.py", line 125, in run
_sys.exit(main(argv))
File "scripts/retrain.py", line 982, in main
maybe_download_and_extract(model_info['data_url'])
File "scripts/retrain.py", line 340, in maybe_download_and_extract
tarfile.open(filepath, 'r:gz').extractall(dest_directory)
File "C:\Users\kulkaa\AppData\Local\conda\conda\envs\mlcc\lib\tarfile.py", line 2010, in extractall
numeric_owner=numeric_owner)
File "C:\Users\kulkaa\AppData\Local\conda\conda\envs\mlcc\lib\tarfile.py", line 2052, in extract
numeric_owner=numeric_owner)
File "C:\Users\kulkaa\AppData\Local\conda\conda\envs\mlcc\lib\tarfile.py", line 2122, in _extract_member
self.makefile(tarinfo, targetpath)
File "C:\Users\kulkaa\AppData\Local\conda\conda\envs\mlcc\lib\tarfile.py", line 2171, in makefile
copyfileobj(source, target, tarinfo.size, ReadError, bufsize)
File "C:\Users\kulkaa\AppData\Local\conda\conda\envs\mlcc\lib\tarfile.py", line 249, in copyfileobj
buf = src.read(bufsize)
File "C:\Users\kulkaa\AppData\Local\conda\conda\envs\mlcc\lib\gzip.py", line 276, in read
return self._buffer.read(size)
File "C:\Users\kulkaa\AppData\Local\conda\conda\envs\mlcc\lib\_compression.py", line 68, in readinto
data = self.read(len(byte_view))
File "C:\Users\kulkaa\AppData\Local\conda\conda\envs\mlcc\lib\gzip.py", line 482, in read
raise EOFError("Compressed file ended before the "
EOFError: Compressed file ended before the end-of-stream marker was reached
It seems that the file was only partially downloaded. Where is that file? I deleted the contents of the tmp folder and ran the program again, but got the same error.
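On the "where is that file" question: in the function above the archive is saved at `os.path.join(FLAGS.model_dir, filename)`, and because of the `os.path.exists(filepath)` check, an archive that is already there is reused rather than downloaded again, even if it is truncated. A minimal sketch of detecting and removing a truncated gzip archive so the next run fetches it again; `gzip_is_complete` and `remove_if_truncated` are hypothetical helper names.

```python
import gzip
import os
import zlib


def gzip_is_complete(path, chunk_size=1024 * 1024):
    """Return True if the gzip file at `path` decompresses to the end."""
    try:
        with gzip.open(path, "rb") as stream:
            # Read and discard the data; a truncated file raises EOFError.
            while stream.read(chunk_size):
                pass
    except (EOFError, OSError, zlib.error):
        return False
    return True


def remove_if_truncated(path):
    """Delete a partial download; return True if the file was removed."""
    if os.path.exists(path) and not gzip_is_complete(path):
        os.remove(path)
        return True
    return False
```

Calling `remove_if_truncated(filepath)` before the `os.path.exists(filepath)` check would let the existing download branch run again for a broken archive.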

OSError: [Errno 8] Exec format error while reading video Moviepy

I'm trying to run a video processing code on NVIDIA TX2 using moviepy. The code is:
clip = VideoFileClip(video_file)
video_clip = clip.fl_image(process_vid)
video_clip.write_videofile(output_vid2)
I get the error in the first line. The full error is:
Traceback (most recent call last):
File "img_test.py", line 117, in <module>
clip = VideoFileClip(video_file)
File "/home/nvidia/.local/lib/python3.5/site-packages/moviepy/video/io/VideoFileClip.py", line 91, in __init__
fps_source=fps_source)
File "/home/nvidia/.local/lib/python3.5/site-packages/moviepy/video/io/ffmpeg_reader.py", line 33, in __init__
fps_source)
File "/home/nvidia/.local/lib/python3.5/site-packages/moviepy/video/io/ffmpeg_reader.py", line 256, in ffmpeg_parse_infos
proc = sp.Popen(cmd, **popen_params)
File "/usr/lib/python3.5/subprocess.py", line 947, in __init__
restore_signals, start_new_session)
File "/usr/lib/python3.5/subprocess.py", line 1551, in _execute_child
raise child_exception_type(errno_num, err_msg)
OSError: [Errno 8] Exec format error
I even tried the suggestions in this reference, but nothing seems to work.
Any suggestions?
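Errno 8 (ENOEXEC) is what the operating system reports when it is asked to run a file it does not recognise as an executable for this machine. In the traceback it is raised from the `Popen` call in ffmpeg_reader.py, so the ffmpeg binary that moviepy launches is the file to inspect. One possible cause, not confirmed by the question, is an ffmpeg build for a different CPU architecture than the ARM-based TX2. The sketch below reads the architecture field from an ELF header using only the standard library; `elf_machine` and `binary_machine` are hypothetical helper names, and the demo header is hand-built rather than read from a real file.

```python
import platform
import struct

# e_machine values from the ELF specification.
ELF_MACHINES = {3: "x86", 40: "arm", 62: "x86_64", 183: "aarch64"}


def elf_machine(header):
    """Return the architecture named in an ELF header, or None if not ELF."""
    if len(header) < 20 or header[:4] != b"\x7fELF":
        return None
    byte_order = "<" if header[5] == 1 else ">"  # EI_DATA: 1 = little-endian
    (machine,) = struct.unpack_from(byte_order + "H", header, 18)
    return ELF_MACHINES.get(machine, "unknown ({})".format(machine))


def binary_machine(path):
    """Read the first 20 bytes of a file and report its ELF architecture."""
    with open(path, "rb") as fh:
        return elf_machine(fh.read(20))


# Hand-built little-endian 64-bit x86_64 header, used only as a demo input.
demo_header = b"\x7fELF\x02\x01\x01" + b"\x00" * 9 + struct.pack("<HH", 2, 62)
print("binary:", elf_machine(demo_header), "| this machine:", platform.machine())
```

Running `binary_machine` on the path of the ffmpeg executable that moviepy is configured to use, and comparing the result with `platform.machine()`, shows whether the two match.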

Apache Tika Server issue and unable to read a PDF file

I am trying to use the Apache Tika library to parse PDF files and read their content. I installed it with `pip install tika` on Python 3.
Code:
from tika import parser
parsedPDF = parser.from_file("test.pdf",serverEndpoint='http://localhost:9998')
or
from tika import parser
parsedPDF = parser.from_file("test.pdf")
Error:
Traceback (most recent call last):
File "tikaparsing-test.py", line 2, in <module>
parsedPDF = parser.from_file("test.pdf",serverEndpoint='http://localhost:9998')
File "C:\ProgramData\Anaconda3\lib\site-packages\tika\parser.py", line 36, in from_file
jsonOutput = parse1('all', filename, serverEndpoint, headers=headers)
File "C:\ProgramData\Anaconda3\lib\site-packages\tika\tika.py", line 316, in parse1
headers, verbose, tikaServerJar, rawResponse=rawResponse)
File "C:\ProgramData\Anaconda3\lib\site-packages\tika\tika.py", line 510, in callServer
serverEndpoint = checkTikaServer(scheme, serverHost, port, tikaServerJar, classpath)
File "C:\ProgramData\Anaconda3\lib\site-packages\tika\tika.py", line 565, in checkTikaServer
startServer(jarPath, serverHost, port, classpath)
File "C:\ProgramData\Anaconda3\lib\site-packages\tika\tika.py", line 609, in startServer
cmd = Popen(cmd , stdout= logFile, stderr = STDOUT, shell =True)
File "C:\ProgramData\Anaconda3\lib\subprocess.py", line 709, in __init__
restore_signals, start_new_session)
File "C:\ProgramData\Anaconda3\lib\subprocess.py", line 997, in _execute_child
startupinfo)
PermissionError: [WinError 5] Access is denied
