Why there is a permisson denied error while using node-tesseract-ocr? - node.js

I am using node-tesseract-ocr library for using ocr for my node js project. I installed tesseract-ocr in my machine(windows) using choco and then node-tesseract-ocr using npm. While requesting that particular route I am getting the following error
Error, cannot read input file "myActualPath": Permission denied
Error during processing.
This is the code I am using
const config = {
lang: 'eng',
oem: 1,
psm: 3,
};
tesseract
.recognize(__dirname, `../public/data/${reciept}`, config)
.then((text) => {
serialResponse = text.match(new RegExp(serial, 'g'));
})
.catch((error) => {
console.log(error.message);
});

Make sure you have added the tesseract-OCR path in your environment variables, and restart your IDE
Note, for programs like PyCharm and many others, you need to also close the program and re-open it after setting the system environment variable - As told by silas in another post similar to this one.
You can refer that post here .
Make sure to import the necessary packages in your module
import pytesseract
import argparse
import cv2
Then construct the argument parser and parse the arguments.
ap = argparse.ArgumentParser()
ap.add_argument("-i", "--image", required=True, help="path to input image to be OCR'd")
args = vars(ap.parse_args())
Note The first Python import you’ll notice in this script is pytesseract (Python Tesseract), a Python binding that ties in directly with the Tesseract OCR application running on your system. The power of pytesseract is our ability to interface with Tesseract rather than relying on ugly os.cmd calls as we needed to do before pytesseract ever existed.
For additional reference.

Related

Python: subprocess + isql on windows: a bytes-like object is required, not 'str'

This is a company issued laptop and I can't install any new software on it. I can install Python modules to my own directory though. I need to run something on Sybase and there are 10+ servers. Manual operation is very time consuming hence I'm looking for the option as Python + subprocess.
I did some research and referred to using subprocess module in python to run isql command. However, my version doesn't work. The error message is TypeError: a bytes-like object is required, not 'str'. This error message popped up from the "communicate" line.
I can see my "isql" has connected successfully as I can get a isql.pid.
Anything I missed here?
Thank you
import subprocess
from subprocess import Popen, PIPE
import keyring
from textwrap import dedent
server = "MYDB"
cmd = r"C:\Sybase15\OCS-15_0\bin\isql.exe"
interface = r"C:\Sybase15\ini\sql.ini"
c = keyring.get_credential("db", None)
username = c.username
password = c.password
isql = Popen([cmd, '-I', interface,
'-S', server,
'-U', username,
'-P', password,
'-w', '99999'], stdin=PIPE, stdout=PIPE)
output = isql.communicate("""\
SET NOCOUNT ON
{}
go
""".format("select count(*) from syslogins"))[0]
From the communicate() documentation,
If streams were opened in text mode, input must be a string. Otherwise, it must be bytes.
By default, streams are opened in binary format, but you can change it to text mode by using the text=True argument in your Popen() call.

Why does the program run in command line but not with IDLE?

The code uses a Reddit wrapper called praw
Here is part of the code:
import praw
from praw.models import MoreComments
username = 'myusername'
userAgent = 'MyAppName/0.1 by ' + username
clientId = 'myclientID'
clientSecret = 'myclientSecret'
threadId = input('Enter your thread id: ');
reddit = praw.Reddit(user_agent=userAgent, client_id=clientId, client_secret=clientSecret)
submission = reddit.submission(id=threadId)
subredditName = submission.subreddit
subredditName = str(subredditName)
act = input('type in here what you want to see: ')
comment_queue = submission.comments[:] # Seed with top-level
submission.comments.replace_more(limit=None)
def dialogues():
for comment in submission.comments.list():
if comment.body.count('"')>7 or comment.body.count('\n')>3:
print(comment.body + '\n \n \n')
def maxLen():
res = 'abc'
for comment in submission.comments.list():
if len(comment.body)>len(res):
res=comment.body
print(res)
#http://code.activestate.com/recipes/269708-some-python-style-switches/
eval('%s()'%act)
Since I am new to Python and don't really get programming in general, I am surprised to see that the every bit of code in the commandline works but I get an error in IDLE on the first line saying ModuleNotFoundError: No module named 'praw'
you have to install praw using the command
pip install praw which install latest version of praw in the environment
What must be happening is that your cmd and idle are using different python interpreters i.e., you have two different modules which can execute python code. It can either be different versions of python or it can be the same version but, installed in different locations in your machine.
Let's call the two interpreters as PyA and PyB for now. If you have pip install praw in PyA, only PyA will be able to import and execute functions from that library. PyB still has no idea what praw means.
What you can do is install the library for PyB and everything will be good to go.

How to get WKHTMLTOPDF working on Heroku?

I created a website which generates PDF using PDFKIT and I know how to install and setup environment variable path on Window. I managed to deploy my first website on Heroku but now I'm getting error "No wkhtmltopdf executable found: "b''" When trying to generate the PDF.
I have no idea, How to install and setup WKHTMLTOPDF on Heroku because this is first time I'm dealing with Linux.
I really tried everything before asking this but even following this not working for me.
Python 3 flask install wkhtmltopdf on heroku
If possible, please guide me with step by step on how to install and setup this.
I followed all the resource and everything but couldn't make it work. Every time I get the same error.
I'm using Django version 2. Python version 3.7.
This is what I get if I do heroku stack
Available Stacks
cedar-14
container
heroku-16
* heroku-18
Error, I'm getting when generating the PDF.
No wkhtmltopdf executable found: "b''"
If this file exists please check that this process can read it. Otherwise please install wkhtmltopdf - https://github.com/JazzCore/python-pdfkit/wiki/Installing-wkhtmltopdf
My website works very well on localhost without any problem and as far as I know, I'm sure that I have done something wrong in installing wkhtmltopdf.
Thank you
It's non-trivial. If you want to avoid all of the below's headache, you can just use my service, api2pdf: https://github.com/api2pdf/api2pdf.python. Otherwise, if you want to try and work through it, see below.
1) Add this to your requirements.txt to install a special wkhtmltopdf pack for heroku as well as pdfkit.
git+git://github.com/johnfraney/wkhtmltopdf-pack.git
pdfkit==0.6.1
2) I created a pdf_manager.py in my flask app. In pdf_manager.py I have a method:
def _get_pdfkit_config():
"""wkhtmltopdf lives and functions differently depending on Windows or Linux. We
need to support both since we develop on windows but deploy on Heroku.
Returns:
A pdfkit configuration
"""
if platform.system() == 'Windows':
return pdfkit.configuration(wkhtmltopdf=os.environ.get('WKHTMLTOPDF_BINARY', 'C:\\Program Files\\wkhtmltopdf\\bin\\wkhtmltopdf.exe'))
else:
WKHTMLTOPDF_CMD = subprocess.Popen(['which', os.environ.get('WKHTMLTOPDF_BINARY', 'wkhtmltopdf')], stdout=subprocess.PIPE).communicate()[0].strip()
return pdfkit.configuration(wkhtmltopdf=WKHTMLTOPDF_CMD)
The reason I have the platform statement in there is that I develop on a windows machine and I have the local wkhtmltopdf binary on my PC. But when I deploy to Heroku, it runs in their linux containers so I need to detect first which platform we're on before running the binary.
3) Then I created two more methods - one to convert a url to pdf and another to convert raw html to pdf.
def make_pdf_from_url(url, options=None):
"""Produces a pdf from a website's url.
Args:
url (str): A valid url
options (dict, optional): for specifying pdf parameters like landscape
mode and margins
Returns:
pdf of the website
"""
return pdfkit.from_url(url, False, configuration=_get_pdfkit_config(), options=options)
def make_pdf_from_raw_html(html, options=None):
"""Produces a pdf from raw html.
Args:
html (str): Valid html
options (dict, optional): for specifying pdf parameters like landscape
mode and margins
Returns:
pdf of the supplied html
"""
return pdfkit.from_string(html, False, configuration=_get_pdfkit_config(), options=options)
I use these methods to convert to PDF.
Just follow these steps to Deploy Django app(pdfkit) on Heroku:
Step 1:: Add following packages in requirements.txt file
wkhtmltopdf-pack==0.12.3.0
pdfkit==0.6.0
Step 2: Add below lines in the views.py to add path of binary file
import os, sys, subprocess, platform
if platform.system() == "Windows":
pdfkit_config = pdfkit.configuration(wkhtmltopdf=os.environ.get('WKHTMLTOPDF_BINARY', 'C:\\Program Files\\wkhtmltopdf\\bin\\wkhtmltopdf.exe'))
else:
os.environ['PATH'] += os.pathsep + os.path.dirname(sys.executable)
WKHTMLTOPDF_CMD = subprocess.Popen(['which', os.environ.get('WKHTMLTOPDF_BINARY', 'wkhtmltopdf')],
stdout=subprocess.PIPE).communicate()[0].strip()
pdfkit_config = pdfkit.configuration(wkhtmltopdf=WKHTMLTOPDF_CMD)
Step 3: And then pass pdfkit_config as argument as below
pdf = pdfkit.from_string(html,False,options, configuration=pdfkit_config)

I used Open CV to get the Image Difference using Python and code doesn't work

I am using Python 3.6.2. I am looking to run this code https://www.pyimagesearch.com/2017/06/19/image-difference-with-opencv-and-python/, but I have received this error:
usage: [-h] -f FIRST -s SECOND
error: the following arguments are required: -f/--first, -s/--second"
when I run the last line of this code and I don't know what is wrong:
from skimage.measure import compare_ssim
import argparse
import imutils
import cv2
import args
ap = argparse.ArgumentParser()
ap.add_argument("-f", "--first", required=True,default='I:\Aaron - Satslab\Pyimagesearch - code - Image processing and computer vision and others\image-difference\images\first.png',
help="firstinputimage")
ap.add_argument("-s", "--second", required=True,default='I:\Aaron - Satslab\Pyimagesearch - code - Image processing and computer vision and others\image-difference\images\second.png',
help="second")
args = vars(ap.parse_args())
Looking forward to your help.
The problem is that you added a default value to the arguments you add, but didn't set required=False. This means the program will throw an exception when parsing the arguments, unless you invoke it with the actual -f/--first and -s/--second.
The solution is to either:
Set required=False in both add_argument calls, since you provide a default. That way, you can call python my_script.py and it will use the default provided.
Invoke your program by providing the two CLI options: python my_script. py -f some_file.png -s some_other_file.png

ipython notebook --script deprecated. How to replace with post save hook?

I have been using "ipython --script" to automatically save a .py file for each ipython notebook so I can use it to import classes into other notebooks. But this recenty stopped working, and I get the following error message:
`--script` is deprecated. You can trigger nbconvert via pre- or post-save hooks:
ContentsManager.pre_save_hook
FileContentsManager.post_save_hook
A post-save hook has been registered that calls:
ipython nbconvert --to script [notebook]
which behaves similarly to `--script`.
As I understand this I need to set up a post-save hook, but I do not understand how to do this. Can someone explain?
[UPDATED per comment by #mobius dumpling]
Find your config files:
Jupyter / ipython >= 4.0
jupyter --config-dir
ipython <4.0
ipython locate profile default
If you need a new config:
Jupyter / ipython >= 4.0
jupyter notebook --generate-config
ipython <4.0
ipython profile create
Within this directory, there will be a file called [jupyter | ipython]_notebook_config.py, put the following code from ipython's GitHub issues page in that file:
import os
from subprocess import check_call
c = get_config()
def post_save(model, os_path, contents_manager):
"""post-save hook for converting notebooks to .py scripts"""
if model['type'] != 'notebook':
return # only do this for notebooks
d, fname = os.path.split(os_path)
check_call(['ipython', 'nbconvert', '--to', 'script', fname], cwd=d)
c.FileContentsManager.post_save_hook = post_save
For Jupyter, replace ipython with jupyter in check_call.
Note that there's a corresponding 'pre-save' hook, and also that you can call any subprocess or run any arbitrary code there...if you want to do any thing fancy like checking some condition first, notifying API consumers, or adding a git commit for the saved script.
Cheers,
-t.
Here is another approach that doesn't invoke a new thread (with check_call). Add the following to jupyter_notebook_config.py as in Tristan's answer:
import io
import os
from notebook.utils import to_api_path
_script_exporter = None
def script_post_save(model, os_path, contents_manager, **kwargs):
"""convert notebooks to Python script after save with nbconvert
replaces `ipython notebook --script`
"""
from nbconvert.exporters.script import ScriptExporter
if model['type'] != 'notebook':
return
global _script_exporter
if _script_exporter is None:
_script_exporter = ScriptExporter(parent=contents_manager)
log = contents_manager.log
base, ext = os.path.splitext(os_path)
py_fname = base + '.py'
script, resources = _script_exporter.from_filename(os_path)
script_fname = base + resources.get('output_extension', '.txt')
log.info("Saving script /%s", to_api_path(script_fname, contents_manager.root_dir))
with io.open(script_fname, 'w', encoding='utf-8') as f:
f.write(script)
c.FileContentsManager.post_save_hook = script_post_save
Disclaimer: I'm pretty sure I got this from SO somwhere, but can't find it now. Putting it here so it's easier to find in future (:
I just encountered a problem where I didn't have rights to restart my Jupyter instance, and so the post-save hook I wanted couldn't be applied.
So, I extracted the key parts and could run this with python manual_post_save_hook.py:
from io import open
from re import sub
from os.path import splitext
from nbconvert.exporters.script import ScriptExporter
for nb_path in ['notebook1.ipynb', 'notebook2.ipynb']:
base, ext = splitext(nb_path)
script, resources = ScriptExporter().from_filename(nb_path)
# mine happen to all be in Python so I needn't bother with the full flexibility
script_fname = base + '.py'
with open(script_fname, 'w', encoding='utf-8') as f:
# remove 'In [ ]' commented lines peppered about
f.write(sub(r'[\n]{2}# In\[[0-9 ]+\]:\s+[\n]{2}', '\n', script))
You can add your own bells and whistles as you would with the standard post save hook, and the config is the correct way to proceed; sharing this for others who might end up in a similar pinch where they can't get the config edits to go into action.

Resources