Imghdr module in Python 3.9.5 misidentifying .png file - python-3.x

When I call imghdr.what("sampleImage.png"), it identifies the file as a JPEG for some reason. For example, print(imghdr.what("sampleImage.png")) outputs the string "jpeg". I checked the image file, and macOS shows it as having the expected PNG format.
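One way to cross-check a result like this is to read the first bytes of the file directly, since the extension (and what the file browser displays) does not have to match the actual contents. The following is a minimal sketch that assumes only the standard PNG and JPEG signatures; `sniff_image_type` and `sniff_file` are hypothetical helper names, not part of imghdr.

```python
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # fixed 8-byte PNG file signature
JPEG_PREFIX = b"\xff\xd8\xff"         # JPEG start-of-image marker plus next marker byte


def sniff_image_type(header: bytes):
    """Return 'png', 'jpeg', or None based on the leading bytes only."""
    if header.startswith(PNG_SIGNATURE):
        return "png"
    if header.startswith(JPEG_PREFIX):
        return "jpeg"
    return None


def sniff_file(path: str):
    # Read just enough bytes to cover the longest signature checked above.
    with open(path, "rb") as f:
        return sniff_image_type(f.read(8))
```

If `sniff_file("sampleImage.png")` also reports "jpeg", the file is most likely a JPEG that was saved or renamed with a .png extension, and imghdr is reporting the contents correctly.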

Related

image to osd tesseract error using python 3.6

I'm trying to use pytesseract's image_to_osd function, but I get the error below with Python 3.6. When I test the same script in another environment with Python 3.8, it works. Is there any configuration needed for Python 3.6, or anything else I should do?
angle_rotated_image = re.search('(?<=Rotate: )\d+',pytesseract.image_to_osd(rotated)).group(0)
error:
angle_rotated_image = re.search('(?<=Rotate: )\d+',pytesseract.image_to_osd(rotated)).group(0)
File "C:\Users\username\AppData\Roaming\Python\Python36\site-packages\pytesseract\pytesseract.py", line 543, in image_to_osd
}[output_type]()
File "C:\Users\username\AppData\Roaming\Python\Python36\site-packages\pytesseract\pytesseract.py", line 542, in <lambda>
Output.STRING: lambda: run_and_get_output(*args),
File "C:\Users\username\AppData\Roaming\Python\Python36\site-packages\pytesseract\pytesseract.py", line 287, in run_and_get_output
run_tesseract(**kwargs)
File "C:\Users\username\AppData\Roaming\Python\Python36\site-packages\pytesseract\pytesseract.py", line 263, in run_tesseract
raise TesseractError(proc.returncode, get_errors(error_string))
pytesseract.pytesseract.TesseractError: (1, 'Tesseract Open Source OCR Engine v5.0.0.20190623 with Leptonica Warning: Invalid resolution 0 dpi. Using 70 instead. Estimating resolution as 163 Warning. Invalid resolution 0 dpi. Using 70 instead. Too few characters. Skipping this page Error during processing.')
I ran into a similar problem when trying to determine the rotation of a given document with pytesseract's image_to_osd(). It worked fine for me on macOS with tesseract 4.1.1, but it wouldn't work on Windows with tesseract 5.0.0-alpha. After reading through many threads related to the OP's error and trying various things, such as passing --dpi and -c min_characters_to_try=, with no success, I tried a different version of tesseract on Windows, which finally solved my problem.
Status of image_to_osd():
(PASS) OS MacOS; tesseract 4.1.1; pytesseract 0.3.0; Python 3.6.5
(PASS) OS Windows; tesseract 4.1.0; pytesseract 0.3.0; Python 3.6.5
(FAIL) OS Windows; tesseract 5.0.0; pytesseract 0.3.0; Python 3.6.5
I think pytesseract 0.3.7 will probably work too, but I didn't test it.
Note that you can still get the OP's error with the older tesseract version, but in my tests it now only shows up in much more reasonable cases, e.g., blank pages.
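Because the error can still occur on pages with too little text, it may help to make the rotation parsing tolerant of a missing value instead of calling .group(0) on the search result directly. A minimal sketch using only the standard library; `parse_rotation` is a hypothetical helper, and the sample report text is an assumed example of what image_to_osd() returns as a string.

```python
import re

# Matches the "Rotate: <degrees>" line of the OSD report.
ROTATE_RE = re.compile(r"Rotate:\s*(\d+)")


def parse_rotation(osd_text: str):
    """Return the rotation in degrees, or None if there is no Rotate line."""
    match = ROTATE_RE.search(osd_text)
    return int(match.group(1)) if match else None


# Assumed sample of an OSD report; real output may contain more fields.
sample = (
    "Page number: 0\n"
    "Orientation in degrees: 270\n"
    "Rotate: 90\n"
    "Orientation confidence: 2.5\n"
)
print(parse_rotation(sample))
```

In the real script, the call to pytesseract.image_to_osd(rotated) could additionally be wrapped in a try/except for pytesseract.pytesseract.TesseractError (the exception shown in the traceback), so that a page with "Too few characters" is skipped rather than aborting the whole run.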

Problems after generating the exe from a py file

I am using Python 3.7.3 and PyInstaller 3.5. I have written a Python script named read.py that reads a scanned PDF and converts it to a text file. I could successfully generate the exe using the PyInstaller commands pyi-makespec --onefile read.py and pyinstaller --onefile read.spec. The .exe file works fine on my system, where I have all the packages installed, but when I tried to run it on a different Windows PC where the Python packages are not available, it looks for the poppler module.
I even tried adding a hook file for poppler to the PyInstaller hooks folder, but PyInstaller did not pick up this hook file while generating the .exe file.
Please help me resolve this issue, or let me know if there is another way to generate the exe so that it does not depend on the supporting Python files.
Below is the error I get when I execute the .exe file on a different PC:
File "read_image.py", line 29, in <module>
File "site-packages\pdf2image\pdf2image.py", line 54, in convert_from_path
File "site-packages\pdf2image\pdf2image.py", line 244, in _page_count
pdf2image.exceptions.PDFInfoNotInstalledError: Unable to get page count. Is poppler installed and in PATH?
[17920] Failed to execute script read_image
Thanks for the Help...
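The error message hints at a likely cause: pdf2image calls the external poppler programs (such as pdfinfo), which are not Python packages, so PyInstaller would not be expected to collect them on its own. The following is only a sketch under stated assumptions: it assumes the poppler binaries were added to the bundle under a folder named "poppler" (a hypothetical layout, for example via PyInstaller's --add-binary option), and `bundled_poppler_dir` is a hypothetical helper name.

```python
import os
import sys


def bundled_poppler_dir(subdir: str = "poppler"):
    """Return the folder where a bundled copy of poppler is assumed to live."""
    # PyInstaller one-file builds unpack to a temporary directory exposed as
    # sys._MEIPASS; when running from source, fall back to the current directory.
    base = getattr(sys, "_MEIPASS", os.path.abspath("."))
    return os.path.join(base, subdir)


print(bundled_poppler_dir())
```

The resulting path could then be handed to pdf2image through the poppler_path argument of convert_from_path, so the executable no longer relies on poppler being on the target machine's PATH.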

Python: Syntax error in downloaded Github code at input and output lines

Python 3.x IDLE gives me a syntax error when I attempt to run this code with a custom input .json file and .csv output file. The code's purpose is to convert any .json file into a .csv file and write it to a specific path.
I cloned the repository from GitHub. I first used the Python 3.9 IDLE, then the Python 3.4 IDLE; neither runs the code because of a syntax error. I also tried running the code from its own directory by entering "python index.py --input /path/to/file.json --output /path/to/output/file.csv" in the Python terminal, which produced an error. In my final and most recent attempt, shown in the code below, I wrote the paths into the add_argument lines for the input and output (not sure why I didn't try it before), and it came up with a unicode read error:
parser = argparse.ArgumentParser(description='Convert Google Maps location history to csv file.')
parser.add_argument('-i', '--input', action="store", help='C:\Users\bmevans\Downloads\takeout-20190904T042913Z-001.zip\Takeout\Location History.json file.', required=True)
parser.add_argument('-o', '--output', action="store", help='C:\Users\bmevans\Desktop.csv file.', required=True)
The error I get is "unicodeescape codec can't decode bytes in position 2-3: truncated \UXXXXXXXX escape", and I'm not sure why. I had assumed the missing os import was the problem, but that isn't the case either. Could it be that I used an absolute path instead of a relative one?
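A probable explanation for this message: in an ordinary Python string literal, \U begins an eight-digit Unicode escape, so a Windows path such as C:\Users\... cannot be compiled unless it is written as a raw string (or with doubled backslashes). The sketch below demonstrates that, and also shows the paths being supplied as argument values rather than written into the help text. The example paths under C:\data are hypothetical.

```python
import argparse

# Source text of the problematic line: path = 'C:\Users\bmevans'
bad_source = "path = 'C:\\Users\\bmevans'"
try:
    compile(bad_source, "<example>", "exec")
    compiled = True
except SyntaxError:
    # "\U" is read as the start of a \UXXXXXXXX escape and cannot be decoded.
    compiled = False

# A raw string keeps every backslash literal.
raw_path = r"C:\Users\bmevans\Desktop"


def build_parser():
    parser = argparse.ArgumentParser(
        description="Convert Google Maps location history to csv file."
    )
    # help= is only a description; the actual paths are given when the script runs.
    parser.add_argument("-i", "--input", required=True, help="Path to the .json file.")
    parser.add_argument("-o", "--output", required=True, help="Path of the .csv file to write.")
    return parser


# Hypothetical paths, passed the way the command line would pass them.
args = build_parser().parse_args(
    ["--input", r"C:\data\in.json", "--output", r"C:\data\out.csv"]
)
print(compiled, args.input, args.output)
```

Under this reading, whether the path is absolute or relative does not matter; the backslash followed by U is what triggers the error.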

Converting with ImageMagick - illegal parameter

I am trying to convert a PDF to JPG; the main goal is to have a thumbnail so that the user can preview the PDF within the application.
Apparently ImageMagick is a good way to do this, yet so far I have failed to get it to convert the file.
import subprocess
params = ['convert', '-density 300 -resize 220x205', 'dummy.pdf', 'thumb.jpg']
subprocess.check_call(params)
This is what I get instead of a converted file (the first line is German for "Invalid parameter - 300"):
Unzulässiger Parameter - 300
Traceback (most recent call last):
File "pdf_preview_test.py", line 4, in <module>
subprocess.check_call(params)
File "C:\Users\EliasMessner\AppData\Local\Programs\Python\Python37-32\lib\subprocess.py", line 347, in check_call
raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['convert', '-density 300 -resize 220x205', 'dummy.pdf', 'thumb.jpg']' returned non-zero exit status 4.
I couldn't find any info about the problem. It looks like the "300" parameter is illegal, but IMHO it is used correctly. Help would be much appreciated. Thanks.
As you are on Windows, the code could be invoking the built-in Windows convert program instead of ImageMagick. You do not say which version of ImageMagick you are using; V7 uses magick instead of convert, which can prevent this problem.
If you allowed ImageMagick to add convert to the PATH environment variable on install, it would probably not be a problem; it never has been for me on multiple installs.
You could try replacing convert with the full path to ImageMagick's convert, surrounding the path in " ". Alternatively, you can rename the convert program, e.g. to myconvert, and use that name in your code instead of convert.
I would also try an ImageMagick command on the command line first to prove that it works.
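Two things from the discussion above can be combined in a short sketch: checking which program a command name resolves to, and passing each option and each value as a separate list item (in the question, '-density 300 -resize 220x205' is a single list element, so it reaches the program as one argument). This assumes ImageMagick 7 with the `magick` executable on PATH; `build_thumbnail_command` is a hypothetical helper name.

```python
import shutil


def build_thumbnail_command(pdf_path, jpg_path, executable="magick"):
    """Build the argument list; every option and value is its own item."""
    return [
        executable,
        "-density", "300",     # resolution used when reading the PDF
        pdf_path,
        "-resize", "220x205",  # fit the result within 220x205 pixels
        jpg_path,
    ]


cmd = build_thumbnail_command("dummy.pdf", "thumb.jpg")
# shutil.which reports which file the name resolves to on PATH (None if not found).
resolved = shutil.which(cmd[0])
print(cmd, resolved)
```

Once `resolved` points at the ImageMagick installation rather than a Windows system tool, the list can be passed to subprocess.check_call(cmd) as in the question.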

Exe to python with pyinstaller?

So I made a huge mistake and deleted my code file (python). The only thing I have is my python file as .exe that I created with pyinstaller. Is there a way to reverse this and to extract my code file from .exe?
You can extract the contents of the .exe file using PyInstaller Extractor. Run it like this:
python pyinstxtractor.py executable.exe
You will then get a bunch of files, including the compiled bytecode (.pyc) of your original script, which you can run through a Python decompiler to recover the source.
