How to get WKHTMLTOPDF working on Heroku? - python-3.x

I built a website that generates PDFs with pdfkit, and I know how to install wkhtmltopdf and set its path in the environment variables on Windows. I have now deployed the site to Heroku (my first deployment there), but generating a PDF fails with the error: No wkhtmltopdf executable found: "b''"
I don't know how to install and set up wkhtmltopdf on Heroku, because this is the first time I'm dealing with Linux.
I tried everything I could find before asking, but even the steps in the following question did not work for me:
Python 3 flask install wkhtmltopdf on heroku
If possible, please guide me step by step through installing and setting this up.
I followed every resource I could find but couldn't make it work; I get the same error every time.
I'm using Django version 2. Python version 3.7.
This is what I get if I do heroku stack
Available Stacks
cedar-14
container
heroku-16
* heroku-18
This is the error I get when generating the PDF:
No wkhtmltopdf executable found: "b''"
If this file exists please check that this process can read it. Otherwise please install wkhtmltopdf - https://github.com/JazzCore/python-pdfkit/wiki/Installing-wkhtmltopdf
My website works fine on localhost, so I assume I have done something wrong when installing wkhtmltopdf.
Thank you

It's non-trivial. If you want to avoid the headache of the steps below, you can use my service, api2pdf: https://github.com/api2pdf/api2pdf.python. Otherwise, if you want to work through it, read on.
1) Add these lines to your requirements.txt to install pdfkit along with a special wkhtmltopdf pack for Heroku.
git+https://github.com/johnfraney/wkhtmltopdf-pack.git
pdfkit==0.6.1
2) I created a pdf_manager.py in my Flask app. In pdf_manager.py I have a function:
import os
import platform
import subprocess

import pdfkit


def _get_pdfkit_config():
    """wkhtmltopdf lives and functions differently depending on Windows or Linux. We
    need to support both since we develop on Windows but deploy on Heroku.

    Returns:
        A pdfkit configuration
    """
    if platform.system() == 'Windows':
        return pdfkit.configuration(wkhtmltopdf=os.environ.get('WKHTMLTOPDF_BINARY', 'C:\\Program Files\\wkhtmltopdf\\bin\\wkhtmltopdf.exe'))
    else:
        # Ask `which` for the full path of the binary named in WKHTMLTOPDF_BINARY.
        WKHTMLTOPDF_CMD = subprocess.Popen(['which', os.environ.get('WKHTMLTOPDF_BINARY', 'wkhtmltopdf')], stdout=subprocess.PIPE).communicate()[0].strip()
        return pdfkit.configuration(wkhtmltopdf=WKHTMLTOPDF_CMD)
The reason I have the platform check in there is that I develop on a Windows machine and have the local wkhtmltopdf binary on my PC. When I deploy to Heroku, the app runs in their Linux containers, so I need to detect which platform we're on before running the binary.
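The `b''` in the error message from the question is consistent with `which` printing nothing, which is what happens when the binary is not on the PATH. As a minimal sketch (not part of the original answer), the same lookup can be done with the standard library's `shutil.which`, which makes the not-found case explicit. The function name `find_wkhtmltopdf` is hypothetical; the `WKHTMLTOPDF_BINARY` variable is the one used above.

```python
import os
import shutil


def find_wkhtmltopdf(env=None):
    """Return the full path of the wkhtmltopdf binary or raise a clear error.

    The binary name (or path) is read from WKHTMLTOPDF_BINARY and defaults
    to 'wkhtmltopdf', mirroring the lookup in the answer above.
    """
    env = os.environ if env is None else env
    name = env.get('WKHTMLTOPDF_BINARY', 'wkhtmltopdf')
    # shutil.which searches PATH and returns None when nothing is found.
    path = shutil.which(name, path=env.get('PATH'))
    if path is None:
        raise FileNotFoundError(
            'wkhtmltopdf binary %r not found on PATH' % name)
    return path
```

The returned string could then be passed on as `pdfkit.configuration(wkhtmltopdf=find_wkhtmltopdf())`, so a missing binary fails at configuration time with a readable message.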
3) Then I created two more methods - one to convert a url to pdf and another to convert raw html to pdf.
def make_pdf_from_url(url, options=None):
    """Produces a pdf from a website's url.

    Args:
        url (str): A valid url
        options (dict, optional): for specifying pdf parameters like landscape
            mode and margins
    Returns:
        pdf of the website
    """
    return pdfkit.from_url(url, False, configuration=_get_pdfkit_config(), options=options)


def make_pdf_from_raw_html(html, options=None):
    """Produces a pdf from raw html.

    Args:
        html (str): Valid html
        options (dict, optional): for specifying pdf parameters like landscape
            mode and margins
    Returns:
        pdf of the supplied html
    """
    return pdfkit.from_string(html, False, configuration=_get_pdfkit_config(), options=options)
I use these methods to convert to PDF.
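For illustration, an options dictionary for the two helpers above might look like the sketch below. The keys correspond to wkhtmltopdf command-line flags without the leading dashes; the specific values and the name `LANDSCAPE_OPTIONS` are assumptions chosen for this example, not part of the original answer.

```python
# Hypothetical example values; keys are wkhtmltopdf flags without the leading "--".
LANDSCAPE_OPTIONS = {
    'page-size': 'A4',
    'orientation': 'Landscape',
    'margin-top': '10mm',
    'margin-right': '10mm',
    'margin-bottom': '10mm',
    'margin-left': '10mm',
}
```

It would be passed along as `make_pdf_from_raw_html(html, options=LANDSCAPE_OPTIONS)`.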

Just follow these steps to deploy a Django app that uses pdfkit on Heroku:
Step 1: Add the following packages to the requirements.txt file:
wkhtmltopdf-pack==0.12.3.0
pdfkit==0.6.0
Step 2: Add the lines below to views.py to set the path of the binary:
import os, sys, subprocess, platform
import pdfkit

if platform.system() == "Windows":
    pdfkit_config = pdfkit.configuration(wkhtmltopdf=os.environ.get('WKHTMLTOPDF_BINARY', 'C:\\Program Files\\wkhtmltopdf\\bin\\wkhtmltopdf.exe'))
else:
    # Add the Python executable's directory to PATH so that `which` also searches it.
    os.environ['PATH'] += os.pathsep + os.path.dirname(sys.executable)
    WKHTMLTOPDF_CMD = subprocess.Popen(['which', os.environ.get('WKHTMLTOPDF_BINARY', 'wkhtmltopdf')],
                                       stdout=subprocess.PIPE).communicate()[0].strip()
    pdfkit_config = pdfkit.configuration(wkhtmltopdf=WKHTMLTOPDF_CMD)
Step 3: Then pass pdfkit_config as the configuration argument, as below:
pdf = pdfkit.from_string(html, False, options=options, configuration=pdfkit_config)

Related

Download file from website directly into Linux directory - Python

If I click the button manually, the browser starts downloading a 2 GB CSV file to my computer, but I want to automate this.
This is the download link:
https://data.cityofnewyork.us/api/views/bnx9-e6tj/rows.csv?accessType=DOWNLOAD
The issue: when I use either the requests or the pandas library, it just hangs, and I have no idea whether the file is being downloaded.
My goal is to:
Know whether the file is being downloaded, and
Have the CSV downloaded to a specified directory, i.e.
~/mydirectory
Can someone provide the code to do this?
Try this...
import requests
URL = "https://data.cityofnewyork.us/api/views/bnx9-e6tj/rows.csv?accessType=DOWNLOAD"
response = requests.get(URL)
# Write the body to disk before reporting completion.
with open("/mydirectory/downloaded_file.csv", "wb") as f:
    f.write(response.content)
print('Download Complete')
Or you could do it this way and have a progress bar ...
import wget
wget.download('https://data.cityofnewyork.us/api/views/bnx9-e6tj/rows.csv?accessType=DOWNLOAD')
The output will look like this:
11% [........ ] 73728 / 633847
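A minimal sketch that addresses both goals from the question with only the standard library is shown below: it reads the response in chunks rather than all at once, prints progress as it goes, and writes into a chosen directory. The function name `download_with_progress`, the 1 MiB chunk size, and the output file name are assumptions made for this example.

```python
import os
import urllib.request


def download_with_progress(url, directory, filename, chunk_size=1024 * 1024):
    """Stream url into directory/filename and print progress as it goes."""
    directory = os.path.expanduser(directory)  # expands a leading "~"
    os.makedirs(directory, exist_ok=True)
    target = os.path.join(directory, filename)
    done = 0
    with urllib.request.urlopen(url) as response, open(target, 'wb') as out:
        # Content-Length may be missing; then only the running byte count is shown.
        total = response.headers.get('Content-Length')
        while True:
            chunk = response.read(chunk_size)
            if not chunk:
                break
            out.write(chunk)
            done += len(chunk)
            if total:
                print('%d / %s bytes' % (done, total))
            else:
                print('%d bytes' % done)
    return target
```

With the link from the question stored in `URL`, a call such as `download_with_progress(URL, '~/mydirectory', 'rows.csv')` would save the file under the home directory and print one progress line per chunk.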

How to check if the environment variable "PROJ_LIB" is defined and how to unset it ? (PyQGIS Standalone Script Executer)

I just tried a standalone PyQGIS application by running the custom script "Proximity.py"* in a VS Code project, without a GUI such as QGIS.
But when I run the Python program I get the following message:
proj_create_from_database: C:\Program Files\PostgreSQL\14\share\contrib\postgis-3.2\proj\proj.db contains DATABASE.LAYOUT.VERSION.MINOR = 0 whereas a number >= 2 is expected. It comes from another PROJ installation. (see also: Error Message after launching the configuration (launch.json) from VS Code (when pressing F5))
I'm trying this online example with the following installations:
PostgreSQL 14
Python39
.vscode\extensions\ms-python.python-2022.4.1\pythonFiles\lib\python\debugpy\launcher
osgeo4w-setup.exe (including QGIS LTR)
I read that a solution is to undefine PROJ_LIB before importing pyproj or osgeo, using del os.environ['PROJ_LIB']. If that is also the correct solution in this case, can someone help me with step-by-step instructions (for dummies)?
* The "Proximity.py" script is a PyQGIS standalone example from "https://github.com/MarByteBeep/pyqgis-standalone".
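For the literal question in the title, checking and unsetting an environment variable from Python needs only `os.environ`. The sketch below is a generic illustration; the helper name `unset_env_var` is hypothetical, and whether unsetting PROJ_LIB resolves the PROJ error above is not confirmed here. Changes to `os.environ` affect only the current process and its children.

```python
import os


def unset_env_var(name):
    """Remove name from this process's environment.

    Returns the previous value, or None if the variable was not defined.
    """
    if name in os.environ:
        print('%s is defined: %s' % (name, os.environ[name]))
    else:
        print('%s is not defined' % name)
    # pop() with a default does not raise when the variable is missing,
    # unlike `del os.environ[name]`.
    return os.environ.pop(name, None)
```

Following the advice quoted in the question, `unset_env_var('PROJ_LIB')` would have to run at the top of the script, before pyproj or osgeo is imported.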
I finally found a solution that lets me run the "standalone PyQGIS"* example "Proximity" (provided by MarByteBeep).
This solution did not require launching the configuration file "launch.json" described above, so there was no need to change the environment variable "PROJ_LIB" to work around the issue.
I first added the following two lines of code (lines 2 and 3 below) to the Python file "main.py" so that the plugin "PROCESSING" can be used (initially line 8 of the "main.py" file), then I saved the file and finally ran it.
Line 1: from qgis.core import
Line 2: import sys
Line 3: sys.path.append(r'C:\Program Files\QGIS 3.24.1\apps\qgis\python\plugins')
Line 4: qgs = QgsApplication([], False)
Line 5: ...
The Proximity example is based on the answer of "Mar Tjin" to the following Question: "Looking for manual on how to properly setup standalone PyQGIS without GUI"
* By "standalone PyQGIS" I mean code or scripts that can be run outside the QGIS GUI (the QGIS Desktop/Server application), in my case from the external editor VS Code.

Can't load PDF with Wand/ImageMagick in Google Cloud Function

Trying to load a PDF from the local file system and getting a "not authorized" error.
File "/env/local/lib/python3.7/site-packages/wand/image.py", line 4896, in read
    self.raise_exception()
File "/env/local/lib/python3.7/site-packages/wand/resource.py", line 222, in raise_exception
    raise e
wand.exceptions.PolicyError: not authorized `/tmp/tmp_iq12nws' @ error/constitute.c/ReadImage/412
The PDF file is successfully saved to the local 'server' from GCS but won't be loaded by Wand. Loading images into OpenCV isn't an issue; the error only occurs when loading PDFs with Wand/ImageMagick.
Code to load the PDF from GCS to local file system into Wand/ImageMagick is below
_, temp_local_filename = tempfile.mkstemp()
gcs_blob = STORAGE_CLIENT.bucket('XXXX').get_blob(results["storedLocation"])
gcs_blob.download_to_filename(temp_local_filename)
# load the pdf into a set of images using imagemagick
with Image(filename=temp_local_filename, resolution=200) as source:
    # run through pages and save images etc.
    pass
ImageMagick should be authorised to access files on the local filesystem so it should load the file without issue instead of this 'Not Authorised' error.
PDF reading by ImageMagick has been disabled because of a security vulnerability in Ghostscript. The behaviour is by design: it is a security mitigation from the ImageMagick team that will stay in place until ImageMagick enables Ghostscript processing of PDFs again and Google Cloud Functions updates to a version of ImageMagick with PDF processing enabled.
I could not find a fix for the ImageMagick/Wand issue in GCF, but as a workaround for converting PDFs to images in Google Cloud Functions you can use the ghostscript wrapper (the ghostscript package listed in requirements.txt below) to request the PDF-to-image conversion directly from Ghostscript and bypass ImageMagick/Wand. You can then load the PNGs into ImageMagick or OpenCV without issue.
requirements.txt
google-cloud-storage
ghostscript==0.6
main.py
import glob
import locale
import os
import tempfile

import cv2
import ghostscript

# create a temp filename and save a local copy of pdf from GCS
_, temp_local_filename = tempfile.mkstemp()
gcs_blob = STORAGE_CLIENT.bucket('XXXX').get_blob(results["storedLocation"])
gcs_blob.download_to_filename(temp_local_filename)
# create a temp folder to hold the exported pages
temp_local_dir = tempfile.mkdtemp()
# use ghostscript to export the pdf into pages as pngs in the temp dir
args = [
    "pdf2png",  # actual value doesn't matter
    "-dSAFER",
    "-sDEVICE=pngalpha",
    "-o", os.path.join(temp_local_dir, "page-%03d.png"),
    "-r300", temp_local_filename
]
# the above arguments have to be bytes, encode them
encoding = locale.getpreferredencoding()
args = [a.encode(encoding) for a in args]
# run the request through ghostscript
ghostscript.Ghostscript(*args)
# read the files in the tmp dir and process the pngs individually
for png_file_loc in glob.glob(os.path.join(temp_local_dir, "*.png")):
    # loop through the saved PNGs, load into OpenCV and do what you want
    cv_image = cv2.imread(png_file_loc, cv2.IMREAD_UNCHANGED)
Hope this helps someone facing the same issue.

Pyramid's Chameleon renderer template can't be found using relative path

I'm new to Pyramid. When I try to use Chameleon as the templating engine, it fails to find the template when it is specified with a relative path: it looks for it at env35/lib/python3.5/site-packages/pyramid/, where env35 is the virtual environment I created. It works, however, if the full path is specified. A relative path also works when jinja2 is the templating engine.
Why can I not use a Chameleon template with a relative path?
From the manual
add_view(...., renderer,...)
This is either a single string term (e.g. json) or a string implying a path or asset specification (e.g. templates/views.pt) naming a renderer implementation. If the renderer value does not contain a dot ., the specified string will be used to look up a renderer implementation, and that renderer implementation will be used to construct a response from the view return value. If the renderer value contains a dot (.), the specified term will be treated as a path, and the filename extension of the last element in the path will be used to look up the renderer implementation, which will be passed the full path. The renderer implementation will be used to construct a response from the view return value.
Note that if the view itself returns a response (see View Callable Responses), the specified renderer implementation is never called.
When the renderer is a path, although a path is usually just a simple relative pathname (e.g. templates/foo.pt, implying that a template named "foo.pt" is in the "templates" directory relative to the directory of the current package of the Configurator), a path can be absolute, starting with a slash on UNIX or a drive letter prefix on Windows. The path can alternately be a asset specification in the form some.dotted.package_name:relative/path, making it possible to address template assets which live in a separate package.
The renderer attribute is optional. If it is not defined, the "null" renderer is assumed (no rendering is performed and the value is passed back to the upstream Pyramid machinery unmodified).
Here are the steps I undertook to set up my environment...
export VENV=~/Documents/app_projects/pyramid_tutorial/env35
python3 -m venv $VENV
source $VENV/bin/activate #activate the virtual environment
pip install --upgrade pip
pip install pyramid
pip install wheel
pip install pyramid_chameleon
pip install pyramid_jinja2
My file structure:
pyramid_tutorial
    env35
        bin
        ...
    templates
        hello.jinja2
        hello.pt
    test_app.py
test_app.py:
from wsgiref.simple_server import make_server
from pyramid.config import Configurator
from pyramid.response import Response
from pyramid.view import view_config

def hello(request):
    return dict(name='Bugs Bunny')

if __name__ == '__main__':
    config = Configurator()
    config.include('pyramid_chameleon')
    config.include('pyramid_jinja2')
    # This does not work... http://localhost:6543/chameleon
    config.add_route('hello_world_1', '/chameleon')
    config.add_view(hello, route_name='hello_world_1', renderer='templates/hello.pt')
    # ValueError: Missing template asset: templates/hello.pt (/home/david/Documents/app_projects/pyramid_tutorial/env35/lib/python3.5/site-packages/pyramid/templates/hello.pt)
    # This works... http://localhost:6543/chameleon2
    config.add_route('hello_world_2', '/chameleon2')
    config.add_view(hello, route_name='hello_world_2', renderer='/home/david/Documents/app_projects/pyramid_tutorial/templates/hello.pt')
    # This works... http://localhost:6543/jinja
    config.add_route('hello_world_3', '/jinja')
    config.add_view(hello, route_name='hello_world_3', renderer='templates/hello.jinja2')
    app = config.make_wsgi_app()
    server = make_server('0.0.0.0', 6543, app)
    print('Serving at http://127.0.0.1:6543')
    server.serve_forever()
hello.pt:
<p>Hello <strong>${name}</strong>! (Chameleon renderer)</p>
hello.jinja2:
<p>Hello <strong>{{name}}</strong>! (jinja2 renderer)</p>
It will work if you specify renderer=__name__ + ':templates/hello.pt'. The resolution logic doesn't work in this case because the file is not being executed as a Python package, and thus some weird stuff can occur. pyramid_chameleon could likely be updated with better support here, but by far the common case for real apps is to write your code as a package, which will work as expected.
It might also work if you tweak things slightly and run your script as a module via python -m test_app.
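To illustrate what the `package:relative/path` asset specification in the first suggestion refers to, here is a simplified sketch that resolves such a string to a file path by locating the named module on disk. This is not Pyramid's actual resolver, and the function name `resolve_asset_spec` is hypothetical; it only shows why prefixing the renderer with `__name__` anchors the path at the script's own directory rather than at the pyramid package seen in the error message.

```python
import importlib
import os


def resolve_asset_spec(spec):
    """Resolve 'package:relative/path' to an absolute path (simplified)."""
    package_name, _, relative_path = spec.partition(':')
    module = importlib.import_module(package_name)
    # The asset is looked up relative to the directory containing the module.
    base_dir = os.path.dirname(os.path.abspath(module.__file__))
    return os.path.join(base_dir, *relative_path.split('/'))
```

Called from test_app.py as `resolve_asset_spec(__name__ + ':templates/hello.pt')`, this would point at the templates directory next to the script.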

How to load a raster layer using PyQGIS?

Although there are some posts on this matter, none of them has an answer, which is why I am asking again.
One post I found was https://gis.stackexchange.com/questions/68032/raster-layer-invalid
I read the information at the following link: https://hub.qgis.org/wiki/17/Arcgis_rest .
I used the command: gdal_translate "http://server.arcgisonline.com/ArcGIS/rest/services/World_Imagery/MapServer?f=json&pretty=true" s.xml -of WMS, and it generated the file successfully. However, when I try to open the file with wms as the provider, the code reports that the layer is invalid.
The code I used is:
file = QFileDialog.getOpenFileName(self,
                                   "Open WMS", ".", "WMS (*.xml)")
fileInfo = QFileInfo(file)
# Add the layer
layer = QgsRasterLayer(file, fileInfo.fileName(), "wms")
if not layer.isValid():
    print("Failed to load.")
    return
I just choose the file from the dialog box.
I also tried the other command: qgis.utils.iface.addRasterLayer("http://server.arcgisonline.com/ArcGIS/rest/services/World_Imagery/MapServer?f=json&pretty=true","raster") by using the following code:
layer = QgsRasterLayer("http://server.arcgisonline.com/ArcGIS/rest/services/World_Imagery/MapServer?f=json&pretty=true", "layer")
if not layer.isValid():
    print("Failed to load.")
    return
It also reports "Failed to load". The original command runs successfully in the QGIS Python command line, and if I enter the code in the Python console, layer.isValid() returns true. It just does not work in a standalone script.
Answer can be found here: https://gis.stackexchange.com/questions/120823/how-to-load-a-wms-layer-using-pyqgis.
Basically, it is a version problem: with QGIS versions earlier than 2.6 it does not work, but it is fixed in 2.6.
If it is still not working for you, you most likely have a problem with your environment variable settings.
This works for me for a single-band image. I am using Python 2.7 and QGIS 2.0.1. You can load any raster layer, such as WMS or TIFF (single band or multiband), using this:
def ifile(self):
    global fileName
    fileName = str(QtGui.QFileDialog.getOpenFileName(self.iface.mainWindow(), "Open Raster File", 'C:\\', "raster files(*.tif *.tiff *.TIF *.TIFF *.IMG *.img )"))
    if len(fileName) == 0:
        return
    else:
        self.inFileName = fileName
        filelayer = QgsRasterLayer(fileName, os.path.basename(fileName))
        if filelayer is None or filelayer.bandCount() != 1:
            self.errorMessage = "Not a DEM Image"
            QMessageBox.information(self.iface.mainWindow(), "Error", self.errorMessage)
        else:
            # f=open(str(self.inFileName))
            self.dlg.lineEdit.setText(self.inFileName)
            if filelayer.isValid():
                QgsMapLayerRegistry.instance().addMapLayer(filelayer)
