AWS Lambda function - convert PDF to Image - node.js

I am developing an application where users can upload drawings in PDF format. Uploaded files are stored on S3. After uploading, the files have to be converted to images. For this purpose I have created a Lambda function which downloads the file from S3 to the /tmp folder in the Lambda execution environment and then calls the 'convert' command from ImageMagick:
convert sourceFile.pdf targetFile.png
The Lambda runtime is Node.js 4.3. Memory is set to 128 MB and the timeout to 30 seconds.
Now the problem is that some files are converted successfully while others fail with the following error:
{ [Error: Command failed: /bin/sh -c convert /tmp/sourceFile.pdf /tmp/targetFile.png
convert: `%s' (%d) "gs" -q -dQUIET -dSAFER -dBATCH -dNOPAUSE -dNOPROMPT -dMaxBitmap=500000000 -dAlignToPixels=0 -dGridFitTT=2 "-sDEVICE=pngalpha" -dTextAlphaBits=4 -dGraphicsAlphaBits=4 "-r72x72" "-sOutputFile=/tmp/magick-QRH6nVLV--0000001" "-f/tmp/magick-B610L5uo" "-f/tmp/magick-tIe1MjeR" # error/utility.c/SystemCommand/1890.
convert: Postscript delegate failed `/tmp/sourceFile.pdf': No such file or directory # error/pdf.c/ReadPDFImage/678.
convert: no images defined `/tmp/targetFile.png' # error/convert.c/ConvertImageCommand/3046. ]
killed: false, code: 1, signal: null,
cmd: '/bin/sh -c convert /tmp/sourceFile.pdf /tmp/targetFile.png' }
At first I did not understand why this happens, so I tried to convert the problematic files on my local Ubuntu machine with the same command. This is the output from the terminal:
**** Warning: considering '0000000000 XXXXX n' as a free entry.
**** This file had errors that were repaired or ignored.
**** The file was produced by:
**** >>>> Mac OS X 10.10.5 Quartz PDFContext <<<<
**** Please notify the author of the software that produced this
**** file that it does not conform to Adobe's published PDF
**** specification.
So the message was very clear, but the file gets converted to PNG anyway. If I first do convert source.pdf target.pdf and after that convert target.pdf image.png, the file is repaired and converted without any errors. This doesn't work on Lambda.
Since the same thing works in one environment but not in the other, my best guess is that the Ghostscript version is the problem. The version installed on the AMI is 8.70; on my local machine the Ghostscript version is 9.18.
My questions are:
1. Is the version of Ghostscript the problem? Is this a bug in the older version of Ghostscript? If not, how can I tell Ghostscript (with or without using ImageMagick) to repair or ignore errors like it does on my local environment?
2. If the old version is the problem, is it possible to build Ghostscript from source, create a Node.js module and then use that version of Ghostscript instead of the one that is installed?
3. Is there an easier way to convert a PDF to an image without using ImageMagick and Ghostscript?
UPDATE
Relevant part of the Lambda code:
var exec = require('child_process').exec;
var AWS = require('aws-sdk');
var fs = require('fs');
...
var localSourceFile = '/tmp/sourceFile.pdf';
var localTargetFile = '/tmp/targetFile.png';

// write the S3 object body to the local file system
var writeStream = fs.createWriteStream(localSourceFile);
writeStream.write(body);
writeStream.end();

writeStream.on('error', function (err) {
    console.log("Error writing data from S3 to tmp folder.");
    context.fail(err);
});

writeStream.on('finish', function () {
    var cmd = 'convert ' + localSourceFile + ' ' + localTargetFile;
    exec(cmd, function (err, stdout, stderr) {
        if (err) {
            console.log("Error executing convert command.");
            context.fail(err);
            return;
        }
        if (stderr) {
            console.log("Command executed successfully but returned error.");
            context.fail(stderr);
        } else {
            // file converted successfully - do something...
        }
    });
});

You can find a compiled version of Ghostscript for Lambda in the following repository. Add its files to the zip file that you upload as the source code to AWS Lambda:
https://github.com/sina-masnadi/lambda-ghostscript
This npm package can be used to call Ghostscript functions from Node.js:
https://github.com/sina-masnadi/node-gs
After copying the compiled Ghostscript files into your project and adding the npm package, you can use the executablePath('path to ghostscript') function to point the package at the compiled Ghostscript files you added earlier.
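For illustration, a minimal sketch of how this might look inside the Lambda handler, assuming the compiled gs binary is packaged under a bin/ folder in the deployment zip and node-gs is installed as the gs package (the paths, device and resolution are only examples):

// Hedged sketch: the bin/gs location and all Ghostscript options are illustrative.
var gs = require('gs');

gs()
    .batch()                                // -dBATCH: exit after the last file
    .nopause()                              // -dNOPAUSE: do not prompt between pages
    .executablePath(__dirname + '/bin/gs')  // point the wrapper at the bundled binary
    .device('png16m')                       // 24-bit PNG output device
    .option('-r300')                        // output resolution
    .input('/tmp/sourceFile.pdf')
    .output('/tmp/targetFile.png')
    .exec(function (err, stdout, stderr) {
        if (err) {
            context.fail(err);
            return;
        }
        // /tmp/targetFile.png is ready - upload it back to S3, etc.
    });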

It's almost certainly a bug, or perhaps a limitation, of the older version of Ghostscript.
Many PDF producers create PDF files which do not conform to the specification and yet open without complaint in Adobe Acrobat. Ghostscript endeavours to do the same, but obviously we can't know what Acrobat is going to allow, so we are continually chasing this nebulous target. (FWIW, that warning does indicate a genuinely out-of-spec PDF file.)
There's nothing you can do with the old version other than replace it.
Yes, you can build Ghostscript from source. I have no idea about a Node.js module, and I'm not sure why that's relevant.
There are numerous other applications which will render a PDF file; MuPDF is another one I know of. And, of course, you can use Ghostscript directly without using ImageMagick. Of course, if you can load another application, then you should simply be able to replace your Ghostscript installation too.

The version of Ghostscript on AWS is an old version with known bugs. You can get around this by uploading an x64 Ghostscript binary compiled specifically for Linux and publishing it with the new AWS Lambda layers feature. I have written a Node function that does just this here:
https://github.com/rcastoro/PDFImagine
Make sure your Lambda has that Ghostscript layer attached, however!
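If you prefer to call the layer's binary directly instead of going through a wrapper package, a minimal sketch could look like the following. It assumes the layer zip contains bin/gs, so the binary ends up at /opt/bin/gs at runtime; all Ghostscript flags shown are just examples.

// Hedged sketch: /opt/bin/gs assumes the layer packages the binary under bin/.
const { execFile } = require('child_process');

execFile('/opt/bin/gs', [
    '-dBATCH', '-dNOPAUSE', '-dSAFER',
    '-sDEVICE=png16m',                  // 24-bit PNG output
    '-r300',                            // output resolution
    '-sOutputFile=/tmp/page-%03d.png',  // one PNG per page
    '/tmp/sourceFile.pdf'
], (err, stdout, stderr) => {
    if (err) {
        console.error('Ghostscript failed:', stderr);
        return;
    }
    // PNG pages are now in /tmp - continue processing or upload them to S3
});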

Related

FFmpeg.wasm - retrieve video metadata with node js

I want to retrieve metadata from a converted video using the FFmpeg WebAssembly package.
I tried to extract a metadata.txt file to my current directory, but nothing was returned from my command.
In my index.js file:
const { createFFmpeg, fetchFile } = require('@ffmpeg/ffmpeg')
const ffmpeg = createFFmpeg({ log: true })

await ffmpeg.load()
await ffmpeg.run('-i', 'myvideo.avi', '-f', 'ffmetadata', 'metadata.txt')
I got this error:
Output file is empty, nothing was encoded.
Is this kind of command available in this wasm package? I found nothing relevant in their documentation, and I want to avoid a global FFmpeg installation.
Wasm doesn't read from the real file system. You need to copy files into the wasm virtual file system for FFmpeg to be able to reference them.
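A minimal sketch of what that can look like with the fetchFile helper that is already imported above (the file names are illustrative):

// Hedged sketch: copy the input into ffmpeg.wasm's in-memory file system
// before running, then read the result back out of it afterwards.
await ffmpeg.load()

// write the local file into the wasm virtual file system
ffmpeg.FS('writeFile', 'myvideo.avi', await fetchFile('./myvideo.avi'))

// the input name now refers to a file ffmpeg can actually see
await ffmpeg.run('-i', 'myvideo.avi', '-f', 'ffmetadata', 'metadata.txt')

// read the generated metadata back out of the virtual file system
const metadata = ffmpeg.FS('readFile', 'metadata.txt')
console.log(Buffer.from(metadata).toString('utf8'))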

How will files in a Lambda layer be copied to the /opt/bin directory?

I am working on a PDF to image conversion project in AWS Lambda, but had some issues since the AWS Lambda environment doesn't include the relevant binaries, such as ImageMagick. I followed some links and Stack Overflow questions and put the relevant binaries into a layer; for the job I had to use compiled Ghostscript binaries.
The layer zip contains files like this:
GhostScript.zip > bin > gs
I have a wrapper library called pdf2png which executes a child process that does the conversion. The command this child process uses is the above-mentioned gs utility, but my issue is that the path I specified for the utility is wrong; it throws an error saying:
Error: spawn /opt/bin/bin/gs ENOENT
So what I want to know is: how are the Lambda layer files copied to the /opt/bin/ directory, and how should I fix the path?
Corresponding code:
gs()
    .batch()
    .nopause()
    .option('-r' + options.density)
    // .option('-dDownScaleFactor=2')
    .option('-dFirstPage=' + page)
    .option('-dLastPage=' + page)
    .executablePath('/opt/bin/bin/gs')
    .device('png16m')
    .output(output)
    .input(filepath)
    .exec(function (err, stdout, stderr) {
The executable path in my case had to be:
.executablePath('/opt/bin/gs')
Lambda extracts the contents of the layer zip directly into /opt/, so the files in the bin folder inside the layer end up in /opt/bin/.

Can't load PDF with Wand/ImageMagick in Google Cloud Function

Trying to load a PDF from the local file system and getting a "not authorized" error.
"File "/env/local/lib/python3.7/site-packages/wand/image.py", line 4896, in read self.raise_exception() File "/env/local/lib/python3.7/site-packages/wand/resource.py", line 222, in raise_exception raise e wand.exceptions.PolicyError: not authorized `/tmp/tmp_iq12nws' # error/constitute.c/ReadImage/412
The PDF file is successfully saved to the local 'server' from GCS but won't be loaded by Wand. Loading images into OpenCV isn't an issue; the problem only happens when trying to load PDFs with Wand/ImageMagick.
The code to download the PDF from GCS to the local file system and load it into Wand/ImageMagick is below:
_, temp_local_filename = tempfile.mkstemp()
gcs_blob = STORAGE_CLIENT.bucket('XXXX').get_blob(results["storedLocation"])
gcs_blob.download_to_filename(temp_local_filename)

# load the pdf into a set of images using imagemagick
with Image(filename=temp_local_filename, resolution=200) as source:
    # run through pages and save images etc.
ImageMagick should be authorised to access files on the local file system, so it should load the file without issue instead of raising this 'not authorized' error.
PDF reading by ImageMagick has been disabled because of a security vulnerability in Ghostscript. The restriction is by design, and the ImageMagick team's security mitigation will remain in place until ImageMagick enables Ghostscript processing of PDFs again and Google Cloud Functions updates to that new version of ImageMagick with PDF processing re-enabled.
There's no fix for the ImageMagick/Wand issue in GCF that I could find, but as a workaround for converting PDFs to images in Google Cloud Functions you can use a Ghostscript wrapper (the ghostscript package in requirements.txt below) to request the PDF-to-image conversion directly from Ghostscript and bypass ImageMagick/Wand. You can then load the PNGs into ImageMagick or OpenCV without issue.
requirements.txt
google-cloud-storage
ghostscript==0.6
main.py
import glob
import locale
import tempfile

import cv2
import ghostscript

# create a temp filename and save a local copy of the pdf from GCS
_, temp_local_filename = tempfile.mkstemp()
gcs_blob = STORAGE_CLIENT.bucket('XXXX').get_blob(results["storedLocation"])
gcs_blob.download_to_filename(temp_local_filename)

# create a temp folder to hold the exported pages
temp_local_dir = tempfile.mkdtemp()

# use ghostscript to export the pdf pages as pngs in the temp dir
args = [
    "pdf2png",  # actual value doesn't matter
    "-dSAFER",
    "-sDEVICE=pngalpha",
    "-o", temp_local_dir + "/page-%03d.png",
    "-r300", temp_local_filename
]

# the above arguments have to be bytes, encode them
encoding = locale.getpreferredencoding()
args = [a.encode(encoding) for a in args]

# run the request through ghostscript
ghostscript.Ghostscript(*args)

# read the files in the tmp dir and process the pngs individually
for png_file_loc in glob.glob(temp_local_dir + "/*.png"):
    # loop through the saved PNGs, load into OpenCV and do what you want
    cv_image = cv2.imread(png_file_loc, cv2.IMREAD_UNCHANGED)
Hope this helps someone facing the same issue.

AWS Lambda permission denied when trying to use ffmpeg

I want to write a handler that responds to S3 put events to convert any AVI files that are uploaded to MP4. I am doing it in Java, in Eclipse, with the AWS Toolkit plugin. For video conversion I am using ffmpeg with ffmpeg-cli-wrapper, and I have provided a static (Linux) binary of ffmpeg in the source tree.
I have found that when I upload the function, the binary is put in /var/task, but when I try to use the test function I've written, I get a "permission denied" error.
import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.lambda.runtime.RequestHandler;
import com.amazonaws.services.lambda.runtime.events.S3Event;
import net.bramp.ffmpeg.FFmpeg;

public class LambdaFunctionHandler implements RequestHandler<S3Event, String> {

    private static final String FFMPEG = "/var/task/ffmpeg";

    public String handleRequest(S3Event event, Context context) {
        try {
            FFmpeg ff = new FFmpeg(FFMPEG);
            System.out.println(ff.version());
        } catch (Exception e) {
            e.printStackTrace();
        }
        return "foo";
    }
}
And the first line of the stacktrace: java.io.IOException: Cannot run program "/var/task/ffmpeg": error=13, Permission denied.
How do I execute this binary? I have done as others have suggested and run chmod 755 on the binary before uploading, but it hasn't made a difference.
AWS Lambda runs on Amazon Linux, and this is a known issue. Try building ffmpeg (statically linked), check that it works on Amazon Linux, and upload that binary. You do not have the privileges to chmod files in /var/task/. Alternatively, try this solution, which works (a minimal Node.js sketch of the steps follows the list):
1. Move ffmpeg to /tmp
2. chmod 755 /tmp/ffmpeg
3. Call /tmp/ffmpeg
See this discussion for more info.
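A minimal sketch of those three steps in Node.js, assuming the ffmpeg binary was shipped in the deployment package at /var/task/ffmpeg (the same idea applies from Java by running the copy and chmod before invoking the wrapper):

// Hedged sketch: copy the bundled binary to writable /tmp, make it executable,
// then run it from there instead of from the read-only /var/task directory.
const fs = require('fs');
const { execFileSync } = require('child_process');

fs.writeFileSync('/tmp/ffmpeg', fs.readFileSync('/var/task/ffmpeg'));
fs.chmodSync('/tmp/ffmpeg', 0o755);

// call the copy in /tmp instead of the one in /var/task
const version = execFileSync('/tmp/ffmpeg', ['-version']).toString();
console.log(version);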
I ran into this issue recently, and after messing with various manual solutions, what really solved it was:
Create a Lambda layer with only the ffmpeg binary inside a bin/ folder.
Create a Lambda function that uses said layer, and in the Python code run /opt/bin/ffmpeg.
See https://aws.amazon.com/blogs/media/processing-user-generated-content-using-aws-lambda-and-ffmpeg/
As helloV mentioned, you might have to include a static ffmpeg binary, copy it to a writable location, and execute it from there.
A detailed answer (with Node.js code) is given here.

How to deliver a node.js + imagemagick tool?

I've developed a node.js tool which uses imagemagick from the command line. That is, exec('my_imagemagick_commands'). What's the best way to deliver that tool to a client using Windows? That is, how can I make a Windows installer that'll install node.js, imagemagick and the tool - preferably as a binary, not the source - in a specific folder?
If you want an easy bundle: zip the items below, deploy them on the client, and drag/drop the images to be processed onto the yourtool.cmd file (I'm doing something similar for image optimizers).
Bundle: (put these in one directory)
yourtool.cmd
yourtool.js
node.exe
node_modules/ (if applicable)
yourtool.cmd
REM Get the drive/path the batch file is in
set batchdir=%~d0%~p0
REM Run tool for items dragged over...
"%batchdir%node.exe" "%batchdir%yourtool.js" %*
yourtool.js
// start at 2 for arguments passed...
// 0 is node.exe
// 1 is the js file run
for (var i = 2; i < process.argv.length; i++) {
    var imagePath = process.argv[i];
    // do something with image...
}
For others interested in using ImageMagick with Node, you should check out node-imagemagick.
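As a rough idea of what that looks like, here is a minimal sketch using the node-imagemagick package mentioned above (published on npm as imagemagick); the file names and arguments are only examples:

// Hedged sketch: im.convert shells out to ImageMagick's convert with the given arguments.
var im = require('imagemagick');

im.convert(['input.jpg', '-resize', '800x600', 'output.png'], function (err, stdout) {
    if (err) throw err;
    console.log('conversion finished');
});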
