ImageMagick issue on AppEngine Standard (PDFs and NodeJS) - node.js

I am using App Engine Standard. Since ImageMagick is available on it, I tried a few PDF manipulation libraries; what I would like to do is simply convert a PDF into an image.
The issue I am getting is this:
convert-im6.q16: not authorized `/tmp/ygM1sF-Txq00JkGbpal8YWBQ.pdf' @ error/constitute.c/ReadImage/412.
convert-im6.q16: no images defined `/tmp/ygM1sF-Txq00JkGbpal8YWBQ-0.png' @ error/convert.c/ConvertImageCommand/3258.
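For reference, the conversion boils down to a plain convert call like the one below (my own minimal reconstruction of what the libraries do under the hood; the file names are placeholders):

    // Node.js: roughly what the PDF-to-image libraries run internally (reconstruction)
    const { execFile } = require('child_process');

    execFile('convert', ['/tmp/input.pdf', '/tmp/output.png'], (err, stdout, stderr) => {
      if (err) {
        // On App Engine Standard this fails with "not authorized":
        // ImageMagick's security policy blocks its PDF delegate.
        console.error(stderr);
        return;
      }
      console.log('converted');
    });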
After some research, I found this post: Fix for ImageMagick convert errors with pdf files. Here is what it says:
PDF files on Linux systems are usually handled by ghostscript (via the
terminal command gs). And, ImageMagick (done through the terminal
convert command) uses ghostscript for reading and writing PDF files.
Because the security problems are serious and numerous, ImageMagick’s
access to PDF files is then cut off.
Granted, through these security flaws in PDF someone could craft a
malicious image file that, when converted by ImageMagick into a PDF,
will then do very nasty things to your computer.
But, ghostscript has since been updated once and once again with
security fixes. How about a fix for ImageMagick to get PDF
functionality back? Or, at least an explanation of progress towards
fixing this issue?
I can't change the ImageMagick configuration on App Engine Standard, but I wonder if there is something else I can do. Or maybe the engineers at Google would be able to update ImageMagick instead and remove that limitation?
I really need to convert PDFs into images, so I wonder if it is worth waiting, or if I need to find another solution.
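One workaround I am considering is skipping ImageMagick entirely and calling Ghostscript directly, since the policy only restricts ImageMagick's own PDF delegate. A minimal sketch, assuming the gs binary is actually available on the instance (which I have not verified):

    // Node.js sketch: render page 1 of a PDF to PNG with Ghostscript directly
    const { execFile } = require('child_process');

    function pdfToPng(pdfPath, pngPath, callback) {
      const args = [
        '-dNOPAUSE', '-dBATCH', '-dSAFER',
        '-sDEVICE=png16m',              // 24-bit RGB PNG output
        '-r150',                        // render at 150 dpi
        '-dFirstPage=1', '-dLastPage=1',
        '-sOutputFile=' + pngPath,
        pdfPath,
      ];
      execFile('gs', args, (err) => callback(err));
    }

    pdfToPng('/tmp/input.pdf', '/tmp/output.png', (err) => {
      if (err) console.error('Ghostscript failed:', err);
      else console.log('wrote /tmp/output.png');
    });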
Thanks for your ideas.

Related

GhostScript - ImageMagick converts pdf to image to odd letters when converting Microsoft Print to PDF files

NOTICE: See the updates at the bottom.
I am building an API which is supposed to convert PDFs to base64-encoded images (the exact format doesn't matter: jpg, jpeg, png, ...).
The API is built with NodeJS on CentOS 7.5 x64.
I have searched all over the web for npm packages that convert PDFs to images; most of them use ImageMagick and Ghostscript (the others don't seem to work). These packages work fine in code, but the problem starts when Ghostscript does its job.
For example, a simple PDF page with text comes out as odd, garbled letters after conversion.
This is the output in shell:
**** Warning: can't process font stream, loading font by the name.
**** This file had errors that were repaired or ignored.
**** The file was produced by:
**** >>>> Microsoft: Print To PDF <<<<
**** Please notify the author of the software that produced this
**** file that it does not conform to Adobe's published PDF
**** specification.
I have tried converting with shell commands directly and ended up with the same output.
Thanks in advance.
UPDATE:
Converting a sample PDF that probably was not printed to PDF by Microsoft worked fine; maybe this is the problem?
UPDATE 2:
After converting a few more PDFs, it turns out that only Microsoft Print to PDF files cause this problem.
This was reported as a bug on the Ghostscript Bugzilla here.
As can be seen from the thread, this is due to using an old version of Ghostscript and has been fixed at some point in the past. So the problem comes down to running old (in this case, more than 5 years old) software.
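As a quick sanity check before converting, you can verify which Ghostscript version is installed. A minimal Node sketch (the 9.22 cutoff below is my own rough guess for "recent enough" based on when the fix landed; check the bug thread for the exact release):

    // Node.js: check the installed Ghostscript version
    const { execFile } = require('child_process');

    execFile('gs', ['--version'], (err, stdout) => {
      if (err) throw err;
      const version = stdout.trim();    // e.g. "9.26"
      console.log('Ghostscript version:', version);
      if (parseFloat(version) < 9.22) { // assumed cutoff, not authoritative
        console.warn('Old Ghostscript: Microsoft Print to PDF files may render as odd letters.');
      }
    });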

Pentaho 7.1 PDF wrong diacritics

We have installed a new Linux server for our Pentaho installation, but I am having problems with diacritics in the generated PDF files.
For the web (HTML) output, I have set the encoding to UTF-8, which works perfectly.
But for PDF, UTF-8 encoding is not working. I had "fixed" it on the old server by setting CP-1250 encoding, but I don't want to use that old standard anymore, so I have been trying to fix it properly.
I have set option in pentaho-server/tomcat/webapps/pentaho/WEB-INF/classes/classic-engine.properties to
org.pentaho.reporting.engine.classic.core.modules.output.pageable.pdf.Encoding=UTF-8
but the PDF versions of the reports are still ignoring letters with diacritics.
So my thought is that there must be some PDF encoding setting above this: perhaps a global PDF generator setting, or perhaps Java or Linux itself?
Can anyone give me a hint about where to look and what to check?
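One more thing I plan to check is font embedding, since the standard base-14 PDF fonts don't carry most accented glyphs. The EmbedFonts property name below is from memory and unverified, so please correct me if it is wrong:

    # classic-engine.properties (second property name unverified, from memory)
    org.pentaho.reporting.engine.classic.core.modules.output.pageable.pdf.Encoding=UTF-8
    org.pentaho.reporting.engine.classic.core.modules.output.pageable.pdf.EmbedFonts=true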

Issue in pdf to text depending on PDF file's version

I'm working on Ubuntu, trying to parse PDF files and extract text from them, which I managed to get working (using tesseract, for example). But when I get a PDF file with version 1.7, the conversion doesn't work (I get a blank page in my 'name.txt' file).
So I was wondering if anyone knows some magic that can solve my problem with this PDF version issue...
I looked pretty much everywhere I could on the web without seeing similar issues, so I came to y'all.
Hope you'll find a way to help me, because Google hasn't been much of a friend so far...
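For what it's worth, the route I am currently experimenting with is poppler's pdftotext instead of OCR, since it reads the PDF's text layer directly. A minimal Node sketch, assuming poppler-utils is installed (sudo apt-get install poppler-utils):

    // Node.js sketch: extract text with poppler's pdftotext
    const { execFile } = require('child_process');

    execFile('pdftotext', ['name.pdf', 'name.txt'], (err) => {
      if (err) {
        console.error('pdftotext failed:', err);
        return;
      }
      console.log('text written to name.txt');
    });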

Converting HTML to odt, doc, docx

Is there an easy way to convert HTML (with CSS styles and embedded images) to ODT, DOCX, or DOC from the command line on a Linux server? I have searched a lot but have not found a good option.
I had the same problem converting to PDF, which was solved by wkhtmltopdf. Perhaps there are ways to convert the resulting PDF documents to other formats?
Converting to ODT is pretty easy after installing pandoc.
After that comes the relatively hard part: from ODT (or even HTML) you can script (Open|Libre)Office via e.g. unoconv.
Or you can do something like:
abiword --to=doc filename.odt
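Scripted end to end, that pipeline might look like this (a sketch: the file names are placeholders, and it assumes pandoc and abiword are both on the PATH):

    // Node.js sketch: HTML -> ODT via pandoc, then ODT -> DOC via abiword
    const { execFileSync } = require('child_process');

    execFileSync('pandoc', ['input.html', '-o', 'output.odt']);
    execFileSync('abiword', ['--to=doc', 'output.odt']);
    console.log('wrote output.odt and output.doc');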
Also see this thread, and this blog post.
HTH
If you want to convert HTML into docx you may use a solution like PHPDocX. You need to get the PRO version though because the free one does not include the conversion functionality.
If you're on ruby there is a gem based on libreoffice headless (with pyod/jod converter) and pdf tools.
Post your issues to the pandoc Google Group; John is very responsive.
You may even find that the latest release, v1.9, fixes your problem, or maybe you just need to get to know the toolset in more detail.
I found the solution: abiword in its console variant.

make swf from fla without ever opening it

Is it possible to change text and images in a FLA file without ever opening it up, and then make the SWF via the command line? I want to make a Flash template and save the FLA, then be able to update my text and image names and convert it to SWF. I have one template but tons of different text options and background images. It would be nice to be able to copy master.fla twenty times, just change the source code (I will do this from the command line), and then convert to SWF (via the command line).
Any help would be appreciated.
With CS5, you can do half of what you're asking today, by using the XFL file format instead of FLA. Instead of a binary blob, you get an editable XML file and a tree of separate asset files: PNGs, AS3 files, etc. You can then modify the XML or AS3 files programmatically to get your variants.
(A CS5 FLA file is really just a zipped up version of the XFL, but there's no advantage to using that instead of an XFL. In CS4 and previous, FLA was a proprietary binary format.)
The missing piece is an XFL compiler. Adobe currently provides no such thing, and the third party market hasn't yet produced one.
You could use a systems automation tool to drive the Flash Professional environment through the compilation steps. On OS X, for example, either Automator or AppleScript should be able to do what you want. It'll just have more overhead than the command line compiler you were hoping for.
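For the "modify programmatically" step, here is a sketch of the kind of substitution you might script over the XFL tree (the asset path and placeholder token are hypothetical, purely for illustration; they are not part of the XFL spec):

    // Node.js sketch: swap a placeholder string inside an XFL asset file
    const fs = require('fs');

    const assetPath = 'master/LIBRARY/headline.xml';             // hypothetical asset
    const xml = fs.readFileSync(assetPath, 'utf8');
    const updated = xml.replace('{{HEADLINE}}', 'Summer Sale');  // hypothetical token
    fs.writeFileSync(assetPath, updated);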
I agree with Jason; there are a lot of alternatives to what you suggest. Keeping content out of the SWF is actually good practice; it is a good way to avoid large files!
Depending on what you're looking to achieve, there are a lot of solutions available. XML is one option, JSON another.
If you're looking to build a template, any of the above would seem appropriate.
It sounds like you're working from the Flash IDE, as Jason suggests you may want to have a look at another IDE, such as FlashDevelop, FDT or FlashBuilder as they make coding with AS3 a lot easier.
