How can I convert a multi-page-PDF into several JPG or TIFF?
When I use
convert "abc.PDF" "abc.JPG"
then only the first page of the PDF is converted. Is it possible to convert every page of "abc.PDF" into separate JPG files?
You should be able to convert multipage PDF files into multiple JPEGs (one file per page) easily when using convert.
Here is a command to process just pages 1--5:
convert PDF32000_2008.pdf[0-4] page-%d.jpg
([0-4] means pages 1--5. The page indexing is 0-based!)
However, this does not give you much control over the resulting quality. The only thing you can add is -density 150 or -density 300 to increase the resolution of your images. (By default, convert uses -density 72, i.e. 72 PPI.)
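For example, a minimal sketch using the file names from the question (the %d in the output name becomes the page number):

convert -density 300 "abc.PDF" "abc-%d.JPG"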
Also, be aware that ImageMagick is not able to process PDFs all by itself. It employs Ghostscript as its 'delegate' to handle PDF files. You can see this if you add -verbose to your command line:
convert -verbose -density 200 ~/Downloads/PDF32000_2008.pdf[0-4] page-%d.jpg
[....]
[ghostscript library] -q -dQUIET -dSAFER -dBATCH -dNOPAUSE -dNOPROMPT \
-dMaxBitmap=500000000 -dAlignToPixels=0 -dGridFitTT=2 \
"-sDEVICE=pngalpha" -dTextAlphaBits=4 -dGraphicsAlphaBits=4 \
"-r200x200" -dFirstPage=1 -dLastPage=5 \
"-sOutputFile=/var/tmp/magick-63898lc1DhZVuD6lu%d" \
"-f/var/tmp/magick-63898h8-BZJ59LyhQ" \
"-f/var/tmp/magick-638989MxSe0EALH5F"
So in many cases where you want to convert PDF pages to images, it can be advantageous to run Ghostscript directly...
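For example, a rough equivalent of the convert call above done directly with Ghostscript (a sketch; the jpeg device and the -r resolution switch are standard Ghostscript options):

gs -q -sDEVICE=jpeg -r150 \
   -dFirstPage=1 -dLastPage=5 \
   -o page-%d.jpg \
   PDF32000_2008.pdf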
Related
I am developing a PDF conversion app with Node.js and Ghostscript. I execute the gs command line with exec(). My command definition looks like this:
let gs_cmd = `
gs -sDEVICE=pdfwrite \
-dPDFX=true \
-dPDFACompatibilityPolicy=1 \
-sColorConversionStrategy=/CMYK \
-sProcessColorModel=DeviceCMYK \
-sDefaultCMYKProfile=${icc_profile_file} \
-dNoOutputFonts \
-dBATCH \
-dQUIET \
-r${DPI} \
-g${w}x${h} \
-dPDFFitPage \
-NumRenderingThreads=4 \
-o ${target_file}-conv.pdf \
PDFX_def.ps \
@trimbox.in "Trimed" \
${target_file}.pdf
`;
I have a problem with this line:
@trimbox.in "Trimed" \
which should tell Ghostscript to include a file and pass parameters to it. I can't find a proper way to pass parameters that can be used inside the included file. I want to pass the string "Trimed" as the $0 argument so that it is available in the trimbox.in file. I also tried -t=Trimmed and -t="Trimmed", without effect.
From Ghostscript docs (section 10.1):
@filename
Causes Ghostscript to read filename and treat its contents the same as the command line. (This was intended primarily for getting around DOS's 128-character limit on the length of a command line.) Switches or file names in the file may be separated by any amount of white space (space, tab, line break); there is no limit on the size of the file.
-- filename arg1 ...
-+ filename arg1 ...
Takes the next argument as a file name as usual, but takes all remaining arguments (even if they have the syntactic form of switches) and defines the name ARGUMENTS in userdict (not systemdict) as an array of those strings, before running the file. When Ghostscript finishes executing the file, it exits back to the shell.
How to achieve this?
Running my command causes error:
Error: /undefined in Trimed
Firstly, you should review the Ghostscript licence (AGPL v3) to ensure your use is compliant with it. Note that this includes software-as-a-service applications.
"Trimed" isn't a Ghostscript switch and it isn't the name of an input file, so yes, you get an error. You can't 'pass parameters' to #file, because Ghostscript treats that, literally, as a file containing a bunch of switches. There is no command substitution or anything like that. SO you can't have $0 in the file specified by #file.
So when you say:
@PDFX_def_trimbox.ps "Trimed" \
which tells Ghostscript to include a file and pass the parameters to it,
I'm afraid you are incorrect. There is no way to 'pass parameters' to the file when using the @file syntax.
You haven't said what's in the file 'PDFX_def_trimbox.ps', and I'm suspicious (because of the .ps) that this is a PostScript program. You can't use a PostScript program with the @file syntax, because a PostScript program is not a series of Ghostscript switches.
So where you have:
-sDEVICE=pdfwrite \
-dPDFX=true \
etc, you could put all of those switches into the file specified by @file. But you can't put any PostScript in there.
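For example, a minimal sketch (file name invented) of an options file holding only switches, one per line, and the corresponding invocation:

cat > pdfwrite-options.txt <<'EOF'
-sDEVICE=pdfwrite
-dPDFX=true
-sColorConversionStrategy=CMYK
-sProcessColorModel=DeviceCMYK
EOF

gs @pdfwrite-options.txt -o out-conv.pdf PDFX_def.ps input.pdf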
There are a few other problems. You have specified NumRenderingThreads=4, which will do nothing, because the pdfwrite device doesn't (in general) do any rendering; it preserves the input as vector data as far as possible. So pdfwrite ignores this parameter altogether.
For similar reasons, the -r parameter is less than useful. In the case of pdfwrite that simply affects how accurate the conversion is. You shouldn't set that without good reason.
You've set -sColorConversionStrategy=/CMYK when it should be -sColorConversionStrategy=CMYK or -dColorConversionStrategy=/CMYK. -s takes strings; -d takes numbers or names.
-g sets the width and height of the page in pixels, which isn't a great plan, because it depends on the resolution. You should use -dDEVICEWIDTHPOINTS and -dDEVICEHEIGHTPOINTS instead, and not set the resolution.
EDIT (in response to a comment below):
If you want the PDF file to contain a 300 dpi image, then you need to create a page which is the correct size so that, when the image is drawn onto it, the bitmap data from the image ends up at 300 dpi.
So for example, if you have an image which is 600 pixels by 900 pixels, then in order to get that to be 300 dpi you must make the media size 2 inches by 3 inches, which is 144 by 216 points. Changing the resolution of the pdfwrite device won't affect that at all. Setting -g and -r will alter the media size, but not the resolution of the image, though if you also set -dPDFFitPage then yes it will rescale the image to fit the media, which will alter its resolution.
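To make that arithmetic concrete, a sketch (file names invented) of setting the media size in points as suggested above, instead of using -g and -r; -dFIXEDMEDIA forces the requested media size and -dPDFFitPage scales the content to fit it:

gs -sDEVICE=pdfwrite \
   -dDEVICEWIDTHPOINTS=144 -dDEVICEHEIGHTPOINTS=216 \
   -dFIXEDMEDIA -dPDFFitPage \
   -o out.pdf in.pdf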
I have no idea if your original image was 300 dpi. If it was, and the SVG to PDF conversion maintained that, then you don't need to mess about with media sizes and resolution at all; the pdfwrite device will maintain whatever was there.
As regards the @file syntax, you cannot do this:
-c "[ {ThisPage} << /TrimBox [$0 $1 $2 $3] >> /PUT pdfmark"
in the file supplied via the @ syntax because, as I said, there is no variable replacement in the processing which Ghostscript does on the contents of that file. This is not a bash script.
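If the box values are computed in your Node script anyway, one workaround (my sketch, with invented box values) is to do the substitution in the calling shell or script and pass the finished string via -c, so Ghostscript only ever sees literal numbers:

TRIMBOX="10 10 585 832"   # hypothetical values computed by the caller
gs -sDEVICE=pdfwrite \
   -o out-conv.pdf \
   -c "[ {ThisPage} << /TrimBox [$TRIMBOX] >> /PUT pdfmark" \
   -f in.pdf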
My understanding from reading the ImageMagick documentation regarding text is that the @- notation reads the contents of standard input.
As such, this should be a fairly straightforward way to render Hello World.
printf "Hello\nWorld" |
convert \
-size 1280x100 \
-background '#0000FF10' \
-density 90 \
-gravity Center \
-fill black \
-font Helvetica \
caption:@- \
test.png
On OS X 10.11.5 via Homebrew, this works, using convert Version: ImageMagick 6.9.4-3 Q16 x86_64 2016-05-20.
However, on Ubuntu 16.04 LTS, the identical command does not work, using convert Version: ImageMagick 6.8.9-9 Q16 x86_64 2016-06-01. In fact, it renders the stdin operator literally.
The only thing I was able to find on Google that remotely looked like this problem was this article from Oct 2015, in which ImageMagick 6.9.2-5 Beta was patched to fix a similar problem.
QUESTION: Am I not escaping it properly, is there really a problem in ImageMagick, or is my Linux Distro picking up a historical version of ImageMagick with the bug and I need to build from source?
Much Later After Many Experiments
SOLVED ...? Built ImageMagick 7.0.2 from source on the Ubuntu box and the above command worked as desired. Was there a better solution?
No need to build from source. Just replace @- with "`tee`":
printf "Hello\nWorld" |
convert \
-size 1280x100 \
-background '#0000FF10' \
-density 90 \
-gravity Center \
-fill black \
-font Helvetica \
caption:"`tee`" \
test.png
`tee` will execute first and 'process' stdin before completing the convert command.
I suspect it is down to differences in your policy.xml file. There were recent warnings about IM security vulnerabilities (detailed here) and I guess the policy.xml file on one of your servers has been made secure and not the other. The affected line in that file is:
<!-- <policy domain="path" rights="none" pattern="@*" /> -->
Clarifications contributed by question owner:
The policy file's location is /etc/ImageMagick-6/policy.xml
In the package distribution, this line is not commented out by default.
The recommendation here is to comment it out.
This solution did not work for the question owner; your mileage may vary.
Further clarification by Mark:
The location of the policy.xml file is not always /etc/ImageMagick-6; it differs between systems - for example, on OS X it is under /usr/local/Cellar/imagemagick....
The sure way to find the policy.xml file is to run the following command and realise that ImageMagick expects the policy.xml file to be in the same directory as the delegates.xml and coder.xml:
convert -debug configure logo: null: 2>&1 | grep -Ei "Searching|Loading"
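Another quick check, assuming a reasonably recent ImageMagick, is to ask it to print the policies it has actually loaded:

convert -list policy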
This is the command I am running (directly from the command line, logged in as root):
/usr/bin/convert '/var/storage/files/drupal/273f09ab5f8671d3c457719c7955063f.jpg' -resize 127x127! -quality '75' '/var/storage/files/drupal/imagecache/artwork_moreart/273f09ab5f8671d3c457719c7955063f.jpg'
The result of the command is just: Segmentation fault
Version of ImageMagick: ImageMagick 6.4.3 2009-02-25
Linux version: SUSE Linux Enterprise Server 11 (x86_64)
This image does exist, and I have copied it to my local computer and opened it with no issue.
Please let me know if there is additional information you need and how to get this information.
Try it with a corrected command. First of all, the ! needs backslash-escaping, otherwise it is interpreted by your shell instead of by convert:
/usr/bin/convert \
'/var/storage/files/drupal/273f09ab5f8671d3c457719c7955063f.jpg' \
-resize 127x127\! -quality '75' \
'/var/storage/files/drupal/imagecache/artwork_moreart/273f09ab5f8671d3c457719c7955063f.jpg'
If this doesn't work, try to surround the argument with single quotes too (like you did with your other arguments):
127x127\! => '127x127\!'
The cause of your problem could also reside outside the convert binary and be within the specific input JPEG you want to process. You can try to rule this out by processing a set of different input files. Start with the built-in IM test images logo:, wizard: and netscape::
convert wizard: \
-resize "127x127\!" \
127wiz.jpg
convert logo: \
-resize "127x127\!" \
127log.jpg
convert netscape: \
-resize "127x127\!" \
127net.jpg
Sorry, I cannot reproduce your problem directly here. SLES 11 with IM 6.4.3 is simply too ancient for me.
I want to convert a PowerPoint presentation to multiple images. I have already installed LibreOffice on my server, and converting docx to pdf is no problem, but pptx to pdf conversion does not work. I used the following command line:
libreoffice --headless --convert-to pdf filename.pptx
Is there a way to convert pptx to PNGs directly, or do I have to convert it to PDF first and then use Ghostscript or something similar?
And what about the quality settings? Is there a way to choose the resolution of the resulting images?
Thanks in advance!
EDIT:
According to this link I was able to convert a pdf to images with the simple command line:
convert <filename>.pdf <filename>.jpg
(I guess you need LibreOffice and ImageMagick for it, but I am not sure - it worked on my server.)
But there are still the problems with the pptx-to-pdf convert.
Thanks to googling and Sebastian Heyn's help I was able to create some high quality images with this line:
convert -density 400 my_filename.pdf -resize 2000x1500 my_filename%d.jpg
Please be patient after running it - you can still type something into the Unix console while it is processing. Just wait a few minutes and the JPG files will be created.
For further information about the options, check out this link.
P.S.: The aspect ratio of a pptx file doesn't seem to be exactly 4:3 because the resulting image size is 1950x1500
After installing unoconv and LibreOffice you can use:
unoconv --export Quality=100 filename.pptx filename.pdf
to convert your presentation to a pdf. For further options look here.
Afterwards you can - as already said above - use:
convert -density 400 my_filename.pdf -resize 2000x1500 my_filename%d.jpg
to receive the images.
Converting PPTX to PNG/JPG
This solution requires LibreOffice (soffice) and Ghostscript (gs):
sudo apt install libreoffice ghostscript
Then two steps:
PPTX -> PDF
soffice --headless --convert-to pdf prezentacja.pptx
PDF -> PNG/JPG
gs -sDEVICE=pngalpha -o slajd-%02d.png -r96 prezentacja.pdf
-o slajd-%02d.png - output file name; %02d is the slide number, zero-padded to two digits
-r96 - resolution (for a standard 16:9 slide):
96 -> 1280x720
144 -> 1920x1080
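For JPG output instead of PNG, a sketch using Ghostscript's jpeg device (-dJPEGQ sets the JPEG quality, 0-100):

gs -sDEVICE=jpeg -dJPEGQ=90 -o slajd-%02d.jpg -r144 prezentacja.pdf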
Not sure about LibreOffice, but AFAIK it's the only program that can deal with pptx files.
I found this http://ask.libreoffice.org/en/question/23851/converting-pptx-to-pdf-issue/
Once you have PDFs, you can use ImageMagick to output images at whatever quality you need.
I have a PDF file (4.6MB) which was made by combining 6 different PDFs (containing both text and bitmap graphics) using pdftk in Ubuntu 12.04. I wish to compress this file to something close to 2MB without affecting its quality.
I have tried pdftk's "compress" option (it couldn't get the file down to 2 MB). I also tried converting it to PS first and then back to PDF, but that gives the following warning:
****Warning: considering '0000000000 XXXXX n' as a free entry.
and then hangs. qpdf also failed saying that the file is damaged.
Could someone help me out?
What result does Ghostscript give you? Try this command:
gs \
-o output.pdf \
-sDEVICE=pdfwrite \
-dPDFSETTINGS=/screen \
input.pdf
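If /screen (which downsamples images to roughly 72 dpi) hurts the quality too much, the /ebook preset (roughly 150 dpi) is a reasonable middle ground; a sketch:

gs \
  -o output-ebook.pdf \
  -sDEVICE=pdfwrite \
  -dPDFSETTINGS=/ebook \
  input.pdf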
Does this PDF file contain confidential information? If not, it would be interesting to see it.
Anyway, in many cases where qpdf fails, Multivalent works.
You can try its Compress tool (it also attempts to repair the PDF file).
Multivalent
https://rg.to/file/c6bd7f31bf8885bcaa69b50ffab7e355/Multivalent20060102.jar.html
(the latest free version that still includes the tools; the current release no longer bundles them)
java -cp path....to/Multivalent.jar tool.pdf.Compress file.pdf
This works for me to repair a damaged PDF:
sudo apt-get install mupdf-tools
mutool clean input.pdf output.pdf