No output file from GhostScript PDF to PNG conversion - linux

I have a two-page PDF I'm trying to convert to a PNG file. When I run:
gs -sDevice=pngalpha -o=gs-output-%d.png -r400 test1-0.pdf
I get:
GPL Ghostscript 9.07 (2013-02-14)
Copyright (C) 2012 Artifex Software, Inc. All rights reserved.
This software comes with NO WARRANTY: see the file PUBLIC for details.
Processing pages 1 through 2.
Page 1
%%BoundingBox: 35 35 577 757
%%HiResBoundingBox: 35.910001 35.910001 576.090022 756.090029
Page 2
%%BoundingBox: 35 35 577 757
%%HiResBoundingBox: 35.910001 35.910001 576.090022 756.090029
And then... nothing. No output files at all. Where am I going wrong?

You're so close you'll be mad ;-)
After a bit of manpage reading here's what worked for me:
gs -sDEVICE=pngalpha -ogs-output-%d.png -r400 test1-0.pdf
i.e. DEVICE instead of Device (Ghostscript switch names are case-sensitive), and -o instead of -o=
In case it matters, my gs version is:
GPL Ghostscript 9.05 (2012-02-08)
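The corrected command can be wrapped in a small function so the switch spelling only has to be right once. This is a minimal sketch, not part of the original answer: the function name pdf_to_png and the GS override variable are hypothetical, and only the gs switches shown above are used.

```shell
# Hypothetical helper: render each page of a PDF to PREFIX-1.png, PREFIX-2.png, ...
# GS is an assumed override hook (defaults to "gs") so the call can be dry-run.
pdf_to_png() {
    pdf=$1
    prefix=$2
    dpi=${3:-400}
    # Switch names are case-sensitive: -sDEVICE, not -sDevice; -o takes no "=".
    "${GS:-gs}" -sDEVICE=pngalpha -o "${prefix}-%d.png" "-r${dpi}" "$pdf"
}
```

Calling pdf_to_png test1-0.pdf gs-output should write one PNG per page. Setting GS=echo first prints the arguments instead of running Ghostscript.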

Try this, which works perfectly for me and gives very good results:
gs -sDEVICE=pngalpha -o "$OUTPUTIMAGEFILE" -dFirstPage=1 -dLastPage=2 -dNOPAUSE -dGraphicsAlphaBits=4 -dTextAlphaBits=4 "$INPUTPDFFILE"
However, I guess that -r400 is meant for JPEG images, not PNGs.


What do device (character special) file sizes mean?

Using ls -l normally results in a long listing that includes the file size...
-rw-r--r--@ 1 user1 staff 881344 Sep 1 15:35 someFile.png
On macOS 10.13.5, and Ubuntu 20.04, character special (device) file sizes are very different...
crw------- 1 root wheel 31, 0 Aug 30 16:11 autofs
In this case, what does the "31, 0" mean?
Regarding "what does the '31, 0' mean?": these are the major and minor numbers of the character device.
See these:
https://unix.stackexchange.com/questions/97676/how-to-find-the-driver-module-associated-with-a-device-on-linux
https://www.ibm.com/support/knowledgecenter/linuxonibm/com.ibm.linux.z.lgdd/lgdd_c_udev.html
Read carefully the documentation of ls(1) then about inode(7)
31 is a major device number, 0 is a minor device number.
Remember that ls(1) would use stat(2) (you might check using strace(1)...), so read Advanced Linux Programming then syscalls(2)
Sometimes, ls might be a shell alias or function, so read the documentation of GNU bash. Also try /bin/ls --help
On GNU/Linux, read the documentation of coreutils. It is free software, so you can download and study its source code!
On macOS, the operating system kernel might have different system calls.
Be aware, however, of udev (on Linux).
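To read the same two numbers without parsing ls output, stat can be asked for them directly. This is a minimal sketch assuming GNU coreutils stat (Linux); the stat on macOS takes different options, and the function name dev_numbers is made up for illustration.

```shell
# Hypothetical helper: print "MAJOR, MINOR" for a device file, as ls -l shows it.
dev_numbers() {
    # GNU stat: %t and %T are the major and minor device numbers in hexadecimal.
    hex=$(stat -c '%t %T' "$1") || return 1
    # Split "MAJOR MINOR" into $1 and $2, then convert from hex to decimal.
    set -- $hex
    printf '%d, %d\n' "0x$1" "0x$2"
}
```

On Linux, /dev/null is conventionally major 1, minor 3, so dev_numbers /dev/null is a quick sanity check.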

Weird "border" with jpegtran when rotating an image

I'm using jpegtran to rotate my images, but it seems to be doing some weird stuff.
The same code/logic worked on my other server. Here is a test case I've done:
test.jpg is the original image, then I run:
jpegtran -rotate 90 test1.jpg > test2.jpg.tmp
mv test2.jpg.tmp test2.jpg
jpegtran -rotate 90 test2.jpg > test3.jpg.tmp
mv test3.jpg.tmp test3.jpg
[Images: test.jpg, test2.jpg, test3.jpg]
As you should be able to see, there are weird borders on the rotated images (almost like shadows on another part of the image).
What would cause this, and how can I get around it?
The OLD server shows:
jpegtran -v
libjpeg-turbo version 1.2.1 (build 20131209)
Copyright (C) 1991-2010 Thomas G. Lane, Guido Vollbeding
Copyright (C) 1999-2006 MIYASAKA Masaru
Copyright (C) 2009 Pierre Ossman for Cendio AB
Copyright (C) 2009-2012 D. R. Commander
Copyright (C) 2009-2011 Nokia Corporation and/or its subsidiary(-ies)
Emulating The Independent JPEG Group's libjpeg, version 6b 27-Mar-1998
...and the NEW server shows:
jpegtran -v
Independent JPEG Group's JPEGTRAN, version 9b 17-Jan-2016
Copyright (C) 2016, Thomas G. Lane, Guido Vollbeding
Typically, no sooner had I written the question out and posted it, I found the solution :)
http://www.imagemagick.org/discourse-server/viewtopic.php?t=21341#p87099
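Basically, the fix is to add "-trim" to the command. It drops the partial edge blocks that cannot be rotated losslessly when the image dimensions are not a multiple of the JPEG block size. The commands then look like: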
jpegtran -rotate 90 -trim test1.jpg > test2.jpg.tmp
mv test2.jpg.tmp test2.jpg
jpegtran -rotate 90 -trim test2.jpg > test3.jpg.tmp
mv test3.jpg.tmp test3.jpg
I've just tried this out, and it works as expected :) Thought I would post it here in case anyone else comes across this!
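The rotate-then-rename steps can be folded into one function that only replaces the original when jpegtran succeeds. This is a sketch under assumptions: the name rotate_jpeg and the JPEGTRAN override variable are hypothetical; the jpegtran switches are the ones used above.

```shell
# Hypothetical helper: rotate a JPEG in place, trimming non-transformable edge blocks.
# JPEGTRAN is an assumed override hook (defaults to "jpegtran").
rotate_jpeg() {
    file=$1
    degrees=${2:-90}
    tmp="${file}.tmp"
    # Write to a temporary file first so a failed run never clobbers the original.
    if "${JPEGTRAN:-jpegtran}" -rotate "$degrees" -trim "$file" > "$tmp"; then
        mv -- "$tmp" "$file"
    else
        rm -f -- "$tmp"
        return 1
    fi
}
```

For example, rotate_jpeg test2.jpg 90 would replace the two-line rotate and mv sequence for one file.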

How to create a custom Oracle Linux 7u2 iso image

I have been using some proven steps to create my own Oracle Linux 6uX ISO images with a custom kickstart script for a long time. What I basically do is mount the ISO image using hdiutil, copy the contents to a working folder, make the modifications and create an ISO using mkisofs (cdrtools).
Details are described here: http://www.reddipped.com/2015/12/virtualbox-soa-bpm-osb-bam-33-minutes/
I just made my first attempts to create a custom Oracle Linux 7u2 ISO image, but have failed miserably so far.
First, opening the image using hdiutil gives an error: 'hdiutil: attach failed - no mountable file systems'. Instead I used Keka to extract the contents of the ISO.
Modified the contents of the extracted iso-image;
Removing /isolinux/boot.cat,
Adding a new ks-bd.ks
Adding a menu item to the isolinux.cfg to be able to start installation using the kickstart file
label linux_basicserver_silent
menu label ^Install basic server silent
menu default
kernel vmlinuz
append initrd=initrd.img ks=cdrom:/ks-bd.ks
Then created an iso again;
## Make isolinux.bin writable
chmod u+w V100082-01U/isolinux/isolinux.bin
# Build V100082-01U.iso
cdrtools/cdrtools-*/mkisofs/OBJ/i386-darwin-clang/mkisofs -r -J -T -o V100082-01U.iso -b isolinux/isolinux.bin \
-c isolinux/boot.cat -no-emul-boot -boot-load-size 4 -boot-info-table -R \
-m TRANS.TBL -v -V Oracle\ Linux\ 7.2 ./V100082-01U
After mounting the ISO image and selecting the 'linux_basicserver_silent' installation option, the installation seems to stall on the message 'Starting automated install'.
When selecting a standard interactive installation in the installation menu, the installation also freezes, with 'Reached target Basic System' as the last step.
After some minutes the same error, 'dracut-initqueue timeout', is shown repeatedly.
Any hints how to fix this?
-- Update 10/27/2016 --
When comparing the original ISO with the ISO created using mkisofs, there are no substantial differences, I think.
Original
./isoinfo -d -i V100082-01.iso
CD-ROM is in ISO 9660 format
System id: LINUX
Volume id: OL-7.2 Server.x86_64
Volume set id:
Publisher id:
Data preparer id:
Application id: GENISOIMAGE ISO 9660/HFS FILESYSTEM CREATOR (C) 1993 E.YOUNGDALE (C) 1997-2006 J.PEARSON/J.SCHILLING (C) 2006-2007 CDRKIT TEAM
Copyright File id:
Abstract File id:
Bibliographic File id:
Volume set size is: 1
Volume set sequence number is: 1
Logical block size is: 2048
Volume size is: 2178717
El Torito VD version 1 found, boot catalog is in sector 701
Joliet with UCS level 3 found.
SUSP signatures version 1 found
Rock Ridge signatures version 1 found
Rock Ridge id 'RRIP_1991A'
Eltorito validation header:
Hid 1
Arch 0 (x86)
ID ''
Cksum AA 55 OK
Key 55 AA
Eltorito defaultboot header:
Bootid 88 (bootable)
Boot media 0 (No Emulation Boot)
Load segment 0
Sys type 0
Nsect 4
Bootoff EFE 3838
Rebuild
./isoinfo -d -i V100082-01U.iso
CD-ROM is in ISO 9660 format
System id: Mac OS X
Volume id: Oracle Linux 7.2
Volume set id:
Publisher id:
Data preparer id:
Application id: MKISOFS ISO9660/HFS/UDF FILESYSTEM BUILDER & CDRECORD CD/DVD/BluRay CREATOR (C) 1993 E.YOUNGDALE (C) 1997 J.PEARSON/J.SCHILLING
Copyright File id:
Abstract File id:
Bibliographic File id:
Volume set size is: 1
Volume set sequence number is: 1
Logical block size is: 2048
Volume size is: 2251903
El Torito VD version 1 found, boot catalog is in sector 718
Joliet with UCS level 3 found.
SUSP signatures version 1 found
Rock Ridge signatures version 1 found
Rock Ridge id 'RRIP_1991A'
Eltorito validation header:
Hid 1
Arch 0 (x86)
ID ''
Cksum AA 55 OK
Key 55 AA
Eltorito defaultboot header:
Bootid 88 (bootable)
Boot media 0 (No Emulation Boot)
Load segment 0
Sys type 0
Nsect 4
Bootoff 2CF 719
Instead of using 7zip, use the cdrtools utility isoinfo to extract the original ISO image.
mkdir work
cd work
isoinfo -R -X -i ../V100082-01.iso
cd ..
Then modify the image and rebuild it using mkisofs, this time keeping the original system ID (-sysid LINUX) and volume ID (-V OL-7.2\ Server.x86_64):
## Make isolinux.bin writable
chmod u+w work/isolinux/isolinux.bin
# Build V100082-01U2.iso
cdrtools/cdrtools-*/mkisofs/OBJ/i386-darwin-clang/mkisofs -r -J -T -o V100082-01U2.iso -b isolinux/isolinux.bin \
-c isolinux/boot.cat -no-emul-boot -boot-load-size 4 -boot-info-table -R -sysid LINUX \
-m TRANS.TBL -v -V OL-7.2\ Server.x86_64 ./work
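To confirm that a rebuilt image carries the same system and volume IDs as the original, the relevant lines can be pulled out of the isoinfo -d listings shown in the question. This is a minimal sketch: the helper name iso_field is hypothetical, and it assumes the "Name: value" line layout seen in those listings.

```shell
# Hypothetical helper: print the value of one "Name: value" line
# from `isoinfo -d` output supplied on standard input.
iso_field() {
    sed -n "s/^$1: //p"
}

# Example use (requires isoinfo and the two images):
#   isoinfo -d -i V100082-01.iso   | iso_field 'Volume id'
#   isoinfo -d -i V100082-01U2.iso | iso_field 'Volume id'
```

If the two commands print different values, the rebuild changed the label, as happened with the first attempt in the question (Oracle Linux 7.2 versus OL-7.2 Server.x86_64).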

Optimize PDF files (with Ghostscript or other)

Is Ghostscript the best option if you want to optimize a PDF file and reduce the file size?
I need to store a lot of PDF files, so I need to optimize them and reduce the file size as much as possible.
Does anyone have any experience with Ghostscript and/or other?
command line
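// Build the command as a single line and quote both paths for the shell.
exec('gs -dNOPAUSE -dBATCH -sDEVICE=pdfwrite -dCompatibilityLevel=1.4'
    . ' -dPDFSETTINGS=/screen -sOutputFile=' . escapeshellarg($file_new)
    . ' ' . escapeshellarg($file));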
If you are looking for Free (as in 'libre') Software, Ghostscript is surely your best choice. However, it is not always easy to use -- some of its (very powerful) processing options are poorly documented and hard to find.
Have a look at this answer, which explains how to exercise more detailed control over image resolution downsampling than the generic -dPDFSETTINGS=/screen does (that switch defines a few overall defaults, which you may want to override):
How to downsample images within pdf file?
Basically, it tells you how to make Ghostscript downsample all images to a resolution of 72dpi (this value is what -dPDFSETTINGS=/screen uses -- you may want to go even lower):
-dDownsampleColorImages=true \
-dDownsampleGrayImages=true \
-dDownsampleMonoImages=true \
-dColorImageResolution=72 \
-dGrayImageResolution=72 \
-dMonoImageResolution=72 \
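Put together with a device and file names, those switches form a complete command. The sketch below is one way to parameterize the target resolution; the function name downsample_pdf and the GS override variable are hypothetical, while the gs switches are the ones listed above plus -o and -sDEVICE=pdfwrite from elsewhere in this answer.

```shell
# Hypothetical helper: rewrite SRC to DST, downsampling all images to DPI (default 72).
# GS is an assumed override hook (defaults to "gs") so the call can be dry-run.
downsample_pdf() {
    src=$1
    dst=$2
    dpi=${3:-72}
    "${GS:-gs}" -o "$dst" -sDEVICE=pdfwrite \
        -dDownsampleColorImages=true \
        -dDownsampleGrayImages=true \
        -dDownsampleMonoImages=true \
        "-dColorImageResolution=${dpi}" \
        "-dGrayImageResolution=${dpi}" \
        "-dMonoImageResolution=${dpi}" \
        "$src"
}
```

For example, downsample_pdf input.pdf output.pdf 50 would go below the 72 dpi that /screen uses.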
If you want to see whether Ghostscript is also able to 'un-embed' the fonts used (sometimes it works, sometimes not -- depending on the complexity of the embedded font, and also on the font type used), you can try adding the following to your gs command:
gs \
-o output.pdf \
[...other options...] \
-dEmbedAllFonts=false \
-dSubsetFonts=true \
-dConvertCMYKImagesToRGB=true \
-dCompressFonts=true \
-c ".setpdfwrite <</AlwaysEmbed [ ]>> setdistillerparams" \
-c ".setpdfwrite <</NeverEmbed [/Courier /Courier-Bold /Courier-Oblique /Courier-BoldOblique /Helvetica /Helvetica-Bold /Helvetica-Oblique /Helvetica-BoldOblique /Times-Roman /Times-Bold /Times-Italic /Times-BoldItalic /Symbol /ZapfDingbats /Arial]>> setdistillerparams" \
-f input.pdf
Note: Be aware that downsampling image resolution will surely reduce quality (irreversibly), and dis-embedding fonts will make it difficult or impossible to display and print the PDFs unless the same fonts are installed on the machine....
Update
One option which I had overlooked in my original answer is to add
-dDetectDuplicateImages=true
to the command line. This parameter leads Ghostscript to try and detect any images which are embedded in the PDF multiple times. This can happen if you use an image as a logo or page background, and if the PDF-generating software is not optimized for this situation. This used to be the case with older versions of OpenOffice/LibreOffice (I tested the latest release of LibreOffice, v4.3.5.2, and it no longer does such stupid things).
It also happens if you concatenate PDF files with the help of pdftk. To show you the effect, and how you can discover it, let's look at a sample PDF file:
pdfinfo p1.pdf
Producer: libtiff / tiff2pdf - 20120922
CreationDate: Tue Jan 6 19:36:34 2015
ModDate: Tue Jan 6 19:36:34 2015
Tagged: no
UserProperties: no
Suspects: no
Form: none
JavaScript: no
Pages: 1
Encrypted: no
Page size: 595 x 842 pts (A4)
Page rot: 0
File size: 20983 bytes
Optimized: no
PDF version: 1.1
Recent versions of Poppler's pdfimages utility have added support for a -list parameter, which can list all images included in a PDF file:
pdfimages -list p1.pdf
page   num  type   width height color comp bpc  enc interp  object ID x-ppi y-ppi size ratio
--------------------------------------------------------------------------------------------
   1     0 image     423   600  rgb     3   8  jpeg   no         7  0    52    52 19.2K 2.6%
This sample PDF is a 1-page document, containing an image, which is compressed with JPEG-compression, has a width of 423 pixels and a height of 600 pixels and renders at a resolution of 52 PPI on the page.
If we concatenate 3 copies of this file with the help of pdftk like so:
pdftk p1.pdf p1.pdf p1.pdf cat output p3.pdf
then the result shows these image properties via pdfimages -list:
pdfimages -list p3.pdf
page   num  type   width height color comp bpc  enc interp  object ID x-ppi y-ppi size ratio
--------------------------------------------------------------------------------------------
   1     0 image     423   600  rgb     3   8  jpeg   no         4  0    52    52 19.2K 2.6%
   2     1 image     423   600  rgb     3   8  jpeg   no         8  0    52    52 19.2K 2.6%
   3     2 image     423   600  rgb     3   8  jpeg   no        12  0    52    52 19.2K 2.6%
This shows that there are 3 identical PDF objects (with the IDs 4, 8 and 12) which are embedded in p3.pdf now. p3.pdf consists of 3 pages:
pdfinfo p3.pdf | grep Pages:
Pages: 3
Optimize PDF by replacing duplicate images with references
Now we can apply the above mentioned optimization with the help of Ghostscript
gs -o p3-optim.pdf -sDEVICE=pdfwrite -dDetectDuplicateImages=true p3.pdf
Checking:
pdfimages -list p3-optim.pdf
page   num  type   width height color comp bpc  enc interp  object ID x-ppi y-ppi size ratio
--------------------------------------------------------------------------------------------
   1     0 image     423   600  rgb     3   8  jpeg   no        10  0    52    52 19.2K 2.6%
   2     1 image     423   600  rgb     3   8  jpeg   no        10  0    52    52 19.2K 2.6%
   3     2 image     423   600  rgb     3   8  jpeg   no        10  0    52    52 19.2K 2.6%
There is still one image listed per page -- but the PDF object ID is always the same now: 10.
ls -ltrh p1.pdf p3.pdf p3-optim.pdf
-rw-r--r--@ 1 kp staff 20K Jan 6 19:36 p1.pdf
-rw-r--r-- 1 kp staff 60K Jan 6 19:37 p3.pdf
-rw-r--r-- 1 kp staff 16K Jan 6 19:40 p3-optim.pdf
As you can see, the "dumb" concatenation made with pdftk tripled the file size (20K to 60K). The optimization by Ghostscript brought it down considerably, to 16K.
The most recent versions of Ghostscript may even apply the -dDetectDuplicateImages by default. (AFAIR, v9.02, which introduced it for the first time, didn't use it by default.)
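The before/after comparison above can be reduced to a single number: how many distinct image objects a PDF contains. This is a small sketch assuming the pdfimages -list layout shown above (two header lines, object number in the 11th column); the helper name count_image_objects is made up for illustration.

```shell
# Hypothetical helper: count distinct image object numbers in
# `pdfimages -list` output supplied on standard input.
count_image_objects() {
    # Skip the two header lines; field 11 is the PDF object number.
    awk 'NR > 2 && !seen[$11]++ { n++ } END { print n + 0 }'
}

# Example use (requires Poppler's pdfimages):
#   pdfimages -list p3.pdf       | count_image_objects
#   pdfimages -list p3-optim.pdf | count_image_objects
```

With the listings above, the first command would report 3 and the second 1.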
You can obtain good results by converting from PDF to Postscript, then back to PDF using
pdf2ps file.pdf file.ps
ps2pdf -dPDFSETTINGS=/ebook file.ps file-optimized.pdf
The value of the -dPDFSETTINGS argument defines the quality of the images in the resulting PDF. The presets are, from low to high quality: /screen, /ebook, /printer, /prepress; /default is a general-purpose preset that is not tuned for small file size. See http://milan.kupcevic.net/ghostscript-ps-pdf/ for a reference.
The Postscript file can become quite large, but the results are worth it. I went from a 60 MB PDF to a 140 MB Postscript file, but ended up with a 1.1 MB optimized PDF.
I use Ghostscript with the following options:
gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 -dPDFSETTINGS=/screen \
-dNOPAUSE -dQUIET -dBATCH -sOutputFile=output.pdf input.pdf
You may find that pdftocairo (from Poppler) can make smaller PDFs but beware that it will strip some features (such as hyperlinks) away.
This worked for me:
Convert your PDF to PS (this creates a large file):
pdf2ps large.pdf very_large.ps
Convert the new PS back to a PDF
ps2pdf very_large.ps small.pdf
Source:
https://pandemoniumillusion.wordpress.com/2008/05/07/compress-a-pdf-with-pdftk/
You will lose quality, but if that's not an issue then ImageMagick's convert may prove helpful:
convert original.pdf reduced.pdf
Note that it doesn't always work: I once converted a 126 MB file into a 14 MB one using this command, but another time it doubled the size of a 350 KB file.
Anyway it's worth giving it a try…
As mentioned in comments, of course there is no point in applying this command on a vector-based PDF, it will only be useful on rasterized images.
See also this post for related options.
Ghostscript comes with the ps2pdf14 utility, which can be used to optimise PDF files, but on some occasions the "optimised" file may be bigger than the original.
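Since the result is not guaranteed to be smaller, it is worth comparing sizes before replacing anything. This is a minimal sketch; the function name keep_smaller is hypothetical and it relies only on standard wc, mv and rm.

```shell
# Hypothetical helper: replace ORIGINAL with CANDIDATE only if CANDIDATE is smaller;
# otherwise discard CANDIDATE and leave ORIGINAL untouched.
keep_smaller() {
    orig_size=$(wc -c < "$1") || return 1
    new_size=$(wc -c < "$2") || return 1
    if [ "$new_size" -lt "$orig_size" ]; then
        mv -- "$2" "$1"
    else
        rm -- "$2"
    fi
}

# Example use (requires Ghostscript's ps2pdf14):
#   ps2pdf14 file.pdf file-optimised.pdf && keep_smaller file.pdf file-optimised.pdf
```

Note that this overwrites the original when the candidate wins, so keep a backup if the optimisation is lossy.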
For PDFs whose size is mainly due to embedded images (pdfimages -list is your friend), typically scanned documents, I would recommend ocrmypdf, which is quite good at optimizing, with an optional OCR layer as a bonus.

Imagemagick use-trimbox doesn't work at all

I'm desperately trying to convert a PDF to JPG with ImageMagick (the convert command) while preserving the trimbox.
I run following command (convert only first page).
convert -verbose -define pdf:use-trimbox=true "test_org.pdf[0]" cropped.jpg
Here is the output. It looks like ImageMagick doesn't pass the use-trimbox parameter to Ghostscript. Might that be the reason? At the moment the converted image is MediaBox size, not TrimBox size. The ImageMagick version is 6.0.7; Ghostscript is GPL Ghostscript 8.64.
convert: "gs" -q -dBATCH -dSAFER -dMaxBitmap=500000000 -dNOPAUSE -dAlignToPixels=0 "-sDEVICE=bmpsep8" -dTextAlphaBits=4 -dGraphicsAlphaBits=4 "-g652x935" "-r72x72" -dFirstPage=1 -dLastPage=1 "-sOutputFile=/tmp/magick-XXgTtZZG" "-f/tmp/magick-XXs4Kjq2" "-ftest_org.pdf".
/tmp/magick-XXgTtZZG[0] BMP 652x935 PseudoClass 256c 2.3mb 0.050u 0:01
/tmp/magick-XXgTtZZG[1] BMP 652x935 PseudoClass 256c 2.3mb 0.040u 0:01
/tmp/magick-XXgTtZZG[2] BMP 652x935 PseudoClass 256c 2.3mb 0.020u 0:01
/tmp/magick-XXgTtZZG[3] BMP 652x935 PseudoClass 256c 2.3mb 0.010u 0:01
test_org.pdf PDF 652x935 652x935+0+0 DirectClass 2.3mb 0.040u 0:01
test_org.pdf PDF 652x935 652x935+0+0 DirectClass 2.3mb 0.040u 0:01
test_org.pdf=>cropped.jpg PDF 652x935 652x935+0+0 DirectClass 202kb 0.120u 0:01
Your ImageMagick is possibly too old. It works fine in my case.
Version: ImageMagick 6.6.0-4 2010-11-16 Q16 http://www.imagemagick.org
-define pdf:use-trimbox=true makes convert invoke gs with -dUseTrimBox option, which I don't see in output provided by you. Consider updating ImageMagick.
Your ImageMagick 6.0.7 is more than 6 years old (dozens of releases back). Current is 6.7.0-9.
Your Ghostscript 8.64 is also more than 2 years old already (5 releases back). Current is 9.02.
My recommendation is to upgrade.
On my (Windows) system I have IM 6.7.0-8 and GS 9.02. Running -define pdf:use-trimbox=true works fine here and translates to a Ghostscript commandline parameter of -dUseTrimBox=true.
However (and this is important!): one should take into account that for many real-world PDFs out there, TrimBox is undefined, or explicitly set to the same values as MediaBox. Both have the same effect: -dUseTrimBox=true will not make any difference in the output compared to -dUseTrimBox=false.
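Whether the TrimBox actually differs from the MediaBox can be checked before blaming the conversion. The sketch below assumes Poppler's pdfinfo is available and that pdfinfo -box prints one "MediaBox:" and one "TrimBox:" line with four numbers each (the layout when no page range is given); the helper name trimbox_differs is hypothetical.

```shell
# Hypothetical helper: succeed (exit 0) only if the TrimBox and MediaBox lines
# in `pdfinfo -box` output on standard input carry different coordinates.
trimbox_differs() {
    awk '$1 == "MediaBox:" { m = $2 " " $3 " " $4 " " $5 }
         $1 == "TrimBox:"  { t = $2 " " $3 " " $4 " " $5 }
         END { exit !(m != "" && t != "" && m != t) }'
}

# Example use (requires Poppler's pdfinfo):
#   pdfinfo -box test_org.pdf | trimbox_differs && echo "TrimBox differs from MediaBox"
```

If nothing is printed, -dUseTrimBox=true would be expected to produce the same output as the default.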
