print a big, landscape PDF drawing to a3 LaserJet4 (portrait) using ghostscript 9.0 - linux

We want to print big drawings (up to A0 and some times longer) to A3 printers using ghostscript:
gs -o - -sDEVICE=pdfwrite -r1200x1200 -sPAPERSIZE=a3 -f
/S/tmp/SamplePDFnewStamp.pdf | gs -o resized.pcl -sDEVICE=ljet4
-g7012x4961 -dPDFFitPage -
I get A4 landscape on an A3 portrait Paper. I also tried to rotate:
gs -sOutputFile="-" -sDEVICE=pdfwrite -r1200x1200 -sPAPERSIZE=a3 -d
-dBATCH -dNOPAUSE -dAutoRotatePages=/None -dPDFFitPage -c "<</Orientation 1>> setpagedevice 90 rotate 0 -595 translate" -f
/S/tmp/SamplePDFnewStamp.pdf -c quit | gs -o resized.pcl
-sDEVICE=ljet4 -g7012x4961 -dPDFFitPage -
getting the same result.

Its not really possible to comment without seeing a PDF file, but a number of the command line options you are using there don't make sense in the combination you have.
The first thing I would do is stop piping the commands like that, at least while investigating the problem. Do it as 2 stages, that will allow you (and others) to look at the intermediate PDF file.
Secondly, I don't believe you can do what you seem to be trying to do. It looks like you are trying to pipe the PDF produced by the first invocation of gs through the second invocation. I don't see any way that will work, the pdfwrite device needs to seek around the file in order to create the xref table, it cannot use stdout, at least in the current version. What version of Ghostscript are you using ?
I also can't see the point of this, why take a PDF, make a new PDF from it, and then render the second PDF ? Why not just render the original ?
None of the media size switches you are specifying will have any effect, because you haven't told Ghostscript that the media size is fixed (using -dFIXEDMEDIA). As a result the PDF interpreter will set the media size to be the same as the MediaBox in the PDF file. Similar problems apply with sending PostScript and expecting it to alter the behaviour of Ghostscript when rendering a PDF file.
Setting the resolution for pdfwrite is not a good idea, and will in general have no effect. Even if it does have an effect, you probably don't want to set it to be the resolution of the device (and the -g values seem to suggest this is not a 1200 dpi device either). The only effect the resolution has is when objects have to be rendered to images because the can't be represented in PDF. You don't want to create images at the printer resolution, somewhere between one quarter and one half the resolution is usually sufficient.
If you'd care to share an example PDF file, I may be able to tell you how to solve your orientation problem. You will need to explain why you are running it through pdfwrite before going to PCL though, I can't see any reason for that.
This:
gs -sDEVICE=pdfwrite -sOutputFile=\temp\out.pdf -dDEVICEHEIGHTPOINTS=2386.08 -dDEVICEWIDTHPOINTS=1685.7600 -dFIXEDMEDIA -dPDFFitPage SamplePDFnewStamp.pdf
Will take your original PDF file and produce a PDF file rotated by 90 degrees. If I then do:
gs -sDEVICE=ljet4 -sOutputFile=\temp\out.pcl \temp\out.pdf
I get a PCL file that, when processed by GhostPDL with appropriate media size, seems to do what you want.
I haven't tried it, due to lack of an actual device to print on, but I would expect that:
gs -sDEVICE=ljet4 -sOutputFile=\temp\out.pcl -dDEVICEHEIGHTPOINTS=2386.08 -dDEVICEWIDTHPOINTS=1685.7600 -dFIXEDMEDIA -dPDFFitPage SamplePDFnewStamp.pdf
would produce the same file in one step.

I found a solution:
gs -q -sDEVICE=ljet4 -sOutputFile=out.pcl -dFIXEDMEDIA -dPDFFitPage -sPAPERSIZE=a3 \-c "<</Install {-1 -1 scale -843 -1192 translate}>> setpagedevice" -f SamplePDFnewStamp.pdf -c quit

Related

How minimze pdf filesize using post processing while keeping formfields and images?

I need to minimize a pdf file which will later be filled programmatically. Some of the "text" in it is really an image. Sadly I don't have the original file to fix that. So I can't just remove the background image.
My current attempt is:
gs -dShowAcroForm=false -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 -dPDFSETTINGS=/printer -dNOPAUSE -dQUIET -dBATCH -sColorConversionStrategy=Gray -dProcessColorModel=/DeviceGray -sOutputFile=OUTPUT_NAME INPUT_NAME
I included -dShowAcroForm=false because it was mentioned that this would keep the form fields. Sadly this didn't help and form fields are removed by this command.
The answer can use a different command and doesn't have to use ghostscript. Solutions where I can try out different dpi's would be better.
EDIT1
The two files in question are:
https://github.com/Aeolitus/Sephrasto/blob/master/src/Sephrasto/Data/Charakterb%C3%B6gen/Langer%20Charakterbogen.pdf
https://github.com/Aeolitus/Sephrasto/blob/master/src/Sephrasto/Data/Charakterb%C3%B6gen/Standard%20Charakterbogen.pdf
As shown in the ghostscript command above, minimal loss of dpi and color transformation to black & white would be acceptable for me in order to receive a smaller file size.
I tried using cpdf -squeeze. Form fields are kept correctly. On Langer Charakterbogen.pdf it saved 13% (0,5mb) and on Standard Charakterbogen.pdf it saved 6% (0,2mb). I would like to reduce it even more, so I accept lossy compression, too.

how to merge pdf as a table with pdftk or convert

How can one use convert or pdftk to merge several pdfs organized as a table?
For example, given 4 files: file1.pdf, file2.pdf, file3.pdf, file4.pdf, each of a single page, I would like to have a single-page pdf like
file1.pdf file2.pdf
file3.pdf file4.pdf
That is, the files are arranged like an array.
By far the easiest way to convert 4 PDF pages to 1 page on any OS is by N-Up imposition/printing with output to a virtual PDF printer such as Ghostscript. For the most basic 4-Up command line usage see https://stackoverflow.com/a/72850245/10802527
Thus to combine 4 pages (others such as 2 6 9 or 16 are possible) using here in a gui I can very easily set the order.
On Linux or MacOS you can use, along with other options, the CUPS command
lp -o number-up=4 filename
see https://www.cups.org/doc/options.html
The major advantage over using tools such as PDFtk with convert is that it resolves both scaling and preserving most PDF structures without degrading to inferior down-scaled imagery by NOT passing in and out of images before calling Ghostscript.
If you have single pdfs then you can merge before print using PDFtk (uses Ghostscript) instead of poppler pdfunite. Note that with either the Original PDF format is preserved.
If you want to convert to half size images and stitch them together, then reprint to one pdf page, then that can easily be done using imagemagik convert and other commands to call Ghostscript to suit your requirements direct. However, the results will in many ways be degraded by translation to image output.
Since all of the above pass through GS it makes sense, where possible, to install GS as a PDF printer driver.
If you want to avoid installing GhostScript printing then you can use cross platform Coherent cpdf (it only uses GS if the files need repairs)
Note these are "windows double quoted names" adjust as required and is based on the 4 sequential pages in one file are then to be placed 4 at a time on each new page, thus can be used with any multiple of pages in the input.pdf
cpdf -twoup "input.pdf" -o "in-2-Up-tmp.pdf"
cpdf "in-2-Up-tmp.pdf" -rotate 90 -o "out-2-Up.pdf"
cpdf -twoup "out-2-Up.pdf" -o "out-4-Up-tmp.pdf"
cpdf "out-4-Up-tmp.pdf" -rotate 90 -o "out-4-Up.pdf"

Ghoscript /cropbox not printing correctly in linux

I'm using the Domestic shipping label api in usps to generate domestic shipping labels in pdf format. I managed to crop the top section of the pdf file which is the label needed by the usps and Ignored the bottom section which is the receipt which is not needed in shipping.
I use Ghostscript /Cropbox to crop the section that I only want which is successful but when I try to print the cropped pdf file in linux cups I get the whole uncropped pdf printed instead of the cropped pdf file. Why is it still printing the whole file instead of just printing the cropped section?.
Here's the script I'm using to crop the usps Shipping label.
gs -o cropped.pdf -sDEVICE=pdfwrite -c "[/CropBox [50.4 460.5 484.4 750.5] /PAGES pdfmark" -f uncropped.pdf
Then to change its orientation to portrait i use pdftk
pdftk cropped.pdf cat 1L output cropped_portrait.pdf
To print it in linux cups I'm using the command.
lp cropped_portrait.pdf
But when i print it it is printing the uncropped.pdf file instead of cropped_portrait.pdf.
Why is it doing that? I even deleted uncropped.pdf and tried printing again but it still prints uncropped.pdf.
Here's the two files the uncropped and cropped usps shipping labels.
Uncropped PDF file
Cropped PDF file
Hope you can help me on this one,
Thank you
Presumably the reduced PDF file displays correctly, so there is no problem with Ghostscript producing the PDF file.
As to why the printing process doesn't respect the CropBox, there is no reason really why it should. There are many Boxes in PDF and no real way for a print application to know which one you want to use. As a result printing applications often default to the MediaBox, which you haven't altered (Note that altering the CropBox doesn't change the content of the PDF file, just what is displayed).
Now, if your CUPS chain is using Ghostscript to render the PDF file, or convert it to PostScript, then this can be solved, you need to add -dUseCropBox to the command line. However I'm not a CUPS expert so I can't tell you how to do that. If CUPS isn't using Ghostscript then its probably still possible to instruct whatever is doing the conversion to use the CropBox, but you're going to have to find out what application is involved and alter the command appropriately for that application.

ps2pdf giving blank page

I'm written a program on a legacy app (Progress 4GL on SCO Unix 5.0.7 - - I know, I know) to generate a postscript file line by line. If I open the .ps file in PDFCreator, it renders everything just like I want.
But, I need to get it into pdf format. When I use ps2pdf to do that by:
ps2pdf mypsfile.ps newpdffile.pdf
and open the pdf in PDFCreator or Acrobat, I get a single blank page (expected output is one page).
If I chop the .ps file down to something that yields a simple one line output, I still get a blank .pdf page. This .ps also renders fine in PDFCreator:
/Arial findfont
12 scalefont
setfont
175 700 moveto
(ABC Company) show
showpage
ps2pdf isn't displaying any errors. Can someone tell me what I need to add to the .ps code above to have it correctly convert to a PDF?
This PDF becomes overlay text on 'top of' another PDF using pdftk. When I have two "good" PDF's, that part works fine as well. It's just the converting ps to pdf that I'm stuck on.
Thanks,
David
That ought to work fine. First question is 'which version of Ghostscript are you using ?' Have you tried using Ghostscript directly instead of using the ps2pdf script ? Something like
gs -dBATCH -dNOPAUSE -sDEVICE=pdfwrite -sOutputFile=<output.pdf> <input.ps>
You haven't set a media size request, so its possible that GS is defaulting to Letter, and your text is simply off the top of the page. Try placing the text at 0,0 instead of 175, 700.

What are best parameters to run ImageMagick to convert low quality pdf to images (for OCR)

I have several low quality pdfs. I would like to use OCR -- to be more precise Ocropus to get text from them. To do use, I use first ImageMagick -- a command line tool to convert pdf to images -- to transforms these pdfs into jpg or png.
However ImageMagick produces very low quality images and Ocropus hardly recognizes anything. I would like to learn what are the best parameters for handling low quality pdfs to provide as-good-as-possible-quality images to OCR.
I have found this page, but I do not know where to start.
You can learn about the detailed settings ImageMagick's "delegates" (external programs IM uses, such as Ghostscript) by typing
convert -list delegate
(On my system that's a list of 32 different commands.) Now to see which commands are used to convert to PNG, use this:
convert -list delegate | findstr /i png
Ok, this was for Windows. You didn't say which OS you use. [*] If you are on Linux, try this:
convert -list delegate | grep -i png
You'll discover that IM does produce PNG only from PS or EPS input. So how does IM get (E)PS from your PDF? Easy:
convert -list delegate | findstr /i PDF
convert -list delegate | grep -i PDF
Ah! It uses Ghostscript to make a PDF => PS conversion, then uses Ghostscript again to make a PS => PNG conversion. Works, but isn't the most efficient way if you know that Ghostscript can do PDF => PNG in one go. And faster. And in much better quality.
About IM's handling of PDF conversion to images via the Ghostscript delegate you should know two things first and foremost:
By default, if you don't give an extra parameter, Ghostscript will output images with a 72dpi resolution. That's why Karl's answer suggested to add -density 600 which tells Ghostscript to use a 600 dpi resolution for its image output.
The detour of IM to call Ghostscript twice to convert first PDF => PS and then PS => PNG is a real blunder. Because you never win and harldy keep quality in the first step, but very often loose some. Reasons:
PDF can handle transparencies, which PostScript can not.
PDF can embed TrueType fonts, which Ghostscript can not. etc.pp.
Conversion in the direction PS => PDF is not that critical....)
That's why I'd suggest you convert your PDFs in one go to PNG (or JPEG) using Ghostscript directly. And use the most recent version 8.71 (soon to be released: 9.01) of Ghostscript! Here are example commands:
gswin32c.exe ^
-sDEVICE=pngalpha ^
-o output/page_%03d.png ^
-r600 ^
d:/path/to/your/input.pdf
(This is the commandline for Windows. On Linux, use gs instead of gswin32c.exe, and \ instead of ^.) This command expects to find an output subdirectory where it will store a separate file for each PDF page. To produce JPEGs of good quality, try
gs \
-sDEVICE=jpeg \
-o output/page_%03d.jpeg \
-r600 \
-dJPEGQ=95 \
/path/to/your/input.pdf
(Linux command version). This direct conversion avoids the intermediate PostScript format, which may have lost your TrueType font and transparency object's information that were in the original PDF file.
[*] D'oh! I missed to see your "linux" tag at first...
-density 600 or so should give you what you need.
At least two other tools you may want to consider:
pdfimages, which comes with the package poppler-utils, makes it easy to extract the images from a PDF without degrading them.
pdfsandwich, which can give you an OCR'd file by simply running pdfsandwich inputfile.pdf. You may need to tweak the options to get a decent result. See the official page for more info.

Resources