PostScript to PDF: scale to fit into A4 (Linux)

I need to create an A4 PDF file by scaling this 13.44 x 16.44 inch PostScript file to fit the page. I thought ps2pdf could help, but I cannot get the desired effect.
I use this command to create the PDF:
ps2pdf -dFIXEDMEDIA -dPDFFitPage -sPAPERSIZE=a4 ori.postscript salida.pdf
Note that I used -dFIXEDMEDIA and -dPDFFitPage to force the PostScript file to fit the A4 paper size, but they don't appear to work.
This is the original file (Edit: here's the original file as a download), and this is the resulting file. As you can see, the image isn't resized to fit, but just placed as is.

Firstly, the order of options in Ghostscript is important: they are applied in command-line order. So you would want to apply -sPAPERSIZE before -dFIXEDMEDIA, and both of those before -dPDFFitPage.
I'd also suggest that you use Ghostscript directly rather than using the ps2pdf script.
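If you drive Ghostscript from a script, the ordering rule above is easy to enforce by building the argument list in one place. A minimal Python sketch, with some assumptions: the executable is called gs (gswin64c on 64-bit Windows), the helper name build_fit_command is made up for illustration, and -dPSFitPage is the default fit switch because the input here is PostScript.

```python
def build_fit_command(src, dst, fit_switch="-dPSFitPage", exe="gs"):
    """Return a Ghostscript argument list with the media options in the
    order they should be applied: paper size, fixed media, then fit."""
    return [
        exe,
        "-sPAPERSIZE=a4",    # 1. select the target media size
        "-dFIXEDMEDIA",      # 2. stop the input from changing it
        fit_switch,          # 3. scale the requested size to the fixed one
        "-sDEVICE=pdfwrite",
        "-o", dst,
        src,
    ]
```

Usage would look like subprocess.run(build_fit_command("ori.postscript", "salida.pdf"), check=True). Passing a list rather than a shell string also avoids quoting problems with paths.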
If that still doesn't work for you, then you will need to provide an example file that shows the problem; I can't tell you anything by looking at pictures.
You should also state the operating system and version of Ghostscript being used.
EDIT
The problem is that your PostScript program doesn't request a media size; it simply draws on whatever media happens to be available at the time. Some programs will rescale their content to fit whatever media is currently available; this isn't one of them. Anything which doesn't lie on the current media is simply clipped off.
The 'FitPage' code relies on the PostScript program requesting a media size, which it then compares to the current (fixed) size. From that it works out how much to scale the content so that it fits into the new media.
If your program doesn't request a media size then there's no way for Ghostscript to know how much to scale it so it fits.
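As a rough illustration of that comparison (not Ghostscript's actual code, which also handles rotation), fitting a requested page onto fixed media comes down to taking the smaller of the width and height ratios, so that neither dimension overflows. In this sketch the function name is hypothetical, A4 is taken as 595 x 842 points (rounded), and centering the result is an assumption about the layout.

```python
def fit_scale(req_w, req_h, media_w=595.0, media_h=842.0):
    """Scale factor and centering offsets (all in points) that fit a
    req_w x req_h page onto media_w x media_h media without cropping."""
    # The smaller ratio is the one that keeps both dimensions on the media.
    scale = min(media_w / req_w, media_h / req_h)
    # Leftover space after scaling, split evenly on both sides.
    off_x = (media_w - req_w * scale) / 2
    off_y = (media_h - req_h * scale) / 2
    return scale, off_x, off_y
```

For the 968 x 1184 point page in the question (13.44 x 16.44 inches), the width ratio 595/968 ≈ 0.615 is the limiting one. Without a requested size there is no req_w/req_h to plug in, which is exactly the problem described above.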
Now, your program does have BoundingBox comments, but those are just comments, and a PostScript consumer will ignore them. You can use them yourself, though.
You can either modify the header of your PostScript program to pretend it's an EPS file instead of a PostScript program:
Change
%!PS-Adobe-2.0
To
%!PS-Adobe-2.0 EPSF-3.0
Then use -dEPSFitPage instead of -dPDFFitPage, and it will produce something like what (I think) you want. Note that PDFFitPage is for PDF input, so you shouldn't really be using it anyway. For PostScript input you want -dPSFitPage.
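Editing the header by hand is fine for one file; for several, a small script can do it. A sketch, assuming the first line is exactly %!PS-Adobe-2.0 as in the question; the function and file names are hypothetical. It works on bytes so that any binary data in the PostScript file passes through unchanged.

```python
from pathlib import Path

PS_HEADER = b"%!PS-Adobe-2.0"
EPS_HEADER = b"%!PS-Adobe-2.0 EPSF-3.0"


def mark_as_eps(data: bytes) -> bytes:
    """Replace a plain '%!PS-Adobe-2.0' first line with the EPS header;
    any other first line is left untouched."""
    first, sep, rest = data.partition(b"\n")
    if first.rstrip(b"\r") == PS_HEADER:
        # Keep the original line ending (\r\n or \n).
        cr = b"\r" if first.endswith(b"\r") else b""
        first = EPS_HEADER + cr
    return first + sep + rest


def convert_file(src, dst):
    """Write a copy of src to dst with the EPS header applied."""
    Path(dst).write_bytes(mark_as_eps(Path(src).read_bytes()))
```

For example, convert_file("ori.postscript", "ori.eps"), followed by a Ghostscript run on ori.eps with -dEPSFitPage.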
Alternatively, read the BoundingBox comments and apply a media size request and origin translation yourself.
This command:
gs -sPAPERSIZE=a4 -dFIXEDMEDIA -dPSFitPage -sDEVICE=pdfwrite -sOutputFile=\temp\out.pdf -c "<</PageSize [968 1184]>> setpagedevice -20 -50 translate" -f d:\temp\ori.eps
Produces the same output as treating the file as EPS would.
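The numbers in that command come from the BoundingBox comment: the PageSize is the box's width and height, and the translate moves the box's lower-left corner to the origin. The following sketch derives the -c string from the comment. It assumes a single numeric %%BoundingBox line, and the box 20 50 988 1234 used in the test is inferred from the command above rather than read from the actual file.

```python
import re

# Matches e.g. "%%BoundingBox: 20 50 988 1234"; "(atend)" is not matched.
BBOX_RE = re.compile(
    r"^%%BoundingBox:\s*(-?\d+)\s+(-?\d+)\s+(-?\d+)\s+(-?\d+)", re.MULTILINE
)


def page_setup_from_bbox(ps_text):
    """Build a PostScript prefix that requests a page the size of the
    BoundingBox and shifts the box's lower-left corner to the origin."""
    m = BBOX_RE.search(ps_text)
    if m is None:
        raise ValueError("no numeric %%BoundingBox comment found")
    llx, lly, urx, ury = map(int, m.groups())
    return (f"<</PageSize [{urx - llx} {ury - lly}]>> setpagedevice "
            f"{-llx} {-lly} translate")
```

The returned string goes between -c "..." and -f on the Ghostscript command line, together with -sPAPERSIZE=a4 -dFIXEDMEDIA -dPSFitPage.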

Related

GhostScript PS to PDF conversion - some parts cropped

I tried to convert a Python Tkinter canvas to PDF using Ghostscript. Here is the relevant code:
from subprocess import call
canvas.postscript(file="tmp.ps", colormode='color')
somecommand = "gswin64c -o output.pdf -sDEVICE=pdfwrite -g57750x62070 -dPDFFitPage tmp.ps"
call(somecommand, shell=True)
The output PDF has a very large page size, but the canvas appears cropped and sits in the bottom-left corner of the page.
I want to show complete canvas on pdf.

You've specified -dPDFFitPage, but your input file appears to be PostScript (judging by the '.ps' extension and your question title). PDFFitPage works with PDF input. Even -dPSFitPage, or the simpler -dFitPage, will only work if the input PostScript program requests a media size. If it doesn't, the interpreter can't tell what its bounding box is, and so cannot scale it to fit the media.
You've also specified a media size in pixels (-g57750x62070) which is entirely inappropriate when the input and output are vector formats. For what it's worth, you are specifying a fixed media size of (approximately) 80 inches by 86 inches, using the default resolution of 720 dpi.
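To check that arithmetic: -g gives the size in device pixels, so dividing by the resolution gives inches. A small sketch, taking 720 dpi as pdfwrite's default resolution, as stated above; the function name is hypothetical.

```python
def g_to_inches(width_px, height_px, dpi=720):
    """Convert a -gWIDTHxHEIGHT pixel size to inches at the given resolution."""
    return width_px / dpi, height_px / dpi


w_in, h_in = g_to_inches(57750, 62070)
print(f"{w_in:.1f} x {h_in:.1f} inches")  # 80.2 x 86.2 inches
```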
If all you want to do is turn a PostScript file into a PDF file then the simpler:
gs -sDEVICE=pdfwrite -o out.pdf input.ps
is sufficient.
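For the Tkinter case in the question, the same minimal conversion can be called from Python with an argument list instead of a shell string (so shell=True isn't needed), and without the -g and fit switches. This is a sketch; choosing gswin64c on Windows (as in the question) and gs elsewhere is an assumption about how Ghostscript is installed, and the function names are hypothetical.

```python
import platform
import subprocess


def ps_to_pdf_command(src, dst):
    """Minimal PostScript -> PDF conversion: no -g, no fit switches."""
    exe = "gswin64c" if platform.system() == "Windows" else "gs"
    return [exe, "-sDEVICE=pdfwrite", "-o", dst, src]


def ps_to_pdf(src, dst):
    """Run the conversion, raising CalledProcessError if Ghostscript fails."""
    subprocess.run(ps_to_pdf_command(src, dst), check=True)
```

Usage: after canvas.postscript(file="tmp.ps", colormode="color"), call ps_to_pdf("tmp.ps", "output.pdf"). This only converts the file; it does not rescale a canvas that draws off the page.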

Ghostscript : Crop Certain Area?

I am new to Ghostscript.
I have a PDF which contains a card, and I want to crop that card out.
With my current understanding of the documentation, I am only able to convert the PDF to an image; I have had no luck with cropping.
I've looked at every related question, but the answers are not working for me.
This is the code I use in a batch file to convert the PDF to an image:
"C:\Program Files\gs\gs9.50\bin\gswin64c.exe" -sDEVICE=png16m -r300 -o c:\users\jen\desktop\pdf.png -f "c:\users\jen\desktop\pdf.pdf"
pause
Now I don't know how to crop with it.
I want to crop at a certain position, like: Left: 28, Top: 524, Width: 492.3, Height: 161
EDIT
I will be using this in firebase functions.
Example PDF file: THE_PDF_TO_CROP. I want to cut out the blue area of the PDF as an image.

You need to set several parameters. Firstly, you need to specify the width and height of the output bitmap. You can use either -dDEVICEHEIGHTPOINTS and -dDEVICEWIDTHPOINTS, or alternatively you can specify the output size in pixels using -g<x>x<y>, where <x> and <y> are the number of pixels in the x and y directions. Obviously that will vary depending on the resolution, and you can't use fractional pixels.
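The pixel size for -g follows from the point size and the resolution, since PDF units are 1/72 inch. A sketch with a hypothetical function name; rounding up rather than to the nearest pixel is my assumption, chosen so the last partial pixel isn't clipped.

```python
import math


def points_to_pixels(width_pt, height_pt, dpi):
    """Pixel size for -g<x>x<y>: pixels = points * dpi / 72, rounded up
    because fractional pixels aren't possible."""
    return (math.ceil(width_pt * dpi / 72), math.ceil(height_pt * dpi / 72))


x, y = points_to_pixels(72, 144, 300)  # 1 x 2 inches at 300 dpi
print(f"-g{x}x{y}")  # -g300x600
```

With the question's Width 492.3 and Height 161 at -r300, this gives -g2052x671.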
If you use -dDEVICEWIDTHPOINTS and -dDEVICEHEIGHTPOINTS then you also need to set -dFIXEDMEDIA to tell the interpreter not to use the media size from the PDF file instead.
So that should create an output bitmap of the correct size. If you render your file with just that, you will see that it renders only a portion of the page, from the bottom left. So now you need to shift the content so that the portion you want lies at the bottom left of the media. You can do that with the PageOffset page device parameter, set via setpagedevice.
You haven't given any numbers, nor supplied an example file, so let's say (for the sake of example) that you want to render a 1 inch by 2 inch portion of the document. Let's further say that the part you want rendered starts 2.5 inches from the left edge and 1.5 inches from the bottom edge.
A suitable command line would be:
gs -dDEVICEWIDTHPOINTS=72 -dDEVICEHEIGHTPOINTS=144 -dFIXEDMEDIA -r300 -sDEVICE=png16m -o out.png -c "<</PageOffset [-180 -108]>> setpagedevice" -f input.pdf
Note that PDF (and PostScript) units are 1/72 inch so 72 = 1 inch, 144 = 2 inches. You need to shift the origin of the page down and left, which is why the values for PageOffset are negative.
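The same recipe can be wrapped in a helper that takes the region in points. A sketch, assuming the region's left and bottom are measured from the page's bottom-left corner, as PageOffset expects; the function name is hypothetical. If a value like the question's "Top: 524" is measured from the top of the page, the bottom would be page height - top - height, and the page height would have to be read from the PDF first.

```python
def crop_command(left, bottom, width, height, dpi, src, dst, exe="gs"):
    """Argument list that renders only the width x height region (points)
    whose lower-left corner is at (left, bottom) from the page's bottom-left."""
    # Negative offsets shift the page down and left, moving the region to the origin.
    offset = f"<</PageOffset [{-left:g} {-bottom:g}]>> setpagedevice"
    return [
        exe,
        f"-dDEVICEWIDTHPOINTS={width:g}",
        f"-dDEVICEHEIGHTPOINTS={height:g}",
        "-dFIXEDMEDIA",          # keep this size, ignore the PDF's media size
        f"-r{dpi}",
        "-sDEVICE=png16m",
        "-o", dst,
        "-c", offset,
        "-f", src,
    ]
```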
If that doesn't work for you I'll need to see your PDF file and you'll need to tell me which version of Ghostscript you are using.

Embed ICC color profile in PDF

I am generating a PDF where all the graphics are drawn in /DeviceRGB in the sRGB color space. I would like to convert the PDF to a different color profile using an ICC profile and embed that ICC profile, but I can't find a good tool to do this.
I have tried ImageMagick, but that rasterizes the PDF, which is undesirable. I have also tried Ghostscript, but while that converts the colors, it doesn't embed the ICC profile.
Is there any tool or library (preferably Java or Scala) available for Linux that does what I want?
The Ghostscript commands I have tried are:
gs -o cmyk.pdf -sColorConversionStrategy=CMYK -sDEVICE=pdfwrite \
-dOverrideICC=true -sOutputICCProfile=CoatedFOGRA27.icc \
-dRenderIntent=3 in.pdf
and
gs -dSAFER -dBATCH -dNOPAUSE -sDEVICE=pdfwrite -sColorConversionStrategy=CMYK \
-dProcessColorModel=/DeviceCMYK -sOutputICCProfile=CoatedFOGRA27.icc \
-sOutputFile=cmyk.pdf in.pdf
and several variations of the above. I have tried both Ghostscript version 9.10 and 9.16.

Use Ghostscript v9.16 or higher:
www.ghostscript.com/download/
Read its documentation about ICC color profile support, available here:
Ghostscript 9.15 Color Management (PDF)
Here's a possible command to convert the color space and embed the ICC profile:
gs -o cmyk-doc.pdf \
-sDEVICE=pdfwrite \
-dOverrideICC=true \
-sDefaultCMYKProfile=/path/to/mycmykprofile.icc \
-sOutputICCProfile=/path/to/mydeviceprofile.icc \
-dRenderIntent=3 \
-dDeviceGrayToK=true \
input-doc.pdf
(-dRenderIntent: possible values are 0 (Perceptual), 1 (Colorimetric), 2 (Saturation), and 3 (Absolute Colorimetric).)
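If you script this conversion, mapping intent names to the numbers listed above avoids passing the wrong value. A sketch that mirrors the command above; the dictionary, the function name and the profile paths are placeholders, not part of Ghostscript.

```python
# Names for the -dRenderIntent values listed above.
RENDER_INTENTS = {
    "perceptual": 0,
    "colorimetric": 1,
    "saturation": 2,
    "absolute colorimetric": 3,
}


def icc_convert_command(src, dst, cmyk_profile, output_profile,
                        intent="absolute colorimetric", exe="gs"):
    """Argument list mirroring the command above, with the render intent
    given by name instead of by number."""
    return [
        exe, "-o", dst,
        "-sDEVICE=pdfwrite",
        "-dOverrideICC=true",
        f"-sDefaultCMYKProfile={cmyk_profile}",
        f"-sOutputICCProfile={output_profile}",
        f"-dRenderIntent={RENDER_INTENTS[intent.lower()]}",
        "-dDeviceGrayToK=true",
        src,
    ]
```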
Caveats
If you view a PDF converted with the above command on screen (or on paper, when printed) and use a:
non-calibrated monitor/screen;
non-calibrated print device;
non-calibrated room illumination; or
PDF reader which cannot handle embedded ICC profiles, then
you may be disappointed. Using the wrong ICC profile or paper type that does not match the one expected by the output profile can also lead to issues.

AFAIU, Ghostscript 9.12-9.27 is unable to do what you expect.
But you might be able to partially achieve your goals:
Try UseDeviceIndependentColor. This won't embed your profile, and won't convert colors to your profile. But it would make your colors "colorimetrically defined" and would embed some ICC profile. If your aim is to "preserve" colors, that might work for you.
Try PDF/X-3 output, embed "Output Intent" icc profile.
Try to adjust the DefaultRGB colorspace - note the following phrase in the docs:
If a user needs an non trivial color adjustment, a non trivial DefaultRGB color space must be defined
(I've never tried this.)
Try collink. (I've never managed to make this work.)
A toy example
Original file:
The gs command:
gswin64c -dPDFX -dBATCH -dNOPAUSE -dHaveTransparency=false -r20 ^
-dProcessColorModel=/DeviceCMYK -sColorConversionStrategy=UseDeviceIndependentColor ^
-sDefaultRGBProfile="default_rgb.icc" -sOutputICCProfile="cmyk_des_renderintent.icc" ^
-dRenderIntent=1 -dDefaultRenderingIntent=/Perceptual -sDEVICE=pdfwrite ^
-sOutputFile=colorbar_v1.pdf PDFX_IntCmyk.ps Colorbar.pdf
The output looks like this in Adobe Acrobat (it honors embedded "Output Intent" icc profile):
Same file in Foxit Reader (it ignores embedded "Output Intent"):
What's happening here:
The cmyk_des_renderintent.icc profile, as documented in "Ghostscript 9.21 Color Management", is designed such that different intents output different colors:
"Perceptual" rendering intent (0) outputs cyan only,
"RelativeColorimetric" intent (1) outputs magenta only
"Saturation" rendering intent (2) outputs yellow only.
-dHaveTransparency=false makes sure that the second page gets rasterized (because it contains a TikZ picture with transparency).
-r20 makes the rasterization clearly visible (only 20 dpi).
-sOutputICCProfile="cmyk_des_renderintent.icc" -dRenderIntent=1 makes the rasterizer produce magenta output.
Note that the OutputICCProfile parameter is not mentioned in the current docs (the 9.27 docs are a bit outdated).
RenderIntent is also undocumented in this context. It, too, only affects rasterization.
-dDefaultRenderingIntent=/Perceptual writes that intent into the metadata, alongside the "Output Intent" ICC profile. This makes Acrobat draw everything in cyan.
-sDefaultRGBProfile="default_rgb.icc" is a placeholder for possible experiments with input ICC profiles. The same default is used if this parameter is omitted.
If you know that your input profile is sRGB (but it is not embedded, i.e. the PDF uses plain /DeviceRGB), it might be a good idea to specify the profile explicitly here, even though sRGB is the default.
I use a modified gs/lib/PDFX_def.ps from the Ghostscript repo, which embeds cmyk_des_renderintent.icc as the "Output Intent".
You can find all files used in this experiment here. There are several other experiments as well; I created them while trying to understand how color management works in gs. I hope they shed some light on the subject.
There's also a comparison with Adobe Acrobat "Convert Colors" tool. AFAIU, it does exactly what you expect.
When it comes to Color Management for pdf output, KenS (gs dev) usually says "the pdfwrite device goes to extreme lengths to maintain color specifications from the input unchanged in the output". It looks like they do not really focus on things like conversion from one profile to another in this case. Well... This is hardly "the most requested" feature.

Extracting Text from a PDF file with embedded font

I have a PDF file containing some tabular data.
http://dl.dropbox.com/u/44235928/sample_rotate-0.pdf
I have to extract the tabular data from it. I have tried the following, without success:
Selected the text and pasted it into Notepad/an Excel sheet. (I get junk characters.)
Used "Save as Text" in Acrobat Reader. It also gives junk characters instead of the actual text.
Tried the Apache PDFBox command-line utility to extract the text. It also gives junk characters instead of the real text.
Finally, I am trying an OCR solution: I convert the PDF file into .tif images using ImageMagick and process those images with Tesseract OCR.
The OCR solution is not very accurate, though (about 80% of words matched).
I tried changing the density and geometry of the image created from the PDF to get better results from Tesseract.
convert -rotate 90 -geometry 10000 -depth 8 -density 800 sample.pdf img_800_10000.tif;
tesseract img_800_10000.tif img_800_10000.tif nobatch letters;
I am not sure what kind of image (density, geometry, monochrome, sharpened edges, etc.) is best suited for OCR.
Please suggest the best parameters (density, geometry, depth, etc.) for generating images from a PDF file, so that Tesseract's accuracy increases.
I am open to other (non-OCR) solutions as well.

In this case I recommend NOT using ImageMagick for the PDF -> TIFF conversion. Use Ghostscript instead, for two reasons:
Using Ghostscript directly will give you more control over individual parameters of the conversion.
ImageMagick cannot do that particular conversion itself; it calls Ghostscript as its 'delegate' anyway, but won't give you the same fine-grained control that your own Ghostscript command will.
Most of the text in the table of your sample PDF is extremely small (I guess, only 4 or 5 pt high). This makes it rather difficult to run a successful OCR unless you increase the resolution considerably.
Ghostscript uses -r72 by default for image output formats (such as TIFF). Tesseract works best at 300 or 400 DPI, but only for font sizes of 10-12 pt or higher. Therefore, to compensate for the small text size, you should make Ghostscript use a resolution of at least 1200 DPI when it renders the PDF to the image.
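The 1200 DPI figure can be reasoned out: a glyph's height in pixels is its point size / 72 x resolution, so small text at a high resolution gets the same pixel height as normal text at Tesseract's preferred resolution. A sketch of that proportion, using the 10-12 pt and 300-400 DPI figures above; the function name is hypothetical, and treating pixel height as the only factor that matters to Tesseract is a simplifying assumption.

```python
def required_dpi(text_pt, ocr_dpi=300, ocr_text_pt=10):
    """Resolution at which text of size text_pt gets as many pixels as
    ocr_text_pt text gets at ocr_dpi.

    Pixel height is pt / 72 * dpi, so equal pixel height means
    text_pt * dpi == ocr_text_pt * ocr_dpi.
    """
    return ocr_dpi * ocr_text_pt / text_pt


print(required_dpi(4, 300, 10), required_dpi(4, 400, 12))  # 750.0 1200.0
```

For 4 pt text this gives 750 DPI (matching 10 pt at 300 DPI) up to 1200 DPI (matching 12 pt at 400 DPI), which is where the "at least 1200 DPI" suggestion lands.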
Also, you'll have to rotate the image so the text displays in the normal reading direction (not bottom -> top).
This is the command which I would try first:
gs \
-o sample.tif \
-sDEVICE=tiffg4 \
-r1200 \
-dAutoRotatePages=/PageByPage \
sample_rotate-0.pdf
You may need to play with variations of the -r1200 parameter (higher or lower) for best results.

Since a comment asked "How to define the geometry of an image when using Ghostscript as we do in convert?", here is an answer:
It does not make sense to define geometry (that is image dimensions) and resolution for a raster image created by Ghostscript at the same time.
Once you convert a vector-based page of a given size (such as PDF) into a raster image (such as the TIFF G4 format) at a desired resolution (as done in the other answer), you have already implicitly set the dimensions:
The original PDF dimension of your sample file sample_rotate-0.pdf is 1008x612 points.
At a resolution of 72 DPI (the default Ghostscript uses if no resolution is given, or -r72 if you give it explicitly) the image dimensions will be 1008x612 pixels.
At a resolution of 100 DPI (-r100 in the Ghostscript command) the image dimensions will be 1400x850 pixels.
At a resolution of 120 DPI (-r120 in the Ghostscript command) the image dimensions will be 1680x1020 pixels.
At a resolution of 720 DPI (-r720 in the Ghostscript command) the image dimensions will be 10080x6120 pixels.
At a resolution of 1000 DPI (-r1000 in the Ghostscript command) the image dimensions will be 14000x8500 pixels.
At a resolution of 1200 DPI (-r1200 in the Ghostscript command of my other answer) the image dimensions will be 16800x10200 pixels.
At a resolution of 1440 DPI (-r1440 in the Ghostscript command) the image dimensions will be 20160x12240 pixels.
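All of these follow from pixels = points x DPI / 72. A sketch that reproduces the list (the function name is hypothetical):

```python
def raster_size(width_pt, height_pt, dpi):
    """Pixel dimensions of a width_pt x height_pt page rendered at dpi."""
    return round(width_pt * dpi / 72), round(height_pt * dpi / 72)


# sample_rotate-0.pdf is 1008 x 612 points.
for dpi in (72, 100, 120, 720, 1000, 1200, 1440):
    w, h = raster_size(1008, 612, dpi)
    print(f"-r{dpi}: {w}x{h} pixels")
```

Because the pixel size is fully determined by the page size and the resolution, fixing one of -g or -r already fixes the other.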
If you absolutely insist on specifying the dimensions/geometry of the output image on the Ghostscript command line (rather than the resolution), you can do so by adding -gNNNNxMMMM -dPDFFitPage to the command line.

You can find the decoded content of your file here: https://docs.google.com/open?id=0B1YEM-11PerqSHpnb1RQcnJ4cFk
I'm absolutely sure OCR is the best way to read this PDF file, but you can try regex-ing the native content. It's going to be the hard, long way.

Convert image to indexed color with custom palette through console

I have image.png in truecolor, and
palette.png (N colors, where N>256) or a text file listing the RGB palette colors.
How can I get the image to use this palette?
If I use ImageMagick:
convert image.png -remap palette.png remap_image.png
It does not work.
convert image.png -map palette.png remap_image.png
It gives very bad quality: the image is very noisy, and the file size is bigger than before.
GIMP gives the best quality:
Convert image to indexed color > Use custom palette
But GIMP is a GUI. I need to convert a lot of images from the console, without running GIMP and X.org.

Using a shared palette across multiple images requires a carefully crafted palette. If you don't take great care when using the palette of a single image across many images, the result will be poor.
This needn't be complicated, though. If you have access to GIMP (or another tool) which supports truecolor graphics, you can create a large image, fit all of the smaller images into it, quantize that image to N colors, and then use its palette as the source.
You should be able to closely mimic GIMP's behavior in the console using ImageMagick.
Once you've got a truecolor image with all the colors you want to quantize:
# Create an 8-bit png from our source, with a 235-color palette as an example.
convert truecolor_source.png -colors 235 palette.png
# Create an 8-bit png from an arbitrary image and use the palette in palette.png
convert sample.png -map palette.png output.png
There are a number of options for reducing colors, such as dithering. See the ImageMagick v6 examples page for an excellent overview with example pictures and code.
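If your palette only exists as a text list of RGB values (as in the question), one way to get a palette image for -map/-remap is to write a tiny PPM file, which ImageMagick can read. A sketch; the "R G B per line" text format, the function names and the file names are assumptions.

```python
def palette_to_ppm(colors):
    """Return an ASCII PPM (P3) image, one pixel high, containing each
    (r, g, b) color once, for use as a palette image."""
    if not colors:
        raise ValueError("empty palette")
    lines = ["P3", f"{len(colors)} 1", "255"]
    for r, g, b in colors:
        for v in (r, g, b):
            if not 0 <= v <= 255:
                raise ValueError(f"channel value out of range: {v}")
        lines.append(f"{r} {g} {b}")
    return "\n".join(lines) + "\n"


def read_palette(path):
    """Parse a text file with one 'R G B' triple per line (assumed format);
    lines that don't have exactly three fields are skipped."""
    colors = []
    with open(path) as fh:
        for line in fh:
            parts = line.split()
            if len(parts) == 3:
                colors.append(tuple(int(p) for p in parts))
    return colors
```

For example, write palette_to_ppm(read_palette("palette.txt")) to palette.ppm, then run convert sample.png -map palette.ppm output.png.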
Although I still don't understand exactly what you want to do, your most recent comment ("Yes, from RGB to palette will set independently. Need set correct quantity of colors") suggests that all you want is to set a strict limit on the number of colors in a bunch of images, and that they don't need to share the same palette.
In that case, the solution is very simple:
convert sample.png -colors 135 output.png
Try playing with the quantization options if the result isn't to your satisfaction.
If the output image is too large for your liking, you can experiment with the -quality option.
If this still isn't satisfactory, please try to explain your goal in a more detailed manner.
Good luck!

cat photo.png | pngnq -s 1 > photoindexed.png

I tend to get good results with the "-remap" (single image) or "+remap" (multiple images) options in combination with "-colors". Read up on those options here. Note that "with -remap you provide IM with the final set of colors you want to use for the image, whether you plan to dither those colors, or just replace the ones with their nearest neighbours." In other words, simply remapping/replacing colors might not look good enough, as colors from the input image are just replaced by those from the palette image. Some form of dithering is necessary to distribute the color conversion errors across the output image, because not all colors in the input image have an exact match in the palette.
I'd suggest you use the "-colors N" option for that. It reduces the output image's color count to at most N. By default, ImageMagick implicitly uses "-dither Riemersma" when you specify "-colors N". There are also other dithering options available.
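To apply this to many images from the console, the commands can be generated in a loop. A sketch; the function name and output naming scheme are hypothetical, and -dither is placed before -remap because it is a setting that affects the operators after it. On ImageMagick 7 the executable is magick rather than convert.

```python
from pathlib import Path


def remap_commands(images, palette, dither="Riemersma", suffix="_indexed"):
    """One ImageMagick 'convert' argument list per image, remapping it to
    the colors in palette with the given dither method."""
    cmds = []
    for img in map(Path, images):
        out = img.with_name(f"{img.stem}{suffix}{img.suffix}")
        cmds.append(["convert", str(img), "-dither", dither,
                     "-remap", str(palette), str(out)])
    return cmds
```

Each list can then be run with subprocess.run(cmd, check=True), with no GUI or X server involved.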
