Embed ICC color profile in PDF - linux

I am generating a PDF where all the graphics are drawn in /DeviceRGB in the sRGB color space. I would like to convert the PDF to a different color space using an ICC profile and embed that profile, but I can't find a good tool to do this.
I have tried ImageMagick, but that rasterizes the PDF, which is undesirable. I have also tried Ghostscript, but while it converts the colors, it doesn't embed the ICC profile.
Is there any tool or library (preferably Java or Scala) available for Linux that does what I want?
The Ghostscript commands I have tried are:
gs -o cmyk.pdf -sColorConversionStrategy=CMYK -sDEVICE=pdfwrite \
-dOverrideICC=true -sOutputICCProfile=CoatedFOGRA27.icc \
-dRenderIntent=3 in.pdf
and
gs -dSAFER -dBATCH -dNOPAUSE -sDEVICE=pdfwrite -ColorConversionStrategy=CMYK \
-dProcessColorModel=/DeviceCMYK -sOutputICCProfile=CoatedFOGRA27.icc \
-sOutputFile=cmyk.pdf in.pdf
and several variations of the above. I have tried both Ghostscript version 9.10 and 9.16.

Use Ghostscript v9.16 or higher:
www.ghostscript.com/download/
Read its documentation about ICC color profile support, available here:
Ghostscript 9.15 Color Management (PDF)
Here's a possible command to convert the color space and embed the ICC profile:
gs -o cmyk-doc.pdf \
-sDEVICE=pdfwrite \
-dOverrideICC=true \
-sDefaultCMYKProfile=/path/to/mycmykprofile.icc \
-sOutputICCProfile=/path/to/mydeviceprofile.icc \
-dRenderIntent=3 \
-dDeviceGrayToK=true \
input-doc.pdf
(-dRenderIntent: possible arguments are 0 (Perceptual), 1 (Relative Colorimetric), 2 (Saturation), and 3 (Absolute Colorimetric).)
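If you want to drive this from a script, the command above can be assembled programmatically. The following is only a sketch in Python: the function name build_gs_args and the intent names are my own, and the profile paths are the same placeholders used in the command above. It builds the argument list and does not run Ghostscript.

```python
import shlex

# Rendering-intent numbers accepted by -dRenderIntent (0 to 3).
RENDER_INTENTS = {
    "perceptual": 0,
    "relative_colorimetric": 1,
    "saturation": 2,
    "absolute_colorimetric": 3,
}


def build_gs_args(input_pdf, output_pdf, cmyk_profile, device_profile,
                  intent="absolute_colorimetric"):
    """Return the argv list for the pdfwrite conversion shown above.

    build_gs_args is a hypothetical helper; only the switches themselves
    come from the command in the answer.
    """
    return [
        "gs",
        "-o", output_pdf,
        "-sDEVICE=pdfwrite",
        "-dOverrideICC=true",
        f"-sDefaultCMYKProfile={cmyk_profile}",
        f"-sOutputICCProfile={device_profile}",
        f"-dRenderIntent={RENDER_INTENTS[intent]}",
        "-dDeviceGrayToK=true",
        input_pdf,
    ]


args = build_gs_args(
    "input-doc.pdf",
    "cmyk-doc.pdf",
    "/path/to/mycmykprofile.icc",    # placeholder path
    "/path/to/mydeviceprofile.icc",  # placeholder path
)
# Print a copy-pasteable command line; pass `args` to subprocess.run()
# if you actually want to execute it.
print(shlex.join(args))
```

Keeping the arguments as a list (rather than one string) avoids quoting problems when profile paths contain spaces.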
Caveats
If you view a PDF converted with the above command on screen (or on paper, when printed) and use a:
non-calibrated monitor/screen;
non-calibrated print device;
non-calibrated room illumination; or
PDF reader which cannot handle embedded ICC profiles, then
you may be disappointed. Using the wrong ICC profile, or a paper type that does not match the one the output profile expects, can also lead to issues.

AFAIU, Ghostscript 9.12-9.27 is unable to do what you expect.
But you might be able to partially achieve your goals:
Try UseDeviceIndependentColor. This won't embed your profile, and won't convert colors to your profile. But it would make your colors "colorimetrically defined" and would embed some ICC profile. If your aim is to "preserve" colors, that might work for you.
Try PDF/X-3 output and embed an "Output Intent" ICC profile.
Try to adjust the DefaultRGB colorspace - note the following phrase in the docs:
If a user needs an non trivial color adjustment, a non trivial DefaultRGB color space must be defined
(I've never tried this.)
Try collink. (I've never managed to make this work.)
A toy example
Original file:
The gs command:
gswin64c -dPDFX -dBATCH -dNOPAUSE -dHaveTransparency=false -r20
-dProcessColorModel=/DeviceCMYK -sColorConversionStrategy=UseDeviceIndependentColor
-sDefaultRGBProfile="default_rgb.icc" -sOutputICCProfile="cmyk_des_renderintent.icc"
-dRenderIntent=1 -dDefaultRenderingIntent=/Perceptual -sDEVICE=pdfwrite
-sOutputFile=colorbar_v1.pdf PDFX_IntCmyk.ps Colorbar.pdf
The output looks like this in Adobe Acrobat (it honors embedded "Output Intent" icc profile):
Same file in Foxit Reader (it ignores embedded "Output Intent"):
What's happening here:
The cmyk_des_renderintent.icc
profile, as documented in "Ghostscript 9.21 Color Management",
is designed such that different intents output different colors:
"Perceptual" rendering intent (0) outputs cyan only,
"RelativeColorimetric" intent (1) outputs magenta only
"Saturation" rendering intent (2) outputs yellow only.
-dHaveTransparency=false makes sure that the 2nd page would get rasterized (due to the presence of a tikz pic with transparency)
-r20 makes sure rasterization would be clearly visible (due to just 20dpi)
-sOutputICCProfile="cmyk_des_renderintent.icc" -dRenderIntent=1 makes rasterizer produce magenta output.
Note that the OutputICCProfile parameter is not mentioned in the current docs (the 9.27 docs are a bit outdated).
RenderIntent is also undocumented in this context. Like OutputICCProfile, it only affects rasterization.
-dDefaultRenderingIntent=/Perceptual puts said intent to metadata, alongside "Output Intent icc profile". This makes Acrobat draw everything in cyan.
-sDefaultRGBProfile="default_rgb.icc" is a placeholder for possible experiments with input icc profiles. Same default is set if this parameter is omitted.
If you know that your input profile is sRGB (but it is not embedded, i.e. the PDF is plain /DeviceRGB), it might be a good idea to explicitly specify the profile here, even though sRGB is the default.
I use modified gs/lib/PDFX_def.ps
from the Ghostscript repo, which embeds cmyk_des_renderintent.icc as the "Output Intent".
You can find all files used in this experiment here.
There are several other experiments as well.
I've created them trying to understand how Color Management works in gs. I hope they shed some light on the subject.
There's also a comparison with Adobe Acrobat "Convert Colors" tool. AFAIU, it does exactly what you expect.
When it comes to Color Management for pdf output, KenS (gs dev) usually says "the pdfwrite device goes to extreme lengths to maintain color specifications from the input unchanged in the output". It looks like they do not really focus on things like conversion from one profile to another in this case. Well... This is hardly "the most requested" feature.

Related

Ghostscript : Crop Certain Area?

I am new to Ghostscript.
I have a PDF which contains a card, and I want to crop that card out.
So far, from my understanding of the documentation, I am only able to convert the PDF to an image, but I have had no luck with cropping.
I have looked at every related question, but the answers are not working for me.
This is the code I used in a batch file to convert the PDF to an image:
"C:\Program Files\gs\gs9.50\bin\gswin64c.exe" -sDEVICE=png16m -r300 -o c:\users\jen\desktop\pdf.png -f "c:\users\jen\desktop\pdf.pdf"
pause
Now I don't know how to crop with it as well.
I want to crop at a certain position, e.g. Left:28 Top:524 Width:492.3 Height:161
EDIT
I will be using this in firebase functions.
Example PDF file THE_PDF_TO_CROP. I want to cut out the blue area of the PDF as an image.
You need to set several parameters. Firstly, you need to specify the width and height of the output bitmap. You can use -dDEVICEWIDTHPOINTS and -dDEVICEHEIGHTPOINTS, or alternatively you can specify the output size in pixels using -g<x>x<y>, where <x> and <y> are the number of pixels in the x and y directions. That number will vary with the resolution, and you can't use fractional pixels.
If you use -dDEVICEWIDTHPOINTS and -dDEVICEHEIGHTPOINTS then you also need to set -dFIXEDMEDIA to tell the interpreter not to use the media size from the PDF file instead.
So that should create an output bitmap of the correct size. If you try rendering your file using just that, you will see that it renders just a portion of the page from the bottom left. So now you need to shift the content around so that the portion you want lies at the bottom left of the media. You can do that by setting PageOffset with the PostScript setpagedevice operator.
You haven't given any numbers, nor supplied an example file, so let's say (for the sake of example) that you want to render a 1 inch by 2 inch portion of the document. Let's further say that the part you want rendered starts 2.5 inches from the left edge and 1.5 inches from the bottom edge.
A suitable command line would be:
gs -dDEVICEWIDTHPOINTS=72 -dDEVICEHEIGHTPOINTS=144 -dFIXEDMEDIA -r300 -sDEVICE=png16m -o out.png -c "<</PageOffset [-180 -108]>> setpagedevice" -f input.pdf
Note that PDF (and PostScript) units are 1/72 inch so 72 = 1 inch, 144 = 2 inches. You need to shift the origin of the page down and left, which is why the values for PageOffset are negative.
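The arithmetic behind those numbers can be sketched in a few lines of Python. This is an illustration under assumptions, not a tested recipe: crop_args and pixel_geometry are names I made up, the rectangle is assumed to be given in points measured from the top-left corner (as the Left/Top/Width/Height numbers in the question suggest), and the page heights used (792 and 842 points) are guesses you would replace with the real page height.

```python
import math


def crop_args(left, top, width, height, page_height, dpi=300):
    """Ghostscript switches for rendering one rectangle of a page.

    All lengths are in points (1/72 inch). left/top are measured from the
    top-left corner of the page; page_height is the full page height.
    """
    # PDF/PostScript put the origin at the bottom-left, so convert the
    # top-based coordinate into the distance from the bottom edge.
    bottom = page_height - top - height
    return [
        # Media size in whole points, rounded up so nothing is clipped.
        f"-dDEVICEWIDTHPOINTS={math.ceil(width)}",
        f"-dDEVICEHEIGHTPOINTS={math.ceil(height)}",
        "-dFIXEDMEDIA",
        f"-r{dpi}",
        "-sDEVICE=png16m",
        # Negative offsets move the wanted region down and left, to the origin.
        "-c", f"<</PageOffset [{-left:g} {-bottom:g}]>> setpagedevice",
        "-f",
    ]


def pixel_geometry(width_pt, height_pt, dpi):
    """Value for the -g switch: the same area in whole pixels."""
    return f"-g{round(width_pt / 72 * dpi)}x{round(height_pt / 72 * dpi)}"


# The 1 in x 2 in example above, assuming a 792 pt (US Letter) page height.
print(crop_args(180, 540, 72, 144, page_height=792))
# The rectangle from the question, assuming an 842 pt (A4) page height.
print(crop_args(28, 524, 492.3, 161, page_height=842))
print(pixel_geometry(72, 144, 300))
```

Append the output switch and the input file name after the returned list (for example -o out.png before it and input.pdf after the -f).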
If that doesn't work for you I'll need to see your PDF file and you'll need to tell me which version of Ghostscript you are using.

Black color showing on CMY channels when converted to CMYK using GhostScript

I am generating an RGB PDF with a library called wkhtmltopdf. I am then using Ghostscript to convert it to CMYK; however, the black text in the PDF is not pure black [cmyk(0,0,0,1)].
The black color is visible in the other channels.
The command for ghostscript is:
gs -dBATCH -dNoOutputFonts -dNOPAUSE -dTextBlackPt=1 -dBlackPtComp=1 -sTextICCProfile -dNOCACHE -sDEVICE=pdfwrite -sProcessColorModel=DeviceCMYK -sColorConversionStrategy=CMYK -sOutputICCProfile=ps_cmyk.icc -sDefaultRGBProfile=srgb.icc -dOverrideICC=true -dRenderIntent=1 -sOutputFile=cmyk11.pdf test-rgb-cmyk.pdf
Any help would be massively appreciated! Been at this for a few days now. Thanks!
Ghostscript version: 9.26
Example pdf: https://drive.google.com/file/d/1nSM05b0O6fEb_0Z1rr2REbOPQAdwolTA/view?usp=drivesdk
Almost all the switches you are using will have no effect with the pdfwrite device; they are specific to rendering devices (bitmap output). In particular, -dTextBlackPt, -dBlackPtComp and -sTextICCProfile will do nothing.
In order to properly colour manage the conversion you need to specify input and output ICC profiles. If memory serves, you need to alter the default Gray, RGB and CMYK profiles that Ghostscript uses.
Really I'd need to see an example file (as simple as possible) and it would obviously be useful to know which version of Ghostscript you are using. If it's not the current version then I'd suggest you upgrade anyway.
Examples of Ghostscript commands to convert PDF from sRGB or eciRGB_v2 to eciCMYK_v2 (FOGRA 59) while keeping black plain (K only), not rich (CMYK)
I managed to convert an RGB PDF file to CMYK while keeping RGB black #000000 as plain K black, not rich CMYK black.
A DeviceLink ICC profile had to be created from the source profile, e.g. eciRGB_v2 (or sRGB, if your source is sRGB), to the appropriate CMYK profile using the collink tool (from the argyll package) with the "-f" option to hack the black colors.
Ghostscript is then called with a control file declaring use of the profile and its parameters.
Example of making the DeviceLink RGB to CMYK profile
collink -v -f eciRGB_v2.icc eciCMYK_v2.icc eciRGB_v2_to_eciCMYK_v2.icc
Example of a control file to map eciRGB_v2 to eciCMYK_v2 (control-eciRGB_v2.txt)
Image_RGB eciRGB_v2_to_eciCMYK_v2.icc 0 1 0
Graphic_RGB eciRGB_v2_to_eciCMYK_v2.icc 0 1 0
Text_RGB eciRGB_v2_to_eciCMYK_v2.icc 0 1 0
(Note: the fields must be separated by tabs, not spaces.)
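Because editors and copy-paste tend to turn tabs into spaces, it can be safer to generate the control file. Here is a minimal Python sketch; control_file_text and write_control_file are hypothetical helper names, and the three numeric fields are simply copied from the example lines above without interpreting them.

```python
def control_file_text(devicelink,
                      keys=("Image_RGB", "Graphic_RGB", "Text_RGB"),
                      fields=("0", "1", "0")):
    """Build the control file contents with real tab separators.

    devicelink is the DeviceLink profile file name; keys and fields are
    taken verbatim from the example control file above.
    """
    lines = ["\t".join((key, devicelink, *fields)) for key in keys]
    return "\n".join(lines) + "\n"


def write_control_file(path, devicelink):
    """Write the control file to disk (not called in this sketch)."""
    with open(path, "w", newline="\n") as handle:
        handle.write(control_file_text(devicelink))


text = control_file_text("eciRGB_v2_to_eciCMYK_v2.icc")
# repr() makes the tab characters visible as \t.
print(repr(text))
```

Calling write_control_file("control-eciRGB_v2.txt", "eciRGB_v2_to_eciCMYK_v2.icc") would then produce the file named in the heading above.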
Sample ghostscript command to do the actual conversion
gs -o 2-output-cmyk-from-eciRGB.pdf \
-sDEVICE=pdfwrite \
-sColorConversionStrategy=CMYK \
-sSourceObjectICC=control-eciRGB_v2.txt \
1-input-rgb.pdf

Postscript to PDF scale to fit into A4

I need to create an A4 PDF file by scaling this 13.44x16.44 inch PostScript file to fit the page. I thought ps2pdf could help me but I cannot get the desired effect.
I use this command to create the PDF:
ps2pdf -dFIXEDMEDIA -dPDFFitPage -sPAPERSIZE=a4 ori.postscript salida.pdf
Please note I used -dFIXEDMEDIA and -dPDFFitPage to force fit the Postscript file into the A4 paper size, but those apparently aren't working.
This is the original file:
Edit: Here's the original file
And this is the resulting file. As you can see, the image isn't resized to fit, but just placed as is:
Firstly, the order of operands in Ghostscript is important: they are applied in command-line order. So you would want to apply -sPAPERSIZE before -dFIXEDMEDIA, and both of those before -dPDFFitPage.
I'd also suggest that you use Ghostscript directly rather than using the ps2pdf script.
If that still doesn't work for you, then you will need to provide an example file to show the problem, I can't tell you anything by looking at pictures.
You should also state the operating system and version of Ghostscript being used.
EDIT
The problem is that your PostScript program doesn't request a media size, it simply draws on whatever media happens to be available at the time. Some programs will rescale their content to fit whatever media is currently available, this isn't one of them. Anything which doesn't lie on the current media is allowed to be clipped off.
The 'FitPage' code relies on the PostScript program requesting a media size, which it then compares to the current (fixed) size. From that it works out how much to scale the content so that it fits into the new media.
If your program doesn't request a media size then there's no way for Ghostscript to know how much to scale it so it fits.
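To make the dependency on a requested media size concrete, here is roughly what a fit-to-page calculation looks like, sketched in Python. This is not Ghostscript's actual code; fit_scale is a name I made up, rotation and centring are ignored, and the requested size is the 13.44 x 16.44 inch page from the question rounded up to whole points.

```python
def fit_scale(requested, media):
    """Uniform scale factor that fits a requested page into fixed media.

    Both arguments are (width, height) tuples in points. Without a
    requested size there is nothing to divide by, which is why a program
    that never requests media cannot be scaled this way.
    """
    req_w, req_h = requested
    med_w, med_h = media
    # Use the smaller ratio so both dimensions fit on the media.
    return min(med_w / req_w, med_h / req_h)


A4 = (595, 842)          # A4 in points
requested = (968, 1184)  # 13.44 in x 16.44 in, rounded up to whole points
scale = fit_scale(requested, A4)
print(round(scale, 3))   # 0.615
```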
Now your program does have BoundingBox comments, but those are just comments; a PostScript consumer will ignore them. But you can use them.
You can either modify the header of your PostScript program to pretend it's an EPS instead of a PostScript program:
Change
%!PS-Adobe-2.0
To
%!PS-Adobe-2.0 EPSF-3.0
and then use -dEPSFitPage instead of -dPDFFitPage; it will then produce something like what (I think) you want. Note that PDFFitPage is for PDF input, so you shouldn't really be using it anyway. For PostScript input you want -dPSFitPage.
Alternatively, read the BoundingBox comments and apply a media size request and origin translation yourself.
This command:
gs -sPAPERSIZE=a4 -dFIXEDMEDIA -dPSFitPage -sDEVICE=pdfwrite -sOutputFile=\temp\out.pdf -c "<</PageSize [968 1184]>> setpagedevice -20 -50 translate" -f d:\temp\ori.eps
Produces the same output as treating the file as EPS would.
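Reading the BoundingBox comment and deriving the media request can also be automated. The Python sketch below is illustrative only: media_request_from_bbox is a hypothetical helper, and the sample BoundingBox values (20 50 988 1234) are my guess at numbers consistent with the PageSize and translate used in the command above, not values read from the actual file.

```python
import re


def media_request_from_bbox(ps_text):
    """Turn a %%BoundingBox comment into PostScript for the -c switch.

    Returns a string that requests a page the size of the bounding box
    and translates the origin to its lower-left corner.
    """
    match = re.search(
        r"^%%BoundingBox:\s*(-?\d+)\s+(-?\d+)\s+(-?\d+)\s+(-?\d+)",
        ps_text,
        re.MULTILINE,
    )
    if match is None:
        # Also covers "%%BoundingBox: (atend)", which has no numbers here.
        raise ValueError("no %%BoundingBox comment with four integers found")
    llx, lly, urx, ury = map(int, match.groups())
    width, height = urx - llx, ury - lly
    return f"<</PageSize [{width} {height}]>> setpagedevice {-llx} {-lly} translate"


# Hypothetical header; in practice, read the first lines of the .ps file.
sample = "%!PS-Adobe-2.0\n%%BoundingBox: 20 50 988 1234\n"
print(media_request_from_bbox(sample))
```

The returned string is what goes between -c "..." and -f on the Ghostscript command line.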

Method to remove color artifacts on stills from DV tapes

I'm trying to use optical character recognition (OCR) to read text printed on digital video (DV) tapes. I'm using cropped still frames from the video for the OCR process. The text is white, but there are color artifacts (maybe composite color artifacts) so that the white text has color bleeding onto it (see example below). The colors look to be in magenta-cyan-yellow colorspace, maybe?
OCR results would likely be improved if I could remove/filter those colors to leave only white on the text. Then I can create a binary black/white image. I can do this now, but I suspect results will improve if I can remove colors from the white text before OCR, and this will hopefully help separate the white text from the background image.
Are there any ways, using Imagemagick preferably, to filter out those colors from the white text? I'm not sure of the best way to approach this since there are multiple colors bleeding, and the background changes in each frame. Currently using Imagemagick version 6.9.2-3 Q16 x64 on Windows 7.
Sample full-frame image:
Sample of cropped region with text (note color-bleed and white text blending into background):
I would suggest leveraging ImageMagick's FX & Morphology Dilate to preprocess the image. But to be honest, it'll take a bit of trial & error to find the solution that would work for you. I would also recommend that whatever solution you develop allows graceful error handling (i.e. If attempted OCR process unsuccessful, emit warning, and progress video to next I-frame & repeat.)
Fx Preprocessing
The -fx operator lets you apply a user-defined mathematical expression. A quick Google search for chroma keying and other tolerance methods might be helpful. But for many OCR techniques, it's common to first reduce the colors to a uniform gray scale.
convert aaA7b.png -fx 'intensity' intensity.png
Morphology Preprocessing
Morphology allows common & custom kernels to alter surrounding pixels. As video scanlines + other artifacts are distorting the text, I would recommend exploring Dilate, but there are many other techniques listed in the Usage documents.
Diamond
convert aaA7b.png -fx 'intensity' \
-morphology Dilate Diamond:1 diamond.png
Square
convert aaA7b.png -fx 'intensity' \
-morphology Dilate Square:1 square.png
Plus
convert aaA7b.png -fx 'intensity' \
-morphology Dilate Plus:1 plus.png
Custom
And if you need something more exact, create your own kernel by supplying it in the format "size: row1 row2 ... rowN". In this example, I'm creating a 3x3 kernel with a single vertical line to offset the video scanlines.
convert aaA7b.png -fx 'intensity' \
-morphology Dilate \
'3x3: nan,1,nan nan,1,nan nan,1,nan' user_defined.png
But YMMV. Also take a look at Fred's TextCleaner script. The -deskew & -sharpen operators will help reduce the noise.
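To see why a vertical-line kernel helps with scanlines, here is a tiny pure-Python sketch of what a grayscale dilation with that custom kernel does, as I understand it: each pixel becomes the maximum of itself and the pixels directly above and below. It is an illustration of the idea on a toy image, not ImageMagick's implementation, and dilate_vertical is a made-up name.

```python
def dilate_vertical(image):
    """Grayscale dilation with a 3x3 kernel whose only active cells are
    the middle column (the 'nan,1,nan' rows of the custom kernel).

    image is a list of rows of gray values; a new image is returned.
    """
    rows = len(image)
    result = []
    for y, row in enumerate(image):
        new_row = []
        for x, value in enumerate(row):
            neighbours = [value]
            if y > 0:
                neighbours.append(image[y - 1][x])  # pixel above
            if y < rows - 1:
                neighbours.append(image[y + 1][x])  # pixel below
            # Dilation keeps the brightest value in the neighbourhood.
            new_row.append(max(neighbours))
        result.append(new_row)
    return result


# A white vertical stroke with every other scanline missing.
frame = [
    [0, 255, 0],
    [0, 0, 0],
    [0, 255, 0],
    [0, 0, 0],
]
for row in dilate_vertical(frame):
    print(row)
```

In this toy frame the dark gaps between the bright rows are filled in, so the stroke becomes continuous while its width stays the same.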
Sample of cropped region with text (note color-bleed and white text blending into background):
I think there's a saying "You can't make steak from a hamburger," or something like that. At some point the background will wash out the text in the foreground, and your time is better spent on a solution that acknowledges this.

Generating images using MetaUML

How do you generate images(*.svg, *.gif, *.png ...) with MetaUML?
MetaUML is built on top of MetaPost, which is a tool used with LaTeX to typeset documents. So the output of MetaUML is meant as graphics for LaTeX documents, in a PostScript-based vector graphics format similar to EPS. You can find more on the MetaPost wiki page.
However you can convert the resulting file, just google "metapost to svg" for example, first hit is this tutorial: http://www.tlhiv.org/MetaPost/tools/mptosvg/

Resources