Convert PDF to image with high resolution to fit in page - resolution

I regularly get tree-drilling-data out of a machine that should get into reports.
The pdf-s contain too much empty space and useless information.
With convert i already managed to convert the pdf to png, cut out parts and rebuild an image i desire. It has a fine sharpness, its just too large:
Output 1: Nice, just too large
For my reports i need it in 45% size of that, or 660 pixels wide.
The best output i managed up to now is this:
Output 2: Perfect size but unsharp
Now, this is far away in quality from the picture before shrinking.
For sure, i've read this article here, that already helped.
But i think it must be possible to get an image as fine as the too large one in Output 1.
I've tried around for hours with convert -scale, -resize, -resample, playing around with values for density, sharpen, unsharpen, quality... nothing better than what i've got, using
convert -density 140 -trim input.pdf -quality 100 -sharpen 0x1.0 step1.png
then processing it to the new picture (output1, see up), that i'm putting to the correct size with
convert output1.png -resize 668x289! -unsharp 0x0.75+0.75+0.01 output2.png
I tried also "resize 668x" in order not to maybe disturb, no difference.
I find i am helpless in the end.
I am not an IT-expert, i am a computer-affin tree-consultant.
My understanding of image-processing is limited.
Maybe it would make sense to stay on a vector-based format (i tried .gif and .svg ... brrrr).
I would prefer to stay with convert/imagemagick and not to install additional software.
It has to run from command-line, as it is part of a bash-script processing multiple files. I am using Suse Linux.
Grateful for your help!

I realize you said no other software, but it can be easier to get good results from other PDF rendering engines.
ImageMagick renders PDFs by shelling out to ghostscript. This is terrific software, but it's designed for print rather than screen output. As a result, it generates very hard edges, because that's what you need if you are intending to control ink on paper. The tricks you see for rendering PDF at higher res and then resizing them fix this, but it can be tricky to get the parameters just right (as you know).
There are PDF rendering libraries which target screen output and will produce nice edges immediately. You don't need to render at high res and sample down, they just render correctly for screen in the first place. This makes them easier to use (obviously!) and a lot faster.
For example, vipsthumbnail comes with suse and includes a direct PDF rendering system. Install with:
zypper install vips-tools
Regarding the size, your 660 pixels across is too low. Some characters in your PDF will come out at only 3 or 4 pixels across and you simply can't make them sharp, there are just too few dots.
Instead, think about the size you want them printed on the paper, and the level of detail you need. The number of pixels across sets the detail, and the resolution controls the physical size of those dots when you print.
I would at least double that 668. Try:
vipsthumbnail P3_M002.pdf --size 1336 -o x.png
With your sample image I get:
Now when you print, you want those 1336 pixels to fill 17cm of paper. libvips lets you set resolution in pixels per millimetre, so you need 1336 pixels in 170 mm, or 1336 / 170, or 7.86. Try:
vips.exe copy x.png y.png[palette] --xres 7.86 --yres 7.86
Now y.png should load into librecalc at 17cm across and be nice and sharp when printed. The [palette] option after y.png enables palettised PNG, which shrinks the image to around 50kb.
The resolution setting is also called DPI (dots per inch). I find the name confusing myself -- you'll also see it called "pixels per printed inch", which I think is a much clearer.

In Imagemagick, set a higher density, then trim, then resize, then unsharpened. The higher the density, the sharper your result, but the slower it will get. Note that PNG quality of 100 is not the proper scale. It does not have quality values corresponding to 0 to 100 as in JPG. See https://imagemagick.org/script/command-line-options.php#quality. I cannot tell you the "best" numbers to use as it is image dependent. You can use some other tool such as at https://imagemagick.org/Usage/formats/#png_non-im to optimize your PNG output.
So try,
convert -density 300 input.pdf -trim +repage -resize 668x289 -unsharp 0x0.75+0.75+0.01 output.png
Or remove the -unsharp if you find that it is not needed.
ADDITION
Here is what I get with
convert -density 1200 P3_M002.pdf -alpha off -resize 660x -brightness-contrast -35,35 P3_M002.png
I am not sure why the graph itself lost brightness and contrast. (I suspect it is due to an imbedded image for the graph). So I added -brightness-contrast to bring out the detail. But it made the background slightly gray. You can try reducing those values. You may not need it quite so strong.

Great, #fmw42,
pngcrush -res 213 graphc.png done.png
from your link did the job, as to be seen here:
perfect size and sharp graph
Thank you a lot.
Now i'll try to get file-size down, as the Original pdf has 95 KiB an d now i am on 350 KiB. So, with 10 or more graphs in a document it would be maybe unnecessary large, also working on the ducument might get slow.
-- Addition -- 2023-02-04
#fmw42 : Thanks for all your effort!
Your solution with the .pdf you show does not really work - too gray for a good report, also not the required sharpness.
#jcupitt : Also thanks, vips is quick and looks interesting. vipsthumbnails' outcome ist unsharp, i tried around a bit but the docu is too abstract for me to get syntax-correct use. I could not find a dilettant-readable docu, maybe you know one?
General: With all my beginners-trials up to now i find:
the pdf contains all information to produce a large, absolutely sharp output (vector-typic, i guess)
it is no problem to convert to a png of same size without losing quality
any solutions of shrinking the png in size then result in significant (a) quality-loss or (b) file-size increase.
So, i (beginner) think that the pdf should be processed directly to the correct png-size, without later downsampling the png.
This could be done
(a) telling the conversion-process the output-size (if there is a possibility for this?) or
(b) first creating a smaller pdf, like letting it look A5 instead of A4, so a fitting .png is directly created (i need 6.5 inches wide approx.).
For both solutions i miss ability to sensefully investigate, for it takes me hours and hours to try out things and learn about the mysteries of image-processing.
The solution with pngcrush works for the moment, although i'm not really happy about the file-size (cpu and fan-power are not really important factors here).
--- Addition II --- final one 2023-02-05
convert -density 140 -trim "$datei" -sharpen 0x1.0 rgp-kopie0.png
magick rgp-kopie0.png +dither PNG8:rgp-kopie.png ## less colours
## some convert -crop and -composite here to arrange new image
pngcrush -s -res 213 graphc.png "$namenr.png"
New image is as this, with around 50 KiB, definitely satisfying for me in quality and filesize.
I thank you all a lot for contributing, this makes my work easier from now on!
... and even if i do not completely understand everything, i learnt a bit.

Related

Compress PDF after manipulation

I have the following problem:
I am receiving various scanned PDF files from a Kyocera Scanner Device.
I have to automatically manipulate these PDF Files in order to:
Delete the colors from textmarkers
Convert the PDF to grayscale
Put it in our DMS
I am using a Bash-Script to do the job.
For deleting the textmarker colors and converting to grayscale I use Imagemagick:
convert -density 150 INPUT.pdf \
-channel rgba \
-alpha set \
-fuzz 15% \
-fill white \
-opaque 'rgb(255,200,195)' \
-opaque 'rgb(255,253,177)' \
-opaque 'rgb(255,155,240)' \
-opaque 'rgb(255,91,193)' \
-colorspace gray OUTPUT-convert.pdf
The resulting image is quite good, BUT the size of the PDF is huge:
Original: 365K
Converted: 1.358K
So I've found a ghostscript command to do the job and reduce the file size:
gs -dSAFER -dBATCH -dNOPAUSE -dNOCACHE -dCompatibilityLevel=1.4 \
-sDEVICE=pdfwrite \
-sColorConversionStrategy=/LeaveColorUnchanged \
-dPDFSETTINGS=/ebook \
-sOutputFile=OUTPUT-ghostscript.pdf OUTPUT-convert.pdf
Now the file sizes are:
Original: 365K
Converted: 1.358K (OUTPUT-convert.pdf)
Ghostscript: 500K (OUTPUT-ghostscript.pdf)
I can't figure out why the size of the PDF after manipulation, from color to grayscale, is higher then the original document. The density (150 dpi) is the resolution of the original document.
When I put the converted PDF (1.358K) through Adobe Acrobat on Windows and recreate the PDF, the size is 213K. I have no loss in quality. How can I achieve this under linux with a bash script?
Any help is appreciated!
Here is a link for example PDF Files:
http://62.75.158.162/download/yKLu3fkbLy7MgkczDrKdG6osHdXh3jvy/
Its not really possible to comment very much without seeing an example file, to determine exactly what has happened at each stage.
However, I very strongly suspect that you have 'lost quality', its just that, at screen resolutions, you can't tell. Your original PDF file was created using ImageMagick at a resolution of 150 dpi. Most probably the image is stored uncompressed in the PDF file, which is why its large.
When you run that PDF file back through Ghostscript there are two effects. Firstly you've used the PDFSETTINGS canned set of job configuration. That (amongst many other things) downsamples grey images to a resolution of 150 dpi (so fortunately for you, no effect). It also compresses the image data using JPEG compression.
Now I've no idea what's in the original PDF file, but if the data there was compressed using JPEG, as seems likely, then you are double applying JPEG quantisation. That's a lossy process and will result in a loss of quality.
Since you are altering the original image data (to change the colour) you have no choice about decompressing the image data. However, to preserve quality you should then not use JPEG compression again, instead you should use Flate compression. The compression ration won't be as good, but it will keep the quality unchanged. To do that you would need to specify the GrayImageFilter using distillerparams, you can't use a PDFSETTINGS for that.
I can't imagine what Acrobat has done to decrease the file size still further (and you haven't said how you 'recreate the PDF file'), but I would imagine it involves reducing the quality of the image still further. Its hard to see how it could save 50% of the file size without doing so. Its also possible it is (like Ghostscript) JPEG compressing the grayscale data but using a more aggressive set of JPEG parameters (resulting in still more loss of quality, of course).
If you posted examples of the original, Ghostscript output, and Acrobat output I might be able to tell you more, but not from this.
For what its worth, there's a new feature in Ghostscript (requires version 9.23 or better) which allows you to create a PDF file which consists only of an image, and choose the colour model. You could run the original PDF file through Ghostscript using something like:
gs -sDEVICE=pdfimage8 -r150 -sOutputFile=gs.pdf
which would produce a pretty minimal PDF file where the original input has been rendered to a gray scale image (at 150 dpi), and that image wrapped up as a PDF file. I've no idea if that might work better for you.
Later EDIT
Yep, its pretty much what I expected.
The original file has what appears to be marked JPEG compression artefacts (all the rectangular 'speckles' round the text). Obviously without seeing the original document I can't tell whether this is because the original document was a JPEG printed to paper, or whether the artefacts were introduced by the scanner, or (more likely) whatever application converted the scanned image into a PDF. Checking the image stored in the PDF file I see that it is indeed a JPEG image.
Nevertheless, the original image is (in my opinion) really very noisy.
Now the output from 'convert' is arguably slightly better (in terms of legibility) than the original. I presume this is 'something' to do with your convert command line, can't be sure. The image in this case is not a JPEG, its compressed with RunLength encoding which is of course lossless. Its also less efficient as a compression method, so the image is bigger. For reasons best known to ImageMagick it also applies a soft mask to the image data. So that's two images per page now instead of just 1. Not too surprising that its larger than the original!
I suspect that the soft mask is due to your command line including RGBA. I assume that produces an alpha channel, and PDF doens't support simple alpha channel blending, its own transparency model is much more sophisticated. So I sort of suspect you are actually making the output file here larger than it needs to be. I'm afraid I can't help you with ImageMagick, I don't know anything about it, but getting rid of that second image would help a great deal.
Note that both your original file and the output from ImageMagick are essentially uncompressed (in terms of the PDF file 'structure').
Then we come to the Ghostscript produced PDF. The 'structure' of the PDF file is itself compressed, giving small size benefits. The images are all JPEG compressed, giving additional compression, but at the cost of quality. Applying JPEG quantisation multiple times always costs quality. By simply comparing the output from 'convert' with the output from Ghostscript I can easily see the degradation in quality.
Now we come to the Acrobat output. Ccomparing it with the other files it shows the worst quality. The JPEG artefacts are very clearly visible in the displayed image. In this case both the image and the soft mask have been compressed with the JPEG2000 compression scheme, which is a 'better' compression than JPEG. However, it looks like applying it to data which has already been quantised for JPEG yields pretty poor quality results. Or at least, applying it to a soft-masked JPEG image does :-)
The main problem with JPEG2000 is that it is patent encumbered. While decoders can be written royalty-free, to write an encoder you must licence the patented technology from the (many) patent holders, an expensive process.
So the AGPL version of Ghostscript does not include a JPEG2000 decoder, and as such cannot write JPEG2000 images.
Obviously you could use a copy of Acrobat to rewrite your PDF file with JPEG2000 compression as you have done here.
Assuming you want to avoid doing that, then my suggestion would be to investigate why convert is producing an image with a soft mask applied. I strongly suspect this is due to the use of rgba instead of rgb.
Avoiding the creation of the second (soft mask) image would (I believe) significantly decrease the size of the PDF file produced by 'convert'. You could gain at least some additional benefit, without any loss of quality, by running it through Ghostscript's pdfwrite device and specifying /FlateEncode for the GrayImageFilter. That would produce a PDF file where the PDF furniture is compressed, and where a better compression scheme is applied to the image data.
You could also just leave the Ghostscript line as it is, the quality degradation may be enough for you to live with.
if you use ubuntu you can try this on the command line. the result is impressive
Install ghostscript, for Ubuntu/Debian:
sudo apt-get install ghostscript
Resize your pdf with the command:
gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 -dPDFSETTINGS=/screen -dNOPAUSE -dQUIET -dBATCH -sOutputFile=output.pdf input.pdf
Replace the file names output.pdf and input.pdf with your file names.
PDFs can start as vectors. But once you read it into ImageMagick, it gets rasterized. When writing back to PDF, it just imbeds the raster image into a vector PDF shell. So it has not been re-vectorized.
Your use of -density 150 has increased the rasterized file. The nominal density is 72. So have right there increased by 4x, which would just about cover your size increase. I think you stated your increase wrong. It probably should be Original: 365K Converted: 1.358M not Original: 365K Converted: 1.358K
Also if the scanned PDF was a raster in a vector, it may have had limited colors in palette form or simply compressed JPG form. Your rasterizing has converted to 24-bit color and by processing has increased the colors. So even as non-compressed grayscale it is larger.
You can compress your output PDF in ImageMagick as follows by writing the raster image to compressed JPG format and piping to another convert to write to PDF.
convert -density XXX input.pdf ... -colorspace gray -quality 50 JPG:- | convert - output.pdf
Adjust the quality value as desired

Merge pdfs in a grid

I have hundreds of pdf files representing cards for a card game. They are standard 2.5"x3.5" files build using LaTeX. In order to print them I want to concatenate them together into a single pdf with 9 cards per page (the most that can fit). Currently I do this using montage -density 300 -tile 3x3 -geometry 750x1050+50+25 a.pdf b.pdf ...
Unfortunately there are a few problems with this solution:
Loss of detail -- The -density option is necessary to get decent quality, and the result becomes dependent on desired quality. My printer can print at 600dpi, so I should use -density 600, but some printers won't handle that correctly.
Slow -- The other reason I don't use 600 is that even at 300 imagemagick is extremely slow because I think it is converting the pdfs to images, then concatenating them into a pdf.
Limited -- Additionally when the number of input files grows sufficiently large, montage will just crash without being able to create the resulting pdf.
Loss of perks -- Finally, the resulting pdf doesn't keep the metadata of the originals. Most importantly the text is no longer selectable. This means that I can't search the pdf for a specific card.
In order to fix this I am currently using pdfunite to make a long pdf with 1 card per page, then telling my printer to print 9 on each page. The result is fairly close to the desired effect, however the sizes of the cards are slightly distorted, which is a bit of a problem.
Is there any way to concatenate pdfs in a grid with specific sizes?

Getting screenshot and find location of multiple smaller images in it in linux

I want to get a screenshot of a x11 window and find the location of smaller images in it. I've had no experiences with working with images, I searched a lot, but I don't get much helpful results.
The image are from files and can be loaded with any format that is easier to use.
The getting screenshot is easy, using XGetImage. But then the question is that which format to use XYPixmap or ZPixmap? What's the difference? How each pixel is represented?
And then what about the images? Which file format is easier to use? And then how each pixel is represented in that format?
And which algorithm should I use to find the location of the images in the screenshot?
I'm really lost here. I need a push in the right direction and see some example code that can help me to understand what I'm dealing with. Couldn't find any similar work.
The language, frameworks or the tools doesn't really matter to me as long as I get it working on my ubuntu machine. I can work in either C, C++, haskell, python or javascript.
With XYPixmap, each image plane is a separate bitmap (one bit per pixel, with padding at the end each scanline). If you have 24-bit color, you get 24 separate bitmaps. To retrieve pixel value at some (x,y) coordinates, you need to fetch one bit from each of the bitmaps at these coordinates, and pack these bits into a pixel.
With ZPixmap, pixels are represented as sequences of bits, with padding at the end of each scanline. If you have 24-bit color, every 3 bytes is a pixel.
In both cases, there may bee padding in the end and sometimes in the beginning of each scanline. It is all described here.
I would not use either format directly. Convert your pixmap to a simple 1, 2, or 4 bytes-per-pixel 2D array, and do the same with the patterns you want to search. If you want to find exact matches, you can use a slightly modified string search algorithm like KMP. Fuzzy matches are tricky, I don't know of any methods that work well.

game background file of just 2KB ...how?

I am making game for mobile phone and i have little knowledge of creating graphics for games. I am making graphics using CorelDraw & Photoshop.
I made flash.png using above 2 software's & could squeeze the size to 47Kb only.....
But I came across one game which has file size just 2kb for its background (bg0 & bg1.png)
I want to know how do I make such beautiful graphics without increasing the size of my file...
I assume the gamer must have hand sketched, scanned & used one of the above software's to fill the colors.....but i am not sure about it...
plz help
There are several ways to reduce the size of a PNG:
Reduce the colour depth. Don't use RGB true/24 bit colour, use an indexed colour image. You need to add a palette to the image, but each pixel is one byte, not two.
Once you have an indexed colour image, reduce the number of colours in the palette. There is a limit to how many colours you can reduce it by - the fewer colours, the lower the image quality.
Remove unnecessary PNG chunks. Art packages may add additional data to the PNG that isn't image data (creation date, author info, resolution, comments, etc.)
Check http://pmt.sourceforge.net/pngcrush/ to get rid of unneeded PNG chunks and compress the IDAT chunk even further. This might help a lot or not at all depending on the PNG that came out of the art packages. If it doesn't help, consider index PNGs. And if you go for paletized PNGs be sure to check out http://en.wikipedia.org/wiki/Color_cycling for cool effects you might be able to use.
Use a paletted png with few colors and then pass the png through a png optimizer like the free exe PngOptimizer
If your png still is too big reduce the number of colors used and reoptimize. Rince and repeat ^^.
I have used this technique on quite a lot of mobile games where size was of the essence.

Reducing colors in a PNG image is making the file size bigger

I am using ImageMagick to programmatically reduce the size of a PNG image by reducing the colors in the image. I get the images unique-colors and divide this by 2. Then I assign this value to the -colors option as follows:
variable = unique-colors / 2
convert image.png -colors variable -depth 8
I thought this would substantially reduce the size of the image but instead it increases the images size on disk. Can anyone shed any light on this.
Thanks.
EDIT: Turns out the problem was dithering. Dithering helps your reduced color images look more like the originals but adds to the image size. To remove dithering in ImageMagick add +dither to your command.
Example
convert CandyBar.png +dither -colors 300 -depth 8 smallerCandyBar.png
Imagemagick probably uses some dithering algorithm to make image appear as though it has original amount of colors. This increases image data "randomness" (single pixels are recolored at some places to blend into other colors) and this image data no longer packs as well. Research further into how the convert command does the dithering. You can also see this effect by adding second image as a layer in gimp/equivalent program and tuning transparency.
You should use pngquant for this.
You don't need to guess number of colors, it has actual --quality setting:
pngquant --verbose --quality=70 image.png
The above will automatically choose number of colors needed to match given quality in the same scale as JPEG quality (100 = perfect, 70 = OK, 20 = awful).
pngquant has substantially better quantization algorithm, and the better the quantization the better quality/filesize ratio.
And pngquant doesn't dither areas that look good without dithering, and this avoids adding unnecessary noise/randomness to the file.
The "new" PNG's compression is not as good as the one of the original.

Resources