JPEG compression quality in a TIFF image file - jpeg

I built the tiff library from source, including the zlib and jpeg compression options. I can save with either compression type, depending on whether I want lossless images at a larger file size or smaller but lossy images.
The issue I have is: how do I control the JPEG quality? The program I wrote needs to be able to create both, and to change the image quality when using JPEG compression. I expected there to be a tiff tag where the quality can be set, but I have yet to find one when searching. I would like to try a few different qualities between 50 and 100.
The TIFF image container must be used, so I cannot just use libjpeg directly to make JPEG images.

I searched the whole TIFF source code and found the tag TIFFTAG_JPEGQUALITY, which sets the JPEG quality using an int between 0 and 100. I tested it and it does in fact change the quality.
I searched the TIFF tags website and this tag is not listed as a valid tag, but it is supported natively by the latest version.
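For a quick comparison of several quality levels without touching the program, the tiffcp tool that ships with libtiff can recompress an existing TIFF. The sketch below assumes tiffcp is installed and accepts the jpeg:N form of its -c option described in its man page. The function name, the file names and the TIFFCP override are made up for illustration.

```shell
# Write one JPEG-in-TIFF copy of $1 per quality level (50..100 in steps of 10).
# TIFFCP is a hypothetical override so the commands can be previewed with
# TIFFCP=echo; by default the real tiffcp binary is used.
tiff_quality_sweep() {
    src=$1
    for q in 50 60 70 80 90 100; do
        "${TIFFCP:-tiffcp}" -c "jpeg:$q" "$src" "${src%.tif}-q$q.tif" || return 1
    done
}

# Example (not executed here): tiff_quality_sweep scan.tif
# would write scan-q50.tif, scan-q60.tif, ... scan-q100.tif
```

Comparing the sizes of the resulting files should show the same effect as changing TIFFTAG_JPEGQUALITY in code.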

Related

Ultimate JPEG-2000 compression

I have done some research comparing JPEG and JPEG-2000, and I couldn't reach the same or better perceptual quality with JPEG-2000 than with JPEG at extreme compression levels (for the web).
Although JPEG-2000 is announced as giving ~20+% better perceptual quality than the original JPEG at the same size, with the available tools for reconverting existing JPEGs or even lossless PNGs, the original JPEG was still superior. JPEG-2000 got arguably better results only for huge images (Full HD and bigger), which are not as widely used on the web.
ImageMagick, GraphicsMagick and OpenJPEG all showed identical results (I assume due to the use of Jasper to encode JPEG-2000) and all lacked encoding options. Using Kakadu or online/basic converters didn't help either. As things stand, tools like imagemin with plugins can produce much better quality JPEGs than JPEG-2000 when maximally compressed for the web. So JPEG-2000, being useful mostly for Safari, isn't worth adding as another format to encode, since regular JPEG gives better results.
What am I doing wrong, and are there any other tools/tricks with more advanced options for JPEG-2000 encoding that would finally beat JPEG?
It's not just you. JPEG 2000 isn't that much better. It's difficult to encode well, and we've got more mature JPEG encoders nowadays (MozJPEG, Guetzli).
JPEG 2000 is wavelet-based, which gives it better scores in the PSNR metric. However, JPEG 2000 excels in exactly the thing the PSNR metric prefers too much: blurring. That makes JPEG 2000 look great in the low-quality range, where the older JPEG breaks down. But it's not helpful in practice for higher-quality images. Wavelets also struggle with text and sharp edges, just like the older JPEG.
JPEG 2000 has very nice progressive loading, but Safari stopped supporting progressive rendering at some point.
The only practical use for JPEG 2000 is delivering photographic-like images with alpha channel, because JPEG can't do that, and JPEG 2000 still wins compared to PNG.

Compress PDF after manipulation

I have the following problem:
I am receiving various scanned PDF files from a Kyocera Scanner Device.
I have to automatically manipulate these PDF Files in order to:
Delete the colors from textmarkers
Convert the PDF to grayscale
Put it in our DMS
I am using a Bash-Script to do the job.
For deleting the textmarker colors and converting to grayscale I use Imagemagick:
convert -density 150 INPUT.pdf \
-channel rgba \
-alpha set \
-fuzz 15% \
-fill white \
-opaque 'rgb(255,200,195)' \
-opaque 'rgb(255,253,177)' \
-opaque 'rgb(255,155,240)' \
-opaque 'rgb(255,91,193)' \
-colorspace gray OUTPUT-convert.pdf
The resulting image is quite good, BUT the size of the PDF is huge:
Original: 365K
Converted: 1.358K
So I've found a ghostscript command to do the job and reduce the file size:
gs -dSAFER -dBATCH -dNOPAUSE -dNOCACHE -dCompatibilityLevel=1.4 \
-sDEVICE=pdfwrite \
-sColorConversionStrategy=/LeaveColorUnchanged \
-dPDFSETTINGS=/ebook \
-sOutputFile=OUTPUT-ghostscript.pdf OUTPUT-convert.pdf
Now the file sizes are:
Original: 365K
Converted: 1.358K (OUTPUT-convert.pdf)
Ghostscript: 500K (OUTPUT-ghostscript.pdf)
I can't figure out why the size of the PDF after manipulation, from color to grayscale, is larger than the original document. The density (150 dpi) is the resolution of the original document.
When I put the converted PDF (1.358K) through Adobe Acrobat on Windows and recreate the PDF, the size is 213K. I have no loss in quality. How can I achieve this under Linux with a bash script?
Any help is appreciated!
Here is a link for example PDF Files:
http://62.75.158.162/download/yKLu3fkbLy7MgkczDrKdG6osHdXh3jvy/
It's not really possible to comment very much without seeing an example file, to determine exactly what has happened at each stage.
However, I very strongly suspect that you have 'lost quality'; it's just that, at screen resolutions, you can't tell. Your original PDF file was created using ImageMagick at a resolution of 150 dpi. Most probably the image is stored uncompressed in the PDF file, which is why it's large.
When you run that PDF file back through Ghostscript there are two effects. Firstly you've used the PDFSETTINGS canned set of job configuration. That (amongst many other things) downsamples grey images to a resolution of 150 dpi (so fortunately for you, no effect). It also compresses the image data using JPEG compression.
Now I've no idea what's in the original PDF file, but if the data there was compressed using JPEG, as seems likely, then you are double applying JPEG quantisation. That's a lossy process and will result in a loss of quality.
Since you are altering the original image data (to change the colour) you have no choice about decompressing the image data. However, to preserve quality you should then not use JPEG compression again; instead you should use Flate compression. The compression ratio won't be as good, but it will keep the quality unchanged. To do that you would need to specify the GrayImageFilter using distillerparams; you can't use a PDFSETTINGS for that.
I can't imagine what Acrobat has done to decrease the file size still further (and you haven't said how you 'recreate the PDF file'), but I would imagine it involves reducing the quality of the image still further. It's hard to see how it could save 50% of the file size without doing so. It's also possible it is (like Ghostscript) JPEG compressing the grayscale data but using a more aggressive set of JPEG parameters (resulting in still more loss of quality, of course).
If you posted examples of the original, Ghostscript output, and Acrobat output I might be able to tell you more, but not from this.
For what it's worth, there's a new feature in Ghostscript (requires version 9.23 or better) which allows you to create a PDF file which consists only of an image, and choose the colour model. You could run the original PDF file through Ghostscript using something like:
gs -sDEVICE=pdfimage8 -r150 -dBATCH -dNOPAUSE -sOutputFile=gs.pdf INPUT.pdf
which would produce a pretty minimal PDF file where the original input has been rendered to a gray scale image (at 150 dpi), and that image wrapped up as a PDF file. I've no idea if that might work better for you.
Later EDIT
Yep, it's pretty much what I expected.
The original file has what appears to be marked JPEG compression artefacts (all the rectangular 'speckles' round the text). Obviously without seeing the original document I can't tell whether this is because the original document was a JPEG printed to paper, or whether the artefacts were introduced by the scanner, or (more likely) whatever application converted the scanned image into a PDF. Checking the image stored in the PDF file I see that it is indeed a JPEG image.
Nevertheless, the original image is (in my opinion) really very noisy.
Now the output from 'convert' is arguably slightly better (in terms of legibility) than the original. I presume this is 'something' to do with your convert command line, but I can't be sure. The image in this case is not a JPEG; it's compressed with RunLength encoding, which is of course lossless. It's also less efficient as a compression method, so the image is bigger. For reasons best known to ImageMagick it also applies a soft mask to the image data. So that's two images per page now instead of just one. Not too surprising that it's larger than the original!
I suspect that the soft mask is due to your command line including RGBA. I assume that produces an alpha channel, and PDF doesn't support simple alpha channel blending; its own transparency model is much more sophisticated. So I sort of suspect you are actually making the output file here larger than it needs to be. I'm afraid I can't help you with ImageMagick, I don't know anything about it, but getting rid of that second image would help a great deal.
Note that both your original file and the output from ImageMagick are essentially uncompressed (in terms of the PDF file 'structure').
Then we come to the Ghostscript produced PDF. The 'structure' of the PDF file is itself compressed, giving small size benefits. The images are all JPEG compressed, giving additional compression, but at the cost of quality. Applying JPEG quantisation multiple times always costs quality. By simply comparing the output from 'convert' with the output from Ghostscript I can easily see the degradation in quality.
Now we come to the Acrobat output. Comparing it with the other files, it shows the worst quality. The JPEG artefacts are very clearly visible in the displayed image. In this case both the image and the soft mask have been compressed with the JPEG2000 compression scheme, which is a 'better' compression than JPEG. However, it looks like applying it to data which has already been quantised for JPEG yields pretty poor quality results. Or at least, applying it to a soft-masked JPEG image does :-)
The main problem with JPEG2000 is that it is patent encumbered. While decoders can be written royalty-free, to write an encoder you must licence the patented technology from the (many) patent holders, an expensive process.
So the AGPL version of Ghostscript does not include a JPEG2000 encoder, and as such cannot write JPEG2000 images.
Obviously you could use a copy of Acrobat to rewrite your PDF file with JPEG2000 compression as you have done here.
Assuming you want to avoid doing that, then my suggestion would be to investigate why convert is producing an image with a soft mask applied. I strongly suspect this is due to the use of rgba instead of rgb.
Avoiding the creation of the second (soft mask) image would (I believe) significantly decrease the size of the PDF file produced by 'convert'. You could gain at least some additional benefit, without any loss of quality, by running it through Ghostscript's pdfwrite device and specifying /FlateEncode for the GrayImageFilter. That would produce a PDF file where the PDF furniture is compressed, and where a better compression scheme is applied to the image data.
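The two steps suggested above can be sketched as follows. This is a sketch only, untested against the files in question. It assumes that switching the alpha channel off in ImageMagick is enough to avoid the soft mask, and it uses the pdfwrite distiller parameters for grey images without any PDFSETTINGS preset. The function names and the CONVERT and GS overrides are made up for illustration.

```shell
# Step 1: the marker-removal pipeline from the question, but with the alpha
# channel switched off (instead of -channel rgba -alpha set) and Zip (Flate)
# compression requested for the PDF. $1 = input PDF, $2 = output PDF.
# CONVERT is a hypothetical override; CONVERT=echo previews the command.
strip_markers_gray() {
    "${CONVERT:-convert}" -density 150 "$1" \
        -alpha off \
        -fuzz 15% -fill white \
        -opaque 'rgb(255,200,195)' \
        -opaque 'rgb(255,253,177)' \
        -opaque 'rgb(255,155,240)' \
        -opaque 'rgb(255,91,193)' \
        -colorspace gray \
        -compress zip \
        "$2"
}

# Step 2: rewrite the PDF with pdfwrite, keeping grey images lossless
# (Flate, no downsampling). $1 = input PDF, $2 = output PDF.
# GS is a hypothetical override; GS=echo previews the command.
flate_gray_pdf() {
    "${GS:-gs}" -dSAFER -dBATCH -dNOPAUSE \
        -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 \
        -dAutoFilterGrayImages=false \
        -dGrayImageFilter=/FlateEncode \
        -dDownsampleGrayImages=false \
        "-sOutputFile=$2" "$1"
}

# Example (not executed here):
#   strip_markers_gray INPUT.pdf OUTPUT-convert.pdf
#   flate_gray_pdf OUTPUT-convert.pdf OUTPUT-flate.pdf
```

Running either function with its override set to echo prints the command line without executing it, which makes it easy to check the switches before touching real documents.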
You could also just leave the Ghostscript line as it is, the quality degradation may be enough for you to live with.
If you use Ubuntu you can try this on the command line. The result is impressive.
Install ghostscript, for Ubuntu/Debian:
sudo apt-get install ghostscript
Resize your pdf with the command:
gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 -dPDFSETTINGS=/screen -dNOPAUSE -dQUIET -dBATCH -sOutputFile=output.pdf input.pdf
Replace the file names output.pdf and input.pdf with your file names.
PDFs can start as vectors. But once you read one into ImageMagick, it gets rasterized. When writing back to PDF, it just embeds the raster image into a vector PDF shell. So it has not been re-vectorized.
Your use of -density 150 has increased the size of the rasterized file. The nominal density is 72, so right there you have increased the pixel count by about 4x, which would just about cover your size increase. I think you stated your increase wrong. It probably should be Original: 365K Converted: 1.358M, not Original: 365K Converted: 1.358K.
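The 4x figure comes from the fact that both image dimensions scale with the density, so the pixel count grows with the square of the ratio. A small sketch of that arithmetic (the function name is made up for illustration):

```shell
# Factor by which the pixel count grows when rasterizing at $1 dpi
# instead of $2 dpi; width and height both scale, so the ratio is squared.
pixel_growth() {
    LC_ALL=C awk -v n="$1" -v o="$2" 'BEGIN { printf "%.2f\n", (n / o) ^ 2 }'
}

pixel_growth 150 72   # prints 4.34
```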
Also if the scanned PDF was a raster in a vector, it may have had limited colors in palette form or simply compressed JPG form. Your rasterizing has converted to 24-bit color and by processing has increased the colors. So even as non-compressed grayscale it is larger.
You can compress your output PDF in ImageMagick as follows by writing the raster image to compressed JPG format and piping to another convert to write to PDF.
convert -density XXX input.pdf ... -colorspace gray -quality 50 JPG:- | convert - output.pdf
Adjust the quality value as desired

Why does Google serve JPEG instead of WebP in image search?

With all the fuss about WEBP and how cool it is, I'm still getting JPEG thumbnail images in image search on google.com in 2016, even though Chrome says in its HTTP request header that it accepts webp images: accept: image/webp,image/*,*/*;q=0.8
Why is that so?
Answering myself.
The reason may be "just not adopted yet", although it may as well be "not worth adopting", because of the following:
WEBP gives better overall quality, but at low-quality encoder settings its distortions differ from those of classical JPEG:
JPEG gives uniform distortions all over the picture: on hard edges, soft edges and soft gradients.
WEBP handles soft gradients and hard edges better than JPEG, but it gives more distortion on soft edges. Because of that, the image looks deformed.
Example: moon in the following image: http://xooyoozoo.github.io/yolo-octo-bugfixes/#pont-de-quebec-at-night&jpg=s&webp=s
As a side note: WEBP is used for video thumbnails on YouTube, but given the source is video, WEBP is more acceptable in this scenario, than encoding thumbnails for JPEG images.

Faster web experience when available is gold

In node.js can I take the binary straight from the canvas without calling toDataURL (which would convert it to 3x size base64)?
So I would then have binary (example: open up any image with a text editor)
Then convert that into webp base64 (?)
I've found that converting base64 png to base64 webp is very slow, but transporting webp via websockets is very fast. After a lot of tests, though, I see that converting to webp from png and then transporting is in fact much slower than just transporting pngs (one example is https://github.com/lovell/sharp which can do the base64 conversion).
If I could just go from canvas to webp then I would be transporting 80% less data in the same time. I would be decreasing the transport time and saving the user mobile bandwidth (if they have a webp supporting browser); if not (their choice, you-snooze-you-lose), I fall back to png.

dwebp increasing the original jpeg file size

I used cwebp to convert my jpg image to webp. Now I am using dwebp to convert it back, but the result is larger than the original. Is there any way to control the file size in dwebp?
Transcoding between lossy formats tends to increase the size unless the representation of data happens to be extremely compatible between the formats, be it audio, pictures, video or other lossy data. Lossy WebP works on 4x4 blocks (a 4x4 DCT, plus a Walsh-Hadamard transform on the DC coefficients), whereas JPEG uses an 8x8 Discrete Cosine Transform (DCT). Quantization, which is the main form of data loss in these formats, produces different kinds of artefacts in these transforms, so transcoding cannot be optimal. In particular, if either the WebP or the JPEG was saved with extremely low quality, the other format will struggle to compete with it after transcoding: the later format has to encode not only the image signal but also the artefacts left by the earlier format.
So, while there is an inherent tendency for an increase in file size in such back-and-forth conversion, the exact amount of loss happening at every stage can be controlled. Which flags and tools (including versions) are you using exactly?
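One thing to keep in mind is that dwebp itself has no quality setting: it decodes to lossless formats (PNG by default), so its output is normally much larger than either the JPEG or the WebP. Assuming the goal is to end up with a JPEG again, the size is controlled by the quality chosen at each lossy step. A minimal sketch, assuming cwebp, dwebp and ImageMagick's convert are installed. The function name, file names and the CWEBP, DWEBP and CONVERT overrides are made up for illustration.

```shell
# Round trip JPEG -> WebP -> PNG -> JPEG with an explicit quality at each
# lossy step. $1 = source JPEG, $2 = WebP quality, $3 = final JPEG quality.
# CWEBP, DWEBP and CONVERT are hypothetical overrides; setting them to echo
# previews the three commands instead of running them.
webp_round_trip() {
    base=${1%.*}
    "${CWEBP:-cwebp}" -q "$2" "$1" -o "$base.webp" &&
    "${DWEBP:-dwebp}" "$base.webp" -o "$base-decoded.png" &&
    "${CONVERT:-convert}" "$base-decoded.png" -quality "$3" "$base-roundtrip.jpg"
}

# Example (not executed here): webp_round_trip photo.jpg 80 85
```

Lowering either quality value shrinks the final file, at the cost of stacking the artefacts of both formats.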
