Why can JPEG images contain several quantization tables? - jpeg

I checked and roughly understood the JPEG encoding algorithm from the wiki page about it.
Transform to YCbCr, downsample
Split into 8x8 blocks
Apply DCT on blocks
Divide resulting matrices by quantization table entries
Entropy encoding
I understand that the quantization table in a file depends on what created the image, e.g. a camera manufacturer likely has their own proprietary QT algorithms, photoshop etc have their own QTs, there are public ones, etc.
Now, if one opens 'real' JPEG files they may contain several quantization tables. How can this be? I'd assume the decoding algorithm looks like this:
Decode entropy encoding, receive blocks
Multiply blocks by quantization table entries
Revert the other operations
What does the second/third/... QT do/when is it used? Is there an upper limit on the number of QTs in a JPEG file? When does it happen that a second QT is added to a JPEG file?

The quantization tables are used for different color components.
Like you already know, in a first step the image is transformed into the YCbCr color space. In this color space you have three components: luminance (Y), chrominance blue (Cb) and chrominance red (Cr). Because the human eye is less sensitive to color but very sensitive to brightness, different quantization tables are used for the different components.
The quantization table used for the luminance consists of "lower" values, so that the dividing and rounding do not lose too much information on this component. The chrominance components, on the other hand, get "higher" values, as that information is not needed as much.
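As a rough illustration (a minimal sketch with made-up names, not taken from any particular decoder): the frame header tells the decoder which table each component uses, so dequantization just indexes into the right table.

#include <array>

using Block  = std::array<int, 64>;  // one 8x8 block of quantized DCT coefficients
using QTable = std::array<int, 64>;  // one 8x8 quantization table from a DQT segment

// tqIndex is the table selector stored per component in the frame header,
// e.g. 0 for Y and 1 for Cb/Cr. Baseline JPEG allows at most four tables.
Block dequantize(const Block& quantized, const std::array<QTable, 4>& tables, int tqIndex) {
    Block out{};
    const QTable& q = tables[tqIndex];
    for (int k = 0; k < 64; ++k)
        out[k] = quantized[k] * q[k];   // undo the encoder's division
    return out;
}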

Related

Lossless grayscale JPEG "color" inversion?

Imagemagick can invert the colors of a JPEG like so:
mogrify -negate image.jpg
However, that's not lossless. My intuition says that color inversion should be doable in a lossless fashion, at least for grayscale images, but I know hardly anything about JPEG. Hence my questions:
Is lossless JPEG grayscale inversion possible in theory?
If so, is libjpeg or any other software out there able to do it?
It's not lossless because there is not a 1:1 match between the gamuts of the RGB and YCbCr colorspaces used in JPEG. If you start with an RGB value that is within YCbCr and flip it, you may get a value outside the YCbCr colorspace range that ends up getting clamped.
JPEG encodes images as a series of entropy-coded deltas across MCUs (minimum coded units, 8x8 blocks of DCT values) and entropy-coded quantized DCT coefficients within each MCU. To do something like inverting the pixel values (even if grayscale) would involve decoding the entropy-coded bits, modifying the values and re-encoding them, since you can't "invert" entropy-coded DCT values. There isn't a one-to-one matching of entropy-coded lengths for each value, because the bits are encoded based on the statistical probability and the magnitude/sign of the quantized values.

The other problem is that the coded DCT values exist in the frequency domain. I'm not a mathematician, so I can't say for sure if there is a simple way to invert the spatial domain values in the frequency domain, but I think at best it's really complicated and likely the quantization of the values will interfere with a simple solution.

The kinds of things you can do losslessly in JPEG files are rotate, crop and less well known operations such as extracting a grayscale image from a color image. Individual pixel values can't be modified without having to decode and recode the MCUs, which incurs the "loss" in JPEG quality.
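Those lossless operations (rotate, crop, grayscale extraction) are the sort of thing the jpegtran tool from libjpeg/libjpeg-turbo does; it rearranges the entropy-coded data without a full decode/re-encode cycle, for example:
jpegtran -rotate 90 in.jpg > out.jpg
jpegtran -grayscale color.jpg > gray.jpg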

Why is GIF image size more than the sum of individual frame size?

I just tried to convert a few JPEGs to a GIF image using some online services. For a collection of 1.8 MB of randomly selected JPEGs, the resultant GIF was about 3.8 MB in size (without any extra compression enabled).
I understand GIF is lossless compression, and that's why I expected the resultant output to be around 1.8 MB (the input size). Can someone please help me understand what's happening with this extra space?
Additionally, is there a better way to bundle a set of images which are similar to each other (for transmission) ?
JPEG is a lossy compressed format, but it is still compressed. When it is uncompressed into raw pixel data and then recompressed into GIF, it is not surprising to end up with a bigger file.
GIF is a worse compression method for photographs; it is suited mostly for flat-colored drawings. It uses LZW dictionary compression, in which repeated sequences of pixel values are replaced by short codes, so you need lots of same-colored pixels in sequence to get good compression.
If you have images that are similar to each other, maybe you should consider packing them as consecutive frames (the more similar, the closer together) of a video stream and use some lossless compressor (or even risk it with a lossy one) for video, but maybe this is overkill.
If you have a color image, multiply the width x height x 3. That is the normal size of the uncompressed image data.
GIF and JPEG are two different methods of compressing that data. GIF uses the LZW method of compression. In that method the encoder creates a dictionary of previously encountered data sequences and writes codes representing those sequences rather than the actual data. This can actually result in a file larger than the raw image data if the encoder cannot find such sequences.
Such sequences are more likely to occur in drawings where the same colors are repeated, rather than in photographic images where the color varies subtly throughout.
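To make that concrete, here is a toy LZW encoder (a simplified sketch, not GIF's exact variable-code-width variant) showing how a flat run of identical "pixels" collapses into a handful of codes while varied data does not:

#include <iostream>
#include <map>
#include <string>
#include <vector>

std::vector<int> lzw_encode(const std::string& input) {
    std::map<std::string, int> dict;
    for (int c = 0; c < 256; ++c)
        dict[std::string(1, static_cast<char>(c))] = c;   // start with all single bytes

    std::vector<int> codes;
    std::string current;
    for (char ch : input) {
        std::string candidate = current + ch;
        if (dict.count(candidate)) {
            current = candidate;                   // keep extending the match
        } else {
            codes.push_back(dict[current]);        // emit code for the longest known match
            int newCode = static_cast<int>(dict.size());
            dict[candidate] = newCode;             // learn the new sequence
            current = std::string(1, ch);
        }
    }
    if (!current.empty())
        codes.push_back(dict[current]);
    return codes;
}

int main() {
    std::cout << lzw_encode(std::string(32, 'a')).size() << " codes for 32 identical bytes\n";
    std::string noisy;
    for (int i = 0; i < 32; ++i)
        noisy += static_cast<char>((i * 7) % 256); // 32 distinct byte values
    std::cout << lzw_encode(noisy).size() << " codes for 32 varying bytes\n";
}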
JPEG uses a series of compression steps. These have the drawback that you might not get out exactly what you put in. The first of these is conversion from RGB to YCbCr. There is not a 1-to-1 mapping between these colorspaces, so small changes to the values can occur there.
Next is subsampling. The reason for going to YCbCr is that you can sample the Cb and Cr components at a lower rate than the Y component and still get a good representation of the original image. If you keep 1 Cb and 1 Cr sample for every 4 Y samples, you reduce the amount of data to compress by half.
Next is the discrete cosine transform. This is a real number calculation performed on integers. That can produce rounding errors.
Next is quantization. In this step less significant values from the DCT are discarded (less data to compress). It also introduces errors from integer division.
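As a rough illustration of that quantization step (the DCT values here are invented for the example; the divisor row is merely in the style of a typical luminance table), the integer division drives the small high-frequency coefficients to zero:

#include <array>
#include <cmath>
#include <cstdio>

int main() {
    // One illustrative row of DCT output and a matching row of quantizer values.
    std::array<int, 8> dct = { 310, -42, 17, -6, 3, -2, 1, 0 };
    std::array<int, 8> q   = {  16,  11, 10, 16, 24, 40, 51, 61 };
    for (int k = 0; k < 8; ++k) {
        int quantized = static_cast<int>(std::lround(static_cast<double>(dct[k]) / q[k]));
        std::printf("%4d / %2d -> %3d\n", dct[k], q[k], quantized); // trailing values become 0
    }
}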

How can a jpeg encoder become more efficient

Earlier I read about mozjpeg, a project from Mozilla to create a JPEG encoder that is more efficient, i.e. one that creates smaller files.
As I understand (jpeg) codecs, a jpeg encoder would need to create files that use an encoding scheme that can also be decoded by other jpeg codecs. So how is it possible to improve the codec without breaking compatibility with other codecs?
Mozilla does mention that the first step for their encoder is to add functionality that can detect the most efficient encoding scheme for a certain image, which would not break compatibility. However, they intend to add more functionality, first of which is "trellis quantization", which seems to be a highly technical algorithm to do something (I don't understand).
I'm also not entirely sure this question belongs on Stack Overflow; it might also fit Super User, since the question is not specifically about programming. So if anyone feels it should be on Super User, feel free to move this question.
JPEG is somewhat unique in that it involves a series of compression steps. There are two that provide the most opportunities for reducing the size of the image.
The first is sampling. In JPEG one usually converts from RGB to YCbCr. In RGB, each component carries equal weight. In YCbCr, the Y component is much more important than the Cb and Cr components. If you sample the latter at 4 to 1, a 4x4 block of pixels gets reduced from 16+16+16 values to 16+1+1. Just by sampling you have reduced the size of the data to be compressed to roughly a third.
The other is quantization. You take the sampled pixel values, divide them into 8x8 blocks and perform the Discrete Cosine Transform on them. In 8bpp this takes an 8x8 block of 8-bit data and converts it to an 8x8 block of 16-bit data (an expansion rather than a compression at that point).
The DCT process tends to produce larger values in the upper left corner and smaller values (close to zero) towards the lower right corner. The upper left coefficients are more valuable than the lower right coefficients.
The 16-bit values are then "quantized" (division, in plain English).
The compression process defines an 8x8 quantization matrix. Each DCT coefficient is divided by the corresponding entry in the quantization matrix. Because this is integer division, the small values go to zero. Long runs of zero values are combined using run-length compression. The more consecutive zeros you get, the better the compression.
Generally, the quantization values are much higher at the lower right than in the upper left. You try to force those high-frequency DCT coefficients to zero unless they are very large.
This is where much of the loss (not all of it though) comes from in JPEG.
The trade off is to get as many zeros as you can without noticeably degrading the image.
The choice of quantization matrices is the major factor in compression. Most JPEG libraries present a "quality" setting to the user. This translates into the selection of quantization matrices in the encoder. If someone could devise better quantization matrices, you could get better compression.
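For a sense of how a "quality" knob can be turned into quantization matrices, here is a hedged sketch modelled on libjpeg's jpeg_quality_scaling() behaviour (treat the exact formula as an approximation of that scheme, not a statement about every encoder):

#include <algorithm>
#include <array>

// Scale a base 8x8 quantization table by a quality value in 1..100.
std::array<int, 64> scale_table(const std::array<int, 64>& base, int quality) {
    quality = std::clamp(quality, 1, 100);
    int scale = (quality < 50) ? 5000 / quality : 200 - 2 * quality;
    std::array<int, 64> out{};
    for (int k = 0; k < 64; ++k) {
        int v = (base[k] * scale + 50) / 100;   // round to nearest
        out[k] = std::clamp(v, 1, 255);         // baseline tables: 8-bit entries, never zero
    }
    return out;
}

Lower quality means a larger scale factor, bigger divisors, more zeroed coefficients and therefore longer zero runs for the entropy coder.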
This book explains the JPEG process in plain English:
http://www.amazon.com/Compressed-Image-File-Formats-JPEG/dp/0201604434/ref=sr_1_1?ie=UTF8&qid=1394252187&sr=8-1&keywords=0201604434
JPEG provides you multiple options. E.g. you can use the standard Huffman tables or you can generate Huffman tables that are optimal for a specific image. The same goes for quantization tables. You can also switch to arithmetic coding instead of Huffman coding for the entropy encoding; the patents covering arithmetic coding as used in JPEG have expired. All of these options are lossless (no additional loss of data).

One of the options used by Mozilla is to use progressive JPEG compression instead of baseline compression. You can play with how many frequencies you have in each scan (SS, spectral selection) as well as the number of bits used for each frequency (SA, successive approximation). Consecutive scans add further frequencies and/or additional bits for each frequency. Again, all of these different options are lossless. For the standard test images used for JPEG, switching to progressive encoding improved compression from 41 KB per image to 37 KB, but that is just for one setting of SS and SA. Given the speed of computers today, you could automatically try many different options and choose the best one.

Although hardly used, the original JPEG standard also had a lossless mode. There were 7 different choices of predictor. Today you could compress using each of the 7 choices and pick the best one. The same principle applies to what I outlined above. And remember, none of these options incurs additional loss of data; switching between them is lossless.
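If you are using libjpeg directly, the lossless options mentioned above map onto ordinary API switches. A minimal sketch (error handling omitted; assumes a packed RGB buffer) might look like this:

#include <cstdio>
#include <jpeglib.h>

void write_jpeg(const unsigned char* rgb, int w, int h, std::FILE* out, int quality) {
    jpeg_compress_struct cinfo;
    jpeg_error_mgr jerr;
    cinfo.err = jpeg_std_error(&jerr);
    jpeg_create_compress(&cinfo);
    jpeg_stdio_dest(&cinfo, out);

    cinfo.image_width      = w;
    cinfo.image_height     = h;
    cinfo.input_components = 3;
    cinfo.in_color_space   = JCS_RGB;
    jpeg_set_defaults(&cinfo);
    jpeg_set_quality(&cinfo, quality, TRUE);

    cinfo.optimize_coding = TRUE;       // per-image optimal Huffman tables (lossless gain)
    jpeg_simple_progression(&cinfo);    // progressive scan script instead of baseline

    jpeg_start_compress(&cinfo, TRUE);
    while (cinfo.next_scanline < cinfo.image_height) {
        JSAMPROW row = const_cast<unsigned char*>(rgb) + cinfo.next_scanline * w * 3;
        jpeg_write_scanlines(&cinfo, &row, 1);
    }
    jpeg_finish_compress(&cinfo);
    jpeg_destroy_compress(&cinfo);
}

mozjpeg goes further than this (trellis quantization, tuned progressive scan scripts), but the point is that the output is still a standard JPEG stream any decoder can read.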

How to write a JFIF file?

I'm developing a C++ project, in which I have to compress bitmap data in JPEG format, and write as output a .jpg file that could be opened by a normal image viewer.
I cannot use any kind of libraries.
All the JPEG compression is done, the only thing I'm missing is how to write the stuff in the file correctly.
I already looked into the JFIF format file specification, and googled a lot, but can't figure out how to do it.
In more detail, I should have all the headers written correctly in the file; what I really miss is, after I have the 3 color components ready to be written, how I actually do that (in which order to write the components, how to handle subsampled components, is there other stuff?).
EDIT:
Link to a sample output image (starting from a random generated 8x8 RGB bitmap).
https://dl.dropbox.com/u/46024798/out.jpg
The headers of the image (should) specify that this is a JPEG 8x8px with 3 color components, subsampling 4:4:4.
More in detail, what I did is:
Generate 3 random 8x8 blocks, with values in range [0..255]
Subtract 128 from all the elements (now in range [-128..127])
Apply the Discrete Cosine Transform to the 3 blocks
Quantize the result
Put the results of quantization in zig-zag order
Look up in the Huffman tables the values to write in the file (with the End Of Block marker and that kind of stuff)
And for the JPEG compression, that should be ok.
Then I write the file:
First, I write the SOI header, the APP0 marker, the "magic string" JFIF, version, units, density and thumbnail info
Then the quantization table
Then the Start Of Frame marker, with image precision, dimensions, number of components, subsampling info, a DC Huffman Table and an AC Huffman Table
Then the Start Of Scan header (probably where I messed up), in which I point to the IDs of the Huffman tables to use for each component and other stuff whose meaning I don't exactly know (spectral selection?? successive approximation??); see the sketch after this list
Finally, I write the Huffman Encoded values in this order:
All the Y block
All the Cb block
All the Cr block
And End Of Image
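For reference, here is a rough sketch (hypothetical helper names; segment lengths and payloads omitted) of the marker order described above, which may help spot where the stream goes wrong:

#include <cstdint>
#include <cstdio>

void put_marker(std::FILE* f, uint8_t code) {
    std::fputc(0xFF, f);
    std::fputc(code, f);
}

void write_skeleton(std::FILE* f) {
    put_marker(f, 0xD8);   // SOI
    put_marker(f, 0xE0);   // APP0: "JFIF\0", version, density units, thumbnail info
    put_marker(f, 0xDB);   // DQT: quantization table(s)
    put_marker(f, 0xC0);   // SOF0: precision, dimensions, components, sampling factors,
                           //       and the quantization table selector for each component
    put_marker(f, 0xC4);   // DHT: Huffman tables (DC and AC) go in their own segment(s)
    put_marker(f, 0xDA);   // SOS: Huffman table selectors per component; for baseline the
                           //      last three bytes are Ss=0, Se=63, Ah=Al=0 (spectral
                           //      selection / successive approximation only matter for
                           //      progressive JPEG)
    // ... entropy-coded data: for a single 8x8 4:4:4 MCU, the Y block, then Cb, then Cr,
    //     with any 0xFF byte in the data stuffed as 0xFF 0x00 ...
    put_marker(f, 0xD9);   // EOI
}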

Identify low-color jpeg images

From a large collection of jpeg images, I want to identify those more likely to be simple logos or text (as opposed to camera pictures). One identifying characteristic would be low color count. I expect most to have been created with a drawing program.
If a jpeg image has a palette, it's simple to get a color count. But I expect most files to be 24-bit color images. There's no restriction on image size.
I suppose I could create an array of 2^24 (16M) integers, iterate through every pixel and increment the count for that 24-bit color. Yuck. Then I would count the non-zero entries. But if the JPEG compression messes with the original colors I could end up counting a lot of unique pixels, which might be hard to distinguish from a photo. (Maybe I could convert each pixel to YUV colorspace and keep fewer counts.)
Any better ideas? Library suggestions? Humorous condescensions?
Sample 10000 random coordinates and make a histogram, then analyze the histogram.
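A rough sketch of that idea (names and thresholds are made up; it assumes you have already decoded the JPEG into a packed RGB buffer with whatever library you use): sample random pixels, bucket them coarsely so JPEG noise collapses into a single bucket, and count distinct buckets.

#include <cstddef>
#include <cstdint>
#include <random>
#include <unordered_set>

// rgb points at width*height*3 bytes of decoded image data, row-major.
bool looks_low_color(const uint8_t* rgb, int width, int height,
                     int samples = 10000, std::size_t distinctThreshold = 64) {
    std::mt19937 rng(12345);
    std::uniform_int_distribution<int> dx(0, width - 1), dy(0, height - 1);
    std::unordered_set<uint32_t> buckets;
    for (int i = 0; i < samples; ++i) {
        const uint8_t* p = rgb + (static_cast<std::size_t>(dy(rng)) * width + dx(rng)) * 3;
        // Keep only the top 4 bits of each channel so compression artifacts
        // don't register as new colors; at most 4096 possible buckets.
        uint32_t key = (uint32_t(p[0] >> 4) << 8) | (uint32_t(p[1] >> 4) << 4) | (p[2] >> 4);
        buckets.insert(key);
    }
    return buckets.size() < distinctThreshold;
}

Logos and text tend to land in a few dozen buckets, while photographs spread across hundreds, so a simple threshold on the bucket count goes a long way.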
