How can a jpeg encoder become more efficient - jpeg

Earlier I read about mozjpeg. A project from Mozilla to create a jpeg encoder that is more efficient, i.e. creates smaller files.
As I understand (jpeg) codecs, a jpeg encoder would need to create files that use an encoding scheme that can also be decoded by other jpeg codecs. So how is it possible to improve the codec without breaking compatibility with other codecs?
Mozilla does mention that the first step for their encoder is to add functionality that can detect the most efficient encoding scheme for a certain image, which would not break compatibility. However, they intend to add more functionality, first of which is "trellis quantization", which seems to be a highly technical algorithm to do something (I don't understand).
I'm also not entirely sure this question belongs on Stack Overflow; it might also fit Super User, since the question is not specifically about programming. So if anyone feels it should be on Super User, feel free to move this question.

JPEG is somewhat unique in that it involves a series of compression steps. There are two that provide the most opportunities for reducing the size of the image.
The first is sampling. In JPEG one usually converts from RGB to YCbCr. In RGB, each component is equally important. In YCbCr, the Y component is much more important than the Cb and Cr components. If you sample the latter at 4 to 1 in each direction, a 4x4 block of pixels gets reduced from 16+16+16 samples to 16+1+1. Just by sampling you have reduced the data to be compressed to a little over 1/3 of its original size (18 of 48 samples).
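The sample counts above can be checked with a few lines of Python. This is only a sketch of the arithmetic; the function name and its parameters are made up for illustration and do not come from any JPEG library.

```python
import math

def samples_after_subsampling(width, height, h_factor, v_factor):
    """Count Y + Cb + Cr samples when Cb and Cr are kept at 1/h_factor
    horizontally and 1/v_factor vertically (hypothetical helper)."""
    luma = width * height
    chroma_w = math.ceil(width / h_factor)
    chroma_h = math.ceil(height / v_factor)
    return luma + 2 * chroma_w * chroma_h

full = 3 * 4 * 4                                      # 16 + 16 + 16 samples
four_to_one = samples_after_subsampling(4, 4, 4, 4)   # 16 + 1 + 1
two_to_one = samples_after_subsampling(4, 4, 2, 2)    # 16 + 4 + 4
print(four_to_one / full, two_to_one / full)
```

The second call corresponds to the more common case where chroma is halved in each direction, which leaves half of the original samples.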
The other is quantization. You take the sampled pixel values, divide them into 8x8 blocks and perform the Discrete Cosine transform on them. In 8bpp this takes 8x8 8-bit data and converts it to 8x8 16 bit data (inverse compression at that point).
The DCT process tends to produce larger values in the upper left corner (the low frequencies) and smaller values (close to zero) towards the lower right corner (the high frequencies). The upper left coefficients are more valuable than the lower right coefficients.
The 16-bit values are then "quantized" (division in plain English).
The compression process defines an 8x8 quantization matrix. Divide each DCT coefficient by the corresponding entry in the quantization matrix. Because the result is stored as an integer, the small values will go to zero. Long runs of zero values are combined using run-length compression. The more consecutive zeros you get, the better the compression.
Generally, the quantization values are much higher at the lower right than in the upper left. You try to force these DCT coefficients to be zero unless they are very large.
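To make the DCT and quantization steps concrete, here is a minimal pure-Python sketch. It assumes the example luminance table usually quoted from Annex K of the JPEG standard and rounds to the nearest integer; the function names and the test block are invented for illustration, and real encoders use much faster DCT implementations.

```python
import math

# Example luminance quantization table (as commonly quoted from Annex K).
LUMA_TABLE = [
    [16, 11, 10, 16, 24, 40, 51, 61],
    [12, 12, 14, 19, 26, 58, 60, 55],
    [14, 13, 16, 24, 40, 57, 69, 56],
    [14, 17, 22, 29, 51, 87, 80, 62],
    [18, 22, 37, 56, 68, 109, 103, 77],
    [24, 35, 55, 64, 81, 104, 113, 92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103, 99],
]

def dct_8x8(block):
    """Orthonormal 2-D DCT-II of an 8x8 block; u is the row (vertical)
    frequency and v is the column (horizontal) frequency."""
    def c(k):
        return math.sqrt(1 / 8) if k == 0 else math.sqrt(2 / 8)
    out = [[0.0] * 8 for _ in range(8)]
    for u in range(8):
        for v in range(8):
            s = 0.0
            for y in range(8):
                for x in range(8):
                    s += (block[y][x]
                          * math.cos((2 * y + 1) * u * math.pi / 16)
                          * math.cos((2 * x + 1) * v * math.pi / 16))
            out[u][v] = c(u) * c(v) * s
    return out

def quantize(coefs, table):
    """Divide each coefficient by its table entry and round to an integer."""
    return [[round(coefs[u][v] / table[u][v]) for v in range(8)]
            for u in range(8)]

def count_zeros(levels):
    return sum(value == 0 for row in levels for value in row)

# A smooth gradient block, level-shifted by 128 as JPEG does for 8-bit data.
smooth = [[100 + 4 * x + 3 * y - 128 for x in range(8)] for y in range(8)]
levels = quantize(dct_8x8(smooth), LUMA_TABLE)
print(count_zeros(levels), "of 64 quantized coefficients are zero")
```

For a smooth block like this one, only coefficients in the first row and first column can be nonzero, so most of the 64 values become zeros that the run-length step can absorb.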
This is where much of the loss (not all of it though) comes from in JPEG.
The trade off is to get as many zeros as you can without noticeably degrading the image.
The choice of quantization matrices is the major factor in compression. Most JPEG libraries present a "quality" setting to the user. This translates into the selection of quantization matrices in the encoder. If someone could devise better quantization matrices, you could get better compression.
This book explains the JPEG process in plain English:
http://www.amazon.com/Compressed-Image-File-Formats-JPEG/dp/0201604434/ref=sr_1_1?ie=UTF8&qid=1394252187&sr=8-1&keywords=0201604434

JPEG provides you with multiple options. E.g. you can use standard Huffman tables or you can generate Huffman tables optimal for a specific image. The same goes for quantization tables. You can also switch to using arithmetic coding instead of Huffman coding for entropy encoding. The patents covering arithmetic coding as used in JPEG have expired. All of these options are lossless (no additional loss of data). One of the options used by Mozilla is to use progressive JPEG compression instead of baseline JPEG compression. You can play with how many frequencies you have in each scan (SS, spectral selection) as well as the number of bits used for each frequency (SA, successive approximation). Consecutive scans will have additional frequencies and/or additional bits for each frequency. Again, all of these different options are lossless. For the standard images used for JPEG, switching to progressive encoding improved compression from 41 KB per image to 37 KB. But that is just for one setting of SS and SA. Given the speed of computers today, you could automatically try many different options and choose the best one.
Although hardly used, the original JPEG standard had a lossless mode. There were 7 different choices for predictors. Today you would compress using each of the 7 choices and pick the best one. Use the same principle for what I outlined above. And remember, none of them incur additional loss of data. Switching between them is lossless.
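The "try every option and keep the best" idea can be sketched for the 7 lossless-mode predictors. In the sketch below, A is the pixel to the left, B the pixel above and C the pixel above-left. The cost function (sum of absolute residuals over interior pixels) is an assumption used as a cheap stand-in for the real coded size, and the special handling of the first row and column is left out.

```python
# The seven predictors of JPEG's lossless mode.
PREDICTORS = {
    1: lambda a, b, c: a,
    2: lambda a, b, c: b,
    3: lambda a, b, c: c,
    4: lambda a, b, c: a + b - c,
    5: lambda a, b, c: a + ((b - c) >> 1),
    6: lambda a, b, c: b + ((a - c) >> 1),
    7: lambda a, b, c: (a + b) >> 1,
}

def residual_cost(image, predict):
    """Sum of |pixel - prediction| over interior pixels (a rough proxy
    for how well the residuals would compress)."""
    total = 0
    for y in range(1, len(image)):
        for x in range(1, len(image[y])):
            a = image[y][x - 1]       # left
            b = image[y - 1][x]       # above
            c = image[y - 1][x - 1]   # above-left
            total += abs(image[y][x] - predict(a, b, c))
    return total

def best_predictor(image):
    """Try all seven predictors and return (predictor number, cost)."""
    costs = {n: residual_cost(image, p) for n, p in PREDICTORS.items()}
    best = min(costs, key=costs.get)
    return best, costs[best]

vertical_gradient = [[10 * y for _ in range(6)] for y in range(6)]
print(best_predictor(vertical_gradient))
```

A real encoder would entropy-code the residuals for each candidate and compare the actual sizes, but the selection loop would look the same.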

Related

Why can JPEG images contain several quantization tables?

I checked and roughly understood the JPEG encoding algorithm from the wiki page about it.
Transform to YCbCr, downsample
Split into 8x8 blocks
Apply DCT on blocks
Divide resulting matrices by quantization table entries
Entropy encoding
I understand that the quantization table in a file depends on what created the image, e.g. a camera manufacturer likely has their own proprietary QT algorithms, photoshop etc have their own QTs, there are public ones, etc.
Now, if one opens 'real' JPEG files they may contain several quantization tables. How can this be? I'd assume the decoding algorithm looks like this:
Decode entropy encoding, receive blocks
Multiply blocks by quantization table entries
revert other operations
What does the second/third/... QT do/when is it used? Is there an upper limit on the number of QTs in a JPEG file? When does it happen that a second QT is added to a JPEG file?
The quantization tables are used for different color components.
As you already know, in a first step the image is transformed into the YCbCr color space. In this color space you have three components: luminance (Y), chrominance blue (Cb) and chrominance red (Cr). As the human eye is less sensitive to color but very sensitive to brightness, multiple quantization tables are used for the different components.
The quantization table used for the luminance consists of "lower" values, such that the dividing and rounding will not lose too much information in this component. The table used for the blue and red chrominance, on the other hand, has "higher" values, as that information is not needed as much.
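A small sketch of how a decoder might pick the table per component. The file stores tables in numbered slots (up to four), and each component names the slot it uses. The slot assignment and all names below are illustrative assumptions; the tables are the example ones usually quoted from Annex K of the JPEG standard.

```python
LUMA_TABLE = [
    [16, 11, 10, 16, 24, 40, 51, 61],
    [12, 12, 14, 19, 26, 58, 60, 55],
    [14, 13, 16, 24, 40, 57, 69, 56],
    [14, 17, 22, 29, 51, 87, 80, 62],
    [18, 22, 37, 56, 68, 109, 103, 77],
    [24, 35, 55, 64, 81, 104, 113, 92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103, 99],
]
CHROMA_TABLE = [
    [17, 18, 24, 47, 99, 99, 99, 99],
    [18, 21, 26, 66, 99, 99, 99, 99],
    [24, 26, 56, 99, 99, 99, 99, 99],
    [47, 66, 99, 99, 99, 99, 99, 99],
    [99, 99, 99, 99, 99, 99, 99, 99],
    [99, 99, 99, 99, 99, 99, 99, 99],
    [99, 99, 99, 99, 99, 99, 99, 99],
    [99, 99, 99, 99, 99, 99, 99, 99],
]

# Table slots as they might appear in a file, and the slot each
# component points to (a typical but hypothetical assignment).
TABLES = {0: LUMA_TABLE, 1: CHROMA_TABLE}
TABLE_FOR_COMPONENT = {"Y": 0, "Cb": 1, "Cr": 1}

def dequantize(levels, component):
    """Multiply an 8x8 block of quantized levels by the table that
    belongs to the given component."""
    table = TABLES[TABLE_FOR_COMPONENT[component]]
    return [[levels[r][c] * table[r][c] for c in range(8)]
            for r in range(8)]

ones = [[1] * 8 for _ in range(8)]
print(dequantize(ones, "Y")[0][:3], dequantize(ones, "Cb")[0][:3])
```

In this arrangement Cb and Cr share one table, which is why many files carry exactly two.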

How to compare images and determine which has more content?

Goal: I want to grab the best frame from an animated GIF and use it as a static preview image. I believe the best frame is one that shows the most content - not necessarily the first or last frame.
Take this GIF for example:
--
This is the first frame:
--
Here is the 28th frame:
It's clear that the 28th frame represents the entire GIF well.
How could I programmatically determine if one frame has more pixel/content over another? Any thoughts, ideas, packages/modules, or articles that you can point me to would be greatly appreciated.
One straightforward way this could be accomplished would be to estimate the entropy of each image and choose the frame with maximal entropy.
In information theory, entropy can be thought of as the "randomness" of the image. An image of a single color is very predictable; the flatter the distribution of pixel values, the more random the image. This is highly related to the compression method described by Arthur-R, as entropy is the lower bound on how much data can be losslessly compressed.
Estimating Entropy
One way to estimate the entropy is to approximate the probability mass function for pixel intensities using a histogram. To generate the plot below I first convert the image to grayscale, then compute the histogram using a bin spacing of 1 (for pixel values from 0 to 255). Then, normalize the histogram so that the bins sum to 1. This normalized histogram is an approximation of the pixel probability mass function.
Using this probability mass function we can easily estimate the entropy of the grayscale image which is described by the following equation
H = E[-log(p(x))]
Where H is entropy, E is the expected value, and p(x) is the probability that any given pixel takes the value x.
Programmatically H can be estimated by simply computing -p(x)*log(p(x)) for each value p(x) in the histogram and then adding them together.
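A minimal sketch of that computation in Python, assuming each frame has already been converted to a flat sequence of grayscale values from 0 to 255. The function names are made up for illustration, and decoding the GIF frames is left out.

```python
import math
from collections import Counter

def histogram_entropy(pixels):
    """Estimate entropy in bits per pixel from the normalized histogram."""
    counts = Counter(pixels)
    n = len(pixels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def best_frame(frames):
    """Index of the frame with the highest estimated entropy."""
    return max(range(len(frames)),
               key=lambda i: histogram_entropy(frames[i]))

frames = [
    [0] * 10,           # a single color
    [0, 1] * 5,         # two equally likely values
    list(range(10)),    # ten distinct values
]
print([round(histogram_entropy(f), 2) for f in frames], best_frame(frames))
```

Using log base 2 gives the result in bits, so an 8-bit grayscale image can score at most 8.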
[Plot: entropy vs. frame number for your example], with frame 21 (the 22nd frame) having the highest entropy.
Observations
The entropy computed here is not equal to the true entropy of the image because it makes the assumption that each pixel is independently sampled from the same distribution. To get the true entropy we would need to know the joint distribution of the image, which we won't be able to know without understanding the underlying random process that generated the images (which would include human interaction). However, I don't think the true entropy would be very useful, and this measure should give a reasonable estimate of how much content is in the image.
This method will fail if some not-so-interesting frame contains much more noise (randomly colored pixels) than the most interesting frame, because noise results in a high entropy. For example, the following image is pure uniform noise and therefore has maximum entropy (H = 8 bits), i.e. no compression is possible.
Ruby Implementation
I don't know Ruby, but it looks like one of the answers to this question refers to a package for computing the entropy of an image.
From m. simon borg's comment
FWIW, using Ruby's File.size() returns 1904 bytes for the 28th frame
image and 946 bytes for the first frame image – m. simon borg
File.size() should be roughly proportional to entropy.
As an aside, if you check the size of the 200x200 noise image on disk you will see that the file is 40,345 bytes even after compression, but the uncompressed data is only 40,000 bytes. Information theory tells us that no compression scheme can ever losslessly compress such images on average.
There are a couple ways I might go about this. My first thought (this may not be the most practical solution, but it seems theoretically interesting!) would be to try losslessly compressing each frame, and in theory, the frame with the least repeatable content (and thus the most unique content) would have the largest size, so you could then compare the size in bytes/bits of each compressed frame. The accuracy of this solution would probably be highly dependent on the photo passed in.
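As a rough sketch of the compression idea, the snippet below uses zlib from the Python standard library as the lossless compressor and treats each frame as raw bytes. The function names are hypothetical and the two frames are synthetic stand-ins for real decoded GIF frames.

```python
import random
import zlib

def compressed_size(frame_bytes):
    """Size in bytes of the frame after lossless zlib compression."""
    return len(zlib.compress(frame_bytes, 9))

def richest_frame(frames):
    """Index of the frame whose compressed form is largest."""
    return max(range(len(frames)),
               key=lambda i: compressed_size(frames[i]))

random.seed(0)
flat = bytes(40_000)                                          # one color
noise = bytes(random.getrandbits(8) for _ in range(40_000))   # pure noise
print(compressed_size(flat), compressed_size(noise),
      richest_frame([flat, noise]))
```

As with the entropy estimate, a noisy frame will win over a meaningful one, so this is only a heuristic.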
A more realistic/ practical solution might be to grab the predominant color in the GIF (so in the example, the background color), and then iterate through each pixel and increment a counter each time the color of the current pixel doesn't match the color of the background.
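A sketch of the background-counting idea, assuming each frame is a flat list of pixel values (palette indices or color tuples) and taking the most frequent value in the frame as the background. The function names are invented for illustration.

```python
from collections import Counter

def non_background_count(pixels):
    """Number of pixels that differ from the most common value."""
    if not pixels:
        return 0
    _background, background_count = Counter(pixels).most_common(1)[0]
    return len(pixels) - background_count

def frame_with_most_content(frames):
    """Index of the frame with the most non-background pixels."""
    return max(range(len(frames)),
               key=lambda i: non_background_count(frames[i]))

frames = [
    [0, 0, 0, 0, 0, 0],   # background only
    [0, 0, 0, 1, 2, 0],   # two foreground pixels
    [0, 3, 3, 1, 2, 0],   # four foreground pixels
]
print(frame_with_most_content(frames))
```

Taking the background from each frame separately is a simplification; computing it once over the whole GIF would follow the description more closely.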
I'm thinking about some more optimized/ sample based solutions, and will edit my response to include them a little later, if performance is a concern for you.
I think you could use an API, such as a RESTful web service, to do that, because it is quite hard to do without one.
For example, these are some well-known APIs:
https://cloud.google.com/vision/
https://www.clarifai.com/
https://vize.ai
https://azure.microsoft.com/en-us/services/cognitive-services/computer-vision/
https://imagga.com

Why is GIF image size more than the sum of individual frame size?

I just tried to convert few JPEGs to a GIF image using some online services. For a collection of 1.8 MB of randomly selected JPEGs, the resultant GIF was about 3.8 MB in size (without any extra compression enabled).
I understand GIF is lossless compression. And that's why I expected the resultant output to be around 1.8 MB (input size). Can someone please help me understand what's happening with this extra space ?
Additionally, is there a better way to bundle a set of images which are similar to each other (for transmission) ?
JPEG is a lossy compressed format, but it is still compressed. When it is decompressed into raw pixel data and then recompressed into GIF, it is logical to get a bigger size.
GIF is worse as a compression method for photographs; it is mostly suited to flat-colored drawings. It uses LZW, a dictionary-based scheme (not plain RLE [run-length encoding], as I first remembered), which replaces pixel sequences it has already seen with short codes, so you need lots of repeated sequences, such as runs of same-colored pixels, to get good compression.
If you have images that are similar to each other, maybe you should consider packing them as consecutive frames (the more similar should be closer) of a video stream and use some lossless compressor (or even risk it with a lossy one) for video, but maybe this is overkill.
If you have a color image, multiply the width x height x 3. That is the normal size of the uncompressed image data.
GIF and JPEG are two different methods for compressing that data. GIF uses the LZW method of compression. In that method the encoder creates a dictionary of previously encountered data sequences. The encoder writes codes representing sequences rather than the actual data. This can actually result in a file larger than the actual image data if the encoder cannot find such sequences.
These GIF sequences are more likely to occur in drawings where the same colors are used, rather than in photographic images where the color varies subtly throughout.
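The dictionary mechanism can be sketched with a textbook LZW encoder. This is not the exact GIF variant: GIF packs the codes with a variable bit width (starting at 9 bits for 8-bit data) and uses clear and end codes, all of which are left out here.

```python
def lzw_encode(data):
    """Return the list of LZW codes for a bytes object."""
    table = {bytes([i]): i for i in range(256)}   # single-byte entries
    next_code = 256
    current = b""
    codes = []
    for byte in data:
        candidate = current + bytes([byte])
        if candidate in table:
            current = candidate                   # keep extending the match
        else:
            codes.append(table[current])          # emit longest known sequence
            table[candidate] = next_code          # learn the new sequence
            next_code += 1
            current = bytes([byte])
    if current:
        codes.append(table[current])
    return codes

flat = bytes([200]) * 1000      # like a flat-colored drawing
varied = bytes(range(256))      # no repeated sequences at all
print(len(lzw_encode(flat)), len(lzw_encode(varied)))
```

The flat input needs only a few dozen codes, while the input with no repeats needs one code per byte. Since each code is wider than a byte, that second case comes out larger than the raw data.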
JPEG uses a series of compression steps. These have the drawback that you might not get out exactly what you put in. The first of these is conversion from RGB to YCbCr. There is not a 1-to-1 mapping between these colorspaces so modification can occur there.
Next is subsampling. The reason for going to YCbCr is that you can sample the Cb and Cr components at a lower rate than the Y component and still get a good representation of the original image. If you keep 1 Cb and 1 Cr sample for every 4 Y samples, you reduce the amount of data to compress by half.
Next is the discrete cosine transform. This is a real number calculation performed on integers. That can produce rounding errors.
Next is quantization. In this step less significant values from the DCT are discarded (less data to compress). It also introduces errors from integer division.

Is re-encoding JPEG images an idempotent operation?

I am aware that JPEG compression is lossy.
I have 2 questions:
Given an operation T:
1. Take a JPEG-80 image
2. Decode it to a byte buffer
3. Encode given byte buffer as JPEG-80
Is T an idempotent operation in terms of visual quality?
Or will the quality of the image keep degrading as I repeat T?
Does the same hold true for the JPEG-XR codec?
Thank you!
Edit:
Since there have been conflicting answers, it would be great if you could provide references!
It's not guaranteed, but it may happen. Especially if you repeat the encode -> decode -> encode -> decode process enough times, it will eventually settle on a fixpoint and stop losing quality further (as long as you stick to the same quality and same encoder).
JPEG encoding is done in several steps:
1. RGB to YUV conversion
2. DCT (change into frequency domain)
3. Quantization (throwing away bits of the DCT)
4. Lossless compression
And decoding is the same process backwards.
Steps 1 and 2 have rounding errors (especially in speed-optimized encoders using integer math), so for idempotent re-encoding you need to be lucky to get encoding and decoding rounding errors to be small or cancel each other out.
Step 3, which is the major lossy step, is actually idempotent. If your decoded pixels convert to a similar-enough DCT, it will quantize to the same data again!
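The idempotence of the quantization step taken in isolation can be checked with a short sketch. It assumes round-to-nearest quantization of a single coefficient and ignores the color conversion and DCT rounding discussed above; the function names are illustrative.

```python
def quantize(coefficient, q):
    """Divide by the table entry and round to the nearest integer."""
    return round(coefficient / q)

def dequantize(level, q):
    """Reconstruct the coefficient the decoder would see."""
    return level * q

def requantized(coefficient, q):
    """Quantize, dequantize, and quantize again."""
    return quantize(dequantize(quantize(coefficient, q), q), q)

# Re-quantizing an already quantized coefficient gives the same level back.
stable = all(requantized(c, q) == quantize(c, q)
             for q in (1, 2, 16, 99)
             for c in range(-1000, 1001))
print(stable)
```

This only holds when the same table entry is used both times, which matches the "same quality and same encoder" condition in the answer.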
JPEG XR also uses YUV, so it may suffer some rounding errors, but OTOH instead of DCT it uses a different transform that can be computed without rounding errors, so it should be easier to round-trip JPEG-XR than other formats.
By definition, a lossy operation discards data by simplifying the representation in a way that (ideally) isn't noticeable to the end user. However, the encoder has no magic method for determining which pixels are important and which aren't, so it encodes all pixels equally, even if they are artifacts!
In other words, the encoder will treat the lossily-compressed image the same as a lossless image. The lossy image will be further simplified, discarding additional data in the process, because for all the encoder knows, the user intends to represent the artifacts.
Here are some examples of JPEG generation loss:
http://vimeo.com/3750507
http://en.wikipedia.org/wiki/File:JPEG_Generarion_Loss_rotating_90_(stitch_of_0,100,200,500,900,2000_times).png

Is JPEG lossless when quality is set to 100?

I understand that JPEG is a lossy compression standard, and that the 'quality' factor controls the degree of compression and thus the amount of data loss.
But when the quality number is set to 100, is the resulting jpeg lossless?
As correctly answered above, using a "typical" JPEG encoder at quality 100 does not give you lossless compression. Lossless JPEG encoding exists, but it's different in nature and seldom used.
I'm just posting to say why quality 100 does not mean lossless.
In JPEG compression, information is mostly lost during the DCT coefficient quantization step (8-by-8 coefficient blocks are divided by an 8-by-8 quantization table, so they become smaller --> 'more compressible'). When you set JPEG quality to 100, no real quantization takes place (because the quantization table will be all 1s, at least with standard IJG-JPEG tables), so in fact you don't lose information here.
However, there are mainly two factors leading to information loss even when no quantization takes place:
Typically, JPEG compression reduces color information (because the human visual system is less sensitive to that than to luminance). Therefore, even at quality 100 you may be carrying out chrominance subsampling (which means dropping half or more of the Cb and Cr samples). When this happens, information is lost, even when no quantization happens. However, you can tell the encoder to preserve full chrominance (so-called 4:4:4 color sampling).
Beyond that, JPEG encoding implies going to the DCT domain, which causes rounding of coefficients. Rounding discards some information. This will happen regardless of all other options.
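The rounding effect can be explored with a small pure-Python round trip: DCT, round every coefficient to an integer (which is what an all-1s quantization table amounts to), inverse DCT, round back to pixels. This is a sketch under simplifying assumptions: a floating-point DCT, no level shift, no color conversion, and made-up function names.

```python
import math
import random

_COS = [[math.cos((2 * n + 1) * k * math.pi / 16) for n in range(8)]
        for k in range(8)]

def _c(k):
    return math.sqrt(1 / 8) if k == 0 else math.sqrt(2 / 8)

def dct(block):
    """Orthonormal 2-D DCT-II of an 8x8 block."""
    return [[_c(u) * _c(v) * sum(block[y][x] * _COS[u][y] * _COS[v][x]
                                 for y in range(8) for x in range(8))
             for v in range(8)] for u in range(8)]

def idct(coefs):
    """Inverse of dct()."""
    return [[sum(_c(u) * _c(v) * coefs[u][v] * _COS[u][y] * _COS[v][x]
                 for u in range(8) for v in range(8))
             for x in range(8)] for y in range(8)]

def round_trip(block):
    """DCT, round coefficients to integers, inverse DCT, round to pixels."""
    coefs = [[round(value) for value in row] for row in dct(block)]
    return [[min(255, max(0, round(p))) for p in row] for row in idct(coefs)]

random.seed(1)
block = [[random.randrange(256) for _ in range(8)] for _ in range(8)]
result = round_trip(block)
changed = sum(result[y][x] != block[y][x]
              for y in range(8) for x in range(8))
print(changed, "of 64 pixels changed after one round trip")
```

Any changes stay within a few levels per pixel, which is why the loss at quality 100 is hard to see but still shows up in a difference image.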
JPEG is lossy regardless of the setting. At 100, you just get the LEAST loss possible.
It's easy enough to test. Whip up a simple .bmp, compress that to a q=100 jpeg, then re-extract back to a .bmp. Use Gimp/Photoshop to do a "difference" of the two bitmaps, and you'll see the lossiness - it'll be much less noticeable than on a q=50 or q=1 conversion, but still be present.
There is a lossless form of JPEG but it is not widely supported and you do not get it by tweaking the quality setting - it's an entirely different process.
According to Wikipedia, no.
JPEG at quality 100 has a compression ratio of 2.6:1. The compression method is usually lossy, meaning that some original image information is lost and cannot be restored, possibly affecting image quality.
There is an optional lossless mode defined in the JPEG standard; however, this mode is not widely supported in products.

Resources