From a large collection of jpeg images, I want to identify those more likely to be simple logos or text (as opposed to camera pictures). One identifying characteristic would be low color count. I expect most to have been created with a drawing program.
If a jpeg image has a palette, it's simple to get a color count. But I expect most files to be 24-bit color images. There's no restriction on image size.
I suppose I could create an array of 2^24 (16M) integers, iterate through every pixel and increment the count for that 24-bit color. Yuck. Then I would count the non-zero entries. But if the JPEG compression messes with the original colors I could end up counting a lot of unique pixels, which might be hard to distinguish from a photo. (Maybe I could convert each pixel to YUV colorspace and keep fewer counts.)
Any better ideas? Library suggestions? Humorous condescensions?
Sample 10000 random coordinates and make a histogram, then analyze the histogram.
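A minimal sketch of that sampling idea in pure Python (in practice you would read pixels via PIL's `Image.getpixel` or a NumPy array; the function names and thresholds here are illustrative, not from any library):

```python
import random
from collections import Counter

def sample_color_histogram(get_pixel, width, height, n_samples=10000, seed=0):
    """Sample random coordinates and histogram the colors seen.

    get_pixel(x, y) -> (r, g, b); with PIL this could be img.getpixel.
    """
    rng = random.Random(seed)
    hist = Counter()
    for _ in range(n_samples):
        x = rng.randrange(width)
        y = rng.randrange(height)
        hist[get_pixel(x, y)] += 1
    return hist

def looks_like_logo(hist, max_colors=64, top_share=0.9):
    """Heuristic: few distinct colors, or the most common colors dominate."""
    total = sum(hist.values())
    top = sum(count for _, count in hist.most_common(max_colors))
    return len(hist) <= max_colors or top / total >= top_share

# Synthetic stand-ins: a two-color "logo" and a noisy "photo".
logo = lambda x, y: (255, 255, 255) if (x + y) % 7 else (0, 0, 0)
photo = lambda x, y: ((x * 37) % 256, (y * 91) % 256, (x * y) % 256)
```

The dominance test (`top_share`) is there so that JPEG noise around a few base colors doesn't push a logo over the distinct-color limit.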
I checked and roughly understood the JPEG encoding algorithm from the Wikipedia page about it.
Transform to YCbCr, downsample
Split into 8x8 blocks
Apply DCT on blocks
Divide resulting matrices by quantization table entries
Entropy encoding
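The quantization step (divide by the table entries and round) can be made concrete. Here it is applied to a toy coefficient block using the example luminance table from Annex K of the JPEG standard; note how every high-frequency coefficient collapses to zero:

```python
# Example luminance quantization table from Annex K of the JPEG standard.
QT_LUMA = [
    [16, 11, 10, 16, 24, 40, 51, 61],
    [12, 12, 14, 19, 26, 58, 60, 55],
    [14, 13, 16, 24, 40, 57, 69, 56],
    [14, 17, 22, 29, 51, 87, 80, 62],
    [18, 22, 37, 56, 68, 109, 103, 77],
    [24, 35, 55, 64, 81, 104, 113, 92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103, 99],
]

def quantize(dct_block, table):
    """Encoder side: divide each coefficient by its table entry and round."""
    return [[round(dct_block[i][j] / table[i][j]) for j in range(8)]
            for i in range(8)]

def dequantize(q_block, table):
    """Decoder side: multiply back; the rounding loss is not recovered."""
    return [[q_block[i][j] * table[i][j] for j in range(8)]
            for i in range(8)]
```

Running `dequantize(quantize(block, QT_LUMA), QT_LUMA)` does not give back the original block, which is where most of JPEG's loss comes from.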
I understand that the quantization table in a file depends on what created the image, e.g. a camera manufacturer likely has their own proprietary QT algorithms, Photoshop etc. have their own QTs, there are public ones, etc.
Now, if one opens 'real' JPEG files they may contain several quantization tables. How can this be? I'd assume the decoding algorithm looks like this:
Decode entropy encoding, receive blocks
Multiply blocks by quantization table entries
revert other operations
What does the second/third/... QT do/when is it used? Is there an upper limit on the number of QTs in a JPEG file? When does it happen that a second QT is added to a JPEG file?
The quantization tables are used for different color components.
Like you already know, in a first step the image is transformed into the YCbCr color space. In this color space you have three components: luminance (Y), chrominance blue (Cb) and chrominance red (Cr). As the human eye is less sensitive to color but very sensitive to brightness, different quantization tables are used for the different components.
The quantization table used for the luminance consists of "lower" values, so that the dividing and rounding will not lose too much information on this component. Blue and red on the other hand get "higher" values, as that information is not needed as much.
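You can see this for yourself by pulling the DQT (0xFFDB) marker segments out of a file. A minimal sketch of such a parser (the function name is mine; this is not a full JPEG reader):

```python
def read_quantization_tables(jpeg_bytes):
    """Return {table_id: [64 entries in zigzag order]} from the DQT segments.

    Walks marker segments up to the start of scan; handles 8-bit
    (precision 0) and 16-bit (precision 1) tables.
    """
    tables = {}
    i = 2  # skip the SOI marker (FF D8)
    while i + 4 <= len(jpeg_bytes) and jpeg_bytes[i] == 0xFF:
        marker = jpeg_bytes[i + 1]
        if marker in (0xD9, 0xDA):  # EOI, or SOS: entropy-coded data follows
            break
        length = (jpeg_bytes[i + 2] << 8) | jpeg_bytes[i + 3]
        if marker == 0xDB:  # DQT: one or more tables packed into the segment
            j, end = i + 4, i + 2 + length
            while j < end:
                precision, table_id = jpeg_bytes[j] >> 4, jpeg_bytes[j] & 0x0F
                j += 1
                if precision == 0:  # 8-bit entries
                    tables[table_id] = list(jpeg_bytes[j:j + 64])
                    j += 64
                else:  # 16-bit entries, big-endian pairs
                    tables[table_id] = [
                        (jpeg_bytes[j + 2 * k] << 8) | jpeg_bytes[j + 2 * k + 1]
                        for k in range(64)
                    ]
                    j += 128
        i += 2 + length
    return tables
```

On a typical baseline camera JPEG this returns two tables: id 0 for luminance and id 1 shared by both chrominance components.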
Imagemagick can invert the colors of a JPEG like so:
mogrify -negate image.jpg
However, that's not lossless. My intuition says that color inversion should be doable in a lossless fashion, at least for grayscale images, but I know hardly anything about JPEG. Hence my questions:
Is lossless JPEG grayscale inversion possible in theory?
If so, is libjpeg or any other software out there able to do it?
It's not lossless because there is not a 1:1 match between the gamuts of the RGB and YCbCr colorspaces used in JPEG. If you start with an RGB value that is within the YCbCr gamut and flip it, you may get a value outside the YCbCr range that ends up getting clamped.
JPEG encodes images as entropy-coded, quantized DCT coefficients: within each MCU (minimum coded unit, built from 8x8 blocks of DCT values) the coefficients are entropy coded, and the DC coefficients are coded as deltas from block to block. To do something like inverting the pixel values (even if grayscale) you would have to decode the entropy-coded bits, modify the values and re-encode them, since you can't "invert" entropy-coded DCT values directly. There isn't a one-to-one matching of entropy-coded lengths for each value, because the bits are encoded based on statistical probability and the magnitude/sign of the quantized values.

The other problem is that the coded DCT values live in the frequency domain. I'm not a mathematician, so I can't say for sure if there is a simple way to invert the spatial-domain values in the frequency domain, but I think at best it's really complicated, and the quantization of the values will likely interfere with a simple solution.

The kinds of things you can do losslessly with JPEG files are rotating, cropping and less well known operations such as extracting a grayscale image from a color image. Individual pixel values can't be modified without decoding and re-encoding the MCUs, which incurs the "loss" in JPEG quality.
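As a small numerical aside on the frequency-domain part: the DCT is linear, and 255 − f is a constant block minus f, so inverting a grayscale block only moves the DC coefficient and flips the sign of every AC coefficient. A quick check with a hand-rolled 8x8 DCT-II (this says nothing about the entropy-recoding obstacles described above, which still apply):

```python
import math

def dct2(block):
    """Naive 2D DCT-II of an 8x8 block, orthonormal scaling."""
    n = 8
    out = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            cu = math.sqrt(1 / n) if u == 0 else math.sqrt(2 / n)
            cv = math.sqrt(1 / n) if v == 0 else math.sqrt(2 / n)
            s = 0.0
            for x in range(n):
                for y in range(n):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * n)))
            out[u][v] = cu * cv * s
    return out

# An arbitrary 8x8 grayscale block and its inversion.
f = [[(3 * x + 5 * y) % 256 for y in range(8)] for x in range(8)]
g = [[255 - value for value in row] for row in f]

F, G = dct2(f), dct2(g)
# Every AC coefficient of G is the negation of F's;
# only the DC term G[0][0] moves, to 8*255 - F[0][0].
```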
I just tried to convert a few JPEGs to a GIF image using some online services. For a collection of 1.8 MB of randomly selected JPEGs, the resulting GIF was about 3.8 MB in size (without any extra compression enabled).
I understand GIF is lossless compression, and that's why I expected the resulting output to be around 1.8 MB (the input size). Can someone please help me understand what's happening with this extra space?
Additionally, is there a better way to bundle a set of images which are similar to each other (for transmission) ?
JPEG is a lossy format, but it is still compressed. When it is decompressed into raw pixel data and then recompressed as GIF, it is no surprise that you get a bigger size.
GIF is worse as a compression method for photographs; it is suited mostly for flat-colored drawings. It uses LZW, a dictionary coder: the compressed file contains codes standing for previously seen sequences of pixels, so you need lots of same-colored pixels in sequence to get good compression.
If you have images that are similar to each other, maybe you should consider packing them as consecutive frames (the more similar ones closer together) of a video stream and using a lossless video compressor (or even risking a lossy one), but maybe this is overkill.
If you have a color image, multiply the width x height x 3. That is the normal size of the uncompressed image data.
GIF and JPEG are two different methods for compressing that data. GIF uses the LZW method of compression. In that method the encoder builds a dictionary of previously encountered data sequences and writes codes representing those sequences rather than the actual data. This can actually result in a file larger than the actual image data if the encoder cannot find such sequences.
These sequences are more likely to occur in drawings, where the same colors are reused, than in photographic images, where the color varies subtly throughout.
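A toy dictionary coder in the spirit of LZW makes the point (GIF's real variant adds variable-width codes and clear codes, omitted here): repetitive "drawing-like" data produces far fewer codes than "photo-like" data of the same length.

```python
import random

def lzw_encode(data):
    """Greedy LZW: emit one code per longest known sequence, grow the dictionary."""
    dictionary = {bytes([i]): i for i in range(256)}
    codes, current = [], b""
    for byte in data:
        candidate = current + bytes([byte])
        if candidate in dictionary:
            current = candidate
        else:
            codes.append(dictionary[current])
            dictionary[candidate] = len(dictionary)
            current = bytes([byte])
    if current:
        codes.append(dictionary[current])
    return codes

flat = bytes([7]) * 1000                 # a long run of identical "pixels"
rng = random.Random(0)
noisy = bytes(rng.randrange(256) for _ in range(1000))  # varies everywhere

# len(lzw_encode(flat)) is a few dozen codes;
# len(lzw_encode(noisy)) stays close to one code per input byte.
```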
JPEG uses a series of compression steps. These have the drawback that you might not get out exactly what you put in. The first of these is conversion from RGB to YCbCr. There is not a 1-to-1 mapping between these colorspaces so modification can occur there.
Next is subsampling. The reason for going to YCbCr is that you can sample the Cb and Cr components at a lower rate than the Y component and still get a good representation of the original image. If you keep 1 Cb and 1 Cr sample for every 4 Y samples (4:2:0 subsampling), you reduce the amount of data to compress by half.
Next is the discrete cosine transform. This is a real number calculation performed on integers. That can produce rounding errors.
Next is quantization. In this step less significant values from the DCT are discarded (less data to compress). It also introduces errors from integer division.
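The colorspace step alone is easy to demonstrate. Using the usual full-range BT.601 equations (as in JFIF), rounding YCbCr to integers already keeps some RGB triples from surviving a round trip; exactly which triples fail depends on the rounding rule:

```python
def rgb_to_ycbcr(r, g, b):
    """Full-range BT.601 conversion as used by JFIF, rounded to integers."""
    y  = round(0.299 * r + 0.587 * g + 0.114 * b)
    cb = round(128 - 0.168736 * r - 0.331264 * g + 0.5 * b)
    cr = round(128 + 0.5 * r - 0.418688 * g - 0.081312 * b)
    return y, cb, cr

def ycbcr_to_rgb(y, cb, cr):
    """Inverse conversion, with the clamp to 0..255 that decoders apply."""
    clamp = lambda v: max(0, min(255, round(v)))
    r = clamp(y + 1.402 * (cr - 128))
    g = clamp(y - 0.344136 * (cb - 128) - 0.714136 * (cr - 128))
    b = clamp(y + 1.772 * (cb - 128))
    return r, g, b

# Grays round-trip exactly; saturated colors like pure green generally do not.
```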
I'm very interested in understanding how graphic file formats (PNG, JPG, GIF) work. Are there any code examples that demonstrate how these files are made and also how they are interpreted (viewed in a browser)?
Regardless of which graphic file format you are working with, you need to understand the basic nature that all graphic files have in common.
File Header
File Type, Version, (Time & Date Stamp - if included)
Possible data structure/s info or chunks
Flags for which color type to be expected, if compression is available and which type, byte order (endian), has transparency, and other various flags.
Image Data Info
Width normally in pixels sometimes in pels, bits or bytes
Height normally in pixels sometimes in pels, bits or bytes
Bits Per Pixel or Pixel Depth
Image Size in Bytes: numPixelsWidth * numPixelsHeight * ((bits or bytes) for each pixel)
Color Type: - Each Pixel has color data which can vary
Gray Scale
Palette
Color RGB
Color RGBA
Possible Others
If Compression Is Present Which Coding and Encoding Is Used
The actual image data
Once you understand this basic structure, parsing image files becomes easier once you know the specification of the file structure you are working with. When you know how many bytes all the headers and chunks occupy, you can advance your file pointer to the data structure that reads in or writes out all the pixel (color) data.

In many cases the pixel data is 24 bits per pixel (RGB) or 32 bits per pixel (RGBA: red, green, blue and alpha), with each channel taking 8 bits, one byte, the same as an unsigned char. This is represented in either a structure or a two-dimensional array. Either way, once you know the file's structure and how to read in the actual image or color data, you can easily store it into a single array. What you do with it from there depends on your application's needs.
The most detailed information can be obtained by reading the file format specification and implementing a parser in the language you know best.
A good way would be to read the format and transform it into an array of four-byte tuples (RGBA: the red, green, blue and alpha parts of a color). This gives you an intermediate format for easy conversion between formats. At the same time, most APIs support displaying this raw format.
A good format to get started with is BMP. As old as it is, if this is your first encounter with writing a parser, it is a safe and 'easy' format. A good second format is PNG. Start with the uncompressed variations and later add the compression.
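As a first step in that direction, the fixed-size BMP headers can be pulled apart with `struct`; the field offsets follow the BITMAPFILEHEADER/BITMAPINFOHEADER layout (a sketch covering only those two headers, not the palette or compressed variants):

```python
import struct

def read_bmp_header(data):
    """Parse the BITMAPFILEHEADER + BITMAPINFOHEADER of a BMP file."""
    if data[:2] != b"BM":
        raise ValueError("not a BMP file")
    file_size = struct.unpack_from("<I", data, 2)[0]
    pixel_offset = struct.unpack_from("<I", data, 10)[0]   # where pixel data starts
    width, height = struct.unpack_from("<ii", data, 18)
    bits_per_pixel = struct.unpack_from("<H", data, 28)[0]
    return {
        "file_size": file_size,
        "pixel_offset": pixel_offset,
        "width": width,
        "height": abs(height),   # a negative height means top-down row order
        "bits_per_pixel": bits_per_pixel,
        # each pixel row is padded to a multiple of 4 bytes
        "row_stride": ((width * bits_per_pixel + 31) // 32) * 4,
    }
```

Once you have `pixel_offset` and `row_stride`, reading the raw pixel rows is a matter of seeking and slicing.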
Next step is TGA to learn reading chunks or JPG to learn more about compression.
Extra tip: Some implementations of writers contain(ed) errors causing images to be in violation of the format. Others added extra features that never made it to the official specs. When writing a parser this can be a real pain. When you are running into problems always second guess the image you are trying to read. A good binary/hex file reader/editor can be a very helpful tool. I used AXE, if I remember correctly it allows you to overlay the hex codes with a format so you can quickly recognize the header and chunks.
How can I see the color space of my image with OpenCV?
I would like to be sure it is RGB before converting it to another one using the cvCvtColor() function.
thanks
Unfortunately, OpenCV doesn't provide any sort of indication as to the color space in the IplImage structure, so if you blindly pick up an IplImage from somewhere there is just no way to know how it was encoded. Furthermore, no algorithm can definitively tell you if an image should be interpreted as HSV vs. RGB - it's all just a bunch of bytes to the machine (should this be HSV or RGB?). I recommend you wrap your IplImages in another struct (or even a C++ class with templates!) to help you keep track of this information. If you're really desperate and you're dealing only with a certain type of images (outdoor scenes, offices, faces, etc.) you could try computing some statistics on your images (e.g. build histogram statistics for natural RGB images and some for natural HSV images), and then try to classify your totally unknown image by comparing which color space your image is closer to.
txandi makes an interesting point. OpenCV has a BGR colorspace which is used by default. This is similar to the RGB colorspace except that the B and R channels are physically switched in the image. If the physical channel ordering is important to you, you will need to convert your image with this function: cvCvtColor(defaultBGR, imageRGB, CV_BGR2RGB).
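In the modern cv2 API the equivalent call is `cv2.cvtColor(img, cv2.COLOR_BGR2RGB)`. Since OpenCV images are plain NumPy arrays, reversing the last axis performs the same channel swap; a numpy-only sketch (no OpenCV needed to run it):

```python
import numpy as np

# A 2x2 "image" in OpenCV's default BGR channel order:
# top-left blue, top-right green, bottom-left red, bottom-right white.
bgr = np.array([[[255, 0, 0], [0, 255, 0]],
                [[0, 0, 255], [255, 255, 255]]], dtype=np.uint8)

# Reversing the channel axis swaps B and R, giving RGB order.
rgb = bgr[:, :, ::-1]
```

Note this only reorders the bytes; it cannot tell you which order the image was in to begin with, which is the point made above.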
As rcv said, there is no method to programmatically detect the color space by inspecting the three color channels, unless you have a priori knowledge of the image content (e.g., there is a marker in the image whose color is known). If you will be accepting images from unknown sources, you must allow the user to specify the color space of their image. A good default would be to assume RGB.
If you modify any of the pixel colors before display, and you are using a non-OpenCV viewer, you should probably use cvCvtColor(src,dst,CV_BGR2RGB) after you have finished running all of your color filters. If you are using OpenCV for the viewer or will be saving the images out to file, you should make sure they are in BGR color space.
The IplImage struct has a field named colorModel consisting of 4 chars. Unfortunately, OpenCV ignores this field. But you can use this field to keep track of different color models.
I basically split the channels and display each one to figure out the color space of the image I'm using. It may not be the best way, but it works for me.
For a detailed explanation, you can refer to the link below.
https://dryrungarage.wordpress.com/2018/03/11/image-processing-basics/