When I search for YUV/YCbCr conversion articles, the articles always include how the conversion matrix is generated.
I'm trying to understand how the YIQ conversion matrix is generated, but I can't find any sources for it; I only find the pre-baked conversion matrix.
Can anyone explain how the YIQ matrix is created and how it differs from YUV/YCbCr?
The question is some years old, but I found the answer today:
YIQ is the same as YUV, with one difference: the U and V components of the YUV color space are rotated by 33° counter-clockwise:
Source: German Wikipedia, YIQ
So when you combine the matrix for the RGB to YUV conversion and this one, you get the YIQ matrix.
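As an illustration, here is a minimal sketch of that composition, assuming the common BT.601 RGB->YUV weights and the 33° rotation convention from the English Wikipedia YIQ article (check the sign convention against your own reference):

#include <cmath>

// Sketch: RGB -> YUV (BT.601), then rotate the (U, V) plane by 33 degrees
// to obtain (I, Q). Composing the two steps gives the usual RGB -> YIQ matrix.
struct YIQ { double y, i, q; };

YIQ rgbToYiq(double r, double g, double b)        // r, g, b in [0, 1]
{
    const double y = 0.299 * r + 0.587 * g + 0.114 * b;    // luma
    const double u = 0.492 * (b - y);                       // chroma
    const double v = 0.877 * (r - y);

    const double a = 33.0 * 3.14159265358979323846 / 180.0;
    const double i = -std::sin(a) * u + std::cos(a) * v;    // rotated chroma
    const double q =  std::cos(a) * u + std::sin(a) * v;
    return { y, i, q };
}

Multiplying the rotation into the RGB->YUV matrix reproduces the familiar pre-baked coefficients (roughly I = 0.596R - 0.274G - 0.322B, Q = 0.211R - 0.523G + 0.312B).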
Wikipedia and a plethora of online resources provide detailed and abundant help with various color space conversions from/to RGB. What I need is a straight YUV->HSL/HSV conversion.
In fact, all I need is the Hue (I don't care much about the Saturation or the brightness, i.e. Lightness/Value). In other words, I just need to calculate the "color angle" for a given YUV color.
Code in any language would suffice, though my preference is C-style syntax.
Note that by YUV I mean specifically Y′UV, a.k.a. YCbCr (if that makes any difference).
While the YUV->RGB colorspace conversion is linear (i.e. it can be expressed as a matrix operation), the RGB->HSL conversion is not. Thus it is not possible to combine the two into a single operation.
Thank you Kel Solaar for confirming this for me.
For reference:
YUV(YCbCr)->RGB conversion
RGB->HSL conversion
Note that mathematically the calculation for Hue is written piecewise as the "base angle" depends on which sector the color is in and the "major color" is driven by the max(R, G, B) expression.
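To make the piecewise nature concrete, here is a rough sketch that goes full-range BT.601 YCbCr -> RGB -> Hue in one function; the 1.402 / 0.344136 / 0.714136 / 1.772 factors are assumptions on my part (studio-range or BT.709 data needs different constants):

#include <algorithm>
#include <cmath>

// Sketch: full-range BT.601 YCbCr -> RGB (linear, a matrix op), then
// RGB -> Hue in degrees (piecewise, depends on which channel is largest).
double ycbcrToHue(double Y, double Cb, double Cr)      // all in [0, 255]
{
    double r = Y + 1.402    * (Cr - 128.0);
    double g = Y - 0.344136 * (Cb - 128.0) - 0.714136 * (Cr - 128.0);
    double b = Y + 1.772    * (Cb - 128.0);

    const double mx = std::max({ r, g, b });
    const double mn = std::min({ r, g, b });
    const double d  = mx - mn;
    if (d == 0.0) return 0.0;                          // grey: hue is undefined

    double h;
    if (mx == r)      h = std::fmod((g - b) / d, 6.0); // red sector
    else if (mx == g) h = (b - r) / d + 2.0;           // green sector
    else              h = (r - g) / d + 4.0;           // blue sector

    h *= 60.0;                                         // sectors -> degrees
    return h < 0.0 ? h + 360.0 : h;
}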
I think they come from different worlds of interest. Here is a Google patent:
https://patents.google.com/patent/CN105847775A/en
Here's my problem:
I'm doing some rendering using spectral samples, and want to save an image showing the results. I am weighting my spectral power function by the CIE XYZ color matching functions to obtain an XYZ color space result. I multiply this XYZ color tuple by the matrix given by this page for converting to sRGB, and clamp the results to (0,1).
To save the image, I scale the converted tuple by 255 and cast it to bytes, and pass the array to libpng's png_write_image(). When I view a uniform intensity, pure-color spectrum rendered this way, it looks wrong; there are dark bands in the transitions between the colors. This is perhaps not surprising, because to convert from XYZ to sRGB, the color components must be raised to 2.4 after the matrix multiply (or linearly scaled if they are small enough). But if I do this, it looks worse! Only after raising to 1/2.2 does it start to look right. It seems like, in the absence of me doing anything, the decoded images are having a gamma of ~2.2 applied twice.
Here's the way I would expect it to work: I apply the matrix to XYZ, and I have a roughly energy-linear RGB tuple. I raise this to 2.2, and now have a perceptually linear color tuple. I encode these numbers as they are (thus making efficient use of the file precision), and store a field in the file that says "these bytes have been encoded with gamma 2.2". Then at image load time, the decoding system un-applies the encoded gamma, then applies the system gamma before display. (And thus from an authoring perspective, I shouldn't have to care what the viewer's system gamma is). But the results I'm getting suggest it doesn't work this way.
Worse, I have tried calling png_set_gAMA() with both 2.2 and 1/2.2 and see no difference in the image. I get similar results with png_set_sRGB() (which I believe should force the gamma to 1/2.2).
There must be something I have backwards or don't understand with regards to either how I should be converting my color values, or how PNG handles gamma and color spaces. To break this down into a few clarifying questions:
What is the color space of the byte values I am expected to pass to write_png()?
What calls, if any, must I make to libpng in order to specify the color space and gamma of the passed bytes, to ensure proper display? Why might they fail?
How does the gamma field in the PNG file relate to the exponent I have applied to the passed color channel values, if any?
If I am expected to invert a gamma curve before sending my image data (which I doubt, but seems necessary now), should that inversion include the linear part of the sRGB curve?
Furthermore, I see hints that "white point" matters in conversion between XYZ and sRGB. It is unclear to me whether the matrices in the site given above include a renormalization to D65 (it does not match Wikipedia's matrix)-- or even when such a conversion is necessary. Most of the literature I've found glosses over the details. Is there yet another step in the conversion not mentioned in the wiki article, or will this be handled automatically?
It is pretty much the way you expected. png_set_gAMA() causes libpng to write a gAMA chunk in the output PNG file. It doesn't change the pixels themselves. A PNG-compliant viewer is supposed to use the gamma value from the chunk, combined with the gamma of the display, to write the pixel intensity values properly on the display. Most decoders won't actually do the two-step (unapply the image gamma, then apply the system gamma) method you described, although the result is conceptually the same: they will combine the image gamma with the system gamma to create a lookup table, then use that table to convert the pixels in one step.
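For illustration only, such a combined table could be built roughly like this (a sketch, not libpng's actual code); note that with file_gamma = 0.45455 and display_exponent = 2.2 the combined exponent works out to 1.0 and the table is an identity mapping:

#include <cmath>
#include <cstdint>

// Sketch: combine the gAMA value from the file with the display gamma
// into a single 8-bit lookup table, as a viewer might do.
void buildGammaLut(uint8_t lut[256], double file_gamma, double display_exponent)
{
    const double combined = 1.0 / (file_gamma * display_exponent);
    for (int i = 0; i < 256; ++i) {
        const double v = std::pow(i / 255.0, combined);
        lut[i] = static_cast<uint8_t>(v * 255.0 + 0.5);
    }
}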
From what you observed (gamma=2.2 and gamma=1/2.2 behaving the same), it appears that you are using a viewer that doesn't do anything with the PNG gAMA chunk data.
You said:
because to convert from XYZ to sRGB, the color components must be raised to 2.4 after the matrix multiply...
No, this is incorrect. Going from linear (XYZ) to sRGB, you do NOT raise to 2.4 or 2.2; that is for going FROM sRGB to linear.
Going from linear to sRGB you raise to ^(1/2.2), or if using the sRGB piecewise curve you'll see 1/2.4; the effective gamma you are applying is ^0.45455.
On the Wikipedia page you linked, that is the FORWARD transformation, i.e. from XYZ to sRGB.
That of course comes after the correct matrix is applied. Assuming everything is in D65, then:
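Here is a minimal sketch of that whole forward path, using the commonly published D65 sRGB matrix and the IEC 61966-2-1 piecewise curve (double-check the constants against your own reference):

#include <algorithm>
#include <cmath>

// Sketch: XYZ (D65) -> linear sRGB via the matrix, THEN the forward
// sRGB transfer curve. The curve is applied last, after the matrix.
static double encodeSrgb(double c)                 // linear -> sRGB, c in [0, 1]
{
    return (c <= 0.0031308) ? 12.92 * c
                            : 1.055 * std::pow(c, 1.0 / 2.4) - 0.055;
}

void xyzToSrgb(double X, double Y, double Z, double rgb[3])
{
    // Linear-light RGB; out-of-gamut colors land outside [0, 1]
    const double r =  3.2406 * X - 1.5372 * Y - 0.4986 * Z;
    const double g = -0.9689 * X + 1.8758 * Y + 0.0415 * Z;
    const double b =  0.0557 * X - 0.2040 * Y + 1.0570 * Z;

    rgb[0] = encodeSrgb(std::clamp(r, 0.0, 1.0));
    rgb[1] = encodeSrgb(std::clamp(g, 0.0, 1.0));
    rgb[2] = encodeSrgb(std::clamp(b, 0.0, 1.0));
}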
Straight Talk about Linear
Light in the real world is linear. If you triple 100 photons, you then have 300 photons. But the human eye does not see a tripling; we see only a modest increase by comparison.
This is in part why transfer curves or "gamma" are used: to make the most of the available code space in an 8 bit image (an oversimplification on my part, I know).
To do this, a linear light value is raised to the power of 0.455, and to get that sRGB value back to a linear space we raise it to the inverse, i.e. ^(1/0.455), otherwise known as ^2.2.
The use of the matrices must be done in linear space, but after transiting the matrix you need to apply the TRC or "gamma" encoding. Based on your statements, no, gamma is not being applied twice; you are simply going the wrong way.
You wrote: "It seems like, in the absence of me doing anything, the decoded images are having a gamma of ~2.2 applied twice."
I think your monitor (the hardware, or your system's ICC profile) already has a gamma setting of its own.
In the official documentation of DXGI_FORMAT, it tells us that only formats with the _SRGB enumeration postfix are in the sRGB color space. I thought all other formats without this postfix were in linear space. But I found a very strange behavior of the format conversion function in the DirectXTex library. (You can download it from http://directxtex.codeplex.com/ )
At first, I exported a texture file as DXGI_FORMAT_R32G32B32A32_FLOAT using the NVIDIA Photoshop DDS plugin. Then I loaded this file with the LoadFromDDSFile() function and converted its format to DXGI_FORMAT_R16G16B16A16_UNORM with the Convert() function. (Both of these functions are provided by the DirectXTex library.)
Guess what? After the image was converted to DXGI_FORMAT_R16G16B16A16_UNORM, the brightness of all pixels also changed; the whole image became brighter than before.
If I manually convert the pixel values from sRGB space to linear space after the image has been converted to the DXGI_FORMAT_R16G16B16A16_UNORM format, the resulting pixel values are the same as the input. Therefore, I suppose that the DirectXTex library treats DXGI_FORMAT_R32G32B32A32_FLOAT as a format in linear color space and DXGI_FORMAT_R16G16B16A16_UNORM as a format in sRGB color space, and so it performed a color space transform from linear space to sRGB space. (I tried to find out why the Convert() function also converts the color space, but it is implemented with WIC, and there is no source code for it.)
So, is there a bug in the DirectXTex library? Or is this the real standard for DXGI_FORMATs? If there are different color spaces for some special DXGI_FORMATs, please tell me where I can find the specification for them.
Any help will be appreciated. Thanks!
By convention float RGB values are linear, and integer RGB values are gamma-compressed. There is no particular benefit to gamma-compressing floats since the reason for gamma is to use more bits where it is perceptually needed, and floats have sufficient (perhaps excessive) number of bits throughout and are already pseudo-log encoded (using the exponent). (source)
Note that the colorspace of integer RGB textures in DXGI which are not specifically *_SRGB is not sRGB, it is driver dependent, and usually has a fixed gamma of 0.5.
The DirectXTex library does appear to be behaving correctly. However, please note that you are also relying on the behavior of whatever software you use to both capture and display the DDS files. A better test for just DirectXTex is simply to do a round-trip conversion float->int->float in the library and compare the results numerically rather than visually.
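A rough sketch of such a round-trip test (the DirectXTex function names are as I recall them from the library's headers; treat the exact signatures and flag names as assumptions to verify against your version):

#include <DirectXTex.h>
#include <cmath>
#include <cstdio>

using namespace DirectX;

// Sketch: load the FLOAT DDS, convert to UNORM and back, then compare the
// first pixel numerically. Pure quantization error is tiny; a hidden
// colorspace conversion would show up as a large difference.
bool roundTripTest(const wchar_t* path)
{
    ScratchImage src, asInt, back;
    if (FAILED(LoadFromDDSFile(path, DDS_FLAGS_NONE, nullptr, src))) return false;
    if (FAILED(Convert(*src.GetImage(0, 0, 0), DXGI_FORMAT_R16G16B16A16_UNORM,
                       TEX_FILTER_DEFAULT, TEX_THRESHOLD_DEFAULT, asInt))) return false;
    if (FAILED(Convert(*asInt.GetImage(0, 0, 0), DXGI_FORMAT_R32G32B32A32_FLOAT,
                       TEX_FILTER_DEFAULT, TEX_THRESHOLD_DEFAULT, back))) return false;

    const float* a = reinterpret_cast<const float*>(src.GetImage(0, 0, 0)->pixels);
    const float* b = reinterpret_cast<const float*>(back.GetImage(0, 0, 0)->pixels);
    std::printf("first pixel: %f %f %f %f  vs  %f %f %f %f\n",
                a[0], a[1], a[2], a[3], b[0], b[1], b[2], b[3]);
    return std::fabs(a[0] - b[0]) < 1e-3f;
}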
I would like to do some odd geometric/odd-shape recognition, but I'm not sure how to do it.
Here's what I have so far:
Convert RGB image to Monochrome.
Otsu Threshold
Hough Transform.
I'm not sure what to do next.
For geometric information, you could do a raster-to-vector conversion to convert your image into coordinate vectors (lines and points), and finite element analysis to look for known shapes. Not easy, but libraries should be available for both.
Edit: Note that there are sometimes easier practical solutions, but they depend on the image and the types of errors. For example: removing perspective, identifying a 3D object from a 2D image, the significance of colour, etc. You often see registration markers added to the real-world object to overcome this and allow much easier identification. Looking up articles on feature extraction techniques might help.
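Not the vectorization/FEA route described above, but a common lightweight alternative in OpenCV is to approximate each contour with a polygon and classify by vertex count; a sketch, assuming the binary image from your Otsu step:

#include <opencv2/imgproc.hpp>
#include <vector>

// Sketch: count the vertices of each blob's polygonal approximation
// (3 ~ triangle, 4 ~ quadrilateral, many ~ circle-ish or irregular).
std::vector<int> countVertices(const cv::Mat& binary)
{
    cv::Mat work = binary.clone();            // findContours may modify its input
    std::vector<std::vector<cv::Point>> contours;
    cv::findContours(work, contours, cv::RETR_EXTERNAL, cv::CHAIN_APPROX_SIMPLE);

    std::vector<int> vertexCounts;
    for (const auto& c : contours) {
        std::vector<cv::Point> poly;
        const double eps = 0.02 * cv::arcLength(c, true);   // tolerance ~2% of perimeter
        cv::approxPolyDP(c, poly, eps, true);
        vertexCounts.push_back(static_cast<int>(poly.size()));
    }
    return vertexCounts;
}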
How do I convert the RGB values of a pixel to a single monochrome value?
I found one possible solution in the Color FAQ. The luminance component Y (from the CIE XYZ system) captures what is most perceived by humans as color in one channel. So, use those coefficients:
mono = (0.2125 * color.r) + (0.7154 * color.g) + (0.0721 * color.b);
This MSDN article uses (0.299 * color.R + 0.587 * color.G + 0.114 * color.B);
This Wikipedia article uses (0.3* color.R + 0.59 * color.G + 0.11 * color.B);
This depends on what your motivations are. If you just want to turn an arbitrary image to grayscale and have it look pretty good, the conversions in other answers to this question will do.
If you are converting color photographs to black and white, the process can be both very complicated and subjective, requiring specific tweaking for each image. For an idea what might be involved, take a look at this tutorial from Adobe for Photoshop.
Replicating this in code would be fairly involved, and would still require user intervention to get the resulting image aesthetically "perfect" (whatever that means!).
As mentioned also, a grayscale translation (note that monochromatic images need not be in grayscale) from an RGB triplet is subject to taste.
For example, you could cheat and extract only the blue component, simply throwing the red and green components away and copying the blue value in their stead. Another simple and generally OK solution would be to take the average of the pixel's RGB triplet and use that value in all three components.
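Both cheats are one-liners; a trivial sketch:

#include <cstdint>

struct Rgb { uint8_t r, g, b; };

// "Blue only" cheat: copy the blue value into all three channels.
Rgb blueOnly(Rgb c) { return { c.b, c.b, c.b }; }

// "Average" cheat: equal weight for all three channels.
Rgb averageGrey(Rgb c)
{
    const uint8_t a = static_cast<uint8_t>((c.r + c.g + c.b) / 3);
    return { a, a, a };
}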
The fact that there's a considerable market for professional and not-very-cheap-at-all-no-sirree grayscale/monochrome converter plugins for Photoshop alone tells you that the conversion is just as simple or complex as you wish.
The logic behind converting any RGB-based picture to monochrome is not a trivial linear transformation. In my opinion such a problem is better addressed by "color segmentation" techniques. You could achieve color segmentation with k-means clustering.
See reference example from MathWorks site.
https://www.mathworks.com/examples/image/mw/images-ex71219044-color-based-segmentation-using-k-means-clustering
Original picture in colours.
After converting to monochrome using k-means clustering
How does this work?
Collect all pixel values from the entire image. From an image which is W pixels wide and H pixels high, you will get W * H color values. Now, using the k-means algorithm, create 2 clusters (or bins) and throw the colours into the appropriate bins. The 2 clusters represent your black and white shades.
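A sketch of that using OpenCV's cv::kmeans (my choice of library; K = 2, each pixel repainted with its cluster centre, and the iteration/attempt counts are just reasonable guesses):

#include <opencv2/core.hpp>

// Sketch: cluster all W*H pixel colours into 2 bins with k-means and
// repaint each pixel with its cluster centre, giving a two-tone image.
cv::Mat kmeansTwoTone(const cv::Mat& bgr)                    // 8-bit, 3-channel input
{
    cv::Mat samples = bgr.reshape(1, bgr.rows * bgr.cols);   // (W*H) rows x 3 cols
    samples.convertTo(samples, CV_32F);

    cv::Mat labels, centers;
    cv::kmeans(samples, 2, labels,
               cv::TermCriteria(cv::TermCriteria::EPS + cv::TermCriteria::MAX_ITER, 10, 1.0),
               3, cv::KMEANS_PP_CENTERS, centers);

    cv::Mat out(bgr.size(), bgr.type());
    for (int i = 0; i < samples.rows; ++i) {
        const int k = labels.at<int>(i);
        out.at<cv::Vec3b>(i / bgr.cols, i % bgr.cols) =
            cv::Vec3b(static_cast<uchar>(centers.at<float>(k, 0)),
                      static_cast<uchar>(centers.at<float>(k, 1)),
                      static_cast<uchar>(centers.at<float>(k, 2)));
    }
    return out;
}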
YouTube video demonstrating image segmentation using k-means:
https://www.youtube.com/watch?v=yR7k19YBqiw
Challenges with this method
The k-means clustering algorithm is susceptible to outliers. A few random pixels with a color whose RGB distance is far away from the rest of the crowd could easily skew the centroids to produce unexpected results.
Just to point out: in the self-selected answer, you have to LINEARIZE the sRGB values before you can apply the coefficients. This means removing the transfer curve.
To remove the power curve, divide the 8 bit R, G, and B channels by 255.0, then either use the sRGB piecewise transform, which is recommended for image processing, OR you can cheat and raise each channel to the power of 2.2.
Only after linearizing can you apply the coefficients shown, (which also are not exactly correct in the selected answer).
The standard is 0.2126 0.7152 and 0.0722. Multiply each channel by its coefficient and sum them together for Y, the luminance. Then re-apply the gamma to Y and multiply by 255, then copy to all three channels, and boom you have a greyscale (monochrome) image.
Here it is all at once in one simple line:
// Andy's Easy Greyscale in one line.
// Send it sR sG sB channels as 8 bit ints, and
// it returns three channels sRgrey sGgrey sBgrey
// as 8 bit ints that display glorious grey.
sRgrey = sGgrey = sBgrey = Math.round(Math.min(Math.pow((Math.pow(sR/255.0,2.2)*0.2126 + Math.pow(sG/255.0,2.2)*0.7152 + Math.pow(sB/255.0,2.2)*0.0722), 0.454545)*255, 255));
And that's it. Unless you have to parse hex strings....