RGB to monochrome conversion

How do I convert the RGB values of a pixel to a single monochrome value?

I found one possible solution in the Color FAQ. The luminance component Y (from the CIE XYZ system) captures, in a single channel, what humans perceive as the brightness of a color. So, use those coefficients:
mono = (0.2125 * color.r) + (0.7154 * color.g) + (0.0721 * color.b);

This MSDN article uses (0.299 * color.R + 0.587 * color.G + 0.114 * color.B);
This Wikipedia article uses (0.3* color.R + 0.59 * color.G + 0.11 * color.B);

This depends on what your motivations are. If you just want to turn an arbitrary image to grayscale and have it look pretty good, the conversions in other answers to this question will do.
If you are converting color photographs to black and white, the process can be both very complicated and subjective, requiring specific tweaking for each image. For an idea of what might be involved, take a look at this tutorial from Adobe for Photoshop.
Replicating this in code would be fairly involved, and would still require user intervention to get the resulting image aesthetically "perfect" (whatever that means!).

As mentioned also, a grayscale translation (note that monochromatic images need not be grayscale) from an RGB triplet is subject to taste.
For example, you could cheat and extract only the blue component, simply throwing the red and green components away and copying the blue value in their stead. Another simple and generally OK solution would be to take the average of the pixel's RGB triplet and use that value in all three components.
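As a quick illustration, here's a minimal sketch of both of those cheap approaches in Python (the tuple-of-ints pixel representation is just an assumption for the example):

def blue_only(pixel):
    # Throw away red and green; copy the blue value into all three channels.
    r, g, b = pixel
    return (b, b, b)

def average(pixel):
    # Use the plain mean of the three channels as the grey value.
    r, g, b = pixel
    grey = (r + g + b) // 3
    return (grey, grey, grey)

print(average((200, 100, 30)))  # (110, 110, 110)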
The fact that there's a considerable market for professional and not-very-cheap-at-all-no-sirree grayscale/monochrome converter plugins for Photoshop alone tells you that the conversion is just as simple or as complex as you wish.

The logic behind converting any RGB-based picture to monochrome is not a trivial linear transformation. In my opinion, such a problem is better addressed by "color segmentation" techniques, which you can achieve with k-means clustering.
See reference example from MathWorks site.
https://www.mathworks.com/examples/image/mw/images-ex71219044-color-based-segmentation-using-k-means-clustering
[Image: original picture in colours]
[Image: after converting to monochrome using k-means clustering]
How does this work?
Collect all pixel values from the entire image. From an image that is W pixels wide and H pixels high, you will get W * H color values. Now, using the k-means algorithm, create 2 clusters (or bins) and throw the colours into the appropriate bins. The 2 clusters represent your black and white shades.
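A minimal sketch of that procedure, assuming scikit-learn is available and the image is already loaded as an H x W x 3 NumPy array (the function and variable names are mine, not from the MathWorks example):

import numpy as np
from sklearn.cluster import KMeans

def kmeans_two_tone(img):
    # img: H x W x 3 uint8 array; returns an H x W black-and-white image.
    h, w, _ = img.shape
    pixels = img.reshape(-1, 3).astype(float)   # the W * H color values
    km = KMeans(n_clusters=2, n_init=10).fit(pixels)
    # Order the two centroids by brightness, then map every pixel in the
    # brighter cluster to white and every pixel in the darker one to black.
    by_brightness = np.argsort(km.cluster_centers_.sum(axis=1))
    shade = np.where(km.labels_ == by_brightness[1], 255, 0)
    return shade.astype(np.uint8).reshape(h, w)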
YouTube video demonstrating image segmentation using k-means:
https://www.youtube.com/watch?v=yR7k19YBqiw
Challenges with this method
The k-means clustering algorithm is susceptible to outliers. A few random pixels with a color whose RGB distance is far away from the rest of the crowd could easily skew the centroids to produce unexpected results.

Just to point out, regarding the accepted answer: you have to LINEARIZE the sRGB values before you can apply the coefficients. This means removing the transfer curve.
To remove the power curve, divide the 8-bit R, G, and B channels by 255.0, then either use the sRGB piecewise transform, which is recommended for image processing, OR you can cheat and raise each channel to the power of 2.2.
Only after linearizing can you apply the coefficients shown (which, incidentally, are not exactly correct in the selected answer).
The standard coefficients are 0.2126, 0.7152, and 0.0722. Multiply each channel by its coefficient and sum them together for Y, the luminance. Then re-apply the gamma to Y, multiply by 255, copy the result to all three channels, and boom: you have a greyscale (monochrome) image.
Here it is all at once in one simple line:
// Andy's Easy Greyscale in one line.
// Send it sR sG sB channels as 8 bit ints, and
// it returns three channels sRgrey sGgrey sBgrey
// as 8 bit ints that display glorious grey.
sRgrey = sGgrey = sBgrey = Math.min(Math.round(Math.pow(Math.pow(sR/255.0,2.2)*0.2126 + Math.pow(sG/255.0,2.2)*0.7152 + Math.pow(sB/255.0,2.2)*0.0722, 0.454545)*255), 255);
And that's it. Unless you have to parse hex strings....

Related

How does one properly scale an XYZ color gamut bounding volume after computing it from color matching functions?

After computing the XYZ gamut bounding mesh below from spectral samples/color matching functions, how does one scale the resulting volume for compatibility with popular color spaces such as sRGB? More specifically, the size and scale of the volume depends on the number of samples and the integral approximation method used to compute it. How, then, can one determine the right values to scale such volumes to match known color spaces like sRGB, P3-Display, NTSC, PAL, etc?
It seemed like fitting the whole volume so that Y ranges from [0, 1] would work, but it had several problems:
When compared to a sub-volume generated by converting the sRGB color cube to XYZ space, the result protruded outside of the 'full gamut'.
Converting random XYZ values from the full gamut volume to sRGB and back, the final XYZ doesn't match the initial one.
Most (all?) standardized color spaces derive from CIE XYZ, so each must have some kind of function or transformation to and from the full XYZ Gamut, or at least each must have some unique parameters for a general function.
How does one determine the correct function and its parameters?
Short answer
If I understand your question, what you are trying to accomplish is determining the sRGB gamut limits (boundary) relative to the XYZ space you have constructed.
Longer answer
I am assuming you are NOT trying to accomplish gamut mapping. That is non-trivial, and there are multiple methods (perceptual, absolute, relative, etc.). I'm going to set gamut mapping aside, and instead focus on determining how some arbitrary color space fits inside your XYZ volume.
First to answer your granular questions:
After computing the XYZ gamut bounding mesh below from spectral samples, how does one scale the volume for compatibility with popular color spaces such as sRGB?
What spectral samples? From a spectrophotometer reading a test print under a given standard illuminant? Or where did they come from? A color matching experiment?
The math is a matter of integrating the spectral data to form the XYZ space, which you apparently have done. What illuminant (white point)?
It seemed like fitting the whole volume so that Y ranges from [0, 1] would work, but it had several problems:
Whole volume of what? The sRGB space? How did you convert the sRGB data to XYZ? Or is this really the question you are asking?
What are the proper scaling constants?
They depend on the spectral data and the adapted white point for the spectral data. sRGB is D65. Most printing is done using D50.
Does each color space have its own ranges for x, y, and z values? How can I determine them?
YES.
Every color space has a different transformation matrix depending on the coordinates of the R G and B primaries. The primaries can be imaginary, such as in ProPhoto.
Some Things
The math you are looking for can be found at brucelindbloom.com, and you might also want to check out Thomas Mansencal's Colour Science, a Python library that's the Swiss-army knife of color.
sRGB
XYZ is a linear light space, wherein Y = 0.2 to Y = 0.4 is a doubling of luminance.
sRGB is not a linear space; there is a gamma curve, or tone response curve, on sRGB data, such that rgb(20,20,20) to rgb(40,40,40) is NOT a doubling of luminance.
The first thing that needs to be done is linearize the sRGB color data.
Then take the linear RGB and run it through the appropriate matrix. If the XYZ data is relative to a different adapting white point, then you need to do something like a Bradford transform to convert to the appropriate one for your XYZ space.
The Bruce Lindbloom site has some ready-to-go matrices for a couple of common situations.
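As a sketch of those two steps together, here's the sRGB piecewise linearization followed by the Lindbloom sRGB (D65) RGB-to-XYZ matrix; this illustrates the pipeline only, with no Bradford adaptation included:

import numpy as np

# sRGB -> XYZ matrix for the D65 white point (from brucelindbloom.com).
SRGB_TO_XYZ = np.array([
    [0.4124564, 0.3575761, 0.1804375],
    [0.2126729, 0.7151522, 0.0721750],
    [0.0193339, 0.1191920, 0.9503041],
])

def srgb_to_linear(c):
    # c in [0, 1]; inverse of the sRGB transfer curve.
    return np.where(c <= 0.04045, c / 12.92, ((c + 0.055) / 1.055) ** 2.4)

def srgb8_to_xyz(r, g, b):
    rgb_lin = srgb_to_linear(np.array([r, g, b]) / 255.0)
    return SRGB_TO_XYZ @ rgb_lin

print(srgb8_to_xyz(255, 255, 255))  # ~[0.9505, 1.0000, 1.0890], i.e. D65 white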
The problem you are describing can be caused by failing to linearize the sRGB data, by not adapting the white point, or both. And... possibly other factors.
If you can answer my questions regarding the source of the spectral data I can better assist.
Further research and experimentation suggested that the XYZ volume should be scaled such that { max(X), max(Y), max(Z) } equals the illuminant of the working space. In the case of sRGB, that illuminant (also called the white point) is D65.
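A minimal sketch of that scaling, assuming the volume is an N x 3 array of XYZ samples (the names here are illustrative, not from the original code):

import numpy as np

D65 = np.array([0.95047, 1.0, 1.08883])  # XYZ of the sRGB white point

def scale_volume_to_whitepoint(xyz, white=D65):
    # Scale each axis so { max(X), max(Y), max(Z) } lands on the illuminant.
    return xyz * (white / xyz.max(axis=0))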
Results look convincing, but expert confirmation would still be appreciated.

Luminance Matching Two Colors

This will likely seem like a very easy thing I'm trying to do but Google search has not turned up exactly what I'm looking for and I'd like to do this correctly.
Essentially I need to luminance-match two BMPs. They are simple circles (125x125 pixels), and their original color is known to me only by their (0-255 range) RGB value of 255,0,0. I need to find an RGB value of gray that has the same luminance as these circles.
All the other luminance/brightness-matching tutorials I have seen have been for pictures that include a variety of hues, brightnesses, etc., and I am not sure whether those techniques will work in this (admittedly simpler) case.
I am hoping to be able to just figure out the RGB values so I can input them into an experiment builder program but I do have access to GIMP if any of its tools are needed or will help.
I apologize for this likely easy question but I know little of graphics, brightness measures, etc. I appreciate any help that can be provided.
ADDENDUM: I actually think this would be a good place to ask one additional question. Is there a formula for conversion of candela to (perhaps approximate?) RGB values? I'm basing these color values loosely off of candela values and would love to know if an equation/way of equating the two beyond guesswork exists.
You need to be careful about luminance-matching digital images, because the actual luminance depends on how they're displayed. In particular, you want to watch out for "gamma correction", which is a nonlinear mapping between the RGB values and the actual display brightness. Some images may have an internal "gamma" value associated with the data itself, and many display devices effectively apply a "gamma" to the RGB values they display.
However, for an image stored and displayed linearly (with an effective gamma of 1), there is a standard luminance measure for RGB values:
Y = 0.2126 * R + 0.7152 * G + 0.0722 * B
There are, actually, a number of standards, with different weights for the linear R, G, and B components. However, if you aren't sure exactly how your image will be displayed, you might as well pick one and stick with it...
Anyway, you can use this to solve your specific problem, as follows: you want a grey value (r,g,b) = (x,x,x) with the same luminance as a pure-red value of 255. Conveniently, the three luminance constants sum to 1.0. This gives you the following formula:
Y == 1.0 * x == 0.2126 * 255
--> x ~ 54
If you want to match a different color, or use different luminance weights (which still sum to 1.0), the procedure is the same: just weight the RGB values according to the luminance formula, then pick a grey value equal to the luminance.
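For instance, a tiny sketch of that procedure (still under the linear-display assumption stated above):

WEIGHTS = (0.2126, 0.7152, 0.0722)  # Rec. 709 luminance weights, summing to 1.0

def matching_grey(r, g, b):
    # Luminance of the colour; since the weights sum to 1.0, the grey
    # with the same luminance is simply (Y, Y, Y).
    y = WEIGHTS[0] * r + WEIGHTS[1] * g + WEIGHTS[2] * b
    return round(y)

print(matching_grey(255, 0, 0))  # ~54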
I believe the answer already given is misleading (SO doesn't let me comment). As mentioned the formula given applies to intensities and you should watch out for gamma, see e.g. here:
http://www.poynton.com/notes/colour_and_gamma/GammaFAQ.html#luminance
Thus, the application example should use coefficients that account for gamma, or it should compensate for the gamma by hand, which it doesn't. Yes, the image could be linear (so you would have actual intensities), but judging from the description, the chance that it is linear is close to zero.
These coefficients yield 'luma', not luminance, but that is what you have asked for anyway. See:
http://www.poynton.com/notes/colour_and_gamma/ColorFAQ.html#RTFToC11
To summarize:
luma = 0.299 R + 0.587 G + 0.114 B
(r,g,b) = (luma, luma, luma)
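Applied to the asker's pure red, a quick sketch:

def luma(r, g, b):
    # BT.601 luma weights, applied to the gamma-encoded (not linear) values.
    return round(0.299 * r + 0.587 * g + 0.114 * b)

y = luma(255, 0, 0)
print((y, y, y))  # (76, 76, 76)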
The material should also help with your addendum question. I've found it to be very reliable, which is clearly an exception in this field.

normalizeColor in Brad Larson's Threshold filter

That might be easy to understand, but I don't get the use of the normalizeColor function in Brad Larson's GPUImage. You find it for e.g. in the colorObjectTracking example under Threshold.fsh:
vec3 normalizeColor(vec3 color)
{
    // Divide by the mean of the three channels (floored at 0.3) to factor
    // out overall brightness before the color comparison in the shader.
    return color / max(dot(color, vec3(1.0/3.0)), 0.3);
}
Here is what I get: You take the incoming color "color" and divide it either by 0.3 or by the dot product of the color vector and (1/3,1/3,1/3) if the result of the dot product is bigger than 0.3.
So two questions:
Why is it necessary to normalize "color" to the average of its elements?
Why is there a minimum limit of 0.3? (As I understand the max() function)
Thanks a lot!
alti
A more appropriate place to ask this might have been on the project site itself, but I'll bite.
The point of the fragment shader there is to identify pixels in an image that are of a particular color. That function does a crude normalization for the brightness of a color, so that different lighting conditions could be accounted for when matching a color on an object. The max() operation there is just a cap to prevent things from getting really wacky at certain color values.
This particular fragment shader is entirely based on the example provided by Apple's Core Image engineers in their GPU Gems article entitled "Object Detection by Color: Using the GPU for Real-Time Video Image Processing", and they go into a little more detail about it there.
A better approach would be to get the proximity to a given color by removing the luminance component and instead examining the chrominance of a pixel. If you have the YUV source, you can do this pretty easily from the Cr and Cb components. My GPUImageChromaKeyFilter illustrates the extraction of YUV data from RGB inputs, with a thresholding then applied around the chrominance. This, too, was drawn from an example by Apple (I believe this was from their ChromaKey WWDC sample).
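A rough sketch of that chrominance-based idea in plain code; this is not the actual GPUImage shader, and the constants are the common full-range BT.601 ones:

def rgb_to_cbcr(r, g, b):
    # Full-range BT.601 chrominance, centred on 128; luminance is discarded.
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return cb, cr

def matches_color(pixel, target, threshold=20.0):
    # Compare chrominance only, so brightness changes don't break the match.
    pcb, pcr = rgb_to_cbcr(*pixel)
    tcb, tcr = rgb_to_cbcr(*target)
    return ((pcb - tcb) ** 2 + (pcr - tcr) ** 2) ** 0.5 < threshold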

What are the practical differences when working with colors in a linear vs. a non-linear RGB space?

What is the basic property of a linear RGB space and what is the fundamental property of a non-linear one? When talking about the values inside each channel in those 8 (or more) bits, what changes?
In OpenGL, colors are 3+1 values, and by this I mean RGB + alpha, with 8 bits reserved for each channel; that is the part I get clearly.
But when it comes to gamma correction, I don't get what the effect of working in a non-linear RGB space is.
Since I know how to use a curve in graphics software for photo editing, my explanation is that in a linear RGB space you take the values as they are, with no manipulation and no math function attached, whereas when it's non-linear each channel usually evolves following a classic power-function behaviour.
Even if I take this explanation as the real one, I still don't get what a real linear space is, because after computation all non-linear RGB spaces become linear, and most importantly of all, I don't get the part where a non-linear color space is better suited to the human eye, because in the end all RGB spaces are linear as far as I understand.
Let's say you're working with RGB colors: each color is represented with three intensities or brightnesses. You've got to choose between "linear RGB" and "sRGB". For now, we'll simplify things by ignoring the three different intensities, and assume you just have one intensity: that is, you're only dealing with shades of gray.
In a linear color-space, the relationship between the numbers you store and the intensities they represent is linear. Practically, this means that if you double the number, you double the intensity (the lightness of the gray). If you want to add two intensities together (because you're computing an intensity based on the contributions of two light sources, or because you're adding a transparent object on top of an opaque object), you can do this by just adding the two numbers together. If you're doing any kind of 2D blending or 3D shading, or almost any image processing, then you want your intensities in a linear color-space, so you can just add, subtract, multiply, and divide numbers to have the same effect on the intensities. Most color-processing and rendering algorithms only give correct results with linear RGB, unless you add extra weights to everything.
That sounds really easy, but there's a problem. The human eye's sensitivity to light is finer at low intensities than high intensities. That's to say, if you make a list of all the intensities you can distinguish, there are more dark ones than light ones. To put it another way, you can tell dark shades of gray apart better than you can with light shades of gray. In particular, if you're using 8 bits to represent your intensity, and you do this in a linear color-space, you'll end up with too many light shades, and not enough dark shades. You get banding in your dark areas, while in your light areas, you're wasting bits on different shades of near-white that the user can't tell apart.
To avoid this problem, and make the best use of those 8 bits, we tend to use sRGB. The sRGB standard tells you a curve to use, to make your colors non-linear. The curve is shallower at the bottom, so you can have more dark grays, and steeper at the top, so you have fewer light grays. If you double the number, you more than double the intensity. This means that if you add sRGB colors together, you end up with a result that is lighter than it should be. These days, most monitors interpret their input colors as sRGB. So, when you're putting a color on the screen, or storing it in an 8-bit-per-channel texture, store it as sRGB, so you make the best use of those 8 bits.
You'll notice we now have a problem: we want our colors processed in linear space, but stored in sRGB. This means you end up doing sRGB-to-linear conversion on read, and linear-to-sRGB conversion on write. As we've already said that linear 8-bit intensities don't have enough darks, this would cause problems, so there's one more practical rule: don't use 8-bit linear colors if you can avoid it. It's becoming conventional to follow the rule that 8-bit colors are always sRGB, so you do your sRGB-to-linear conversion at the same time as widening your intensity from 8 to 16 bits, or from integer to floating-point; similarly, when you've finished your floating-point processing, you narrow to 8 bits at the same time as converting to sRGB. If you follow these rules, you never have to worry about gamma correction.
When you're reading an sRGB image, and you want linear intensities, apply this formula to each intensity:
float s = read_channel();
float linear;
if (s <= 0.04045) linear = s / 12.92;
else linear = pow((s + 0.055) / 1.055, 2.4);
Going the other way, when you want to write an image as sRGB, apply this formula to each linear intensity:
float linear = do_processing();
float s;
if (linear <= 0.0031308) s = linear * 12.92;
else s = 1.055 * pow(linear, 1.0/2.4) - 0.055;
In both cases, the floating-point s value ranges from 0 to 1, so if you're reading 8-bit integers you want to divide by 255 first, and if you're writing 8-bit integers you want to multiply by 255 last, the same way you usually would. That's all you need to know to work with sRGB.
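To see why the rule matters, here's a sketch of "process in linear, store in sRGB", averaging two sRGB intensities the correct way (Python, with scalar values standing in for real image data):

def srgb_to_linear(s):
    return s / 12.92 if s <= 0.04045 else ((s + 0.055) / 1.055) ** 2.4

def linear_to_srgb(lin):
    return lin * 12.92 if lin <= 0.0031308 else 1.055 * lin ** (1 / 2.4) - 0.055

def average_srgb8(a, b):
    # Decode to linear, blend there, re-encode. The naive (a + b) / 2 on
    # the raw sRGB values would come out darker than it should.
    lin = (srgb_to_linear(a / 255) + srgb_to_linear(b / 255)) / 2
    return round(linear_to_srgb(lin) * 255)

print(average_srgb8(255, 0))  # ~188, not the naive 128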
Up to now, I've dealt with one intensity only, but there are cleverer things to do with colors. The human eye can tell different brightnesses apart better than different tints (more technically, it has better luminance resolution than chrominance), so you can make even better use of your 24 bits by storing the brightness separately from the tint. This is what YUV, YCrCb, etc. representations try to do. The Y channel is the overall lightness of the color, and uses more bits (or has more spatial resolution) than the other two channels. This way, you don't (always) need to apply a curve like you do with RGB intensities. YUV is a linear color-space, so if you double the number in the Y channel, you double the lightness of the color, but you can't add or multiply YUV colors together like you can with RGB colors, so it's not used for image processing, only for storage and transmission.
I think that answers your question, so I'll end with a quick historical note. Before sRGB, old CRTs used to have a non-linearity built into them. If you doubled the voltage for a pixel, you would more than double the intensity. How much more was different for each monitor, and this parameter was called the gamma. This behavior was useful because it meant you could get more darks than lights, but it also meant you couldn't tell how bright your colors would be on the user's CRT unless you calibrated it first. Gamma correction means taking the colors you start with (probably linear) and transforming them for the gamma of the user's CRT. OpenGL comes from this era, which is why its sRGB behavior is sometimes a little confusing. But GPU vendors now tend to work with the convention I described above: when you're storing an 8-bit intensity in a texture or framebuffer, it's sRGB, and when you're processing colors, it's linear. For example, in OpenGL ES 3.0, each framebuffer and texture has an "sRGB flag" you can turn on to enable automatic conversion when reading and writing. You don't need to explicitly do sRGB conversion or gamma correction at all.
I am not a "human color detection expert", but I've run into a similar thing with YUV->RGB conversion. There are different weights for the R/G/B channels, so if you change the source color by x, the RGB values change by different amounts.
As said, I'm not an expert, but I think that if you want to do a color-correct transformation, you should do it in YUV space and then convert to RGB (or do the mathematically equivalent operation on RGB, watching out for data loss). Also, I'm not sure that YUV is the best native representation of colors, but video cameras provide that format, and that's where I ran into the issue.
Here is the magic YUV->RGB formula with secret numbers included: http://www.fourcc.org/fccyvrgb.php
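For reference, here is one common variant of those numbers (the full-range JFIF/BT.601 flavour; the fourcc page lists several others):

def ycbcr_to_rgb(y, cb, cr):
    # Full-range BT.601 / JFIF conversion; clamp the results to 8 bits.
    r = y + 1.402 * (cr - 128)
    g = y - 0.344136 * (cb - 128) - 0.714136 * (cr - 128)
    b = y + 1.772 * (cb - 128)
    return tuple(min(255, max(0, round(v))) for v in (r, g, b))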

Why does greyscale work the way it does?

My original question
I read that to convert a RGB pixel into greyscale RGB, one should use
r_new = g_new = b_new = r_old * 0.3 + g_old * 0.59 + b_old * 0.11
I also read, and understand, that g has a higher weighting because the human eye is more sensitive to green. Implementing that, I saw the results were the same as I would get from setting an image to 'greyscale' in an image editor like the Gimp.
Before I read this, I imagined that to convert a pixel to greyscale, one would convert it to HSL or HSV, then set the saturation to zero (hence, removing all colour). However, when I did this, I got a quite different image output, even though it also lacked colour.
How does s = 0 exactly differ from the 'correct' way I read, and why is it 'incorrect'?
Ongoing findings based on answers and other research
It appears that which luminance coefficients to use is the subject of some debate. Various combinations and to-greyscale algorithms have different results. The following are some presets used in areas like TV standards:
the coefficients defined by ITU-R BT.601 (NTSC?) are 0.299r + 0.587g + 0.114b
the coefficients defined by ITU-R BT.709 (newer) are 0.2126r + 0.7152g + 0.0722b
the coefficients of equal thirds, (1/3)(r + g + b), which is equivalent to s = 0
This scientific article details various greyscale techniques and their results for various images, plus a subjective survey of 119 people.
However, when converting an image to greyscale, to achieve the 'best' artistic effect, one will almost certainly not be using these predefined coefficients, but tweaking the contribution from each channel to produce the best output for the particular image.
Although these transformation coefficients exist, nothing binds you to using them. As long as the total intensity of each pixel is unchanged, the contributions from each channel can be anything, ranging from 0 to 100%.
Photographers converting images to grayscale use channel mixers to adjust levels of each channel (RGB or CMYK). In your image, there are many reds and greens, so it might be desirable (depending on your intent) to have those channels more highly represented in the gray level intensity than the blue.
This is what distinguishes "scientific" transformation of the image from an "artistic" combination of the bands.
An additional consideration is the dynamic range of values in each band, and attempting to preserve them in the grayscale image. Boosting shadows and/or highlights might require increasing the contribution of the blue band, for example.
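A minimal sketch of such a channel mixer, with the per-channel contributions as free parameters (the default weights here are arbitrary examples, not a standard):

def mix_to_grey(r, g, b, wr=0.5, wg=0.4, wb=0.1):
    # Any weights are fair game, as long as they sum to 1.0 so the
    # overall intensity range of the image is preserved.
    assert abs(wr + wg + wb - 1.0) < 1e-9
    return round(wr * r + wg * g + wb * b)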
An interesting article on the topic here.... "because human eyes don't detect brightness linearly with color".
http://www.scantips.com/lumin.html
Looks like these coefficients come from old CRT technology and are not well adapted to today's monitors. From the Color FAQ:
"The coefficients 0.299, 0.587 and 0.114 properly computed luminance for monitors having phosphors that were contemporary at the introduction of NTSC television in 1953. They are still appropriate for computing video luma to be discussed below in section 11. However, these coefficients do not accurately compute luminance for contemporary monitors."
Couldn't find the right conversion coefficients, however.
See also RGB to monochrome conversion
Using s = 0 in HSL/HSV and converting to RGB results in R = G = B, so it is the same as doing r_old * 1/3 + g_old * 1/3 + b_old * 1/3.
To understand why, have a look at the Wikipedia page that describes conversion HSV->RGB. Saturation s will be 0, so C and X will be, too. You'll end up with R_1,G_1,B_1 being (0,0,0) and then add m to the final RGB values which results in (m,m,m) = (V,V,V). Same for HSL, result will be (m,m,m) = (L,L,L).
EDIT: OK, I just figured out the above is not the complete answer, although it's a good starting point. The RGB values will all be the same, either L or V, but it still depends on how L and V were originally calculated; again, see Wikipedia. It seems the program/formulas you used for the conversion used the 1/3 * R + 1/3 * G + 1/3 * B solution or one of the other two (hexcone/bi-hexcone).
So after all, using HSL/HSV just means you have to decide which formula to use earlier, and the later conversion to RGB grayscale values simply isolates that component.
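To make the difference concrete, a small sketch comparing the three for a fully saturated red, using Python's built-in colorsys (whose HLS lightness is the bi-hexcone (max + min) / 2 form):

import colorsys

r, g, b = 255, 0, 0
hls_l = colorsys.rgb_to_hls(r / 255, g / 255, b / 255)[1] * 255  # (max+min)/2
mean = (r + g + b) / 3
luma = 0.299 * r + 0.587 * g + 0.114 * b

print(round(hls_l), round(mean), round(luma))  # 128 vs 85 vs 76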
