Why does greyscale work the way it does?

My original question
I read that to convert an RGB pixel into greyscale RGB, one should use
r_new = g_new = b_new = r_old * 0.3 + g_old * 0.59 + b_old * 0.11
I also read, and understand, that g has a higher weighting because the human eye is more sensitive to green. Implementing that, I saw the results were the same as I would get from setting an image to 'greyscale' in an image editor like the Gimp.
Before I read this, I imagined that to convert a pixel to greyscale, one would convert it to HSL or HSV, then set the saturation to zero (hence, removing all colour). However, when I did this, I got a quite different image output, even though it also lacked colour.
How does s = 0 exactly differ from the 'correct' way I read, and why is it 'incorrect'?
Ongoing findings based on answers and other research
It appears that which luminance coefficients to use is the subject of some debate. Various combinations and to-greyscale algorithms have different results. The following are some presets used in areas like TV standards:
the coefficients defined by ITU-R BT.601 (NTSC?) are 0.299r + 0.587g + 0.114b
the coefficients defined by ITU-R BT.709 (newer) are 0.2126r + 0.7152g + 0.0722b
the coefficients of equal thirds, (1/3)(r + g + b), are equivalent to s = 0 when the lightness component is the plain average of r, g and b (as in HSI)
This scientific article details various greyscale techniques and their results for various images, plus a subjective survey of 119 people.
However, when converting an image to greyscale, to achieve the 'best' artistic effect, one will almost certainly not be using these predefined coefficients, but tweaking the contribution from each channel to produce the best output for the particular image.

Although these transformation coefficients exist, nothing binds you to using them. As long as the total intensity of each pixel is unchanged, the contributions from each channel can be anything, ranging from 0 to 100%.
Photographers converting images to grayscale use channel mixers to adjust levels of each channel (RGB or CMYK). In your image, there are many reds and greens, so it might be desirable (depending on your intent) to have those channels more highly represented in the gray level intensity than the blue.
This is what distinguishes "scientific" transformation of the image from an "artistic" combination of the bands.
An additional consideration is the dynamic range of values in each band, and attempting to preserve them in the grayscale image. Boosting shadows and/or highlights might require increasing the contribution of the blue band, for example.
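A channel mixer boils down to a weighted sum. Below is a minimal sketch that normalises the weights to sum to 1, so that pure white stays white and the overall intensity scale is preserved. The function name `mix_to_grey` and the preset names are illustrative assumptions; the preset weights are the ones quoted in the question, applied directly to the stored 0-255 values with no gamma handling.

```python
# Preset weights quoted in the question, as (R, G, B).
PRESETS = {
    "bt601": (0.299, 0.587, 0.114),
    "bt709": (0.2126, 0.7152, 0.0722),
    "equal": (1 / 3, 1 / 3, 1 / 3),
}


def mix_to_grey(rgb, weights):
    """Channel-mixer greyscale for one (r, g, b) pixel in 0-255.

    The weights are normalised to sum to 1, so any non-negative mix keeps
    white at 255; only the relative contribution of each channel changes.
    """
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    grey = sum(c * w for c, w in zip(rgb, weights)) / total
    return round(grey)


pixel = (200, 120, 40)
for name, w in PRESETS.items():
    print(name, mix_to_grey(pixel, w))
# A hypothetical "artistic" mix favouring red and green over blue:
print("custom", mix_to_grey(pixel, (5, 4, 1)))
```

The normalisation step is what lets you type in rough proportions such as (5, 4, 1) instead of percentages that must add up to exactly 100%.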

An interesting article on the topic: "because human eyes don't detect brightness linearly with color".
http://www.scantips.com/lumin.html

Looks like these coefficients come from old CRT technology and are not well adapted to today's monitors, from the Color FAQ:
The coefficients 0.299, 0.587 and 0.114 properly computed luminance for monitors having phosphors that were contemporary at the introduction of NTSC television in 1953. They are still appropriate for computing video luma to be discussed below in section 11. However, these coefficients do not accurately compute luminance for contemporary monitors.
Couldn't find the right conversion coefficient, however.
See also RGB to monochrome conversion

Using s = 0 in HSL/HSV and converting to RGB results in R = G = B, so is the same as doing r_old * 1/3 + g_old * 1/3 + b_old * 1/3.
To understand why, have a look at the Wikipedia page that describes HSV-to-RGB conversion. Saturation s will be 0, so C and X will be too. You'll end up with (R_1, G_1, B_1) = (0, 0, 0) and then add m to the final RGB values, which gives (m, m, m) = (V, V, V). The same goes for HSL: the result will be (m, m, m) = (L, L, L).
EDIT: OK, I just figured out the above is not the complete answer, although it's a good starting point. The RGB values will all be the same, either L or V, but they still depend on how L and V were originally calculated (again, see Wikipedia). It seems the program or formulas you used for the conversion used either the 1/3 * R + 1/3 * G + 1/3 * B solution or one of the other two: the hexcone model (HSV, where V = max(R, G, B)) or the bi-hexcone model (HSL, where L = (max(R, G, B) + min(R, G, B)) / 2).
So after all, using HSL/HSV just means you'll have to decide which formula to use earlier and conversion to RGB grayscale values later is just isolating the last component.
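Python's standard `colorsys` module makes the point concrete: setting saturation to 0 and converting back gives V = max(r, g, b) for HSV and L = (max + min) / 2 for HLS (Python's name for HSL), and neither equals the plain channel average. The example pixel is an arbitrary choice.

```python
import colorsys

# Example pixel, scaled to the 0-1 range that colorsys expects.
r, g, b = 200 / 255, 100 / 255, 50 / 255

# HSV: zero the saturation and convert back; every channel becomes V = max(r, g, b).
h, s, v = colorsys.rgb_to_hsv(r, g, b)
grey_hsv = colorsys.hsv_to_rgb(h, 0.0, v)

# HLS: colorsys returns (h, l, s); every channel becomes L = (max + min) / 2.
h, l, s = colorsys.rgb_to_hls(r, g, b)
grey_hls = colorsys.hls_to_rgb(h, l, 0.0)

# Plain average of the three channels, for comparison.
mean = (r + g + b) / 3

print([round(c * 255) for c in grey_hsv])  # [200, 200, 200]
print([round(c * 255) for c in grey_hls])  # [125, 125, 125]
print(round(mean * 255))                   # 117
```

Three different greys from the same pixel, which is why "s = 0" only pins down the result once you know which lightness formula the editor uses.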

Related

Luminance Matching Two Colors

This will likely seem like a very easy thing I'm trying to do but Google search has not turned up exactly what I'm looking for and I'd like to do this correctly.
Essentially I need to luminance-match two BMPs. They are simple circles (125x125 pixels), and their original color is only known to me by its (0-255 ranged) RGB value of 255,0,0. I need to find an RGB value of gray that has the same luminance as these circles.
All other luminance/brightness matching tutorials I have seen have been for pictures that include a variety of hues, brightnesses, etc., and I am not sure if those techniques will work in this (admittedly simpler) case.
I am hoping to be able to just figure out the RGB values so I can input them into an experiment builder program but I do have access to GIMP if any of its tools are needed or will help.
I apologize for this likely easy question but I know little of graphics, brightness measures, etc. I appreciate any help that can be provided.
ADDENDUM: I actually think this would be a good place to ask one additional question. Is there a formula for conversion of candela to (perhaps approximate?) RGB values? I'm basing these color values loosely off of candela values and would love to know if an equation/way of equating the two beyond guesswork exists.

You need to be careful about luminance-matching digital images, because the actual luminance depends on how they're displayed. In particular, you want to watch out for "gamma correction", which is a nonlinear mapping between the RGB values and the actual display brightness. Some images may have an internal "gamma" value associated with the data itself, and many display devices effectively apply a "gamma" to the RGB values they display.
However, for an image stored and displayed linearly (with an effective gamma of 1), there is a standard luminance measure for RGB values:
Y = 0.2126 * R + 0.7152 * G + 0.0722 * B
There are, actually, a number of standards, with different weights for the linear R, G, and B components. However, if you aren't sure exactly how your image will be displayed, you might as well pick one and stick with it...
Anyway, you can use this to solve your specific problem, as follows: you want a grey value (r,g,b) = (x,x,x) with the same luminance as a pure-red value of 255. Conveniently, the three luminance constants sum to 1.0. This gives you the following formula:
Y == 1.0 * x == 0.2126 * 255
--> x ~ 54
If you want to match a different color, or use different luminance weights (which still sum to 1.0), the procedure is the same: just weight the RGB values according to the luminance formula, then pick a grey value equal to the luminance.
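The procedure above, as a small sketch. It assumes, as this answer does, that the RGB values are linear (gamma 1); `matching_grey_linear` is a hypothetical name.

```python
# BT.709 luminance weights from the answer; they sum to 1.0.
BT709 = (0.2126, 0.7152, 0.0722)


def matching_grey_linear(rgb, weights=BT709):
    """Grey level x such that (x, x, x) has the same weighted luminance as rgb.

    Because the weights sum to 1, the grey (x, x, x) has luminance x, so x is
    simply the weighted sum of the input channels. Assumes linear RGB.
    """
    return sum(c * w for c, w in zip(rgb, weights))


x = matching_grey_linear((255, 0, 0))
print(round(x))  # 54 (0.2126 * 255 is roughly 54.21)
```

Passing a different `weights` tuple (still summing to 1) covers the "different luminance weights" case.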

I believe the answer already given is misleading (SO doesn't let me comment). As mentioned, the formula given applies to intensities, and you should watch out for gamma; see e.g. here:
http://www.poynton.com/notes/colour_and_gamma/GammaFAQ.html#luminance
Thus, the application example should use coefficients that account for gamma, or compensate for gamma by hand, which it doesn't. Yes, the image could be linear (so you have actual intensities), but judging from the description, the chance that it is is close to zero.
These coefficients yield 'luma', not luminance, but that is what you have asked for anyway. See:
http://www.poynton.com/notes/colour_and_gamma/ColorFAQ.html#RTFToC11
To summarize:
luma = 0.299 R + 0.587 G + 0.114 B
(r,g,b) = (luma, luma, luma)
The material should also help with your addendum question. I've found it to be very reliable, which is clearly an exception in this field.
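To see how much the gamma question matters for the pure-red circle, here is a sketch comparing two readings. The first assumes the 255,0,0 value is sRGB-encoded: it linearises the channels with the standard sRGB piecewise curve, applies the BT.709 luminance weights, and re-encodes the result. The second is the luma shortcut from this answer (BT.601 weights applied directly to the stored values). Function names are illustrative.

```python
def srgb_to_linear(c):
    """sRGB-encoded channel in [0, 1] -> linear intensity in [0, 1]."""
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4


def linear_to_srgb(c):
    """Linear intensity in [0, 1] -> sRGB-encoded channel in [0, 1]."""
    return c * 12.92 if c <= 0.0031308 else 1.055 * c ** (1 / 2.4) - 0.055


def grey_by_luminance(rgb8):
    """Grey with the same BT.709 luminance, assuming sRGB-encoded 0-255 input."""
    lin = [srgb_to_linear(c / 255) for c in rgb8]
    y = 0.2126 * lin[0] + 0.7152 * lin[1] + 0.0722 * lin[2]
    return round(linear_to_srgb(y) * 255)


def grey_by_luma(rgb8):
    """Luma shortcut: BT.601 weights applied directly to the encoded values."""
    r, g, b = rgb8
    return round(0.299 * r + 0.587 * g + 0.114 * b)


red = (255, 0, 0)
print(grey_by_luminance(red))  # 127
print(grey_by_luma(red))       # 76
```

The two answers differ noticeably, so it is worth checking which one actually looks equally bright on the display used for the experiment.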

What are the practical differences when working with colors in a linear vs. a non-linear RGB space?

What is the basic property of a linear RGB space and what is the fundamental property of a non-linear one? When talking about the values inside each channel in those 8 (or more) bits, what changes?
In OpenGL, colors are 3+1 values, by which I mean RGB+alpha, with 8 bits reserved for each channel, and this is the part that I understand clearly.
But when it comes to gamma correction, I don't get what the effect of working in a non-linear RGB space is.
Since I know how to use curves in graphics software for photo editing, my explanation is that in a linear RGB space you take the values as they are, with no manipulation and no math function attached, whereas when it's non-linear, each channel usually follows a classic power-function behaviour.
Even if I take this explanation as the real one, I still don't get what a real linear space is, because after computation all non-linear RGB spaces become linear. Most important of all, I don't get the part where a non-linear color space is more suitable for the human eye, because in the end all RGB spaces are linear as far as I understand.

Let's say you're working with RGB colors: each color is represented with three intensities or brightnesses. You've got to choose between "linear RGB" and "sRGB". For now, we'll simplify things by ignoring the three different intensities, and assume you just have one intensity: that is, you're only dealing with shades of gray.
In a linear color-space, the relationship between the numbers you store and the intensities they represent is linear. Practically, this means that if you double the number, you double the intensity (the lightness of the gray). If you want to add two intensities together (because you're computing an intensity based on the contributions of two light sources, or because you're adding a transparent object on top of an opaque object), you can do this by just adding the two numbers together. If you're doing any kind of 2D blending or 3D shading, or almost any image processing, then you want your intensities in a linear color-space, so you can just add, subtract, multiply, and divide numbers to have the same effect on the intensities. Most color-processing and rendering algorithms only give correct results with linear RGB, unless you add extra weights to everything.
That sounds really easy, but there's a problem. The human eye's sensitivity to light is finer at low intensities than high intensities. That's to say, if you make a list of all the intensities you can distinguish, there are more dark ones than light ones. To put it another way, you can tell dark shades of gray apart better than you can with light shades of gray. In particular, if you're using 8 bits to represent your intensity, and you do this in a linear color-space, you'll end up with too many light shades, and not enough dark shades. You get banding in your dark areas, while in your light areas, you're wasting bits on different shades of near-white that the user can't tell apart.
To avoid this problem, and make the best use of those 8 bits, we tend to use sRGB. The sRGB standard tells you a curve to use, to make your colors non-linear. The curve is shallower at the bottom, so you can have more dark grays, and steeper at the top, so you have fewer light grays. If you double the number, you more than double the intensity. This means that if you add sRGB colors together, you end up with a result that is lighter than it should be. These days, most monitors interpret their input colors as sRGB. So, when you're putting a color on the screen, or storing it in an 8-bit-per-channel texture, store it as sRGB, so you make the best use of those 8 bits.
You'll notice we now have a problem: we want our colors processed in linear space, but stored in sRGB. This means you end up doing sRGB-to-linear conversion on read, and linear-to-sRGB conversion on write. As we've already said that linear 8-bit intensities don't have enough darks, this would cause problems, so there's one more practical rule: don't use 8-bit linear colors if you can avoid it. It's becoming conventional to follow the rule that 8-bit colors are always sRGB, so you do your sRGB-to-linear conversion at the same time as widening your intensity from 8 to 16 bits, or from integer to floating-point; similarly, when you've finished your floating-point processing, you narrow to 8 bits at the same time as converting to sRGB. If you follow these rules, you never have to worry about gamma correction.
When you're reading an sRGB image, and you want linear intensities, apply this formula to each intensity:
#include <math.h>

/* Decode one sRGB channel s in [0, 1] to a linear intensity in [0, 1]. */
float srgb_to_linear(float s) {
    if (s <= 0.04045f) return s / 12.92f;
    return powf((s + 0.055f) / 1.055f, 2.4f);
}
Going the other way, when you want to write an image as sRGB, apply this formula to each linear intensity:
#include <math.h>

/* Encode one linear intensity in [0, 1] as an sRGB channel in [0, 1]. */
float linear_to_srgb(float linear) {
    if (linear <= 0.0031308f) return linear * 12.92f;
    return 1.055f * powf(linear, 1.0f / 2.4f) - 0.055f;
}
In both cases, the floating-point s value ranges from 0 to 1, so if you're reading 8-bit integers you want to divide by 255 first, and if you're writing 8-bit integers you want to multiply by 255 last, the same way you usually would. That's all you need to know to work with sRGB.
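The same two formulas in Python, used to check two claims from this answer: blending stored sRGB numbers directly gives a different (darker) result than blending in linear space, and 8-bit sRGB spends far more codes on dark shades than 8-bit linear does. The 50/50 black-white blend and the 5% "dark" threshold are arbitrary example choices.

```python
def srgb_to_linear(s):
    """Decode one sRGB channel in [0, 1] to linear intensity in [0, 1]."""
    if s <= 0.04045:
        return s / 12.92
    return ((s + 0.055) / 1.055) ** 2.4


def linear_to_srgb(linear):
    """Encode one linear intensity in [0, 1] as an sRGB channel in [0, 1]."""
    if linear <= 0.0031308:
        return linear * 12.92
    return 1.055 * linear ** (1 / 2.4) - 0.055


def to_u8(x):
    """Scale a [0, 1] value to the nearest 8-bit code."""
    return round(x * 255)


# 50/50 blend of black and white (e.g. half-transparent white over black).
naive = (0 + 255) / 2  # averaging the stored sRGB numbers directly
correct = to_u8(linear_to_srgb((srgb_to_linear(0.0) + srgb_to_linear(1.0)) / 2))
print(naive, correct)  # 127.5 188

# How many of the 256 codes represent intensities below 5% of white?
dark_linear = sum(1 for v in range(256) if v / 255 < 0.05)
dark_srgb = sum(1 for v in range(256) if srgb_to_linear(v / 255) < 0.05)
print(dark_linear, dark_srgb)  # 13 64
```

Decoding and re-encoding every 8-bit code returns the same code, so the conversions themselves lose nothing as long as the intermediate values stay in floating point.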
Up to now, I've dealt with one intensity only, but there are cleverer things to do with colors. The human eye can tell different brightnesses apart better than different tints (more technically, it has better luminance resolution than chrominance), so you can make even better use of your 24 bits by storing the brightness separately from the tint. This is what YUV, YCbCr, etc. representations try to do. The Y channel is the overall lightness of the color, and uses more bits (or has more spatial resolution) than the other two channels. In practice, YUV and YCbCr are computed from gamma-encoded RGB values, so Y is "luma" rather than linear luminance: doubling the number in the Y channel does not double the light output. That is why these representations are used mainly for storage and transmission, not for image processing, which should still be done on linear RGB.
I think that answers your question, so I'll end with a quick historical note. Before sRGB, old CRTs used to have a non-linearity built into them. If you doubled the voltage for a pixel, you would more than double the intensity. How much more was different for each monitor, and this parameter was called the gamma. This behavior was useful because it meant you could get more darks than lights, but it also meant you couldn't tell how bright your colors would be on the user's CRT unless you calibrated it first. Gamma correction means taking the colors you start with (probably linear) and transforming them for the gamma of the user's CRT. OpenGL comes from this era, which is why its sRGB behavior is sometimes a little confusing. But GPU vendors now tend to work with the convention I described above: when you're storing an 8-bit intensity in a texture or framebuffer, it's sRGB, and when you're processing colors, it's linear. For example, in OpenGL ES 3.0, textures and framebuffer attachments can use an sRGB format (such as GL_SRGB8_ALPHA8), which enables automatic conversion when reading and writing. You don't need to explicitly do sRGB conversion or gamma correction at all.

I am not a "human color detection expert", but I've run into a similar issue with YUV-to-RGB conversion. The R, G and B channels have different weights, so if you change the source color by x, the R, G and B values change by different amounts.
As said, I'm not an expert, anyway, I think, if you want to do some color-correct transformation, you should do it in YUV space, then convert it to RGB (or do the mathematically equivalent operation on RGB, beware of data loss). Also, I'm not sure that YUV is the best native representation of colors, but video cameras provide that format, that's where I've met the issue.
Here is the magic YUV->RGB formula with secret numbers included: http://www.fourcc.org/fccyvrgb.php
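As an illustration of those different weights, here is a sketch of one common variant: full-range BT.601 YCbCr as used by JPEG/JFIF. This is an assumption on my part; the linked page lists several variants (including studio-range 16-235 ones) whose constants differ from these.

```python
def rgb_to_ycbcr(r, g, b):
    """Full-range BT.601 (JFIF) RGB -> YCbCr; all values on a 0-255 scale."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return y, cb, cr


def ycbcr_to_rgb(y, cb, cr):
    """Inverse of rgb_to_ycbcr (same full-range BT.601 convention)."""
    r = y + 1.402 * (cr - 128)
    g = y - 0.344136 * (cb - 128) - 0.714136 * (cr - 128)
    b = y + 1.772 * (cb - 128)
    return r, g, b


# Changing Y alone shifts R, G and B equally; changing Cb by +10 moves
# R by 0, G by about -3.4 and B by about +17.7.
y, cb, cr = rgb_to_ycbcr(100, 150, 200)
print([round(c, 1) for c in ycbcr_to_rgb(y, cb + 10, cr)])
```

Any colour correction done in YCbCr and converted back therefore spreads unevenly over the R, G and B channels, and rounding to 8 bits at each step adds the data loss mentioned above.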

A better Greyscale algorithm

I'm trying to create a spectral image with a constant grey-scale value for every row. I've written some fantastically slow code that basically tries 1000 different variations between black and white for a given hue and finds the one whose grey-scale value most closely approximates the target value, resulting in the following image:
On my laptop screen (HP) there is a very noticeable 'dip' near the blue peak, where blue pixels near the bottom of the image appear much brighter than the neighbouring purple and cyan pixels. On my second screen (Acer, which has far superior colour display) the dip is smaller, but still there.
I use the following function to compute the grey-scale approximation of a colour:
Math.Abs(targetGrey - (0.2989 * R + 0.5870 * G + 0.1140 * B))
When I convert the image to grey-scale using Paint.NET, I get a perfect black-to-white gradient, so that part of the code at least works.
So, question: Is this purely an artefact of the display qualities of my screens? Or can the above mentioned grey-scale algorithm be improved upon to give a visually more consistent result?
EDIT: The problem seems to be mostly monitor calibration. Not, I repeat not, a problem with the code.
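On the "fantastically slow" part: assuming each row is built from fully saturated colours whose HLS lightness runs from black (0) to white (1), the grey value never decreases as lightness increases, so a bisection search can replace trying 1000 variations. A sketch using Python's `colorsys`, with the question's weights and values on a 0-1 scale; `colour_for_grey` is a hypothetical name.

```python
import colorsys


def grey_value(r, g, b):
    """The question's weighted grey, with inputs and output in [0, 1]."""
    return 0.2989 * r + 0.5870 * g + 0.1140 * b


def colour_for_grey(hue, target, iterations=40):
    """Colour of the given hue (0-1) whose grey value is as close as possible to target.

    Assumes full saturation in HLS. Raising the lightness never lowers any
    channel, so the grey value is monotonic in lightness and bisection works.
    """
    lo, hi = 0.0, 1.0
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if grey_value(*colorsys.hls_to_rgb(hue, mid, 1.0)) < target:
            lo = mid
        else:
            hi = mid
    return colorsys.hls_to_rgb(hue, (lo + hi) / 2, 1.0)


# One row of a "constant grey" spectrum: every colour has grey value ~0.5.
row = [colour_for_grey(h / 10, 0.5) for h in range(10)]
for r, g, b in row:
    print(round(r * 255), round(g * 255), round(b * 255))
```

Forty halvings pin the lightness down far below one 8-bit step, at a fraction of the cost of a linear scan.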

I'm wondering if it's more to do with the way our eyes interpret the colors, rather than screen artifacts.
That said, I am using a very high-quality screen (Dell UltraSharp, IPS) that has incredible color reproduction, and I'm not sure what you mean by "dip" in the blue peak. So either I'm just not noticing it, or my screen doesn't show the same picture and is more color-accurate.

The output looks correct given the greyscale conversion you have used (which I believe is the standard one for sRGB colour spaces).
However - there are lots of tradeoffs in colour models and one of these is that you can get results which aren't visually quite what you want. In your case, the fact that there is a very low blue weight means that a greater amount of blue is needed to get any given greyscale value, hence the blue seems to start lower, at least in terms of how the human eye perceives it.
If your objective is to get a visually appealing spectral image, then I'd suggest altering your function to make the R,G,B weights more equal, and see if you like what you get.
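One way to "make the weights more equal" gradually is to interpolate between the BT.601 weights and equal thirds. The parameter `t` and the function name are assumptions for illustration; since both endpoints sum to 1, every blend does too, so white stays white.

```python
BT601 = (0.299, 0.587, 0.114)
EQUAL = (1 / 3, 1 / 3, 1 / 3)


def blended_weights(t):
    """Interpolate from BT.601 weights (t = 0) to equal weights (t = 1)."""
    return tuple((1 - t) * a + t * b for a, b in zip(BT601, EQUAL))


for t in (0.0, 0.25, 0.5, 1.0):
    print(t, [round(w, 3) for w in blended_weights(t)])
```

Rendering the spectrum for a few values of `t` and picking the one that looks most even is a quick way to explore the trade-off.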

Sorting a list of colors in one dimension?

I would like to sort a one-dimensional list of colors so that colors that a typical human would perceive as "like" each other are near each other.
Obviously this is a difficult or perhaps impossible problem to get "perfectly", since colors are typically described with three dimensions, but that doesn't mean that there aren't some sorting methods that look obviously more natural than others.
For example, sorting by RGB doesn't work very well, as it will sort in the following order, for example:
(1) R=254 G=0 B=0
(2) R=254 G=255 B=0
(3) R=255 G=0 B=0
(4) R=255 G=255 B=0
That is, it will alternate those colors red, yellow, red, yellow, with the two "reds" being essentially imperceivably different than each other, and the two yellows also being imperceivably different from each other.
But sorting by HLS works much better, generally speaking, and I think HSL even better than that; with either, the reds will be next to each other, and the yellows will be next to each other.
But HLS/HSL has some problems, too; things that people would perceive as "black" could be split far apart from each other, as could things that people would perceive as "white".
Again, I understand that I pretty much have to accept that there will be some splits like this; I'm just wondering if anyone has found a better way than HLS/HSL. And I'm aware that "better" is somewhat arbitrary; I mean "more natural to a typical human".
For example, a vague thought I've had, but have not yet tried, is perhaps "L is the most important thing if it is very high or very low", but otherwise it is the least important. Has anyone tried this? Has it worked well? What specifically did you decide "very low" and "very high" meant? And so on. Or has anyone found anything else that would improve upon HSL?
I should also note that I am aware that I can define a space-filling curve through the cube of colors, and order them one-dimensionally as they would be encountered while travelling along that curve. That would eliminate perceived discontinuities. However, it's not really what I want; I want decent overall large-scale groupings more than I want perfect small-scale groupings.
Thanks in advance for any help.

If you want to sort a list of colors in one dimension, you first have to decide by what metric you are going to sort them. The one that makes the most sense to me is perceived brightness (see the related question).
I have come across 4 algorithms to sort colors by brightness and compared them. Here is the result.
I generated colors in cycle where only about every 400th color was used. Each color is represented by 2x2 pixels, colors are sorted from darkest to lightest (left to right, top to bottom).
1st picture - Luminance (relative)
0.2126 * R + 0.7152 * G + 0.0722 * B
2nd picture - http://www.w3.org/TR/AERT#color-contrast
0.299 * R + 0.587 * G + 0.114 * B
3rd picture - HSP Color Model
sqrt(0.299 * R^2 + 0.587 * G^2 + 0.114 * B^2)
4th picture - WCAG 2.0 SC 1.4.3 relative luminance and contrast ratio formula
Pattern can be sometimes spotted on 1st and 2nd picture depending on the number of colors in one row. I never spotted any pattern on picture from 3rd or 4th algorithm.
If I had to choose, I would go with algorithm number 3, since it's much easier to implement and about 33% faster than the 4th.
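The four brightness measures as sort keys, as a sketch on 0-255 input. The WCAG one is not spelled out above; this assumes the WCAG 2.0 definition of relative luminance (linearise each channel with the 0.03928 threshold, then apply the BT.709 weights).

```python
import math


def luminance_709(r, g, b):
    """1st: relative luminance weights applied to the raw values."""
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def brightness_w3c(r, g, b):
    """2nd: W3C AERT colour brightness (BT.601 weights)."""
    return 0.299 * r + 0.587 * g + 0.114 * b


def hsp(r, g, b):
    """3rd: HSP colour model perceived brightness."""
    return math.sqrt(0.299 * r * r + 0.587 * g * g + 0.114 * b * b)


def wcag_luminance(r, g, b):
    """4th: WCAG 2.0 relative luminance (channels linearised first)."""
    def lin(c):
        c /= 255
        return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4
    return 0.2126 * lin(r) + 0.7152 * lin(g) + 0.0722 * lin(b)


colours = [(255, 255, 0), (0, 0, 255), (255, 0, 0), (0, 0, 0), (0, 255, 0)]
for key in (luminance_709, brightness_w3c, hsp, wcag_luminance):
    print(key.__name__, sorted(colours, key=lambda c: key(*c)))
```

On these five primaries all four keys agree (black, blue, red, green, yellow); the differences show up on mixed colours, as in the pictures described above.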

You cannot do this without reducing the 3 color dimensions to a single measurement. There are many (infinite) ways of reducing this information, but it is not mathematically possible to do this in a way that ensures that two data points near each other on the reduced continuum will also be near each other in all three of their component color values. As a result, any formula of this type will potentially end up grouping dissimilar colors.
As you mentioned in your question, one way to sort of do this would be to fit a complex curve through the three-dimensional color space occupied by the data points you're trying to sort, and then reduce each data point to its nearest location on the curve and then to that point's distance along the curve. This would work, but in each case it would be a solution custom-tailored to a particular set of data points (rather than a generally applicable solution). It would also be relatively expensive (maybe), and simply wouldn't work on a data set that was not nicely distributed in a curved-line sort of way.
A simpler alternative (that would not work perfectly) would be to choose two "endpoint" colors, preferably on opposite sides of the color wheel. So, for example, you could choose Red as one endpoint color and Blue as the other. You would then convert each color data point to a value on a scale from 0 to 1, where a color that is highly Reddish would get a score near 0 and a color that is highly Bluish would get a score near 1. A score of .5 would indicate a color that either has no Red or Blue in it (a.k.a. Green) or else has equal amounts of Red and Blue (a.k.a. Purple). This approach isn't perfect, but it's the best you can do with this problem.
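The answer doesn't give a formula for the 0-to-1 score. One simple choice that reproduces the described behaviour is b / (r + b), with 0.5 when both are zero; this formula is my assumption, sketched below.

```python
def red_blue_score(r, g, b):
    """Score in [0, 1]: near 0 for reddish, near 1 for bluish colours.

    Colours with no red and no blue (greens, black) score 0.5, as do colours
    with equal red and blue (purples, greys). Green is otherwise ignored.
    """
    if r + b == 0:
        return 0.5
    return b / (r + b)


colours = [(0, 0, 255), (255, 0, 0), (0, 255, 0), (128, 0, 128), (200, 50, 100)]
print(sorted(colours, key=lambda c: red_blue_score(*c)))
```

Note how pure green and purple land on the same score, which is exactly the imperfection the answer warns about.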

There are several standard techniques for reducing multiple dimensions to a single dimension with some notion of "proximity".
I think you should in particular check out the z-order transform.
You can implement a quick version of this by interleaving the bits of your three colour components, and sorting the colours based on this transformed value.
The following Java code should help you get started:
public static int zValue(int r, int g, int b) {
return split(r) + (split(g)<<1) + (split(b)<<2);
}
public static int split(int a) {
// Spread the lowest 10 bits of a over the lowest 30 bits, leaving two zero bits between each.
// The masks are octal literals (note the leading 0).
a=(a|(a<<12))&00014000377;
a=(a|(a<<8)) &00014170017;
a=(a|(a<<4)) &00303030303;
a=(a|(a<<2)) &01111111111;
return a;
}
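A loop-based Python equivalent of the Java `zValue`, slower but easier to check: bit i of r goes to bit 3i, bit i of g to 3i + 1, and bit i of b to 3i + 2, which matches `split(r) + (split(g) << 1) + (split(b) << 2)`. It is applied here to the four colours from the question.

```python
def z_value(r, g, b, bits=8):
    """Morton (z-order) code: interleave the bits of r, g and b."""
    z = 0
    for i in range(bits):
        z |= ((r >> i) & 1) << (3 * i)
        z |= ((g >> i) & 1) << (3 * i + 1)
        z |= ((b >> i) & 1) << (3 * i + 2)
    return z


colours = [(255, 0, 0), (254, 255, 0), (254, 0, 0), (255, 255, 0)]
print(sorted(colours, key=lambda c: z_value(*c)))
# [(254, 0, 0), (255, 0, 0), (254, 255, 0), (255, 255, 0)]
```

Unlike a plain RGB sort, the two near-identical reds end up next to each other, followed by the two yellows.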

There are two approaches you could take. The simple approach is to distil each colour into a single value, and the list of values can then be sorted. The complex approach would depend on all of the colours you have to sort; perhaps it would be an iterative solution that repeatedly shuffles the colours around trying to minimise the "energy" of the entire sequence.
My guess is that you want something simple and fast that looks "nice enough" (rather than trying to figure out the "optimum" aesthetic colour sort), so the simple approach is enough for you.
I'd say HSL is the way to go. Something like
sortValue = L * 5 + S * 2 + H
assuming that H, S and L are each in the range [0, 1].
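The suggested sort value with Python's `colorsys`, assuming 0-255 input. Note that `colorsys` calls the model HLS and returns the components in (h, l, s) order.

```python
import colorsys


def hsl_sort_value(r, g, b):
    """sortValue = L * 5 + S * 2 + H, with H, S and L each in [0, 1]."""
    h, l, s = colorsys.rgb_to_hls(r / 255, g / 255, b / 255)
    return l * 5 + s * 2 + h


colours = [(255, 255, 255), (255, 0, 0), (0, 0, 0), (0, 0, 128), (254, 0, 0)]
print(sorted(colours, key=lambda c: hsl_sort_value(*c)))
# [(0, 0, 0), (0, 0, 128), (254, 0, 0), (255, 0, 0), (255, 255, 255)]
```

Because the weighted ranges overlap, this is not a strict "lightness first" ordering: hue and saturation can still outweigh small lightness differences.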

Here's an idea I came up with after a couple of minutes' thought. It might be crap, or it might not even work at all, but I'll spit it out anyway.
Define a distance function on the space of colours, d(x, y) (where the inputs x and y are colours and the output is perhaps a floating-point number). The distance function you choose may not be terribly important. It might be the sum of the squares of the differences in R, G and B components, say, or it might be a polynomial in the differences in H, L and S components (with the components differently weighted according to how important you feel they are).
Then you calculate the "distance" of each colour in your list from each other, which effectively gives you a graph. Next you calculate the minimum spanning tree of your graph. Then you identify the longest path (with no backtracking) that exists in your MST. The endpoints of this path will be the endpoints of the final list. Next you try to "flatten" the tree into a line by bringing points in the "branches" off your path into the path itself.
Hmm. This might not work all that well if your MST ends up in the shape of a near-loop in colour space. But maybe any approach would have that problem.
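A minimal sketch of the MST idea, with several choices made by me rather than by the answer: the distance is the sum of squared RGB differences, the tree is built with Prim's algorithm, the longest path is found with the usual "farthest node from anywhere, then farthest from that" trick, and "flattening" is a depth-first walk that lists each side branch before continuing along the longest path.

```python
import math


def rgb_distance(x, y):
    """Sum of squared differences of the R, G and B components."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def mst_order(colours, dist=rgb_distance):
    """Order colours by flattening a minimum spanning tree into a line."""
    n = len(colours)
    if n < 3:
        return list(colours)

    # 1. Prim's algorithm on the complete graph (O(n^2) distance evaluations).
    in_tree = [False] * n
    best = [math.inf] * n
    parent = [-1] * n
    best[0] = 0
    adj = [[] for _ in range(n)]
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            w = dist(colours[u], colours[parent[u]])
            adj[u].append((parent[u], w))
            adj[parent[u]].append((u, w))
        for v in range(n):
            if not in_tree[v]:
                d = dist(colours[u], colours[v])
                if d < best[v]:
                    best[v], parent[v] = d, u

    def farthest(start):
        """Tree distances from start; returns the farthest node and predecessors."""
        far, prev, stack = {start: 0}, {start: None}, [start]
        while stack:
            u = stack.pop()
            for v, w in adj[u]:
                if v not in far:
                    far[v], prev[v] = far[u] + w, u
                    stack.append(v)
        return max(far, key=far.get), prev

    # 2. Longest path in the tree, from endpoint a to endpoint b.
    a, _ = farthest(0)
    b, prev = farthest(a)
    on_path, node = set(), b
    while node is not None:
        on_path.add(node)
        node = prev[node]

    # 3. Flatten: depth-first walk from a. The stack is LIFO, so pushing the
    #    path child first means side branches are listed before the walk moves
    #    on along the path, and the list normally ends at b.
    order, stack, seen = [], [a], {a}
    while stack:
        u = stack.pop()
        order.append(u)
        children = [v for v, _ in adj[u] if v not in seen]
        children.sort(key=lambda v: v not in on_path)
        for v in children:
            seen.add(v)
            stack.append(v)
    return [colours[i] for i in order]


greys = [(120, 120, 120), (0, 0, 0), (240, 240, 240), (60, 60, 60), (180, 180, 180)]
print(mst_order(greys))
```

For shuffled greys the tree is a straight line, so the result comes back sorted (in one direction or the other); the near-loop case mentioned above is where this flattening rule would start to struggle.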

RGB to monochrome conversion

How do I convert the RGB values of a pixel to a single monochrome value?

I found one possible solution in the Color FAQ. The luminance component Y (from the CIE XYZ system) captures what is most perceived by humans as color in one channel. So, use those coefficients:
mono = (0.2125 * color.r) + (0.7154 * color.g) + (0.0721 * color.b);

This MSDN article uses (0.299 * color.R + 0.587 * color.G + 0.114 * color.B);
This Wikipedia article uses (0.3* color.R + 0.59 * color.G + 0.11 * color.B);

This depends on what your motivations are. If you just want to turn an arbitrary image to grayscale and have it look pretty good, the conversions in other answers to this question will do.
If you are converting color photographs to black and white, the process can be both very complicated and subjective, requiring specific tweaking for each image. For an idea what might be involved, take a look at this tutorial from Adobe for Photoshop.
Replicating this in code would be fairly involved, and would still require user intervention to get the resulting image aesthetically "perfect" (whatever that means!).

As also mentioned, a grayscale translation (note that monochromatic images need not be in grayscale) from an RGB triplet is subject to taste.
For example, you could cheat, extract only the blue component, by simply throwing the red and green components away, and copying the blue value in their stead. Another simple and generally ok solution would be to take the average of the pixel's RGB-triplet and use that value in all three components.
The fact that there's a considerable market for professional and not-very-cheap-at-all-no-sirree grayscale/monochrome converter plugins for Photoshop alone shows that the conversion is as simple or as complex as you wish.

The logic behind converting an arbitrary RGB picture to monochrome is not a trivial linear transformation. In my opinion, such a problem is better addressed by "color segmentation" techniques, which you can achieve with k-means clustering.
See reference example from MathWorks site.
https://www.mathworks.com/examples/image/mw/images-ex71219044-color-based-segmentation-using-k-means-clustering
Original picture in colours.
After converting to monochrome using k-means clustering
How does this work?
Collect all pixel values from the entire image. From an image which is W pixels wide and H pixels high, you will get W * H colour values. Now, using the k-means algorithm, create 2 clusters (or bins) and throw the colours into the appropriate bins. The 2 clusters represent your black and white shades.
YouTube video demonstrating image segmentation using k-means:
https://www.youtube.com/watch?v=yR7k19YBqiw
Challenges with this method
The k-means clustering algorithm is susceptible to outliers. A few random pixels with a color whose RGB distance is far away from the rest of the crowd could easily skew the centroids to produce unexpected results.
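The steps above as a plain-Python sketch (no MATLAB or NumPy). Seeding the two centroids with the darkest and lightest pixels (by R + G + B) is an assumption that keeps the example deterministic; library implementations usually seed randomly, and extreme seeds make the outlier problem just described even more likely.

```python
def kmeans_two_tone(pixels, iterations=20):
    """Cluster (r, g, b) pixels into two groups with k-means (k = 2).

    Returns (labels, centroids) with label 0 for the darker cluster and
    label 1 for the lighter one.
    """
    def sq_dist(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))

    centroids = [min(pixels, key=sum), max(pixels, key=sum)]
    labels = [0] * len(pixels)
    for _ in range(iterations):
        # Assignment step: each pixel joins its nearest centroid.
        labels = [0 if sq_dist(p, centroids[0]) <= sq_dist(p, centroids[1]) else 1
                  for p in pixels]
        # Update step: move each centroid to the mean colour of its members.
        new_centroids = []
        for k in (0, 1):
            members = [p for p, lab in zip(pixels, labels) if lab == k]
            if members:
                new_centroids.append(tuple(sum(ch) / len(members) for ch in zip(*members)))
            else:
                new_centroids.append(centroids[k])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    # Make sure label 0 is the darker cluster.
    if sum(centroids[0]) > sum(centroids[1]):
        centroids = centroids[::-1]
        labels = [1 - lab for lab in labels]
    return labels, centroids


pixels = [(10, 10, 10), (20, 15, 5), (240, 240, 230), (250, 245, 255), (30, 20, 10)]
labels, centroids = kmeans_two_tone(pixels)
mono = [0 if lab == 0 else 255 for lab in labels]  # black or white per pixel
print(mono)  # [0, 0, 255, 255, 0]
```

For a real image, `pixels` would be the W * H colour values read row by row, and `mono` would be written back in the same order.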

Just to point out regarding the self-selected answer: you have to LINEARIZE the sRGB values before you can apply the coefficients. This means removing the transfer curve.
To remove the power curve, divide the 8-bit R, G and B channels by 255.0, then either use the sRGB piecewise transform, which is recommended for image processing, OR cheat and raise each channel to the power of 2.2.
Only after linearizing can you apply the coefficients shown, (which also are not exactly correct in the selected answer).
The standard is 0.2126 0.7152 and 0.0722. Multiply each channel by its coefficient and sum them together for Y, the luminance. Then re-apply the gamma to Y and multiply by 255, then copy to all three channels, and boom you have a greyscale (monochrome) image.
Here it is all at once in one simple line:
// Andy's Easy Greyscale in one line.
// Send it sR sG sB channels as 8-bit ints (0-255), and
// it returns three channels sRgrey sGgrey sBgrey
// as 8-bit ints that display glorious grey.
// The 2.2 and 1/2.2 exponents approximate the sRGB transfer curve.
sRgrey = sGgrey = sBgrey = Math.min(Math.round(Math.pow(Math.pow(sR/255.0, 2.2)*0.2126 + Math.pow(sG/255.0, 2.2)*0.7152 + Math.pow(sB/255.0, 2.2)*0.0722, 1/2.2)*255), 255);
And that's it. Unless you have to parse hex strings....
