Why is color segmentation easier in HSV?

I've heard that if you need to do color segmentation in your software (create a binary image from a colored image by setting pixels to 1 if they meet certain threshold rules like R < 100, G > 100, 10 < B < 123), it is better to first convert your image to HSV. Is this really true? And why?

The big reason is that it separates color information (chroma) from intensity or lighting (luma). Because value is separated out, you can construct a histogram or thresholding rules using only saturation and hue. In theory this works regardless of lighting changes in the value channel; in practice it is simply a nice improvement. Even by singling out only the hue you still have a very meaningful representation of the base color that will likely work much better than RGB. The end result is more robust color thresholding over simpler parameters.
Hue is a circular representation of color, so 0 and 360 degrees are the same hue, which gives you more flexibility with the buckets you use in a histogram. Geometrically, you can picture the HSV color space as a cone or cylinder with H being the angle, saturation being the radius, and value being the height. See the HSV Wikipedia page.
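As a rough sketch of what such a rule might look like in code (my own illustration, not from the answer; the hue/saturation limits are arbitrary, and the pixel is assumed to be RGB floats in [0,1]):

#include <math.h>

/* Threshold a pixel on hue and saturation only, ignoring value (brightness),
 * so the rule is less sensitive to lighting. Hypothetical limits: hue in
 * [20, 40] degrees with saturation >= 0.3. */
static int is_target_color(float r, float g, float b)
{
    float maxc = fmaxf(r, fmaxf(g, b));
    float minc = fminf(r, fminf(g, b));
    float delta = maxc - minc;

    float sat = (maxc > 0.0f) ? delta / maxc : 0.0f;
    float hue = 0.0f;                       /* degrees in [0, 360) */
    if (delta > 0.0f) {
        if (maxc == r)      hue = 60.0f * fmodf((g - b) / delta, 6.0f);
        else if (maxc == g) hue = 60.0f * ((b - r) / delta + 2.0f);
        else                hue = 60.0f * ((r - g) / delta + 4.0f);
        if (hue < 0.0f) hue += 360.0f;
    }

    return (hue >= 20.0f && hue <= 40.0f) && (sat >= 0.3f);
}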

Related

Relation of luminance in RGB/XYZ color and physical luminance

Short version: When a color described in XYZ or xyY coordinates has a luminance Y=1, what are the physical units of that? Does that mean 1 candela, or 1 lumen? Is there any way to translate between this conceptual space and physical brightness?
Long version: I want to simulate how the sky looks in different directions, at different times of day, and (eventually) under different cloudiness and air pollution conditions. I've learned enough to figure out how to translate a given spectrum into a chromaticity, for example xyz coordinates. But almost everything I've read on color theory in graphical display is focused on relative color, so the luminance is always 1. Non-programming color theory describes the units of luminance, so that I can translate from a spectrum in watts/square meter/steradian to candela or lumens, but nothing I've found describes the units of luminance in programming. What are the units of luminance in XYZ coordinates? I understand that the actual brightness of a patch would depend on monitor settings, but I'm really not finding any hints as to how to proceed.
Below is an example of what I'm coming across. The base color, at relative luminance of 1, was calculated from first principles. All the other colors are generated by increasing or decreasing the luminance. Most of them are plausible colors for mid-day sky. For the parameters I've chosen, I believe the total intensity in the visible range is 6.5 W/m2/sr = 4434 cd/m2, which seems to be in the right ballpark according to Wiki: Orders of Magnitude. Which color would I choose to represent that patch of sky?
Luminance is usually expressed in candelas per square meter (cd/m2), and CIE XYZ's Y component is a luminance in cd/m2 if the convention used is "absolute XYZ", which is rare. (The link is to an article I wrote which contains more detailed information.) More commonly, XYZ colors are normalized such that the white point (such as the D65 or D50 white point) has Y = 1 (or Y = 100).
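For completeness, the general relation behind those units (standard CIE photometry, not something specific to this answer) is

Y_{abs} = 683 \, \tfrac{\mathrm{lm}}{\mathrm{W}} \int L_{e,\lambda}(\lambda)\, \bar{y}(\lambda)\, d\lambda \quad [\mathrm{cd/m^2}]

where L_{e,\lambda} is the spectral radiance in W/m2/sr per unit wavelength and \bar{y}(\lambda) is the CIE 1931 luminous efficiency function. A normalized white point of Y = 1 then corresponds to whatever absolute luminance you decide your display's white is.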

CIE-L*u*v* color interpolation

I'm writing a vertex decimator that needs to interpolate vertex colors on a mesh. I'm reading Level of Detail for 3D Graphics for domain material. In the color interpolation section, the book goes on to suggest using the CIE-L*u*v* color space to perform perceptually linear interpolation of colors.
The translation equations to and from the CIE XYZ color space are provided. I am able to implement the equations it provides, but Wikipedia leaves out numeric values of the following variables: u'n, v'n, and Yn.
The article says these values depend on a "specified white point" and its "luminance". It suggests u'n = 0.2009 and v'n = 0.4610 when using the 2° observer and standard illuminant C. If I am using these, what would Yn be? I do not know enough physics to figure this out, and I have been unable to search for an answer on Google.
In the end, my question boils down to: What are satisfactory/appropriate values I can use for u'n, v'n, and Yn?
Also, I'm assuming I simply linearly interpolate piecewise each component of CIE-Luv* (L*, u*, and v*) when interpolating values in this color space. Is this correct?
These three values are left out because they depend on the colorspace of the specific device (e.g. display, printer or camera). Since computer screens use an RGB colorspace where perceived grey is R=G=B, you can assume that the values are not device dependent. I can't remember the values off by heart, so I'll edit them in later.
The human eye perceives luminance/intensity logarithmically; however, a linear interpolation is close enough, especially since you don't know what the actual min and max screen levels are.
The human eye perceives the color angle linearly; however, you need to take into account that the angle is cyclic, so interpolating between two hues near the wrap-around point should give a result near the wrap-around point, not the half-way point. E.g. the average of purple and red should stay close to purple/red, not land on the opposite side of the wheel.
I think that the perception of saturation is also logarithmic; however, it can be approximated by a linear interpolation.
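Regarding the cyclic hue point above, here is a small sketch (my own helper, not from the answer) of interpolating an angle with the wrap-around handled:

#include <math.h>

/* Interpolate between two hues (in degrees), taking the short way around
 * the circle. t is in [0,1]. */
static float lerp_hue_deg(float h0, float h1, float t)
{
    float d = fmodf(h1 - h0, 360.0f);                   /* signed difference */
    if (d > 180.0f)  d -= 360.0f;                       /* take the short arc */
    if (d < -180.0f) d += 360.0f;
    float h = h0 + t * d;
    return fmodf(fmodf(h, 360.0f) + 360.0f, 360.0f);    /* wrap into [0,360) */
}

With this, interpolating half-way between 350° and 10° gives 0° rather than 180°.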
Edit:
It seems like most sites use the sRGB to XYZ formulas.
http://www.brucelindbloom.com/index.html?Eqn_RGB_XYZ_Matrix.html
http://www.easyrgb.com/index.php?X=MATH&H=02#text2
http://colormine.org/convert/rgb-to-xyz
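To make the role of Yn concrete, here is a sketch of the standard XYZ to CIE-L*u*v* conversion using the u'n and v'n values from the question (illuminant C, 2° observer), assuming XYZ is normalized so the reference white has Yn = 1.0 (if you scale Y to 0..100, use Yn = 100 instead):

#include <math.h>

/* XYZ -> CIE-L*u*v*, standard CIE formulas. Assumes normalized XYZ with
 * reference white Yn = 1.0; u'n, v'n as given in the question. */
static void xyz_to_luv(float X, float Y, float Z,
                       float *L, float *u, float *v)
{
    const float un_prime = 0.2009f, vn_prime = 0.4610f, Yn = 1.0f;

    float denom = X + 15.0f * Y + 3.0f * Z;
    float u_prime = (denom > 0.0f) ? 4.0f * X / denom : un_prime;
    float v_prime = (denom > 0.0f) ? 9.0f * Y / denom : vn_prime;

    float yr = Y / Yn;
    *L = (yr > 0.008856f) ? 116.0f * cbrtf(yr) - 16.0f   /* (6/29)^3 */
                          : 903.3f * yr;                 /* (29/3)^3 */
    *u = 13.0f * (*L) * (u_prime - un_prime);
    *v = 13.0f * (*L) * (v_prime - vn_prime);
}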

What are the practical differences when working with colors in a linear vs. a non-linear RGB space?

What is the basic property of a linear RGB space and what is the fundamental property of a non-linear one? When talking about the values inside each channel in those 8 (or more) bits, what changes?
In OpenGL, colors are 3+1 values, and by this I mean RGB+alpha, with 8 bits reserved for each channel; this is the part that I get clearly.
But when it comes to gamma correction I don't get what the effect of working in a non-linear RGB space is.
Since I know how to use a curve in photo-editing software, my explanation is that in a linear RGB space you take the values as they are, with no manipulation and no math function attached, whereas when it's non-linear each channel usually evolves following a classic power-function behaviour.
Even if I take this explanation as the real one, I still don't get what a real linear space is, because after computation all non-linear RGB spaces become linear, and most important of all I don't get the part where a non-linear color space is more suitable for the human eye, because as far as I understand all RGB spaces are linear in the end.
Let's say you're working with RGB colors: each color is represented with three intensities or brightnesses. You've got to choose between "linear RGB" and "sRGB". For now, we'll simplify things by ignoring the three different intensities, and assume you just have one intensity: that is, you're only dealing with shades of gray.
In a linear color-space, the relationship between the numbers you store and the intensities they represent is linear. Practically, this means that if you double the number, you double the intensity (the lightness of the gray). If you want to add two intensities together (because you're computing an intensity based on the contributions of two light sources, or because you're adding a transparent object on top of an opaque object), you can do this by just adding the two numbers together. If you're doing any kind of 2D blending or 3D shading, or almost any image processing, then you want your intensities in a linear color-space, so you can just add, subtract, multiply, and divide numbers to have the same effect on the intensities. Most color-processing and rendering algorithms only give correct results with linear RGB, unless you add extra weights to everything.
That sounds really easy, but there's a problem. The human eye's sensitivity to light is finer at low intensities than high intensities. That's to say, if you make a list of all the intensities you can distinguish, there are more dark ones than light ones. To put it another way, you can tell dark shades of gray apart better than you can with light shades of gray. In particular, if you're using 8 bits to represent your intensity, and you do this in a linear color-space, you'll end up with too many light shades, and not enough dark shades. You get banding in your dark areas, while in your light areas, you're wasting bits on different shades of near-white that the user can't tell apart.
To avoid this problem, and make the best use of those 8 bits, we tend to use sRGB. The sRGB standard tells you a curve to use, to make your colors non-linear. The curve is shallower at the bottom, so you can have more dark grays, and steeper at the top, so you have fewer light grays. If you double the number, you more than double the intensity. This means that if you add sRGB colors together, you end up with a result that is lighter than it should be. These days, most monitors interpret their input colors as sRGB. So, when you're putting a color on the screen, or storing it in an 8-bit-per-channel texture, store it as sRGB, so you make the best use of those 8 bits.
You'll notice we now have a problem: we want our colors processed in linear space, but stored in sRGB. This means you end up doing sRGB-to-linear conversion on read, and linear-to-sRGB conversion on write. As we've already said that linear 8-bit intensities don't have enough darks, this would cause problems, so there's one more practical rule: don't use 8-bit linear colors if you can avoid it. It's becoming conventional to follow the rule that 8-bit colors are always sRGB, so you do your sRGB-to-linear conversion at the same time as widening your intensity from 8 to 16 bits, or from integer to floating-point; similarly, when you've finished your floating-point processing, you narrow to 8 bits at the same time as converting to sRGB. If you follow these rules, you never have to worry about gamma correction.
When you're reading an sRGB image, and you want linear intensities, apply this formula to each intensity:
float s = read_channel();
float linear;
if (s <= 0.04045) linear = s / 12.92;
else linear = pow((s + 0.055) / 1.055, 2.4);
Going the other way, when you want to write an image as sRGB, apply this formula to each linear intensity:
float linear = do_processing();
float s;
if (linear <= 0.0031308) s = linear * 12.92;
else s = 1.055 * pow(linear, 1.0/2.4) - 0.055;
In both cases, the floating-point s value ranges from 0 to 1, so if you're reading 8-bit integers you want to divide by 255 first, and if you're writing 8-bit integers you want to multiply by 255 last, the same way you usually would. That's all you need to know to work with sRGB.
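As a sketch of that 8-bit wrapping (my own helper functions, using the same constants as above):

#include <math.h>
#include <stdint.h>

/* Decode an 8-bit sRGB value to a linear float in [0,1]. */
static float srgb_byte_to_linear(uint8_t v)
{
    float s = v / 255.0f;
    return (s <= 0.04045f) ? s / 12.92f
                           : powf((s + 0.055f) / 1.055f, 2.4f);
}

/* Encode a linear float back to an 8-bit sRGB value. */
static uint8_t linear_to_srgb_byte(float linear)
{
    float s = (linear <= 0.0031308f) ? linear * 12.92f
                                     : 1.055f * powf(linear, 1.0f / 2.4f) - 0.055f;
    if (s < 0.0f) s = 0.0f;
    if (s > 1.0f) s = 1.0f;
    return (uint8_t)(s * 255.0f + 0.5f);
}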
Up to now, I've dealt with one intensity only, but there are cleverer things to do with colors. The human eye can tell different brightnesses apart better than different tints (more technically, it has better luminance resolution than chrominance), so you can make even better use of your 24 bits by storing the brightness separately from the tint. This is what YUV, YCrCb, etc. representations try to do. The Y channel is the overall lightness of the color, and uses more bits (or has more spatial resolution) than the other two channels. This way, you don't (always) need to apply a curve like you do with RGB intensities. YUV is a linear color-space, so if you double the number in the Y channel, you double the lightness of the color, but you can't add or multiply YUV colors together like you can with RGB colors, so it's not used for image processing, only for storage and transmission.
I think that answers your question, so I'll end with a quick historical note. Before sRGB, old CRTs used to have a non-linearity built into them. If you doubled the voltage for a pixel, you would more than double the intensity. How much more was different for each monitor, and this parameter was called the gamma. This behavior was useful because it meant you could get more darks than lights, but it also meant you couldn't tell how bright your colors would be on the user's CRT, unless you calibrated it first. Gamma correction means taking the colors you start with (probably linear) and transforming them for the gamma of the user's CRT. OpenGL comes from this era, which is why its sRGB behavior is sometimes a little confusing. But GPU vendors now tend to work with the convention I described above: when you're storing an 8-bit intensity in a texture or framebuffer, it's sRGB, and when you're processing colors, it's linear. For example, in OpenGL ES 3.0, each framebuffer and texture has an "sRGB flag" you can turn on to enable automatic conversion when reading and writing. You don't need to explicitly do sRGB conversion or gamma correction at all.
I am not a "human color detection expert", but I've met similar thing on the YUV->RGB conversion. There are different weights for R/G/B channels, so if you change the source color by x, RGB values change different quantity.
As said, I'm not an expert, anyway, I think, if you want to do some color-correct transformation, you should do it in YUV space, then convert it to RGB (or do the mathematically equivalent operation on RGB, beware of data loss). Also, I'm not sure that YUV is the best native representation of colors, but video cameras provide that format, that's where I've met the issue.
Here is the magic YUV->RGB formula with secret numbers included: http://www.fourcc.org/fccyvrgb.php
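For a concrete example of what those numbers typically look like, here is a full-range BT.601 YCbCr-to-RGB conversion as commonly quoted (a sketch; the page above lists several variants, and other standards such as BT.709 use different coefficients):

/* Full-range (JPEG-style) BT.601 YCbCr -> RGB, 8-bit, Cb/Cr centered on 128. */
static void ycbcr_to_rgb(int y, int cb, int cr, int *r, int *g, int *b)
{
    float R = y + 1.402f    * (cr - 128);
    float G = y - 0.344136f * (cb - 128) - 0.714136f * (cr - 128);
    float B = y + 1.772f    * (cb - 128);

    *r = (R < 0) ? 0 : (R > 255) ? 255 : (int)(R + 0.5f);
    *g = (G < 0) ? 0 : (G > 255) ? 255 : (int)(G + 0.5f);
    *b = (B < 0) ? 0 : (B > 255) ? 255 : (int)(B + 0.5f);
}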

Most "stable" color representation : RGB? HSV? CIELAB?

There are several color representations in computer science: the standard RGB, but also HSV, HSL, CIE XYZ, YCC, CIELAB, CIELUV, ... It seems to me that most of the time these representations try to approximate human vision (perceptually identical colors should have similar representations).
But what I want to know is which representation is the most "stable" when it comes to pictures. I have an object, let's say a bottle of Coke, and I have thousands of pictures of this bottle, taken under very different circumstances (the main difference would be how light or dark the picture is, but there's orientation, etc...).
My question is: which color representation will empirically give me the most stable representation of the colors of the bottle? The "red" color of the label should not vary too much. Well, I know it will vary, but I would like to know the most "stable" representation.
I've been taught that HSV is better than RGB for this kind of thing, but I have no clue about the rest.
Edit (technical details): I take a particular point of the bottle, and pick the corresponding pixels in a thousand pictures of this point. I now have a cloud of points, which depends on the representation. I want the representation that minimizes the "size" of this cloud, for example the one that minimizes the mean distance of the points of the cloud to its barycenter.
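For what it's worth, the metric described in the edit could be computed with something like this sketch (cloud_spread is my own name; it assumes each sample has already been converted to the 3-component representation being tested):

#include <math.h>
#include <stddef.h>

/* Mean Euclidean distance of n 3-component samples to their barycenter. */
static double cloud_spread(const double pts[][3], size_t n)
{
    double c[3] = {0.0, 0.0, 0.0};
    for (size_t i = 0; i < n; i++)
        for (int k = 0; k < 3; k++)
            c[k] += pts[i][k] / n;

    double sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        double d2 = 0.0;
        for (int k = 0; k < 3; k++) {
            double d = pts[i][k] - c[k];
            d2 += d * d;
        }
        sum += sqrt(d2);
    }
    return sum / n;
}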
You might want to check out http://www.cs.harvard.edu/~sjg/papers/cspace.pdf, which proposes a new colorspace apparently designed to address this precise question.
I'm not aware of a colourspace that does what you want, but I do have some remarks:
RGB closely matches the way colours are displayed to us on monitors. It is one of the worst colourspaces available in terms of approximating human perception.
As for the other colourspaces: Some try to make sure colours that are perceptually close together are also close together in the colourspace. Others also try to ensure that perceptually similar differences in colour also produce similar differences in the colourspace, regardless of where in the colourspace you are.
The first means that if you think the difference in colour between blue A, and blue B is similar to the difference in colour between the blue A and blue C, then in the colourspace the distance between blue A and blue B will be similar to the distance between blue A and blue C, and they will all three be close together in the colourspace. I think this is called a perceptually smooth colourspace. CIE XYZ is an example of this.
The second means that if you think the difference in colour between blue A and blue B is similar to the difference in colour between red A and red B then in the colourspace the distance between blue A and blue B will be similar to the difference between red A and red B. This is called a perceptually uniform colourspace. CIE Lab is an example of this.
[edit 2011-07-29] As for your problem: Any of HSV, HSL, CIE XYZ, YCC, CIELAB, CIELUV, YUV separate out the illumination from the colour info in some way, so those are the better options. They provide some immunity from illumination changes, but won't help you when the colour temperature changes drastically or coloured light is used. XYZ and YUV are computationally less expensive to get to from RGB (which is what most cameras give you) but also less "good" than HSV, HSL, or CIELAB (the latter is often considered one of the best, but it is also one of the most difficult).
Depending on what you are searching for, you could calibrate the color balance of the images. For example, suppose you are matching Coca-Cola logos: you know that the letters in the logo are always white, so if they are not white in your image you can use the colour they actually have to correct for it, which gives you information about the other colours.
Our perception of the color of something is mostly determined by its hue; a colorspace such as HSV which gives a single value representing hue will work best.
The eye is a remarkable instrument though, and knowing the color of a single point is not enough. If the entire scene has a yellow or blue tint to it, the eye will compensate and your perception will be of a purer color - the orange Coke bottle will appear to be redder than it is. Likewise with darkness and brightness. If possible, you should try to compensate the image before taking the color sample.

What is the formula for alpha blending for a number of pixels?

I have a number of RGBA pixels, each of them has an alpha component.
So I have a list of pixels: (p0 p1 p2 p3 p4 ... pn) where p0 is the front pixel and pn is the farthest (at the back).
The last (or any) pixel is not necessarily opaque, so the resulting blended pixel can be somewhat transparent as well.
I'm blending from the beginning of the list to the end, not vice versa (yes, it is raytracing). So if the result at any moment becomes opaque enough, I can stop with a correct enough result.
I'll apply the blending algorithm in this way: ((((p0 # p1) # p2) # p3) ... )
Can anyone suggest a correct blending formula, not only for R, G and B, but for the A component also?
UPD: I wonder how it is possible that for a well-defined process of blending colors we can have many formulas? Is it some kind of approximation? This looks crazy to me: the formulas are not so different that we really gain efficiency or optimization. Can anyone clarify this?
Alpha-blending is one of those topics that has more depth than you might think. It depends on what the alpha value means in your system, and if you guess wrong, then you'll end up with results that look kind of okay, but that display weird artifacts.
Check out Porter and Duff's classic paper "Compositing Digital Images" for a great, readable discussion and all the formulas. You probably want the "over" operator.
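As a sketch of that "over" operator with straight (non-premultiplied) alpha, where src is in front of dst and a = 1 means opaque (my own transcription of the standard formulas):

/* Porter-Duff "over" with straight alpha; all components in [0,1]. */
typedef struct { float r, g, b, a; } rgba;

static rgba over(rgba src, rgba dst)
{
    rgba out;
    out.a = src.a + dst.a * (1.0f - src.a);
    if (out.a > 0.0f) {
        out.r = (src.r * src.a + dst.r * dst.a * (1.0f - src.a)) / out.a;
        out.g = (src.g * src.a + dst.g * dst.a * (1.0f - src.a)) / out.a;
        out.b = (src.b * src.a + dst.b * dst.a * (1.0f - src.a)) / out.a;
    } else {
        out.r = out.g = out.b = 0.0f;
    }
    return out;
}

Since "over" is associative, folding it front-to-back as in the question, ((((p0 over p1) over p2) ...), gives the same result as compositing back-to-front.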
It sounds like you're doing something closer to volume rendering. For a formula and references, see the Graphics FAQ, question 5.16 "How do I perform volume rendering?".
There are various possible ways of doing this, depending on how the RGBA values actually represent the properties of the materials.
Here's a possible algorithm. Start with final pixel colours lightr=lightg=lightb=0, lightleft=1;
For each r,g,b,a pixel encountered evaluate:
lightr += lightleft*r*a;
lightg += lightleft*g*a;
lightb += lightleft*b*a;
lightleft *= 1-a;
(The RGBA values are normalised between 0 and 1, and I'm assuming that a=1 means opaque, a=0 means wholly transparent)
If the first pixel encountered is blue with opacity 50%, then 50% of the available colour is set to blue, and the rest unknown. If a red pixel with opacity 50% is next, then 25% of the remaining light is set to red, so the pixel has 50% blue, 25% red. If a green pixel with opacity 60% is next, then the pixel is 50% blue, 25% red, 15% green, with 10% of the light remaining.
The physical materials that correspond to this function are light-emitting but partially opaque materials: thus, a pixel in the middle of the stack can never darken the final colour: it can only prevent light behind it from increasing the final colour (by being black and fully opaque).
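Putting the algorithm above into a single front-to-back loop, with two additions of my own: the resulting alpha is taken as 1 - lightleft (the overall opacity of the stack), and the loop exits early once almost no light is left, as the question suggests.

/* Front-to-back accumulation; p[0] is the front pixel, components in [0,1],
 * a = 1 means opaque. The output alpha is the overall opacity of the stack. */
typedef struct { float r, g, b, a; } rgba;

static rgba blend_front_to_back(const rgba *p, int n)
{
    rgba out = {0.0f, 0.0f, 0.0f, 0.0f};
    float lightleft = 1.0f;

    for (int i = 0; i < n && lightleft > 0.001f; i++) {
        float w = lightleft * p[i].a;     /* this pixel's contribution */
        out.r += w * p[i].r;
        out.g += w * p[i].g;
        out.b += w * p[i].b;
        lightleft *= 1.0f - p[i].a;
    }
    out.a = 1.0f - lightleft;
    return out;
}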

Resources