Numerical difference/distance between two colours

I have a list of colours and want to merge the two "closest" (most similar) by taking their average. To do this I need a numerical distance function between colours. I could treat RGB as a 3D vector and use the distance between the two points, but that may not be perceptually ideal: the human eye, for example, apparently notices contrast in luminance far better than contrast in hue. For that matter, simple averaging may not be correct either. The speed of this calculation is also a concern.
Before jumping into the various colour spaces and colour-difference formulae, I'd like to know whether there is a well-known method that loosely balances visual accuracy and performance. Is the 3D distance the way to go?
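For concreteness, here is a minimal sketch of the naive approach described above (plain Euclidean distance on RGB treated as a 3D vector, then channel-wise averaging of the closest pair); colours are assumed to be (R, G, B) tuples in 0-255:

from itertools import combinations

def rgb_distance_sq(c1, c2):
    # Plain squared Euclidean distance in RGB; skipping the square root is fine
    # because we only compare distances with each other.
    return sum((a - b) ** 2 for a, b in zip(c1, c2))

def merge_closest(colours):
    # Find the closest pair and replace both entries by their channel-wise average.
    (i, a), (j, b) = min(
        combinations(enumerate(colours), 2),
        key=lambda pair: rgb_distance_sq(pair[0][1], pair[1][1]),
    )
    merged = tuple((x + y) // 2 for x, y in zip(a, b))
    return [c for k, c in enumerate(colours) if k not in (i, j)] + [merged]

This is the perceptually imperfect baseline the question is asking about; the answers below discuss weighted metrics and perceptual colour spaces as alternatives.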

Related

Finding nearest color in RGB?

I have to find the nearest color. For example, I have two colors, colorA1 and colorA2, which are nearly the same color. I also have another color, colorB1.
I need a method like this:
Color getNearestColor(colorA1, colorA2, colorB1). This method should compute the difference (distance) between colorA1 and colorA2, and then give me a colorB2 that is at that same distance from colorB1.
Can you give me some ideas on how to implement it?
To find the nearest colour, you need a definition of "near", i.e. a metric.
On Wikipedia you will find various metrics for color differences.
Personally, I would use 2*dR*dR + 4*dG*dG + 3*dB*dB, where dR, dG and dB are the differences between the two colours in each channel (no need for square roots, since you only compare distances against each other). It is easy to calculate, and you can use plain integers (with 32-bit integers you will have no overflow).
Then find which colour has the smallest difference from your target colour.
The other methods are more precise, but in that case "RGB" is not enough. You need to know which colour space you are using (probably sRGB).
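A minimal sketch of that metric in Python (assuming colours are (R, G, B) tuples of 0-255 integers; the weights are the 2, 4, 3 suggested above):

def weighted_distance_sq(c1, c2):
    # 2*dR^2 + 4*dG^2 + 3*dB^2 - integer arithmetic only, no square root needed
    # since we only compare distances against each other.
    dr, dg, db = (a - b for a, b in zip(c1, c2))
    return 2 * dr * dr + 4 * dg * dg + 3 * db * db

def nearest_colour(target, palette):
    # Return the palette entry closest to target under the metric above.
    return min(palette, key=lambda c: weighted_distance_sq(target, c))

For example, nearest_colour((120, 200, 50), [(0, 0, 0), (128, 192, 64), (255, 255, 255)]) returns (128, 192, 64).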

What is the relationship between color space RGB, XYZ and the color matching function?

What is the relationship between color spaces (RGB, XYZ) and the color matching function? Let's say we have some color matching function in the color space XYZ (3 row matrix). We also have the transformation matrix which translates from XYZ coordinates to RGB coordinates.
My understanding is that there is some visual input, which is made up of a color spectrum S(λ). The human eye does not see the world directly - it only sees its interpretation of the world. The eye has 3 cone types (LMS), each of which is roughly responsible for processing RED, GREEN or BLUE. The eye perceives a spectral color because it sums weighted RED, GREEN and BLUE primaries, and this sum matches the color of the input. To match the color, there is a color matching function, which takes the input spectrum and produces the weights by which to multiply the RED, GREEN and BLUE primaries. These are then added, and the result visually matches the spectral input, even though the spectrum contains many, many frequencies while the eye only adds 3 components. So we go from a HUGE space to a space where everything can be described with 3 vectors, summed as dictated by the color matching function.
The spectral input, color primaries, and color matching functions behave as described above and can be summarized in this formula:

s(λ)  ≈  Σ_{i=1..3} [ ∫ ci(λ) s(λ) dλ ] · pi(λ)

where pi are the 3 primary colors, ci are the 3 components of the color matching function, s is the spectral input, and ≈ denotes a perceptual (metameric) match rather than spectral equality.
We have XYZ color space, and a corresponding color matching function which does what is described above. We are then given matrix T, which transforms XYZ coordinates to RGB coordinates. We already know T, and we need to use it to produce a new color matching function for the RGB color space.
I do not understand how the color space relates to the choice of primaries pi(λ) and the choice of color matching functions ci(λ).
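For reference, here is one way the pieces above fit together, sketched from the definitions in the question (treating the three matching functions as a 3-component vector c_XYZ(λ) and assuming everything is linear):

[X, Y, Z]^T = ∫ c_XYZ(λ) s(λ) dλ
[R, G, B]^T = T · [X, Y, Z]^T = ∫ ( T · c_XYZ(λ) ) s(λ) dλ
⇒  c_RGB(λ) = T · c_XYZ(λ)

So the new matching functions would be obtained by applying T to the XYZ matching functions at every wavelength, while the XYZ coordinates of the RGB primaries are, conversely, the columns of T⁻¹.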
I have been trying to understand colours for months, and after some research I believe I have some insights which can probably help answer your question.
I do not understand how the color space relates to the choice of primaries pi(λ)
Primaries are nothing but the wavelengths of the colors that we choose to use for making all the other colors in the space, and that choice also defines the gamut of the colour space. So if you play with the applet provided in the link given below, you can see that the whole gamut of the colour space changes when you change a primary.
Have a look at the "Alternative primaries and gamuts" section.
Now, I do not know how well you understand RGB and XYZ, or what you mean when you say RGB here (I am assuming you are referring to sRGB gamut values). XYZ are actually tristimulus values (sometimes written rho, beta and gamma), and for simplicity XYZ is converted to xy space, from which you get your standard sRGB gamut.
Please go through this if you are interested in understanding how colour sensors work and how sensor values are converted to XYZ.
Please comment if I have missed any information or the answer needs editing.
I think a lot of the issues with colour selection are due to technical problems people had to solve. Usually you are not trying to reproduce colours as accurately as possible, but to make them pleasant looking, cheap, fast to compute on a CPU, and so on. Someone watching the plains of New Zealand on TV is very unlikely to know what they really look like, but almost certainly wants to enjoy the picture and pay little for it.
Several reasons why you might want to use different color matching functions might include:
You are taking pictures under non-white light and you want your picture to look natural.
You are taking underwater pictures and want to compensate for the fact that water attenuates different frequencies at different rates.
Your sensor is not perfect and you want to compensate for that.
On the other hand, you might want to change your primaries for some reason. For example, you might be taking pictures of a scene with a limited range of colours. By nudging your primaries a little you might get a "fuller" picture.
Finally, sometimes you just have to compensate for the limitations of your devices. The phosphors on a CRT TV will impose some restrictions. So will the noise in over-the-air transmission using PAL. And if you go digital, you might be forced to use fewer than 36 bits per pixel. In those cases you have to make compromises, and the goal is to lose as little as possible.
If you want a short tutorial, visit Cambridge in Colour.
Here is Szeliski's computer vision textbook; look at chapters 1, 2 and 10.
Poynton has a list of common transformations.

Averaging many curves with different x and y values

I have several curves, each containing many data points. The x-axis is time; say I have n curves with data points corresponding to various times on the x-axis.
Is there a way to get an "average" of the n curves, despite the fact that the data points are located at different x-points?
I was thinking maybe something like using a histogram to bin the values, but I am not sure which code to start with that could accomplish something like this.
Can Excel or MATLAB do this?
I would also like to plot the standard deviation of the averaged curve.
One concern: the distribution of the x-values is not uniform. There are many more values close to t=0, but at t=5 (for example) the data points are much less frequent.
Another concern: what happens if two values fall within one bin? I assume I would need to average those values before computing the averaged curve.
I hope this conveys what I would like to do.
Any ideas on what code I could use (MATLAB, EXCEL etc) to accomplish my goal?
Since your series are not uniformly sampled, interpolating before computing the mean is one way to avoid biasing towards times where you have more frequent samples. Note that, by construction, interpolation will tend to reduce the range of your values, i.e. the interpolated points are unlikely to fall exactly at your measured points. This has a greater effect on the extreme statistics (e.g. the 5th and 95th percentiles) than on the mean. If you go this route, you'll need the interp1 and mean functions.
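If you end up scripting this outside MATLAB, the same idea looks roughly like the following Python/NumPy sketch (np.interp playing the role of interp1; the curve format is assumed to be a list of (t, x) array pairs with t sorted ascending):

import numpy as np

def average_curves(curves, num_points=100):
    # Each curve is a pair (t, x) of equally long 1-D arrays.
    t_min = max(t.min() for t, _ in curves)    # restrict to the common overlap
    t_max = min(t.max() for t, _ in curves)
    t_common = np.linspace(t_min, t_max, num_points)
    # Resample every curve onto the common time grid, then take the pointwise
    # mean and standard deviation across curves.
    resampled = np.vstack([np.interp(t_common, t, x) for t, x in curves])
    return t_common, resampled.mean(axis=0), resampled.std(axis=0)

The returned standard deviation can be plotted alongside the averaged curve, which covers the second part of the question.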
An alternative is to compute a weighted mean. This way you avoid truncating the range of your measured values. Assuming x is a vector of measured values and t is a vector of measurement times in seconds from some reference time, you can compute the weighted mean by:
timeStep = diff(t);                                          % duration covered by each sample
weightedMean = sum(timeStep .* x(1:end-1)) / sum(timeStep);  % scalar time-weighted mean
As mentioned in the comments above, a sample of your data would help a lot in suggesting the appropriate method for calculating the "average".

Mapping RGB/hex color codes to general color categories

Is there a dataset that maps each of the ~16M RGB or hex color values to a general color family/category - e.g. red, purple, orange, beige, brown, etc. - that I could access programmatically or load into a database or JSON document to cross-reference the color codes against? The use case is to classify the results of PIL color detection of swatch files into a small set of color pickers for a shopping site. It would also work if the mapping is a bit more granular, say 100-200 categories, since it would be easy enough to map those to my target 10-15 myself. I have some knowledge of kNN classification and will work with that if I have to, but it would be so much easier to use a static mapping if one already exists.
You can use a table such as the one in X11
http://www.astrouw.edu.pl/~jskowron/colors-x11/rgb.html
In order to find color proximity, it's best to transform the colors to the Lab color space first, so that Euclidean distances are more meaningful; a nearest-neighbor lookup then gives good results.
You could convert from RGB to the CIE Lab color space, in which the Euclidean distance between two colors is perceptually more meaningful. Here is the link to all the relevant color space transformation formulae used in OpenCV's color conversion method (cvtColor): http://docs.opencv.org/modules/imgproc/doc/miscellaneous_transformations.html
Since your use case is to compare two swatches, I would advise you to use texture descriptors (http://www.robots.ox.ac.uk/~vgg/research/texclass/with.html) in addition to color information for better results.
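As a rough sketch of that Lab + nearest-neighbor idea in Python (the conversion constants are the standard sRGB/D65 ones; the category palette below is made up purely for illustration):

def srgb_to_lab(rgb):
    # rgb is (R, G, B) in 0-255, assumed sRGB: linearise, go to XYZ (D65 white),
    # then to CIE L*a*b*.
    def lin(c):
        c = c / 255.0
        return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4
    r, g, b = (lin(c) for c in rgb)
    x = 0.4124 * r + 0.3576 * g + 0.1805 * b
    y = 0.2126 * r + 0.7152 * g + 0.0722 * b
    z = 0.0193 * r + 0.1192 * g + 0.9505 * b
    def f(t):
        return t ** (1 / 3) if t > (6 / 29) ** 3 else t / (3 * (6 / 29) ** 2) + 4 / 29
    fx, fy, fz = f(x / 0.95047), f(y / 1.0), f(z / 1.08883)
    return (116 * fy - 16, 500 * (fx - fy), 200 * (fy - fz))

# Hypothetical category palette: one representative RGB value per color family.
CATEGORIES = {
    "red": (220, 20, 60), "orange": (255, 140, 0), "brown": (139, 69, 19),
    "purple": (128, 0, 128), "beige": (245, 245, 220), "gray": (128, 128, 128),
}

def classify(rgb):
    # Nearest neighbor in Lab space using plain squared Euclidean distance.
    lab = srgb_to_lab(rgb)
    dist_sq = lambda a, b: sum((u - v) ** 2 for u, v in zip(a, b))
    return min(CATEGORIES, key=lambda name: dist_sq(lab, srgb_to_lab(CATEGORIES[name])))

With a denser palette such as the X11 table linked above, the same classify function maps any RGB value to its nearest named color.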

Given a color, how do I find which color it's closest to?

Let's say that I have a list of valid color values like [0x67FF82, 0x808080, 0xffffff, ...] and given an input color, in hex, I want to find which color in the list of acceptable colors that the input color is closest to.
My thought is that I'd find the color for which the absolute differences of the red, green, and blue values are smallest. Is this correct?
It sounds like you're looking for a way to quantify the "distance" between colors - in math, they'd call it a metric. Many people are intuitively pretty comfortable with the Euclidean metric for example - it's simply the distance between two points as measured with a ruler. In the case of colors, things are more complicated because of subjective perception of different colors.
There's a pretty mathy wikipedia article about color difference, which includes links to different implementations.
The difference or distance between two colors is a metric of interest in color science. It allows people to quantify a notion that would otherwise be described with adjectives, to the detriment of anyone whose work is color critical. Common definitions make use of the Euclidean distance in a device independent color space.
In particular, there's Python Colormath, a Python implementation that converts between different color encodings and also has functions for calculating the distance between two colors. If you happen to be coding in Python, that sounds helpful, although I unfortunately don't have any personal experience with the tool. There are also similar resources available for MATLAB and Excel, provided by the authors of CIEDE2000, a leading color-difference formula.
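For example, a sketch of a nearest-color lookup with python-colormath (module and function names as I recall them from its documentation, so double-check against the current docs before relying on them):

from colormath.color_objects import sRGBColor, LabColor
from colormath.color_conversions import convert_color
from colormath.color_diff import delta_e_cie2000

def nearest(target_hex, palette_hex):
    # Convert hex strings like '67FF82' to Lab, then pick the palette entry with
    # the smallest CIEDE2000 difference from the target.
    def to_lab(h):
        r, g, b = (int(h[i:i + 2], 16) / 255.0 for i in (0, 2, 4))
        return convert_color(sRGBColor(r, g, b), LabColor)
    target = to_lab(target_hex)
    return min(palette_hex, key=lambda h: delta_e_cie2000(target, to_lab(h)))

print(nearest('68FE83', ['67FF82', '808080', 'FFFFFF']))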

Resources