Suggestion on matplotlib colors: need distinct shades

I'm plotting a Voronoi diagram in which I shade the polygons according to a proportional probability (by which I mean that if I were to plot five polygons, their probabilities might total 1). This is my code, where I pass the probability value as the facecolor:
matplotlib.patches.Polygon(poly, facecolor=probList[i])
The problem is that the shades are not distinct enough to reflect my probability values. I'm fine with using any colors as long as the shades reflect the probability. Stack Overflow people, please throw in your suggestions. Thanks!

Picking from matplotlib's colormaps is probably a good start. The link shows all of the preset colormaps.
My favorites (and a common choice) for ordered values (like probability, which goes from zero to one) are hot or afmhot, because they show good distinction of intermediate values and have clear perceptual ordering.
Below are the sequential colormaps from matplotlib (taken from the reference above). Or, see the full set (again, at the reference above) if you want more distinction at the cost of less obvious ordering. (Even if you choose not to use a sequential colormap, you might still want to avoid the unfortunately popular jet colormap because, amongst other reasons, it starts and ends with dark colors, making it hard to read.)
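As a concrete sketch of this (hypothetical stand-ins below for the question's poly, probList and i; any sequential colormap name works), you can map each probability through a colormap before handing it to the patch:

    import matplotlib.pyplot as plt
    from matplotlib.patches import Polygon

    cmap = plt.get_cmap("afmhot")

    # hypothetical stand-ins for the question's variables
    poly = [(0, 0), (1, 0), (0.5, 1)]
    probList = [0.2, 0.7, 0.1]
    i = 0

    # probabilities in [0, 1] map directly to positions along the colormap
    patch = Polygon(poly, facecolor=cmap(probList[i]))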

Related

Interpolation technique for weirdly spaced point data

I have a spatial dataset consisting of a large number of point measurements (n = 10^4) taken along regular grid lines (500 m x 500 m) and some arbitrary lines and blocks in between. Single measurements were taken with a spacing of about 0.3-1.0 m (varying) along these lines (see the example showing every 10th point).
The data can be assumed to be normally distributed but shows strong small-scale variability in some regions. There is also a trend with elevation (r = 0.5) that can easily be removed.
Regardless of the coding platform, I'm looking for a good or "the optimal" way to interpolate these points to a regular 25 x 25m grid over the entire area of interest (5000 x 7000m). I know about the wide range of kriging techniques but I wondered if somebody has a specific idea on how to handle the "oversampling along lines" with rather large gaps between the lines.
Thank you for any advice!
Leo
Kriging does not perform well when the points to interpolate lie on a regular grid, because estimating the covariance model well requires a wide range of different inter-point distances.
Your case is a bit particular... The oversampling along the lines is not a problem at all. The main problem is the big holes you have in your grid. I think these holes will create problems whatever interpolation technique you use.
However it is difficult to predict a priori if kriging will behave well. I advise you to try it anyway.
Kriging is only suited to interpolation. You cannot extrapolate with a kriging metamodel, so you won't be able to predict values in the bottom-left part of your figure, for example (because you have no points there).
To perform kriging, I advise you to use the following tools (depending on the language you're most familiar with):
the DiceKriging package in R (the one I prefer)
the fields package in R (which is more specialized for spatial fields)
the DACE toolbox in MATLAB
Bonus: a reference book about kriging that is available online: http://www.gaussianprocess.org/
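If you end up working in Python instead, here is a minimal sketch of the same idea with scikit-learn's Gaussian process regressor (not one of the tools above; the RBF kernel and 500 m length scale are only guesses, and the data below are random stand-ins for the survey points):

    import numpy as np
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import RBF

    # stand-ins for the survey data: (x, y) locations and measured values
    rng = np.random.default_rng(0)
    X_obs = rng.uniform([0, 0], [5000, 7000], size=(500, 2))
    z_obs = rng.normal(size=500)

    # fit a GP (kriging-like) model; do this after removing the elevation trend
    gp = GaussianProcessRegressor(kernel=RBF(length_scale=500.0), normalize_y=True)
    gp.fit(X_obs, z_obs)

    # predict onto the 25 m target grid over the 5000 x 7000 m area
    xs, ys = np.meshgrid(np.arange(0, 5000, 25.0), np.arange(0, 7000, 25.0))
    z_pred = gp.predict(np.column_stack([xs.ravel(), ys.ravel()]))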
PS: This type of question is more statistics oriented than programming and may be better suited to the stats.stackexchange.com website.

D3 - Difference between basis and linear interpolation in SVG line

I implemented a multi-series line chart like the one given here by M. Bostock and ran into a curious issue which I cannot explain. When I choose linear interpolation and set my scales and axes, everything is correct and the values are well-aligned.
But when I change the interpolation to basis, without any modification of my axes and scales, the values no longer line up correctly between the lines and the axes.
What is happening here? With the monotone setting I can achieve pretty much the same effect as the basis interpolation but without the syncing problem between lines and axis. Still I would like to understand what is happening.
The basis interpolation implements a B-spline (basis spline), which people like to use precisely because it smooths out extreme peaks. This is useful when you are modeling something you expect to vary smoothly but only have sharp, infrequently sampled data. A consequence is that the resulting line will not connect all data points, which changes the appearance of extreme values.
In your case, the sharp peaks are the interesting features, the exception to the typically 0 baseline value. When you use a spline interpolation, you are smoothing over these peaks.
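You can reproduce the effect outside D3; here is a sketch using SciPy's B-spline (a clamped cubic, roughly analogous to basis) on a flat series with one sharp peak:

    import numpy as np
    from scipy.interpolate import BSpline

    # ten control points: a flat baseline with one sharp peak
    y = np.zeros(10)
    y[5] = 10.0

    k = 3  # cubic, like basis interpolation
    t = np.concatenate(([0] * k, np.arange(10 - k + 1), [10 - k] * k))  # clamped knots
    curve = BSpline(t, y, k)

    # the smoothed curve never reaches the peak value of 10
    print(curve(np.linspace(0, 10 - k, 200)).max())  # about 6.7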
Here is a fun demo for playing with the different types of line interpolation:
http://bl.ocks.org/mbostock/4342190
You can drag the data around so they resemble a sharp peak like yours, even click to add new points. Then, switch to a basis interpolation and watch the peak get averaged out.

SVG plot from point-value pairs

I need to write some code (for a web.py webapp with a straight HTML/JS client) that will generate a visual representation of a set of point values. Each point has an X and Y coordinate, and the value is an integer. If I can use SVG to do this, then I can scale the image client-side with no extra code. Can I actually do this? I am concerned about a few things:
The points don't necessarily have any relation to each other. They aren't necessarily in a grid, nor can we say how many points are nearby, etc.
Gradients are primarily one-directional, and multiple gradients on the same shape seem to be a foreign concept.
Fills require an existing image, at which point, I'd be better off generating the entire image server-side anyway.
Objects always have a layering, even if it isn't specified, which can change how the image is rendered.
If it helps, consider a situation where we have a point surrounded by five others, one of which is a bit closer than the rest (exact distances and sizes can be adjusted). All six points have different colors (red, green, blue, cyan, magenta, yellow, with red in the center and yellow slightly closer), and the outer five points are arranged roughly in a pentagon. Note that this situation is not the only option, just a theoretically possible one.
Can I do this with SVG, or should I render an image server-side?
EDIT: The main difficulty isn't in drawing the points, it is in filling the space between the points so that there is no whitespace, and color transitions aren't harsh/unpredictable if you know the data.
I don't entirely understand the different issues you are having with using SVG. I am currently using the setup you describe to render X-Y scatter plots and Gaussian curves, and I have found that it works great.
Regarding the last point about object layering, you have to be particularly careful when layering objects of different colors with less than 100% opacity. The way the colors "add" depends on the order in which you add the objects to your SVG drawing.
Thankfully, you can use filters to overlay the colors without blending them. Specifically, I am using the feComposite filter element. There is a good example of its usage here:
http://www.w3.org/TR/SVG/filters.html#feCompositeElement
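For a concrete (if hypothetical) example of the element, this sketch writes an SVG in which feComposite's "in" operator keeps a flood color only where the source shape is opaque, independent of layering order; the colors and geometry are made up:

    # feComposite demo: the magenta flood is kept only where the circle is opaque
    svg = """<svg xmlns="http://www.w3.org/2000/svg" width="200" height="200">
      <filter id="tint">
        <feFlood flood-color="magenta" flood-opacity="0.6" result="fill"/>
        <feComposite in="fill" in2="SourceGraphic" operator="in"/>
      </filter>
      <circle cx="100" cy="100" r="60" fill="cyan" filter="url(#tint)"/>
    </svg>"""

    with open("composite_demo.svg", "w") as f:
        f.write(svg)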

Sorting a list of colors in one dimension?

I would like to sort a one-dimensional list of colors so that colors that a typical human would perceive as "like" each other are near each other.
Obviously this is a difficult, or perhaps impossible, problem to solve "perfectly", since colors are typically described with three dimensions, but that doesn't mean there aren't sorting methods that look obviously more natural than others.
For example, sorting by RGB doesn't work very well, as it will sort in the following order, for example:
(1) R=254 G=0 B=0
(2) R=254 G=255 B=0
(3) R=255 G=0 B=0
(4) R=255 G=255 B=0
That is, it will alternate red, yellow, red, yellow, with the two "reds" being essentially imperceptibly different from each other, and the two yellows likewise imperceptibly different from each other.
But sorting by HLS works much better, generally speaking (and I think HSL better still); with either, the reds will be next to each other, and the yellows will be next to each other.
But HLS/HSL has some problems, too; things that people would perceive as "black" could be split far apart from each other, as could things that people would perceive as "white".
Again, I understand that I pretty much have to accept that there will be some splits like this; I'm just wondering if anyone has found a better way than HLS/HSL. And I'm aware that "better" is somewhat arbitrary; I mean "more natural to a typical human".
For example, a vague thought I've had, but have not yet tried, is perhaps "L is the most important thing if it is very high or very low", but otherwise it is the least important. Has anyone tried this? Has it worked well? What specifically did you decide "very low" and "very high" meant? And so on. Or has anyone found anything else that would improve upon HSL?
I should also note that I am aware that I can define a space-filling curve through the cube of colors, and order them one-dimensionally as they would be encountered while travelling along that curve. That would eliminate perceived discontinuities. However, it's not really what I want; I want decent overall large-scale groupings more than I want perfect small-scale groupings.
Thanks in advance for any help.
If you want to sort a list of colors in one dimension, you first have to decide by what metric you are going to sort them. The metric that makes the most sense to me is perceived brightness (related question).
I have come across four algorithms for sorting colors by brightness and compared them. Here is the result.
I generated colors in a cycle, using only about every 400th color. Each color is represented by 2x2 pixels; colors are sorted from darkest to lightest (left to right, top to bottom).
1st picture - Luminance (relative)
0.2126 * R + 0.7152 * G + 0.0722 * B
2nd picture - http://www.w3.org/TR/AERT#color-contrast
0.299 * R + 0.587 * G + 0.114 * B
3rd picture - HSP Color Model
sqrt(0.299 * R^2 + 0.587 * G^2 + 0.114 * B^2)
4th picture - WCAG 2.0 SC 1.4.3 relative luminance and contrast-ratio formula
A pattern can sometimes be spotted in the 1st and 2nd pictures, depending on the number of colors in one row. I never spotted any pattern in the pictures from the 3rd or 4th algorithms.
If I had to choose, I would go with algorithm number 3, since it's much easier to implement and about 33% faster than the 4th.
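In Python, sorting with algorithm 3 (HSP) is just a small key function, assuming 8-bit RGB tuples (the sample colors are hypothetical):

    def hsp_brightness(rgb):
        # HSP perceived brightness (algorithm 3 above)
        r, g, b = rgb
        return (0.299 * r * r + 0.587 * g * g + 0.114 * b * b) ** 0.5

    colors = [(255, 0, 0), (254, 255, 0), (0, 0, 64)]  # hypothetical colors
    colors.sort(key=hsp_brightness)  # darkest to lightest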
You cannot do this without reducing the 3 color dimensions to a single measurement. There are many (infinite) ways of reducing this information, but it is not mathematically possible to do this in a way that ensures that two data points near each other on the reduced continuum will also be near each other in all three of their component color values. As a result, any formula of this type will potentially end up grouping dissimilar colors.
As you mentioned in your question, one way to sort of do this would be to fit a complex curve through the three-dimensional color space occupied by the data points you're trying to sort, and then reduce each data point to its nearest location on the curve and then to that point's distance along the curve. This would work, but in each case it would be a solution custom-tailored to a particular set of data points (rather than a generally applicable solution). It would also be relatively expensive (maybe), and simply wouldn't work on a data set that was not nicely distributed in a curved-line sort of way.
A simpler alternative (that would not work perfectly) would be to choose two "endpoint" colors, preferably on opposite sides of the color wheel. So, for example, you could choose Red as one endpoint color and Blue as the other. You would then convert each color data point to a value on a scale from 0 to 1, where a color that is highly Reddish would get a score near 0 and a color that is highly Bluish would get a score near 1. A score of .5 would indicate a color that either has no Red or Blue in it (a.k.a. Green) or else has equal amounts of Red and Blue (a.k.a. Purple). This approach isn't perfect, but it's the best you can do with this problem.
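One possible scoring function for that red-to-blue scale (my own formula, purely illustrative; the answer above doesn't specify one):

    def red_blue_score(r, g, b):
        # 0 = strongly red, 1 = strongly blue, 0.5 = balanced or neither
        if r + b == 0:
            return 0.5  # no red or blue at all, e.g. pure green
        return b / (r + b)

    colors = [(255, 0, 0), (0, 255, 0), (0, 0, 255), (128, 0, 128)]  # hypothetical
    colors.sort(key=lambda c: red_blue_score(*c))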
There are several standard techniques for reducing multiple dimensions to a single dimension with some notion of "proximity".
In particular, I think you should check out the z-order (Morton order) transform.
You can implement a quick version of it by interleaving the bits of your three colour components and sorting the colours based on the transformed value.
The following Java code should help you get started:
public static int zValue(int r, int g, int b) {
    // interleave the bits of the three channels (r in the lowest positions)
    return split(r) | (split(g) << 1) | (split(b) << 2);
}

public static int split(int a) {
    // spread the lowest 10 bits of a across 30 bits
    // (masks written in hex; the original octal literals are equivalent)
    a = (a | (a << 12)) & 0x003000FF;
    a = (a | (a << 8))  & 0x0030F00F;
    a = (a | (a << 4))  & 0x030C30C3;
    a = (a | (a << 2))  & 0x09249249;
    return a;
}
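If Python is more convenient, here is a direct port of the sketch above (same masks), sorting (r, g, b) tuples by their z-order value:

    def split_bits(a):
        # spread the lowest 10 bits of a across 30 bits (same masks as above)
        a = (a | (a << 12)) & 0x003000FF
        a = (a | (a << 8)) & 0x0030F00F
        a = (a | (a << 4)) & 0x030C30C3
        a = (a | (a << 2)) & 0x09249249
        return a

    def z_value(r, g, b):
        # interleave the bits of the three channels
        return split_bits(r) | (split_bits(g) << 1) | (split_bits(b) << 2)

    # the colors from the RGB example earlier in the question
    colors = [(254, 0, 0), (254, 255, 0), (255, 0, 0), (255, 255, 0)]
    colors.sort(key=lambda c: z_value(*c))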
There are two approaches you could take. The simple approach is to distil each colour into a single value, and the list of values can then be sorted. The complex approach would depend on all of the colours you have to sort; perhaps it would be an iterative solution that repeatedly shuffles the colours around trying to minimise the "energy" of the entire sequence.
My guess is that you want something simple and fast that looks "nice enough" (rather than trying to figure out the "optimum" aesthetic colour sort), so the simple approach is enough for you.
I'd say HSL is the way to go. Something like
sortValue = L * 5 + S * 2 + H
assuming that H, S and L are each in the range [0, 1].
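For instance, a sketch using the Python standard library (note that colorsys returns hue, lightness, saturation in that order, each in [0, 1]; the sample colors are hypothetical):

    import colorsys

    def sort_value(r, g, b):
        # convert 8-bit RGB to HLS, then weight L most, S next, H least
        h, l, s = colorsys.rgb_to_hls(r / 255, g / 255, b / 255)
        return l * 5 + s * 2 + h

    colors = [(255, 0, 0), (254, 255, 0), (0, 0, 64)]  # hypothetical colors
    colors.sort(key=lambda c: sort_value(*c))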
Here's an idea I came up with after a couple of minutes' thought. It might be crap, or it might not even work at all, but I'll spit it out anyway.
Define a distance function on the space of colours, d(x, y) (where the inputs x and y are colours and the output is perhaps a floating-point number). The distance function you choose may not be terribly important. It might be the sum of the squares of the differences in R, G and B components, say, or it might be a polynomial in the differences in H, L and S components (with the components differently weighted according to how important you feel they are).
Then you calculate the "distance" of each colour in your list from every other colour, which effectively gives you a graph. Next you calculate the minimum spanning tree of your graph. Then you identify the longest path (with no backtracking) that exists in your MST. The endpoints of this path will be the endpoints of the final list. Next you try to "flatten" the tree into a line by bringing points in the "branches" off your path into the path itself.
Hmm. This might not work all that well if your MST ends up in the shape of a near-loop in colour space. But maybe any approach would have that problem.
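A partial sketch of that setup (pairwise distances plus the MST via SciPy; the path-finding and flattening steps are left out, and the colors are hypothetical):

    import numpy as np
    from scipy.sparse.csgraph import minimum_spanning_tree

    colors = np.array([(255, 0, 0), (254, 255, 0), (0, 0, 255)], dtype=float)

    # d(x, y): sum of squared RGB differences, one of the choices suggested above
    dists = ((colors[:, None, :] - colors[None, :, :]) ** 2).sum(axis=2)

    # sparse matrix whose nonzero entries are the edges of the minimum spanning tree
    mst = minimum_spanning_tree(dists)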

Antialiasing and gamma compensation

The luminance of pixels on a computer screen is not usually linearly related to the digital RGB triplet values of a pixel. The nonlinear response of early CRTs required a compensating nonlinear encoding, and we continue to use such encodings today.
Usually we produce images on a computer screen and consume them there as well, so it all works fine. But when we antialias, the nonlinearity (called gamma) means that we can't just give a 50% covered pixel an alpha value of 0.5 and expect it to look right. With a typical gamma of 2.2, an alpha value of 0.5 is only 0.5^2.2 ≈ 22% as bright as an alpha of 1.0.
Is there any widely established best practice for antialiasing gamma compensation? Do you have a pet method you use from day to day? Has anyone seen any studies of the results and human perceptions of the quality of the graphic output with different techniques?
I've thought of doing the standard x^(1/2.2) compensation, but that is pretty computationally intensive. Maybe I can make it faster with a 256-entry lookup table, though.
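For what it's worth, that table is only a few lines (a sketch assuming a plain 2.2 power law rather than the full sRGB curve):

    import numpy as np

    GAMMA = 2.2

    # 256-entry table mapping a linear byte value to its gamma-encoded byte value
    encode_lut = np.array(
        [round(255 * (i / 255) ** (1 / GAMMA)) for i in range(256)],
        dtype=np.uint8,
    )

    encoded = encode_lut[128]  # one array lookup instead of a pow() per pixel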
Lookup tables are used quite often for work like that. They're small and fast.
But whether look-up or some formula, if the end result is an image file, and the format permits, it's best to save a color profile or at least the gamma value in the file for later viewing, rather than try adjusting RGB values yourself.
The reason: for typical byte-valued R, G, B channels, you have 256 unique values in each channel at each pixel. That's almost good enough to look right to the human eye (I wish "byte" had been defined as nine bits!). Any kind of math, aside from trivial value inversion, would map many-to-one for some of those values. The output won't have 256 values to pick from for each pixel's R, G, or B, but far fewer. That can lead to contouring, jaggies, color noise and other badness.
Precision issues aside, if any kind of decent quality is desired, all compositing, mixing, blending, color correction, fake lens-flare addition, chroma-keying and whatever else should be done in linear RGB space, where the values of R, G and B are proportional to physical light intensity, so that the image math mimics physical light math. But where ultimate speed is vital, there are ways to cheat.
Jim Blinn's book "Dirty Pixels" outlines a fast, accurate compositing calculation using 16-bit math plus lookup tables to convert back and forth to linear color space. This guy worked on NASA's visualisations; he knows his stuff.
I'm trying to answer the actual questions, though mainly for reference now:
First, there are the recommendations from the ITU (http://www.itu.int/rec/T-REC-H.272-200701-I/en), which can be applied in programming (but you have to know your stuff).
Jim Blinn's "Notation, Notation, Notation", Chapter 9, has a very detailed mathematical and perceptual error analysis, although he only covers compositing (many other graphics tasks are affected too).
The notation he establishes can also be used to derive a way of dealing with gamma, or to check whether a given way of doing so is actually correct. Very handy; it's my pet method (mainly because I discovered it independently, though I later found his book).
When generating images, one typically works in a linear color space (like linear RGB or one of the CIE color spaces) and then converts to a non-linear RGB space at the end. That conversion can be accelerated in hardware or via lookup tables or even through tricky math. (See the other answers' references.)
When performing an alpha blend (e.g., render this icon onto this background), this kind of precision is often elided in favor of speed. The results are computed directly in the non-linear RGB-space by lerping with the alpha as the parameter. This is not "correct", but it's good enough in most cases. Especially for things like icons on desktops.
If you're trying to do more correct blending, you treat it like an original render. Work in linear space (which may require an initial conversion) and then convert to your non-linear display space at the end.
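A sketch of that "treat it like an original render" blend, assuming a plain 2.2 power law for the display encoding (real pipelines would use the exact sRGB transfer function):

    GAMMA = 2.2

    def to_linear(c):
        # gamma-encoded value in [0, 1] -> linear light
        return c ** GAMMA

    def to_encoded(c):
        # linear light -> gamma-encoded value in [0, 1]
        return c ** (1.0 / GAMMA)

    def blend(src, dst, alpha):
        # lerp in linear space, then re-encode for display
        return to_encoded(alpha * to_linear(src) + (1.0 - alpha) * to_linear(dst))

    print(blend(1.0, 0.0, 0.5))  # ~0.73, well above the naive 0.5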
A lot of graphics nowadays use sRGB as the non-linear display color space. If I recall correctly, sRGB is very similar to a gamma of 2.2, but there are adjustments made to values at the low end.
