Is there a way to dull/fade the coloring of marks in Altair? - altair

I have the following DataFrame (only a part of it is shown):
I use it to generate the following plot in Altair. I generated this plot based on a modification of the code suggested in this post.
However, due to the fact that each of my Y labels has a different number of associated data points, the only way I could make the plot appear as desired was by using np.resize to repeat values. This works almost perfectly, but leads to the unfortunate issue that some of the marks in the plot appear darker than others, which can be misleading because it does not actually relate to the data in any way. Is there any way to get around this in Altair?

It sounds like you're asking about the opacity of the marks, which defaults to semi-transparent. You can adjust this with the opacity argument to mark_point(); for example:
alt.Chart(data).mark_point(opacity=1)

Overlapping/crowded labels on y-axis python [duplicate]

I'm in a bit of a rush to finish this for tomorrow's presentation to the project owner. We are a small group of economics students in Germany trying to figure out machine learning with Python. We set up a Random Forest Classifier and want to show the estimator's most important features in a neat plot. After some searching we came up with the following solution, which more or less does the trick but leaves us unsatisfied because the labels on the y-axis overlap. The code we used looks like this:
feature_importances = clf.best_estimator_.feature_importances_
feature_importances = 100 * (feature_importances / feature_importances.max())
sorted_idx = np.argsort(feature_importances)
pos = np.arange(sorted_idx.shape[0])
plt.barh(pos, feature_importances[sorted_idx], align='center', height=0.8)
plt.yticks(pos, df_year_four.columns[sorted_idx])
plt.show()
Due to privacy let me say this: The feature names on the y-axis are overlapping (there are about 30 of them). I was looking into the documentation of matplotlib in order to get an understanding of how to do this by myself, unfortunately I couldn't find anything helpful. Seems like training and testing models is easier than understanding matplotlib and creating plots :D
Thank you so much for helping out and taking the time, I appreciate it.
I see your solution, and I want to just add this link here to explain why: How to change spacing between ticks in matplotlib?
The spacing between ticklabels is exclusively determined by the space between ticks on the axes. Therefore the only way to obtain more space between given ticklabels is to make the axes larger.
The question I linked shows that by making the graph large enough, your axis labels would naturally be spaced better.
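As a rough sketch of that idea, the figure height can be scaled with the number of bars. The feature names and importances below are made-up stand-ins for the asker's data, and the 0.3 inch per bar is an arbitrary choice, not a matplotlib recommendation.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical stand-ins for the ~30 feature names and their importances.
rng = np.random.default_rng(0)
names = np.array([f"feature_{i}" for i in range(30)])
importances = rng.random(30)

order = np.argsort(importances)      # ascending, as in the question
pos = np.arange(order.shape[0])

# Make the axes taller as the number of bars grows (assumed 0.3 inch per bar).
fig, ax = plt.subplots(figsize=(8, 0.3 * len(names)))
ax.barh(pos, importances[order], align="center", height=0.8)
ax.set_yticks(pos)
ax.set_yticklabels(names[order])
fig.tight_layout()
```

In an interactive session, drop the `matplotlib.use("Agg")` line and call `plt.show()` at the end.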
You are using np.argsort that will return a numpy array with many indices. And you are using that array as labels for your Y-Axis thus there is overlapping of labels.
My suggestion will be to use an index for sorted_idx like,
plt.yticks(pos, df_year_four.columns[sorted_idx[0]])
This will plot only for 1 label.
Got it guys!
'Geistesblitz' as we say in Germany! (a flash of inspiration)
See the variable feature_importances in the third top row? Add feature_importances[:-15]
to view only the top half of the features and loosen up the y-axis. Yes!!! This works well because there are far fewer important features.
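One way to keep only the most important features is sketched here, under the assumption that the slice is taken on the sorted indices so that the labels stay aligned with the bars. The names, the random importances, and the cut-off of 15 are all hypothetical.

```python
import numpy as np

# Hypothetical stand-ins: 30 feature names with made-up importances.
rng = np.random.default_rng(0)
names = np.array([f"feature_{i}" for i in range(30)])
feature_importances = rng.random(30)

top_n = 15  # arbitrary cut-off for illustration
sorted_idx = np.argsort(feature_importances)  # indices in ascending order of importance
top_idx = sorted_idx[-top_n:]                 # the top_n largest, still ascending

pos = np.arange(top_idx.shape[0])
top_values = feature_importances[top_idx]
top_labels = names[top_idx]
# These can then be passed to plt.barh(pos, top_values) and plt.yticks(pos, top_labels).
```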

D3 - Difference between basis and linear interpolation in SVG line

I implemented a multi-series line chart like the one given here by M. Bostock and ran into a curious issue which I cannot explain myself. When I choose linear interpolation and set my scales and axis everything is correct and values are well-aligned.
But when I change my interpolation to basis, without any modification of my axis and scales, values between the lines and the axis are incorrect.
What is happening here? With the monotone setting I can achieve pretty much the same effect as the basis interpolation but without the syncing problem between lines and axis. Still I would like to understand what is happening.
The basis interpolation implements a B-spline, which people like to use as an interpolation function precisely because it smooths out extreme peaks. This is useful when you are modeling something you expect to vary smoothly but only have sharp, infrequently sampled data. A consequence of this is that the resulting line will not pass through all data points, which changes the appearance of extreme values.
In your case, the sharp peaks are the interesting features, the exception to the typically 0 baseline value. When you use a spline interpolation, you are smoothing over these peaks.
Here is a fun demo to play with the different types of line interpolations:
http://bl.ocks.org/mbostock/4342190
You can drag the data around so they resemble a sharp peak like yours, even click to add new points. Then, switch to a basis interpolation and watch the peak get averaged out.
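The averaging effect can also be checked numerically. The sketch below evaluates a single uniform cubic B-spline segment in plain Python; it is only an illustration of the math, not D3's implementation, and it ignores how D3 treats the end points. The data values are made up to mimic a single sharp peak on a 0 baseline.

```python
def bspline_point(p0, p1, p2, p3, t):
    """Evaluate one uniform cubic B-spline segment at t in [0, 1]."""
    b0 = (1 - t) ** 3 / 6
    b1 = (3 * t**3 - 6 * t**2 + 4) / 6
    b2 = (-3 * t**3 + 3 * t**2 + 3 * t + 1) / 6
    b3 = t**3 / 6
    return b0 * p0 + b1 * p1 + b2 * p2 + b3 * p3


# Hypothetical y-values: a flat baseline with one spike of height 1.
values = [0.0, 0.0, 1.0, 0.0, 0.0]

# Linear interpolation passes through every data point, so it reaches the spike.
linear_peak = max(values)

# At t = 0 this segment sits directly over the spike and evaluates to
# (p0 + 4*p1 + p2) / 6, a weighted average of the spike and its neighbours.
smoothed_peak = bspline_point(values[1], values[2], values[3], values[4], 0.0)

print(linear_peak, smoothed_peak)
```

With these values the spline tops out at two thirds of the true peak, which is why the drawn line no longer lines up with the axis value of the data point.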

Why is there such a difference between RGB to XYZ color conversions?

Recently I have been trying to understand code that converts between the RGB color space and the CIE-XYZ color space, but it seems like every different calculator I try gives me radically different results.
For example, trying to convert (255, 100, 70) to XYZ yields the following result, even when explicitly using d50 for everything:
EasyRGB gives (46.903, 30.817, 9.270)
Wolfram Alpha gives (0.7493, 0.7245, 0.6308)
Bruce Lindbloom.com gives (0.493910, 0.317574, 0.070047)
Java gives (0.95880127, 0.99554443, 0.8227539)
I don't see how these could possibly give such different answers. Which one is correct (if any)? Is there some sort of parameter that I am missing that differs between these websites?
Because there are different RGB spaces, not just one.
On this page there is the general formula:
http://brucelindbloom.com/index.html?Eqn_RGB_to_XYZ.html
but the general formula depends on some parameters e.g. matrix M, which is different for each individual RGB space.
If you go through this calculator you will see that there are a lot of parameters that need to be defined before converting those values to XYZ. So it's not a direct calculation that holds true under every condition. There are a lot of variables to consider (and you also need to know which variables will not affect your calculation).
The calculation will also depend on the application you are trying to develop. The approach for sensing the colour would be different from reproducing the same on screen.
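To make the role of those parameters concrete, here is a minimal sketch for one particular choice: sRGB primaries, a D65 white point, and the standard sRGB transfer function, with the commonly published four-digit matrix. These choices are assumptions for illustration; a different RGB space, white point, or chromatic adaptation would need a different matrix M. The `scale` parameter is a hypothetical helper showing that some tools report XYZ on a 0 to 1 scale and others on 0 to 100.

```python
def srgb_to_linear(c8):
    """Undo the sRGB transfer function for one 8-bit channel value."""
    c = c8 / 255.0
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4


# Linear sRGB -> XYZ matrix, assuming sRGB primaries and a D65 white point.
M_SRGB_D65 = (
    (0.4124, 0.3576, 0.1805),
    (0.2126, 0.7152, 0.0722),
    (0.0193, 0.1192, 0.9505),
)


def srgb_to_xyz(r8, g8, b8, scale=1.0):
    """Convert 8-bit sRGB to XYZ; scale=100.0 gives the 0-100 convention."""
    lin = [srgb_to_linear(v) for v in (r8, g8, b8)]
    return tuple(scale * sum(m * v for m, v in zip(row, lin)) for row in M_SRGB_D65)


xyz_unit = srgb_to_xyz(255, 100, 70)
xyz_percent = srgb_to_xyz(255, 100, 70, scale=100.0)
print(xyz_unit, xyz_percent)
```

Under these assumptions the 0 to 100 output lands close to the EasyRGB figures quoted in the question, which suggests that at least part of the disagreement between the calculators comes from scaling and white point conventions rather than from errors.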

Using images for points in gnuplot

I have a frivolous question. Is there any way to use an image in lieu of points in gnuplot? For example, if I was plotting data about pasta consumption or something, I would have pictures of pasta (instead of usual gnuplot points).
Another option is to find a dingbats type of font with a suitable glyph. Then you can use "plot with labels" using that glyph as the label string. See for example the 5th plot in the demo
http://gnuplot.sourceforge.net/demo/stringvar.html
I haven't played around with this feature at all myself, however, there is:
http://gnuplot.sourceforge.net/demo/barchart_art.html
Which shows the use of png files on a bar chart (Note, that this feature was added in Gnuplot 4.5 -- I think). With a little creativity, it seems like you could use that feature to do what you're asking -- although it would require a whole bunch of plot commands so it might be useful to write a script to generate the gnuplot script (or use iteration depending on your dataset) -- Obviously your image files would have to be in a format that your version of gnuplot understands as well ...
A possible strategy may be the plot with rgbimage option in gnuplot.
See the second example over here: http://www.gnuplot.info/demo_4.2/image.html
If you relate the center option with your data points, this may be possible.

Gnuplot fine grained ranges(grid)

I have this graph created with gnuplot
However, the red line at the bottom looks almost perfectly straight because of the y-axis range, although it is not (it should look like the blue one). How can I make the y-axis range very fine-grained (lots of ticks) so that the very small values of the red graph become visible? I hope that was clear, thanks.
I can think of two possible solutions to your question.
1. Use a logarithmic scale with set logscale y. This would change the look of your plot quite a bit but you would still have all the data related to a single scale and it would most probably introduce a "higher resolution" to your red line.
2. Introduce a second y-axis like in this example.
As far as I know, it is not possible to increase the resolution only on a specific part of an axis. I think, this would lead to more confusion than it would do any good.
