Basically, I want to generate a graph with a counter variable on the X-axis and response time on the Y-axis. I did some research, and most plugins can generate graphs, but only with predefined X and Y axes.
I would prefer any approach that avoids writing a report file myself, but if that is the only possible way, please give an example of how to do it.
I have the following DataFrame (only a part of it is shown):
I use it to generate the following plot in Altair. I generated this plot based on a modification of the code suggested in this post.
However, due to the fact that each of my Y labels has a different number of associated data points, the only way I could make the plot appear as desired was by using np.resize to repeat values. This works almost perfectly, but leads to the unfortunate issue that some of the marks in the plot appear darker than others, which can be misleading because it does not actually relate to the data in any way. Is there any way to get around this in Altair?
It sounds like you're asking about the opacity of the marks, which defaults to semi-transparent. You can adjust this with the opacity argument to mark_point(); for example:
alt.Chart(data).mark_point(opacity=1)
I'm in a bit of a rush to finish this for tomorrow's presentation to the project owner. We are a small group of economics students in Germany trying to figure out machine learning with Python. We set up a Random Forest Classifier and want to show the estimator's feature importances in a neat plot. Through a Google search we came up with the following solution, which more or less does the trick but leaves us unsatisfied because the labels on the y-axis overlap. The code we used looks like this:
feature_importances = clf.best_estimator_.feature_importances_
feature_importances = 100 * (feature_importances / feature_importances.max())
sorted_idx = np.argsort(feature_importances)
pos = np.arange(sorted_idx.shape[0])
plt.barh(pos, feature_importances[sorted_idx], align='center', height=0.8)
plt.yticks(pos, df_year_four.columns[sorted_idx])
plt.show()
For privacy reasons I can only say this much: the feature names on the y-axis overlap (there are about 30 of them). I looked through the matplotlib documentation to work out how to fix this myself, but unfortunately I couldn't find anything helpful. It seems that training and testing models is easier than understanding matplotlib and creating plots :D
Thank you so much for helping out and taking the time, I appreciate it.
I see your solution, and I want to just add this link here to explain why: How to change spacing between ticks in matplotlib?
The spacing between ticklabels is exclusively determined by the space between ticks on the axes. Therefore the only way to obtain more space between given ticklabels is to make the axes larger.
The question I linked shows that by making the graph large enough, your axis labels would naturally be spaced better.
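As a rough illustration of the point above, here is a sketch that scales the figure height with the number of bars so that each tick label gets its own space. The feature names and importances are made-up stand-ins for the asker's data, and the 0.3 inches per label is an assumption to tune, not a matplotlib recommendation.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical stand-ins for the real feature names and importances
rng = np.random.default_rng(0)
names = np.array([f"feature_{i}" for i in range(30)])
importances = rng.random(30)

order = np.argsort(importances)
pos = np.arange(len(order))

# Grow the figure height with the number of labels (0.3 inch each is a guess)
height = max(4.0, 0.3 * len(order))
fig, ax = plt.subplots(figsize=(8, height))
ax.barh(pos, importances[order], align="center", height=0.8)
ax.set_yticks(pos)
ax.set_yticklabels(names[order])
fig.tight_layout()
fig.savefig("feature_importances.png")
```

With 30 labels this gives a figure about 9 inches tall; in an interactive session, plt.show() can replace the savefig call.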
You are using np.argsort, which returns a NumPy array of many indices, and you are using that array to select the labels for your Y-axis, so the labels overlap.
My suggestion is to index into sorted_idx, like this:
plt.yticks(pos, df_year_four.columns[sorted_idx[0]])
This will plot only one label.
Got it guys!
A 'Geistesblitz', as we say in Germany! (spiritual lightning)
See the variable feature_importances in the third row from the top? Change it to feature_importances[:-15] to view only the top half of the features and loosen up the y-axis. Yes!!! This works well because there are far fewer important features.
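Note that slicing the unsorted importance array drops features by their column position, not by their importance. If the goal is to keep only the most important features, one option is to slice the sorted indices instead. This is a minimal sketch: top_n_features is a hypothetical helper name and the data are made up.

```python
import numpy as np

def top_n_features(importances, names, n):
    """Return the n largest importances and their names, smallest first."""
    if n < 1:
        raise ValueError("n must be at least 1")
    importances = np.asarray(importances, dtype=float)
    names = np.asarray(names)
    order = np.argsort(importances)  # ascending, as in the question's code
    keep = order[-n:]                # the last n indices point at the n largest values
    return importances[keep], names[keep]

# Hypothetical data: four features, keep the two most important
values, labels = top_n_features([0.1, 0.5, 0.2, 0.9], ["a", "b", "c", "d"], 2)
print(list(labels), list(values))
```

The returned arrays can then be passed to plt.barh and plt.yticks in place of feature_importances[sorted_idx] and df_year_four.columns[sorted_idx].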
I am wondering if anyone could provide a simple working example of a histogram that has different background colours for different values of "x". Something that would look like the following graph:
I cannot seem to find an easy way to do this, even though it is a fairly common visual tool when using histograms in a time context.
Please study https://stackoverflow.com/help/mcve for future questions. Here in the question we see no data example, no attempt at code, no provenance for your graph.
This is reproducible:
webuse grunfeld, clear
line invest year if company == 1
twoway scatteri 0 1939 1500 1939 1500 1945 0 1945, recast(area) color(gs12) || line invest year if company == 1 , ytitle(invest) legend(order(1 "WW II") pos(11))
Steps:
Draw a line plot and decide what to highlight. It's a rectangle and you need the coordinates of the corners.
It's crucial to draw the rectangle first, as otherwise it will overwrite your line plot. Tastes and imperatives vary, but a light gray often works well.
The rectangle is drawn by specifying an "immediate" scatteri plot of the coordinates of the corners, but recasting to an area plot.
You need to reach in and fix the vertical axis title and very possibly the legend. Fine tuning: use the Graph Editor.
Optionally use plotregion(margin(zero)) to remove the default area between the axes and the plotregion.
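For readers not using Stata, the same idea (shade the region first, then draw the series on top) could be sketched in Python with matplotlib. The data below are made up, and the 1939 to 1945 band simply mirrors the example above. Because axvspan spans the full height of the axes, no corner coordinates are needed.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Made-up yearly series standing in for the real data
years = np.arange(1935, 1955)
rng = np.random.default_rng(1)
invest = 300 + 40 * np.arange(years.size) + rng.normal(0, 60, years.size)

fig, ax = plt.subplots()
# Draw the shaded band with a low zorder so the line is drawn on top of it
band = ax.axvspan(1939, 1945, color="lightgray", zorder=0, label="WW II")
ax.plot(years, invest, zorder=2)
ax.set_xlabel("year")
ax.set_ylabel("invest")
ax.legend(loc="upper left")
fig.savefig("shaded_background.png")
```

The same call works under a histogram: call axvspan once per x-range to get a different background colour for each.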
I implemented a multi-series line chart like the one given here by M. Bostock and ran into a curious issue which I cannot explain to myself. When I choose linear interpolation and set my scales and axes, everything is correct and the values are well aligned.
But when I change my interpolation to basis, without any modification of my axes and scales, the lines no longer match the values on the axis.
What is happening here? With the monotone setting I can achieve pretty much the same effect as the basis interpolation, but without the syncing problem between lines and axis. Still, I would like to understand what is happening.
The basis interpolation implements a B-spline, which people like to use as an interpolation function precisely because it smooths out extreme peaks. This is useful when you are modeling something you expect to vary smoothly but only have sharp, infrequently sampled data. A consequence of this is that the resulting line will not pass through all data points, which changes the appearance of extreme values.
In your case, the sharp peaks are the interesting features, the exception to the typically 0 baseline value. When you use a spline interpolation, you are smoothing over these peaks.
Here is a fun demo to play with the different types of line interpolation:
http://bl.ocks.org/mbostock/4342190
You can drag the data around so they resemble a sharp peak like yours, even click to add new points. Then, switch to a basis interpolation and watch the peak get averaged out.
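The averaging can also be checked numerically. The sketch below is a from-scratch Python approximation of a uniform cubic B-spline with the end points repeated, not D3's own code. The function name basis_spline and the sample data are assumptions for illustration.

```python
import numpy as np

def basis_spline(points, samples_per_segment=20):
    """Sample a uniform cubic B-spline over equally spaced 1-D control values.

    The first and last values are repeated so the curve starts on the
    first data point; interior data points are generally not reached.
    """
    p = np.asarray(points, dtype=float)
    padded = np.concatenate(([p[0]] * 2, p, [p[-1]] * 2))
    t = np.linspace(0.0, 1.0, samples_per_segment, endpoint=False)
    # Uniform cubic B-spline blending functions
    b0 = (1 - t) ** 3 / 6
    b1 = (3 * t**3 - 6 * t**2 + 4) / 6
    b2 = (-3 * t**3 + 3 * t**2 + 3 * t + 1) / 6
    b3 = t**3 / 6
    segments = [
        b0 * padded[i] + b1 * padded[i + 1] + b2 * padded[i + 2] + b3 * padded[i + 3]
        for i in range(len(padded) - 3)
    ]
    return np.concatenate(segments)

# A single sharp peak on a zero baseline, similar to the data in the question
data = [0, 0, 0, 6, 0, 0, 0]
curve = basis_spline(data)
print(max(data), round(float(curve.max()), 6))  # the peak of 6 is flattened to about 4
```

At a data point the curve takes the value (previous + 4 * current + next) / 6, so an isolated peak on a zero baseline is drawn at two thirds of its true height. That is why the line no longer matches the axis values.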
I have this graph created with gnuplot
However, the red line at the bottom looks perfectly straight because of the y-axis range, although it is not (it should look like the blue one). How can I make the range of the y-axis very fine-grained (lots of ticks) so that the very small values of the red graph become visible? I hope that was clear, thanks.
I can think of two possible solutions to your question.
Use a logarithmic scale with set logscale y. This would change the look of your plot quite a bit but you would still have all the data related to a single scale and it would most probably introduce a "higher resolution" to your red line.
Introduce a second y-axis like in this example.
As far as I know, it is not possible to increase the resolution on only a specific part of an axis. I think this would cause more confusion than it would do good.
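The two options can be compared side by side. The sketch below uses Python with matplotlib rather than gnuplot, purely to illustrate the effect. Both series are made up, with the "red" one a thousand times smaller than the "blue" one.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 200)
big = 1000 + 400 * np.sin(x)    # hypothetical "blue" series
small = 1.0 + 0.4 * np.sin(x)   # hypothetical "red" series, 1000 times smaller

fig, (ax_log, ax_left) = plt.subplots(1, 2, figsize=(10, 4))

# Option 1: one logarithmic y-axis shared by both series
ax_log.plot(x, big, color="blue")
ax_log.plot(x, small, color="red")
ax_log.set_yscale("log")

# Option 2: a second y-axis on the right, scaled for the small series only
ax_left.plot(x, big, color="blue")
ax_right = ax_left.twinx()
ax_right.plot(x, small, color="red")
ax_right.tick_params(axis="y", colors="red")

fig.tight_layout()
fig.savefig("two_scales.png")
```

In the left panel both curves show their shape on a single scale; in the right panel each curve has its own axis, so the tick colours need to make clear which axis belongs to which line.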