I have a pandas dataframe of 3 columns (x, y, z) which I plot using a scatter plot with z-variable assigned to the c-value, with the resulting image
Variables x, y, and z are all continuous real data and z != f(x,y). I'm unable to provide the actual data sample.
As you can see, the points overlap, and the highest values are hidden from view.
I would like this plot to display the highest (red) points on top of the lowest (blue) points to produce an intensity plot similar to this
I assume this is achieved by somehow controlling the plot order of z, and I have tried sorting the dataframe by the z-variable with no success.
I would appreciate some method for the existing chart or a suggestion for a new chart to make this possible.
Try with this
import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame(data={'x': [0.2, 0.21, 0.22],
                        'y': [0.2, 0.21, 0.22],
                        'z': [1.8, 2, 3]})
df.plot.scatter('x', 'y', c='z', s=5000)
Then with this
df.sort_values('z', ascending=False, inplace=True)
df.plot.scatter('x','y',c='z', s=5000)
The z-order is reversed between the two plots: the points are drawn in row order, so rows that come later in the DataFrame end up on top.
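To get the highest z values drawn on top, the sort can go the other way. This is a minimal sketch, assuming a toy DataFrame like the one above (deliberately unsorted here) and that a single scatter call draws its markers in row order; the Agg backend, the `jet` colormap, and the variable names are only illustrative choices:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import pandas as pd

# Toy data, deliberately not in z order
df = pd.DataFrame({'x': [0.22, 0.2, 0.21],
                   'y': [0.22, 0.2, 0.21],
                   'z': [3, 1.8, 2]})

# Sort ascending so the row with the largest z is drawn last, i.e. on top.
df_sorted = df.sort_values('z', ascending=True)

fig, ax = plt.subplots()
sc = ax.scatter(df_sorted['x'], df_sorted['y'], c=df_sorted['z'],
                s=5000, cmap='jet')
fig.colorbar(sc, ax=ax, label='z')

# The collection keeps the color values in the order the points were passed in.
drawn_z = sc.get_array()
```

If sorting appeared to have no effect in the real data, it is worth checking that the sorted frame (not the original one) is what actually gets passed to the plotting call.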
I want to create two plots, with the second plot generally being about 1/2 or less the size of the main plot. However, when trying to do this with one row and two columns I get an error
fig, axs=plt.subplots(1,2, figsize=(12,10), gridspec_kw={'height_ratios': [2,1],
'width_ratios':[3,1]})
(ValueError: Expected the given number of height ratios to match the number of rows of the grid).
If I only put in one argument in the list for height ratios then I get two plots that are of the same size.
fig, axs=plt.subplots(1,2, figsize=(12,10), gridspec_kw={'height_ratios': [3],
'width_ratios':[3,1]})
That plot is shown below. How can I make the plot on the right half the size of the one on the left, while placing it in the bottom right (not top right)?
The trick here is that height_ratios depends on the number of rows. A ratio is a relationship between two things, so you cannot set a height ratio between subplots when there is only one row (that is, a single 'height' for all the subplots), no matter how many columns there are. However, you can work around this by using fig.add_gridspec instead of plt.subplots to define more rows and columns than you need and simply never using some of the cells. Here is how you can go about it:
import matplotlib.pyplot as plt

if __name__ == "__main__":
    fig = plt.figure(figsize=(12, 10))
    gs = fig.add_gridspec(nrows=2, ncols=2, width_ratios=[3, 1])
    fig.suptitle('An overall title')

    # Add left subplot
    # gs[top and bottom rows, first column (the 'left' subplot)]
    ax_left = fig.add_subplot(gs[:, 0])
    ax_left.set_xlabel("Left X label")
    ax_left.set_ylabel("Left Y label")

    # Add bottom right subplot - gs[bottom row, last column (the 'right' subplot)]
    # We do not add the upper right subplot
    ax_right_bottom = fig.add_subplot(gs[-1, -1])
    ax_right_bottom.set_xlabel("Right Bottom X label")
    ax_right_bottom.set_ylabel("Right Bottom Y label")

    plt.tight_layout()
    plt.show()
If you want to make the bottom right subplot smaller or bigger in relation to the left subplot, you can now use height_ratios, because there are two rows to set a ratio between.
You can read more about it in Arranging multiple Axes in a Figure - it's full of useful tips for wrangling axes and subplots. Cheers!
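As a rough sketch of that last point, height_ratios can be passed to the same fig.add_gridspec call. The ratio [2, 1] below is only an illustrative value (it makes the bottom row half as tall as the top row), and the Agg backend is used just so the snippet runs without a display:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt

fig = plt.figure(figsize=(12, 10))

# Two rows, so a height ratio is meaningful: the top row is twice as tall
# as the bottom row. The ratio values are illustrative.
gs = fig.add_gridspec(nrows=2, ncols=2,
                      width_ratios=[3, 1], height_ratios=[2, 1])

ax_left = fig.add_subplot(gs[:, 0])            # spans both rows
ax_right_bottom = fig.add_subplot(gs[-1, -1])  # bottom right cell only

# Axes positions in figure coordinates, useful for checking the layout.
left_box = ax_left.get_position()
right_box = ax_right_bottom.get_position()
```

With equal height ratios (the default), the right subplot already comes out at a bit under half the height of the left one, because the left subplot spans both rows plus the gap between them.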
Say I have data graphed in an XY scatter chart with straight lines and markers. Data are plotted at (1, 3), (2, 4), (3, 0), and (4, 0). The last two data points are directly on the x axis. To fix this, I want the x axis to be slightly below y = 0.
Here's what I've tried:
Setting the minimum y value to a negative number (e.g. -1). I then set the x axis to cross at a number greater than y = -1 (e.g. y = -.02). While this drops the x axis (i.e. floats the zero) as desired, the y axis and the negative y-axis values down to -1 are shown on the graph. Typically, to cover this area up, I add a white shape with no border. This is not elegant, and it does not work well when set up in VBA for use with data sets of various sizes.
When I set the min value to y=-.02 and the x axis to cross at y= -.02, I don't have to worry about the negative values on the y-axis, but the major interval changes to .8, 1.8, 2.8, etc. If I wanted to change the major interval to 0.5, 1.0, 1.5, etc., I'd need to set the min value to -.5, which is far too large when I only want the data points to be slightly above y=0.
Any thoughts?
The regular axis formatting options won't do what you want to achieve, so you need a workaround. For example:
Hide the X axis altogether by formatting it to have no line. Use a new data series with two points. The first point is X=0, Y=-0.02, the second data point is x=the max of the other two series (formula) and Y is the same as the first data point.
Format this series as a line without markers and use it in lieu of the X axis. You can even make the position of this fake X axis dynamic by calculating the Y value from your data, if you want.
This will work with data from the grid, so you don't need to manipulate with VBA, but you could place the series with VBA instead, if you want.
If you need more help implementing this, leave a comment and I'll add more detail.
Ok, so I have two data series graphed, like so.
These are two scatter plots, based on x and y values, that are produced using a combo chart. The orange scatter plot is an ellipse whose calculation is based upon aspects relating to the purple scatter plot. I have made the orange ellipse in order to... well... select the part of the purple scatter plot that I want to do other things with. Problem is, I don't know how to actually select the data points this area refers to.
The data for this chart is based upon four columns: A,B (forming the purple plot) and C,D (forming the orange plot). Reordering the columns makes little difference.
Implementing Anger's proposed solution below, all instances seem to return true. Also, there happen to be more scatter plot rows than there are ellipse rows, so I'm not sure how to solve that for the sake of comparison.
If you specify the equation of the ellipse (center point and semi-major/minor axes), you can use the equation of the ellipse to flag points that are inside or outside.
if( ((Ex-x)/Lx)^2+((Ey-y)/Ly)^2 < 1, "INSIDE", "OUTSIDE")
Where Ex, Ey are the coordinates of the ellipse's center; x, y are your data point's coordinates; and Lx, Ly are the lengths of the ellipse's semi-axes along x and y (the semi-major and semi-minor axes).
Just by eye, I would say Ex = 1.8, Ey = 1.21, Lx = 0.6, and Ly = 0.5.
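The same test can be sketched outside the spreadsheet. This is only an illustration in Python with NumPy, assuming an axis-aligned (unrotated) ellipse; the function name inside_ellipse and the sample points are made up, and the parameters are the by-eye estimates above:

```python
import numpy as np

def inside_ellipse(x, y, ex, ey, lx, ly):
    """Return a boolean array that is True where (x, y) lies strictly inside
    the axis-aligned ellipse centred at (ex, ey) with semi-axes lx and ly."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return ((ex - x) / lx) ** 2 + ((ey - y) / ly) ** 2 < 1

# Made-up sample points standing in for columns A and B
xs = [1.8, 2.3, 3.0, 1.8]
ys = [1.21, 1.21, 1.21, 1.8]

flags = inside_ellipse(xs, ys, ex=1.8, ey=1.21, lx=0.6, ly=0.5)
labels = np.where(flags, "INSIDE", "OUTSIDE")
```

Note that the test runs once per data point and uses only the four ellipse parameters, not the rows that draw the ellipse, so it does not matter that the scatter plot has more rows than the ellipse. If every point comes back as inside, the parameters being passed in are the first thing to check.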
I probably have a simple issue, but I spent more than an hour searching both here and on Google without finding the solution I need.
My simple goal is to plot the histogram for the values stored in a dataframe column.
The problem I face is that I need the x ticks used to define the bins to correspond to percentile values
perc = [0.10, 0.20, 0.30, 0.50, 0.70, 0.90, 1.0] # percentile values
This implies the histogram plot will have non-uniform bin widths.
I tried this solution, Percentiles on X axis with matplotlib, but that gives me a ton of vertical bars and not just bins corresponding to my percentile levels.
I also saw a similar question here python plot hist(graph) with percentile but it doesn't have any answers.
Addition to my question.
I'm trying the following code but it doesn't really do what I need.
data = df["sales"].values # this is an array with 58k values, 99% of which are between 100 and 5000 and 1% of which are between 1mln and 2mln
perc = [0, 0.10, 0.20, 0.30, 0.50, 0.70, 0.90, 0.99, 1.0]
perc_values = np.percentile(data, perc)
histaa = np.histogram(data, bins = perc_values)
plt.hist(histaa, bins = perc_values);
The bins I obtain do not have the same width, and the plot has a big empty region.
I'm also adding a screenshot.
I wonder if anybody could help.
Many thanks
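One possible direction, offered as a sketch rather than a definitive fix: np.percentile expects percentages in the range 0 to 100, whereas np.quantile takes fractions in the range 0 to 1 like the perc list above, and plt.hist expects the raw data rather than the output of np.histogram. The snippet below assumes that what is wanted is one equal-width bar per percentile bin, so it counts with np.histogram and draws the counts with a bar chart on a categorical axis. The data is a synthetic stand-in for df["sales"].values, and the label format is an arbitrary choice:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
# Synthetic stand-in for the real column: mostly small values, a few huge ones.
data = np.concatenate([rng.uniform(100, 5000, 9900),
                       rng.uniform(1e6, 2e6, 100)])

perc = [0, 0.10, 0.20, 0.30, 0.50, 0.70, 0.90, 0.99, 1.0]
edges = np.quantile(data, perc)             # np.quantile takes fractions in [0, 1]
counts, _ = np.histogram(data, bins=edges)  # non-uniform bin edges are allowed

# One equal-width bar per bin, labelled with the percentile range it covers.
labels = [f"{lo:.0%}-{hi:.0%}" for lo, hi in zip(perc[:-1], perc[1:])]
fig, ax = plt.subplots()
ax.bar(range(len(counts)), counts, tick_label=labels)
ax.set_xlabel("percentile range")
ax.set_ylabel("count")
```

Because the x axis is categorical here, the bar widths no longer reflect the actual width of each bin; that trade-off is what removes the large empty region caused by the outliers.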
I have a bunch of data that I'm plotting as point plots. The data is simply a column for X and a column for Y. The catch here though is this is plotted using axes x2y2.
The x1y1 is used for a histogram. The X axis is the same range for both plots.
I know how to derive the X coordinate, but am wondering if there is an easy way to determine the Y value to use to draw an arrow. I want to draw an arrow callout for an arbitrary point on the point plot.
y1 and y2 are independent.
The coordinates for drawing the arrow can refer to different coordinate systems (first, second, character, screen, and graph, see help coordinates).
So, to draw an arrow e.g. from the top-middle of the plot (graph 0.5, graph 1) to x2 = 1, y2 = 2 (second 1, second 2) you would write
set arrow from graph 0.5, graph 1 to second 1, second 2 head