Python matplotlib with percentile data on X axis

This is probably a simple issue, but I have spent more than an hour searching both here and on Google without finding the solution I need.
My goal is simply to plot a histogram of the values stored in a dataframe column.
The problem is that I need the x ticks that define the bins to correspond to percentile values:
perc = [0.10, 0.20, 0.30, 0.50, 0.70, 0.90, 1.0] # percentile values
This implies the histogram plot will have non-uniform bin widths.
I tried the solution from "Percentiles on X axis with matplotlib", but that gives me a ton of vertical bars rather than just the bins corresponding to my percentile levels.
I also saw a similar question, "python plot hist(graph) with percentile", but it doesn't have any answers.
Addition to my question.
I'm trying the following code but it doesn't really do what I need.
data = df["sales"].values # an array with 58k values, 99% of which are between 100 and 5000 and 1% of which are between 1 million and 2 million
perc = [0, 0.10, 0.20, 0.30, 0.50, 0.70, 0.90, 0.99, 1.0]
perc_values = np.percentile(data, perc)
histaa = np.histogram(data, bins = perc_values)
plt.hist(histaa, bins = perc_values);
The bins I obtain do not have the same width, and the plot has a big empty region.
I am also adding a screenshot.
I wonder if anybody could help.
Many thanks
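One possible approach, offered as a sketch rather than a definitive answer: np.percentile expects percentages in the range 0 to 100, so fractions such as 0.10 need to be multiplied by 100 first, and plt.hist expects the raw data rather than the output of np.histogram. To avoid the large empty region, the counts can be drawn as equal-width bars whose tick labels carry the percentile values. The names to_percent and plot_percentile_histogram and the label format are made up for illustration; the "sales" column is the one from the question.

```python
# Fractions from the question; np.percentile wants percentages (0-100).
FRACTIONS = [0, 0.10, 0.20, 0.30, 0.50, 0.70, 0.90, 0.99, 1.0]


def to_percent(fractions):
    """Convert fractions in [0, 1] to percentages in [0, 100]."""
    return [100.0 * f for f in fractions]


def plot_percentile_histogram(data, fractions=FRACTIONS):
    """Draw one equal-width bar per percentile bin (hypothetical helper)."""
    # Imported here so the conversion helper above works without these libraries.
    import numpy as np
    import matplotlib.pyplot as plt

    edges = np.percentile(data, to_percent(fractions))  # bin edges in data units
    counts, _ = np.histogram(data, bins=edges)          # counts per percentile bin

    positions = np.arange(len(counts))                  # one slot of width 1 per bin
    fig, ax = plt.subplots()
    ax.bar(positions, counts, width=1.0, align="edge", edgecolor="black")

    # Label each bin boundary with its percentile and the matching data value.
    ax.set_xticks(np.arange(len(edges)))
    ax.set_xticklabels(
        [f"{p:g}%\n{v:,.0f}" for p, v in zip(to_percent(fractions), edges)]
    )
    ax.set_xlabel("percentile / value")
    ax.set_ylabel("count")
    return fig, ax
```

Calling plot_percentile_histogram(df["sales"].values) followed by plt.show() should give eight bars of equal width. The bar heights then show how many values fall between consecutive percentiles, while the distance between the percentile values no longer affects the layout.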

Related

Is there a way to change the y axis on Gnuplot so that my image graphs from hour 16 to hour 15 instead of 0 to 24?

I'm sorry if this has already been asked, I couldn't find it anywhere, but I have an image plot on gnuplot of a three-columned data file for a y range [0:24] and I can't figure out how to use gnuplot to rearrange the image graph so my y axis runs from 16:24 and then 0:16 (in that order and on the same axis). The command I've been using is "plot [] [0:24] '/Users/eleanor/PycharmProjects/attempt2.gray' u 1:2:3 w image" but I don't know what command to use so that hour 16 is at the very bottom instead of 0, and then when y reaches 23:59 y goes to 0 next and then continues increasing up to 15:59 at the very top of the axis. I'm not sure if that makes sense or not, and I've already tried changing the y range to [16:15] and that did nothing except give me an error lol. Any tips would be very much appreciated! :)
A piece of the file I'm using is below (the first column is the day of year, the second is the time in decimal hours, and the third is the data):
20 0.0 7.327484247409568
20 0.002777777777777778 8.304658863945411
20 0.005555555555555556 11.641408500506405
20 0.008333333333333333 6.543382279013497
20 0.011111111111111112 13.922090817182697
20 0.013888888888888888 10.696406455987988
20 0.016666666666666666 12.537636516165243
20 0.019444444444444445 11.816216763447612
20 0.022222222222222223 8.914413125514413
20 0.025 5.8225423124691496
20 0.027777777777777776 10.896730484548698
20 0.030555555555555555 9.097140108173859
As currently implemented, "with image" treats the entire block of data as a single entity. You can't chop it up into pieces within a single plot command. However, if your data is dense enough, you may be able to approximate the same effect by plotting each pixel as a colored square:
set xrange [*:*] noextend
set yrange [0:24]
plot 'datafile' using 1:(($2>16.)? ($2-16.) : ($2+8.)):3 with points pt 5 lc palette
I strongly recommend not making the range limits part of the plot command. Set them beforehand using set xrange and set yrange.
If necessary, you can adjust the size of the individual square "pixels" by using set pointsize P where P is a scale factor. It probably looks best if you make the points just large enough (or small enough) to touch each other. I think the default ones in the image I show are too large.
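To make the remapping in the using clause concrete, here is a small Python sketch of the same arithmetic. The function name shift_hour is made up for illustration. It uses a modulo instead of the conditional, which gives the same result as the gnuplot expression except at exactly 16.0, where it returns 0 rather than 24.

```python
def shift_hour(hour, start=16.0):
    """Map an hour of day to its height on an axis that begins at `start`."""
    return (hour - start) % 24.0


# A few sample hours and where they land on the 0..24 axis.
for h in (16.0, 20.0, 23.5, 0.0, 8.0, 15.5):
    print(f"hour {h:5.2f} -> y = {shift_hour(h):5.2f}")
```

Hours from 16 up to midnight land in the bottom third of the axis, and hours from midnight up to 16 are stacked above them.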
You can also use the boxxyerror plotting style instead of the image plotting style. Here is what the help for boxxyerror says:
gnuplot> ? boxxyerror
The `boxxyerror` plot style is only relevant to 2D data plotting.
It is similar to the `xyerrorbars` style except that it draws rectangular areas
rather than crosses. It uses either 4 or 6 basic columns of input data.
Additional input columns may be used to provide information such as
variable line or fill color (see `rgbcolor variable`).
4 columns: x y xdelta ydelta
6 columns: x y xlow xhigh ylow yhigh
....
If you adopt the four-column plotting style above, you must specify xdelta and ydelta in addition to x and y to specify the rectangle. The xdelta and ydelta should be the half-width and half-height of each pixel. From your data, let's say xdelta is half of 1 and ydelta is half of 0.002777777777777778 hours.
Our final script will look like this.
In this script, the second column of "using" is the same as Ethan's answer.
dx = 1.0/2.0
dy = 0.002777777777777778/2.0
set xrange [-1:32]
set yrange [0:24]
set ytics ("16" 0, "20" 4, "0" 8, "4" 12, "8" 16, "12" 20, "16" 24)
set palette defined (0 "green", 0.5 "yellow", 1 "red")
unset key
plot "datafile" using 1:($2>16?($2-16):($2+8)):(dx):(dy):3 \
with boxxy palette
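The tic labels in the script follow a simple rule: the label at height y is the clock hour (y + 16) modulo 24. As a sketch, the set ytics line could be generated rather than typed by hand. The helper name shifted_ytics and its parameters are assumptions made for this example.

```python
def shifted_ytics(start=16, step=4):
    """Build a gnuplot `set ytics` command for an axis that begins at hour `start`."""
    pairs = []
    for position in range(0, 24 + 1, step):
        label = (position + start) % 24      # clock hour shown at this height
        pairs.append(f'"{label}" {position}')
    return "set ytics (" + ", ".join(pairs) + ")"


print(shifted_ytics())
```

With the default arguments this reproduces the set ytics line used in the script, and changing start or step gives the labels for a different starting hour or spacing.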

How to send highest matplotlib scatter plot c-values to front?

I have a pandas dataframe of 3 columns (x, y, z) which I plot using a scatter plot with z-variable assigned to the c-value, with the resulting image
Variables x, y, and z are all continuous real data and z != f(x,y). I'm unable to provide the actual data sample.
As you can see the points overlap, and highest values are hidden from view.
I would like this plot to display the highest (red) points on top of the lowest (blue) points to produce an intensity plot similar to this
I assume this is achieved by somehow controlling the plot order of z, and I have tried sorting the dataframe by the z-variable with no success.
I would appreciate some method for the existing chart or a suggestion for a new chart to make this possible.
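Try this first:
import pandas as pd
df = pd.DataFrame(data={'x': [0.2, 0.21, 0.22],
                        'y': [0.2, 0.21, 0.22],
                        'z': [1.8, 2, 3]})
df.plot.scatter('x', 'y', c='z', s=5000)
Then try this:
df.sort_values('z', ascending=False, inplace=True)
df.plot.scatter('x', 'y', c='z', s=5000)
The drawing order is reversed between the two plots. Points are drawn in row order, so later rows are painted over earlier ones; sorting by z in ascending order therefore puts the highest values on top.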
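The same idea can be sketched without pandas: reorder the points so that z is ascending, then hand them to matplotlib's scatter, which paints later points over earlier ones. The helper names order_by_z and scatter_high_on_top, the colormap and the marker size are choices made for this example only.

```python
def order_by_z(x, y, z):
    """Return x, y, z reordered so that z is ascending (highest z drawn last)."""
    order = sorted(range(len(z)), key=lambda i: z[i])
    return ([x[i] for i in order], [y[i] for i in order], [z[i] for i in order])


def scatter_high_on_top(x, y, z):
    """Scatter plot where the largest z values are painted over the smallest."""
    import matplotlib.pyplot as plt  # imported lazily; only needed for plotting

    xs, ys, zs = order_by_z(x, y, z)
    fig, ax = plt.subplots()
    points = ax.scatter(xs, ys, c=zs, cmap="jet", s=50)
    fig.colorbar(points, ax=ax, label="z")
    return fig, ax


# Toy data in the spirit of the answer above, deliberately unsorted.
xs, ys, zs = order_by_z([0.22, 0.20, 0.21], [0.22, 0.20, 0.21], [3, 1.8, 2])
print(zs)
```

With a dataframe, calling df.sort_values('z') before df.plot.scatter should have the same effect, since sort_values sorts in ascending order by default.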

Plotting a heatmap with different bin sizes in Gnuplot

I have a data file that I would like to plot as a heatmap. There are 3 columns: x, y, and the count at point (x,y). The problem is that the bins have different sizes in y (and not in x), for example
-0.3 0 0
-0.3 6.7082 0
-0.3 8.66025 0
-0.3 10.247 0
-0.3 11.619 0
-0.3 12.8452 0
...
But when I plot using for example
set view map
set size ratio -1
set key off
splot "histo.txt" u 1:2:3 w image
I get an image in which the bin sizes in the y direction are the same, thus the picture is distorted.
How can I plot a heatmap with different bin sizes in one direction? I also know exactly where each bin should begin and end in y; the values in the second column of the data file are a weighted average.
Thank you.
Gnuplot offers basically two plotting styles suitable for heat maps, pm3d and image, which however have very different behaviour:
image:
Draws a pixel image
Always uses a regular grid, no matter what x or y values are used
Each quadrangle (here, the pixel) is centered on one data point
pm3d:
Draws vectorial quadrangles
Can use irregular grids with varying spacings
Draws each quadrangle with four data points as corners. The color is by default given by the mean value of those four points, which can be changed with set pm3d corners2color ...
Can interpolate
Many more features, applicable for 3d etc
So, to summarize: image can be used for heat maps and has its advantages, but in your case you need pm3d, which offers the flexibility you need.
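Because pm3d uses data points as quadrangle corners, a grid of N by M bins needs (N+1) by (M+1) points. The following Python sketch rewrites per-bin counts into that layout, assuming the bin edges are known, as the question states. The function name pm3d_corner_blocks and the sample numbers are made up. Attaching each count to the bin's lower-left corner relies on my understanding that set pm3d corners2color c1 colors a quadrangle from that corner, which should be checked against help pm3d.

```python
def pm3d_corner_blocks(x_edges, y_edges, counts):
    """Format bin counts as gnuplot grid data with one point per bin corner.

    counts[i][j] is the count for x bin i and y bin j, so there are
    len(x_edges) - 1 rows and len(y_edges) - 1 columns.
    """
    nx, ny = len(x_edges) - 1, len(y_edges) - 1
    blocks = []
    for i, x in enumerate(x_edges):
        lines = []
        for j, y in enumerate(y_edges):
            # Corners on the far edges reuse the last bin's count as padding.
            value = counts[min(i, nx - 1)][min(j, ny - 1)]
            lines.append(f"{x} {y} {value}")
        blocks.append("\n".join(lines))
    return "\n\n".join(blocks) + "\n"   # a blank line separates the scans


# Made-up example: 2 bins in x, 3 unevenly spaced bins in y.
text = pm3d_corner_blocks([-0.3, -0.1, 0.1], [0.0, 5.0, 8.0, 10.0],
                          [[0, 2, 5], [1, 4, 3]])
print(text)
```

Written to a file, this output could then be plotted with set view map, a suitable corners2color setting and splot ... with pm3d, so that each bin is drawn with its true extent in y.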

Axis value with different decimals in excel

How can I have y axis values with 2 different decimals without using VBA?
I have values in y axis (0.01, 0.10, 1.00, 10.00, 100.00).
I want to change them as follows, so that the labels use two different numbers of decimals: 0.01, 0.10, 1.0, 10.0, 100.0.
If the Y axis on your chart shows values that are in the vicinity of 100, then values in the vicinity of 0.01 will be pretty hard to spot.
Even values less than 10 will not show as a significant blip. What difference does the Y axis label make for units that are not showing, anyway?
If you want to make such a fine distinction, you will want to use two different charts from the outset.
Right click on your y-axis, then click Format Axis. Under Axis Options, check Logarithmic scale.

gnuplot xrange min does not show

I have my dataset (d.asc) as follows:
0.1 0.5
0.12 0.56
...
90.4 0.34
...
100 0.78
I have my plot generation file as follows:
set xrange [0.1:100]
set grid
plot "d.asc" using 1:2 notitle with lines
I.e. I want to see the first column on the x-axis and the second column on the y-axis. But the x-axis values start from 0 and increment by 10 up to 100.
[1] Why it does not start from 0.1?
[2] Also is there a way to have only three (or four, etc.) specific value points on x-axis? For example I want to see on x-axis only 0.1, 90.4, and 100. Thanks.
[1] Why it does not start from 0.1?
Gnuplot likes to pick round numbers for its tic increments and positions. In your case the increment is 10, so tics would appear at 0, 10, ... 100. Since you manually set the x range to start at 0.1, the first tic does not appear until 10.
[2] Also is there a way to have only three (or four, etc.) specific value points on x-axis?
Yes, you can specify specific points with this syntax:
set xtics ("0.1" 0.1, "90.4" 90.4, "100" 100)
The value in quotes is the text that appears at the tic, and the number is the actual position at which it appears. (See help set xtics for more format info.)
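The explanation given for [1] can be illustrated with a few lines of Python. This is only a model of the reasoning in the answer (tics at multiples of the increment that lie inside the range), not gnuplot's actual implementation, and the function name tics_in_range is made up.

```python
import math


def tics_in_range(xmin, xmax, increment):
    """Multiples of `increment` that fall inside [xmin, xmax]."""
    first = math.ceil(xmin / increment)
    last = math.floor(xmax / increment)
    return [k * increment for k in range(first, last + 1)]


print(tics_in_range(0.1, 100, 10))   # range from the question
print(tics_in_range(0, 100, 10))     # same range, but starting at 0
```

With the range starting at 0.1, the smallest multiple of 10 inside the range is 10 itself, which is why no tic is drawn at the left edge unless one is requested explicitly with set xtics.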
