Understanding Density Plots from Pandas DataFrames - python-3.x

I am trying to understand the distribution of my data for a particular column. It has close to 1 million records.
Here is the code that I have written to see the density plot.
df["ratio"].plot.kde(bw_method=0.1)  # Plot continuous column
https://wellsr.com/python/python-pandas-density-plot-from-a-dataframe/
Here is the plot that I get:
It is not clear to me what the x-axis and y-axis indicate.
Is the x-axis the ratio values from the DataFrame?
What does "Density" mean on the y-axis, and how is it calculated?
Is there a formula to derive these y-axis values? I am more interested in deriving the values: given the ratio column, how can we come up with the density values? Can someone quickly show the maths?

If you are plotting a KDE, you are plotting an estimate of the probability density function (PDF) of a random variable.
The x-axis covers the range of values of the variable you are plotting. In your case, since you are plotting ratio, the x-axis represents values of your ratio column.
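The y-axis represents the estimated (kernel) density. This is not a probability: it is probability per unit of ratio, so it can exceed 1, and the probability that ratio falls in an interval is the area under the curve over that interval.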
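Written out, the curve that plot.kde draws is a Gaussian kernel density estimate. With n values x_1, ..., x_n of ratio and their sample standard deviation s (dividing by n - 1), and assuming a scalar bw_method is used as a factor on s, which is how scipy.stats.gaussian_kde treats it:

```latex
\hat{f}(x) = \frac{1}{n h} \sum_{i=1}^{n} K\!\left(\frac{x - x_i}{h}\right),
\qquad K(u) = \frac{1}{\sqrt{2\pi}}\, e^{-u^{2}/2},
\qquad h = \texttt{bw\_method} \cdot s

\int_{-\infty}^{\infty} \hat{f}(x)\,dx = 1,
\qquad P(a \le \text{ratio} \le b) \approx \int_{a}^{b} \hat{f}(x)\,dx
```

In words: every data point contributes a small bell curve of area 1/n centred on itself, and the plotted height at x is the sum of those bells. A smaller bw_method gives narrower, taller bells and a spikier curve.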
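Read the documentation for Series.plot.kde; it computes the estimate with scipy.stats.gaussian_kde and passes bw_method through to it.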
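To see where the axis values come from, here is a plain-Python sketch that reproduces the calculation for a small, made-up sample. Two assumptions, based on my reading of pandas and SciPy rather than anything stated above: the x-axis grid is 1000 evenly spaced points running half the data range past the minimum and maximum (what pandas appears to use when ind is not given), and a scalar bw_method sets the kernel width to bw_method times the sample standard deviation, as scipy.stats.gaussian_kde does.

```python
import math
import statistics


def pandas_like_grid(values, num=1000):
    """x-axis points, assuming pandas' default for plot.kde(ind=None):
    `num` evenly spaced points from min - 0.5*range to max + 0.5*range."""
    lo, hi = min(values), max(values)
    span = hi - lo
    start, stop = lo - 0.5 * span, hi + 0.5 * span
    step = (stop - start) / (num - 1)
    return [start + k * step for k in range(num)]


def kde_density(x, samples, bw_method=0.1):
    """Gaussian KDE at x with a scalar bw_method.

    Assumes the scipy.stats.gaussian_kde convention: the kernel width is
    h = bw_method * (sample standard deviation).
    """
    n = len(samples)
    h = bw_method * statistics.stdev(samples)  # stdev divides by n - 1
    norm = 1.0 / (n * h * math.sqrt(2.0 * math.pi))
    return norm * sum(math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in samples)


def trapezoid_area(xs, ys):
    """Area under the sampled curve (trapezoidal rule)."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for x0, x1, y0, y1 in zip(xs, xs[1:], ys, ys[1:]))


# Made-up stand-in for df["ratio"]; the real column has about 1 million rows.
ratio = [0.8, 0.9, 0.95, 1.0, 1.0, 1.05, 1.1, 1.3, 1.6, 2.0]

xs = pandas_like_grid(ratio)                              # x-axis values
ys = [kde_density(x, ratio, bw_method=0.1) for x in xs]   # y-axis "Density"

print(max(ys))                 # peak height; a density can exceed 1
print(trapezoid_area(xs, ys))  # total area under the curve, close to 1
```

The pure-Python loop is only for understanding; with about 1 million rows it would be very slow. For the real column, scipy.stats.gaussian_kde(df["ratio"].dropna(), bw_method=0.1)(xs) should compute the same curve in vectorised form.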

Related

How to interpolate axis labels in matplotlib imshow

I have a scientific 2D matrix that I want to plot. The bin labels are monotonically increasing real numbers, frequently on a logarithmic or other non-linear scale. As seen below, replacing the ticks and plotting the bin values directly for each bin is not very pretty. What are my options?
I would prefer to make ticks only at integer values of each coordinate, even if a tick would not correspond to any particular bin but instead fall somewhere in between the bins.
Another request, if possible, is to then convert the labels to logarithmic notation: instead of [1, 10, 100, 1000, 10000], write 10^0, 10^1, 10^2, 10^3, 10^4.
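One way to get ticks between bins, sketched as an assumption since the thread shows no answer: treat pixel i as sitting at bin_values[i], interpolate in log10 space to find the fractional pixel coordinate of each power of ten, and label it with Matplotlib mathtext. log_decade_ticks is a hypothetical helper; for a linear scale the same idea works without the log10.

```python
import bisect
import math


def log_decade_ticks(bin_values, decades):
    """Tick positions and labels for 10**k, interpolated between image bins.

    Assumptions: bin_values are positive and strictly increasing, and bin i
    is drawn centred on pixel coordinate i (imshow's default extent).
    Interpolation is linear in log10 space, so a tick may fall between bins.
    """
    logs = [math.log10(v) for v in bin_values]
    positions, labels = [], []
    for k in decades:
        target = float(k)  # log10(10**k) == k
        if not logs[0] <= target <= logs[-1]:
            continue  # 10**k lies outside the plotted bins
        j = bisect.bisect_left(logs, target)  # first bin with log >= k
        if j == 0:
            pos = 0.0  # exactly on the first bin
        else:
            lo, hi = logs[j - 1], logs[j]
            pos = (j - 1) + (target - lo) / (hi - lo)
        positions.append(pos)
        labels.append(f"$10^{{{k}}}$")  # mathtext label, e.g. 10^2
    return positions, labels


# Made-up bin values standing in for the real ones.
positions, labels = log_decade_ticks([2, 20, 200, 2000], range(0, 5))
print(positions)
print(labels)
```

The results can then be applied with ax.set_xticks(positions) and ax.set_xticklabels(labels) (or the y-axis equivalents) after calling ax.imshow(...).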

How can I plot time-series on matplotlib polar plot?

I'm not sure a polar plot is exactly what I should be using to accomplish this. Essentially, I have multiple time series (just amplitude vs. time), each corresponding to a different angle in degrees. For example, I have amplitude vs. time at 5 degrees, 10 degrees, 15 degrees, etc. For angles 0 to 360 in increments of 5, I would like to plot each time series on a circular, or polar, plot.
I am attempting to do this with matplotlib and projection='polar'. There are 8000 amplitude values for each time series, stored in a NumPy array called data. To test plotting the time series associated with 5 degrees, I made an array of 8000 fives with 5*np.ones(8000) so that both arrays have the same length.
import numpy as np
import matplotlib.pyplot as plt

thetas = np.arange(0, 365, 5)  # make 0 to 360 degrees at increments of 5
ax = plt.subplot(111, projection='polar')  # turn on polar projection
ax.plot(5 * np.ones(8000), data)  # data: the 8000 amplitude values
plt.show()
I get:
You can see that the data are not plotted along the 5-degree line, nor do any amplitudes seem to show (there should be squiggles up and down varying with time). Thank you in advance!
EDIT: Example of what I want (each colored line is a different time series):
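On polar axes, ax.plot takes theta (in radians) first and r second. The call above therefore puts every point at theta = 5 radians (about 286 degrees) and uses the amplitude as the distance from the centre, so the series collapses onto a single ray with no visible wiggle. A minimal sketch of one possible layout, which is an assumption rather than something from the post: time runs outward along the radius and the amplitude becomes a small angular offset around each series' angle. series_on_ray and its scale parameter are hypothetical.

```python
import math


def series_on_ray(angle_deg, amplitudes, times, scale=1.0):
    """Return (theta, r) lists that draw one time series along a polar ray.

    Hypothetical layout: time runs outward along the radius, and each
    amplitude becomes an angular wiggle around the series' base angle,
    `scale` degrees per unit of amplitude. Matplotlib's polar axes take
    theta in radians, so degrees are converted with math.radians.
    """
    if len(amplitudes) != len(times):
        raise ValueError("amplitudes and times must have the same length")
    theta = [math.radians(angle_deg + scale * a) for a in amplitudes]
    r = list(times)
    return theta, r


# Made-up 4-sample series at 5 degrees; the real series have 8000 samples.
theta, r = series_on_ray(5, [0.0, 1.0, 0.0, -1.0], [0, 1, 2, 3], scale=2.0)
print([round(math.degrees(t), 6) for t in theta])  # angles back in degrees
print(r)
```

With ax = plt.subplot(111, projection='polar'), calling ax.plot(theta, r) once per angle in thetas would draw every series; scale controls how far the wiggles swing away from each ray.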

Create Normal Distribution curve in Excel

I am trying to draw a bell curve (normal distribution curve) from the data set provided, but I cannot get Excel to create it. Can anyone help me create it?
https://docs.google.com/spreadsheets/d/1ipDo6WlbmDUBZuuS4ya3ZGD7mkP_vnbByK3KvyLbJ88/edit?usp=sharing
The above file can be used as the data set for creating the curve. Can someone explain the procedure for making a curve from this data set in Excel?
If your data is normally distributed, it should resemble a bell curve.
By "Trying to draw a Bell Curve/Normal Distribution curve", are you referring to a line diagram?
Remember, a bell curve is the shape a histogram of normally distributed data takes. If you inserted a histogram of your data, would that be enough?
If not, you could calculate the mean and standard deviation of your data, then make one column of x values at different multiples of the standard deviation from the mean and a second column with the value we expect at each (for example with NORM.DIST(x, mean, sd, FALSE)).
You could then combine that with your histogram: use a "Combo" chart and plot the histogram on one axis and a line for your calculated values (you can make the line smooth if it looks too sharp). You could also decrease the spacing between your calculated values, for example steps of 0.1 standard deviations (1.1, 1.2, ...) instead of halves.
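The column-based approach above can be sketched in Python to show the arithmetic behind each row. This is an illustration, not the answerer's spreadsheet: the function name, the 3-standard-deviation span and the 0.1 step are assumptions. Multiplying the density by n times the histogram bin width puts the curve on the same vertical scale as a count histogram, so both can share an axis in a combo chart.

```python
import statistics


def normal_curve_table(data, bin_width, step_sd=0.1, span_sd=3.0):
    """Rows of (x, expected_count) for overlaying a normal curve on a histogram.

    x runs from mean - span_sd*sd to mean + span_sd*sd in steps of step_sd*sd.
    expected_count = n * bin_width * pdf(x); the pdf part corresponds to
    Excel's NORM.DIST(x, mean, sd, FALSE).
    """
    mean = statistics.fmean(data)
    sd = statistics.stdev(data)  # sample standard deviation, like STDEV.S
    dist = statistics.NormalDist(mean, sd)
    steps = round(2 * span_sd / step_sd)
    rows = []
    for k in range(steps + 1):
        x = mean + (-span_sd + k * step_sd) * sd
        rows.append((x, len(data) * bin_width * dist.pdf(x)))
    return rows


# Made-up sample data; a histogram of it would use bins 1 unit wide.
data = [1, 2, 2, 3, 3, 3, 4, 4, 5]
rows = normal_curve_table(data, bin_width=1.0)
for x, count in rows[::10]:
    print(round(x, 3), round(count, 3))
```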
Unfortunately, the data you provided is not at all normally distributed.
So you can't create a bell curve based on this data, no.

Excel - Plot average of two plots with inconsistent time (X) axis

I have managed to plot two different data sets on the same axis; however, I'm also looking to plot another line showing their average.
The main problem is that the two data sets have different X (time) values, so it's not possible to add an average column at the end and plot that (see, for example, the highlighted row 22, where the corresponding time values differ).
Is there any way I can plot an average of two plots on the same axis?
One idea that might work is to place the values of both series, one above the other, in two new columns, sort this new data by time, smooth it, then plot the smoothed combined data. Alternatively, you could do the smoothing by plotting the new sorted series, adding a moving average trendline to it, and then changing the formatting of the new series so that it is no longer visible (but the trendline is). Something like this:
In the above picture, series 3 is the plot of the sorted aggregate data of series 1 and 2. If you change the formatting of series 3 so that there is no line, you get something like this:
For my relatively small mock data sets, the results are admittedly poor (it was based on just 25 data points in each series), but if you have a large amount of closely spaced data, and you play around with the moving average window size, you might get something acceptable. If not, you should probably just interpolate both datasets to obtain two consistent time series.
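Both suggestions can be sketched in plain Python to make the arithmetic concrete. This is an illustration under assumptions, not what Excel does internally: each series is a pair of Python lists with times sorted ascending, and the function names, grid size and window are made up. average_on_common_grid linearly interpolates both series onto shared times over their overlap and averages them; merged_moving_average pools, sorts and smooths with a trailing window, roughly like a moving-average trendline.

```python
import bisect


def interpolate(times, values, t):
    """Linearly interpolate a series at time t (times sorted ascending)."""
    if not times[0] <= t <= times[-1]:
        raise ValueError("t is outside the series' time range")
    j = bisect.bisect_left(times, t)  # first index with times[j] >= t
    if times[j] == t:
        return values[j]
    t0, t1 = times[j - 1], times[j]
    v0, v1 = values[j - 1], values[j]
    return v0 + (v1 - v0) * (t - t0) / (t1 - t0)


def average_on_common_grid(series_a, series_b, num=50):
    """Resample two (times, values) series onto shared times and average."""
    (ta, va), (tb, vb) = series_a, series_b
    start, stop = max(ta[0], tb[0]), min(ta[-1], tb[-1])  # overlap only
    if start > stop:
        raise ValueError("the two series do not overlap in time")
    step = (stop - start) / (num - 1)
    grid = [min(start + k * step, stop) for k in range(num)]  # clamp rounding
    avg = [(interpolate(ta, va, t) + interpolate(tb, vb, t)) / 2 for t in grid]
    return grid, avg


def merged_moving_average(series_a, series_b, window=5):
    """Pool both series, sort by time, then apply a trailing moving average."""
    pooled = sorted(zip(series_a[0] + series_b[0], series_a[1] + series_b[1]))
    values = [v for _, v in pooled]
    smoothed = [sum(values[i - window + 1:i + 1]) / window
                for i in range(window - 1, len(values))]
    times = [t for t, _ in pooled][window - 1:]  # time of each window's last point
    return times, smoothed


# Made-up series whose sample times do not line up.
a = ([0, 2, 4, 6], [0, 2, 4, 6])
b = ([1, 3, 5, 7], [10, 10, 10, 10])
grid, avg = average_on_common_grid(a, b, num=5)
print(list(zip(grid, avg)))
print(merged_moving_average(a, b, window=2))
```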

I want to make a scatter graph of the output of ftrace (from kernel)

I want to make a scatter graph of the output of ftrace (from the kernel) on asm_do_IRQ. The problem is that there are 8000+ entries and I get the results as a single line plot. Is there any way to normalise the values so that I can get a scatter plot? The values I want to plot are as below:
Interrupt     Time
uart-pl011    196.98111
Nomadi        196.983246
prcmu         196.983307
dma40         196.983429
dma40         196.984222
Nomadi        196.98642
dma40         196.988922
prcmu         196.988953
Since the number of values is huge, Excel puts the time on the Y axis and the number of interrupts on the X axis. But I want the interrupts by name on the Y axis and time on the X axis.
It looks like this is not possible. Scatter graphs can only plot two numeric variables, and since one of the axes I want to plot is a string, I can only get a trend, not the exact points against time.
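The normalisation the question asks about can be sketched as a label encoding: give each interrupt name an integer code, plot (time, code) as the scatter points, and keep the code-to-name table as the key for the Y axis. This workaround does not come from the thread; encode_categories is a hypothetical helper. In Excel, a similar helper column could be built with MATCH(name, list_of_distinct_names, 0).

```python
def encode_categories(names):
    """Give each distinct name an integer code, in order of first appearance."""
    codes = {}
    for name in names:
        codes.setdefault(name, len(codes))
    return codes


# The eight sample rows from the question: (interrupt, time).
events = [
    ("uart-pl011", 196.98111),
    ("Nomadi", 196.983246),
    ("prcmu", 196.983307),
    ("dma40", 196.983429),
    ("dma40", 196.984222),
    ("Nomadi", 196.98642),
    ("dma40", 196.988922),
    ("prcmu", 196.988953),
]

codes = encode_categories(name for name, _ in events)
xs = [t for _, t in events]               # X axis: time
ys = [codes[name] for name, _ in events]  # Y axis: numeric stand-in for the name
for name, code in codes.items():
    print(code, name)  # key for reading the Y axis
```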
