How to plot the mean and standard deviation of multiple measures in gnuplot

I'd like to plot the mean and standard deviation of multiple measures in one gnuplot figure, if possible.
For example, objects A and B have measurements of length and weight like this (CSV format):
length,weight
100.0,0.1
100.5,0.12
98.8,0.09
100.1,0.11
Is it possible to plot this in a single figure with good visibility, given that the lengths are on the order of 100 while the weights are on the order of 0.1? I don't want to use a logarithmic scale because, in my real data, it doesn't make sense to take logarithms.
yerrorlines seems to be an option, but can a histogram do this too?
Does anyone know how to do this in gnuplot?
Thanks!
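Whatever plotting style is chosen, the mean and standard deviation of each measure have to be computed first. Below is a minimal Python sketch, assuming the four rows above are simply pooled (the question does not say which rows belong to object A and which to object B); the function name summarize is made up for illustration. In gnuplot itself, one commonly used way to show values of very different magnitudes in one figure is to put the second measure on the y2 axis (set y2tics, then plot ... axes x1y2), for example with yerrorlines; that part is not tested here.

```python
import csv
import io
import statistics

# The sample from the question. Which rows belong to object A and which to
# object B is not stated, so all four rows are pooled here (an assumption).
CSV_TEXT = """length,weight
100.0,0.1
100.5,0.12
98.8,0.09
100.1,0.11
"""


def summarize(csv_text):
    """Return {column: (mean, sample standard deviation)} for every CSV column."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    summary = {}
    for column in rows[0].keys():
        values = [float(row[column]) for row in rows]
        summary[column] = (statistics.mean(values), statistics.stdev(values))
    return summary


if __name__ == "__main__":
    for column, (mean, sd) in summarize(CSV_TEXT).items():
        # One line per measure: name, mean, standard deviation.
        print(f"{column} {mean:.4f} {sd:.4f}")
```

The printed lines (name, mean, standard deviation) can be saved to a file and read by gnuplot as ordinary whitespace-separated columns.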

Related

Create Normal Distribution curve in Excel

I'm trying to draw a bell curve (normal distribution curve) from the data set provided, but I can't get Excel to create it. Can anyone help me create it?
https://docs.google.com/spreadsheets/d/1ipDo6WlbmDUBZuuS4ya3ZGD7mkP_vnbByK3KvyLbJ88/edit?usp=sharing
The above file can be used as the data set for creating the curve. Can someone explain the procedure for making the curve from this data set in Excel?
If your data is normally distributed, it should resemble a bell curve.
By "Trying to draw a Bell Curve/Normal Distribution curve", are you referring to a line diagram?
Remember, a bell curve is what a histogram of normally distributed data looks like. If you inserted a histogram of your data, would that be enough?
If not, you could calculate the mean and standard deviation of your data, then make one column of x values at different numbers of standard deviations from the mean and a second column with the value the normal curve predicts at each of them.
You could then add that to your existing histogram: use a "Combo" chart and plot the histogram on one axis and a line for your calculated values (you can smooth the line if it looks too jagged). You could also use a finer spacing between the calculated values (e.g. 1.1, 1.2, ... standard deviations instead of, say, half-standard-deviation steps).
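The column of expected values described above can be sketched in Python as follows. The helpers are hypothetical, not an Excel feature; in Excel itself the same density is available as NORM.DIST(x, mean, standard_dev, FALSE). The mean and standard deviation are assumed to have been computed from the data already.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of the normal distribution N(mean, sd^2) at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def bell_curve_table(mean, sd, max_sd=3.0, step=0.1):
    """Return (x, density) pairs from mean - max_sd*sd to mean + max_sd*sd.

    The step is measured in standard deviations, so step=0.1 gives the finer
    1.1, 1.2, ... spacing instead of half-standard-deviation steps.
    """
    n_steps = int(round(2 * max_sd / step))
    table = []
    for i in range(n_steps + 1):
        k = -max_sd + i * step  # number of standard deviations from the mean
        x = mean + k * sd
        table.append((x, normal_pdf(x, mean, sd)))
    return table


if __name__ == "__main__":
    for x, density in bell_curve_table(mean=50.0, sd=10.0, step=0.5):
        print(f"{x:.2f}\t{density:.6f}")
```

To overlay the line on a histogram of counts rather than densities, multiply each density by the number of observations times the bin width.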
Unfortunately, the data you provided is not at all normally distributed.
So you can't create a bell curve based on this data, no.

Normalisation or Standardisation for detecting outliers?

When should I use min-max scaling (normalisation) and when should I use standardisation (z-scores) for data pre-processing?
I know that normalisation brings the range of a feature down to [0, 1] and the z-score brings it down to roughly [-3, 3], but I am unsure which of the two techniques to use for detecting outliers in the data.
Let us briefly agree on the terms:
The z-score tells us how many standard deviations a given element of a sample is away from the mean.
Min-max scaling rescales a range of measurements to the interval [0, 1].
By those definitions, z-scores usually span an interval much larger than [-3, 3] if your data follows a long-tailed distribution. On the other hand, plain normalization does indeed limit the range of the possible outcomes, but it will not help you find outliers, since it just bounds the data.
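A small Python sketch of the two transformations, using a made-up sample (not from the question) with one extreme value. It illustrates the point above: min-max scaling maps every value into [0, 1], so the extreme value simply becomes 1, while its z-score (about 5.4 for this sample) falls well outside [-3, 3]. Function and variable names are illustrative.

```python
import statistics

# Made-up sample for illustration: thirty ordinary values and one extreme value.
SAMPLE = [9, 10, 11] * 10 + [60]


def min_max_scale(values):
    """Rescale values linearly to [0, 1] (min-max normalisation)."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def z_scores(values):
    """Number of sample standard deviations each value lies from the mean."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]


if __name__ == "__main__":
    print(max(min_max_scale(SAMPLE)))  # the extreme value is just mapped to 1.0
    print(max(z_scores(SAMPLE)))       # the extreme value lies far from the mean
```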
What you need for outlier detection are thresholds above or below which you consider a data point to be an outlier. Many programming languages offer violin plots or box plots, which nicely show your data distribution. The methods behind these plots implement a common choice of thresholds:
The box [of the box plot] spans the first to third quartiles, and the band inside the box is always the second quartile (the median). But the ends of the whiskers can represent several possible alternative values, among them:
the minimum and maximum of all of the data [...]
one standard deviation above and below the mean of the data
the 9th percentile and the 91st percentile
the 2nd percentile and the 98th percentile.
All data points outside the whiskers of the box plots are plotted as points and considered outliers.
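One of the whisker choices quoted above, the 2nd and 98th percentiles, can be turned into an outlier rule in a few lines of Python. This is a sketch with hypothetical helper names; the percentile uses linear interpolation between sorted values, which is one of several common definitions, so plotting libraries may place the whiskers slightly differently.

```python
def percentile(values, p):
    """p-th percentile (0-100), linearly interpolated between sorted values."""
    s = sorted(values)
    pos = (len(s) - 1) * p / 100.0
    lower = int(pos)
    upper = min(lower + 1, len(s) - 1)
    frac = pos - lower
    return s[lower] + (s[upper] - s[lower]) * frac


def outliers_outside_whiskers(values, low_p=2.0, high_p=98.0):
    """Values below the low_p-th or above the high_p-th percentile.

    Defaults follow the 2nd/98th percentile whiskers; pass low_p=9 and
    high_p=91 for the other percentile-based variant.
    """
    lo = percentile(values, low_p)
    hi = percentile(values, high_p)
    return [v for v in values if v < lo or v > hi]


if __name__ == "__main__":
    print(outliers_outside_whiskers(list(range(1, 101))))
```

By construction, percentile whiskers always leave roughly the chosen fraction of points outside (about 4% for 2/98), so they mark the tails of the sample rather than testing whether those points are anomalous.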

Excel Interpolate with logarithmic prediction

Is there a function within Excel to Interpolate while taking into account a logarithmic prediction?
At the moment I am using linear interpolation but would like to find a better way to fill in the blanks if possible.
There's no built-in logarithmic interpolation function in Excel, even in the Analysis ToolPak. You'll need more advanced software for that, such as MATLAB.
If you're stuck working in Excel... here's a possible mathematical solution:
Rather than working with the raw data x and y, try plotting x against a^y, where a is the base (or plotting log(x, a) against y). If you have the correct base a (and there's no vertical offset), you will then have a linear relationship on which you can perform linear interpolation as normal, then convert the interpolated values back to actual values by taking their base-a logarithm.
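A Python sketch of the transform, interpolate, transform-back procedure just described, assuming the base a is known and the x values are sorted; log_interpolate is a made-up name. In Excel the same steps would be a helper column with a^y, an ordinary linear interpolation, and LOG(value, a).

```python
import bisect
import math


def log_interpolate(xs, ys, x, base):
    """Interpolate y at x, assuming base**y is (close to) linear in x.

    Transforms y -> base**y, interpolates linearly in x, then maps the
    result back with a base-`base` logarithm. xs must be sorted ascending.
    """
    if not xs[0] <= x <= xs[-1]:
        raise ValueError("x is outside the range of the data")
    us = [base ** y for y in ys]  # transformed values, linear in x if the base is right
    i = min(bisect.bisect_right(xs, x) - 1, len(xs) - 2)  # segment containing x
    t = (x - xs[i]) / (xs[i + 1] - xs[i])
    u = us[i] + t * (us[i + 1] - us[i])  # ordinary linear interpolation
    return math.log(u, base)  # back to the original scale


if __name__ == "__main__":
    print(log_interpolate([1, 2, 4, 8], [0, 1, 2, 3], 3, 2))
```

For data that really follows y = log2(x), as in the usage line, the result is about 1.585 (log2 of 3), whereas plain linear interpolation of y between the points (2, 1) and (4, 2) would give 1.5.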
If you don't know a, you can instead calculate a line of best fit for an arbitrary a, calculate the standard residuals, and then use Excel's Solver to adjust a until the residuals are as small as possible, at which point you have the best estimate of a.
Similarly, if there is a vertical offset b, you'll also need to vary b until you get a linear relationship: plot x against a^(y-b).
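The search for a can be sketched the same way. Instead of Excel's Solver, this Python version does a plain grid search over candidate bases, and it maximizes R^2 of the straight-line fit rather than minimizing raw residuals: as a approaches 1, a^y flattens toward 1 and raw residuals shrink whether or not the fit is good, while R^2 does not depend on that scale. The function names and the candidate range are assumptions for illustration.

```python
def r_squared(xs, us):
    """Coefficient of determination of the least-squares line us ~ xs."""
    n = len(xs)
    mx, mu = sum(xs) / n, sum(us) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    suu = sum((u - mu) ** 2 for u in us)
    sxu = sum((x - mx) * (u - mu) for x, u in zip(xs, us))
    return sxu * sxu / (sxx * suu)


def best_base(xs, ys, candidates):
    """Candidate base a for which a**y is most nearly linear in x."""
    return max(candidates, key=lambda a: r_squared(xs, [a ** y for y in ys]))


if __name__ == "__main__":
    candidates = [1.1 + 0.01 * i for i in range(400)]  # 1.10 .. 5.09, arbitrary range
    xs = [1, 2, 4, 8, 16, 32]
    ys = [0, 1, 2, 3, 4, 5]  # exactly log2(x) in this toy example
    print(best_base(xs, ys, candidates))
```

A finer grid around the best candidate (or a proper one-dimensional optimizer) can refine the estimate once the rough location is known.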

Gnuplot - Plot data on another abscissa by interpolation

Good evening,
I have a problem with gnuplot. I've tried to summarize it to make it easier to understand.
What I have: 2 sets of data. The first is my experimental data, about 20 points; the second is my numerical data, about 300 points. But the two sets don't have the same abscissa.
What I want: I want my numerical data to be interpolated onto the experimental x values (abscissa).
I know it is possible to do that with Xmgrace (paragraph Interpolation at http://plasma-gate.weizmann.ac.il/Xmgr/doc/trans.html#interp), but is it possible with gnuplot?
What I want in addition: is it then possible to subtract the experimental y values from the numerical y values at the experimental x points?
Thank you in advance for your answer,
zackalucard
You cannot interpolate the ordinate values of one set to the abscissa values of the other. gnuplot has no mechanism for that.
You can, however, plot both datasets using one of the smoothing algorithms (check "help smooth") with common abscissa values, which may coincide (or be made to coincide) with the original values of one set.
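set table "data1.tmp"
plot dataf1 smooth cspline                  # dataf1: string variable holding the first file name
set xrange [GPVAL_X_MIN:GPVAL_X_MAX]        # fix xrange to the x range of the first plot
set table "data2.tmp"
plot dataf2 smooth cspline                  # same xrange and samples, so the x values coincide
unset table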
Now you have the interpolated data in two temporary files, and only need to combine them into one:
system("paste data1.tmp data2.tmp > correlation.dat")   # unixoid "paste" command
plot "correlation.dat" using 2:5   # y1 vs y2, assuming each table file has columns x y type
(If you have a sensible fit function for each dataset, the whole thing becomes much easier: plot dataf1 using (fit1($1)):(fit2($1)))
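If preprocessing outside gnuplot is acceptable, the interpolation the question asks for (numerical data evaluated at the experimental abscissa, then subtracted) can be done in a few lines. This is a Python sketch using plain linear interpolation, which is a different technique from the cspline smoothing above; the function names are hypothetical, and experimental points outside the numerical data's x range are dropped rather than extrapolated.

```python
import bisect


def interpolate_onto(x_src, y_src, x_new):
    """Linearly interpolate (x_src, y_src) at each point of x_new.

    x_src must be sorted ascending with at least two points; points of x_new
    outside [x_src[0], x_src[-1]] are returned as None (no extrapolation).
    """
    out = []
    for x in x_new:
        if x < x_src[0] or x > x_src[-1]:
            out.append(None)
            continue
        i = min(bisect.bisect_right(x_src, x) - 1, len(x_src) - 2)
        t = (x - x_src[i]) / (x_src[i + 1] - x_src[i])
        out.append(y_src[i] + t * (y_src[i + 1] - y_src[i]))
    return out


def residuals(x_exp, y_exp, x_num, y_num):
    """(x, y_numerical - y_experimental) at each usable experimental abscissa."""
    y_num_at_exp = interpolate_onto(x_num, y_num, x_exp)
    return [(x, yn - ye)
            for x, ye, yn in zip(x_exp, y_exp, y_num_at_exp)
            if yn is not None]


if __name__ == "__main__":
    for x, diff in residuals([0.5, 2.5], [1.0, 4.0], [0, 1, 2, 3, 4], [0, 2, 4, 6, 8]):
        print(x, diff)
```

Writing each (x, difference) pair on its own line gives a two-column file that gnuplot can plot directly with using 1:2.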
You can use smoothing; this should do the trick:
plot "DATA" smooth csplines
(csplines is just one option; there are others, e.g. bezier.)
But I don't think you can automatically determine the intersection of the smoothed curves. You can use the mouse to determine the intersection visually, or alternatively fit some functions f(x) and g(x) to your curves and solve f(x)=g(x) analytically.

Averaging many curves with different x and y values

I have several curves that contain many data points. The x-axis is time and let's say I have n curves with data points corresponding to times on the x-axis.
Is there a way to get an "average" of the n curves, despite the fact that the data points are located at different x-points?
I was thinking maybe something like using a histogram to bin the values, but I am not sure which code to start with that could accomplish something like this.
Can Excel or MATLAB do this?
I would also like to plot the standard deviation of the averaged curve.
One concern is: The distribution amongst the x-values is not uniform. There are many more values closer to t=0, but at t=5 (for example), the frequency of data points is much less.
Another concern. What happens if two values fall within 1 bin? I assume I would need the average of these values before calculating the averaged curve.
I hope this conveys what I would like to do.
Any ideas on what code I could use (MATLAB, Excel, etc.) to accomplish my goal?
Since your series are not uniformly sampled, interpolating onto a common time grid before computing the mean is one way to avoid biasing the result towards times where you have more frequent samples. Note that interpolation will likely reduce the range of your values: interpolated values lie between neighbouring measurements, and the grid points aren't likely to fall exactly at the times of your measured extremes. This has a greater effect on the extreme statistics (e.g. 5th and 95th percentiles) than on the mean. If you plan on going this route, you'll need the interp1 and mean functions.
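A Python analogue of the interp1 + mean route, which also gives the standard deviation asked for in the question. It assumes each curve is a (times, values) pair sorted by time and that you choose the common time grid yourself; grid points outside a curve's time span are skipped for that curve instead of extrapolated. Function names are made up for illustration.

```python
import bisect
import statistics


def interp_at(ts, xs, t):
    """Linearly interpolate the samples (ts, xs) at time t; ts[0] <= t <= ts[-1]."""
    i = min(bisect.bisect_right(ts, t) - 1, len(ts) - 2)
    w = (t - ts[i]) / (ts[i + 1] - ts[i])
    return xs[i] + w * (xs[i + 1] - xs[i])


def mean_and_std(curves, grid):
    """Mean and sample standard deviation of several curves on a shared grid.

    curves: list of (times, values) pairs, each with its own sampling times.
    Returns a list of (t, mean, std, number_of_curves_used); grid points
    covered by fewer than two curves are omitted.
    """
    result = []
    for t in grid:
        vals = [interp_at(ts, xs, t) for ts, xs in curves if ts[0] <= t <= ts[-1]]
        if len(vals) >= 2:
            result.append((t, statistics.mean(vals), statistics.stdev(vals), len(vals)))
    return result


if __name__ == "__main__":
    curve_a = ([0.0, 1.0, 2.0], [0.0, 1.0, 2.0])
    curve_b = ([0.0, 0.5, 2.0], [0.0, 1.0, 4.0])
    for row in mean_and_std([curve_a, curve_b], [0.0, 1.0, 2.0]):
        print(row)
```

The last column of each row shows how many curves contributed at that time, which makes it visible where the average rests on fewer curves.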
An alternative is to do a weighted mean. This way you avoid truncating the range of your measured values. Assuming x is a vector of measured values and t is a vector of measurement times in seconds from some reference time then you can compute the weighted mean by:
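timeStep = diff(t);   % time span each sample is taken to represent
weightedMean = sum(timeStep .* x(1:end-1)) / sum(timeStep);   % time-weighted mean (t and x of same orientation)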
As mentioned in the comments above, a sample of your data would help a lot in suggesting the appropriate method for calculating the "average".
