How to compute shortest distances of points and curve in gnuplot?

Let's say I have a fitted curve in gnuplot (or simply the sin(x) function) and a file with data: points near the function. How can I compute the distance of the points from the curve and write the distances to the data file in gnuplot? Is it possible to easily implement a sum of squares in gnuplot? Thank you very much.

Your question seems to mix two different concepts. If the curve was fitted to the points then the component term in the sum-of-squares uses the difference in y values. I.e. for a point [xi, yi] the term is (func(xi) - yi)**2.
But this is not the same thing as "distance of the point from the curve", since the nearest point on the curve may be at some different x value. The answer to that question in general requires calculus and is not something that gnuplot is designed to help you with, although if you work out the relevant equation you could use gnuplot's "fit" to find the minimum by approximation rather than by solving the differential equation analytically.
To plot the residuals after fitting
Assume data points [xi, yi] in columns 1 and 2 of file "data".
Assume fit(x) is the function you got from fitting. Then you can plot the squared residual for each point:
plot 'data' using 1:( (fit($1)-$2)**2 ) with linespoints
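For comparison, both quantities can be computed outside gnuplot. The following is a minimal Python sketch, not a gnuplot feature: the sample points, the stand-in function f(x) = x, and all helper names are assumptions made for illustration. It computes the vertical residuals and their sum of squares, and approximates the true point-to-curve distance by brute-force sampling of the curve rather than by calculus.

```python
import math

def vertical_residuals(points, func):
    """Difference func(xi) - yi for each point, as used by a least-squares fit."""
    return [func(x) - y for x, y in points]

def sum_of_squares(points, func):
    """Sum of squared vertical residuals."""
    return sum(r * r for r in vertical_residuals(points, func))

def nearest_distance(point, func, x_min, x_max, samples=20001):
    """Approximate the shortest distance from point to the curve y = func(x)
    by evaluating the curve on a dense grid over [x_min, x_max]."""
    px, py = point
    best = math.inf
    for i in range(samples):
        x = x_min + (x_max - x_min) * i / (samples - 1)
        best = min(best, math.hypot(x - px, func(x) - py))
    return best

# Hypothetical data and curve, chosen only to make the difference visible.
f = lambda x: x
data = [(0.0, 1.0), (1.0, 1.0), (2.0, 1.0)]

ss = sum_of_squares(data, f)
d = nearest_distance(data[0], f, -2.0, 2.0)
print(ss, d)
```

For the point (0, 1) the vertical residual has magnitude 1, while the nearest point on the line is at a different x value, so the geometric distance is smaller. The grid search is only as accurate as the sampling density and the chosen x range.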

Related

How to optimize two ranges for the determination of the intersection point between two curves

I start this thread asking for your help in Excel.
The main goal is to determine the coordinates of the intersection point P=(x,y) between two curves (curve A, curve B) modeled by points.
The curves are non-linear, and each defining point is computed from complex equations. The equations depend on many parameters chosen by the user, and the user also chooses the number of points, which determines the accuracy of the curves. In other words, curve A and curve B keep changing in the XY plane (the Z coordinate is always zero) according to the input parameters, and the number of defining points also depends on the user's choice.
My first attempt was to determine the intersection point from the trend equation of each curve (I used the LINEST function to determine the coefficients of the polynomial equation) and then to solve the two equations as a system. The problem is that Excel does not approximate the curves well because they cover too wide a range, so the intersection point (the solution of the system) is very far from the real solution.
So what I want to do is shorten the ranges of points used to find the two trend equations, cutting away the portions of the curves where the intersection cannot lie.
Today, in order to find the solution, I plot the curves in Siemens NX CAD using multi-segment splines of order 3, and then I can easily find the coordinates of the intersection point. Please note that I use multi-segment splines to approximate curve A and curve B more precisely.
Since I want to avoid the CAD tool and stay in Excel, is there a way to select a shorter range of the defining points close to the intersection point, in order to better approximate curve A and curve B with trend equations (LINEST function with 4 points and 3rd-order spline), and then find the solution?
I attach a picture to give you an example of Curve A and Curve B on the plane:
https://postimg.cc/MfnKYqtk
At the following link you can find the Excel file with the coordinate points and the curve plot:
https://www.mediafire.com/file/jqph8jrnin0i7g1/intersection.xlsx/file
I hope to solve this problem with your help, thank you in advance!
kalo86
Your question gave me some days of thinking and research.
With the help of https://pomax.github.io/bezierinfo/
§ 27 - Intersections (Line-line intersections)
and
§ 28 - Curve/curve intersection
your problem can be solved in Excel.
About the mystery of Excel smoothed lines you find details here:
https://blog.splitwise.com/2012/01/31/mystery-solved-the-secret-of-excel-curved-line-interpolation/
The author of this fit is Dr. Brian T. Murphy, PhD, PE from www.xlrotor.com. You find details here:
https://www.xlrotor.com/index.php/our-company/about-dr-murphy
https://www.xlrotor.com/index.php/knowledge-center/files
=>see Smooth_curve_bezier_example_file.xls
https://www.xlrotor.com/smooth_curve_bezier_example_file.zip
Knitting these together gives the following results for the intersection of your given curves:
for the straight line intersection:
(x = -1,02914127711195 / y = 23,2340949174492)
for the smooth line intersection:
(x = -1,02947493047196 / y = 23,2370611219553)
For a full automation of your task you would need to add more details regarding the needed accuracy and what details you need for further processing (and this is actually not the scope of this website ;-).
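The straight-line variant can be sketched outside Excel as follows. This is a rough illustration in Python, assuming each curve is treated as a polyline through its defining points; the function names and the sample coordinates are made up and are not taken from the attached workbook. It tests every pair of segments with the standard line-line intersection formula.

```python
def segment_intersection(p1, p2, p3, p4):
    """Intersection of segment p1-p2 with segment p3-p4, or None.

    Solves p1 + t*(p2 - p1) = p3 + u*(p4 - p3) and accepts the point
    only if both t and u lie in [0, 1].
    """
    (x1, y1), (x2, y2), (x3, y3), (x4, y4) = p1, p2, p3, p4
    denom = (x2 - x1) * (y4 - y3) - (y2 - y1) * (x4 - x3)
    if denom == 0:
        return None  # parallel or collinear segments are skipped here
    t = ((x3 - x1) * (y4 - y3) - (y3 - y1) * (x4 - x3)) / denom
    u = ((x3 - x1) * (y2 - y1) - (y3 - y1) * (x2 - x1)) / denom
    if 0.0 <= t <= 1.0 and 0.0 <= u <= 1.0:
        return (x1 + t * (x2 - x1), y1 + t * (y2 - y1))
    return None

def polyline_intersections(curve_a, curve_b):
    """All intersection points between two polylines given as point lists."""
    hits = []
    for a0, a1 in zip(curve_a, curve_a[1:]):
        for b0, b1 in zip(curve_b, curve_b[1:]):
            p = segment_intersection(a0, a1, b0, b1)
            if p is not None:
                hits.append(p)
    return hits

# Hypothetical curves, not the data from the question.
curve_a = [(0.0, 0.0), (1.0, 1.0), (3.0, 3.0)]
curve_b = [(0.0, 4.0), (1.0, 3.0), (4.0, 0.0)]
hits = polyline_intersections(curve_a, curve_b)
print(hits)
```

An intersection that falls exactly on a shared vertex can be reported twice by this simple loop, and the smoothed-line result would need the Bézier segments instead of straight segments.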
[Images not reproduced: intersection of the straight lines, intersection of the smoothed lines, comparison charts]
solution,
Thank you very much for the answer; you hit my goal exactly.
Your solution (for the smoothed lines) is very close to what I determine in Siemens NX.
I'm going to read the documentation at the provided link https://pomax.github.io/bezierinfo/ in order to better understand the math behind this topic.
So, to summarize my request: you were able to find the coordinates (x,y) of the intersection point between two curves with very good precision, without going through an advanced CAD system.
I am starting to study now, best regards!
kalo86

Fit log-log data with gnuplot

I am trying to fit this plot; as you can see, the fit is not very good for the data.
My code is:
clear
reset
set terminal pngcairo size 1000,600 enhanced font 'Verdana,10'
set output 'LocalEnergyStepZoom.png'
set ylabel '{/Symbol D}H/H_0'
set xlabel 'n_{step}'
set format y '%.2e'
set xrange [*:*]
set yrange [1e-16:*]
f(x) = a*x**b
fit f(x) "revErrEnergyGfortCaotic.txt" via a,b
set logscale
plot 'revErrEnergyGfortCaotic.txt' w p,\
'revErrEnergyGfortRegular.txt' w p,\
f(x) w l lc rgb "black" lw 3
exit
So the question is: what mistake am I making here? I would expect a fit of the form I put in the code to represent the data very well in a log-log plane.
Thanks a lot
In the end I was able to solve the problem using the suggestion in Christop's answer, modified just a bit.
I found the approximate slope of the function (something near -4). Keeping this parameter fixed, I fitted the curve with only a; then I fixed a and fitted only b. After that, using the output as the starting solution, I ran the fit over both parameters and found the best fit.
You must find appropriate starting values to get a correct fit, because that kind of fitting doesn't have one global solution.
If you don't define a and b, both are set to 1 which might be too far away. Try using
a = 100
b = -3
for a better start. Maybe you need to tweak those values a bit more; I couldn't because I don't have the data file.
Also, you might want to restrict the region of the fitting to the part above 10:
fit [10:] f(x) "revErrEnergyGfortCaotic.txt" via a,b
Only if that is appropriate, of course.
This is a common issue in data analysis, and I'm not certain if there's a nice Gnuplot way to solve it.
The issue is that the penalty functions in standard fitting routines are typically the sum of squares of errors, and try as you might, if your data have a lot of dynamic range, the errors for the smallest y-values come out to essentially zero from the point of view of the algorithm.
I recently taught a course to students where they needed to fit such data. Lots of them beat their (matlab) fitting routines into submission by choosing very stringent convergence criteria, but even this did not help too much.
What you really need to do, if you want to fit this power-law tail well, is to convert the data into log-log form and run a linear regression on that log-log representation.
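As a rough illustration of that log-log regression, here is a small Python sketch using only the standard library. The function name and the synthetic data (an exact power law with a = 100 and b = -4) are assumptions for the example, not the poster's data. Taking logs turns y = a*x**b into log(y) = log(a) + b*log(x), which is a straight line that ordinary least squares can fit.

```python
import math

def fit_power_law(xs, ys):
    """Fit y = a * x**b by ordinary least squares on (log x, log y).

    Requires all x and y values to be strictly positive.
    """
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mean_x = sum(lx) / n
    mean_y = sum(ly) / n
    sxx = sum((u - mean_x) ** 2 for u in lx)
    sxy = sum((u - mean_x) * (v - mean_y) for u, v in zip(lx, ly))
    b = sxy / sxx                       # slope in log-log space
    a = math.exp(mean_y - b * mean_x)   # intercept transformed back
    return a, b

# Synthetic, noise-free data spanning many orders of magnitude in y.
xs = [1.0, 10.0, 100.0, 1000.0, 10000.0]
ys = [100.0 * x ** -4 for x in xs]

a, b = fit_power_law(xs, ys)
print(a, b)
```

Because every point contributes on a logarithmic scale, the small y values in the tail count as much as the large ones, which is the effect the plain sum-of-squares fit lacks.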
The main problem here is that the residual errors of the function values of the higher x are very small compared to the residuals at lower x values. After all, you almost span 20 orders of magnitude on the y axis.
Just weight the y values with 1/y**2, or even better: if you have the standard deviations of your data points weight the values with 1/std**2. Then the fit should converge much much better.
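In gnuplot, weighting is done using a third data column, which fit interprets as the standard deviation s of each point and weights by 1/s**2. So for a weight of 1/y**2, pass y itself as the third column:
fit f(x) 'data' using 1:2:($2) via ...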
Or you can use Raman Shah's advice and linearize the y axis and do a linear regression.
you need to use weights for your fit (currently low values are not considered as important) and have a better starting guess (via "pars_file.pars")

Colors: CIE XYZ model - Chromaticity graph

I want to draw a section graph for XYZ CIE color model, like this one:
Do you have any idea how to do it?
Very briefly...
You can plot the spectral line (the horseshoe) by plotting the xy (I have XY not xy) data for the standard observer. Then you can find the polygon you need to fill by applying a convex hull algorithm to the points. Make a list of xy values you want to paint within the polygon. Find the z value for a fixed luminance by z = 1 - x - y. Convert to RGB - you will need a function called something like XYZtoRGB (there is a python module, or use the transform on wikipedia). You may want to increase the luminance by multiplying all the numbers by a constant or something first. Set the pixels at the xy locations to the RGB values. Plot along with the convex hull and/or the spectral line you calculated.
I have the data for the standard 2deg (I think) observer (I can't find a link) - you will need to divide by X+Y+Z to convert from XYZ to xyz. Send me a message if you want me to send them to you, there is too much data to post here.
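The conversion steps can be sketched in Python roughly as follows. This is only an illustration: the function names are made up, the matrix is the commonly quoted XYZ to linear sRGB (D65) matrix, and gamma encoding is left out, so the values are linear RGB.

```python
def XYZ_to_xy(X, Y, Z):
    """Chromaticity coordinates: divide by X + Y + Z."""
    s = X + Y + Z
    return X / s, Y / s

def xyY_to_XYZ(x, y, Y=1.0):
    """Rebuild XYZ for a fixed luminance Y, using z = 1 - x - y."""
    z = 1.0 - x - y
    return x * Y / y, Y, z * Y / y

# Commonly quoted XYZ -> linear sRGB matrix for a D65 white point.
XYZ_TO_SRGB = (
    (3.2406, -1.5372, -0.4986),
    (-0.9689, 1.8758, 0.0415),
    (0.0557, -0.2040, 1.0570),
)

def XYZ_to_linear_srgb(X, Y, Z):
    """Linear (not gamma-encoded) sRGB components; may fall outside [0, 1]."""
    return tuple(r[0] * X + r[1] * Y + r[2] * Z for r in XYZ_TO_SRGB)

def clip01(rgb):
    """Clamp out-of-gamut components so the pixel can be displayed."""
    return tuple(min(1.0, max(0.0, c)) for c in rgb)

# The D65 white point (x, y) = (0.3127, 0.3290) should map close to white.
white = XYZ_to_linear_srgb(*xyY_to_XYZ(0.3127, 0.3290))
print(clip01(white))
```

Most chromaticities inside the horseshoe lie outside the sRGB gamut, so some clamping or rescaling step like clip01 is needed before setting pixels.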
The colour Python module has a plotting submodule where this kind of plot is one of the provided plots. See documentation for plot_chromaticity_diagram_CIE1931 and plot_sds_in_chromaticity_diagram_CIE1931
It uses Matplotlib under the hood.

Drawing a straight line averaging a curve

I would like to draw a straight line that represents the average of a curve. I am plotting my data like this:
plot 'dataset' u 2:4 w p smooth bezier
My data consists of multiple columns and I get something like this:
Any ideas of how to do it? I guess it is more an interpolation than an average. The ups and downs of the curve are not relevant, and it would be much better to have a straight line interpolating the curve...
Fitting a straight line could be more or less easy using fit; however, how could I fit a curve that does not look like a well-known curve? Let me show you an example. How could I fit a smooth curve through the main group of points? Please note that there is some noise in the lower part of the graph that I would not like to represent.
If you want to do some basic statistics on your data, gnuplot has a builtin command stats which may do what you want. Gnuplot offers some internal variables after plotting that contain data about min, max, etc. To see what these are, type show variables all after plotting your data.
Otherwise if you want to fit your data to a line, gnuplot does that as well:
f(x) = a*x + b
fit f(x) 'data.dat' using 2:4 via a,b
plot 'data.dat' using 2:4, f(x)

Bézier curve compute point from one axis

I have a cubic Bézier curve, but I have a problem when I need only one point. I have only the value on the X-axis and want to find the Y value that corresponds to that point, or find the t step, from which I can easily calculate the Y value.
Any clue how to do it? Or is there any formula to do this?
Any solution will have to deal with the fact that there may be multiple solutions if the curve is not X monotone. Consider the cubic bezier (0,0),(2,0),(-1,1),(1,1):
As you can see, there are 3 parameter values (and Y coordinates) at which X==1/2.
This means that if you use subdivision (which is probably your simplest solution), then you need to be careful that your initial bounding t values only surround the point you want.
You can also guess what this implies about the order of an algebraic solution.
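A minimal Python sketch of this idea, assuming a plain scan-and-bisect search rather than true curve subdivision (the function names and the sample count are illustrative choices): it scans t over [0, 1] for sign changes of x(t) - x_target, refines each bracket by bisection, and then evaluates y(t) at every root found.

```python
def bezier(p0, p1, p2, p3, t):
    """Evaluate one coordinate of a cubic Bezier curve at parameter t."""
    u = 1.0 - t
    return u * u * u * p0 + 3 * u * u * t * p1 + 3 * u * t * t * p2 + t * t * t * p3

def solve_t_for_x(xs, x_target, samples=1000, tol=1e-12):
    """Return every t in [0, 1] where the x coordinate equals x_target.

    Roots closer together than 1/samples can be missed by the scan.
    """
    f = lambda t: bezier(*xs, t) - x_target
    roots = []
    for i in range(samples):
        a, b = i / samples, (i + 1) / samples
        fa, fb = f(a), f(b)
        if fa == 0.0:
            roots.append(a)              # grid point is an exact root
        elif fa * fb < 0.0:
            while b - a > tol:           # bisection inside the bracket
                m = 0.5 * (a + b)
                if fa * f(m) <= 0.0:
                    b = m
                else:
                    a, fa = m, f(m)
            roots.append(0.5 * (a + b))
    if f(1.0) == 0.0:
        roots.append(1.0)
    return roots

# Control points of the example curve (0,0),(2,0),(-1,1),(1,1).
xs = (0.0, 2.0, -1.0, 1.0)
ys = (0.0, 0.0, 1.0, 1.0)

ts = solve_t_for_x(xs, 0.5)
points = [(t, bezier(*ys, t)) for t in ts]
print(points)
```

For an X-monotone curve the list has at most one entry; for a curve like this one, the caller has to decide which of the returned t values is the one wanted.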
A parametric curve extends to any dimension by adding coefficients for those dimensions. Are you sure you've got things straight? It seems like you are using the x-axis as the curve parameter t. The t parameter controls the computations of X- and Y-coordinates by having two cubic equations. Take a look at Wikipedia which provides some pretty neat explanations for the 2D case.
Edit:
Solve as a general third-degree polynomial. Beware that it might have 3 solutions, though.
