How to interpolate data with Gnuplot for further calculations - gnuplot

I am (somewhat) familiar with the smoothing/interpolation techniques in Gnuplot. It seems to me that these interpolations only work for plotting the interpolated values. However, I need the interpolated values for further calculations.
A simple example may illustrate this:
Let’s say we are selling a specific item on four days and have the number of sales stored in input_numbers.dat:
# days | number_of_sold_items
1 4
2 70
3 80
4 1
Now, I want to plot my income for each day. But the relation between the price per item and the number of sold items is not a simple linear one; it is something complicated that is only known for a few examples, stored in input_price.dat:
# number_of_sold_items | price_per_item
1 5.00
3 4.10
10 3.80
100 3.00
How can I do something like this (pseudocode):
make INTERPOLATED_PRICE(x) using "input_price.dat"
plot "input_numbers.dat" using 1:($2*INTERPOLATED_PRICE($2))
I can do it by fitting, but that is not what I want: the relation in the data is too complicated.
P.S.: I know that the price per item vs the number of items in such an example is more like a step-like function and not smooth. This is just an example for some interpolation in general.

It’s hard to prove the non-existence of something, but I am pretty confident that this cannot be done with Gnuplot alone, because:
I am under the illusion of being sufficiently familiar with Gnuplot that I would know about it if it existed.
I cannot find anything about such a feature.
It would go completely against Gnuplot’s paradigm of being a single-purpose tool for plotting (fitting is already borderline) that does not feature data processing.

Gnuplot can do something like this:
text = "%f*x + %f"
a = 2
b = 10
eval("f(x) = ".sprintf(text,a,b))
set grid x y
plot f(x)
which basically means that complicated functions can be defined dynamically: The sprintf command converts the text "%f*x + %f" into "2.000000*x + 10.000000", the dot operator . concatenates the strings "f(x) = " and "2.000000*x + 10.000000", and the eval command defines the function f(x) = 2.000000*x + 10.000000. The result can be plotted and gives the expected diagram:
This behavior can be used for creating a piecewise interpolation function as follows:
ip_file = "input_price.dat"
stats ip_file nooutput
n = STATS_records - 1
xmin = STATS_min_x
xmax = STATS_max_x
ip_f = sprintf("x < %f ? NaN : ", xmin)
f(x) = a*x + b # Make a linear interpolation from point to point.
do for [i=0:n-1] {
set xrange [xmin:xmax]
stats ip_file every ::i::(i+1) nooutput
xmintemp = STATS_min_x
xmaxtemp = STATS_max_x
set xrange [xmintemp:xmaxtemp]
a = 1
b = 1
fit f(x) ip_file every ::i::(i+1) via a, b
ip_f = ip_f.sprintf("x < %f ? %f * x + %f : ", xmaxtemp, a, b)
}
ip_f = ip_f."NaN"
print ip_f # The analytical form of the interpolation function.
eval("ip(x) = ".ip_f)
set samples 1000
#set xrange [xmin:xmax]
#plot ip(x) # Plot the interpolation function.
unset xrange
plot "input_numbers.dat" using 1:($2*ip($2)) w lp
The every in combination with stats and fit limits the range to two successive datapoints, see help stats and help every. The ternary operator ?: defines the interpolation function section by section, see help ternary.
This is the resulting analytical form of the interpolation function (after some formatting):
x < 1.000000 ? NaN
: x < 3.000000 ? -0.450000 * x + 5.450000
: x < 10.000000 ? -0.042857 * x + 4.228571
: x < 100.000000 ? -0.008889 * x + 3.888889
: NaN
This is the resulting interpolation function (plotted by plot ip(x)):
This is the resulting plot using the interpolation function in another calculation (plot "input_numbers.dat" using 1:($2*ip($2))):
I don't know the limits on how many ternary operators can be nested, or on how long a string or a function definition can be.
Tested with Gnuplot 5.0 on Debian Jessie.
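As a cross-check outside gnuplot, the same point-to-point linear interpolation can be sketched in Python. This is only an illustration under assumptions: the tables are hard-coded from input_price.dat and input_numbers.dat, the function name interpolated_price is made up for this example, and unlike the gnuplot version above it accepts the right endpoint x = 100 instead of returning NaN there.

```python
from bisect import bisect_right
import math

# Price table copied from input_price.dat: (number_of_sold_items, price_per_item)
PRICE = [(1, 5.00), (3, 4.10), (10, 3.80), (100, 3.00)]
# Sales copied from input_numbers.dat: (day, number_of_sold_items)
SALES = [(1, 4), (2, 70), (3, 80), (4, 1)]


def interpolated_price(x, table=PRICE):
    """Piecewise linear interpolation; NaN outside the tabulated range."""
    xs = [p[0] for p in table]
    if x < xs[0] or x > xs[-1]:
        return math.nan
    # Index of the segment [xs[i], xs[i+1]] that contains x.
    i = min(bisect_right(xs, x) - 1, len(xs) - 2)
    (x0, y0), (x1, y1) = table[i], table[i + 1]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


# Income per day = sold items * interpolated price per item.
income = [(day, n * interpolated_price(n)) for day, n in SALES]
for day, value in income:
    print(day, round(value, 4))
```

The printed incomes should agree with the values gnuplot plots via $2*ip($2), up to the rounding of the fitted coefficients.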

Linear interpolation is not available, but how about this:
set xr [0:10]
set sample 21
# define an inline example dataset
$dat << EOD
0 1
2 2
4 4
6 5
8 4
10 3
EOD
# plot interpolated data to another inline dataset
set table $interp
plot $dat us 1:2 with table smooth cspline
unset table
plot $dat w lp, $interp w lp

As I understand your question, you are not looking for interpolation but for a lookup-table, i.e. depending on the number of sold items you have a different price.
What you can do with gnuplot is:
(mis)using stats to create a lookup-string (check help stats)
(mis)using sum to create a lookup-function (check help sum)
Comment: I assume it makes a difference whether you sell, for example, 1 item three times on a single day or 3 items at once on a single day, because the prices are graduated.
So, I would suggest a different input data format, i.e. one with a date.
(This is not yet implemented in the example below, but it can be done; you could then make use of the smooth frequency option.) The data format could look like this, for example:
# date sold_items
2022-09-01 1
2022-09-01 1
2022-09-01 1
2022-09-02 3
Script: (works with gnuplot 5.0.0, Jan. 2015)
### implement lookup table
reset session
$SALES <<EOD
# days | number_of_sold_items
1 4
2 70
3 80
4 1
EOD
$PRICE <<EOD
# number_of_sold_items | price_per_item
1 5.00
3 4.10
10 3.80
100 3.00
EOD
LookupStr = ''
stats $PRICE u (LookupStr=LookupStr.sprintf(" %g %g",$1,$2)) nooutput
Lookup(v) = (p0=NaN, sum [i=1:words(LookupStr)/2] (v>=real(word(LookupStr,i*2-1)) ? \
p0=real(word(LookupStr,i*2)) : 0), p0)
set grid x,y
set key noautotitle
set multiplot
plot $SALES u 1:2 w lp pt 6 lc "dark-grey" ti "sold items", \
'' u 1:($2*Lookup($2)) w lp pt 7 lc "red" ti "total income"
# price table as graph inset
set origin x0=0.41, y0=0.42
set size sx=0.30, sy=0.28
set obj 1 rect from screen x0,y0 to screen x0+sx,y0+sy fs solid noborder lc "white" behind
set margins 0,0,0,0
set xrange [:150]
set yrange [2.5:5.5]
set xlabel "pieces" offset 0,0.5
set ylabel "price / piece"
set logscale x
plot $PRICE u 1:2 w steps lc "blue", \
'' u 1:2 w p pt 7 lc "blue"
unset multiplot
### end of script
Result:
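For comparison, here is a minimal Python sketch of the same step-wise lookup, assuming the price table is sorted by ascending number of items as in $PRICE. The names THRESHOLDS, PRICES and lookup are made up for this illustration; the logic mirrors Lookup(v) above, i.e. the price of the last row whose threshold is less than or equal to v.

```python
from bisect import bisect_right
import math

# Thresholds and prices copied from the $PRICE datablock above.
THRESHOLDS = [1, 3, 10, 100]
PRICES = [5.00, 4.10, 3.80, 3.00]


def lookup(v):
    """Price of the last table row whose threshold is <= v (NaN below the first row)."""
    i = bisect_right(THRESHOLDS, v) - 1
    return PRICES[i] if i >= 0 else math.nan


# Total income per day for the $SALES datablock: (day, number_of_sold_items)
sales = [(1, 4), (2, 70), (3, 80), (4, 1)]
income = {day: n * lookup(n) for day, n in sales}
print(income)
```

Note that this gives a different income than a linear interpolation would, because the price stays constant between two table rows.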

Related

Is there any way to visualize the field on adaptive mesh with gnuplot?

I am a beginner in gnuplot. Recently I tried to visualize a pressure field on an adaptive mesh.
First, I obtained the coordinates of the nodes and of the cell centers, and the pressure value at each cell center.
Then I ran into a difficulty: the coordinates in the x and y directions are not regular, which makes it hard to prepare the source data in a suitable format. For a regular grid of equal rectangles, I can simply use an x-y-z format. But is there a working approach for an adaptive mesh?
I understand that you have some x,y,z data which is not on a regular grid (well, your adaptive mesh).
I'm not fully sure whether this is what you are looking for, but
gnuplot can grid the data for you, i.e. inter-/extrapolate your data onto a regular grid and then plot it.
Check help dgrid3d.
Code:
### grid data
reset session
# create some test data
set print $Data
do for [i=1:200] {
x = rand(0)*100-50
y = rand(0)*100-50
z = sin(x/15)*sin(y/15)
print sprintf("%g %g %g",x,y,z)
}
set print
set view equal xyz
set view map
set multiplot layout 1,2
set title "Original data with no regular grid"
unset dgrid3d
splot $Data u 1:2:3 w p pt 7 lc palette notitle
set title "Gridded data"
set dgrid3d 100,100 qnorm 2
splot $Data u 1:2:3 w pm3d
unset multiplot
### end of code
Result:
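To illustrate what gridding scattered data means, here is a small Python sketch of plain inverse-distance weighting. It is an assumption-laden illustration, not gnuplot's actual qnorm implementation: the function names idw and grid and the three sample points are made up, and gnuplot's exact weighting formula may differ.

```python
import math


def idw(points, gx, gy, power=2):
    """Inverse-distance-weighted average of scattered (x, y, z) points at (gx, gy)."""
    num = 0.0
    den = 0.0
    for x, y, z in points:
        d = math.hypot(x - gx, y - gy)
        if d == 0.0:
            return z  # grid node coincides with a data point
        w = 1.0 / d ** power
        num += w * z
        den += w
    return num / den


def grid(points, xs, ys, power=2):
    """Evaluate idw() on the regular grid xs x ys; one row of z values per y."""
    return [[idw(points, gx, gy, power) for gx in xs] for gy in ys]


# Tiny hand-made example (hypothetical values, not from the question).
scattered = [(0.0, 0.0, 1.0), (2.0, 0.0, 3.0), (0.3, 1.7, 2.0)]
z = grid(scattered, xs=[0.0, 1.0, 2.0], ys=[0.0, 1.0, 2.0])
for row in z:
    print(" ".join(f"{v:.3f}" for v in row))
```

Every grid value is a weighted average of the input z values, so the gridded field never leaves the range of the original data.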
If you have the size of each cell, you can use the "boxxyerror" plotting style. Let xdelta and ydelta be half the size of a cell along the x-axis and y-axis.
Script:
$datablock <<EOD
# x y xdelta ydelta pressure
1 1 1 1 0
3 1 1 1 1
1 3 1 1 1
3 3 1 1 3
2 6 2 2 4
6 2 2 2 4
6 6 2 2 5
4 12 4 4 6
12 4 4 4 6
12 12 4 4 7
EOD
set xrange [-2:18]
set yrange [-2:18]
set palette maxcolors 14
set style fill solid 1 border lc black
plot $datablock using 1:2:3:4:5 with boxxyerror fc palette title "mesh", \
$datablock using 1:2 with points pt 7 lc rgb "gray30" title "point"
pause -1
In this script, 5-column data (x, y, xdelta, ydelta, pressure) is given for "boxxyerror" plot. To colorize the cells, the option "fc palette" is required.
Result:
I hope this figure is what you are looking for.
Thanks.
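If the mesh is available as cell corner coordinates rather than centers and half-sizes, the 5-column rows can be derived as in the following Python sketch. The input layout (xlo, xhi, ylo, yhi, pressure) and the helper name cell_to_row are assumptions for illustration; the sample cells are chosen so that they reproduce four rows of the datablock above.

```python
def cell_to_row(xlo, xhi, ylo, yhi, pressure):
    """Convert one rectangular cell given by its corner coordinates into
    the (x, y, xdelta, ydelta, pressure) row used by the script above."""
    x = 0.5 * (xlo + xhi)       # cell center
    y = 0.5 * (ylo + yhi)
    xdelta = 0.5 * (xhi - xlo)  # half the cell size
    ydelta = 0.5 * (yhi - ylo)
    return (x, y, xdelta, ydelta, pressure)


# Hypothetical cells: (xlo, xhi, ylo, yhi, pressure)
cells = [(0, 2, 0, 2, 0), (2, 4, 0, 2, 1), (4, 8, 0, 4, 4), (8, 16, 8, 16, 7)]
rows = [cell_to_row(*c) for c in cells]
for r in rows:
    print(" ".join(f"{v:g}" for v in r))
```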

How to remove line between "jumping" values, in gnuplot?

I would like to draw a line through data points that contain "jumping" values.
Here is an example: when we have data points of sin(x) for several cycles and plot them, an unrealistic line appears that goes across from right to left (as shown in the following figure).
One idea to avoid this might be using with linespoints (link), but I want to draw it without revising the original data file.
Is there a simple and robust solution for this problem?
Assuming that you are plotting a function, that is, for each x value there exists one and only one corresponding y value, the easiest way to achieve what you want is to use the smooth unique option. This smoothing routine will make the data monotonic in x, then plot it. When several y values exist for the same x value, the average will be used.
Example:
Data file:
0.5 0.5
1.0 1.5
1.5 0.5
0.5 0.5
Plotting without smoothing:
set xrange [0:2]
set yrange [0:2]
plot "data" w l
With smoothing:
plot "data" smooth unique
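What smooth unique does to the data can be sketched in a few lines of Python: y values that share the same x are averaged, and the result is sorted by x. The function name smooth_unique is simply borrowed from the gnuplot option for this illustration; it is not gnuplot's implementation.

```python
from collections import defaultdict


def smooth_unique(points):
    """Average y over identical x values and return the points sorted by x."""
    groups = defaultdict(list)
    for x, y in points:
        groups[x].append(y)
    return [(x, sum(ys) / len(ys)) for x, ys in sorted(groups.items())]


# The example data file from above.
data = [(0.5, 0.5), (1.0, 1.5), (1.5, 0.5), (0.5, 0.5)]
print(smooth_unique(data))
```

The repeated point at x = 0.5 collapses into a single point, which is why the line back to the start disappears.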
Edit: points are lost if this solution is used, so I suggest an improvement to my answer.
"Conditional plotting" can be applied here. Suppose we have a file like this:
1 2
2 5
3 3
1 2
2 5
3 3
i.e. there is a line going back between the 3rd and 4th points.
plot "tmp.dat" u 1:2
Find minimum x value:
stats "tmp.dat" u 1:2
prev=STATS_min_x
Or find first x value:
prev=system("awk 'FNR == 1 {print $1}' tmp.dat")
Plot the line if current x value is greater than previous, or don't plot if it's less:
plot "tmp.dat" u ($0==0? prev:($1>prev? $1:1/0), prev=$1):2 w l
OK, it's not impossible, but the following is a ghastly hack. I really advise you add an empty line in your dataset at the breaks.
$dat << EOD
1 1
2 2
3 3
1 5
2 6
3 7
1 8
2 9
3 10
EOD
plot for [i=0:3] $dat us \
($0==0?j=0:j=j,llx=lx,lx=$1,llx>lx?j=j+1:j=j,i==j?$1:NaN):2 w lp notit
This plots your dataset three times (actually four; there is a small error in there, I guess I have to initialise all variables), counts how often the abscissa values "jump", and only plots datapoints if this counter j is equal to the plot counter i.
Check the help on the serial evaluation operator "a, b" and the ternary operator "a?b:c"
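If preprocessing the data is an option after all, the advice above (an empty line at each break) can be automated. The following Python sketch splits the points wherever x jumps back and joins the pieces with empty lines; the helper names split_at_jumps and to_gnuplot_text are made up for this example.

```python
def split_at_jumps(points):
    """Split (x, y) points into segments wherever x decreases."""
    segments = []
    for x, y in points:
        if not segments or x < segments[-1][-1][0]:
            segments.append([])  # x jumped back: start a new segment
        segments[-1].append((x, y))
    return segments


def to_gnuplot_text(segments):
    """Join segments with empty lines, which gnuplot treats as line breaks."""
    return "\n\n".join(
        "\n".join(f"{x:g} {y:g}" for x, y in seg) for seg in segments
    )


# The $dat datablock from above.
dat = [(1, 1), (2, 2), (3, 3), (1, 5), (2, 6), (3, 7), (1, 8), (2, 9), (3, 10)]
segments = split_at_jumps(dat)
print(to_gnuplot_text(segments))
```

Writing the printed text to a file and plotting it with lines should then give three separate curves without connecting lines.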
If you have data in a repetitive x-range where the corresponding y-values do not change, then @Miguel's smooth unique solution is certainly the easiest.
In a more general case, what if the x-range is repetitive but y-values are changing, e.g. like a noisy sin(x)?
Then compare two consecutive x-values x0 and x1, if x0>x1 then you have a "jump" and make the linecolor fully transparent, i.e. invisible, e.g. 0xff123456 (scheme 0xaarrggbb, check help colorspec). The same "trick" can be used when you want to interrupt a dataline which has a certain forward "jump" (see https://stackoverflow.com/a/72535613/7295599).
Minimal solution:
plot x1=NaN $Data u 1:2:(x0=x1,x1=$1,x0>x1?0xff123456:0x0000ff) w l lc rgb var
Script:
### plot "folded" data without connecting lines
reset session
# create some test data
set table $Data
plot [0:2*pi] for [i=1:4] '+' u 1:(sin(x)+rand(0)*0.5) w table
unset table
set xrange[0:2*pi]
set key noautotitle
set multiplot layout 1,2
plot $Data u 1:2 w l lc "red" ti "data as is"
plot x1=NaN $Data u 1:2:(x0=x1,x1=$1,x0>x1?0xff123456:0x0000ff) \
w l lc rgb var ti "\n\n\"Jumps\" removed\nwithout changing\ninput data"
unset multiplot
### end of script
Result:

Gnuplot Data and Parametric

I've been trying to plot some data with Gnuplot and superimpose a function on it; however, I have found no information after a couple of hours of research.
Say you have typical datapoints:
x y
0 1
1 5
2 6
3 6
...
and now you want to superimpose an extrapolation taking y down to 0 for some gradient (e.g. a gradient of 1 over 3 x steps, then your straight line function is (0 - 6)/(3)*x + 6 = -2*x + 6)
I am under the assumption you can just plot this via a parametric function,
eg:
y(x) = 2*x + 6
plot 'datafile.dat' using 1:2, \
y(x) [3:6]
Is this the right approach? I've tried but it doesn't plot correctly.
Also my data is set xdata time and I'm doing a multiplot, (however I only need this on one of them) which complicates things. (I've also tried it with set parametric)
Any ideas?
You can generate a piecewise function like this:
y(x) = (x>3 && x<6) ? 2*x + 6 : 0/1
plot [0:6] 'datafile.dat' using 1:2, y(x)
but it will draw a vertical line at x=3.
The best approach may be to store the function in a file:
set table 'func.dat'
plot [3:6] y(x)
unset table
and then plot both data and function:
plot [0:6] 'datafile.dat' using 1:2, 'func.dat' w l
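A note on the line itself: a straight line that starts at the last data point (3, 6) and falls to 0 over 3 x steps has slope (0 - 6)/3 = -2 and the form y = 6 - 2*(x - 3) = -2*x + 12, so it reaches 0 at x = 6. The short Python check below is only a sketch; the helper name line_through is made up for illustration.

```python
def line_through(x0, y0, slope):
    """Return (slope, intercept) of the line through (x0, y0) with the given slope."""
    return slope, y0 - slope * x0


# Last data point (3, 6); drop to y = 0 over 3 x steps, i.e. slope (0 - 6) / 3.
slope, intercept = line_through(3, 6, (0 - 6) / 3)


def y(x):
    """Evaluate the extrapolation line."""
    return slope * x + intercept


print(f"y(x) = {slope:g}*x + {intercept:g}")
print(y(3), y(6))  # value at the last data point and at the end of the ramp
```

The resulting slope and intercept can then be used in the gnuplot definition of y(x) in place of the values above.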

Is it possible to plot a normal probability distribution in gnuplot?

My data file looks like this:
2 3 4 1 5 2 0 3 4 5 3 2 0 3 4 0 5 4 3 2 3 4 4 0 5 3 2 3 4 5 1 3 4
I need to plot the normal PDF in gnuplot.
I could do it by calculating f(x)
f(x) = \frac{1}{\sqrt{2\pi\sigma^2} } e^{ -\frac{(x-\mu)^2}{2\sigma^2} }
for each x using a shell script.
Then I plot it in gnuplot using the command:
plot 'ifile.txt' using 1:2 with lines
But is it possible to do this directly in gnuplot?
gnuplot provides a number of processing options under the smooth keyword (try typing help smooth for more info). For your specific case, I would recommend a fit though.
First, note that your data points are in a row, you need to convert it to columns for gnuplot to use it. You can do it with awk:
awk '{for (i=1;i<=NF;i++) print $i}' datafile
which can be invoked from within gnuplot:
plot "< awk '{for (i=1;i<=NF;i++) print $i}' datafile" ...
Now assume that datafile has the right format for simplicity.
You can use the smooth frequency option to see how many occurrences of each value you have:
plot "datafile" u 1:(1.) smooth frequency w lp pt 7
To get the normalized distribution, you divide by the number of values. This can be done automatically within gnuplot with stats:
stats "datafile"
This will store the number of values in the variable STATS_records, which in your case has the value 33:
gnuplot> print STATS_records
33.0
So the normalized distribution (the probability of getting a value at x) is:
plot "datafile" u 1:(1./STATS_records) smooth frequency w lp pt 7
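To see which numbers smooth frequency produces here, the tally can be reproduced in a few lines of Python. This is just a cross-check sketch with the data from the question hard-coded; the variable names are made up.

```python
from collections import Counter

# The values from the question's data file, one sample per entry.
data = [int(v) for v in
        "2 3 4 1 5 2 0 3 4 5 3 2 0 3 4 0 5 4 3 2 3 4 4 0 5 3 2 3 4 5 1 3 4".split()]

n = len(data)           # corresponds to STATS_records
counts = Counter(data)  # what smooth frequency sums up with a weight of 1
prob = {x: counts[x] / n for x in sorted(counts)}  # weight 1/n per sample

for x, p in prob.items():
    print(x, counts[x], round(p, 4))
```

The probabilities sum to 1, which is what dividing by STATS_records achieves in the gnuplot command above.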
As you can see, your distribution doesn't really look like a normal distribution, but anyway, let's go on. Create a Gaussian for fitting and fit to your data, and plot it. You need to fit to the probability, rather than to the data itself. To do so, we plot to a table to extract the data generated by smooth frequency:
# Non-normalized Gaussian
f(x)= A * exp(-(x-x0)**2/2./sigma**2)
# Save probability data to table
set table "probability"
plot "datafile" u 1:(1./STATS_records) smooth frequency not
unset table
# Fit the Gaussian to the data, exclude points from table with grep
fit f(x) "< grep -v 'u' probability" via x0, sigma, A
# Normalize the gaussian
g(x) = 1./sqrt(2.*pi*sigma**2) * f(x) / A
# Plot
plot "datafile" u 1:(1./STATS_records) smooth frequency w lp pt 7, g(x)
set table generates some points which you should exclude, that's why I used grep to filter the file. Also, the Gaussian needs to be normalized after the fitting is done with a variable amplitude. If you want to retrieve the fitting parameters:
gnuplot> print x0, sigma
3.40584703189268 1.76237558717934
Finally note that if the spacing between data points is not homogeneous, e.g. instead of x = 0, 1, 2, 3 ... you have values at x = 0, 0.1, 0.5, 3, 3.2 ... then you'll need to use a different way to do this, for example defining bins of regular size to group data points.

Gnuplot histogram with errorbars (High and Low)

I am trying to create a histogram (bar chart) with high and low errors using gnuplot. I have found the thread "Gnuplot barchart histogram with errorbars". Unfortunately, it only covers an X value and an X error (2 values). What I would like to achieve is an X value (average) with an error bar consisting of high and low values (3 values in total: avg, high and low). How can I do this using gnuplot?
My script is identical to the one mentioned in that thread; I only changed some labels etc. (simple cosmetic changes). My example dataset structure is as follows:
WikiVote 10 12 7
If you have a very simple datafile:
#y ymin ymax
4 3 8
You can plot this datafile using:
set yrange [0:]
set style histogram errorbars gap 2 lw 1
plot 'datafile' u 1:2:3 w hist
I have modified the code provided by mgilson to get multiple histograms for a single X value. If anybody needs it, here is the code.
plot 'stack_2.dat' u 2:3:4:xtic(1) w hist ti "Hadoop" linecolor rgb "#FF0000", '' u 5:6:7:xtic(1) w hist ti "Giraph" lt 1 lc rgb "#00FF00"
Here is the pattern
#y_0 #min #max #y_1 #min #max
Dataset 4 3 8 6 5 9
