Subtract smoothed data from original - gnuplot

I wonder whether there is a way to subtract smoothed data from the original data when plotting something like this:
plot ["17.12.2020 08:00:00":"18.12.2020 20:00:00"] 'data3-17-28.csv1' using 4:5 title 'Sensor 3' with lines, \
'' using 4:5 smooth acsplines
Alternatively I would need to do it externally, of course.

As @Suntory already suggested, you can plot smoothed data into a table.
However, keep in mind that the number of data points is determined by set samples (the default is 100) and that the smoothed data points will be equidistant. So, if you set samples to the number of your data points and your data is equidistant as well, all should be fine.
Concatenating data line by line is not straightforward in gnuplot, since gnuplot is not intended to do such operations.
The following gnuplot-only solution assumes that you have your data in a datablock $Data without headers and empty lines. If not, you could either plot it with table from file into a table named $Data or use the following approach in the accepted answer of this question: gnuplot: load datafile 1:1 into datablock
If you don't have equidistant data, you need to interpolate data, which is also not straightforward in gnuplot, see: Resampling data with gnuplot
It's up to you: either you use external tools (which might not be platform-independent) or you apply a somewhat cumbersome platform independent gnuplot-only solution.
Code:
### plot difference of data to smoothed data
reset session
$Data <<EOD
1 0
2 13
3 16
4 17
5 11
6 8
7 0
EOD
stats $Data u 0 nooutput # get number of rows or datapoints
set samples STATS_records
set table $Smoothed
plot $Data u 1:2 smooth acsplines
unset table
# put both datablocks into one
set print $Difference
do for [i=1:|$Data|] {
print sprintf('%s %s',$Data[i],$Smoothed[i+4])
}
set print
plot $Data u 1:2 w lp pt 7, \
$Smoothed u 1:2 w lp pt 6, \
$Difference u 1:($2-$4) w lp pt 4 lc "red"
### end of code

If I understand correctly, you would like this:
First, write the smoothed data to the file out.csv:
set table "out.csv" separator comma
plot 'file' u 4:5 smooth acsplines
unset table
Then the plot command below pastes out.csv onto file as appended columns. You may need to delete the first lines of out.csv beforehand, using the sed command (sed '1,4d' out.csv).
stats 'file' matrix
Thanks to stats, we automatically get the number of columns in your original data (STATS_size_x).
plot "< paste -d' ' file out.csv" u 4:($5-column(STATS_size_x+2)) w l
Could you please try this short script on your data?

Related

How to remove line between "jumping" values, in gnuplot?

I would like to draw a line with plots that contain "jumping" values.
Here is an example: when we have data points of sin(x) for several cycles and plot them, an unrealistic line appears that goes across from right to left (as shown in the following figure).
One idea to avoid this might be using with linespoints (link), but I want to draw it without revising the original data file.
Is there a simple and robust solution for this problem?
Assuming that you are plotting a function, that is, for each x value there exists one and only one corresponding y value, the easiest way to achieve what you want is to use the smooth unique option. This smoothing routine will make the data monotonic in x, then plot it. When several y values exist for the same x value, the average will be used.
Example:
Data file:
0.5 0.5
1.0 1.5
1.5 0.5
0.5 0.5
Plotting without smoothing:
set xrange [0:2]
set yrange [0:2]
plot "data" w l
With smoothing:
plot "data" smooth unique
Edit: points are lost if this solution is used, so I suggest an improvement.
"Conditional plotting" can be applied here. Suppose we have a file like this:
1 2
2 5
3 3
1 2
2 5
3 3
i.e. the line runs backwards between the 3rd and 4th points.
plot "tmp.dat" u 1:2
Find minimum x value:
stats "tmp.dat" u 1:2
prev=STATS_min_x
Or find first x value:
prev=system("awk 'FNR == 1 {print $1}' tmp.dat")
Plot the line if current x value is greater than previous, or don't plot if it's less:
plot "tmp.dat" u ($0==0? prev:($1>prev? $1:1/0), prev=$1):2 w l
OK, it's not impossible, but the following is a ghastly hack. I really advise you to add an empty line in your dataset at the breaks.
$dat << EOD
1 1
2 2
3 3
1 5
2 6
3 7
1 8
2 9
3 10
EOD
plot for [i=0:3] $dat us \
($0==0?j=0:j=j,llx=lx,lx=$1,llx>lx?j=j+1:j=j,i==j?$1:NaN):2 w lp notit
This plots your dataset three times (actually four; there is a small error in there. I guess I have to initialise all variables), counts how often the abscissa values "jump", and only plots data points if this counter j is equal to the plot counter i.
Check the help on the serial evaluation operator "a, b" and the ternary operator "a?b:c"
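The two operators mentioned above can be tried in isolation first. The following is only a minimal sketch; the variable names n, kind and last are made up for illustration and do not come from the answer.

```gnuplot
# Hypothetical variables, for illustration only
n = 3

# Ternary operator a?b:c: yields b if a is true (non-zero), otherwise c
kind = (n > 2 ? "big" : "small")

# Serial evaluation (a, b): evaluates a, then b, left to right,
# and returns the value of the rightmost expression.
# Here n is incremented first, then n*2 is returned.
last = (n = n + 1, n * 2)

print kind
print last
print n
```

In the hack above, the same mechanism updates the bookkeeping variables (j, lx, llx) for every data row before the final ternary decides whether the point is plotted or replaced by NaN.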
If you have data in a repetitive x-range where the corresponding y-values do not change, then @Miguel's smooth unique solution is certainly the easiest.
In a more general case, what if the x-range is repetitive but y-values are changing, e.g. like a noisy sin(x)?
Then compare two consecutive x-values x0 and x1, if x0>x1 then you have a "jump" and make the linecolor fully transparent, i.e. invisible, e.g. 0xff123456 (scheme 0xaarrggbb, check help colorspec). The same "trick" can be used when you want to interrupt a dataline which has a certain forward "jump" (see https://stackoverflow.com/a/72535613/7295599).
Minimal solution:
plot x1=NaN $Data u 1:2:(x0=x1,x1=$1,x0>x1?0xff123456:0x0000ff) w l lc rgb var
Script:
### plot "folded" data without connecting lines
reset session
# create some test data
set table $Data
plot [0:2*pi] for [i=1:4] '+' u 1:(sin(x)+rand(0)*0.5) w table
unset table
set xrange[0:2*pi]
set key noautotitle
set multiplot layout 1,2
plot $Data u 1:2 w l lc "red" ti "data as is"
plot x1=NaN $Data u 1:2:(x0=x1,x1=$1,x0>x1?0xff123456:0x0000ff) \
w l lc rgb var ti "\n\n\"Jumps\" removed\nwithout changing\ninput data"
unset multiplot
### end of script

Avoid connection of points when there is empty data

I am trying to make a line chart using Gnuplot. I need to get something like the following, but with one exception:
In the example above you can see a straight line which joins two separate points over empty data. It is the one that crosses the '2016-09-27 00:00:00' x tick. I would like an empty space there instead of that straight line. How could I achieve this?
This is the current code:
set xdata time
set terminal pngcairo enhanced font "arial,10" fontscale 1.0 size 900, 350
set output filename
set key off
set timefmt '"%Y-%m-%d %H:%M:%S"'
set format x "%Y-%m-%d %H:%M"
set xtics rotate by -80
set mxtics 10
set datafile missing "-"
set style line 1 lt 2 lc rgb 'blue' lw 1
set style line 2 lt 2 lc rgb 'green' lw 1
set style line 3 lt 2 lc rgb 'red' lw 1
plot\
fuente using 1:2 ls 1 with lines,\
fuente using 1:3 ls 2 with lines,\
fuente using 1:4 ls 3 with lines
Three options:
In the data file, put an empty line where the gap is. This results in exactly what you want, but would also affect the other data from that file.
Use every to only plot a portion of the data and plot it twice, once up to the gap, once from the gap. Suppose that the gap occurs between data points 42 and 43 in your case, then you could use:
plot\
fuente using 1:2 ls 1 every ::::41 with lines,\
fuente using 1:2 ls 1 every ::42 with lines,\
fuente using 1:3 ls 2 with lines,\
fuente using 1:4 ls 3 with lines
(The every statement takes up to six arguments separated by colons but you can leave them empty for default values. The fifth argument is the end point, the third is the starting point.)
If you use - for missing data in your file (as indicated by your set datafile missing "-"), you have to modify your using statement for this to be effective:
plot\
fuente using 1:($2) ls 1 with lines,\
fuente using 1:3 ls 2 with lines,\
fuente using 1:4 ls 3 with lines
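The every syntax from the second option can be sketched on a small inline datablock. The data and titles here are invented for illustration; note that gnuplot counts points starting at 0.

```gnuplot
# Hypothetical test data: six points, indexed 0..5 by gnuplot
$Data <<EOD
0 1
1 2
2 3
3 2
4 1
5 2
EOD

# every point_incr:block_incr:start_point:start_block:end_point:end_block
# '::::2' leaves the first four fields empty and sets end_point = 2   (points 0..2)
# '::3'   leaves the first two fields empty and sets start_point = 3  (points 3..5)
plot $Data using 1:2 every ::::2 with linespoints title "points 0-2", \
     $Data using 1:2 every ::3   with linespoints title "points 3-5"
```

Because the two ranges are separate plot elements, no line is drawn between point 2 and point 3.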
Of course, you can always change your data and e.g. insert empty lines (as @Wrzlprmft suggested) when data is missing which will interrupt your line.
With large datasets and a lot of "breaks" this would be painful if you have to do it manually.
I would say that there is a solution without changing your data.
Let me ask: "What do you consider as missing data?"
My assumption would be: you have e.g. a data logger which takes values every 10 minutes.
If for some reason the logger did not take some data there will be a "gap" of missing data.
Now, you can define what you consider as a gap, e.g. >1 hour of no data would be a gap.
Hence, you simply compare two consecutive values t0 and t1, and if the difference is larger than your gap, you change the line color from whatever color to transparent (according to the scheme 0xaarrggbb). Check help linecolor variable and help colorspec.
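The comparison itself can be sketched with plain numeric x-values before adding time handling. This is a reduced illustration, not the full solution: the test data, the threshold gap = 2 and the names x0, x1 are assumptions made for this example.

```gnuplot
# Hypothetical numeric test data with a gap between x=3 and x=10
$Gaps <<EOD
1 3
2 4
3 5
10 6
11 7
12 5
EOD

gap = 2   # assumed threshold: a step in x larger than this counts as a gap

# Remember the previous x in x0, read the current x into x1, and return
# a fully transparent color (alpha = 0xff) if the step exceeds the gap.
myColor(col,color) = (x0=x1, x1=column(col), x1-x0>gap ? 0xff000000 : color)

# x1=NaN resets the state, so the first comparison is false
plot x1=NaN $Gaps u 1:2:(myColor(1,0x0000ff)) w l lc rgb var notitle
```

The segment leading to the first point after the gap is drawn transparent, so the line appears interrupted while the input data stays untouched.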
Script:
### don't show line in missing data gaps
reset session
myFmt = "%Y-%m-%d %H:%M"
# create some random test data
set print $Data
tStart = "2016-09-27"
tEnd = "2016-10-10"
t0 = strptime(myFmt,tStart)
t1 = strptime(myFmt,tEnd)
y0 = 100
do for [t=t0:t0+(t1-t0)*0.2:600] { print sprintf("%s %g",strftime(myFmt,t),y0=y0+(rand(0)-0.5)) }
do for [t=t0+(t1-t0)*0.3:t0+(t1-t0)*0.5:600] { print sprintf("%s %g",strftime(myFmt,t),y0=y0+(rand(0)-0.5)) }
do for [t=t0+(t1-t0)*0.8:t0+(t1-t0):600] { print sprintf("%s %g",strftime(myFmt,t),y0=y0+(rand(0)-0.5)) }
set print
set format x "%d.%m." timedate
gap = 3600 # 1 hour
myColor(tCol,color) = (t0=t1, t1=timecolumn(tCol,myFmt), t1-t0>gap ? 0xff123456 : color)
set multiplot layout 2,1
plot $Data u (timecolumn(1,myFmt)):3 w l lc rgb 0xff0000 ti "data as is"
plot t1=NaN $Data u (timecolumn(1,myFmt)):3:(myColor(1,0x0000ff)) w l lc rgb var ti "with removed gaps"
unset multiplot
### end of script

Multiple data blocks, changing plot title

I'm trying to build a graph with a csv file.
It's supposed to have an undetermined number of data blocks inside.
My CSV looks like this :
year;amount;NAME1
year;amount;NAME1
year;amount;NAME1
year;amount;NAME2
year;amount;NAME2
year;amount;NAME2
So I want my graph to have two curves (or more if there's more blocks), one named NAME1 and the other NAME2.
The only way I've found to retrieve the name is by using:
title columnhead(3)
But by using this, the first line of my csv is missing, and I can't figure out why ...
Here's my script generating the image
gnuplot <<EOF
set terminal png
set title "Stages par professeur par années"
set output "stages_entr_ann.png"
set auto x
set key on outside left bmargin
set datafile separator ";"
set xtics 1
set ytics 1
stats 'fichier3_t.stat'
plot for [IDX=0:STATS_blocks-1] 'fichier3_t.stat' index IDX u 1:2 title columnhead(3) with linespoints ls IDX
EOF
(There's an unknown number of blocks, so I'm using STATS_blocks)
The point is that columnhead expects something like this:
Year Amount Name
2013 5000 John
2014 8000 Max
2015 12000 Susanne
i.e. the first row of each column is treated as a label, not as data. While gnuplot extracts the name fine, it ignores the rest of the line.
There is no simple and direct solution for this, but you can do a workaround:
plot for [IDX=0:STATS_blocks-1] 'fichier3_t.stat' index IDX u 1:(1/0) title columnhead(3) with linespoints ls IDX, for [IDX=0:STATS_blocks-1] 'fichier3_t.stat' index IDX u 1:2 notitle with linespoints ls IDX
To make it more clear:
plot 'fichier3_t.stat' ... u 1:(1/0) title columnhead(3) with linespoints ..., \
'fichier3_t.stat' ... u 1:2 notitle with linespoints ...
The first command is similar to yours, it just does not plot any data (because 1/0 is always invalid). It's only there to generate the entry in the legend (key).
The second command plots the data, but does not generate any entry in the legend. Just make sure the same line style is assigned to the two plots.
If the table is large and speed is a problem, you may also use every for the title-generating command to reduce the number of points it tries to plot.
Finally, you may also get the message "No valid data in xrange" or similar. That's expected here and shouldn't be a problem.

Is there a way to put a label for the last entry in gnuplot?

I want to use gnuplot for real-time plotting (data gets appended to a file which I use for plotting, and I use replot to update the plot). I also want to put a label on the latest entry plotted, so as to get an idea of what the latest value is. Is there a way to do this?
If you are on a unixoid system, you can use tail to extract the last line from the file and plot it separately in whatever way you desire. To give a simple example:
plot\
"data.dat" w l,\
"< tail -n 1 data.dat" u 1:2:2 w labels notitle
This will plot the whole of data.dat with lines and the last point with labels, with the label depicting the value.
There is no need to use the Linux command tail, you can simply do it with gnuplot-only, hence platform-independently.
The principle: while plotting the data, you assign the values of column 1 and 2 to variables x0 and y0, respectively.
After the first plot command, x0 and y0 will contain the last values.
With this, you don't have to load the file a second time for extracting the last values.
For the label plotting, use these values and print the label with a sprintf() expression (check help sprintf).
The construct '+' u ... every ::0::0 is just one way of many ways to plot a single data point.
Data: SO28152083.dat
1 5.1
2 2.2
3 3.3
4 1.4
5 4.5
Script: (works with gnuplot 4.4.0, March 2010 or even with earlier versions)
### plot last value as label
reset
FILE = "SO28152083.dat"
set key noautotitle
set offsets 0.5,0.5,1,1
plot FILE u (x0=$1):(y0=$2) w lp pt 7 lc rgb "red" ti "data", \
'+' u (x0):(y0):(sprintf("%g",y0)) every ::0::0 w labels offset 0,1
### end of script

Smooth command not supporting variable colors?

I am trying to make a plot in gnuplot using the smooth csplines option. The data file can have many different sections to plot (the number is not constant), and I would like to use the lc variable option to differentiate them with different colors. Am I doing something wrong, or does smooth not support the lc variable option?
Correct, you cannot mix smooth and lc palette in a single plot command. You could write the smoothed data to an intermediate file with set table and then plot this data with lc palette.
Consider the example file test.txt:
1
3
2
5
4
6
Now plot this with:
set table 'tmp.txt'
plot 'test.txt' using 0:1 smooth cspline
unset table
And then plot the file tmp.txt with lc rgb variable or similar:
rgb(r,g,b) = 65536 * int(r) + 256 * int(g) + int(b)
plot 'test.txt' using 0:1 pt 7 t 'original', \
'tmp.txt' using 1:2:($2 < 4.2 ? rgb(255,0,0) : rgb(0,255,0)) with lines lc rgb var t 'smoothed'
Result with 4.6.4:
Note that this doesn't allow you to use criteria contained in an additional column of your original data for coloring (say, in the third column of test.txt). That would require much more fiddling.
