gnuplot - intersection of two plots - linux

I am using gnuplot to plot data from two separate csv files (found in this link: https://drive.google.com/open?id=0B2Iv8dfU4fTUZGV6X1Bvb3c4TWs) with a different number of rows which generates the following graph.
These data seem to have no common timestamp (the first column) in both csv files and yet gnuplot seems to fit the plotting as shown above.
Here is the gnuplot script that I use to generate my plot.
# ###### GNU Plot
set style data lines
set terminal postscript eps enhanced color "Times" 20
set output "output.eps"
set title "Actual vs. Estimated Comparison"
set style line 99 linetype 1 linecolor rgb "#999999" lw 2
#set border 1 back ls 11
set key right top
set key box linestyle 50
set key width -2
set xrange [0:10]
set key spacing 1.2
#set nokey
set grid xtics ytics mytics
#set size 2
#set size ratio 0.4
#show timestamp
set xlabel "Time [Seconds]"
set ylabel "Segments"
set style line 1 lc rgb "#ff0000" lt 1 pi 0 pt 4 lw 4 ps 0
plot "estimated.csv" using ($1):2 with lines title "Estimated", "actual.csv" using ($1):2 with lines title "Actual";
Is there any way to print out (write to a file) the values at the intersections of these plots while ignoring the peaks above the green plot? I have also tried an SQL join query, but it doesn't print anything, for the same reason I explained above.
PS: If the blue line doesn't touch the green line (i.e. if it is way below the green line), I want to take the values of the closest green line so that it will be a one-to-one correspondence (or very close) with the actual dataset.

Perhaps one could somehow force Gnuplot to reinterpolate both data sets on a fine grid, save this auxiliary data and then compare it row by row. However, I think that it's indeed much more practical to delegate this task to an external tool.
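The re-interpolation idea mentioned above could be sketched outside Gnuplot, for example in Python with NumPy. This is only a minimal sketch under stated assumptions: the function name crossings_on_grid and the grid size n are made up for illustration, both datasets are assumed to be sorted by their first column, and linear interpolation between data points is assumed.

```python
import numpy as np

def crossings_on_grid(xa, ya, xb, yb, n=2001):
    """Resample two curves on a shared grid and locate where they cross.

    Assumes xa and xb are sorted in increasing order (required by np.interp).
    """
    xa, ya, xb, yb = (np.asarray(v, dtype=float) for v in (xa, ya, xb, yb))
    # only the overlapping x-range of both datasets can be compared
    lo = max(xa[0], xb[0])
    hi = min(xa[-1], xb[-1])
    grid = np.linspace(lo, hi, n)
    diff = np.interp(grid, xa, ya) - np.interp(grid, xb, yb)
    # cells in which the difference changes sign contain a crossing
    # (grid points where the curves touch exactly are not reported by this test)
    idx = np.where(np.sign(diff[:-1]) * np.sign(diff[1:]) < 0)[0]
    # linear estimate of the zero of the difference inside each cell
    t = diff[idx] / (diff[idx] - diff[idx + 1])
    xs = grid[idx] + t * (grid[idx + 1] - grid[idx])
    ys = np.interp(xs, xa, ya)
    return np.column_stack([xs, ys])

# synthetic demo: a triangle against a horizontal line
demo = crossings_on_grid([0, 1, 2], [0, 2, 0], [0, 2], [1, 1], n=1000)
for x, y in demo:
    print('%f,%f' % (x, y))
```

With the real files, the columns could be loaded via np.genfromtxt(fname, delimiter=',') and passed in as the x and y arrays of each dataset.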
It's certainly not the most efficient way to do it, nevertheless a "lazy approach" could be to read the data points, interpret each dataset as a LineString (collection of line segments, essentially equivalent to assuming a linear interpolation between data points) and then calculate the intersection points. In Python, the script to do this might look like this:
#!/usr/bin/env python
import sys
import numpy as np
from shapely.geometry import LineString
#-------------------------------------------------------------------------------
def load_data(fname):
    # each CSV file becomes one polyline (linear interpolation between points)
    return LineString(np.genfromtxt(fname, delimiter = ','))
#-------------------------------------------------------------------------------
lines = list(map(load_data, sys.argv[1:]))
result = lines[0].intersection(lines[1])
# multi-part results expose their members via .geoms; a single geometry does not
for g in getattr(result, 'geoms', [result]):
    if g.geom_type != 'Point':
        continue
    print('%f,%f' % (g.x, g.y))
Then in Gnuplot, one can invoke it directly:
set terminal pngcairo
set output 'fig.png'
set datafile separator comma
set yr [0:700]
set xr [0:10]
set xtics 0,2,10
set ytics 0,100,700
set grid
set xlabel "Time [seconds]"
set ylabel "Segments"
plot \
'estimated.csv' w l lc rgb 'dark-blue' t 'Estimated', \
'actual.csv' w l lc rgb 'green' t 'Actual', \
'<python filter.py estimated.csv actual.csv' w p lc rgb 'red' ps 0.5 pt 7 t ''
which gives:

Related

Gnuplot: Violin plot with data from file

I created a script to plot the columns of a dataset using violin plots to show the distribution of the data points starting from the Gnuplot Demo Scripts. However, I can't solve the following error:
"violinplot.gnu", line 27: all points y value undefined!
Does anyone have any idea?
The script:
reset
set terminal pdfcairo size 20,14 enhanced font 'Times,28'
set output 'violinplot.0.pdf'
set datafile separator ','
set table $kdensity1
plot 'profile.csv' using 2:(1) smooth kdensity bandwidth 10. with filledcurves above y lt 9 title 'B'
unset table
unset key
print $kdensity1
set border 2
#unset margins
#unset xtics
set ytics nomirror rangelimited
set title "Distribution of times in milliseconds"
set boxwidth 0.075
set style fill solid bo -1
set errorbars lt black lw 5
set xrange [-6:6]
plot $kdensity1 using (1 + $2/1.):1 with filledcurve x=1 lt 10, \
$kdensity1 using (1 - $2/1.):1 with filledcurve x=1 lt 10
The dataset is in a CSV format as follows (and each column contains time in milliseconds):
1,1814,604,840,1306,13623
2,2195,68,908,1380,14416
3,1173,70,887,512,14301
4,1286,112,982,1541,9549
5,630,97,869,1321,5725
6,1227,689,917,393,4700
7,3402,357,951,500,5431
8,3429,120,969,1661,6281
...
Gnuplot Version 5.2 patchlevel 2
The reason for your error is simple but "nasty" and hidden.
Your input data is comma separated. However, if you plot to a table via set table $kdensity the default column separator is whitespace. That's why gnuplot doesn't find any data in column 2.
I guess that since gnuplot 5.2.2 you could use set table $kdensity separator comma. But in order to get a comma as separator, you have to use the "plotting style" with table (e.g. plot FILE u 1:2 w table). However, with table and smooth ... do not work together: either you use with table and get the comma but no smoothing, or you smooth and do not get the comma.
Two possible solutions:
after plotting to the smoothed table set your separator to whitespace (see example below).
or alternatively,
change your input data to whitespace separated.
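For the second option, the conversion could be done outside gnuplot. The following Python sketch is one possible way; the function name csv_to_whitespace is hypothetical, and it assumes purely numeric fields without embedded spaces or quoted commas.

```python
import csv
import io

def csv_to_whitespace(src, dst):
    """Copy comma-separated rows from src to dst, joining fields with single spaces."""
    for row in csv.reader(src):
        if row:  # skip empty lines
            dst.write(" ".join(field.strip() for field in row) + "\n")

# demo on two rows shaped like the dataset in the question
sample = io.StringIO("1,1814,604,840,1306,13623\n2,2195,68,908,1380,14416\n")
out = io.StringIO()
csv_to_whitespace(sample, out)
print(out.getvalue(), end="")
```

For real files, src and dst would be file objects opened with open(..., newline='') for reading and open(..., 'w') for writing.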
If you want to plot the original (comma separated) data as well, then you have two different column separators and have to apply another workaround.
Script: (works with gnuplot>=5.2.2)
### violin plot with comma separated input data
reset session
# create some random test data (comma separated)
set table $Data separator comma
set samples 100
n = 0
plot for [i=1:3] '+' u (n=n+1):(invnorm(rand(0))*i*25 +i*200) w table
unset table
set datafile separator ','
set table $kdensity
set samples 1000
plot $Data using 2:(1) smooth kdensity bandwidth 10.
unset table
set datafile separator whitespace
set key noautotitle
set style fill solid 0.7
plot $kdensity u (1 + $2/1.):1 w filledcurves x=1 lt 10, \
'' u (1 - $2/1.):1 w filledcurves x=1 lt 10
### end of script
Result:

How to assign specific title to each line in the data file in gnuplot

I have a data file which keeps all the x, y coordinates and radius values for drawing circles. Each circle stands for a region. So far I have drawn the circles, but I want to assign a specific legend to each line in the data file, because after drawing the regions I want to put some points on them depending on the region number. However, I couldn't figure out how to do it. Does anyone know how to assign a specific legend to each circle depending on its line number in the data file? The data file looks like
X Y R Legend
5 6 0.1 1
....
and so on. I want to use the last column as title to assign to the circles. Is there any way to do that?
It depends how exactly you want to show the corresponding "title". Let's assume that the data file circles.dat contains following data:
5 6.0 0.1 1
5 5.5 0.1 2
4 5.0 0.2 3
One option would be to plot the circles and use the fourth column as labels which are placed at the centers of the individual circles. This can be directly achieved with the with labels plotting style as:
set terminal pngcairo
set output 'fig1.png'
fName = 'circles.dat'
unset key
set xr [3:6]
set yr [4:7]
set size square
set tics out nomirror
set xtics 3,1,6
set mxtics 2
set ytics 4,1,7
set mytics 2
plot \
fName u 1:2:3 w circles lc rgb 'red' lw 2, \
'' u 1:2:4 w labels tc rgb 'blue'
This produces:
Alternatively, one might want to put those labels into the legend of the graph. Perhaps there is a more elegant solution; nevertheless, one way is to plot each line of the data file separately and extract the fourth column (to be used as key title) manually:
set terminal pngcairo
set output 'fig2.png'
fName = 'circles.dat'
unset key
set xr [3:6]
set yr [4:7]
set size square
set tics out nomirror
set xtics 3,1,6
set mxtics 2
set ytics 4,1,7
set mytics 2
set key top right reverse
stat fName nooutput
plot \
for [i=0:STATS_records-1] fName u 1:2:3 every ::i::i w circles t system(sprintf("awk 'NR==%d{print $4}' '%s'", i+1, fName))
This gives:

Mapping between data and line color in gnuplot

I want to create a simple histogram in gnuplot and want to adapt the color of the bars according to the data. Currently, I am struggling with the mapping between color and data.
Let's say I have the following data file:
X, 500.00, 100.00, 1
Y, 600.00, 200.00, 2
I generate the histogram with the following code:
reset
fontsize = 12
set terminal png
set output "file.png"
set style fill solid 1.00 border 0
set style histogram errorbars gap 2 lw 1
set style data histogram
set xtics rotate by -45
set grid ytics
set xlabel "label"
set ylabel "label"
set yrange [0:*]
set datafile separator ","
plot 'data.dat' using 2:3:4:xtic(1) ti "" lc variable
Now I want to create a mapping between the fourth column in the data and the color, e.g. 1 -> yellow, 2 -> blue.
I assumed that I can define something like the following
set style line 1 linecolor rgb "yellow"
set style line 2 linecolor rgb "blue"
but this code is not working since it defines styles and not colors. On the other hand, I have read in the documentation that "rgb variable" is only available in 3D plotting mode (splot), so I think that in these terms my whole approach might be going in the wrong direction.
Does anyone know how to realise the mapping between data and line colors?
Have you tried the palette command? I had the same problem some time ago. I wanted to make this plot (which is in some ways what you need)
So I used the number of elements in a column of the histogram to set the color of that column. My datafile looked like
#MY_FILE
...
26 0.02302 2302
28 0.02233 2233
30 0.02261 2261
32 0.02383 2383
34 0.02279 2279
36 0.02366 2366
38 0.02226 2226
40 0.02148 2148
...
#EOF
where the first column $1 was the n parameter, the second column $2 was my pdf (simply the normalized histogram) and the third column $3 was the number of occurrences in the bin. Then I used the last column as the parameter to color my graph with the command
set palette model RGB defined (1 "blue", 2 "red")
that creates a gradient between the starting point 1 and the end point 2. Then, to use the palette, I plotted with the line
p 'MY_FILE' u 1:2:3 w boxes palette
where w boxes was the command to generate my histogram, and palette (also pal) was the command to set the color, which uses the third column as specified in u 1:2:3, where 1:2 is my histogram and 3 is the color gradient.
If you don't want the lateral strip of color (the color box), just type in gnuplot
unset colorbox
Here's some documentation about palette command in gnuplot:
http://gnuplot.sourceforge.net/demo_5.0/pm3dcolors.html
Type help palette in gnuplot.
THIS could be particularly HELPFUL: http://www.gnuplotting.org/defining-a-palette-with-discrete-colors/
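To picture what the two-point palette defined above does, here is a rough Python sketch of a linear blue-to-red blend. It is an illustration only, not gnuplot's implementation: the function name palette_color is made up, the blend is assumed to be linear in RGB, and the smallest and largest values of the color column are assumed to map to the two ends of the palette (as with an autoscaled color range). The counts are just the sample rows shown above.

```python
def palette_color(value, vmin, vmax, start=(0, 0, 255), end=(255, 0, 0)):
    """Linearly blend from start (blue) to end (red) as value goes from vmin to vmax."""
    if vmax == vmin:
        t = 0.0
    else:
        t = (value - vmin) / (vmax - vmin)
    t = min(1.0, max(0.0, t))  # clamp values outside the range
    return tuple(round(s + t * (e - s)) for s, e in zip(start, end))

# third column of the sample rows from the data file
counts = [2302, 2233, 2261, 2383, 2279, 2366, 2226, 2148]
lo, hi = min(counts), max(counts)
for c in counts:
    print(c, palette_color(c, lo, hi))
```

A discrete mapping such as 1 -> yellow, 2 -> blue is the special case where only the two end values occur, so each bar gets exactly one of the two end colors.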

Gnuplot interchanging Axes

I would like to reproduce this plot with gnuplot:
My data has this format:
Data
1: time
2: price
3: volume
I tried this:
plot file using 1:2 with lines, '' using 1:3 axes x1y2 with impulses
Which gives a normal time series chart with y1 as price and y2 as volume.
Next, I tried:
plot file using 2:1 with lines, '' using 2:3 axes x1y2 with impulses
Which gives prices series with y1 as time and y2 as volume.
However, I need the price to remain at y1 and volume at x2.
Maybe something like:
plot file using 1:2 with lines,' ' using 2:3 axes y1x2 with impulses
However, that does not give what I want.
Gnuplot has no official way to draw this kind of horizontal boxplots. However, you can use the boxxyerrorbars (shorthand boxxy) to achieve this.
As I don't have any test data of your actual example, I generated a data file from a Gaussian random-walk. To generate the data run the following python script:
from numpy import zeros, savetxt, random
N = 500
g = zeros(N)
for i in range(1, N):
g[i] = g[i-1] + random.normal()
savetxt('randomwalk.dat', g, delimiter='\t', fmt='%.3f')
As the next step, I bin the 'position data' (which in your case would be the volume data). For this one can use smooth frequency, which computes the sum of the y-values for the same x-value. So first I use a proper binning function, which returns the same value for a certain range (x +- binwidth/2). The output data is saved to a file, because for the plotting we must exchange the x and y values:
binwidth = 2
hist(x) = floor(x/binwidth + 0.5)
set output "| head -n -2 > randomwalk.hist"
set table
plot 'randomwalk.dat' using (hist($1)):(1) smooth frequency
unset table
unset output
Normally one should be able to use set table "randomwalk.hist", but due to a bug, one needs this workaround to filter out the last entry of the table output, see my answer to Why does the 'set table' option in Gnuplot re-write the first entry in the last line?.
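The binning step described above (one common value for every x within binwidth/2 of a bin centre, then counting the points per bin) can be mimicked in a few lines of Python. This is a sketch for illustration: the names bin_index and frequency are made up, and the exact rounding formula is an assumption chosen to match that description.

```python
import math
from collections import defaultdict

def bin_index(x, binwidth):
    """Index of the bin whose centre (index * binwidth) lies within binwidth/2 of x."""
    return math.floor(x / binwidth + 0.5)

def frequency(values, binwidth):
    """Count the values per bin, i.e. sum a constant 1 for every point in the bin."""
    counts = defaultdict(int)
    for v in values:
        counts[bin_index(v, binwidth)] += 1
    return dict(sorted(counts.items()))

# bin centres are index * binwidth, here 0, 2 and 4
print(frequency([-0.4, 0.9, 1.2, 2.9, 3.1, 3.3], binwidth=2))
```

Multiplying each bin index by binwidth recovers the bin centre, which is the value used as the box position in the plot.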
Now the actual plotting part is:
unset key
set x2tics
set xtics nomirror
set xlabel 'time step'
set ylabel 'position value'
set x2label 'frequency'
set style fill solid 1.0 border lt -1
set terminal pngcairo
set output 'randomwalk.png'
plot 'randomwalk.hist' using ($2/2.0):($1*binwidth):($2/2.0):(binwidth/2.0) with boxxy lc rgb '#00cc00' axes x2y1,\
'randomwalk.dat' with lines lc rgb 'black'
which gives the result (with 4.6.3, depends of course on your random data):
So, for your data structure, the following script should work:
reset
binwidth = 2
hist(x) = floor(x/binwidth + 0.5)
file = 'data.txt'
histfile = 'pricevolume.hist'
set table histfile
plot file using (hist($2)):($3) smooth unique
unset table
# get the number of records to skip the last one
stats histfile using 1 nooutput
unset key
set x2tics
set xtics nomirror
set xlabel 'time'
set ylabel 'price'
set x2label 'volume'
set style fill solid 1.0 border lt -1
plot histfile using ($2/2.0):($1*binwidth):($2/2.0):(binwidth/2.0) every ::::(STATS_records-2) with boxxy lc rgb '#00cc00' axes x2y1,\
file using 1:2 with lines lc rgb 'black'
Note, that this time the skipping of the last table entry is done by counting all entries with the stats command, and skipping the last one with every (yes, STATS_records-2 is correct, because the point numbering starts at 0). This variant doesn't need any external tool.
I also use smooth unique, which computes the average of the y-values that share the same x-value, instead of their sum (which is what smooth frequency does).
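The difference between the two smoothing modes can be illustrated with a small Python sketch. The function names merely echo the gnuplot options and are not a gnuplot API; the sample points are invented.

```python
from collections import defaultdict

def group(points):
    """Collect the y-values of all points that share the same x-value."""
    groups = defaultdict(list)
    for x, y in points:
        groups[x].append(y)
    return groups

def smooth_frequency(points):
    """Sum of the y-values per x-value."""
    return {x: sum(ys) for x, ys in sorted(group(points).items())}

def smooth_unique(points):
    """Average of the y-values per x-value."""
    return {x: sum(ys) / len(ys) for x, ys in sorted(group(points).items())}

# (binned price, volume) pairs; two trades fall into the same price bin
pts = [(10, 100), (10, 300), (11, 50)]
print(smooth_frequency(pts))
print(smooth_unique(pts))
```

Which of the two is appropriate depends on whether the total or the typical volume per price bin should be shown.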

x range for non-numerical data in Gnuplot

When running the following script, I get an error message:
set terminal postscript enhanced color
set output '| ps2pdf - histogram_categorie.pdf'
set auto x
set key off
set yrange [0:20]
set style fill solid border -1
set boxwidth 5
unset border
unset ytic
set xtics nomirror
plot "categorie.dat" using 1:2 ti col with boxes
The error message that I get is
smeik:plots nvcleemp$ gnuplot categorie.gnuplot
plot "categorie.dat" using 1:2 ti col with boxes
^
"categorie.gnuplot", line 13: x range is invalid
The content of the file categorie.dat is
categorie aantal
poussin 13
pupil 9
miniem 15
cadet 15
junior 6
senior 5
veteraan 8
I understand that the problem is that I haven't defined an x range. How can I make gnuplot use the first column as values for the x range? Or do I need to take the row numbers as the x range and let it use the first column as labels? I'm using Gnuplot 4.4.
I'm ultimately trying to get a plot that looks the same as the plot I made before this one. That one worked fine, but had numerical data on the x axis.
set terminal postscript enhanced color
set output '| ps2pdf - histogram_geboorte.pdf'
set auto x
set key off
set yrange [0:40]
set xrange [1935:2005]
set style fill solid border -1
set boxwidth 5
unset border
unset ytic
set xtics nomirror
plot "geboorte.dat" using 1:2 ti col with boxes,\
"geboorte.dat" using 1:($2+2):2 with labels
and the content of the file geboorte.dat is
decennium aantal
1940 2
1950 1
1960 3
1970 2
1980 3
1990 29
2000 30
The boxes style expects the x-values to be numeric. That's an easy one: we can give it the pseudo-column 0, which is essentially the record number (the line number within the data file, counted from 0):
plot "categorie.dat" using (column(0)):2 ti col with boxes
Now you probably want the information in the first column on the plot somehow. I'll assume you want those strings to become the x-tics:
plot "categorie.dat" using (column(0)):2:xtic(1) ti col with boxes
*Careful here: this might not work with your current boxwidth settings. You might want to consider set boxwidth 1 or plot ... using (5*column(0)):2:xtic(1) ....
EDIT -- Taking your datafiles posted above, I've tested both of the above changes to get the boxwidth correct, and both seemed to work.
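What the two variants place on the x axis can be spelled out with a short Python sketch. It is only an illustration: the function name category_positions and the spacing parameter are made up, and it assumes the header line is not counted, so that the first data row gets index 0.

```python
def category_positions(lines, spacing=1):
    """Return (x, value, label) per data row, with x = spacing * row index from 0."""
    rows = [ln.split() for ln in lines if ln.strip()]
    data = rows[1:]  # the first row holds the column headers
    return [(spacing * i, float(val), label) for i, (label, val) in enumerate(data)]

sample = """categorie aantal
poussin 13
pupil 9
miniem 15""".splitlines()

# spacing=1 mirrors column(0); spacing=5 mirrors 5*column(0) for a boxwidth of 5
for x, value, label in category_positions(sample, spacing=5):
    print(x, value, label)
```

With a spacing of 1 and a box width of 5 the boxes would overlap, which is why either the box width or the x positions need to be rescaled.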
