Gnuplot interchanging Axes - gnuplot

I would like to reproduce this plot with gnuplot:
My data has this format:
Data
1: time
2: price
3: volume
I tried this:
plot file using 1:2 with lines, '' using 1:3 axes x1y2 with impulses
Which gives a normal time series chart with y1 as price and y2 as volume.
Next, I tried:
plot file using 2:1 with lines, '' using 2:3 axes x1y2 with impulses
Which gives prices series with y1 as time and y2 as volume.
However, I need the price to remain at y1 and volume at x2.
Maybe something like:
plot file using 1:2 with lines,' ' using 2:3 axes y1x2 with impulses
However, that does not give what I want.

Gnuplot has no official way to draw this kind of horizontal box plot. However, you can use the boxxyerrorbars plotting style (shorthand boxxy) to achieve this.
As I don't have any test data for your actual example, I generated a data file from a Gaussian random walk. To generate the data, run the following Python script:
from numpy import zeros, savetxt, random
N = 500
g = zeros(N)
for i in range(1, N):
    g[i] = g[i-1] + random.normal()
savetxt('randomwalk.dat', g, delimiter='\t', fmt='%.3f')
Next, I bin the 'position data' (which in your case would be the volume data). For this one can use smooth frequency, which computes the sum of the y values for identical x values. So first I use a binning function that returns the same value for every x within a certain range (x ± binwidth/2). The output data is saved to a file, because for plotting we must exchange the x and y values:
binwidth = 2
hist(x) = floor(x/binwidth + 0.5)
set output "| head -n -2 > randomwalk.hist"
set table
plot 'randomwalk.dat' using (hist($1)):(1) smooth frequency
unset table
unset output
Normally one should be able to use set table "randomwalk.hist", but due to a bug, one needs this workaround to filter out the last entry of the table output, see my answer to Why does the 'set table' option in Gnuplot re-write the first entry in the last line?.
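The binning and counting step can also be sketched outside gnuplot. The following is only an illustration in Python: `bin_center` is a hypothetical helper (one common way to write such a binning function, not copied from the script above), and counting one per sample mirrors what `using (hist($1)):(1) smooth frequency` is meant to do.

```python
import math
from collections import Counter

def bin_center(x, binwidth):
    """Centre of the bin containing x; bins cover [c - binwidth/2, c + binwidth/2)."""
    return binwidth * math.floor(x / binwidth + 0.5)

def histogram(values, binwidth):
    """Count how many values fall into each bin, sorted by bin centre."""
    counts = Counter(bin_center(v, binwidth) for v in values)
    return sorted(counts.items())

# hypothetical sample positions, only for illustration
positions = [0.2, 0.9, 1.1, 2.7, 3.4, -0.8]
print(histogram(positions, 2))  # [(0, 3), (2, 2), (4, 1)]
```

Each pair is a bin centre and the number of samples in that bin, which is the information the table file needs to hold.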
Now the actual plotting part is:
unset key
set x2tics
set xtics nomirror
set xlabel 'time step'
set ylabel 'position value'
set x2label 'frequency'
set style fill solid 1.0 border lt -1
set terminal pngcairo
set output 'randomwalk.png'
plot 'randomwalk.hist' using ($2/2.0):($1*binwidth):($2/2.0):(binwidth/2.0) with boxxy lc rgb '#00cc00' axes x2y1,\
'randomwalk.dat' with lines lc rgb 'black'
which gives the result (with 4.6.3, depends of course on your random data):
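To see why `$2/2.0` appears twice in the plot command above, recall that the four-column form of boxxyerrorbars reads x, y, xdelta and ydelta. A small Python sketch of the geometry (the function names are hypothetical and only used for illustration):

```python
def horizontal_box(bin_key, count, binwidth):
    """Return (x, y, xdelta, ydelta) for one horizontal bar, as passed to `with boxxy`."""
    x = count / 2.0            # box centre in x
    y = bin_key * binwidth     # box centre in y
    xdelta = count / 2.0       # half-width, so the bar spans 0..count
    ydelta = binwidth / 2.0    # half-height of one bin
    return x, y, xdelta, ydelta

def box_extent(x, y, xdelta, ydelta):
    """Return (xlow, xhigh, ylow, yhigh) of the box."""
    return (x - xdelta, x + xdelta, y - ydelta, y + ydelta)

print(box_extent(*horizontal_box(3, 10, 2)))  # (0.0, 10.0, 5.0, 7.0)
```

Centring the box at half the count with a half-width of half the count makes every bar start at zero on the x2 axis.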
So, for your data structure, the following script should work:
reset
binwidth = 2
hist(x) = floor(x/binwidth + 0.5)
file = 'data.txt'
histfile = 'pricevolume.hist'
set table histfile
plot file using (hist($2)):($3) smooth unique
unset table
# get the number of records to skip the last one
stats histfile using 1 nooutput
unset key
set x2tics
set xtics nomirror
set xlabel 'time'
set ylabel 'price'
set x2label 'volume'
set style fill solid 1.0 border lt -1
plot histfile using ($2/2.0):($1*binwidth):($2/2.0):(binwidth/2.0) every ::::(STATS_records-2) with boxxy lc rgb '#00cc00' axes x2y1,\
file using 1:2 with lines lc rgb 'black'
Note that this time the last table entry is skipped by counting all entries with the stats command and dropping the last one with every (yes, STATS_records-2 is correct, because the point numbering starts at 0). This variant doesn't need any external tool.
I also use smooth unique, which computes the average of the y values that share the same x value, instead of their sum (which is what smooth frequency does).
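The difference between the two smoothing options can be sketched in a few lines of Python. This is an illustration of the idea only, with hypothetical function names and made-up sample data, not gnuplot's implementation:

```python
from collections import defaultdict

def group_by_x(points):
    """Collect all y values that share the same x value."""
    groups = defaultdict(list)
    for x, y in points:
        groups[x].append(y)
    return groups

def smooth_frequency(points):
    """Sum of the y values per x, sorted by x."""
    return sorted((x, sum(ys)) for x, ys in group_by_x(points).items())

def smooth_unique(points):
    """Average of the y values per x, sorted by x."""
    return sorted((x, sum(ys) / len(ys)) for x, ys in group_by_x(points).items())

# (price bin, volume) pairs, invented for this example
trades = [(10, 100), (10, 300), (12, 50)]
print(smooth_frequency(trades))  # [(10, 400), (12, 50)]
print(smooth_unique(trades))     # [(10, 200.0), (12, 50.0)]
```

Which one is appropriate depends on whether the total or the typical volume per price bin is of interest.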

Related

Can I plot 1D heatmap with gnuplot?

I'm trying to plot a 1D heatmap using two columns of data (x value and y value) in gnuplot. The linegraph plotted using my data is like this:
Linegraph:
However after some trying I can only achieve this:
What I've got:
And what I want to get is something like this. (Only example)
What I want:
The gnuplot script that I use is as follows:
set view map
set size ratio 0.2
unset ytics
unset key
splot 'test.dat' u 1:(1):2 palette
Could anyone help please?
So you want to use the y axis as a fake dimension in order to increase the width of your second line plot?
Sure, this is e.g. possible with boxxyerror with explicit ymin and ymax errors that fill the yrange.
set xr [-10:10]
set yr [0:1]
xspacing = 0.1
plot '+' u 1:(0.5):($1-xspacing):($1+xspacing):(0):(1):(sin($1)) w boxxyerror lc palette
In your case replace the sin(x) with the respective column of your data. With the special file '+' the x-width has no effect, but in your case you might need to play around with a proper xspacing in order to avoid white gaps between the points.
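If the x values are not evenly spaced, one way to avoid white gaps is to let neighbouring boxes meet halfway between samples. The Python sketch below prepares such rows in the seven-column layout used above (x, y, xlow, xhigh, ylow, yhigh, colour value). The helper name `heatmap_boxes` is hypothetical, and the x values are assumed to be sorted in ascending order.

```python
def heatmap_boxes(xs, zs, ylow=0.0, yhigh=1.0):
    """Build rows (x, y, xlow, xhigh, ylow, yhigh, z) whose boxes touch without gaps."""
    if len(xs) < 2 or len(xs) != len(zs):
        raise ValueError("need at least two samples and one z value per x")
    rows = []
    last = len(xs) - 1
    for i, (x, z) in enumerate(zip(xs, zs)):
        # inner edges sit halfway to the neighbouring sample;
        # the outermost edges mirror the nearest inner spacing
        left = (xs[i - 1] + x) / 2.0 if i > 0 else x - (xs[1] - x) / 2.0
        right = (x + xs[i + 1]) / 2.0 if i < last else x + (x - xs[i - 1]) / 2.0
        rows.append((x, (ylow + yhigh) / 2.0, left, right, ylow, yhigh, z))
    return rows

for row in heatmap_boxes([0, 1, 3], [5, 6, 7]):
    print(*row)
```

The printed rows could be written to a file and plotted with the same `w boxxyerror lc palette` command, using columns 1 to 7 directly.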
I would do it like this:
unset key
set xrange noextend
set offset 0,0,graph .05,graph .05
set palette cubehelix negative
plot 'foo.dat' using 0:3 with lines lc "black", \
'foo.dat' using 0:(70):3 with lines lc palette lw 10

Cumulative data and extrapolation with gnuplot

Having a list of dates and events which is not necessarily sorted by date
e.g. like
# Date Event
04.12.2018 -4
23.06.2018 5
04.10.2018 3
11.11.2018 -9
08.03.2018 -4
08.03.2018 2
11.11.2018 -3
I would like to sum up the events and do a (e.g. linear) extrapolation, e.g. when the data will hit a certain threshold (e.g. zero).
It looks like smooth frequency and smooth cumulative were made for this.
But I am struggling with the following:
a) how can I add a start value (offset), e.g. StartValue = 500
plot $Data u (strftime("%d.%m.%Y",timecolumn(1,"%d.%m.%Y"))):($2+StartValue) smooth cumulative w l t "Cumulated Events"
doesn't do it.
b) how can I get the cumulative data? Especially if the data is not sorted by date?
set table "DataCumulative.dat"
plot $Data u (strftime("%d.%m.%Y",timecolumn(1,"%d.%m.%Y"))):2 smooth cumulative with table
unset table
This looks similar to this question (GNUPLOT: saving data from smooth cumulative), but I don't get the expected numbers. In my example below, I expected the file "DataCumulative.dat" to contain unique dates and basically the data from the lower plot. How can I get this?
The code:
### start code
reset session
set colorsequence classic
# function for creating a random date between two dates
t(date_str) = strptime("%d.%m.%Y", date_str)
date_random(d0,d1) = strftime("%d.%m.%Y",rand(0)*(t(d1)-t(d0)) + t(d0))
# create some random date data
date_start = "01.01.2018"
date_end = "30.06.2018"
set print $Data
do for [i=1:1000] {
print sprintf("%s\t%g", date_random(date_start,date_end), floor(rand(0)*10-6))
}
set print
set xdata time
set timefmt "%d.%m.%Y"
set xtics format "%b"
set xrange[date_start:"31.12.2018"]
set multiplot layout 2,1
plot $Data u (strftime("%d.%m.%Y",timecolumn(1,"%d.%m.%Y"))):2 smooth frequency with impulses t "Events"
plot $Data u (strftime("%d.%m.%Y",timecolumn(1,"%d.%m.%Y"))):2 smooth cumulative w l t "Cumulated Events"
unset multiplot
# attempt to get cumulative data into datablock
set table "DataCumulative.dat"
plot $Data u (strftime("%d.%m.%Y",timecolumn(1,"%d.%m.%Y"))):2 smooth cumulative with table
unset table
### end of code
The plots:
I guess I finally got it now. However, there are a few points that I still don't completely understand.
1.
In order to get the cumulative data you should not set
set table $DataCumulative
plot $Data u (stringcolumn(1)):2 smooth cumulative with table
unset table
but instead:
set table $DataCumulative
plot $Data u (stringcolumn(1)):2 smooth cumulative
unset table
note the missing "with table" in the plot command.
The first version gives you the original data, the second one the desired cumulative data. But I don't yet understand why.
2.
The default datafile separator setting, which is
set datafile separator whitespace
does not seem to work here. It gives an error message like "line xxx: No data to fit".
Instead, you have to set
set datafile separator " \t" # space and TAB
But I don't understand why.
3.
Fitting time data with
f_lin(x) = m*x + c
won't give a good fit at all, presumably because the time values are seconds since the epoch and therefore very large numbers. Apparently, you have to subtract the start date before fitting:
f_lin(x) = m*(x-strptime("%d.%m.%Y", Date_Start)) + c
I remember reading this a long time ago in the gnuplot documentation, but I can't find it anymore.
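As a cross-check for point 1, the table one would expect from the cumulative smoothing (unique dates in ascending order, with a running total) can be sketched in Python. The function name and the `start_value` parameter are hypothetical; the sample rows are the ones from the question.

```python
from collections import defaultdict
from datetime import datetime

def cumulative_events(rows, start_value=0, fmt="%d.%m.%Y"):
    """Sum events per date, sort by date and return (date, running total) pairs."""
    per_day = defaultdict(int)
    for date_str, event in rows:
        per_day[datetime.strptime(date_str, fmt)] += event
    result = []
    running = start_value
    for day in sorted(per_day):
        running += per_day[day]
        result.append((day.strftime(fmt), running))
    return result

rows = [("04.12.2018", -4), ("23.06.2018", 5), ("04.10.2018", 3),
        ("11.11.2018", -9), ("08.03.2018", -4), ("08.03.2018", 2),
        ("11.11.2018", -3)]
for date_str, total in cumulative_events(rows, start_value=500):
    print(date_str, total)
```

With a start value of 500 this prints 498, 503, 506, 494 and 490 for the five distinct dates, which is the kind of output $DataCumulative should hold once the offset is added.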
For the time being, I am happy now with the following.
The modified code:
### generate random date between two dates
reset session
# function for creating a random date between two dates
t(date_str) = strptime("%d.%m.%Y", date_str)
date_random(d0,d1) = strftime("%d.%m.%Y",rand(0)*(t(d1)-t(d0)) + t(d0))
# create some random date data
Date_Start = "01.01.2018"
Date_End = "30.06.2018"
set print $Data
do for [i=1:100] {
print sprintf("%s\t%g", date_random(Date_Start,Date_End), floor(rand(0)*10-6))
}
set print
set xdata time
set timefmt "%d.%m.%Y"
# get cumulative data into datablock
set xtics format "%d.%m.%Y"
set table $DataCumulative
plot $Data u (stringcolumn(1)):2 smooth cumulative
unset table
set xtics format "%b"
set datafile separator " \t" # space and TAB
# linear function and fitting
f_lin(x) = m*(x-strptime("%d.%m.%Y", Date_Start)) + c
set fit nolog quiet
fit f_lin(x) $DataCumulative u 1:2 via m,c
Level_Start = 500
Level_End = 0
x0 = (Level_End - Level_Start - c)/m + strptime("%d.%m.%Y", Date_Start)
set multiplot layout 3,1
# event plot & cumulative plot
set xrange[Date_Start:"31.12.2018"]
set xtics format ""
set lmargin 7
set bmargin 0
plot $Data u (timecolumn(1,"%d.%m.%Y")):2 smooth frequency with impulses lc rgb "red" t "Events 2018"
set xtics format "%b"
set bmargin
plot $Data u (timecolumn(1,"%d.%m.%Y")):2 smooth cumulative w l lc rgb "web-green" t "Cumulated Events 2018"
# fit & extrapolation plot
set label 1 at x0, graph 0.8 strftime("%d.%m.%Y",x0) center
set arrow 1 from x0, graph 0.7 to x0, Level_End
set key at graph 0.30, graph 0.55
set xrange[Date_Start:x0+3600*24*50] # end range = extrapolated date + 50 days
set xtics format "%m.%y"
set yrange [-90:]
plot $DataCumulative u (timecolumn(1,"%d.%m.%Y")):($2+Level_Start) w l lc rgb "blue" t "Cumulated Events",\
Level_End w l lc rgb "red" not,\
f_lin(x)+Level_Start w l ls 0 t "Fitting \\& Extrapolation"
unset multiplot
### end of code
will result in:
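The fit and the extrapolated date x0 from the script can be cross-checked outside gnuplot. The sketch below is a plain least-squares line fit in Python on time values shifted by the start date, followed by the same formula for x0. All function names are hypothetical, and dates are interpreted as UTC, which is an assumption of this sketch.

```python
from datetime import datetime, timezone

def to_seconds(date_str, fmt="%d.%m.%Y"):
    """Seconds since the epoch for a date string, interpreted as UTC."""
    return datetime.strptime(date_str, fmt).replace(tzinfo=timezone.utc).timestamp()

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = m*x + c."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    m = sxy / sxx
    return m, mean_y - m * mean_x

def crossing_date(dates, values, date_start, level_start, level_end):
    """Date at which the fitted line, offset by level_start, reaches level_end."""
    t0 = to_seconds(date_start)
    xs = [to_seconds(d) - t0 for d in dates]   # shift so x starts near zero
    m, c = fit_line(xs, values)
    x0 = (level_end - level_start - c) / m + t0
    return datetime.fromtimestamp(x0, tz=timezone.utc)

# made-up cumulative data that drops by one unit per day
dates = ["01.01.2018", "11.01.2018", "21.01.2018"]
values = [0, -10, -20]
print(crossing_date(dates, values, "01.01.2018", 500, 0))
```

For this artificial data the level of 500 is used up after about 500 days, so the result lands around mid-May 2019.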

Gnuplot: Violin plot with data from file

I created a script to plot the columns of a dataset using violin plots to show the distribution of the data points starting from the Gnuplot Demo Scripts. However, I can't solve the following error:
"violinplot.gnu", line 27: all points y value undefined!
Does anyone have any idea?
The script:
reset
set terminal pdfcairo size 20,14 enhanced font 'Times,28'
set output 'violinplot.0.pdf'
set datafile separator ','
set table $kdensity1
plot 'profile.csv' using 2:(1) smooth kdensity bandwidth 10. with filledcurves above y lt 9 title 'B'
unset table
unset key
print $kdensity1
set border 2
#unset margins
#unset xtics
set ytics nomirror rangelimited
set title "Distribution of times in milliseconds"
set boxwidth 0.075
set style fill solid bo -1
set errorbars lt black lw 5
set xrange [-6:6]
plot $kdensity1 using (1 + $2/1.):1 with filledcurve x=1 lt 10, \
$kdensity1 using (1 - $2/1.):1 with filledcurve x=1 lt 10
The dataset is in a CSV format as follows (and each column contains time in milliseconds):
1,1814,604,840,1306,13623
2,2195,68,908,1380,14416
3,1173,70,887,512,14301
4,1286,112,982,1541,9549
5,630,97,869,1321,5725
6,1227,689,917,393,4700
7,3402,357,951,500,5431
8,3429,120,969,1661,6281
...
Gnuplot Version 5.2 patchlevel 2
The reason for your error is simple but "nasty" and hidden.
Your input data is comma separated. However, if you plot to a table via set table $kdensity the default column separator is whitespace. That's why gnuplot doesn't find any data in column 2.
I guess since gnuplot 5.2.2 you could use set table $kdensity separator comma. But in order to get a comma as separator you have to use the "plotting style" with table (e.g. plot FILE u 1:2 w table). However, with table and smooth ... do not work together. Either you use with table and get the comma but no smoothing, or you smooth and do not get the comma.
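The effect of a mismatched separator is easy to reproduce in isolation. The Python sketch below only mimics the field splitting, not gnuplot's parser, and the sample lines are invented for illustration:

```python
def split_fields(line, separator=None):
    """Split a line into fields; None means 'any run of whitespace'."""
    return line.split(separator)

table_line = " 1814.5  0.0021  i"   # whitespace-separated, as written to a table
csv_line = "1,1814,604"             # comma-separated, as in the input file

print(len(split_fields(table_line, ",")))  # 1 -> there is no "column 2"
print(len(split_fields(table_line)))       # 3
print(len(split_fields(csv_line, ",")))    # 3
print(len(split_fields(csv_line)))         # 1
```

Reading the whitespace-separated table with a comma separator leaves the whole line in one field, which matches the "all points y value undefined" symptom.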
Two possible solutions:
after plotting to the smoothed table set your separator to whitespace (see example below).
or alternatively,
change your input data to whitespace separated.
If you want to plot the original (comma-separated) data as well, then you have two different column separators and need to apply another workaround.
Script: (works with gnuplot>=5.2.2)
### violin plot with comma separated input data
reset session
# create some random test data (comma separated)
set table $Data separator comma
set samples 100
n = 0
plot for [i=1:3] '+' u (n=n+1):(invnorm(rand(0))*i*25 +i*200) w table
unset table
set datafile separator ','
set table $kdensity
set samples 1000
plot $Data using 2:(1) smooth kdensity bandwidth 10.
unset table
set datafile separator whitespace
set key noautotitle
set style fill solid 0.7
plot $kdensity u (1 + $2/1.):1 w filledcurves x=1 lt 10, \
'' u (1 - $2/1.):1 w filledcurves x=1 lt 10
### end of script
Result:

How to assign specific title to each line in the data file in gnuplot

I have a data file that holds the x and y coordinates and the radius of each circle I want to draw. Each circle stands for a region. So far I have drawn the circles, but I want to assign a specific legend entry to each line of the data file, because after drawing the regions I want to place some points on them depending on the region number. However, I couldn't figure out how to do it. Does anyone know how to assign a specific legend entry to each circle depending on its line number in the data file? The data file looks like
X Y R Legend
5 6 0.1 1
....
and so on. I want to use the last column as title to assign to the circles. Is there any way to do that?
It depends on how exactly you want to show the corresponding "title". Let's assume that the data file circles.dat contains the following data:
5 6.0 0.1 1
5 5.5 0.1 2
4 5.0 0.2 3
One option would be to plot the circles and use the fourth column as labels which are placed at the centers of the individual circles. This can be directly achieved with the with labels plotting style as:
set terminal pngcairo
set output 'fig1.png'
fName = 'circles.dat'
unset key
set xr [3:6]
set yr [4:7]
set size square
set tics out nomirror
set xtics 3,1,6
set mxtics 2
set ytics 4,1,7
set mytics 2
plot \
fName u 1:2:3 w circles lc rgb 'red' lw 2, \
'' u 1:2:4 w labels tc rgb 'blue'
This produces:
Alternatively, one might want to put those labels into the legend of the graph. Perhaps there is a more elegant solution; nevertheless, one way is to plot each line of the data file separately and extract the fourth column (to be used as the key title) manually:
set terminal pngcairo
set output 'fig2.png'
fName = 'circles.dat'
unset key
set xr [3:6]
set yr [4:7]
set size square
set tics out nomirror
set xtics 3,1,6
set mxtics 2
set ytics 4,1,7
set mytics 2
set key top right reverse
stat fName nooutput
plot \
for [i=0:STATS_records-1] fName u 1:2:3 every ::i::i w circles t system(sprintf("awk 'NR==%d{print $4}' '%s'", i+1, fName))
This gives:
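The awk call in the plot command picks the fourth field of a given line. For readers less familiar with awk, the same lookup can be sketched in Python; `key_title` is a hypothetical helper that works on a list of lines rather than on the file itself:

```python
def key_title(lines, row, column=4):
    """Mimic awk 'NR==row{print $column}'; row and column are 1-based."""
    fields = lines[row - 1].split()
    # awk prints an empty string when the field does not exist
    return fields[column - 1] if len(fields) >= column else ""

circles = ["5 6.0 0.1 1", "5 5.5 0.1 2", "4 5.0 0.2 3"]
print([key_title(circles, i + 1) for i in range(len(circles))])  # ['1', '2', '3']
```

The loop index i in the gnuplot command is 0-based, which is why i+1 is passed as the line number.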

gnuplot - intersection of two plots

I am using gnuplot to plot data from two separate csv files (found in this link: https://drive.google.com/open?id=0B2Iv8dfU4fTUZGV6X1Bvb3c4TWs) with a different number of rows which generates the following graph.
These data seem to have no common timestamp (the first column) in both csv files and yet gnuplot seems to fit the plotting as shown above.
Here is the gnuplot script that I use to generate my plot.
# ###### GNU Plot
set style data lines
set terminal postscript eps enhanced color "Times" 20
set output "output.eps"
set title "Actual vs. Estimated Comparison"
set style line 99 linetype 1 linecolor rgb "#999999" lw 2
#set border 1 back ls 11
set key right top
set key box linestyle 50
set key width -2
set xrange [0:10]
set key spacing 1.2
#set nokey
set grid xtics ytics mytics
#set size 2
#set size ratio 0.4
#show timestamp
set xlabel "Time [Seconds]"
set ylabel "Segments"
set style line 1 lc rgb "#ff0000" lt 1 pi 0 pt 4 lw 4 ps 0
plot "estimated.csv" using ($1):2 with lines title "Estimated", "actual.csv" using ($1):2 with lines title "Actual";
Is there any way where we can print out (write to a file) the values of the intersection of these plots by ignoring the peaks above green plot? I also have tried to do an sql-join query but it doesn't seem to print out anything for the same reason I explained above.
PS: If the blue line doesn't touch the green line (i.e. if it is way below the green line), I want to take the values of the closest green line so that it will be a one-to-one correspondence (or very close) with the actual dataset.
Perhaps one could somehow force Gnuplot to reinterpolate both data sets on a fine grid, save this auxiliary data and then compare it row by row. However, I think that it's indeed much more practical to delegate this task to an external tool.
It's certainly not the most efficient way to do it, nevertheless a "lazy approach" could be to read the data points, interpret each dataset as a LineString (collection of line segments, essentially equivalent to assuming a linear interpolation between data points) and then calculate the intersection points. In Python, the script to do this might look like this:
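#!/usr/bin/env python
import sys
import numpy as np
from shapely.geometry import LineString
#-------------------------------------------------------------------------------
def load_data(fname):
    # read the comma-separated columns as the vertices of a LineString
    return LineString(np.genfromtxt(fname, delimiter=','))
#-------------------------------------------------------------------------------
lines = list(map(load_data, sys.argv[1:]))
result = lines[0].intersection(lines[1])
# multi-part results expose their parts via .geoms (direct iteration no longer
# works in Shapely 2.x); a single geometry is wrapped in a list
for g in getattr(result, 'geoms', [result]):
    if g.geom_type != 'Point':
        continue
    print('%f,%f' % (g.x, g.y))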
Then in Gnuplot, one can invoke it directly:
set terminal pngcairo
set output 'fig.png'
set datafile separator comma
set yr [0:700]
set xr [0:10]
set xtics 0,2,10
set ytics 0,100,700
set grid
set xlabel "Time [seconds]"
set ylabel "Segments"
plot \
'estimated.csv' w l lc rgb 'dark-blue' t 'Estimated', \
'actual.csv' w l lc rgb 'green' t 'Actual', \
'<python filter.py estimated.csv actual.csv' w p lc rgb 'red' ps 0.5 pt 7 t ''
which gives:
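If installing Shapely is not an option, the same idea (linear interpolation between data points, then intersecting every pair of segments) can be sketched in plain Python. This is a simplified illustration with hypothetical function names: parallel and collinear segments are ignored, and a crossing that falls exactly on a shared vertex may be reported twice.

```python
def segment_intersection(p1, p2, q1, q2):
    """Intersection point of segments p1-p2 and q1-q2, or None if they do not cross."""
    (x1, y1), (x2, y2), (x3, y3), (x4, y4) = p1, p2, q1, q2
    den = (x2 - x1) * (y4 - y3) - (y2 - y1) * (x4 - x3)
    if den == 0:
        return None  # parallel or collinear segments are skipped in this sketch
    t = ((x3 - x1) * (y4 - y3) - (y3 - y1) * (x4 - x3)) / den
    u = ((x3 - x1) * (y2 - y1) - (y3 - y1) * (x2 - x1)) / den
    if 0 <= t <= 1 and 0 <= u <= 1:
        return (x1 + t * (x2 - x1), y1 + t * (y2 - y1))
    return None

def polyline_intersections(a, b):
    """All crossing points between two polylines given as lists of (x, y) points."""
    points = []
    for p1, p2 in zip(a, a[1:]):
        for q1, q2 in zip(b, b[1:]):
            hit = segment_intersection(p1, p2, q1, q2)
            if hit is not None:
                points.append(hit)
    return points

estimated = [(0, 0), (1, 2), (2, 0)]   # made-up data for illustration
actual = [(0, 1), (2, 1)]
print(polyline_intersections(estimated, actual))  # [(0.5, 1.0), (1.5, 1.0)]
```

The double loop is quadratic in the number of points, so for large files a library such as Shapely is likely the better choice.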
