Gnuplot: Violin plot with data from file

I created a script, based on the gnuplot demo scripts, that plots the columns of a dataset as violin plots to show the distribution of the data points. However, I can't solve the following error:
"violinplot.gnu", line 27: all points y value undefined!
Does anyone have any idea?
The script:
reset
set terminal pdfcairo size 20,14 enhanced font 'Times,28'
set output 'violinplot.0.pdf'
set datafile separator ','
set table $kdensity1
plot 'profile.csv' using 2:(1) smooth kdensity bandwidth 10. with filledcurves above y lt 9 title 'B'
unset table
unset key
print $kdensity1
set border 2
#unset margins
#unset xtics
set ytics nomirror rangelimited
set title "Distribution of times in milliseconds"
set boxwidth 0.075
set style fill solid bo -1
set errorbars lt black lw 5
set xrange [-6:6]
plot $kdensity1 using (1 + $2/1.):1 with filledcurve x=1 lt 10, \
$kdensity1 using (1 - $2/1.):1 with filledcurve x=1 lt 10
The dataset is in CSV format as follows (each column contains times in milliseconds):
1,1814,604,840,1306,13623
2,2195,68,908,1380,14416
3,1173,70,887,512,14301
4,1286,112,982,1541,9549
5,630,97,869,1321,5725
6,1227,689,917,393,4700
7,3402,357,951,500,5431
8,3429,120,969,1661,6281
...
Gnuplot Version 5.2 patchlevel 2

The reason for your error is simple but "nasty" and hidden.
Your input data is comma-separated. However, when you plot to a table via set table $kdensity, the table is written with whitespace as the column separator. Since set datafile separator ',' is still active when you read the table back, gnuplot doesn't find any data in column 2.
I think that since gnuplot 5.2.2 you could use set table $kdensity separator comma. But to actually get a comma as separator, you have to use the "plotting style" with table (e.g. plot FILE u 1:2 w table). However, with table and smooth ... do not work together: either you use with table and get the comma but no smoothing, or you smooth and don't get the comma.
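A tiny Python illustration of the mismatch, assuming a simplified two-column row as set table might write it (the values are made up): splitting it on ',' yields a single field, so column 2 is undefined, while splitting on whitespace yields both columns.

```python
# Made-up, simplified row as written by `set table` (whitespace-separated).
table_line = "1234.5 0.00017"

# What `set datafile separator ','` effectively does when reading it back:
fields_comma = table_line.split(",")
# What `set datafile separator whitespace` does:
fields_space = table_line.split()

print(len(fields_comma), fields_comma)  # one field only: column 2 is undefined
print(len(fields_space), fields_space)  # two fields: column 2 is the density
```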
Two possible solutions:
after plotting to the smoothed table, set your separator back to whitespace (see the example below),
or alternatively,
change your input data to be whitespace-separated.
If you want to plot the original (comma-separated) data as well, you have two different column separators in the same plot, and you have to apply another workaround.
Script: (works with gnuplot>=5.2.2)
### violin plot with comma separated input data
reset session
# create some random test data (comma separated)
set table $Data separator comma
set samples 100
n = 0
plot for [i=1:3] '+' u (n=n+1):(invnorm(rand(0))*i*25 +i*200) w table
unset table
set datafile separator ','
set table $kdensity
set samples 1000
plot $Data using 2:(1) smooth kdensity bandwidth 10.
unset table
set datafile separator whitespace
set key noautotitle
set style fill solid 0.7
plot $kdensity u (1 + $2/1.):1 w filledcurves x=1 lt 10, \
'' u (1 - $2/1.):1 w filledcurves x=1 lt 10
### end of script
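To see what the violin script actually draws, here is a minimal Python sketch of a Gaussian kernel density estimate mirrored about x = 1, like the two curves (1 + $2/1.) and (1 - $2/1.). The function names, the normalisation (weight 1/N per point), the scale factor and the random test data are assumptions for illustration, not gnuplot's internal implementation of smooth kdensity (see help kdensity for its exact scaling).

```python
import math
import random


def kdensity(samples, bandwidth, grid, weight=1.0):
    """Weighted Gaussian kernel density estimate evaluated at each point of `grid`.

    Each sample adds weight * N(g; sample, bandwidth). With weight=1/len(samples)
    the curve integrates to about 1.
    """
    c = weight / (bandwidth * math.sqrt(2.0 * math.pi))
    return [c * sum(math.exp(-0.5 * ((g - s) / bandwidth) ** 2) for s in samples)
            for g in grid]


def violin_outline(samples, bandwidth, center=1.0, scale=1.0, n=200):
    """Return (value, left, right) triples: the density mirrored about `center`."""
    lo = min(samples) - 3 * bandwidth
    hi = max(samples) + 3 * bandwidth
    grid = [lo + (hi - lo) * k / (n - 1) for k in range(n)]
    dens = kdensity(samples, bandwidth, grid, weight=1.0 / len(samples))
    return [(v, center - d * scale, center + d * scale) for v, d in zip(grid, dens)]


random.seed(1)
data = [random.gauss(200, 25) for _ in range(100)]  # similar to the i=1 test data
outline = violin_outline(data, bandwidth=10.0, scale=100.0)
widest = max(outline, key=lambda t: t[2] - t[1])
print("widest at value %.1f, half-width %.3f" % (widest[0], widest[2] - 1.0))
```

The bandwidth plays the same role as bandwidth 10. in the script: a larger value gives a smoother, wider violin.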
Result:

Related

Gnuplot multi column plot using CSV headings

I'm struggling to get a multi-column bar chart / histogram working with my input as a CSV with headings, and to get the key to show the {wcfiles,wclines,clocfiles,cloclines} attributes.
$summary << EOD
browser,wcfiles,wclines,clocfiles,cloclines
webkitgtk-2.28.2,19472,4710385,18620,3120740
firefox-78.0.1,289298,43627834,240137,24371602
chromium-83.0.4103.116,420343,100340817,269434,49597826
EOD
set datafile separator ','
set yrange [0:*] # start at zero, find max from the data
set style fill solid border -1
set ytics format "%.0s%c" # will generate labels 100k 200k 300k ... 1M
set title 'sloc the Web'
plot '$summary' using 0:2:($0+1):xtic(1) with boxes lc variable,\
"" u 3 title "wclines",\
"" u 4 title "clocfiles"

Check the examples @Ethan mentioned.
In your case you should use set logscale y; otherwise it will be difficult to visualize values that differ by several orders of magnitude.
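A quick Python check of that claim, using the values from columns 2-4 of the CSV in the question (the ones plotted below): the data span almost four decades, so on a linear axis the smallest bar would be a tiny fraction of the plot height.

```python
import math

# Columns 2-4 of the CSV in the question
wcfiles = [19472, 289298, 420343]
wclines = [4710385, 43627834, 100340817]
clocfiles = [18620, 240137, 269434]

values = wcfiles + wclines + clocfiles
lo, hi = min(values), max(values)
decades = math.log10(hi / lo)  # powers of ten between smallest and largest value
share = lo / hi                # height of the smallest bar on a linear 0..max axis
print(f"range {lo} .. {hi}: {decades:.2f} decades")
print(f"smallest bar on a linear axis: {share:.4%} of the plot height")
```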
Code:
### histogram clustered
reset session
$Data <<EOD
browser,wcfiles,wclines,clocfiles,cloclines
webkitgtk-2.28.2,19472,4710385,18620,3120740
firefox-78.0.1,289298,43627834,240137,24371602
chromium-83.0.4103.116,420343,100340817,269434,49597826
EOD
set datafile separator ','
set title 'sloc the Web'
set yrange [1000:*]
set logscale y
set ytics format "%.0s%c"
set style data histogram
set style histogram cluster gap 1
set style fill solid border -1
set boxwidth 0.9
plot $Data u 2:xtic(1) ti col,\
'' u 3 ti col,\
'' u 4 ti col
### end of code
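The tick format "%.0s%c" splits each value into a mantissa scaled by a power of 1000 (%s) and the matching SI prefix character (%c). The Python function below is only a rough imitation of that behaviour for values >= 1, written as an assumption for illustration and not taken from gnuplot's source.

```python
# Assumed SI prefixes for powers of 1000 (values >= 1 only in this sketch)
PREFIXES = {0: "", 3: "k", 6: "M", 9: "G", 12: "T"}


def si_label(value):
    """Return e.g. '100k' for 100000 or '1M' for 1000000 (values >= 1 only)."""
    exponent = 0
    # move to the next power of 1000 while the value still reaches it
    while exponent < 12 and value >= 10 ** (exponent + 3):
        exponent += 3
    mantissa = value / 10 ** exponent
    return "%.0f%s" % (mantissa, PREFIXES[exponent])


for v in (1000, 200000, 1000000, 100340817):
    print(v, "->", si_label(v))
```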
Result:

gnuplot - intersection of two plots

I am using gnuplot to plot data from two separate CSV files (found at this link: https://drive.google.com/open?id=0B2Iv8dfU4fTUZGV6X1Bvb3c4TWs) with different numbers of rows, which generates the following graph.
The two CSV files seem to have no timestamps (the first column) in common, and yet gnuplot plots them together as shown above.
Here is the gnuplot script that I use to generate my plot.
# ###### GNU Plot
set style data lines
set terminal postscript eps enhanced color "Times" 20
set output "output.eps"
set title "Actual vs. Estimated Comparison"
set style line 99 linetype 1 linecolor rgb "#999999" lw 2
#set border 1 back ls 11
set key right top
set key box linestyle 50
set key width -2
set xrange [0:10]
set key spacing 1.2
#set nokey
set grid xtics ytics mytics
#set size 2
#set size ratio 0.4
#show timestamp
set xlabel "Time [Seconds]"
set ylabel "Segments"
set style line 1 lc rgb "#ff0000" lt 1 pi 0 pt 4 lw 4 ps 0
plot "estimated.csv" using ($1):2 with lines title "Estimated", "actual.csv" using ($1):2 with lines title "Actual";
Is there any way to print out (write to a file) the intersection points of these plots, ignoring the peaks above the green plot? I have also tried an SQL join query, but it doesn't print anything, for the same reason explained above.
PS: If the blue line doesn't touch the green line (i.e. if it is way below it), I want to take the closest values of the green line, so that there is a one-to-one (or very close) correspondence with the actual dataset.

Perhaps one could somehow force gnuplot to reinterpolate both data sets on a fine grid, save this auxiliary data and then compare it row by row. However, I think it's much more practical to delegate this task to an external tool.
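For illustration, here is a minimal NumPy sketch of that fine-grid idea done outside gnuplot: both curves are reinterpolated with np.interp on a common grid over their overlapping x range, and a crossing is reported wherever the difference changes sign. The function name, the grid size and the toy data are assumptions; the sketch requires each x column to be sorted, misses crossings that fall exactly on a grid point, and does not handle the "ignore the peaks" requirement.

```python
import numpy as np


def crossings(x1, y1, x2, y2, n=10000):
    """Approximate intersection points of two piecewise-linear curves."""
    x1, y1, x2, y2 = (np.asarray(v, dtype=float) for v in (x1, y1, x2, y2))
    # common grid over the x range covered by both curves
    lo, hi = max(x1[0], x2[0]), min(x1[-1], x2[-1])
    grid = np.linspace(lo, hi, n)
    a = np.interp(grid, x1, y1)
    b = np.interp(grid, x2, y2)
    diff = a - b
    # cells where the difference changes sign contain a crossing
    idx = np.nonzero(diff[:-1] * diff[1:] < 0)[0]
    # linear interpolation of the zero of `diff` inside each cell
    t = diff[idx] / (diff[idx] - diff[idx + 1])
    xs = grid[idx] + t * (grid[idx + 1] - grid[idx])
    ys = a[idx] + t * (a[idx + 1] - a[idx])
    return np.column_stack([xs, ys])


# Toy data: a "tent" crossing a horizontal line twice
pts = crossings([0, 1, 2], [0, 2, 0], [0, 0.5, 2], [1, 1, 1])
print(pts)
```

With the files from the question, one would load them with np.genfromtxt('estimated.csv', delimiter=',') and np.genfromtxt('actual.csv', delimiter=',') and pass their first and second columns.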
It's certainly not the most efficient way to do it, but a "lazy approach" could be to read the data points, interpret each dataset as a LineString (a collection of line segments, essentially equivalent to assuming linear interpolation between data points) and then calculate the intersection points. In Python, the script to do this might look like this:
#!/usr/bin/env python
import sys
import numpy as np
from shapely.geometry import LineString
#-------------------------------------------------------------------------------
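def load_data(fname):
    # each "x,y" row of the CSV becomes one vertex of the polyline
    return LineString(np.genfromtxt(fname, delimiter = ','))
#-------------------------------------------------------------------------------
lines = list(map(load_data, sys.argv[1:]))
inter = lines[0].intersection(lines[1])
# the result may be a single Point, a MultiPoint or a GeometryCollection;
# multi-part geometries must be iterated via .geoms (required by Shapely >= 2.0)
for g in getattr(inter, 'geoms', [inter]):
    if g.is_empty or g.geom_type != 'Point':
        continue
    print('%f,%f' % (g.x, g.y))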
Assuming the script is saved as filter.py, one can then invoke it directly from gnuplot:
set terminal pngcairo
set output 'fig.png'
set datafile separator comma
set yr [0:700]
set xr [0:10]
set xtics 0,2,10
set ytics 0,100,700
set grid
set xlabel "Time [seconds]"
set ylabel "Segments"
plot \
'estimated.csv' w l lc rgb 'dark-blue' t 'Estimated', \
'actual.csv' w l lc rgb 'green' t 'Actual', \
'<python filter.py estimated.csv actual.csv' w p lc rgb 'red' ps 0.5 pt 7 t ''
which gives:

Clustered bar plot in gnuplot with errorbars

I'm new to gnuplot, and I've followed this question, which plots the data the way I want. However, I'd very much like to include error bars as well. I tried to do so by adding min and max error columns as follows:
Browser,Video,min,max,Audio,min,max,Flash,min,max,HTML,min,max,JavaScript,min,max
IE,30%,5,5,10%,5,5,25%,5,5,20%,5,5,15%,5,5
Chrome,20%,5,5,5%,5,5,35%,5,5,30%,5,5,10%,5,5
Which I then try to plot with the script modified as follows:
set terminal pdf enhanced
set output 'bar.pdf'
set style data histogram
set style histogram cluster gap 1
set style fill solid border rgb "black"
set auto x
set yrange [0:*]
set datafile separator ","
plot 'data.dat' using 2:xtic(1) title col with yerrorbars, \
'' using 3:xtic(1) title col with yerrorbars, \
'' using 4:xtic(1) title col with yerrorbars, \
'' using 5:xtic(1) title col with yerrorbars, \
'' using 6:xtic(1) title col with yerrorbars
From what I've read, this should also plot error bars, but I get the error:
"plot2", line 16: Not enough columns for this style
Googling this error suggests that it has something to do with the first column being non-numeric. I've tried a few suggestions, including this one, but nothing has worked so far. Any suggestions? Thanks.

This error tells you that the yerrorbars plotting style requires more than one column (the xtic(1) entry is handled specially and doesn't count as a data column). Looking at the documentation, you can see that you can use either two, three or four columns. I won't go into more detail, because with yerrorbars selects a completely different plotting style and you don't get a histogram at all.
To plot clustered histograms with error bars, you must add errorbars to the histogram style definition, and of course you must give the column with the y-error values:
set style data histogram
set style histogram cluster gap 1 errorbars
set style fill solid border rgb "black"
set auto x
set yrange [0:*]
set datafile separator ","
plot 'data.dat' using 2:3:xtic(1) title col(2),\
'' using 5:6 title col(5), \
'' using 8:9 title col(8), \
'' using 11:12 title col(11), \
'' using 14:15 title col(14)
Or, in shorter notation:
plot for [i=2:14:3] 'data.dat' using i:i+1:xtic(1) title col(i)
If you explicitly need to plot min and max values, then you must use a third column. But then the last two columns are ymin and ymax, not delta values. Judging from your data file, the values in it are deltas, so the plot command should be:
plot for [i=2:14:3] 'data.dat' using i:(column(i) - column(i+1)):(column(i) + column(i+2)):xtic(1) title col(i)
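To double-check which columns that loop picks up, here is a small Python sketch that mimics the column arithmetic with 1-based indices. It uses the question's data with the "%" signs removed, which is an assumption made only for this illustration; the helper column() is hypothetical and merely imitates gnuplot's column(i).

```python
import csv
import io

# The question's data with the "%" signs removed (assumption for this sketch)
DATA = """Browser,Video,min,max,Audio,min,max,Flash,min,max,HTML,min,max,JavaScript,min,max
IE,30,5,5,10,5,5,25,5,5,20,5,5,15,5,5
Chrome,20,5,5,5,5,5,35,5,5,30,5,5,10,5,5
"""

rows = list(csv.reader(io.StringIO(DATA)))
header, body = rows[0], rows[1:]


def column(row, i):
    """1-based numeric column access, imitating gnuplot's column(i)."""
    return float(row[i - 1])


bars = []
for row in body:
    for i in range(2, 15, 3):  # i = 2, 5, 8, 11, 14, like `for [i=2:14:3]`
        y = column(row, i)
        ylow = y - column(row, i + 1)   # (column(i) - column(i+1))
        yhigh = y + column(row, i + 2)  # (column(i) + column(i+2))
        bars.append((row[0], header[i - 1], y, ylow, yhigh))

for bar in bars:
    print("%-7s %-10s y=%4.1f  ylow=%4.1f  yhigh=%4.1f" % bar)
```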

Gnuplot interchanging Axes

I would like to reproduce this plot with gnuplot:
My data has three columns:
1: time
2: price
3: volume
I tried this:
plot file using 1:2 with lines, '' using 1:3 axes x1y2 with impulses
This gives a normal time series chart with price on y1 and volume on y2.
Next, I tried:
plot file using 2:1 with lines, '' using 2:3 axes x1y2 with impulses
This gives the price series with time on y1 and volume on y2.
However, I need the price to remain at y1 and volume at x2.
Maybe something like:
plot file using 1:2 with lines,' ' using 2:3 axes y1x2 with impulses
However, that does not give what I want.

Gnuplot has no official way to draw this kind of horizontal box plot. However, you can use the boxxyerrorbars style (shorthand boxxy) to achieve it.
Since I don't have test data for your actual example, I generated a data file from a Gaussian random walk. To generate the data, run the following Python script:
from numpy import zeros, savetxt, random
N = 500
g = zeros(N)
for i in range(1, N):
    g[i] = g[i-1] + random.normal()  # each step adds a standard normal increment
savetxt('randomwalk.dat', g, delimiter='\t', fmt='%.3f')
Next, I bin the 'position data' (in your case, the price values, with the volume as the histogram weight). For this one can use smooth frequency, which computes the sum of the y values for identical x values. So first I use a proper binning function, which returns the same value for all x within a certain range (bin centre +- binwidth/2). The output data is saved to a file, because for plotting we must exchange the x and y values:
binwidth = 2
hist(x) = floor(x/binwidth + 0.5)   # bin index: bin k covers k*binwidth +- binwidth/2
set output "| head -n -2 > randomwalk.hist"
set table
plot 'randomwalk.dat' using (hist($1)):(1) smooth frequency
unset table
unset output
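The binning idea can be sketched in Python as follows, assuming a bin index of floor(x/binwidth + 0.5), so that all values within binwidth/2 of a bin centre k*binwidth land in the same bin, and a weight of 1 per point as in using (...):(1) smooth frequency. The function names and the sample positions are made up for illustration.

```python
import math
from collections import Counter


def hist_bin(x, binwidth):
    """Bin index of x: bin k covers [k*binwidth - binwidth/2, k*binwidth + binwidth/2)."""
    return math.floor(x / binwidth + 0.5)


def frequency(values, binwidth):
    """Sum a weight of 1 per value in each bin, like `smooth frequency`.

    Returns sorted (bin centre, count) pairs; the centre is k * binwidth.
    """
    counts = Counter(hist_bin(v, binwidth) for v in values)
    return [(k * binwidth, n) for k, n in sorted(counts.items())]


walk = [0.0, 0.4, 1.2, 2.9, 3.1, -0.8, -1.1]  # made-up positions
print(frequency(walk, binwidth=2))
```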
Normally one should be able to use set table "randomwalk.hist", but due to a bug, one needs this workaround to filter out the last entry of the table output (see my answer to "Why does the 'set table' option in Gnuplot re-write the first entry in the last line?").
Now the actual plotting part is:
unset key
set x2tics
set xtics nomirror
set xlabel 'time step'
set ylabel 'position value'
set x2label 'frequency'
set style fill solid 1.0 border lt -1
set terminal pngcairo
set output 'randomwalk.png'
plot 'randomwalk.hist' using ($2/2.0):($1*binwidth):($2/2.0):(binwidth/2.0) with boxxy lc rgb '#00cc00' axes x2y1,\
'randomwalk.dat' with lines lc rgb 'black'
This gives the following result (with gnuplot 4.6.3; of course it depends on your random data):
So, for your data structure, the following script should work:
reset
binwidth = 2
hist(x) = floor(x/binwidth + 0.5)   # bin index: bin k covers k*binwidth +- binwidth/2
file = 'data.txt'
histfile = 'pricevolume.hist'
set table histfile
plot file using (hist($2)):($3) smooth unique
unset table
# get the number of records to skip the last one
stats histfile using 1 nooutput
unset key
set x2tics
set xtics nomirror
set xlabel 'time'
set ylabel 'price'
set x2label 'volume'
set style fill solid 1.0 border lt -1
plot histfile using ($2/2.0):($1*binwidth):($2/2.0):(binwidth/2.0) every ::::(STATS_records-2) with boxxy lc rgb '#00cc00' axes x2y1,\
file using 1:2 with lines lc rgb 'black'
Note that this time the last table entry is skipped by counting all entries with the stats command and then excluding the last one with every (yes, STATS_records-2 is correct, because point numbering starts at 0). This variant doesn't need any external tool.
I also use smooth unique, which replaces all points with the same x value by a single point with the average y value, instead of the sum (which is what smooth frequency computes).
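The difference between the two smoothing modes can be sketched in Python: points are grouped by identical x, and the y values are either summed (frequency) or averaged (unique). The function name and the (price bin, volume) pairs are made up for illustration.

```python
from collections import defaultdict


def smooth(points, mode):
    """Group (x, y) points by identical x and combine their y values.

    mode 'frequency' sums the y values, mode 'unique' averages them;
    the result is sorted by x.
    """
    groups = defaultdict(list)
    for x, y in points:
        groups[x].append(y)
    if mode == "frequency":
        combine = sum
    else:
        combine = lambda ys: sum(ys) / len(ys)
    return [(x, combine(ys)) for x, ys in sorted(groups.items())]


# made-up (price bin, volume) pairs
pts = [(10, 300), (10, 100), (12, 50), (14, 20), (14, 40)]
print(smooth(pts, "frequency"))
print(smooth(pts, "unique"))
```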

In gnuplot, how do I plot data from two different files in a single plot?

I have two different files to plot in gnuplot. They use (a) different separators and (b) different time formats on the x-axis,
so to plot each of them separately I need to set
set datafile separator
set timefmt
I would like to overlay both data sets in a single graph so that they are aligned in time.
How can I do this?

The problem with the different separators can be addressed by giving a format string after the using specification, which lets you set a different separator for each file, e.g.:
plot 'file1.dat' u 1:2 '%lf,%lf'
plots a two-column, comma-separated file. See help using for more detail.
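I am not an expert on time formats, so I don't know how to deal with the timestamp problem. But maybe you can convert the timestamps with a function like strptime(), which parses a time string according to a given format (strftime() does the reverse and formats a time value as a string). I never tried it, but it seems to me it does what you need.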
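The underlying idea is to bring both timestamps to the same numeric scale before plotting. A minimal Python sketch, assuming one file uses 'YYYY-MM-DD' dates (gnuplot timefmt '%F') and the other epoch seconds (timefmt '%s'), as in the next answer; the function names and sample values are hypothetical. In gnuplot itself, the corresponding parsing function is strptime("format", string).

```python
from datetime import datetime, timezone


def from_iso_date(s):
    """'YYYY-MM-DD' (gnuplot timefmt '%F') -> seconds since the Unix epoch, UTC."""
    return datetime.strptime(s, "%Y-%m-%d").replace(tzinfo=timezone.utc).timestamp()


def from_epoch(s):
    """Timestamp that is already in seconds since the epoch (gnuplot timefmt '%s')."""
    return float(s)


a = from_iso_date("2020-01-02")
b = from_epoch("1577923200")
print(a, b, a == b)  # both describe 2020-01-02 00:00:00 UTC
```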

You're right, you will need to pass set datafile separator and set timefmt once per file. You can do it like this:
set terminal <whatever>
set output <whatever.wht>
set xdata time # tell gnuplot to parse x data as time
set format x '%F' # time format to display on plot x axis
set datafile separator ' ' # separator 1
set timefmt '%F' # time format 1
plot 'file1'
set datafile separator ',' # separator 2
set timefmt '%s' # time format 2
replot 'file2'
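The replot command re-executes the previous plot command, and if you specify another item, it is drawn on top of the first one, as done here. Be aware, however, that replot re-reads every file of the previous plot command with the settings in effect at that moment, so here 'file1' is also read with the second separator and time format.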

It seems to me that you have 2 options. The first is to pick a datafile format and beat both datafiles into that format, maybe using awk:
plot "< awk -F';' '{print $1, $2}' data1" using 1:2 w lines,\
'data2' using 1:2 w lines
Note: your awk command will almost certainly be different; this just shows how to use awk in an inline pipe.
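The same preprocessing can also be sketched in Python if awk isn't available. The function name and the made-up rows are assumptions; wrapped in a small script that reads a file and prints the converted lines, it could be used in an inline pipe the same way as the awk command.

```python
def first_two_columns(lines, sep):
    """Yield the first two `sep`-separated fields of each line, joined by a space.

    Roughly what  awk -F';' '{print $1, $2}'  does for sep=';' (no quoting rules).
    """
    for line in lines:
        fields = line.rstrip("\n").split(sep)
        yield " ".join(fields[:2])


sample = ["2020-01-01;5;x\n", "2020-01-02;7\n"]  # made-up ';'-separated rows
for out in first_two_columns(sample, ";"):
    print(out)
```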
Your second option is to use multiplot with explicit axes alignment:
set multiplot
set xdata time
set datafile sep ';' #separator for first file
set timefmt "..." #time format for first file
set lmargin at screen 0.1
set rmargin at screen 0.9
set tmargin at screen 0.9
set bmargin at screen 0.1
#also set xrange/yrange explicitly, otherwise each plot autoscales separately and the axes won't line up
unset key
plot 'data1' u 1:2 w lines ls 1 notitle
set key #The second plot command needs to add both "titles" to the legend/key.
set datafile sep ',' #separator for second file
set timefmt "..." #time format for second file
unset border
unset xtics
unset ytics
#unset other stuff that you set to prevent it from being plotted twice.
plot NaN w lines ls 1 title "title-for-plot-1", \
'data2' u 1:2 w lines ls 2 title "title-for-plot-2"
unset multiplot
The plot NaN trick is only necessary if you want things to show up correctly in the legend. If you're not using a legend, you don't need to worry about it.

This works for me:
reset
set term pngcairo
set output 'wall.png'
set xlabel "Length (meter)"
set ylabel "error (meter)"
set style line 1 lt 1 linecolor rgb "yellow" lw 10 pt 1
set style line 2 lt 1 linecolor rgb "green" lw 10 pt 1
set style line 3 lt 1 linecolor rgb "blue" lw 10 pt 1
set datafile separator ","
set key
set auto x
set xtics 1, 2, 9
set yrange [2:7]
set grid
set label "(Disabled)" at -.8, 1.8
plot "file1.csv" using 1:2 ls 1 title "one" with lines ,\
"file2.csv" using 1:2 ls 2 title "two" with lines ,\
"file3.csv" using 1:2 ls 3 title "three" with lines
set output
