My question regards the keyword every, which is used to sample an input data file (e.g., .csv, .dat). I am reading the documentation of the keyword, which says the following:
plot 'file' every {<point_incr>}
{:{<block_incr>}
{:{<start_point>}
{:{<start_block>}
{:{<end_point>}
{:<end_block>}}}}}
The thing is, I cannot completely comprehend how to adapt this to a data set. For instance, suppose I have some dummy data that I want to plot as a bar chart, and the data are the following:
# first bars group
#x axis #y axis
0 2
0.2 3
0.4 4
0.6 5
0.8 6

#second bars group
1 1
1.2 2
1.4 3
1.6 4
1.8 5

#etc.
3 10
3.2 20
3.4 30
3.6 40
3.8 50

4 20
4.2 30
4.4 40
4.6 50
4.8 60
And let's say that I want to create four bar clusters from the data, one for every block. How can I use the syntax of the keyword? Could someone give me some examples to better understand its use? Thank you in advance.
As you've found, the every keyword allows you to cherry-pick a subset of single-newline-separated points and double-newline-separated blocks from your datafile. Your example datafile shows 20 points divided into 4 blocks.
So to plot the first block (indexed 0 in gnuplot), you only need to specify the end block, and use the default values for the other every parameters. Try:
plot 'data.txt' every :::::0 with boxes
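To get a feel for the other fields, here is a small sketch that applies a few selections to the same data. It assumes the four blocks are separated by single blank lines and uses an inline datablock (gnuplot 5.0 or newer) instead of 'data.txt'. The field order is point_incr:block_incr:start_point:start_block:end_point:end_block, and all counters start at 0.

```gnuplot
# Sketch only: the question's 20 points as four blocks, split by single blank lines
$Data <<EOD
0 2
0.2 3
0.4 4
0.6 5
0.8 6

1 1
1.2 2
1.4 3
1.6 4
1.8 5

3 10
3.2 20
3.4 30
3.6 40
3.8 50

4 20
4.2 30
4.4 40
4.6 50
4.8 60
EOD

set boxwidth 0.2
set style fill solid 0.3
set yrange [0:70]

set multiplot layout 2,2
# every second point of every block
plot $Data every 2 with boxes title "every 2"
# points 1 through 3 of every block
plot $Data every ::1::3 with boxes title "every ::1::3"
# all points, but only every second block (blocks 0 and 2)
plot $Data every :2 with boxes title "every :2"
# all points of blocks 1 through 2
plot $Data every :::1::2 with boxes title "every :::1::2"
unset multiplot
```

Empty fields keep their defaults, which is why most of the colons have nothing between them.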
It seems your goal is to plot each block with separate styling. Here's how you could do that with a few extra styling commands. (Note my use of gnuplot's shorthand for some keywords.)
set key left top
set boxwidth 0.2
p 'data.txt' ev :::0::0 w boxes t 'first',\
'data.txt' ev :::1::1 w boxes t 'second',\
'data.txt' ev :::2::2 w boxes t 'third',\
'data.txt' ev :::3::3 w boxes t 'fourth'
From help every:
The data points to be plotted are selected according to a loop from
<start_point> to <end_point> with increment <point_incr> and the
blocks according to a loop from <start_block> to <end_block> with
increment <block_incr>.
This should be pretty clear. However, you have to know that if blocks are separated by two (or more) empty lines, you have to address them differently: check help index. In my opinion the documentation is a bit confusing about datablock, (sub-)block, dataset, etc.
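As a rough illustration of the difference, the sketch below uses made-up data in which the two groups are separated by two blank lines, so they count as separate data sets and are selected with index rather than every. The datablock name $Sets and the values are only for demonstration.

```gnuplot
# Sketch only: two data sets, separated by TWO blank lines
$Sets <<EOD
0 2
0.2 3
0.4 4


1 10
1.2 20
1.4 30
EOD

set boxwidth 0.2
set style fill solid 0.3
set yrange [0:40]

# index counts data sets from 0, just as every counts blocks from 0
plot for [i=0:1] $Sets index i with boxes title sprintf("index %d", i)
```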
Check the following example. I assume this is not your final graph and still needs some tuning. Depending on your detailed requirements you might also want to check help histograms.
For example, every :::i::i will plot all datapoints in block i, i.e. from block i to block i.
Code:
### plotting using "every"
reset session
$Data <<EOD
# first bars group
#x axis #y axis
0 2
0.2 3
0.4 4
0.6 5
0.8 6

#second bars group
1 1
1.2 2
1.4 3
1.6 4
1.8 5

#etc.
3 10
3.2 20
3.4 30
3.6 40
3.8 50

4 20
4.2 30
4.4 40
4.6 50
4.8 60
EOD
set key top left
set boxwidth 0.2
set key out noautotitles
set style fill solid 0.3
set yrange [:70]
plot for [i=0:3] $Data u 1:2 every :::i::i w boxes
### end of code
Result:
I am able to successfully reproduce Jitter examples from here: http://gnuplot.sourceforge.net/demo/violinplot.html
However, when I try to use my own data, the points are not "jittered".
Here is the data file (data.dat):
10 1 1 3 8 8 8
20 2 2 3 8 8 8
30 3 3 3 8 8 8
Here is a minimal gnuplot input file:
set jitter
plot 'data.dat' using 1:2 with points, '' u 1:3 with points, '' u 1:4 with points, '' u 1:5 with points, '' u 1:6 with points, '' u 1:7 with points
The points are right on top of each other, whereas I want points that are in the same place to be slightly offset (x-axis).
I've installed the latest version of gnuplot:
$ gnuplot --version
gnuplot 5.2 patchlevel 6
EDIT WITH SOLUTION:
@Ethan's comment cleared it up for me. I'm able to get the jittering by reorganizing my input data file so that it's a single dataset that contains internal "collisions", rather than reading in lots of individual data sets, e.g.:
10 1
10 1
10 3
10 3
20 2
20 2
30 8
30 8
And my gnuplot file is now just:
set jitter
plot 'data.dat' using 1:2 with points
"set jitter" will not work across multiple data sets as noted in the comment. You could do something similar by adding a random displacement in the 'using' specifier.
plot for [col=2:7] 'data.dat' using 1:(column(col) + (rand(0)-0.5)/2.) with points
This is different from "set jitter" because all points will be randomly displaced, whereas with jitter only overlapping points are shifted and the displacement is not random.
Alternatively, since in your case the columns are distinct perhaps you want to shift systematically based on the column number:
plot for [col=2:7] 'data.dat' using (column(1)+col/4.) : (column(col))
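To see the difference between the two behaviours side by side, here is a small self-contained sketch. The data are made up, and it assumes gnuplot 5.2 or newer (for set jitter) with an inline datablock. The left panel uses set jitter with its default settings on a single data set containing collisions; the right panel adds a random x-offset to every point instead.

```gnuplot
# Sketch only: one data set with repeated (x,y) pairs
$Pts <<EOD
10 1
10 1
10 1
10 3
20 2
20 2
30 8
30 8
30 8
EOD

set xrange [0:40]
set yrange [0:10]

set multiplot layout 1,2
# left: only overlapping points are displaced, and not randomly
set jitter
plot $Pts using 1:2 with points pointtype 7 title "set jitter"
# right: every point gets a random x-offset of up to +/- 1
unset jitter
plot $Pts using ($1 + 2*(rand(0)-0.5)):2 with points pointtype 7 title "random offset"
unset multiplot
```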
Suppose I have the following data:
"1,5"
"2,10"
""
"3,4"
"4,2"
""
"5,6"
"6,10"
I want to graph this using gnuplot with a line between each condition, similar to this display:
How might this be accomplished? I have looked into gridlines, but that does not seem to suit my need. I am also looking for a solution that will automatically draw condition / phase lines between each break in the data set.
As mentioned in the comments and explained in the linked question and its answers, you can draw arbitrary lines manually via set arrow ... (check help arrow).
However, if possible I don't want to adjust the lines manually every time I change the data or if I have many different plots.
But, hey, you are using gnuplot, so, make it automated!
To be honest, in the time it took to figure out how this can be done, I could have changed a "few" lines and labels manually ;-). But now, this might be helpful for others.
The script below is written in such a way that it doesn't matter whether you have zero, one or two or more empty lines between the different blocks.
Comments:
the function valid(1) returns 1 if column(1) contains a valid number and 0 otherwise (check help valid).
the vertical lines are plotted with vectors (check help vectors). The x-position is the average of the x-value before the label line and the x-value after the label line. The height LevelY is determined beforehand via stats (check help stats).
the labels are plotted with labels (check help labels) and positioned at the first x-value after each label line and at a y-value of LevelY with an offset.
Script:
### automatic vertical lines and labels
reset session
$Data <<EOD
Baseline
1 10.0
2 12.0
3 10.5
4 11.0 # zero empty lines follow
Treatment
5 45.0
6 35.0
7 32.5
8 31.0 # one empty line follows

Baseline
9 14.0
10 12.8
11 12.0
12 11.3 # two empty lines follow


Treatment
13 35.0
14 45.0
15 45.0
16 37.0
EOD
set offset 1,1,1,1
set border 3
set title "Student Performance" font ",14"
set xlabel "Sessions"
set xtics 1 out nomirror
set ylabel "Number of Responses"
set yrange [0:]
set ytics out nomirror
set key noautotitle
set grid x,y
stats $Data u 2 nooutput
LevelY = STATS_max # get the max y-level
getLinePosX(col) = (v0=v1,(v1=valid(col))?(x0=x1,x1=column(1)):0, v0==0?(x0+x1)/2:NaN)
getLabel(col) = (v0=v1,(v1=valid(col))?0:(h1=strcol(1),h0=h1),column(1))
plot x1=NaN $Data u (y0=(valid(1)?$2:NaN),$1):(y0) w lp pt 13 ps 2 lw 2 lc "red", \
x1=v1=NaN '' u (getLinePosX(1)):(0):(0):(LevelY) w vec nohead lc "black" lw 1.5 dt 2, \
v1=NaN '' u (getLabel(1)):(LevelY):(sprintf("%s",v0==0?h0:'')) w labels left offset 0,1.5 font ",12"
### end of script
Result:
I want to know how to create the plot of a stepwise function in Gnuplot. The function I want to plot includes the operations cost for several distance range and multiple products. For instance, if the distance is 0-300 Km for product 1 the cost is 1.05 $/Km and for product 2, it is 0.86 $/Km. When the distance increases, the cost for each product decrease.
I have defined one function for each product and plotted the functions together:
gnuplot> f(x)=x<=300 ? 1.05 : x<=650 ? 0.65 : x<=1300 ? 0.46 : x<=1950 ? 0.4 : x<=3250 ? 0.31 : 0.22
gnuplot> g(x)=x<=300 ? 0.86 : x<=650 ? 0.53 : x<=1300 ? 0.38 : x<=1950 ? 0.32 : x<=3250 ? 0.24 : 0.19
gnuplot> plot [0:5000][0:3] f(x), g(x)
There is one problem: I can not remove the vertical lines. Any idea?
Thanks for your help
There are basically two approaches you can take. The best approach is to use a datafile, but you can use functions, although it will be more difficult.
Datafile Approach
You are probably going to have trouble doing this as a function, because you are going to get those vertical lines. A datafile gives you a little better control, and even allows you to mark the end points of the pieces of the piecewise function with the typical open/closed dots. Set up your data file with this format:
x y # left point of piece 1
x y # right point of piece 1
# one single blank line
x y # left point of piece 2
x y # right point of piece 2
# one single blank line
...
With your function f, we can do this like
0 1.05
300 1.05

300 0.65
650 0.65

650 0.46
1300 0.46

1300 0.4
1950 0.4

1950 0.31
3250 0.31

3250 0.22
6000 0.22
then plot datafile with lines gives
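For a version you can paste straight into gnuplot, here is a sketch that puts the pieces into an inline datablock (gnuplot 5.0 or newer) instead of a file. As an assumption about what is wanted, it also includes the g values from the question as a second data set, separated from f by two blank lines and selected with index. The name $Steps is only for illustration.

```gnuplot
# Sketch only: each piece is a two-point block; single blank lines break the line
$Steps <<EOD
# data set 0: f
0 1.05
300 1.05

300 0.65
650 0.65

650 0.46
1300 0.46

1300 0.4
1950 0.4

1950 0.31
3250 0.31

3250 0.22
6000 0.22


# data set 1: g (two blank lines above start a new data set)
0 0.86
300 0.86

300 0.53
650 0.53

650 0.38
1300 0.38

1300 0.32
1950 0.32

1950 0.24
3250 0.24

3250 0.19
6000 0.19
EOD

set xrange [0:6000]
set yrange [0:1.2]

# one curve per data set; no vertical lines are drawn between blocks
plot for [k=0:1] $Steps index k with lines title word("f g", k+1)
```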
We can get even fancier with†
plot datafile w lines,\
last=0,\
"" u 1:(oldlast=last,last=$1,$1==oldlast?$2:1/0) w points pt 6 lt 1,\
last=0,\
"" u 1:(oldlast=last,last=$1,$1==oldlast?1/0:$2) w points pt 7 lt 1
to produce
Here we first plot the same curve as before. Then we initialize the variable last to be 0 (the value of the first x coordinate)‡, and plot the open dots.
To do this we evaluate (oldlast=last,last=$1,$1==oldlast?$2:1/0), which first stores the value of last as oldlast and then stores the value of the first column (the x coordinate) as last to use on the next point. Finally we check whether the x-coordinate is the same as the value of oldlast (the x-coordinate of the previous point). If it is, we use the 2nd column value, otherwise we use the unplottable 1/0. This causes points to be plotted only if they are the first point in the two-point blocks. We plot these with points using pointtype 6 (an open point) and linetype 1 (the same as used in the lines).
We do the same thing again, but this time plot the second points with filled dots (pointtype 7).
We can either add the points for the function g to the same file, separating it from the others by two blank lines and then use indexes to refer to them, or create a separate datafile for g. We can then add similar plot commands to the current command. For example, if we use the same file with function f followed by function g, we can do:
plot datafile i 0 w lines,\
last=0,\
"" i 0 u 1:(oldlast=last,last=$1,$1==oldlast?$2:1/0) w points pt 6 lt 1,\
last=0,\
"" i 0 u 1:(oldlast=last,last=$1,$1==oldlast?1/0:$2) w points pt 7 lt 1,\
datafile i 1 w lines,\
last=0,\
"" i 1 u 1:(oldlast=last,last=$1,$1==oldlast?$2:1/0) w points pt 6 lt 1,\
last=0,\
"" i 1 u 1:(oldlast=last,last=$1,$1==oldlast?1/0:$2) w points pt 7 lt 1
Function Approach
As far as getting only one jump, your functions have a lot of redundant conditions. Redefine f (and similarly for g) as
f(x)=x<=300 ? 1.05 : x<=650 ? 0.65 : x<=1300 ? 0.46 : x<=1950 ? 0.4 : x<=3250 ? 0.31 : 0.22
and then plot it. Make sure that the samples are set high enough, otherwise you may end up collecting multiple jumps together or get undesirable slanted lines. With
set xrange[0:6000]
set yrange[0:2]
set samples 1000
plot f(x)
we get
However, this will still draw the vertical connecting lines, which are very hard to avoid with a function. The best way that I can think of is to make the function non-plottable over a short interval just before each break. For f(x), we can do this with
f(x)=x<=290 ? 1.05 : x<=300? (1/0) : x<=640? 0.65 : x<=650 ? (1/0) : x<=1290 ? 0.46 : x<=1300 ? (1/0) : x<=1940? 0.4 : x<=1950 ? (1/0) : x<=3240 ? 0.31: x<=3250? (1/0) : 0.22
Here, we have injected the non-plottable value 1/0 over a region of length 10 just before each break. Smaller lengths can be used as well. If we set the samples high enough to be sure that the sampling hits each of these gaps (in this case 1000 samples, as before, is enough), the points on either side will not be connected.
With samples set too small (for example 100), we might still get the connecting lines
Thus if we use a gap with a size smaller than 10, we may need to use higher sampling to avoid the connecting lines. Larger gaps may work with smaller sampling.
Depending on the sampling, the gaps might be larger than specified as well if the sampling is too low. For example, setting the gaps to a size of 100 with
f(x)=x<=200 ? 1.05 : x<=300? (1/0) : x<=550? 0.65 : x<=650 ? (1/0) : x<=1200 ? 0.46 : x<=1300 ? (1/0) : x<=1850? 0.4 : x<=1950 ? (1/0) : x<=3150 ? 0.31: x<=3250? (1/0) : 0.22
and a sampling of 10, we get
where the gaps have a size of 222.22 (I have added labels to make it easy to compute the gap sizeΔ), but with a sampling of 1000, we get
where the gaps have size 101.1, very close to the value of 100 specified in the function.
To use functions to do this, therefore, use this model and set the gap size to a value small enough that it will appear non-existent on the final graph (notice that on the graph from 0 to 6000, we can barely see the gap size of 10), and then set the samples reasonably high.
With the function approach, I don't know of any way to add the filled and open dots if those are desired.
† Gnuplot version 5.1 (the current development version) supports a pointtype variable option which can simplify this to
plot last=0,\
datafile u 1:2:(oldlast=last,last=$1,$1==oldlast?6:7) w linespoints pt var lt 1
Here we just plot all points, but use the same test as before to select between pointtype 7 or 6. As we can do both point types at once, we can just use the linespoints style instead of doing two separate plots.
‡ Initializing last to a value less than the first x-coordinate will cause that first point to be filled.
Δ To draw these labels, in the first case (with set xrange[0:1000] and set samples 10), I used
plot f(x),\
"+" u 1:(f($1)+0.1):(abs($1-250)<150||abs($1-600)<160?sprintf("%0.2f",$1):"") w labels
and in the case of set samples 1000
plot f(x),\
"+" u 1:(f($1)+0.1):(abs($1-250)<51||abs($1-600)<51?sprintf("%0.2f",$1):"") w labels
It takes a little playing around with the bounds on the abs functions here to get the desired labels to appear. Examining the output using set table can be helpful for getting them right.
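Your ternary statements are trying to do too much. gnuplot does not chain comparisons: a < x <= b is evaluated as (a < x) <= b, which compares the 0/1 result of the first test against b. When you are writing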
f0(x)=(x_low < x <= x_high) ? y : 0
you should be writing
f0(x)=((x_low < x) && ( x <= x_high)) ? y : 0
So your function f(x) should look like
f(x)=(x<=300) ? 1.05 : (x<=650) ? 0.65 : (x<=1300) ? 0.46 : (x<=1950) ? 0.4 : (x<=3250) ? 0.31 : (x>3250) ? 0.22 : 0
As for the plotting style, if you want it to be discontinuous, use a separate function for each step and plot them individually. Your first function would be split up like:
f1(x)= (x<=300) ? 1.05 : 1/0
f2(x)=(x>300) && (x<=650) ? 0.65 : 1/0
...
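One possible way to write this out without defining six separate functions is a single helper that is undefined outside its interval. The sketch below is only an illustration: the helper name piece and its arguments are made up here, and the breakpoints and values are taken from f in the question.

```gnuplot
# Sketch only: piece() returns y on (lo, hi] and is undefined (1/0) elsewhere
piece(x, lo, hi, y) = (x > lo && x <= hi) ? y : 1/0

set xrange [0:6000]
set yrange [0:1.2]
set samples 1000

# one plot element per step, all with the same linetype;
# lo = -1 on the first piece so that x = 0 itself is included
plot piece(x,   -1,  300, 1.05) lt 1 title "f", \
     piece(x,  300,  650, 0.65) lt 1 notitle, \
     piece(x,  650, 1300, 0.46) lt 1 notitle, \
     piece(x, 1300, 1950, 0.40) lt 1 notitle, \
     piece(x, 1950, 3250, 0.31) lt 1 notitle, \
     piece(x, 3250, 6000, 0.22) lt 1 notitle
```

Because each piece is its own plot element, no line is drawn between neighbouring steps. The ends of each step are only as accurate as the sample spacing, so keep samples reasonably high.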
If you just want steps without interpolation, use steps
plot [0:6000][0:3] f(x) w steps, g(x) w steps
I have the below data in gnuplot:
2012-09-18 0 2 12
2012-03-15 1 4 5
2012-12-18 24 8 11
2012-09-18 2 8 11
2012-03-15 16 5 5
2011-12-06 5 2 3
2012-12-18 3 12 8
2012-09-18 4 4 8
2012-03-29 11 6 2
2011-12-06 9 7 3
2012-12-18 6 7 8
2012-09-18 4 3 8
2012-02-09 27 2 1
2012-12-18 2 1 8
2012-09-18 6 14 8
1st column; x (date)
2nd column; y
3rd column; the point color
4th column; number of occurrences(the point is duplicated)
I need to write a gnuplot program which:
Draws my (x,y) points.
Gives each point a different color depending on the 3rd column value (maybe over 50 different colors).
If the 4th column is greater than 0, then the point is duplicated: it must be drawn n times, with each copy placed at a random position within a small margin of its x,y, for example (rand(x)-0.5,rand(y)-0.5).
Another question, what is the best and fastest way/tool to learn gnuplot?
This is supposed to be an extension to my answer for your other question drawing duplicated points in gnuplot with small margin:
You need to have the first column interpreted as time data. For this you need
set xdata time
set timefmt '%Y-%m-%d'
In order to set the point color, it is best to define a palette and then use linecolor palette, which sets the point color based on its value in the palette.
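As a rough sketch of what such a palette setup could look like: the palette choice, the cbrange and the few sample points below are assumptions for illustration only, with a plain numeric x instead of dates to keep it short.

```gnuplot
# Sketch only: made-up points as x, y, color value
$Pts <<EOD
1 0 2
2 1 4
3 24 8
4 5 12
5 9 14
EOD

# a rainbow-like palette; the third using column is mapped onto it
set palette rgbformulae 33,13,10
# assumed range of the color values in column 3
set cbrange [1:14]
set cblabel "3rd column"
set xrange [0:6]
set yrange [-2:26]

plot $Pts using 1:2:3 with points pointtype 7 pointsize 2 linecolor palette notitle
```

With linecolor palette, the extra third entry in using supplies the value that is looked up in the palette, so any number of distinct values gets a color without defining them one by one.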
So, using the explanations from drawing duplicated points in gnuplot with small margin the final script is:
reset
filename = 'data.dat'
stats filename using 4 nooutput
set xdata time
set timefmt '%Y-%m-%d'
set format x '%Y-%m'
rand_x(x) = x + 60*60*24*7 * (rand(0) - 0.5)
rand_y(y) = y + (rand(0) - 0.5)
plot for [i=0:int(STATS_max)-1] filename \
using (rand_x(timecolumn(1))):(i < $4 ? rand_y($2) : 1/0):3 pointtype 7 linecolor palette notitle
Some other things you must have in mind are:
The stats call must come before set xdata time, because the statistics don't work with time data.
When calculating with time data in the using statement, one needs to use the timecolumn function (as opposed to column or $.. in generic cases). This gives the time as a timestamp (i.e. in seconds).
For that reason you need two different random functions for x and y, because of the very different scalings. Here, I used a 'jitter' of one week (60*60*24*7 seconds) on the time axis.
The result with 4.6.4 is:
Some remarks on your question about learning gnuplot: try to solve your problems yourself first and then post more concrete questions. Browse through the gnuplot demos to see what is possible, note which feature or plotting style is used, and look it up in the documentation to see what options and settings are offered. Play around with those demos and try to apply them to your own data sets. In the end it's all about practice (I've been using gnuplot for 12 years...).
I am trying to make a plot in gnuplot whose x-axis has no real numerical order.
--------------------->
1 4 2 20 17 12 10 8
It's therefore not a real function in the mathematical sense; the x-axis carries a kind of index that has no numerical order. The values run from 1 to 20, but 20 could come first or in the middle; everything may be mixed.
I hope you understand what I mean, because I am hoping gnuplot can handle that.
Maybe I can write my data file so that point 2 contains the data that should be there on the y-axis and just move the labels around on the x-axis?
You could e.g. write a datafile "data" containing such values
1 1.5
4 2
2 3.2
20 2.2
17 0.4
12 4.3
The second column holds the y-values, the first column the labels of the x-axis (xtics).
Now try to plot this data with:
plot './data' u 2:xticlabel(1)
is that what you want?
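For a quick test without creating a file, the same idea can be tried with an inline datablock (gnuplot 5.0 or newer). This is just a sketch using the values above. Column 0 is gnuplot's running row number, so the points are spaced evenly in file order while the first column only supplies the tick labels.

```gnuplot
# Sketch only: first column = tick label, second column = y value
$Data <<EOD
1 1.5
4 2
2 3.2
20 2.2
17 0.4
12 4.3
EOD

set xrange [-0.5:5.5]
set yrange [0:5]

# x position = row number (column 0), tick text = column 1
plot $Data using 0:2:xticlabels(1) with linespoints notitle
```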
The solution is to use xticlabels and add an extra column to the data file, i.e.
#xdata ydata label
0 2 1
1 3 14
2 10 0
3 8 20
etc.
command: plot "data.dat" using 1:2:xticlabels(3) with lp