I have a file of the following kind:
<string1> <x1> <y1>
<string2> <x2> <y2>
...
I want to draw a scatter plot of the (x,y) values, treating each distinct string in the first column as its own data set, drawn in its own color (I have many different x,y values but only a few different strings). I tried this:
plot "DATAFILE" using 2:3 title column(1)
Unfortunately, this takes the first column of the first row and uses it as the title for all entries.
You could use awk to pick only rows where the first column matches your strings:
plot "<awk '$1~/string1/' DATAFILE" using 2:3 title column(1),\
"<awk '$1~/string2/' DATAFILE" using 2:3 title column(1)
and so on. For a built-in gnuplot solution, you can do:
plot "DATAFILE" u 2:(stringcolumn(1) eq "string1" ? $3:1/0),\
"DATAFILE" u 2:(stringcolumn(1) eq "string2" ? $3:1/0)
If you want something more automatic that generates a plot clause for every unique entry in column 1, this solution worked for me:
Input file (test.dat, tab-separated, which is what cut expects by default; otherwise the cut statement below needs to change):
one 1 3
two 2 4
ten 3 5
ten 4 3
two 5 4
one 6 5
one 7 3
ten 8 4
two 9 5
ten 10 3
two 11 4
one 12 5
The following line creates a plotting statement for gnuplot and saves it in a file:
cut -f1 test.dat | sort -u | awk '
BEGIN {print "plot\\"}
{print "\"test.dat\" u 2:(stringcolumn(1) eq \""$1"\" ?\$3:1/0),\\"}' > plot.gp
and the contents are:
plot\
"test.dat" u 2:(stringcolumn(1) eq "one" ?$3:1/0),\
"test.dat" u 2:(stringcolumn(1) eq "ten" ?$3:1/0),\
"test.dat" u 2:(stringcolumn(1) eq "two" ?$3:1/0),\
then you'd do:
gnuplot plot.gp
or add the line load "plot.gp" to your script.
I am pretty sure there must be a "gnuplot-only" solution, but that goes beyond my knowledge. Hope this helps.
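A variation on the generator above, as a minimal sketch. It puts the ", \" continuation only between clauses, so the generated script does not end on a dangling line continuation, and it adds a title per string. The function name make_plot_script is hypothetical, and the data file is assumed to be whitespace-separated (awk's default splitting) with "#" starting a comment line.

```shell
# make_plot_script FILE: print a gnuplot plot command with one clause per
# unique first-column string in FILE (hypothetical helper name).
make_plot_script() {
    awk -v f="$1" '
        # remember each new first-column string, in order of appearance
        NF && $1 !~ /^#/ && !seen[$1]++ { keys[++n] = $1 }
        END {
            printf "plot "
            for (i = 1; i <= n; i++) {
                # continuation between clauses, plain newline after the last one
                sep = (i < n) ? ", \\\n     " : "\n"
                printf "\"%s\" u 2:(stringcolumn(1) eq \"%s\" ? $3 : 1/0) title \"%s\"%s", f, keys[i], keys[i], sep
            }
        }' "$1"
}
```

It would be used the same way as before: make_plot_script test.dat > plot.gp, then gnuplot plot.gp.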
You have just one plot, so just one title.
If you want to plot separately all datasets (separated by two consecutive blank lines), you (just) need to say so:
N_datasets=3
plot for [i=0:N_datasets-1] "file.dat" using 2:3 index i with points title columnhead(1)
But the formatting of your datafile is not what gnuplot expects, and using title columnhead will also skip first line (assumed to contain headers only). The standard gnuplot format for this would be:
string1
x1_1 y1_1
x1_2 y1_2
...


string2
x2_1 y2_1
x2_2 y2_2
...
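If the file is in the flat "<string> <x> <y>" form from the question, it could be regrouped into that block layout with a small awk filter. This is only a sketch: the function name to_blocks is made up for illustration, and it assumes whitespace-separated columns and "#" comment lines.

```shell
# to_blocks FILE: regroup "<string> <x> <y>" rows into one block per string,
# each block headed by the string and separated by two blank lines.
to_blocks() {
    awk '
        NF < 3 || $1 ~ /^#/ { next }              # skip blank, short and comment lines
        !($1 in rows) { order[++n] = $1 }         # first time this string is seen
        { rows[$1] = rows[$1] $2 " " $3 "\n" }    # append "x y" to the block of this string
        END {
            for (i = 1; i <= n; i++) {
                if (i > 1) printf "\n\n"          # two blank lines start a new index
                printf "%s\n%s", order[i], rows[order[i]]
            }
        }' "$1"
}
```

After to_blocks DATAFILE > file.dat, the x and y values sit in columns 1 and 2 of each block, so the plot command would use "using 1:2" rather than "using 2:3".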
Using gnuplot 5, I want to make a scatter plot using data for more than one series. I know it is possible to say
plot data_file using 1:2 with points, data_file using 3:4 with points
when my series are in different columns; I could also store the data in several data files. What I would really prefer, however, is to store all the data in a single data file and use the first column to indicate set membership, like this:
foo 10 11
foo 12 22
bar 1 4
foo 5 8
bar 2 3
and so on. Is this possible in gnuplot 5?
You could preprocess the file externally (in order to select a particular group) and instruct gnuplot to plot the result. For example:
dataFile="input.dat"
selectGroup(group, fname)=sprintf("< gawk '$1==\"%s\"{print $2, $3}' %s", group, fname)
plot for [group in "foo bar"] selectGroup(group, dataFile) w p t group
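The group list "foo bar" is typed by hand above. As a sketch, it could instead be derived from the file with awk; the function name list_groups is hypothetical, and the file is assumed to be whitespace-separated with "#" comment lines.

```shell
# list_groups FILE: print the unique first-column strings on one line,
# space-separated, in order of first appearance.
list_groups() {
    awk 'NF && $1 !~ /^#/ && !seen[$1]++ { printf "%s%s", sep, $1; sep = " " }
         END { print "" }' "$1"
}
```

The resulting string is in the form that "for [group in ...]" iterates over, so the same awk command, run through gnuplot's system(), could replace the literal list.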
I have a file where my data are separated into several indexes. I would like to plot some or all of the indexes as stacked filledcurves by adding the values of selected previous indexes to the values of the current index. I could not find a way to use the sum function as in the case of data arranged as columns in a single index (as in this question), even using the pseudocolumn(-2) as the index number.
Important note: every index has a strictly identical set of x values; only the y values differ.
Is there a way to do something like
p 'data.dat' index (sum(ind=1,3,4,5) ind) u 1:2 w filledcurve x1 t 'Sum(1,3,4,5)', '' index (sum(ind=1,2,5) ind) u 1:2 w filledcurve x1 t 'Sum(1,2,5)'
within gnuplot or do I have to resort to a script (maybe a variation of the one in this answer)?
You can do this with some help outside gnuplot (invoked within gnuplot). Imagine you have the following data file with 4 indices (0 to 3):
1 2
2 3


1 5
2 5


1 0
2 3


1 4
2 3
Now say that we want two sums: indices 1 and 2, and indices 0 and 3. The first sum should return:
1 5
2 8
while the second sum should return
1 6
2 6
We can select the blocks we want using set table:
set table "sum1"
plot for [i in "1 2"] "data3" index 0+i pt 7 not
set table "sum2"
plot for [i in "0 3"] "data3" index 0+i pt 7 not
unset table
Now use sed piping to remove the empty lines and smooth freq to sum for equal x values:
plot "< sed '/^\s*$/d' sum1" smooth freq t "sum1", \
"< sed '/^\s*$/d' sum2" smooth freq t "sum2"
Although you may be able to do it using functions and variables of gnuplot 4.4+, this won't be very efficient as you want to perform an operation on several distant lines in your file, which is in fact an operation on arrays. Gnuplot is not meant for this, the datafiles should have a structure reasonably close to what you want to plot. I advise that you try to produce a file with such a structure, e.g. have the values you want to sum on the same line in different columns.
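One way to produce such a file, sketched with awk: put the y values of every index side by side, one row per x. The function name blocks_to_columns is hypothetical. It assumes that any run of blank lines separates two indices and that all indices list the same x values in the same order, as the question states.

```shell
# blocks_to_columns FILE: turn index blocks that share the same x values
# into one row per x: "x y_0 y_1 ... y_k".
blocks_to_columns() {
    awk '
        BEGIN { blk = 0 }
        NF == 0 { if (inblock) { blk++; inblock = 0 }; next }   # a blank run ends a block
        $1 ~ /^#/ { next }                                       # skip comment lines
        {
            if (!inblock) { inblock = 1; row = 0 }
            row++
            if (blk == 0) { x[row] = $1; nrows = row }           # x values are taken from index 0
            y[row, blk] = $2
        }
        END {
            nblk = blk + (inblock ? 1 : 0)
            for (r = 1; r <= nrows; r++) {
                line = x[r]
                for (b = 0; b < nblk; b++) line = line " " y[r, b]
                print line
            }
        }' "$1"
}
```

Index b ends up in column b+2, so with the output saved as, say, cols.dat, the sum of indices 1 and 2 becomes plot "cols.dat" using 1:($3+$4).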
Here is my data file:
25 10 8
0 50 11
34 25 0
14 0 22
200 25 56
And I plot 3D vectors with splot:
splot "data" using (0):(0):(0):1:2:3 with vectors
But I would like different colors for my vectors, using something like ls nth_vector with splot (so ls 1 for the first line of the file, then ls 2, etc.). Is it possible?
Thanks!
If you double space your data file you can achieve this using index. You can use awk within gnuplot to do the spacing on the fly:
splot for [i=0:system("wc -l < data")] '<awk -v s="\n" "{print s}1" data' using (0):(0):(0):1:2:3 index i notitle with vectors
The system command counts the number of lines in the file. awk prints two newlines for every line in the data file, so each line has a separate index. I have used a variable containing the \n character as this avoids difficulties in escaping strings.
edit
There's no need for any of that awk. You can use stats to get the number of lines in your file and every to plot each line separately:
stats 'data' nooutput
splot for [i=0:STATS_records-1] "data" using (0):(0):(0):1:2:3 every ::i::i with vectors notitle
You can use the row number (zeroth column) as linetype index for the linecolor variable option:
splot 'data' using (0):(0):(0):1:2:3:0 with vectors lc var
For the vectors plotting style you could even use arrowstyle variable to change the whole arrow settings.
The thread "Custom string in xticlabels" solved the question of customizing xticlabel strings.
Now, how do I sort the data from column 4 (for example) so that only rows containing certain strings in column 4 will be used to create the xticlabel? IOW, what is the proper format to do: (IF strcol(4) eq "Sunrise") plot 'datafile' u 4:2:( xticlabels( strcol(4).strcol(2) ) )
Given this datafile:
Sunrise cat 1
Sunset dog 2
Sunrise fish 3
waste space 4
blah blah 5
Sunrise label 6
we can plot it with this line:
plot 'test.dat' u 3:xticlabels(strcol(1) eq 'Sunrise'?strcol(1).strcol(2):'')
And it creates this plot:
Basically, I looked at the string in column 1: if it is "Sunrise", I concatenate it with the string in column 2; if it isn't, I return an empty string to prevent a label from being placed there. This does, however, place a major tic at the location of each data point. To avoid that, you can use the following:
plot 'test.dat' u 3:xticlabels(strcol(1) eq 'Sunrise'?strcol(1).strcol(2):NaN)
which produces this plot (I've tested on gnuplot 4.4.2 and 4.6.0):
It also issues a bunch of warnings about non-string labels, but I guess that's OK.
I am trying to figure out the basic syntax to draw a line graph of a server's disk usage. The data is stored in an Oracle database, which stores each new record on a separate row rather than on the same row. From what I have read so far, gnuplot seems to prefer related data to be on the same row. My data looks like this.
#disk date GB_used
disk1 20121022 99
disk1 20121023 104
disk2 20121022 170
disk2 20121023 182
Can gnuplot handle data in this format? The graph output would have 2 lines, one for disk1 and one for disk2. The data file only has a few disk_numbers but will eventually contain hundreds of rows from records for each day.
I assume you want to plot GB_used vs date for disk1 and disk2. If that's the case, this is almost the format gnuplot likes:
#disk date GB_used
disk1 20121022 99
disk1 20121023 104
disk2 20121022 170
disk2 20121023 182
Here's a simple awk script to convert it:
awk '/^#/{print; next} seen && $1!=x{print "\n"} {print; x=$1; seen=1}' example.dat
In this case, gnuplot would want you to separate the two datasets by 1 or 2 blank lines. If you separate by 1 blank line, gnuplot will plot 2 lines of the same linetype.
If you separate by 2 blank lines, gnuplot will plot both data sets, and you can make it plot with different line types:
plot for [idx=0:1] 'example.dat' i idx u 2:3 w lines
Essentially the same effect can be achieved by filtering, as demonstrated in the answer by @andyras.
Yes, it can, but you may have to trick it a bit. Here is the basic plot command:
plot "< sed 's/^disk//' data.dat" using ($1==1?$2:1/0):3 title 'disk 1', \
'' using ($1==2?$2:1/0):3 title 'disk 2'
First I run the data file through sed to remove the string 'disk' from each row. Then gnuplot makes a conditional comparison after the using keyword. In the first plot command, it checks if the first data column is equal to 1 (which it would be for 'disk1' - 'disk'), if so it plots the second column vs. the third, else it plots 1/0 (which gnuplot ignores).
I tried doing it in pure gnuplot:
plot 'data.dat' u ($1 eq 'disk1'?$2:1/0):3 t 'disk 1', \
'' u ($1 eq 'disk2'?$2:1/0):3 t 'disk 2'
but it did not like the string comparison in the plot command.
To get the time format right you will want to do something like
set xdata time
set timefmt '%Y%m%d'
set format x '%F'
before the plot command.
EDIT:
As @mgilson pointed out, the strcol command can be used if you want a 'pure gnuplot' solution:
plot 'data.dat' u (strcol(1) eq 'disk1'?$2:1/0):3 t 'disk 1', \
'' u (strcol(1) eq 'disk2'?$2:1/0):3 t 'disk 2'