gnuplot using vertically stored datafile


I am trying to figure out the basic syntax to draw a line graph of a server's disk usage. The data is stored in an Oracle database, which stores each new record on a separate row, not on the same row. From what I have read so far, gnuplot seems to prefer related data to be on the same row. My data looks like this:
#disk date GB_used
disk1 20121022 99
disk1 20121023 104
disk2 20121022 170
disk2 20121023 182
Can gnuplot handle data in this format? The graph output would have 2 lines, one for disk1 and one for disk2. The data file only has a few disk_numbers but will eventually contain hundreds of rows from records for each day.

I assume you want to plot GB_used vs date for disk1 and disk2. If that's the case, this is almost the format gnuplot likes:
#disk date GB_used
disk1 20121022 99
disk1 20121023 104
disk2 20121022 170
disk2 20121023 182
Here's a simple awk script to convert it:
awk 'BEGIN{getline;x=$1;print $0}{if($1!=x){print "\n";print $0;x=$1}else{print $0}}' example.dat
In this case, gnuplot would want you to separate the two datasets by 1 or 2 blank lines. If you separate by 1 blank line, gnuplot will plot 2 lines of the same linetype.
If you separate by 2 blank lines, gnuplot will plot both data sets, and you can make it plot with different line types:
plot for [idx=0:1] 'example.dat' i idx u 2:3 w lines
Essentially the same effect can be achieved by filtering, as demonstrated in the answer by @andyras.
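If the rows are not already grouped by disk, the awk one-liner above would start a new dataset at every change of name. A minimal sketch that sorts first, assuming a POSIX shell with grep, sort and awk; the file names example.dat and grouped.dat are only for illustration:

```shell
# Sample input in the question's layout, deliberately interleaved by date.
cat > example.dat <<'EOF'
#disk date GB_used
disk1 20121022 99
disk2 20121022 170
disk1 20121023 104
disk2 20121023 182
EOF

# Drop the comment header, group rows by disk name (-s keeps the original
# order within each disk), then print two blank lines at every name change.
grep -v '^#' example.dat | sort -s -k1,1 |
  awk 'NR > 1 && $1 != prev { print ""; print "" } { prev = $1; print }' > grouped.dat

cat grouped.dat
```

grouped.dat should then be usable with the index-based plot loop shown above.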

Yes, it can, but you may have to trick it a bit. Here is the basic plot command:
plot "< sed 's/^disk//' data.dat" using ($1==1?$2:1/0):3 title 'disk 1', \
'' using ($1==2?$2:1/0):3 title 'disk 2'
First I run the data file through sed to remove the string 'disk' from each row. Gnuplot then evaluates a conditional expression after the using keyword. In the first plot command, it checks whether the first data column equals 1 (which it does for 'disk1' once 'disk' is stripped); if so, it plots the third column against the second, otherwise it uses 1/0 (an undefined value, which gnuplot skips).
I tried doing it in pure gnuplot:
plot 'data.dat' u ($1 eq 'disk1'?$2:1/0):3 t 'disk 1', \
'' u ($1 eq 'disk2'?$2:1/0):3 t 'disk 2'
but it did not like the string comparison in the plot command.
To get the time format right you will want to do something like
set xdata time
set timefmt '%Y%m%d'
set format x '%F'
before the plot command.
EDIT:
As @mgilson pointed out, the strcol function can be used if you want a 'pure gnuplot' solution:
plot 'data.dat' u (strcol(1) eq 'disk1'?$2:1/0):3 t 'disk 1', \
'' u (strcol(1) eq 'disk2'?$2:1/0):3 t 'disk 2'
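For completeness, the one-row-per-date layout mentioned in the question can also be produced outside gnuplot. A rough sketch, assuming every date has exactly one row for every disk and that data.dat holds the question's data; wide.dat is a made-up output name:

```shell
cat > data.dat <<'EOF'
#disk date GB_used
disk1 20121022 99
disk1 20121023 104
disk2 20121022 170
disk2 20121023 182
EOF

# Sort by date, then by disk, and collect the GB_used values of each date
# onto a single line: "date disk1_value disk2_value ...".
grep -v '^#' data.dat | sort -k2,2 -k1,1 |
  awk '$2 != date { if (NR > 1) print line; date = $2; line = $2 }
       { line = line " " $3 }
       END { if (NR > 0) print line }' > wide.dat

cat wide.dat
```

Each disk then sits in its own column, so something like plot 'wide.dat' using 1:2 with lines, '' using 1:3 with lines should apply.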

Related

Subtract smoothed data from original

I wonder whether there is a way to subtract smoothed data from original ones when doing things of the kind:
plot ["17.12.2020 08:00:00":"18.12.2020 20:00:00"] 'data3-17-28.csv1' using 4:5 title 'Sensor 3' with lines, \
'' using 4:5 smooth acsplines
Alternatively I would need to do it externally, of course.

As @Suntory already suggested, you can plot smoothed data into a table.
However, keep in mind that the number of smoothed datapoints is determined by set samples (the default is 100) and that the smoothed datapoints will be equidistant. So, if you set samples to the number of your datapoints and your data is equidistant as well, all should be fine.
Concatenating data line by line is not straightforward in gnuplot, since gnuplot is not intended to do such operations.
The following gnuplot-only solution assumes that you have your data in a datablock $Data without headers and empty lines. If not, you could either plot it with table from file into a table named $Data or use the following approach in the accepted answer of this question: gnuplot: load datafile 1:1 into datablock
If you don't have equidistant data, you need to interpolate data, which is also not straightforward in gnuplot, see: Resampling data with gnuplot
It's up to you: either you use external tools (which might not be platform-independent) or you apply a somewhat cumbersome platform independent gnuplot-only solution.
Code:
### plot difference of data to smoothed data
reset session
$Data <<EOD
1 0
2 13
3 16
4 17
5 11
6 8
7 0
EOD
stats $Data u 0 nooutput # get number of rows or datapoints
set samples STATS_records
set table $Smoothed
plot $Data u 1:2 smooth acsplines
unset table
# put both datablock into one
set print $Difference
do for [i=1:|$Data|] {
print sprintf('%s %s',$Data[i],$Smoothed[i+4])
}
set print
plot $Data u 1:2 w lp pt 7, \
$Smoothed u 1:2 w lp pt 6, \
$Difference u 1:($2-$4) w lp pt 4 lc "red"
### end of code

If I understand correctly, you would like this:
First, write the smoothed data to the file out.csv:
set table "out.csv" separator comma
plot 'file' u 4:5 smooth acsplines
unset table
Then the plot command below pastes out.csv onto file as appended columns. You may first need to delete the header lines of out.csv using sed (sed '1,4d' out.csv).
stats 'file' matrix
Thanks to stats, we automatically get the number of columns in your original data (STATS_size_x).
plot "< paste -d' ' file out.csv" u 4:($5-column(STATS_size_x+2)) w l
Could you please try this small code on your data.
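Deleting a fixed number of header lines depends on how a particular gnuplot version writes its table header. As a hedged alternative, the comment and blank lines can be filtered by content before pasting. The sketch below uses placeholder files (data.txt, smoothed.tbl) and invented numbers purely to show the mechanics:

```shell
# Placeholder original data: x y
cat > data.txt <<'EOF'
1 0
2 13
3 16
EOF

# Placeholder imitating a 'set table' output: blank line, comment header,
# then one "x y type" row per sample (the values here are invented).
cat > smoothed.tbl <<'EOF'

# Curve 0 of 1, 3 points
# x y type
 1  1.5  i
 2  11.0  i
 3  15.5  i
EOF

# Keep only data rows (non-empty, not starting with '#'), reduce them to
# "x y", and append them column-wise to the original data.
awk 'NF > 0 && $1 !~ /^#/ { print $1, $2 }' smoothed.tbl |
  paste -d' ' data.txt - > joined.txt

cat joined.txt
```

With this layout the difference would be column 2 minus column 4, e.g. using 1:($2-$4), provided both files have the same number of rows (see the remark on set samples above).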

gnuplot xy scatter plot with multiple groups

Using gnuplot 5, I want to make a scatter plot using data for more than one series. I know it is possible to say
plot data_file using 1:2 with points, data_file using 3:4 with points
when my series is in different columns; also I could store the data in several data files. What I really would prefer, however, is to store all the data in a single data file and use the first column to indicate set membership, like this:
foo 10 11
foo 12 22
bar 1 4
foo 5 8
bar 2 3
and so on. Is this possible in gnuplot 5?

You could preprocess the file externally (in order to select a particular group) and instruct gnuplot to plot the result. For example:
dataFile="input.dat"
selectGroup(group, fname)=sprintf("< gawk '$1==\"%s\"{print $2, $3}' %s", group, fname)
plot for [group in "foo bar"] selectGroup(group, dataFile) w p t group
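The group list "foo bar" is hard-coded above. One way to derive it from the file, sketched under the assumption that group names contain no whitespace and that input.dat is space-separated as in the question (groups.txt is a made-up name):

```shell
cat > input.dat <<'EOF'
foo 10 11
foo 12 22
bar 1 4
foo 5 8
bar 2 3
EOF

# Print each distinct value of column 1 once, in order of first appearance,
# as a space-separated word list suitable for gnuplot's "for [g in ...]".
awk '!seen[$1]++ { printf "%s%s", sep, $1; sep = " " }
     END { print "" }' input.dat > groups.txt

cat groups.txt
```

Inside gnuplot, the same awk command could presumably be captured with the system() function and the result used in place of the literal string.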

Multiple datasets in the same data file in gnuplot

I have a following kind of file:
<string1> <x1> <y1>
<string2> <x2> <y2>
...
I want to draw a scatter plot from the (x,y) values, having the different strings in the first column in different data sets, which will be drawn with different colors (I have many different x,y values but only a few different strings). I tried this:
plot "DATAFILE" using 2:3 title column(1)
Unfortunately, this one picks the first column for the first row and uses that as a title for all entries.

You could use awk to pick only rows where the first column matches your strings:
plot "<awk '$1~/string1/' DATAFILE" using 2:3 title column(1),\
"<awk '$1~/string2/' DATAFILE" using 2:3 title column(1)
and so on. For a built-in gnuplot solution, you can do:
plot "DATAFILE" u 2:(stringcolumn(1) eq "string1" ? $3:1/0),\
"DATAFILE" u 2:(stringcolumn(1) eq "string2" ? $3:1/0)
If you want something more automatic that generates a plot for every unique entry in column 1, this solution worked for me:
Input file (test.dat, tab-separated; otherwise the cut statement below needs to change):
one 1 3
two 2 4
ten 3 5
ten 4 3
two 5 4
one 6 5
one 7 3
ten 8 4
two 9 5
ten 10 3
two 11 4
one 12 5
The following command creates a plot statement for gnuplot and saves it in a file:
cut -f1 test.dat | sort -u | awk '
BEGIN {print "plot\\"}
{print "\"test.dat\" u 2:(stringcolumn(1) eq \""$1"\" ?\$3:1/0),\\"}' > plot.gp
and the contents are:
plot\
"test.dat" u 2:(stringcolumn(1) eq "one" ?$3:1/0),\
"test.dat" u 2:(stringcolumn(1) eq "ten" ?$3:1/0),\
"test.dat" u 2:(stringcolumn(1) eq "two" ?$3:1/0),\
then you'd do:
gnuplot plot.gp
or add the line load "plot.gp" to your script.
I am pretty sure there must be a "gnuplot-only" solution, but that goes beyond my knowledge. Hope this helps.
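Note that the generated file above ends with a dangling ",\", which gnuplot may reject. A hedged variant of the same idea puts the separator in front of every entry except the first; the title clause is an addition here, not part of the original answer:

```shell
# Tab-separated sample in the layout of test.dat above (first rows only).
printf 'one\t1\t3\ntwo\t2\t4\nten\t3\t5\nten\t4\t3\n' > test.dat

# One plot entry per unique name; entries are joined with ", \" + newline,
# so the last line carries no continuation.
cut -f1 test.dat | sort -u | awk '
  { printf "%s\"test.dat\" u 2:(stringcolumn(1) eq \"%s\" ? $3 : 1/0) title \"%s\"",
           (NR == 1 ? "plot " : ", \\\n     "), $1, $1 }
  END { if (NR > 0) print "" }' > plot.gp

cat plot.gp
```

The resulting plot.gp can be used in the same way, via gnuplot plot.gp or load "plot.gp".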

You have just one plot, so just one title.
If you want to plot separately all datasets (separated by two consecutive blank lines), you (just) need to say so:
N_datasets=3
plot for [i=0:N_datasets-1] "file.dat" using 2:3 index i with points title columnhead(1)
But the formatting of your datafile is not what gnuplot expects, and using title columnhead will also skip first line (assumed to contain headers only). The standard gnuplot format for this would be:
string1
x1_1 y1_1
x1_2 y1_2
...
string2
x2_1 y2_1
x2_2 y2_2
...
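A file in the "string x y" layout of the question can be converted to this block layout with a small filter. A sketch, assuming whitespace-separated columns and names without spaces; sample.dat and blocks.dat are hypothetical file names:

```shell
cat > sample.dat <<'EOF'
string1 1 2
string2 5 6
string1 3 4
EOF

# Group rows by name, then write each group as: name line, "x y" rows,
# with two blank lines between groups (print "\n" emits two newlines).
sort -s -k1,1 sample.dat |
  awk '$1 != prev { if (NR > 1) print "\n"; print $1; prev = $1 }
       { print $2, $3 }' > blocks.dat

cat blocks.dat
```

The x and y values then sit in columns 1 and 2 of each block, with the name on the block's first line.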

Multiple data blocks, changing plot title

I'm trying to build a graph with a csv file.
It's supposed to have an undetermined number of data blocks inside.
My CSV looks like this :
year;amount;NAME1
year;amount;NAME1
year;amount;NAME1
year;amount;NAME2
year;amount;NAME2
year;amount;NAME2
So I want my graph to have two curves (or more if there's more blocks), one named NAME1 and the other NAME2.
The only way I've found to retrieve the name is by using:
title columnhead(3)
But when I use this, the first line of my CSV is missing, and I can't figure out why.
Here's my script generating the image
gnuplot <<EOF
set terminal png
set title "Stages par professeur par années"
set output "stages_entr_ann.png"
set auto x
set key on outside left bmargin
set datafile separator ";"
set xtics 1
set ytics 1
stats 'fichier3_t.stat'
plot for [IDX=0:STATS_blocks-1] 'fichier3_t.stat' index IDX u 1:2 title columnhead(3) with linespoints ls IDX
EOF
(There's an unknown number of blocks, so I'm using STATS_blocks.)

The point is that columnhead expects something like this:
Year Amount Name
2013 5000 John
2014 8000 Max
2015 12000 Susanne
i.e. the first row of each column is treated as a label, not as data. While gnuplot extracts the name fine, it ignores the rest of the line.
There is no simple and direct solution for this, but you can do a workaround:
plot for [IDX=0:STATS_blocks-1] 'fichier3_t.stat' index IDX u 1:(1/0) title columnhead(3) with linespoints ls IDX, for [IDX=0:STATS_blocks-1] 'fichier3_t.stat' index IDX u 1:2 notitle with linespoints ls IDX
To make it more clear:
plot 'fichier3_t.stat' ... u 1:(1/0) title columnhead(3) with linespoints ..., \
'fichier3_t.stat' ... u 1:2 notitle with linespoints ...
The first command is similar to yours, it just does not plot any data (because 1/0 is always invalid). It's only there to generate the entry in the legend (key).
The second command plots the data, but does not generate any entry in the legend. Just make sure the same line style is assigned to the two plots.
If the table is large and speed is a problem, you may also use every for the title-generating command to reduce the number of points it tries to plot.
Finally, you may also get the message "No valid data in xrange" or similar. That's intended here but shouldn't be a problem.
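Another way around the lost first row, if preprocessing is acceptable, is to give every block a real header line so that columnhead(3) consumes that instead of data. A sketch with invented sample values, assuming blocks are separated by blank lines as the use of index implies; sample.stat and with_heads.stat are made-up file names:

```shell
cat > sample.stat <<'EOF'
2013;3;NAME1
2014;5;NAME1


2013;2;NAME2
2014;4;NAME2
EOF

# At the start of the file and after every run of blank lines, insert a
# header row whose third field repeats the block's name.
awk 'BEGIN { FS = ";"; start = 1 }
     NF == 0 { start = 1; print; next }
     start   { print "year;amount;" $3; start = 0 }
     { print }' sample.stat > with_heads.stat

cat with_heads.stat
```

The original plot command with title columnhead(3) could then be pointed at the preprocessed file, without the extra dummy plot.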

gnuplot plot from string

Is it possible to pass to plot data in a string?
I mean do something like this:
plot "09-13-2010,2263.80 09-14-2010,2500" using 1:2 with lines

It is possible to do something like:
set xdata time
set timefmt "%m-%d-%Y"
plot "< echo '09-13-2010,2263.80 09-14-2010,2500' | tr ' ' '\n' | tr ',' ' '" using 1:2 with lines
Here the < character tells gnuplot that we want our input from the output of a command. Gnuplot separates records with a newline. Groups of records are separated by a blank record. Within a record, the default column separator is a space. In the above example, tr is used to split your data into lines and then to rewrite the lines into records.
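The effect of the two tr calls can be checked in the shell before handing the command to gnuplot. A small sketch; records.txt is just an illustrative name:

```shell
# Split the string into one record per line (space -> newline),
# then turn the comma into gnuplot's default column separator (a space).
printf '%s\n' '09-13-2010,2263.80 09-14-2010,2500' |
  tr ' ' '\n' | tr ',' ' ' > records.txt

cat records.txt
```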
Another way to plot your data from a string is to use the "-" input specifier, and then load the data in from the command line. A program could easily emit the following:
set xdata time
set timefmt "%m-%d-%Y"
plot '-' using 1:2 with lines
09-13-2010 2263.80
09-14-2010 2500
e
Your best bet is to use an input file like:
09-13-2010 2263.80
09-14-2010 2500
Assuming the input file is named mydata.txt, you can then plot it with the commands:
set xdata time
set timefmt "%m-%d-%Y"
plot 'mydata.txt' using 1:2 with lines
If you want to plot two data series using dates and the '-' input, you could do the following:
set xdata time
set timefmt "%m-%d-%Y"
plot '-' using 1:2 title "Series 1" with lines,'-' using 1:2 title "Series 2" with lines
09-13-2010 2263.80
09-14-2010 2500
e
09-13-2010 2500
09-14-2010 2263.80
e
