gnuplot: histogram of events: issue with timecolumn()

I would like to plot the number of events per time period.
My rows look like this
"2020-11-11 09:15:50",field2,field3
This is what I have tried
binwidth = 3600 # 1h in seconds
bin(t) = (t - (int(t) % binwidth) + binwidth/2)
set datafile separator ","
#set xdata time
set timefmt '"%Y-%m-%d %H:%M:%S"'
set boxwidth binwidth
plot 'Statistics.log' using (bin(timecolumn(1, '"%Y-%m-%d %H:%M:%S"'))):(1) smooth freq with boxes
I'm getting
unknown type in magnitude()
How would I debug errors like these? (How do I dump what gnuplot "sees" for timecolumn() etc.?)
(gnuplot 4.6)

First, timecolumn() in gnuplot 4.6 is a single-argument function: it accepts only the column number, not a format string. Therefore, the plot command can be rewritten as
plot "test.dat" using (bin(timecolumn(1))):(1) smooth freq with boxes
Secondly, do not include leading and trailing double quotes in your timefmt formatting.
set timefmt '%Y-%m-%d %H:%M:%S'
For more information, see the "help data" section, which describes how double-quoted fields are read:
...
However, whitespace inside a pair of double quotes is ignored when
counting columns, so the following datafile line has three columns:
1.0 "second column" 3.0
Finally, your code can be modified as follows (for gnuplot 4.6)
binwidth = 3600 # 1h in seconds
bin(t) = (t - (int(t) % binwidth) + binwidth/2)
set datafile separator ","
set xdata time
set timefmt '%Y-%m-%d %H:%M:%S'
set boxwidth binwidth
plot 'Statistics.log' using (bin(timecolumn(1))):(1) smooth freq with boxes
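To check which bin each timestamp ends up in without going through gnuplot, the arithmetic of bin(t) can be reproduced in a short Python sketch. The sample rows and the helper names parse and bin_center are made up for illustration, and the timestamps are treated as UTC seconds since the Unix epoch, which is an assumption; the one-hour bin boundaries come out the same as long as the epoch starts on a full hour.

```python
from collections import Counter
from datetime import datetime, timezone

BINWIDTH = 3600  # 1 h in seconds, as in the gnuplot script


def bin_center(t, binwidth=BINWIDTH):
    """Same arithmetic as the gnuplot bin(t): floor to the bin start, then add half a bin."""
    return t - (int(t) % binwidth) + binwidth // 2


def parse(stamp):
    """Parse a (possibly double-quoted) timestamp into seconds since the Unix epoch (UTC assumed)."""
    dt = datetime.strptime(stamp.strip('"'), "%Y-%m-%d %H:%M:%S")
    return dt.replace(tzinfo=timezone.utc).timestamp()


# Hypothetical sample rows in the format shown in the question.
rows = [
    '"2020-11-11 09:15:50",field2,field3',
    '"2020-11-11 09:45:10",field2,field3',
    '"2020-11-11 10:05:00",field2,field3',
]

# Count events per bin center, which is what "smooth freq" does with y = 1.
counts = Counter(bin_center(parse(row.split(",")[0])) for row in rows)

for center, n in sorted(counts.items()):
    label = datetime.fromtimestamp(center, tz=timezone.utc).strftime("%Y-%m-%d %H:%M")
    print(label, n)
```

Printing the bin centers next to the counts is a simple way to compare against what the boxes in the plot should show.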

A few minutes too late, since I was still testing; @binzo has basically already answered.
The only difference: if your data uses double quotes for the date
"2020-11-11 09:15:50",field2,field3
and you don't want to change your existing data, you have to include the quotes in set timefmt. For some reason which I cannot explain right now, set datafile separator "," messes up the graph, but it seems to work without it.
Code: (tested with gnuplot 4.6.0)
### timedata in histogram (gnuplot 4.6)
reset
FILE = 'Statistics.log'
myTimeFmt = '"%Y-%m-%d %H:%M:%S"'
# create some test data
myDate = strptime(myTimeFmt, '"2020-11-11 11:11:11"')
myRandomDate(n) = myDate + 3*3600*invnorm(rand(0))
set print FILE
do for [i=1:500] {
print sprintf("%s,%g,%g",strftime(myTimeFmt,myRandomDate(0)),rand(0),rand(0))
}
set print
# set datafile separator "," # if uncommented, this will mess up the plot; don't know why
set xdata time
set format x "%Y-%m-%d\n%H:%M"
set timefmt '"%Y-%m-%d %H:%M:%S"'
binwidth = 3600 # 1 h in seconds
bin(t) = (t - (int(t) % binwidth) + binwidth/2)
set boxwidth binwidth
set style fill solid 0.5
set xtics 4*3600 # 4 h in seconds
plot FILE u (bin(timecolumn(1))):(1) smooth freq w boxes notitle
### end of code
Result:

Related

Gnuplot only Plotting one Dot instead of all data

I am trying to plot the entropy of some data against time. When I run the script, it just produces a graph with one dot on the y axis and no plot. Here is my script:
set terminal png
set output 'output.png'
set xdata time
set timefmt '"%Y-%m-%d %H:%M:%S"'
set format x '"%Y-%m-%d %H:%M:%S"'
set xrange ['"2008-01-01 00:00"':'"2008-03-20 00:00"']
set yrange [0.5:2.4]
set style data lines
set xlabel "Time"
set ylabel "Entropy"
plot "foobar-entropy.txt" using 1:2 w lp ls 4 lw 3
And here is the data:
"2008-01-01 02:13:38" 1.0
"2008-01-10 02:12:13" 1.5
"2008-01-20 02:11:55" 1.459
"2008-01-30 02:10:28" 1.811
"2008-02-10 02:09:44" 1.722
"2008-02-20 02:08:00" 1.65
"2008-02-28 02:07:00" 2.149
"2008-03-10 02:06:00" 2.18
"2008-03-20 02:04:00" 2.33
Any help would be appreciated.
I finally solved the mystery after @Christoph mentioned the line breaks. The issue was that the file had line endings which gnuplot does not support.
When I opened the file with vi editor it appeared as follows:
"2008-01-01 02:13:38" 1.0^M
"2008-01-10 02:12:13" 1.5^M
"2008-01-20 02:13:55" 1.459^M
"2008-01-30 02:12:28" 1.811^M
"2008-02-10 02:12:44" 1.722^M
"2008-02-20 02:13:00" 1.65^M
"2008-02-28 02:13:00" 2.149^M
"2008-03-10 02:13:00" 2.18^M
"2008-03-20 02:13:00" 2.33^M
After running dos2unix on the file, which converts the DOS-style carriage-return/linefeed line endings to plain linefeeds, it works fine now.
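Where dos2unix is not at hand, the same check and conversion can be sketched in Python. This is a minimal illustration on an in-memory byte string; the helper names has_crlf and to_unix are hypothetical, and it assumes the problem is CRLF line endings as shown by the ^M markers.

```python
def has_crlf(data: bytes) -> bool:
    """Return True if the data contains DOS-style CRLF line endings."""
    return b"\r\n" in data


def to_unix(data: bytes) -> bytes:
    """Replace every CRLF pair with a single LF."""
    return data.replace(b"\r\n", b"\n")


# Two lines of the data file with the DOS line endings that vi shows as ^M.
sample = b'"2008-01-01 02:13:38" 1.0\r\n"2008-01-10 02:12:13" 1.5\r\n'

print(has_crlf(sample))   # line endings before conversion
fixed = to_unix(sample)
print(has_crlf(fixed))    # line endings after conversion
```

For a real file, the bytes would be read and written in binary mode (open(path, "rb") and open(path, "wb")) so that Python does not translate the line endings itself.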

How to autoscale the yrange of gnuplot in python

I have a Python script that writes out a .gnu file, which in turn plots a .png file. I am trying to make the yrange more dynamic by extending the range 5% beyond the max and min.
What am I doing wrong?
This code will not run like this.
#-- write out .gnu file
self.output = textwrap.dedent('''\
set terminal png size 800,600
set output "{0}"
set grid
set xlabel "Cycle"
set title "{1}"
set xtics ({2})
set yrange[GPVAL_Y_MIN:GPVAL_Y_MAX]
plot ''').format(self.figurename, self.title, ",".join(plot_data.keys()), self.styletype, self.datafile)
for n in range(0,max_num_lines):
    tmp_str = " ".join(['"{2}"','using','1:'+str(n+2),'title',"'"+self.titles[n+1]+"'",'w linespoints {1}']).format(self.figurename, self.linecombos[n], self.datafile)
    if n!=max(range(0,max_num_lines)):
        tmp_str += ", "
    self.output += tmp_str
    pass
The internal variables GPVAL_Y_MIN/GPVAL_Y_MAX are not initialized until you actually plot something. To circumvent this, you might use the stats command which analyzes a file and provides the desired min/max values via the STATS_min_y/STATS_max_y variables:
self.output = textwrap.dedent('''\
set terminal png size 800,600
set output "{0}"
set grid
set xlabel "Cycle"
set title "{1}"
set xtics ({2})
#analyze the file and adjust y-range
stats "{4}" nooutput
set yrange[STATS_min_y:STATS_max_y]
plot ''').format(self.figurename, self.title, ",".join(plot_data.keys()), self.styletype, self.datafile)
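The 5% margin from the question is not covered by the snippet above. If the y values are already available in the Python script, one option is to compute the padded range there and write a fixed set yrange line into the .gnu file. The following is a minimal sketch under that assumption; padded_range, yrange_command and the sample y_values are hypothetical, and the margin is taken as 5% of the data span.

```python
def padded_range(values, margin=0.05):
    """Return (low, high) extended by `margin` times the data span on each side."""
    lo, hi = min(values), max(values)
    span = hi - lo
    # Fall back to a margin relative to the value (or 1.0) if all values are equal.
    pad = span * margin if span > 0 else (abs(lo) * margin or 1.0)
    return lo - pad, hi + pad


def yrange_command(values, margin=0.05):
    """Build the gnuplot command line for the padded y range."""
    lo, hi = padded_range(values, margin)
    return "set yrange [{:g}:{:g}]".format(lo, hi)


# Hypothetical y values collected from all plotted columns.
y_values = [0.0, 2.5, 10.0]
print(yrange_command(y_values))
```

The returned string can be placed in the template where the set yrange line currently sits, so that no gnuplot-internal variables are needed.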

store commented value from data file in gnuplot

I have multiple data files output_k, where k is a number. The files look like
#a=1.00 b = 0.01
# mass mean std
0.2 0.0163 0.0000125
0.4 0.0275 0.0001256
Now I need to retrieve the values of a and b and store them in variables, so I can use them for the title, as function input, etc. Looping over the files in the folder works, but I need some help with reading out the parameters a and b. This is what I have so far.
# specify the number of plots
plot_number = 100
# loop over all data files
do for [i=0:plot_number] {
a = TODO
b = TODO
#set terminal
set terminal postscript eps size 6.4,4.8 enhanced color font 'Helvetica,20' linewidth 2
set title "Measurement \n{/*0.8 A = a, B = b}"
outFile=sprintf("plot_%d.eps", i)
dataFile=sprintf("output_%d.data", i)
set output outFile
plot dataFile using 1:2:3 with errorbars lt 1 linecolor "red", f(a,b)
unset output
}
EDIT:
I am working with gnuplot for windows.
If you are on a Unix-like system, you can use system to get the output of standard command line tools, namely head and sed, which in turn allow you to extract these values from the files:
a = system(sprintf("head -n 1 output_%i.data | sed \"s/#a=//;s/ b .*//\"", i))
b = system(sprintf("head -n 1 output_%i.data | sed \"s/.*b = //\"", i))
This assumes that the leading spaces to all lines in your question are actually a formatting mistake.
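For readers who generate or preprocess the files with a script, the same extraction can be sketched in Python. This is only an illustration under the assumption that the header line always has the form shown in the question; HEADER_RE and parse_header are hypothetical names.

```python
import re

# Matches a header such as "#a=1.00 b = 0.01", tolerating optional spaces around "=".
HEADER_RE = re.compile(r"#\s*a\s*=\s*(\S+)\s+b\s*=\s*(\S+)")


def parse_header(line):
    """Return (a, b) as floats from the commented first line of a data file."""
    match = HEADER_RE.match(line.strip())
    if match is None:
        raise ValueError("unexpected header: %r" % line)
    return float(match.group(1)), float(match.group(2))


a, b = parse_header("#a=1.00 b = 0.01")
print(a, b)
```

Lines that do not match raise a ValueError instead of silently yielding wrong parameters.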
A late answer, but since you are working under Windows, you can either install comparable utilities or you might be interested in a gnuplot-only (hence platform-independent) solution.
You can use stats to extract information from the datablock (or file) into variables. Check help stats.
The extraction of your a and b depends on the exact structure of that line. You can split a line at spaces via word() (check help word) and get substrings via substr() or indexing (check help substr).
Script: (works with gnuplot>=5.0.0)
### extract information from commented header without external tools
reset session
$Data <<EOD
#a=1.00 b = 0.01
# mass mean std
0.2 0.0163 0.0000125
0.4 0.0275 0.0001256
EOD
set datafile commentschar ''
set datafile separator "\t"
stats $Data u (myHeader=strcol(1)[2:]) every ::0::0 nooutput
set datafile commentschar # reset to default
set datafile separator # reset to default
a = real(word(myHeader,1)[3:])
b = real(word(myHeader,4))
set label 1 at graph 0.1,0.9 sprintf("a=%g\nb=%g",a,b)
plot $Data u 1:2 w lp pt 7 lc "red"
### end of script
Result:

How plot graph with missing data lines?

I have data recorded over time, but some data lines are missing and gnuplot bridges these intervals with long lines.
How can I set gnuplot to draw nothing in these intervals instead of connecting lines?
PS. I don't have empty cells in these lines; the lines are not there at all.
lines:
column 1 ... col 195
13:30:20.8 0.78061899
13:30:21.8 5.969546498
13:32:19.8 17.21257881
13:32:20.8 6.922475345
If you don't want to draw a line between two points you must insert an empty line in the data file between the two point entries, so that effectively you have
13:30:20.8 0.78061899
13:30:21.8 5.969546498

13:32:19.8 17.21257881
13:32:20.8 6.922475345
This cannot be done with gnuplot directly, but you can use e.g. awk to do the processing on-the-fly:
set timefmt '%H:%M:%S'
set xdata time
filename = 'data.txt'
plot '< awk ''{split($1,d,":"); t_prev = t; t = (d[1] * 60 + d[2])*60 + d[3]; if (t_prev && (t - t_prev > 10)) print ""; print }'' '.filename using 1:2 with lines
Here, the gap threshold is 10 seconds.
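The same preprocessing step can be sketched in Python for systems without awk. This is a minimal illustration, not a drop-in replacement; the helper names seconds and insert_gaps are hypothetical, and the sample rows are the four lines from the question.

```python
def seconds(stamp):
    """Convert an "HH:MM:SS.s" time stamp into seconds since midnight."""
    h, m, s = stamp.split(":")
    return (int(h) * 60 + int(m)) * 60 + float(s)


def insert_gaps(lines, threshold=10.0):
    """Insert an empty line wherever consecutive time stamps differ by more than `threshold` seconds."""
    out = []
    prev = None
    for line in lines:
        t = seconds(line.split()[0])
        if prev is not None and t - prev > threshold:
            out.append("")  # the empty line makes gnuplot break the connecting line here
        out.append(line)
        prev = t
    return out


rows = [
    "13:30:20.8 0.78061899",
    "13:30:21.8 5.969546498",
    "13:32:19.8 17.21257881",
    "13:32:20.8 6.922475345",
]
print("\n".join(insert_gaps(rows)))
```

The output could be written to a temporary file and plotted with the usual using 1:2 with lines.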
Assuming your missing-data marker is "NaN", you can use the following command
plot "data" using 1:($2) with linespoints
instead of
plot "data" using 1:2 with linespoints
The former treats the missing data like a blank line and therefore does not draw a connecting line across the gap, while the latter skips the missing point and draws a continuous, unbroken line.
Just for the record: there are later questions about the same or a similar issue.
Avoid connection of points when there is empty data
How to remove line between "jumping" values, in gnuplot?
Removing vertical lines due to sudden jumps in gnuplot
However, my solutions there require a transparent color, which was not available at the time of the OP's question (gnuplot 4.6.5, Feb 2014). Nevertheless, there is a solution without external tools like awk and without changing the data.
First solution for gnuplot 4.6: instead of a transparent line you use a white line. It will cover the grid lines, although this is hardly visible.
Second solution for gnuplot 4.6: use vectors. This really interrupts the line and works for gnuplot 5.x as well.
Data:
00:00:00 0.406406
00:00:44 0.339779
00:01:28 0.986602
00:02:13 0.17746
00:02:57 0.0580277
00:03:42 0.586614
00:04:26 0.84247
00:05:11 0.597502
00:05:55 0.0394846
00:06:40 0.369416
00:13:20 0.527109
00:13:42 0.371411
00:14:04 0.851465
00:14:26 0.980312
00:14:48 0.431391
00:15:11 0.545491
00:15:33 0.708445
00:15:55 0.861669
00:16:17 0.277122
00:16:40 0.787273
Script:
### avoid showing a line across larger time gaps
reset
FILE = "SO26510245.dat"
myFmt = "%H:%M:%S"
tGap = 60 # 60 seconds
set format x "%H:%M"
set timefmt "%H:%M:%S"
set xdata time
set ytics 0.5
set key top center noautotitle
set grid x,y
set multiplot layout 3,1
plot FILE u 1:2 w l lc rgb "red" ti "data as is"
myColor(col) = (t0=t1, t1=timecolumn(1), t1-t0>tGap ? 0xffffff : 0x0000ff)
plot t1=NaN FILE u 1:2:(myColor(1)) w l lc rgb var ti "white line"
myGap(col) = (t1-t0>tGap ? NaN : y0)
plot t1=y1=NaN FILE u (t0=t1,t1=timecolumn(1),t0):(y0=y1,y1=$2,myGap(0)):(t1-t0):(y1-y0) \
w vec lc rgb "web-green" nohead ti "with vectors"
unset multiplot
### end of script
Result: (created with gnuplot 4.6.0, from March 2012)

gnuplot "stats" command unexpected min & "out of range" results

I’m trying to develop a histogram script. The plot itself seems correct, but I have some problems or questions:
I don’t understand why the “stats” output says my data file has “out of range” points. What does that mean?
The “stats” minimum value doesn’t look correct, either. From the data file, minimum = -0.0312, but stats reports 0.0.
The script:
# Gnuplot histogram from "Gnuplot In Action", 13.2.1 Jitter plots and histograms (p. 256)
# these functions put data points (x) into bins of specified width
bin(x,width) = width*floor(x/width)
binwidth = 0.01
set boxwidth binwidth
# data file
data_file = "sorted.csv"
png_file = "sorted.png"
datapoint_count = 14
# taking explanations from the data file
set style data linesp
set key autotitle columnheader
set datafile separator "," # CSV format
# histogram
myTitle = "Histogram from \n" . data_file
set title myTitle
set style fill solid 1.0
set xlabel "Slack"
set mxtics
set ylabel "Count"
set yrange [0:*] # min count is always 0
set terminal png # plot file format
set output png_file # plot to file
print "xrange="
show xrange
print "yrange="
show yrange
stats data_file using ($1)
print "STATS_records=", STATS_records
print "STATS_invalid=", STATS_invalid
print "STATS_blank=", STATS_blank
print "STATS_min=", STATS_min
print "STATS_max=", STATS_max
plot data_file using (bin($1,binwidth)):(1) smooth frequency with boxes
The data file:
slack
-0.0312219
-0.000245109
-4.16338e-05
-2.08616e-05
-1.82986e-05
8.31485e-06
1.00136e-05
1.23084e-05
0
0.000102907
0.000123322
0.000138402
0.19044
0.190441
The output:
gnuplot sorted.gp
Could not find/open font when opening font "arial", using internal non-scalable font
xrange=
set xrange [ * : * ] noreverse nowriteback # (currently [-10.0000:10.0000] )
yrange=
set yrange [ 0.00000 : * ] noreverse nowriteback # (currently [:10.0000] )
* FILE:
Records: 9
Out of range: 5
Invalid: 0
Blank: 0
Data Blocks: 1
* COLUMN:
Mean: 0.0424
Std Dev: 0.0792
Sum: 0.3813
Sum Sq.: 0.0725
Minimum: 0.0000 [3]
Maximum: 0.1904 [8]
Quartile: 0.0000
Median: 0.0001
Quartile: 0.0001
STATS_records=9.0
STATS_invalid=0.0
STATS_blank=0.0
STATS_min=0.0
STATS_max=0.190441
If you give a single column to the stats command, the yrange is used to select the range from this column.
At first sight this doesn't make sense, but it behaves like a plot command with only a single column, in which case that column is the y-value and the row number is chosen as the x-value. With set yrange [0:*] in effect, the five negative values are out of range, which leaves 9 records with a minimum of 0.
So, just move the set yrange part behind the stats command.
data_file = 'sorted.csv'
stats data_file using 1
show variables all
set yrange [0:*]
plot data_file ...
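The effect of the active yrange on the reported numbers can be checked with a small Python sketch. It uses the values from the data file and assumes that stats simply drops every value below the lower yrange limit of 0; the variable names are made up for illustration.

```python
# Values of the "slack" column from the data file in the question.
slack = [
    -0.0312219, -0.000245109, -4.16338e-05, -2.08616e-05, -1.82986e-05,
    8.31485e-06, 1.00136e-05, 1.23084e-05, 0.0,
    0.000102907, 0.000123322, 0.000138402, 0.19044, 0.190441,
]

y_min = 0.0  # lower limit from "set yrange [0:*]"

# Values inside the range are counted as records, the rest as out of range.
in_range = [v for v in slack if v >= y_min]
out_of_range = len(slack) - len(in_range)

print("Records:", len(in_range))
print("Out of range:", out_of_range)
print("Minimum:", min(in_range))
print("Maximum:", max(in_range))
```

The counts and extremes agree with the stats output shown in the question (9 records, 5 out of range, minimum 0, maximum 0.190441).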
