I have data files of 20 GB to 50 GB each. I want to plot these files and later turn the images into short videos. My gnuplot script is as follows:
unset xtics
unset ytics
unset key
unset border
set xrange [0:12.8]
set yrange [0:7.2]
set cbrange [0:1]
set size ratio -1
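file = "hugeDataFile.dat"
sizeX = 1280
sizeY = 720
lines = sizeX*sizeY   # data points per frame (one per pixel)
numberOfImages = 1500
set terminal png size sizeX, sizeY   # set the terminal once, outside the loop
do for [i=1:numberOfImages]{
    set output sprintf('%05d.png', i)
    # 'every' numbers data points from 0, so frame i is points (i-1)*lines .. i*lines-1
    start = (i-1)*lines
    end = i*lines-1
    plot file every ::start::end u 1:2:3 pt 5 ps 0.4 pal
}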
The problem is, it takes more than 48 hours to plot those 1500 .png images from a 31 GB data file. Is there any way to accelerate that process? Can I make the Gnuplot script more efficient?
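A rough count suggests why the loop is slow: to reach frame i, gnuplot has to read sequentially through all lines of the preceding i-1 frames, because a text file cannot be indexed by line number. The Python sketch below computes that lower bound and shows one common workaround, splitting the file once into per-frame files. It assumes one data point per line with no header, comment, or blank lines; `lines_scanned`, `iter_frames`, `split_into_frames` and the `frame_00001.dat` naming are hypothetical, made up for illustration.

```python
from itertools import islice
from pathlib import Path


def lines_scanned(points_per_frame, frames):
    """Lower bound on line reads when every frame re-reads the file from the start."""
    # Frame i can only be reached after reading all i-1 earlier frames,
    # so it costs at least i * points_per_frame line reads.
    return points_per_frame * frames * (frames + 1) // 2


def iter_frames(lines, points_per_frame):
    """Yield successive lists of points_per_frame lines (the last one may be shorter)."""
    it = iter(lines)
    while True:
        chunk = list(islice(it, points_per_frame))
        if not chunk:
            return
        yield chunk


def split_into_frames(src, points_per_frame, out_dir="."):
    """Read src once and write frame_00001.dat, frame_00002.dat, ... (hypothetical names)."""
    written = []
    with open(src) as f:
        for frame, chunk in enumerate(iter_frames(f, points_per_frame), start=1):
            path = Path(out_dir) / f"frame_{frame:05d}.dat"
            path.write_text("".join(chunk))
            written.append(path)
    return written


if __name__ == "__main__":
    per_frame = 1280 * 720
    print(lines_scanned(per_frame, 1500))  # 1037491200000 line reads (about 1.04e12)
    print(per_frame * 1500)                # 1382400000 for a single pass (about 1.38e9)
```

With the split files in place, the gnuplot loop could plot `sprintf('frame_%05d.dat', i)` instead of using `every`, so each data line is read only once. The split needs roughly as much free disk space as the original file, and the overall time saved also depends on how long gnuplot spends rendering about 921,600 points per image.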
I want to read multiple png files (which were themselves created with gnuplot's png terminal) in order to achieve an "overlay", that is, a number of functions plotted on top of one another with no background. Apparently this can be done with gnuplot in one session.
I found this idea in the Linux Gazette article "Plotting the spirograph equations with 'gnuplot'" (2006):
https://linuxgazette.net/133/luana.html
I am stuck on the following error messages:
line 0: Bad data on line 1 of file [...]
line 0: warning: using default binary record/array structure
line 0: Too many using specs for this style
Looking for solutions, I read in the help pages (http://gnuplot.info/docs_5.5/loc7742.html) that gnuplot can read png images:
plot 'file.png' binary filetype=png
I have also looked into using pngcairo instead of png. I use eog to view the .png images. Here is sample code that generates the errors above (and more, if adjusted):
set size ratio -1
set nokey
set noxtics
set noytics
set noborder
set parametric
i2p = {0,1}*2*pi
set terminal png
t0 = 0
t1 = 1
#---------------------------------------------
# plot first function in the gnuplot session :
#---------------------------------------------
test01(t) = exp(i2p*(2*t))
set output "solve_png_problem_15nov22a.png"
plot [t=t0:t1] 1*real(test01(t)),1*imag(test01(t)) lc 1
#---------------------------------------------------
# plot second function in the same gnuplot session :
#---------------------------------------------------
test02(t) = + 3*1.0**20 * exp(i2p*(-3*t+20/200. )) + 3*1.0**19 * exp(i2p* (2*t+20/200.))
set output "solve_png_problem_15nov22b.png"
plot [t=t0:t1] 1*real(test02(t)),1*imag(test02(t)) lc 2
#------------------------------------------------------------
# last plotting to apparently "overlay" the two plots above :
#------------------------------------------------------------
set terminal png size 600,600
set output "solve_png_problem_15nov22_overlay.png"
set noparametric
plot "solve_png_problem_15nov22a.png", "solve_png_problem_15nov22b.png"
The reduced sample code is generated from the awk script supplied with the article; see it for details:
https://linuxgazette.net/133/misc/luana/spirolang.awk.txt
The functions are nontrivial, so I kept them intact, as the associated settings might be causing the problem. The individual images look OK, so I think the problem is in the last plot command.
Besides binary filetype=png, I have also tried filetype=auto, and pngcairo instead of png, with no progress. I have read the results of Google searches for the error messages, as well as the help pages on terminal, png, image, binary, and so on. I expected gnuplot to simply recognize that the file was a png image that gnuplot itself had generated with the png terminal. What actually happens is the error "Too many using specs for this style". When I move "binary filetype=png" to a different position in the command, I get "line 0: Bad data on line 1 of file [...]" instead. I have also tried programs outside gnuplot, such as montage and composite (ImageMagick).
gnuplot version 5.4 patchlevel 2
Ubuntu 22.04
Post-answer update:
TL;DR: use the svg terminal.
Simply using the svg terminal saved me a lot of grief. The original work must have been published before gnuplot got the svg terminal. I still need to work svg into the original script, but svg will make that a lot easier.
Try this from the shell (the commands are passed to gnuplot as a here-document):
gnuplot<<EOF
set terminal png medium size 600,600 background rgb "white"
set size ratio -1
set nokey
set noxtics
set noytics
set noborder
set parametric
i2p = {0,1}*2*pi
t0 = 0
t1 = 1
#---------------------------------------------
# plot first function in the gnuplot session :
#---------------------------------------------
test01(t) = exp(i2p*(2*t))
set output "solve_png_problem_15nov22a.png"
plot [t=t0:t1] 1*real(test01(t)),1*imag(test01(t)) lc 1
#---------------------------------------------------
# plot second function in the same gnuplot session :
#---------------------------------------------------
test02(t) = + 3*1.0**20 * exp(i2p*(-3*t+20/200. )) + 3*1.0**19 * exp(i2p* (2*t+20/200.))
set output "solve_png_problem_15nov22b.png"
plot [t=t0:t1] 1*real(test02(t)),1*imag(test02(t)) lc 2
#------------------------------------------------------------
# last plotting to "overlay" the two plots above :
#------------------------------------------------------------
set output "solve_png_problem_15nov22_overlay.png"
plot \
[t=t0:t1] 1*real(test01(t)),1*imag(test01(t)) lc 1, \
[t=t0:t1] 1*real(test02(t)),1*imag(test02(t)) lc 2
EOF
First Result:
Second Result:
Combined Result:
I have a Python script that writes out a .gnu file, which gnuplot then uses to plot a .png file. I am trying to make the yrange more dynamic by setting the range to be 5% of the max and min.
What am I doing wrong?
This code will not run like this.
#-- write out .gnu file
self.output = textwrap.dedent('''\
set terminal png size 800,600
set output "{0}"
set grid
set xlabel "Cycle"
set title "{1}"
set xtics ({2})
set yrange[GPVAL_Y_MIN:GPVAL_Y_MAX]
plot ''').format(self.figurename, self.title, ",".join(plot_data.keys()), self.styletype, self.datafile)
for n in range(0,max_num_lines):
    tmp_str = " ".join(['"{2}"','using','1:'+str(n+2),'title',"'"+self.titles[n+1]+"'",'w linespoints {1}']).format(self.figurename, self.linecombos[n], self.datafile)
    if n!=max(range(0,max_num_lines)):
        tmp_str += ", "
    self.output += tmp_str
    pass
The internal variables GPVAL_Y_MIN and GPVAL_Y_MAX are not initialized until you actually plot something. To work around this, you can use the stats command, which analyzes a file and provides the desired min/max values in the STATS_min_y/STATS_max_y variables:
self.output = textwrap.dedent('''\
set terminal png size 800,600
set output "{0}"
set grid
set xlabel "Cycle"
set title "{1}"
set xtics ({2})
# analyze the data file and adjust the y-range
stats "{4}" nooutput
set yrange[STATS_min_y:STATS_max_y]
plot ''').format(self.figurename, self.title, ",".join(plot_data.keys()), self.styletype, self.datafile)
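To get the margin the question asks for, the stats values can be widened before they are used. This is a minimal sketch, assuming the y values sit in column 2 of the data file and reading "5%" as 5% of the data span added on each side; `gnuplot_header` and `pad_fraction` are hypothetical names. With a single `using` column, stats reports `STATS_min`/`STATS_max` rather than the `_y` variants, and because the y-range is set only after stats runs (in a fresh session), stats sees all values.

```python
import textwrap


def gnuplot_header(figurename, title, xtics, datafile, pad_fraction=0.05):
    """Build the start of a .gnu script whose y-range is padded around the data.

    Assumes the y values live in column 2 of datafile (illustrative helper).
    """
    return textwrap.dedent('''\
        set terminal png size 800,600
        set output "{0}"
        set grid
        set xlabel "Cycle"
        set title "{1}"
        set xtics ({2})
        # analyze column 2, then widen the range by a fraction of the data span
        stats "{3}" using 2 nooutput
        pad = {4} * (STATS_max - STATS_min)
        set yrange [STATS_min - pad : STATS_max + pad]
        plot ''').format(figurename, title, xtics, datafile, pad_fraction)


if __name__ == "__main__":
    print(gnuplot_header("fig.png", "Title", "1,2", "data.dat"))
```

The plot clauses would then be appended to the returned string exactly as in the question's loop.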
I have time-series data, but some data lines are missing, and gnuplot bridges these intervals with long lines.
How can I make gnuplot draw nothing in these intervals instead?
P.S. I don't have empty cells in these lines; the lines are missing entirely.
lines:
column 1 ... col 195
13:30:20.8 0.78061899
13:30:21.8 5.969546498
13:32:19.8 17.21257881
13:32:20.8 6.922475345
If you don't want to draw a line between two points you must insert an empty line in the data file between the two point entries, so that effectively you have
13:30:20.8 0.78061899
13:30:21.8 5.969546498

13:32:19.8 17.21257881
13:32:20.8 6.922475345
This cannot be done with gnuplot directly, but you can use e.g. awk to do the processing on-the-fly:
set timefmt '%H:%M:%S'
set xdata time
filename = 'data.txt'
plot '< awk ''{split($1,d,":"); t_prev = t; t = (d[1] * 60 + d[2])*60 + d[3]; if (t_prev && (t - t_prev > 10)) print ""; print }'' '.filename using 1:2 with lines
Here, the gap threshold is 10 seconds.
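The same preprocessing can be sketched in Python if awk is not at hand. This is an illustrative equivalent of the awk one-liner, not part of the original answer: it assumes the time stamp is the first whitespace-separated field in HH:MM:SS(.s) format, and the script name `gaps.py` is hypothetical. Unlike `t_prev &&` in the awk version, it tracks "no previous line" with `None`, so a record at 00:00:00 is not mistaken for the first line.

```python
import sys


def to_seconds(stamp):
    """Convert an HH:MM:SS(.s) time stamp to seconds since midnight."""
    hours, minutes, seconds = stamp.split(":")
    return (int(hours) * 60 + int(minutes)) * 60 + float(seconds)


def insert_gap_markers(lines, max_gap=10.0):
    """Yield the data lines, adding an empty line wherever consecutive time
    stamps are more than max_gap seconds apart, so gnuplot breaks the line there."""
    previous = None
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # existing blank lines are dropped in this sketch
        current = to_seconds(fields[0])
        if previous is not None and current - previous > max_gap:
            yield ""
        yield line.rstrip("\n")
        previous = current


if __name__ == "__main__" and len(sys.argv) > 1:
    # e.g. from gnuplot: plot '< python3 gaps.py data.txt' using 1:2 with lines
    with open(sys.argv[1]) as data:
        for out_line in insert_gap_markers(data):
            print(out_line)
```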
If your missing-data marker is "NaN", you can use the following command
plot "data" using 1:($2) with linespoints
instead of
plot "data" using 1:2 with linespoints
The former ignores the missing data and treats it like a blank line, and therefore does not draw a connecting line across the gap, while the latter draws a continuous, unbroken line.
For the record, there are later questions about the same or a similar issue:
Avoid connection of points when there is empty data
How to remove line between "jumping" values, in gnuplot?
Removing vertical lines due to sudden jumps in gnuplot
However, my solutions there require transparent colors, which were not available at the time of the OP's question (gnuplot 4.6.5, Feb 2014). Nevertheless, there are solutions that need neither external tools like awk nor changes to the data.
First solution for gnuplot 4.6: instead of a transparent line, use a white line. This will, however, cover the grid lines, although that will hardly be visible.
Second solution for gnuplot 4.6: use vectors. This really interrupts the line and works with gnuplot 5.x as well.
Data:
00:00:00 0.406406
00:00:44 0.339779
00:01:28 0.986602
00:02:13 0.17746
00:02:57 0.0580277
00:03:42 0.586614
00:04:26 0.84247
00:05:11 0.597502
00:05:55 0.0394846
00:06:40 0.369416
00:13:20 0.527109
00:13:42 0.371411
00:14:04 0.851465
00:14:26 0.980312
00:14:48 0.431391
00:15:11 0.545491
00:15:33 0.708445
00:15:55 0.861669
00:16:17 0.277122
00:16:40 0.787273
Script:
### avoid showing a line across larger time gaps
reset
FILE = "SO26510245.dat"
myFmt = "%H:%M:%S"
tGap = 60 # 60 seconds
set format x "%H:%M"
set timefmt "%H:%M:%S"
set xdata time
set ytics 0.5
set key top center noautotitle
set grid x,y
set multiplot layout 3,1
plot FILE u 1:2 w l lc rgb "red" ti "data as is"
myColor(col) = (t0=t1, t1=timecolumn(1), t1-t0>tGap ? 0xffffff : 0x0000ff)
plot t1=NaN FILE u 1:2:(myColor(1)) w l lc rgb var ti "white line"
myGap(col) = (t1-t0>tGap ? NaN : y0)
plot t1=y1=NaN FILE u (t0=t1,t1=timecolumn(1),t0):(y0=y1,y1=$2,myGap(0)):(t1-t0):(y1-y0) \
w vec lc rgb "web-green" nohead ti "with vectors"
unset multiplot
### end of script
Result: (created with gnuplot 4.6.0, from March 2012)
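The vector approach amounts to drawing one segment per pair of consecutive points and dropping the segments that span a large gap. The Python sketch below mirrors that logic outside gnuplot, only to illustrate it; `gap_free_segments` is a hypothetical name, the times are assumed to be already converted to seconds, and each tuple has the same (x, y, dx, dy) shape as the `u ...:...:(t1-t0):(y1-y0)` columns of the script.

```python
def gap_free_segments(points, max_gap):
    """Return (x, y, dx, dy) vectors joining consecutive points, leaving out
    any pair whose x values are more than max_gap apart."""
    segments = []
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x1 - x0 > max_gap:
            continue  # in the gnuplot script, y becomes NaN here, so no vector is drawn
        segments.append((x0, y0, x1 - x0, y1 - y0))
    return segments


if __name__ == "__main__":
    # a few rows of the sample data, times converted to seconds
    sample = [(0, 0.406406), (44, 0.339779), (400, 0.369416), (800, 0.527109), (822, 0.371411)]
    for seg in gap_free_segments(sample, max_gap=60):
        print(seg)  # two segments: 0 s to 44 s and 800 s to 822 s
```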
I am making several graphs at once with a perl script which runs gnuplot and outputs png images.
My data looks like:
3.57 3.13 2.88 3.38 A1H1'-A1H8
4.95 4.53 4.17 4.89
3.91 3.37 3.11 3.64
3.98 4.22 3.88 4.55 A1H2'-A2H1'
...
The columns are x, y, y low, y high, and point label.
The gnuplot input is:
set xlabel 'X-Ray Distance (Angstrom)'
set ylabel 'NOESY Distance (Angstrom)'
set title 'r(AAAA) A-Form Correlation'
set terminal png size 1200, 900
set xrange[2:9]
set yrange[2:9]
set output 'correlation_AAAA.png'
plot x title 'NMR = X-Ray', \
'correlation_AAAA.dat' title 'NMR' with yerrorbars
My question is, how can I get the 5th column to show as a label for some points (not all)?
This link: http://newsgroups.derkeiler.com/Archive/Comp/comp.graphics.apps.gnuplot/2008-02/msg00094.html says it is very difficult (nigh on impossible)
I got lost in all of the GNUPlot documentation. My own fault.
Here is the minimal working solution:
set label " A4H8-A3H3'" at 7.42, 2.98
set rmargin at screen 0.92
plot x title 'NMR = X-Ray', \
'correlation_AAAA.dat' title 'NMR' with yerrorbars
pause -1
The "set rmargin" is necessary so that labels don't run off the edge of the screen.
Thanks for your kind help, Christoph. I was confused when I tried to put "using labels with yerrorbars", which gnuplot did not like.
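Since the graphs are already produced by a script, the manual set label lines could be generated from the data instead of typed by hand. This is a hypothetical sketch in Python (the same logic would fit in the existing perl script): it assumes whitespace-separated columns x, y, y low, y high, label, skips rows without a 5th column, and assumes labels contain no double quotes. The leading space in the label text copies the manual nudge used above.

```python
def label_commands(rows):
    """Build gnuplot 'set label' commands for data rows that have a 5th (label) column."""
    commands = []
    for row in rows:
        fields = row.split()
        if len(fields) < 5:
            continue  # no label on this row
        x, y, name = fields[0], fields[1], fields[4]
        # the leading space nudges the text off the point, as in the manual label
        commands.append(f'set label " {name}" at {x}, {y}')
    return commands


if __name__ == "__main__":
    rows = [
        "3.57 3.13 2.88 3.38 A1H1'-A1H8",
        "4.95 4.53 4.17 4.89",
        "3.91 3.37 3.11 3.64",
        "3.98 4.22 3.88 4.55 A1H2'-A2H1'",
    ]
    print("\n".join(label_commands(rows)))
```

The generated lines would go before the plot command, together with the set rmargin setting.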
I’m trying to develop a histogram script. The plot itself seems correct, but I have some problems or questions:
I don’t understand why the “stats” output says my data file has “out of range” points. What does that mean?
The “stats” minimum value doesn’t look correct, either. From the data file, minimum = -0.0312, but stats reports 0.0.
The script:
# Gnuplot histogram from "Gnuplot In Action", 13.2.1 Jitter plots and histograms (p. 256)
# these functions put data points (x) into bins of specified width
bin(x,width) = width*floor(x/width)
binwidth = 0.01
set boxwidth binwidth
# data file
data_file = "sorted.csv"
png_file = "sorted.png"
datapoint_count = 14
# taking explanations from the data file
set style data linesp
set key autotitle columnheader
set datafile separator "," # CSV format
# histogram
myTitle = "Histogram from \n" . data_file
set title myTitle
set style fill solid 1.0
set xlabel "Slack"
set mxtics
set ylabel "Count"
set yrange [0:*] # min count is always 0
set terminal png # plot file format
set output png_file # plot to file
print "xrange="
show xrange
print "yrange="
show yrange
stats data_file using ($1)
print "STATS_records=", STATS_records
print "STATS_invalid=", STATS_invalid
print "STATS_blank=", STATS_blank
print "STATS_min=", STATS_min
print "STATS_max=", STATS_max
plot data_file using (bin($1,binwidth)):(1) smooth frequency with boxes
The data file:
slack
-0.0312219
-0.000245109
-4.16338e-05
-2.08616e-05
-1.82986e-05
8.31485e-06
1.00136e-05
1.23084e-05
0
0.000102907
0.000123322
0.000138402
0.19044
0.190441
The output:
gnuplot sorted.gp
Could not find/open font when opening font "arial", using internal non-scalable font
xrange=
set xrange [ * : * ] noreverse nowriteback # (currently [-10.0000:10.0000] )
yrange=
set yrange [ 0.00000 : * ] noreverse nowriteback # (currently [:10.0000] )
* FILE:
Records: 9
Out of range: 5
Invalid: 0
Blank: 0
Data Blocks: 1
* COLUMN:
Mean: 0.0424
Std Dev: 0.0792
Sum: 0.3813
Sum Sq.: 0.0725
Minimum: 0.0000 [3]
Maximum: 0.1904 [8]
Quartile: 0.0000
Median: 0.0001
Quartile: 0.0001
STATS_records=9.0
STATS_invalid=0.0
STATS_blank=0.0
STATS_min=0.0
STATS_max=0.190441
If you give a single column to the stats command, the yrange is used to select which values from that column are analyzed.
At first sight this doesn't make sense, but stats behaves like a plot command with only a single column: that column is the y value and the row number is chosen as the x value. With the yrange already set to [0:*], the five negative values are therefore counted as "out of range", and the minimum becomes 0.
So just move the set yrange command after the stats command.
data_file = 'sorted.csv'
stats data_file using 1
show variables all
set yrange [0:*]
plot data_file ...
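A quick way to check this explanation is to apply the same [0:*] filter to the 14 values by hand. The Python sketch below (a hypothetical helper, not a gnuplot API) keeps only the values inside the y-range and reproduces the numbers gnuplot printed: 9 records, 5 out of range, minimum 0, maximum 0.190441, sum 0.3813.

```python
def filter_like_yrange(values, low=0.0, high=None):
    """Split values the way a y-range [low:high] filters a single-column stats call
    (high=None stands for an open upper end, like '*')."""
    inside = [v for v in values if v >= low and (high is None or v <= high)]
    return {
        "records": len(inside),
        "out_of_range": len(values) - len(inside),
        "min": min(inside),
        "max": max(inside),
        "sum": sum(inside),
    }


slack = [-0.0312219, -0.000245109, -4.16338e-05, -2.08616e-05, -1.82986e-05,
         8.31485e-06, 1.00136e-05, 1.23084e-05, 0.0, 0.000102907,
         0.000123322, 0.000138402, 0.19044, 0.190441]

if __name__ == "__main__":
    result = filter_like_yrange(slack)
    print(result["records"], result["out_of_range"])  # 9 5
    print(result["min"], result["max"])               # 0.0 0.190441
```

Calling it with low=float("-inf") corresponds to an unrestricted range and keeps all 14 values, with -0.0312219 as the minimum.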