Sometimes it is useful to digitize a scanned graph to recover the data and redraw the plot.
I'm aware that there are dedicated tools with many features; see e.g. the miscellaneous links on the gnuplot home page: http://gnuplot.info/links.html
However, is there a way to do this using gnuplot only? Perhaps a simple digitizer can be implemented in a few lines of code, which should already be sufficient for cases with a limited number of datapoints.
(edit: part of original question now put as answer)
In case this might be useful to someone... this gnuplot version is just for fun, sharing and for demonstration of gnuplot capabilities.
Improvements are welcome, e.g.
logarithmic axes, time axes, multiple curves, tilted images, etc.
How it works:
an image is imported into a graph. gnuplot supports e.g. GIF, JPG, PNG. To see the list of supported filetypes, type show datafile binary filetypes in the gnuplot console.
the first 3 mouseclicks have to be on 1) end of y-axis, 2) origin, and 3) end of x-axis
further mouseclicks draw datapoints
The following key bindings are implemented:
0 (re-)define the axes from end of y-axis to origin to end of x-axis
x remove all datapoints
c clear only last datapoint
s save data to file
Code:
### a simple gnuplot digitizer
reset session
IMAGE = 'Superconductivity.png'
DATAFILE = 'Digitized.dat'
OriginX = 4.00
OriginY = 0.00
AxisYEnd = 0.15
AxisXEnd = 4.40
DataHeader = '# Temperature Resistance'
set print $Data
print DataHeader
set print
print $Data
$Axes <<EOD
EOD
set margins 0,0,0,0
AxesPoints = 0
fx(point) = (word(point,1)-word($Axes[2],1))/(word($Axes[3],1)-word($Axes[2],1))*(AxisXEnd - OriginX)+OriginX
fy(point) = (word(point,2)-word($Axes[2],2))/(word($Axes[1],2)-word($Axes[2],2))*(AxisYEnd - OriginY)+OriginY
bind Button1 '\
if (AxesPoints<3) { set print $Axes append; print MOUSE_X, MOUSE_Y; AxesPoints=AxesPoints+1; replot; } \
else { set print $Data append; print MOUSE_X, MOUSE_Y; set print; replot; }'
bind 0 'AxesPoints=0; set print $Axes; print ""; set print; replot;'
bind x 'set print $Data; print DataHeader; set print; replot'
bind c 'if (|$Data|>2) {array A[|$Data|]; do for [i=1:|$Data|-1] { A[i]=$Data[i] }; \
set print $Data; do for [i=1:|A|-1] { print A[i] }; } \
else {set print $Data; print DataHeader; }; set print; replot;'
bind s 'set print DATAFILE; print $Data[1]; set print DATAFILE append; \
do for [i=2:|$Data|] { print sprintf("%g %g", fx($Data[i]), fy($Data[i])) }; \
set print; pause -1 sprintf("Data saved to: %s",DATAFILE);'
plot IMAGE binary filetype=auto origin=(0,0) dx=1 dy=1 with rgbimage notitle, \
$Axes u 1:2 w l lw 2 lc "blue" noautoscale not, \
$Data u 1:2 w p pt 7 lc "red" noautoscale notitle
### end of code
Input: Superconductivity.png
Procedure: (Screen capture of wxt terminal)
Result: Digitized.dat
# Temperature Resistance
4.3696 0.130726
4.3328 0.126117
4.232 0.113966
4.20241 0.00157884
4.19248 0.000406976
4.18307 0.000751641
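The conversion from mouse position to data value done by fx() and fy() is a plain linear interpolation between two calibration clicks per axis. The following Python lines are only a minimal sketch of that mapping, not part of the gnuplot script; the pixel positions and the helper name make_axis_map are made up for illustration, and linear (non-logarithmic) axes are assumed.

```python
def make_axis_map(pix_origin, pix_end, val_origin, val_end):
    """Return a function that maps a pixel coordinate to a data value (linear axis)."""
    scale = (val_end - val_origin) / (pix_end - pix_origin)
    return lambda pix: (pix - pix_origin) * scale + val_origin

# Hypothetical pixel positions of the three calibration clicks
y_end_px = (50.0, 400.0)    # 1) end of y-axis
origin_px = (50.0, 40.0)    # 2) origin
x_end_px = (550.0, 40.0)    # 3) end of x-axis

# Axis values as in the script: OriginX, AxisXEnd, OriginY, AxisYEnd
fx = make_axis_map(origin_px[0], x_end_px[0], 4.00, 4.40)
fy = make_axis_map(origin_px[1], y_end_px[1], 0.00, 0.15)

# A click halfway along each axis maps to the middle of the axis range
print(fx(300.0), fy(220.0))
```

For a logarithmic axis, the same interpolation would have to be applied to the logarithm of the axis values instead.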
I would like to generate a three-dimensional mesh or net with a variable length along each of its three sides. How can I achieve that in gnuplot?
Thanks a lot for help.
Your question is not very detailed... furthermore, no code and no research effort visible (By the way, that's what people expect here on SO). In gnuplot console, e.g. check help do, help print, help sprintf or other keywords.
A guess what you might want could be the following:
Code:
### creating a 3D grid
reset session
set view equal xyz
dx = 0.7
dy = 0.8
dz = 0.6
set print $Data
do for [z=0:3] {
do for [y=0:4] {
do for [x=0:5] {
print sprintf("%g %g %g",x*dx,y*dy,z*dz)
}
print ""
}
print ""
}
set print
set xtics 1
set ytics 1
set ztics 1
set view 73,53
splot $Data u 1:2:3 w lp pt 7
### end of code
Result:
I have a data file file.dat with three columns (radius, angle, temperature) for points in the plane, and I want to plot this data as a histogram using polar coordinates and a color map, like in the figure below but using gnuplot. I can create a histogram.dat file with the values of the bins that I want, but I don't know how to plot it in gnuplot.
To my knowledge there is no right-away "polar heatmap" plotting style in gnuplot (but I could be wrong, at least, I haven't seen an example on the demo page). Hence, you have to implement it yourself.
Basically, for each datapoint you have to plot a filled segment. Therefore, for each datapoint you have to create points on the circumference of this single segment. Then you can plot this segment with filledcurves and a specific color.
Assumptions:
data is in a regular grid/steps in angle (astep) and radius (rstep).
data is in a datablock (how to get it from a file into a datablock, see gnuplot: load datafile 1:1 into datablock)
separators are whitespaces
no header lines
Further optimization potential:
automatic extraction of astep and rstep.
I hope you can adapt the code to your needs.
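To make the segment construction more concrete: for one datapoint, the outline runs along the inner arc from the lower to the upper angle and then back along the outer arc. The following Python lines are only an illustrative sketch of that point order (the function name segment_outline and the parameter n are my own choices); the gnuplot code does the same with two do for loops.

```python
def segment_outline(a, r, astep, rstep, n=10):
    """Outline of one annular segment centred on (a, r) as (angle, radius) pairs.

    The angular width astep is split into n pieces; the inner arc is walked
    with increasing angle, the outer arc with decreasing angle, so that the
    points form a closed outline suitable for a filled polygon.
    """
    half = n // 2
    angles = [a + j * astep / n for j in range(-half, half + 1)]
    inner = [(ang, r - 0.5 * rstep) for ang in angles]
    outer = [(ang, r + 0.5 * rstep) for ang in reversed(angles)]
    return inner + outer

# One segment centred at angle 20 degrees and radius 5
for angle, radius in segment_outline(20, 5, astep=10, rstep=1):
    print(angle, radius)
```

More intermediate points (larger n) give smoother arcs, which matters mostly for segments at large radius.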
Code:
### workaround for polar heatmap
reset session
set size square
set angle degrees
unset border
unset tics
set cbtics
set polar
set border polar
unset raxis
# create some test data
f(a,r) = r*cos(a) * r*sin(a) + rand(0)*100
set print $Data
do for [a=0:350:10] {
do for [r=1:20] {
print sprintf("%g %g %g",a,r,f(a,r))
}
}
set print
astep = 10
rstep = 1
# create the segments for each datapoint
set print $PolarHeatmap
do for [i=1:|$Data|] {
a = real(word($Data[i],1))
r = real(word($Data[i],2))
c = real(word($Data[i],3))
do for [j=-5:5] {
print sprintf("%g %g %g",a+j*astep/10., r-0.5*rstep, c)
}
do for [j=5:-5:-1] {
print sprintf("%g %g %g",a+j*astep/10., r+0.5*rstep, c)
}
print ""
print ""
}
set print
set style fill noborder
set palette defined (0 "blue", 1 "grey", 2 "green")
plot $PolarHeatmap u 1:2:3 w filledcurves palette notitle
### end of code
Result:
I have a set of data files. I would like to perform a series of operations on each file (such as fitting) and stack the resulting curves continuously as my analysis proceeds (to see how each curve fits into the bigger picture). I wrote the following code snippet:
reset
PATH = 'XRP_'
nmin = 1
nmax = 20
f(x) = log10(x); h(x) = a*x + b
name(i) = sprintf(PATH.'%04d/data_main_ddnls_twod_mlce.dat', i)
set xrange [0:7]
start = 0
set fit
do for [i=nmin:nmax]{
fit [4:] h(x) name(i) using (f($1)):(f($4)) via a, b
if (start==0){
plot name(i) using (f($1)):(f($4)) w l title sprintf("%04d", i)
} else {
replot name(i) using (f($1)):(f($4)) w l title sprintf("%04d", i)
}
start = start + 1
pause -1
}
# Add the slope
replot (1./5.)*x + 0.5 lc 'black' lw 3 dt 2
unset fit
# pause -1
Instead of stacking all the previous curves + the current one, it plots only the current curve i-times (see loop of code). For instance, after 10 iterations it plots only the 10th datafile, 10 times (see legends on picture)
How can I fix this?
The reason your plot behaves the way it does, and example (1) from theozh does also, is that "replot f(x)" acts by tacking ", f(x)" onto the end of the previous plot command. By putting it in a loop you are basically creating the successive commands
plot f(x,i)
plot f(x,i), f(x,i)
plot f(x,i), f(x,i), f(x,i)
...
Yes the value of i might change each time, but nevertheless each plot command produces multiple copies of the same thing.
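As a toy model of this behaviour (an illustration of the explanation above, not gnuplot's actual implementation), one can think of the plot command as a list of unevaluated expressions that are all evaluated with the current value of i at draw time. The function name simulate_replot_loop is made up for this sketch.

```python
def simulate_replot_loop(n):
    """Return what gets drawn after each iteration of a plot/replot loop."""
    command = []            # stored plot elements, kept as unevaluated text
    frames = []
    for i in range(1, n + 1):
        element = "x**{i}"  # the loop variable is NOT substituted when stored
        if i == 1:
            command = [element]       # plot ...
        else:
            command.append(element)   # replot ... tacks on ", ..."
        # at draw time every stored element sees the current value of i
        frames.append([e.format(i=i) for e in command])
    return frames

for frame in simulate_replot_loop(3):
    print(", ".join(frame))
```

In this model the third pass draws x**3 three times, which corresponds to the repeated legend entries seen in the question.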
Alternative solution: I don't normally recommend multiplot mode for creating a single output, but in this case it may be the best option.
# force identical margins even if the range changes
set margins screen 0.1, screen 0.9, screen 0.1, screen 0.9
# ... same preliminary stuff as shown in the question
# revised loop using multiplot rather than replot
set multiplot
do for [i=nmin:nmax]{
fit [4:] h(x) name(i) using (f($1)):(f($4)) via a, b
plot name(i) using (f($1)):(f($4)) w l \
title sprintf("%04d", i) at screen 0.9, screen 1.0 - 0.02*i
unset tics
}
unset multiplot
Note that you cannot use auto-generated title placement because each of the multiplot iterations will put the title in the same place. So instead we use the form "title foo at <position>". Similarly, it is better to turn off tic generation after the first pass so that you don't redraw the tics and labels each time through the loop.
Indeed, a strange behaviour which I also would not have expected. See the minimal examples below.
Version 1: basically your attempt. Not the expected result. I also don't know why.
Version 2: the expected result. Basically the same but not in a loop.
Version 3: the expected result, although in a loop but using eval.
Not very satisfying but at least some solution. Hopefully, others will have better solutions or explanations.
### plotting in a loop
reset session
set colorsequence classic
# Version 1
set title "Version 1"
do for [i=1:5] {
if (i==1) { plot x**i }
else { replot x**i noautoscale }
}
pause -1
# Version 2
set title "Version 2"
plot x**1
replot x**2 noautoscale
replot x**3 noautoscale
replot x**4 noautoscale
replot x**5 noautoscale
pause -1
# Version 3
set title "Version 3"
do for [i=1:5] {
if (i==1) { cmd = sprintf("plot x**%d",i) }
else { cmd = sprintf("replot x**%d noautoscale",i) }
eval cmd
}
### end of code
I have x- and y-data points representing a star cluster. I want to visualize the density using Gnuplot and its scatter function with overlapping points.
I used the following commands:
set style fill transparent solid 0.04 noborder
set style circle radius 0.01
plot "data.dat" u 1:2 with circles lc rgb "red"
The result:
However I want something like that
Is that possible in Gnuplot? Any ideas?
(edit: revised and simplified)
Probably a much better way than my previous answer is the following:
For each data point, check how many other data points are within a radius of R. You need to play with the value of R to get a reasonable graph.
Indexing the datalines requires gnuplot>=5.2.0 and the data in a datablock (without empty lines). You can either first plot your file into a datablock (check help table) or see here:
gnuplot: load datafile 1:1 into datablock
The time for creating this graph grows quadratically with the number of points, O(N^2), because you have to check each point against all others. I'm not sure if there is a smarter and faster method. The example below with 1200 datapoints takes about 4 seconds on my laptop. You can basically apply the same principle in 3D.
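The counting principle can be written down compactly outside gnuplot as well. The following Python lines are only a sketch of the same brute-force O(N^2) idea (the function name local_density is my own); as in the gnuplot script, each point is also counted in its own neighbourhood.

```python
import math

def local_density(points, radius):
    """For each (x, y) point, count the points within `radius` (including
    the point itself) and divide by the area of the search circle."""
    area = math.pi * radius ** 2
    result = []
    for x0, y0 in points:
        count = sum(
            1 for x1, y1 in points
            if math.hypot(x1 - x0, y1 - y0) <= radius
        )
        result.append((x0, y0, count / area))
    return result

# Two close points and one isolated point
for x, y, d in local_density([(0.0, 0.0), (0.1, 0.0), (3.0, 3.0)], radius=0.5):
    print(x, y, d)
```

The double loop over all points is what makes the runtime quadratic; a spatial grid or tree would be needed to do better.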
Script: works with gnuplot>=5.2.0
### 2D density color plot
reset session
t1 = time(0.0)
# create some random test data
set table $Data
set samples 700
plot '+' u (invnorm(rand(0))):(invnorm(rand(0))) w table
set samples 500
plot '+' u (invnorm(rand(0))+2):(invnorm(rand(0))+2) w table
unset table
print sprintf("Time data creation: %.3f s",(t0=t1,t1=time(0.0),t1-t0))
# for each datapoint: how many other datapoints are within radius R
R = 0.5 # Radius to check
Dist(x0,y0,x1,y1) = sqrt((x1-x0)**2 + (y1-y0)**2)
set print $Density
do for [i=1:|$Data|] {
x0 = real(word($Data[i],1))
y0 = real(word($Data[i],2))
c = 0
stats $Data u (Dist(x0,y0,$1,$2)<=R ? c=c+1 : 0) nooutput
d = c / (pi * R**2) # density: points per unit area
print sprintf("%g %g %g", x0, y0, d)
}
set print
print sprintf("Time density check: %.3f sec",(t0=t1,t1=time(0.0),t1-t0))
set size ratio -1 # same screen units for x and y
set palette rgb 33,13,10
plot $Density u 1:2:3 w p pt 7 lc palette z notitle
### end of script
Result:
Would it be an option to postprocess the image with imagemagick?
# convert into a gray scale image
convert source.png -colorspace gray -sigmoidal-contrast 10,50% gray.png
# build the gradient, the heights have to sum up to 256
convert -size 10x1 gradient:white-white white.png
convert -size 10x85 gradient:red-yellow \
gradient:yellow-lightgreen \
gradient:lightgreen-blue \
-append gradient.png
convert gradient.png white.png -append full-gradient.png
# finally convert the picture
convert gray.png full-gradient.png -clut target.png
I have not tried but I am quite sure that gnuplot can plot the gray scale image directly.
Here is the (rotated) gradient image:
This is the result:
Although this question is rather "old" and the problem might have been solved differently...
It's probably more for curiosity and fun than for practical purposes.
The following code implements a coloring according to the density of points using gnuplot only. On my older computer it takes a few minutes to plot 1000 points. I would be interested if this code can be improved especially in terms of speed (without using external tools).
It's a pity that gnuplot does not offer basic functionality like sorting, look-up tables, merging, transposing or other basic functions (I know... it's gnuPLOT... and not an analysis tool).
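For comparison, the binning idea behind the gnuplot code (assign each point a single bin number on a regular grid, count the points per bin, then attach the count to each point) takes only a few lines in a general-purpose language. This Python sketch mirrors the BinNo() formula of the script; the function names are my own, and it assumes the points are not all on one vertical or horizontal line (otherwise the range is zero).

```python
import math
from collections import Counter

def bin_number(x, y, xmin, ymin, xrange_, yrange_, nx, ny):
    """Single integer bin index: row * nx + column, as in BinNo() of the script."""
    col = math.floor((x - xmin) / xrange_ * nx)
    row = math.floor((y - ymin) / yrange_ * ny)
    return row * nx + col

def points_per_bin(points, nx=20, ny=20):
    """Return (x, y, bin number, number of points in that bin) for each point."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    xmin, ymin = min(xs), min(ys)
    xr, yr = max(xs) - xmin, max(ys) - ymin
    bins = [bin_number(x, y, xmin, ymin, xr, yr, nx, ny) for x, y in points]
    counts = Counter(bins)  # look-up table: bin number -> number of points
    return [(x, y, b, counts[b]) for (x, y), b in zip(points, bins)]

for row in points_per_bin([(0.0, 0.0), (0.1, 0.1), (1.0, 1.0)], nx=2, ny=2):
    print(row)
```

Note that, as with the formula in the script, a point exactly at the maximum x or y falls just outside the last regular column or row.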
The code:
### density color plot 2D
reset session
# create some dummy datablock with some distribution
N = 1000
set table $Data
set samples N
plot '+' u (invnorm(rand(0))):(invnorm(rand(0))) w table
unset table
# end creating dummy data
stats $Data u 1:2 nooutput
XMin = STATS_min_x
XMax = STATS_max_x
YMin = STATS_min_y
YMax = STATS_max_y
XRange = XMax-XMin
YRange = YMax-YMin
XBinCount = 20
YBinCount = 20
BinNo(x,y) = floor((y-YMin)/YRange*YBinCount)*XBinCount + floor((x-XMin)/XRange*XBinCount)
# do the binning
set table $Bins
plot $Data u (BinNo($1,$2)):(1) smooth freq # with table
unset table
# prepare final data: BinNo, Sum, XPos, YPos
set print $FinalData
do for [i=0:N-1] {
set table $Data3
plot $Data u (BinNumber = BinNo($1,$2),$1):(XPos = $1,$1):(YPos = $2,$2) every ::i::i with table
plot [BinNumber:BinNumber+0.1] $Bins u (BinNumber == $1 ? (PointsInBin = $2,$2) : NaN) with table
print sprintf("%g\t%g\t%g\t%g", XPos, YPos, BinNumber, PointsInBin)
unset table
}
set print
# plot data
set multiplot layout 2,1
set rmargin at screen 0.85
plot $Data u 1:2 w p pt 7 lc rgb "#BBFF0000" t "Data"
set xrange restore # use same xrange as previous plot
set yrange restore
set palette rgbformulae 33,13,10
set colorbox
# draw the bin borders
do for [i=0:XBinCount] {
XBinPos = i/real(XBinCount)*XRange+XMin
set arrow from XBinPos,YMin to XBinPos,YMax nohead lc rgb "grey" dt 1
}
do for [i=0:YBinCount] {
YBinPos = i/real(YBinCount)*YRange+YMin
set arrow from XMin,YBinPos to XMax,YBinPos nohead lc rgb "grey" dt 1
}
plot $FinalData u 1:2:4 w p pt 7 ps 0.5 lc palette z t "Density plot"
unset multiplot
### end of code
The result:
This is a minimal working example of the code I'm using:
#!/bin/bash
gnuplot << EOF
set term postscript portrait color enhanced
set encoding iso_8859_1
set output 'temp.ps'
set grid noxtics noytics noztics front
set size ratio 1
set multiplot
set lmargin 9; set bmargin 3; set rmargin 2; set tmargin 1
n=32 #number of intervals
max=13. #max value
min=-3.0 #min value
width=(max-min)/n #interval width
hist(x,width)=width*floor(x/width)+width/2.0
set boxwidth width
set style fill solid 0.25 noborder
plot "< awk '{if (3.544068>=\$1) {print \$0}}' /data_file" u (hist(\$2,width)):(1.0) smooth freq w boxes lc rgb "red" lt 1 lw 1.5 notitle
EOF
which gets me this:
What I need is to use histeps instead, but when I change boxes to histeps in the plot command above, I get:
What is going on here??
Here's the data_file. Thank you!
EDIT: If it is not possible to have histeps follow the actual outer limits of the bars instead of interpolating values in between (like boxes does), then how could I draw just the outline of a histogram made with boxes?
EDIT2: As usual, mgilson, your answer is beyond useful. One minor glitch though: this is the output I get when I combine both plots with the command:
plot "< awk '{if (3.544068>=\$1) {print \$0}}' data_file" u (hist(\$2,width)):(1.0) smooth freq w boxes lc rgb "red" lt 1 lw 1.5 notitle, \
"<python pyscript.py data_file" u 1:2 w histeps lc rgb "red" lt 1 lw 1.5 notitle
Something appears to be shifting the output of the python script and I can't figure out what it might be.
(Fixed in comments)
The binning is quite easy if you have python + numpy. It's a very popular package, so you should be able to find it in your distribution's repository if you're on Linux.
#Call this script as:
#python this_script_name.py 3.14159 data_file.dat
import numpy as np
import sys
n=32 #number of intervals
dmax=13. #max value
dmin=-3.0 #min value
#primitive commandline parsing
limit = float(sys.argv[1]) #first argument is the limit
datafile = sys.argv[2] #second argument is the datafile to read
data = [] #empty list
with open(datafile) as f: #Open the datafile (second commandline argument) for reading.
    for line in f: #iterate through file returning 1 line at a time
        line = line.strip() #remove whitespace at start/end of line
        if not line or line.startswith('#'): #ignore empty lines and comment lines.
            continue
        c1,c2 = [float(x) for x in line.split()] #convert line into 2 floats and unpack
        if limit >= c1: #Keep only rows whose first column does not exceed the limit (your 3.544...)
            data.append(c2) #If c1 is within the limit, then c2 is part of the data
counts, edges = np.histogram(data, #data to bin
                             bins=n, #number of bins
                             range=(dmin,dmax), #bin range
                             density=False #raw counts; replaces `normed`, which newer numpy no longer accepts
                             )
centers = (edges[1:] + edges[:-1])/2. #average the bin edges to the center.
for center,count in zip(centers,counts): #iterate through centers and counts at same time
    print(center,count) #write 'em out for gnuplot to read.
and the gnuplot script looks like:
set term postscript portrait color enhanced
set output 'temp.ps'
set grid noxtics noytics noztics front
set size ratio 1
set multiplot
set lmargin 9
set bmargin 3
set rmargin 2
set tmargin 1
set style fill solid 0.25 noborder
plot "<python pyscript.py 3.445 data_file" u 1:2 w histeps lc rgb "red" lt 1 lw 1.5 notitle
I'll explain more when I get a little more free time ...