How to add data labels to Gnuplot histogram (smooth freq)?

I have data of protein molecular weights in column 6 of my file. The column in question looks like this:
MW [kDa]
16.8214045562515
101.41770820613989
24.332255496943485
43.946599899844436
210.58276787970942
57.987597263605494
27.384315650885558
119.02857910337919
8.962938979036466
I would like to plot a histogram and I am doing it using Gnuplot's smooth frequency function:
echo n=20 >$gnuplot #number of intervals
echo max=100 >> $gnuplot #max value
echo min=-0 >> $gnuplot #min value
echo width=\(max-min\)\/n >> $gnuplot #interval width
echo hist\(x,width\)=width*floor\(x\/width\)+width\/2.0 >> $gnuplot
echo plot \"$dataFile\" using \(hist\(\$6,width\)\)\:\(1.0\) smooth freq w boxes lc rgb\"blue\" notitle >> $gnuplot
How do I add a data label representing the count for each bin on top of each histogram bar? I cannot seem to find a way to do it.

I would plot the histogram data into a table first and then use this table for plotting the histogram itself and the labels.
Check the following example. If you have a file, e.g. 'myData.dat', skip the lines that generate random data, add the line FILE = 'myData.dat' instead, and replace every $Data with FILE. As @Eldrad mentioned in the comments, use the plotting style with labels for the labels. Check help labels and help table.
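To see what the binning expression does independently of gnuplot, here is a minimal Python sketch of the same idea: each value is mapped to the center of its bin via width*floor(x/width)+width/2, and the occurrences per center are counted, which is roughly what smooth freq does when the y value is a constant 1. The function names and the bin width of 25 are assumptions chosen for illustration; the values are the ones listed in the question.

```python
import math
from collections import Counter


def bin_center(x, width):
    """Center of the bin that contains x (same formula as in the gnuplot code)."""
    return width * math.floor(x / width) + width / 2.0


def histogram(values, width):
    """Return sorted (bin_center, count) pairs, one per non-empty bin."""
    counts = Counter(bin_center(v, width) for v in values)
    return sorted(counts.items())


# Molecular weights from the question; the width of 25 kDa is an assumed value.
mw = [
    16.8214045562515, 101.41770820613989, 24.332255496943485,
    43.946599899844436, 210.58276787970942, 57.987597263605494,
    27.384315650885558, 119.02857910337919, 8.962938979036466,
]

table = histogram(mw, 25.0)
for center, count in table:
    # x position of the box, then the count that would be used as its label
    print(f"{center:g}\t{count}")
```

Each (center, count) pair plays the role of one row in the tabulated histogram: the center is the x position of the box and the count is the text placed above it.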
Code:
### histogram with labeled bins
reset session
# create some random test data
set print $Data
do for [i=1:2000] {
print sprintf("%g",(invnorm(rand(0))+10)*20)
}
set print
stats $Data u 1 nooutput
xmin = STATS_min
xmax = STATS_max
N = 20
myWidth = (xmax-xmin)/N
bin(col) = myWidth*floor(column(col)/myWidth)+myWidth/2.
set key noautotitle
set style fill solid 0.3
set boxwidth myWidth
set grid x,y
set offsets graph 0,0,0.05,0 # l,r,t,b
set table $Histo
plot $Data u (bin(1)):(1) smooth freq
unset table
plot $Histo u 1:2 w boxes lc rgb "blue", \
'' u 1:2:2 w labels offset 0,0.7
### end of code
Result:

Related

Place tics at values stored in variables

I have used the stats command to store the x-position of the absolute maxima in my plot of seven datasets in seven variables, grN_pos_max_y, with N going from 1 to 7. Can I place the tics on the x-axis at the positions specified by these variables?
I tried using
$maxima << EOD
gr1_pos_max_y
gr2_pos_max_y
gr3_pos_max_y
gr4_pos_max_y
gr5_pos_max_y
gr6_pos_max_y
gr7_pos_max_y
EOD
and then
plot ..., \
$maxima u 1:(NaN):xticlabel(1) notitle
but I don't know how to read variables into a data block (if I replace the variable names by their values, however, it works).
Edit: This is what I want (I plotted it using Ethan's answer)
I'm not entirely sure I understand what you want, but this may get you partway there:
set xtics add (gr1_pos_max_y, gr2_pos_max_y, gr3_pos_max_y, gr4_pos_max_y, gr5_pos_max_y, gr6_pos_max_y, gr7_pos_max_y)
plot 'whatever'
That will get you plain (unlabeled) tic marks in addition to whatever tic marks and labels are being generated automatically.
If you want only these marks and no auto-generated marks, remove the keyword add.
If you want to place labels to go with these new tics, change it to:
set xtics add ("Max 1" gr1_pos_max_y, "Max 2" gr2_pos_max_y, ...)
This all assumes you want these tics to label a plot that contains something other than the tics themselves. If you want only a plot of these y values, perhaps as impulses, please rephrase the question or show a sketch of what you want it to look like.
There is no need for awk; you can do it all in gnuplot:
- Put stats into a loop and write the STATS values into a datablock $Maxima.
- Plot your data and $Maxima with impulses, as Ethan suggested.
- You can also plot the maxima y-values as labels in the graph.
The script needs to be adapted depending on your file naming scheme.
Script:
### extract maxima from several files
reset session
N = 7
myFile(n) = sprintf("SO72750257_%d.dat",n)
# create some "random" test data
do for [n=1:N] {
set table myFile(n)
f(x) = -a*(x-x0)**2 +y0
x0 = (n-1)*10./N + rand(0)*10./N
a = rand(0)*50+10
y0 = rand(0)*80+20
plot [0:10] '+' u 1:(f(x))
unset table
}
# extract maxima
set print $Maxima
do for [n=1:N] {
stats myFile(n) u 1:2 nooutput
print sprintf("%.1f %.1f", STATS_pos_max_y, STATS_max_y)
}
set print
set yrange[0:]
set offsets graph 0.05, graph 0.05, graph 0.1, 0
set xtics () # remove all xtics
set key out noautotitle
plot for [i=1:N] myFile(i) u 1:2 w l ti sprintf("Set %d",i), \
$Maxima u 1:2:($0+1):xtic(1) w impulses lc var dt 2, \
$Maxima u 1:2:2 w labels offset 0, char 1
### end of script
Result:

In gnuplot show only the maximum point of the graph and highlight it

In Gnuplot I write below code:
set xlabel "Time in Seconds"
set ylabel "Resistance in Ohms"
while(1){
set multiplot layout 2, 1 title " " font ",12"
set tmargin 1.5
set title "MQ7 Gas Sensor Data"
unset key
plot 'putty2.log' using 0:1 with lines ,'' using 0:2:2 with labels center boxed bs 1 notitle column
set title "MQ9 Gas Sensor Data"
unset key
plot 'putty2.log' using 0:3 with lines
pause 1;
reread;
}
This code draws a multiplot of the data file 'putty2.log' in gnuplot. After running it I got this:
but I want to show only the maximum point in the 1st multigraph.
Any help will be appreciated.
As a starting point, the following script is a simple way to identify maxima in noisy curves. Actually, the random test data generation takes almost more lines than the maxima extraction.
On the smoothed curve you simply check whether 3 consecutive y-values y0,y1,y2 fulfil y0<y1 && y1>y2; if so, you have a maximum at y1.
The smoothing via smooth bezier might not be suitable for all types of data. Maybe some averaging together with smoothing might lead to better results.
For example, in the example below the human eye would also detect maxima at 35 and 42.
Furthermore, if you also want to display the y-values of the maxima, the Bezier smoothing will probably return values that are mostly too low compared to what averaging would give.
I hope you can optimize the script for your data and special needs.
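The three-point test described above can be written down in a few lines. The following Python sketch is only an illustration of that test, not a translation of the gnuplot script; the function name and the sample values are made up.

```python
def local_maxima(xs, ys):
    """Return (x, y) pairs where y is strictly greater than both neighbours."""
    peaks = []
    for i in range(1, len(ys) - 1):
        y0, y1, y2 = ys[i - 1], ys[i], ys[i + 1]
        if y0 < y1 and y1 > y2:  # same condition as y0<y1 && y1>y2
            peaks.append((xs[i], y1))
    return peaks


# Made-up sample values, for illustration only.
xs = [0, 1, 2, 3, 4, 5, 6]
ys = [0.0, 2.0, 1.0, 3.0, 5.0, 4.0, 4.0]

peaks = local_maxima(xs, ys)
print(peaks)
```

Because both comparisons are strict, a flat top (two equal neighbouring values) is not reported as a maximum, and the first and last points are never candidates. On noisy data the test should be applied to a smoothed curve, otherwise almost every spike qualifies.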
Script:
### find maxima on smoothened data
reset session
# create some random test data
set table $Backbone
set samples 30
plot [0:100] '+' u 1:(rand(0)*10+10) w table
set table $CSpline
set samples 1000
plot $Backbone u 1:2 smooth cspline
set table $Data
noise(h) = (rand(0)*2-1)*h
spike(p,h) = rand(0) < p ? (rand(0)*2-1)*h : 0
plot $CSpline u 1:($2 + noise(1) + spike(0.2,3)) w table
unset table
# smooth the data to facilitate identification of maxima
set table $Smooth
set samples 200
plot $Data u 1:2 smooth bezier
unset table
# simple maxima extraction
set table $Maxima
plot x2=x1=y2=y1=NaN $Smooth u (x0=x1,x1=x2,x2=$1,y0=y1,y1=y2,y2=$2, y0<y1 && y1>y2 ? x1 : NaN):(y1) w table
unset table
set yrange[0:]
set key noautotitle
plot $Data u 1:2 w l lc "red", \
$Smooth u 1:2 w l lc "blue", \
$Maxima u 1:2 w impulses lc "black", \
'' u 1:(0):(sprintf("%.2f",$1)) w labels left offset 1,0.5 rotate by 90 tc "blue"
### end of script
Result:

Gnuplot Multi Column fit (not Multi-branch)

I have data files "y.csv" which contains several runs (data sets) of an experiment in columns that I want to simultaneously fit to a single function. It should work like plot for [i=2:*] "y.csv" using 1:i
to automatically accommodate however many columns are in the file. Here is a short example data file:-
,B,C,D,E,F,G,H
01,,,,,,,
02,0.2200,0.2200,0.2080,0.2170,0.1530,,
03,0.2720,0.3230,0.2530,0.2380,0.2620,,
04,0.3900,0.3790,0.3770,0.3760,0.3500,,
05,0.5520,0.5600,0.5450,0.4830,0.4870,,
06,0.6640,0.6300,0.6830,0.6030,0.6520,,
07,0.6440,0.6900,0.6360,0.5960,0.6520,,
08,0.6030,0.6470,0.6190,0.6300,0.6280,,
09,0.5450,0.5890,0.5860,0.6830,0.5540,,
10,0.6370,0.6430,0.5800,0.5270,0.6180,,
11,0.6400,0.5600,0.7190,0.6780,0.7420,,
12,,,,,,,
I can automatically plot each of these columns, overlooking column headers, etc with:-
set datafile separator ","
set datafile columnheaders
set key autotitle columnheader
set key top left
set key title "Run"
set xrange [1:12]
set xlabel "Dilution (Proportional to log([]) )"
set ylabel "Response"
plot for [i=2:*] "y.csv" using 1:i with linespoints
I can set up a function to fit with the following:-
sig(x) = 1 / (1+exp(-x)) ; # Appears stable enough in gnuplot
A = 0.6 ; # Sigmoid Amplitude
B = 0.2 ; # Sigmoid offset
C = 6 ; # Center shift on displayed X axis
K = 1 ; # Shape factor
ssig(x) = B + A*sig(K*(x-C)) ; # Fit to this
And, I can fit to the first data column with:-
fit ssig(x) "y.csv" using 1:2 via A,B,C,K
But I can't work out the syntax of how to automatically do this over all the columns like I can for plotting. I was expecting something like
fit [1:-1:i=2:*] ssig(x) "y.csv" using 1:i via A,B,C,K
would iterate over the columns. I just don't understand the multi-branch syntax, and guess I am missing some simple concept.
Many thanks
Based on your comment you are actually not searching for a multi-branch fit, but you want to merge all columns into one single data set and perform a fit using all data points at the same time. This can be achieved quite easily by reshaping the data file into a datablock first:
set datafile separator ","
set table $FITDATA
plot for [i=2:*] "y.csv" u 1:i
unset table
unset datafile separator
sig(x) = 1 / (1+exp(-x)) ; # Appears stable enough in gnuplot
A = 0.6 ; # Sigmoid Amplitude
B = 0.2 ; # Sigmoid offset
C = 6 ; # Center shift on displayed X axis
K = 1 ; # Shape factor
ssig(x) = B + A*sig(K*(x-C)) ; # Fit to this
set fit errorvariables
fit ssig(x) $FITDATA u 1:2 via A,B,C,K
In the datablock the columns are separated by whitespace, not commas, so the datafile separator has to be reset to its default for the fit and set to a comma again for plotting. Maybe someone else has a cleaner solution for this. set fit errorvariables saves the fit errors, so that they can be used in the key later.
set datafile separator ","
set datafile columnheaders
set key autotitle columnheader
set key top left
set key title "Run"
set xrange [1:12]
set xlabel "Dilution (Proportional to log([]) )"
set ylabel "Response"
plot for [i=2:*] "y.csv" u 1:i w lp, \
ssig(x) lc black lw 3 t "fit", \
keyentry t sprintf("A = %.3f ± %.3f", A, A_err), \
keyentry t sprintf("B = %.3f ± %.3f", B, B_err), \
keyentry t sprintf("C = %.3f ± %.3f", C, C_err), \
keyentry t sprintf("K = %.3f ± %.3f", K, K_err)
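The reshaping idea used in this answer (stack every data column under the same x column so that a single fit sees all points) can be sketched outside gnuplot as follows. This is a minimal Python illustration under stated assumptions: the function name is hypothetical, the sample text is a shortened copy of the question's file, and the fit itself is not performed here.

```python
import csv
import io

# Shortened sample in the layout of the question's "y.csv" (first three runs).
SAMPLE = """,B,C,D
01,,,
02,0.2200,0.2200,0.2080
03,0.2720,0.3230,0.2530
"""


def stack_columns(text):
    """Turn a wide CSV (x, run1, run2, ...) into one list of (x, y) pairs.

    The header row is skipped and empty cells are ignored.
    """
    rows = list(csv.reader(io.StringIO(text)))
    body = rows[1:]  # drop the header line
    ncols = max(len(r) for r in body)
    pairs = []
    for col in range(1, ncols):  # column 0 holds x
        for r in body:
            if col < len(r) and r[col].strip():
                pairs.append((float(r[0]), float(r[col])))
    return pairs


pairs = stack_columns(SAMPLE)
for x, y in pairs:
    print(f"{x:g} {y:g}")
```

The output is a two-column list in which the x values repeat once per run, which is the shape a single least-squares fit over all runs needs.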

Automatic offset in gnuplot

I am plotting data from a datafile and the data has behaviour that after a while on the x-axis the y-axis start to monotonically decrease and ultimately go to zero (with some very small fluctuations later on).
Hence, I want to offset the y-axis so that those fluctuations are clearly visible. For that I use something like set offsets 0,0,0,0.1. But I have actually written a bash script to generate the plot for me. I just need to provide the datafile name to it. So for each plot I don't want to go into the script and manually set offset value based on the data.
I would like if the offset were determined by gnuplot automatically based on the bin-size on the axis, like the offset is 1*bin-size. So my command could look like :
set offsets 0,0,0,1*$bin_size
Is there any way to achieve this?
Edit:
This is the script I am using.
#!/bin/bash
#Requires that the script be in the same directory as the data files
#sed -n '3001,4000p' fish_data_re.dat > fish_data_re_3k_4k.dat : Can be used to extract data from specific range in data file
DATA_FILE_NAME="abc"
DATA_FILE_TYPE="dat"
#Code to generate normalised files
awk 'NR == FNR {if(max < $2) {max = $2}; next} {$2 = $2 / max; printf "%f\t%f\n", $1, $2}' "$DATA_FILE_NAME.$DATA_FILE_TYPE" "$DATA_FILE_NAME.$DATA_FILE_TYPE" > "${DATA_FILE_NAME}_normed.$DATA_FILE_TYPE"
DATA_FILE_NAME="${DATA_FILE_NAME}_normed"
DATA_FILE_TYPE="dat"
OUTPUT_FILE_TYPE="eps"
OUTPUT_FILE_NAME="${DATA_FILE_NAME}_plot.$OUTPUT_FILE_TYPE"
X_LABEL="Time"
Y_LABEL="Real Classical Fisher Information"
TITLE="Real Classical Fisher Information vs Time"
#Set font size for axis tics
X_TICS_SIZE="6"
Y_TICS_SIZE="6"
gnuplot <<- MULTI_LINE_CODE_TAG
set xlabel "$X_LABEL"
set ylabel "$Y_LABEL"
#Following command allows the printing of underscore from name of data file in plot
set key noenhanced
set title "$TITLE"
set xtics font ", $X_TICS_SIZE"
set ytics font ", $Y_TICS_SIZE"
set xtics nomirror
set ytics nomirror
#set ytics format "%.22g"
set ytics format "%0.s*10^{%L}"
#set xtics format "%t"
set multiplot
#------The big-plot------
set title "$TITLE"
set offsets 0,0,0,0.01
#Following plots only data from line 1 to line 100
#plot "<(sed -n '1,100p' $DATA_FILE_NAME.$DATA_FILE_TYPE)" u 1:2 notitle w l lc "red" lw 2
plot "$DATA_FILE_NAME.$DATA_FILE_TYPE" u 1:2 notitle w l lc "red" lw 2
#------The sub-plot------
unset title
unset offsets
set origin 0.25,0.3
set size 0.45,0.45
set xrange [30:60]
set yrange [-0.01:0.01]
unset xlabel
unset ylabel
#unset label
plot "$DATA_FILE_NAME.$DATA_FILE_TYPE" u 1:2 notitle w l lc "red" lw 2
unset multiplot
set term "$OUTPUT_FILE_TYPE"
set output "$OUTPUT_FILE_NAME"
replot
MULTI_LINE_CODE_TAG
exit
As you can see I need to provide the offset manually.
Here is the plot I am getting.
The y-axis here got offset by -0.002 -0.2. I want to automate this and want gnuplot to always use the size of a bin (which I define as the distance between successive tics) as the offset.
(If this is a trivial question I apologise in advance, I am quite new to gnuplot.)
I guess I still don't understand your exact problem. By the way, your offset is -200e-3 = -0.2, not -0.002.
Is your data always between 0 and 1?
You could set the offsets depending on the graph (check help offsets)
set offsets 0,0,0, graph 0.2
In general, why not use a logarithmic scale? With this you will be able to see all small features in your data.
Code:
### linear scale vs logarithmic scale
reset session
# Gauss curve by specifying amplitude A, position x0 and width via FWHM
GaussW(x,x0,A,FWHM) = A * exp(-(x-x0)**2/(2*(FWHM/(2*sqrt(2*log(2))))**2))
# create some test data
set xrange[0:100]
set samples 500
set table $Data
plot '+' u 1:(GaussW($1,5,1,2.5) + GaussW($1,40,7e-3,2) + GaussW($1,47,8e-4,5) + 2e-4) w table
unset table
set multiplot layout 1,2
set offset 0,0,0, graph 0.2
set yrange[-0.02:1]
plot $Data u 1:2 w l title "linear y-scale"
set logscale y
set yrange[1e-4:1]
plot $Data u 1:2 w l title "logarithmic y-scale"
unset multiplot
### end of code
Result:

Gnuplot: Scatter plot and density

I have x- and y-data points representing a star cluster. I want to visualize the density using Gnuplot and its scatter function with overlapping points.
I used the following commands:
set style fill transparent solid 0.04 noborder
set style circle radius 0.01
plot "data.dat" u 1:2 with circles lc rgb "red"
The result:
However I want something like that
Is that possible in Gnuplot? Any ideas?
(edit: revised and simplified)
Probably a much better way than my previous answer is the following:
For each data point, check how many other data points are within a radius of R. You need to play with the value of R to get a reasonable graph.
Indexing the datalines requires gnuplot>=5.2.0 and the data in a datablock (without empty lines). You can either first plot your file into a datablock (check help table) or see here:
gnuplot: load datafile 1:1 into datablock
The time for creating this graph grows as O(N^2) with the number of points N, because each point has to be checked against all others. I'm not sure if there is a smarter and faster method. The example below with 1200 datapoints takes about 4 seconds on my laptop. You can basically apply the same principle in 3D.
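The neighbour-counting idea can be sketched as a plain double loop, which also makes the quadratic cost visible. This is a minimal Python illustration, not the gnuplot implementation; the function name and the three sample points are assumptions. As in the gnuplot script, each point counts itself because it is compared against the full data set.

```python
import math


def local_density(points, radius):
    """For each (x, y), count points within `radius` and divide by the disc area.

    Every point is compared with every point (itself included), so the
    running time grows as O(N^2).
    """
    area = math.pi * radius ** 2
    result = []
    for x0, y0 in points:
        count = sum(
            1 for x1, y1 in points
            if math.hypot(x1 - x0, y1 - y0) <= radius
        )
        result.append((x0, y0, count / area))
    return result


# Made-up points: two close neighbours and one isolated point.
pts = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0)]
density = local_density(pts, 0.5)
for x, y, d in density:
    print(f"{x:g} {y:g} {d:.3f}")
```

The third column is what would drive the palette color. A larger radius smooths the density map but blurs small clusters.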
Script: works with gnuplot>=5.2.0
### 2D density color plot
reset session
t1 = time(0.0)
# create some random test data
set table $Data
set samples 700
plot '+' u (invnorm(rand(0))):(invnorm(rand(0))) w table
set samples 500
plot '+' u (invnorm(rand(0))+2):(invnorm(rand(0))+2) w table
unset table
print sprintf("Time data creation: %.3f s",(t0=t1,t1=time(0.0),t1-t0))
# for each datapoint: how many other datapoints are within radius R
R = 0.5 # Radius to check
Dist(x0,y0,x1,y1) = sqrt((x1-x0)**2 + (y1-y0)**2)
set print $Density
do for [i=1:|$Data|] {
x0 = real(word($Data[i],1))
y0 = real(word($Data[i],2))
c = 0
stats $Data u (Dist(x0,y0,$1,$2)<=R ? c=c+1 : 0) nooutput
d = c / (pi * R**2) # density: points per unit area
print sprintf("%g %g %d", x0, y0, d)
}
set print
print sprintf("Time density check: %.3f sec",(t0=t1,t1=time(0.0),t1-t0))
set size ratio -1 # same screen units for x and y
set palette rgb 33,13,10
plot $Density u 1:2:3 w p pt 7 lc palette z notitle
### end of script
Result:
Would it be an option to postprocess the image with imagemagick?
# convert into a gray scale image
convert source.png -colorspace gray -sigmoidal-contrast 10,50% gray.png
# build the gradient, the heights have to sum up to 256
convert -size 10x1 gradient:white-white white.png
convert -size 10x85 gradient:red-yellow \
gradient:yellow-lightgreen \
gradient:lightgreen-blue \
-append gradient.png
convert gradient.png white.png -append full-gradient.png
# finally convert the picture
convert gray.png full-gradient.png -clut target.png
I have not tried but I am quite sure that gnuplot can plot the gray scale image directly.
Here is the (rotated) gradient image:
This is the result:
Although this question is rather "old" and the problem might have been solved differently...
It's probably more for curiosity and fun than for practical purposes.
The following code implements a coloring according to the density of points using gnuplot only. On my older computer it takes a few minutes to plot 1000 points. I would be interested if this code can be improved especially in terms of speed (without using external tools).
It's a pity that gnuplot does not offer basic functionality like sorting, look-up tables, merging, transposing or other basic functions (I know... it's gnuPLOT... and not an analysis tool).
The code:
### density color plot 2D
reset session
# create some dummy datablock with some distribution
N = 1000
set table $Data
set samples N
plot '+' u (invnorm(rand(0))):(invnorm(rand(0))) w table
unset table
# end creating dummy data
stats $Data u 1:2 nooutput
XMin = STATS_min_x
XMax = STATS_max_x
YMin = STATS_min_y
YMax = STATS_max_y
XRange = XMax-XMin
YRange = YMax-YMin
XBinCount = 20
YBinCount = 20
BinNo(x,y) = floor((y-YMin)/YRange*YBinCount)*XBinCount + floor((x-XMin)/XRange*XBinCount)
# do the binning
set table $Bins
plot $Data u (BinNo($1,$2)):(1) smooth freq # with table
unset table
# prepare final data: XPos, YPos, BinNo, Sum
set print $FinalData
do for [i=0:N-1] {
set table $Data3
plot $Data u (BinNumber = BinNo($1,$2),$1):(XPos = $1,$1):(YPos = $2,$2) every ::i::i with table
plot [BinNumber:BinNumber+0.1] $Bins u (BinNumber == $1 ? (PointsInBin = $2,$2) : NaN) with table
print sprintf("%g\t%g\t%g\t%g", XPos, YPos, BinNumber, PointsInBin)
unset table
}
set print
# plot data
set multiplot layout 2,1
set rmargin at screen 0.85
plot $Data u 1:2 w p pt 7 lc rgb "#BBFF0000" t "Data"
set xrange restore # use same xrange as previous plot
set yrange restore
set palette rgbformulae 33,13,10
set colorbox
# draw the bin borders
do for [i=0:XBinCount] {
XBinPos = i/real(XBinCount)*XRange+XMin
set arrow from XBinPos,YMin to XBinPos,YMax nohead lc rgb "grey" dt 1
}
do for [i=0:YBinCount] {
YBinPos = i/real(YBinCount)*YRange+YMin
set arrow from XMin,YBinPos to XMax,YBinPos nohead lc rgb "grey" dt 1
}
plot $FinalData u 1:2:4 w p pt 7 ps 0.5 lc palette z t "Density plot"
unset multiplot
### end of code
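The BinNo(x,y) expression in the code above flattens a 2D grid cell into a single integer, row index times XBinCount plus column index, so that a one-dimensional frequency count can be used for the 2D binning. The following Python sketch shows that mapping under stated assumptions: the function names are hypothetical, and the clamping of points that lie exactly on the upper edge into the last bin is an addition of this sketch that the gnuplot code does not perform.

```python
import math
from collections import Counter


def bin_number(x, y, xmin, xmax, ymin, ymax, nx, ny):
    """Flatten the 2D cell of (x, y) into one index: row * nx + column."""
    ix = math.floor((x - xmin) / (xmax - xmin) * nx)
    iy = math.floor((y - ymin) / (ymax - ymin) * ny)
    # Assumption of this sketch: points on the upper edge go into the last bin.
    ix = min(ix, nx - 1)
    iy = min(iy, ny - 1)
    return iy * nx + ix


def points_per_bin(points, nx, ny):
    """Count how many points fall into each flattened bin index."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    xmin, xmax, ymin, ymax = min(xs), max(xs), min(ys), max(ys)
    return Counter(
        bin_number(x, y, xmin, xmax, ymin, ymax, nx, ny) for x, y in points
    )


# Made-up points on the unit square with a 2 x 2 grid.
pts = [(0.0, 0.0), (1.0, 1.0), (0.5, 0.5), (0.49, 0.49)]
counts = points_per_bin(pts, 2, 2)
print(sorted(counts.items()))
```

Looking up each point's bin count through such a dictionary is a constant-time operation, in contrast to re-scanning the bin table once per point.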
The result:
