Is it possible to plot a normal probability distribution in gnuplot? - linux

My data file is as follows:
2 3 4 1 5 2 0 3 4 5 3 2 0 3 4 0 5 4 3 2 3 4 4 0 5 3 2 3 4 5 1 3 4
I need to plot the normal PDF in gnuplot.
I could do it by calculating f(x)
f(x) = \frac{1}{\sqrt{2\pi\sigma^2} } e^{ -\frac{(x-\mu)^2}{2\sigma^2} }
for each x using a shell script.
Then I plot it in gnuplot using the command:
plot 'ifile.txt' using 1:2 with lines
But is it possible to plot it directly in gnuplot?

gnuplot provides a number of processing options under the smooth keyword (try typing help smooth for more info). For your specific case, I would recommend a fit though.
First, note that your data points are in a single row; you need to convert them to a column for gnuplot to use them. You can do it with awk:
awk '{for (i=1;i<=NF;i++) print $i}' datafile
which can be invoked from within gnuplot:
plot "< awk '{for (i=1;i<=NF;i++) print $i}' datafile" ...
Now assume that datafile has the right format for simplicity.
You can use the smooth frequency option to see how many occurrences of each value you have:
plot "datafile" u 1:(1.) smooth frequency w lp pt 7
To get the normalized distribution, you divide by the number of values. This can be done automatically within gnuplot with stats:
stats "datafile"
This will store the number of values in the variable STATS_records, which in your case has the value 33:
gnuplot> print STATS_records
33.0
So the normalized distribution (the probability of getting a value at x) is:
plot "datafile" u 1:(1./STATS_records) smooth frequency w lp pt 7
As you can see, your distribution doesn't really look like a normal distribution, but anyway, let's go on. Create a Gaussian for fitting and fit to your data, and plot it. You need to fit to the probability, rather than to the data itself. To do so, we plot to a table to extract the data generated by smooth frequency:
# Non-normalized Gaussian
f(x)= A * exp(-(x-x0)**2/2./sigma**2)
# Save probability data to table
set table "probability"
plot "datafile" u 1:(1./STATS_records) smooth frequency not
unset table
# Fit the Gaussian to the data, exclude points from table with grep
fit f(x) "< grep -v 'u' probability" via x0, sigma, A
# Normalize the gaussian
g(x) = 1./sqrt(2.*pi*sigma**2) * f(x) / A
# Plot
plot "datafile" u 1:(1./STATS_records) smooth frequency w lp pt 7, g(x)
set table generates some points which you should exclude; that's why I used grep to filter the file. Also, because the fit is done with a variable amplitude, the Gaussian needs to be normalized afterwards. If you want to retrieve the fitting parameters:
gnuplot> print x0, sigma
3.40584703189268 1.76237558717934
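If a fit is not required, a possible shortcut is to take the mean and standard deviation straight from stats and plot the normal PDF with them. This is only a sketch: it assumes the one-value-per-line datafile from above and gnuplot 4.6 or later (for stats), and the names mu, sd and npdf are made up for this example.

```gnuplot
# Sketch: normal PDF from the sample mean and standard deviation (no fit).
# Assumes "datafile" holds one value per line.
stats "datafile" using 1 nooutput
mu = STATS_mean        # sample mean reported by stats
sd = STATS_stddev      # standard deviation as reported by stats
# Normal PDF with the parameters above (npdf is a hypothetical name)
npdf(x) = 1./(sd*sqrt(2.*pi)) * exp(-(x-mu)**2/(2.*sd**2))
plot "datafile" u 1:(1./STATS_records) smooth frequency w lp pt 7 title "data", \
     npdf(x) title "normal PDF"
```

Because the values here are integers spaced by 1, the relative frequencies and the density can be compared on the same axis.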
Finally note that if the spacing between data points is not homogeneous, e.g. instead of x = 0, 1, 2, 3 ... you have values at x = 0, 0.1, 0.5, 3, 3.2 ... then you'll need to use a different way to do this, for example defining bins of regular size to group data points.
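A minimal sketch of such regular binning follows. The bin width of 0.5 and the names binwidth and bin are assumptions for illustration, not something from the question; pick a width that suits your data.

```gnuplot
# Sketch: histogram with regular bins for unevenly spaced values.
binwidth = 0.5                                   # assumed value
bin(x) = binwidth * (floor(x/binwidth) + 0.5)    # centre of the bin containing x
stats "datafile" using 1 nooutput
# Dividing by N*binwidth makes the box areas sum to 1, i.e. a density
plot "datafile" u (bin($1)):(1./(STATS_records*binwidth)) smooth frequency w boxes
```

Each value is mapped to the centre of its bin, and smooth frequency then adds up the contributions that land on the same centre.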

Related

How to remove line between "jumping" values, in gnuplot?

I would like to draw a line through data that contain "jumping" x values.
Here is an example: when data for sin(x) spanning several cycles are plotted with lines, an unrealistic line appears that goes across from right to left (as shown in the following figure).
One idea to avoid this might be using with linespoints (link), but I want to draw it without revising the original data file.
Is there a simple and robust solution for this problem?
Assuming that you are plotting a function, that is, for each x value there exists one and only one corresponding y value, the easiest way to achieve what you want is to use the smooth unique option. This smoothing routine will make the data monotonic in x, then plot it. When several y values exist for the same x value, the average will be used.
Example:
Data file:
0.5 0.5
1.0 1.5
1.5 0.5
0.5 0.5
Plotting without smoothing:
set xrange [0:2]
set yrange [0:2]
plot "data" w l
With smoothing:
plot "data" smooth unique
Edit: points are lost if this solution is used, so I suggest an improvement to my answer.
"Conditional plotting" can be applied here. Suppose we have a file like this:
1 2
2 5
3 3
1 2
2 5
3 3
i.e. the line jumps back between the 3rd and 4th points.
plot "tmp.dat" u 1:2
Find minimum x value:
stats "tmp.dat" u 1:2
prev=STATS_min_x
Or find first x value:
prev=system("awk 'FNR == 1 {print $1}' tmp.dat")
Plot the line if current x value is greater than previous, or don't plot if it's less:
plot "tmp.dat" u ($0==0? prev:($1>prev? $1:1/0), prev=$1):2 w l
OK, it's not impossible, but the following is a ghastly hack. I really advise you add an empty line in your dataset at the breaks.
$dat << EOD
1 1
2 2
3 3
1 5
2 6
3 7
1 8
2 9
3 10
EOD
plot for [i=0:3] $dat us \
($0==0?j=0:j=j,llx=lx,lx=$1,llx>lx?j=j+1:j=j,i==j?$1:NaN):2 w lp notit
This plots your dataset three times (actually four; there is a small error in there, I guess I have to initialise all variables), counts how often the abscissa values "jump", and only plots data points if this counter j is equal to the plot counter i.
Check the help on the serial evaluation operator "a, b" and the ternary operator "a?b:c"
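The empty-line approach recommended above can also be applied on the fly, so the original file stays untouched. The following is a sketch that assumes a Unix-like system with awk available and the two-column tmp.dat example from earlier in this thread; prev is just an awk variable chosen for the example.

```gnuplot
# Sketch: print an empty line whenever x decreases, then the record itself.
# gnuplot breaks a line at each empty line, so the backward connection disappears.
plot "< awk 'NR>1 && $1<prev {print \"\"} {print; prev=$1}' tmp.dat" u 1:2 w l
```

All data points are kept; only the connecting segment across each jump is dropped.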
If you have data in a repetitive x-range where the corresponding y-values do not change, then @Miguel's smooth unique solution is certainly the easiest.
In a more general case, what if the x-range is repetitive but y-values are changing, e.g. like a noisy sin(x)?
Then compare two consecutive x-values x0 and x1, if x0>x1 then you have a "jump" and make the linecolor fully transparent, i.e. invisible, e.g. 0xff123456 (scheme 0xaarrggbb, check help colorspec). The same "trick" can be used when you want to interrupt a dataline which has a certain forward "jump" (see https://stackoverflow.com/a/72535613/7295599).
Minimal solution:
plot x1=NaN $Data u 1:2:(x0=x1,x1=$1,x0>x1?0xff123456:0x0000ff) w l lc rgb var
Script:
### plot "folded" data without connecting lines
reset session
# create some test data
set table $Data
plot [0:2*pi] for [i=1:4] '+' u 1:(sin(x)+rand(0)*0.5) w table
unset table
set xrange[0:2*pi]
set key noautotitle
set multiplot layout 1,2
plot $Data u 1:2 w l lc "red" ti "data as is"
plot x1=NaN $Data u 1:2:(x0=x1,x1=$1,x0>x1?0xff123456:0x0000ff) \
w l lc rgb var ti "\n\n\"Jumps\" removed\nwithout changing\ninput data"
unset multiplot
### end of script

Is there a way to put a label for the last entry in gnuplot?

I want to use gnuplot for real-time plotting (data gets appended to a file which I use for plotting, and I use replot to update the plot). I also want to put a label on the latest entry plotted, to get an idea of what the latest value is. Is there a way to do this?
If you are on a unixoid system, you can use tail to extract the last line from the file and plot it separately in whatever way you desire. To give a simple example:
plot\
"data.dat" w l,\
"< tail -n 1 data.dat" u 1:2:2 w labels notitle
This will plot the whole of data.dat with lines and the last point with labels, with the label depicting the value.
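For the real-time case, the same plot command can be repeated in a loop. This is a sketch assuming gnuplot 4.6 or later (for while) and a one-second refresh, which is an arbitrary choice; the label offset is added only so the text does not sit on top of the line.

```gnuplot
# Sketch: redraw once per second; interrupt with Ctrl-C to stop.
while (1) {
    plot "data.dat" w l, \
         "< tail -n 1 data.dat" u 1:2:2 w labels offset 0,1 notitle
    pause 1
}
```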
There is no need to use the Linux command tail; you can do it with gnuplot alone, and hence platform-independently.
The principle: while plotting the data, you assign the values of column 1 and 2 to variables x0 and y0, respectively.
After the first plot command, x0 and y0 will contain the last values.
With this, you don't have to load the file a second time for extracting the last values.
For the label plotting, use these values and print the label with a sprintf() expression (check help sprintf).
The construct '+' u ... every ::0::0 is just one way of many ways to plot a single data point.
Data: SO28152083.dat
1 5.1
2 2.2
3 3.3
4 1.4
5 4.5
Script: (works with gnuplot 4.4.0, March 2010 or even with earlier versions)
### plot last value as label
reset
FILE = "SO28152083.dat"
set key noautotitle
set offsets 0.5,0.5,1,1
plot FILE u (x0=$1):(y0=$2) w lp pt 7 lc rgb "red" ti "data", \
'+' u (x0):(y0):(sprintf("%g",y0)) every ::0::0 w labels offset 0,1
### end of script
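A variation on the same idea, sketched under the assumption of gnuplot 4.6 or later: let stats run through the file once to pick up the last values, then place the text with set label. Unlike the script above, this does read the file twice.

```gnuplot
# Sketch: last value as a label via "set label" instead of a second plot element.
FILE = "SO28152083.dat"
set offsets 0.5,0.5,1,1
# stats evaluates the using expressions for every row, so x0 and y0
# end up holding the values of the last row
stats FILE u (x0=$1):(y0=$2) nooutput
set label 1 sprintf("%g", y0) at x0,y0 center offset 0,1
plot FILE u 1:2 w lp pt 7 lc rgb "red" ti "data"
```

Reusing the tag 1 means the label is replaced rather than duplicated when the commands are repeated for a live plot.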

Plotting horizontal lines in gnuplot on an existing graph, using the same coloured lines

The starting point is that I have a graph with 4 lines on it. They are the results of my simulation, plotted over an x-axis of iteration, at 4 different locations. I also have experimental values at each of those locations. I want to plot those 4 experimental values as horizontal lines on the same graph. I would also like the line colours of the simulation and experiment results at each location to be the same.
With @Tom's help, below, I have got the following script to do this:
unset bars
max = 1e6
set xrange[7000:24000]
set yrange[-0.5:1.5]
plot for [i=2:5] 'sim' using 1:(column(i)) ls i, \
for [i=1:4] 'expt' using (1):1:(max) every ::(i-1)::(i-1) with xerror ls i ps 0
The problem is that I want the values in xrange[x_min:x_max] and yrange[y_min:y_max] to be taken from sim and expt as follows:
x_min = min(sim[:1]) # where min(sim[:1]) means "min value in file 'sim' col 1"
x_max = max(sim[:1])
y_min = min(sim[:2],sim[:3],sim[:4],sim[:5],expt[:1])
y_max = max(sim[:2],sim[:3],sim[:4],sim[:5],expt[:1])
My OS is Scientific Linux: Release 6.3, Kernel Linux 2.6.32-358.2.1.el6.x86_64, GNOME 2.28.2
sim and expt are .txt files
A representative sample of sim is:
7520 0.282511 0.0756715 -0.222863 -0.0898819
7521 0.315944 0.201687 -0.321723 -0.106345
7522 0.230956 0.102217 -0.34196 -0.061009
7523 1.460043 -0.00118292 -0.045077 0.673926
A representative sample of expt is:
1.112
0.123
-0.45
0.862
Thank you for your help.
I think that this is a way to solve your problem:
unset bars
max = 1e6
set xrange[0:8]
plot for [i=1:4] 2*i+sin(x) ls i, \
for [i=1:4] 'expt' using (1):1:(max) every ::(i-1)::(i-1) with xerror ls i ps 0
Based on some information I found on Gnuplot tricks, I have (ab)used error bars to produce horizontal lines based on the points in this data file:
2
4
6
8
The (1):1:(max) specifies that a point should be plotted at the coordinate (1, y), where y is read from the data file. The max is the value of xdelta, which determines the size of the x error bar. This is one way of achieving a horizontal line in your plot, as a suitably large value of max will result in an error bar across the entire xrange of your plot.
Here's what the output looks like:
Suppose you have a data file with five columns, one with the x-values and four with y-values, and an additional file that supplies a number path_to_expt. To plot the columns and one horizontal line at the y-value path_to_expt, you can use
plot for [i=2:5] path_to_file using 1:(column(i))
This plots column 2 against 1, 3 against 1, 4 against 1 and 5 against 1. To get different styles, just use set linetype to redefine the automatically assigned line types:
set linetype 1 lc rgb 'orange'
# ... other lt definitions
plot for [i=2:5] path_to_file using 1:(column(i))
If you don't want to overwrite the existing linetypes 1..4, use e.g. 11..14:
set linetype 11 lc rgb 'orange'
# ...
plot for [i=2:5] path_to_file using 1:(column(i)) lt (9 + i)
Finally, in order to plot a horizontal line, using the same x-values as in the data file, use
mynumber = 27
plot path_to_file using 1:(mynumber)
If you don't put a number in parentheses, it is interpreted as a column number (like the 1 here), whereas inside parentheses it is treated as a number.
Another option would be to set arrows:
set arrow from graph 0, first mynumber to graph 1, first mynumber lt 1
plot for [i=2:5] path_to_file using 1:(column(i))
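As for taking the axis limits from the files, as asked in the question, one possible sketch uses stats (gnuplot 4.6 or later). It assumes the files sim and expt described in the question and should be run before any xrange or yrange is set, since stats ignores points outside the current ranges.

```gnuplot
# Sketch: derive x_min, x_max, y_min, y_max from 'sim' and 'expt'.
stats 'sim' using 1 nooutput
x_min = STATS_min
x_max = STATS_max
stats 'expt' using 1 nooutput
y_min = STATS_min
y_max = STATS_max
# Widen the y limits with columns 2..5 of 'sim'
do for [i=2:5] {
    stats 'sim' using (column(i)) nooutput
    y_min = (STATS_min < y_min) ? STATS_min : y_min
    y_max = (STATS_max > y_max) ? STATS_max : y_max
}
set xrange [x_min:x_max]
set yrange [y_min:y_max]
```

The plot command from the question can then follow unchanged.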

Gnuplot: fence plot from data

I'm trying to figure out how to do fence plots in gnuplot, but I'm having a hard time understanding what's going on in the examples I find on the internet.
I have a (varying) number of data sets from different points in time in my simulation, in a datafile organized as a matrix of values [1]:
t1 x11 y11 // indices here indicate that (x1,y1) are a data point which
t1 x21 y21 // I'd plot on a regular 2D plot for this timestep, with the
... // additional index noting which time step the values are for.
t1 xN1 yN1
[blank line]
t2 x12 y12
t2 x22 y22
...
t2 xN2 yN2
[etc...]
tM xNM yNM
I want to plot this with one fence for each time value. I can simply do splot 'data.txt' and get something quite similar to what I want: + markers along the "top edges" of the fences, with time on the x axis, x-data on the y axis and y-data on the z axis. However, if I add something like w lines to the splot command, I just get a surface with all the data series connected.
I've tried to adapt the examples from the demo script collection (about halfway down), but they both rely on a dummy variable, and I can't figure out how to combine that with my data series. I've found some other examples as well, but they are all quite elaborate and I don't understand what they do at all.
What is a good way to create fence plots from data using gnuplot?
[1] If necessary, it is possible to change this, since I am in control of the code that generates the data. It's a hassle, though...
This does require a bit of a change to the data, unfortunately. The change is pretty minor though and could probably be handled with a simple awk script [1,2]:
Here's a copy/paste of my interactive gnuplot session:
gnuplot> !cat test.dat
1 2 3
1 2 0

1 3 4
1 3 0

1 4 5
1 4 0


2 2 3
2 2 0

2 3 4
2 3 0

2 4 5
2 4 0


3 2 3
3 2 0

3 3 4
3 3 0

3 4 5
3 4 0

!
gnuplot> splot 'test.dat' u 1:2:3 w lines
The thing to note here is that there are 2 blank lines between "fences" and each x,y data point appears twice with a blank line afterward. The second time it appears, the z-coordinate is 0.
To get each fence to have a different color:
gnuplot> splot for [i=0:2] 'test.dat' index i u 1:2:3 w lines
The awk script can even be done inline:
splot "< awk {...} datafile"
But that can get a little tricky with quoting (to include a single quote in a single quoted string, you double it) ...
AWKCMD='awk ''{if(!NF){print ""}else if(index($0,"#")!=1){printf "%s %s %s\n%s %s 0\n\n", $1,$2,$3,$1,$2}}'' '
splot '<'.AWKCMD.'datafile.dat' u 1:2:3 w lines
As far as efficiency is concerned, I believe that the iteration I used above will call the awk command each time it iterates. The workaround here is to pull the color from the index number:
splot '<'.AWKCMD.' test.dat' u 1:2:3:(column(-2)) w l lc variable
I believe that this will only do the awk command once as desired so with only a million entries it should still respond relatively quickly.
[1] awk '{if(!NF){print ""}else{printf "%s %s %s\n%s %s 0\n\n", $1,$2,$3,$1,$2}}' test.dat
[2] awk '{if(!NF){print ""}else if(index($0,"#")!=1){printf "%s %s %s\n%s %s 0\n\n", $1,$2,$3,$1,$2}}' test.dat (version which ignores comments)
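With a recent gnuplot (5.2 or later) there may be a way to avoid changing the data at all: the zerrorfill plot style fills the area between a data line and a base level. The sketch below assumes the original data.txt layout from the question (columns t, x, y, with single blank lines between time steps) and a hypothetical variable M holding the number of time steps, which you would set yourself.

```gnuplot
# Sketch: one filled fence per time step, read from the unmodified file.
M = 3     # number of time steps (assumed known from the simulation)
# every :::k::k selects data block k; the five columns are x y z zlow zhigh,
# so each fence is filled from z = 0 up to the data value
splot for [k=0:M-1] 'data.txt' every :::k::k \
      using 1:2:3:(0):3 with zerrorfill notitle
```

Each iteration takes the next linetype, so the fences come out in different colours. Which fence hides which where they overlap can depend on the drawing order and the view.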

gnuplot: 3D plot of a matrix of data

How can I make a 3D plot of a matrix in gnuplot with the following data structure,
using the first row and column as the x and y tics (the first number of the first row is the number of columns)?
4 0.5 0.6 0.7 0.8
1 -6.20 -6.35 -6.59 -6.02
2 -6.39 -6.52 -6.31 -6.00
3 -6.36 -6.48 -6.15 -5.90
4 -5.79 -5.91 -5.87 -5.46
Exactly this data format can be read in with matrix nonuniform:
set view 50,20
set ticslevel 0
splot 'data.txt' matrix nonuniform with lines t ''
This generates the correct tics, as specified in the data file.
To plot a 4D plot, using colour as the 4th dimension, you can use
splot '1.txt' using 2:3:4:5 every ::1 palette
# splot: used for 3d plots
# every ::1: skip the header line
Or do you want to draw a different picture, with x and y being the first column and line, and the numbers in the matrix just representing z? Then use the following:
splot '1.txt' every ::1:1 matrix
To add some effects, you can change it to
set dgrid3d 4,4
splot '1.txt' every ::1:1 matrix with lines
