gnuplot print column header with stats


The gnuplot stats command can be used to report stats for an input dataset. It creates a set of variables containing information about a specific column in the dataset. Here is an example of such use:
set print "StatDat.dat"
do for [i=2:9] { # Here you will use i for the column.
stats 'data.dat' u i nooutput ;
print i, STATS_median, STATS_mean , STATS_stddev # ...
}
set print
plot "StatDat.dat" us 1:2 # or whatever column you want...
It would be useful to include the reported column header, something like:
print STATS_columnheader, STATS_median, STATS_mean , STATS_stddev # ...
However, gnuplot does not provide the required STATS_columnheader variable.
Is there an alternative way to achieve this?

You can use an external tool such as awk to extract and return a column header. You can create a function like this:
columnheading(f,c) = system("awk '/^#/ {next}; {print $".c.";exit}' ".f)
which, given a file f and a column number c, will return the column header. You'd use it like this:
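print columnheading('data.dat',i).' ', STATS_median, STATS_mean , STATS_stddev # ...
The awk expression skips comment lines (those starting with #), prints the field selected by the parameter c from the first remaining line, and exits. The printed word is returned by gnuplot's system() function. Note that the file passed to columnheading must be the input data file (data.dat here), not the StatDat.dat file being written.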
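For readers who want to check what the awk one-liner returns, here is a minimal Python sketch of the same logic. The function name column_heading and the sample lines are made up for illustration; it assumes whitespace-separated columns, as awk does by default.

```python
def column_heading(lines, col):
    """Return the col-th (1-based) word of the first line that does not start with '#'.

    Mirrors the awk expression: comment lines are skipped, the requested
    field of the first remaining line is returned, and reading stops there.
    An empty string is returned if the field does not exist.
    """
    for line in lines:
        if line.startswith("#"):
            continue  # same as awk's '/^#/ {next}'
        words = line.split()
        return words[col - 1] if 1 <= col <= len(words) else ""
    return ""


# Hypothetical sample data, not taken from the question
sample = ["# comment line\n", "Time Voltage Current\n", "1 2.0 3.0\n"]
print(column_heading(sample, 2))  # prints Voltage
```

With a real file, an open file handle can be passed instead of the list, e.g. column_heading(open('data.dat'), 2).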

Workaround
A quick and dirty solution, already mentioned in a comment on the answer above:
you can store the header line once in a variable and look it up whenever it is needed.
Under *nix you can use head, or a combination such as head -n 10 | tail -n 1 if the header is on the 10th line.
Here is the modified example:
datafile = 'data.dat' # input file from the question
firstrow = system('head -n 1 '.datafile) # the shell is called only once
set print "StatDat.dat"
do for [i=2:9] { # Here you will use i for the column.
stats datafile u i nooutput ;
print word(firstrow, i), " ", STATS_median, STATS_mean , STATS_stddev
# or whatever you want...
}
set print
plot "StatDat.dat" us 1:2 # or whatever column you want...
Note that the gnuplot function word(string, n) returns the nth whitespace-separated word of the string, so you may run into problems if a header consists of more than one word...
... problems that you can overcome with other tricks.
Only a pointer towards a hack/trick
The following doesn't work, because gnuplot starts processing the file in the plot command only after skipping the header and the commented lines...
Since a function can take the form f(x) = (statement1, statement2, statement3, return value), executing the statements and returning the value, you can imagine building a function that stores the first line field by field in an array (directly from gnuplot 5.1 on, via some other tricks before that), maybe hiding the plot with set terminal unknown.
array MyHeader[4]
f(x,y,z) = (x == 0 ? (MyHeader[y]=z, z ) : z)
set terminal unknown # set terminal dumb
set key autotitle columnhead
do for [i=2:4] { # Here you will use i for the column.
plot datafile using 1:(f($0,i,column(i)))
}
print MyHeader
Unfortunately, the script above stores only the values of the first row...
but I have run out of the time I can dedicate to this problem for now :-(
(maybe someone will find a hint in here useful, or can finish it).

What exactly do you want to do with the headers?
As you can see in the plot command, gnuplot apparently treats the first uncommented line as the header line, which is then used, for example, for the legend.
There is no need for external tools or scripts. With stats you can also easily extract any line, e.g. the second line via every ::1::1.
Since gnuplot 5.2.0 (Sep 2017) you can use arrays, which you can also plot, e.g. for tables within the plot.
Script:
### extract header lines via stats
reset session
$Data <<EOD
# comment line
PosX PosY Density # header line
x/m y/cm g/cm2 # (sub)header line
1 1.1 2.1
2 1.2 2.2
3 1.3 2.3
4 1.4 2.4
5 1.5 2.5
6 1.6 2.6
7 1.7 2.7
8 1.8 2.8
9 1.9 2.9
EOD
array Headers[3]
array Units[3]
do for [i=1:|Headers|] {
stats $Data u (Headers[i]=strcol(i)) every ::0::0 nooutput
stats $Data u (Units[i] =strcol(i)) every ::1::1 nooutput
}
print Headers
print Units
set key top left
plot $Data u 1:2 w lp pt 7 lc "red" ti columnheader, \
'' u 1:3 w lp pt 7 lc "blue" ti columnheader, \
Headers u ($1+3):(2.1):2 w labels notitle, \
Units u ($1+3):(1.9):2 w labels notitle
### end of code
Result:
["PosX","PosY","Density"]
["x/m","y/cm","g/cm2"]

Related

How to print the X value at GPVAL_DATA_Y_MIN

I can print the minimum x,y using print GPVAL_DATA_X_MIN, GPVAL_DATA_Y_MIN
How can I print the X value at GPVAL_DATA_Y_MIN?

Reading your question again, I guess using stats and setting a label is the easiest way.
Check help stats and help label. To see which values stats calculates, type show var STATS in the gnuplot console.
Code:
### label the coordinates of y-min value
reset session
# create some random test data
set samples 20
set table $Data
plot '+' u (invnorm(rand(0))):(invnorm(rand(0))) w table
unset table
stats $Data u 1:2 nooutput
set label 1 at STATS_pos_min_y, STATS_min_y sprintf("(%.2f|%.2f)",STATS_pos_min_y, STATS_min_y) offset 1,0
unset key
set offset 0.1,0.1,0.1,0.1
plot $Data u 1:2 w p pt 7 lc "red"
### end of code
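As a cross-check outside gnuplot, the lookup that STATS_pos_min_y and STATS_min_y provide can be sketched in a few lines of Python. The function name x_at_min_y and the sample points are made up for illustration.

```python
def x_at_min_y(points):
    """Return the (x, y) pair with the smallest y from an iterable of (x, y) pairs."""
    return min(points, key=lambda p: p[1])


# Hypothetical sample points, not taken from the question
points = [(0.5, 1.2), (-1.0, -0.7), (2.0, 0.3)]
x, y = x_at_min_y(points)
print("(%.2f|%.2f)" % (x, y))  # prints (-1.00|-0.70), same format as the label above
```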

Sum of selected columns filtered by regex in gnuplot

Well, I do understand that gnuplot is not a data-processing system but a plotting software. But anyway...
In python-pandas, I can select multiple columns by passing a regex to a dataframe, e.g. df.filter( regex = '\.x$' ) will return the columns named 'sw0.x', 'sw1.x' etc. Then I can sum them up and plot them.
Recently I've moved to pgfplots (LaTeX) and I use gnuplot extensively with pgfplots on large data sets. I often need to plot the sum of several columns whose names match a given regular expression. I want to do something like plot 'data.csv' SUM("\.x$") every 100 with line, where the function/macro/whatever SUM accepts the regular expression and returns the sum of the matching columns.

In that case, it will be most likely necessary to "outsource" this processing part to Pandas. For example if you create a script filter.py such as:
#!/usr/bin/env python
import pandas as pd
import sys
df = pd.read_csv(sys.argv[1], sep = ',', header = 0)
s = df.filter(regex=r'\.x$', axis = 1).sum(axis = 1)  # raw string avoids an invalid-escape warning
s.to_csv(sys.stdout, sep = '\t')
then you can "reuse" it in Gnuplot as:
plot "<python filter.py data.csv" w lp

gnuplot does not support regular expressions, but in some cases you can get similar functionality by defining suitable functions.
@Dilawar, you don't give many details about your data. I assume the separator is whitespace.
As @ewcz wrote, you can always use external tools to (pre-)process your data into a format that gnuplot can plot.
However, if it is possible and does not get too complicated, why not use gnuplot itself?
In your case you're asking about summing up columns if the end of the columnheader matches a certain string. Check the example below which can certainly be optimized further.
Script: (revised and simplified version)
### select columns by matching end of columnheader
reset session
$Data <<EOD
ID sw0.x sw0.y sw0.z sw1.x sw1.y sw1.z
1 0.1 1.1 3.1 0.2 1.2 3.2
2 0.2 1.2 3.2 0.3 1.3 3.3
3 0.3 1.3 3.3 0.4 1.4 3.4
4 0.4 1.4 3.4 0.5 1.5 3.5
EOD
set datafile separator "\t"
stats $Data u (myHeaders=strcol(1)) every ::0::0 nooutput
set datafile separator # restore to default
myHeader(i) = word(myHeaders,i) # get the ith item of the header line
colMax = words(myHeaders)
matchEnd(s,m) = s[strlen(s)-strlen(m)+1:strlen(s)] eq m # 1=match, 0=no match
sumUp(m) = sum [col=1:colMax] ( matchEnd(myHeader(col),m) ? column(col) : 0 )
myMatches = ".x .y .z"
myMatch(i) = word(myMatches,i)
set key out
plot for [i=2:colMax] $Data u 1:i w lp pt 6 ti columnhead, \
for [i=1:words(myMatches)] '' u 1:(sumUp(myMatch(i))) \
w lp pt 7 ps 2 title sprintf("Sum up %s",myMatch(i))
### end of script
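To verify the sums that the gnuplot script plots, the same suffix matching can be sketched in plain Python. The helper name sum_matching is made up for illustration; like matchEnd above, it compares the end of each column header with a fixed string rather than evaluating a full regular expression, and it assumes whitespace-separated data.

```python
def sum_matching(header, row, suffix):
    """Sum the values of all columns whose header ends with the given suffix."""
    return sum(float(value) for name, value in zip(header, row)
               if name.endswith(suffix))


# First rows of the test data from the script above
text = """ID sw0.x sw0.y sw0.z sw1.x sw1.y sw1.z
1 0.1 1.1 3.1 0.2 1.2 3.2
2 0.2 1.2 3.2 0.3 1.3 3.3"""

lines = text.splitlines()
header = lines[0].split()
for line in lines[1:]:
    row = line.split()
    # one sum per suffix, rounded to hide floating-point noise
    sums = [round(sum_matching(header, row, s), 6) for s in (".x", ".y", ".z")]
    print(row[0], sums)
```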

Plotting constant from a file in GNUplot

I have a data file containing a Gaussian function, and another data file that contains one column with 3 rows. Those three rows are all constants, namely
1: mean+variance
2: mean
3: mean-variance
from the gaussian in the first file.
I would like to plot all of these as constant lines on top of the Gaussian function. I've tried the "every" option (plot "stat.dat" every ::0::0 w lines), which didn't work.
Thank you, any help is appreciated.

Do you mean something like this?
set terminal pngcairo
set output "gauss.png"
set samples 1000
x0 = -5
s2 = 1
set xrange [-10:10]
set yrange [0:0.5]
plot (1/sqrt(2*pi*s2))*exp(-(x-x0)**2/(2*s2)) title "Gaussian", \
"stat.dat" u 1:(5) every ::0::0 w impulse title "mean + variance", \
"stat.dat" u 1:(5) every ::1::1 w impulse title "mean", \
"stat.dat" u 1:(5) every ::2::2 w impulse title "mean - variance"
I have replaced your data file containing the Gaussian function with an analytical expression.

Plot data and fit-functions of multiple files into one plot

I have N input files and I want to plot the data of these files together with their fit function into one single plot (i.e. one plot for all files, data and fit-function).
After a long time of fiddling I found a solution (see below), but I find it "cumbersome and ugly" and I'm wondering if there is a better, more elegant way of achieving the same thing.
I should say that I'm on gnuplot 5.0 under Windows. The test script below doesn't specify a terminal (I'm testing with windows and wxt), but the final script will use pngcairo terminal.
Things that I find sub-optimal about my solution:
I need two intermediary tables, $data and $fit. My original attempt was to use a do for {} loop to read each file in turn, perform the fit and generate the plot, but that didn't work out.
Rather than using a fit function, I plot the fit curve (in this simple case a straight line) as data into a table. I experimented with creating on-the-fly user functions using eval but just couldn't quite figure it out (especially how to keep them in sync with the data).
I want the fit-equation to be displayed in the chart. I do this by setting labels, but it would be nicer if it would just be part of the key.
My test data:
data1.dat
100 0.15
200 0.29
300 0.46
400 0.58
data2.dat
100 0.12
200 0.22
300 0.35
400 0.48
data3.dat
100 0.1
200 0.22
300 0.29
400 0.40
My gnuplot script:
set key left
set xrange [0:*]
set yrange [0:0.5]
# user function for linear fit
lin(x) = slope * x + offset
max(a,b) = ((a>=b)? a : b)
file_list = "data1 data2 data3"
x_max = 0
# first write all data of interest into a (memory) table
set table $data
do for [name in file_list] {
filename = name . ".dat"
plot filename u 1:2
print ""
print ""
x_max = max(GPVAL_DATA_X_MAX, x_max)
}
unset table
x_max = max(GPVAL_DATA_X_MAX, x_max)
num_indices = words(file_list)
# now calculate a linear fit for each dataset
set sample 2
set table $fit
do for [i = 0:(num_indices-1)]{
fit lin(x) $data index i using 1:2 via slope, offset
plot [0:x_max][0:0.5] lin(x)
set label (i+1) sprintf("%s = %.3g*x + %.3g", word(file_list, i+1)."(x) ", slope, offset) at 200,(0.20 - 0.05*i)
}
unset table
set title "Data and Linear Fit"
set xlabel "x"
set ylabel "y"
#now we got both data and fit for all files, plot everything at once
plot for [i = 0:(num_indices-1)] $data index i title word(file_list,i+1) with points lc i+1, for [i = 0:(num_indices-1)] $fit index i with lines lc i+1 noti

There is always the stupid, brute-force way. You can create a new data file containing all the points you want to fit (e.g. using "cat data1.dat data2.dat data3.dat > newdata.dat" on a Linux system) and then fit newdata.dat.
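Note that fitting the concatenated file yields a single line through all points, whereas the script in the question fits each file separately. The Python sketch below contrasts the two using the test data from the question. It uses an ordinary least-squares formula, which is an assumption on my side: gnuplot's fit is an iterative solver, so its slope and offset should agree closely but not necessarily digit for digit. The function name linear_fit is made up for illustration.

```python
def linear_fit(points):
    """Ordinary least-squares fit of y = slope*x + offset to (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Test data from the question (data1.dat, data2.dat, data3.dat)
datasets = {
    "data1": [(100, 0.15), (200, 0.29), (300, 0.46), (400, 0.58)],
    "data2": [(100, 0.12), (200, 0.22), (300, 0.35), (400, 0.48)],
    "data3": [(100, 0.1), (200, 0.22), (300, 0.29), (400, 0.40)],
}

# One fit per file, as in the question's script
for name, pts in datasets.items():
    slope, offset = linear_fit(pts)
    print("%s(x) = %.3g*x + %.3g" % (name, slope, offset))

# One fit over the concatenated data, as the "cat" approach would give
combined = [p for pts in datasets.values() for p in pts]
print("combined: slope = %.3g, offset = %.3g" % linear_fit(combined))
```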

how to output dots/arrows at the ends of line

I am new to gnuplot. I need to plot my data and display a small circle or an arrow at each end of the line chart. How can I do that?
I use this command to display the line chart:
plot 'data.txt' with lines

I don't know if there is a way to make lines have something at the end automatically, but I found a workaround. With this data file:
1 1
2 3
3 2
and the following script:
set term png
set out 'plot.png'
stats 'data.dat' name 'a'
# plot line, then circle only if it is the last data point
plot 'data.dat' w lines t 'data', \
'' u ($0==(a_records-1)?$1:1/0):2 with points pt 7 ps 2 t 'end'
I can make a plot with a line and a circle at its end.
The stats command finds the number of data points; the two-part plot command then draws the line connecting the data points, followed by a circle on the last data point only (determined with the a_records variable). An arrow would be trickier to draw...
To find more info about different point/line style options, the test command at the gnuplot command line is your friend.
