Number of columns in data file using STATS

Number of columns in data file using STATS - gnuplot

Based on the answer to this question, I'd like to determine the amount of columns in my file.
The file looks like this:
Header,,Header2,,Header3,,
1,2,3,4,5,6
11,12,13,14,15,16
When I now try to use the stats command:
stats 'data.dat'
max_col = STATS_columns
Gnuplot gives the error that there's bad data on line 1 of file data.dat which obviously is the header.
If I remove the header, everything's fine, but I'm planning on using columnheader for automated labelling (as discussed e.g. here) of the curves, so removing the header is not a solution.
If it matters: I'm working on a Windows-machine.

As hinted in the comments, the solution is simply modifying the command like this:
stats 'data.dat' skip 1
max_col = STATS_columns

Be aware that STATS_columns will not necessarily give you the maximum number of columns in a file or datablock. Apparently, it will just give you the number of columns in the first data row. If following rows have more columns stats will not take this into account.
If you need to know the maximum number of columns of data which is not a regular table, the following code (tested in gnuplot 5.2.5) is counting the separators of all lines in a "irregular" datablock and tells you the maximum. Well, possible drawback: commented lines containing separators will also be included. A file has to be loaded to a datablock first.
I am happy to learn if there is a better way.
### maximum number of columns in datablock
reset session
$Data <<EOD
11 12 33
21 22 23 24 25
31 32 33 34 35 36 37 38 39
41 42 43 44 45
51 52
EOD
# Method with stats
stats $Data nooutput
print "STATS_columns: ", STATS_columns
# Method with counting separators
CountChar(s,c) = int(sum[Count_i=1:strlen(s)] (s[Count_i:Count_i] eq c))
ColMax = 0
Separator = "\t"
do for [i=1:|$Data|] {
ColCount = CountChar($Data[i],Separator)+1
ColMax = (ColCount > ColMax ? ColCount : ColMax)
}
print "Maximum Columns: ", ColMax
### end of code
Result:
STATS_columns: 3
Maximum Columns: 9

Related

How can I subtract the value in first row from all other values in the column?

I want to subtract all rows of a file from its first row, and then plot it. How can I implement such math work in gnuplot?
Here is an example of what i want to do:
Let's say i have a file that has two columns and 1000 rows. I want a script that subtract all data's in 2nd column from the 2nd column value in first row.

I am pretty sure that there are a similar questions on SO, however, apparently not so easy to find.
I would have searched for "normalization" or "offset".
The following example works even if you have single or double empty lines in your data. The expression in the plot command uses serial evaluation, check help operators binary.
Sometimes, you might see similar solutions using the pseudocolumn 0 (check help pseudocolumns), however, which might lead to wrong results if you have empty lines in your data.
Script:
### offset data: subtraction of first value in a column
reset session
$Data <<EOD
0 10
1 11
2 12
3 13
4 14
5 15
6 16
7 17
EOD
plot t=0 $Data u 1:(t==0?y0=$2:0,t=t+1,$2-y0) w lp pt 7 lc "red"
### end of script
Result:

Gnuplot with Linear Regression

I am trying to apply linear regression with (Fit(x)). Instead of having two columns in a data file, e.g. x and y values, this file have, for example, 5 columns. I want to pick the avg value of each column and feed it to the F(X) function.
Data:
A B C D E
2 2 5 10 20
4 5 6 11 1
6 8 7 12 4
8 9 12 13 8
10 11 10 14 17
Could I?
Thanks for help

Assume that your data is as specified and you wish to fit a function f(x)=m*x+b to the data where the 0-based column index (0,1,2,3 or 4) should be the x value and the column average should be the y value. We need to construct a new data file that contains the averages.
In gnuplot 5, we can use something called inline data. This is a special variable that behaves like a file. We will find the average of each column of the data and construct an inline data variable containing these. We do this by looping over the column indices and applying the stats function. The print command can be instructed to print to an inline data variable.
set print $l append
do for [i=1:5] {
stats datafile u i nooutput
print STATS_mean
}
set print # restores ordinary print behavior
With your data, we can see what is contained in $l by printing it with print $l:
6.0
7.0
8.0
12.0
10.0
We now can apply the fit command with this data
f(x) = m*x + b
fit f(x) $l u 0:1 via m,b
This will fit the data so that f(x) = average of column x (or as close to it as can be obtained with the fit).
In gnuplot 4.6, inline data is not available, but we can use a temporary file. Replacing all occurrences of $l with "tempfile" will work the same (except for the print $l command), but will add the data to a temporary file named tempfile.

How to get the value of a specific column in a specific line in any time of processing in gnuplot?

I got a data file in the format like this:
# begin
16 1
15 2
14 3
13 4
12 5
11 6
Now I want to use gnuplot to draw a line through the points:
(1, (16/16)) (2, (16/15)) (3, (16/14)) ... (6, (16/11))
As you see, the x axis is the range [1:6] and the Y axis corresponds the values obtained from the number in the first line at the first column(ie. 16 in this example) divided by the number in each line at the first column.
The problem is that I don't know how to get the value of the number at the first column in the first line (16), so that I could do something like
plot "datafile" using 2:(16/$1) with linespoints
I have done a lot of search about how to achieve that but with no luck. It seems that gnuplot doesn't provide some flexible ways to allow arbitrary data selection. Any ideas how to do that? Or maybe I just got stuck into a not so common problem?
Thanks for your help in advance.

You can use the stats command to extract a single numerical value from your data file. The row is selected with the every option, the column with the using:
col = 1
row = 0
stats 'datafile' every ::row::row using col nooutput
value = STATS_min
plot "datafile" using 2:(value/$1) w lp
Note, that column numbering starts at 1, and row numbering at 0 (comment lines are skipped and aren't counted).

gnuplot: plotting a file with 4 columns all on y-axis

I have a file that contains 4 numbers (min, max, mean, standard derivation) and I would like to plot it with gnuplot.
Sample:
24 31 29.0909 2.57451
12 31 27.2727 5.24129
14 31 26.1818 5.04197
22 31 27.7273 3.13603
22 31 28.1818 2.88627
If I have 4 files with one column, then I can do:
gnuplot "file1.txt" with lines, "file2.txt" with lines, "file3.txt" with lines, "file4.txt" with lines
And it will plot 4 curves. I do not care about the x-axis, it should just be a constant increment.
How could I please plot? I can't seem to find a way to have 4 curves with 1 file with 4 columns, just having a constantly incrementing x value.
Thank you.

You can plot different columns of the same file like this:
plot 'file' using 0:1 with lines, '' using 0:2 with lines ...
(... means continuation). A couple of notes on this notation: using specifies which column to plot i.e. column 0 and 1 in the first using statement, the 0th column is a pseudo column that translates to the current line number in the data file. Note that if only one argument is used with using (e.g. using n) it corresponds to saying using 0:n (thanks for pointing that out mgilson).
If your Gnuplot version is recent enough, you would be able to plot all 4 columns with a for-loop:
set key outside
plot for [col=1:4] 'file' using 0:col with lines
Result:
Gnuplot can use column headings for the title if they are in the data file, e.g.:
min max mean std
24 31 29.0909 2.57451
12 31 27.2727 5.24129
14 31 26.1818 5.04197
22 31 27.7273 3.13603
22 31 28.1818 2.88627
and
set key outside
plot for [col=1:4] 'file' using 0:col with lines title columnheader
Results in:

Just to add that you can specify the increment in the for loop as third argument. It is useful if you want to plot every nth column.
plot for [col=START:END:INC] 'file' using col with lines
In this case it changes nothing but anyway:
plot for [col=1:4:1] 'file' using col with lines

Gnuplot: Plotting multiple series on graph, but number of different series to overlay unknown ahead of time

I am trying to write a script wrapping gnuplot that will take a dataset and produce an overlayed graph, the number of series to be plotted based on the number of distinct values in a given column, or based on the number of different datasets in the file. An example file would be:
#SeriesName x y
Series1 0 10
Series1 1 11
Series1 2 13
...
SeriesN 0 14
SeriesN 1 19
SeriesN 2 15
I have this in one continuous set of lines, but I can split it into index-able chunks if necessary. The problem is that I don't know the different names of the SeriesName values I'll have ahead of time, nor how many of distinct values there will be. But I want one line on the graph per distinct value of SeriesName. I can see how to make graphs if I know ahead of time the different values of SeriesName, but I don't know how to tell gnuplot to "make one line per value of series, and label each line with the name that is the value of SeriesName that was used for each line."
Can gnuplot do this? Otherwise, I can make two passes through the data, the first one of which I will gather the unique values of SeriesName, and then use bash/perl/python to explicitly build a `plot' statement, but it seems like gnuplot should have some functionality for a user to have to avoid that. Am I missing something?
Thanks in advance.
Update: I also posted to a forum to where the author of Gnuplot in Action (Philipp Janert) posts, and I posted a workaround to my own problem, but I don't think it qualifies as an answer, as what it ultimately does is make a second run through the data and then does a source code filter on gnuplot commands to make a gnuplot script compliant with a particular dataset. I would think that there would be an answer using just the syntax of gnuplot better than what I did. For reference, here is the link: http://www.manning-sandbox.com/thread.jspa?messageID=122752#122752

Just for the records, here is a solution which works with gnuplot>=4.4.0 and gnuplot 5.x.
When the series label changes in column 1 it will be added to a string. This string will be used later to plot the legend.
Data: SO8812078.dat
#SeriesName x y
Series1 0 10
Series1 1 11
Series1 2 13
Series2 0 12
Series2 1 13
Series2 2 14
SeriesN 0 14
SeriesN 1 19
Script: (works with gnuplot>=4.4.0, March 2010)
### take legend from column
reset
FILE = "SO8812078.dat"
myTitles = ''
set key noautotitle
plot t1='' FILE u (t0=t1,t1=strcol(1),t0 ne t1?myTitles=myTitles.' '.t1:0,$2):3:(words(myTitles)) w lp pt 7 lc var, \
for [i=0:words(myTitles)] 1/0 w lp pt 7 lc i ti word(myTitles,i)
### end of script
Result: (created with gnuplot 4.4.0)

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string