gnuplot: how to know the last column number?

I have a problem handling data using gnuplot.
My data has a different number of columns on each line.
I want to plot the first column on the x-axis and the last column on the y-axis.
The number of the last column differs from line to line.
For example, my data looks like this (my.dat):
1 2
2 1 3
3 4 4
4 5
5 2 1 3 6
plot 'my.dat' us 1:(lastcolumn) w l
I could pre-process the data before reading it into gnuplot.
But I am using the Windows version of gnuplot, so I cannot use awk or any other parsing program.
So I hope this can be handled within gnuplot alone.
Is that possible?
Thanks

Yes, you can check that with gnuplot. The idea is as follows:
You analyze your data with stats and, inside the using expression, check recursively with valid which column is the last valid one. If an invalid column is reached, you return the number of the previous column; otherwise the next column is checked. The largest column number found in the file is then contained in the variable STATS_max:
check_valid_column(c) = valid(c) ? check_valid_column(c + 1) : c - 1
stats 'my.dat' using (check_valid_column(1)) nooutput
last_column = int(STATS_max)
plot 'my.dat' using 1:last_column
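The recursion may be easier to follow outside gnuplot. The following Python sketch only illustrates the logic and is not part of the gnuplot solution: last_valid_column is a hypothetical helper that mirrors check_valid_column, and it treats a column as valid when the row has that many fields, which is a simplification of what gnuplot's valid() checks. It also shows the difference between the last column of each row and the file-wide maximum, which is what STATS_max holds.

```python
# Illustrative sketch only: mirrors the recursive check_valid_column idea.
# last_valid_column is a hypothetical helper, not a gnuplot function.

def last_valid_column(fields, c=1):
    """Return the 1-based index of the last column present in a row."""
    # Assumption: column c counts as "valid" if the row has at least c fields.
    if c <= len(fields):
        return last_valid_column(fields, c + 1)
    return c - 1

# Sample rows from my.dat in the question.
rows = [line.split() for line in ["1 2", "2 1 3", "3 4 4", "4 5", "5 2 1 3 6"]]

per_row_last = [last_valid_column(r) for r in rows]
max_columns = max(per_row_last)  # analogous to int(STATS_max)

# First column paired with the last value of each row.
points = [(float(r[0]), float(r[n - 1])) for r, n in zip(rows, per_row_last)]
```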

Just for the record, here is an alternative suggestion. Christoph's solution is certainly more elegant and probably faster.
However, with the recursive approach you will get the error "recursion depth limit exceeded" if you have more than 250 columns (admittedly, probably a very rare case).
The solution below reads each line as one string and counts the columns with words(). This, however, works only if the separator is whitespace; with commas it will not work. I am not sure what the string length limit would be.
Code (edit: no need to plot to a dummy table, stats can be used instead):
### find the maximum number of columns
reset session
# create some random test data
set print $Data
rows = int(rand(0)*5+5) # random 5 to 9 lines
do for [r=1:rows] {
    minCols = 251 # if minCols >250, the recursive approach will fail
    cols = int(rand(0)*10+minCols)
    line = ''
    do for [c=1:cols] { line = sprintf("%s %d",line,rand(0)*10) }
    print line
}
set print
# alternative approach with words(). Works only if the separator is whitespace.
set datafile separator "\n"
maxCol=0
stats $Data u (cols=words(strcol(1)), cols>maxCol?maxCol=cols:0) nooutput
set datafile separator whitespace
print "words() approach: ", maxCol
# Recursive approach for comparison
print "Recursive approach: "
check_valid_column(c) = valid(c) ? check_valid_column(c + 1) : c - 1
stats $Data u (check_valid_column(1)) nooutput
last_column = int(STATS_max)
print last_column
### end of code
Result (if the maximum number of columns is > 250):
words() approach: 259
Recursive approach:
"SO41032862.gp" line 28: recursion depth limit exceeded

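As a rough illustration of the words() idea and its whitespace limitation, here is a minimal Python sketch. The function name max_columns and the sample lines are assumptions made for this example; str.split() with no argument stands in for gnuplot's words().

```python
# Illustrative sketch only: count whitespace-separated fields per line,
# similar in spirit to words(strcol(1)) with the separator set to "\n".

def max_columns(lines):
    """Return the largest number of whitespace-separated fields on any line."""
    max_col = 0
    for line in lines:
        cols = len(line.split())  # split() with no argument splits on whitespace
        if cols > max_col:
            max_col = cols
    return max_col

whitespace_data = ["1 2", "2 1 3", "5 2 1 3 6"]
comma_data = ["1,2", "2,1,3", "5,2,1,3,6"]

whitespace_result = max_columns(whitespace_data)
comma_result = max_columns(comma_data)  # each comma-separated line is one "word"
```

With comma-separated input every line counts as a single word, which matches the limitation described in the answer.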
Related

Gnuplot - plotting series based on label in third column

I have data in the format:
1 1 A
2 3 ab
1 2 A
3 3 x
4 1 x
2 3 A
and so on. The third column indicates the series. That is, in the case above there are 3 distinct data series: one designated A, another designated ab, and the last designated x. Is there a way to plot the three data series from such a data structure in gnuplot without using e.g. awk? The difficulty here is that the number of categories (here denoted A, ab, x) is quite large, and it is not feasible to write them out by hand.
I was thinking along the lines:
plot data u 1:2:3 w dots
but that does not work and I get the warning "Skipping data file with no valid points" (I tried quoted and unquoted versions of the third column). A similar question requires defining the palette manually, which is undesirable.
With a little bit of work you can make a list of unique categories from within gnuplot without using external tools. The following code snippet first assembles a list of the entire third column of the data file, and then loops over it to generate a list of unique category names. If memory use or processing time become an issue then one could probably combine these steps and avoid forming a single string with the entire third column.
delimiter = "#" # some character that does not appear in category name
categories = ""
stats "test.dat" using (categories = categories." ".delimiter.strcol(3).delimiter) nooutput
unique_categories = ""
do for [cat in categories] {
    if (strstrt (unique_categories, cat) ==0) {
        unique_categories = unique_categories." ".cat
    }
}
set xrange[0:5]
set yrange [0:4]
plot for [cat in unique_categories] "test.dat" using 1:(delimiter.strcol(3).delimiter eq cat ? $2 : NaN) title cat[2:strlen(cat)-1]
Take a look at the contents of the string variables categories and unique_categories to get a better idea of what this code does.
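To see why the delimiter matters, here is a small Python sketch of the same substring-based de-duplication. It is only an illustration: unique_categories is a hypothetical function, and str.find plays the role of strstrt. Without a delimiter, a name that is a substring of an earlier name (for example "a" inside "ab") would wrongly be treated as already seen.

```python
# Illustrative sketch only: de-duplicate category names with a substring search,
# wrapping each name in a delimiter so that partial matches cannot occur.

def unique_categories(labels, delimiter="#"):
    """Return the delimiter-wrapped unique names, in order of first appearance."""
    seen = ""  # space-separated string of wrapped names, as in the gnuplot code
    for label in labels:
        token = delimiter + label + delimiter
        if seen.find(token) == -1:  # analogous to strstrt(...) == 0
            seen = seen + " " + token
    return seen.split()

labels = ["A", "ab", "A", "x", "x", "A"]  # third column of the sample data
wrapped = unique_categories(labels)
names = [w[1:-1] for w in wrapped]  # strip the delimiters for use as titles

# Without a delimiter, "a" is found inside "ab" and is dropped by mistake.
without_delimiter = unique_categories(["ab", "a"], delimiter="")
with_delimiter = unique_categories(["ab", "a"])
```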

Coloring intervals of missing values with Gnuplot

I have temporal data in which some time intervals contain only missing values. I want to show those intervals of missing values explicitly.
For now, the solution I have is to check whether the value is NaN or not, as such:
plot file_name using 1:(stringcolumn(num_column) eq "NaN" ? 1/0 : column(num_column)) with lines,\
"" using 1:(stringcolumn(num_column) eq "NaN" ? 1000 : 1/0) with points
This draws points at y = 1000 instead of the line wherever values are missing.
However, this is not ideal because a) I need to specify a y value at which to draw the points and b) it's quite ugly, especially when the dataset is longer in time.
I would like instead to fill each such interval completely with a color (possibly with some transparency). Note that the example here has only one interval of missing values, but in reality there can be any number of them on one plot.
We can do some pre-processing to accomplish this. Suppose that we have the following data file, data.txt
1 8
2 6
4 NaN
5 NaN
6 NaN
7 9
8 10
9 NaN
10 NaN
11 6
12 11
and the following python 3 program, process.py [1] (obviously, using python is not the only way to do this):
data = [x.strip().split() for x in open("data.txt","r")]
i = 0
while i<len(data):
    if (data[i][1]=="NaN"):
        print(data[i-1][0],end=" ") # or use data[i][0]
        i+=1
        while data[i][1]=="NaN": i+=1
        print(data[i][0],end=" ") # or use data[i-1][0]
    else: i+=1
This python program reads the data file and, for each range of NaN values, outputs the last good and next good x-coordinates. For the example data file, it outputs 2 7 8 11, which can be used as bounds for drawing rectangles. Now we can do, in gnuplot [2]:
breaks = system("python3 process.py")
set for [i=0:words(breaks)/2-1] object (i+1) rectangle from word(breaks,2*i+1),graph 0 to word(breaks,2*i+2),graph 1 fillstyle solid noborder fc rgb "orange"
Which will draw filled rectangles over this range. It determines how many "blocks" (groups of two values) are in the breaks variable then reads these two at a time using the breaks as left and right bounds for rectangles.
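The pairing of values into rectangle bounds can be sketched in Python as follows. This is only an illustration of the indexing in the set for loop above; rectangle_bounds is a hypothetical name, and Python's 0-based indices replace gnuplot's 1-based word().

```python
# Illustrative sketch only: split a "breaks" string into (left, right) pairs,
# mirroring word(breaks,2*i+1) and word(breaks,2*i+2) for i = 0 .. words/2-1.

def rectangle_bounds(breaks):
    """Return a list of (left, right) x-bounds, one per pair of values."""
    words = breaks.split()
    pairs = len(words) // 2  # an unpaired trailing value is ignored
    return [(float(words[2 * i]), float(words[2 * i + 1])) for i in range(pairs)]

bounds = rectangle_bounds("2 7 8 11")
```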
Finally, plotting the data
plot "data.txt" u 1:2 with lines
produces a plot that shows the filled rectangles over the ranges of NaN values.
Just to provide a little more applicability, the following awk program, process.awk [3], serves the same purpose as the above python program, if awk is available and python isn't:
BEGIN {
    started = 0;
    last = "";
    vals = "";
}
($2=="NaN") {
    if (started==0) {
        vals = vals " " last;
        started = 1;
    }
}
($2!="NaN") {
    last = $1
    if (started==1) {
        vals = vals " " last;
        started = 0;
    }
}
END {
    sub(/^ /,"",vals);
    print vals;
}
We can use this by replacing the system call above with
breaks = system("awk -f process.awk data.txt")
[1] The boundaries are extended to the last and next point to completely fill the gap. If this is not desired, the commented values will cover only the region identified by NaN in the file (4-6 and 8-10 in the example case). The program will not handle NaN values as the first or last data point.
[2] I used solid orange for the gaps. Feel free to use any color spec there.
[3] The awk program extends the boundaries in the same way as the python program, but takes more modification to get the other behavior. It has the same limitation of not handling NaN values as the first or last data point.
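The limitation mentioned in the notes (NaN as the first or last data point) could be lifted along the following lines. This is a hedged sketch, not the answer's program: nan_intervals is a hypothetical function, and clamping a gap at the start or end of the data to the gap's own first or last x-value is an assumption about the desired behavior.

```python
# Illustrative sketch only: find the x-bounds of each run of "NaN" values,
# also covering runs that touch the first or last row of the data.

def nan_intervals(rows):
    """rows: list of (x, y) string pairs. Return a list of (left, right) bounds."""
    intervals = []
    i = 0
    n = len(rows)
    while i < n:
        if rows[i][1] == "NaN":
            # Left bound: previous good point, or the gap's own first x
            # when the data starts with a gap (assumed behavior).
            left = rows[i - 1][0] if i > 0 else rows[i][0]
            while i < n and rows[i][1] == "NaN":
                i += 1
            # Right bound: next good point, or the last x if the data ends in a gap.
            right = rows[i][0] if i < n else rows[n - 1][0]
            intervals.append((left, right))
        else:
            i += 1
    return intervals

sample = [line.split() for line in [
    "1 8", "2 6", "4 NaN", "5 NaN", "6 NaN", "7 9",
    "8 10", "9 NaN", "10 NaN", "11 6", "12 11",
]]
sample_intervals = nan_intervals(sample)
```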
Using two filled curves
A somewhat "hacky" way of doing it is using two filled curves, as such:
plot file_name using 1:(stringcolumn(num_column) eq "NaN" ? 1/0 : column(num_column)) with lines ls 2,\
"" using 1:(stringcolumn(num_column) eq "NaN" ? 0 : 1/0) with filledcurve x1 ls 3,\
"" using 1:(stringcolumn(num_column) eq "NaN" ? 0 : 1/0) with filledcurve x2 ls 3
Both filledcurve must have the same linestyle, so that we get one uniform rectangle.
One filledcurve has x1 as parameter and the other x2, so that one fills above 0 and the other below 0.
You can remove the curve at 0 and make the filling transparent using this:
set style fill transparent solid 0.8 noborder
In the result, note that the dashed line at 0 under the rectangle is a bit glitchy compared to the other dashed lines. Note also that if some rectangles are very small in width, they will look lighter than expected.

Prevent backward lines in gnuplot

I have some values given by clock time, where the first column is the time. However, the values up to 2 o'clock still belong to the current day. Given
3 1
12 4
18 1
21 2
1 3
2 0
named as test.data, I'd like to print this in gnuplot:
set xrange [0:24]
plot 'test.data' with lines
However, the plot contains a backward line. It's striking through the whole diagram.
Is there a way to tell gnuplot to explicitly not print such backward lines, or even better, print them wrapping around the x axis (e.g. in my example drawing the line as a forward line up to 24, and then continuing it at 0)?
Note: The x axis of the plot should still start at 0 and end at 24.
As far as wrapping over the edge of the graph (a Pac-Man-like effect) goes, gnuplot can't do that on its own. Even doing it manually, you would have to somehow calculate the right point to re-enter the graph based on the slope of the connecting line, and insert a new point into the data to control where the re-entry line enters and where the exiting line exits. This would require external processing.
If you can do some outside preprocessing, adding a blank line before the 1 3 line will insert a discontinuity into the plot and prevent gnuplot from connecting those points (see help datafile for how gnuplot handles blank lines). Of course, you could always sort the data too.
I would recommend sorting the data before plotting, but if you do want to do this wrapping effect, the following python program (wrapper.py) will set up the data for it
data = [tuple(map(float,x.strip().split(" "))) for x in open("data.txt","r")]
data2 = sorted(data)
back_in_to = data2[0]
out_from = data2[-1]
xdelta = back_in_to[0] + 24 - out_from[0]
ydelta = back_in_to[1] - out_from[1]
slope = ydelta/xdelta
outy = out_from[1] + (24-out_from[0])*slope
print(0,outy)
for x in data2:
    print(*x)
    if x[0]==data[-1][0]: print("")
print(24,outy)
It reads in the data (assumed to be in data.txt) and calculates the points where the line should leave the graph and where it should re-enter, adding these points to the sorted data. It adds a blank line after the last point of the original data, causing the break in the line. We can then plot like
plot "< python3 wrapper.py" with lines
If we look at your original plot, we see the backward line that you referred to, which reaches from the furthest right point to the next left point. In the plot of the pre-processed data, the line instead leaves through the right edge of the graph and re-enters on the left to reach this point.
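The crossing height used for the two added points can be checked by hand. The Python sketch below isolates that interpolation step; wrap_crossing is a hypothetical helper introduced for this illustration, and the period of 24 is taken from the x-range in the question.

```python
# Illustrative sketch only: y-value at which the wrapped segment crosses the
# right edge (x = period), which is also where it re-enters at x = 0.

def wrap_crossing(last_point, first_point, period=24.0):
    """Interpolate linearly from last_point to first_point shifted by one period."""
    x_out, y_out = last_point
    x_in, y_in = first_point
    xdelta = x_in + period - x_out  # horizontal distance across the wrap
    slope = (y_in - y_out) / xdelta
    return y_out + (period - x_out) * slope

# Sample data from the question: the line leaves from (21, 2) and re-enters toward (1, 3).
outy = wrap_crossing((21.0, 2.0), (1.0, 3.0))
```

For the sample data the horizontal distance across the wrap is 4, the slope is 0.25, and the crossing height is 2 + 3 * 0.25 = 2.75.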

Plotting multiple graphs depending on column value with gnuplot

I have the following data, which I want to plot using gnuplot:
#TIME #VALUE #SOURCE
1 100 A
1 88 B
2 115 A
2 100 B
3 130 A
3 210 B
I want to have two lines drawn, depending on the value of column #SOURCE: one line for A and one line for B. Is this possible with gnuplot, and if yes, how?
Is it also possible to draw a summation of column #VALUE over column #TIME? That is, for all equal entries in #TIME, the values in #VALUE should be summed up.
Thanks in advance,
Frank
One way to do it would be to use grep to locate lines ending with A or B and plot the result. You can do this in a single plot line with a for loop if you know the characters lines will end in:
plot for [s in 'A B'] sprintf("< grep '%s$' data.dat", s) u 1:2 w l title s
This plots the data you provided (saved in data.dat) as two different lines.
You could also change the for part to [s in 'word1 word2 word3'] or any other string you like. If you don't know in advance which characters or words the lines end with, you would probably need to pass over the file twice: first to determine the string for the for loop, and a second time to do the plotting.
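The grouping that grep performs, and the per-time summation asked about in the question, can be sketched in Python as follows. This is an illustration of the logic only, not a gnuplot solution; split_by_source is a hypothetical function and the rows are the sample data from the question.

```python
# Illustrative sketch only: group rows by the last column and sum values per time.
from collections import defaultdict

def split_by_source(lines):
    """Return ({source: [(time, value), ...]}, {time: summed value})."""
    series = defaultdict(list)
    totals = defaultdict(float)
    for line in lines:
        # Skip blank lines and comment lines such as the header.
        if not line.strip() or line.lstrip().startswith("#"):
            continue
        time, value, source = line.split()
        series[source].append((float(time), float(value)))
        totals[float(time)] += float(value)
    return dict(series), dict(totals)

data = [
    "#TIME #VALUE #SOURCE",
    "1 100 A", "1 88 B",
    "2 115 A", "2 100 B",
    "3 130 A", "3 210 B",
]
series, totals = split_by_source(data)
```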

How to do math operation on rows in gnuplot

Say, my data file has two columns and five rows as follows,
1 3
2 5
3 3
4 4
5 2
Now I would like to plot them but with a little math operation on second column. For example,
plot 'test.dat' u 1:($2*)
What I mean by the asterisk is that I would like to compute sqrt(row2^2+row1^2), i.e. sqrt(5^2+3^2), from the second-column values of consecutive rows. How can I do that? Many thanks!
Usually, one can access only the values of all columns of the current row. Accessing the values of a previous row is possible, but tricky. Basically, you must save the values in temporary variables.
That works in the following way:
In the first row, save the values of both columns and do not plot them (use NaN as value).
In the second row, save the current x-value and use the x-value of the previous row. Then save the current y-value, and compute your value based on the previous row (prevY) and the current row (currY).
This doesn't plot the last row, but that row has no following row anyway. If you also want to plot the last row, e.g. with 0 as the additional value, you must append a final row 0 0 to the data.
In the script I use set macros for better readability of the code:
set macros
prevX = currX = prevY = currY = 0
UsePreviousXvalue = '(($0 == 0) ? (prevX = NaN, currX = $1) : (prevX = currX, currX = $1)), prevX'
AssignYvalue = '(prevY = currY, currY = $2)'
plot 'test.dat' using (@UsePreviousXvalue):(@AssignYvalue, sqrt(prevY**2 + currY**2))
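For comparison, the same bookkeeping with "previous" and "current" values can be written as a short Python sketch. It only illustrates which points the approach produces; combine_with_previous is a hypothetical function and the rows are the sample data from the question.

```python
# Illustrative sketch only: pair each row with the following row, as the
# prevX/currX and prevY/currY variables do in the gnuplot script.
import math

def combine_with_previous(rows):
    """Return (previous x, sqrt(previous y^2 + current y^2)) for each row after the first."""
    points = []
    prev_x = prev_y = None
    for x, y in rows:
        if prev_x is not None:  # the first row has no predecessor and is skipped
            points.append((prev_x, math.sqrt(prev_y ** 2 + y ** 2)))
        prev_x, prev_y = x, y
    return points

rows = [(1, 3), (2, 5), (3, 3), (4, 4), (5, 2)]
points = combine_with_previous(rows)
```

Five input rows yield four points, which corresponds to the remark above that the last row is not plotted.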
