How to do math operation on rows in gnuplot - gnuplot

Say, my data file has two columns and five rows as follows,
1 3
2 5
3 3
4 4
5 2
Now I would like to plot them but with a little math operation on second column. For example,
plot 'test.dat' u 1:($2*)
What I mean by asterisk is I would like to sqrt(row2^2+row1^2), which is sqrt(5^2+3^2), on second column values. How I can do that? Many thanks!

Usually, one can access only the values of all columns of the current row. Accessing the values of a previous row is possible, but tricky. Basically, you must save the values in temporary variables.
That works in the following way:
In the first row, save the values of both columns and do not plot them (use NaN as value).
In the second row, save the current x-values, use the x-value of the previous row. Then save the current y-value, and compute your value based on the previous row (prevY) and the current row (currY).
That doesn't plot the last line. But that hasn't a next row anyway. If you want it to plot also the last line with e.g. 0 as additional value, you must add a last row with 0 0.
In the script I use set macros for better readability of the code:
set macros
prevX = currX = prevY = currY = 0
UsePreviousXvalue = '(($0 == 0) ? (prevX = NaN, currX = $1) : (prevX = currX, currX = $1)), prevX'
AssignYvalue = '(prevY = currY, currY = $2)'
plot 'test.dat' using (#UsePreviousXvalue):(#AssignYvalue, sqrt(prevY**2 + currY**2))

Related

Excel - max between 2 series

I have 2 series of data. For sake of simplicity, lets say the data looks like below,
set 1:
1 3
2 3.5
3 4
4 4.5
5 5
6 5.5
7 6
8 6.5
9 7
10 7.5
set 2:
1.5 2
2.8 4.5
3.5 8
4.5 6
5.5 4.8
6.5 4
7.5 6.5
8.5 9
9.5 3
10.5 4
After charting these 2 sets, I want to get the line with the higher data. I want the black line, In the attached pic. How do I get that? My actual data has thousands of data points, so doing this manually isn't possible.
Added later: Another thing I forgot to mention, in my actual data 1 set has about 500 x,y values, and the other set has about 50 values. Though the end points have same/similar x values.
Thanks for your help.
Given your information about the chart and the tables, I would do something like this:
The new series will be based on two formulas:
In Column H, I have the formula for the max value (between your two series):
=MAX(B2,E2)
In Column G, I have the formula that based on the Max value (formula above), which X value I should use (X-value from Series 1 or 2).
=IF(H2=B2,A2,D2)
Then I can plot my graph:
Series 1, Column B
Series 2, Column E
Series 3, Column H.
All series uses the X values of Column G.
Introduction
A few assumptions/comments/pitfalls/constraints regarding my solution:
Set 1 and Set 2 are in columns A till D.
The combined data set will combine the x-values of both Sets, and will have additional data points where the lines cross.
It involves several helper columns, in particular to allow you to copy/paste this across multiple worksheet with data.
I did not try to condense too much, to improve readability, and probably some helper columns could be combined.
It was tested with the data set from the question, but difficult to guarantee all "boundary" conditions, e.g. identical data points between Set 1 and Set 2, zero overlap between the two data sets, empty data sets, etc. (I did test some of these, see my comments at the end).
Set 1 and Set 2 must be sorted (on x-values). If this is not the case, a few additional helper columns are needed to sort the data dynamically.
To better understand the solution described below, see herewith the resulting graph, based on the data set in the question (although I added one data point [2.5;3.75] to avoid having the data points of Set 1 and Set 2 perfectly alternating):
General solution outline / methodology
Combine both datasets in a single (sorted) column;
For all x-values, determine highest y-value, between the y-value in the Set, and the calculated y-value on the line segment from the neighboring values in the other Set (looks simple, in particular with the given example data set, but this is quite tricky to do when data sets have no alternating x-values);
Find the points (x & y values) where the lines of the graph are crossing (intersecting), let's call this Set 3
Combine and sort (on x-values) the three data sets in a two columns (for x & y values).
The details and formulas
For the formulas, I assume row 1 contains headings, and the data start on row 2. All formulas should be entered in row 2, except for a few, where I mention to put them in row 3 (because they need data from the preceding row). The result is in columns E (x-values) and F (y-values), and G till AG are helper columns).
Column E : =INDEX(AH$2:AH$30;MATCH(ROWS(AH$2:AH2);$AJ$2:$AJ$30;0)) These is the actual result. Gets all x-values in AH and sorts these based on an index column AJ; this should actually be the last column in the logical flow, but for presentation purposes it is cleaner to have this next to the input data sets;
F : =INDEX(AF$2:AF$30;MATCH(ROWS(AF$2:AF2);$AG$2:$AG$30;0)) Same for y-values;
G : =IF(ISNA(H2);NA();COUNTIF($H$2:$H$30;"<="&H2)) Creates index to sort combined x-values of both data sets. You also can dynamically sort without such helper column, but then you need a VLOOKUP or INDEX/MATCH and with long decimal numbers I have some bad experiences with these;
H : =IF(ROW()-1<=COUNT($A$2:$A$30);A2;IF((ROW()-1)<=(COUNT($A$2:$A$30)+COUNT($C$2:$C$30));INDEX($C$2:$C$30;ROW()-COUNT($A$2:$A$30)-1;1);NA())) Combines x-values of both data sets, i.e. in columns A & C;
I : =IF(ROW()-1<=COUNT($B$2:$B$30);B2;IF((ROW()-1)<=(COUNT($B$2:$B$30)+COUNT($D$2:$D$30));INDEX($D$2:$D$30;ROW()-COUNT($B$2:$B$30)-1;1);NA())) Same for the y-values;
J : =IF(ROW()-1<=COUNT($A$2:$A$30);"S1";IF((ROW()-1)<=(COUNT($A$2:$A$30)+COUNT($C$2:$C$30));"S2";NA())) Assign "S1", or "S2" to each data point, as indication from which data set they come;
K : =IF(J2=J3;INTERCEPT(I2:I3;H2:H3);NA()) Determines the intercept of the line segment starting at that data point;
L : =IF(J2=J3;SLOPE(I2:I3;H2:H3);NA()) Same for slope;
M : =INDEX(H$2:H$30;MATCH(ROWS(H$2:H2);$G$2:$G$30;0)) Sorts all x-values;
N : =INDEX(I$2:I$30;MATCH(ROWS(I$2:I2);$G$2:$G$30;0)) Same for y-values
O : =INDEX(J$2:J$30;MATCH(ROWS(J$2:J2);$G$2:$G$30;0)) Same for corresponding "S1/S2" value to indicate from which data set they come;
P : =INDEX(K$2:K$30;MATCH(ROWS(K$2:K2);$G$2:$G$30;0)) Same for intercept;
Q : =INDEX(L$2:L$30;MATCH(ROWS(L$2:L2);$G$2:$G$30;0)) Same for slope;
R : =IF(O2="S1";"S2";"S1") Inversion between S1 & S2.
S : {=IFERROR(INDEX($O$2:$Q2;MAX(IF($O$2:$O2=$R3;ROW($O$2:$O2)-ROW(INDEX($O$2:$O2;1;1))+1));2);NA())} Array formula to be put in cell S3 (hence ctrl+shift+enter) that will search for the intercept of the preceding data point of the other data set.
T : {=IFERROR(INDEX($O$2:$Q2;MAX(IF($O$2:$O2=$R3;ROW($O$2:$O2)-ROW(INDEX($O$2:$O2;1;1))+1));3);NA())} Same for slope;
U : =IF(OR(ISNA(N2);NOT(ISNUMBER(S2)));NA();M2*T2+S2) Calculates the y-value on the line segment of the other data set;
V : =MAX(IFNA(U2;N2);N2) Maximum value between the original y-value and the calculated y-value on the corresponding line segment of the other data set;
W : =(V2=N2) Checks whether the y-value comes from the original data set or not;
X : =IF(O2="S1";IF(W2;"S1";"S2");IF(W2;"S2";"S1")) Determines on which data set (line) the y-value sits (S1 or S2);
Y : =IFERROR(AND((X2<>X3);COUNTIF(X3:$X$30;X2)>0);FALSE) Determines when the data sets cross (i.e. the lines on the graph intersect);
Z : =IF(Y2;(S2-P2)/(Q2-T2);NA()) Calculates x-value of intersection;
AA : =IF(Y2;Z2*Q2+P2;NA()) Calculates y-value of intersection;
AB : =COUNTIF($Z$2:$Z$30;"<="&Z2) Index to sort the newly calculated intersection points (I sort them because then the combining with the other data sets is straightforward, re-using formula of column H;
AC : =INDEX(Z$2:Z$30;MATCH(ROWS(Z$2:Z2);$AB$2:$AB$30;0)) Sorted x-values of intersection points;
AD : =INDEX(AA$2:AA$30;MATCH(ROWS(AA$2:AA2);$AB$2:$AB$30;0)) Same for y-values;
AE : =IF(ROW()-1<=COUNT(M$2:M$30);M2;IF((ROW()-1)<=(COUNT(M$2:M$30)+COUNT(AC$2:AC$30));INDEX(AC$2:AC$30;ROW()-COUNT(M$2:M$30)-1;1);NA())) Combine x-values of Set 1, Set 2, and the intersection points;
AF : =IF(ROW()-1<=COUNT(V$2:V$30);V2;IF((ROW()-1)<=(COUNT(V$2:V$30)+COUNT(AD$2:AD$30));INDEX(AD$2:AD$30;ROW()-COUNT(V$2:V$30)-1;1);NA())) Same for y-values;
AG : =IF(ISNA(AE2);NA();COUNTIF($AE$2:$AE$30;"<="&AE2)) Create index to sort the resulting data set (and this is used to calculate the final results in columns E & F;
All formulas go until row 30, but this need to be changed of course based on the actual data sets. The idea is to add these formulas to one worksheet, and then columns E > AG can be copied to all other worksheets. There are obviously quite a few #NA values, but this is on purpose, and are not errors or mistakes. On request, I can share the actual spreadsheet, so you do not have to retype all formulas.
Some additional comments
You have to modify some formulas (the sort indices) if there are identical x-values, either within Set 1 (which I will not cover here, as it seems this would be unlikely, or be data input errors), or between Set 1 and Set 2. The dynamic sorting does not work in that case. A workaround is to create a "synthetic" sort column, e.g. with =TEXT(J2;"0000.00000000000")&L2. This formats all numbers the same way as text, and appends S1 or S2. So this should give unique sort values, which would sort the same way as the corresponding numbers.
Empty data sets or data sets with only 1 value are not treated correctly either (the intercept formulas and finding values for the "previous" data point are meaningless in these cases).

Gnuplot - plotting series based on label in third column

I have data in the format:
1 1 A
2 3 ab
1 2 A
3 3 x
4 1 x
2 3 A
and so on. The third column indicates the series. That is in the case above there are 3 distinct data series, one designated A, another designated ab and last designated x. Is there a way to plot the three data series from such data structure in gnuplot without using eg. awk? The difficulty here is that the number of categories (here denoted A, ab, x) is quite large and it is not feasible to write them out by hand.
I was thinking along the lines:
plot data u 1:2:3 w dots
but that does not work and I get warning: Skipping data file with no valid points (I tried quoted and unquoted version of the third column). A similar question has to manually define the palette which is undesirable.
With a little bit of work you can make a list of unique categories from within gnuplot without using external tools. The following code snippet first assembles a list of the entire third column of the data file, and then loops over it to generate a list of unique category names. If memory use or processing time become an issue then one could probably combine these steps and avoid forming a single string with the entire third column.
delimiter = "#" # some character that does not appear in category name
categories = ""
stats "test.dat" using (categories = categories." ".delimiter.strcol(3).delimiter) nooutput
unique_categories = ""
do for [cat in categories] {
if (strstrt (unique_categories, cat) ==0) {
unique_categories = unique_categories." ".cat
}
}
set xrange[0:5]
set yrange [0:4]
plot for [cat in unique_categories] "test.dat" using 1:(delimiter.strcol(3).delimiter eq cat ? $2 : NaN) title cat[2:strlen(cat)-1]
Take a look at the contents of the string variables categories and unique_categories to get a better idea of what this code does.

gnuplot: how to know the last column number?

I have a problem handling data using gnuplot.
My data has different column number per line.
I want to plot with X-axis of the first column and Y-axis of the last.
The last columns are always different every line.
For example, my data looks like that (my.dat)
1 2
2 1 3
3 4 4
4 5
5 2 1 3 6
plot 'my.dat' us 1:(lastcolumn) w l
Before reading in gnuplot, I can pre-process of the data.
But my gnuplot is windows version, I cannot use awk or any parsing program.
So I hope it handles only into gnuplot.
Is that possible?
Thanks
Yes, you can check that with gnuplot. The idea is as follows:
You analyze your data with stats and inside the using you check recursively with valid which column is the last valid. If an invalid column is reached you return the number of the previous column otherwise the next column is checked. The last column is then contained in the variable STATS_max
check_valid_column(c) = valid(c) ? check_valid_column(c + 1) : c - 1
stats 'my.dat' using (check_valid_column(1)) nooutput
last_column = int(STATS_max)
plot 'my.dat' using 1:last_column
Just for the records, here is an alternative suggestion. Christoph's solution is certainly more elegant and probably faster.
However, with the recursive approach you will get an error "recursion depth limit exceeded" if you have more than 250 columns (admittedly, probably very rare cases).
The solution below uses the lines as one string and counts the columns with words(). This, however, works only if you have whitespace as separator. With comma it will not work. Not sure what string length limit would be.
Code: (edit: no need to plot to a dummy table, stats can be used instead)
### find the maximum number of columns
reset session
# create some random test data
set print $Data
rows = int(rand(0)*5+5) # random 5 to 9 lines
do for [r=1:rows] {
minCols = 251 # if minCols >250, the recursive approach will fail
cols = int(rand(0)*10+minCols)
line = ''
do for [c=1:cols] { line = sprintf("%s %d",line,rand(0)*10) }
print line
}
set print
# alternative approach with word(). Works only for separator whitespace.
set datafile separator "\n"
maxCol=0
stats $Data u (cols=words(strcol(1)), cols>maxCol?maxCol=cols:0) nooutput
set datafile separator whitespace
print "words() approach: ", maxCol
# Recursive approach for comparison
print "Recursive approach: "
check_valid_column(c) = valid(c) ? check_valid_column(c + 1) : c - 1
stats $Data u (check_valid_column(1)) nooutput
last_column = int(STATS_max)
print last_column
### end of code
Result: (if number of max columns>250)
words() approach: 259
Recursive approach:
"SO41032862.gp" line 28: recursion depth limit exceeded

How to plot of difference between current record and previous record of two different columns in the same data file in gnuplot

I have 3 columns of data in a text file. First column is time. Second and Third columns are my variable parameters. My intention is to plot the difference in the current value of data from 3rd column and the previous data value from the 2nd column against 1st column current value.
That is a bit tricky since gnuplot usually has access only to the values of the current row. However, you can keep a limited number of previous values in memory with some trickery inside the using statement.
Unfortunately you don't give any example data, so here is my test data file test.dat (without times, you'll need to adapt that by yourself):
0 1 5
1 12 7
2 1 6
3 5 5
The following script plots the difference between the current value in column 3 and the previous value in column 2:
back2 = back1 = 0
shift(x) = (back2 = back1, back1 = x)
plot 'test.dat' using 0:(shift($2), $0 < 1 ? 1/0 : $3 - back2) w lp pt 7 ps 2
What happens here is basically the following:
Inside the using statement you can separate several statements like assignments with commas. Only the last expression in one column is used as actual data value.
The shift function the second column first assigns the value of the previous row to variable back2 and then the current value to variable back1. So, calling shift($2) saves the value in column 2 of the previous row to variable back2. Then you can do the actual computation as $3 - back2. The condition $0 < 1 discards the first row for which you don't have a previous value.
The result is:

specifying points in space associated with a particular value in gnuplot

I have 10x10x10 grid points. Some of these points are associated with a value 1 and the others are associated with a value -1. I want to specify(give a color to) only those points which have value 1. Can anyone please tell me how this can be achieved in Gnuplot.
Thanks in advance.
If you want to filter out completely all points with value -1, you can do it as follows:
splot 'file' using 1:2:($4 == 1 ? $3 : 1/0) with points
That assumes, that your data file has four columns with the x, y, z value in the columns 1, 2, 3 and in the fourth column the values 1 or -1.
With the using statement you can specify which columns are used for plotting: using 1:2:3 uses the first column as x, the second as y and the third column as z value.
You can also do calculations inside the using statement. In that case you must put the respective expression in braces, and refer to the values of a column with e.g. $3 or column(3): using 1:2:($3/10) would scale the values in the third column by 10 and use the result as z value.
The expression I used above, using 1:2:($4 == 1 ? $3 : 1/0) does the following: If the value in the fourth column is equal to 1, use the value in the third column, otherwise use 1/0. The 'special' value 1/0 makes gnuplot ignore a point.

Resources