Adding annotated values in xmgrace using batch mode - string

I need some help adding a series of annotated string values to my xy plot in Grace-5.1.22 when using a batch script to plot the graph.
My data has three columns: x values, y values, and a third (z) column containing a number for each data point. If I want to show every number in the z column as an annotated value, I can import the data into Grace using the commands:
BLOCK XYZ "1:2:3"
S0 AVALUE ON
S0 AVALUE TYPE "Z"
However, where data points are bunched together the annotations overlap and are not readable, so I don't want to display an annotation on EVERY point. I have therefore replaced some values in the z column with "" to leave the corresponding annotation blank. This means that the third column now contains strings, and the commands above no longer work.
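Deciding which annotations to blank can itself be scripted instead of edited by hand. The sketch below is a minimal Python illustration, assuming the points are sorted by x and that a simple x-distance rule (the hypothetical `min_dx` threshold) is a good enough test for "bunched" points; `thin_labels` and `format_rows` are illustrative names, not part of Grace. It writes lines in the `x y "string"` form mentioned in the first answer below.

```python
def thin_labels(points, min_dx):
    """Blank the label of any point closer than min_dx (in x) to the last labelled point.

    points: list of (x, y, label) tuples, assumed sorted by x.
    min_dx: hypothetical spacing threshold, in data units.
    """
    result = []
    last_labelled_x = None
    for x, y, label in points:
        if last_labelled_x is None or x - last_labelled_x >= min_dx:
            result.append((x, y, label))
            last_labelled_x = x
        else:
            result.append((x, y, ""))  # empty string -> blank annotation
    return result


def format_rows(points):
    """Format (x, y, label) tuples as 'x y "label"' lines for a Grace data file."""
    return "".join(f'{x} {y} "{label}"\n' for x, y, label in points)


pts = [(0.0, 1.0, "12"), (0.1, 2.0, "15"), (1.0, 3.0, "20")]
print(format_rows(thin_labels(pts, min_dx=0.5)), end="")
```

Only the spacing rule would need changing if "bunched" should also take y into account.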
This website http://plasma-gate.weizmann.ac.il/Xmgr/doc/commands.html suggests that I should be able to import the data as:
BLOCK XYSTRING "1:2:3"
However, this set type was removed as of version 5.0.3 (see http://plasma-gate.weizmann.ac.il/Grace/doc/CHANGES.html).
Instead, the documentation now says that "any set type now may have an additional column of text strings associated". However, the command:
BLOCK XY "1:2:3"
does not load any data and gives the error: "Too many columns scanned in column string".
I am able to import string type annotated values into xy data using the GUI by:
Data>>Import>>ASCII>>"filename.dat">>"load as BLOCK DATA">>"strings from column 3"
However, since I would like to plot the same graph many times for different data sets, I would like to automate this process instead of using the GUI.
Sorry for the long question, but I would be very grateful if anyone could help me.

I know this is old, but I had the same question and, after 3 hours of experimenting, I found a solution. I am posting it here for future reference (since no answer currently exists elsewhere online).
If your "filename.dat" has the content format of (x y "string") or (x y z) then you can add to your batch file the following to read it and add the annotations:
READ XY "filename.dat"
s0 line type 1
s0 line linewidth 1
s0 line color 1
# Enable annotations
s0 avalue on
# Select type: 0=None,4=String,5=Z
s0 avalue type 5
s0 avalue color 1
s0 avalue char size 1.0000
s0 avalue offset 0.000000 , 0.000000
If you want to see more options for batch files, you can make a plot interactively in the GUI and save it as an .agr file. If you open that file in a text editor, you can see the commands that produce the plot.
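Since the goal is to draw the same graph for many data sets, the batch file can be generated once per data file. Below is a minimal Python sketch that reuses only the batch commands from this answer; the function names `grace_batch` and `write_batches` and the `.batch` file naming are hypothetical. Each generated file could then be passed to Grace's `-batch` command-line option (check the gracebat documentation for the options needed to produce a hardcopy).

```python
from pathlib import Path

# Commands copied from the answer above; "{datafile}" is filled in per data set.
BATCH_TEMPLATE = """READ XY "{datafile}"
s0 line type 1
s0 line linewidth 1
s0 line color 1
s0 avalue on
s0 avalue type 5
s0 avalue color 1
s0 avalue char size 1.0000
s0 avalue offset 0.000000 , 0.000000
"""


def grace_batch(datafile: str) -> str:
    """Return the batch-file text for one data file."""
    return BATCH_TEMPLATE.format(datafile=datafile)


def write_batches(datafiles, out_dir="."):
    """Write one hypothetical '<name>.batch' file per data file and return the paths."""
    paths = []
    for name in datafiles:
        path = Path(out_dir) / (Path(name).stem + ".batch")
        path.write_text(grace_batch(name))
        paths.append(path)
    return paths
```

Keeping the commands in one template means a change to the plot style only has to be made in one place.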

This is not a direct answer to your question but it may give you a hint.
I ran across the same error message with the following syntax:
gracebat -graph 0 -settype bardy -block mydatafile -bxy 0:5:6:2
(the last column was intended to be the optional string column)
This one turned out to be the right one:
gracebat -graph 0 -settype bardy -block mydatafile -bxy 0:5:6:{2}
So I guess you need to put curly braces around the string column.

Related

How to calculate with the Poisson-Distribution in Matlab?

I’ve used Excel in the past, but the calculations involving the Poisson distribution took a while, so I switched to SQL. I soon realized that SQL might not be a suitable tool for statistical problems. Finally I decided to switch to MATLAB, but I’m not used to it at all. My problem is the following:
I’ve imported a .csv-table and have two columns with values, let’s say A and B (110 x 1 double)
These values are the inputs for my Poisson calculations. Since I want to calculate at least the first 20 events, I’ve created a variable z=1:20.
When I then calculated, let’s say,
New = poisspdf(z,A)
it said something like "Non-scalar arguments must match in size."
z only has 20 elements, but A and B both have 110. So I expanded it to Z = 1:110 and transposed it:
Znew = Z';
When I now try to execute the actual calculation:
Results = poisspdf(Znew,A).*poisspdf(Znew,B)
I always get only a 110x1 vector, but what I want is a 20x20 matrix for each record of A and B (based on my actual choice of z=1:20; I only changed to z=1:110 because MATLAB told me the arguments need to match in size).
So each cell of this 20x20 matrix should hold the result of a slightly different calculation (poisspdf(Znew,A).*poisspdf(Znew,B)).
For example, in the first cell (1,1) I want the result of
poisspdf(0,value of A).*poisspdf(0,value of B),
in cell(1,2): poisspdf(0,value of A).*poisspdf(1,value of B),
in cell(2,1): poisspdf(1,value of A).*poisspdf(0,value of B),
and so on (assuming the format cell(row, column)).
Finally I want to sum up certain parts of each 20x20 matrix and show the result of the summed up parts in new columns.
Is there anybody able to help? Many thanks!
EDIT:
(Screenshot: Poisson matrix in Excel)
In Excel there is a POISSON function: POISSON(x, μ, FALSE) returns the probability mass function value f(x) at x for the Poisson distribution with mean μ.
In e.g. cell AD313 in the table above there is the following calculation:
=POISSON(0;first value of A;FALSE)*POISSON(0;first value of B;FALSE)
, in cell AD314
=POISSON(1;first value of A;FALSE)*POISSON(0;first value of B;FALSE)
, in cell AE313
=POISSON(0;first value of A;FALSE)*POISSON(1;first value of B;FALSE)
, and so on.
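The matrix described above is an outer product of two probability vectors: cell (i, j) is P(A = i)·P(B = j), with one matrix per record. Below is a minimal Python sketch of that computation (standard library only, so it does not depend on MATLAB). The names `joint_matrix` and `region_sums`, the sample values of A and B, and the choice of which parts to sum are all hypothetical, since the question does not say which parts are summed. In MATLAB the same outer product would likely be `poisspdf((0:19)', a) * poisspdf(0:19, b)` for one record.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) for X ~ Poisson(lam); the same value as Excel's POISSON(k; lam; FALSE)."""
    if lam == 0:
        return 1.0 if k == 0 else 0.0
    # Work in log space so large lam does not overflow lam**k or k!.
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))


def joint_matrix(lam_a: float, lam_b: float, n: int = 20):
    """n x n matrix with cell [i][j] = P(A = i) * P(B = j), for i, j = 0..n-1.

    Row index = count for A, column index = count for B, matching the Excel
    layout above (AD314 uses A:1, B:0; AE313 uses A:0, B:1).
    """
    pa = [poisson_pmf(i, lam_a) for i in range(n)]
    pb = [poisson_pmf(j, lam_b) for j in range(n)]
    return [[pa[i] * pb[j] for j in range(n)] for i in range(n)]


def region_sums(m):
    """Hypothetical 'certain parts': sums below, on, and above the diagonal."""
    n = len(m)
    below = sum(m[i][j] for i in range(n) for j in range(n) if i > j)
    diagonal = sum(m[i][i] for i in range(n))
    above = sum(m[i][j] for i in range(n) for j in range(n) if i < j)
    return below, diagonal, above


# One matrix per record; A and B stand in for the two 110 x 1 columns.
A = [1.2, 0.8]
B = [1.0, 1.5]
for a, b in zip(A, B):
    print(region_sums(joint_matrix(a, b)))
```

Each returned tuple could become one row of the new summary columns.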
I am not sure if I completely understand your question. I wrote this code that might help you:
clear; clc
% These are the lambda parameters for the Poisson distribution
lambdaA = 100;
lambdaB = 200;
% Generate Poisson-distributed sample data
A = poissrnd(lambdaA,110,1);
B = poissrnd(lambdaB,110,1);
% Take the first 20 samples
zA = A(1:20);
zB = B(1:20);
% 20x20 matrix: cell (i,j) = poisspdf(zA(i),lambdaA) * poisspdf(zB(j),lambdaB)
results = repmat(poisspdf(zA,lambdaA),1,20) .* repmat(poisspdf(zB,lambdaB)',20,1);
% Sum each row of the matrix
sumFinal = sum(results,2);
Let me know if this is what you were trying to do.

Excel - max between 2 series

I have 2 series of data. For the sake of simplicity, let's say the data looks like this:
set 1:
1 3
2 3.5
3 4
4 4.5
5 5
6 5.5
7 6
8 6.5
9 7
10 7.5
set 2:
1.5 2
2.8 4.5
3.5 8
4.5 6
5.5 4.8
6.5 4
7.5 6.5
8.5 9
9.5 3
10.5 4
After charting these 2 sets, I want the line that follows whichever series is higher at each x (the black line in the attached picture). How do I get that? My actual data has thousands of data points, so doing this manually isn't possible.
Added later: something I forgot to mention: in my actual data, one set has about 500 x,y values and the other has about 50, though the end points have the same or similar x values.
Thanks for your help.
Given your information about the chart and the tables, I would do something like this:
The new series will be based on two formulas:
In Column H, I have the formula for the max value (between your two series):
=MAX(B2,E2)
In Column G, I have a formula that, based on the max value (formula above), picks which X value to use (the X value from Series 1 or 2):
=IF(H2=B2,A2,D2)
Then I can plot my graph:
Series 1, Column B
Series 2, Column E
Series 3, Column H.
All series use the X values of Column G.
Introduction
A few assumptions/comments/pitfalls/constraints regarding my solution:
Set 1 and Set 2 are in columns A to D (Set 1 x/y in A/B, Set 2 x/y in C/D).
The combined data set combines the x-values of both sets and adds extra data points where the lines cross.
It uses several helper columns, in particular so that you can copy/paste it across multiple worksheets with data.
I did not try to condense the formulas too much, to keep them readable; some helper columns could probably be combined.
It was tested with the data set from the question, but it is difficult to guarantee all "boundary" conditions, e.g. identical data points between Set 1 and Set 2, zero overlap between the two data sets, empty data sets, etc. (I did test some of these; see my comments at the end.)
Set 1 and Set 2 must be sorted (on x-values). If this is not the case, a few additional helper columns are needed to sort the data dynamically.
To make the solution below easier to follow, here is the resulting graph based on the data set in the question (I added one data point [2.5;3.75] so that the data points of Set 1 and Set 2 do not alternate perfectly):
General solution outline / methodology
Combine both datasets in a single (sorted) column;
For all x-values, determine highest y-value, between the y-value in the Set, and the calculated y-value on the line segment from the neighboring values in the other Set (looks simple, in particular with the given example data set, but this is quite tricky to do when data sets have no alternating x-values);
Find the points (x & y values) where the lines of the graph cross (intersect); let's call this Set 3;
Combine and sort (on x-values) the three data sets into two columns (for x & y values).
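The four steps of this outline can also be sketched outside Excel, which may make the logic easier to check. This is a minimal Python version, not a column-by-column translation of the spreadsheet; it assumes both sets are sorted by x with no duplicate x-values (the same constraint as above), and `interp` and `upper_envelope` are illustrative names.

```python
from bisect import bisect_left


def interp(xs, ys, x):
    """Linear interpolation of the polyline (xs, ys) at x; None outside its x-range."""
    if x < xs[0] or x > xs[-1]:
        return None
    i = bisect_left(xs, x)
    if xs[i] == x:
        return ys[i]
    x0, x1, y0, y1 = xs[i - 1], xs[i], ys[i - 1], ys[i]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


def upper_envelope(set1, set2):
    """Upper envelope of two x-sorted polylines given as lists of (x, y) pairs."""
    xs1, ys1 = zip(*set1)
    xs2, ys2 = zip(*set2)
    grid = sorted(set(xs1) | set(xs2))  # step 1: combined, sorted x-values
    result = []
    prev = None  # (x, y1 - y2) at the previous x where both lines exist
    for x in grid:
        y1, y2 = interp(xs1, ys1, x), interp(xs2, ys2, x)
        if y1 is not None and y2 is not None:
            d = y1 - y2
            # step 3: between grid points both lines are straight, so a sign
            # change of y1 - y2 means exactly one crossing in between.
            if prev is not None and prev[1] * d < 0:
                px, pd = prev
                xc = px + (x - px) * pd / (pd - d)
                result.append((xc, interp(xs1, ys1, xc)))
            prev = (x, d)
        # step 2: highest y-value at this x (only one line may exist here)
        result.append((x, max(y for y in (y1, y2) if y is not None)))
    return result  # step 4: already sorted, because crossings are added in order


set1 = [(1, 3), (2, 3.5), (3, 4), (4, 4.5), (5, 5), (6, 5.5), (7, 6), (8, 6.5), (9, 7), (10, 7.5)]
set2 = [(1.5, 2), (2.8, 4.5), (3.5, 8), (4.5, 6), (5.5, 4.8), (6.5, 4), (7.5, 6.5), (8.5, 9), (9.5, 3), (10.5, 4)]
for x, y in upper_envelope(set1, set2):
    print(f"{x:.3f}\t{y:.3f}")
```

Outside the range where both sets exist, the sketch simply follows whichever set is present, which matches the requirement that the end points have similar x values.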
The details and formulas
For the formulas, I assume row 1 contains headings and the data start on row 2. All formulas should be entered in row 2, except for a few that I mention should go in row 3 (because they need data from the preceding row). The result is in columns E (x-values) and F (y-values); G to AG are helper columns.
Column E : =INDEX(AE$2:AE$30;MATCH(ROWS(AE$2:AE2);$AG$2:$AG$30;0)) This is the actual result. It takes all x-values in AE and sorts them based on the index column AG; logically this should be the last column, but for presentation purposes it is cleaner to have it next to the input data sets;
F : =INDEX(AF$2:AF$30;MATCH(ROWS(AF$2:AF2);$AG$2:$AG$30;0)) Same for y-values;
G : =IF(ISNA(H2);NA();COUNTIF($H$2:$H$30;"<="&H2)) Creates an index to sort the combined x-values of both data sets. You can also sort dynamically without such a helper column, but then you need a VLOOKUP or INDEX/MATCH, and I have had bad experiences with these when using long decimal numbers;
H : =IF(ROW()-1<=COUNT($A$2:$A$30);A2;IF((ROW()-1)<=(COUNT($A$2:$A$30)+COUNT($C$2:$C$30));INDEX($C$2:$C$30;ROW()-COUNT($A$2:$A$30)-1;1);NA())) Combines x-values of both data sets, i.e. in columns A & C;
I : =IF(ROW()-1<=COUNT($B$2:$B$30);B2;IF((ROW()-1)<=(COUNT($B$2:$B$30)+COUNT($D$2:$D$30));INDEX($D$2:$D$30;ROW()-COUNT($B$2:$B$30)-1;1);NA())) Same for the y-values;
J : =IF(ROW()-1<=COUNT($A$2:$A$30);"S1";IF((ROW()-1)<=(COUNT($A$2:$A$30)+COUNT($C$2:$C$30));"S2";NA())) Assigns "S1" or "S2" to each data point, to indicate which data set it comes from;
K : =IF(J2=J3;INTERCEPT(I2:I3;H2:H3);NA()) Determines the intercept of the line segment starting at that data point;
L : =IF(J2=J3;SLOPE(I2:I3;H2:H3);NA()) Same for slope;
M : =INDEX(H$2:H$30;MATCH(ROWS(H$2:H2);$G$2:$G$30;0)) Sorts all x-values;
N : =INDEX(I$2:I$30;MATCH(ROWS(I$2:I2);$G$2:$G$30;0)) Same for y-values;
O : =INDEX(J$2:J$30;MATCH(ROWS(J$2:J2);$G$2:$G$30;0)) Same for corresponding "S1/S2" value to indicate from which data set they come;
P : =INDEX(K$2:K$30;MATCH(ROWS(K$2:K2);$G$2:$G$30;0)) Same for intercept;
Q : =INDEX(L$2:L$30;MATCH(ROWS(L$2:L2);$G$2:$G$30;0)) Same for slope;
R : =IF(O2="S1";"S2";"S1") Gives the other data set (swaps S1 and S2);
S : {=IFERROR(INDEX($O$2:$Q2;MAX(IF($O$2:$O2=$R3;ROW($O$2:$O2)-ROW(INDEX($O$2:$O2;1;1))+1));2);NA())} Array formula, entered in cell S3 with Ctrl+Shift+Enter, that looks up the intercept of the preceding data point of the other data set;
T : {=IFERROR(INDEX($O$2:$Q2;MAX(IF($O$2:$O2=$R3;ROW($O$2:$O2)-ROW(INDEX($O$2:$O2;1;1))+1));3);NA())} Same for slope;
U : =IF(OR(ISNA(N2);NOT(ISNUMBER(S2)));NA();M2*T2+S2) Calculates the y-value on the line segment of the other data set;
V : =MAX(IFNA(U2;N2);N2) Maximum value between the original y-value and the calculated y-value on the corresponding line segment of the other data set;
W : =(V2=N2) Checks whether the y-value comes from the original data set or not;
X : =IF(O2="S1";IF(W2;"S1";"S2");IF(W2;"S2";"S1")) Determines on which data set (line) the y-value sits (S1 or S2);
Y : =IFERROR(AND((X2<>X3);COUNTIF(X3:$X$30;X2)>0);FALSE) Determines when the data sets cross (i.e. the lines on the graph intersect);
Z : =IF(Y2;(S2-P2)/(Q2-T2);NA()) Calculates x-value of intersection;
AA : =IF(Y2;Z2*Q2+P2;NA()) Calculates y-value of intersection;
AB : =COUNTIF($Z$2:$Z$30;"<="&Z2) Index to sort the newly calculated intersection points (I sort them because then combining them with the other data sets is straightforward, re-using the formula of column H);
AC : =INDEX(Z$2:Z$30;MATCH(ROWS(Z$2:Z2);$AB$2:$AB$30;0)) Sorted x-values of intersection points;
AD : =INDEX(AA$2:AA$30;MATCH(ROWS(AA$2:AA2);$AB$2:$AB$30;0)) Same for y-values;
AE : =IF(ROW()-1<=COUNT(M$2:M$30);M2;IF((ROW()-1)<=(COUNT(M$2:M$30)+COUNT(AC$2:AC$30));INDEX(AC$2:AC$30;ROW()-COUNT(M$2:M$30)-1;1);NA())) Combine x-values of Set 1, Set 2, and the intersection points;
AF : =IF(ROW()-1<=COUNT(V$2:V$30);V2;IF((ROW()-1)<=(COUNT(V$2:V$30)+COUNT(AD$2:AD$30));INDEX(AD$2:AD$30;ROW()-COUNT(V$2:V$30)-1;1);NA())) Same for y-values;
AG : =IF(ISNA(AE2);NA();COUNTIF($AE$2:$AE$30;"<="&AE2)) Creates the index to sort the resulting data set (this is used to calculate the final results in columns E & F).
All formulas go down to row 30, but this of course needs to be adapted to the actual data sets. The idea is to add these formulas to one worksheet; columns E to AG can then be copied to all other worksheets. There are quite a few #N/A values, but these are intentional, not errors or mistakes. On request, I can share the actual spreadsheet, so you do not have to retype all the formulas.
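The intersection formulas in columns Z and AA follow from equating the two line segments. With P and Q the intercept and slope of the segment of the current point's own data set, and S and T those of the other data set (the columns with the same letters), the derivation is:

```latex
\begin{aligned}
Q\,x + P &= T\,x + S \\
x^{*} &= \frac{S - P}{Q - T}, \qquad Q \neq T \\
y^{*} &= Q\,x^{*} + P
\end{aligned}
```

The condition Q ≠ T simply excludes parallel segments, which never intersect.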
Some additional comments
You have to modify some formulas (the sort indices) if there are identical x-values, either within Set 1 (which I will not cover here, as this seems unlikely or would be a data input error) or between Set 1 and Set 2, because the dynamic sorting does not work in that case. A workaround is to create a "synthetic" sort column, e.g. with =TEXT(H2;"0000.00000000000")&J2. This formats all x-values the same way as text and appends S1 or S2, which gives unique sort values that sort the same way as the corresponding numbers.
Empty data sets or data sets with only 1 value are not treated correctly either (the intercept formulas and finding values for the "previous" data point are meaningless in these cases).

Gnuplot - plotting series based on label in third column

I have data in the format:
1 1 A
2 3 ab
1 2 A
3 3 x
4 1 x
2 3 A
and so on. The third column indicates the series, so in the case above there are 3 distinct data series: one designated A, another designated ab, and the last designated x. Is there a way to plot the three data series from such a data structure in gnuplot without using e.g. awk? The difficulty is that the number of categories (here A, ab, x) is quite large, so it is not feasible to write them out by hand.
I was thinking along the lines:
plot data u 1:2:3 w dots
but that does not work and I get the warning: Skipping data file with no valid points (I tried both the quoted and unquoted version of the third column). The answer to a similar question requires defining the palette manually, which is undesirable.
With a little bit of work you can make a list of unique categories from within gnuplot without using external tools. The following code snippet first assembles a list of the entire third column of the data file, and then loops over it to generate a list of unique category names. Each name is wrapped in a delimiter character so that strstrt cannot match one category name inside another (e.g. a inside ab). If memory use or processing time becomes an issue, one could probably combine these steps and avoid forming a single string with the entire third column.
delimiter = "#" # some character that does not appear in category name
categories = ""
stats "test.dat" using (categories = categories." ".delimiter.strcol(3).delimiter) nooutput
unique_categories = ""
do for [cat in categories] {
if (strstrt (unique_categories, cat) ==0) {
unique_categories = unique_categories." ".cat
}
}
set xrange[0:5]
set yrange [0:4]
plot for [cat in unique_categories] "test.dat" using 1:(delimiter.strcol(3).delimiter eq cat ? $2 : NaN) title cat[2:strlen(cat)-1]
Take a look at the contents of the string variables categories and unique_categories to get a better idea of what this code does.
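To see why the delimiters matter, here is a minimal Python sketch that mirrors the gnuplot logic. It is an illustration only: the function names are hypothetical, and Python's `in` operator stands in for gnuplot's `strstrt`. Without the delimiters, a category `a` would be found inside ` ab` and silently dropped.

```python
DELIM = "#"  # assumed not to occur in any category name, as in the gnuplot code


def unique_categories(names):
    """Keep the first occurrence of each delimiter-wrapped name, in input order."""
    unique = ""
    for name in names:
        token = DELIM + name + DELIM
        if token not in unique:  # plays the role of strstrt(...) == 0
            unique += " " + token
    return unique.split()


def series_for(rows, token):
    """Points of one category: like 'using 1:(... eq cat ? $2 : NaN)' without the NaNs."""
    return [(x, y) for x, y, name in rows if DELIM + name + DELIM == token]


rows = [(1, 1, "A"), (2, 3, "ab"), (1, 2, "A"), (3, 3, "x"), (4, 1, "x"), (2, 3, "A")]
for token in unique_categories(name for _, _, name in rows):
    # token[1:-1] strips the delimiters, like cat[2:strlen(cat)-1] in gnuplot
    print(token[1:-1], series_for(rows, token))
```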

charting trendlines together

I need to display this kind of data in one chart:
Which means:
Function A: Y= X + 10 ;
Function B: Y= X * 2 ;
but what I got from Excel is:
Actually what I want is (sorry for the drawing):
My real data is like:
If I format it like this:
I can still not get the correct charts.
Updated to account for change of question
Your data is not structured the way Excel needs it for this; see for example:
In the left column put your x-axis values, and don't put a header in this column, otherwise it will be plotted on your graph as a series (an extra line).
You can fill gaps in your data with #N/A; Excel will ignore these points and join the line to the next available point.
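This layout (x-values in the first column under an empty header cell, one column per series, #N/A in the gaps) can be generated rather than typed. Below is a minimal Python sketch that writes it as CSV text; the function name and the sample x-values are hypothetical, and the two series use the question's functions (Y = X + 10 and Y = X * 2). When Excel opens such a file, the #N/A text should be read as the #N/A error value, but that is worth checking in your Excel version and locale.

```python
import csv
import io


def excel_layout(series):
    """Build CSV text: x column with an empty header, one column per series.

    series: dict mapping series name -> dict mapping x -> y.
    Any x that a series lacks is written as #N/A so Excel skips that point.
    """
    xs = sorted(set().union(*series.values()))  # all x-values of all series
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow([""] + list(series))  # empty header cell over the x column
    for x in xs:
        writer.writerow([x] + [series[name].get(x, "#N/A") for name in series])
    return buf.getvalue()


# Hypothetical x-values: function A sampled at even x, function B at odd x.
data = {
    "Function A": {x: x + 10 for x in range(0, 7, 2)},
    "Function B": {x: x * 2 for x in range(1, 7, 2)},
}
print(excel_layout(data), end="")
```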

R: Reference data frame to return column names as a variable, not string (for subset)

Is there a way to reference a data frame's column names as a variable, not a string (in R)? Say I want to get the first column name of data frame df. colnames returns a string:
> colnames(df)[[1]]
[1] "colname1"
The reason I ask is I'm having a hard time making the function subset generalized to any data frame. Say I wish to do a conditional subset on a data frame with a known conditional, but I don't know the column name at runtime (just the column number). Example --
> df<-data.frame( x=c(1:3), y=c(4:6))
> df.sub <- subset(df, df$y >5 )
But let's say I don't know the column name of df at runtime, only that it is column number 2. The function call
> df.sub <- subset(df, colnames(df)[[2]] >5 )
doesn't work because colnames returns a string, and subset is 'smart' and looks inside df for the object name. Is there a good way around this? I could use [ instead, but I feel the problem would be the same.
You should be able to use double square brackets successfully for either name or index number:
> subset(df, df[["y"]] > 5)
x y
3 3 6
> subset(df, df[[2]] > 5)
x y
3 3 6
However, note the following from the help page to subset:
Warning
This is a convenience function intended for use interactively. For
programming it is better to use the standard subsetting functions like
[, and in particular the non-standard evaluation of argument subset
can have unanticipated consequences.
And, to give some bad advice, you could also use get:
> subset(df, get(colnames(df)[2]) > 5)
x y
3 3 6
As @Roland notes in the comments, most R users would actually use something along the lines of:
> df[df[[2]] > 5, ]
x y
3 3 6
