I have 2 series of data. For the sake of simplicity, let's say the data looks like this:
set 1:
1 3
2 3.5
3 4
4 4.5
5 5
6 5.5
7 6
8 6.5
9 7
10 7.5
set 2:
1.5 2
2.8 4.5
3.5 8
4.5 6
5.5 4.8
6.5 4
7.5 6.5
8.5 9
9.5 3
10.5 4
After charting these 2 sets, I want to get the line that follows the higher data, i.e. the black line in the attached picture. How do I get that? My actual data has thousands of data points, so doing this manually isn't possible.
Added later: Another thing I forgot to mention: in my actual data, one set has about 500 x,y values and the other has about 50, though the end points have the same or similar x values.
Thanks for your help.
Given your information about the chart and the tables, I would do something like this:
The new series will be based on two formulas:
In Column H, I have the formula for the max value (between your two series):
=MAX(B2,E2)
In Column G, I have the formula that picks the matching X value (from Series 1 or Series 2), depending on which series supplied the max value above:
=IF(H2=B2,A2,D2)
Then I can plot my graph:
Series 1, Column B
Series 2, Column E
Series 3, Column H.
All series use the X values of Column G.
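The two formulas can also be sketched outside Excel. This is a minimal Python sketch, assuming the two series sit side by side row by row as in the worksheet (x1, y1 in columns A:B and x2, y2 in columns D:E); the function name rowwise_max is hypothetical. Like the formulas, it only compares points that share a row, so it does not find the places where the lines cross between points.

```python
def rowwise_max(set1, set2):
    """For each pair of rows, keep the point with the larger y-value.

    Mirrors =MAX(B2,E2) for the y-value and =IF(H2=B2,A2,D2) for the x-value.
    """
    result = []
    for (x1, y1), (x2, y2) in zip(set1, set2):
        y = max(y1, y2)
        x = x1 if y == y1 else x2  # on a tie, Series 1 wins, as in the IF formula
        result.append((x, y))
    return result


# First three rows of the data from the question.
set1 = [(1, 3), (2, 3.5), (3, 4)]
set2 = [(1.5, 2), (2.8, 4.5), (3.5, 8)]
top = rowwise_max(set1, set2)
print(top)
```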
Introduction
A few assumptions/comments/pitfalls/constraints regarding my solution:
Set 1 and Set 2 are in columns A till D.
The combined data set will combine the x-values of both Sets, and will have additional data points where the lines cross.
It involves several helper columns, in particular to allow you to copy/paste this across multiple worksheets with data.
I did not try to condense too much, to improve readability, and probably some helper columns could be combined.
It was tested with the data set from the question, but difficult to guarantee all "boundary" conditions, e.g. identical data points between Set 1 and Set 2, zero overlap between the two data sets, empty data sets, etc. (I did test some of these, see my comments at the end).
Set 1 and Set 2 must be sorted (on x-values). If this is not the case, a few additional helper columns are needed to sort the data dynamically.
To better understand the solution described below, see herewith the resulting graph, based on the data set in the question (although I added one data point [2.5;3.75] to avoid having the data points of Set 1 and Set 2 perfectly alternating):
General solution outline / methodology
Combine both datasets in a single (sorted) column;
For all x-values, determine highest y-value, between the y-value in the Set, and the calculated y-value on the line segment from the neighboring values in the other Set (looks simple, in particular with the given example data set, but this is quite tricky to do when data sets have no alternating x-values);
Find the points (x & y values) where the lines of the graph are crossing (intersecting), let's call this Set 3
Combine and sort (on x-values) the three data sets into two columns (for x & y values).
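The outline above can be sketched in a few lines of Python. This is only an illustration of the methodology, not a translation of the worksheet formulas: the function names interp and upper_envelope are hypothetical, both sets are assumed to be sorted on x with no duplicate x-values, and where only one set covers an x-value that set's y-value is used as is.

```python
from bisect import bisect_right


def interp(points, x):
    """Linearly interpolate y at x on sorted (x, y) points; None outside their range."""
    xs = [p[0] for p in points]
    if not points or x < xs[0] or x > xs[-1]:
        return None
    i = bisect_right(xs, x) - 1
    if i == len(points) - 1:
        return points[-1][1]
    (x0, y0), (x1, y1) = points[i], points[i + 1]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


def upper_envelope(set1, set2):
    """Return the sorted points of the higher line, including crossing points."""
    # Step 1: combine the x-values of both sets into one sorted list.
    xs = sorted({x for x, _ in set1} | {x for x, _ in set2})
    # Step 2: evaluate both lines at every combined x-value.
    rows = [(x, interp(set1, x), interp(set2, x)) for x in xs]
    result = []
    for k, (x, y1, y2) in enumerate(rows):
        result.append((x, max(y for y in (y1, y2) if y is not None)))
        if k + 1 == len(rows):
            break
        xn, y1n, y2n = rows[k + 1]
        if None in (y1, y2, y1n, y2n):
            continue  # only one line exists on this interval, so no crossing
        # Step 3: a strict sign change of (line 1 - line 2) means the lines cross.
        da, db = y1 - y2, y1n - y2n
        if da * db < 0:
            t = da / (da - db)
            result.append((x + t * (xn - x), y1 + t * (y1n - y1)))
    return result


# Data from the question (Set 1 follows y = 2.5 + 0.5 * x for x = 1..10).
set1 = [(x, 2.5 + 0.5 * x) for x in range(1, 11)]
set2 = [(1.5, 2), (2.8, 4.5), (3.5, 8), (4.5, 6), (5.5, 4.8),
        (6.5, 4), (7.5, 6.5), (8.5, 9), (9.5, 3), (10.5, 4)]
envelope = upper_envelope(set1, set2)
for point in envelope:
    print(point)
```

Because both lines are straight between two consecutive combined x-values, checking for a sign change on each such interval is enough to find every crossing.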
The details and formulas
For the formulas, I assume row 1 contains headings, and the data start on row 2. All formulas should be entered in row 2, except for a few, where I mention to put them in row 3 (because they need data from the preceding row). The result is in columns E (x-values) and F (y-values), and G till AG are helper columns.
Column E : =INDEX(AE$2:AE$30;MATCH(ROWS(AE$2:AE2);$AG$2:$AG$30;0)) This is the actual result. Gets all x-values in AE and sorts these based on the index column AG; this should actually be the last column in the logical flow, but for presentation purposes it is cleaner to have this next to the input data sets;
F : =INDEX(AF$2:AF$30;MATCH(ROWS(AF$2:AF2);$AG$2:$AG$30;0)) Same for y-values;
G : =IF(ISNA(H2);NA();COUNTIF($H$2:$H$30;"<="&H2)) Creates an index to sort the combined x-values of both data sets. You can also sort dynamically without such a helper column, but then you need a VLOOKUP or INDEX/MATCH on the values themselves, and I have had bad experiences with those on long decimal numbers;
H : =IF(ROW()-1<=COUNT($A$2:$A$30);A2;IF((ROW()-1)<=(COUNT($A$2:$A$30)+COUNT($C$2:$C$30));INDEX($C$2:$C$30;ROW()-COUNT($A$2:$A$30)-1;1);NA())) Combines x-values of both data sets, i.e. in columns A & C;
I : =IF(ROW()-1<=COUNT($B$2:$B$30);B2;IF((ROW()-1)<=(COUNT($B$2:$B$30)+COUNT($D$2:$D$30));INDEX($D$2:$D$30;ROW()-COUNT($B$2:$B$30)-1;1);NA())) Same for the y-values;
J : =IF(ROW()-1<=COUNT($A$2:$A$30);"S1";IF((ROW()-1)<=(COUNT($A$2:$A$30)+COUNT($C$2:$C$30));"S2";NA())) Assign "S1", or "S2" to each data point, as indication from which data set they come;
K : =IF(J2=J3;INTERCEPT(I2:I3;H2:H3);NA()) Determines the intercept of the line segment starting at that data point;
L : =IF(J2=J3;SLOPE(I2:I3;H2:H3);NA()) Same for slope;
M : =INDEX(H$2:H$30;MATCH(ROWS(H$2:H2);$G$2:$G$30;0)) Sorts all x-values;
N : =INDEX(I$2:I$30;MATCH(ROWS(I$2:I2);$G$2:$G$30;0)) Same for y-values
O : =INDEX(J$2:J$30;MATCH(ROWS(J$2:J2);$G$2:$G$30;0)) Same for corresponding "S1/S2" value to indicate from which data set they come;
P : =INDEX(K$2:K$30;MATCH(ROWS(K$2:K2);$G$2:$G$30;0)) Same for intercept;
Q : =INDEX(L$2:L$30;MATCH(ROWS(L$2:L2);$G$2:$G$30;0)) Same for slope;
R : =IF(O2="S1";"S2";"S1") Inversion between S1 & S2.
S : {=IFERROR(INDEX($O$2:$Q2;MAX(IF($O$2:$O2=$R3;ROW($O$2:$O2)-ROW(INDEX($O$2:$O2;1;1))+1));2);NA())} Array formula to be put in cell S3 (hence ctrl+shift+enter) that will search for the intercept of the preceding data point of the other data set.
T : {=IFERROR(INDEX($O$2:$Q2;MAX(IF($O$2:$O2=$R3;ROW($O$2:$O2)-ROW(INDEX($O$2:$O2;1;1))+1));3);NA())} Same for slope;
U : =IF(OR(ISNA(N2);NOT(ISNUMBER(S2)));NA();M2*T2+S2) Calculates the y-value on the line segment of the other data set;
V : =MAX(IFNA(U2;N2);N2) Maximum value between the original y-value and the calculated y-value on the corresponding line segment of the other data set;
W : =(V2=N2) Checks whether the y-value comes from the original data set or not;
X : =IF(O2="S1";IF(W2;"S1";"S2");IF(W2;"S2";"S1")) Determines on which data set (line) the y-value sits (S1 or S2);
Y : =IFERROR(AND((X2<>X3);COUNTIF(X3:$X$30;X2)>0);FALSE) Determines when the data sets cross (i.e. the lines on the graph intersect);
Z : =IF(Y2;(S2-P2)/(Q2-T2);NA()) Calculates x-value of intersection;
AA : =IF(Y2;Z2*Q2+P2;NA()) Calculates y-value of intersection;
AB : =COUNTIF($Z$2:$Z$30;"<="&Z2) Index to sort the newly calculated intersection points (I sort them because combining them with the other data sets is then straightforward, re-using the formula of column H);
AC : =INDEX(Z$2:Z$30;MATCH(ROWS(Z$2:Z2);$AB$2:$AB$30;0)) Sorted x-values of intersection points;
AD : =INDEX(AA$2:AA$30;MATCH(ROWS(AA$2:AA2);$AB$2:$AB$30;0)) Same for y-values;
AE : =IF(ROW()-1<=COUNT(M$2:M$30);M2;IF((ROW()-1)<=(COUNT(M$2:M$30)+COUNT(AC$2:AC$30));INDEX(AC$2:AC$30;ROW()-COUNT(M$2:M$30)-1;1);NA())) Combine x-values of Set 1, Set 2, and the intersection points;
AF : =IF(ROW()-1<=COUNT(V$2:V$30);V2;IF((ROW()-1)<=(COUNT(V$2:V$30)+COUNT(AD$2:AD$30));INDEX(AD$2:AD$30;ROW()-COUNT(V$2:V$30)-1;1);NA())) Same for y-values;
AG : =IF(ISNA(AE2);NA();COUNTIF($AE$2:$AE$30;"<="&AE2)) Creates the index to sort the resulting data set (this is used to calculate the final results in columns E & F);
All formulas go until row 30, but this of course needs to be changed based on the actual data sets. The idea is to add these formulas to one worksheet, and then columns E > AG can be copied to all other worksheets. There are obviously quite a few #N/A values, but these are on purpose and are not errors or mistakes. On request, I can share the actual spreadsheet, so you do not have to retype all formulas.
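The formulas in columns Z and AA can be read as the intersection of two straight lines. As a short derivation, with symbols chosen here only for illustration: a_1 and b_1 are the intercept and slope of the data point's own segment (columns P and Q), and a_2 and b_2 are those of the other data set (columns S and T).

```latex
\begin{align*}
a_1 + b_1 x &= a_2 + b_2 x \\
x &= \frac{a_2 - a_1}{b_1 - b_2} \\
y &= b_1 x + a_1
\end{align*}
```

The division is only defined when b_1 differs from b_2, that is, when the two segments are not parallel.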
Some additional comments
You have to modify some formulas (the sort indices) if there are identical x-values, either within Set 1 (which I will not cover here, as this seems unlikely or would point to data input errors), or between Set 1 and Set 2. The dynamic sorting does not work in that case. A workaround is to create a "synthetic" sort column, e.g. with =TEXT(H2;"0000.00000000000")&J2. This formats all numbers the same way as text, and appends S1 or S2. So this should give unique sort values, which would sort the same way as the corresponding numbers.
Empty data sets or data sets with only 1 value are not treated correctly either (the intercept formulas and finding values for the "previous" data point are meaningless in these cases).
Related
In my Excel file, I have data split up over different tables for different values of parameter X.
I have tables for parameter X for values 0.1, 0.5, 1, 5 and 10. Each table has a parameter Y at the far left that I want to be able to search for, with a few data cells to the right of it. Like so:
X = 0.1
Y    Data_0      Data_1      Data_2
1    0.071251    0.681281    0.238509
2    0.283393    0.509497    0.397196
3    0.678296    0.789879    0.439004
4    0.788525    0.363215    0.248953
etc.
Now I want to find Data_0, Data_1 and Data_2 for a given X and Y value (in two separate cells).
My thought was naming the tables X0.1 X0.5 etc. and when defining the matrix for the lookup function use some syntax that would change the table it searches in. With three of these functions in adjacent cells, I would obtain the three values desired.
Is that possible, or is there some other method that would give me the result I want?
Thanks in advance
On the question what would be my desired result from this data:
I would like A1 to give the value for the X I'm searching for (so 0.1 in this case)
A2 would be the value of Y (let's pick 3)
then I want C1:E1 to give the values 0.678... 0.789... 0.439...
Now from usmanhaq, I think it should be something like:
=vlookup(A2,concatenate("X",A1),2)
=vlookup(A2,concatenate("X",A1),3)
=vlookup(A2,concatenate("X",A1),4)
for the three cells.
This exact formulation doesn't work and I can't find the formulation that does work.
This question relates to the Schematiq add-in for Microsoft Excel.
I am using =tbl.Lookup(table, columnsToSearch, valuesToFind, resultColumn, [defaultValue]). The values in the valuesToFind column always start with the same 3 characters, followed by varying characters (e.g. 908-123456 or 908-321654, i.e. 908 is always consistent).
How can I tell the function to lookup the value based on the first 3 characters only? The expected answer should be the sum of the results of the above, i.e. 500 + 300 = 800
tbl.Lookup() works by looking for an exact match - this helps ensure it's fast but in this case it means you need an extra step to calculate a column of lookup values, something like this:
A2: =tbl.CalculateColumn(A1, "code", "x => LEFT(x, 3)", "startOfCode")
This will give you a new column that you can use for the columnsToSearch argument, however tbl.Lookup() also looks for just one match - it doesn't know how to combine values together if there is more than one matching row in the table, so I think you also need one more step to group your table by the first 3 chars of the code, like this:
A3: =tbl.Group(A2, "startOfCode", "amount")
Because tbl.Group() adds values together by default, this will give you a table with a row for each distinct value of startOfCode and the subtotal of amount for each of those values. Finally, you can do the lookup exactly as you requested, which for your input table will return 800:
A4: =tbl.Lookup(A3, "startOfCode", "908", "amount")
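The three steps (derive a prefix column, group and subtotal, then look up) can be illustrated outside Schematiq. This is a minimal Python sketch, not Schematiq code: the function name subtotal_by_prefix is hypothetical, and while the two 908 codes and the amounts 500 and 300 come from the question, which amount belongs to which code and the extra third row are assumptions.

```python
from collections import defaultdict

# Hypothetical input table (see the assumptions stated above).
rows = [
    {"code": "908-123456", "amount": 500},
    {"code": "908-321654", "amount": 300},
    {"code": "117-000001", "amount": 42},
]


def subtotal_by_prefix(rows, length=3):
    """Group rows by the first `length` characters of code and sum the amounts."""
    totals = defaultdict(int)
    for row in rows:
        start_of_code = row["code"][:length]   # like LEFT(x, 3)
        totals[start_of_code] += row["amount"]  # grouping adds values together
    return dict(totals)


totals = subtotal_by_prefix(rows)
print(totals.get("908"))  # exact-match lookup on the grouped table
```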
I want to create an automated scatter plot. This is the first example table: based on the step size, I end up measuring A, B, C and D for a specific frequency. In this scatter plot, which I created manually, you can see that I want to plot C vs. A for a particular frequency.
But I need to do this automatically, because the number of rows changes with the step size. Here, since the step size decreased, the number of samples increased, and the scatter plot now needs to update the number of A and C values it plots.
Is there a formula I can use without using any macros?
The relation between the step size and frequency is: number of samples of a single frequency = 360 / step size. So for a step size of 60 you will in reality have six entries of frequency 100 and six of frequency 200.
You can use formulas to define chart ranges if you hide the formulas in named ranges. Combine that with the fact that #N/A values are not plotted and you can get this to work without VBA.
For your example graph you could define two named ranges as follows:
Name: A_100
Refers To: =IF(Sheet1!$E$3:$E$100=100,OFFSET(Sheet1!$A$3,0,0,360/Sheet1!$B$1,1),NA())
and
Name: C_100
Refers To: =IF(Sheet1!$E$3:$E$100=100,OFFSET(Sheet1!$C$3,0,0,360/Sheet1!$B$1,1),NA())
Then set the X and Y axis of the chart to SheetName!A_100 and SheetName!C_100
The IF statement filters out all the points that are not at frequency 100. If you have a formula for selecting the frequency, replace "Sheet1!$E$3:$E$100=100" with that.
The offset function takes the first cell in the column and expands the number of rows according to your 360/step size formula.
Now that I have solved my previous problem with collecting data from the database, I need to put that data on a chart. I am working with report-generating software called ReportWorx.
Problem is, data comes in series and looks like this:
ID DATE SAMPLE
1 XX-XX-XX VALUE
1 XX-XX-XX VALUE
1 XX-XX-XX VALUE
2 XX-XX-XX VALUE
2 XX-XX-XX VALUE
3 XX-XX-XX VALUE
3 XX-XX-XX VALUE
I cannot change how it looks because it is generated automatically. What I want is a line chart in which 1, 2, 3 are the series names and, for each series, DATE and VALUE are plotted (a bar graph would also do), with Date on the X axis and Value on the Y axis.
I can't specify how many records (rows) there will be, but I found a few solutions for creating dynamically growing charts, so that will probably not be a problem. I just do not know how to separate those ID series from each other.
EDIT:
I have found a solution in VBA according to the first answer. Here you have VBA code below:
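Sub Rewrite()
    ' Copy each (date, sample) row from Sheet1 to Sheet2, placing the sample
    ' in the column that belongs to its ID (ID n goes to column n + 1).
    Dim row As Long, id As Long
    For row = 38 To 1000
        For id = 1 To 37
            If Sheet1.Cells(row, 1).Value = id Then
                Sheet2.Cells(row, 1).Value = Sheet1.Cells(row, 2).Value      ' date
                Sheet2.Cells(row, id + 1).Value = Sheet1.Cells(row, 3).Value ' sample
            End If
        Next id
    Next row
End Sub
Thank you, @sancho.s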
I will post a solution that I use a lot for cases like yours.
With reference to the figure (where I used sample numbers), you set up 3 new columns (D:F here), the headers of which contain the corresponding labels. Then you use a formula for "splitting" the list of X data (column B here) associated with each label, and assigning a "NULL" value to data that does not correspond (#N/A here, but you can choose whatever you want):
=IF($A3=D$2,$B3,$B$1)
You enter this in D3. The absolute/relative indexing used allows for copy-and-paste throughout D3:F9.
Cell B1 here contains the "NULL" value.
Then you plot 3 series: column C against columns D, E, F.
PS: I guess you could split the Y data column instead, with similar results. For some reason that I do not recall, I decided a long time ago that this was the best option, at least in my case then. You may want to try out the other option.
PS2: This also works for data that is not sorted by label.
PS3: Using NA() as the "NULL" value avoids cell values being taken as zero and then showing up in the chart, as is the case with other errors (e.g., try using =1/0 in B1). It is the best option I have found so far. Alternatively (just in case you find it useful), you can use an explicit value which is outside the actual X data range, but then you would have to manually set the X axis range. All this is for a Scatter plot; just check what works for your case.
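The splitting idea can be sketched in Python as well. This is a minimal illustration, not the worksheet itself: the function name split_by_label and the sample numbers are hypothetical, and NaN plays the role of the #N/A "NULL" value that the chart skips.

```python
import math


def split_by_label(labels, values, wanted):
    """Build one column per label; rows belonging to other labels become NaN.

    Mirrors =IF($A3=D$2,$B3,$B$1) copied across one column per label.
    """
    return {
        w: [v if lab == w else math.nan for lab, v in zip(labels, values)]
        for w in wanted
    }


# Hypothetical data in the layout of the question: an ID per row plus a value.
ids = [1, 1, 1, 2, 2, 3, 3]
samples = [10.0, 11.0, 12.0, 20.0, 21.0, 30.0, 31.0]
columns = split_by_label(ids, samples, [1, 2, 3])
for label, column in columns.items():
    print(label, column)
```

Every output column keeps the full length of the input, so each one can be plotted against the same shared axis column, and the rows do not need to be sorted by label.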
Say, my data file has two columns and five rows as follows,
1 3
2 5
3 3
4 4
5 2
Now I would like to plot them but with a little math operation on second column. For example,
plot 'test.dat' u 1:($2*)
What I mean by the asterisk is that I would like to apply sqrt(row2^2+row1^2), which is sqrt(5^2+3^2) for the first two rows, to the second-column values. How can I do that? Many thanks!
Usually, one can access only the values of all columns of the current row. Accessing the values of a previous row is possible, but tricky. Basically, you must save the values in temporary variables.
That works in the following way:
In the first row, save the values of both columns and do not plot them (use NaN as value).
In each following row, save the current x-value, but use the x-value of the previous row for plotting. Then save the current y-value, and compute your value based on the previous row (prevY) and the current row (currY).
This does not plot the last row, but that row has no next row anyway. If you also want to plot the last row, e.g. with 0 as the additional value, you must add a final row containing 0 0.
In the script I use set macros for better readability of the code:
set macros
prevX = currX = prevY = currY = 0
UsePreviousXvalue = '(($0 == 0) ? (prevX = NaN, currX = $1) : (prevX = currX, currX = $1)), prevX'
AssignYvalue = '(prevY = currY, currY = $2)'
plot 'test.dat' using (@UsePreviousXvalue):(@AssignYvalue, sqrt(prevY**2 + currY**2))
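To check what the script should produce, the same pairing of each row with its predecessor can be computed directly. This is a small Python sketch for verification only, using the five rows from the question; the function name shifted_norm is hypothetical.

```python
import math


def shifted_norm(rows):
    """Combine each row's y with the next row's y; keep the earlier row's x."""
    points = []
    for (prev_x, prev_y), (_, curr_y) in zip(rows, rows[1:]):
        points.append((prev_x, math.sqrt(prev_y**2 + curr_y**2)))
    return points


rows = [(1, 3), (2, 5), (3, 3), (4, 4), (5, 2)]
points = shifted_norm(rows)
for x, y in points:
    print(x, y)
```

As in the gnuplot version, the last row yields no point of its own because it has no next row.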