Any simple way to do VLOOKUP combine "linear interpolation" in excel? - excel

I'm making an excel sheet for calculating z-score for infant weight/age (Input: "Baby Month Age", and "Baby weight"). To do that, I need get LMS parameters first for a specific month, from below table.
http://www.who.int/childgrowth/standards/tab_wfa_boys_p_0_5.txt
(For Integer Month number, this can be done by vlookup Method without issue.) For Non-Integer Month number, I need use some kind of "linear interpolation" approach to get an approximate LMS data.
The question is, both Trend method and Vlookup method are not working for me. For Trend method, it is not working as the raw data, like L parameters is not linear data, if I use Trend method, for the several top month, return data will far from existing data. As for Vlookup method, it just finds the closest month data.
I had to use multiple "Match" and "Index" Method to do the "linear interpolation" for myself. However, I wonder whether there is any existing function for that?
My current formula for L parameters is below:
=MOD([Month Age],1)*(INDEX('WHO BOY AGE WEIGHT'!A:D,MATCH([Month Age],'WHO BOY AGE WEIGHT'!A:A)+1,2)-INDEX('WHO BOY AGE WEIGHT'!A:D,MATCH([Month Age],'WHO BOY AGE WEIGHT'!A:A),2))+INDEX('WHO BOY AGE WEIGHT'!A:D,MATCH([Month Age],'WHO BOY AGE WEIGHT'!A:A),2)

If we assume that months increment always by 1 (no gap in month data), you can use something like this formula to interpolate between the two values surrounding the give non-integer value:
=(1-MOD(2.3, 1))*VLOOKUP(2.3,A:S,2)+MOD(2.3, 1)*VLOOKUP(2.3+1,A:S, 2)
Which interpolates L(2.3) from data of L(2) = .197 and L(3) = .1738, resulting in .19004.
You can replace 2.3 by any cell reference. You can also change the lookup column 2 for L into 3 for M, 4 for S etc.
To answer the question whether there is some direct "interpolate" function in Excel, not that I know about, although there is good artillery for statistical estimation.

Related

EXCEL - Dual VLOOKUP and Interpolation

I have a table on Excel with data as the following:
Meaning, I have different JPH based on the %SMALL unit and the number of active stations.
I need to create a matrix like the following (with %SMALL on horizontal and STATIONS on vertical axes):
And the formula for each cell should:
Take the input of Stations (column "B")
Check, for that specific Stations number, the amount of data on the other table (like make a filter on STATIONS for the specific number)
Perform an VLOOKUP for checking the JPH based on the %SMALL value on row 2
Interpolate for the exact JPH value, if not found on table
For now, I was able to create the last part (the VLOOKUP and the interpolation), with the following:
=IFERROR(VLOOKUP(C2;'EARLY-STATIONS'!$F:$H;3;FALSE);AVERAGE(OFFSET(INDEX('EARLY-STATIONS'!$H:$H;MATCH(C2;'EARLY-STATIONS'!$F:$F;1));0;0;2;1)))
The problem I'm facing is than with this, the calculation is not checking the number of stations, so the Iteration is not accurate.
Unfortunately I cannot use VBA macros to solve this.
Any clue?
This is an attempt because more clarity is needed in terms of all possible scenarios to consider, based on different input data and how to understand the "extrapolation" process. This approach understands as extrapolation the average of two values (lower and greater), but the idea can be customized to any other way to calculate it. Per tags listed in the question I assume there is no Excel version constraint. This is O365 solution:
=LET(sm, A2:A10, st, B2:B10, jph, C2:C10, smx, F1:J1, sty, E2:E4, NULL, "",
GETLk, LAMBDA(x,y,mode, FILTER(jph, (st=y)
* (sm = INDEX(sm, XMATCH(x, sm, mode))), NULL)),
GET, LAMBDA(x,y, LET(f, FILTER(jph, (jph=GETLk(x,y, 1))
+ (jph=GETLk(x,y, -1)), NULL), IF(#f=NULL, NULL, AVERAGE(f)))),
HREDUCE, LAMBDA(yi, DROP(REDUCE("", smx, LAMBDA(ac,x,
HSTACK(ac, GET(x, yi)))),,1)),
DROP(REDUCE("", sty, LAMBDA(ac,y, VSTACK(ac, HREDUCE(y)))),1))
The above formula spills the entire result, I don't think for this case you can use a LOOKUP-like function.
Here is the output:
The highlighted cells where the average is calculated.
Explanation
The main idea is to use DROP/REDUCE/HSTACK/VSTACK pattern to generate the grid. Check my answer to the following question: how to transform a table in Excel from vertical to horizontal but with different length on how to apply it.
We use two user LAMBDA functions to abstract some calculations:
GETLk(x,y,mode), filters jph name based on %SMALL and Stations columns values, based on input values x (x-axis value from the grid), y (y-axis value form the grid) respectively. The third input argument mode, is for doing the approximate search in XMATCH (1-next largest, -1 next smallest). In case the value exist in the input table, XMATCH returns the same value in both cases.
GET(x,y) has the logic to find the value or if the value doesn't exist to calculate the average. It uses the previous LAMBDA function GETLk. We filter for jph values that match the input values (x,y), but we use an OR condition in the FILTER (+), to select both lower or greater values. If the value exist, returns just one value otherwise two values are returned by FILTER (f). Finally if f is not empty we return the average, otherwise the value we setup as NULL.
HREDUCE: Concatenate the result by columns for a given row of the grid. Check the referred question for more information about it.

How to get min() max() sequentially from a single column?

Hi Stackoverflow Community,
This is my first post, apologies ahead if I haven't structured my question better, I'll try to improve on it later on.
I have a 2 excel columns with street address numbers and street address, none of which are unique hence using vlookup/index match is tricky.
I am trying to populate another column with the minimum and maximum value of the street address numbers using TEXTJOIN and it works, however I need the min/max for each specific street address group, there are close to 1 million lines of data.
For example, min=1, max=13, on florence st, min=3, max=53, on gibson st
Is such a formula helpful for you:
=MINIFS(B$2:B$10,A$2:A$10,"Ramsay street")
As you see, I take the minimum value of the "B" column, based on a criterion on the "A" column.
Hereby a screenshot of an example:
If you are on Excel 365 - current channel you can use this formula:
=LET(streetnamesUnique,UNIQUE(data[StreetName]),
minNumber,BYROW(streetnamesUnique,LAMBDA(s, MINIFS(data[Number],data[StreetName],s))),
maxNumber,BYROW(streetnamesUnique,LAMBDA(s, MAXIFS(data[Number],data[StreetName],s))),
HSTACK(streetnamesUnique,minNumber,maxNumber))
If on Excel 365 semi annual channel:
=LET(streetnamesUnique,UNIQUE(data[StreetName]),
minNumber,BYROW(streetnamesUnique,LAMBDA(s, MINIFS(data[Number],data[StreetName],s))),
maxNumber,BYROW(streetnamesUnique,LAMBDA(s, MAXIFS(data[Number],data[StreetName],s))),
MAKEARRAY(ROWS(streetnamesUnique),3,LAMBDA(r,c,
IF(c=1,INDEX(streetnamesUnique,r),
IF(c=2,INDEX(minNumber,r),
INDEX(maxNumber,r))))))
Both formulas first retrieve the unique streetnames - then retrieves, per each streetname min and max number.
In the end the new range is built from these values - either by HSTACK or MAKEARRAY.

How to define an array of numbers with a formula

I have a project where I need to break people into 3 buckets with task lists that rotate quarterly (Phase A = task list 1, B = task list 2, C = task list 3). The goal here is to sort people into the buckets based on a departure date, with the ideal being that they would depart when they're in the C phase. I have a formula already set up that will tell me the number of quarters between the project start date and the person's departure date, so now I'm trying to figure out how to get Excel to tell me if a person's departure date falls within their bucket's C Phase.
I have this formula in a column called DEROSQtr:=ROUNDDOWN(DAYS360("1-Oct-2020",[#DEROS],FALSE)/90,0)
Now the easy way to approach this would be to build a static array and just see if that formula results in a value in the right array, where the numbers in the array define which quarter from Oct 2020 that the bucket's C Phase is going to be in:
ArrayA = {1;4;7;10;13;16} ArrayB = {2;5;8;11;14;17} ArrayC = {0;3;6;9;12;15}
The formula that pulls this all together is then:
=IF([#EFP]="A",IF(IFNA(MATCH([#DEROSQtr],ArrayA,0),-1)<>-1,TRUE,FALSE),IF([#EFP]="B",IF(IFNA(MATCH([#DEROSQtr],ArrayB,0),-1)<>-1,TRUE,FALSE),IF([#EFP]="C",IF(IFNA(MATCH([#DEROSQtr],ArrayC,0),-1)<>-1,TRUE,FALSE),"-")))
Now while this will work for as long as I build out the static array, I'm trying to figure out how to define each of these buckets with a formula that Excel can work with, i.e. bucket A hits phase C in 3n + 1 quarters where n is the number of cycles through all 3 phases, so ArrayA = 3n+1, ArrayB = 3n+2 and ArrayC = 3n. What I'm hunting for here is the best way to define each of the arrays as a formula.
After some additional digging and looking back at how to define each array, I came across the MOD() function in Excel. I was then able to rewrite the formula that does the checking as =IF([#EFP]="A",IF(MOD([#DEROSQtr]-1,3)=0,TRUE,FALSE),IF([#EFP]="B",IF(MOD([#DEROSQtr]-2,3)=0,TRUE,FALSE),IF([#EFP]="C",IF(MOD([#DEROSQtr],3)=0,TRUE,FALSE),"-"))), replacing ArrayA(3n+1) with MOD([#DEROSQtr]-1,3), ArrayB(3n+2) with MOD([#DEROSQtr]-2,3), and ArrayC(3n) with MOD([#DEROSQtr],3).
Since I do not have the data you are calculating your quarter, its difficult to give you exact answer. However, as I understand your have a column which has the formula to calculate the quarter say "Formula_Col"
Solution will be to add a new column and flag it based on the values in "Formula_Col".
If you can give some sample data I can provide exact answer.

Excel CUBEVALUE & CUBESET count records greater than a number

I am writing a series of queries to my workbook's data model to retrieve the number of documents by Category_Name which are greater than a certain numbers of days old (e.g. >=650).
Currently this formula (entered in celll C3) returns the correct number for a single Days Old value (=3).
=CUBEVALUE("ThisWorkbookDataModel",
"[Measures].[Count of Docs]",
"[EDD_Report].[Category_Name].&["&$B2&"]",
"[EDD_Report_10-01-18].[Days Old].[34]")
How do I return the number of documents for Days Old values >=650?
The worksheet looks like:
A B C
1 Date PL Count of Docs
2 10/1/2018 ALD 3
3 ...
UPDATE: As suggested in #ama 's answer below, the expression in step B did not work.
However, I created a subset of the Days Old values using
=CUBESET("ThisWorkbookDataModel",
"{[EDD_Report_10-01-18].[Days Old].[all].[650]:[EDD_Report_10-01-18].[Days Old].[All].[3647]}")
The cell containing this cubeset is referenced as the third Member_expression of the original CUBEVALUE formula. The limitation is now that the values for the beginning and end must be members of the Days Old set.
This is limiting, in that, I was hoping for a more general test for >=650 and there is no way to guarantee that specific values of Days Old will be in the query.
First time I hear about CUBE, so you got me curious and I did some digging. Definitely not an expert, but here is what I found:
MDX language should allow you to provide value ranges in the form of {[Table].[Field].[All].[LowerBound]:[Table].[Field].[All].[UpperBound]}.
A. Get the total number of entries:
D3 =CUBEVALUE("ThisWorkbookDataModel",
"[Measures].[Count of Docs]",
"[EDD_Report].[Category_Name].&["&$B2&"]"),
"{[EDD_Report_10-01-18].[Days Old].[All]")
B. Get the number of entries less than 650:
E3 =CUBEVALUE("ThisWorkbookDataModel",
"[Measures].[Count of Docs]",
"[EDD_Report].[Category_Name].&["&$B2&"]"),
"{[EDD_Report_10-01-18].[Days Old].[All].[0]:[EDD_Report_10-01-18].[Days Old].[All].[649]}")
Note I found something about using .[All].[650].lag(1)} but I think for it to work properly your data might need to be sorted?
C. Substract
C3 =D3-E3
Alternatively, go for the quick and dirty:
=CUBEVALUE("ThisWorkbookDataModel",
"[Measures].[Count of Docs]",
"[EDD_Report].[Category_Name].&["&$B2&"]"),
"{[EDD_Report_10-01-18].[Days Old].[All].[650]:[EDD_Report_10-01-18].[Days Old].[All].[99999]}")
Hope this helps and do let me know, I am still curious!

Obtain every nth row of filtered records

I'm looking for information on how to copy nth rows of records from one excel sheet to the next, and now I am wondering if there is a way to do this for filtered data (i.e. I have 400 students enrolled at school, and I want every 15th male whose parents have not graduated from college (flags have been created for both gender and parent education, which I am using to filter on). Are there any ideas on how to do this? If not, I could just use the offset function for each combination of variables I am filtering on, but that's over 30-40 combinations if I did my math right. Thanks for any help you can provide.
There are a few standard formulas used for retrieving the first, second, third, etc set of values that match criteria. I prefer a standard formula model using the INDEX function and SMALL function. By throwing a little maths at the increment to change it from 1, 2, 3 ... to 1, 16, 31, 46, ... you should be able to achieve your offset results. In the following example image, I've used a stagger of 4 rather than 15 in order to accommodate sample data vertically while still producing more than a single result.
        
The formula in F2 is,
=IFERROR(INDEX(A$2:A$999, SMALL(INDEX(ROW($1:$998)+((C$2:C$999<>"M")+(D$2:D$999<>"N"))*1E+99, , ), 1+(ROW(1:1)-1)*4)), "")
For your purposes the 4 in 1+(ROW(1:1)-1)*4 will need to be changed to 15.
=IFERROR(INDEX(A$2:A$999, SMALL(INDEX(ROW($1:$998)+((C$2:C$999<>"M")+(D$2:D$999<>"N"))*1E+99, , ), 1+(ROW(1:1)-1)*15)), "")
Fill down as necessary.
Once you have retrieved a unique identifier, the remainder can be retrieved with a simple VLOOKUP function.

Resources