Explain LOOKUP formula - excel

I'm trying to understand some legacy Excel file (it works, but I would really like to understand how/why it's working).
There is a sheet for data input (the input sheet) and some code that is called to process the data in it. I found out that the number of rows in the input sheet is determined using a LOOKUP formula like this:
=LOOKUP(2;1/('Input sheet'!E1:E52863<>"");ROW(A:A))
"E" column contains names for import items and column is NOT sorted
"A" column does not contain anything special - I can replace it with B, C or whatever column and it does not affect the formula's outcome
According to what I have found about LOOKUP behaviour: if the LOOKUP function cannot find an exact match, it chooses the largest value in the lookup_range that is less than or equal to the value.
What does this ^-1 operation do to the specified range? If E(x) is not empty -> it should turn into 1, but if it is empty - then it would be 1/0 -> that should produce a #DIV/0! error...
1/('Input sheet'!E1:E52863<>"")
The outcome is the same if I replace 2 with any positive number (ok, I tried only some, but it looks like this is the case). If I change the lookup value to 0, then I get an #N/A error -> if the value is smaller than all of the values in the lookup_range, the LOOKUP function returns #N/A.
I am stuck... can anyone shed some light?

LOOKUP has the rare ability to ignore errors. Conducting the 1/n operation will produce an error every time n is zero. False is the same as zero. So, for your formula, every empty cell produces an error in this calculation. All of those results are put in a vector array in the 2nd argument.
Searching for any positive value (the 1st argument) larger than 1 will result in LOOKUP finding the last non-error value in the above vector.
It also has the nice optional 3rd argument where you can specify the vector of results from which to return the lookup value. This is similar to the INDEX component of the INDEX/MATCH combo.
In the case of your formula, the 3rd argument is an array that looks like this: {1;2;3;4;5;6;7;8;9;...n} where n is the last row number of the worksheet, which in modern versions of Excel is 1048576.
So LOOKUP returns the value from the vector in the 3rd argument that corresponds to the last non-error (non-blank cell) in the 2nd argument.
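The behaviour described above can be sketched in Python. This is only a rough model that holds for a lookup vector made of 1s and errors, as in this formula; it is not how Excel implements LOOKUP in general. The cell contents are made-up sample data, None stands in for #DIV/0!, and the name lookup_last is hypothetical.

```python
# Rough model of =LOOKUP(2; 1/(E1:E6<>""); ROW(A:A)) on made-up data.
cells = ["apple", "", "pear", "fig", "", ""]  # E1..E6, "" means empty

# 1/(E1:E6<>""): 1 where the cell is populated, None stands in for #DIV/0!
lookup_vector = [1 if c != "" else None for c in cells]

# ROW(A:A), cut to the same length: 1, 2, 3, ...
result_vector = list(range(1, len(cells) + 1))


def lookup_last(lookup_value, lookup_vector, result_vector):
    """Return the result paired with the last non-error entry <= lookup_value."""
    found = None  # None plays the role of #N/A
    for value, result in zip(lookup_vector, result_vector):
        if value is not None and value <= lookup_value:
            found = result
    return found


last_row = lookup_last(2, lookup_vector, result_vector)
print(last_row)  # 4: the last populated cell is E4
```

Looking up 0 instead of 2 returns None in this model, which mirrors the #N/A result mentioned in the question.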
Note that this method of determining the last row will ignore cells that have formulas that result in a zero-length string. Such cells look blank but of course they are not. Depending on the situation, this may be precisely what you want. If, on the other hand you want to find the last row in column E that has a formula in it even if it results in a zero-length string, then this will do that:
=MATCH("";'Input sheet'!E:E;)

You might get some idea what the formula is doing (or any other formula) if you apply Evaluate Formula. Though since the principle is the same whether 3 rows or 52863 I'd suggest limiting the range, to speed things up if choosing Evaluate Formula. As usual with trying to explain formulae, it is best to start from the inside and work outwards. This:
'Input Sheet'!E1:E52863<>""
returns an array with a result for every entry in ColumnE from Row1 to Row52863. Since it is a comparison (<> does not equal) the result is Boolean - ie TRUE (not empty) or FALSE (is empty). So if only the first half of E1 to E52863 is populated, the result is {TRUE;TRUE;TRUE; ... and a LOT more TRUE; ... and FALSE ... and a LOT more ;FALSE and finally }.
Working outwards, the next step is to divide this array into 1. In arithmetic operations Boolean TRUE is treated as 1 and FALSE as 0, so the resultant array is {1;1;1; ... and a LOT more 1; ... and #DIV/0!... and a LOT more ;#DIV/0! and finally }.
This then becomes the lookup_vector within which LOOKUP seeks the lookup_value. The lookup_value you show is 2. But the array comprises either 1 or #DIV/0! - so 2 will never be found in it. As you have noticed, that 2 could just as well be 3, or 45 or 123 - anything as long as not a value present in the array.
That (not present) is necessary because LOOKUP stops searching when it finds a match. The fact that there is no match forces it to the end of the (valid) possibilities - ie the last 1. At this point, in my opinion, it would be logical to return "not found" but - I suspect merely a quirk, though very convenient - it returns that 1 - by its index number in the list, ie 52863 if all cells in E1:E52863 are populated.
Although the result_vector (ROW(A:A)) is optional for LOOKUP, it is required in this usage, in effect to fix the start point for the index (effectively Row1, since it is an entire column). You might change that to, say, A3:A.. and the result would be the highest populated row number in ColumnE plus 2 (3 - 1).
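To illustrate the point about the result_vector fixing the start of the index, here is a small Python sketch. It assumes (as described above) that the result is taken from the same position as the last populated cell; the data and the name last_populated are made up for illustration.

```python
def last_populated(cells, first_result_row):
    """Position of the last non-empty cell, counted from first_result_row."""
    result = None
    for offset, cell in enumerate(cells):
        if cell != "":
            result = first_result_row + offset
    return result


cells = ["a", "b", "", "d", ""]  # made-up contents of E1..E5

from_row_1 = last_populated(cells, 1)  # result_vector starts at row 1
from_row_3 = last_populated(cells, 3)  # result_vector starts at row 3
print(from_row_1, from_row_3)  # 4 6
```

The second result is the first plus 2, which is the "(3 - 1)" offset mentioned above.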

Related

In Excel, How do I return a list of values that match a description?

My question is that I want to return a list of values in column B in sheet 2 (or in this case NBA Players) that contain the value "PG" in cell A3 in sheet 1, from column A in sheet 2. Not only do I want it to match "PG" but I also want the value to have a salary (Column C) that is between $7100 (Cell B2 in Sheet 1) and $8000 (Cell C2) in Sheet 1). Any help would be appreciated.
You are either going to need to use an array formula or a function that performs array-like calculations. I will suggest using the AGGREGATE function. Avoid using full column/row references within an array formula or a function performing array-like calculations, or you may wind up bogging down your system with excessive calculations.
The AGGREGATE function is made up of several individual functions. Depending on which one you choose, it will perform array operations. I am going to suggest function 14. What the following example will do is generate a list of results sorted from largest to smallest (function 14 is LARGE; 15, SMALL, sorts from smallest to largest) that ignores error values, then return the first value from the list. The thing we will list is the row number for a row that matches ALL your criteria. So the basics of AGGREGATE look like this:
AGGREGATE(Formula #, Error/hidden handling #, Formula, parameter)
The hardest part of this is coming up with the right formula. In the numerator you put the thing you are looking for. In the denominator you place your TRUE/FALSE condition checks. Separate each condition check with *, which will act as an AND function. The thing that makes this work is that TRUE/FALSE convert to 1/0 when they are sent through a math operation. So anything you do not want is FALSE, and anything divided by FALSE becomes a divide by 0, which in turn generates an error. Since AGGREGATE is set to ignore errors, only things that meet your conditions will exist in the list, and since they are being divided by TRUE, which is 1, your thing remains unchanged. So the AGGREGATE function is going to start to look like:
AGGREGATE(14,6,ROW(some range)/((Condition 1)*(Condition 2)*...*(Condition N)),1)
So, as alluded to before, 14 makes AGGREGATE behave like LARGE, ranking the list from largest to smallest (use 15, SMALL, to rank from smallest to largest). 6 tells AGGREGATE to ignore errors, and the 1 tells AGGREGATE to return the first item in its sorted list. If it was 2 instead of 1 it would return the 2nd position. If you ask for a position that is greater than the number of items in the list, there will be an error produced by AGGREGATE which does not get ignored.
So now that there is some understanding of what AGGREGATE does, let's see how we can apply this to your data. For starters let's assume your data is in rows 2:100 and row 1 is a header row. You will have to adjust the references to suit your data.
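As a rough illustration of the mechanics, the following Python sketch models AGGREGATE with option 6 for functions 14 (LARGE) and 15 (SMALL) only. None stands in for an Excel error, the row numbers and conditions are made up, and the helper names are hypothetical.

```python
def divide(rows, conditions):
    """ROW(range)/(conditions): None stands in for #DIV/0!."""
    return [row / cond if cond else None for row, cond in zip(rows, conditions)]


def aggregate_ignore_errors(function_num, values, k):
    """Model AGGREGATE(function_num, 6, values, k) for 14 (LARGE) and 15 (SMALL)."""
    numbers = [v for v in values if v is not None]  # option 6: drop errors
    if function_num == 14:
        ordered = sorted(numbers, reverse=True)  # largest first
    elif function_num == 15:
        ordered = sorted(numbers)  # smallest first
    else:
        raise ValueError("only 14 and 15 are modelled here")
    if k < 1 or k > len(ordered):
        raise LookupError("#NUM!")  # asking past the end of the list is an error
    return ordered[k - 1]


rows = [2, 3, 4, 5, 6]        # made-up row numbers
conditions = [1, 0, 1, 1, 0]  # product of the TRUE/FALSE checks per row
values = divide(rows, conditions)

largest = aggregate_ignore_errors(14, values, 1)   # row 5
smallest = aggregate_ignore_errors(15, values, 1)  # row 2
```

With three matching rows, asking for k = 4 raises the error in this model, matching the remark that an out-of-range position is not ignored.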
CONDITION 1
LEFT($A$2:$A$100,2)="PG"
Checks to see if the first two characters are PG. Based on the data in your screenshot, PG was either to the left of the / or was the only entry. There was also an observation that there was only one / in the cells of column A. If you also need to check whether it is after the /, and with the assumption that it can only be on one side and not both at the same time, you could use this alternative for your condition check:
(LEFT($A$2:$A$100,2)="PG")+(RIGHT($A$2:$A$100,2)="PG")
In this case the + is performing the task of an OR function. The caveat mentioned earlier is important because if both sides are TRUE then you wind up with TRUE+TRUE which becomes 1+1 which is 2 and we only want to divide by 1 or 0. Though to counter that you could go with:
(((LEFT($A$2:$A$100,2)="PG")+(RIGHT($A$2:$A$100,2)="PG"))>0)
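A small Python sketch of the + as OR idea, using made-up position codes. It also shows a case worth checking in your data: a cell containing only "PG" satisfies both the LEFT and the RIGHT test, so the plain sum is 2 there. Testing whether the sum is greater than zero is one way (an assumption on my part, not the only option) to keep the divisor at 0 or 1 for every row.

```python
codes = ["PG", "SG/PG", "PG/SG", "C", "PG/PG"]  # made-up column A entries

left = [code[:2] == "PG" for code in codes]    # LEFT(A, 2) = "PG"
right = [code[-2:] == "PG" for code in codes]  # RIGHT(A, 2) = "PG"

# + acts as OR, but TRUE + TRUE is 2, not 1
summed = [l + r for l, r in zip(left, right)]

# comparing each sum with 0 brings every entry back to 0 or 1
clamped = [int(s > 0) for s in summed]

print(summed)   # [2, 1, 1, 0, 2]
print(clamped)  # [1, 1, 1, 0, 1]
```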
CONDITION 2
Check that the salary in C is less than or equal to 80000.
($C$2:$C$100<=80000)
CONDITION 3
Check that the salary in C is greater than or equal to 71000.
($C$2:$C$100>=71000)
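Now let's put this all together to get a list of row numbers that meet your conditions. Note that the whole denominator is wrapped in one pair of brackets so that every condition divides the row number:
AGGREGATE(14,6,ROW($A$2:$A$100)/((((LEFT($A$2:$A$100,2)="PG")+(RIGHT($A$2:$A$100,2)="PG"))>0)*($C$2:$C$100<=80000)*($C$2:$C$100>=71000)),ROW(A1))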
Now provided I did not screw up the bracketing in that formula, you can place that formula in a cell and copy it down until it produces errors. As you copy it down, the only thing that will change is the A1 in ROW(A1). It acts like a counter. 1,2,3 etc. so you will get a list of row numbers that meet your criteria. Now we need to convert those row numbers to names.
To find the names, the INDEX function is your friend here. Because it is not part of an array formula or inside a function performing array like calculations, full column reference can be used. So we take our formula that is generating row numbers and place it inside the INDEX function to give:
INDEX(B:B,Row Number)
INDEX(B:B,AGGREGATE(14,6,ROW($A$2:$A$100)/((((LEFT($A$2:$A$100,2)="PG")+(RIGHT($A$2:$A$100,2)="PG"))>0)*($C$2:$C$100<=80000)*($C$2:$C$100>=71000)),ROW(A1)))
Now if you hate seeing error codes when you have copied down further than there are results, you can place the whole thing inside an IFERROR function to give:
IFERROR(formula,What to display in case of an error)
So for blank entries:
IFERROR(INDEX(B:B,AGGREGATE(14,6,ROW($A$2:$A$100)/((((LEFT($A$2:$A$100,2)="PG")+(RIGHT($A$2:$A$100,2)="PG"))>0)*($C$2:$C$100<=80000)*($C$2:$C$100>=71000)),ROW(A1))),"")
and custom message:
IFERROR(INDEX(B:B,AGGREGATE(14,6,ROW($A$2:$A$100)/((((LEFT($A$2:$A$100,2)="PG")+(RIGHT($A$2:$A$100,2)="PG"))>0)*($C$2:$C$100<=80000)*($C$2:$C$100>=71000)),ROW(A1))),"NOT FOUND")
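For anyone who wants to check the logic outside Excel, here is a Python sketch of the whole chain: the conditions, the k-th row number (function 14, so largest row first), the INDEX step, and the IFERROR fallback. The player rows are invented, the data is assumed to start in sheet row 2, and the function names are hypothetical.

```python
players = [  # (column A position, column B name, column C salary), sheet rows 2..5
    ("PG", "Player A", 75000),
    ("SG", "Player B", 78000),
    ("PG/SG", "Player C", 90000),
    ("SF/PG", "Player D", 71000),
]


def matching_rows(players, low, high, first_row=2):
    """Row numbers that pass every condition, largest first (function 14)."""
    rows = []
    for offset, (position, _name, salary) in enumerate(players):
        is_pg = position[:2] == "PG" or position[-2:] == "PG"
        if is_pg and low <= salary <= high:
            rows.append(first_row + offset)
    return sorted(rows, reverse=True)


def kth_name(players, low, high, k, default="", first_row=2):
    """INDEX(B:B, k-th matching row), with an IFERROR-style default."""
    rows = matching_rows(players, low, high, first_row)
    if k < 1 or k > len(rows):
        return default
    return players[rows[k - 1] - first_row][1]


# k plays the role of ROW(A1), ROW(A2), ... as the formula is copied down
names = [kth_name(players, 71000, 80000, k, default="NOT FOUND") for k in (1, 2, 3)]
print(names)  # ['Player D', 'Player A', 'NOT FOUND']
```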
So now you just need to adjust the references to suit your data. If your data is located on another sheet remember to include the sheet name. A reference to B3:C4 would become:
Sheet1!B3:C4
and if the sheet name has a space in it:
'Space Name'!B3:C4

Unpredictable output from Excel Index-Match

I'm trying to do what should be a simple enough task: find the index of the last cell in an array that matches a certain value.
I am using a MATCH-INDEX function combination that is giving me incorrect, inconsistent results. I can't figure out the problem.
In this example I'm using an array with values of 1 or -1 and trying to find the index of the last 1 in the array.
My understanding is (and using the Evaluate Formula tool confirms), INDEX(A1:F1=1,0) should return an array of
{FALSE,TRUE,FALSE,TRUE,FALSE,FALSE}
Why does the MATCH(TRUE,...) not give a result of 4 when it is fed this index array? MATCH works backwards in the array and should return the index of the last match, which is the TRUE value in the 4th position.
But here this code is giving a result of 6.
To make matters worse, if I use the same code but change around the array fed into INDEX, the results are inconsistent. When I'm changing array values to 1 or -1, sometimes the formula result changes and sometimes it doesn't, and I can't figure out why.
Below, changing only the 3rd array value changes the result of the formula from 4 to 6. WHAT IS GOING ON?!
It seems that match only finds first, but not last.
But I have a possible solution, I would do it the following way:
Assume: your values are between H4 and H27, and the value you are looking for is 39
1: find the rownum for all matching cells
I4 --> I27: IF(H4 = 39; ROW(H4); -1)
2: find the max rownum in I4 --> I27
MAX(....)
3: subtract the rownum of the first cell (zero-based index)
MAX(....) - ROW(H4)
4: to get it in only one cell put into a Matrix function (CONTROL SHIFT ENTER makes CURLY BRACKETS)
{=MAX(IF($H$4:$H$27=39;ROW(H4:H27);-1))-ROW(H4)}
sounds strange, but works :)
MATCH doesn't return an array but a number: the index (position) of the element that satisfies your lookup. With the last parameter you can decide whether it has to be an exact match or could be the next smaller or next bigger value (the last two options only work with sorted input). If the last parameter is omitted, MATCH defaults to 1 (next smaller, sorted input assumed), which is a likely cause of inconsistent results on unsorted data.
You can solve your question either in more than one step, as written in my last answer, or with matrix functions. In your special case (now with columns instead of rows):
{=MAX(IF(A1:F1=1;COLUMN(A1:F1);-1))}
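The same MAX(IF(...)) idea can be written as a short Python sketch. The values are chosen to reproduce the TRUE/FALSE pattern from the question (1 in the 2nd and 4th positions); the variable names are only for illustration.

```python
values = [-1, 1, -1, 1, -1, -1]  # A1:F1, matching the pattern in the question
columns = range(1, len(values) + 1)  # COLUMN(A1:F1) -> 1..6

# IF(A1:F1=1; COLUMN(A1:F1); -1), then MAX over the resulting array
candidates = [col if value == 1 else -1 for col, value in zip(columns, values)]
last_match = max(candidates)

print(candidates)  # [-1, 2, -1, 4, -1, -1]
print(last_match)  # 4
```

If nothing matches, every candidate is -1, so the result is -1 rather than an error.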

What is this programmer doing with his Lookup function?

I found on a forum a formula to find the last populated cell in a column:
LOOKUP(2,1/(G:G<>""),ROW(G:G))
But what's going on with this bit?
1/(G:G<>"")
One divided by ??? (something that's not equal to ""?) I don't understand the logic, here.
If you want to observe the calculation step by step by evaluating the formula with Formula Auditing I strongly recommend limiting the range. Say apply:
=LOOKUP(2,1/(G1:G10<>""),ROW(G1:G10))
And, for illustration purposes, populate no lower down the sheet than say G6 (but a void between, in the range G1:G5, may in that case help to understand what is happening).
For this answer I am only going to consider five cells: G1, G3 and G4 populated, G2 and G5 (onwards) not.
1/(G1:G5<>"")
Is indeed at the heart of this formula. G1:G5<>"" does, as you have recognised, test whether not equal to "". "" is the convention for 'empty' for an Excel cell. If populated (ie "not empty") this returns TRUE and FALSE otherwise. Hence for the five cells as chosen for this example an array is returned, regarding G1:G5 in order, of:
TRUE;FALSE;TRUE;TRUE;FALSE.
In arithmetic calculations Excel treats TRUE as 1 and FALSE as 0. Hence using the above truth table as the denominator and 1 as the numerator gives an array (again in order) of:
1/1;1/0;1/1;1/1;1/0
which resolves to:
1;#DIV/0!;1;1;#DIV/0!.
In the LOOKUP function above 2 was chosen as the lookup_value. (Any other number greater than 1 would serve equally well.) So we are looking for 2 in an array that is composed exclusively of either 1s or errors. Therefore there is no chance of finding an exact match, so the default kicks in, which is the last value (in order, not counting errors). The last 1 in the array is the fourth element, and the fourth element in ROW(G1:G5) is …4.
G4 is the last populated cell in ColumnG (in my example).

4 variables index function, with great than and less than for 2 variables

I am trying to use index match functions to determine the appropriate rate for the below table.
So for example a consumer loan that is for a person that owns property, the car is 2 years or less in age and the total loan to value ratio is less than 140% should return a value of 5.15%
I believe this is what you wanted...
I would use a series of nested if functions to evaluate which column of LTV I would want the value to come from.
"That is what is done in the AND( ) part. If the value is greater than the 110% and smaller than 140% let's do the Index Match on the 110% Column, Otherwise do it on the 140% Column."
You could extend this for more columns with more IFs in the false condition.
Then it is a simple INDEX match with concatenation. It searches for the three parameters all concatenated in a single range of concatenations.
Hope it helped.
Proof of Concept
In order to achieve the above I had to make a minor edit to your header to be able to distinguish between the two 140% columns.
The functions used in this answer are:
AGGREGATE function
MATCH function
INDEX function
ROW function
IFERROR function
I placed the main part of the formula inside the IFERROR function as a way of dealing with things that may be out of range or when not all the input have been provided. I then assumed that what you were basing your search on would be provided in a series of cells. In my example I assumed the questions would be asked in the range H3 to K3 and I place the results in L3.
The main concept is centered around the INDEX function. I specified the index range as being the height of your table and the width of the percentage rates. Or for this example D2:F9.
=IFERROR(INDEX($D$2:$F$9,row number, column number),"Not Found")
That is the easy part. The more challenging part is determining the row and column number to look in. Let's start with the column number as it is the slightly easier of the two. I assumed the ratio to look for, or rather the header of the column to look in, would be supplied. I basically used this equation to determine the column number:
=MATCH(K3,$D$1:$F$1,0)
which in layman's terms is which column between D and F, counting column D as 1, has the value equal to the contents of K3. So now that there is a formula to determine the column, we can drop that into our original formula and wind up with:
=IFERROR(INDEX($D$2:$F$9,row number,MATCH(K3,$D$1:$F$1,0)),"Not Found")
Now we just need to determine the row number. This is the most complex operation. We are basically going to make a bunch of logical checks and take the first row that matches all of them. The premise here is that a logical check is either TRUE or FALSE. In Excel 0 is FALSE and every other number is TRUE; going the other way, TRUE counts as 1 and FALSE as 0 in arithmetic. So if we multiply a series of logical checks together, only a row where every check is TRUE will be equal to 1. The first logical check is the loan type. It will be followed by the living status and then the vehicle age.
=(H3=$A$2:$A$9)*(I3=$B$2:$B$9)*(J3=C2:C9)
Now if you put that into an array formula you will get a series of TRUE/FALSE or 1/0 values. We are going to use it inside an AGGREGATE function with a special feature: AGGREGATE performs array-like calculations for some of its functions. We are going to use function 15 (SMALL), which is one of them. We are also going to tell AGGREGATE to ignore all errors, which is what the 6 does. So in the end we divide each row number by the logical check. If the logical check is FALSE, or 0, it generates a #DIV/0! error, which AGGREGATE ignores. We wind up with a list of the rows that match our logical check. We then tell AGGREGATE that we want the first (smallest) result with the ,1. So we wind up with a formula that looks like:
=AGGREGATE(15,6,ROW($A$2:$A$9)/((H3=$A$2:$A$9)*(I3=$B$2:$B$9)*(J3=C2:C9)),1)
While this does provide us with the row number we want, we need to adjust it to make it an index number. In order to do this you need to subtract the number of header rows. In this case 1. So the index row number is given by this formula:
=AGGREGATE(15,6,ROW($A$2:$A$9)/((H3=$A$2:$A$9)*(I3=$B$2:$B$9)*(J3=C2:C9)),1)-1
And when we substitute that back into the earlier equation for the row number, we wind up with the final equation of:
=IFERROR(INDEX($D$2:$F$9,AGGREGATE(15,6,ROW($A$2:$A$9)/((H3=$A$2:$A$9)*(I3=$B$2:$B$9)*(J3=C2:C9)),1)-1,MATCH(K3,$D$1:$F$1,0)),"Not Found")
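A rough Python equivalent of the final formula may help when checking the logic. The table below is invented apart from the 5.15% figure mentioned in the question, the header labels are assumptions (the real headers are in the screenshot), and find_rate is a hypothetical name.

```python
ltv_headers = ["<110%", "<140%", ">=140%"]  # models D1:F1 (assumed labels)

table = [  # models A2:F9, shortened; rates are made up except 5.15%
    ("Consumer", "Owner", "<=2 yrs", [0.0499, 0.0515, 0.0565]),
    ("Consumer", "Owner", ">2 yrs", [0.0549, 0.0565, 0.0615]),
    ("Consumer", "Renter", "<=2 yrs", [0.0599, 0.0615, 0.0665]),
]


def find_rate(loan_type, living_status, vehicle_age, ltv_header):
    """INDEX(rates, first row passing all checks, MATCH(header)), IFERROR-style."""
    # MATCH(K3, $D$1:$F$1, 0): exact match on the column header
    if ltv_header not in ltv_headers:
        return "Not Found"
    column = ltv_headers.index(ltv_header)

    # AGGREGATE(15, 6, ROW(...)/(checks), 1): first row where the product is 1
    for loan, living, age, rates in table:
        if (loan == loan_type) * (living == living_status) * (age == vehicle_age):
            return rates[column]
    return "Not Found"


rate = find_rate("Consumer", "Owner", "<=2 yrs", "<140%")
print(rate)  # 0.0515
```

Because the rows are scanned top to bottom, the first hit corresponds to the smallest matching row number, which is what function 15 with k = 1 returns.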

Return the index of the last column with a specific name in excel

I'm trying to create an excel register that counts the number of times someone has registered, and returns the date of the last time they turned up but am having trouble with this last step:
See this simplified setup:
To do this, I assume I need to find the index value of the column in which the name last appears, and then use it to return the date in the first row. The tricky part is getting that index value.
I've tried to use lookup formulas and am pretty sure that an array formula is how this can be accomplished but am unsure how I can use them in this specific case.
Assuming you have Excel 2010 or later:
=INDEX($1:$1,AGGREGATE(14,6,COLUMN(A$2:D$8)/(A$2:D$8=O2),1))
Copy down as required.
The explanation is as follows:
The portion:
(A$2:D$8=O2)
simply returns an array of Boolean TRUE/FALSE values as to whether each of the cells within that range is equal to the entry in O2 or not, i.e. using your example:
{TRUE,FALSE,TRUE,FALSE;FALSE,FALSE,FALSE,FALSE;FALSE,TRUE,FALSE,FALSE;FALSE,FALSE,FALSE,TRUE;FALSE,FALSE,FALSE,FALSE;FALSE,FALSE,FALSE,FALSE;FALSE,FALSE,FALSE,FALSE}
The part:
COLUMN(A$2:D$8)
returns the column number for each column within the specified range, i.e.:
{1,2,3,4}
By dividing this array by the one containing our conditional Boolean TRUE/FALSE returns, we produce an array whose only numerical entries correspond to columns in which our search string (i.e. "James") is located, since:
COLUMN(A$2:D$8)/(A$2:D$8=O2)
which is:
{1,2,3,4}/{TRUE,FALSE,TRUE,FALSE;FALSE,FALSE,FALSE,FALSE;FALSE,TRUE,FALSE,FALSE;FALSE,FALSE,FALSE,TRUE;FALSE,FALSE,FALSE,FALSE;FALSE,FALSE,FALSE,FALSE;FALSE,FALSE,FALSE,FALSE}
becomes:
{1,#DIV/0!,3,#DIV/0!;#DIV/0!,#DIV/0!,#DIV/0!,#DIV/0!;#DIV/0!,2,#DIV/0!,#DIV/0!;#DIV/0!,#DIV/0!,#DIV/0!,4;#DIV/0!,#DIV/0!,#DIV/0!,#DIV/0!;#DIV/0!,#DIV/0!,#DIV/0!,#DIV/0!;#DIV/0!,#DIV/0!,#DIV/0!,#DIV/0!}
by virtue of the fact that, when subjected to a suitable mathematical operation (of which division is one), Boolean TRUE/FALSE values are coerced into their numerical equivalents (TRUE=1, FALSE=0), meaning that, effectively, for any numerical value x:
x/TRUE ⇒ x/1 = x
and:
x/FALSE ⇒ x/0 = #DIV/0!
By setting AGGREGATE's first parameter to 14 (equivalent to the function LARGE) and its second to 6 (instructing it to ignore any errors in the array passed), we can extract the largest column index which meets our criterion, such that:
AGGREGATE(14,6,COLUMN(A$2:D$8)/(A$2:D$8=O2),1)
which is here:
AGGREGATE(14,6,{1,#DIV/0!,3,#DIV/0!;#DIV/0!,#DIV/0!,#DIV/0!,#DIV/0!;#DIV/0!,2,#DIV/0!,#DIV/0!;#DIV/0!,#DIV/0!,#DIV/0!,4;#DIV/0!,#DIV/0!,#DIV/0!,#DIV/0!;#DIV/0!,#DIV/0!,#DIV/0!,#DIV/0!;#DIV/0!,#DIV/0!,#DIV/0!,#DIV/0!},1)
returns 4.
All that is left is to pass this value to INDEX, such that:
INDEX($1:$1,AGGREGATE(14,6,COLUMN(A$2:D$8)/(A$2:D$8=O2),1))
which is:
INDEX($1:$1,4)
returns:
13/11/2015
as required.
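The walk-through above can be reproduced with a short Python sketch. The TRUE/FALSE grid is the one shown for A2:D8; the first three header entries are placeholders, since only the date in the fourth column (13/11/2015) appears in the text. The function name is hypothetical.

```python
# Row 1 of the sheet, columns A..D; only the last date is taken from the text
header = ["date 1", "date 2", "date 3", "13/11/2015"]

# (A$2:D$8 = O2) as shown in the explanation, one inner list per sheet row
matches = [
    [True, False, True, False],
    [False, False, False, False],
    [False, True, False, False],
    [False, False, False, True],
    [False, False, False, False],
    [False, False, False, False],
    [False, False, False, False],
]


def last_matching_column(matches):
    """AGGREGATE(14, 6, COLUMN(range)/(range = name), 1): largest matching column."""
    candidates = [
        col
        for row in matches
        for col, hit in enumerate(row, start=1)
        if hit  # a FALSE cell would be #DIV/0! and is ignored
    ]
    if not candidates:
        raise LookupError("#NUM!")  # no match at all
    return max(candidates)


column = last_matching_column(matches)
last_date = header[column - 1]  # INDEX($1:$1, column)
print(column, last_date)  # 4 13/11/2015
```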
Regards

Resources