What is this programmer doing with his Lookup function? - excel-formula

I found a formula on a forum to find the last populated cell in a column:
=LOOKUP(2,1/(G:G<>""),ROW(G:G))
But what's going on with this bit?
1/(G:G<>"")
One divided by ??? (something that's not equal to ""?) I don't understand the logic, here.

If you want to observe the calculation step by step by evaluating the formula with Formula Auditing, I strongly recommend limiting the range. Say, apply:
=LOOKUP(2,1/(G1:G10<>""),ROW(G1:G10))
And, for illustration purposes, populate nothing lower down the sheet than, say, G6 (leaving a gap somewhere in the range G1:G5 may help you see what is happening).
For this answer I am only going to consider five cells: G1, G3 and G4 populated, G2 and G5 (onwards) not.
1/(G1:G5<>"")
is indeed at the heart of this formula. As you have recognised, G1:G5<>"" tests whether each cell is not equal to "". "" is the convention for 'empty' for an Excel cell. If a cell is populated (ie "not empty") the test returns TRUE, and FALSE otherwise. Hence, for the five cells chosen for this example, an array is returned, for G1:G5 in order, of:
TRUE;FALSE;TRUE;TRUE;FALSE.
In arithmetic calculations Excel treats TRUE as 1 and FALSE as 0. Hence, using the above array of TRUE/FALSE values as the denominator and 1 as the numerator gives an array (again in order) of:
1/1;1/0;1/1;1/1;1/0
which resolves to:
1;#DIV/0!;1;1;#DIV/0!.
In the LOOKUP function above, 2 was chosen as the lookup_value. (Any other number greater than 1 would serve equally well.) So we are looking for 2 in an array composed exclusively of 1s and errors. There is therefore no chance of finding an exact match, so LOOKUP falls back to the largest value that is less than or equal to 2, skipping errors; since every such value is 1, it settles on the last one. The last 1 in the array is the fourth element, and the fourth element in ROW(G1:G5) is …4.
G4 is the last populated cell in column G (in my example).
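The same five-cell walk-through can be modelled in a few lines of Python. This is only a sketch under stated assumptions: an empty cell is represented as "", the #DIV/0! error as None, and LOOKUP is modelled as a simple scan for the last non-error entry that is less than or equal to the lookup value. Excel is generally described as using a binary search for LOOKUP; for an array holding nothing but 1s and errors, with a lookup value above 1, both approaches land on the last 1.

```python
# Model of =LOOKUP(2,1/(G1:G5<>""),ROW(G1:G5)) for the example above:
# G1, G3 and G4 populated, G2 and G5 empty. Cell contents are placeholders.
cells = ["x", "", "x", "x", ""]          # G1..G5; "" stands for an empty cell
rows = list(range(1, len(cells) + 1))    # ROW(G1:G5) -> [1, 2, 3, 4, 5]

# G1:G5<>"" gives TRUE/FALSE for each cell.
flags = [cell != "" for cell in cells]

# 1/(...): TRUE acts as 1 and FALSE as 0, so FALSE gives #DIV/0!.
# None stands in for the #DIV/0! error value (an assumption of this sketch).
lookup_vector = [1 / int(flag) if flag else None for flag in flags]


def lookup(value, lookup_vec, result_vec):
    """Return the result paired with the last non-error entry <= value.

    Simplification: a linear scan instead of Excel's binary search; for a
    vector holding only 1s and errors the outcome is the same.
    """
    found = None
    for entry, result in zip(lookup_vec, result_vec):
        if entry is not None and entry <= value:   # errors are skipped
            found = result
    if found is None:
        raise LookupError("#N/A")   # nothing <= value, e.g. a lookup value of 0
    return found


print(flags)                           # [True, False, True, True, False]
print(lookup_vector)                   # [1.0, None, 1.0, 1.0, None]
print(lookup(2, lookup_vector, rows))  # 4
```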

Related

How to regexmatch a range?

I am trying to take data from a table and work out how many points each class gets. I used VLOOKUP to do this, but the problem is that I have to tell the sheet which class gets how much.
The data:
Your data seems to be set up in a way that unnecessarily complicates things.
The kelas column isn't showing just the class, but the name and the class. For easy use in calculations this would be better divided into two columns: name | class.
The poins column seems to hold numbers formatted as text (judging by the leading +). If it showed the number only, and the class column showed the actual class, a simple SUMIF would solve your problem.
Now it's still doable using SUMPRODUCT:
=SUMPRODUCT(--(A17=RIGHT($B$2:$B$11,2)),--($D$2:$D$11))
The first part checks whether the search value in A17 equals the last 2 characters of each cell in range B2:B11 (the $ signs in the formula lock the range when dragging the formula down or across).
This results in an array of TRUEs and FALSEs, which the leading -- converts to 1s and 0s.
The second part simply converts the text values to numbers, using the same -- logic as with the TRUEs and FALSEs.
SUMPRODUCT multiplies the first array with the second array and adds it all up.
If a condition is true it multiplies the value of the points column by 1 (equals the points), if false it multiplies by 0 (equals 0).
In the end it sums all values meeting given condition.
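A Python sketch of the SUMPRODUCT logic, using made-up names, classes and points laid out the way the answer assumes (class in the last two characters of the kelas text, points stored as text with a leading +). It mirrors each part of the formula: the first -- turns TRUE/FALSE into 1/0, the second turns text like "+5" into 5, and the sum of the pairwise products gives the class total.

```python
# Hypothetical sample: "kelas" holds "name class" text (class = last two
# characters) and "poins" holds points stored as text with a leading "+".
kelas = ["Andi 7A", "Budi 7B", "Citra 7A", "Dewi 7C", "Eko 7B"]   # like B2:B6
poins = ["+5", "+3", "+2", "+4", "+1"]                              # like D2:D6


def sumproduct_points(target_class, names, points):
    """Model of =SUMPRODUCT(--(A17=RIGHT(B,2)),--(D))."""
    # --(A17=RIGHT(B,2)): TRUE/FALSE coerced to 1/0
    matches = [int(name[-2:] == target_class) for name in names]
    # --(D): text such as "+5" coerced to the number 5
    numbers = [float(p) for p in points]
    # SUMPRODUCT: multiply element by element, then add it all up
    return sum(m * n for m, n in zip(matches, numbers))


print(sumproduct_points("7A", kelas, poins))   # 7.0
print(sumproduct_points("7B", kelas, poins))   # 4.0
```

With the data split into proper name, class and numeric points columns, the same total would come from a plain SUMIF, as suggested above.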

In Excel, How do I return a list of values that match a description?

My question is: I want to return a list of values from column B in sheet 2 (in this case NBA players) whose entry in column A of sheet 2 contains the value "PG" held in cell A3 of sheet 1. Not only do I want it to match "PG", I also want the player to have a salary (column C) between $7100 (cell B2 in Sheet 1) and $8000 (cell C2 in Sheet 1). Any help would be appreciated.
You are either going to need to use an array formula or a function that performs array-like calculations. I will suggest using the AGGREGATE function. Avoid using full column/row references within an array formula or a function performing array-like calculations, or you may wind up bogging down your system with excessive calculations.
The AGGREGATE function is made up of several individual functions. Depending on which one you choose, it will perform array operations. I am going to suggest function 14, which is LARGE. The following example generates a list of results that ignores error values, sorted from largest to smallest, then returns the first value from that list. The thing we will list is the row number of each row that matches ALL your criteria. (If you would rather have the matches listed top to bottom, use 15, which is SMALL, instead; everything else stays the same.) So the basic shape of AGGREGATE looks like this:
AGGREGATE(Function #, Error/hidden handling #, Array formula, k)
The hardest part of this is coming up with the right formula. In the numerator you put the thing you are looking for. In the denominator you place your TRUE/FALSE condition checks. Separate each condition check with *; the * acts as an AND function. The thing that makes this work is that TRUE/FALSE convert to 1/0 when they are sent through a math operation. So anything you do not want is FALSE, and anything divided by FALSE becomes a divide by 0, which in turn generates an error. Since AGGREGATE is set to ignore errors, only things that meet your conditions will exist in the list, and since they are being divided by TRUE, which is 1, your thing remains unchanged. So the AGGREGATE function is going to start to look like:
AGGREGATE(14,6,ROW(some range)/((Condition 1)*(Condition 2)*...*(Condition N)),1)
So, as alluded to before, 14 sets AGGREGATE to LARGE, which works through the list from largest to smallest. 6 tells AGGREGATE to ignore errors, and the 1 tells AGGREGATE to return the first item in its sorted list. If it were 2 instead of 1 it would return the 2nd position. If you ask for a position greater than the number of items in the list, AGGREGATE produces an error, and that error does not get ignored.
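To make the function numbers concrete, here is a minimal Python model of AGGREGATE with option 6, covering only functions 14 (LARGE) and 15 (SMALL). It is a sketch, not Excel's implementation: None stands in for an error value, the helper names divide and aggregate are hypothetical, and the out-of-range error is modelled as #NUM! (the error LARGE and SMALL themselves give).

```python
def divide(numerator, condition):
    """ROW/condition: TRUE acts as 1, FALSE as 0, so FALSE yields #DIV/0! (None)."""
    return numerator / int(condition) if condition else None


def aggregate(function_num, values, k):
    """Sketch of AGGREGATE(function_num, 6, values, k) for functions 14 and 15."""
    kept = [v for v in values if v is not None]     # option 6: ignore errors
    if function_num == 14:                           # LARGE: largest first
        ordered = sorted(kept, reverse=True)
    elif function_num == 15:                         # SMALL: smallest first
        ordered = sorted(kept)
    else:
        raise ValueError("only functions 14 and 15 are modelled here")
    if not 1 <= k <= len(ordered):
        raise LookupError("#NUM!")   # k beyond the list: an error that is not ignored
    return ordered[k - 1]


rows = [2, 3, 4, 5, 6]                              # ROW(A2:A6)
matches = [True, False, True, False, True]          # combined TRUE/FALSE checks
array = [divide(r, m) for r, m in zip(rows, matches)]
print(array)                     # [2.0, None, 4.0, None, 6.0]
print(aggregate(14, array, 1))   # 6.0  (largest matching row first)
print(aggregate(15, array, 1))   # 2.0  (smallest matching row first)
```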
So now that there is some understanding of what AGGREGATE does, let's see how we can apply this to your data. For starters, let's assume your data is in rows 2:100 and row 1 is a header row. You will have to adjust the references to suit your data.
CONDITION 1
LEFT($A$2:$A$100,2)="PG"
Checks to see if the first two characters are PG. Based on the data in your screenshot, PG was either to the left of the / or was the only entry, and there was only ever one / in the cells of column A. If you also need to check whether PG comes after the /, you could use this alternative for your condition check:
(LEFT($A$2:$A$100,2)="PG")+(RIGHT($A$2:$A$100,2)="PG")
In this case the + is performing the task of an OR function. The catch is that if both sides are TRUE you wind up with TRUE+TRUE, which becomes 1+1, which is 2, and we only want to divide by 1 or 0. Note that a cell holding just "PG" passes both the LEFT and the RIGHT test. To counter that, compare the sum with 0, which turns each row back into a single TRUE or FALSE:
(((LEFT($A$2:$A$100,2)="PG")+(RIGHT($A$2:$A$100,2)="PG"))>0)
CONDITION 2
Check that the salary in C is less than or equal to 80000.
($C$2:$C$100<=80000)
CONDITION 3
Check that the salary in C is greater than or equal to 71000.
($C$2:$C$100>=71000)
Now let's put this all together to get a list of row numbers that meet your conditions. Note the extra set of brackets around the whole denominator, so that ROW is divided by the product of all three checks:
AGGREGATE(14,6,ROW($A$2:$A$100)/((((LEFT($A$2:$A$100,2)="PG")+(RIGHT($A$2:$A$100,2)="PG"))>0)*($C$2:$C$100<=80000)*($C$2:$C$100>=71000)),ROW(A1))
Now, provided the bracketing is right, you can place that formula in a cell and copy it down until it produces errors. As you copy it down, the only thing that will change is the A1 in ROW(A1). It acts like a counter: 1, 2, 3, etc. So you will get a list of row numbers that meet your criteria. Now we need to convert those row numbers to names.
To find the names, the INDEX function is your friend. Because it is not part of an array formula or inside a function performing array-like calculations, a full column reference can be used. So we take our formula that is generating row numbers and place it inside the INDEX function to give:
INDEX(B:B,Row Number)
INDEX(B:B,AGGREGATE(14,6,ROW($A$2:$A$100)/((((LEFT($A$2:$A$100,2)="PG")+(RIGHT($A$2:$A$100,2)="PG"))>0)*($C$2:$C$100<=80000)*($C$2:$C$100>=71000)),ROW(A1)))
Now, if you hate seeing error codes when you have copied down further than there are results, you can place the whole thing inside an IFERROR function to give:
IFERROR(formula,What to display in case of an error)
So for blank entries:
IFERROR(INDEX(B:B,AGGREGATE(14,6,ROW($A$2:$A$100)/((((LEFT($A$2:$A$100,2)="PG")+(RIGHT($A$2:$A$100,2)="PG"))>0)*($C$2:$C$100<=80000)*($C$2:$C$100>=71000)),ROW(A1))),"")
and for a custom message:
IFERROR(INDEX(B:B,AGGREGATE(14,6,ROW($A$2:$A$100)/((((LEFT($A$2:$A$100,2)="PG")+(RIGHT($A$2:$A$100,2)="PG"))>0)*($C$2:$C$100<=80000)*($C$2:$C$100>=71000)),ROW(A1))),"NOT FOUND")
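Putting the pieces together, the Python sketch below walks through the whole row-number, INDEX and IFERROR chain for a handful of rows. The positions, player names and salaries are hypothetical, and the helper functions only model the Excel behaviour described above. One detail worth noticing: a cell that contains just "PG" satisfies both LEFT(...)="PG" and RIGHT(...)="PG", so the sum is 2 for that row; the sketch clamps the OR row by row with >0 so that every denominator is exactly 1 or 0.

```python
# Hypothetical rows 2..6: column A = position, B = player, C = salary.
# (Row 1 is the header; all names and salaries are made up.)
data = {
    2: ("PG", "Player A", 75000),
    3: ("SG/PG", "Player B", 72000),
    4: ("PG/SG", "Player C", 90000),
    5: ("SF", "Player D", 76000),
    6: ("PG", "Player E", 79000),
}


def denominator(position, salary):
    # (LEFT(A,2)="PG")+(RIGHT(A,2)="PG") acts as OR; a lone "PG" scores 2
    either_side = (position[:2] == "PG") + (position[-2:] == "PG")
    # ((...)>0) turns 0/1/2 back into FALSE/TRUE, then * acts as AND
    return (either_side > 0) * (salary <= 80000) * (salary >= 71000)


def matching_rows(data):
    """Rows kept by ROW(...)/(denominator): the others would be #DIV/0!."""
    kept = [row for row, (pos, _, sal) in data.items() if denominator(pos, sal)]
    return sorted(kept, reverse=True)      # function 14 = LARGE: largest row first


def nth_player(data, k, fallback=""):
    """Model of IFERROR(INDEX(B:B,AGGREGATE(14,6,...,k)),fallback)."""
    rows = matching_rows(data)
    if k > len(rows):                      # AGGREGATE error -> IFERROR fallback
        return fallback
    return data[rows[k - 1]][1]            # INDEX(B:B,row) -> the name in column B


for k in range(1, 6):                      # "copying down": ROW(A1), ROW(A2), ...
    print(k, nth_player(data, k, "NOT FOUND"))
# 1 Player E
# 2 Player B
# 3 Player A
# 4 NOT FOUND
# 5 NOT FOUND
```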
So now you just need to adjust the references to suit your data. If your data is located on another sheet remember to include the sheet name. A reference to B3:C4 would become:
Sheet1!B3:C4
and if the sheet name has a space in it:
'Space Name'!B3:C4

Find text in a table array excel

Below is an extract of a large table (2017 IDs by 35 cases).
I want a formula which will look for a Case reference e.g. P0093 and return the first ID it finds (column A).
So for example, given 'P0094', the formula will result in '1'.
I expect the answer is something to do with an array formula, which is a bit of a blind spot for me.
Thanks in advance.
ID Case 1 Case 2 Case 3 Case 4 Case 5
1 P0001 P0092 P0093 P0094
2 P0016 P0150 P0419 P0420
3 P0018 P0189 P0421 P0422
4 P0004 P0095 P0096 P0097
5 P0005 P0104 P0105
6 P0021 P0068 P0069
7 P0007 P0098 P0099 P0100
8 P0008 P0101 P0102 P0103
9 P0009 P0062 P0233 P0234
Try this: (line break added for readability)
= IFERROR(INDEX(A:A,MATCH(1,(MMULT((A1:E10="P0094")+0,
TRANSPOSE((COLUMN(A1:E10)>0)+0))>0)+0,0)),"no match")
Just change both instances of E10 in the formula above to match how large your actual data table is. (Assuming 2017 IDs and 35 cases, I would probably change E10 to AJ2018, but I don't know for sure.)
Also note this is an array formula, so you must press Ctrl+Shift+Enter on the keyboard after typing this formula rather than just pressing Enter.
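For readers who want to trace the MMULT version step by step, this Python sketch models it on the sample table from the question, restricted to A1:E10 (the ID column plus Case 1 to Case 4, which is all the sample uses). The matrix helper and function names are illustrative only; blanks are stored as "" so that every row has the same width.

```python
# Sample table from the question as A1:E10; "" marks an empty cell.
TABLE = [
    ["ID", "Case 1", "Case 2", "Case 3", "Case 4"],
    [1, "P0001", "P0092", "P0093", "P0094"],
    [2, "P0016", "P0150", "P0419", "P0420"],
    [3, "P0018", "P0189", "P0421", "P0422"],
    [4, "P0004", "P0095", "P0096", "P0097"],
    [5, "P0005", "P0104", "P0105", ""],
    [6, "P0021", "P0068", "P0069", ""],
    [7, "P0007", "P0098", "P0099", "P0100"],
    [8, "P0008", "P0101", "P0102", "P0103"],
    [9, "P0009", "P0062", "P0233", "P0234"],
]


def mmult(a, b):
    """Plain matrix product, like Excel's MMULT."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def find_id(table, target):
    hits = [[int(cell == target) for cell in row] for row in table]  # (A1:E10=target)+0
    ones = [[1] for _ in table[0]]      # TRANSPOSE((COLUMN(A1:E10)>0)+0): a 5x1 of 1s
    per_row = [int(r[0] > 0) for r in mmult(hits, ones)]   # (MMULT(...)>0)+0
    try:
        position = per_row.index(1) + 1  # MATCH(1,...,0) is 1-based
    except ValueError:
        return "no match"                # IFERROR(...,"no match")
    return table[position - 1][0]        # INDEX(A:A,position); the table starts in row 1


print(find_id(TABLE, "P0094"))   # 1
print(find_id(TABLE, "P0101"))   # 8
print(find_id(TABLE, "P9999"))   # no match
```

Multiplying by a column of 1s is simply a way of getting each row's count of hits in one array operation, which MATCH can then scan for the first non-zero row.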
Non-CSE alternative
See the picture for cell reference layout. Use the following formula in I13:
=IFERROR(INDEX($A$1:$A$10,AGGREGATE(15,6,ROW($B$2:$E$10)/($B$2:$E$10=$I$12),1)),"not found")
AGGREGATE performs array-like operations without having to be entered as an array formula.
The concept is to keep the rows where the value you are looking for is found and turn every other entry into an error. It starts by making the denominator return TRUE or FALSE. Because the TRUE or FALSE is sent through a math operation, Excel converts TRUE to 1 and FALSE to 0. Since any value divided by 0 becomes an error, only rows where the denominator is TRUE are kept; the 6 in the AGGREGATE function is what tells AGGREGATE to ignore all the errors. The 15 tells AGGREGATE to use SMALL, ie sort the results from smallest to largest, and the final ,1) tells it to return the first value in that list. Once that row number is known, INDEX takes over: it returns the entry in the range A1:A10 at the position passed from AGGREGATE. Because A1:A10 starts in row 1, the row number and the position are the same. If the range had been A2:A10, I would have to subtract ROW(A2)-1 from the row number to turn it into a position within the list.
An important thing to note: even though this is not an array formula, AGGREGATE performs array-like calculations. As such, full column references should be avoided within the AGGREGATE function to avoid wasted calculations on blank cells.
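The AGGREGATE version can be modelled in the same way. In this Python sketch (hypothetical helper names, sample data from the question) every matching cell contributes its sheet row number, every other cell would be a #DIV/0! that option 6 discards, and function 15 picks the smallest row. Column A is stored from A1 so that, as in the formula, the row number can be used directly as the INDEX position.

```python
A1_A10 = ["ID", 1, 2, 3, 4, 5, 6, 7, 8, 9]   # $A$1:$A$10
B2_E10 = [                                    # $B$2:$E$10, one list per row; "" = empty
    ["P0001", "P0092", "P0093", "P0094"],
    ["P0016", "P0150", "P0419", "P0420"],
    ["P0018", "P0189", "P0421", "P0422"],
    ["P0004", "P0095", "P0096", "P0097"],
    ["P0005", "P0104", "P0105", ""],
    ["P0021", "P0068", "P0069", ""],
    ["P0007", "P0098", "P0099", "P0100"],
    ["P0008", "P0101", "P0102", "P0103"],
    ["P0009", "P0062", "P0233", "P0234"],
]


def aggregate_small_row(cases, target, first_row=2):
    """Model of AGGREGATE(15,6,ROW(B2:E10)/(B2:E10=target),1)."""
    candidates = []
    for offset, row in enumerate(cases):
        sheet_row = first_row + offset        # ROW() of this row of B2:E10
        for cell in row:
            if cell == target:                # TRUE -> row/1 = row
                candidates.append(sheet_row)  # FALSE -> #DIV/0!, dropped by option 6
    if not candidates:
        raise LookupError("#NUM!")            # nothing left for SMALL to return
    return min(candidates)                    # 15 = SMALL, k = 1


def find_id(target):
    try:
        row = aggregate_small_row(B2_E10, target)
    except LookupError:
        return "not found"                    # IFERROR(...,"not found")
    # INDEX($A$1:$A$10,row): position == row because the range starts in row 1;
    # the -1 below is only because Python lists are 0-based.
    return A1_A10[row - 1]


print(find_id("P0094"))   # 1
print(find_id("P0101"))   # 8
print(find_id("P9999"))   # not found
```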

4 variables index function, with greater than and less than for 2 variables

I am trying to use index match functions to determine the appropriate rate for the below table.
So for example a consumer loan that is for a person that owns property, the car is 2 years or less in age and the total loan to value ratio is less than 140% should return a value of 5.15%
I believe this is what you wanted...
I would use a series of nested if functions to evaluate which column of LTV I would want the value to come from.
"That is what is done in the AND( ) part. If the value is greater than the 110% and smaller than 140% let's do the Index Match on the 110% Column, Otherwise do it on the 140% Column."
You could extend this for more columns with more IFs in the false condition.
Then it is a simple INDEX match with concatenation. It searches for the three parameters all concatenated in a single range of concatenations.
Hope it helped.
Proof of Concept
In order to achieve the above I had to make a minor edit to your header to be able to distinguish between the two 140% columns.
The functions used in this answer are:
AGGREGATE function
MATCH function
INDEX function
ROW function
IFERROR function
I placed the main part of the formula inside the IFERROR function as a way of dealing with things that may be out of range, or with cases where not all the inputs have been provided. I then assumed that the values you were basing your search on would be provided in a series of cells. In my example I assumed the questions would be asked in the range H3 to K3, and I placed the result in L3.
The main concept is centered around the INDEX function. I specified the index range as being the height of your table and the width of the percentage rates. Or for this example D2:F9.
=IFERROR(INDEX($D$2:$F$9,row number, column number),"Not Found")
That is the easy part. The more challenging part is determining the row and column number to look in. Let's start with the column number, as it is the slightly easier of the two. I assumed the ratio to look for, or rather the header of the column to look in, would be supplied. I used this equation to determine the column number:
=MATCH(K3,$D$1:$F$1,0)
which in layman's terms is which column between D and F, counting column D as 1, has the value equal to the contents of K3. So now that there is a formula to determine the column, we can drop that into our original formula and wind up with:
=IFERROR(INDEX($D$2:$F$9,row number,MATCH(K3,$D$1:$F$1,0)),"Not Found")
Now we just need to determine the row number. This is the most complex operation. We are going to make a series of logical checks and take the first row that passes all of them. The premise here is that a logical check is either TRUE or FALSE. In Excel, 0 is treated as FALSE and any non-zero number as TRUE, and in arithmetic TRUE counts as 1 and FALSE as 0. So if we multiply a series of logical checks together, only a row that passes every check will equal 1. The first logical check is the loan type; it is followed by the living status and then the vehicle age.
=(H3=$A$2:$A$9)*(I3=$B$2:$B$9)*(J3=$C$2:$C$9)
If you evaluate that as an array formula you will get a series of TRUE/FALSE results, or rather 1s and 0s. We are going to use it inside an AGGREGATE function with a special feature: AGGREGATE performs array-like calculations for some of its functions. We are going to use function 15 (SMALL), which does this. We are also going to tell AGGREGATE to ignore all errors, which is what the 6 does. So what we wind up doing is dividing each row number by the logical check. If the logical check is FALSE, or 0, it generates a #DIV/0! error, which AGGREGATE ignores. That leaves a list of the rows that match our logical check. We then tell AGGREGATE that we want the first result with the ,1, so we wind up with a formula that looks like:
=AGGREGATE(15,6,ROW($A$2:$A$9)/((H3=$A$2:$A$9)*(I3=$B$2:$B$9)*(J3=$C$2:$C$9)),1)
While this does provide us with the row number we want, we need to adjust it to make it an index number. In order to do this you need to subtract the number of header rows. In this case 1. So the index row number is given by this formula:
=AGGREGATE(15,6,ROW($A$2:$A$9)/((H3=$A$2:$A$9)*(I3=$B$2:$B$9)*(J3=$C$2:$C$9)),1)-1
And when we substitute that back into the earlier equation for the row number, we wind up with the final equation of:
=IFERROR(INDEX($D$2:$F$9,AGGREGATE(15,6,ROW($A$2:$A$9)/((H3=$A$2:$A$9)*(I3=$B$2:$B$9)*(J3=$C$2:$C$9)),1)-1,MATCH(K3,$D$1:$F$1,0)),"Not Found")
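As a rough check of the logic, this Python sketch applies the same three steps to a made-up version of the rate table: the AGGREGATE-style row search, the header-row adjustment, and the MATCH on the LTV header. The header labels and all rates except 5.15 (the figure quoted in the question) are hypothetical, as is the rate function name; a missing match at any step falls through to "Not Found", mirroring IFERROR.

```python
# Hypothetical rate table: A2:C9 hold the three criteria, D2:F9 the rates and
# D1:F1 the (edited) LTV headers. Only 5.15 comes from the question.
HEADERS = ["LTV 110%", "LTV 140%", "LTV 140%+"]          # D1:F1
ROWS = [                                                 # A2:F9
    ("Consumer", "Owns", "<=2 yrs", 4.90, 5.15, 5.65),
    ("Consumer", "Owns", ">2 yrs", 5.40, 5.65, 6.15),
    ("Consumer", "Rents", "<=2 yrs", 5.40, 5.65, 6.15),
    ("Consumer", "Rents", ">2 yrs", 5.90, 6.15, 6.65),
    ("Commercial", "Owns", "<=2 yrs", 5.90, 6.15, 6.65),
    ("Commercial", "Owns", ">2 yrs", 6.40, 6.65, 7.15),
    ("Commercial", "Rents", "<=2 yrs", 6.40, 6.65, 7.15),
    ("Commercial", "Rents", ">2 yrs", 6.90, 7.15, 7.65),
]


def rate(h3, i3, j3, k3):
    try:
        # AGGREGATE(15,6,ROW($A$2:$A$9)/((H3=A)*(I3=B)*(J3=C)),1):
        # keep sheet rows where all three checks multiply to 1, take the smallest
        sheet_rows = [row_no for row_no, r in enumerate(ROWS, start=2)
                      if (h3 == r[0]) * (i3 == r[1]) * (j3 == r[2])]
        row_number = min(sheet_rows) - 1         # minus 1 header row -> INDEX row
        column_number = HEADERS.index(k3) + 1    # MATCH(K3,$D$1:$F$1,0)
    except ValueError:                           # empty min() or failed MATCH -> IFERROR
        return "Not Found"
    # INDEX($D$2:$F$9,row_number,column_number); rates start at tuple slot 3
    return ROWS[row_number - 1][3 + column_number - 1]


print(rate("Consumer", "Owns", "<=2 yrs", "LTV 140%"))   # 5.15
print(rate("Lease", "Owns", "<=2 yrs", "LTV 140%"))      # Not Found
```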

Explain LOOKUP formula

I'm trying to understand some legacy Excel file (it works, but I would really like to understand how/why it's working).
There is a sheet for data input (input sheet) and some code that is called to process data in the input sheet. I found out that the number of rows in the input sheet is determined using a LOOKUP formula like this:
=LOOKUP(2;1/('Input sheet'!E1:E52863<>"");ROW(A:A))
"E" column contains names for import items and column is NOT sorted
"A" column does not contain anything special - I can replace it with B, C or whatever column and it does not affect the formula's outcome
According to what I have found about LOOKUP behaviour: "If the LOOKUP function can not find an exact match, it chooses the largest value in the lookup_range that is less than or equal to the value."
What does this ^-1 operation do to the specified range? If E(x) is not empty it should turn into 1, but if it is empty then it would be 1/0, which should produce a #DIV/0! error...
1/('Input sheet'!E1:E52863<>"")
The outcome is the same if I replace 2 with any positive number (OK, I only tried some, but it looks like this is the case). If I change the lookup value to 0, I get a #N/A error, which matches: "If the value is smaller than all of the values in the lookup_range, then the LOOKUP function will return #N/A".
I am stuck... can anyone shed some light?
LOOKUP has the rare ability to ignore errors. The 1/n operation produces an error every time n is zero, and FALSE counts as zero. So, for your formula, every empty cell produces an error in this calculation. All of those results are put in a vector array in the 2nd argument.
Searching for any positive value (the 1st argument) larger than 1 will result in LOOKUP finding the last non-error value in the above vector.
It also has the nice optional 3rd argument where you can specify the vector of results from which to return the lookup value. This is similar to the INDEX component of the INDEX/MATCH combo.
In the case of your formula, the 3rd argument is an array that looks like this: {1;2;3;4;5;6;7;8;9;...n} where n is the last row number of the worksheet, which in modern versions of Excel is 1048576.
So LOOKUP returns the value from the vector in the 3rd argument that corresponds to the last non-error (non-blank cell) in the 2nd argument.
Note that this method of determining the last row will ignore cells that have formulas that result in a zero-length string. Such cells look blank but of course they are not. Depending on the situation, this may be precisely what you want. If, on the other hand you want to find the last row in column E that has a formula in it even if it results in a zero-length string, then this will do that:
=MATCH("";'Input sheet'!E:E;)
You might get some idea of what the formula is doing (or any other formula) if you apply Evaluate Formula. Though, since the principle is the same whether 3 rows or 52863, I'd suggest limiting the range to speed things up if choosing Evaluate Formula. As usual with trying to explain formulae, it is best to start from the inside and work outwards. This:
'Input sheet'!E1:E52863<>""
returns an array with a result for every entry in column E from Row1 to Row52863. Since it is a comparison (<> means "does not equal") the result is Boolean, ie TRUE (not empty) or FALSE (is empty). So if only the first half of E1 to E52863 is populated, the result is {TRUE;TRUE;TRUE; ... and a LOT more TRUE; ... and FALSE ... and a LOT more ;FALSE and finally }.
Working outwards, the next step is to divide 1 by this array. In arithmetic operations Boolean TRUE is treated as 1 and FALSE as 0, so the resultant array is {1;1;1; ... and a LOT more 1; ... and #DIV/0!... and a LOT more ;#DIV/0! and finally }.
This then becomes the lookup_vector within which LOOKUP seeks the lookup_value. The lookup_value you show is 2, but the array comprises only 1 or #DIV/0!, so 2 will never be found in it. As you have noticed, that 2 could just as well be 3, or 45, or 123: any number greater than 1 works, since it then exceeds every value in the array. (A lookup value below 1, such as your 0, is smaller than everything in the array, hence the #N/A.)
That (no exact match) is necessary because LOOKUP stops searching when it finds a match. The fact that there is no match forces it to the end of the (valid) possibilities, ie the last 1. At this point, in my opinion, it would be logical to return "not found", but (I suspect merely a quirk, though very convenient) it settles on that last 1 and returns the entry in the same position of the result_vector, ie 52863 if all cells in E1:E52863 are populated.
Although the result_vector (ROW(A:A)) is optional for LOOKUP, it is required in this usage, in effect to fix the start point for the index (effectively Row1, since it is an entire column). You might change that to, say, A3:A.. and the result would be the highest populated row number in column E plus 2 (3-1).
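A short Python sketch shows why the start of the result_vector matters. The column E contents are made up, "" marks a cell that is empty or shows a zero-length string (both fail the <>"" test), and the function names are hypothetical. The position of the last match stays the same; only the value read from the result_vector at that position changes.

```python
# Made-up contents of E1:E8; "" = empty cell or a formula showing "".
column_e = ["a", "b", "", "c", "d", "", "", ""]


def last_match_position(cells):
    """1-based position of the last non-blank cell (the last 1 in 1/(E<>""))."""
    positions = [i for i, cell in enumerate(cells, start=1) if cell != ""]
    if not positions:
        raise LookupError("#N/A")   # every entry was #DIV/0!
    return positions[-1]


def lookup_last(cells, result_start_row):
    """LOOKUP returns the result_vector entry at that same position."""
    # Models ROW(A<start>:...): consecutive row numbers from result_start_row.
    result_vector = range(result_start_row, result_start_row + len(cells))
    return result_vector[last_match_position(cells) - 1]


print(lookup_last(column_e, 1))   # 5  (ROW(A:A): the last populated row of E)
print(lookup_last(column_e, 3))   # 7  (starting at A3: 2 more than the real row)
```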
