INDEX MATCH with COUNTIF only finds first 3 unique values then stops - excel-formula

I have a long list with codes in sheet 1 of my data starting at A2 (the list is very long 4500 rows):
C56
A125
R89
C56
B125
B125
A125
R89
D512
S226
D512
R89
In sheet 2 I need to create a list of all the codes and remove the duplicates. I'm using the following array formula to try and do it:
{=INDEX('Sheet 1'!$A$2:$A$4568,MATCH(0,COUNTIF('Sheet 1'!$A$1:A1,'Sheet 1'!$A$2:$A$4568),0))}
The formula results in the 1st and 2nd unique codes to be returned correctly but the 3rd unique code is duplicated for the rest of the lines:
Code
C56
A125
R89
R89
R89
I also tried =IFERROR(INDEX('Sheet 1'!$A$2:$A$4568,MATCH(0,INDEX(COUNTIF('Sheet 2'!$A$1:A1,'Sheet 1'!$A$2:$A$4568&""),0),0))&"","") as a non-array formula, but again I get the same result.
I can't work out what is wrong with either of these formulas.

=INDEX('Sheet 1'!$A$2:$A$4568,MATCH(0,COUNTIF('Sheet 1'!$B$1:B1,'Sheet 1'!$A$2:$A$4568),0))
Assuming your list starts in B2
(Array formula - Ctrl+Shift+Enter)

Related

Sum of last 4 results in a column with multiple conditions

I have a list of data (top to bottom with gaps) in range AM4:AM1000 which is always added to and to which i want to find and sum the last 4 results. But I only want to find those results which correspond to a separate column, range AL4:AL1000 that is equal to cell E3, and where a third column (AS4:AS1000) meets the criteria of "p".
Im using the code below which extracts the last 4 results but I can't make it meet the other two conditions. any help would be gratefully appreciated
=SUM(INDIRECT("Am" & LARGE(IF(NOT(AM4:AM1000=""),ROW(AM4:AM1000),0),1) & ":Am" & LARGE(IF(NOT(AM4:AM1000=""),ROW(AM4:AM1000),0),4)))
Iv also tried the code below but this only returns the value 0
=SUM(IFERROR(INDEX($AM$4:$AM$1000,LARGE(IF(ISNUMBER(MATCH($AL$4:$AL$1000, $E$3, 0)),IF(AS$4:AS$1000="p",MATCH(ROW(AL$4:AL$1000), ROW(AL$4:AL$1000)), "")), ROWS($I$6:$I7))),""))
Here's an array formula for you to try - make sure to enter it by using Ctrl+Shift+Enter:
=SUMPRODUCT(IF(ROW($AM$4:$AM$1000)=TRANSPOSE(LARGE(IF(--($AL$4:$AL$1000=$E$3)*(--($AS$4:$AS$1000="p")),ROW($AM$4:$AM$1000),0),ROW($A$1:$A$4))),$AM$4:$AM$1000,0))
The result of 26 matches your criteria (highlighted cells):

Find a range of value in excel

I have two different sheets with 300,000 data in Excel.
First sheet contains:
S2_Symbol Start_Pos End Position
STE 254857 267891
PRI 748578 758962
ILA 852741 963369
VIS 789456 796325
Second:
S1_Location
789460
852898
748678
My output should be like this:
S1_Location Symbol
789460 VIS
852898 ILA
748678 PRI
I have to find that S1_location falls in which S2_location and its corresponding Symbol. I have used INDEX formula in Excel but for each cell, I have to change the reference cell manually. I couldn't do it 300,000 data.
How can I do in an in Excel or should I use a script?
This solution assumes the following:
Start and End Positions for each S2 Symbol are unique (i.e. there is no intersection between the ranges allocated to each symbol)
Data in first sheet is located at A1:D17 (adjust ranges in formulas as needed)
Data in second sheet is locate at A1:B300010 (adjust ranges in formulas as needed)
The solution requires:
To add a working column in worksheet one. Enter this formula in D2 and copy till last record.
=ROWS($A$1:$A2)
Fig. 1
Then in second worksheet enter this formula at B2 and copy till last record.
=INDEX( Sheet1!$A$1:$A$17,
SUMIFS( Sheet1!$D$1:$D$17,
Sheet1!$B$1:$B$17, "<=" & $A2, Sheet1!$C$1:$C$17, ">=" & $A2 ) )
Fig. 2
It took aprox. less than 14 seconds to copy downwards and calculate the formulas in sheet 2.
As it can be seen in figures 1 and 2 none of the tables need to be sorted.
Assuming both sheets start in A1, and First sheet ColumnB is sorted ascending, in Second sheet B2 please try:
=INDEX(First!A:A,MATCH(A2,First!B:B))
copied down to suit. It relies on inexact matching.
Assuming we have a Sheet1 like this:
note, the Sheet1is sorted by Start_Pos, End_Pos in ascending order.
and a Sheet2 like this:
Then the formula in Sheet2!B2 downwards could be:
=INDEX(Sheet1!A:A,IF(MATCH(A2,Sheet1!B:B)>IFERROR(MATCH(A2-(10^-10),Sheet1!C:C),0),MATCH(A2,Sheet1!B:B),NA()))
See MATCH: https://support.office.com/en-us/article/MATCH-function-e8dffd45-c762-47d6-bf89-533f4a37673a
The idea is: MATCH without exact matching (without parameter match_type) gets the row of the largest value which is smaller or equal the search value. So in the Start_Pos column it will get the row from which we can get the S2_Symbol. But from the End_Pos column it should get one row beforehand if the value is not outside the given ranges.
There is only one exception. If the value is exact the value in the End_Pos column, then it will return the same row as in the Start_Pos column. Considering this exception, we can search in the End_Pos column with a little bit smaller value. Thanks to Tom Sharpe for his comment.
The formula in Sheet2!D2 downwards is:
{=INDEX(Sheet1!A:A,MIN(IF($A2>=Sheet1!$B$2:$B$300000,IF($A2<=Sheet1!$C$2:$C$300000,ROW(Sheet1!$A$2:$A$300000),2^20+1))))}
this is an array formula which is exactly formulated respecting the requirements. But this is very bad in performance for using in much many cells. But using this, the Sheet1 is not required to be sorted.
Benchmark test:
Have the following Sheet1:
Formulas:
A2:A300002: ="S"&(ROW(A1)-1)*10&"-"&(ROW(A1)-1)*10+7
B2:B300002: =(ROW(A1)-1)*10
C2:C300002: =B2+7
and the following Sheet2:
Formulas:
A2:A300002: =RANDBETWEEN(0,3000007)
B2:B300002: =INDEX(Sheet1!A:A,IF(MATCH(A2,Sheet1!B:B)>IFERROR(MATCH(A2-10^-9,Sheet1!C:C),0),MATCH(A2,Sheet1!B:B),NA()))
Note the -10^-9 instead of -10^-10 in previous version. This is because we have only 16 digits precision. In previous version this was maximum 6 digits integer part and then 10 digits decimal part. Now it is maximum 7 digits integer part and then 9 digits decimal part.
Calculation after pressing F9 in Sheet2 takes ca. 2 s. (Excel 2007, Windows 7, 4 core processor).
I would have gone for something like this which gives you the first match if there is one:-
=INDEX(First!A:A,MATCH(1,(First!B:B<=A2)*(First!C:C>=A2),0))
assuming keys and start and end values are in a sheet called First and lookup values start in A2.
Array formula which must be entered with CtrlShiftEnter
In response to the question from #pnuts about how long it will take, I have set up a similar benchmark with 300,000 rows in each sheet and it has reached 1% after 90 minutes, so it should take about 150 hours to reach 100% or roughly one week. This is to be expected as the number of computations required is (rows in sheet 1) X (rows in sheet 2)
300,000 X 300,000
but in fact because the multiplication applies to complete columns, I believe it is more correctly
300,000 X 1,048,576
i.e. > 300 billion.
A practical version which gives good response for smaller ranges is as follows:-
I define three named ranges Range1, Range2 and Range3
=First!$A$1:INDEX(First!$A:$A,MATCH("ZZZ",First!$A:$A))
=First!$B$1:INDEX(First!$B:$B,MATCH(9.9E+307,First!$B:$B))
=First!$C$1:INDEX(First!$C:$C,MATCH(9.9E+307,First!$C:$C))
and the modified formula is
=INDEX(Range1,MATCH(1,(Range2<=A2)*(Range3>=A2),0))
I was thinking of deleting this answer, but would rather it stood as a counter-example.

Is there a 2 Value Look up function in MS Excel that can perform the following?

I am going crazy over this. It seems so simple yet I can't figure this out. I have two worksheets. First worksheet is my data. Second is like an answer key. Upon checking checking, A1:B1 in Sheet 1 is a match with the conditions in Row 52 in SHEET 2, therefore, the value in Column C is "MGC". What is the formula that will perform this function? It's really hard to explain without the data so I pasted a link of the sample spreadsheet. Thank you so much in advance.
sample spreadsheet here. https://docs.google.com/spreadsheets/d/1_AjuNfCdGfEM-XkqPa6W4hSIxQg4NM2Vg4c2C1pQ_vQ/edit?usp=sharing
screenshot here. (wont let me post i have no reputation)
In Sheet2, insert a column in front of Column A and put the formula in A2 =C2&D2.
Then in Sheet1, Cell C2 the formula =vlookup(A2&B2,Sheet2!A:B,2,0).
the first make a concatenated key to lookup, then the second looks up that key.
How about a index(match())? If I've understood correctly you need to match across both the A and B column in sheet one, checking for the relevant values in B and C on sheet 2 to retrun worksheet 2 column a to worksheet 1 column c.
third version try:
=INDEX(Sheet2!$C$1:$C$360,MATCH(Sheet1!A1&Sheet1!B1,Sheet2!$B$1:$B$360&Sheet2!$C$1:$C$360,0))
Basically what this does is use concatenation, the & operator, to specify you are looking for "Criteria A" & "Criteria B" in sheet 1, which makes the string "Criteria A Criteria B", which is supplied in the first part of the match function.
In the second it then says match this against all of my variables in sheet 2 in the same way with concantenation.
The final part of match function (0) specifies you want an 'exact' match
It then supplied this as a reference to the index function, which then finds the row intersecting with the value you want, and returns that.
As noted here https://support.microsoft.com/en-us/kb/59482 this is an array formula, so it behaves differently, and must be input differently. https://support.office.com/en-za/article/Guidelines-and-examples-of-array-formulas-7d94a64e-3ff3-4686-9372-ecfd5caa57c7
There are (at least) 2 ways you could do this without VBA.
USING A SORTED LIST
The first relies on the assumption that your data can be re-sorted, so that everything "Unreported" is in the top, and everything "reported" is together below that (or vice versa). Assuming that this is the case (and it appears to already be sorted like this),we will use the function OFFSET to create a new range which shows only the values that align with either being "Unreported" or "Reported".
Offset takes a given reference to a point on a sheet, and then moves down/up & left/right to see what reference you want to return. Then, it returns a range of cells of a given height, and a given width. Here, we will want to start on Sheet2 at the top left, moving down until we find the term "Unreported" or "Reported". Once that term is found, we will want to move one column to the right (to pull column B from sheet 2), and then have a 'height' of as many rows as there are "unreported" or "reported" cells. This will look as follows in A1 on sheet 1, copied down:
=OFFSET(Sheet2!$A$1,MATCH(A1,Sheet2!A:A,0)-1,1,COUNTIF(Sheet2!A:A,A1),1)
This says: First, start at cell A1 on sheet2. Then find the term in A1 (either "unreported" or "reported", on sheet2!A:A (we subtract 1 because OFFSET starts at A1 - so if your data starts at A1 we need to actually stay at "0". If you have headers on sheet2, you will not need this -1). Then, move 1 column to the right. Go down the rows for as many times as Sheet2 column A has the term found in Sheet1 A1. Stay 1 column wide. Together, this will leave you with a single range on sheet2, showing column B for the entire length that column A matches your term in sheet1 A1.
Now we need to take that OFFSET, and use it to find out when the term in Sheet1 B1 is matched in Sheet2 column B. This will work as follows:
=MATCH(B1,[FORMULA ABOVE],0)
This shows the number of rows down, starting at the special OFFSET array created above, that the term from B1 is matched in column B from sheet2. To use this information to pull the result from column C on sheet 2, we can use the INDEX function, like so:
=INDEX([FORMULA ABOVE],MATCH(B1,[FORMULA ABOVE],0))
Because this would be fairly convoluted to have in a single cell, we can simplify this by using VLOOKUP, which will only require the OFFSET function to be entered a single time. This will work as follows:
=VLOOKUP(B1,[FORMULA ABOVE],2,0)
This takes the OFFSET formula above, finds the matching term in B1, and moves to the 2nd column to get the value from column C in sheet2. Because we are going to use VLOOKUP, the offset formula above will need to be adjusted to provide 2 columns of data instead of 1. Together, this will look as follows:
FINAL FORMULA FOR SHEET1, C1 & COPIED DOWN
=VLOOKUP(B1,OFFSET(Sheet2!$A$1,MATCH(A1,Sheet2!A:A,0)-1,1,COUNTIF(Sheet2!A:A,A1),2),2,0)
OPTION USING ARRAY FORMULAS
The above method will only work if your data is sorted so that the REPORTED and UNREPORTED rows are grouped together. If they cannot be sorted, you can use an ARRAY FORMULA, which essentially takes a formula which would normal apply to a single cell, and runs it over an entire range of cells. It returns an array of results, which must be reduced down to a single value. A basic array formula looks like this [assume for this example that A1 = 1, A2 = 2...A5 = 5]:
=IF(A1:A5>3,A1:A5,"")
Confirm this (and all array functions) by pressing CTRL + SHIFT + ENTER, instead of just ENTER. This looks at each cell from A1:A5, and if the value is bigger than 3, it gives the number from that cell - otherwise, it returns "". In this case, the result would be the array {"";"";"";4;5}. To get the single total of 9, wrap that in a SUM function:
=SUM(IF(A1:A5>3,A1:A5,""))
In your case, we will want to use an array formula to see what row in Sheet2 matches A1 from Sheet1, and B1 from Sheet1. This will look like this:
=IF(Sheet2!$A$1:A$100=A1,IF(Sheet2!$B$1:$B$100,ROW($B$1:$B$100),""),"")
This checks which rows in column A from sheet 2 match A1. For those that do, it then checks which rows in column B from sheet 2 match B1. For those, it pulls the row number from that match. Everything else returns "". Assuming no duplicates, there should only 1 row number which gets returned. To pull that number from the array of results, wrap the whole thing in a MATCH function. Now that you have the row number, you can use an INDEX function to pull the result in Column C with that row, like this:
FINAL ARRAY FORMULA METHOD
=INDEX($C$1:$C$100,MAX(IF(Sheet2!$A$1:A$100=A1,IF(Sheet2!$B$1:$B$100,ROW(Sheet2!$B$1:$B$100),""),"")))
Remember to confirm with CTRL + SHIFT + ENTER instead of just ENTER, when you type this formula. Note that I didn't refer to all of Sheet2!A:A, because array formulas run very slowly over large ranges.
The following formula should work without making any changes to the datasheets.
=INDEX(Sheet2!$A$1:$A$360,MATCH(Sheet1!A1,IF(Sheet2!$C$1:$C$360=Sheet1!B1,Sheet2!$B$1:$B$360),0))
Remember to save this formula as an array with CTRL+SHIFT+ENTER
Documentation on how to use INDEX and MATCH against multiple criteria can be found on Microsoft Support.
It's not clear what you want to do with the multiples that do not have corresponding matches. txed is listed as Unreported twice in Sheet1; kntyctap is listed as Unreported three times. There are only one corresponding match on Sheet2 for each of these.
Non-array Standard Formulas for multiple criteria matches
For Excel 2010 and above use this standard formula in Sheet1!C1:
=IFERROR(INDEX(Sheet2!$A$1:$A$999,AGGREGATE(15,6,ROW(1:999)/((Sheet2!$B$1:$B$999=A2)*(Sheet2!$C$1:$C$999=B1)), COUNTIFS(A$1:A1, A1, B$1:B1, B1))), "")
For version of Excel prior to 2010 use this standard formula in Sheet1!C1:
=IFERROR(INDEX(Sheet2!$A$1:$A$999, SMALL(INDEX(ROW($1:$999)+((Sheet2!$B$1:$B$999<>A1)+(Sheet2!$C$1:$C$999<>B1))*1E+99, , ), COUNTIFS(A$1:A1, A1, B$1:B1, B1))), "")
I've handled error with the IFERROR function in that latter formula. Excel 2003 and previous may have to use an IF(ISERROR(..., ...)) combination.

Excel INDEX and MATCH Get Value

I have an excel workbook that I need some help with INDEX and MATCH or any other Formula that can get me my end result.
Here is sheet1:
SIT_ID METER SUSE_CD
10834282 DT0061 B
10834282 AW7931 P
21676286 CQ9635 P
21676286 DP4838 B
21726281 AW7880 P
21726281 DT0032 B
Here is Sheet2:
Site ID B P
10834282
21676286
21726281
Ultimately what I am trying to do is on Sheet2 is put the Meter that = B for the SITEID in the column and then Put the Meter that = P in the Same row.
I have never used Index or Match and I looked it up online but I am confused and hoping someone can help me with the correct formula or point me in the right direction.
Thanks so much!
INDEX first takes a range, then a row number, an optional column number (and an optional area number).
MATCH takes a value to lookup, an array and a mode.
In your problem you can use the following in Sheet2 cell B2:
=INDEX(Sheet1!$B$2:$B$7, MATCH($A2, IF(Sheet1!$C$2:$C$7=B$1,Sheet1!$A$2:$A$7), 0))
This formula is an array formula and will work with Ctrl+Shift+Enter and then you can fill it to the other cells.
I had to use an IF because there're two conditions to check.
EDIT: Use this one if your cell formats are different:
=INDEX(Sheet1!$B$2:$B$7,MATCH($A2*1,IF(Sheet1!$C$2:$C$7=B$1,Sheet1!$A$2:$A$7*1),0))
EDIT2: Adding trimming:
=INDEX(Sheet1!$B$2:$B$7,MATCH($A2*1,IF(TRIM(Sheet1!$C$2:$C$7)=TRIM(B$1),Sheet1!$A$2:$A$7*1),0))
EDIT3: If you're using it on your full data, change the range:
=INDEX(Sheet1!$B:$B,MATCH($A2*1,IF(TRIM(Sheet1!$C:$C)=TRIM(B$1),Sheet1!$A:$A*1),0))
Assuming your Sheet1 looks like this:
And your Sheet2 looks like this:
The formula in Sheet2 cell B2 and copied over and down to cell C4 is:
=INDEX(Sheet1!$B$2:$B$7,MATCH(1,INDEX((Sheet1!$A$2:$A$7=$A2)*(Sheet1!$C$2:$C$7=B$1),),0))
Note that this is a regular formula, so no need for Ctrl+Shift+Enter
A helper column D is added to initial columns.
D2: =$A2 & $C2
Now it's possible to make a simple search of the concatenated SITE_ID and SUSE_CD:
H2: =MATCH($G2&" B";$D$2:$D$8;0)
The result would be a row number (=1 in this case) for the needed string in array $D$2:$D$8.
INDEX shows the value of the cell, found by counting n-th row (defined by MATCH) and m-th column (=2) in array $A2:$A$8 from the upper left cell (A2).
Altogether: =INDEX($A$2:$B$8;MATCH($G2&" B";$D$2:$D$8;0);2)
The easiest way to get around with this is,
to use concatenation operator in the match function.
Don't forget to use Ctrl+Shift+Enter
Use below formula in column B of Sheet 2
{=INDEX(Sheet1!$B:$B,MATCH(Sheet2!$A2&Sheet2!$B$1,Sheet1!$A:$A&Sheet1!$C:$C,0))}
And the below formula in column C of Sheet 2
{=INDEX(Sheet1!$B:$B,MATCH(Sheet2!$A2&Sheet2!$C$1,Sheet1!$A:$A&Sheet1!$C:$C,0))}
And then flash fill the remaining rows.

SUMPRODUCT( SUMIF() ) - How does this work?

Part 1:
I was able to construct a formula that does exactly what I want (from some examples), but yet, I'm unable to figure out how exactly it works. I have, starting with cell A1:
Price $
table 20
chair 10
Invoice Quantity
table 17
chair 1
chair 2
table 3
What I want is the final total (430) for the invoice which is computed as Quantity*Price for each item (17*20 + 1*10 + 2*10 + 3*20). the following formula correctly does this:
=SUMPRODUCT(B6:B9,SUMIF(A2:A3,A6:A9,B2:B3))
I understand the basics of SUMPRODUCT and SUMIF. But here, my argument for SUMIF's range is A2:A3, which makes me think the SUMIF would iterate through A2 and A3, and not through A8:A11 (which is the criteria). What gives?
Edit: the unclear part is, what exactly does SUMIF do (what is its iteration pattern) when the first two arguments are of different dimensions (here, the range is 2 cells while the criteria is 4 cells). Also, what is the "output" of SUMIF? An array? Of what dimensions?
Part 2:
In addition, if I ignored the quantity and simply wanted to add 20 whenever I saw a table and 10 whenever I saw a chair, I figured I would do:
=SUMIF(A2:A3,A6:A9,B2:B3)
But that doesn't work, and I have to enclose it with a SUMPRODUCT() for it to work and correctly evaluate to 60. Enclosing it within a SUM doesn't work either (probably because the SUMIF doesn't return an array?) Why?
I've read a bunch of tutorials and still can't understand this, and would be most grateful for a clear, intuitive explanation for both these cases. Thank you.
SUMIF can produce an array of results. If you take my formula
=SUMIF(A6:A9,A2:A3,B6:B9)
it says
For the criteria in A2 (ie table)
- look at A6:A9
- where table is matched, sum the corresponding value in B6:B9
- returns 20 (ie 17 +0 +0 +3)
- this is stored in the first position of the array
Then for the criteria in A3 (ie chair)
- look at A6:A9
- where table is matched, sum the corresponding value in B6:B9
- returns 3 (ie 0 +1 +2 +0)
- this is stored in the second position of the array
So the end array from the SUMIF is {20:3}
You can see the array result by highlighting the SUMIF formula in Excel's formula bar and then pressing F9
Then use SUMPRODUCT to multiple the count in the SUMIF by the $ values in B2:B3 to get total dollars
={20;3}*{20:10}
=20*20 + 3*10
= 430
Part 1
Rather than
SUMIF(A2:A3,A6:A9,B2:B3)
which produces a four element array of
={20;10;10;20}
(corresponding to table;chair;chair;table)
You should use
SUMIF(A6:A9,A2:A3,B6:B9)
which sums the values in B6:B9 against your two criteria in A2:A3 giving the desired result
={20;3}
(corresponding to table;chair)
and then use SUMPRODUCT to weight your array, ie
=SUMPRODUCT(SUMIF(A6:A9,A2:A3,B6:B9),B2:B3)
={20;3}*{20:10}
=430
Part 2
Use COUNTIF to return an array of the number of chairs and tables and then multiply by the vales using SUMPRODUCT
=SUMPRODUCT(B2:B3,COUNTIF(A6:A9,A2:A3))
={20;10} * {2;2}
=60
Well you only have one minor mistake:
probably because the SUMIF doesn't return an array?
SUMIF can work with arrays, thats why you formula SUMPRODUCT( SUMIF() ) works in first place, to SUMIF show an array you have to select a group of cells (like C6:C9) input the formula and use CTRL+SHIFT+ENTER instead of ENTER only. this generate an "array fomula", identified by curly brackets {} (those can only be entered with CTRL+SHIFT+ENTER, no manualy) and show the array formula and results

Resources