Return where two columns match - excel

Forgive me if this has been requested before; any assistance is greatly appreciated.
I have the data below, consisting of thousands of rows. I need to isolate only the rows where the FileID1 and FileID2 columns match. Is there a quick method of doing this in Excel?
FileID1 FileID2 Hash
27468 27462 8BEA348CA9301F6459F8E8A2DD126D7C
29874 29843 EEFFBC24EAE3F4FD5ED5232993081A36
31150 1126 AE3675DC487DEF0F9C9FEC42B81B1438
**32330 32330 59D77968DB2FE6AFE42EEC21268F3D5A **
33218 33211 9231697E3A859F0D2C4E39AFB1C4AFFE
33984 33980 3B20A501EB17BA2A6FA6A43D9A3D70BA
35275 35260 201D7B2CE5E1DB924CAEDC0F7DA93489
**35402 35402 726C1DEE00F5D17EAB39B3DD1AE4EC0E **
35887 35883 176C07CD85BDD52449073310B9177977
36734 36657 2CDECE0B8C581D9E0F68B8BC3CEDAAB9
36924 36912 94BF549976E42D891F59A66C9972992E
BTW, I know that something like =IF(A1=B1,C1,"") works, but I wanted something more refined, where one does not have to copy the data, paste it as text and then sort.

If column A holds the first ID, column B the second ID and column C the hash values, you can enter an expression like this in column D:
=IF(A1=B1,1,0)
This gives 1 where the two IDs are equal and 0 where they are not.
Or, following your example:
=IF(A1=B1,"**","")

To return the hashes where ID1 is equal to ID2, you can use this array formula:
=INDEX($C$2:$C$12,SMALL(IF($A$2:$A$12=$B$2:$B$12,ROW($C$2:$C$12)-ROW($C$2)+1),ROWS($C$2:C2)))
(enter with CTRL+SHIFT+ENTER) and fill it down. Each row returns the next matching hash; once the matches run out, the formula returns #NUM!.
You can wrap it as =IFERROR([formula above],"") to hide the #NUM! errors.
You can tweak this if you're after more than just the hashes. For instance, in column D, starting at D2, you can use the same formula with the INDEX() range at the start changed to column A ($A$2:$A$12), and it will return the matching IDs from column A too.
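For readers who find the array formula hard to follow, its logic can be sketched outside Excel. The Python below is only an illustration and not part of the original answer: the function name matching_hashes is hypothetical, and the rows are a subset of the sample data from the question.

```python
# Illustrative sketch of the INDEX/SMALL/IF logic; not an Excel feature.
# Rows are (FileID1, FileID2, Hash), taken from the question's sample data.
rows = [
    (27468, 27462, "8BEA348CA9301F6459F8E8A2DD126D7C"),
    (32330, 32330, "59D77968DB2FE6AFE42EEC21268F3D5A"),
    (33218, 33211, "9231697E3A859F0D2C4E39AFB1C4AFFE"),
    (35402, 35402, "726C1DEE00F5D17EAB39B3DD1AE4EC0E"),
]


def matching_hashes(rows):
    # IF(A=B, relative row): collect the positions where the two IDs match.
    positions = [i for i, (id1, id2, _) in enumerate(rows) if id1 == id2]
    # SMALL(..., k) walks those positions in ascending order,
    # and INDEX fetches the hash at each position.
    return [rows[i][2] for i in sorted(positions)]


result = matching_hashes(rows)
print(result)
```

Each element of the returned list corresponds to one filled-down copy of the formula; where Excel would show #NUM!, the list simply ends.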

Related

Excel SUMPRODUCT and dynamic text conditions

I am trying to do a summation of rows with certain dynamic conditions. I have rows like:
A can be only one value, K can have multiple OR-values. In the end M is to be summed.
I have tried to use SUMPRODUCT() which works for column A but not for K. What I am looking for is something like:
=SUMPRODUCT(--(!$A$2:$A$20000="AA")*--(!$K$2:$K$20000="AA" OR "BB")*$M$2:$M$20000)
I know I can do ="AA" and then ="BB" but I need "AA" and "BB" to be dynamic based on other cells. And the number of arguments is different. I tried {"AA";"BB"} but I know this will not work as the match then needs to be in the same row.
Can it at all be achieved?
Thanks a lot!
=SUMPRODUCT(($A$2:$A$20000="AA")*(($K$2:$K$20000="AA")+($K$2:$K$20000="BB"))*$M$2:$M$20000)
Note that:
Since you are multiplying/adding arrays, there's no need to include the double unaries.
I don't know why you have a ! in your example formula.
To get an OR of two TRUE/FALSE arrays, we add them.
Your comments still do not provide a clear explanation of what you are making dynamic.
But to create a dynamic OR for column K, including testing for column A and summing column M, you can do the following:
For column K, let us assume that your possible OR's are entered separately in the range F2:F10
=SUMPRODUCT(MMULT(--($K$2:$K$20000=TRANSPOSE($F$2:$F$10)),--(ROW($F$2:$F$10)>0))*($A$2:$A$20000="AAA")*$M$2:$M$20000)
The matrix multiplication will produce a single column of 19,999 entries which will be a 1 for matches of any of the OR's and 0 if it does not match.
See "How to do a row-wise sum in an array formula in Excel?" for information about the MMULT function in this application.
In the above formula, blanks in the OR range (F2:F10) will also match blank entries in column K. So if there is a blank in both K and F, an AAA in column A and a value in column M, a wrong result might be returned.
To avoid that possibility, we can use a dynamic formula to size column F where we are entering our OR values:
=INDEX($F$2:$F$10,1):INDEX($F$2:$F$10,COUNTA($F$2:$F$10))
This returns only the non-blank values in column F (assuming there are no blanks in the middle of the list).
So:
=SUMPRODUCT(MMULT(--($K$2:$K$20000=TRANSPOSE(INDEX($F$2:$F$10,1):INDEX($F$2:$F$10,COUNTA($F$2:$F$10)))),--(ROW(INDEX($F$2:$F$10,1):INDEX($F$2:$F$10,COUNTA($F$2:$F$10)))>0))*($A$2:$A$20000="AAA")*$M$2:$M$20000)
Given this data:
the last formula will return a value of 5 (sum of M2,M3,M7)
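The row-wise OR that MMULT performs in the formulas above can be sketched in Python. This is a rough illustration under stated assumptions: the function names or_mask and conditional_sum are hypothetical, and the data is made up for the example (it is not the data from the answer's screenshot).

```python
# Illustrative sketch of the MMULT-based OR test; all data is hypothetical.

def or_mask(k_values, or_values):
    # K = TRANSPOSE(F): one row per K entry, one column per OR value.
    matrix = [[1 if k == v else 0 for v in or_values] for k in k_values]
    # MMULT by a column of ones is a row-wise sum: 1 if the K entry matches
    # any OR value, 0 otherwise (assuming the OR values are distinct).
    return [sum(row) for row in matrix]


def conditional_sum(a_values, k_values, m_values, a_target, or_values):
    mask = or_mask(k_values, or_values)
    # Multiply the OR mask, the column A test and column M, then add up.
    return sum(hit * (a == a_target) * m
               for hit, a, m in zip(mask, a_values, m_values))


col_a = ["AAA", "AAA", "BBB", "AAA"]
col_k = ["AA", "BB", "AA", "CC"]
col_m = [1, 2, 4, 8]
total = conditional_sum(col_a, col_k, col_m, "AAA", ["AA", "BB"])
print(total)
```

As in the spreadsheet version, a value listed twice in the OR range would make the row sum 2 and count that row twice, so the OR list should hold distinct entries.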
Use SUMIFS with SUMPRODUCT wrapper:
=SUMPRODUCT(SUMIFS($M$2:$M$20000,$A$2:$A$20000,"AA",$K$2:$K$20000,{"AA","BB"}))

VLOOKUP search on several columns

I have an Excel spreadsheet with 3 columns. I would like to look up a value that can be in either of the first two columns and then get the corresponding value from the third.
A B C
Mustang Empty Ford
Camaro Corvette Chevrolet
The VLOOKUP can only search in the first column. What I need is to be able to find a value in column A and B and return the value from C.
=VLOOKUP("Corvette",A1:C2,3,0) returns #N/A (would like to return Chevrolet)
=VLOOKUP("Camaro",A1:C2,3,0) returns Chevrolet
Is it possible?
Use AGGREGATE:
=INDEX(C:C,AGGREGATE(15,7,ROW($A$1:$C$2)/($A$1:$C$2=E1),1))
If it is only three columns then this will be quicker:
=INDEX(C:C,IFERROR(IFERROR(MATCH(E1,A:A,0),MATCH(E1,B:B,0)),MATCH(E1,C:C,0)))
But as you can see, adding an IFERROR for each column can get out of hand with more columns.
Both of the above return the first match encountered. If the data set is unique, with no duplicates in any of the columns, then we can use the following.
This uses FILTER, which is currently available on Office 365 for Insiders:
=FILTER(C:C,(A:A=E1)+(B:B=E1)+(C:C=E1))
But it does require that the data set contain only unique values. If one wants to return all matches, we can use TEXTJOIN to create a comma-separated list:
=TEXTJOIN(",",TRUE,UNIQUE(FILTER(C:C,(A:A=E1)+(B:B=E1)+(C:C=E1))))
You can also try this:
=IFERROR(VLOOKUP(F2,$A$2:$C$3,3,0),VLOOKUP(F2,$B$2:$C$3,2,0))
where F2 is the lookup value and $A$2:$C$3 is the range of your 3 columns.
The logic is to use two VLOOKUPs: the first returns the value from the 3rd column of A:C if the lookup value is in column A; otherwise the second returns the value from the 2nd column of B:C (again column C) if the lookup value is in column B.
Cheers :)
I like index-match a bit more for this:
=if(Isnumber(match(Thing, FirstColumn,0)),Index(ThirdColumn, Match(Thing, FirstColumn,0)),Index(ThirdColumn,Match(Thing, SecondColumn,0)))
Basically, test for existence in the first column. If it's there, use that match; otherwise, use the second column.
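The "try the first column, then fall back to the second" logic shared by the IFERROR and ISNUMBER answers can be sketched in Python. This is only an illustration: the function name lookup_any_column is hypothetical, and the table is the two-row sample from the question.

```python
# Illustrative sketch of a lookup that falls back from column A to column B.
table = [
    ("Mustang", "Empty", "Ford"),
    ("Camaro", "Corvette", "Chevrolet"),
]


def lookup_any_column(table, key):
    # Search the whole first column before moving to the second,
    # mirroring IFERROR(MATCH(key, A, 0), MATCH(key, B, 0)).
    for col in (0, 1):
        for row in table:
            if row[col] == key:
                return row[2]  # value from the third column
    return None  # no match anywhere (Excel would show #N/A)


print(lookup_any_column(table, "Corvette"))
```

Like the formulas, this returns only the first match it meets; if the same key appears in several rows, the later ones are ignored.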

Count of values which appear more than once in a column

In my excel column I have values as such:
ID
a
a
a
b
c
c
d
e
I would like to return the count of ids which occur twice or more. In this case answer is 2 (a,c).
Constraints:
1. No helper columns, or one at most. (There are a ton of other filters to be added to the COUNTIFS which are not relevant to the question; adding helpers would mean 12+ extra columns, one for each month.)
2. No VBA (a UDF is OK).
3. Formula result in a single cell.
The current formula which I have tried:
=COUNTIFS(F13:F22,COUNTIF(F13:F22,">=2"))
gives me 0.
Thanks in advance.
Hmm with no specific order of values, try:
=SUM(IF(COUNTIF(A2:A9,A2:A9)>1,1/COUNTIF(A2:A9,A2:A9),0))
Enter as an array formula with Ctrl+Shift+Enter.
Another variant would be:
=SUMPRODUCT((COUNTIF(A2:A9,A2:A9)>1)/COUNTIF(A2:A9,A2:A9))
This has the advantage that you don't have to enter it as an array formula.
Should you choose to add criteria, I believe the second formula is a bit more user-friendly for adding them, like so (I edited your sample data a little to show this):
=SUMPRODUCT((B2:B9=1)*(C2:C9="x")*(COUNTIF(A2:A9,A2:A9)>1)/COUNTIF(A2:A9,A2:A9))
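The 1/COUNTIF trick behind these formulas can be sketched in Python. Each occurrence of a value that appears n times contributes 1/n, so every repeated value adds up to exactly 1. This is only an illustration: the function name count_repeated is hypothetical, the IDs are the sample from the question, and exact fractions are used here where Excel works with floating-point numbers.

```python
# Illustrative sketch of SUMPRODUCT((COUNTIF(r,r)>1)/COUNTIF(r,r)).
from collections import Counter
from fractions import Fraction

ids = ["a", "a", "a", "b", "c", "c", "d", "e"]


def count_repeated(values):
    counts = Counter(values)  # plays the role of COUNTIF(range, range)
    # Each occurrence of a value seen n > 1 times contributes 1/n;
    # values seen only once contribute nothing.
    return sum(Fraction(1, counts[v]) for v in values if counts[v] > 1)


repeated = count_repeated(ids)
print(repeated)
```

For the sample column the three "a" entries contribute 1/3 each and the two "c" entries 1/2 each, which gives the expected count of 2.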

My index match formula is returning #N/A and I cannot figure out why?

I am trying to use an INDEX/MATCH formula to return a value based on two values. However, it is returning #N/A. I have created a simple table with one row and 3 columns as a test to try to figure out what is going wrong. Below is the table I made for this purpose. I want to return column L based on the criteria from columns J and K.
J K L
123 4 7
Here is the formula I have used.
=INDEX(L3,MATCH(1,(M8=J3)*(N8=K3),0))
I also used Ctrl+Shift+Enter to enter the formula, but it gives me an #N/A value. When I use INDEX/MATCH to return a value based on only one criterion, the formula works and returns 7, but when I try multiple criteria, the formula fails.
Any help would be greatly appreciated.
Thanks,
G
I think what you need to do is concatenate the columns of interest and then do the match. Try:
=INDEX(L3:L6,MATCH(M8&N8,J3:J6&K3:K6,0))
This should be entered as an array formula using Ctrl+Shift+Enter.
What the formula does is:
First, it concatenates the values being searched for, in memory.
=INDEX(L3:L6,MATCH(123&4,J3:J6&K3:K6,0))
Then it also concatenates all the values in the two lookup columns, in memory.
=INDEX(L3:L6,MATCH("1234",{"1234";"";"";""},0))
And then it does the actual matching.
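The concatenate-then-match steps above can be sketched in Python. This is only an illustration: the function name match_two_keys is hypothetical, and the lists stand in for J3:J6, K3:K6 and the result column, with the empty rows shown as empty strings.

```python
# Illustrative sketch of MATCH(M8&N8, J3:J6&K3:K6, 0) followed by INDEX.

def match_two_keys(col_j, col_k, col_l, key_j, key_k):
    # J3:J6&K3:K6: build one combined key per row.
    combined = [f"{j}{k}" for j, k in zip(col_j, col_k)]
    # M8&N8: build the combined key being searched for.
    target = f"{key_j}{key_k}"
    # list.index raises ValueError when nothing matches (Excel: #N/A).
    position = combined.index(target)
    return col_l[position]


col_j = [123, "", "", ""]
col_k = [4, "", "", ""]
col_l = [7, "", "", ""]
value = match_two_keys(col_j, col_k, col_l, 123, 4)
print(value)
```

One caveat with plain concatenation, in Excel as well as here: 123 and 4 produce the same key as 12 and 34, so if such collisions are possible in the data it is safer to join the parts with a separator character that cannot occur in either column.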

Index match match - correct approach?

I have a data source in the format of the one below. In reality, it would contain a few thousand rows.
I need to use something like INDEX-MATCH-MATCH in order to be able to get the "Status" for each "Content" item for each UserID.
The final result should look like this. The first two columns are not dynamic.
The INDEX formula goes to C and D.
I am using the following sequence to try and write the formula, but I don't seem to understand where the problem is.
=INDEX(Sheet1!A:K, [Vertical Position], [Horizontal Position])
look up the user with ID xxx:
=INDEX(Sheet1!A:K, MATCH(A2, Sheet1!A:K,0), [Horizontal Position])
look up the status for eLearn1.
=INDEX(Sheet1!A:K, MATCH(A2, Sheet1!A:K,0), MATCH("Status", Sheet1!A:K,0))
What am I doing wrong?
The question is not clear, but I think you are trying to do a LOOKUP based on the values of two columns. So for a particular value of Column A (UserID) and Column B (Content) you need to return Column H (Status).
This can be done using an array formula that returns the row number of the matching line, which can then be fed into INDEX. Note that this will only work as long as columns A and B contain only unique pairings.
I have set up some sample data:
Columns A-C are my source data. Cells G2:H4 are the lookup.
The formula is:
=INDEX($C$1:$C$7, SUM(($A$1:$A$7=$F2)* ($B$1:$B$7=G$1)*ROW($C$1:$C$7)))
This needs to be entered as an array formula by pressing CTRL+SHIFT+ENTER.
The formula works by comparing the values you are searching for against both columns and multiplying the results. This gives an array of 0s with a single 1 marking the matched row. That array is then multiplied by the row numbers and summed, so the SUM returns the matching row number, which is fed to INDEX. (Because the ranges start at row 1, the sheet row number is also the position within $C$1:$C$7.)
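The multiply-and-sum lookup can be sketched in Python. This is only an illustration under stated assumptions: the function name lookup_status is hypothetical, and the data is made up for the example (it is not the sample data from the answer's screenshot). Row 1 is a header row, as in the formula's ranges.

```python
# Illustrative sketch of INDEX(C, SUM((A=user)*(B=content)*ROW(C))).
# All data below is hypothetical.
user_ids = ["UserID", "u1", "u1", "u2", "u2"]
contents = ["Content", "eLearn1", "eLearn2", "eLearn1", "eLearn2"]
statuses = ["Status", "Completed", "Not started", "In progress", "Completed"]


def lookup_status(user_ids, contents, statuses, user_id, content):
    # Each term is the row number when both tests pass and 0 otherwise,
    # so the sum is the row number of the single matching row.
    row_number = sum(
        (u == user_id) * (c == content) * row
        for row, (u, c) in enumerate(zip(user_ids, contents), start=1)
    )
    if row_number == 0:
        return None  # no matching pair
    return statuses[row_number - 1]  # rows are 1-based, lists 0-based


status = lookup_status(user_ids, contents, statuses, "u2", "eLearn1")
print(status)
```

The sketch also shows why the pairings must be unique: if the same user and content appeared on two rows, their row numbers would be added together and the lookup would point at the wrong row.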
