excel matching two arrays - excel

I have a problem that seems pretty easy, but still cannot find a proper solution, I want to avoid using vba.
I have two tables in one spreadsheet. both have the same columns - Name, City, Province.
My goal is compare both and if three out of three values in a row match, then pull "1", if not, pull 0.
I have used the formulas below , but it does not work for my case .
=IF(AND(A2=P:P,G2=M:M,H2=L:L),1,0)
=INDEX(A:P,MATCH(A2,P:P,FALSE),MATCH(G2,M:M,FALSE),2)
=INDEX(L:P,MATCH(A5,P:P,0),MATCH(G5,M:M,0),MATCH(H5,L:L,0))
=SUMPRODUCT(--(L2:L60=H2),--(M2:M60=G2),--(P2:P60=A2),B2:B60)
It seems that the solution is quiet simple , but I cannot find it,
Thanks in advance!

The key here is to merge the columns together, them Match on that.
Like this
=IFERROR( IF( MATCH(H3&"_"&I3&"_"&J3, $C$2:$C$60&"_"&$B$2:$B$60&"_"&$A$2:$A$60,0), "Yes"), "No")
Choose a seperator character that doesn't otherwise appear in your data (I've chosne _)

Assumption: Values just need to exist, not that they need to be of equivalent row.
=If(IfError(Match(A2,P:P,0),0)*IfError(Match(G2,M:M,0),0)*IfError(Match(H2,L:L,0),0)>0,1,0)
For each IfError, you will output a row number (>0) if you match, or if there is no match a zero will be output. Multiply anything by zero and you get zero, whcih allows a 1 or 0 output for true/false in the overarching If-statement.
If they need to be of the same row, you can compare 2 matches, which rely on the transitive property (A=B, B=C, so A=C):
=If(And(Match(A2,P:P,0)=Match(G2,M:M,0),Match(G2,M:M,0)=Match(H2,L:L,0)),1,0)
Edit1:
Per my comment (to this answer) about false negatives, a UDF or subroutine in VBA would be more appropriate, considering Match() returns the first row that has a match.
As this is not a VBA tagged post, this is a bit above the expected answer... My recommendation would be to:
A) Ensure you are comfortable using VBA.
B) Make a post about creating a user-defined function (note that any post on here about VBA has an expectation that the poster can interact with an expert on the topic and will be putting forth effort to write the code themselves, as StackOverflow is not a code-for-you service).
To help give a lead on what may be in your UDF:
A loop to go through the values from first row to last row in the search column (i.e., L, M, & P)
A variable to dynamically identify the last row of your search column
An if-statement to compare values from your lookup values (i.e., A2, G2, H2) to the search values at the current iteration of the loop
An output of 1 (has match) or 0 (no match).
There are many ways to go about this with VBA; hopefully that's a good start for you, Irina!

Related

Excel formula to get unique value from a lookup only if all values match

I'm trying to find a way to perform an INDEX MATCH lookup that goes beyond the first match to check if all matching values are equivalent. I've found formulas that will return all matches, but what I'd like to do is have the matching value returned in the formula cell, but only if all values returned are the same.
Here's an example:
I'm matching the report number with the report number below and only picking up the area value if all report-area combinations are the same. Is there a clean way to do this?
Dan.
My solution may seem a bit messy, but you can make it simpler as you go, once you start implementing it:
First, I made a Report Count. (How many 12345 Reports are there in total, and so on).
=COUNTIF($A$2:$A$10;A2)
Then, I concat the Report-Area to get a unique identifier for every Report-Area combination.
=A2&"-"&B2
Now, I make a Count of that column, meaning I count how many combinations are there for each case (e.g. how many 12345-2C are there in total).
=COUNTIF($D$2:$D$10;D2)
Then, I create an "Ok" Column for checking if the Report Count matches the Concat Count.
=IF(C2=E2;"OK";"")
That said, we have our table ready for checking what you're looking for.
In just one formula (the one under Lookup header) on cell B13:
=IF(INDEX(F2:F10;MATCH(A13;A2:A10;0))="OK";INDEX(B2:B10;MATCH(A13;A2:A10;0));"")
I check IF there's an "OK" on the OK column for that Report number.
If there is, I Search for the "Area" value for that Report number.
If there isn't an "OK", I leave a blank cell. (In your example, it's #N/A)
The formulas on H2, I2 and C13 are just for reference. Plain text.
Again, I know it seems messy, but if you're not too familiar with some Excel formulas and functions, this is a good way to learn and build complex formulas step by step (Just as our fellow n8 said)
I assume you understand how INDEX MATCH works. If you don't, I'll edit an explanation for you.
Good Luck!

Can these formulas be simplified? Why does INDIRECT function seem to not work inside an ISBLANK test within a MATCH formula?

Summary
I need an array formula that takes a row of data of certain length from Sheet1. For that row, in each column that is not blank, I need to grab the Sheet1 header value for that column and display that data in a continuous row on Sheet2 (without any spaces in between the row's cells).
Background
I have a table of data (employees and industry certifications with expiration date being the table's cell data) on sheet 1, with a row for each employee the spreadsheet is tracking. The certifications are the columns.
We are using this information to link to ID Badge Printer software (Bodno Silver), where we are limited to linking columns of data to a particular textbox.
The problem lies in the fact that not everyone has every certification. The rows are peppered with blanks separating the certifications that each employee does have. While setting up the required text boxes in the badge software template, that each link to a specific column, I quickly realized that since not everyone has every certification if we used the data how it was we would have a bunch of strange looking blanks in between the listed certifications rather than a continuous list.
What I did
My solution to this (which I'm open to a better one if anyone knows of one, other than "use better software"), was to create a new sheet and array formulas that no one would use except for me and the id printer software. This sheet would have a similar data table that took the rows of data interspersed with blank cells between expiration dates, and put the matching column headers for cells that had a date in them into a continuous row of the same maximum length (eliminating the blank cells).
Essentially, this would allow me to circumvent the restrictions of the badge software and each textbox would be MatchedCert1, MatchedCert2, MatchedCert3, etc. up to the original maximum number of certifications.
Pictures are probably better than my words at explaining what I am going for:
Sheet1 (source)
Sheet2 (result)
The array formulas
I worked on this one for a while. What I thought would be a simple INDEX, MATCH, ISBLANK formula (that I could create using the appropriate relative and absolute cell linking) and then expand to the whole sheet turned into a witch hunt and me praying for forgiveness for my sins to all that may be holy. Also a lot of googling.... I realized quickly that this one may not be so simple after all.
Finally, I arrived at the following two array formulas in order to correctly show what I was going for:
First Column of training section
{=IFERROR(INDEX(Sheet1!$E$2:$P3,1,MATCH(FALSE,ISBLANK(Sheet1!E3:Q3),0)),"")}
(easy enough, right? I thought so...)
I felt good about this until I tried to think through what would be required to get the formula to be universal so that I could use it on the entire table.
I feel dirty just putting the following in public, but here goes...
Second column through last column array formula
{=IFNA(INDEX(INDIRECT(ADDRESS(ROW($E$2),(MATCH(E3,Sheet1!$2:$2,0)+1),1,1, "Sheet1")&":"&ADDRESS(ROW(E3),COLUMN($Q3),1)),1,MATCH(FALSE, ISBLANK(INDEX(INDIRECT("Sheet1!"&ADDRESS(ROW(E3),(MATCH(E3,Sheet1!$2:$2,0)+1),1)&":"&ADDRESS(ROW(E3),COLUMN($Q3),1)),0,0)), 0)),"")}
(please don't call the police...)
[ninja edit] While this array formula works for 2nd result column through the final column, it doesn't work if there's not a blank column following the result range. The actual spreadsheet has 4 different groups of certifications that run horizontally, but I was able to just add a blank column in the corresponding data from the other sheet easily enough, so I just let it go. I'd give somebody a nickle for the answer to why that's the case here too [/edit]
Results
The first array formula, and INDEX MATCH using ISBLANK is rather straightforward.
The biggest question for me here, and the thing that drove me absolutely nuts for a couple of days, is why the second array formula requires the additional INDEX function nested inside of the ISBLANK function.
While taking the function apart and experimenting I realized that if I have any INDIRECT reference inside a ISBLANK function, which is itself inside of a MATCH function, the result of the match was ALWAYS 1:
{=MATCH(FALSE,ISBLANK(INDIRECT("$E3:$Q3")), 0)}
The above ALWAYS returns 1, whereas if I put the range in explicitly, the function would work just fine. That wasn't an option for me, since I needed to dynamically return the starting position for the match using the previous cell's address.
However, adding an INDEX function (with a column and row value of 0) to encapsulate the INDIRECT function provides the correct answer. I figured this out just by trial and error.
Questions
Can someone with more knowledge please let me know what is causing this behavior?
As a broader question, given I am limited to using formulas (no VBA), I would also like to know if I'm going about this in the wrong way or if there is a much simpler way of accomplishing this without this behemoth of a formula?
I know this sheet will probably require maintenance in a year - good luck future self!
Put this in E3, Copy over and down
=IFERROR(INDEX(Sheet1!$2:$2,AGGREGATE(15,6,COLUMN(INDEX($E:$P,MATCH($C3,Sheet1!$C:$C,0),0))/(INDEX(Sheet1!$E:$P,MATCH($C3,Sheet1!$C:$C,0),0)<>""),COLUMN(A:A))),"")
As to why your formula is not working, it is too convoluted to parse. One note, unless the sheets is the variable, one should avoid INDIRECT as much as possible. INDEX can almost always be used in its place.
Both INDIRECT and ADDRESS are volatile functions. Volatile functions will re-calculate every time Excel re-calculates, leading to a lot of unnecessary computations.
Not a solution but to answer why you are seeing this behavior:
EDIT: PREVIOUS EXPLANATION WAS JUST PLAIN WRONG
This confused me so, I did a bit of investigation:
I think that your problem is actually coming from the ISBLANK function because it is intended to be used with single values, and cannot handle ranges. Any BLANKs which are returned by functions are only converted to numeric values (0), when the BLANK is returned to (or displayed on) the sheet. If the function is returning to another function, the BLANK value seems to be preserved.
EDIT: ADDING A SOLUTION WITHOUT ARRAY FORMULAS
This is probably more complex than using an array formula... but I strongly dislike them, so do all I can to remove them.
Firstly, I would add an index to your positions in the results sheet:
=IF(F$7>COUNTIFS($F3:$L3,"<>"),
"",
IF(
MINIFS(
$F$7:$L$7,$F$7:$L$7,
">" & IFNA(INDEX($F$7:$L$7,MATCH(E9,$F$2:$L$2,0)),0),
$F3:$L3,
"<>"
)=0,
"",
INDEX(
$F$2:$L$2,
MATCH(
MINIFS(
$F$7:$L$7,$F$7:$L$7,
">" & IFNA(INDEX($F$7:$L$7,MATCH(E9,$F$2:$L$2,0)),0),
$F3:$L3,
"<>"
),
$F$7:$L$7,
0
)
)
)
)
Basically, the formula looks at the cert in the previous cell, and looks for the next, minimum index, greater than that.

Excel: Lookup multiple values in one cell and return results in another

I'm trying to find a way to lookup each of several comma separated values in one cell, and return the results as a comma separated list in another cell.
The number of values to lookup is not constant, it may be only one, or several hundred.
Example - Sheet A has the initial values and will hold the returned values. Sheet B is the table with the data to lookup.
EDIT: I've gotten better and learned how to make this a 1-liner. I'll leave my original answer below, but here's my updated one (as this hasn't been accepted or even commented on, yet):
=TEXTJOIN(", ",TRUE,INDEX(SheetB!$A$2:$B$6,MATCH(FILTERXML("<x><y>"&SUBSTITUTE($B2,",","</y><y>")&"</y></x>","//y"),SheetB!$A$2:$A$6,0),2),"")
I think INDEX/MATCH was the key component I hadn't learned yet when I tried answering this before. :)
The FILTERXML/SUBSTITUTE I had learned from another site where it takes in your delimited values and replaces the delimiter and wraps the rest with the needed xml tags for FILTERXML to then return an array of your values.
INDEX/MATCH then does your lookup, first getting the row number via MATCH (note the final value of 0 in MATCH means exact match), then INDEX returns the value of the corresponding matching column (the 2 at the end of INDEX's arguments).
Finally, TEXTJOIN joins the results together with the delimiter of your choice (and the second argument set to TRUE allows it to skip blanks).
As a final note, I think this solution may work without Office 365. FILTERXML appears to have been available as early as Office 2013.
Original, outdated answer follows:
I've been searching for an answer to this myself and was rather dismayed to find your question - exactly the question I have - without an answer.
Almost 4 years late, this answer does require Office 365 Excel, and it's not the most elegant, but here's what I've been able to come up with.
Pick an unused column on your Sheet B (with enough contiguous unused columns to its right to handle spillover of however many values you'll need to be splitting, or use a new sheet dedicated to this) and put this formula into the 2nd row's cell for that column, dragging it down through each cell that has corresponding data back on Sheet A (and assuming your column header "File #s" is column B): =TRANSPOSE(FILTERXML("<x><y>"&SUBSTITUTE(SheetA!$B2,delim,"</y><y>")&"</y></x>","//y"))
Replace delim with your delimiter wrapped in quotes, or a cell reference to a cell that has your delimiter in it. Also, if it's possible to have no value in the corresponding cell in Sheet A, then you'll need to wrap the FILTERXML() function (or the whole thing) with IFERROR().
Then, back on Sheet A in your Results column, use this formula: =TEXTJOIN(delim,TRUE,FILTER(SheetB!$B$2:$B$6,COUNTIF(SheetB!D2#,SheetB!$A$2:$A$6),if_no_match))
Again, replace delim with the delimiter or cell reference with your delimiter of choice. SheetB!$B$2:$B$6 and SheetB!$A$2:$A$6 are your lookup table's columns (and thus extend them to encompass the whole thing). The SheetB!D2# references the column where you put the TRANSPOSE(FILTERXML()) formula. Finally, replace if_no_match with whatever you want to appear if there was no match.
I'd ideally like to find a way that uses a single, self-contained formula, but alas, this is as far as I've managed so far.

Sumproduct or Countif on a 2D matrix

I'm working on data from a population of people with allergies. Each person has a unique ExceptionID, and each allergen has a unique AllergenID (451 in total).
I have a data table with 2 columns (ExceptionID and AllergenID), where each person's allergies are listed row by row. This means that the ExceptionID column has repeated values for people with multiple allergies, and the AllergenID column has repeated values for the different people who have that allergy.
I am trying to count how many times each pair of allergies is present in this population (e.g. Allergen#107 & Allergen#108, Allergen#107 & Allergen#109,etc). To keep it simple I've created a matrix of 451 rows X 451 columns, representing every pair (twice actually because A/B and B/A are equivalent).
I somehow need to use the row name (allergenID) to lookup the ExceptionID in my data table, and count the cases where that matches the ExceptionIDs from the column name (also AllergenID). I have no problem using Vlookup or Index/Match, but I'm struggling with the correct combination of a lookup and Sumproduct or Countif formula.
Any help is greatly appreciated!
Mike
PS I'm using Excel 2016 if that changes anything.
-=UPDATE=-
So the methods suggested by Dirk and MacroMarc both worked, though I couldn't apply the latter to my full data set (17,000+ rows) because it was taking a long time.
I've since decided to turn this into a VBA macro because we now want to see the counts of triplets instead of pairs.
With the 2 columns you start with, it is as good as impossible... You would need to check every ExceptionID to have 2 different specific AllergenID. Better use a helper-table with ExceptionID as rows and AllergenID as columns (or the opposite... whatever you like). The helper table needs a formula like:
=COUNTIFS($A:$A,$D2,$B:$B,E$1)
Which then can be auto-filled. (The ranges are from my example, you need to change them to your needs).
With this helper-matrix you can easily go for your bigger matrix like this:
=COUNTIFS(E:E,1,INDEX($E:$G,,MATCH($I2,$E$1:$G$1,0)),1)
Again, you can auto-fill with this formula, but you need to change it, so it fits your needs.
Because the columns have the same ID2 (would be your AllergenID), there is no need to lookup them because E:E changes automatically with the auto-fill.
Most important part of the formulas are the $ which should not be messed up, or you can not auto-fill it.
Picture of my self-made example (formulas are from the upper left cell in each table):
If you still have any questions, just ask :)
It can be done straight from your original set-up with array formulas:
Please note that array formulas MUST be entered with Ctrl-Shift-Enter, before copying across and down:
In the example pic, I have NAMED the data ranges $A$2:$A$21 as 'People' and $B$2:$B$21 as 'Allergens' to make it a nicer set-up. You can see in the formula bar how that looks as a formula. However you could use the standard references like this in your first matrix cell:
EDIT: silly me, N function is not needed to turn the booleans into 1's and 0's, since multiplying booleans will do the trick. Below formula works...
SUM(IF(MATCH($A$2:$A$21,$A$2:$A$21,0)=ROW($A$2:$A$21)-1, NOT(ISERROR(MATCH($A$2:$A$21&$E2,$A$2:$A$21&$B$2:$B$21,0)))*NOT(ISERROR(MATCH($A$2:$A$21&F$1, $A$2:$A$21&$B$2:$B$21,0))), 0))
Then copy from F2 across and down. It can be perhaps improved in technique with sumproduct or whatever, but it's just a rough example of the technique....

excel 2007 duplicate row

How can I compare records in a table, to make sure these records are not duplicates? Using excel 2007 I don't won’t them to delete after comparison.
Duplicates rows should be colored. I have a table columns are from A to P and I have 500 rows. I want to put condition on A, B, E, F, G, I.
If you don't want to sort your column, you can try with a matrix formula (http://www.stanford.edu/~wfsharpe/mia/mat/mia_mat4.htm).
Practically, you can compare your current row to every row above. Somtething like :
=MIN(LINE(B1)*(IF(A2=A1;1;0))*(IF(B2=B1;1;0)))*(...)
validated with CTRL-SHIFT-ENTER will check if all the conditions are true, else, will return 0.
Please send a file (with anonymous data) if you want a practical example.
Hope that helps
Edit : here is the good solution (provided you want to compare data in the Q column) :
=MIN(LIGNE($Q$5:Q6)*EQUIV(Q6;$Q$5:Q6;0))
If you want to have the first line where the value appear
=MIN(LIGNE($Q$5:Q5)*EQUIV(Q6;$Q$5:Q5;0))
If you'd rather have #N/A if there are no duplicate before that line
Still validate with CTRL-SHIFT-ENTER
Sort by the columns you are interested in then use a formula to compare each row with the one above. You can then use conditional formatting to colour the results.
I may sound stupid here, but usually the simple answers are usually the best.
I did this recently, by literally using the CONCATENATE() function with the TEXT() function to combine all the columns I wanted to compare into a single cell. So in effect I am creating a cell with a unique "key" that holds all the data I want to be unique.
I then sort that column and create another empty column next to it.
Then us this formula to compare the row with the row above it: =IF(A2=A1,0,1)
This simply puts a 0 where it's the same row and a 1 where it's different.
I then filter on the '1's and there are my duplicates!
It'a also usefull as an alternative way of doing a unique COUNT(DISTINCT ...) where I want to count how many unique references of my data exists. SUBTOTAL(3...) is not enough.

Resources