Improve fuzzy match filter in Excel - excel

I work with a kind of fuzzy match filter, based on additional column with a filter list. The formula is:
=IF(SUMPRODUCT(COUNTIF(A2,"*"&B$2:$B$22&"*"))>0,"Delete","Keep")
In fact there are two formulas, they work on the same kind - they were created on experimenting. The second is:
=IF(SUMPRODUCT(--ISNUMBER(SEARCH($B$2:$B$22,A2))),"Delete","Keep")
Column A contains data to filter,
column B (from B2 to B22) contains the filter list,
in the column C i write "keep", if there is no partly match of the value from column A with values from column B, and write "delete", if there is any partly match.
Q: how to write instead of "delete" the matching value from column B? I can't get this work in both of formulas.
Update
After translation of formula by #Jerry
=IFERROR(INDEX($B$2:$B$22,MATCH(9^99,SEARCH($B$2:$B$52,A2))),"Keep")
to
=WENNFEHLER(INDEX($B$2:$B$22;VERGLEICH(9^99;SUCHEN($B$2:$B$52;A2)));"Keep")
with this translation tool (worked for me in other cases errorfree), i get following result:
which is another, than the result by Jerry.

If there can be only 1 match, then you can use INDEX and MATCH in an array formula (use Ctrl+Shift+Enter and you will see curly parens around the formula in the formula bar if you did it right):
=IFERROR(INDEX($B$2:$B$22,MATCH(9^99,SEARCH($B$2:$B$22,A2))),"Keep")
If there are more than one match, you will get the last match with the above formula. If you wish in that case to return the first formula, you will have to use --ISNUMBER around the search function, use 1 for the first parameter of MATCH and use exact match (i.e. use 0/FALSE for the 3rd parameter of MATCH.
Of course, you can use COUNTIF(A2,"*"&B$2:$B$22&"*") instead in that case for the inner part of the formula instead of --ISNUMBER(...).

For all who is interested: this formula does the trick:
=IFERROR(LOOKUP(9.99E+307,SEARCH(B$2:B$22,A2),B$2:B$22),"Keep")

Related

Lookups with Multiple Non-Exact Criteria using INDEX-MATCH - Problem finding nearest values that best meet conedition

I am trying to make lookups with Multiple Non-Exact Criteria using INDEX-MATCH.
The formual looks like this:
=INDEX(C314:C318;MATCH(1;(D314:D318>=G313)*(E314:E318>=G314);0))
Criterias are: greater or equal to amount X.
Formula works fine, however when using a long list of values, it does not find the best matching value, it finds the first value that matches the criteria.
For example.
Condition 1 is: code "find code equal to 2055516"
Condition 2 is: numerical "find value equal or above 77"
Condition 3 is: alphabetical "find letter equal or greater than H"
In a large dataset where I´ve got many values, it finds only the next best value that matches this criteria. First value that meet that condition would be "80" and "R", however, following values in my dataset, way below, meet much better those criteria with "78" and "I". Problem here is clear I guess.
How can I adapt my formula to look for those much more fitting values that meet my condtions?
Dataset table looks like this:
The formula should return the name "A, B, C, D, E" of the best maching product.
I used a helper column called Helper to rank the letters in Condition 2 alphabetically first using the following formula (drag it down to apply to all rows):
=COUNTIF(Condition_2,"<="&Condition_2)
then use the following formula to find the best match (although it is an array formula it does not need to be confirmed by Ctrl+Shift+Enter):
=INDEX(Product1,MATCH(AGGREGATE(15,6,Helper/((Condition_1>=77)/(Condition_2>="H")),1),Helper,0))
Replace the named ranges in the above formula with the actual ranges in your worksheet.
Replace , with ; as the delimiter in all formulas to suit your system.
EDIT #2
Based on the new scenario, the problem can be solved by AGGREGATE function solely given that the look up value is a number (EAN)
The formula in Cell J2 of my above example is:
=AGGREGATE(15,6,EAN/((DIMENSION=F2)/(LOAD_INDEX>=G2)/(SPEED_INDEX>=H2)),1)
Please note the following are all named ranges which needs to be replaced with the actual range on your workbook:
DIMENSION being B2:B8
LOAD_INDEX being C2:C8
SPEED_INDEX being D2:D8
EAN being A2:A8
If you do not want to show the error #NUM! for no matching result, you can use IFERROR to return a blank cell as shown in Cell J3 of my example. The formula is:
=IFERROR(AGGREGATE(15,6,EAN/((DIMENSION=F3)/(LOAD_INDEX>=G3)/(SPEED_INDEX>=H3)),1),"")
EDIT #3
Please use the following array formula (need to confirm by pressing Ctrl+Shift+Enter) to find the closest match of LOAD INDEX and SPEED INDEX with the help of a Helper column.
{=INDEX(EAN,MATCH(AGGREGATE(15,6,Helper/((LOAD_INDEX/((DIMENSION=G2)/(LOAD_INDEX>=H2)/(SPEED_INDEX>=I2)))=AGGREGATE(15,6,LOAD_INDEX/((DIMENSION=G2)/(LOAD_INDEX>=H2)/(SPEED_INDEX>=I2)),1)),1),Helper/((LOAD_INDEX/((DIMENSION=G2)/(LOAD_INDEX>=H2)/(SPEED_INDEX>=I2)))=AGGREGATE(15,6,LOAD_INDEX/((DIMENSION=G2)/(LOAD_INDEX>=H2)/(SPEED_INDEX>=I2)),1)),0))}
The logic is to first find the closest matches to LOAD INDEX and then find the closest match to SPEED LIMIT from the range with the closest matches to LOAD INDEX.
Again if you do not want to show #NUM! error for no matching result, you can use IFERROR to return the desired result.
Let me know if there is any question. Cheers :)
This might be done through something much smarter, but the following worked for me:
Formula in G2, following the above sample data to return the row with the best match:
=INDEX(A2:A6,MATCH(SMALL(IF(B2:B6-E2>-1,B2:B6-E2+IF(CODE(C2:C6)-CODE(F2)>-1,CODE(C2:C6)-CODE(F2),""),""),1),IF(B2:B6-E2>-1,B2:B6-E2+IF(CODE(C2:C6)-CODE(F2)>-1,CODE(C2:C6)-CODE(F2),""),""),0))
Note Enter as array formula through Ctrl+Shift+Enter
When no criteria matches, it will return an error, you could catch through IFERROR().

Match function parts in Excel 2016

I have this function:
MATCH(1,(PositionParameter[[#All],[Position Revised]]=$C94)asterisk(PositionParameter[[#All],[Campus Type Short]]=G$3)asterisk(PositionParameter[[#All],[Campus Num Arbitrary]]=G$1),0))
and I can't figure out what it does. I don't know what the asterisks are for. PositionParameter is the name of the worksheet, Position Revised is the name of a column, Campus Type Short is the name of a column, and Campus Num Arbitrary is the name of a column. There is suppose to be an asterisk between the first PositionParameter() and the second PositionParameter(). There is supposed to be another asterisk between the second PositionParameter() and the third PositionParameter(), but it is rendered as an italic. I took the asterisk out and spelled it out. The tooltip tells me this is suppose to return some sort of array, but I can't figure out its components. Can someone explain the asterisks to me? I would appreciate it.
Thanks,
Howard Hong
Your formula returns a single value - the relative position of the first row in the data where all three conditions are met.
It works like this:
Each of these three conditional statements:
PositionParameter[[#All],[Position Revised]]=$C94
PositionParameter[[#All],[Campus Type Short]]=G$3
PositionParameter[[#All],[Campus Num Arbitrary]]=G$1
.....returns an array of TRUE/FALSE values. Multiplying these three arrays together produces a single array of 1/0 values, 1 when all conditions are met in a row, 0 otherwise. This array forms the "lookup array" of the MATCH function
The "lookup value" is 1 so that value is looked up in the lookup array and the result of the MATCH function is the position of the first 1, which corresponds to the first row where all conditions are satisfied.
If there are no rows which meet all three conditions then the result is #N/A
Note that the zero at the end is the third parameter of the MATCH function - zero menas that an exact match must be found.
This is an "array formula" which needs to be confirmed with CTRL+SHIFT+ENTER
Often you would use this in conjunction with INDEX function to return a value from another column in the first row where conditions are satisfied, e.g. using normal cell references
=INDEX(A:A,MATCH(1,(B:B="x")*(C:C="y"),0))
That formula will return the value from column A in the first row where the two specified conditions are met (col B = "x"and col C = "y")
Well, asterisk could be a multiplication symbol or it could be a wildcard in Match. By the looks of the placement, I'd say it's multiplying data from an array or table.
And, um... I don't know what the asterisks are for but I took the asterisk out and spelled it out? Why would you do that? Was it working before you changed it? Where did you find this formula?
Please read [mcve]. Without sample data or other information about the purpose of the formula, I will take a wild guess:
Paste this into the cell:
=MATCH(1,(PositionParameter[[#All],[Position Revised]]=$C94)*(PositionParameter[[#All],[Campus Type Short]]=G$3)*(PositionParameter[[#All],[Campus Num Arbitrary]]=G$1),0))
. . . and assuming it's supposed to be an array, instead of hitting Enter on that cell:
hit: Ctrl+Shift+Enter to create an array formula.
Besides the link above, here is some other reading & practice for you:
Create an array formula
MATCH function
I think certains applications replace certain symbols (that aren't allowed in the application] with words when copying and pasting from Excel to them, but without more information about what happened, I can't say for sure what happened.
Assuming that the * are real and that the formula is entered as an array formula then it should return an array of 0s and 1s.
The formula is looking for Position Revised=C94 AND Campus Type Short =G3 AND Campus Num Arbitrary = G1
It will return a 1 for each row that matches all these conditions and a 0 for each row that does not.
If no rows match the conditions it will return #N/A

Using substrings within an INDEX MATCH function

I'm trying to use an INDEX MATCH function to automate filling one spreadsheet using data from another. However, the sheet I am filling has people's names listed in two separate cells (e.g. "John" in A1, "Doe" in B1), while the sheet I am filling from has them listed as "DOE, JOHN Q." in one cell. A regular INDEX MATCH function would never give a result since no cell in the domain contains just the string "Doe." Is there any way to perform the function for cells that have the lookup value within the cell as a substring?
For reference, this is my attempt at this problem, which of course returns an error:
=INDEX(Sheet2!G2:Sheet2!G2340,MATCH(B3,ISNUMBER(SEARCH(B3,Sheet2!C2:Sheet2!C2340)),0))
I think you're on the right track. I think your initial formula should work but you need to make two changes:
Match criteria should be "TRUE" instead of "B3", as the result iof the ISNUMBER() function is a TRUE or FALSE
When entering the formula, press CTRL+SHIFT+ENTER instead of just ENTER, to make this an array function
Formula:
=INDEX(Sheet2!G2:Sheet2!G2340,MATCH(TRUE,ISNUMBER(SEARCH(B3,Sheet2!C2:Sheet2!C2340)),0))
If you wanted to search for both the first and last name, you could multiply two ISNUMBER() arrays together, and search for 1 instead of TRUE. This would get it to match both A3 and B3:
=INDEX(Sheet2!G2:Sheet2!G2340,MATCH(1,ISNUMBER(SEARCH(A3,Sheet2!C2:Sheet2!C2340))*ISNUMBER(SEARCH(B3,Sheet2!C2:Sheet2!C2340)),0))
Alternately, you could attempt to match both first and last names together with:
=INDEX(Sheet2!G2:Sheet2!G2340,MATCH(TRUE,ISNUMBER(SEARCH(B3&", "&A3,Sheet2!C2:Sheet2!C2340)),0))
Hope this helps. If I missed something, let me know!
You may concatenate the strings and provide that as an argument for MATCH or INDEX function. Still it would not work precisely as you do not have second names/initials like
MATCH(B1 & ", "& A1, Sheet2!C2:C2340, -1)
Although, you need to provide more details to get anything more from the StackOverflow community.

Excel MATCH character limit

I use the following formula =INDEX(Dict!A:A,MATCH(A2,Dict!A:A,0),1) but MATCH only works with text below 256 characters. Any way to overcome this limitiation?
To accommodate partial matches use SEARCH like this:
=INDEX(Dict!A:A,MATCH(TRUE,INDEX(ISNUMBER(SEARCH(A2,Dict!A:A)),0),0))
That will work to return a value > 256 characters but A2 can't be > 256 characters
You can use the LDMP look-up method, taking advantage of the concat formula available since the Excel 2016.
={CONCAT(IF(Value=A:A;B:B;"")}
Note that it is a matrix formula so you must enter it CTRL + SHIFT + ENTER.
Additionally the formula returns not only the first value but all the matching values.
There is another way to do this, if you don't mind the messiness of a helper column, and your original formula is not being repeated in subsequent rows (i.e. matching cells A3, A4, A5...). The helper column can of course be hidden to keep DICT looking pretty.
Insert (or use) a column (say B:B) next to A:A in DICT!, and populate it with a simple formula "=A1=SHEET1!A$2" (SHEET1 being the name of your source/original sheet), which will populate the column with TRUE and FALSE values, indicating which rows (if any) in DICT match to A2.
The match syntax then changes from "MATCH(A2,Dict!A:A,0)" to "MATCH(TRUE,Dict!B:B,0)".
Note: I know this works in principle as I have just done it, but if I added a typo in retrofitting it to the example provided, apologies, however the principle should be easy to follow.

Matching two columns in one sheet to two columns in another

I need to assign a status to a row based on a VLOOKUP query between two worksheets. The problem is that the identifier is not always unique. However, the identifier + a date value should be unique. I wanted to use:
=VLOOKUP(A3&H3,'OtherSheet'!D:E,1,FALSE)
with A3 being the identifier and H3 being the corresponding date. D in the other sheet is the identifier and E is the date column. However, I keep getting #N/A.
Does this mean that there are no matches with the "identifier+date" or is Excel looking for "identifier+date" in either column D or E? If the latter is true, how can I let Excel concatenate D and E when matching to the search pattern?
There's work around without using CTRL+Shift+Enter.
Use this formula that will match A3 in D column of othersheet and H3 with the date in column E of the othersheet.
=INDEX(OtherSheet!F:F,MATCH(1,INDEX((OtherSheet!D:D=A3)*(OtherSheet!E:E=H3),),0))
The formula will return data from F column of OtherSheet.
You can modify the range OtherSheet!F:F as appropriate.
That formula is looking to find A3 concatenated with H3 (identifier&date) in OtherSheet ColumnD that contains only identifiers, so will inevitably fail. Yes, Excel is looking for “identifier+date” in column D.
Excel will happily concatenate A3 with H3 ‘on the fly’ (within a formula) but will not so happily concatenate OtherSheet ColumnD and ColumnE values in the same way. The conventional solution, because usually simplest in a case like this, is to prepare for the VLOOKUP by adding a helper column that concatenates the D and E values while preserving these in the same row as the value sought.
Because VLOOKUP will only look to the right this is usually a column that is added to the left of the value being searched for, so say either in C or by insertion of a column immediately to the right of C. However, since you are only checking a single column the location is not critical. You might add this (in OtherSheet) as ColumnZ, with a formula such as:
=D2&E2
copied down to suit*. Again because you are only checking a single column it does not matter which row such a formula is placed in.
However, because only checking whether A3&H3 exists in OtherSheet a simple alternative may be to apply COUNTIFS:
=COUNTIFS(OtherSheet!D:D,A3,OtherSheet!E:E,H3)
Any result other than 0 from this should indicate that the combination being tested for exists in OtherSheet – without need for a helper column.
* Depending on the format of your identifiers it is possible that concatenation may introduce ambiguity. For example ID90 concatenated with 11/1/15 may not be distinguishable from ID901 concatenated with 1/1/15, so it may be advisable if taking this approach to introduce a delimiter, in both the VLOOKUP formula (say A3&"|"&H3 rather than just A3&H3) and therefore also in the helper column, say =D2&"|"&E2.
You likely would want to use Index/Match instead. Vlookup is tricky when it comes to searches for multiple things. Here's the way you would use Index/Match:
Without knowing how your spreadsheet is set up, here's how you could do it:
If I understand correctly, you want to use A3 to find the match in OtherSheet!D, and H3's match in OtherSheet!E. Index match is perfect for this. Instead of vLookup, use
=Index(OtherSheet!D:D&","&Text(OtherSheet!E:E,"mm-dd-yyyy"),Match(A3&H3,OtherSheet!D&OtherSheet!E,0)), and enter with CTRL+SHIFT+ENTER.
What the Index() will return is the concatenated Identifier and Date, separated with a comma. If, though, you have a table like this:
That index/match formula will return "Batman". The index to return is the named range G2:G5. You're looking for a match on A1 (the Identifier) and B1 (the Date), then you're searching for (in the order you just put) the Identifier to be in the range E2:E5, and the Date to be in F2:F5. When there's a match for both, it returns the name in G2:G5.
Here's a link to a site on using Index/Match, and another and its advantages over vlookup.

Resources