Compare data in two cells to look for a partial match

Compare data in two cells to look for a partial match - excel

This is probably quite a simple script to write, but my coding is terrible.
I have a column with A list of surnames, and column B with file names that should contain the surname somewhere in the string.
i.e.
A B
Smith 10 0950 Smith 10101950
Jones 10 0955 Jones 10051942
Thomas 10 1008 Thoma 01051972
I need to check that the spelling of the surname in column A is found within the string in column B, and highlight the row where there is a mismatch.
In this instance the third line for Thomas would be highlighted.
There should be no more than 1000 rows to be checked.
I cannot find a piece of code anywhere that will do this for me.
Thanks for any assistance you can offer.

dim numrows as long
numrows = Cells.find("*", [A1], , , xlByRows, xlPrevious).Row
'loop over all rows
for i = 1 to numrows
'instr function returns 0 if string 2 is not found within string 1. in this case, if the value in col A, is not found within col B
if instr(1, cells(i,2).formulaR1C1, Cells(i,1).formulaR1C1) = 0 then
Cells(i,1).entireRow.interior.color = RGB(255,255,0) 'highlight row
end if
next
Welcome to SO, you're correct, it was simple. however please do note that according to the help page on "on topic answers" - "Questions asking for code must demonstrate a minimal understanding of the problem being solved. Include attempted solutions, why they didn't work, and the expected results." do keep that in mind in future posts :)
also, don't forget to mark as answer if this works for you :)

Related

VLOOKUP with conditions

I have an issue at the moment which I'm not able to resolve even with multiple combinations of If and Vlookups. I'm not doing this right.
I have a sheet which has the names of the products and an empty column for the Sl Number. The Sl number needs to be retrieved from Sheet 2 if it matches the value in the adjacent cell of the formula (This I know can be possible with Vlookup). However, I am trying to display the value even if the match is not exact. By that I mean if the product name has all the values as on the sheet 1 but also has additional information in brackets, then the value should still be displayed.
Sheet 1
Formula in A2 - A7 = "=VLOOKUP(B2, Sheet2!B:E, 2, 0)"
Sheet 2
The complete data
Is this possible?
Thanks in advance.
Apologies, I'm new here and not sure how this works. So trying to do the right thing but may take some time.
Thanks Frank and Tim. I have another extended question to this.
Is there a way to retrieve the value by ignoring text in brackets on the lookup cell itself?
For example:
Sheet 1
Sl Number Name
123454 Cream SPF 30+ 50g
**NA** Bar Chocolate 70g X 6 (Sample)
234256 Hand Wash 150ml
26786 Toothpaste - Whitening 110g
Sheet 2
ID Name Sl number Manufacturer Quantity
8 Collagen Essence 10ml 456788 AL 87
9 Hand Wash 150ml 234256 AD 23
10 Bar Chocolate 70g X 6 835424 AU 234
Row 2 on Sheet 1 has the name that includes (Sample) and the same product on sheet 2 does not contain the (Sample) for that product. Is there a way I can use lookup in the above scenario?
Thank you

Tim's comment
=VLOOKUP(B2 & "*", Sheet2!B:E, 2, 0) as long as the "Extra" info is tagged onto the end of the name, and none of your product names is a
substring of another product name. – Tim Williams 53 mins ago
Will get what you are looking for, as for getting rid of text between "(...)" use
=IFERROR(IF(FIND("(",A2),LEFT(A2,FIND("(",A2)-1),A2),A2)
To create a new column that will cut out anything that has parentheses "(...)" this presumes that all of your entries has the "(...)" at the end, i.e. far right side.
As you are new, I presume you might be interested in an explanation. I'll explain what Tim and I did. If I am incorrect, anyone is free to edit.
Based on your question, it would appear that you are familiar with Excel but not the site. This said, my understanding of the key difference between your attempt and Tim's was =VLOOKUP(B2 & "*", Sheet2!B:E, 2, 0) or specifically & "*". This introduces a Wildcard to the search parameter. So if you typed "Bob" but the actual reference was "Bob's Burger" That "*" would allow ['s Burger] to be included as part of the possible search given that you set vLookup to search for Approximate rather than exact matches. =VLOOKUP(B2 & "*", Sheet2!B:E, 2, 0) specifically , 0).
As for my part, IFERROR is effectively an catch-all for errors in IF functions. If there is a error, then X. In this case, if it does not find "(" in the cell, then it will throw an error. Since it is an error, display the original cell.
As for IF(FIND("(",A2),LEFT(A2,FIND("(",A2)-1),A2) It asks Excel to look for "(" in the cell A2, if it finds it, then it it counts from the LEFT until it finds the "(" and deletes the text one space to the left of the first "(". Thus removing the "(...)".

Excel non-uniform data extraction

I've had a really hard time tracking down a solution for this--though I'm sure it's out there. Just not sure of the exact wording to get what I'm looking for.
I have a huge data set where some of the data is missing information so it is not uniform. I want to extract just the name into one column and the e-mail in to the next column.
The best way I can narrow this down is there is a space between each unique entry with the name always being in the first box.
Example:
John Doe
John Doe's Company
(555) 555-5555
John.doe#johndoe.com
John Doe
(555) 555-5555
John Doe
Jane Doe's Company
John.doe#johndoe.com
With the results wanted being (in two excel columns):
John Doe | john.doe#johndoe.com
John Doe |
John Doe | john.doe#johndoe.com
Any suggestions on the best way to do this would be appreciated it. To make it complicated if there was no e-mail I would want to ignore that set completely, but I could just manually check.

VBA coding:
1. Indicate in Row1 the initial row where the data begins.
2. Place a flag in this case the word "end" to indicate the end of the information.
3. Create a second sheet
Sub ToList()
Row1 = 1 'Row initial from data
Row2 = 1 'Row initial to put list
Do
Name = False
Do
field = Trim(Sheets(1).Cells(Row1, 1))
If field <> "" And LCase(field) <> "end" And Not Name Then
Sheets(2).Cells(Row2, 1) = field
Name = True
End If
Row1 = Row1 + 1
Loop Until (IIf(field = "" Or LCase(field) = "end", True, False))
fieldprev = Sheets(1).Cells(Row1 - 2, 1)
If InStr(fieldprev, "#") > 0 Then
Sheets(2).Cells(Row2, 2) = fieldprev
End If
Row2 = Row2 + 1
Loop Until (IIf(LCase(field) = "end", True, False))
End Sub

Extracting the e-mail address shouldn't be too difficult as you just need to is search for a string containing the # character. A series of search() and mid() functions can be used to separate out the individual words. Search for each instance of a space and use that value in a mid() function. Then search for # in the results and you should find the e-mail address. Extracting the name will be more difficult if the original data is very messy.
However I second the comment above about using an external script, especially for a large dataset. Excel isn't really designed for the sort of thing you are describing here.

Need formula for excel, to subtract the number "9" to each number individually and

I want you to have some fun. I need something specific.
First i must explain what i do. I use a simple codification for product prices at retail store, because i dont want people know the real price for themselves. So i change the original numbers to another subtracting the number 9 for each number.
Normally I manually write down all the prices with this codification for every product.
So.. for example number 10 would be 89. (9-1 = 8) and (9-0 = 9)
Other examples:
$128 = 871
$75 = 24
$236 = 763
$9 = 0
Finally i put 2 number nines (9) at the beginning of the codified price also, to confuse people who might think that number could be the price.
So the examples i used before are like this:
99871 (means $128)
9924 (means $75)
99763 (means $236)
990 (means $9)
Remember that i need 2 (two) nines before the real price. The real prices never start with 0 so, the nines at the beginning exist only to confuse people.
Ok. So, now that you understand, here comes the 2nd part.
I have an excel whith hundreds of my products added, with prices, description, etc. And i decided it is time to use a printer and start to print this information from excel. I have a software to do that, but first i need to have the codified prices in the excel also.
The fun part begins when i want to convert the real prices that are already written in my excel document into a new column AUTOMATICALLY. So that way i don´t have to type again all the prices in codified form for the old and new items i add in the future.
Can someone help me with this? Is it even possible?
I tried with =A1-9999 but, it works well with 2 character number only. Because if the real price is 5, i will get 3 nines: 9994(code). And if the price is 234 i will get only 1 nine 9765(code). And it is a condition i need to have the TWO nines at first.
Thank you very much in advanced!

Though you have requested for formula , I am suggesting VBA program which seems to me very convenient.
You have to open VBE and insert a module and copy the program. Change the code lines wherever indicated to suit your requirements for sheets etc.
Sub NumberCode()
Dim c As Range
Dim LR As Integer
Dim numProbs As Long
Dim sht As Worksheet
Dim s As Integer
Dim v As Long
Dim v1 As Long
Set sht = Worksheets("Sheet1") ' change as per yr requirement
numProbs = 0
LR = sht.Cells(Rows.Count, "A").End(xlUp).Row
For Each c In sht.Range("A1:A" & LR).Cells
s = Len(c)
v = c.Value
v1 = 99
For s = 1 To Len(c)
v1 = v1 & (9 - Mid(c, s, 1))
Next
c.Offset(0, 1).Value = v1
v1 = 99
numProbs = numProbs + 1
Next
MsgBox "Number coding finished"
End Sub
Sample sheet of results is appended below.

I will be using helper cells but you could dump it all into one cell if you want since you are only dealing with 4 characters.
For the purpose of this example, I am assuming your original price list starts in B11.
=IFERROR(9-MID($B11,COLUMN(A1),1),"")
Place that in D11 and copy to the right three more times so you have it from D11 to G11. That formula strips off 1 character from your price and subtracts that character from 9. When you go the next column it repeats itself. If you do not have that many characters, it will return "".
In C11 you will build your number based on the adjacent 4 columns using this formula:
="99"&D11&E11&F11&G11
It places 99 in front then adds the numbers from the adjacent 4 columns.
Select cells C11 to G11 and copy and paste downward beside your data column as far as you need to go.
An alternate more concise method would be:
=REPT(9,LEN(B11)+2)-B11

Perhaps I'm missing something, though simply:
=REPT(9,2+LEN(A1))-A1
seems good to me.
Regards

How can I lookup data from one column, when the value I'm referencing changes columns?

I want to do an INDEX-MATCH-like lookup between two documents, except my MATCH's index array doesn't stay in one column.
In Vague-English: I want a value from a known column that matches another value that may be found in any column.
Refer to the image below. Let's call everything to the left of the bold vertical line on column H doc1, and the right side will be doc2.
Doc2 has a column "Find This", which will be the INDEX's array. It is compared with "ID1" from doc1 (Note that the values in "Find This" will not be in the same order as column ID1, but it's easier to undertsand this way).
The "[Result]" column in doc2 will be the value from doc1's "Want This" column from the row that matches "FIND THIS" ...However, sometimes the value from "FIND THIS" is not in the "ID1" column, and is instead in "ID2","ID3", etc.
So, I'm trying to generate Col K from Col J. This would be like pressing Ctrl+F and searching for a value in Col J, then taking the value from Col D in that row and copying it to Col K.
I made identical values from a column the same color in the other doc to make it easier to visualize where they are coming from.
Note also that in column F of doc1, the same value from doc2's "Find This" can be found after some other text.
Also note that the column headers are only there as examples, the ID columns are not actually numbered.
I would simply hard-code the correct column to search from, but I'm not in control of doc1, and I'm worried that future versions may have new "ID" columns, with other's being removed.
I'd prefer this to be a solution in the form of a formula, but VB will do.

To generate column K based on given values of column J then you could use the following:
=INDEX(doc1!$D$2:$D$14,SUMPRODUCT((doc1!$B$2:$H$14=J2)*ROW(doc1!$B$2:$H$14))-1)
Copy that formula down as far as you need to go.
It basically only returns the row of the where a matching column J is found. we then find that row in the index of your D range to get your value in K.
Proof of concept:
UPDATE:
If you are working with non unique entities n column J. That is the value on its own can be found in multiple rows and columns. Consider using the following to return the Last row where there J value is found:
=INDEX(doc1!$D$2:$D$14,AGGREGATE(14,6,(doc1!$B$2:$H$14=J2)*ROW(doc1!$B$2:$H$14),1)-1)
UPDATE 2:
And to return the first row where what you are looking in column J is found use:
=INDEX($D$2:$D$14,AGGREGATE(15,6,1/($B$2:$H$14=J2)*ROW($B$2:$H$14)-1,1))
Thanks to Scott Craner for the hint on the minimum formula.
To determine if you have UNIQUE data from column J in your range B2:H14 you can enter this array formula. In order to enter an array formula you need to press CTRL+SHFT+ENTER at the same time and not just ENTER. You will know you have done it right when you see {} around your formula in the formula bar. You cannot at the {} manually.
=IF(MAX(COUNTIF($B$2:$H$14,J2:J14))>1,"DUPLICATES","UNIQUE")
UPDATE 3
AGGREGATE - A relatively new function to me but goes back to Excel 2010. Aggregate is 19 functions rolled into 1. It would be nice if they all worked the same way but they do not. I think it is functions numbered 14 and up that will perform the same way an array formula or a CSE formula if you prefer. The nice thing is you do not need to use CSE when entering or editing them. SUMPRODUCT is another example of a regular formula that performs array formula calculations.
The meat of this explanation I believe is what is happening inside of the AGGREGATE brackets. If you click on the link you will get an idea of what the first two arguments are. The first defines which function you are using, and the second tell AGGREGATE how to deal with Errors, hidden rows, and some other nested functions. That is the relatively easy part. What I believe you want to know is what is happening with this:
(doc1!$B$2:$H$14=J2)*ROW(doc1!$B$2:$H$14)
For illustrative purpose lets reduce this formula to something a little smaller in scale that does the same thing. I'll avoid starting in A1 as that can make life a little easier when counting since it the 1st row and first column. So by placing the example range outside of it you can see some more special considerations potentially.
What I want to know is what row each of the items list in Column C occurs in column B
| B | C
3 | DOG | PLATYPUS
4 | CAT | DOG
5 | PLATYPUS |
The full formula for our mini example would be:
{=($B$3:$B$5=C2)*ROW($B$3:$B$5)}
And we are going to look at the following as an array
=INDEX($B$3:$B$5,AGGREGATE(14,6,($B$3:$B$5=C2)*ROW($B$3:$B$5),1)-2)
So the first brackets is going to be a Boolean array as you noted. Every cell that is TRUE will TRUE until its forced into a math calculation. When that happens, True becomes 1 and False becomes 0.I that formula was entered as a CSE formula and place in D2, it would break down as follows:
FALSE X 3
FALSE X 4
TRUE X 5
The 3, 4 and 5 come from ROW() returning the value of the row number that it is dealing with at the time of the array math operation. Little trick, we could have had ROW(1:3). Just need to make sure the size of the array matches! This is not matrix math is just straight across multiplication. And since the Boolean is now experiencing a math operation we are now looking at:
0 X 3 = 0
0 X 4 = 0
1 X 5 = 5
So the array of {0,0,5} gets handed back to the aggregate for more processing. The important thing to note here is that it contains ONLY 0 and the individual row numbers where we had a match. So with the first aggregate formula, formula 14 was chosen which is the LARGE function. And we also told it to ignore errors, which in this particular case does not matter. So after providing the array to the aggregate function, there was a ,1) to finish off the aggregate function. The 1 tells the aggregate function that we want the 1st larges number when the array is sorted from smallest to largest. If that number was 2 it would be the 2nd largest number and so on. So the last row or the only row that something is found on is returned. So in our small example it would be 5.
But wait that 5 was buried inside another function called Index. and in our small example that INDEX formula would be:
=INDEX($B$3:$B$5,AGGREGATE(...)-2)
Well we know that the range is only 3 rows long, so asking for the 5th row, would have excel smacking you up side the head with an error because your index number is out of range. So in comes the header row correction of -1 in the original formula or -2 for the small example and what we really see for the small example is:
=INDEX($B$3:$B$5,5-2)
=INDEX($B$3:$B$5,3)
and here is a weird bit of info, That last one does not become PLATYPUS...it becomes the cell reference to =B5 which pulls PLATYPUS. But that little nuance is a story for another time.
Now in the comments Scott essentially told me to invert for the error to get the first row. And this is important step for the aggregate and it had me running in circles for awhile. So the full equation for the first row option in our mini example is
=INDEX($B$3:$B$5,AGGREGATE(15,6,1/($B$3:$B$5=C2)*ROW($B$3:$B$5),1)-2)
And what Scott Craner was actually suggesting which Skips one math step is:
=INDEX($B$3:$B$5,AGGREGATE(15,6,ROW($B$3:$B$5)/($B$3:$B$5=C2),1)-2)
However since I only realized this after writing this all up the explanation will continue with the first of these two equations
So the important thing to note here is the change from function 14 to function 15 which is SMALL. Think of it a finding the minimum. And this time that 6 plays a huge factor along with the 1/. So our array in the middle this time equates to:
1/FALSE X 3
1/FALSE X 4
1/TRUE X 5
Which then becomes:
1/0 X 3
1/0 X 4
1/1 X 5
Which then has excel slapping you up side the head again because you are trying to divide by 0:
#div/0 X 3
#div/0 X 4
1/1 X 5
But you were smart and you protected yourself from that slap upside the head when you told AGGREGATE to ignore error when you used 6 as the second argument/reference! Therefore what is above becomes:
{5}
Since we are performing a SMALL, and we passed ,1) as the closing part of the AGGREGATE, we have essentially said give me the minimum row number or the 1st smallest number of the resulting array when sorted in ascending order.
The rest plays out the same as it did for the LARGE AGGREGATE method. The pitfall I fell into originally is I did not use the 1/ to force an error. As a result, every time I tried getting the SMALL of the array I was getting 0 from all the false results.
SUMPRODUCT works in a very similar fashion, but only works when your result array in the middle only returns 1 non zero answer. The reason being is the last step of the SUMPRODUCT function is to all the individual elements of the resulting array. So if you only have 1 non zero, you get that non zero number. If you had two rows that matched for instance 12 and 31, then the SUMPRODUCT method would return 43 which is not any of the row numbers you wanted, where as aggregate large would have told you 31 and aggregate small would have told you 12.

Something like this maybe, starting in K2 and copied down:
=IFERROR(INDEX(D:D,MAX(IFERROR(MATCH(J2,B:B,0),-1),IFERROR(MATCH(J2,E:E,0),-1),IFERROR(MATCH(J2,G:G,0),-1),IFERROR(MATCH(J2,H:H,0),-1))),"")
If you want to keep the positions of the columns for the Match variable, consider creating generic range names for each column you want to check, like "Col1", "Col2", "Col3". Create a few more range names than you think you will need and reference them to =$B:$B, =$E:$E etc. Plug all range names into Match functions inside the Max() statement as above.
When columns are added or removed from the table, adjust the range name definitions to the columns you want to check.
For example, if you set up the formula with five Matches inside the Max(), and the table changes so you only want to check three columns, point three of the range names to the same column. The Max() will only return one result and one lookup, even if the same column is matched several times.

I came up with a vba solution if I understood correctly:
Sub DisplayActiveRange()
Dim sheetToSearch As Worksheet
Set sheetToSearch = Sheet2
Dim sheetToOutput As Worksheet
Set sheetToOutput = Sheet1
Dim search As Range
Dim output As Range
Dim searchCol As String
searchCol = "J"
Dim outputCol As String
outputCol = "K"
Dim valueCol As String
valueCol = "D"
Dim r As Range
Dim currentRow As Integer
currentRow = 1
Dim maxRow As Integer
maxRow = sheetToOutput.UsedRange.Rows.Count
For currentRow = 1 To maxRow
Set search = Range("J" & currentRow)
For Each r In sheetToSearch.UsedRange
If r.Value <> "" Then
If r.Value = search.Value Then
Set output = sheetToOutput.Range(outputCol & currentRow)
output.Value = sheetToSearch.Range(valueCol & currentRow).Value
currentRow = currentRow + 1
Set search = sheetToOutput.Range(searchCol & currentRow)
End If
End If
Next
Next currentRow
End Sub

There might be better ways of doing it, but this will give you what you want. We assume headers in both "source" and "destination" sheets. You will need to adapt the "Const" declarations according to how your sheets are named. Press Control & G in Excel to bring up the VBA window and copy and paste this code into "This Workbook" under the "VBA Project" group, then select "Run" from the menu:
Option Explicit
Private Const sourceSheet = "Source"
Private Const destSheet = "Destination"
Public Sub FindColumns()
Dim rowCount As Long
Dim foundValue As String
Sheets(destSheet).Select
rowCount = 1 'Assume a header row
Do While Range("J" & rowCount + 1).value <> ""
rowCount = rowCount + 1
foundValue = FncFindText(Range("J" & rowCount).value)
Sheets(destSheet).Select
Range("K" & rowCount).value = foundValue
Loop
End Sub
Private Function FncFindText(value As String) As String
Dim rowLoop As Long
Dim colLoop As Integer
Dim found As Boolean
Dim pos As Long
Sheets(sourceSheet).Select
rowLoop = 1
colLoop = 0
Do While Range(alphaCon(colLoop + 1) & rowLoop + 1).value <> "" And found = False
rowLoop = rowLoop + 1
Do While Range(alphaCon(colLoop + 1) & rowLoop).value <> "" And found = False
colLoop = colLoop + 1
pos = InStr(Range(alphaCon(colLoop) & rowLoop).value, value)
If pos > 0 Then
FncFindText = Mid(Range(alphaCon(colLoop) & rowLoop).value, pos, Len(value))
found = True
End If
Loop
colLoop = 0
Loop
End Function
Private Function alphaCon(aNumber As Integer) As String
Dim letterArray As String
Dim iterations As Integer
letterArray = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
If aNumber <= 26 Then
alphaCon = (Mid$(letterArray, aNumber, 1))
Else
If aNumber Mod 26 = 0 Then
iterations = Int(aNumber / 26)
alphaCon = (Mid$(letterArray, iterations - 1, 1)) & (Mid$(letterArray, 26, 1))
Else
'we deliberately round down using 'Int' as anything with decimal places is not a full iteration.
iterations = Int(aNumber / 26)
alphaCon = (Mid$(letterArray, iterations, 1)) & (Mid$(letterArray, (aNumber - (26 * iterations)), 1))
End If
End If
End Function

Match items in list against themselves for semi-uniqueness

I'm really just looking for some kind of tool that will check for close approximations of duplicates in a column of data. For instance, say I have a column of data with addresses as such:
113 James Way
3448 Harlon Circle
5888 Murray Rd
3448 Harlon Cr.
In this case entry 2 and 4 would be very close to unique and I would like some kind of tool, either in excel or standalone, that would notify me if rows are being duplicated or approximately duplicated. I have no idea how to even search for something like this. I tried searches for fuzzy match tools and the like but nothing is quite what I need. Thanks,

There are several ways to approach
One simple method is to write a Levenshtein function to compare these addressed with each other and highlight low values
Assume you have the data setup as follows
Raw example
Sub FindClosestMatch()
Range("B3").Select
Dim mystrings()
Range("B3").Select
Range(Selection, Selection.End(xlDown)).Select
mystrings = Selection.Value
i = 0
Dim string1 As String, string2 As String
Range("C3").Select
For i = LBound(mystrings) To UBound(mystrings)
string1 = mystrings(i, 1)
For j = 1 To 4
string2 = mystrings(j, 1)
ActiveCell.Value = Levenshtein(string1, string2)
ActiveCell.Offset(0, 1).Select
Next
Range("c3").Offset(i, 0).Select
Next
End Sub
How to read values
For e.g 113 James Way 0 15 13 12 means the string has a score of
0 (exact match) with itself
15 with 3448 Harlon Circle
13 With 5888 Murray Rd
12 with 3448 Harlon Cr.
etc
The Macro just compares every address with other address and finds the Levenshtein distance
The lower the number the closest match they are and clearly 0 is exact match when it compares to itself
This macro assumes you have copied the Levenshtein function into your VBA Module

It really depends on how accurate you need it to be and what kind of close matches you want it to catch. If you want to catch typos it'd be a lot harder. But if you're mainly looking to catch St vs Street you could do a vlookup on the left(address, #) or something. Might have to toy with the # to get a good response. # needs to be higher then the number of digits in the street numbers (4/5?) but small enough to catch things like 1 dry ct. I'd guess 7-8.
Basically your addresses are in column A (assuming starting in A2 with headers). Column B says = left(a2,8)
A2 is obviously unique cause it's first.
Start in C3 with =vlookup(left(a3,8),$B$2:B2,1,0)
It'll print an error for all the unique entries and an address for the dupilcates. To make it cleaner you can add an if(iserror()) with
=if(iserror(vlookup(left(a3,8),$B$2:B2,1,0), "", vlookup(left(a3,8),$B$2:B2,1,0))

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string