Search for 2nd occurrence of a string in VBA - excel

How do I search for the 2nd occurrence of a string within another string in VBA?
For example, in the string below, the word "test" occurs twice:
this is a test to test

Simply do the search twice:
Sub Demo()
    Dim DataString As String
    Dim SearchString As String
    Dim i As Long
    DataString = "this is a test to test"
    SearchString = "test"
    ' Find the first occurrence
    i = InStr(1, DataString, SearchString, vbTextCompare)
    ' Search again, starting one character after the first match (0 means not found)
    If i > 0 Then i = InStr(i + 1, DataString, SearchString, vbTextCompare)
    Debug.Print "Second occurrence starts at position " & i
End Sub
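The same "search again" idea extends to any occurrence number by repeating InStr in a loop. Below is a minimal sketch; the function name NthInStr and its allowOverlap parameter are illustrative, not part of the original answer. It returns 0 when there are fewer than n matches.

```vba
' Hypothetical helper: position of the n-th occurrence of sFind in sText (0 if none).
' allowOverlap = True restarts one character after each match ("dd" in "ddd" -> 1, 2);
' allowOverlap = False restarts just after the end of each match.
Function NthInStr(ByVal sText As String, ByVal sFind As String, ByVal n As Long, _
                  Optional ByVal allowOverlap As Boolean = True) As Long
    Dim pos As Long, k As Long, stepLen As Long
    If Len(sFind) = 0 Or n < 1 Then Exit Function   ' returns 0
    If allowOverlap Then stepLen = 1 Else stepLen = Len(sFind)
    pos = InStr(1, sText, sFind, vbTextCompare)
    For k = 2 To n
        If pos = 0 Then Exit For                    ' fewer than n matches
        pos = InStr(pos + stepLen, sText, sFind, vbTextCompare)
    Next k
    NthInStr = pos
End Function
```

For example, Debug.Print NthInStr("this is a test to test", "test", 2) prints 19. InStr simply returns 0 when the start position is past the end of the string, so no extra bounds check is needed inside the loop.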

Update
To find just the last occurrence, use InStrRev:
MsgBox InStrRev("this is a test to test", "test")
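Along the same lines, the second-to-last occurrence can be found by running InStrRev again on only the text to the left of the last match. A sketch, with a hypothetical function name; truncating with Left$ means the two matches cannot overlap:

```vba
' Hypothetical helper: start position of the second-to-last occurrence (0 if none).
Function SecondToLast(ByVal sText As String, ByVal sFind As String) As Long
    Dim lastPos As Long
    lastPos = InStrRev(sText, sFind)
    ' Search only the text before the last match, so the two matches cannot overlap
    If lastPos > 1 Then SecondToLast = InStrRev(Left$(sText, lastPos - 1), sFind)
End Function
```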
Initial answer (a little ungainly, but it handles 1 or 0 occurrences):
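Sub SecondOccurrence()
    Dim strIn As String   ' the text to find
    Dim strOut As String  ' the text to search in
    Dim lngPos As Long
    Dim lngPos2 As Long
    strIn = "test"
    strOut = "this is a test to test"
    lngPos = InStr(strOut, strIn)
    If lngPos > 0 Then
        ' Search again, starting one character after the first match
        lngPos2 = InStr(lngPos + 1, strOut, strIn)
        If lngPos2 > 0 Then
            MsgBox strIn & " at " & lngPos2
        Else
            MsgBox "Only one " & strIn
        End If
    Else
        MsgBox "No " & strIn
    End If
End Sub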

To find the 2nd occurrence of a string, use the InStr function twice. Note that the substring you search for could be a repeating character sequence like "dd". In that case you have to decide whether searching for the second "dd" in "abcdddd" should return 5 or 6. That is, do you want to start searching for the second occurrence of "dd" at the end of the first occurrence, or at the second character of the first occurrence?
Private Sub ExampleFind2ndOccurrence()
    Dim intFirst As Integer, intSecond As Integer
    Dim searchThisString As String: searchThisString = "abcdddddefg"
    Dim forThisSubString As String: forThisSubString = "dd"
    ' Find the first occurrence of forThisSubString
    intFirst = InStr(1, searchThisString, forThisSubString, vbTextCompare)
    If intFirst = 0 Then
        Debug.Print "No occurrence found"
        Exit Sub
    End If
    ' Find the second occurrence, allowing it to overlap the first
    intSecond = InStr(1, Mid(searchThisString, intFirst + 1), forThisSubString, vbTextCompare)
    If intSecond > 0 Then intSecond = intFirst + intSecond
    Debug.Print "2nd occurrence occurs at character position "; intSecond
    ' Alternate method to find the second occurrence of forThisSubString in the
    ' case where there can be no overlap
    intSecond = InStr(1, Mid(searchThisString, intFirst + Len(forThisSubString)), forThisSubString, vbTextCompare)
    If intSecond > 0 Then intSecond = intFirst + Len(forThisSubString) - 1 + intSecond
    Debug.Print "Prohibiting overlap, 2nd occurrence occurs at character position "; intSecond
End Sub
Find the Nth occurrence of a substring:
Public Function InStr2(ByVal IntStartPosition As Long _
, ByVal Str As String _
, ByVal SubStr As String _
, Optional IntCompareMethod As Integer = vbTextCompare _
, Optional IntOccurrence As Integer = 1 _
, Optional BlnOverlapOK As Boolean = False) As Long
' Find the IntOccurrence-th instance of SubStr in Str
' Parameters:
' IntStartPosition (Long): the character position at which to start searching.
' (See docs for InStr)
' Str (String): the string to search. (See docs for InStr)
' SubStr (String): the substring to find in Str. (See docs for InStr)
' IntCompareMethod (Integer): a VBA compare enumeration value. (See docs for InStr)
' IntOccurrence (Integer): the number of instances of SubStr for which to search
' BlnOverlapOK (Boolean): is it okay for the Nth occurrence of SubStr to overlap the
' (N-1)th occurrence?
' Returns the position (in the original Str) of the IntOccurrence-th instance of SubStr,
' or 0 if there are fewer than IntOccurrence instances.
Dim s As String
Dim intCharPos As Long
Dim cnt As Integer
Dim i As Long
' Initialize
If IntOccurrence < 1 Then Exit Function
If IntStartPosition < 1 Then IntStartPosition = 1
' Search only the part of Str from IntStartPosition onwards
Str = Mid(Str, IntStartPosition)
intCharPos = 1
cnt = 0
Do While intCharPos <= Len(Str) And cnt < IntOccurrence
    s = Mid(Str, intCharPos)
    i = InStr(1, s, SubStr, IntCompareMethod)
    If i = 0 Then
        InStr2 = 0
        Exit Function
    End If
    cnt = cnt + 1
    If BlnOverlapOK Or Len(SubStr) = 1 Or cnt = IntOccurrence Then
        ' Next search starts one character after this match
        intCharPos = intCharPos + i
    Else
        ' Next search starts just after the end of this match
        intCharPos = intCharPos + i + Len(SubStr) - 1
    End If
Loop
If cnt < IntOccurrence Then
    ' Ran out of text before finding enough instances
    InStr2 = 0
Else
    ' intCharPos is one past the match start; convert back to a position in the original Str
    InStr2 = intCharPos - 1 + IntStartPosition - 1
End If
End Function
Example of finding nth occurrence of substring:
Private Sub InStr2Example()
Dim i As Integer
Dim searchThisString As String: searchThisString = "abcddddddd"
'1234567890
Dim forThisSubString As String: forThisSubString = "dd"
i = InStr2(1, searchThisString, forThisSubString, vbTextCompare, 3, True)
Debug.Print "3rd occurrence occurs at character position "; i
i = InStr2(1, searchThisString, forThisSubString, vbTextCompare, 3, False)
Debug.Print "Prohibiting overlap, 3rd occurrence occurs at character position "; i
End Sub

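You will need to find where the first occurrence starts and then offset the search range accordingly.
A nested Mid/InStr expression will do the trick. Note that it returns the text from the second occurrence onwards rather than its position, and it raises a run-time error if there is no second occurrence:
Dim x As String, fVal As String, y As String
x = "test this is a test"
fVal = "test"
' Skip past the first occurrence, then take everything from the next occurrence onwards
y = Mid$(Mid$(x, InStr(x, fVal) + Len(fVal)), InStr(Mid$(x, InStr(x, fVal) + Len(fVal)), fVal))
Debug.Print y ' prints "test"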

A flexible function to find any occurrence using Split() could be:
Function GetPosition(ByVal FullText As String, ByVal SearchString As String, ByVal occurrence As Long, Optional ByVal CaseSensitive As Boolean = False) As Long
'Purpose: get start position of a given search occurrence within fulltext
'[0]case sensitive? (case insensitive by default)
If Not CaseSensitive Then
FullText = LCase(FullText): SearchString = LCase(SearchString)
End If
'[1]split fulltext into substrings
Dim part: part = Split(FullText, SearchString) ' split fulltext
If occurrence < 1 Then Exit Function ' accept only positive occurrences
If occurrence > UBound(part) Then Exit Function ' reject occurrences beyond the last match
'[2]add substrings plus searchstring lengths
Dim i As Long, n As Long ' counters
For i = 0 To occurrence - 1
n = n + Len(part(i)) ' add part lengths
Next
n = n + (occurrence - 1) * Len(SearchString) + 1
'[3]return search position of wanted occurrence
GetPosition = n
End Function
Example call
Sub Test()
Dim s As String: s = "this is a test to test to test" ' (three occurrences of "test")
Dim i As Long
For i = 1 To 4
Debug.Print "Occurrence " & i, "starts at position " & GetPosition(s, "tEst", i)
Next
End Sub

Related

How to count length of a word from a sentence and print the word(s) in the cell?

I want to make a function that extracts all words with length = 2 from a sentence. For example, if the sentence is "The Cat is brown", I want the result in the cell to be "is". If there are multiple words with length = 2, I want to keep all of them. I have tried MID, RIGHT, LEFT, etc., but these do not work because the position of the word is not always the same.
I have no clue how to do this in VBA; any suggestions are welcome :)
Thanks
I have made you a UDF which should work for what you want. You use it like so:
=ExtractWords(Cell to check, Optional number of letters)
By default it checks for 2-letter words, but you can specify a different length as shown above.
Here is the code. Place it in a standard module:
Function ExtractWords(Cell As Range, Optional NumOfLetters As Integer)
Dim r As String, i As Long, CurrentString As String, FullString As String, m As String
If NumOfLetters = 0 Then NumOfLetters = 2
r = Cell.Value
For i = 1 To Len(r)
m = Mid(r, i, 1)
If Asc(UCase(m)) >= 65 And Asc(UCase(m)) <= 90 Or m = "-" Or m = "'" Then 'Accepts hyphen or single quote as part of the word
CurrentString = CurrentString & m
If i = Len(r) Then GoTo CheckLastWord
Else
CheckLastWord:
If Len(CurrentString) = NumOfLetters Then
If FullString = "" Then
FullString = CurrentString
Else
FullString = FullString & " " & CurrentString 'Change space if want another delimiter
End If
End If
CurrentString = ""
End If
Next i
If FullString = "" Then
ExtractWords = "N/A" 'If no words are found to contain the length required
Else
ExtractWords = FullString
End If
End Function
There are probably other ways to do it that may be easier or more efficient. This is just something I came up with.
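One possibly simpler alternative is sketched below, assuming words are separated by spaces and that only the punctuation listed in the code should be ignored: replace those marks with spaces, Split on spaces, and keep the pieces of the requested length. The name WordsOfLength is hypothetical. Unlike ExtractWords, it counts digits and any unlisted symbols as part of a word.

```vba
' Hypothetical UDF: space-separated words of exactly wordLen characters,
' or "N/A" when there are none (same convention as ExtractWords above).
Function WordsOfLength(ByVal sentence As String, Optional ByVal wordLen As Long = 2) As String
    Dim p As Variant, w As Variant, result As String
    ' Treat these punctuation marks as spaces (assumed list; extend as needed)
    For Each p In Array(",", ".", ";", ":", "!", "?", "(", ")", """")
        sentence = Replace(sentence, p, " ")
    Next p
    ' Consecutive spaces produce empty pieces, which never match a wordLen >= 1
    For Each w In Split(sentence, " ")
        If Len(w) = wordLen Then
            If Len(result) > 0 Then result = result & " "
            result = result & w
        End If
    Next w
    If Len(result) = 0 Then result = "N/A"
    WordsOfLength = result
End Function
```

In a worksheet it is used the same way: =WordsOfLength(A1) or =WordsOfLength(A1, 3).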
Double Upper Case Occurrences
In Excel you can e.g. use it like this:
=getDUC(A1)
=getDUC(A1," ")
=getDUC(A1,",")
=getDUC(A1,"-")
The Code
Option Explicit
' In Excel:
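Function getDUC( _
    ByVal s As String, _
    Optional ByVal Delimiter As String = ", ") _
As String
    Dim arr As Variant
    arr = DoubleUCaseToArray(s)
    ' DoubleUCaseToArray returns Empty when nothing is found, and Join would fail on it
    If IsArray(arr) Then getDUC = Join(arr, Delimiter)
End Function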
' In VBA:
Sub testDoubleUCaseToArray()
Dim CCodes As Variant: CCodes = Array("US,UKUs", "UkUS,UK", "kUSUKsUK")
Dim arr As Variant
Dim n As Long
For n = LBound(CCodes) To UBound(CCodes)
arr = DoubleUCaseToArray(CCodes(n))
Debug.Print Join(arr, ",")
Next n
End Sub
''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
' Purpose: From a specified string, returns all unique double upper case
' occurrences in a 1D (zero-based) array.
' Remarks: From the string 'USUk' it returns only 'US' (not `SU`).
''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
Function DoubleUCaseToArray( _
ByVal s As String) _
As Variant
If Len(s) > 1 Then
With CreateObject("Scripting.Dictionary")
Dim cFirst As String * 1
Dim cSecond As String * 1
Dim n As Long
For n = 1 To Len(s) - 1
cFirst = Mid(s, n, 1)
If cFirst Like "[A-Z]" Then
cSecond = Mid(s, n + 1, 1)
If cSecond Like "[A-Z]" Then
.Item(cFirst & cSecond) = Empty
End If
n = n + 1
End If
Next n
If .Count > 0 Then
DoubleUCaseToArray = .Keys
End If
End With
End If
End Function

Change specific characters in just part of a cell

I'd like to change specific characters in a text: I need to find all ";" characters and change them to "$", but only if they are inside brackets.
There can be a few cases:
example 1
text (it's ok; it's not ok); ok
should be:
text (it's ok$ it's not ok); ok
example 2
text (it's ok; it's not ok); ok (tekst;) ok
should be:
text (it's ok$ it's not ok); ok (tekst$) ok
example 3
text (it's ok; (ok;)it's not ok);
should be:
text (it's ok$ (ok$)it's not ok);
So I've started with code that replaces strings, but the problem is that my code also replaces characters after ")", and I don't want that.
Should I somehow count the characters that need to be changed, set a start position, and change only those? But then it will not work if there are more brackets.
What would be the best way to do it?
Sub Removetext()
s = Range("A1").Value
Dim start As Integer
Dim end1 As Integer
start = InStr(1, s, "(")
end1 = InStr(1, s, ")")
s = Left(s, start - 1) & Replace(s, ";", "$", start)
End Sub
Here's one way, avoiding regexp as I don't use it enough to be able to remember how to do it without a lot of effort.
Sub x()
Dim i As Long, lb As Long, rb As Long, r As Range
Range("A1:A4").Copy Range("B1")
For Each r In Range("B1:B4")
    ' Reset the bracket counters for each cell
    lb = 0: rb = 0
    For i = 1 To Len(r.Value)
        If Mid(r, i, 1) = "(" Then lb = lb + 1
        If Mid(r, i, 1) = ")" Then rb = rb + 1
        ' Replace ";" only while more "(" than ")" have been seen, i.e. inside brackets
        If Mid(r, i, 1) = ";" Then
            If lb > 0 And rb < lb Then r.Characters(i, 1).Text = "$"
        End If
    Next i
Next r
End Sub
Replace Between Chars (UDF)
If there are more occurrences of LeftChar (opening parenthesis) than of RightChar (closing parenthesis) to the left of FindChar, FindChar is replaced with ReplaceChar.
In Excel, for a string in A1 you can use it like this:
=RBC(A1)
In VBA see below.
The Code
Option Explicit
Function RBC( _
ByVal aString As String, _
Optional ByVal LeftChar As String = "(", _
Optional ByVal RightChar As String = ")", _
Optional ByVal FindChar As String = ";", _
Optional ByVal ReplaceChar As String = "$") _
As String
If Len(aString) > 0 Then
Dim Parts() As String
Parts = Split(aString, FindChar)
Dim Result As String
Result = Parts(0)
Dim cLen As Long
Dim lCount As Long
Dim rCount As Long
Dim n As Long
For n = 1 To UBound(Parts)
cLen = Len(Result)
lCount = cLen - Len(Replace(Result, LeftChar, ""))
rCount = cLen - Len(Replace(Result, RightChar, ""))
If lCount > rCount Then
Result = Result & ReplaceChar & Parts(n)
Else
Result = Result & FindChar & Parts(n)
End If
Next n
RBC = Result
End If
End Function
Sub testRBC()
Const aString As String = "(it's ok; (ok;)it's not; ok);"
Debug.Print RBC(aString)
End Sub
' Results in column B:
Sub testRBC2()
Dim rng As Range
Set rng = Range("A1:A100000")
Dim Data As Variant
Data = rng.Value
Dim i As Long
For i = 1 To UBound(Data, 1)
Data(i, 1) = RBC(Data(i, 1))
Next i
rng.Offset(, 1).Value = Data
End Sub

Which method to separate a long message?

I am facing a problem when receiving a long message like the one below:
40=1.22.50=0.002.60=35.
The system uses the dot as the separator, but numeric values also contain decimal points.
The desired output is
40=1.22
50=0.002
60=35
I am currently formatting the message manually. I hope there is a better way to do this.
Assuming each element contains one "." as its decimal point and is followed by another "." that separates it from the next element, you can use the code below to read all values of the long string into an array (Nums is the name of the array).
Option Explicit
Sub Seperate_DecimNumbers()
Dim Nums As Variant
Dim FullStr As String
Dim DotPosition As Integer
Dim i As Integer
' init array size to a large size , will redim it at the end to number of elements found
ReDim Nums(1 To 100)
FullStr = "40=1.22.50=0.002.60=35."
i = 1 ' init array elements counter
Do Until Len(FullStr) = 0
' call FindN function , searching for the 2nd "."
DotPosition = FindN(FullStr, ".", 2)
' unable to find 2 "." in the string >> last element in the array
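If DotPosition = 0 Then
' last element: drop a trailing "." separator, if any
If Right(FullStr, 1) = "." Then FullStr = Left(FullStr, Len(FullStr) - 1)
Nums(i) = FullStr
Exit Do
Else ' was able to find 2 "." in the string
Nums(i) = Left(FullStr, DotPosition - 1)
End If
i = i + 1
FullStr = Right(FullStr, Len(FullStr) - DotPosition)
Loop
' if the loop ended because the string was used up, the last slot is still empty
If IsEmpty(Nums(i)) Then i = i - 1
' redim array back to the number of elements found in the string
ReDim Preserve Nums(1 To i)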
' place output start location from Range A2 and below (till number of elements in the array)
Range("A1").Offset(1, 0).Resize(UBound(Nums), 1).Value = Application.Transpose(Nums)
End Sub
Function FindN(sInputString As String, sFindWhat As String, N As Integer) As Integer
' this function find the Nth position of a certain character in a string
Dim J As Integer
Application.Volatile
FindN = 0
For J = 1 To N
FindN = InStr(FindN + 1, sInputString, sFindWhat)
If FindN = 0 Then ' unable to find the 2nd "." >> last appearance
Exit For
End If
Next
End Function
Here's my take on the answer, which splits on the "=" rather than on the ".". Doing it this way allows for input such as 40=1.22.50=0.002.60=35.70=120. (i.e. the part to the right of an "=" does not have to contain a "."; it could be an integer).
Sub SplitDotEqual()
Dim s As String
Dim a() As String
Dim i As Integer
Dim d As Integer
'Read from A1
s = Range("A1").Value
'Make sure the last field is terminated by a "." (without doubling an existing one)
If Right(s, 1) <> "." Then s = s & "."
'Split on the "="
a = Split(s, "=")
For i = 0 To UBound(a) - 1
'Put the "=" back
a(i) = a(i) & "="
'Find the last "." before the next "="
d = InStrRev(a(i + 1), ".")
'Append everything prior to the "."
a(i) = a(i) & Left(a(i + 1), d - 1)
'Write to A2:Ax
Cells(i + 2, 1).Value = a(i)
'Strip off everything prior to the ".",
'leaving just the stuff prior to the "="
a(i + 1) = Mid(a(i + 1), d + 1)
Next
End Sub
Let's assume that every other dot is a separator. This code changes the even-numbered dots (the 2nd, 4th, ...) into pipes and then parses on the pipes:
Sub parser()
Dim FlipFlop As Boolean, dot As String, pipe As String
Dim s As String, L As Long, i As Long, CH As String
Dim msg As String, ary As Variant, a As Variant
dot = "."
pipe = "|"
s = Range("A1").Value
L = Len(s)
FlipFlop = True
For i = 1 To L
    CH = Mid(s, i, 1)
    If CH = dot Then
        ' Odd-numbered dots are decimal points; even-numbered dots become pipes
        If Not FlipFlop Then
            Mid(s, i, 1) = pipe
        End If
        FlipFlop = Not FlipFlop
    End If
Next i
msg = s & vbCrLf
ary = Split(s, pipe)
For Each a In ary
    msg = msg & vbCrLf & a
Next a
MsgBox msg
End Sub
Got a message closer to the real one, and the code only partially works:
8=TEST.1.2.9=248.35=D.49=MMUIJ.56=FGTUH.34=32998.50=MMTHUJ.57=AY/ABCDE.52=20161216-07:58:07.11=00708991.1=A-12345-
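For a message like this one, where field values can themselves contain dots, a different rule may work better: treat a "." as a separator only when it is immediately followed by a numeric tag and "=". A minimal sketch using the late-bound VBScript.RegExp object (so no library reference is needed); it assumes every field starts with "<digits>=" and that the message contains no line feeds, since vbLf is used as a temporary delimiter. The function name is hypothetical.

```vba
' Hypothetical helper: split "tag=value.tag=value." into a zero-based array of "tag=value"
Function SplitOnTagDots(ByVal msg As String) As Variant
    Dim re As Object
    Set re = CreateObject("VBScript.RegExp")
    re.Global = True
    ' A "." followed by one or more digits and "=" starts a new field
    re.Pattern = "\.(?=\d+=)"
    ' Drop one trailing "." separator, if present
    If Right$(msg, 1) = "." Then msg = Left$(msg, Len(msg) - 1)
    SplitOnTagDots = Split(re.Replace(msg, vbLf), vbLf)
End Function
```

For "40=1.22.50=0.002.60=35." this gives 40=1.22, 50=0.002 and 60=35. For the longer message the first field comes out as 8=TEST.1.2, because none of its inner dots is followed by a "<digits>=" tag.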

Finding position in a String

I want to find the position of a substring in a string, but I am facing some issues. Here is the code:
Function findPos( Searchval As String, Output As String) As Long
Dim pos, i, count As Long
pos = InStr(1, content, searchVal, 0)
If pos = 0 Then
count = 0
Else
count = 1
End If
If pos > 0 Then
For i = 1 To pos
If Mid(content, i, 1) = "/" Then count = count + 1
Next i
End If
findPos=count
End Function
For example, if Output is "AA/AE_ABC/AE/CD" and searchVal is "AE", I get 2 as the position, which is wrong: I should get 3. I know that pos in the code has to be modified somehow, but I can't figure out how.
If you just want to find the position of the string, then use this:
Sub Sample()
Debug.Print findPos("AE", "AA/AE_ABC/AE/CD")
End Sub
Function findPos(Searchval As String, Output As String) As Long
findPos = InStr(1, Output, Searchval, 0)
End Function
BTW, the position is 4 and not 3.
Edit: If you are looking for the position among the "/"-separated parts, then try this:
Sub Sample()
Debug.Print findPos("AE", "AA/AE_ABC/AE/CD")
End Sub
Function findPos(Searchval As String, Output As String) As Long
Dim MyAr
Dim i As Long
'~~> Check if output has "/"
If InStr(1, Output, "/", 0) Then
'~~> Split it and store it in an array
MyAr = Split(Output, "/")
'~~> Loop through the array to find an exact match
For i = LBound(MyAr) To UBound(MyAr)
If MyAr(i) = Searchval Then
findPos = i + 1
Exit Function
End If
Next i
Else
'~~> Check if both Searchval and Output are same
If Output = Searchval Then findPos = 1
End If
End Function
Something like this should work for you, commented for clarity:
Function findPos(ByVal strFind As String, _
ByVal strContent As String, _
Optional ByVal sDelimiter As String = "/") As Long
'strFind is the substring you're searching for
'strContent is the string you're looking in for strFind
'By default sDelimiter is '/' but it can be specified as something else
Dim varSection As Variant
Dim i As Long
'Check if strFind exists in strContent by itself with the delimiter
If InStr(1, sDelimiter & strContent & sDelimiter, sDelimiter & strFind & sDelimiter, vbTextCompare) > 0 Then
'It exists, loop through delimited sections of strContent to return the position
For Each varSection In Split(strContent, sDelimiter)
i = i + 1 'Increase section count
If StrComp(varSection, strFind, vbTextCompare) = 0 Then 'Check for match (case-insensitive, like the InStr check above)
'Match found, return position and exit for loop
findPos = i
Exit For
End If
Next varSection
Else
'No match found, return 0
findPos = 0
End If
End Function

Find near-duplicates of comma-separated lists using Levenshtein distance [duplicate]

This question already has an answer here:
Potential Duplicates Detection, with 3 Severity Level
(1 answer)
Closed 8 years ago.
This question is based on the answer to my question from yesterday.
To solve my problem, Jean-François Corbett suggested a Levenshtein distance approach. Then I found this code somewhere to get a Levenshtein distance percentage:
Public Function GetLevenshteinPercentMatch( _
ByVal string1 As String, ByVal string2 As String, _
Optional Normalised As Boolean = False) As Single
Dim iLen As Integer
If Normalised = False Then
string1 = UCase$(WorksheetFunction.Trim(string1))
string2 = UCase$(WorksheetFunction.Trim(string2))
End If
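iLen = WorksheetFunction.Max(Len(string1), Len(string2))
If iLen = 0 Then ' both strings empty: treat as a full match and avoid dividing by zero
GetLevenshteinPercentMatch = 1
Exit Function
End If
GetLevenshteinPercentMatch = (iLen - LevenshteinDistance(string1, string2)) / iLen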
End Function
'********************************
'*** Compute Levenshtein Distance
'********************************
Public Function LevenshteinDistance(ByVal s As String, ByVal t As String) As Integer
Dim d() As Integer ' matrix
Dim m As Integer ' length of t
Dim N As Integer ' length of s
Dim i As Integer ' iterates through s
Dim j As Integer ' iterates through t
Dim s_i As String ' ith character of s
Dim t_j As String ' jth character of t
Dim cost As Integer ' cost
' Step 1
N = Len(s)
m = Len(t)
If N = 0 Then
LevenshteinDistance = m
Exit Function
End If
If m = 0 Then
LevenshteinDistance = N
Exit Function
End If
ReDim d(0 To N, 0 To m) As Integer
' Step 2
For i = 0 To N
d(i, 0) = i
Next i
For j = 0 To m
d(0, j) = j
Next j
' Step 3
For i = 1 To N
s_i = Mid$(s, i, 1)
' Step 4
For j = 1 To m
t_j = Mid$(t, j, 1)
' Step 5
If s_i = t_j Then
cost = 0
Else
cost = 1
End If
' Step 6
d(i, j) = WorksheetFunction.Min( _
d(i - 1, j) + 1, d(i, j - 1) + 1, d(i - 1, j - 1) + cost)
Next j
Next i
' Step 7
LevenshteinDistance = d(N, m)
End Function
What I have now is code that finds exact duplicates in one column:
Sub FindExactDuplicates()
' (Sub line added; the name is assumed because the original snippet omitted it)
Dim duplicate(), i As Long
Dim delrange As Range, cell As Long
Dim shtIn As Worksheet, Shtout As Worksheet
Dim numofrows As Long
Dim x As Long
Set shtIn = ThisWorkbook.Sheets("process")
Set Shtout = ThisWorkbook.Sheets("output")
x = 2
Set delrange = shtIn.Range("h1:h30000") 'set your range here
ReDim duplicate(0)
'search duplicates in column H
For cell = 1 To delrange.Cells.Count
    If Application.CountIf(delrange, delrange(cell)) > 1 Then
        ReDim Preserve duplicate(i)
        duplicate(i) = delrange(cell).Address
        i = i + 1
    End If
Next
If i = 0 Then
    MsgBox ("No Duplicates Found!")
    Exit Sub
End If
numofrows = i 'number of duplicate cells found
'print duplicates
For i = UBound(duplicate) To LBound(duplicate) Step -1
    Shtout.Cells(x, 1).EntireRow.Value = shtIn.Range(duplicate(i)).EntireRow.Value
    x = x + 1
Next i
MsgBox (numofrows & " " & "Potential Duplicates Found")
End Sub
I think it would be nice to combine these two pieces of code, but Levenshtein distance compares two strings, so they can't simply work together.
I'm stuck here because I have no idea how to proceed; every reference I have read only talks about comparing two strings.
The criterion is simple: a row is detected as a duplicate if the Levenshtein distance percentage is above 90%.
What must I change in this code?
I'm glad my earlier answer was useful to you. You didn't like having to represent each of your possible attributes by one-character symbols...
OK, as I tried to point out to you in the comments there, it is possible to adapt the Levenshtein distance algorithm to look not at each character in a string, but at each element of an array instead, and to do comparisons based on that. In fact, it's quite straightforward to make this change:
Before 'Step 1, convert your comma-separated strings into arrays like this:
Dim sSplit() As String
Dim tSplit() As String
sSplit = Split(s, ",")
tSplit = Split(t, ",")
Then replace these four lines of code
N = Len(s)
m = Len(t)
s_i = Mid$(s, i, 1)
t_j = Mid$(t, j, 1)
with these
N = UBound(sSplit) + 1
m = UBound(tSplit) + 1
s_i = sSplit(i - 1)
t_j = tSplit(j - 1)
The + 1 and - 1 are there because Split returns a zero-based array.
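Putting those changes together, an element-wise version might look like the sketch below. It is named LevenshteinDistanceList (a hypothetical name) so it can sit next to the character-based LevenshteinDistance, takes the delimiter as an optional parameter, and uses a small inline minimum instead of WorksheetFunction.Min so it does not depend on Excel. Elements are compared exactly (case-sensitive, no trimming).

```vba
' Levenshtein distance over delimited elements instead of characters:
' each insertion, deletion or substitution of a whole element costs 1.
Public Function LevenshteinDistanceList(ByVal s As String, ByVal t As String, _
                                        Optional ByVal Delimiter As String = ",") As Long
    Dim sSplit() As String, tSplit() As String
    Dim d() As Long
    Dim n As Long, m As Long, i As Long, j As Long, cost As Long, best As Long
    sSplit = Split(s, Delimiter)        ' zero-based; Split of "" gives UBound = -1
    tSplit = Split(t, Delimiter)
    n = UBound(sSplit) + 1
    m = UBound(tSplit) + 1
    If n = 0 Then LevenshteinDistanceList = m: Exit Function
    If m = 0 Then LevenshteinDistanceList = n: Exit Function
    ReDim d(0 To n, 0 To m)
    For i = 0 To n: d(i, 0) = i: Next i
    For j = 0 To m: d(0, j) = j: Next j
    For i = 1 To n
        For j = 1 To m
            If sSplit(i - 1) = tSplit(j - 1) Then cost = 0 Else cost = 1
            best = d(i - 1, j) + 1                                               ' deletion
            If d(i, j - 1) + 1 < best Then best = d(i, j - 1) + 1                ' insertion
            If d(i - 1, j - 1) + cost < best Then best = d(i - 1, j - 1) + cost  ' substitution
            d(i, j) = best
        Next j
    Next i
    LevenshteinDistanceList = d(n, m)
End Function
```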
Example usage:
?LevenshteinDistance("valros,helmet,42","valros,helmet,42")
0
?LevenshteinDistance("valros,helmet,42","knight,helmet")
2
?LevenshteinDistance("helmet,iron,knight","plain,helmet")
3
Note that 0 means the two strings are identical. You don't need separate code to deal with this.
With the above you should be able to complete your task.
One more note: the Damerau–Levenshtein distance may be a more relevant algorithm for you than the Levenshtein distance. The difference is that, in addition to insertion, deletion and substitution, the D-L distance also considers transposition of two adjacent characters. It's up to you to decide.
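If transpositions matter, a common and simple variant is the optimal string alignment (OSA) distance, sometimes called the restricted Damerau-Levenshtein distance: it adds one extra check after Step 6 for two swapped adjacent characters. It is not identical to the full Damerau-Levenshtein distance (for example, OSA gives 3 for "ca" vs "abc", where full D-L gives 2). A character-based sketch with a hypothetical function name:

```vba
' Optimal string alignment distance (restricted Damerau-Levenshtein):
' insertion, deletion, substitution, plus swapping two adjacent characters,
' with no substring edited more than once.
Public Function OsaDistance(ByVal s As String, ByVal t As String) As Long
    Dim d() As Long
    Dim n As Long, m As Long, i As Long, j As Long, cost As Long, best As Long
    n = Len(s): m = Len(t)
    If n = 0 Then OsaDistance = m: Exit Function
    If m = 0 Then OsaDistance = n: Exit Function
    ReDim d(0 To n, 0 To m)
    For i = 0 To n: d(i, 0) = i: Next i
    For j = 0 To m: d(0, j) = j: Next j
    For i = 1 To n
        For j = 1 To m
            If Mid$(s, i, 1) = Mid$(t, j, 1) Then cost = 0 Else cost = 1
            best = d(i - 1, j) + 1                                               ' deletion
            If d(i, j - 1) + 1 < best Then best = d(i, j - 1) + 1                ' insertion
            If d(i - 1, j - 1) + cost < best Then best = d(i - 1, j - 1) + cost  ' substitution
            ' Transposition: s(i-1..i) equals t(j..j-1), i.e. two adjacent characters swapped
            If i > 1 And j > 1 Then
                If Mid$(s, i, 1) = Mid$(t, j - 1, 1) And Mid$(s, i - 1, 1) = Mid$(t, j, 1) Then
                    If d(i - 2, j - 2) + 1 < best Then best = d(i - 2, j - 2) + 1
                End If
            End If
            d(i, j) = best
        Next j
    Next i
    OsaDistance = d(n, m)
End Function
```

The same element-wise change described above (Split instead of Mid$) can be applied here if you want transpositions of whole comma-separated attributes rather than characters.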
SOLVED!!!
Sub duplicate_separation()
Dim duplicate As Variant, I As Long
Dim vaData As Variant
Dim vadata2 As Variant
Dim delrange As Range, lRow As Long
Dim delrange2 As Range
Dim shtIn As Worksheet, Shtout As Worksheet
Dim numofrows1
Dim j As Long
Set shtIn = ThisWorkbook.Sheets("process")
Set Shtout = ThisWorkbook.Sheets("output")
With shtIn.UsedRange 'set your range here
Set delrange = shtIn.Range("b1").Resize(.Row + .Rows.Count - 1)
End With
vaData = delrange.Value
ReDim duplicate(1 To 1, 1 To 1)
'search duplicates in 2nd column
For lRow = 1 To UBound(vaData, 1)
'choose the parameter
'1. detect potential duplicate data for similiarity above 70%
If FuzzyCount(LookupValue:=CStr(vaData(lRow, 1)), TableArray:=delrange, NFPercent:=0.7, Algorithm:=4) > 1 Then
I = I + 1
ReDim Preserve duplicate(1 To 1, 1 To I)
duplicate(1, I) = delrange(lRow).Address
End If
Next lRow
Shtout.Cells(1, 1).Resize(1, 7).Value = _
Array("Material Number", "Short Description", "Manufacturer", "Material Part Number", "Old Material Number", "Long Description", "sorted ShortDesc")
If I = 0 Then
MsgBox ("No Duplicates Found!")
Else
'print duplicates
MsgBox (I & " " & "Potential Duplicates Found")
' copy each flagged row; the flagged rows are not necessarily contiguous
For j = 1 To I
Shtout.Cells(j + 1, 1).EntireRow.Value = shtIn.Range(duplicate(1, j)).EntireRow.Value
Next j
End If
End Sub
Fuzzy VLookup functions, by Alan:
Private Function NormaliseKey(ByVal String1 As String) As String
NormaliseKey = Replace(UCase$(String1), " ", "")
End Function
Function FuzzyCount(ByVal LookupValue As String, _
ByVal TableArray As Range, _
Optional NFPercent As Single = 0.05, _
Optional Algorithm As Variant = 3) As Long
'**********************************************************************
'** Simple count of (Fuzzy) Matching strings >= NFPercent threshold **
'**********************************************************************
Dim lMatchCount As Long
Dim rCur As Range
Dim sString1 As String
Dim sString2 As String
'** Normalise lookup value **
sString1 = LCase$(Application.Trim(LookupValue))
For Each rCur In Intersect(TableArray.Resize(, 1), Sheets(TableArray.Parent.Name).UsedRange)
'** Normalise current Table entry **
sString2 = LCase$(Application.Trim(CStr(rCur)))
If sString2 <> "" Then
If FuzzyPercent(String1:=sString1, _
String2:=sString2, _
Algorithm:=Algorithm, _
Normalised:=False) >= NFPercent Then
lMatchCount = lMatchCount + 1
End If
End If
Next rCur
FuzzyCount = lMatchCount
End Function
Function FuzzyPercent(ByVal String1 As String, _
ByVal String2 As String, _
Optional Algorithm As Variant = 3, _
Optional Normalised As Boolean = False) As Single
'*************************************
'** Return a % match on two strings **
'*************************************
Dim bSoundex As Boolean
Dim bBasicMetaphone As Boolean
Dim intLen1 As Integer, intLen2 As Integer
Dim intCurLen As Integer
Dim intTo As Integer
Dim intPos As Integer
Dim intPtr As Integer
Dim intScore As Integer
Dim intTotScore As Integer
Dim intStartPos As Integer
Dim lngAlgorithm As Long
Dim sngScore As Single
Dim strWork As String
bSoundex = LCase$(CStr(Algorithm)) = "soundex"
bBasicMetaphone = LCase$(CStr(Algorithm)) = "metaphone"
'-------------------------------------------------------
'-- If strings havent been normalised, normalise them --
'-------------------------------------------------------
If Normalised = False Then
If bSoundex Or bBasicMetaphone Then
String1 = NormaliseStringAtoZ(String1)
String2 = NormaliseStringAtoZ(String2)
Else
String1 = LCase$(Application.Trim(String1))
String2 = LCase$(Application.Trim(String2))
End If
End If
'----------------------------------------------
'-- Give 100% match if strings exactly equal --
'----------------------------------------------
If String1 = String2 Then
FuzzyPercent = 1
Exit Function
End If
'If bSoundex Then
' String1 = Soundex(Replace(String1, " ", ""))
' String2 = Soundex(Replace(String2, " ", ""))
' If String1 = String2 Then
' FuzzyPercent = msngSoundexMatchPercent
' Else
' FuzzyPercent = 0
' End If
' Exit Function
'ElseIf bBasicMetaphone Then
' String1 = Metaphone1(String1)
' String2 = Metaphone1(String2)
' If String1 = String2 Then
' FuzzyPercent = msngMetaphoneMatchPercent
' Else
' FuzzyPercent = 0
' End If
' Exit Function
'End If
intLen1 = Len(String1)
intLen2 = Len(String2)
If intLen1 = 0 Or intLen2 = 0 Then
FuzzyPercent = 0
Exit Function
End If
'----------------------------------------
'-- Give 0% match if string length < 2 --
'----------------------------------------
If intLen1 < 2 Then
FuzzyPercent = 0
Exit Function
End If
intTotScore = 0 'initialise total possible score
intScore = 0 'initialise current score
lngAlgorithm = Val(Algorithm)
'--------------------------------------------------------
'-- If Algorithm = 1 or 3, Search for single characters --
'--------------------------------------------------------
If (lngAlgorithm And 1) <> 0 Then
If intLen1 < intLen2 Then
FuzzyAlg1 String1, String2, intScore, intTotScore
Else
FuzzyAlg1 String2, String1, intScore, intTotScore
End If
End If
'-----------------------------------------------------------
'-- If Algorithm = 2 or 3, Search for pairs, triplets etc. --
'-----------------------------------------------------------
If (lngAlgorithm And 2) <> 0 Then
If intLen1 < intLen2 Then
FuzzyAlg2 String1, String2, intScore, intTotScore
Else
FuzzyAlg2 String2, String1, intScore, intTotScore
End If
End If
'-------------------------------------------------------------
'-- If Algorithm = 4,5,6,7, use Levenshtein Distance method --
'-- (Algorithm 4 was Dan Ostrander's code) --
'-------------------------------------------------------------
If (lngAlgorithm And 4) <> 0 Then
If intLen1 < intLen2 Then
' sngScore = FuzzyAlg4(String1, String1)
sngScore = GetLevenshteinPercentMatch(String1:=String1, _
String2:=String2, _
Normalised:=True)
Else
' sngScore = FuzzyAlg4(String2, String1)
sngScore = GetLevenshteinPercentMatch(String1:=String2, _
String2:=String1, _
Normalised:=True)
End If
intScore = intScore + (sngScore * 100)
intTotScore = intTotScore + 100
End If
FuzzyPercent = intScore / intTotScore
End Function
Private Sub FuzzyAlg1(ByVal String1 As String, _
ByVal String2 As String, _
ByRef Score As Integer, _
ByRef TotScore As Integer)
Dim intLen1 As Integer, intPos As Integer, intPtr As Integer, intStartPos As Integer
intLen1 = Len(String1)
TotScore = TotScore + intLen1 'update total possible score
intPos = 0
For intPtr = 1 To intLen1
intStartPos = intPos + 1
intPos = InStr(intStartPos, String2, Mid$(String1, intPtr, 1))
If intPos > 0 Then
If intPos > intStartPos + 3 Then 'No match if char is > 3 bytes away
intPos = intStartPos
Else
Score = Score + 1 'Update current score
End If
Else
intPos = intStartPos
End If
Next intPtr
End Sub
Private Sub FuzzyAlg2(ByVal String1 As String, _
ByVal String2 As String, _
ByRef Score As Integer, _
ByRef TotScore As Integer)
Dim intCurLen As Integer, intLen1 As Integer, intTo As Integer, intPtr As Integer, intPos As Integer
Dim strWork As String
intLen1 = Len(String1)
For intCurLen = 1 To intLen1
strWork = String2 'Get a copy of String2
intTo = intLen1 - intCurLen + 1
TotScore = TotScore + Int(intLen1 / intCurLen) 'Update total possible score
For intPtr = 1 To intTo Step intCurLen
intPos = InStr(strWork, Mid$(String1, intPtr, intCurLen))
If intPos > 0 Then
Mid$(strWork, intPos, intCurLen) = String$(intCurLen, &H0) 'corrupt found string
Score = Score + 1 'Update current score
End If
Next intPtr
Next intCurLen
End Sub
'Private Function FuzzyAlg4(strIn1 As String, strIn2 As String) As Single
'
'Dim L1 As Integer
'Dim In1Mask(1 To 24) As Long 'strIn1 is 24 characters max
'Dim iCh As Integer
'Dim N As Long
'Dim strTry As String
'Dim strTest As String
'
'TopMatch = 0
'L1 = Len(strIn1)
'strTest = UCase(strIn1)
'strCompare = UCase(strIn2)
'For iCh = 1 To L1
' In1Mask(iCh) = 2 ^ iCh
'Next iCh 'Loop thru all ordered combinations of characters in strIn1
'For N = 2 ^ (L1 + 1) - 1 To 1 Step -1
' strTry = ""
' For iCh = 1 To L1
' If In1Mask(iCh) And N Then
' strTry = strTry & Mid(strTest, iCh, 1)
' End If
' Next iCh
' If Len(strTry) > TopMatch Then FuzzyAlg4Test strTry
'Next N
'FuzzyAlg4 = TopMatch / CSng(L1)
'End Function
'Sub FuzzyAlg4Test(strIn As String)
'
'Dim l As Integer
'Dim strTry As String
'Dim iCh As Integer
'
'l = Len(strIn)
'If l <= TopMatch Then Exit Sub
'strTry = "*"
'For iCh = 1 To l
' strTry = strTry & Mid(strIn, iCh, 1) & "*"
'Next iCh
'If strCompare Like strTry Then
' If l > TopMatch Then TopMatch = l
'End If
'End Sub
Public Function GetLevenshteinPercentMatch(ByVal String1 As String, _
ByVal String2 As String, _
Optional Normalised As Boolean = False) As Single
Dim iLen As Integer
If Normalised = False Then
String1 = UCase$(WorksheetFunction.Trim(String1))
String2 = UCase$(WorksheetFunction.Trim(String2))
End If
iLen = WorksheetFunction.Max(Len(String1), Len(String2))
GetLevenshteinPercentMatch = (iLen - LevenshteinDistance(String1, String2)) / iLen
End Function
Private Function NormaliseStringAtoZ(ByVal String1 As String) As String
'---------------------------------------------------------
'-- Remove all but alpha chars and convert to lowercase --
'---------------------------------------------------------
Dim iPtr As Integer
Dim sChar As String
Dim sResult As String
sResult = ""
For iPtr = 1 To Len(String1)
sChar = LCase$(Mid$(String1, iPtr, 1))
If sChar <> UCase$(sChar) Then sResult = sResult & sChar
Next iPtr
NormaliseStringAtoZ = sResult
End Function
'********************************
'*** Compute Levenshtein Distance
'********************************
Public Function LevenshteinDistance(ByVal s As String, ByVal t As String) As Integer
Dim d() As Integer ' matrix
Dim m As Integer ' length of t
Dim N As Integer ' length of s
Dim I As Integer ' iterates through s
Dim j As Integer ' iterates through t
Dim s_i As String ' ith character of s
Dim t_j As String ' jth character of t
Dim cost As Integer ' cost
' Step 1
N = Len(s)
m = Len(t)
If N = 0 Then
LevenshteinDistance = m
Exit Function
End If
If m = 0 Then
LevenshteinDistance = N
Exit Function
End If
ReDim d(0 To N, 0 To m) As Integer
' Step 2
For I = 0 To N
d(I, 0) = I
Next I
For j = 0 To m
d(0, j) = j
Next j
' Step 3
For I = 1 To N
s_i = Mid$(s, I, 1)
' Step 4
For j = 1 To m
t_j = Mid$(t, j, 1)
' Step 5
If s_i = t_j Then
cost = 0
Else
cost = 1
End If
' Step 6
d(I, j) = WorksheetFunction.Min(d(I - 1, j) + 1, d(I, j - 1) + 1, d(I - 1, j - 1) + cost)
Next j
Next I
' Step 7
LevenshteinDistance = d(N, m)
End Function
Everyone, thank you for your help!!
