I am trying to split the contents of a column into various columns. The column has content that looks like this:
3-BW16569*AW34586*AW34587
3- LVA18140 & LVA19222
3-LVA22841
3- JDSC RELOAD
3 - LV1 TO JDSC 6/21
3- LVU21690
3-LVA19520*LVU21739
3- R241974/R241974
The column is not in a particular format but always has different symbols between the elements to separate them. Can a macro code help with this or a excel function. Thank you!
All thanks to alainbryden for the function SplitMultiDelims() . . don't change it, foo() may help you through in how to use it in your problem...
Sub foo()
Dim ws As Worksheet
Dim sizeArr, index As Integer
Dim Arr() As String
Set ws = ThisWorkbook.Worksheets("Sheet1")
Dim str As String
Dim dilimiters As String
Dim str1 As String
dilimiters = " -*" ' provide all of them
str = "3-BW16569*AW34586*AW34587" ' read the string
'str = ws.Cells(1, 1).Value
Debug.Print str
Arr = SplitMultiDelims(str, dilimiters) ' delimit
sizeArr = UBound(Arr) ' get no of different strings you have
For index = 0 To sizeArr Step 1
str1 = Arr(index) ' get the string
Debug.Print str1
' now paste where evere you want
Next
End Sub
Function SplitMultiDelims(ByRef Text As String, ByRef DelimChars As String, _
Optional ByVal IgnoreConsecutiveDelimiters As Boolean = False, _
Optional ByVal Limit As Long = -1) As String()
Dim ElemStart As Long, N As Long, M As Long, Elements As Long
Dim lDelims As Long, lText As Long
Dim Arr() As String
lText = Len(Text)
lDelims = Len(DelimChars)
If lDelims = 0 Or lText = 0 Or Limit = 1 Then
ReDim Arr(0 To 0)
Arr(0) = Text
SplitMultiDelims = Arr
Exit Function
End If
ReDim Arr(0 To IIf(Limit = -1, lText - 1, Limit))
Elements = 0: ElemStart = 1
For N = 1 To lText
If InStr(DelimChars, Mid(Text, N, 1)) Then
Arr(Elements) = Mid(Text, ElemStart, N - ElemStart)
If IgnoreConsecutiveDelimiters Then
If Len(Arr(Elements)) > 0 Then Elements = Elements + 1
Else
Elements = Elements + 1
End If
ElemStart = N + 1
If Elements + 1 = Limit Then Exit For
End If
Next N
'Get the last token terminated by the end of the string into the array
If ElemStart <= lText Then Arr(Elements) = Mid(Text, ElemStart)
'Since the end of string counts as the terminating delimiter, if the last character
'was also a delimiter, we treat the two as consecutive, and so ignore the last elemnent
If IgnoreConsecutiveDelimiters Then If Len(Arr(Elements)) = 0 Then Elements = Elements - 1
ReDim Preserve Arr(0 To Elements) 'Chop off unused array elements
SplitMultiDelims = Arr
End Function
Related
I have the following data
0/3, 1/1, 3/4
1/3, 3/2, 6/2
12/1, 3/6, 3/4
3/4, 0/3, 1/1 'will be considered is duplicate with the first data
Is there any way to find and remove duplicate data like this?
My current method is to split into 3 strings based on "," then check with the following condition.
'I got each String value by mid command.
'FrstStr1: First String of String 1
'SecStr1: Second String of String 1
'ThrStr1: Third String of String 1
'FrstStr2: First String of String 2
'SecStr2: Second String of String 2
'ThrStr2: Third String of String 2
if (FrstStr1 = FrstStr2 and SecStr1 = SecStr2 and ThrStr1 = ThrStr2) or
(FrstStr1 = FrstStr2 and SecStr1 = ThrStr2 and ThrStr1 = SecStr2) or
() or () .... then
I listed 6 possible cases and put them into if condition like above.
Make Array by Spliting data with delimiter comma.
And Sorting Array by function.
Ceck duplicated data by Dictionary.
## Code ##
Sub test()
Dim vR(), vDB
Dim dic As Object
Dim v As Variant
Dim s As String
Dim i As Long, n As Long
Set dic = CreateObject("Scripting.Dictionary")
vDB = Range("a1").CurrentRegion
For i = 1 To UBound(vDB, 1)
v = Split(vDB(i, 1), ",")
s = newArray(v)
If dic.exists(s) Then
Else
dic.Add s, s
n = n + 1
ReDim Preserve vR(1 To n)
vR(n) = vDB(i, 1)
End If
Next i
If n Then
Range("e1").Resize(n) = WorksheetFunction.Transpose(vR)
End If
End Sub
Function newArray(v As Variant)
Dim temp As String
Dim r As Integer, i As Integer, j As Integer
r = UBound(v)
For i = LBound(v) To r - 1
For j = i + 1 To r
v(i) = Trim(v(i))
v(j) = Trim(v(j))
If v(i) > v(j) Then
temp = v(j)
v(j) = v(i)
v(i) = temp
End If
Next j
Next i
newArray = Join(v, ",")
End Function
Image
expoliting Dictionary and ArrayList objects could lead to a very compact (and maintanable) code:
Sub RemoveDuplicatedDataWithoutCountingOrder()
Dim dict As Object: Set dict = CreateObject("Scripting.Dictionary")
Dim j As Long
Dim key As String
Dim datum As Variant, couple As Variant
For Each datum In Range("A1").CurrentRegion.Value
key = vbNullString
With CreateObject("System.Collections.SortedList")
For Each couple In Split(Replace(datum, " ", vbNullString), ",")
.Add couple, 0
Next
For j = 0 To .Count - 1
key = key & .getkey(j)
Next
If Not dict.exists(key) Then dict.Add key, datum
End With
Next
Range("C1").Resize(dict.Count) = Application.Transpose(dict.items)
End Sub
I don't know how to search for this or how to explain without an example.
I'm looking for an excel function that compares cell strings and identifies the portion they have in common.
Conditions
Compares 2 or more cells.
The common string is identified as long as 2 cells share it. *(ie: if comparing more than 2, it's enough to have 2 cells with that string. Not all the compared cells need to have it.) *
The string has at least 3 or more chars to avoid single characters and pairs being flagged.
Example
----------------------------------------------------------------------
| Pattern | Page URL 1 | Page URL 2 | Page URL 3 |
----------------------------------------------------------------------
| test | example.net/test/ | www.test.com | www.notest.com |
----------------------------------------------------------------------
| q=age | another.com?q=age | test.com/q=age | test.com/q=lol |
----------------------------------------------------------------------
Probably obvious by now, but what I'm trying to achieve/analyze is if there are string patterns that are common to large sets of URLs.
(forgive my poor attempt trying to draw a table)
This doesn't fully answer the question but I think it will give you what you need to get it. Give it a try. Place the following code in a new moudule:
Public Sub FindStrings()
Dim rng1 As Excel.Range
Dim rng2 As Excel.Range
Set rng1 = ActiveSheet.Range("A1")
Set rng2 = ActiveSheet.Range("A2")
Dim i As Integer
Dim j As Integer
Dim searchVal As String
For i = 3 To Len(rng2)
For j = 1 To Len(rng1)
searchVal = Mid(rng1, j, i)
If Len(searchVal) < i Then Exit For
If InStr(1, rng2, searchVal) Then Debug.Print searchVal
Next j
Next i
End Sub
In cell A1 put example.net/test
In cell A2 put www.test.com
Result
tes
est
test
UPDATE
I updated the code to search for a minimum of 4 characters instead of 3 (as you mentioned above). Furthermore, I guessed you wouldn't want strings such as www. and .com returned, nor strings with the / or . character. So the code pulls those out as well. Also, it compares every column combination.
Option Explicit
Public Sub CompareStrings()
Dim Arr As Variant
Dim i As Integer
Dim j As Integer
Dim StartRange As Excel.Range
Dim SearchRange As Excel.Range
Dim Counter As Integer
Dim ComparableRange As Variant
Dim Comparable As Integer
Dim Compared As Integer
Dim SearchVal As String
Set StartRange = ActiveSheet.Range("A1")
Counter = 0
For Each ComparableRange In ActiveSheet.Range("A1:A2")
Set SearchRange = Range(StartRange.Offset(Counter), Cells(StartRange.Offset(Counter).Row, Columns.Count).End(xlToLeft))
Arr = Application.Transpose(Application.Transpose(SearchRange.Value))
Debug.Print "Row " & SearchRange.Row & ":"
For j = LBound(Arr) To UBound(Arr)
For i = j + 1 To UBound(Arr)
For Comparable = 4 To Len(Arr(j))
For Compared = 1 To Len(Arr(i))
SearchVal = Mid(Arr(j), Compared, Comparable)
If InStr(1, SearchVal, ".") = 0 Then
If InStr(1, SearchVal, "/") = 0 Then
If Len(SearchVal) < Comparable Then Exit For
If InStr(1, Arr(i), SearchVal) > 0 Then Debug.Print vbTab & SearchVal
End If
End If
Next Compared
Next Comparable
Next i
Next j
Counter = Counter + 1
Next ComparableRange
End Sub
When comparing test.com/q=age with another.com?q=age You will still get results such as:
q=ag
=age
q=age
...though I suspect you only want the third one. The longer the matching strings are the more results you will get. The last results are the ones you will probably want.
Copy the following code into a module. Read the comments at the top of CommonString for usage.
Option Explicit
Public Function CommonString(rng As Range, iMinLen As Integer, Optional strDelimiter As String = ",") As String
'Finds the maximum number of cells (iMax) in "rng" that have a common substring of length at least "iMinLen".
'The function returns a string with the format "iMax: substring1,substring2,substring3..."
' where substring1, substring2, etc. are unique substrings found in exactly iMax cells.
'The output does not include any substrings of the unique substrings.
'The delimter between substrings can be specified by the optional parameter "strDelimiter".
'If no common substrings of length at least "iMinLen" are found, "CommonString" will return an empty string.
Dim blnRemove() As Boolean
Dim dicSubStrings As Object 'records the number of times substrings are found in pairwise string comparisons
Dim iCandidates As Integer
Dim iCol As Integer
Dim iCurrCommon As Integer
Dim iCurrLen As Integer
Dim iMax As Integer
Dim iMaxCommon As Integer
Dim iNumStrings As Integer
Dim iOutCount As Integer
Dim iRow As Integer
Dim iString1 As Integer
Dim iString2 As Integer
Dim iSubStr1 As Integer
Dim iSubStr2 As Integer
Dim lngSumLen As Long
Dim str1D() As String
Dim strCandidates() As String
Dim strOut() As String
Dim strSim() As String
Dim strSub As String
Dim vKey As Variant
Dim vStringsIn() As Variant
Set dicSubStrings = CreateObject("Scripting.Dictionary")
vStringsIn = rng.Value
iNumStrings = Application.CountA(rng)
ReDim str1D(1 To iNumStrings)
' pull the strings into a 1-D array
For iRow = 1 To UBound(vStringsIn, 1)
For iCol = 1 To UBound(vStringsIn, 2)
iCurrLen = Len(vStringsIn(iRow, iCol))
If iCurrLen > 0 Then
iString1 = iString1 + 1
str1D(iString1) = vStringsIn(iRow, iCol)
lngSumLen = lngSumLen + iCurrLen
End If
Next iCol
Next iRow
'initialize the array that will hold the substrings to output
ReDim strOut(1 To lngSumLen - iNumStrings * (iMinLen - 1))
'find common substrings from all pairwise combination of strings
For iString1 = 1 To iNumStrings - 1
For iString2 = iString1 + 1 To iNumStrings
strSim = Sim2Strings(str1D(iString1), str1D(iString2), iMinLen)
'loop through all common substrings
For iSubStr1 = 1 To UBound(strSim)
If dicSubStrings.Exists(strSim(iSubStr1)) Then
iCurrCommon = dicSubStrings(strSim(iSubStr1)) + 1
dicSubStrings(strSim(iSubStr1)) = iCurrCommon
If iCurrCommon > iMaxCommon Then iMaxCommon = iCurrCommon
Else 'add common substrings to the "dicSubStrings" dictionary
dicSubStrings.Add strSim(iSubStr1), 1
If iMaxCommon = 0 Then iMaxCommon = 1
End If
Next iSubStr1
Next iString2
Next iString1
If dicSubStrings.Count = 0 Then Exit Function
ReDim strCandidates(1 To dicSubStrings.Count)
'add the candidate substrings to the "strCandidates" array
'candidate substrings are those found in exactly "iMaxCommon" pairwise comparisons
For Each vKey In dicSubStrings.keys
If dicSubStrings(vKey) = iMaxCommon Then
iCandidates = iCandidates + 1
strCandidates(iCandidates) = CStr(vKey)
End If
Next vKey
ReDim blnRemove(1 To iCandidates)
iOutCount = iCandidates
'keep only the candidate substrings that are not a substring within another candidate substring
For iSubStr1 = 1 To iCandidates - 1
If Not blnRemove(iSubStr1) Then
For iSubStr2 = 1 To iCandidates - 1
If Not blnRemove(iSubStr2) Then
If Len(strCandidates(iSubStr1)) <> Len(strCandidates(iSubStr2)) Then
If Len(strCandidates(iSubStr1)) > Len(strCandidates(iSubStr2)) Then
If InStr(strCandidates(iSubStr1), strCandidates(iSubStr2)) > 0 Then
blnRemove(iSubStr2) = True
iOutCount = iOutCount - 1
End If
Else
If InStr(strCandidates(iSubStr2), strCandidates(iSubStr1)) > 0 Then
blnRemove(iSubStr1) = True
iOutCount = iOutCount - 1
End If
End If
End If
End If
Next iSubStr2
End If
Next iSubStr1
ReDim strOut(1 To iOutCount)
iOutCount = 0
'add the successful candidates to "strOut"
For iSubStr1 = 1 To iCandidates
If Not blnRemove(iSubStr1) Then
iOutCount = iOutCount + 1
strOut(iOutCount) = strCandidates(iSubStr1)
End If
Next iSubStr1
'convert "iMaxCommon" (pairwise counts) to number of cells (iMax) by solving the formula:
'(iMax ^ 2 - iMax) / 2 = iMaxCommon
iMax = ((8 * iMaxCommon + 1) ^ 0.5 + 1) / 2
CommonString = iMax & ": " & Join(strOut, strDelimiter)
End Function
Private Function Sim2Strings(str1 As String, str2 As String, iMinLen As Integer) As String()
'Returns a list of unique substrings common to both "str1" and "str2" that
' have a length of at least "iMinLen".
Dim dicInList As Object
Dim iCharFrom As Integer
Dim iLen1 As Integer
Dim iSearchLen As Integer
Dim iSubStr As Integer
Dim strCurr As String
Dim strList() As String
Dim vKey As Variant
iLen1 = Len(str1)
Set dicInList = CreateObject("Scripting.Dictionary")
'add common substrings to the "dicInList" dictionary
For iCharFrom = 1 To iLen1 - iMinLen + 1
For iSearchLen = iMinLen To iLen1 - iCharFrom + 1
strCurr = Mid(str1, iCharFrom, iSearchLen)
If InStr(str2, strCurr) = 0 Then
Exit For
Else
If Not dicInList.Exists(strCurr) Then
dicInList.Add strCurr, 0
End If
End If
Next iSearchLen
Next iCharFrom
If dicInList.Count = 0 Then
ReDim strList(0)
Else
ReDim Preserve strList(1 To dicInList.Count)
'output the keys in the "dicInList" dictionary to the "strList" array
For Each vKey In dicInList.keys
iSubStr = iSubStr + 1
strList(iSubStr) = vKey
Next vKey
End If
Sim2Strings = strList
End Function
I'm learning how to work with VBA.
I need to read an Integer and store it in an Array breaking it down to single characters.
For example, if I input the number 1927, I need it to be stored as
Array[0] = 1
Array[1] = 9
Array[2] = 2
Array[3] = 7
I've been looking how to do this for 2 days.Can someone help ?
I find a conversion to a unicode string and subsequent split on Chr(0) is an efficient way to create an array of individual characters.
Dim str As String, arr As Variant, i as long
str = StrConv(CStr(1927), vbUnicode)
arr = Split(str, Chr(0))
redim preserve arr(ubound(arr)-1)
For i = LBound(arr) To UBound(arr)
Debug.Print arr(i)
Next i
Debug.Print Join(arr, vbNullString)
You can convert the integer to a string as follows :
cstr(int)
then you can extract your single numbers as char from the string that you created as follows :
Mid(s, index, 1)
Check this link for more documentation.
This is a possible solution:
Option Explicit
Public Sub SplitIntToArray()
Dim inputString As String
inputString = "1927"
Dim cnt As Long
Dim myArr As Variant
ReDim myArr(Len(inputString) - 1)
For cnt = LBound(myArr) To UBound(myArr)
myArr(cnt) = Mid(inputString, cnt + 1, 1)
Next cnt
For cnt = LBound(myArr) To UBound(myArr)
Debug.Print myArr(cnt)
Next cnt
End Sub
It takes the inputString and based on its length it dimensionzes the myArr. Then it loops the newly dimensionized array and it assigns Mid(inputString, cnt+1,1) to every part of the array. THe cnt+1 is needed, because the first char of the string needs to be in the 0th element of the array, as the arrays start at zero.
You can also make a function, taking string and returning array:
Public Sub SplitIntToArray()
Dim inputString As String
Dim cnt As Long
Dim myArr As Variant
myArr = stringToArray("1927")
For cnt = LBound(myArr) To UBound(myArr)
Debug.Print myArr(cnt)
Next cnt
End Sub
Public Function stringToArray(inputString As String) As Variant
Dim cnt As Long
Dim returnArray As Variant
If Len(inputString) = 0 Then
stringToArray = Array()
Exit Function
End If
ReDim returnArray(Len(inputString) - 1)
For cnt = LBound(returnArray) To UBound(returnArray)
returnArray(cnt) = Mid(inputString, cnt + 1, 1)
Next cnt
stringToArray = returnArray
End Function
Here's another solution that doesn't require looping to assign the values to an array, or from an array to a string.
Sub Ex()
Dim Integers() As Byte
Dim i As Long
Integers = StrConv("0123456789", vbFromUnicode)
'Print the values out, already in an array
For i = LBound(Integers) To UBound(Integers)
Debug.Print Chr$(Integers(i))
Next
'Put back into a string
Dim MyInts As String
MyInts = StrConv(Integers, vbUnicode)
Debug.Print MyInts
End Sub
No one appears to have used the mathematical route yet. So let me do that.
Sub numArr()
Dim num As Integer
Dim arr() As Integer
num = 1927
i = 0
Do
i = i + 1
ReDim Preserve arr(i)
arr(i) = num Mod 10
num = num \ 10
Loop While num > 0
End Sub
This one was more fun writing.
This question already has an answer here:
Potential Duplicates Detection, with 3 Severity Level
(1 answer)
Closed 8 years ago.
This question based on the answer of my question yesterday.
To solve my problem, Jean-François Corbett suggested a Levenshtein distance approach. Then I found this code somewhere to get Levenshtein distance percentage.
Public Function GetLevenshteinPercentMatch( _
ByVal string1 As String, ByVal string2 As String, _
Optional Normalised As Boolean = False) As Single
Dim iLen As Integer
If Normalised = False Then
string1 = UCase$(WorksheetFunction.Trim(string1))
string2 = UCase$(WorksheetFunction.Trim(string2))
End If
iLen = WorksheetFunction.Max(Len(string1), Len(string2))
GetLevenshteinPercentMatch = (iLen - LevenshteinDistance(string1, string2)) / iLen
End Function
'********************************
'*** Compute Levenshtein Distance
'********************************
Public Function LevenshteinDistance(ByVal s As String, ByVal t As String) As Integer
Dim d() As Integer ' matrix
Dim m As Integer ' length of t
Dim N As Integer ' length of s
Dim i As Integer ' iterates through s
Dim j As Integer ' iterates through t
Dim s_i As String ' ith character of s
Dim t_j As String ' jth character of t
Dim cost As Integer ' cost
' Step 1
N = Len(s)
m = Len(t)
If N = 0 Then
LevenshteinDistance = m
Exit Function
End If
If m = 0 Then
LevenshteinDistance = N
Exit Function
End If
ReDim d(0 To N, 0 To m) As Integer
' Step 2
For i = 0 To N
d(i, 0) = i
Next i
For j = 0 To m
d(0, j) = j
Next j
' Step 3
For i = 1 To N
s_i = Mid$(s, i, 1)
' Step 4
For j = 1 To m
t_j = Mid$(t, j, 1)
' Step 5
If s_i = t_j Then
cost = 0
Else
cost = 1
End If
' Step 6
d(i, j) = WorksheetFunction.Min( _
d(i - 1, j) + 1, d(i, j - 1) + 1, d(i - 1, j - 1) + cost)
Next j
Next i
' Step 7
LevenshteinDistance = d(N, m)
End Function
What I have now is a code that finds exact duplicates in one column,
Dim duplicate(), i As Long
Dim delrange As Range, cell As Long
Dim shtIn As Worksheet, Shtout As Worksheet
Dim numofrows1
dim numofrows2
dim j as long
Set shtIn = ThisWorkbook.Sheets("process")
Set Shtout = ThisWorkbook.Sheets("output")
x = 2
y = 1
Set delrange = shtIn.Range("h1:h30000") 'set your range here
ReDim duplicate(0)
'search duplicates in 2nd column
For cell = 1 To delrange.Cells.Count
If Application.CountIf(delrange, delrange(cell)) > 1 Then
ReDim Preserve duplicate(i)
duplicate(i) = delrange(cell).Address
i = i + 1
End If
Next
'print duplicates
For i = UBound(duplicate) To LBound(duplicate) Step -1
Shtout.Cells(x, 1).EntireRow.Value = shtIn.Range(duplicate(i)).EntireRow.Value
x = x + 1
Next i
numofrows2 = Shtout.Cells(Shtout.Rows.Count, 1).End(xlUp).Row - 1
If Shtout.Cells(2, 1).Value = "" Then
MsgBox ("No Duplicates Found!")
Else
MsgBox (numofrows1 & " " & "Potential Duplicates Found")
End If
End Sub
I think that it will be nice if I can combine this two code, but Levenshtein distance is to compare 2 strings. So it can't work together.
I stuck here because I have no idea at all, every reference that I read all tell about comparing two string.
if the parameter this simple : detected as duplicate if the Levenshtein distance percentage is above 90%.
What I must change in this code?
I'm glad my earlier answer was useful to you. You didn't like having to represent each of your possible attributes by one-character symbols...
Ok, as I try to signal to you in the comments there, it is possible to adapt the Levenshtein Distance algorithm to look not at each character in a string, but at each element of an array instead, and do comparisons based on that. In fact it's quite straightforward to make this change:
Before 'Step 1, convert your comma-separated strings into arrays like this:
Dim sSplit() As String
Dim tSplit() As String
sSplit = Split(s, ",")
tSplit = Split(t, ",")
Then replace these four lines of code
N = Len(s)
m = Len(t)
s_i = Mid$(s, i, 1)
t_j = Mid$(t, j, 1)
with these
N = UBound(sSplit) + 1
m = UBound(tSplit) + 1
s_i = sSplit(i - 1)
t_j = tSplit(j - 1)
The + 1 and - 1 are there because Split returns a zero-based array.
Example usage:
?LevenshteinDistance("valros,helmet,42","valros,helmet,42")
0
?LevenshteinDistance("valros,helmet,42","knight,helmet")
2
?LevenshteinDistance("helmet,iron,knight","plain,helmet")
3
Note that 0 means the two strings are identical. You don't need separate code to deal with this.
With the above you should be able to complete your task.
One more note: the Damerau–Levenshtein distance may be a more relevant algorithm for you than the Levenshtein distance. The difference is that in addition to insertion/deletion/substitution, the D-M distance also considers transposition of two adjacent characters. Up to you to decide.
SOLVED!!!
Sub duplicate_separation()
Dim duplicate As Variant, I As Long
Dim vaData As Variant
Dim vadata2 As Variant
Dim delrange As Range, lRow As Long
Dim delrange2 As Range
Dim shtIn As Worksheet, Shtout As Worksheet
Dim numofrows1
Dim j As Long
Set shtIn = ThisWorkbook.Sheets("process")
Set Shtout = ThisWorkbook.Sheets("output")
With shtIn.UsedRange 'set your range here
Set delrange = shtIn.Range("b1").Resize(.Row + .Rows.Count - 1)
End With
vaData = delrange.Value
ReDim duplicate(1 To 1, 1 To 1)
'search duplicates in 2nd column
For lRow = 1 To UBound(vaData, 1)
'choose the parameter
'1. detect potential duplicate data for similiarity above 70%
If FuzzyCount(LookupValue:=CStr(vaData(lRow, 1)), TableArray:=delrange, NFPercent:=0.7, Algorithm:=4) > 1 Then
I = I + 1
ReDim Preserve duplicate(1 To 1, 1 To I)
duplicate(1, I) = delrange(lRow).Address
End If
Next lRow
Shtout.Cells(1, 1).Resize(1, 7).Value = _
Array("Material Number", "Short Description", "Manufacturer", "Material Part Number", "Old Material Number", "Long Description", "sorted ShortDesc")
If I = 0 Then
MsgBox ("No Duplicates Found!")
Else
'print duplicates
MsgBox (I & " " & "Potential Duplicates Found")
Shtout.Cells(2, 1).Resize(I, 6).EntireRow.Value = shtIn.Range(duplicate(1, 1)).Resize(I, 6).EntireRow.Value
End If
End Sub
Fuzzy v lookup function. by Alan
Private Function NormaliseKey(ByVal String1 As String) As String
NormaliseKey = Replace(UCase$(String1), " ", "")
End Function
Function FuzzyCount(ByVal LookupValue As String, _
ByVal TableArray As Range, _
Optional NFPercent As Single = 0.05, _
Optional Algorithm As Variant = 3) As Long
'**********************************************************************
'** Simple count of (Fuzzy) Matching strings >= NFPercent threshold **
'**********************************************************************
Dim lMatchCount As Long
Dim rCur As Range
Dim sString1 As String
Dim sString2 As String
'** Normalise lookup value **
sString1 = LCase$(Application.Trim(LookupValue))
For Each rCur In Intersect(TableArray.Resize(, 1), Sheets(TableArray.Parent.Name).UsedRange)
'** Normalise current Table entry **
sString2 = LCase$(Application.Trim(CStr(rCur)))
If sString2 <> "" Then
If FuzzyPercent(String1:=sString1, _
String2:=sString2, _
Algorithm:=Algorithm, _
Normalised:=False) >= NFPercent Then
lMatchCount = lMatchCount + 1
End If
End If
Next rCur
FuzzyCount = lMatchCount
End Function
Function FuzzyPercent(ByVal String1 As String, _
ByVal String2 As String, _
Optional Algorithm As Variant = 3, _
Optional Normalised As Boolean = False) As Single
'*************************************
'** Return a % match on two strings **
'*************************************
Dim bSoundex As Boolean
Dim bBasicMetaphone As Boolean
Dim intLen1 As Integer, intLen2 As Integer
Dim intCurLen As Integer
Dim intTo As Integer
Dim intPos As Integer
Dim intPtr As Integer
Dim intScore As Integer
Dim intTotScore As Integer
Dim intStartPos As Integer
Dim lngAlgorithm As Long
Dim sngScore As Single
Dim strWork As String
bSoundex = LCase$(CStr(Algorithm)) = "soundex"
bBasicMetaphone = LCase$(CStr(Algorithm)) = "metaphone"
'-------------------------------------------------------
'-- If strings havent been normalised, normalise them --
'-------------------------------------------------------
If Normalised = False Then
If bSoundex Or bBasicMetaphone Then
String1 = NormaliseStringAtoZ(String1)
String2 = NormaliseStringAtoZ(String2)
Else
String1 = LCase$(Application.Trim(String1))
String2 = LCase$(Application.Trim(String2))
End If
End If
'----------------------------------------------
'-- Give 100% match if strings exactly equal --
'----------------------------------------------
If String1 = String2 Then
FuzzyPercent = 1
Exit Function
End If
'If bSoundex Then
' String1 = Soundex(Replace(String1, " ", ""))
' String2 = Soundex(Replace(String2, " ", ""))
' If String1 = String2 Then
' FuzzyPercent = msngSoundexMatchPercent
' Else
' FuzzyPercent = 0
' End If
' Exit Function
'ElseIf bBasicMetaphone Then
' String1 = Metaphone1(String1)
' String2 = Metaphone1(String2)
' If String1 = String2 Then
' FuzzyPercent = msngMetaphoneMatchPercent
' Else
' FuzzyPercent = 0
' End If
' Exit Function
'End If
intLen1 = Len(String1)
intLen2 = Len(String2)
If intLen1 = 0 Or intLen2 = 0 Then
FuzzyPercent = 0
Exit Function
End If
'----------------------------------------
'-- Give 0% match if string length < 2 --
'----------------------------------------
If intLen1 < 2 Then
FuzzyPercent = 0
Exit Function
End If
intTotScore = 0 'initialise total possible score
intScore = 0 'initialise current score
lngAlgorithm = Val(Algorithm)
'--------------------------------------------------------
'-- If Algorithm = 1 or 3, Search for single characters --
'--------------------------------------------------------
If (lngAlgorithm And 1) <> 0 Then
If intLen1 < intLen2 Then
FuzzyAlg1 String1, String2, intScore, intTotScore
Else
FuzzyAlg1 String2, String1, intScore, intTotScore
End If
End If
'-----------------------------------------------------------
'-- If Algorithm = 2 or 3, Search for pairs, triplets etc. --
'-----------------------------------------------------------
If (lngAlgorithm And 2) <> 0 Then
If intLen1 < intLen2 Then
FuzzyAlg2 String1, String2, intScore, intTotScore
Else
FuzzyAlg2 String2, String1, intScore, intTotScore
End If
End If
'-------------------------------------------------------------
'-- If Algorithm = 4,5,6,7, use Levenstein Distance method --
'-- (Algorithm 4 was Dan Ostrander's code) --
'-------------------------------------------------------------
If (lngAlgorithm And 4) <> 0 Then
If intLen1 < intLen2 Then
' sngScore = FuzzyAlg4(String1, String1)
sngScore = GetLevenshteinPercentMatch(String1:=String1, _
String2:=String2, _
Normalised:=True)
Else
' sngScore = FuzzyAlg4(String2, String1)
sngScore = GetLevenshteinPercentMatch(String1:=String2, _
String2:=String1, _
Normalised:=True)
End If
intScore = intScore + (sngScore * 100)
intTotScore = intTotScore + 100
End If
FuzzyPercent = intScore / intTotScore
End Function
Private Sub FuzzyAlg1(ByVal String1 As String, _
ByVal String2 As String, _
ByRef Score As Integer, _
ByRef TotScore As Integer)
Dim intLen1 As Integer, intPos As Integer, intPtr As Integer, intStartPos As Integer
intLen1 = Len(String1)
TotScore = TotScore + intLen1 'update total possible score
intPos = 0
For intPtr = 1 To intLen1
intStartPos = intPos + 1
intPos = InStr(intStartPos, String2, Mid$(String1, intPtr, 1))
If intPos > 0 Then
If intPos > intStartPos + 3 Then 'No match if char is > 3 bytes away
intPos = intStartPos
Else
Score = Score + 1 'Update current score
End If
Else
intPos = intStartPos
End If
Next intPtr
End Sub
Private Sub FuzzyAlg2(ByVal String1 As String, _
ByVal String2 As String, _
ByRef Score As Integer, _
ByRef TotScore As Integer)
Dim intCurLen As Integer, intLen1 As Integer, intTo As Integer, intPtr As Integer, intPos As Integer
Dim strWork As String
intLen1 = Len(String1)
For intCurLen = 1 To intLen1
strWork = String2 'Get a copy of String2
intTo = intLen1 - intCurLen + 1
TotScore = TotScore + Int(intLen1 / intCurLen) 'Update total possible score
For intPtr = 1 To intTo Step intCurLen
intPos = InStr(strWork, Mid$(String1, intPtr, intCurLen))
If intPos > 0 Then
Mid$(strWork, intPos, intCurLen) = String$(intCurLen, &H0) 'corrupt found string
Score = Score + 1 'Update current score
End If
Next intPtr
Next intCurLen
End Sub
'Private Function FuzzyAlg4(strIn1 As String, strIn2 As String) As Single
'
'Dim L1 As Integer
'Dim In1Mask(1 To 24) As Long 'strIn1 is 24 characters max
'Dim iCh As Integer
'Dim N As Long
'Dim strTry As String
'Dim strTest As String
'
'TopMatch = 0
'L1 = Len(strIn1)
'strTest = UCase(strIn1)
'strCompare = UCase(strIn2)
'For iCh = 1 To L1
' In1Mask(iCh) = 2 ^ iCh
'Next iCh 'Loop thru all ordered combinations of characters in strIn1
'For N = 2 ^ (L1 + 1) - 1 To 1 Step -1
' strTry = ""
' For iCh = 1 To L1
' If In1Mask(iCh) And N Then
' strTry = strTry & Mid(strTest, iCh, 1)
' End If
' Next iCh
' If Len(strTry) > TopMatch Then FuzzyAlg4Test strTry
'Next N
'FuzzyAlg4 = TopMatch / CSng(L1)
'End Function
'Sub FuzzyAlg4Test(strIn As String)
'
'Dim l As Integer
'Dim strTry As String
'Dim iCh As Integer
'
'l = Len(strIn)
'If l <= TopMatch Then Exit Sub
'strTry = "*"
'For iCh = 1 To l
' strTry = strTry & Mid(strIn, iCh, 1) & "*"
'Next iCh
'If strCompare Like strTry Then
' If l > TopMatch Then TopMatch = l
'End If
'End Sub
Public Function GetLevenshteinPercentMatch(ByVal String1 As String, _
ByVal String2 As String, _
Optional Normalised As Boolean = False) As Single
Dim iLen As Integer
If Normalised = False Then
String1 = UCase$(WorksheetFunction.Trim(String1))
String2 = UCase$(WorksheetFunction.Trim(String2))
End If
iLen = WorksheetFunction.Max(Len(String1), Len(String2))
GetLevenshteinPercentMatch = (iLen - LevenshteinDistance(String1, String2)) / iLen
End Function
Private Function NormaliseStringAtoZ(ByVal String1 As String) As String
'---------------------------------------------------------
'-- Remove all but alpha chars and convert to lowercase --
'---------------------------------------------------------
Dim iPtr As Integer
Dim sChar As String
Dim sResult As String
sResult = ""
For iPtr = 1 To Len(String1)
sChar = LCase$(Mid$(String1, iPtr, 1))
If sChar <> UCase$(sChar) Then sResult = sResult & sChar
Next iPtr
NormaliseStringAtoZ = sResult
End Function
'********************************
'*** Compute Levenshtein Distance
'********************************
Public Function LevenshteinDistance(ByVal s As String, ByVal t As String) As Integer
Dim d() As Integer ' matrix
Dim m As Integer ' length of t
Dim N As Integer ' length of s
Dim I As Integer ' iterates through s
Dim j As Integer ' iterates through t
Dim s_i As String ' ith character of s
Dim t_j As String ' jth character of t
Dim cost As Integer ' cost
' Step 1
N = Len(s)
m = Len(t)
If N = 0 Then
LevenshteinDistance = m
Exit Function
End If
If m = 0 Then
LevenshteinDistance = N
Exit Function
End If
ReDim d(0 To N, 0 To m) As Integer
' Step 2
For I = 0 To N
d(I, 0) = I
Next I
For j = 0 To m
d(0, j) = j
Next j
' Step 3
For I = 1 To N
s_i = Mid$(s, I, 1)
' Step 4
For j = 1 To m
t_j = Mid$(t, j, 1)
' Step 5
If s_i = t_j Then
cost = 0
Else
cost = 1
End If
' Step 6
d(I, j) = WorksheetFunction.Min(d(I - 1, j) + 1, d(I, j - 1) + 1, d(I - 1, j - 1) + cost)
Next j
Next I
' Step 7
LevenshteinDistance = d(N, m)
End Function
evryone. thankyou for your help!!
I am reading in information from a .txt file, This text file has 2 row and 6 column; each element is separated by space or tab. I have the data to read all the strings but I find difficult in putting the data to the cells. How can I find end of first Row.
Text File:
$SUBCASE 1 1
$DISP 0 509 5 1 2
Below is the complete code, I'm getting only the first character string and rest not...
Private Sub PCH_Click()
Dim arTemp() As Variant
Dim lRet As String
Dim sVal As String
Dim Row As Long
Dim Col As Long
Dim ws As Worksheet
Set ws = Worksheets("Sheet1")
'Default method Uses Open Dialog To Show the Files
lRet = Application.GetOpenFilename("PCH files (*.pch), *.*")
'Reads the file into characters
sVal = OpenTextFileToString2(lRet)
Dim tmp As Variant
tmp = SplitMultiDelims(sVal, ",;$ ", True) ' Place the 2nd argument with the list of delimiter you need to use
Row = 0
For i = LBound(tmp, 1) To UBound(tmp, 1)
Row = Row + 1
Col = 1
While Not vbNewLine = ""
ws.Cells(Row, Col) = tmp(i) 'output on the first column
MsgBox (tmp(i))
Col = Col + 1
Wend
Next i
End Sub
Function OpenTextFileToString2(ByVal strFile As String) As String
' RB Smissaert - Author
Dim hFile As Long
hFile = FreeFile
Open strFile For Input As #hFile
OpenTextFileToString2 = Input$(LOF(hFile), hFile)
Close #hFile
End Function
Function SplitMultiDelims(ByRef Text As String, ByRef DelimChars As String, _
Optional ByVal IgnoreConsecutiveDelimiters As Boolean = False, _
Optional ByVal Limit As Long = -1) As String()
Dim ElemStart As Long, N As Long, M As Long, Elements As Long
Dim lDelims As Long, lText As Long
Dim Arr() As String
lText = Len(Text)
lDelims = Len(DelimChars)
If lDelims = 0 Or lText = 0 Or Limit = 1 Then
ReDim Arr(0 To 0)
Arr(0) = Text
SplitMultiDelims = Arr
Exit Function
End If
ReDim Arr(0 To IIf(Limit = -1, lText - 1, Limit))
Elements = 0: ElemStart = 1
For N = 1 To lText
If InStr(DelimChars, Mid(Text, N, 1)) Then
Arr(Elements) = Mid(Text, ElemStart, N - ElemStart)
If IgnoreConsecutiveDelimiters Then
If Len(Arr(Elements)) > 0 Then Elements = Elements + 1
Else
Elements = Elements + 1
End If
ElemStart = N + 1
If Elements + 1 = Limit Then Exit For
End If
Next N
'Get the last token terminated by the end of the string into the array
If ElemStart <= lText Then Arr(Elements) = Mid(Text, ElemStart)
'Since the end of string counts as the terminating delimiter, if the last character
'was also a delimiter, we treat the two as consecutive, and so ignore the last elemnent
If IgnoreConsecutiveDelimiters Then If Len(Arr(Elements)) = 0 Then Elements = Elements - 1
ReDim Preserve Arr(0 To Elements) 'Chop off unused array elements
SplitMultiDelims = Arr
End Function
You can read a file row by row with following code
Sub IOTest()
Dim fnum, i As Integer, j As Integer
Dim line As String
Dim lines As Variant
Dim regEx As Object
Set regEx = CreateObject("vbscript.regexp")
With regEx
.Pattern = "\s{1}" 'only one whitespace
.Global = True 'find all occurrences
End With
fnum = FreeFile()
Open ThisWorkbook.Path & "\IO_Test.txt" For Input As #fnum
Do Until EOF(fnum) 'until End of file
i = i + 1
Input #fnum, line 'load row into line
'First replace found sole whitespaces with ","
'Then split on the ","s
lines = Split(regEx.Replace(line, ","), ",")
For j = LBound(lines) To UBound(lines)
Cells(i, j + 1) = lines(j)
Next j
Loop
Close #fnum
End Sub
I tested this with the strings
"$SUBCASE" & vbTab & "1" & vbTab & vbTab & vbTab & vbTab & "1"
"$DISP" & vbTab & "0" & vbTab & "509" & vbTab & "5" & vbTab & "1" & vbTab & "2"
And it only works if you have one whitespace(eg. space, tab,...) separating the data. If you have more than one whitespace between the data it gets trickier. But if you can provide an example on how the data is separated I can take a look at it.
I hope it helps, let me know either way ;)