Excel word matching - excel

Hi there I'm hoping someone here can help me out.
I've got some pretty untidy data and I am looking to do a VLOOKUP / INDEX MATCH sort of function but having no luck as the data is quite untidy.
So basically I've got a list of names and there websites in one sheet.
Then a second sheet with just a list of names that I want to populate with the websites from the other sheet as to cut down the work load.
The problem is the company names are not entered in the correct format (i.e for an abbreviated company spaces are used instead of .'s). Another example is when there is multiple branches of a company.
So worksheet 1 would be
Company Name: Company
Website: www.company.com
Worksheet 2 would be:
Company Name : Company (UK)
Company Name : Company (USA)
Where both should have the same website as found in worksheet 1 is there any possible way to do this. I have tried VLOOKUP and INDEX/MATCH and set the condition to TRUE but it does not return good results.
Thanks in advance for any advice.

Fuzzy matching is a far from an exact science, particularly when it comes to the built in Excel functions.
If I was to recommend the safest bet - I would say create a de-duped list of the values in sheet 2 and create a matching lookup value for sheet one. This would certainly be the most accurate (but time consuming and depending on how many times you will use these values)
Alternatively, there are 'fuzzy matches' from various sources that can be used. One of the famous ones is the 'Jaro Winkler Distance' : a version of which can be found at this link: http://garonfolo.dk/herbert/2013/07/excel-vba-jaro-winkler-distance-fuzzy-matching/

Without seeing the data it's impossible to say for sure, but here are some options.
1. You could use MATCH("*Company*",Sheets1!A:A,0).
2. If there is always the same pattern, you could create a helper column using LEFT, RIGHT, and MID to extract the company name.
3. VBA offers some support for regular expressions, using Set reg = CreateObject("VBScript.RegExp") If you know how to use regular expressions, you could write a macro to match based on it.

To expand on the answer by Trum because the link he posted is no longer available, I have translated the JaroWinklerProximity algorithm found in this C# implementation https://stackoverflow.com/a/19165108/8031589 by user leebickmtu to VBA for Excel:
Option Base 1
Function JaroWinklerProximity(String1 As Range, String2 As Range) As Double
Dim mWeightThreshold As Double
mWeightThreshold = 0.7
Dim mNumChars As Integer
mNumChars = 4
Dim aString1 As String
aString1 = LCase(String1.Text)
Dim aString2 As String
aString2 = LCase(String2.Text)
Dim lLen1 As Integer
lLen1 = Len(aString1)
Dim lLen2 As Integer
lLen2 = Len(aString2)
If lLen1 = 0 Then
If lLen2 = 0 Then
JaroWinklerProximity = 1
Exit Function
Else
JaroWinklerProximity = 0
Exit Function
End If
End If
Dim lSearchRange As Integer
lSearchRange = WorksheetFunction.Max(1, WorksheetFunction.Max(lLen1, lLen2) / 2)
ReDim lMatched1(lLen1) As Boolean
ReDim lMatched2(lLen2) As Boolean
Dim lNumCommon As Integer
lNumCommon = 0
Dim i As Integer
For i = 1 To lLen1 Step 1
Dim lStart As Integer
lStart = WorksheetFunction.Max(1, i - lSearchRange)
Dim lEnd As Integer
lEnd = WorksheetFunction.Min(i + lSearchRange, lLen2)
Dim j As Integer
For j = lStart To lEnd - 1 Step 1
If lMatched2(j) Then
GoTo NextIteration1
End If
Dim charAtIndex1 As String
charAtIndex1 = Mid(aString1, i, 1)
Dim charAtIndex2 As String
charAtIndex2 = Mid(aString2, j, 1)
If charAtIndex1 <> charAtIndex2 Then
GoTo NextIteration1
End If
lMatched1(i) = True
lMatched2(j) = True
lNumCommon = lNumCommon + 1
Exit For
NextIteration1:
Next j
Next i
If lNumCommon = 0 Then
JaroWinklerProximity = 0
Exit Function
End If
Dim lNumHalfTransposed As Integer
lNumHalfTransposed = 0
Dim k As Integer
k = 1
For i = 1 To lLen1 Step 1
If Not lMatched1(i) Then
GoTo NextIteration2
End If
Do While Not lMatched2(k)
k = k + 1
Loop
If Mid(aString1, i, 1) <> Mid(aString2, j, 1) Then
lNumHalfTransposed = lNumHalfTransposed + 1
End If
k = k + 1
NextIteration2:
Next
Dim lNumTransposed As Integer
lNumTransposed = lNumHalfTransposed / 2
Dim lNumCommonD As Double
lNumCommonD = lNumCommon
Dim lWeight As Double
lWeight = (lNumCommonD / lLen1 + lNumCommonD / lLen2 + (lNumCommon - lNumTransposed) / lNumCommonD) / 3
If lWeight <= mWeightThreshold Then
JaroWinklerProximity = lWeight
Exit Function
End If
Dim lMax As Integer
lMax = WorksheetFunction.Min(mNumChars, WorksheetFunction.Min(Len(aString1), Len(aString2)))
Dim lPos As Integer
lPos = 1
Do While lPos < lMax And Mid(aString1, lPos, 1) = Mid(aString2, lPos, 1)
lPos = lPos + 1
Loop
If lPos = 1 Then
JaroWinklerProximity = lWeight
Exit Function
End If
JaroWinklerProximity = lWeight + 0.1 * lPos * (1# - lWeight)
End Function

Related

How do I modify a sample code for primefactorization in Excel VBA to a specific column of numbers?

I have in Column K:
K8 is 6384 i.e. =SUM(J1:J8)
K9 is 2598 i.e. =SUM(J2:J9)
K10 is 12176 i.e =SUM(J3:J10)
:
:
K5488
up to K5488 (No numbers in sequence, all different numbers)
The largest number appearing in K is 1 400 000.
I need in Column M: The prime factors of each number in K
e.g. K8 is 6384 then M8 should be 2,2,2,2,3,7,19
k9 is 2598 then M9 should be 2,3,433 etc.
I found the following code by John Coleman on your site (Mar 28) which tested well, but seeing I have no programming knowledge, I don't know how to modify it to use in my columns K & M setup.
Here's the sample code:
Function Factor(ByVal n As Long, Optional FirstTrial As Long = 2) As String
Dim i As Long
Dim t As Long
Dim limit As Long
Dim rest As String
Dim s As String
If n = 1 Then
Factor = n
Exit Function
End If
limit = Int(Sqr(n))
t = FirstTrial
Do While t <= limit
If n Mod t = 0 Then
rest = Factor(n / t, t)
If rest <> "1" Then
s = t & "," & rest
End If
Factor = s
Exit Function
Else
If t = 2 Then t = 3 Else t = t + 2
End If
Loop
'if we get here:
Factor = n
End Function
Function PrimeOrFactor(n As Long) As String
Dim s As String
s = Factor(n)
If n = 1 Then
PrimeOrFactor = "Neither"
ElseIf (s) = Trim(n) Then
PrimeOrFactor = "Prime"
Else
PrimeOrFactor = s
End If
End Function
Tested like:
Sub test()
Dim i As Long
For i = 1 To 20
Cells(i, 1) = i
Cells(i, 2) = PrimeOrFactor(i)
Next i
End Sub
The function you provided is a udf (user defined function) to be used in your worksheet.
If you put the functions you provided in a normal code module, you can enter the following in your worksheet in M8:
=Factor(K8)
and copy that function down to your desired range.

Find if the number is Prime or show the prime factors using excel formula?

I have of integers in Col A and in col B i want to show result 'Prime' if it doesn't have further factors for the number itself. This goes like this if the number for example is 37 result will be 'Prime' and if its 44 then result will be 2x2x11. How can i do this using excel formula? Screen shot :
Disclaimer: code below is ported from this very useful VB.NET example
Option Explicit
Sub Test()
Debug.Print FindFactors(2)
Debug.Print FindFactors(3)
Debug.Print FindFactors(11)
Debug.Print FindFactors(12)
Debug.Print FindFactors(13)
Debug.Print FindFactors(16)
Debug.Print FindFactors(17)
Debug.Print FindFactors(24)
Debug.Print FindFactors(25)
Debug.Print FindFactors(11234)
Debug.Print FindFactors(67894)
End Sub
Function FindFactors(lngNumber As Long) As String
Dim collFactors As Collection
Dim lngFactor As Long
Dim lngCounter As Long
Dim strFactors As String
Dim strFactor As String
Set collFactors = New Collection
' Take out the 2s.
Do While (lngNumber Mod 2 = 0)
collFactors.Add 2
lngNumber = lngNumber / 2
Loop
' Take out other primes.
lngFactor = 3
Do While (lngFactor * lngFactor <= lngNumber)
If (lngNumber Mod lngFactor = 0) Then
' This is a factor.
collFactors.Add lngFactor
lngNumber = lngNumber / lngFactor
Else
' Go to the next odd number.
lngFactor = lngFactor + 2
End If
Loop
' If num is not 1, then whatever is left is prime.
If lngNumber > 1 Then
collFactors.Add lngNumber
End If
' make a string out of collection
strFactors = ""
If collFactors.Count = 1 Then
strFactors = "Prime"
Else
For lngCounter = 1 To collFactors.Count
strFactors = strFactors & collFactors(lngCounter)
If lngCounter < collFactors.Count Then
strFactors = strFactors & "x"
End If
Next lngCounter
End If
FindFactors = strFactors
End Function
Gives an output of:
Prime
Prime
Prime
2x2x3
Prime
2x2x2x2
Prime
2x2x2x3
5x5
2x41x137
2x83x409
Can be used in a worksheet:
Here is a somewhat straightforward recursive version. It is based on the idea that once you identify a factor you divide the number by that factor and then turn your attention to factoring the rest.
Function Factor(ByVal n As Long, Optional FirstTrial As Long = 2) As String
Dim i As Long
Dim t As Long
Dim limit As Long
Dim rest As String
Dim s As String
If n = 1 Then
Factor = n
Exit Function
End If
limit = Int(Sqr(n))
t = FirstTrial
Do While t <= limit
If n Mod t = 0 Then
rest = Factor(n / t, t)
If rest <> "1" Then
s = t & "x" & rest
End If
Factor = s
Exit Function
Else
If t = 2 Then t = 3 Else t = t + 2
End If
Loop
'if we get here:
Factor = n
End Function
Function PrimeOrFactor(n As Long) As String
Dim s As String
s = Factor(n)
If n = 1 Then
PrimeOrFactor = "Neither"
ElseIf (s) = Trim(n) Then
PrimeOrFactor = "Prime"
Else
PrimeOrFactor = s
End If
End Function
Tested like:
Sub test()
Dim i As Long
For i = 1 To 20
Cells(i, 1) = i
Cells(i, 2) = PrimeOrFactor(i)
Next i
End Sub
Output:
Using LET and dynamic arrays allows for the following without VBA.
=LET(x, SEQUENCE(A1),
factors, FILTER(x, MOD(A1,x) = 0),
factorMatrix, 1 * (MOD(factors, TRANSPOSE(factors)) = 0),
primeFactors, FILTER(factors, MMULT(factorMatrix, factors ^ 0) = 2),
primeFactorList, IF(MOD(A1, primeFactors ^ SEQUENCE(1, 20)) = 0, primeFactors, ""),
factorProduct, TEXTJOIN("x",, primeFactorList),
IF(A1 = 1, "Neither", IF(factorProduct=A1&"","Prime",factorProduct)))
It works for numbers up to 2^20.
A slight modification to the excellent code of John Coleman above, using Mod with Doubles included below, will allow factoring integers up to Excel's 15 digit limit. Numbers with large factors may be noticeably slower. For example, 562,951,983,465,953 factored correctly as 16,777,259 x 33,554,467 in about 5 seconds on a Core i3.
Function Factor(ByVal n As Double, Optional FirstTrial As Double = 2) As String 'Changed
Dim i As Long
Dim t As Double 'Changed
Dim limit As Long
Dim rest As String
Dim s As String
If n = 1 Then
Factor = n
Exit Function
End If
limit = Int(Sqr(n))
t = FirstTrial
Do While t <= limit
If FMod(t, n) = 0 Then 'Changed
.
.
.
Public Function FMod(a As Double, b As Double) As Double
FMod = a - Fix(a / b) * b
'http://en.wikipedia.org/wiki/Machine_epsilon
'Unfortunately, this function can only be accurate when `a / b` is outside [-2.22E-16,+2.22E-16]
'Without this correction, FMod(.66, .06) = 5.55111512312578E-17 when it should be 0
If FMod >= -2 ^ -52 And FMod <= 2 ^ -52 Then '+/- 2.22E-16
FMod = 0
End If
End Function

Runtime Error on a 2D Bubblesort in Excel VBA array

I have been banging my head (and a few other heads as well on other Excel programming sites) to get a Combobox in a Userform to sort the rows (coming from two columns in the source spreadsheet) in alpha order.
Ideally, I want a 2 dimensional sort, but at this point, will settle for ONE that works.
Currently, the Combobox, when dropped down, reads in part (minus the bullet points, which do NOT appear and are not needed):
Zoom MRKPayoutPlan
Chuck PSERSFuture
Chuck PSERSCurrent
What I want is:
Chuck PSERSCurrent
Chuck PSERSFuture
Zoom MRKPayoutPlan
The first order is derived from the order in which the rows appear in the source worksheet.
At this point, I am getting a Runtime Error '13', Type Mismatch error. Both fields are text fields (one is last name, the other is a classification code- I want to sort first by name).
Below are the two relevant sections of the VBA code. If someone can help me sort this out, I'll buy at least a virtual round of beers. Excel VBA is not my most comfortable area- I can accomplish this in other apps, but the client spec is that this all must run in Excel alone. Thanks in advance.
Private Sub UserForm_Initialize()
fPath = ThisWorkbook.Path & "\"
currentRow = 4
sheetName = Sheet5.Name
lastRow = Sheets(sheetName).Range("C" & Rows.Count).End(xlUp).Row
Dim rngUID As Range
Dim vList
Set rngUID = Range("vUID")
With rngUID
vList = Application.Index(.Cells, .Parent.Evaluate("ROW(" & .Address & ")"), Array(7, 1))
End With
vList = BubbleSort2D(vList, 2, 1)
With ComboBox1
.ColumnCount = 2
.ColumnWidths = "100;100"
.List = vList
End With
PopulateControls
End Sub
Public Function BubbleSort2D(Strings, ParamArray SortColumns())
Dim tempItem
Dim a As Long
Dim e As Long
Dim f As Long
Dim g As Long
Dim i As String
Dim j As String
Dim m() As String
Dim n
Dim x As Long
Dim y As Long
Dim lngColumn As Long
e = 1
n = Strings
Do While e <> -1
For a = LBound(Strings) To UBound(Strings) - 1
For y = LBound(SortColumns) To UBound(SortColumns)
lngColumn = SortColumns(y)
i = n(a, lngColumn)
j = n(a + 1, lngColumn)
f = StrComp(i, j)
If f < 0 Then
Exit For
ElseIf f > 0 Then
For x = LBound(Strings, 2) To UBound(Strings, 2)
tempItem = n(a, x)
n(a, x) = n(a + 1, x)
n(a + 1, x) = tempItem
Next x
g = 1
Exit For
End If
Next y
Next a
If g = 1 Then
e = 1
Else
e = -1
End If
g = 0
Loop
BubbleSort2D = n
End Function
Here is a bubble sort in VBA source.
Public Sub BubbleSort(ByRef sequence As Variant, _
ByVal lower As Long, ByVal upper As Long)
Dim upperIt As Long
For upperIt = upper To lower + 1 Step -1
Dim hasSwapped As Boolean
hasSwapped = False
Dim bubble As Long
For bubble = lower To upperIt - 1
If sequence(bubble) > sequence(bubble + 1) Then
Dim t as Variant
t = sequence(bubble)
sequence(bubble) = sequence(bubble + 1)
sequence(bubble + 1) = t
hasSwapped = True
End If
Next bubble
If Not hasSwapped Then Exit Sub
Next upperIt
End Sub
Note that using variable names that specify what they are and do instead of single letters makes it easier to read.
As for the 2D sort. Don't. Sort each array individually then sort the array of arrays using the same method. You will need to provide an abstraction to compare the columns. Do not try to sort them both at the same time. I can't think of a scenario where that is a good idea. If for some reason elements can change their sub array in the 2D array, then flatten it into 1 array, sort that and split it back into a 2D array.
Honestly from what I am understanding of you specific problem. You are going from 1D sequence to a 1D sequence so I think 2D arrays are and unnecessary complication.
Instead use a modified bubble sort routine with the comparison statement,
If sequence(bubble) > sequence(bubble +1) Then '...
replaced with a custom comparison function
ComboBoxItemCompare(sequence(bubble), sequence(bubble + 1))
that will return True if the first argument should be swapped with the second.

Type-mismatch in excel VBA

I try to learn VBA. This code:
Dim i As Integer
Dim damage As String
i = 1
Do While 1
damage = CStr(Worksheets("charakters").Range("d14").Value)
you_min_damage = CInt(Left(damage, i))
If Right(i, 0) = "-" Then
Trim (you_min_damage)
Exit Do
End If
i = i + 1
Loop
cause this problem (in 4 iteration):
In cell D14 I have "4 - 11". I want to separate first number nad change it to integer.
You_min_damage is integer.
Try this one:
Dim you_min_damage As Integer, you_max_damage As Integer
Dim arr
'store all values in array
arr = Split(Worksheets("charakters").Range("d14").Value, "-")
'get first value
you_min_damage = CInt(arr(0))
'get last value
you_max_damage = CInt(arr(UBound(arr)))

Split and sort strings components using Excel

I have a column in Excel with the format:
A01G45B45D12
I need a way to format it like this, that is divide the string into groups of three characters, sort the groups alphabetically and then join them together with a + sign between:
A01+B45+D12+G45
I wonder it this is possible using the built in formulas in Excel or if I have to do this using VBA or something else, I already have the code for this in C# if there is an easy way to use that from Excel. I have not written plugins for Excel before.
Edit to add:
The above is just an example, the string can be of "any length" but its always divisible by three and the order is random so I cannot assume anything about the order beforehand.
Sub ArraySort()
Dim strStarter As String
Dim strFinish As String
Dim intHowMany As Integer
Dim intStartSlice As Integer
strStarter = ActiveCell.Offset(0, -1).Value 'Pulls value from cell to the left
intHowMany = Int(Len(strStarter) / 3)
ReDim arrSlices(1 To intHowMany) As String
intStartSlice = 1
For x = 1 To intHowMany
arrSlices(x) = Mid(strStarter, intStartSlice, 3)
intStartSlice = intStartSlice + 3
Next x
Call BubbleSort(arrSlices)
For x = 1 To intHowMany
strFinish = strFinish + arrSlices(x) & "+"
Next x
strFinish = Left(strFinish, Len(strFinish) - 1)
ActiveCell.Value = strFinish 'Puts result into activecell
End Sub
Sub BubbleSort(list() As String)
'Taken from power programming with VBA
'It’s a sorting procedure for 1-dimensional arrays named List
'The procedure takes each array element, if it is greater than the next element, the two elements swap positions.
'The evaluation is repeated for every pair of items (that is n-1 times)
Dim First As Integer, Last As Long
Dim i As Long, j As Long
Dim temp As String
First = LBound(list)
Last = UBound(list)
For i = First To Last - 1
For j = i + 1 To Last
If list(i) > list(j) Then
temp = list(j)
list(j) = list(i)
list(i) = temp
End If
Next j
Next i
End Sub

Resources