Count Exact Match using Excel VBA - excel

I am trying to count exact word matches using Excel VBA, but my attempts fail either because of case sensitivity or because of partial matches.
Here is my data:
The Experience column contains certain keywords, and I am extracting those keywords based on a master list.
The problems in the Result column are:
It shows UI2, meaning UI was found 2 times, but as we can see in the experience text, it appears only once.
Same with GO: it shows 2, one from "Go" and the other from "Google".
NoSQL has been extracted as both NoSQL and SQL. However, these are two different skills, and since the experience text doesn't mention SQL on its own, SQL shouldn't be extracted.
There is a skill called "R" in the master file, and it is difficult to extract only that R because every letter "R" in the text is counted.
Here is my code snippet:
I have read many articles but haven't found an appropriate solution. Kindly help.
Thanks

You need to account for word boundaries if you don't want "Go" to (e.g.) match "Google". You might be better off using a regex approach to find matches.
Note: to avoid matching "Go" with "go", you need the match to be case-sensitive, but to avoid misses on other terms you might need to pass all possible case variants, e.g. "mySQL|MySQL".
Sub matchTester()
Dim s As String, skill
s = "I go to school now and have used R, MySQL, noSQL and " & _
"SQL and Go in my Ratings job at Google. I like R"
For Each skill In Array("SQL", "noSQL|NoSQL", "mySQL|MySQL", "R", "Go", "VBA", "C#")
Debug.Print skill, CountMatches(s, skill)
Next skill
End Sub
Function CountMatches(sIn As String, countThis) As Long
    Dim matches As Object
    With CreateObject("vbscript.regexp")
        .Global = True          ' find every match, not just the first
        .IgnoreCase = False     ' "Go" must not match "go"
        .Pattern = "\b(" & countThis & ")\b" 'add word boundaries
        Set matches = .Execute(sIn)
    End With
    CountMatches = matches.Count
End Function
output:
SQL 1
noSQL|NoSQL 1
mySQL|MySQL 1
R 2
Go 1
VBA 0
C# 0
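For readers who want to try the word-boundary idea outside Excel, here is a minimal Python sketch of the same approach. `count_matches` is a hypothetical name, and Python's `re` stands in for the VBScript regex engine; both should treat `\b` the same way for plain ASCII text.

```python
import re

def count_matches(text: str, alternatives: str) -> int:
    """Count whole-word, case-sensitive matches of a '|'-separated list of variants."""
    # \b on both sides stops "Go" from matching inside "Google";
    # the non-capturing group lets "noSQL|NoSQL" act as a single term.
    pattern = r"\b(?:" + alternatives + r")\b"
    return len(re.findall(pattern, text))  # re is case-sensitive by default

s = ("I go to school now and have used R, MySQL, noSQL and "
     "SQL and Go in my Ratings job at Google. I like R")
for skill in ["SQL", "noSQL|NoSQL", "mySQL|MySQL", "R", "Go", "VBA", "C#"]:
    print(skill, count_matches(s, skill))
```

One caveat shared with the VBA version: `\b` needs a word character on one side, so a term ending in a symbol, such as C#, is not counted when followed by a space or punctuation. Terms containing regex metacharacters (for example C++) would also need each variant escaped, e.g. with `re.escape`.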

Related

Clean data in excel that comes in varying formats

I have an Excel table that contains values in these formats. The table has over 30,000 entries.
I need to clean this data so that only the numbers directly after V- are left. For example, when the value is SV-51140r3_rule, V-4407..., I only want 4407 to remain, and when the value is SV-245744r822811_rule, I only want 245744 to remain. I have about 10 formulas that handle these variations, but they require a lot of manual work. I've also used Excel's Text to Columns feature to clean this data, but it takes 30 minutes to an hour to go through the whole document. I'm looking for a way to streamline this process so that one formula or function can handle all of these variations. I'm open to using VBA but don't have much experience with it, and I can't use pandas or any IDE or other programming language. Help please!!
I've used Text to Columns to clean the data that way, and I've also used a variation of this formula:
=IFERROR(RIGHT(A631,LEN(A631)-FIND("#",SUBSTITUTE(A631,"-","#",LEN(A631)-LEN(SUBSTITUTE(A631,"-",""))))),A631)
Depending on your version of Excel, either of these should work. If you can use the LET function, it will improve performance, as this outstanding article explains.
If you're on an older version of Excel without dynamic arrays, you'd need to press Ctrl+Shift+Enter to make an array formula work; note, though, that both formulas rely on FILTER, which requires Excel 2021 or Microsoft 365.
While these look daunting, all these functions do is find the text after the last V, using =SUBSTITUTE(RIGHT(SUBSTITUTE(A2,"V",REPT("🍄",999)),999),"🍄",""), and then loop through each character, returning only the digits.
The mushroom 🍄 could, of course, be any character that is unlikely to appear in the actual data.
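To make the formula logic concrete, here is a rough Python rendering of what the formulas below compute. It is an illustration rather than a drop-in replacement; `formula_equivalent` is a hypothetical name, and the 999-character limit of the RIGHT/REPT trick is ignored.

```python
def formula_equivalent(value: str) -> str:
    # SUBSTITUTE(RIGHT(SUBSTITUTE(A2,"V",REPT(...)),999),...,""):
    # keep only the text after the last "V".
    after_last_v = value.rsplit("V", 1)[-1]
    dash = after_last_v.find("-")
    if dash == -1:
        return ""  # Excel's FIND would return #VALUE! here
    # MID(zText, FIND("-", zText), 9^9): everything from the first "-" on.
    tail = after_last_v[dash:]
    # TEXTJOIN("", TRUE, IF(ISNUMBER(MID(...)+0), ...)): keep every digit.
    return "".join(ch for ch in tail if ch in "0123456789")

print(formula_equivalent("SV-51140r3_rule, V-4407"))
print(formula_equivalent("SV-245744r822811_rule"))
```

For a value such as SV-245744r822811_rule this yields 245744822811, because every digit after the hyphen is kept, not only the run that directly follows "V-".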
Old School
=TEXTJOIN("",TRUE,IF(ISNUMBER(MID(MID(SUBSTITUTE(RIGHT(SUBSTITUTE(A2,"V",REPT("πŸ„",999)),999),"πŸ„",""),
FIND("-",SUBSTITUTE(RIGHT(SUBSTITUTE(A2,"V",REPT("πŸ„",999)),999),"πŸ„","")),9^9),
FILTER(COLUMN($1:$1),COLUMN($1:$1)<=LEN(MID(SUBSTITUTE(RIGHT(SUBSTITUTE(A2,"V",REPT("πŸ„",999)),999),"πŸ„",""),
FIND("-",SUBSTITUTE(RIGHT(SUBSTITUTE(A2,"V",REPT("πŸ„",999)),999),"πŸ„","")),9^9))),1)+0),
MID(MID(SUBSTITUTE(RIGHT(SUBSTITUTE(A2,"V",REPT("πŸ„",999)),999),"πŸ„",""),
FIND("-",SUBSTITUTE(RIGHT(SUBSTITUTE(A2,"V",REPT("πŸ„",999)),999),"πŸ„","")),9^9),
FILTER(COLUMN($1:$1),COLUMN($1:$1)<=LEN(MID(SUBSTITUTE(RIGHT(SUBSTITUTE(A2,"V",REPT("πŸ„",999)),999),"πŸ„",""),
FIND("-",SUBSTITUTE(RIGHT(SUBSTITUTE(A2,"V",REPT("πŸ„",999)),999),"πŸ„","")),9^9))),1),""))
Let Function
(use this if you can)
=LET(zText,SUBSTITUTE(RIGHT(SUBSTITUTE(A2,"V",REPT("🍄",999)),999),"🍄",""),
TEXTJOIN("",TRUE,IF(ISNUMBER(MID(MID(zText,FIND("-",zText),9^9),
FILTER(COLUMN($1:$1),COLUMN($1:$1)<=LEN(MID(zText,FIND("-",zText),9^9))),1)+0),
MID(MID(zText,FIND("-",zText),9^9),
FILTER(COLUMN($1:$1),COLUMN($1:$1)<=LEN(MID(zText,FIND("-",zText),9^9))),1),"")))
VBA Custom Function
You could also use a VBA custom function to accomplish what you want.
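Function getNumbersAfterCharcter(aCell As Range, aCharacter As String) As String
    Const errorValue = "#NoValuesInText"
    Dim i As Long, pos As Long, theText As String, theValue As String
    theText = aCell.Value
    ' position of the LAST occurrence of the delimiter (e.g. "-" or "V-")
    pos = InStrRev(theText, aCharacter)
    If pos > 0 Then
        ' collect only the digits that immediately follow the delimiter
        For i = pos + Len(aCharacter) To Len(theText)
            theValue = Mid(theText, i, 1)
            If theValue Like "#" Then   ' "#" in Like matches a single digit 0-9
                getNumbersAfterCharcter = getNumbersAfterCharcter & theValue
            Else
                Exit For                ' stop at the first non-digit, e.g. the "r" in "r822811"
            End If
        Next i
    End If
    If getNumbersAfterCharcter = "" Then getNumbersAfterCharcter = errorValue
End Function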
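If a regular expression is acceptable, the rule "the digits directly after the last V-" can also be written as a single pattern. A minimal Python sketch, assuming the prefix is always an uppercase "V-"; `number_after_last_v` is a hypothetical helper name.

```python
import re
from typing import Optional

def number_after_last_v(value: str) -> Optional[str]:
    # Capture the run of digits that directly follows each "V-"...
    runs = re.findall(r"V-([0-9]+)", value)
    # ...and keep the last one, i.e. "the number after the last V-".
    return runs[-1] if runs else None

print(number_after_last_v("SV-245744r822811_rule"))
```

Because `[0-9]+` stops at the first non-digit, the r822811 suffix is not appended.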

Filter phone numbers from open text field - Power BI, excel, VBA

I have a text field in a table where I need to substitute phone numbers where applicable.
For example, the text field could contain:
Call me on 08588812885 immediately
Call me on 07525812845
I need assistance please contact me
Good service
Sometimes the text contains a phone number and sometimes it doesn't, and the number is different every time.
Is there a measure I can use to replace the phone numbers with nothing (i.e. remove them)?
Ideally the solution would be in Power BI, but it can also be done in the raw data using Excel or VBA.
Regular expression in VBA (excel) or Python (Power BI) is a straightforward solution.
I had never used Power BI with Python before, but managed to put together the following Python script.
In the Power BI transformation steps, I created a new column that copies the [message] column and named it [noPhoneNumber]; in the next step I ran this Python script:
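import re

# Power BI passes the current table to the script as a pandas DataFrame named "dataset"
PHONE_PATTERN = re.compile(r"\d{10,11}")  # raw string: runs of 10 or 11 digits

def removePhone(x):
    # leave non-text values (e.g. nulls) untouched
    return PHONE_PATTERN.sub("**number removed**", x) if isinstance(x, str) else x

# apply to the whole column instead of assigning row by row
# (dataset[col][i] = ... is chained assignment and can trigger pandas warnings)
dataset["noPhoneNumber"] = dataset["noPhoneNumber"].apply(removePhone)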
So the column "noPhoneNumber"
Call me on 08588812885 immediately
Call me on 07525812845
I need assistance please contact me
Good service
becomes
Call me on **number removed** immediately
Call me on **number removed**
I need assistance please contact me
Good service
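A possible refinement, sketched in Python under the assumption that a phone number is exactly 10 or 11 consecutive digits: the unbounded `\d{10,11}` above also fires inside longer digit runs (the first 11 digits of a 12-digit reference would be replaced), whereas the lookarounds below only accept a run with no digit on either side. `PHONE` and `remove_phone` are illustrative names.

```python
import re

# (?<!\d) and (?!\d) reject matches that are part of a longer number.
PHONE = re.compile(r"(?<!\d)\d{10,11}(?!\d)")

def remove_phone(text: str, replacement: str = "**number removed**") -> str:
    return PHONE.sub(replacement, text)

rows = [
    "Call me on 08588812885 immediately",
    "Call me on 07525812845",
    "I need assistance please contact me",
    "Good service",
]
for row in rows:
    print(remove_phone(row))
```

In the Power BI script, the compiled pattern could also be passed to `Series.str.replace(PHONE, "**number removed**", regex=True)`.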
In VBA, it's preferable to create a UDF (user-defined function) rather than a subroutine, which would be too error-prone for this kind of problem.
[Added]
If you need an Excel-based solution, you can create a UDF like this:
(for early binding to VBScript_RegExp_55.RegExp, remember to add a reference to "Microsoft VBScript Regular Expressions 5.5" under Tools > References in the VBA editor)
Function removePhoneNumber(text As String, Optional replacement As String = "**number removed**") As String
    Dim regex As New RegExp
    regex.Pattern = "\d{10,11}"
    regex.Global = True   ' replace every phone number, not only the first one
    removePhoneNumber = regex.Replace(text, replacement)
End Function
...and then use it in the worksheet like this:
=removePhoneNumber(A2)
=removePhoneNumber(A3)
and so on...
A simple VBA function alternative
Function removePhone(s As String) As String
Const DELIM As String = " "
Dim i As Long, tokens As Variant
tokens = Split(s, DELIM)
For i = LBound(tokens) To UBound(tokens)
If IsNumeric(tokens(i)) Then
tokens(i) = "*Removed*" ' << change to your needs
Exit For ' assuming a single phone number per string
End If
Next
removePhone = Join(tokens, DELIM)
End Function
You can do this in Power Query. Create a custom column with the code below. I've assumed the column name is comments; please adjust it to your own column name.
if Text.Length(Text.Select([comments], {"0".."9"})) = 11
then
Text.Replace(
[comments],
Text.Select([comments], {"0".."9"}),
""
)
else [comments]
Here is the output below. You can also replace the phone numbers with other text, such as ####, to anonymize them.
NOTE
This only works if the string contains exactly one number and that number is 11 digits long (you can adjust the length in the code as required).
It will not work if the string contains more than one number.
If the string contains one number whose length is not 11, the whole string is kept unchanged.
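To see these edge cases concretely, here is a small Python port of the custom-column logic. `pq_style_remove` is a hypothetical name, and the port assumes plain ASCII digits, matching `{"0".."9"}` in the M code.

```python
def pq_style_remove(comment: str, length: int = 11) -> str:
    # Text.Select([comments], {"0".."9"}): gather every digit in the string
    digits = "".join(ch for ch in comment if "0" <= ch <= "9")
    # if Text.Length(...) = 11 then Text.Replace([comments], digits, "")
    if len(digits) == length:
        return comment.replace(digits, "")
    return comment  # else [comments]

for text in ["Call me on 07525812845", "Call 0758 or 1234567", "Good service"]:
    print(repr(pq_style_remove(text)))
```

The middle case shows why several numbers break it: the digits add up to 11, but because they are not contiguous, the replace step finds nothing to remove.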

Excel - need to extract a machine ID that includes a number in a cell, regardless of its position in sentence

It's my first time posting a question here :)
When exporting data from our enterprise ticketing system, we unfortunately don't have a dedicated column for the machine ID; instead we have a "problem description" column that includes both a short description of the issue and the machine ID. The machine ID always contains digits; it may consist of digits only, or have 2-4 letters before the digits, with no spaces. Examples:
XK2065
2092
BOZK10625
The number of digits can vary, but is never more than six.
Two examples of the problem description:
1) XK2065 - issue not detected, please investigate.
2) Please investigate why issue was not detected, machine ID is XK2065, ticket number 1425778.
So, the problem is that the unit ID can be located anywhere in the sentence and can also contain only numbers or 2 to 4 letters before the numbers.
Is there a function that can extract the machine ID regardless of its location, along with the leading letters adjacent to the digits if it has them? An additional condition I'd like is that the number of digits be no more than 6, because 7-digit ticket numbers are sometimes included.
A function would be preferable to a VBA macro.
Thanks in advance!
This function should do what you need, using a regular expression (as @RonRosenfeld suggested):
Function RegExID(str As String) As String
    Dim rgx As Object, allMatches As Object, m As Object
    Set rgx = CreateObject("VBScript.RegExp")
    With rgx
        .Pattern = "\b[A-Z]{0,4}\d{4,6}\b"   ' up to 4 letters, then 4-6 digits
        .Global = True
        .IgnoreCase = True
    End With
    Set allMatches = rgx.Execute(str)
    For Each m In allMatches
        RegExID = m.Value   ' keeps the last match found
    Next
End Function
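The same pattern can be tried in Python, for instance to test it against sample descriptions before relying on the UDF. This is a sketch with a hypothetical helper name; it returns the last match to mirror the VBA loop.

```python
import re
from typing import Optional

# Same pattern as the VBA answer: up to four letters, then 4-6 digits,
# with word boundaries so a 7-digit ticket number cannot match.
MACHINE_ID = re.compile(r"\b[A-Z]{0,4}\d{4,6}\b", re.IGNORECASE)

def extract_machine_id(description: str) -> Optional[str]:
    ids = MACHINE_ID.findall(description)
    return ids[-1] if ids else None  # last match, as in the VBA loop

print(extract_machine_id(
    "Please investigate why issue was not detected, "
    "machine ID is XK2065, ticket number 1425778."))
```

The trailing `\b` is what excludes 1425778: six digits followed by a seventh leaves no word boundary. Note that `[A-Z]{0,4}` also accepts a single letter, while the question specifies 2-4 letters; `(?:[A-Z]{2,4})?` in its place would be stricter.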

Limit text to allowed characters only - (not by enumerating the wrong characters) | VBA

I would like to limit certain textboxes to accept only [A-Za-z]
I hope a counterpart to Like exists.
With Like, I would have to write a long list of disallowed characters to filter on:
Not MyString like [?;!°%/=....]
I can think of a solution in the form of:
For Counter = 1 To Len(MyString)
if Mid(MyString, Counter, 1) Like "*[a-z]*" = false then
MsgBox "String contains bad characters"
exit sub
end if
next
...but is there a more sophisticated one-liner?
In the meantime, I have created a function so I can call it as a one-liner:
Function isPureString(myText As String) As Boolean
Dim i As Integer
isPureString = True
For i = 1 To Len(myText)
If Mid(myText, i, 1) Like "*[a-zA-Z_íéáűúőöüóÓÜÖÚŐŰÁÉÍ]*" = False Then
isPureString = False
End If
Next
End Function
If I add one more parameter, it's also possible to define the allowed characters when calling the function.
OK, it seems my question was something of a duplicate, even though the original didn't show up in my search results.
So credit goes to @QHarr for posting the link.
The one-liner I can build from that idea is:
If myText Like WorksheetFunction.Rept("[a-zA-Z]", Len(myText)) = False Then 'do something.
Using .Rept is inspiringly clever and elegant, in my opinion.
What it does: it repeats the pattern once per character instead of looping through the characters.
EDIT:
In an overabundance of nice and elegant solutions, the current leader is:
If not myText Like "*[!A-Za-z]*" then '... do something
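For comparison, the negated-class trick carries over directly to regular expressions. A minimal Python sketch with a hypothetical function name:

```python
import re

def is_ascii_letters_only(text: str) -> bool:
    # Same idea as: Not text Like "*[!A-Za-z]*"
    # i.e. "there is no character outside A-Z / a-z".
    return re.search(r"[^A-Za-z]", text) is None

print(is_ascii_letters_only("Hello"), is_ascii_letters_only("Hello!"))
```

`re.fullmatch(r"[A-Za-z]*", text) is not None` is an equivalent positive formulation. Both this search and the Like version treat an empty string as valid.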
Statistics update:
I have tested the performance of the last three solutions.
I placed a # in the text string below at the beginning, at the end, or nowhere.
The criteria were: "*[a-zA-Z \S.,]*"
For 100000 repetitions
text = "This will be a very Long text, with one unwanted in the middle, to be able to test the difference in performance of the approaches."
1.) Using the [!...] -> 30ms with error, 80ms if no error
2.) Using .Rept -> around 1800ms for all cases
3.) Using characterLoop+Mid -> around 3000ms if no error / 40-80ms if early error

Lookup customer type by the meaningful part of the customer name and set prioritize

Is there any way Excel 2010 can look up the customer type using the meaningful part of the customer name?
For example, the customer name is Littleton's Valley Market, but in the list where I'm looking up the customer type, the names are formatted a little differently, such as <Littletons Valley MKT #2807 and/or Littleton Valley.
Some customers are listed under multiple customer types. How can Excel tell me which type applies to which customer, and can I set Excel to pull the primary or secondary type?
Re #1: this fails on the leading < (if it belongs there!) and on any other extraneous prefix, but these may be rare or non-existent, so:
=INDEX(G:G,MATCH(LEFT(A1,6)&"*",F:F,0))
or similar may catch enough to be useful. This looks at the first six characters but can be adjusted to suit, though unfortunately only one length at a time. It assumes the mismatched names are in ColumnA (e.g. A1 for the formula above) and that the correct names are in ColumnF, with the required type in the corresponding row of ColumnG.
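The wildcard MATCH can be modelled in a few lines of Python to see which names it will and won't catch. This sketch assumes the answer's layout (messy names looked up against a list of correct names, with a parallel list of types); `lookup_by_prefix` and the sample data are hypothetical.

```python
from typing import Optional, Sequence

def lookup_by_prefix(name: str, names: Sequence[str], types: Sequence[str],
                     n: int = 6) -> Optional[str]:
    # MATCH(LEFT(A1,6)&"*", F:F, 0): first entry that starts with the
    # first n characters of the messy name (MATCH ignores case).
    prefix = name[:n].lower()
    for correct, kind in zip(names, types):
        if correct.lower().startswith(prefix):
            return kind  # INDEX(G:G, ...): type from the same row
    return None  # MATCH would return #N/A

names = ["Littleton's Valley Market", "Valley Foods"]  # hypothetical ColumnF
types = ["Grocery", "Wholesale"]                       # hypothetical ColumnG
print(lookup_by_prefix("Littletons Valley MKT #2807", names, types))
print(lookup_by_prefix("<Littletons Valley MKT #2807", names, types))
```

The second call shows the leading-< failure mentioned above: "<Little" is not a prefix of any correct name.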
On a larger scale, the Fuzzy Lookup add-in may be helpful.
Since the question has a VBA tag, Soundex matching and Levenshtein distance may also be of interest.
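For reference, Levenshtein distance is the minimum number of single-character insertions, deletions and substitutions needed to turn one string into another. Here is a compact dynamic-programming sketch in Python (the function name is illustrative); a VBA version would follow the same two-row structure.

```python
def levenshtein(a: str, b: str) -> int:
    # prev[j] = distance between the first i-1 chars of a and first j chars of b
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]

print(levenshtein("Littleton's Valley", "Littletons Valley"))
```

A small distance relative to the name length suggests the two entries refer to the same customer; the cut-off would have to be chosen for the data.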
Re #2: if the secondary type is in ColumnH, again in the matching row, change G:G above to H:H.
pnuts gives a good answer re: Fuzzy Lookup, Soundex matching, etc. Here's a quick and dirty way I've handled this before:
Function isNameLike(nameSearch As String, nameMatch As String) As Boolean
On Error GoTo ErrorHandler
If InStr(1, invalidChars(nameSearch), invalidChars(nameMatch), vbTextCompare) > 0 Then isNameLike = True
Exit Function
ErrorHandler:
isNameLike = False
End Function
Function invalidChars(strIn As String) As String
Dim i As Long
Dim sIn As String
Dim sOut As String
sOut = ""
On Error GoTo ErrorHandler
For i = 1 To Len(strIn)
sIn = Mid(strIn, i, 1)
If InStr(1, " 1234567890~`!@#$%^&*()_-+={}|[]\:'<>?,./" & Chr(34), sIn, vbTextCompare) = 0 Then sOut = sOut & sIn
Next i
invalidChars = sOut
Exit Function
ErrorHandler:
invalidChars = strIn
End Function
Then I can call isNameLike from code, or use it as a formula in a worksheet. Note that you still have to supply the "significant" part of the customer name you're looking for.
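The same clean-then-compare idea, sketched in Python with hypothetical names: strip the listed characters from both names, then do a case-insensitive substring test, which is what `InStr(..., vbTextCompare) > 0` checks.

```python
# Characters removed by invalidChars: space, digits, punctuation and the double quote.
STRIP = set(" 1234567890~`!@#$%^&*()_-+={}|[]\\:'<>?,./\"")

def strip_chars(s: str) -> str:
    return "".join(ch for ch in s if ch not in STRIP)

def is_name_like(name_search: str, name_match: str) -> bool:
    # Case-insensitive containment of the cleaned match inside the cleaned search text.
    return strip_chars(name_match).lower() in strip_chars(name_search).lower()

print(is_name_like("<Littletons Valley MKT #2807", "Littleton's Valley"))
```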
