VBA Append unique regular expressions to string variable - excel

How can I grab matching regular expressions from a string, remove the duplicates, and append them to a string variable that separates each by a comma?
For example, for the string "this is an example of the desired regular expressions: BPOI-G8J7R9, BPOI-G8J7R9 and BPOI-E5Q8D2" the desired output string would be "BPOI-G8J7R9,BPOI-E5Q8D2".
I have attempted to use a dictionary to remove the duplicates, but my function returns the dreaded #VALUE! error.
Can anyone see where I'm going wrong here? Or is there any suggestion for a better way of going about this task?
Code below:
Public Function extractexpressions(ByVal text As String) As String
Dim regex, expressions, expressions_dict As Object, result As String, found_expressions As Variant, i As Long
Set regex = CreateObject("VBScript.RegExp")
regex.Pattern = "[A-Z][A-Z][A-Z][A-Z][-]\w\w\w\w\w\w"
regex.Global = True
Set expressions_dict = CreateObject("Scripting.Dictionary")
If regex.Test(text) Then
expressions = regex.Execute(text)
End If
For Each item In expressions
If Not expressions_dict.exists(item) Then expressions_dict.Add item, 1
Next
found_expressions = expressions_dict.items
result = ""
For i = 1 To expressions_dict.Count - 1
result = result & found_expressions(i) & ","
Next i
extractexpressions = result
End Function

If you call your function from a Sub you will be able to debug it.
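See the comment in the code below about adding the matches as keys to the dictionary: if you add the match object itself instead of explicitly specifying the match's Value property, the dictionary won't de-duplicate your matches, because two or more match objects with the same value are still distinct objects. Two further problems in the original: the result of regex.Execute must be assigned with Set because it is an object, and the output loop reads the dictionary's Items (the 1s) rather than its Keys while also skipping index 0 of that zero-based array.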
Sub Tester()
Debug.Print extractexpressions("ABCD-999999 and DFRG-123456 also ABCD-999999 blah")
End Sub
Public Function extractexpressions(ByVal text As String) As String
Dim regex As Object, expressions As Object, expressions_dict As Object
Dim item
Set regex = CreateObject("VBScript.RegExp")
regex.Pattern = "[A-Z]{4}-\w{6}"
regex.Global = True
If regex.Test(text) Then
Set expressions = regex.Execute(text)
Set expressions_dict = CreateObject("Scripting.Dictionary")
For Each item In expressions
'A dictionary can have object-type keys, so make sure to add the match *value*
' and not the match object itself
If Not expressions_dict.Exists(item.Value) Then expressions_dict.Add item.Value, 1
Next
extractexpressions = Join(expressions_dict.Keys, ",")
End If
End Function

VBA's regex object actually supports the backreference to a previous capture group. Hence we can get all the unique items through the expression itself:
([A-Z]{4}-\w{6})(?!.*\1)
To put this in practice:
Sub Test()
Debug.Print extractexpressions("this is an example of the desired regular expressions: BPOI-G8J7R9, BPOI-G8J7R9 and BPOI-E5Q8D2")
End Sub
Public Function extractexpressions(ByVal text As String) As String
With CreateObject("VBScript.RegExp")
.Pattern = "([A-Z]{4}-\w{6})(?!.*\1)|."
.Global = True
extractexpressions = Replace(Application.Trim(.Replace(text, "$1 ")), " ", ",")
End With
End Function
Prints: BPOI-G8J7R9,BPOI-E5Q8D2

Related

Identify which capturing group was matched in the evaluated string using regex

Hello, I'm new to regular expressions and I'm having a hard time figuring out how to get the group that was matched in the evaluated string using regex in VBA.
There are four or more different words that can appear in the string, each followed by one or more digits:
W-Point =
WR/KE-Point=
WNr-Point=
SST_P-Nr =
One of these words appears just once in the string.
Evaluated string:
"3: CALL U(Base,EZSP,Nr1,Pr-nr=20,Offset=1,Path=2,WNr-Point=20,Pr=65,ON)"
Regexpattern used:
(?:(W-Point=)(\d*)|(SST_P-Nr=)(\d*)|(WR/KE-Point=)(\d*)|(WNr-Point=)(\d*))
So far everything works.
Problem: I need to identify which word/digit pair was matched and get its group number. Right now I'm looping through the results and discarding the submatches that are empty. Is there a better or more efficient way to do it?
Thanks in advance.
Try
Sub test()
Dim s As String
s = "3: CALL U(Base,EZSP,Nr1,Pr-nr=20,Offset=1,Path=2,WNr-Point=20,Pr=65,ON)"
Dim Regex As Object, m As Object
Set Regex = CreateObject("vbscript.regexp")
With Regex
.Global = True
.MultiLine = False
.IgnoreCase = True
.pattern = "(W-Point|WR/KE-Point|WNr-Point|SST_P-Nr)( *= *)(\d*)"
End With
If Regex.test(s) Then
Set m = Regex.Execute(s)(0).submatches
Debug.Print m(0), "'" & m(1) & "'", m(2)
End If
End Sub
Update: the pattern now also captures the = sign and any spaces around it.

Read json array and print the values using vba

I have a JSON object that contains a JSON array. I need to iterate over the array and print the values. I am using Excel VBA and am very new to it. Could anyone help me out?
Set sr= CreateObject("MSScriptControl.ScriptControl")
sr.Language = "JScript"
Set Retval = MyScript.Eval("(" + newString + ")")
MsgBox Retval.Earth.Fruits(0).name
When I execute the above code I get 'Object doesn't support this property or method'.
I need to iterate over all the names under Fruits.
I would use a JSON parser such as JsonConverter.bas: it works in both 32-bit and 64-bit Office and does not carry the same security risk as ScriptControl.
JsonConverter.bas: Download the raw code and add it to a standard module named JsonConverter. Then, in the VBE, go to Tools > References and add a reference to Microsoft Scripting Runtime.
Your JSON object is a dictionary whose inner dictionary Earth contains a collection under the key Fruits. Each item in the collection is a dictionary with the key "name" whose value is the fruit. In JSON, [] denotes a collection and {} a dictionary.
Option Explicit
Public Sub test()
Dim s As String, json As Object, item As Object
s = "{""Earth"":{""Fruits"":[{""name"":""Mango""},{""name"":""Apple""},{""name"":""Banana""}]}}"
Set json = JsonConverter.ParseJson(s)
For Each item In json("Earth")("Fruits")
Debug.Print item("name")
Next
End Sub
Example with regex:
Public Sub test()
Dim s As String
s = "{""Earth"":{""Fruits"":[{""name"":""Mango""},{""name"":""Apple""},{""name"":""Banana""}]}}"
PrintMatches s
End Sub
Public Sub PrintMatches(ByVal s As String)
Dim i As Long, matches As Object, re As Object
Set re = CreateObject("VBScript.RegExp")
With re
.Global = True
.MultiLine = True
.IgnoreCase = False
.Pattern = """name"":""(.*?)"""
If .test(s) Then
Set matches = .Execute(s)
For i = 0 To matches.Count - 1
Debug.Print matches(i).SubMatches(0)
Next i
Else
Debug.Print "No matches"
End If
End With
End Sub

vba Expected Array

Anybody have a good solution for recursive replace?
For example, you still end up with commas in this string returned by MsgBox:
Dim s As String
s = "32,,,,,,,,,,,,,,,,23"
MsgBox Replace(s, ",,", ",")
I only want one comma.
Here is code that I developed, but it doesn't compile:
Function RecursiveReplace(ByVal StartString As String, ByVal Find As String, ByVal Replace As String) As String
Dim s As String
s = Replace(StartString, Find, Replace)
t = StartString
Do While s <> t
t = s
s = Replace(StartString, Find, Replace)
Loop
RecursiveReplace = s
End Function
The compiler complains about the second line in the function:
s = Replace(StartString, Find, Replace)
It says Expected Array.
???
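The compile error occurs because the parameter named Replace hides the built-in Replace function inside the procedure, so Replace(...) is read as an attempt to index the String parameter, hence "Expected Array". Renaming the parameter fixes it. Note also that the loop re-runs the replacement on StartString instead of on s, so it never gets beyond the result of the first pass. Alternatively, you can use a regular expression. This shows the basic idea: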
Function CondenseCommas(s As String) As String
Dim RegEx As Object
Set RegEx = CreateObject("VBScript.RegExp")
RegEx.Pattern = ",+"
RegEx.Global = True 'replace every run of commas, not just the first one
CondenseCommas = RegEx.Replace(s, ",")
End Function
Tested like:
Sub test()
Dim s As String
s = "32,,,,,,,,,,,,,,,,23"
MsgBox CondenseCommas(s)
End Sub
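The loop-based approach from the question can also be made to work. The following is a minimal sketch, not the original author's code: the parameter names FindText and ReplaceText are illustrative choices so that nothing hides the built-in Replace function, and each pass operates on the previous result rather than on the original string.

```vba
' Repeatedly replaces FindText with ReplaceText until the string stops changing.
' FindText and ReplaceText are illustrative names; the key point is that no
' parameter is called Replace, so the built-in Replace function stays visible.
Function RecursiveReplace(ByVal StartString As String, _
                          ByVal FindText As String, _
                          ByVal ReplaceText As String) As String
    Dim current As String, previous As String
    current = StartString
    Do
        previous = current
        ' Work on the latest result, not on the original StartString
        current = Replace(previous, FindText, ReplaceText)
    Loop While current <> previous
    RecursiveReplace = current
End Function

Sub TestRecursiveReplace()
    ' Prints 32,23
    Debug.Print RecursiveReplace("32,,,,,,,,,,,,,,,,23", ",,", ",")
End Sub
```

One caveat with this design: if ReplaceText itself contains FindText (for example replacing "," with ",,"), the string keeps changing and the loop never ends, so callers should avoid such argument pairs.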

Function to remove any characters before a first number in a string and any characters after dot/semicolon

How can I enhance this function to remove any characters before the first number in a string and any characters after . or : ? For instance:
GigabitEthernet0/3.210 --> 0/3
Serial6/2:0.100 --> 6/2
Serial6/6:0 --> 6/6
Function: =REPLACE($A2,1,MIN(FIND({0,1,2,3,4,5,6,7,8,9},$A2&"0123456789"))-1,"")
You can use what you already have to get that, and use the MID function instead of replace:
=MID(
$A1,
MIN(FIND({0,1,2,3,4,5,6,7,8,9},$A1&"0123456789")),
MIN(FIND({":","."},$A1&".:"))-MIN(FIND({0,1,2,3,4,5,6,7,8,9},$A1&"0123456789"))
)
I would solve this with regular expressions - it would allow more flexibility in the rules, if that were ever needed.
E.g. using the following vba code to expose REGEXP:
Function RegExp(ByVal repattern As String, ByVal value As String, Optional occurrence = 1)
' does a regular expression search for "repattern" in string "value"
' returns the match
' or "" if not found
RegExp = ""
Dim RegEx As Object, RegMatchCollection As Object, RegMatch As Object
' create the RegExp Object with late binding
Set RegEx = CreateObject("vbscript.regexp")
With RegEx
.Global = True 'look for global matches
.Pattern = repattern
End With
Dim counter As Long ' declared so the function also compiles under Option Explicit
counter = 1
Set RegMatchCollection = RegEx.Execute(value)
For Each RegMatch In RegMatchCollection
If counter = occurrence Then
RegExp = RegMatch.value
Exit For
Else
counter = counter + 1
End If
Next
Set RegMatchCollection = Nothing
Set RegEx = Nothing
End Function
You could then have the formula on the worksheet like =RegExp("[0-9][^\.:]*",A1)

Excel VBA - increase accuracy of Match function

I am using a Match function in my program, its main purpose is to compare an input typed by the user and then loop into a database and do something every time there is a match.
Currently, I am working with this :
Function Match(searchStr As Variant, matchStr As Variant) As Boolean
Match = False
If (IsNull(searchStr) Or IsNull(matchStr)) Then Exit Function
If (matchStr = "") Or (searchStr = "") Then Exit Function
Dim f As Variant
f = InStr(1, CStr(searchStr), CStr(matchStr), vbTextCompare)
If IsNull(f) Then Exit Function
Match = f > 0
End Function
And then when it is used :
If Match(sCurrent.Range("A" & i).Value, cell.Value) Then
Here is my problem :
This is far too inaccurate. If my database contains "Foxtrot Hotel", this function finds a match whenever the user types "F", "Fo", "Fox", "ox", "xtro", "t hot" and so on, that is, whenever the typed characters occur anywhere in the complete sentence.
What I want is for my Match function to identify only complete words, so that in this case it reveals a match only for three specific inputs: "Foxtrot", "Hotel" and "Foxtrot Hotel".
I have read about an argument called LookAt which can do this kind of thing with the Find method (LookAt:=xlWhole). Do you know if something similar can be used in my Match function?
Thanks !
You could use a regex like so. This one runs a case-insensitive match on whole words only (using word boundaries around the word being searched). The four calls in Sub B print:
False (fo is followed by a letter, not a word boundary)
True
True
False (FOXTROT h is not followed by a word boundary)
Find won't work with xlWhole for you: that looks to match the entire cell contents, not a single word within the contents.
Function WordMatch(searchStr As Variant, matchStr As Variant) As Boolean
Dim objRegex As Object
Set objRegex = CreateObject("vbscript.regexp")
With objRegex
.Pattern = "\b" & matchStr & "\b"
.ignorecase = True
WordMatch = .test(searchStr)
End With
Set objRegex = Nothing
End Function
Sub B()
Debug.Print WordMatch("foxtrot hotel", "fo")
Debug.Print WordMatch("foxtrot hotel", "hotel")
Debug.Print WordMatch("foxtrot hotel", "FOXTROT")
Debug.Print WordMatch("foxtrot hotel", "FOXTROT h")
End Sub
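Because WordMatch concatenates matchStr straight into the pattern, user input containing regex metacharacters (a dot, parentheses, a plus sign and so on) would be interpreted as pattern syntax. A minimal sketch of one way to guard against that follows; EscapeRegex and WordMatchLiteral are hypothetical helper names introduced here for illustration, not part of the answer above.

```vba
' Hypothetical helper: prefixes each regex metacharacter with a backslash
' so the text is matched literally.
Function EscapeRegex(ByVal text As String) As String
    With CreateObject("VBScript.RegExp")
        .Global = True
        ' Character class listing the metacharacters to escape
        .Pattern = "([\\\^\$\.\|\?\*\+\(\)\[\]\{\}])"
        EscapeRegex = .Replace(text, "\$1")
    End With
End Function

' Hypothetical variant of WordMatch that treats matchStr as literal text.
Function WordMatchLiteral(ByVal searchStr As String, ByVal matchStr As String) As Boolean
    With CreateObject("VBScript.RegExp")
        .Pattern = "\b" & EscapeRegex(matchStr) & "\b"
        .IgnoreCase = True
        WordMatchLiteral = .Test(searchStr)
    End With
End Function

Sub TestWordMatchLiteral()
    Debug.Print WordMatchLiteral("room a.b", "a.b") ' True: the dot is matched literally
    Debug.Print WordMatchLiteral("room axb", "a.b") ' False: the dot no longer matches any character
End Sub
```

Note that \b only marks a boundary next to a word character, so this sketch assumes the search term starts and ends with a letter, digit or underscore.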
