In Excel VBA I need to perform multiple regular expression matches, each of which deletes the match from the string while preserving the remainder of the string. I have it working by daisy-chaining two variables, and by not testing the pattern match first, since the input to the second match is the remainder of the first.
Consider the following data:
(2.5.3) A. 100% of product will be delivered in 3 days
(2.5.3) B. Capability to deliver product by air.
(2.5.3) C. Support for xyz feature
(2.5.3) D. Vendor is to provide an overview of the network as proposed.
(2.5.3) E. The network should allow CustomerABC to discover their devices.
(2.5.3) F. The use of CustomerABC existing infrastructure should be optimized. CustomerABC's capability will vary.
(2.5.3) G. Describe the number of network devices requiring to run CustomerABC's center.
With this data, I am deleting the outline numbers at the beginning of the string, as well as any references to CustomerABC and any hyphenation, which could appear multiple times at any location in the string, in upper or lower case. I have the regexes working. Here is the code I'm trying:
Function test(Txt As String) As String
Dim regEx As Object
Dim v1 As String
Dim v2 As String
Dim n As String
n = "CustomerABC"
If regEx Is Nothing Then
Set regEx = CreateObject("VBScript.RegExp")
regEx.Global = True
regEx.IgnoreCase = True
End If
If Len(Txt) > 0 Then
With regEx
' The 1st pattern
.Pattern = "^\(?[0-9.]+\)?"
'If Not .Test(Txt) Then Exit Function
v1 = .Replace(Txt, "")
' The 2nd pattern
.Pattern = n + "(\S*)?(\s+)?"
'If Not .Test(Txt) Then Exit Function
v2 = .Replace(v1, "")
' The result
test = Application.Trim(v2)
End With
End If
End Function
Is there a way to make this better, speed things up, and have a variable number of match/deletions?
Thanks in advance.
Like this:
Function test(Txt As String) As String
Static regEx As Object '<< need Static here
Dim rv As String, p, n
n = "CustomerABC"
If regEx Is Nothing Then
Set regEx = CreateObject("VBScript.RegExp")
regEx.Global = True
regEx.IgnoreCase = True
End If
If Len(Txt) > 0 Then
rv = Txt
'looping over an array of patterns
For Each p In Array("^\(?[0-9.]+\)?", n & "(\S*)?(\s+)?")
With regEx
.Pattern = p
rv = .Replace(rv, "")
End With
Next p
End If
test = Application.Trim(rv)
End Function
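For the "variable number of match/deletions" part, here is a minimal sketch that takes the patterns as arguments instead of hard-coding them. The name StripPatterns and its ParamArray parameter are my own illustration, not something from the code above, and it assumes every pattern should be replaced with an empty string.

```vba
' Hypothetical helper: removes every match of each supplied pattern, in order.
Function StripPatterns(ByVal Txt As String, ParamArray patterns() As Variant) As String
    Static regEx As Object   ' Static so the object is created only once
    Dim p As Variant

    If regEx Is Nothing Then
        Set regEx = CreateObject("VBScript.RegExp")
        regEx.Global = True
        regEx.IgnoreCase = True
    End If

    ' Each pass works on the result of the previous pass
    For Each p In patterns
        regEx.Pattern = p
        Txt = regEx.Replace(Txt, "")
    Next p

    ' Application.Trim also collapses repeated inner spaces
    StripPatterns = Application.Trim(Txt)
End Function
```

From a worksheet this could be called as =StripPatterns(A1, "^\(?[0-9.]+\)?", "CustomerABC(\S*)?(\s+)?"), adding or dropping patterns as needed.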
Related
I'm trying to extract 'manufacturer=acme' from, for example:
attribute1=red,attribute2=medium,manufacturer=acme,attribute4=spherical
from column 'attributes', for which there are 8000+ rows.
I can't use left(), right(), split() functions because the manufacturer attribute doesn't have a fixed number of attributes/characters to the left or right of it and split() only works for one character, not a string.
Is there a way I can achieve this, target the string manufacturer= and remove all text from the left and right starting from its encapsulating commas?
Quick mock-up for looping through a split string (untested):
dim stringToArray as variant: stringToArray = split(target.value, ",")
dim arrayLocation as long
for arrayLocation = lbound(stringToArray) to ubound(stringToArray)
if instr(ucase(stringToArray(arrayLocation)), ucase("manufacturer=")) then
dim manufacturerName as string: manufacturerName = right(stringToArray(arrayLocation), len(stringToArray(arrayLocation))-len("manufacturer="))
exit for
end if
next arrayLocation
debug.print manufacturerName
I have, maybe, an overkill solution using RegExp.
Following is a UDF you can use in a formula
Public Function ExtractManufacturerRE(ByRef r As Range) As String
On Error GoTo RETURN_EMPTY_STR
Dim matches As Object
With CreateObject("VBScript.RegExp")
.Pattern = "manufacturer=[^,]+"
.Global = False
Set matches = .Execute(r.Value)
If matches.Count > 0 Then
ExtractManufacturerRE = matches.Item(0).Value
End If
End With
RETURN_EMPTY_STR:
End Function
To be fair, this is sub-optimal, plus it doesn't work on a range but only on a single cell.
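If you would rather avoid RegExp altogether, here is a minimal sketch using only InStr and Mid$. The function name ExtractAttribute and its parameters are hypothetical, and it assumes the attributes are separated by commas with no spaces around them, as in the sample string.

```vba
' Hypothetical helper: returns "name=value" for the given attribute name,
' or an empty string if the attribute is not present.
Public Function ExtractAttribute(ByVal source As String, ByVal attrName As String) As String
    Dim startPos As Long, endPos As Long

    ' Prefix a comma so the first attribute is found the same way as the others.
    ' Because of that extra character, the position of the comma in the padded
    ' string equals the position of the attribute name in the original string.
    startPos = InStr(1, "," & source, "," & attrName & "=", vbTextCompare)
    If startPos = 0 Then Exit Function

    ' The value ends at the next comma, or at the end of the string
    endPos = InStr(startPos, source, ",")
    If endPos = 0 Then endPos = Len(source) + 1

    ExtractAttribute = Mid$(source, startPos, endPos - startPos)
End Function
```

Used in a cell as =ExtractAttribute(A2, "manufacturer"), it can be filled down the 8000+ rows like any other formula.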
I am looking to have my regexp return the value of the pattern I am looking for, or its position. Similar to how the InStr function returns a position in a string, I would like to be able to do this with patterns. What I have so far just replaces the pattern matches, and I cannot figure out how to have it return a position.
Sub test
Dim regex As Object
Dim r As Range, rC As Range
Dim firstextract As Long
' cells in column A
Set r = Range("A2:A3")
Set regex = CreateObject("VBScript.RegExp")
regex.Pattern = "[0-9][0-9]/[0-9][0-9]/[0-9][0-9][0-9][0-9]"
' loop through the cells in column A and execute regex replace
Dim MyArray(10, 10) As Integer
For CntMtg = 1 To 100
For Each rC In r
If rC.Value <> "" Then rC.Value = regex.Replace(rC.Value, "Extract from here")
Next rC
Next
End sub
If you don't want to replace but just get the position of a hit, use the Execute method. It returns a collection of matches (a MatchCollection). A match has basically three properties:
FirstIndex is the zero-based position of the match within your string
Length is the length of the match that was found
Value is the match itself that was found
If you can have more than one match within a string, you need to set the Global property of your regex to True; otherwise the collection will contain at most one match.
The following code uses early binding (as it helps to figure out properties and methods); to use it, add a reference to Microsoft VBScript Regular Expressions 5.5.
Dim regex As New RegExp    ' New creates the object; a bare Dim would leave it Nothing
regex.Pattern = "[0-9][0-9]/[0-9][0-9]/[0-9][0-9][0-9][0-9]"
regex.Global = True
Dim matches As MatchCollection, match As Match
Set matches = regex.Execute(s)    ' s is the string to search
For Each match In matches
    Debug.Print "Pos: " & match.FirstIndex & " Len: " & match.Length & " - Found: " & match.Value
Next
For details on the closely related .NET regular expression object model, see: https://learn.microsoft.com/en-us/dotnet/standard/base-types/the-regular-expression-object-model
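To get something that behaves like InStr, a minimal late-bound sketch could look like this. The name RegexInStr is my own invention. Since FirstIndex counts from 0 while InStr counts from 1, the function adds 1 and returns 0 when there is no match.

```vba
' Hypothetical helper: 1-based position of the first regex match, 0 if none.
Public Function RegexInStr(ByVal source As String, ByVal pattern As String) As Long
    Dim matches As Object

    With CreateObject("VBScript.RegExp")
        .Pattern = pattern
        .Global = False          ' only the first match is needed
        Set matches = .Execute(source)
    End With

    ' FirstIndex is zero-based, InStr-style positions are one-based
    If matches.Count > 0 Then RegexInStr = matches(0).FirstIndex + 1
End Function
```

For example, RegexInStr("Due 12/31/2020", "[0-9]{2}/[0-9]{2}/[0-9]{4}") returns 5, the position where the date starts.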
I am trying to clean up a set of strings in Excel to extract certain words after removing some prefixes and extra characters. Initially I was trying this with FIND, LEFT, MID, etc. Then I came across this helpful post and am now trying my hand at regex.
https://superuser.com/questions/794536/excel-formulas-for-stripping-out-prefix-suffix-around-number
I have used the UDF given there called Remove which takes a regex argument. Now, I am still not able to remove all the items I wanted to remove.
In the attached Excel file you can see what I have tried and the answer I am looking for.
Here are the Prefixes I wanted to remove:
The numbers in the beginning surrounded by brackets - Ideally I want this in a separate column.
Any word before a hyphen; there are a number of them, such as 'l-' and 'al-'
and then these prefixes below.
bi
bil
fa
wa
wal
How do I write a single regex which would remove all the above prefixes?
Here is the UDF I am using:
Function Remove(objCell As Range, strPattern As String)
Dim RegEx As Object
Set RegEx = CreateObject("VBScript.RegExp")
RegEx.Global = True
RegEx.Pattern = strPattern
Remove = RegEx.Replace(objCell.Value, "")
End Function
Here is the link to the XLSM file which contains the data I have:
https://www.dropbox.com/s/et9ee727ompj5fl/Regex%20Trials.xlsm?dl=0
and here is a screenshot to show you what I am looking for:
Not 100% perfect for words but should get you started
Breakdown of RegEx (\d+\:)+\d+
(\d+\:) matches one or more digits followed by a colon, i.e. the format x:
the plus after the bracket then tells it that this group can repeat.
lastly the \d+ matches the final run of digits in the string, so that the regex will find a pattern that matches x:x:x
The next RegEx (?!l-|al-|a-|wa-|fa-|bi-)[a-z].* is a lot more complex.
First of all, let's look at the [a-z]. This tells it to match any single character between a and z. We then want to capture the rest of the word, so .* captures everything from that first character to the end of the string (this includes non a-z characters). However, we don't want the match to begin with the part of the string before the hyphen (in most cases), so we put (?!...) in front, which is called a negative lookahead. It checks the text at the current position and refuses to start a match there if that text begins with any of the alternatives inside the brackets. | simply means or, so a match can never start with any of the prefixes listed inside that bracket; the engine moves on and tries the next position instead.
Go to http://regexr.com/ if you want to have a play around; it is a handy site to learn/test RegEx
Public Sub test()
Dim rng As Range
Dim matches
Dim c
With Sheet1
Set rng = .Range(.Cells(2, 1), .Cells(.Cells(.Rows.Count, 1).End(xlUp).Row, 1))
End With
For Each c In rng
With c
.Offset(0, 6) = ExecuteRegEx(.Value2, "(\d+\:)+\d+")
.Offset(0, 7) = ExecuteRegEx(.Value2, "(?!l-|al-|a-|wa-|fa-|bi-)[a-z].*")
End With
Next c
End Sub
Public Function ExecuteRegEx(str As String, pattern As String) As String
Dim RegEx As Object
Dim matches
Set RegEx = CreateObject("VBScript.RegExp")
With RegEx
.Global = True
.ignorecase = False
.pattern = pattern
If .test(str) Then
Set matches = .Execute(str)
ExecuteRegEx = matches(0)
Else
ExecuteRegEx = vbNullString
End If
End With
End Function
I wouldn't use a regex for this: you can do some splitting of the cell value and test the prefixes against a defined array of prefixes:
Note: the array values are ordered so that a prefix that is a substring of another prefix comes later in the list
Public Function RemovePrefix(RngSrc As Range) As String
If RngSrc.Count > 1 Then Exit Function
On Error GoTo ExitFunction
Dim Prefixs() As String: Prefixs = Split("wal,wa',wa,bil,bi,fa", ",")
Dim Arr() As String, i As Long, Temp As String
Arr = Split(RngSrc, "-")
If UBound(Arr) > 0 Then
RemovePrefix = Arr(UBound(Arr))
Exit Function
End If
Arr = Split(RngSrc, " ")
For i = 0 To UBound(Prefixs)
Temp = Arr(UBound(Arr))
If InStr(Temp, Prefixs(i)) = 1 Then
RemovePrefix = Right(Temp, Len(Temp) - Len(Prefixs(i)))
Exit Function
End If
Next i
RemovePrefix = Temp
ExitFunction:
If Err Then RemovePrefix = "Error"
End Function
I have a column containing multiple string values, like a sentence.
In that sentence I want to find one or all alphanumeric values of 10 or more characters containing at least one -, and put the resulting values in another column.
For example:
the column containing sentence is like:
upgrade 15.07.2010, old No: WI82-01062. User moved to No: WI12-01012 02.04.2012 to a 2 user network.
or
Upgrade from lite 7/6/07, old No: PTX7-89C367EC5052-01211
Ideally I want a column with values like WI82-01062, WI12-01012 for the first example, and PTX7-89C367EC5052-01211 for the second example.
Maybe searching for the - in the string and finding the first blank space on either side of it would help, but I do not have any clue how to write that in Excel terms.
Thanks
You could probably use a regex like this (there may be better patterns!):
Function ExtractData(r As Variant) As String
Static oRE As Object
Dim sTemp As String
Dim n As Long
Dim matches
If oRE Is Nothing Then
Set oRE = CreateObject("vbscript.regexp")
With oRE
.Pattern = "[A-Za-z0-9\-]{10,}"
.Global = True
End With
End If
Set matches = oRE.Execute(r)
If matches.Count > 0 Then
For n = 1 To matches.Count
sTemp = sTemp & ", " & matches(n - 1)
Next n
ExtractData = Mid$(sTemp, 3)
End If
End Function
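Note that the pattern above accepts any run of 10 or more letters, digits, or hyphens, so a long plain word would also be returned. A minimal sketch of a stricter variant, assuming the values always look like groups of letters and digits joined by hyphens: the pattern requires at least one hyphen, and the length is checked in code. The name ExtractHyphenated and the optional minLen parameter are hypothetical.

```vba
' Hypothetical variant: returns only tokens that contain at least one hyphen
' and are at least minLen characters long, separated by ", ".
Function ExtractHyphenated(ByVal source As String, Optional ByVal minLen As Long = 10) As String
    Static oRE As Object
    Dim m As Object
    Dim sTemp As String

    If oRE Is Nothing Then
        Set oRE = CreateObject("VBScript.RegExp")
        ' one alphanumeric group, then one or more "-group" repetitions
        oRE.Pattern = "[A-Za-z0-9]+(?:-[A-Za-z0-9]+)+"
        oRE.Global = True
    End If

    For Each m In oRE.Execute(source)
        If m.Length >= minLen Then sTemp = sTemp & ", " & m.Value
    Next m

    ' drop the leading ", " (Mid$ returns "" when nothing was collected)
    ExtractHyphenated = Mid$(sTemp, 3)
End Function
```

With the two sample sentences from the question this should give "WI82-01062, WI12-01012" and "PTX7-89C367EC5052-01211".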
Does anyone know how to return only the numeric value immediately on either side of a dash in a string?
For example, let's say we have the following string "Text, 2-78, 88-100, 101". I'm looking for a way to identify a dash and then return one of the numbers (left or right).
Ultimately I would like to check to see if a given number, let's say 75, falls within any of the ranges noted in the string. Ideally it would see that 75 falls within "2-78".
Any help would be greatly appreciated!
Go to Tools->References and check "Microsoft VBScript Regular Expressions 5.5." Then you can do something like this. (I know this isn't good code, but it's the idea...) Also, this finds all the #-# patterns and prints either the left or right number for all of them (based on whether the boolean "left" is true or false).
Dim str As String, res As String
str = "Text, 2-78, 88-100, 101"
Dim left As Boolean
left = False
Dim re1 As New RegExp
re1.Pattern = "\d+-\d+"
re1.Global = True
Dim m As Match, n As Match
For Each m In re1.Execute(str)
Dim re2 As New RegExp
re2.Global = False
If left Then
re2.Pattern = "\d+"
Else
re2.Pattern = "-\d+"
End If
For Each n In re2.Execute(m.Value)
res = n.Value
If Not left Then
res = Mid(res, 2, Len(str))
End If
Next
MsgBox res
Next
You can do this many different ways with VBA. Using the Split() function to convert into an array, first using the commas as a delimiter and then using the dash would probably be a way to go.
That said, if you want a quick and dirty way to do this with Excel (from which you could record a macro), here is what you can do.
Paste your target string into a cell.
Run Text to Columns on it, using the comma as your delimiter.
Copy the row you now have and Paste-Transpose onto a new sheet.
Run Text to Columns again on your transposed column, this time with the dash as your delimiter.
You now have side by side columns of your numbers, which you can compare to your target values as needed.
You may need to use the Trim() functions in there somewhere to remove whitespace, but hopefully the text to columns would leave you with numbers instead of text numbers.
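The Split() idea from the start of this answer can be sketched roughly as follows. The function name IsInAnyRange is my own, and it assumes the ranges are comma-separated and written as low-high with non-negative whole numbers. Entries without a dash, such as "Text" or "101", are simply skipped.

```vba
' Hypothetical helper: True if testValue lies inside any "low-high" range in source.
Public Function IsInAnyRange(ByVal source As String, ByVal testValue As Long) As Boolean
    Dim part As Variant
    Dim bounds() As String

    ' First split on commas, then split each trimmed piece on the dash
    For Each part In Split(source, ",")
        bounds = Split(Trim$(part), "-")
        If UBound(bounds) = 1 Then
            If IsNumeric(bounds(0)) And IsNumeric(bounds(1)) Then
                If testValue >= CLng(bounds(0)) And testValue <= CLng(bounds(1)) Then
                    IsInAnyRange = True
                    Exit Function
                End If
            End If
        End If
    Next part
End Function
```

For the sample string, IsInAnyRange("Text, 2-78, 88-100, 101", 75) returns True because 75 falls within 2-78, while 80 returns False.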
Ultimately I think there are lots of ways you could approach this sort of problem. It looks like a good way to try and use RegExp. RegExp is not my speciality but I do like to try and use it to answer some Q's here on SO. This code has been tested for your example data and is working properly.
Something like this, assuming your text is in cell A1, and you're testing a value like 75, this also captures single digits in your string in the match collection:
Sub TestRegExp()
    ' requires a reference to Microsoft VBScript Regular Expressions 5.5
    Dim re As RegExp
    Dim allMatches As MatchCollection
    Dim m As Match
    Dim testValue As Long
    Dim rangeArray As Variant
    Dim pattern As String
    Dim msg As String
    testValue = 75 'or whatever value you're trying to find
    pattern = "[\d]+[-][\d]+\b|[\d]+"
    Set re = New RegExp
    re.pattern = pattern
    re.Global = True
    re.IgnoreCase = True 'doesn't really matter since you're looking for numbers
    Set allMatches = re.Execute([A1])
    For Each m In allMatches
        rangeArray = Split(m.Value, "-")
        Select Case UBound(rangeArray)
            Case 0
                ' a single number: compare numerically, not as text
                If testValue = CLng(rangeArray(0)) Then
                    msg = testValue & " = " & m
                Else
                    msg = testValue & " NOT " & m
                End If
            Case 1
                ' a low-high range
                If testValue >= CLng(rangeArray(0)) And testValue <= CLng(rangeArray(1)) Then
                    msg = testValue & " is within range: " & m
                Else
                    msg = testValue & " is not within range: " & m
                End If
            Case Else
        End Select
        MsgBox msg, vbInformation
    Next
End Sub