Comparing two strings with wildcard conditions - excel

Working on a VBA macro that compares string values, and have hit a wall at a point.
Here is some context on what I'm trying to accomplish:
Compare two strings, If String 2 is CONTAINED anywhere in String 1, I return a match. For the most part, I've used the builtin "instr" function in instances where the String 2 is contained in String 1 without any wildcards involved.
The trouble I'm having is that I must treat spaces or " " in String 2 as a wildcard.
Ex:
String 1 = Red Green Blue
String 2 = Red Blue
This should still return a valid match, since the " " in String 2 is being treated as a wildcard, and any number of characters can be between "Red" and "Blue"
What I did was use the "split" function with " " as a delimiter to split String 2 in instances where a space(" ") is involved, and run an instr function on the resulting array to check if each element of the array is contained in String 1. In the example above:
String 2 would be split into a String array (let's call it splitString) with the following elements:
splitString = (Red, Blue)
Using the logic above:
splitString(0) is contained in String 1
splitString(1) is contained in String 1
Therefore, String 2 is contained in String 1, and a match is returned. The match condition I am using relies on the UBound value of splitString (details in the code snippet below).
The issue I am having is that I need to only return a match where the initial string order of String 2 is maintained. Ex:
If:
String 1 = Red Green Blue
String 2 = Blue Red
This is not a valid match since even though when we split String 2, we find the resulting array elements are "contained" in String 1, the order of String 2 is not being respected.
Here is a rough draft of the logic I've coded:
splitString = Split(String2, " ")
x = -1
For y = LBound(splitString) To UBound(splitString)
    splitStringCompare = InStr(1, String1, splitString(y), vbTextCompare)
    ' count every element that is found anywhere in String1 (order is NOT checked)
    If splitStringCompare > 0 Then
        x = x + 1
    End If
Next y
If x = UBound(splitString) Then
    Debug.Print "Match"
Else
    Debug.Print "No Match"
End If
Any help or nudge in the right direction would be much appreciated. Thanks!!

You can use a regular expression object to test this and verify the match. For demonstration purposes, I have created a UDF as below.
Public Function MatchWildCard(strMatchWith As String, strMatchString As String) As Boolean
strMatchString = Replace(Trim(strMatchString), " ", ".+")
With CreateObject("VBScript.RegExp")
.Global = True
.MultiLine = False
.IgnoreCase = True '\\ Change to False if match is case sensitive
.Pattern = strMatchString
If .Test(strMatchWith) Then
MatchWildCard = True
Else
MatchWildCard = False
End If
End With
End Function
It can then be used in an Excel sheet as =MatchWildCard(A1,B1).
Note: I am a basic user of RegExp, so there may be a better way of handling this; you should test it on a large sample to validate.
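If you would rather avoid regular expressions, the ordered check can also be sketched with InStr alone: search for each piece of String 2 starting just past the position where the previous piece was found. The function name MatchInOrder and its parameter names are made up for this illustration, and it assumes case-insensitive matching, as in the question's vbTextCompare call. Unlike the ".+" pattern above, it allows zero characters between the pieces and treats regex metacharacters in String 2 as literal text.

```vba
' Hypothetical helper: True if every space-separated piece of strPattern
' occurs in strText, in the same order (case-insensitive).
Public Function MatchInOrder(ByVal strText As String, ByVal strPattern As String) As Boolean
    Dim parts() As String
    Dim i As Long
    Dim pos As Long
    Dim startAt As Long

    strPattern = Trim$(strPattern)
    If Len(strPattern) = 0 Then Exit Function   ' assumption: empty pattern is no match

    parts = Split(strPattern, " ")
    startAt = 1
    For i = LBound(parts) To UBound(parts)
        If Len(parts(i)) > 0 Then               ' skip empties caused by repeated spaces
            pos = InStr(startAt, strText, parts(i), vbTextCompare)
            If pos = 0 Then Exit Function       ' piece not found after the previous one
            startAt = pos + Len(parts(i))       ' next search begins after this piece
        End If
    Next i
    MatchInOrder = True
End Function
```

With the question's examples, MatchInOrder("Red Green Blue", "Red Blue") should give True and MatchInOrder("Red Green Blue", "Blue Red") should give False, because "Red" is not found after the position of "Blue".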

Related

Remove Certain Characters from a String using UDF

I have a column which contain cells that have some list of alphanumeric number system as follows:
4A(4,5,6,7,8,9); 4B(4,5,7,8); 3A(1,2,3); 3B(1,2,3), 3C(1,2)
On a cell next to it, I use a UDF function to get rid of special characters "(),;" in order to leave the data as
4A456789 4B4578 3A123 3B123 3C12
Function RemoveSpecial(Str As String) As String
Dim SpecialChars As String
Dim i As Long
SpecialChars = "(),;-abcdefghijklmnopqrstuvwxyz"
For i = 1 To Len(SpecialChars)
Str = Replace$(Str, Mid$(SpecialChars, i, 1), "")
Next
RemoveSpecial = Str
End Function
For the most part this works well. However, on certain occasions, the cell would contain an unorthodox pattern such as when a space is included between the 4A and the parenthesized items:
4A (4,5,6,7,8,9);
or when a text appears inside the parenthesis (including two spaces on each side):
4A (4,5, skip 8,9);
or a space appears between the first two characters:
4 A(4,5,6)
How would you fix this so that the random spaces are removed, except those that delimit the actual combinations of data?
One strategy would be to substitute the patterns you want to keep before eliminating the "special" characters, then restore the desired patterns.
From your sample data, it looks like you want to keep a space only if it follows ); or ),
Something like this:
Function RemoveSpecial(Data As Variant) As Variant
Dim SpecialChars As String
Dim KeepStr As Variant, PlaceHolder As Variant, ReplaceStr As Variant
Dim i As Long
Dim DataStr As String
SpecialChars = " (),;-abcdefghijklmnopqrstuvwxyz"
KeepStr = Array("); ", "), ")
PlaceHolder = Array("~0~", "~1~") ' choose a PlaceHolder that won't appear in the data
ReplaceStr = Array(" ", " ")
DataStr = Data
For i = LBound(KeepStr) To UBound(KeepStr)
DataStr = Replace$(DataStr, KeepStr(i), PlaceHolder(i))
Next
For i = 1 To Len(SpecialChars)
DataStr = Replace$(DataStr, Mid$(SpecialChars, i, 1), vbNullString)
Next
For i = LBound(KeepStr) To UBound(KeepStr)
DataStr = Replace$(DataStr, PlaceHolder(i), ReplaceStr(i))
Next
RemoveSpecial = Application.Trim(DataStr)
End Function
Another strategy would be regular expressions (RegEx)
It looks like a regular expression could come in handy here, for example:
Function RemoveSpecial(Str As String) As String
With CreateObject("vbscript.regexp")
.Global = True
.Pattern = "\)[;,]( )|[^A-Z\d]+"
RemoveSpecial = .Replace(Str, "$1")
End With
End Function
I have used the regular expression:
\)[;,]( )|[^A-Z\d]+
You can see an online demo to see the result in your browser. The way this works is to apply a form of what some would call "The best regex trick ever!"
\)[;,]( ) - Match a literal closing parenthesis, then either a comma or a semicolon, then capture a space character in the 1st capture group.
| - Or match the following alternative:
[^A-Z\d]+ - One or more characters not in the given character class (that is, anything other than an uppercase letter or a digit).
EDIT:
In case you have values like 4A; or 4A, you can use:
(?:([A-Z])|\))[;,]( )|[^A-Z\d]+
And replace with $1$2. See an online demo.

Excel FindJobCode's problems

I am new to VBA and I have the code below to find job numbers in a description.
However, I have 3 problems with it:
1. If the 1st character is a lowercase letter such as "s" or "m", it shows an error.
2. I cannot solve Example3: the result shows "M3045.67," but all I need is "M3045.67", without the comma.
3. I don't know why the code fails to run Range("E2").Value = "Overhead" after the Else in Example5.
For problem 3, I could get the result "Overhead" before I added the 2nd criterion. Is something wrong there? Please help, thanks.
P.S. The looping will be added after solving the above questions.
Sub FindCode()
'Example1 : G5012.123 Management Fee / Get Result = G5012.123
'Example2 : G3045.67 Management Fee / Get Result = G3045.67
'Example3 : M3045.67, S7066 Rental Fee / Get Result = M3045.67,
'Example4 : P9876-123A Car Park / Get Result = P9876
'Example5 : A4 paper / Get result = Overhead
'Criteria1 : 1st Character = G / S / M / P
If Left(Range("A2"), 1) = "G" Or Left(Range("A2"), 1) = "S" Or Left(Range("A2"), 1) = "M" Or Left(Range("A2"), 1) = "P" Then
'Criteria2 : 2nd-5th Character = Number only
If IsNumeric(Mid(Range("A2"), 2, 4)) Then
'Get string before "space"
Range("E2").Value = Left(Range("A2"), InStr(1, Range("A2"), " ") - 1)
Else
'If not beginning from Crit 1&2, show "Overhead"
Range("E2").Value = "Overhead"
End If
End If
'If start from "P", get first 5 string
If Left(Range("A2"), 1) = "P" And IsNumeric(Mid(Range("A2"), 2, 4)) Then
Range("E2").Value = Left(Range("A2"), 5)
Else
End If
End Sub
The function below will extract the job number and return it to the procedure that called it.
Function JobCode(Cell As Range) As String
' 303
'Example1 : G5012.123 Management Fee / Get Result = G5012.123
'Example2 : G3045.67 Management Fee / Get Result = G3045.67
'Example3 : M3045.67, S7066 Rental Fee / Get Result = M3045.67
'Example4 : P9876-123A Car Park / Get Result = P9876
'Example5 : A4 paper / Get result = Overhead
Dim Fun As String ' function return value
Dim Txt As String ' Text to extract number from
' Minimize the number of times your code reads from the sheet because it's slow
Txt = Cell.Value ' actually, it's Cells(2, 1)
' Criteria1 : 1st Character = G / S / M / P
If Len(Txt) > 0 And InStr("GSMP", UCase(Left(Txt, 1))) > 0 Then ' Len check: an empty cell would otherwise pass InStr and fail in Split
Txt = Split(Txt)(0) ' split on blank, take first element
' Criteria2 : 2nd-5th Character = Number only
' Isnumeric(Mid("A4", 2, 4)) = true
If (Len(Txt) >= 5) And (IsNumeric(Mid(Txt, 2, 4))) Then
Fun = Replace(Txt, ",", "")
Fun = Split(Fun, "-")(0) ' discard "-123A" in example 4
End If
End If
' If no job number was extracted, show "Overhead"
If Len(Fun) = 0 Then Fun = "Overhead"
JobCode = Fun
End Function
The setup as a function, rather than a sub, is typical for this sort of search. In my trials I had your 5 examples in A2:A6 and called them in a loop, giving a different cell to the function on each loop. Very likely, this is what you are angling for, too. This is the calling procedure I used for testing.
Sub Test_JobCode()
' 303
Dim R As Long
For R = 2 To Cells(Rows.Count, "A").End(xlUp).Row
' I urge you not to use syntax for addressing ranges when addressing cells
Debug.Print JobCode(Cells(R, "A")) ' actually, it's Cells(2, 1)
Next R
End Sub
Of course, instead of Debug.Print JobCode(Cells(R, "A")) you could also have Cells(R, "B").Value = JobCode(Cells(R, "A"))
The reason your Else statement didn't work is a logical error. The Else belongs to the inner If (criterion 2), so it only runs when criterion 1 is met and criterion 2 fails; for "A4 paper" criterion 1 already fails, so the Else is never reached. "Overhead" should apply not only when criteria 1 & 2 aren't met but whenever all previous efforts have failed, which is slightly broader in meaning. Note also that IsNumeric(Mid("A4", 2, 4)) = True, so that test would not fail as you might expect.
In rough terms, the code first checks if the first letter qualifies the entry for examination (and returns "Overhead" if it doesn't). Then the text is split into words, only the first one being considered. If it's too short or non-numeric, no job code is extracted, resulting in "Overhead" in the next step. If this test is passed, the final result is modified: the trailing comma is removed (if it exists) and anything appended with a hyphen is removed (if it exists). I'm not sure you actually want this, so you can easily remove that line. Or you might add more modifications at that point.
What you are trying to do is FAR easier using regular expression matching and replacing, so I recommend enabling that library of functions. The best news about doing that is that you can invoke those functions in EXCEL formulas and do not need to use Visual Basic for Applications at all.
To enable Regular Expressions as Excel functions:
Step 1: Enable the Regular Expression library in VBA.
A. In the Visual Basic for Applications window (where you enter VBA code) find the Tools menu and
select it, then select the References... entry in the sub-menu.
B. A dialogue box will appear listing the possible "Available References:" in alphabetical order.
Scroll down to find the entry "Microsoft VBScript Regular Expressions 5.5".
C. Check the checkbox on that line and press the OK button.
Step 2: Create function calls. In the Visual Basic for Applications window select Insert..Module. Then paste the following VBA code into the blank window that comes up:
' Some function wrappers to make the VBScript RegExp reference Library useful in both VBA code and in Excel & Access formulas
'
Private rg As RegExp 'All of the input data to control the RegExp parsing
' RegExp object contains 3 Boolean options that correspond to the 'i', 'g', and 'm' options in Unix-flavored regexp
' IgnoreCase - pretty self-evident. True means [A-Z] matches lowercase letters and vice versa, false means it won't
' IsGlobal - True means after the first match has been processed, continue on from the current point in DataString and look to process more matches. False means stop after first match is processed.
' MultiLine - False means ^ and $ match only Start and End of DataString, True means they match embedded newlines. This provides an option to process line-by-line when Global is true also.
'
' Returns true/false: does DataString match pattern? IsGlobal=True makes no sense here
Public Function RegExpMatch(DataString As String, Pattern As String, Optional IgnoreCase As Boolean = True, Optional IsGlobal As Boolean = False, Optional MultiLine As Boolean = False) As Boolean
If rg Is Nothing Then Set rg = New RegExp
rg.IgnoreCase = IgnoreCase
rg.Global = IsGlobal
rg.MultiLine = MultiLine
rg.Pattern = Pattern
RegExpMatch = rg.Test(DataString)
End Function
'
' Find <pattern> in <DataString>, replace with <ReplacePattern>
' Default IsGlobal=True means replace all matching occurrences. Call with False to replace only first occurrence.
'
Public Function RegExpReplace(DataString As String, Pattern As String, ReplacePattern As String, Optional IgnoreCase As Boolean = True, Optional IsGlobal As Boolean = True, Optional MultiLine As Boolean = False) As String
If rg Is Nothing Then Set rg = New RegExp
rg.IgnoreCase = IgnoreCase
rg.Global = IsGlobal
rg.MultiLine = MultiLine
rg.Pattern = Pattern
RegExpReplace = rg.Replace(DataString, ReplacePattern)
End Function
Now you can call RegExpMatch & RegExpReplace in Excel formulas and we can start to think of how to solve your particular problem. To be a match, your string must start with G, S, M, or P. In a regular expression code that is ^[GSMP], where the up-arrow says to start at the beginning and the [GSMP] says to accept a G, S, M or P in the next position. Then any matching string must next have a number of numeric digits. Code that as \d+, where the \d means one numeric digit and the + is a modifier that means accept one or more of them. Then you could have a dot followed by some more digits, or not. This is a little more complicated - you would code it as (\.\d+)? because dot is a special character in regular expressions and \. says to accept a literal dot. That is followed by \d+ which is one or more digits, but this whole expression is enclosed in parentheses and followed by a ?, which means what is in parentheses can appear once or not at all. Finally, comes the rest of the line and we don't really care what is in it. We code .*$ for zero or more characters (any) followed by the line's end. That all goes together as ^[GSMP]\d+(\.\d+)?.*$.
Putting that pattern into our RegExpReplace call:
=RegExpReplace(A2,"^([GSMP]\d+(\.\d+)?).*$","$1")
We wrapped the part we were interested in keeping in parentheses because the "$1" as part of the replacement pattern says to use whatever was found inside the first set of parentheses.
This works for all your examples but the last one, which is your else clause in your logic. We can fix that by testing whether the pattern matched using RegExpMatch:
=IF(RegExpMatch(A2,"^([GSMP]\d+(\.\d+)?).*$"),RegExpReplace(A2,"^([GSMP]\d+(\.\d+)?).*$","$1"),"Overhead")
This gives the results you are looking for and you have also gained a powerful text manipulation tool to solve future problems.
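If you prefer to stay in VBA, the same idea can be sketched as a single function that uses late binding, so the reference from Step 1 is not needed. The name JobCodeRegex is made up for this illustration; the pattern is the front part of the one above, and IgnoreCase = True is an assumption meant to cover the lowercase first-letter case from the question.

```vba
' Hypothetical helper: returns the leading job code, or "Overhead" if there is none.
Public Function JobCodeRegex(ByVal Txt As String) As String
    Dim matches As Object
    With CreateObject("VBScript.RegExp")        ' late binding, no reference required
        .IgnoreCase = True                      ' assumption: "s7066" should also qualify
        .Pattern = "^[GSMP]\d+(\.\d+)?"         ' letter, digits, optional .digits
        Set matches = .Execute(Txt)
    End With
    If matches.Count > 0 Then
        JobCodeRegex = matches(0).Value         ' text of the first (only) match
    Else
        JobCodeRegex = "Overhead"
    End If
End Function
```

It could be called from a cell as =JobCodeRegex(A2) or from a loop as in Test_JobCode above. Because the match stops at the first character that is not part of the code, a trailing comma or a "-123A" suffix is left out without any extra clean-up step.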

How to extract the first instance of digits in a cell with a specified length in VBA?

I have the following Text sample:
Ins-Si_079_GM_SOC_US_VI SI_SOC_FY1920_US_FY19/20_A2554_Si Resp_2_May
I want to get the number 079, so what I need is the first instance of digits of length 3. Sometimes the 3 digits are at the end, but they are usually found between the first 2 underscores. I only want the digits with length three (079) and not 19, 1920, or 2554, which are different lengths.
Sometimes it can look like this with no underscore:
1920 O-B CLI 353 Tar Traf
Or like this with the 3 digit number at the end:
Ins-Si_GM_SOC_US_VI SI_SOC_FY1920_US_FY19/20_A2554_Si Resp_2_079
There are also times when what I need is 2 digits, but when it's 2 digits it's always at the end, like this:
FY1920-Or-OLV-B-45
How would I get what I need in all cases?
You can split the listed items and check for 3 digits via Like:
Function Get3Digits(s As String) As String
Dim tmp, elem
tmp = Split(Replace(Replace(s, "-", " "), "_", " "), " ")
For Each elem In tmp
If elem Like "###" Then Get3Digits = elem: Exit Function
Next
If Get3Digits = vbNullString Then Get3Digits = IIf(Right(s, 2) Like "##", Right(s, 2), "")
End Function
Edited due to comment:
I would execute a 2-digit search when there are no 3-digit numbers before the end part and the last part is 2 digits. If 3 digits are found at the end then get 3, but if not then get 2. There are times when the last part is a number but only one digit. I would only want to get the last part if it has 2 or 3 digits. The - would not be relevant to the 2 digits. If nothing desired is found then it would return " ".
If VBA is not a must you could try:
=TEXT(INDEX(FILTERXML("<t><s>"&SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(A1,"_"," "),"-"," ")," ","</s><s>")&"</s></t>","//s[.*0=0][string-length()=3 or (position()=last() and string-length()=2)]"),1),"000")
It worked for your sample data.
Edit: Some explanation.
SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(A1,"_"," "),"-"," ")," ","</s><s>") - The key part, which transforms all three potential delimiters (hyphen, underscore and space) into valid XML node end and start tags.
The above is concatenated using ampersands into a valid XML construct (adding a parent node <t>).
FILTERXML can now be used to 'split' the string into an array.
//s[.*0=0][string-length()=3 or (position()=last() and string-length()=2)] - The 2nd parameter of FILTERXML, which should be valid XPath syntax. It reads:
//s - Select all <s> nodes with the following conditions:
[.*0=0] - Check that an <s> node times zero returns zero (to check that the node is numeric).
[string-length()=3 or (position()=last() and string-length()=2)] - Check that a node is 3 characters long, OR that it's the last node and only 2 characters long.
INDEX(.....,1) - I mentioned in the comments that usually this is not needed, but since Excel O365 might spill the returned array, we may as well implement it to prevent spill errors for those who use the newest Excel version. Now we are just retrieving the very first element of whatever array FILTERXML returns.
TEXT(....,"000") - Excel will try to delete leading zeros of a numeric value, so we use TEXT() to turn it into a string value of three digits.
Now, if no element can be found, this will return an error; however, a simple IFERROR could fix this.
Try this function, please:
Function ExtractThreeDigitsNumber(x As String) As String
Dim El As Variant, arr As Variant, strFound As String
If InStr(x, "_") > 0 Then
arr = Split(x, "_")
Elseif InStr(x, "-") > 0 Then
arr = Split(x, "-")
Else
arr = Split(x, " ")
End If
For Each El In arr
If IsNumeric(El) And Len(El) = 3 Then strFound = El: Exit For
Next
If strFound = "" Then
If IsNumeric(Right(x, 2)) Then ExtractThreeDigitsNumber = Right(x, 2)
Else
ExtractThreeDigitsNumber = strFound
End If
End Function
It can be called in this way:
Sub testExtractThreDig()
Dim x As String
x = "Ins-Si_079_GM_SOC_US_VI SI_SOC_FY1920_US_FY19/20_A2554_Si Resp_2_May"
Debug.Print ExtractThreeDigitsNumber(x)
End Sub
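For comparison, here is a rough regular-expression sketch of the same rule: take the first run of exactly 3 digits, otherwise 2 digits at the very end. The name FirstThreeDigits is made up for this illustration. One difference from the split-based answers above: it accepts 3 digits that directly follow a letter (for example "A079"), which may or may not be what you want.

```vba
' Hypothetical helper: first run of exactly 3 digits, else 2 digits at the end, else "".
Public Function FirstThreeDigits(ByVal s As String) As String
    Dim m As Object
    With CreateObject("VBScript.RegExp")
        .Global = False
        ' 3 digits preceded by start-of-text or a non-digit, and not followed by a digit
        .Pattern = "(?:^|\D)(\d{3})(?!\d)"
        Set m = .Execute(s)
        If m.Count = 0 Then
            ' fallback: exactly 2 digits at the very end of the text
            .Pattern = "(?:^|\D)(\d{2})$"
            Set m = .Execute(s)
        End If
    End With
    If m.Count > 0 Then FirstThreeDigits = m(0).SubMatches(0)
End Function
```

The capture group holds only the digits, so the delimiter in front of them is not part of the result. The function returns a String, so a leading zero as in "079" is kept.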

VBA: How to find the values after a "#" symbol in a string

I am trying to set the letters after a # symbol to a variable.
For example, x = #BAL
I want to set y = BAL
Or x = #NE
I want y = NE
I am using VBA.
Split() in my opinion is the easiest way to do it:
Dim myStr As String
myStr = "#BAL"
If InStr(myStr, "#") > 0 Then '<-- Check that the string contains "#" so Split(...)(1) does not throw an error
MsgBox Split(myStr, "#")(1)
End If
As wisely pointed out by Scott Craner, you should check to ensure the string contains the value, which he does in his comment with: y = Split(x, "#")(UBound(Split(x, "#"))). Another way you can do it is using InStr(): If InStr(x, "#") > 0 Then...
The (1) takes the text between the first # and the next # (or the end of the string if there is no other #). If you had used (0), you would get everything before the first #.
Similar but different example:
Dim myStr As String
myStr = "#BAL#TEST"
MsgBox Split(myStr, "#")(2)
The message box would have returned TEST because you used (2), and this was the second instance of your # character.
Then you can even split them into an array:
Dim myStr As String, splitArr() As String
myStr = "#BAL#TEST"
splitArr = Split(myStr, "#") '< -- don't append the collection number this time
MsgBox SplitArr(1) '< -- This would return "BAL"
MsgBox SplitArr(2) '< -- This would return "TEST"
If you are looking for additional reading, here is more from the MSDN:
Split Function
Description: Returns a zero-based, one-dimensional array containing a specified number of substrings. Syntax: Split( expression [ ,delimiter [ ,limit [ ,compare ]]] ). The Split function syntax has these named arguments:
expression
Required. String expression containing substrings and delimiters. If expression is a zero-length string (""), Split returns an empty array, that is, an array with no elements and no data.
delimiter
Optional. String character used to identify substring limits. If omitted, the space character (" ") is assumed to be the delimiter. If delimiter is a zero-length string, a single-element array containing the entire expression string is returned.
limit
Optional. Number of substrings to be returned; -1 indicates that all substrings are returned.
compare
Optional. Numeric value indicating the kind of comparison to use when evaluating substrings. See Settings section for values.
You can do the following to get the substring after the # symbol.
x = "#BAL"
y = Right(x,len(x)-InStr(x,"#"))
Where x can be any string, with characters before or after the # symbol.
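If the # might be missing, the same idea can be wrapped in a small function that returns an empty string in that case instead of the whole input. This is only a sketch; the name AfterDelimiter and its parameters are made up for illustration.

```vba
' Hypothetical helper: text after the first occurrence of delim, or "" if delim is absent.
Public Function AfterDelimiter(ByVal s As String, ByVal delim As String) As String
    Dim pos As Long
    pos = InStr(1, s, delim)
    If pos > 0 Then
        AfterDelimiter = Mid$(s, pos + Len(delim))   ' everything past the delimiter
    End If
End Function
```

For example, y = AfterDelimiter("#BAL", "#") gives "BAL", and AfterDelimiter("BAL", "#") gives an empty string.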

Excel formula: If cell contains substring "this" AND does not contain substring "that"

I'm trying to write a function in which a column contains one substring and does not contain another substring.
In the example below I would like my function to return 1 if my row contains "some project" AND DOES NOT CONTAIN "overhead".
row| example strings | desired return value
0 |some project,other project | 1
1 |some project | 1
2 |overhead | 0
3 |some project, overhead | 0
4 |some project, other, boo | 1
I was trying to formulate it first with exact strings such as:
=IF(AND((E3="some project"),NOT(E3="overhead")),1,0)
But this only gives correct results for rows 1 and 2 because it only does an exact match on the string instead of matching on the substring.
What you need is some kind of Substring function. I think FIND might work. Check out this page: https://exceljet.net/excel-functions/excel-find-function
Your function would be like:
=IF(AND(ISERROR(FIND("some project", E3))=FALSE,ISERROR(FIND("overhead",E3))),1,0)
EDIT: Above function works after testing
The tricky part here is that FIND returns the starting position of the string, and if it fails it returns #VALUE!, which I believe you can catch with the ISERROR() function. This is in no way a beautiful solution. I would try to write this in VBA instead, as I am certain there is a proper substring function in VBA.
If you can insert a little VBA code, then you can use a custom function like so:
=StrContains(E3, "some project", "overhead")
And this will return True if the value in E3 contains both of those substrings. This function relies mainly on VBA's InStr function.
Function code:
Public Function StrContains(ByRef cl As Excel.Range, ParamArray strings() As Variant)
'Function returns TRUE if the range contains ALL of the passed substring items
' uses ParamArray to allow varying number of substring items
' Ex:
' =StrContains(A1, "something", "else", "foo")
'
Dim val$
Dim s
Dim i As Integer
Dim ret As Boolean
Dim length As Integer
length = UBound(strings) + 1
val = cl.Value2
For Each s In strings
If InStr(1,val, s) <> 0 Then
i = i + 1
End If
Next
ret = (i = length)
StrContains = ret
End Function
You could modify this relatively easily to be case-insensitive, or to accept partial matches optionally, etc. Here is what it looks like extended for both of those concepts:
=StrContains(E3, False, True, "some project", "overhead")
Function Code:
Public Function StrContains(ByRef cl As Excel.Range, MatchCase As Boolean, MatchAll As Boolean, ParamArray strings() As Variant)
'MatchAll is the matching switch, use True to require ALL matching items, or False to allow for fewer.
'MatchCase is the case-sensitive switch, use False to ignore case.
' uses ParamArray to allow varying number of substring items
' Ex:
' =StrContains(A1, False, True, "something", "else", "foo")
'
    Dim val$
    Dim s
    Dim i As Integer
    Dim ret As Boolean
    Dim length As Integer
    length = UBound(strings) + 1
    val = cl.Value2
    If Not MatchCase Then
        val = LCase(val)
        For i = LBound(strings) To UBound(strings)
            strings(i) = LCase(strings(i))
        Next
    End If
    i = 0 ' reset: i was used as the loop index above, now it counts matches
    For Each s In strings
        If InStr(val, s) <> 0 Then
            i = i + 1
        End If
    Next
    If MatchAll Then
        ret = (i = length) ' every item must be found
    Else
        ret = (i >= 1) ' at least one item must be found
    End If
    StrContains = ret
End Function
You can actually use the SEARCH function, like this:
=IF(ISNUMBER(SEARCH("some project",B2)),NOT(ISNUMBER(SEARCH("overhead",B2)))*1,0)
Considerations:
SEARCH function: returns either a number or an error (that's why I used ISNUMBER; you could have used ISERROR too). This function is case insensitive. For case sensitivity you can use FIND instead; the previous formula would be =IF(ISNUMBER(FIND("some project",B2)),NOT(ISNUMBER(FIND("overhead",B2)))*1,0)
ISNUMBER function: returns either FALSE or TRUE. I converted the result to 1 or 0 simply by multiplying TRUE or FALSE by 1.
Hope this helps.
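For completeness, the "contains this AND does not contain that" test can also be sketched as a small UDF built on InStr. The name ContainsButNot and its parameters are made up for illustration, and the case-insensitive comparison is an assumption chosen to mirror SEARCH.

```vba
' Hypothetical helper: 1 if Source contains Wanted and does not contain Unwanted, else 0.
Public Function ContainsButNot(ByVal Source As String, ByVal Wanted As String, _
                               ByVal Unwanted As String) As Long
    If InStr(1, Source, Wanted, vbTextCompare) > 0 _
       And InStr(1, Source, Unwanted, vbTextCompare) = 0 Then
        ContainsButNot = 1
    Else
        ContainsButNot = 0
    End If
End Function
```

In a sheet it could be used as =ContainsButNot(E3, "some project", "overhead"), which for the sample rows should give 1 for "some project, other, boo" and 0 for "some project, overhead". Swap vbTextCompare for vbBinaryCompare if the comparison should be case sensitive, as with FIND.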

Resources