instr function in VBA - excel

I need some help with the following.
Description:
I have written VBA code that compares a string with the file names in a directory. I used the InStr function, which handles three of my cases but not the general one.
Explanation:
If str = 4567 is compared with the file names, the file name can be:
1.xs1234567.pdf
2.4567.pdf
3.4567(1).pdf
4.updated 4567(2).pdf
So the code I created finds all four files, which is not correct. It should exclude the first file name, i.e. xs1234567.pdf.
This is the code:
Dirfname = finDir
fName = Mid((Dirfname), 1, (InStr(1, (Dirfname), ".") - 1))
fileExt = Mid((Dirfname), (InStr(1, (Dirfname), ".") + 1), 3)
If (InStr(1, fName, wkvalue) > 0 And wkvalue <> "") Then 'checking lookup val in instr
If (Trim(UCase(fileExt)) = "PDF" Or Trim(UCase(fileExt)) = "TIF") Then
Cells(recnum, 2).Value = "Yes"
'col = col + 1
ws.Hyperlinks.Add Anchor:=Cells(recnum, (col + 1)), _
Address:=SourceFolderName & "\" & Dirfname
col = col + 1
'Else: Cells(recnum, 2).Value = "No"
End If
End If
Please advise what can be done in this case.

You could use regular expressions here. I'm not very proficient with them yet, but this is a relatively simple case. Here is a function adapted from tmehta.com/regexp that you could use while iterating over the file names in a folder:
Function RegExpFind(FindIn, FindWhat As String, _
Optional IgnoreCase As Boolean = False) As Variant
Dim i As Long
Dim rslt() As String
'// Using Late Binding here, use the commented types if you've added
'// the "Microsoft VBScript Regular Expressions" reference
Dim RE As Object 'RegExp
Dim allMatches As Object 'MatchCollection
Dim aMatch As Object 'Match
'// Use "Set RE = New RegExp" if using the VBScript reference
Set RE = CreateObject("vbscript.regexp")
RE.Pattern = FindWhat
RE.IgnoreCase = IgnoreCase
RE.Global = True
Set allMatches = RE.Execute(FindIn)
'// check if we've found anything, if not return a single element array
'// containing an empty string. If we've found something return at least
'// a single element array containing the matched expressions
If allMatches.Count = 0 Then
ReDim rslt(0 To 0)
rslt(0) = ""
Else
ReDim rslt(0 To allMatches.Count - 1)
For i = 0 To allMatches.Count - 1
rslt(i) = allMatches(i).Value
Next i
End If
RegExpFind = rslt
End Function
You would need to pass in the file name as the FindIn parameter and the regexp pattern "^4567" in the FindWhat parameter. Using it this way will return 4567 (as the first element in the return array) only if it occurs at the start of the search string. This function could be easily recycled for use with other searches down the road if you need to.
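Note that "^4567" anchors the match to the start of the name, so it would also reject "updated 4567(2).pdf". A minimal sketch of a broader pattern follows, assuming the real rule is "4567 must not touch another digit" (that rule is an assumption, and HasStandaloneNumber is a hypothetical name, not part of the function above):

```vba
Option Explicit

' Hypothetical helper: True when num appears in fileName without a digit
' directly before or after it. Assumes num contains digits only.
Function HasStandaloneNumber(ByVal fileName As String, ByVal num As String) As Boolean
    If Len(num) = 0 Then Exit Function
    With CreateObject("VBScript.RegExp")
        ' (^|\D) = start of string or a non-digit; (\D|$) = a non-digit or end of string
        .Pattern = "(^|\D)" & num & "(\D|$)"
        HasStandaloneNumber = .Test(fileName)
    End With
End Function

Sub DemoHasStandaloneNumber()
    Dim f As Variant
    For Each f In Array("xs1234567.pdf", "4567.pdf", "4567(1).pdf", "updated 4567(2).pdf")
        Debug.Print f, HasStandaloneNumber(CStr(f), "4567")
    Next f
End Sub
```

With the four file names from the question, only xs1234567.pdf should come back as False.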

Assuming that your criterion for a match is that the character preceding 4567, if any, is a space:
i = InStr(1, fName, wkvalue)
if i > 0 and wkvalue <> "" Then
ch = " "
if i > 1 then
ch = mid(fName, i - 1, 1)
end if
if ch = " " then
...

You don't describe why the first filename should be rejected, but I assume it's because it has a digit (0-9) before and/or after wkvalue, such that "4567" is not the entire number. In such a case, this will work:
charBefore = ""
charAfter = ""
pos = InStr(fName(i), wkvalue)
If pos = 1 Then
' It's at the beginning of the filename.
' Get character after the number.
charAfter = Mid(fName(i), pos + Len(wkvalue), 1)
ElseIf pos > 1 Then
' It's not at the beginning of the filename
' Get characters before and after the number.
charBefore = Mid(fName(i), pos - 1, 1)
charAfter = Mid(fName(i), pos + Len(wkvalue), 1)
Else
' Number not found.
End If
' Could add another ElseIf to check if it's at the end of the filename.
If pos > 0 And wkvalue <> "" _
And Not charBefore Like "#" And Not charAfter Like "#" Then
' Number found and not preceded or followed by a digit (0-9).
' Do your thing.
End If
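The fragment above inspects only the first occurrence of wkvalue. A sketch of the same digit check wrapped in a function that walks through every occurrence might look like this (FindStandalone is a hypothetical name, and the "no neighbouring digit" rule is the same assumption as above):

```vba
Option Explicit

' Hypothetical helper: returns the 1-based position of the first occurrence of
' num that has no digit directly before or after it, or 0 if there is none.
Function FindStandalone(ByVal fileName As String, ByVal num As String) As Long
    Dim pos As Long, charBefore As String, charAfter As String
    If Len(num) = 0 Then Exit Function
    pos = InStr(1, fileName, num)
    Do While pos > 0
        charBefore = vbNullString
        If pos > 1 Then charBefore = Mid$(fileName, pos - 1, 1)
        ' Mid$ returns "" when the start position is past the end of the string
        charAfter = Mid$(fileName, pos + Len(num), 1)
        If Not charBefore Like "#" And Not charAfter Like "#" Then
            FindStandalone = pos
            Exit Function
        End If
        pos = InStr(pos + 1, fileName, num)   ' try the next occurrence
    Loop
    ' falls through with 0 when no standalone occurrence exists
End Function
```

Checking later occurrences matters for names such as "14567 4567.pdf", where the first hit is part of a longer number but the second one stands alone.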

Related

Remove alphanumeric chars in front of a defined char

I have a string in a cell composed of several shorter strings of various lengths with blank spaces and commas in between. In some cases only one or more blanks are in between.
I want to remove every blank space and comma and only leave behind 1 comma between each string element. The result must look like this:
The following doesn't work. I'm not getting an error but the strings are truncated at the wrong places. I don't understand why.
Sub String_adaption()
Dim i, j, k, m As Long
Dim STR_A As String
STR_A = "01234567890ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
i = 1
With Worksheets("table")
For m = 1 To Len(.Range("H" & i))
j = 1
Do While Mid(.Range("H" & i), m, 1) = "," And Mid(.Range("H" & i), m - 1, 1) <> Mid(STR_A, j, 1) And m <> Len(.Range("H" & i))
.Range("H" & i) = Mid(.Range("H" & i), 1, m - 2) & Mid(.Range("H" & i), m, Len(.Range("H" & i)))
j = j + 1
Loop
Next m
End With
End Sub
I'd use a regular expression to replace any combination of spaces and commas. Something along these lines:
Sub Test()
Dim str As String: str = "STRING_22 ,,,,,STRING_1 , , ,,,,,STRING_333 STRING_22 STRING_4444"
Debug.Print RegexReplace(str, "[\s,]+", ",")
End Sub
Function RegexReplace(x_in, pat, repl) As String
With CreateObject("vbscript.regexp")
.Global = True
.Pattern = pat
RegexReplace = .Replace(x_in, repl)
End With
End Function
Just for the sake of alternatives:
Formula in B1:
=TEXTJOIN(",",,TEXTSPLIT(A1,{" ",","}))
The following function will split the input string into pieces (words), using a comma as separator. When the input string has multiple commas, it will result in empty words.
After splitting, the function loops over all words, trims them (removes leading and trailing blanks) and glues them together. Empty words are skipped.
I have implemented it as a Function, so you can use it as a UDF: if your input string is in B2, write =String_adaption(B2) as a formula into any cell.
Function String_adaption(s As String) As String
' Remove duplicate Commas and Leading and Trailing Blanks from words
Dim words() As String, i As Long
words = Split(s, ",")
For i = 0 To UBound(words)
Dim word As String
word = Trim(words(i))
If word <> "" Then
String_adaption = String_adaption & IIf(String_adaption = "", "", ",") & word
End If
Next i
End Function
P.S.: Almost sure that this could be done with some magic regular expressions, but I'm not an expert in that.
If you have a recent Excel version, you can use a simple worksheet formula to split the string on spaces and commas, then put it back together using the comma delimiter while ignoring the blanks (and I just noticed that @JvdV had previously posted the same formula solution):
=TEXTJOIN(",",TRUE,TEXTSPLIT(A1,{" ",","}))
In VBA, you can use a similar algorithm, using the ArrayList object to collect the non-blank results.
Option Explicit
Function commaOnly(s As String) As String
Dim v, w, x, y
Dim al As Object
Set al = CreateObject("System.Collections.ArrayList")
v = Split(s, " ")
For Each w In v
x = Split(w, ",")
For Each y In x
If y <> "" Then al.Add y
Next y
Next w
commaOnly = Join(al.toarray, ",")
End Function
This preserves the spaces within the smaller strings.
Option Explicit
Sub demo()
Const s = "STRING 22,,,, ,,STRING 1,,,, ,,STRING 333 , , , STRING_22 STRING_44"
Debug.Print Cleanup(s)
End Sub
Function Cleanup(s As String) As String
Const SEP = ","
Dim regex, m, sOut As String, i As Long, ar()
Set regex = CreateObject("vbscript.regexp")
With regex
.Global = True
.MultiLine = False
.IgnoreCase = True
.Pattern = "([^,]+)(?:[ ,]*)"
End With
If regex.Test(s) Then
Set m = regex.Execute(s)
ReDim ar(0 To m.Count - 1)
For i = 0 To UBound(ar)
ar(i) = Trim(m(i).submatches(0))
Next
End If
Cleanup = Join(ar, SEP)
End Function
Code categories approach
For the sake of completeness and to show also other ways "leading to Rome", I want to demonstrate an approach allowing to group the string input into five code categories in order to extract alphanumerics by a tricky match (see [B] Function getCats()):
To meet the requirements in OP use the following steps:
1) remove comma separated tokens if empty or only blanks (optional),
2) group characters into code categories,
3) check catCodes returning alpha nums including even accented or diacritic letters as well as characters like [ -,.+_]
Function AlphaNum(ByVal s As String, _
Optional IgnoreEmpty As Boolean = True, _
Optional info As Boolean = False) As String
'Site: https://stackoverflow.com/questions/15723672/how-to-remove-all-non-alphanumeric-characters-from-a-string-except-period-and-sp/74679416#74679416
'Auth.: https://stackoverflow.com/users/6460297/t-m
'Date: 2023-01-12
'1) remove comma separated tokens if empty or only blanks (s passed as byRef argument)
If IgnoreEmpty Then RemoveEmpty s ' << [A] RemoveEmpty
'2) group characters into code categories
Dim catCodes: catCodes = getCats(s, info) ' << [B] getCats()
'3) check catCodes and return alpha nums plus chars like [ -,.+_]
Dim i As Long, ii As Long
For i = 1 To UBound(catCodes)
' get current character
Dim curr As String: curr = Mid$(s, i, 1)
Dim okay As Boolean: okay = False
Select Case catCodes(i)
' AlphaNum: cat.4=digits, cat.5=alpha letters
Case Is >= 4: okay = True
' Category 2: allow only space, comma, minus
Case 2: If InStr(" -,", curr) <> 0 Then okay = True
' Category 3: allow only point, plus, underline
Case 3: If InStr(".+_", curr) <> 0 Then okay = True
End Select
If okay Then ii = ii + 1: catCodes(ii) = curr ' increment counter
Next i
ReDim Preserve catCodes(1 To ii)
AlphaNum = Join(catCodes, vbNullString)
End Function
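Note: Instead of If InStr(" -,", curr) <> 0 Then in Case 2 you may also code If curr Like "[ ,-]" Then (keep the hyphen last inside the brackets; "[ -,]" would be read as the range from space to comma). Similarly in Case 3 :-)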
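One detail when switching from InStr to Like: inside a Like character list, a hyphen between two characters defines a range, so its position matters. A small sketch of the difference, assuming the module default Option Compare Binary (DemoLikeHyphen is a hypothetical name):

```vba
Option Explicit

Sub DemoLikeHyphen()
    ' "[ -,]" is the range from space (32) to comma (44), so "#" (35) matches too
    Debug.Print "#" Like "[ -,]"    ' True
    Debug.Print "-" Like "[ -,]"    ' False: the hyphen (45) lies outside the range
    ' With the hyphen last, the list is exactly space, comma and hyphen
    Debug.Print "#" Like "[ ,-]"    ' False
    Debug.Print "-" Like "[ ,-]"    ' True
End Sub
```

The Case 3 list ".+_" contains no hyphen, so "[.+_]" can be used as is.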
[A] Helper procedure RemoveEmpty
Optional clean-up removing comma separated tokens if empty or containing only blanks:
Sub RemoveEmpty(ByRef s As String)
'Purp: remove comma separated tokens if empty or only blanks
Const DEL = "$DEL$" ' temporary deletion marker
Dim i As Long
Dim tmp: tmp = Split(s, ",")
For i = LBound(tmp) To UBound(tmp)
tmp(i) = IIf(Len(Trim(tmp(i))) = 0, DEL, Trim(tmp(i)))
Next i
tmp = Filter(tmp, DEL, False) ' remove marked elements
s = Join(tmp, ",")
End Sub
[B] Helper function getCats()
A tricky way to group characters into five code categories, thus building the basic logic for any further analysis:
Function getCats(s, Optional info As Boolean = False)
'Purp.: group characters into five code categories
'Auth.: https://stackoverflow.com/users/6460297/t-m
'Site: https://stackoverflow.com/questions/15723672/how-to-remove-all-non-alphanumeric-characters-from-a-string-except-period-and-sp/74679416#74679416
'Note: Cat.: including:
' 1 ~~> apostrophe '
' 2 ~~> space, comma, minus etc
' 3 ~~> point separ., plus etc
' 4 ~~> digits 0..9
' 5 ~~> alpha (even including accented or diacritic letters!)
'~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
'a) get array of single characters
Const CATEG As String = "' - . 0 A" 'define group starters (case indep.)
Dim arr: arr = Char2Arr(s) ' << [C] Char2Arr()
Dim chars: chars = Split(CATEG)
'b) return codes per array element
getCats = Application.Match(arr, chars) 'No 3rd zero-argument!!
'c) display in immediate window (optionally)
If info Then Debug.Print Join(arr, "|") & vbNewLine & Join(getCats, "|")
End Function
[C] Helper function Char2Arr
Assigns every string character to an array:
Function Char2Arr(ByVal s As String)
'Purp.: assign single characters to array
s = StrConv(s, vbUnicode)
Char2Arr = Split(s, vbNullChar, Len(s) \ 2)
End Function

Generating regular expression in Excel for strings

I have a huge list of strings for which I am trying to generate regular expressions in an automated way. The strings are pretty simple and I would like to generate the regular expressions using a formula or VBA code. The strings use the following legend:
& - Any UPPERCASE character (A-Z)
# - Any digits (0-9)
_ - Space (\s)
- - Dash
For example, the regular expression generated for the following strings:
Policy Number | Policy Digits | Regular Expression
####&&###### | 12 | ^\d{4}[A-Z]{2}\d{6}$
####&_###### | 11 | ^\d{4}[A-Z]{1}\s\d{6}$
ACPBP&&########## | 17 | ^[ACPBP]{5}[A-Z]{2}\d{10}$
ACPBA&########## or ACPBA&&########## | 16 or 17 | ^[ACPBA]{5}[A-Z]{1,2}\d{10}$
########## | 10 | ^\d{10}$
09############ | 14 | ^[09]{2}\d{12}$
A&&######, A&&#######, or A&&######## | 9, 10 or 11 | ^[A]{1}[A-Z]{2}\d{6,8}$
&&&####, &&&#####, or &&&###### | 7, 8, or 9 | ^[A-Z]{3}\d{4,6}$
09-##########-## | 14 | ^[09]{2}-\d{10}-\d{2}$
Is there some existing code that is available to generate regular expressions for a huge list of strings? What are some of the hints or tips that I can use to build a regular expression string? Thanks in advance.
There is no existing code, but try this:
Option Explicit
Option Compare Text 'to handle upper and lower case "or"
'Set reference to Microsoft Scripting Runtime
' or use Late Binding if distributing this
Function createRePattern(sPolicyNum As String) As String
Dim dCode As Dictionary, dReg As Dictionary
Dim I As Long, sReg As String, s As String
Dim v, sPN
v = Replace(sPolicyNum, "or", ",")
v = Split(v, ",")
Set dCode = New Dictionary
dCode.Add Key:="&", Item:="[A-Z]"
dCode.Add Key:="#", Item:="\d"
dCode.Add Key:="_", Item:="\s"
For Each sPN In v
sPN = Trim(sPN)
If Not sPN = "" Then
Set dReg = New Dictionary
For I = 1 To Len(sPN)
s = Mid(sPN, I, 1)
If Not dCode.Exists(s) Then dCode.Add s, s
If dReg.Exists(s) Then
dReg(s) = dReg(s) + 1
Else
If dReg.Count = 1 Then
dReg.Add s, 1
s = Mid(sPN, I - 1, 1)
sReg = sReg & dCode(s) & IIf(dReg(s) > 1, "{" & dReg(s) & "}", "")
dReg.Remove s
Else
dReg.Add s, 1
End If
End If
Next I
'Last Entry in Regex
s = Right(sPN, 1)
sReg = sReg & dCode(s) & IIf(dReg(s) > 1, "{" & dReg(s) & "}", "") & "|"
End If
Next sPN
s = Left(sReg, Len(sReg) - 1)
'Non-capturing group added if alternation present
If InStr(s, "|") = 0 Then
sReg = "^" & s & "$"
Else
sReg = "^(?:" & Left(sReg, Len(sReg) - 1) & ")$"
End If
createRePattern = sReg
End Function
Note
As written, there are limitations in that you cannot reference the literal strings:
#, &, _, , or
Generate regex patterns without dictionary
In addition to Ron's valid solution an alternative using no dictionary:
Option Explicit ' declaration head of code module
Function generateRePattern(ByVal s As String) As String
'[0]definitions & declarations
Const Pipe As String = "|"
Dim curSymbol$: curSymbol = "" ' current symbol (start value)
Dim lngth As Long: lngth = Len(s) ' current string length
Dim ii As Long: ii = 0 ' group index (start value)
Dim n As Long ' repetition counter
ReDim tmp(1 To lngth) ' provide for sufficient temp items
'[1](optional) Pipe replacement for "or" and commas
s = Replace(Replace(Replace(s, " or ", Pipe), " ", ""), ",", Pipe)
'[2]analyze string item s
Dim pos As Long ' current character position
For pos = 1 To lngth ' check each character
Dim curChar As String
curChar = Mid(s, pos, 1) ' define current character
If curChar <> curSymbol Then ' start new group
'a) change repetition counter in old group pattern
If ii > 0 Then tmp(ii) = Replace(tmp(ii), "n", n)
'b) increment group counter & get pattern via help function
ii = ii + 1: tmp(ii) = getPattern(curChar) ' << getPattern
'c) start new repetition counter & group symbol
n = 1: curSymbol = curChar
Else
n = n + 1 ' increment current repetition counter
End If
Next pos
'd) change last repetition counter
tmp(ii) = Replace(tmp(ii), "n", n)
ReDim Preserve tmp(1 To ii) '
'[3]return function result
generateRePattern = "^(?:" & Replace(Join(tmp, ""), "{1}", "") & ")$"
End Function
Helper function getPattern()
Function getPattern(curChar) As String
'Purpose: return general pattern based on current character
'a) definitions
Const Pipe As String = "|"
Dim symbols: symbols = Split("&|#|_", Pipe)
Dim patterns: patterns = Split("[A-Z]{n}|\d{n}|\s", Pipe)
'b) match character position within symbols
Dim pos: pos = Application.Match(curChar, symbols, 0)
'c) return pattern
If IsError(pos) Then
getPattern = curChar
Else
getPattern = patterns(pos - 1)
End If
End Function

How to remove the LAST set of parentheses from an Excel text string when it contains only numerics

I have an excel spreadsheet with over 50,000 entries. The entries have a name and address and sometimes a phone number ALL in the same string. I am concentrating on the phone number part of the string which is always at the end and enclosed in parentheses. I have been trying to use VBA code to address this.
How can I remove the LAST set of parentheses from an Excel text string when only numerics sit between the parentheses? Any given string may contain no parentheses or several, but I only want to remove the LAST set and leave the numbers it encloses in the string.
Example string: "Toone Carkeet J., agt.,Alliance Assurance Co. Ltd. (Provident Life branch), 3 St. Andrew st. (1936)". I have tried using VBScript.RegExp to match "(1936)", but I cannot get the RegExp to match the string and replace the parentheses () with "".
For Each Cell In Range
If strPattern<> "" Then
strInput = Cell
With regEx
.Pattern="\(([0-9]))*)"
.Global=False
End With
If .Pattern= True Then
Replace(Cell.Value, "(","")
End If
Here are two quick user defined functions that do not rely on regular expressions. The first uses VBA's StrReverse and the second InStrRev.
Function RemoveParens1(str As String)
str = StrReverse(str)
str = Replace(str, "(", vbNullString, 1, 1)
str = Replace(str, ")", vbNullString, 1, 1)
RemoveParens1 = StrReverse(str)
End Function
Function RemoveParens2(str As String)
Dim o As Integer, c As Integer
o = InStrRev(str, "(")
c = InStrRev(str, ")")
str = Left(str, c - 1) & Mid(str, c + 1)
str = Left(str, o - 1) & Mid(str, o + 1)
RemoveParens2 = str
End Function
If you don't want to use UDFs, just pick the logic method you prefer and adapt it for your own purposes.
Here's one more using regular expression's Replace.
Function RemoveParens3(str As String)
Static rgx As Object, cmat As Object, tmp As String
If rgx Is Nothing Then Set rgx = CreateObject("vbscript.regexp")
With rgx
.Global = True
.MultiLine = True
.IgnoreCase = False
.Pattern = "\([0-9]*\)"
If .test(str) Then
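Set cmat = .Execute(str)
'Rebuild the string around the LAST match only; a Global .Replace would overwrite every match
With cmat.Item(cmat.Count - 1)
str = Left(str, .FirstIndex) & Mid(.Value, 2, .Length - 2) & Mid(str, .FirstIndex + .Length + 1)
End With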
End If
End With
RemoveParens3 = str
End Function
Here's an example using similar logic to yours.
I changed the names of the range variables as it is not a good idea to use keywords for named variables, even if the editor will allow that.
Instead of just deleting the parentheses, we match the entire (nnnn) substring with the numbers inside a capturing group, and then replace that match with just the captured group.
Since Replace won't do anything if there is no match, there is no need to test.
Also, note that we set up the regEx OUTSIDE the loop.
With regEx
.Pattern = "\((\d+)\)"
.Global = False
End With
For Each myCell In myRange
myCell = regEx.Replace(myCell, "$1")
Next myCell
If necessary due to other substrings with the same pattern, you could change the pattern to ensure the match is at the end of the line, or that it is the last pattern of that type in the string.
For example:
Substring at end of the line
\((\d+)\)$
Substring the last one
\((\d+)\)(?!.*\(\d+\))
And there may be other modifications necessary if your string is in multiple lines within the cell.
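As a sketch, the "last one" pattern can be wrapped in a small function so it is usable as a UDF. StripLastNumericParens is a hypothetical name, and the function assumes single-line cell text, because "." does not match line breaks in VBScript.RegExp:

```vba
Option Explicit

' Hypothetical helper: strips the parentheses around the last "(digits)" group only.
Function StripLastNumericParens(ByVal s As String) As String
    With CreateObject("VBScript.RegExp")
        .Global = False
        ' the negative lookahead rejects a match that is followed by another "(digits)"
        .Pattern = "\((\d+)\)(?!.*\(\d+\))"
        ' $1 is the captured digit run; strings without a match come back unchanged
        StripLastNumericParens = .Replace(s, "$1")
    End With
End Function
```

In a worksheet this could be called as =StripLastNumericParens(A2). Parenthesised text such as "(Provident Life branch)" is left alone because it contains non-digits.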
Dim x As Long, y As Long, v As String
x = 2 'ASSUMING YOUR DATA START AT RANGE A2
With Sheet1
Do While .Cells(x, 1).Value <> ""
v = .Cells(x, 1).Value
If Right(v, 1) = ")" Then
y = InStrRev(v, "(") 'position of the last opening parenthesis
If y > 0 Then
'drop only the last "(" and the trailing ")"
.Cells(x, 1).Value = Left(v, y - 1) & Mid(v, y + 1, Len(v) - y - 1)
End If
End If
x = x + 1 'advance on every row, otherwise the loop never ends
Loop
End With

VBA / How can I filter an array on exact string?

As in my title, I'm trying to filter out specific strings from a VBA array, based on another array.
My code looks something like this :
For Each item In exclusions_list
updated_list = Filter(updated_list, item, False, vbTextCompare)
Next item
My issue is that I only want to exclude exact matches and I can't seem to find a way to do so.
If I have "how" in exclusions_list, I'd like to exclude "how" from updated_list but not "however".
My apologies if this has been asked before. I couldn't find a clear answer and I am not very familiar with VBA.
Thanks !
The Filter method only looks for substrings. It does not have a way of recognizing whole words.
One way to do this is by using Regular Expressions which include a token to recognize word boundaries. This will only work if the substrings you are considering do not include non-Word characters. Word characters are those in the set of [A-Za-z0-9_] (with some exceptions for non-English languages).
For example:
Option Explicit
Sub foo()
Dim arr
Dim arrRes
Dim V
Const sfilter As String = "gi"
Dim col As Collection
arr = Array("Filter", "by", "bynomore", "gi", "gif")
Dim re As Object, MC As Object, I As Long
Set col = New Collection
Set re = CreateObject("vbscript.regexp")
With re
.ignorecase = True
.Pattern = "\b" & sfilter & "\b"
For I = 0 To UBound(arr)
If .test(arr(I)) = False Then _
col.Add arr(I)
Next I
End With
ReDim arrRes(0 To col.Count - 1)
For I = 1 To col.Count
arrRes(I - 1) = col(I)
Next I
End Sub
The resulting array arrRes will contain gif but not gi
Approach via a very simple Replace function
In addition to the valid solutions above, here is another approach that uses a simple Replace function. It doesn't pretend to be the most efficient way to execute exclusions.
Example code
Sub Howdy()
' Purpose: exclude exactly matching array items (not case sensitive)
Dim exclusions_list, updated_list, item
exclusions_list = Array("How", "much")
' assign test list (with successive repetitions)
updated_list = Split("Bla bla,How,how,Howdy,However,How,much,much,much,Much,Much,How much,something else", ",")
' Debug.Print UBound(updated_list) + 1 & " items in original list: """ & Join(updated_list, "|") & """"
' execute exclusions
For Each item In exclusions_list
updated_list = modifyArr(updated_list, item) ' call helper function modifyArr()
' Debug.Print UBound(updated_list) + 1 & " items excluding """ & item & """:" & vbTab & """" & _
Join(updated_list, "|") & """"
Next item
End Sub
Note
With the Debug.Print statements uncommented, you'd get the following results in the VBE Immediate window:
13 items in original list: "Bla bla|How|how|Howdy|However|How|much|much|much|Much|Much|How much|something else"
10 items excluding "How": "Bla bla|Howdy|However|much|much|much|Much|Much|How much|something else"
5 items excluding "much": "Bla bla|Howdy|However|How much|something else"
Helper function modifyArr()
Please note that successive repetitions of the strings to be excluded have to be handled explicitly, as a single Replace statement wouldn't execute every wanted replacement in directly adjacent string parts.
Function modifyArr(ByVal arr, ByVal item) As Variant
Const C = ",": Dim temp$, sLen$
temp = Replace(C & Join(arr, C) & C, C & item & C, Replace:=C, Compare:=vbTextCompare)
Do While True ' needed to get successive repetitions !
sLen = Len(temp)
temp = Replace(temp, C & item & C, Replace:=C, Compare:=vbTextCompare)
If sLen = Len(temp) Then Exit Do
Loop
' return
modifyArr = Split(Mid$(temp, 2, Len(temp) - 2), C)
End Function
Add a reference to Microsoft VBScript Regular Expressions 5.5 (Tools > References), then:
Option Explicit
Sub Filter()
Dim words() As String
words = Split("how,however,test3,test4,,,howevermore,how,whatsohowever,test1,test2", ",")
Dim regex As New RegExp
With regex
.Global = True
.MultiLine = True
.IgnoreCase = False
.Pattern = "^how$" ' ^ means must start with and $ means must end with
End With
Dim i As Integer
For i = 0 To UBound(words)
If regex.Test(words(i)) Then
' Debug.Print words(i) + " is an exact match!"
words(i) = vbNullString ' Just clear out item, skip later.
Else
' Debug.Print words(i) + " is NOT a match!"
End If
Next i
For i = 0 To UBound(words)
If (StrPtr(words(i)) <> 0) Then ' We can use this to explicitly catch vbNullString, because "" has a pointer.
Debug.Print words(i)
End If
Next i
End Sub
Initially, I'm not clear why people are getting into RegExp here. RegExp is for complex pattern matching, not an exact match. For more on that point, see another answer here.
Basic Loop
The simplest way to do this is to loop through the array and test each value:
Sub ShowFilterOutExact()
startingArray = Array("Filter", "by", "bynomore", "gi", "gif")
filteredArray = FilterOutExact("gif", startingArray)
End Sub
Function FilterOutExact(exactValue, sourceArray)
'Start with a returnArray the same size as the sourceArray
ReDim returnArray(UBound(sourceArray))
For i = 0 To UBound(sourceArray)
If sourceArray(i) <> exactValue Then
returnArray(matchIndex) = sourceArray(i)
matchIndex = matchIndex + 1
End If
Next
'Now trim the returnArray down to size
ReDim Preserve returnArray(matchIndex - 1)
FilterOutExact = returnArray
End Function
For alternatives to the equal operator (or <> for "does not equal"), this answer has more details.
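Since the question used vbTextCompare, here is a minimal sketch of a case-insensitive exact filter built on StrComp that handles a whole exclusion list in one call. ExcludeExact is a hypothetical name, and both arguments are assumed to be one-dimensional arrays:

```vba
Option Explicit

' Hypothetical helper: returns the items of source that do not exactly match
' (ignoring case) any item of exclusions.
Function ExcludeExact(source As Variant, exclusions As Variant) As Variant
    Dim result() As Variant, n As Long
    Dim i As Long, j As Long, hit As Boolean
    If UBound(source) < LBound(source) Then
        ExcludeExact = Array()              ' empty input: return an empty array
        Exit Function
    End If
    ReDim result(0 To UBound(source) - LBound(source))
    For i = LBound(source) To UBound(source)
        hit = False
        For j = LBound(exclusions) To UBound(exclusions)
            ' StrComp returns 0 only when the whole strings are equal
            If StrComp(source(i), exclusions(j), vbTextCompare) = 0 Then
                hit = True
                Exit For
            End If
        Next j
        If Not hit Then
            result(n) = source(i)
            n = n + 1
        End If
    Next i
    If n = 0 Then
        ExcludeExact = Array()              ' nothing left: return an empty array
    Else
        ReDim Preserve result(0 To n - 1)
        ExcludeExact = result
    End If
End Function
```

With the names from the question, updated_list = ExcludeExact(updated_list, exclusions_list) would drop "how" and "How" but keep "however".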
Replace and Filter
You can also do a workaround with the built in Filter() function to get an exact match.
Function FilterExactMatch(SourceArray, Match, Optional DumpValue = "#/#/#", Optional Include = True)
'Make sure the DumpValue is not found in the sourceArray in any element
For i = LBound(SourceArray) To UBound(SourceArray)
ExactMatch = SourceArray(i) = Match
If ExactMatch Xor Include Then SourceArray(i) = DumpValue
Next
FilterExactMatch = Filter(SourceArray, DumpValue, False)
End Function
Filter out multiple values at once
Finally, it turns out the Application.Match function can check an array of values against an array of values to see if any match. This can be used to filter out multiple values at once (or just one) on an exact basis.
Function FilterOutMultiple(unwantedValuesArray, sourceArray)
If LBound(sourceArray) <> 0 Then
MsgBox "sourceArray argument must be zero-based for this to work as written"
Exit Function
End If
matchArray = Application.Match(sourceArray, unwantedValuesArray, 0)
matchCount = Application.Count(matchArray) 'Count non-error values
ReDim returnArray(UBound(sourceArray) - matchCount)
j = -1
For i = 0 To UBound(sourceArray)
If IsError(matchArray(i + 1)) Then 'Keep the error indexes
j = j + 1
returnArray(j) = sourceArray(i)
End If
Next
FilterOutMultiple = returnArray
End Function

How can I find quoted text in a string?

Example
Say I have a string:
"I say ""Hello world"" and she says ""Excuse me?"""
VBA will interpret this string as:
I say "Hello world" and she says "Excuse me?"
A more complex example:
I have a string:
"I say ""Did you know that she said """"Hi there!"""""""
VBA interprets this string as:
I say "Did you know that she said ""Hi there!"""
If we remove "I say "
"Did you know that she said ""Hi there!"""
we can continue parsing the string in vba:
Did you know that she said "Hi there!"
Problem
Ultimately I want some function, sBasicQuote(quotedStringHierarchy as string), which returns a string containing the next level up in the string hierarchy.
E.G.
dim s as string
s = "I say ""Did you know that she said """"Hi there!"""""""
s = sBasicQuote(s) ' returns 'I say "Did you know that she said ""Hi there!"""'
s = sBasicQuote(s) ' returns 'Did you know that she said "Hi there!"'
s = sBasicQuote(s) ' returns 'Hi there!'
I just can't figure out an algorithm that would work for this... You almost need to replace all double quotes, but when you've replaced the nth double quote you have to skip to the (n+1)th double quote?
How does one implement this in VBA?
You could do something like this
Public Sub test()
Dim s As String
s = "I say ""Did you know that she said """"Hi there!"""""""
Debug.Print DoubleQuote(s, 0)
Debug.Print DoubleQuote(s, 1)
Debug.Print DoubleQuote(s, 2)
End Sub
Public Function DoubleQuote(strInput As String, intElement As Integer) As String
Dim a() As String
strInput = Replace(strInput, String(2, Chr(34)), String(1, Chr(34)))
a = Split(strInput, chr(34))
DoubleQuote = a(intElement)
End Function
Another slightly modified version is a little more accurate
Public Function DoubleQuote(strInput As String, intElement As Integer) As String
Dim a() As String
Dim b() As String
Dim i As Integer
ReDim b(0)
a = Split(strInput, Chr(34))
' ***** See comments re using -1 *******
For i = 0 To UBound(a) - 1
If Len(a(i)) = 0 Then
b(UBound(b)) = Chr(34) & a(i + 1) & Chr(34)
i = i + 1
Else
b(UBound(b)) = a(i)
End If
ReDim Preserve b(UBound(b) + 1)
Next i
DoubleQuote = b(intElement)
End Function
I think the following will return what you are looking for in your nested quote example. Your first example is not really a situation of nested quotes.
Option Explicit
Sub NestedQuotes()
Const s As String = "I say ""Did you know that she said """"Hi there!"""""""
Dim COL As Collection
Dim Start As Long, Length As Long, sTemp As String, V As Variant
Set COL = New Collection
sTemp = s
COL.Add sTemp
Do Until InStr(sTemp, Chr(34)) = 0
sTemp = COL(COL.Count)
sTemp = Replace(sTemp, String(2, Chr(34)), String(1, Chr(34)))
Start = InStr(sTemp, Chr(34)) + 1
Length = InStrRev(sTemp, Chr(34)) - Start
sTemp = Mid(sTemp, Start, Length)
COL.Add sTemp
Loop
For Each V In COL
Debug.Print V
Next V
End Sub
My Solution
I spent some more time thinking and came up with this solution.
Function sMineDoubleQuoteHierarchy(s As String) As String
'Check the number of quotes in the string are even - sanity check
If (Len(s) - Len(Replace(s, """", ""))) Mod 2 <> 0 Then sMineDoubleQuoteHierarchy = "Error - Odd number of quotes found in sMineDoubleQuoteHierarchy() function": Exit Function
'First thing to do is find the first and last *single* quote in the string
Dim lStart, lEnd, i As Long, fs As String
lStart = InStr(1, s, """")
lEnd = InStrRev(s, """")
'After these have been found we need to remove them.
s = Mid(s, lStart + 1, lEnd - lStart - 1)
'Start at the first character
i = 1
Do While True
'Find where the next double quote is
i = InStr(1, s, """""")
'if no double quote is found then concatenate with fs with the remainder of s
If i = 0 Then Exit Do
'Else add on the string up to the char before the ith quote
fs = fs & Left(s, i - 1)
'Replace the ith double quote with a single quote
s = Left(s, i - 1) & Replace(s, """""", """", i, 1)
'Increment by 1 (ensuring the recently converted double quote is no longer a single quote
i = i + 1
Loop
'Return fs
sMineDoubleQuoteHierarchy = s
End Function
What's going on in this solution?
The first part of the process removes the first and last single quote from the string and returns the text between them. Then we loop through the string, replacing each instance of "" with ". Each time we do this we skip to the next character to ensure strings like """" become "" instead of ".
Does anyone else have a better/more compact solution?
Edit
After all the suggestions in this forum I settled on this. It has some extra error trapping to validate nested strings.
Public Function DoubleQuoteExtract(ByVal s As String, Optional ByRef ErrorLevel As Boolean) As String
'This effectively parses the string like BASIC does by removing incidents of "" and replacing them with "
'SANITY CHECK - Check even number of quotes
Dim countQuote As Double
countQuote = Len(s) - Len(Replace(s, """", ""))
'Calculate whether or not quote hierarchy is correct:
'"..." - Is okay - Count Quotes = 2 - Count Quotes / 2 = 1
'""..."" - Is not okay - Count Quotes = 4 - Count Quotes / 2 = 2
'"""...""" - Is okay - Count Quotes = 6 - Count Quotes / 2 = 3
'""""..."""" - Is not okay - Count Quotes = 8 - Count Quotes / 2 = 4
'etc.
'Ultimately: IF CountQuotes/2 = Odd The string hierarchy is setup fine
' IF CountQuotes/2 = Even, The string Hierarchy is setup incorrectly.
Dim X As Double: X = countQuote / 2
Dim ceil As Long: ceil = Int(X) - (X - Int(X) > 0)
If ceil Mod 2 <> 0 Then DoubleQuoteExtract = "#Error - Incorrect number of double quotes forming an incomplete hierarchy.": GoTo ErrorOccurred
'If an odd number of quotes are found then they cannot be paired correctly, thus throw error
If countQuote Mod 2 <> 0 Then DoubleQuoteExtract = "#Error - Odd number of quotes found in DoubleQuoteExtract() function": GoTo ErrorOccurred
'Find the next incident of single quote. Trim the string to this
s = Mid(s, InStr(1, s, String(1, Chr(34))))
'replace all instances of "" with "
s = Replace(s, String(2, Chr(34)), String(1, Chr(34)))
'Finally trim off the first and last quotes
DoubleQuoteExtract = Mid(s, 2, Len(s) - 2)
ErrorLevel = False
Exit Function
ErrorOccurred:
ErrorLevel = True
End Function
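For comparison, here is a compact sketch of the core step without the validation: take the text between the first and last quote, then collapse each doubled quote. UnquoteOnce is a hypothetical name, and the function simply returns an empty string when fewer than two quotes are present:

```vba
Option Explicit

' Hypothetical helper: returns the text between the first and last quote of s,
' with each doubled quote inside it collapsed to a single quote.
Function UnquoteOnce(ByVal s As String) As String
    Dim q As String, firstPos As Long, lastPos As Long
    q = Chr$(34)
    firstPos = InStr(1, s, q)
    lastPos = InStrRev(s, q)
    ' fewer than two quotes: nothing to extract
    If firstPos = 0 Or lastPos <= firstPos Then Exit Function
    UnquoteOnce = Replace(Mid$(s, firstPos + 1, lastPos - firstPos - 1), q & q, q)
End Function
```

Calling it repeatedly walks down the hierarchy one level at a time; it does not check whether the quotes are balanced.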
