Split string and delimiters into an array - excel

I have the following string:
top,fen,test,delay,test
I want to convert it into an array in the following way:
{top}{,}{fen}{,}{test}{,}{delay}{,}{test}

If you actually need the commas as part of the array, probably the simplest approach is a Replace statement that replaces each comma with a comma surrounded by some other character(s), which you then use as the Split delimiter. Whatever character you choose should be unique enough that it is unlikely to appear anywhere in your word list. I will use an underscore here, but you could use any other special character.
Sub test()
    Dim wordlist As String
    Dim arrayofWords
    Dim i
    wordlist = "top,fen,test,delay,test"
    wordlist = Replace(wordlist, ",", "_,_")
    arrayofWords = Split(wordlist, "_")
    'Enclose each word in curly brackets
    ' omit if this part is not needed
    For i = LBound(arrayofWords) To UBound(arrayofWords)
        arrayofWords(i) = "{" & arrayofWords(i) & "}"
    Next
End Sub
You could also use an unusual Unicode character as the delimiter (here ChrW(&H25B2), the ▲ symbol), since such characters would rarely if ever appear in your word list, like so:
Sub test()
    Const oldDelimiter As String = ","
    Dim splitter As String
    Dim newDelimiter As String
    Dim wordlist As String
    Dim arrayofWords
    Dim i As Long
    'Create our new delimiter by concatenating a new string with the comma:
    splitter = ChrW(&H25B2)
    newDelimiter = splitter & oldDelimiter & splitter
    'Define our word list:
    wordlist = "top,fen,test,delay,test"
    'Replace the comma with the new delimiter, defined above:
    wordlist = Replace(wordlist, oldDelimiter, newDelimiter)
    'Use SPLIT function to convert the string to an array
    arrayofWords = Split(wordlist, splitter)
    'Iterate the array and add curly brackets to each element
    'Omit if this part is not needed
    For i = LBound(arrayofWords) To UBound(arrayofWords)
        arrayofWords(i) = "{" & arrayofWords(i) & "}"
    Next
End Sub
The second method produces this array: {top}, {,}, {fen}, {,}, {test}, {,}, {delay}, {,}, {test}.

Try something like this:
WordsList = Split("top,fen,test,delay,test", ",")
Result = ""
Count = UBound(WordsList)
For i = 0 To Count
    Result = Result & "{" & WordsList(i) & "}"
    If i < Count Then Result = Result & "{,}"
Next i
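As an array, it would look like this:
Dim WordsList() As String
Dim Result() As String
Dim Count As Long, i As Long, j As Long
WordsList = Split("top,fen,test,delay,test", ",")
' 5 words plus 4 commas = 9 elements, so the highest index is UBound * 2
Count = UBound(WordsList) * 2
ReDim Result(Count)
j = 0
For i = 0 To UBound(WordsList)
    Result(j) = WordsList(i)
    j = j + 1
    ' add a comma after every word except the last one
    If j <= Count Then Result(j) = ","
    j = j + 1
Next i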
Split : http://msdn.microsoft.com/en-us/library/6x627e5f%28v=vs.90%29.aspx
UBound : http://msdn.microsoft.com/en-us/library/95b8f22f%28v=vs.90%29.aspx
Redim : http://msdn.microsoft.com/en-us/library/w8k3cys2.aspx

Here's a really short solution:
Function StringToCurlyArray(s As String) As String()
    StringToCurlyArray = Split("{" & Replace(s, ",", "}|{,}|{") & "}", "|")
End Function
Pass your comma-delimited string into that and you'll get an array of curly-braced strings back out, including commas.
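If you would rather not rely on a marker character (the underscore, ▲ or | used in the answers above) that could also occur in your data, the array can be built directly from the pieces returned by Split, inserting the delimiter between them. A minimal sketch; SplitKeepDelimiter is a hypothetical name, and wrapping every element in curly braces is assumed to be wanted, as in the question:

```vba
' Hypothetical helper: splits s on delim but keeps each delimiter as its own
' element, wrapping every element in curly braces.
Function SplitKeepDelimiter(ByVal s As String, ByVal delim As String) As String()
    Dim parts() As String
    Dim result() As String
    Dim i As Long, j As Long

    parts = Split(s, delim)
    If UBound(parts) < 0 Then          ' empty input: nothing to wrap
        SplitKeepDelimiter = parts
        Exit Function
    End If

    ' n pieces need n - 1 delimiters: 2 * n - 1 elements, highest index 2 * UBound
    ReDim result(0 To 2 * UBound(parts))

    For i = 0 To UBound(parts)
        result(j) = "{" & parts(i) & "}"
        j = j + 1
        If i < UBound(parts) Then      ' no delimiter after the last piece
            result(j) = "{" & delim & "}"
            j = j + 1
        End If
    Next i

    SplitKeepDelimiter = result
End Function
```

For example, Join(SplitKeepDelimiter("top,fen,test,delay,test", ","), "") returns {top}{,}{fen}{,}{test}{,}{delay}{,}{test}. Because nothing is searched for inside the data, any character (or multi-character string) works as delim.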


Remove alphanumeric chars in front of a defined char

I have a string in a cell composed of several shorter strings of various lengths, with blank spaces and commas in between. In some cases there are only one or more blanks in between.
I want to remove every blank space and comma and leave exactly one comma between each pair of string elements.
The following doesn't work. I'm not getting an error but the strings are truncated at the wrong places. I don't understand why.
Sub String_adaption()
    Dim i, j, k, m As Long
    Dim STR_A As String
    STR_A = "01234567890ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
    i = 1
    With Worksheets("table")
        For m = 1 To Len(.Range("H" & i))
            j = 1
            Do While Mid(.Range("H" & i), m, 1) = "," And Mid(.Range("H" & i), m - 1, 1) <> Mid(STR_A, j, 1) And m <> Len(.Range("H" & i))
                .Range("H" & i) = Mid(.Range("H" & i), 1, m - 2) & Mid(.Range("H" & i), m, Len(.Range("H" & i)))
                j = j + 1
            Loop
        Next m
    End With
End Sub
I'd use a regular expression to replace any combination of spaces and commas with a single comma. Something along these lines:
Sub Test()
    Dim str As String: str = "STRING_22 ,,,,,STRING_1 , , ,,,,,STRING_333 STRING_22 STRING_4444"
    Debug.Print RegexReplace(str, "[\s,]+", ",")
End Sub
Function RegexReplace(x_in, pat, repl) As String
    With CreateObject("vbscript.regexp")
        .Global = True
        .Pattern = pat
        RegexReplace = .Replace(x_in, repl)
    End With
End Function
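With the pattern above, separators at the very start or end of the string turn into a stray leading or trailing comma. A minimal sketch that assumes such edge separators should simply disappear, using the same late-bound VBScript.RegExp object; CleanList is a hypothetical name:

```vba
' Hypothetical helper: collapse runs of spaces/commas into one comma and
' drop any separators at the very start or end of the string.
Function CleanList(ByVal s As String) As String
    With CreateObject("VBScript.RegExp")
        .Global = True
        ' 1) remove separators anchored at the beginning or the end
        .Pattern = "^[\s,]+|[\s,]+$"
        s = .Replace(s, "")
        ' 2) collapse every remaining run of separators into a single comma
        .Pattern = "[\s,]+"
        CleanList = .Replace(s, ",")
    End With
End Function
```

For example, CleanList(" ,a , ,b,, ") returns a,b.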
Just for the sake of alternatives:
Formula in B1:
=TEXTJOIN(",",TRUE,TEXTSPLIT(A1,{" ",","}))
The following function will split the input string into pieces (words), using a comma as separator. When the input string has multiple commas, it will result in empty words.
After splitting, the function loops over all words, trims them (removing leading and trailing blanks) and glues them together. Empty words are skipped.
I have implemented it as a Function, so you can use it as a UDF: if your input string is in B2, enter =String_adaption(B2) as a formula in any cell.
Function String_adaption(s As String) As String
    ' Remove duplicate Commas and Leading and Trailing Blanks from words
    Dim words() As String, i As Long
    words = Split(s, ",")
    For i = 0 To UBound(words)
        Dim word As String
        word = Trim(words(i))
        If word <> "" Then
            String_adaption = String_adaption & IIf(String_adaption = "", "", ",") & word
        End If
    Next i
End Function
P.S.: I'm almost sure this could also be done with a clever regular expression, but I'm not an expert in that.
If you have a recent Excel version (Microsoft 365), you can use a simple worksheet formula to split the string on spaces and on commas, then join the pieces back together with a comma delimiter while ignoring the blanks (and I just noticed @JvdV had previously posted the same formula solution):
=TEXTJOIN(",",TRUE,TEXTSPLIT(A1,{" ",","}))
In VBA, you can use a similar algorithm, using the ArrayList object to collect the non-blank results.
Option Explicit
Function commaOnly(s As String) As String
    Dim v, w, x, y
    Dim al As Object
    Set al = CreateObject("System.Collections.ArrayList")
    v = Split(s, " ")
    For Each w In v
        x = Split(w, ",")
        For Each y In x
            If y <> "" Then al.Add y
        Next y
    Next w
    commaOnly = Join(al.toarray, ",")
End Function
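CreateObject("System.Collections.ArrayList") depends on the .NET Framework 3.5 being installed and may raise an automation error on machines without it. A sketch of the same algorithm using only VBA's built-in Collection and a String array for Join; commaOnlyNoNet is a hypothetical name:

```vba
' Hypothetical variant of commaOnly that avoids System.Collections.ArrayList
' and uses only built-in VBA objects.
Function commaOnlyNoNet(ByVal s As String) As String
    Dim col As New Collection
    Dim w As Variant, y As Variant
    Dim out() As String
    Dim i As Long

    ' split on spaces first, then on commas, keeping only non-blank pieces
    For Each w In Split(s, " ")
        For Each y In Split(w, ",")
            If y <> "" Then col.Add y
        Next y
    Next w

    If col.Count = 0 Then Exit Function   ' nothing left: return ""

    ' copy the Collection into a String array so Join can be used
    ReDim out(1 To col.Count)
    For i = 1 To col.Count
        out(i) = col(i)
    Next i
    commaOnlyNoNet = Join(out, ",")
End Function
```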
This preserves the spaces within the smaller strings.
Option Explicit
Sub demo()
    Const s = "STRING 22,,,, ,,STRING 1,,,, ,,STRING 333 , , , STRING_22 STRING_44"
    Debug.Print Cleanup(s)
End Sub
Function Cleanup(s As String) As String
    Const SEP = ","
    Dim regex, m, sOut As String, i As Long, ar()
    Set regex = CreateObject("vbscript.regexp")
    With regex
        .Global = True
        .MultiLine = False
        .IgnoreCase = True
        .Pattern = "([^,]+)(?:[ ,]*)"
    End With
    If regex.Test(s) Then
        Set m = regex.Execute(s)
        ReDim ar(0 To m.Count - 1)
        For i = 0 To UBound(ar)
            ar(i) = Trim(m(i).submatches(0))
        Next
    End If
    Cleanup = Join(ar, SEP)
End Function
Code categories approach
For the sake of completeness, and to show other roads "leading to Rome", I want to demonstrate an approach that groups the characters of the input string into five code categories in order to extract alphanumerics via a tricky match (see [B] Function getCats()):
To meet the OP's requirements, use the following steps:
1) remove comma-separated tokens that are empty or contain only blanks (optional),
2) group characters into code categories,
3) check catCodes and return alphanumerics (even including accented or diacritic letters) as well as characters like [ -,.+_]
Function AlphaNum(ByVal s As String, _
        Optional IgnoreEmpty As Boolean = True, _
        Optional info As Boolean = False) As String
    'Site: https://stackoverflow.com/questions/15723672/how-to-remove-all-non-alphanumeric-characters-from-a-string-except-period-and-sp/74679416#74679416
    'Auth.: https://stackoverflow.com/users/6460297/t-m
    'Date: 2023-01-12
    '1) remove comma separated tokens if empty or only blanks (s passed as byRef argument)
    If IgnoreEmpty Then RemoveEmpty s ' << [A] RemoveEmpty
    '2) group characters into code categories
    Dim catCodes: catCodes = getCats(s, info) ' << [B] getCats()
    '3) check catCodes and return alpha nums plus chars like [ -,.+_]
    Dim i As Long, ii As Long
    For i = 1 To UBound(catCodes)
        ' get current character
        Dim curr As String: curr = Mid$(s, i, 1)
        Dim okay As Boolean: okay = False
        Select Case catCodes(i)
            ' AlphaNum: cat.4=digits, cat.5=alpha letters
            Case Is >= 4: okay = True
            ' Category 2: allow only space, comma, minus
            Case 2: If InStr(" -,", curr) <> 0 Then okay = True
            ' Category 3: allow only point, plus, underline
            Case 3: If InStr(".+_", curr) <> 0 Then okay = True
        End Select
        If okay Then ii = ii + 1: catCodes(ii) = curr ' increment counter
    Next i
    ReDim Preserve catCodes(1 To ii)
    AlphaNum = Join(catCodes, vbNullString)
End Function
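Note: Instead of If InStr(" -,", curr) <> 0 Then in Case 2 you may code If curr Like "[ ,-]" Then, too (the hyphen must come first or last inside the brackets, otherwise Like reads it as a character range). Similar in Case 3 :-)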
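Why the position of the hyphen matters in a Like pattern: inside square brackets, a hyphen between two characters defines a range, and it only stands for itself when it is the first or last character in the list. A small demo, assuming the default Option Compare Binary (LikeHyphenDemo is a hypothetical name):

```vba
' A hyphen between two characters in a Like character list defines a range;
' at the start or end of the list it matches a literal hyphen.
Sub LikeHyphenDemo()
    ' "[ -,]" is the range Chr(32) to Chr(44): space through comma
    Debug.Print "!" Like "[ -,]"   ' True: "!" is Chr(33), inside the range
    Debug.Print "-" Like "[ -,]"   ' False: "-" is Chr(45), outside the range
    ' "[ ,-]" is exactly the set space, comma, hyphen
    Debug.Print "!" Like "[ ,-]"   ' False
    Debug.Print "-" Like "[ ,-]"   ' True
End Sub
```

So "[ -,]" accepts every character from space to comma, such as ! # $ & ' ( ) * +, but rejects the hyphen itself.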
[A] Helper procedure RemoveEmpty
Optional clean-up removing comma separated tokens if empty or containing only blanks:
Sub RemoveEmpty(ByRef s As String)
    'Purp: remove comma separated tokens if empty or only blanks
    Const DEL = "$DEL$" ' temporary deletion marker
    Dim i As Long
    Dim tmp: tmp = Split(s, ",")
    For i = LBound(tmp) To UBound(tmp)
        tmp(i) = IIf(Len(Trim(tmp(i))) = 0, DEL, Trim(tmp(i)))
    Next i
    tmp = Filter(tmp, DEL, False) ' remove marked elements
    s = Join(tmp, ",")
End Sub
[B] Helper function getCats()
A tricky way to group characters into five code categories, which provides the basic logic for any further analysis:
Function getCats(s, Optional info As Boolean = False)
    'Purp.: group characters into five code categories
    'Auth.: https://stackoverflow.com/users/6460297/t-m
    'Site: https://stackoverflow.com/questions/15723672/how-to-remove-all-non-alphanumeric-characters-from-a-string-except-period-and-sp/74679416#74679416
    'Note: Cat.: including:
    ' 1 ~~> apostrophe '
    ' 2 ~~> space, comma, minus etc
    ' 3 ~~> point separ., plus etc
    ' 4 ~~> digits 0..9
    ' 5 ~~> alpha (even including accented or diacritic letters!)
    '~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    'a) get array of single characters
    Const CATEG As String = "' - . 0 A" 'define group starters (case indep.)
    Dim arr: arr = Char2Arr(s) ' << [C] Char2Arr()
    Dim chars: chars = Split(CATEG)
    'b) return codes per array element
    getCats = Application.Match(arr, chars) 'No 3rd zero-argument!!
    'c) display in immediate window (optionally)
    If info Then Debug.Print Join(arr, "|") & vbNewLine & Join(getCats, "|")
End Function
[C] Helper function Char2Arr
Assigns every string character to an array:
Function Char2Arr(ByVal s As String)
    'Purp.: assign single characters to array
    s = StrConv(s, vbUnicode)
    Char2Arr = Split(s, vbNullChar, Len(s) \ 2)
End Function
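Char2Arr relies on StrConv(s, vbUnicode), which reinterprets the string's bytes through the system's ANSI code page, so characters outside that code page may not survive the round trip, and the last element keeps a trailing vbNullChar. A sketch of a plain Mid$-based alternative, assuming getCats only needs a zero-based array of single characters (CharsToArray is a hypothetical name):

```vba
' Hypothetical alternative to Char2Arr: copies each character with Mid$,
' so no ANSI/Unicode conversion is involved. Returns a zero-based array,
' like the Split-based original.
Function CharsToArray(ByVal s As String) As String()
    Dim result() As String
    Dim i As Long
    If Len(s) = 0 Then
        CharsToArray = Split("")   ' zero-length array (UBound = -1)
        Exit Function
    End If
    ReDim result(0 To Len(s) - 1)
    For i = 1 To Len(s)
        result(i - 1) = Mid$(s, i, 1)
    Next i
    CharsToArray = result
End Function
```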

Excel VBA Replace the nth word in a string

My problem is the following:
I have two sets of strings. The "words" are separated with "+".
String 1: A25+F47+w41+r21+h65
String 2: 4+7+4+4+2
I have a textbox that identifies the word “w41” in string 1. It is the 3rd word in the string. I want to replace the 3rd word in string 2, and that would be the second “4”.
What I have so far is:
I am using split function to split string 1 where there is a “+”:
Result=Split(String1, "+")
Then I use the UBound to find the position of w41 and the result is 3.
FindWordPosition = UBound(Result()) + 1
Now I want to split string 2 in the same way. But then I want to change the 3rd word in string 2 from “4” to “3” and then put it together again. The result would be:
String 2: 4+7+3+4+2
But I cannot figure out how to do it :(
One way is to use an ArrayList.
Not sure what you want to replace the matched item with. In the code below, it is being replaced with the sequence number, which matches what you describe, but may not be what you really want.
Option Explicit
Sub due()
    Const str1 As String = "A25+F47+w41+r21+h65"
    Const str2 As String = "4+7+4+4+2"
    Const strMatch As String = "w41"
    Dim AL As Object
    Dim v, w, I As Long
    Set AL = CreateObject("System.Collections.ArrayList")
    'put str2 into arrayList
    v = Split(str2, "+")
    For Each w In v
        AL.Add w
    Next w
    'Check str1 against matchStr to get positions and act on the item in AL at that position
    v = Split(str1, "+")
    For I = 0 To UBound(v)
        'Note that arrayList index and "Split" array are zero-based
        If strMatch = v(I) Then
            AL.removeat I 'remove item in the same position as the position of the matched item
            AL.Insert I, I + 1 'Insert new item at that same position. Could be anything. I chose I+1 to match what you wrote in your question.
        End If
    Next I
    Debug.Print Join(AL.toarray, "+")
End Sub
=> 4+7+3+4+2
Replace With Index
This example should get you started.
The Code
Option Explicit
' Results
' For w41:
' A25+F47+w41+r21+h65
' 4+7+3+4+1
' For h65:
' A25+F47+w41+r21+h65
' 4+7+4+4+5
Sub replaceWithIndex()
    Const Criteria As String = "w41"
    Const str1 As String = "A25+F47+w41+r21+h65"
    Const str2 As String = "4+7+4+4+1"
    Const Delimiter As String = "+"
    Dim Split1() As String: Split1 = Split(str1, Delimiter)
    Dim Split2() As String: Split2 = Split(str2, Delimiter)
    ' Note that an array obtained with 'Split' is always zero-based, while
    ' the result of 'Application.Match' is always one-based (hence '- 1').
    Dim cMatch As Variant: cMatch = Application.Match(Criteria, Split1, 0)
    If IsNumeric(cMatch) Then
        Split2(cMatch - 1) = cMatch
        Debug.Print "Source"
        Debug.Print str1
        Debug.Print str2
        Debug.Print "Result"
        Debug.Print Join(Split1, Delimiter)
        Debug.Print Join(Split2, Delimiter)
    End If
End Sub
You can simply loop through the array and exit when the item is found:
Public Sub BuildPositions()
    Const Criteria As String = "w41"
    Const String1 As String = "A25+F47+w41+r21+h65"
    Const String2 As String = "4+7+4+4+2"
    Const Delimiter As String = "+"
    Dim Results() As String
    Dim Positions() As String
    Dim Index As Integer
    Results = Split(String1, Delimiter)
    Positions = Split(String2, Delimiter)
    For Index = LBound(Results) To UBound(Results)
        If Results(Index) = Criteria Then
            Exit For
        End If
    Next
    If Index <= UBound(Results) Then
        ' Result was located.
        Positions(Index) = 1 + Index
    End If
    Debug.Print "Results:", String1
    Debug.Print "Positions1:", String2
    Debug.Print "Positions2:", Join(Positions, Delimiter)
End Sub
Output:
Results: A25+F47+w41+r21+h65
Positions1: 4+7+4+4+2
Positions2: 4+7+3+4+2
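All three answers follow the same two steps: find the 1-based position of the word in the first string, then overwrite the element at that index of the second string's Split array and Join it back. Wrapped into two small reusable functions (hypothetical names TokenPosition and ReplaceNthToken; the new value is passed in explicitly rather than derived from the position):

```vba
' Hypothetical helper: 1-based position of token in a delimited string,
' or 0 when it is not found.
Function TokenPosition(ByVal s As String, ByVal delim As String, _
                       ByVal token As String) As Long
    Dim parts() As String
    Dim i As Long
    parts = Split(s, delim)
    For i = 0 To UBound(parts)
        If parts(i) = token Then
            TokenPosition = i + 1
            Exit Function
        End If
    Next i
End Function

' Hypothetical helper: replace the n-th (1-based) token of a delimited string.
' Returns s unchanged when n is out of range (including n = 0).
Function ReplaceNthToken(ByVal s As String, ByVal delim As String, _
                         ByVal n As Long, ByVal newValue As String) As String
    Dim parts() As String
    parts = Split(s, delim)
    If n >= 1 And n - 1 <= UBound(parts) Then
        parts(n - 1) = newValue   ' Split arrays are zero-based
    End If
    ReplaceNthToken = Join(parts, delim)
End Function
```

For example, ReplaceNthToken("4+7+4+4+2", "+", TokenPosition("A25+F47+w41+r21+h65", "+", "w41"), "3") returns 4+7+3+4+2. When the word is not found, TokenPosition returns 0 and the second string comes back unchanged.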

To find the missing numbers in a comma-separated list

I have comma-separated lists in cells. All numbers are positive and between 1 and 10.
Example:
If A1 contains (2,3,5,6), I would like to have the missing numbers in B1: (1,4,7,8,9,10).
If A2 contains (1,10), then I would have in B2: (2,3,4,5,6,7,8,9).
If A3 contains (7), then I would have in B3: (1,2,3,4,5,6,8,9,10).
I searched for a solution online, but I couldn't find anything similar with comma-separated numbers.
I'd be glad to get a solution here. Thanks.
Here is a user-defined function that should accomplish this; it can probably be optimized.
Public Function MissingNumbers(ByVal numberList As String) As String
    Dim temp As String
    temp = Replace(numberList, "(", "")
    temp = Replace(temp, ")", "")
    Dim arr As Variant
    arr = Split(temp, ",")
    ' every candidate is followed by a comma, so "1," never matches inside "10,"
    Dim newNumbers As String
    newNumbers = "1,2,3,4,5,6,7,8,9,10,"
    Dim i As Long
    For i = LBound(arr) To UBound(arr)
        newNumbers = Replace(newNumbers, Trim$(arr(i)) & ",", "")
    Next
    ' strip the trailing comma (guard against an empty result)
    If Len(newNumbers) > 0 Then newNumbers = Left$(newNumbers, Len(newNumbers) - 1)
    MissingNumbers = "(" & newNumbers & ")"
End Function
Just for fun, here is a demonstration of how to use negative filtering:
Function MissingList(ByVal numberList As String) As String
    Dim given: given = Split(Mid(numberList, 2, Len(numberList) - 2), ",")
    Dim series: series = GetSeries() ' i.e. numbers 1..10, with 10 stored as 0
    Dim i As Long, token As String
    For i = 0 To UBound(given)
        token = Trim$(given(i))
        If token = "10" Then token = "0" ' search 10 under its temporary stand-in 0
        series = Filter(series, token, False) ' << negative filtering
    Next
    MissingList = "(" & Replace(Join(series, ","), "0", "10") & ")"
End Function
As Filter executes a partial search in the 1..10 series, 10 has to be replaced temporarily by a unique 0 (and a given 10 must therefore be searched for as 0 as well).
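To see why the temporary 0 is needed, here is Filter's substring behaviour in isolation (a small demo with made-up values; FilterPartialMatchDemo is a hypothetical name):

```vba
' Filter keeps (include = True) or drops (include = False) every element
' that CONTAINS the match string, not only exact matches.
Sub FilterPartialMatchDemo()
    Dim series As Variant
    series = Array("1", "2", "10")
    ' dropping "1" also drops "10", because "10" contains "1"
    Debug.Print Join(Filter(series, "1", False), ",")   ' 2
    ' dropping "0" only affects the element "10" here
    Debug.Print Join(Filter(series, "0", False), ",")   ' 1,2
End Sub
```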
Helper function GetSeries()
Function GetSeries()
    ' Purpose: get numbers 1..10
    Const LAST As Long = 10: Const FIRST = 1
    Dim tmp: tmp = Application.Transpose(Evaluate("row(" & FIRST & ":" & LAST & ")"))
    tmp(LAST) = 0 ' replace 10 by 0 as search item 1 would filter out value 10, too
    GetSeries = tmp
End Function
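If you prefer to avoid substring matching altogether, a different technique is to flag each listed number in a Boolean array and then collect the unflagged ones. A sketch assuming the input looks like (2,3,5,6) and holds whole numbers between 1 and 10; MissingByFlags is a hypothetical name:

```vba
' Hypothetical alternative: mark each listed number in a Boolean array and
' collect the unmarked ones, so no substring matching is involved at all.
' Assumes input like "(2,3,5,6)" with whole numbers between 1 and 10.
Function MissingByFlags(ByVal numberList As String) As String
    Const FIRST As Long = 1, LAST As Long = 10
    Dim present(FIRST To LAST) As Boolean
    Dim token As Variant, t As String, k As Long
    Dim result As String

    numberList = Replace(Replace(numberList, "(", ""), ")", "")
    For Each token In Split(numberList, ",")
        t = Trim$(token)
        If IsNumeric(t) Then
            k = CLng(t)
            If k >= FIRST And k <= LAST Then present(k) = True
        End If
    Next token

    ' build ",a,b,c" from the numbers that were not flagged
    For k = FIRST To LAST
        If Not present(k) Then result = result & "," & k
    Next k
    ' drop the leading comma (Mid$ of "" simply returns "")
    MissingByFlags = "(" & Mid$(result, 2) & ")"
End Function
```

For example, MissingByFlags("(2,3,5,6)") returns (1,4,7,8,9,10), and MissingByFlags("(1,10)") returns (2,3,4,5,6,7,8,9).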

Generating regular expression in Excel for strings

I have a huge list of strings for which I am trying to generate regular expressions in an automated way. The strings are pretty simple, and I would like to generate the regular expressions using a formula or VBA code. In the list of strings, the following legend applies:
& - Any UPPERCASE character (A-Z)
# - Any digit (0-9)
_ - Space (\s)
- - Dash
For example, these are the regular expressions I would expect for the following strings:
Policy Number | Policy Digits | Regular Expression
####&&###### | 12 | ^\d{4}[A-Z]{2}\d{6}$
####&_###### | 11 | ^\d{4}[A-Z]{1}\s\d{6}$
ACPBP&&########## | 17 | ^[ACPBP]{5}[A-Z]{2}\d{10}$
ACPBA&########## or ACPBA&&########## | 16 or 17 | ^[ACPBA]{5}[A-Z]{1,2}\d{10}$
########## | 10 | ^\d{10}$
09############ | 14 | ^[09]{2}\d{12}$
A&&######, A&&#######, or A&&######## | 9, 10 or 11 | ^[A]{1}[A-Z]{2}\d{6,8}$
&&&####, &&&#####, or &&&###### | 7, 8, or 9 | ^[A-Z]{3}\d{4,6}$
09-##########-## | 14 | ^[09]{2}-\d{10}-\d{2}$
Is there some existing code that is available to generate regular expressions for a huge list of strings? What are some of the hints or tips that I can use to build a regular expression string? Thanks in advance.
I'm not aware of any existing code, but try this:
Option Explicit
Option Compare Text 'to handle upper and lower case "or"
'Set reference to Microsoft Scripting Runtime
' or use Late Binding if distributing this
Function createRePattern(sPolicyNum As String) As String
    Dim dCode As Dictionary, dReg As Dictionary
    Dim I As Long, sReg As String, s As String
    Dim v, sPN
    v = Replace(sPolicyNum, "or", ",")
    v = Split(v, ",")
    Set dCode = New Dictionary
    dCode.Add Key:="&", Item:="[A-Z]"
    dCode.Add Key:="#", Item:="\d"
    dCode.Add Key:="_", Item:="\s"
    For Each sPN In v
        sPN = Trim(sPN)
        If Not sPN = "" Then
            Set dReg = New Dictionary
            For I = 1 To Len(sPN)
                s = Mid(sPN, I, 1)
                If Not dCode.Exists(s) Then dCode.Add s, s
                If dReg.Exists(s) Then
                    dReg(s) = dReg(s) + 1
                Else
                    If dReg.Count = 1 Then
                        dReg.Add s, 1
                        s = Mid(sPN, I - 1, 1)
                        sReg = sReg & dCode(s) & IIf(dReg(s) > 1, "{" & dReg(s) & "}", "")
                        dReg.Remove s
                    Else
                        dReg.Add s, 1
                    End If
                End If
            Next I
            'Last Entry in Regex
            s = Right(sPN, 1)
            sReg = sReg & dCode(s) & IIf(dReg(s) > 1, "{" & dReg(s) & "}", "") & "|"
        End If
    Next sPN
    s = Left(sReg, Len(sReg) - 1)
    'Non-capturing group added if alternation present
    If InStr(s, "|") = 0 Then
        sReg = "^" & s & "$"
    Else
        sReg = "^(?:" & Left(sReg, Len(sReg) - 1) & ")$"
    End If
    createRePattern = sReg
End Function
Note
As written, there is a limitation: the function cannot match these literal strings, because they are interpreted as codes or separators: #, &, _, the comma (,), and "or".
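Whichever generator you use, it is worth checking each generated pattern against a few known policy numbers before applying it to the whole list. A minimal sketch using the late-bound VBScript.RegExp object (MatchesPattern is a hypothetical name):

```vba
' Hypothetical helper: test whether a candidate string matches a generated
' pattern, e.g. one returned by createRePattern or generateRePattern.
Function MatchesPattern(ByVal candidate As String, ByVal pattern As String) As Boolean
    With CreateObject("VBScript.RegExp")
        .Pattern = pattern
        .IgnoreCase = False   ' [A-Z] should only accept uppercase letters
        MatchesPattern = .Test(candidate)
    End With
End Function
```

For example, MatchesPattern("1234AB567890", "^\d{4}[A-Z]{2}\d{6}$") returns True, while the lowercase "1234ab567890" returns False because IgnoreCase is off.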
Generate regex patterns without dictionary
In addition to Ron's valid solution, here is an alternative that uses no dictionary:
Option Explicit ' declaration head of code module
Function generateRePattern(ByVal s As String) As String
    '[0]definitions & declarations
    Const Pipe As String = "|"
    Dim curSymbol$: curSymbol = "" ' current symbol (start value)
    Dim ii As Long: ii = 0 ' group index (start value)
    Dim n As Long ' repetition counter
    '[1](optional) Pipe replacement for "or" and commas
    s = Replace(Replace(Replace(s, " or ", Pipe), " ", ""), ",", Pipe)
    ' measure the length only after the replacements have shortened s
    Dim lngth As Long: lngth = Len(s) ' current string length
    ReDim tmp(1 To lngth) ' provide for sufficient temp items
    '[2]analyze string item s
    Dim pos As Long ' current character position
    For pos = 1 To lngth ' check each character
        Dim curChar As String
        curChar = Mid(s, pos, 1) ' define current character
        If curChar <> curSymbol Then ' start new group
            'a) change repetition counter in old group pattern
            If ii > 0 Then tmp(ii) = Replace(tmp(ii), "n", n)
            'b) increment group counter & get pattern via help function
            ii = ii + 1: tmp(ii) = getPattern(curChar) ' << getPattern
            'c) start new repetition counter & group symbol
            n = 1: curSymbol = curChar
        Else
            n = n + 1 ' increment current repetition counter
        End If
    Next pos
    'd) change last repetition counter
    tmp(ii) = Replace(tmp(ii), "n", n)
    ReDim Preserve tmp(1 To ii)
    '[3]return function result
    generateRePattern = "^(?:" & Replace(Join(tmp, ""), "{1}", "") & ")$"
End Function
Helper function getPattern()
Function getPattern(curChar) As String
    'Purpose: return general pattern based on current character
    'a) definitions
    Const Pipe As String = "|"
    Dim symbols: symbols = Split("&|#|_", Pipe)
    ' "{n}" is later replaced by the repetition count ("{1}" is removed)
    Dim patterns: patterns = Split("[A-Z]{n}|\d{n}|\s{n}", Pipe)
    'b) match character position within symbols
    Dim pos: pos = Application.Match(curChar, symbols, 0)
    'c) return pattern
    If IsError(pos) Then
        getPattern = curChar
    Else
        getPattern = patterns(pos - 1)
    End If
End Function

VBA # trim string to remove characters

I have a file name from which I need to remove some characters. Below are the file name and the goal after trimming:
My Current String = "text_12_12_19.pdl"
New String Goal = "Text.pdl"
You can use Split:
MyStringGoal = Split(MyCurrentString, "_")(0) & "." & Split(MyCurrentString, ".")(1)
Assuming you are looking to obtain all characters preceding the first underscore, I would suggest the following:
Function TrimFilename(fnm As String) As String
    Dim i As Long, j As Long
    i = InStr(fnm, "_")
    j = InStrRev(fnm, ".")
    If 0 < i And i < j Then
        TrimFilename = Mid(fnm, 1, i - 1) & Mid(fnm, j)
    Else
        TrimFilename = fnm
    End If
End Function
?TrimFilename("text_12_12_19.pdl")
text.pdl
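Another solution, using Left and Right (this only works when the part before the first underscore and the extension, including the dot, are exactly four characters long each):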
Dim my_current_string As String
Dim New_String_Goal As String
Dim r As String, l As String
my_current_string = "text_12_12_19.pdl"
l = Left(my_current_string, 4)
r = Right(my_current_string, 4)
New_String_Goal = l & r
Debug.Print New_String_Goal
