Split Function - divide cell by string - excel

I am trying to divide merged information from one cell into separate cells.
one cell:
amount:2 price:253,18 price2:59,24 EU status:WBB NAS MRR OWA PXA min:1 opt:3 category: PNE code z:195750
divided data: (I want to export each part into another cell)
amount:2
price:253,18
price2:59,24 EU
status:WBB NAS MRR OWA PXA
min:1
opt:3
category: PNE
code z:195750
I can't simply split on spaces: the status part (which is case-sensitive), e.g. | status:WBB NAS MRR OWA PXA |, holds a varying number of values separated by spaces, and those must stay together.
Split ( expression [,delimiter] [,limit] [,compare] )
Sub Split_VBA()
'Create variables
Dim MyArray() As String, MyString As String, N As Integer, Temp As String
MyString = B2 ' TRYING TO GET DATA FROM CELL B2 TO SPLIT IT
'Use the split function to divide the string using a string "price:"
MyArray = Split(MyString, "price:")
Dim arr() As String
' Split the string to an array
arr = Split(B2, "price:") 'try to divide first part of data when appears string 'price:'
For N = 0 To UBound(MyArray)
'place each array element plus a line feed character into a string
Temp = Temp & MyArray(N) & vbLf
Next N
' I WOULD LIKE TO PROVIDE RESULT IN A ROW NOT IN A COLUMN
Range("A1") = Temp
End Sub
So far this VBA code seems to be a little above my abilities. I checked some samples available online and tried to write the code above, but I got stuck, so I am asking the community for some advice.

As the order is the same, one way is to simply search for adjacent key names and parse out what's in between:
Sub g()
Dim stringValue As String
stringValue = "amount:2 price:253,18 price2:59,24 EU status:WBB NAS MRR OWA PXA min:1 opt:3 category: PNE code z:195750"
Debug.Print getPart(stringValue, "amount", "price")
Debug.Print getPart(stringValue, "price", "price2")
Debug.Print getPart(stringValue, "price2", "status")
Debug.Print getPart(stringValue, "status", "min")
Debug.Print getPart(stringValue, "min", "opt")
Debug.Print getPart(stringValue, "opt", "category")
Debug.Print getPart(stringValue, "category", "code z")
Debug.Print getPart(stringValue, "code z", "", True)
End Sub
Function getPart(value As String, fromKey As String, toKey As String, Optional isLast As Boolean = False) As String
Dim pos1 As Long, pos2 As Long
pos1 = InStr(1, value, fromKey & ":")
If (isLast) Then
pos2 = Len(value) + 1 ' one past the end, so the last character is kept
Else
pos2 = InStr(pos1, value, toKey & ":")
End If
getPart = Trim$(Mid$(value, pos1, pos2 - pos1))
End Function
amount:2
price:253,18
price2:59,24 EU
status:WBB NAS MRR OWA PXA
min:1
opt:3
category: PNE
code z:195750

Several choices:
The pattern you show is that each split can be determined by a single word (no spaces) followed by a colon.
This can be easily replicated as a regular expression pattern, and implemented in VBA.
However, if your splitword might have a space, then you'll need a different solution:
VBA Regex Solution
'Set Reference to Microsoft VBScript Regular Expressions 5.5
Option Explicit
Function splitIt(S)
Dim RE As RegExp, MC As MatchCollection, M As Match
Dim vResult As Variant, I As Long
Const sPat As String = "\w+:.*?(?=(?:\w+:)|$)"
Set RE = New RegExp
With RE
.Global = True
.Pattern = sPat
If .Test(S) = True Then
Set MC = .Execute(S)
ReDim vResult(1 To MC.Count)
I = 0
For Each M In MC
I = I + 1
vResult(I) = M
Next M
Else
vResult = "split pattern not present"
End If
End With
splitIt = vResult
End Function
This function outputs a horizontal array of values. In versions of Excel with dynamic arrays, this will spill into the adjacent cells. In older versions, you may have to enter it as an array formula, use INDEX for each element, or rewrite this as a Sub that writes to specific cells.
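For the last option, a minimal sketch of a Sub version might look like the following. It uses late binding, so no reference needs to be set. The names SplitByKeys and WritePartsToRow, the source cell B2 and the target cell C2 are assumptions made for illustration; they are not part of the answer above. The same caveat applies as before: a key containing a space (such as "code z") is not handled by this pattern.

```vba
Option Explicit

' Hypothetical helper: returns the "key:value" parts of s as a 0-based array.
' Returns an empty array when the pattern is not found.
Function SplitByKeys(ByVal s As String) As Variant
    Dim re As Object, mc As Object, i As Long
    Dim result() As String

    Set re = CreateObject("VBScript.RegExp")   ' late binding
    re.Global = True
    re.Pattern = "\w+:.*?(?=\w+:|$)"           ' same idea as the pattern above
    Set mc = re.Execute(s)

    If mc.Count = 0 Then
        SplitByKeys = Array()
        Exit Function
    End If

    ReDim result(0 To mc.Count - 1)
    For i = 0 To mc.Count - 1
        result(i) = Trim$(mc(i).Value)         ' drop the trailing space of each part
    Next i
    SplitByKeys = result
End Function

' Writes the parts found in B2 into one row, starting at C2 (assumed cells).
Sub WritePartsToRow()
    Dim parts As Variant
    parts = SplitByKeys(CStr(Range("B2").Value))
    If UBound(parts) >= 0 Then
        Range("C2").Resize(1, UBound(parts) + 1).Value = parts
    End If
End Sub
```

A 1-D array assigned to a one-row range is written horizontally, which gives the result in a row rather than in a column.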
Split on word: regex explanation
\w+:.*?(?=(?:\w+:)|$)
\w+ - Match a single "word character", between one and unlimited times, as many times as possible, giving back as needed (greedy).
: - Match the colon character.
.*? - Match any single character that is not a line break character, between zero and unlimited times, as few times as possible, expanding as needed (lazy).
(?=(?:\w+:)|$) - Assert that one of the alternatives below can be matched starting at this position (positive lookahead):
(?:\w+:) - First alternative: one or more "word characters" (greedy) followed by a colon.
$ - Second alternative: the very end of the string.
Created with RegexBuddy

Split - Join - ReSplit
Instead of coding fixed categories, this late approach reads in any category from the base string before executing Split actions (only exception code z will be treated in an extra step):
1 define delimiters
2 tokenize base string (split action via blanks " ") and prefix a Pipe character "|" to the joined category elements
3 return results array via an eventual Pipe Split
Function getParts(ByVal s As String)
'Purpose: split into categories (identified by colon character ":")
'1. a) define delimiters
Const Blank$ = " ", Colon$ = ":", Pipe$ = "|", Xtra$ = "^"
' b) provide for category exception "code z" (the only two words category)
s = Replace(s, "code z", "code" & Xtra & "z")
'2. a) tokenize base string
Dim tokens: tokens = Split(s, Blank)
' b) prefix all ":" elements by Pipe char "|"
Dim i As Long
For i = 0 To UBound(tokens) '
tokens(i) = IIf(InStr(1, tokens(i), Colon), Pipe, Blank) & tokens(i)
Next
' c) restore mutilated "code z" category (back from "code^z")
s = Replace(Join(tokens, vbNullString), Xtra, Blank)
'3. get results array via Pipe split
getParts = Split(Mid$(s,2), Pipe) ' edited due to FaneDurus comment
End Function

I'd look into some regular expression, for example:
[a-z\d ]+:[ ,A-Z\d]+
See an online demo
[a-z\d ]+ - 1+ Lowercase alpha, space, or digit chars.
: - A literal colon.
[ ,A-Z\d]+ - 1+ Space, comma, uppercase alpha or digit.
VBA:
Sub Test()
Dim str As String: str = "amount:2 price:253,18 price2:59,24 EU status:WBB NAS MRR OWA PXA min:1 opt:3 category: PNE code z:195750"
Dim matches As Object
With CreateObject("vbscript.regexp")
.Global = True
.Pattern = "[a-z\d]+(?: [a-z\d]+)?:[ ,A-Z\d]+"
If .Test(str) = True Then
Set matches = .Execute(str)
For Each match In matches
Debug.Print Trim(match)
Next
End If
End With
End Sub

A version with a similar logic as Alex K.'s answer, so all the credit goes to him, using two arrays and the processing result being dropped on a row:
Sub extractFromString()
Dim arrStr, arrFin, strInit As String, i As Long, iStart As Long, iEnd As Long, k As Long
strInit = "amount:2 price:253,18 price2:59,24 EU status:WBB NAS MRR OWA PXA min:1 opt:3 category: PNE code z:195750"
arrStr = Split("amount:,price:,price2:,status:,min:,opt:,category:,code z:", ",")
ReDim arrFin(UBound(arrStr))
For i = 0 To UBound(arrStr)
iStart = InStr(strInit, arrStr(i))
If i <> UBound(arrStr) Then
iEnd = InStr(iStart, strInit, arrStr(i + 1))
Else
arrFin(k) = Mid(strInit, iStart): Exit For
End If
arrFin(k) = RTrim(Mid(strInit, iStart, iEnd - iStart)): k = k + 1
Next i
'use here the first cell of the row where the processing result to be returned
Range("A22").Resize(1, UBound(arrFin) + 1) = arrFin
End Sub

Another version of split/join/filter arrays:
Sub extractFromStr()
Dim arrStr, arrFin, strInit As String, i As Long, k As Long
Dim arr1, arr2, firstEl As String, secEl As String
strInit = "amount:2 price:253,18 price2:59,24 EU status:WBB NAS MRR OWA PXA min:1 opt:3 category: PNE code z:195750"
arrStr = Split(strInit, ":") 'split the string by ":" character
ReDim arrFin(UBound(arrStr)) 'ReDim the final array at the same number of elements
For i = 0 To UBound(arrStr) - 1 'iterate between the array elements (except the last)
arr1 = Split(arrStr(i), " ") 'split the i element by space (" ")
arr2 = Split(arrStr(i + 1), " ") 'split the i + 1 element by space (" ")
If i = 0 Then 'for the first array element:
firstEl = arrStr(i) 'it receives the (first) array element value
Else 'for the rest of array elements:
'extract firstEl (category) like first arr1 element, except the case of 'code z' which is extracted in a different way
firstEl = IIf(i = UBound(arrStr) - 1, arr1(UBound(arr1) - 1) & " " & arr1(UBound(arr1)), arr1(UBound(arr1)))
End If
'in order to remove array elements, the code transforms the one to be removed into "|||":
'it could be anything, but "|||" is difficult to suppose that it will be the text of a real element...
arr2(UBound(arr2)) = "|||": If i = UBound(arrStr) - 2 Then arr2(UBound(arr2) - 1) = "|||"
'extract the secEl (the value) by joining the array after removed firstEl:
secEl = IIf(i = UBound(arrStr) - 1, arrStr(UBound(arrStr)), Join(Filter(arr2, "|||", False), " "))
arrFin(k) = firstEl & ":" & secEl: k = k + 1 'create the processed element of the array to keep the result
Next i
'use here the first cell of the row where the processing result to be returned. Here, it returns on the first row:
Range("A1").Resize(1, UBound(arrFin) + 1) = arrFin
End Sub

Related

Remove alphanumeric chars in front of a defined char

I have a string in a cell composed of several shorter strings of various lengths with blank spaces and commas in between. In some cases only one or more blanks are in between.
I want to remove every blank space and comma and only leave behind 1 comma between each string element. The result must look like this:
The following doesn't work. I'm not getting an error but the strings are truncated at the wrong places. I don't understand why.
Sub String_adaption()
Dim i, j, k, m As Long
Dim STR_A As String
STR_A = "01234567890ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
i = 1
With Worksheets("table")
For m = 1 To Len(.Range("H" & i))
j = 1
Do While Mid(.Range("H" & i), m, 1) = "," And Mid(.Range("H" & i), m - 1, 1) <> Mid(STR_A, j, 1) And m <> Len(.Range("H" & i))
.Range("H" & i) = Mid(.Range("H" & i), 1, m - 2) & Mid(.Range("H" & i), m, Len(.Range("H" & i)))
j = j + 1
Loop
Next m
End With
End Sub
I'd use a regular expression to replace any combination of spaces and commas. Something along these lines:
Sub Test()
Dim str As String: str = "STRING_22 ,,,,,STRING_1 , , ,,,,,STRING_333 STRING_22 STRING_4444"
Debug.Print RegexReplace(str, "[\s,]+", ",")
End Sub
Function RegexReplace(x_in, pat, repl) As String
With CreateObject("vbscript.regexp")
.Global = True
.Pattern = pat
RegexReplace = .Replace(x_in, repl)
End With
End Function
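One thing to watch with this replace: if the cell text starts or ends with separators, the result keeps a leading or trailing comma. A small sketch that strips those in a second pass could look like this (CollapseSeparators is a made-up name for illustration, not part of the answer above):

```vba
Option Explicit

' Hypothetical variant: collapse runs of whitespace/commas into one comma,
' then remove a comma left over at the very start or end.
Function CollapseSeparators(ByVal s As String) As String
    With CreateObject("VBScript.RegExp")
        .Global = True
        .Pattern = "[\s,]+"        ' any mix of whitespace and commas
        s = .Replace(s, ",")
        .Pattern = "^,|,$"         ' leading or trailing comma
        s = .Replace(s, "")
    End With
    CollapseSeparators = s
End Function
```

It can be called from VBA or used as a UDF, e.g. =CollapseSeparators(H1).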
Just for the sake of alternatives:
Formula in B1:
=TEXTJOIN(",",,TEXTSPLIT(A1,{" ",","}))
The following function will split the input string into pieces (words), using a comma as separator. When the input string has multiple commas, it will result in empty words.
After splitting, the function loops over all words, trims them (removes leading and trailing blanks) and glues them together. Empty words are skipped.
I have implemented it as a Function, so you can use it as a UDF: if your input string is in B2, write =String_adaption(B2) as a formula into any cell.
Function String_adaption(s As String) As String
' Remove duplicate Commas and Leading and Trailing Blanks from words
Dim words() As String, i As Long
words = Split(s, ",")
For i = 0 To UBound(words)
Dim word As String
word = Trim(words(i))
If word <> "" Then
String_adaption = String_adaption & IIf(String_adaption = "", "", ",") & word
End If
Next i
End Function
P.S.: Almost sure that this could be done with some magic regular expressions, but I'm not an expert in that.
If you have a recent Excel version, you can use a simple worksheet formula to split the string on spaces and on commas, then put it back together using the comma delimiter and ignoring the blanks (I just noticed that @JvdV had previously posted the same formula solution):
=TEXTJOIN(",",TRUE,TEXTSPLIT(A1,{" ",","}))
In VBA, you can use a similar algorithm, using the ArrayList object to collect the non-blank results.
Option Explicit
Function commaOnly(s As String) As String
Dim v, w, x, y
Dim al As Object
Set al = CreateObject("System.Collections.ArrayList")
v = Split(s, " ")
For Each w In v
x = Split(w, ",")
For Each y In x
If y <> "" Then al.Add y
Next y
Next w
commaOnly = Join(al.toarray, ",")
End Function
This preserves the spaces within the smaller strings.
Option Explicit
Sub demo()
Const s = "STRING 22,,,, ,,STRING 1,,,, ,,STRING 333 , , , STRING_22 STRING_44"
Debug.Print Cleanup(s)
End Sub
Function Cleanup(s As String) As String
Const SEP = ","
Dim regex, m, sOut As String, i As Long, ar()
Set regex = CreateObject("vbscript.regexp")
With regex
.Global = True
.MultiLine = False
.IgnoreCase = True
.Pattern = "([^,]+)(?:[ ,]*)"
End With
If regex.Test(s) Then
Set m = regex.Execute(s)
ReDim ar(0 To m.Count - 1)
For i = 0 To UBound(ar)
ar(i) = Trim(m(i).submatches(0))
Next
End If
Cleanup = Join(ar, SEP)
End Function
Code categories approach
For the sake of completeness and to show also other ways "leading to Rome", I want to demonstrate an approach allowing to group the string input into five code categories in order to extract alphanumerics by a tricky match (see [B] Function getCats()):
To meet the requirements in OP use the following steps:
1) remove comma separated tokens if empty or only blanks (optional),
2) group characters into code categories,
3) check catCodes returning alpha nums including even accented or diacritic letters as well as characters like [ -,.+_]
Function AlphaNum(ByVal s As String, _
Optional IgnoreEmpty As Boolean = True, _
Optional info As Boolean = False) As String
'Site: https://stackoverflow.com/questions/15723672/how-to-remove-all-non-alphanumeric-characters-from-a-string-except-period-and-sp/74679416#74679416
'Auth.: https://stackoverflow.com/users/6460297/t-m
'Date: 2023-01-12
'1) remove comma separated tokens if empty or only blanks (s passed as byRef argument)
If IgnoreEmpty Then RemoveEmpty s ' << [A] RemoveEmpty
'2) group characters into code categories
Dim catCodes: catCodes = getCats(s, info) ' << [B] getCats()
'3) check catCodes and return alpha nums plus chars like [ -,.+_]
Dim i As Long, ii As Long
For i = 1 To UBound(catCodes)
' get current character
Dim curr As String: curr = Mid$(s, i, 1)
Dim okay As Boolean: okay = False
Select Case catCodes(i)
' AlphaNum: cat.4=digits, cat.5=alpha letters
Case Is >= 4: okay = True
' Category 2: allow only space, comma, minus
Case 2: If InStr(" -,", curr) <> 0 Then okay = True
' Category 3: allow only point, plus, underline
Case 3: If InStr(".+_", curr) <> 0 Then okay = True
End Select
If okay Then ii = ii + 1: catCodes(ii) = curr ' increment counter
Next i
ReDim Preserve catCodes(1 To ii)
AlphaNum = Join(catCodes, vbNullString)
End Function
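Note: Instead of If InStr(" -,", curr) <> 0 Then in Case 2 you may also code If curr Like "[ ,-]" Then (the hyphen has to come first or last inside the brackets, otherwise Like reads it as a character range). Similar in Case 3 :-)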
[A] Helper procedure RemoveEmpty
Optional clean-up removing comma separated tokens if empty or containing only blanks:
Sub RemoveEmpty(ByRef s As String)
'Purp: remove comma separated tokens if empty or only blanks
Const DEL = "$DEL$" ' temporary deletion marker
Dim i As Long
Dim tmp: tmp = Split(s, ",")
For i = LBound(tmp) To UBound(tmp)
tmp(i) = IIf(Len(Trim(tmp(i))) = 0, DEL, Trim(tmp(i)))
Next i
tmp = Filter(tmp, DEL, False) ' remove marked elements
s = Join(tmp, ",")
End Sub
[B] Helper function getCats()
A tricky way to group characters into five code categories, thus building the basic logic for any further analysis:
Function getCats(s, Optional info As Boolean = False)
'Purp.: group characters into five code categories
'Auth.: https://stackoverflow.com/users/6460297/t-m
'Site: https://stackoverflow.com/questions/15723672/how-to-remove-all-non-alphanumeric-characters-from-a-string-except-period-and-sp/74679416#74679416
'Note: Cat.: including:
' 1 ~~> apostrophe '
' 2 ~~> space, comma, minus etc
' 3 ~~> point separ., plus etc
' 4 ~~> digits 0..9
' 5 ~~> alpha (even including accented or diacritic letters!)
'~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
'a) get array of single characters
Const CATEG As String = "' - . 0 A" 'define group starters (case indep.)
Dim arr: arr = Char2Arr(s) ' << [C] Char2Arr()
Dim chars: chars = Split(CATEG)
'b) return codes per array element
getCats = Application.Match(arr, chars) 'No 3rd zero-argument!!
'c) display in immediate window (optionally)
If info Then Debug.Print Join(arr, "|") & vbNewLine & Join(getCats, "|")
End Function
[C] Helper function Char2Arr
Assigns every string character to an array:
Function Char2Arr(ByVal s As String)
'Purp.: assign single characters to array
s = StrConv(s, vbUnicode)
Char2Arr = Split(s, vbNullChar, Len(s) \ 2)
End Function

Remove special characters from range in VBA

I have written VBA code to remove all special characters from the cells of a column. As an example, every cell in the column holds an alphanumeric value with some special characters:
Suppose in a cell I have a value: abc#123!-245
After executing my code I got output abc 123 245
Here my code is working fine to remove all the special characters. My code is given below:
Sub ReplaceSpecial()
Dim cel As Range
Dim strVal As String
Dim i As Long
Application.ScreenUpdating = False
For Each cel In Selection
strVal = cel.Value
For i = 1 To Len(strVal)
Select Case Asc(Mid(strVal, i, 1))
Case 32, 48 To 57, 65 To 90, 97 To 122
' Leave ordinary characters alone
Case Else
Mid(strVal, i, 1) = " "
End Select
Next i
cel.Value = strVal
Next cel
Application.ScreenUpdating = True
End Sub
Now I want to remove the spaces from the output as well, so that it looks like abc123245. How can I do that in VBA?
Input: abc#123!-245
Current Output: abc 123 245
Required Output: abc123245
You could construct a new string with just the permitted characters.
Sub ReplaceSpecial()
Dim cel As Range
Dim strVal As String, temp As String
Dim i As Long
Application.ScreenUpdating = False
For Each cel In Selection
strVal = cel.Value
temp = vbNullString
For i = 1 To Len(strVal)
Select Case Asc(Mid(strVal, i, 1))
Case 32, 48 To 57, 65 To 90, 97 To 122
temp = temp & Mid(strVal, i, 1)
End Select
Next i
cel.Value = temp
Next cel
Application.ScreenUpdating = True
End Sub
My sole intention for this late post was to
1) test some features of the Application.Match() function (comparing a string input against valid characters), and
2) demonstrate a nice way to "split" a string into single characters as an alternative and possibly instructive solution (see help function String2Arr()).
I don't intend, however, to show better or faster code here.
Application.Match() allows not only to execute 1 character searches in an array, but to compare even two arrays in one go,
i.e. a character array (based on an atomized string input) against an array of valid characters (blanks, all digits and chars from A to Z).
As Application.Match is case insensitive, it suffices to take e.g. lower case characters.
All findings of input chars return their position in the valid characters array (otherwise resulting in Error 2042).
Furthermore it was necessary to exclude the wild cards "*" and "?", which would have been considered as findings otherwise.
Function ValidChars(ByVal s, Optional JoinResult As Boolean = True)
'Purp: return only valid characters if space,digits,"A-Z" or "a-z"
'compare all string characters against valid characters
Dim tmp: tmp = foundCharAt(s) ' get array with found positions in chars
'overwrite tmp array
Dim i As Long, ii As Long
For i = 1 To UBound(tmp)
If IsNumeric(tmp(i)) Then ' found in valid positions
If Not Mid(s, i, 1) Like "[?*]" Then ' exclude wild cards
ii = ii + 1
tmp(ii) = Mid(s, i, 1) ' get char from original string
End If
End If
Next
ReDim Preserve tmp(1 To ii) ' reduce to new size
'join tmp elements to resulting string (if argument JoinResult = True)
ValidChars = IIf(JoinResult, Join(tmp, ""), tmp)
End Function
Help function foundCharAt()
Returns an array of found character positions in the valid chars array:
Function foundCharAt(ByVal s As String) As Variant
'Purp: return array of found character positions in chars string
'Note: (non-findings show Error 2042; can be identified by IsError + Not IsNumeric)
Dim chars: chars = String2Arr(" 0123456789abcdefghijklmnopqrstuvwxyz")
foundCharAt = Application.Match(String2Arr(s), chars, 0)
End Function
Help function String2Arr()
Assigns an array of single characters after atomizing a string input:
Function String2Arr(ByVal s As String) As Variant
'Purp: return array of all single characters in a string
'Idea: https://stackoverflow.com/questions/13195583/split-string-into-array-of-characters
s = StrConv(s, vbUnicode)
String2Arr = Split(s, vbNullChar, Len(s) \ 2)
End Function
Use a regular expression's object and replace all unwanted characters by using a negated character class. For demonstration purposes:
Sub Test()
Dim str As String: str = "abc#123!-245"
With CreateObject("vbscript.regexp")
.Global = True
.Pattern = "[^0-9A-Za-z ]"
str = .Replace(str, "")
End With
Debug.Print str
End Sub
The pattern [^0-9A-Za-z ] is a negated character class and captures everything that is not an alphanumeric or a space character. You'll find a more in-depth explanation in this online demo.
At the time of writing I'm unsure whether you want to keep the space characters or not. If not, just remove the space from the pattern.
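To make the two variants of the pattern concrete, here is a minimal sketch wrapped as a function. The name StripSpecial and the optional keepSpaces switch are assumptions for illustration only:

```vba
Option Explicit

' Hypothetical wrapper: removes everything except 0-9, A-Z, a-z
' (and, optionally, the space character).
Function StripSpecial(ByVal s As String, _
                      Optional ByVal keepSpaces As Boolean = False) As String
    With CreateObject("VBScript.RegExp")
        .Global = True
        If keepSpaces Then
            .Pattern = "[^0-9A-Za-z ]"   ' space stays in the allowed set
        Else
            .Pattern = "[^0-9A-Za-z]"    ' space is removed as well
        End If
        StripSpecial = .Replace(s, "")
    End With
End Function
```

In the asker's loop this would replace the inner For loop, e.g. cel.Value = StripSpecial(cel.Value).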
Thought I'd chuck in another alternative using the Like() operator:
For i = Len(str) To 1 Step -1
If Mid(str, i, 1) Like "[!0-9A-Za-z ]" Then
str= Application.Replace(str, i, 1, "")
End If
Next
Or with a 2nd string-type variable (as per @BigBen's answer):
For i = 1 to Len(str)
If Mid(str, i, 1) Like "[0-9A-Za-z ]" Then
temp = temp & Mid(str, i, 1)
End If
Next
If you want to build on your current effort, replace:
cel.Value = strVal
with:
cel.Value = Replace(strVal, " ", "")
Consider:
Sub ReplaceSpecial()
Dim cel As Range
Dim strVal As String
Dim i As Long
Application.ScreenUpdating = False
For Each cel In Selection
strVal = cel.Value
For i = 1 To Len(strVal)
Select Case Asc(Mid(strVal, i, 1))
Case 32, 48 To 57, 65 To 90, 97 To 122
' Leave ordinary characters alone
Case Else
Mid(strVal, i, 1) = " "
End Select
Next i
cel.Value = Replace(strVal, " ", "")
Next cel
Application.ScreenUpdating = True
End Sub

VBA SPLIT by different delimiter

I have a rather tricky problem. I am trying to split a string into its different parts and assign each part to a variable for further use. The parts are separated by different delimiters.
Say I want to split a standard screw code, DIN912M6x10A2, into its different parts, since each part of that code means something specific.
ScrewHead is everything left of the first "M", without the delimiter = DIN912
ScrewThickness is from "M" (included) up to "x" (excluded) = M6
ScrewLenght is from "x" (excluded) up to "A" (excluded) = 10
ScrewMaterial is from "A" (included) up to the " " or, if there's no " ", up to the end of the string = A2
What I have so far codewise is (I am working in 5th column):
Dim ScrewHead As Long
ScrewHead = Split(Cells(i, 5), "M"-1)
Dim ScrewDiameter As Long
ScrewDiameter =Split(i,5),"M", "x"-1)
Dim ScrewLenght As Long
ScrewLenght =Split(i,5),"x"-1, "A"-1)
Dim ScrewMaterial As Long
ScrewMaterial =Split(i,5),"A", " ")
Could someone help me with figuring this one out?
To be honest, this sounds like a nice job for a regular expression, since you can capture each part in its own group. For example:
^(.+?)(M\d+)x(\d+)(.+?)(?:\s.*)?$
See the online demo
^ - Start line anchor.
(.+?) - A 1st capture group holding 1+ (lazy) characters, up to the next part.
(M\d+) - A 2nd capture group with a literal "M" followed by 1+ (greedy) digits.
x - A literal "x".
(\d+) - A 3rd capture group holding 1+ (greedy) digits.
(.+?) - A 4th capture group holding 1+ (lazy) characters, up to the next part.
(?:\s.*)? - An optional non-capture group of a whitespace character followed by 0+ (greedy) characters.
$ - End line anchor.
Here is a quick code to run to retrieve these groups:
Sub Test()
Dim str As String: str = "DIN912M6x10A2 test"
With CreateObject("vbscript.regexp")
.Global = True
.Pattern = "^(.+?)(M\d+)x(\d+)(.+?)(?:\s.*)?$"
If .Test(str) = True Then
For Each Match In .Execute(str)(0).Submatches
Debug.Print Match
Next
End If
End With
End Sub
A more extensive code-example for a better understanding:
Sub Test()
Dim str As String: str = "DIN912M6x10A2 test"
Dim ScrewHead As String, ScrewDiameter As String, ScrewLenght As Long, ScrewMaterial As String
Dim matches
With CreateObject("vbscript.regexp")
.Global = True
.Pattern = "^(.+?)(M\d+)x(\d+)(.+?)(?:\s.*)?$"
If .Test(str) = True Then
Set matches = .Execute(str)
ScrewHead = matches(0).Submatches(0)
ScrewDiameter = matches(0).Submatches(1)
ScrewLenght = matches(0).Submatches(2)
ScrewMaterial = matches(0).Submatches(3)
End If
End With
End Sub
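Since the question works on the 5th column of a sheet, the same pattern could be wrapped in a function and applied row by row. This is only a sketch: ParseScrewCode and SplitScrewCodes are made-up names, and writing the four parts to columns 6 to 9 of the active sheet is an assumption.

```vba
Option Explicit

' Hypothetical helper: returns Array(head, diameter, length, material),
' or Empty when the code does not match the pattern.
Function ParseScrewCode(ByVal code As String) As Variant
    Dim m As Object
    With CreateObject("VBScript.RegExp")
        .Pattern = "^(.+?)(M\d+)x(\d+)(.+?)(?:\s.*)?$"
        If Not .Test(code) Then Exit Function      ' result stays Empty
        Set m = .Execute(code)(0)
    End With
    ParseScrewCode = Array(m.SubMatches(0), m.SubMatches(1), _
                           m.SubMatches(2), m.SubMatches(3))
End Function

' Reads codes from column 5 and writes the parts to columns 6..9 (assumed layout).
Sub SplitScrewCodes()
    Dim i As Long, lastRow As Long, parts As Variant
    lastRow = Cells(Rows.Count, 5).End(xlUp).Row
    For i = 1 To lastRow
        parts = ParseScrewCode(CStr(Cells(i, 5).Value))
        If Not IsEmpty(parts) Then Cells(i, 6).Resize(1, 4).Value = parts
    Next i
End Sub
```

Rows that do not match are simply left untouched, which makes them easy to spot afterwards.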
Here's a plain VBA based sledgehammer approach. You can adapt the code to suit your requirements.
Public Sub GetDiffPartsofString()
Dim strInput As String, strScrewHead As String, strScrewThck As String, strScrewLeng As String, strScrewMatl As String
Dim i As Long, j As Long
strInput = "DIN912M6x10A2"
j = 1
For i = 1 To Len(strInput)
Select Case Mid(strInput, i, 1)
Case "M"
strScrewHead = Mid(strInput, j, i - 1)
j = i
Case "x"
strScrewThck = Mid(strInput, j, i - j)
j = i
Case "A"
strScrewLeng = Mid(strInput, j + 1, i - j - 1)
strScrewMatl = Mid(strInput, i, Len(strInput))
End Select
Next i
Debug.Print strScrewHead, strScrewThck, strScrewLeng, strScrewMatl
End Sub
Tricky split approach via Val()
Another way leading to Rome:
Sub AnalyzeID()
Dim s As String: s = "DIN912M6x10A2 test"
Dim parts: parts = Split(Replace(Replace(s, "M", "x"), " ", "x"), "x")
'adjust split elements
parts(1) = "M" & parts(1)
parts(3) = Split(parts(2), Val(parts(2)))(1) ' (don't change code line order!)
parts(2) = Val(parts(2))
Debug.Print Join(parts, "|") ' ~~> DIN912|M6|10|A2
End Sub
Output in VB Editor's immediate window
DIN912|M6|10|A2
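The trick rests on how Val() reads a string: it takes the leading numeric part and stops at the first character it cannot interpret as part of a number. A short sketch of just that step (ValDemo is a made-up name, and the "10E2" line is an added caveat, not something from the answer):

```vba
Option Explicit

Sub ValDemo()
    ' Val keeps only the leading number of "10A2"
    Debug.Print Val("10A2")                            ' 10

    ' Splitting "10A2" on that number leaves the material code as 2nd element
    Debug.Print Split("10A2", CStr(Val("10A2")))(1)    ' A2

    ' Caveat: "E" is read as an exponent marker, so this is not 10
    Debug.Print Val("10E2")                            ' 1000
End Sub
```

So the approach fits material codes like A2, but a code starting with a letter that Val treats as part of a number would need a different split.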

Generating regular expression in Excel for strings

I have a huge list of strings for which I am trying to generate regular expressions in an automated way. The strings are pretty simple and I would like to generate the regular expressions using a formula or VBA code. The strings use the following legend:
& - Any UPPERCASE character (A-Z)
# - Any digit (0-9)
_ - Space (\s)
- - Dash
For example, the regular expression generated for the following strings:
Policy Number | Policy Digits | Regular Expression
####&&###### | 12 | ^\d{4}[A-Z]{2}\d{6}$
####&_###### | 11 | ^\d{4}[A-Z]{1}\s\d{6}$
ACPBP&&########## | 17 | ^[ACPBP]{5}[A-Z]{2}\d{10}$
ACPBA&########## or ACPBA&&########## | 16 or 17 | ^[ACPBA]{5}[A-Z]{1,2}\d{10}$
########## | 10 | ^\d{10}$
09############ | 14 | ^[09]{2}\d{12}$
A&&######, A&&#######, or A&&######## | 9, 10 or 11 | ^[A]{1}[A-Z]{2}\d{6,8}$
&&&####, &&&#####, or &&&###### | 7, 8, or 9 | ^[A-Z]{3}\d{4,6}$
09-##########-## | 14 | ^[09]{2}-\d{10}-\d{2}$
Is there some existing code that is available to generate regular expressions for a huge list of strings? What are some of the hints or tips that I can use to build a regular expression string? Thanks in advance.
There is no existing code, but try this:
Option Explicit
Option Compare Text 'to handle upper and lower case "or"
'Set reference to Microsoft Scripting Runtime
' or use Late Binding if distributing this
Function createRePattern(sPolicyNum As String) As String
Dim dCode As Dictionary, dReg As Dictionary
Dim I As Long, sReg As String, s As String
Dim v, sPN
v = Replace(sPolicyNum, "or", ",")
v = Split(v, ",")
Set dCode = New Dictionary
dCode.Add Key:="&", Item:="[A-Z]"
dCode.Add Key:="#", Item:="\d"
dCode.Add Key:="_", Item:="\s"
For Each sPN In v
sPN = Trim(sPN)
If Not sPN = "" Then
Set dReg = New Dictionary
For I = 1 To Len(sPN)
s = Mid(sPN, I, 1)
If Not dCode.Exists(s) Then dCode.Add s, s
If dReg.Exists(s) Then
dReg(s) = dReg(s) + 1
Else
If dReg.Count = 1 Then
dReg.Add s, 1
s = Mid(sPN, I - 1, 1)
sReg = sReg & dCode(s) & IIf(dReg(s) > 1, "{" & dReg(s) & "}", "")
dReg.Remove s
Else
dReg.Add s, 1
End If
End If
Next I
'Last Entry in Regex
s = Right(sPN, 1)
sReg = sReg & dCode(s) & IIf(dReg(s) > 1, "{" & dReg(s) & "}", "") & "|"
End If
Next sPN
s = Left(sReg, Len(sReg) - 1)
'Non-capturing group added if alternation present
If InStr(s, "|") = 0 Then
sReg = "^" & s & "$"
Else
sReg = "^(?:" & Left(sReg, Len(sReg) - 1) & ")$"
End If
createRePattern = sReg
End Function
Note
As written, there are limitations: the policy number cannot contain any of the following as literal text, because they are treated as codes or separators:
#, &, _, the comma, and the word "or"
Generate regex patterns without dictionary
In addition to Ron's valid solution an alternative using no dictionary:
Option Explicit ' declaration head of code module
Function generateRePattern(ByVal s As String) As String
'[0]definitions & declarations
Const Pipe As String = "|"
Dim curSymbol$: curSymbol = "" ' current symbol (start value)
Dim lngth As Long: lngth = Len(s) ' current string length
Dim ii As Long: ii = 0 ' group index (start value)
Dim n As Long ' repetition counter
ReDim tmp(1 To lngth) ' provide for sufficient temp items
'[1](optional) Pipe replacement for "or" and commas
s = Replace(Replace(Replace(s, " or ", Pipe), " ", ""), ",", Pipe)
'[2]analyze string item s
Dim pos As Long ' current character position
For pos = 1 To lngth ' check each character
Dim curChar As String
curChar = Mid(s, pos, 1) ' define current character
If curChar <> curSymbol Then ' start new group
'a) change repetition counter in old group pattern
If ii > 0 Then tmp(ii) = Replace(tmp(ii), "n", n)
'b) increment group counter & get pattern via help function
ii = ii + 1: tmp(ii) = getPattern(curChar) ' << getPattern
'c) start new repetition counter & group symbol
n = 1: curSymbol = curChar
Else
n = n + 1 ' increment current repetition counter
End If
Next pos
'd) change last repetition counter
tmp(ii) = Replace(tmp(ii), "n", n)
ReDim Preserve tmp(1 To ii) '
'[3]return function result
generateRePattern = "^(?:" & Replace(Join(tmp, ""), "{1}", "") & ")$"
End Function
Help function getPattern()
Function getPattern(curChar) As String
'Purpose: return general pattern based on current character
'a) definitions
Const Pipe As String = "|"
Dim symbols: symbols = Split("&|#|_", Pipe)
Dim patterns: patterns = Split("[A-Z]{n}|\d{n}|\s", Pipe)
'b) match character position within symbols
Dim pos: pos = Application.Match(curChar, symbols, 0)
'c) return pattern
If IsError(pos) Then
getPattern = curChar
Else
getPattern = patterns(pos - 1)
End If
End Function

Parsing and comparing a complicated string

I am hoping someone could help me out with a VBA Excel macro.
I have received a worksheet in Excel 2007 which contains product names in one column, and I need to sort this into a logical format so I can use it. However, the list itself is not in any kind of logical order, is 10 000 rows long and I am going to have to do this every month!!
Basically, what I would like to do is search for certain keywords which are common to most of the entries and move them into separate cells in different columns (but in the same row as the original entry).
Regarding keywords: There are 3 different types, two of which I have a complete list of.
Example of keywords: some are measures such as cm (centimetre), mm (millimetre), m (metre) etc.). Then there are other keywords such as % and finally a last set of keywords which is wood, plastic, glass etc.
If this was not complicated enough, the measures (cm for example) are duplicated in some instances and are important details, so I can't just separate them but would ideally like them in two adjacent cells.
Fortunately, there is a space after each measure, % sign and item material.
Working from right to left is the easiest way I can think of achieving this as the first description in the string varies wildly between entries and that can stay as is.
So, below is an example string; let's say this is in cell A1. (The quotation marks are not part of the string, and the word "by" appears in only about 100 cases. Usually it is missing...)
"Chair Leg Wood 100% 1m by 20cm"
I would ideally like for the string to be split up into cells as follows
Cell B1 - Chair Leg
Cell C1 - Wood
Cell D1 - 1m
Cell E1 - 20cm
Cell F1 - 100%
Having the % measures in the same column would be extremely helpful
Can anyone please help me with this or the beginnings of a macro which does this and then moves down the list - I have tried using some basic "find" and "len" formulas but really am at my wits end on how to deal with this!
The task boils down to writing a robust definition of the structure of the input data.
From the information provided, a candidate definition might be:
<Description, one or more words> <Material, one word> <A value followed by %> <Dimension A> <optional "by"> <Dimension B>
The following macro will process data that conforms to this spec. The definition may need expanding, e.g. to allow two-word materials (such as Mild Steel).
You will need to add error handling in case any rows don't conform, e.g. no % in the string, or a % character elsewhere in the string.
Note that the macro writes its results to columns A to E, so the original text in column A is replaced by the description.
Option Explicit
Dim dat As Variant
Sub ProcessData()
    Dim r As Range
    Dim i As Long
    ' column A of the used range, widened to five columns for the output
    Set r = Intersect(ActiveSheet.UsedRange, ActiveSheet.Columns(1)).Resize(, 5)
    dat = r.Value
    For i = 1 To UBound(dat, 1)
        ParseRow i, CStr(dat(i, 1))
    Next
    r.Value = dat
    ActiveSheet.Columns(5).Style = "Percent"
End Sub
Sub ParseRow(ByVal rw As Long, ByVal s As String)
    'Chair Leg Wood 100% 1m by 20cm
    Dim i As Long
    Dim sDim As String, sPCnt As String, sMat As String, sDesc As String
    Dim sA As String, sB As String
    i = InStr(s, "% ")
    sDim = Trim(Replace(Mid(s, i + 2), " by ", " ")) ' text to right of %, remove "by"
    sA = Trim(Left(sDim, InStr(sDim, " "))) ' split dimension string in two
    sB = Trim(Mid(sDim, InStr(sDim, " ")))
    s = Left(s, i)
    i = InStrRev(s, " ")
    sPCnt = Mid(s, i + 1) ' text back to the last space before %
    s = Trim(Left(s, i))
    i = InStrRev(s, " ") ' last word in string
    sMat = Mid(s, i + 1)
    sDesc = Trim(Left(s, i)) ' what's left
    dat(rw, 1) = sDesc
    dat(rw, 2) = sMat
    dat(rw, 3) = sA
    dat(rw, 4) = sB
    dat(rw, 5) = sPCnt
End Sub
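The error handling mentioned above could start with a check that a row matches the candidate definition before it is parsed. The following is only a sketch: `ConformsToSpec` is a hypothetical helper name, and the rules it applies (exactly one %, at least three words up to and including the percentage, exactly two dimension tokens after it) are assumptions read off the definition, not something the data has been tested against.

```vba
' Hypothetical helper: True only when s looks like
' <Description> <Material> <value%> <Dimension A> [by] <Dimension B>
Function ConformsToSpec(ByVal s As String) As Boolean
    Dim pctPos As Long
    Dim leftWords() As String, rightWords() As String

    ' exactly one % character in the whole string
    If Len(s) - Len(Replace(s, "%", "")) <> 1 Then Exit Function

    ' the % must be followed by a space (something has to come after it)
    pctPos = InStr(s, "% ")
    If pctPos = 0 Then Exit Function

    ' up to and including the %: description word(s), material, percentage
    leftWords = Split(Trim$(Left$(s, pctPos)), " ")
    If UBound(leftWords) < 2 Then Exit Function

    ' after the %: exactly two tokens once the optional "by" is removed
    rightWords = Split(Trim$(Replace(Mid$(s, pctPos + 2), " by ", " ")), " ")
    If UBound(rightWords) <> 1 Then Exit Function

    ConformsToSpec = True
End Function
```

In ProcessData the call could then become `If ConformsToSpec(CStr(dat(i, 1))) Then ParseRow i, CStr(dat(i, 1))`, which leaves non-conforming rows as they were so they can be reviewed by hand.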
First, I'd use the Split function to separate the parts into an array; this will avoid most of the string functions and string math:
Dim parts As Variant
parts = Split(Range("A1").Value)
Then, I'd do my comparisons to each part.
Finally, I'd concatenate the parts I didn't break out, and place all parts on the sheet.
This is based on your example, which has spaces in between every part. Something similar could work otherwise; you just have to do more work with each part.
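As a rough illustration of the split, compare and concatenate idea, the sketch below keeps only the words that are not recognised as a percentage, a measure or the word "by", and joins them back together. `LeftoverWords` is a hypothetical name, and the three measure patterns are assumptions taken from the units named in the question.

```vba
' Hypothetical sketch: return the words that were not broken out,
' joined back into one string
Function LeftoverWords(ByVal s As String) As String
    Dim parts() As String
    Dim kept() As String
    Dim i As Long, n As Long

    parts = Split(s, " ")
    If UBound(parts) < 0 Then Exit Function      ' empty input

    ReDim kept(0 To UBound(parts))
    For i = 0 To UBound(parts)
        Select Case True
            Case parts(i) Like "*%"
                ' percentage: broken out, not kept
            Case parts(i) Like "*#mm", parts(i) Like "*#cm", parts(i) Like "*#m"
                ' measure: broken out, not kept
            Case parts(i) = "by"
                ' filler word between two measures
            Case Else
                kept(n) = parts(i)
                n = n + 1
        End Select
    Next i

    If n = 0 Then Exit Function
    ReDim Preserve kept(0 To n - 1)
    LeftoverWords = Join(kept, " ")
End Function
```

For the example string this leaves "Chair Leg Wood"; the material would still have to be separated from the description by comparing against the known list of materials.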
Here's my stab at it. We could use about 10 more examples, but this should be a start. To use, select a one-column range with your descriptions and run SplitProducts. It will split each description into the cells to its right.
Sub SplitProducts()
    Dim rCell As Range
    Dim vaSplit As Variant
    Dim i As Long
    Dim aOutput() As Variant
    Dim lCnt As Long
    Const lCOLDESC As Long = 1
    Const lCOLMAT As Long = 2
    Const lCOLPCT As Long = 3
    Const lCOLREM As Long = 4
    If TypeName(Selection) = "Range" Then
        If Selection.Columns.Count = 1 Then
            For Each rCell In Selection.Cells
                'split into words
                vaSplit = Split(rCell.Value, Space(1))
                ReDim aOutput(1 To 1, 1 To 1)
                'loop through the words
                For i = LBound(vaSplit) To UBound(vaSplit)
                    Select Case True
                        Case IsPercent(vaSplit(i))
                            'percents always go in the same column
                            lCnt = lCOLPCT
                            If UBound(aOutput, 2) < lCnt Then
                                ReDim Preserve aOutput(1 To 1, 1 To lCnt)
                            End If
                            aOutput(1, lCnt) = vaSplit(i)
                        Case IsInList(vaSplit(i))
                            'list items always go in the same column
                            lCnt = lCOLMAT
                            'only grow the array; shrinking it would drop values already stored
                            If UBound(aOutput, 2) < lCnt Then
                                ReDim Preserve aOutput(1 To 1, 1 To lCnt)
                            End If
                            aOutput(1, lCnt) = vaSplit(i)
                        Case IsMeasure(vaSplit(i))
                            'measurements go in the last column(s)
                            If UBound(aOutput, 2) < lCOLREM Then
                                lCnt = lCOLREM
                            Else
                                lCnt = UBound(aOutput, 2) + 1
                            End If
                            ReDim Preserve aOutput(1 To 1, 1 To lCnt)
                            aOutput(1, lCnt) = vaSplit(i)
                        Case Else
                            'everything else gets concatenated in the desc column
                            aOutput(1, lCOLDESC) = aOutput(1, lCOLDESC) & " " & vaSplit(i)
                    End Select
                Next i
                'remove any extraneous spaces
                aOutput(1, lCOLDESC) = Trim(aOutput(1, lCOLDESC))
                'write the values to the right of the input range
                rCell.Offset(0, 1).Resize(1, UBound(aOutput, 2)).Value = aOutput
            Next rCell
        Else
            MsgBox "Select a one column range"
        End If
    End If
End Sub
Function IsPercent(ByVal sInput As String) As Boolean
    IsPercent = Right$(sInput, 1) = "%"
End Function
Function IsInList(ByVal sInput As String) As Boolean
    Dim vaList As Variant
    Dim i As Long
    'add list items as needed
    vaList = Array("Wood", "Glass", "Plastic")
    'compare whole words, ignoring case (Filter would also match substrings such as "a")
    For i = LBound(vaList) To UBound(vaList)
        If StrComp(sInput, vaList(i), vbTextCompare) = 0 Then
            IsInList = True
            Exit For
        End If
    Next i
End Function
Function IsMeasure(ByVal sInput As String) As Boolean
    Dim vaMeas As Variant
    Dim i As Long
    'add measurements as needed
    vaMeas = Array("mm", "cm", "m")
    For i = LBound(vaMeas) To UBound(vaMeas)
        'any number of characters that end in a number and a measurement
        If sInput Like "*#" & vaMeas(i) Then
            IsMeasure = True
            Exit For
        End If
    Next i
End Function
No guarantees that this will be speedy on 10k rows.

Resources