Split string by character and keep before and after string - excel

I have a string containing numbers. These numbers are coordinates for scientific modeling.
I need to split this string at the "-" character into two partial strings, "before" and "after".
My code works only for a fixed number of digits (characters).
Dim str As String
Dim before As String
Dim after As String
str = "3-525"
before= Left(str, InStr(str, "-") - 1) ' =3
after= Right(str, InStr(str, "-") + 1) ' =525
If the input is str = "3-525", the output is before = 3 and after = 525.
But for str = "15-50" the output is before = 15 and after = 5-50, and it is annoying to retype it again and again.
I need a dynamic solution to split these coordinates at the "-" character.

Use Split:
Sub Test()
Dim str As String
str = "3-525"
Dim x
x = Split(str, "-")
Debug.Print x(0) '<--- this is "before", or 3
Debug.Print x(1) '<--- this is "after", or 525
End Sub

Of course, applying Split() is the most obvious way to execute a split operation.
Nevertheless, your original logic of counting from an InStr() result isn't wrong per se, and you need not completely reject the way you form the after variable.
(1) The Right() function takes a length as its second argument, not a position. So you can calculate the remaining length to the right as the difference between the total length and the position of the "-" character, and modify your code to
after = Right(s, Len(s) - InStr(s, "-"))
(btw I'd prefer s to your str variable as I don't want to mix it up with VBA's Str() function).
(2) Alternatively you could use the Mid() function and code as follows:
after = Mid(s, InStr(s, "-") + 1)
Here it suffices to pass a starting position as the second argument (without needing to specify a length as well).
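The two corrections can be wrapped in one small helper. This is only a minimal sketch: the names SplitAtDash and DemoSplitAtDash are hypothetical and do not come from the posts above, and treating a string without "-" as all "before" is an assumption.

```vba
Option Explicit

' Hypothetical helper: splits s at the first "-" into a before and an after part.
Sub SplitAtDash(ByVal s As String, ByRef before As String, ByRef after As String)
    Dim pos As Long
    pos = InStr(s, "-")
    If pos = 0 Then
        ' Assumption: without a separator everything counts as "before"
        before = s
        after = vbNullString
    Else
        before = Left$(s, pos - 1)
        after = Mid$(s, pos + 1)      ' equivalent to Right$(s, Len(s) - pos)
    End If
End Sub

Sub DemoSplitAtDash()
    Dim b As String, a As String
    SplitAtDash "15-50", b, a
    Debug.Print b; "|"; a             ' prints 15|50
    SplitAtDash "3-525", b, a
    Debug.Print b; "|"; a             ' prints 3|525
End Sub
```

Because the position of "-" is looked up at run time, the number of digits on either side no longer matters.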

Related

Remove alphanumeric chars in front of a defined char

I have a string in a cell composed of several shorter strings of various lengths with blank spaces and commas in between. In some cases only one or more blanks are in between.
I want to remove every blank space and comma and only leave behind 1 comma between each string element. The result must look like this:
The following doesn't work. I'm not getting an error but the strings are truncated at the wrong places. I don't understand why.
Sub String_adaption()
Dim i, j, k, m As Long
Dim STR_A As String
STR_A = "01234567890ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
i = 1
With Worksheets("table")
For m = 1 To Len(.Range("H" & i))
j = 1
Do While Mid(.Range("H" & i), m, 1) = "," And Mid(.Range("H" & i), m - 1, 1) <> Mid(STR_A, j, 1) And m <> Len(.Range("H" & i))
.Range("H" & i) = Mid(.Range("H" & i), 1, m - 2) & Mid(.Range("H" & i), m, Len(.Range("H" & i)))
j = j + 1
Loop
Next m
End With
End Sub
I'd use a regular expression to replace any combination of spaces and commas. Something along these lines:
Sub Test()
Dim str As String: str = "STRING_22 ,,,,,STRING_1 , , ,,,,,STRING_333 STRING_22 STRING_4444"
Debug.Print RegexReplace(str, "[\s,]+", ",")
End Sub
Function RegexReplace(x_in, pat, repl) As String
With CreateObject("vbscript.regexp")
.Global = True
.Pattern = pat
RegexReplace = .Replace(x_in, repl)
End With
End Function
Just for the sake of alternatives:
Formula in B1:
=TEXTJOIN(",",,TEXTSPLIT(A1,{" ",","}))
The following function splits the input string into pieces (words), using a comma as separator. When the input string contains consecutive commas, the split produces empty words.
After splitting, the function loops over all words, trims them (removes leading and trailing blanks) and glues them together. Empty words are skipped.
I have implemented it as a Function, so you can use it as a UDF: if your input string is in B2, write =String_adaption(B2) as a formula in any cell.
Function String_adaption(s As String) As String
' Remove duplicate Commas and Leading and Trailing Blanks from words
Dim words() As String, i As Long
words = Split(s, ",")
For i = 0 To UBound(words)
Dim word As String
word = Trim(words(i))
If word <> "" Then
String_adaption = String_adaption & IIf(String_adaption = "", "", ",") & word
End If
Next i
End Function
P.S.: Almost sure that this could be done with some magic regular expressions, but I'm not an expert in that.
If you have a recent version of Excel, you can use a simple worksheet formula to split the string on spaces and commas, then put it back together using the comma delimiter and ignoring the blanks (and I just noted @JvdV had previously posted the same formula solution):
=TEXTJOIN(",",TRUE,TEXTSPLIT(A1,{" ",","}))
In VBA, you can use a similar algorithm, using the ArrayList object to collect the non-blank results.
Option Explicit
Function commaOnly(s As String) As String
Dim v, w, x, y
Dim al As Object
Set al = CreateObject("System.Collections.ArrayList")
v = Split(s, " ")
For Each w In v
x = Split(w, ",")
For Each y In x
If y <> "" Then al.Add y
Next y
Next w
commaOnly = Join(al.toarray, ",")
End Function
This preserves the spaces within the smaller strings.
Option Explicit
Sub demo()
Const s = "STRING 22,,,, ,,STRING 1,,,, ,,STRING 333 , , , STRING_22 STRING_44"
Debug.Print Cleanup(s)
End Sub
Function Cleanup(s As String) As String
Const SEP = ","
Dim regex, m, sOut As String, i As Long, ar()
Set regex = CreateObject("vbscript.regexp")
With regex
.Global = True
.MultiLine = False
.IgnoreCase = True
.Pattern = "([^,]+)(?:[ ,]*)"
End With
If regex.Test(s) Then
Set m = regex.Execute(s)
ReDim ar(0 To m.Count - 1)
For i = 0 To UBound(ar)
ar(i) = Trim(m(i).submatches(0))
Next
End If
Cleanup = Join(ar, SEP)
End Function
Code categories approach
For the sake of completeness and to show also other ways "leading to Rome", I want to demonstrate an approach allowing to group the string input into five code categories in order to extract alphanumerics by a tricky match (see [B] Function getCats()):
To meet the requirements in OP use the following steps:
1) remove comma separated tokens if empty or only blanks (optional),
2) group characters into code categories,
3) check catCodes returning alpha nums including even accented or diacritic letters as well as characters like [ -,.+_]
Function AlphaNum(ByVal s As String, _
Optional IgnoreEmpty As Boolean = True, _
Optional info As Boolean = False) As String
'Site: https://stackoverflow.com/questions/15723672/how-to-remove-all-non-alphanumeric-characters-from-a-string-except-period-and-sp/74679416#74679416
'Auth.: https://stackoverflow.com/users/6460297/t-m
'Date: 2023-01-12
'1) remove comma separated tokens if empty or only blanks (s passed as byRef argument)
If IgnoreEmpty Then RemoveEmpty s ' << [A] RemoveEmpty
'2) group characters into code categories
Dim catCodes: catCodes = getCats(s, info) ' << [B] getCats()
'3) check catCodes and return alpha nums plus chars like [ -,.+_]
Dim i As Long, ii As Long
For i = 1 To UBound(catCodes)
' get current character
Dim curr As String: curr = Mid$(s, i, 1)
Dim okay As Boolean: okay = False
Select Case catCodes(i)
' AlphaNum: cat.4=digits, cat.5=alpha letters
Case Is >= 4: okay = True
' Category 2: allow only space, comma, minus
Case 2: If InStr(" -,", curr) <> 0 Then okay = True
' Category 3: allow only point, plus, underline
Case 3: If InStr(".+_", curr) <> 0 Then okay = True
End Select
If okay Then ii = ii + 1: catCodes(ii) = curr ' increment counter
Next i
ReDim Preserve catCodes(1 To ii)
AlphaNum = Join(catCodes, vbNullString)
End Function
Note: Instead of If InStr(" -,", curr) <> 0 Then in Case 2 you may also code If curr Like "[ ,-]" Then (the hyphen has to come last in the character list, otherwise Like reads it as a range). Similar in Case 3 :-)
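Inside a Like character list, a hyphen between two characters denotes a range, so it only matches itself when it is placed first or last. The following sketch illustrates the difference; the procedure name DemoLikeHyphen is hypothetical, and the results assume the default Option Compare Binary.

```vba
Option Explicit

' Hypothetical demo of how Like treats a hyphen inside a character list.
Sub DemoLikeHyphen()
    ' Hyphen placed last: the list matches only space, comma or hyphen
    Debug.Print "-" Like "[ ,-]"    ' True
    Debug.Print "#" Like "[ ,-]"    ' False

    ' Hyphen in the middle: range from space (code 32) to comma (code 44)
    Debug.Print "#" Like "[ -,]"    ' True, # (code 35) lies inside the range
    Debug.Print "-" Like "[ -,]"    ' False, - (code 45) lies outside the range
End Sub
```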
[A] Helper procedure RemoveEmpty
Optional clean-up removing comma separated tokens if empty or containing only blanks:
Sub RemoveEmpty(ByRef s As String)
'Purp: remove comma separated tokens if empty or only blanks
Const DEL = "$DEL$" ' temporary deletion marker
Dim i As Long
Dim tmp: tmp = Split(s, ",")
For i = LBound(tmp) To UBound(tmp)
tmp(i) = IIf(Len(Trim(tmp(i))) = 0, DEL, Trim(tmp(i)))
Next i
tmp = Filter(tmp, DEL, False) ' remove marked elements
s = Join(tmp, ",")
End Sub
[B] Helper function getCats()
A tricky way to group characters into five code categories, thus building the basic logic for any further analysis:
Function getCats(s, Optional info As Boolean = False)
'Purp.: group characters into five code categories
'Auth.: https://stackoverflow.com/users/6460297/t-m
'Site: https://stackoverflow.com/questions/15723672/how-to-remove-all-non-alphanumeric-characters-from-a-string-except-period-and-sp/74679416#74679416
'Note: Cat.: including:
' 1 ~~> apostrophe '
' 2 ~~> space, comma, minus etc
' 3 ~~> point separ., plus etc
' 4 ~~> digits 0..9
' 5 ~~> alpha (even including accented or diacritic letters!)
'~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
'a) get array of single characters
Const CATEG As String = "' - . 0 A" 'define group starters (case indep.)
Dim arr: arr = Char2Arr(s) ' << [C] Char2Arr()
Dim chars: chars = Split(CATEG)
'b) return codes per array element
getCats = Application.Match(arr, chars) 'No 3rd zero-argument!!
'c) display in immediate window (optionally)
If info Then Debug.Print Join(arr, "|") & vbNewLine & Join(getCats, "|")
End Function
[C] Helper function Char2Arr
Assigns every string character to an array:
Function Char2Arr(ByVal s As String)
'Purp.: assign single characters to array
s = StrConv(s, vbUnicode)
Char2Arr = Split(s, vbNullChar, Len(s) \ 2)
End Function

How can I find quoted text in a string?

Example
Say I have a string:
"I say ""Hello world"" and she says ""Excuse me?"""
VBA will interpret this string as:
I say "Hello world" and she says "Excuse me?"
A more complex example:
I have a string:
"I say ""Did you know that she said """"Hi there!"""""""
VBA interprets this string as:
I say "Did you know that she said ""Hi there!"""
If we remove "I say "
"Did you know that she said ""Hi there!"""
we can continue parsing the string in vba:
Did you know that she said "Hi there!"
Problem
Ultimately I want some function, sBasicQuote(quotedStringHierarchy as string), which returns a string containing the next level up in the string hierarchy.
E.G.
dim s as string
s = "I say ""Did you know that she said """"Hi there!"""""""
s = sBasicQuote(s) ' returns 'I say "Did you know that she said ""Hi there!"""'
s = sBasicQuote(s) ' returns 'Did you know that she said "Hi there!"'
s = sBasicQuote(s) ' returns 'Hi there!'
I just can't figure out an algorithm that would work for this... You almost need to replace all double quotes, but when you've replaced the nth double quote you have to skip to the (n+1)th double quote?
How does one implement this in VBA?
You could do something like this
Public Sub test()
Dim s As String
s = "I say ""Did you know that she said """"Hi there!"""""""
Debug.Print DoubleQuote(s, 0)
Debug.Print DoubleQuote(s, 1)
Debug.Print DoubleQuote(s, 2)
End Sub
Public Function DoubleQuote(strInput As String, intElement As Integer) As String
Dim a() As String
strInput = Replace(strInput, String(2, Chr(34)), String(1, Chr(34)))
a = Split(strInput, chr(34))
DoubleQuote = a(intElement)
End Function
Another slightly modified version is a little more accurate
Public Function DoubleQuote(strInput As String, intElement As Integer) As String
Dim a() As String
Dim b() As String
Dim i As Integer
ReDim b(0)
a = Split(strInput, Chr(34))
' ***** See comments re using -1 *******
For i = 0 To UBound(a) - 1
If Len(a(i)) = 0 Then
b(UBound(b)) = Chr(34) & a(i + 1) & Chr(34)
i = i + 1
Else
b(UBound(b)) = a(i)
End If
ReDim Preserve b(UBound(b) + 1)
Next i
DoubleQuote = b(intElement)
End Function
I think the following will return what you are looking for in your nested quote example. Your first example is not really a situation of nested quotes.
Option Explicit
Sub NestedQuotes()
Const s As String = "I say ""Did you know that she said """"Hi there!"""""""
Dim COL As Collection
Dim Start As Long, Length As Long, sTemp As String, V As Variant
Set COL = New Collection
sTemp = s
COL.Add sTemp
Do Until InStr(sTemp, Chr(34)) = 0
sTemp = COL(COL.Count)
sTemp = Replace(sTemp, String(2, Chr(34)), String(1, Chr(34)))
Start = InStr(sTemp, Chr(34)) + 1
Length = InStrRev(sTemp, Chr(34)) - Start
sTemp = Mid(sTemp, Start, Length)
COL.Add sTemp
Loop
For Each V In COL
Debug.Print V
Next V
End Sub
My Solution
I spent some more time thinking and came up with this solution.
Function sMineDoubleQuoteHierarchy(s As String) As String
'Check the number of quotes in the string are even - sanity check
If (Len(s) - Len(Replace(s, """", ""))) Mod 2 <> 0 Then sMineDoubleQuoteHierarchy = "Error - Odd number of quotes found in sMineDoubleQuoteHierarchy() function": Exit Function
'First thing to do is find the first and last *single* quote in the string
Dim lStart, lEnd, i As Long, fs As String
lStart = InStr(1, s, """")
lEnd = InStrRev(s, """")
'After these have been found we need to remove them.
s = Mid(s, lStart + 1, lEnd - lStart - 1)
'Start at the first character
i = 1
Do While True
'Find where the next double quote is, searching from position i onwards
i = InStr(i, s, """""")
'If no double quote is found then we are done
If i = 0 Then Exit Do
'Replace the ith double quote with a single quote
s = Left(s, i - 1) & Replace(s, """""", """", i, 1)
'Increment by 1 so the quote just produced is not paired with the quote that follows it
i = i + 1
Loop
'Return the processed string
sMineDoubleQuoteHierarchy = s
End Function
What's going on in this solution?
The first part of the process removes the first and last single quote from the string and keeps the text between them. Then we loop through the string, replacing each instance of "" with ". Each time we do this we skip to the next character to ensure strings like """" go to "" instead of ".
Does anyone else have a better/more compact solution?
Edit
After all the suggestions in this forum I settled on this. It has some extra error trapping to validate nested strings.
Public Function DoubleQuoteExtract(ByVal s As String, Optional ByRef ErrorLevel As Boolean) As String
'This effectively parses the string like BASIC does by removing incidents of "" and replacing them with "
'SANITY CHECK - Check even number of quotes
Dim countQuote As Double
countQuote = Len(s) - Len(Replace(s, """", ""))
'Calculate whether or not quote hierarchy is correct:
'"..." - Is okay - Count Quotes = 2 - Count Quotes / 2 = 1
'""..."" - Is not okay - Count Quotes = 4 - Count Quotes / 2 = 2
'"""...""" - Is okay - Count Quotes = 6 - Count Quotes / 2 = 3
'""""..."""" - Is not okay - Count Quotes = 8 - Count Quotes / 2 = 4
'etc.
'Ultimately: IF CountQuotes/2 = Odd The string hierarchy is setup fine
' IF CountQuotes/2 = Even, The string Hierarchy is setup incorrectly.
Dim X As Double: X = countQuote / 2
Dim ceil As Long: ceil = Int(X) - (X - Int(X) > 0)
If ceil Mod 2 = 0 Then DoubleQuoteExtract = "#Error - Incorrect number of double quotes forming an incomplete hierarchy.": GoTo ErrorOccurred
'If an odd number of quotes are found then they cannot be paired correctly, thus throw error
If countQuote Mod 2 <> 0 Then DoubleQuoteExtract = "#Error - Odd number of quotes found in DoubleQuoteExtract() function": GoTo ErrorOccurred
'Find the next incident of single quote. Trim the string to this
s = Mid(s, InStr(1, s, String(1, Chr(34))))
'replace all instances of "" with "
s = Replace(s, String(2, Chr(34)), String(1, Chr(34)))
'Finally trim off the first and last quotes
DoubleQuoteExtract = Mid(s, 2, Len(s) - 2)
ErrorLevel = False
Exit Function
ErrorOccurred:
ErrorLevel = True
End Function

Excel Extract nth first words from a string

I'd like to create a function in VBA that extracts the first n words from a string, with a signature like this:
ExtractWords(affected_text, delimiter, number_of_words_to_extract)
I tried a solution, but it only extracts the first two words.
Function FirstWords(myStr As Variant, delimiter,words_to_extract) As Variant
FirstWords = Left(myStr, InStr(InStr(1, myStr, delimiter) + 1, myStr, delimiter, vbTextCompare) - 1)
End Function
Any ideas? Thanks
Use the Split() function. It returns an array of strings, split at the delimiter you specify. The optional third argument limits the number of substrings returned; the last one then holds the unsplit remainder.
Dim Result As Variant
Result = Split("Alice,Bob,Chuck,Dave", ",") 'Result: {"Alice","Bob","Chuck","Dave"}
Result = Split("Alice,Bob,Chuck,Dave", ",", 2) 'Result: {"Alice","Bob,Chuck,Dave"}
@Taosique's answer using Split is excellent, but if you want the result returned as a string you can do the following:
Function FirstWords(myStr As String, delimiter As String, words_to_extract As Long) As Variant
Dim i As Long, k As Long
For i = 1 To Len(myStr)
If Mid(myStr, i, 1) = delimiter Then
k = k + 1
If k = words_to_extract Then
FirstWords = Mid(myStr, 1, i - 1) 'text up to, but not including, the delimiter
Exit Function
End If
End If
Next I
'if you get to here -- trouble
'unless the delimiter count is words_to_extract - 1
If k = words_to_extract - 1 Then
FirstWords = myStr
Else
FirstWords = CVErr(xlErrValue)
End If
End Function
Sub test()
Debug.Print FirstWords("This is a test. I hope it works", " ", 4)
Debug.Print FirstWords("This is a test. I hope it works", " ", 10)
End Sub
When test is run it first displays the string "This is a test." then prints an error condition.
Much the same effect as the above can be achieved by first splitting the string using Split and then rejoining it using Join. A subtle difference is the behavior if there are fewer than words_to_extract words. The Split-then-Join approach returns the whole string, whereas the above code treats this as an error condition and, if used as a UDF worksheet function, displays #VALUE! in the calling cell.
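The Split-then-Join idea could look roughly like the sketch below. The function name FirstWordsJoined and its parameter names are hypothetical, and returning an empty string for an empty input or a word count below 1 is an assumption, not something stated in the answer.

```vba
Option Explicit

' Hypothetical sketch: first n words via Split, ReDim Preserve and Join.
Function FirstWordsJoined(ByVal myStr As String, ByVal delimiter As String, _
                          ByVal wordsToExtract As Long) As String
    Dim parts() As String

    ' Assumption: nothing to extract yields an empty string
    If wordsToExtract < 1 Or Len(myStr) = 0 Then Exit Function

    parts = Split(myStr, delimiter)        ' zero-based array of words

    ' Cut the array down only if it holds more words than requested
    If wordsToExtract - 1 < UBound(parts) Then
        ReDim Preserve parts(0 To wordsToExtract - 1)
    End If

    FirstWordsJoined = Join(parts, delimiter)
End Function

Sub DemoFirstWordsJoined()
    Debug.Print FirstWordsJoined("This is a test. I hope it works", " ", 4)   ' This is a test.
    Debug.Print FirstWordsJoined("This is a test. I hope it works", " ", 10)  ' whole string
End Sub
```

Unlike the character loop, this version also accepts delimiters longer than one character, because Split handles the matching.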

How to remove letters from end of ID string (variable letter amount)?

I have a list of IDs that I'm trying to clean and compare with another list. The IDs have variable formatting (e.g. RFP322343BA, PPL232334, RFP32334A-00). I'm trying to standardize the data on the front end (e.g. RFP322343, PPL232334, and RFP32334) to allow for comparison. How can I remove these trailing text/symbol strings of varying length?
With RFP32334A-00 in cell A1, then
=IF(RIGHT(LEFT(A1,9),1)="A",LEFT(A1,8),LEFT(A1,9))
works, assuming 1) only the first 9 characters are of interest and 2) the only letter that ever appears in the 9th position is "A", as in the "odd" ID you provided above. If there are only a few of these odd ones, then plain LEFT(A1,9) will be simpler.
Consider the following User Defined Function (UDF):
Public Function FirstPart(sIn As String) As String
Dim i As Long, L As Long, Armed As Boolean, CH As String
FirstPart = ""
Armed = False
L = Len(sIn)
For i = 1 To L
CH = Mid(sIn, i, 1)
If IsNumeric(CH) Then
Armed = True
End If
If Not Armed Then
FirstPart = FirstPart & CH
Else
If Not IsNumeric(CH) Then
Exit Function
Else
FirstPart = FirstPart & CH
End If
End If
Next i
End Function
It locates the first non-numerical character after the first numerical character and clips the string at that point.

In Excel, how can I create the "intersection" of two strings (without dropping into VB)?

In Mac Excel 2011, I have two strings, each consisting of a space-separated concatenation of smaller, spaceless strings. For example:
"red green blue pink"
"horse apple red monkey pink"
From those, I'd like to extract the intersection string:
"red pink"
I can do it in VB, but I'd prefer to stay in Excel proper. Now I know I could hack something together (in Excel) by making an assumption about the number of smaller component strings within each larger string. I could then chop one of the larger strings into those components and then for each do a FIND() on the second large string, concatenating the result as I went.
Problem is, although here I'm giving only two strings, in practice I have two sets of strings, each containing 20 large strings. So the "chop and walk" approach feels like O(N^2) in terms of space in Excel, and I'm looking for a simpler way.
Any ideas?
I don't think you can do it in a single cell function without using multiple cells or VBA. Define a UDF like the one below and use the new function in the one cell with the syntax
=StringIntersect("a b c","d e b f")
which would return "b"
This function does have the nested loop but on string arrays I imagine it will be quick enough
Function StringIntersect(s1 As String, s2 As String) As String
Dim arys1() As String
Dim arys2() As String
Dim arysub() As String
Dim i as integer
Dim j as integer
arys1 = Split(s1, " ")
arys2 = Split(s2, " ")
For i = LBound(arys1) To UBound(arys1)
For j = LBound(arys2) To UBound(arys2)
If arys1(i) = arys2(j) Then StringIntersect = StringIntersect & arys1(i) & " "
Next
Next
StringIntersect = Trim(StringIntersect) 'remove trailing space
End Function
If you don't want to use two loops, you should be able to do something with InStr, which is very quick. I haven't done any speed testing, but I suspect the function below is quicker. However, you will get unexpected results if a word is duplicated in the first input or if a word in the first input is a substring of a word in the second. This could be avoided with more checking, but you would probably lose the speed benefit.
Function StringIntersect(s1 As String, s2 As String) As String
Dim arys1() As String
Dim i As Long
arys1 = Split(s1, " ")
For i = LBound(arys1) To UBound(arys1)
If InStr(1, s2, arys1(i), vbBinaryCompare) > 0 Then StringIntersect = StringIntersect & arys1(i) & " "
Next
StringIntersect = Trim(StringIntersect) 'remove trailing space
End Function
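The extra checking mentioned for the InStr variant can be sketched by padding both strings with spaces, so that only whole words match and already collected words are skipped. The name StringIntersectWords is hypothetical, and the sketch assumes words are separated by single spaces and compared case-sensitively.

```vba
Option Explicit

' Hypothetical variant: whole-word matching via padded InStr, duplicates skipped.
Function StringIntersectWords(ByVal s1 As String, ByVal s2 As String) As String
    Dim words() As String, i As Long
    Dim padded2 As String, result As String, token As String

    padded2 = " " & s2 & " "     ' pad so the first and last word are delimited too
    result = " "                 ' collected words, kept space-delimited
    words = Split(s1, " ")

    For i = LBound(words) To UBound(words)
        If Len(words(i)) > 0 Then
            token = " " & words(i) & " "
            ' The word must occur in s2 as a whole word and not be collected yet
            If InStr(1, padded2, token, vbBinaryCompare) > 0 _
               And InStr(1, result, token, vbBinaryCompare) = 0 Then
                result = result & words(i) & " "
            End If
        End If
    Next i

    StringIntersectWords = Trim$(result)
End Function
```

With the inputs from the question, =StringIntersectWords("red green blue pink","horse apple red monkey pink") should give "red pink", while "red" no longer matches inside a longer word such as "reddish".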
General case for arbitrary strings (returns the longest common substring)
E.g.: StringIntersect("abcdefgh", "adefh") = "def"
Function StringIntersect(s1 As String, s2 As String) As String
Dim i As Integer
Dim j As Integer
Dim k As Integer
k = 1
For i = 1 To Len(s1)
For j = 1 To Len(s2)
Do While Mid(s1, i, k) = Mid(s2, j, k) And i + k - 1 <= Len(s1) And j + k - 1 <= Len(s2)
StringIntersect = Mid(s1, i, k)
k = k + 1
Loop
Next j
Next i
End Function
