Related
I have a string in a cell composed of several shorter strings of various lengths with blank spaces and commas in between. In some cases only one or more blanks are in between.
I want to remove every blank space and comma and only leave behind 1 comma between each string element. The result must look like this:
The following doesn't work. I'm not getting an error but the strings are truncated at the wrong places. I don't understand why.
Sub String_adaption()
Dim i, j, k, m As Long
Dim STR_A As String
STR_A = "01234567890ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
i = 1
With Worksheets("table")
For m = 1 To Len(.Range("H" & i))
j = 1
Do While Mid(.Range("H" & i), m, 1) = "," And Mid(.Range("H" & i), m - 1, 1) <> Mid(STR_A, j, 1) And m <> Len(.Range("H" & i))
.Range("H" & i) = Mid(.Range("H" & i), 1, m - 2) & Mid(.Range("H" & i), m, Len(.Range("H" & i)))
j = j + 1
Loop
Next m
End With
End Sub
I'd use a regular expression to replace any combination of spaces and comma's. Something along these lines:
Sub Test()
Dim str As String: str = "STRING_22 ,,,,,STRING_1 , , ,,,,,STRING_333 STRING_22 STRING_4444"
Debug.Print RegexReplace(str, "[\s,]+", ",")
End Sub
Function RegexReplace(x_in, pat, repl) As String
With CreateObject("vbscript.regexp")
.Global = True
.Pattern = pat
RegexReplace = .Replace(x_in, repl)
End With
End Function
Just for the sake of alternatives:
Formula in B1:
=TEXTJOIN(",",,TEXTSPLIT(A1,{" ",","}))
The following function will split the input string into pieces (words), using a comma as separator. When the input string has multiple commas, it will result in empty words.
After splitting, the function loops over all words, trims them (remove leading and trailing blanks) and glue them together. Empty words will be skipped.
I have implemented it as Function, you could use it as UDF: If your input string is in B2, write =String_adaption(B2) as Formula into any cell.
Function String_adaption(s As String) As String
' Remove duplicate Commas and Leading and Trailing Blanks from words
Dim words() As String, i As Long
words = Split(s, ",")
For i = 0 To UBound(words)
Dim word As String
word = Trim(words(i))
If word <> "" Then
String_adaption = String_adaption & IIf(String_adaption = "", "", ",") & word
End If
Next i
End Function
P.S.: Almost sure that this could be done with some magic regular expressions, but I'm not an expert in that.
If you have recent Excel version, you can use simple worksheet function to split the string on space and on comma; then put it back together using the comma deliminater and ignoring the blanks (and I just noted #JvdV had previously posted the same formula solution):
=TEXTJOIN(",",TRUE,TEXTSPLIT(A1,{" ",","}))
In VBA, you can use a similar algorithm, using the ArrayList object to collect the non-blank results.
Option Explicit
Function commaOnly(s As String) As String
Dim v, w, x, y
Dim al As Object
Set al = CreateObject("System.Collections.ArrayList")
v = Split(s, " ")
For Each w In v
x = Split(w, ",")
For Each y In x
If y <> "" Then al.Add y
Next y
Next w
commaOnly = Join(al.toarray, ",")
End Function
This preserves the spaces within the smaller strings.
Option Explicit
Sub demo()
Const s = "STRING 22,,,, ,,STRING 1,,,, ,,STRING 333 , , , STRING_22 STRING_44"
Debug.Print Cleanup(s)
End Sub
Function Cleanup(s As String) As String
Const SEP = ","
Dim regex, m, sOut As String, i As Long, ar()
Set regex = CreateObject("vbscript.regexp")
With regex
.Global = True
.MultiLine = False
.IgnoreCase = True
.Pattern = "([^,]+)(?:[ ,]*)"
End With
If regex.Test(s) Then
Set m = regex.Execute(s)
ReDim ar(0 To m.Count - 1)
For i = 0 To UBound(ar)
ar(i) = Trim(m(i).submatches(0))
Next
End If
Cleanup = Join(ar, SEP)
End Function
Code categories approach
For the sake of completeness and to show also other ways "leading to Rome", I want to demonstrate an approach allowing to group the string input into five code categories in order to extract alphanumerics by a tricky match (see [B] Function getCats()):
To meet the requirements in OP use the following steps:
1) remove comma separated tokens if empty or only blanks (optional),
2) group characters into code categories,
3) check catCodes returning alpha nums including even accented or diacritic letters as well as characters like [ -,.+_]
Function AlphaNum(ByVal s As String, _
Optional IgnoreEmpty As Boolean = True, _
Optional info As Boolean = False) As String
'Site: https://stackoverflow.com/questions/15723672/how-to-remove-all-non-alphanumeric-characters-from-a-string-except-period-and-sp/74679416#74679416
'Auth.: https://stackoverflow.com/users/6460297/t-m
'Date: 2023-01-12
'1) remove comma separated tokens if empty or only blanks (s passed as byRef argument)
If IgnoreEmpty Then RemoveEmpty s ' << [A] RemoveEmpty
'2) group characters into code categories
Dim catCodes: catCodes = getCats(s, info) ' << [B] getCats()
'3) check catCodes and return alpha nums plus chars like [ -,.+_]
Dim i As Long, ii As Long
For i = 1 To UBound(catCodes)
' get current character
Dim curr As String: curr = Mid$(s, i, 1)
Dim okay As Boolean: okay = False
Select Case catCodes(i)
' AlphaNum: cat.4=digits, cat.5=alpha letters
Case Is >= 4: okay = True
' Category 2: allow only space, comma, minus
Case 2: If InStr(" -,", curr) <> 0 Then okay = True
' Category 3: allow only point, plus, underline
Case 3: If InStr(".+_", curr) <> 0 Then okay = True
End Select
If okay Then ii = ii + 1: catCodes(ii) = curr ' increment counter
Next i
ReDim Preserve catCodes(1 To ii)
AlphaNum = Join(catCodes, vbNullString)
End Function
Note: Instead of If InStr(" -,", curr) <> 0 Then in Case 2 you may code If curr like "[ -,]" Then, too. Similar in Case 3 :-)
[A] Helper procedure RemoveEmpty
Optional clean-up removing comma separated tokens if empty or containing only blanks:
Sub RemoveEmpty(ByRef s As String)
'Purp: remove comma separated tokens if empty or only blanks
Const DEL = "$DEL$" ' temporary deletion marker
Dim i As Long
Dim tmp: tmp = Split(s, ",")
For i = LBound(tmp) To UBound(tmp)
tmp(i) = IIf(Len(Trim(tmp(i))) = 0, DEL, Trim(tmp(i)))
Next i
tmp = Filter(tmp, DEL, False) ' remove marked elements
s = Join(tmp, ",")
End Sub
[B] Helper function getCats()
A tricky way to groups characters into five code categories, thus building the basic logic for any further analyzing:
Function getCats(s, Optional info As Boolean = False)
'Purp.: group characters into five code categories
'Auth.: https://stackoverflow.com/users/6460297/t-m
'Site: https://stackoverflow.com/questions/15723672/how-to-remove-all-non-alphanumeric-characters-from-a-string-except-period-and-sp/74679416#74679416
'Note: Cat.: including:
' 1 ~~> apostrophe '
' 2 ~~> space, comma, minus etc
' 3 ~~> point separ., plus etc
' 4 ~~> digits 0..9
' 5 ~~> alpha (even including accented or diacritic letters!)
'~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
'a) get array of single characters
Const CATEG As String = "' - . 0 A" 'define group starters (case indep.)
Dim arr: arr = Char2Arr(s) ' << [C] Char2Arr()
Dim chars: chars = Split(CATEG)
'b) return codes per array element
getCats = Application.Match(arr, chars) 'No 3rd zero-argument!!
'c) display in immediate window (optionally)
If info Then Debug.Print Join(arr, "|") & vbNewLine & Join(getCats, "|")
End Function
[C] Helper function Char2Arr
Assigns every string character to an array:
Function Char2Arr(ByVal s As String)
'Purp.: assign single characters to array
s = StrConv(s, vbUnicode)
Char2Arr = Split(s, vbNullChar, Len(s) \ 2)
End Function
I have a list that was copied from a 'table of contents' page to column D. Unfortunately, each cell contains the chapter number, chapter name, and the page number.
3.14.4 chapter name placeholder.140
Sometimes there is a space between the page number and the last character. other times there is not.
I've tried
Function john(txt As String) As Long
Dim x
x = Split(Trim(txt), Chr(32))
john = Val(x(UBound(x)))
End Function
Which does work but I'd like to be able to apply this to the chapter number as well afterwards.
Private Sub FIND_LAST_NUMBER()
Dim A As String
Dim B As Integer
Dim C As String
Dim D As String
x = 3
Do While ActiveSheet.Cells(x, 4).Value <> ""
A = Range("D" & x).Value
A = Trim(A)
B = Len(A)
For Position = B To 1 Step -1
C = Mid(A, Position, 1)
'MsgBox C
If C <> " " Then
D = Right(A, B - Position)
Range("E" & x).Value = C
GoTo LastLine
'Exit Sub
End If
Next Position
LastLine:
x = x + 1
Loop
End Sub
but I'm trying to figure out how to get all of the number instead of only the last digit of the page number from the original cell
I am obviously not getting something here.
Any tips or tricks will be greatly appreciated
One, admittedly not very beautiful solution I can think of right away would be to use Replace to remove all non-numeric characters.
Dim str As String
str = Replace(str, " ", "") '<- to remove the random spaces
str = LCase(str) '<- making everything lower case
For i = 97 To 122
str = Replace(str, Chr(i), "")
Next i
Chr(i) with i from 97 to 122 will be every Character of the standard Alphabet.
This does not work if special Characters appear in the Chapter Name String. If the Chapter name contains numbers these will remain, but you could detect that case because UBound of the split array will be 1 greater than usual.
Also if you can quickly scan all the cells with your data for other unwanted Characters like - / or whatever might occur, you can also get rid of them with Replace
Performance of this solution might not be great but for a quick fix it might do..
Using a formula, not VBA, I would like to come up with a solution to split a string composed of multiple words. The formula should recognize the words where there is a capital letter and separate them. The result would be a string where the words are separated by ",".
To clarify this is an example of the string:
Nursing StudentStudentNurseNursing School
Desired Result:
Nursing Student,Student,Nurse,Nursing School
I am trying the following formula but I can only isolate the first word:
{=LEFT(Q4,SMALL(FIND(CHAR(ROW(INDIRECT("65:90"))),Q4&"ABCDEFGHIJKLMNOPQRSTUVWXYZ"),2)-1)}
Any suggestion?
To accomplish this, you will need pure VBA. Create a custom Function to get in 1 cell the string you want. Then, use Text to Columns later if you need it.
My function:
Public Function GET_STRING(ByVal ThisCell As Range) As String
Dim i As Integer
Dim MyPositions As String
Dim ArrPositions As Variant
For i = 2 To Len(ThisCell.Value) Step 1
If Mid(ThisCell.Value, i, 1) = UCase(Mid(ThisCell.Value, i, 1)) And _
Mid(ThisCell.Value, i, 1) <> " " And Left(Mid(ThisCell.Value, i - 1, 1), 1) <> " " Then MyPositions = MyPositions & i & ";"
Next i
ArrPositions = Split(Left(MyPositions, Len(MyPositions) - 1), ";")
For i = 0 To UBound(ArrPositions) Step 1
If i = 0 Then
GET_STRING = Left(ThisCell.Value, ArrPositions(i) - 1) & "," & Mid(ThisCell.Value, ArrPositions(i), ArrPositions(i + 1) - ArrPositions(i))
ElseIf i <> UBound(ArrPositions) Then
GET_STRING = GET_STRING & "," & Mid(ThisCell.Value, ArrPositions(i), ArrPositions(i + 1) - ArrPositions(i))
Else
GET_STRING = GET_STRING & "," & Mid(ThisCell.Value, ArrPositions(i), Len(ThisCell.Value) - ArrPositions(i) + 1)
End If
Next i
End Function
What I get when i use it on excel
You're pushing the envelope with this requirement. What you want to achieve requires looping over the same string repeatedly. That can only be done with recursion and Excel formulas don't do recursion.
With modern Excel 2016 you have Power Query (Get & Transform, or the add-in for Excel 2010 and 2013) and you can use that to write out the logic in M code if you don't want to use VBA. Power Query can be saved in a macro-free workbook and new data can be processed with the click of the "Refresh all" command in the ribbon.
In B2:C28 fill in these :
A ,A
B ,B
C ,C
D ,D
E ,E
F ,F
G ,G
H ,H
I ,I
J ,J
K ,K
L ,L
M ,M
N ,N
O ,O
P ,P
Q ,Q
R ,R
S ,S
T ,T
U ,U
V ,V
W ,W
X ,X
Y ,Y
Z ,Z
,
Note: B28 = , C28 =
then in A2 =SUBSTITUTE(A1,B2,C2) then drag until A28,
in A29 =RIGHT(A28,LEN(A28)-1) Done.
Hope that helps. (:
+------[edit]-----+
or in one line :
=SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(A1,"Z",",Z"),"Y",",Y"),"X",",X"),"W",",W"),"V",",V"),"U",",U"),"T",",T"),"S",",S"),"R",",R"),"Q",",Q"),"P",",P"),"O",",O"),"N",",N"),"M",",M"),"L",",L"),"K",",K"),"J",",J"),"I",",I"),"H",",H"),"G",",G"),"F",",F"),"E",",E"),"D",",D"),"C",",C"),"B",",B"),"A",",A")," ,"," ")
This is what I use in access VBA
Pass a string like
?GET_SPLIT_STRING("SplitAtCapitals")
and get back the following
Split At Capitals
Public Function GET_SPLIT_STRING(xStr As String) As String
Dim i As Integer, xchar As String, ychar As String
ychar = UCase(Left(xStr, 1))
For i = 2 To Len(xStr) Step 1
xchar = Mid(xStr, i, 1)
If asc(xchar) = asc(UCase(xchar)) Then
xchar = Space(1) & xchar
End If
ychar = ychar & xchar
Next
GET_SPLIT_STRING = ychar
End Function
I´m trying to unify the format of a large .xlsx file I received.
One of the problems I found, is that there are entries which "unique code" is "00UTract 32", "132Unit 359", "5555UT22"... and then I´ve found we´ve "00 UTract 32", "Unit 359, 132", and "22UT, 5555".
As you may suspect, there are duplicates, and I confirmed that was the case.
So, how should I do to add a space each time I find a letter next to a number, so I can start cleaning the mess easily?
Thanks!!!
Select the cells you wish to check/correct and run this macro:
Sub DataFixer()
Dim r As Range, DoIt As Boolean
Dim temp As String, CH As String, v As String
Dim i As Long, L As Long
For Each r In Selection
temp = ""
DoIt = False
v = r.Value
L = Len(v)
CH = Mid(v, 1, 1)
temp = CH
For i = 2 To L
CH = Mid(v, i, 1)
If IsNumeric(Right(temp, 1)) And CH Like "[a-zA-Z]" Then
DoIt = True
temp = temp & " "
End If
temp = temp & CH
Next i
If DoIt Then r.Value = temp
Next r
End Sub
The macro checks each select cell for occurrences of:
{number}{letter}
and replaces them with:
{number} {letter}
I'd probably do this the other way around assuming that the only difference in IDs are the spaces.
Simply remove all spaces from that column, and you will get the same values, without having to deal with checking each character in a string.
This can be done via CTRL+H and no need to introduce VB in it.
In Mac Excel 2011, I have two strings, each consisting of a space-separated concatenation of smaller, spaceless strings. For example:
"red green blue pink"
"horse apple red monkey pink"
From those, I'd like to extract the intersection string:
"red pink"
I can do it in VB, but I'd prefer to stay in Excel proper. Now I know I could hack something together (in Excel) by making an assumption about the number of smaller component strings within each larger string. I could then chop one of the larger strings into those components and then for each do a FIND() on the second large string, concatenating the result as I went.
Problem is, although here I'm giving only two strings, in practice I have two sets of strings, each containing 20 large strings. So the "chop and walk" approach feels like O(N^2) in terms of space in Excel, and I'm looking for a simpler way.
Any ideas?
I don't think you can do it in a single cell function without using multiple cells or VBA. Define a UDF like the one below and use the new function in the one cell with the syntax
=StringIntersect("a b c","d e b f")
which would return "b"
This function does have the nested loop but on string arrays I imagine it will be quick enough
Function StringIntersect(s1 As String, s2 As String) As String
Dim arys1() As String
Dim arys2() As String
Dim arysub() As String
Dim i as integer
Dim j as integer
arys1 = Split(s1, " ")
arys2 = Split(s2, " ")
For i = LBound(arys1) To UBound(arys1)
For j = LBound(arys2) To UBound(arys2)
If arys1(i) = arys2(j) Then StringIntersect = StringIntersect & arys1(i) & " "
Next
Next
StringIntersect = Trim(StringIntersect) 'remove trailing space
End Function
If you don't want to do to the two loops you should be able to do something with inStr which is very quick. I haven't done any speed testing but I suspect the function below is quicker, however you will get unexpected results is the string is duplicated in the first input or the string in the first input is a substring in the second. This could be avoided with more checking but you would probably loose the speed benefit.
Function StringIntersect(s1 As String, s2 As String) As String
Dim arys1() As String
arys1 = Split(s1, " ")
For i = LBound(arys1) To UBound(arys1)
If InStr(1, s2, arys1(i), vbBinaryCompare) > 0 Then StringIntersect = StringIntersect & arys1(i) & " "
Next
StringIntersect = Trim(StringIntersect) 'remove trailing space
End Function
General case for all string
Eg: StringIntersect("abcdefgh", "adefh") = "def"
Function StringIntersect(s1 As String, s2 As String) As String
Dim i As Integer
Dim j As Integer
Dim k As Integer
k = 1
For i = 1 To Len(s1)
For j = 1 To Len(s2)
Do While Mid(s1, i, k) = Mid(s2, j, k) And i + k - 1 <= Len(s1) And j + k - 1 <= Len(s2)
StringIntersect = Mid(s1, i, k)
k = k + 1
Loop
Next j
Next i
End Function