Separating a character string - excel

I want to separate a character string using the special characters in that string as cutting lines. After each division the next group of strings should be copied in the next column. The picture below shows how it should work.
My first approach doesn't work and maybe it's too complicated. Is there a simple solution to this task?
Sub SeparateString()
Dim i, j, k, counterA, counterB As Integer
Dim str1, str2 As String
Const Sonderz As String = "^!§$%&/()=?`*'_:;°,.-#+´ß}][{³²"
For i = 1 To Worksheets("Tabelle1").Range("A1").End(xlDown).Row
counterA = 0
For j = 1 To Len(Worksheets("Tabelle1").Range("A" & i))
counterB = 0
For k = 1 To Len(Sonderz)
If Mid(Worksheets("Tabelle1").Range("A" & i), j, 1) = Mid(Sonderz, k, 1) Then
counterA = counterA + 1
End If
If Mid(Worksheets("Tabelle1").Range("A" & i), j, 1) <> Mid(Sonderz, k, 1) And counterA = 0 And counterB = 0 Then
Worksheets("Tabelle1").Range("B" & i) = Worksheets("Tabelle1").Range("B" & i) & Mid(Worksheets("Tabelle1").Range("A" & i), j, 1)
counterB = counterB + 1
End If
Next k
Next j
Next i
End Sub

If you are interested and you do have access to Microsoft 365's dynamic arrays:
Formula in B1:
=LET(X,MID(A1,SEQUENCE(LEN(A1)),1),TRANSPOSE(FILTERXML(CONCAT("<t><s>",IF(ISNUMBER(FIND(X,"^!§$%&/()=?`*'_:;°,.-#+´ß}][{³²")),"</s><s>",X),"</s></t>"),"//s")))
Or nest a SUBSTITUTE() if you need the results returned as text (FILTERXML otherwise converts numeric-looking pieces such as 02 to numbers):
=LET(X,MID(A1,SEQUENCE(LEN(A1)),1),TRANSPOSE(SUBSTITUTE(FILTERXML(CONCAT("<t><s>'",IF(ISNUMBER(FIND(X,"^!§$%&/()=?`*'_:;°,.-#+´ß}][{³²")),"</s><s>'",X),"</s></t>"),"//s"),"'","")))
If VBA is a must, you could think about a regular expression to replace all the characters from a certain class with a uniform delimiter to use Split() on:
Sub Test()
Dim s As String: s = "CD!02?WX12EF"
Dim arr() As String
Dim el As Variant ' loop variable, declared so the code also compiles under Option Explicit
With CreateObject("vbscript.regexp")
.Global = True
.Pattern = "[!§$%&\/()=?`*'_:;°,.#+´ß}\][{³²^-]"
arr = Split(.Replace(s, "!"), "!")
End With
For Each el In arr
Debug.Print el
Next
End Sub
The caret has been moved from the first position to a later one to avoid creating a negated character class; the hyphen has been moved to the end to prevent it from defining a range of characters. Funnily enough, if you want to be less verbose, you could put these characters in a more condensed class: [!#-/:;?[\]-`{}§=°´ß³²].
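To get from the array to the layout the question asks for, the Split() result can be written to the columns next to each source cell. This is a minimal sketch, assuming the data sits in column A of the sheet Tabelle1 (as in the question) and contains no error values. The Sub name, the trailing + in the pattern (which collapses a run of special characters into a single cut) and the text number format are my own additions, not part of the answer above:

```vba
Sub SplitToColumns()
    ' Hypothetical helper: splits every value in column A at the special
    ' characters and writes the pieces to column B onward.
    Dim ws As Worksheet, re As Object
    Dim parts() As String
    Dim r As Long, lastRow As Long, c As Long, p As Long

    Set ws = Worksheets("Tabelle1")              ' sheet name taken from the question
    Set re = CreateObject("vbscript.regexp")
    re.Global = True
    re.Pattern = "[!§$%&\/()=?`*'_:;°,.#+´ß}\][{³²^-]+"

    lastRow = ws.Cells(ws.Rows.Count, "A").End(xlUp).Row
    For r = 1 To lastRow
        ' Replace each run of special characters with "!" and split on it
        parts = Split(re.Replace(CStr(ws.Cells(r, "A").Value), "!"), "!")
        c = 2                                    ' first output column is B
        For p = LBound(parts) To UBound(parts)
            If Len(parts(p)) > 0 Then            ' skip empty pieces at the edges
                ws.Cells(r, c).NumberFormat = "@" ' keep pieces such as "02" as text
                ws.Cells(r, c).Value = parts(p)
                c = c + 1
            End If
        Next p
    Next r
End Sub
```

Drop the NumberFormat line if you would rather have numeric pieces stored as numbers.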

Assuming the first data is in cell A2,
I would go with the simple use of find() with left() mid() and right()
=left(A2,find("!",A2,1)-1)
then:
=mid(A2,find("!",A2,1)+1,find("?",A2,1)-find("!",A2,1)-1)
and:
=right(A2,len(A2)-find("?",A2,1))
Tested and working with one correction done.

You can also do this in Power Query which has a command to split by ANY delimiter.
In the code below, I generate a list of all possible special characters defined as characters NOT in the set of A..Za..z0..9 and you can easily add to that list by editing the code if you want to include other characters in the permitted list.
Edit: If you only have a few special characters, you can just hard-code the list, e.g. {"!","?"}, instead of using List.Generate. In your question, however, you did not necessarily restrict the list of special characters, even though those are the only two that show up in your examples.
To use Power Query:
Select some cell in your Data Table
Data => Get&Transform => from Table/Range
When the PQ Editor opens: Home => Advanced Editor
Make note of the Table Name in Line 2
Paste the M Code below in place of what you see
Change the Table name in line 2 back to what was generated originally.
Read the comments and explore the Applied Steps to understand the algorithm
let
//change Table name in next line to reflect actual table name
Source = Excel.CurrentWorkbook(){[Name="Table15"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Source", type text}}),
//Generate list of "special characters" for splitting
//the List.Contains function contains the non-special characters
splitterList = List.RemoveNulls(
List.Generate(()=>0,
each _ <= 255,
each _ +1,
each if List.Contains({"A".."Z","a".."z","0".."9"}, Character.FromNumber(_)) then null else Character.FromNumber(_))),
splitIt = Table.SplitColumn(#"Changed Type", "Source",
Splitter.SplitTextByAnyDelimiter(splitterList))
in
splitIt

Related

Remove Certain Characters from a String using UDF

I have a column which contain cells that have some list of alphanumeric number system as follows:
4A(4,5,6,7,8,9); 4B(4,5,7,8); 3A(1,2,3); 3B(1,2,3), 3C(1,2)
On a cell next to it, I use a UDF function to get rid of special characters "(),;" in order to leave the data as
4A456789 4B4578 3A123 3B123 3C12
Function RemoveSpecial(Str As String) As String
Dim SpecialChars As String
Dim i As Long
SpecialChars = "(),;-abcdefghijklmnopqrstuvwxyz"
For i = 1 To Len(SpecialChars)
Str = Replace$(Str, Mid$(SpecialChars, i, 1), "")
Next
RemoveSpecial = Str
End Function
For the most part this works well. However, on certain occasions, the cell would contain an unorthodox pattern such as when a space is included between the 4A and the parenthesized items:
4A (4,5,6,7,8,9);
or when a text appears inside the parenthesis (including two spaces on each side):
4A (4,5, skip 8,9);
or a space appears between the first two characters:
4 A(4,5,6)
How would you fix this so that the random spaces are removed, except for the ones that delimit the actual combinations of data?
One strategy would be to substitute the patterns you want to keep before eliminating the "special" characters, then restore the desired patterns.
From your sample data, it looks like you want to keep a space only if it follows ); or ),
Something like this:
Function RemoveSpecial(Data As Variant) As Variant
Dim SpecialChars As String
Dim KeepStr As Variant, PlaceHolder As Variant, ReplaceStr As Variant
Dim i As Long
Dim DataStr As String
SpecialChars = " (),;-abcdefghijklmnopqrstuvwxyz"
KeepStr = Array("); ", "), ")
PlaceHolder = Array("~0~", "~1~") ' choose a PlaceHolder that won't appear in the data
ReplaceStr = Array(" ", " ")
DataStr = Data
For i = LBound(KeepStr) To UBound(KeepStr)
DataStr = Replace$(DataStr, KeepStr(i), PlaceHolder(i))
Next
For i = 1 To Len(SpecialChars)
DataStr = Replace$(DataStr, Mid$(SpecialChars, i, 1), vbNullString)
Next
For i = LBound(KeepStr) To UBound(KeepStr)
DataStr = Replace$(DataStr, PlaceHolder(i), ReplaceStr(i))
Next
RemoveSpecial = Application.Trim(DataStr)
End Function
Another strategy would be regular expressions (RegEx)
It looks like a regular expression could come in handy here, for example:
Function RemoveSpecial(Str As String) As String
With CreateObject("vbscript.regexp")
.Global = True
.Pattern = "\)[;,]( )|[^A-Z\d]+"
RemoveSpecial = .Replace(Str, "$1")
End With
End Function
I have used the regular expression:
\)[;,]( )|[^A-Z\d]+
You can see an online demo to see the result in your browser. The way this works is to apply a form of what some would call "The best regex trick ever!"
\)[;,]( ) - Match a literal closing parenthesis (escaped), then either a comma or a semicolon, and then capture a space character in the 1st capture group.
| - Or use the following alternative:
[^A-Z\d]+ - One or more characters that are not in the given character class.
EDIT:
In case you have values like 4A; or 4A, you can use:
(?:([A-Z])|\))[;,]( )|[^A-Z\d]+
And replace with $1$2. See an online demo.
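To see the first pattern at work, here is a small test routine run against the three problem cases from the question. It is only a sketch: the Sub name is made up, the brackets are printed just to make stray spaces visible, and the expected results in the comments were traced by hand:

```vba
Sub DemoRemoveSpecial()
    ' Hypothetical test routine for the pattern \)[;,]( )|[^A-Z\d]+
    Dim samples As Variant, s As Variant
    samples = Array("4A (4,5,6,7,8,9); 4B(4,5,7,8)", _
                    "4A (4,5,  skip  8,9);", _
                    "4 A(4,5,6)")
    With CreateObject("vbscript.regexp")
        .Global = True
        .Pattern = "\)[;,]( )|[^A-Z\d]+"
        For Each s In samples
            ' "$1" puts back the captured space, or nothing when the
            ' second alternative matched
            Debug.Print "[" & .Replace(CStr(s), "$1") & "]"
        Next s
    End With
    ' Expected in the Immediate window:
    ' [4A456789 4B4578]
    ' [4A4589]
    ' [4A456]
End Sub
```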

Excel FindJobCode's problems

I am new to VBA, and I have the code below to find job numbers in a description.
However, I have 3 problems with it:
1. If the 1st character is a lowercase letter such as "s" or "m", it shows an error.
2. I cannot solve Example3: the result shows "M3045.67," but all I need is "M3045.67", with no comma.
3. I don't know why the code fails to run Range("E2").Value = "Overhead" after Else in Example5.
For problem 3, I do get the result "Overhead" before I add the 2nd criterion, so is something wrong there? Please help, thanks.
P.S. The looping will be added after solving the above questions.
Sub FindCode()
'Example1 : G5012.123 Management Fee / Get Result = G5012.123
'Example2 : G3045.67 Management Fee / Get Result = G3045.67
'Example3 : M3045.67, S7066 Rental Fee / Get Result = M3045.67,
'Example4 : P9876-123A Car Park / Get Result = P9876
'Example5 : A4 paper / Get result = Overhead
'Criteria1 : 1st Character = G / S / M / P
If Left(Range("A2"), 1) = "G" Or Left(Range("A2"), 1) = "S" Or Left(Range("A2"), 1) = "M" Or Left(Range("A2"), 1) = "P" Then
'Criteria2 : 2nd-5th Character = Number only
If IsNumeric(Mid(Range("A2"), 2, 4)) Then
'Get string before "space"
Range("E2").Value = Left(Range("A2"), InStr(1, Range("A2"), " ") - 1)
Else
'If not beginning from Crit 1&2, show "Overhead"
Range("E2").Value = "Overhead"
End If
End If
'If start from "P", get first 5 string
If Left(Range("A2"), 1) = "P" And IsNumeric(Mid(Range("A2"), 2, 4)) Then
Range("E2").Value = Left(Range("A2"), 5)
Else
End If
End Sub
The function below will extract the job number and return it to the procedure that called it.
Function JobCode(Cell As Range) As String
' 303
'Example1 : G5012.123 Management Fee / Get Result = G5012.123
'Example2 : G3045.67 Management Fee / Get Result = G3045.67
'Example3 : M3045.67, S7066 Rental Fee / Get Result = M3045.67
'Example4 : P9876-123A Car Park / Get Result = P9876
'Example5 : A4 paper / Get result = Overhead
Dim Fun As String ' function return value
Dim Txt As String ' Text to extract number from
' Minimize the number of times your code reads from the sheet because it's slow
Txt = Cell.Value ' actually, it's Cells(2, 1)
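' Criteria1 : 1st Character = G / S / M / P
' The Len check guards against empty cells: InStr returns 1 for an empty search string
If Len(Txt) > 0 And InStr("GSMP", UCase(Left(Txt, 1))) > 0 Then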
Txt = Split(Txt)(0) ' split on blank, take first element
' Criteria2 : 2nd-5th Character = Number only
' Isnumeric(Mid("A4", 2, 4)) = true
If (Len(Txt) >= 5) And (IsNumeric(Mid(Txt, 2, 4))) Then
Fun = Replace(Txt, ",", "")
Fun = Split(Fun, "-")(0) ' discard "-123A" in example 4
End If
End If
' If no job number was extracted, show "Overhead"
If Len(Fun) = 0 Then Fun = "Overhead"
JobCode = Fun
End Function
The setup as a function, rather than a sub, is typical for this sort of search. In my trials I had your 5 examples in A2:A6 and called them in a loop, giving a different cell to the function on each loop. Very likely, this is what you are angling for, too. This is the calling procedure I used for testing.
Sub Test_JobCode()
' 303
Dim R As Long
For R = 2 To Cells(Rows.Count, "A").End(xlUp).Row
' I urge you not to use syntax for addressing ranges when addressing cells
Debug.Print JobCode(Cells(R, "A")) ' actually, it's Cells(2, 1)
Next R
End Sub
Of course, instead of Debug.Print JobCode(Cells(R, "A")) you could also have Cells(R, "B").Value = JobCode(Cells(R, "A"))
The reason your Else statement didn't work was a logical error. In your code the Else belongs to the inner If (Criteria2), so it is never reached when Criteria1 already fails, as it does for "A4 paper". The "Overhead" caption doesn't apply only if criteria 1 & 2 aren't met, but whenever all previous efforts failed, which is slightly broader in meaning. This is combined with the fact that IsNumeric(Mid("A4", 2, 4)) = True, which causes that test not to fail as you might expect.
In rough terms, the code first checks if the first letter qualifies the entry for examination (and returns "Overhead" if it doesn't). Then the text is split into words, only the first one being considered. If it's too short or non-numeric, no job code is extracted, resulting in "Overhead" in the next step. If this test is passed, the final result is modified: the trailing comma is removed (if it exists) and anything appended with a hyphen is removed (if it exists). I'm not sure you actually want the latter, so you can easily remove that line. Or you might add more modifications at that point.
What you are trying to do is FAR easier using regular expression matching and replacing, so I recommend enabling that library of functions. The best news about doing that is that you can invoke those functions in EXCEL formulas and do not need to use Visual Basic for Applications at all.
To enable Regular Expressions as Excel functions:
Step 1: Enable the Regular Expression library in VBA.
A. In the Visual Basic for Applications window (where you enter VBA code) find the Tools menu and
select it, then select the References... entry in the sub-menu.
B. A dialogue box will appear listing the possible "Available References:" in alphabetical order.
Scroll down to find the entry "Microsoft VBScript Regular Expressions 5.5".
C. Check the checkbox on that line and press the OK button.
Step 2: Create function calls. In the Visual Basic for Applications window select Insert..Module. Then paste the following VBA code into the blank window that comes up:
' Some function wrappers to make the VBScript RegExp reference Library useful in both VBA code and in Excel & Access formulas
'
Private rg As RegExp 'All of the input data to control the RegExp parsing
' RegExp object contains 3 Boolean options that correspond to the 'i', 'g', and 'm' options in Unix-flavored regexp
' IgnoreCase - pretty self-evident. True means [A-Z] matches lowercase letters and vice versa, false means it won't
' IsGlobal - True means after the first match has been processed, continue on from the current point in DataString and look to process more matches. False means stop after first match is processed.
' MultiLine - False means ^ and $ match only Start and End of DataString, True means they match embedded newlines. This provides an option to process line-by-line when Global is true also.
'
' Returns true/false: does DataString match pattern? IsGlobal=True makes no sense here
Public Function RegExpMatch(DataString As String, Pattern As String, Optional IgnoreCase As Boolean = True, Optional IsGlobal As Boolean = False, Optional MultiLine As Boolean = False) As Boolean
If rg Is Nothing Then Set rg = New RegExp
rg.IgnoreCase = IgnoreCase
rg.Global = IsGlobal
rg.MultiLine = MultiLine
rg.Pattern = Pattern
RegExpMatch = rg.Test(DataString)
End Function
'
' Find <pattern> in <DataString>, replace with <ReplacePattern>
' Default IsGlobal=True means replace all matching occurrences. Call with False to replace only first occurrence.
'
Public Function RegExpReplace(DataString As String, Pattern As String, ReplacePattern As String, Optional IgnoreCase As Boolean = True, Optional IsGlobal As Boolean = True, Optional MultiLine As Boolean = False) As String
If rg Is Nothing Then Set rg = New RegExp
rg.IgnoreCase = IgnoreCase
rg.Global = IsGlobal
rg.MultiLine = MultiLine
rg.Pattern = Pattern
RegExpReplace = rg.Replace(DataString, ReplacePattern)
End Function
Now you can call RegExpMatch and RegExpReplace in Excel formulas, and we can start to think about how to solve your particular problem. Building the pattern piece by piece:
^[GSMP] - To be a match, your string must start with G, S, M, or P. The caret says to start at the beginning, and [GSMP] says to accept a G, S, M, or P in the next position.
\d+ - Next, any matching string must have a number of numeric digits. \d means one numeric digit, and the + is a modifier that means accept one or more of them.
(\.\d+)? - Then you could have a dot followed by some more digits, or not. This is a little more complicated: dot is a special character in regular expressions, so \. says to accept a literal dot. That is followed by \d+, which is one or more digits. The whole expression is enclosed in parentheses and followed by a ?, which means that what is in the parentheses can appear once or not at all.
.*$ - Finally comes the rest of the line, and we don't really care what is in it: zero or more characters (any) followed by the line's end.
That all goes together as ^[GSMP]\d+(\.\d+)?.*$.
Putting that pattern into our RegExpReplace call:
=RegExpReplace(A2,"^([GSMP]\d+(\.\d+)?).*$","$1")
We wrapped the part we were interested in keeping in parentheses because the "$1" as part of the replacement pattern says to use whatever was found inside the first set of parentheses. Here is that formula used in Excel
This works for all your examples but the last one, which is your else clause in your logic. We can fix that by testing whether the pattern matched using RegExpMatch:
=IF(RegExpMatch(A2,"^([GSMP]\d+(\.\d+)?).*$"),RegExpReplace(A2,"^([GSMP]\d+(\.\d+)?).*$","$1"),"Overhead")
This gives the results you are looking for and you have also gained a powerful text manipulation tool to solve future problems.
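If you would rather stay in VBA, the match test and the extraction can be folded into one late-bound function, so no library reference is needed. This is a rough sketch, assuming the same pattern as above without the trailing .*$ (the matched text itself is returned instead of being kept through a replacement). The name JobCodeRx is made up for illustration:

```vba
Function JobCodeRx(ByVal desc As String) As String
    ' Hypothetical helper: returns the job code at the start of desc,
    ' or "Overhead" when the description does not start with one.
    With CreateObject("vbscript.regexp")
        .IgnoreCase = True                 ' also accept a lowercase s, m, g or p
        .Pattern = "^[GSMP]\d+(\.\d+)?"    ' letter, digits, optional .digits
        If .Test(desc) Then
            JobCodeRx = .Execute(desc)(0).Value   ' text of the first match
        Else
            JobCodeRx = "Overhead"
        End If
    End With
End Function
```

It can be called from a cell as =JobCodeRx(A2) or from a loop in VBA. Note that \d+ accepts any number of digits; use \d{4} instead if exactly four digits are required, as Criteria2 in the question suggests.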

Parsing a String in Excel VBA without Leaving Trailing Spaces?

I am currently working on an Excel spreadsheet capable of exporting data from the Yahoo Finance API for dynamic stock quote analysis. I am having problems properly parsing the values into my Excel spreadsheet. The issue is that the last column of numeric values has a trailing space character, which prevents Excel from recognizing it as a number and formatting it in comma style.
Here is the function I use currently:
Function UpdateStockData(rawWebpageData As String)
Dim stockQuotes As Variant
Dim stockQuoteValues As Variant
Dim i As Integer
Dim j As Integer
stockQuotes = Split(rawWebpageData, vbLf)
For i = 0 To UBound(stockQuotes)
If InStr(stockQuotes(i), ",") > 0 Then
stockQuoteValues = Split(stockQuotes(i), ",")
For j = 0 To UBound(stockQuoteValues)
sheet.Cells(5 + i, 4 + j).Value = stockQuoteValues(j)
sheet.Cells(5 + i, 4 + j).Value = Trim(sheet.Cells(5 + i, 4 + j).Value)
Next j
End If
Next i
End Function
Here is some sample data:
43.99,44.375,41.97,42.62,30098498
573.37,577.11,568.01,573.64,1871694
16.03,16.14,15.93,16.17,25659400
128.54,129.56,128.32,129.36,31666340
126.32,126.68,125.68,126.27,1629499
105.57,106.00,104.78,106.35,4972937
82.58,83.21,82.20,83.37,6214421
27.89,27.9173,27.62,27.83,1003967
49.07,49.56,48.92,49.55,13870589
43.055,43.21,42.88,43.28,25748692
34.12,34.41,33.72,34.095,23005798
159.42,160.56,158.72,161.03,3633635
43.01,43.90,41.00,40.30,10075067
100.25,100.48,99.18,99.74,9179359
139.54,140.49,138.75,140.69,1311226
119.86,120.05,118.7828,120.20,2931459
42.50,42.98,42.47,42.95,16262994
78.02,78.99,77.66,78.99,1826464
89.87,91.35,89.86,91.02,1773576
15.84,15.98,15.76,15.99,78441600
69.50,70.2302,69.49,70.49,2343967
80.895,81.15,78.85,79.60,28126686
33.08,33.20,32.955,33.25,739726
83.08,83.80,82.34,83.16,4475302
64.72,64.90,64.27,64.27,5147320
35.64,41.85,35.40,40.78,15871339
83.08,83.80,82.34,83.16,4475302
22.93,23.099,22.71,23.10,5290225
18.47,19.00,18.30,18.98,71891
69.65,69.684,69.08,69.98,5992137
154.35,155.22,154.00,155.57,4476188
80.08,81.16,79.77,81.51,7731275
47.79,48.87,47.31,48.58,2219634
23.04,23.21,22.97,23.23,891504
114.76,115.47,114.25,116.07,3799034
80.63,81.56,80.56,81.91,6140957
25.66,25.77,25.47,25.86,31543764
87.18,87.96,86.93,87.62,13467554
58.31,58.795,57.61,58.255,5791024
174.62,175.78,174.41,176.15,1035588
84.35,85.24,84.21,85.16,7369986
42.03,42.25,41.69,41.98,3192667
34.19,34.49,34.01,34.57,15652895
101.65,102.12,101.17,102.34,8665474
7.88,8.01,7.84,7.88,10425638
62.13,62.17,61.3525,61.97,16626413
23.10,23.215,22.85,23.18,651929
The last value of each row of data above is where the problem occurs.
Check the value of the last character on the last iteration; it might be a carriage-return character. You can use the Left function to take what you want, or Replace to remove it.
It would be easier to answer if I could see the value of the rawWebpageData variable.
Check the cell format; you can try setting it to numeric if it is text.
If I were doing it, I would debug the data and step through it to look for characters that I'm not checking for.
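As a sketch of the first suggestion: because the data is split on vbLf, a line that actually ends in CR+LF leaves a carriage return on its last field, and Trim does not remove that. The helper below is hypothetical (its name and the extra non-breaking-space check are my assumptions, since the raw data is not shown):

```vba
Function CleanField(ByVal rawField As String) As String
    ' Hypothetical helper: removes characters that Trim leaves behind
    rawField = Replace(rawField, vbCr, vbNullString)       ' CR left over from CRLF line ends
    rawField = Replace(rawField, ChrW(160), vbNullString)  ' non-breaking space
    CleanField = Trim$(rawField)
End Function
```

In the loop, the two assignments to the cell could then be replaced by a single one: sheet.Cells(5 + i, 4 + j).Value = CleanField(stockQuoteValues(j)). To find out which character is really there, Debug.Print AscW(Right$(stockQuoteValues(j), 1)) on the last field shows its character code.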

Subscripts (font) in excel vba

I'm trying to populate an array which is composed of greek letters followed by a subscript "1". I already have the greek letters part:
Dim variables(), variables_o
j = 0
For i = 1 To 25
If i = 13 Or i = 15 Then
Else
j = j + 1
ReDim Preserve variables(j)
variables(j) = ChrW(944 + i)
End If
Next
But I'm having trouble with the subscript part. I figure that if I could use the with ... end with feature then I could do it but I'm having trouble figuring out what objects the with ... end with can take. On this website they say:
With...End With Statement (Visual Basic)
The data type of objectExpression can be any class or structure type or even a Visual Basic elementary type such as Integer.
But I don't know what that means. If I could do something like this:
dim one as string
one = "1"
with one
font.subscript = true
end with
Then I could figure out how to do what I want. But the with feature does not seem to act on strings. The problem I'm having is that most of the advice for fonts somewhere along the line use the cell method but I want to populate an array, so I'm having trouble. Again what I would ideally like to do is create some dimension which is simply a subscripted one and then alter my array as follows:
Dim variables(), variables_o
j = 0
For i = 1 To 25
If i = 13 Or i = 15 Then
Else
j = j + 1
ReDim Preserve variables(j)
variables(j) = ChrW(944 + i) & subscript_one
End If
Next
To my knowledge, there are no out-of-the-box methods or properties to store the font.Subscript property of a character or series of characters within a string that also contains the characters.
You could use inline tags, like in HTML, to indicate where the subscript begins and ends. For example:
variables(j) = ChrW(944 + i) & "<sub>1</sub>"
Then, when you write out the variable, you would parse the string, remove the tags and set the font.Subscript property accordingly.
However, if you're always appending a '1' to each Greek letter, I would just append it to the string, then set the font.Subscript property on the last character of the string when outputting it. For example:
variables(j) = ChrW(944 + i) & "1"
...
For j = 1 To UBound(variables) ' the array in the question is filled from index 1
With Worksheets("Sheet1").Cells(j, 1)
.Value = variables(j)
.Characters(Len(variables(j)), 1).Font.Subscript = True
End With
Next j
If you're writing to something other than a cell in a worksheet, it has to support Rich-Text in order for the subscript to show, e.g. a Rich-Text enabled TextBox on a user form. You should be able to use the .Characters object on those controls in a similar manner.
See MSDN-Characters Object for more information.
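The inline-tag idea mentioned above could be sketched as follows. This is only an illustration under simple assumptions: the Sub name is made up, the target is a worksheet cell, and each string holds at most one <sub>...</sub> pair:

```vba
Sub WriteWithSubscript(ByVal target As Range, ByVal tagged As String)
    ' Hypothetical helper: writes tagged to the cell without the tags and
    ' subscripts the characters that were between <sub> and </sub>.
    Dim openPos As Long, closePos As Long, subLen As Long

    openPos = InStr(1, tagged, "<sub>")
    closePos = InStr(1, tagged, "</sub>")
    If openPos = 0 Or closePos < openPos Then
        target.Value = tagged              ' no complete tag pair: write as is
        Exit Sub
    End If

    subLen = closePos - openPos - 5        ' 5 = Len("<sub>")
    target.Value = Left$(tagged, openPos - 1) & _
                   Mid$(tagged, openPos + 5, subLen) & _
                   Mid$(tagged, closePos + 6)   ' 6 = Len("</sub>")
    ' In the untagged text the subscript part starts where <sub> used to be
    If subLen > 0 Then target.Characters(openPos, subLen).Font.Subscript = True
End Sub
```

For example, WriteWithSubscript Worksheets("Sheet1").Cells(1, 1), ChrW(945) & "<sub>1</sub>" writes the letter alpha followed by a subscripted 1.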

Combining formulas

I have this formula in a table which basically collects data from two columns and combines them. Now, I'm looking to combine this formula with a REPLACE formula that basically takes these characters æ,ø,å and replaces them with a,o,a.
Here's the formula:
=LOWER(LEFT(tableFaste[[#This Row];[Fornavn:]])&tableFaste[[#This Row];[Etternavn:]])
Sorry, don't know of a Formula way to remove any of a list of characters from a string. You might have to revert to vba for this. Here's a user defined function to do it. Your formula will become
=DeleteChars([#UserName],{"æ","ø","å";"a","o","a"})
To replace the characters, use {"æ","ø","å";"a","o","a"}, where the list up to the ; holds the old characters and the list after the ; holds the new ones. You can make the lists as long as you need; just make sure they are the same length.
To delete the characters instead, use {"æ","ø","å"}, an array list of the characters you want to remove.
UDF code:
Function DeleteChars(r1 As Range, ParamArray c() As Variant) As Variant
Dim i As Long
Dim s As String
s = r1
If UBound(c(0), 1) = 1 Then
For i = LBound(c(0), 2) To UBound(c(0), 2)
s = Replace(s, c(0)(1, i), "")
Next
Else
For i = LBound(c(0), 2) To UBound(c(0), 2)
s = Replace(s, c(0)(1, i), c(0)(2, i))
Next
End If
DeleteChars = s
End Function
You can use SUBSTITUTE
=SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(LOWER(LEFT(tableFaste[[#This Row];[Fornavn:]])&tableFaste[[#This Row];[Etternavn:]]),"æ","a"),"ø","o"),"å","a")

Resources