Issues stripping special characters from text in VBA - excel

I have an Excel file that pulls in data from a csv, manipulates it a bit, and then saves it down as a series of text files.
There are some special characters in the source data that trip things up so I added this to strip them out
Const SpecialCharacters As String = "!,#,#,$,%,^,&,*,(,),{,[,],},?,â,€,™"
Function ReplaceSpecialCharacters(myString As String) As String
Dim newString As String
Dim char As Variant
newString = myString
For Each char In Split(SpecialCharacters, ",")
newString = Replace(newString, char, "")
Next
ReplaceSpecialCharacters = newString
End Function
The issue is that this doesn't catch all of them. When I try to process the following text it slips through the above code and causes Excel to error out.
Hero’s Village
I think the issue is that the special character isn't being recognized by Excel itself. I was only able to get the text to look like it does above by copying it out of Excel and pasting it into a different IDE. In Excel it displays differently in each place:
[Screenshots not reproduced: the text as shown in the workbook, in the edit field, and in the Immediate window]
Based on this site it looks like it's having issues displaying the ' character, but how do I get it to fix/filter it out if it can't even read it properly in VBA itself?

Option Explicit
dim mystring as String
dim regex as new RegExp
Private Function rgclean(ByVal mystring As String) As String
'Removes every character that is not a space or a \w character
'and returns the cleaned string
With regex
.Global = True
.Pattern = "[^ \w]" 'matches anything except spaces and \w characters (letters, digits, underscore)
End With
rgclean = regex.Replace(mystring, "") 'replaces every match with ""
End Function
Try using a regular expression.
Make sure you enable regular expressions under:
Tools > References > checkbox: "Microsoft VBScript Regular Expressions 5.5"
Pass the "mystring" string variable into the function (rgclean). The function finds anything that is not a space or a word character (\w, meaning A-Z, a-z, 0-9, or underscore), replaces it with "", and returns the string.
The function will remove pretty much any symbol in the string. Letters, numbers, underscores, and spaces are NOT removed.

Here is the opposite approach. Remove ALL characters that are not included in this group of 62:
ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789
The code:
Const ValidCharacters As String = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789"
Function ReplaceSpecialCharacters(myString As String) As String
Dim newString As String, L As Long, i As Long
Dim char As Variant
newString = myString
L = Len(newString)
For i = 1 To L
char = Mid(newString, i, 1)
If InStr(ValidCharacters, char) = 0 Then
newString = Replace(newString, char, "#")
End If
Next i
ReplaceSpecialCharacters = Replace(newString, "#", "")
End Function
Note:
You can also add characters to the string ValidCharacters if you want to retain them.
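A further option, offered as a sketch rather than something taken from the answers above: filter by character code instead of by literal character, so it no longer matters how the VBA editor displays the character. The sequence â€™ is what the right single quotation mark (U+2019, decimal 8217) typically looks like when UTF-8 text is read as Windows-1252. The function name KeepPrintableAscii and the 32 to 126 range are illustrative assumptions; widen the range if you need to keep accented letters.

```vba
Option Explicit

' Hypothetical helper: keeps only printable ASCII (codes 32 to 126).
' Everything else, including U+2019 and the mojibake characters, is dropped.
Function KeepPrintableAscii(ByVal s As String) As String
    Dim i As Long
    Dim code As Long
    Dim ch As String
    Dim result As String

    For i = 1 To Len(s)
        ch = Mid$(s, i, 1)
        code = AscW(ch)   ' may be negative for codes above 32767; those fail the test below
        If code >= 32 And code <= 126 Then
            result = result & ch
        End If
    Next i

    KeepPrintableAscii = result
End Function

Sub DemoKeepPrintableAscii()
    ' ChrW(8217) builds the curly apostrophe without typing it in the editor
    Debug.Print KeepPrintableAscii("Hero" & ChrW(8217) & "s Village")
End Sub
```

Running DemoKeepPrintableAscii prints Heros Village in the Immediate window. If you would rather keep an apostrophe, replace the character first with Replace(s, ChrW(8217), "'") and then filter.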

Related

In VBA, how to extract the string before a number from the text

From ActiveWorkbook.name, I would like to extract the strings that are before (left side of ) the numbers. Since I want to use the same code in multiple workbooks, the file names would be variable, but every file name has date info in the middle (yyyymmdd).
In a worksheet I can use the formula below, but can I apply the same kind of method in VBA?
=LEFT(A1,MIN(FIND({0,1,2,3,4,5,6,7,8,9},ASC(A1)&1234567890))-1)
Example: MyExcelWorkbook_Management_20200602_MyName.xlsm
In above case, I want to extract "MyExcelWorkbook_Management_".
The most basic thing you could do is to replicate something that worked for you in Excel through Evaluate:
Sub Test()
Dim str As String: str = "MyExcelWorkbook_Management_20200602_MyName.xlsm"
Debug.Print Evaluate(Replace("=LEFT(""X"",MIN(FIND({0,1,2,3,4,5,6,7,8,9},ASC(""X"")&1234567890))-1)", "X", str))
End Sub
Pretty? Not really, but it does the job, although it has its limitations.
You could also use a regular expression to extract everything before the first digit:
Dim str As String
str = "MyExcelWorkbook_Management_20200602_MyName.xlsm"
With CreateObject("vbscript.regexp")
.Pattern = "^\D*"
.Global = True
MsgBox .Execute(str)(0)
End With
Gives:
MyExcelWorkbook_Management_
So basically you want to use the Mid function to look for the first numerical character in your input string, and then cut your input string at that position.
That means we need to loop through the string from left to right, look at one character at a time and see if it is a digit or not.
This code does exactly that:
Option Explicit
Sub extractLeftText()
Dim someString As String
Dim result As String
someString = "Hello World1234"
Dim i As Long
Dim c As String 'one character of your string
For i = 1 To Len(someString)
c = Mid(someString, i, 1)
If IsNumeric(c) Then 'a digit at i = 1 is safe: Left(someString, 0) returns an empty string
result = Left(someString, i - 1)
Exit For
End If
Next i
MsgBox result
End Sub
The last thing you need to do is load a workbook name into your VBA procedure. Generally this is done with the .Name property of the Workbook object:
Sub workbookName()
Dim wb As Workbook
Set wb = ActiveWorkbook
MsgBox wb.Name
End Sub
Of course you would need to find some way to replace the Set wb = ActiveWorkbook line with code that suits your purpose.
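Putting the two pieces together, a minimal sketch could look like this. The names TextBeforeFirstDigit and ShowWorkbookPrefix are made up for illustration, and the Like "[0-9]" test is swapped in for IsNumeric so that only the digits 0 to 9 count. Unlike the loop above, this version returns the whole string when no digit is found, which is an assumption about what you would want.

```vba
Option Explicit

' Hypothetical helper: returns the part of s to the left of the first digit.
' If s contains no digit, the whole string is returned unchanged.
Function TextBeforeFirstDigit(ByVal s As String) As String
    Dim i As Long

    For i = 1 To Len(s)
        If Mid$(s, i, 1) Like "[0-9]" Then
            TextBeforeFirstDigit = Left$(s, i - 1)
            Exit Function
        End If
    Next i

    TextBeforeFirstDigit = s
End Function

Sub ShowWorkbookPrefix()
    ' For MyExcelWorkbook_Management_20200602_MyName.xlsm this prints
    ' MyExcelWorkbook_Management_
    Debug.Print TextBeforeFirstDigit(ActiveWorkbook.Name)
End Sub
```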

vba Expected Array

Anybody have a good solution for recursive replace?
For example, you still end up with commas in this string returned by MsgBox:
Dim s As String
s = "32,,,,,,,,,,,,,,,,23"
MsgBox Replace(s, ",,", ",")
I only want one comma.
Here is code that I developed, but it doesn't compile:
Function RecursiveReplace(ByVal StartString As String, ByVal Find As String, ByVal Replace As String) As String
Dim s As String
s = Replace(StartString, Find, Replace)
t = StartString
Do While s <> t
t = s
s = Replace(StartString, Find, Replace)
Loop
RecursiveReplace = s
End Function
The compiler complains about the second line in the function:
s = Replace(StartString, Find, Replace)
It says Expected Array.
???
You can use a regular expression. This shows the basic idea:
Function CondenseCommas(s As String) As String
Dim RegEx As Object
Set RegEx = CreateObject("VBScript.RegExp")
RegEx.Pattern = ",+"
CondenseCommas = RegEx.Replace(s, ",")
End Function
Tested like:
Sub test()
Dim s As String
s = "32,,,,,,,,,,,,,,,,23"
MsgBox CondenseCommas(s)
End Sub
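As for the compile error itself: inside the function, the parameter named Replace hides the built-in Replace function, so Replace(StartString, Find, Replace) is read as indexing into a String variable, hence "Expected Array". The loop also calls Replace on StartString each time instead of on the previous result, and t is never declared. A sketch of a corrected loop follows; the names RepeatReplace, FindText and ReplaceText are illustrative choices.

```vba
Option Explicit

' Hypothetical corrected version: repeats the replacement until nothing changes.
' Parameter names avoid clashing with the built-in Replace function.
' Caution: this never terminates if ReplaceText itself contains FindText.
Function RepeatReplace(ByVal StartString As String, _
                       ByVal FindText As String, _
                       ByVal ReplaceText As String) As String
    Dim s As String
    Dim t As String

    s = StartString
    Do
        t = s
        s = Replace(t, FindText, ReplaceText)   ' work on the previous result
    Loop While s <> t

    RepeatReplace = s
End Function

Sub DemoRepeatReplace()
    Debug.Print RepeatReplace("32,,,,,,,,,,,,,,,,23", ",,", ",")
End Sub
```

DemoRepeatReplace prints 32,23. The regular expression above does the same in a single pass, so the loop is mainly useful when you do not want the RegExp dependency.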

Get only letters from string vb.net

I want to get only the letters from a string.
For example, let's say the string is: 123abc456d
I want to get: abcd
Looking for something like this but for letters in a string:
Dim mytext As String = "123a123"
Dim myChars() As Char = mytext.ToCharArray()
For Each ch As Char In myChars
If Char.IsDigit(ch) Then
MessageBox.Show(ch)
End If
Next
Thanks
You can do it like this :
Dim mytext As String = "123a123"
Dim RemoveChars As String = "0123456789" 'These are the chars that you want to remove from your mytext string
Dim FinalResult As String
Dim myChars() As Char = mytext.ToCharArray()
For Each ch As Char In myChars
If Not RemoveChars.Contains(ch) Then
FinalResult &= ch
End If
Next
MsgBox(FinalResult)
OR :
Dim mytext As String = "1d23ad123d"
Dim myChars() As Char = mytext.ToCharArray()
Dim FinalResult As String
For Each ch As Char In myChars
If Not Char.IsDigit(ch) Then
FinalResult &= ch
End If
Next
MsgBox(FinalResult)
Both will give you the same result.
Hope that helped you :)
You can use Regex to solve this problem. The pattern [^a-zA-Z] matches any character that is not a letter of the alphabet, and each match is removed by replacing it with nothing.
Dim mytext As String = "123a123"
Dim Result As String = Regex.Replace(mytext, "[^a-zA-Z]", "")
MessageBox.Show(Result)
Make sure you have this at the top of your code: Imports System.Text.RegularExpressions
Here is a LINQ one liner:
Debug.Print(String.Concat("123abc456d".Where(AddressOf Char.IsLetter)))
Result: abcd.
Here, .Where(AddressOf Char.IsLetter) treats the string as a list of chars, and only keeps letters in the list. Then, String.Concat re-builds the string out of the char list by concatenating the chars.

VB.NET remove specific chars between two characters in a string

In VB.NET, how do I remove a character from a string when it occurs between two known characters? For example, how do you remove the commas from the numbers between the hash signs in this string:
Balance,#163,464.24#,Cashbook Closing Balance:,#86,689.45#,Money,End
You can use this simple and efficient approach using a loop and a StringBuilder:
Dim text = "Balance,#163,464.24#,Cashbook Closing Balance:,#86,689.45#,Money,End"
Dim textBuilder As New System.Text.StringBuilder() ' fully qualified, so no Imports System.Text is needed
Dim inHashTag As Boolean = False
For Each c As Char In text
If c = "#"c Then inHashTag = Not inHashTag ' reverse Boolean
If Not inHashTag OrElse c <> ","c Then
textBuilder.Append(c) ' append if we aren't in hashtags or the char is not a comma
End If
Next
text = textBuilder.ToString()
Because I'm bad at regex:
Dim str = "Balance,#163,464.24#,Cashbook Closing Balance:,#86,689.45#,Money,End"
Dim split = str.Split("#"c)
If UBound(split) > 1 Then
For i = 1 To UBound(split) Step 2
split(i) = split(i).Replace(",", "")
Next
End If
str = String.Join("#", split)

Removing particular string from a cell

I have text in a range of cells like
Manufacturer#||#Coaster#|#|Width (side to side)#||#20" W####Height (bottom to top)#||#35" H#|#|Depth (front to back)#||#20.5" D####Seat Depth#||#14.25"**#|#|Material & Finish####**Composition#||#Wood Veneers & Solids#|#|Composition#||#Metal#|#|Style Elements####Style#||#Contemporary#|#|Style#||#Casual
From this cell I need to remove only the strings between #|#| and ####, as in #|#|"needtoremove"####, without affecting other strings.
I have tried find and replace, finding #|#|*#### and replacing it with #|#|. However, it's not giving the exact result.
Can anyone help me?
The other solution will remove anything between the first #|#| and ####, even the #||# etc.
In case you need to remove the text between #|#| and #### only if there is no other #||# in between, I think the simplest way is to use a regex.
You will need to activate the Microsoft VBScript Regular Expressions 5.5 library in Tools->References from the VBA editor.
Change Range("D166") to wherever your cell is. The expression as it is right now ("#\|#\|[A-Za-z0-9& ]*####") matches any text that starts with #|#|, ends with ####, and has any number of alphanumeric characters, & or spaces in between. You can add other characters between the brackets if needed.
Sub remove()
Dim reg As New RegExp
Dim pattern As String
Dim replace As String
Dim strInput As String
strInput = Range("D166").Value
replace = ""
pattern = "#\|#\|[A-Za-z0-9& ]*####"
With reg
.Global = True
.MultiLine = True
.IgnoreCase = False
.pattern = pattern
End With
If reg.test(strInput) Then Range("D166").Value = reg.replace(strInput, replace)
End Sub
Something like this, if that value is in cell A1:
Dim str As String
Dim i As Integer
Dim i2 As Integer
Dim ws As Excel.Worksheet
Set ws = Application.ActiveSheet
str = ws.Range("A1").Value
i = InStr(str, "#|#|")
i2 = InStr(str, "####")
str = Left(str, i) & Right(str, Len(str) - i2)
ws.Range("A1").Value = str

Resources