Excel - VBA: Make the "Replace" function more specific

I am currently encountering a problem which doesn't seem that hard to fix, yet I can't find a clean way of doing it on my own.
I am using the "Replace" function to change some expressions in a sentence typed by a user. For example, if the user types "va", I want it to be turned into "V. A." so it will match my database more easily in further operations.
Here is my simple code to do it :
sMain.Range("J3").Replace "VA", "V. A."
It works well.
The problem is that it matches "VA" not only as a standalone expression but also as part of other words.
So if my user types "Vatican", it turns it into "V. A.tican", which of course I don't want.
Do you know an easy way to make my code replace ONLY whole words matching the expression? (I have dozens of these replacement lines, so ideally I would act directly on the "Replace" calls, if possible.)
Thanks in advance !

Do this:
sMain.Range("J3").Replace " VA ", " V. A. "
Then handle the cases where the original string starts or ends with VA.
Also handle every separator that can surround the word, for example tab, space or comma.
To do that, put the separators back in the replacement text so they are not swallowed:
Const nSep As Integer = 3
Dim sep(1 To nSep) As String
Dim i As Integer, j As Integer
sep(1) = " "
sep(2) = vbTab
sep(3) = ","
For i = 1 To nSep
    For j = 1 To nSep
        ' Keep the same separators around the replacement
        sMain.Range("J3").Replace sep(i) & "VA" & sep(j), sep(i) & "V. A." & sep(j)
    Next j
Next i

You can split the text up and check each word. I have put it into a function for ease of use and flexibility.
Function ReplaceWordOnly(sText As String, sFind As String, sReplace As String) As String
    Dim aText As Variant, oText As String, sNext As String, i As Long
    aText = Split(sText, " ")
    For i = 0 To UBound(aText)
        oText = aText(i)
        ' Check whether the word starts with sFind (case-insensitive)
        If LCase(Left(oText, Len(sFind))) = LCase(sFind) Then
            ' Character right after the match; empty if the word ends there
            sNext = Mid(oText, Len(sFind) + 1, 1)
            ' Omit the word if a letter follows the match (e.g. "Vatican")
            If Not sNext Like "[A-Za-z]" Then
                aText(i) = Replace(oText, sFind, sReplace, 1, -1, vbTextCompare)
            End If
        End If
    Next
    ReplaceWordOnly = Join(aText, " ")
End Function
Example output:
?ReplaceWordOnly("there is a vatican in vA.","Va","V. A.")
there is a vatican in V. A..
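As an alternative sketch (not taken from the answers above), a regular expression with \b word boundaries can do the whole-word check in one step. The function name ReplaceWholeWord is hypothetical, and the sketch assumes that sFind contains only letters or digits (regex metacharacters in it would need escaping) and that sReplace contains no "$".

```vba
' Hypothetical helper: replaces sFind only where it appears as a whole word.
' Assumes sFind is plain letters/digits and sReplace contains no "$".
Function ReplaceWholeWord(ByVal sText As String, ByVal sFind As String, _
                          ByVal sReplace As String) As String
    With CreateObject("VBScript.RegExp")
        .Global = True        ' replace every occurrence, not just the first
        .IgnoreCase = True    ' "va", "Va" and "VA" all match
        .Pattern = "\b" & sFind & "\b"
        ReplaceWholeWord = .Replace(sText, sReplace)
    End With
End Function
```

It could be applied to the cell with something like sMain.Range("J3").Value = ReplaceWholeWord(sMain.Range("J3").Value, "VA", "V. A."). With the input from the example above, "there is a vatican in vA.", it leaves "vatican" untouched and returns "there is a vatican in V. A..".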

Related

Remove Certain Characters from a String using UDF

I have a column which contain cells that have some list of alphanumeric number system as follows:
4A(4,5,6,7,8,9); 4B(4,5,7,8); 3A(1,2,3); 3B(1,2,3), 3C(1,2)
On a cell next to it, I use a UDF function to get rid of special characters "(),;" in order to leave the data as
4A456789 4B4578 3A123 3B123 3C12
Function RemoveSpecial(Str As String) As String
Dim SpecialChars As String
Dim i As Long
SpecialChars = "(),;-abcdefghijklmnopqrstuvwxyz"
For i = 1 To Len(SpecialChars)
Str = Replace$(Str, Mid$(SpecialChars, i, 1), "")
Next
RemoveSpecial = Str
End Function
For the most part this works well. However, on certain occasions, the cell would contain an unorthodox pattern such as when a space is included between the 4A and the parenthesized items:
4A (4,5,6,7,8,9);
or when text appears inside the parentheses (including two spaces on each side):
4A (4,5, skip 8,9);
or a space appears between the first two characters:
4 A(4,5,6)
How would you fix this so that the random spaces are removed, keeping only the spaces that delimit the actual combinations of data?
One strategy would be to substitute the patterns you want to keep before eliminating the "special" characters, then restore the desired patterns.
From your sample data, it looks like you want to keep a space only if it follows ); or ),
Something like this:
Function RemoveSpecial(Data As Variant) As Variant
Dim SpecialChars As String
Dim KeepStr As Variant, PlaceHolder As Variant, ReplaceStr As Variant
Dim i As Long
Dim DataStr As String
SpecialChars = " (),;-abcdefghijklmnopqrstuvwxyz"
KeepStr = Array("); ", "), ")
PlaceHolder = Array("~0~", "~1~") ' choose a PlaceHolder that won't appear in the data
ReplaceStr = Array(" ", " ")
DataStr = Data
For i = LBound(KeepStr) To UBound(KeepStr)
DataStr = Replace$(DataStr, KeepStr(i), PlaceHolder(i))
Next
For i = 1 To Len(SpecialChars)
DataStr = Replace$(DataStr, Mid$(SpecialChars, i, 1), vbNullString)
Next
For i = LBound(KeepStr) To UBound(KeepStr)
DataStr = Replace$(DataStr, PlaceHolder(i), ReplaceStr(i))
Next
RemoveSpecial = Application.Trim(DataStr)
End Function
Another strategy would be regular expressions (RegEx)
It looks like a regular expression could come in handy here, for example:
Function RemoveSpecial(Str As String) As String
With CreateObject("vbscript.regexp")
.Global = True
.Pattern = "\)[;,]( )|[^A-Z\d]+"
RemoveSpecial = .Replace(Str, "$1")
End With
End Function
I have used the regular expression:
\)[;,]( )|[^A-Z\d]+
See an online demo to view the result in your browser. The way this works is to apply a form of what some would call "The best regex trick ever!"
\)[;,]( ) - Match a literal closing parenthesis, then either a comma or a semicolon, then capture a space character in the 1st capture group.
| - Or, as the alternative:
[^A-Z\d]+ - One or more characters that are not in the given character class (uppercase letters and digits).
EDIT:
In case you have values like 4A; or 4A, you can use:
(?:([A-Z])|\))[;,]( )|[^A-Z\d]+
And replace with $1$2. See an online demo.
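To check the first pattern against the irregular inputs from the question, a small test routine can help. This is only a sketch: the Sub name DemoRemoveSpecial is made up, and the sample strings are copied from the question.

```vba
' Hypothetical test routine: runs the first pattern over the question's samples.
Sub DemoRemoveSpecial()
    Dim samples As Variant, item As Variant
    samples = Array("4A(4,5,6,7,8,9); 4B(4,5,7,8)", _
                    "4A (4,5, skip 8,9);", _
                    "4 A(4,5,6)")
    With CreateObject("VBScript.RegExp")
        .Global = True
        .Pattern = "\)[;,]( )|[^A-Z\d]+"
        For Each item In samples
            ' "$1" keeps the captured space (if any) and drops everything else matched
            Debug.Print .Replace(item, "$1")
        Next item
    End With
End Sub
```

The Immediate window should show 4A456789 4B4578, then 4A4589, then 4A456.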

Excel FindJobCode's problems

I am new to VBA and I have the code below to find job numbers in a description.
However, I have 3 problems with it:
1. If the 1st character is a lowercase letter such as "s" or "m", it shows an error.
2. I cannot solve Example3: the result shows "M3045.67," but all I need is "M3045.67", without the comma.
3. I don't know why the code fails to run Range("E2").Value = "Overhead" after Else in Example5.
For problem 3, I could get the result "Overhead" before I added the 2nd criterion. Is something wrong there? Please help, thanks.
P.S. The looping will be added after solving the above questions.
Sub FindCode()
'Example1 : G5012.123 Management Fee / Get Result = G5012.123
'Example2 : G3045.67 Management Fee / Get Result = G3045.67
'Example3 : M3045.67, S7066 Rental Fee / Get Result = M3045.67,
'Example4 : P9876-123A Car Park / Get Result = P9876
'Example5 : A4 paper / Get result = Overhead
'Criteria1 : 1st Character = G / S / M / P
If Left(Range("A2"), 1) = "G" Or Left(Range("A2"), 1) = "S" Or Left(Range("A2"), 1) = "M" Or Left(Range("A2"), 1) = "P" Then
'Criteria2 : 2nd-5th Character = Number only
If IsNumeric(Mid(Range("A2"), 2, 4)) Then
'Get string before "space"
Range("E2").Value = Left(Range("A2"), InStr(1, Range("A2"), " ") - 1)
Else
'If not beginning from Crit 1&2, show "Overhead"
Range("E2").Value = "Overhead"
End If
End If
'If start from "P", get first 5 string
If Left(Range("A2"), 1) = "P" And IsNumeric(Mid(Range("A2"), 2, 4)) Then
Range("E2").Value = Left(Range("A2"), 5)
Else
End If
End Sub
The function below will extract the job number and return it to the procedure that called it.
Function JobCode(Cell As Range) As String
' 303
'Example1 : G5012.123 Management Fee / Get Result = G5012.123
'Example2 : G3045.67 Management Fee / Get Result = G3045.67
'Example3 : M3045.67, S7066 Rental Fee / Get Result = M3045.67
'Example4 : P9876-123A Car Park / Get Result = P9876
'Example5 : A4 paper / Get result = Overhead
Dim Fun As String ' function return value
Dim Txt As String ' Text to extract number from
' Minimize the number of times your code reads from the sheet because it's slow
Txt = Cell.Value ' actually, it's Cells(2, 1)
' Criteria1 : 1st Character = G / S / M / P
If Len(Txt) > 0 And InStr("GSMP", UCase(Left(Txt, 1))) > 0 Then
Txt = Split(Txt)(0) ' split on blank, take first element
' Criteria2 : 2nd-5th Character = Number only
' Isnumeric(Mid("A4", 2, 4)) = true
If (Len(Txt) >= 5) And (IsNumeric(Mid(Txt, 2, 4))) Then
Fun = Replace(Txt, ",", "")
Fun = Split(Fun, "-")(0) ' discard "-123A" in example 4
End If
End If
' If no job number was extracted, show "Overhead"
If Len(Fun) = 0 Then Fun = "Overhead"
JobCode = Fun
End Function
The setup as a function, rather than a sub, is typical for this sort of search. In my trials I had your 5 examples in A2:A6 and called them in a loop, giving a different cell to the function on each loop. Very likely, this is what you are angling for, too. This is the calling procedure I used for testing.
Sub Test_JobCode()
' 303
Dim R As Long
For R = 2 To Cells(Rows.Count, "A").End(xlUp).Row
' I urge you not to use syntax for addressing ranges when addressing cells
Debug.Print JobCode(Cells(R, "A")) ' actually, it's Cells(2, 1)
Next R
End Sub
Of course, instead of Debug.Print JobCode(Cells(R, "A")) you could also have Cells(R, "B").Value = JobCode(Cells(R, "A"))
The reason why your Else statement didn't work was a logical error. The "Overhead" caption shouldn't apply only when criteria 1 & 2 aren't met but whenever all previous efforts failed, which is slightly broader in meaning. In your code the Else belongs to the inner If (criterion 2), so nothing is written when criterion 1 fails. This is combined with the fact that IsNumeric(Mid("A4", 2, 4)) = True, so that test does not fail as you expected.
In rough terms, the code first checks whether the first letter qualifies the entry for examination (and returns "Overhead" if it doesn't). Then the text is split into words, only the first one being considered. If it's too short or non-numeric, no job code is extracted, resulting in "Overhead" in the next step. If this test is passed, the final result is modified: the trailing comma is removed (if it exists) and anything appended with a hyphen is removed (if it exists). I'm not sure you actually want the latter; if not, you can easily remove that line. Or you might add more modifications at that point.
What you are trying to do is FAR easier using regular expression matching and replacing, so I recommend enabling that library of functions. The best news about doing that is that you can invoke those functions in EXCEL formulas and do not need to use Visual Basic for Applications at all.
To enable Regular Expressions as Excel functions:
Step 1: Enable the Regular Expression library in VBA.
A. In the Visual Basic for Applications window (where you enter VBA code) find the Tools menu and
select it, then select the References... entry in the sub-menu.
B. A dialogue box will appear listing the possible "Available References:" in alphabetical order.
Scroll down to find the entry "Microsoft VBScript Regular Expressions 5.5".
C. Check the checkbox on that line and press the OK button.
Step 2: Create function calls. In the Visual Basic for Applications window select Insert..Module. Then paste the following VBA code into the blank window that comes up:
' Some function wrappers to make the VBScript RegExp reference Library useful in both VBA code and in Excel & Access formulas
'
Private rg As RegExp 'All of the input data to control the RegExp parsing
' RegExp object contains 3 Boolean options that correspond to the 'i', 'g', and 'm' options in Unix-flavored regexp
' IgnoreCase - pretty self-evident. True means [A-Z] matches lowercase letters and vice versa, false means it won't
' IsGlobal - True means after the first match has been processed, continue on from the current point in DataString and look to process more matches. False means stop after first match is processed.
' MultiLine - False means ^ and $ match only Start and End of DataString, True means they match embedded newlines. This provides an option to process line-by-line when Global is true also.
'
' Returns true/false: does DataString match pattern? IsGlobal=True makes no sense here
Public Function RegExpMatch(DataString As String, Pattern As String, Optional IgnoreCase As Boolean = True, Optional IsGlobal As Boolean = False, Optional MultiLine As Boolean = False) As Boolean
If rg Is Nothing Then Set rg = New RegExp
rg.IgnoreCase = IgnoreCase
rg.Global = IsGlobal
rg.MultiLine = MultiLine
rg.Pattern = Pattern
RegExpMatch = rg.Test(DataString)
End Function
'
' Find <pattern> in <DataString>, replace with <ReplacePattern>
' Default IsGlobal=True means replace all matching occurrences. Call with False to replace only first occurrence.
'
Public Function RegExpReplace(DataString As String, Pattern As String, ReplacePattern As String, Optional IgnoreCase As Boolean = True, Optional IsGlobal As Boolean = True, Optional MultiLine As Boolean = False) As String
If rg Is Nothing Then Set rg = New RegExp
rg.IgnoreCase = IgnoreCase
rg.Global = IsGlobal
rg.MultiLine = MultiLine
rg.Pattern = Pattern
RegExpReplace = rg.Replace(DataString, ReplacePattern)
End Function
Now you can call RegExpMatch and RegExpReplace in Excel formulas, and we can start to think about how to solve your particular problem.
To be a match, your string must start with G, S, M, or P. As a regular expression that is ^[GSMP], where the caret (^) says to start at the beginning and [GSMP] says to accept a G, S, M or P in the next position.
Next, any matching string must have a number of numeric digits. Code that as \d+, where \d means one numeric digit and + is a modifier that means accept one or more of them.
Then you could have a dot followed by some more digits, or not. This is a little more complicated: you code it as (\.\d+)? because the dot is a special character in regular expressions and \. says to accept a literal dot. It is followed by \d+, which is one or more digits, and the whole expression is enclosed in parentheses and followed by a ?, which means that what is in the parentheses can appear once or not at all.
Finally comes the rest of the line, and we don't really care what is in it. We code .*$ for zero or more characters (any) followed by the line's end.
That all goes together as ^[GSMP]\d+(\.\d+)?.*$.
Putting that pattern into our RegExpReplace call:
=RegExpReplace(A2,"^([GSMP]\d+(\.\d+)?).*$","$1")
We wrapped the part we were interested in keeping in parentheses because the "$1" as part of the replacement pattern says to use whatever was found inside the first set of parentheses.
This works for all your examples but the last one, which is the Else clause in your logic. We can fix that by testing whether the pattern matched using RegExpMatch:
=IF(RegExpMatch(A2,"^([GSMP]\d+(\.\d+)?).*$"),RegExpReplace(A2,"^([GSMP]\d+(\.\d+)?).*$","$1"),"Overhead")
This gives the results you are looking for and you have also gained a powerful text manipulation tool to solve future problems.
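If you would rather stay in VBA than use a worksheet formula, the same pattern can be applied directly. The following is a minimal sketch under a few assumptions: the function name JobCodeRegex is made up, it uses late binding (so no library reference is required), and it reads the job code from the first capture group instead of replacing the rest of the line.

```vba
' Hypothetical helper: returns the leading job code, or "Overhead" if none is found.
Function JobCodeRegex(ByVal Txt As String) As String
    With CreateObject("VBScript.RegExp")
        .IgnoreCase = True                  ' accept "s7066" as well as "S7066"
        .Pattern = "^([GSMP]\d+(\.\d+)?)"   ' letter, digits, optional .digits
        If .Test(Txt) Then
            ' SubMatches(0) is the text captured by the first parentheses
            JobCodeRegex = .Execute(Txt)(0).SubMatches(0)
        Else
            JobCodeRegex = "Overhead"
        End If
    End With
End Function
```

For the five examples this should return G5012.123, G3045.67, M3045.67, P9876 and Overhead respectively.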

How do i get part of string after a special character?

I have a column where I pick up increasing numbering values, and their format is xx_yy,
so the first is 1_0, the second 1_1 and so forth; now we are at 23_31.
I want to get the right side of the string; I am already getting the left side correctly
using
newActionId = Left(lastActionID, (Application.WorksheetFunction.Find("_", lastActionID, 1) - 1))
I wish to do the following (in plain words):
nextSubid = entire string value AFTER the special character "_"
I tried just switching Left to Right, but it didn't go so well. Do you have a suggestion?
You can use the Split function to get the relevant text.
Syntax: Split(expression, [ delimiter, [ limit, [ compare ]]])
Option Explicit
Sub Sample()
Dim id As String
Dim beforeSplChr As String
Dim afterSplChr As String
id = "23_31"
beforeSplChr = Split(id, "_")(0)
afterSplChr = Split(id, "_")(1)
Debug.Print beforeSplChr
Debug.Print afterSplChr
End Sub
Another way
Debug.Print Left(id, InStrRev(id, "_") - 1) '<~~ Left Part
Debug.Print Right(id, Len(id) - InStrRev(id, "_")) '<~~ Right Part
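The optional limit argument in the Split syntax above matters when an ID could contain more than one underscore. The following sketch uses a made-up value "23_31_7" and a hypothetical Sub name to show that a limit of 2 splits at the first "_" only.

```vba
' Hypothetical demo: limit = 2 returns at most two parts.
Sub DemoSplitLimit()
    Dim parts() As String
    parts = Split("23_31_7", "_", 2)   ' split at the first "_" only
    Debug.Print parts(0)               ' 23
    Debug.Print parts(1)               ' 31_7
End Sub
```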
Even though Siddharth Rout has given what can probably be considered a better answer here, I felt that this was worth adding:
To get the second part of the string using your original method, you would want to use the Mid function in place of Left, rather than trying to use Right.
Mid(string, start, [ length ])
Returns length characters from string, starting at the start position.
If length is omitted, it returns all characters from the start position to the end of the string.
nextSubid = Mid(lastActionID, Application.WorksheetFunction.Find("_", lastActionID, 1) + 1)
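A variation worth considering, sketched here with a hypothetical Sub name: VBA's own InStr returns 0 when the character is missing, whereas WorksheetFunction.Find raises a run-time error in that case, so InStr makes it easy to guard against IDs without an underscore.

```vba
' Hypothetical demo: Left/Mid with InStr, guarding against a missing "_".
Sub DemoLeftAndMid()
    Dim lastActionID As String
    Dim pos As Long
    lastActionID = "23_31"
    pos = InStr(lastActionID, "_")              ' 0 if there is no underscore
    If pos > 0 Then
        Debug.Print Left(lastActionID, pos - 1) ' 23
        Debug.Print Mid(lastActionID, pos + 1)  ' 31
    Else
        Debug.Print "No underscore in " & lastActionID
    End If
End Sub
```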
Just for fun (Split is the way to go here), an alternative way using regular expressions:
Sub Test()
Dim str As String: str = "23_31"
With CreateObject("VBScript.RegExp")
.Global = True
.Pattern = "\d+"
Debug.Print .Execute(str)(0) 'Left Part
Debug.Print .Execute(str)(1) 'Right Part
End With
End Sub
Btw, as per my comment, your first value could also be achieved through:
Debug.Print Val(str)
The split method of String is very useful for this type of query (the example below is Java, not VBA).
Like:
String s = "23_34";
String left = s.split("_")[0];
String right = s.split("_")[1];
Or you can combine the indexOf and substring methods:
String left = s.substring(0, s.indexOf('_'));
String right = s.substring(s.indexOf('_') + 1);

VBA: Add Carriage Return + Line Feed at the start of Uppercase phrase

I have cells that contain various information.
In these cells, there are multiple Uppercase phrases.
I would like to be able to split the contents of the cell by adding the CHAR(13) + CHAR(10) Carriage return - linefeed combination
to the start of each new Uppercase phrase.
The only consistency is that the multiple Uppercase phrases begin after a period (.) and before open parenthesis "("
Example:
- Add CRLF to start of PERSUADER
- Add CRLF to start of RIVER JEWEL
- Add CRLF to start of TAHITIAN DANCER
- Add CRLF to start of AMBLEVE
- Add CRLF to start of GINA'S HOPE
NOTE:
There are multiple periods (.) in the text.
I have highlighted the text in red for a visual purpose only (normal text/font during import).
I am OK with either formula, UDF or VBA sub.
TEXT
PERSUADER (1) won by a margin first up at Kyneton. Bit of authority about her performance there and with the stable finding form it's easy to see her going right on with that. Ran really well when placed at Caulfield second-up last prep and that rates well against these. RIVER JEWEL (2) has been racing well at big odds. I have to like the form lines that she brings back in class now. Shapes as a key danger. TAHITIAN DANCER (5) will run well. She was okay without a lot of room at Flemington last time. AMBLEVE (13) is winning and can measure up while GINA'S HOPE (11) wasn't too far from River Jewel at Flemington and ties in as a hope off that form line.
I was able to extract with this function - but not able to manipulate the data in the cell
This is my code so far:
Function UpperCaseWords(ByVal S As String) As String
Dim X As Long, Words() As String
Const OkayPunctuation As String = ",."";:'&,-?!"
For X = 1 To Len(OkayPunctuation)
S = Replace(S, Mid(OkayPunctuation, X, 1), " ")
Next
Words = Split(WorksheetFunction.Trim(S))
For X = 0 To UBound(Words)
If Words(X) Like "*[!A-Z]*" Then Words(X) = ""
Next
UpperCaseWords = Trim(Join(Words))
End Function
Your description is not the same as your examples.
None of your examples start after a dot.
Most start after a dot-space except
PERSUADER starts at the start of the string
GINA'S HOPE starts after a space
I incorporated those rules into a regular expression, but, since your upper case words can include punctuation, for brevity I just looked for
- words that excluded lower case letters and digits
- words at least three characters long
If that is not sufficient in your real data, the regex can easily be made more specific:
Option Explicit
Function upperCaseWords(S As String) As String
Dim RE As Object
Set RE = CreateObject("vbscript.regexp")
With RE
.Global = True
.MultiLine = True
.Pattern = "^|\s(\b[^a-z0-9]+\b\s*\()"
upperCaseWords = .Replace(S, vbCrLf & "$1")
End With
End Function
as per your wording
The only consistency is that the multiple Uppercase phrases begin
after a period (.) and before open parenthesis "("
this should do:
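Function UpperCaseWords(ByVal s As String) As String
    Dim parts As Variant
    Dim i As Long
    parts = Split(s, ". ")
    For i = LBound(parts) To UBound(parts)
        ' A sentence containing "(" is taken to start with an uppercase phrase
        If InStr(parts(i), "(") > 0 Then parts(i) = vbCrLf & parts(i)
    Next
    ' Join restores the ". " separators that Split removed
    UpperCaseWords = Join(parts, ". ")
End Function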
Since the OP accepted a formula solution, here is a formula answer.
Assume the data is in A1.
In B1, enter this formula and copy it across until it returns blank:
=TRIM(RIGHT(SUBSTITUTE(TRIM(MID(SUBSTITUTE(SUBSTITUTE(" (. "&$A1," while ",". ")," (",REPT(" ",700)),COLUMN(A1)*700,700))&" ",". ",REPT(" ",300)),300))

VBA Trim leaving leading white space

I'm trying to compare strings in a macro and the data isn't always entered consistently. The difference comes down to the amount of leading white space (ie " test" vs. "test" vs. " test")
For my macro the three strings in the example should be equivalent. However I can't use Replace, as any spaces in the middle of the string (ex. "test one two three") should be retained. I had thought that was what Trim was supposed to do (as well as removing all trailing spaces). But when I use Trim on the strings, I don't see a difference, and I'm definitely left with white space at the front of the string.
So A) What does Trim really do in VBA? B) Is there a built in function for what I'm trying to do, or will I just need to write a function?
Thanks!
So, as Gary's Student alluded to, the character wasn't 32; it was in fact 160. Now, me being the simple man I am, white space is white space. In line with that view, I created the following function, which removes from both ends of the string ALL characters that don't actually display to the human eye (i.e. anything that is not a special character or alphanumeric). That function is below:
Function TrueTrim(v As String) As String
    Dim out As String
    out = v
    'Chop off the first character so long as it's white space
    Do While Len(out) > 0
        If Not IsInvisible(Left(out, 1)) Then Exit Do
        out = Mid(out, 2)
    Loop
    'Chop off the last character so long as it's white space
    Do While Len(out) > 0
        If Not IsInvisible(Right(out, 1)) Then Exit Do
        out = Left(out, Len(out) - 1)
    Loop
    'Capture result for return (an all-blank input ends up as "")
    TrueTrim = out
End Function

Private Function IsInvisible(ch As String) As Boolean
    'Characters that don't output something the human eye can see,
    'based on http://www.gtwiki.org/mwiki/?title=VB_Chr_Values
    Const bad As String = "||127||129||141||143||144||160||173||"
    Dim code As Long
    code = AscW(ch) And &HFFFF& 'AscW returns a signed Integer; map it to 0..65535
    IsInvisible = (code < 33) Or (InStr(1, bad, "||" & code & "||") <> 0)
End Function
Trim() will remove all leading and trailing spaces:
Sub demo()
Dim s As String, s2 As String, msg As String
Dim i As Long
s = " test "
s2 = Trim(s)
msg = ""
For i = 1 To Len(s2)
msg = msg & i & vbTab & Mid(s2, i, 1) & vbCrLf
Next i
MsgBox msg
End Sub
It is possible your data has characters that are not visible, but are not spaces either.
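One way to find out what those invisible characters are is to list the character code at each position. The routine below is only a sketch with a made-up name; a regular space shows up as 32, a tab as 9 and a non-breaking space as 160.

```vba
' Hypothetical diagnostic: prints the position and character code of every character.
Sub DumpCharCodes(ByVal s As String)
    Dim i As Long
    For i = 1 To Len(s)
        ' AscW returns the code of the single character at position i
        Debug.Print i, AscW(Mid$(s, i, 1))
    Next i
End Sub
```

For example, calling DumpCharCodes ChrW(160) & "test" from the Immediate window lists 160 for position 1, followed by the codes of the four letters.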
Without seeing your code it is hard to know, but you could also use the Application.WorksheetFunction.Clean() method, which removes non-printable characters, in conjunction with Trim().
MSDN Reference page for WorksheetFunction.Clean()
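Note that Clean was designed for the control characters with codes 0 to 31, so on its own it would not remove a non-breaking space (code 160). A minimal sketch that combines the ideas follows; the function name CleanTrim is hypothetical, and the sketch assumes it is acceptable to turn every non-breaking space into a regular space and to drop control characters (including tabs) anywhere in the string.

```vba
' Hypothetical helper: normalises non-breaking spaces, strips control
' characters, then trims regular spaces from both ends.
Function CleanTrim(ByVal s As String) As String
    s = Replace(s, ChrW(160), " ")               ' non-breaking space -> space
    s = Application.WorksheetFunction.Clean(s)   ' removes codes 0 to 31
    CleanTrim = Trim$(s)                         ' removes leading/trailing spaces
End Function
```

Spaces in the middle of the string, as in "test one two three", are kept.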
Why don't you try using the Instr function instead? Something like this
Function Comp2Strings(str1 As String, str2 As String) As Boolean
If InStr(str1, str2) <> 0 Or InStr(str2, str1) <> 0 Then
Comp2Strings = True
Else
Comp2Strings = False
End If
End Function
Basically you are checking whether str1 contains str2 or str2 contains str1. This works without trimming the data, but note that it also treats strings such as "test" and "test one" as equal.
VBA's Trim function is limited to dealing with spaces. It will remove spaces at the start and end of your string.
In order to deal with things like newlines and tabs, I've always imported the Microsoft VBScript RegEx library and used it to replace whitespace characters.
In your VBA window, go to Tools, References, then find Microsoft VBScript Regular Expressions 5.5. Check it and hit OK.
Then you can create a fairly simple function to trim all white space, not just spaces.
Private Function TrimEx(ByVal stringToClean As String) As String
Dim re As New RegExp
' Matches any whitespace at start of string
re.Pattern = "^\s*"
stringToClean = re.Replace(stringToClean, "")
' Matches any whitespace at end of string
re.Pattern = "\s*$"
stringToClean = re.Replace(stringToClean, "")
TrimEx = stringToClean
End Function
Non-printable characters separate the lines of a web page. In the examples below I replaced them with X, Y and Z.
Debug.Print Trim(Mid("X test ", 2)) ' skip 1 non-printable: start at position 2
Debug.Print Trim(Mid("XY test ", 3)) ' skip 2 non-printables: start at position 3
Debug.Print Trim(Mid("X Y Z test ", 2)) ' more rounds needed :)
Large blocks of text can be chopped up neatly with built-in tools (InStr, Mid, Left and others). Text gathered from several child elements (e.g. taking .textContent rather than .innerText) may contain several non-printable characters to cope with, yet DOM and regex are not for beginners. Addressing the inner text of sub-elements precisely (child elements one by one!) may help avoid non-printable characters.

Resources