How to identify and remove a single letter from a string in a cell? - excel

I have a dataset of names in a column in Excel.
However, only some but not all of the names have a letter attached to the end of it (e.g. John Doe A, Kai Jin, Johnny Desplat Lang B, etc).
Can anyone think of a method to remove the letter from the end of the name from each row, if it is there? Such that, using the example above, I will be left with: John Doe, Kai Jin, Johnny Desplat Lang, etc.
I am fairly familiar with VBA and Excel and would be open to trying anything at all.
Thank you for your help with this question! Apologies beforehand if this seems like an elementary question but I have no idea how to begin to solve it.

"I am fairly familiar with VBA and Excel and would be open to trying anything at all."
If so, then this can be done with a simple formula if you wish to avoid VBA. With your value in A1:
=IF(MID(A1,LEN(A1)-1,1)=" ",LEFT(A1,LEN(A1)-2),A1)
If you must use VBA, I think the Like operator comes in handy:
Sub Test()
Dim arr As Variant: arr = Array("John Doe A", "Kai Jin", "Johnny Desplat Lang B")
For Each el In arr
If el Like "* ?" Then 'Or "* [A-Z]" if you must check for uppercase alpha.
Debug.Print Left(el, Len(el) - 2)
Else
Debug.Print el
End If
Next
End Sub

Just for fun and in order to demonstrate another approach via the Filter() function:
Function ShortenName(ByVal FullName As Variant) As String
'Purpose: remove a single last letter
Dim n: n = Split(FullName, " "): n = Len(n(UBound(n)))
ShortenName = Left(FullName, Len(FullName) + 2 * (n = 1))
End Function
Explanation
Applying the Split() function upon the full name and isolating the last name token (via UBound()) allows to check for a single letter length (variable n).
The function result returns the entire string length minus 2 (last letter plus preceding space) in case of a single letter (the the condition n = 1 then results in True equalling -1). - Alternatively you could have coded: ShortenName = Left(FullName, Len(FullName) - IIf(n = 1, 2, 0))

Related

Excel FindJobCode's problems

I am new in VBA and I have a code as below to find some job numbers in a description.
However, i have 3 problems on it...
if 1st character is small letter such as "s", "m", then it show error
i cannot solve Example3, the result will show "M3045.67," but all i need is "M3045.67" only, no comma
i don't know why it is failed to run the code Range("E2").Value = "Overhead" after Else in Example5
but for problem 3, i can run result "overhead" before i add 2nd criteria, is something wrong there ? Please help~~~thanks.
P.S. the looping will be added after solving above questions......
Sub FindCode()
'Example1 : G5012.123 Management Fee / Get Result = G5012.123
'Example2 : G3045.67 Management Fee / Get Result = G3045.67
'Example3 : M3045.67, S7066 Retenal Fee / Get Result = M3045.67,
'Example4 : P9876-123A Car Park / Get Result = P9876
'Example5 : A4 paper / Get result = Overehad
'Criteria1 : 1st Character = G / S / M / P
If Left(Range("A2"), 1) = "G" Or Left(Range("A2"), 1) = "S" Or Left(Range("A2"), 1) = "M" Or Left(Range("A2"), 1) = "P" Then
'Criteria2 : 2nd-5th Character = Number only
If IsNumeric(Mid(Range("A2"), 2, 4)) Then
'Get string before "space"
Range("E2").Value = Left(Range("A2"), InStr(1, Range("A2"), " ") - 1)
Else
'If not beginning from Crit 1&2, show "Overhead"
Range("E2").Value = "Overhead"
End If
End If
'If start from "P", get first 5 string
If Left(Range("A2"), 1) = "P" And IsNumeric(Mid(Range("A2"), 2, 4)) Then
Range("E2").Value = Left(Range("A2"), 5)
Else
End If
End Sub
The function below will extract the job number and return it to the procedure that called it.
Function JobCode(Cell As Range) As String
' 303
'Example1 : G5012.123 Management Fee / Get Result = G5012.123
'Example2 : G3045.67 Management Fee / Get Result = G3045.67
'Example3 : M3045.67, S7066 Rental Fee / Get Result = M3045.67,
'Example4 : P9876-123A Car Park / Get Result = P9876
'Example5 : A4 paper / Get result = Overhead
Dim Fun As String ' function return value
Dim Txt As String ' Text to extract number from
' Minimize the number of times your code reads from the sheet because it's slow
Txt = Cell.Value ' actually, it's Cells(2, 1)
' Criteria1 : 1st Character = G / S / M / P
If InStr("GSMP", UCase(Left(Txt, 1))) Then
Txt = Split(Txt)(0) ' split on blank, take first element
' Criteria2 : 2nd-5th Character = Number only
' Isnumeric(Mid("A4", 2, 4)) = true
If (Len(Txt) >= 5) And (IsNumeric(Mid(Txt, 2, 4))) Then
Fun = Replace(Txt, ",", "")
Fun = Split(Fun, "-")(0) ' discard "-123A" in example 4
End If
End If
' If no job number was extracted, show "Overhead"
If Len(Fun) = 0 Then Fun = "Overhead"
JobCode = Fun
End Function
The setup as a function, rather than a sub, is typical for this sort of search. In my trials I had your 5 examples in A2:A6 and called them in a loop, giving a different cell to the function on each loop. Very likely, this is what you are angling for, too. This is the calling procedure I used for testing.
Sub Test_JobCode()
' 303
Dim R As Long
For R = 2 To Cells(Rows.Count, "A").End(xlUp).Row
' I urge you not to use syntax for addressing ranges when addressing cells
Debug.Print JobCode(Cells(R, "A")) ' actually, it's Cells(2, 1)
Next R
End Sub
Of course, instead of Debug.Print JobCode(Cells(R, "A")) you could also have Cells(R, "B").Value = JobCode(Cells(R, "A"))
The reason why your Else statement didn't work was a logical error. The "Overhead" caption doesn't apply if criteria 1 & 2 aren't met but if all previous efforts failed, which is slightly broader in meaning. This combined with the fact that Isnumeric(Mid("A4", 2, 4)) = True, causing the test not to fail as you expected.
In rough terms, the code first checks if the first letter qualifies the entry for examination (and returns "Overhead" if it doesn't). Then the text is split into words, only the first one being considered. If it's too short or non-numeric no job code is extracted resulting in "Overhead" in the next step. If this test is passed, the final result is modified: The trailing comma is removed (it it exists) and anything appended with a hyphen is removed (if it exists). I'm not sure you actually want this. So, you can easily remove the line. Or you might add more modifications at that point.
What you are trying to do is FAR easier using regular expression matching and replacing, so I recommend enabling that library of functions. The best news about doing that is that you can invoke those functions in EXCEL formulas and do not need to use Visual Basic for Applications at all.
To enable Regular Expressions as Excel functions:
Step 1: Enable the Regular Expression library in VBA.
A. In the Visual Basic for Applications window (where you enter VBA code) find the Tools menu and
select it, then select the References... entry in the sub-menu.
B. A dialogue box will appear listing the possible "Available References:" in alphabetical order.
Scroll down to find the entry "Microsoft VBScript Regular Expressions 5.5".
C. Check the checkbox on that line and press the OK button.
Step 2: Create function calls. In the Visual Basic for Applications window select Insert..Module. Then paste the following VBA code into the blank window that comes up:
' Some function wrappers to make the VBScript RegExp reference Library useful in both VBA code and in Excel & Access formulas
'
Private rg As RegExp 'All of the input data to control the RegExp parsing
' RegExp object contains 3 Boolean options that correspond to the 'i', 'g', and 'm' options in Unix-flavored regexp
' IgnoreCase - pretty self-evident. True means [A-Z] matches lowercase letters and vice versa, false means it won't
' IsGlobal - True means after the first match has been processed, continue on from the current point in DataString and look to process more matches. False means stop after first match is processed.
' MultiLine - False means ^ and $ match only Start and End of DataString, True means they match embedded newlines. This provides an option to process line-by-line when Global is true also.
'
' Returns true/false: does DataString match pattern? IsGlobal=True makes no sense here
Public Function RegExpMatch(DataString As String, Pattern As String, Optional IgnoreCase As Boolean = True, Optional IsGlobal As Boolean = False, Optional MultiLine As Boolean = False) As Boolean
If rg Is Nothing Then Set rg = New RegExp
rg.IgnoreCase = IgnoreCase
rg.Global = IsGlobal
rg.MultiLine = MultiLine
rg.Pattern = Pattern
RegExpMatch = rg.Test(DataString)
End Function
'
' Find <pattern> in <DataString>, replace with <ReplacePattern>
' Default IsGlobal=True means replace all matching occurrences. Call with False to replace only first occurrence.
'
Public Function RegExpReplace(DataString As String, Pattern As String, ReplacePattern As String, Optional IgnoreCase As Boolean = True, Optional IsGlobal As Boolean = True, Optional MultiLine As Boolean = False) As String
If rg Is Nothing Then Set rg = New RegExp
rg.IgnoreCase = IgnoreCase
rg.Global = IsGlobal
rg.MultiLine = MultiLine
rg.Pattern = Pattern
RegExpReplace = rg.Replace(DataString, ReplacePattern)
End Function
Now you can call RegExpMatch & RegExpReplace in Excel formulas and we can start to think of how to solve your particular problem. To be a match, your string must start with G, S, M, or P. In a regular expression code that is ^[GSMP], where the up-arrow says to start at the beginning and the [GSMP] says to accept a G, S, M or P in the next position. Then any matching string must next have a number of numeric digits. Code that as \d+, where the \d means one numeric digit and the + is a modifier that means accept one or more of them. Then you could have a dot followed by some more digits, or not. This is a little more complicated - you would code it as (\.\d+)? because dot is a special character in regular expressions and \. says to accept a literal dot. That is followed by \d+ which is one or more digits, but this whole expression is enclosed in parentheses and followed by a ?, which means what is in parentheses can appear once or not at all. Finally, comes the rest of the line and we don't really care what is in it. We code .*$ for zero or more characters (any) followed by the line's end. That all goes together as ^[GSMP]\d+(\.\d+)?.*$.
Putting that pattern into our RegExpReplace call:
=RegExpReplace(A2,"^([GSMP]\d+(\.\d+)?).*$","$1")
We wrapped the part we were interested in keeping in parentheses because the "$1" as part of the replacement pattern says to use whatever was found inside the first set of parentheses. Here is that formula used in Excel
This works for all your examples but the last one, which is your else clause in your logic. We can fix that by testing whether the pattern matched using RegExpMatch:
=IF(regexpMatch(A2,"^([GSMP]\d+(\.\d+)?).*$"),RegExpReplace(A2,"^([GSMP]\d+(\.\d+)?).*$","$1"),"Overhead")
This gives the results you are looking for and you have also gained a powerful text manipulation tool to solve future problems.

VBA: Add Carriage Return + Line Feed at the start of Uppercase phrase

I have cells that contain various information.
In these cells, there are multiple Uppercase phrases.
I would like to be able to split the contents of the cell by adding the CHAR(13) + CHAR(10) Carriage return - linefeed combination
to the start of each new Uppercase phrase.
The only consistency is that the multiple Uppercase phrases begin after a period (.) and before open parenthesis "("
Example:
- Add CRLF to start of PERSUADER
- Add CRLF to start of RIVER JEWEL
- Add CRLF to start of TAHITIAN DANCER
- Add CRLF to start of AMBLEVE
- Add CRLF to start of GINA'S HOPE
NOTE:
There are multiple periods (.) in the text.
I have highlighted the text in red for a visual purpose only (normal text/font during import).
I am OK with either formula, UDF or VBA sub.
TEXT
PERSUADER (1) won by a margin first up at Kyneton. Bit of authority about her performance there and with the stable finding form it's easy to see her going right on with that. Ran really well when placed at Caulfield second-up last prep and that rates well against these. RIVER JEWEL (2) has been racing well at big odds. I have to like the form lines that she brings back in class now. Shapes as a key danger. TAHITIAN DANCER (5) will run well. She was okay without a lot of room at Flemington last time. AMBLEVE (13) is winning and can measure up while GINA'S HOPE (11) wasn't too far from River Jewel at Flemington and ties in as a hope off that form line.
I was able to extract with this function - but not able to manipulate the data in the cell
This is my code so far:
Function UpperCaseWords(ByVal S As String) As String
Dim X As Long, Words() As String
Const OkayPunctuation As String = ",."";:'&,-?!"
For X = 1 To Len(OkayPunctuation)
S = Replace(S, Mid(OkayPunctuation, X, 1), " ")
Next
Words = Split(WorksheetFunction.Trim(S))
For X = 0 To UBound(Words)
If Words(X) Like "*[!A-Z]*" Then Words(X) = ""
Next
UpperCaseWords = Trim(Join(Words))
End Function
Your description is not the same as your examples.
None of your examples start after a dot.
Most start after a dot-space except
PERSUADER starts at the start of the string
GINA'S HOPE starts after a space
I incorporated those rules into a regular expression, but, since your upper case words can include punctuation, for brevity I just looked for
- words that excluded lower case letters and digits
- words at least three characters long
If that is not sufficient in your real data, the regex can easily be made more specific:
Option Explicit
Function upperCaseWords(S As String) As String
Dim RE As Object
Set RE = CreateObject("vbscript.regexp")
With RE
.Global = True
.MultiLine = True
.Pattern = "^|\s(\b[^a-z0-9]+\b\s*\()"
upperCaseWords = .Replace(S, vbCrLf & "$1")
End With
End Function
as per your wording
The only consistency is that the multiple Uppercase phrases begin
after a period (.) and before open parenthesis "("
this should do:
Function UpperCaseWords(ByVal s As String) As String
Dim w As Variant
Dim s1 As String
For Each w In Split(s, ". ")
If InStr(w, "(") Then w = Chr(13) + Chr(10) & w
s1 = s1 & w
Next
UpperCaseWords = s1
End Function
Since the OP accepted the formula solution, and here is a formula answer .
Assume data put in A1
In B1, enter formula and copied across until blank :
=TRIM(RIGHT(SUBSTITUTE(TRIM(MID(SUBSTITUTE(SUBSTITUTE(" (. "&$A1," while ",". ")," (",REPT(" ",700)),COLUMN(A1)*700,700))&" ",". ",REPT(" ",300)),300))

Comparing Strings in VBA

I have a basic programming background and have been self sufficient for many years but this problem I can't seem to solve. I have a program in VBA and I need to compare two strings. I have tried using the following methods to compare my strings below but to no avail:
//Assume Cells(1, 1).Value = "Cat"
Dim A As String, B As String
A="Cat"
B=Cell(1, 1).Value
If A=B Then...
If A Like B Then...
If StrCmp(A=B, 1)=0 Then...
I've even tried inputting the Strings straight into the code to see if it would work:
If "Cat" = "Cat" Then...
If "Cat" Like "Cat" Then...
If StrCmp("Cat" = "Cat", 1) Then...
VBA for some reason does not recognize these strings as equals. When going through Debugger it shows that StrComp returns 1. Do my strings have different Char lengths? Thanks for any help.
Posting as answer because it doesn't fit in the comments:
I find it hard to believe that something like:
MsgBox "Cat" = "Cat"
Would not display True on your machine. Please verify.
However, I do observe that you are most certainly using StrComp function incorrectly.
The proper use is StrComp(string, anotherstring, [comparison type optional])
When you do StrComp(A=B, 1) you are essentially asking it to compare whether a boolean (A=B will either evaluate to True or False) is equivalent to the integer 1. It is not, nor will it ever be.
When I run the following code, all four message boxes confirm that each statement evaluates to True.
Sub CompareStrings()
Dim A As String, B As String
A = "Cat"
B = Cells(1, 1).Value
MsgBox A = B
MsgBox A Like B
MsgBox StrComp(A, B) = 0
MsgBox "Cat" = "Cat"
End Sub
Update from comments
I don't see anything odd happening if I use an array, just FYI. Example data used in the array:
Modified routine to use an array:
Sub CompareStrings()
Dim A As String, B() As Variant
A = "Cat"
B = Application.Transpose(Range("A1:A8").Value)
For i = 1 To 8
MsgBox A = B(i)
MsgBox A Like B(i)
MsgBox StrComp(A, B(i)) = 0
MsgBox "Cat" = B(i)
Next
End Sub
What I would check is how you're instantiating the array. Range arrays (as per my example) are base 1. If it assigned some other way, it is most likely base 0, so check to make sure that you're comparing the correct array index.

Excel - VBA : Make the "replace" function more specific

I am currently encountering a problem which doesn't seem that hard to fix but, yet, I can't find a clean way of doing it on my own.
I am using the "Replace" function to change some expressions in a sentence typed by an user. For example, if the user types "va", I want it to be turned into "V. A." instead so it will match more easily with my database for further operations.
Here is my simple code to do it :
sMain.Range("J3").Replace "VA", "V. A."
It works well.
Problem is, it's not only spotting "VA" as an individual expression, but also as a part of words.
So if my user types "Vatican", it's gonna turn it into : "V. A.tican"... which of course I don't want.
Do you know how to easily specify my code to make it ONLY consider replacing the whole words matching the expression? (I have dozens of lines of these replacement so ideally, it would be better to act directly on the "replace" functions - if possible).
Thanks in advance !
Do this:
sMain.Range("J3").Replace " VA ", "V. A."
then handle the cases where the original string starts or ends with VA
also, handle all cases of separators which could be (for example) tab, space or comma.
To do that:
const nSep As Integer = 3
Dim sep(nSep) As String
sep(1) = " "
sep(2) = vbTab
sep(3) = ","
for i=1 to nSep
for j=1 to nSep
sMain.Range("J3").Replace sep(i) & "VA" & sep(j), "V. A."
next
next
Can split it up and check each word. I have put it into a function for easy of use and flexibility.
Function ReplaceWordOnly(sText As String, sFind As String, sReplace As String) As String
On Error Resume Next
Dim aText As Variant, oText As Variant, i As Long
aText = Split(sText, " ")
For i = 0 To UBound(aText)
oText = aText(i)
' Check if starting with sFind
If LCase(Left(oText, 2)) = LCase(sFind) Then
Select Case Asc(Mid(oText, 3, 1))
Case 65 To 90, 97 To 122
' obmit if third character is alphabet (checked by ascii code)
Case Else
aText(i) = Replace(oText, sFind, sReplace, 1, -1, vbTextCompare)
End Select
End If
Next
ReplaceWordOnly = Join(aText, " ")
End Function
Example output:
?ReplaceWordOnly("there is a vatican in vA.","Va","V. A.")
there is a vatican in V. A..

Combining formulas

I have this formula in a table which basically collects data from two columns and combines them. Now, I'm looking to combine this formula with a REPLACE formula that basically takes these characters æ,ø,å and replaces them with a,o,a.
Here's the formula:
=LOWER(LEFT(tableFaste[[#This Row];[Fornavn:]])&tableFaste[[#This Row];[Etternavn:]])
Sorry, don't know of a Formula way to remove any of a list of characters from a string. You might have to revert to vba for this. Here's a user defined function to do it. Your formula will become
=DeleteChars([#UserName],{"æ","ø","å";"a","o","a"})
To replace the characters use {"æ","ø","å";"a","o","a"} where the list up to the ; is the old characters, after the ; the new. You can make the list as long as you need, just make sure the lists are the same length.
To Delete the characters replace use {"æ","ø","å"} an array list of characters you want to remove
UDF code:
Function DeleteChars(r1 As Range, ParamArray c() As Variant) As Variant
Dim i As Long
Dim s As String
s = r1
If UBound(c(0), 1) = 1 Then
For i = LBound(c(0), 2) To UBound(c(0), 2)
s = Replace(s, c(0)(1, i), "")
Next
Else
For i = LBound(c(0), 2) To UBound(c(0), 2)
s = Replace(s, c(0)(1, i), c(0)(2, i))
Next
End If
DeleteChars = s
End Function
You can use SUBSTITUTE
=SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(LOWER(LEFT(tableFaste[[#This Row];[Fornavn:]])&tableFaste[[#This Row];[Etternavn:]]),"æ","a"),"ø","o"),"å","a")

Resources