I need to extract last names, which are allways written upper case, from cells, where is written all the name. The name can have different shapes e.g.:
Jan H. NOVAK
Petr Karel POUZAR
Frantisek Ix GREGOR
I have tried to find some VBAs on the web. I found this one, but it extract also the one letter middle names which are also upper case:
Function UpperWords(str As Variant) As String
Dim i As Integer, sTemp As String, StrTmp As String
For i = 0 To UBound(Split(str, " "))
StrTmp = Split(str, " ")(i)
If UCase(StrTmp) = StrTmp Then sTemp = sTemp & " " & StrTmp
Next i
UpperWords = Trim(sTemp)
End Function
I need to define in the VBA tahat the upper case word which I want to extract has at least two letters.
Thank you for your ideas.
Add the test to the If:
Function UpperWords(str As Variant) As String
Dim i As Integer, sTemp As String, StrTmp As String
For i = 0 To UBound(Split(str, " "))
StrTmp = Split(str, " ")(i)
If UCase(StrTmp) = StrTmp And Len(StrTmp) > 1 Then sTemp = sTemp & " " & StrTmp
Next i
UpperWords = Trim(sTemp)
End Function
If what you want is to extract last names, then you can use a formula/function to do just that. Capitalization would seem to be irrelevant.
worksheet formula
=TRIM(RIGHT(SUBSTITUTE(TRIM(A1)," ",REPT(" ",99)),99))
UDF
Function lastName(S As String) As String
lastName = Mid(S, InStrRev(Trim(S), " ") + 1)
End Function
Dim regEx As Object
Dim allMatches As Object
Dim Surname As Variant
Dim rng As Range
Dim cell As Range
Dim m As Match
Set rng = ThisWorkbook.Worksheets("Sheet2").Range("A1:A4")
For Each cell In rng
Set regEx = CreateObject("VBScript.RegExp")
With regEx
.IgnoreCase = True
.MultiLine = False
.Pattern = "\s([A-Z]+)$"
.Global = True
End With
Set allMatches = regEx.Execute(cell.Value)
For Each m In allMatches
Surname = m.SubMatches(0)
Debug.Print Surname
Next
Next cell
Related
I have pipe-delimited strings I need to find and replace on the entire substring between the pipes
So if my strings looks like
AAAP|AAA TTT|AAA|000 or AAA|AAAP|AAA TTT|AAA|000 Or AAA|AAAP|AAA TTT|AAA|AAA
The AAA can be anywhere in the string. beginning and/or end or exist multiple times
and I want to replace AAA with ZZZ
The result I need:
AAAP|AAA TTT|ZZZ|000 or ZZZ|AAAP|AAA TTT|ZZZ|000 or ZZZ|AAAP|AAA TTT|ZZZ|ZZZ
The result I am getting
AAAP|ZZZ TTT|ZZZ|000 ...
How to restrict the replacement to the entire substring
Sub ExtractSubstringReplace()
Dim strSource, strReplace, strFind, RegExpReplaceWord, r, a
strSource = "AAAP|AAA TTT|AAA|000"
strFind = "AAA"
strReplace = "ZZZ"
Dim re As Object
Set re = CreateObject("VBScript.RegExp")
re.Global = True
re.IgnoreCase = True
re.Pattern = "\b" & strFind & "\b"
RegExpReplaceWord = re.Replace(strSource, strReplace)
MsgBox RegExpReplaceWord
End Sub
You can use
re.Pattern = "(^|\|)" & strFind & "(?![^|])"
RegExpReplaceWord = re.Replace(strSource, "$1" & strReplace)
See the (^|\|)AAA(?![^|]) regex demo. Note it is equal to (^|\|)AAA(?=\||$).
Details:
(^|\|) - Capturing group 1: either start of string or a pipe char
AAA - search string
(?![^|]) / (?=\||$) - a lookahead that makes sure there is either | or end of string immediately to the right of the current location.
NOTE: if your strFind can contain special regex metacharacters, make sure you escape the string using the solution from Regular Expression and embedded special characters.
Put the string on which replacement is to be made (AAAP|AAA TTT|AAA|000) in cell A1 of Sheet1 and run the following code.
You will get the whole string with replacements made in cell A2.
Sub ExtractSubstringReplace()
Dim strArr, str, strArrNew(), strSource As String, strReplace As String, strFind As String, i As Long
strSource = Sheet1.Range("A1").Value
strFind = "AAA"
strReplace = "ZZZ"
strArr = Split(Sheet1.Range("A1").Value, "|")
For Each str In strArr
If str = strFind Then str = strReplace
ReDim Preserve strArrNew(i)
strArrNew(i) = str
i = i + 1
Next str
For Each str In strArrNew
Debug.Print str
Next str
Sheet1.Range("A2").Value = Join(strArrNew, "|")
End Sub
I would go with the option presented in the comments by #Siddharth Rout as it is probably the most efficient.
Sub Test()
MsgBox ExtractSubstringReplace("AAAP|AAA TTT|AAA|000", "AAA", "ZZZ")
MsgBox ExtractSubstringReplace("AAA|AAAP|AAA TTT|AAA|000", "AAA", "ZZZ")
MsgBox ExtractSubstringReplace("AAA|AAAP|AAA TTT|AAA|AAA", "AAA", "ZZZ")
End Sub
Public Function ExtractSubstringReplace(ByVal strSource As String _
, ByVal strFind As String _
, ByVal strReplace As String _
, Optional ByVal delimiter As String = "|" _
) As String
Dim result As String
'Duplicate delimiter and also add leading and trailing delimiter
result = delimiter & Replace(strSource, delimiter, delimiter & delimiter) & delimiter
'Replace substrings
result = Replace(result, delimiter & strFind & delimiter, delimiter & strReplace & delimiter)
'Remove leading and trailing delimiter that we added previously
result = Mid$(result, Len(delimiter) + 1, Len(result) - Len(delimiter) * 2)
'Restore delimiters
ExtractSubstringReplace = Replace(result, delimiter & delimiter, delimiter)
End Function
You can also use the function in an Excel cell.
Slightly shortened code
As addition to #ChristianBuse 's fine solution another fast approach based on the same idea (needing only 0.00 to max. 0.02 seconds):
Function SubRep(src, fnd, repl, Optional ByVal delim As String = "|") As String
SubRep = delim & src & delim
Dim i As Long
For i = 1 To 2
SubRep = Replace(SubRep, delim & fnd & delim, delim & repl & delim)
Next
SubRep = Mid$(SubRep, 2, Len(SubRep) - 2)
End Function
Example call
Sub ExampleCall()
Dim terms
terms = Array("AAAP|AAA TTT|AAA|000", "AAA|AAAP|AAA TTT|AAA|000", "AAA|AAAP|AAA TTT|AAA|AAA")
Dim i As Long
For i = LBound(terms) To UBound(terms)
Debug.Print Format(i, "0 ") & terms(i) & vbNewLine & " " & _
SubRep(terms(i), "AAA", "ZZZ")
Next
End Sub
Results in VB Editor's immediate window
0 AAAP|AAA TTT|AAA|000
AAAP|AAA TTT|ZZZ|000
1 AAA|AAAP|AAA TTT|AAA|000
ZZZ|AAAP|AAA TTT|ZZZ|000
2 AAA|AAAP|AAA TTT|AAA|AAA
ZZZ|AAAP|AAA TTT|ZZZ|ZZZ
I am looking to search for values from a list in Sheet1 in each cell of column C on sheet2 to be separated by commas.
Sheet1 has a list of names:
Sheet 2 has a set of sentences in column C. The output in column D should be the names in Sheet1.
I have searched but haven't found a solution.
I don't have any code to show that has been effective in this regard but I did come across a function that seemed promising but, since I don't know what would surround the name per cell it isn't quite what I need.
Function RegexExtract(ByVal text As String, _
ByVal extract_what As String, _
Optional separator As String = ", ") As String
Dim allMatches As Object
Dim RE As Object
Set RE = CreateObject("vbscript.regexp")
Dim i As Long, j As Long
Dim result As String
RE.Pattern = extract_what
RE.Global = True
Set allMatches = RE.Execute(text)
For i = 0 To allMatches.Count - 1
For j = 0 To allMatches.Item(i).submatches.Count - 1
result = result & (separator & allMatches.Item(i).submatches.Item(j))
Next
Next
If Len(result) <> 0 Then
result = Right$(result, Len(result) - Len(separator))
End If
RegexExtract = result
End Function
Using regexp Test:
Function CheckList(ByVal text As String, list As Range) As String
Static RE As Object
Dim arr, sep, r As Long, result As String, v
If RE Is Nothing Then Set RE = CreateObject("vbscript.regexp")
If Len(text) > 0 Then
arr = list.Value
'check each name
For r = 1 To UBound(arr, 1)
v = arr(r, 1)
If Len(v) > 0 Then
RE.Pattern = "\b" & v & "\b" '<< whole word only
If RE.test(text) Then
result = result & sep & v
sep = ", " 'populate the separator
End If
End If
Next r
End If
CheckList = result
End Function
You can use a Dictionary object to check each string against the NameList, assuming that the names in the sample string do not have punctuation.
If they do, this method can still be used, but would require some modification. For example, one could replace all of the punctuation with spaces; or do something else depending on how complex things might be.
eg:
Option Explicit
Function ckNameList(str As String, nameList As Range) As String
Dim D As Dictionary
Dim vNames, I As Long, V, W
Dim sOut As String
vNames = nameList
Set D = CreateObject("Scripting.Dictionary")
D.CompareMode = TextCompare
For I = 1 To UBound(vNames)
If Not D.Exists(vNames(I, 1)) Then _
D.Add vNames(I, 1), vNames(I, 1)
Next I
V = Split(str, " ")
sOut = ""
For Each W In V
If D.Exists(W) Then _
sOut = sOut & ", " & W
Next W
ckNameList = Mid(sOut, 3)
End Function
Scott showed how to use TEXTJOIN, when you don't have access to this function. Your best best might be VBA. We could emulate some sort of TEXTJOIN, possibly like so:
Function ExtractNames(nms As Range, str As Range) As String
ExtractNames = Join(Filter(Evaluate("TRANSPOSE(IF(ISNUMBER(SEARCH(" & nms.Address & "," & str.Address & "))," & nms.Address & ",""|""))"), "|", False), ", ")
End Function
Called in D2 like: =ExtractNames($A$2:$A$7,C2) and dragged down. Downside of this Evalate method is that it's making use of an array formula, however the native TEXTJOIN would have been so too. Plusside is that it's avoiding iteration.
EDIT
As #TimWilliams correctly stated, this might end up confusing substrings that hold part of what we are looking for, e.g. > Paul in Pauline.
I also realized that to overcome this, we need to substitute special characters. I've rewritten my function to the below:
Function ExtractNames(nms As Range, str As Range) As String
Dim chr() As Variant, arr As Variant
'Create an array of characters to ignore
chr = Array("!", ",", ".", "?")
'Get initial array of all characters, with specified characters in chr substituted for pipe symbol
arr = Evaluate("TRANSPOSE(IF(ISNUMBER(MATCH(MID(" & str.Address & ",ROW(A1:A" & Len(str.Value) & "),1),{""" & Join(chr, """,""") & """},0)),""|"",MID(" & str.Address & ",ROW(A1:A" & Len(str.Value) & "),1)))")
'Get array of words used to check against names without any specified characters
arr = Split(Join(Filter(arr, "|", False), ""), " ")
'Check which names occur in arr
For Each cl In nms
If IsNumeric(Application.Match(cl.Value, arr, 0)) Then
If ExtractNames = "" Then
ExtractNames = cl.Value
Else
ExtractNames = Join(Array(ExtractNames, cl.Value), ", ")
End If
End If
Next cl
End Function
As you can tell, it's possible still, but my conclusion and recommendation would be to go with RegEx. #TimWilliams has a great answer explaining this, which I slightly adapted to prevent an extra iteration:
Function ExtractNames(nms As Range, str As Range) As String
Dim regex As Object: Set regex = CreateObject("VBScript.RegExp")
Dim arr() As Variant: arr = Application.Transpose(nms.Value)
Dim del As String
regex.Pattern = "\b(?:" & Join(arr, "|") & ")\b"
regex.Global = True
regex.Ignorecase = True
Set hits = regex.Execute(str.Value)
For Each hit In hits
ExtractNames = ExtractNames & del & hit
del = ", "
Next hit
End Function
Or even without iteration:
Function ExtractNames(nms As Range, str As Range) As String
Dim regex As Object: Set regex = CreateObject("VBScript.RegExp")
Dim arr() As Variant: arr = Application.Transpose(nms.Value)
Dim del As String
regex.Global = True
regex.Ignorecase = True
'Perform 1st replace on non-alphanumeric characters
regex.Pattern = "[^\w]"
ExtractNames = Application.Trim(regex.Replace(str.Value, " "))
'Perferom 2nd replace on all words that are not in arr
regex.Pattern = "\b(?!" & Join(arr, "|") & ")[\w-]+\b"
ExtractNames = Application.Trim(regex.Replace(ExtractNames, " "))
ExtractNames = Replace(ExtractNames, " ", ", ")
End Function
New to VBA, trying to create a function that essentially searches a column for certain values. If it finds a hit then it returns a corresponding column, else returns a space. The way the worksheet is formatted, one cell can have multiple values (separated by ALT+ENTER, so each new value is on a separate line).
The code I used currently works but has an issue:
Since I am using inStr the code is returning partial matches as well (which I do not want).
Example:
**Column to Search (one cell)**
ABC
AB
B
When I run the code to find AB, it will return hits for both AB and ABC since AB is part of it.
Ideal solution would be to first split the cells based on ALT+ENTER and loop through all values per cell and then return the desired value. But not how the syntax would look.
Current Code
Function newFunc(Search_string As String, Search_in_col As Range, Return_val_col As Range)
Dim i As Long
Dim result As String
Dim mRange As Range
Dim mValue As String
For i = 1 To Search_in_col.Count
If InStr(1, Search_in_col.Cells(i, 1).Text, Search_string) <> 0 Then
If (Return_val_col.Cells(i, 1).MergeCells) Then
Set mRange = Return_val_col.Cells(i, 1).MergeArea
mValue = mRange.Cells(1).Value
result = result & mValue & ", "
Else
result = result & Return_val_col.Cells(i, 1).Value & ", "
End If
End If
Next
Example:
Adding an example to better explain the situation
you can split the string and loop that.
Function newFunc(Search_string As String, Search_in_col As Range, Return_val_col As Range) As String
If Search_in_col.Cells.Count <> Return_val_col.Cells.Count Then Exit Function
Dim sptStr() As String
sptStr = Split(Search_string, Chr(10))
Dim srchArr() As Variant
srchArr = Search_in_col.Value
Dim RetArr() As Variant
RetArr = Return_val_col.Value
Dim i As Long
For i = LBound(sptStr) To UBound(sptStr)
Dim j As Long
For j = LBound(srchArr, 1) To UBound(srchArr, 1)
If srchArr(j, 1) = sptStr(i) Then
newFunc = newFunc & RetArr(j, 1) & ", "
End If
Next j
Next i
newFunc = Left(newFunc, Len(newFunc) - 2)
End Function
EDIT:
As per the new information:
Function newFunc(Search_string As String, Search_in_col As Range, Return_val_col As Range) As String
Search_string = "|" & Search_string & "|"
Dim srchArr() As Variant
srchArr = Search_in_col.Value
Dim RetArr() As Variant
RetArr = Return_val_col.Value
Dim i As Long
For i = LBound(srchArr, 1) To UBound(srchArr, 1)
Dim T As String
T = "|" & Replace(srchArr(i, 1), Chr(10), "|") & "|"
If InStr(T, Search_string) > 0 Then
newFunc = newFunc & RetArr(i, 1) & ", "
End If
Next i
newFunc = Left(newFunc, Len(newFunc) - 2)
End Function
You can use regular expressions which have a word boundary token.
The following seems to reproduce what you show in your example:
Option Explicit
'Set reference to Microsoft VBScript Regular Expressions 5.5
Function col_return(lookFor As String, lookIn As Range) As String
Dim RE As RegExp
Dim C As Range
Dim S As String
Set RE = New RegExp
With RE
.Global = True
.IgnoreCase = True 'unless you want case sensitive searches
For Each C In lookIn
.Pattern = "\b(" & lookFor & ")\b"
If .Test(C.Text) = True Then
S = S & "," & C.Offset(0, -1)
End If
Next C
End With
col_return = Mid(S, 2)
End Function
I used early binding, which means you set a reference in VBA as noted in the comments.
You can use late-binding and avoid the reference. To do that you would change to the DIM and Set lines for RE to:
DIM RE as Object
Set RE = createobject("vbscript.regexp")
You can read about early vs late-binding by doing an internet search.
The formula I used and the layout is in the screenshot below:
In VBA if I have a string of numbers lets say ("1,2,3,4,5,2,2"), how can I remove the duplicate values and only leave the first instance so the string says ("1,2,3,4,5").
Here is a function you can use to dedupe a string as you've described. Note that this won't sort the deduped string, so if yours was something like "4,2,5,1,3,2,2" the result would be "4,2,5,1,3". You didn't specify you needed it sorted, so I didn't include that functionality. Note that the function uses , as the default delimiter if not specified, but you can specify a delimiter if you choose.
Function DeDupeString(ByVal sInput As String, Optional ByVal sDelimiter As String = ",") As String
Dim varSection As Variant
Dim sTemp As String
For Each varSection In Split(sInput, sDelimiter)
If InStr(1, sDelimiter & sTemp & sDelimiter, sDelimiter & varSection & sDelimiter, vbTextCompare) = 0 Then
sTemp = sTemp & sDelimiter & varSection
End If
Next varSection
DeDupeString = Mid(sTemp, Len(sDelimiter) + 1)
End Function
Here's some examples of how you would call it:
Sub tgr()
MsgBox DeDupeString("1,2,3,4,5,2,2") '--> "1,2,3,4,5"
Dim myString As String
myString = DeDupeString("4-2-5-1-3-2-2", "-")
MsgBox myString '--> "4-2-5-1-3"
End Sub
I suggest writing a Join function to combine the unique parts back into a single string (there is one available for arrays, but not for any other collection):
Function Join(Iterable As Variant, Optional Delimiter As String = ",") As String
Dim notFirst As Boolean
Dim item As Variant
For Each item In Iterable
If notFirst Then
Join = Join & Delimiter
Else
notFirst = True
End If
Join = Join & item
Next
End Function
Then, use Split to split a string into an array, and Scripting.Dictionary to enforce uniqueness:
Function RemoveDuplicates(s As String, Optional delimiter As String = ",") As String
Dim parts As String()
parts = Split(s,delimiter)
Dim dict As New Scripting.Dictionary
Dim part As Variant
For Each part In parts
dict(part) = 1 'doesn't matter which value we're putting in here
Next
RemoveDuplicates = Join(dict.Keys, delimiter)
End Function
try this:
Sub test()
Dim S$: S = "1,2,3,4,5,2,2,5,6,6,6"
Dim Dic As Object: Set Dic = CreateObject("Scripting.Dictionary")
Dim Key As Variant
For Each Key In Split(S, ",")
If Not Dic.exists(Trim(Key)) Then Dic.Add Trim(Key), Nothing
Next Key
S = Join(Dic.Keys, ","): MsgBox S
End Sub
Heres my crack at it:
Function Dedupe(MyString As String, MyDelimiter As String)
Dim MyArr As Variant, MyNewArr() As String, X As Long, Y As Long
MyArr = Split(MyString, MyDelimiter)
ReDim MyNewArr(0)
MyNewArr(0) = MyArr(0)
Y = 0
For X = 1 To UBound(MyArr)
If InStr(1, Join(MyNewArr, MyDelimiter), MyDelimiter & MyArr(X)) = 0 Then
Y = Y + 1
ReDim Preserve MyNewArr(Y)
MyNewArr(Y) = MyArr(X)
End If
Next
Dedupe = Join(MyNewArr, MyDelimiter)
End Function
Call it like this in code:
Dedupe(Range("A1").Text,",")
Or like this in the sheet:
=Dedupe(A1,",")
The first parameter is the cell to test and the second is the delimiter you want to use (in your example it is the comma)
vb6,Find Duplicate letter in word when there is no delimiter.
Function RemoveDuplicateLetter(ByVal MyString As String) As String
Dim MyArr As Variant, MyNewArr() As String, X As String,str as String
Dim bValue As Boolean
Dim i As Long, j As Long
For i = 0 To Len(MyString)
str = str & Mid$(MyString, i + 1, 1) & vbNullChar
Next
i = 0
MyArr = Split(str, vbNullChar)
ReDim MyNewArr(0)
MyNewArr(0) = MyArr(0)
For i = LBound(MyArr) To UBound(MyArr)
bValue = True
For j = i + 1 To UBound(MyArr)
If MyArr(i) = MyArr(j) Then
bValue = False
Exit For
End If
Next
If bValue Then X = X & " " & MyArr(i)
Next
RemoveDuplicateLetter = X
End Function
I have a cell value like this:
This is a <"string">string, It should be <"changed">changed to <"a"> a number.
There are some words repeated in this part <" ">.
I want use Excel VBA to change the cell value to:
This is a string, It should be changed to a number.
Any help will be appreciated. Thanks.
Following up on the suggestion to use regular expressions, here's an example:
Option Explicit
Sub RemoveByRegexWithLateBinding()
Dim strIn As String
Dim strOut As String
Dim objRegex As Object
'input
strIn = "This is a <""string"">string, It should be <""changed"">changed to <""a""> a number."
Debug.Print "Input:" & vbCr & strIn
'create and apply regex
Set objRegex = CreateObject("VBScript.RegExp")
objRegex.Pattern = "<""[^<>""]*"">"
objRegex.Global = True
strOut = objRegex.Replace(strIn, "")
'test output
Debug.Print "Output:" & vbCr & strOut
End Sub
Produces this output:
Input:
This is a <"string">string, It should be <"changed">changed to <"a"> a number.
Output:
This is a string, It should be changed to a number.
Diagram of regular expression:
Which can be explained as finding a string that:
begins with <"
contains anything apart from the characters <, > and "
ends with ">
Assuming the text in cell A1, then try this code
Sub DelDoubleString()
Dim Text As String, Text2Replace As String, NewText As String
On Error Resume Next 'Optional, in case there's no double string to be deleted
Text = Cells(1, 1)
Do
Text2Replace = Mid$(Text, InStr(Text, "<"), InStr(Text, ">") - InStr(Text, "<") + 1)
NewText = Application.WorksheetFunction.Substitute(Text, Text2Replace, vbNullString)
Text = NewText
Loop Until InStr(NewText, "<") = 0
Cells(1, 1) = NewText
End Sub
Select the cells containing your text and run this short macro:
Sub Kleanup()
Dim d As Range, s As String, rng As Range
Dim gather As Boolean, L As Long, DQ As String
Dim i As Long, s2 As String, CH As String
Set rng = Selection
DQ = Chr(34)
For Each r In rng
s = Replace(r.Text, "<" & DQ, Chr(1))
s = Replace(s, DQ & ">", Chr(2))
gather = True
L = Len(s)
s2 = ""
For i = 1 To L
CH = Mid(s, i, 1)
If CH = Chr(1) Then gather = False
If CH = Chr(2) Then gather = True
If gather And CH <> Chr(2) Then s2 = s2 & CH
Next i
r.Value = s2
Next r
End Sub
U can Use Replace function
ActiveSheet.Cells(1, 1).Value = Replace(ActiveSheet.Cells(1, 1).Value, "String", "Number")