Excel VBA split after second space - excel

I'm trying to split my column so that the names
James John Doe
Comes out as only
James John
Using the below formula but it only leaves the first name, where I want it to split at the second occurrence of "space".
Sub Split1()
Dim r As Range
For Each r In Range("A2:A" & Cells(Rows.count, "A").End(xlUp).Row).Cells.SpecialCells(xlCellTypeConstants)
r.Value = Split(r.Value, " ")(0)
Next r
Can anyone help me out?
Thanks

There are a few ways you can turn a three-word phrase into the first two words.
Let's start with your Split() method.
This function returns an array. Your particular method of attempting to access the index will only return a single word.
You can place into an array, then just combine the array elements:
For Each r In Range(...)
Dim retVal() As String
retVal = Split(r.Value)
r.Value = retVal(0) & " " & retVal(1)
Next r
You can remove the last word with Replace():
For Each r In Range(...)
r.Value = Replace(r.Value, Split(r.Value)(2), "")
Next
Or you can even use Regular Expressions:
With CreateObject("VBScript.RegExp")
.Pattern = "\s[^\s]+$"
For Each r in Range(...)
r.Value = .Replace(r.Value, "")
Next
End With
In Regular Expressions, \s signifies a single space character, the [^...] bracket means "Do not include", which we placed a \s within the bracket, so that would match any non-space character, followed by the + means 1 or more times, and finally the $ signifies the end of the string. Essentially, you are wanting to match a word [^\s]+ that is at the end of the string $, preceeded by a space \s, and remove it via the .Replace() method. And you actually could also simply use the pattern \s\S+$, which is essentially the same thing (\S means any non-space character when it's capitalized).

You may try to use left & find to obtain the string value untill second space instead of split function
Code modification:
Dim r As Range
Dim s As String, newText As String
Dim Length As Long
For Each r In Range("A2:A" & Cells(Rows.Count, "A").End(xlUp).Row).Cells.SpecialCells(xlCellTypeConstants)
s = r.Value
Length = Application.WorksheetFunction.Find(" ", s, Application.WorksheetFunction.Find(" ", s) + 1)
r.Value = Left(s, Length)
Next r
Sample output:

Further way to extract the 1st and 3rd token of a split array
This approach profits from the advanced possibilities of Application.Index allowing to indicate any new row or columns order; the wanted columns are reflected here by the last 1-based (columns) argument Array(1, 3):
Function GetFirstLast(s As String) As String
GetFirstLast = Join(Application.Index(Split(s), 0, Array(1, 3)))
End Function
Example call:
Debug.Print GetFirstLast("James John Doe")
resulting in
James Doe in the VB Editor's immediate window.

Related

How to format strings with numbers and special characters in Excel or Access using VBA?

I have a mathematical problem: these five strings are IDs for the same object. Due to these differences, objects appear multiple times in my Access table/query. Although there are a lot of these mutations, but I take this as an example.
76 K 6-18
76 K 6-18(2)
0076 K 0006/ 2018
0076 K 0006/2018
76 K 6/18
How would the VBA-code have to look like to recognize that these numbers stand for the same thing , so a general formatting with "RegEx()" or "format()" or "replace()"...but they must not only refer to this example but to the kind.
The common factor of these and all other mutations is always the following:
1) includes "-", no zeros left of "-", just 18 an not 2018 (year) at the end.
2) is like the first but with (2) (which can be dropped).
3) includes "/", zeros left of "/", and 2018 as year at the end.
4) is like third, but without space after "/".
5) is like the first one, but with a "/" instead of "-".
Character is always one single "K"! I suppose the best way would be to convert all 5 strings to 76 K 6 18 or in ohter cases for example to 1 K 21 20 or 123 K 117 20 . Is this possible with one elegant code or formula? Thanks
Here is a fun alternative using a rather complex but intuitive regular expression:
^0*(\d+) (K) 0*(\d+)[-\/] ?\d{0,2}(\d\d)(?:\(\d+\))?$
See an online demo
^ - Start line anchor.
0* - 0+ zeros to catch any possible leading zeros.
(\d+) - A 1st capture group of 1+ digits ranging 0-9.
- A space character.
(K) - 2nd Capture group capturing the literal "K".
- A space character.
(\d+) - A 3rd capture group of 1+ digits ranging 0-9.
[-\/] - Character class of either a hyphen or forward slash.
? - An optional space character.
\d{0,2} - 0-2 digits ranging from 0-9.
(\d\d) - A 4th capture group holding exactly two digits.
(?:\(\d+\))? - An optional non-capture group holding 1+ digits inside literal paranthesis.
$ - End line anchor.
Now just replace the whole string by the 4 capture groups with spaces in between.
Let's test this in VBA:
'A code-block to call the function.
Sub Test()
Dim arr As Variant: arr = Array("76 K 6-18", "76 K 6-18(2)", "0076 K 0006/ 2018", "0076 K 0006/2018", "76 K 6/18")
For x = LBound(arr) To UBound(arr)
Debug.Print Transform(CStr(arr(x)))
Next
End Sub
'The function that transform the input.
Function Transform(StrIn As String) As String
With CreateObject("vbscript.regexp")
.Global = True
.Pattern = "^0*(\d+) (K) 0*(\d+)[-\/] ?\d{0,2}(\d\d)(?:\(\d+\))?$"
Transform = .Replace(StrIn, "$1 $2 $3 $4")
End With
End Function
All the elements from the initial array will Debug.Print "76 K 6 18".
Hope it helps, happy coding!
EDIT: If your goal is just to check if your string compiles against the pattern, the pattern itself can be shortened a little and you can return a boolean instead:
'A code-block to call the function.
Sub Test()
Dim arr As Variant: arr = Array("76 K 6-18", "76 K 6-18(2)", "0076 K 0006/ 2018", "0076 K 0006/2018", "76 K 6/18")
For x = LBound(arr) To UBound(arr)
Debug.Print Transform(CStr(arr(x)))
Next
End Sub
'The function that checks the input.
Function Transform(StrIn As String) As Boolean
With CreateObject("vbscript.regexp")
.Global = True
.Pattern = "^0*\d+ K 0*\d+[-\/] ?\d{2,4}(?:\(\d+\))?$"
Transform = .Test(StrIn)
End With
End Function
As #Vincent has suggested, look at using a custom function to convert all of the different data to be consistent. Based on what you have described, the following seems to work:
Function fConvertFormula(strData As String) As String
On Error GoTo E_Handle
Dim astrData() As String
strData = Replace(strData, "/", " ")
strData = Replace(strData, "-", " ")
strData = Replace(strData, " ", " ")
astrData = Split(strData, " ")
If UBound(astrData) = 3 Then
astrData(0) = CLng(astrData(0))
astrData(2) = CLng(astrData(2))
If InStr(astrData(3), "(") > 0 Then
astrData(3) = Left(astrData(3), InStr(astrData(3), "(") - 1)
End If
If Len(astrData(3)) = 4 Then
astrData(3) = Right(astrData(3), 2)
End If
fConvertFormula = Join(astrData, " ")
End If
fExit:
On Error Resume Next
Exit Function
E_Handle:
MsgBox Err.Description & vbCrLf & vbCrLf & "fConvertFormula", vbOKOnly + vbCritical, "Error: " & Err.Number
Resume fExit
End Function
It starts by replacing "field" delimiters with spaces, and then does a replace of double spaces. It then removes any leading zeroes from the first and third elements, if there is a bracket in the last element then delete that part, and finally converts to a 2 digit value before joining it all back up.
You may have other cases that you need to deal with, so I would suggest creating a query with the original data and the data converted by this function, and seeing what it throws out.
This function unifies the given string by the rules you defined in your question:
Public Function UnifyValue(ByVal inputValue As String) As String
'// Remove all from "(" on.
inputValue = Split(inputValue, "(")(0)
'// Replace / by blank
inputValue = Replace(inputValue, "/", " ")
'// Replace - by blank
inputValue = Replace(inputValue, "-", " ")
'// Replace double blanks by one blank
inputValue = Replace(inputValue, " ", " ")
'// Split by blank
Dim splittedInputValue() As String
splittedInputValue = Split(inputValue, " ")
'// Create the resulting string
UnifyValue = CLng(splittedInputValue(0)) & _
" " & splittedInputValue(1) & _
" " & CLng(splittedInputValue(2)) & _
" " & Right(CLng(splittedInputValue(3)), 2)
End Function
It always returns 76 K 6 18 regarding to your sample values.

VBA code to extract Substring from Main String

I am trying to extract substring from main string. String have not same pattern. Main string is in Column "I". Desired output should be as per column "J". I have to extract substring between "FL" and "WNG".
I have tried to write code put it is not giving proper output. Can you please assist with alternate solution to get desired output using VBA.
Sub Get_Substring()
Range("K2") = Mid(Range("I2"), InStrRev(Range("I2"), "FL") + 1, _
InStrRev(Range("I2"), "WNG") - _
InStrRev(Range("I2"), "FL") - 1)
End Sub
Try the following...
Range("K2") = Mid(Range("I2"), InStrRev(Range("I2"), "FL") + 2, _
InStrRev(Range("I2"), "WNG") - _
InStrRev(Range("I2"), "FL") - 2)
Although, I would make it clear that you want the value for each of the ranges, as follows...
Range("K2").Value = Mid(Range("I2").Value, InStrRev(Range("I2").Value, "FL") + 2, _
InStrRev(Range("I2").Value, "WNG") - _
InStrRev(Range("I2").Value, "FL") - 2)
The next piece of code extracts the necessary string using arrays, too. But it can do it, even if more "WNG" strings exist in the string to be analyzed:
Private Function ExtractString(strTxt As String) As String
Dim arrFL, arrWNG, i As Long
arrFL = Split(strTxt, "FL")
For i = 1 To UBound(arrFL) 'start from the second array element
arrWNG = Split(arrFL(i), "WNG") 'split each first array element by "WNG"
'if the array contains at least a "WNG" string:
If UBound(arrWNG) > 0 Then ExtractString = arrWNG(0): Exit Function 'extract the first array element
Next
End Function
Note: If more pairs "FL" folowed by "WNG" exists, the function can be adapted to return an array, containing all such potential occurrences...
It can be tested using the next testing Sub:
Sub testExtractString()
Dim x As String
x = "John12REGNO02FL02WNGARM01"
'x = "John12WNGREGNO02FL02WNGARM01"
'x = "John12WNGREGNO02FL02WNGARWNGM01"
Debug.Print ExtractString(x)
End Sub
Just uncomment each x definition row...
I'll chuck in a solution based on regex to assure you got the exact substring:
Sub Test()
Dim stringIn As String: stringIn = "John12REGNO02FL02WNGARM01"
Debug.Print (Extract(stringIn))
End Sub
Function Extract(stringIn As String) As String
With CreateObject("vbscript.regexp")
.Pattern = "^.*FL(.*?)WNG"
If .Test(stringIn) = True Then
Extract = .Execute(stringIn)(0).Submatches(0)
Else
Extract = "None Found"
End If
End With
End Function
^ - Start line anchor.
.*FL - 0+ Chars greedy, and therefor untill, the last occurence of "FL".
(.*?) - A capture group with 0+ but lazy characters and therefor upto the nearest occurence of:
WNG - Literally match "WNG".
NOTE, you could make a more strict pattern only catching digits of that's the only type of characters possible, e.g: ^.*FL(\d*)WNG.
Here is an online demo
You can try the following udf:
Public Function FLWNG(s As String) As String
'Purpose: get the substring enclosed by the most right pair of FL..WNG
Dim tmp
tmp = Split(Replace(s, "WNG", "FL"), "FL")
FLWNG = tmp(UBound(tmp) - 1)
End Function
Explanation
Replacing all occurencies of WNG in the original string (s) with FL allows to split the resulting string by the FL delimiter only.
Assuming that the original string has at least one enclosing structure, you get the enclosed content as next to last element, i.e. via tmp(Ubound(tmp)-1).

Swapping delimited strings in an excel column

I have a column in a very large excel spreadsheet that is in some cases incorrectly formatted. It should contain first a street address, then a name, separated by a hyphen, as shown:
123 Main St-Smith
However, some are formatted in reverse, such as:
Jones-231 High St
All the addresses start with a numeric and all the names start with an alpha. I am looking for a macro or code that would swap only the name and address where it is incorrectly formatted. I have tried turning it into a comma delimited to separate them out, but since they only occur intermittently I am still left with fixing them one by one manually.
Any suggestions? I am by no means an Excel macro expert. Thanks!
Split the string on the hyphen then look for spaces in the second element.
dim i as long, tmp as variant
with worksheets("sheet1")
for i = 2 to .cells(.rows.count, "a").end(xlup).row
tmp = split(.cells(i, "a").value2, "-")
if cbool(instr(1, tmp(1), " ")) then _
.cells(i, "a") = join(array(tmp(1), tmp(0)), "-")
next i
end with
As you wrote
Street name is any string that begins with a digit and ends with either a hyphen or the end of the string
Name is any string that starts with a non-digit and ends with either a hyphen or the end of the string
This can be done using just native VBA, (although at first I was going to use Regular Expressions)
split on the hyphen
rearrange depending on if first starts with a number or not
do some error checking in case no hyphen present or don't have the number and non-number start as specified.
Option Explicit
Function fmtAddressName2(S As String) As String
Dim sAddr As String, sName As String
Dim v As Variant
v = Split(S, "-")
On Error GoTo badFormat
If IsNumeric(Left(v(0), 1)) And Not IsNumeric(Left(v(1), 1)) Then
sAddr = v(0)
sName = v(1)
ElseIf Not IsNumeric(Left(v(0), 1)) And IsNumeric(Left(v(1), 1)) Then
sAddr = v(1)
sName = v(0)
Else
GoTo badFormat
End If
fmtAddressName2 = sAddr & "-" & sName
Exit Function
badFormat:
'return unchanged string
fmtAddressName2 = S
'or could return an error message
End Function

Remove words that contain each other and leave the longer one

I'm looking for a macro (preferably a function) that would take cell contents, split it into separate words, compare them to one another and remove the shorter words.
Here's an image of what I want the output to look like (I need the words that are crossed out removed):
I tried to write a macro myself, but it doesn't work 100% properly because it's not taking the last words and sometimes removes what shouldn't be removed. Also, I have to do this on around 50k cells, so a macro takes a lot of time to run, that's why I'd prefer it to be a function. I guess I shouldn't use the replace function, but I couldn't make anything else work.
Sub clean_words_containing_eachother()
Dim sht1 As Worksheet
Dim LastRow As Long
Dim Cell As Range
Dim cell_value As String
Dim word, word2 As Variant
Set sht1 = ActiveSheet
col = InputBox("Which column do you want to clear?")
LastRow = sht1.Cells(sht1.Rows.Count, col).End(xlUp).Row
Let to_clean = col & "2:" & col & LastRow
For i = 2 To LastRow
For Each Cell In sht1.Range(to_clean)
cell_value = Cell.Value
cell_split = Split(cell_value, " ")
For Each word In cell_split
For Each word2 In cell_split
If word <> word2 Then
If InStr(word2, word) > 0 Then
If Len(word) < Len(word2) Then
word = word & " "
Cell = Replace(Cell, word, " ")
ElseIf Len(word) > Len(word2) Then
word2 = word2 & " "
Cell = Replace(Cell, word2, " ")
End If
End If
End If
Next word2
Next word
Next Cell
Next i
End Sub
Assuming that the retention of the third word in your first example is an error, since books is contained later on in notebooks:
5003886 book books bound case casebound not notebook notebooks office oxford sign signature
and also assuming that you would want to remove duplicate identical words, even if they are not contained subsequently in another word, then we can use a Regular Expression.
The regex will:
Capture each word
look-ahead to see if that word exists later on in the string
if it does, remove it
Since VBA regexes cannot also look-behind, we work-around this limitation by running the regex a second time on the reversed string.
Then remove the extra spaces and we are done.
Option Explicit
Function cleanWords(S As String) As String
Dim RE As Object, MC As Object, M As Object
Dim sTemp As String
Set RE = CreateObject("vbscript.regexp")
With RE
.Global = True
.Pattern = "\b(\w+)\b(?=.*\1)"
.ignorecase = True
'replace looking forward
sTemp = .Replace(S, "")
' check in reverse
sTemp = .Replace(StrReverse(sTemp), "")
'return to normal
sTemp = StrReverse(sTemp)
'Remove extraneous spaces
cleanWords = WorksheetFunction.Trim(sTemp)
End With
End Function
Limitations
punctuation will not be removed
a "word" is defined as containing only the characters in the class [_A-Za-z0-9] (letters, digits and the underscore).
if any words might be hyphenated, or contain other non-word characters
in the above, they will be treated as two separate words
if you want it treated as a single word, then we might need to change the regex
General steps:
Write cell to array (already working)
for each element (x), go through each element (y) (already working)
if x is in y AND y is longer that x THEN set x to ""
concat array back into string
write string to cell
String/array manipulations are much faster than operations on cells, so this will give you some increase in performance (depending on the amount of words you need to replace for each cell).
The "last word problem" might be that you dont have a space after the last word within your cells, since you only replace word + " " with " ".

Extract two substrings from a set of three separated by ampersands

I am trying to extract numbers from a text.
If I have an entry like 12&6&2014, how can I extract the 12 (the number that is before the first &) and 2014 (the number that occurs after the second &)?
To get first number:
=LEFT(A1, FIND("&", A1)-1)
To get last number after the second &:
=RIGHT(A1, 4)
Otherwise, if that's not always a year:
=MID(A1, FIND(CHAR(1), SUBSTITUTE(A1, "&", CHAR(1), 2))+1, LEN(A1))
you can loop through each character int he string and check to see if it is numeric
Sub getNumberValues()
Dim s As String
Dim c As New Collection
Dim sNewString As String
s = "12&6&2014"
For v = 1 To Len(s)
If IsNumeric(Mid(s, v, 1)) Then
sNewString = sNewString & Mid(s, v, 1)
Else
c.Add sNewString
sNewString = ""
End If
Next v
'add the last entry
c.Add sNewString
sNewString = ""
For Each x In c
sNewString = sNewString & x & Chr(13)
Next
MsgBox sNewString
End Sub
If what I understand is correct which is that the characters that separate vary as well as how many digits are used you might look into something like this:
Function CleanUp(Txt)
For x = 1 To 255
Select Case x
Case 45, 47, 65 To 90, 95, 97 To 122
Txt = WorksheetFunction.Substitute(Txt, Chr(x), "") <- "" can be replaced
End Select with "&" to do a
Next x MID() using & as
the delimiter
CleanUp = Txt
End Function
If you can use VBA, this will replace your characters with a blank, but you could put in your own character and then use your formulas to separate from a specific delimiter.
The original code can be found here:
http://www.mrexcel.com/forum/excel-questions/380531-extract-only-numbers.html
Simplest might be Text to Columns with & as the delimiter, then delete the middle column. This overwrites the original data, so a copy might be appropriate.
Another simple way would be to create two copies, and for one Find what: &* and Replace with: nothing, for the other Find what: *& and Replace with: nothing.
An alternative formula solution might be:
=DAY(SUBSTITUTE($A1,"&","/"))
and
=YEAR(SUBSTITUTE($A1,"&","/"))

Resources