Text to columns - Uniform response with different numbers of Spaces - excel

I have a spreadsheet where column A is a list of names. Some of these names have titles (e.g., Mr John Doe, Miss Jane Doe, Mrs Jane Bloggs, Cllr Joe Bloggs etc) some of the names do not (just Joe Doe, John Bloggs, Jane Doe etc). I've been asked to split the names into three columns - Title, First Name, Last Name.
When I try the simple "text to columns", it's fine where there is a title, but where there isn't one, the first name defaults to the title column.
Is there a way to have the data split into the correct cells, or is it going to be a lot of manual work for someone?

You can use VBA to accomplish this.
You will work with two arrays. The first, preArr(), holds your raw data (your single column). The second, postArr(), is dimensioned for three columns with ReDim postArr(..., 1 To 3) and is written back to the worksheet.
First, test if the string from preArr(i, 1) contains known salutations. If it does, then you will add the first split string to postArr(, 1) - otherwise you won't add anything to this column.
Side Note: You can add additional salutations to this line:
.Pattern = "^(?:MRS?|MS|MIS+|CLLR|DR)\.?\s"
This is a regular expression, but just add another | separator for additional checks. I combined MR and MRS into one group, the ? makes the S optional in case you were wondering.
Here is the full program:
Option Explicit
Sub splitOnNames()
    Dim preArr(), postArr(), ws As Worksheet, preRng As Range
    Set ws = Selection.Parent
    Set preRng = Selection
    If preRng.Columns.Count > 1 Then
        MsgBox "This can only be done on a single column!", vbCritical
        Exit Sub
    End If
    If preRng.Cells.Count = 1 Then
        'A single cell returns a scalar rather than an array, so wrap it
        ReDim preArr(1 To 1, 1 To 1)
        preArr(1, 1) = preRng.Value
    Else
        preArr = preRng.Value
    End If
    ReDim postArr(LBound(preArr) To UBound(preArr), 1 To 3)
    Dim i As Long, x As Long, tmpArr
    For i = LBound(preArr) To UBound(preArr)
        If preArr(i, 1) <> "" Then
            tmpArr = Split(preArr(i, 1))
            If testSalutation(preArr(i, 1)) Then
                'Title found: title, first name, then the rest as last name
                postArr(i, 1) = tmpArr(0)
                postArr(i, 2) = tmpArr(1)
                For x = 2 To UBound(tmpArr) 'Some last names have two names
                    postArr(i, 3) = Trim(postArr(i, 3) & " " & tmpArr(x))
                Next x
            Else
                'No title: leave column 1 empty
                postArr(i, 2) = tmpArr(0)
                For x = 1 To UBound(tmpArr) 'Some last names have two names
                    postArr(i, 3) = Trim(postArr(i, 3) & " " & tmpArr(x))
                Next x
            End If
            Erase tmpArr
        End If
    Next i
    With preRng
        'Write the three columns back, starting at the original column
        Dim postRng As Range
        Set postRng = ws.Range(ws.Cells(.Row, .Column), _
                               ws.Cells(.Rows.Count + .Row - 1, .Column + 2))
        postRng.Value = postArr
    End With
End Sub
Private Function testSalutation(ByVal testStr As String) As Boolean
    With CreateObject("VBScript.RegExp")
        .IgnoreCase = True
        .Pattern = "^(?:MRS?|MS|MIS+|CLLR|DR)\.?\s"
        testSalutation = .Test(testStr)
    End With
End Function
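Before extending the pattern with your own titles, it can help to try it on a few sample names. The following is a minimal sketch, assuming a hypothetical helper named HasTitle (it is not part of the program above) that takes the list of titles as a parameter; PROF is added purely as an illustration.

```vba
Option Explicit

' Hypothetical helper: True when fullName starts with one of the titles in
' titleList (alternatives separated by "|"), optionally followed by a period
' and then whitespace.
Public Function HasTitle(ByVal fullName As String, _
                         Optional ByVal titleList As String = "MRS?|MS|MIS+|CLLR|DR") As Boolean
    With CreateObject("VBScript.RegExp")
        .IgnoreCase = True
        .Pattern = "^(?:" & titleList & ")\.?\s"
        HasTitle = .Test(fullName)
    End With
End Function

Sub DemoHasTitle()
    Debug.Print HasTitle("Mrs. Jane Bloggs")   ' True
    Debug.Print HasTitle("Cllr Joe Bloggs")    ' True
    Debug.Print HasTitle("Drew Smith")         ' False: "Dr" must be followed by whitespace
    Debug.Print HasTitle("Prof Ann Lee")       ' False with the default list
    Debug.Print HasTitle("Prof Ann Lee", "MRS?|MS|MIS+|CLLR|DR|PROF")   ' True
End Sub
```

Run DemoHasTitle and check the Immediate window (Ctrl+G) to see which names are recognised.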

If I have to do this, I use 'Text to columns' and then sort by the third column.
All rows that have only two values are now listed one after the other. Select the first-column cells of all these rows, then press 'Ctrl' and '+', or right-click and select 'Insert cells'. You will be asked whether you want to shift the cells down or right. Select 'shift right' and the cells are arranged the way you want them.

Related

Split full name into lastname,firstname if lastname has multiple parts (i.e. van, de)

I have a list with multiple names; however, some names have multiple last names, e.g. "Eddie van Halen". I can't get my code to output "van Halen, Eddie"; instead it outputs "Van, Eddie".
Would there be a way to check if names() has more than 2 parts, and if so to include names(1) & names(2) as the last name instead of checking for "van"? That is, in case a last name includes other parts such as "de".
Additionally, if the full name does not have multiple parts, e.g. "volunteer", the code should skip this name.
Here is my current code:
Sub Filter()
Dim r As Range
Dim c As Range
Dim names() As String
Dim lastrow As Long
lastrow = ActiveSheet.Cells(Rows.Count, 2).End(xlUp).Row
With ActiveSheet
Set r = .Range("K3:K" & lastrow)
For Each c In r
names = Split(c.Value, " ")
If IsEmpty(c.Value) Then Exit Sub
ElseIf InStr(c.Value, "van") > 0 Then
c.Value = names(1) & names(2) & ", " & names(0)
Else
c.Value = names(1) & ", " & names(0)
End If
Next c
End With
End Sub
Split takes a third argument, "Limit", which can be used to stop splitting once the result will have that many entries.
names = Split(c.Value, " ", 2)
In this case names will always have at most 2 elements.
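Putting the Limit argument to work, here is a minimal sketch of the reordering, assuming a hypothetical helper named LastFirst. Names without a space (such as "volunteer") are returned unchanged, which is one way to skip them.

```vba
Option Explicit

' Hypothetical helper: "Eddie van Halen" -> "van Halen, Eddie".
' Everything after the first space is treated as the last name.
Public Function LastFirst(ByVal fullName As String) As String
    Dim parts() As String
    parts = Split(Trim$(fullName), " ", 2)   ' at most two elements
    If UBound(parts) < 1 Then
        LastFirst = fullName                 ' empty or single word: leave as is
    Else
        LastFirst = Trim$(parts(1)) & ", " & parts(0)
    End If
End Function
```

Inside the loop from the question this would be used as c.Value = LastFirst(c.Value). Note that this treats only the first word as the first name, so "Mary Lou Smith" would become "Lou Smith, Mary".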
As an option, you could use a regular expression. The following pattern matches every word except van, von, de, etc.; you can add your own words at will. As an edge case, a name itself can start with van or von (for instance, Vonder). To handle this case, I added \b to the exclusion, so these prefixes must stand alone. The other case, as @ScottCraner noted, is three-part names (like Mary Lou Smith in his example). You can handle these with the count of matches (mc.Count, which the x variable loops over). For instance, if there are three names, you can concatenate whichever parts you need.
Sub F()
'// Tools -> References -> Microsoft VBScript Regular Expressions 5.5
Dim re As RegExp, mc As MatchCollection, m As Match, s$, x%
Set re = New RegExp
re.IgnoreCase = True
re.Global = True
re.Pattern = "\b(?!(van|von|de)\b)[a-z]+"
Set mc = re.Execute("van Halen, Vanzen")
If mc.Count > 0 Then
For x = 0 To mc.Count - 1
MsgBox mc(x)
Next
Else
MsgBox "No matches were found.", vbExclamation
End If
'// Output:
'// Halen
'// Vanzen
End Sub
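To show how the stand-alone-prefix idea could feed the "Last, First" output, here is a minimal sketch under a few assumptions: the helper name LastFirstByParticle is hypothetical, the prefix list is only van, von and de, and when no prefix is found the last word alone is taken as the last name. It uses late binding, so no library reference is needed.

```vba
Option Explicit

' Hypothetical helper: splits before the first stand-alone particle
' (van, von, de) if there is one, otherwise before the last word.
Public Function LastFirstByParticle(ByVal fullName As String) As String
    Dim s As String, cut As Long, mc As Object
    s = Trim$(fullName)
    With CreateObject("VBScript.RegExp")      ' late bound, no reference needed
        .IgnoreCase = True
        .Global = False
        .Pattern = "\s(?:van|von|de)\b"       ' extend the list as needed
        Set mc = .Execute(s)
    End With
    If mc.Count > 0 Then
        cut = mc.Item(0).FirstIndex + 1       ' 1-based position of the space
    Else
        cut = InStrRev(s, " ")                ' 0 when there is no space
    End If
    If cut = 0 Then
        LastFirstByParticle = fullName        ' single word: leave as is
    Else
        LastFirstByParticle = Mid$(s, cut + 1) & ", " & RTrim$(Left$(s, cut - 1))
    End If
End Function
```

With this sketch "Eddie van Halen" gives "van Halen, Eddie", "Mary Lou Smith" gives "Smith, Mary Lou", and "Eddie Vanzen" gives "Vanzen, Eddie" because Van is not stand-alone there.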

Error with returning multiple index from a list using listbox multiselect

New to VBA and somewhat frustrated. trying to make a form for my fellow teachers to simplify paperwork.
I have a list in a form
1. apple
2. orange
3. grape
If the user were to select apple and grape from the list, I want the cell to have just the indexes. So to print (1,3) in the cell. I don't want the words.
My current code
Private Sub SpedAccomAddBtn_Click()
'variable to count multiple selections'
VarSped = " "
'loop to keep track of indexes of selected items'
For X = 0 To Me.SpedListBx.ListCount - 1 'count through list
If Me.SpedListBx.Selected(X) Then
If VarSped = " " Then 'if blank then record first item'
VarSped = Me.SpedListBx.ListIndex + 1 'first selected item. +1 because excel is a 0 based index'
Else 'if not the first selection add a , between selections'
VarSped = VarSped & "," & Me.SpedListBx.ListIndex + 1
End If
End If
Next X
ThisWorkbook.Sheets("Master SPED Sheet").Range("c4") = VarSped 'print to cell'
If I use the earlier example of choosing Apple and Grape, I get (3,3) not (1,3). I can't figure out why VarSped keeps getting overwritten.
(I am new to coding and I have to comment everything so I feel like I know what I'm doing)
Try this and see how to refer to the current item inside the loop:
Private Sub SpedAccomAddBtn_Click()
    Dim VarSped As String
    Dim x As Integer
    'variable to collect multiple selections'
    VarSped = " "
    'loop to keep track of indexes of selected items'
    For x = 0 To Me.SpedListBx.ListCount - 1 'count through list
        If Me.SpedListBx.Selected(x) Then
            If VarSped = " " Then 'if blank then record first item'
                VarSped = CStr(x + 1) 'position of the current item. +1 because the list is 0-based'
            Else 'if not the first selection add a , between selections'
                VarSped = VarSped & "," & (x + 1)
            End If
        End If
    Next x
    ThisWorkbook.Sheets("Master SPED Sheet").Range("c4") = VarSped 'print to cell'
End Sub
Your loop iterator is already your item position: just add one to it to account for the 0-based list:
Option Explicit
Private Sub SpedAccomAddBtn_Click()
    Dim VarSped As String 'a String is always initialized to a null string (""), so no 'VarSped = ""' statement is needed
    Dim X As Long
    For X = 0 To Me.SpedListBx.ListCount - 1 'count through list
        If Me.SpedListBx.Selected(X) Then VarSped = VarSped & "," & X + 1
    Next
    'print 'VarSped' only if the user selected something (i.e. 'VarSped' is not a null string)
    'Mid$() is used to skip the first character, which is a comma
    If VarSped <> vbNullString Then ThisWorkbook.Sheets("Master SPED Sheet").Range("c4") = Mid$(VarSped, 2)
End Sub
Get in the habit of placing Option Explicit at the very top of every module and explicitly declaring all your variables: this will both save you a lot of debugging time and give you much more control over your code.
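As for why the original code printed (3,3): as far as I understand the MSForms list box, in a multi-select list ListIndex refers to the row that currently has focus (usually the last one clicked), not to the row the loop is examining, so every pass appended the same number. A minimal sketch of a reusable variant that collects the positions in an array and joins them, assuming a hypothetical helper named SelectedPositions:

```vba
Option Explicit

' Hypothetical helper: returns the 1-based positions of the selected rows
' of a multi-select list box as a comma-separated string, e.g. "1,3".
' Returns an empty string when nothing is selected.
Public Function SelectedPositions(ByVal lb As Object) As String
    Dim picks() As String
    Dim n As Long, i As Long

    If lb.ListCount = 0 Then Exit Function
    ReDim picks(0 To lb.ListCount - 1)
    For i = 0 To lb.ListCount - 1
        If lb.Selected(i) Then
            picks(n) = CStr(i + 1)    ' the loop counter is the row position
            n = n + 1
        End If
    Next i
    If n = 0 Then Exit Function
    ReDim Preserve picks(0 To n - 1)  ' drop the unused slots
    SelectedPositions = Join(picks, ",")
End Function
```

In the button handler this would be called as ThisWorkbook.Sheets("Master SPED Sheet").Range("c4") = SelectedPositions(Me.SpedListBx).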

sum number of a string of a cell that meet certain conditions

I just wonder how to calculate this in vba:
Calculate the first amount of money if it is yes but not calculate the amount if it is no. Imagine there are four cells:
(cell 1) abcbc bcbcbcb cbcbcbc $1000/kskskksks/$2000//1222/1221/11/yes
(cell 2) any words will be here $2300/heyhey hey/ //3232//3232/no
(cell 3) kakjsak dsdsk kdjskj 2323/ $23232/hhehe 22/33/333/yes
(cell 4) kakaka kjsdkj ksjskjds kdjsjkdj 11 223 222/ $1121/ $2121/yes
The algorithm is to check whether each cell ends in yes or no. Then, for each yes cell, find the first amount of money (beginning with $); a second amount in the same cell is not taken into account.
In this example, the program takes $1000 into account because the first cell is yes; the second cell is skipped since it is no. The third cell contributes its first amount (first $), $23232, and the fourth cell contributes $1121. So the program sums $1000+$23232+$1121=$25353.
I guess that this is what you want, considering that you are using the first column to place each value and your sheet's name is "Sheet1"
Sub SumFirstAmountIfYes()
    Dim AmountSum As Variant ' Running total of the amounts
    Dim lastRow As Long, i As Long
    Dim StringValue As String, FirstAmount As String
    Dim FirstDollar As Long, FirstSlashAfterDollar As Long
    lastRow = Sheets("Sheet1").Cells(Rows.Count, 1).End(xlUp).Row ' Finds the last used row on the first column
    For i = 1 To lastRow ' Iterates over the rows to the last row
        StringValue = Sheets("Sheet1").Cells(i, 1).Value2 ' Gets the value to a variable
        If StringValue Like "*yes" Then ' Checks if the string terminates with "yes"
            FirstDollar = InStr(StringValue, "$") ' Finds first dollar symbol "$"
            If FirstDollar > 0 Then ' Skips rows that contain no amount
                FirstSlashAfterDollar = InStr(FirstDollar, StringValue, "/", 0) ' Finds first slash "/" after the first dollar symbol
                FirstAmount = Mid(StringValue, FirstDollar + 1, FirstSlashAfterDollar - FirstDollar - 1) ' Gets the amount of each row
                AmountSum = AmountSum + CDec(FirstAmount) ' Adds to the sum variable each found amount
            End If
        End If
    Next
    MsgBox AmountSum ' Shows the final sum of the amounts
End Sub
This uses split to isolate yes/no and InStr to locate the first currency symbol.
Sub sumYes()
Dim i As Long, str As String, dbl As Double
With Worksheets("Sheet10")
For i = 2 To .Cells(.Rows.Count, "A").End(xlUp).Row
str = LCase(.Cells(i, "A").Value2)
If Split(str, "/")(UBound(Split(str, "/"))) = "yes" Then
If CBool(InStr(1, str, Chr(36))) Then
dbl = dbl + Val(Mid(str, InStr(1, str, Chr(36)) + 1))
End If
End If
Next i
.Cells(2, "B") = dbl
End With
End Sub
A slightly different approach uses an Excel array formula (confirm it with Ctrl+Shift+Enter in Excel versions without dynamic arrays), assuming your cell values don't contain trailing spaces:
=SUM(IF(RIGHT(A1:A4,1)="s",MID(A1:A4,SEARCH("$",A1:A4)+1,SEARCH("/",A1:A4,SEARCH("$",A1:A4))-SEARCH("$",A1:A4)-1)*1,""))
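Both macros above are tied to a particular sheet. As an alternative, here is a minimal sketch of a worksheet function, assuming a hypothetical name FirstAmountIfYes and assuming the amounts are plain digits with an optional decimal part (no thousands separators).

```vba
Option Explicit

' Hypothetical UDF: returns the first "$" amount in txt when txt ends in "yes",
' otherwise 0. Assumes amounts such as $1000 or $12.50.
Public Function FirstAmountIfYes(ByVal txt As String) As Double
    Dim mc As Object
    If Not (LCase$(Trim$(txt)) Like "*yes") Then Exit Function
    With CreateObject("VBScript.RegExp")
        .Global = False                       ' only the first match is needed
        .Pattern = "\$\s*(\d+(?:\.\d+)?)"     ' "$", optional spaces, then the number
        Set mc = .Execute(txt)
    End With
    If mc.Count > 0 Then FirstAmountIfYes = Val(mc.Item(0).SubMatches(0))
End Function
```

Put =FirstAmountIfYes(A1) next to each cell, fill down, and sum the helper column. For the four sample cells in the question this gives 1000, 0, 23232 and 1121, which add up to 25353.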

How can I remove text duplicates in a cell string?

So imagine there is the following string in a cell in excel:
A1 = "Company 1 Company 2 Company 1 Company 2 Company 3"
and the desired result for now is removing the duplicates:
A1 = "Company 1 Company 2 Company 3" (I imagine that this one doesn't require a macro)
the ideal one would be to put the distinct values in different cells in a vertical way:
A1 = "Company 1"
A2 = "Company 2"
A3 = "Company 3"
(which would require definitely programming but since I never used vba i'm not experienced enough I think to elaborate such code)
Is it feasible?
EDIT: the delimiter can be changed from a space " " to other, for example, a semicolon ";" to prevent errors and to be easier to solve this one.
Assuming you have a delimiter between the strings to tell them apart, you could use the following code:
Option Explicit
Sub RemoveDuplicates()
Const SEPARATOR = ","
Dim vDat As Variant
vDat = Split(Range("A1"), SEPARATOR)
' remove trailing blanks if necessary
Dim i As Long
For i = LBound(vDat) To UBound(vDat)
vDat(i) = Trim(vDat(i))
Next i
Dim dic As Object
Set dic = CreateObject("Scripting.Dictionary")
Dim vItem As Variant
For Each vItem In vDat
If Not dic.Exists(vItem) Then
dic.Add vItem, vItem
End If
Next
vDat = dic.Keys
' Write data to column B
Range("B1").Resize(UBound(vDat) + 1) = WorksheetFunction.Transpose(vDat)
'Debug.Print Join(vDat, SEPARATOR)
End Sub
Tested with the following data
A1 = Company 1, Company 2, Company 1, Company 2 , Company 3
or
A1 = IBM, Apple, Microsoft, Apple , IBM
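For the first variant asked about (keeping everything in one cell), the same dictionary idea can be wrapped in a worksheet function. This is a minimal sketch, assuming a hypothetical name UniqueJoin and treating entries that differ only in case as duplicates.

```vba
Option Explicit

' Hypothetical UDF: returns the distinct, trimmed entries of txt,
' in order of first appearance, re-joined with the same delimiter.
Public Function UniqueJoin(ByVal txt As String, ByVal delimiter As String) As String
    Dim dic As Object, part As Variant, key As String
    Set dic = CreateObject("Scripting.Dictionary")
    dic.CompareMode = vbTextCompare           ' "IBM" and "ibm" count as the same entry
    For Each part In Split(txt, delimiter)
        key = Trim$(part)
        If Len(key) > 0 Then                  ' ignore empty entries
            If Not dic.Exists(key) Then dic.Add key, Empty
        End If
    Next part
    UniqueJoin = Join(dic.Keys, delimiter)
End Function
```

In a cell this would be used as =UniqueJoin(A1,","). With the first test string above, it returns Company 1,Company 2,Company 3.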
With an unambiguous string, and by that I mean:
the delimiter is not included in the substrings, OR
each entry is surrounded by double quotes,
you can use Power Query in Excel 2010 or 2013, or Data > Get & Transform in Excel 2016, to do all of that:
1. Split the cell on the delimiter
2. Define the quote mark as the text qualifier if necessary
3. Rows - remove duplicates
So with data like:
Company 1;Company 2;Company 1;Company 2;Company 3
or (space delimiter)
"Company 1" "Company 2" "Company 1" "Company 2" "Company 3"
you can easily accomplish what you require without using VBA.
And if, as in your examples, there are extraneous spaces at the beginning or end of the data, Power Query has a Text.Trim function that will be useful.
Alternate solution using UDF (commented for clarity):
Public Function UNIQUELIST(ByVal arg_vOriginalList As String, ByVal arg_sDelimiter As String, ByVal arg_lReturnIndex As Long) As Variant
Dim oDict As Object
Dim vElement As Variant
Dim i As Long
'Use a dictionary to extract unique elements
Set oDict = CreateObject("Scripting.Dictionary")
i = 0 'This is a counter to keep track until we reach the appropriate return index
'Loop through each element
For Each vElement In Split(arg_vOriginalList, arg_sDelimiter)
'Check the trimmed, lowercase element against the keys of the dictionary
If Not oDict.Exists(LCase(Trim(vElement))) Then
'Unique element found
i = i + 1
If i = arg_lReturnIndex Then
'Found appropriate unique element, output and exit function
UNIQUELIST = Trim(vElement)
Exit Function
End If
'Not correct return index, add element to dictionary
'Lowercase the key (so uniques aren't case sensitive) and trim both the key and the value
oDict.Add LCase(Trim(vElement)), Trim(vElement)
End If
Next vElement
'arg_lReturnIndex was less than 1 or greater than the number of unique values, return blank
UNIQUELIST = vbNullString
End Function
Then in a cell where you want the output to start (for example, B1), put this formula and copy down (adjust the "," to be the correct delimiter):
=UNIQUELIST($A$1,",",ROW(A1))
Approach using same delimiters as in OP
I assume the same space delimiters as in your original post. As you want to get your company strings in groups of two, I slightly modified the good solution of @Storax by first concatenating the Split result in steps of 2, and I demonstrate a shorter way to write the results back to the sheet (cf. section [5]).
Example Code
Option Explicit ' declaration head of your code module
Sub SplitCompanies()
' [0] declare variables and set objects
Dim v, vItem
Dim i As Integer, n As Integer
Dim s, str As String
Dim ws As Worksheet
Set ws = ThisWorkbook.Worksheets("Tabelle5") ' << change to your sheet name
Dim dict As Object ' late binding of dictionary
Set dict = CreateObject("Scripting.Dictionary")
' [1] get cell value and split it (space delimited as in Original Post)
str = ws.Range("A1") ' cell value, e.g. "Company 1 Company 2 Company 1 Company 2 Company 3"
s = Split(str, " ") ' split cell value (space delimiter)
' [2] count all companies and redimension helper array
n = Int((UBound(s) + 1) / 2) - 1 ' items counter equals 1/2 of split items
ReDim v(0 To n) ' redim zero-based 1-dim helper array
' [3] concatenate partial strings in helper array
For i = 0 To n
v(i) = s(i * 2) & " " & s(i * 2 + 1)
Next i
' [4] build dictionary with unique items
For Each vItem In v
If Not dict.Exists(vItem) Then
dict.Add vItem, vItem
End If
Next
' [5] Write data to column B
ws.Range("B1:B" & dict.Count) = Application.Transpose(dict.Keys)
' [6] clear memory
Set dict = Nothing: Set ws = Nothing
End Sub

How to extract text within a string of text

I have a simple problem that I'm hoping to resolve without using VBA but if that's the only way it can be solved, so be it.
I have a file with multiple rows (all one column). Each row has data that looks something like this:
1 7.82E-13 >gi|297848936|ref|XP_00| 4-hydroxide gi|297338191|gb|23343|randomrandom
2 5.09E-09 >gi|168010496|ref|xp_00| 2-pyruvate
etc...
What I want is some way to extract the string of numbers that begin with "gi|" and end with a "|". For some rows this might mean as many as 5 gi numbers, for others it'll just be one.
What I would hope the output would look like would be something like:
297848936,297338191
168010496
etc...
Here is a very flexible VBA answer using the regex object. What the function does is extract every single sub-group match it finds (stuff inside the parenthesis), separated by whatever string you want (default is ", "). You can find info on regular expressions here: http://www.regular-expressions.info/
You would call it like this, assuming that first string is in A1:
=RegexExtract(A1,"gi[|](\d+)[|]")
Since this looks for all occurrences of "gi|" followed by a series of digits and then another "|", for the first line in your question, this would give you this result:
297848936, 297338191
Just run this down the column and you're all done!
Function RegexExtract(ByVal text As String, _
ByVal extract_what As String, _
Optional separator As String = ", ") As String
Dim allMatches As Object
Dim RE As Object
Set RE = CreateObject("vbscript.regexp")
Dim i As Long, j As Long
Dim result As String
RE.pattern = extract_what
RE.Global = True
Set allMatches = RE.Execute(text)
For i = 0 To allMatches.count - 1
For j = 0 To allMatches.Item(i).submatches.count - 1
result = result & (separator & allMatches.Item(i).submatches.Item(j))
Next
Next
If Len(result) <> 0 Then
result = Right$(result, Len(result) - Len(separator))
End If
RegexExtract = result
End Function
Here it is (assuming data is in column A)
=VALUE(LEFT(RIGHT(A1,LEN(A1) - FIND("gi|",A1) - 2),
FIND("|",RIGHT(A1,LEN(A1) - FIND("gi|",A1) - 2)) -1 ))
Not the nicest formula, but it will work to extract the number.
I just noticed that you have two values per row, with the output separated by commas, so you will need to check whether there is a second match, third match, etc. to make it work for multiple numbers per cell.
In reference to your exact sample (assuming 2 values maximum per cell) the following code will work:
=IF(ISNUMBER(FIND("gi|",$A1,FIND("gi|", $A1)+1)),CONCATENATE(LEFT(RIGHT($A1,LEN($A1)
- FIND("gi|",$A1) - 2),FIND("|",RIGHT($A1,LEN($A1) - FIND("gi|",$A1) - 2)) -1 ),
", ",LEFT(RIGHT($A1,LEN($A1) - FIND("gi|",$A1,FIND("gi|", $A1)+1)
- 2),FIND("|",RIGHT($A1,LEN($A1) - FIND("gi|",$A1,FIND("gi|", $A1)+1) - 2))
-1 )),LEFT(RIGHT($A1,LEN($A1) - FIND("gi|",$A1) - 2),
FIND("|",RIGHT($A1,LEN($A1) - FIND("gi|",$A1) - 2)) -1 ))
How's that for ugly? A VBA solution may be better for you, but I'll leave this here for you.
To go up to 5 numbers, well, study the pattern and repeat it manually in the formula. It will get long!
I'd probably split the data first on the | delimiter using the convert text to columns wizard.
In Excel 2007 that is on the Data tab, Data Tools group and then choose Text to Columns. Specify Other: and | as the delimiter.
From the sample data you posted it looks like after you do this the numbers will all be in the same columns so you could then just delete the columns you don't want.
As the others presented solutions without VBA, I'll present one that does use it. It's your call whether to use it or not.
I just saw that @Issun presented a solution with regex, very nice! Either way, here is a 'modest' solution to the question, using only 'plain' VBA.
Option Explicit
Option Base 0
Sub findGi()
Dim oCell As Excel.Range
Set oCell = Sheets(1).Range("A1")
'Loops through every row until empty cell
While Not oCell.Value = ""
oCell.Offset(0, 1).Value2 = GetGi(oCell.Value)
Set oCell = oCell.Offset(1, 0)
Wend
End Sub
Private Function GetGi(ByVal sValue As String) As String
Dim sResult As String
Dim vArray As Variant
Dim vItem As Variant
Dim iCount As Integer
vArray = Split(sValue, "|")
iCount = 0
'Loops through the array...
For Each vItem In vArray
'Searches for the 'Gi' factor...
If vItem Like "*gi" And UBound(vArray) > iCount + 1 Then
'Concatenates the results...
sResult = sResult & vArray(iCount + 1) & ","
End If
iCount = iCount + 1
Next vItem
'And removes trail comma
If Len(sResult) > 0 Then
sResult = Left(sResult, Len(sResult) - 1)
End If
GetGi = sResult
End Function
Open your Excel file in Google Sheets and use a regular expression with REGEXEXTRACT.
Sample Usage
=REGEXEXTRACT("My favorite number is 241, but my friend's is 17", "\d+")
Tip: REGEXEXTRACT will return 241 in this example because it returns the first matching case.
In your case
=REGEXEXTRACT(A1,"gi[|](\d+)[|]")
