Excel extracting text into columns - excel

Is there a way to extract data from a single cell and split it into columns by headers. For example we have in A1 cell text like this:
Name: John
Address: USA, New York
Age: 66
I want to split this text into columns with headers Name, Address, Age and extract data to the following columns. I'd be grateful for tips.

This is little bit tricky but will work on all version of excel. As per below screenshot Put Name, Address, Age to B1,C1 & D1 cell then put below formula to B2 cell then drag down and right as needed.
=SUBSTITUTE(TRIM(MID(SUBSTITUTE($A2,CHAR(10),REPT(" ",100)),((COLUMN(A$2)-1)*100)+1,COLUMN($A$2)*100)),B1&": ","")

If you wouldn't mind using formulas instead of VBA:
With Excel O365:
Formula in B2:
=TRANSPOSE(TRIM(FILTERXML("<t><s>"&SUBSTITUTE(SUBSTITUTE(A2,":",CHAR(10)),CHAR(10),"</s><s>")&"</s></t>","//s[position() mod 2 = 0]")))
With Excel 2013 or higher, other than O365:
=INDEX(TRIM(FILTERXML("<t><s>"&SUBSTITUTE(SUBSTITUTE($A2,":",CHAR(10)),CHAR(10),"</s><s>")&"</s></t>","//s[position() mod 2 = 0]")),COLUMN(A1))
And drag over and down...

A VBA array approach
This late post in addition to the valid answers above demonstrates an array approach and a double splitting:
section [1] splits into lines via the vbLf delimiter (equalling Chr(10)),
section [2] restricts splitting to two parts (via ": ")
As it's not so widely known how to use the Split() function by limiting output to 2 tokens only as shown in section [2], have a look at the
Syntax
Split(expression, [ delimiter, [ limit, [ compare ]]])
Option Explicit
Sub SplitIntoTokens()
With Sheet1 ' << change to your project's sheet Code(Name)
'[0] get string content
Dim lastRow As Long
lastRow = .Range("A" & .Rows.Count).End(xlUp).Row
Dim data As Variant
data = .Range("A1:D" & lastRow).Value
Dim i As Long
For i = 2 To UBound(data)
'[1] split into lines
' ~> Name: John|Address: USA, New York|Age: 66
Dim lines: lines = Split(data(i, 1), vbLf)
'[2] split into 2 parts only and take the 2nd one
Dim ii As Long, tmp
For ii = 0 To UBound(lines)
lines(ii) = Split(lines(ii), ": ", 2)(1) ' split via ": "-delimiter, 2nd part via index (1)
data(i, ii + 2) = lines(ii)
Next
'Debug.Print Join(lines, "|") optional (display results in VB Editors Immediate Window
Next
'[3] write array results back to sheet
.Range("A1").Resize(UBound(data), 4) = data ' write data
.Range("A1:D1") = Split("Data,Name,Address,Age", ",") ' write header (if not existant)
End With
End Sub

Related

Concatenate values of more cells in a single variable in vba

I have an excel file with four columns: name, surname, address, area.
There are a lot of rows.
Is there a way to concatenate all the values of every single row in a variable, using vba?
I need a variable that should contain something like this:
(name1, surname1, address1, area1); (name2, surname2, address2, area2); (name3, surname3, address3, area3)...
If you have the following data in your worksheet
Then the following code will read the data into an array …
Option Explicit
Public Sub Example()
Dim RangeData() As Variant ' declare an array
RangeData = Range("A1:D5").Value2 ' read data into array
End Sub
… with the following structure:
Alternatively you can do something like
Public Sub Example()
Dim DataRange As Range
Set DataRange = Range("A2:D5")
Dim RetVal As String
Dim Row As Range
For Each Row In DataRange.Rows
RetVal = RetVal & "(" & Join(Application.Transpose(Application.Transpose(Row.Value2)), ",") & "); "
Next Row
Debug.Print RetVal
End Sub
To get this output:
(name1, surname1, address1, area1); (name2, surname2, address2, area2); (name3, surname3, address3, area3); (name4, surname4, address4, area4);
.. is there a way to write the result like a sort of list that shows all the values of the cells of the range?
Yes, there is. In addition to PEH's valid answers and disposing of Excel version MS365 you might also use
Dim s as String
s = Evaluate("ArrayToText(A2:D5, 1)") ' arg. value 1 representing strict format
resulting in the following output string:
{"name1","surname1","address1","area1";"name2","surname2","address2","area2";"name3","surname3","address3","area3";"name4","surname4","address4","area4"}
Syntax
ARRAYTOTEXT(array, [format])
The ARRAYTOTEXT function returns an array of text values from any specified range. It passes text values unchanged, and converts non-text values to text.
The format argument has two values, 0 (concise default format) and 1 (strict format to be used here to distinguish different rows, too):
Strict format, i.e. value 1 includes escape characters and row delimiters. Generates a string that can be parsed when entered into the formula bar. Encapsulates returned strings in quotes except for Booleans, Numbers and Errors.
Thank you for your answers, suggestions, ideas and hints. I am sorry if my question was not so clear, all the solutions you added were perfect and extremely elegant.
In the end I found a way - a dumber way in comparison to all the things you wrote - and I solved with a for statement.
I did like this:
totRow = ActiveSheet.UsedRange.Rows.Count
For i = 1 To totRow
name = Cells(i, 1)
surname = Cells(i, 2)
address = Cells(i, 3)
area = Cells(i, 4)
Example = Example & "(" & name & ", " & surname & ", " & address & ", " & area & "); "
Next i
Range("E1").Value = Example
It works (it does what I wanted to do), but I noticed a little limit: if the rows are a lot I can't keep the whole text in the variable.

VBA formula removing everything after second space

I was trying to copy from column D to column E first two words of each row but still can not find where the error is....
Range("E1:E" & lLastRow).Formula = "=LEFT(D1,FIND("" "",D1,FIND("" "",D1)+1)-1)"
Another option, instead of using a Formula, you can use the Split function.
Code
Dim i As Long, LastRow As Long
Dim WordsArr As Variant
' loop through rows
For i = 1 To LastRow
WordsArr = Split(Range("D" & i).Value, " ") ' use Split and space to read cell words to array
If UBound(WordsArr) >= 1 Then ' make sure the cell contents is at least 2 words
Range("E" & i).Value = WordsArr(0) & " " & WordsArr(1) ' insert only the first 2 words
Else ' in case there are less than 2 words
' do someting....
End If
Next i
End Sub
Try this instead ...
Range("E1:E" & lLastRow).FormulaR1C1 = "=LEFT(RC[-1],FIND("" "",RC[-1],FIND("" "",RC[-1])+1)-1)"
I find using R1C1 better for those sort of operations, especially given you want your references to be dynamic, not absolute.
Alternatively, add the formula you had normally and simply fill down.

Find how many words from cell are found in an array

I have two columns with data. The first one has some terms and the other one contains single words.
what I have
I'm looking for a way to identify which words from each cell from the first column appear in the second, so the result should look something like this (I don't need the commas):
what I need
My question is somehow similar to Excel find cells from range where search value is within the cell but not exactly, because I need to identify which words are appearing in the second column and there can be more than one word.
I also tried =INDEX($D$2:$D$7;MATCH(1=1;INDEX(ISNUMBER(SEARCH($D$2:$D$7;A2));0);))
but it also returns only one word.
If you are willing to use VBA, then you can define a user defined function:
Public Function SearchForWords(strTerm As String, rngWords As Range) As String
Dim cstrDelimiter As String: cstrDelimiter = Chr(1) ' A rarely used character
strTerm = cstrDelimiter & Replace(strTerm, " ", cstrDelimiter) & cstrDelimiter ' replace any other possible delimiter here
SearchForWords = vbNullString
Dim varWords As Variant: varWords = rngWords.Value
Dim i As Long: For i = LBound(varWords, 1) To UBound(varWords, 1)
Dim j As Long: For j = LBound(varWords, 2) To UBound(varWords, 2)
If InStr(1, strTerm, cstrDelimiter & varWords(i, j) & cstrDelimiter) <> 0 Then
SearchForWords = SearchForWords & varWords(i, j) & ", "
End If
Next j
Next i
Dim iLeft As Long: iLeft = Len(SearchForWords) - 2
If 0 < iLeft Then
SearchForWords = Left(SearchForWords, Len(SearchForWords) - 2)
End If
End Function
And you can use it from the Excel table like this:
=SearchForWords(A2;$D$2:$D$7)
I have a partial solution:
=IF(1-ISERROR(SEARCH(" "&D2:D7&" "," "&A2&" ")),D2:D7&", ","")
This formula returns an array of the words contained in the cell (ranges are according to your picture). This array is sparse: it contains empty strings for each missing word. And it assumes that words are always separated by one space (this may be improved if necessary).
However, native Excel functions are not capable of concatenating an array, so I think the rest is not possible with native formulas only.
You would need VBA but if you use VBA you should not bother with the first part at all, since you can do anything.
You can create a table with the words you want to find across the top and use a formula populate the cells below each word if it's found. See screenshot.
[edit] I've noticed that it's incorrectly picking up "board" in "blackboard" but that should be easily fixed.
=IFERROR(IF(FIND(C$1,$A2,1)>0,C$1 & ", "),"")
Simply concatinate the results
=CONCATENATE(C2,D2,E2,F2,G2,H2)
or
=LEFT(CONCATENATE(C2,D2,E2,F2,G2,H2),LEN(CONCATENATE(C2,D2,E2,F2,G2,H2))-2)
to take off the last comma and space
I've edited this to fix the problem with "blackboard"
new formula for C2
=IF(OR(C$1=$A2,ISNUMBER(SEARCH(" "&C$1&" ",$A2,1)),C$1 & " "=LEFT($A2,LEN(C$1)+1)," " & C$1=RIGHT($A2,LEN(C$1)+1)),C$1 & ", ","")
New formula for B2 to catch the error if there are no words
=IFERROR(LEFT(CONCATENATE(C2,D2,E2,F2,G2,H2,I2),LEN(CONCATENATE(C2,D2,E2,F2,G2,H2,I2))-2),"")

EXCEL vba - extract numbers from cell and paste into two different columns?

I have a spreadsheet with a load of random text and numbers in column A like so:
Column A
Row 1 = 471806121601 5205569 - 0007 Standard White Toilet Tissue 27
Row 2 = 471814121601 5206177 - 0014 Premium White Toilet Tissue 6
Row 3 = 471814121601 5206178 - 0007 Premium White Toilet Tissue 27
Row 4 = 471806121601 5206180 - 0014 Premium Kitchen Towel 2x75l 6
I have about 2000 lines in total. In each cell, is a Purchase order number (12 digits) and an item number next to it (7 digits).
I am trying to extract the po number and put it into column B and extract the item number and put it into column C
Column B Column C
471806121601 5205569
471814121601 5206177
471814121601 5206178
471806121601 5206180
Here is my code:
Option Explicit
Sub main()
Dim cell As Range
Dim arr As Variant, arrElem As Variant
With Worksheets("Orders") '<--| change "Strings" to your actual worksheet name
For Each cell In .Range("A1", .Cells(.Rows.Count, "A").End(xlUp))
arr = Split(Replace(cell.Value, " ", " "), " ") '<--| change "A"'s to your actual relevant column index
For Each arrElem In arr
If IsNumeric(arrElem) Then
If Len(arrElem) = 12 Then cell.Offset(0, 1).Value = arrElem
End If
Next arrElem
Next cell
End With
Dim cell2 As Range
Dim arr2 As Variant, arrElem2 As Variant
With Worksheets("Orders") '<--| change "Strings" to your actual worksheet name
For Each cell2 In .Range("A1", .Cells(.Rows.Count, "A").End(xlUp))
arr2 = Split(Replace(cell2.Value, " ", " "), " ") '<--| change "A"'s to your actual relevant column index
For Each arrElem2 In arr2
If IsNumeric(arrElem2) Then
If Len(arrElem2) = 7 Then cell2.Offset(0, 3).Value = arrElem2
End If
Next arrElem2
Next cell2
End With
End Sub
This code does work. However it takes absolutely ages and only does one line at a time...Slowly.
Is there a quicker way of doing this? Thanks
If your PO and IN are always the same length in col B put
=MID(A2, 1, 12)
And in col C
=MID(A2, 14, 7)
However if your number change but are always the first two swap the above for,
=MID(A2,1,FIND(" ",A2,1)-1)
And
=MID(A2, FIND(" ", A2, 1)+1, 7)
Respectively.
just use split(string,delimiter)(0) and (1) why replace the space, just use that as the delim. If Row # is in, then use (1) and (2), or you could consider split(split(input,"-")," ") maybe a little faster, not sure though. Also, once you're done no need to complete the loop, so consider, do until with flags rather than for next, although exit for is available
Formula wise, it could be done using something like this
=MID(D1,FIND("é",SUBSTITUTE(D1," ","é",3)),FIND("é",SUBSTITUTE(D1," ","é",4))-FIND("é",SUBSTITUTE(D1," ","é",3)))
and
=MID(D1,FIND("é",SUBSTITUTE(D1," ","é",4)),FIND("é",SUBSTITUTE(D1," ","é",5))-FIND("é",SUBSTITUTE(D1," ","é",4)))

Using nested formula in VBA

I'm working on problem that necessitates the use of nested formulas in excel. For eg:
I have a column for errors and one for its analysis
Error Analysis
Enter a valid material number Invalid Material
Eg errors:
Enter a valid material number; The material number 1234 does not
exist.
PO number XYZ does not exist.
VIN number 123 does not exist.
Country of origin AB does not exist.
I have a compendium of such errors and their analyis in the next sheet, and I'm using VLOOKUP in conjuction with FIND to lookup the analysis for the known errors.
=VLOOKUP(LEFT(F2, FIND(" ", F2, FIND(" ", F2) + 1) - 1)&"*", 'Sheet2'!A:B, 2, 0)
What i'm trying to do here is extract the first two words from the error and append a * to it and use it in VLOOKUP.
It would be something like Vlookup "PO number *" in the other sheet and get the analysis for it. Asterisk is because I don 't get the same number daily. And I also know that the extracted first two words of the error will be unique. (I know that error with "Enter a" as the first two words will not appear again).
Now I get errors in the same column so I thought of making a button and writing a code which uses the above formula.
I tried to modify some code off the net, but I'm not getting anywhere with it. I'm totally new to VBA. It'd be great if you can provide a snippet for this. I'll try to replicate the procedure for other needs.
This code seems to be working for now
Sub PopulateAnalysis()
Dim an_row As Long
Dim an_clm As Long
Dim lft As String
Dim st_num As Integer
Dim fin As String
Dim searchStr As String
Dim soughtStr As String
Table1 = Sheet1.Range("F2:F6") 'ErrorColumn from Error table (How do I make the range dynamic??)
Table2 = Sheet5.Range("A1:B6")
an_row = Sheet1.Range("G2").Row ' Populate this column from the analysis table on sheet2
an_clm = Sheet1.Range("G2").Column
For Each cl In Table1
'How do I translate the above formula into VBA statements??
st_num = InStr(InStr(cl, " ") + 1, cl, " ")
lft = left(cl, st_num - 1)
fin = lft & "*"
Sheet1.Cells(an_row, an_clm) = Application.WorksheetFunction.VLookup(fin, Table2, 2, True)
an_row = an_row + 1
Next cl
MsgBox "Done"
End Sub
This should work. You don't need the debug lines of course ;)
Sub PopulateAnalysis()
Dim rngTableWithErrors As Range
Dim rngTableWithAnalysis As Range
Application.ScreenUpdating = False
'set the range for Table with error, Table1 on sheet 1
With Sheets(1) 'change to name of the sheet, more reliable than index num.
Set rngTableWithErrors = .Range("F2:F" & .Cells(.Rows.Count, 6).End(xlUp).Row)
Debug.Print rngTableWithErrors.Address
End With
'set the range for Table with Analysis, Table 2 on sheet 2
With Sheets(2) 'change to name of the sheet, more reliable than index num.
Set rngTableWithAnalysis = .Range("A1:B" & .Cells(.Rows.Count, 2).End(xlUp).Row)
Debug.Print rngTableWithAnalysis.Address
End With
'formula for cell G2
'=VLOOKUP(LEFT(F2;FIND(" ";F2;FIND(" ";F2)+1)- 1)&"*";Sheet2!A1:B23;2; 0)
rngTableWithErrors.Offset(0, 1).FormulaR1C1 = _
"=VLOOKUP(LEFT(R[0]C[-1],FIND("" "",R[0]C[-1],FIND("" "",R[0]C[-1])+1)-1)& ""*"",Sheet2!R1C1:R" & rngTableWithAnalysis.Rows.Count & "C2,2, 0)"
Application.ScreenUpdating = True
MsgBox "Done"
End Sub
Notes
You can notice, that we are setting the upper left cells of ranges manually. It's better practice to somehow find the upper left cells (using Find method is my favorite) and work from there. You never know, how the user will change the worksheet - i.e. add new rows, columns etc.

Resources