Logically parse a string in Excel to trim near duplicates

Assume the string:
item1, item1N, item1Z, item1fhg, item1_any_letters, item2, item3, item3N, item3H
My goal output is simply:
item1, item2, item3
This is currently an Excel file of about 100,000 lines, but it can be migrated temporarily to another program if needed.
Essentially I need to identify duplicates (any initial phrase ending in a number) with no regard to the letters after the number. Some phrases might look like "Brand item2, Brand item34" as well; the only thing that differs between duplicates is the text AFTER the number, and that text should be ignored.
Any ideas on where to begin with this? Each string usually has between 2 and 500 values in it, separated by a comma and a space. No comma follows the final value.

Sub Tester()
Dim re As Object, match As Object
Dim dict As Object
Dim arr, arrItems, x As Long, y As Long
Dim val, matches, valMatch
Set dict = CreateObject("scripting.dictionary")
Set re = CreateObject("VBScript.RegExp")
re.Pattern = "([\w ]+\d+)"
re.ignorecase = True
re.Global = True
arr = ActiveSheet.Range("A1:A100").Value
For x = LBound(arr, 1) To UBound(arr, 1)
arrItems = Split(arr(x, 1), ",")
dict.RemoveAll
For y = LBound(arrItems) To UBound(arrItems)
val = Trim(arrItems(y))
If re.Test(val) Then
Set matches = re.Execute(val)
valMatch = matches(0).Value
If Not dict.exists(valMatch) Then dict.Add valMatch, 1
End If
Next y
Debug.Print arr(x, 1)
Debug.Print Join(dict.keys, ",") 'where do you want this?
Next x
End Sub

A VBA approach that is somewhat similar to Tim's for the first pathway:
Use a RegExp to remove the invalid characters (characters after a number and before a comma)
Eliminate the duplicates with either
a) a Dictionary, or
b) Excel's inbuilt remove duplicates functionality (writes to a sheet)
Const strDelim = ", "
Sub TestMe()
Dim strTest As String
Dim x
strTest = "item1, item1N, item1Z, item1fhg, item1_any_letters, item2, item3, item3N, item3H"
x = Split(DeDupe(strTest), strDelim)
'fix last element
x(UBound(x)) = Left$(x(UBound(x)), Len(x(UBound(x))) - 1)
Call Method2(x)
End Sub
Sub Method2(ByVal x)
Dim objDic As Object
Dim y As Variant
Set objDic = CreateObject("Scripting.Dictionary")
Dim lngRow As Long
For lngRow = LBound(x) To UBound(x)
objDic(x(lngRow)) = 1
Next lngRow
MsgBox Join(objDic.keys, strDelim)
End Sub
Function DeDupe(strIn As String) As String
Dim objRegex As Object
Set objRegex = CreateObject("vbscript.regexp")
With objRegex
.Global = True
.Pattern = "(.+?\d+)[^\d]+(,|$)"
DeDupe = .Replace(strIn, "$1,")
End With
End Function
Option B
'another potential option. Not applied in this code
Sub Method1(ByVal x)
Dim y As Variant
Dim rng1 As Range
With ActiveSheet
.[a1].Resize(UBound(x) + 1, 1) = Application.Transpose(x)
.Columns("A").RemoveDuplicates Columns:=1, Header:=xlNo
y = Application.Transpose(Range([a1], Cells(Rows.Count, "A").End(xlUp)))
End With
MsgBox Join(y, strDelim)
End Sub

This is probably imperfect, since it's a quick hack which only removes the rightmost non-digit strings. You will need some regexp knowledge to tune it to your needs.
Anyway, follow the "installation" steps given here, save the module, and you will be able to write in your sheet a formula such as
=S(A1;"[^0-9]*$";"")
in, say, the B1 cell. If A1 cell contains "Item 1234 blah blah", then B1 will now contain "Item 1234". Drag the formula in all cells of column B, and save values to another Excel file for sorting (or you can try sorting and sub-totaling in-place).
Unfortunately, I do not believe that doing this in 100,000+ cells is practical (I even advise against subtotaling in-place).
You would be much better served by installing text tools (sed, grep, uniq...) for Windows, and running your file through a filter. Assuming that each row represents one item as above, a filter such as
sed -e 's/^\([^0-9][^0-9]*[0-9][0-9]*\).*/\1/g' | sort | uniq -c | sort -rn
would get your 100,000 line file and return something like
79283 Item 1
1234 Item 2
993 Item 3
..........
(On some platforms you could have written (\D+\d+) instead of ([^0-9]..., but I'm unsure of the Windows behaviour.)
An even better choice of tools would be (Strawberry) Perl, which has CSV support too, or Python.
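As a rough illustration of the Python route, here is a minimal sketch that de-duplicates one comma-separated row. It assumes that the "key" of a value is everything up to and including its first run of digits, and that values without any digit are kept unchanged; the function name `dedupe_row` is made up for this example.

```python
import re

# Assumption: a value's key is everything up to and including its first run of digits.
_KEY = re.compile(r"^(.*?\d+)")


def dedupe_row(row: str, delim: str = ", ") -> str:
    """Collapse values that share the same text up to their first number."""
    seen = {}  # dicts preserve insertion order, so first-seen order is kept
    for value in row.split(delim):
        value = value.strip()
        match = _KEY.match(value)
        # Values that contain no digit at all are passed through unchanged.
        key = match.group(1) if match else value
        seen.setdefault(key, None)
    return delim.join(seen)


sample = "item1, item1N, item1Z, item1fhg, item1_any_letters, item2, item3, item3N, item3H"
print(dedupe_row(sample))
```

For the sample string from the question this prints item1, item2, item3. Reading the sheet (for example from a CSV export) and writing the results back are left out of the sketch.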

Related

Is there a faster way to replace accented characters?

I have this code that replaces all accented characters except in row 6. However, this macro takes a long time because it goes through every cell/letter, is there any way to make this any faster by making it ignore cells that don't have any accents in them?
Const sFm As String = "ŠŽšžŸÀÁÂÃÄÅÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖÙÚÛÜÝàáâãäåçèéêëìíîïðñòóôõöùúûüýÿ"
Const sTo As String = "SZszYAAAAAACEEEEIIIIDNOOOOOUUUUYaaaaaaceeeeiiiidnooooouuuuyy"
Dim i As Long, employeews As Worksheet
Dim rowsix() As Variant
Set employeews = DestWb.Sheets(1)
'Don't replace row 6
rowsix = employeews.Rows(6).Value
For i = 1 To Len(sFm)
employeews.Cells.Replace Mid(sFm, i, 1), Mid(sTo, i, 1), LookAt:=xlPart, MatchCase:=True
Next i
employeews.Rows(6).Value = rowsix
Putting comment as an answer so the code is more readable:
I would think to choose a range I want to replace values within, then loop through the special characters to replace, as a whole, within the range. The only real caveat to remember is that this will affect formulas.
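Dim accentArr As Variant, noAccentArr As Variant
'accentArr and noAccentArr need to have the same upper bound for this approach!
accentArr = Array("Š", "Ž", "š") 'quick mockup
noAccentArr = Array("S", "Z", "s")
Dim i As Long
For i = LBound(accentArr) To UBound(accentArr)
'no parentheses around the arguments, because the return value is not used
'MatchCase:=True keeps "Š" and "š" apart; ws is the worksheet to clean
ws.Range("A1:Z5").Replace accentArr(i), noAccentArr(i), LookAt:=xlPart, MatchCase:=True
Next i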
Rather than going character by character in the cell, you at least do a mass replace for specific characters... this also allows your Range() to start at row 7, as to not include row 6.
Postscript, see: Split string into array of characters? if you want to utilize the existing string without having to manually split out the string of characters into an array.
In line with what everyone else is saying, and not really knowing what you consider bad performance, you could try something like this. It uses a dictionary populated from your from and to strings, split into characters, where each from character is the key and its replacement is the item. The Keys() and Items() of the dictionary are arrays, so the loop uses them rather than slicing the strings each time, and the dictionary stays available for later calls. The early-bound Scripting.Dictionary declaration requires a reference to Microsoft Scripting Runtime.
Private d As Scripting.Dictionary
Const sFrom As String = "ŠŽšžŸÀÁÂÃÄÅÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖÙÚÛÜÝàáâãäåçèéêëìíîïðñòóôõöùúûüýÿ"
Const sTo As String = "SZszYAAAAAACEEEEIIIIDNOOOOOUUUUYaaaaaaceeeeiiiidnooooouuuuyy"
Sub PopulateReplacements()
Dim s As String
Dim l As Long
Set d = New Scripting.Dictionary
For l = 1 To Len(sFrom)
If Not d.Exists(Mid(sFrom, l, 1)) Then _
d.Add Mid(sFrom, l, 1), Mid(sTo, l, 1)
Next l
End Sub
Sub TestReplacing()
Dim s As String
Dim l As Long
s = "ÁÂÃÄÅÇÈÉÊËÌÍÎÏÐÑÒÓÔ"
s = "_Ÿ_À_Á_Â_Ã_Ä_Å_Ç_È_É_Ê_Ë_Ì_Í_Î_Ï_Ð_Ñ_"
s = sFrom
If d Is Nothing Then
PopulateReplacements
End If
For l = 0 To UBound(d.Keys())
s = Replace(s, d.Keys()(l), d.Items()(l))
Next l
Debug.Print s
End Sub
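For comparison, the same from/to character mapping can be sketched outside VBA. The Python snippet below is only an illustration of the lookup-table idea, not a replacement for the macro: it builds one translation table from the two strings in the question and applies it in a single pass per string. The name `strip_accents` is hypothetical.

```python
S_FROM = "ŠŽšžŸÀÁÂÃÄÅÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖÙÚÛÜÝàáâãäåçèéêëìíîïðñòóôõöùúûüýÿ"
S_TO = "SZszYAAAAAACEEEEIIIIDNOOOOOUUUUYaaaaaaceeeeiiiidnooooouuuuyy"

# Both strings must have the same length: character i of S_FROM maps to character i of S_TO.
TABLE = str.maketrans(S_FROM, S_TO)


def strip_accents(text: str) -> str:
    """Replace every character listed in S_FROM with its counterpart in S_TO."""
    return text.translate(TABLE)


print(strip_accents("Àlvaro Ñuñez çà"))
```

Characters that are not in S_FROM are left untouched, so cells without accents pass through unchanged.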

how to not enter if statement inside a loop if it have been executed

I have a For loop, and inside it I have an If statement.
In my Excel sheet I have a list that contains each value once. Once I have found a value, I don't want the code to even check that condition again; I want it to skip that branch of the If statement completely on every later iteration of the loop. Is that possible?
Here is my code and list:
The first iteration of the loop will find that "c" is the value, so it will execute that branch (xc = i).
I don't want the code to even check ElseIf Cells(1, i) = "c" again. Is this possible?
Code as text:
Sub test()
Dim i, xa, xb, xc As Integer
For i = 1 To 5
If Cells(i, 1) = "a" Then
xa = i
ElseIf Cells(i, 1) = "b" Then
xb = i
ElseIf Cells(i, 1) = "c" Then
xc = i
End If
Next i
End Sub
My initial interpretation of your need was "if the code hits 'c' again, just don't act".
To do so, you could modify the logic as follows:
ElseIf (xc = 0) And (Cells(i, 1) = "c") Then
This way, as soon as xc is set, the first boolean expression would be False, and the overall condition would never be met again. As mentioned by @TimWilliams, VBA would still evaluate the second boolean expression, because unlike some other languages its And operator does not short-circuit. @Gene's answer describes a way around this. Typically, for better performance, you would evaluate the simple conditions first, before resorting to costly ones.
Additional notes
In VBA, each variable in a Dim line needs its own As clause. In your Dim line, only xc is an Integer, while i, xa and xb are Variants.
An unqualified Cells() call operates on the currently active worksheet, which might not be the expected one. Suggestion: qualify Cells() with the CodeName of your worksheet. The CodeName is what you see or specify under a worksheet's (Name) property as seen from the Visual Basic editor. For example, if (Name) is Sheet1, use Sheet1.Cells(). This will only work if the code resides in the same workbook as Sheet1. If the code is behind the worksheet itself, you can even use Me.Cells().
When dealing with cell values as your code does, VBA is (silently) being nice and understands that, among the numerous properties of the Range class, Value is what you are interested in. It is better practice, however, to explicitly state the target property, such as in Sheet1.Cells(i, j).Value.
EDIT
Knowing the values will be distinct and that there are about 60 of them, I suggest you simply use a Dictionary, as shown below, to get each value's row in one go, without a cascade of Ifs:
Option Explicit
Sub test()
Dim i As Integer
Dim dict As Object 'Scripting.Dictionary
Set dict = CreateObject("Scripting.Dictionary")
For i = 1 To 5
dict(Cells(i, 1).Value) = i
Next
Debug.Print dict("a") '4
Debug.Print dict("b") '2
Debug.Print dict("c") '1
'Etc.
End Sub
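The same one-pass lookup, sketched in Python purely for illustration. The sample values c, b, 0, a, 1 are an assumption borrowed from the test data used in another answer here, and `first_rows` is a hypothetical helper name. Unlike the VBA above, which overwrites on repeats, this sketch keeps the first row for each value; with distinct values the result is the same.

```python
def first_rows(values):
    """Map each value to the 1-based row where it first appears."""
    rows = {}
    for row, value in enumerate(values, start=1):
        # setdefault only stores the row the first time a value is seen
        rows.setdefault(value, row)
    return rows


rows = first_rows(["c", "b", "0", "a", "1"])
print(rows["a"], rows["b"], rows["c"])
```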
If I understood your question, you can try this code:
Sub test()
Dim i As Integer, xa As Integer, xb As Integer, xc As Integer
Dim a As Boolean, b As Boolean, c As Boolean
a = False
b = False
c = False
For i = 1 To 5
If Cells(i, 1) = "a" And a <> True Then
xa = i
a = True
ElseIf Cells(i, 1) = "b" And b <> True Then
xb = i
b = True
ElseIf Cells(i, 1) = "c" And c <> True Then
xc = i
c = True
End If
Next i
End Sub
Each Boolean variable is set to True the first time its value is found, for example when Cells(i, 1) = "a", so any later "a" values are skipped.
Hope this helps.
I just wanted to "mod" Ferdinando's code so it's a bit more readable, I think. The main (substantive) difference between this version and Ferdinando's or Excelosaurus' is that the cell is not even tested once the value has been detected. Remember that the question was: I don't want the code to even check "ElseIf Cells(1, i) = "c" again... So, this version does exactly that.
Sub test()
Dim i As Integer, xa As Integer, xb As Integer, xc As Integer
Dim aFound As Boolean, bFound As Boolean, cFound As Boolean
Dim r As Range
For i = 1 To 5
Set r = Cells(i, 1)
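'independent blocks: chaining these with ElseIf would never test "b" or "c" until "a" has been found
If Not aFound Then
If r = "a" Then xa = i: aFound = True
End If
If Not bFound Then
If r = "b" Then xb = i: bFound = True
End If
If Not cFound Then
If r = "c" Then xc = i: cFound = True
End If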
Next i
End Sub
I don't like the idea of 60 ElseIfs. Please examine the code below. In order to test it, create a worksheet called "TestSheet" and enter your A1:A5 to cells H2:H6.
Sub TestSpike()
' 06 Jan 2019
Dim Rng As Range
Dim Items As Variant
Dim Spike As String
Dim Tmp As String
Dim i As Integer
Dim R As Long
Items = Split("c|b|0|a|1", "|")
With Worksheets("TestSheet").Columns("H")
For R = 2 To 6
Tmp = CStr(.Cells(R).Value)
If InStr(1, Spike, Tmp, vbTextCompare) = 0 Then
Spike = Spike & "|" & Tmp
On Error Resume Next
i = Application.WorksheetFunction.Match(Tmp, Items, 0)
If Err Then
MsgBox Tmp & " wasn't found in Array"
Else
MsgBox "i = " & i & " = Item " & Tmp
End If
End If
Next R
End With
End Sub
The code has a "Spike". Each item is first checked against the Spike. If it is found there no further tests are carried out. Else, it is added to the Spike.
New items, after being added to the Spike, are checked against the Array "Items" which would hold your 60 elements, separated by Chr(124) thus, Split("c|b|0|a|1", "|"). I use the worksheet function MATCH to look for the item in the array. The result is an index number (or an error, if not found). You can use this index number in a Select Case statement to process each item distinct from others, basically the same way as you now process it when the If statement returns True.
One idea you may find useful with this kind of setup is to use the index from the Match function to return a value from another array. The other array might, for example, contain function names and you use Application.Run to call a different function for each item. This would run significantly faster than examining 60-odd Select Case statements.

Deleting everything after the second occurrence of a number

Quick question, if I want to delete everything after the second occurrence of a number:
i.e -
I have:
1105 Bracket Ave. Suite 531 Touche
5201 Used St. 1351 Bored Today
I want:
1105 Bracket Ave. Suite 531
5201 Used St. 1351
is there a simple formula or VBA I would use for this?
Here is a UDF using VBA's regular expression engine to remove all after the second integer.
Option Explicit
Function FirstTwoNumbers(S As String) As String
Dim RE As Object
Set RE = CreateObject("vbscript.regexp")
With RE
.Pattern = "(\d+\D+\d+).*"
FirstTwoNumbers = .Replace(S, "$1")
End With
End Function
If there is only a single integer, it will return the entire string.
If the numbers might be decimal numbers, you will need to modify .Pattern.
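For readers who want to try the pattern outside Excel, here is a small Python sketch of the same idea. It is an illustration only: the function name `first_two_numbers` is made up, and like the UDF above it treats a number as a plain run of digits (no decimals).

```python
import re

# Optional leading text, first integer, at least one non-digit, second integer.
_FIRST_TWO = re.compile(r"^(\D*\d+\D+\d+)")


def first_two_numbers(text: str) -> str:
    """Cut the string right after its second run of digits, if there is one."""
    match = _FIRST_TWO.match(text)
    # With fewer than two integers there is nothing to cut, so return the input.
    return match.group(1) if match else text


for line in ("1105 Bracket Ave. Suite 531 Touche", "5201 Used St. 1351 Bored Today"):
    print(first_two_numbers(line))
```

For the two sample addresses this prints the "I want" lines from the question.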
And here is another UDF using only native VBA methods:
Function FirstTwo(S As String) As String
Dim V
Dim tS As String
Dim I As Long, numNumbers As Long
V = Split(S)
Do Until numNumbers = 2 Or I > UBound(V) 'stop at the second number, or at the end if there are fewer than two
tS = tS & Space(1) & V(I)
I = I + 1
If IsNumeric(V(I - 1)) Then numNumbers = numNumbers + 1
Loop
FirstTwo = Mid(tS, 2)
End Function
and finally, a formula with no particular assumptions:
=LEFT(A1,FIND(CHAR(1),SUBSTITUTE(A1," ",CHAR(1),LOOKUP(2,1/ISNUMBER(-TRIM(MID(SUBSTITUTE(A1," ",REPT(" ",99)),seq_99,99))),seq))))
seq and seq_99 are Named Formulas (Formulas ► Define Name):
seq Refers to: =ROW(INDEX($1:$255,1,1):INDEX($1:$255,255,1))
seq_99 Refers to: =IF(ROW(INDEX($1:$255,1,1):INDEX($1:$255,255,1))=1,1,(ROW(INDEX($1:$255,1,1):INDEX($1:$255,255,1))-1)*99)
This solution makes these assumptions:
The first number is at most 10 characters long
There are at least 10 characters (letters or spaces) between the first and second number
There is always a space after the second number
There is always a second number in the string
Try this:
=TRIM(MID(A1,1,FIND(" ",A1,MIN(FIND({0,1,2,3,4,5,6,7,8,9},A1&"0123456789",MIN(FIND({0,1,2,3,4,5,6,7,8,9},A1&"0123456789"))+10)))))
Here is a VBA approach, amend range to suit. It puts the answer in the adjacent column
Sub x()
Dim oMatches As Object, r As Range
With CreateObject("VBScript.RegExp")
.Global = True
.Pattern = "\d+"
For Each r In Range("A1:A5")
If .Test(r) Then
Set oMatches = .Execute(r)
If oMatches.Count > 1 Then
r.Offset(, 1).Value = Left(r, oMatches(1).firstindex + oMatches(1).Length)
Else
r.Offset(, 1).Value = r.Value
End If
Else
r.Offset(, 1).Value = r.Value
End If
Next r
End With
End Sub
You can use the following formula. If A1 holds your string, enter in B1:
=LEFT(A1,MAX(IFERROR(ISNUMBER(VALUE(MID(A1,ROW(INDIRECT("1:"&LEN(A1))),1)))*ROW(INDIRECT("1:"&LEN(A1))),0)))
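Press Ctrl+Shift+Enter to enter it as an array formula.
This scans the string character by character, finds the position of the last digit, and returns the Left() part of the string up to that position. Because it keys on the last digit, it matches the requested output only when no further numbers follow the second one.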

Runtime Error on a 2D Bubblesort in Excel VBA array

I have been banging my head (and a few other heads as well on other Excel programming sites) to get a Combobox in a Userform to sort the rows (coming from two columns in the source spreadsheet) in alpha order.
Ideally, I want a 2 dimensional sort, but at this point, will settle for ONE that works.
Currently, the Combobox, when dropped down, reads in part (minus the bullet points, which do NOT appear and are not needed):
Zoom MRKPayoutPlan
Chuck PSERSFuture
Chuck PSERSCurrent
What I want is:
Chuck PSERSCurrent
Chuck PSERSFuture
Zoom MRKPayoutPlan
The first order is derived from the order in which the rows appear in the source worksheet.
At this point, I am getting a Runtime Error '13', Type Mismatch error. Both fields are text fields (one is last name, the other is a classification code- I want to sort first by name).
Below are the two relevant sections of the VBA code. If someone can help me sort this out, I'll buy at least a virtual round of beers. Excel VBA is not my most comfortable area- I can accomplish this in other apps, but the client spec is that this all must run in Excel alone. Thanks in advance.
Private Sub UserForm_Initialize()
fPath = ThisWorkbook.Path & "\"
currentRow = 4
sheetName = Sheet5.Name
lastRow = Sheets(sheetName).Range("C" & Rows.Count).End(xlUp).Row
Dim rngUID As Range
Dim vList
Set rngUID = Range("vUID")
With rngUID
vList = Application.Index(.Cells, .Parent.Evaluate("ROW(" & .Address & ")"), Array(7, 1))
End With
vList = BubbleSort2D(vList, 2, 1)
With ComboBox1
.ColumnCount = 2
.ColumnWidths = "100;100"
.List = vList
End With
PopulateControls
End Sub
Public Function BubbleSort2D(Strings, ParamArray SortColumns())
Dim tempItem
Dim a As Long
Dim e As Long
Dim f As Long
Dim g As Long
Dim i As String
Dim j As String
Dim m() As String
Dim n
Dim x As Long
Dim y As Long
Dim lngColumn As Long
e = 1
n = Strings
Do While e <> -1
For a = LBound(Strings) To UBound(Strings) - 1
For y = LBound(SortColumns) To UBound(SortColumns)
lngColumn = SortColumns(y)
i = n(a, lngColumn)
j = n(a + 1, lngColumn)
f = StrComp(i, j)
If f < 0 Then
Exit For
ElseIf f > 0 Then
For x = LBound(Strings, 2) To UBound(Strings, 2)
tempItem = n(a, x)
n(a, x) = n(a + 1, x)
n(a + 1, x) = tempItem
Next x
g = 1
Exit For
End If
Next y
Next a
If g = 1 Then
e = 1
Else
e = -1
End If
g = 0
Loop
BubbleSort2D = n
End Function
Here is a bubble sort in VBA source.
Public Sub BubbleSort(ByRef sequence As Variant, _
ByVal lower As Long, ByVal upper As Long)
Dim upperIt As Long
For upperIt = upper To lower + 1 Step -1
Dim hasSwapped As Boolean
hasSwapped = False
Dim bubble As Long
For bubble = lower To upperIt - 1
If sequence(bubble) > sequence(bubble + 1) Then
Dim t as Variant
t = sequence(bubble)
sequence(bubble) = sequence(bubble + 1)
sequence(bubble + 1) = t
hasSwapped = True
End If
Next bubble
If Not hasSwapped Then Exit Sub
Next upperIt
End Sub
Note that using variable names that specify what they are and do instead of single letters makes it easier to read.
As for the 2D sort. Don't. Sort each array individually then sort the array of arrays using the same method. You will need to provide an abstraction to compare the columns. Do not try to sort them both at the same time. I can't think of a scenario where that is a good idea. If for some reason elements can change their sub array in the 2D array, then flatten it into 1 array, sort that and split it back into a 2D array.
Honestly, from what I understand of your specific problem, you are going from a 1D sequence to a 1D sequence, so I think 2D arrays are an unnecessary complication.
Instead use a modified bubble sort routine with the comparison statement,
If sequence(bubble) > sequence(bubble +1) Then '...
replaced with a custom comparison function
ComboBoxItemCompare(sequence(bubble), sequence(bubble + 1))
that will return True if the first argument should be swapped with the second.
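To make the idea concrete without tying it to the UserForm, here is a minimal Python sketch of a bubble sort that takes the comparison as a parameter. The row layout (name, code) and the rule in `combo_item_compare` (name first, then code, ignoring case) are assumptions based on the sample data in the question, not the original VBA.

```python
def bubble_sort(items, should_swap):
    """Return a sorted copy of items; should_swap(a, b) is True when a belongs after b."""
    items = list(items)
    for upper in range(len(items) - 1, 0, -1):
        swapped = False
        for i in range(upper):
            if should_swap(items[i], items[i + 1]):
                items[i], items[i + 1] = items[i + 1], items[i]
                swapped = True
        if not swapped:  # no swaps in a full pass means the list is already sorted
            break
    return items


def combo_item_compare(a, b):
    """Hypothetical comparison: order rows by name, then by code, ignoring case."""
    return (a[0].lower(), a[1].lower()) > (b[0].lower(), b[1].lower())


rows = [("Zoom", "MRKPayoutPlan"), ("Chuck", "PSERSFuture"), ("Chuck", "PSERSCurrent")]
for name, code in bubble_sort(rows, combo_item_compare):
    print(name, code)
```

Keeping the comparison in its own function means the sort routine itself never needs to know how many columns a row has.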

Parsing and comparing a complicated string

I am hoping someone could help me out with a VBA Excel macro.
I have received a worksheet in Excel 2007 which contains product names in one column, and I need to sort this into a logical format so I can use it. However, the list itself is not in any kind of logical order, is 10 000 rows long and I am going to have to do this every month!!
Basically, what I would like to do is search for certain keywords which are common to most of the entries and move them into separate cells in different columns (but in the same row as the original entry).
Regarding keywords: There are 3 different types, two of which I have a complete list of.
Example of keywords: some are measures such as cm (centimetre), mm (millimetre), m (metre) etc.). Then there are other keywords such as % and finally a last set of keywords which is wood, plastic, glass etc.
If this was not complicated enough, the measures (cm for example) are duplicated in some instances and are important details, so I can't just separate them but would ideally like them in two adjacent cells.
Fortunately, there is a space after each measure, % sign and item material.
Working from right to left is the easiest way I can think of achieving this as the first description in the string varies wildly between entries and that can stay as is.
So, below is an example string, lets say this is in Cell A1. (Inverted commas are not included in the string and the word "by" appears in only about 100 cases. Usually it is missing...)
"Chair Leg Wood 100% 1m by 20cm"
I would ideally like for the string to be split up into cells as follows
Cell B1 - Chair Leg
Cell C1 - Wood
Cell D1 - 1m
Cell E1 - 20cm
Cell F1 - 100%
Having the % measures in the same column would be extremely helpful
Can anyone please help me with this or the beginnings of a macro which does this and then moves down the list - I have tried using some basic "find" and "len" formulas but really am at my wits end on how to deal with this!
The task boils down to writing a robust definition of the structure of the input data.
From the info provided, a candidate definition might be
<Description, one or more words> <Material, one word> <A value followed by %> <Dimension A> <optional "by"> <Dimension B>
The following macro will process data that conforms to this spec. The definition may need expanding, e.g. for two-word materials (such as Mild Steel).
You will need to add error handling in case any rows don't conform, e.g. no % in the string, or a % character elsewhere in the string.
Option Explicit
Dim dat As Variant
Sub ProcessData()
Dim r As Range
Dim i As Long
Set r = Intersect(ActiveSheet.UsedRange, ActiveSheet.Columns(1)).Resize(, 5)
dat = r
For i = 1 To UBound(dat, 1)
ParseRow i, CStr(dat(i, 1))
Next
r = dat
ActiveSheet.Columns(5).Style = "Percent"
End Sub
Sub ParseRow(rw As Long, s As String)
'Chair Leg Wood 100% 1m by 20cm
Dim i As Long
Dim sDim As String, sPCnt As String, sMat As String, sDesc As String
Dim sA As String, sB As String
i = InStr(s, "% ")
sDim = Trim(Replace(Mid(s, i + 2), " by ", " ")) ' text to right of %, remove "by"
sA = Trim(Left(sDim, InStr(sDim, " "))) ' split dimension string in two
sB = Trim(Mid(sDim, InStr(sDim, " ")))
s = Left(s, i)
i = InStrRev(s, " ")
sPCnt = Mid(s, i + 1) ' text back to first space before %
s = Trim(Left(s, i))
i = InStrRev(s, " ") ' last word in string
sMat = Mid(s, i + 1)
sDesc = Trim(Left(s, i)) ' whats left
dat(rw, 1) = sDesc
dat(rw, 2) = sMat
dat(rw, 3) = sA
dat(rw, 4) = sB
dat(rw, 5) = sPCnt
End Sub
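The candidate definition above can also be written as a single regular expression. The Python sketch below is an illustration under the same assumptions (one-word material, exactly two dimensions, optional "by"); the name `parse_row` and the choice to return None for non-conforming rows are mine, not part of the macro.

```python
import re

# <description> <material> <value%> <dimension A> [by] <dimension B>
_ROW = re.compile(
    r"^(?P<desc>.+)\s+(?P<material>\S+)\s+(?P<pct>\S+%)\s+"
    r"(?P<dim_a>\S+)(?:\s+by)?\s+(?P<dim_b>\S+)$"
)


def parse_row(text):
    """Return (description, material, dim A, dim B, percent), or None if the row does not conform."""
    match = _ROW.match(text.strip())
    if match is None:
        return None
    return match.group("desc", "material", "dim_a", "dim_b", "pct")


print(parse_row("Chair Leg Wood 100% 1m by 20cm"))
print(parse_row("Chair Leg Wood 1m"))  # no % value, so this row is rejected
```

Returning None for rows that do not fit makes it easy to collect them for manual review instead of writing wrong values to the sheet.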
First, I'd use the Split function to separate the parts into an array; this avoids most of the string functions and string math:
Dim parts As Variant
parts = Split(Range("A1").Value) 'splits on spaces by default
Then, I'd do my comparisons on each part.
Finally, I'd concatenate the parts I didn't break out, and place all parts on the sheet.
This is based on your example, which has spaces in between every part. Something similar could work otherwise; you would just have to do more work with each part.
Here's my stab at it. We could use about 10 more examples, but this should be a start. To use, select a one-column range with your descriptions and run SplitProducts. It will split each description out to the right of its cell.
Sub SplitProducts()
Dim rCell As Range
Dim vaSplit As Variant
Dim i As Long
Dim aOutput() As Variant
Dim lCnt As Long
Const lCOLDESC As Long = 1
Const lCOLMAT As Long = 2
Const lCOLPCT As Long = 3
Const lCOLREM As Long = 4
If TypeName(Selection) = "Range" Then
If Selection.Columns.Count = 1 Then
For Each rCell In Selection.Cells
'split into words
vaSplit = Split(rCell.Value, Space(1))
ReDim aOutput(1 To 1, 1 To 1)
'loop through the words
For i = LBound(vaSplit) To UBound(vaSplit)
Select Case True
Case IsPercent(vaSplit(i))
'percents always go in the same column
lCnt = lCOLPCT
If UBound(aOutput, 2) < lCnt Then
ReDim Preserve aOutput(1 To 1, 1 To lCnt)
End If
aOutput(1, lCnt) = vaSplit(i)
Case IsInList(vaSplit(i))
'list items always go in the same column
lCnt = lCOLMAT
'grow the output only when needed; an unconditional ReDim Preserve here could shrink the array and drop a percent stored earlier
If UBound(aOutput, 2) < lCnt Then
ReDim Preserve aOutput(1 To 1, 1 To lCnt)
End If
aOutput(1, lCnt) = vaSplit(i)
Case IsMeasure(vaSplit(i))
'measurements go in the last column(s)
If UBound(aOutput, 2) < lCOLREM Then
lCnt = lCOLREM
Else
lCnt = UBound(aOutput, 2) + 1
End If
ReDim Preserve aOutput(1 To 1, 1 To lCnt)
aOutput(1, lCnt) = vaSplit(i)
Case Else
'everything else gets concatenated in the desc column
aOutput(1, lCOLDESC) = aOutput(1, lCOLDESC) & " " & vaSplit(i)
End Select
Next i
'remove any extraneous spaces
aOutput(1, lCOLDESC) = Trim(aOutput(1, lCOLDESC))
'write the values to the right of the input range
rCell.Offset(0, 1).Resize(1, UBound(aOutput, 2)).Value = aOutput
Next rCell
Else
MsgBox "Select a one column range"
End If
End If
End Sub
Function IsPercent(ByVal sInput As String) As Boolean
IsPercent = Right$(sInput, 1) = "%"
End Function
Function IsInList(ByVal sInput As String) As Boolean
Dim vaList As Variant
'add list items as needed
vaList = Array("Wood", "Glass", "Plastic")
'exact (case-insensitive) match; Filter() would also accept substrings such as "a"
IsInList = Not IsError(Application.Match(sInput, vaList, 0))
End Function
Function IsMeasure(ByVal sInput As String) As Boolean
Dim vaMeas As Variant
Dim i As Long
'add measurements as needed
vaMeas = Array("mm", "cm", "m")
For i = LBound(vaMeas) To UBound(vaMeas)
'any number of characters that end in a number and a measurement
If sInput Like "*#" & vaMeas(i) Then
IsMeasure = True
Exit For
End If
Next i
End Function
No guarantees that this will be speedy on 10k rows.

Resources