Categorize single column of text into multiple columns - excel

I am trying to create a macro that will categorize data in one column into multiple columns based on the item type. The data I am trying to categorize is a list of contracts with meta-data on the items in the contract.
The raw data looks like this:
Contract No Contract Name Item Type Item Description
111111 Chocolate Supplies POTS 5"
111111 Chocolate Supplies POTS 10"
111111 Chocolate Supplies POTS 15"
111111 Chocolate Supplies PANS 5"
111111 Chocolate Supplies PANS 10"
111111 Chocolate Supplies PANS 15"
111111 Chocolate Supplies KNIVES Paring knife
111111 Chocolate Supplies SILVERWARE Salad fork
111111 Chocolate Supplies SILVERWARE Dinner fork
111111 Chocolate Supplies SILVERWARE Dessert fork
111111 Chocolate Supplies SILVERWARE Dessert spoon
111111 Chocolate Supplies SILVERWARE Soup spoon
22222 Soups and Salads Order POTS 10"
22222 Soups and Salads Order POTS 15"
22222 Soups and Salads Order PANS 15"
22222 Soups and Salads Order KNIVES Butter knife
22222 Soups and Salads Order KNIVES Bread knife
22222 Soups and Salads Order KNIVES Paring knife
22222 Soups and Salads Order SILVERWARE Soup spoon
The final data needs to look like this (edited to include image):
Contract Contract Name POTS PANS KNIVES SILVERWARE
111111 Chocolate Supplies 5" 5" Paring knife Salad fork
10" 10" Dinner fork
15" 15" Dessert fork
Dessert spoon
Soup spoon
22222 Soups and Salads Order 10" 15" Butter knife Soup spoon
15" Bread knife
Paring knife
# What I've tried so far #
The current crude solution I am using is to:
- Run the query
- Paste the data into excel
- Create a pivot
- Use a series of count, offset and indirect formulas to reorganize the data as needed
- Since the above process leaves empty rows between each section of contracts, I copy-paste the data into a new worksheet, put an Autofilter and remove the blank rows
... and voila, that's the final report.
# Possible VBA solution #
I found this tutorial which seems to do exactly what I want, except for the problem where I need the macro to start a new section when the contract no. changes. I don't know how to get the VBA code below to also check for the contract no.
I'd love any help you could send my way. Thanks in advance!
# Code from tutorial on get-digital-help [dot] com by Oscar. #
This is not my code, and I give complete credit to Oscar's tutorial for getting me going in the right direction.
Sub Categorizedatatocolumns()
Dim rng As Range
Dim dest As Range
Dim vrb As Boolean
Dim i As Integer
Set rng = Sheets("Sheet1").Range("A4")
vrb = False
Do While rng <> ""
Set dest = Sheets("Sheet1").Range("A20")
Do While dest <> ""
If rng.Value = dest.Value Then
vrb = True
End If
Set dest = dest.Offset(0, 1)
Loop
If vrb = False Then
dest.Value = rng.Value
dest.Font.bold = True
End If
vrb = False
Set rng = rng.Offset(1, 0)
Loop
Set rng = Sheets("Sheet1").Range("A4")
Do While rng <> ""
Set dest = Sheets("Sheet1").Range("A20")
Do While dest <> ""
If rng.Value = dest.Value Then
i = 0
Do While dest <> ""
Set dest = dest.Offset(1, 0)
i = i + 1
Loop
Set rng = rng.Offset(0, 1)
dest.Value = rng.Value
Set rng = rng.Offset(0, -1)
Set dest = dest.Offset(-i, 0)
End If
Set dest = dest.Offset(0, 1)
Loop
Set rng = rng.Offset(1, 0)
Loop
End Sub

You may consider using pivot table which will give similar output.
Turn off the Subtotal and show data in tabular form for all fields.

Related

Excel - How to find number of times strings appear in column

This is my first post, I'm still a beginner at excel!
I created a python script that scrapes the Billboard Hip Hop/R&B Charts and populates the data to an excel spreadsheet. My data looks like this:
Headers are Billboard Number, Artist Name and Song Title.
1 Drake Nice For What
2 Post Malone Featuring Ty Dolla $ign Psycho
3 Drake God's Plan
4 Post Malone Better Now
5 Post Malone Featuring 21 Savage Rockstar
6 BlocBoy JB Featuring Drake Look Alive
7 Post Malone Paranoid
8 Lil Dicky Featuring Chris Brown Freaky Friday
9 Post Malone Rich & Sad
10 Post Malone Featuring Swae Lee Spoil My Night
11 Post Malone Featuring Nicki Minaj Ball For Me
12 Migos Featuring Drake Walk It Talk It
13 Post Malone Featuring G-Eazy & YG Same Bitches
14 Cardi B| Bad Bunny & J Balvin I Like It
15 Post Malone Zack And Codeine
16 Post Malone Over Now
17 Cardi B Be Careful
18 Post Malone Takin' Shots
19 The Weeknd & Kendrick Lamar Pray For Me
20 Rich The Kid Plug Walk
21 The Weeknd Call Out My Name
22 Bruno Mars & Cardi B Finesse
23 Post Malone Candy Paint
24 Ella Mai Boo'd Up
25 Rae Sremmurd & Juicy J Powerglide
26 Post Malone 92 Explorer
27 J. Cole ATM
28 J. Cole KOD
29 Post Malone Otherside
30 Post Malone Blame It On Me
31 J. Cole Kevin's Heart
32 Kendrick Lamar & SZA All The Stars
33 Nicki Minaj Chun-Li
34 Lil Pump Esskeetit
35 Migos Stir Fry
36 Famous Dex Japan
37 Post Malone Sugar Wraith
38 Cardi B Featuring Migos Drip
39 XXXTENTACION Sad!
40 Jay Rock| Kendrick Lamar| Future & James Blake King's Dead
41 Rich The Kid Featuring Kendrick Lamar New Freezer
42 Logic & Marshmello Everyday
43 J. Cole Motiv8
44 YoungBoy Never Broke Again Outside Today
45 Post Malone Jonestown (Interlude)
46 Cardi B Featuring 21 Savage Bartier Cardi
47 YoungBoy Never Broke Again Overdose
48 J. Cole 1985 (Intro To The Fall Off)
49 J. Cole Photograph
50 Khalid| Ty Dolla $ign & 6LACK OTW
I want to count the total number times an artist appears under Artist Name including if they were featured in a song and also display top charting song. For example:
Headers are Artist Name, Billboard Appearances and Top Song.
Post Malone 17 Psycho
J.Cole 6 ATM
Cardi B 5 I Like It
Drake 4 Nice For What
Migos 3 Walk It Talk It
YoungBoy Never Broke Again 2 Outside Today
Rich The Kid 2 Plug Walk
21 Savage 2 Rockstar
...
How can I achieve this?
If you already have all artist names, use countif and vlookup with wildcard.
ps. Make sure your artist names are correct. Your sample data J.cole didn't contain whitespace, it will return wrong result.
First you need to do what's called data cleaning to get a list of the artists on the billboards.
To get a list of the unique artists copy the list of your data to a new space on your spreadsheet. Then select all of your data and run the "Remove Duplicates" function (under the Data tab) selecting the artist column. This will give you a list of all the unique artists and give you their top song to boot.
Now all those X featuring Y artist "names" will be unique, so you will need to filter it down some more. Search for the "Featuring" linker word with the Find function and use that combined with the Left function to grab only the first artist. Something like this, =IFERROR(LEFT(I3,FIND("Featuring",I3)-2),I3)
This function uses the iferror function to pass through names that don't have the "Featuring" word. Then do the same thing on that resulting column for & and | and that will give you a pretty clean list of single artists.
To get the featured artists, do a similar thing using the Right function instead of the Left function.
After you get that clean list do the unique artists filter again to condense it down. From there you can use that with your vlookup or find functions to start counting.
You can use Power Query (Get & Transform Data) to manipulate your table:
let
Source = Excel.CurrentWorkbook(){[Name="tbInput"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(
Source,
{{"Billboard Number", Int64.Type},
{"Artist Name", type text},
{"Song Title", type text}}),
#"Split Column by Delimiters" = Table.ExpandListColumn(
Table.TransformColumns(
#"Changed Type",
{{"Artist Name", Splitter.SplitTextByAnyDelimiter({"Featuring","|","&"}, QuoteStyle.None),
let itemType = (type nullable text) meta [Serialized.Text = true] in type {itemType}}}),
"Artist Name"),
#"Trimmed Text" = Table.TransformColumns(
#"Split Column by Delimiters",
{{"Artist Name", Text.Trim, type text}}),
#"Artist Highest Position" = Table.Group(
#"Trimmed Text",
{"Artist Name"},
{{"Highest Position", each List.Min([Billboard Number]),
type number}}),
#"Merge 1" = Table.NestedJoin(
#"Artist Highest Position",
{"Artist Name", "Highest Position"},
#"Trimmed Text",
{"Artist Name", "Billboard Number"},
"Merged",
JoinKind.LeftOuter),
#"Highest Song" = Table.ExpandTableColumn(
#"Merge 1", "Merged", {"Song Title"}, {"Song Title"}),
#"Artist Count" = Table.Group(
#"Trimmed Text",
{"Artist Name"},
{{"Count", each Table.RowCount(_), type number}}),
#"Merge 2" = Table.NestedJoin(
#"Artist Count",
{"Artist Name"},
#"Highest Song",
{"Artist Name"},
"Merged",
JoinKind.LeftOuter),
#"Expanded Merged" = Table.ExpandTableColumn(
#"Merge 2", "Merged", {"Song Title"}, {"Song Title"}),
#"Sorted Rows" = Table.Sort(#"Expanded Merged",{{"Count", Order.Descending}})
in
#"Sorted Rows"
Which gives the output:
The hard part is cleaning the data to get a unique list of artist names.
Examining your list, it seems that when there are multiple artist names listed for a single song, they will be separated by Featuring, &, or |
If that is always the case, you can use a VBA macro to separate the names, and then use a Dictionary to collect a list of the names.
While you are creating that list, it is trivial to also obtain the Count of times the artist appears, and also the top rated song (which would be the song associated with the first instance of that name).
We utilize a User Defined Object (Class) to hold the information, and collect those objects into a Dictionary keyed to the artist name.
Note also that we read the worksheet data into a VBA array, and iterate through the array. This usually runs an order of magnitude faster than iterating through the actual worksheet.
To obtain the report, we would then output the results onto a worksheet.
Class Module
Option Explicit
'Class module **RENAME**: cArtist
Public Cnt As Long
Public Song As String
Regular Module
Option Explicit
Option Compare Text
Sub Artists()
Dim dA As Dictionary, cA As cArtist
Dim vSrc, vRes
Dim wsSrc As Worksheet, wsRes As Worksheet, rRes As Range
Dim V, W, X, Y, Z, A, B
Dim I As Long
Dim sKey As String
Set wsSrc = Worksheets("sheet6")
With wsSrc
vSrc = .Range(.Cells(2, 2), .Cells(.Rows.Count, 2).End(xlUp)).Resize(columnsize:=2)
End With
Set wsRes = Worksheets("sheet6")
Set rRes = wsRes.Cells(1, 6)
Set dA = New Dictionary
For I = 1 To UBound(vSrc, 1)
W = Split(vSrc(I, 1), "Featuring")
For Each X In W
Y = Split(X, "|")
For Each Z In Y
A = Split(Z, "&")
For Each B In A
sKey = Trim(B)
Set cA = New cArtist
With cA
.Cnt = 1
.Song = Trim(vSrc(I, 2))
End With
If Not dA.Exists(sKey) Then
dA.Add Key:=sKey, Item:=cA
Else
dA(sKey).Cnt = dA(sKey).Cnt + 1
End If
Next B
Next Z
Next X
Next I
ReDim vRes(0 To dA.Count, 1 To 3)
vRes(0, 1) = "Artist Name"
vRes(0, 2) = "Billboard Appearances"
vRes(0, 3) = "Top Song"
I = 0
For Each V In dA.Keys
I = I + 1
With dA(V)
vRes(I, 1) = V
vRes(I, 2) = .Cnt
vRes(I, 3) = .Song
End With
Next V
Set rRes = rRes.Resize(UBound(vRes, 1) + 1, UBound(vRes, 2))
With rRes
.EntireColumn.Clear
.Value = vRes
.Sort key1:=rRes(1, 2), order1:=xlDescending, key2:=rRes(1, 1), order2:=xlAscending, MatchCase:=False, Header:=xlYes
.Style = "Output"
With .Columns(2)
.ColumnWidth = .ColumnWidth / 2
.WrapText = True
.HorizontalAlignment = xlCenter
End With
With .Rows(1)
.HorizontalAlignment = xlCenter
.VerticalAlignment = xlCenter
End With
.EntireColumn.AutoFit
End With
End Sub
Output given your Input above

Extract partial matches string(words) between 2 columns in Excel

I have two columns like below, I need to extract a partial word that matches between the 2 columns and put this in a 3rd columns.
Name1 Name2
RED CURRY CHICKEN GREEN CURRY CHICKEN
BEEF WITH MINT LEAVES BEEF WITH BASIL LEAVES
SWEET AND SOUND PORK BACON AND EGG
FRIED RICE FRIED RICE
My expected result
Name1 Name2 Partial Matches
RED CURRY CHICKEN GREEN CURRY CHICKEN CHICKEN
BEEF WITH MINT LEAVES BEEF WITH BASIL LEAVES BEEF WITH LEAVES
SWEET AND SOUND PORK BACON AND EGG AND
FRIED RICE FRIED RICE FRIED RICE
To do this with a formula is inefficient; you need vba for a sub procedure or user defined function.
Function friedRice(str1 As String, str2 As String)
Dim w As Long, words As Variant, tmp As String
words = Split(str1, Chr(32))
For w = LBound(words) To UBound(words)
If Not IsError(Application.Match(words(w), Split(str2, Chr(32)), 0)) Then
tmp = tmp & Chr(32) & words(w)
End If
Next w
friedRice = Trim(tmp)
End Function

Excel: Scanning column with multiple conditions

I just started off as a new programmer and I'm trying to create Vba code to scan the following data in Excel.
Let's assume
Apple = 123456
Apple Juice = 987654
Pear = 234567
Pear Juice = 987654
Orange = 345678
Orange Juice = 987654
The codes for each item will be given but what I want is to write a program to detect the specific codes and assigning it the names (apple, apple juice etc.). If "123456" appears first, then the cell below it will be assigned "Apple Juice". If "234567" appears first, then it will be assigned "Pear Juice". It is challenging for me as Juice category shares the same "987654" unlike the Fruits which has a unique FruitNo assigned to it.
However, I am having difficulty writing the codes for the scanning of above cells. I have tried using the ActiveCellOffset but it doesn't seem to be functioning to the way I want it.
The following codes are what I have tried so far.
And this is the outcome I would like to have. The red letters are the names that the program is suppose to come up with.
Changing your code from
If FruitNo = 987654 And ActiveCell.Offset(-1, 0) = 123456 Then
to
If FruitNo = 987654 And Range("A" & i).Offset(-1, 0) = 123456 Then
should solve your issue (and same for another IF condition). However, this can be achieved using formula also.

Find all other cells with same adjacent element

I have an excel spreadhseet with the following columns
A: City
B: State
C: Other cities that are in the same state as column A
For example, the result may look like this:
City State Other cities in State
--------------------------------------------------------------
Philadelphia Pennsylvania Pitsburgh
Pitsburgh Pennsylvania Philadelphia
San Diego California Palo Alto, Mountain View, LA, San Jose, Houston
Palo Alto California San Jose, Mountain View, San Diego
Mountain View California San Jose, LA, Palo Alto, San Diego
LA California San Jose, Mountain View, Palo Alto, San Diego
San Jose California LA, Mountain View, Palo Alto, San Diego
Austin Texas Houston, Dallas
Houston Texas Austin, Dallas
Dallas Texas Dallas, Houston
What formula could I use to generate the 'other cities in state' column?
Ragged edge string concatenation is difficult using Excel worksheet functions; even with the new Excel 2016/Office 365/Excel Online CONCAT and TEXTJOIN functions¹.
A well written UDF² can easily overcome the limitations.
Module1 code sheet
Option Explicit
Function CITYJOIN(rst As Range, sst As String, rct As Range, _
Optional sct As String = "", _
Optional bIncludeSelf As Boolean = False, _
Optional delim As String = ", ")
Dim r As Long
Static dict As Object
If dict Is Nothing Then
Set dict = CreateObject("Scripting.Dictionary")
dict.compareMode = vbTextCompare
End If
dict.RemoveAll
'truncate any full column references to the .UsedRange
Set rst = Intersect(rst, rst.Parent.UsedRange)
'set the cities to the same size as the states
Set rct = rct.Resize(rst.Rows.Count, rst.Columns.Count)
'loop through the cells to create unique dictionary keys
For r = 1 To rst.Cells.Count
If LCase(rst(r).Value2) = LCase(sst) Then
dict.Item(StrConv(rct(r).Value2, vbProperCase)) = vbNullString
End If
Next r
'get rid of 'self-city'
If Not bIncludeSelf Then
dict.Remove sct
End If
'return a delimited string
CITYJOIN = Join(dict.keys, delim)
End Function
      
The optional city (e.g. sct ) is only optional if you choose to include the city on the same row. By default the city on the same row is excluded and must be supplied as a parameter in order to remove it.
The static dict object means that you will only create the Scripting.Dictionary object once. The same object is used for subsequent calls to the function. This is particularly useful when filling down a long column with a formula containing this UDF.
¹ excel-2016 See What's new in Excel 2016 for Windows for more information.
² A User Defined Function (aka UDF) is placed into a standard module code sheet. Tap Alt+F11 and when the VBE opens, immediately use the pull-down menus to Insert ► Module (Alt+I,M). Paste the function code into the new module code sheet titled something like Book1 - Module1 (Code). Tap Alt+Q to return to your worksheet(s).

Adding data to worksheet where unique identifier accumulates

I had a mate of mine create a VBA script for me for a rather large Excel file containing football player game records.
The file currently has around 7000 player names over 190000 odd rows and each player is granted a unique PID (PlayerID). So a player with 10 games played will have 10 rows with one unique PID.
The script goes off to a website, copies player data and pastes it to the end of my excel worksheet called (Goals). When this data is added to the worksheet if a player matched on surname and firstname already exists, the PID will be populated with that players PID. If, in the data being added from the website, the player does not already exist a new unique number is to be given to this player.
For example:
The Player Fred SMITH PID=1234 already exists, any new record for him will receive the PID of 1234.
A new player Joe BLOGGS is added via the script the PID for him shall be the highest existing PID + 1. So if Fred SMITH has the highest PID then Joe BLOGGS would then be assigned the PID of 1235.
The script works well until a new player is added.
Data before import:
PID | surname | firstname | Game |
1233| Jones | Mark | 1
1234| Smith | Fred | 2
Expected after import - Joe Blogs New player
PID | surname | firstname | Game |
1233| Jones | Mark | 1
1234| Smith | Fred | 2
1235| Bloggs | Joe | 3
1234| Smith | Fred | 3
Actual after import - Joe Blogs New player
PID | surname | firstname | Game |
1233| Jones | Mark | 1
1234| Smith | Fred | 2
1235| Bloggs | Joe | 3
1236| Smith | Fred | 3
I can see why this happens as the script says add 1 in column A, but how can I change it so it adds 1 to the highest number in column A rather than the number on the row above?
Here's the script:
For d = 1 To 300000
If Worksheets("Goals").Range("G" & CStr(d)).Value = surname Then
If Worksheets("Goals").Range("H" & CStr(d)).Value = firstname Then
PID = Worksheets("Goals").Range("A" & CStr(d)).Value
ID = Worksheets("Goals").Range("B" & CStr(d - 1)).Value + 1
Exit For
Else:
If Worksheets("Goals").Range("H" & CStr(d)).Value = "" Then
PID = Worksheets("Goals").Range("A" & CStr(d - 1)).Value + 1
ID = Worksheets("Goals").Range("B" & CStr(d - 1)).Value + 1
Exit For
End If
End If
Else:
If Worksheets("Goals").Range("A" & CStr(d)).Value = "" Then
PID = Worksheets("Goals").Range("A" & CStr(d - 1)).Value + 1
ID = Worksheets("Goals").Range("B" & CStr(d - 1)).Value + 1
Exit For
End If
End If
Next d
There's a bunch of other optimizations you would probably benefit from making to your script, leaving aside anything else - you're accessing the Object layer a LOT which will slow things down.
As for calculating the PID quickly, there's a number of good ways to do it. My personal favourite (because it minimizes worksheet layer access) would be to build a dictionary of existing PIDs before you do anything else. If you're only importing say, 20 records, that might take an appreciable amount of time, but otherwise, the ability to use .HasKey(PID) methods will save you a LOT of time. Dictionaries are great, especially in languages like VBA.
Side note: What you really want is a primary key in a database. For all that Access gets a lot of flac for being too easy to use wrong, it is probably a better tool for this kind of thing than Excel.
Anyway, I'll throw in an optimized script in a few minutes, but in the mean time, you might want to look at using something like
Excel.WorksheetFunction.Max(Worksheets("Goals").Range("A1:A30000"))
when you're after the highest number in a range...
While a helper column that combined the firstname and surname would speed up general lookup operations, creating a virtual one in a Scripting.Dictionary might be the best for your import operation.
To use a Scripting.Dictionary you will need to go into the VBE's Tools ► References and add Microsoft Scripting Runtime.
Your code sample looks like it is walking through 300K rows for each new imported player. You will be able to see where my understanding of your supplied snippet breaks down.
Dim d As Long, lID As Long, lPID As Long, sPLYR As String
Dim surname As String, firstname As String
Dim dPLYRs As New Scripting.Dictionary
dPLYRs.CompareMode = TextCompare
'populate scripting dictionary
With Worksheets("Goals")
lID = Application.Max(.Range("B:B"))
lPID = Application.Max(.Range("A:A"))
For d = 1 To .Cells(Rows.Count, "G").End(xlUp).Row
sPLYR = .Cells(d, "G").Value & Chr(124) & .Cells(d, "H").Value
If Len(sPLYR) > 1 And Not dPLYRs.Exists(sPLYR) Then
dPLYRs.Add Key:=sPLYR, Item:=.Cells(d, "A").Value
ElseIf Len(sPLYR) > 1 And dPLYRs.Exists(sPLYR) Then
'repair broken PIDs
.Cells(d, "A") = dPLYRs.Item(sPLYR)
End If
Next d
End With
'this is where your sample code loses me. I have no idea where surname and firstname come from
'you probably need a loop to cycle through the imported names
'you have a unique index of surname & Chr(124) & firstname as the dictionary keys for lookup with the PID as each key's item
sPLYR = surname & Chr(124) & firstname
dPLYRs.RemoveAll: Set dPLYRs = Nothing
So the dictionary gets populated with all existing players and repairs and PIDs it finds. I cannot determine from what you've offered up what to do past then.

Resources