Excel - How to find the number of times strings appear in a column

This is my first post, and I'm still a beginner at Excel!
I created a Python script that scrapes the Billboard Hip Hop/R&B charts and writes the data to an Excel spreadsheet. My data looks like this:
Headers are Billboard Number, Artist Name and Song Title.
1 Drake Nice For What
2 Post Malone Featuring Ty Dolla $ign Psycho
3 Drake God's Plan
4 Post Malone Better Now
5 Post Malone Featuring 21 Savage Rockstar
6 BlocBoy JB Featuring Drake Look Alive
7 Post Malone Paranoid
8 Lil Dicky Featuring Chris Brown Freaky Friday
9 Post Malone Rich & Sad
10 Post Malone Featuring Swae Lee Spoil My Night
11 Post Malone Featuring Nicki Minaj Ball For Me
12 Migos Featuring Drake Walk It Talk It
13 Post Malone Featuring G-Eazy & YG Same Bitches
14 Cardi B| Bad Bunny & J Balvin I Like It
15 Post Malone Zack And Codeine
16 Post Malone Over Now
17 Cardi B Be Careful
18 Post Malone Takin' Shots
19 The Weeknd & Kendrick Lamar Pray For Me
20 Rich The Kid Plug Walk
21 The Weeknd Call Out My Name
22 Bruno Mars & Cardi B Finesse
23 Post Malone Candy Paint
24 Ella Mai Boo'd Up
25 Rae Sremmurd & Juicy J Powerglide
26 Post Malone 92 Explorer
27 J. Cole ATM
28 J. Cole KOD
29 Post Malone Otherside
30 Post Malone Blame It On Me
31 J. Cole Kevin's Heart
32 Kendrick Lamar & SZA All The Stars
33 Nicki Minaj Chun-Li
34 Lil Pump Esskeetit
35 Migos Stir Fry
36 Famous Dex Japan
37 Post Malone Sugar Wraith
38 Cardi B Featuring Migos Drip
39 XXXTENTACION Sad!
40 Jay Rock| Kendrick Lamar| Future & James Blake King's Dead
41 Rich The Kid Featuring Kendrick Lamar New Freezer
42 Logic & Marshmello Everyday
43 J. Cole Motiv8
44 YoungBoy Never Broke Again Outside Today
45 Post Malone Jonestown (Interlude)
46 Cardi B Featuring 21 Savage Bartier Cardi
47 YoungBoy Never Broke Again Overdose
48 J. Cole 1985 (Intro To The Fall Off)
49 J. Cole Photograph
50 Khalid| Ty Dolla $ign & 6LACK OTW
I want to count the total number of times an artist appears under Artist Name, including songs on which they are featured, and also display their top-charting song. For example:
Headers are Artist Name, Billboard Appearances and Top Song.
Post Malone 17 Psycho
J.Cole 6 ATM
Cardi B 5 I Like It
Drake 4 Nice For What
Migos 3 Walk It Talk It
YoungBoy Never Broke Again 2 Outside Today
Rich The Kid 2 Plug Walk
21 Savage 2 Rockstar
...
How can I achieve this?

If you already have all the artist names, use COUNTIF and VLOOKUP with wildcards.
P.S. Make sure your artist names are spelled consistently. In your sample output, "J.Cole" has no space while the data has "J. Cole", so a lookup on it will return the wrong result.

First you need to do what's called data cleaning to get a list of the artists on the chart.
To get a list of the unique artists, copy your data to a new area of the spreadsheet. Then select the copy and run "Remove Duplicates" (under the Data tab) on the artist column only. This gives you a list of all the unique artist entries and, because the first occurrence of each is kept, their top song to boot.
Entries of the form "X Featuring Y" still count as distinct names at this point, so you need to filter further. Locate the "Featuring" linker word with the FIND function and combine that with the LEFT function to grab only the first artist. Something like this: =IFERROR(LEFT(I3,FIND("Featuring",I3)-2),I3)
The IFERROR wrapper passes through names that don't contain "Featuring". Then do the same thing on the resulting column for & and |, and that will give you a pretty clean list of single artists.
To get the featured artists, do a similar thing using the RIGHT function instead of LEFT.
Once you have that clean list, run Remove Duplicates again to condense it. From there you can use it with your VLOOKUP or FIND functions to start counting.
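Since the data already comes from a Python script, the same split-and-count logic could also be run before the sheet is written. The following is only a sketch under assumptions: the function name `summarize` and the row layout are made up for illustration, and it assumes "Featuring", "|" and "&" are the only separators, so an act whose real name contains "&" would be split incorrectly.

```python
import re

# Separators seen in the sample data; assumed to be the only ones in use.
SPLIT_RE = re.compile(r"\s*(?:Featuring|\||&)\s*")

def summarize(rows):
    """rows: iterable of (chart_position, artist_field, song_title) tuples.

    Returns (artist, appearances, top_song) tuples, most appearances first.
    """
    stats = {}
    # Walk the chart from position 1 downwards so that the first song seen
    # for an artist is their highest-charting one.
    for position, artist_field, song in sorted(rows):
        for name in SPLIT_RE.split(artist_field):
            name = name.strip()
            if not name:
                continue
            if name not in stats:
                stats[name] = [0, song]
            stats[name][0] += 1
    # Sort by count descending, then by name to keep the order stable.
    return sorted(
        ((name, count, song) for name, (count, song) in stats.items()),
        key=lambda item: (-item[1], item[0]),
    )
```

Calling `summarize` with the scraped rows yields one tuple per artist, which can then be written to a second sheet in place of the formula columns.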

You can use Power Query (Get & Transform Data) to manipulate your table:
let
Source = Excel.CurrentWorkbook(){[Name="tbInput"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(
Source,
{{"Billboard Number", Int64.Type},
{"Artist Name", type text},
{"Song Title", type text}}),
#"Split Column by Delimiters" = Table.ExpandListColumn(
Table.TransformColumns(
#"Changed Type",
{{"Artist Name", Splitter.SplitTextByAnyDelimiter({"Featuring","|","&"}, QuoteStyle.None),
let itemType = (type nullable text) meta [Serialized.Text = true] in type {itemType}}}),
"Artist Name"),
#"Trimmed Text" = Table.TransformColumns(
#"Split Column by Delimiters",
{{"Artist Name", Text.Trim, type text}}),
#"Artist Highest Position" = Table.Group(
#"Trimmed Text",
{"Artist Name"},
{{"Highest Position", each List.Min([Billboard Number]),
type number}}),
#"Merge 1" = Table.NestedJoin(
#"Artist Highest Position",
{"Artist Name", "Highest Position"},
#"Trimmed Text",
{"Artist Name", "Billboard Number"},
"Merged",
JoinKind.LeftOuter),
#"Highest Song" = Table.ExpandTableColumn(
#"Merge 1", "Merged", {"Song Title"}, {"Song Title"}),
#"Artist Count" = Table.Group(
#"Trimmed Text",
{"Artist Name"},
{{"Count", each Table.RowCount(_), type number}}),
#"Merge 2" = Table.NestedJoin(
#"Artist Count",
{"Artist Name"},
#"Highest Song",
{"Artist Name"},
"Merged",
JoinKind.LeftOuter),
#"Expanded Merged" = Table.ExpandTableColumn(
#"Merge 2", "Merged", {"Song Title"}, {"Song Title"}),
#"Sorted Rows" = Table.Sort(#"Expanded Merged",{{"Count", Order.Descending}})
in
#"Sorted Rows"

The hard part is cleaning the data to get a unique list of artist names.
Examining your list, it seems that when there are multiple artist names listed for a single song, they will be separated by Featuring, &, or |
If that is always the case, you can use a VBA macro to separate the names, and then use a Dictionary to collect a list of the names.
While you are creating that list, it is trivial to also obtain the Count of times the artist appears, and also the top rated song (which would be the song associated with the first instance of that name).
We utilize a User Defined Object (Class) to hold the information, and collect those objects into a Dictionary keyed to the artist name.
Note also that we read the worksheet data into a VBA array, and iterate through the array. This usually runs an order of magnitude faster than iterating through the actual worksheet.
To obtain the report, we would then output the results onto a worksheet.
Class Module
Option Explicit
'Class module **RENAME**: cArtist
Public Cnt As Long
Public Song As String
Regular Module
Option Explicit
Option Compare Text
Sub Artists()
    ' Dictionary is early-bound here: set a reference to
    ' Microsoft Scripting Runtime (Tools > References) first
    Dim dA As Dictionary, cA As cArtist
    Dim vSrc, vRes
    Dim wsSrc As Worksheet, wsRes As Worksheet, rRes As Range
    Dim V, W, X, Y, Z, A, B
    Dim I As Long
    Dim sKey As String

    ' Read the Artist Name and Song Title columns into an array
    Set wsSrc = Worksheets("sheet6")
    With wsSrc
        vSrc = .Range(.Cells(2, 2), .Cells(.Rows.Count, 2).End(xlUp)).Resize(columnsize:=2)
    End With

    Set wsRes = Worksheets("sheet6")
    Set rRes = wsRes.Cells(1, 6)

    ' Split each artist entry on Featuring, | and &, then count each name
    Set dA = New Dictionary
    For I = 1 To UBound(vSrc, 1)
        W = Split(vSrc(I, 1), "Featuring")
        For Each X In W
            Y = Split(X, "|")
            For Each Z In Y
                A = Split(Z, "&")
                For Each B In A
                    sKey = Trim(B)
                    Set cA = New cArtist
                    With cA
                        .Cnt = 1
                        .Song = Trim(vSrc(I, 2))
                    End With
                    ' The first occurrence carries the top song; later ones only add to the count
                    If Not dA.Exists(sKey) Then
                        dA.Add Key:=sKey, Item:=cA
                    Else
                        dA(sKey).Cnt = dA(sKey).Cnt + 1
                    End If
                Next B
            Next Z
        Next X
    Next I

    ' Build the results array, with headers in row 0
    ReDim vRes(0 To dA.Count, 1 To 3)
    vRes(0, 1) = "Artist Name"
    vRes(0, 2) = "Billboard Appearances"
    vRes(0, 3) = "Top Song"
    I = 0
    For Each V In dA.Keys
        I = I + 1
        With dA(V)
            vRes(I, 1) = V
            vRes(I, 2) = .Cnt
            vRes(I, 3) = .Song
        End With
    Next V

    ' Write, sort and format the report
    Set rRes = rRes.Resize(UBound(vRes, 1) + 1, UBound(vRes, 2))
    With rRes
        .EntireColumn.Clear
        .Value = vRes
        .Sort key1:=rRes(1, 2), order1:=xlDescending, key2:=rRes(1, 1), order2:=xlAscending, MatchCase:=False, Header:=xlYes
        .Style = "Output"
        With .Columns(2)
            .ColumnWidth = .ColumnWidth / 2
            .WrapText = True
            .HorizontalAlignment = xlCenter
        End With
        With .Rows(1)
            .HorizontalAlignment = xlCenter
            .VerticalAlignment = xlCenter
        End With
        .EntireColumn.AutoFit
    End With
End Sub

Related

Where can I add a conditional statement in this M function code (Power Query)?

I'm trying to work my way through this PowerQuery problem but I'm getting fairly stumped (been throwing myself at it for hours).
The goal is to retrieve a value from a table, closest to a variable (x). Sample of the table:
S | 1 | 2 | 3
0 | 1698 | 1737 | 1781
1 | 1737 | 1795 | 1855
2 | 1780 | 1854 | 1928
3 | 1822 | 1912 | 2002
4 | 1864 | 1971 | 2075
5 | 1907 | 2029 | 2149
6 | 1949 | 2086 | 2222
7 | 1992 | 2145 | 2296
8 | 2034 | 2203 | 2369
9 | 2077 | 2262 | 2443
10 | 2119 | 2320 | 2516
11 | 2162 | 2378 | 2590
Let's say the variable S = "1" (column name) and variable x is "2000". The code I'm using works fine in this scenario:
let
S = "1",
x = 2000,
Source = tblSchalen,
Result = Table.Column(Source, S){List.Count(List.Select( Table.Column(Source, S), each _ <= x))}
in
Result
This correctly returns the value of "2034", as 2000 is higher than 1992 and the next value up is 2034.
The problem is that if I make x "2500" it errors (too few elements in enumeration), since the search procedure sort of "overflows". I vaguely understand why this happens (List.Select keeps the values that are lower than or equal to x, List.Count counts them, and that count is then used as the row number to return?), but I'd like to prevent this overflow by inserting some conditional statements, as I want every x above the highest value in the list to just return the highest value in the list (so x = 2500 would return "2162" if S is 1, "2378" if S is 2 and "2516" if S is 3).
Could anyone help me in the right direction? The syntax in M is different enough from VBA to make me confused and the M editor isn't quite as helpful as the VBA editor when debugging.
edit:
I guess I want "something like this, but working". These conditional statements in M really trip me up, as the code below gives me a "Token Identifier expected" error:
let
S = "1",
x = 2500,
Source = tblSchalen,
amount = List.Count(List.Select(Table.Column(Source, S), each _ <= x)), // this return a number from 0 - 12
if amount > 11 then amount = 11 else amount = amount
Result = Table.Column(amount)
in
Result
I've tried "Result = if amount > 11 etc etc" just to give it some identifier, but that's also no bueno.
This finds the value in Table1 closest to Table2[Column1], using the column whose name is listed in Table2[Column2]:
let Source = Excel.CurrentWorkbook(){[Name="Table2"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Column1", Int64.Type}, {"Column2", Int64.Type}}),
TransformedTable1=Table.Buffer(Table.UnpivotOtherColumns(Table1, {"S"}, "Attribute", "Value")),
#"Added Custom" = Table.AddColumn(#"Changed Type","Closest",(i)=>
Table.Sort(
Table.AddColumn(Table.SelectRows(TransformedTable1,
each [Attribute]=Text.From(i[Column2])),
"diff", each Number.Abs([Value]-i[Column1]))
,{{"diff", Order.Ascending}})
{0}[Value]
)
in #"Added Custom"
Note: Column 2 is the name of the column, not the position
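On the conditional itself: in M, if ... then ... else is an expression that produces a value, so it has to sit on the right-hand side of a named step, and "amount = 11" inside it is a comparison rather than an assignment. The clamping the question asks for can be sketched in Python to show the intended logic only. The function name is made up for illustration, and the column is assumed to be sorted ascending as in the sample table.

```python
def next_value_clamped(column, x):
    """Return the first value in an ascending column that is greater than x.

    If x is at or above every value, return the last (highest) value
    instead of running past the end of the list.
    """
    # Same count as List.Count(List.Select(column, each _ <= x)) in the question.
    count = sum(1 for value in column if value <= x)
    # Clamp the index to the last valid position.
    index = min(count, len(column) - 1)
    return column[index]


# Column "1" of the sample table.
column_1 = [1698, 1737, 1780, 1822, 1864, 1907, 1949, 1992, 2034, 2077, 2119, 2162]
```

With the sample column, x = 2000 gives 2034 as before, and x = 2500 falls back to 2162 instead of raising an error.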

Macro to Calendarize Data

Problem:
I am writing a macro to do some data input for me. These data reports all have the same format (date in column 1, value in column 2), except that the order of the months changes from client to client. Currently the macro grabs the data from another sheet and brings it into the current sheet. Now I need to write something that rearranges the data into the correct order.
Example: This client is organized from Apr-Mar (again, it could be any 12 month combination) and I need to get it into Jan-Dec, regardless of year.
Before:
Apr-14 37,645
May-14 47,000
Jun-14 11,600
Jul-14 33,503
Aug-14 38,550
Sep-14 36,063
Oct-14 39,246
Nov-14 30,315
Dec-14 28,403
Jan-15 25,799
Feb-15 24,302
Mar-15 27,873
After:
Jan-15 25,799
Feb-15 24,302
Mar-15 27,873
Apr-14 37,645
May-14 47,000
Jun-14 11,600
Jul-14 33,503
Aug-14 38,550
Sep-14 36,063
Oct-14 39,246
Nov-14 30,315
Dec-14 28,403
Attempt at a Solution:
I am hesitant to post it as it's incomplete and would be hard to follow. All it does is parse the first date cell to get the first 3 letters corresponding to a month, then check against 12 if/elseif statements. There must be a better way to write it, I just can't think of it.
Any and all help is appreciated. I'm sure a shove in the right direction would help!
I ended up just creating a loop that looks for "Jan" and gets all the data from there.
'Loop to dynamically calendarize data
''Loop returns 2 arrays, length 12, index 0-11, "electric" and "heating" in Jan to Dec format
complete = False
For i = 9 To 20
If Left(target.Sheets("Sheet1").Cells(i, 1).Value, 3) = "Jan" Then
For j = 0 To (20 - i)
carryOver = i + j
electric(j) = target.Sheets("Sheet1").Cells(carryOver, 2).Value
heating(j) = target.Sheets("Sheet1").Cells(carryOver, 5).Value
Next j
complete = True
ElseIf complete = True Then
For k = j To 11
electric(k) = target.Sheets("Sheet1").Cells(9 + (k - j), 2).Value
heating(k) = target.Sheets("Sheet1").Cells(9 + (k - j), 5).Value
Next k
GoTo end_of_for
End If
Next i
end_of_for:
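The loop above amounts to rotating the twelve rows so that the January row comes first. As a language-neutral check of that idea, here is a minimal Python sketch. The function name and the (label, value) row layout are assumptions for illustration, and it assumes exactly one row whose label starts with "Jan".

```python
def calendarize(rows):
    """Rotate 12 consecutive monthly rows so that January comes first.

    rows: list of (month_label, value) tuples, e.g. ("Apr-14", 37645).
    """
    # Find the position of the January row (raises StopIteration if there is none).
    start = next(i for i, (label, _) in enumerate(rows) if label.startswith("Jan"))
    # Everything from January onwards, followed by the months that preceded it.
    return rows[start:] + rows[:start]
```

Because the rotation is a single slice, it also covers the edge cases where January is the first or the last row of the input.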

using "Large if" in vba

I am working on a spreadsheet in which I have data of quarter-wise sales by co. and sector. My data is arranged in below mentioned format:
Row 1 has date
Data starts from row 2
Column B: Company Name
Column C: Sector Name of the co. for ex. energy, materials, technology etc
Column D: Sales figure of co.
Column E, F, G: Price, shares, volume of co.
Column H: Blank
From next column onwards I have same data fields for next quarter
Column I: Company Name
Column J: Sector Name of the co. for ex. energy, materials, technology etc
Column K: Sales figure of co
So on and so forth
Now, in another worksheet I need to get name and sales figures for top 3/5/10/15 etc (top n co.s depending on user input) within each sector. For ex. sales figures and co. name of top 3 companies in Energy sector.
I have been trying to write VBA code for this, but I am struggling. Below is the code I was trying. It's not flexible at all, since it hard-codes references to columns C and D, which would actually change after each quarter.
Sub try()
Dim r As Long
n = Range("topcos").Value + 5
For r = 5 To n
p = 1
Worksheets("Top co. share with co name").Activate
Cells(r, 16).Select
Selection.FormulaArray = "=LARGE(IF('Co Wise'!$C$3:$C$600=P$3,'Co Wise'!$D$3:$D$600,""""),p)"
p = p + 1
Next r
End Sub
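It seems that you merely forgot that p is a variable and must be concatenated into the formula string. Note that p = 1 also needs to move above the For loop, otherwise p is reset on every pass and each row gets the largest value. The corrected line: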
Selection.FormulaArray = "=LARGE(IF('Co Wise'!$C$3:$C$600=P$3,'Co Wise'!$D$3:$D$600,"""")," & p & ")"
Alternatively:
Sub SO()
Sheets("Top co. share with co name").Range("P5:P" & [topcos] + 5).FormulaArray = _
"=LARGE(IF('Co Wise'!$C$3:$C$600=P$3,'Co Wise'!$D$3:$D$600,""""),ROW()-4)"
End Sub
Would do the same thing without the need for loops and variables.
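To sanity-check what the worksheet returns, the quantity that the LARGE(IF(...)) array formula computes (the n largest sales figures within one sector) can be reproduced outside Excel. This is only a sketch: the function name, the tuple layout and the sample figures used with it are hypothetical, not taken from the workbook.

```python
def top_n_in_sector(rows, sector, n):
    """Return the n largest (company, sales) pairs for one sector.

    rows: iterable of (company, sector, sales) tuples.
    """
    # Keep only the rows of the requested sector, like the IF(...) part.
    in_sector = [(company, sales) for company, sec, sales in rows if sec == sector]
    # Largest sales first, like LARGE(..., 1), LARGE(..., 2), and so on.
    in_sector.sort(key=lambda pair: pair[1], reverse=True)
    return in_sector[:n]
```

Unlike the formula, this also returns the company name alongside each figure, which the question asks for as well.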

Pulling data out of an worksheet by country

I have a huge amount of people in my excel sheet and I want to split them by country with excel coding, here is an example of my data:
Country | Name
UK | Tom
Austria | Bobsky
UK | Ralf
Germany | Badolf
Germany | Schwartz
UK | Andy
So would it be possible to just separate the people who are in the UK into a different part of my spreadsheet?
I have already tried
INDEX(B1:B6, MATCH("UK", A1:A6,0)) - this returns a repeated row if the match function returns no result
I have also tried many things with
if(VLOOKUP(etc etc) = "UK".....
and I have found this doesn't work either. I thought this would be something excel could do simply without having to filter + copy & paste or use VBA but this is not easy.
This is doable for a couple thousand rows of data. If your 'huge amount of people' is much more than that, an advanced filter or pivot table is a more viable solution.
With UK in D3 use the following in E3.
=IFERROR(INDEX($B$2:$B$9999, SMALL(INDEX(ROW($1:$9995)+($A$2:$A$9996<>D3)*1E+99, , ), ROW(1:1))), "")
Fill down as necessary.
Without wasting time to draft a complex Excel formula:
Option 1
Use a PivotTable where you use the Country column as a filter
Option 2
Use a Microsoft Query Data->From Other Sources->Microsoft Query on each worksheet like this:
SELECT * FROM [Sheet1$] WHERE Country = 'UK' ORDER BY NAME
Sub processData()
    'Worked out for the UK only:
    'this macro copies the records of UK people from Sheet1 to Sheet2
    lastrows = Worksheets("Sheet1").Cells(Rows.count, 1).End(xlUp).Row
    count = 1
    For x = 2 To lastrows
        If (Worksheets("Sheet1").Cells(x, 1) = "UK") Then
            Worksheets("Sheet2").Cells(count, 1) = Worksheets("Sheet1").Cells(x, 1)
            Worksheets("Sheet2").Cells(count, 2) = Worksheets("Sheet1").Cells(x, 2)
            count = count + 1
        End If
    Next x
    MsgBox "Task finished"
End Sub

Categorize single column of text into multiple columns

I am trying to create a macro that will categorize data in one column into multiple columns based on the item type. The data I am trying to categorize is a list of contracts with meta-data on the items in the contract.
The raw data looks like this:
Contract No Contract Name Item Type Item Description
111111 Chocolate Supplies POTS 5"
111111 Chocolate Supplies POTS 10"
111111 Chocolate Supplies POTS 15"
111111 Chocolate Supplies PANS 5"
111111 Chocolate Supplies PANS 10"
111111 Chocolate Supplies PANS 15"
111111 Chocolate Supplies KNIVES Paring knife
111111 Chocolate Supplies SILVERWARE Salad fork
111111 Chocolate Supplies SILVERWARE Dinner fork
111111 Chocolate Supplies SILVERWARE Dessert fork
111111 Chocolate Supplies SILVERWARE Dessert spoon
111111 Chocolate Supplies SILVERWARE Soup spoon
22222 Soups and Salads Order POTS 10"
22222 Soups and Salads Order POTS 15"
22222 Soups and Salads Order PANS 15"
22222 Soups and Salads Order KNIVES Butter knife
22222 Soups and Salads Order KNIVES Bread knife
22222 Soups and Salads Order KNIVES Paring knife
22222 Soups and Salads Order SILVERWARE Soup spoon
The final data needs to look like this (edited to include image):
Contract | Contract Name | POTS | PANS | KNIVES | SILVERWARE
111111 | Chocolate Supplies | 5" | 5" | Paring knife | Salad fork
 | | 10" | 10" | | Dinner fork
 | | 15" | 15" | | Dessert fork
 | | | | | Dessert spoon
 | | | | | Soup spoon
22222 | Soups and Salads Order | 10" | 15" | Butter knife | Soup spoon
 | | 15" | | Bread knife |
 | | | | Paring knife |
# What I've tried so far #
The current crude solution I am using is to:
- Run the query
- Paste the data into excel
- Create a pivot
- Use a series of count, offset and indirect formulas to reorganize the data as needed
- Since the above process leaves empty rows between each section of contracts, I copy-paste the data into a new worksheet, put an Autofilter and remove the blank rows
... and voila, that's the final report.
# Possible VBA solution #
I found this tutorial which seems to do exactly what I want, except for the problem where I need the macro to start a new section when the contract no. changes. I don't know how to get the VBA code below to also check for the contract no.
I'd love any help you could send my way. Thanks in advance!
# Code from tutorial on get-digital-help [dot] com by Oscar. #
This is not my code, and I give complete credit to Oscar's tutorial for getting me going in the right direction.
Sub Categorizedatatocolumns()
Dim rng As Range
Dim dest As Range
Dim vrb As Boolean
Dim i As Integer
Set rng = Sheets("Sheet1").Range("A4")
vrb = False
Do While rng <> ""
Set dest = Sheets("Sheet1").Range("A20")
Do While dest <> ""
If rng.Value = dest.Value Then
vrb = True
End If
Set dest = dest.Offset(0, 1)
Loop
If vrb = False Then
dest.Value = rng.Value
dest.Font.bold = True
End If
vrb = False
Set rng = rng.Offset(1, 0)
Loop
Set rng = Sheets("Sheet1").Range("A4")
Do While rng <> ""
Set dest = Sheets("Sheet1").Range("A20")
Do While dest <> ""
If rng.Value = dest.Value Then
i = 0
Do While dest <> ""
Set dest = dest.Offset(1, 0)
i = i + 1
Loop
Set rng = rng.Offset(0, 1)
dest.Value = rng.Value
Set rng = rng.Offset(0, -1)
Set dest = dest.Offset(-i, 0)
End If
Set dest = dest.Offset(0, 1)
Loop
Set rng = rng.Offset(1, 0)
Loop
End Sub
Consider using a pivot table, which gives similar output.
Turn off subtotals and show all fields in tabular form.
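The part the tutorial macro lacks, starting a new section whenever the contract number changes, comes down to grouping by contract first and by item type second. A minimal Python sketch of that grouping follows. The function name and tuple layout are assumptions for illustration, and every item type in the data is assumed to appear in the `types` list (an unlisted type would raise a KeyError).

```python
from itertools import zip_longest

def categorize(rows, types):
    """Turn (contract_no, contract_name, item_type, description) rows into
    report rows with one column per item type, one section per contract."""
    grouped = {}
    for number, name, item_type, description in rows:
        # One bucket of per-type lists for each contract, in first-seen order.
        columns = grouped.setdefault((number, name), {t: [] for t in types})
        columns[item_type].append(description)

    report = []
    for (number, name), columns in grouped.items():
        # Pad the shorter type columns with blanks so they line up row by row.
        lines = zip_longest(*(columns[t] for t in types), fillvalue="")
        for i, cells in enumerate(lines):
            # Contract number and name appear only on the first line of a section.
            head = (number, name) if i == 0 else ("", "")
            report.append(head + cells)
    return report
```

Each tuple in the returned list corresponds to one line of the final report, so it can be written to a worksheet row by row.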
