Excel - How to count pairs in two columns containing lists - excel

I have a number of farmers registered on my database. Each farmer grows a few fruits and sells to a few counties.
For every fruit / county pair (e.g. apple, Warwickshire), how do I count the number of farmers that can supply that combo?
I have over 100 farmers registered on my database.
So my database has a row for each farmer, a column for fruits and a column for the counties they cover. The fruits and counties that each farmer covers are recorded as comma separated lists in the two cells on that farmer's row.
I want to create a matrix with fruits on the horizontal and counties on the vertical to count how many farmers cover that particular combo.
For the example in the screenshot, I've tried:
=COUNTIF(A2:B4,AND(ISNUMBER(SEARCH(G11,A2,1)),ISNUMBER(SEARCH(A13,B2,1)))="TRUE")
but with no luck.

If you have Excel 2010+, you can do this with Power Query (aka Get & Transform in Excel 2016+).
Using Power Query allows you to update the table easily whenever any new products (or counties) are added. You just re-run the query after you add rows to the data table (or add a product or county to a given row).
Except for removing the extra spaces (Trim after splitting the columns), all can be done via the GUI. But you can just paste the M-Code into the Advanced Editor and then explore the GUI to study the individual steps.
let
Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Products", type text}, {"Counties", type text}}),
#"Split Column by Delimiter" = Table.ExpandListColumn(Table.TransformColumns(#"Changed Type", {{"Counties", Splitter.SplitTextByDelimiter(",", QuoteStyle.Csv), let itemType = (type nullable text) meta [Serialized.Text = true] in type {itemType}}}), "Counties"),
#"Changed Type1" = Table.TransformColumnTypes(#"Split Column by Delimiter",{{"Counties", type text}}),
#"Split Column by Delimiter1" = Table.ExpandListColumn(Table.TransformColumns(#"Changed Type1", {{"Products", Splitter.SplitTextByDelimiter(",", QuoteStyle.Csv), let itemType = (type nullable text) meta [Serialized.Text = true] in type {itemType}}}), "Products"),
#"Changed Type2" = Table.TransformColumnTypes(#"Split Column by Delimiter1",{{"Products", type text}}),
#"Added Custom" = Table.AddColumn(#"Changed Type2", "Prod", each Text.Trim([Products])),
#"Added Custom1" = Table.AddColumn(#"Added Custom", "County", each Text.Trim([Counties])),
#"Removed Columns" = Table.RemoveColumns(#"Added Custom1",{"Products", "Counties"}),
#"Grouped Rows" = Table.Group(#"Removed Columns", {"County", "Prod"}, {{"grouped", each _, type table [Prod=text, County=text]}, {"counts", each Table.RowCount(_), type number}}),
#"Removed Columns1" = Table.RemoveColumns(#"Grouped Rows",{"grouped"}),
#"Pivoted Column" = Table.Pivot(#"Removed Columns1", List.Distinct(#"Removed Columns1"[Prod]), "Prod", "counts", List.Sum)
in
#"Pivoted Column"
Original Data
Results
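For readers who want to check the logic outside Excel, the split / expand / trim / group steps of the query above can be sketched in Python. This is only an illustration under assumptions: the sample rows and the name `count_pairs` are hypothetical and not taken from the original workbook.

```python
from collections import Counter

def count_pairs(rows):
    """Count how many farmers cover each (county, fruit) pair.

    rows: iterable of (fruits, counties) strings, each a comma separated list.
    """
    counts = Counter()
    for fruits, counties in rows:
        # Split on commas and trim spaces, like the Split + Text.Trim steps
        fruit_set = {f.strip() for f in fruits.split(",") if f.strip()}
        county_set = {c.strip() for c in counties.split(",") if c.strip()}
        # Every county/fruit combination on this row counts once for this farmer
        for county in county_set:
            for fruit in fruit_set:
                counts[(county, fruit)] += 1
    return counts

# Hypothetical sample data: one (fruits, counties) tuple per farmer
farmers = [
    ("apple, pear", "Warwickshire, Gloucestershire"),
    ("apple", "Warwickshire"),
    ("pear, plum", "Gloucestershire"),
]
pair_counts = count_pairs(farmers)
print(pair_counts[("Warwickshire", "apple")])
```

Sets are used so that a fruit or county typed twice in one cell still counts that farmer only once; the M query counts rows, so it would count such a duplicate twice.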

Just for "fun" I built a solution with AND() and FIND():
=IFERROR(AND(FIND(B$7,$A$2,1),FIND($A8,$B$2,1)),0)+IFERROR(AND(FIND(B$7,$A$3,1),FIND($A8,$B$3,1)),0)+IFERROR(AND(FIND(B$7,$A$4,1),FIND($A8,$B$4,1)),0)
You could wrap this in an IF() so that only results greater than 1 are shown, which may make it easier to spot the ones wanted.
=IF(IFERROR(AND(FIND(B$7,$A$2,1),FIND($A8,$B$2,1)),0)+IFERROR(AND(FIND(B$7,$A$3,1),FIND($A8,$B$3,1)),0)+IFERROR(AND(FIND(B$7,$A$4,1),FIND($A8,$B$4,1)),0)<2,"",IFERROR(AND(FIND(B$7,$A$2,1),FIND($A8,$B$2,1)),0)+IFERROR(AND(FIND(B$7,$A$3,1),FIND($A8,$B$3,1)),0)+IFERROR(AND(FIND(B$7,$A$4,1),FIND($A8,$B$4,1)),0))

Oh, I get it. It's the way the data has been tabulated by whoever filled in the form (I'm explaining this to others here so they see where I'm coming from). The asker has received confusing, badly compiled tables and wants to make them more straightforward and logical.
He probably also wants to change the way these forms are filled in, to make them easier to read and follow and to give a more efficient process going forward.
I think I got it working, based on how I understood the question, the information given and what I know how to do in Excel. My approach works like "count the number of times a specific word appears in a string".
version 1:
=(LEN(lookupall("*"&B$8&"*",$A$2:$B$4,2))-LEN(SUBSTITUTE(lookupall("*"&B$8&"*",$A$2:$B$4,2),$A9,"")))/LEN(B$8)
and drag across and down.
or better:
version 2:
=(LEN(lookupall("*"&B$7&"*",$A$2:$B$4,2))-LEN(SUBSTITUTE(lookupall("*"&B$7&"*",$A$2:$B$4,2),$A8,"")))/LEN($A8)
[now 14:00 edited above - 5am written and it was off by some cells]
My results are:
version 1 results table:
version 2 results table: I think this is exactly what you wanted.
Notes: in both source tables (A2:B4) I've abbreviated the names, but the data is the same (war = Warwickshire, app = apples, etc.).
Which one is closer to what you're seeking?
lookupall is a UDF you can find on the net if you search around. It returns all VLOOKUP results, including duplicate lookups, concatenated together. It occurred to me that you can then count how many times each value in column A (the counties) appears in each of the results (the fruit lookups), and then divide by the number of letters in a word (the fruit in version 1, the county in version 2) to get the precise number.
In version 1, I think you have to round the results down/up (because when I get rid of the glos (Gloucestershire) in B2, for instance, the result in B12 becomes 0, which would be precise given those numbers). But version 2 is better and more accurate, because it divides by the length of the same word that it substitutes.
Is this going in the right direction for you? It might be worth more tweaking, but given the approximate nature of the question (the way I read it), it's the best I can do. It would have been more accurate to tie ...
Though I am sure there are better, more versatile and more precise formulas out there which would do it in just one formula or one single UDF.
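The counting trick behind both versions, (LEN(text) - LEN(SUBSTITUTE(text, word, ""))) / LEN(word), can be checked with a small Python sketch. The sample string below is a hypothetical stand-in for what a lookupall call might return; it is not taken from the screenshots.

```python
def count_occurrences(text, word):
    """Mirror (LEN(text) - LEN(SUBSTITUTE(text, word, ""))) / LEN(word)."""
    if not word:
        return 0
    # Removing every occurrence shortens the text by len(word) per occurrence
    removed = len(text) - len(text.replace(word, ""))
    return removed // len(word)

# Hypothetical concatenated lookup result using the abbreviated county names
joined = "war, glos;war;glos, war"
print(count_occurrences(joined, "war"))
```

Dividing by the length of a different word than the one removed, as version 1 does, generally gives a fractional result, which would explain why rounding was needed there.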
The lookupall UDF I use:
Function LookupAll(vVal, rTable As Range, ColumnI As Long) As Variant
    ' Returns every match for vVal in the first column of rTable,
    ' taken from column ColumnI and joined with ";"
    Dim rFound As Range, lLoop As Long
    Dim strResults As String
    With rTable.Columns(1)
        Set rFound = .Cells(1, 1)
        For lLoop = 1 To WorksheetFunction.CountIf(.Cells, vVal)
            Set rFound = .Find(what:=vVal, After:=rFound, LookIn:=xlFormulas, _
                lookAt:=xlWhole, SearchOrder:=xlByRows, SearchDirection:=xlNext, _
                MatchCase:=False, SearchFormat:=False)
            strResults = strResults & ";" & rFound(1, ColumnI)
        Next lLoop
    End With
    ' With no matches strResults is empty and Right(..., -1) would raise an error
    If Len(strResults) = 0 Then
        LookupAll = ""
    Else
        LookupAll = Trim(Right(strResults, Len(strResults) - 1))
    End If
End Function
This UDF does this job (and many others) and has been a life-saver for much of my own work. (P.S. Nobody asks me questions in the office and nobody gave me anything! It's all found, researched, discovered or made by me to survive!)
My correct results table, I am pleased to say, is exactly the same as Solar Mike's! So version 2 is correct
with
=(LEN(lookupall("*"&E$7&"*",$A$2:$B$4,2))-LEN(SUBSTITUTE(lookupall("*"&E$7&"*",$A$2:$B$4,2),$A11,"")))/LEN($A11)
in cell B8 and dragged down&across

Related

word patterns within an excel column

I have 2 Excel data sets each comprising a column of word patterns and have been searching for a way to copy and group all instances of repetition within these columns into a new column.
This is the closest result I could find so far:
Sub Common5bis()
    Dim Joined
    Set d = CreateObject("Scripting.Dictionary") 'make dictionary
    d.CompareMode = 1 'not case sensitive
    a = Range("A1", Range("A" & Rows.Count).End(xlUp)).Value 'data to array
    For i = 1 To UBound(a) 'loop through all records
        If Len(a(i, 1)) >= 5 Then 'length at least 5
            For l = 1 To Len(a(i, 1)) - 4 'all 5-character strings within record
                s = Mid(a(i, 1), l, 5) 'that string
                d(s) = d(s) + 1 'increment
            Next
        End If
    Next
    Joined = Application.Index(Array(d.Keys, d.items), 0, 0) 'join the keys and the items
    With Range("D1").Resize(UBound(Joined, 2), 2) 'export range
        .EntireColumn.ClearContents 'clear previous
        .Value = Application.Transpose(Joined) 'write to sheet
        .Sort .Range("B1"), xlDescending, Header:=xlNo 'sort descending
    End With
End Sub
Which yielded this result for the particular question:
This example achieves 4 of the things I'm trying to achieve:
Identify repeating strings within a single column
Copies these strings into a separate column
Displays results in order of occurrence (in this case from least to most)
Displays the quantity of repetitions (including the first instance) in an adjacent column
However, although from reading the code there are basic things I've figured out that I can adapt to my purposes, it still fails to achieve these essential tasks which I'm still trying to figure out:
Identify individual words rather than single characters
I could possibly reduce the size from 5 to 3, but for the word strings I have (lists of pronouns from larger texts) that would include "I I" repetitions but wouldn't be so great for "Your You" etc., whilst a size of at least 4 or 5 would miss anything starting with "I I"
Include an indefinite amount of values - looking at the code and the replies to the forum it comes from it looks like it's capped at 5, but I'm trying to find a way to identify all repetitions for all multiple word strings which could be something like "I I my you You Me I You my"
Is case sensitive - this is quite important as some words in the column have been capitalised to differentiate different uses
I'm still learning the basics of VBA but have manually typed out this example of what I'm trying to do with the code I've found above:
Intended outcome:
And so on
I'm a bit screwed at this point which is why I'm reaching out here (sorry if this is a stupid question, I'm brand new to VBA as my work almost never needs Excel, let alone macros) so will massively appreciate any constructive advice towards a solution!
Because I've been working with it recently, I note that you can obtain your desired output using Power Query, available in Windows Excel 2010+ and Office 365 Excel
Select some cell in your original table
Data => Get&Transform => From Table/Range or From within sheet
When the PQ UI opens, navigate to Home => Advanced Editor
Make note of the Table Name in Line 2 of the code.
Replace the existing code with the M-Code below
Change the table name in line 2 of the pasted code to your "real" table name
Examine any comments, and also the Applied Steps window, to better understand the algorithm and steps
First add a custom function:
New blank query
Rename per the code comment
Edits to make case-insensitive
Custom Function
//rename fnPatterns
//generate all possible patterns of two words or more
(String as text)=>
let
//split text string into individual words & get the count of words
#"Split Words" = List.Buffer(Text.Split(String," ")),
wordCount = List.Count(#"Split Words"),
//start position for each number of words
starts = List.Numbers(0, wordCount-1),
//number of words for each pattern (minimum of two (2) words in a pattern
words = List.Reverse(List.Numbers(2, wordCount-1)),
//generate patterns as index into the List and number of words
// will be used in the List.Range function
patterns = List.Combine(List.Generate(
()=>[r={{0,wordCount}}, idx=0],
each [idx] < wordCount-1,
each [r=List.Transform({0..starts{[idx]+1}}, (li)=> {li, wordCount-[idx]-1}),
idx=[idx]+1],
each [r]
)),
//Generate a list of all the patterns by using the List.Range function
wordPatterns = List.Distinct(List.Accumulate(patterns, {}, (state, current)=>
state & {List.Range(#"Split Words", current{0}, current{1})}), Comparer.OrdinalIgnoreCase)
in
wordPatterns
Main Function
let
//change next line to reflect data source
//if data has a column name other than "Column1", that will need to be changed also wherever referenced
Source = Excel.CurrentWorkbook(){[Name="Table17"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Column1", type text}}),
//Create a list of all the possible patterns for each string, added as a custom column
#"Invoked Custom Function" = Table.AddColumn(#"Changed Type", "Patterns", each fnPatterns([Column1]), type list),
//removed unneeded original column of strings
#"Removed Columns" = Table.RemoveColumns(#"Invoked Custom Function",{"Column1"}),
//Expand the column of lists of lists into a column of lists
#"Expanded Patterns" = Table.ExpandListColumn(#"Removed Columns", "Patterns"),
//convert all lists to lower case for text-insensitive comparison
#"Added Custom" = Table.AddColumn(#"Expanded Patterns", "lower case patterns",
each List.Transform([Patterns], each Text.Lower(_))),
//Count number of matches for each pattern
#"Added Custom1" = Table.AddColumn(#"Added Custom", "Count", each List.Count(List.Select(#"Added Custom"[lower case patterns], (li)=> li = [lower case patterns])), Int64.Type),
//Filter for matches of more than one (1)
// then remove duplicate patterns based on the "lower case pattern" column
#"Filtered Rows" = Table.SelectRows(#"Added Custom1", each ([Count] > 1)),
#"Removed Duplicates" = Table.Distinct(#"Filtered Rows", {"lower case patterns"}),
//Remove lower case pattern column and sort by count descending
#"Removed Columns1" = Table.RemoveColumns(#"Removed Duplicates",{"lower case patterns"}),
#"Sorted Rows" = Table.Sort(#"Removed Columns1",{{"Count", Order.Descending}}),
//Re-construct original patterns as text
#"Extracted Values" = Table.TransformColumns(#"Sorted Rows",
{"Patterns", each Text.Combine(List.Transform(_, Text.From), " "), type text})
in
#"Extracted Values"
Note that you could readily implement a similar algorithm using VBA, the VBA.Split function and a Dictionary
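As a rough cross-check of the algorithm (not a translation of the M code), the same idea can be sketched in Python with a split and a dictionary-like counter. The function names and sample strings are hypothetical. Like fnPatterns, each string contributes a given pattern at most once, and a flag is included because the question asked for case-sensitive matching while the query above compares in lower case.

```python
from collections import Counter

def word_patterns(text, min_words=2):
    """Return the distinct runs of consecutive words (min_words or longer)."""
    words = text.split()
    n = len(words)
    return {
        tuple(words[i:i + size])
        for size in range(min_words, n + 1)
        for i in range(n - size + 1)
    }

def repeated_patterns(strings, case_sensitive=True):
    """Count in how many strings each pattern occurs; keep counts above 1."""
    counts = Counter()
    for s in strings:
        if not case_sensitive:
            s = s.lower()
        counts.update(word_patterns(s))
    # most_common() sorts by count, descending
    return [(" ".join(p), c) for p, c in counts.most_common() if c > 1]

# Hypothetical sample column of word strings
sample = ["I you me", "you me I", "I you"]
result = repeated_patterns(sample)
print(result)
```

With `case_sensitive=False` the patterns are reported in lower case, whereas the M query restores the original capitalisation of the first occurrence.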

Get values of top N based on sum and condition [duplicate]

I would like to extract the top 5 players based on the sales by each employee (without Pivot Table / Auto filter).
Refer my input and output screenshot
Snapshot
Any suggestions, how to obtain first top 5 ranks (even if repeated; as shown in the screenshots)
I have verified Extract Top 5 Values for Each Group in a List without VBA and some other links also.
Thanks in advance for your time and consideration! Please let me know if my request is unclear and/or if you have any specific questions.
This is what I use to track the top 5 absentees...
Edit to suit your needs.
Formula in cell A1:
=INDEX(A$13:A$52,AGGREGATE(15,6,ROW($1:$40)/(B$13:B$52=B1),COUNTIF(B$1:B1,B1)))
Formula in cell B1:
=LARGE(B$13:B$52,ROW())
An alternative approach using Power Query which is available in Excel 2010 Professional Plus and all later versions of Excel.
Steps are:
Add your input data table to the Power Query Editor;
Sort the table by Sales (descending), then by Employee (ascending);
Add an Index Column starting from 1;
Filter the Index column to show values less than or equal to 5;
Remove the Index column, then you should have something like the following:
Close & Load the output table to a new worksheet (by default).
Here is the Power Query M code for your reference. All functions used are available in the GUI, so it should be easy and straightforward.
let
Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Employee", type text}, {"Month", type text}, {"Sales", type number}}),
#"Sorted Rows" = Table.Sort(#"Changed Type",{{"Sales", Order.Descending}, {"Employee", Order.Ascending}}),
#"Added Index" = Table.AddIndexColumn(#"Sorted Rows", "Index", 1, 1),
#"Filtered Rows" = Table.SelectRows(#"Added Index", each [Index] <= 5),
#"Removed Columns" = Table.RemoveColumns(#"Filtered Rows",{"Index"})
in
#"Removed Columns"
Let me know if you have any questions. Cheers :)
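The sort / index / filter steps above amount to "sort by Sales descending, break ties by Employee, keep the first five rows". A minimal Python sketch of that logic, using made-up records (the names and figures are hypothetical), might look like this:

```python
def top_n(records, n=5):
    """Return the n records with the highest Sales; ties ordered by Employee."""
    ranked = sorted(records, key=lambda r: (-r["Sales"], r["Employee"]))
    return ranked[:n]

# Hypothetical input rows
sales = [
    {"Employee": "Ann", "Month": "Jan", "Sales": 500},
    {"Employee": "Bob", "Month": "Jan", "Sales": 700},
    {"Employee": "Cy", "Month": "Feb", "Sales": 700},
    {"Employee": "Dee", "Month": "Feb", "Sales": 300},
    {"Employee": "Eve", "Month": "Mar", "Sales": 900},
    {"Employee": "Fay", "Month": "Mar", "Sales": 100},
    {"Employee": "Gus", "Month": "Mar", "Sales": 500},
]
top5 = top_n(sales)
print([r["Employee"] for r in top5])
```

Repeated sales values are kept as separate rows, so tied employees can both appear, but the result is still capped at five rows.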
Try this one. As you have in your sample:
On Cell E16:
=VLOOKUP(LARGE($C$3:$C$12,ROW()-15),CHOOSE({2,1},$A$3:$A$12,$C$3:$C$12),2,FALSE)
On Cell F16:
=VLOOKUP(LARGE($C$3:$C$12,ROW()-15),CHOOSE({2,1},$B$3:$B$12,$C$3:$C$12),2,FALSE)
On Cell G16:
=VLOOKUP(LARGE($C$3:$C$12,ROW()-15),$C$3:$C$12,1,FALSE)
You can drag it down to get the list sorted.
Hope it helps!

Restructuring a table with PowerQuery

I am taking my first steps in PowerQuery, so here's my problem. I have a raw data table which lists countries and certain products. For each product there is the "market" value followed by a MyValue (meaning my own sales of that product in that country). An example here:
raw table
What I was trying to obtain with PowerQuery is a table that unpivots the products category and leaves two columns, one for Market and one for MyValue.
I tried in many ways and the closest to the result I could get was splitting the original table in two, one for the Market and one for MyValues. Then unpivot each one of them in PowerQuery so that I could get them in this way:
Market
And
MyValue
I then tried to merge the two tables but couldn't work it out. Of course I could do that manually, but I'm sure there's a way to do it with PowerQuery, either splitting into 2 tables, unpivoting and then merging or, even better, with a single query.
The result I'm aiming at is like
Desired Result
You are close.
After you unpivot, you need to create a custom column that you can pivot on, and also modify the names in the resultant "attribute" column.
Read the comments in the code and explore the Applied Steps window to understand the algorithm
M Code
let
Source = Excel.CurrentWorkbook(){[Name="rawTable"]}[Content],
//generalized "typer" in case you add other Items
#"Changed Type" = Table.TransformColumnTypes(Source,{
{"Country", type text}, {"Date", type date}} &
List.Transform(List.RemoveFirstN(Table.ColumnNames(Source),2),each {_, Int64.Type})),
//Unpivot all except Country|Date
#"Unpivoted Other Columns" = Table.UnpivotOtherColumns(#"Changed Type", {"Country", "Date"}, "Item", "Value"),
//Add Custom Column to create Pivot column for "Market" and "MyValue"
#"Added Custom" = Table.AddColumn(#"Unpivoted Other Columns", "Custom", each
if Text.StartsWith([Item],"My")
then "MyValue"
else "Market"),
//Replace "My" so Item Labels will be consistent
#"Replaced Value" = Table.ReplaceValue(#"Added Custom","My","",Replacer.ReplaceText,{"Item"}),
//Pivot with no aggregation (unless you want to)
#"Pivoted Column" = Table.Pivot(#"Replaced Value", List.Distinct(#"Replaced Value"[Custom]), "Custom", "Value"),
//Sort "Items" to original Column Order
itemSortOrder = List.Distinct(#"Replaced Value"[Item]),
sorted = Table.Sort(#"Pivoted Column",
{{"Country", Order.Ascending},
each List.PositionOf(itemSortOrder,[Item])
})
in
sorted
Hopefully, this is what you want for a result
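The unpivot / label / pivot idea in the query above can also be sketched in Python for anyone who wants to trace it on a small example. The column names below ("Apples", "MyApples" and so on) are assumptions, since the raw table is only shown as an image; the sketch only assumes that "my" columns carry a "My" prefix.

```python
def restructure(rows, key_cols=("Country", "Date"), prefix="My"):
    """Turn wide Product / MyProduct columns into Item, Market, MyValue rows."""
    out = {}
    for row in rows:
        keys = tuple(row[k] for k in key_cols)
        for col, value in row.items():
            if col in key_cols:
                continue
            is_mine = col.startswith(prefix)
            # Strip the prefix so both columns share one item label
            item = col[len(prefix):] if is_mine else col
            rec = out.setdefault(keys + (item,), {"Market": None, "MyValue": None})
            rec["MyValue" if is_mine else "Market"] = value
    names = tuple(key_cols) + ("Item",)
    return [dict(zip(names, k), **v) for k, v in out.items()]

# Hypothetical raw row
raw = [
    {"Country": "Italy", "Date": "2020-01-01",
     "Apples": 100, "MyApples": 10, "Pears": 50, "MyPears": 5},
]
tidy = restructure(raw)
for line in tidy:
    print(line)
```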
thank you so much for having spent your time to help me.
I think I solved my problem using the List.Zip function. The solution was not mine; I took it from THIS video. With this trick, I don't even have to split the original source data into two tables (market & MyShare).
It does exactly what I needed, with little if any effort for data cleaning...

Comparing Cells in a row to see if adjacent cells have same values

Problem: My maximum range is around 10000 rows x 365 columns, and I want to compare cell values across each row.
Conditions:
It has to return how many times a name is repeated in each row for every primary key
If a name occurs only once in a row, it need not be shown; anything occurring 2 or more times should be displayed
It has to exclude blank cells and if it encounters "Dispatched" then it need not count further.
Requirement: Any solution either excel or macro would do.
Sample Excel File
| Bag Number | 8th July | 9th July | 10th July | 11th July | 12th July | 13th July |
|---|---|---|---|---|---|---|
| 20/F/43352/1 | FILING | FILING | FILING | FINAL POLISH | FINAL POLISH | FINAL POLISH |
| 20/F/43352/2 | FILING | FILING | FILING | FINAL POLISH | FINAL POLISH | FINAL POLISH |
| 20/F/43352/3 | FINAL POLISH | QC | Dispatched | Dispatched | Dispatched | Dispatched |
| 20/F/43352/4 | Casting | Casting | Laser Cutting | Filing | Filing | FINAL POLISH |
| 20/F/43352/5 | Casting | | | | | |
| 20/F/43352/6 | Casting | Casting | FINAL POLISH | Dispatched | | |
| 20/F/43352/7 | FILING | FILING | FILING | FINAL POLISH | FINAL POLISH | FINAL POLISH |
The Output for the same should be
| Bags | Casting | Filing | Final Polish | Dispatched |
|---|---|---|---|---|
| 20/F/43347/1 | | 3 days | 3 days | Yes |
| 20/F/43347/2 | | 3 days | 3 days | Yes |
| 20/F/43347/3 | 2 days | 3 days | 3 days | Yes |
Background
Until very recently this process was manual so once this spreadsheet was made, it would be divided among 3 people and they would manually scan, highlight and proceed
Tried a COUNTIF condition row-wise, but that only reduces 365 columns to 12 columns and leaves behind lots of unnecessary values (if a bag is in a station for only 1 day, it need not be highlighted)
Tried Pivot but did not give a summary that makes sense.
VBA is not my strong suit, so I haven't tried anything there.
I am looking for something that will help make sense to this and highlight if any product is stuck anywhere.
Hi all, to answer all queries,
@braX I have tried COUNTIF with the department names, but the resulting table is unwieldy for my requirement. I am looking for ideas to solve this.
@DavidWooley-AST There are a total of 12 departments, and the data is kept for an entire year; a primary key can go through each department in 45 days or more.
Also, in case of any rework there is a revisit to the department, so that data also has to be captured. Sorry, I should have mentioned this before.
You can create the output you show using Power Query, available in Windows Excel 2010+ and Office 365.
The below should get you started.
You will have to add some lines in the Table.Group Aggregation list for other tasks.
You may also need to add code to exclude non-repeats and after "Dispatched" but you showed no examples of that in your data or results, so I did not code anything for that.
I also don't know what you mean by "highlight if any product is stuck anywhere".
To use Power Query
Select some cell in your Data Table
Data => Get&Transform => from Table/Range
When the PQ Editor opens: Home => Advanced Editor
Make note of the Table Name in Line 2
Paste the M Code below in place of what you see
Change the Table name in line 2 back to what was generated originally.
Read the comments and explore the Applied Steps to understand the algorithm
M Code
let
//Replace table name in next line with the "real" table name in your workbook
Source = Excel.CurrentWorkbook(){[Name="Table6"]}[Content],
//unpivot all except the "Bag Number" to => a three column table
#"Unpivoted Other Columns" = Table.UnpivotOtherColumns(Source, {"Bag Number"}, "Attribute", "Value"),
//remove unneeded Attribute column (the dates)
#"Removed Columns" = Table.RemoveColumns(#"Unpivoted Other Columns",{"Attribute"}),
//Group by Bag Number
// then extract the Count for each type
// Add " days" to each count
#"Grouped Rows" = Table.Group(#"Removed Columns", {"Bag Number"}, {
{"Filing", (t)=> "Filing " & Text.From(List.Count(List.Select(t[Value],each _ = "FILING"))) & " days"},
{"Final Polish", (t)=> "Final Polish " & Text.From(List.Count(List.Select(t[Value],each _ = "FINAL POLISH"))) & " days"}
}),
//Merge columns with commas (and hyphen for the first to the rest) to get final format
#"Merged Columns" = Table.CombineColumns(#"Grouped Rows",{"Filing", "Final Polish"},
Combiner.CombineTextByDelimiter(", ", QuoteStyle.None),"Merged"),
#"Merged Columns1" = Table.CombineColumns(#"Merged Columns",{"Bag Number", "Merged"},
Combiner.CombineTextByDelimiter(" - ", QuoteStyle.None),"A")
in
#"Merged Columns1"
Edit based on your new example of data and desired output
Given your new example, you can get the output from PQ as shown below.
Note that you can add the other departments using the same syntax as shown for those done (except for Dispatched which is treated differently).
M Code
let
//Replace table name in next line with the "real" table name in your workbook
Source = Excel.CurrentWorkbook(){[Name="Table6"]}[Content],
//unpivot all except the "Bag Number" to => a three column table
#"Unpivoted Other Columns" = Table.UnpivotOtherColumns(Source, {"Bag Number"}, "Attribute", "Value"),
//remove unneeded Attribute column (the dates)
#"Removed Columns" = Table.RemoveColumns(#"Unpivoted Other Columns",{"Attribute"}),
//Change to proper case for consistency and text matching
properCase = Table.TransformColumns(#"Removed Columns",{{"Value", Text.Proper, type text}}),
//Group by Bag Number
// then extract the Count for each type
// Show null if count < 2
// Add " days" to each count
// Show only `Dispatched` if it occurs one or more times
#"Grouped Rows" = Table.Group(properCase, {"Bag Number"}, {
{"Casting", (t)=>
let
x =List.Count(List.Select(t[Value], each _ = "Casting"))
in
if x < 2 then null else Number.ToText(x) & " days", type text},
{"Laser Cutting", (t)=>
let
x =List.Count(List.Select(t[Value], each _ = "Laser Cutting"))
in
if x < 2 then null else Number.ToText(x) & " days", type text},
{"Filing", (t)=>
let
x =List.Count(List.Select(t[Value], each _ = "Filing"))
in
if x < 2 then null else Number.ToText(x) & " days", type text},
{"Final Polish", (t)=>
let
x =List.Count(List.Select(t[Value], each _ = "Final Polish"))
in
if x < 2 then null else Number.ToText(x) & " days", type text},
{"QC", (t)=>
let
x =List.Count(List.Select(t[Value], each _ = "Qc"))
in
if x < 2 then null else Number.ToText(x) & " days", type text},
{"Dispatched", (t)=>
let
x =List.Count(List.Select(t[Value], each _ = "Dispatched"))
in
if x = 0 then null else "Dispatched", type text}
})
in
#"Grouped Rows"
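The query above counts every occurrence in a row. The question also asked to skip blank cells and to stop counting once "Dispatched" is encountered, which the query does not do. One way to reason about those extra rules is the Python sketch below. It is only an illustration: the function name is hypothetical, and the sample rows are typed in from the question's table on the assumption that blank cells are trailing.

```python
from collections import Counter

def summarise_row(stages, min_days=2):
    """Count days per department in one row, stopping at the first 'Dispatched'."""
    counts = Counter()
    dispatched = False
    for stage in stages:
        if stage is None or not str(stage).strip():
            continue  # ignore blank cells
        name = str(stage).strip().title()  # like Text.Proper: "QC" becomes "Qc"
        if name == "Dispatched":
            dispatched = True
            break  # nothing after dispatch is counted
        counts[name] += 1
    # Only show departments where the bag spent min_days or more
    summary = {name: f"{n} days" for name, n in counts.items() if n >= min_days}
    if dispatched:
        summary["Dispatched"] = "Yes"
    return summary

row4 = ["Casting", "Casting", "Laser Cutting", "Filing", "Filing", "FINAL POLISH"]
row6 = ["Casting", "Casting", "FINAL POLISH", "Dispatched", "", ""]
print(summarise_row(row4))
print(summarise_row(row6))
```

Counts are totals per row, so a rework revisit to a department is added to its earlier days rather than reported separately.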

How do I calculate Percentiles in PowerQuery based on grouping variables?

I have a few columns of data, I need to convert the excel version of "PERCENTILE" into Powerquery format.
I have some code which adds a function, but it doesn't apply accurately because it doesn't allow for grouping the data by CATEGORY and YEAR. So anything that is in Full Discretionary 1.5-2.5 AND 2014 needs to be added to one percentile array; equally, anything that falls in Full Discretionary 2.5-3.5 AND 2014 needs to go into a different percentile array
let
Source = (list as any, k as number) => let
Source = list,
#"Converted to Table" = Table.FromList(Source, Splitter.SplitByNothing(), null, null, ExtraValues.Error),
#"Sorted Rows" = Table.Sort(#"Converted to Table",{{"Column1", Order.Ascending}}),
#"Added Index" = Table.AddIndexColumn(#"Sorted Rows", "Index", 1, 1),
#"Added Custom" = Table.AddColumn(#"Added Index", "TheIndex", each Table.RowCount(#"Converted to Table")*k/100),
#"Filtered Rows" = Table.SelectRows(#"Added Custom", each [Index] >= [TheIndex] and [Index] <= [TheIndex]+1),
Custom1 = List.Average(#"Filtered Rows"[Column1])
in
Custom1
in
Source
So Expected results would be that anything that matches off on the 2 columns (Year,Category) should be applied within the same array. Currently invoking the above function just gives me errors.
I have also tried using grouping and outputting the "Min, Median, and Max" outputs but I also require 10% and 90% Percentiles.
Thank you in advance
Based on some findings on other websites and a lot of googling (most folk just want to use DAX, but if you're only using Power Query you can't!), someone posted an answer which is very helpful:
https://social.technet.microsoft.com/Forums/en-US/a57bfbea-52d1-4231-b2de-fa993d9bb4c9/can-the-quotpercentilequot-be-calculated-in-power-query?forum=powerquery
Basically:
//PercentileInclusive Function
(inputSeries as list, percentile as number) =>
let
SeriesCount = List.Count(inputSeries),
PercentileRank = percentile*(SeriesCount-1)+1, //percentile value between 0 and 1
PercentileRankRoundedUp = Number.RoundUp(PercentileRank),
PercentileRankRoundedDown = Number.RoundDown(PercentileRank),
Percentile1 = List.Max(List.MinN(inputSeries,PercentileRankRoundedDown)),
Percentile2 = List.Max(List.MinN(inputSeries,PercentileRankRoundedUp)),
Percentile = Percentile1+(Percentile2-Percentile1)*(PercentileRank-PercentileRankRoundedDown)
in
Percentile
The above will replicate the PERCENTILE function found within Excel - you pass this as a query using "New Query" and advanced editor. Then call it in after grouping your data -
Table.Group(RenamedColumns, {"Country"}, {{"Sales Total", each
List.Sum([Amount Sales]), type number}, {"95 Percentile Sales", each
List.Average([Amount Sales]), type number}})
In the above formula, RenamedColumns is the name of the previous step
in the script. Change the name to match your actual case. I've assumed
that the pre-grouping sales amount column is "Amount Sales." Names of
grouped columns are "Sales Total" and "95 Percentile Sales."
Next modify the group formula, substituting List.Average with
PercentileInclusive:
Table.Group(RenamedColumns, {"Country"}, {{"Sales Total", each
List.Sum([Amount Sales]), type number}, {"95 Percentile Sales", each
PercentileInclusive([Amount Sales],0.95), type number}})
This worked for my data set and matches similar
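To sanity-check the interpolation and the grouping by two columns, here is a small Python sketch of the same inclusive percentile applied per (Category, Year) group. The column name "Value" and the sample rows are hypothetical; in Power Query the corresponding change would be to list both grouping columns in Table.Group.

```python
import math
from collections import defaultdict

def percentile_inclusive(values, p):
    """Inclusive percentile with linear interpolation; p is between 0 and 1."""
    data = sorted(values)
    if not data:
        raise ValueError("empty series")
    rank = p * (len(data) - 1)  # zero-based fractional rank
    lo, hi = math.floor(rank), math.ceil(rank)
    return data[lo] + (data[hi] - data[lo]) * (rank - lo)

def grouped_percentiles(rows, percentiles=(0.1, 0.9)):
    """Collect values per (Category, Year) and compute each percentile."""
    groups = defaultdict(list)
    for row in rows:
        groups[(row["Category"], row["Year"])].append(row["Value"])
    return {
        key: {p: percentile_inclusive(vals, p) for p in percentiles}
        for key, vals in groups.items()
    }

# Hypothetical sample rows
rows = [
    {"Category": "Full Discretionary 1.5-2.5", "Year": 2014, "Value": v}
    for v in (1, 2, 3, 4, 5)
] + [
    {"Category": "Full Discretionary 2.5-3.5", "Year": 2014, "Value": v}
    for v in (10, 20)
]
stats = grouped_percentiles(rows)
print(stats)
```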
