How do I calculate Percentiles in PowerQuery based on grouping variables? - excel

I have a few columns of data, I need to convert the excel version of "PERCENTILE" into Powerquery format.
I have some code which adds in as a function but doesnt apply accurately as it doesnt allow for grouping of the data by CATEGORY and YEAR. So anything that is in Full Discretionary 1.5-2.5 AND 2014 needs to be added to the percentile array, equally anything that falls in Full discretionary 2.5-3.5 AND 2014 needs to go into a different percentile array
let
Source = (list as any, k as number) => let
Source = list,
#"Converted to Table" = Table.FromList(Source, Splitter.SplitByNothing(), null, null, ExtraValues.Error),
#"Sorted Rows" = Table.Sort(#"Converted to Table",{{"Column1", Order.Ascending}}),
#"Added Index" = Table.AddIndexColumn(#"Sorted Rows", "Index", 1, 1),
#"Added Custom" = Table.AddColumn(#"Added Index", "TheIndex", each Table.RowCount(#"Converted to Table")*k/100),
#"Filtered Rows" = Table.SelectRows(#"Added Custom", each [Index] >= [TheIndex] and [Index] <= [TheIndex]+1),
Custom1 = List.Average(#"Filtered Rows"[Column1])
in
Custom1
in
Source
So Expected results would be that anything that matches off on the 2 columns (Year,Category) should be applied within the same array. Currently invoking the above function just gives me errors.
I have also tried using grouping and outputting the "Min, Median, and Max" outputs but I also require 10% and 90% Percentiles.
Thank you in advance

Based on some findings on other websites and alot of googling (most folk just want to use DAX but if youre only using Power Query you cant!) someone posted an answer which is very helpful:
https://social.technet.microsoft.com/Forums/en-US/a57bfbea-52d1-4231-b2de-fa993d9bb4c9/can-the-quotpercentilequot-be-calculated-in-power-query?forum=powerquery
Basically:
/PercentileInclusive Function
(inputSeries as list, percentile as number) =>
let
SeriesCount = List.Count(inputSeries),
PercentileRank = percentile*(SeriesCount-1)+1, //percentile value between 0 and 1
PercentileRankRoundedUp = Number.RoundUp(PercentileRank),
PercentileRankRoundedDown = Number.RoundDown(PercentileRank),
Percentile1 = List.Max(List.MinN(inputSeries,PercentileRankRoundedDown)),
Percentile2 = List.Max(List.MinN(inputSeries,PercentileRankRoundedUp)),
Percentile = Percentile1+(Percentile2-Percentile1)*(PercentileRank-PercentileRankRoundedDown)
in
Percentile
The above will replicate the PERCENTILE function found within Excel - you pass this as a query using "New Query" and advanced editor. Then call it in after grouping your data -
Table.Group(RenamedColumns, {"Country"}, {{"Sales Total", each
List.Sum([Amount Sales]), type number}, {"95 Percentile Sales", each
List.Average([Amount Sales]), type number}})
In the above formula, RenamedColumns is the name of the previous step
in the script. Change the name to match your actual case. I've assumed
that the pre-grouping sales amount column is "Amount Sales." Names of
grouped columns are "Sales Total" and "95 Percentile Sales."
Next modify the group formula, substituting List.Average with
PercentileInclusive:
Table.Group(RenamedColumns, {"Country"}, {{"Sales Total", each
List.Sum([Amount Sales]), type number}, {"95 Percentile Sales", each
PercentileInclusive([Amount Sales],0.95), type number}})
This worked for my data set and matches similar

Related

The Date Where There Is Enough Supply To Satisfy Dem

source link
I am trying to come up with a solution to the following problem.
Problem:
In my dataset I have certain quantity of item in demand (need), and purchase orders that re-supply that item(Supply). I need to determine for each demand , what is the first date where we will have enough supply to fill the demand.
For example, if we look at our 1st demand, which require 5 units, according to the cumulative Sum column, 18/12/23 will be the first date when we would have enough qty supplied to satisfy the first demand. The problem appears when we have more the one demand for an item.
If we stay with same item What I would like to do is to update the cumulative Sum when we meet the enough quantity ( as cumulative Sum = cumulative Sum- qty(demand) or 6(cumulative supply)-5(demand) = 1 ) so the cumulative Sum for the next demand will be 100 +1 = 101 and not 100 + 6 = 106. Thereby we can simply rely on the cumulative Sum (updated) to retrieve the first date where we will have enough supply to fill the demand.
I'm not sure if something like this is possibly in Power Query but any help is greatly appreciated.
Hopefully that all makes sense. Thx.
Revised
In powerquery try this as code for Demand
let Source = Excel.CurrentWorkbook(){[Name="DemandDataRange"]}[Content],
#"SupplyGrouped Rows" = Table.Group(Supply, {"item"}, {{"data", each
let a = Table.AddIndexColumn( _ , "Index", 0, 1),
b=Table.AddColumn(a, "CumTotal", each List.Sum(List.FirstN(a[Qty],[Index]+1)))
in b, type table }}),
#"SupplyExpanded data" = Table.ExpandTableColumn(#"SupplyGrouped Rows", "data", { "Supply date", "CumTotal"}, {"Supply date", "CumTotal"}),
#"Grouped Rows" = Table.Group(Source, {"item"}, {{"data", each
let a= Table.AddIndexColumn(_, "Index", 0, 1),
b=Table.AddColumn(a, "CumTotal", each List.Sum(List.FirstN(a[Qty],[Index]+1)))
in b, type table }}),
#"Expanded data" = Table.ExpandTableColumn(#"Grouped Rows", "data", {"Qty", "Date", "Index", "CumTotal"}, {"Qty", "Date", "Index", "CumTotal"}),
x=Table.AddColumn(#"Expanded data","MaxDate",(i)=>try Table.SelectRows( #"SupplyExpanded data", each [item]=i[item] and [CumTotal]>=i[CumTotal] )[Supply date]{0} otherwise null, type date ),
#"Removed Columns" = Table.RemoveColumns(x,{"Index", "CumTotal"}),
#"Changed Type" = Table.TransformColumnTypes(#"Removed Columns",{{"Date", type date}})
in #"Changed Type"
Given my understanding of what you want for results, the following Power Query M code should return that.
If you just want to compare the total supply vs total demand, then only check the final entries instead of the first non-negative.
Read the code comments, statement names and explore the Applied Steps to understand the algorithm.
let
//Read in the data tables
//could have them in separate querries
Source = Excel.CurrentWorkbook(){[Name="Demand"]}[Content],
Demand = Table.TransformColumnTypes(Source,{{"item", type text}, {"Qty", Int64.Type}, {"Date", type date}}),
//make demand values negative
#"Transform Demand" = Table.TransformColumns(Demand,{"Qty", each _ * -1}),
Source2 = Excel.CurrentWorkbook(){[Name="Supply"]}[Content],
Supply = Table.TransformColumnTypes(Source2,{{"item", type text},{"Qty", Int64.Type},{"Supply date", type date}}),
#"Rename Supply Date Column" = Table.RenameColumns(Supply,{"Supply date","Date"}),
//Merge the tables and sort by Item and Date
Merge = Table.Combine({#"Rename Supply Date Column", #"Transform Demand"}),
#"Sorted Rows" = Table.Sort(Merge,{{"item", Order.Ascending}, {"Date", Order.Ascending}}),
//Group by Item
//Grouped running total to find first positive value
#"Grouped Rows" = Table.Group(#"Sorted Rows", {"item"}, {
{"First Date", (t)=> let
#"Running Total" = List.RemoveFirstN(List.Generate(
()=>[rt=t[Qty]{0}, idx=0],
each [idx]<Table.RowCount(t),
each [rt=[rt]+t[Qty]{[idx]+1}, idx=[idx]+1],
each [rt]),1),
#"First non-negative" = List.PositionOfAny(#"Running Total", List.Select(#"Running Total", each _ >=0), Occurrence.First)
in t[Date]{#"First non-negative"+1}, type date}})
in
#"Grouped Rows"
Supply
Demand
Results
I did this in Excel formula rather than using powerquery - there will be a powerquery equivalent but I'm not very fluent in DAX yet.
You need a helper column - could do without it but everything's much more readable if you have it.
In sheet Supply (2), cell E2, enter the formula:
=SUMIFS(Supply!B:B; Supply!C:C;"<=" & C2;Supply!A:A;A2)-SUMIFS(Dem!B:B;Dem!C:C;"<=" & C2;Dem!A:A;A2)
and copy downwards. This can be described as Total supply up to that date minus total demand up to that date. In some cases this will be negative (where there's more demand than supply).
Now you need to find the date of the first non-negative value for that.
First create a unique list of the items - I put it on the same sheet in the range G2:G6. Then in H2, the formula:
=MINIFS(C:C;A:A;G2;E:E;">=" & 0)
and copy downwards.

Get values of top N based on sum and condition [duplicate]

I would like to extract the top 5 players based on the sales by each employee (without Pivot Table / Auto filter).
Refer my input and output screenshot
Snapshot
Any suggestions, how to obtain first top 5 ranks (even if repeated; as shown in the screenshots)
I have verified Extract Top 5 Values for Each Group in a List without VBA and some other links also.
Thanks in advance for your time and consideration! Please let me know if my request is unclear and/or if you have any specific questions.
This is what I use to track the top 5 absentees...
Edit to suit your needs.
Formula in cell A1:
=INDEX(A$13:A52,AGGREGATE(15,6,ROW($1:$40)/(B$13:B$52=B1),COUNTIF(B$1:B1,B1)))
Formula in cell B1:
LARGE(B$13:B$52,ROW())
An alternative approach using Power Query which is available in Excel 2010 Professional Plus and all later versions of Excel.
Steps are:
Add your input data table to the Power Query Editor;
Sort the table by Sales then by Name;
Add an Index Column starting from 1;
Filter the Index column to show values less than or equal to 5;
Remove the Index column, then you should have something like the following:
Close & Load the output table to a new worksheet (by default).
Here are the power query M Codes for your reference. All functions used are within GUI so it should be easy and straight forward.
let
Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Employee", type text}, {"Month", type text}, {"Sales", type number}}),
#"Sorted Rows" = Table.Sort(#"Changed Type",{{"Sales", Order.Descending}, {"Employee", Order.Ascending}}),
#"Added Index" = Table.AddIndexColumn(#"Sorted Rows", "Index", 1, 1),
#"Filtered Rows" = Table.SelectRows(#"Added Index", each [Index] <= 5),
#"Removed Columns" = Table.RemoveColumns(#"Filtered Rows",{"Index"})
in
#"Removed Columns"
Let me know if you have any questions. Cheers :)
Try this one. As you have in your sample:
On Cell E16:
=VLOOKUP(LARGE($C$3:$C$12,ROW()-15),CHOOSE({2/1},$A$3:$A$12,$C$3:$C$12),2,FALSE)
On Cell F16:
=VLOOKUP(LARGE($C$3:$C$12,ROW()-15),CHOOSE({2/1},$B$3:$B$12,$C$3:$C$12),2,FALSE)
On Cell G16:
=VLOOKUP(LARGE($C$3:$C$12,ROW()-15),$C$3:$C$12,1,FALSE)
You can drag it down to get the list sorted.
Hope it helps!

Calculate average of a range with multiple column

I have a scenarios, where I have to calculate average price of shares from a set of date. Consider I have following data.
Now I want to represent the data in following format:
Above table will store the average price whenever a new scrip is added in the first table.
I have tried AVERAGEIFS(), but it calculate averages only for a single column range. But I have to calculate average price using price * quantity across the range for the given scrip.
Please suggest.
Not sure I understand the question.
If you're trying to get the total amount base on the average price without a helper column you could use this
=AVERAGEIF($B$3:$E$8,B12,$E$3:$E$8)*SUMIF($B$3:$E$8,B12,$C$3:$C$8)
You can use Power Query (available in Excel 2010+) for this.
In Excel 2016+ (may be different in earlier versions):
select some cell within the data table
Data / Get & Transform / From Table/Range
In the UI, open the Advanced Editor
Paste the M-Code below into the window that opens
Change the Table Name in Line 2 to reflect the actual table name in your worksheet.
NOTE: In the UI, in the Applied Steps window, float your cursor over the information icons to read the comments for explanations. Also you can double click on the gear icons for more information as to how those steps were set up
M Code
let
//Change Table name to correct name
Source = Excel.CurrentWorkbook(){[Name="Table6"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Stocks", type text}, {"Quantity", Int64.Type}, {"Date", type date}, {"Price", type number}}),
//Group by Stock
#"Grouped Rows" = Table.Group(#"Changed Type", {"Stocks"}, {{"Grouped", each _, type table [Stocks=nullable text, Quantity=nullable number, Date=nullable date, Price=nullable number]}}),
//Sum quantity for each stock
#"Added Custom1" = Table.AddColumn(#"Grouped Rows", "Quantity", each List.Sum(Table.Column([Grouped],"Quantity"))),
//Compute weighted average price for each group of stocks
#"Added Custom" = Table.AddColumn(#"Added Custom1", "Price", each List.Accumulate(
List.Positions(Table.Column([Grouped],"Quantity")),
0,
(state, current) =>state + Table.Column([Grouped],"Price"){current} *
Table.Column([Grouped],"Quantity"){current})
/ List.Sum(Table.Column([Grouped],"Quantity"))),
//Compute Total Amount for each stock
#"Added Custom2" = Table.AddColumn(#"Added Custom", "Amount", each [Quantity]*[Price]),
//Remove extraneous Columns
#"Removed Columns" = Table.RemoveColumns(#"Added Custom2",{"Grouped"})
in
#"Removed Columns"
Are you allowed to add a column to your data for calculating the total_price? For example, column E = Quantity * Price.
Then your calculations table would be quite simple. Formulas for row 3:
Quantity: =SUMIFS(B:B,A:A,G3)
Average_Price: =SUMIFS(E:E,A:A,G3) / SUMIFS(B:B,A:A,G3)
Amount: =H3*I3

Charting average sales per weekday on data composed of hours

I'm using PowerBI desktop and I'm creating a chart to display average sales per weekday:
My data is in the format below:
(sampled in Excel to remove sensitive information, added colors to facilitate visualization)
My problem is: since each day is broken in 24 rows (hours), my average is wrong by a factor of 24.
For example, if I select January-2019 in the slicer, which has five Tuesdays (weekday code: 2), I want to see on the bar number 2:
(sum of amount where weekday = 2) / 5
Instead, I'm calculating:
(sum of amount where weekday = 2) / (24 * 5)
I can think of some ways to get this right, but they involve custom columns or auxiliary tables. I'm sure there is a simpler answer using DAX and measures, but I'm still learning it.
How can I correctly calculate this?
Let's assume your table name is "Data". Create 3 DAX measures (not calculated columns):
Measure 1:
Total Amount = SUM(Data[Amount])
Measure 2:
Number of Days = DISTINCTCOUNT(Data[Date])
Measure 3:
Average Amount per Day = DIVIDE( [Total Amount], [Number of Days])
Drop the last measure into a chart, it should give you the expected result.
As I understand from your excel you are working with 3 different columns. You can better combine this to a datetime and let power-bi handle it.
Below m-language will do this for you:
let
Source = Excel.Workbook(File.Contents("C:\....\Test.xlsx"), null, true),
Sheet1_Sheet = Source{[Item="Sheet1",Kind="Sheet"]}[Data],
#"Promoted Headers" = Table.PromoteHeaders(Sheet1_Sheet, [PromoteAllScalars=true]),
#"Changed Type" = Table.TransformColumnTypes(#"Promoted Headers",{{"date", type datetime}, {"hour", type time}, {"amount", type number}}),
#"Added Custom" = Table.AddColumn(#"Changed Type", "Date", each [date]+ Duration.FromText(Time.ToText([hour]))),
#"Removed Other Columns" = Table.SelectColumns(#"Added Custom",{"amount", "Date"}),
#"Filtered Rows" = Table.SelectRows(#"Removed Other Columns", each ([amount] <> 0))
in
#"Filtered Rows"
The trick is in the added column: #"Added Custom" = Table.AddColumn(#"Changed Type", "Date", each [date]+ Duration.FromText(Time.ToText([hour])))
Here I add the time to the date.
I also removed the empty (zero amount) rows, you do not need them.
I added the Date & weekday to the Axis so a user can now drill down from year, month, day to weekday.
Be aware you need to do the SUM of the amount, not the average.

Power Query Adding a row that sums up previous columns

I'm trying to create a query that sums up a column of values and puts the sum as a new row in the same table. I know I can do this using the group function but it doesn't do it exactly as I need it to do. I'm trying to create an accounting Journal Entry and I need to calculate the offsetting for a long list of debits. I know this is accountant talk. Here's a sample of the table I am using.
Date GL Num GL Name Location Amount
1/31 8000 Payroll Office 7000.00
1/31 8000 Payroll Remote 1750.00
1/31 8000 Payroll City 1800.00
1/31 8010 Taxes Office 600.00
1/31 8010 Taxes Remote 225.00
1/31 8010 Taxes City 240.00
1/31 3000 Accrual All (This needs to be the negative sum of all other rows)
I have been using the Group By functions and grouping by Date with the result being the sum of Amount but that eliminates the previous rows and the four columns except Date. I need to keep all rows and columns, putting the sum in the same Amount column if possible. If the sum has to be in a new column, I can work with that as long as the other columns and rows remain. I also need to enter the GL Num, GL Name, and Location values for this sum row. These three values will not change. They will always be 3000, Accrual, All. The date will change based upon the date used in the actual data. I would prefer to do this all in Power Query (Get & Transform) if possible. I can do it via VBA but I'm trying to make this effortless for others to use.
What you can do it calculate the accrual rows in a separate query and then append them.
Duplicate your query.
Group by Date and sum over Amount. This should return the following:
Date Amount
1/31 11615
Multiply your Amount column by -1. (Transform > Standard > Multiply)
Add custom columns for GL Num, GL Name and Location with the fixed values you choose.
Date Amount GL Num GL Name Location
1/31 11615 3000 Accrual All
Append this table to your original. (Home > Append Queries.)
You can also roll this all up into a single query like this:
let
Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
OriginalTable = Table.TransformColumnTypes(Source,{{"Date", type date}, {"GL Num", Int64.Type}, {"GL Name", type text}, {"Location", type text}, {"Amount", Int64.Type}}),
#"Grouped Rows" = Table.Group(OriginalTable, {"Date"}, {{"Amount", each List.Sum([Amount]), type number}}),
#"Multiplied Column" = Table.TransformColumns(#"Grouped Rows", {{"Amount", each _ * -1, type number}}),
#"Added Custom" = Table.AddColumn(#"Multiplied Column", "GL Num", each 3000),
#"Added Custom1" = Table.AddColumn(#"Added Custom", "GL Name", each "Accrual"),
#"Added Custom2" = Table.AddColumn(#"Added Custom1", "Location", each "All"),
#"Appended Query" = Table.Combine({OriginalTable, #"Added Custom2"})
in
#"Appended Query"
Note that we are appending the last step with an earlier step in the query instead of referencing a different query.

Resources