I have a file with 50,000 lines of data in three columns: Unique ID, Start Date, and End Date.
Using Power Pivot, I need to determine whether any records with the same Unique ID have overlapping dates. Each Unique ID appears about 5 times.
In Excel, I would use a SUMPRODUCT formula:
=SUMPRODUCT(($B3<=$C$3:$C$13)*($C3>=$B$3:$B$13)*($A$3:$A$13=A3))>1
While this formula works really well in Excel, with 50k+ records it brings my computer to a halt.
How would I perform the same calculation in Power Pivot or Power Query?
Example of the data and calculation.
Thank you so much!
The following Power Query M code should solve your problem. I don't know how long it will take for 50k rows:
let
    Quelle = Excel.CurrentWorkbook(){[Name="tab_Dates"]}[Content],
    Change_Type = Table.TransformColumnTypes(Quelle,{{"Unique ID", type text}, {"Start Date", type date}, {"End Date", type date}}),
    // number the original records so they can be put back together after expanding
    add_Index = Table.AddIndexColumn(Change_Type, "RowIndex", 0, 1),
    // one list entry per calendar day covered by each record
    add_List_Dates = Table.AddColumn(add_Index, "List_Dates", each List.Dates([Start Date], Duration.Days([End Date]-[Start Date])+1 , #duration(1,0,0,0))),
    // buffer the expanded table so the row-by-row lookup below does not re-evaluate it
    expand_List_Dates = Table.Buffer(Table.ExpandListColumn(add_List_Dates, "List_Dates")),
    // how many records with the same ID cover the same day
    add_CountIF_ID_Date = Table.AddColumn(expand_List_Dates, "CountIF_ID_Date", (CountRows) =>
        Table.RowCount(
            Table.SelectRows(
                expand_List_Dates,
                each
                    ([Unique ID] = CountRows[Unique ID] and [List_Dates] = CountRows[List_Dates]))), Int64.Type),
    // back to one row per original record; TRUE if any of its days is shared (like SUMPRODUCT(...)>1)
    Overlap_Per_Record = Table.Group(add_CountIF_ID_Date, {"RowIndex", "Unique ID", "Start Date", "End Date"},
        {{"Overlap", each List.Max([CountIF_ID_Date]) > 1, type logical}}),
    Remove_Index = Table.RemoveColumns(Overlap_Per_Record, {"RowIndex"})
in
    Remove_Index
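For comparison, here is a minimal Python sketch (not Power Query) of the same test the SUMPRODUCT formula performs. It assumes inclusive whole-day dates. The formula compares every row with every other row. This sketch instead sorts each ID's records by start date and applies two checks:

- A record overlaps an earlier-starting record if it starts on or before the latest end date seen so far.
- It overlaps a later-starting record if the very next record starts on or before its own end date.

The function name `flag_overlaps` is hypothetical.

```python
from collections import defaultdict
from datetime import date


def flag_overlaps(records: list[tuple[str, date, date]]) -> list[bool]:
    """Return one flag per (unique_id, start, end) record: True when the record
    shares at least one day with another record of the same ID
    (the same test as SUMPRODUCT(...)>1, with inclusive start and end dates)."""
    by_id = defaultdict(list)
    for pos, (uid, start, end) in enumerate(records):
        by_id[uid].append((start, end, pos))

    flags = [False] * len(records)
    for group in by_id.values():
        group.sort()  # order each ID's records by start date
        max_end = None
        for i, (start, end, pos) in enumerate(group):
            # overlaps an earlier-starting record if it starts on/before the latest end so far
            if max_end is not None and start <= max_end:
                flags[pos] = True
            # overlaps a later-starting record if the next one (earliest later start) begins by our end
            if i + 1 < len(group) and group[i + 1][0] <= end:
                flags[pos] = True
            max_end = end if max_end is None else max(max_end, end)
    return flags
```

The work is a sort per ID plus one pass, so it grows roughly as n log n rather than n². That kind of restructuring is what keeps 50k rows manageable.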
I am trying to come up with a solution to the following problem.
Problem:
In my dataset I have a certain quantity of each item in demand (need), and purchase orders that re-supply that item (supply). For each demand, I need to determine the first date on which we will have enough supply to fill it.
For example, our first demand requires 5 units. According to the cumulative Sum column, 18/12/23 is the first date on which enough quantity will have been supplied to satisfy it. The problem appears when there is more than one demand for an item.
Staying with the same item, what I would like to do is update the cumulative Sum once a demand is met (cumulative Sum = cumulative Sum - qty(demand), i.e. 6 (cumulative supply) - 5 (demand) = 1). The cumulative Sum for the next demand would then be 100 + 1 = 101 and not 100 + 6 = 106. That way we can simply rely on the updated cumulative Sum to retrieve the first date on which there is enough supply to fill the demand.
I'm not sure whether something like this is possible in Power Query, but any help is greatly appreciated.
Hopefully that all makes sense. Thx.
Revised
In Power Query, try this as the code for the Demand query (it references a separate query named Supply):
let Source = Excel.CurrentWorkbook(){[Name="DemandDataRange"]}[Content],
#"SupplyGrouped Rows" = Table.Group(Supply, {"item"}, {{"data", each
let a = Table.AddIndexColumn( _ , "Index", 0, 1),
b=Table.AddColumn(a, "CumTotal", each List.Sum(List.FirstN(a[Qty],[Index]+1)))
in b, type table }}),
#"SupplyExpanded data" = Table.ExpandTableColumn(#"SupplyGrouped Rows", "data", { "Supply date", "CumTotal"}, {"Supply date", "CumTotal"}),
#"Grouped Rows" = Table.Group(Source, {"item"}, {{"data", each
let a= Table.AddIndexColumn(_, "Index", 0, 1),
b=Table.AddColumn(a, "CumTotal", each List.Sum(List.FirstN(a[Qty],[Index]+1)))
in b, type table }}),
#"Expanded data" = Table.ExpandTableColumn(#"Grouped Rows", "data", {"Qty", "Date", "Index", "CumTotal"}, {"Qty", "Date", "Index", "CumTotal"}),
x=Table.AddColumn(#"Expanded data","MaxDate",(i)=>try Table.SelectRows( #"SupplyExpanded data", each [item]=i[item] and [CumTotal]>=i[CumTotal] )[Supply date]{0} otherwise null, type date ),
#"Removed Columns" = Table.RemoveColumns(x,{"Index", "CumTotal"}),
#"Changed Type" = Table.TransformColumnTypes(#"Removed Columns",{{"Date", type date}})
in #"Changed Type"
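The M code above compares each demand's cumulative total with the item's cumulative supply. The same idea can be sketched in Python. It is equivalent to the "updated cumulative sum" described in the question: 6 - 5 = 1 left over, then 1 + 100 = 101 ≥ 100, which is the same test as 106 ≥ 5 + 100 = 105.

This sketch assumes both lists are already sorted by date and all quantities are positive. The function name and data layout are illustrative.

```python
from bisect import bisect_left
from collections import defaultdict
from datetime import date
from itertools import accumulate
from typing import Optional


def first_fill_dates(supply: list[tuple[str, int, date]],
                     demand: list[tuple[str, int, date]]) -> list[Optional[date]]:
    """supply and demand are (item, qty, date) rows, each sorted by date.
    For every demand row, return the first supply date on which the item's
    cumulative supply covers its cumulative demand up to and including that row."""
    supply_dates = defaultdict(list)
    supply_qty = defaultdict(list)
    for item, qty, d in supply:
        supply_dates[item].append(d)
        supply_qty[item].append(qty)
    # running supply total per item, e.g. [2, 6, 106]
    supply_cum = {item: list(accumulate(q)) for item, q in supply_qty.items()}

    demand_cum = defaultdict(int)
    result = []
    for item, qty, _ in demand:
        demand_cum[item] += qty
        cum = supply_cum.get(item, [])
        # first position where cumulative supply >= cumulative demand
        # (valid because the running total only grows when quantities are positive)
        pos = bisect_left(cum, demand_cum[item])
        result.append(supply_dates[item][pos] if pos < len(cum) else None)
    return result
```

A demand that cumulative supply never covers gets None, much like the `try ... otherwise null` in the M code.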
Given my understanding of what you want for results, the following Power Query M code should return that.
If you just want to compare the total supply vs total demand, then only check the final entries instead of the first non-negative.
Read the code comments, statement names and explore the Applied Steps to understand the algorithm.
let
//Read in the data tables
//could have them in separate queries
Source = Excel.CurrentWorkbook(){[Name="Demand"]}[Content],
Demand = Table.TransformColumnTypes(Source,{{"item", type text}, {"Qty", Int64.Type}, {"Date", type date}}),
//make demand values negative
#"Transform Demand" = Table.TransformColumns(Demand,{"Qty", each _ * -1}),
Source2 = Excel.CurrentWorkbook(){[Name="Supply"]}[Content],
Supply = Table.TransformColumnTypes(Source2,{{"item", type text},{"Qty", Int64.Type},{"Supply date", type date}}),
#"Rename Supply Date Column" = Table.RenameColumns(Supply,{"Supply date","Date"}),
//Merge the tables and sort by Item and Date
Merge = Table.Combine({#"Rename Supply Date Column", #"Transform Demand"}),
#"Sorted Rows" = Table.Sort(Merge,{{"item", Order.Ascending}, {"Date", Order.Ascending}}),
//Group by Item
//Grouped running total to find first positive value
#"Grouped Rows" = Table.Group(#"Sorted Rows", {"item"}, {
{"First Date", (t)=> let
#"Running Total" = List.RemoveFirstN(List.Generate(
()=>[rt=t[Qty]{0}, idx=0],
each [idx]<Table.RowCount(t),
each [rt=[rt]+t[Qty]{[idx]+1}, idx=[idx]+1],
each [rt]),1),
#"First non-negative" = List.PositionOfAny(#"Running Total", List.Select(#"Running Total", each _ >=0), Occurrence.First)
in t[Date]{#"First non-negative"+1}, type date}})
in
#"Grouped Rows"
I did this with Excel formulas rather than Power Query. There will be a Power Query equivalent, but I'm not very fluent in Power Query yet.
You need a helper column - could do without it but everything's much more readable if you have it.
In sheet Supply (2), cell E2, enter the formula:
=SUMIFS(Supply!B:B; Supply!C:C;"<=" & C2;Supply!A:A;A2)-SUMIFS(Dem!B:B;Dem!C:C;"<=" & C2;Dem!A:A;A2)
and copy downwards. This can be described as Total supply up to that date minus total demand up to that date. In some cases this will be negative (where there's more demand than supply).
Now you need to find the date of the first non-negative value for that.
First create a unique list of the items - I put it on the same sheet in the range G2:G6. Then in H2, the formula:
=MINIFS(C:C;A:A;G2;E:E;">=" & 0)
and copy downwards.
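Here is a Python sketch of what the two formulas compute:

- For each supply date of an item, it takes total supply to date minus total demand to date (the SUMIFS difference in the helper column).
- It then keeps the earliest date where that value is non-negative (the MINIFS).

The names and data layout are hypothetical. MINIFS returns 0 when nothing matches; this sketch returns None instead.

```python
from datetime import date
from typing import Optional


def first_non_negative_date(supply: list[tuple[str, int, date]],
                            demand: list[tuple[str, int, date]]) -> dict[str, Optional[date]]:
    """For each item, the earliest supply date where supply to date minus
    demand to date is >= 0, i.e. the helper column (two SUMIFS) followed by MINIFS."""
    items = sorted({i for i, _, _ in supply} | {i for i, _, _ in demand})
    result = {}
    for item in items:
        candidates = []
        for _, _, d in (row for row in supply if row[0] == item):
            supplied = sum(q for i, q, sd in supply if i == item and sd <= d)
            demanded = sum(q for i, q, dd in demand if i == item and dd <= d)
            if supplied - demanded >= 0:  # the helper column value
                candidates.append(d)
        # MINIFS would return 0 when no date qualifies; None is used here instead
        result[item] = min(candidates) if candidates else None
    return result
```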
Problem: my maximum range is around 10,000 rows x 365 columns, and I want to compare cell values across each row.
Conditions:
It has to return how many times a name is repeated in each row for every primary key
If a name appears only once in a row, it need not be shown; anything that appears 2 or more times should be displayed.
It has to exclude blank cells, and once it encounters "Dispatched" it should stop counting.
Requirement: Any solution either excel or macro would do.
Bag Number | 8th July | 9th July | 10th July | 11th July | 12th July | 13th July
20/F/43352/1 | FILING | FILING | FILING | FINAL POLISH | FINAL POLISH | FINAL POLISH
20/F/43352/2 | FILING | FILING | FILING | FINAL POLISH | FINAL POLISH | FINAL POLISH
20/F/43352/3 | FINAL POLISH | QC | Dispatched | Dispatched | Dispatched | Dispatched
20/F/43352/4 | Casting | Casting | Laser Cutting | Filing | Filing | FINAL POLISH
20/F/43352/5 | Casting
20/F/43352/6 | Casting | Casting | FINAL POLISH | Dispatched
20/F/43352/7 | FILING | FILING | FILING | FINAL POLISH | FINAL POLISH | FINAL POLISH
The output for the same should be:
Bags | Casting | Filing | Final Polish | Dispatched
20/F/43347/1 | 3 days | 3 days | Yes
20/F/43347/2 | 3 days | 3 days | Yes
20/F/43347/3 | 2 days | 3 days | 3 days | Yes
Background
Until very recently this process was manual: once this spreadsheet was made, it would be divided among 3 people, who would manually scan, highlight and proceed.
I tried a row-wise COUNTIF, but that only reduces 365 columns to 12 columns and still leaves behind lots of unnecessary values (if a bag is at a station for only 1 day, it need not be highlighted).
I tried a PivotTable, but it did not give a summary that makes sense.
VBA is not my strong suit, so I haven't tried anything there.
I am looking for something that will help make sense of this and highlight whether any product is stuck anywhere.
Hi all, to answer all queries,
#braX I have tried COUNTIF with the department names, but the resulting table is unwieldy for my requirement. I am looking for ideas to solve this.
#DavidWooley-AST there are a total of 12 departments, and the data is kept for an entire year. A primary key can take 45 days or more to go through every department.
Also, in case of any rework there is a revisit to the department, so that data also has to be captured. Sorry, I should have mentioned this before.
You can create the output you show using Power Query, available in Windows Excel 2010+ and Office 365.
The below should get you started.
You will have to add some lines in the Table.Group Aggregation list for other tasks.
You may also need to add code to exclude non-repeats and entries after "Dispatched", but you showed no examples of that in your data or results, so I did not code anything for that.
I also don't know what you mean by "highlight if any product is stuck anywhere".
To use Power Query
Select some cell in your Data Table
Data => Get&Transform => from Table/Range
When the PQ Editor opens: Home => Advanced Editor
Make note of the Table Name in Line 2
Paste the M Code below in place of what you see
Change the Table name in line 2 back to what was generated originally.
Read the comments and explore the Applied Steps to understand the algorithm
M Code
let
//Replace table name in next line with the "real" table name in your workbook
Source = Excel.CurrentWorkbook(){[Name="Table6"]}[Content],
//unpivot all except the "Bag Number" to => a three column table
#"Unpivoted Other Columns" = Table.UnpivotOtherColumns(Source, {"Bag Number"}, "Attribute", "Value"),
//remove unneeded Attribute column (the dates)
#"Removed Columns" = Table.RemoveColumns(#"Unpivoted Other Columns",{"Attribute"}),
//Group by Bag Number
// then extract the Count for each type
// Add " days" to each count
#"Grouped Rows" = Table.Group(#"Removed Columns", {"Bag Number"}, {
{"Filing", (t)=> "Filing " & Text.From(List.Count(List.Select(t[Value],each _ = "FILING"))) & " days"},
{"Final Polish", (t)=> "Final Polish " & Text.From(List.Count(List.Select(t[Value],each _ = "FINAL POLISH"))) & " days"}
}),
//Merge columns with commas (and hyphen for the first to the rest) to get final format
#"Merged Columns" = Table.CombineColumns(#"Grouped Rows",{"Filing", "Final Polish"},
Combiner.CombineTextByDelimiter(", ", QuoteStyle.None),"Merged"),
#"Merged Columns1" = Table.CombineColumns(#"Merged Columns",{"Bag Number", "Merged"},
Combiner.CombineTextByDelimiter(" - ", QuoteStyle.None),"A")
in
#"Merged Columns1"
Edit based on your new example of data and desired output
Given your new example, you can get the output from PQ as shown below.
Note that you can add the other departments using the same syntax as shown for those done (except for Dispatched which is treated differently).
M Code
let
//Replace table name in next line with the "real" table name in your workbook
Source = Excel.CurrentWorkbook(){[Name="Table6"]}[Content],
//unpivot all except the "Bag Number" to => a three column table
#"Unpivoted Other Columns" = Table.UnpivotOtherColumns(Source, {"Bag Number"}, "Attribute", "Value"),
//remove unneeded Attribute column (the dates)
#"Removed Columns" = Table.RemoveColumns(#"Unpivoted Other Columns",{"Attribute"}),
//Change to proper case for consistency and text matching
properCase = Table.TransformColumns(#"Removed Columns",{{"Value", Text.Proper, type text}}),
//Group by Bag Number
// then extract the Count for each type
// Show null if count < 2
// Add " days" to each count
// Show only `Dispatched` if it occurs one or more times
#"Grouped Rows" = Table.Group(properCase, {"Bag Number"}, {
{"Casting", (t)=>
let
x =List.Count(List.Select(t[Value], each _ = "Casting"))
in
if x < 2 then null else Number.ToText(x) & " days", type text},
{"Laser Cutting", (t)=>
let
x =List.Count(List.Select(t[Value], each _ = "Laser Cutting"))
in
if x < 2 then null else Number.ToText(x) & " days", type text},
{"Filing", (t)=>
let
x =List.Count(List.Select(t[Value], each _ = "Filing"))
in
if x < 2 then null else Number.ToText(x) & " days", type text},
{"Final Polish", (t)=>
let
x =List.Count(List.Select(t[Value], each _ = "Final Polish"))
in
if x < 2 then null else Number.ToText(x) & " days", type text},
{"QC", (t)=>
let
x =List.Count(List.Select(t[Value], each _ = "Qc"))
in
if x < 2 then null else Number.ToText(x) & " days", type text},
{"Dispatched", (t)=>
let
x =List.Count(List.Select(t[Value], each _ = "Dispatched"))
in
if x = 0 then null else "Dispatched", type text}
})
in
#"Grouped Rows"
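For reference, here is a minimal Python sketch of the per-row rules from the question. It includes the stop-at-"Dispatched" rule, which the M code above does not apply: that code still counts cells after "Dispatched".

The sketch assumes:

- each row's cells come in date order, with blanks as None or empty strings;
- case differences ("FILING" vs "Filing") should be ignored, as the properCase step does.

The function name and the `min_days` threshold parameter are illustrative. A department revisited for rework simply adds to its total.

```python
from collections import Counter
from typing import Optional


def summarize_bag(cells: list[Optional[str]], min_days: int = 2) -> dict[str, str]:
    """cells: one bag's row in date order; blanks are None or "".
    Counts days per department, skips blanks, stops at the first "Dispatched",
    and only reports departments seen on at least min_days days."""
    counts = Counter()
    dispatched = False
    for cell in cells:
        if not cell or not cell.strip():   # exclude blank cells
            continue
        name = cell.strip().title()         # "FINAL POLISH" -> "Final Polish"
        if name == "Dispatched":
            dispatched = True
            break                           # nothing after "Dispatched" is counted
        counts[name] += 1                   # rework revisits add to the same total
    summary = {dept: f"{n} days" for dept, n in counts.items() if n >= min_days}
    if dispatched:
        summary["Dispatched"] = "Yes"
    return summary
```

For example, the row for 20/F/43352/6 (Casting, Casting, FINAL POLISH, Dispatched) reduces to Casting: 2 days and Dispatched: Yes. The single day in Final Polish is dropped.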
I am using a spreadsheet to log the completed and in-progress tasks of a project. I want to write some VBA code that can identify the latest delivery date within a task. However, each task has various sub-tasks.
The boundaries are the tasks, which are whole numbers, and in between these whole numbers (e.g. 46 and 47) are the sub-tasks.
The latest date needs to be calculated by examining the dates of the sub-tasks between each pair of whole numbers, e.g. 46.1, 46.2, 46.3, etc.
Would I be better off using Excel functions, or would it be easier to use code?
For example, this is the Excel function I would write from VBA:
Worksheets("Activity Overview").Cells(n, "E").Formula = "=IFERROR(IF(AGGREGATE(14,7,'Sub Tasks'!S:S/(('Sub Tasks'!A:A>='Activity Overview'!A" & n & ")*('Sub Tasks'!A:A<'Activity Overview'!A" & n + 1 & ")),1),AGGREGATE(14,7,'Sub Tasks'!S:S/(('Sub Tasks'!A:A>='Activity Overview'!A" & n & ")*('Sub Tasks'!A:A<'Activity Overview'!A" & n + 1 & ")),1),""""),"""")"
Use MAXIFS():
=MAXIFS(B:B,A:A,">="&E1,A:A,"<"&E1+1)
If one does not have MAXIFS then use AGGREGATE:
=AGGREGATE(14,7,$B$1:$B$6/(($A$1:$A$6>=E1)*($A$1:$A$6<E1+1)),1)
AGGREGATE is an array type formula and as such the references should be limited to the data range.
Here's a solution using SUMPRODUCT. It filters all values that are greater than or equal to the base value (>=49) and less than the base value plus one (<50).
You can also do this using Power Query aka Get & Transform available in Excel 2010+
Get Data from Table/Range
Change the Date column to Date format (it defaults to DateTime)
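Create a custom column with the integer part of the Task column and name it Main Task (use Number.RoundDown, because Int64.From on its own rounds, which would put 46.6 into task 47):
=Int64.From(Number.RoundDown([Task]))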
Delete the original Task column
Group By the Main Task column
New Column Name Latest Date
Operation: Max
Column: Target Delivery Date
If there are any nulls in the Latest Date Column (from tasks with no delivery dates), you can either leave them, or filter them out.
M-Code
let
Source = Excel.CurrentWorkbook(){[Name="Table3"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Task", type number}, {"Target Delivery Date", type date}}),
#"Added Custom" = Table.AddColumn(#"Changed Type", "Main Task", each Int64.From(Number.RoundDown([Task])), Int64.Type),
#"Grouped Rows" = Table.Group(#"Added Custom", {"Main Task"}, {{"Latest Date", each List.Max([Target Delivery Date]), type date}}),
#"Filtered Rows" = Table.SelectRows(#"Grouped Rows", each ([Latest Date] <> null))
in
#"Filtered Rows"
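The grouping step can be sketched in Python. It shows why the main task number must be truncated rather than rounded: `math.floor(46.7)` is 46, whereas rounding would move that sub-task into task 47. The function name and sample rows are hypothetical.

```python
import math
from datetime import date
from typing import Optional


def latest_date_per_task(rows: list[tuple[float, Optional[date]]]) -> dict[int, date]:
    """rows: (task number, target delivery date or None).
    Groups sub-tasks such as 46.1 and 46.7 under main task 46 and keeps the
    latest date per main task; main tasks with no dates are left out."""
    latest: dict[int, date] = {}
    for task, d in rows:
        main = math.floor(task)   # truncate: 46.7 -> 46 (rounding would give 47)
        if d is None:
            continue
        if main not in latest or d > latest[main]:
            latest[main] = d
    return latest
```

Skipping None dates here plays the same role as the final filter step that removes null Latest Date rows.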
I have a few columns of data, and I need to convert the Excel "PERCENTILE" function into Power Query.
I have some code which I added as a function, but it doesn't apply accurately because it doesn't allow for grouping the data by CATEGORY and YEAR. Anything in Full Discretionary 1.5-2.5 AND 2014 needs to go into one percentile array, and anything in Full Discretionary 2.5-3.5 AND 2014 needs to go into a different percentile array.
let
Source = (list as any, k as number) => let
Source = list,
#"Converted to Table" = Table.FromList(Source, Splitter.SplitByNothing(), null, null, ExtraValues.Error),
#"Sorted Rows" = Table.Sort(#"Converted to Table",{{"Column1", Order.Ascending}}),
#"Added Index" = Table.AddIndexColumn(#"Sorted Rows", "Index", 1, 1),
#"Added Custom" = Table.AddColumn(#"Added Index", "TheIndex", each Table.RowCount(#"Converted to Table")*k/100),
#"Filtered Rows" = Table.SelectRows(#"Added Custom", each [Index] >= [TheIndex] and [Index] <= [TheIndex]+1),
Custom1 = List.Average(#"Filtered Rows"[Column1])
in
Custom1
in
Source
So the expected result is that any rows matching on both columns (Year, Category) are placed in the same array. Currently, invoking the above function just gives me errors.
I have also tried using grouping and outputting the "Min, Median, and Max" outputs but I also require 10% and 90% Percentiles.
Thank you in advance
Based on findings on other websites and a lot of googling (most people just want to use DAX, but if you're only using Power Query you can't!), I found an answer someone posted that is very helpful:
https://social.technet.microsoft.com/Forums/en-US/a57bfbea-52d1-4231-b2de-fa993d9bb4c9/can-the-quotpercentilequot-be-calculated-in-power-query?forum=powerquery
Basically:
//PercentileInclusive Function
(inputSeries as list, percentile as number) =>
let
SeriesCount = List.Count(inputSeries),
PercentileRank = percentile*(SeriesCount-1)+1, //percentile value between 0 and 1
PercentileRankRoundedUp = Number.RoundUp(PercentileRank),
PercentileRankRoundedDown = Number.RoundDown(PercentileRank),
Percentile1 = List.Max(List.MinN(inputSeries,PercentileRankRoundedDown)),
Percentile2 = List.Max(List.MinN(inputSeries,PercentileRankRoundedUp)),
Percentile = Percentile1+(Percentile2-Percentile1)*(PercentileRank-PercentileRankRoundedDown)
in
Percentile
The above replicates Excel's PERCENTILE function. Create it as a new blank query (New Query, then paste it into the Advanced Editor) and name it PercentileInclusive. Then call it when grouping your data:
Table.Group(RenamedColumns, {"Country"}, {{"Sales Total", each
List.Sum([Amount Sales]), type number}, {"95 Percentile Sales", each
List.Average([Amount Sales]), type number}})
In the above formula, RenamedColumns is the name of the previous step in the script. Change the name to match your actual case. I've assumed that the pre-grouping sales amount column is "Amount Sales." Names of grouped columns are "Sales Total" and "95 Percentile Sales."
Next, modify the group formula, substituting List.Average with PercentileInclusive:
Table.Group(RenamedColumns, {"Country"}, {{"Sales Total", each
List.Sum([Amount Sales]), type number}, {"95 Percentile Sales", each
PercentileInclusive([Amount Sales],0.95), type number}})
This worked for my data set and matches similar
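To tie this back to the original question, here is a Python sketch of the same PERCENTILE.INC interpolation. It builds one array per (Year, Category) pair and returns the 10th and 90th percentiles. It mirrors the PercentileInclusive function above: rank = p × (n - 1) + 1, then linear interpolation between the neighboring sorted values. The function names and sample rows are assumptions for illustration.

```python
import math
from collections import defaultdict


def percentile_inc(values: list[float], p: float) -> float:
    """Excel PERCENTILE / PERCENTILE.INC for 0 <= p <= 1 (values must be non-empty)."""
    xs = sorted(values)
    rank = p * (len(xs) - 1) + 1            # 1-based rank, as in PercentileInclusive
    lo, hi = math.floor(rank), math.ceil(rank)
    return xs[lo - 1] + (xs[hi - 1] - xs[lo - 1]) * (rank - lo)


def percentiles_by_group(rows, ps=(0.1, 0.9)):
    """rows: (year, category, value). Builds one array per (year, category)
    pair and returns {(year, category): [percentile for each p in ps]}."""
    groups = defaultdict(list)
    for year, category, value in rows:
        groups[(year, category)].append(value)
    return {key: [percentile_inc(vals, p) for p in ps] for key, vals in groups.items()}
```

In Power Query terms, the grouping key would be {"Year", "Category"} instead of {"Country"}, with one aggregation per percentile.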
I'm trying to create a query that sums up a column of values and puts the sum as a new row in the same table. I know I can do this using the Group By function, but it doesn't do it exactly as I need. I'm trying to create an accounting journal entry, and I need to calculate the offsetting entry for a long list of debits. I know this is accountant talk. Here's a sample of the table I am using.
Date | GL Num | GL Name | Location | Amount
1/31 | 8000 | Payroll | Office | 7000.00
1/31 | 8000 | Payroll | Remote | 1750.00
1/31 | 8000 | Payroll | City | 1800.00
1/31 | 8010 | Taxes | Office | 600.00
1/31 | 8010 | Taxes | Remote | 225.00
1/31 | 8010 | Taxes | City | 240.00
1/31 | 3000 | Accrual | All | (this needs to be the negative sum of all other rows)
I have been using the Group By function, grouping by Date with the sum of Amount as the result, but that removes the original rows and every column except Date and the summed Amount. I need to keep all rows and columns, putting the sum in the same Amount column if possible. If the sum has to be in a new column, I can work with that as long as the other columns and rows remain. I also need to enter the GL Num, GL Name, and Location values for this sum row. These three values will not change: they will always be 3000, Accrual, All. The date will change based on the date used in the actual data. I would prefer to do this all in Power Query (Get & Transform) if possible. I can do it via VBA, but I'm trying to make this effortless for others to use.
What you can do is calculate the accrual rows in a separate query and then append them.
Duplicate your query.
Group by Date and sum over Amount. This should return the following:
Date | Amount
1/31 | 11615
Multiply your Amount column by -1. (Transform > Standard > Multiply)
Add custom columns for GL Num, GL Name and Location with the fixed values you choose.
Date | Amount | GL Num | GL Name | Location
1/31 | -11615 | 3000 | Accrual | All
Append this table to your original. (Home > Append Queries.)
You can also roll this all up into a single query like this:
let
Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
OriginalTable = Table.TransformColumnTypes(Source,{{"Date", type date}, {"GL Num", Int64.Type}, {"GL Name", type text}, {"Location", type text}, {"Amount", type number}}),
#"Grouped Rows" = Table.Group(OriginalTable, {"Date"}, {{"Amount", each List.Sum([Amount]), type number}}),
#"Multiplied Column" = Table.TransformColumns(#"Grouped Rows", {{"Amount", each _ * -1, type number}}),
#"Added Custom" = Table.AddColumn(#"Multiplied Column", "GL Num", each 3000),
#"Added Custom1" = Table.AddColumn(#"Added Custom", "GL Name", each "Accrual"),
#"Added Custom2" = Table.AddColumn(#"Added Custom1", "Location", each "All"),
#"Appended Query" = Table.Combine({OriginalTable, #"Added Custom2"})
in
#"Appended Query"
Note that we are appending the last step with an earlier step in the query instead of referencing a different query.
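The same append logic can be sketched in Python: one offsetting row per Date, whose Amount is the negative sum of that date's rows. It uses Decimal so currency amounts are not rounded. The function name and the dict-per-row format are assumptions.

```python
from collections import defaultdict
from decimal import Decimal


def add_accrual_rows(rows: list[dict]) -> list[dict]:
    """rows: dicts with Date, GL Num, GL Name, Location and Amount (Decimal).
    Returns the original rows plus one GL 3000 / Accrual / All row per Date
    whose Amount is the negative sum of that date's amounts."""
    totals = defaultdict(Decimal)                # Decimal() == Decimal("0")
    for r in rows:
        totals[r["Date"]] += r["Amount"]
    offsets = [
        {"Date": d, "GL Num": 3000, "GL Name": "Accrual", "Location": "All", "Amount": -total}
        for d, total in totals.items()
    ]
    return rows + offsets                       # original rows are kept unchanged
```

With the sample data, the appended row is 1/31 | 3000 | Accrual | All | -11615.00, so the whole entry nets to zero.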