M Query Table.Group with Min based on two columns - excel

I have a table with many columns. Three of these columns are:
Package Name (text)
Units Required (Int64)
Assessment (Int64)
What I am trying to do is find the 'minimum' "Package Name": first by selecting the smallest number of "Units Required" and then, because several rows can share the same number of required units, by taking the row with the lowest "Assessment".
I am exploring the Table.Group() approach but I am not getting anywhere with my understanding of it. I am doing this in Power Query in Excel 365.
Pseudocode would be something like:
Table.Group("Previous Step Name",{"Package Name"},{MIN("Units Required"),MIN("Assessment")})
As an aside: is it possible to use a single Table.Group and group at two levels, such as "Package Name" and "Column X", so that the result is, for each "Package Name", one group for each "Column X" within it (nested, as it were)?
Thank you in advance for taking a look at this.
Any help greatly appreciated.
Cheers
The Frog

I think you have to do it step by step.
Data
Queries
Load_Data
Load data from Excel table
let
Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content]
in
Source
Min_Unit
Identify min unit by grouping with empty "group by" field.
let
Source = Load_Data,
Group = Table.Group(Source, {}, {{"Min_Unit", each List.Min([Units Required]), type number}})
in
Group
Min_Unit_And_Assessment
Use inner join to filter original data for entries which equal min_unit. Next, group by "units required" to get the min_assessment.
let
Source = Table.NestedJoin(Load_Data, {"Units Required"}, Min_Unit, {"Min_Unit"}, "Min_Unit", JoinKind.Inner),
Group = Table.Group(Source , {"Units Required"}, {{"Min_Assessment", each List.Min([Assessment]), type nullable number}})
in
Group
Result
Inner join to filter original data for the combination of min_unit and min_assessment.
let
Source = Table.NestedJoin(Load_Data, {"Units Required", "Assessment"}, Min_Unit_And_Assessment, {"Units Required", "Min_Assessment"}, "Min_Unit_And_Assessment", JoinKind.Inner),
RemoveUnnecessaryColumns = Table.RemoveColumns(Source,{"Min_Unit_And_Assessment"})
in
RemoveUnnecessaryColumns
Result
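For reference, the three queries above can probably be collapsed into a single query by filtering twice. This is a minimal sketch rather than the answerer's code: the inline sample table and the step names are assumptions standing in for Load_Data.

```powerquery
let
    // Hypothetical sample data standing in for Load_Data
    Source = #table(
        type table [#"Package Name" = text, #"Units Required" = Int64.Type, Assessment = Int64.Type],
        {{"A", 3, 5}, {"B", 2, 7}, {"C", 2, 4}, {"D", 4, 1}}
    ),
    // Smallest "Units Required" across the whole table
    MinUnits = List.Min(Source[Units Required]),
    // Keep only the rows that tie on that minimum
    WithMinUnits = Table.SelectRows(Source, each [Units Required] = MinUnits),
    // Among those rows, find the smallest "Assessment"
    MinAssessment = List.Min(WithMinUnits[Assessment]),
    // Keep the row(s) matching both minimums
    Result = Table.SelectRows(WithMinUnits, each [Assessment] = MinAssessment)
in
    Result
```

With this sample data the query should return only package "C" (2 units, assessment 4). If several rows tie on both columns, all of them are kept.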

Qualia, thank you for pointing me in the right direction.
The way that I solved this was really simple in the end!
Step 1: Sort the rows based on the grouping criteria (package name, system class) in that order
Step 2: Add an Index Column so each row has a unique ID to work with
Step 3: Group the table based on the same fields (package name, system class) and 'aggregate' on the lowest Index Number (MIN)
Step 4: Perform a 'Merge Queries' with a Left Outer Join, using the Index Number as the matching field between the current step and the earlier step where the Index was added. Only the rows you need are matched, because the MIN aggregation has already dropped the others. Here is my example:
Table.NestedJoin(#"Grouped Rows", {"Winner"}, #"Added Index", {"Index"}, "Lookup Data", JoinKind.LeftOuter)
- Grouped Rows was the grouping step (Step 3)
- Winner is the name of the column holding the minimum Index value for each group
- Added Index was the last step before grouping that still had all the columns (Step 2)
- Index is the column that was added after the sort to uniquely number each row
Step 5: Expand the table and select the columns of data that you want to hang onto
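Steps 1 to 5 might look roughly like this as one query. It is a sketch under assumptions: the sample data is hypothetical, and the sort is assumed to include "Units Required" and "Assessment" after the grouping columns so that the lowest Index in each group is the row wanted. Table.Buffer is added only as a precaution to pin the sorted order before indexing.

```powerquery
let
    // Hypothetical sample data; "System Class" is the second grouping column
    Source = #table(
        type table [#"Package Name" = text, #"System Class" = text, #"Units Required" = Int64.Type, Assessment = Int64.Type],
        {
            {"P1", "X", 3, 5},
            {"P1", "X", 2, 7},
            {"P1", "X", 2, 4},
            {"P2", "Y", 1, 9}
        }
    ),
    // Step 1: sort by the grouping columns, then by the tie-break columns (assumed)
    Sorted = Table.Buffer(Table.Sort(Source, {
        {"Package Name", Order.Ascending},
        {"System Class", Order.Ascending},
        {"Units Required", Order.Ascending},
        {"Assessment", Order.Ascending}
    })),
    // Step 2: give every row a unique ID
    #"Added Index" = Table.AddIndexColumn(Sorted, "Index", 0, 1),
    // Step 3: group on both columns and keep the lowest Index per group
    #"Grouped Rows" = Table.Group(#"Added Index", {"Package Name", "System Class"},
        {{"Winner", each List.Min([Index]), Int64.Type}}),
    // Step 4: look the winning rows up in the indexed table
    Merged = Table.NestedJoin(#"Grouped Rows", {"Winner"}, #"Added Index", {"Index"}, "Lookup Data", JoinKind.LeftOuter),
    // Step 5: expand only the columns to keep
    Expanded = Table.ExpandTableColumn(Merged, "Lookup Data", {"Units Required", "Assessment"})
in
    Expanded
```

Passing two column names to Table.Group, as in Step 3, also covers the aside in the question: it yields one row per distinct combination of the two columns rather than a nested result.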
Treating it a bit like a database was a good approach and I appreciate the suggestion you put together for me. Hopefully this will allow others to solve some of their problems too.
Cheers and many thanks
The Frog

Related

Excel VBA Power Query - How to create a query that dynamically returns only the sale's rows of the last minute?

I have a comma separated csv file with the following structure:
Col Headers:
ProdDate, ProdTime, OLEDATETIME, ProdBuyPrice, ProdSellPrice, ProdBoughtQTY, ProdSoldQTY, etc
09/21/2019, 13:54:22, 43729.5801, 12.45, 12.61, 8, 9, etc.
This CSV file is updated many times per minute (5 to 70 times), so the last minute of sales can hold anywhere from 5 to 70 lines. That means I can't use an arbitrary fixed number with "maintain first lines" to return only the rows that arrived in the last minute, and I have never done this before with Power Query. So I need a finished recipe to do this, but my googling has turned up nothing so far.
Any suggestion?
This is an example of how you can identify a dynamic row number. In this example, we have a table that shows fruit sales by store. We want to create a query that returns the highest number of bananas sold.
This is what our data table looks like.
Step 1 - Add an index column starting from 1. This assigns row numbers.
Add Column > Index Column > From 1
Step 2 - Filter and Sort the data.
Remove any columns that are unnecessary.
Filter the Item column for Bananas.
Sort the Values column in descending order.
Right-click on the first value in the Index column and choose Drill-Down.
RESULT
Now you have a dynamic row #. You could also instead choose the value itself to return the sales instead of the index. To apply this to other scenarios, just keep filtering and sorting until you get to the result you need.
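The GUI steps above correspond roughly to the M below. This is a sketch with made-up sample rows: the "Item" and "Values" column names come from the steps, while "Store" and the step names are assumptions.

```powerquery
let
    // Hypothetical fruit sales by store
    Source = #table(
        type table [Store = text, Item = text, Values = Int64.Type],
        {
            {"North", "Apples", 40},
            {"North", "Bananas", 25},
            {"South", "Bananas", 60},
            {"East", "Bananas", 35}
        }
    ),
    // Step 1: index column starting from 1
    Indexed = Table.AddIndexColumn(Source, "Index", 1, 1),
    // Step 2: filter for Bananas and sort the values in descending order
    Bananas = Table.SelectRows(Indexed, each [Item] = "Bananas"),
    Sorted = Table.Sort(Bananas, {{"Values", Order.Descending}}),
    // Drill down to the Index of the first (largest) row
    TopIndex = Sorted{0}[Index]
in
    TopIndex
```

For this sample the drill-down should return 3, the row number of the South store's 60 bananas. Replacing [Index] with [Values] in the last step returns the sales figure instead.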
This is how you filter a time column for records occurring in the latest one minute of times.
let
Source = Excel.CurrentWorkbook(){[Name="t_DatesAndTimes"]}[Content],
ChangedTypes_ColData = Table.TransformColumnTypes(Source,{{"Date", type date}, {"Time", type time}}),
AddCol_DateAndTime = Table.AddColumn(ChangedTypes_ColData, "Date and Time", each [Date] & [Time], type datetime),
LatestTime_ofReport_MinusOneMinute = List.Max(AddCol_DateAndTime[Date and Time])-#duration(0,0,1,0),
FilterRows_KeepTimesInLastMinute = Table.SelectRows(AddCol_DateAndTime, each [Date and Time] >= LatestTime_ofReport_MinusOneMinute)
in
FilterRows_KeepTimesInLastMinute
Data Table needing to be filtered
Table filtered for time in the last minute of times listed in the report.
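Applied to the column names in the question, the same idea might look like the sketch below. The inline rows are hypothetical stand-ins for the CSV contents, and the "en-US" culture is an assumption made so that text such as 09/21/2019 is read as month/day/year.

```powerquery
let
    // Hypothetical rows shaped like the CSV in the question (text, as read from a file)
    Source = #table(
        {"ProdDate", "ProdTime", "ProdSoldQTY"},
        {
            {"09/21/2019", "13:53:10", "4"},
            {"09/21/2019", "13:53:40", "7"},
            {"09/21/2019", "13:54:22", "9"}
        }
    ),
    // Convert the text columns, parsing dates with the en-US culture
    Typed = Table.TransformColumnTypes(
        Source,
        {{"ProdDate", type date}, {"ProdTime", type time}, {"ProdSoldQTY", Int64.Type}},
        "en-US"
    ),
    // Combine date and time into one datetime value
    WithStamp = Table.AddColumn(Typed, "ProdDateTime", each [ProdDate] & [ProdTime], type datetime),
    // One minute before the latest timestamp in the data
    Cutoff = List.Max(WithStamp[ProdDateTime]) - #duration(0, 0, 1, 0),
    // Keep only the rows inside that last minute
    LastMinute = Table.SelectRows(WithStamp, each [ProdDateTime] >= Cutoff)
in
    LastMinute
```

Note that the cutoff is relative to the newest row in the file, not to the clock. To measure from the current time instead, DateTime.LocalNow() could replace the List.Max call.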

how to merge two rows into one in spotfire?

I am stuck at a point in Spotfire where I need to transform the table below
ID  First name  Last name
1   Mark
1               Taylor
2   Howard
2               Giblin
to (the table as shown here)
ID  First Name  Last Name
1   Mark        Taylor
2   Howard      Giblin
Could someone please help me out. Thanks for the help in advance!
File > Add Data Tables
Add (button) > From Current Analysis > "Your Table Name"
Under transformations, Select "Calculate and Replace Column" > Add (button)
Then use this formula
Max([FirstName]) over ([ID]) as [FirstName]
Repeat the last step for your last name column
Max([LastName]) over ([ID]) as [LastName]
Note, you could do this in a cross table or a calculated column as well. It will not remove the duplicate rows though, only fill in the gaps.

Power Query - Keeping Most recent records in change log columns

I need to strip records to show just the most recent for a given person, and I'm trying to think of a method for doing this in a custom column so I can just keep the most recent records. This is essentially a status change list, and I need to match the last change as a "current status" for merging with another query. Each date can be unique, and each person can have anywhere from 1 to a dozen status changes. I've picked a selection below; last names have been removed to protect the innocent. For the sake of the example, each "name" has a unique identifier that I can use to prevent any overlap from similar names.
AaronS 4/1/2015
AaronS 10/16/2013
AaronS 5/15/2013
AdamS 2/27/2007
AdamL 12/16/2004
AdamL 11/17/2004
AlanG 11/1/2007
AlexanderJ 7/1/2016
AlexanderJ 1/25/2016
AlexanderJ 4/1/2015
AlexanderJ 10/16/2013
AlexanderJ 6/1/2013
AlexanderJ 11/7/2011
My goal would be to return the most recent date for each individual "name" and nulls for the other rows. Then I can filter out nulls to return one row per name. I'm fairly new to power query and mostly adept with the UI, barely learning M Code. Any help will be most welcome.
GUI
Bring the "Name" and "Date" data into Power Query.
Group by "Name". In the Group By dialog select the operation All Rows. Name the new column "AllRows". Click OK.
Add a custom column and title it "LatestRow". Enter the formula below. Click OK. Note that the "Date" column is coming from the sub-table in the "AllRows" column.
= Table.Max([AllRows], "Date")
Click the expand button in the upper right corner of the "LatestRow" column. This will return the record associated with the latest date for each name.
Code
let
Source = Excel.CurrentWorkbook(){[Name="data"]}[Content],
GroupedRows = Table.Group(Source, {"Name"}, {{"AllRows", each _, type table [Name=nullable text, Date=nullable datetime]}}),
AddedCustomColumn = Table.AddColumn(GroupedRows, "LatestRow", each Table.Max([AllRows], "Date")),
ExpandedLatestRow = Table.ExpandRecordColumn(AddedCustomColumn, "LatestRow", {"Date"}, {"LatestRow.Date"})
in
ExpandedLatestRow
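If only the latest date per name is needed, and no other columns from the winning row, a single grouping step may be enough. This is a sketch with a few rows taken from the question; the step and column names are assumptions.

```powerquery
let
    // A few sample rows from the question, typed as dates
    Source = #table(
        type table [Name = text, Date = date],
        {
            {"AaronS", #date(2015, 4, 1)},
            {"AaronS", #date(2013, 10, 16)},
            {"AdamS", #date(2007, 2, 27)},
            {"AdamL", #date(2004, 12, 16)},
            {"AdamL", #date(2004, 11, 17)}
        }
    ),
    // One row per name, holding the most recent date
    Latest = Table.Group(Source, {"Name"}, {{"LatestDate", each List.Max([Date]), type nullable date}})
in
    Latest
```

The Table.Max approach above remains the better fit when other columns of the most recent row, such as the status itself, have to come along.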

Excel Power Query -- Select value in column specified in related table -- INDEX+MATCH alternative

Problem
I have two queries, one contains product data (data_query), the other (recode_query) contains product names from within the data_query and assigns them specific id_tags. id_tags are also column names within the data_query.
What I need to achieve and fail at
I need the data_query to look at the id_tag of the specific product name within the data_query, as parsed from the recode_query (this is already working and in place) and input the retrieved value within the specific custom column cell. In Excel, I would be using INDEX/MATCH combo:
{=INDEX(data_query[#Data];; MATCH(data_query[#id_tag]; data_query[#Headers]; 0))}
I have searched near and far, but I probably can't even spot the solution, even if I have come across it, as I am not that deep in the data manipulation and power query myself.
Is this what you're wanting?
let
DataQuery = Table.FromColumns({{1,2,3}, {"Boxed", "Bagged", "Rubberbanded"}}, {"ID","Pkg"}),
RecodeQuery = Table.FromColumns({{"Squirt Gun", "Coffee Maker", "Trenching Tool"}, {1,2,3}}, {"Prod Name", "ID2"}),
Rzlt = Table.Join(DataQuery, "ID", RecodeQuery, "ID2", JoinKind.Inner)
in
Rzlt
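If the goal is instead the INDEX/MATCH behaviour described in the question, where each row reads the value from the column named in its own id_tag, Record.Field may be closer. The table below is hypothetical, and the column names tag_a and tag_b are assumptions made for illustration.

```powerquery
let
    // Hypothetical data_query: id_tag names another column in the same row
    DataQuery = #table(
        {"Product", "id_tag", "tag_a", "tag_b"},
        {
            {"Squirt Gun", "tag_a", 10, 20},
            {"Coffee Maker", "tag_b", 30, 40}
        }
    ),
    // For each row (a record), fetch the field whose name is stored in id_tag
    WithLookup = Table.AddColumn(DataQuery, "Looked Up", each Record.Field(_, [id_tag]))
in
    WithLookup
```

Here the new column should hold 10 for the first row and 40 for the second. Record.Field raises an error when the named column does not exist; Record.FieldOrDefault can return a fallback value instead.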

Excel - Power Query 2016

I got data from two tables.
Customers (containing customer ID and the total value of orders/funding)
Orders (containing customer ID and each order)
I created a Power Query, then chose the option to "Merge Queries as New". I selected the matching columns (Customer ID) and chose the option Left Outer (all from the first and matching from the second, that is, all from the customer table and matching from the order table). Then I expanded the last column of the query to include what I wanted from the Order table, resulting in the table below on the left. The one on the right is what I'm after. The problem is that funding amounts are already totals per customer. I don't need the value of each order broken down. I still need the orders displayed, but I don't need their values (just the total per customer). Is it possible to do it like the one below on the right? Otherwise, the grand total is way off.
I think what you're trying to do is join with only the first instance of each value in your Customer column. There doesn't appear to be any feature or GUI element that allows you to do that (I had a look at the reference documentation for Power Query M, maybe I missed something).
To replicate your data, I'm starting off with some tables (left table is named Customers, right table is named Orders):
I then use the M code below (the first few lines are just to get my tables from the sheet):
let
    customers = Excel.CurrentWorkbook(){[Name = "Customers"]}[Content],
    orders = Excel.CurrentWorkbook(){[Name = "Orders"]}[Content],
    merged = Table.NestedJoin(orders, {"CUSTOMER"}, customers, {"CUSTOMER"}, "merged", JoinKind.LeftOuter),
    indexColumn = Table.AddIndexColumn(merged, "Temporary", 0, 1),
    // Row index of the first occurrence of each customer
    indexes =
        let
            uniqueCustomers = Table.Distinct(Table.SelectColumns(indexColumn, {"CUSTOMER"})), // Want to keep as table
            listOfRecords = Table.ToRecords(uniqueCustomers),
            firstOccurenceIndexes = List.Accumulate(listOfRecords, {}, (listState, currentItem) =>
                List.Combine({listState, {Table.PositionOf(indexColumn, currentItem, Occurrence.First, "CUSTOMER")}})
            )
        in
            firstOccurenceIndexes,
    // Expand the merged table only on those first-occurrence rows
    expandSelectively =
        let
            toBoolean = Table.TransformColumns(indexColumn, {{"Temporary", each List.Contains(indexes, _), type logical}}),
            tableOrNull = Table.AddColumn(toBoolean, "toExpand", each if [Temporary] then [merged] else null),
            dropRedundantColumns = Table.RemoveColumns(tableOrNull, {"merged", "Temporary"}),
            expand = Table.ExpandTableColumn(dropRedundantColumns, "toExpand", {"FUNDING"})
        in
            expand
in
    expandSelectively
If your table names and column names match mine (including case sensitivity), then you might just be able to copy-paste all of the M code above into the Advanced Editor and have it work for you. Otherwise, you may need to tweak as necessary.
This is what I get when I load the query to the worksheet.
There might be better (more efficient) ways of doing this, but this is what I have for now.
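One possibly simpler variant finds the first row of each customer with Table.Group instead of List.Accumulate. It is only a sketch: the inline tables are hypothetical stand-ins for the Customers and Orders tables, and the "ORDER", "Idx" and "FUNDING_ONCE" names are assumptions.

```powerquery
let
    // Hypothetical stand-ins for the Orders and Customers tables
    Orders = #table({"CUSTOMER", "ORDER"}, {{"A", "o1"}, {"A", "o2"}, {"B", "o3"}}),
    Customers = #table({"CUSTOMER", "FUNDING"}, {{"A", 2394}, {"B", 4323}}),
    // Number the order rows so the first one per customer can be identified
    Indexed = Table.AddIndexColumn(Orders, "Idx", 0, 1),
    // Lowest index per customer = first occurrence
    FirstIdx = Table.Group(Indexed, {"CUSTOMER"}, {{"FirstIdx", each List.Min([Idx]), Int64.Type}}),
    FirstIdxList = List.Buffer(FirstIdx[FirstIdx]),
    // Bring in the funding total for every order row
    Merged = Table.NestedJoin(Indexed, {"CUSTOMER"}, Customers, {"CUSTOMER"}, "cust", JoinKind.LeftOuter),
    Expanded = Table.ExpandTableColumn(Merged, "cust", {"FUNDING"}),
    // Keep the funding only on the first row of each customer
    FundingOnce = Table.AddColumn(Expanded, "FUNDING_ONCE",
        each if List.Contains(FirstIdxList, [Idx]) then [FUNDING] else null),
    Result = Table.RemoveColumns(FundingOnce, {"FUNDING", "Idx"})
in
    Result
```

Because the check uses the stored Idx value rather than the row position, it does not depend on the join preserving row order.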
If you're not using the order ID column, then I would suggest doing a Group By on the OrderTable before merging in the funding so that you'd end up with a table like this instead:
Region Customer OrderCount Funding
South A 3 2394
South B 2 4323
South C 1 1234
South D 2 3423
This way you don't have mixed levels of granularity that cause problems like you are seeing with the totals.
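The Group By suggestion could be sketched as follows. The sample rows are hypothetical and only mirror two customers from the table above; the "ORDER" column and the step names are assumptions.

```powerquery
let
    // Hypothetical order rows: customer A has 3 orders, customer B has 2
    Orders = #table(
        {"REGION", "CUSTOMER", "ORDER"},
        {
            {"South", "A", "o1"}, {"South", "A", "o2"}, {"South", "A", "o3"},
            {"South", "B", "o4"}, {"South", "B", "o5"}
        }
    ),
    Customers = #table({"CUSTOMER", "FUNDING"}, {{"A", 2394}, {"B", 4323}}),
    // Collapse the orders to one row per customer before merging
    Grouped = Table.Group(Orders, {"REGION", "CUSTOMER"}, {{"OrderCount", each Table.RowCount(_), Int64.Type}}),
    // Each customer now matches exactly one funding total
    Merged = Table.NestedJoin(Grouped, {"CUSTOMER"}, Customers, {"CUSTOMER"}, "cust", JoinKind.LeftOuter),
    Result = Table.ExpandTableColumn(Merged, "cust", {"FUNDING"})
in
    Result
```

Since both sides of the merge are at customer level, the funding column can be summed without double counting.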

Resources