I'm a little rusty on Power Query.
I need to count "previous" entries in the same table.
For example, let's say we have a table of car sales.
For the purposes of Power Query, this table will be named tblCarSales.
I need to add two aggregate columns.
The first aggregate column is the count of previous sales.
The Excel formula would be =COUNTIF([Sale Date],"<"&[@[Sale Date]])
The second aggregate column is the count of previous sales by make.
The Excel formula would be =COUNTIFS([Sale Date],"<"&[@[Sale Date]],[Make],[@Make])
How can this behavior be accomplished in Power Query instead of with Excel formulas?
For example, I'm starting with the source statement:
let
    Source = Excel.CurrentWorkbook(){[Name="tblCarSales"]}[Content]
in
    Source
... where the source table only provides the Make, Model, and Sale Date columns.
You can do this sort of thing using List and Table functions. I'll show both.
let
    Source = Excel.CurrentWorkbook(){[Name="tblCarSales"]}[Content],
    #"Added Custom" = Table.AddColumn(Source, "Previous Sale Count",
        (C) => List.Count(List.Select(Source[Sale Date],
            each _ < C[Sale Date]))),
    #"Added Custom1" = Table.AddColumn(#"Added Custom", "Previous Sale Count By Make",
        (C) => Table.RowCount(Table.SelectRows(Source,
            (S) => S[Sale Date] < C[Sale Date] and S[Make] = C[Make])))
in
    #"Added Custom1"
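We have to use explicit functions such as (C) => ... and (S) => ... rather than each, so that Power Query knows which row context each column reference belongs to: C is the row receiving the new value and S is the row being compared, whereas a nested each would rebind _ and lose the outer row. For further reading, check out this Power Query M Primer.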
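To try this without the workbook, the variant below can be pasted into a blank query. It is a sketch under assumptions: the inline #table stands in for tblCarSales with made-up rows, and Table.Buffer / List.Buffer are added so that every row's scan reads one in-memory copy instead of re-evaluating the source each time. The comparison is still row by row, so very large tables may be slow.

```powerquery
let
    // Hypothetical sample rows standing in for tblCarSales
    Source = #table(
        type table [Make = text, Model = text, #"Sale Date" = date],
        {
            {"Ford", "Focus", #date(2020, 1, 1)},
            {"Honda", "Civic", #date(2020, 1, 5)},
            {"Ford", "Fiesta", #date(2020, 1, 5)},
            {"Ford", "Focus", #date(2020, 2, 1)}
        }),
    // Keep one in-memory copy so each row's scan reads the same buffered data
    Buffered = Table.Buffer(Source),
    Dates = List.Buffer(Buffered[Sale Date]),
    // C is the current row; _ is each date being compared against it
    WithCount = Table.AddColumn(Buffered, "Previous Sale Count",
        (C) => List.Count(List.Select(Dates, each _ < C[Sale Date])), Int64.Type),
    // S is each candidate row; only strictly earlier sales of the same make count
    WithMakeCount = Table.AddColumn(WithCount, "Previous Sale Count By Make",
        (C) => Table.RowCount(Table.SelectRows(Buffered,
            (S) => S[Sale Date] < C[Sale Date] and S[Make] = C[Make])), Int64.Type)
in
    WithMakeCount
```

With these rows the two new columns should come out as 0, 1, 1, 3 and 0, 0, 1, 2. The two sales on 2020-01-05 do not count each other, matching the strict "<" in the COUNTIF formulas.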
I have a table of data that consists of 18 columns and 2,017 rows. I can get the row that has the highest (MAX) value in a cell, but I need the rows that have the most cells with high values, sorted in descending order. I haven't yet managed to find a relevant post on this.
Here follows an example:
Using numbers up to 10 for illustration, the following shows the logic behind it. (The actual numbers are those shown in Exhibit1.)
Thank you
EDIT:
I am adding the following to try to clarify further. I am not sure whether it is the correct path, but I hope it makes sense.
In Exhibit2 I rank each column in descending order (based on Exhibit1) and then =SUM the ranks at the end of each row. Following this logic, the name with the lowest total is the one with the most high values in its row (not the single highest value).
The result table is the following:
Although this is possible with formulas and helper tables/columns, it can also be accomplished using Power Query, available in Windows Excel 2010+ and Excel 365 (Windows or Mac).
To use Power Query
Select some cell in your Data Table
Data => Get & Transform => from Table/Range or from within sheet
When the PQ Editor opens: Home => Advanced Editor
Make note of the Table Name in Line 2
Paste the M Code below in place of what you see
Change the Table name in line 2 back to what was generated originally.
Read the comments and explore the Applied Steps to understand the algorithm.
As we discussed in our chat, I transform each column into a list of ranked entries, then sum the ranks for each row and sort as you laid out.
M Code
let
    Source = Excel.CurrentWorkbook(){[Name="Table5"]}[Content],
    //type all the columns
    data = Table.TransformColumnTypes(Source,{
        {"Order", Int64.Type},
        {"Name", type text}} &
        List.Transform(List.RemoveFirstN(Table.ColumnNames(Source),2), each {_, type number})
    ),
    //Replace with ranks
    //generate list of transforms to dynamically include all columns
    cols = List.RemoveFirstN(Table.ColumnNames(data),2),
    xForms = List.Transform(cols, (c)=> {c, each List.PositionOf(List.Sort(Table.Column(data,c),Order.Descending),_)}),
    ranks = Table.TransformColumns(data,xForms),
    //add Index column to enable row-wise sums
    // then add the sumRank column and delete the Index column
    #"Added Index" = Table.AddIndexColumn(ranks, "Index", 0, 1, Int64.Type),
    #"Added Custom" = Table.AddColumn(#"Added Index", "sumRank", each
        List.Sum(
            Record.ToList(
                Record.RemoveFields(#"Added Index"{[Index]},{"Order","Name","Index"})
            )
        )),
    #"Removed Columns" = Table.RemoveColumns(#"Added Custom",{"Index"}),
    //join back with the original data table
    //extract the sumRank column
    join = Table.NestedJoin(data,{"Order","Name"}, #"Removed Columns",{"Order","Name"}, "joined",JoinKind.FullOuter),
    #"Expanded joined" = Table.ExpandTableColumn(join, "joined", {"sumRank"}, {"sumRank"}),
    //sort by the sumRank column, then remove it
    #"Sorted Rows" = Table.Sort(#"Expanded joined",{{"sumRank", Order.Ascending}}),
    #"Removed Columns1" = Table.RemoveColumns(#"Sorted Rows",{"sumRank"})
in
    #"Removed Columns1"
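The Index column and the join back are not strictly required: inside Table.AddColumn the current row is already available as a record, so its ranks can be summed directly against the original values. The sketch below shows that variant on a tiny hypothetical table (the score columns C1 and C2 and all values are invented). It also sorts each column once rather than once per cell. Like the query above, it uses 0-based positions in a descending sort, so tied values share the rank of their first occurrence.

```powerquery
let
    // Hypothetical sample data: Order, Name, then the score columns
    data = #table(
        type table [Order = number, Name = text, C1 = number, C2 = number],
        {{1, "A", 5, 1}, {2, "B", 9, 7}, {3, "C", 7, 8}}),
    cols = List.RemoveFirstN(Table.ColumnNames(data), 2),
    // Sort each score column once (descending) and keep it in memory
    sorted = List.Transform(cols,
        (c) => List.Buffer(List.Sort(Table.Column(data, c), Order.Descending))),
    // For each row, add up the 0-based descending rank of every score
    withRank = Table.AddColumn(data, "sumRank", (row) =>
        List.Sum(List.Transform(List.Positions(cols),
            (i) => List.PositionOf(sorted{i}, Record.Field(row, cols{i})))), Int64.Type),
    // Lowest rank total first, then drop the helper column
    result = Table.RemoveColumns(Table.Sort(withRank, {{"sumRank", Order.Ascending}}), {"sumRank"})
in
    result
```

With these values A sums to 4 while B and C both sum to 1, so A ends up last.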
This set-up is volatile (OFFSET and INDIRECT recalculate whenever anything in the workbook changes), so I would only adopt it if non-volatile alternatives are not forthcoming.
Add a column to your table with the following formula:
=SUM(COUNTIF(OFFSET([Column1],,TRANSPOSE(ROW(INDIRECT("1:"&COLUMNS(Table1[@[Column1]:[Column4]])))-1)),">="&Table1[@[Column1]:[Column4]]))
which you can then use to sort your table.
Note that this formula will most likely require committing with CTRL+SHIFT+ENTER for your version of Excel.
Amend the table and column names as required, noting that the part
Table1[@[Column1]:[Column4]]
should, as well as including the table name, span from the leftmost to the rightmost of the contiguous columns to be checked.
Apologies if I make any errors, first time posting here!
I have a dataset that I've loaded into the Excel data model using Power Query. I've split it into 3 tables linked through a unique ID field (one main table with just the unique IDs and general info, and two tables linked from it).
What I want to do is take one of the linked tables that looks like this:
ID | Start Date | End Date | Category
123456 | 01/01/2000 | 01/01/2001 | A
I've created a separate date table, and what I want is a count of every active ID for each month of the date table, which I managed using CALCULATE and FILTER in a calculated column on the date table. However, when I load that into the PivotTable, it ignores the categories.
I tried relating the date table using the Start Date field of the other table, but it didn't make any difference.
I've found tonnes of Power BI solutions that involve calculated tables, but being Excel-based is a requirement.
Thanks in advance!
I'm afraid that, to expand the date interval in Power Query, we need to write a line of M code.
This is a small sample that creates a table with the same columns as the table in your question. I used different values to keep the example simple.
The idea is to expand each date interval into an M list containing every date in the interval, then use that list to create one new row per date, in a new column named "Date".
The last step removes the "Start Date" and "End Date" columns.
This code can be pasted directly into the Advanced Editor of a new blank query.
let
    Source = #table(
        type table
            [
                #"ID"=number,
                #"Start Date"=date,
                #"End Date"=date,
                #"Category"=text
            ],
        {
            {1,#date(2020,1,1),#date(2020,1,2), "A"},
            {2,#date(2020,1,10),#date(2020,1,12), "A"}
        }
    ),
    SourceWithList = Table.AddColumn(Source, "Date",
        each List.Dates([Start Date], Duration.Days([End Date] - [Start Date]) + 1, #duration(1, 0, 0, 0))),
    #"Expanded DateList" = Table.ExpandListColumn(SourceWithList, "Date"),
    #"Removed Columns" = Table.RemoveColumns(#"Expanded DateList",{"Start Date", "End Date"})
in
    #"Removed Columns"
The Source statement is just needed for the example, to create the starting table.
The SourceWithList step is the M code that has to be written by hand: it adds a column using Table.AddColumn() and fills it using List.Dates().
List.Dates() requires the start date, the number of dates to generate, and the step between dates (here one day).
The number of dates is computed with Duration.Days(), which returns the number of days in the duration obtained by subtracting the start date from the end date; adding 1 includes the end date itself.
The #"Expanded DateList" step can be created through the Power Query interface by clicking "Expand to New Rows" in the column menu. The screenshots I took are from Power BI, but the Power Query Editor in Excel is very similar.
Then remove the "Start Date" and "End Date" columns by selecting them and clicking "Remove Columns".
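Since the original goal was a count of active IDs per month and category, the expanded rows can also be grouped directly in Power Query, so the PivotTable only has to display the result. This is a sketch under assumptions: the sample rows are invented, and an ID is treated as active in a month if any day of its interval falls in that month.

```powerquery
let
    // Hypothetical sample rows with the same columns as the question's table
    Source = #table(
        type table [ID = number, #"Start Date" = date, #"End Date" = date, Category = text],
        {
            {1, #date(2020, 1, 10), #date(2020, 3, 5), "A"},
            {2, #date(2020, 2, 1), #date(2020, 2, 20), "B"}
        }),
    // One row per day in each interval (same List.Dates technique as above)
    Expanded = Table.ExpandListColumn(
        Table.AddColumn(Source, "Date",
            each List.Dates([Start Date], Duration.Days([End Date] - [Start Date]) + 1, #duration(1, 0, 0, 0))),
        "Date"),
    // Map every day to the first day of its month
    WithMonth = Table.AddColumn(Expanded, "Month", each Date.StartOfMonth([Date]), type date),
    // Count each ID once per month and category
    Counts = Table.Group(WithMonth, {"Month", "Category"},
        {{"Active IDs", each List.Count(List.Distinct([ID])), Int64.Type}})
in
    Counts
```

Expanding to daily rows keeps the logic simple; for intervals spanning many years the intermediate table grows accordingly.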
I am using a spreadsheet to log the completed and in-progress tasks of a project. I want to write some VBA code that can identify the latest delivery date within a task. However, each task has various sub tasks.
So the boundaries are the tasks, which are whole numbers, and in between these whole numbers, e.g. 46 and 47, are the sub tasks.
The latest date needs to be found by examining the dates of the sub tasks between each pair of whole numbers, e.g. 46.1, 46.2, 46.3, etc.
Would I be better off using Excel functions, or would it be easier to use code?
For example, this is the Excel formula I would write from VBA:
Worksheets("Activity Overview").Cells(n, "E").Value = "=IFERROR(IF(AGGREGATE(14,7,'Sub Tasks'!S:S/(('Sub Tasks'!A:A>='Activity Overview'!A" & n & ")*('Sub Tasks'!A:A<'Activity Overview'!A" & n + 1 & ")),1),AGGREGATE(14,7,'Sub Tasks'!S:S/(('Sub Tasks'!A:A>='Activity Overview'!A" & n & ")*('Sub Tasks'!A:A<'Activity Overview'!A" & n + 1 & ")),1),""""),"""")"
Use MAXIFS():
=MAXIFS(B:B,A:A,">="&E1,A:A,"<"&E1+1)
If one does not have MAXIFS then use AGGREGATE:
=AGGREGATE(14,7,$B$1:$B$6/(($A$1:$A$6>=E1)*($A$1:$A$6<E1+1)),1)
AGGREGATE is an array-type formula, and as such the references should be limited to the data range.
Here's a solution using SUMPRODUCT. It filters all values greater than or equal to the base value (>=49) and less than the base value plus one (<50).
You can also do this using Power Query, aka Get & Transform, available in Excel 2010+.
Get Data from Table/Range
Change the Target Delivery Date column to the Date type (it defaults to Date/Time)
Create a custom column containing the integer part of the Task column and name it Main Task
=Number.RoundDown([Task])
Delete the original Task column
Group By the Main Task column
New Column Name: Latest Date
Operation: Max
Column: Target Delivery Date
If there are any nulls in the Latest Date column (from tasks with no delivery dates), you can either leave them or filter them out.
Data
Results
M-Code
let
    Source = Excel.CurrentWorkbook(){[Name="Table3"]}[Content],
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Task", type number}, {"Target Delivery Date", type date}}),
    // Number.RoundDown keeps the integer part; Int64.From would round 46.6 up to 47
    #"Added Custom" = Table.AddColumn(#"Changed Type", "Main Task", each Number.RoundDown([Task]), Int64.Type),
    #"Grouped Rows" = Table.Group(#"Added Custom", {"Main Task"}, {{"Latest Date", each List.Max([Target Delivery Date]), type date}}),
    #"Filtered Rows" = Table.SelectRows(#"Grouped Rows", each ([Latest Date] <> null))
in
    #"Filtered Rows"
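When deriving the main task number, note that Int64.From rounds to the nearest integer rather than truncating, so a sub task such as 46.6 would be grouped under task 47. Pasting the query below into a blank query shows the difference side by side; the list of task numbers is made up for illustration.

```powerquery
let
    // Hypothetical task numbers, including sub tasks past the .5 mark
    Tasks = {46, 46.1, 46.4, 46.6, 46.9, 47},
    // Int64.From rounds to the nearest integer: 46.6 and 46.9 become 47
    Rounded = List.Transform(Tasks, each Int64.From(_)),
    // Number.RoundDown keeps the integer part: every 46.x stays 46
    Truncated = List.Transform(Tasks, each Number.RoundDown(_))
in
    Table.FromColumns({Tasks, Rounded, Truncated}, {"Task", "Int64.From", "Number.RoundDown"})
```

Number.RoundDown rounds toward negative infinity, which is the same as taking the integer part for the positive task numbers used here.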
I'm trying to create a query that sums up a column of values and puts the sum as a new row in the same table. I know I can do this using the Group By function, but it doesn't do exactly what I need. I'm trying to create an accounting journal entry, and I need to calculate the offsetting entry for a long list of debits. I know this is accountant talk. Here's a sample of the table I am using:
Date | GL Num | GL Name | Location | Amount
1/31 | 8000 | Payroll | Office | 7000.00
1/31 | 8000 | Payroll | Remote | 1750.00
1/31 | 8000 | Payroll | City | 1800.00
1/31 | 8010 | Taxes | Office | 600.00
1/31 | 8010 | Taxes | Remote | 225.00
1/31 | 8010 | Taxes | City | 240.00
1/31 | 3000 | Accrual | All | (This needs to be the negative sum of all other rows)
I have been using Group By, grouping by Date with the sum of Amount as the result, but that eliminates the original rows and every other column except Date. I need to keep all rows and columns, putting the sum in the same Amount column if possible. If the sum has to be in a new column, I can work with that as long as the other columns and rows remain. I also need to enter the GL Num, GL Name, and Location values for this sum row. These three values will not change: they will always be 3000, Accrual, and All. The date will change based on the date used in the actual data. I would prefer to do this all in Power Query (Get & Transform) if possible. I can do it via VBA, but I'm trying to make this effortless for others to use.
What you can do is calculate the accrual rows in a separate query and then append them.
Duplicate your query.
Group by Date and sum over Amount. This should return the following:
Date | Amount
1/31 | 11615
Multiply your Amount column by -1. (Transform > Standard > Multiply)
Add custom columns for GL Num, GL Name and Location with the fixed values you choose.
Date | Amount | GL Num | GL Name | Location
1/31 | -11615 | 3000 | Accrual | All
Append this table to your original. (Home > Append Queries.)
You can also roll this all up into a single query like this:
let
    Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
    // Amount is typed as a decimal number so cents are kept (Int64.Type would round them)
    OriginalTable = Table.TransformColumnTypes(Source,{{"Date", type date}, {"GL Num", Int64.Type}, {"GL Name", type text}, {"Location", type text}, {"Amount", type number}}),
    #"Grouped Rows" = Table.Group(OriginalTable, {"Date"}, {{"Amount", each List.Sum([Amount]), type number}}),
    #"Multiplied Column" = Table.TransformColumns(#"Grouped Rows", {{"Amount", each _ * -1, type number}}),
    #"Added Custom" = Table.AddColumn(#"Multiplied Column", "GL Num", each 3000, Int64.Type),
    #"Added Custom1" = Table.AddColumn(#"Added Custom", "GL Name", each "Accrual", type text),
    #"Added Custom2" = Table.AddColumn(#"Added Custom1", "Location", each "All", type text),
    #"Appended Query" = Table.Combine({OriginalTable, #"Added Custom2"})
in
    #"Appended Query"
Note that we are appending the last step to an earlier step in the same query instead of referencing a different query.
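A simple way to confirm that the appended rows really offset the originals is to group the combined table by Date and check that each net Amount is zero. The sketch below does this on the sample rows from the question; it inlines them with #table instead of reading Table1 and assumes the year 2021 for the 1/31 dates, since the question gives no year.

```powerquery
let
    // Sample rows from the question; the year 2021 is an assumption
    OriginalTable = #table(
        type table [Date = date, #"GL Num" = number, #"GL Name" = text, Location = text, Amount = number],
        {
            {#date(2021, 1, 31), 8000, "Payroll", "Office", 7000},
            {#date(2021, 1, 31), 8000, "Payroll", "Remote", 1750},
            {#date(2021, 1, 31), 8000, "Payroll", "City", 1800},
            {#date(2021, 1, 31), 8010, "Taxes", "Office", 600},
            {#date(2021, 1, 31), 8010, "Taxes", "Remote", 225},
            {#date(2021, 1, 31), 8010, "Taxes", "City", 240}
        }),
    // Negative total per date, i.e. the offsetting accrual amount
    Accruals = Table.Group(OriginalTable, {"Date"}, {{"Amount", each -List.Sum([Amount]), type number}}),
    WithGL = Table.AddColumn(Accruals, "GL Num", each 3000, type number),
    WithName = Table.AddColumn(WithGL, "GL Name", each "Accrual", type text),
    WithLocation = Table.AddColumn(WithName, "Location", each "All", type text),
    Combined = Table.Combine({OriginalTable, WithLocation}),
    // Each date's debits plus its accrual row should net to zero
    Check = Table.Group(Combined, {"Date"}, {{"Net", each List.Sum([Amount]), type number}})
in
    Check
```

For this sample the single 2021-01-31 row should show a Net of 0, since the debits total 11615 and the accrual row is -11615.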
In Excel, what VBA code will help me explode/enrich data in table A by applying the % shares in table B to produce the desired output in table C? Not all companies need to be enriched.
screenshot of relevant tables in Excel
I envisage some loop to match on company name and then to enrich Table A by inserting the rows needed to show the resulting shared $ by team.
I'd use Power Query for that, not VBA. Load both Table A and Table B into Power Query. Then create a query that merges the two tables on the company column using a full outer join. Then expand the Team and PercShare columns. Create a new column that calculates the $ value, delete the columns that aren't required, and replace the null values in the Team column with blanks. The result looks like this:
M Code generated by clicking the buttons in Power Query Editor and entering one IF statement manually looks like this:
let
    Source = Table.NestedJoin(TableA,{"company"},TableB,{"company"},"NewColumn",JoinKind.FullOuter),
    #"Expanded NewColumn" = Table.ExpandTableColumn(Source, "NewColumn", {"Team", "PercShare"}, {"Team", "PercShare"}),
    #"Added Custom" = Table.AddColumn(#"Expanded NewColumn", "Dollars", each if [Team] = null then [dollar] else [dollar] *([PercShare]/100)),
    #"Removed Columns" = Table.RemoveColumns(#"Added Custom",{"PercShare"}),
    #"Replaced Value" = Table.ReplaceValue(#"Removed Columns",null,"",Replacer.ReplaceValue,{"Team"})
in
    #"Replaced Value"
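To try the same steps without the workbook, the sketch below inlines two small hypothetical tables (the companies, teams, and amounts are invented; the column names company, dollar, Team, and PercShare come from the query above). It swaps in JoinKind.LeftOuter, which is an assumption on my part: a left outer join keeps every row of Table A but drops companies that appear only in Table B, which a full outer join would bring in with an empty dollar value.

```powerquery
let
    // Hypothetical Table A: one dollar total per company
    TableA = #table(type table [company = text, dollar = number],
        {{"Acme", 1000}, {"Beta", 500}}),
    // Hypothetical Table B: percentage shares per team (0-100 scale assumed)
    TableB = #table(type table [company = text, Team = text, PercShare = number],
        {{"Acme", "North", 60}, {"Acme", "South", 40}, {"Gamma", "East", 100}}),
    // Left outer join keeps all of Table A; Gamma (only in Table B) is dropped
    Joined = Table.NestedJoin(TableA, {"company"}, TableB, {"company"}, "NewColumn", JoinKind.LeftOuter),
    // One row per matching team; companies with no shares get a null Team
    Expanded = Table.ExpandTableColumn(Joined, "NewColumn", {"Team", "PercShare"}),
    // Split the dollars by share, or keep the full amount when there is no team
    WithDollars = Table.AddColumn(Expanded, "Dollars",
        each if [Team] = null then [dollar] else [dollar] * [PercShare] / 100, type number),
    Result = Table.RemoveColumns(WithDollars, {"PercShare"})
in
    Result
```

Acme should be split into 600 for North and 400 for South, while Beta keeps its 500 with no team.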