Merging two tables by the closest date and ID in Power Query - Excel

I have two Excel workbooks: one contains information about when a truck departs a depot, and the other about when it is received at another depot. Each file contains the following in separate columns:
Departure: Departure Date, Departure Time, Truck ID, Cargo info
Receive: Arrival Date, Arrival Time, Truck ID
How can I merge these two tables so the receiving table can be populated with the cargo from the departure table using Power Query?
As you can see, the same truck sometimes makes separate trips on the same date, so it would be great to allocate the cargo based on the closest date and time for a particular truck ID. Clearly, a truck cannot arrive before it has left.
I want to do this using Power Query, but I have been scratching my head over how to do it. Any help is greatly appreciated.
My two toy data files can be downloaded here

Power Query version:
Load the Departure table (here, Table1) into Power Query
Change the data type of the date column to date, and of the time column to time
let Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Date", type date}, {"Time", type time}, {"Truck_ID", type text}, {"Cargo", type text}})
in #"Changed Type"
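File => Close & Load To... => Only Create Connection
Load the Receive table into Power Query
Change the data type of the date column to date, and of the time column to time
Add Column => Custom Column. The dialog wraps its formula in each, so afterwards edit the step in the formula bar so that the column function is: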
= (i)=> Table.FirstN(Table.Sort(Table.SelectRows(Table1, each [Date] = i[Date] and [Truck_ID] = i[Truck_ID] and [Time]<=i[Time]),{{"Time", Order.Descending}}),1)
This finds all the rows in Table1 (departures) with the same date, the same truck ID, and a time less than or equal to the receive time. It then sorts them in descending time order and takes the first row, which is the closest earlier departure. Note that it only matches departures on the same date as the arrival.
Finally, expand the Cargo column using the arrows at the top of the new column
Full code for Table2 (Receive table)
let Source = Excel.CurrentWorkbook(){[Name="Table2"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Date", type date}, {"Time", type time}, {"Truck_ID", type text}}),
#"Added Custom" = Table.AddColumn(#"Changed Type", "Custom", (i)=> Table.FirstN(Table.Sort(Table.SelectRows(Table1, each [Date] = i[Date] and [Truck_ID] = i[Truck_ID] and [Time]<=i[Time]),{{"Time", Order.Descending}}),1)),
#"Expanded Custom" = Table.ExpandTableColumn(#"Added Custom", "Custom", {"Cargo"}, {"Cargo"})
in #"Expanded Custom"
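To make the per-row lookup concrete outside Power Query, here is a minimal plain-Python sketch of the same rule: keep departures with the same date and truck ID whose time is not after the arrival, then take the latest one. The sample rows and the function names `closest_departure` and `add_cargo` are hypothetical; only the field names mirror the M code.

```python
from datetime import date, time


def closest_departure(arrival, departures):
    """Latest departure with the same Date and Truck_ID whose Time is not after the arrival, or None."""
    candidates = [
        d for d in departures
        if d["Date"] == arrival["Date"]
        and d["Truck_ID"] == arrival["Truck_ID"]
        and d["Time"] <= arrival["Time"]
    ]
    # Same effect as sorting by Time descending and keeping only the first row
    return max(candidates, key=lambda d: d["Time"], default=None)


def add_cargo(receipts, departures):
    """Copy each receipt row and attach the Cargo of its closest earlier departure (None if no match)."""
    result = []
    for r in receipts:
        match = closest_departure(r, departures)
        result.append({**r, "Cargo": match["Cargo"] if match else None})
    return result


# Hypothetical sample data: truck T1 makes two trips on the same day
departures = [
    {"Date": date(2021, 5, 3), "Time": time(8, 0), "Truck_ID": "T1", "Cargo": "Apples"},
    {"Date": date(2021, 5, 3), "Time": time(13, 0), "Truck_ID": "T1", "Cargo": "Pears"},
    {"Date": date(2021, 5, 3), "Time": time(9, 30), "Truck_ID": "T2", "Cargo": "Plums"},
]
receipts = [
    {"Date": date(2021, 5, 3), "Time": time(11, 0), "Truck_ID": "T1"},
    {"Date": date(2021, 5, 3), "Time": time(16, 0), "Truck_ID": "T1"},
    {"Date": date(2021, 5, 3), "Time": time(12, 0), "Truck_ID": "T2"},
]

for row in add_cargo(receipts, departures):
    print(row["Truck_ID"], row["Time"], row["Cargo"])
```

Returning None when nothing matches corresponds to the empty table that Table.FirstN yields in M, which expands to null.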

And here's another method, using sorting and grouping to build the list.
The algorithm is explained in the code comments.
M Code
let
//Generated by the UI if you select to get Data from Folder
//and the folder has only your tables to process.
//If there are other tables, add a Filter to just select the two you want.
Source = Folder.Files("C:\Users\ron\Desktop\BillyJo"),
#"Filtered Hidden Files1" = Table.SelectRows(Source, each [Attributes]?[Hidden]? <> true),
#"Invoke Custom Function1" = Table.AddColumn(#"Filtered Hidden Files1", "Transform File", each #"Transform File"([Content])),
#"Renamed Columns1" = Table.RenameColumns(#"Invoke Custom Function1", {"Name", "Source.Name"}),
//Keep only the Source.Name and Transform Files columns
#"Removed Other Columns1" = Table.SelectColumns(#"Renamed Columns1", {"Source.Name", "Transform File"}),
//Combine the two tables
combin = Table.Combine(#"Removed Other Columns1"[Transform File]),
#"Changed Type" = Table.TransformColumnTypes(combin,{{"Date", type date}, {"Time", type time}}),
//create dateTime column to more easily compare the times
fullDate = Table.AddColumn(#"Changed Type","fullDate", each [Date] & [Time], type datetime),
//sort by full date ascending
#"Sorted Rows" = Table.Sort(fullDate,{{"fullDate", Order.Ascending}}),
//group by Truck_ID, then fill down the Cargo column and extract the Even rows for the Received table
#"Grouped Rows" = Table.Group(#"Sorted Rows", {"Truck_ID"}, {
{"all", each Table.AlternateRows(Table.FillDown(_,{"Cargo"}),0,1,1),
type table [Date=nullable date, Time=nullable time, Truck_ID=text, Cargo=nullable text, fullDate=datetime]}
}),
//Remove unneeded column and
//Expand the table produced by the Grouping
#"Removed Columns" = Table.RemoveColumns(#"Grouped Rows",{"Truck_ID"}),
#"Expanded all" = Table.ExpandTableColumn(#"Removed Columns", "all", {"Date", "Time", "Truck_ID", "Cargo"}, {"Date", "Time", "Truck_ID", "Cargo"})
in
#"Expanded all"
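The sort, fill-down and alternate-rows idea can be sketched in plain Python as below. It assumes, as the M code implicitly does, that after sorting each truck's rows strictly alternate departure/receive, that receive rows have no Cargo, and that a combined `fullDate` timestamp already exists. The function name and sample rows are hypothetical.

```python
from datetime import datetime
from itertools import groupby


def pair_by_alternation(rows):
    """rows: combined departure + receive rows; receive rows have Cargo=None."""
    # Sort by truck, then timestamp (groupby needs rows with the same truck to be adjacent)
    ordered = sorted(rows, key=lambda r: (r["Truck_ID"], r["fullDate"]))
    out = []
    for _, grp in groupby(ordered, key=lambda r: r["Truck_ID"]):
        filled, last_cargo = [], None
        for r in grp:
            # Fill down: a row without Cargo inherits the Cargo of the row above it
            if r["Cargo"] is not None:
                last_cargo = r["Cargo"]
            filled.append({**r, "Cargo": last_cargo})
        # Like Table.AlternateRows(t, 0, 1, 1): skip one row, keep one row, repeat
        out.extend(filled[1::2])
    return out


# Hypothetical stacked data
rows = [
    {"Truck_ID": "T1", "fullDate": datetime(2021, 5, 3, 8, 0), "Cargo": "Apples"},
    {"Truck_ID": "T1", "fullDate": datetime(2021, 5, 3, 11, 0), "Cargo": None},
    {"Truck_ID": "T1", "fullDate": datetime(2021, 5, 3, 13, 0), "Cargo": "Pears"},
    {"Truck_ID": "T1", "fullDate": datetime(2021, 5, 3, 16, 0), "Cargo": None},
    {"Truck_ID": "T2", "fullDate": datetime(2021, 5, 3, 9, 30), "Cargo": "Plums"},
    {"Truck_ID": "T2", "fullDate": datetime(2021, 5, 3, 12, 0), "Cargo": None},
]

for r in pair_by_alternation(rows):
    print(r["Truck_ID"], r["fullDate"], r["Cargo"])
```

If a truck has a departure without a matching receive (or vice versa), the alternation breaks and later rows for that truck are paired wrongly, so this approach is best suited to clean data.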

This should do it:
Stack the tables
Sort by Truck and Time
= Table.Sort(#"Replaced Value",{{"Truck_ID", Order.Descending},{"Time", Order.Descending}})
Add an index column starting at 0, then integer-divide it by two, so that each departure and its arrival share the same value
Now you can reference this query twice from this point: filter it to Arrivals in one reference and to Departures in the other, then merge the two using Index as the key.
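A minimal Python sketch of the stack/sort/index pairing, assuming each departure is immediately followed in time by its own arrival for the same truck. It uses a 0-based index, so rows 0 and 1 share key 0, rows 2 and 3 share key 1, and so on. The `When` timestamp field, the function name and the sample rows are hypothetical.

```python
from datetime import datetime


def pair_by_index(departures, arrivals):
    """Return (Truck_ID, arrival time, Cargo) for each arrival."""
    # Stack the two tables, tagging each row with its source
    stacked = [{**d, "Kind": "Departure"} for d in departures] + \
              [{**a, "Kind": "Arrival"} for a in arrivals]
    # Sort by truck, then timestamp: each departure is now directly followed by its arrival
    stacked.sort(key=lambda r: (r["Truck_ID"], r["When"]))
    # 0-based index integer-divided by 2: rows 0 and 1 get key 0, rows 2 and 3 get key 1, ...
    for i, r in enumerate(stacked):
        r["Key"] = i // 2
    # Two filtered "references" of the same table, merged on Key
    dep_by_key = {r["Key"]: r for r in stacked if r["Kind"] == "Departure"}
    return [(a["Truck_ID"], a["When"], dep_by_key[a["Key"]]["Cargo"])
            for a in stacked if a["Kind"] == "Arrival"]


# Hypothetical sample data
deps = [
    {"Truck_ID": "T1", "When": datetime(2021, 5, 3, 8, 0), "Cargo": "Apples"},
    {"Truck_ID": "T1", "When": datetime(2021, 5, 3, 13, 0), "Cargo": "Pears"},
    {"Truck_ID": "T2", "When": datetime(2021, 5, 3, 9, 30), "Cargo": "Plums"},
]
arrs = [
    {"Truck_ID": "T1", "When": datetime(2021, 5, 3, 11, 0)},
    {"Truck_ID": "T1", "When": datetime(2021, 5, 3, 16, 0)},
    {"Truck_ID": "T2", "When": datetime(2021, 5, 3, 12, 0)},
]

for truck, when, cargo in pair_by_index(deps, arrs):
    print(truck, when, cargo)
```

Sorting descending instead also works with a 0-based index: each pair then holds the arrival first, but both rows still share a key.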

Related

PowerQuery - Forecast from table

I am trying to create a single forecast table in which departments enter their spending assumptions. Instead of entering amounts for every month, I would like the user to enter the amount, frequency, start date, and end date for each category. To illustrate, see the table below with some sample data.
This is the result I am trying to get in Power Query (or Power BI); as I understand it, this layout is what allows date slicers and filters to work in a Power BI model when comparing against actuals.
If this can't be done with DAX and must instead be done in Excel (through lookup formulas), how would you structure the formula?
Here is a PQ example that creates your desired table from the input you show:
To use Power Query
Select some cell in your Data Table
Data => Get&Transform => From Table/Range
When the PQ Editor opens: Home => Advanced Editor
Make note of the Table Name in Line 2
Paste the M Code below in place of what you see
Change the Table name in line 2 back to what was generated originally.
Read the comments and explore the Applied Steps to better understand the algorithm
M Code
let
Source = Excel.CurrentWorkbook(){[Name="Table9"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"G/L", Int64.Type}, {"Dimension", type text}, {"Description", type text},
{"Amount", Int64.Type}, {"Repeat Every", type text}, {"Start Date", type date}, {"End Date", type date}}),
//Last possible date as Today + 5 years (to end of month)
lastDt = Date.EndOfMonth(Date.AddYears(Date.From(DateTime.FixedLocalNow()),5)),
//Generate list of all possible dates for a given row using List.Generate function
allDates = Table.AddColumn(#"Changed Type", "allDates", each let
lastDate = List.Min({lastDt,[End Date]}),
intvl = {1,3,6}{List.PositionOf({"Monthly","Quarterly","Semi Annual"},[Repeat Every])}
in
List.Generate(
()=> [Start Date],
each _ <= lastDate,
each Date.EndOfMonth(Date.AddMonths(_,intvl)))),
//Remove unneeded columns and expand the list of dates
#"Removed Columns" = Table.RemoveColumns(allDates,{"Repeat Every", "Start Date", "End Date"}),
#"Expanded allDates" = Table.ExpandListColumn(#"Removed Columns", "allDates"),
//Sort to get desired output
// Date column MUST be sorted to ensure correct order when pivoted
// Other columns sorted alphanumerically, but could change the sort to reflect original order if preferred.
#"Sorted Rows" = Table.Sort(#"Expanded allDates",{
{"allDates", Order.Ascending},
{"G/L", Order.Ascending},
{"Dimension", Order.Ascending}}),
//Pivot the date column with no aggregation
#"Pivoted Column" = Table.Pivot(
Table.TransformColumnTypes(#"Sorted Rows", {
{"allDates", type text}}, "en-US"),
List.Distinct(Table.TransformColumnTypes(#"Sorted Rows", {{"allDates", type text}}, "en-US")[allDates]),
"allDates", "Amount")
in
#"Pivoted Column"
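To check the schedule logic by hand, this Python sketch mirrors the `List.Generate` step: the first date is the start date itself, each later date is the end of the month `intvl` months on, and generation stops after the earlier of the row's end date and the global last date. The helper names are hypothetical, and `add_months` is a hand-written stand-in for Date.AddMonths, not a library call.

```python
import calendar
from datetime import date

# Frequency labels taken from the M code
INTERVALS = {"Monthly": 1, "Quarterly": 3, "Semi Annual": 6}


def end_of_month(d):
    """Last calendar day of d's month."""
    return date(d.year, d.month, calendar.monthrange(d.year, d.month)[1])


def add_months(d, n):
    """Move n months forward, clamping the day so the result stays in the target month."""
    total = d.month - 1 + n
    y, m = d.year + total // 12, total % 12 + 1
    return date(y, m, min(d.day, calendar.monthrange(y, m)[1]))


def schedule(start, end, repeat, last_dt):
    """Dates on which the amount recurs for one input row."""
    # Like List.Min({lastDt, [End Date]}), a missing end date falls back to last_dt
    last = last_dt if end is None else min(last_dt, end)
    step = INTERVALS[repeat]
    dates, d = [], start
    while d <= last:
        dates.append(d)
        d = end_of_month(add_months(d, step))
    return dates


print(schedule(date(2021, 1, 31), date(2021, 12, 31), "Quarterly", date(2026, 12, 31)))
```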
Original Data
Results

Group by column A value, transpose column B, column C row values for each grouped column A value

This is in Excel 2016. I have a spreadsheet where each row represents one student's responses to two questions, "Qa" and "Qb". The columns are "Section" (the class section the student is in), "Qa", and "Qb".
Thus, if three students from the same class section answered, that section is listed three times under "Section", with each student's answers in the other columns.
I want to group by section and spread the answers to each question across a single row in separate columns. The number of columns to create should be set by the section with the most responses.
In this case, 10003 has the most responses, so I want the following end result.
I am at a loss with how to get this going. Something like grouping by the section but transposing the rows within that group?
As @ScottCraner pointed out, you can obtain your desired output using Power Query, available in Windows Excel 2010+ and Office 365 Excel
Select some cell in your original table
Data => Get&Transform => From Table/Range
When the PQ UI opens, navigate to Home => Advanced Editor
Make note of the Table Name in Line 2 of the code.
Replace the existing code with the M-Code below
Change the table name in line 2 of the pasted code to your "real" table name
Examine any comments, and also the Applied Steps window, to better understand the algorithm and steps
M Code
let
//Change table name in next row to actual table name in workbook
Source = Excel.CurrentWorkbook(){[Name="Table20"]}[Content],
//set data type
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Section", Int64.Type}, {"Qa", type text}, {"Qb", type text}}),
//Group by Section
//Add a 1-based Index column to each Group
#"Grouped Rows" = Table.Group(#"Changed Type", {"Section"}, {
{"Row", each Table.AddIndexColumn(_,"Row",1,1)}}),
//Expand the grouped tables
#"Expanded Row" = Table.ExpandTableColumn(#"Grouped Rows", "Row", {"Qa", "Qb", "Row"}, {"Qa", "Qb", "Row"}),
//Unpivot
//Merge Row and Attribute columns to create the q-number headers
#"Unpivoted Other Columns" = Table.UnpivotOtherColumns(#"Expanded Row", {"Section", "Row"}, "Attribute", "Value"),
#"Merged Columns" = Table.CombineColumns(Table.TransformColumnTypes(#"Unpivoted Other Columns",
{{"Row", type text}}, "en-US"),{"Attribute", "Row"},
Combiner.CombineTextByDelimiter("-", QuoteStyle.None),"Merged"),
//Pivot on the Sorted Merged column with no aggregation
#"Pivoted Column" = Table.Pivot(#"Merged Columns", List.Sort(List.Distinct(#"Merged Columns"[Merged])), "Merged", "Value")
in
#"Pivoted Column"
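The group, index, unpivot and pivot sequence can be sketched in plain Python like this. As in the M code, rows keep their original order within each section, and empty answers are dropped the same way the unpivot step drops nulls. The function name and sample rows are hypothetical.

```python
def spread_by_section(rows):
    """rows: dicts with Section, Qa, Qb. Returns (sorted headers, {section: {header: answer}})."""
    groups = {}
    for r in rows:
        groups.setdefault(r["Section"], []).append(r)   # keeps original order within each section
    table = {}
    for section, grp in groups.items():
        wide = {}
        for row_no, r in enumerate(grp, start=1):        # 1-based row number within the section
            for q in ("Qa", "Qb"):
                if r[q] is not None:                     # empty answers are dropped, like unpivoting nulls
                    wide[f"{q}-{row_no}"] = r[q]
        table[section] = wide
    # Column headers: every distinct "question-row" label, sorted as text
    headers = sorted({h for wide in table.values() for h in wide})
    return headers, table


# Hypothetical sample data: section 10003 has four Qb answers but only three Qa answers
rows = [
    {"Section": 10001, "Qa": "a1", "Qb": "b1"},
    {"Section": 10001, "Qa": "a2", "Qb": "b2"},
    {"Section": 10003, "Qa": "x1", "Qb": "y1"},
    {"Section": 10003, "Qa": "x2", "Qb": "y2"},
    {"Section": 10003, "Qa": "x3", "Qb": "y3"},
    {"Section": 10003, "Qa": None, "Qb": "y4"},
]

headers, table = spread_by_section(rows)
print(headers)
for section, wide in table.items():
    print(section, [wide.get(h) for h in headers])
```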
Note that there are no empty columns (in other words, there is no Qa-4 column), because the unpivot step drops null values.
If you really need an empty column, insert a step near the beginning that replaces nulls with blanks:
let
//Change table name in next row to actual table name in workbook
Source = Excel.CurrentWorkbook(){[Name="Table20"]}[Content],
//set data type
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Section", Int64.Type}, {"Qa", type text}, {"Qb", type text}}),
//if you really need a blank Qa column since you have four distinct Qb rows but only 3 Qa rows,
// then we insert the next line
#"Replaced Value" = Table.ReplaceValue(#"Changed Type",null,"",Replacer.ReplaceValue,{"Qa", "Qb"}),
//Group by Section
//Add a 1-based Index column to each Group
#"Grouped Rows" = Table.Group(#"Replaced Value", {"Section"}, {
{"Row", each Table.AddIndexColumn(_,"Row",1,1)}}),
//Expand the grouped tables
#"Expanded Row" = Table.ExpandTableColumn(#"Grouped Rows", "Row", {"Qa", "Qb", "Row"}, {"Qa", "Qb", "Row"}),
//Unpivot
//Merge Row and Attribute columns to create the q-number headers
#"Unpivoted Other Columns" = Table.UnpivotOtherColumns(#"Expanded Row", {"Section", "Row"}, "Attribute", "Value"),
#"Merged Columns" = Table.CombineColumns(Table.TransformColumnTypes(#"Unpivoted Other Columns",
{{"Row", type text}}, "en-US"),{"Attribute", "Row"},
Combiner.CombineTextByDelimiter("-", QuoteStyle.None),"Merged"),
//Pivot on the Sorted Merged column with no aggregation
#"Pivoted Column" = Table.Pivot(#"Merged Columns", List.Sort(List.Distinct(#"Merged Columns"[Merged])), "Merged", "Value")
in
#"Pivoted Column"

Power Query: Calculate date/time instances within and over 1 day and show them as percentages (system utility time %)

I have a system data set in which I want to compare two systems (Uptimum and scrubber): the utility time (%), i.e. what percentage of each 24-hour day each system was operational, including spans that exceed 24 hours.
The data set is below. As you can see, there are gaps in the dates in Column A (date): some days are missing, and that will happen from time to time. There can also be several system changes within one day (the operation can change many times per day), which is why Column B (time) holds the time, so I can follow the exact timing of operation within a day.
There is no official "end time"; it is an ongoing process in which the operation (system) changes, along with many other parameters.
I extracted the unique dates into Column F to avoid duplicates and summed them per system (columns G and H) using the formula below (see also the screenshot):
=SUMIFS(Explog2021_04_28[T];Explog2021_04_28[D];$F2;Explog2021_04_28[System];"<>"&G$1)-SUMIFS(Explog2021_04_28[T];Explog2021_04_28[D];$F2;Explog2021_04_28[System];G$1)+(INDEX(Explog2021_04_28[System];MATCH($F2;Explog2021_04_28[D]))=G$1)-(INDEX(Explog2021_04_28[System];MATCH($F2;Explog2021_04_28[D];0))<>G$1)*$B2
With this formula I summed Columns A and B using the extracted dates and system options.
First, as you can see, I have negative percentages, which shouldn't be there. Is that because of the many gaps in the dates? Is there a better way to fix this? As you can see in the chart, it looks bad.
The total also shouldn't exceed 100% of overall usage, if that is possible.
Any input would be appreciated.
If I understand you correctly, I believe the following Power Query should accomplish what you are looking for.
Please read the code comments and step through the applied steps window to understand the algorithm. Ask if you have questions, and complain if there are logic errors.
I assumed that the system was always in either scrubber or Uptimum
M Code
let
//Read in data. Change table name in next line to reflect actual table name
Source = Excel.CurrentWorkbook(){[Name="systemTable"]}[Content],
//Type the columns
#"Changed Type" = Table.TransformColumnTypes(Source,{{"D", type text}, {"T", type any}, {"System", type text}}),
#"Changed Type with Locale" = Table.TransformColumnTypes(#"Changed Type", {{"D", type date}}, "en-150"),
#"Changed Type1" = Table.TransformColumnTypes(#"Changed Type with Locale",{{"T", type time}}),
//Combine date and time => datetime
#"Added Custom" = Table.AddColumn(#"Changed Type1", "startTime",
each DateTime.From(Number.From([D]) + Number.From([T])), type datetime),
//create shifted column to be able to quickly refer to previous row
//this method much faster than using an Index column
Base = #"Added Custom",
ShiftedList = List.RemoveFirstN(Table.Column(Base, "startTime"),1) & {null},
Custom1 = Table.ToColumns(Base) & {ShiftedList},
Custom2 = Table.FromColumns(Custom1, Table.ColumnNames(Base) & {"endTime"}),
#"Changed Type2" = Table.TransformColumnTypes(Custom2,{{"endTime", type datetime}}),
//Create a list of dates for each time span
#"Added Custom1" = Table.AddColumn(#"Changed Type2", "datesList", each
let
st = DateTime.Date([startTime]),
et = DateTime.Date([endTime] ),
dur = Duration.TotalDays(et-st)
in
if et=null then {st} else List.Dates(st,dur+1,#duration(1,0,0,0))),
//Expand the list so we have sequential dates (fill in the gaps)
#"Expanded datesList" = Table.ExpandListColumn(#"Added Custom1", "datesList"),
//Remove unneeded columns
#"Removed Columns" = Table.RemoveColumns(#"Expanded datesList",{"D", "T"}),
//change date list datatype to datetime for simpler calculation formula
#"Changed Type3" = Table.TransformColumnTypes(#"Removed Columns",{{"datesList", type datetime}}),
//calculate hours in System each day
#"Added Custom2" = Table.AddColumn(#"Changed Type3", "Hrs in Day",
each List.Min({Date.EndOfDay([datesList]),[endTime]}) - List.Max({[startTime],[datesList]}),Duration.Type),
//Remove unneeded columns
#"Removed Columns1" = Table.RemoveColumns(#"Added Custom2",{"startTime", "endTime"}),
//change date list to dates for report
#"Changed Type5" = Table.TransformColumnTypes(#"Removed Columns1",{{"datesList", type date}}),
//Group by Date and System to calculate percent time in system
#"Grouped Rows" = Table.Group(#"Changed Type5", {"datesList", "System"}, {
{"Sum", each List.Sum([Hrs in Day])/#duration(0,24,0,0), Percentage.Type}}),
//Pivot on System to generate final report
#"Pivoted Column" = Table.Pivot(#"Grouped Rows", List.Distinct(#"Grouped Rows"[System]), "System", "Sum", List.Sum),
//Rename the datelist column
#"Renamed Columns" = Table.RenameColumns(#"Pivoted Column",{{"datesList", "D"}})
in
#"Renamed Columns"
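Here is a plain-Python sketch of the same day-splitting idea, assuming each system state lasts until the next event and, as in the M code, the final event runs to the end of its day. Each span is cut at midnight and its overlap with each calendar day is expressed as a fraction of 24 hours, so missing dates are filled in, no value can be negative, and because the spans do not overlap no day can total more than 100%. The function name and sample events are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def daily_share(events):
    """events: (start datetime, system) pairs. Returns {date: {system: fraction of the day}}."""
    events = sorted(events)
    starts = [t for t, _ in events]
    last = starts[-1]
    # Each span ends at the next event; the last one runs to midnight after its start
    ends = starts[1:] + [datetime(last.year, last.month, last.day) + timedelta(days=1)]
    share = defaultdict(lambda: defaultdict(float))
    for (start, system), end in zip(events, ends):
        day = datetime(start.year, start.month, start.day)
        while day < end:
            next_day = day + timedelta(days=1)
            # Portion of this span that falls inside the current calendar day
            overlap = min(end, next_day) - max(start, day)
            share[day.date()][system] += overlap / timedelta(days=1)
            day = next_day
    return {d: dict(s) for d, s in share.items()}


# Hypothetical events; note that 2 April has no event of its own
events = [
    (datetime(2021, 4, 1, 6, 0), "Uptimum"),
    (datetime(2021, 4, 1, 18, 0), "scrubber"),
    (datetime(2021, 4, 3, 12, 0), "Uptimum"),
]
for d, s in sorted(daily_share(events).items()):
    print(d, {k: f"{v:.0%}" for k, v in s.items()})
```

The first day totals less than 100% here because the state before the first event is unknown.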
Data
Results

In Excel or Power BI, how can I assign 'ongoing project' to each date in a year, when I only have the start and end date of each project?

I have two tables (see attached workbook).
One has the names of all projects in the first column, the start date of each project in the second column, and the end date in the third column.
The other table has a column with all the dates in a year, and I want to add several columns to it. My question is how to get the one I coloured yellow in the workbook: it should contain the project that will be/was in progress on each day of the year.
I hope the workbook will illustrate my problem.
Sneak peek:
Table one

Project ID | Start Date | End Date
A | 2/1/2020 | 3/1/2020
B | 5/1/2020 | 10/1/2020
Etc. | Etc. | Etc.

Table two

Each Date in a year | Ongoing project
1/1/2020 |
2/1/2020 | A
3/1/2020 | A
4/1/2020 |
5/1/2020 | B
Etc. | Etc.
So far I have tried several approaches: INDEX/MATCH, XLOOKUP, and dynamic arrays.
Edit:
Excel Wizard (YouTube) provided a solution that helped me out.
=TEXTJOIN(",",TRUE,REPT(TableOne[Project ID],([@[Each Date in a year]]>=TableOne[Start Date])*([@[Each Date in a year]]<=TableOne[End Date])))
In Power Query you could:
Transform your table 1 into a table where each ProjectID/Date has a single row
Create a second table consisting of all the dates in the time period
Join the two tables with a JoinKind.FullOuter
M Code
let
Source = Excel.CurrentWorkbook(){[Name="Table5"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{
{"Project ID", type text}, {"Start Date", type date}, {"End Date", type date}}),
//show one row for each projectID/Date
#"Added Custom" = Table.AddColumn(#"Changed Type", "dtRange", each
List.Dates([Start Date], Duration.TotalDays([End Date] - [Start Date]) + 1,#duration(1,0,0,0))),
#"Expanded dtRange" = Table.ExpandListColumn(#"Added Custom", "dtRange"),
#"Changed Type1" = Table.TransformColumnTypes(#"Expanded dtRange",{{"dtRange", type date}}),
#"Removed Columns" = Table.RemoveColumns(#"Changed Type1",{"Start Date", "End Date"}),
//not sure what you want for the calendar range
//but you can set it in the next two steps
dtStart = #date(2021,12,3),
calDays = 365,
dtTbl = Table.TransformColumnTypes(
Table.FromList(
List.Dates(dtStart,calDays,#duration(1,0,0,0)),
Splitter.SplitByNothing(),{"Dates"},null,ExtraValues.Error),
{{"Dates", type date}}),
//combine the two tables
joinTbl = Table.Join(dtTbl,"Dates",#"Removed Columns","dtRange",JoinKind.FullOuter),
#"Removed Columns1" = Table.RemoveColumns(joinTbl,{"dtRange"}),
#"Sorted Rows" = Table.Sort(#"Removed Columns1",{{"Dates", Order.Ascending}}),
#"Renamed Columns" = Table.RenameColumns(#"Sorted Rows",{{"Project ID", "Ongoing Project"}})
in
#"Renamed Columns"
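The expand-and-join approach, sketched in plain Python. It reads the sample dates as day/month (so 2/1/2020 is 2 January 2020, which is an assumption), treats start and end dates as inclusive, and keeps every calendar date like a left join from the calendar; the M code's full outer join would additionally keep project dates outside the calendar range. A date covered by several projects produces several rows. The function name and sample data are hypothetical.

```python
from datetime import date, timedelta


def ongoing_projects(projects, cal_start, cal_days):
    """projects: (project_id, start, end) with inclusive dates. Returns (date, project or None) rows."""
    # One entry per project/date, like expanding the dtRange list column
    by_date = {}
    for pid, start, end in projects:
        for k in range((end - start).days + 1):
            by_date.setdefault(start + timedelta(days=k), []).append(pid)
    # Calendar of consecutive days, joined to the expanded project dates
    days = [cal_start + timedelta(days=k) for k in range(cal_days)]
    return [(d, p) for d in days for p in by_date.get(d, [None])]


projects = [("A", date(2020, 1, 2), date(2020, 1, 3)), ("B", date(2020, 1, 5), date(2020, 1, 10))]
for d, p in ongoing_projects(projects, date(2020, 1, 1), 6):
    print(d, p)
```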
Sample Data
Results (note that multiple non-project date rows are hidden)

Excel calculate value for new category based on other group categories

I am getting data from a database in long format, and I need to compute ratios between values that belong to different categories. For example, I want the average price based on revenue and quantity sold.
Is there an easy way to calculate this in a pivot once I have the data?
My MWE would look like this
And I would like to calculate new rows with the category Price
One way would probably be to do this in MS SQL beforehand, but I am not very skilled with it, and my colleagues need to be able to do this in Excel themselves.
In Power Query, you can
Group the Rows by Year
From the resultant tables, divide the 1st Value by the 2nd (this assumes Revenue is listed before Quantity for each year).
Paste the code below into the Advanced Editor, and change the table name in Line 2 to reflect the actual table name of your data. Then you can explore the "Applied Steps" in the UI to see how the code was generated.
Changing the data table will change the Query results, but you will need to "Refresh" the query. This can be done from the Ribbon, or you can create a Button on the worksheet.
M-Code
let
Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
#"Grouped Rows" = Table.Group(Source, {"Year"}, {{"Grouped", each _, type table [Year=number, Category=text, Value=number]}}),
#"Added Custom" = Table.AddColumn(#"Grouped Rows", "Price",
each Table.Column([Grouped],"Value"){0} /
Table.Column([Grouped],"Value"){1})
in
#"Added Custom"
Edit: From your comments, it seems you might have more than just Revenue/Quantity categories for each year, and it is also possible you might have more than a single Revenue/Quantity pair.
Below is code that takes that into account: it breaks the Revenue and Quantity for each year into two columns, then divides one by the other, which results in a weighted average price for each year:
let
Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
//needed only if you have blank rows in the table
#"Filtered Rows" = Table.SelectRows(Source, each ([Year] <> null)),
//Group by Year
#"Grouped Rows" = Table.Group(#"Filtered Rows", {"Year"}, {{"Grouped", each _, type table [Year=number, Category=text, Value=number]}}),
//Extract Revenue and Quantity into two new columns of Lists
#"Added Custom" = Table.AddColumn(#"Grouped Rows", "Revenue", each Table.Column(Table.SelectRows([Grouped], each ([Category] = "Revenue")),"Value")),
#"Added Custom1" = Table.AddColumn(#"Added Custom", "Quantity", each Table.Column(Table.SelectRows([Grouped], each ([Category] = "Quantity")),"Value")),
//Sum the Revenue list and divide by the sum of the Quantity list
//This results in a weighted average if there is more than one Revenue/Quantity pair in a year
#"Added Custom2" = Table.AddColumn(#"Added Custom1", "Price", each List.Sum([Revenue]) / List.Sum([Quantity])),
#"Removed Columns" = Table.RemoveColumns(#"Added Custom2",{"Grouped", "Revenue", "Quantity"}),
//Some cleanup
#"Changed Type" = Table.TransformColumnTypes(#"Removed Columns",{{"Year", Int64.Type}, {"Price", Currency.Type}})
in
#"Changed Type"
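The weighted-average rule from the second query, sketched in plain Python: per year, total Revenue divided by total Quantity, ignoring other categories and blank rows. The category names come from the question; the function name, the sample rows, and returning None for a year with no Quantity are assumptions of this sketch.

```python
from collections import defaultdict


def price_by_year(rows):
    """rows: (year, category, value) tuples in long format. Returns {year: weighted average price}."""
    totals = defaultdict(lambda: {"Revenue": 0.0, "Quantity": 0.0})
    for year, category, value in rows:
        if year is None:                      # skip blank rows, like the Filtered Rows step
            continue
        if category in ("Revenue", "Quantity"):
            totals[year][category] += value
    # Sum of revenue over sum of quantity gives a quantity-weighted average price
    return {y: (t["Revenue"] / t["Quantity"] if t["Quantity"] else None)
            for y, t in totals.items()}


rows = [
    (2019, "Revenue", 100), (2019, "Quantity", 4),
    (2020, "Revenue", 50), (2020, "Quantity", 2),
    (2020, "Revenue", 30), (2020, "Quantity", 3),
    (2020, "Cost", 999), (None, None, None),
]
print(price_by_year(rows))
```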
