I was able to merge (inner join) a transaction table with an owner info table, using the account number as my key, to get the following results:
I need the transactions to be linked to the current owner(s) of the account. But as you can see, for the 9/16/2016 transaction, it is also linked to an owner who did not own the account until much later. Similarly, the 11/27/2020 transaction needs to be linked to the newer owner, so I am looking for something like this:
Since the secondary owner does not change, Mary applies to both transactions.
For other accounts, it is also possible for the primary owner to remain the same while the secondary owner has changed. There are also accounts in which there are up to 4 secondary owners.
So, in short, for each transaction date I need to pick up the most recent Owner Change Date on or before that date, for both the primary and secondary owner(s).
I am new to Power Query, so I do not know whether this is better done using PQ or simply Excel functions/formulas. Or maybe there are additional data manipulation/transformation steps I need to take before this?
Assuming you start with 2 tables, which look something like this:
Owners:
Transactions:
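The original screenshots aren't reproduced here, but based on the column names used in the query below, the two tables look roughly like this. This is only a mock-up: the account number, the primary owner names, and all dates except the two transaction dates from the question are invented.

let
    // Owners: one row per owner assignment, with the date that owner took over
    Owners = #table(
        {"Account", "Primary / Secondary Owner", "Owner Name", "Owner Change Date"},
        {
            {1001, "Primary", "John", #date(2015, 1, 1)},
            {1001, "Primary", "Robert", #date(2020, 10, 1)},  // newer primary owner
            {1001, "Secondary", "Mary", #date(2015, 1, 1)}    // secondary owner never changes
        }),
    // Transactions: one row per transaction
    Transactions = #table(
        {"Account", "Trans Date"},
        {
            {1001, #date(2016, 9, 16)},
            {1001, #date(2020, 11, 27)}
        })
in
    [Owners = Owners, Transactions = Transactions]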
You can use a function that, for each transaction row, filters the owners by transaction date, partitions them by owner type, and returns the latest owner of each type:
let
fnLatestOwners = (MyTable as table, MyDate as date) =>
let
#"Filtered Date" = Table.SelectRows(MyTable, each [Owner Change Date] <= MyDate),
#"Partitioned Owners" = Table.Group(#"Filtered Date", {"Primary / Secondary Owner"}, {{"Partition", each Table.FirstN(Table.Sort(_,{{"Owner Change Date", Order.Descending}}),1), type table}}),
#"Combined Partitions" = Table.Combine(#"Partitioned Owners"[Partition]),
#"Removed Columns" = Table.RemoveColumns(#"Combined Partitions",{"Owner Change Date"})
in
#"Removed Columns",
Source = Transactions,
#"Merged Queries" = Table.NestedJoin(Source,{"Account"},Owners,{"Account"},"Owners",JoinKind.LeftOuter),
#"Added Latest Owners" = Table.AddColumn(#"Merged Queries", "Latest Owners", each fnLatestOwners([Owners],[Trans Date]), type table),
#"Expanded Latest Owners" = Table.ExpandTableColumn(#"Added Latest Owners", "Latest Owners", {"Primary / Secondary Owner", "Owner Name"}, {"Primary / Secondary Owner", "Owner Name"}),
#"Removed Columns" = Table.RemoveColumns(#"Expanded Latest Owners",{"Owners"}),
#"Changed Type" = Table.TransformColumnTypes(#"Removed Columns",{{"Primary / Secondary Owner", type text}, {"Owner Name", type text}})
in
#"Changed Type"
This returns:
Related
I am pulling some Opportunities from our Dyn365 instance into Excel using the Get Data From Dynamics 365 (online) feature. Some columns (e.g. Account, Seller) show GUIDs rather than names, and in order to see names I need to use Merge Queries and pull the relevant tables into Excel.
Problem is, the Accounts table has ~800k records and the Sellers table isn't small either, so even if I reduce the number of columns to load, it still takes about 7 minutes to refresh this query. My questions:
Can this be achieved without the need for merging tables?
Or, can I use merge, but not have to load Accounts and Sellers into a worksheet?
Is there a better way to do what I'm trying to do?
(except for the Dynamic Worksheet Export)?
//Edited 31-May-2022
M code:
let
Source = Excel.CurrentWorkbook(){[Name="Opportunities"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"OpportunityID", type text}, {"OpportunityNumber", type text}, {"Account", type text}, {"Seller", type text}}),
#"Merged Queries" = Table.Buffer(Table.NestedJoin(#"Changed Type", {"Account"}, Accounts, {"AccountID"}, "Accounts", JoinKind.LeftOuter)),
#"Expanded Accounts" = Table.ExpandTableColumn(#"Merged Queries", "Accounts", {"CustomerName", "Country"}, {"Accounts.CustomerName", "Accounts.Country"}),
#"Merged Queries1" = Table.Buffer(Table.NestedJoin(#"Expanded Accounts", {"Seller"}, Sellers, {"SellerID"}, "Sellers", JoinKind.LeftOuter)),
#"Expanded Sellers" = Table.ExpandTableColumn(#"Merged Queries1", "Sellers", {"SellerName"}, {"Sellers.SellerName"})
in
#"Expanded Sellers"
//Edit2 - the below doesn't even want to load into preview (i.e. marching ants forever). Without the "Table.Buffer()" the preview loads within seconds.
let
Source = OData.Feed("https://mydomain.crm.dynamics.com/api/data/v8.2/", null, [Implementation="2.0"]),
#"BufferedOpportunities" = Table.Buffer(Source{[Name="opportunities",Signature="table"]}[Data])
in
#"BufferedOpportunities"
Regarding point 2:
You can merge without loading the Accounts and Sellers tables into a worksheet. Right-click each of those two queries in the Queries & Connections pane in Excel and make sure Only Create Connection is ticked. The tables will then be unloaded from the workbook, which will save a lot of time.
Edit:
Try this. I have typed it freehand, so I can't guarantee there are no mistakes, but it should give you an idea of what you need to do.
let
Source = Excel.CurrentWorkbook(){[Name="Opportunities"]}[Content],
#"Changed Type" = Table.Buffer( Table.TransformColumnTypes(Source,{{"OpportunityID", type text}, {"OpportunityNumber", type text}, {"Account", type text}, {"Seller", type text}})),
BufferedAccounts = Table.Buffer(Accounts),
BufferedSellers = Table.Buffer(Sellers),
#"Merged Queries" = Table.NestedJoin(#"Changed Type", {"Account"}, BufferedAccounts, {"AccountID"}, "Accounts", JoinKind.LeftOuter),
#"Expanded Accounts" = Table.ExpandTableColumn(#"Merged Queries", "Accounts", {"CustomerName", "Country"}, {"Accounts.CustomerName", "Accounts.Country"}),
#"Merged Queries1" = Table.NestedJoin(#"Expanded Accounts", {"Seller"}, BufferedSellers, {"SellerID"}, "Sellers", JoinKind.LeftOuter),
#"Expanded Sellers" = Table.ExpandTableColumn(#"Merged Queries1", "Sellers", {"SellerName"}, {"Sellers.SellerName"})
in
#"Expanded Sellers"
Bear with me, this is my first attempt at using the Power Query Formula Language. I need some advice on how to solve a particular problem with sorting and filtering source data.
My current source data is structured like this:
Using this power query:
let
Source = Excel.CurrentWorkbook(){[Name="EmployeeOrganization"]}[Content],
ListEmployees = Table.Group(Source, {"Organization"}, {{"Employee", each Text.Combine([Employee],","), type text}}),
CountEmployees = Table.AddColumn(ListEmployees, "Count", each List.Count(Text.Split([Employee],","))),
SplitEmployees = Table.SplitColumn(ListEmployees, "Employee", Splitter.SplitTextByDelimiter(",", QuoteStyle.Csv),List.Max(CountEmployees[Count])),
Transpose = Table.Transpose(SplitEmployees),
PromoteHeaders = Table.PromoteHeaders(Transpose, [PromoteAllScalars=true])
in
PromoteHeaders
I am able to produce the following result:
To avoid having to add the organization name to every single employee in the source, I would like the organization name to act as a parent group, with the employees as children. I would also like the result to include only the organizations (+ employees) that have the status Active = Yes.
The desired source should look similar to this:
So that the desired result should look similar to this: (Apple is gone due to Active = NO)
I am stuck at this point and need some advice on how I can modify my Power Query formula to:
Only fetch Organizations that are Active (it does not matter whether they have employees or not)
Somehow link the child Employees to the correct Organizations (without having to write the org name next to every employee)
(Excel file can be found here)
In PQ, you'll need to fill in the blank rows, then Pivot with no aggregation.
See the comments in the code, and follow the Applied Steps to understand the algorithm.
Source
Custom Function
Rename: fnPivotAll
//credit: Cam Wallace https://www.dingbatdata.com/2018/03/08/non-aggregate-pivot-with-multiple-rows-in-powerquery/
(Source as table,
ColToPivot as text,
ColForValues as text)=>
let
PivotColNames = List.Buffer(List.Distinct(Table.Column(Source,ColToPivot))),
#"Pivoted Column" = Table.Pivot(Source, PivotColNames, ColToPivot, ColForValues, each _),
TableFromRecordOfLists = (rec as record, fieldnames as list) =>
let
PartialRecord = Record.SelectFields(rec,fieldnames),
RecordToList = Record.ToList(PartialRecord),
Table = Table.FromColumns(RecordToList,fieldnames)
in
Table,
#"Added Custom" = Table.AddColumn(#"Pivoted Column", "Values", each TableFromRecordOfLists(_,PivotColNames)),
#"Removed Other Columns" = Table.RemoveColumns(#"Added Custom",PivotColNames),
#"Expanded Values" = Table.ExpandTableColumn(#"Removed Other Columns", "Values", PivotColNames)
in
#"Expanded Values"
Basic Query
let
//Read in data and set data types
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45W8k12yc9LzEkpVtJRAqLI1GKlWJ1oEDMgtSS1CCQK5XvlpyLzEvPgXMeCgpxUiH6/fJgC38SiSiT1jjmZyXAN7vn56TAdyDYmluYgaXHKTwLzYgE=", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Organization = _t, Employee = _t, Active = _t]),
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Organization", type text}, {"Employee", type text}, {"Active", type text}}),
//replace blanks with null if not already there
#"Replaced Value" = Table.ReplaceValue(#"Changed Type","",null,Replacer.ReplaceValue,{"Organization", "Employee", "Active"}),
//fill down the Organization and Active columns
#"Filled Down" = Table.FillDown(#"Replaced Value",{"Organization", "Active"}),
//Filter to show only Active = "Yes" and Employee not null
#"Filtered Rows" = Table.SelectRows(#"Filled Down", each ([Employee] <> null) and ([Active] = "Yes")),
//Pivot with no aggregation
//could do this with grouping, but it's easier (and maybe faster) with a custom function
pivotAll = fnPivotAll(#"Filtered Rows","Organization","Employee"),
//remove unneeded Active column and set data types
#"Removed Columns" = Table.RemoveColumns(pivotAll,{"Active"}),
typed = Table.TransformColumnTypes(#"Removed Columns",
List.Transform(Table.ColumnNames(#"Removed Columns"),each {_, Text.Type}))
in
typed
Results:
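To run the same query against the actual EmployeeOrganization workbook table from the question instead of the embedded sample data, presumably only the Source step needs to change, for example:

Source = Excel.CurrentWorkbook(){[Name="EmployeeOrganization"]}[Content],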
I have a REST endpoint that queries the data in JIRA and returns the data for activeSprint and previousSprint so the user can build a burndown chart, similar to this:
{
activeSprint: ['..burndown data array..'],
previousSprint: ['..burndown data array..']
}
So in Power Query I set up the first query, which is just a connection-only query to the source:
JSON
let
Source = Json.Document(Web.Contents("https://.../burndown"))
in
Source
Then I reference the query called JSON and create a new query called Active Sprint:
let
Source = JSON,
activeSprint = Source[activeSprint],
burndownData = activeSprint[burndownData],
#"Converted to Table" = Table.FromList(burndownData, Splitter.SplitByNothing(), null, null, ExtraValues.Error),
#"Expanded Column1" = Table.ExpandRecordColumn(#"Converted to Table", "Column1", {"date", "dateFormatted", "storyPoints", "issueCount", "issues", "plannedStoryPoints"}, {"date", "dateFormatted", "storyPoints", "issueCount", "issues", "plannedStoryPoints"}),
#"Changed Type" = Table.TransformColumnTypes(#"Expanded Column1",{{"dateFormatted", type datetime}, {"date", type datetime}})
in
#"Changed Type"
Then I do the same as above for the Previous Sprint query.
I group all the queries together in Excel into one group called TEST1, so it looks like this:
When I right-click on the group Test1 and click Refresh, it makes 2 requests to the API. Can I make it just one call, since the source is the same for both?
The dependency in the Power Query Editor is correct, but I don't understand why it sends 2 separate requests.
Thanks
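Each query that loads to the workbook evaluates its whole dependency chain on refresh, so Active Sprint and Previous Sprint each re-run the JSON query, and each of them hits the endpoint; referencing a shared connection-only query does not make them share the web call. One way to guarantee a single request is to produce both sprints from one loaded query, for example by tagging the rows with a Sprint column. This is only a sketch, reusing the URL and field names from the queries above, and it does change the output shape (one combined table instead of two):

let
    Source = Json.Document(Web.Contents("https://.../burndown")),
    // turn one sprint's burndown data into a table and tag it with the sprint name
    toTable = (burndownData as list, sprint as text) =>
        Table.AddColumn(
            Table.ExpandRecordColumn(
                Table.FromList(burndownData, Splitter.SplitByNothing(), null, null, ExtraValues.Error),
                "Column1",
                {"date", "dateFormatted", "storyPoints", "issueCount", "issues", "plannedStoryPoints"}),
            "Sprint", each sprint),
    Combined = Table.Combine({
        toTable(Source[activeSprint][burndownData], "Active"),
        toTable(Source[previousSprint][burndownData], "Previous")})
in
    Combined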
My problem:
Through New Query -> From Other Sources -> From Web, I entered a static URL that allowed me to load approximately 60k "IDs" from a webpage in JSON format.
I believe each of these IDs corresponds to an item.
So they're all loaded and organised in a column, with one ID per line, inside a Query tab.
For the moment, no problem.
Now I need to import information from a dynamic URL that depends on the ID.
So I need to import from URL in this form:
http://www.example.com/xxx/xxxx/ID
This imports the following for each ID:
name of the corresponding item,
average price,
supply,
demand,
etc.
After research I came to the conclusion that I had to use the "Advanced Editor" inside the query editor to reference the ID query tab.
However, I have no idea how to put the static part together with the ID, or how to repeat that over the 60k lines.
I tried this:
let
Source = Json.Document(Web.Contents("https://example.com/xx/xxxx/" & ID)),
name1 = Source[name]
in
name1
This returns an error.
I think it's because I can't add a string and a column.
Question: How do I reference the value of the cell I'm interested in and add it to my string?
Question: Is what I'm doing viable?
Question: How is Excel going to handle loading 60k queries?
Each query is only a few words to import.
Question: Is it possible to load information from 60k different URLs with one query?
EDIT: Thank you very much for the answer Alexis, it was very helpful. So, to avoid copying what you posted, I did it without the function (tell me what you think of it):
let
Source = Json.Document(Web.Contents("https://example.com/all-ID.json")),
items1 = Source[items],
#"Converted to Table" = Table.FromList(items1, Splitter.SplitByNothing(), null, null, ExtraValues.Error),
#"Renamed Columns" = Table.RenameColumns(#"Converted to Table",{{"Column1", "ID"}}),
#"Inserted Merged Column" = Table.AddColumn(#"Renamed Columns", "URL", each Text.Combine({"http://example.com/api/item/", Text.From([ID], "fr-FR")}), type text),
#"Added Custom" = Table.AddColumn(#"Inserted Merged Column", "Item", each Json.Document(Web.Contents([URL]))),
#"Expanded Item" = Table.ExpandRecordColumn(#"Added Custom", "Item", {"name"}, {"Item.name"})
in
#"Expanded Item"
Now the problem I have is that it takes ages to load up all the information I need from all the URLs.
As it turns out, it's possible to extract multiple IDs at once using this format: http://example.com/api/item/ID1,ID2,ID3,ID4,...,IDN
I presume that trying to load from a URL containing all of the IDs at once would not work out, because the URL would contain way too many characters to handle.
So to speed things up, what I'm trying to do now is concatenate every N rows into one cell, for example with N=3:
205
651
320165
63156
4645
31
6351
561
561
31
35
would become:
205, 651, 320165
63156, 4645, 31
6351, 561, 561
31, 35
The "Group by" functionnality doesn't seem to be what I'm looking for, and I'm not sure how to automatise that throught Power Query
EDIT 2
So after a lot of testing I found a solution, even though it might not be the most elegant or optimal:
I created an index column with a step of 1
I created another custom column that assigns the same number to each group of N rows
I used "Group By" -> "All Rows" to create a "Count" column
Created a custom column [Count][ID]
Finally I extracted the values from that column with a "," separator
Here's the code for N = 10,000:
let
Source = Json.Document(Web.Contents("https://example.com/items.json")),
items1 = Source[items],
#"Converted to Table" = Table.FromList(items1, Splitter.SplitByNothing(), null, null, ExtraValues.Error),
#"Renamed Columns" = Table.RenameColumns(#"Converted to Table",{{"Column1", "ID"}}),
#"Changed Type" = Table.TransformColumnTypes(#"Renamed Columns",{{"ID", Int64.Type}}),
#"Added Index" = Table.AddIndexColumn(#"Changed Type", "Index", 0, 1),
#"Added Conditional Column" = Table.AddColumn(#"Added Index", "Custom", each if Number.RoundDown([Index]/10000) = [Index]/10000 then [Index] else Number.IntegerDivide([Index],10000)*10000),
#"Reordered Columns" = Table.ReorderColumns(#"Added Conditional Column",{"Index", "ID", "Custom"}),
#"Grouped Rows" = Table.Group(#"Reordered Columns", {"Custom"}, {{"Count", each _, type table}}),
#"Added Custom" = Table.AddColumn(#"Grouped Rows", "Custom.1", each [Count][ID]),
#"Extracted Values" = Table.TransformColumns(#"Added Custom", {"Custom.1", each Text.Combine(List.Transform(_, Text.From), ","), type text})
in
#"Extracted Values"
I think what you want to do here is create a custom function that you invoke with each of your ID values.
Let me give a similar example that should point you in the right direction.
Let's say I have a table named ListIDs which looks like this:
ID
----
1
2
3
4
5
6
7
8
9
10
and for each ID I want to pull some information from Wikipedia (e.g. for ID = 6 I want to lookup https://en.wikipedia.org/wiki/6 and return the Cardinal, Ordinal, Factorization, and Divisors of 6).
To get this for just one ID value my query would look like this (using 6 again):
let
Source = Web.Page(Web.Contents("https://en.wikipedia.org/wiki/6")),
Data0 = Source{0}[Data],
#"Changed Type" = Table.TransformColumnTypes(Data0,{{"Column1", type text}, {"Column2", type text}, {"Column3", type text}}),
#"Filtered Rows" = Table.SelectRows(#"Changed Type", each ([Column2] = "Cardinal" or [Column2] = "Divisors" or [Column2] = "Factorization" or [Column2] = "Ordinal")),
#"Removed Columns" = Table.RemoveColumns(#"Filtered Rows",{"Column1"}),
#"Renamed Columns" = Table.RenameColumns(#"Removed Columns",{{"Column2", "Property"}, {"Column3", "Value"}}),
#"Pivoted Column" = Table.Pivot(#"Renamed Columns", List.Distinct(#"Renamed Columns"[Property]), "Property", "Value")
in
#"Pivoted Column"
Now we want to convert this into a function so that we can use it as many times as we want without creating a bunch of queries. (Note: I've named this query/function WikiLookUp as well.) To do this, change it to the following:
let
WikiLookUp = (ID as text) =>
let
Source = Web.Page(Web.Contents("https://en.wikipedia.org/wiki/" & ID)),
Data0 = Source{0}[Data],
#"Changed Type" = Table.TransformColumnTypes(Data0,{{"Column1", type text}, {"Column2", type text}, {"Column3", type text}}),
#"Filtered Rows" = Table.SelectRows(#"Changed Type", each ([Column2] = "Cardinal" or [Column2] = "Divisors" or [Column2] = "Factorization" or [Column2] = "Ordinal")),
#"Removed Columns" = Table.RemoveColumns(#"Filtered Rows",{"Column1"}),
#"Renamed Columns" = Table.RenameColumns(#"Removed Columns",{{"Column2", "Property"}, {"Column3", "Value"}}),
#"Pivoted Column" = Table.Pivot(#"Renamed Columns", List.Distinct(#"Renamed Columns"[Property]), "Property", "Value")
in
#"Pivoted Column"
in
WikiLookUp
Notice that all we did was wrap it in another set of let...in and define the parameter ID as text, which gets appended to the URL in the Source step. The function should appear like this:
Now we can go back to our table which we've imported into the query editor and invoke our newly created function in a custom column. (Note: Make sure you convert your ID values to text type first since they're being appended to a URL.)
Add a custom column with the following definition (or use the Invoke Custom Function button)
= WikiLookUp([ID])
Expand that column to bring in all the columns you want and you're done!
Here's what that query's M code looks like:
let
Source = Excel.CurrentWorkbook(){[Name="ListIDs"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"ID", type text}}),
#"Added Custom" = Table.AddColumn(#"Changed Type", "Custom", each WikiLookUp([ID])),
#"Expanded Custom" = Table.ExpandTableColumn(#"Added Custom", "Custom", {"Cardinal", "Ordinal", "Factorization", "Divisors"}, {"Cardinal", "Ordinal", "Factorization", "Divisors"})
in
#"Expanded Custom"
The query should look like this:
I recently switched to Power Query to fetch data from various sources. I have loaded my existing data into a table called "masterEntries".
The query I have calls a function to check the last record for each source in "masterEntries" and fetches only newer records.
let
Source = Excel.CurrentWorkbook(){[Name="formsMaster"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"FormName", type text}, {"Form", type text}, {"LastEntry", Int64.Type}}),
#"Added Custom" = Table.AddColumn(#"Changed Type", "Custom", each formEntries([FormName],[LastEntry])),
#"Expanded Custom" = Table.ExpandTableColumn(#"Added Custom", "Custom", {"EntryId", "Field1", "Field2", "Field3", "Field5", "DateCreated"}, {"EntryId", "Field1", "Field2", "Field3", "Field5", "DateCreated"}),
#"Removed Columns" = Table.RemoveColumns(#"Expanded Custom",{"Form", "LastEntry"}),
in
#"Removed Columns"
This query loads the data to a new table. Instead I want to append the data to "masterEntries".
I am trying to do this with Power Query, not VBA. Power Query has an Append Queries feature, where two or more queries/results can be combined into a new table.
Even a new query that appends the resulting table from the above query ("latestEntries") to the existing table ("masterEntries") will do.
Any ideas on how it can be done with Power Query?
EDIT
My original data ("masterEntries") was loaded manually. It is a big table with 400K+ records. I can load it using a query if that is going to help.
Every run of "latestEntries" checks what records are already in "masterEntries" and fetches only newer entries from different sources.
The Append Queries method in Power Query just creates another query/connection; it does not append records permanently. That is, when "latestEntries" brings in a new set of records, "masterEntries" loses the records that came from the earlier run of "latestEntries".
This sounds a bit like a request for "incremental load", which is currently not supported by Power Query in Excel. The workaround is to go via a "linkback" table, as described here: http://ms-olap.blogspot.de/2015/05/incremental-data-loads-in-microsoft.html
If your linkback table exceeds 1.1 million rows, you can use JSON compression, as described here: http://www.thebiccountant.com/2016/12/06/how-to-store-tables-longer-than-11-mio-rows-in-excel/
Be aware that both methods cost performance, so this technique only makes sense if it saves you repetitive execution of really heavy transformations (or long loads from the web).
You should add something like this at the end of your query; just change Your_Table to the name of the table you want to combine with:
#"Append Query" = Table.Combine({#"Removed Columns", Your_Table})
in
#"Append Query"
Assuming you have some kind of ID and that it is an integer, here is the query for the masterEntries table itself (this is important: this query must load back to the masterEntries table, so that each refresh appends the newly fetched rows to the rows already there):
let
Source = Excel.CurrentWorkbook(){[Name="masterEntries"]}[Content],
Types = Table.TransformColumnTypes(Source,{{"ID", Int64.Type}, {"Value", type number}}),
//Assuming you have integer-type IDs.
//Otherwise you have to order and index records in a view, and query that view.
MaxID = List.Max(Types[ID]),
//if you have ordered index, List.Max() can be substituted with Table.LastN(Types, 1)[ID]{0}
//it may perform better.
TableFromDB = Excel.CurrentWorkbook(){[Name="source"]}[Content], //Replace with database table
GetNewRows = Table.SelectRows(TableFromDB, each [ID] > MaxID),
MergeTables = Table.Combine({Types, GetNewRows})
in
MergeTables