I recently switched to Power Query to fetch data from various sources. I have loaded my existing data into a table called "masterEntries".
The query I have calls a function to check the last record for each source in "masterEntries" and fetches only newer records.
let
Source = Excel.CurrentWorkbook(){[Name="formsMaster"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"FormName", type text}, {"Form", type text}, {"LastEntry", Int64.Type}}),
#"Added Custom" = Table.AddColumn(#"Changed Type", "Custom", each formEntries([FormName],[LastEntry])),
#"Expanded Custom" = Table.ExpandTableColumn(#"Added Custom", "Custom", {"EntryId", "Field1", "Field2", "Field3", "Field5", "DateCreated"}, {"EntryId", "Field1", "Field2", "Field3", "Field5", "DateCreated"}),
#"Removed Columns" = Table.RemoveColumns(#"Expanded Custom",{"Form", "LastEntry"})
in
#"Removed Columns"
This query loads the data to a new table. Instead I want to append the data to "masterEntries".
I am trying to do this with Power Query and not VBA. Power Query has an Append Queries feature, where two or more queries/results can be combined into a new table.
Even a new query that appends the resulting table from the above query ("latestEntries") to the existing table ("masterEntries") will do.
Any ideas on how it can be done with PowerQuery?
EDIT
My original data ("masterEntries") was loaded manually. It is a big table with 400K+ records. I can load it using a query if that is going to help.
Every run of "latestEntries" checks what records are already in "masterEntries" and fetches only newer entries from different sources.
The Append Query method in Power Query is just a connection. It does not append records permanently. That is, when "latestEntries" brings a new set of records, the "masterEntries" loses the records that were in the earlier run of "latestEntries".
This sounds like a request for an "incremental load", which is currently not supported by Power Query in Excel. The workaround is to go via a "linkback" table, as described here: http://ms-olap.blogspot.de/2015/05/incremental-data-loads-in-microsoft.html
If your linkback table exceeds 1.1 million rows, you can use JSON compression, as described here: http://www.thebiccountant.com/2016/12/06/how-to-store-tables-longer-than-11-mio-rows-in-excel/
Be aware that both methods cost performance, so this technique only makes sense if it saves you the repeated execution of really heavy transformations (or long loads from the web).
You should add something like this as the last step (the step before it must then end with a comma); just change Your_Table to the name of the table you want to use:
#"Append Query" = Table.Combine({#"Removed Columns", Your_Table})
in
#"Append Query"
Assuming you have some kind of ID and it is an integer, here is the query for the masterEntries table (this is important!):
let
Source = Excel.CurrentWorkbook(){[Name="masterEntries"]}[Content],
Types = Table.TransformColumnTypes(Source,{{"ID", Int64.Type}, {"Value", type number}}),
//Assuming you have integer-type IDs.
//Otherwise you have to order and index records in a view, and query that view.
MaxID = List.Max(Types[ID]),
//if you have ordered index, List.Max() can be substituted with Table.LastN(Types, 1)[ID]{0}
//it may perform better.
TableFromDB = Excel.CurrentWorkbook(){[Name="source"]}[Content], //Replace with database table
GetNewRows = Table.SelectRows(TableFromDB, each [ID] > MaxID),
MergeTables = Table.Combine({Types, GetNewRows})
in
MergeTables
Oi,
I'm having issues with a List.Contains(x, x) = false filter.
Context
I have 32 Excel files (each about 2 MB) from which I retrieve data in a first query. This data is filtered down to only the columns needed, and the query is then transformed into a list (ListOfJustifWBS), so that I only have the WBSs of that particular query.
I also have another query, where I import a huge Excel data file including WBSs, actuals, best estimates, ...
What I want to do is keep only the WBSs from the second query that are not included in the first query.
The code I'm using is = Table.SelectRows(#"Changed Type", each (List.Contains(ListOfJustifWBS,[WBS])=false))
Whenever I run the query in the editor, the data gets processed. However, when I track the progress in the bottom-right corner, I see all 32 Excel files being processed, and Excel sometimes reports having retrieved 20 MB worth of data from a file that is itself only 2 MB.
Whenever I try to load the query into an Excel sheet table, Excel goes "Not responding".
Any idea how to fix this?
If you replace
= Table.SelectRows(#"Changed Type", each (List.Contains(ListOfJustifWBS,[WBS])=false))
with
#"Merged Queries" = Table.NestedJoin(#"Changed Type", {"WBS"}, Table.FromList(ListOfJustifWBS ), {"Column1"}, "ListOfJustifWBS", JoinKind.LeftOuter),
#"Expanded ListOfJustifWBS" = Table.ExpandTableColumn(#"Merged Queries", "ListOfJustifWBS", {"Column1"}, {"Column1.1"}),
#"Filtered Rows" = Table.SelectRows(#"Expanded ListOfJustifWBS", each ([Column1.1] = null))
is this any faster?
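The same idea can also be written as an anti join, which keeps only the unmatched rows and so skips the expand and filter steps. Below is a minimal sketch, not a measured fix: the sample rows and the step names ChangedType, JustifTable and AntiJoined are hypothetical stand-ins for the queries in the question, and whether List.Buffer helps depends on how the list is produced.

```powerquery
let
    // Hypothetical stand-ins for the list and the big table from the question
    ListOfJustifWBS = {"A-100", "A-200"},
    ChangedType = Table.FromRecords({
        [WBS = "A-100", Actuals = 10],
        [WBS = "B-300", Actuals = 25]
    }),
    // Buffer the list once so the 32 source files are not re-read while filtering
    BufferedList = List.Buffer(ListOfJustifWBS),
    // Turn the list into a one-column table so it can be joined on WBS
    JustifTable = Table.FromList(BufferedList, Splitter.SplitByNothing(), {"WBS"}),
    // Left anti join: keep only rows of ChangedType with no match in JustifTable
    AntiJoined = Table.NestedJoin(ChangedType, {"WBS"}, JustifTable, {"WBS"}, "Justif", JoinKind.LeftAnti),
    // The nested column holds only empty tables after an anti join, so drop it
    Result = Table.RemoveColumns(AntiJoined, {"Justif"})
in
    Result
```

With the sample rows above, only the "B-300" row should remain, since "A-100" appears in the list.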
I was able to merge (inner join) a transaction table with an owner info table, using the account number as my key, to get the following results:
I need the transactions to be linked to the current owner(s) of the account. But as you can see, for the 9/16/2016 transaction, it is also linked to an owner who did not own the account until much later. Similarly, the 11/27/2020 transaction needs to be linked to the newer owner, so I am looking for something like this:
Since the secondary owner does not change, Mary applies to both transactions.
For other accounts, it is also possible for the primary owner to remain the same while the secondary owner has changed. There are also accounts in which there are up to 4 secondary owners.
So, in short, I need the transaction dates to match up with the previous, most recent Owner Change Date for both primary and secondary owner(s).
I am new to Power Query, so I do not know whether this is better done using PQ or simply Excel functions/formulas. Or maybe there are additional data manipulation/transformation steps I need to take before this?
Assuming you start with 2 tables, which look something like this:
Owners:
Transactions:
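For experimenting without the original workbook, the two inputs could be mocked roughly as follows. This is only a sketch: the column names are the ones the query in this answer refers to, while the account number, the dates of ownership changes and the names John and Alice are hypothetical values chosen for illustration.

```powerquery
let
    // Hypothetical owner history for a single account
    Owners = #table(
        type table [Account = text, #"Owner Change Date" = date, #"Primary / Secondary Owner" = text, #"Owner Name" = text],
        {
            {"1001", #date(2015, 1, 1), "Primary", "John"},
            {"1001", #date(2015, 1, 1), "Secondary", "Mary"},
            {"1001", #date(2019, 6, 1), "Primary", "Alice"}
        }
    ),
    // Hypothetical transactions falling before and after the primary owner change
    Transactions = #table(
        type table [Account = text, #"Trans Date" = date],
        {
            {"1001", #date(2016, 9, 16)},
            {"1001", #date(2020, 11, 27)}
        }
    )
in
    // Returned as a record so both tables can be inspected from one query;
    // in a workbook they would normally be two separate queries
    [Owners = Owners, Transactions = Transactions]
```

With these rows, the intent is that the 2016 transaction links to John and Mary, and the 2020 transaction to Alice and Mary.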
You can use a function to filter owners based on transaction date, apply a partition by owner type, and return the latest owner for each transaction row:
let
fnLatestOwners = (MyTable as table, MyDate as date) =>
let
#"Filtered Date" = Table.SelectRows(MyTable, each [Owner Change Date] <= MyDate),
#"Partitioned Owners" = Table.Group(#"Filtered Date", {"Primary / Secondary Owner"}, {{"Partition", each Table.FirstN(Table.Sort(_,{{"Owner Change Date", Order.Descending}}),1), type table}}),
#"Combined Partitions" = Table.Combine(#"Partitioned Owners"[Partition]),
#"Removed Columns" = Table.RemoveColumns(#"Combined Partitions",{"Owner Change Date"})
in
#"Removed Columns",
Source = Transactions,
#"Merged Queries" = Table.NestedJoin(Source,{"Account"},Owners,{"Account"},"Owners",JoinKind.LeftOuter),
#"Added Latest Owners" = Table.AddColumn(#"Merged Queries", "Latest Owners", each fnLatestOwners([Owners],[Trans Date]), type table),
#"Expanded Latest Owners" = Table.ExpandTableColumn(#"Added Latest Owners", "Latest Owners", {"Primary / Secondary Owner", "Owner Name"}, {"Primary / Secondary Owner", "Owner Name"}),
#"Removed Columns" = Table.RemoveColumns(#"Expanded Latest Owners",{"Owners"}),
#"Changed Type" = Table.TransformColumnTypes(#"Removed Columns",{{"Primary / Secondary Owner", type text}, {"Owner Name", type text}})
in
#"Changed Type"
This returns:
I wish to extract the data from this website:
https://apps.who.int/food-additives-contaminants-jecfa-database/search.aspx?fl=%2b
When I do so in Power Query, I only get a table with the Kind, Name, Children and Text columns.
However, doing exactly the same in Power BI recognises the list as desired.
Can I get Power Query to recognise the data in the same way? Or is there a way to export the Query to Excel?
It seems that Power Query in Excel does not have the Html.Table function which is used in Power Query in Power BI.
But you can export the data as a csv, then import into Excel.
From the PQ Editor:
Close and Apply
Visualize all the columns
Click in the visualized area
At the bottom right, you will see an ellipsis
Click there and you will be able to select Export to CSV
Bit of a convoluted solution, but I have since found how to achieve this using Power Query in Excel. It is possible to navigate the HTML within Power Query to get to the raw data. Once there, the values may not sit next to one another, but this can be cleaned up easily using Power Query.
M Code:
let
Source = Web.Page(Web.Contents("https://apps.who.int/food-additives-contaminants-jecfa-database/search.aspx?fcc=4")),
Data0 = Source{0}[Data],
Children0 = Data0{0}[Children],
Children1 = Children0{1}[Children],
Children2 = Children1{2}[Children],
Children3 = Children2{2}[Children],
Children6 = Children3{6}[Children],
Children4 = Children6{6}[Children],
Children5 = Children4{1}[Children],
Children19 = Children5{19}[Children],
Children7 = Children19{1}[Children],
#"Expanded Children" = Table.ExpandTableColumn(Children7, "Children", {"Kind", "Name", "Children", "Text"}, {"Children.Kind", "Children.Name", "Children.Children", "Children.Text"}),
#"Expanded Children.Children" = Table.ExpandTableColumn(#"Expanded Children", "Children.Children", {"Kind", "Name", "Children", "Text"}, {"Children.Children.Kind", "Children.Children.Name", "Children.Children.Children", "Children.Children.Text"}),
#"Removed Columns" = Table.RemoveColumns(#"Expanded Children.Children",{"Kind", "Name", "Children.Kind", "Children.Name", "Children.Children.Kind", "Children.Children.Name", "Children.Children.Children", "Text"}),
#"Filled Up" = Table.FillUp(#"Removed Columns",{"Children.Text"}),
#"Filtered Rows" = Table.SelectRows(#"Filled Up", each ([Children.Children.Text] <> null and [Children.Children.Text] <> "Flavouring Agent"))
in
#"Filtered Rows"
I have a REST endpoint that queries the data in JIRA and returns the data for activeSprint and previousSprint so the user can build a burndown chart, similar to this:
{
activeSprint: ['..burndown data array..'],
previousSprint: ['..burndown data array..']
}
In Power Query, I set up the first query, which is just a connection-only query to the source:
JSON
let
Source = Json.Document(Web.Contents("https://.../burndown"))
in
Source
Then I make a reference to the query called JSON and create a new query called Active Sprint:
let
Source = JSON,
activeSprint = Source[activeSprint],
burndownData = activeSprint[burndownData],
#"Converted to Table" = Table.FromList(burndownData, Splitter.SplitByNothing(), null, null, ExtraValues.Error),
#"Expanded Column1" = Table.ExpandRecordColumn(#"Converted to Table", "Column1", {"date", "dateFormatted", "storyPoints", "issueCount", "issues", "plannedStoryPoints"}, {"date", "dateFormatted", "storyPoints", "issueCount", "issues", "plannedStoryPoints"}),
#"Changed Type" = Table.TransformColumnTypes(#"Expanded Column1",{{"dateFormatted", type datetime}, {"date", type datetime}})
in
#"Changed Type"
Then I do the same for the Previous Sprint query.
I group all the queries together in Excel into one group called TEST1, so it looks like this:
When I right-click the group TEST1 and click Refresh, it makes 2 requests to the API. Can I make it just one call, since the source is the same for both?
The dependency shown in the Power Query Editor is correct, but I don't get why it sends 2 separate requests.
Thanks
I am using Power Query in Excel to read JSON files. I have a sample working script, as follows:
let
Source = Json.Document(File.Contents("E:\laureates.json")),
#"Converted to Table" = Record.ToTable(Source),
#"Expanded Value" = Table.ExpandListColumn(#"Converted to Table", "Value"),
#"Expanded Value1" = Table.ExpandRecordColumn(#"Expanded Value", "Value", {"id", "firstname", "surname", "born", "died", "bornCountry", "bornCountryCode", "bornCity", "diedCountry", "diedCountryCode", "diedCity", "gender", "prizes"}, {"Value.id", "Value.firstname", "Value.surname", "Value.born", "Value.died", "Value.bornCountry", "Value.bornCountryCode", "Value.bornCity", "Value.diedCountry", "Value.diedCountryCode", "Value.diedCity", "Value.gender", "Value.prizes"})
in
#"Expanded Value1"
The second and third arguments of the expression for #"Expanded Value1" have the column names hard-coded; this code is generated via the user interface.
I would like to reuse the script. The problem is that whenever the source file changes - has different column names or new column names - I encounter an error. A workaround is to regenerate the script.
I can avoid the issue if I can specify the second and third arguments as expressions that will be evaluated dynamically at runtime.
So far, my attempts have failed: I would appreciate any hints regarding how I can replace the second and third arguments as code or expressions.
I think you can use Record.FieldNames to generate that list dynamically.
Something like this:
Table.ExpandRecordColumn(
#"Expanded Value",
"Value",
Record.FieldNames([Value]),
List.Transform(Record.FieldNames([Value]), each "Value." & _)
)
Edit: As AAsk points out, the above syntax is incorrect since it's attempting to pull row context on a table-level operation. Instead of [Value] for each row, we need to use a representative one to be applied to the whole column and picking #"Expanded Value"{0}[Value], the record from the first row, should work.
Table.ExpandRecordColumn(
#"Expanded Value",
"Value",
Record.FieldNames(#"Expanded Value"{0}[Value]),
List.Transform(Record.FieldNames(#"Expanded Value"{0}[Value]), each "Value." & _)
)
The List.Transform is there to prepend "Value." to the start of each column name but it works just fine to use Record.FieldNames(#"Expanded Value"{0}[Value]) twice instead.
Record.FieldNames in your (Alexis Olson) answer made me realise that I can get the column names from a single record (the logic being that all records should have the same column names). The code that gets the column names dynamically is therefore:
let
Source = Json.Document(File.Contents("E:\laureates.json")),
#"Converted to Table" = Record.ToTable(Source),
#"Expanded Value" = Table.ExpandListColumn(#"Converted to Table", "Value"),
#"Expanded Value1" = Table.ExpandRecordColumn(#"Expanded Value", "Value",
Record.FieldNames(#"Expanded Value"{0}[Value]),
Record.FieldNames(#"Expanded Value"{0}[Value])
)
in
#"Expanded Value1"
Now I can change the file name, and the contents of files with different columns are displayed correctly.
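If the records in a file do not all share the same fields, taking the names from the first row alone would silently drop columns. A possible variation, sketched here under that assumption, collects the distinct field names across every record instead. The sample rows and the step names ExpandedValue and AllFields are hypothetical; in the real query, ExpandedValue would be the #"Expanded Value" step.

```powerquery
let
    // Hypothetical sample standing in for the #"Expanded Value" step
    ExpandedValue = Table.FromRecords({
        [Name = "laureates", Value = [id = "1", firstname = "Wilhelm"]],
        [Name = "laureates", Value = [id = "2", surname = "Lorentz"]]
    }),
    // Distinct field names over all records, not only the first one
    AllFields = List.Union(List.Transform(ExpandedValue[Value], Record.FieldNames)),
    // Fields missing from a given record come out as null
    Expanded = Table.ExpandRecordColumn(ExpandedValue, "Value", AllFields, AllFields)
in
    Expanded
```

This scans the whole column to build the name list, so it costs more than reading the first row, and it would fail on rows where Value is null rather than a record.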