Having issues with List.Contains - Not loading

Having issues with List.Contains - Not loading - excel

Oi,
So i'm having issues with the List.Contains (x,x)=false function.
Context
I have 32 Excel files where i retrieve data from in a first query, this data gets filtered so only the columns needed (Each of those 32 Excel files is about 2MB) - This query then gets transformed into a "list" (ListofJustifWBS) so i only have the WBS's of that particular Query.
I also have another query, where i import a huge data excel file including WBS's - Actuals - Best estimates,...
What i want to do is : only keep the WBS's from the second Query that are not included in the first query.
The code i'm using is = Table.SelectRows(#"Changed Type", each (List.Contains(ListOfJustifWBS,[WBS])=false))
Whenever i run the query in the editor, the data get processed.. However, when i track the "progress" in the bottom right cornor i see all 32 excel file getting progressed, but excel sometimes "retrieved" (?) 20MB worth of data in each excel file while the excel file itself is only 2MB?
Whenever i try to run the query in an Excel Sheet tabl, Excel goes "Not responding".
Any idea how to fix this?

If you replace
= Table.SelectRows(#"Changed Type", each (List.Contains(ListOfJustifWBS,[WBS])=false))
with
#"Merged Queries" = Table.NestedJoin(#"Changed Type", {"WBS"}, Table.FromList(ListOfJustifWBS ), {"Column1"}, "ListOfJustifWBS", JoinKind.LeftOuter),
#"Expanded ListOfJustifWBS" = Table.ExpandTableColumn(#"Merged Queries", "ListOfJustifWBS", {"Column1"}, {"Column1.1"}),
#"Filtered Rows" = Table.SelectRows(#"Expanded ListOfJustifWBS", each ([Column1.1] = null))
is this any faster ?

Related

Excel Power Query: how to prevent multiple requests for the same source (web json)?

I have a REST endpoint that queries the data in JIRA and returns the data for activeSprint and previousSprint so the user can build a burndown chart, similar to this:
{
activeSprint: ['..burndown data array..'],
previousSprint: ['..burndown data array..']
}
So in power query I setup the first query which is only representing the connection-only to the source
JSON
let
Source = Json.Document(Web.Contents("https://.../burndown"))
in
Source
Then I make a reference of the query called JSON and create a new query called Active Sprint
let
Source = JSON,
activeSprint = Source[activeSprint],
burndownData = activeSprint[burndownData],
#"Converted to Table" = Table.FromList(burndownData, Splitter.SplitByNothing(), null, null, ExtraValues.Error),
#"Expanded Column1" = Table.ExpandRecordColumn(#"Converted to Table", "Column1", {"date", "dateFormatted", "storyPoints", "issueCount", "issues", "plannedStoryPoints"}, {"date", "dateFormatted", "storyPoints", "issueCount", "issues", "plannedStoryPoints"}),
#"Changed Type" = Table.TransformColumnTypes(#"Expanded Column1",{{"dateFormatted", type datetime}, {"date", type datetime}})
in
#"Changed Type"
then I do the same as per above for the Previous Sprint query.
I group all the queries together in XLS into one group called TEST1 so it looks like this:
When I right click on the group Test1 and click refresh, it makes 2 requests to the API. Can I make it just one call since the source is the same for both ??
The dependency in the Power Query Editor is correct, but I don't get it why it sends 2 separate requests ??
Thanks

Power Query: Include all unique columns found in files from folder

I've recently run into an issue which hopefully is solvable.
Currently, I have power query pointing at a folder containing several CSV files. This is normally no issue, however, in this instance not all of the files have the same columns.
Is there a way to have power query return every unique column found in the folder populating empty data observations with null values?
Assume that my folder has csv files similar to the following (note that the rows are indexed using letters for easy reference):
I would like my final table to look something like:
This seems like it should be pretty simple, but I can't figure it out for the life of me! Any help would be greatly appreciated!

Assuming you're using Folder.Files, I think you can:
Grab the Content column of the table returned by Folder.Files -- which should give you a list of binary values.
Parse each item in the list as a CSV document using List.Transform and Csv.Document -- which should give you a list of tables.
Then merge your list of tables with Table.Combine -- which should give you one single table. Table.Combine should take care of the details (like aligning column names).
You've not provided any code in your question, so it's hard to give a relevant example, but I think the code below gives me your expected output.
I've turned the row indexes into an ID column, just to make the final table easier to verify/follow.
let
firstCsv =
"ID,one,two,three
A,1,4,7
B,2,5,8
C,3,6,9",
secondCsv =
"ID,one,two,three,four
D,1,6,11,16
E,2,7,12,17
F,3,8,13,18
G,4,9,14,19
H,5,10,15,20",
thirdCsv =
"ID,one,two,yes,no,maybe
I,1,1,1,1,1
J,2,2,2,2,2
K,3,3,3,3,3
L,4,4,4,4,4
M,5,5,5,5,5",
// For example's sake, let's suppose that the contrived table below was
// returned by calling Folder.Files
filesInFolder = Table.FromColumns({
List.Transform({firstCsv, secondCsv, thirdCsv}, Text.ToBinary),
List.Transform({"1".."3"}, each "CSV file " & _ & ".csv"),
List.Repeat({"someFolderPath"}, 3)
}, type table [Content = binary, Name = text, Folder = text]),
parsed = List.Transform(filesInFolder[Content], each
let
csv = Csv.Document(_, [Delimiter = ",", QuoteStyle = QuoteStyle.Csv]),
promoted = Table.PromoteHeaders(csv, [PromoteAllScalars = true])
in promoted
),
// The step below should match the expected output in your question.
combined = Table.Combine(parsed)
in
combined
Obviously, you'll need to adjust for your own folder path and actually call Folder.Files as you presumably already are in your own code.

I've always used something like this
//read all files in specified directory you fill in here
let Source = Folder.Files("C:\directory\subdirectory"),
//filter only csv files
#"Filtered Rows" = Table.SelectRows(Source, each ([Extension] = ".csv")),
//Pull contents of each file into table with an index
#"Added Custom1" = Table.AddColumn(#"Filtered Rows", "Custom", each Table.AddIndexColumn(Csv.Document(File.Contents([Folder Path]&"\"&[Name]),[Delimiter=",", Encoding=1252, QuoteStyle=QuoteStyle.None]),"Index")),
#"Expanded Custom" = Table.ExpandTableColumn(#"Added Custom1", "Custom", {"Column1", "Index"}, {"Column1", "Index"}),
#"Removed Other Columns" = Table.SelectColumns(#"Expanded Custom",{"Column1", "Index", "Name"}),
#"Pivoted Column" = Table.Pivot(#"Removed Other Columns", List.Distinct(#"Removed Other Columns"[Name]), "Name", "Column1"),
#"Removed Columns" = Table.RemoveColumns(#"Pivoted Column",{"Index"})
in #"Removed Columns"

How to get Column Names dynamically in Excel Power Query

I am using Power Query in Excel to read JSON files. I have a sample working script, as follows:
let
Source = Json.Document(File.Contents("E:\laureates.json")),
#"Converted to Table" = Record.ToTable(Source),
#"Expanded Value" = Table.ExpandListColumn(#"Converted to Table", "Value"),
#"Expanded Value1" = Table.ExpandRecordColumn(#"Expanded Value", "Value", {"id", "firstname", "surname", "born", "died", "bornCountry", "bornCountryCode", "bornCity", "diedCountry", "diedCountryCode", "diedCity", "gender", "prizes"}, {"Value.id", "Value.firstname", "Value.surname", "Value.born", "Value.died", "Value.bornCountry", "Value.bornCountryCode", "Value.bornCity", "Value.diedCountry", "Value.diedCountryCode", "Value.diedCity", "Value.gender", "Value.prizes"})
in
#"Expanded Value1"
The second and third arguments of the expression for #"Expanded Value1" has the names of the columns hard-coded; this code is generated via the user interface.
I would like to reuse the script. The problem is that whenever the source file changes - has different column names or new column names - I encounter an error. A workaround is to regenerate the script.
I can avoid the issue if I can specify the second and third arguments as expressions that will be evaluated dynamically at runtime.
So far, my attempts have failed: I would appreciate any hints regarding how I can replace the second and third arguments as code or expressions.

I think you can use Record.FieldNames to generate that list dynamically.
Something like this:
Table.ExpandRecordColumn(
#"Expanded Value",
"Value",
Record.FieldNames([Value]),
List.Transform(Record.FieldNames([Value]), each "Value." & _)
)
Edit: As AAsk points out, the above syntax is incorrect since it's attempting to pull row context on a table-level operation. Instead of [Value] for each row, we need to use a representative one to be applied to the whole column and picking #"Expanded Value"{0}[Value], the record from the first row, should work.
Table.ExpandRecordColumn(
#"Expanded Value",
"Value",
Record.FieldNames(#"Expanded Value"{0}[Value]),
List.Transform(Record.FieldNames(#"Expanded Value"{0}[Value]), each "Value." & _)
)
The List.Transform is there to prepend "Value." to the start of each column name but it works just fine to use Record.FieldNames(#"Expanded Value"{0}[Value]) twice instead.

Record.FieldNames in your (Alexis Olson) answer made me realise that I can get the column names from a single record (the logic being that all records should have the same column names). There the code that gets the column names dynamically is:
let
Source = Json.Document(File.Contents("E:\laureates.json")),
#"Converted to Table" = Record.ToTable(Source),
#"Expanded Value" = Table.ExpandListColumn(#"Converted to Table", "Value"),
#"Expanded Value1" = Table.ExpandRecordColumn(#"Expanded Value", "Value",
Record.FieldNames(#"Expanded Value"{0}[Value]),
Record.FieldNames(#"Expanded Value"{0}[Value])
)
in
#"Expanded Value1"
Now I can change the file name and its contents of files with different content are displayed correctly.

how to resolve Power BI error - the key didn't match any rows in the table

I am trying to load (combine) multiple Excel files into Power BI (October 2019 version). Every file has only 1 sheet. Each sheet has 1 range, and each range has the same schema across all files. (The sheet names are different, though.) A sample sheet name is '200704'.
Here are my steps:
Get Data \ Folder \ Connect
specify the Folder path
Combine & Load
select one of the files as my sample file; click on the file name as
my Parameter1; click OK
After I click OK, the cursor spins for a bit, and then it stops. Nothing happens. So, I go to Edit Queries \ Edit Queries. There is a warning symbol on my data query that reads:
An error occurred in the 'Transform File' query. Expression.Error:
The key didn't match any rows in the table.
Details: Key = Item=200704 Kind=Sheet Table=[Table]
How do I resolve this error?
If it helps, Power BI generate 5 queries for me, and the structure is:
Transform File from data [2]
Helper Queries [3]
Parameter1 (Sample File)
Sample File
Transform File
Transform Sample File
Other Queries [1]
data
Interestingly, if it helps to diagnose the issue, if I set sample file = First file or if I set sample file to my first file manually, the following error is thrown in the dialog, but it doesn't show what query is in error when I try to view / edit the query.
Failed to save modifications to the server. Error returned: 'OLE DB or ODBC error: [Expression.Error] The key didn't match any rows in the table..'.
And, to be sure, when I attempt to load this file (or any file in the folder, for that matter) individually (via Excel connection), it loads successfully. So, something must be wrong with the M code in my Folder connection.

I figured out the cause of my problem and the solution. The issue is that the row in my template query was being referenced incorrectly (i.e., the primary key between the template query and the regular query is wrong, and it has hard-coding of sheet names). To fix that, I had to remove all other columns in the template query table except the Data column, as described here. (It's odd that no MS documentation on combining multiple Excel files discusses this very important step.)
For comparison, here is my former (incorrect) M code:
Transform Sample File:
let
Source = Excel.Workbook(Parameter1, null, true),
#"Sample_Sheet" = Source{[Item="sample",Kind="Sheet"]}[Data],
#"Promoted Headers" = Table.PromoteHeaders(#"Sample_Sheet", [PromoteAllScalars=true])
in
#"Promoted Headers"
test:
let
Source = Folder.Files("C:\some folder path"),
#"Filtered Hidden Files1" = Table.SelectRows(Source, each [Attributes]?[Hidden]? <> true),
#"Invoke Custom Function1" = Table.AddColumn(#"Filtered Hidden Files1", "Transform File", each #"Transform File"([Content])),
#"Renamed Columns1" = Table.RenameColumns(#"Invoke Custom Function1", {"Name", "Source.Name"}),
#"Removed Other Columns1" = Table.SelectColumns(#"Renamed Columns1", {"Source.Name", "Transform File"}),
#"Expanded Table Column1" = Table.ExpandTableColumn(#"Removed Other Columns1", "Transform File", Table.ColumnNames(#"Transform File"(#"Sample File"))),
#"Changed Type" = Table.TransformColumnTypes(#"Expanded Table Column1",{{"Source.Name", type text}, {"ID", type text}, {"Name", type text}})
in
#"Changed Type"
And here is my new (correct) code:
Transform Sample File:
let
Source = Excel.Workbook(Parameter1, null, true),
#"Removed Columns" = Table.RemoveColumns(Source,{"Name", "Item", "Kind", "Hidden"}),
Data = #"Removed Columns"{0}[Data],
#"Promoted Headers" = Table.PromoteHeaders(Data, [PromoteAllScalars=true]),
#"Changed Type" = Table.TransformColumnTypes(#"Promoted Headers",{{"ID", type text}, {"Name", type text}})
in
#"Changed Type"
test:
let
Source = Folder.Files("C:\some folder path"),
#"Filtered Hidden Files1" = Table.SelectRows(Source, each [Attributes]?[Hidden]? <> true),
#"Invoke Custom Function1" = Table.AddColumn(#"Filtered Hidden Files1", "Transform File", each #"Transform File"([Content])),
#"Renamed Columns1" = Table.RenameColumns(#"Invoke Custom Function1", {"Name", "Source.Name"}),
#"Removed Other Columns1" = Table.SelectColumns(#"Renamed Columns1", {"Source.Name", "Transform File"}),
#"Expanded Table Column1" = Table.ExpandTableColumn(#"Removed Other Columns1", "Transform File", Table.ColumnNames(#"Transform File"(#"Sample File")))
in
#"Expanded Table Column1"
Notice the 'Removed Columns' step in the new template query. This is the "secret sauce" to the key problem. Also notice that I kept all default steps after my 'Data' step (i.e., 'Promoted Headers' and 'Changed Type') in my template query. This is because all of my sheets have the same schema. If this weren't true, then I would need to move those steps to the regular query.

I had exactly the same error simply because the PowerBI VNET Gateway could not authenticate to the source of the data to refresh dataset hosted in PowerBI premium capacity workspace. Totally unexpected and confusing, but once the correct credentials were set for the Gateway configuration - everything worked fine and the error had gone away.

Power Query to Append to Existing Table

I recently switched to PowerQuery to fetch data from various sources. I have loaded my existing data to a table called "masterEntries".
The query I have calls a function to check the last record for each source in "masterEntries" and fetches only newer records.
let
Source = Excel.CurrentWorkbook(){[Name="formsMaster"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"FormName", type text}, {"Form", type text}, {"LastEntry", Int64.Type}}),
#"Added Custom" = Table.AddColumn(#"Changed Type", "Custom", each formEntries([FormName],[LastEntry])),
#"Expanded Custom" = Table.ExpandTableColumn(#"Added Custom", "Custom", {"EntryId", "Field1", "Field2", "Field3", "Field5", "DateCreated"}, {"EntryId", "Field1", "Field2", "Field3", "Field5", "DateCreated"}),
#"Removed Columns" = Table.RemoveColumns(#"Expanded Custom",{"Form", "LastEntry"}),
in
#"Removed Columns"
This query loads the data to a new table. Instead I want to append the data to "masterEntries".
I am trying to do this with PowerQuery and not VBA. PowerQuery has Append Query feature where two or more queries/results can be combined to a new table.
Even a new query to append the resulting table from above query ("latestEntries") to existing table ("masterEntries") will do.
Any ideas on how it can be done with PowerQuery?
EDIT
My original data ("masterEntries") was loaded manually. It is a big table with 400K+ records. I can load it using a query if that is going to help.
Every run of "latestEntries" checks what records are already in "masterEntries" and fetches only newer entries from different sources.
The Append Query method in Power Query is just a connection. It does not append records permanently. That is, when "latestEntries" brings a new set of records, the "masterEntries" loses the records that were in the earlier run of "latestEntries".

This sounds a bit like a request for "incremental load". This is currently not supported by Power Query in Excel. The workaround is to go via a "linkback"-table like described here: http://ms-olap.blogspot.de/2015/05/incremental-data-loads-in-microsoft.html
If your linkback-table exceeds 1,1 Mio rows, you can use JSON-compression like described here: http://www.thebiccountant.com/2016/12/06/how-to-store-tables-longer-than-11-mio-rows-in-excel/
But be aware, that this costs performance.
Both methods "cost" performance, so this technique only makes sense, if you "save" repetitive execution of really heavy transformations (or long loads from the web).

You should add something like this, just change name of Your_Table into table you want to use:
#"Append Query" = Table.Combine({#"Removed Columns", Your_Table})
in
#"Append Query"

Assuming you have some kind of ID, and it is integer, here is the query for the masterEntries table (this is important!):
let
Source = Excel.CurrentWorkbook(){[Name="masterEntries"]}[Content],
Types = Table.TransformColumnTypes(Source,{{"ID", Int64.Type}, {"Value", type number}}),
//Assuming you have integer-type IDs.
//Otherwise you have to order and index records in a view, and query that view.
MaxID = List.Max(Types[ID]),
//if you have ordered index, List.Max() can be substituted with Table.LastN(Types, 1)[ID]{0}
//it may perform better.
TableFromDB = Excel.CurrentWorkbook(){[Name="source"]}[Content], //Replace with database table
GetNewRows = Table.SelectRows(TableFromDB, each [ID] > MaxID),
MergeTables = Table.Combine({Types, GetNewRows})
in
MergeTables

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string

Having issues with List.Contains - Not loading - excel

Related

Excel Power Query: how to prevent multiple requests for the same source (web json)?

Power Query: Include all unique columns found in files from folder

How to get Column Names dynamically in Excel Power Query

how to resolve Power BI error - the key didn't match any rows in the table

Power Query to Append to Existing Table

Categories

Resources