Load multiple files from a folder with missing column Power BI - azure

I am trying to load and combine multiple JSON files (from a single folder) into Power BI. A few of the files are missing a column.
For example, File1 has 20 columns, whereas File2 has only 19.
The column names are otherwise the same.
But when combining and loading the files, Power BI throws an error for the missing column:
An error occurred in the ‘Transform File (4)’ query. Expression.Error: The column 'col3' of the table wasn't found.
let
Source = AzureStorage.DataLake("Folder-Path"),
#"Filtered Hidden Files1" = Table.SelectRows(Source, each [Attributes]?[Hidden]? <> true),
#"Invoke Custom Function1" = Table.AddColumn(#"Filtered Hidden Files1", "Transform File (4)", each #"Transform File (4)"([Content])),
#"Renamed Columns1" = Table.RenameColumns(#"Invoke Custom Function1", {"Name", "Source.Name"}),
#"Removed Other Columns1" = Table.SelectColumns(#"Renamed Columns1", {"Source.Name", "Transform File (4)"}),
#"Expanded Table Column1" = Table.ExpandTableColumn(#"Removed Other Columns1", "Transform File (4)", Table.ColumnNames(#"Transform File (4)"(#"Sample File (4)"))),
#"Changed Type" = Table.TransformColumnTypes(#"Expanded Table Column1",{{"Source.Name", type text}, {"id", type any}, {"ref", type any}, {"col3", type any}})
in
#"Changed Type"

I actually came here looking for an answer to a similar question.
This link explains more of the process:
https://learn.microsoft.com/en-us/power-bi/transform-model/desktop-combine-binaries
When you use the helper queries that Power BI creates to combine files, a sample file is chosen. You can then apply steps to that file.
As you have stated, some of the files are missing a column. When the Changed Type step is applied, it looks for specific header names. If it can't find one (the match is case sensitive), the load fails.
Removing the #"Changed Type" step, or fixing the headers in the culprit files, will fix this issue.
You could also remove {"col3", type any} from the #"Changed Type" step:
#"Changed Type" = Table.TransformColumnTypes(#"Expanded Table Column1",{{"Source.Name", type text}, {"id", type any}, {"ref", type any}})
Not to hijack the question, but my issue is that when the sample file has fewer columns than another file in the folder, only as many columns as the sample file has are loaded, which can result in missing columns. My fix was to choose the sample file with the most columns.
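If choosing a different sample file is not practical, another option might be to build the list of columns to expand from all files rather than from the sample file. The following is only a sketch under that assumption: File1, File2 and Files are made-up stand-ins for the per-file tables that the Transform File function returns, and the step names are illustrative.

```powerquery
let
    // Hypothetical stand-ins for the tables returned per file;
    // File2 is missing "col3".
    File1 = #table({"id", "ref", "col3"}, {{1, "a", "x"}}),
    File2 = #table({"id", "ref"}, {{2, "b"}}),
    Files = #table(
        {"Source.Name", "Transform File"},
        {{"file1.json", File1}, {"file2.json", File2}}
    ),
    // Union of the column names found in every nested table
    AllColumns = List.Distinct(
        List.Combine(List.Transform(Files[Transform File], Table.ColumnNames))
    ),
    // Columns absent from a given file come through as null
    Expanded = Table.ExpandTableColumn(Files, "Transform File", AllColumns)
in
    Expanded
```

In the generated query, this would mean replacing the Table.ColumnNames(...) call on the sample file with a list computed from the #"Removed Other Columns1" step. Any later #"Changed Type" step should still name only columns that exist after the expand.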

Related

Getting latest updated file in Power BI Folder Source

I have a folder with 3 different subfolders where I store some daily exports. I want to get only the most recently updated file from each folder path, as shown in the picture.
I need to keep the binary and the attribute columns of the file with the latest "Date modified" for each path. The latest modified date is different for each subfolder. My subfolders correspond to the months of the year (they are named May, Jun and July, and August will be added soon).
That way, since I have 3 different values in my Folder Path column, I will only have 3 CSV files to expand.
I have tried grouping by Folder Path and taking the maximum of Date modified, but I lose the other fields. Adding the maximum of Attributes, or All Rows, does not solve my problem either.
I also tried this solution: https://community.powerbi.com/t5/Desktop/Keep-only-the-latest-date-for-duplicate-entries/td-p/638447
But I get stuck on an error: Function type value cannot be converted to Table type. Details: Value = [Function], Type = [Type].
M Query code:
let
Source = AzureStorage.DataLake(".../usersDailyData"),
#"Filtered Hidden Files1" = Table.SelectRows(#"Sorted Rows", each [Attributes]?[Hidden]? <> true),
#"Invoke Custom Function1" = Table.AddColumn(#"Filtered Hidden Files1", "Transformar archivo", each #"Transformar archivo"([Content])),
#"Renamed Columns1" = Table.RenameColumns(#"Invoke Custom Function1", {"Name", "Source.Name"}),
#"Removed Other Columns1" = Table.SelectColumns(#"Renamed Columns1", {"Source.Name", "Transformar archivo"}),
#"Expanded Table Column1" = Table.ExpandTableColumn(#"Removed Other Columns1", "Transformar archivo", Table.ColumnNames(#"Transformar archivo"(#"Archivo de ejemplo"))),
#"Changed Type" = Table.TransformColumnTypes(#"Expanded Table Column1",{ Columns name here })
in
#"Changed Type"
Any ideas to solve it?
Thank you very much.
I'd suggest adding a column that holds the max datetime for each Folder Path, then comparing each row's datetime to that max and filtering out the non-matches, along the lines of the code below. I think it goes after the Invoke Custom Function1 step, but you can move it where needed.
...
#"Added Custom" = Table.AddColumn(#"Invoke Custom Function1","MaxDate",(i)=>List.Max(Table.SelectRows( #"Invoke Custom Function1", each [Folder Path]=i[Folder Path]) [Date modified]), type datetime ),
#"Added Custom1" = Table.AddColumn(#"Added Custom", "Custom", each if [Date modified]=[MaxDate] then "keep" else null),
#"Filtered Rows" = Table.SelectRows(#"Added Custom1", each ([Custom] = "keep")),
...
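A variation on the same idea, in case grouping is preferred: group by Folder Path and keep the whole row with the latest Date modified, so the other fields are not lost. This is a sketch only; the Files table below is invented sample data standing in for the folder listing, and Latest is an illustrative column name.

```powerquery
let
    // Invented sample rows standing in for the folder listing
    Files = #table(
        {"Folder Path", "Name", "Date modified"},
        {
            {"May/", "a.csv", #datetime(2020, 5, 1, 8, 0, 0)},
            {"May/", "b.csv", #datetime(2020, 5, 2, 8, 0, 0)},
            {"Jun/", "c.csv", #datetime(2020, 6, 1, 8, 0, 0)},
            {"Jun/", "d.csv", #datetime(2020, 6, 3, 8, 0, 0)}
        }
    ),
    // For each Folder Path, Table.Max returns the full row (as a record)
    // with the largest Date modified
    Grouped = Table.Group(
        Files,
        {"Folder Path"},
        {{"Latest", each Table.Max(_, "Date modified"), type record}}
    ),
    // Turn the list of records back into a table with all original columns
    Result = Table.FromRecords(Grouped[Latest])
in
    Result
```

Unlike the MaxDate comparison, this keeps exactly one row per folder even if two files share the same latest timestamp.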

Excel Power Query only imports column titles, not data

I am trying to use Power Query in Excel 2013 to import a folder full of 121 text files. Each text file has a column of numbers:
24
2.0000E+07
1.0000E+07
5.0000E+06
2.0000E+06
1.0000E+06
1.0000E+05
1.0000E+04
1.0000E+03
1.0000E+02
1.0000E+01
1.0000E+00
6.2500E-01
5.0000E-01
4.0000E-01
3.0000E-01
2.0000E-01
1.0000E-01
8.0000E-02
6.0000E-02
4.0000E-02
3.0000E-02
2.0000E-02
1.0000E-02
2.0000E-04
1.0000E-05
1.0516E-05
9.3907E-06
3.3497E-04
1.8445E-03
1.3411E-03
5.4756E-03
9.4254E-03
1.2390E-02
1.4350E-02
1.5677E-02
1.7293E-02
4.0507E-03
2.0602E-03
2.1823E-03
3.1392E-03
7.5455E-03
9.1609E-02
7.5750E-02
1.2536E-01
1.9400E-01
1.2207E-01
1.2811E-01
1.1341E-01
5.2564E-02
56
2.0000E+07
6.4300E+06
4.3000E+06
3.0000E+06
1.8500E+06
1.5000E+06
1.2000E+06
8.6100E+05
7.5000E+05
6.0000E+05
4.7000E+05
3.3000E+05
2.7000E+05
2.0000E+05
5.0000E+04
2.0000E+04
1.7000E+04
3.7400E+03
2.2500E+03
1.9200E+02
1.8800E+02
1.1800E+02
1.1600E+02
1.0500E+02
1.0100E+02
6.7500E+01
6.5000E+01
3.7100E+01
3.6000E+01
2.1800E+01
2.1200E+01
2.0500E+01
7.0000E+00
6.8800E+00
6.5000E+00
6.2500E+00
5.0000E+00
1.1300E+00
1.0800E+00
1.0100E+00
6.2500E-01
4.5000E-01
3.7500E-01
3.5000E-01
3.2500E-01
2.5000E-01
2.0000E-01
1.5000E-01
1.0000E-01
8.0000E-02
6.0000E-02
5.0000E-02
4.0000E-02
2.5300E-02
1.0000E-02
4.0000E-03
1.0000E-05
I want to use Power Query to import the entire folder into Excel, with the data in each text file having its own column, and the column header being the name of the text file.
Like this
The problem is that Power Query only seems to import the file names, but not the data within them.
So I get something like:
this
With no data underneath its respective column. What am I doing wrong? Would it have something to do with Power Query seeing the data as 'binary' instead of 'text'?
This should do what you want: read in all .txt files in a directory, then place the values from each file into its own column, with the file name as the column header.
Obviously, change the path in the first step.
It assumes a single column of data in each source file.
let
Source = Folder.Files("C:\directory\subdirectory\"),
#"Filtered Rows" = Table.SelectRows(Source, each ([Extension] = ".txt")),
#"Added Custom" = Table.AddColumn(#"Filtered Rows", "Custom", each Table.AddIndexColumn(Csv.Document(File.Contents([Folder Path]&"\"&[Name]),[Delimiter=",", Encoding=1252, QuoteStyle=QuoteStyle.None]),"Index",1)),
#"Expanded Custom.1" = Table.ExpandTableColumn(#"Added Custom", "Custom", {"Column1", "Index"}, {"Column1", "Index"}),
#"Removed Other Columns" = Table.SelectColumns(#"Expanded Custom.1",{"Name", "Column1", "Index"}),
#"Pivoted Column" = Table.Pivot(#"Removed Other Columns", List.Distinct(#"Removed Other Columns"[Name]), "Name", "Column1"),
#"Removed Columns" = Table.RemoveColumns(#"Pivoted Column",{"Index"})
in
#"Removed Columns"
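To see why the Index column is added before pivoting, here is a small self-contained sketch of the last two steps. The Long table is made-up data in the shape produced by the #"Removed Other Columns" step; the file names are invented.

```powerquery
let
    // Made-up data: one row per value, tagged with its file name and position
    Long = #table(
        {"Name", "Column1", "Index"},
        {
            {"a.txt", "24", 1},
            {"a.txt", "2.0000E+07", 2},
            {"b.txt", "56", 1},
            {"b.txt", "2.0000E+07", 2},
            {"b.txt", "6.4300E+06", 3}
        }
    ),
    // Index keeps the rows apart, so each file name becomes a column
    // and values with the same position share a row
    Pivoted = Table.Pivot(Long, List.Distinct(Long[Name]), "Name", "Column1"),
    Result = Table.RemoveColumns(Pivoted, {"Index"})
in
    Result
```

Shorter files are padded with null at the bottom of their column. Without the Index column, every value of a file would compete for the same cell, and a pivot with no aggregation function cannot place them.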

Power Query file path from cell value

In my Excel workbook I have a table called ResultsTable, and in that table there is a file path:
C:\Users\XXXX\OneDrive - WORK\Digital Soil\Data\Results
I have a query that should get all Excel files from that folder and transform the data into something useful. It looks like this:
let
Source = Folder.Files("ResultsTable"),
#"Filtered Hidden Files1" = Table.SelectRows(Source, each [Attributes]?[Hidden]? <> true),
#"Invoke Custom Function1" = Table.AddColumn(#"Filtered Hidden Files1", "Transform File from Analyseresultater", each #"Transform File from Analyseresultater"([Content])),
#"Renamed Columns1" = Table.RenameColumns(#"Invoke Custom Function1", {"Name", "Source.Name"}),
#"Removed Other Columns1" = Table.SelectColumns(#"Renamed Columns1", {"Source.Name", "Transform File from Analyseresultater"}),
#"Expanded Table Column1" = Table.ExpandTableColumn(#"Removed Other Columns1", "Transform File from Analyseresultater", Table.ColumnNames(#"Transform File from Analyseresultater"(#"Sample File"))),
#"Removed Other Columns" = Table.SelectColumns(#"Expanded Table Column1",{"Key", "Attribute", "Value"})
in
#"Removed Other Columns"
But I get the error:
DataFormat.Error: The supplied folder path must be a valid absolute path. Details: ResultsTable
I hope someone can help me get through this error :)
EDIT: Added screenshot of how my sheet with tables are set up
You can fix the code like this:
let
FilePath = Excel.CurrentWorkbook(){[Name="ResultsTable"]}[Content][Path to results]{0},
Source = Folder.Files(FilePath),
In the original code, Folder.Files() was receiving the literal text "ResultsTable", not the cell value in ResultsTable. You need to first pick the cell value with Excel.CurrentWorkbook(), and then pass it to Folder.Files().

how to resolve Power BI error - the key didn't match any rows in the table

I am trying to load (combine) multiple Excel files into Power BI (October 2019 version). Every file has only 1 sheet. Each sheet has 1 range, and each range has the same schema across all files. (The sheet names are different, though.) A sample sheet name is '200704'.
Here are my steps:
Get Data \ Folder \ Connect
specify the Folder path
Combine & Load
select one of the files as my sample file; click on the file name as my Parameter1; click OK
After I click OK, the cursor spins for a bit, and then it stops. Nothing happens. So, I go to Edit Queries \ Edit Queries. There is a warning symbol on my data query that reads:
An error occurred in the 'Transform File' query. Expression.Error:
The key didn't match any rows in the table.
Details: Key = Item=200704 Kind=Sheet Table=[Table]
How do I resolve this error?
If it helps, Power BI generated 5 queries for me, and the structure is:
Transform File from data [2]
Helper Queries [3]
Parameter1 (Sample File)
Sample File
Transform File
Transform Sample File
Other Queries [1]
data
Interestingly, in case it helps diagnose the issue: if I set the sample file to First file, or set it to my first file manually, the following error is thrown in the dialog, but it doesn't show which query is in error when I try to view or edit the query.
Failed to save modifications to the server. Error returned: 'OLE DB or ODBC error: [Expression.Error] The key didn't match any rows in the table..'.
And, to be sure, when I attempt to load this file (or any file in the folder, for that matter) individually (via Excel connection), it loads successfully. So, something must be wrong with the M code in my Folder connection.
I figured out the cause of my problem and the solution. The issue is that the row in my template query was being referenced incorrectly (i.e., the primary key between the template query and the regular query is wrong, and it has hard-coding of sheet names). To fix that, I had to remove all other columns in the template query table except the Data column, as described here. (It's odd that no MS documentation on combining multiple Excel files discusses this very important step.)
For comparison, here is my former (incorrect) M code:
Transform Sample File:
let
Source = Excel.Workbook(Parameter1, null, true),
#"Sample_Sheet" = Source{[Item="sample",Kind="Sheet"]}[Data],
#"Promoted Headers" = Table.PromoteHeaders(#"Sample_Sheet", [PromoteAllScalars=true])
in
#"Promoted Headers"
test:
let
Source = Folder.Files("C:\some folder path"),
#"Filtered Hidden Files1" = Table.SelectRows(Source, each [Attributes]?[Hidden]? <> true),
#"Invoke Custom Function1" = Table.AddColumn(#"Filtered Hidden Files1", "Transform File", each #"Transform File"([Content])),
#"Renamed Columns1" = Table.RenameColumns(#"Invoke Custom Function1", {"Name", "Source.Name"}),
#"Removed Other Columns1" = Table.SelectColumns(#"Renamed Columns1", {"Source.Name", "Transform File"}),
#"Expanded Table Column1" = Table.ExpandTableColumn(#"Removed Other Columns1", "Transform File", Table.ColumnNames(#"Transform File"(#"Sample File"))),
#"Changed Type" = Table.TransformColumnTypes(#"Expanded Table Column1",{{"Source.Name", type text}, {"ID", type text}, {"Name", type text}})
in
#"Changed Type"
And here is my new (correct) code:
Transform Sample File:
let
Source = Excel.Workbook(Parameter1, null, true),
#"Removed Columns" = Table.RemoveColumns(Source,{"Name", "Item", "Kind", "Hidden"}),
Data = #"Removed Columns"{0}[Data],
#"Promoted Headers" = Table.PromoteHeaders(Data, [PromoteAllScalars=true]),
#"Changed Type" = Table.TransformColumnTypes(#"Promoted Headers",{{"ID", type text}, {"Name", type text}})
in
#"Changed Type"
test:
let
Source = Folder.Files("C:\some folder path"),
#"Filtered Hidden Files1" = Table.SelectRows(Source, each [Attributes]?[Hidden]? <> true),
#"Invoke Custom Function1" = Table.AddColumn(#"Filtered Hidden Files1", "Transform File", each #"Transform File"([Content])),
#"Renamed Columns1" = Table.RenameColumns(#"Invoke Custom Function1", {"Name", "Source.Name"}),
#"Removed Other Columns1" = Table.SelectColumns(#"Renamed Columns1", {"Source.Name", "Transform File"}),
#"Expanded Table Column1" = Table.ExpandTableColumn(#"Removed Other Columns1", "Transform File", Table.ColumnNames(#"Transform File"(#"Sample File")))
in
#"Expanded Table Column1"
Notice the 'Removed Columns' step in the new template query. This is the "secret sauce" to the key problem. Also notice that I kept all default steps after my 'Data' step (i.e., 'Promoted Headers' and 'Changed Type') in my template query. This is because all of my sheets have the same schema. If this weren't true, then I would need to move those steps to the regular query.
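As far as I can tell, what matters in the new template query is that the sheet is picked by position ({0}) instead of by a hard-coded Item name. A minimal sketch of that idea, assuming each workbook has exactly one sheet: Nav is a made-up stand-in for the navigation table that Excel.Workbook(Parameter1, null, true) returns, so the example runs without a file.

```powerquery
let
    // Made-up stand-in for the table returned by Excel.Workbook;
    // the sheet is named "200704", not "sample"
    Nav = #table(
        {"Name", "Data", "Item", "Kind", "Hidden"},
        {{
            "200704",
            #table({"Column1", "Column2"}, {{"ID", "Name"}, {"1", "x"}}),
            "200704",
            "Sheet",
            false
        }}
    ),
    // A key lookup such as Nav{[Item = "sample", Kind = "Sheet"]}[Data]
    // fails here because no row has Item = "sample".
    // Picking the first sheet by position works whatever it is called:
    FirstSheet = Table.SelectRows(Nav, each [Kind] = "Sheet"){0}[Data],
    Promoted = Table.PromoteHeaders(FirstSheet, [PromoteAllScalars = true])
in
    Promoted
```

Filtering on Kind before taking {0} is an extra precaution in case a workbook also exposes tables or named ranges; with one sheet per file it behaves the same as #"Removed Columns"{0}[Data].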
I had exactly the same error simply because the Power BI VNET Gateway could not authenticate to the data source when refreshing a dataset hosted in a Power BI Premium capacity workspace. It was totally unexpected and confusing, but once the correct credentials were set in the Gateway configuration, everything worked fine and the error went away.

Appending Queries using Powerquery/M in Excel

I have a series of about 70 queries in an Excel workbook, and I want to append them into one output table.
The simple editor freezes, leaving me with the 'Advanced Editor' option.
I have tried:
let
Source = Excel.CurrentWorkbook(){[Name="Table110"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Source", type any}, {"Sender", type any}, {"Subject", type any}, {"Date", type any}, {"Body", type any}}),
#"Removed Blank Rows" = Table.SelectRows(#"Changed Type", each not List.IsEmpty(List.RemoveMatchingItems(Record.FieldValues(_), {"", null})))
#"Appended Query" = Table.Combine({#"Removed Blank Rows",#"A A",B,C,#"D D",E})
in
#"Appended Query"
(Shortened Table.Combine line for clarity)
The editor brings back 'Token Comma expected' for this and highlights #"Appended Query" in this line:
#"Appended Query" = Table.Combine({#"Removed Blank Rows",#"A A",B,C,#"D D",E})
Can anyone see/help explain this error?
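A comma is missing at the end of the #"Removed Blank Rows" step. Every step in a let expression except the last must end with a comma, so the parser reaches #"Appended Query" while still expecting one.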
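For illustration, here is a cut-down version of the query with the comma in place. Source and Other are made-up tables standing in for Table110 and the other queries being appended, so the sketch runs on its own.

```powerquery
let
    // Made-up tables standing in for Table110 and one of the other queries
    Source = #table({"Sender", "Subject"}, {{"ann", "hello"}, {"", null}}),
    Other = #table({"Sender", "Subject"}, {{"bob", "re: hello"}}),
    #"Removed Blank Rows" = Table.SelectRows(
        Source,
        each not List.IsEmpty(List.RemoveMatchingItems(Record.FieldValues(_), {"", null}))
    ),  // this trailing comma is what the original step was missing
    #"Appended Query" = Table.Combine({#"Removed Blank Rows", Other})
in
    #"Appended Query"
```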
