I am trying to use Power Query in Excel 2013 to import a folder full of 121 text files. Each text file has a column of numbers:
24
2.0000E+07
1.0000E+07
5.0000E+06
2.0000E+06
1.0000E+06
1.0000E+05
1.0000E+04
1.0000E+03
1.0000E+02
1.0000E+01
1.0000E+00
6.2500E-01
5.0000E-01
4.0000E-01
3.0000E-01
2.0000E-01
1.0000E-01
8.0000E-02
6.0000E-02
4.0000E-02
3.0000E-02
2.0000E-02
1.0000E-02
2.0000E-04
1.0000E-05
1.0516E-05
9.3907E-06
3.3497E-04
1.8445E-03
1.3411E-03
5.4756E-03
9.4254E-03
1.2390E-02
1.4350E-02
1.5677E-02
1.7293E-02
4.0507E-03
2.0602E-03
2.1823E-03
3.1392E-03
7.5455E-03
9.1609E-02
7.5750E-02
1.2536E-01
1.9400E-01
1.2207E-01
1.2811E-01
1.1341E-01
5.2564E-02
56
2.0000E+07
6.4300E+06
4.3000E+06
3.0000E+06
1.8500E+06
1.5000E+06
1.2000E+06
8.6100E+05
7.5000E+05
6.0000E+05
4.7000E+05
3.3000E+05
2.7000E+05
2.0000E+05
5.0000E+04
2.0000E+04
1.7000E+04
3.7400E+03
2.2500E+03
1.9200E+02
1.8800E+02
1.1800E+02
1.1600E+02
1.0500E+02
1.0100E+02
6.7500E+01
6.5000E+01
3.7100E+01
3.6000E+01
2.1800E+01
2.1200E+01
2.0500E+01
7.0000E+00
6.8800E+00
6.5000E+00
6.2500E+00
5.0000E+00
1.1300E+00
1.0800E+00
1.0100E+00
6.2500E-01
4.5000E-01
3.7500E-01
3.5000E-01
3.2500E-01
2.5000E-01
2.0000E-01
1.5000E-01
1.0000E-01
8.0000E-02
6.0000E-02
5.0000E-02
4.0000E-02
2.5300E-02
1.0000E-02
4.0000E-03
1.0000E-05
I want to use Power Query to import the entire folder into Excel, with the data in each text file having its own column, and the column header being the name of the text file.
Like this:
The problem is that Power Query only seems to import the file names, but not the data within them.
So I get something like this:
With no data underneath its respective column. What am I doing wrong? Could it have something to do with Power Query seeing the data as 'binary' instead of 'text'?
This should do what you want: read in all .txt files in a directory, then place the values from each file into its own column, with the column header being the filename.
Obviously, change the path in the first step.
This assumes a single column of data in each source file.
let
    // List all files in the folder (change this path)
    Source = Folder.Files("C:\directory\subdirectory\"),
    // Keep only .txt files
    #"Filtered Rows" = Table.SelectRows(Source, each ([Extension] = ".txt")),
    // Parse each file into a one-column table and add a 1-based row index so the values stay in order
    #"Added Custom" = Table.AddColumn(#"Filtered Rows", "Custom", each Table.AddIndexColumn(Csv.Document(File.Contents([Folder Path]&"\"&[Name]), [Delimiter=",", Encoding=1252, QuoteStyle=QuoteStyle.None]), "Index", 1)),
    #"Expanded Custom.1" = Table.ExpandTableColumn(#"Added Custom", "Custom", {"Column1", "Index"}, {"Column1", "Index"}),
    #"Removed Other Columns" = Table.SelectColumns(#"Expanded Custom.1", {"Name", "Column1", "Index"}),
    // Pivot so each filename becomes a column holding that file's values
    #"Pivoted Column" = Table.Pivot(#"Removed Other Columns", List.Distinct(#"Removed Other Columns"[Name]), "Name", "Column1"),
    #"Removed Columns" = Table.RemoveColumns(#"Pivoted Column", {"Index"})
in
    #"Removed Columns"
I have files that I get from a 3rd party every day; they have been building up for over a year, and I need to combine them into summary pivots, one file/pivot for each month.
I have ~30 files with an .xls extension, but I think they are really formatted as text files, because when I open them I get the notification below, and when I save them the default format is tab-delimited text.
Example of notification
Each file has the same formatting and the same column headers. My current slow strategy is to open one at a time and paste the contents all into one file, then create a pivot. I know I should be doing this faster using either Power Pivot/Power Query or VB. Which one should I use and can anyone give me hints on how to get started?
You can do this in Power Query using either of the queries below. Go to Data > Get Data > From Other Sources > Blank Query, then Home > Advanced Editor, and paste in the code.
To read and combine all Excel files in a directory (the filter below keeps .xlsx; adjust the extension if needed):
let
    // Read all files in the directory you specify here
    Source = Folder.Files("C:\directory\subdirectory"),
    // Keep only files with the .xlsx extension
    #"Filtered Rows" = Table.SelectRows(Source, each ([Extension] = ".xlsx")),
    #"Removed Other Columns" = Table.SelectColumns(#"Filtered Rows", {"Name", "Content"}),
    // Parse each workbook's binary content
    #"Added Custom" = Table.AddColumn(#"Removed Other Columns", "GetFileData", each Excel.Workbook([Content], true)),
    #"Expanded GetFileData" = Table.ExpandTableColumn(#"Added Custom", "GetFileData", {"Data", "Hidden", "Item", "Kind", "Name"}, {"Data", "Hidden", "Item", "Kind", "Sheet"}),
    #"Removed Columns" = Table.RemoveColumns(#"Expanded GetFileData", {"Content", "Hidden", "Item", "Kind"}),
    // Build the union of all column names so files with differing columns still expand cleanly
    List = List.Union(List.Transform(#"Removed Columns"[Data], each Table.ColumnNames(_))),
    #"Expanded Data" = Table.ExpandTableColumn(#"Removed Columns", "Data", List, List)
in
    #"Expanded Data"
To read and combine all text files (you specify the extension) in a directory:
let
    Source = Folder.Files("C:\directory\subdirectory"),
    // Keep only files with the .txt extension
    #"Filtered Rows" = Table.SelectRows(Source, each ([Extension] = ".txt")),
    // Parse each file and add a row index
    #"Added Custom1" = Table.AddColumn(#"Filtered Rows", "Custom", each Table.AddIndexColumn(Csv.Document(File.Contents([Folder Path]&"\"&[Name]), [Delimiter=",", Encoding=1252, QuoteStyle=QuoteStyle.None]), "Index")),
    #"Expanded Custom" = Table.ExpandTableColumn(#"Added Custom1", "Custom", {"Column1", "Index"}, {"Column1", "Index"}),
    #"Removed Other Columns" = Table.SelectColumns(#"Expanded Custom", {"Column1", "Index", "Name"}),
    // Pivot so each filename becomes its own column
    #"Pivoted Column" = Table.Pivot(#"Removed Other Columns", List.Distinct(#"Removed Other Columns"[Name]), "Name", "Column1"),
    #"Removed Columns" = Table.RemoveColumns(#"Pivoted Column", {"Index"})
in
    #"Removed Columns"
I have a folder with 3 different subfolders where I store some daily exports. I want to get only the latest updated file from each folder path, as I show in the pic.
I need to keep the binary and attributes columns of the file with the latest "Date modified" for each path. The latest date modified is different for each subfolder. My subfolders refer to the months of the year (their names are May, Jun and July, and soon August will be added).
That way, since I have 3 different values in my Folder Path column, I will only have 3 CSVs to expand.
I have tried grouping by folder path and the maximum Date modified, but then I lose the other fields. Adding a maximum of attributes, or all rows, does not solve my problem either.
I also tried this solution: https://community.powerbi.com/t5/Desktop/Keep-only-the-latest-date-for-duplicate-entries/td-p/638447
But I get stuck on an error: Function type value cannot be converted to Table type. Details: Value = [Function], Type = [Type].
M Query code:
let
    Source = AzureStorage.DataLake(".../usersDailyData"),
    #"Filtered Hidden Files1" = Table.SelectRows(#"Sorted Rows", each [Attributes]?[Hidden]? <> true),
    #"Invoke Custom Function1" = Table.AddColumn(#"Filtered Hidden Files1", "Transformar archivo", each #"Transformar archivo"([Content])),
    #"Renamed Columns1" = Table.RenameColumns(#"Invoke Custom Function1", {"Name", "Source.Name"}),
    #"Removed Other Columns1" = Table.SelectColumns(#"Renamed Columns1", {"Source.Name", "Transformar archivo"}),
    #"Expanded Table Column1" = Table.ExpandTableColumn(#"Removed Other Columns1", "Transformar archivo", Table.ColumnNames(#"Transformar archivo"(#"Archivo de ejemplo"))),
    #"Changed Type" = Table.TransformColumnTypes(#"Expanded Table Column1", { Columns name here })
in
    #"Changed Type"
Any ideas to solve it?
Thank you very much.
I'd suggest adding a column that holds the max datetime for each Folder Path, then comparing each row's datetime to that max and filtering out the non-matches, along the lines of the code below. I think it goes after the Invoke Custom Function1 step, but you can move it where needed.
...
// For each row, look up the latest "Date modified" among the rows sharing its Folder Path
#"Added Custom" = Table.AddColumn(#"Invoke Custom Function1", "MaxDate", (i) => List.Max(Table.SelectRows(#"Invoke Custom Function1", each [Folder Path] = i[Folder Path])[Date modified]), type datetime),
// Flag the rows whose own date equals that maximum, then keep only those
#"Added Custom1" = Table.AddColumn(#"Added Custom", "Custom", each if [Date modified] = [MaxDate] then "keep" else null),
#"Filtered Rows" = Table.SelectRows(#"Added Custom1", each ([Custom] = "keep")),
...
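Alternatively, since grouping lost the other fields for you, a grouped approach can still work if you keep all rows in each group and then pull the latest row back out with Table.Max. This is only a hedged sketch I haven't tested against your data; it assumes the folder listing (here called Source) still has its standard Name, Content, Date modified and Attributes columns:
...
// Group the folder listing by path, keeping every row of each group as a nested table
#"Grouped Rows" = Table.Group(Source, {"Folder Path"}, {{"AllRows", each _, type table}}),
// Table.Max returns the row (as a record) with the largest "Date modified" in each group
#"Latest Row" = Table.AddColumn(#"Grouped Rows", "Latest", each Table.Max([AllRows], "Date modified")),
#"Kept Columns" = Table.SelectColumns(#"Latest Row", {"Folder Path", "Latest"}),
// Expand the record back into the columns you need (Content is the binary)
#"Expanded Latest" = Table.ExpandRecordColumn(#"Kept Columns", "Latest", {"Name", "Content", "Date modified", "Attributes"})
...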
I have a datasource from an external Excel file that I have added to an Excel worksheet. I need to add new custom columns that compare the data to a table ("My_Table") in another worksheet that is manually updated. I used the Power Query Editor and created a new column that checks if there is a matching entry in My_Table based on matching 3 columns and gives a True/False result (ie for each row of the datasource, if the acctName, projectName, and boardName match a corresponding row in My_Table, then it returns true):
#"Added Custom" = Table.AddColumn(#"Reordered Columns", "Tracked", each Table.Contains( My_Table, [Customer=[acctName], Project=[projectName], Board=[boardName]]))
What I would like to do now is the exact same thing, but count how many times those three columns match in "My_Table". I thought Table.RowCount would work, but I'm not sure if that's the right way to do it, as I either get an error or a zero result.
dolomike, Here's another shot at it...
I started with this as Table1:
...and this as My_Table:
...and used this M code:
let
    Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
    // Left-join Table1 to My_Table on the three key columns
    #"Merged Queries" = Table.NestedJoin(Source, {"acctName", "projectName", "boardName"}, My_Table, {"Customer", "Project", "Board"}, "My_Table", JoinKind.LeftOuter),
    #"Expanded My_Table" = Table.ExpandTableColumn(#"Merged Queries", "My_Table", {"Customer", "Project", "Board"}, {"My_Table.Customer", "My_Table.Project", "My_Table.Board"}),
    // Count how many joined rows fall into each key combination
    #"Grouped Rows" = Table.Group(#"Expanded My_Table", {"My_Table.Customer", "My_Table.Project", "My_Table.Board"}, {{"Count", each Table.RowCount(_), type number}, {"AllData", each _, type table [acctName=text, projectName=text, boardName=text, My_Table.Customer=text, My_Table.Project=text, My_Table.Board=text]}}),
    // Replace any count equal to the maximum with 0 (treating the biggest group as the unmatched rows)
    Custom2 = Table.TransformColumns(#"Grouped Rows", {"Count", each if _ = List.Max(#"Grouped Rows"[Count]) then 0 else _}),
    #"Removed Other Columns" = Table.SelectColumns(Custom2, {"Count", "AllData"}),
    #"Expanded AllData" = Table.ExpandTableColumn(#"Removed Other Columns", "AllData", {"acctName", "projectName", "boardName", "My_Table.Customer", "My_Table.Project", "My_Table.Board"}, {"acctName", "projectName", "boardName", "My_Table.Customer", "My_Table.Project", "My_Table.Board"}),
    #"Removed Other Columns1" = Table.SelectColumns(#"Expanded AllData", {"Count", "acctName", "projectName", "boardName"}),
    #"Reordered Columns" = Table.ReorderColumns(#"Removed Other Columns1", {"acctName", "projectName", "boardName", "Count"}),
    #"Renamed Columns" = Table.RenameColumns(#"Reordered Columns", {{"acctName", "Customer"}, {"projectName", "Project"}, {"boardName", "Board"}}),
    #"Removed Duplicates" = Table.Distinct(#"Renamed Columns")
in
    #"Removed Duplicates"
...to get this result:
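If you would rather stay with the Table.RowCount idea from the question, a more direct alternative is to count the matching My_Table rows for each row of the source. This is just a sketch I haven't tested against your workbook; it reuses the #"Reordered Columns" step and column names from your Table.Contains version:
// For each source row, count the My_Table rows whose three key columns match it
#"Added Count" = Table.AddColumn(#"Reordered Columns", "TrackedCount", (r) =>
    Table.RowCount(
        Table.SelectRows(My_Table, each [Customer] = r[acctName]
            and [Project] = r[projectName]
            and [Board] = r[boardName])), Int64.Type)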
In my Excel workbook I have a table called ResultsTable, and in that table there is a file path:
C:\Users\XXXX\OneDrive - WORK\Digital Soil\Data\Results
I have a query that should get all Excel files from the folder and transform the data into something useful, looking like this:
let
    Source = Folder.Files("ResultsTable"),
    #"Filtered Hidden Files1" = Table.SelectRows(Source, each [Attributes]?[Hidden]? <> true),
    #"Invoke Custom Function1" = Table.AddColumn(#"Filtered Hidden Files1", "Transform File from Analyseresultater", each #"Transform File from Analyseresultater"([Content])),
    #"Renamed Columns1" = Table.RenameColumns(#"Invoke Custom Function1", {"Name", "Source.Name"}),
    #"Removed Other Columns1" = Table.SelectColumns(#"Renamed Columns1", {"Source.Name", "Transform File from Analyseresultater"}),
    #"Expanded Table Column1" = Table.ExpandTableColumn(#"Removed Other Columns1", "Transform File from Analyseresultater", Table.ColumnNames(#"Transform File from Analyseresultater"(#"Sample File"))),
    #"Removed Other Columns" = Table.SelectColumns(#"Expanded Table Column1", {"Key", "Attribute", "Value"})
in
    #"Removed Other Columns"
But I get the error:
DataFormat.Error: The supplied folder path must be a valid absolute path.
Details: ResultsTable
I hope someone can help me get through this error :)
EDIT: Added a screenshot of how my sheet with the tables is set up.
You can fix the code like this:
let
    // Read the folder path stored in the ResultsTable table, then pass it to Folder.Files
    FilePath = Excel.CurrentWorkbook(){[Name="ResultsTable"]}[Content][Path to results]{0},
    Source = Folder.Files(FilePath),
    // ... the rest of the query continues as before
In the original code, Folder.Files() was receiving the literal text "ResultsTable", not the cell value stored in ResultsTable. You need to pick up the cell value with Excel.CurrentWorkbook() first, and then pass it to Folder.Files().
I am trying to append close to 10,000 Excel files (each 50-100 KB in size). About halfway through the process I run into an error in Power Query, and it is impossible to figure out which .xlsx file is the one causing the issue.
PQ's Queries and Connections pane shows the following error at the same time:
How do I go about resolving this issue other than loading the files into Power Query one by one until I find the file(s) that are giving me the errors? Thanks for reading!
I've frequently run into issues where PQ outright fails when it hits "error" cells in Excel workbooks, even if you've tried to remove errors in earlier steps. I'm not clear on exactly what triggers this, but I wonder if that could be the case here, since the message mentions a "#VALUE!" error. While PQ should probably handle this more gracefully, I made a couple of queries that let me input a directory and get back the workbook, sheet, and row of every cell error in every Excel file in that directory. I've never tried it with 10k Excel files, but if my code were cleaned up to be more efficient it would probably run quickly enough.
The query that gets all the raw excel file data looks like this:
let
    Source = Folder.Files(YOUR DIRECTORY HERE),
    // Skip Excel lock/temp files (names starting with ~) and keep only .xlsx / .xlsm
    #"Filtered Rows1" = Table.SelectRows(Source, each not Text.StartsWith([Name], "~")),
    #"Filtered Rows" = Table.SelectRows(#"Filtered Rows1", each Text.EndsWith([Extension], ".xlsx") or Text.EndsWith([Extension], ".xlsm")),
    // Parse every workbook and expand to one row per sheet
    #"Added Custom" = Table.AddColumn(#"Filtered Rows", "WorkbookData", each Excel.Workbook([Content])),
    #"Removed Other Columns" = Table.SelectColumns(#"Added Custom", {"Folder Path", "Name", "WorkbookData"}),
    #"Expanded WorkbookData" = Table.ExpandTableColumn(#"Removed Other Columns", "WorkbookData", {"Data", "Hidden", "Item", "Kind", "Name"}, {"WorkbookData.Data", "WorkbookData.Hidden", "WorkbookData.Item", "WorkbookData.Kind", "WorkbookData.Name"}),
    #"Filtered Rows2" = Table.SelectRows(#"Expanded WorkbookData", each ([WorkbookData.Kind] = "Sheet")),
    #"Removed Other Columns1" = Table.SelectColumns(#"Filtered Rows2", {"Folder Path", "Name", "WorkbookData.Name", "WorkbookData.Data"}),
    // Expand every sheet's data using the union of all column names
    ExpandedData = Table.ExpandTableColumn(#"Removed Other Columns1", "WorkbookData.Data", Table.ColumnNames(Table.Combine(#"Removed Other Columns1"[WorkbookData.Data]))),
    // Build a per-sheet row number so each error can be traced back to its row
    IdentifySheets = Table.AddColumn(ExpandedData, "UniqueSheet", each [Folder Path] & [Name] & [WorkbookData.Name]),
    SheetRowCounts = Table.Group(IdentifySheets, {"UniqueSheet"}, {{"Count", each Table.RowCount(_), type number}}),
    #"Added Custom2" = Table.AddColumn(SheetRowCounts, "PerSheetRow", each List.Numbers(1, [Count], 1)),
    #"Expanded PerSheetIndex" = Table.ExpandListColumn(#"Added Custom2", "PerSheetRow"),
    IndexBase = Table.AddIndexColumn(#"Expanded PerSheetIndex", "Index", 0, 1),
    #"Added Index" = Table.AddIndexColumn(IdentifySheets, "Index", 0, 1),
    // Join the per-sheet row numbers back onto the data by overall index
    #"Merged Queries" = Table.NestedJoin(#"Added Index", {"Index"}, IndexBase, {"Index"}, "NewColumn", JoinKind.LeftOuter),
    #"Expanded NewColumn" = Table.ExpandTableColumn(#"Merged Queries", "NewColumn", {"PerSheetRow"}, {"PerSheetRow"}),
    #"Removed Columns" = Table.RemoveColumns(#"Expanded NewColumn", {"UniqueSheet", "Index"}),
    #"Reordered Columns" = Table.ReorderColumns(#"Removed Columns", List.Combine({{"Folder Path", "Name", "WorkbookData.Name", "PerSheetRow"}, List.RemoveMatchingItems(Table.ColumnNames(ExpandedData), {"Folder Path", "Name", "WorkbookData.Name"})}))
in
    #"Reordered Columns"
And that part is set up as a connection-only query, since I don't want to load the data of every sheet of every workbook I'm checking.
The query I use to load the rows with errors looks like this:
let
    Source = NAME OF THE QUERY ABOVE,
    // Keep only rows that contain an error in any column
    #"Kept Errors" = Table.SelectRowsWithErrors(Source, Table.ColumnNames(Source)),
    // Build one replacement entry per column so every column gets its errors replaced
    ColumnList = Table.FromList(Table.ColumnNames(#"Kept Errors")),
    #"Added Custom" = Table.AddColumn(ColumnList, "Custom", each "ERROR"),
    #"Added Custom1" = Table.AddColumn(#"Added Custom", "Replacements", each Record.FieldValues(_)),
    ErrorReplacements = Table.SelectColumns(#"Added Custom1", {"Replacements"}),
    // Replace each error cell with the text "ERROR"
    #"Replaced Errors" = Table.ReplaceErrorValues(#"Kept Errors", ErrorReplacements[Replacements]),
    #"Renamed Columns" = Table.RenameColumns(#"Replaced Errors", {{"PerSheetRow", "SheetRow"}, {"Name", "Workbook"}, {"WorkbookData.Name", "Sheet"}})
in
    #"Renamed Columns"
I couldn't find a way to get PQ to convert the "error" cells into a string saying which specific error it is (probably possible, I just don't know how), so instead I have it replace all the error cells with "ERROR" and use conditional formatting on my sheet to highlight that.
I can't say how functional this would be for your case, but it has helped me find error cells in sets of Excel files numerous times.
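On turning the error cells into a description of the specific error: one approach that may work (a sketch I haven't verified at this scale) is to wrap every cell in try and pull the reason and message out of the error record, in place of the Table.ReplaceErrorValues step above:
let
    Source = NAME OF THE QUERY ABOVE,
    #"Kept Errors" = Table.SelectRowsWithErrors(Source, Table.ColumnNames(Source)),
    // "try" catches the error stored in a cell and exposes its Reason and Message fields
    ErrorToText = (v) =>
        let r = try v
        in if r[HasError] then "ERROR: " & r[Error][Reason] & " - " & r[Error][Message] else r[Value],
    // Apply the conversion to every column
    #"Described Errors" = Table.TransformColumns(
        #"Kept Errors",
        List.Transform(Table.ColumnNames(#"Kept Errors"), (c) => {c, ErrorToText}))
in
    #"Described Errors"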