Related
I am attempting to extract section heading numbers from a column in excel using power query.
I have already achieved this by matching with an existing list. However, I wonder if there is a better way to achieve this in fewer steps.
M Code:
let
Source = Excel.CurrentWorkbook(){[Name="Table7"]}[Content],
#"Trimmed Text1" = Table.TransformColumns(Source,{{"Column1", PowerTrim, type text}}),
SectionNumbers = Table.AddColumn(#"Trimmed Text1", "SectionNumber", (x) => Text.Combine(Table.SelectRows(SectionNumbers, each Text.Contains(x[Column1],[SectionNumbers], Comparer.OrdinalIgnoreCase))[SectionNumbers],", ")),
#"Split Column by Delimiter2" = Table.SplitColumn(SectionNumbers, "SectionNumber", Splitter.SplitTextByEachDelimiter({","}, QuoteStyle.None, true), {"SectionNumber.1", "SectionNumber.2"}),
#"Added Custom1" = Table.AddColumn(#"Split Column by Delimiter2", "Custom", each if [SectionNumber.2] = null then [SectionNumber.1] else [SectionNumber.2]),
#"Removed Other Columns" = Table.SelectColumns(#"Added Custom1",{"Column1", "Custom"})
in
#"Removed Other Columns"
The Section numbers being matched to can be generated using:
SectionNumbers
let
Source = {1..16},
#"Converted to Table" = Table.FromList(Source, Splitter.SplitByNothing(), null, null, ExtraValues.Error),
#"Added Custom" = Table.AddColumn(#"Converted to Table", "Custom", each {1..9}),
#"Expanded Custom" = Table.ExpandListColumn(#"Added Custom", "Custom"),
#"Added Custom1" = Table.AddColumn(#"Expanded Custom", "Custom.1", each "."),
#"Merged Columns" = Table.CombineColumns(Table.TransformColumnTypes(#"Added Custom1", {{"Custom", type text}}, "en-GB"),{"Custom", "Custom.1"},Combiner.CombineTextByDelimiter("", QuoteStyle.None),"Merged"),
#"Merged Columns1" = Table.CombineColumns(Table.TransformColumnTypes(#"Merged Columns", {{"Column1", type text}}, "en-GB"),{"Column1", "Merged"},Combiner.CombineTextByDelimiter(".", QuoteStyle.None),"SectionNumbers")
in
#"Merged Columns1"
Essentially I would like a way of extracting any decimal at the start of a cell, either 15.0. or 15.0, or even 15.0.1 etc.
I have considered using regex i.e. \d+\.\d+[.] which should work however I have many rows and find that regex sometimes is computationally intensive in this case, so it takes much longer to load than the Above M Code.
Another power query method
Since you know your section numbers you could:
Generate a (buffered) list of all the section numbers
see if the first space-separated part of the string in column 1 exists in the Section Number list.
let
Source = Excel.CurrentWorkbook(){[Name="Table3"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Column1", type text}}),
//create list of all serial numbers
SerialNumbers = List.Buffer(
let
Part1 = List.Transform({1..16}, each Text.From(_) & "."),
Part2 =List.Transform({1..10}, each Text.From(_) & "."),
sn = List.Accumulate(Part1,{}, (state, current)=> state &
List.Generate(
()=>[s=current & Part2{0}, idx=0],
each [idx] < List.Count(Part2),
each [s=current & Part2{[idx]+1}, idx=[idx]+1],
each [s]))
in
sn),
#"Added Custom" = Table.AddColumn(#"Changed Type", "Custom", each
let
x = Text.Split([Column1]," "){0}
in
if List.Contains(SerialNumbers,x) then x else null, type text)
in
#"Added Custom"
How about
let Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
#"Added Custom" = Table.AddColumn(Source, "Custom", each if Text.PositionOfAny([Column1], {"0".."9"})>=0 then Text.BeforeDelimiter(Text.From([Column1])," ") else null)
in #"Added Custom"
I have a datasource from an external Excel file that I have added to an Excel worksheet. I need to add new custom columns that compare the data to a table ("My_Table") in another worksheet that is manually updated. I used the Power Query Editor and created a new column that checks if there is a matching entry in My_Table based on matching 3 columns and gives a True/False result (ie for each row of the datasource, if the acctName, projectName, and boardName match a corresponding row in My_Table, then it returns true):
#"Added Custom" = Table.AddColumn(#"Reordered Columns", "Tracked", each Table.Contains( My_Table, [Customer=[acctName], Project=[projectName], Board=[boardName]]))
What I would like to do now is do the exact same thing but count how many times those three columns match in "My_Table". I thought Tabel.RowCount would work but I'm not sure if that's the right way to do it as I either have an error or a zero result.
dolomike, Here's another shot at it...
I started with this as Table1:
...and this as My_Table:
...and used this M code:
let
Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
#"Merged Queries" = Table.NestedJoin(Source, {"acctName", "projectName", "boardName"}, My_Table, {"Customer", "Project", "Board"}, "My_Table", JoinKind.LeftOuter),
#"Expanded My_Table" = Table.ExpandTableColumn(#"Merged Queries", "My_Table", {"Customer", "Project", "Board"}, {"My_Table.Customer", "My_Table.Project", "My_Table.Board"}),
#"Grouped Rows" = Table.Group(#"Expanded My_Table", {"My_Table.Customer", "My_Table.Project", "My_Table.Board"}, {{"Count", each Table.RowCount(_), type number}, {"AllData", each _, type table [acctName=text, projectName=text, boardName=text, My_Table.Customer=text, My_Table.Project=text, My_Table.Board=text]}}),
Custom2 = Table.TransformColumns(#"Grouped Rows",{"Count", each if _ = List.Max(#"Grouped Rows"[Count]) then 0 else _}),
#"Removed Other Columns" = Table.SelectColumns(Custom2,{"Count", "AllData"}),
#"Expanded AllData" = Table.ExpandTableColumn(#"Removed Other Columns", "AllData", {"acctName", "projectName", "boardName", "My_Table.Customer", "My_Table.Project", "My_Table.Board"}, {"acctName", "projectName", "boardName", "My_Table.Customer", "My_Table.Project", "My_Table.Board"}),
#"Removed Other Columns1" = Table.SelectColumns(#"Expanded AllData",{"Count", "acctName", "projectName", "boardName"}),
#"Reordered Columns" = Table.ReorderColumns(#"Removed Other Columns1",{"acctName", "projectName", "boardName", "Count"}),
#"Renamed Columns" = Table.RenameColumns(#"Reordered Columns",{{"acctName", "Customer"}, {"projectName", "Project"}, {"boardName", "Board"}}),
#"Removed Duplicates" = Table.Distinct(#"Renamed Columns")
in
#"Removed Duplicates"
...to get this result:
I have, in my excel workbook, a Table called ResultsTable, in that table there is a file path
C:\Users\XXXX\OneDrive - WORK\Digital
Soil\Data\Results
I have Query that should get all excel files from the folder and transform the data into something usefull looking like this:
let
Source = Folder.Files("ResultsTable"),
#"Filtered Hidden Files1" = Table.SelectRows(Source, each [Attributes]?[Hidden]? <> true),
#"Invoke Custom Function1" = Table.AddColumn(#"Filtered Hidden Files1", "Transform File from Analyseresultater", each #"Transform File from Analyseresultater"([Content])),
#"Renamed Columns1" = Table.RenameColumns(#"Invoke Custom Function1", {"Name", "Source.Name"}),
#"Removed Other Columns1" = Table.SelectColumns(#"Renamed Columns1", {"Source.Name", "Transform File from Analyseresultater"}),
#"Expanded Table Column1" = Table.ExpandTableColumn(#"Removed Other Columns1", "Transform File from Analyseresultater", Table.ColumnNames(#"Transform File from Analyseresultater"(#"Sample File"))),
#"Removed Other Columns" = Table.SelectColumns(#"Expanded Table Column1",{"Key", "Attribute", "Value"})
in
#"Removed Other Columns"
But I get the error
DataFormat.Error: The supplied folder path must be a valid absolute
path. Details:
ResultsTable
I hope someone can help me get through this error :)
EDIT: Added screenshot of how my sheet with tables are set up
You can fix the code like,
let
FilePath = Excel.CurrentWorkbook(){[Name="ResultsTable"]}[Content][Path to results]{0},
Source = Folder.Files(FilePath),
In the original code, Folder.Files() was receiving the literal text "ResultsTable", not the cell value in ResultsTable. You need to first pick the cell value with Excel.CurrentWorkbook(), and then pass it to Folder.Files().
I am trying to append close to 10000 excel files (each having size of 50-100 kb). Half the way into the process I am running into an error with the PQ. The error hits half the way when I am appending files and it is impossible to figure out which .xlsx file is the one causing the issue.
PQ's Queries and Connections pane shows the following error at the same time:
How do I go about resolving this issue other than going one by one manually and uploading query on PQ until I find the file(s) which are giving me the errors? Thanks for reading!
I've frequently run into issues where PQ outright fails when it runs into "error" cells in excel workbooks, even if you've tried to remove errors in earlier steps. I'm not clear on the criteria that causes this, but I wonder if that could be the case here since it mentions a "#VALUE!" error in that message? While PQ should probably be handling this more gracefully, I made a couple of queries that let me input a directory and it will return the workbook, sheet, and row of every cell error in every excel file in that directory. I've never tried it with 10k excel files, but if my code were cleaned up to be more efficient it would probably work quickly enough.
The query that gets all the raw excel file data looks like this:
let
Source = Folder.Files(YOUR DIRECTORY HERE),
#"Filtered Rows1" = Table.SelectRows(Source, each not Text.StartsWith([Name], "~")),
#"Filtered Rows" = Table.SelectRows(#"Filtered Rows1", each Text.EndsWith([Extension], ".xlsx") or Text.EndsWith([Extension], ".xlsm")),
#"Added Custom" = Table.AddColumn(#"Filtered Rows", "WorkbookData", each Excel.Workbook([Content])),
#"Removed Other Columns" = Table.SelectColumns(#"Added Custom",{"Folder Path", "Name", "WorkbookData"}),
#"Expanded WorkbookData" = Table.ExpandTableColumn(#"Removed Other Columns", "WorkbookData", {"Data", "Hidden", "Item", "Kind", "Name"}, {"WorkbookData.Data", "WorkbookData.Hidden", "WorkbookData.Item", "WorkbookData.Kind", "WorkbookData.Name"}),
#"Filtered Rows2" = Table.SelectRows(#"Expanded WorkbookData", each ([WorkbookData.Kind] = "Sheet")),
#"Removed Other Columns1" = Table.SelectColumns(#"Filtered Rows2",{"Folder Path", "Name", "WorkbookData.Name", "WorkbookData.Data"}),
ExpandedData = Table.ExpandTableColumn(#"Removed Other Columns1", "WorkbookData.Data", Table.ColumnNames(Table.Combine(#"Removed Other Columns1"[WorkbookData.Data]))),
IdentifySheets = Table.AddColumn(ExpandedData, "UniqueSheet", each [Folder Path]&[Name]&[WorkbookData.Name]),
SheetRowCounts = Table.Group(IdentifySheets, {"UniqueSheet"}, {{"Count", each Table.RowCount(_), type number}}),
#"Added Custom2" = Table.AddColumn(SheetRowCounts, "PerSheetRow", each List.Numbers(1, [Count], 1)),
#"Expanded PerSheetIndex" = Table.ExpandListColumn(#"Added Custom2", "PerSheetRow"),
IndexBase = Table.AddIndexColumn(#"Expanded PerSheetIndex", "Index", 0, 1),
#"Added Index" = Table.AddIndexColumn(IdentifySheets, "Index", 0, 1),
#"Merged Queries" = Table.NestedJoin(#"Added Index",{"Index"},IndexBase,{"Index"},"NewColumn",JoinKind.LeftOuter),
#"Expanded NewColumn" = Table.ExpandTableColumn(#"Merged Queries", "NewColumn", {"PerSheetRow"}, {"PerSheetRow"}),
#"Removed Columns" = Table.RemoveColumns(#"Expanded NewColumn",{"UniqueSheet", "Index"}),
#"Reordered Columns" = Table.ReorderColumns(#"Removed Columns", List.Combine({{"Folder Path", "Name", "WorkbookData.Name", "PerSheetRow"}, List.RemoveMatchingItems(Table.ColumnNames(ExpandedData), {"Folder Path", "Name", "WorkbookData.Name"})}))
in
#"Reordered Columns"
And that part is setup as connection only query, since I don't want to load the data of every sheet of every workbook I'm checking.
The query I use to load the rows with errors in it looks like this:
let
Source = NAME OF THE QUERY ABOVE,
#"Kept Errors" = Table.SelectRowsWithErrors(Source, Table.ColumnNames(Source)),
ColumnList = Table.FromList(Table.ColumnNames(#"Kept Errors")),
#"Added Custom" = Table.AddColumn(ColumnList, "Custom", each "ERROR"),
#"Added Custom1" = Table.AddColumn(#"Added Custom", "Replacements", each Record.FieldValues(_)),
ErrorReplacements = Table.SelectColumns(#"Added Custom1",{"Replacements"}),
#"Replaced Errors" = Table.ReplaceErrorValues(#"Kept Errors", ErrorReplacements[Replacements]),
#"Renamed Columns" = Table.RenameColumns(#"Replaced Errors",{{"PerSheetRow", "SheetRow"}, {"Name", "Workbook"}, {"WorkbookData.Name", "Sheet"}})
in
#"Renamed Columns"
I couldn't find a way to get PQ convert the "error" cells to a string of which specific error it is (probably possible, I just don't know how), so instead I just have it replace all the error cells with "ERROR" and have conditional formatting on my sheet to highlight that.
Can't say how functional this would be for your case, but it has helped me numerous times to find errors cells in sets of excel files though.
My problem:
Through New Query -> From Other Sources -> From Web, I entered a static URL that allowed me to load approximately 60k "IDs" from a webpage in JSON format.
I believe each of these IDs corresponds to an item.
So they're all loaded and organised in a column, with one ID per line, inside a Query tab.
For the moment, no problem.
Now I need to import information from a dynamic URL that depends on the ID.
So I need to import from URL in this form:
http://www.example.com/xxx/xxxx/ID
This imports the following for each ID:
name of correspond item,
average price,
supply,
demand,
etc.
After research I came to the conclusion that I had to use the "Advanced Editor" inside the query editor to reference the ID query tab.
However I have no idea how to put together the static part with the ID, and how to repeat that over the 60k lines.
I tried this:
let
Source = Json.Document(Web.Contents("https://example.com/xx/xxxx/" & ID)),
name1 = Source[name]
in
name1
This returns an error.
I think it's because I can't add a string and a column.
Question: How do I reference the value of the cell I'm interested in and add it to my string ?
Question: Is what I'm doing viable?
Question: How is Excel going to handle loading 60k queries?
Each query is only a few words to import.
Question: Is it possible to load information from 60k different URLs with one query?
EDIT : thank you very much for answer Alexis, was very helpful. So to avoid copying what you posted I did it without the function (tell me what you think of it) :
let
Source = Json.Document(Web.Contents("https://example.com/all-ID.json")),
items1 = Source[items],
#"Converted to Table" = Table.FromList(items1, Splitter.SplitByNothing(), null, null, ExtraValues.Error),
#"Renamed Columns" = Table.RenameColumns(#"Converted to Table",{{"Column1", "ID"}}),
#"Inserted Merged Column" = Table.AddColumn(#"Renamed Columns", "URL", each Text.Combine({"http://example.com/api/item/", Text.From([ID], "fr-FR")}), type text),
#"Added Custom" = Table.AddColumn(#"Inserted Merged Column", "Item", each Json.Document(Web.Contents([URL]))),
#"Expanded Item" = Table.ExpandRecordColumn(#"Added Custom", "Item", {"name"}, {"Item.name"})
in
#"Expanded Item"
Now the problem I have is that it takes ages to load up all the information I need from all the URLs.
As it turns out it's possible to extract from multiple IDs at once using this format : http://example.com/api/item/ID1,ID2,ID3,ID4,...,IDN
I presume that trying to load from an URL containing all of the IDs at once would not work out because the URL would contain way too many characters to handle.
So to speed things up, what I'm trying to do now is concatenate every Nth row into one cell, for example with N=3 :
205
651
320165
63156
4645
31
6351
561
561
31
35
would become :
205, 651, 320165
63156, 4645, 31
6351, 561, 561
31, 35
The "Group by" functionnality doesn't seem to be what I'm looking for, and I'm not sure how to automatise that throught Power Query
EDIT 2
So after a lot of testing I found a solution, even though it might not be the most elegant and optimal :
I created an index with a 1 step
I created another costum column, I associated every N rows with an N increasing number
I used "Group By" -> "All Rows" to create a "Count" column
Created a costum column "[Count][ID]
Finally I excracted values from that column and put a "," separator
Here's the code for N = 10 000 :
let
Source = Json.Document(Web.Contents("https://example.com/items.json")),
items1 = Source[items],
#"Converted to Table" = Table.FromList(items1, Splitter.SplitByNothing(), null, null, ExtraValues.Error),
#"Renamed Columns" = Table.RenameColumns(#"Converted to Table",{{"Column1", "ID"}}),
#"Changed Type" = Table.TransformColumnTypes(#"Renamed Columns",{{"ID", Int64.Type}}),
#"Added Index" = Table.AddIndexColumn(#"Changed Type", "Index", 0, 1),
#"Added Conditional Column" = Table.AddColumn(#"Added Index", "Custom", each if Number.RoundDown([Index]/10000) = [Index]/10000 then [Index] else Number.IntegerDivide([Index],10000)*10000),
#"Reordered Columns" = Table.ReorderColumns(#"Added Conditional Column",{"Index", "ID", "Custom"}),
#"Grouped Rows" = Table.Group(#"Reordered Columns", {"Custom"}, {{"Count", each _, type table}}),
#"Added Custom" = Table.AddColumn(#"Grouped Rows", "Custom.1", each [Count][ID]),
#"Extracted Values" = Table.TransformColumns(#"Added Custom", {"Custom.1", each Text.Combine(List.Transform(_, Text.From), ","), type text})
in
#"Extracted Values"
I think what you want to do here is create a custom function that you invoke with each of your ID values.
Let me give a similar example that should point you in the right direction.
Let's say I have a table named ListIDs which looks like this:
ID
----
1
2
3
4
5
6
7
8
9
10
and for each ID I want to pull some information from Wikipedia (e.g. for ID = 6 I want to lookup https://en.wikipedia.org/wiki/6 and return the Cardinal, Ordinal, Factorization, and Divisors of 6).
To get this for just one ID value my query would look like this (using 6 again):
let
Source = Web.Page(Web.Contents("https://en.wikipedia.org/wiki/6")),
Data0 = Source{0}[Data],
#"Changed Type" = Table.TransformColumnTypes(Data0,{{"Column1", type text}, {"Column2", type text}, {"Column3", type text}}),
#"Filtered Rows" = Table.SelectRows(#"Changed Type", each ([Column2] = "Cardinal" or [Column2] = "Divisors" or [Column2] = "Factorization" or [Column2] = "Ordinal")),
#"Removed Columns" = Table.RemoveColumns(#"Filtered Rows",{"Column1"}),
#"Renamed Columns" = Table.RenameColumns(#"Removed Columns",{{"Column2", "Property"}, {"Column3", "Value"}}),
#"Pivoted Column" = Table.Pivot(#"Renamed Columns", List.Distinct(#"Renamed Columns"[Property]), "Property", "Value")
in
#"Pivoted Column"
Now we want to convert this into a function so that we can use it as many times as we want without creating a bunch of queries. (Note: I've named this query/function WikiLookUp as well.) To do this, change it to the following:
let
WikiLookUp = (ID as text) =>
let
Source = Web.Page(Web.Contents("https://en.wikipedia.org/wiki/" & ID)),
Data0 = Source{0}[Data],
#"Changed Type" = Table.TransformColumnTypes(Data0,{{"Column1", type text}, {"Column2", type text}, {"Column3", type text}}),
#"Filtered Rows" = Table.SelectRows(#"Changed Type", each ([Column2] = "Cardinal" or [Column2] = "Divisors" or [Column2] = "Factorization" or [Column2] = "Ordinal")),
#"Removed Columns" = Table.RemoveColumns(#"Filtered Rows",{"Column1"}),
#"Renamed Columns" = Table.RenameColumns(#"Removed Columns",{{"Column2", "Property"}, {"Column3", "Value"}}),
#"Pivoted Column" = Table.Pivot(#"Renamed Columns", List.Distinct(#"Renamed Columns"[Property]), "Property", "Value")
in
#"Pivoted Column"
in
WikiLookUp
Notice that all we did is wrap it in another set of let...in and defined the parameter ID = text which gets substituted into the Source line near the end. The function should appear like this:
Now we can go back to our table which we've imported into the query editor and invoke our newly created function in a custom column. (Note: Make sure you convert your ID values to text type first since they're being appended to a URL.)
Add a custom column with the following definition (or use the Invoke Custom Function button)
= WikiLookUp([ID])
Expand that column to bring in all the columns you want and you're done!
Here's what that query's M code looks like:
let
Source = Excel.CurrentWorkbook(){[Name="ListIDs"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"ID", type text}}),
#"Added Custom" = Table.AddColumn(#"Changed Type", "Custom", each WikiLookUp([ID])),
#"Expanded Custom" = Table.ExpandTableColumn(#"Added Custom", "Custom", {"Cardinal", "Ordinal", "Factorization", "Divisors"}, {"Cardinal", "Ordinal", "Factorization", "Divisors"})
in
#"Expanded Custom"
The query should look like this: