Dynamically append tables from different workbooks in Power Query

I'm trying to append several Excel tables from different workbooks using Power Query.
Instead of loading every table manually and then appending them, I would like to keep a record of the workbook names and table names, along with their locations, in another table (say, a 'data_sources' table), so that Power Query knows which tables to append and where to find them.
How could I accomplish this? Suppose I have the following table, to which other workbooks could be added later.
EDIT:
The following code also brings any new column from the original tables into the query results. The #"Added Custom" step creates a column containing the tables, which the next step then combines.
let
Source = Excel.CurrentWorkbook(){[Name="data_sources"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Workbook location", type text}, {"Workbook name", type text}, {"Table name", type text}}),
#"Added Custom" = Table.AddColumn(#"Changed Type", "Custom", each Excel.Workbook(File.Contents([Workbook location]&[Workbook name])){[Item=[Table name],Kind="Table"]}[Data]),
#"Combine" = Table.Combine(#"Added Custom"[Custom]),
#"Filtered Rows" = Table.SelectRows(Combine, each [Client] <> null and [Client] <> "")
in
#"Filtered Rows"
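To see why new columns from any source table end up in the result, here is a minimal sketch with two made-up inline tables (the names A and B and their contents are assumptions for illustration only, not part of the workbooks above). Table.Combine keeps the union of the columns and fills the missing cells with null:

```powerquery
let
    // Hypothetical stand-ins for two tables loaded from different workbooks
    A = #table({"Client", "Amount"}, {{"Acme", 10}}),
    B = #table({"Client", "Amount", "Region"}, {{"Beta", 20, "EU"}}),
    // The result has the columns Client, Amount and Region;
    // the row coming from A gets null in Region
    Combined = Table.Combine({A, B})
in
    Combined
```

Because the columns are matched by name, a column added later to only one of the source tables should simply appear as an extra column in the combined output.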

Given the following input table, the code below retrieves all the listed tables so that they can be combined.
let
Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Workbook Location", type text}, {"Workbook Name", type text}, {"Table Name", type text}}),
#"Added Custom" = Table.AddColumn(#"Changed Type", "Custom", each Excel.Workbook(File.Contents([Workbook Location]&[Workbook Name]) ){[Item=[Table Name],Kind="Table"]}[Data])
in
#"Added Custom"

Related

Power Query - Find matching contents from multiple other tables

I have a set of data of non-trivial size that I am trying to transform in Power Query. The values in one column (say, "Column_1") hold several dimensions of data that are not consistently delimited in any way. I want to apply formulas to this column to do the following:
with reference to various separate tables (say, "Lookup_n"), each listing all possible values for a given dimension, identify whether a substring contained in a table is present in the data in Column_1
if it is present, insert that substring into a new column specific to that dimension, and remove it from the data in Column_1
Here is an example of what I would like to have happen:
Sample Output
I am fairly new to Power Query so don't really know where to begin in formulating a solution to this. I would be very interested to hear if there is an easier way to accomplish this than using the method I have described.
Thanks!
In Power Query, try this code on the input after creating query lookup_1 (with column name lookup_1), query lookup_2 (with column name lookup_2) and query lookup_3 (with column name lookup_3).
let Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
Lookup = Table.UnpivotOtherColumns( Table.Combine({lookup_3, lookup_2, lookup_1}),{} , "Attribute", "Value"),
#"Added Custom" = Table.AddColumn(Source,"custom",(i)=>(Table.SelectRows(Lookup, each Text.Contains(i[Column_1],[Value])))),
Expanded = Table.ExpandTableColumn(#"Added Custom", "custom", {"Attribute", "Value"}, {"Attribute", "Value"}),
#"Changed Type1" = Table.TransformColumnTypes(Expanded,{{"Column_1", type text}, {"Attribute", type text}, {"Value", type text}}),
#"Replaced Value" = Table.ReplaceValue(#"Changed Type1",null,"<none>",Replacer.ReplaceValue,{"Attribute", "Value"}),
#"Pivoted Column" = Table.Pivot(#"Replaced Value", List.Distinct(#"Replaced Value"[Attribute]), "Attribute", "Value"),
#"Removed Columns" = Table.RemoveColumns(#"Pivoted Column",{"<none>"})
in #"Removed Columns"
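The code above places the matches in per-dimension columns but leaves Column_1 unchanged. The question also asks for the matched substrings to be removed from Column_1; a rough sketch of that part, using made-up inline data in place of Table1 and the lookup queries (all sample values and the step names Matched and Cleaned are assumptions), might look like this:

```powerquery
let
    // Hypothetical sample data standing in for Table1 and the combined lookups
    Source = #table({"Column_1"}, {{"red large shirt"}, {"blue hat"}}),
    Lookup = #table({"Attribute", "Value"},
        {{"Colour", "red"}, {"Colour", "blue"}, {"Size", "large"}}),
    // For each row, keep the lookup rows whose Value occurs in Column_1
    Matched = Table.AddColumn(Source, "Matches",
        (r) => Table.SelectRows(Lookup, each Text.Contains(r[Column_1], [Value]))),
    // Strip every matched substring from the original text, one after another
    Cleaned = Table.AddColumn(Matched, "Remainder",
        each Text.Trim(List.Accumulate([Matches][Value], [Column_1],
            (state, v) => Text.Replace(state, v, ""))))
in
    Cleaned
```

List.Accumulate walks through the matched values and applies Text.Replace to the running text, so the Remainder column holds whatever is left once all matches are gone. Inner whitespace is not normalised in this sketch.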

Writing a query that can have excel fill in and consolidate unpopulated fields within a group of duplicates?

I have a large contact info dataset that contains lots of semi-duplicate rows that I'd like to condense into as few rows as possible. Attached is a sample of what I'm talking about.
The blue table on the left is smaller scale example of what I'm currently working with. The orange table on the right is what I would like the table to look like.
I want to write a query that will be able to select an ID that has multiple rows, and within that selection, assess whether values can be moved into a parent row that has unpopulated cells (see ID "4" and how I condensed those three rows of data into one by filling in blanks and consolidating duplicates).
An important point is how to perform this task without it being a blanket statement for all duplicates in the entire worksheet. Ultimately I want to perform this task for the entire worksheet, but I want Excel to first isolate a single ID and then execute the aforementioned task, rather than evaluating the criteria based on all duplicate IDs (if that makes sense).
One other condition I would like to have: for certain columns where multiple rows under the same ID have different values, allocate that data to a subsequent column (see the Tags & Tags2 columns under ID "1") instead of overwriting a cell.
I only want to do this for 2 columns; for the others, keep them as separate rows.
This sounds like a task for Power Query, but my knowledge is limited in that realm. Any help on how to construct a query that accomplishes this task is much appreciated. Thanks.
This seems to work fine
let Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"ID", Int64.Type}, {"Title", type text}, {"Company", type text}, {"Phone", type text}, {"Phone2", type any}, {"Street Address", type any}, {"City", type text}, {"Tags", type text}}),
// group, then unpivot, remove duplicates
#"Grouped Rows" = Table.Group(#"Changed Type", {"ID"}, {{"Data", each Table.Distinct(Table.UnpivotOtherColumns(_, {"ID"}, "Attribute", "Value"), {"Attribute", "Value"}), type table}}),
// combine all the tags into one cell for later splitting
#"Added Custom" = Table.AddColumn(#"Grouped Rows", "Custom", each Table.Group([Data], {"ID", "Attribute"}, {{"Data", each Text.Combine([Value],","), type text}})),
#"Expanded Custom" = Table.ExpandTableColumn(#"Added Custom", "Custom", {"Attribute", "Data"}, {"Attribute", "Data.1"}),
// replace null with Title to preserve rows with no data
#"Replaced Value" = Table.ReplaceValue(#"Expanded Custom",null,"Title",Replacer.ReplaceValue,{"Attribute"}),
#"Removed Columns" = Table.RemoveColumns(#"Replaced Value",{"Data"}),
#"Pivoted Column" = Table.Pivot(#"Removed Columns", List.Distinct(#"Removed Columns"[Attribute]), "Attribute", "Data.1"),
// split the Tags column into any number of columns as needed
#"Replaced Value1" = Table.ReplaceValue(#"Pivoted Column",null,"xxx",Replacer.ReplaceValue,{"Tags"}),
DynamicColumnList = List.Transform({1 ..List.Max(Table.AddColumn(#"Replaced Value1","Custom", each List.Count(Text.PositionOfAny([Tags],{","},Occurrence.All)))[Custom])+1}, each "Tags." & Text.From(_)),
#"Split Column by Delimiter" = Table.SplitColumn( #"Pivoted Column", "Tags", Splitter.SplitTextByDelimiter(",", QuoteStyle.Csv), DynamicColumnList)
in #"Split Column by Delimiter"
You can get your desired output from Power Query using just the Table.Group function.
I have assumed:
Output columns are only as you show
Input columns don't have anything in Phone2 and Tags2
If that is not the case, simple modifications are possible
If there are more distinct entities than columns for the output, they will be output in a single column, concatenated.
In other words, if you had three tags; the first one would be in the Tags column and the second and third concatenated with a comma in the Tags 2 column.
I did it this way because, since you show no examples, I'm not quite sure how you want things lined up if you have, for example, multiple phones and multiple tags.
Note: If you want to restrict this to just one ID, just insert a filtering step at the beginning
M Code
let
Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"ID", Int64.Type}, {"Title", type text}, {"Company", type text}, {"Phone", type text}, {"Phone2", type any}, {"Street Address", type text}, {"City", type text}, {"Tags", type text}, {"Tags2", type any}}),
//Group by ID, then,
//depending on how many columns are available in the results table,
//either concatenate multiple non-duplicate values or put them in separate columns
#"Grouped Rows" = Table.Group(#"Changed Type", {"ID"}, {
{"Title", each Text.Combine(List.Distinct([Title]),", ")},
{"Company", each Text.Combine(List.Distinct([Company]),", ")},
{"Phone", each try List.RemoveNulls([Phone]){0} otherwise null},
{"Phone 2", each Text.Combine(List.RemoveFirstN(List.RemoveNulls(List.Distinct([Phone])),1),", ")},
{"City", each Text.Combine(List.Distinct([City]),", ")},
{"Tags", each try List.RemoveNulls([Tags]){0} otherwise null},
{"Tags 2", each Text.Combine(List.RemoveFirstN(List.RemoveNulls(List.Distinct([Tags])),1),", ")}
})
in
#"Grouped Rows"
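As a smaller illustration of the same Table.Group idea, here is a sketch on made-up inline data (the sample rows and the reduced column set are assumptions, not the asker's table). It takes the first non-null value for one output column and concatenates the remaining distinct values into the next:

```powerquery
let
    // Hypothetical sample: ID 1 is spread over two semi-duplicate rows
    Source = #table(
        type table [ID = Int64.Type, Phone = nullable text, Tags = nullable text],
        {{1, "555-0100", "a"}, {1, null, "b"}, {2, "555-0199", null}}),
    // To restrict the query to one ID, group
    // Table.SelectRows(Source, each [ID] = 1) instead of Source
    Grouped = Table.Group(Source, {"ID"}, {
        // First non-null phone; List.First returns null for an empty list
        {"Phone", each List.First(List.RemoveNulls([Phone])), type nullable text},
        // First non-null tag
        {"Tags", each List.First(List.RemoveNulls([Tags])), type nullable text},
        // Any further distinct tags, joined with a comma
        {"Tags 2", each Text.Combine(
            List.Skip(List.Distinct(List.RemoveNulls([Tags])), 1), ", "), type nullable text}
    })
in
    Grouped
```

List.First is used here instead of try ... otherwise only to keep the sketch short; both approaches yield null when a group has no non-null value.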

Split data grouped within cells from multiple columns into rows using Power Query Editor

Similar to a beginners question I posted: Split values in cell into columns and rows
When trying to achieve the same effect for multiple columns, Power Query Editor can split one column as desired, but for the other column it copies all of the values into each new row (as in the image). This makes sense; however, I'm wondering whether it's possible to split the data as shown in the desired outcome.
I have found a workaround by repeating the exercise in Power Query Editor once for each column to be split and then moving the output columns so that they are adjacent. However, this seems like an inefficient way to achieve this. Can Power Query split both columns as desired without having to do this twice?
I would suggest first combining the columns; then doing the split.
But when you combine the columns, you need to do this on a row-by-row basis to keep things together on the same line.
A list of each cell contents can be created with the Text.Split function.
Then the two lists can be combined using the List.Zip function.
Finally, we just split them up.
I use a Custom Column to create the joined lists. You can see the formula by clicking on the Added Custom applied step.
M Code
let
Source = Excel.CurrentWorkbook(){[Name="Table6"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Material", type text}, {"Sub", type text}, {"CAS", type text}}),
//combine the two columns
#"Added Custom" = Table.AddColumn(#"Changed Type", "list", each List.Zip({
Text.Split([Sub],"#(lf)"),
Text.Split([CAS],"#(lf)")
})),
#"Removed Columns" = Table.RemoveColumns(#"Added Custom",{"Sub", "CAS"}),
//Expand the list and split into rows
#"Expanded list" = Table.ExpandListColumn(#"Removed Columns", "list"),
#"Extracted Values" = Table.TransformColumns(#"Expanded list", {"list", each Text.Combine(List.Transform(_, Text.From), ";"), type text}),
#"Split Column by Delimiter" = Table.SplitColumn(#"Extracted Values", "list", Splitter.SplitTextByDelimiter(";", QuoteStyle.Csv), {"list.1", "list.2"}),
#"Changed Type1" = Table.TransformColumnTypes(#"Split Column by Delimiter",{{"list.1", type text}, {"list.2", type text}}),
//Rename the split columns
renamed = Table.RenameColumns(#"Changed Type1",List.Zip({Table.ColumnNames(#"Changed Type1"),Table.ColumnNames(Source)}))
in
renamed
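To make the row-by-row pairing concrete, here is a minimal sketch of what Text.Split and List.Zip do for a single row. The two cell values are sample data assumed for illustration:

```powerquery
let
    // Hypothetical contents of one Sub cell and one CAS cell,
    // each holding two line-feed-separated entries
    Sub = "Water#(lf)Ethanol",
    CAS = "7732-18-5#(lf)64-17-5",
    // Split each cell into a list, then pair the items by position
    Pairs = List.Zip({Text.Split(Sub, "#(lf)"), Text.Split(CAS, "#(lf)")})
in
    Pairs  // {{"Water", "7732-18-5"}, {"Ethanol", "64-17-5"}}
```

Each inner list becomes one output row after Table.ExpandListColumn, which is what keeps a Sub entry together with its CAS entry.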
try below
The key is in the added custom columns, which split on linefeed into lists and then combine those lists into a table that can be expanded into rows. To make null handling easier, I converted nulls to the text "[null]" and converted them back at the end.
let Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Sub", type text}, {"CAS", type text}}),
#"Replaced Value" = Table.ReplaceValue(#"Changed Type",null,"[null]",Replacer.ReplaceValue,{"Sub", "CAS"}),
#"Added Custom" = Table.AddColumn(#"Replaced Value", "Custom", each Text.Split([Sub],"#(lf)")),
#"Added Custom1" = Table.AddColumn(#"Added Custom", "Custom.1", each Text.Split([CAS],"#(lf)")),
#"Added Custom2" = Table.AddColumn(#"Added Custom1", "Custom.2", each Table.FromColumns({[Custom],[Custom.1]})),
#"Expanded Custom.2" = Table.ExpandTableColumn(#"Added Custom2", "Custom.2", {"Column1", "Column2"}, {"Column1", "Column2"}),
#"Removed Columns" = Table.RemoveColumns(#"Expanded Custom.2",{"Sub", "CAS", "Custom", "Custom.1"}),
#"Replaced Value1" = Table.ReplaceValue(#"Removed Columns","[null]",null,Replacer.ReplaceValue,{"Column1", "Column2"})
in #"Replaced Value1"

Power Query Sum of column by group as new column

So I am new to power query and I just wasted over an hour looking for something that I can do easily in many other programs.
I just want to create a new column summing up another column, for instance to check whether the percentages are correct and, if not, normalize them afterwards. I don't want to group by and reduce the table.
I've been searching left and right and tried to add a new column like "Group Sum" using things like
= list.sum([Number])
= Calculate(SUM([Number])
just to get the total sum of all entries, 200. No success.
Maybe it's me, but I really don't see the logic.
I now tried
let
Quelle = Excel.CurrentWorkbook(){[Name="Tabelle1"]}[Content],
#"Geänderter Typ" = Table.TransformColumnTypes(Quelle,{{"Group", type text}, {"Gender", type text}, {"Number", Int64.Type}, {"Group Sum", Int64.Type}, {"Spalte1", Int64.Type}})
#"Added Custom" = Table.AddColumn(#"Geänderter Typ","Group Sum",(i)=>List.Sum(Table.SelectRows(#"Geänderter Typ", each [Group]=i[Group])[Number]), type number )
in
#"Geänderter Typ"
which results in an error and
let
Quelle = Excel.CurrentWorkbook(){[Name="Tabelle1"]}[Content],
#"Geänderter Typ" = Table.TransformColumnTypes(Quelle,{{"Group", type text}, {"Gender", type text}, {"Number", Int64.Type}, {"Group Sum", Int64.Type}}),
#"Hinzugefügte benutzerdefinierte Spalte" = Table.AddColumn(#"Geänderter Typ", "Benutzerdefiniert", each Table.Group(Quelle, {"Group"}, {{"Group Sum", each List.Sum([Number]), type nullable number}}))
in
#"Hinzugefügte benutzerdefinierte Spalte"
Which gives me a new column where all entries say "Table"
Here are two other options. The examples assume your source table is named Table1. Here's how mine looks at its source in Excel:
Note it does not have a Group Sum column. The query will derive that.
Option 1.
Click Add Column then Custom Column and fill out the screen like this and click OK:
You should see a table like this:
Then just click the table in the first row of the Custom column and you should get a table that looks like this:
Then you can merge this new table with the original source table (Table1). Click Home > Merge Queries and fill out the information for the merge like this and click OK. (Note that the same query "Table1" is being merged to itself at this point, and only the Group column is selected for each entry.)
You should see a table like this:
Then, in the formula bar above that table, where you see = Table.NestedJoin(Custom, {"Group"}, Custom, {"Group"}, "Custom", JoinKind.LeftOuter), change the first instance of Custom to Source, so the line reads = Table.NestedJoin(Source, {"Group"}, Custom, {"Group"}, "Custom", JoinKind.LeftOuter) instead.
That is, change it from:
To:
Then expand the new Custom column by clicking the button, only selecting the Group Sum column, clearing the checkbox beside "Use original column name as prefix," and clicking OK:
You should get this result:
Here's the M code:
let
Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Group", type text}, {"Gender", type text}, {"Number", Int64.Type}}),
#"Added Custom" = Table.AddColumn(#"Changed Type", "Custom", each Table.Group(Source, {"Group"}, {{"Group Sum", each List.Sum([Number]), type nullable number}})),
Custom = #"Added Custom"{0}[Custom],
#"Merged Queries" = Table.NestedJoin(Source, {"Group"}, Custom, {"Group"}, "Custom", JoinKind.LeftOuter),
#"Expanded Custom" = Table.ExpandTableColumn(#"Merged Queries", "Custom", {"Group Sum"}, {"Group Sum"})
in
#"Expanded Custom"
(You can replace Table1, Source and Changed Type with Tabelle1, Quelle, and #"Geänderter Typ", respectively, throughout the code above to align with Max's language.)
Option 2.
Click Transform then Group By and fill out the screen like this and click OK:
Then expand the AllData column with only the Gender and Number columns selected like this:
The result:
Here's the M code:
let
Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
#"Grouped Rows" = Table.Group(Source, {"Group"}, {{"AllData", each _, type table [Group=text, Gender=text, Number=number]}, {"Group Sum", each List.Sum([Number]), type number}}),
#"Expanded AllData" = Table.ExpandTableColumn(#"Grouped Rows", "AllData", {"Gender", "Number"}, {"Gender", "Number"})
in
#"Expanded AllData"
try
let Quelle= Excel.CurrentWorkbook(){[Name="Tabelle1"]}[Content],
#"Promoted Headers" = Table.PromoteHeaders(Quelle, [PromoteAllScalars=true]),
#"Geänderter Typ" = Table.TransformColumnTypes(#"Promoted Headers",{{"Group", type text}, {"Gender", type text}, {"Number", Int64.Type}}),
#"Added Custom" = Table.AddColumn(#"Geänderter Typ","Group Sum2",(i)=>List.Sum(Table.SelectRows(#"Geänderter Typ", each [Group]=i[Group]) [Number]), type number )
in #"Added Custom"
Group and Join Method
I have now seen a few ways to do this, but I think the most efficient is probably a group-and-join approach that builds on previous comments and answers here. It takes one line:
let
Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
#"Added Custom" = Table.Join(Source, "Group", Table.Group(Source,{"Group"},{{"Group Sum", each List.Sum([Number]), type nullable number}}), "Group")
in
#"Added Custom"
The Table.Group() part of this creates a table with each unique value of the grouping variable ("Group" here) and, for each of those unique values, its summary value (the sum of [Number] for all rows with the same "Group" value here). To attach these summary values onto the original table becomes the job for Table.Join(). The Table.Join() function gets four input arguments: 1.) the original table, 2.) the grouping column in the original table ("Group" here), 3.) the summary table (that's the output of the Table.Group() function here) and 4.) the grouping column in summary table (also "Group" here).
I tested this and get the results as shown:
Note: I changed Number column values from the question to show that the code is working. In the example provided in the original question, the Group Sum is 100 for both groups, and that seems to make the approach suggested in another answer look like it's working when it does not.
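For a version that can be pasted into a blank query without a workbook table, here is a sketch of the same group-and-join idea on made-up inline data, extended with a share-of-group column for the normalisation check mentioned in the question. The sample values, the Group.Key name and the Share column are assumptions for illustration; the key is renamed in the summary table purely as a precaution so that the two inputs share no column names.

```powerquery
let
    // Hypothetical sample data; group sums differ on purpose
    Source = #table(
        type table [Group = text, Gender = text, Number = Int64.Type],
        {{"A", "m", 30}, {"A", "f", 50}, {"B", "m", 40}, {"B", "f", 80}}),
    // One row per group with the sum of Number
    Sums = Table.Group(Source, {"Group"},
        {{"Group Sum", each List.Sum([Number]), type nullable number}}),
    // Precaution: give the summary key its own name before joining
    SumsRenamed = Table.RenameColumns(Sums, {{"Group", "Group.Key"}}),
    // Attach each group's sum to every original row
    Joined = Table.Join(Source, "Group", SumsRenamed, "Group.Key"),
    Cleaned = Table.RemoveColumns(Joined, {"Group.Key"}),
    // Share of each row within its group
    WithShare = Table.AddColumn(Cleaned, "Share", each [Number] / [Group Sum], type number)
in
    WithShare
```

With this sample, group A sums to 80 and group B to 120, so a result showing the same sum for every row would indicate that the grouping is not being applied per group.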

Create a different pivot view in Power Query

I have the data structured in Excel in the following format.
What I want to do is transform it into this. In simple words, for each ID I want to record the difference in value from the previous day, and if there is no value for the previous day, just keep the current value.
As an intermediate step I am trying to transform the raw data into something like this, but I am not sure how to go about it with simple Excel pivot tables or Power Query transformations.
There is something wrong with your sample, because [v1-v2] does not follow the same method as [v5-v4, v3-v2, v8-v7], but I assume the latter are right.
See if this works for you
Assumes data in 3 columns in a range named Table1 with column headers Dates, ID, Value
You can paste into PowerQuery using ... Advanced Editor ...
Creates a column with the value of yesterday for that ID and returns a null if nothing is found. Then does the subtraction, and pivots
let Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Dates", type date}, {"ID", type text}, {"Value", Int64.Type}}),
Yesterday = Table.AddColumn(#"Changed Type" , "Yesterday", (i) => List.Sum(Table.SelectRows( #"Changed Type", each ([ID] = i[ID] and Date.AddDays([Dates],1) = i[Dates]))[Value]), type number ),
#"Replaced Value" = Table.ReplaceValue(Yesterday,null,0,Replacer.ReplaceValue,{"Yesterday"}),
#"Added Custom" = Table.AddColumn(#"Replaced Value", "Custom", each [Value]-[Yesterday]),
#"Removed Columns" = Table.RemoveColumns(#"Added Custom",{"Value", "Yesterday"}),
#"Pivoted Column" = Table.Pivot(Table.TransformColumnTypes(#"Removed Columns", {{"Dates", type text}}, "en-US"), List.Distinct(Table.TransformColumnTypes(#"Removed Columns", {{"Dates", type text}}, "en-US")[Dates]), "Dates", "Custom", List.Sum)
in #"Pivoted Column"
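The previous-day lookup at the heart of this answer can be tried in isolation with a few inline rows. This is only a sketch: the dates, IDs and values are made up, and the pivot step is left out.

```powerquery
let
    // Hypothetical sample: ID "a" has two consecutive days, ID "b" only one
    Source = #table(
        type table [Dates = date, ID = text, Value = Int64.Type],
        {{#date(2021, 1, 1), "a", 5},
         {#date(2021, 1, 2), "a", 8},
         {#date(2021, 1, 2), "b", 3}}),
    // Value of the same ID one day earlier; List.Sum of an empty list is null
    Yesterday = Table.AddColumn(Source, "Yesterday", (i) =>
        List.Sum(Table.SelectRows(Source,
            each [ID] = i[ID] and Date.AddDays([Dates], 1) = i[Dates])[Value]),
        type nullable number),
    // Keep the current value when there is no previous day
    Diff = Table.AddColumn(Yesterday, "Diff",
        each if [Yesterday] = null then [Value] else [Value] - [Yesterday],
        type number)
in
    Diff  // Diff is 5, 3 and 3 for the three rows
```

Note that the row-by-row Table.SelectRows scans the whole table once per row, so on large tables a self-join on ID and shifted date may be worth considering instead.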
