Power Query - Find matching contents from multiple other tables - excel

I have a set of data of non-trivial size that I am trying to transform in Power Query. One column's (say, "Column_1") values holds several dimensions of data that are not consistently delimited in any way. I want to apply formulas to this column to do the following:
with reference to various separate tables (say, "Lookup_n") each listing all possible values for a given dimension, identify whether a substring contained in a table is present in the data in Column1
if it is present, insert that substring into a new column specific to that dimension, and remove it from the data in Column1
Here is an example of what I would like to have happen:
Sample Output
I am fairly new to Power Query so don't really know where to begin in formulating a solution to this. I would be very interested to hear if there is an easier way to accomplish this than using the method I have described.
Thanks!

In powerquery, try this code for the input after creating query lookup_1 (with column name lookup_1), query lookup_2 (with column name lookup_2_ and query lookup_3 (with column name lookup_3)
let Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
Lookup = Table.UnpivotOtherColumns( Table.Combine({lookup_3, lookup_2, lookup_1}),{} , "Attribute", "Value"),
#"Added Custom" = Table.AddColumn(Source,"custom",(i)=>(Table.SelectRows(Lookup, each Text.Contains(i[Column_1],[Value])))),
Expanded = Table.ExpandTableColumn(#"Added Custom", "custom", {"Attribute", "Value"}, {"Attribute", "Value"}),
#"Changed Type1" = Table.TransformColumnTypes(Expanded,{{"Column_1", type text}, {"Attribute", type text}, {"Value", type text}}),
#"Replaced Value" = Table.ReplaceValue(#"Changed Type1",null,"<none>",Replacer.ReplaceValue,{"Attribute", "Value"}),
#"Pivoted Column" = Table.Pivot(#"Replaced Value", List.Distinct(#"Replaced Value"[Attribute]), "Attribute", "Value"),
#"Removed Columns" = Table.RemoveColumns(#"Pivoted Column",{"<none>"})
in #"Removed Columns"

Related

Dynamically append tables from different workbooks in Power Query

I'm trying to append several Excel tables from different workbooks using PowerQuery.
But instead of loading every table manually and then appending them, I would like to keep record of workbook names and tables along with their addresses in another table (say 'data_sources' table), so that PowerQuery could know what tables I want to append & where to find them.
How could I accomplish this? Suppose I have the following table, where other workbooks could be added later.
EDIT:
The following code will include to the query results any new column from the original tables. The #"Added Custom" argument will create a column containing the tables, and then it is used in the next step so to combine them.
let
Source = Excel.CurrentWorkbook(){[Name="data_sources"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Workbook location", type text}, {"Workbook name", type text}, {"Table name", type text}}),
#"Added Custom" = Table.AddColumn(#"Changed Type", "Custom", each Excel.Workbook(File.Contents([Workbook location]&[Workbook name])){[Item=[Table name],Kind="Table"]}[Data]),
#"Combine" = Table.Combine(#"Added Custom"[Custom]),
#"Filtered Rows" = Table.SelectRows(Combine, each [Client] <> null and [Client] <> "")
in
#"Filtered Rows"
Given the following input table, then the code below works to retrieve all tables and combine them.
let
Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Workbook Location", type text}, {"Workbook Name", type text}, {"Table Name", type text}}),
#"Added Custom" = Table.AddColumn(#"Changed Type", "Custom", each Excel.Workbook(File.Contents([Workbook Location]&[Workbook Name]) ){[Item=[Table Name],Kind="Table"]}[Data])
in
#"Added Custom"

Writing a query that can have excel fill in and consolidate unpopulated fields within a group of duplicates?

I have a contact info dataset (large) that contains lots of semi-duplicate rows that I'd like to condense into as little rows as possible. Attached is a sample of what I'm talking about.
The blue table on the left is smaller scale example of what I'm currently working with. The orange table on the right is what I would like the table to look like.
I want to write a query that will be able to select an ID that has multiple rows, and within that selection, assess whether values can be moved into a parent row that has unpopulated cells (see ID "4" and how I condensed those three rows of data into one by filling in blanks and consolidating duplicates).
An important point of emphasis is how to perform this task without it being a blanket statement for all duplicates in the entire worksheet. Ultimately I want to perform this task for the entire worksheet, but I want excel to first isolate a single ID and then execute the aforementioned task, rather than evaluating the criteria based on all duplicate IDs. ((If that makes sense))
One other condition I would like to have is for certain columns where multiple rows under the same ID have different values, is to allocate that data into a subsequent column (see Tags & Tags2 columns under ID "1") instead of overriding a cell.
I only want to do this ^ for 2 columns; for the others, have it keep them as separate rows.
This sounds like a task for Power Query, but my knowledge is limited in that realm. Any help on how to construct a query that accomplish this task is much appreciated. Thanks.
This seems to work fine
let Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"ID", Int64.Type}, {"Title", type text}, {"Company", type text}, {"Phone", type text}, {"Phone2", type any}, {"Street Address", type any}, {"City", type text}, {"Tags", type text}}),
// group, then unpivot, remove duplicates
#"Grouped Rows" = Table.Group(#"Changed Type", {"ID"}, {{"Data", each Table.Distinct(Table.UnpivotOtherColumns(_, {"ID"}, "Attribute", "Value"), {"Attribute", "Value"}), type table}}),
// combine all the tags into one cell for later splitting
#"Added Custom" = Table.AddColumn(#"Grouped Rows", "Custom", each Table.Group([Data], {"ID", "Attribute"}, {{"Data", each Text.Combine([Value],","), type text}})),
#"Expanded Custom" = Table.ExpandTableColumn(#"Added Custom", "Custom", {"Attribute", "Data"}, {"Attribute", "Data.1"}),
// replace null with Title to preserve rows with no data
#"Replaced Value" = Table.ReplaceValue(#"Expanded Custom",null,"Title",Replacer.ReplaceValue,{"Attribute"}),
#"Removed Columns" = Table.RemoveColumns(#"Replaced Value",{"Data"}),
#"Pivoted Column" = Table.Pivot(#"Removed Columns", List.Distinct(#"Removed Columns"[Attribute]), "Attribute", "Data.1"),
// split the Tags column into any number of columns as needed
#"Replaced Value1" = Table.ReplaceValue(#"Pivoted Column",null,"xxx",Replacer.ReplaceValue,{"Tags"}),
DynamicColumnList = List.Transform({1 ..List.Max(Table.AddColumn(#"Replaced Value1","Custom", each List.Count(Text.PositionOfAny([Tags],{","},Occurrence.All)))[Custom])+1}, each "Tags." & Text.From(_)),
#"Split Column by Delimiter" = Table.SplitColumn( #"Pivoted Column", "Tags", Splitter.SplitTextByDelimiter(",", QuoteStyle.Csv), DynamicColumnList)
in #"Split Column by Delimiter"
You can get your desired output from Power Query using just the Table.Group function.
I have assumed:
Output columns are only as you show
Input columns don't have anything in Phone2 and Tags2
If that is not the case, simple modifications are possible
If there are more distinct entities that columns for the output, they will be output in a single column concatenated.
In other words, if you had three tags; the first one would be in the Tags column and the second and third concatenated with a comma in the Tags 2 column.
I did it this way because, since you show no examples, I'm not quite sure how you want things lined up if you have, for example, multiple phones and multiple tags.
Note: If you want to restrict this to just one ID, just insert a filtering step at the beginning
M Code
let
Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"ID", Int64.Type}, {"Title", type text}, {"Company", type text}, {"Phone", type text}, {"Phone2", type any}, {"Street Address", type text}, {"City", type text}, {"Tags", type text}, {"Tags2", type any}}),
//Group by ID then
//Depending on how many columns available in results table, will
//either concatenate, multiple non-duplicate rows, or put them in separate columns
#"Grouped Rows" = Table.Group(#"Changed Type", {"ID"}, {
{"Title", each Text.Combine(List.Distinct([Title]),", ")},
{"Company", each Text.Combine(List.Distinct([Company]),", ")},
{"Phone", each try List.RemoveNulls([Phone]){0} otherwise null},
{"Phone 2", each Text.Combine(List.RemoveFirstN(List.RemoveNulls(List.Distinct([Phone])),1),", ")},
{"City", each Text.Combine(List.Distinct([City]),", ")},
{"Tags", each try List.RemoveNulls([Tags]){0} otherwise null},
{"Tags 2", each Text.Combine(List.RemoveFirstN(List.RemoveNulls(List.Distinct([Tags])),1),", ")}
})
in
#"Grouped Rows"

Split data grouped within cells from multiple columns into rows using Power Query Editor

Similar to a beginners question I posted: Split values in cell into columns and rows
When trying to achieve the same affect for multiple columns, power query editor can split one column as desired but for the other column copies all of the values to the split into each new row (as in the image). This makes sense however im wondering if its possible to split the data accordingly as shown in the desired outcome.
I have found a work around to this by repeating the PQE exercise twice for each column to the split and then moving the outputted columns so that they are adjacent. However this seems like an inefficient way to achieve this. Can power query split both columns as desired without having to do this twice?
I would suggest first combining the columns; then doing the split.
But when you combine the columns, you need to do this on a row-by-row basis to keep things together on the same line.
A list of each cell contents can be created with the Text.Split function.
Then the two lists can be combined using the List.Zip function.
Finally, we just split them up.
I use a Custom Column to create the joined lists. You can see the formula by clicking on the Added Custom applied step.
M Code
let
Source = Excel.CurrentWorkbook(){[Name="Table6"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Material", type text}, {"Sub", type text}, {"CAS", type text}}),
//combine the two columns
#"Added Custom" = Table.AddColumn(#"Changed Type", "list", each List.Zip({
Text.Split([Sub],"#(lf)"),
Text.Split([CAS],"#(lf)")
})),
#"Removed Columns" = Table.RemoveColumns(#"Added Custom",{"Sub", "CAS"}),
//Expand the list and split into rows
#"Expanded list" = Table.ExpandListColumn(#"Removed Columns", "list"),
#"Extracted Values" = Table.TransformColumns(#"Expanded list", {"list", each Text.Combine(List.Transform(_, Text.From), ";"), type text}),
#"Split Column by Delimiter" = Table.SplitColumn(#"Extracted Values", "list", Splitter.SplitTextByDelimiter(";", QuoteStyle.Csv), {"list.1", "list.2"}),
#"Changed Type1" = Table.TransformColumnTypes(#"Split Column by Delimiter",{{"list.1", type text}, {"list.2", type text}}),
//Rename the splitted columns
renamed = Table.RenameColumns(#"Changed Type1",List.Zip({Table.ColumnNames(#"Changed Type1"),Table.ColumnNames(Source)}))
in
renamed
try below
The key is in the added custom columns that split on linefeed into lists, and then combine those lists into a table that can be expanded into rows. To make null handling easier I converted nulls to a text null, then back at end
let Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Sub", type text}, {"CAS", type text}}),
#"Replaced Value" = Table.ReplaceValue(#"Changed Type",null,"[null]",Replacer.ReplaceValue,{"Sub", "CAS"}),
#"Added Custom" = Table.AddColumn(#"Replaced Value", "Custom", each Text.Split([Sub],"#(lf)")),
#"Added Custom1" = Table.AddColumn(#"Added Custom", "Custom.1", each Text.Split([CAS],"#(lf)")),
#"Added Custom2" = Table.AddColumn(#"Added Custom1", "Custom.2", each Table.FromColumns({[Custom],[Custom.1]})),
#"Expanded Custom.2" = Table.ExpandTableColumn(#"Added Custom2", "Custom.2", {"Column1", "Column2"}, {"Column1", "Column2"}),
#"Removed Columns" = Table.RemoveColumns(#"Expanded Custom.2",{"Sub", "CAS", "Custom", "Custom.1"}),
#"Replaced Value1" = Table.ReplaceValue(#"Removed Columns","[null]",null,Replacer.ReplaceValue,{"Column1", "Column2"})
in #"Replaced Value1"

Create a different pivot view in Power Query

I have the data structured in excel in the following format
What I want to do with that is to transform it into this. In simple words for each ID I want to record the difference in value from previous day, and if there is no value in previous day we just keep the current value.
As an intermediate step I am trying to transform the raw data into something like this but I am not sure how to go about it in simple Excel pivot tables, or Power query transformations.
There is something wrong with your sample because [v1-v2] is not the same method as [v5-v4, v3-v2, v8-v7] but I assume the latter ones were right
See if this works for you
Assumes data in 3 columns in a range named Table1 with column headers Dates, ID, Value
You can paste into PowerQuery using ... Advanced Editor ...
Creates a column with the value of yesterday for that ID and returns a null if nothing is found. Then does the subtraction, and pivots
let Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Dates", type date}, {"ID", type text}, {"Value", Int64.Type}}),
Yesterday = Table.AddColumn(#"Changed Type" , "Yesterday", (i) => List.Sum(Table.SelectRows( #"Changed Type", each ([ID] = i[ID] and Date.AddDays([Dates],1) = i[Dates]))[Value]), type number ),
#"Replaced Value" = Table.ReplaceValue(Yesterday,null,0,Replacer.ReplaceValue,{"Yesterday"}),
#"Added Custom" = Table.AddColumn(#"Replaced Value", "Custom", each [Value]-[Yesterday]),
#"Removed Columns" = Table.RemoveColumns(#"Added Custom",{"Value", "Yesterday"}),
#"Pivoted Column" = Table.Pivot(Table.TransformColumnTypes(#"Removed Columns", {{"Dates", type text}}, "en-US"), List.Distinct(Table.TransformColumnTypes(#"Removed Columns", {{"Dates", type text}}, "en-US")[Dates]), "Dates", "Custom", List.Sum)
in #"Pivoted Column"

How to repeat a sequence of number in excel

I have a column in excel against which I want to create a column which contains the repeated sequence from the first column
what I have :
What I need against it:
Here is a step-by-step solution using Power Query:
Please note you need to have Excel 2010 or later version to be able to use Power Query. My version is Excel 2016.
I did not use any advanced coding but just a few built-in functions of the Power Query Editor in combination of Text.Repeat formula.
Here is the full code behind the scene just for reference only.
let
Source = Excel.CurrentWorkbook(){[Name="Table5"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Column1", Int64.Type}}),
#"Duplicated Column" = Table.DuplicateColumn(#"Changed Type", "Column1", "Column1 - Copy"),
#"Renamed Columns" = Table.RenameColumns(#"Duplicated Column",{{"Column1", "Number"}, {"Column1 - Copy", "Text"}}),
#"Changed Type1" = Table.TransformColumnTypes(#"Renamed Columns",{{"Text", type text}}),
#"Added Custom" = Table.AddColumn(#"Changed Type1", "Custom", each Text.Repeat([Text],[Number])),
#"Split Column by Position" = Table.ExpandListColumn(Table.TransformColumns(#"Added Custom", {{"Custom", Splitter.SplitTextByRepeatedLengths(1), let itemType = (type nullable text) meta [Serialized.Text = true] in type {itemType}}}), "Custom"),
#"Changed Type2" = Table.TransformColumnTypes(#"Split Column by Position",{{"Custom", Int64.Type}}),
#"Removed Other Columns" = Table.SelectColumns(#"Changed Type2",{"Custom"})
in
#"Removed Other Columns"
Cheers :)
Here is one way of doing this:
Formula in B2:
=IF(COUNTIF($B$1:B1,B1)=INDEX($A$2:$A$6,SUMPRODUCT(1/(COUNTIF($B$1:B1,$B$1:B1)))-1),INDEX($A$2:$A$6,SUMPRODUCT(1/(COUNTIF($B$1:B1,$B$1:B1)))),INDEX($A$2:$A$6,SUMPRODUCT(1/(COUNTIF($B$1:B1,$B$1:B1)))-1))
It's a rather long formula and can be significantly shorter, but I got a feeling your sample data does not represent your real data, so this would work also for other number than constantly +1. For example:
Just for interest, you can do it by looking up the row of the output column in the cumulative sums of the input column. I like the idea of getting the output directly from the row number, but I can't see a neat way of implementing it
(1) Helper column
Put the cumulative totals in column B:
=SUM(A1:A$1)-A1
Then just do a lookup in the output column:
=IF(ROW()>SUM(A$1:A$5),"",INDEX(A$1:A$5,MATCH(ROW()-1,B$1:B$5)))
(2) Subtotal/offset combo:
=IF(ROW()>SUM(A$1:A$5),"",INDEX(A$1:A$5,MATCH(ROW()-1,SUBTOTAL(9,OFFSET($A$1,0,0,ROW(A$1:A$5)))-A$1:A$5)))
This has to be entered as an array formula using CtrlShiftEnter

Resources