Related
I have data set of basic housing data in the following format:
Existing data format:
That format is the same and reapeats for hundrets of properties. I would like to transform that that into a table format like the following example:
Property Type
Price
Location
Region
Additional info
Area
House
252000
London
Kensington
4500 square meters
...
...
...
...
...
etc
In other words I want to make the text before ":" symbol column name with the text after it the data that goes into into the corresponding cell and to repeat that for hundrets of sites. Usually there is missing(no data) in Additional info but sometimes there is.
I am not shure which is the best program to do this. So far in my mind comes Excel but if there is an easier way I will be glad to use it.
As per my below screenshot Excel 365 I have used following formulas.
C2=FILTERXML("<t><s>"&SUBSTITUTE(INDEX($A:$A,SEQUENCE(COUNTA($A:$A)/4,1,1,4)),": ","</s><s>")&"</s></t>","//s[last()]")
D2=FILTERXML("<t><s>"&SUBSTITUTE(INDEX($A:$A,SEQUENCE(COUNTA($A:$A)/4,1,2,4)),": ","</s><s>")&"</s></t>","//s[last()]")
E2=FILTERXML("<t><s>"&SUBSTITUTE(SUBSTITUTE(INDEX($A:$A,SEQUENCE(COUNTA($A:$A)/4,1,3,4)),",","</s><s>"),":","</s><s>")&"</s></t>","//s[2]")
F2=FILTERXML("<t><s>"&SUBSTITUTE(SUBSTITUTE(INDEX($A:$A,SEQUENCE(COUNTA($A:$A)/4,1,3,4)),",","</s><s>"),":","</s><s>")&"</s></t>","//s[last()-1]")
H2=FILTERXML("<t><s>"&SUBSTITUTE(INDEX($A:$A,SEQUENCE(COUNTA($A:$A)/4,1,4,4)),": ","</s><s>")&"</s></t>","//s[last()]")
If you are not in Excel 365 then can try-
=FILTERXML("<t><s>"&SUBSTITUTE(INDEX($A:$A,ROW($A1)+(ROW($A1)-1)*3),": ","</s><s>")&"</s></t>","//s[last()]")
Basically =ROW(A1)+(ROW(A1)-1)*3 will generate a sequence of row numbers and INDEX($A:$A,ROW($A1)+(ROW($A1)-1)*3) will return value from Column A as per that sequence. Then FILTERXML() will return expected value specified in xPath parameter.
To know, how FILTERXML() works yo can read this article from JvdV. This is a fantastic article for FILTERXML() lover.
You can obtain your desired output using Power Query, available in Windows Excel 2010+ and Office 365 Excel
Select some cell in your original table
Data => Get&Transform => From Table/Range
When the PQ UI opens, navigate to Home => Advanced Editor
Make note of the Table Name in Line 2 of the code.
Replace the existing code with the M-Code below
Change the table name in line 2 of the pasted code to your "real" table name
Examine any comments, and also the Applied Steps window, to better understand the algorithm and steps
Note: The fnPivotAll function is a custom function that enables a method of creating a non-aggregated Pivot Table where there are multiple values per Pivot Column. From the UI, you add this as a New Query from Blank, and just paste that M-code in place of what's there
M-Code (for main query)
let
//Read in data
//Change table name in next line to your actural table name
Source = Excel.CurrentWorkbook(){[Name="Table1_2"]}[Content],
//Split by comma into new rows
#"Split Column by Delimiter" = Table.ExpandListColumn(Table.TransformColumns(Source, {{"Column1",
Splitter.SplitTextByDelimiter(",", QuoteStyle.Csv),
let itemType = (type nullable text) meta [Serialized.Text = true] in type {itemType}}}), "Column1"),
//Remove the blank rows
#"Filtered Rows" = Table.SelectRows(#"Split Column by Delimiter", each ([Column1] <> "" and [Column1] <> " ")),
//Split by the rightmost colon only into new columns
#"Split Column by Delimiter1" = Table.SplitColumn(#"Filtered Rows", "Column1",
Splitter.SplitTextByEachDelimiter({":"}, QuoteStyle.Csv, true), {"Column1.1", "Column1.2"}),
//Split by the remaining colon into new rows
// So as to have empty rows under "Additional data"
//Then Trim the columns to remove leading/trailing spaces
#"Split Column by Delimiter2" = Table.ExpandListColumn(Table.TransformColumns(#"Split Column by Delimiter1", {{"Column1.1", Splitter.SplitTextByDelimiter(":", QuoteStyle.Csv), let itemType = (type nullable text) meta [Serialized.Text = true] in type {itemType}}}), "Column1.1"),
#"Changed Type" = Table.TransformColumnTypes(#"Split Column by Delimiter2",{{"Column1.1", type text}, {"Column1.2", type text}}),
#"Trimmed Text" = Table.TransformColumns(#"Changed Type",{{"Column1.1", Text.Trim, type text}, {"Column1.2", Text.Trim, type text}}),
//Create new column processing "Additional Data" to show a blank
// and Price to just show the numeric value, splitting from "EUR"
#"Added Custom" = Table.AddColumn(#"Trimmed Text", "Custom", each if [Column1.1] = "Additional data" then " "
else if [Column1.1] = "Price" then Text.Split([Column1.2]," "){1} else [Column1.2]),
//Remove unneeded column
#"Removed Columns" = Table.RemoveColumns(#"Added Custom",{"Column1.2"}),
//non-aggregated pivot
pivot = fnPivotAll(#"Removed Columns","Column1.1","Custom"),
//set data types (frequently a good idea in PQ
#"Changed Type1" = Table.TransformColumnTypes(pivot,{
{"Property type", type text},
{"Location", type text},
{"region", type text},
{"Additional data", type text},
{"Area", type text},
{"Price", Currency.Type}})
in
#"Changed Type1"
M-Code (for custom function)
be sure to rename this query: fnPivotAll
//credit: Cam Wallace https://www.dingbatdata.com/2018/03/08/non-aggregate-pivot-with-multiple-rows-in-powerquery/
(Source as table,
ColToPivot as text,
ColForValues as text)=>
let
PivotColNames = List.Buffer(List.Distinct(Table.Column(Source,ColToPivot))),
#"Pivoted Column" = Table.Pivot(Source, PivotColNames, ColToPivot, ColForValues, each _),
TableFromRecordOfLists = (rec as record, fieldnames as list) =>
let
PartialRecord = Record.SelectFields(rec,fieldnames),
RecordToList = Record.ToList(PartialRecord),
Table = Table.FromColumns(RecordToList,fieldnames)
in
Table,
#"Added Custom" = Table.AddColumn(#"Pivoted Column", "Values", each TableFromRecordOfLists(_,PivotColNames)),
#"Removed Other Columns" = Table.RemoveColumns(#"Added Custom",PivotColNames),
#"Expanded Values" = Table.ExpandTableColumn(#"Removed Other Columns", "Values", PivotColNames)
in
#"Expanded Values"
Background:
I'm trying to create query that incorporates dynamic column names such that if a user changes the name of a column within a table, the logic still follows for any transformations.
The problem is quite straightforward but difficult to explain in just a few sentences. I have tried to break down everything I've done to easily show the problem I am having. If obvious please just skip to the problem at the end.
Demo of Dynamic Columns:
To begin making the column Names dynamic, I use:
DynamicNameHeader = Table.ColumnNames(Source)
This creates a list of the column names from the original table. If the Table name is altered in the spreadsheet, this list is dynamically updated. This list can be used to indirectly refer to columns within your M code.
As an example to show this, Here I have changed Column1 in Table1 to Hello and after refreshing the data the outputted table updates the column to read Hello accordingly.
This works, by referring to the DynamicNameHeader list and indexing for the desired column. The code also demonstrates simple transformations that can be achieved in this by changing the text to uppercase and reordering the columns.
M Code:
let
Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
DynamicHeaderNames =Table.ColumnNames(Source),
#"Uppercased Text" = Table.TransformColumns(#"Source",{{DynamicHeaderNames{0}, Text.Upper, type text}, {DynamicHeaderNames{1}, Text.Upper, type text}}),
#"Reordered Columns" = Table.ReorderColumns(#"Uppercased Text",{DynamicHeaderNames{1}, DynamicHeaderNames{0}})
in
#"Reordered Columns"
This is just an easy example to show how I'm trying to integrate Dynamic Columns in this way.
PROBLEM
Here is a simple version of the actual data that has been sorted in the outputted table such that the data in Column 1 is Prioritised over values in Column2. The numbers in column 3 just correspond to values in column2.
Working M code to achieve this output:
let
Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
//combine the columns
#"Added Custom" = Table.AddColumn(Source, "Custom", each
let
L1 = if [Column2] = "-" then {[Column1]}
else List.Combine({{[Column1]},Text.Split([Column2],"#(lf)")}),
L2 = if [Column2] = "-" then {[Column3]}
else List.Combine({{"-"},Text.Split([Column3],"#(lf)")})
in
List.Zip({L1,L2})),
#"Removed Columns1" = Table.RemoveColumns(#"Added Custom",{"Column1", "Column2", "Column3"}),
//split the combined columns
#"Expanded Custom" = Table.ExpandListColumn(#"Removed Columns1", "Custom"),
#"Extracted Values" = Table.TransformColumns(#"Expanded Custom",
{"Custom", each Text.Combine(List.Transform(_, Text.From), ";"), type text}),
#"Split Column by Delimiter" = Table.SplitColumn(#"Extracted Values",
"Custom", Splitter.SplitTextByDelimiter(";", QuoteStyle.Csv), {"Column1", "Column3"})
in
#"Split Column by Delimiter"
I wish to make this code Dynamic as in the simplified example, however I am running into lots of syntax issues when I add the Custom Column step.
In essence I wish to reference columns 1, 2 and 3 indirectly by indexing the DynamicNameHeader list as before. Is this possible? Importantly, this isn't just to allow the column names to be altered by the user, but so that transformations to the data also can refer to the relevant columns dynamically too. This Custom Column Transformation is pretty much the only step proving difficult because it uses [] which dont appear to be compatible with DynamicNameHeader{x}.
I hope this explanation is clear enough to understand what I am trying to achieve and if anyone has any solutions to this problem it would be really appreciated.
Your posted M Code doesn't really return the results you show, but here is an example of how to use undefined column headers in your custom column.
It involves adding an Index column to sort things out.
Note that I also replaced nulls which were in my sample data, as the Text.Split function will fail with a null
Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
allCols=Table.ColumnNames(Source),
col1 = allCols{0},
col2 = allCols{1},
col3 = allCols{2},
IDX = Table.AddIndexColumn(Source,"IDX",0,1),
#"Replaced Value" = Table.ReplaceValue(IDX,null,"",Replacer.ReplaceValue,allCols),
#"Added Custom" = Table.AddColumn(#"Replaced Value", "lists", each let
col2List = Text.Split(
Table.Column(#"Replaced Value",col2){[IDX]},
"#(lf)"),
col3List = Text.Split(
Text.From(Table.Column(#"Replaced Value",col3){[IDX]}),
"#(lf)")
in
List.Zip({col2List,col3List}))
Edit
Here is an example of M-Code that is insensitive to the actual column names, and will produce the output you show from the input you show:
let
Source = Excel.CurrentWorkbook(){[Name="Table10"]}[Content],
allCols=Table.ColumnNames(Source),
col1 = allCols{0},
col2 = allCols{1},
col3 = allCols{2},
IDX = Table.AddIndexColumn(Source,"IDX",0,1),
#"Added Custom" = Table.AddColumn(IDX, "Custom", each if Table.Column(IDX,col2){[IDX]} = "-"
then
List.Zip({{Table.Column(IDX,col1){[IDX]}},{Text.From(Table.Column(IDX,col3){[IDX]})}})
else
List.InsertRange(
List.Zip({
Text.Split(Table.Column(IDX,col2){[IDX]},"#(lf)"),
Text.Split(Table.Column(IDX,col3){[IDX]},"#(lf)")}),
0,{{Table.Column(IDX,col1){[IDX]},"-"}})),
#"Removed Columns" = Table.RemoveColumns(#"Added Custom",List.Combine({{"IDX"}, allCols})),
#"Expanded Custom" = Table.ExpandListColumn(#"Removed Columns", "Custom"),
#"Extracted Values" = Table.TransformColumns(#"Expanded Custom",
{"Custom", each Text.Combine(List.Transform(_, Text.From), ";"), type text}),
#"Split Column by Delimiter" = Table.SplitColumn(#"Extracted Values", "Custom",
Splitter.SplitTextByDelimiter(";", QuoteStyle.Csv), {col1, col3}),
#"Changed Type" = Table.TransformColumnTypes(#"Split Column by Delimiter",{{col1, type text}, {col3, type text}})
in
#"Changed Type"
I have data in this form:
In PowerBI, I added another column where I keep only code combinations that consist of one single code. I did this with something like IF(LEN(Code > 1), "", Code). Result:
This enables me to create a slicer that contains only single codes. I also added a table that shows Codes.
Now, when I select codes in the slicer, I want the table to show these codes, plus the exact combination of it.
For example, when I select A and B, the table should show me A and B and A, B. I don't want it to show A, B, C although it contains A and B.
If I filter for A and B and C, however, I want it to show A and B and C and A, B, C - but not A, B.
How can I achieve that?
All entries in Codes are saved as strings.
You need a disconnected (not connected to the base table) table for your slicer. Now, if I consider your base table name is - your_table_name, you can create the new slicer table with this below code-
slicer =
SELECTCOLUMNS(
FILTER(
your_table_name_8,
LEN(your_table_name_8[codes]) = 1
),
"code",your_table_name_8[codes]
)
After creating the new slicer table, check in the model is there any auto relation detected between 2 tables or not. If you find any relation, just Remove the relation.
Now, create your slicer from the newly created table and create this below measure in your base table your_table_name-
show_or_hide =
VAR current_code = MIN(your_table_name_8[codes])
VAR comma_separated_list =
CALCULATE (
CONCATENATEX (
VALUES(slicer[code]),
slicer[code],
","
)
)
RETURN
IF(
current_code = comma_separated_list || (LEN(current_code) = 1 && CONTAINSSTRING(comma_separated_list,current_code)),
1,
0
)
lets see the outcome-
Finally, you can apply a visual level filter using the new measure show_or_hide and apply a filter so that value with True only shown in the visual.
Reorder Combination
Let we have this following table-
Now this following code from Advanced Query Editor will give the the expected output-
let
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WctRx0nFWitUBsZx1nJRiYwE=", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Column1 = _t]),
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Column1", type text}}),
#"Added Index" = Table.AddIndexColumn(#"Changed Type", "Index", 1, 1, Int64.Type),
#"Reordered Columns" = Table.ReorderColumns(#"Added Index",{"Index", "Column1"}),
#"Split Column by Delimiter" = Table.SplitColumn(#"Reordered Columns", "Column1", Splitter.SplitTextByDelimiter(",", QuoteStyle.Csv), {"Column1.1", "Column1.2", "Column1.3"}),
#"Changed Type1" = Table.TransformColumnTypes(#"Split Column by Delimiter",{{"Column1.1", type text}, {"Column1.2", type text}, {"Column1.3", type text}}),
#"Unpivoted Other Columns" = Table.UnpivotOtherColumns(#"Changed Type1", {"Index"}, "Attribute", "Value"),
#"Added Custom" = Table.AddColumn(#"Unpivoted Other Columns", "column_name", each "Column-" & [Value]),
#"Changed Type2" = Table.TransformColumnTypes(#"Added Custom",{{"column_name", type text}}),
#"Removed Columns" = Table.RemoveColumns(#"Changed Type2",{"Attribute"}),
#"Reordered Columns1" = Table.ReorderColumns(#"Removed Columns",{"Index", "column_name", "Value"}),
#"Sorted Rows" = Table.Sort(#"Reordered Columns1",{{"Index", Order.Ascending}, {"column_name", Order.Ascending}}),
#"Pivoted Column" = Table.Pivot(#"Sorted Rows", List.Distinct(#"Sorted Rows"[column_name]), "column_name", "Value", List.Min),
#"Merged Columns" = Table.CombineColumns(#"Pivoted Column",{"Column-A", "Column-B", "Column-C"},Combiner.CombineTextByDelimiter(",", QuoteStyle.None),"Merged")
in
#"Merged Columns"
Please check what applied in steps one by one for better understanding. Here is the output with ascending order in the combination-
Index column added for keep tracking the row from start to end. If you have already a similar column, you can use that column as well.
I have a similar problem but a bit more complex as this one :
Power Query: Function to search a column for a list of keywords and return only rows with at least one match and this one : https://community.powerbi.com/t5/Desktop/Power-query-Add-column-with-list-of-keywords-found-in-text/td-p/83109
I have a Database with a lot of columns of which one is a free-text description string.
On another Excel Sheet in the workbook, I've set up a Matching table to categorize the rows based on lists of keywords like this :
category | keywords
pets | dog, cat, rabbit,...
cars | Porsche, BMW, Dodge,...
...
The goal is to put a custom column in my database that will return the hereabove category (or categories ?) based on which listed keywords it can find in the description field.
I think the solution above and the one from ImkeF are not so far but I didn't find a way to turn it into a successful Query for my case. (I'm good at Excel but quite a noob to M and programming Queries...)
oriented on the obove posted links:
M-Code for tbl_category: the keywords (separated with comma) will be split into rows
let
Source = Excel.CurrentWorkbook(){[Name="tbl_category"]}[Content],
#"Replaced Value" = Table.ReplaceValue(Source," ","",Replacer.ReplaceText,{"keywords"}),
#"Split Column by Delimiter" = Table.ExpandListColumn(Table.TransformColumns(#"Replaced Value", {{"keywords", Splitter.SplitTextByDelimiter(",", QuoteStyle.Csv), let itemType = (type nullable text) meta [Serialized.Text = true] in type {itemType}}}), "keywords"),
#"Changed Type1" = Table.TransformColumnTypes(#"Split Column by Delimiter",{{"keywords", type text}})
in
#"Changed Type1"
M-Code for tbl_text. Here will be add a Custom Column called "Category":
let
Source = Excel.CurrentWorkbook(){[Name="tbl_text"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Text", type text}}),
#"Added Custom" = Table.AddColumn(#"Changed Type", "Category", (Earlier) => Table.SelectRows(tbl_category,
each Text.Contains(Record.Field(Earlier, "Text"), Record.Field(_, "keywords"), Comparer.OrdinalIgnoreCase))),
#"Expanded Category" = Table.ExpandTableColumn(#"Added Custom", "Category", {"Category"}, {"Category"})
in
#"Expanded Category"
Ok,
I've finally found how to build a query to suits my needs based on your steps above!
Note : I used "Row Labels" to replace the column header of the 1st tbl_category column for clarity.
My solution is not as neat as I would like (I had to create a second custom column because of my lack of knowledge on how to nest the two steps so they act on the same cell) but it works perfectly!
So thanks again for your help Chris... without your leads I woudn't have found this maze exit!
here the 2nd code modified:
let
Source = Excel.CurrentWorkbook(){[Name="tbl_text"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Text", type text}}),
#"Added Custom" = Table.AddColumn(#"Changed Type", "Category",
(Earlier) => Table.SelectRows(tbl_category,
each Text.Contains(Record.Field(Earlier, "Text"), Record.Field(_, "keywords"),
Comparer.OrdinalIgnoreCase))),
#"Added Custom1" = Table.AddColumn(#"Added Custom", "Custom",
each Text.Combine(Table.ToList(Table.Transpose(
Table.Distinct(Table.SelectColumns([Category],{"Row Labels"}))),
Combiner.CombineTextByDelimiter(",")), ", ")),
in
#"Added Custom1"
Greetz
Just for the record,
Once applied to real data the query was not working anymore... giving the error "We cannot convert the value null to type Text."
the solution was as easy as removing "null" cells (blank cells that were categories for which no keywords were yet identified) first!
M-Code for tbl_category:
let
Source = Excel.CurrentWorkbook(){[Name="tbl_category"]}[Content],
#"Filtered Rows" = Table.SelectRows(Source, each ([keywords] <> null)),
#"Replaced Value" = Table.ReplaceValue(#"Filtered Rows"," ","",Replacer.ReplaceText,{"keywords"}),
#"Split Column by Delimiter" = Table.ExpandListColumn(Table.TransformColumns(#"Replaced Value", {{"keywords", Splitter.SplitTextByDelimiter(",", QuoteStyle.Csv), let itemType = (type nullable text) meta [Serialized.Text = true] in type {itemType}}}), "keywords"),
#"Changed Type1" = Table.TransformColumnTypes(#"Split Column by Delimiter",{{"keywords", type text}})
in
#"Changed Type1"
Greetz
I am attempting to extract cells through a combination of index(match) and right(len)-find() functions from an array of data with text. In my formula, I am searching for instances of "* DS#1 ", excel returns those but also returns instances with " DS#11 *". How do I get excel to return only DS#1?
I have attempted to use an if statement with no success, if(formula="* 11 *","",formula).
Below is a link to an example of the data. The first cell highlighted in yellow should not be returning that text, it should be "". The second cell highlighted in yellow is appropriate to return that data.
example data
=RIGHT(INDEX($V:$AC,MATCH("DS#1",$AC:$AC,0),1),LEN(INDEX($V:$AC,MATCH(FW$1,$AC:$AC,0),1))-FIND($AG2,INDEX($V:$AC,MATCH(FW$1,$AC:$AC,0),1))+1)
Here some example on how to find a value and check for the following char.
Formula in D2:
=INDEX(A2:A6,MATCH(1,INDEX((ISNUMBER(SEARCH("DS#1",B2:B6)))*(NOT(ISNUMBER(MID(B2:B6,SEARCH(C2,B2:B6)+LEN(C2),1)*1))),0),0))
Here is a formula you can adapt to your ranges which will return a list from the range rngDS that contain findDS. I used named ranges, but you can adapt them to your own ranges.
Not sure if this is what you want since you chose to not post examples of your data or desired results.
The routine finds the findDS string and then checks to be sure that the following character is non-numeric.
C1: =IFERROR(INDEX(rngDS,AGGREGATE(15,6,1/(NOT(ISNUMBER(-MID(rngDS,SEARCH(findDS,rngDS)+LEN(findDS),1))+ISERROR(MID(rngDS,SEARCH(findDS,rngDS)+LEN(findDS),1))))*ROW(rngDS),ROWS($1:1))),"")
and fill down
It would be very difficult to come up a formula based solution especially when you need to first differentiate DS1, DS#1. DS#11, DS#11X etc. then look for the text string after each DS code, not to mention these confusing codes may (or may not) be positioned in random orders in the text string.
A better approach would be using Power Query which is available in Excel 2010 and later versions. My solution is using Excel 2016.
Presume you have the following two tables:
You can use From Table function in the Data tab to add both tables to the Power Query Editor.
Once added, make a duplicate copy of Table 1. I have renamed the duplicate as Table1 (2) - Number Ref. Then you should have three un-edited queries:
If your source data is a larger table containing some other information, you can google how to add a worksheet to the editor and how to remove unnecessary columns and remove duplicated values.
Firstly, let's start working with Table1.
Here are the steps:
Use Replace Values function to remove all # from the text string, and then replace all DS with DS# in the text string, so all DS codes are in the format of DS#XXX. Eg. DS8 will be changed to DS#8. This step may not be necessary if DS8 is a valid code as well as DS#8;
Use Split Column function to split the text strings by the word DS, and put each sub text string into a new row, then you should have the following:
Use Split Column function again to split the text strings by 1 Character from the left and you should have the following:
Filter the first column to show hash tag # only and then remove the first column, then you should have the following:
use Replace Values function repeatedly to remove the following characters/symbols from the text strings: (, ), HT, JH, SK, //, and replace dash - with space . I presume these are irrelevant in the comment but you can leave them if needed. Then you should have:
use Split Column function again to split the text string by the first space on the left, then you should have:
Then you can Trim and Clean the second column to further tidy up the comments, rename the columns as DS#, Comments, and Number Ref consecutively, and change the format of the third column to Text. Then you should have:
The last step is to add a custom column called Match ID to combine the value from first and third column into one text string as shown below:
Secondly, let's work on Table1 (2) - Number Ref
Here are the steps:
Remove the first column so leave the Number Ref column as the single column;
Transpose the column, and Promote the first row as header. Then you should have:
The purpose of this query is to transform all Number Reference into column headers, then append this query with the next query (Table2) to achieve the desired result which I will explain in the next section.
Lastly, let's work on the third query Table2.
Here are the steps:
Append this table with the Number Ref table from previous step;
highlight the whole table and use Replace Values function to replace all null with number 1. Then highlight the first column and use Unpivot Other Columns function to transform the table as below:
then Remove the last column (Value), and add a new custom column called Match ID to combine the DS code with the Number Reference. Then you should have:
Merge the table with Table1 using Match ID as shown below:
Expand the newly merged column Table1 to show Comments;
Use Split Column function to split the first column by Non-digit to digit, change the format of the digit column to whole number, and then sort Attribute column and the digit column ascending consecutively, then you should have:
Use Split Column function again to split the Match ID column by dash sign -, and remove the first three columns, rename the remaining three columns as DS#, Number Ref and Comments consecutively, then you should have:
Close & Load this table to a new worksheet as desired, which may look like this:
In conclusion, It is entirely up to you how you would like to structure the table in Power Query. You can pre-filter the Number Reference in the editor and load only relevant results to a worksheet, you can load the full table to a worksheet and use VLOOKUP or INDEX to retrieve the data as desired, or you can load the third query to data model from where you can create pivot tables to play around.
Here are the codes behind the scene for reference only. All steps are using built-in functions of the editor without any advanced manual coding.
Table1
let
Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Comments", type text}}),
#"Replaced Value8" = Table.ReplaceValue(#"Changed Type","#","",Replacer.ReplaceText,{"Comments"}),
#"Replaced Value9" = Table.ReplaceValue(#"Replaced Value8","DS","DS#",Replacer.ReplaceText,{"Comments"}),
#"Split Column by Delimiter" = Table.ExpandListColumn(Table.TransformColumns(#"Replaced Value9", {{"Comments", Splitter.SplitTextByDelimiter("DS", QuoteStyle.Csv), let itemType = (type nullable text) meta [Serialized.Text = true] in type {itemType}}}), "Comments"),
#"Split Column by Position" = Table.SplitColumn(#"Split Column by Delimiter", "Comments", Splitter.SplitTextByPositions({0, 1}, false), {"Comments.1", "Comments.2"}),
#"Filtered Rows" = Table.SelectRows(#"Split Column by Position", each ([Comments.1] = "#")),
#"Removed Columns" = Table.RemoveColumns(#"Filtered Rows",{"Comments.1"}),
#"Replaced Value1" = Table.ReplaceValue(#"Removed Columns",")","",Replacer.ReplaceText,{"Comments.2"}),
#"Replaced Value2" = Table.ReplaceValue(#"Replaced Value1","-"," ",Replacer.ReplaceText,{"Comments.2"}),
#"Replaced Value3" = Table.ReplaceValue(#"Replaced Value2","(","",Replacer.ReplaceText,{"Comments.2"}),
#"Replaced Value4" = Table.ReplaceValue(#"Replaced Value3","HT","",Replacer.ReplaceText,{"Comments.2"}),
#"Replaced Value5" = Table.ReplaceValue(#"Replaced Value4","JH","",Replacer.ReplaceText,{"Comments.2"}),
#"Replaced Value6" = Table.ReplaceValue(#"Replaced Value5","SK","",Replacer.ReplaceText,{"Comments.2"}),
#"Replaced Value7" = Table.ReplaceValue(#"Replaced Value6","//","",Replacer.ReplaceText,{"Comments.2"}),
#"Split Column by Delimiter1" = Table.SplitColumn(#"Replaced Value7", "Comments.2", Splitter.SplitTextByEachDelimiter({" "}, QuoteStyle.Csv, false), {"Comments.2.1", "Comments.2.2"}),
#"Trimmed Text" = Table.TransformColumns(#"Split Column by Delimiter1",{{"Comments.2.2", Text.Trim, type text}}),
#"Cleaned Text" = Table.TransformColumns(#"Trimmed Text",{{"Comments.2.2", Text.Clean, type text}}),
#"Renamed Columns" = Table.RenameColumns(#"Cleaned Text",{{"Comments.2.1", "DS#"}, {"Comments.2.2", "Comments"}}),
#"Changed Type1" = Table.TransformColumnTypes(#"Renamed Columns",{{"Number Ref", type text}}),
#"Added Custom" = Table.AddColumn(#"Changed Type1", "Match ID", each "DS#"&[#"DS#"]&"-"&[Number Ref])
in
#"Added Custom"
Table1 (2) - Number Ref
let
Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Comments", type text}, {"Number Ref", Int64.Type}}),
#"Removed Other Columns" = Table.SelectColumns(#"Changed Type",{"Number Ref"}),
#"Changed Type1" = Table.TransformColumnTypes(#"Removed Other Columns",{{"Number Ref", type text}}),
#"Transposed Table" = Table.Transpose(#"Changed Type1"),
#"Promoted Headers" = Table.PromoteHeaders(#"Transposed Table", [PromoteAllScalars=true]),
#"Changed Type2" = Table.TransformColumnTypes(#"Promoted Headers",{{"388", type any}, {"1", type any}})
in
#"Changed Type2"
Table2
let
Source = Excel.CurrentWorkbook(){[Name="Table2"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"List", type text}}),
#"Appended Query" = Table.Combine({#"Changed Type", #"Table1 (2) - Number Ref"}),
#"Replaced Value" = Table.ReplaceValue(#"Appended Query",null,"1",Replacer.ReplaceValue,{"List", "388", "1"}),
#"Unpivoted Other Columns" = Table.UnpivotOtherColumns(#"Replaced Value", {"List"}, "Attribute", "Value"),
#"Removed Columns" = Table.RemoveColumns(#"Unpivoted Other Columns",{"Value"}),
#"Added Custom" = Table.AddColumn(#"Removed Columns", "Match ID", each [List]&"-"&[Attribute]),
#"Merged Queries" = Table.NestedJoin(#"Added Custom", {"Match ID"}, Table1, {"Match ID"}, "Table1", JoinKind.LeftOuter),
#"Expanded Table1" = Table.ExpandTableColumn(#"Merged Queries", "Table1", {"Comments"}, {"Comments"}),
#"Split Column by Character Transition" = Table.SplitColumn(#"Expanded Table1", "List", Splitter.SplitTextByCharacterTransition((c) => not List.Contains({"0".."9"}, c), {"0".."9"}), {"List.1", "List.2"}),
#"Changed Type1" = Table.TransformColumnTypes(#"Split Column by Character Transition",{{"List.2", Int64.Type}}),
#"Sorted Rows" = Table.Sort(#"Changed Type1",{{"Attribute", Order.Ascending}, {"List.2", Order.Ascending}}),
#"Split Column by Delimiter" = Table.SplitColumn(#"Sorted Rows", "Match ID", Splitter.SplitTextByDelimiter("-", QuoteStyle.Csv), {"Match ID.1", "Match ID.2"}),
#"Changed Type2" = Table.TransformColumnTypes(#"Split Column by Delimiter",{{"List.1", type text}, {"Match ID.1", type text}, {"Match ID.2", type text}}),
#"Removed Other Columns" = Table.SelectColumns(#"Changed Type2",{"Match ID.1", "Match ID.2", "Comments"}),
#"Renamed Columns" = Table.RenameColumns(#"Removed Other Columns",{{"Match ID.1", "DS#"}, {"Match ID.2", "Number Ref"}})
in
#"Renamed Columns"
Cheers :)
Based on the creative answers from this post, I was able to take some of these ideas and form a solution. There are two parts to the solution I found. The first is to replace the strings that I am trying to filter out with a random text that doesn't appear in any instance in my data. Since I had a range of data I needed to replace (DS11 through DS19), I used a VBA function to avoid a large nested function.
Once I had the strings I am trying to filter out replaced, I added an If(Isnumber(search()) function to display "" when the replaced text is returned.
Function REPLACETEXTS(strInput As String, rngFind As Range, rngReplace As Range) As String
Dim strTemp As String
Dim strFind As String
Dim strReplace As String
Dim cellFind As Range
Dim lngColFind As Long
Dim lngRowFind As Long
Dim lngRowReplace As Long
Dim lngColReplace As Long
lngColFind = rngFind.Columns.Count
lngRowFind = rngFind.Rows.Count
lngColReplace = rngFind.Columns.Count
lngRowReplace = rngFind.Rows.Count
strTemp = strInput
If Not ((lngColFind = lngColReplace) And (lngRowFind = lngRowReplace)) Then
REPLACETEXTS = CVErr(xlErrNA)
Exit Function
End If
For Each cellFind In rngFind
strFind = cellFind.Value
strReplace = rngReplace(cellFind.Row - rngFind.Row + 1, cellFind.Column - rngFind.Column + 1).Value
strTemp = Replace(strTemp, strFind, strReplace)
Next cellFind
REPLACETEXTS = strTemp
End Function