Powerquery: split multi-value cell to below empty cells - excel

I am importing a lot of tables from pdf-files into Excel by Powerquery - which works pretty well.
Besides several other migrations I have the following task which I am not able to solve:
In some cases, especially after page breaks, values that should each go into their own cell (one below the other) end up in a single cell joined by line breaks, and the cells below are empty.
I need to split the values of such a cell (the cell content contains line breaks) and put the 2nd to nth values into the corresponding empty cells below it.
(It's kind of a "split drill-down" ...)
I am pretty new to M (not to VBA or programming) but I am not able to find a working solution.

This is difficult to do robustly, but you can expand using Text.Split on the line feed delimiter as @horseyride suggests, remove the blank rows in that second column, and then smash the columns back together with Table.FromColumns.
Here's an example you can paste into the Advanced Editor:
let
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WMlTSUUoEYkMDpVgdCDcJiI0gXCMgMxkkawrnpsTkpcbkpYEEjeCCIJ4FMs8IImcMZKaDJI3h3AwQ11wpNhYA", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Week = _t, A = _t, B = _t]),
TransformA = List.Select(List.Combine(List.Transform(Source[A], each Text.Split(_, "#(lf)"))), each Text.Length(_) > 0),
FromCols = Table.FromColumns({Source[Week], TransformA, Source[B]}, {"Week", "A", "B"})
in
FromCols
This takes a starting table like this:
Transforms the A column as a list, splitting each element on the line feed character, combining each result back together, and filtering out null and empty strings:
The final step takes columns Week and B from the original table and sticks the transformed column A in the middle:
You'll run into trouble if the number of extra expanded rows doesn't exactly match the number of blank rows removed but this should work under the assumption that they do match.
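If you would rather have the query fail loudly than silently misalign rows when the counts don't match, a guard along these lines might help. This is only a sketch: the sample table and the step names SplitA, ValuesA and Result are made up for illustration, and a null in column A is assumed to mean "no value".

```powerquery
let
    // Hypothetical sample standing in for the imported PDF table
    Source = #table(
        {"Week", "A", "B"},
        {
            {"1", "a10#(lf)a11#(lf)a12", "b10"},
            {"2", "", "b11"},
            {"3", null, "b12"},
            {"4", "a13", "b13"}
        }
    ),
    // Split every cell of A on line feeds; a null cell contributes no values
    SplitA = List.Combine(
        List.Transform(Source[A], each if _ = null then {} else Text.Split(_, "#(lf)"))
    ),
    // Drop the empty strings left behind by the blank cells
    ValuesA = List.Select(SplitA, each Text.Length(_) > 0),
    // Only rebuild the table when the split values fill the rows exactly
    Result =
        if List.Count(ValuesA) = Table.RowCount(Source) then
            Table.FromColumns({Source[Week], ValuesA, Source[B]}, {"Week", "A", "B"})
        else
            error Error.Record(
                "Row count mismatch",
                "Split values in column A do not fill the blank rows exactly",
                [Expected = Table.RowCount(Source), Actual = List.Count(ValuesA)]
            )
in
    Result
```

The check only compares totals for the whole column, so it will not catch a surplus after one page break that happens to be cancelled out by a shortfall after another.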

right click column
transform data type text
right click column ... split column ... by delimiter ... advanced option, split using special characters [x] .. split into rows
then use arrow atop that column to filter out null rows
let Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Column1", type text}, {"Column2", type text}}),
#"Split Column by Delimiter" = Table.ExpandListColumn(Table.TransformColumns(#"Changed Type", {{"Column2", Splitter.SplitTextByDelimiter("#(lf)", QuoteStyle.None), let itemType = (type nullable text) meta [Serialized.Text = true] in type {itemType}}}), "Column2"),
#"Filtered Rows" = Table.SelectRows(#"Split Column by Delimiter", each ([Column2] <> null))
in #"Filtered Rows"

Related

Extract a number from text in a PowerBI field and create a new calculated column

I have the following data in Excel, which I'm importing into PowerBI.
In the short description, there is a code (immediately after IDN) in each row - I need to extract just the number. The number is not always the same length and it may be followed by a space, or another character (a - in the screenshot).
In Excel I can use: =SEARCH("IDN",A2) to find the start of the IDN text - FirstDetectIDN
I can then find the next space (NextSpace) using find again: =FIND(" ",A2,B2)
I use the same to find the NextSpace2 - so I now have the starting and end position of the spaces surrounding the number I want to extract.
But that gives me the extra characters on the end of the number ("-EOL" above in the screenshot) that I don't want.
Is there any way in PowerBI that I can replicate all of that in one new calculated column AND also only extract the number part (so for the second line, I would only want 784729 in the new calculated field).
Thanks for any suggestions,
Mark
If you have digits AFTER the set of digits comprising the IDN number, then try this slightly more complicated version:
First split on IDN
Then split on the transition from digit to non-digit
Note that by specifying the Added Column as type text, we will retain any leading zeros in the IDN. If you prefer, you can specify as type number or Int64.Type which will drop any leading zeros.
let
//Change next line to reflect actual data source
Source = Excel.CurrentWorkbook(){[Name="Table7"]}[Content],
//Set data type
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Short description", type text}}),
//Extract the first set of digits after "IDN"
#"Added Custom" = Table.AddColumn(#"Changed Type", "IDN", each
Text.Trim(
Splitter.SplitTextByCharacterTransition({"0".."9"}, (c) => not List.Contains({"0".."9"}, c))
(Text.AfterDelimiter([Short description],"IDN")){0}), type text)
in
#"Added Custom"
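To illustrate the note about text versus number types, here is a small self-contained sketch that produces both variants side by side. The sample rows, the helper name DigitsAfterIdn and the column names "IDN text" and "IDN number" are assumptions for the demo, not part of the original data.

```powerquery
let
    // Hypothetical sample rows; replace with the real source
    Source = #table(
        {"Short description"},
        {{"Server IDN 784729-EOL"}, {"IDN0042 restart"}}
    ),
    // Same technique as above: take the text after "IDN", split at the first
    // digit-to-non-digit transition, keep the first piece and trim it
    DigitsAfterIdn = (txt as text) as text =>
        Text.Trim(
            Splitter.SplitTextByCharacterTransition(
                {"0".."9"},
                (c) => not List.Contains({"0".."9"}, c)
            )(Text.AfterDelimiter(txt, "IDN")){0}
        ),
    // Text column: leading zeros are retained
    AsText = Table.AddColumn(
        Source, "IDN text", each DigitsAfterIdn([Short description]), type text
    ),
    // Whole-number column: leading zeros are dropped; rows that cannot be
    // converted become null instead of raising an error
    AsNumber = Table.AddColumn(
        AsText, "IDN number", each try Int64.From([IDN text]) otherwise null, Int64.Type
    )
in
    AsNumber
```

Wrapping the conversion in try ... otherwise null is a design choice for the sketch; drop it if you would rather see an error for rows without a usable IDN.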
This type of data cleaning should be done in Power Query.
Add a new column and type in the following code:
let
a = Text.AfterDelimiter([Column1],"IDN"),
b = List.Transform({a}, each Text.Select(_, {"0".."9"}))
in b{0}
Full code:
let
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WCk4tUzBV8HTxU7A0MbcwVtBVcCxNyfTITM9QitWJVvJNLUksSszNL80rASsytzAxN7LUdfX3AaoMT03JTU1Rio0FAA==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Column1 = _t]),
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Column1", type text}}),
#"Added Custom" = Table.AddColumn(#"Changed Type", "Custom", each let
a = Text.AfterDelimiter([Column1],"IDN"),
b = List.Transform({a}, each Text.Select(_, {"0".."9"}))
in b{0})
in
#"Added Custom"

Repeated excel rows based on a cell with multiple values

The end goal would be to change the granularity of a report, where each row would be repeated X times (where X is the nr of IDs in one cell), with the relevant ID on each row
So data like this
which should be displayed as such
is there a way in which each row can be repeated, with the relevant IDs from the 4th column?
I tried something in Power Query Editor, however I only figured out a way to create more columns based on how many IDs there are - but it's not the ideal solution
I also found this article which is really helpful https://www.extendoffice.com/documents/excel/4054-excel-duplicate-rows-based-on-cell-value.html#a1 yet it only solves half of the problem, as it would only duplicate the rows based on how many IDs there are - how can this be done in a way that it actually populates the relevant ID too?
You can use this query:
let
Source = Excel.CurrentWorkbook(){[Name="Sheet1"]}[Content],
#"Split Column by Delimiter" = Table.SplitColumn(Source, "IDs", Splitter.SplitTextByDelimiter("|", QuoteStyle.Csv), {"Ids.1", "Ids.2", "Ids.3"}),
#"Unpivoted Columns" = Table.UnpivotOtherColumns(#"Split Column by Delimiter", {"Name", "date", "detail"}, "Attribute", "ID"),
#"Removed Columns" = Table.RemoveColumns(#"Unpivoted Columns",{"Attribute"})
in
#"Removed Columns"
It first uses the "split column" function and then unpivots the table by keeping the first three columns.
You have to adjust the sheet name and the column names as well.
2nd option:
let
Source = Excel.CurrentWorkbook(){[Name="Tabelle1"]}[Content],
#"Split Column by Delimiter" = Table.ExpandListColumn(Table.TransformColumns(Source, {{"Ids", Splitter.SplitTextByDelimiter("|", QuoteStyle.Csv), let itemType = (type nullable text) meta [Serialized.Text = true] in type {itemType}}}), "Ids")
in
#"Split Column by Delimiter"
Using the advanced options of the split column dialog and splitting into rows.

Replace text within a table for all cells that contain a given word for n columns

I have data within a table that occasionally has been inputted with text to say something like not available or No Data etc. I wish to replace each instance a cell contains no that this is then replaced with null across n number of columns. I don't know every type of word that has been entered but it looks as though each cell to be converted to null contains no as characters so I will go with this.
i.e.
Is there any way to combine `if text.contains([n columns],"no") then null else [n columns]`?
In powerquery, this removes the content of any cell containing (No,NO,no,nO) and converts to a null
Click select the first column, right click, Unpivot other columns
click select Value column and transform ... data type .. text
right click Value column and transform ... lower case
we really don't want that so change this in the formula bar
= Table.TransformColumns(#"Changed Type1",{{"Value", Text.Lower, type text}})
to resemble this instead (which also ignores the case of the "no")
= Table.TransformColumns(#"Changed Type1",{{"Value", each if Text.Contains(_,"no", Comparer.OrdinalIgnoreCase) then null else _, type text}})
click select attribute column
Transform ... pivot column
values column:Value, Advanced ... don’t aggregate
sample full code:
let Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
#"Unpivoted Other Columns" = Table.UnpivotOtherColumns(Source, {"Column"}, "Attribute", "Value"),
#"Changed Type1" = Table.TransformColumnTypes(#"Unpivoted Other Columns",{{"Value", type text}}),
#"CheckForNo" = Table.TransformColumns(#"Changed Type1",{{"Value", each if Text.Contains(_,"no", Comparer.OrdinalIgnoreCase) then null else _, type text}}),
#"Pivoted Column" = Table.Pivot(#"CheckForNo", List.Distinct(#"CheckForNo"[Attribute]), "Attribute", "Value")
in #"Pivoted Column"
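As an alternative to the unpivot/pivot round trip, the same replacement could probably be applied directly to a list of columns with Table.TransformColumns. This is a sketch under assumptions: the sample table, the key column name "Column", and the names ColumnsToClean and NullIfNo are made up for illustration.

```powerquery
let
    // Hypothetical sample; "Column" is the key column that must stay untouched
    Source = #table(
        {"Column", "Q1", "Q2"},
        {
            {"a", "12", "No Data"},
            {"b", "not available", "7"}
        }
    ),
    // Every column except the key column gets cleaned
    ColumnsToClean = List.RemoveItems(Table.ColumnNames(Source), {"Column"}),
    // Null out any text value containing "no", ignoring case;
    // non-text values (numbers, nulls) pass through unchanged
    NullIfNo = (v) =>
        if v is text and Text.Contains(v, "no", Comparer.OrdinalIgnoreCase)
        then null
        else v,
    // Build one {column name, transform} pair per column to clean
    Cleaned = Table.TransformColumns(
        Source,
        List.Transform(ColumnsToClean, (name) => {name, NullIfNo})
    )
in
    Cleaned
```

Because no types are passed to Table.TransformColumns here, the cleaned columns come back untyped, so a Table.TransformColumnTypes step afterwards may be needed. As with the steps above, any text containing "no" is cleared, including words like "normal".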

retrieving values by specifying index position

I have a column which looks something like below:
1235hytfgf ui
3434jhjhjh ui
6672jhjkhj ty
I have to name 1st four characters as numbers; the next 6 characters as type and last 2 as id; which function should I use to say that from index(0-3) should be numbers and index(4-9) : type
LEFT(), RIGHT() and MID() are three functions used to rip out portions of a string.
If your data starts in B2, you could use the following formula in an empty cell to pull the first 4 characters.
LEFT(B2,4)
Now that will pull the first 4 characters and leave them as characters. If you want the numbers as a string to be converted to a numeric value, then one of the easy ways to convert it is to send it through a math operation which does not change its value. *1, +0, -0, /1 and -- all work. Your formula may look like one of the following:
--LEFT(B2,4)
LEFT(B2,4)+0
LEFT(B2,4)*1
LEFT(B2,4)/1
LEFT(B2,4)-0
To grab the middle portion of the string, use the MID function. Since you already know the starting position and length of the string to pull you can hard code the information into your formula and it will look as follows:
MID(B2,5,6)
5 is the starting position for which character to start pulling from, and 6 is the length of the string to pull or number of characters to pull.
To get the last 2 characters, similar to the first function, use RIGHT(). The formula would look as follows:
RIGHT(B2,2)
If you are dealing with a large amount of data, say over thousands of records, try power query in excel.
Please refer to this article to find out how to use Power Query on your version of Excel. It is available in Excel 2010 Professional Plus and later versions. My demonstration is using Excel 2016.
Steps are:
Add your one-column data to Power Query Editor;
Highlight the column, use Split Columns function and select By Digit to Non-Digit, it will separate the first nth numerical characters from the string;
Highlight the second column, use Split Columns function again and select By Delimiter and use space as the delimiter, it will separate the type and id as desired;
Rename the columns, and you should have something like the following:
You can then Close & Load the output to a new worksheet (by default).
Here is the power query M Code behind the scene for reference only. All functions used are available in GUI so should be easy to execute.
let
Source = Excel.CurrentWorkbook(){[Name="Table3"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Column1", type text}}),
#"Split Column by Character Transition" = Table.SplitColumn(#"Changed Type", "Column1", Splitter.SplitTextByCharacterTransition({"0".."9"}, (c) => not List.Contains({"0".."9"}, c)), {"Column1.1", "Column1.2"}),
#"Split Column by Delimiter" = Table.SplitColumn(#"Split Column by Character Transition", "Column1.2", Splitter.SplitTextByDelimiter(" ", QuoteStyle.Csv), {"Column1.2.1", "Column1.2.2"}),
#"Changed Type1" = Table.TransformColumnTypes(#"Split Column by Delimiter",{{"Column1.1", Int64.Type}, {"Column1.2.1", type text}, {"Column1.2.2", type text}}),
#"Renamed Columns" = Table.RenameColumns(#"Changed Type1",{{"Column1.1", "numbers"}, {"Column1.2.1", "type"}, {"Column1.2.2", "id"}})
in
#"Renamed Columns"
Let me know if you have any questions. Cheers :)
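If the positions really are fixed (4 characters, then 6, then the last 2), a position-based version mirroring LEFT, MID and RIGHT could look roughly like the sketch below. It uses Text.Start, Text.Middle and Text.End; the inline sample table and the step names are assumptions for illustration.

```powerquery
let
    // Sample rows taken from the question, typed in as a literal table
    Source = #table(
        {"Column1"},
        {{"1235hytfgf ui"}, {"3434jhjhjh ui"}, {"6672jhjkhj ty"}}
    ),
    // LEFT(B2,4): first four characters, converted to a whole number
    AddNumbers = Table.AddColumn(
        Source, "numbers", each Int64.From(Text.Start([Column1], 4)), Int64.Type
    ),
    // MID(B2,5,6): Text.Middle is zero-based, so offset 4 is the 5th character
    AddType = Table.AddColumn(
        AddNumbers, "type", each Text.Middle([Column1], 4, 6), type text
    ),
    // RIGHT(B2,2): last two characters
    AddId = Table.AddColumn(
        AddType, "id", each Text.End([Column1], 2), type text
    )
in
    AddId
```

Unlike the split-based steps, this does not depend on where the digits end or on the space before the id, but it will give wrong pieces if a row deviates from the 4 + 6 + 2 layout.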

How to get multiple substrings in Microsoft Excel Cell

I'm trying to get from a cell just the value of the 'id' tag separated by ';'.
The data is as follows:
Cell:
A1: {"id":1145585,"label":"1145585: Project Z"}
A2: {"id":1150322,"label":"1150322: Project Waka 1"}|{"id":1150365,"label":"1150365: Project Waka 2"}
A3: {"id":1149240,"label":"1149240: Analysis of Technical Options"}|{"id":1149258,"label":"1149258: Check and Report"}
A4: {"id":1148925,"label":"1148925: Change Management Review"}|{"id":1148920,"label":"1148920: Follow-Up Meetings"}|{"id":1148923,"label":"1148923: Launch Date Definition"}
I have tried to use left, mid and find functions, however the number of 'IDs' can vary from 1 to 1000. I'm also trying to avoid using vba, but it seems to be the only option. So any solution is great!
The result should be:
Cell:
A1: 1145585
A2: 1150322;1150365
A3: 1149240;1149258
A4: 1148925;1148920;1148923
Any ideas?
Thanks!
Sounds like a task for Power Query. Please refer to this article to find out how to use Power Query on your version of Excel. It is available in Excel 2010 Professional Plus and later versions. My demonstration is using Excel 2016.
The steps are:
Load the source data to power query editor which should look like the following:
Use Index Column function under the Add Column tab to add an Index column;
Use Split Column function under the Transform tab to split the column by custom delimiter "id": and put the results into Rows as shown below:
Use Extract function under the Transform tab to extract the first 7 characters of the column;
Change the Data Type to Whole Number, remove Errors, and then change the Data Type back to Text;
Use Group By function under the Transform tab to group Column1 by Index as set out below. Don't panic if the result is in error as it is expected.
Go back to the last step and replace the original formula in the formula bar with the following one, as Text.Combine is not offered as an aggregation in the Group By dialog:
= Table.Group(#"Changed Type3", {"Index"}, {{"Sum", each Text.Combine([Column1],";"), type text}})
Close & Load the output to a new worksheet (by default), and you should have the following:
Here are the Power Query M codes behind the scene. Most of the steps are performed using built-in functions except the last step of manually replacing the formula with the correct one. Let me know if you have any questions. Cheers :)
let
Source = Excel.CurrentWorkbook(){[Name="Table10"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Column1", type text}}),
#"Added Index" = Table.AddIndexColumn(#"Changed Type", "Index", 1, 1),
#"Split Column by Delimiter" = Table.ExpandListColumn(Table.TransformColumns(#"Added Index", {{"Column1", Splitter.SplitTextByDelimiter("""id"":", QuoteStyle.None), let itemType = (type nullable text) meta [Serialized.Text = true] in type {itemType}}}), "Column1"),
#"Changed Type1" = Table.TransformColumnTypes(#"Split Column by Delimiter",{{"Column1", type text}}),
#"Extracted First Characters" = Table.TransformColumns(#"Changed Type1", {{"Column1", each Text.Start(_, 7), type text}}),
#"Changed Type2" = Table.TransformColumnTypes(#"Extracted First Characters",{{"Column1", Int64.Type}}),
#"Removed Errors" = Table.RemoveRowsWithErrors(#"Changed Type2", {"Column1"}),
#"Changed Type3" = Table.TransformColumnTypes(#"Removed Errors",{{"Column1", type text}}),
#"Grouped Rows" = Table.Group(#"Changed Type3", {"Index"}, {{"Sum", each Text.Combine([Column1],";"), type text}})
in
#"Grouped Rows"
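Since each piece between the | characters looks like a small JSON object, another option might be to parse it with Json.Document rather than relying on the id always being 7 characters long. This is a sketch under assumptions: the | character never appears inside a label, every piece is valid JSON, and the helper name IdsFromCell and the output column name "Ids" are made up for illustration.

```powerquery
let
    // Two sample cells copied from the question (quotes doubled for M)
    Source = #table(
        {"Column1"},
        {
            {"{""id"":1145585,""label"":""1145585: Project Z""}"},
            {"{""id"":1150322,""label"":""1150322: Project Waka 1""}|{""id"":1150365,""label"":""1150365: Project Waka 2""}"}
        }
    ),
    // Split a cell on "|", parse each piece as JSON, read its id field,
    // and join the ids with ";"
    IdsFromCell = (cell as text) as text =>
        Text.Combine(
            List.Transform(
                Text.Split(cell, "|"),
                each Text.From(Json.Document(_)[id])
            ),
            ";"
        ),
    Result = Table.AddColumn(Source, "Ids", each IdsFromCell([Column1]), type text)
in
    Result
```

This keeps one output row per input row, so no index column or grouping step is needed. The ids are parsed as JSON numbers here, so leading zeros would not survive.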
Based on @TerryW's comment, here is a solution using the FILTERXML function available in Excel 2013+. But it also requires TEXTJOIN, which did not appear until later versions of Excel 2016 (and Office 365).
It relies on the fact that the id string is always followed by a comma.
A disadvantage is that FILTERXML will return the numeric ids as numeric values, so leading zeros will be dropped. If there is always a fixed number of digits in the id and leading zeros need to be present, this can be mitigated by using the TEXT function.
We construct an xml by dividing both on id and on comma
We then use an xpath to return the node which follows the node that contains id
=TEXTJOIN(";",TRUE,FILTERXML("<t><s>" & SUBSTITUTE(SUBSTITUTE(A1,"""id"":",",id,"),",","</s><s>")&"</s></t>","//s[text()='id']/following-sibling::*[1]"))
Since this is an array formula, you need to "confirm" it by holding down ctrl + shift while hitting enter. If you do this correctly, Excel will place braces {...} around the formula as observed in the formula bar
Source
Results

Resources