Power Query - Remove text between delimiters - excel

I want to remove any text between "( )" including the "( )".
There are many different instances, so I can't simply find and replace.
Small example: ABC (1)
EFG (2)
XYZ (1, 2)
I wish to display
ABC
EFG
XYZ
I found this post, but the code for the function is no longer visible (at least in all the browsers I've tried): https://www.thebiccountant.com/2019/07/15/text-removebetweendelimiters-function-for-power-bi-and-power-query/
I copied the code from one of the comments and it seems to work fine, however when I invoke the function on the column I get all errors with the following: "Expression.Error: The specified index parameter is invalid.
Details:
List"
Does anyone have the code from the author? Or know what I'm doing wrong?
Here is the code from the new custom column after I run the function:
Table.AddColumn(#"Changed Type1", "N", each Query1([#"NEC(s)"], "(", ")", 1, null))
Thanks

Here's a different solution that uses recursion.
(txt as text) =>
[
    fnRemoveFirstTag = (DELIM as text) =>
        let
            OpeningTag = Text.PositionOf(DELIM, "("),
            ClosingTag = Text.PositionOf(DELIM, ")"),
            Output =
                if OpeningTag = -1
                then DELIM
                else Text.RemoveRange(DELIM, OpeningTag, ClosingTag - OpeningTag + 1)
        in
            Output,
    // the @ prefix lets the function refer to itself recursively
    fnRemoveDELIM = (y as text) =>
        if fnRemoveFirstTag(y) = y
        then y
        else @fnRemoveDELIM(fnRemoveFirstTag(y)),
    Output = @fnRemoveDELIM(txt)
][Output]
It works on your sample data, and should also work if there is more than one set of parentheses delimited substrings in your string.
Copied shamelessly and modified minimally from Power Query: remove all text between delimiters

Is there text to the right of the )?
If not, just split the column at the leftmost ( using a custom delimiter, then remove the 2nd column
= Table.SplitColumn(Source, "Column1", Splitter.SplitTextByEachDelimiter({"("}, QuoteStyle.Csv, false), {"Column1.1", "Column1.2"})
OR transform the column to remove anything after the initial (
= Table.TransformColumns(Source,{{"Column1", each Text.Start(_,Text.PositionOf(_,"(")), type text}})
If there is text to the right of the ), try
= Table.TransformColumns(Source,{{"Column1", each Text.Start(_,Text.PositionOf(_,"("))&Text.End(_,Text.Length(_)-Text.PositionOf(_,")")-1), type text}})
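A minimal sketch of the same idea, wrapped in a helper that leaves rows without a ( untouched (Text.PositionOf returns -1 in that case, which would otherwise make Text.Start fail). The sample table and the name RemoveParens are made up for illustration; only the first pair of parentheses is handled.

```powerquery
let
    // Hypothetical sample data
    Source = #table(type table [Column1 = text],
        {{"ABC (1)"}, {"EFG (2) tail"}, {"no parentheses"}}),
    // Hypothetical helper: drop the first "(...)" span, keep the text on both sides
    RemoveParens = (s as text) as text =>
        let
            Open = Text.PositionOf(s, "("),
            Close = Text.PositionOf(s, ")")
        in
            // No "(" found, or ")" comes before "(": return the text unchanged
            if Open = -1 or Close < Open then s
            else Text.Trim(Text.Start(s, Open) & Text.End(s, Text.Length(s) - Close - 1)),
    Result = Table.TransformColumns(Source, {{"Column1", RemoveParens, type text}})
in
    Result
```

Text.Trim only strips the ends, so a double space can remain where the parentheses sat in the middle of the text.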

There is an even simpler solution.
You can create a new function called fun_ReplaceTextBetweenDelimiters, and in it add this code 👇
let
fun_ReplaceTextBetweenDelimiters = (Text as text, StartDelimiter as text, EndDelimiter as text, optional ReplaceDelimiters as nullable logical, optional NewText as nullable text, optional TrimResult as nullable logical, optional FixDoubleSpaces as nullable logical) as text =>
let
// Add Default Parameters
Default_ReplaceDelimiters = if ReplaceDelimiters is null then true else ReplaceDelimiters,
Default_NewText = if NewText is null then "" else NewText,
Default_TrimResult = if TrimResult is null then true else TrimResult,
Default_FixDoubleSpaces = if FixDoubleSpaces is null then true else FixDoubleSpaces,
//Do work
TextBetweenDelimiters = Text.BetweenDelimiters(Text, StartDelimiter, EndDelimiter),
TextToReplace = if Default_ReplaceDelimiters then Text.Combine({StartDelimiter,TextBetweenDelimiters,EndDelimiter}) else TextBetweenDelimiters,
ReplacedText = Text.Replace(Text, TextToReplace, Default_NewText),
//Clean Result
TrimmedText = if Default_TrimResult then Text.Trim(ReplacedText) else ReplacedText,
FixedSpaces = if Default_FixDoubleSpaces then Text.Replace(TrimmedText, "  ", " ") else TrimmedText
in
FixedSpaces
in
fun_ReplaceTextBetweenDelimiters
Then, we can test it like this:
let
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WcnRyVtAw1FTSAbGUYnWilVzd3BU0jEAiQBZYJCIyCqhGRwEsCOQoxcYCAA==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [TestData = _t, TargetData = _t]),
ChangeType = Table.TransformColumnTypes(Source,{{"TestData", type text}, {"TargetData", type text}}),
RunFunction = Table.AddColumn(ChangeType, "NewText", each fun_ReplaceTextBetweenDelimiters([TestData], "(", ")", true), type text),
TestResult = Table.AddColumn(RunFunction, "Test", each [TargetData]=[NewText], type logical)
in
TestResult
Input:
| TestData | TargetData |
| ABC (1) | ABC |
| EFG (2) | EFG |
| XYZ (1, 2) | XYZ |
Output:
| TestData | TargetData | NewText | Test |
| ABC (1) | ABC | ABC | TRUE |
| EFG (2) | EFG | EFG | TRUE |
| XYZ (1, 2) | XYZ | XYZ | TRUE |


How can i get all the sub-children?

Hello guys, I have a function ("flecheD"):
(ColChild,ColParent,ParentActuel,source)=>
let
mylist=Table.Column(Table.SelectRows(source,each Record.Field(_,ColParent)=ParentActuel),ColChild),
resultat=Text.Combine(mylist)
in
Text.Trim(
if resultat = "" then "" else @resultat & "|" & @flecheD(ColChild,ColParent,resultat,source),"|")
which loops through 2 columns (Parent,Child) to get all children of the main parent (output->Children column). The problem is that when the function is confronted with several children, the resultat variable no longer has a single letter/child but several, which blocks the function from looking for the other sub-children.
To solve this, I tried to create a custom function "SubChildren" with List.Generate():
(children as text, ColChild,ColParent,source)=>
let
i = 1,
length = Text.Length(children),
subchildren = List.Generate( ()=>@flecheD(ColChild,ColParent,Text.At(children,i-1),source), i<=length, i+1 )
in
Text.Combine(subchildren)
which when coupled with my initial function
(ColChild,ColParent,ParentActuel,source)=>
let
mylist=Table.Column(Table.SelectRows(source,each Record.Field(_,ColParent)=ParentActuel),ColChild),
resultat=Text.Combine(mylist)
in
Text.Trim(
if resultat = "" then "" else if Text.Length(resultat) = 1 then @resultat & "|" & @flecheD(ColChild,ColParent,resultat,source)
else @resultat & "|" & SubChildren(resultat,ColChild,ColParent,source),"|")
should normally get the sub-children of each child. However, it doesn't work. Could you please help me? Thanks.
I thought this was a fun way, but you could write a recursive function as well. I have it hard coded to 4 levels of children deep
(not sure how in your source data D child can have two parents, c and J, but whatever)
let Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
#"Grouped Rows" = Table.Group(Source, {"Parent"}, {{"data", each List.RemoveNulls(_[Child]), type list}}),
Parent_List = List.Buffer(#"Grouped Rows"[Parent] ),
Child_List = List.Buffer(#"Grouped Rows"[data] ),
Process = (n as list) as list =>
let children = List.Transform(List.Transform(n, each Text.ToList(_)), each Text.Combine( List.Distinct(List.Combine(List.Transform(_, each try Child_List{List.PositionOf( Parent_List, _ )} otherwise null))))) in children,
Level1=Process(Source[Parent]),
Level2=Process(Level1),
Level3=Process(Level2),
Level4=Process(Level3),
Final=List.Transform(List.Positions(Level1),each Level1{_}&"|"&Level2{_}&"|"&Level3{_}&"|"&Level4{_}&"|"),
#"Replaced Value" = Table.ReplaceValue(Table.FromList(Final),"||","",Replacer.ReplaceText,{"Column1"}),
custom1 = Table.ToColumns(Source) & Table.ToColumns(#"Replaced Value"),
custom2 = Table.FromColumns(custom1,Table.ColumnNames(Source) & {"Children"})
in custom2
edited to be generic so it can take text as well as numerical inputs
let Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Parent", type text}, {"Child", type text}}),
#"Grouped Rows" = Table.Group(#"Changed Type", {"Parent"}, {{"data", each List.Transform(List.RemoveNulls(_[Child]), each Text.From(_)), type list}}),
Parent_List = List.Buffer(List.Transform(#"Grouped Rows"[Parent], each Text.From(_))),
Child_List = List.Buffer(#"Grouped Rows"[data]),
Process = (n as list) as list =>let children = List.Transform(List.Transform(n, each Text.Split(_,",") ) , each try Text.Combine(List.Distinct(List.Combine(List.Transform(_, each try Child_List{List.PositionOf( Parent_List, _ )} otherwise ""))),"," ) otherwise "") in children,
Level1=Process(#"Changed Type"[Parent]),
Level2=Process(Level1),
Level3=Process(Level2),
Level4=Process(Level3),
Final=List.Transform(List.Positions(Level1),each Level1{_}&"|"&Level2{_}&"|"&Level3{_}&"|"&Level4{_}&"|"),
#"Replaced Value" = Table.ReplaceValue(Table.FromColumns({Final}),"||","",Replacer.ReplaceText,{"Column1"}),
custom1 = Table.ToColumns(#"Changed Type") & Table.ToColumns(#"Replaced Value"),
custom2 = Table.FromColumns(custom1,Table.ColumnNames(#"Changed Type") & {"Children"})
in custom2
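For comparison, here is a rough sketch of the recursive variant mentioned above, with no fixed depth limit. The sample rows and the names ChildrenOf and Descendants are made up for illustration, and it assumes the parent/child data contains no cycles (a cycle would recurse forever). Children come out in depth-first order, not level by level as in the queries above.

```powerquery
let
    // Hypothetical sample data; column names follow the question
    Source = #table(type table [Parent = text, Child = text],
        {{"A", "B"}, {"A", "C"}, {"B", "D"}, {"C", "E"}, {"D", "F"}}),
    // Direct children of one parent
    ChildrenOf = (p as text) as list =>
        Table.SelectRows(Source, each [Parent] = p)[Child],
    // Each child followed by its own descendants; @ allows the self-reference
    Descendants = (p as text) as list =>
        List.Combine(List.Transform(ChildrenOf(p), (c) => {c} & @Descendants(c))),
    Result = Table.AddColumn(Source, "Children",
        each Text.Combine(List.Distinct(Descendants([Parent])), "|"), type text)
in
    Result
```

List.Distinct is there because a child with two parents (like D in the question's data) would otherwise be listed twice.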

Excel - Power Query TrimStart With condition

I want to trim the first 2 characters if the text starts with "20".
https://i.stack.imgur.com/MlmmE.png
You can test whether the text starts with "20", and if so, then return the text after "20":
if Text.StartsWith(Example, "20") then Text.AfterDelimiter(Example,"20") else Example
Edited answer:
A similar approach, but this step transforms values in a table column. Change Source and MyValue references to suit.
= Table.TransformColumns(Source, {{"MyValue", each Number.From(let x = Number.ToText(_) in if Text.Start(x,2) = "20" then Text.AfterDelimiter(x,"20") else x), type number}})
I started with a basic text file that looks like this ...
Example
20456 208899 123 366420
20324 535435 654 533454
852 929492 583 283832
... and for transparency and to show you how I got through it, all steps are outlined below. I suspect this could be done in a more efficient manner but it does the job.
Step 1
Step 2
Step 3
= Table.AddColumn(
#"Promoted Headers",
"Result",
each if Text.StartsWith([Example], "20") then Text.Middle([Example], 2) else [Example])
Step 4
Step 5 (Result)
Split your string by the space into a List.
Transform the list by checking each element to see if it starts with 20
eg: As an added column:
#"Added Custom" = Table.AddColumn(#"Changed Type", "Custom",
each
Text.Combine(
List.Transform(
Text.Split([Column1]," "),
each
if Text.StartsWith(_,"20")
then Text.Middle(_,2)
else _)," ")
)
Original and Results
Note:
If you don't want to add a column, you can transform the existing column with the same algorithm, but you need to use the Advanced Editor to enter the code:
trim20 = Table.TransformColumns(#"Previous Step", {"Column1", (c)=>
Text.Combine(
List.Transform(Text.Split(c," "),
each if Text.StartsWith(_,"20") then Text.Middle(_,2) else _)," ")})
If you're doing it on the sheet directly, it's very straightforward.
=IF(LEFT(A1,2) = "20", MID(A1, 3, 1000), A1)

Replace by regular expression in Excel

I have a list in Excel like the following:
1 / 6 / 45
123
1546
123 456
1247 /% 456 /
I want to create a new column with all sequences of consecutive non digits replaced by a character. In Google Sheets, this is easy using =REGEXREPLACE(A1&"/","\D+",","), resulting in:
1,6,45,
123,
1546,
123,456
1247,456,
In that formula A1&"/" is needed in order for REGEXREPLACE to work with numbers. No big deal, just adds a comma at the end.
How can we do this in Excel? Pure Power Query (not R, not Python, just M) is very much encouraged. VBA and other clickable Excel features are unacceptable (like find and replace).
If you have Excel 365:
In B1:
=LET(X,MID(A1,SEQUENCE(LEN(A1)),1),SUBSTITUTE(TRIM(CONCAT(IF(ISNUMBER(--X),X," ")))," ",","))
Or if streaks of digits are always delimited by at least a space:
=TEXTJOIN(",",,FILTERXML("<t><s>"&SUBSTITUTE(A1," ","</s><s>")&"</s></t>","//s[.*0=0]"))
Another option, if you have got access to it, is LAMBDA(). Make a function to replace all kind of characters, something along the lines of this. Without LAMBDA() and TEXTJOIN() I think your best bet would be to start nesting SUBSTITUTE() functions.
Here is a Power Query solution.
It makes use of the List.Accumulate function to determine whether to add a digit, or a comma, to the string:
Note that the code replicates what you show for results. If you prefer to avoid trailing (and/or leading) commas, it can be easily modified.
let
Source = Excel.CurrentWorkbook(){[Name="Table5"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Column1", type text}}),
#"Added Custom" = Table.AddColumn(#"Changed Type", "textToList", each List.Combine({Text.ToList([Column1]),{","}})),
#"Added Custom1" = Table.AddColumn(#"Added Custom", "commaTerminators", each List.Accumulate(
[textToList],"", (state,current) =>
if List.Contains({"0".."9"},current)
then state & current
else if Text.EndsWith(state,",")
then state
else state & ",")),
#"Removed Columns" = Table.RemoveColumns(#"Added Custom1",{"textToList"})
in
#"Removed Columns"
Edit To eliminate leading/trailing commas, we add the Text.Trim function which, in Power Query, allows defining a specific text to Trim from the start/end:
let
Source = Excel.CurrentWorkbook(){[Name="Table5"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Column1", type text}}),
#"Added Custom" = Table.AddColumn(#"Changed Type", "textToList", each List.Combine({Text.ToList([Column1]),{","}})),
#"Added Custom1" = Table.AddColumn(#"Added Custom", "commaTerminators", each
Text.Trim(
List.Accumulate(
[textToList],"", (state,current) =>
if List.Contains({"0".."9"},current)
then state & current
else if Text.EndsWith(state,",")
then state
else state & ","),
",")),
#"Removed Columns" = Table.RemoveColumns(#"Added Custom1",{"textToList"})
in
#"Removed Columns"
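To see the trimming step in isolation, here is a tiny standalone sketch. The input string is made up; the second argument of Text.Trim names the character to strip from both ends.

```powerquery
let
    // Hypothetical intermediate result with leading and trailing commas
    Raw = ",1,6,45,",
    // Strip commas from the start and end only; inner commas are kept
    Trimmed = Text.Trim(Raw, ",")
in
    Trimmed   // "1,6,45"
```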
VBA UDF You mentioned you did not want VBA, but not clear if you were restricting that to a "clickable". Here is a user defined function that you can use on a worksheet directly. It uses the VBA regex engine which allows easy extraction of multiple matches
You can enter a formula on the worksheet such as =commaSep(cell_ref) to get the same results as shown above in my second PQ example
Option Explicit
Function commaSep(S As String) As String
Dim RE As Object, MC As Object, M As Object
Dim sTemp As String
Set RE = CreateObject("vbscript.regexp")
With RE
.Global = True
.Pattern = "\d+"
If .test(S) Then
Set MC = .Execute(S)
sTemp = ""
For Each M In MC
sTemp = sTemp & "," & M
Next M
commaSep = Mid(sTemp, 2)
Else
commaSep = "no digits"
End If
End With
End Function
This is another variation if you have TEXTJOIN function available.
=SUBSTITUTE(TRIM(TEXTJOIN("",TRUE,IFERROR(MID(A2,ROW($A$1:INDEX(A:A,LEN(A2))),1)+0," ")))," ",",")
And another option in Power Query.
let
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WMlTQVzADYhNTpVgdINfIGEKbmpjBBIByZgpQjom5gr4qWEBfKTYWAA==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Column1 = _t]),
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Column1", type text}}),
x1 = Table.AddColumn(#"Changed Type", "x1", each Text.ToList([Column1])),
x2 = Table.AddColumn(x1, "x2", each List.Transform([x1], each if Text.Contains("0123456789", _) then _ else " " )),
x3 = Table.AddColumn(x2, "x3", each Text.Split(Text.Combine([x2])," ")),
x4 = Table.AddColumn(x3, "x4", each List.Transform([x3], each if Text.Contains("0123456789", try Text.At(_,0) otherwise " ") then _&"," else "" )),
x5 = Table.AddColumn(x4, "x5", each Text.Combine([x4])),
#"Removed Columns" = Table.RemoveColumns(x5,{"x1", "x2", "x3", "x4"})
in
#"Removed Columns"

Conditional formatting based on lists

I'm new to M and would like to create an "if, then, else statement" based on values inside a list.
Basically I have 4 lists:
let
FoodCompanies = {"Nestlé", "Pepsico", "Unilever"},
ClothingCompanies = {"Nike", "Ralph Lauren", "Old Navy"},
TechCompanies = {"Apple", "Samsung", "IBM"},
AllCompanies = {FoodCompanies,ClothingCompanies,TechCompanies}
Now I want to create a conditional column that checks for another column (tag) if one of the values is present and based on that makes a calculation.
| ItemId| DateOfSale | tag |
| 001 | 01/01/1980 | Nestlé |
| 002 | 01/01/1980 | Nike, Apple |
| 003 | 01/01/1980 | Unilever, Old Navy, IBM |
| 004 | 01/01/1980 | Samsung |
So ... I start like this:
#"Added Conditional Column" = Table.AddColumn(#"Renamed Columns3", "type", each
Single values
if [tag] = "" then "Empty tag"
else if [tag] = "Nestlé" then "Nestlé"
else if [tag] = "Nike" then "Nike"
...
Multiple values
It's for the multiple values I don't know how to create the logic
If tag contains more than 1 value from FoodCompanies but none from ClothingCompanies or TechCompanies, I want it to be "FoodCompanies"
If tag contains more than 1 value from ClothingCompanies but none from FoodCompanies or TechCompanies, I want it to be "ClothingCompanies"
If tag contains 2 values from AllCompanies it should be "MixedCompanies"
If tag contains all values from AllCompanies it should be "AllofThem"
...
Anyone can help me on the way? I would do it like
else if List.Count(FoodCompanies) > 1 and ( List.Count(ClothingCompanies) < 1 or List.Count(TechCompanies) < 1) then "FoodCompanies"
but how do I evaluate it against the tag value?
Here's one approach, which converts your list of companies to a table, matches the tag values, filters the results, then determines the output:
#"Renamed Columns3" = //your previous step here
fnMatchingList = (MyList) =>
let
AllLists = #table(type table [#"ListName"=text, #"ListValues"=list],
{{"FoodCompanies",{"Nestlé", "Pepsico", "Unilever"}},
{"ClothingCompanies", {"Nike", "Ralph Lauren", "Old Navy"}},
{"TechCompanies",{"Apple", "Samsung", "IBM"}}}),
FullList = Table.ExpandListColumn(AllLists, "ListValues"),
Match = Table.AddColumn(FullList, "Match", each List.Contains(MyList,[ListValues])),
Filtered = Table.SelectRows(Match, each ([Match] = true)),
Output = if Table.RowCount(Filtered) = 1 then Filtered{0}[ListValues] else
if List.Distinct(Filtered[ListName]) = List.Distinct(FullList[ListName]) then "AllCompanies" else
Text.Combine(List.Distinct(Filtered[ListName]),", ")
in
Output,
#"Added Matching List" = Table.AddColumn(#"Renamed Columns3", "taglist", each if [tag] = null or [tag] = "" then "(Empty Tag)" else fnMatchingList(Text.Split([tag],", ")))
Edit: to aid understanding, here's a standalone query which you can step through, to see what the function is actually doing:
let
MyList = {"Pepsico", "Nike"},
AllLists = #table(type table [#"ListName"=text, #"ListValues"=list],
{{"FoodCompanies",{"Nestlé", "Pepsico", "Unilever"}},
{"ClothingCompanies", {"Nike", "Ralph Lauren", "Old Navy"}},
{"TechCompanies",{"Apple", "Samsung", "IBM"}}}),
FullList = Table.ExpandListColumn(AllLists, "ListValues"),
Match = Table.AddColumn(FullList, "Match", each List.Contains(MyList,[ListValues])),
Filtered = Table.SelectRows(Match, each ([Match] = true)),
Output = if Table.RowCount(Filtered) = 1 then Filtered{0}[ListValues] else
if List.Distinct(Filtered[ListName]) = List.Distinct(FullList[ListName]) then "AllCompanies" else
Text.Combine(List.Distinct(Filtered[ListName]),", ")
in
Output
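A different, more compact sketch (not the approach above) keeps the lists in a record and counts how many of them a tag touches. The names Categories and Classify and the "(no match)" label are made up for illustration, and the rules are simplified: a tag that hits exactly one list returns the list name, even for a single company.

```powerquery
let
    Categories = [
        FoodCompanies = {"Nestlé", "Pepsico", "Unilever"},
        ClothingCompanies = {"Nike", "Ralph Lauren", "Old Navy"},
        TechCompanies = {"Apple", "Samsung", "IBM"}
    ],
    // Hypothetical helper: classify one tag string such as "Nike, Apple"
    Classify = (tag as nullable text) as text =>
        let
            Safe = if tag = null then "" else tag,
            Tags = List.Transform(Text.Split(Safe, ","), Text.Trim),
            // Names of the lists that share at least one company with the tag
            Hits = List.Select(Record.FieldNames(Categories),
                (name) => List.ContainsAny(Record.Field(Categories, name), Tags)),
            HitCount = List.Count(Hits)
        in
            if HitCount = 0 then "(no match)"
            else if HitCount = 1 then Hits{0}
            else if HitCount = Record.FieldCount(Categories) then "AllofThem"
            else "MixedCompanies",
    // Sample rows taken from the question
    Sample = #table(type table [tag = text],
        {{"Nestlé"}, {"Nike, Apple"}, {"Unilever, Old Navy, IBM"}, {"Samsung"}}),
    Result = Table.AddColumn(Sample, "type", each Classify([tag]), type text)
in
    Result
```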

Power Query/Excel: Error with DateTime.ToText()

In Excel 2016 - Query Editor - Advanced Editor.
Here is my code:
let
SettingsSheet = Excel.CurrentWorkbook(){[Name="Table2"]}[Content],
#"TimeRange" = Table.TransformColumnTypes(SettingsSheet,{{"From", type datetime}, {"To", type datetime}}),
From = #"TimeRange"[From],
To = #"TimeRange"[To],
DateFormatString = "yyyy-MM-dd-THH:mm:ssZ",
FormattedFrom = DateTime.ToText(#"TimeRange"[From], DateFormatString ),
FormattedTo = DateTime.ToText(To, DateFormatString ),
...
(Further in the code, I will need to concatenate the formatted datetimes into a URL string.)
If I finish with
...
in
#"TimeRange"
I get a table with DateTimes, as expected.
If I finish with
...
#"testTable" = { From, To, FormattedFrom, FormattedFrom}
in
#"testTable"
I get a table displaying
1 List
2 List
3 Error
4 Error
while I expected
3 and 4 to be date formatted as DateFormatString suggests.
I have also tried without DateFormatString as in
FormattedFrom = DateTime.ToText(#"TimeRange"[From]),
and with DateFormatString = "YYYYMMDD", as shown in example on https://msdn.microsoft.com/en-us/library/mt253497.aspx
But I got the same result.
How am I supposed to format dates ?
Edit: Error says: Expression.Error: We cannot convert a value of type
List to type DateTime. Details:
Value=List
Type=Type
DateTime.ToText expects a single datetime value as its first argument, not a whole column (which is a list).
This added custom column would create a textstring that concatenates the 2 Dates with the desired format and "-" as a separator:
String = Table.AddColumn(#"TimeRange", "String", each DateTime.ToText([From], DateFormatString)&"-"&DateTime.ToText([To], DateFormatString))
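If you want to keep working with the column as a list instead of adding a column, a sketch along these lines applies DateTime.ToText to each element in turn. The sample datetimes and the simplified format string are assumptions for illustration only.

```powerquery
let
    // Hypothetical values standing in for #"TimeRange"[From]
    FromList = {#datetime(2017, 1, 31, 8, 30, 0), #datetime(2017, 2, 1, 17, 0, 0)},
    // Assumed format string; adjust to what the target URL needs
    DateFormatString = "yyyy-MM-ddTHH:mm:ss",
    // Format one value at a time instead of passing the whole list
    Formatted = List.Transform(FromList, each DateTime.ToText(_, DateFormatString))
in
    Formatted
```

A single formatted value can then be picked out with Formatted{0} when building the URL.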
