I am a bit new to Power Query and need some help with the points listed below the code.
I have a piece of working code that does the following:
Load the source
Find the max "series" of a row. The format is a mix of letters and numbers, e.g. Y21Q3S1; the letters stay the same and the numbers increase (year, quarter, and series).
Check whether a certain tag is assigned to a row: I search all the tag columns for the tag, write it in the "Tags" column, and write "none" if none was found.
Through grouping, find the points per tag for each "max series".
Finally, present the result in an Excel table with the series in the first column, then a column for each tag, plus a "None" column for rows where none of the tags were present. I also add a "Last update" date column.
The code:
let
Source = Csv.Document(Web.Contents("somefile.csv"),[Delimiter=",", Columns=32, Encoding=65001, QuoteStyle=QuoteStyle.None]),
#"Promoted Headers" = Table.PromoteHeaders(Source, [PromoteAllScalars=true]),
#"Changed Type with Locale" = Table.TransformColumnTypes(#"Promoted Headers", {{"Custom field (Points)", type number}}, "en-GB"),
#"Added Custom" = Table.AddColumn(#"Changed Type with Locale", "Max Series", each List.Max({[Series], [Series_1], [Series_2], [Series_3], [Series_4]})),
Tags = Table.AddColumn(#"Added Custom", "Tags", each if List.Contains({[Tags], [Tags_7], [Tags_8], [Tags_9], [Tags_10], [Tags_11], [Tags_12], [Tags_13], [Tags_14], [Tags_15], [Tags_16], [Tags_17], [Tags_18], [Tags_19], [Tags_20], [Tags_21], [Tags_22], [Tags_23]}, "tag1") then "tag1"
else if List.Contains({[Tags], [Tags_7], [Tags_8], [Tags_9], [Tags_10], [Tags_11], [Tags_12], [Tags_13], [Tags_14], [Tags_15], [Tags_16], [Tags_17], [Tags_18], [Tags_19], [Tags_20], [Tags_21], [Tags_22], [Tags_23]}, "tag2") then "tag2"
else if List.Contains({[Tags], [Tags_7], [Tags_8], [Tags_9], [Tags_10], [Tags_11], [Tags_12], [Tags_13], [Tags_14], [Tags_15], [Tags_16], [Tags_17], [Tags_18], [Tags_19], [Tags_20], [Tags_21], [Tags_22], [Tags_23]}, "tag3") then "tag3"
else "zzzNone"),
RemoveDummy = Table.SelectRows(Tags, each [ID] <> "ID-1234"),
#"Grouped Rows" = Table.Group(RemoveDummy, {"Max Series", "Tags"}, {{"Points per Tags", each List.Sum([#"Custom field (Points)"]), type number}}),
#"Sorted Rows" = Table.Sort(#"Grouped Rows",{{"Tags", Order.Ascending}}),
#"Pivoted Column" = Table.Pivot(#"Sorted Rows", List.Distinct(#"Sorted Rows"[Tags]), "Tags", "Points per Tags"),
#"Renamed Columns" = Table.RenameColumns(#"Pivoted Column",{{"zzzNone", "None"}, {"Max Series", "Series"}}),
#"Added Custom1" = Table.AddColumn(#"Renamed Columns", "Last update", each DateTime.LocalNow()),
#"Changed Type1" = Table.TransformColumnTypes(#"Added Custom1",{{"Last update", type datetime}})
in
#"Changed Type1"
The "Series" and "Tags" columns are multi-value fields containing all series and tags, which Excel splits into multiple columns. The problem is that the number of series and tags changes, and to cope with this I created a dummy row with a lot of series. However, as you can see from the code, this also changes: somehow "Tags_2" to "Tags_6" disappeared, and I had to remove them from the code to fix the error.
Is there a dynamic way to say "if any "Tags_*" column contains "tag1" then ...", so I don't have to hard-code the columns?
The same goes for "Max Series", where I would like to dynamically take the max value of all "Series_*" columns.
I would like to make the "Tags" step more dynamic, so that it takes its input from a table in the Excel sheet listing the tags to search for, instead of hard-coding "tag1", "tag2", etc.
My current code only assigns the points to the first tag found. However, I would like to assign points to several tags: if two tags are found, each gets half of the points, and if three are found, each gets one third. I don't know how to do this. Could you help me here?
As I am a bit new to Power Query, my code might be far from optimal, so if you have suggestions on how to improve it, that would be highly appreciated :-)
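The dynamic column handling asked about above can be sketched in M with Table.ColumnNames and List.Select: every column whose name starts with "Series" or "Tags" is picked up at refresh time, so a changing number of split columns no longer breaks the query. This is a minimal sketch, not the poster's code: the inline #table stands in for the promoted CSV (empty strings mimic blank CSV cells), and the tag list is hard-coded where a workbook table (a hypothetical table named TagList with a Tag column) could be read instead.

```powerquery
let
    // Hypothetical sample data standing in for the promoted CSV; "" mimics a blank CSV cell
    Source = #table(
        {"ID", "Custom field (Points)", "Series", "Series_1", "Tags", "Tags_7"},
        {
            {"ID-1", 6, "Y21Q3S1", "Y21Q4S2", "tag1", "tag2"},
            {"ID-2", 3, "Y22Q1S1", "", "other", ""}
        }),
    // All split columns, however many the CSV currently has
    SeriesCols = List.Select(Table.ColumnNames(Source), each Text.StartsWith(_, "Series")),
    TagCols = List.Select(Table.ColumnNames(Source), each Text.StartsWith(_, "Tags")),
    // Tags to search for; in a workbook this could instead be read from an Excel table, e.g.
    // Excel.CurrentWorkbook(){[Name="TagList"]}[Content][Tag]  (TagList/Tag are hypothetical names)
    WantedTags = {"tag1", "tag2", "tag3"},
    // Highest series code across all Series* columns of the row (text comparison)
    AddedMaxSeries = Table.AddColumn(Source, "Max Series",
        each List.Max(Record.ToList(Record.SelectFields(_, SeriesCols))), type text),
    // Wanted tags that appear in any Tags* column of the row
    AddedFoundTags = Table.AddColumn(AddedMaxSeries, "Found Tags",
        each List.Intersect({WantedTags, Record.ToList(Record.SelectFields(_, TagCols))}), type list)
in
    AddedFoundTags
```

Because List.Max compares the series codes as text, the ordering is only reliable while each numeric part keeps the same number of digits (for example, "S10" sorts before "S2").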
Hi mitru and welcome to StackOverflow!
You can make the 'Tags' step automatic with the 'Unpivot Other Columns' and 'Group By' operations. To do this, select all non-Tag* columns and use 'Unpivot Other Columns'. Then perform a Group By with operation = All Rows.
You will get a column populated with tables. The next step is to create a custom column with the following formula:
=if List.Contains([Tags][Value],"tag1") then "tag1"
else if List.Contains([Tags][Value],"tag2") then "tag2"
else if List.Contains([Tags][Value],"tag3") then "tag3"
else "zzzNone"
The [Tags] is the column containing tables while [Value] is the column within each table that contains tags you are looking for.
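These steps can be sketched as a single query. In this version (sample data hypothetical) the columns to keep are chosen by name with List.Select, so the unpivot keeps working when the number of Tags* columns changes. One thing worth checking on real data: Table.UnpivotOtherColumns skips null cells, so a row whose tag cells are all null (rather than empty text) would drop out of the result.

```powerquery
let
    // Hypothetical sample data; "" mimics an empty CSV cell
    Source = #table(
        {"ID", "Custom field (Points)", "Tags", "Tags_7", "Tags_8"},
        {
            {"ID-1", 6, "tag2", "tag3", ""},
            {"ID-2", 3, "", "", ""}
        }),
    // Every column that is not a Tags* column, whatever the number of tag columns
    KeepCols = List.Select(Table.ColumnNames(Source), each not Text.StartsWith(_, "Tags")),
    // "Unpivot Other Columns": one row per (item, tag column)
    Unpivoted = Table.UnpivotOtherColumns(Source, KeepCols, "Attribute", "Value"),
    // "Group By" with All Rows: one nested table per item, stored in column "Tags"
    Grouped = Table.Group(Unpivoted, KeepCols, {{"Tags", each _, type table}}),
    // The custom column from this answer: first matching tag in priority order
    AddedTag = Table.AddColumn(Grouped, "Tag", each
        if List.Contains([Tags][Value], "tag1") then "tag1"
        else if List.Contains([Tags][Value], "tag2") then "tag2"
        else if List.Contains([Tags][Value], "tag3") then "tag3"
        else "zzzNone", type text)
in
    AddedTag
```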
At the link below there is a file with a sample solution that I created.
https://sendeyo.com/en/608c8dee7f
Regarding bullet 3, I am not sure how the scoring system should work. Can you provide sample data with the final output?
With the help from Gonso's post I was able to make the "tags" step more dynamic.
Furthermore, I found a solution to bullet 3, assigning points to different tags when more than one tag is present.
I am posting the updated code here in case anyone else finds the solution helpful:
let
Source = Csv.Document(Web.Contents("somefile"),[Delimiter=",", Columns=50, Encoding=65001, QuoteStyle=QuoteStyle.None]),
Promoted_Headers = Table.PromoteHeaders(Source, [PromoteAllScalars=true]),
Changed_Type_with_Locale = Table.TransformColumnTypes(Promoted_Headers, {{"Custom field (Points)", type number}}, "en-GB"),
Max_Series = Table.AddColumn(Changed_Type_with_Locale, "Max Series", each List.Max({[Series], [Series_1], [Series_2], [Series_3], [Series_4]})),
Unpivoted_Other_Columns = Table.UnpivotOtherColumns(Max_Series, {"Type", "ID", "Custom field (Points)", "Series", "Series_1", "Series_2", "Series_3", "Series_4", "Series_5", "Series_6", "Series_7", "Series_8", "Series_9", "Max Series"}, "Attribute", "Value"),
Tags = Table.Group(Unpivoted_Other_Columns, {"Type", "ID", "Custom field (Points)", "Series", "Series_1", "Series_2", "Series_3", "Series_4", "Series_5", "Series_6", "Series_7", "Series_8", "Series_9", "Max Series"}, {{"Tags", each _, type table [Type=nullable text, ID=nullable text, #"Custom field (Points)"=nullable number, Series=nullable text, Series_1=nullable text, Series_2=nullable text, Series_3=nullable text, Series_4=nullable text, Series_5=nullable text, Series_6=nullable text, Series_7=nullable text, Series_8=nullable text, Series_9=nullable text, #"Max Series"=text, Attribute=text, Value=text]}}),
// Flag each tag: 1 if it appears in the row's unpivoted tag values, else 0
Tag1 = Table.AddColumn(Tags, "Tag1", each if List.Contains([Tags][Value],"tag1") then 1
else 0),
Tag2 = Table.AddColumn(Tag1, "Tag2", each if List.Contains([Tags][Value],"tag2") then 1
else 0),
Tag3 = Table.AddColumn(Tag2, "Tag3", each if List.Contains([Tags][Value],"tag3") then 1
else 0),
// TagNone is 1 only when none of the tags was found
NoneTag = Table.AddColumn(Tag3, "TagNone", each if List.Sum({[Tag1], [Tag2], [Tag3]}) > 0
then 0
else 1),
// Number of buckets the row's points are split across
Tag_total = Table.AddColumn(NoneTag, "Tag_total", each List.Sum({[Tag1], [Tag2], [Tag3], [TagNone]})),
Tag_update = Table.ReplaceValue(Tag_total,each [Tag1], each if [Tag1] > 0 then ([#"Custom field (Points)"] * ([Tag1] / [Tag_total])) else [Tag1],Replacer.ReplaceValue,{"Tag1"}),
Tag2_update = Table.ReplaceValue(Tag_update, each [Tag2], each if [Tag2] > 0 then ([#"Custom field (Points)"] * ([Tag2] / [Tag_total])) else [Tag2],Replacer.ReplaceValue,{"Tag2"}),
Tag3_update = Table.ReplaceValue(Tag2_update, each [Tag3], each if [Tag3] > 0 then ([#"Custom field (Points)"] * ([Tag3] / [Tag_total])) else [Tag3],Replacer.ReplaceValue,{"Tag3"}),
TagNone_update = Table.ReplaceValue(Tag3_update, each [TagNone], each if [TagNone] > 0 then ([#"Custom field (Points)"] * ([TagNone] / [Tag_total])) else [TagNone],Replacer.ReplaceValue,{"TagNone"}),
RemoveDummy = Table.SelectRows(TagNone_update, each [ID] <> "ID-1234"),
Grouped_Series = Table.Group(RemoveDummy, {"Max Series"}, {{"Tag1", each List.Sum([Tag1]), type number}, {"Tag2", each List.Sum([Tag2]), type number}, {"Tag3", each List.Sum([Tag3]), type number}, {"None", each List.Sum([TagNone]), type nullable number}}),
Sorted_Series = Table.Sort(Grouped_Series,{{"Max Series", Order.Ascending}}),
Renamed_Series = Table.RenameColumns(Sorted_Series,{{"Max Series", "Series"}}),
Added_last_update = Table.AddColumn(Renamed_Series, "Last update", each DateTime.LocalNow()),
Changed_date_Type = Table.TransformColumnTypes(Added_last_update,{{"Last update", type datetime}})
in
Changed_date_Type
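For comparison, the equal split can also be expressed without one column per tag: collect the wanted tags found on each row as a list, divide the points by the list length, expand the list to one row per tag, then group and pivot. This is a sketch under assumptions: the sample rows are made up, the tag list is hard-coded, and Tags* columns are detected by name prefix.

```powerquery
let
    // Hypothetical sample rows; "" mimics an empty CSV cell
    Source = #table(
        {"ID", "Max Series", "Custom field (Points)", "Tags", "Tags_7"},
        {
            {"ID-1", "Y21Q3S1", 6, "tag1", "tag2"},
            {"ID-2", "Y21Q3S1", 3, "tag3", ""},
            {"ID-3", "Y21Q4S1", 4, "", ""}
        }),
    WantedTags = {"tag1", "tag2", "tag3"},
    TagCols = List.Select(Table.ColumnNames(Source), each Text.StartsWith(_, "Tags")),
    // Wanted tags present on each row, or {"None"} when there are none
    AddedTagList = Table.AddColumn(Source, "Tag", each
        let hits = List.Intersect({WantedTags, Record.ToList(Record.SelectFields(_, TagCols))})
        in if List.IsEmpty(hits) then {"None"} else hits, type list),
    // Each found tag gets an equal share of the row's points (1/2, 1/3, ...)
    AddedShare = Table.AddColumn(AddedTagList, "Share",
        each [#"Custom field (Points)"] / List.Count([Tag]), type number),
    // One row per (item, tag)
    Expanded = Table.ExpandListColumn(AddedShare, "Tag"),
    Grouped = Table.Group(Expanded, {"Max Series", "Tag"},
        {{"Points", each List.Sum([Share]), type number}}),
    // One column per tag, one row per series
    Pivoted = Table.Pivot(Grouped, List.Distinct(Grouped[Tag]), "Tag", "Points")
in
    Pivoted
```

With these sample rows, Y21Q3S1 gets 3 points each for tag1, tag2 and tag3, and Y21Q4S1 gets 4 points under None.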
I have a list in Excel like the following:
1 / 6 / 45
123
1546
123 456
1247 /% 456 /
I want to create a new column in which every run of consecutive non-digits is replaced by a comma. In Google Sheets this is easy using =REGEXREPLACE(A1&"/","\D+",","), resulting in:
1,6,45,
123,
1546,
123,456,
1247,456,
In that formula, A1&"/" is needed because REGEXREPLACE only accepts text, so numeric cells must be turned into text first. No big deal; it just adds a comma at the end.
How can we do this in Excel? Pure Power Query (not R, not Python, just M) is strongly preferred. VBA and clickable Excel features (like Find and Replace) are not acceptable.
If you have Excel 365:
In B1:
=LET(X,MID(A1,SEQUENCE(LEN(A1)),1),SUBSTITUTE(TRIM(CONCAT(IF(ISNUMBER(--X),X," ")))," ",","))
Or, if runs of digits are always separated by at least one space:
=TEXTJOIN(",",,FILTERXML("<t><s>"&SUBSTITUTE(A1," ","</s><s>")&"</s></t>","//s[.*0=0]"))
Another option, if you have access to it, is LAMBDA(): make a function that replaces all kinds of characters, something along the lines of this. Without LAMBDA() and TEXTJOIN(), I think your best bet would be to start nesting SUBSTITUTE() functions.
Here is a Power Query solution.
It uses the List.Accumulate function to decide, character by character, whether to append a digit or a comma to the string:
Note that the code replicates the results you show. If you prefer to avoid trailing (and/or leading) commas, it can easily be modified.
let
Source = Excel.CurrentWorkbook(){[Name="Table5"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Column1", type text}}),
#"Added Custom" = Table.AddColumn(#"Changed Type", "textToList", each List.Combine({Text.ToList([Column1]),{","}})),
#"Added Custom1" = Table.AddColumn(#"Added Custom", "commaTerminators", each List.Accumulate(
[textToList],"", (state,current) =>
if List.Contains({"0".."9"},current)
then state & current
else if Text.EndsWith(state,",")
then state
else state & ",")),
#"Removed Columns" = Table.RemoveColumns(#"Added Custom1",{"textToList"})
in
#"Removed Columns"
Edit: To eliminate leading/trailing commas, we add the Text.Trim function, which in Power Query accepts an optional argument specifying the text to trim from the start and end:
let
Source = Excel.CurrentWorkbook(){[Name="Table5"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Column1", type text}}),
#"Added Custom" = Table.AddColumn(#"Changed Type", "textToList", each List.Combine({Text.ToList([Column1]),{","}})),
#"Added Custom1" = Table.AddColumn(#"Added Custom", "commaTerminators", each
Text.Trim(
List.Accumulate(
[textToList],"", (state,current) =>
if List.Contains({"0".."9"},current)
then state & current
else if Text.EndsWith(state,",")
then state
else state & ","),
",")),
#"Removed Columns" = Table.RemoveColumns(#"Added Custom1",{"textToList"})
in
#"Removed Columns"
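A shorter variant of the trimmed query above, assuming the column is already typed as text: Text.Remove collects the non-digit characters of each value, and Text.SplitAny splits on exactly those characters, so after dropping empty pieces no leading or trailing comma appears. DigitRuns is a hypothetical helper name and the inline table reuses the sample values from the question.

```powerquery
let
    // Hypothetical helper: keep runs of digits, joined by commas
    DigitRuns = (t as text) as text =>
        let
            // Every non-digit character that occurs in t
            nonDigits = Text.Remove(t, {"0".."9"})
        in
            // No separators means the value is all digits (or empty): return it unchanged
            if nonDigits = "" then t
            else Text.Combine(List.Select(Text.SplitAny(t, nonDigits), each _ <> ""), ","),
    Source = #table({"Column1"}, {{"1 / 6 / 45"}, {"123"}, {"1546"}, {"123 456"}, {"1247 /% 456 /"}}),
    // Gives 1,6,45 / 123 / 1546 / 123,456 / 1247,456
    Result = Table.AddColumn(Source, "digitRuns", each DigitRuns([Column1]), type text)
in
    Result
```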
VBA UDF: You mentioned you did not want VBA, but it is not clear whether you were restricting that to "clickable" features. Here is a user-defined function that you can use directly on a worksheet. It uses the VBScript regex engine, which allows easy extraction of multiple matches.
You can enter a formula such as =commaSep(cell_ref) on the worksheet to get the same results as my second PQ example above.
Option Explicit
Function commaSep(S As String) As String
Dim RE As Object, MC As Object, M As Object
Dim sTemp As String
Set RE = CreateObject("vbscript.regexp")
With RE
.Global = True
.Pattern = "\d+"
If .test(S) Then
Set MC = .Execute(S)
sTemp = ""
For Each M In MC
sTemp = sTemp & "," & M
Next M
commaSep = Mid(sTemp, 2)
Else
commaSep = "no digits"
End If
End With
End Function
This is another variation if you have the TEXTJOIN function available:
=SUBSTITUTE(TRIM(TEXTJOIN("",TRUE,IFERROR(MID(A2,ROW($A$1:INDEX(A:A,LEN(A2))),1)+0," ")))," ",",")
And another option in Power Query.
let
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WMlTQVzADYhNTpVgdINfIGEKbmpjBBIByZgpQjom5gr4qWEBfKTYWAA==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Column1 = _t]),
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Column1", type text}}),
x1 = Table.AddColumn(#"Changed Type", "x1", each Text.ToList([Column1])),
x2 = Table.AddColumn(x1, "x2", each List.Transform([x1], each if Text.Contains("0123456789", _) then _ else " " )),
x3 = Table.AddColumn(x2, "x3", each Text.Split(Text.Combine([x2])," ")),
x4 = Table.AddColumn(x3, "x4", each List.Transform([x3], each if Text.Contains("0123456789", try Text.At(_,0) otherwise " ") then _&"," else "" )),
x5 = Table.AddColumn(x4, "x5", each Text.Combine([x4])),
#"Removed Columns" = Table.RemoveColumns(x5,{"x1", "x2", "x3", "x4"})
in
#"Removed Columns"
I am having a problem creating a conditional column in Power BI that finds words beginning with specific letters and removes them from the column, as shown in the example below.
The values to remove are words that begin with the letters FCL, MON and WOD.
Can anybody help me?
Thank you!
If the prefixes are all the same length, you can write it more compactly like this:
if List.Contains({"WOD", "FCL", "MON"}, Text.Start([Input],3)) then "" else [Input]
Otherwise, you need to test each prefix separately:
if Text.StartsWith([Input], "WOD") or
Text.StartsWith([Input], "FCL") or
Text.StartsWith([Input], "MON")
then "" else [Input]
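If you would rather keep the prefixes in one list without requiring equal lengths, List.AnyTrue over List.Transform tests every prefix in a single expression. This sketch assumes whole cell values in a column named Input, as in the answers here; the sample values are made up. Text.StartsWith is case-sensitive unless you pass a comparer such as Comparer.OrdinalIgnoreCase as its third argument.

```powerquery
let
    // Prefixes to blank out; they do not need to share a length
    Prefixes = {"FCL", "MON", "WOD"},
    // Hypothetical sample values
    Source = #table({"Input"}, {{"FCL123"}, {"ABC9"}, {"MONDAY"}, {"WOD-7"}}),
    // "" when the value starts with any prefix, otherwise the value unchanged
    AddedOutput = Table.AddColumn(Source, "Output", each
        if List.AnyTrue(List.Transform(Prefixes, (p) => Text.StartsWith([Input], p)))
        then "" else [Input], type text)
in
    AddedOutput
```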
You can create a conditional column.
Here is a screenshot of the conditional column and the condition "begins with":
Here is the result:
Here is the M code. Be careful: when you write M by hand, each new step has to reference the previous step by name. For example, I promoted the headers in the step before the conditional column, so the new step takes #"Promoted Headers" as its input:
#"Promoted Headers" = Table.PromoteHeaders(#"Changed Type", [PromoteAllScalars=true]),
#"Added Conditional Column" = Table.AddColumn(#"Promoted Headers", "Custom", each if Text.StartsWith([Input], "FCL") then " " else if Text.StartsWith([Input], "MON") then " " else if Text.StartsWith([Input], "WOD") then " " else [Input])
in
#"Added Conditional Column"
I hope this helps
I'm new to PowerQuery in Excel and I'm trying to get a random sample from a table, but nothing I do seems to be working.
I have a table with a few hundred entries and I want a sample of fifteen. (Non-repeating.)
I've Googled this problem extensively and none of the examples work for me, but I honestly don't know why. Is there anyone who can help me understand how to accomplish this?
Thank you very much!
Try something like this - replace Source as appropriate:
= Table.RemoveColumns(Table.FirstN(Table.Sort(Table.Buffer(Table.AddColumn(Source, "Random", each Number.Random())), {"Random", Order.Ascending}),15),{"Random"})
Or if you prefer to see it step by step:
let
Source = MySourceTable,
#"Added Random" = Table.AddColumn(Source, "Random", each Number.Random()),
#"Buffered Random Values" = Table.Buffer(#"Added Random"),
#"Sorted Rows by Random" = Table.Sort(#"Buffered Random Values",{{"Random", Order.Ascending}}),
#"Kept First Rows" = Table.FirstN(#"Sorted Rows by Random",15),
#"Removed Random Column" = Table.RemoveColumns(#"Kept First Rows",{"Random"})
in
#"Removed Random Column"
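A variant of the same idea as a reusable function: draw all the random sort keys at once with List.Random, buffer them so they should not be regenerated during the sort, and attach them as an extra column. SampleRows and the demo table are hypothetical, and rebuilding the table with Table.FromColumns drops column types, so re-apply types afterwards if you need them.

```powerquery
let
    // Hypothetical helper: n distinct rows of tbl, chosen at random
    SampleRows = (tbl as table, n as number) as table =>
        let
            // One random key per row, generated in a single call and held in memory
            Keys = List.Buffer(List.Random(Table.RowCount(tbl))),
            // Append the keys as an extra column (column types are not preserved)
            WithKeys = Table.FromColumns(
                Table.ToColumns(tbl) & {Keys},
                Table.ColumnNames(tbl) & {"__key"}),
            Shuffled = Table.Sort(WithKeys, {{"__key", Order.Ascending}}),
            // Each row appears once in Shuffled, so the sample has no repeats
            Picked = Table.FirstN(Shuffled, n)
        in
            Table.RemoveColumns(Picked, {"__key"}),
    // Demo: 300 numbered rows, keep 15 of them
    Demo = Table.FromColumns({{1..300}}, {"Item"}),
    Sample = SampleRows(Demo, 15)
in
    Sample
```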
How can I achieve the same calculation in Power Query?
In Excel I would use: =COUNTIF($A$2:A2,A2)
Name Occurrence
A 1
A 2
B 1
A 3
B 2
Thanks,
Tamir
This can be done almost entirely through the UI ("buttons"), except for a custom column formula:
AddedIndex = Table.AddIndexColumn(Source, "Index", 0, 1),
GroupedRows = Table.Group(AddedIndex, {"Name"}, {{"tmp", each _, type table}}),
AddedCustom = Table.AddColumn(GroupedRows, "Custom", each Table.AddIndexColumn([tmp],"Occurrence", 1,1)),
RemovedOtherColumns = Table.SelectColumns(AddedCustom,{"Custom"}),
Expanded = Table.ExpandTableColumn(RemovedOtherColumns, "Custom", {"Name", "Occurrence"}, {"Name", "Occurrence"})
Or, shorter:
AddedIndex = Table.AddIndexColumn(Source, "Index", 0, 1),
GroupedRows = Table.Group(AddedIndex, {"Name"}, {{"tmp", each Table.AddIndexColumn(_, "Occurrence", 1,1), type table}}),
Expanded = Table.ExpandTableColumn(GroupedRows, "tmp", {"Occurrence"}, {"Occurrence"})
From there you can also expand the Index column and sort by it if you need to preserve the original row order. Any other columns you need also have to be expanded in the last step.
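Put together as a complete query, with the original row order restored through the Index column (the sample data is the table from the question):

```powerquery
let
    Source = #table({"Name"}, {{"A"}, {"A"}, {"B"}, {"A"}, {"B"}}),
    AddedIndex = Table.AddIndexColumn(Source, "Index", 0, 1),
    // Number the rows inside each Name group, starting at 1
    Grouped = Table.Group(AddedIndex, {"Name"},
        {{"tmp", each Table.AddIndexColumn(_, "Occurrence", 1, 1), type table}}),
    // Bring back the original Index along with the per-group counter
    Expanded = Table.ExpandTableColumn(Grouped, "tmp", {"Index", "Occurrence"}),
    // Restore the original row order, then drop the helper column
    Sorted = Table.Sort(Expanded, {{"Index", Order.Ascending}}),
    Result = Table.RemoveColumns(Sorted, {"Index"})
in
    Result
```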
Keeping a running count is definitely possible in PQ, though one of those things that is not super simple due to how PQ is designed to look at data. There is probably a more efficient way, but this is what I came up with.
First add an Index column that starts at 1, so we can easily track the "row" we are on. Then add a custom column with this in it
Number.Abs(List.Count(List.RemoveItems(List.Range(#"Added Index"[Name], 0, [Index]), {[Name]}))-List.Count(List.Range(#"Added Index"[Name], 0, [Index])))
I didn't see a simple list function in PQ that counts the matching items in a list, so instead we take the difference between the count of the list with the matching items removed and the count of the full list. The index is used so that, with List.Range, we only check the list up to the current "row".
The full M code when I pulled in a table of sample data looked like this:
let
Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
#"Added Index" = Table.AddIndexColumn(Source, "Index", 1, 1),
#"Added Custom" = Table.AddColumn(#"Added Index", "Occurrence", each Number.Abs(List.Count(List.RemoveItems(List.Range(#"Added Index"[Name], 0, [Index]), {[Name]}))-List.Count(List.Range(#"Added Index"[Name], 0, [Index]))))
in
#"Added Custom"
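The count of matching names can also be taken directly with List.Select, instead of the remove-and-subtract trick; buffering the Name column once avoids re-reading the table on every row. A sketch using the question's sample data; each row still scans the list up to its own position, so on large tables the grouping approach may be faster.

```powerquery
let
    // Sample data from the question
    Source = #table({"Name"}, {{"A"}, {"A"}, {"B"}, {"A"}, {"B"}}),
    AddedIndex = Table.AddIndexColumn(Source, "Index", 1, 1),
    // Buffer the Name column once so each row does not re-read the table
    Names = List.Buffer(AddedIndex[Name]),
    // Count how many of the first [Index] names equal the current row's name
    AddedOccurrence = Table.AddColumn(AddedIndex, "Occurrence",
        (row) => List.Count(List.Select(List.FirstN(Names, row[Index]), (n) => n = row[Name])),
        Int64.Type),
    RemovedIndex = Table.RemoveColumns(AddedOccurrence, {"Index"})
in
    RemovedIndex
```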