Power Query: Function to search a column for a list of keywords and return only rows with at least one match - excel

I am making a simple Google-like search function in Power Query.
Let's say I have a column called Description in a table called Database. The user then inputs some search queries like "dog, cat, animals". I want to filter Database for rows that contain at least one of these keywords. The keywords can change each time, depending on what the user types in a named range in Excel.
I know you can filter a column in Power Query for multiple keywords, like this:
FilterRows = Table.SelectRows(LastStep, each Text.Contains([English], "dog") or Text.Contains([English], "cat")),
but those keywords are static, and the column is also static. I want to be able to control both the keywords and the column name as variables. I think I need to write a function but I am not sure how to start.

Your question requires several moving parts.
First, I would get the keywords from a named range "Keywords" into a table like this:
{KeywordTbl}
let
// Return an empty list (not null) when the cell is blank, because Table.FromList cannot take null
GetKeywords = if Excel.CurrentWorkbook(){[Name="Keywords"]}[Content]{0}[Column1] = null then {} else Text.Split(Excel.CurrentWorkbook(){[Name="Keywords"]}[Content]{0}[Column1], ", "),
ConvertToTable = Table.FromList(GetKeywords, Splitter.SplitByNothing(), {"Keywords"})
in
ConvertToTable
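The query above splits on ", " exactly, so input typed as "dog,cat" or "dog ,cat" would not split as intended. A minimal sketch of a more forgiving split follows; `UserInput` is a hypothetical stand-in for the value read from the "Keywords" named range.

```powerquery
let
    // Hypothetical user input; in the workbook this would come from the "Keywords" named range
    UserInput = "dog,cat ,  animals",
    // Split on the comma alone, then trim whitespace around each keyword
    Keywords = List.Transform(Text.Split(UserInput, ","), Text.Trim),
    // Drop empty entries left behind by stray commas
    NonEmpty = List.Select(Keywords, each _ <> ""),
    KeywordTbl = Table.FromList(NonEmpty, Splitter.SplitByNothing(), {"Keywords"})
in
    KeywordTbl
```

With this input the result should be a one-column table holding dog, cat and animals.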
Secondly, store the column name where you want to search in an Excel named range called "ColName". Then pull the named range into Power Query like this:
{ColName}
let
GetColName = Excel.CurrentWorkbook(){[Name="ColName"]}[Content]{0}[Column1]
in
GetColName
Then I would write a function that takes 4 variables, the table and column you want to look in, and the table and column containing the keywords:
{SearchColForKeywords}
(LookInTbl as table, KeywordTbl as table, LookInCol as text, KeywordCol as text) =>
let
RelativeMerge = Table.AddColumn(LookInTbl, "RelativeJoin",
(Earlier) => Table.SelectRows(KeywordTbl,
each Text.Contains(Record.Field(Earlier, LookInCol), Record.Field(_, KeywordCol), Comparer.OrdinalIgnoreCase))),
ExpandRelativeJoin = Table.ExpandTableColumn(RelativeMerge, "RelativeJoin", {KeywordCol}, {"Keywords found"}),
FilterRows = Table.SelectRows(ExpandRelativeJoin, each [Keywords found] <> null and [Keywords found] <> ""),
// Concatenate multiple found keywords for the same row into one line
GroupAllData = Table.Group(FilterRows, {"Word ID"}, {{"AllData", each _, type table [First column=text, Second column=text, ... your other columns=text]}}),
AddCol = Table.AddColumn(GroupAllData, "Keywords found", each [AllData][Keywords found]),
ExtractValues = Table.TransformColumns(AddCol, {"Keywords found", each Text.Combine(List.Transform(_, Text.From), ", "), type text}),
DeleteAllData = Table.RemoveColumns(ExtractValues,{"AllData"}),
MergeQueries = Table.NestedJoin(DeleteAllData, {"Word ID"}, FilterRows, {"Word ID"}, "DeleteAllData", JoinKind.LeftOuter),
ExpandCols = Table.ExpandTableColumn(MergeQueries, "DeleteAllData", {"First Col name", "Second col name", ... "Your Other column names here"}),
DeleteKeywordsFound = Table.RemoveColumns(ExpandCols,{"Keywords found"})
in
DeleteKeywordsFound
FYI, half of this function has been developed by a user named lmkeF on PowerBI community. The full discussion is here. I merely improved on his solution.
Finally, I will use that function in another query like this:
StepName = SearchColForKeywords(MainTbl, KeywordTbl, ColName, "Keywords"),
You may customize the 4 variable names.
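If you only need the matching rows and not the list of keywords found, a shorter filter may be enough. The following is a sketch under stated assumptions, not the function above: the `Database` table, its contents, and the `Keywords` list are hypothetical sample data, and it assumes the searched column contains no null values.

```powerquery
let
    // Hypothetical sample data standing in for the Database table
    Database = #table(
        {"ID", "Description"},
        {{1, "A friendly dog"}, {2, "Garden tools"}, {3, "Cat food"}}
    ),
    // Hypothetical keyword list and column name; in the workbook both would come from named ranges
    Keywords = {"dog", "cat", "animals"},
    ColName = "Description",
    // Keep a row when at least one keyword occurs in the chosen column (case-insensitive)
    FilterRows = Table.SelectRows(
        Database,
        (row) => List.AnyTrue(
            List.Transform(
                Keywords,
                (kw) => Text.Contains(Record.Field(row, ColName), kw, Comparer.OrdinalIgnoreCase)
            )
        )
    )
in
    FilterRows
```

With the sample data, rows 1 and 3 are kept. Record.Field is what lets the column name be passed as text instead of being written as [Description].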

Related

Data set Transformation for Power BI

I'm working in Power BI on an analysis of authors' publication numbers and trends.
I have the data set shown in the image below.
There is one column of authors and another for their IDs.
Each cell contains multiple authors at once, and the same goes for their IDs.
So my question is:
Is there a way to match each author with their ID so I can proceed with my analysis?
Thank you so much
Since you chose to provide your data as a screenshot, which cannot be copy/pasted into a table, I had to make up my own.
split each column into a list
combine the two lists into a table
Source
M Code (Transform=>Home=>Advanced Editor)
let
Source = Table.FromRecords(
{[Authors="Author A, Author B", #"Author(s) ID"="12345;67890;"],
[Authors="Author C,Author D,Author E", #"Author(s) ID"="444123;789012;66666;"],
[Authors="Author X, Author Y, Author Z, Author P", #"Author(s) ID"="1111;2222;3333;4444;"]}),
#"Changed Type" = Table.TransformColumnTypes(Source, {{"Authors", type text},{"Author(s) ID", type text}}),
//split each column into a List; trim the entries
authors = List.Combine(List.Transform(#"Changed Type"[Authors], each List.Transform(Text.Split(_, ","), Text.Trim))),
IDs = List.Combine(List.Transform(#"Changed Type"[#"Author(s) ID"], each Text.Split(Text.Trim(_,";"),";"))),
//create new table
result = Table.FromColumns({authors,IDs},
type table[Authors=text, #"Author(s) ID"=text])
in
result
Result
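The approach above flattens all authors and all IDs into two long lists, so it relies on every row having the same number of authors and IDs. A sketch of a row-by-row variant is shown here; the sample rows and the output column names "Author" and "ID" are assumptions made for illustration.

```powerquery
let
    // Hypothetical sample rows in the same shape as the data above
    Source = #table(
        {"Authors", "Author(s) ID"},
        {
            {"Author A, Author B", "12345;67890;"},
            {"Author C,Author D,Author E", "444123;789012;66666;"}
        }
    ),
    // Build one small two-column table per source row
    Paired = Table.AddColumn(
        Source,
        "Pairs",
        each Table.FromColumns(
            {
                // Split the authors on commas and trim each name
                List.Transform(Text.Split([Authors], ","), Text.Trim),
                // Remove the trailing semicolon before splitting the IDs
                Text.Split(Text.TrimEnd([#"Author(s) ID"], ";"), ";")
            },
            {"Author", "ID"}
        )
    ),
    // Stack the per-row tables into one result
    Result = Table.Combine(Paired[Pairs])
in
    Result
```

Pairing within each row keeps a mismatch in one cell from shifting every author after it.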

Find matching value in query based on text

I am building a table in power query and I want to find the matching value from a column in a row. Does anyone know how to do this? I import my source data with:
leagueDataSource = #"League Data All",
this gives me this table:
I then have a variable called:
leagueName = "Albania - Superliga",
and want to create another variable called activeSeason. How do I match the variable leagueName with the corresponding value in the Active Season column?
Found the answer myself :)
leagueName = "name",
a = List.PositionOf(leagueDataSource[League], leagueName, 0),
leagueID = Number.ToText(leagueDataSource[Active Season]{a}),
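A sketch of an alternative lookup uses M's row-selection syntax, which finds the row by a field value directly. The table contents here are hypothetical, and this form raises an error unless exactly one row matches.

```powerquery
let
    // Hypothetical sample data standing in for #"League Data All"
    leagueDataSource = #table(
        {"League", "Active Season"},
        {{"Albania - Superliga", 2023}, {"Austria - Bundesliga", 2024}}
    ),
    leagueName = "Albania - Superliga",
    // Select the single row whose League equals leagueName, then read its Active Season
    activeSeason = leagueDataSource{[League = leagueName]}[Active Season]
in
    activeSeason
```

With the sample data this returns 2023.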

Power Query (M) _ Dynamically update a column list for List.Sum function

I'm not sure if this is even possible, but the goal is to dynamically update a query based on the user selecting a date. I have a table in my Excel file which updates a value that feeds the PeriodString variable (below).
/*Parameter name = PeriodString */
let
Source = Excel.CurrentWorkbook(){[Name="PeriodString"]}[Content],
StrPeriod = Source[Value]{0}
in
StrPeriod
The part of the code I want to update is the [ ..months selected ].
=List.Sum({[FYOpening],[January],[February],[March],[April],[May]})
With the below variable
=List.Sum({PeriodStr})
I tried using Table.Column, as I realize I have to convert the value to a list of selectable columns, but I can't get it to work.
=List.Sum({Table.Column(PeriodString{0},PeriodString[0])})
Expression.Error: We cannot convert the value "[FY Opening],[Januar..." to type List.
Details:
Value=[FY Opening],[January],[February],[March],[April],[May]
Type=[Type]
Let me know if possible / alternatives.
If PeriodString must be exactly a value like "[Col1],[Col2],[Col3]", use code like this:
let
Source = #table({"a".."e"},{{1..5}, {6..10}}),
PeriodString = "[b],[d],[e]",
sum = Table.AddColumn(Source, "sum", each List.Sum(Expression.Evaluate("{"&PeriodString&"}", [_=_])))
in
sum
I'd prefer to use PQ list instead:
let
Source = #table({"a".."e"},{{1..5}, {6..10}}),
list = {"b","d","e"},
sum = Table.AddColumn(Source, "sum", each List.Sum(Record.ToList(Record.SelectFields(_, list))))
in
sum
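If the Excel cell can hold plain column names separated by commas, the list can be built from that text at refresh time. This is a sketch that assumes a hypothetical cell value of "b, d, e" rather than the bracketed form from the question.

```powerquery
let
    Source = #table({"a".."e"}, {{1..5}, {6..10}}),
    // Hypothetical cell value: column names without brackets, separated by commas
    PeriodString = "b, d, e",
    // Turn the text into a list of column names
    cols = List.Transform(Text.Split(PeriodString, ","), Text.Trim),
    // Sum only the selected fields of each row
    sum = Table.AddColumn(Source, "sum", each List.Sum(Record.ToList(Record.SelectFields(_, cols))))
in
    sum
```

For the two sample rows, the new column holds 11 and 26.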

Excel Power Query: Variables for Table Name

I'm trying to achieve something that seems like it should be fairly simple but I can't find an answer for... replace the name of a table or power query with a variable.
Currently trying to do this with a merge query so it would look something like this:
Table.NestedJoin(VARIABLE1,key1,VARIABLE2,key2,"Append",JoinKind.Inner)
Currently getting all sorts of errors no matter what I try...
Thank you!
// Edit:
Not really looking to do a function - hoping for users to utilize as easy as possible so they would be able to update a named table in the workbook, refresh, and then get a table as an output. Here is my current code - hopefully that'll help. My Region code replacements worked fine, but the Days replacements don't - I need each day (Monday-Thursday) to be replaced with my day variables (StartDay, Day2, etc.). Each of those has a separate text query referring back to the excel workbook inputs, and each of them should pull up a query based on the text (ex: StartDay = Monday so should pull the Monday query). This is the error I get, assuming that it is reading it as text "Monday" and not query Monday.
Expression.Error: We cannot convert the value "Monday" to type Table.
Details:
Value=Monday
Type=Type
let
ANDOriginCode = OriginRegion,
ANDDestinationCode = DestinationRegion,
ANDStartDay = StartDay,
ANDDay2 = Day2,
ANDDay3 = Day3,
ANDDay4 = Day4,
ANDDay5 = Day5,
Source = Table.NestedJoin(Monday,{"Tuesday Destination Region Code"},Tuesday,{"Tuesday Origin Region Code"},"Append1 (3)",JoinKind.Inner),
#"Filtered Rows1" = Table.SelectRows(Source, each [Monday Origin Region Code] = OriginRegion),
#"Removed Columns" = Table.RemoveColumns(#"Filtered Rows1",{"ID", "Pickup Day of Week", "Delivery Day of Week"}),
#"Expanded Append1 (3)" = Table.ExpandTableColumn(#"Removed Columns", "Append1 (3)", {"Tuesday Origin Region Code", "Wednesday Destination Region Code", "Tuesday Projected Number of Loads"}, {"Tuesday Origin Region Code", "Wednesday Destination Region Code", "Tuesday Projected Number of Loads"}),
#"Merged Queries" = Table.NestedJoin(#"Expanded Append1 (3)",{"Wednesday Destination Region Code"},Wednesday,{"Wednesday Origin Region Code"},"Append1 (4)",JoinKind.Inner),
#"Expanded Append1 (4)" = Table.ExpandTableColumn(#"Merged Queries", "Append1 (4)", {"Wednesday Origin Region Code", "Thursday Destination Region Code", "Wednesday Projected Number of Loads"}, {"Wednesday Origin Region Code", "Thursday Destination Region Code", "Wednesday Projected Number of Loads"}),
#"Merged Queries1" = Table.NestedJoin(#"Expanded Append1 (4)",{"Thursday Destination Region Code"},Thursday,{"Thursday Origin Region Code"},"Append1 (5)",JoinKind.Inner)
in
#"Merged Queries1"
This might help:
let
Source = (VARIABLE1 as table, VARIABLE2 as table) => Table.NestedJoin(VARIABLE1, Key1, VARIABLE2, Key2, "Append", JoinKind.Inner)
in
Source
You can use parameters for Key1 and Key2. The function will prompt you to select your tables.
You can invoke it from any other query with:
Function.Invoke(Merge,{Table1,Table2})
Replace Merge with whatever you named the first query above and replace Table1 and Table2 with your target tables.
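As a variation, the key columns can be passed as arguments too, instead of relying on separate Key1 and Key2 parameters. In this sketch the function signature, the sample tables, and their column names are all hypothetical.

```powerquery
let
    // Hypothetical function: both tables and both key column names are arguments
    Merge = (left as table, leftKey as text, right as table, rightKey as text) as table =>
        Table.NestedJoin(left, {leftKey}, right, {rightKey}, "Append", JoinKind.Inner),
    // Hypothetical sample tables
    Table1 = #table({"Code", "Loads"}, {{"A", 10}, {"B", 20}}),
    Table2 = #table({"Origin", "Day"}, {{"A", "Tuesday"}, {"C", "Tuesday"}}),
    // Function.Invoke takes the arguments as a list, in the order of the signature
    Result = Function.Invoke(Merge, {Table1, "Code", Table2, "Origin"})
in
    Result
```

Only the row with code A survives the inner join, with the matching Table2 rows nested in the "Append" column.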
In case you're thinking of it, I have not been able to figure out how to pass tables from parameters. When you do that, the value you enter is recognized as text--for instance, "Table" versus Table--so it won't work. I could not find any information on how to pass a table value, like Table, in a variable. Anyhow, I hope this helps at least a little.
I was searching for this, too!
I finally found it, thanks to Chris Webb at https://blog.crossjoin.co.uk/2015/02/06/expression-evaluate-in-power-querym/
The key is using Expression.Evaluate with #shared as the second argument.
If you define Query1 as
let
Source = 1 + 1
in
Source
Query2 as
let
Source = 15 * 10
in
Source
define pIndex as a parameter that is "1" or "2", and
define QuerySwitch as
Expression.Evaluate("Query" & pIndex, #shared)
then QuerySwitch will return
2 when pIndex is "1"
150 when pIndex is "2"
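The same idea applies to the day tables in the question. The sketch below is self-contained, so it passes an explicit record as the environment instead of #shared; the table names, their contents, and the `StartDay` value are hypothetical.

```powerquery
let
    // Hypothetical stand-ins for the Monday and Tuesday queries
    MondayTbl = #table({"Loads"}, {{10}}),
    TuesdayTbl = #table({"Loads"}, {{20}}),
    // Hypothetical text value, as it would arrive from a named range
    StartDay = "Tuesday",
    // The second argument of Expression.Evaluate is a record mapping names to values.
    // In a real workbook, #shared plays this role and exposes every query by name.
    Env = [Monday = MondayTbl, Tuesday = TuesdayTbl],
    // Evaluates the text "Tuesday" as an identifier and returns the table bound to it
    Selected = Expression.Evaluate(StartDay, Env)
in
    Selected
```

Here `Selected` is the table bound to Tuesday, not the text "Tuesday", which is the conversion the original error message complained about.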
My example:
I have a query QueryThatTakesFiveMinutes that
other queries use, and
writes to an Excel table (also named "QueryThatTakesFiveMinutes")
If I define a query "QueryThatTakesFiveMinutes Cached" by moving my cursor to the output QueryThatTakesFiveMinutes table in Excel and creating a new query from that table, then, when I'm testing, I can change all the queries that use QueryThatTakesFiveMinutes to instead use #"QueryThatTakesFiveMinutes Cached" and test downstream computation without waiting five minutes every time. Then I just need to remember to change it back when I'm ready.
But that was annoying.
I created a named range in Excel called "ProductionMode" that pointed to a specific cell that holds a value of either TRUE or FALSE
In Power Query, I defined a very handy function called fGetNamedCellValue as
(rangeName as text) => Excel.CurrentWorkbook(){[Name=rangeName ]}[Content]{0}[Column1]
so that I can define a "ProductionMode" query as
fGetNamedCellValue("ProductionMode")
I use this in a way that's similar to the Index parameter above, but this way I can edit it via Excel.
When I defined "modeQueryThatTakesFiveMinutes" as
if ProductionMode then QueryThatTakesFiveMinutes else #"QueryThatTakesFiveMinutes Cached"
and changed all queries that use QueryThatTakesFiveMinutes to use modeQueryThatTakesFiveMinutes instead, I was very surprised to find that both QueryThatTakesFiveMinutes and #"QueryThatTakesFiveMinutes Cached" were evaluated and it didn't save any time at all!
So then after searching, being overjoyed to find your question only to realize it wasn't answered, then finding Chris Webb's article, I tried redefining modeQueryThatTakesFiveMinutes as
Expression.Evaluate(
if ProductionMode then
"QueryThatTakesFiveMinutes"
else
"#""QueryThatTakesFiveMinutes Cached""",
#shared
)
Unfortunately, instead of working, I got an error of
Formula.Firewall: Query 'modeQueryThatTakesFiveMinutes' references other queries or steps, so it may not directly access a data source. Please rebuild this data combination.
However, I found a way around this, too, by putting the offending code within a function that the consuming query executes.
Deleting ProductionMode and defining a new query fProductionMode of
() => fGetNamedCellValue("ProductionMode") as logical
now doesn't return true or false, it returns a function that will return true or false when evaluated. Why is one legal and the other isn't? I don't know, but it is! Change the definition of modeQueryThatTakesFiveMinutes to
Expression.Evaluate(
if fProductionMode() then
"QueryThatTakesFiveMinutes"
else
"#""QueryThatTakesFiveMinutes Cached""",
#shared
)
and it works!

How to use variable column name in a function for other functions?

I have written two functions which take a date as input and I'm gonna use them on multiple queries. Instead of doing manual work every time (call both functions, filter rows where the first function returns True, expand the record of the second function to columns, delete the first function's column) I thought I'd write another function that takes the table and the name of the column with dates as parameters to automate that process. My current table-based function works if I hard-code a specific date column in the code, but those names will be different between different queries (tables).
Here's the table function's code:
(t as table) =>
let
FunctionFilter = Table.AddColumn(t, "DateFilter", each DateFilter([myDate2])),
FunctionPeriods = Table.AddColumn(#"FunctionFilter", "TimePeriods", each TimePeriods([myDate2])),
ExpandPeriods= Table.ExpandRecordColumn(FunctionPeriods, "TimePeriods", {"Year", "Quarter", "Month", "WeekMon", "WeekTue", "Day"},
{"Year", "Quarter", "Month", "WeekMon", "WeekTue", "Day"}),
TrueDate = Table.SelectRows(ExpandPeriods, each ([DateFilter] = true)),
DeleteDateFilter = Table.RemoveColumns(TrueDate,{"DateFilter"})
in
DeleteDateFilter
My only problem is inserting a variable column name in place of [myDate2] here:
FunctionFilter = Table.AddColumn(t, "DateFilter", each DateFilter([myDate2])),
FunctionPeriods = Table.AddColumn(#"FunctionFilter", "TimePeriods", each TimePeriods([myDate2])),
Using Table.Column(t,[column name]) returns a list instead of a date, which causes called date functions to throw a type mismatch error.
You can use a technique like this:
// Table
let
Source = #table(3,List.Zip({{"a".."d"},{1..4},List.Numbers(10,4,10)})),
fn = fn(Source, "Column3")
in
fn
// fn
(tbl as table, col as text) =>
let
i = Table.AddIndexColumn(tbl, "i", 0, 1),
add = Table.AddColumn(i, "new", each Table.Column(i, col){[i]}*10),
del = Table.RemoveColumns(add, "i")
in
del
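Another option is Record.Field, which reads a field from the current row by its name as text and avoids the index column. The sketch below applies it to the function from the question; `DateFilter` and `TimePeriods` are hypothetical stand-ins for the asker's functions, reduced to two output fields.

```powerquery
let
    // Hypothetical stand-ins for the asker's two date functions
    DateFilter = (d as date) as logical => Date.Year(d) >= 2020,
    TimePeriods = (d as date) as record => [Year = Date.Year(d), Month = Date.Month(d)],
    // The date column is passed by name and read with Record.Field
    AddPeriods = (t as table, dateCol as text) as table =>
        let
            FunctionFilter = Table.AddColumn(t, "DateFilter", each DateFilter(Record.Field(_, dateCol))),
            FunctionPeriods = Table.AddColumn(FunctionFilter, "TimePeriods", each TimePeriods(Record.Field(_, dateCol))),
            ExpandPeriods = Table.ExpandRecordColumn(FunctionPeriods, "TimePeriods", {"Year", "Month"}),
            TrueDate = Table.SelectRows(ExpandPeriods, each [DateFilter] = true),
            DeleteDateFilter = Table.RemoveColumns(TrueDate, {"DateFilter"})
        in
            DeleteDateFilter,
    // Hypothetical sample table
    Sample = #table({"myDate2"}, {{#date(2019, 5, 1)}, {#date(2021, 3, 15)}}),
    Result = AddPeriods(Sample, "myDate2")
in
    Result
```

With the sample table only the 2021 row remains, with Year and Month columns added.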
