Text.Contains for multiple values power query - excel

I am attempting to create the following query:
The idea is to check if each row in the source query contains any of the following keywords in the Search list and return the Found words is present.
Importantly I need this to be dynamic i.e. the search list could be a single word or could be 100+ words. Therefore I need to work around just stitching a bunch of Text. Contains with or statements is possible.
In effect, I want to create something like
Text.Contains([Column1], {any value in search list}) then FoundWord else null
Data:
Physical hazards Flam. Liq. 3 - H226 Eliminate all sources of ignition.
Health hazards STOT SE 3 - H336. Avoid inhalation of vapours and contact with skin and eyes.
Environmental hazards Not Classified. Avoid the spillage or runoff entering drains, sewers or watercourses.
Personal precautions Keep unnecessary and unprotected personnel away from the spillage.
clothing as described in Section 8 of this safety data sheet. Provide adequate ventilation.
Search List:
Hazards
Eliminate
ventilation
Avoid

try this code for query Table2 after creating query lookfor
let Source = Excel.CurrentWorkbook(){[Name="Table2"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Column1", type text}}),
Findmatch = Table.AddColumn(Source, "Found", (x) => Text.Combine(Table.SelectRows(lookfor, each Text.Contains(x[Column1],[Column1], Comparer.OrdinalIgnoreCase))[Column1],", "))
in Findmatch

Related

Applying function to each row in Query Custom Column

Summary of problem:
I have a PowerQuery Table in Excel that contains 13 columns. The 13th Column is a custom column "Task Start Week Number". I want the PowerQuery to apply a formula to each of the rows generated for this Query. The formula is as follows:
=IFS(AND('Program Dates'!$B$2<WEEKNUM(New_Items_to_Save[Start Date]),
WEEKNUM(New_Items_to_Save[Start Date])<54),
'Program Dates'!$G$2-('Program Dates'!$D$2-(-53+WEEKNUM(New_Items_to_Save[Start Date]))),
WEEKNUM(New_Items_to_Save[Start Date])<'Program Dates'!$B$2,
'Program Dates'!$G$2-('Program Dates'!$D$2-(-53+WEEKNUM(New_Items_to_Save[Start Date])))+53)
What I've done here is reference a cell which contains the formula, that way I can just run the GetValue() function for a named range. I can't get this to work and I don't know what I'm doing wrong.
Thank you in advance for your help!
Context:
This is the query table I need to add the calculation to.
The last column is the custom column, and those values should be calculated using the following cells:
This is the source of the other info needed to calculate the week number of the program, with reference arrows shown.
Note: The dates referenced in the function have already been converted using the WEEKNUM() operation. I am comparing Week# to Week#, not Date to Week#
Function Logic:
AND: if the date falls within the range of the current year ie. week# is less than 54, but after the start of the program, then perform this calc.
IFS: otherwise, if week# is before the end of the program ie. 2023, then perform this calculation.
Edit:
Here is the PowerQuery function I want to call for each of the new cells in this custom column:
Parameter2 = Date.WeekOfYear(StartWeek)
let
GetWeek = ()
if GetValue("Start_Week") < Parameter2 < 54
then (GetValue("Program_Duration") - GetValue("End_Week") + 53 - Parameter2))
else
(GetValue("Program_Duration") - GetValue("End_Week") + 53 - Parameter2 +53))
in
GetWeek
I don't know if I need the let statement or if I should just put it in a function
f(x) => [equation]
and then call "...each f([column name])" in power query?
I think that there are actually three different parts to your question, and maybe your confusion is coming from combining them all together.
The way I see it is in these parts:
How to create a custom function.
How to apply a function to a new column.
How to apply a function to an existing column.
How to create a custom function
There are two main ways to create a custom function in Power Query:
Using the UI (follow steps here):
Step
Description
Image
1
Write your query
2
Parameterise your query
3
Create your function
Using only code (follow steps here):
Example to filter a table:
let fun_FilterTable = (tbl_InputTable as table, txt_FilterValue as text) as table =>
let
Source = tbl_InputTable,
Filter = Table.SelectRows(DayCount, each Text.Contains([Column], txt_FilterValue))
in
Filter
in
fun_FilterTable
Example to check if one string contains another:
let fun_CheckStringContains = (txt_String as text, txt_Check as text) as nullable logical =>
let
Source = txt_String,
Check = Text.Contains(Source, txt_Check)
in
Check
in
fun_CheckStringContains
More resources:
Using custom functions
Custom Functions Made Easy in Power BI Desktop
PowerQuery best practices
DataFlow best practices
How to apply a function to a new column
Also has two different ways to achieve:
Custom Column (follow steps here):
Step
Description
Image
1
Create custom column
2
Add function
Custom Function (follow steps here):
Step
Description
Image
1
Invoke custom function
Sources:
Add a custom column
Using custom functions
Custom Functions Made Easy in Power BI Desktop
How to apply a function to an existing column
Also has two different ways to achieve (unfortunately, only possible with pure code):
Using Transformation:
Example to uppercase an entire column:
let
Source = Table,
#"Uppercased text" = Table.TransformColumns(Source, {{"Column", each Text.Upper(_), type nullable text}})
in
#"Uppercased text"
Example to add a prefix to all rows in one column:
let
Source = Table,
#"Added prefix" = Table.TransformColumns(Source, {{"Column", each "test_" & _, type text}})
in
#"Added prefix"
Example to coerce column to date in Australian format:
let
Source = Table,
#"Fix date" = Table.TransformColumns(Source, {{"DateColumn", each Date.From(_, "en-AU"), type date}})
in
#"Fix date"
Using Replacement
Example to replace some text:
let
Source = Table,
#"Replaced value" = Table.ReplaceValue(Source, "Admin", "Administrator", Replacer.ReplaceText, {"Column"})
in
#"Replaced value"
Example to replace with values from another column
let
Source = Table,
#"Replaced value" = Table.ReplaceValue(Source, each [FixThisColumn], each [OtherColumn], Replacer.ReplaceText, {"FixThisColumn"})
in
#"Replaced value"
Your Specific Problem
Without some dummy data to use, I have created some here. Please note, in future, please provide some data in a minimum reproducible example (see here), so that we can easily recreate the scenario from your example.
Data:
ID
ProgramStartDate
ProgramEndDate
1
1/Jan/2020
1/Dec/2021
2
1/Jan/2022
1/Mar/2023
3
1/Mar/2022
1/Dec/2022
4
1/Sep/2021
1/Dec/2023
5
1/Jan/2023
1/Dec/2023
I think that you should be using a combination of the PowerQuery in-build date functions (see here) and some of the PowerQuery conditional processes (see here).
My code would look something like this:
let
Source = Table.FromColumns({{1,2,3,4,5},{"1/Jan/2020","1/Jan/2022","1/Mar/2022","1/Sep/2021","1/Jan/2023"},{"1/Dec/2021","1/Mar/2023","1/Dec/2022","1/Dec/2023","1/Dec/2023"}},{"ID","ProgramStartDate","ProgramEndDate"}),
fix_Types = Table.TransformColumnTypes(Source,{{"ID", Int64.Type}, {"ProgramStartDate", type date}, {"ProgramEndDate", type date}}),
add_Today = Table.AddColumn(fix_Types, "DateToday", each Date.From(DateTime.LocalNow()), type date),
add_CheckCurrentYear = Table.AddColumn(add_Today, "IsInCurrentYear", each Date.IsInCurrentYear([DateToday]), type logical),
add_CheckProgramRunning = Table.AddColumn(add_CheckCurrentYear, "ProgramIsCurrent", each [DateToday]>[ProgramStartDate] and [DateToday]<[ProgramEndDate], type logical),
add_ConditionalCheck = Table.AddColumn(add_CheckProgramRunning, "DoSomething", each if [IsInCurrentYear] and [ProgramIsCurrent] then "Do Something" else null, type text)
in
add_ConditionalCheck
And the final output would look something like this:
ID
ProgramStartDate
ProgramEndDate
DateToday
IsInCurrentYear
ProgramIsCurrent
DoSomething
1
1/01/2020
1/12/2021
22/12/2022
TRUE
FALSE
null
2
1/01/2022
1/03/2023
22/12/2022
TRUE
TRUE
Do Something
3
1/03/2022
1/12/2022
22/12/2022
TRUE
FALSE
null
4
1/09/2021
1/12/2023
22/12/2022
TRUE
TRUE
Do Something
5
1/01/2023
1/12/2023
22/12/2022
TRUE
FALSE
null
This should help you work towards resolving your issue.

Extracting data with an associated tag/unit

I have been attempting to separate out key data hidden within sentences of text e.g:
I have made some progress with the following code however it pulls undesired values too:
let
Source = Excel.CurrentWorkbook(){[Name="Table3"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Input", type text}, {"Desired OutPut", type any}, {"Bonus", type text}}),
#"Added Custom" = Table.AddColumn(#"Changed Type", "Custom", each if Text.Contains([Input], "mmHg") then Text.Remove([Input],Text.ToList(Text.Remove([Input],{"0".."9","-", " ", "."}))) else null),
#"Trimmed Text" = Table.TransformColumns(#"Added Custom",{{"Custom", Text.Trim, type text}})
in
#"Trimmed Text"
As you can see other numerical data is being pulled.
I think however following these rules is perhaps the wrong way to go about this and wonder If it's possible to use mmHg as a Tag to extract 'nearby` data. Ideally the value or range will be touching "mmHg" however there are instances where this isnt the case hence this idea of nearby logic. I apprecaite I could remove all data except numbers and mmgH however I think this idea of tagging if possible will be very useful going forward. In my mind im thinking like: if Text contains mmHg then search for {0..9,"-"} within X charecters (say 10 to the left). Is this possible?
As sort of extra I will attempt to extract the Eye that this pressure is found in. Here I wish to use some soft of logic with a sort of first come first serve basis. I think this it an okay assumption that the first pressure will relate to the first mentioned eye per sentence. I am unsure how to do this in M code. This may however warrant a seperate question.
I think you can utilize regular expressions here:
Step 1):
Add a custom function to the group of your table:
In this case I called it 'fnRegexExtr' (much like a previous question you asked). The source function I used came from here and is a regex-replace function.
(x,y,z)=>
let
Source = Web.Page(
"<script>var x="&"'"&x&"'"&";var z="&"'"&z&
"'"&";var y=new RegExp('"&y&"','g');
var b=x.replace(y,z);document.write(b);</script>")
[Data]{0}[Children]{0}[Children]{1}[Text]{0}
in
Source
Step 2):
On the 'Add Column' tab, invoke this custom function. Use the following parameters:
x - Input
y - (\\d+(?:-\\d+)?)\\D*mmHg|.
z - $1
Step 3):
We can add another column using the same function with different parameters:
x - Input
y - \\b(right|left)\\s*eye\\b|.
z - $1
Please note the trailing spaces. Using spaces inbetween capture group 1 makes that PQ will auto-trim the result.
Step 4):
Under tab 'Transform' I simply replaced errors with 'null' values.
Step 5):
Edited the M-code to replace spaces inbetween values with comma-space delimiters.
Result:
M-Code:
let
Source = Excel.CurrentWorkbook(){[Name="Tabel1_2"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Input", type text}}),
#"Invoked Custom Function" = Table.AddColumn(#"Changed Type", "mmHg", each Text.Replace(fnRegexExtr([Input], "(\\d+(?:-\\d+)?)\\D*mmHg|.", "$1 ")," ",", ")),
#"Invoked Custom Function1" = Table.AddColumn(#"Invoked Custom Function", "Side", each Text.Replace(fnRegexExtr([Input], "\\b(right|left)\\s*eye\\b|.", "$1 ")," ",", ")),
#"Replaced Errors" = Table.ReplaceErrorValues(#"Invoked Custom Function1", {{"mmHg", null}, {"Side", null}})
in
#"Replaced Errors"

Extract CAS Number from Downloaded Data

I have downloaded a CSV file from Pubchem containing over 5000+ records. One of the columns contains a bunch of computed synonyms where CAS Number is the records I wish to extract. Unfortunately, the CAS number isn't necessarily in the same position in this list, making splitting by delimiter more difficult. Below is the source data example and the desired output I am trying to achieve.
An older answer to a post a while back used a Regex function to extract strings of Numbers with a given length.
fnRegexExtr
let fx=(text,regex)=>
Web.Page(
"<script>
var x='"&text&"';
var y=new RegExp('"&regex&"','g');
var b=x.match(y);
document.write(b);
</script>")[Data]{0}[Children]{0}[Children]{1}[Text]{0}
in
fx
Unsure if this is possible here and unfamiliar with Regex but I'm wondering if it is possible to modify this function to extract CAS numbers. The difficulty is that CAS Numbers can be in various formats CAS Numbers are up to 10 digits long using the format xxxxxxx-yy-z.
If anyone has any alternative solutions to extracting CAS numbers with this somewhat complex data feel free to post.
Data:
cid and cmpdname can be anything.
1-Aminopropan-2-ol|1-AMINO-2-PROPANOL|78-96-6|Isopropanolamine|Monoisopropanolamine
1-chloro-2,4-dinitrobenzene|2,4-Dinitrochlorobenzene|97-00-7|Dinitrochlorobenzene|DNCB|Chlorodinitrobenzene|CDNB
1,2-dichloroethane|Ethylene dichloride|107-06-2|Ethylene chloride|Ethane, 1,2-dichloro-|Glycol dichloride|Dutch liquid|Dutch oil|Ethane dichloride|Aethylenchloride
1,2,4-trichlorobenzene|120-82-1|Benzene, 1,2,4-trichloro-|unsym-Trichlorobenzene|Hostetex L-pec|Trojchlorobenzene
CHLOROACETALDEHYDE|2-chloroacetaldehyde|107-20-0|Chloroethanal|2-Chloroethanal|Acetaldehyde, chloro-|Chloroaldehyde|Monochloroacetaldehyde|2-Chloro-1-ethanal
In PQ, this will pull out the contents of any item that does not contain a letter in cmpdsynonym, which I think is basically what you are looking for
let Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
#"Added Custom" = Table.AddColumn(Source, "Custom.3", each List.RemoveNulls(List.Transform(Text.Split([cmpdsynonym],"|"), each if _ = Text.Remove (_,{"A".."Z","a".."z"}) then _ else null)){0})
in #"Added Custom"
Here's one way of doing it in PQ, using fnRegexExtr to return the CAS; and a simple Text.Split to return the chemical compound name:
let
//Read in data and set data type as text
Source = Excel.CurrentWorkbook(){[Name="Compounds"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Column1", type text}}),
//Transform to desired output
Result = Table.FromColumns(
{List.Transform(#"Changed Type"[Column1], each Text.Split(_,"|")){0}}
& {List.Transform(#"Changed Type"[Column1],each fnRegexExtr(_, "\\b\\d{1,7}-\\d{2}-\\d"))},
type table[Compound=text, CAS=text]
)
in
Result
Original
Results

How to extract relationships from a text

I am currently new with NLP and need guidance as of how I can solve this problem.
I am currently doing a filtering technique where I need to brand data in a database as either being correct or incorrect. I am given a structured data set, with columns and rows.
However, the filtering conditions are given to me in a text file.
An example filtering text file could be the following:
Values in the column ID which are bigger than 99
Values in the column Cash which are smaller than 10000
Values in the column EndDate that are smaller than values in StartDate
Values in the column Name that contain numeric characters
Any value that follows those conditions should be branded as bad.
However, I want to extract those conditions and append them to the program that I've made so far.
For instance, for the conditions above, I would like to produce
`if ID>99`
`if Cash<10000`
`if EndDate < StartDate`
`if Name LIKE %[1-9]%`
How can I achieve the above result using the Stanford NLP? (or any other NLP library).
This doesn't look like a machine learning problem; it's a simple parser. You have a simple syntax, from which you can easily extract the salient features:
column name
relationship
target value or target column
The resulting "action rule" is simply removing the "syntactic sugar" words and converting the relationship -- and possibly the target value -- to its symbolic form.
Enumerate all of your critical words for each position in a lexicon. Then use basic string manipulation operators in your chosen implementation language to find the three needed fields.
EXAMPLE
Given the data above, your lexicons might be like this:
column_trigger = "Values in the column"
relation_dict = {
"are bigger than" : ">",
"are smaller than" : "<",
"contain" : "LIKE",
...
}
value_desc = {
"numeric characters" : "%[1-9]%",
...
}
From here, use these items in standard parsing. If you're not familiar with that, please look up the basics of a simple sentence grammar in your favourite programming language, with rules such as such as
SENTENCE => SUBJ VERB OBJ
Does that get you going?

Replace all error values of all columns after importing datas (while keeping the rows)

An Excel table as data source may contain error values (#NA, #DIV/0), which could disturbe later some steps during the transformation process in Power Query.
Depending of the following steps, we may get no output but an error. So how to handle this cases?
I found two standard steps in Power Query to catch them:
Remove errors (UI: Home/Remove Rows/Remove Errors) -> all rows with an error will be removed
Replace error values (UI: Transform/Replace Errors) -> the columns have first to be selected for performing this operations.
The first possibility is not a solution for me, since I want to keep the rows and just replace the error values.
In my case, my data table will change over the time, means the column name may change (e.g. years), or new columns appear. So the second possibility is too static, since I do not want to change the script each time.
So I've tried to get a dynamic way to clean all columns, indepent from the column names (and number of columns). It replaces the errors by a null value.
let
Source = Excel.CurrentWorkbook(){[Name="Tabelle1"]}[Content],
//Remove errors of all columns of the data source. ColumnName doesn't play any role
Cols = Table.ColumnNames(Source),
ColumnListWithParameter = Table.FromColumns({Cols, List.Repeat({""}, List.Count(Cols))}, {"ColName" as text, "ErrorHandling" as text}),
ParameterList = Table.ToRows(ColumnListWithParameter ),
ReplaceErrorSource = Table.ReplaceErrorValues(Source, ParameterList)
in
ReplaceErrorSource
Here the different three queries messages, after I've added two new column (with errors) to the source:
If anybody has another solution to make this kind of data cleaning, please write your post here.
let
src = Excel.CurrentWorkbook(){[Name="Tabelle1"]}[Content],
cols = Table.ColumnNames(src),
replace = Table.ReplaceErrorValues(src, List.Transform(cols, each {_, "!"}))
in
replace
Just for novices like me in Power Query
"!" could be any string as substitute for error values. I initially thought it was a wild card.
List.Transform(cols, each {_, "!"}) generates the list of error handling by column for the main funcion:
Table.ReplaceErrorValues(table_with errors, {{col1,error_str1},{col2,error_str2},{},{}, ...,{coln,error_strn}})
Nice elegant solution, Sergei

Resources