So Power Query in Excel doesn't have the Html.Table function that can be found in Power BI.
So can we use regex to convert HTML into text and make an equivalent function?
Previous posts state that this shouldn't be done, since HTML doesn't follow the same rules as plain text; however, needs must. It's also an interesting question in its own right and, if achievable, would prove very useful for scraping difficult pages directly in Excel.
I came across this regex:
https://regex101.com/r/AtElMH/2, from an answer on a related post. It seems to work reasonably well.
So I'm wondering if I can use this to tidy up any HTML that I pull into Excel from the web connector. Each line of the table in blue comes from submitting the HTML to https://www.textfixer.com/html/html-to-text.php, just to give an idea of what each row should look like. However, as per the regex101 link, it does not have to be perfect: if the occasional tag slips through, that's okay, since this is more of a tidy-up. I would rather have that than a pattern that loses data.
Currently, passing this regex to the FnRegexReplace function results in an error. I don't know whether Excel can read the regex correctly and, if not, whether there are any workarounds.
FnRegexReplace (note: the function itself runs y = Text.Replace(y,"\","\\"), so there is no need to write \\ in the pattern):
(x,y,z)=>
let
y = Text.Replace(y,"\","\\"),
Source = Web.Page(
"<script>var x="&"'"&x&"'"&";var z="&"'"&z&
"'"&";var y=new RegExp('"&y&"','gmi');
var b=x.replace(y,z);document.write(b);</script>")
[Data]{0}[Children]{0}[Children]{1}[Text]{0}
in
Source
M Code:
let
Source = Excel.CurrentWorkbook(){[Name="Table2"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Column1", type text}}),
#"Invoked Custom Function" = Table.AddColumn(#"Changed Type", "FnRegexReplace", each FnRegexReplace([Column1], "<([\w\-\/]+)( +[\w\-]+(=(('[^']*')|(""[^""]*"")))?)* *>", " "))
in
#"Invoked Custom Function"
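A likely cause of the error, though this is an assumption rather than something the post confirms: FnRegexReplace pastes x, y and z straight into single-quoted JavaScript strings. The pattern above contains single quotes ('[^']*'), and the HTML data appears to contain line breaks; either one ends the JavaScript string early, so the script never runs and the final [Text]{0} lookup fails. The sketch below escapes all three values before building the script. JsEscape is a hypothetical helper name, and the sketch still relies on Web.Page running the embedded JavaScript, exactly as the original does.

```powerquery
// Sketch of a more defensive FnRegexReplace (assumes, like the original,
// that Web.Page executes the embedded JavaScript)
(x as text, y as text, z as text) as text =>
let
    // Hypothetical helper: make a value safe inside a single-quoted JavaScript
    // string. Backslashes are doubled first, then quotes and line breaks are
    // escaped, then "</" becomes "<\/" so the data cannot close the <script> early
    JsEscape = (t as text) as text =>
        List.Accumulate(
            {{"\", "\\"}, {"'", "\'"}, {"#(cr)", "\r"}, {"#(lf)", "\n"}, {"</", "<\/"}},
            t,
            (state, pair) => Text.Replace(state, pair{0}, pair{1})),
    Script =
        "<script>var x='" & JsEscape(x) & "';var z='" & JsEscape(z) &
        "';var y=new RegExp('" & JsEscape(y) & "','gmi');" &
        "document.write(x.replace(y,z));</script>",
    // Same navigation into the rendered page as the original function
    Result = Web.Page(Script)[Data]{0}[Children]{0}[Children]{1}[Text]{0}
in
    Result
```

Saved under the same name, it should accept the invocation in the M code above unchanged, because it still doubles backslashes itself (now for all three arguments, not just the pattern).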
HTML DATA:
</div><!-- SectionHeaderWrapper --><div id="SectionContent"><h3 id="sAdministrativeDataSummary" class="mDisabled">Administrative data</h3><h3 id="sWorkersHazardViaInhalationRoute">Workers - Hazard via inhalation route</h3><h4>Systemic effects</h4><h5>Long term exposure</h5><dl class="HorDL"><dt>Hazard assessment conclusion:</dt><dd>no hazard identified</dd></dl></dl></dl></dl></dl></dl><h5>Acute/short term exposure</h5><dl class="HorDL"><dt>Hazard assessment conclusion:</dt><dd>no hazard identified</dd></dl><h6>DNEL related information</h6></dl></dl></dl></dl></dl><h4>Local effects</h4><h5>Long term exposure</h5><dl class="HorDL"><dt>Hazard assessment conclusion:</dt><dd>DNEL (Derived No Effect Level)
Value:</dt><dd><span class="UserEntry">0.02</span> mg/m³
Most sensitive endpoint:</dt><dd>repeated dose toxicity</dd></dl><h6>DNEL related information</h6><dl class="HorDL"><dt>DNEL derivation method:</dt><dd>other: <span class="UserEntry">Biocidal Products Regulation guidance for Human Health Risk Assessment (Volume III, Part B, December 2013</span>
Overall assessment factor (AF):</dt><dd class="UserEntry">16
Dose descriptor:</dt><dd>NOAEC
Value:</dt><dd><span class="UserEntry">0.34</span> mg/m³
AF for dose response relationship:</dt><dd class="UserEntry">1
Justification:</dt><dd class="UserEntry">NOAEC defined based on local effects of irritation/corrosion which are considered concentration dependent
AF for differences in duration of exposure:</dt><dd class="UserEntry">2
Justification:</dt><dd class="UserEntry">NOAEC derived from subchronic study therefore extrapolating to chronic duration
AF for interspecies differences (allometric scaling):</dt><dd class="UserEntry">2.5
Justification:</dt><dd class="UserEntry">Local effects observed only therefore toxicokinetics do not contribute to interspecies differences
AF for other interspecies differences:</dt><dd class="UserEntry">1
Justification:</dt><dd class="UserEntry">Local effects observed only therefore toxicokinetics do not contribute to interspecies differences
AF for intraspecies differences:</dt><dd class="UserEntry">3.2
Justification:</dt><dd class="UserEntry">Local effects observed only therefore toxicokinetics do not contribute to intraspecies differences
AF for the quality of the whole database:</dt><dd class="UserEntry">1
Justification:</dt><dd class="UserEntry">Hazards well characterised in multiple studies of good reliability
AF for remaining uncertainties:</dt><dd class="UserEntry">1
Justification:</dt><dd class="UserEntry">No remaining uncertainties</dd></dl></dl></dl></dl></dl><h5>Acute/short term exposure</h5><dl class="HorDL"><dt>Hazard assessment conclusion:</dt><dd>DNEL (Derived No Effect Level)
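For fun, here is a version that uses neither recursion nor regex:
// single column of HTML text as input into [Column1]
// removes all text between each pair of < and >
// (Lines.FromBinary keeps each line whole; Csv.Document would split the HTML at commas)
let Source = Table.FromColumns({Lines.FromBinary(File.Contents("C:\Temp\a.txt"))}),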
#"Added Custom" = Table.AddColumn(Source, "Custom", each Text.ToList([Column1])),
#"Added Index" = Table.AddIndexColumn(#"Added Custom", "Index", 0, 1, Int64.Type),
#"Expanded Custom" = Table.ExpandListColumn(#"Added Index", "Custom"),
#"Added Custom1" = Table.AddColumn(#"Expanded Custom", "Custom.1", each if [Custom]="<" or [Custom]=">" then [Custom] else null),
#"Duplicated Column" = Table.DuplicateColumn(#"Added Custom1", "Custom.1", "Custom.1 - Copy"),
#"Filled Down" = Table.FillDown(#"Duplicated Column",{"Custom.1 - Copy"}),
#"Filtered Rows" = Table.SelectRows(#"Filled Down", each ([#"Custom.1 - Copy"] = ">") and ([Custom.1] = null)),
#"Removed Columns" = Table.RemoveColumns(#"Filtered Rows",{"Column1", "Custom.1", "Custom.1 - Copy"}),
#"Grouped Rows" = Table.Group(#"Removed Columns", {"Index"}, {{"data", each Text.Combine(_[Custom]), type text}})
in #"Grouped Rows"
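The per-character approach above creates one row per character, which can get slow on large pages, and it drops any text before the first "<". A lighter sketch of the same idea splits on "<" instead and removes everything up to the first ">" in each piece. fnStripTags is a hypothetical name; it keeps leading text and puts back a "<" that has no matching ">", in line with preferring a stray tag over lost data.

```powerquery
// Sketch: strip tags from one HTML string without regex or per-character rows
// (fnStripTags is a hypothetical query name)
(html as text) as text =>
let
    // Every piece after the first starts just inside a "<...>" tag
    Pieces = Text.Split(html, "<"),
    // Drop the tag part (up to the first ">"); if there is no ">", the "<"
    // was not a tag, so restore it rather than lose the text
    Kept = {List.First(Pieces)} &
        List.Transform(
            List.Skip(Pieces, 1),
            each if Text.Contains(_, ">") then Text.AfterDelimiter(_, ">") else "<" & _),
    // Join with spaces so words from adjacent elements do not run together
    Joined = Text.Combine(Kept, " "),
    // Collapse runs of spaces, tabs and line breaks into single spaces
    Words = List.Select(Text.SplitAny(Joined, " #(tab)#(cr)#(lf)"), each _ <> "")
in
    Text.Combine(Words, " ")
```

It would be applied per row with something like Table.AddColumn(Source, "Text", each fnStripTags([Column1]), type text). Collapsing whitespace also removes line breaks; drop the last step if the line structure matters.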
Then you'd probably go back and replace all HTML entities, such as the ones that encode & and ³.
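A sketch of that entity clean-up, meant to run after the tags have been stripped (decoding &lt; first would create new "<" characters for the tag stripper to remove). fnDecodeEntities is a hypothetical name, and the entity list is a small illustrative selection rather than a complete table.

```powerquery
// Sketch: decode a handful of common named entities
// (fnDecodeEntities is a hypothetical query name; extend the list as needed)
(t as text) as text =>
let
    Entities = {
        {"&nbsp;", " "}, {"&lt;", "<"}, {"&gt;", ">"}, {"&quot;", """"},
        {"&#39;", "'"}, {"&sup3;", "³"},
        // "&amp;" goes last so that "&amp;lt;" becomes "&lt;" and not "<"
        {"&amp;", "&"}}
in
    // Apply each replacement in list order
    List.Accumulate(Entities, t, (state, pair) => Text.Replace(state, pair{0}, pair{1}))
```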
One answer so far suggests the pattern (<[^<>]*>)+ with "___" as the substitution; however, I am not sure how well this works for other HTML text.
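For reference, this is how that pattern could be passed to FnRegexReplace from the M code above, assuming the same Table2 input. The pattern contains no quote characters or backslashes, so it should pass through the JavaScript string literal untouched; " | " is an arbitrary separator used here in place of "___".

```powerquery
let
    Source = Excel.CurrentWorkbook(){[Name="Table2"]}[Content],
    #"Changed Type" = Table.TransformColumnTypes(Source, {{"Column1", type text}}),
    // Replace each run of adjacent tags with one separator
    Stripped = Table.AddColumn(#"Changed Type", "Text",
        each FnRegexReplace([Column1], "(<[^<>]*>)+", " | "), type text)
in
    Stripped
```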
Related
I'm attempting to get the href information from the following site using Power Query:
https://hpvchemicals.oecd.org/ui/SIDS_Details.aspx?id=fc1ced8a-ce14-45fa-b003-dfeda5e38075
From the page, I wish to obtain the href for the 50000.pdf link.
Inspecting the page, this should be: handler.axd?id=fae8d1b1-406b-4287-8a05-f81aa1b16d3f
However, when I attempt this in Power Query, it appears to be omitted from the text:
M Code:
let
Source = Table.FromColumns({Lines.FromBinary(Web.Contents("https://hpvchemicals.oecd.org/ui/SIDS_Details.aspx?id=fc1ced8a-ce14-45fa-b003-dfeda5e38075"))})
in
Source
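My question is: why does this happen? I don't think it can be solved (if it can, great), but I'm still interested to understand what's going on here.
The page loads that content in an iframe, so it isn't part of the HTML returned for the outer page. Try requesting the iframe's URL directly: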
let
Source = Table.FromColumns({Lines.FromBinary(Web.Contents("https://hpvchemicals.oecd.org/ui/SidsOrganigrame.aspx?SIDSNo=fc1ced8a-ce14-45fa-b003-dfeda5e38075&id=000c31fa-483a-4e5b-a8bb-c26c3148e464&Key=1c143ab1-b132-4b57-b34d-559b07c845f2&Idx=0"))}),
#"Filtered Rows" = Table.SelectRows(Source, each [Column1] = " <img src=""images/FiletypeIcone/htm.png"" height=""16"" width=""16"" border=""0"" /> SIAR published by UNEP<br /><img src=""images/FiletypeIcone/pdf.ico"" height=""16"" width=""16"" border=""0"" /> FORMALDEHYDE_50000.pdf<br />")
in
#"Filtered Rows"
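If the goal is the href itself rather than one exact line, a sketch that scans the iframe page for handler.axd links could look like this. It assumes the links appear as double-quoted attribute values containing "handler.axd?id=", as the inspected markup in the question suggests, and it takes only the first such link on each line.

```powerquery
let
    // Iframe URL from the answer above
    Source = Table.FromColumns({Lines.FromBinary(Web.Contents("https://hpvchemicals.oecd.org/ui/SidsOrganigrame.aspx?SIDSNo=fc1ced8a-ce14-45fa-b003-dfeda5e38075&id=000c31fa-483a-4e5b-a8bb-c26c3148e464&Key=1c143ab1-b132-4b57-b34d-559b07c845f2&Idx=0"))}),
    // Keep any line mentioning a handler.axd link instead of matching a whole line exactly
    LinkLines = Table.SelectRows(Source, each Text.Contains([Column1], "handler.axd?id=")),
    // Take the text from "handler.axd?id=" up to the next double quote
    Hrefs = Table.AddColumn(LinkLines, "Href",
        each "handler.axd?id=" & Text.BetweenDelimiters([Column1], "handler.axd?id=", """"), type text)
in
    Hrefs
```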
I am attempting to collect href data in Power Query for Excel for any results found on https://echa.europa.eu/ when searching for 'Acetone'.
Current M Code:
let
Source = Web.Page(Web.Contents(
"https://echa.europa.eu/search-for-chemicals?" &
//Parameters
"p_auth=69hDou3E&p_p_id=disssimplesearch_WAR_disssearchportlet&p_p_lifecycle=1&p_p_state=normal&p_p_col_id=" &
"_118_INSTANCE_UFgbrDo05Elj__column-1&p_p_col_count=1&_disssimplesearch_WAR_disssearchportlet_javax.portlet.action=" &
"doSearchAction&_disssimplesearch_WAR_disssearchportlet_backURL=https%3A%2F%2Fecha.europa.eu%2Finformation-on-chemicals" &
"%3Fp_p_id%3Ddisssimplesearchhomepage_WAR_disssearchportlet%26p_p_lifecycle%3D0%26p_p_state%3Dnormal%26p_p_mode%3Dview" &
"%26p_p_col_id%3D_118_INSTANCE_UFgbrDo05Elj__column-1%26p_p_col_count%3D1%26_disssimplesearchhomepage_WAR_disssearchportlet_sessionCriteriaId%3D" &
"_disssimplesearchhomepage_WAR_disssearchportlet_formDate=1621042609544&_disssimplesearch_WAR_disssearchportlet_searchOccurred=" &
"true&_disssimplesearch_WAR_disssearchportlet_sskeywordKey=Acetone&_disssimplesearchhomepage_WAR_disssearchportlet_disclaimer" &
"=true&_disssimplesearchhomepage_WAR_disssearchportlet_disclaimerCheckbox=on")),
Data = Source{0}[Data],
#"Changed Type" = Table.TransformColumnTypes(Data,{{"Name", type text}, {"EC / List no.", type text}, {"CAS no.", type text}, {"BP", type text}, {"OBL", type text}})
in
#"Changed Type"
The parameters come from a previous VBA post.
This returns:
As you can see, the BP column just says Open Brief Profile instead of containing the href for each chemical.
Desired result for acetone in BP column:
I know this can be done with Table From Examples in Power BI, but since I manipulate the data in Excel, it's more useful to pull it straight into Excel.
I have explored this previously with no success; however, https://community.powerbi.com/t5/Desktop/web-connector-and-getting-HREF-value/m-p/422068 gives me hope that it could be done. I have tried that approach, though, and run into issues.
If anyone could advise whether this can be done, it would be appreciated. The goal is for column BP (I'm not bothered about OBL) to contain an href for each result in the table.
Try this, which reads the results from the site's Excel export instead of the web page:
let
Source = Excel.Workbook(Web.Contents("https://echa.europa.eu/search-for-chemicals?p_p_id=disssimplesearch_WAR_disssearchportlet&p_p_lifecycle=2&p_p_state=normal&p_p_mode=view&p_p_resource_id=exportResults&p_p_cacheability=cacheLevelPage&_disssimplesearch_WAR_disssearchportlet_sessionCriteriaId=dissSimpleSearchSessionParam101401654440118533&_disssimplesearch_WAR_disssearchportlet_formDate=1654440118558&_disssimplesearch_WAR_disssearchportlet_sskeywordKey=Acetone&_disssimplesearch_WAR_disssearchportlet_orderByCol=relevance&_disssimplesearch_WAR_disssearchportlet_orderByType=asc&_disssimplesearch_WAR_disssearchportlet_exportType=xls"))[Data]{0}
in
Source
I'm working in Power BI on an analysis of authors' publication counts and trends.
I have the data set shown in the image below: one column of authors and another for their IDs.
Each cell contains multiple authors, and likewise multiple IDs.
So my question is:
Is there a way to match each author with their ID so I can proceed with my analysis?
Thank you so much!
Since you chose to provide your data as a screenshot, which cannot be copy/pasted into a table, I had to make up my own.
split each column into a list
combine the two lists into a table
Source
M Code (Transform=>Home=>Advanced Editor)
let
Source = Table.FromRecords(
{[Authors="Author A, Author B", #"Author(s) ID"="12345;67890;"],
[Authors="Author C,Author D,Author E", #"Author(s) ID"="444123;789012;66666;"],
[Authors="Author X, Author Y, Author Z, Author P", #"Author(s) ID"="1111;2222;3333;4444;"]}),
#"Changed Type" = Table.TransformColumnTypes(Source, {{"Authors", type text},{"Author(s) ID", type text}}),
//split each column into a List; trim each entry
authors = List.Combine(List.Transform(#"Changed Type"[Authors], each List.Transform(Text.Split(_, ","), Text.Trim))),
IDs = List.Combine(List.Transform(#"Changed Type"[#"Author(s) ID"], each Text.Split(Text.Trim(_,";"),";"))),
//create new table
result = Table.FromColumns({authors,IDs},
type table[Authors=text, #"Author(s) ID"=text])
in
result
Result
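The approach above pairs authors and IDs across the whole column, so a single row with a missing or extra ID would shift every later pairing. A sketch that pairs them row by row instead keeps any mismatch local to its row (Table.FromColumns fills the shorter list with null). It reuses the same made-up sample data, and it keeps any other columns the real table may have.

```powerquery
let
    // Same made-up sample data as above
    Source = Table.FromRecords({
        [Authors = "Author A, Author B", #"Author(s) ID" = "12345;67890;"],
        [Authors = "Author C,Author D,Author E", #"Author(s) ID" = "444123;789012;66666;"],
        [Authors = "Author X, Author Y, Author Z, Author P", #"Author(s) ID" = "1111;2222;3333;4444;"]}),
    // For each row, build a small table pairing the i-th author with the i-th ID
    Paired = Table.AddColumn(Source, "Pairs", each Table.FromColumns(
        {List.Transform(Text.Split([Authors], ","), Text.Trim),
         Text.Split(Text.Trim([#"Author(s) ID"], ";"), ";")},
        {"Author", "Author ID"}), type table),
    // Drop the original text columns and expand to one row per author
    Expanded = Table.ExpandTableColumn(
        Table.RemoveColumns(Paired, {"Authors", "Author(s) ID"}),
        "Pairs", {"Author", "Author ID"})
in
    Expanded
```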
I'm trying to achieve something that seems like it should be fairly simple, but I can't find an answer: replacing the name of a table or query with a variable.
I'm currently trying to do this with a merge query, so it would look something like this:
Table.NestedJoin(VARIABLE1,key1,VARIABLE2,key2,"Append",JoinKind.Inner)
I'm currently getting all sorts of errors, no matter what I try...
Thank you!
// Edit:
I'm not really looking to write a function; I'm hoping to make this as easy as possible for users, so they can update a named table in the workbook, refresh, and then get a table as output. Here is my current code; hopefully that helps. My region code replacements worked fine, but the day replacements don't: I need each day (Monday to Thursday) to be replaced with my day variables (StartDay, Day2, etc.). Each of those is a separate text query referring back to the Excel workbook inputs, and each should pull up a query based on the text (e.g. StartDay = Monday should pull the Monday query). This is the error I get, which I assume means it is reading the value as the text "Monday" and not as the query Monday.
Expression.Error: We cannot convert the value "Monday" to type Table.
Details:
Value=Monday
Type=Type
let
ANDOriginCode = OriginRegion,
ANDDestinationCode = DestinationRegion,
ANDStartDay = StartDay,
ANDDay2 = Day2,
ANDDay3 = Day3,
ANDDay4 = Day4,
ANDDay5 = Day5,
Source = Table.NestedJoin(Monday,{"Tuesday Destination Region Code"},Tuesday,{"Tuesday Origin Region Code"},"Append1 (3)",JoinKind.Inner),
#"Filtered Rows1" = Table.SelectRows(Source, each [Monday Origin Region Code] = OriginRegion),
#"Removed Columns" = Table.RemoveColumns(#"Filtered Rows1",{"ID", "Pickup Day of Week", "Delivery Day of Week"}),
#"Expanded Append1 (3)" = Table.ExpandTableColumn(#"Removed Columns", "Append1 (3)", {"Tuesday Origin Region Code", "Wednesday Destination Region Code", "Tuesday Projected Number of Loads"}, {"Tuesday Origin Region Code", "Wednesday Destination Region Code", "Tuesday Projected Number of Loads"}),
#"Merged Queries" = Table.NestedJoin(#"Expanded Append1 (3)",{"Wednesday Destination Region Code"},Wednesday,{"Wednesday Origin Region Code"},"Append1 (4)",JoinKind.Inner),
#"Expanded Append1 (4)" = Table.ExpandTableColumn(#"Merged Queries", "Append1 (4)", {"Wednesday Origin Region Code", "Thursday Destination Region Code", "Wednesday Projected Number of Loads"}, {"Wednesday Origin Region Code", "Thursday Destination Region Code", "Wednesday Projected Number of Loads"}),
#"Merged Queries1" = Table.NestedJoin(#"Expanded Append1 (4)",{"Thursday Destination Region Code"},Thursday,{"Thursday Origin Region Code"},"Append1 (5)",JoinKind.Inner)
in
#"Merged Queries1"
This might help:
let
Source = (VARIABLE1 as table, VARIABLE2 as table) => Table.NestedJoin(VARIABLE1, Key1, VARIABLE2, Key2, "Append", JoinKind.Inner)
in
Source
You can use parameters for Key1 and Key2. The function will prompt you to select your tables.
You can invoke it from any other query with:
Function.Invoke(Merge,{Table1,Table2})
Replace Merge with whatever you named the first query above and replace Table1 and Table2 with your target tables.
In case you're thinking of it: I have not been able to figure out how to pass tables through parameters. When you do that, the value you enter is recognized as text (for instance, "Table" rather than Table), so it won't work. I could not find any information on how to pass a table value, such as Table, in a variable. Anyhow, I hope this helps at least a little.
I was searching for this, too!
I finally found it, thanks to Chris Webb at https://blog.crossjoin.co.uk/2015/02/06/expression-evaluate-in-power-querym/
The key is using Expression.Evaluate with #shared as the second argument.
If you define Query1 as
let
Source = 1 + 1
in
Source
Query2 as
let
Source = 15 * 10
in
Source
define pIndex as a parameter that is "1" or "2", and
define QuerySwitch as
Expression.Evaluate("Query" & pIndex, #shared)
then QuerySwitch will return
2 when pIndex is "1"
150 when pIndex is "2"
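Applied to the question's day queries, a sketch might look like this. It assumes StartDay and Day2 are the text queries described in the question and that queries named Monday, Tuesday, etc. exist; the quoted-identifier form #"..." is built so that names containing spaces would also resolve. Because it depends on workbook-driven queries, it may run into the same Formula.Firewall error discussed further down, in which case the function-wrapping workaround there applies.

```powerquery
let
    // Build a quoted identifier such as #"Monday" from the text query and
    // resolve it against #shared, which exposes the workbook's queries by name
    FirstDay = Expression.Evaluate("#""" & StartDay & """", #shared),
    NextDay = Expression.Evaluate("#""" & Day2 & """", #shared),
    // Key column names follow the question's naming scheme, e.g.
    // "Tuesday Destination Region Code" on Monday, "Tuesday Origin Region Code" on Tuesday
    Source = Table.NestedJoin(
        FirstDay, {Day2 & " Destination Region Code"},
        NextDay, {Day2 & " Origin Region Code"},
        "Next Day", JoinKind.Inner)
in
    Source
```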
My example:
I have a query, QueryThatTakesFiveMinutes, that other queries use and that writes to an Excel table (also named "QueryThatTakesFiveMinutes").
If I define a query "QueryThatTakesFiveMinutes Cached" by selecting the output QueryThatTakesFiveMinutes table in Excel and creating a new query from that table, then, when I'm testing, I can change all the queries that use QueryThatTakesFiveMinutes to use #"QueryThatTakesFiveMinutes Cached" instead, and test downstream computation without waiting five minutes every time. Then I just need to remember to change it back when I'm ready.
But that was annoying.
I created a named range in Excel called "ProductionMode" that points to a specific cell holding a value of either TRUE or FALSE.
In Power Query, I defined a very handy function called fGetNamedCellValue as
(rangeName as text) => Excel.CurrentWorkbook(){[Name=rangeName ]}[Content]{0}[Column1]
so that I can define a "ProductionMode" query as
fGetNamedCellValue("ProductionMode")
I use this in a way that's similar to the Index parameter above, but this way I can edit it via Excel.
When I defined "modeQueryThatTakesFiveMinutes" as
if ProductionMode then QueryThatTakesFiveMinutes else #"QueryThatTakesFiveMinutes Cached"
and changed all queries that use QueryThatTakesFiveMinutes to use modeQueryThatTakesFiveMinutes instead, I was very surprised to find that both QueryThatTakesFiveMinutes and #"QueryThatTakesFiveMinutes Cached" were evaluated and it didn't save any time at all!
So, after searching, being overjoyed to find your question only to realize it hadn't been answered, and then finding Chris Webb's article, I tried redefining modeQueryThatTakesFiveMinutes as
Expression.Evaluate(
if ProductionMode then
"QueryThatTakesFiveMinutes"
else
"#""QueryThatTakesFiveMinutes Cached""",
#shared
)
Unfortunately, instead of working, I got an error of
Formula.Firewall: Query 'modeQueryThatTakesFiveMinutes' references other queries or steps, so it may not directly access a data source. Please rebuild this data combination.
However, I found a way around this, too, by putting the offending code within a function that the consuming query executes.
Deleting ProductionMode and defining a new query fProductionMode of
() => fGetNamedCellValue("ProductionMode") as logical
now doesn't return true or false, it returns a function that will return true or false when evaluated. Why is one legal and the other isn't? I don't know, but it is! Change the definition of modeQueryThatTakesFiveMinutes to
Expression.Evaluate(
if fProductionMode() then
"QueryThatTakesFiveMinutes"
else
"#""QueryThatTakesFiveMinutes Cached""",
#shared
)
and it works!
I've used Power Query to add custom fields to a table made from 2 merged tables in order to simulate a pivot table. However, I can't seem to add a filter to my final table. Is there another way to do this?
I've tried using a PivotTable in Excel, but I can't seem to insert the calculated fields I want.
Here's my Excel file:
https://ufile.io/x2v1j
I'll start with a disclaimer: I'm not exactly sure I know what you're trying to do, but I took a stab at this anyway.
I figured you were trying to filter the months in the T_Catégories query before your grouping, so I added a manual filter step there. When I did that and deselected months, your T_Final query broke. This is because filtering out months also filtered out categories that T_Final relied upon for column names, so the calculations that referenced those column names failed. I had to change your T_Final query so that it determines the column names dynamically.
Again, I'm not exactly sure about what you're trying to do, so I may have gotten it wrong with respect to the calculations, but this might help get you closer at least.
Like I said, in T_Catégories, I added the filter:
That's when things broke for T_Final. So in T_Final, I needed to:
Change the step Valeur remplacée1 to = Table.ReplaceValue(#"Colonne dynamique",null,0,Replacer.ReplaceValue,Table.ColumnNames(#"Colonne dynamique"))
(I was pretty sure you were using the columns resulting from the previous step Colonne dynamique.)
Change the step Personnalisée ajoutée3 to = Table.AddColumn(#"Valeur remplacée1", "Total général", each List.Sum(List.RemoveFirstN(Record.ToList(_),1)))
(This is making a list from the record, then removing the first entry of the list and summing what remains in the list.)
Change the step Colonnes permutées to = Table.ReorderColumns(#"Personnalisée ajoutée3",Table.ColumnNames(#"Personnalisée ajoutée3"))
(I was pretty sure you were using the column resulting from the previous step Personnalisée ajoutée3.)
Change the step Personnalisée ajoutée to = Table.AddColumn(#"Colonnes permutées", "Indisponibilté", each List.Sum(List.RemoveLastN(List.RemoveFirstN(Record.ToList(_),1),2)))
(This is making a list from the record, then removing the first entry of the list, then removing the last two entries of the list, and summing what remains in the list. This is especially where I'm not sure I added the items you intended. At least you can see what I did to be able to add the columns without using static column names.)
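The dynamic steps above pick columns by position, which depends on the order Table.Pivot happens to produce. A sketch of a variant that excludes the label column by name instead, shown on a made-up two-category table (the table and its category column names are hypothetical):

```powerquery
let
    // Hypothetical pivoted table: "Métier" is the label column, the rest are counts
    Pivoted = #table({"Métier", "Cat A", "Cat B"}, {{"X", 2, 3}, {"Y", 0, 5}}),
    // Sum every field except "Métier", whatever the other columns are called
    // and in whatever order they appear
    WithTotal = Table.AddColumn(Pivoted, "Total général",
        each List.Sum(Record.ToList(Record.RemoveFields(_, {"Métier"}))), type number)
in
    WithTotal
```

Here "Total général" comes out as 5 for both rows; the same Record.RemoveFields idea could exclude "Nombre employés" or any other column by name rather than by counting positions.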
Here's the m code for the three queries:
T_Catégories:
let
Source = Excel.CurrentWorkbook(){[Name="T_Catégories"]}[Content],
#"Type modifié" = Table.TransformColumnTypes(Source,{{"Métier", type text}, {"Code absence", Int64.Type}, {"Date", type date}, {"Catégorie", type text}}),
#"Colonnes supprimées" = Table.RemoveColumns(#"Type modifié",{"Code absence", "Date"}),
#"Filtered Rows" = Table.SelectRows(#"Colonnes supprimées", each true),
#"Lignes groupées" = Table.Group(#"Filtered Rows", {"Métier", "Catégorie"}, {{"Nombre", each Table.RowCount(_), type number}})
in
#"Lignes groupées"
T_métiers:
let
Source = Excel.CurrentWorkbook(){[Name="T_métiers"]}[Content],
#"Type modifié" = Table.TransformColumnTypes(Source,{{"Métier", type text}, {"Nombre", Int64.Type}})
in
#"Type modifié"
T_Final:
let
Source = Table.Combine({T_Catégories, T_métiers}),
#"Valeur remplacée" = Table.ReplaceValue(Source,null,"Nombre employés",Replacer.ReplaceValue,{"Catégorie"}),
#"Colonne dynamique" = Table.Pivot(#"Valeur remplacée", List.Distinct(#"Valeur remplacée"[Catégorie]), "Catégorie", "Nombre"),
#"Valeur remplacée1" = Table.ReplaceValue(#"Colonne dynamique",null,0,Replacer.ReplaceValue,Table.ColumnNames(#"Colonne dynamique")),
#"Personnalisée ajoutée3" = Table.AddColumn(#"Valeur remplacée1", "Total général", each List.Sum(List.RemoveFirstN(Record.ToList(_),1))),
#"Colonnes permutées" = Table.ReorderColumns(#"Personnalisée ajoutée3",Table.ColumnNames(#"Personnalisée ajoutée3")),
#"Personnalisée ajoutée" = Table.AddColumn(#"Colonnes permutées", "Indisponibilté", each List.Sum(List.RemoveLastN(List.RemoveFirstN(Record.ToList(_),1),2))),
#"Personnalisée ajoutée1" = Table.AddColumn(#"Personnalisée ajoutée", "Disponibilté", each [Nombre employés]*7.5),
#"Personnalisée ajoutée2" = Table.AddColumn(#"Personnalisée ajoutée1", "Taux disponibilté (%)", each (1-[Indisponibilté]/[Disponibilté])*100),
#"Type modifié" = Table.TransformColumnTypes(#"Personnalisée ajoutée2",{{"Indisponibilté", Int64.Type}, {"Disponibilté", type number}, {"Taux disponibilté (%)", type number}})
in
#"Type modifié"
I would think you can progress from here fairly well.