I'm attempting to get the href information from the following site using Power Query:
https://hpvchemicals.oecd.org/ui/SIDS_Details.aspx?id=fc1ced8a-ce14-45fa-b003-dfeda5e38075
On that page, I want to obtain the href for the 50000.pdf link.
Inspecting the page, this should be: handler.axd?id=fae8d1b1-406b-4287-8a05-f81aa1b16d3f
However, when I attempt this in Power Query, it appears to be omitted from the text:
M Code:
let
Source = Table.FromColumns({Lines.FromBinary(Web.Contents("https://hpvchemicals.oecd.org/ui/SIDS_Details.aspx?id=fc1ced8a-ce14-45fa-b003-dfeda5e38075"))})
in
Source
My question is: why does this happen? I don't think it can be solved (if it can, great), but I'm still interested to understand what's going on here.
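It is using an iframe: the file list is loaded from a separate page, so its links aren't part of the HTML that Web.Contents returns for SIDS_Details.aspx. Query that page directly instead. Try this: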
let
Source = Table.FromColumns({Lines.FromBinary(Web.Contents("https://hpvchemicals.oecd.org/ui/SidsOrganigrame.aspx?SIDSNo=fc1ced8a-ce14-45fa-b003-dfeda5e38075&id=000c31fa-483a-4e5b-a8bb-c26c3148e464&Key=1c143ab1-b132-4b57-b34d-559b07c845f2&Idx=0"))}),
#"Filtered Rows" = Table.SelectRows(Source, each [Column1] = " <img src=""images/FiletypeIcone/htm.png"" height=""16"" width=""16"" border=""0"" /> SIAR published by UNEP<br /><img src=""images/FiletypeIcone/pdf.ico"" height=""16"" width=""16"" border=""0"" /> FORMALDEHYDE_50000.pdf<br />")
in
#"Filtered Rows"
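To go from the iframe's page to the href itself, one option is to search its raw HTML for the href that comes just before the link text. A sketch, assuming the link is written as <a href="...">FORMALDEHYDE_50000.pdf</a> with double-quoted attributes (not verified against the live page); fHrefBefore is a hypothetical query name.

```powerquery
// fHrefBefore (hypothetical name): returns the last href="..." value that appears
// before linkText in the HTML. Assumes double-quoted href attributes.
(html as text, linkText as text) as text =>
let
    // Keep only the HTML that precedes the link text
    Before = Text.BeforeDelimiter(html, linkText),
    // Take the value of the last href=" ... " in that prefix
    Href = Text.BetweenDelimiters(
        Before, "href=""", """",
        {0, RelativePosition.FromEnd}, {0, RelativePosition.FromStart})
in
    Href
```

For example, fHrefBefore(Text.FromBinary(Web.Contents(<the SidsOrganigrame.aspx URL above>)), "50000.pdf") should return the handler.axd?id=... value if that assumption holds; Uri.Combine("https://hpvchemicals.oecd.org/ui/", result) would turn it into an absolute URL.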
Has anyone had experience importing the global address book from Outlook into Excel/Power BI using Power Query?
I managed to retrieve some of the users from my company's email using the Exchange Online data source:
let
Source = Exchange.Contents("XYZ.com"),
People = Source{[Name="People"]}[Data]
in
People
But it doesn't give me the whole list of Outlook users in the company...
I found different code, but it doesn't work for me: it says my domain doesn't exist. I'm not very familiar with Active Directory and so on. Any inputs or tricks?
let
Source = ActiveDirectory.Domains("CompanyDomain.com"),
CompanyDomain.com = Source{[Domain = "CompanyDomain.com"]}[Object Categories],
user = CompanyDomain.com{[Category = "user"]}[Objects],
#"Removed Other Columns" = Table.SelectColumns(user, {"displayName", "user",
"organizationalPerson"}),
#"Expanded organizationalPerson" = Table.ExpandRecordColumn(#"Removed Other Columns",
"organizationalPerson", {"department"}, {"department"}),
#"Expanded user 1" = Table.ExpandRecordColumn(#"Expanded organizationalPerson", "user",
{"mail"}, {"mail"})
in
#"Expanded user 1"
I am using Office365 version.
Create a query that is just the source line above. For example:
let
Source = ActiveDirectory.Domains("yourdomain.com")
in
Source
In Power Query, when you click on the Source line on the right, what does it show? In my case, I get a table of domains. It would be one of those values you would use in the "CompanyDomain.com =" line above.
If Source is giving you the error about not finding the domain, then you'll have to have someone tell you what that value is for your company. (Mine was automatically filled in when I chose the Active Directory connector, and it's not just CompanyName.com.)
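To see which domain values the connector can actually find, a sketch that lists them. Calling ActiveDirectory.Domains() with no argument uses the current machine's domain, so this assumes the PC is joined to the company domain; the "Domain" column name is the one the question's code already relies on, and DomainList is a hypothetical query name.

```powerquery
// DomainList (hypothetical name): list the domain names the Active Directory connector sees.
// With no argument, ActiveDirectory.Domains() looks at the current machine's domain forest,
// so this assumes the PC is joined to the company domain.
let
    Source = ActiveDirectory.Domains(),
    DomainNames = Table.Column(Source, "Domain")
in
    DomainNames
```

One of the returned names is what would go in place of "CompanyDomain.com" in the longer query.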
Hope it helps,
Alan
So Power Query in Excel doesn't have the Html.Table function that can be found in Power BI.
So can we use regex to convert HTML into text and make an equivalent function?
Previous posts state that this shouldn't be done, since HTML doesn't follow the same rules as text; however, needs must. It's also just interesting as a question and, if achievable, would prove very useful for scraping difficult pages directly in Excel.
I came across this regex, from an answer on another post: https://regex101.com/r/AtElMH/2
It seems to work reasonably well.
So I'm wondering if I can use this to tidy up any HTML that I pull into Excel with the web connector. Each line in the table in blue comes from submitting the HTML to https://www.textfixer.com/html/html-to-text.php, just to give an idea of what each row should look like. However, as the Regex101 link shows, it doesn't have to be perfect: if the occasional tag slips through, that's okay; it's more of a tidy-up. I would rather have that than a pattern that loses data.
Currently, passing this regex to the FnRegexReplace function results in an error. I don't know whether Excel can read the regex correctly and, if not, whether there are any workarounds.
FnRegexReplace (note: it already runs y = Text.Replace(y,"\","\\"), so there's no need to double backslashes in the pattern):
(x,y,z)=>
let
y = Text.Replace(y,"\","\\"),
Source = Web.Page(
"<script>var x="&"'"&x&"'"&";var z="&"'"&z&
"'"&";var y=new RegExp('"&y&"','gmi');
var b=x.replace(y,z);document.write(b);</script>")
[Data]{0}[Children]{0}[Children]{1}[Text]{0}
in
Source
M Code:
let
Source = Excel.CurrentWorkbook(){[Name="Table2"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Column1", type text}}),
#"Invoked Custom Function" = Table.AddColumn(#"Changed Type", "FnRegexReplace", each FnRegexReplace([Column1], "<([\w\-\/]+)( +[\w\-]+(=(('[^']*')|(""[^""]*"")))?)* *>", " "))
in
#"Invoked Custom Function"
HTML DATA:
</div><!-- SectionHeaderWrapper --><div id="SectionContent"><h3 id="sAdministrativeDataSummary" class="mDisabled">Administrative data</h3><h3 id="sWorkersHazardViaInhalationRoute">Workers - Hazard via inhalation route</h3><h4>Systemic effects</h4><h5>Long term exposure</h5><dl class="HorDL"><dt>Hazard assessment conclusion:</dt><dd>no hazard identified</dd></dl></dl></dl></dl></dl></dl><h5>Acute/short term exposure</h5><dl class="HorDL"><dt>Hazard assessment conclusion:</dt><dd>no hazard identified</dd></dl><h6>DNEL related information</h6></dl></dl></dl></dl></dl><h4>Local effects</h4><h5>Long term exposure</h5><dl class="HorDL"><dt>Hazard assessment conclusion:</dt><dd>DNEL (Derived No Effect Level)
Value:</dt><dd><span class="UserEntry">0.02</span> mg/m³
Most sensitive endpoint:</dt><dd>repeated dose toxicity</dd></dl><h6>DNEL related information</h6><dl class="HorDL"><dt>DNEL derivation method:</dt><dd>other: <span class="UserEntry">Biocidal Products Regulation guidance for Human Health Risk Assessment (Volume III, Part B, December 2013</span>
Overall assessment factor (AF):</dt><dd class="UserEntry">16
Dose descriptor:</dt><dd>NOAEC
Value:</dt><dd><span class="UserEntry">0.34</span> mg/m³
AF for dose response relationship:</dt><dd class="UserEntry">1
Justification:</dt><dd class="UserEntry">NOAEC defined based on local effects of irritation/corrosion which are considered concentration dependent
AF for differences in duration of exposure:</dt><dd class="UserEntry">2
Justification:</dt><dd class="UserEntry">NOAEC derived from subchronic study therefore extrapolating to chronic duration
AF for interspecies differences (allometric scaling):</dt><dd class="UserEntry">2.5
Justification:</dt><dd class="UserEntry">Local effects observed only therefore toxicokinetics do not contribute to interspecies differences
AF for other interspecies differences:</dt><dd class="UserEntry">1
Justification:</dt><dd class="UserEntry">Local effects observed only therefore toxicokinetics do not contribute to interspecies differences
AF for intraspecies differences:</dt><dd class="UserEntry">3.2
Justification:</dt><dd class="UserEntry">Local effects observed only therefore toxicokinetics do not contribute to intraspecies differences
AF for the quality of the whole database:</dt><dd class="UserEntry">1
Justification:</dt><dd class="UserEntry">Hazards well characterised in multiple studies of good reliability
AF for remaining uncertainties:</dt><dd class="UserEntry">1
Justification:</dt><dd class="UserEntry">No remaining uncertainties</dd></dl></dl></dl></dl></dl><h5>Acute/short term exposure</h5><dl class="HorDL"><dt>Hazard assessment conclusion:</dt><dd>DNEL (Derived No Effect Level)
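A likely reason for the error: FnRegexReplace pastes x and y into JavaScript strings delimited by single quotes, and the pattern contains '[^']*'. The first ' in the pattern closes the JavaScript string early, the script fails, and the navigation to [Text]{0} then has nothing to return. A sketch of an escaping helper, JsEscape (a hypothetical name, not part of the original function), that makes text safe inside a single-quoted JavaScript string; it has not been tested against Web.Page's rendering.

```powerquery
// JsEscape (hypothetical helper): escape text for use inside a JavaScript
// single-quoted string literal.
(t as text) as text =>
let
    // Backslashes first, so the escapes added below are not doubled again
    Backslashes = Text.Replace(t, "\", "\\"),
    // A bare ' would end the JavaScript string early
    Quotes = Text.Replace(Backslashes, "'", "\'"),
    // Raw line breaks are not allowed inside a JavaScript string literal
    CarriageReturns = Text.Replace(Quotes, "#(cr)", "\r"),
    LineFeeds = Text.Replace(CarriageReturns, "#(lf)", "\n")
in
    LineFeeds
```

In FnRegexReplace, JsEscape(y) would take the place of the existing Text.Replace(y,"\","\\") line (not both, or backslashes get doubled twice), and x would be wrapped the same way before it is concatenated into the script.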
For fun, without using recursion or regex:
//single column of HTML text as input into [Column1]
// removes all text between each pair of < and >
let Source = Csv.Document(File.Contents("C:\Temp\a.txt")),
#"Added Custom" = Table.AddColumn(Source, "Custom", each Text.ToList([Column1])),
#"Added Index" = Table.AddIndexColumn(#"Added Custom", "Index", 0, 1, Int64.Type),
#"Expanded Custom" = Table.ExpandListColumn(#"Added Index", "Custom"),
#"Added Custom1" = Table.AddColumn(#"Expanded Custom", "Custom.1", each if [Custom]="<" or [Custom]=">" then [Custom] else null),
#"Duplicated Column" = Table.DuplicateColumn(#"Added Custom1", "Custom.1", "Custom.1 - Copy"),
#"Filled Down" = Table.FillDown(#"Duplicated Column",{"Custom.1 - Copy"}),
#"Filtered Rows" = Table.SelectRows(#"Filled Down", each ([#"Custom.1 - Copy"] = ">") and ([Custom.1] = null)),
#"Removed Columns" = Table.RemoveColumns(#"Filtered Rows",{"Column1", "Custom.1", "Custom.1 - Copy"}),
#"Grouped Rows" = Table.Group(#"Removed Columns", {"Index"}, {{"data", each Text.Combine(_[Custom]), type text}})
in #"Grouped Rows"
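A shorter variant of the same idea (drop everything between < and >, no regex), sketched with Text.Split on a single text value. It assumes every < in the input opens a tag, so a literal < in the content would cut text; tags become spaces and repeated spaces are collapsed. fStripTags is a hypothetical name.

```powerquery
// fStripTags (hypothetical name): remove tags from one HTML text value.
// Assumes every "<" starts a tag.
(html as text) as text =>
let
    // Text before the first "<" is content; every later piece starts with tag text up to ">"
    Pieces = Text.Split(html, "<"),
    Content = {Pieces{0}} & List.Transform(List.Skip(Pieces, 1), each Text.AfterDelimiter(_, ">")),
    // Join with spaces so adjacent cells don't run together, then collapse repeated spaces
    Joined = Text.Combine(Content, " "),
    Words = List.Select(Text.Split(Joined, " "), each _ <> "")
in
    Text.Combine(Words, " ")
```

Applied to a column it would be Table.AddColumn(Source, "Text", each fStripTags([Column1])).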
Then you'd probably go back and replace all the HTML entities, such as the ones for & and ³.
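A minimal sketch of that entity step, covering only a handful of named entities (extend the list as needed). &amp; is replaced last so that text such as &amp;lt; decodes to &lt; rather than <. fDecodeEntities is a hypothetical name.

```powerquery
// fDecodeEntities (hypothetical name): replace a few common HTML entities.
// &amp; goes last so "&amp;lt;" becomes "&lt;" rather than "<".
(t as text) as text =>
let
    Pairs = {
        {"&lt;", "<"}, {"&gt;", ">"}, {"&quot;", """"}, {"&#39;", "'"},
        {"&nbsp;", "#(00A0)"}, {"&sup3;", "³"}, {"&amp;", "&"}
    },
    // Apply each replacement in turn to the running result
    Decoded = List.Accumulate(Pairs, t, (state, pair) => Text.Replace(state, pair{0}, pair{1}))
in
    Decoded
```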
One answer so far: (<[^<>]*>)+, using "___" as the substitution. However, I'm not sure how well this works for other HTML text.
I am attempting to collect href data in power query for excel for any results found on https://echa.europa.eu/ when searching for 'Acetone'.
Current M Code:
let
Source = Web.Page(Web.Contents(
"https://echa.europa.eu/search-for-chemicals?" &
//Parameters
"p_auth=69hDou3E&p_p_id=disssimplesearch_WAR_disssearchportlet&p_p_lifecycle=1&p_p_state=normal&p_p_col_id=" &
"_118_INSTANCE_UFgbrDo05Elj__column-1&p_p_col_count=1&_disssimplesearch_WAR_disssearchportlet_javax.portlet.action=" &
"doSearchAction&_disssimplesearch_WAR_disssearchportlet_backURL=https%3A%2F%2Fecha.europa.eu%2Finformation-on-chemicals" &
"%3Fp_p_id%3Ddisssimplesearchhomepage_WAR_disssearchportlet%26p_p_lifecycle%3D0%26p_p_state%3Dnormal%26p_p_mode%3Dview" &
"%26p_p_col_id%3D_118_INSTANCE_UFgbrDo05Elj__column-1%26p_p_col_count%3D1%26_disssimplesearchhomepage_WAR_disssearchportlet_sessionCriteriaId%3D" &
"_disssimplesearchhomepage_WAR_disssearchportlet_formDate=1621042609544&_disssimplesearch_WAR_disssearchportlet_searchOccurred=" &
"true&_disssimplesearch_WAR_disssearchportlet_sskeywordKey=Acetone&_disssimplesearchhomepage_WAR_disssearchportlet_disclaimer" &
"=true&_disssimplesearchhomepage_WAR_disssearchportlet_disclaimerCheckbox=on")),
Data = Source{0}[Data],
#"Changed Type" = Table.TransformColumnTypes(Data,{{"Name", type text}, {"EC / List no.", type text}, {"CAS no.", type text}, {"BP", type text}, {"OBL", type text}})
in
#"Changed Type"
The parameters are from a previous VBA post.
This returns:
As you can see the BP is returned just saying Open Brief Profile instead of the Href for each chemical.
Desired result for acetone in BP column:
I know this can be done using Table From Examples in Power BI, but since I manipulate the data in Excel, it's more useful to pull it straight from there.
I have explored this previously without success; however, https://community.powerbi.com/t5/Desktop/web-connector-and-getting-HREF-value/m-p/422068 gives me hope that it can be done. I have tried it, though, and ran into issues.
If anyone could advise whether this can be done, it would be appreciated. The goal is for column BP (I'm not bothered about OBL) to contain the href for each result in the table.
Try this:
let
Source = Excel.Workbook(Web.Contents("https://echa.europa.eu/search-for-chemicals?p_p_id=disssimplesearch_WAR_disssearchportlet&p_p_lifecycle=2&p_p_state=normal&p_p_mode=view&p_p_resource_id=exportResults&p_p_cacheability=cacheLevelPage&_disssimplesearch_WAR_disssearchportlet_sessionCriteriaId=dissSimpleSearchSessionParam101401654440118533&_disssimplesearch_WAR_disssearchportlet_formDate=1654440118558&_disssimplesearch_WAR_disssearchportlet_sskeywordKey=Acetone&_disssimplesearch_WAR_disssearchportlet_orderByCol=relevance&_disssimplesearch_WAR_disssearchportlet_orderByType=asc&_disssimplesearch_WAR_disssearchportlet_exportType=xls"))[Data]{0}
in
Source
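If the export doesn't carry the profile links, another route is to read the raw HTML with Text.FromBinary(Web.Contents(...)) instead of Web.Page (which, as the BP column shows, returns the link text rather than the href) and pull out the href attributes directly. A sketch of that extraction step, assuming double-quoted href attributes; fListHrefs is a hypothetical name, and the result would still need filtering down to the brief-profile links.

```powerquery
// fListHrefs (hypothetical name): every distinct href="..." value in an HTML text.
// Assumes attributes are written with double quotes.
(html as text) as list =>
let
    // Each piece after an href=" starts with the attribute value
    AfterHref = List.Skip(Text.Split(html, "href=""")),
    // The value runs up to the next double quote
    Hrefs = List.Transform(AfterHref, each Text.BeforeDelimiter(_, """"))
in
    List.Distinct(Hrefs)
```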
I'm trying to make a dynamic OData URL. This works, but it applies the filter after downloading the data set. I need query folding, so that the filter is applied in the OData query itself.
Can someone point me in the right direction?
let
// Get Parameters
Params = Excel.CurrentWorkbook(){[Name="tParams"]}[Content],
ItemNoValue = Params{0}[Value],
// Get D365 Data
Source = OData.Feed("https://xxxxxxxx&$filter=ItemNumber eq '11111111' &$select=dataAreaId,ItemNumber &$top=100", null, [Implementation="2.0"]),
#"Filter by Item Number" = Table.SelectRows(Source, each ([ItemNumber] = ItemNoValue))
in
#"Filter by Item Number"
What's also very strange is that if I construct the URL in an Excel cell, it still returns 30MB of data before displaying one row:
let
// Get Parameters
UrlParam = Excel.CurrentWorkbook(){[Name="URL"]}[Content],
UrlValue = UrlParam[Column1]{0},
Source = OData.Feed(UrlValue, null, [Implementation="2.0"])
in
Source
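One way to put the workbook value into the request itself, rather than filtering afterward with Table.SelectRows, is to build the query-options text from it. A sketch under assumptions: fItemQuery is a hypothetical name, the field names and $top value are copied from the question, and single quotes in the value are doubled because that is how OData escapes quotes inside string literals. It does not by itself account for the 30MB seen with the cell-built URL.

```powerquery
// fItemQuery (hypothetical name): OData query options for one item number.
// Single quotes in the value are doubled, OData's escape for string literals.
(itemNo as text) as text =>
    "$filter=ItemNumber eq '" & Text.Replace(itemNo, "'", "''") & "'"
        & "&$select=dataAreaId,ItemNumber&$top=100"
```

It would be used as OData.Feed("https://xxxxxxxx&" & fItemQuery(Text.From(ItemNoValue)), null, [Implementation="2.0"]), keeping the question's elided base URL as-is.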
I'm trying to achieve something that seems like it should be fairly simple, but I can't find an answer: replacing the name of a table or Power Query query with a variable.
I'm currently trying to do this with a merge query, so it would look something like this:
Table.NestedJoin(VARIABLE1,key1,VARIABLE2,key2,"Append",JoinKind.Inner)
Currently I'm getting all sorts of errors, no matter what I try...
Thank you!
// Edit:
I'm not really looking to write a function; I'm hoping to make this as easy as possible for users, so they can update a named table in the workbook, refresh, and get a table as the output. Here is my current code, which will hopefully help. My Region code replacements work fine, but the Days replacements don't: I need each day (Monday to Thursday) to be replaced with my day variables (StartDay, Day2, etc.). Each of those is a separate text query referring back to the Excel workbook inputs, and each should pull up a query based on its text (e.g. StartDay = Monday should pull the Monday query). This is the error I get, so I assume it is reading the value as the text "Monday" rather than the query Monday.
Expression.Error: We cannot convert the value "Monday" to type Table.
Details:
Value=Monday
Type=Type
let
ANDOriginCode = OriginRegion,
ANDDestinationCode = DestinationRegion,
ANDStartDay = StartDay,
ANDDay2 = Day2,
ANDDay3 = Day3,
ANDDay4 = Day4,
ANDDay5 = Day5,
Source = Table.NestedJoin(Monday,{"Tuesday Destination Region Code"},Tuesday,{"Tuesday Origin Region Code"},"Append1 (3)",JoinKind.Inner),
#"Filtered Rows1" = Table.SelectRows(Source, each [Monday Origin Region Code] = OriginRegion),
#"Removed Columns" = Table.RemoveColumns(#"Filtered Rows1",{"ID", "Pickup Day of Week", "Delivery Day of Week"}),
#"Expanded Append1 (3)" = Table.ExpandTableColumn(#"Removed Columns", "Append1 (3)", {"Tuesday Origin Region Code", "Wednesday Destination Region Code", "Tuesday Projected Number of Loads"}, {"Tuesday Origin Region Code", "Wednesday Destination Region Code", "Tuesday Projected Number of Loads"}),
#"Merged Queries" = Table.NestedJoin(#"Expanded Append1 (3)",{"Wednesday Destination Region Code"},Wednesday,{"Wednesday Origin Region Code"},"Append1 (4)",JoinKind.Inner),
#"Expanded Append1 (4)" = Table.ExpandTableColumn(#"Merged Queries", "Append1 (4)", {"Wednesday Origin Region Code", "Thursday Destination Region Code", "Wednesday Projected Number of Loads"}, {"Wednesday Origin Region Code", "Thursday Destination Region Code", "Wednesday Projected Number of Loads"}),
#"Merged Queries1" = Table.NestedJoin(#"Expanded Append1 (4)",{"Thursday Destination Region Code"},Thursday,{"Thursday Origin Region Code"},"Append1 (5)",JoinKind.Inner)
in
#"Merged Queries1"
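The error suggests StartDay holds the text "Monday" rather than the Monday query. One way to turn a query name into the query's value is Record.Field over a record of names and values; inside Power Query, #shared is such a record and includes the workbook's queries. This self-contained sketch uses a hand-built record and made-up stand-in tables (MondayTable, TuesdayTable) in place of #shared and the real queries; StartTableDemo is a hypothetical query name.

```powerquery
// StartTableDemo (hypothetical name): pick a table by its name held in a text value.
let
    // Stand-ins for the real Monday and Tuesday queries
    MondayTable = #table({"Day", "Loads"}, {{"Mon", 3}}),
    TuesdayTable = #table({"Day", "Loads"}, {{"Tue", 5}}),
    // In the workbook, #shared would take the place of this hand-built record
    Env = [Monday = MondayTable, Tuesday = TuesdayTable],
    StartDay = "Monday",    // text value, as read from the workbook
    StartTable = Record.Field(Env, StartDay)
in
    StartTable
```

In the question's code, something like Record.Field(#shared, StartDay) would then take the place of Monday in the first Table.NestedJoin. The Expression.Evaluate answer further down hits a Formula.Firewall error with a similar approach, so the same may apply here.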
This might help:
let
Source = (VARIABLE1 as table, VARIABLE2 as table) => Table.NestedJoin(VARIABLE1, Key1, VARIABLE2, Key2, "Append", JoinKind.Inner)
in
Source
You can use parameters for Key1 and Key2. The function will prompt you to select your tables.
You can invoke it from any other query with:
Function.Invoke(Merge,{Table1,Table2})
Replace Merge with whatever you named the first query above and replace Table1 and Table2 with your target tables.
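A variant of that function which also takes the key column names as arguments, so both keys are supplied at the call site instead of through separate parameters. Merge is a hypothetical query name, and the sketch has not been tried against the question's data.

```powerquery
// Merge (hypothetical name): inner-join two tables on one key column each.
// Matching right-hand rows land in a nested table column named "Append".
(LeftTable as table, LeftKey as text, RightTable as table, RightKey as text) as table =>
    Table.NestedJoin(LeftTable, {LeftKey}, RightTable, {RightKey}, "Append", JoinKind.Inner)
```

For example, Merge(Monday, "Tuesday Destination Region Code", Tuesday, "Tuesday Origin Region Code") mirrors the first join in the question; calling Merge(...) directly is equivalent to Function.Invoke(Merge, {...}).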
In case you're thinking of it, I have not been able to figure out how to pass tables from parameters. When you do that, the value you enter is recognized as text--for instance, "Table" versus Table--so it won't work. I could not find any information on how to pass a table value, like Table, in a variable. Anyhow, I hope this helps at least a little.
I was searching for this, too!
I finally found it, thanks to Chris Webb at https://blog.crossjoin.co.uk/2015/02/06/expression-evaluate-in-power-querym/
The key is using Expression.Evaluate with #shared as the second argument.
If you define Query1 as
let
Source = 1 + 1
in
Source
Query2 as
let
Source = 15 * 10
in
Source
define pIndex as a parameter that is "1" or "2", and
define QuerySwitch as
Expression.Evaluate("Query" & pIndex, #shared)
then QuerySwitch will return
2 when pIndex is "1"
150 when pIndex is "2"
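The second argument to Expression.Evaluate is the environment: a record of the names the expression text may refer to. #shared supplies every query, but any record works, which makes the behaviour easy to check in isolation. A self-contained sketch with a hand-built record standing in for Query1 and Query2; EvaluateDemo is a hypothetical query name.

```powerquery
// EvaluateDemo (hypothetical name): evaluate a query name built from text
// against an explicit environment record instead of #shared.
let
    pIndex = "2",
    Env = [Query1 = 1 + 1, Query2 = 15 * 10],
    Result = Expression.Evaluate("Query" & pIndex, Env)
in
    Result
```

With pIndex = "1" it returns 2. Only names in the record are visible to the evaluated text, so text that calls library functions such as Table.SelectRows needs #shared (or a record that includes them).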
My example:
I have a query, QueryThatTakesFiveMinutes, that other queries use and that also writes to an Excel table (also named "QueryThatTakesFiveMinutes").
If I define a query "QueryThatTakesFiveMinutes Cached" by moving my cursor to the output QueryThatTakesFiveMinutes table in Excel and creating a new query from that table, then, when I'm testing, I can change all the queries that use QueryThatTakesFiveMinutes to use #"QueryThatTakesFiveMinutes Cached" instead and test downstream computation without waiting five minutes every time. Then I just need to remember to change it back when I'm ready.
But that was annoying.
I created a named range in Excel called "ProductionMode" that pointed to a specific cell that holds a value of either TRUE or FALSE
In Power Query, I defined a very handy function called fGetNamedCellValue as
(rangeName as text) => Excel.CurrentWorkbook(){[Name=rangeName ]}[Content]{0}[Column1]
so that I can define a "ProductionMode" query as
fGetNamedCellValue("ProductionMode")
I use this in a way that's similar to the Index parameter above, but this way I can edit it via Excel.
When I defined "modeQueryThatTakesFiveMinutes" as
if ProductionMode then QueryThatTakesFiveMinutes else #"QueryThatTakesFiveMinutes Cached"
and changed all queries that use QueryThatTakesFiveMinutes to use modeQueryThatTakesFiveMinutes instead, I was very surprised to find that both QueryThatTakesFiveMinutes and #"QueryThatTakesFiveMinutes Cached" were evaluated and it didn't save any time at all!
So then after searching, being overjoyed to find your question only to realize it wasn't answered, then finding Chris Webb's article, I tried redefining modeQueryThatTakesFiveMinutes as
Expression.Evaluate(
if ProductionMode then
"QueryThatTakesFiveMinutes"
else
"#""QueryThatTakesFiveMinutes Cached""",
#shared
)
Unfortunately, instead of working, I got an error of
Formula.Firewall: Query 'modeQueryThatTakesFiveMinutes' references other queries or steps, so it may not directly access a data source. Please rebuild this data combination.
However, I found a way around this, too, by putting the offending code within a function that the consuming query executes.
Deleting ProductionMode and defining a new query fProductionMode of
() => fGetNamedCellValue("ProductionMode") as logical
now doesn't return true or false, it returns a function that will return true or false when evaluated. Why is one legal and the other isn't? I don't know, but it is! Change the definition of modeQueryThatTakesFiveMinutes to
Expression.Evaluate(
if fProductionMode() then
"QueryThatTakesFiveMinutes"
else
"#""QueryThatTakesFiveMinutes Cached""",
#shared
)
and it works!