A better way to extract Subheading numbers using power query - excel

I am attempting to extract section heading numbers from a column in excel using power query.
I have already achieved this by matching with an existing list. However, I wonder if there is a better way to achieve this in fewer steps.
M Code:
let
Source = Excel.CurrentWorkbook(){[Name="Table7"]}[Content],
#"Trimmed Text1" = Table.TransformColumns(Source,{{"Column1", PowerTrim, type text}}),
SectionNumbers = Table.AddColumn(#"Trimmed Text1", "SectionNumber", (x) => Text.Combine(Table.SelectRows(SectionNumbers, each Text.Contains(x[Column1],[SectionNumbers], Comparer.OrdinalIgnoreCase))[SectionNumbers],", ")),
#"Split Column by Delimiter2" = Table.SplitColumn(SectionNumbers, "SectionNumber", Splitter.SplitTextByEachDelimiter({","}, QuoteStyle.None, true), {"SectionNumber.1", "SectionNumber.2"}),
#"Added Custom1" = Table.AddColumn(#"Split Column by Delimiter2", "Custom", each if [SectionNumber.2] = null then [SectionNumber.1] else [SectionNumber.2]),
#"Removed Other Columns" = Table.SelectColumns(#"Added Custom1",{"Column1", "Custom"})
in
#"Removed Other Columns"
The Section numbers being matched to can be generated using:
SectionNumbers
let
Source = {1..16},
#"Converted to Table" = Table.FromList(Source, Splitter.SplitByNothing(), null, null, ExtraValues.Error),
#"Added Custom" = Table.AddColumn(#"Converted to Table", "Custom", each {1..9}),
#"Expanded Custom" = Table.ExpandListColumn(#"Added Custom", "Custom"),
#"Added Custom1" = Table.AddColumn(#"Expanded Custom", "Custom.1", each "."),
#"Merged Columns" = Table.CombineColumns(Table.TransformColumnTypes(#"Added Custom1", {{"Custom", type text}}, "en-GB"),{"Custom", "Custom.1"},Combiner.CombineTextByDelimiter("", QuoteStyle.None),"Merged"),
#"Merged Columns1" = Table.CombineColumns(Table.TransformColumnTypes(#"Merged Columns", {{"Column1", type text}}, "en-GB"),{"Column1", "Merged"},Combiner.CombineTextByDelimiter(".", QuoteStyle.None),"SectionNumbers")
in
#"Merged Columns1"
Essentially I would like a way of extracting any decimal at the start of a cell, either 15.0. or 15.0, or even 15.0.1 etc.
I have considered using regex i.e. \d+\.\d+[.] which should work however I have many rows and find that regex sometimes is computationally intensive in this case, so it takes much longer to load than the Above M Code.

Another power query method
Since you know your section numbers you could:
Generate a (buffered) list of all the section numbers
see if the first space-separated part of the string in column 1 exists in the Section Number list.
let
Source = Excel.CurrentWorkbook(){[Name="Table3"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Column1", type text}}),
//create list of all serial numbers
SerialNumbers = List.Buffer(
let
Part1 = List.Transform({1..16}, each Text.From(_) & "."),
Part2 =List.Transform({1..10}, each Text.From(_) & "."),
sn = List.Accumulate(Part1,{}, (state, current)=> state &
List.Generate(
()=>[s=current & Part2{0}, idx=0],
each [idx] < List.Count(Part2),
each [s=current & Part2{[idx]+1}, idx=[idx]+1],
each [s]))
in
sn),
#"Added Custom" = Table.AddColumn(#"Changed Type", "Custom", each
let
x = Text.Split([Column1]," "){0}
in
if List.Contains(SerialNumbers,x) then x else null, type text)
in
#"Added Custom"

How about
let Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
#"Added Custom" = Table.AddColumn(Source, "Custom", each if Text.PositionOfAny([Column1], {"0".."9"})>=0 then Text.BeforeDelimiter(Text.From([Column1])," ") else null)
in #"Added Custom"

Related

Create Power Query Column Recursively

EMI Calculation:
Number.Round((rate/12*pv)/(1-Number.Power(1+rate/12,-nper)),0)
where PV is initial loan amount
Rate is the initial rate of interest at the beginning
and nper is 180 = Tenure in months
here the EMI = 107485
Keeping EMI Fixed, now i need to compute monthly interest and principal
The interest calculated formula will be
Previous balance * Rate of interest/12 * no. of days / no of days in the previous month
So if there is no rate change the interest will be prev bal * rate of interest /12 else it will be for the no of days new interest rate applicable.
Principal = EMI-Interest paid
New Balance = Previous Balance - Principal
Amount
What i am not able to achieve is :
Every time Interest is calculated, i want to refer to the previous balance which is not available.
I have tried creating Recursive functions to compute Interest, principal and balance columns:
Below function computes EMI value at the beginning
PMT
(pv,rate,nper)=>
let
payment = Number.Round((rate/12*pv)/(1-Number.Power(1+rate/12,-nper)),0)
in
payment
Below function computes Interest amount (Index Value is the row number / payment no)
Get_InterestValue
(indexValue,table,rate,no_of_days,no_of_days_in_Month)=>
let
prev_idx=indexValue-1,
previousbal = get_previous_balance(indexValue,table),
interest=previousbal*rate/12/no_of_days_in_Month*no_of_days
in
interest
Below function computes Principal amount paid
GetPrincipalPaid
(indexvalue,table,EMI_amount,Interest_amount)=>
let
principal_amount= EMI_amount-Interest_amount
in
principal_amount
Below function computes Previous balance
get_previous_balance
(index_value,table)=>
let
prev_idx=index_value-1,
prev_bal=if prev_idx=0 then fParameter("ParameterTable","Initial_Loan_Amount") else Table.SelectRows(table, each ([Index] = prev_idx)){0}[Bal]
in
prev_bal
fParameter function reads the intial values table
The columns gives errors
I have found a solution as in following code:
Now I am facing performance issue. Below solution works good with 10 Months or smaller. once u go for a higher Tenure, the calculations are very slow and doesnt generate any output for long time or probably crashes before calculations are completed,
is there any way i can speed up the query processing?
PMT Function:
(pv,rate,nper)=>
let
payment = Number.Round((rate/12*pv)/(1-Number.Power(1+rate/12,-nper)),0)
in
payment
fParameter Function:
let Parameter=(TableName,ParameterLabel) =>
let
Source=Excel.CurrentWorkbook(){[Name=TableName]}[Content],
value=Source{[Parameter=ParameterLabel]}[Value]
in
value
in Parameter
Payment Schedule Query:
let
Custom1 = List.Generate(()=>Date.Month(fParameter("ParameterTable","StartDate")),each _<=fParameter("ParameterTable","Tenure_in_Months")*1.5+Date.Month(fParameter("ParameterTable","StartDate"))-1,each _+1),
#"Converted to Table" = Table.FromList(Custom1, Splitter.SplitByNothing(), null, null, ExtraValues.Error),
#"Added Custom" = Table.AddColumn(#"Converted to Table", "Month", each if Number.Mod([Column1],12)=0 then 12 else Number.Mod([Column1],12)),
#"Added Custom1" = Table.AddColumn(#"Added Custom", "Year1", each Date.Year(fParameter("ParameterTable","StartDate"))+Number.RoundDown([Column1]/12)),
#"Added Custom2" = Table.AddColumn(#"Added Custom1", "Year", each if [Month]=12 then [Year1]-1 else [Year1]),
#"Renamed Columns1" = Table.RenameColumns(#"Added Custom2",{{"Column1", "PaymentMonth"}}),
#"Added Custom3" = Table.AddColumn(#"Renamed Columns1", "Payment Date", each #date([Year],[Month],Date.Day(fParameter("ParameterTable","StartDate")))),
#"Removed Other Columns" = Table.SelectColumns(#"Added Custom3",{"Payment Date"}),
#"Changed Type" = Table.TransformColumnTypes(#"Removed Other Columns",{{"Payment Date", type date}}),
#"Added Custom4" = Table.AddColumn(#"Changed Type", "EMI", each EMIValue),
#"Merged Queries" = Table.NestedJoin(#"Added Custom4",{"Payment Date"},Loan_Rates,{"Date"},"Loan_Rates",JoinKind.FullOuter),
#"Expanded Loan_Rates" = Table.ExpandTableColumn(#"Merged Queries", "Loan_Rates", {"Date", "Loan Rate"}, {"Date", "Loan Rate"}),
#"Added Custom5" = Table.AddColumn(#"Expanded Loan_Rates", "New_Date", each if [Date] = null then [Payment Date] else [Date]),
#"Sorted Rows" = Table.Sort(#"Added Custom5",{{"New_Date", Order.Ascending}}),
#"Filled Down" = Table.FillDown(#"Sorted Rows",{"Loan Rate"}),
#"Removed Columns" = Table.RemoveColumns(#"Filled Down",{"Payment Date", "Date"}),
#"Renamed Columns" = Table.RenameColumns(#"Removed Columns",{{"New_Date", "Payment Date"}}),
#"Reordered Columns" = Table.ReorderColumns(#"Renamed Columns",{"Payment Date", "EMI", "Loan Rate"}),
#"Added Index" = Table.AddIndexColumn(#"Reordered Columns", "Index", 0, 1),
#"Added Index1" = Table.AddIndexColumn(#"Added Index", "Index.1", 1, 1),
#"Merged Queries1" = Table.NestedJoin(#"Added Index1",{"Index"},#"Added Index1",{"Index.1"},"Added Index1",JoinKind.LeftOuter),
#"Expanded Added Index1" = Table.ExpandTableColumn(#"Merged Queries1", "Added Index1", {"Payment Date"}, {"Payment Date.1"}),
#"Added Custom7" = Table.AddColumn(#"Expanded Added Index1", "Payment.Date.2", each if [Index.1]=1 then fParameter("ParameterTable","Disbursement Date") else [Payment Date.1]),
#"Changed Type1" = Table.TransformColumnTypes(#"Added Custom7",{{"Payment.Date.2", type date}}),
#"Removed Columns2" = Table.RemoveColumns(#"Changed Type1",{"Payment Date.1"}),
#"Renamed Columns2" = Table.RenameColumns(#"Removed Columns2",{{"Payment.Date.2", "Payment Date.1"}}),
#"Added Custom6" = Table.AddColumn(#"Renamed Columns2", "Days", each [Payment Date]-[Payment Date.1]),
#"Extracted Days" = Table.TransformColumns(#"Added Custom6",{{"Days", Duration.Days, Int64.Type}}),
#"Added Custom8" = Table.AddColumn(#"Extracted Days", "No Of Days in Month", each Date.EndOfMonth([Payment Date.1])-Date.StartOfMonth([Payment Date.1])+#duration(1,0,0,0)),
#"Removed Columns1" = Table.RemoveColumns(#"Added Custom8",{"Index", "Payment Date.1"}),
#"Extracted Days1" = Table.TransformColumns(#"Removed Columns1",{{"No Of Days in Month", Duration.Days, Int64.Type}}),
#"Reordered Columns1" = Table.ReorderColumns(#"Extracted Days1",{"Payment Date", "Days", "EMI", "Loan Rate"}),
#"Renamed Columns3" = Table.RenameColumns(#"Reordered Columns1",{{"Index.1", "Index"}}),
#"Reordered Columns2" = Table.ReorderColumns(#"Renamed Columns3",{"Index", "Payment Date", "EMI", "No Of Days in Month", "Days", "Loan Rate"}),
#"Replaced Value" = Table.ReplaceValue(#"Reordered Columns2",null,0,Replacer.ReplaceValue,{"EMI"}),
#"Invoked Custom Function" = Table.AddColumn(#"Replaced Value", "Interest", each Get_InterestValue([Index], Table.Buffer(#"Replaced Value"))),
#"Added Custom9" = Table.AddColumn(#"Invoked Custom Function", "Principal", each [EMI]-[Interest]),
#"Added Custom11" = Table.AddColumn(#"Added Custom9", "Balance", each fParameter("ParameterTable","Initial_Loan_Amount")-List.Sum(List.FirstN(#"Added Custom9"[Principal],[Index]))),
#"Filtered Rows" = Table.SelectRows(#"Added Custom11", each [Interest] > 0)
in
#"Filtered Rows"
Loan_Rates table Query == This reads change of loan rates over a period:
let
Source = Excel.CurrentWorkbook(){[Name="Loan_Rates"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Date", type date}, {"Loan Rate", Percentage.Type}})
in
#"Changed Type"
Get_InterestValue Function
(indexValue,table)=>
let
prev_idx=indexValue-1,
rate=Table.SelectRows(table, each ([Index] = indexValue)){0}[Loan Rate],
no_of_days_in_Month=Table.SelectRows(table, each ([Index] = indexValue)){0}[No Of Days in Month],
no_of_days=Table.SelectRows(table, each ([Index] = indexValue)){0}[Days],
previousbal = get_previous_balance(indexValue,table),
interest=previousbal*rate/12/no_of_days_in_Month*no_of_days
in
interest
GetPrincipalPaid Function:
(indexValue,table)=>
let
EMI_Amount=Table.SelectRows(table, each ([Index] = indexValue)){0}[EMI],
principal_amount= EMI_Amount-Get_InterestValue(indexValue,table)
in
principal_amount
GetBalance Function:
(indexValue,table)=>
let
bal = get_previous_balance(indexValue,table)-GetPrincipalPaid(indexValue,table)
in
bal
get_previous_balance function:
(indexValue,table)=>
let
prev_idx=indexValue-1,
prev_bal=if prev_idx=0 then fParameter("ParameterTable","Initial_Loan_Amount") else GetBalance(prev_idx,table)
in
prev_bal
EMIValue Function:
let
Source = PMT(fParameter("ParameterTable","Initial_Loan_Amount"), fParameter("ParameterTable","Rate_int_PA"), fParameter("ParameterTable","Tenure_in_Months")),
#"Converted to Table" = #table(1, {{Source}}),
#"Renamed Columns" = Table.RenameColumns(#"Converted to Table",{{"Column1", "EMI"}}){0}[EMI]
in
#"Renamed Columns"

Appending incomplete sentences together using grammatical rules (Mainly solved But answer needs simplifying)

Pulling data into excel from a text file of PDF occasionally splits sentences.
Following a previous post, Using PQ I can group text together and then with regex identify . belonging to the end of a sentence and split using this. This works really well, except where a Subheading may exist without punctuation. This results in a minor error where sentences are occasionally joined together. for example
SUBHEADING 1
Non-sensical
sentence 1. Sentence 2.
Returns as:
SUBHEADING 1 Non-sensical sentence 1.
Sentence 2.
I initially thought this was the best that could be done, however, the focus so far has been on identifying the end of a sentence with . Additionally though, we could also use the fact that sentences tend to start with a capital letter to refine this further. The way excel interprets PDF pages means that prior transformations are required to group the text, which I am pretty certain rules out using regex.
Here is an example of what I am trying to do (orange) and where I currently am (green):
Notice how each sentence starts with a capital and ends with .. I think this will be a logical way to separate SUBHEADING/CHAPTERS, etc., from the rest of the text. From this, I can then apply the remainder of the transformations on the previous post and if im right, this should give better separation.
So far:
I have identified which sentences Start with a Captial letter/lowercase and which end in lowercase or . (where ending in a number is also considered lowercase). Using this, I think it's safe to assume that:
If a row ends lowercase AND the next row also contains lowercase,
this belongs to the same sentence and can be appended.
If the Sentence starts Uppercase and ends in a full stop, this is a
complete sentence and requires no changes.
If a Sentence starts uppercase without a .At the end and then Next
is uppercase; this is a subheading/Extra info, not a typical
sentence.
I think it is possible to generate the Desired table using these rules. I am having trouble determining the transformations required to append the sentences following the above rules. I need some sort of way to identify the Upper/lower/. of rows above and below and I just cant see how.
I will update as I progress; however, if anyone could comment on how to achieve this using these rules, that would be great.
Note there is an error with the current M code where 3d. Is recognised as uppercase.
M Code:
let
Source = Excel.CurrentWorkbook(){[Name="Table5"]}[Content],
#"Added Custom" = Table.AddColumn(Source, "Custom", each if Text.Start([Column1], 1) = Text.Lower(Text.Start([Column1], 1)) then "Lowercase" else "Uppercase"),
#"Added Custom1" = Table.AddColumn(#"Added Custom", "Custom.1", each if Text.End([Column1], 1) = "." then Text.End([Column1], 1) else if Text.End([Column1], 1) = Text.Lower(Text.End([Column1], 1)) then "Lowercase" else null)
in
#"Added Custom1"
Source Data:
SUBHEADING 1
Non-sensical
sentence 1. Sentence 2.
Sentence 3 part 3a,
part 3b, and
part 3c.
SUBHEADING 1.1.
Non-sensical
sentence 4. Sentence 5.
Sentence 6 part 3a,
part 3b, 3c and
3d.
2.0. SUBHEADING
Extra Info 1 (Not a proper sentence)
Extra Info 2
Sentence 7.
SUBHEADING 3.0
Sentence 8.
SOLUTION SO FAR (Minor error if Subheading starts with number)
let
Source = Excel.CurrentWorkbook(){[Name="Table5"]}[Content],
#"Added Custom" = Table.AddColumn(Source, "Start", each if Text.Start([Column1], 1) = Text.Lower(Text.Start([Column1], 1)) then "Lowercase" else "Uppercase"),
#"Added Custom1" = Table.AddColumn(#"Added Custom", "End", each if Text.End([Column1], 1) = "." then Text.End([Column1], 1) else if Text.End([Column1], 1) = Text.Lower(Text.End([Column1], 1)) then "Lowercase" else null),
#"Added Custom4" = Table.AddColumn(#"Added Custom1", "Complete Sentences", each if [Start] = "Uppercase" and [End] = "." then "COMPLETE" else null),
#"Added Index" = Table.AddIndexColumn(#"Added Custom4", "Index", 0, 1, Int64.Type),
#"Added Custom2" = Table.AddColumn(#"Added Index", "StartCheckCaseBelow", each try #"Added Index" [Start] { [Index] + 1 } otherwise "COMPLETE"),
#"Removed Columns1" = Table.RemoveColumns(#"Added Custom2",{"Index"}),
#"Added Custom3" = Table.AddColumn(#"Removed Columns1", "CompleteHeadings_CompleteText", each if [StartCheckCaseBelow]= "COMPLETE" then "COMPLETE" else if [Start] = "Uppercase" and [StartCheckCaseBelow] = "Uppercase" then "COMPLETE" else null),
#"Added Custom5" = Table.AddColumn(#"Added Custom3", "Custom", each if [CompleteHeadings_CompleteText] = null then if [Start] = "Uppercase" and [End] = "Lowercase" then "START" else if [End] = "." then "END" else null else null),
#"Added Index1" = Table.AddIndexColumn(#"Added Custom5", "Index", 0, 1, Int64.Type),
#"Merged Columns" = Table.CombineColumns(Table.TransformColumnTypes(#"Added Index1", {{"Index", type text}}, "en-GB"),{"Index", "CompleteHeadings_CompleteText"},Combiner.CombineTextByDelimiter(" ", QuoteStyle.None),"CompleteHeadings_CompleteText"),
#"Added Index3" = Table.AddIndexColumn(#"Merged Columns", "Index", 0, 1, Int64.Type),
#"Filtered Rows" = Table.SelectRows(#"Added Index3", each ([Custom] = "START")),
#"Added Index2" = Table.AddIndexColumn(#"Filtered Rows", "temp", 0, 1, Int64.Type),
combined = #"Added Index2" & Table.SelectRows(#"Added Index3", each [Custom] <> "START"),
#"Sorted Rows" = Table.Sort(combined,{{"Index", Order.Ascending}}),
#"Removed Columns" = Table.RemoveColumns(#"Sorted Rows",{"Index"}),
#"Added Custom6" = Table.AddColumn(#"Removed Columns", "Custom.1", each if Text.Contains([CompleteHeadings_CompleteText], "COMPLETE") then [CompleteHeadings_CompleteText] else if [Complete Sentences] <> null then [Complete Sentences] else [temp]),
#"Filled Down" = Table.FillDown(#"Added Custom6",{"Custom.1"}),
#"Removed Other Columns" = Table.SelectColumns(#"Filled Down",{"Column1", "Custom.1"}),
#"Grouped Rows" = Table.Group(#"Removed Other Columns", {"Custom.1"}, {{"Text", each Text.Combine([Column1], " "), type text}}),
#"Removed Other Columns1" = Table.SelectColumns(#"Grouped Rows",{"Text"})
in
#"Removed Other Columns1"
This works to achieve the table in orange however requires many steps which need to be simplified. If you could advise on this or know of a better way please let me know.
let
Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
#"Added Custom" = Table.AddColumn(Source, "Custom", each if Text.Start([Column1], 1) = Text.Lower(Text.Start([Column1], 1)) then " " else "| "),
#"Merged Columns" = Table.CombineColumns(#"Added Custom",{ "Custom", "Column1"},Combiner.CombineTextByDelimiter("", QuoteStyle.None),"Merged"),
#"Transposed Table" = Table.Transpose(#"Merged Columns"),
Custom1 = Table.ColumnNames( #"Transposed Table"),
Custom2 = #"Transposed Table",
#"Merged Columns1" = Table.CombineColumns(Custom2,Custom1,Combiner.CombineTextByDelimiter("", QuoteStyle.None),"Merged"),
#"Split Column by Delimiter" = Table.ExpandListColumn(Table.TransformColumns(#"Merged Columns1", {{"Merged", Splitter.SplitTextByDelimiter("| ", QuoteStyle.Csv), let itemType = (type nullable text) meta [Serialized.Text = true] in type {itemType}}}), "Merged"),
#"Split Column by Delimiter1" = Table.ExpandListColumn(Table.TransformColumns(#"Split Column by Delimiter", {{"Merged", Splitter.SplitTextByDelimiter(". ", QuoteStyle.Csv), let itemType = (type nullable text) meta [Serialized.Text = true] in type {itemType}}}), "Merged"),
#"Changed Type" = Table.TransformColumnTypes(#"Split Column by Delimiter1",{{"Merged", type text}})
in
#"Changed Type"
Will be interested to see if there are more elegant solutions out there.

A Better way to Extract Sentences from PDFs using Power Query/BI

I have been exploring the idea of using Power BI to dynamically try to extract information from any PDF if given a URL.
In this case the URL is https://hpvchemicals.oecd.org/ui/handler.axd?id=fae8d1b1-406b-4287-8a05-f81aa1b16d3f which is a safety assessment profile for formaldehyde. I wish to be able to extract as many sentences from a PDF document as possible.
Assuming ". " identifies the end of a sentence this actually works really well where paragraphs are concerned, along with some other trickery, to split the entire PDFinto sentences which I can then search.
M Code (updated for improved Extraction):
let
Source = Pdf.Tables(Web.Contents("https://hpvchemicals.oecd.org/ui/handler.axd?id=fae8d1b1-406b-4287-8a05-f81aa1b16d3f"), [Implementation="1.3"]),
#"Expanded Data" = Table.ExpandTableColumn(Source, "Data", {"Column1", "Column2", "Column3", "Column4", "Column5", "Column6", "Column7", "Column8", "Column9", "Column10", "Column11", "Column12", "Column13", "Column14", "Column15", "Column16", "Column17", "Column18", "Column19", "Column20", "Column21", "Column22", "Column23", "Column24", "Column25", "Column26", "Column27", "Column28", "Column29", "Column30", "Column31", "Column32", "Column33", "Column34", "Column35", "Column36", "Column37", "Column38", "Column39", "Column40", "Column41", "Column42"}, {"Column1", "Column2", "Column3", "Column4", "Column5", "Column6", "Column7", "Column8", "Column9", "Column10", "Column11", "Column12", "Column13", "Column14", "Column15", "Column16", "Column17", "Column18", "Column19", "Column20", "Column21", "Column22", "Column23", "Column24", "Column25", "Column26", "Column27", "Column28", "Column29", "Column30", "Column31", "Column32", "Column33", "Column34", "Column35", "Column36", "Column37", "Column38", "Column39", "Column40", "Column41", "Column42"}),
#"Filtered Rows" = Table.SelectRows(#"Expanded Data", each ([Kind] = "Page")),
#"Removed Columns1" = Table.RemoveColumns(#"Filtered Rows",{"Name", "Kind"}),
#"Added Index" = Table.AddIndexColumn(#"Removed Columns1", "Index", 0, 1, Int64.Type),
Exclude={"Id"},
List=List.Difference(Table.ColumnNames(#"Removed Columns1"),Exclude),
MergeAllColumns= Table.AddColumn(#"Added Index","Custom", each Text.Combine(Record.ToList( Table.SelectColumns(#"Added Index",List){[Index]}), " ")),
#"Removed Other Columns" = Table.SelectColumns(MergeAllColumns,{"Id", "Custom"}),
#"Split Column by Delimiter" = Table.ExpandListColumn(Table.TransformColumns(#"Removed Other Columns", {{"Custom", Splitter.SplitTextByDelimiter("#(lf)", QuoteStyle.None), let itemType = (type nullable text) meta [Serialized.Text = true] in type {itemType}}}), "Custom"),
#"Replaced Value" = Table.ReplaceValue(#"Split Column by Delimiter",". ",".. #",Replacer.ReplaceText,{"Custom"}),
#"Split Column by Delimiter1" = Table.ExpandListColumn(Table.TransformColumns(#"Replaced Value", {{"Custom", Splitter.SplitTextByDelimiter(". ", QuoteStyle.None), let itemType = (type nullable text) meta [Serialized.Text = true] in type {itemType}}}), "Custom"),
#"Cleaned Text" = Table.TransformColumns(#"Split Column by Delimiter1",{{"Custom", Text.Clean, type text}}),
#"Added Index1" = Table.AddIndexColumn(#"Cleaned Text", "Index", 0, 1, Int64.Type),
#"Added Custom" = Table.AddColumn(#"Added Index1", "Custom.1", each if Text.Contains([Custom], "#") then [Index] else null),
#"Filled Down" = Table.FillDown(#"Added Custom",{"Custom.1"}),
#"Grouped Rows" = Table.Group(#"Filled Down", {"Custom.1"}, {{"Custom", each Text.Combine([Custom], " "), type text}}),
#"Split Column by Delimiter2" = Table.ExpandListColumn(Table.TransformColumns(#"Grouped Rows", {{"Custom", Splitter.SplitTextByDelimiter(". ", QuoteStyle.None), let itemType = (type nullable text) meta [Serialized.Text = true] in type {itemType}}}), "Custom"),
#"Added Custom1" = Table.AddColumn(#"Split Column by Delimiter2", "Custom.2", each if Text.Contains([Custom], "NOAEL") then [Custom] else null)
in
#"Added Custom1"
So from this:
to this:
Although this sentence splitting works pretty well using ". " I'm now being greedy and wondering if this can be done in an even better way.
There are some instances where the Sentence doesn't split correctly which could be improved. For example, if the end of a sentence is joined to the next e.g. Hello.How are you would not split. Whilst ...e.g. Apple itself is recognised would split into. ...e.g. and Apple.
Python - RegEx for splitting text into sentences (sentence-tokenizing)
Appears to propose doing this using regex however with this excel regex replace function this doesn't appear to work.
fnRegexExtr3 (doesn't require \\ just \):
// regexReplace
let regexReplace=(text as nullable text,pattern as nullable text,replace as nullable text, optional flags as nullable text) as text =>
let
f=if flags = null or flags ="" then "" else flags,
l1 = List.Transform({text, pattern, replace}, each Text.Replace(_, "\", "\\")),
l2 = List.Transform(l1, each Text.Replace(_, "'", "\'")),
t = Text.Format("<script>var txt='#{0}';document.write(txt.replace(new RegExp('#{1}','#{3}'),'#{2}'));</script>", List.Combine({l2,{f}})),
r=Web.Page(t)[Data]{0}[Children]{0}[Children],
Output=if List.Count(r)>1 then r{1}[Text]{0} else ""
in Output
in regexReplace
please feel free to look at the PDF link and propose any suggestions for improving the capturing of data.
I will continue to update here with any progress.

Power Query: Expression.Error: There weren't enough elements in the enumeration to complete the operation. (LIST)

What I am trying to achieve is to obtain "matches/pairs" from two tables. One (source 1)is data table with Date/Time and Pressure value columns and the other (source 2) is like Date/Time and Info value Columns. Second table has so called "pairs" , start and stop in certain time. I want to get exact matches when is found in source 1 or approximate match when is not exact as in source 1 (seconds can be a problem).
Lets say you are matching/lookup two tables, give me everything that falls between for instance 15.01.2022 06:00:00 and 15.01.2022 09:15:29.
Where I have a problem is more likely exact match and seconds. It is skipping or cant find any pair if the seconds are not matching. So my question is how to make if not seconds then lookup for next availablee match, can be a minute too as long as they are in the given range (start stop instances).
That is a reason I am getting this Expression error. Or is there a way to skip that error and proceed with Query??
Link to download the data:
https://docs.google.com/spreadsheets/d/1Jv5j7htAaEFktN0ntwOZCV9jesF43tEP/edit?usp=sharing&ouid=101738555398870704584&rtpof=true&sd=true
On the code below is what I am trying to do:
let
//Be sure to change the table names in the Source= and Source2= lines to be the actual table names from your workbook
Source = Excel.CurrentWorkbook(){[Name="Parameters"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Date/Time", type datetime}, {"P7 [mbar]", Int64.Type}}),
//get start/stop times table
Source2 = Excel.CurrentWorkbook(){[Name="Log_Original"]}[Content],
typeIt = Table.TransformColumnTypes(Source2, {"Date/Time", type datetime}),
#"Filtered Rows" = Table.SelectRows(typeIt, each ([#"Date/Time"] <> null)),
#"Added Index" = Table.AddIndexColumn(#"Filtered Rows", "Index", 0, 1),
#"Added Custom" = Table.AddColumn(#"Added Index", "NextLineStart", each if Text.Contains([Info],"start", Comparer.OrdinalIgnoreCase) = true
and Text.Contains(#"Added Index"[Info]{[Index]+1},"start",Comparer.OrdinalIgnoreCase) = true
then "delete"
else null),
#"Filtered Rows1" = Table.SelectRows(#"Added Custom", each ([NextLineStart] = null)),
#"Removed Columns1" = Table.RemoveColumns(#"Filtered Rows1",{"Index", "NextLineStart"}),
//create a list of all the relevant start/stop times
filterTimes = List.Combine(
List.Generate(
()=> [times = List.DateTimes(#"Removed Columns1"[#"Date/Time"]{0},
Duration.TotalSeconds(#"Removed Columns1"[#"Date/Time"]{1}-#"Removed Columns1"[#"Date/Time"]{0})+1,
#duration(0,0,0,1)), IDX = 0],
each [IDX] < Table.RowCount(#"Removed Columns1"),
each [times = List.DateTimes(#"Removed Columns1"[#"Date/Time"]{[IDX]+2},
Duration.TotalSeconds(#"Removed Columns1"[#"Date/Time"]{[IDX]+3}-#"Removed Columns1"[#"Date/Time"]{[IDX]+2})+1,
#duration(0,0,0,1)), IDX = [IDX]+2],
each [times]
)
),
//filter the table using the list
filterTimesCol = Table.FromList(filterTimes,Splitter.SplitByNothing()),
filteredTable = Table.Join(#"Changed Type","Date/Time",filterTimesCol,"Column1",JoinKind.Inner),
#"Removed Columns" = Table.RemoveColumns(filteredTable,{"Column1"}),
#"Added Custom1" = Table.AddColumn(#"Removed Columns", "Custom", each DateTime.ToText([#"Date/Time"],"dd-MMM-yy")),
#"Filtered Rows2" = Table.SelectRows(#"Added Custom1", each [#"Date/Time"] > #datetime(2019, 01, 01, 0, 0, 0)),
#"Sorted Rows" = Table.Sort(#"Filtered Rows2",{{"Date/Time", Order.Ascending}})
in
#"Sorted Rows"
I set up the below to return a sorted table with all results between the start and ending date/times. You can then select the first or middle or bottom row of each table if you want from this point. Its hard to tell from your question if you are looking for the value closest to the start value, closest to the end value or something inbetween. You can wrap my Table.Sort with a Table.FirstN or Table.LastN to pick up the first or last row.
I left most of your starting code alone
let Source = Table.Buffer(T1),
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Date/Time", type datetime}, {"P7 [mbar]", Int64.Type}}),
//get start/stop times table
Source2 = T2,
typeIt = Table.TransformColumnTypes(Source2, {"Date/Time", type datetime}),
#"Filtered Rows" = Table.SelectRows(typeIt, each ([#"Date/Time"] <> null)),
#"Added Index" = Table.AddIndexColumn(#"Filtered Rows", "Index", 0, 1),
// shift Info up one row for comparison
shiftedList = List.RemoveFirstN( #"Added Index"[Info],1),
custom1 = Table.ToColumns( #"Added Index") & {shiftedList},
custom2 = Table.FromColumns(custom1,Table.ColumnNames( #"Added Index") & {"NextInfo"}),
#"Filtered Rows2" = Table.SelectRows(custom2, each not (Text.Contains([Info],"start", Comparer.OrdinalIgnoreCase) and Text.Contains([NextInfo],"start", Comparer.OrdinalIgnoreCase))),
#"Added Custom3" = Table.AddColumn(#"Filtered Rows2", "Type", each if Text.Contains(Text.Lower([Info]),"start") then "start" else if Text.Contains(Text.Lower([Info]),"finished") then "finished" else null),
#"Removed Columns2" = Table.RemoveColumns(#"Added Custom3",{"Info", "NextInfo"}),
#"Added Custom1" = Table.AddColumn(#"Removed Columns2", "Custom", each if [Type]="start" then [Index] else null),
#"Filled Down" = Table.FillDown(#"Added Custom1",{"Custom"}),
#"Removed Columns" = Table.RemoveColumns(#"Filled Down",{"Index"}),
#"Pivoted Column" = Table.Pivot(#"Removed Columns", List.Distinct(#"Removed Columns"[Type]), "Type", "Date/Time"),
#"Added Custom2" = Table.AddColumn(#"Pivoted Column","Table",(i)=>Table.Sort(Table.SelectRows(T1, each [#"Date/Time"]>=i[start] and [#"Date/Time"]<=i[finished]),{{"Date/Time", Order.Ascending}}) , type table )
in #"Added Custom2"

Feed cell value into excel query web browser URL

My problem:
Through New Query -> From Other Sources -> From Web, I entered a static URL that allowed me to load approximately 60k "IDs" from a webpage in JSON format.
I believe each of these IDs corresponds to an item.
So they're all loaded and organised in a column, with one ID per line, inside a Query tab.
For the moment, no problem.
Now I need to import information from a dynamic URL that depends on the ID.
So I need to import from URL in this form:
http://www.example.com/xxx/xxxx/ID
This imports the following for each ID:
name of correspond item,
average price,
supply,
demand,
etc.
After research I came to the conclusion that I had to use the "Advanced Editor" inside the query editor to reference the ID query tab.
However I have no idea how to put together the static part with the ID, and how to repeat that over the 60k lines.
I tried this:
let
Source = Json.Document(Web.Contents("https://example.com/xx/xxxx/" & ID)),
name1 = Source[name]
in
name1
This returns an error.
I think it's because I can't add a string and a column.
Question: How do I reference the value of the cell I'm interested in and add it to my string ?
Question: Is what I'm doing viable?
Question: How is Excel going to handle loading 60k queries?
Each query is only a few words to import.
Question: Is it possible to load information from 60k different URLs with one query?
EDIT : thank you very much for answer Alexis, was very helpful. So to avoid copying what you posted I did it without the function (tell me what you think of it) :
let
Source = Json.Document(Web.Contents("https://example.com/all-ID.json")),
items1 = Source[items],
#"Converted to Table" = Table.FromList(items1, Splitter.SplitByNothing(), null, null, ExtraValues.Error),
#"Renamed Columns" = Table.RenameColumns(#"Converted to Table",{{"Column1", "ID"}}),
#"Inserted Merged Column" = Table.AddColumn(#"Renamed Columns", "URL", each Text.Combine({"http://example.com/api/item/", Text.From([ID], "fr-FR")}), type text),
#"Added Custom" = Table.AddColumn(#"Inserted Merged Column", "Item", each Json.Document(Web.Contents([URL]))),
#"Expanded Item" = Table.ExpandRecordColumn(#"Added Custom", "Item", {"name"}, {"Item.name"})
in
#"Expanded Item"
Now the problem I have is that it takes ages to load up all the information I need from all the URLs.
As it turns out it's possible to extract from multiple IDs at once using this format : http://example.com/api/item/ID1,ID2,ID3,ID4,...,IDN
I presume that trying to load from an URL containing all of the IDs at once would not work out because the URL would contain way too many characters to handle.
So to speed things up, what I'm trying to do now is concatenate every Nth row into one cell, for example with N=3 :
205
651
320165
63156
4645
31
6351
561
561
31
35
would become :
205, 651, 320165
63156, 4645, 31
6351, 561, 561
31, 35
The "Group by" functionnality doesn't seem to be what I'm looking for, and I'm not sure how to automatise that throught Power Query
EDIT 2
So after a lot of testing I found a solution, even though it might not be the most elegant and optimal :
I created an index with a 1 step
I created another costum column, I associated every N rows with an N increasing number
I used "Group By" -> "All Rows" to create a "Count" column
Created a costum column "[Count][ID]
Finally I excracted values from that column and put a "," separator
Here's the code for N = 10 000 :
let
Source = Json.Document(Web.Contents("https://example.com/items.json")),
items1 = Source[items],
#"Converted to Table" = Table.FromList(items1, Splitter.SplitByNothing(), null, null, ExtraValues.Error),
#"Renamed Columns" = Table.RenameColumns(#"Converted to Table",{{"Column1", "ID"}}),
#"Changed Type" = Table.TransformColumnTypes(#"Renamed Columns",{{"ID", Int64.Type}}),
#"Added Index" = Table.AddIndexColumn(#"Changed Type", "Index", 0, 1),
#"Added Conditional Column" = Table.AddColumn(#"Added Index", "Custom", each if Number.RoundDown([Index]/10000) = [Index]/10000 then [Index] else Number.IntegerDivide([Index],10000)*10000),
#"Reordered Columns" = Table.ReorderColumns(#"Added Conditional Column",{"Index", "ID", "Custom"}),
#"Grouped Rows" = Table.Group(#"Reordered Columns", {"Custom"}, {{"Count", each _, type table}}),
#"Added Custom" = Table.AddColumn(#"Grouped Rows", "Custom.1", each [Count][ID]),
#"Extracted Values" = Table.TransformColumns(#"Added Custom", {"Custom.1", each Text.Combine(List.Transform(_, Text.From), ","), type text})
in
#"Extracted Values"
I think what you want to do here is create a custom function that you invoke with each of your ID values.
Let me give a similar example that should point you in the right direction.
Let's say I have a table named ListIDs which looks like this:
ID
----
1
2
3
4
5
6
7
8
9
10
and for each ID I want to pull some information from Wikipedia (e.g. for ID = 6 I want to lookup https://en.wikipedia.org/wiki/6 and return the Cardinal, Ordinal, Factorization, and Divisors of 6).
To get this for just one ID value my query would look like this (using 6 again):
let
Source = Web.Page(Web.Contents("https://en.wikipedia.org/wiki/6")),
Data0 = Source{0}[Data],
#"Changed Type" = Table.TransformColumnTypes(Data0,{{"Column1", type text}, {"Column2", type text}, {"Column3", type text}}),
#"Filtered Rows" = Table.SelectRows(#"Changed Type", each ([Column2] = "Cardinal" or [Column2] = "Divisors" or [Column2] = "Factorization" or [Column2] = "Ordinal")),
#"Removed Columns" = Table.RemoveColumns(#"Filtered Rows",{"Column1"}),
#"Renamed Columns" = Table.RenameColumns(#"Removed Columns",{{"Column2", "Property"}, {"Column3", "Value"}}),
#"Pivoted Column" = Table.Pivot(#"Renamed Columns", List.Distinct(#"Renamed Columns"[Property]), "Property", "Value")
in
#"Pivoted Column"
Now we want to convert this into a function so that we can use it as many times as we want without creating a bunch of queries. (Note: I've named this query/function WikiLookUp as well.) To do this, change it to the following:
let
WikiLookUp = (ID as text) =>
let
Source = Web.Page(Web.Contents("https://en.wikipedia.org/wiki/" & ID)),
Data0 = Source{0}[Data],
#"Changed Type" = Table.TransformColumnTypes(Data0,{{"Column1", type text}, {"Column2", type text}, {"Column3", type text}}),
#"Filtered Rows" = Table.SelectRows(#"Changed Type", each ([Column2] = "Cardinal" or [Column2] = "Divisors" or [Column2] = "Factorization" or [Column2] = "Ordinal")),
#"Removed Columns" = Table.RemoveColumns(#"Filtered Rows",{"Column1"}),
#"Renamed Columns" = Table.RenameColumns(#"Removed Columns",{{"Column2", "Property"}, {"Column3", "Value"}}),
#"Pivoted Column" = Table.Pivot(#"Renamed Columns", List.Distinct(#"Renamed Columns"[Property]), "Property", "Value")
in
#"Pivoted Column"
in
WikiLookUp
Notice that all we did is wrap it in another set of let...in and defined the parameter ID = text which gets substituted into the Source line near the end. The function should appear like this:
Now we can go back to our table which we've imported into the query editor and invoke our newly created function in a custom column. (Note: Make sure you convert your ID values to text type first since they're being appended to a URL.)
Add a custom column with the following definition (or use the Invoke Custom Function button)
= WikiLookUp([ID])
Expand that column to bring in all the columns you want and you're done!
Here's what that query's M code looks like:
let
Source = Excel.CurrentWorkbook(){[Name="ListIDs"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"ID", type text}}),
#"Added Custom" = Table.AddColumn(#"Changed Type", "Custom", each WikiLookUp([ID])),
#"Expanded Custom" = Table.ExpandTableColumn(#"Added Custom", "Custom", {"Cardinal", "Ordinal", "Factorization", "Divisors"}, {"Cardinal", "Ordinal", "Factorization", "Divisors"})
in
#"Expanded Custom"
The query should look like this:

Resources