Get cells of rows - node.js

I'm successfully getting the rows of a table like so:
var rows = await page.evaluate(() => Array.from(document.querySelectorAll('.summary > tbody > tr'), element => $(element)))
How do I get the children of each row as an array?
Do I do ... Array.from(rows[i].querySelectorAll(...?
I've tried a few different methods but I can't figure it out.

If I understand your question correctly, you want the cell values of each table row grouped into an array, one array per row. If so, you could do it this way:
const rows = await page.evaluate(
() => Array.from( document.querySelectorAll('table > tbody > tr') ) // Get the rows as an array
.map(row => Array.from( row.querySelectorAll("td") ) // For each row get its cells as an array
.map(td => td.textContent)) // Replace each cell in the latter array with its text
)

Short answer:
Use the following code, which queries the direct child elements of the tr elements:
const rowChildren = await page.$$('.summary > tbody > tr > *');
Long Answer
Your code is not doing what you think it is doing. I'll go through it to show you the problem.
Problem
Here is your code again:
var rows = await page.evaluate(
() => Array.from(
document.querySelectorAll('.summary > tbody > tr'),
element => $(element),
)
)
What this code does is:
Run document.querySelectorAll in the browser
Map each element in the NodeList to a jQuery object (I'm assuming $ is jQuery here)
Call JSON.stringify on the array of jQuery objects (to serialize them)
puppeteer transfers your serialized data from the browser environment to the Node.js environment
rows now contains an array of "jQuery objects", without reference to their actual DOM node
So this code does not give you handles to the elements in the Node.js environment, because page.evaluate can only return serializable values (which DOM nodes are not). Although it looks like you successfully queried the DOM nodes, what you get back are only the serialized "jQuery wrappers", without the DOM nodes themselves.
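To make the effect concrete, here is a minimal sketch that runs in plain Node.js without Puppeteer. It only mimics the transfer with a JSON round trip, and fakeNode and wrapper are made-up stand-ins for a DOM node and a jQuery-style wrapper, so treat it as an illustration rather than what Puppeteer does internally:

```javascript
// Made-up stand-in for a DOM node: it carries a method, which JSON cannot represent.
const fakeNode = {
  tagName: 'TR',
  querySelectorAll() { return []; },
};

// Made-up stand-in for a jQuery-style wrapper around that node.
const wrapper = { 0: fakeNode, length: 1 };

// Mimic the browser -> Node.js transfer with a JSON round trip.
const transferred = JSON.parse(JSON.stringify([wrapper]));

// Plain data such as `length` survives the trip...
console.log(transferred[0].length);
// ...but the method needed to query child elements does not.
console.log(typeof transferred[0][0].querySelectorAll);
```

This is why calling rows[i].querySelectorAll(...) on the result of page.evaluate cannot work.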
Solution
To query DOM nodes from the browser environment, you have to use a function like page.$$, which returns ElementHandles. The following code returns the actual tr rows:
const rows = await page.$$('.summary > tbody > tr');
To then query their child elements, you can simply add a > * selector to the end, which selects all direct child elements of the tr rows:
const rowChildren = await page.$$('.summary > tbody > tr > *');
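Note that the selector above returns one flat list of cells. If you want the cells grouped per row, a possible sketch is to query each row handle separately. The helper name getCellsPerRow is made up for this example, and it assumes the cells are td or th elements:

```javascript
// Hypothetical helper: returns an array with one entry per row,
// each entry being an array of ElementHandles for that row's cells.
// Assumes `page` is a Puppeteer Page and that cells are td/th elements.
async function getCellsPerRow(page, rowSelector) {
  const rows = await page.$$(rowSelector);
  // ElementHandle also has a $$ method, scoped to that element.
  return Promise.all(rows.map(row => row.$$('td, th')));
}

// Usage inside an async function:
// const cells = await getCellsPerRow(page, '.summary > tbody > tr');
// cells[0] would then hold the cell handles of the first row.
```

Be aware that 'td, th' would also match cells of a table nested inside a row, so adjust the selector if your markup has nested tables.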

I think you might be looking to do something along these lines. The children have to be collected inside page.evaluate, because the rows are no longer DOM nodes once they reach Node.js:
const children = await page.evaluate(
() => Array.from(
document.querySelectorAll('.summary > tbody > tr')
).flatMap(row => Array.from(row.children, cell => cell.textContent)) // flat list of cell texts
)

Related

How can I setup Pagination in Excel Power Query?

I am importing financial data as JSON from the web into Excel, but the source is paginated (50 results per page), so I need to implement pagination in order to import all the results.
The data source is JSON:
https://localbitcoins.com//sell-bitcoins-online/VES/.json?page=1
or https://localbitcoins.com//sell-bitcoins-online/VES/.json?page=2
?page=1, ?page=2, ?page=3
I use the following code to implement pagination, but receive an error:
= (page as number) as table =>
let
Source = Json.Document(Web.Contents("https://localbitcoins.com//sell-bitcoins-online/VES/.json?page=" & Number.ToText(page) )),
Data1 = Source{1}[Data],
RemoveBottom = Table.RemoveLastN(Data1,3)
in
RemoveBottom
When I invoke the function with a parameter (1 for page 1) to test it, I get the following error and I can't work out why:
An error occurred in the ‘GetData’ query. Expression.Error: We cannot convert a value of type Record to type List.
Details:
Value=Record
Type=Type
For the record, I tried to include page handling using List.Generate:
= List.Generate( ()=>
[Result= try GetData(1) otherwise null, page = 1],
each [Result] <> null,
each [Result = try GetData([page]+1) otherwise null, Page = [Page]+1],
each [Result])
What is the default way to implement pagination using Power Query in MS Excel?
I realise you asked this nearly a month ago and may have since found an answer, but will respond anyway in case it helps someone else.
This line Data1 = Source{1}[Data] doesn't make sense to me, since I think Source will be a record and you can't use {1} positional lookup syntax with records.
The code below returns 7 pages for me. You may want to check if it's getting all the pages you need/expect.
let
getPageOfData = (pageNumber as number) =>
let
options = [
Query = [page = Number.ToText(pageNumber)]
],
url = "https://localbitcoins.com/sell-bitcoins-online/VES/.json",
response = Web.Contents(url, options),
deserialised = Json.Document(response)
in deserialised,
responses = List.Generate(
() => [page = 1, response = getPageOfData(page), lastPage = null],
each [lastPage] = null or [page] <= [lastPage],
each [
page = [page] + 1,
response = getPageOfData(page),
lastPage = if [lastPage] = null then if Record.HasFields(response[pagination], "next") then null else page else [lastPage]
],
each [response]
)
in
responses
In List.Generate, my selector only picks the [response] field to keep things simple. You could drill deeper into the data either within selector itself (e.g. each [response][data][ad_list]) or create a new step/expression and use List.Transform to do so.
After a certain amount of drilling down and transforming, you can get the data into a table, but what it looks like depends on what you need (and which columns you're interested in).
By the way, I used getPageOfData in the query above, but this particular API was including the URL for the next page in its responses. So pages 2 and thereafter could have just requested the URL in the response (rather than calling getPageOfData).
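If it helps to see the stopping rule outside of M, the same idea can be sketched as an ordinary loop in JavaScript: keep requesting pages until a response's pagination record has no next field. Everything here is a stand-in: fakePages imitates the API's response shape (data.ad_list and pagination.next, as described above), and getPageOfData just reads from it instead of making a web request.

```javascript
// Made-up stand-in for the API: three pages, where every page except
// the last has a `next` field in its `pagination` record.
const fakePages = {
  1: { data: { ad_list: ['a', 'b'] }, pagination: { next: '?page=2' } },
  2: { data: { ad_list: ['c', 'd'] }, pagination: { next: '?page=3' } },
  3: { data: { ad_list: ['e'] }, pagination: {} },
};
const getPageOfData = pageNumber => fakePages[pageNumber];

// Request page 1, 2, 3, ... and stop after the first page without `next`.
function getAllResponses(getPage) {
  const responses = [];
  let page = 1;
  while (true) {
    const response = getPage(page);
    responses.push(response);
    if (!('next' in response.pagination)) break;
    page += 1;
  }
  return responses;
}

const responses = getAllResponses(getPageOfData);
// Drill down into each response, like List.Transform would in M.
const ads = responses.flatMap(r => r.data.ad_list);
console.log(responses.length, ads);
```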

Getting pagination to work with one to many join

I'm currently working on a database with several one-to-many and many-to-many relationships and I am struggling to get OrmLite to work nicely.
I have a one-to-many relationship like so:
var q2 = Db.From<GardnerRecord>()
.LeftJoin<GardnerRecord, GardnerEBookRecord>((x, y) => x.EanNumber == y.PhysicalEditionEan)
I need to return a collection of ProductDto that has a nested list of GardnerEBookRecord.
Using the SelectMulti() technique doesn't work: pagination breaks because I am condensing the left-joined results into a smaller collection, so the page size and offsets are all wrong. (This is the method from: How to return nested objects of many-to-many relationship with autoquery)
To get the paging right I need to be able to do something like:
select r.*, count(e) as ebook_count, array_agg(e.*)
from gardner_record r
left join gardner_e_book_record e
on r.ean_number = e.physical_edition_ean
group by r.id
There are no examples of this in the docs and I have been struggling to figure it out. I can't see anything that would function like array_agg in the Sql object of OrmLite.
I have tried variations of:
var q2 = Db.From<GardnerRecord>()
.LeftJoin<GardnerRecord, GardnerEBookRecord>((x, y) => x.EanNumber == y.PhysicalEditionEan)
.GroupBy(x => x.Id).Limit(100)
.Select<GardnerRecord, GardnerEBookRecord>((x, y) => new { x, EbookCount = Sql.Count(y), y }) //how to aggregate y?
var res2 = Db.SelectMulti<GardnerRecord, GardnerEBookRecord>(q2);
and
var q2 = Db.From<GardnerRecord>()
.LeftJoin<GardnerRecord, GardnerEBookRecord>((x, y) => x.EanNumber == y.PhysicalEditionEan)
.GroupBy(x => x.Id).Limit(100)
.Select<GardnerRecord, List<GardnerEBookRecord>>((x, y) => new { x, y });
var res = Db.SqlList<object>(q2);
But I can't work out how to aggregate the GardnerEBookRecord to a list and keep the paging and offset correct.
Is this possible? Any workaround?
edit:
I made a project you can run to see the issue:
https://github.com/GuerrillaCoder/OneToManyIssue
The database is included as a Docker container, which you can start with docker-compose up. Hopefully this shows what I am trying to do.
Npgsql doesn't support reading a column of an unknown array or record type, e.g. array_agg(e.*), which fails with:
Unhandled Exception: System.NotSupportedException: The field 'ebooks' has a type currently unknown to Npgsql (OID 347129).
But it does support reading an array of integers with array_agg(e.id) which you can query instead:
var q = @"select b.*, array_agg(e.id) ids from book b
left join e_book e on e.physical_book_ean = b.ean_number
group by b.id";
var results = db.SqlList<Dictionary<string,object>>(q);
This will return a Dictionary Dynamic Result Set which you'll need to combine into a distinct id collection to query all ebooks referenced, e.g:
//Select All referenced EBooks in a single query
var allIds = new HashSet<int>();
results.Each(x => (x["ids"] as int[])?.Each(id => allIds.Add(id)));
var ebooks = db.SelectByIds<EBook>(allIds);
Then you can create a dictionary mapping of id => Ebook and use it to populate a collection of ebooks entities using the ids for each row:
var ebooksMap = ebooks.ToDictionary(x => x.Id);
results.Each(x => x[nameof(ProductDto.Ebooks)] = (x["ids"] as int[])?
.Where(id => id != 0).Map(id => ebooksMap[id]) );
You can then use ServiceStack AutoMapping Utils to convert each Object Dictionary into your Product DTO:
var dtos = results.Map(x => x.ConvertTo<ProductDto>());
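The stitching steps above are independent of OrmLite, so here they are again as a small in-memory JavaScript sketch. All of the data is made up: results stands in for the rows of the array_agg(e.id) query, and ebookTable stands in for the table that SelectByIds reads from. It also assumes that a book without ebooks comes back with a single null in its ids array, which is what a left join with array_agg produces in PostgreSQL.

```javascript
// Made-up rows, shaped like the result of the array_agg(e.id) query.
const results = [
  { id: 1, title: 'Book A', ids: [10, 11] },
  { id: 2, title: 'Book B', ids: [null] }, // no ebooks for this book
];

// Made-up ebook table, standing in for the database.
const ebookTable = [
  { id: 10, format: 'epub' },
  { id: 11, format: 'pdf' },
  { id: 12, format: 'mobi' },
];

// 1. Collect the distinct ebook ids referenced by any row.
const allIds = new Set(results.flatMap(r => r.ids).filter(id => id != null));

// 2. Fetch all referenced ebooks in one go (here: a filter over the array).
const ebooks = ebookTable.filter(e => allIds.has(e.id));

// 3. Build an id => ebook lookup.
const ebooksMap = new Map(ebooks.map(e => [e.id, e]));

// 4. Replace each row's ids with the matching ebook objects.
const dtos = results.map(({ ids, ...book }) => ({
  ...book,
  ebooks: ids.filter(id => id != null).map(id => ebooksMap.get(id)),
}));

console.log(JSON.stringify(dtos, null, 2));
```

Because each book is exactly one row in the grouped query, a limit and offset applied to that query page over books rather than over joined rows.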

getRow for nested row

I have a table with a nested data tree. I need to collapse particular rows but can't seem to get the row instance of a nested row. I can use the index of the top-level rows to do table.getRow(1), but table.getRow(2) fails. I can get the row instance of row 68, which is the next top-level row, so it seems I can't get row 2 because it is a child of row 1. Is there an alternate way to get child rows? table.getRow(1).getRow(2)?
Check the documentation, then check this jsfiddle and look at the console.
use `var children = row.getTreeChildren();`
// Get all rows
const allRows = table.getRows();
console.log('allRows',allRows);
const firstRow = table.getRow(1);
console.log('firstRow',firstRow);
const firstRowChildren = firstRow.getTreeChildren();
console.log('firstRowChildren',firstRowChildren);
$('#toggle').click(function(){
firstRow.treeToggle();
});
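If you need a row that is nested more than one level deep, one option is to walk the tree recursively with getTreeChildren(). This is only a sketch: findTreeRow and makeRow are made-up names, and the fake rows below just imitate the two row component methods used (getData and getTreeChildren) so the example runs without a table.

```javascript
// Hypothetical helper: depth-first search through rows and their tree children.
// `matches` is a function that receives a row component and returns true/false.
function findTreeRow(rows, matches) {
  for (const row of rows) {
    if (matches(row)) return row;
    // Guard with `|| []` in case no children array is returned.
    const found = findTreeRow(row.getTreeChildren() || [], matches);
    if (found) return found;
  }
  return null;
}

// Made-up stand-ins for row components, just for this demonstration.
const makeRow = (data, children = []) => ({
  getData: () => data,
  getTreeChildren: () => children,
});
const rows = [
  makeRow({ id: 1 }, [makeRow({ id: 2 }), makeRow({ id: 3 })]),
  makeRow({ id: 68 }),
];

const nested = findTreeRow(rows, row => row.getData().id === 2);
console.log(nested.getData());

// With a real table you would start from the top-level rows instead:
// const row = findTreeRow(table.getRows(), r => r.getData().id === 2);
// if (row) row.treeToggle();
```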

Using composed xpath to locate an element and click on it

I am trying to retrieve a list of elements using XPATH and from this list I want to retrieve a child element based on classname and click it.
var rowList = XPATH1 + className;
var titleList = className + innerHTMLofChildElement;
for(var i = 0; i < titleList.length; i++) {
if(titleList[i][0] === title) {
browser.click(titleList[i][0]); //I don't know what to do inside the click function
}
}
I had a similar implementation perhaps to what you are trying to do, however my implementation is perhaps more complex due to using CSS selectors rather than XPath. I'm certain this is not optimized, and can most likely be improved upon.
This uses the methods elementIdText() and elementIdClick() from WebdriverIO to work with the "Text Values" of the Web JSON Elements and then click the intended Element after matching what you're looking for.
http://webdriver.io/api/protocol/elementIdText.html
http://webdriver.io/api/protocol/elementIdClick.html
Step 1 - Find all your potential elements you want to work with:
// Elements Query to return Elements matching Selector (titles) as JSON Web Elements.
// Also `browser.elements('<selector>')`
titles = browser.$$('<XPath or CSS selector>')
Step 2 - Cycle through the Elements stripping out the InnerHTML or Text Values and pushing it into a separate Array:
// Create an Array of Titles
var titlesTextArray = [];
titles.forEach(function(elem) {
// Push all found element's Text values (titles) to the titlesTextArray
titlesTextArray.push(browser.elementIdText(elem.value.ELEMENT))
})
Step 3 - Cycle through the array of title text values to find what you're looking for. Use the elementIdClick() function to click your desired element:
// Loop through titlesTextArray looking for text that matches the desired title.
for (var i = 0; i < titlesTextArray.length; i++) {
if (titlesTextArray[i].value === title) {
// Found a match - Click the corresponding element that
// it belongs to that was found above in the Titles
browser.elementIdClick(titles[i].value.ELEMENT)
}
}
I wrapped all of this into a function to which I passed the text I wanted to search for (in your case a particular title). Hope this helps!
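A rough sketch of what such a wrapper could look like is below. The name clickElementByText is made up, and it assumes the same synchronous WebdriverIO API used in the steps above (browser.$$, elementIdText and elementIdClick), so it may need adjusting for other WebdriverIO versions:

```javascript
// Hypothetical wrapper around the three steps above.
// Returns true if an element with matching text was found and clicked.
function clickElementByText(browser, selector, title) {
  // Step 1: find all candidate elements.
  const elements = browser.$$(selector);
  for (const elem of elements) {
    // Step 2: read the text value of each element.
    const text = browser.elementIdText(elem.value.ELEMENT).value;
    // Step 3: click the first element whose text matches.
    if (text === title) {
      browser.elementIdClick(elem.value.ELEMENT);
      return true;
    }
  }
  return false;
}

// Usage in a test:
// clickElementByText(browser, '<XPath or CSS selector>', 'My title');
```

Unlike the loop in Step 3, this stops at the first match, which avoids clicking several elements that happen to share the same title.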
I don't know Node.js, but in Java you should be able to achieve your goal with:
titleList.get(i).findElement(By.className("classToFind"))
assuming titleList is a List<WebElement> and titleList.get(i) is the element whose child elements you want.

tbody . each( ) ; method missing watir

Hi there, I am not sure if I am using the wrong Watir syntax or if there is something wrong with my Watir. Below is the code that I am writing to go through each row of a table body.
e.frame(:name => "content").frame(:name => "main").tbody(:class => "blacklabel").each(){|i|.....}
When I run this code I get a method missing error. Also, when I try
e.frame(:name => "content").frame(:name => "main").tbody(:class => "blacklabel").length()
I get a missing method error. Below is the website that I am using.
You want to iterate over the rows collection rather than the table body, i.e. you need to call rows() before each(). So you want to do:
my_table = e.frame(:name => "content").frame(:name => "main").tbody(:class => "blacklabel")
my_table.rows.each{|i|.....}
The tbody element uses the TableSection class. The TableSection API can be seen here - http://rdoc.info/gems/watir-classic/Watir/TableSection.

Resources