Parse returned element with Node.js and Selenium

This is my first time trying Selenium with Node.js. I need to parse an HTML page, and I'm having trouble running a second XPath query on each returned element to search deeper inside it. Here is my code:
const { Builder, By } = require("selenium-webdriver");
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms)); // custom sleep helper
async function parse () {
  let driver = await new Builder().forBrowser("firefox").build();
  await driver.get("https://web-url.com");
  await sleep(2000);
  let elements = await driver.findElements(By.xpath("//ul[contains(@class, 'some-class')]//li"));
  for (let i = 0; i < elements.length; i++) {
    let timeContainer = await elements[i].findElement(By.xpath("//time[contains(@class, 'time-class')]")).getAttribute("datetime");
    let innerHTML = await elements[i].getAttribute("innerHTML");
    console.log(timeContainer);
  }
  await sleep(2000);
  await driver.close();
}
The innerHTML variable holds the correct HTML string for each element, but when I print timeContainer, it shows the same value for every element. How can I run an additional XPath search within each element?
Thank you.

You need to make the XPath in elements[i].findElement(By.xpath("//time[contains(@class, 'time-class')]")) relative.
This is done by putting a dot (.) before the rest of the XPath expression, as follows:
elements[i].findElement(By.xpath(".//time[contains(@class, 'time-class')]"))
An expression that starts with // is absolute: even when you call findElement on an element, the search starts from the root of the document. This is why that line returns the first match on the page (the same one) every time.
With the leading dot, the search starts at the current node, i.e. inside the elements[i] element.
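If the same rule has to be applied in several places, a tiny helper can add the dot for you. This is a minimal plain-JavaScript sketch with no Selenium dependency; toRelativeXPath is a hypothetical name, and it only rewrites expressions that begin with / (so something like (//time)[1] is left unchanged and stays absolute).

```javascript
// Hypothetical helper: make an XPath relative to the context node by
// prefixing "." to expressions that start at the document root.
function toRelativeXPath(xpath) {
  const trimmed = xpath.trim();
  // Already relative (".//...", "./...", "time") or wrapped in parentheses:
  // return it unchanged.
  if (!trimmed.startsWith('/')) {
    return trimmed;
  }
  // "//time" -> ".//time", "/html/body" -> "./html/body"
  return '.' + trimmed;
}

// Usage with selenium-webdriver (not run here):
// elements[i].findElement(By.xpath(toRelativeXPath("//time[contains(@class, 'time-class')]")))
```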

Related

Puppeteer select element from innertext [duplicate]

Is there any method (I didn't find one in the API) or other solution for clicking an element by its text?
For example, I have this HTML:
<div class="elements">
<button>Button text</button>
<a href=#>Href text</a>
<div>Div text</div>
</div>
And I want to click the element that wraps a given text (here, the button inside .elements), like:
Page.click('Button text', '.elements')
Short answer
This XPath expression will query a button which contains the text "Button text":
const [button] = await page.$x("//button[contains(., 'Button text')]");
if (button) {
await button.click();
}
To also respect the <div class="elements"> surrounding the buttons, use the following code:
const [button] = await page.$x("//div[@class='elements']/button[contains(., 'Button text')]");
Explanation
To explain why using the text node (text()) is wrong in some cases, let's look at an example:
<div>
<button>Start End</button>
<button>Start <em>Middle</em> End</button>
</div>
First, let's check the results when using contains(text(), 'Text'):
//button[contains(text(), 'Start')] will return both buttons (as expected)
//button[contains(text(), 'End')] will return only one button (the first): for the second button, text() selects two text nodes ("Start " and " End"), but contains() converts that node-set to a string using only the first node
//button[contains(text(), 'Middle')] will return no results, as text() does not include the text of child elements
Here are the same expressions with contains(., 'Text'), which checks the string value of the element itself, including the text of its child nodes:
//button[contains(., 'Start')] will return both buttons
//button[contains(., 'End')] will again return both buttons
//button[contains(., 'Middle')] will return one (the last button)
So in most cases, it makes more sense to use the . instead of text() in an XPath expression.
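The two conversions can be modelled in a few lines of plain JavaScript. This sketch is not a real DOM or XPath engine: the { tag, children } tree shape and the function names are made up for illustration, with strings standing in for text nodes. It reproduces the counts listed above for the Start/Middle/End example.

```javascript
// Simplified element model: children are strings (text nodes) or nested elements.
const buttons = [
  { tag: 'button', children: ['Start End'] },
  { tag: 'button', children: ['Start ', { tag: 'em', children: ['Middle'] }, ' End'] },
];

// String-value of an element (what "." converts to): all descendant text, concatenated.
function stringValue(node) {
  return node.children
    .map((child) => (typeof child === 'string' ? child : stringValue(child)))
    .join('');
}

// contains(text(), s): text() selects the direct text children, and XPath 1.0
// converts a node-set to a string using only its FIRST node ("" if empty).
function containsTextNode(node, s) {
  const firstText = node.children.find((child) => typeof child === 'string');
  return (firstText ?? '').includes(s);
}

// contains(., s): tests the full string-value, including child elements' text.
function containsDot(node, s) {
  return stringValue(node).includes(s);
}

for (const word of ['Start', 'End', 'Middle']) {
  const byText = buttons.filter((b) => containsTextNode(b, word)).length;
  const byDot = buttons.filter((b) => containsDot(b, word)).length;
  console.log(`${word}: text() matches ${byText}, . matches ${byDot}`);
}
```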
You can use an XPath selector with page.$x(expression):
const linkHandlers = await page.$x("//a[contains(text(), 'Some text')]");
if (linkHandlers.length > 0) {
await linkHandlers[0].click();
} else {
throw new Error("Link not found");
}
Check out clickByText in this gist for a complete example. It takes care of escaping quotes, which is a bit tricky with XPath expressions.
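The gist is not reproduced here, but the quoting problem itself is easy to show. XPath 1.0 string literals have no escape characters, so text containing both ' and " has to be assembled with concat(). The sketch below is one common way to do that; xpathLiteral is a hypothetical helper name, not part of Puppeteer.

```javascript
// Build an XPath 1.0 string literal for arbitrary text (hypothetical helper).
function xpathLiteral(text) {
  // No single quotes: wrap in single quotes.
  if (!text.includes("'")) {
    return `'${text}'`;
  }
  // No double quotes: wrap in double quotes.
  if (!text.includes('"')) {
    return `"${text}"`;
  }
  // Both kinds present: split on single quotes and rejoin the pieces with
  // "'" literals inside concat(). There are always at least two pieces here.
  const parts = text.split("'").map((part) => `'${part}'`);
  return `concat(${parts.join(`, "'", `)})`;
}

// Usage (not run here):
// const [link] = await page.$x(`//a[contains(., ${xpathLiteral(label)})]`);
```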
You can also use page.evaluate() to click elements obtained from document.querySelectorAll() that have been filtered by text content:
await page.evaluate(() => {
[...document.querySelectorAll('.elements button')].find(element => element.textContent === 'Button text')?.click();
});
Alternatively, you can use page.evaluate() to click an element based on its text content using document.evaluate() and a corresponding XPath expression:
await page.evaluate(() => {
const xpath = '//*[@class="elements"]//button[contains(text(), "Button text")]';
const result = document.evaluate(xpath, document, null, XPathResult.FIRST_ORDERED_NODE_TYPE, null);
result.singleNodeValue?.click(); // singleNodeValue is null when nothing matches
});
I made a quick solution (the puppeteer-select library) so you can use advanced CSS selectors like ":contains(text)".
With this library you can just do:
const select = require('puppeteer-select');
const element = await select(page).getElement('button:contains(Button text)');
await element.click();
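The solution is to filter and click inside the page, because $$eval only returns serializable data, not elements you can click from Node:
await page.$$eval(selector, elements => {
  const match = elements.find(el => el.textContent === 'target text');
  if (match) match.click();
});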
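Here is my solution:
let selector = 'a';
await page.$$eval(selector, anchors => {
  // find() stops at the first match; map() with a return would keep going and click every match
  const anchor = anchors.find(anchor => anchor.textContent === 'target text');
  if (anchor) anchor.click();
});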
There is no supported CSS selector syntax for matching by text, nor a combinator option for it, so my workaround is:
await page.$$eval('selector', selectorMatched => {
  for (const element of selectorMatched) {
    if (element.textContent === 'text string') {
      element.click();
      break; // Remove this break if you want to click every matched element; otherwise only the first match is clicked
    }
  }
});
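The three $$eval answers above share one idea: filter inside the page and click the first match. Below is a sketch of that callback logic as a plain function so it can run outside a browser; clickFirstByText is a hypothetical name, and unlike the answers it trims textContent before comparing, which is an assumption about the markup. The commented usage shows how the same logic might be inlined into page.$$eval, passing the text as an extra argument.

```javascript
// Click the first element whose trimmed textContent equals `text`.
// Returns whether anything was clicked instead of throwing on undefined.click().
function clickFirstByText(elements, text) {
  const match = elements.find((el) => el.textContent.trim() === text);
  if (!match) {
    return false;
  }
  match.click();
  return true;
}

// Usage inside Puppeteer (not run here). The page function cannot see
// functions defined in Node, so the logic is inlined:
// const clicked = await page.$$eval('a', (elements, text) => {
//   const match = elements.find((el) => el.textContent.trim() === text);
//   if (match) match.click();
//   return Boolean(match);
// }, 'target text');
```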
Since OP's use case appears to be an exact match on the target string "Button text", <button>Button text</button>, text() seems like the correct method rather than the less-precise contains().
Although Thomas makes a good argument for contains when there are sub-elements, avoiding false negatives, using text() avoids a false positive when the button is, say, <button>Button text and more stuff</button>, which seems just as likely a scenario. It's useful to have both tools on hand so you can pick the more appropriate one on a case-by-case basis.
const xp = '//*[@class="elements"]//button[text()="Button text"]';
const [el] = await page.$x(xp);
await el?.click();
Note that many other answers missed the .elements parent class requirement.
Another XPath function is [normalize-space()="Button text"] which "strips leading and trailing white-space from a string, replaces sequences of whitespace characters by a single space" and may be useful for certain cases.
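To compare the matching strategies outside the browser, here is a hedged JavaScript sketch of normalize-space() plus the three kinds of predicate discussed in this answer. The function names are hypothetical. exact stands in for [text()="..."] only for an element with a single text node, since XPath's = against a node-set succeeds if any of its text nodes matches.

```javascript
// JavaScript equivalent of XPath 1.0 normalize-space(): collapse runs of
// space, tab, CR and LF into one space, then drop a leading/trailing space.
function normalizeSpace(s) {
  return s.replace(/[ \t\r\n]+/g, ' ').replace(/^ | $/g, '');
}

// Exact match, like [text()="..."] on a single text node.
const exact = (text, target) => text === target;
// Substring match, like [contains(., "...")]: tolerant, but can over-match.
const containsMatch = (text, target) => text.includes(target);
// Whitespace-insensitive exact match, like [normalize-space()="..."].
const normalized = (text, target) => normalizeSpace(text) === target;

const candidate = 'Button text and more stuff';
console.log(exact(candidate, 'Button text'), containsMatch(candidate, 'Button text'));
```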
Also, it's often handy to use waitForXPath which waits for, then returns, the element matching the XPath or throws if it's not found within the specified timeout:
const xp = '//*[@class="elements"]//button[text()="Button text"]';
const el = await page.waitForXPath(xp);
await el.click();
With Puppeteer 12.0.1, the following works for me:
await page.click("input[value='Opt1']"); //where value is an attribute of the element input
await page.waitForTimeout(1000);
await page.click("li[value='Nested choice 1']"); //where value is an attribute of the element li after clicking the previous option
await page.waitForTimeout(5000);
I had to:
await this.page.$eval(this.menuSelector, elem => elem.click());
You can also just call document.querySelector() inside page.evaluate():
await page.evaluate(() => {
document.querySelector('input[type=button]').click();
});
Edit:
You can give your button a class name and use it to select the button element, since you know exactly what you're trying to click:
await page.evaluate(() => {
document.querySelector('.button').click();
});

How to check if an element is in the document with playwright?

I want to test whether an element has been rendered, so I want to expect that it is present. Is there a command for this?
await page.goto('http://localhost:3000/');
const logo = await page.$('.logo');
// expect(logo.toBeInDocument())
If you query one element with page.$(), you can simply use:
const logo = await page.$('.logo');
if (logo) {
  // page.$() resolved to an element handle rather than null, so the element exists
}
If you query multiple elements with page.$$(), the awaited result is an array of element handles. An empty array is still truthy in JavaScript, so check its length property instead of the array itself:
const logo = await page.$$('.logo');
if (logo.length) {
  // at least one matching element exists
}
The key in all these examples is to await the promise that page.$() and page.$$() return.
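To see why the await matters, the sketch below uses hypothetical async stand-ins that resolve the way page.$() and page.$$() do when nothing matches (null and an empty array). No Playwright is needed: an un-awaited Promise is always truthy, and so is an empty array, which is why length is the reliable check for page.$$().

```javascript
// Hypothetical stand-ins for page.$() and page.$$() when nothing matches.
async function fakeQueryOne() {
  return null;
}
async function fakeQueryAll() {
  return [];
}

async function main() {
  const pending = fakeQueryOne();        // missing await: this is a Promise
  const handle = await fakeQueryOne();   // awaited: null
  const handles = await fakeQueryAll();  // awaited: []

  const results = {
    pendingIsTruthy: Boolean(pending),     // a Promise object is always truthy
    handleIsTruthy: Boolean(handle),       // null: nothing was found
    emptyArrayIsTruthy: Boolean(handles),  // even an empty array is truthy
    hasMatches: handles.length > 0,        // length is the reliable check
  };
  console.log(results);
  return results;
}

main();
```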
Since the use of ElementHandle (page.$(), page.$$()) is discouraged by the Playwright Team, you could use the Locator object and the count() method:
expect(await page.locator('data-testid=exists').count()).toBeTruthy();
expect(await page.locator('data-testid=doesnt-exist').count()).toBeFalsy();
If you want to check if the element is rendered (I assume you mean visible) you could use the toBeVisible assertion:
await expect(page.locator('data-testid=is-visible')).toBeVisible();

Using puppeteer how do you get all child nodes of a node?

I'm having trouble finding a way to iterate subnodes of a given node in puppeteer. I do not know the html structure beforehand, just the id of the parent element.
var elemId = "myelemid";
const doc = await page._client.send('DOM.getDocument');
const node = await page._client.send('DOM.querySelector', {
nodeId: doc.root.nodeId,
selector: '#' + elemId
});
//node.children empty
//node['object'].children empty
//try requesting childnodes
var id = node.nodeId;
var childNodes = await page._client.send('DOM.requestChildNodes', {
nodeId: id
});
//childNodes empty
//try requesting by resolveNode?
var aNode = await page._client.send('DOM.resolveNode', {
nodeId: id
});
//aNode.children is empty
Is there a way to get the children of a node if you don't know the html structure in puppeteer?
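What I would do here is use Puppeteer's $eval method to return information about your node's children to your script. DOM elements themselves cannot be sent back from the page because they are not serializable, so map each child to plain data first:
const nodeChildren = await page.$eval(cssSelector, (uiElement) => {
  return Array.from(uiElement.children, (child) => ({
    tag: child.tagName,
    id: child.id,
    text: child.textContent,
  }));
});
console.log(nodeChildren); // Outputs an array describing the node's direct children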
Hope this helps!
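If you need the whole subtree rather than just the direct children, one approach is to walk it recursively and return plain data. This is a sketch under assumptions: serializeElement is a hypothetical name, it keeps only the tag name and id, and maxDepth is an arbitrary safety limit. It is written as a standalone function so it can be tested with plain objects; inside page.$eval the walker has to be defined within the callback, because that callback runs in the page and cannot reach functions defined in Node.

```javascript
// Recursively convert an element and its descendants into plain, serializable
// objects. Only tagName, id and children are read, so plain objects work too.
function serializeElement(el, depth = 0, maxDepth = 10) {
  return {
    tag: el.tagName.toLowerCase(),
    id: el.id || null,
    // Past maxDepth, stop descending so the result stays bounded.
    children:
      depth >= maxDepth
        ? []
        : Array.from(el.children, (child) => serializeElement(child, depth + 1, maxDepth)),
  };
}

// Usage inside Puppeteer (not run here), with the walker defined in the callback:
// const tree = await page.$eval('#myelemid', (root) => {
//   const walk = (el) => ({
//     tag: el.tagName.toLowerCase(),
//     id: el.id || null,
//     children: Array.from(el.children, walk),
//   });
//   return walk(root);
// });
```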
I ended up using page.evaluate to run some JS that adds unique class names to every element and sub-element I want analyzed, and then passing those back as a JSON string, since page.evaluate can only return serializable values, not live DOM nodes. Then I just call DOM.querySelector on each of those unique selectors and loop through them that way.
Returning children from page.$eval doesn't give me protocol nodes that I can run more dev protocol functions on, and xpath doesn't solve my real problem because it can't recursively loop through all sub-children, then the sub-children of those children.
I'm closing the issue since labeling using unique classnames solves my problem.

Office JS issue with recognising ListItems

I'm trying to add a paragraph at the end of the document while avoiding the possibility that the new paragraph ends up inside a list (if the document ends with a list).
I have the following code:
let paragraph = paragraphs.items[paragraphs.items.length - 1]
let p = paragraph.insertParagraph('', window.Word.InsertLocation.after)
if (paragraph.listItemOrNullObject) {
p.detachFromList()
p.leftIndent = 0
}
Here is what happens: if the last paragraph is a ListItem, the code works. If not, it fails inside the if block, just as if I had written paragraph.listItem.
Isn't this how it is supposed to be used?
EDIT: error thrown:
name: "OfficeExtension.Error"
code: "GeneralException"
message: "GeneralException"
traceMessages: []
innerError: null
debugInfo: {
  code: "GeneralException"
  message: "GeneralException"
  toString: function () { return JSON.stringify(this) }
  errorLocation: "Paragraph.detachFromList"
}
The issue here is that the *OrNullObject methods/properties do not return a regular JS null, but a NullObject (a special framework type of null). You have to load it, sync, and then check its isNullObject property.
Check out this code; I rewrote it in what I think is a more efficient way. Excuse my JS, you can port it to TS.
Hope this helps.
Word.run(function (context) {
var listI = context.document.body.paragraphs.getLast().listItemOrNullObject;
context.load(listI);
return context.sync()
.then(function () {
if (listI.isNullObject) { // check out how i am validating if its null.
console.log("there is no list at the end")
}
else {
context.document.body.paragraphs.getLast().detachFromList();
context.document.body.paragraphs.getLast().leftIndent = 0;
return context.sync();
}
})
})
listItemOrNullObject returns a null object (not JavaScript null) if the paragraph isn't a ListItem, and that object is truthy. Conceptually, your if is asking "is this a list item, or is it not a list item", which is always true.
It fails because you are attempting to detach a paragraph from a list that doesn't exist. I would take a look at isListItem: it tells you specifically whether the paragraph is a ListItem, so you only execute p.detachFromList() when it actually is part of a list. Like other properties, it has to be loaded and synced before you read it.

How to remove property from Cheerio object?

I need to get all elements except the first one with Cheerio. So I select them all and then try to delete the first, but when I then loop over the elements I get an error that the first element is undefined:
var categories = $('.subtitle1');
console.log(categories.length);
delete categories[0];
delete categories['0'];
console.log(categories.length);
categories.each(function(i, element) {
console.log(element.children);
});
Result:
15
15
TypeError: Cannot read property 'children' of undefined....
If I comment out the delete lines, everything works fine, except that I still have the first element.
Maybe this could solve your problem:
var $element = $(<htmlElement>);
$element = $element.slice(1); // Return all except first.
// $element = $element.slice(1, $element.length);
Documentation: https://github.com/cheeriojs/cheerio#slice-start-end-
So in your case this should work:
var categories = $('.subtitle1').slice(1);
categories.each(function(i, element) {
console.log(element.children);
});
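The TypeError comes from how delete behaves on an array-like object. The sketch below uses a plain object shaped like a Cheerio selection (numeric keys plus length) rather than Cheerio itself: delete removes index 0 but leaves length unchanged, so a loop from 0 to length - 1 still visits index 0 and gets undefined, which matches the error in the question. slice() builds a new, dense collection instead.

```javascript
// Plain array-like object standing in for a Cheerio selection (illustration only).
const selection = { 0: 'first', 1: 'second', 2: 'third', length: 3 };

// delete removes the key but not the slot: length stays 3 and index 0 is a hole.
const withHole = { ...selection };
delete withHole[0];
console.log(withHole.length, withHole[0]); // 3 undefined

// Array.prototype.slice works on array-likes and returns a new, dense array.
const rest = Array.prototype.slice.call(selection, 1);
console.log(rest.length); // 2
```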
