Use puppeteer to search for element based on inner text - node.js

I've been attempting to do some web scraping using puppeteer, and I've run into the following problem: I want to click an element that has a specific inner text (in this case 'INHERITANCE TAX RETURN'), but everything else about the element seems to be identical to a lot of other elements on the page. I was wondering if anyone knew a way to search for an element based on its inner text. Any help would be greatly appreciated.

Have you tried:
const linkHandlers = await page.$x("//span[contains(text(), 'INHERITANCE TAX RETURN')]");
if (linkHandlers.length > 0) {
  await linkHandlers[0].click();
} else {
  throw new Error("Link not found");
}
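One thing to watch with the XPath approach: XPath 1.0 has no escape character inside string literals, so a label that itself contains quotes would break a hand-written expression. A minimal sketch of building the expression safely follows; the helper names xpathLiteral and spanContainingText are made up for illustration and are not part of puppeteer.

```javascript
// Build an XPath 1.0 string literal for arbitrary text.
// XPath 1.0 has no escape character, so text containing both quote
// types has to be assembled with concat().
function xpathLiteral(text) {
  if (!text.includes("'")) return "'" + text + "'";
  if (!text.includes('"')) return '"' + text + '"';
  // Split on single quotes and re-join the pieces with a double-quoted "'".
  const parts = text.split("'").map(part => "'" + part + "'");
  return 'concat(' + parts.join(', "\'", ') + ')';
}

// Hypothetical helper: XPath for a <span> whose text contains `text`.
function spanContainingText(text) {
  return '//span[contains(text(), ' + xpathLiteral(text) + ')]';
}
```

The result would then be passed where the literal expression is used above, e.g. page.$x(spanContainingText('INHERITANCE TAX RETURN')).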

Related

Office Js detachFromList() online Word issue

This is my code:
let paragraph = paragraphs.items[paragraphs.items.length - 1];
let p = paragraph.insertParagraph('', window.Word.InsertLocation.after);
if (paragraph.isListItem) {
p.detachFromList()
p.leftIndent = 0
}
It works nicely, thanks to the help from MS people watching SO issues.
However, that works on the desktop version of Word. The online version of Word does not end the bullet list when using some templates so the new content (p) is added as a part of the list. No error is thrown.
I've tried playing around with paragraph.insertBreak('line') at a few places, but I'm unsure what would be the best thing to do here in order to keep the same user experience across platforms.
Any way I can do this so it works the same both on the desktop and on the online version of Word? Tested in Chrome, used the template General Notes for testing bullet lists. Works fine online if bullet list is generated by user, through the Home menu, but General Notes doesn't work.
I just tried the exact same code I sent you in Word Online, and it also works, so you should be fine...
Try applying this after you insert your paragraph at the end.
Word.run(function (context) {
var listI = context.document.body.paragraphs.getLast().listItemOrNullObject;
context.load(listI);
return context.sync()
.then(function () {
if (listI.isNullObject) { // check out how i am validating if its null.
console.log("there is no list at the end")
}
else {
context.document.body.paragraphs.getLast().detachFromList();
context.document.body.paragraphs.getLast().leftIndent = 0;
return context.sync();
}
})
})

nightmare.js trying to click on link based on anchor text

I'm trying to use nightmare, in node js to click on links based on the text inside the anchor text of the link.
Here's some example code:
var Nightmare = require('nightmare');
var nightmare = Nightmare({show: true})
nightmare
.goto('https://www.wikipedia.org/')
.inject('js', 'C:/users/myname/desktop/nodejs/node_modules/jquery/dist/jquery.js')
.wait(500)
var selector = 'a';
nightmare
.evaluate(function (selector) {
// now we're executing inside the browser scope.
return document.querySelector(selector).innerText;
}, selector) // <-- that's how you pass parameters from Node scope to browser scope
.end()
.then(function(result) {
console.log(result)
})
I'm really unclear on why the inner text of all the tags is not being returned. I thought I could maybe do an if statement in the .evaluate method, so that it would restrict the link to be clicked on to "English" for instance.
Any idea how to click on links based on the link text?
As far as I know, there is no way to select a DOM element with a CSS selector based solely on what it contains. You'll either need to select all of the anchors (like you're doing now), filter to the one you want based on innerText and then issue click events directly, or you could inject jQuery and use :contains and $.click() to issue the click.
Also, if you want all of the text from the tags, you'll likely want to use document.querySelectorAll().
As an example to get all of the text:
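.evaluate(function (selector) {
// querySelectorAll returns a NodeList, which has no .map, so convert it to an array first
return Array.from(document.querySelectorAll(selector))
.map(element => element.innerText);
}, selector)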
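The "select all anchors, then filter on innerText" approach described above could look roughly like the following sketch. The helper name findByText is hypothetical, and because .evaluate() runs in the browser scope, the function body would have to be defined inside the evaluate callback rather than in Node.

```javascript
// Return the first element matching `selector` whose innerText contains
// `text`, or null when nothing matches.
// `root` is anything exposing querySelectorAll (normally `document`).
function findByText(root, selector, text) {
  const elements = Array.from(root.querySelectorAll(selector));
  return elements.find(el => el.innerText.includes(text)) || null;
}

// Inside the .evaluate() callback this might be used as:
//   const link = findByText(document, 'a', 'English');
//   if (link) link.click();
```

Matching with includes rather than strict equality is a deliberate choice here, since link text often carries extra whitespace or a subtitle.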

Element is not currently visible and so may not be interacted with node and selenium driver

I have the following code and I cannot get the driver to click the div. It keeps throwing the error
"Element is not currently visible and so may not be interacted"
When debugging you can clearly see that the element is visible. How can I ignore the warning or the error?
var webdriver = require('selenium-webdriver')
, By = webdriver.By
, until = webdriver.until;
var driver = new webdriver.Builder().forBrowser('firefox').build();
driver.get('http://www.vapeworld.com/');
driver.manage().timeouts().implicitlyWait(10, 10000);
var hrefs = driver.findElements(webdriver.By.tagName("a"));
hrefs.then(function (elements) {
elements.forEach(function (element) {
element.getAttribute('name').then(function (obj) {
if (obj == '1_name') {
console.log(obj);
element.click();
}
});
});
});
Your code is clicking an A tag with the name "1_name". I'm looking at the page right now and that element doesn't exist, hidden or otherwise.
You'd be better served by replacing the bulk of your code with a CSS selector, "a[name='1_name']" or "a[name='" + tagName + "']", that will find the element you want with a single find. You can then click on that element.
The issue you are running into is that the element you are trying to click is not visible, thus the error message. Selenium is designed to only interact with elements that the user can see, which would be visible elements. You will need to find the element you are looking for and figure out how to make it visible. It may be clicking another link on the page or scrolling a panel over, etc.
If you don't care about user scenarios and just want to click the element, visible or not, look into .executeScript().
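As a rough sketch of the single-selector suggestion, the attribute selector can be built by a small helper so that an unusual name value does not produce an invalid selector. The name anchorByName is an assumption made for this example, not something from selenium-webdriver.

```javascript
// Build an attribute selector such as a[name="1_name"].
// Backslashes and double quotes are escaped so the value stays a valid CSS string.
function anchorByName(name) {
  const escaped = String(name).replace(/\\/g, '\\\\').replace(/"/g, '\\"');
  return 'a[name="' + escaped + '"]';
}
```

With the driver from the question, this would be used along the lines of driver.findElement(By.css(anchorByName('1_name'))).click().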
Looked at the website and used the F12 tool (Chrome) to investigate the page:
var elements = [].slice.call(document.getElementsByTagName("a"));
var elementNames = elements.map(function (x) { return x.getAttribute("name"); });
var filledElementNames = elementNames.filter(function (x) { return x != null; });
console.log(filledElementNames);
The content of the website http://www.vapeworld.com is very dynamic. Depending on the situation you get one or more anchors with "x_name" and not always "1_name": the output of the script in Chrome was ["2_name"] and Edge returns ["1_name", "9_name", "10_name", "17_name", "2_name"]. So "you can clearly see that the element is visible" is not true in all situations. Also there were some driver bugs on this subject so it is worthwhile to update the driver if needed. See also the answers in this SO question explaining all the criteria the driver uses. If you want to ignore this error you can catch this exception:
// click() returns a promise, so a failed click is handled with .catch()
element.click().catch(function (err) {
console.log("Error!", err.message);
});
See this documentation page for more explanation.

Material Angular md-autocomplete clear and blur after selection (multi select)

I am trying to use md-autocomplete in Angular Material as a multi selector. The idea is, that the selected element from the autocomplete will be added to an object array after selection and then the selection will be removed from the md-autocomplete. I was able to clear the md-autocomplete, but the focus stays on the md-autocomplete input and so the autocomplete suggestions are still visible.
Example:
http://cdpn.io/QjQGVQ
Code:
function selectedItemChange(item) {
$log.info('Item changed to ' + JSON.stringify(item));
if(item)
{
//check if item is already selected
if($filter('filter')(vm.contactsSelected, function (d) {return d.id === item.id;})[0])
{
$log.info('Item already selected. Will not add it again.');
}
else
{
//add id to object
vm.contactsSelected.push(item);
}
// clear search field
vm.searchText = '';
vm.selectedItem = undefined;
//somehow blur the autocomplete focus
//$mdAutocompleteCtrl.blur();
}
}
PS: I am aware I could use the contact chips of Angular Material instead, but I was still wondering how the blur could be achieved.
If you set the md-no-cache="true" attribute on your md-autocomplete, the list will disappear, but the input field will not be cleared. I think this is a better solution than clearing the input field but leaving the list visible, but it is up to you.

Remove parent div if duplicate words

I'm using a service called Embedly to style my RSS feeds from Google Feedburner. I have example code over here: JsFiddle
If you look closely you will see the source (CNN) at the end of every title. This is called .provider. I would like to get rid of the whole div (.embed) if the word CNN is located elsewhere in the div (meaning it is a duplicate), either in .description or in a
I tried many things; this is one of them, really straightforward code:
$('.embed').each(function() {
if($('.embed a:first **could also be .description**', this).text() == $('.provider', this).text())
$(this).remove();
});
I can't figure out why it's not working. I also tried it with on and live click handlers, with no luck.
I just realized the 'embeds' are not there on document ready. I added a button with a click event which you can click after Embedly has loaded: http://jsfiddle.net/2VBSX/37/
You can use the success event to filter on the provider class from the data, like this:
EDITED
$('div.newscontainer').embedly({
key: ':3eccf441bf0f43acbb076da9817af27d',
success: function(oembed, dict) {
output = $(oembed['code']);
description = $(oembed['code']).find(".description").text();
var regex =new RegExp(output.find('.provider').text(),"i");
if(regex.exec(description) == null ) {
$(dict["node"]).parent().html(output);
}
output.find("a:eq(0)").text(); // First
output.find("a:eq(1)").text(); // Provider
}
});
Check out this jsfiddle demo
Is this what you want?
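var regex = /CNN/;
$('.embed').each(function(index, element) {
// Scope both lookups to the current .embed instead of searching the whole page
if (regex.exec($(element).find('a:first').text()) != null
&& regex.exec($(element).find('.provider').text()) != null) {
$(element).remove();
}
});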
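Both answers build or use a regular expression from the provider name. If the provider text is taken from the page rather than hard-coded, it may contain characters that are special in a regex, so a sketch of the duplicate check with escaping is shown here. The function names escapeRegExp and mentionsProvider are assumptions for illustration only.

```javascript
// Escape characters that have a special meaning in a regular expression.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// True when `provider` occurs (case-insensitively) somewhere in `text`.
// An empty provider name never counts as a match.
function mentionsProvider(text, provider) {
  const name = provider.trim();
  if (name === '') return false;
  return new RegExp(escapeRegExp(name), 'i').test(text);
}
```

Inside the .each() loop, the description text and the .provider text of the current .embed would be passed in, and the div removed when the function returns true.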

Resources