Cheerio scraping: Not able to find elements in HTML response - node.js

I have an HTML response like:
<table id="\"tableone\"" class="\"sortable">
Now, when I try to find the element with the ID #tableone, it returns nothing, but if I search using 'table' it works.
var $ = cheerio.load(html);
console.log('tableone: ' + $('#tableone').length); // prints 0

Try an attribute wildcard selector. In your case you can use the *= ("contains") selector, which matches every element whose id contains the substring tableone.
You can try it like this:
console.log('tableone:'+ $('[id*=tableone]').length);
Hope this helps!
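Why the exact-id lookup most likely fails: the id attribute in the response is double-escaped, so the parsed id value is the string "tableone" including the literal quote characters. A stdlib-only Python sketch (no cheerio involved; the sample markup mimics what the escaped attribute amounts to) shows the same effect:

```python
# Stdlib-only look at what an HTML parser sees when the id attribute arrives
# double-escaped: the parsed id value is '"tableone"' INCLUDING the quote
# characters, so an exact-id lookup finds nothing while a substring match
# succeeds. (Sample markup is hypothetical.)
from html.parser import HTMLParser

class IdCollector(HTMLParser):
    def __init__(self):
        super().__init__()
        self.ids = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == 'id':
                self.ids.append(value)

html = "<table id='\"tableone\"' class='\"sortable'><tr><td>data</td></tr></table>"
parser = IdCollector()
parser.feed(html)

exact = [v for v in parser.ids if v == 'tableone']        # what #tableone matches
substring = [v for v in parser.ids if 'tableone' in v]    # what [id*=tableone] matches
print(exact, substring)  # [] ['"tableone"']
```

The substring selector works precisely because it ignores the stray quotes; the cleaner fix is to unescape the HTML before handing it to cheerio.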


Selenium: Stale Element Reference Exception Error

I am trying to loop through all the pages of a website, but I am getting a "stale element reference: element is not attached to the page document" error. This happens when the script tries to click the third page; the error is raised at page.click(). Any suggestions?
while driver.find_element_by_id('jsGrid_vgAllCases').find_elements_by_tag_name('a')[-1].text == '...':
    links = driver.find_element_by_id('jsGrid_vgAllCases').find_elements_by_tag_name('a')
    for link in links:
        if (link.text != '...') and (link.text != 'ADD DOCUMENTS'):
            print('Page Number: ' + link.text)
            print('Page Position: ' + str(links.index(link)))
            position = links.index(link)
            page = driver.find_element_by_id('jsGrid_vgAllCases').find_elements_by_tag_name('a')[position]
            page.click()
            time.sleep(5)
    driver.find_element_by_id('jsGrid_vgAllCases').find_elements_by_tag_name('a')[-1].click()
You can re-locate the link element each time by its index instead of reusing the elements found initially.
Something like this:
amount = len(driver.find_element_by_id('jsGrid_vgAllCases').find_elements_by_tag_name('a'))
for i in range(1, amount + 1):
    link = driver.find_element_by_xpath("(//*[@id='jsGrid_vgAllCases']//a)[" + str(i) + "]")
From here you can continue within your for loop with this link, like this:
amount = len(driver.find_element_by_id('jsGrid_vgAllCases').find_elements_by_tag_name('a'))
for i in range(1, amount + 1):
    link = driver.find_element_by_xpath("(//*[@id='jsGrid_vgAllCases']//a)[" + str(i) + "]")
    if (link.text != '...') and (link.text != 'ADD DOCUMENTS'):
        print('Page Number: ' + link.text)
        print('Page Position: ' + str(i))
        link.click()  # the freshly located element, so it cannot be stale
        time.sleep(5)
(I'm not sure about the correctness of the rest of your code; I just copy-pasted it.)
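To see why re-locating on every iteration fixes this, here is a browser-free simulation in plain Python (the Page and Element classes are invented for illustration; a click "re-renders" the page, which invalidates every previously fetched handle, much as Selenium invalidates its WebElements):

```python
class StaleElementError(Exception):
    pass

class Element:
    """A handle tied to the DOM version it was fetched from."""
    def __init__(self, page, text):
        self.page = page
        self.version = page.version
        self.text = text

    def click(self):
        if self.version != self.page.version:
            raise StaleElementError(self.text)  # handle outlived the render
        self.page.version += 1                  # the click triggers a re-render

class Page:
    def __init__(self, labels):
        self.version = 0
        self.labels = labels

    def find_links(self):
        return [Element(self, t) for t in self.labels]

# Buggy pattern: reuse handles fetched once; every click after the first
# re-render blows up.
page = Page(['1', '2', '3'])
stale_hits = 0
for link in page.find_links():
    try:
        link.click()
    except StaleElementError:
        stale_hits += 1

# Fixed pattern: re-locate a fresh handle by index on every iteration.
page = Page(['1', '2', '3'])
clicked = 0
for i in range(len(page.find_links())):
    page.find_links()[i].click()
    clicked += 1

print(stale_hits, clicked)  # 2 3
```

In the buggy loop only the first click succeeds; the other two handles were fetched before the re-render. The fixed loop never holds a handle across a render, which is exactly what the indexed-XPath answer above does.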
I'm running into an issue with the Stale Element Exception too. Interestingly, with Firefox there is no problem, while Chrome and Edge both fail randomly. In general I have two generic find methods with retry logic; they look like:
// Yes, C#, but this should be relevant for any WebDriver...
// (method name below is a placeholder)
public static IWebElement FindWithRetry(this IWebDriver driver, By locator)
public static IWebElement FindWithRetry(this IWebElement element, By locator)
The WebDriver variant seems to work fine for my other fetches, as the search is always "fresh". But the WebElement search is the one causing grief, and unfortunately the app forces me to use the WebElement version. The page HTML will be something like:
<node id='Best closest ID Possible'>
<span>
<div>text i want</div>
<div>meh ignore this </div>
<div>More text i want</div>
</span>
<span>
<!-- same pattern ... -->
So the code gets the closest element possible by id, and the child spans, i.e. "//*[@id='...']/span", give all the nodes of interest. This is where I run into issues: enumerating each element, I do two XPath selects, i.e. "./div[1]" and "./div[3]", to pull out the text I want. It is only in fetching the text nodes under the elements that a StaleElement is randomly thrown. Sometimes the very first XPath fails; sometimes I'll get through a few pages. The site may have 10,000s of pages or more, and while the structure is the same, I spot-check random pages since they all share the same format. At most I've gotten through 20 consecutive pages with Chrome (ver 92.0.4515.107) or Edge (ver 94.0.986), both seemingly the latest as of now.
One solution that should work: get all the span elements first, i.e. "//*[@id='x']/span", to get my list, then query from the driver like:
var nodeList = driver.FindElements(By.XPath("//*[@id='x']/span"));
for (int idx = 0; idx < nodeList.Count; idx++)
{
    string str1 = driver.FindElement(By.XPath("//*[@id='x']/span[" + (idx + 1) + "]/div[1]")).GetAttribute("innerText");
    string str2 = driver.FindElement(By.XPath("//*[@id='x']/span[" + (idx + 1) + "]/div[3]")).GetAttribute("innerText");
}
I think it would work, but yuk! This is somewhat simplified, and being able to run an XPath from the respective "ID"-located node would be preferable.

Scraping specific attribute in tr tag

allId = soup.find_all("tr", "data-id")
I just want to take the data-id values. How can I scrape these attributes?
To fetch the value of data-id, try this:
allId = soup.find_all("tr", attrs={"data-id": True})
for item in allId:
    print(item['data-id'])
You can also use a CSS selector:
allId = soup.select("tr[data-id]")
for item in allId:
    print(item['data-id'])
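The same extraction can be shown end to end without BeautifulSoup, using only the standard library (the sample table rows below are invented for illustration):

```python
# Stdlib-only equivalent of find_all("tr", attrs={"data-id": True}):
# collect the data-id attribute of every <tr> that actually carries one.
from html.parser import HTMLParser

class DataIdCollector(HTMLParser):
    def __init__(self):
        super().__init__()
        self.data_ids = []

    def handle_starttag(self, tag, attrs):
        if tag == 'tr':
            attr_map = dict(attrs)
            if 'data-id' in attr_map:
                self.data_ids.append(attr_map['data-id'])

html = ('<table>'
        '<tr data-id="101"><td>a</td></tr>'
        '<tr><td>no data-id here</td></tr>'
        '<tr data-id="102"><td>b</td></tr>'
        '</table>')
collector = DataIdCollector()
collector.feed(html)
print(collector.data_ids)  # ['101', '102']
```

Note that attrs={"data-id": True} means "the attribute must be present, whatever its value", which is exactly the presence check performed above.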

How to get href values from a class - Python - Selenium

<a class="link__f5415c25" href="/profiles/people/1515754-andrea-jung" title="Andrea Jung">
I have the above HTML element and tried using
driver.find_elements_by_class_name('link__f5415c25')
and
driver.get_attribute('href')
but it doesn't work at all. I expected to extract the values in href.
How can I do that? Thanks!
You have to first locate the element, then retrieve the attribute href, like so:
href = driver.find_element_by_class_name('link__f5415c25').get_attribute('href')
If there are multiple links associated with that class name, you can try something like:
eList = driver.find_elements_by_class_name('link__f5415c25')
hrefList = []
for e in eList:
    hrefList.append(e.get_attribute('href'))
for href in hrefList:
    print(href)
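If the markup is static, the same class-then-href logic can be reproduced without a browser at all; a stdlib-only sketch (the sample anchors are invented, and only the first carries the target class):

```python
# Stdlib-only version of "find every <a> with this class, collect its href".
from html.parser import HTMLParser

class HrefCollector(HTMLParser):
    def __init__(self, class_name):
        super().__init__()
        self.class_name = class_name
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        # class is space-separated, so split before membership testing
        if tag == 'a' and self.class_name in attr_map.get('class', '').split():
            self.hrefs.append(attr_map.get('href'))

html = ('<a class="link__f5415c25" href="/profiles/people/1515754-andrea-jung" '
        'title="Andrea Jung">Andrea Jung</a>'
        '<a class="other" href="/profiles/skip-me">skip</a>')
collector = HrefCollector('link__f5415c25')
collector.feed(html)
print(collector.hrefs)  # ['/profiles/people/1515754-andrea-jung']
```

The Selenium route is still needed when the links are rendered by JavaScript; this is only an option when the anchors are already in the page source.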

How to handle more than one element on a page with the same XPath

I have more than one element on the page with the same XPath and the same id/name.
There are two fields on the page with the same locators. I tried to enter a value at the desired location with the code below:
element.all(by.xpath('//*[@id="testInstanceScan"]')).get(1).sendKeys('Vkumar');
but I got this error message:
Failed: Index out of bound. Trying to access element at index: 1, but there are only 1 elements that match locator By(xpath, //*[@id="testInstanceScan"])
If I use
element.all('#testInstanceScan').get(1).sendKeys('Vkumar');
I get this error:
Failed: Invalid locator
Stack:
TypeError: Invalid locator
The XPath is:
//*[@id="TubeExpirationDate"]
Please advise.
Check element.all() and filter() or each(), and something like cssContainingText(), on the Protractor API page.
To improve your code (which doesn't work as written: count() returns a promise, element.all() needs a real locator rather than a bare string, and clear() has to be called):
element.all(by.xpath('//*[@id="TubeExpirationDate"]')).count().then(function (count) {
    for (var i = 0; i < count; i++) {
        element.all(by.xpath('//*[@id="TubeExpirationDate"]')).get(i).clear();
        element.all(by.xpath('//*[@id="TubeExpirationDate"]')).get(i).sendKeys('11252017');
        browser.driver.sleep(2000);
    }
});
Or try something like this:
let j = 0;
let entries = ['july', 'august', 'september', 'restOfYear'];
$$('#TubeExpirationDate').each(function (elem) {
    elem.clear();
    elem.sendKeys('11' + entries[j] + '2017');
    j++;
    //browser.sleep(2000); // I suggest not using browser.sleep at all
});
But better would be to clearly identify the exact object you'd like to send your input to, by using CSS selectors for unique identification. Such as:
let elemGroup = $$('#TubeExpirationDate');
elemGroup.$('div[indiv-index="0"]').clear();
elemGroup.$('div[indiv-index="0"]').sendKeys('11-01-2017');
elemGroup.$('div[indiv-index="1"]').clear();
elemGroup.$('div[indiv-index="1"]').sendKeys('11-02-2017');
// or, if you want to use a counter variable
elemGroup.$('div[indiv-index="' + j + '"]').clear();
elemGroup.$('div[indiv-index="' + j + '"]').sendKeys('11' + entries[j] + '2017');
But as I already suggested, do also some research on Protractortest.org ... so many good explanations and examples are available there.
You need to use element.all(locator).each(eachFunction), similar to this post. element.all() returns a list of the elements identified by the common locator; .each() then loops through that list. The function(element, index) passed into .each() gives you access to each element and its index.
element.all(by.id('TubeExpirationDate')).each(function (element, index) {
    element.clear();
    var date = index + 3;
    element.sendKeys('11' + date + '2017');
});
As suggested by others, protractortest.org is a great resource to take the time to review.

Get element Id from a found element

I'm using the Chrome driver and Selenium tools in my CodedUI tests. I can find the element I need using the SearchProperties and a Contains operator however I need the full Id for subsequent searches.
For example I need to find an input element with Id "pm_modal_28".
This is easy enough by doing a search where Id contains "pm_modal".
I then need to parse the value "28" out of the Id that was found so I can search for the next nested element which has an Id of "dp_28".
When I use the Id property of HtmlDiv I get a NotSupportedException. Is there any way I can get all of the HTML attributes from an element, or get the Id from an element after it has been found?
Not sure if this is what you are after; once the control is identified, you have all its properties to play around with.
For example
var control = new HtmlDiv();
control.SearchProperties.Add("Id", "pm_modal", PropertyExpressionOperator.Contains);
if (!control.TryFind()) return;

// build the nested id ("dp_28") from the numeric suffix of the found id
var newControl = new HtmlDiv();
newControl.SearchProperties.Add("Id", "dp_" + control.Id.Substring(control.Id.LastIndexOf('_') + 1));
newControl.TryFind();
HtmlDiv myDiv = new HtmlDiv(browser);
// Add whatever search logic you want
myDiv.SearchProperties.Add("class", "ClassName");
string oneWayForId = myDiv.Id;
string anotherWay = myDiv.GetProperty(HtmlDiv.PropertyNames.Id).ToString(); // or you can simply pass "Id"
See if that works!
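The id arithmetic itself is worth seeing in isolation, sketched here in Python for brevity (ids taken from the question):

```python
# The suffix-parsing step on its own: take the id found via the Contains
# search, pull off its numeric suffix, and build the nested element's id.
found_id = 'pm_modal_28'             # id located by the Contains search
suffix = found_id.rsplit('_', 1)[1]  # split once from the right
nested_id = 'dp_' + suffix
print(suffix, nested_id)  # 28 dp_28
```

Splitting from the right is the safer choice here, since it keeps working even if the id prefix itself contains underscores (as pm_modal does).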
