I'm trying to parse an awful website and I need some help with using cheerio.
I know that if I want, for example, to get the HTML of the body of a document, I do
$('body','html').html();
How do I descend through multiple elements?
(What if I want to get html > body > font > table > tbody > tr ?)
Note: all of these elements have to be immediate children. I do not want to catch other non-immediate descendants (for example, if table > table existed).
You could just do:
$('html > body > font > table > tbody > tr').html()
Cheerio accepts the same selectors as jQuery, and the > child combinator matches immediate children only.
I'm very new to Selenium (3.141.0) and Python 3, and I have a problem that I can't figure out.
The html looks similar to this
<div class='a'>
<div>
<p><b>ABC</b></p>
<p><b>ABC#123</b></p>
<p><b>XYZ</b></p>
</div>
</div>
I want Selenium to find whether # exists inside that div. (I cannot target only the paragraph element, because sometimes the text I want to extract is inside a different element, but it is always inside that <div class='a'>.) If # exists, print the whole <p><b>ABC#123</b></p> (or sometimes <div>ABC#123</div>).
To find an element with contained text, you must use an XPath. From what you are describing, it looks like you want the locator
//div[@class='a']//*[contains(text(),'#')]
^ a DIV with class 'a'
^ that has a descendant element that contains the text '#' within itself or a descendant
The code would look something like
for e in driver.find_elements(By.XPATH, "//div[@class='a']//*[contains(text(),'#')]"):
    print(e.get_attribute('outerHTML'))
and it will print all instances of <b>ABC#123</b>, <div>ABC#123</div>, or <p>ABC#123</p>, whichever exists
I need to scrape some data off tags in a page which further has more DOM elements.
The articles are repeated and they have an xpath as:
//*[@id="post_page"]/div/div[2]/main/div/div/div/div[2]/div[2]/div/div[3]/div/article[N]
where 'N' represents the Nth article.
And within each article, the xpath for the element I'm interested in is:
/div/div/div/div/div/div/div[3]/div[1]/button[1]/span
The first thing I did was to use
Elements = driver.find_elements(By.XPATH, <first_path>)
And it fetched me all the articles in the page. PS: I did not add [N] because that would only fetch a specific article, and I'm interested in all.
Then, for each element in the list, I used find_element using the second path as follows:
for elem in Elements:
    Required.append(elem.find_element(By.XPATH, <second_path>))
Where Required is a list in which I'll be storing the data. And this is where I got the element does not exist error.
I also tried adding a . before <second_path> but that didn't solve the issue either.
The complete xpath of the element is:
//*[@id="post_page"]/div/div[2]/main/div/div/div/div[2]/div[2]/div/div[3]/div/article[N]/div/div/div/div/div/div/div[3]/div[1]/button[1]/span
And the CSS Selector for the same is:
#post_page > div > div._UuSG.w77Za._21rSD._3SBW4 > main > div > div > div > div._UuSG._ayWa._3dGg1.Vlb1o._1vyTb > div._UuSG.qzupC._3cqkW > div > div:nth-child(3) > div > article:nth-child(N) > div > div > div > div > div > div > div._UuSG._3VzCT._2FoTG > div._UuSG._3dGg1._2VJFi._2h1-g > button:nth-child(1) > span
I also tried an approach using a loop where I increment a counter variable and use that as N for the whole xpath, but that didn't seem to work either. Got the same error.
Any help would be greatly appreciated.
EDIT[1]
The last span has the following class names:
<span class="_UuSG _3_54N a8-QN _2cSLK L4pn5 RiX17">Stuff I need</span>
Which are unique (collectively) in the page. This information might be relevant somehow.
I think I know your problem. When you do
Elements = driver.find_elements(By.XPATH, <first_path>)
you have already found all the elements you need here. So in your for loop, just use elem, no more "finding" is needed.
for elem in Elements:
    Required.append(elem)
I would use .// to select along the descendant-or-self axis starting from the current node (. means the current node).
You have already tried with ./, which is pretty close.
xpath ".//span", what does the dot mean?
What is meaning of .// in XPath?
I have been struggling with this for a while, but to no avail.
I need to input some text into a div element located by XPath.
I am familiar with inputting text using the id and name of elements. However, I have been unable to find a way to input text into dynamic tables using XPath. Any help will be appreciated.
In the below code example, I am trying to input the text '0000'
const [ncm] = await cmtab.$x('//*[@id="sheet1"]/tbody/tr[2]/td/div/div['+ d + ']/table/tbody/tr['+ r3 + ']/td[9]');
await cmtab.evaluate(ncm, (element, value) => element.value = value, "0000");
Should be able to input the text in the table using xpath. Please help as this would mean a lot for my project.
I am trying to do some web scraping by reading some lines inside an HTML page. I need to look for text which is repeated through the page inside some <span> elements. In the following example I would like to end up with an array of strings: ['Text number 1','Text number 2','Text number 3']
<html>
...
<span>Text number 1</span>
...
<span>Text number 2</span>
...
<span>Text number 3</span>
...
</html>
I have the following code
sElements = ' ... span'; // I declare the selector.
cs = await page.$$(sElements); // I get an array of ElementHandle
The selector is working as in Google Chrome developer tools it captures exactly the 3 elements I am looking for. Also the cs variable is filled with an array of three elements. But then I am trying
for(c in cs)
console.log(c.innerText);
But undefined is logged. I have tried with .text .value .innerText .innerHTML .textContent ... I do not know what I am missing as I think this is really simple
I have also tried this with the same undefined result.
cs = await page.$$eval(sElements, e => e.innerHTML);
Here is an example that would get the innerText of the last span element.
let spanElement;
spanElement = await this.page.$$('span');
spanElement = spanElement.pop();
spanElement = await spanElement.getProperty('innerText');
spanElement = await spanElement.jsonValue();
If you still are unable to get any text then ensure the selector is correct and that the span elements have an innerText defined (not outerText). You can run $(selector) in Chrome console to check.
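To collect the text of every matched span rather than only the last one, the same getProperty and jsonValue calls can be applied inside a for...of loop (a for...in loop iterates over the array indices, not the handles). This is only a minimal sketch: collectInnerTexts is a hypothetical helper name, and handles is assumed to be the array returned by page.$$.

```javascript
// Hypothetical helper: reads innerText from each ElementHandle in turn.
// Assumes `handles` is the array returned by `await page.$$(selector)`.
async function collectInnerTexts(handles) {
  const texts = [];
  // for...of yields the handles themselves; for...in would yield "0", "1", ...
  for (const handle of handles) {
    const property = await handle.getProperty('innerText');
    texts.push(await property.jsonValue());
  }
  return texts;
}

// Usage inside an async function, with `page` being a Puppeteer Page:
//   const texts = await collectInnerTexts(await page.$$('span'));
```

The handles are awaited one at a time to keep the sketch simple; the order of the results matches the order of the handles.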
I'm having issues where I can easily create a table in Jade if I use a variable that I've defined on the page, but as soon as I try to use anything else it prints a long table of nothing.
For instance I can produce a table with the below code:
table
  thead
    tr
      th Bid ID
      th Bid Value
  tbody
    items = [ {"bid_id":1, "bid_value":1.63},{"bid_id":2, "bid_value":1.75},{"bid_id":3, "bid_value":1.00} ]
    each item, i in items
      tr
        td #{item.bid_id}
        td #{item.bid_value}
However when I try to use the following I get a very long table that's completely empty!
table
  thead
    tr
      th Bid ID
      th Bid Value
  tbody
    items = all_bids
    each item, i in items
      tr
        td #{item.bid_id}
        td #{item.bid_value}
all_bids contains the exact same JSON as defined explicitly above. If I print it in the Jade view using:
p= all_bids
It prints the array correctly as:
[ {"bid_id":1, "bid_value":1.63},{"bid_id":2, "bid_value":1.75},{"bid_id":3, "bid_value":1.00} ]
Struggling to find any decent documentation on creating tables in Jade so any help would be appreciated!
Thanks!
So... is all_bids an array, or is it maybe a JSON string? It seems that all_bids is a string in your case. If so, each loops over its characters, and since characters have neither a bid_id nor a bid_value property, you get a big, empty table.
Now how did I come up with this? Let's try to be detectives for a moment, shall we? :) Look at this line: p= all_bids. It produces this output:
[ {"bid_id":1, "bid_value":1.63},{"bid_id":2, "bid_value":1.75},{"bid_id":3, "bid_value":1.00} ]
Normally if it was an array you would get:
"[object Object],[object Object],[object Object]"
because of the .toString() call (which happens behind the scenes). Therefore all_bids is not an array, it's a string!
When you pass all_bids to Jade, try converting it into an object, i.e. JSON.parse(all_bids);.
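The difference can be reproduced in plain Node.js, outside of Jade. This is a minimal sketch that assumes all_bids arrives as the JSON text shown in the question; the variable names are made up for illustration.

```javascript
// Assumed sample data: the JSON text from the question, held as a string.
const allBids = '[{"bid_id":1,"bid_value":1.63},{"bid_id":2,"bid_value":1.75},{"bid_id":3,"bid_value":1.00}]';

// Iterating over a string visits one character at a time,
// and characters have no bid_id property, so every entry is undefined.
const fromString = Array.from(allBids, (item) => item.bid_id);

// After JSON.parse the value is a real array of objects.
const parsed = JSON.parse(allBids);
const fromArray = parsed.map((item) => item.bid_id);

console.log(fromString[0]);  // undefined
console.log(fromArray);      // [ 1, 2, 3 ]
console.log(String(parsed)); // [object Object],[object Object],[object Object]
```

Parsing on the server, before the value is handed to the template, keeps the template itself unchanged.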