I am trying to do some web scraping by reading some lines inside an HTML page. I need to look for text that is repeated throughout the page inside some <span> elements. In the following example I would like to end up with an array of strings: ['Text number 1','Text number 2','Text number 3']
<html>
...
<span>Text number 1</span>
...
<span>Text number 2</span>
...
<span>Text number 3</span>
...
</html>
I have the following code
sElements = ' ... span'; // I declare the selector.
cs = await page.$$(sElements); // I get an array of ElementHandle
The selector works: in the Google Chrome developer tools it captures exactly the 3 elements I am looking for, and the cs variable is filled with an array of three elements. But then I try
for(c in cs)
console.log(c.innerText);
But undefined is logged. I have tried with .text .value .innerText .innerHTML .textContent ... I do not know what I am missing as I think this is really simple
I have also tried this with the same undefined result.
cs = await page.$$eval(sElements, e => e.innerHTML);
Here is an example that would get the innerText of the last span element.
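// $$ returns ElementHandle objects, which do not expose DOM properties such as innerText directly.
const spanElements = await this.page.$$('span');
const lastSpan = spanElements.pop();
// Read the DOM property through the handle, then convert it to a plain value.
const innerTextHandle = await lastSpan.getProperty('innerText');
const innerText = await innerTextHandle.jsonValue();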
If you are still unable to get any text, make sure the selector is correct and that the span elements have an innerText defined (not outerText). You can run $(selector) in the Chrome console to check.
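For comparison, the result the question asks for (a list with the text of every span) can be sketched in Python using only the standard library's html.parser. This is not Puppeteer code: the SpanTextCollector class and the PAGE sample are hypothetical names made up for illustration, and the sketch assumes the HTML is available as a string.

```python
from html.parser import HTMLParser


class SpanTextCollector(HTMLParser):
    """Hypothetical helper: collects the text found inside each <span>."""

    def __init__(self):
        super().__init__()
        self._depth = 0    # how many <span> elements are currently open
        self._buffer = []  # text pieces of the span being read
        self.texts = []    # one entry per top-level <span>

    def handle_starttag(self, tag, attrs):
        if tag == "span":
            self._depth += 1

    def handle_endtag(self, tag):
        if tag == "span" and self._depth:
            self._depth -= 1
            if self._depth == 0:
                self.texts.append("".join(self._buffer).strip())
                self._buffer = []

    def handle_data(self, data):
        # Keep text only while we are inside a <span>.
        if self._depth:
            self._buffer.append(data)


# Sample page modelled on the question (assumed content).
PAGE = """<html>
<span>Text number 1</span>
<p>unrelated</p>
<span>Text number 2</span>
<span>Text number 3</span>
</html>"""

collector = SpanTextCollector()
collector.feed(PAGE)
collector.close()
print(collector.texts)
```

The collector keeps one string per span, which is the same shape as the array the question wants to end up with.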
I want to get the text that is inside the span, but I am not able to. The text is nested as ul > li > span > a > span. I am using Selenium with Python.
Below is the code which I tried:
departmentCategoryContent = driver.find_elements_by_class_name('a-list-item')
departmentCategory = departmentCategoryContent.find_elements_by_tag_name('span')
After this, I iterate over departmentCategory and print the text using .text, i.e.
[ print(x.text) for x in departmentCategory ]
However, this is generating an error: AttributeError: 'list' object has no attribute 'find_elements_by_tag_name'.
Can anyone tell me what I am doing wrong and how I can get the text?
Problem:
As far as I understand, departmentCategoryContent is a list, not a single WebElement, so it doesn't have the find_elements_by_tag_name() method.
Solution:
You can choose one of the two ways below:
1. Loop over the list departmentCategoryContent first, then call find_elements_by_tag_name() on each element.
2. Save time with a single statement, using find_elements_by_css_selector():
departmentCategory = driver.find_elements_by_css_selector('.a-spacing-micro.apb-browse-refinements-indent-2 .a-list-item span')
for x in departmentCategory:
    print(x.text)
Explanation:
Your locator .a-list-item span returns every span tag inside an element that has the class a-list-item. That matches 88 items, including the unwanted tags.
So you need a more specific locator to exclude the others. In this case, I added some more classes: .a-spacing-micro.apb-browse-refinements-indent-2
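The first of the two options (loop over the list, then search inside each element) can be sketched as follows. So that the snippet runs without a browser, it uses a hypothetical FakeElement stand-in instead of a real Selenium WebElement; only the method name find_elements_by_tag_name is taken from the question, and the sample texts are made up.

```python
class FakeElement:
    """Hypothetical stand-in for a Selenium WebElement (not Selenium's API)."""

    def __init__(self, text="", spans=()):
        self.text = text
        self._spans = list(spans)

    def find_elements_by_tag_name(self, tag):
        # Only 'span' children are modelled in this sketch.
        return self._spans if tag == "span" else []


def span_texts(items):
    """Visit each list item, then collect the text of its spans."""
    texts = []
    for item in items:
        for span in item.find_elements_by_tag_name("span"):
            texts.append(span.text)
    return texts


# Plays the role of departmentCategoryContent: a plain Python list of elements.
items = [
    FakeElement(spans=[FakeElement("Books"), FakeElement("Music")]),
    FakeElement(spans=[FakeElement("Games")]),
]

# Calling the method on the list itself reproduces the reported AttributeError.
try:
    items.find_elements_by_tag_name("span")
except AttributeError as exc:
    error_message = str(exc)

result = span_texts(items)
print(error_message)
print(result)
```

The point of the sketch is that the search method lives on each element, not on the list that holds them.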
You're looping over the wrong thing. You want to loop through the 'a-list-item' list and find a single span element that is a child of that webElement. Try this:
departmentCategoryContent = driver.find_elements_by_class_name('a-list-item')
for x in departmentCategoryContent:
    print(x.find_element_by_tag_name('span').text)
Note that the second DOM search uses find_element (not find_elements), which returns a single WebElement, not a list.
I am brand new to Scrapy, and I could use a hint here. I realize that there are quite a few similar questions, but none of them seem to fix my problem. I have the following code written for a simple web scraper:
import scrapy
from ScriptScraper.items import ScriptItem

class ScriptScraper(scrapy.Spider):
    name = "script_scraper"
    allowed_domains = ["https://proplay.ws"]
    start_urls = ["https://proplay.ws/dramas/"]

    def parse(self, response):
        for column in response.xpath('//div[@class="content-column one_fourth"]'):
            text = column.xpath('//p/b/text()').extract()
            item = ScriptItem()
            item['url'] = "test"
            item['title'] = text
            yield item
I will want to do some more involved scraping later, but right now, I'm just trying to get the scraper to return anything at all. The HTML for the site I'm trying to scrape looks like this:
<div class="content-column one_fourth">
::before
<p>
<b>
All dramas
<br>
(in alphabetical
<br>
order):
</b>
</p>
...
</div>
and I am running the following command in the Terminal:
scrapy parse --spider=script_scraper -c parse_ITEM -d 2 https://proplay.ws/dramas/
According to my understanding of Scrapy, the code I have written should be yielding the text "All dramas"; however, it is yielding an empty array instead. Can anyone give me a hint as to why this is not producing the expected yield? Again, I apologize for the repetitive question.
Your XPath expressions do not quite match the data you want to extract. If you want the first row of the first column, your XPath expression should be:
item = {}
item['text'] = response.xpath('//div[@class="content-column one_fourth"][1]/p[1]/b/text()').extract()[0]
The extract() function returns all the matches for the expression as a list. If you want only the first one, use extract()[0] or extract_first().
Go through this page https://devhints.io/xpath to learn more about XPath.
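The attribute predicate and the "take the first match" step can be tried without Scrapy. The sketch below uses the standard library's xml.etree.ElementTree as a stand-in. It implements only a subset of XPath (for example, there is no text() step, so .text is read instead), and the SAMPLE markup is a simplified, well-formed version of the question's HTML with an assumed second column.

```python
import xml.etree.ElementTree as ET

# Simplified, well-formed version of the markup from the question (assumed).
SAMPLE = """
<html><body>
  <div class="content-column one_fourth">
    <p><b>All dramas<br/>(in alphabetical<br/>order):</b></p>
  </div>
  <div class="content-column one_fourth">
    <p><b>Second column</b></p>
  </div>
</body></html>
""".strip()

root = ET.fromstring(SAMPLE)

# '@class' tests the class attribute of each div.
columns = root.findall(".//div[@class='content-column one_fourth']")

# A path relative to each column (no leading //) stays inside that column.
# .text holds the text before the first child element, here the first <br/>.
titles = [column.find("p/b").text.strip() for column in columns]

# Same idea as extract()[0] or extract_first(): keep only the first match.
first_title = titles[0]
print(titles)
print(first_title)
```

In Scrapy itself, a path that starts with // inside the loop searches the whole document again, so a relative path such as .//p/b/text() is the usual way to stay within the current column.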
I want to parse only one span tag in my HTML document. There are three sibling span tags without any class or id. I am targeting only the second one, using BeautifulSoup 4.
Given the following html document:
<div class="adress">
<span>35456 street</span>
<span>city, state</span>
<span>zipcode</span>
</div>
I tried:
for spn in soup.findAll('span'):
    data = spn[1].text
but it didn't work. The expected result is the text of the second span stored in a variable:
data = "city, state"
I would also like to know how to get both the first and second span concatenated in one variable.
You are trying to slice an individual span (a Tag instance). Get rid of the for loop and slice the findAll response instead, i.e.
>>> soup.findAll('span')[1]
<span>city, state</span>
You can get the first and second tags together using:
>>> soup.findAll('span')[:2]
[<span>35456 street</span>, <span>city, state</span>]
or, as a string:
>>> "".join([str(tag) for tag in soup.findAll('span')[:2]])
'<span>35456 street</span><span>city, state</span>'
Another option:
data = soup.select_one('div > span:nth-of-type(2)').get_text(strip=True)
print(data)
Output:
city, state
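If the goal is the text of the spans rather than the tags themselves, the same indexing and slicing idea applies. The sketch below uses the standard library's xml.etree.ElementTree as a stand-in for BeautifulSoup, which works here only because the snippet is well-formed. Joining with a single space is an assumption about the desired output.

```python
import xml.etree.ElementTree as ET

# The HTML from the question, used here as a well-formed XML fragment.
SNIPPET = """<div class="adress">
<span>35456 street</span>
<span>city, state</span>
<span>zipcode</span>
</div>"""

div = ET.fromstring(SNIPPET)
spans = div.findall("span")  # direct <span> children, in document order

# Index the list of spans, not an individual span.
data = spans[1].text

# Slice the first two spans and join their text (separator is an assumption).
combined = " ".join(span.text for span in spans[:2])

print(data)
print(combined)
```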
I have been struggling with this for a while, but to no avail.
I need to input some text into a div element located with XPath.
I am familiar with inputting text using the ids and names of elements. However, I cannot find a way to input text into dynamic tables using XPath. Any help will be appreciated.
In the code example below, I am trying to input the text '0000'
const [ncm] = await cmtab.$x('//*[@id="sheet1"]/tbody/tr[2]/td/div/div['+ d + ']/table/tbody/tr['+ r3 + ']/td[9]');
await cmtab.evaluate(ncm, (element, value) => element.value = value, "0000");
Should be able to input the text in the table using xpath. Please help as this would mean a lot for my project.
I want to check whether element1 or element2 is present and, if so, return true. I use the following code to check whether element1 is present. Now I want the wait to return true and finish, moving on to the next line, as soon as an element with either class name, element1 or element2, is present.
driver.wait(until.elementLocated(by.className('element1')), 10000);
Basically something like below ? :P
driver.wait(until.elementLocated(by.className('element1 || element2')), 10000);
My example DOM looks like below
<div class="element1"></div>
OR
<div class="element2"></div>
If either div is present, my condition is met. Sometimes the webpage generates only element1, sometimes only element2, and sometimes both.
You can do this by using a CSS selector like this:
by.css("div.element1, div.element2")
The comma acts as an OR operator in this case. (by.css is the name in the JavaScript bindings; the Java equivalent is By.cssSelector.)
Here is the solution to your Question-
In Selenium 3.4.0, to set up an explicit wait you can combine multiple clauses with ExpectedConditions.or as follows:
Java
WebDriverWait wait = new WebDriverWait(driver, 10);
wait.until(ExpectedConditions.or(
ExpectedConditions.visibilityOfElementLocated(By.className("element1")),
ExpectedConditions.visibilityOfElementLocated(By.className("element2"))));
Let me know if this Answers your Question.
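The "wait until either condition holds" idea behind both answers can also be written as a plain polling loop. The sketch below is not Selenium's API: wait_for_any and the two stand-in predicates are hypothetical names, and the predicates simulate element2 appearing on the third poll so that the snippet runs without a browser.

```python
import time


def wait_for_any(conditions, timeout=10.0, poll_interval=0.05):
    """Return the index of the first condition that holds, or raise TimeoutError."""
    deadline = time.monotonic() + timeout
    while True:
        for index, condition in enumerate(conditions):
            if condition():
                return index
        if time.monotonic() >= deadline:
            raise TimeoutError("no condition was met in time")
        time.sleep(poll_interval)


# Stand-ins for "element1 is present" and "element2 is present" (assumed behaviour).
poll_count = {"n": 0}


def element1_present():
    return False  # element1 never shows up in this simulation


def element2_present():
    poll_count["n"] += 1
    return poll_count["n"] >= 3  # element2 appears on the third poll


matched = wait_for_any([element1_present, element2_present], timeout=2.0)
print(matched)
```

Returning the index tells the caller which of the alternatives was found, which can be useful when the page may render element1, element2, or both.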