How do I retrieve text from a text node in Selenium - python-3.x

Essentially, I want to get the text from the site and print it to the console.
This is the HTML snippet:
<div class="inc-vat">
<p class="price">
<span class="smaller currency-symbol">£</span>
1,500.00
<span class="vat-text"> inc. vat</span>
</p>
</div>
How would I go about retrieving the '1,500.00'? I have tried self.browser.find_element_by_xpath('//*[@id="main-content"]/div/div[3]/div[1]/div[1]/text()'), but that throws an error: "The result of the xpath expression is: [object Text]. It should be an element." I have also tried other approaches such as .text, but they either print only the '£' symbol, print a blank, or throw the same error.

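find_element can only return element nodes, so an XPath that ends in text() fails with that error. Locate the enclosing element instead and take the price from its text. You can use the CSS selector below:
p.price
Sample code: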
elem = driver.find_element_by_css_selector("p.price").text.split(' ')[1]
print(elem)
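The split(' ')[1] call above depends on exactly where the spaces fall in the element's text. As a slightly more defensive sketch, the number can be pulled out with a regular expression instead. The helper name extract_price and the sample string are assumptions made for illustration; in practice the input would be the .text of the p.price element.

```python
import re

def extract_price(element_text):
    """Return the first number such as '1,500.00' found in the text, or None."""
    # A digit, then any mix of digits and commas, then an optional decimal part.
    match = re.search(r"\d[\d,]*(?:\.\d+)?", element_text)
    return match.group(0) if match else None

# Assumed shape of the text Selenium returns for the <p class="price"> element.
sample_text = "£ 1,500.00 inc. vat"
price = extract_price(sample_text)
print(price)
```

This keeps working if the pieces are separated by newlines or by no whitespace at all.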

Related

Selenium Can't Find Element Returning None or []

I'm having trouble accessing an element. Here is my code:
driver.get(url)
desc = driver.find_elements_by_xpath('//p[@class="somethingcss xxx"]')
I also tried another method:
desc = driver.find_elements_by_class_name('somethingcss xxx')
The element I'm trying to find looks like this:
<div data-testid="descContainer">
<div class="abc1123">
<h2 class="xxx">The Description<span data-tid="prodTitle">The Description</span></h2>
<p data-id="paragraphxx" class="somethingcss xxx">sometext here
<br>text
<br>
<br>text
<br> and several text with
<br> tag below
</p>
</div>
<!--and another div tag below-->
I want to extract the p tag inside div class="abc1123", but I get no result: the lookup only returns [] when I try to call get_attribute or extract the text.
When I extract another element with a different class using the same method, it works perfectly.
Does anyone know why I can't access these elements?
Try the following CSS selector to locate the p tag:
print(driver.find_element_by_css_selector("p[data-id^='paragraph'][class^='somethingcss']").text)
Or use get_attribute("textContent"):
print(driver.find_element_by_css_selector("p[data-id^='paragraph'][class^='somethingcss']").get_attribute("textContent"))
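One possible reason the class-name lookup in the question returned nothing is that 'somethingcss xxx' is two class names separated by a space, while a class-name locator expects a single class. A CSS selector can require both classes at once. The helper below is a hypothetical convenience (the name css_for_classes is not part of Selenium) that builds such a selector from a class attribute value:

```python
def css_for_classes(tag, class_attr):
    """Build a CSS selector such as 'p.somethingcss.xxx' from a class attribute value."""
    # Each whitespace-separated class name becomes a '.name' part of the selector.
    return tag + "".join("." + name for name in class_attr.split())

selector = css_for_classes("p", "somethingcss xxx")
print(selector)  # p.somethingcss.xxx
```

The resulting string can be passed to the same find_element_by_css_selector call used above. If the paragraph is rendered late by JavaScript, the lookup may still come back empty until the element exists in the page.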

Scrape a span text from multiple span elements of same name within a p tag in a website

I want to scrape the text from one span tag among several span tags with similar names, using Python and BeautifulSoup to parse the website.
I just cannot uniquely identify that specific gross-amount span element.
The span tag has name=nv and a data-value attribute, but the other one has those too. I just want to extract the gross numerical dollar figure in millions.
Please advise.
This is the structure:
<p class="sort-num_votes-visible">
<span class="text-muted">Votes:</span>
<span name="nv" data-value="93122">93,122</span>
<span class="ghost">|</span>
<span class="text-muted">Gross:</span>
<span name="nv" data-value="69,645,701">$69.65M</span>
</p>
I want the text from the second name="nv" span, the one that follows the span with class text-muted containing Gross.
What you can do is find the <span> tag that has the text 'Gross:'. Then, once it finds that tag, tell it to go find the next <span> tag (which is the value amount), and get that text.
from bs4 import BeautifulSoup as BS
html = '''<p class="sort-num_votes-visible">
<span class="text-muted">Votes:</span>
<span name="nv" data-value="93122">93,122</span>
<span class="ghost">|</span>
<span class="text-muted">Gross:</span>
<span name="nv" data-value="69,645,701">$69.65M</span>
</p>'''
soup = BS(html, 'html.parser')
gross_value = soup.find('span', text='Gross:').find_next('span').text
Output:
print (gross_value)
$69.65M
or if you want to get the data-value, change that last line to:
gross_value = soup.find('span', text='Gross:').find_next('span')['data-value']
Output:
print (gross_value)
69,645,701
And finally, if you need those values as an integer instead of a string, so you can aggregate in some way later:
gross_value = int(soup.find('span', text='Gross:').find_next('span')['data-value'].replace(',', ''))
Output:
print (gross_value)
69645701

Trouble in scraping specific element having same classname using beautifulsoup python

How can I extract only the text with the status information: Semi-Furnished, Available immediately for Family?
Because both the details and the status are in a div with class="proDetailsRowElm", I end up with both the detail and the status information in my list.
Could you please help me get only the status information?
HTML CODE
<div class="proDetailsRowElm">
<label>Details:</label>
<div class="proDetailsRow__list">
<span class="proDetailsRow__item">3 Bathroom</span>
<span class="proDetailsRow__item">3 Balcony</span>
</div>
<a class='stop-propagation underline font-type-4 view-details-link' href="javascript:void(0);" onclick="stopPage=true;window.open('/propertyDetails/3-BHK-1800-Sq-ft-Multistorey-Apartment-FOR-Rent-Kadubeesanahalli-in-Bangalore&id=4d423330363332363633', '_blank');callDetailPropertData('30632663');addViewedPropertyToCookie('30632663',1);detailViewTrack('30632663');clicktrack('1', 'propertyId=30632663,'+'2', 'div'+',sessionId='+sessionId ,'Rent','Kadubeesanahalli','Agent','91','Bangalore' ,'','', 'N','35,000','','3','Multistorey Apartment','','','8','','',false,'','',''); trackPropertyPosition('1', '2', '30632663', 'div')"></a>
</div>
<div class="proDetailsRowElm">
<label>Status:</label>
Semi-Furnished,
Available immediately for Family
</div>
Python code
property_status_list=soup.find_all('div',class_='proDetailsRowElm')
for property_status in property_status_list:
    for element in property_status_list:
        print(element.text)
Above code Output
Details:
3 Bathroom
3 Balcony
Status:
Semi-Furnished,
Available immediately for Family
Required Output
Status:
Semi-Furnished,
Available immediately for Family
I'm by no means a BeautifulSoup expert but you might be able to use next_sibling:
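property_status_list = soup.find_all('div', class_='proDetailsRowElm')
for property_status in property_status_list:
    try:
        # The status text is the node right after the <label>Status:</label> tag
        k = property_status.find('label', text='Status:').next_sibling
        print(repr(k))
    except AttributeError:
        # find() returned None: this div has no 'Status:' label
        pass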
Returns:
'\nSemi-Furnished,\nAvailable immediately for Family\n'
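The string returned above still carries the newlines from the markup. A minimal sketch of tidying it, assuming the value is a plain string like the one shown (the name normalize_status is made up for this example):

```python
def normalize_status(raw):
    """Collapse runs of whitespace, including newlines, into single spaces."""
    # split() with no arguments splits on any whitespace and drops empty pieces.
    return " ".join(raw.split())

raw = '\nSemi-Furnished,\nAvailable immediately for Family\n'
status = normalize_status(raw)
print(status)
```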

conditional xpath statement

This is a piece of HTML from which I'd like to extract information:
<li>
<p><strong class="more-details-section-header">Provenance</strong></p>
<p>Galerie Max Hetzler, Berlin<br>Acquired from the above by the present owner</p>
</li>
I'd like an XPath expression that extracts the content of the 2nd <p> ... </p> only if it is preceded by a sibling <p> ... Provenance ... </p>.
This is how far I have got:
if "Provenance" in response.xpath('//strong[@class="more-details-section-header"]/text()').extract():
    print("provenance = yes")
But how do I get to Galerie Max Hetzler, Berlin<br>Acquired from the above by the present owner?
I tried
if "Provenance" in response.xpath('//strong[@class="more-details-section-header"]/text()').extract():
    print("provenance = yes ", response.xpath('//strong[@class="more-details-section-header"]/following-sibling::p').extract())
But I am getting [].
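The following-sibling attempt returns [] because the strong element is the only child of its p and has no siblings; the second p is a sibling of the first p, not of the strong. Select the p whose immediately preceding sibling p contains the Provenance header. You should use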
//p[preceding-sibling::p[1]/strong='Provenance']/text()
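To make the sibling relationship concrete, here is a rough equivalent of that expression written against the standard library's ElementTree, which supports only a subset of XPath and has no preceding-sibling axis. Two assumptions: the snippet uses <br/> instead of <br> so that it parses as XML, and the function name provenance_lines is invented for this sketch. It is not how Scrapy evaluates the expression, only the same logic spelled out.

```python
import xml.etree.ElementTree as ET

# The question's markup, with <br/> so that it is well-formed XML (assumption).
snippet = """<li>
<p><strong class="more-details-section-header">Provenance</strong></p>
<p>Galerie Max Hetzler, Berlin<br/>Acquired from the above by the present owner</p>
</li>"""

def provenance_lines(li):
    """Return the text pieces of the <p> that directly follows the 'Provenance' <p>."""
    paragraphs = li.findall("p")
    # Pair each <p> with the <p> immediately after it.
    for header, body in zip(paragraphs, paragraphs[1:]):
        strong = header.find("strong")
        if strong is not None and (strong.text or "").strip() == "Provenance":
            return [t.strip() for t in body.itertext() if t.strip()]
    return []

result = provenance_lines(ET.fromstring(snippet))
print(result)
```

The list has one entry per text node, which matches what /text() yields for a paragraph split by a br tag.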

Python 3 BeautifulSoup4 search for text in source page

I want to search for every '1' in the page source and print where it occurs, e.g. <div id="yeahboy">1</div>. The '1' could be replaced by any other string. I want to see the tag around that string.
Consider this context, for example (*):
from bs4 import BeautifulSoup
html = """<root>
<div id="yeahboy">1</div>
<div id="yeahboy">2</div>
<div id="yeahboy">3</div>
<div>
<span class="nested">1</span>
</div>
</root>"""
soup = BeautifulSoup(html, 'html.parser')  # name the parser explicitly to avoid a warning
You can use find_all(), passing True to indicate that you want only element nodes (instead of the child text nodes), and text="1" to indicate that the element's text content must equal "1" (or any other text you want to search for):
for element1 in soup.find_all(True, text="1"):
    print(element1)
Output :
<div id="yeahboy">1</div>
<span class="nested">1</span>
*) For OP: in future questions, try to give a context like the example above. That makes your question more concrete and easier to answer, because people don't have to invent a context of their own that may turn out to be irrelevant to your actual situation.
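For comparison, the same search can be sketched with only the standard library, assuming the markup is well-formed XML as in the context above. The function name elements_with_text is made up for this example. Unlike the BeautifulSoup call, it strips surrounding whitespace before comparing.

```python
import xml.etree.ElementTree as ET

html = """<root>
<div id="yeahboy">1</div>
<div id="yeahboy">2</div>
<div id="yeahboy">3</div>
<div>
<span class="nested">1</span>
</div>
</root>"""

def elements_with_text(root, target):
    """Return every element whose own leading text, stripped, equals target."""
    return [el for el in root.iter() if (el.text or "").strip() == target]

matches = elements_with_text(ET.fromstring(html), "1")
for el in matches:
    # Show the tag name and attributes of each element that wraps the text.
    print(el.tag, el.attrib)
```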
