Selenium Python returns unusual data - python-3.x

I am trying to extract the href from the following a tag:
<a href="https://www.olx.ph/item/pioneer-pointe-condominium-unit-for-rent-1-br-fully-furnished-22k-ID8k7OP.html?h=ba76d6b70e&utm_source=Opt_Homepage_Var_1&utm_medium=Ad_Clicks&utm_campaign=Phase_2" itemprop="url" class="funnel" data-category-id="137" data-funnel-type="Select Ad" data-action-type="Select Ad" data-funnel-userid="0">
<span class="title" itemprop="name">Pioneer Pointe Condominium unit for rent - 1 BR Fully Furnished - 22K</span>
</a>
I am using the following code in Selenium with Python:
links = browser.find_elements_by_xpath('//a[@itemprop="url"]')
for l in links:
    print(l)
My current, unexpected output is:
<selenium.webdriver.remote.webelement.WebElement (session="8b6a29a1af20221f48056d6a8f34bd63", element="0.8368598264582081-1")>
<selenium.webdriver.remote.webelement.WebElement (session="8b6a29a1af20221f48056d6a8f34bd63", element="0.8368598264582081-2")>
<selenium.webdriver.remote.webelement.WebElement (session="8b6a29a1af20221f48056d6a8f34bd63", element="0.8368598264582081-3")>
Note: this is only part of the output (the first three lines).
I expected the href values of the a tags instead.

Printing l by itself shows the WebElement object that the browser found. You must specify which part of the element you want, here its href attribute:
for l in links:
    print(l.get_attribute("href"))
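The same idea can be wrapped in a small helper. This is only a minimal sketch: `collect_hrefs` and `FakeElement` are hypothetical names, and `FakeElement` merely stands in for a Selenium WebElement so the example runs without a browser. With a real driver you would pass in the list returned by the element lookup instead.

```python
class FakeElement:
    """Hypothetical stand-in for a Selenium WebElement (assumption: only get_attribute is needed)."""

    def __init__(self, attributes):
        self._attributes = attributes

    def get_attribute(self, name):
        # Like WebElement.get_attribute, return None when the attribute is absent.
        return self._attributes.get(name)


def collect_hrefs(elements):
    """Return the href of every element that has one."""
    hrefs = []
    for element in elements:
        href = element.get_attribute("href")
        if href:  # skip elements without an href
            hrefs.append(href)
    return hrefs


links = [
    FakeElement({"href": "https://example.com/a", "itemprop": "url"}),
    FakeElement({"itemprop": "url"}),  # no href on this one
]
print(collect_hrefs(links))
```

Returning a list rather than printing inside the loop makes the hrefs easy to reuse, for example to visit each link later.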

Related

How do I retrieve text from a text node in Selenium

Essentially, I want to get the text from the site and print it to the console.
This is the HTML snippet:
<div class="inc-vat">
<p class="price">
<span class="smaller currency-symbol">£</span>
1,500.00
<span class="vat-text"> inc. vat</span>
</p>
</div>
How would I go about retrieving the '1,500.00'? I have tried self.browser.find_element_by_xpath('//*[@id="main-content"]/div/div[3]/div[1]/div[1]/text()'), but that throws an error: "The result of the xpath expression is: [object Text]. It should be an element." I have also tried other approaches such as .text, but they either print only the '£' symbol, print a blank, or throw the same error.
You can use the CSS selector below:
p.price
Sample code:
elem = driver.find_element_by_css_selector("p.price").text.split(' ')[1]
print(elem)
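Splitting on a single space depends on the exact whitespace in the element's text. A regular expression is one slightly more tolerant alternative. The sketch below assumes the text of the p.price element reads roughly like "£ 1,500.00 inc. vat"; `extract_price` and `PRICE_PATTERN` are hypothetical names, and the sample string is an assumption rather than captured browser output.

```python
import re

# A number with optional thousands separators and optional decimals, e.g. 1,500.00
PRICE_PATTERN = re.compile(r"\d+(?:,\d{3})*(?:\.\d+)?")


def extract_price(text):
    """Return the first price-like number in text, or None if there is none."""
    match = PRICE_PATTERN.search(text)
    return match.group(0) if match else None


# Assumed shape of the text returned for <p class="price">.
sample_text = "£ 1,500.00 inc. vat"
print(extract_price(sample_text))
```

With Selenium, you would pass the element's .text value to `extract_price` instead of the sample string.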

How do you use find_previous() in a select query in Python?

I am trying to pull the span (let's call it AAA) that comes before a specific span, BBB. The BBB span only shows up at certain times on the page, and I only want the AAAs that directly precede the BBBs.
Is there a way to select only the AAAs that are followed by a BBB? Or, to get to my proposed question, how can you use find_previous when you're running a select query? It works if I just use select_one:
AAA = selsoup.select_one('span.BBB').find_previous().text
But when I try to use select to pull all entries, I get an error message ("You're probably treating a list of elements like a single element.").
I've tried applying .find_previous in a for loop, but that doesn't work either. Any suggestions?
Sorry, I probably should have added this before.
Here is the code from the page:
<tr class="tree">
<th class="AAA">What I want right here<span class="BBB">(Aba: The New Look)</span></th>
Instead of .find_previous() you can use the + (adjacent sibling) combinator in your CSS selector:
from bs4 import BeautifulSoup

html_doc = """
<span class="ccc">txt</span>
<span class="aaa">This I don't Want</span>
<span class="bbb">txt</span>
<span class="aaa">* This I Want *</span>
<span class="ccc">txt</span>
<span class="aaa">This I don't Want</span>
"""
soup = BeautifulSoup(html_doc, "html.parser")
# ".bbb + .aaa" matches each .aaa element that immediately follows a .bbb sibling
for aaa in soup.select(".bbb + .aaa"):
    print(aaa.text)
Prints:
* This I Want *
EDIT: Based on your edit:
bbb = soup.select_one(".AAA .BBB")
print(bbb.text)
Prints:
(Aba: The New Look)

how to write inside the <p> tag using selenium webdriver in python 3?

My HTML code is:
<p contenteditable="true" ></p>
I want to write text inside the p tag using Selenium WebDriver.
I tried:
browser = webdriver.Chrome()
browser.get(url_link)
write = browser.find_element_by_tag_name("p['contenteditable':'True'")
write.text('hello')
but it gives this error:
selenium.common.exceptions.InvalidSelectorException: Message: invalid selector:
An invalid or illegal selector was specified
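find_element_by_tag_name accepts only a tag name, so use a different locator, such as XPath, to get the element. Also, text is a read-only property rather than a method; use send_keys to type into the element.
Here is an XPath example:
write = browser.find_element_by_xpath("//p[@contenteditable='true']")
write.send_keys('hello')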

How can I click the third href link?

<ul id='pairSublinksLevel1' class='arial_14 bold newBigTabs'>...</ul>
<ul id='pairSublinksLevel2' class='arial_12 newBigTabs'>
<li>...</li>
<li>...</li>
<li>
<a href='/equities/...'> last data </a> #<-- HERE
</li>
<li>...</li>
The question is: how can I click the link in the third li tag?
In my code:
xpath = "//ul[@id='pairSublinksLevel2']"
element = driver.find_element_by_xpath(xpath)
actions = element.find_element_by_css_selector('a').click()
The code works partially, but I want to click the third li tag.
It keeps clicking on the second tag.
Try:
driver.find_element_by_xpath("//ul[@id='pairSublinksLevel2']/li[3]/a").click()
EDIT:
Thanks to @DebanjanB for the suggestion:
When you get the element with the XPath //ul[@id='pairSublinksLevel2'] and search for an a tag among its descendants, it returns the first match (in your case, that could be inside the second li tag). So you can use indexing, as in the XPath given in this answer, to get a specific numbered match. Note that XPath indexing starts from 1, not 0.
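The difference between "first match" and "third li" can be tried without a browser. The sketch below does not use Selenium: it uses Python's standard xml.etree.ElementTree, which supports a limited XPath subset, on a simplified, well-formed stand-in for the page fragment. The markup, link targets, and link texts other than "last data" are placeholders invented for illustration.

```python
import xml.etree.ElementTree as ET

# Simplified, well-formed stand-in for the page fragment (placeholder content).
markup = """
<div>
  <ul id="pairSublinksLevel1"><li><a href="/x">other</a></li></ul>
  <ul id="pairSublinksLevel2">
    <li><span>first</span></li>
    <li><a href="/second">second link</a></li>
    <li><a href="/third">last data</a></li>
    <li><a href="/fourth">fourth link</a></li>
  </ul>
</div>
"""
root = ET.fromstring(markup)

# First <a> anywhere under the list: here it sits in the second <li>.
first_match = root.find(".//ul[@id='pairSublinksLevel2']//a")

# li[3] is the third <li>, because XPath positions start at 1.
third_item_link = root.find(".//ul[@id='pairSublinksLevel2']/li[3]/a")

print(first_match.text)
print(third_item_link.text)
```

The first lookup mirrors searching for any a tag under the list, while the second pins the search to a specific li by position.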
As per the HTML you have shared you can use either of the following solutions:
Using link_text:
driver.find_element_by_link_text("last data").click()
Using partial_link_text:
driver.find_element_by_partial_link_text("last data").click()
Using css_selector:
driver.find_element_by_css_selector("ul.newBigTabs#pairSublinksLevel2 a[href*='equities']").click()
Using xpath:
driver.find_element_by_xpath("//ul[@class='arial_12 newBigTabs' and @id='pairSublinksLevel2']//a[contains(@href,'equities') and contains(.,'last data')]").click()
Reference: Official locator strategies for the webdriver

Trouble in scraping specific element having same classname using beautifulsoup python

How can I extract only the status text, "Semi-Furnished, Available immediately for Family"?
Because both the details and the status information use div class="proDetailsRowElm", I end up getting both the details and the status information in my list.
Could you please help me get only the status information?
HTML code:
<div class="proDetailsRowElm">
<label>Details:</label>
<div class="proDetailsRow__list">
<span class="proDetailsRow__item">3 Bathroom</span>
<span class="proDetailsRow__item">3 Balcony</span>
</div>
<a class='stop-propagation underline font-type-4 view-details-link' href="javascript:void(0);" onclick="stopPage=true;window.open('/propertyDetails/3-BHK-1800-Sq-ft-Multistorey-Apartment-FOR-Rent-Kadubeesanahalli-in-Bangalore&id=4d423330363332363633', '_blank');callDetailPropertData('30632663');addViewedPropertyToCookie('30632663',1);detailViewTrack('30632663');clicktrack('1', 'propertyId=30632663,'+'2', 'div'+',sessionId='+sessionId ,'Rent','Kadubeesanahalli','Agent','91','Bangalore' ,'','', 'N','35,000','','3','Multistorey Apartment','','','8','','',false,'','',''); trackPropertyPosition('1', '2', '30632663', 'div')"></a>
</div>
<div class="proDetailsRowElm">
<label>Status:</label>
Semi-Furnished,
Available immediately for Family
</div>
Python code:
property_status_list = soup.find_all('div', class_='proDetailsRowElm')
for property_status in property_status_list:
    for element in property_status_list:
        print(element.text)
Output of the code above:
Details:
3 Bathroom
3 Balcony
Status:
Semi-Furnished,
Available immediately for Family
Required output:
Status:
Semi-Furnished,
Available immediately for Family
I'm by no means a BeautifulSoup expert, but you might be able to use next_sibling:
property_status_list = soup.find_all('div', class_='proDetailsRowElm')
for property_status in property_status_list:
    try:
        # find() returns None when the div has no "Status:" label
        k = property_status.find('label', text='Status:').next_sibling
        print(repr(k))
    except AttributeError:
        pass
Returns:
'\nSemi-Furnished,\nAvailable immediately for Family\n'
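The returned string still contains the newlines from the page source. One way to tidy it up is sketched below, using only string methods. `clean_status` is a hypothetical helper name, and the input is the string shown in the answer.

```python
def clean_status(raw):
    """Collapse the newline-separated pieces of the status text into one line."""
    parts = [part.strip() for part in raw.splitlines()]
    # Drop empty pieces left by leading/trailing newlines, then rejoin.
    return " ".join(part for part in parts if part)


raw_status = "\nSemi-Furnished,\nAvailable immediately for Family\n"
print(clean_status(raw_status))
```

In the loop from the answer, you would call `clean_status(k)` instead of printing `repr(k)`.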
