How to extract a field from xpath which is not present in an element tag?

<div class="info">
<span class="label">Establishment year</span>
"2008"
</div>
I want to extract 2008 using XPath, but my expression only selects the "Establishment year" text:
driver.find_element_by_xpath("//*[text()='Establishment year']")

Because the text 2008 sits in a bare text node rather than inside an element, you can read it through JavaScript from the parent element:
print(driver.execute_script('return arguments[0].lastChild.textContent;', WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//div[@class='info']/span[@class='label' and text()='Establishment year']/..")))).strip())
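The same idea can be checked offline, without a browser. The following is a minimal sketch that swaps in Python's standard-library xml.etree.ElementTree for the JavaScript call: ElementTree stores text that follows a child element as that child's "tail". It assumes the fragment is well-formed XML (real pages often are not) and that the quotation marks are literal characters, as in the snippet above; in browser developer tools they are usually just how a text node is displayed.

```python
import xml.etree.ElementTree as ET

# The fragment from the question, treated as well-formed XML (an assumption).
snippet = """<div class="info">
<span class="label">Establishment year</span>
"2008"
</div>"""

root = ET.fromstring(snippet)

# The bare text after </span> is stored as the span's tail, not as an element.
label = root.find("./span[@class='label']")
year = (label.tail or "").strip().strip('"')

print(year)  # prints 2008
```

In a live session the string would come from the page source rather than a literal, and a lenient HTML parser may be needed if the markup is not valid XML.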

Unfortunately, WebDriver does not allow the result of a find_element call to be a text node, so you have to use execute_script instead, for example:
driver.execute_script(
"return document.evaluate(\"//div[@class='info']/node()[3]\", document, null, XPathResult.STRING_TYPE, null).stringValue;")

Related

Getting text() element in <p> with VBA/Selenium

Using Excel 2019 VBA, I am trying to get data from a paragraph on a web page with this structure.
<p>
<strong>Release Date:</strong>
" May 30th 2022"
<br>
<strong>From:</strong>
<a href=URL>Title</a>
<br>
<strong>Performers:</strong>
<a href=URL1>Name1</a>,
<a href=URL2>Name2</a>,
<a href=URL3>Name3</a>
</p>
This is the xpath for the paragraph.
/html/body/div[11]/div/div/div[1]/div[1]/div/div/p[1]
To get the individual elements ("Release Date", "From" and "Performers"), I am having to parse the entire paragraph with "Instr"s or regular expressions.
Is there a way to directly reference these elements with XPath?
For example, the "Release Date" Xpath is:
/html/body/div[11]/div/div/div[1]/div[1]/div/div/p[1]/text()[1]
I have tried to get this directly with the following but none of them work.
webdriver.FindElementsByXPath("//div[11]/div/div/div[1]/div[1]/div/div/p[1]/text()")(1) - Invalid Selector
webdriver.FindElementsByXPath("//div[11]/div/div/div[1]/div[1]/div/div/p[1]").Attribute("text")(1) - returns nothing
webdriver.FindElementsByXPath("//div[11]/div/div/div[1]/div[1]/div/div/p[1]")(1).Attribute("text") - returns nothing
webdriver.FindElementsByXPath("//div[11]/div/div/div[1]/div[1]/div/div/p[1]").text(1) - invalid procedure call
webdriver.FindElementsByXPath("//div[11]/div/div/div[1]/div[1]/div/div/p[1]")(1).text - returns entire paragraph
Any advice would be greatly appreciated.

How to extract only the text which is not inside any tag using xpath with selenium and python binding

Link to the page is: "https://www.members.agta.org/assnfe/CompanySearch.asp?MODE=DETAIL&COID=1026706&COMPNAME=&CITYNAME=&STATENAME=&CITYID=0&STATEID=0&CTRYID=181&SEARCHIDENTIFIER=81.145.145.150_12/24/2019%203:31:24%20AM&RETAILMBRS=0&ORGTYPE=0&GEMSTONEID=-1&PRODUCTSID=-1&COMPANYDATA=&TID=2&GEMCOLORID=-1&GEMCUTID=-1&GEMQUALID=A"
Here is the HTML I am targeting:
<p><strong>Contact:</strong>
Garmendia, Diane
<br>
<strong>Email:</strong> Diane33jewels@gmail.com<br>
<strong>P:</strong> 805-957-9100<br>
<strong>F:</strong> 805-957-4191<br>
http://www.33jewels.com
<!-- <b>Email Link:</b> $MC:EMAILLINKTOFORM$ -->
</p>
I need to extract "Garmendia, Diane" using the xpath expression.
I have tried using:
cname=driver.find_element_by_xpath("//*[contains(text(), 'Contact:')]//following-sibling::text()[1]")
But the error I am getting is:
Message: invalid selector: The result of the xpath expression "//*[contains(text(), 'Contact:')]//following-sibling::text()[1]" is: [object Text]. It should be an element.

To extract "Garmendia, Diane", use the JavaScript executor and the childNodes of the parent element.
Use WebDriverWait() to wait for element_to_be_clickable() with the following XPath.
Code:
element=WebDriverWait(driver,10).until(EC.element_to_be_clickable((By.XPATH,"//p[contains(.,'Contact:')]")))
print(driver.execute_script('return arguments[0].childNodes[1].textContent;', element))
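When the page source is available as a string (for example from driver.page_source), the label/value pairs can also be collected outside the browser. This is a sketch using the standard-library html.parser, not the approach from the answer above; the class name LabelledTextParser and the trimmed sample markup are illustrative assumptions.

```python
from html.parser import HTMLParser


class LabelledTextParser(HTMLParser):
    """Map each <strong> label to the first bare text that follows it."""

    def __init__(self):
        super().__init__()
        self.in_label = False
        self.current_label = None
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag == "strong":
            self.in_label = True
            self.current_label = ""
        elif tag == "br":
            # A <br> ends the current field; later text belongs to no label.
            self.current_label = None

    def handle_endtag(self, tag):
        if tag == "strong" and self.in_label:
            self.in_label = False
            self.current_label = self.current_label.strip().rstrip(":")

    def handle_data(self, data):
        if self.in_label:
            self.current_label += data
        elif self.current_label is not None and data.strip():
            self.fields.setdefault(self.current_label, data.strip())


# Trimmed sample based on the markup in the question.
sample = """<p><strong>Contact:</strong>
Garmendia, Diane
<br>
<strong>P:</strong> 805-957-9100<br>
<strong>F:</strong> 805-957-4191<br>
</p>"""

parser = LabelledTextParser()
parser.feed(sample)
parser.close()

print(parser.fields["Contact"])  # prints Garmendia, Diane
```

Keying on the label text rather than on a child index keeps the lookup working if the order of the fields changes, at the cost of depending on the exact label wording.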

How to get substring from string using xpath 1.0 in lxml

This is the example HTML.
<html>
<a href="HarryPotter:Chamber of Secrets">
text
</a>
<a href="HarryPotter:Prisoners in Azkabahn">
text
</a>
</html>
I am in a situation where I need to extract
Chamber of Secrets
Prisoners in Azkabahn
I am using lxml 4.2.1 in Python, which uses XPath 1.0.
I have tried to extract them using the XPath
'substring-after(//a/@href,"HarryPotter:")'
which returns only "Chamber of Secrets",
and with the XPath
'//a/@href[substring-after(.,"HarryPotter:")]'
which returns
'HarryPotter:Chamber of Secrets'
'HarryPotter:Prisoners in Azkabahn'
I have researched this and learned a few things along the way, but I have not found a fix for my problem.
I have tried several different XPath expressions using substring-after.
In my research I learned that this could also be done with a regex; I tried that as well and failed.
I found that string manipulation with regex is easy in XPath 2.0 and above, but regex can also be used in XPath 1.0 through XSLT extensions.
Can this be done with the substring-after function? If yes, what is the XPath, and if not, what is the best approach to get the desired output?
And how can the desired output be obtained using regex in XPath while sticking to lxml?

Try this approach to get both text values:
from lxml import html
raw_source = """<html>
<a href="HarryPotter:Chamber of Secrets">
text
</a>
<a href="HarryPotter:Prisoners in Azkabahn">
text
</a>
</html>"""
source = html.fromstring(raw_source)
for link in source.xpath('//a'):
    print(link.xpath('substring-after(@href, "HarryPotter:")'))
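The per-link loop is needed because, in XPath 1.0, a string function applied to a node-set uses only the first node in document order, which is why the single-expression attempt returned just one title. The sketch below shows the same "select nodes, then cut the string per node" pattern using only the standard library (xml.etree.ElementTree instead of lxml, so it runs without third-party packages). The helper name substring_after is hypothetical, and the sample is shortened so that each link holds its text on one line.

```python
import xml.etree.ElementTree as ET

raw_source = """<html>
<a href="HarryPotter:Chamber of Secrets">text</a>
<a href="HarryPotter:Prisoners in Azkabahn">text</a>
</html>"""


def substring_after(value, marker):
    # Same result as XPath 1.0 substring-after() for a non-empty marker:
    # the text after the first occurrence, or "" when the marker is absent.
    _, found, rest = value.partition(marker)
    return rest if found else ""


root = ET.fromstring(raw_source)

# Apply the string operation once per selected node.
titles = [substring_after(a.get("href", ""), "HarryPotter:") for a in root.iter("a")]

print(titles)  # prints ['Chamber of Secrets', 'Prisoners in Azkabahn']
```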

If you want to use substring-after() and substring-before() together, here is an example:
from lxml import html
f_html = """<html><body><table><tbody><tr><td class="df9" width="20%">
<a class="nodec1" href="javascript:reqDl(1254);" onmouseout="status='';" onmouseover="return dspSt();">
<u>
2014-2
</u>
</a>
</td></tr></tbody></table></body></html>"""
tree_html = html.fromstring(f_html)
deal_id = tree_html.xpath("//td/a/@href")
print(tree_html.xpath('substring-after(//td/a/@href, "javascript:reqDl(")'))
print(tree_html.xpath('substring-before(//td/a/@href, ")")'))
print(tree_html.xpath('substring-after(substring-before(//td/a/@href, ")"), "javascript:reqDl(")'))
Result:
1254);
javascript:reqDl(1254
1254
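Since the question also asked about regular expressions: once the attribute value is a Python string, the standard re module can pull the ID out directly. This is a minimal sketch, assuming the href always has the form javascript:reqDl(<digits>); and using a hard-coded string in place of the value returned by the XPath query.

```python
import re

# Stand-in for the href value selected with XPath in the example above.
href = "javascript:reqDl(1254);"

# Capture the digits between "reqDl(" and ")".
match = re.search(r"reqDl\((\d+)\)", href)
deal_id = match.group(1) if match else None

print(deal_id)  # prints 1254
```

Compared with nesting substring-before() inside substring-after(), the pattern also validates that the argument is numeric, and it yields None instead of an empty string when the href has a different shape.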

How can i click the third href link?

<ul id='pairSublinksLevel1' class='arial_14 bold newBigTabs'>...</ul>
<ul id='pairSublinksLevel2' class='arial_12 newBigTabs'>
<li>...</li>
<li>...</li>
<li>
<a href='/equities/...'> last data </a> #<-- HERE
</li>
<li>...</li>
How can I click the link inside the third li tag?
My code:
xpath = "//ul[@id='pairSublinksLevel2']"
element = driver.find_element_by_xpath(xpath)
actions = element.find_element_by_css_selector('a').click()
This works partially, but I want to click the link in the third li tag, and the code keeps clicking the one in the second.

Try
driver.find_element_by_xpath("//ul[@id='pairSublinksLevel2']/li[3]/a").click()
EDIT:
Thanks @DebanjanB for the suggestion:
When you get the element with the XPath //ul[@id='pairSublinksLevel2'] and then search for an a tag among its descendants, you get the first match (in your case, apparently the one inside the second li tag). Indexing as shown above selects a specific match instead. Note that XPath indexes start at 1, not 0.
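The difference between "first matching descendant" and "child number three" can be reproduced offline. The sketch below uses the standard-library xml.etree.ElementTree, which supports this subset of XPath, in place of a live WebDriver session. The list contents and href values are made-up placeholders, with the first li deliberately left without a link so that the first match falls in the second li, as described in the question.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the page fragment; hrefs are placeholders.
page = """<div>
<ul id="pairSublinksLevel2">
<li>no link here</li>
<li><a href="/second">second</a></li>
<li><a href="/equities/example">last data</a></li>
<li><a href="/fourth">fourth</a></li>
</ul>
</div>"""

root = ET.fromstring(page)

# First <a> anywhere under the list: lands in the second <li>.
first_link = root.find(".//ul[@id='pairSublinksLevel2']//a")

# Positional predicate: li[3] is the third <li> child, counting from 1.
third_link = root.find(".//ul[@id='pairSublinksLevel2']/li[3]/a")

print(first_link.text)  # prints second
print(third_link.text)  # prints last data
```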

As per the HTML you have shared you can use either of the following solutions:
Using link_text:
driver.find_element_by_link_text("last data").click()
Using partial_link_text:
driver.find_element_by_partial_link_text("last data").click()
Using css_selector:
driver.find_element_by_css_selector("ul.newBigTabs#pairSublinksLevel2 a[href*='equities']").click()
Using xpath:
driver.find_element_by_xpath("//ul[@class='arial_12 newBigTabs' and @id='pairSublinksLevel2']//a[contains(@href,'equities') and contains(.,'last data')]").click()
Reference: Official locator strategies for the webdriver

I want to get text from anchor tag using selenium python I want print text helloworld

<div class="someclass">
<p class="name">helloworld</p>
</div>
//I want to print helloworld text from anchor tag, using python selenium code.

You can do it using CSS:
.find_element_by_css_selector("p.name a")
or you can do it using XPath:
.find_element_by_xpath("//p[@class='name']/a")
Example:
element = self.browser.find_element_by_css_selector("p.name a")
print(element.get_attribute("text"))
I hope this helps; if not, let me know :)

One step solution:
browser.find_element_by_xpath('//p[@class="name"]/a').get_attribute('text')
This gives you the text of the anchor tag.

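To get the text of an anchor tag using Selenium in Python, you can use ".get_attribute('text')". This relies on the text property that anchor elements expose; for elements in general, the element's .text property or .get_attribute('textContent') is the safer choice.
In this case:
a_tag = self.driver.find_element_by_css_selector("p.name a")
a_tag.get_attribute('text')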
