Get first element Xpath - node.js

I have HTML like this:
<ol class="list">
<li class="list-item " id="37647629">
<!---->
<div>
<!---->
<div>
<!---->
<book class="book">
<div class="title">
someText
</div>
<div class="year">
2022
</div>
</book>
</div>
<!---->
</div>
<!---->
</li>
<li class="list-item " id="37647778">
<!---->
<div>
<!---->
<div>
<!---->
<book class="book">
<div class="title">
someOtherText
</div>
<div class="year">
2014
</div>
</book>
</div>
</div>
<!---->
</li>
</ol>
I want to get the first book's title and year directly, using two XPath expressions.
I tried:
$x('//book') => OK, returns both books
$x('//book[0]') => Empty list
$x('//book[0]/div[@class="title"]') => Nothing
It seems I have to do this:
$x('//book')[0]
and then process the title. Why can't I access the first title directly with a single XPath expression?

This will give you the first book's title:
"(//book)[1]//div[@class='title']"
And this gives the first book's year:
"(//book)[1]//div[@class='year']"

You're missing that XPath indexing starts at 1, while JavaScript indexing starts at 0.
$x('//book') selects all book elements in the document.
$x('//book[0]') selects nothing because XPath indexing starts at 1. (Note also that a predicate on //book applies per parent: //book[1] selects every book element that is the first book child of its parent, which is not necessarily the same as the first of all book elements in the document.)
$x('//book')[0] would select the first book element because JavaScript indexing starts at 0.
$x('(//book)[1]') would select the first book element because XPath indexing starts at 1.
To select the first div with class of 'title', all in XPath:
$x('(//div[@class="title"])[1]')
or, using JavaScript to index:
$x('(//div[@class="title"])')[0]
To return just the string value without the leading/trailing whitespace, wrap in normalize-space():
$x('normalize-space((//div[@class="title"])[1])')
Note that normalize-space() will also consolidate internal whitespace, but that is of no consequence with this example.
See also
How to select first element via XPath? (And be sure not to miss the explanation of the difference between //book[1] and (//book)[1] — they are not the same.)
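The difference between //book[1] and (//book)[1] can be reproduced outside the browser. Below is a minimal sketch in Python, assuming the standard library's xml.etree.ElementTree and a trimmed, well-formed copy of the sample markup. ElementTree supports only a subset of XPath and has no (//book)[1] syntax, so find(), which returns the first match in document order, stands in for it here.

```python
import xml.etree.ElementTree as ET

# Trimmed, well-formed copy of the markup from the question (an assumption).
doc = ET.fromstring("""
<ol class="list">
  <li><div><book class="book">
    <div class="title"> someText </div>
    <div class="year"> 2022 </div>
  </book></div></li>
  <li><div><book class="book">
    <div class="title"> someOtherText </div>
    <div class="year"> 2014 </div>
  </book></div></li>
</ol>
""")

# The positional predicate is evaluated per parent: every <book> that is the
# first <book> child of its own parent matches, so both books are returned.
first_among_siblings = doc.findall(".//book[1]")

# find() returns only the first match in document order, like (//book)[1].
first_in_document = doc.find(".//book")

# strip() removes the leading/trailing whitespace around the text.
title = first_in_document.find("div[@class='title']").text.strip()
year = first_in_document.find("div[@class='year']").text.strip()

print(len(first_among_siblings), title, year)
```

Because each book is the only book inside its parent div, the per-parent predicate matches both of them, while the document-order lookup yields just the first.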

Related

How can I get texts with certain criteria in python with selenium? (texts with certain siblings)

This is a really tricky one for me, so I'll describe the question in as much detail as possible.
First, let me show you some example HTML.
....
....
<div class="lawcon">
<p>
<span class="b1">
<label> No.1 </label>
</span>
</p>
<p>
"I Want to get 'No.1' label in span if the div[@class='lawcon'] has a certain <a> tags with "bb" title, and with a string of 'Law' in the text of it."
<a title="bb" class="link" onclick="javascript:blabla('12345')" href="javascript:;">Law Power</a>
</p>
</div>
<div class="lawcon">
<p>
<span class="b1">
<label> No.2 </label>
</p>
<p>
"But I don't want to get No.2 label because, although it has <a> tag with "bb" title, but it doesn't have a text of law in it"
<a title="bb" class="link" onclick="javascript:blabla('12345')" href="javascript:;">Just Power</a>
</p>
</div>
<div class="lawcon">
<p>
<span class="b1">
<label> No.3 </label>
</p>
<p>
"If there are multiple <a> tags with the right criteria in a single div, I want to get span(No.3) for each of those" <a>
<a title="bb" class="link" onclick="javascript:blabla('12345')" href="javascript:;">Lawyer</a>
<a title="bb" class="link" onclick="javascript:blabla('12345')" href="javascript:;">By the Law</a>
<a title="bb" class="link" onclick="javascript:blabla('12345')" href="javascript:;">But not this one</a>
...
...
...
So, here is the thing. I want to extract the label text (e.g. No.1) in div[@class='lawcon'] only if the div has an <a> tag with title "bb" whose text contains the string 'Law'.
If the div has no <a> tag with title "bb", or none whose text contains "Law", the span should not be collected.
What I tried was:
div_list = [div.text for div in driver.find_elements_by_xpath('//span[following-sibling::a[@title="bb"]]')]
But the problem is that when a single div has multiple <a> tags matching the criteria, it returns just one result.
What I want is a list (or tuple) pairing the location (the span number) with the text of each matching <a> tag.
So it should be like:
[[No.1 - Law Power], [No.3 - Lawyer], [No.3 - By the Law]]
I'm not sure I have explained this well enough. Thank you for your interest; I really appreciate any help in advance.
Here is a simple Python script to get your desired output.
links = driver.find_elements_by_xpath("//a[@title='bb' and contains(.,'Law')]")
linkData = []
for link in links:
    currentList = []
    # Pair the label of the enclosing div with the text of the matching link.
    currentList.append(link.find_element_by_xpath("./ancestor::div[@class='lawcon']//label").text + '-' + link.text)
    linkData.append(currentList)
print(linkData)
Output:
[['No.1-Law Power'], ['No.3-Lawyer'], ['No.3-By the Law']]
I am not sure why you want the output in that format. I would prefer the approach below, which groups the matching links by div, so that you can see how many divs have matching links and access each div's links from the output. Just a thought.
divs = driver.find_elements_by_xpath("//a[@title='bb' and contains(.,'Law')]/ancestor::div[@class='lawcon']")
linkData = []
for div in divs:
    currentList = []
    # Collect every matching link inside this div under the div's label.
    for link in div.find_elements_by_xpath(".//a[@title='bb' and contains(.,'Law')]"):
        currentList.append(div.find_element_by_xpath(".//label").text + '-' + link.text)
    linkData.append(currentList)
print(linkData)
Output:
[['No.1-Law Power'], ['No.3-Lawyer', 'No.3-By the Law']]
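The pairing logic can also be tried without a browser. Here is a minimal sketch, assuming the standard library's xml.etree.ElementTree and a cleaned-up, well-formed version of the sample markup (the original sample has unclosed tags). ElementTree has no contains() or ancestor axis, so the "Law" check is done in plain Python instead of XPath.

```python
import xml.etree.ElementTree as ET

# Cleaned-up stand-in for the page from the question (an assumption).
page = ET.fromstring("""
<body>
  <div class="lawcon">
    <p><span class="b1"><label> No.1 </label></span></p>
    <p><a title="bb" class="link">Law Power</a></p>
  </div>
  <div class="lawcon">
    <p><span class="b1"><label> No.2 </label></span></p>
    <p><a title="bb" class="link">Just Power</a></p>
  </div>
  <div class="lawcon">
    <p><span class="b1"><label> No.3 </label></span></p>
    <p>
      <a title="bb" class="link">Lawyer</a>
      <a title="bb" class="link">By the Law</a>
      <a title="bb" class="link">But not this one</a>
    </p>
  </div>
</body>
""")

pairs = []
for div in page.findall(".//div[@class='lawcon']"):
    label = div.find(".//label").text.strip()
    # One entry per matching link, so a div with two matches appears twice.
    for link in div.findall(".//a[@title='bb']"):
        if "Law" in (link.text or ""):
            pairs.append([f"{label} - {link.text}"])

print(pairs)
# [['No.1 - Law Power'], ['No.3 - Lawyer'], ['No.3 - By the Law']]
```

Iterating over divs first and links second keeps the label lookup to one per div, and produces the flat one-entry-per-link shape asked for in the question.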
Since your requirement is to extract the texts No.1 and so on, which are within <label> tags, you have to induce WebDriverWait for visibility_of_all_elements_located(). Note that you will get only 2 matches (against your expectation of 3), because the two matching links in the third div resolve to the same label element. You can use the following Locator Strategy:
Using XPATH:
print([my_elem.get_attribute("innerHTML") for my_elem in WebDriverWait(driver, 5).until(EC.visibility_of_all_elements_located((By.XPATH, "//div[@class='lawcon']//a[@title='bb' and contains(.,'Law')]//preceding::label[1]")))])

Scraping multiple similar lines with python

Using a simple request, I'm trying to extract some information stored in "alt" attributes from this HTML page. The problem is that, within each instance, the information is spread across multiple lines that start with "img". When I try to access it, I can only read the first "img" and not the rest, and I'm not sure how to get them all. Here's the HTML:
<div class="archetype-tile-description-wrapper">
<div class="archetype-tile-description">
<h2>
<span class="deck-price-online">
Golgari Midrange
</span>
<span class="deck-price-paper">
Golgari Midrange
</span>
</h2>
<div class="manacost-container">
<span class="manacost">
<img alt="b" class="common-manaCost-manaSymbol sprite-mana_symbols_b" src="//assets1.mtggoldfish.com/assets/s-d69cbc552cfe8de4931deb191dd349a881ff4448ed3251571e0bacd0257519b1.gif" />
<img alt="g" class="common-manaCost-manaSymbol sprite-mana_symbols_g" src="//assets1.mtggoldfish.com/assets/s-d69cbc552cfe8de4931deb191dd349a881ff4448ed3251571e0bacd0257519b1.gif" />
</span>
</div>
<ul>
<li>Jadelight Ranger</li>
<li>Merfolk Branchwalker</li>
<li>Vraska's Contempt</li>
</ul>
</div>
</div>
Having said that, what I'm looking to get from this is both "b" and "g" and store them in a single variable.
You can probably grab those <img> elements with the class "common-manaCost-manaSymbol" like this:
imgs = soup.find_all("img",{"class":"common-manaCost-manaSymbol"})
and then you can iterate over each <img> and grab its alt attribute:
alts = []
for i in imgs:
    alts.append(i['alt'])
or with a list comprehension:
alts = [i['alt'] for i in imgs]
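If BeautifulSoup is not available, the same idea can be sketched with the standard library. The example below swaps in html.parser.HTMLParser (a different technique from the soup.find_all call above), uses a shortened copy of the sample markup with a placeholder src, and joins the collected values into a single string; the AltCollector class name is made up for illustration.

```python
from html.parser import HTMLParser


class AltCollector(HTMLParser):
    """Collects the alt text of every <img> carrying the mana-symbol class."""

    def __init__(self):
        super().__init__()
        self.alts = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        classes = (attributes.get("class") or "").split()
        if tag == "img" and "common-manaCost-manaSymbol" in classes:
            self.alts.append(attributes.get("alt"))


# Shortened copy of the markup from the question; src is a placeholder.
html = """
<span class="manacost">
<img alt="b" class="common-manaCost-manaSymbol sprite-mana_symbols_b" src="s.gif" />
<img alt="g" class="common-manaCost-manaSymbol sprite-mana_symbols_g" src="s.gif" />
</span>
"""

collector = AltCollector()
collector.feed(html)

# Store both symbols in a single variable, as a list and as one string.
alts = collector.alts
colors = "".join(alts)
print(alts, colors)
```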

How to use aria-attribute (aria-labelledby) for combo box (input+autocomplete list) correctly?

How can I use the aria-attribute aria-labelledby for combo box (input+autocomplete list) correctly?
According to the W3C, the aria-labelledby property provides the user with a recognizable name of the object.
I've found the following example on W3C:
<div class="combobox-wrapper">
<div>
<input type="text"
aria-labelledby="ex1-label">
</div>
<ul aria-labelledby="ex1-label"></ul>
</div>
But I've noticed that the aria-labelledby values aren't descriptive: different elements use the same value.
Maybe I can use aria-labelledby like this:
<div class="combobox-wrapper">
<div>
<input type="text"
aria-labelledby="textBox">
</div>
<ul aria-labelledby="autocomplete-list"></ul>
</div>
The WAI-ARIA attribute aria-labelledby is used when you can't use the normal <input> + <label> combination to label a form element, e.g. because you are using a custom form element. In other words, it is used in situations where you can't use the <label>'s for attribute to define a label for the input, as in:
<input id="communitymode" name="communitymode" type="checkbox"> <label for="communitymode">community wiki</label>
(Note that the for attribute's value refers to the input's id attribute.)
With aria-labelledby, the reference works in the opposite direction from the for attribute: you tell the browser or the screen reader where to find the "label" for the form control it has just encountered.
<div class="combobox-wrapper">
<div>
<span id="combolabel">Select your country:</span>
<input type="text"
aria-labelledby="combolabel">
</div>
<ul aria-labelledby="combolabel"></ul>
</div>
In the above code sample, both the <input> element and the <ul> element are labelled by the <span> element with id "combolabel".
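Since each aria-labelledby value is a reference to the id of another element (or a space-separated list of ids), one quick sanity check is to confirm that every referenced id exists in the markup. Here is a rough sketch in Python, assuming well-formed markup and the standard library's xml.etree.ElementTree; the dangling_label_refs helper is hypothetical, not part of any accessibility tool.

```python
import xml.etree.ElementTree as ET


def dangling_label_refs(markup):
    """Return (tag, ref) pairs whose aria-labelledby ref matches no id."""
    root = ET.fromstring(markup)
    ids = {el.get("id") for el in root.iter() if el.get("id")}
    missing = []
    for el in root.iter():
        # aria-labelledby may hold several space-separated ids.
        for ref in (el.get("aria-labelledby") or "").split():
            if ref not in ids:
                missing.append((el.tag, ref))
    return missing


# The labelled sample: both references point at the <span>.
labelled = """<div class="combobox-wrapper"><div>
<span id="combolabel">Select your country:</span>
<input type="text" aria-labelledby="combolabel" />
</div><ul aria-labelledby="combolabel"></ul></div>"""

# The variant proposed in the question: the values name no existing element.
unlabelled = """<div class="combobox-wrapper"><div>
<input type="text" aria-labelledby="textBox" />
</div><ul aria-labelledby="autocomplete-list"></ul></div>"""

print(dangling_label_refs(labelled))
print(dangling_label_refs(unlabelled))
```

The second call reports both references as dangling, which is why descriptive-looking values such as "textBox" do not help unless an element with that id actually holds the label text.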
Remember that the first rule of ARIA is: don't use ARIA when a native HTML element exists. If you are trying to create an accessible autocomplete box, try this:
http://wet-boew.github.io/v4.0-ci/demos/datalist/datalist-en.html
It does not use ARIA and follows all applicable W3C rules and guidelines.

Returning Certain 'a' class Href by Date

I have a number of div tags (as shown below), each containing an href that I'm looking for. I can return all of the hrefs and append them to a list, but what I need is to return only the hrefs where the date equals the newest date in <li class="last first date"></li>. Any help on how I could achieve this would be great.
<div class="span8 story index_story genre-letter">
<a class="gtm-event" data-evt-action="/opinion/letters/article 1 on /opinion/letters" data-evt-category="Section element" data-evt-label="Position 98 of 99" href="/opinion/letters/article 1">
<span class="h2">Article 1</span>
</a>
<div class="article_info">
<ul>
<li class="last first date">February 21, 2018</li>
</ul>
</div>
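One possible way to approach the filtering step, sketched in Python under loud assumptions: the (href, date text) pairs have already been scraped from each story div, the second and third entries are invented for illustration, and the dates always follow the "February 21, 2018" pattern with English month names.

```python
from datetime import datetime

# Hypothetical (href, date text) pairs, one per story div; only the first
# entry comes from the sample markup, the others are made up.
stories = [
    ("/opinion/letters/article 1", "February 21, 2018"),
    ("/opinion/letters/article 2", "February 20, 2018"),
    ("/opinion/letters/article 3", "February 21, 2018"),
]


def parse_date(text):
    # %B expects a full English month name under the default C/English locale.
    return datetime.strptime(text.strip(), "%B %d, %Y").date()


# Find the newest date first, then keep only the hrefs carrying that date.
newest = max(parse_date(date_text) for _, date_text in stories)
newest_hrefs = [href for href, date_text in stories
                if parse_date(date_text) == newest]

print(newest, newest_hrefs)
```

Comparing parsed dates rather than raw strings matters here, because "February 21, 2018" and "March 1, 2018" do not sort chronologically as text.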

How to input text to non standard input elements

I'm faced with a span element for text input instead of an input box, and I'm struggling to use Watir (Ruby) to enter text. There's no set method, there is a text method that returns the text fine, but I don't seem to be able to set the text that way.
I've also tried using span.select and span.focus and then browser.send_keys but nothing is input in the field.
<div class="UFIAddCommentInput _1osb _5yk1">
<div class="_5yk2" tabindex="-2">
<div class="_5rp7">
<div class="_1p1t">
<div class="_1p1v">
Write a reply...
</div>
</div>
<div class="_5rpb">
<div aria-autocomplete="list" aria-expanded="false" aria-haspopup="false" aria-owns="js_3i" class="_5rpu" contenteditable="true" data-testid="ufi_reply_composer" role="combobox" spellcheck="true" title="Write a reply..." id="js_3j">
<div data-contents="true">
<div data-block="true" data-offset-key="8c176-0-0" class="_45m_ _2vxa">
<span data-offset-key="8c176-0-0">
<br data-text="true">
</br>
</span>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
What could I try next? Is there a way to stop front end designers using non-standard elements?
You can use JavaScript to do this. The difficulty for me was handling the nested quotes.
There were two things I had to figure out about nested quotes before I could do this:
a.) regarding how javascript handles nested quotes: http://www.w3schools.com/js/js_strings.asp
b.) on how to deal with nested quotes in ruby: Escaping single and double qoutes from a string in ruby (the %Q operator lets you set whatever you want to begin and end a string)
css_selector = "span[data-offset-key='8c176-0-0']"
b.execute_script(%Q|query="#{css_selector}"|)
b.execute_script("document.querySelector(query).innerHTML='that was tricky'")
It looks like injecting JavaScript with procedures such as this lets you do just about anything that Watir can't do otherwise. Good question; this was a learning experience for me too.
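The nested-quote problem is not specific to Ruby. As an illustration only (this is Python, not Watir), one way to sidestep manual escaping is to let a JSON encoder produce the JavaScript string literals, since a JSON string is also a valid JavaScript string. The sketch below only builds the script text; passing it to a browser driver's execute_script is left out.

```python
import json

css_selector = "span[data-offset-key='8c176-0-0']"
new_text = 'that was "tricky"'

# json.dumps wraps each value in double quotes and escapes any inner ones,
# so neither the selector nor the text can break out of its string literal.
script = (
    f"document.querySelector({json.dumps(css_selector)})"
    f".innerHTML = {json.dumps(new_text)};"
)

print(script)
```

The single quotes inside the selector pass through untouched, while the double quotes in the text come out backslash-escaped.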
