Node - Cheerio - Find element that contains specific text - node.js

I am trying to get "text that I want" from a site with this code structure:
<td class="x">
<h3 class="x"> number </h3>
<p>
text that I want;
</p>
</td>
If there were only one td with class "x", I would do this:
$('td.x > p > a').text()
and get the text that I want. The problem is that this site has a lot of "td" and "h3" elements with the same class "x". The only difference is that the text in each "h3" element is a different number, and I know which number is in the "h3" element next to my link. For example:
<td class="x">
<h3 class="x"> 125 </h3>
<p>
text that I want;
</p>
</td>
My question: is it possible to choose a selector based on the text inside an element? In my example, I know there is an h3 element with the text "125". Or is there a better way to get the text from the "a" element in my case?

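The :contains selector is what you're looking for:
$('h3:contains("125")')
This selects the h3 whose text contains 125. To get the text of the paragraph that follows it, chain:
$('h3.x:contains("125")').next('p').text()
Note that :contains matches substrings, so it would also match an h3 whose text is, say, 1250.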

Related

lxml.html XPATH expression for element when the test has to be applied to the text_content not the text

I have the following html
<html>
<body>
<p style="text-align:center;margin-bottom:0pt;margin-top:0pt;text-indent:0%;font-weight:bold;font-family:Times New Roman;font-size:10pt;font-style:normal;text-transform:none;font-variant: normal;">
<a name="_marker_1"></a>
<a name="bananabread"></a>
<font style="font-weight:bold;font-family:Times New Roman;font-size:10pt;font-style:normal;text-transform:none;font-variant: normal;">
<a name="bananabread"></a>Ban</font> <font style="font-weight:bold;font-family:Times New Roman;font-size:10pt;font-style:normal;text-transform:none;font-variant: normal;">ana Bread</font>
</p>
<p style="text-align:center;margin-top:10pt;margin-bottom:0pt;text-indent:0%;font-weight:bold;font-family:Times New Roman;font-size:10pt;font-style:normal;text-transform:none;font-variant: normal;">The Best You Ever Tasted</p>
<p style="margin-top:24pt;margin-bottom:0pt;text-indent:7.69%;font-style:italic;font-family:Times New Roman;font-size:10pt;font-weight:normal;text-transform:none;font-variant: normal;">If you don't agree that this is the best banana bread you have ever eaten well I would suggest you see your doctor</p>
<p style="margin-top:10pt;margin-bottom:0pt;text-indent:7.69%;font-family:Times New Roman;font-size:10pt;font-weight:normal;font-style:normal;text-transform:none;font-variant: normal;">Lots of text here describing what I am trying to capture</p>
<p style="text-align:center;margin-bottom:0pt;margin-top:0pt;text-indent:0%;font-weight:bold;font-family:Times New Roman;font-size:10pt;font-style:normal;text-transform:none;font-variant: normal;">
<a name="_marker_2"></a>
<a name="bananapudding"></a>
<font style="font-weight:bold;font-family:Times New Roman;font-size:10pt;font-style:normal;text-transform:none;font-variant: normal;">
<a name="bananapudding"></a>Banana</font>
<font style="font-weight:bold;font-family:Times New Roman;font-size:10pt;font-style:normal;text-transform:none;font-variant: normal;">Pudding</font>
</p>
<p style="text-align:center;margin-top:10pt;margin-bottom:0pt;text-indent:0%;font-weight:bold;font-family:Times New Roman;font-size:10pt;font-style:normal;text-transform:none;font-variant: normal;">Creamy and Satisfying</p>
<p style="margin-top:24pt;margin-bottom:0pt;text-indent:7.69%;font-style:italic;font-family:Times New Roman;font-size:10pt;font-weight:normal;text-transform:none;font-variant: normal;">This is the same recipe your mother used when you were ten!</p>
<p style="margin-top:10pt;margin-bottom:0pt;text-indent:7.69%;font-family:Times New Roman;font-size:10pt;font-weight:normal;font-style:normal;text-transform:none;font-variant: normal;">Lots of text here describing what I am trying to capture</p>
</body>
</html>
I am trying to write an XPath expression to identify Banana Bread, and my initial effort was successful:
b_tree.xpath('.//*[starts-with(text(),"Banana Bread")]')
but I noticed some failures, and on investigation they look like the HTML above: another element is nested inside the content I am searching for. Sometimes it is, as above, a possibly unneeded font element; sometimes it is an anchor.
I worked through this related answer but have not been successful.
I can check for elements that have text_content(), clean up the text_content and then string-match it against my target, but I am hoping to learn how to apply XPath better to these types of problems.
To be absolutely clear, I need the text_content of the p element, but sometimes I just need the text of a font element. My existing XPath expression works fine when there is no intervening element. When I open a page, I do not know what structure has been imposed on the document.
When the text() expression is applied to an element whose text content is interrupted by other elements, it returns a node-set of several text nodes, and starts-with considers only the first of them. If you replace text() with ., you get the string value of the element, which is the concatenation of all its descendant text nodes, and that's what you want.
But there is still a problem with the spaces in an element like (attributes omitted, spaces are dots):
<p>
..<a></a>
..<a></a>
..<font>
....<a></a>Banana</font>
..<font>Pudding</font>
</p>
The text value of this element is _.._.._.._....Banana_..Pudding_ (underscores represent line feeds), therefore you must apply normalize-space, which normalizes this to Banana.Pudding, so that
.//*[starts-with(normalize-space(.),"Banana Pudding")]
finds this occurrence.
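To see the difference between text() and . without lxml, here is a small sketch using only Python's standard library. It assumes a simplified, well-formed copy of the Banana Pudding paragraph with all attributes removed, and it emulates the XPath 1.0 rules with ElementTree; the helper names first_text_child, string_value and normalize_space are illustrative and not part of lxml. With lxml itself you would just use the XPath expression above.

```python
import re
import xml.etree.ElementTree as ET

# Simplified, well-formed copy of the "Banana Pudding" paragraph (attributes dropped).
PUDDING = """<p>
  <a></a>
  <a></a>
  <font>
    <a></a>Banana</font>
  <font>Pudding</font>
</p>"""


def first_text_child(elem):
    """What starts-with(text(), ...) compares in XPath 1.0: the first
    text-node child of elem (or "" when elem has no text children)."""
    candidates = [elem.text] + [child.tail for child in elem]
    return next((t for t in candidates if t is not None), "")


def string_value(elem):
    """XPath string value of an element (what "." gives): all descendant
    text nodes concatenated in document order."""
    return "".join(elem.itertext())


def normalize_space(s):
    """XPath normalize-space(): strip, then collapse runs of space/tab/CR/LF."""
    return re.sub(r"[ \t\r\n]+", " ", s).strip(" \t\r\n")


p = ET.fromstring(PUDDING)
print(repr(first_text_child(p)))         # only the leading line break and indentation
print(repr(string_value(p)))             # "Banana" and "Pudding" buried in whitespace
print(normalize_space(string_value(p)))  # Banana Pudding
```

The first text child is pure whitespace, which is why starts-with(text(), ...) fails on this paragraph, while the normalized string value starts with the words you are looking for.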
However, Banana Bread cannot be found this way, because that exact string does not occur in the text of the page. The element
<font>
..<a></a>Ban</font>.....<font>ana.Bread</font>
has a normalized text value of Ban.ana.Bread and you don't expect the space inside the word Banana. normalize-space removes spaces and line feeds that are invisible on the rendered page, but the two spaces in Ban.ana.Bread are both visible.
If there were no space between the two <font> elements,
.//*[starts-with(normalize-space(.),"Banana Bread")]
would detect 3 elements: the <html>, the <body> and the <p>, because "Banana Bread" are the first words of each of them. So you would be better off using
.//p[starts-with(normalize-space(.),"Banana Bread")]
instead.
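The following stdlib-only sketch checks both claims under the same assumptions (attributes dropped, ElementTree emulating XPath 1.0). PAGE_WITH_SPACE mirrors the question's markup, with a space between the two <font> elements; PAGE_WITHOUT_SPACE is a hypothetical variant without that space. Like the answer's .//*, passing tag=None treats every element, including <html>, as a candidate.

```python
import re
import xml.etree.ElementTree as ET

# As in the question: a space separates the two <font> elements.
PAGE_WITH_SPACE = """<html><body><p>
<a></a><font>
<a></a>Ban</font> <font>ana Bread</font>
</p><p>The Best You Ever Tasted</p></body></html>"""

# Hypothetical variant with no whitespace between the <font> elements.
PAGE_WITHOUT_SPACE = PAGE_WITH_SPACE.replace("</font> <font>", "</font><font>")


def normalize_space(s):
    """XPath normalize-space(): strip, then collapse runs of space/tab/CR/LF."""
    return re.sub(r"[ \t\r\n]+", " ", s).strip(" \t\r\n")


def matching_tags(page, prefix, tag=None):
    """Tags of elements whose normalized string value starts with prefix.

    tag=None visits every element, including <html> (like .//* in the answer);
    tag="p" visits only paragraphs (like .//p).
    """
    root = ET.fromstring(page)
    return [
        e.tag
        for e in root.iter(tag)
        if normalize_space("".join(e.itertext())).startswith(prefix)
    ]


print(matching_tags(PAGE_WITH_SPACE, "Banana Bread"))          # []
print(matching_tags(PAGE_WITHOUT_SPACE, "Banana Bread"))       # ['html', 'body', 'p']
print(matching_tags(PAGE_WITHOUT_SPACE, "Banana Bread", "p"))  # ['p']
```

With the space present nothing matches, because the paragraph normalizes to "Ban ana Bread". Without it, the ancestors match too, and restricting the candidates to <p> leaves just the paragraph.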

How can I get texts with certain criteria in python with selenium? (texts with certain siblings)

This one is really tricky for me, so I'll describe the question in as much detail as possible.
First, let me show you an example of the HTML.
....
....
<div class="lawcon">
<p>
<span class="b1">
<label> No.1 </label>
</span>
</p>
<p>
"I want to get the 'No.1' label in the span if the div[@class='lawcon'] has an <a> tag with the "bb" title and the string 'Law' in its text."
<a title="bb" class="link" onclick="javascript:blabla('12345')" href="javascript:;">Law Power</a>
</p>
</div>
<div class="lawcon">
<p>
<span class="b1">
<label> No.2 </label>
</span>
</p>
<p>
"But I don't want to get the No.2 label because, although it has an <a> tag with the "bb" title, its text doesn't contain 'Law'"
<a title="bb" class="link" onclick="javascript:blabla('12345')" href="javascript:;">Just Power</a>
</p>
</div>
<div class="lawcon">
<p>
<span class="b1">
<label> No.3 </label>
</span>
</p>
<p>
"If there are multiple <a> tags matching the criteria in a single div, I want to get the span (No.3) for each of them"
<a title="bb" class="link" onclick="javascript:blabla('12345')" href="javascript:;">Lawyer</a>
<a title="bb" class="link" onclick="javascript:blabla('12345')" href="javascript:;">By the Law</a>
<a title="bb" class="link" onclick="javascript:blabla('12345')" href="javascript:;">But not this one</a>
...
...
...
So here is the thing: I want to extract the label text (e.g. No.1) in a div[@class='lawcon'] only if the div has an <a> tag with the "bb" title and the string 'Law' in its text.
If the div doesn't contain any <a> tag with the "bb" title and "Law" in its text, the span should not be collected.
What I tried was:
div_list = [div.text for div in driver.find_elements_by_xpath('//span[following-sibling::a[@title="bb"]]')]
But the problem is that when a single div has multiple <a> tags matching the criteria, it returns only one result for that div.
What I want is a list (or tuple) that pairs the location (the span's number) with the text of each matching <a> tag.
So it should look like:
[[No.1 - Law Power], [No.3 - Lawyer], [No.3 - By the Law]]
I'm not sure I have explained it well enough. Thank you for your interest, and I hope you can enlighten me with your knowledge. I really appreciate it in advance.
Here is a simple Python script to get your desired output:
from selenium.webdriver.common.by import By
# Selenium 4 API: find_elements(By.XPATH, ...) replaces the removed find_elements_by_xpath
links = driver.find_elements(By.XPATH, "//a[@title='bb' and contains(., 'Law')]")
linkData = []
for link in links:
    # Walk up to the enclosing lawcon div and read its label
    label = link.find_element(By.XPATH, "./ancestor::div[@class='lawcon']//label").text
    linkData.append([label + '-' + link.text])
print(linkData)
Output:
[['No.1-Law Power'], ['No.3-Lawyer'], ['No.3-By the Law']]
I am not sure why you want the output in that format. I would prefer the approach below, which groups the matching links by div: you can see how many divs have matching links and then access the links for each div. Just a thought.
from selenium.webdriver.common.by import By
# Each lawcon div containing at least one matching link, returned once, in document order
divs = driver.find_elements(By.XPATH, "//a[@title='bb' and contains(., 'Law')]/ancestor::div[@class='lawcon']")
linkData = []
for div in divs:
    currentList = []
    label = div.find_element(By.XPATH, ".//label").text
    for link in div.find_elements(By.XPATH, ".//a[@title='bb' and contains(., 'Law')]"):
        currentList.append(label + '-' + link.text)
    linkData.append(currentList)
print(linkData)
Output:
[['No.1-Law Power'], ['No.3-Lawyer', 'No.3-By the Law']]
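Both answers need a live browser. To check the filtering and grouping logic offline, here is a rough stdlib-only sketch that applies the same rules with ElementTree to a simplified, well-formed copy of the question's markup (onclick and href attributes dropped). It is an emulation, not Selenium code; PAGE and the helper link_data are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

# Simplified, well-formed copy of the question's markup.
PAGE = """<body>
<div class="lawcon"><p><span class="b1"><label> No.1 </label></span></p>
  <p><a title="bb">Law Power</a></p></div>
<div class="lawcon"><p><span class="b1"><label> No.2 </label></span></p>
  <p><a title="bb">Just Power</a></p></div>
<div class="lawcon"><p><span class="b1"><label> No.3 </label></span></p>
  <p><a title="bb">Lawyer</a><a title="bb">By the Law</a>
     <a title="bb">But not this one</a></p></div>
</body>"""


def link_data(root):
    """Group 'label-link text' strings by div, mirroring
    //div[@class='lawcon'] and .//a[@title='bb' and contains(., 'Law')]."""
    result = []
    for div in root.iter("div"):
        if div.get("class") != "lawcon":
            continue
        label = div.find(".//label").text.strip()
        matches = [
            f"{label}-{a.text}"
            for a in div.iter("a")
            if a.get("title") == "bb" and "Law" in "".join(a.itertext())
        ]
        if matches:  # keep only divs that contain at least one matching link
            result.append(matches)
    return result


grouped = link_data(ET.fromstring(PAGE))
print(grouped)  # [['No.1-Law Power'], ['No.3-Lawyer', 'No.3-By the Law']]
# Flattening gives the one-entry-per-link shape of the first answer.
print([[entry] for group in grouped for entry in group])
```

The grouped form corresponds to the second answer; the flattened list reproduces the first answer's output.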
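As your requirement is to extract the texts No.1 and so on, which are within <label> tags, you can also select the labels directly. If the page loads its content dynamically, induce WebDriverWait for visibility_of_all_elements_located() (this needs the WebDriverWait, expected_conditions as EC, and By imports). Note that you will get only 2 matches (against your expectation of 3): an XPath expression returns each element at most once, so the No.3 label appears a single time even though two links in that div match. You can use the following locator strategy:
Using XPath:
print([my_elem.get_attribute("innerHTML") for my_elem in WebDriverWait(driver, 5).until(EC.visibility_of_all_elements_located((By.XPATH, "//div[@class='lawcon']//a[@title='bb' and contains(.,'Law')]/preceding::label[1]")))])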

display quotation marks within a variable freemarker - netsuite advanced pdf

I'm trying to show results from the NetSuite database in a PDF, but some results contain quotation marks, and those results come out incomplete. I tried adding "?html" to the end of each variable, but it does not affect the column that I want.
I hope you can help me. Greetings!
Netsuite Advanced PDF template - Freemarker
<table cellmargin="5"><#list results as result><tr>
<td style="width: 150px;">
<#if result.custitem_gg_item_image?length != 0><img src="https://----com${result.custitem_gg_item_image}" style="width: 125px; height: 125px;"/><#else><img src="https:/---.com" style="width: 125px; height: 125px;"/></#if>
</td>
<td><strong style="font-size: 12pt"><u><span>${result.itemid?html}</span></u></strong><br/><br/><strong style="font-size: 10pt"><span>${result.displayname?html}</span></strong><br/><br/>
<#if result.purchasedescription?length != 0><span>${result.purchasedescription?html}</span><#else><span>${result.salesdescription?html}</span></#if></td>
</tr>
</#list>
</table>
Having a quotation mark in a field value by itself shouldn't cause any issues. Do you also have HTML in the sales and purchase descriptions? Normally you should not put HTML there; leave HTML for the storedetaileddescription field.
If your issue is actually HTML in the descriptions, see this answer: Remove HTML tags in Freemarker Template

How to find two different class names using find_elements_by_css_selector

I can't seem to find an answer for this on the net.
Here's a snippet of html code:
<td>
<div class="low-fare-day active"></div>
<div class="low-prices"></div>
</td>
<td>
<div class="low-fare-day"></div>
<div class="low-prices1"></div>
</td>
I want to find both the low-fare-day and the low-fare-day active divs using a CSS selector, but I couldn't get it working. Can anyone solve this puzzle for me?
Below is what I tried:
fromdata = driver.find_elements_by_css_selector('div.low-fare-day','div.low-fare-day.active')
or
fromdata = driver.find_elements_by_css_selector('div.low-fare-day' | 'div.low-fare-day.active')
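Try this:
from selenium.webdriver.common.by import By
driver.find_elements(By.CSS_SELECTOR, 'div[class*=low-fare-day]')
Explanation:
div -> you're looking for div elements
[class*=low-fare-day] -> you're comparing the value of each selected div's class attribute
*= -> matches when the attribute value contains the given string anywhere, so both "low-fare-day active" and "low-fare-day" match (and so would, for example, "low-fare-days")
Also note that a class selector matches any element whose class list includes that class, so div.low-fare-day on its own already returns both divs. To combine two different selectors, pass them as one comma-separated string, e.g. 'div.low-fare-day, div.low-prices'.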
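To make the difference between the two matching rules concrete, here is a plain-Python illustration (standard library only, no Selenium). It collects the class attribute of each div with html.parser and applies a class-token test (what div.low-fare-day does) and a substring test (what div[class*=low-fare-day] does). The third <td> with class low-fare-days is a hypothetical addition, not part of the question's page.

```python
from html.parser import HTMLParser

# The question's markup plus one hypothetical div ("low-fare-days")
# that shows where substring matching and class matching differ.
HTML = """<td><div class="low-fare-day active"></div><div class="low-prices"></div></td>
<td><div class="low-fare-day"></div><div class="low-prices1"></div></td>
<td><div class="low-fare-days"></div></td>"""


class DivClassCollector(HTMLParser):
    """Records the class attribute of every <div>, in document order."""

    def __init__(self):
        super().__init__()
        self.classes = []

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            self.classes.append(dict(attrs).get("class") or "")


def class_selector_matches(class_attr, name):
    # div.low-fare-day: name must be one of the space-separated classes
    return name in class_attr.split()


def substring_selector_matches(class_attr, value):
    # div[class*=low-fare-day]: the class attribute must contain the substring
    return value in class_attr


collector = DivClassCollector()
collector.feed(HTML)
collector.close()

by_class = [c for c in collector.classes if class_selector_matches(c, "low-fare-day")]
by_substring = [c for c in collector.classes if substring_selector_matches(c, "low-fare-day")]
print(by_class)      # ['low-fare-day active', 'low-fare-day']
print(by_substring)  # ['low-fare-day active', 'low-fare-day', 'low-fare-days']
```

On the question's own markup both rules pick the same two divs; the substring rule only differs when another class merely contains low-fare-day as part of a longer name.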

Get the text of a link within a table cell

I have a table similar to this one:
<table id="space-list" class="aui list-container">
<tr class="space-list-item" data-spacekey="BLANKSPACEEXAMPLE">
<td class="entity-attribute space-name">
<a title="Blank Space Example" href="https://q-leap.atlassian.net/wiki/display/BLANKSPACEEXAMPLE/Blank+Space+Example+Home">
Blank Space Example
</a>
</td>
<td class="entity-attribute space-desc">
<span>
An example of a "Knowledge Base" type space, freely editable, accessible to everyone, may be deleted at any time.
</span>
</td>
</tr>
</table>
My PageObject code looks like this:
class Space < PageObject::Elements::TableRow
  def name
    cell_element(index: 0).link_element(href: /q-leap/).text
  end
  def description
    cell_element(index: 1).text
  end
end
PageObject.register_widget :space, Space, :tr
class SpaceDirectoryPage
  include PageObject
  spaces(:space) do
    table_element(:id => 'space-list')
      .group_elements(:tag_name => 'tr')[1..-1]
  end
end
And now I am iterating over all the rows in the table to get the content of each cell:
while true
  on(SpaceDirectoryPage).space_elements.each_with_index do |space|
    puts space.name
    puts space.description
  end
end
This works fine for the description, but I have no idea how to access the text of the link in the first column; I have tried hundreds of things and nothing worked.
Thanks in advance!
