CSS selector for an element's text without child elements - text

I'm trying to pull out the text of a parent element without the text of its child elements:
<div>
Text A
<button>Text B</button>
<button>Text C</button>
</div>
I need only the "Text A" but all I manage to get is:
" Text A
Text B
Text C "
Any suggestions?
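One way to think about it: "Text A" is a text node that belongs directly to the div, while "Text B" and "Text C" belong to the buttons, so the goal is to read only the parent's own text nodes. Below is a minimal sketch using Python's standard-library ElementTree. It assumes the markup is well-formed enough to parse as XML (real-world HTML often is not and would need a more lenient parser), and the variable names are purely illustrative.

```python
import xml.etree.ElementTree as ET

html = """<div>
Text A
<button>Text B</button>
<button>Text C</button>
</div>"""

div = ET.fromstring(html)

# .text holds only the text that appears before the first child element.
own_text = (div.text or "").strip()

# Each child's .tail holds text that follows that child but still belongs
# to the parent; here the tails are just whitespace and get filtered out.
tails = [child.tail.strip() for child in div if child.tail and child.tail.strip()]

direct_text = " ".join([own_text] + tails).strip()
print(direct_text)  # Text A
```

In Scrapy the same idea is usually expressed as `div::text` (without a descendant combinator before `::text`) or the XPath `./text()`, both of which select only the element's direct text nodes.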

Related

Generate Selector from source code, for Scrapy

I am trying to create a CSS selector from the source code of a dynamic web page. I have tried with no results with:
response.css('seller-info#region *::text').get()
response.css('seller-info > region *::text').get()
response.css('.seller-info#region ::text').get()
response.css('seller-info#region ::text').get()
response.css('seller-info > region ::text').get()
response.css('seller-info:contains("to extract")::text').get()
response.css('.seller-info:contains("to extract")::text').get()
response.css('.seller-info:contains("to extract") *::text').get()
response.css('seller-info:contains("to extract") *::text').get()
Response of each: "None"
I need the text: "to extract"
*The region name is repeated in other code trees
Source code
<seller-info
username='glorious'
ispro='true'
region="to extract"
phoneurl='/pg/0.gif"'
storeurl=""
seniority=''
category="1220"
phonevisible='true'
>
<div slot="avatar">
<div class="seller-info__header--icon-container">
<i class="icon-yapo icon-briefcase "></i>
</div>
</div>
</seller-info>
The data you are trying to extract is a tag attribute value, not tag text:
region = response.css("seller-info[region]::attr(region)").get()
or:
region = response.css("seller-info::attr(region)").get()
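Selectors like tagname::text are meant to extract the text between an opening and a closing tag, as in <tagname> text to extract </tagname>.
Your <seller-info> tag stores its data in its attributes (much like an img tag does), so the value cannot be reached through ::text. Note also that #region matches id="region", not an attribute named region.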
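To make the attribute-versus-text distinction concrete without a Scrapy project, here is a small sketch that reads the region attribute using only Python's standard-library HTMLParser. The class name AttrCollector is hypothetical, and the markup is a shortened copy of the snippet from the question.

```python
from html.parser import HTMLParser


class AttrCollector(HTMLParser):
    """Collect the values of one attribute from every matching start tag."""

    def __init__(self, tag, attr):
        super().__init__()
        self.tag = tag
        self.attr = attr
        self.values = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the current start tag.
        if tag == self.tag:
            for name, value in attrs:
                if name == self.attr:
                    self.values.append(value)


html = '''<seller-info
username='glorious'
region="to extract"
category="1220"
>
<div slot="avatar"></div>
</seller-info>'''

collector = AttrCollector("seller-info", "region")
collector.feed(html)
print(collector.values)  # ['to extract']
```

The value lives in the start tag itself, which is why a text-based selector never sees it.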

How to scrape nested text between tags using BeautifulSoup?

I found a website using the following HTML structure somewhere:
...
<td>
<span>some span text</span>
some td text
</td>
...
I'm interested in retrieving the "some td text" and not the "some span text" but the get_text() method seems to return all the text as "some span textsome td text". Is there a way to get just the text inside a certain element using BeautifulSoup?
Not all the tds follow the same structure, so unfortunately I cannot predict the structure of the resulting string to trim it where necessary.
Each element has a name attribute, which tells you the type of tag, e.g. div, td, span. For bare text content (a NavigableString, which has no tag), name is None.
So you can just use a simple list comprehension to filter out all the tag elements.
from bs4 import BeautifulSoup
html = '''
<td>
<span>some span text</span>
some td text
</td>
'''
soup = BeautifulSoup(html, 'html.parser')
content = soup.find('td')
text = [c.strip() for c in content if c.name is None and c.strip() != '']
print(text)
This will print:
['some td text']
after some cleaning of newlines and empty strings.
If you wanted to join up the content afterwards, you could use join:
print('\n'.join(text))

Finding texts of anchor tags which are children of specific class named div

<div class="outer">
<div class= ""></div>
<div class= "inner">
text1
text2
text3
</div>
</div>
Let's say there is an outer div that holds a couple of child divs. The first one has no class name; the second one contains anchor tags. The page has many divs with the class "outer". How can I get the text inside these a tags? I also want to count the anchor tags inside each div class="inner", because the divs with class "outer" hold a different number of a tags in their child div with class "inner".
To get the a tags inside div.outer > div.inner, loop over the outer divs:
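outers = soup.select('div.outer')
for outer in outers:
    # Anchor tags inside this outer div's div.inner child
    atags = outer.select('div.inner a')
    print(len(atags))  # number of anchors in this block
    for a in atags:
        print(a['href'])  # use a.get_text() for the link text instead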
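The loop above prints the href values, while the question asks for the anchor texts and their count per block. A dependency-free sketch of that variant follows, using the standard-library ElementTree. The sample markup is an assumption: the anchors in the question's snippet were lost, so the a tags, their href values, and the wrapping root element are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: two "outer" blocks with a different number of anchors.
html = """<root>
<div class="outer">
  <div class=""></div>
  <div class="inner">
    <a href="/1">text1</a>
    <a href="/2">text2</a>
    <a href="/3">text3</a>
  </div>
</div>
<div class="outer">
  <div class=""></div>
  <div class="inner">
    <a href="/4">text4</a>
  </div>
</div>
</root>"""

root = ET.fromstring(html)

results = []
for outer in root.findall(".//div[@class='outer']"):
    # Only anchors that sit directly inside this block's div.inner child.
    anchors = outer.findall("./div[@class='inner']/a")
    texts = [(a.text or "").strip() for a in anchors]
    results.append((len(texts), texts))

print(results)
```

Unlike a CSS class selector, the `[@class='inner']` predicate compares the whole attribute value, so it would miss an element with several classes such as class="inner wide".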

jsf fill title (tooltip) with line-break based on backing-bean

I dynamically generate an input form from a backing bean. One of the dynamically generated elements is an image with a title value (tooltip), but I couldn't figure out how to add a line break to the title.
I tried "\r\n", "<br/>", "
", all end up showing the same value on the tooltip.
I know if it is an outputText, I could call escape="false" to make it work...but how to do it with the title field?
<img src="img/info.png" title="#{bean.info}"/>
where bean.info has the value
"A : apple
B : ball"
but the tooltip ends up showing "A : apple
B : ball" instead of on two lines.
Or does anyone have idea how to put an outputText result into the title of an img tag?

Show only li and corresponding elements that contain visible and hidden text matching search terms?

I have this:
<ul id="container">
<li class="sub_container">
<a> + text + </a><span> + text + </span><span> + text + </span>
<span> + text + </span><a> + text + </a><div> + text + </div>
<hr></hr>
</li>
<li class="sub_container">
<a> + text + </a><span> + text + </span><span> + text + </span>
<span> + text + </span><a> + text + </a><div> + text + </div>
<hr></hr>
</li>
</ul>
and I want to be able to search some visible and hidden text (if possible), so where there is a matching term within + text + inside any <li> tags, only those relevant tags and all their contained elements (which also includes their + text +) stay visible while all other <li> and contained elements become hidden. Shown above is a sample set of two <li> and their contained elements but those tags and elements will multiply dynamically - only the content of + text + changes (+ text + parts come from spreadsheet).
I tried this for searching, where #find is a link to activate the search and #filter is the input box to type the search terms:
$jq("#find").click(function () {
    $jq("ul li").hide()
        .filter(":contains('" + $jq("#filter").val() + "')").show();
    return false;
});
It works but hides only the next 2 items (in this case an <a> and <span>) located immediately after the opening <li> tag. I tried the plugin at http://lomalogue.com/jquery/quicksearch/ but it gave no results. Any simple click function like the one above would be appreciated. Thanks in advance for a working sample.
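This piece of code finally helped (note that :containsIgnoreCase is a custom selector, not part of jQuery core, so it has to be defined separately):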
$jq("span.numbers, div.timestamp, span.title, div.text, hr").hide();
// Cache the matching elements instead of re-running the selector for each call.
var $matches = $jq(".text:containsIgnoreCase(" + $jq(".search").val() + ")");
$matches.show("fast");
$matches.prev("span").show("fast");
$matches.prev("span").prev("div").show("fast");
$matches.prev("span").prev("div").prev("span").show("fast");
$matches.next("hr").show("fast");
The above code hides all elements and then shows elements matching search terms, I guess you could do it the other way around. The headers and other items (siblings related to and located above the divs containing the search terms) display accordingly. This works great for dynamically inserted data coming from a spreadsheet, for instance, where #ids cannot be used and each section of data utilizes the same .class elements.
