scrapy xpath extract text after element is assigned

I have this HTML:
<h1 id="1"><i>2</i>sample contents</h1>
I know the following works to get only the text, without the HTML tags:
response.xpath('//*[@id="1"]/text()').get() # sample contents
response.xpath('//*[@id="1"]/text()').extract_first() # sample contents
But what if I assign the element to a variable first and then want only the text, without the HTML?
For example:
header = response.xpath('//*[@id="1"]')
# both calls below return the text WITH the HTML tags
header.get()
header.extract_first()
What I want is: once the selector is assigned to header, how can I get the text only?
Thanks in advance for any suggestions and help.
EDIT:
After testing Moein's answer, what I get back is whitespace ("\r\n \r\n ") instead of the text.

You can continue the XPath from the header variable by calling xpath on it with a path relative to the selected element:
header.xpath('./text()').get()

Related

How do I get the class name from HTML using the absolute xpath

In the following picture, the full xpath of the yellow highlighted bit of HTML is
/html/body/bx-site/ng-component/div/sp-sports-ui/div/main/div/section/main/sp-path-event/div/sp-next-events/div/div/div[2]/div[1]/sp-coupon[1]/sp-multi-markets/section/section/sp-outcomes/sp-two-way-vertical[2]/ul/li[1]/sp-outcome/button
I am using Selenium to scrape some data from a website. The text at that XPath is what I want, but I also need the class name of the highlighted element. The class name constantly changes, so I need a way to retrieve it along with the text. In this case the class name would be "bet-btn". I am using driver.find_element_by_xpath to get the text from the HTML, but I can't figure out how to retrieve the class name. Is there a way in Selenium to get the class name of the highlighted element from its XPath?

I would advise against using an absolute XPath unless you really need to.
Try this instead:
elem = driver.find_element_by_xpath("//sp-outcome/button")
class_value = elem.get_attribute("class")
BTW, that XPath assumes there are no other elements matching //sp-outcome/button on the page. If there are, you would need to make it more specific, but you still wouldn't need the entire absolute XPath. Those are generally pretty fragile.

HTML tag with attribute containing hyphen

I have a requirement to generate HTML from Python. For that I am using the HTML module.
As I use AngularJS in my application, I have to add attributes like 'ng-controller' to a div tag (an attribute name that contains a hyphen).
For example:
<div ng-controller='myController'></div>
The HTML module does not allow a hyphen or any other special character as part of an attribute name.
I have tried a lot, but I have not been able to find a solution. Can anyone please help me with this?

I recently checked the HTML module: there is a raw_text() function that lets the user add HTML code explicitly.
from html import HTML
h = HTML()
h.raw_text('<div ng-controller="a"></div>')
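
If the builder module is not a hard requirement, another option is to assemble the tag yourself. The sketch below uses only the standard library (whose html module offers escape() but no HTML class) and passes attributes as a dict, so names with hyphens are not a problem. The render_tag helper is a hypothetical name made up for this illustration, not part of any library.

```python
from html import escape  # standard-library html module, not the third-party builder


def render_tag(name, attrs=None, content=""):
    """Return an HTML element as a string (hypothetical helper).

    Attribute names are dict keys, so they may contain hyphens.
    Attribute values are escaped; content is inserted as given.
    """
    attr_text = "".join(
        f' {key}="{escape(str(value), quote=True)}"'
        for key, value in (attrs or {}).items()
    )
    return f"<{name}{attr_text}>{content}</{name}>"


print(render_tag("div", {"ng-controller": "myController"}))
```

Since content is not escaped here, it should only receive markup you trust or have escaped yourself.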

netsuite - inline html

I am trying to use a custom column as a hyperlink to an external site.
That is, on the PO detail page I want to add a custom column whose value is the following HTML content:
Try Google
So when I go to the PO detail page, I want to see a link to google.com.
How can I do this? I tried Inline HTML, Free-Form Text and Rich Text fields; none of them gave me a link.

I found one way of doing this:
1. I added an Inline HTML field.
2. I set its default value to an <iframe> block that sends data to my service point.
3. In the service point, I created the link (<a>) needed for that PO and printed it.
That worked for me.

I had this same problem and finally found the answer. You need to create a field as Inline HTML and then enclose the link and the url in single quotes, concatenating with double pipes:
'Search Google'
More help can be found here:
http://www.netsuiterp.com/2019/06/highlighting-url-link-custom-field.html

WKHTMLTOPDF Dynamic Header on every page

I am trying to produce a PDF from a large HTML file using the WKHTMLTOPDF library in Node. I need to put some content in the header and footer of every page, but the header content changes on every page: for example, a custom number in a format like BX008761 that increments on each page.
The first page will be BX008761, the second BX008762, the third BX008763, and so on.
I found a related thread:
WKHTMLTOPDF -- Is possible to display dynamic headers?
the above thread states:
"you can feed --header-html almost anything :) Try the following to see my point:
wkhtmltopdf.exe --margin-top 30mm --header-html isitchristmas.com google.fi x.pdf
So isitchristmas.com could be www.yoursite.com/magical/ponies.php"
Is the source given to the --header-html option fetched for every page of the rendered PDF, or just once per PDF?
I appreciate your support. Thank you.
EDIT: I tried a sample program and confirmed that the value given to the --header-html option is processed for every page rendered in the PDF. I am using a remote service that returns the HTML string as the response for the URL.
Now it displays the HTML string as is, instead of rendering it.
When the service returns the string below:
<html> <body> <span style="color:red" > 123 :: 0 :: 3000025 :: 634943551338828720</span> <body> <html>
then the header on every page shows that same raw string instead of the text in red. How do I make wkhtmltopdf understand that the content it receives from the service needs to be rendered as HTML?
I would appreciate it if anyone could suggest a workaround.
Thank you.
EDIT: I used a different workaround that returns an HTML page for the header content. Essentially, I used an HTTPHandler in ASP.NET to return a valid response, and that appears to have solved the core problem of having a dynamic header on every page.
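
For the incrementing number itself, here is a rough sketch of what such a handler could compute, written in Python rather than ASP.NET. As far as I know, wkhtmltopdf appends GET arguments such as page and topage when it requests the --header-html document, so the handler can derive the label from the page argument. The header_html function, its parameters and the starting number are assumptions made up for this illustration.

```python
from urllib.parse import parse_qs


def header_html(query_string, first_number=8761, prefix="BX", width=6):
    """Build the header document for one page (hypothetical helper).

    query_string is the part of the request URL after '?', for example
    'page=2&topage=10'. A missing page argument is treated as page 1.
    """
    params = parse_qs(query_string)
    page = int(params.get("page", ["1"])[0])
    # Page 1 gets first_number, page 2 gets first_number + 1, and so on.
    label = f"{prefix}{first_number + page - 1:0{width}d}"
    return (
        "<!DOCTYPE html><html><body>"
        f'<span style="color:red">{label}</span>'
        "</body></html>"
    )


print(header_html("page=3&topage=10"))
```

The string would still have to be served as a complete HTML response for the header to render as markup rather than show up as plain text.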

Formatting views template output

I am using this code in a Views template to print the user name in a field. The output is fine, except that it doesn't show as a link, as it's supposed to.
<?php print $row->users_name; ?>
How can I modify the code to show the output as a link? By the way, this is the comment in the template file, which I can't make sense of:
Variables available:
$view: The view object
$field: The field handler object that can process the input
$row: The raw SQL result that can be used
$output: The processed output that will normally be used.
When fetching output from the $row, this construct should be used:
$data = $row->{$field->field_alias}

What does a dpr of $row->users_name show? Is it supposed to contain the correct HTML for a link?
If not then take a look at the l function - http://api.drupal.org/api/function/l
Like Martin says you should let us know where you want the link to point to.
