The XML file that I am trying to parse has an element that may contain HTML. I try to get the value like this:
data.Description = root.Element(namespace+ "Description").Value;
But when the value is HTML, I get back the plain text representation of the HTML. Is there any way to get the original value of XElement?
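You're looking for .ToString(), which returns the outer XML source, including the Description tag itself. If you only want the markup inside the element, concatenate its child nodes instead: string.Concat(root.Element(namespace + "Description").Nodes()).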
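The distinction between text and markup is not specific to LINQ to XML. As a rough analogue only (Python's standard-library ElementTree, not the .NET API), the sketch below contrasts concatenated text (what .Value returns), outer XML (what .ToString() returns) and inner XML. The namespace URI and the document are made-up examples.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace and document, standing in for the asker's XML.
URI = "http://example.com/ns"
DOC = (
    '<root xmlns="http://example.com/ns">'
    "<Description><p>Hello <b>world</b></p></Description>"
    "</root>"
)

root = ET.fromstring(DOC)
desc = root.find("{%s}Description" % URI)

# Like XElement.Value: all descendant text concatenated, markup dropped.
plain = "".join(desc.itertext())

# Like XElement.ToString(): outer XML, including the <Description> tag.
outer = ET.tostring(desc, encoding="unicode", default_namespace=URI)

# Inner XML only: leading text plus each serialized child.
# ET.tostring() also writes each child's tail text, so no text is lost.
# Each serialized child carries its own xmlns declaration.
inner = (desc.text or "") + "".join(
    ET.tostring(child, encoding="unicode", default_namespace=URI) for child in desc
)

print(plain)  # Hello world
print(outer)
print(inner)
```

The default_namespace argument (Python 3.8+) only keeps the output free of generated ns0: prefixes; it does not change which nodes are serialized.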
Related
I am working with the Jodit editor in a React project. When I create a note, send it to the database and display it, it renders fine. But when I edit the content and save it, the rendered text now contains HTML tags such as "<p><". I have tried using renderHTML to convert it to plain text, but that doesn't seem to work: renderHTML works fine when creating a new note, but doesn't convert the content to plain text after an edit. I really need help converting it to plain text alone.
Before you use the Jodit contents in a non-Jodit environment, you'll need a utility that converts the HTML into plain text.
One library that does this already is html-to-text.
On the backend, the actual code is as simple as:
const { convert } = require('html-to-text');
let adjustedText = convert('<p>Your string of HTML code here</p>');
... unless you want to pass optional settings to convert() as a second argument, such as wordwrap (e.g. convert(html, { wordwrap: 130 })). See the html-to-text documentation for the full list of options.
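To show what such a converter does at its core, here is a minimal stand-in written with Python's standard-library html.parser. It is not html-to-text and not its algorithm; TextExtractor and html_to_plain are hypothetical names for this sketch, which simply keeps the character data between tags.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects only the character data between tags, discarding the tags."""

    def __init__(self):
        # convert_charrefs=True (the default) decodes entities such as &amp;.
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def html_to_plain(html_source: str) -> str:
    parser = TextExtractor()
    parser.feed(html_source)
    parser.close()  # flush any text still buffered by the parser
    return "".join(parser.parts)


print(html_to_plain("<p>Your string of <b>HTML</b> code here</p>"))
# Your string of HTML code here
```

This naive version does not insert line breaks for block elements, does not wrap lines and keeps the contents of script and style tags; handling those cases is exactly what a dedicated library like html-to-text adds.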
I have HTML like this:
<h1 id="1"><i>2</i>sample contents</h1>
I know the following gets only the text, without the HTML, perfectly:
response.xpath('//*[@id="1"]/text()').get() # sample contents
response.xpath('//*[@id="1"]/text()').extract_first() # sample contents
But what if I assign the element to a variable first and then want to get only its text, without HTML?
For example:
header = response.xpath('//*[@id="1"]')
# the below will get text WITH html tags
header.get()
header.extract_first()
Once I have assigned the element to header, how can I get only its text?
Thanks in advance for any suggestions and help.
EDIT:
When testing Moein's answer, I somehow get back whitespace such as "\r\n \r\n " instead.
You can continue your XPath query by calling xpath() on the header variable:
header.xpath('./text()').get()
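The whitespace in the edit comes from the difference between text nodes directly under h1 and all descendant text: .get() returns only the first direct text node, which on a real page can be whitespace. The stdlib-only sketch below (ElementTree, not Scrapy/parsel) illustrates that difference; it assumes the markup is well-formed enough to parse as XML, and the surrounding div and its whitespace are made up. direct_text and all_text are hypothetical helpers.

```python
import xml.etree.ElementTree as ET

# Made-up page fragment; the newlines mimic whitespace found on real pages.
PAGE = '<div>\n  <h1 id="1"><i>2</i>sample contents</h1>\n</div>'

root = ET.fromstring(PAGE)
header = root.find(".//*[@id='1']")  # ElementTree supports this XPath subset


def direct_text(elem):
    """Roughly XPath ./text(): text nodes that are direct children of elem."""
    nodes = [elem.text] + [child.tail for child in elem]
    # Drop empty and whitespace-only nodes, then trim the rest.
    return [n.strip() for n in nodes if n and n.strip()]


def all_text(elem):
    """Roughly XPath string(.): every descendant text node concatenated."""
    return "".join(elem.itertext())


print(direct_text(header))  # ['sample contents']
print(all_text(header))     # 2sample contents
```

In Scrapy itself, the equivalent of direct_text is to collect every direct text node and strip them: [t.strip() for t in header.xpath('./text()').getall() if t.strip()].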
I have a problem matching response text on an XML page with Capybara.
When I use page.should(have_content(arg1)), Capybara raises an error that there is no html element (there shouldn't be, as it's XML).
When I use page.should(have_xpath(arg1)), it raises Element at 40 no longer present in the DOM (Capybara::Webkit::NodeNotAttachedError).
What is the correct way to test XML?
When using capybara-webkit, the driver will try to use a browser's HTML DOM to look for elements. That doesn't work, because you don't have an HTML document.
One workaround is to fall back to Capybara's string implementation:
xml = Capybara.string(page.body)
expect(xml).to have_xpath(arg1)
expect(xml).to have_content(arg1)
Assuming your page returns a content type of text/xml, capybara-webkit won't mess with the response body at all, so you can pass it through to Capybara.string (or directly to Nokogiri if you like).
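Capybara.string works here because it parses page.body as a standalone document instead of querying the browser's HTML DOM. The same idea can be sketched in Python with ElementTree; has_xpath and has_content are hypothetical helpers, not Capybara's matchers, the response body is made up, and ElementTree understands only a subset of XPath (relative paths such as .//order rather than //order).

```python
import xml.etree.ElementTree as ET

# Hypothetical response body served as text/xml.
BODY = "<orders><order id='7'><status>shipped</status></order></orders>"


def has_xpath(xml_body: str, path: str) -> bool:
    """True if the (ElementTree-subset) XPath matches at least one element."""
    return ET.fromstring(xml_body).find(path) is not None


def has_content(xml_body: str, text: str) -> bool:
    """True if text appears anywhere in the document's concatenated text."""
    return text in "".join(ET.fromstring(xml_body).itertext())


assert has_xpath(BODY, ".//order[@id='7']/status")
assert has_content(BODY, "shipped")
assert not has_content(BODY, "cancelled")
```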
In Orchard, I can see that the menu is output in MenuItem.cshtml by the line DisplayChildren(model).
I would like to take this HTML output and run an XSLT transform on it to change the structure.
How can I get the menu item's HTML and store it in a local variable (as opposed to it being written directly to the output stream)?
var html = DisplayChildren(model);
Yup, it's that simple. It gives you an IHtmlString, which you can call ToString() on if you need a plain string. But XSLT? 8|
I have questions on preventing XSS attacks.
1) Question:
I have an HTML template as a JavaScript string (trusted) and insert content coming from a server request (untrusted). I replace placeholders within that HTML template string with the untrusted content and output it to the DOM using innerHTML/Text.
In particular, I insert text that I output in <div> and <p> tags already present in the template HTML string, and into form element values, i.e. text in an input tag's value attribute, in select options and in textarea tags.
Do I understand correctly that I can treat all of the inserted text mentioned above as the HTML subcontext, and thus only encode like so: encodeForJavascript( encodeForHTML( inserted_text ) )? Or do I have to encode the text that I insert into the value attributes of the input fields for the HTML attribute subcontext?
After reading up on this issue on OWASP, I am inclined to think that the latter is only necessary when I set the attribute with untrusted content via JavaScript, like so: document.forms[ 0 ].elements[ 0 ].value = encodeForHTMLAttribute. Is that correct?
2) Question:
What is the added value of encoding server responses on the server side when they enter the client via Ajax and get handled on the client anyway (as in question 1)? In addition, don't we risk problems by double encoding the content?
Thanks
You need to encode for the context in question: data inserted into an HTML context needs to be HTML-encoded, and data inserted into HTML attributes should be HTML-attribute-encoded. This is in addition to the JavaScript encoding you mentioned.
I would JavaScript-encode the data for transfer and then encode it for the correct context on the client side, where you know which context is the right one.
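A minimal sketch of "encode for transfer, then encode for the destination context", using Python's json and html modules as stand-ins for the JavaScript and ESAPI-style encoders in the question; the field name and value are made up. html.escape only covers the HTML body and quoted-attribute contexts; URLs, CSS and inline script need their own encoders.

```python
import html
import json

# Untrusted value, e.g. user input echoed back by the server.
untrusted = 'Tom "the <b>bold</b>" & co'

# Server side: JSON-encode for transfer only; no HTML escaping yet.
payload = json.dumps({"name": untrusted})

# Client side: decode, then escape for the context the value lands in.
name = json.loads(payload)["name"]
paragraph = "<p>" + html.escape(name, quote=False) + "</p>"            # HTML body context
input_tag = '<input value="' + html.escape(name, quote=True) + '">'  # attribute context

# Encoding twice corrupts the value: the reader would see "&amp;" literally.
double = html.escape(html.escape("&"))
print(double)  # &amp;amp;
```

Encoding exactly once, at the point of insertion, is what avoids the double-encoding problem. Also note that assigning through DOM properties such as element.value or textContent does not parse HTML at all, so no HTML encoding is needed there; pre-encoding would make the entities show up literally in the field.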