Using Mozilla Firefox parser (Rendering Engine) in an extension - security

I want to build a Firefox extension that uses the Firefox parser (rendering engine). I want to feed some HTML data to the parser and get back the HTML and JavaScript content separately, so that I can do some processing on it. Is there an API or another way to do this?

You mean something like this?
let s = "<i>cool</i><script>alert('cool!')</script>";
let parser = new DOMParser();
let doc = parser.parseFromString(s, "text/html");
// Scripts in a document created by DOMParser are parsed but not executed.
// Do whatever you want with the document, for example:
doc.body.appendChild(doc.createElement('hr'));
alert(doc.documentElement.outerHTML);

Related

Chrome Extension : How to write content script in HTML

On certain actions, I want to open a modal/popup on the website user is visiting.
I am doing it right now as:
// contentScript.js
function on_certain_action(message) {
  const injectElement = document.createElement("div");
  injectElement.className = 'app_modal';
  document.body.appendChild(injectElement);
}
But it is getting complex and difficult to understand. Is there some way I can write it in HTML? Or how do you handle it if you are injecting a complex script into the page?

Nodejs URL path unquoting reserved characters in pathname (sometimes)

this is kinda weird (node repl v8.15.0):
let URL = require('url').URL
let {pathname} = new URL('https://my.domain.com/e30%3D/with%3F')
console.log(pathname) // logs '/e30%3D/with%3F' <-- this looks right
then in my CloudFlare worker (using the service-worker-mock):
let URL = require('url').URL
let {pathname} = new URL('https://my.domain.com/e30%3D/with%3F')
console.log(pathname) // logs '/e30=/abc%21%3Fdef' <-- `=` unquoted in path?
I'm guessing it's probably a different version of URL? Any way I can control that?
Your expectation that the two URL implementations parse the same way is actually correct: if you open the inspector and run the second snippet, it should encode the way you expect. Unfortunately, as Harris points out, the Workers URL implementation is buggy and difficult to fix. I'd recommend using a URL polyfill in your code to encode URLs properly.
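As a stopgap short of a full polyfill, one option is to normalise the pathname yourself so that both runtimes yield the same string. This is only a minimal sketch, assuming you want every path segment fully percent-encoded; `normalizePathname` is a hypothetical helper, not part of Node or Workers.

```javascript
// Hypothetical helper: decode each path segment, then re-encode it, so that
// '/e30=/with%3F' and '/e30%3D/with%3F' normalise to the same string.
function normalizePathname(pathname) {
  return pathname
    .split('/')
    .map((segment) => {
      try {
        return encodeURIComponent(decodeURIComponent(segment));
      } catch (err) {
        // Malformed escape sequence (e.g. a lone '%'): leave the segment as-is.
        return segment;
      }
    })
    .join('/');
}

console.log(normalizePathname('/e30=/with%3F'));   // /e30%3D/with%3F
console.log(normalizePathname('/e30%3D/with%3F')); // /e30%3D/with%3F
```

Note that encodeURIComponent escapes more characters than a URL parser normally does in a path (for example ':' and '@'), so the result is a consistent form rather than an exact copy of what `new URL(...).pathname` returns.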

How to use `nsIParserUtils.parseFragment()` for Firefox addon

Our Firefox addon issues queries to Google at the backend (main.js), then extracts some content through xpath. For this purpose, we use innerHTML to create a document instance for xpath parsing. But when we submit this addon to Mozilla, we got rejected because:
This add-on is creating DOM nodes from HTML strings containing potentially unsanitized data, by assigning to innerHTML, jQuery.html, or through similar means. Aside from being inefficient, this is a major security risk. For more information, see https://developer.mozilla.org/en/XUL_School/DOM_Building_and_HTML_Insertion
Following the link provided, we tried to replace innerHTML with nsIParserUtils.parseFragment(). However, the example code:
let { Cc, Ci } = require("chrome");
function parseHTML(doc, html, allowStyle, baseURI, isXML) {
let PARSER_UTILS = "@mozilla.org/parserutils;1";
...
Cc and Ci can only be used in main.js, but the function requires a document (doc) as an argument. We could not find any example of creating a document inside main.js: it is a background script with no reference to the global built-in document, so we cannot call document.implementation.createHTMLDocument("").
I googled a lot, but still could not find any solutions. Could anybody kindly help?
You probably want to use nsIDOMParser instead, which is the same as the standard DOMParser accessible in window globals except that you can use it from privileged contexts without a window object.
Note that this gives you a whole document, with synthesized <html> and <body> elements if you don't provide your own. If you absolutely need a fragment, you can use the HTML5 template element to extract one via DOMParser:
let { Cc, Ci } = require("chrome");
let parser = Cc["@mozilla.org/xmlextras/domparser;1"].createInstance(Ci.nsIDOMParser); // privileged parser, no window needed
let partialHTML = "foo <b>baz</b> bar";
let frag = parser.parseFromString(`<template>${partialHTML}</template>`, 'text/html').querySelector("template").content;

Render raw html in response with Express

I would like to know how to render a raw HTML string in a response with Express.
My question is different from the others because I am not trying to render a raw HTML template; rather I just want to render a single raw HTML string.
Here is what I have tried in my route file.
router.get('/myRoute', function (req, res, next) {
  var someHTML = "<a href=\"foo\">bar</a>";
  res.send(someHTML);
});
But when I point my browser to this route, I see a hyperlink instead of a raw HTML string. I have tried to set the content type to text with res.setHeader('Content-Type', 'text'); to no avail.
Any suggestions?
For others arriving here; this worked best for me:
res.set('Content-Type', 'text/html');
res.send(Buffer.from('<h2>Test String</h2>'));
Edit:
And if your issue is escaping certain characters, then try using template literals.
The best way to do this, assuming you're using callback style, is to declare var output = "" and then append what you need to it with +=. Use a template literal for a large string (line breaks are allowed inside it, and ${variables} are interpolated). Then call res.writeHead(200, {'Content-Type': 'text/html'}); res.end(output);
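The append-then-send approach from this answer might look roughly like the following. It is a sketch only: `buildPage` and `handler` are hypothetical names, and the values are interpolated without escaping, so it assumes trusted input.

```javascript
// Build the page by appending to one string with += and template literals.
function buildPage(title, items) {
  let output = '';
  output += `<h2>${title}</h2>`;
  output += '<ul>';
  for (const item of items) {
    output += `<li>${item}</li>`;
  }
  output += '</ul>';
  return output;
}

// Request handler: send the accumulated string with a text/html content type.
function handler(req, res) {
  const output = buildPage('Test String', ['one', 'two']);
  res.writeHead(200, { 'Content-Type': 'text/html' });
  res.end(output);
}

// To serve it with Node's built-in http module (port 3000 is arbitrary):
// require('http').createServer(handler).listen(3000);
```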
Encode the HTML before sending it. Someone made a Gist for this: https://gist.github.com/mikedeboer/1aa7cd2bbcb8e0abc16a
Just add <plaintext> tags around it:
someHTML = "<plaintext>" + someHTML + "</plaintext>";
A word of caution: the plaintext element is considered obsolete, which means browser vendors have no obligation to implement it. However, it still works in major browsers.
Another way you could do it is
someHTML = someHTML.replace(/</g, '&lt;').replace(/>/g, '&gt;');
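If the goal is to show the markup as text, escaping a few more characters than the angle brackets is usually safer. A minimal sketch, where `escapeHtml` is a hypothetical helper name rather than an Express API:

```javascript
// Hypothetical helper: escape the characters that HTML treats as markup.
// '&' is replaced first so the entities produced below are not escaped twice.
function escapeHtml(text) {
  return text
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

console.log(escapeHtml('<a href="foo">bar</a>'));
// &lt;a href=&quot;foo&quot;&gt;bar&lt;/a&gt;
```

Passing the escaped string to res.send() should make the browser display the tags literally instead of rendering a link.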

How do I programmatically fetch the live plaintext contents of an etherpad?

This question came up on the etherpad-open-source-discuss mailing list and I thought it would be useful to have it here.
Just construct a URL like so and fetch it:
http://dtherpad.com/ep/pad/export/foo/latest?format=txt
That will get the live, plaintext contents of http://dtherpad.com/foo
For example, in PHP you can grab it with
file_get_contents("http://dtherpad.com/ep/pad/export/foo/latest?format=txt")
Note that that's just the "export to plain text" link that's provided in the Import/Export menu of every pad.
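The same URL construction can be sketched in JavaScript. The helper names `exportTextUrl` and `fetchPadText` are hypothetical, and the fetch call assumes a runtime with a global fetch (a browser, or Node 18 and later).

```javascript
// Hypothetical helper: build the plain-text export URL for a pad.
function exportTextUrl(baseUrl, padId) {
  const base = baseUrl.replace(/\/+$/, ''); // drop trailing slashes
  return `${base}/ep/pad/export/${encodeURIComponent(padId)}/latest?format=txt`;
}

// Fetch the live text of a pad; rejects if the server answers with an error status.
async function fetchPadText(baseUrl, padId) {
  const response = await fetch(exportTextUrl(baseUrl, padId));
  if (!response.ok) {
    throw new Error(`Export failed with HTTP ${response.status}`);
  }
  return response.text();
}

console.log(exportTextUrl('http://dtherpad.com', 'foo'));
// http://dtherpad.com/ep/pad/export/foo/latest?format=txt
```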
A few other possibilities:
From a browser, you can hit http://your-etherpad-server.com/ep/pad/view/padId/latest?pt=1
From within the code of the collaborative editor (ace2_inner.js), use rep.alltext
Within Etherpad's JavaScript, use pad.text for the most recent version, or pad.getRevisionText(rev.revNum) for a specified previous revision.
It seems that the JavaScript functions mentioned by Ari in his response are no longer present in current versions of Etherpad as deployed on sites like http://etherpad.mozilla.org
However, you can now simply use the following JavaScript function within Etherpad's JavaScript to get the text of the latest revision:
padeditor.ace.exportText()
You can get the plaintext content of etherpad using jQuery as:
jQuery(document).ready(function(){
  jQuery('#export').click(function(){
    var padId = 'examplePadIntense'; // Id of the div in which etherpad lite is integrated
    var epframeId = 'epframe' + padId;
    var frameUrl = $('#' + epframeId).attr('src').split('?')[0];
    var contentsUrl = frameUrl + "/export/txt";
    jQuery.get(contentsUrl, function(data) {
      var textContent = data; // plain-text contents of the pad
    });
  });
});
You can also use the getText HTTP api to retrieve the contents of a pad.
See my other answer for more details.
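A rough sketch of calling getText follows. It assumes Etherpad Lite's HTTP API at version 1, an API key, and the JSON response shape with code, message, and data.text; the host, key, and pad id are placeholders, and `getTextUrl` and `textFromResponse` are hypothetical helper names.

```javascript
// Hypothetical helper: build a getText request URL for the Etherpad Lite HTTP API.
// Assumes API version 1; apiKey is the secret configured on the server.
function getTextUrl(baseUrl, apiKey, padId) {
  const params = new URLSearchParams({ apikey: apiKey, padID: padId });
  return `${baseUrl.replace(/\/+$/, '')}/api/1/getText?${params}`;
}

// Pull the text out of a parsed API response; code 0 is assumed to mean success.
function textFromResponse(body) {
  if (body.code !== 0) {
    throw new Error(`getText failed: ${body.message}`);
  }
  return body.data.text;
}

console.log(getTextUrl('http://your-etherpad-server.com', 'secret', 'padId'));
console.log(textFromResponse({ code: 0, message: 'ok', data: { text: 'Hello' } }));
```

The URL can then be requested with any HTTP client and the parsed JSON body passed to `textFromResponse`.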
