How to use `nsIParserUtils.parseFragment()` for Firefox addon - security

Our Firefox add-on issues queries to Google from the backend (main.js), then extracts some content through XPath. For this purpose, we use innerHTML to create a document instance for XPath parsing. But when we submitted this add-on to Mozilla, it was rejected because:
This add-on is creating DOM nodes from HTML strings containing potentially unsanitized data, by assigning to innerHTML, jQuery.html, or through similar means. Aside from being inefficient, this is a major security risk. For more information, see https://developer.mozilla.org/en/XUL_School/DOM_Building_and_HTML_Insertion
Following the link provided, we tried to replace innerHTML with nsIParserUtils.parseFragment(). However, the example code:
let { Cc, Ci } = require("chrome");
function parseHTML(doc, html, allowStyle, baseURI, isXML) {
let PARSER_UTILS = "@mozilla.org/parserutils;1";
...
The Cc and Ci utilities can only be used in main.js, but the function requires a document (doc) as an argument. We could not find any examples of creating a document inside main.js, where document.implementation.createHTMLDocument(""); is unavailable because main.js is a background script with no reference to the global built-in document.
I googled a lot, but still could not find any solutions. Could anybody kindly help?

You probably want to use nsIDOMParser instead, which is the same as the standard DOMParser accessible in window globals except that you can use it from privileged contexts without a window object.
Note, however, that it gives you a whole document with synthesized <html> and <body> elements if you don't provide your own. If you absolutely need a fragment, you can use the HTML5 template element to extract one via DOMParser:
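let { Cc, Ci } = require("chrome");
let parser = Cc["@mozilla.org/xmlextras/domparser;1"].createInstance(Ci.nsIDOMParser);
let partialHTML = "foo <b>baz</b> bar";
let frag = parser.parseFromString(`<template>${partialHTML}</template>`, 'text/html').querySelector("template").content;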

Related

How to replace JSDOM with cheerio for Readability

I'd like to use Readability to parse out the "article" content within web pages. Readability does a good job, but it depends on JSDOM, which seems to be very slow and throws errors if parsing CSS content fails. I do not need the CSS at all, but as I understand from the GitHub issues of the project, it's not possible to ignore CSS in JSDOM.
I've been trying to replace JSDOM with cheerio, but I haven't been able to figure out what part of its API is compatible with the output of JSDOM.
In JSDOM, the statement
var doc = new JSDOM(html);
produces a DOM object which can be passed into
let reader = new Readability(doc.window.document);
In cheerio, however, there doesn't seem to be anything that produces a DOM object.
I've tried
var $ = cheerio.load(html);
var object_something = $('html');
It throws an error when I call new Readability(object_something)
Error: First argument to Readability constructor should be a document object.
which clearly means what it says. object_something is an object, but I'm not sure what it actually is.
Is it even possible to produce a DOM object with cheerio? I have over 10 million local HTML documents, so any performance improvement would save me a lot of time.

Does origen support 93k multi_bin feature?

The examples for generating tests in the testflow create stop_bins. However, there are no examples of how to generate the 93k multi_bin node. Does this feature exist in the current origen-sdk?
The output node looks like this in a 93k .tf file:
if #FLAG then
{
multi_bin;
}
else
{
}
There is currently no direct support for creating multi_bin nodes, though in time I do expect that it will be added as a result of this effort to add support for limits tables.
In the meantime though, there is the ability to render any text and this can be used to generate what you want.
To generate the above example you could do:
if_flag :flag do
render 'multi_bin;'
end
This will also work with in-line conditions. The following is equivalent:
render 'multi_bin;', if_flag: :flag
Additionally, on_pass and on_fail will accept a render option:
func :my_test, on_fail: { render: 'multi_bin;' }
Obviously that is creating something that will not be able to translate to other tester platforms, so the advice is to use render sparingly and only as a get out of jail card when you really need it.
Also note that for these examples to work you need at least OrigenTesters 0.11.1.

Real time Prefix matching and auto-complete in Quora

How is real-time autocomplete with prefix matching implemented in Quora?
Since Solr and Sphinx don't support real-time updating, what changes were made to support it?
Looks like it's done using javascript and jquery. I grabbed a few key lines from the minified script on the Quora homepage that I think support this theory:
Here's an ajax call to a resource providing JSON data:
$.ajax({type:"GET",url:this.resultsQueryPath,dataType:"json",data:a,success:this.fnbind(function(a){this.ajaxCallback(a)}),error:this.fnbind(function(a,b,c){console.log(b,c),this.requestOutstanding=!1,this.$("##results_shell").html("Could not retrieve results: "+b)})})}
note that the successful result gets put into the "a" variable. Then later here's the autocompletion based on the keydown of the "question_box" element which is completing from the parent of "a"
this.$ ("##item input.question_box").keydown (function (b) {
if (b.keyCode==9&&!b.shiftKey)for (var c=e.getLiveDomId (a.cid),d=a.parent ().orderedVisibleChildren (),f=0;f<d.length-1;++f)if (c==d [f]) {
$ (this).blur (),$ ("#"+d [f+1]+" input.question_box").focus ();return!1}
})
I think this is pretty incontrovertible, but it would still be nice to have the un-minified script to compare. For instance, I can't see where resultsQueryPath comes from (I can't locate its source; it may be intentionally obfuscated).

raphael text() not working in ie9

Why is the following example not working in ie9?
http://jsfiddle.net/dzyyd/2/
It spits out a console error:
"Unexpected call to method or property access."
I found it pretty quickly. You created the element, but did not put it anywhere. Once it is added to the document body, everything seems to be fine.
this._width=300;
this._height=300;
this._bgSvgContainer = document.createElement("div");
//NOTE: add the created div to the body of the document so that it is displayed
document.body.appendChild(this._bgSvgContainer);
var bgCanvas = Raphael(this._bgSvgContainer, this._width, this._height);
this._bgCanvas = bgCanvas;
var num = this._bgCanvas.text(this._width-10,this._height-10,"1");
It's really hard to tell with such a tiny code fragment (it doesn't run in any browser for me), but it's probably a scope issue: "this" in IE during events is completely different from "this" under the W3C event model. See: quirksmode-Event order-Problems of the Microsoft model

Best way to handle security and avoid XSS with user entered URLs

We have a high security application and we want to allow users to enter URLs that other users will see.
This introduces a high risk of XSS hacks - a user could potentially enter javascript that another user ends up executing. Since we hold sensitive data it's essential that this never happens.
What are the best practices in dealing with this? Is any security whitelist or escape pattern alone good enough?
Any advice on dealing with redirections ("this link goes outside our site" message on a warning page before following the link, for instance)?
Is there an argument for not supporting user entered links at all?
Clarification:
Basically our users want to input:
stackoverflow.com
And have it output to another user:
<a href="http://stackoverflow.com">stackoverflow.com</a>
What I really worry about is them using this in an XSS hack. I.e. they input:
alert('hacked!');
So other users get this link:
<a href="javascript:alert('hacked!');">stackoverflow.com</a>
My example is just to explain the risk - I'm well aware that javascript and URLs are different things, but by letting them input the latter they may be able to execute the former.
You'd be amazed how many sites you can break with this trick - HTML is even worse. If they know to deal with links do they also know to sanitise <iframe>, <img> and clever CSS references?
I'm working in a high security environment - a single XSS hack could result in very high losses for us. I'm happy that I could produce a Regex (or use one of the excellent suggestions so far) that could exclude everything that I could think of, but would that be enough?
If you think URLs can't contain code, think again!
https://owasp.org/www-community/xss-filter-evasion-cheatsheet
Read that, and weep.
Here's how we do it on Stack Overflow:
/// <summary>
/// returns "safe" URL, stripping anything outside normal charsets for URL
/// </summary>
public static string SanitizeUrl(string url)
{
return Regex.Replace(url, @"[^-A-Za-z0-9+&@#/%?=~_|!:,.;\(\)]", "");
}
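To see what a character filter of this kind does and does not remove, here is a minimal JavaScript sketch of the same regex. The function name stripUnsafeUrlChars is hypothetical and not part of the original answer. It suggests the filter strips quotes and angle brackets but leaves the scheme untouched, so it would need to be paired with a separate scheme check.

```javascript
// Hypothetical JavaScript port of the SanitizeUrl character filter above.
function stripUnsafeUrlChars(url) {
  // Remove every character outside the allowed set.
  return url.replace(/[^-A-Za-z0-9+&@#\/%?=~_|!:,.;()]/g, "");
}

// Quotes and angle brackets are removed...
console.log(stripUnsafeUrlChars('http://example.com/"><script>'));
// prints: http://example.com/script

// ...but the scheme is not inspected, so "javascript:" survives.
console.log(stripUnsafeUrlChars("javascript:alert('hacked!');"));
// prints: javascript:alert(hacked!);
```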
The process of rendering a link "safe" should go through three or four steps:
Unescape/re-encode the string you've been given (RSnake has documented a number of tricks at http://ha.ckers.org/xss.html that use escaping and UTF encodings).
Clean the link up: regexes are a good start. Make sure to truncate the string or throw it away if it contains a " (or whatever you use to close the attributes in your output). If you're using the links only as references to other information, you can also force the protocol at the end of this process: if the portion before the first colon is not 'http' or 'https', then prepend 'http://' to the string. This allows you to create usable links from incomplete input, as a user would type into a browser, and gives you a last shot at tripping up whatever mischief someone has tried to sneak in.
Check that the result is a well formed URL (protocol://host.domain[:port][/path][/[file]][?queryField=queryValue][#anchor]).
Possibly check the result against a site blacklist or try to fetch it through some sort of malware checker.
If security is a priority I would hope that the users would forgive a bit of paranoia in this process, even if it does end up throwing away some safe links.
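The clean-up, force-the-protocol, and well-formedness steps above can be sketched roughly as follows in JavaScript. This is only an illustration under assumptions: makeSafeLink and ALLOWED_SCHEMES are hypothetical names, the built-in URL class stands in for the well-formedness check, and the unescape/re-encode and blacklist steps are not covered.

```javascript
// Hypothetical helper: the name and the allowed-scheme list are assumptions.
const ALLOWED_SCHEMES = new Set(["http:", "https:"]);

function makeSafeLink(input) {
  const trimmed = String(input).trim();

  // Clean-up: discard input containing quotes, angle brackets, or whitespace,
  // any of which could close or break an HTML attribute.
  if (/["'<>\s]/.test(trimmed)) return null;

  // Force the protocol: if the part before the first colon is not
  // http or https, prepend "http://".
  const colon = trimmed.indexOf(":");
  const scheme = colon === -1 ? "" : trimmed.slice(0, colon).toLowerCase();
  const candidate =
    scheme === "http" || scheme === "https" ? trimmed : "http://" + trimmed;

  // Well-formedness: the URL parser throws on malformed input.
  let parsed;
  try {
    parsed = new URL(candidate);
  } catch (err) {
    return null;
  }
  if (!ALLOWED_SCHEMES.has(parsed.protocol) || parsed.hostname === "") return null;
  return parsed.href;
}

console.log(makeSafeLink("stackoverflow.com"));   // prints: http://stackoverflow.com/
console.log(makeSafeLink("javascript:alert(1)")); // prints: null
```

Returning null rather than a repaired string follows the "bit of paranoia" advice: anything doubtful is thrown away instead of being patched up.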
Use a library, such as OWASP-ESAPI API:
PHP - http://code.google.com/p/owasp-esapi-php/
Java - http://code.google.com/p/owasp-esapi-java/
.NET - http://code.google.com/p/owasp-esapi-dotnet/
Python - http://code.google.com/p/owasp-esapi-python/
Read the following:
https://www.golemtechnologies.com/articles/prevent-xss#how-to-prevent-cross-site-scripting
https://www.owasp.org/
http://www.secbytes.com/blog/?p=253
For example:
$url = "http://stackoverflow.com"; // e.g., $_GET["user-homepage"];
$esapi = new ESAPI( "/etc/php5/esapi/ESAPI.xml" ); // Modified copy of ESAPI.xml
$sanitizer = ESAPI::getSanitizer();
$sanitized_url = $sanitizer->getSanitizedURL( "user-homepage", $url );
Another example is to use a built-in function. PHP's filter_var function is an example:
$url = "http://stackoverflow.com"; // e.g., $_GET["user-homepage"];
$sanitized_url = filter_var($url, FILTER_SANITIZE_URL);
Note that FILTER_SANITIZE_URL only removes characters that are not legal in a URL; it does not check the scheme, so javascript: URLs pass through. Using the OWASP ESAPI Sanitizer is probably the best option.
Still another example is the code from WordPress:
http://core.trac.wordpress.org/browser/tags/3.5.1/wp-includes/formatting.php#L2561
Additionally, since there is no way of knowing where the URL leads (i.e., it might be a valid URL, but the contents behind it could be mischievous), Google has a safe browsing API you can call:
https://developers.google.com/safe-browsing/lookup_guide
Rolling your own regex for sanitation is problematic for several reasons:
Unless you are Jon Skeet, the code will have errors.
Existing APIs have many hours of review and testing behind them.
Existing URL-validation APIs consider internationalization.
Existing APIs will be kept up-to-date with emerging standards.
Other issues to consider:
What schemes do you permit (are file:/// and telnet:// acceptable)?
What restrictions do you want to place on the content of the URL (are malware URLs acceptable)?
Just HTMLEncode the links when you output them. Make sure you don't allow javascript: links. (It's best to have a whitelist of protocols that are accepted, e.g., http, https, and mailto.)
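A minimal JavaScript sketch of that advice might look like the following. The names escapeHtml, renderUserLink, and ALLOWED_PROTOCOLS are hypothetical, and the fallback of showing rejected input as escaped plain text is an assumption rather than something the answer prescribes.

```javascript
// Protocol whitelist taken from the answer's example (http, https, mailto).
const ALLOWED_PROTOCOLS = new Set(["http:", "https:", "mailto:"]);

// Encode the characters that are significant in HTML text and attributes.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Render a link only when the protocol is whitelisted; otherwise fall back
// to escaped plain text (an assumed policy, not part of the original answer).
function renderUserLink(rawUrl) {
  let parsed;
  try {
    parsed = new URL(rawUrl);
  } catch (err) {
    return escapeHtml(rawUrl);
  }
  if (!ALLOWED_PROTOCOLS.has(parsed.protocol)) return escapeHtml(rawUrl);
  const safe = escapeHtml(parsed.href);
  return '<a href="' + safe + '">' + safe + "</a>";
}

console.log(renderUserLink("https://example.com/?a=1&b=2"));
console.log(renderUserLink("javascript:alert('hacked!');"));
```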
You don't specify the language of your application, so I will presume ASP.NET, for which you can use the Microsoft Anti-Cross Site Scripting Library.
It is very easy to use; all you need is an include and that is it :)
While you're on the topic, why not give Design Guidelines for Secure Web Applications a read?
If you use another language: since there is a library for ASP.NET, one is bound to be available for other languages as well (PHP, Python, RoR, etc.).
For Pythonistas, try Scrapy's w3lib.
OWASP ESAPI pre-dates Python 2.7 and is archived on the now-defunct Google Code.
How about not displaying them as a link? Just use the text.
Combined with a warning to proceed at your own risk may be enough.
addition - see also Should I sanitize HTML markup for a hosted CMS? for a discussion on sanitizing user input
There is a library for javascript that solves this problem
https://github.com/braintree/sanitize-url
Try it =)
In my project written in JavaScript I use this regex as white list:
url.match(/^((https?|ftp):\/\/|\.{0,2}\/)/)
the only limitation is that you need to put ./ in front for files in same directory but I think I can live with that.
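To check what this whitelist accepts, the regex can be wrapped in a small helper and run against a few sample inputs. The helper name isWhitelisted and the samples are hypothetical. One thing the sketch suggests is that protocol-relative URLs starting with // also match, because the relative-path branch allows zero dots before the slash; whether that is acceptable depends on the application.

```javascript
// The whitelist regex from the answer above, wrapped in a hypothetical helper.
const URL_WHITELIST = /^((https?|ftp):\/\/|\.{0,2}\/)/;

function isWhitelisted(url) {
  return URL_WHITELIST.test(url);
}

const samples = [
  "https://example.com/",   // absolute URL with an allowed scheme
  "./local.html",           // file in the same directory
  "/abs/path",              // root-relative path
  "local.html",             // no "./" prefix
  "javascript:alert(1)",    // disallowed scheme
  "//example.com/",         // protocol-relative URL
];

for (const sample of samples) {
  console.log(sample, isWhitelisted(sample));
}
```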
Using regular expressions to prevent XSS vulnerabilities becomes complicated, and thus hard to maintain over time, while still potentially leaving some vulnerabilities behind. URL validation with a regular expression is helpful in some scenarios, but it is better not mixed with vulnerability checks.
A better solution is probably to combine an encoder such as AntiXssEncoder.UrlEncode for the query portion of the URL with UriBuilder for the rest:
public sealed class AntiXssUrlEncoder
{
public string EncodeUri(Uri uri, bool isEncoded = false)
{
// Encode the query portion of the URL to prevent XSS if it is not already encoded; otherwise pass it to UriBuilder unchanged.
var encodedQuery = isEncoded ? uri.Query.TrimStart('?') : AntiXssEncoder.UrlEncode(uri.Query.TrimStart('?'));
var encodedUri = new UriBuilder
{
Scheme = uri.Scheme,
Host = uri.Host,
Path = uri.AbsolutePath,
Query = encodedQuery.Trim(),
Fragment = uri.Fragment
};
if (uri.Port != 80 && uri.Port != 443)
{
encodedUri.Port = uri.Port;
}
return encodedUri.ToString();
}
public static string Encode(string uri)
{
var baseUri = new Uri(uri);
var antiXssUrlEncoder = new AntiXssUrlEncoder();
return antiXssUrlEncoder.EncodeUri(baseUri);
}
}
You may need to add a whitelist to exclude some characters from encoding. That could be helpful for particular sites.
HTML-encoding the page that renders the URL is another thing you may need to consider.
BTW, please note that encoding the URL may interfere with Web Parameter Tampering protection, so the encoded link may not work as expected.
Also, you need to be careful about double encoding
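As a small illustration of the double-encoding problem, here is a JavaScript sketch using the built-in encodeURIComponent. The encodeOnce helper is hypothetical and simply mirrors the isEncoded flag of the C# class above.

```javascript
const value = "a b&c";

// Encoding once produces the expected percent-escapes.
const once = encodeURIComponent(value);
console.log(once);  // prints: a%20b%26c

// Encoding again escapes the "%" signs themselves.
const twice = encodeURIComponent(once);
console.log(twice); // prints: a%2520b%2526c

// A single decode only undoes one layer, so the receiver sees the
// once-encoded text instead of the original value.
console.log(decodeURIComponent(twice) === once); // prints: true

// Hypothetical guard: the caller states whether the text is already encoded.
function encodeOnce(text, alreadyEncoded) {
  return alreadyEncoded ? text : encodeURIComponent(text);
}

console.log(encodeOnce(once, true) === once); // prints: true
```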
P.S. AntiXssEncoder.UrlEncode would be better named AntiXssEncoder.EncodeForUrl to be more descriptive. Basically, it encodes a string for use in a URL; it does not encode a given URL and return a usable URL.
You could hex-encode the entire URL and send it to your server. That way the client would not understand the content at first glance. After reading the content, you could decode the content URL = ? and send it to the browser.
Allowing a URL and allowing JavaScript are 2 different things.

Resources