Chrome Extension: How to write a content script in HTML

On certain actions, I want to open a modal/popup on the website the user is visiting.
Right now I am doing it like this:
//contentScript.js
function on_certain_action(message) {
  const injectElement = document.createElement("div");
  injectElement.className = "app_modal";
  document.body.appendChild(injectElement);
}
But it is getting complex and difficult to understand. Is there some way I can write it in HTML instead? Or, how do you handle it when you are injecting complex markup from a content script?
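One pattern that keeps the markup in actual HTML (a sketch, not from the thread: it assumes a modal.html file shipped with the extension and listed under web_accessible_resources in manifest.json) is to fetch the file and let a <template> element parse it:

//contentScript.js
// sketch: modal.html and its manifest entry are assumptions,
// not part of the original question
async function on_certain_action(message) {
  const url = chrome.runtime.getURL("modal.html"); // resolves the packaged file
  const html = await (await fetch(url)).text();
  const template = document.createElement("template");
  template.innerHTML = html; // parsed inertly; embedded scripts do not run
  document.body.appendChild(template.content);
}

This keeps the modal's structure in a real .html file you can edit and lint, while the content script only decides when to show it.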

Related

How to use `nsIParserUtils.parseFragment()` for Firefox addon

Our Firefox addon issues queries to Google from the backend (main.js), then extracts some content with XPath. For this purpose, we use innerHTML to create a document instance for XPath parsing. But when we submitted this addon to Mozilla, it was rejected because:
This add-on is creating DOM nodes from HTML strings containing potentially unsanitized data, by assigning to innerHTML, jQuery.html, or through similar means. Aside from being inefficient, this is a major security risk. For more information, see https://developer.mozilla.org/en/XUL_School/DOM_Building_and_HTML_Insertion
Following the link provided, we tried to replace innerHTML with nsIParserUtils.parseFragment(). However, the example code:
let { Cc, Ci } = require("chrome");
function parseHTML(doc, html, allowStyle, baseURI, isXML) {
  let PARSER_UTILS = "@mozilla.org/parserutils;1";
  ...
The Cc and Ci utilities can only be used in main.js, yet the function requires a document (doc) as an argument. We could not find any examples of creating a document inside main.js, where document.implementation.createHTMLDocument("") is unavailable, because main.js is a background script with no reference to the global built-in document.
I googled a lot but still could not find a solution. Could anybody kindly help?
You probably want to use nsIDOMParser instead, which is the same as the standard DOMParser accessible in window globals, except that you can use it from privileged contexts without a window object.
Note that it gives you a whole document, with synthesized <html> and <body> elements if you don't provide your own. If you absolutely need a fragment, you can use the HTML5 <template> element to extract a fragment via the parser:
// instantiate nsIDOMParser from a privileged context
let parser = Cc["@mozilla.org/xmlextras/domparser;1"].createInstance(Ci.nsIDOMParser);
let partialHTML = "foo <b>baz</b> bar";
let frag = parser.parseFromString(`<template>${partialHTML}</template>`, "text/html")
               .querySelector("template").content;

Node.js web crawler file extension handling

I'm developing a web crawler in Node.js. I've created a unique list of the URLs found in the crawled website body. But some of them have extensions like jpg, mp3, mpeg, etc., and I want to avoid crawling those. Is there any simple way to do that?
Two options stick out.
1) Use path to check every URL
As stated in the comments, you can use path.extname to check for a file extension:
var path = require("path");

var test = "http://example.com/images/banner.jpg";
path.extname(test); // '.jpg'
This would work, but it feels like you'll wind up having to maintain a list of file types you can crawl or must avoid. That's work.
Side note: be careful using path. Typically, url is your best tool for parsing links, because path is aimed at files/directories, not URLs. On some systems (Windows), using path to manipulate a URL can result in drama because of the slashes involved. Fair warning!
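A sketch of that division of labor (the helper name is illustrative): parse the link with url first, then take the extension from the pathname alone, so query strings don't pollute the result:

var url = require("url");
var path = require("path");

// illustrative helper: extension of the path portion only
function linkExtension(link) {
  var pathname = url.parse(link).pathname; // "/images/banner.jpg"
  return path.extname(pathname);           // ".jpg"
}

linkExtension("http://example.com/images/banner.jpg?size=large"); // ".jpg"
// path.extname on the raw link would return ".jpg?size=large"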
2) Get the HEAD for each link & see if content-type is set to text/html
You may have reasons to avoid making more network calls. If so, this isn't an option. But if it is OK to make additional calls, you could grab the HEAD for each link and check the MIME type stored in content-type.
Something like this:
var http = require("http");

var headersOptions = {
  method: "HEAD",
  host: "example.com", // host only -- no protocol prefix here
  path: "/articles/content.html"
};

var req = http.request(headersOptions, function (res) {
  // you will probably need to also do things like check
  // HTTP status codes so you handle 404s, 301s, and so on
  if ((res.headers["content-type"] || "").indexOf("text/html") > -1) {
    // do something like queue the link up to be crawled
    // or parse the link or put it in a database or whatever
  }
});
req.end();
One benefit is that you only grab the HEAD, so even if the file is a gigantic video or something, it won't clog things up. You get the HEAD, see the content-type is a video or whatever, then move along because you aren't interested in that type.
Second, you don't have to keep track of file names because you're using a standard MIME type to differentiate html from other data formats.

Using the Mozilla Firefox parser (rendering engine) in an extension

I want to build a Firefox extension which will use the Firefox parser (rendering engine). I want to feed some HTML data to the parser and, in return, get the HTML and JavaScript content separately. Then I will do some processing on it. Is there any API or another way to do it?
You mean something like this...
let s = "<i>cool</i><script>alert('cool!')</script>";
let parser = new DOMParser();
let doc = parser.parseFromString(s, "text/html");
// do whatever you want....
doc.body.appendChild(doc.createElement('hr'));
alert(doc.documentElement.outerHTML);
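If the goal is the question's split of markup versus script, a small follow-up sketch on the parsed doc (not part of the original answer) could pull the script text out and then strip the nodes:

// collect the script contents, then drop the nodes from the document
let scriptTexts = Array.from(doc.querySelectorAll("script"), el => el.textContent);
Array.from(doc.querySelectorAll("script")).forEach(el => el.remove());
let markupOnly = doc.documentElement.outerHTML; // HTML with the scripts removed
// note: scripts in a DOMParser document are inert, so nothing was executed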

Re-inject content scripts after update

I have a Chrome extension which injects an iframe into every open tab. I have a chrome.runtime.onInstalled listener in my background.js which manually injects the required scripts as follows (details of the API here: http://developer.chrome.com/extensions/runtime.html#event-onInstalled):
background.js
var injectIframeInAllTabs = function () {
  console.log("reinject content scripts into all tabs");
  var manifest = chrome.app.getDetails();
  chrome.windows.getAll({}, function (windows) {
    for (var win in windows) {
      // a for-in index is a key, not the object itself,
      // so look the window up before reading its id
      chrome.tabs.getAllInWindow(windows[win].id, function reloadTabs(tabs) {
        for (var i in tabs) {
          var scripts = manifest.content_scripts[0].js;
          console.log("content scripts ", scripts);
          var k = 0, s = scripts.length;
          for (; k < s; k++) {
            chrome.tabs.executeScript(tabs[i].id, {
              file: scripts[k]
            });
          }
        }
      });
    }
  });
};
This works fine when I first install the extension. I want to do the same when my extension is updated. If I run the same script on update as well, I do not see a new iframe injected. Not only that, if I try to send a message to my content script AFTER the update, none of the messages go through to the content script. I have seen other people running into the same issue on SO (Chrome: message content-script on runtime.onInstalled). What is the correct way of removing old content scripts and injecting new ones after a Chrome extension update?
When the extension is updated, Chrome automatically cuts off all the "old" content scripts from talking to the background page, and they throw an exception if an old content script does try to communicate with the runtime. This was the missing piece for me. All I did was call the same method posted in the question from chrome.runtime.onInstalled in bg.js. That injects another iframe that talks to the correct runtime. At some point the old content script tries to talk to the runtime and fails; I catch that exception and just wipe out the old content script. Also note that each iframe gets injected into its own "isolated world" (isolated worlds are explained here: http://www.youtube.com/watch?v=laLudeUmXHM), hence the newly injected iframe cannot clear out the old lingering iframe.
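The catch-and-wipe step could look something like this in the content script (a sketch; the selector and ping interval are illustrative, not from the answer):

// contentScript.js -- periodically ping the runtime; a script
// orphaned by an update throws synchronously on sendMessage
var pingTimer = setInterval(function () {
  try {
    chrome.runtime.sendMessage({ type: "ping" });
  } catch (e) {
    // we are the old, cut-off content script: remove the stale
    // iframe (illustrative selector) and stop pinging
    var frame = document.querySelector("iframe.app_modal_frame");
    if (frame) frame.parentNode.removeChild(frame);
    clearInterval(pingTimer);
  }
}, 5000);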
Hope this helps someone in the future!
There is no way to "remove" old content scripts (apart from reloading the page in question using window.location.reload, which would be bad).
If you want to be more flexible about what code you execute in your content script, use the "code" parameter of the executeScript function, which lets you pass in a raw string of JavaScript code. This works well if your content script is just one big function (i.e. content_script_function) which lives in background.js.
in background.js:
function content_script_function(relevant_background_script_info) {
  // this function will be serialized as a string using .toString()
  // and will be called in the context of the content script page
  // do your content script stuff here...
}

function execute_script_in_content_page(info, tabId) {
  // tabId identifies the target tab; onUpdated supplies it as the
  // argument following the bound info object
  chrome.tabs.executeScript(tabId,
    {code: "(" + content_script_function.toString() + ")(" +
           JSON.stringify(info) + ");"});
}

chrome.tabs.onUpdated.addListener(
  // bind(null, info) pre-fills the info argument rather than this
  execute_script_in_content_page.bind(null, { reason: 'onUpdated',
    otherinfo: chrome.app.getDetails() }));

chrome.runtime.onInstalled.addListener(
  // onInstalled passes no tab id, so here you would enumerate the
  // tabs you care about (e.g. via chrome.tabs.query) and call
  // execute_script_in_content_page for each of them
  execute_script_in_content_page.bind(null, { reason: 'onInstalled',
    otherinfo: chrome.app.getDetails() }));
Where relevant_background_script_info contains information about the background page: which version it is, whether there was an upgrade event, and why the function is being called. The content script page still maintains all its relevant state. This way you have full control over how to handle an "upgrade" event.

Scraping URLs from a node.js data stream on the fly

I am working with a node.js project (using Wikistream as a basis, so not totally my own code) which streams real-time Wikipedia edits. The code breaks each edit down into its component parts and stores it as an object (see the gist at https://gist.github.com/2770152). One of the parts is a URL. I am wondering if it is possible, when parsing each edit, to scrape the URL that shows the differences between the pre-edit and post-edit Wikipedia page, grab the difference (inside a span class called 'diffchange diffchange-inline', for example), and add that as another property of the object. Right now it could just be a string; it does not have to be fully structured.
I've tried using nodeio and have some code like this (I am specifically trying to only scrape edits that have been marked in the comments (m[6]) as possible vandalism):
if (m[6].match(/vandal/) && namespace === "article") {
  nodeio.scrape(function () {
    this.getHtml(m[3], function (err, $) {
      //console.log('getting HTML, boss.');
      console.log(err);
      var output = [];
      $('span.diffchange.diffchange-inline').each(function (scraped) {
        output.push(scraped.text);
      });
      vandalContent = output.toString();
    });
  });
} else {
  vandalContent = "no content";
}
When it hits the conditional statement it scrapes one time and then the program closes out. It does not store the desired content as a property of the object. If the condition is not met, it does store a vandalContent property set to "no content".
What I am wondering is: Is it even possible to scrape like this on the fly? Is the scraping bogging the program down? Are there other suggested ways to get a similar result?
I haven't used nodeio yet, but the signature looks like an async callback, so from the program-flow perspective it happens in the background and therefore does not block the next statement from occurring (the next statement being whatever is outside your if block).
It looks like you're trying to do it sequentially, which means you need to either rethink what you want your callback to do or force it to be sequential by putting the whole thing in a while loop that exits only when you have vandalContent (which I wouldn't recommend).
For a test, try doing a console.log on your vandalContent in the callback and see what it spits out.
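Concretely, the restructuring suggested above looks something like this: whatever needs vandalContent moves inside the callback (edit and storeEdit are hypothetical stand-ins for the asker's object and next step):

if (m[6].match(/vandal/) && namespace === "article") {
  nodeio.scrape(function () {
    this.getHtml(m[3], function (err, $) {
      if (err) { console.log(err); return; }
      var output = [];
      $('span.diffchange.diffchange-inline').each(function (scraped) {
        output.push(scraped.text);
      });
      edit.vandalContent = output.toString();
      console.log(edit.vandalContent); // the test suggested above
      storeEdit(edit); // hypothetical: continue the pipeline here
    });
  });
} else {
  edit.vandalContent = "no content";
  storeEdit(edit);
}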
