Using webRequest API to intercept script requests, edit them and send them back - google-chrome-extension

As the title says, I'm trying to intercept script requests from the user's page, make a GET request to the script url from the background, add a bit of functionality and send it back to the user.
A few caveats:
I don't want to do this with every script request
I still have to guarantee that the script tags are executed in the original order
So far I came with two solutions, none of which work properly. The basic code:
chrome.webRequest.onBeforeRequest.addListener(
function handleRequest(request) {
// First I make the get request for the script myself SYNCHRONOUSLY,
// because the webRequest API cannot handle async.
const syncRequest = new XMLHttpRequest();
syncRequest.open('GET', request.url, false);
syncRequest.send(null);
const code = syncRequest.responseText;
},
{ urls: ['<all_urls>'] },
['blocking'],
);
Now once we have the code, there are two approaches that I've tried to insert it back into the page.
I send the code through a port to a content script, that will add it to the page inside a <script></script> tag. Along with the code, I also send an index to keep sure the scripts are inserted back into the page in the correct order. This works fine for my dummy website, but it breaks on bigger apps, like youtube, where it fails to load the image of most videos. Any tips on why this happens?
I return a redirect to a data url:
if (condition) return { cancel: false }
else return { redirectUrl: 'data:application/javascript; charset=utf-8,'.concat(alteredCode) };
This second options breaks the code formatting, sometimes removing the space, sometimes cutting it short. I'm not sure on the reason behind this behavior, it might have something to do with data url spec.
I'm stuck. I've researched pretty much every related answer on this website and couldn't find anything. Any help or information is greatly appreciated!
Thanks for your time!!!

Related

Accessing the window properties on another website by GM_xmlhttpRequest

I am currently writing a userscript for website A to access another the contents on website B. So I tried to use the GM_xmlhttpRequest to do it. However, a variable on B is written to the window property eg: window.var or responseContent.var.
However, when I tried to get the window.var, the output is undefined, which means the properties under the window variable cannot be read successfully. I guess the window object is refering to the website A but not website B, so the result is undefined (There is no window.var on A).
I am sure that the GM_xmlhttpRequest has successfully read the content of the website B because I have added console.log to see the response.responseText. I have also used the window.var to successfully visit that variable on website B by browser directly.
GM_xmlhttpRequest({
method: "GET",
url: url,
headers: {
referrer: "https://A.com"
},
onload: function (response) {
// console.log(response.responseText);
let responseContent = new Document();
responseContent = new DOMParser().parseFromString(response.responseText, "text/html");
let titleDiv = responseContent.querySelector("title");
if (titleDiv != null) {
if (titleDiv.innerText.includes("404")) {
console.log("404");
return;
} else {
console.log(responseContent.var);
console.log(window.var);
}
}
},
onerror: function (e) {
console.log(e);
}
});
I would like to retrieve content window.var on website B and show it on the console.log of A
Please help me solve the problem. Thank you in advance.
#wOxxOm's comments are on point. You cannot really get the executed javascript of another website like that. One way to go around it is to use <iframe> and post message, just like #wOxxOm said. But that may fail if the other website has policy against iframes.
If this userscript is just for your use, another way is to have two scripts, one for each website and have them both open in browser tabs. Then again you can use postMessage to have those two scripts communicate the information. Dirty solution for your second userscript would be to just post the variable info on regular interval:
// Userscript for website-b.com
// needs #grant for unsafe-window to get the window.var
setInterval(()=>{postMessage(unsafeWindow.var, "website-a.com");}, 1000);
That would send an update of var's value every second. It's not very elegant, but it's simple and works. For a more elegant solution, you may want to first postMessage from website a that will trigger postMessage(unsafeWindow.var, "website-a.com"). Working with that further, you will soon find yourself inventing an asynchronous communication protocol.
Alternatively, if the second website is simple, you can try to parse the value of var directly from HTML, or wherever the value is coming from. That's a preferred solution, but requires reverse-engineering on your part.

Login via POST does not yield valid session

I am currently trying to convert a smallish app from nodejs to golang (hence the two tags) but I'm running into a bit of trouble in doing so.
Essentially it is a very simple http POST login which I can't seem to realise. The background is that my university provides a calendar export function and I would like to provide this calendar as a feed that could be added to Google Cal.
Now the thing is that I have the whole thing working in node, but I would really like to be able realise it in go aswell.
The important bit of node code would be
var query = url.parse(req.url, true).query;
var data = {
u: query.user, // Username
p: query.password, // Password
};
needle.post(LOGIN_URL, data, {}, function (error, response) {
//extract cookies etc.
});
which is working like a charm but if I try to do the same in go
import "github.com/parnurzeal/gorequest"
//...
resp, body, err := gorequest.New().Post(LOGIN_URL).Send("u=user&p=pass").End()
//extract cookies etc.
I end up an invalid (timed out) session. I already tried using just net/http in go, which doesn't seem to change anything.
The result the POST request yields is a 302 redirect to an overview page (Btw: it is ASP based). Could it be that this is what's causing the problem, since gorequest then fetches that overview page without the cookies returned in resp, effectively creating a new session that isn't authorized, or am I overlooking something terribly simple?
So it seems that I found the answer myself by following your advice and using "net/http" and digging a little deeper into what the http.Client actually does. To anyone who might encounter similar problems, here is my solution:
http.Client automatically redirects if it receives a 30x response by the server see documentation. Although one can override the redirect policy, I was unable to prevent redirection entirely.
Additionally it seems as if the client has a bug (what I would call it at least), as it drops all header upon redirect see the issue or in the source where new headers are created.
While searching around in net/http I found http.DefaultTransport which is used by http.Client and does not redirect. It seems somewhat lower level and exactly what I was after. The following piece of code demonstrates how I replaced the line realised with gorequest from above:
data := url.Values{"u": {USER}, "p": {PASS}}
req, err := http.NewRequest("POST", LOGIN_URL, bytes.NewBufferString(data.Encode()))
//I needed quite some time to figure out that I needed to set the content type accordingly
req.Header.Add("Content-Type", "application/x-www-form-urlencoded")
//...
resp, err := http.DefaultTransport.RoundTrip(req)
//...
//resp.Header["Set-Cookie"] now contains the login/session cookies
Although I need to extract cookies myself and set a few header values, the solution works perfectly and I am quite happy with it. If anybody has some improvements to my solution or any other advice I am glad to hear it. And thanks to JimB and Volker :).

Sammy.js with Knockout.js Not Running Route With Every URL Change

I have a single Sammy route that recognizes an arbitrary number of parameters. The route looks like this:
get(/^\/(?:\?[^#]*)?#page\/?((?:[^\:\/]+\:[^\:\/]+\/?)*)$/g, function() {
var params = {};
var splat = this.params.splat[0];
var re = /([^\:\/]+)\:([^\:\/]+)/g;
match = true
while(match = re.exec(splat)) {
params[match[1]] = match[2];
}
self.loadData(params);
});
This code works. What it does is it recognizes routes of the pattern #page/param1:value1/param2:value2/ for an arbitrary number of parameters. My loadData function has default values for many of these parameters. I'm confident there isn't a problem with the actual loading of the pages, since it works 100% on many computers in many browsers. However, it has weird behavior on my Android's browser and on my friend's Mac's Safari and Chrome (works on my PC's Chrome). I've noticed that these are Webkit browsers.
The behavior is that the route runs correctly for the first URL change, then won't for the next URL change (although the URL in the browser bar does indeed always change), then it'll work again for the third one, and won't for the fourth. That is, it works every other time. This seems like very strange behavior to me, and I'm at a loss as to how to debug this. For certain links, I was able to run a hack such that on click I set the window location to the URL and forcefully run the sammy code with runRoute('get', url);. It's impractical to have to add this for every click event on the page, and that doesn't really account for all URL changes anyway. Is there something I can do to debug why my route isn't being run every time the URL is changing?
For those of you who encounter similar behavior, on every other click in the above-mentioned browsers, this.params.splat was undefined. It's supposed to be set to the matched part of the URL (e.g. /#page/param1:value1/).
The hack I came up with to deal with this is to add this to the top of the get route:
if(this.params.splat === undefined) {
app.unload().run();
return;
}
This doesn't get to the root of the problem, it's just a hack that allows it to re-run the routes so that params.splat isn't undefined the next time through. If anyone has more information on what is going on, I'd be interested.

Chrome extension, onbeforerequest, which page is making the call?

Im making a small google chrome extension that is watching all calls a page is making. The idea is to log how a page behaves and how many external calls are made. Got everything working except the part where i need to get the source url of the page that initiates the call.
For example im going to www.stackoverflow.com in my browser, then my onbeforerequest lister kicks in and gives me all the calls. So far so good. But i still want the name of the page which is making the calls, in this case i want: "www.stackoverflow.com" and the owner of the calls.
I tried getting it from the tabs, but chrome.tabs.get uses a callback and that is not called before its all over and i got all the calls processed.
any ideas on how to get the source url?
edit
Im using this code right now, to get the url, but it keeps returning "undefined":
var contentString = "";
chrome.webRequest.onBeforeRequest.addListener(
function (details) {
var tabid = details.tabId;
var sourceurl = "N/A";
if (tabid >= 0) {
chrome.tabs.get(parseInt(tabid), function (tab) {
sourceurl = tab.url;
alert(sourceurl);
});
}
});
When doing the alert, i get undefined for every request
edit 2 - this is working for me
chrome.tabs.get(parseInt(tabid), function (tab) {
if (tab != undefined) {
alert(tab.url);
}
});
onBeforeRequest returns a TabID, then you can then use the get method of the tabs API to get a reference to the tab and thus the URL of the page.
You can use the details.initiator (see here for more details) which is supported since Chrome 63 version. Below you can see what is mentioned on Chrome APIs page about the initiator.
The origin where the request was initiated. This does not change
through redirects. If this is an opaque origin, the string 'null' will
be used.

Scraping URLs from a node.js data stream on the fly

I am working with a node.js project (using Wikistream as a basis, so not totally my own code) which streams real-time wikipedia edits. The code breaks each edit down into its component parts and stores it as an object (See the gist at https://gist.github.com/2770152). One of the parts is a URL. I am wondering if it is possible, when parsing each edit, to scrape the URL for each edit that shows the differences between the pre-edited and post edited wikipedia page, grab the difference (inside a span class called 'diffchange diffchange-inline', for example) and add that as another property of the object. Right not it could just be a string, does not have to be fully structured.
I've tried using nodeio and have some code like this (i am specifically trying to only scrape edits that have been marked in the comments (m[6]) as possible vandalism):
if (m[6].match(/vandal/) && namespace === "article"){
nodeio.scrape(function(){
this.getHtml(m[3], function(err, $){
//console.log('getting HTML, boss.');
console.log(err);
var output = [];
$('span.diffchange.diffchange-inline').each(function(scraped){
output.push(scraped.text);
});
vandalContent = output.toString();
});
});
} else {
vandalContent = "no content";
}
When it hits the conditional statement it scrapes one time and then the program closes out. It does not store the desired content as a property of the object. If the condition is not met, it does store a vandalContent property set to "no content".
What I am wondering is: Is it even possible to scrape like this on the fly? is the scraping bogging the program down? Are there other suggested ways to get a similar result?
I haven't used nodeio yet, but the signature looks to be an async callback, so from the program flow perspective, that happens in the background and therefore does not block the next statement from occurring (next statement being whatever is outside your if block).
It looks like you're trying to do it sequentially, which means you need to either rethink what you want your callback to do or else force it to be sequential by putting the whole thing in a while loop that exits only when you have vandalcontent (which I wouldn't recommend).
For a test, try doing a console.log on your vandalContent in the callback and see what it spits out.

Resources