Chrome Extension - Scrape any url, ignoring sandboxing and Content Security Policy?

I'd like to build a chrome extension that can make requests against any web page that the user has access to, even pages that are protected by Content Security Policies, preferably in the background (without having to have the page open in the browser).
So for example, I'd like to be able to:
request info from a page the user may be logged into, like Gmail
request info from an RSS feed or other pages
request info from pages on Facebook
Is this possible? It seems like I could have the extension open a new window, and a tab for every page I want to pull info from. Is this the only way this can work? I'd prefer to have this happen behind the scenes, without having to open a window.

CSP is not a problem as long as your manifest.json adds the URLs you want to process in permissions e.g. "*://*/" or "<all_urls>" will allow access to any site.
The solution, however, depends on how that page is built. If the server response contains all the info you need, you can simply make a direct request via XMLHttpRequest or fetch in the background script, parse the response with DOMParser, and extract the data. Otherwise you can try to run the page in an iframe (you'll have to strip X-Frame-Options) or in an inactive/pinned tab and use a content script to extract the data. To access JavaScript variables of the page, you'll need to add a script element to the page's DOM so that its code runs in the page context.
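The direct-request approach could look roughly like the sketch below. It assumes a background page where fetch and DOMParser are available and where the manifest grants host permissions for the target URL. The function name scrapeText and the deps parameter are hypothetical; deps exists only so the function can be exercised outside the browser.

```javascript
// Sketch for a background page. `deps` is a hypothetical hook for swapping in
// test doubles; it is not part of any Chrome API.
async function scrapeText(url, selector, deps = {}) {
  const fetchImpl = deps.fetch || fetch;
  const parse = deps.parse ||
    ((html) => new DOMParser().parseFromString(html, "text/html"));

  // credentials: "include" asks the browser to send the user's cookies.
  const response = await fetchImpl(url, { credentials: "include" });
  if (!response.ok) {
    throw new Error(`Request for ${url} failed with HTTP ${response.status}`);
  }

  // Parse the raw HTML and pull out the text of the first matching element.
  const doc = parse(await response.text());
  const node = doc.querySelector(selector);
  return node ? node.textContent.trim() : null;
}
```

In the background page this would be called as, for example, scrapeText("https://example.com/", "title"), where the URL and the selector are placeholders. It only helps when the data is present in the server response; pages that build their content with JavaScript still need the iframe or tab approach.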

Related

How can I prevent an iframe displaying an email to load images and other email trackers?

We have a web admin panel in which the agents can see conversations with customers.
Those conversations are the result of importing normal emails through an IMAP connection. We grab the "untouched" mailbox files and store them in a database. Then we post-process the files to index them by "from", "to", "date" and so on.
So far, so good. We can look up all the emails involved with a client and render them at will.
Then when the agent looks for a customer in the web admin panel and opens it, the full email conversation appears. And we display the HTML version of the email within an iframe (or the text version if the html version is not there). 90% of the customers send HTML.
What happens? When the agent opens the email in our web panel, the iframe loads the full HTML and renders it. This causes remote content (images, sounds, styles and so on) to be downloaded, which allows customers to track whether we opened the email by appending tracking ids to the assets (typically something like http://track.example.com/image.jpg?id=123456789).
I've tried the "sandbox" attribute of the iframe html tag with no luck (it still downloads the images).
Question
How can I programmatically tell the iframe to not load ANY remote content, and just render the initial HTML without any remote call?

Mozilla's iframe documentation, which lists all available attributes for the element, is here: https://developer.mozilla.org/en-US/docs/Web/HTML/Element/iframe
If you look at "sandbox" there is no restriction specific to image or other includes, just restrictions on things like running JavaScript. There are no other attributes that would restrict images and includes.
To solve the problem of images and includes in your HTML you will need to filter the HTML either at the server before sending it or in the client after it arrives.
Server:
Before storing it into the database.
In the code that retrieves the HTML and returns it to the iframe.
Client:
Use AJAX to fill the iframe with the HTML, using code that filters the response. With this approach you could also use a div instead of an iframe if that works better for your layout.
If all of your users will use Chrome or Firefox, you could also look at writing a browser extension.
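As a rough illustration of the client-side filtering option, the sketch below strips the most common remote-loading hooks from an already parsed email document. The function name stripRemoteContent and the attribute list are assumptions made for this example, and the list is not exhaustive: CSS url(...) references in style attributes or style elements, for instance, are not handled here.

```javascript
// Attributes that commonly trigger a remote fetch (illustrative, not complete).
const REMOTE_ATTRS = ["src", "srcset", "poster", "background"];

// Takes a parsed Document and removes things that would load remote content.
function stripRemoteContent(doc) {
  // Drop elements that exist only to load or run external resources.
  doc.querySelectorAll("script, link, iframe, object, embed")
    .forEach((el) => el.remove());

  for (const attr of REMOTE_ATTRS) {
    doc.querySelectorAll(`[${attr}]`).forEach((el) => {
      const value = (el.getAttribute(attr) || "").trim().toLowerCase();
      // Inline data: URIs are kept because they do not hit the network.
      if (!value.startsWith("data:")) el.removeAttribute(attr);
    });
  }
  return doc;
}

// Possible browser usage (emailHtml and iframe are hypothetical variables):
//   const doc = new DOMParser().parseFromString(emailHtml, "text/html");
//   iframe.srcdoc = stripRemoteContent(doc).documentElement.outerHTML;
```

A maintained HTML sanitizer on the server or in the client is likely a safer choice for production use; the point here is only that the filtering happens before the iframe ever sees the markup.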

How do I capture the URL after an external website login in ReactJS?

I want to retrieve the URL after opening an external website pop-up in my ReactJS/NodeJS application. Basically, my application has a button that redirects the page to the Microsoft online login page. What I want is the URL of the page after the user logs into Microsoft online.
Is there any way that's possible? If so, what are my options?

If you navigate to another webpage, your React application is no longer being served to your browser and can't do anything. You would need to have a script running on the Microsoft website, either by writing it into the source code (which I doubt you can do) or by some other method such as a browser extension.

Apart from the methods @izb mentioned, there is no way to track what happens on an external system unless that system already provides a mechanism for it.
Many systems do report information back from their servers, for example through push or ping callbacks.
With one payment system I use, I redirect the customer to it, the customer pays, and it then redirects to a success or failure page that I configured beforehand in its panel.

Chrome Extension: Sending a message to the page loaded in a specific iframe

I'm working on a Chrome extension to (among other things) support a page with multiple iframes, each of which loads a page from some other domain. I need to send a message to the page loaded in a specific one of those iframes. The top-level page and the pages in the iframes each have their own content scripts, so the full messaging API is available.
From the top page, when I do chrome.runtime.sendMessage(), all the iframes get it (as does the top window, but it's easy for its content script to know that that particular msg isn't intended for it). Is there any way to target a specific one of those iframes, or for the desired iframe page to know that the msg is for it?
Note that...
The top page can't access anything in iframe pages directly, because they're from other domains.
The top page knows the URL that was originally loaded in each frame, but the user may have navigated from there, so including the target URL as a msg parameter for the receiving script to check won't work.
Is there something obvious I'm missing here?
UPDATE: @wOxxOm's answer was very helpful, but I'm still stuck on how to get the frameIds I need.
More specifically, I need to do two things with those iframes, both of which need that frameId:
Inject a script into each iframe
Send msgs to a specific iframe in response to user actions on the top-level page
All of this is complicated by the fact that the iframes are created and removed dynamically as the user works.
One idea I had is to initially load each new iframe with the URL "about:blank?id=nnn", where nnn is the DOM id of the corresponding iframe element. That way, when I call getAllFrames(), I can recognize the new iframes by that URL and build a lookup of frameIds for each DOM id. Once that's done, I can load the real URL and inject the script once it's loaded.
That seems so roundabout, I'm hoping I've missed some supporting API or other straightforward approach.

I did find a solution, but it's pretty indirect. I hope this is clear; all these moving parts are the nature of the beast as I understand it.
Here's what I ended up doing:
Added a name attribute to each iframe, the same as its DOM id.
When the page in each iframe loads, a global content script calls chrome.runtime.sendMessage(), passing that name, which it can access as window.name.
The background script gets that msg, with the frameId of that iframe as sender.frameId, and calls chrome.tabs.sendMessage(), passing the frameId and window name.
The top-level page's content script builds a lookup object from those window-name (AKA iframe DOM id) / frameId pairs.
When the top-level page's content script wants to send a message to any of the iframe pages, it looks up the target's frameId in that lookup object, then calls chrome.runtime.sendMessage(), with a message type that indicates it's for a specific iframe, and including that frameId.
In response, the background script sends it on to the requested iframe's content script with chrome.tabs.sendMessage(), passing {frameId: request.frameId} as the 3rd parameter, as wOxxOm suggested.
This is working here, but by all means let me know if there's a simpler way to do this.
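The steps above can be sketched roughly as follows. The message type strings ("frame-ready", "frame-registered", "to-frame") and the function names are made up for this illustration, and the chrome object is passed in as a parameter only so the logic can be tried outside an extension; in a real extension each function would live in its own script and be wired up through chrome.runtime.onMessage.addListener.

```javascript
// Content script in every iframe: report this frame's window.name.
function announceFrame(chromeApi, win) {
  chromeApi.runtime.sendMessage({ type: "frame-ready", name: win.name });
}

// Background script: sender.frameId identifies the frame that sent the message.
function relayMessage(chromeApi, msg, sender) {
  if (msg.type === "frame-ready" && sender.frameId !== 0) {
    // Tell the top-level frame (frameId 0) which frameId belongs to which name.
    chromeApi.tabs.sendMessage(
      sender.tab.id,
      { type: "frame-registered", name: msg.name, frameId: sender.frameId },
      { frameId: 0 }
    );
  } else if (msg.type === "to-frame") {
    // Forward the payload only to the requested iframe.
    chromeApi.tabs.sendMessage(sender.tab.id, msg.payload, { frameId: msg.frameId });
  }
}

// Top-level content script: keep the name -> frameId lookup up to date.
const frameIdsByName = {};
function rememberFrame(msg) {
  if (msg.type === "frame-registered") frameIdsByName[msg.name] = msg.frameId;
}
```

To reach a specific iframe, the top-level content script would then send { type: "to-frame", frameId: frameIdsByName[name], payload: ... } via chrome.runtime.sendMessage() and let the background script forward it. Entries for removed iframes are not cleaned up in this sketch.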

tabs permission or content script?

I'm writing an extension that needs to show a page action on amazon.com pages.
Would it be better to request the "tabs" permission or to inject a content script into amazon.com pages?
The tabs permission strikes me as using fewer resources (because it just checks the URL against a regex in the background script), but I think it has a scarier permission message ("access your tabs and browsing activity")?
Injecting a content script into amazon.com pages seems like it would take more resources, but it would only need permission for amazon.com...

This is a generic question, and the answer depends on your users. You have already pointed out the pros and cons of each option.
I suggest content scripts if your users are particular about security and privacy. Bear in mind that this adds extra load to the pages (content scripts and message passing), which may slow down their normal execution.
I suggest the tabs permission if performance matters most. It is a native API and executes in the background page, so it puts no extra load on the tabs. Many extensions in the Web Store use the tabs API, so I don't think the permission would scare users; it is nothing new.
In the end, it all comes down to your target users.
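For comparison, the tabs-permission variant could be sketched as below. It assumes a Manifest V2 background page with the "tabs" permission and a page_action entry in the manifest; the helper name isAmazonUrl is made up for this example, and it parses the hostname instead of using a regex.

```javascript
// Returns true only for amazon.com and its subdomains.
function isAmazonUrl(url) {
  try {
    const host = new URL(url).hostname;
    return host === "amazon.com" || host.endsWith(".amazon.com");
  } catch {
    // Missing or malformed URLs (e.g. no "tabs" permission) never match.
    return false;
  }
}

// Guarded so the file can also be loaded outside an extension.
if (typeof chrome !== "undefined" && chrome.tabs && chrome.pageAction) {
  chrome.tabs.onUpdated.addListener((tabId, changeInfo, tab) => {
    // Show the page action whenever the tab is on an Amazon page.
    if (isAmazonUrl(tab.url)) chrome.pageAction.show(tabId);
  });
}
```

Nothing is injected into the page in this variant; the only cost is one hostname check per tab update in the background page.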

Google Chrome Extension - prevent cookie on jquery ajax request or Use a chrome.extension

I have a great working chrome extension now.
It basically loops over the HTML listing of a web auction site. If a user has not paid to have the image shown in the main list, a default image is shown.
My plugin uses a jQuery Ajax request to load the auction page and find the main image, which it then displays as a thumbnail in place of any missing image. WORKS GREAT.
The plugin finds the correct image URL, updates the HTML DOM to the new image, and sets a new width.
The issue is that the auction site tracks all page views and saves them to a "recently viewed" section of the site, where users can see any auctions they have clicked on.
ISSUE
- My plugin uses ajax and the cookies are sent via the jQuery ajax request. I am pretty sure I cannot modify the cookies in this request so the auction site tracks the request and for any listing that has a missing image this listing is now shown in my "recently viewed" even though I have not actually navigated to it.
Can I remove cookies from an ajax request? (I don't think I can.)
Can Chrome remove the cookie (only for the ajax requests)?
Could I get Chrome to make the request (e.g. like curl, with no cookie)?
Just for the curious.
Here is a page with missing images on this auction site
http://www.trademe.co.nz/Browse/SearchResults.aspx?searchType=all&searchString=toaster&type=Search&generalSearch_keypresses=9&generalSearch_suggested=0
Thanks for any input, John.

You can use the webRequest API to intercept and modify requests (including blanking headers). It cannot be used to modify requests which are created within the context of a Chrome extension, though. If you want to use this API for cookie-blanking purposes, you have to load the page in a non-extension context, either by creating a new tab or by using an off-screen tab (using the experimental offscreenTabs API).
Another option is to use the chrome.cookies API and bind an onChanged event. Then you can intercept cookie modifications and revert the changes using chrome.cookies.set.
The last option is to create a new window+tab in Incognito mode. This method is not reliable, and should not be used:
The user can disallow access to the Incognito mode
The user could have navigated to the page in incognito mode, causing cookie fields to be populated.
It's disruptive: A new window is created.
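The webRequest option from the first paragraph of this answer might be sketched as follows. This assumes a Manifest V2 extension with the "webRequest" and "webRequestBlocking" permissions plus host access to the auction site; the function name stripCookieHeader is made up, and the "extraHeaders" option is included on the assumption of a recent Chrome version, where it is needed for the Cookie header to be visible to the listener.

```javascript
// Returns the request's headers with any Cookie header removed.
function stripCookieHeader(details) {
  const requestHeaders = (details.requestHeaders || []).filter(
    (header) => header.name.toLowerCase() !== "cookie"
  );
  return { requestHeaders };
}

// Guarded so the file can also be loaded outside an extension.
if (typeof chrome !== "undefined" && chrome.webRequest) {
  chrome.webRequest.onBeforeSendHeaders.addListener(
    stripCookieHeader,
    // Only XHR requests to the auction site are touched.
    { urls: ["*://www.trademe.co.nz/*"], types: ["xmlhttprequest"] },
    ["blocking", "requestHeaders", "extraHeaders"]
  );
}
```

As written, the filter would also strip cookies from the site's own XHR calls, so a real extension would need a way to recognize only its own thumbnail requests, and the limitation on extension-originated requests mentioned above still applies.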

Presumably this AJAX interaction is being run from a content script? Could you run it from the background page instead and pass the data to the content script? I believe the background page operates in a different context and shouldn't send the normal cookies.
