cors + s3 + browser cache + chrome extension

Yes, this is a complex question. I will try to make it brief.
My website fetches resources from S3.
I also have an extension that needs to prefetch that S3 file when someone does a Google query, so that later, when they go to my site, the resource is already cached.
At this point I should probably stress that I'm not doing anything malicious; it's just a matter of user experience.
My problem is that making an AJAX request to S3 from the extension (either from a content script or the background page) doesn't send an Origin header.
This means the resource is downloaded and cached without an Allow-Origin header: S3 doesn't add Access-Control-Allow-Origin: * if there's no Origin in the request. So later, on my site, it fails due to the missing Allow-Origin header on the cached file :-(
Any ideas for a better way to prefetch into the browser cache?
Is there a way to force the AJAX request to send an Origin header? Any origin?
Since I have Access-Control-Allow-Origin: * on my S3 bucket, I think any origin will do, except null.
Thanks
Edit: Ended up using one of Rob W's solutions. You are awesome.
Let me comment on each of the options he suggested:
1. Not adding the host permissions to my manifest - a clever idea, but it wouldn't work for me since I have a content script which runs on every website, so I must use a catch-all wildcard, and I don't think there is an "exclude" permission option.
2. I tried it; it issues a null origin, which as expected ends up with S3 sending the Allow-Origin: * header as required. This means I don't get the "allow-origin header missing" error; however, the file is then not served from cache. I guess for it to actually be served from cache in Chrome, the origin has to be exactly the same. So that was very close, but not enough.
3. The third option is a charm, and it is the simplest. I didn't know I was able to manipulate the Origin header. So I do that and set the exact origin of my website - and it works. The file is cached and later served from cache. I must stress that I had to add a URL filter to apply this only to requests going out to my S3 bucket; otherwise I expect this would wreak havoc on the user's browser.
Thanks. Well done

You've got three options:
1. Do not add the host permission for S3 to your manifest file. Then the extension will not have the blanket permission to access the resource, and an Origin request header will be sent with the request.
2. Use a non-extension frame to perform the AJAX request. For example, the following method will result in a cross-origin GET request with Origin: null.
function prefetchWithOrigin(url) {
    var html = '<script>(' + function(url) {
        var x = new XMLHttpRequest();
        x.open('GET', url);
        x.onloadend = function() {
            parent.postMessage('done', '*');
        };
        x.send();
    } + ')(' + JSON.stringify(url) + ');</script>';
    var f = document.createElement('iframe');
    f.src = 'data:text/html,' + encodeURIComponent(html);
    (document.body || document.documentElement).appendChild(f);

    window.addEventListener('message', function listener(event) {
        // Remove frame upon completion
        if (event.source === f.contentWindow) {
            window.removeEventListener('message', listener);
            f.remove();
        }
    });
}
3. Use the chrome.webRequest.onBeforeSendHeaders event to manually append the Origin header, as sketched below.
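A minimal sketch of that third option (the bucket URL and site origin below are hypothetical; in Manifest V2 this also requires the webRequest and webRequestBlocking permissions plus a host permission for the bucket):

chrome.webRequest.onBeforeSendHeaders.addListener(
    function(details) {
        // Set the exact origin of the site that will later reuse the cached file
        details.requestHeaders.push({ name: 'Origin', value: 'https://www.example.com' });
        return { requestHeaders: details.requestHeaders };
    },
    // URL filter: only touch requests going out to the S3 bucket
    { urls: ['https://my-bucket.s3.amazonaws.com/*'] },
    ['blocking', 'requestHeaders']
);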

Related

How can I intercept only one endpoint of a domain for my browser API calls?

Suppose I enter a (public) website that makes 3 XHR/fetch calls on 3 different endpoints:
https://api.example.com/path1
https://api.example.com/path2
https://api.example.com/path3
What I want to achieve is intercept the call to https://api.example.com/path2 only, redirect it to a local service (localhost:8000) and let path1 and path3 through to the original domain.
What kind of options do I have here? I have studied a lot of approaches to this issue:
DNS rewriting - this solution is not suitable, as I would still have to intercept path1 and path3, only redirecting them to the original IPs and mimicking the headers as much as possible - which means I would have to do a specific proxy configuration for each intercepted domain. This is unfeasible.
Chrome extensions - I found none that deal specifically with intercepting a single endpoint.
Overwriting both fetch and XMLHttpRequest after page load - this still doesn't cover all scenarios; some websites may capture the original values of fetch and XMLHttpRequest before page load.
Combining a Chrome extension with a fetch override will work:
Download a WebExtension that lets you load JavaScript code before a given page loads, e.g. User JavaScript and CSS.
Add the following script to run before your page loads, based on: Intercepting JavaScript Fetch API requests and responses
const { fetch: originalFetch } = window;

window.fetch = async (...args) => {
    let [resource, config] = args;

    // request interceptor starts
    resource = resource === "https://api.example.com/path2" ? "http://localhost:8000/path2" : resource;
    // request interceptor ends

    const response = await originalFetch(resource, config);

    // response interceptor here
    return response;
};
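For completeness, a similar override can cover XMLHttpRequest, under the same assumption that only https://api.example.com/path2 should be redirected (this sketch is illustrative, not part of the original answer):

const originalOpen = XMLHttpRequest.prototype.open;
XMLHttpRequest.prototype.open = function(method, url, ...rest) {
    // Redirect only the one endpoint; pass everything else through untouched
    const target = url === "https://api.example.com/path2" ? "http://localhost:8000/path2" : url;
    return originalOpen.call(this, method, target, ...rest);
};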

Serve remote url with node.js stream

I have a video stored in Amazon S3.
Right now I'm serving it to the client with a Node.js stream:
return request(content.url).pipe(res)
But this is not working in Safari: Safari is unable to play the streamed data, while the same code works in Chrome and Firefox.
I did some research and found out that Chrome's range request looks like
[0-]
while Safari requests the same content in successive ranges:
[0-10][11-20][21-30]
Now, if the content were stored on my server, I could have broken the file into chunks with
fs.createReadStream(path).pipe(res)
to serve Safari its requested content ranges,
as described in this blog: https://medium.com/better-programming/video-stream-with-node-js-and-html5-320b3191a6b6
How can I do the same with a remote URL stored in S3?
FYI, it's not feasible to download the content temporarily on the server and delete it after serving, as the website is supposed to receive good traffic.
How can I do the same with a remote URL stored in S3?
Don't.
Let S3 serve the data. Sign a URL to temporarily allow access to the client. Then, you don't have to serve or proxy anything and you save a lot of bandwidth. An example from the documentation:
var params = {Bucket: 'bucket', Key: 'key'};
var url = s3.getSignedUrl('getObject', params);
console.log('The URL is', url);
...as the website is supposed to receive good traffic.
You'll probably also want to use a CDN to further reduce your costs and enhance the performance. If you're already using AWS, CloudFront is a good choice.
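To illustrate that flow, here is a sketch assuming the AWS SDK for JavaScript v2, Express, and a hypothetical bucket and route; the server hands out a short-lived signed URL and S3 answers the Range requests (including Safari's chunked ranges):

const AWS = require('aws-sdk');
const express = require('express');

const s3 = new AWS.S3();
const app = express();

app.get('/video/:key', (req, res) => {
    const url = s3.getSignedUrl('getObject', {
        Bucket: 'my-bucket',   // hypothetical bucket name
        Key: req.params.key,
        Expires: 300           // signed URL valid for 5 minutes
    });
    // Redirect the client straight to S3 instead of proxying the bytes
    res.redirect(url);
});

app.listen(3000);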

How to cache response from google drive API in NodeJS

I want to play music files from Google Drive on a web page. I have the link for each file, but the response cache headers for these calls are 'no-cache, no-store, max-age=0, must-revalidate', so they will never be saved in the browser cache. Is there any way to make requests for Google Drive files cacheable?
The problem:
When you use the Drive link for a music file (mp3), https://drive.google.com/a/pucp.pe/uc?id=1kYYS9FZ9Vxif5WJM9ZQcY4SR35NMgoIE&export=download, the GET call receives a 302 code which generates a redirect to another URL, in this case to 'https://doc-0o-28-docs.googleusercontent.com/docs/securesc/bgp95l3eabkkpccn0qi0qopvc4e7d4mq/us95e8ush1v4b7vvijq1vj1d7ru4rlpo/1556330400000/01732506421897009934/01732506421897009934/1kYYS9FZ9Vxif5WJM9ZQcY4SR35NMgoIE?h=14771753379018855219&e=download'. Each of these calls has no-cache in its headers.
I tried using Workbox (the Cache API), but I couldn't find a way to cache redirects; probably I need to cache both calls (the first GET and the redirect). However, if I use the redirected URL, the caching works, but I don't have access to that URL until the first call is made.
I also tried a proxy from a Node.js server:
app.get("/test", (req, res) => {
try {
https.get(
URL,
function(response) {
res.writeHead(response.statusCode, {...response.headers,
"Cache-Control": "public, max-age=120",
"Expires": new Date(Date.now() + 120000).toUTCString() })
response.pipe(res);
}
);
} catch (e) {
console.log(e);
}
});
Using the first URL, I had no luck.
Using the redirect URL, I just get a "Status Code: 302 Found".
One solution could be to download the file and serve it directly from my server, but then I would be missing the point of using Drive storage. I really want to use Google Drive storage and not duplicate all the files on my server.
Is there a recommended way to do the caching for this case? Maybe there is some Google Drive configuration that I'm missing. Or do you know another approach I could take?
You should be able to cache redirected responses using Workbox (and the Cache Storage API in general). HTTP 30x redirects are followed automatically by default, and you should only have to route the original, non-redirected URL.
Here's a live example of a Glitch project that uses Workbox to cache that MP3 file: https://glitch.com/edit/#!/horn-sausage?path=sw.js:9:0
The relevant snippet of code, which also accounts for the fact that no CORS headers are used when serving the files (so you'll get back an opaque response with a status of 0), is:
workbox.routing.registerRoute(
    new RegExp('https://drive.google.com/'),
    new workbox.strategies.CacheFirst({
        plugins: [
            new workbox.cacheableResponse.Plugin({ statuses: [0, 200] })
        ],
    })
);
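For context, the service worker containing that route still needs to be registered from the page; this is standard service worker registration, nothing Workbox-specific (the /sw.js path is an assumption):

if ('serviceWorker' in navigator) {
    navigator.serviceWorker.register('/sw.js');
}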

I need to redirect a file from server to a folder without htaccess

I have a file at "website/folder/file" which I would like to protect so that users can't access it, without using .htaccess.
The file is a huge DB containing URLs; I don't want users to access it by typing its direct URL.
That file is fetched and used by my Chrome extension, which blocks access if the user tries to reach one of the URLs in that DB.
The problem is that by typing the direct URL to that file, we have access...
I tried everything with the .htaccess file. I know we can block, redirect, etc. with .htaccess, but if I redirect or block the URL of the DB with .htaccess, my extension doesn't work anymore because the DB is blocked for it as well.
So I'm trying to find a solution; maybe there is another way!
My background.js:
'use strict';
let db = []; // session global

// ----- fetch, parse & cache the database data
fetch('http://thmywebsite.com/folder/db.dat')
    .then(response => response.text())
    .then(text => { db = text.trim().split(/[\r\n]+/); })
    .catch(error => console.log(error));

// Cancel any request whose hostname appears in the database
chrome.webRequest.onBeforeRequest.addListener(details => {
        let url = new URL(details.url);
        return { cancel: url && url.hostname && db.includes(url.hostname) };
    },
    { urls: ["http://*/*", "https://*/*"] },
    ["blocking"]
);

chrome.extension.isAllowedIncognitoAccess(function(isAllowedAccess) {
    if (isAllowedAccess) return; // Great, we've got access
});
You can't realistically do this. You can't block a resource that needs to be publicly available (to your client-side script).
You can potentially make this a little harder for someone wanting your DB, by perhaps sending a unique HTTP request header as part of your fetch(). You can then check for the presence of this header server-side (in .htaccess) and block the request otherwise. This prevents a user from casually requesting this file directly in their browser. However, this is trivial to bypass for anyone who looks at your script (or monitors the network traffic) as they can construct the request to mimic your script. But let's not forget, your script downloads the file to the browser anyway - so it's already there for anyone to save.
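As a sketch of that idea (the header name here is hypothetical, and again this only deters casual direct access):

// Extension side: send a marker header the server can check for;
// .htaccess would then deny requests to db.dat that lack this header.
fetch('http://thmywebsite.com/folder/db.dat', {
    headers: { 'X-From-Extension': '1' }
})
    .then(response => response.text())
    .then(text => { db = text.trim().split(/[\r\n]+/); })
    .catch(error => console.log(error));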
You need to rethink your data model. If you don't want the DB to be publicly accessible then it simply can't be publicly accessible. Instead of your script downloading the DB to the client and processing the request locally, you could send the request to your server. Your server then performs the necessary lookup (on the "hidden" database) and sends back a response. Your script then acts on this response.
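A sketch of that model, assuming an Express server where the db array is loaded privately and never shipped to the client (note that blocking webRequest listeners are synchronous, so the extension would have to query ahead of time or cache answers rather than await this per request):

// Server side: the DB stays private; clients only get yes/no answers
app.get('/api/blocked', (req, res) => {
    res.json({ blocked: db.includes(req.query.hostname) });
});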

JSONP request working cross-domain, but can't figure out origin

I'm trying out a JSONP call. I have a Node.js app on server 1, under the domain domain1.com, looking like this:
server.get('/api/testjsonp', function(req, res) {
    var clientId = req.param('clientId');
    res.header('Content-Type', 'application/json');
    res.header('Charset', 'utf-8');
    res.send(req.query.callback + '({"something": "rather", "more": "fun", "sourceDomain": "' + req.headers.origin + '", "clientId": "' + clientId + '"});');
});
On another server (server 2), under a different domain (domain2.com), I have created a test HTML page with a call like this:
var data = { clientId: 1234567890 };
$.ajax({
    dataType: 'jsonp',
    data: data,
    jsonp: 'callback',
    url: 'https://domain1.com/api/testjsonp?callback=1',
    success: function(data) {
        alert('success');
    },
    error: function(err) {
        alert('ERROR');
        console.log(err);
    }
});
I have 2 problems here:
1) Why is this working? Isn't it a cross-domain call, and therefore wouldn't I need to implement the ALLOW-ORIGIN headers stuff? I'm following these examples:
http://css.dzone.com/articles/ajax-requests-other-domains
http://benbuckman.net/tech/12/04/cracking-cross-domainallow-origin-nut
2) On the server, I can't figure out which domain is making the call; req.headers.origin is always undefined. I'd like to be able to know which domain is calling, to prevent unwanted calls. Alternatively I could check the calling IP - any idea how?
Many thanks
Why is this working? Isn't it a cross-domain call and therefore I'd need to implement the ALLOW-ORIGIN headers stuff?
As far as the browser is concerned, you aren't directly reading data from a different origin. You are loading a JavaScript program from another origin (and it happens to have some data bundled in it).
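Roughly, what jQuery does under the hood for dataType: 'jsonp' is inject a script tag, and script loading is exempt from the same-origin read restriction (the callback name below is illustrative):

function jsonp(url, callback) {
    var name = 'cb_' + Date.now(); // illustrative unique callback name
    window[name] = function(data) {
        delete window[name];
        script.remove();
        callback(data);
    };
    var script = document.createElement('script');
    script.src = url + (url.indexOf('?') === -1 ? '?' : '&') + 'callback=' + name;
    document.head.appendChild(script);
}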
In the server, I can't figure out which domain is making the call, req.headers.origin is always undefined. I'd like to be able to know which domain is calling, to prevent unwanted calls.
The URL of the referring page is stored in the Referer header, not the Origin header. It is, however, optional and won't be sent under many circumstances.
If you want to limit access to the data to certain sites, then you can't use JSON-P. Use plain JSON and CORS instead.
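A minimal sketch of that alternative, reusing the Express-style server from the question with a hypothetical allow-list:

var ALLOWED = ['https://domain2.com']; // hypothetical list of trusted sites

server.get('/api/test', function(req, res) {
    var origin = req.headers.origin;
    if (ALLOWED.indexOf(origin) !== -1) {
        res.header('Access-Control-Allow-Origin', origin);
    }
    res.json({ something: 'rather', clientId: req.query.clientId });
});

The browser then enforces the restriction: a page on an origin outside the list can still send the request, but cannot read the response.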
Alternative I could check for the calling IP, any idea how?
That would give you the address of the client, not the server that directed the client to you.
