Serving files stored in S3 from an Express/Node.js app

I have an app where users' photos are private. I store the photos (and their thumbnails) in AWS S3. There is a page on the site where a user can view his photos (i.e. the thumbnails). My problem now is how to serve these files. The options I have evaluated are:
Serving the files from CloudFront (or S3) using signed URL generation. The problem is that every time the user refreshes the page I have to generate all those signed URLs again and load them, so the browser can't cache the images, which would otherwise have been a good choice. Is there any way to still do that in JavaScript? I can't give those URLs a longer validity for security reasons, and within that time window anyone who gets hold of a URL can view the file without going through the app's authentication.
Serving the files from my Express app itself after streaming them from the S3 servers. This lets me set HTTP cache headers and therefore enables browser caching, and it also makes sure no one can view a file without being authenticated. Ideally, since I am hosting behind an NGINX proxy, I would like to hand the stream over to NGINX, but as far as I can see that only works for files that exist on the local filesystem; here I would have to stream the file and respond once the stream completes, and I don't want to store the files locally.
I am not able to decide which of the two options is the better choice. I want to offload as much work as possible to S3 or CloudFront, but even with signed URLs the requests still hit my servers first. I also want the caching features.
So what would be the ideal way to do this, along with answers to the particular questions raised for each method?

I would just stream it from S3. It's very easy, and signed URLs are much more difficult. Just make sure you set the Content-Type and Content-Length headers when you upload the images to S3.
var aws = require('knox').createClient({
  key: '',
  secret: '',
  bucket: ''
})

app.get('/image/:id', function (req, res, next) {
  if (!req.user.is.authenticated) {
    var err = new Error()
    err.status = 403
    next(err)
    return
  }

  aws.get('/image/' + req.params.id)
  .on('error', next)
  .on('response', function (resp) {
    if (resp.statusCode !== 200) {
      var err = new Error()
      err.status = 404
      next(err)
      return
    }

    res.setHeader('Content-Length', resp.headers['content-length'])
    res.setHeader('Content-Type', resp.headers['content-type'])

    // cache-control?
    // etag?
    // last-modified?
    // expires?

    if (req.fresh) {
      res.statusCode = 304
      res.end()
      return
    }

    if (req.method === 'HEAD') {
      res.statusCode = 200
      res.end()
      return
    }

    resp.pipe(res)
  })
  .end() // knox's .get() returns a request object, so it must be sent explicitly
})

If you redirect the user to a signed URL using 302 Found, the browser will cache the resulting image according to its Cache-Control header and won't ask for it a second time.
To prevent the browser from caching the signed URL itself, you should send a proper Cache-Control header along with it:
Cache-Control: private, no-cache, no-store, must-revalidate
That way, the next time the browser will send a request to the original URL and will be redirected to a new signed URL.
You can generate a signed URL with knox using the signedUrl method.
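For example, a minimal sketch of the redirect route, assuming the knox client and authentication check from the answer above (the one-minute expiry is only an example):

app.get('/image/:id', function (req, res, next) {
  if (!req.user.is.authenticated) {
    var err = new Error()
    err.status = 403
    return next(err)
  }
  // never cache the redirect itself, otherwise the browser would keep reusing an expired signed URL
  res.setHeader('Cache-Control', 'private, no-cache, no-store, must-revalidate')
  var expires = new Date(Date.now() + 60 * 1000) // signed URL valid for one minute
  res.redirect(302, aws.signedUrl('/image/' + req.params.id, expires))
})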
But don't forget to set proper headers on every uploaded image. I'd recommend using both the Cache-Control and Expires headers, because some browsers have no support for Cache-Control, and Expires only lets you set an absolute expiration time.
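With knox, for instance, those headers can be attached at upload time; a rough sketch (the file name, key and one-year lifetime are only examples):

aws.putFile('thumb.jpg', '/image/42', {
  'Content-Type': 'image/jpeg',
  'Cache-Control': 'private, max-age=31536000',                   // roughly one year
  'Expires': new Date(Date.now() + 31536000 * 1000).toUTCString() // absolute counterpart for older browsers
}, function (err, uploadRes) {
  if (err) throw err
  console.log('uploaded with status', uploadRes.statusCode)
})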
With the second option (streaming images through your app) you'll have better control over the situation. For example, you'll be able to generate an Expires header for each response based on the current date and time.
But what about speed? Using signed URLs has two advantages which may affect page load speed.
First, you won't overload your server. Generating signed URLs is fast because you're just hashing your AWS credentials, while streaming images through your server means maintaining a lot of extra connections during the page load. That said, it won't make any real difference unless your server is already heavily loaded.
Second, browsers keep only two parallel connections per hostname during page load, so the browser will keep resolving image URLs in parallel while downloading them. It'll also keep image downloads from blocking the download of other resources.
Anyway, to be absolutely sure you should run some benchmarks. My answer is based on my knowledge of the HTTP specification and on my experience in web development, but I have never tried to serve images that way myself. Serving public images with a long cache lifetime directly from S3 increases page speed, and I believe the situation won't change if you do it through redirects.
And keep in mind that streaming images through your server brings all the benefits of Amazon CloudFront to naught. But as long as you're serving content directly from S3, both options will work fine.
Thus, there are two cases when using signed URLs should speed up your page:
If you have a lot of images on a single page.
If you serve images using CloudFront.
If you have only a few images on each page and serve them directly from S3, you probably won't see any difference at all.
Important Update
I ran some tests and found that I was wrong about caching. It's true that browsers cache images they were redirected to, but the cached image is associated with the URL it was redirected to, not with the original one. So, when the browser loads the page a second time, it requests the image from the server again instead of fetching it from the cache. Of course, if the server responds with the same redirect URL it responded with the first time, the browser will use its cache, but that is not the case with signed URLs.
I found that forcing the browser to cache the signed URL as well as the data it receives solves the problem. But I don't like the idea of caching an invalid redirect URL: if the browser somehow misses the image, it will try to request it again using the invalid signed URL from its cache. So I don't think it's an option.
And it doesn't matter whether CloudFront serves images faster or whether browsers limit the number of parallel downloads per hostname; the advantage of using the browser cache outweighs all the disadvantages of piping images through your server.
It also looks like most social networks solve the problem of private images by hiding their actual URLs behind private proxies. They store all their content on public servers, but there is no way to get the URL of a private image without authorization. Of course, if you open a private image in a new tab and send the URL to a friend, he'll be able to see the image too. So, if that's not acceptable for you, it'll be best to use Jonathan Ong's solution.

I would be concerned about using the CloudFront option if the photos really do need to remain private; it seems like you'll have a lot more flexibility administering your own security policy. I think the nginx setup may be more complex than necessary. Express should give you very good performance working as a remote proxy, where it uses request to fetch items from S3 and streams them through to authorized users. I would highly recommend taking a look at Asset Rack, which uses hash signatures to enable permanent caching in the browser. You won't be able to use the default racks because you need to calculate the MD5 of each file (perhaps on upload?), which you can't do while it's streaming. But depending on your application, it could save you a lot of effort if browsers never need to refetch the images.

Regarding your second option, you should be able to set cache control headers directly in S3.
Regarding your first option: have you considered securing your images a different way?
When you store an image in S3, couldn't you use a hashed and randomised filename? It would be quite straightforward to make the filename difficult to guess, and this way you'll have no performance issues serving the images back.
This is the technique Facebook uses. You can still view an image when you're logged out, as long as you know the URL.
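A minimal sketch of generating such an unguessable key with Node's crypto module (the helper name is just an example):

var crypto = require('crypto')
var path = require('path')

function randomKey (originalName) {
  // 32 random bytes -> 64 hex characters, practically impossible to guess
  return crypto.randomBytes(32).toString('hex') + path.extname(originalName)
}

console.log(randomKey('holiday.jpg')) // e.g. a 64-character hex name ending in .jpg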

Related

What is the safest method to implement sessions?

So I have a few things to say: I don't want to use cookies, so things like express-session are not an option.
I use Node.js with Express, with no front-end JavaScript, and MySQL as the database. I don't really know how to do this, so I would like to hear your opinion.
I have already tried searching the internet.
When dealing with regular web pages, there are only four places in a request to store information that would identify a session.
Cookie sent with each request
Custom header on each request
Query parameter with each request
In the path of the URL
You've ruled out the cookie.
The custom header could work for programmatic requests and is regularly used by Javascript code with various types of tokens. But, if you need a web browser to maintain or send the session on its own, then custom headers are out too.
That leaves query parameters or the path of the URL. These both have the same issues. You would create a sessionID and then attach something like ?sessionID=92347987 to every single request that your web page makes to your server. Some server-side frameworks do sessions this way (most have been retired in favor of cookies). This has all sorts of issues, which is why it isn't used very often any more. Here are some of the downsides (a sketch of the approach follows after this list):
You have to dynamically generate every single link in a web page so that it includes the right sessionID, so that when the user clicks on it, the resulting HTTP request carries the right sessionID.
All browser caching has to be disabled or bypassed because you don't want the browser to use cached web pages that might contain the wrong sessionID.
User bookmarks basically don't work because they end up bookmarking a URL with a sessionID in it that won't last forever.
The user sees sessionID=xxxx in all their URLs.
Network infrastructure that logs the URLs of requests will include the sessionID (because it's in the URL). This is considered a security risk.
All that said and with those tradeoffs, it can be made to work, but it is not considered the "safest" way to do it.
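For illustration only, here is a rough Express sketch of the query-parameter approach; the in-memory store and the sessionID parameter name are hypothetical, and it inherits every downside listed above:

var express = require('express')
var crypto = require('crypto')
var app = express()

var sessions = {} // hypothetical in-memory store; in practice you would use MySQL

app.use(function (req, res, next) {
  var id = req.query.sessionID
  if (!id || !sessions[id]) {
    id = crypto.randomBytes(16).toString('hex')
    sessions[id] = { createdAt: Date.now() }
  }
  req.session = sessions[id]
  res.locals.sessionID = id // every generated link must append ?sessionID=<id> by hand
  next()
})

app.get('/', function (req, res) {
  res.send('<a href="/profile?sessionID=' + res.locals.sessionID + '">Profile</a>')
})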

Block access to an AWS CloudFront URL from Chrome, Safari and all browsers

What I had done:
Uploaded KYC documents and attachments to an S3 bucket
Integrated S3 with CloudFront
Blocked all public access on the S3 bucket
The only way of accessing the content is the CloudFront URL
My requirement is:
Anyone can access the documents if the CloudFront URL is known
So I want to restrict access to that URL to my application only
Mainly, block access to that URL in Chrome, Safari and all other browsers
Is it possible to restrict the URL? How?
Lambda@Edge will let you do almost anything you want with a request as it's processed by CloudFront.
You could look at the user agent and return a 403 if it doesn't match what you expect. Beware, however, that it's not difficult to change the User-Agent string; a better approach is to use an authentication token.
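As a rough sketch (not a drop-in solution), a viewer-request Lambda@Edge function along these lines could do it; the browser regex is only an example and, as noted, trivially defeated:

'use strict'

// attached to the CloudFront distribution as a viewer-request trigger
exports.handler = function (event, context, callback) {
  var request = event.Records[0].cf.request
  var uaHeader = request.headers['user-agent']
  var userAgent = uaHeader ? uaHeader[0].value : ''

  // block anything that looks like a regular browser (easily spoofed)
  if (/Mozilla|Chrome|Safari/i.test(userAgent)) {
    return callback(null, {
      status: '403',
      statusDescription: 'Forbidden',
      body: 'Access denied'
    })
  }

  callback(null, request) // otherwise let CloudFront forward the request to the origin
}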
To be honest, I don't understand your question well and you should try to describe the issue again. From a bird's eye view, it sounds like you are describing an IDOR vulnerability. But I will address multiple parts in my response.
AWS WAF will allow you to perform quite a bit of blocking on a wide variety of request content.
Specifically for this problem, if you choose to use AWS WAF, you can do the following to address this issue:
Create a WAF web ACL; it should be global rather than regional, with its default action set to allow
Build regex pattern sets of what you would like to block, or hard-code specific examples
Create a rule that blocks requests whose User-Agent header matches your regex pattern set
But at the end of the day, you might just be fighting a battle which should not be fought in the first place. Think about it like this: if you want to block all User-Agent headers which identify a browser, that is fine. But the User-Agent header can easily be overwritten and spoofed so that you won't see the typical browser User-Agent at all. I don't suggest blocking requests based on this criterion, because at the end of the day I can just use a proxy that replaces that request content before forwarding the traffic to the server, and bypass the WAF or even Lambda@Edge.
What I would suggest is to add some sort of authorization/authentication requirement for access to these specific files. Since KYC documents can be sensitive, this would be a good control to put in place to make sure the files are not accessed by those who should not access them.
It seems to me like you are running into a case where an attacker can exploit an IDOR vulnerability. If that is the case, you need to program this logic in the application layer; there is no way to prevent it at the AWS WAF layer.
If you truly wanted to fix the issue and you were dealing with an IDOR, I would use Lambda@Edge to validate that the cookie included in the request is allowed to access the KYC document. Store in a database which KYC documents can be accessed by which user, and check that the Cookie header belongs to the user who uploaded the document. This effectively implements authorization/authentication not only at the application layer but also at the Lambda@Edge (CDN) layer.
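A hedged sketch of that idea as a viewer-request Lambda@Edge function; lookupOwner() is a placeholder for whatever database call (DynamoDB, for example) maps session cookies to the documents they may access:

'use strict'

// placeholder for a real database lookup: resolves to true only if this
// session belongs to the user who is allowed to read the requested document
async function lookupOwner (sessionId, documentPath) {
  return false // TODO: query your user/document mapping
}

exports.handler = async function (event) {
  var request = event.Records[0].cf.request
  var cookieHeader = request.headers.cookie ? request.headers.cookie[0].value : ''
  var match = /session=([^;]+)/.exec(cookieHeader)
  var sessionId = match && match[1]

  if (!sessionId || !(await lookupOwner(sessionId, request.uri))) {
    return { status: '403', statusDescription: 'Forbidden' }
  }

  return request // authorized: let CloudFront serve the document
}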

Offline data caching mechanism in hybrid application

I am developing an application using Angular 2 and Ionic, with Node on the back end. The application has some HTML forms that contain common input fields and an image upload.
If the user doesn't have an internet connection, he should still be able to complete the form without any problem. Once he gets a connection, the data should be uploaded to the server.
How can we implement this feature for web and mobile separately?
What's the best solution for this scenario? Please suggest something; I don't have a clear picture of it.
To understand what offline scenarios are good candidates for a mobile web app, it helps to first understand the key technologies that make it possible.
Mobile web apps can be built with three core capabilities, and all of these are part of the new HTML5 standards:
Browser application caching of pages
Local storage
Local database
Browser application caching allows a manifest to be created listing pages that should be cached and made available offline. Normally, when you visit a URL, a server request is made to return the page. Setting up an application cache manifest tells the browser how it can use pages already downloaded rather than just immediately displaying an error when there is no longer a network connection.
Local Storage is a standard that retains local web app data (even when the browser is shut down) using a key/value system that works similarly to browser cookies. However, it is different from browser cookies in two important ways. First, cookies are resent to the server with every HTTP request, and it would waste a lot of bandwidth to resend all offline data when the server doesn’t need it. Secondly, cookies tend to max out at around 4k of data, while local storage usually gives an application as much as 5 MB of data to work with per domain. 5 MB may not sound like much, but when used carefully, it can go a very long way in terms of offline local storage.
Local Database removes the 5MB limit of local storage and allows data to be indexed so that multiple properties can be queried quickly. This is only an HTML5 proposed standard at present; only Internet Explorer and Firefox have implemented it so far. Safari and Chrome use an older, deprecated system called Web SQL. This means if you need this level of functionality, there is a significant amount of extra work and complexity to support both standards across all major browsers. Hopefully, that won’t always be the case and major browsers will conform to the finalized HTML5 specifications.
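For completeness, a minimal sketch of queuing a form submission in the proposed local database API, IndexedDB (the database and store names are just examples):

var openRequest = indexedDB.open('offline-forms', 1)

openRequest.onupgradeneeded = function (event) {
  var db = event.target.result
  // one object store holding pending submissions, keyed automatically
  db.createObjectStore('pending', { autoIncrement: true })
}

openRequest.onsuccess = function (event) {
  var db = event.target.result
  var tx = db.transaction('pending', 'readwrite')
  tx.objectStore('pending').add({ savedAt: Date.now(), fields: { name: 'example' } })
}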
When the user is offline, store the form data in localStorage; when he comes back online, get the data from localStorage, upload it to the server and then remove it from localStorage.
localStorage.setItem('form_data', JSON.stringify(formData)) // store the form data (localStorage only stores strings)
var saved = JSON.parse(localStorage.getItem('form_data'))   // read it back when the user is online again
localStorage.removeItem('form_data')                        // remove it once the upload has succeeded
Add event listeners on window:
if (typeof navigator.onLine !== 'undefined') {
  window.addEventListener('offline', function (ev) {
    userOffline();
  });
  window.addEventListener('online', function (ev) {
    userOnline();
  });
}

function userOnline() {}  // when user comes back online
function userOffline() {} // when user goes offline

Amazon CloudFront: how to make a signed URL expire after a single use

I need to protect my videos from being downloaded with "Internet Download Manager" (IDM/IDMan).
I used:
1. RTMP streaming
2. signed URLs
3. an expiration time on the signed URL (60 seconds)
4. the player (JW Player) set to autostart
AND I need the signed URL to become invalid once it has been used.
With this solution IDM would get a URL that has already been used, and so would be blocked.
Is there any way to configure CloudFront to allow a signed URL to be used just once?
Or any other solution that can protect the videos from being downloaded and re-used on other web sites?
Please can you help?
Thanks in advance
CloudFront does not support playing a URL only once, and it never will. The reason is that the only way to do this would be for all of its edge servers to share that information; they currently do not share state, which makes scaling much easier and performance much better.
Unfortunately, if you're looking for fine-grained control over how your videos are played, you're going to need more fine-grained code, which you can't run in CloudFront; you'll need to host the content directly on your own server.
Idea 1: Limit by count
You can implement the idea that you have: once the URL has been used, you no longer serve up that file.
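A rough Express sketch of that idea, assuming you serve the video from your own server; the routes and the in-memory token store are hypothetical (use something shared such as Redis if you run more than one server):

var crypto = require('crypto')
var issued = {} // token -> video id
var used = {}   // tokens that have already been played

// the player asks for a fresh one-time URL before starting playback
app.get('/video-token/:id', function (req, res) {
  var token = crypto.randomBytes(16).toString('hex')
  issued[token] = req.params.id
  res.json({ url: '/video/' + req.params.id + '?token=' + token })
})

app.get('/video/:id', function (req, res) {
  var token = req.query.token
  if (!token || issued[token] !== req.params.id || used[token]) {
    res.statusCode = 403
    return res.end()
  }
  used[token] = true // burn the token on first use
  // ...stream the video file into the response here...
  res.end()
})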
Idea 2: Limit by referrer
You can look at the Referer header and, if the request comes from your website, allow the content to be downloaded; otherwise, reject it. Note: this can be spoofed, since a user can set the Referer header manually.
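A short sketch of such a (spoofable) check in Express, assuming your site lives at example.com:

app.use('/video', function (req, res, next) {
  var referrer = req.headers.referer || ''
  if (referrer.indexOf('https://example.com/') !== 0) {
    res.statusCode = 403
    return res.end() // missing or foreign referrer: reject
  }
  next()
})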
Preventing a video from being downloaded and later uploaded elsewhere is technically impossible. Once you let the browser display the video, there really isn't any way to stop someone from recording those bits and replaying them later. There are mitigations, like preventing right clicks or using an odd proprietary format, but I'm not familiar with DRM techniques.

Preventing rogue spiders from indexing a directory

We have a secure website (developed in .NET 2.0/C#, running on Windows Server and IIS 5) to which members have to log in; they can then view PDF files stored in a virtual directory. To prevent spiders from crawling this website, we have a robots.txt that disallows all user agents. However, this will NOT prevent rogue spiders from indexing the PDF files, since they disregard robots.txt directives. Since the documents are meant to be secure, I do not want ANY spiders getting into this virtual directory (not even the good ones).
I have read a few articles on the web and wonder how programmers (rather than webmasters) have solved this problem in their applications, since it seems like a very common one. There are many options out there, but I am looking for something easy and elegant.
Some options that I have seen, but which seem weak, listed here with their cons:
Creating a honeypot/tarpit that lets rogue spiders in and then blacklists their IP address. Cons: this can also block valid users coming from the same IP, and the list has to be maintained manually or members need some way to remove themselves from it. We don't have a range of IPs that valid members will use, since the website is on the internet.
Request header analysis: however, the rogue spiders use real agent names, so this is pointless.
Meta robots tag. Cons: only obeyed by Google and other valid spiders.
There was some talk about using .htaccess, which is supposed to be good, but that only works with Apache, not IIS.
Any suggestions are very much appreciated.
EDIT: as 9000 pointed out below, rogue spiders should not be able to get into a page that requires a login. I guess the question is really 'how to prevent someone who knows the link from requesting the PDF file without logging into the website'.
I see a contradiction between
members have to log in and then they can view some PDF files stored in a virtual directory
and
this will NOT prevent Rogue spiders from indexing the PDF files
How come any unauthorized HTTP request to this directory ever gets served with anything other than a 401? The rogue spiders certainly can't provide an authorization cookie. And if the directory is accessible to them, what does 'member login' mean?
You probably need to serve the PDF files via a script that checks authorization. I think IIS is also capable of requiring authorization just for directory access (but I don't know for sure).
I assume that your links to PDFs come from a known location. You can check the Request.UrlReferrer to make sure users are coming from this internal / known page to access the PDFs.
I would definitely force downloads to go through a script where you can check that a user is in fact logged in to the site before allowing the download.
protected void getFile(string fileName)
{
    /*
        CHECK AUTH / REFERER HERE
    */
    string filePath = Request.PhysicalApplicationPath + "hidden_PDF_directory/" + fileName;
    System.IO.FileInfo fileInfo = new System.IO.FileInfo(filePath);

    if (fileInfo.Exists)
    {
        Response.Clear();
        Response.AddHeader("Content-Disposition", "attachment; filename=" + fileInfo.Name);
        Response.AddHeader("Content-Length", fileInfo.Length.ToString());
        Response.ContentType = "application/pdf";
        Response.WriteFile(fileInfo.FullName);
        Response.End();
    }
    else
    {
        /*
            ERROR
        */
    }
}
Untested, but this should give you an idea at least.
I'd also stay away from robots.txt since people will often use this to actually look for things you think you're hiding.
Here is what I did (expanding on Leigh's code).
I created an HTTPHandler for PDF files, added a web.config in the secure directory and configured the handler to handle PDFs.
In the handler, I check whether the user is logged in using a session variable set by the application.
If the user has the session variable, I create a FileInfo object and send it in the response. Note: don't call 'context.Response.End()', and 'Content-Disposition' is obsolete.
So now, whenever there is a request for a PDF in the secure directory, the HTTP handler gets the request and checks whether the user is logged in. If not, it displays an error message; otherwise it serves the file.
I'm not sure if there is a performance hit, since I am creating FileInfo objects and sending those rather than sending a file that already exists. The thing is that you can't Server.Transfer or Response.Redirect to the *.pdf file, since that creates an infinite loop and the response never gets returned to the user.
