I am using S3 to host a static website. The site lives in an S3 bucket and is distributed by CloudFront. It all works well, but we run into a problem when we need to change specific files: if we change the index.html file in the S3 bucket, we do not get the latest file from CloudFront.
Should I be setting an expiry time on these static files in S3, so that only after that time has passed will CloudFront look for the new version of a file and distribute it?
CloudFront uses the Cache-Control and Expires headers sent by the origin server to decide whether a resource is to be stored in cache and how long it is considered fresh. If you don't control caching via response headers, CloudFront will consider each resource stale 24 hours after it was fetched from the origin. Optionally, you can configure a distribution to ignore cache control headers and use an expiry time that you specify for each resource.
When you update a file at the origin, CloudFront will not attempt to refresh its copy until it expires. You can follow different strategies to have CloudFront update its cached copies.
1) The least efficient, and not recommended, approach is invalidation. You can do it via the AWS console or the API.
2) Tell CloudFront when to look for updated content by sending Expires headers. For example, if you have a strict policy for deploying new content/versions to your website and you know that, say, you roll out a deployment almost every Thursday, you can send an Expires header with each resource from your origin set to the next planned deployment date. (This will probably not work with S3 origins.)
3) The most efficient and recommended way is to use versioned URLs. A good practice is to include the last modified time of the resource in its access URI. With EC2 or other origins able to serve dynamic content this is fairly easy; with an S3 origin it's not that straightforward, if possible at all.
Therefore I'd recommend invalidating the updated resources.
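For example, here is a minimal sketch of triggering an invalidation with the AWS SDK for JavaScript (the distribution ID and paths are placeholders for your own values):

const AWS = require('aws-sdk');
const cloudfront = new AWS.CloudFront();

cloudfront.createInvalidation({
  DistributionId: 'EDFDVBD6EXAMPLE',        // placeholder: your distribution ID
  InvalidationBatch: {
    CallerReference: Date.now().toString(), // must be unique per invalidation request
    Paths: {
      Quantity: 1,
      Items: ['/index.html']                // the objects you just updated
    }
  }
}, function (err, data) {
  if (err) return console.error(err);
  console.log('Invalidation started:', data.Invalidation.Id);
});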
It looks like you have to set the metadata on the S3 side:
http://docs.aws.amazon.com/AmazonS3/latest/dev/UsingMetadata.html
The best way I found to do this is to use BucketExplorer: go to "Batch Operation", "Update Metadata", "Add Metadata" and add "Cache-Control: max-age=604800, public" for a one-week cache period.
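If you'd rather script it than use BucketExplorer, a rough sketch with the AWS SDK for JavaScript does the same thing by copying the object onto itself with replaced metadata (the bucket and key names are placeholders):

const AWS = require('aws-sdk');
const s3 = new AWS.S3();

// Re-copy the object onto itself, replacing its metadata so the new
// Cache-Control value is stored with the object.
s3.copyObject({
  Bucket: 'my-bucket',                    // placeholder bucket name
  Key: 'index.html',                      // placeholder object key
  CopySource: 'my-bucket/index.html',
  MetadataDirective: 'REPLACE',
  CacheControl: 'max-age=604800, public', // one-week cache period
  ContentType: 'text/html'                // re-state the content type when replacing metadata
}, function (err) {
  if (err) console.error(err);
});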
What I had done:
Uploaded KYC documents and attachments to an S3 bucket
Integrated S3 with CloudFront
Blocked all public access on the S3 bucket
The only way of accessing content is the CloudFront URL
My requirement is:
Anyone can access the documents if the CloudFront URL is known
So I want to restrict access to that URL except from my application
Mainly, block access to that URL in Chrome, Safari and all other browsers
Is it possible to restrict the URL? How?
Lambda@Edge will let you do almost anything you want with a request as it's processed by CloudFront.
You could look at the User-Agent header and return a 403 if it doesn't match what you expect. Beware, however, that it's not difficult to change the User-Agent string. A better approach is to use an authentication token.
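A minimal sketch of a Lambda@Edge viewer-request function along those lines (the allowed User-Agent prefix is just an illustration of whatever your application actually sends):

// Lambda@Edge viewer-request handler: reject requests whose User-Agent
// doesn't look like the one our application sends.
exports.handler = async (event) => {
  const request = event.Records[0].cf.request;
  const uaHeader = request.headers['user-agent'];
  const userAgent = uaHeader ? uaHeader[0].value : '';

  // 'MyApp/' is a placeholder for your application's User-Agent prefix.
  if (!userAgent.startsWith('MyApp/')) {
    return { status: '403', statusDescription: 'Forbidden', body: 'Forbidden' };
  }

  // Otherwise let the request continue to the origin.
  return request;
};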
To be honest, I don't understand your question well and you should make an attempt to describe the issue again. From a bird's eye view, I feel you are describing an IDOR vulnerability. But I will address multiple parts in my response.
AWS WAF will allow you to perform quite a bit of blocking on a wide variety of request content.
Specifically for this problem, if you choose to use AWS WAF, you can do the following to address this issue:
Create a WAF web ACL; it should be global (not regional), with the default action set to allow
Build regex pattern sets of what you would like to block, or hard-code specific examples
Create a rule that blocks requests whose User-Agent header matches your regex pattern set
But at the end of the day, you might be fighting a battle that shouldn't be fought in the first place. Think about it like this: if you want to block all User-Agent headers that identify a browser, that is fine, but the User-Agent header can easily be overwritten and spoofed so that you never see a typical browser User-Agent at all. I don't suggest blocking requests on this criterion, because I could just use a proxy that replaces that request content before forwarding the traffic to the server, bypassing the WAF or even Lambda@Edge.
What I would suggest is to develop some sort of authorization/authentication requirement for access to these specific files. Since KYC documents can be sensitive, this would be a good control to put in place to make sure the files are not accessed by those who shouldn't access them.
It seems to me like you are running into a case where an attacker can exploit an IDOR vulnerability. If that is the case, you need to implement this logic in the application layer; there is no way to prevent it at the AWS WAF layer.
If you truly wanted to fix the issue and you are dealing with an IDOR, I would use Lambda@Edge to validate that the cookie included in the request is allowed to access the KYC document. Store in a database which KYC documents can be accessed by which user, and check that the Cookie header carries the session of the user who uploaded the document. This effectively implements authorization/authentication not just at the application layer, but also at the Lambda@Edge (or CDN) layer.
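A rough sketch of that idea, where lookupSessionUser() and lookupDocumentOwner() are hypothetical helpers you would back with your own session store and database (e.g. DynamoDB):

// Lambda@Edge viewer-request handler (sketch): only let the request through
// if the session cookie belongs to the user who owns the requested document.
const { lookupSessionUser, lookupDocumentOwner } = require('./auth'); // hypothetical module

exports.handler = async (event) => {
  const request = event.Records[0].cf.request;
  const cookieHeader = request.headers.cookie ? request.headers.cookie[0].value : '';

  const userId = await lookupSessionUser(cookieHeader);   // null if not authenticated
  const ownerId = await lookupDocumentOwner(request.uri); // who uploaded this KYC document

  if (!userId || userId !== ownerId) {
    return { status: '403', statusDescription: 'Forbidden', body: 'Forbidden' };
  }
  return request; // authorized: forward to the origin
};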
Don't ask me why, but we want to run Node.js on server1 while keeping its config files on server2 (say, in S3).
What's the recommended way to do this?
I could read the config file directly when starting my Node app, but I want to see practical approaches.
In the past, I've done this by signing a URL with a long expiration date. That way, I could safely update configs in the bucket, and the servers would get the configuration when they started up.
Unfortunately, the ability to sign URLs for longer than 7 days was removed with V4 signatures. V2 signatures stop working on June 24th of this year.
However, you could have your own server (or a Lambda function) receive the request for the config and redirect it to a signed bucket URL. Just be very careful to authenticate the config request securely.
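A minimal sketch of that redirect approach with Express and the AWS SDK for JavaScript (the bucket, key and the checkAuth middleware are placeholders for your own setup):

const express = require('express');
const AWS = require('aws-sdk');

const s3 = new AWS.S3();
const app = express();

// Placeholder auth middleware: replace with however you authenticate
// your own servers (mTLS, a shared token, IAM, etc.).
function checkAuth (req, res, next) {
  if (req.headers['x-config-token'] === process.env.CONFIG_TOKEN) return next();
  res.sendStatus(403);
}

app.get('/config', checkAuth, function (req, res) {
  const url = s3.getSignedUrl('getObject', {
    Bucket: 'my-config-bucket', // placeholder bucket
    Key: 'app/config.json',     // placeholder key
    Expires: 60                 // signed URL valid for 60 seconds
  });
  res.redirect(302, url);
});

app.listen(3000);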
I'm creating a way for clients to upload large files to S3 for us to process.
I built a mechanism that allows clients to send the list of files they have; in return, for one of the files they offered, they get an HTTP request they need to send to S3 along with the file attached. This removes the strain of uploads from our server, and we can pick up any file that has been uploaded via a notification from the S3 bucket.
My problem is replay attacks. If a certain party asks to send a file and receives the request back, they can replay the same request over and over again, costing us in requests. I don't care about the file being overwritten, since the Content-MD5 header forces it to be the same file (hash conflicts notwithstanding). I also don't care about being notified again when the file finishes uploading.
I thought about generating a policy that only allows uploads with a specific token, which changes every X minutes. Should someone try to replay the request, it would fail and they would have to request a new S3 request from us (which would fail, since the upload already completed). I'm not sure how much of a best practice it is to rotate such a token, and I'm worried it would also cause lots of legitimate requests that take too long to start to fail.
Is there any other mechanism I'm not aware of that should be used in this case?
"worried it would also cause lots of legitimate requests that take too long to start to fail."
You can dismiss that particular worry by signing the URLs with a short expiration time. Authentication and authorization, including signature validation, happen at the beginning of the request. S3 won't cut an upload or download short because the signature expires in the middle of a long request.
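For illustration, a short-lived presigned PUT URL with the AWS SDK for JavaScript might look like the sketch below (the bucket and key are placeholders). The client has to start the upload within the Expires window, but an upload already in flight is not interrupted when the signature expires:

const AWS = require('aws-sdk');
const s3 = new AWS.S3();

// Hand this URL to the client; S3 only accepts it for 5 minutes,
// so a captured request can't be replayed much later.
const uploadUrl = s3.getSignedUrl('putObject', {
  Bucket: 'client-uploads',     // placeholder bucket
  Key: 'incoming/file-123.bin', // placeholder key agreed with the client
  Expires: 300                  // seconds the signature stays valid
});

console.log(uploadUrl);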
Changing bucket policies programmatically, repeatedly, on the fly is definitely not a best practice.
Note that, although it does not appear to be clearly documented, when S3 denies a request the (negligible, but still non-zero) per-request charge apparently still applies, so having S3 refuse a redundant overwrite of the same object with identical content is unlikely to be a goal worth pursuing.
I have an app where users' photos are private. I store the photos (and thumbnails) in AWS S3. There is a page on the site where a user can view his photos (i.e. the thumbnails). Now my problem is how to serve these files. The options I have evaluated are:
Serving files from CloudFront (or S3) using signed URL generation. The problem is that every time the user refreshes the page I have to create all those signed URLs again and load them, so I won't be able to cache the images in the browser, which would have been a good choice. Is there any way to still do this in JavaScript? I can't give those URLs a long validity due to security concerns, and secondly, within that time frame, if someone gets hold of a URL he can view the file without going through authentication in the app.
The other option is to serve the file from my Express app itself after streaming it from the S3 servers. This lets me set HTTP cache headers and therefore enables browser caching. It also makes sure no one can view a file without being authenticated. Ideally I would like to stream the file, and since I am hosting behind an NGINX proxy, relay the stream through NGINX. But as far as I can see that is only possible if the file exists on the same system's filesystem; here I would have to stream it and return it only when the stream is complete. I don't want to store the files locally.
I am not able to evaluate which of the two options would be the better choice. I want to offload as much work as possible to S3 or CloudFront, but even using signed URLs means the request hits my servers first. I also want caching.
So what would be the ideal way to do this, along with answers to the particular questions pertaining to those methods?
I would just stream it from S3. It's very easy, and signed URLs are much more difficult. Just make sure you set the Content-Type and Content-Length headers when you upload the images to S3.
var aws = require('knox').createClient({
  key: '',    // your AWS access key
  secret: '', // your AWS secret key
  bucket: ''  // your bucket name
})

app.get('/image/:id', function (req, res, next) {
  // never serve an image to an unauthenticated user
  if (!req.user.is.authenticated) {
    var err = new Error()
    err.status = 403
    next(err)
    return
  }

  aws.get('/image/' + req.params.id)
  .on('error', next)
  .on('response', function (resp) {
    if (resp.statusCode !== 200) {
      var err = new Error()
      err.status = 404
      next(err)
      return
    }

    res.setHeader('Content-Length', resp.headers['content-length'])
    res.setHeader('Content-Type', resp.headers['content-type'])

    // forward S3's validation headers so req.fresh (below) and the
    // browser cache can do their job
    if (resp.headers['etag']) res.setHeader('ETag', resp.headers['etag'])
    if (resp.headers['last-modified']) res.setHeader('Last-Modified', resp.headers['last-modified'])
    res.setHeader('Cache-Control', 'private, max-age=86400') // tune to taste

    // conditional GET: the browser already has a fresh copy
    if (req.fresh) {
      res.statusCode = 304
      res.end()
      return
    }

    if (req.method === 'HEAD') {
      res.statusCode = 200
      res.end()
      return
    }

    // stream the image body straight through to the client
    resp.pipe(res)
  })
  .end() // knox returns a request object; it has to be ended to be sent
})
If you redirect the user to a signed URL using 302 Found, the browser will cache the resulting image according to its Cache-Control header and won't ask for it a second time.
To prevent the browser from caching the signed URL itself, you should send a proper Cache-Control header along with it:
Cache-Control: private, no-cache, no-store, must-revalidate
So the next time, the browser will send a request to the original URL and will be redirected to a new signed URL.
You can generate a signed URL with knox using the signedUrl method.
But don't forget to set proper headers on every uploaded image. I'd recommend using both the Cache-Control and Expires headers, because some browsers have no support for Cache-Control, and Expires only lets you set an absolute expiration time.
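A small sketch of that redirect flow, reusing the same Express/knox setup as the answer above (the five-minute expiration is just an illustration, and knox's signedUrl is assumed to take the object path and an expiration Date):

var knox = require('knox')
var client = knox.createClient({ key: '', secret: '', bucket: '' })

app.get('/image/:id', function (req, res) {
  if (!req.user.is.authenticated) {
    res.statusCode = 403
    return res.end()
  }

  // sign a short-lived URL for the private image
  var expires = new Date(Date.now() + 5 * 60 * 1000) // 5 minutes from now
  var url = client.signedUrl('/image/' + req.params.id, expires)

  // make sure the redirect itself is never cached, only the image behind it
  res.setHeader('Cache-Control', 'private, no-cache, no-store, must-revalidate')
  res.redirect(302, url)
})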
With the second option (streaming images through your app) you'll have better control over the situation. For example, you'll be able to generate an Expires header for each response according to the current date and time.
But what about speed? Using signed URLs has two advantages which may affect page load speed.
First, you won't overload your server. Generating signed URLs is fast, because you're just hashing your AWS credentials. To stream images through your server, on the other hand, you'll need to maintain a lot of extra connections during the page load. Anyway, it won't make any real difference unless your server is heavily loaded.
Second, browsers keep only a limited number of parallel connections per hostname during page load. So, with the images on a separate hostname, the browser keeps downloading them in parallel, and image downloads won't block the downloading of other resources.
Anyway, to be absolutely sure you should run some benchmarks. My answer is based on my knowledge of the HTTP specification and on my experience in web development, but I have never tried to serve images that way myself. Serving public images with a long cache lifetime directly from S3 increases page speed; I believe the situation won't change if you do it through redirects.
And you should keep in mind that streaming images through your server will bring all the benefits of Amazon CloudFront to naught. But as long as you're serving content directly from S3, both options will work fine.
Thus, there are two cases when using signed URLs should speed up your page:
If you have a lot of images on a single page.
If you serve images using CloudFront.
If you have only a few images on each page and serve them directly from S3, you probably won't see any difference at all.
Important Update
I ran some tests and found that I was wrong about caching. It's true that browsers cache the images they were redirected to. But a cached image is associated with the URL it was redirected to, not with the original one. So when the browser loads the page a second time, it requests the image from the server again instead of fetching it from the cache. Of course, if the server responds with the same redirect URL it responded with the first time, the browser will use its cache, but that's not the case with signed URLs.
I found that forcing the browser to cache the signed URL as well as the data it receives solves the problem. But I don't like the idea of caching an invalid redirect URL. I mean, if the browser somehow misses the image, it will try to request it again using the invalid signed URL from the cache. So I don't think it's an option.
And it doesn't matter whether CloudFront serves images faster or whether browsers limit the number of parallel downloads per hostname: the advantage of using the browser cache outweighs all the disadvantages of piping images through your server.
It also looks like most social networks solve the problem of private images by hiding their actual URLs behind private proxies. So they store all their content on public servers, but there is no way to get the URL of a private image without authorization. Of course, if you open a private image in a new tab and send the URL to your friend, he'll be able to see the image too. So if that's not an option for you, then it's best to use Jonathan Ong's solution.
I would be concerned about using the CloudFront option if the photos really do need to remain private. It seems like you'll have a lot more flexibility administering your own security policy. I think the NGINX setup may be more complex than necessary. Express should give you very good performance working as a remote proxy, where it uses request to fetch items from S3 and streams them through to authorized users. I would highly recommend taking a look at Asset Rack, which uses hash signatures to enable permanent caching in the browser. You won't be able to use the default racks, because you need to calculate the MD5 of each file (perhaps on upload?), which you can't do while it's streaming. But depending on your application, it could save you a lot of effort for browsers never to need to refetch the images.
Regarding your second option, you should be able to set Cache-Control headers directly in S3.
Regarding your first option: have you considered securing your images a different way?
When you store an image in S3, couldn't you use a hashed and randomised filename? It would be quite straightforward to make the filename difficult to guess, and this way you'll have no performance issues viewing the images.
This is the technique Facebook uses. You can still view an image when you're logged out, as long as you know the URL.
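For example, a minimal sketch of generating an effectively unguessable key with Node's crypto module before uploading (the 'images/' prefix is just an illustration):

var crypto = require('crypto')
var path = require('path')

// Generate an unguessable key for the uploaded image, keeping its extension.
function randomKey (originalName) {
  var token = crypto.randomBytes(16).toString('hex') // 128 bits of randomness
  return 'images/' + token + path.extname(originalName)
}

console.log(randomKey('holiday.jpg'))
// e.g. images/3b9c2f4e1d0a87c6b5e4f3a2d1c0b9a8.jpg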
If I set the content expiration for static files to something like 14 days and I decide to update some files later on, will IIS know to serve the updated files, or will the client have to wait until the expiration date?
Or is it the other way around, where the browser requests a new file if the modified date is different?
Sometimes I update a file on the server and I have to do a hard refresh (Ctrl+F5) to see the difference. Currently I have it set to expire after 1 day.
The web browser, and any intermediate proxies, are allowed to cache the page until its expiration date. This means that IIS might not even be aware of the client viewing the page.
You want ETags
An ETag is an opaque identifier assigned by a web server to a specific version of a resource found at a URL. If the resource content at that URL ever changes, a new and different ETag is assigned. Used in this manner ETags are similar to fingerprints, and they can be quickly compared to determine if two versions of a resource are the same or not. [...]
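To illustrate the mechanics, here is a small sketch of a conditional request in Node (the host, path and ETag value are placeholders): the client sends the ETag it already has in If-None-Match, and the server replies 304 Not Modified with an empty body if the resource hasn't changed, so only updated files are re-downloaded.

const https = require('https');

// Conditional GET: present the ETag we already have; a 304 means our
// cached copy is still current, anything else carries the new content.
const options = {
  host: 'www.example.com',                 // placeholder host
  path: '/static/site.css',                // placeholder path
  headers: { 'If-None-Match': '"abc123"' } // ETag from a previous response
};

https.get(options, (res) => {
  if (res.statusCode === 304) {
    console.log('Not modified - use the cached copy');
  } else {
    console.log('Changed - new ETag:', res.headers.etag);
  }
  res.resume(); // drain the response body
});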