Is there a way to manually purge API responses from AWS CloudFront? - amazon-cloudfront

Let's say I'm caching responses via the Cache-Control header and I don't want to wait for max-age to expire, because I'm churning my API and want to clear the cache manually. Reading the documentation, I could only find a topic about invalidating files, but nothing about API response caching. Perhaps what I'm trying to do is impossible, other than resetting the entire CloudFront distribution?
Would this command clear API responses:
aws cloudfront create-invalidation --distribution-id=YOUR_DISTRIBUTION_ID --paths "/*"

Related

Azure Functions Proxy seems to be caching responses

We have an interesting behaviour that we are trying to understand/workaround.
We have an Azure Function running on a consumption host that has nothing but proxies on it.
One of these proxies points to an Azure CDN endpoint, that in turn points to an Azure Storage Account.
From time to time we update the file in the storage account and purge the CDN endpoint.
Requesting the file directly from the CDN returns the (correct) new file data.
However, the Function Proxy url continues to return the (incorrect) old file data.
Browser caches are disabled/cleared and all that normal stuff.
We can see the Last-Modified headers are different, so clearly the proxy is returning the old file.
Further, adding any query string to the proxy URL (we used ?v=1) returns the (correct) new file. Removing the query string gets us back to the old file again.
Is this behaviour intentional? I have read UserVoice requests where people wanted caching added to Functions and it was explicitly declined due to the number of alternatives available. I see no other explanation for this behaviour, though.
Does anyone know how to disable/defeat proxy response caching?

Refresh every 5 seconds - how to cache s3 files?

I store image files of my user model on S3. My frontend fetches new data from the backend (Node.js) every 5 seconds. In each of those fetches, all users are retrieved, which involves getting the image file from S3. Once the application scales, this results in a huge number of requests to S3 and high costs, so I guess caching the files on the backend makes sense, since they rarely change once uploaded.
How would I do it? Cache the files on the server's local file system once downloaded from S3, and only download them again if a new upload happened? Or is there a better mechanism for this?
Alternatively, when I set the cache header on the S3 files, are they still fetched every time I call s3.getObject, or does that already achieve what I'm trying to do?
You were right about the cost, which CloudFront would not improve; my earlier comment was misleading.
Back to your problem: you can have the files in the S3 bucket cached by adding the appropriate metadata to them.
For example:
Cache-Control: max-age=172800
You can do that in the console, or through the AWS CLI, for instance.
If you request the files directly and they carry this header, the browser should revalidate them against the ETag:
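If you would rather do it programmatically, one rough sketch with the AWS SDK for Node.js (v2) is to copy the object onto itself while replacing its metadata; the bucket and key names here are placeholders:
const AWS = require('aws-sdk');
const s3 = new AWS.S3();

s3.copyObject({
  Bucket: 'my-bucket',                        // placeholder bucket
  CopySource: 'my-bucket/avatars/user1.png',  // placeholder source object
  Key: 'avatars/user1.png',                   // same key: copy the object onto itself
  MetadataDirective: 'REPLACE',               // required so the new metadata is applied
  CacheControl: 'max-age=172800',
  ContentType: 'image/png'                    // set explicitly, since REPLACE drops the old value
}).promise()
  .then(() => console.log('Cache-Control metadata updated'))
  .catch(console.error);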
Validating cached responses with ETags
TL;DR: The server uses the ETag HTTP header to communicate a validation token. The validation token enables efficient resource update checks: no data is transferred if the resource has not changed.
If you request the files with the s3.getObject method, it will make the request anyway, so it will download the file again.
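So if you stay with s3.getObject on the backend, you would need to add a cache layer yourself, along the lines of the idea in the question; a minimal in-memory sketch (bucket, key, and TTL values are illustrative):
const AWS = require('aws-sdk');
const s3 = new AWS.S3();

const imageCache = new Map();       // key -> { body, expiresAt }
const TTL_MS = 60 * 60 * 1000;      // illustrative: hit S3 at most once per hour per key

async function getImage(bucket, key) {
  const cached = imageCache.get(key);
  if (cached && cached.expiresAt > Date.now()) {
    return cached.body;             // served from memory, no S3 request
  }
  const result = await s3.getObject({ Bucket: bucket, Key: key }).promise();
  imageCache.set(key, { body: result.Body, expiresAt: Date.now() + TTL_MS });
  return result.Body;
}

// On a new upload you would drop the stale entry, e.g. imageCache.delete(key).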
Pushing instead of requesting:
If you can't do this, you might want to have the backend push only new data to the frontend, instead of the frontend requesting new data every 5 seconds, which would make the load significantly lower.
---
Not so cost effective, more speed focused.
You could use CloudFront as a CDN for your S3 bucket. This will let you get the files faster, and CloudFront will also handle the caching for you.
You would need to set up the TTL according to your needs; you can also invalidate the cache every time you upload a file, if you need to.
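For instance, if you decide to invalidate on every upload, a rough sketch with the AWS SDK for Node.js could look like this; the distribution ID and path are placeholders:
const AWS = require('aws-sdk');
const cloudfront = new AWS.CloudFront();

async function invalidateAfterUpload(objectKey) {
  await cloudfront.createInvalidation({
    DistributionId: 'YOUR_DISTRIBUTION_ID',      // placeholder
    InvalidationBatch: {
      CallerReference: `upload-${Date.now()}`,   // must be unique per invalidation request
      Paths: { Quantity: 1, Items: ['/' + objectKey] }
    }
  }).promise();
}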
From the docs:
Storing your static content with S3 provides a lot of advantages. But to help optimize your application’s performance and security while effectively managing cost, we recommend that you also set up Amazon CloudFront to work with your S3 bucket to serve and protect the content. CloudFront is a content delivery network (CDN) service that delivers static and dynamic web content, video streams, and APIs around the world, securely and at scale. By design, delivering data out of CloudFront can be more cost effective than delivering it from S3 directly to your users.

Google App Engine standard doesn't compress my Next.js/Express app

I'm trying to figure out what to do to make Google App Engine (standard version) apply compression to the output of my Next.js/Node.js/Express application.
As far as I've gathered, the problem is that
1) Google's load balancer strips from the request all headers indicating that the client supports compression, and thus app.use(compression()) in server.js won't do anything. I've tried to force compression using a {filter: shouldCompress} function, but it doesn't seem to matter, since Google's front end still returns an uncompressed result. (Locally, compression works fine.) A sketch of this attempt follows the list.
2) How and when Google's load balancer chooses to apply compression is a mystery to me. (And particularly, why not to my silly but large application/javascript content :))
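For reference, the forced-compression attempt from point 1 looks roughly like this (a sketch; shouldCompress is just the filter name mentioned above):
const express = require('express');
const compression = require('compression');
const app = express();

// Returning true from the filter only removes one veto; the compression
// middleware still negotiates the encoding from the request's Accept-Encoding
// header, so if the load balancer strips that header, nothing gets gzipped.
function shouldCompress(req, res) {
  return true;
}

app.use(compression({ filter: shouldCompress }));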
Here's what they say in the docs:
If the client sends HTTP headers with the original request indicating
that the client can accept compressed (gzipped) content, App Engine
compresses the handler response data automatically and attaches the
appropriate response headers. It uses both the Accept-Encoding and
User-Agent request headers to determine if the client can reliably
receive compressed responses.
How Requests are Handled: Response Compression
So there's that. I'd love to use App Engine for this project but when index.js is 700KB instead of a compressed 200KB, it's kind of a showstopper.
As per the Request Headers and Responses documentation for Node.js, the Accept-Encoding header is removed from the request for security purposes.
Note: Entity headers (headers relating to the request body) are not sanitized or checked, so applications should not rely on them. In particular, the Content-MD5 request header is sent unmodified to the application, so may not match the MD5 hash of the content. Also, the Content-Encoding request header is not checked by the server, so if the client sends a gzipped request body, it will be sent in compressed form to the application.
Also note the response on Google Group which states:
Today, we are not passing through the Accept-Encoding header, so it is not possible for your middleware to decide that it should compress.
We will roll out a fix for this in the next few weeks.

Modify a CloudFront request before logging?

I'm building an ELK stack (for the first time) to track end-user REST API usage for a CloudFront distribution (in front of an S3 origin). Users pass a refresh token as part of their request and I was hoping to use this token to identify which users were making which request. Unfortunately, it looks like CloudFront access logs are missing some header information (particularly Authorization/Accept in my use case). This leaves me with three questions:
Is there a way to tell CloudFront to log additional items? It appears the answer is no.
As an alternative strategy, I tried modifying the request object with Lambda@Edge (in a Viewer Request function) to move the header information into the query string (so that it would get logged), but any manipulation in Lambda@Edge does not seem to be reflected in the log (though it is reflected in the Origin Request function). Should this be possible?
If doing what I want is impossible, I think the alternative approach is to forgo CloudFront logs completely and just fire an HTTP request to Logstash with every user request, but I feel like that could easily get overloaded.
Thanks
After a few days of research and reaching out to Amazon, I was finally able to answer my own questions:
CloudFront logs can't be customized, they are what they are.
See 1.
It turns out that customization is the wrong approach. What I really need to do is aggregate two separate logs that have the information I need into a single Logstash entry. The Viewer Response Lambda@Edge function receives a requestId property (specifically event.Records[0].cf.config.requestId) which matches the CloudFront log's x-edge-request-id column. So while I haven't finished implementing it yet, these two columns can be used in the Logstash config for aggregation. I just need to make sure I set up a Viewer Response event that logs out a consistent format that I can then parse with Logstash. I'm using the logstash-input-cloudwatch_logs plugin to retrieve the CloudWatch logs.
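The Viewer Response function I have in mind is roughly the following sketch; the fields logged here are illustrative, and whether a given request header is still visible on the viewer-response event is something to verify for your own setup:
'use strict';

// Viewer Response Lambda@Edge sketch: emit one JSON log line per request that
// can later be joined with the CloudFront access log via x-edge-request-id.
exports.handler = async (event) => {
  const cf = event.Records[0].cf;
  const headers = cf.request.headers || {};

  console.log(JSON.stringify({
    requestId: cf.config.requestId,       // matches the x-edge-request-id column in the access log
    uri: cf.request.uri,
    authorization: headers.authorization  // assumption: the header is present at this stage
      ? headers.authorization[0].value
      : undefined
  }));

  return cf.response;                     // pass the response through unchanged
};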

AWS backend + react front end : handling request

I'm messing around with AWS and I've set up a simple REST API using dynamodb, api gateway, and cognito. I've written the REST API using node + express.
My node app is on EB, and basically I handle authentication of requests in API gateway using cognito. As a standalone, this seems to be working fine as I'm testing it using a simple react app.
Now I'm doing server side rendering for my actual react app, so I'm trying to figure out the best way to handle this. For the server side rendering I have another node app called react-app-server, and I want to handle caching on the API gateway and use cloudfront for serving the static doc, images, etc.
So if I went to www.mysite.com/for-sale/some-item-thats-for-sale, this request should first check if there is a cached version of this page and serve it. Otherwise, I need to have my react-app-server render the .html and serve/cache it. Since there are two node applications, api-server and react-app-server, how can I point from my api-gateway to the react-app-server to render the html?
How does this scenario fit in with the AWS architecture? I realize this might be a really stupid question but I am really new to this. Thanks
I would recommend that you place Cloudfront in front of all your apps and allow Cloudfront to handle ALL of your caching using Cache-Control or an Expires header that you return on each HTML response. This will allow all the cached content to be returned from the Cloudfront edge servers improving performance and simplify your app a bit as well.
For example, if your node app returns an HTML document with a Cache-Control: public, max-age=31536000 header, CloudFront will read that value and return the same HTML response from the edge servers for up to 1 year (31,536,000 seconds = 1 year). If your node app returns an HTML document with a Cache-Control: public, max-age=3600 header, CloudFront will read that value and cache the HTML for up to 1 hour at the edge servers.
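In an Express app this is just a matter of setting the header on each HTML response; a minimal sketch (renderPage is a hypothetical server-side rendering helper, and the route and max-age are illustrative):
const express = require('express');
const app = express();

app.get('/for-sale/:slug', async (req, res) => {
  const html = await renderPage(req.params.slug);    // hypothetical SSR step
  res.set('Cache-Control', 'public, max-age=3600');  // CloudFront caches this response for up to 1 hour
  res.send(html);
});

app.listen(3000);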
You can have two (or more) origins defined in your distribution and you can use behaviors to control which origin each request would delegate to.
I actually just wrote a tutorial this AM on using Cloudfront in front of API Gateway: https://www.codeengine.com/articles/process-form-aws-api-gateway-lambda/. It's not identical to your use case but will help you get started if you're looking to use Cloudfront in front of API Gateway.
If you follow the tutorial a bit, you can see that I'm serving most requests from an S3 bucket but routing paths starting with /rest/ to API Gateway, which I believe would work for your use case as well.
CloudFront is a content delivery network which aims to minimize latency and transfer times for visitors from all parts of the world by placing copies of your files in edge locations. It has caching capabilities, so with the right setup you should be able to make it retrieve the pre-rendered content from your react-prerenderer and cache it.
API Gateway is built to serve dynamic content and only runs in the main AWS regions, not at edge locations. Routing a request via API Gateway to CloudFront would be strange, if at all possible.
Another possibility is to handle caching in your application (e.g. in Express you could use mcache).
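A minimal sketch of that approach, assuming the memory-cache npm package (commonly imported as mcache); the route and cache duration are illustrative:
const express = require('express');
const mcache = require('memory-cache');
const app = express();

// Cache middleware: serve a stored response body if present, otherwise
// wrap res.send so the rendered body is kept for durationSeconds.
function cache(durationSeconds) {
  return (req, res, next) => {
    const key = '__express__' + req.originalUrl;
    const cachedBody = mcache.get(key);
    if (cachedBody) {
      return res.send(cachedBody);
    }
    const originalSend = res.send.bind(res);
    res.send = (body) => {
      mcache.put(key, body, durationSeconds * 1000);
      return originalSend(body);
    };
    next();
  };
}

app.get('/for-sale/:slug', cache(300), (req, res) => {
  res.send('rendered page for ' + req.params.slug);  // placeholder for the real render step
});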
