Bypass CloudFront cache - amazon-cloudfront

Is there a way for certain users to force a bypass of CloudFront's cache so that they can always request new versions of files from the source, but the new files that they request are then available on the cache to other normal-permission users?
For example, suppose I had a sports score app that cached score information for 5 minutes, but wanted to add a premium tier that allows subscribers to download up-to-date data whenever they use the app. I'd still want the data that the premium users download to be cached on CloudFront so that regular users don't trigger unnecessary API calls to the origin servers.
I was thinking maybe there would be a way to do this in the request headers, but I couldn't find any documentation on it.

Azure CDN high max latency

We are experimenting with using Blazor WebAssembly with Angular. It works nicely, but Blazor requires a lot of DLLs to be loaded, so we decided to store them in Azure Blob Storage and serve them through Microsoft's CDN on Azure.
When we check average latency as users start working, it shows values between 200-400 ms. But maximum latency values jump to 5-6 minutes.
This happens for our usual workload of 1k-2k users over the course of 1 hour. If they don't yet have the Blazor files cached locally, that can be over 60 files per user requested from the CDN.
My question is whether this is expected behaviour, or whether we have some bad configuration somewhere.
I mention Blazor WebAssembly just in case; I'm not sure whether the problem is specific to the way these files are loaded, or simply due to the large number of fetched files.
Thanks in advance for any advice.
I did check whether the files are served from cache, and from the response headers it seems so: x-cache: TCP_HIT. The byte hit ratio from the CDN profile also seems fine: mostly 100%, and it never falls under 65%.

Azure cdn Ignore query strings purpose

I know the difference between the Azure CDN query string modes and I have read a helpful example of query string modes, but...
I don't understand the purpose of "Ignore query strings" or how it can be useful on a real dynamic website.
For example, suppose we have a product purchase website with a URL similar to www.myweb.com/products?id=3
If we use "Ignore query strings"... does this mean that if a user later requests product 4 (www.myweb.com/products?id=4), he will receive the page for product 3?
I think I'm not understanding Azure CDN correctly. I'm seeing Azure CDN as a dynamic-content CDN; however, Azure CDN is only used for static content, as this article explains:
Standard content delivery network (CDN) capability includes the ability to cache files closer to end users to speed up delivery of static files.
Is this correct? Any help or example on the subject is welcome.
Yes. If you have selected the Ignore query strings query string caching behavior (the default), then in your case, after the initial request for www.myweb.com/products?id=3, that POP server will serve the same cached content for subsequent requests regardless of the query string value, until its cache period expires.
And for the second question: a CDN is all about serving static files. To my understanding, what the article says is about dynamic site acceleration, which is a set of techniques to optimize the serving performance of dynamic websites. Unlike static websites, a dynamic website's assets (static files, e.g. images, JS, CSS, HTML) are loaded dynamically based on user behavior.
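To make the behavior concrete, here is a small sketch of the idea behind the two modes. This is not Azure's actual implementation, just an illustration of how a POP could derive the cache key from the request URL under each setting:

```javascript
// Illustration only: how a CDN POP might compute the cache key for a
// request URL under the two query string caching behaviors.
function cacheKey(url, mode) {
  const { origin, pathname, search } = new URL(url);
  // "Ignore query strings": the query is stripped, so every variant
  // of the same path maps to one shared cached entry.
  if (mode === "ignore-query-strings") return origin + pathname;
  // "Cache every unique URL": the full URL (path + query) is the key,
  // so ?id=3 and ?id=4 are cached as separate entries.
  return origin + pathname + search;
}
```

Under "ignore-query-strings", requests for `?id=3` and `?id=4` produce the same key and therefore hit the same cached object, which is exactly why this mode only makes sense for assets that do not vary with the query string.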
Now that I have it clearer, I will answer my question:
Azure CDN - Used to cache static content, even on dynamic web pages.
For the example in the question, all products must download the same JavaScript and CSS content; Azure CDN is used for those types of files. Real example using "Ignore query strings":
User A requests www.myweb.com/products?id=3: jquery-versionX.js and mystyles.css are not yet cached, so they are requested from the origin server and delivered to the user.
User B requests www.myweb.com/products?id=4: since we are using "Ignore query strings", the jquery-versionX.js and mystyles.css files are now cached, so they are served to the user without requesting them from the server again.
User C requests www.myweb.com/products?id=3: likewise, the cached jquery-versionX.js and mystyles.css files are served without another request to the server.
Redis or similar - Used to cache dynamic content (database query results, for example).
For the example in the question, all the products have different information, which is obtained through a database query. We can store those query results or JSON objects in a Redis cache. Real example:
User A requests www.myweb.com/products?id=3: product 3 is not cached, so it is requested from the server and received by the user.
User B requests www.myweb.com/products?id=4: product 4 is not cached, so it is requested from the server and received by the user.
User C requests www.myweb.com/products?id=3: product 3 is now cached, so the server is not contacted and the user receives it from the cache.
Summary:
Both methods can be used simultaneously: Azure CDN for static content and Redis (or similar) for dynamic content.
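A minimal cache-aside sketch of the Redis part of this setup. A plain in-memory Map stands in for a Redis client here, and getProductFromDb is a hypothetical database lookup, not a real API:

```javascript
// Cache-aside pattern: check the cache first, fall back to the
// database on a miss, then populate the cache for later requests.
const cache = new Map(); // stand-in for a Redis client
let dbQueries = 0; // counts real database hits, for illustration

async function getProductFromDb(id) {
  // hypothetical database query
  dbQueries++;
  return { id, name: `Product ${id}` };
}

async function getProduct(id) {
  const key = `product:${id}`;
  if (cache.has(key)) return cache.get(key); // cache hit: no DB query
  const product = await getProductFromDb(id); // cache miss: query DB
  cache.set(key, product); // with real Redis you would also set a TTL
  return product;
}
```

Requests for products 3, 4, and then 3 again would issue only two database queries, matching the User A/B/C example above.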

How to improve backend performance when data is fetched from multiple APIs in a sequential manner?

I am creating a Node.js app that consumes APIs from multiple servers in a sequential manner, as each request depends on the results of previous requests.
For instance, user registration is done on our platform in a PostgreSQL database. User feeds, chats, and posts are stored on getStream servers. User roles and permissions are managed through a CMS. If on a page we want to display a list of a user's followers with some buttons according to the user's permissions, then first I need to fetch the list of my current user's followers from getStream, then enrich them from my PostgreSQL DB, then fetch their permissions from the CMS. Since each request has to wait for the previous one, it takes a long time to respond.
I need to serve all that data in a certain format. I have used Promise.all() where requests did not depend on each other.
I thought of storing pre-processed data that is ready to be served, but I am not sure how to do that. What is the best way to solve this problem?
sequential manner as the next request depends on results from previous requests
You could try using async/await so that each request runs in a sequential manner.
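One place where Promise.all() can still help, even with a sequential pipeline: the three services must be called in order, but the per-follower work inside each stage can fan out in parallel. A sketch under that assumption; all three service functions below are hypothetical stand-ins for the real getStream, PostgreSQL, and CMS calls:

```javascript
// Hypothetical stand-ins for the three services in the question.
async function getFollowersFromGetStream(userId) {
  return [`${userId}-f1`, `${userId}-f2`];
}
async function enrichFromPostgres(followerId) {
  return { id: followerId, name: `Name of ${followerId}` };
}
async function getPermissionsFromCms(followerId) {
  return { id: followerId, canMessage: true };
}

async function getFollowerPage(userId) {
  // Step 1 must finish first: it is a true data dependency.
  const followerIds = await getFollowersFromGetStream(userId);

  // Steps 2 and 3 both depend only on the follower IDs, so they can
  // run in parallel with each other, and each fans out per follower.
  const [profiles, permissions] = await Promise.all([
    Promise.all(followerIds.map(enrichFromPostgres)),
    Promise.all(followerIds.map(getPermissionsFromCms)),
  ]);

  // Merge into the shape the page needs.
  return followerIds.map((_, i) => ({ ...profiles[i], ...permissions[i] }));
}
```

With latencies t1, t2, t3 per stage, this cuts the total from roughly t1 + t2 + t3 down to t1 + max(t2, t3), since only the getStream call is a hard prerequisite for the other two.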

Refresh every 5 seconds - how to cache S3 files?

I store image files for my user model on S3. My frontend fetches new data from the backend (Node.js) every 5 seconds. Each of those fetches retrieves all users, which involves getting the image file from S3. Once the application scales, this results in a huge number of requests to S3 and high costs, so I figure caching the files on the backend makes sense, since they rarely change once uploaded.
How would I do it? Cache each file on the server's local file system once downloaded from S3, and only download it again if a new upload happened? Or is there a better mechanism for this?
Alternatively, if I set the cache header on the S3 files, are they still fetched every time I call s3.getObject, or does that already achieve what I'm trying to do?
You were right about the cost, which CloudFront would not improve; my earlier suggestion was misleading.
Back to your problem: you can have the files in the S3 bucket cached by adding the appropriate metadata to them.
For example:
Cache-Control: max-age=172800
You can do that in the console, or through the AWS CLI, for instance.
If you request the files directly and they have these headers, the browser should do a conditional check on the ETag:
Validating cached responses with ETags. TL;DR: The server uses the ETag HTTP header to communicate a validation token. The validation token enables efficient resource update checks: no data is transferred if the resource has not changed.
If you request the files with the s3.getObject method, the request is made anyway, so the file is downloaded again.
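If you do need to go through s3.getObject on the backend, a small in-memory TTL cache in front of it avoids repeated downloads. A sketch of that idea; fetchFromS3 below is a hypothetical stand-in for the real SDK call:

```javascript
// Simple in-memory TTL cache in front of an S3 fetch.
// fetchFromS3 is a hypothetical stand-in for s3.getObject.
let s3Calls = 0;
async function fetchFromS3(key) {
  s3Calls++;
  return Buffer.from(`contents of ${key}`);
}

const ttlMs = 60 * 1000; // consider a file stale after one minute
const cached = new Map(); // key -> { body, expires }

async function getObjectCached(key, now = Date.now()) {
  const entry = cached.get(key);
  if (entry && entry.expires > now) return entry.body; // fresh hit
  const body = await fetchFromS3(key); // miss or expired: download
  cached.set(key, { body, expires: now + ttlMs });
  return body;
}
```

Since the files rarely change once uploaded, you could also skip the TTL entirely and instead call cached.delete(key) from your upload handler, so a file is re-downloaded only after it has actually been replaced.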
Pushing not requesting:
If you can't do this, you might want to think about the backend pushing only new data to the frontend, instead of the frontend requesting new data every 5 seconds, which would make the load significantly lower.
---
Not so cost-effective, more speed-focused:
You could use CloudFront as a CDN for your S3 bucket. This will let you get the files faster, and CloudFront will also handle the cache for you.
You would need to set up the TTL according to your needs; you can also invalidate the cache every time you upload a file, if you need to.
From the docs:
Storing your static content with S3 provides a lot of advantages. But to help optimize your application’s performance and security while effectively managing cost, we recommend that you also set up Amazon CloudFront to work with your S3 bucket to serve and protect the content. CloudFront is a content delivery network (CDN) service that delivers static and dynamic web content, video streams, and APIs around the world, securely and at scale. By design, delivering data out of CloudFront can be more cost effective than delivering it from S3 directly to your users.

Hard-Coding Categories or Fetching from API

What is the recommended method of getting category IDs? I understand Foursquare provides this list: https://developer.foursquare.com/categorytree. My question is: should I just use this list and hard-code the values, or fetch the IDs on first opening of the app and cache the results?
From the venues/categories API documentation:
When designing client applications, please download this list only once per session, but also avoid caching this data for longer than a week to avoid stale information.
So fetch on app launch and cache for the current session to ensure the hierarchy is always up-to-date.
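A sketch of "fetch once per session": memoize the in-flight promise so that concurrent callers share a single request and every later caller reuses the result. fetchCategoryTree is a hypothetical stand-in for the real venues/categories request, with placeholder data:

```javascript
// Fetch the category tree at most once per app session by caching
// the promise itself. fetchCategoryTree stands in for the real
// Foursquare venues/categories call.
let apiCalls = 0;
async function fetchCategoryTree() {
  apiCalls++;
  return [{ id: "cat-1", name: "Example Category" }]; // placeholder data
}

let categoriesPromise = null;
function getCategories() {
  // The first caller kicks off the request; everyone else reuses it,
  // including callers that arrive while the request is still in flight.
  if (!categoriesPromise) categoriesPromise = fetchCategoryTree();
  return categoriesPromise;
}
```

Resetting categoriesPromise to null at the start of a new session would trigger a fresh fetch, which keeps the data within the documented one-week staleness limit.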
