Is the first fetch of any given file from an origin via Cloudfront faster on average than fetching directly from the origin over the internet? I'm wondering if the AWS backbone somehow outperforms the speed of the public internet.
Eg if a user from Sydney wants a file from my S3 in Europe, and Cloudfront doesn't yet have it cached, is it quicker to get it directly over the internet, or for Cloudfront to fetch it from the European origin to the Sydney edge cache and to the internet for the last few hops? But that's just an example. Users will be worldwide, and many will be in Europe, close to the origin.
I do understand that AFTER that request to origin the CDN will cache the file and subsequent requests from Sydney for that same file within the file's TTL will be much faster, but subsequent requests will not happen often in my use case...
I have a large collection of small files (<1MB) on S3 which seldom change, and each of them individually is seldom downloaded and will have a TTL of about 1 week.
I'm curious if putting Cloudfront in front of S3, in this case, will be worth it even though I won't get much value from the edge caching service that the CDN provides.
So should I expect to see any latency decrease on average for those first fetch scenarios?
EDIT: I subsequently found this article which mentions "Persistent Connections... reduces overall latency...", but I suspect it just means better performance of the Cloudfront-to-origin subsystem, and not necessarily better end-to-end perf for the user.
I'm wondering if the AWS backbone somehow outperforms the speed of the public internet.
The idea is that it should.
You should see an overall improvement, because CloudFront does several useful things, even when not caching:
brings the traffic onto the AWS managed network as close to the viewer as practical, with the traffic traversing most of its distance on the AWS network rather than on the public Internet.
sectionalizes the TCP interactions between the browser and the origin by creating two TCP connections¹, one from browser to CloudFront, and one from CloudFront to origin. The back-and-forth messaging that occurs for connection setup, then TLS negotiation, then HTTP request/response, is optimized (a quick way to measure the end-to-end difference for yourself is sketched after this list).
(optional) provides http/2 to HTTP/1.1 gateway/translation, allowing the browser to make concurrent requests over a single http/2 connection while converting these to multiple HTTP/1.1 requests on separate connections to the origin.
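One way to settle the first-fetch question for your own viewers and origin region is simply to measure it. Below is a minimal sketch (Node/TypeScript; the bucket and distribution hostnames and the object key are placeholders, not real endpoints) that times the first response byte of an object fetched straight from S3 versus through CloudFront; use a different object key per run so the CloudFront number reflects a cache miss:

```typescript
import * as https from "node:https";

// Placeholders -- substitute your own bucket, distribution, and object key.
const DIRECT_URL = "https://my-bucket.s3.eu-west-1.amazonaws.com/some-object";
const CDN_URL = "https://d111111abcdef8.cloudfront.net/some-object";

// Time from starting the request until the first response byte arrives (TTFB).
function ttfb(url: string): Promise<number> {
  return new Promise((resolve, reject) => {
    const start = process.hrtime.bigint();
    https.get(url, (res) => {
      res.once("data", () => {
        res.destroy(); // only latency matters here, not the body
        resolve(Number(process.hrtime.bigint() - start) / 1e6); // ms
      });
    }).on("error", reject);
  });
}

(async () => {
  console.log("direct from S3     :", (await ttfb(DIRECT_URL)).toFixed(1), "ms");
  console.log("via CloudFront miss:", (await ttfb(CDN_URL)).toFixed(1), "ms");
})();
```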
There are some minor arbitrage opportunities in the discrepancies between costs for traffic leaving a region bound for the Internet and traffic leaving a CloudFront edge bound for the Internet. (Traffic outbound from EC2/S3 to CloudFront is not billable.) In many cases, these work in your favor, such as a viewer in a low-cost area accessing a bucket in a high-cost area, but they are almost always asymmetric. A London viewer and a Sydney bucket is $0.14/GB accessing the bucket directly, but $0.085/GB accessing the same bucket through CloudFront. On the flip side, a Sydney viewer accessing a London bucket is $0.09/GB direct to the bucket, $0.14/GB through CloudFront. London viewer/London bucket is $0.085/GB through CloudFront or $0.09/GB direct to the bucket. It is my long-term assumption that these discrepancies represent the cost of Internet access compared to the cost of AWS's private transport. You can also configure CloudFront, via the price class feature, to use only the lower-cost edges; this is not guaranteed to actually use only the lower-cost edges for traffic, but rather guaranteed not to charge you a higher price if a lower-cost edge is not used.
Note also that there are two (known) services that use CloudFront with caching always disabled:
Enabling S3 Transfer Acceleration on a bucket is fundamentally a zero-config-required CloudFront distribution without the cache enabled. Transfer Acceleration has three notable differences compared to a self-provisioned CloudFront + S3 arrangement: it can pass through signed URLs that S3 itself understands and accepts (with S3 plus your own CloudFront, you have to use CloudFront signed URLs, which use a different algorithm); the CloudFront network is bypassed for users who are geographically close to the bucket region, which also eliminates the Transfer Acceleration surcharge for those requests; and it almost always costs more than your own CloudFront + S3.
AWS apparently believes the value added here is significant enough to justify the feature costing more than setting up S3 + CloudFront yourself. On occasion, I have used it to squeeze a bit more optimization out of a direct-to-bucket arrangement, because it is an easy change to make.
Find the Transfer Acceleration speed test on this page and observe what it does. It tests upload rather than download, but it is the same idea -- it gives you a reasonable depiction of the differences between the public Internet and the AWS "Edge Network" (the CloudFront infrastructure).
API Gateway edge-optimized APIs also route through CloudFront for performance reasons. While API Gateway does offer optional caching, it uses a caching instance, not the CloudFront cache. API Gateway subsequently introduced a second type of API endpoint that doesn't use CloudFront, because when you are making requests from within the same AWS region, it doesn't make sense to send the request through extra hardware. This also makes deploying API Gateway behind your own CloudFront a bit more sensible, avoiding an unnecessary second pass through the same infrastructure.
¹two TCP connections may actually be three, which should tend to further improve performance because the boundary between each connection provides a content buffer that allows for smoother and faster transport and changes the bandwidth-delay product in favorable ways. Since some time in 2016, CloudFront has two tiers of edge locations, the outer "global" edges (closest to the viewer) and the inner "regional" edges (within the actual AWS regions). This is documented but the documentation is very high-level and doesn't explain the underpinnings thoroughly. Anecdotal observations suggest that each global edge has an assigned "home" regional edge that is the regional edge in its nearest AWS region. The connection goes from viewer, to outer edge, to the inner edge, and then to the origin. The documentation suggests that there are cases where the inner (regional) edge is bypassed, but observations suggest that these are the exception.
I'm wondering how modern DNS servers deal with millions of queries per second, given that the txnid field is a uint16.
Let me explain. There is an intermediate server: on one side, clients send DNS requests to it, and on the other side, the server itself sends requests to an upstream DNS server (8.8.8.8, for example). According to the DNS protocol there is a txnid field in the DNS header, which must be unchanged between request and response. Obviously, an intermediate DNS server with multiple clients replaces this value with its own txnid (which is a counter), sends the request to the external DNS server, and after resolution replaces the value with the client's original one. All of this works fine for up to 65535 simultaneous requests because of the uint16 field type. But what if we have hundreds of millions of them, like Google's DNS servers?
Going from your Google DNS server example:
In mid-2018 their servers were handling 1.2 trillion queries per day; extrapolating that growth suggests their service is currently handling ~20 million queries per second
They say that successful resolution of a cache-miss takes ~130ms, but taking timeouts into account pushes the average time up to ~400ms
I can't find any numbers on what their cache-hit rates are like, but I'd assume it's more than 90%. And presumably it increases with the popularity of their service
Putting the above together (2e7 * 0.4 * (1 - 0.9)) we get roughly 800,000, call it ~1M, transactions active at any one time. So you have to find at least 20 bits of state somewhere. 16 bits come for free because of the txnid field. As Steffen points out, you can also use port numbers, which might give you another ~15 bits of state. Just these two sources give you more than enough state to run something orders of magnitude bigger than Google's DNS system.
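For concreteness, here is the same back-of-envelope arithmetic as a tiny sketch (TypeScript); all of the inputs are the rough estimates above, not measured values:

```typescript
const qps = 2e7;           // ~20 million queries per second
const avgResolveSec = 0.4; // ~400 ms average time for a cache-miss resolution
const cacheHitRate = 0.9;  // assumed

// Concurrent upstream transactions in flight at any moment: ~800,000.
const inflight = qps * avgResolveSec * (1 - cacheHitRate);

// Bits of state needed to tell them apart: ceil(log2(800,000)) = 20.
const bitsNeeded = Math.ceil(Math.log2(inflight));

// Bits available per upstream server: 16-bit txnid + ~15 usable bits of
// ephemeral source port, i.e. roughly 2 billion distinguishable queries.
const bitsAvailable = 16 + 15;

console.log({ inflight, bitsNeeded, bitsAvailable, capacity: 2 ** bitsAvailable });
```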
That said, you could also just relegate transaction IDs to preventing any cache-poisoning attacks, i.e. reject any answers where the txnid doesn't match the inflight query for that question. If this check passes, then add the answer to the cache and resume any waiting clients.
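A minimal sketch of that matching logic (TypeScript; the types and names are illustrative, not taken from any particular resolver):

```typescript
type Question = string; // e.g. "example.com./A/IN"

interface InflightQuery {
  txnid: number;      // the 16-bit ID placed in the outgoing upstream query
  question: Question;
  waiters: Array<(answer: Uint8Array) => void>; // clients blocked on this name
}

// Outstanding upstream queries, keyed by local source port, then by txnid.
const inflight = new Map<number, Map<number, InflightQuery>>();

function onUpstreamResponse(
  localPort: number,
  txnid: number,
  question: Question,
  answer: Uint8Array,
): void {
  const pending = inflight.get(localPort)?.get(txnid);
  if (!pending || pending.question !== question) {
    return; // txnid or question mismatch: possibly spoofed, drop it
  }
  inflight.get(localPort)!.delete(txnid);
  // Cache the answer (omitted here), then resume every waiting client.
  for (const resume of pending.waiters) resume(answer);
}
```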
Assume the Joker is a maximally sophisticated, well-equipped and malicious user of Batman's start-up batmanrules.com hosted by, say, AWS infrastructure. The business logic of batmanrules.com requires that unregistered users be able to send HTTP requests to the REST API layer of batman.com, which lead to the invocation (in one way or another) of queries against an AWS-based DB. Batman doesn't want to be constrained by DB type (it can be either SQL or NoSQL).
The Joker wants to ruin Batman financially by sending as many HTTP requests as he can in order to run up Batman's AWS bill. The Joker uses all the latest tricks in the book, using DDoS-like methods to send HTTP requests from different IP addresses that target all sorts of mechanisms within batman.com's business logic.
Main Question: how does Batman prevent financial ruin while keeping his service running smoothly for his normal users?
Assume a lot of traffic is going on; how can you weed out the 'malicious' queries from the non-malicious, especially when users aren't being registered? I know you can do rate-limiting against IP addresses, but can't the Joker (who is maximally sophisticated and well-equipped) find clever ways to issue requests from ever-changing IP addresses, and/or to tweak the requests so that no two are exactly the same?
Note: my question focuses not on denial of service -- let's assume it's ok if the site goes down for a while -- but, rather, on Batman's financial loss. Batman has done a great job on making the architecture scale up and down with varying load, his only concern is that high loads (induced by Joker's shenanigans) entail high cost.
My instinct tells me that there is no silver bullet here, and that Batman would have to build safeguards into his business logic (e.g. shut down if traffic spikes within certain parameters) AND/OR require reCAPTCHA tokens on all non-trivial requests submitted to the REST API.
You can use AWS WAF and configure rules to block malicious users.
For example, a straightforward rule would be rate-based blocking: if you determine that it is highly unlikely for a legitimate client to exceed X concurrent requests from the same IP address, block addresses that do.
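As a hedged illustration of that rate-based idea, here is a sketch using the AWS SDK for JavaScript v3 (@aws-sdk/client-wafv2); the web ACL name, the 2,000-requests-per-5-minutes limit, and the CLOUDFRONT scope are assumptions you would adapt:

```typescript
import { WAFV2Client, CreateWebACLCommand } from "@aws-sdk/client-wafv2";

async function main() {
  // CloudFront-scoped web ACLs are managed through us-east-1.
  const client = new WAFV2Client({ region: "us-east-1" });

  await client.send(new CreateWebACLCommand({
    Name: "batmanrules-basic-protection",  // placeholder name
    Scope: "CLOUDFRONT",                   // use "REGIONAL" for ALB / API Gateway
    DefaultAction: { Allow: {} },
    Rules: [{
      Name: "rate-limit-per-ip",
      Priority: 0,
      // Block any source IP that exceeds 2000 requests in a 5-minute window.
      Statement: { RateBasedStatement: { Limit: 2000, AggregateKeyType: "IP" } },
      Action: { Block: {} },
      VisibilityConfig: {
        SampledRequestsEnabled: true,
        CloudWatchMetricsEnabled: true,
        MetricName: "rateLimitPerIp",
      },
    }],
    VisibilityConfig: {
      SampledRequestsEnabled: true,
      CloudWatchMetricsEnabled: true,
      MetricName: "batmanrulesWebAcl",
    },
  }));
}

main().catch(console.error);
```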
For advanced use cases you can implement custom rules by analyzing the request logs with Lambda and applying the blocks in WAF.
In addition, as you correctly identified, it is not possible to prevent every malicious request. The goal should be to inspect and prevent on an ongoing basis, with the right architecture in place to block requests as the need arises.
The getstream.io documentation says that one should expect to retrieve a feed in approximately 60ms. When I retrieve my feeds they contain a field named 'duration' which I take to be the server-side processing time. This value is steadily around 10-40ms, with an average around 15ms.
The problem is, I seldom get my feeds in less than 150ms, and the average time is rather around 200-250ms, sometimes up to 300-400ms. This is the time for getting the feed alone, no enrichment etc., and I have verified with tcpdump that the network round trip is low (around 25ms) and that the time is actually spent waiting for the server to respond.
I've tried to move around my application (eu-west and eu-central) but that doesn't seem to affect things much (again, network roundtrip is steadily around 25ms).
My question is - should I really expect 60ms and continue investigating, or is 200-400ms normal? On the getstream.io site it is explained that developer accounts receive "Low Priority Processing" - what does this mean in practice? How much difference could I expect with another plan?
I'm using the Node.js low-level API.
Stream APIs use SSL to encrypt traffic. Unfortunately, SSL introduces additional network I/O. Usually you need to pay for the increased latency only once, because Stream's HTTP APIs support HTTP persistent connections (aka keep-alive).
Here's a Wireshark screenshot of the TCP traffic of 2 sequential API requests with keep alive disabled client side:
The 4 lines in red highlight that the TCP connection is getting closed each time. Another interesting thing is that the handshaking takes almost 100ms and it's done twice (the first bunch of lines).
After some investigation, it turns out that the library used to make API requests to Stream's APIs (request) does not have keep-alive enabled by default. Enabling it will be part of the library soon and is already available on a development branch.
Here's a screenshot of the same two requests with keep-alive enabled (using the code from that branch):
This time there is no connection reset anymore, and the second HTTP request does not repeat the SSL handshake.
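If you want keep-alive without waiting for that branch, here is a minimal sketch of the general Node pattern (TypeScript; this is generic https.Agent usage, not Stream's client code, and the hostname is a placeholder to substitute):

```typescript
import * as https from "node:https";

// Reuse one TLS connection across requests instead of reconnecting each time.
const agent = new https.Agent({ keepAlive: true, maxSockets: 10 });

function timedGet(host: string, path: string): Promise<number> {
  return new Promise((resolve, reject) => {
    const start = Date.now();
    https.get({ host, path, agent }, (res) => {
      res.resume(); // drain the body
      res.on("end", () => resolve(Date.now() - start));
    }).on("error", reject);
  });
}

(async () => {
  // Substitute the Stream API hostname; example.com is used here only so the
  // sketch runs as-is. The second request should skip TCP and TLS setup
  // because the socket is reused.
  console.log("first :", await timedGet("example.com", "/"), "ms");
  console.log("second:", await timedGet("example.com", "/"), "ms");
})();
```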
When we migrated our apps to Azure from Rackspace, we saw almost 50% of HTTP requests getting read timeouts.
We tried placing the client both inside and outside azure with the same results. The client in this case is also a server btw, so no geographic/browser issues either.
We even tried increasing the size of the box to ensure Azure wasn't throttling. But even using D-series boxes for a single request, the result was the same.
Once we moved our apps out of Azure they started functioning properly again.
Each query was done directly on an instance using a public ip, so no load balancer issues either.
Almost 50% of queries ran into this issue. The timeout was set to 15 minutes.
Region was US East 2
Having 50% of HTTP requests timing out is not normal behavior. You need to analyze what is causing those timeouts, starting by validating that the requests are actually hitting your VM. For this, I would recommend running a packet capture on your server and analyzing response times, as well as looking for a high number of retransmissions; it is even better if you can take a simultaneous network trace on your client machines so you can do TCP sequence number analysis and compare packets sent vs. received.
If you are seeing high latencies or a high number of retransmissions in the packet capture, that requires detailed analysis. I strongly suggest opening a support incident so Microsoft support can help you investigate the issue further.
We have a web site which calls Azure Storage thousands of times a second. All of the storage endpoints are HTTPS. Does anyone know if setting ServicePointManager.SetTcpKeepAlive = true will help with performance? It is disabled by default.
Not sure if enabling tcp keep-alive will help your performance issue (it should be easy enough for you to benchmark), but... if you're calling storage endpoints from your Azure-hosted web site, and storage is in the same region (same data center), you shouldn't need https, since traffic never leaves the data center.
EDIT: since you're working with ServicePointManager, also consider setting ServicePointManager.UseNagleAlgorithm = false. Otherwise, small TCP packets get buffered for up to half a second. If your storage communication involves small (less than ~1400 byte) payloads, this setting should help (especially when dealing with things like Azure Queues, which tend to have very small messages).
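The same Nagle and keep-alive considerations exist outside .NET as well. As an analogous illustration only (Node/TypeScript, not the ServicePointManager API this question is about), the equivalent socket-level switches look like this:

```typescript
import * as net from "node:net";

// Disable Nagle so small writes go out immediately instead of being buffered,
// and enable TCP keep-alive probes -- roughly the spirit of
// UseNagleAlgorithm = false and SetTcpKeepAlive(...) in .NET.
const socket = net.connect({ host: "example.com", port: 80 }, () => {
  socket.setNoDelay(true);
  socket.setKeepAlive(true, 30_000); // start keep-alive probes after 30 s idle
  socket.write("HEAD / HTTP/1.1\r\nHost: example.com\r\nConnection: close\r\n\r\n");
});
socket.on("data", (chunk) => process.stdout.write(chunk));
socket.on("end", () => socket.destroy());
```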