Do browsers re-try DNS when a page load fails? - dns

After Amazon's failure and reading many articles about what redundant/distributed means in practice, DNS seems to be the weak point. For example, if DNS is set to round-robin among data centers, and one of the data centers fails, it seems that many browsers will have cached that DNS and continue to hit a failed node.
I understand time-to-live (TTL), but of course this may be set to a long time.
So my question is, if a browser does not get a response from an IP, is it smart enough to refresh the DNS in the hope of being routed to another node?

Round-robin DNS is handled per browser. This is how Mozilla does it:
A single host name may resolve to multiple ip addresses, each of which is stored in the
host entity returned after a successful lookup. Netlib preserves the order in which the dns
server returns the ip addresses. If at any point during a connection, the ip address
currently in use for a host name fails, netlib will use the next ip address stored in the
host entity. If that one fails, the next is queried, and so on. This progression through
available ip addresses is accomplished in the NET_FinishConnect() function. Before a url load
is considered complete because its connection went foul, its host entity is consulted to
determine whether or not another ip address should be tried for the given host. Once an ip
address fails, it's out, removed from the host entity in the cache. If all ip addresses in
the host entity fail, netlib propagates the "server not responding" error back up the call
chain.
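A minimal sketch of the same fallback idea in application code, using only Python's standard library (the host and port are illustrative; real browsers implement this internally):

import socket

def connect_with_fallback(host, port, timeout=5.0):
    """Resolve all addresses for host and try each until one accepts a TCP connection."""
    last_error = None
    for family, socktype, proto, _, sockaddr in socket.getaddrinfo(host, port, type=socket.SOCK_STREAM):
        sock = socket.socket(family, socktype, proto)
        sock.settimeout(timeout)
        try:
            sock.connect(sockaddr)   # first address that answers wins
            return sock
        except OSError as exc:       # this address failed; move on to the next one
            last_error = exc
            sock.close()
    raise last_error or OSError("no usable address for %s" % host)

# Example: sock = connect_with_fallback("www.example.com", 80)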
As for Amazon's failure, there was NOTHING wrong with DNS during Amazon's downtime. The DNS servers correctly reported the IP addresses, and the browsers used those IP addresses. The screw-up was on Amazon's side. They re-routed traffic to an overwhelmed cluster. The DNS was dead-on, but the clusters themselves couldn't handle the huge load of traffic.
Amazon says it best themselves:
EC2 provides two very important availability building blocks: Regions and Availability
Zones. By design, Regions are completely separate deployments of our infrastructure.
Regions are completely isolated from each other and provide the highest degree of
independence. Many users utilize multiple EC2 Regions to achieve extremely-high levels of
fault tolerance. However, if you want to move data between Regions, you need to do it via
your applications as we don’t replicate any data between Regions on our users’ behalf.
In other words, "remember all of that high-availability we told you we have? Yeah it's really still up to you." Due to their own bumbling, they took out both the primary AND secondary nodes in the cluster, and there was nothing left to fail over to. And then when they brought it all back, there was a sudden "re-mirroring storm" as the nodes tried to synchronize simultaneously, causing more denial of service. DNS had nothing to do with any of it.

Related

What does "IPv4 addresses in the same AS" mean?

I'm in the process of moving a website from HostGator to Amazon EC2. Front end and back end are both moved. I added a Hosted Zone in Amazon Route 53 and updated my nameservers in HostGator. Unfortunately, the site won't load.
I ran a check with Zonemaster and received the following warnings:
All nameservers in the delegation have IPv4 addresses in the same AS
(16509). All nameservers in the delegation have IPv6 addresses in the
same AS (16509). All nameservers in the delegation are in the same AS
(16509).
I've searched but can't figure out what "AS" means in this context. Would love some help to point me in the right direction for troubleshooting.
The domain in question is tektonbody.com.
Thanks!
AS refers to Autonomous System, which in rough terms means "a block of IPs that share common routing" or, more generally, "addresses from the same allocation block".
You're getting this warning because the nameservers are all in the same block and if that single route goes offline for some reason, all your nameservers go down. It's generally best to spread these out geographically to minimize your exposure to localized events.
They're just looking out for you here. Typically you should have 3-4 different nameservers on different backbone providers in different regions so that no single failure, even at the provider level, can take them all down.
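If you want to see what Zonemaster is looking at, a quick sketch like the following lists the delegation's nameservers and their addresses, which you can then feed into a whois/AS lookup. It assumes the third-party dnspython package (2.x); the domain is just a placeholder:

import dns.resolver

domain = "example.com"   # substitute your own domain
for ns in dns.resolver.resolve(domain, "NS"):
    ns_name = ns.target.to_text()
    addresses = [a.to_text() for a in dns.resolver.resolve(ns_name, "A")]
    print(ns_name, addresses)   # check whether these all fall in the same network / AS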
Adding an A record pointing to the IP of our EC2 instance fixed this issue for me. We are migrating from OVH to AWS.

Does sharing the same IP necessarily mean sharing the same server?

I was curious as to why one client site on a shared server was performing very poorly and I wanted to know if there was a way to find out how many other sites were being hosted on the same server. I found this reverse IP lookup site:
http://reverseip.domaintools.com/
that claims the client's site IP is also being used by 3000+ other sites. I did a quick survey of other clients' sites and this is more than twice the next closest, most being in the 800 - 1500 range.
Does this mean that there are 3000+ sites being hosted on one server, or could there still be multiple servers sharing an IP? Basically I want to know if this is the main likely reason the site is slow.
On the public internet, sharing the same IP address does not mean sharing the same physical server. Here are the ways an IP can be shared while requests are processed on different physical servers:
Most often, the public IP addresses are fronted by a load balancer, a reverse proxy, or a gateway. This device then routes (technically, proxies) the connection to one of the physical servers running behind it. All of these sit within the firewall/network/data center of the "serving" organization.
Unless designed (or ill-designed) to reveal information about the internal IP addresses, there is no way to figure out the IP address of the physical device that actually processed the request.
Anycast allows you to have the same IP address being available at different geographical locations. Look at Google's DNS servers (IP address 8.8.8.8). Such services are anycasted, to serve from the nearest geo-location.
This is also true from the server's perspective. A server does not necessarily know the "original" IP address from which the request was initiated. Most often, we are proxied and/or NAT'ed by routers and other devices at our homes and offices. After all, there are only so many public IP addresses available (at least in IPv4), and we cannot have one public IP address for each device :) .
Closing statement: The server and the client only know the ingress/egress points of each other's network. Beyond that, they have no idea of the internal IP addresses of the physical devices.
Yes, it can very well mean that. It is very common, and is the only way companies selling you hosting for pennies can even approach turning a profit.
It is done with virtual hosting support in the web server. This relies on DNS and on the browser / client sending the Host header to the server as part of the HTTP request. The HTTP server then knows which site the client thinks it is requesting a URI from, and maps the request to that site's tree. Those trees often sit on the same disk, though the sites may be jailed or virtualized.
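As an illustration only (not any particular host's setup), here is a bare-bones sketch of name-based virtual hosting with Python's standard library: one listener, many sites, chosen by the Host header. The hostnames and paths are made up:

from http.server import HTTPServer, SimpleHTTPRequestHandler

# Host header -> document root for that site (placeholder values)
SITE_ROOTS = {
    "site-a.example": "/var/www/site-a",
    "site-b.example": "/var/www/site-b",
}

class VirtualHostHandler(SimpleHTTPRequestHandler):
    def translate_path(self, path):
        host = (self.headers.get("Host") or "").split(":")[0].lower()
        # pick the site tree for this Host header, fall back to a default
        self.directory = SITE_ROOTS.get(host, "/var/www/default")
        return super().translate_path(path)

if __name__ == "__main__":
    HTTPServer(("0.0.0.0", 8080), VirtualHostHandler).serve_forever()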
I've seen numbers higher than 3000, for example.
If you want better, you have to move to a higher quality provider, and/or obtain your own IP addresses.

Where is an IP Anycast Nameserver system implemented?

I've been reading a lot about nameservers over the last few days. For our websites we want to reduce the waiting time for visitors that is caused by our nameserver. I have some questions about IP Anycast and how DNS works in general. Let me start by explaining how I understand DNS resolution from the user's side:
User X wants to visit www.example.com, the following steps happen to get the IP address:
Step 1: User X sends a request to his ISP's nameserver, or a nameserver of his choice (the recursive nameserver).
Step 2: If the address is not already known, the recursive nameserver sends a request to one of the 13 root nameservers to get the nameservers for the .com TLD.
Step 3: It queries the .com nameservers to get the authoritative nameserver.
Step 4: It queries the authoritative nameserver to get the IP address for www.example.com.
First, I realized that as the owner of a website you can only optimize step 4; all the other steps are not in our hands.
I came across IP Anycast nameservers (which are also used for the 13 root nameservers) and I understand the concept of distributed machines. But what I don't understand is where the decision logic is implemented that sends the user to one of the distributed machines according to his "position". I mean, when I buy an anycast nameserver, the logic should be implemented on the .com nameserver (step 3), so that this nameserver decides which machine of my anycast nameserver the user is sent to.
That is really hard for me to understand, and I am wondering whether it really works that way. I hope someone can help me with these questions.
Besides that, I found out that another small way to gain some speed for the user is to use only A records and no CNAME records anymore.
Are there some more ways to optimize a nameserver?
Thanks in advance!
Your question is not really related to programming, but more to operations, and is also a little too vague ("Are there some more ways to optimize a nameserver?").
But let us try to give you pointers.
User X wants to visit www.example.com, the following steps happen to get the IP address:
Your following description is then mostly correct. Note that at each step, by default until very recently, a recursive nameserver would send the whole name queried to each nameserver. Recently, QNAME Minimization appeared as a standard, and now recursive nameservers can send each authoritative nameserver only the labels it needs to reply. This enhances privacy without changes to the protocol, but it is not widespread today because some authoritative nameservers do not work correctly when queried that way.
As a domain name owner you can indeed only have an impact at the last step. But remember that recursive nameservers have caches, so the list of root nameservers as well as the list of .COM nameservers, for example, are so "hot" (so often needed) that they almost always sit in resolvers' caches; basically, steps 1 and 2 do not happen often (typically only at start, when the cache is empty).
I came across IP Anycast nameservers (which are also used for the 13 root nameservers) and I understand the concept of distributed machines. But what I don't understand is where the decision logic is implemented that sends the user to one of the distributed machines according to his "position".
First things first, IP anycasting is not specific to the DNS, it is just hugely popular here because
it solves the load balancing/fail over problem that all big TLDs have
it works especially well with DNS over UDP, which is a simple one-query, one-reply protocol.
So any service can theoretically be anycasted. It means that a given IP address just appears at different locations in the world.
To summarize very broadly, Internet traffic between providers (AS numbers) is exchanged at peering points, where they interconnect and each provider says "I know about IP block 192.0.2.0/24, please send me all traffic for it", and so on for each block
(again this is a summary. Blocks are allocated by RIRs, and yes, by default this is not strongly authenticated, so BGP hijacks happen when another provider also says "give me this traffic" when it shouldn't - and that happens because of malicious goals or just simple human error).
For a normal (technical term: "unicast") IP address, only a single provider (AS) will announce it somewhere (technically, it announces its block, not just a single IP), and everything will be configured in such a way that wherever the exchange starts, traffic for this single destination IP will arrive at the exact same box.
On the contrary, for an anycast IP address, either a single provider or multiple ones (that is, multiple Autonomous Systems) will announce this IP at various locations (peering points) around the globe. At each peering point, traffic for this IP will be taken by the provider announcing it there, which then routes this traffic to a specific server "nearby". Announcements of the same IP at peering point A and peering point B will drive the corresponding traffic to datacentre X on one side and to datacentre Y on the other.
For the client, when everything works, it does not change anything, as long as all the replying servers react the same way to the same query. The client does not (and sometimes cannot) even know that the IP is anycasted, or that it reached location X while another client doing the same thing will instead hit location Y.
So, in short, nameservers "decide" nothing in this regard. At each point of the DNS resolution, when a resolver needs to contact nameserver NS1, it knows its IP address is IP1 and it just opens a UDP (or sometimes TCP) connection to that IP, absolutely normally. It is the underlying IP and BGP protocols that will, if anycast is in action, make the response come from the appropriate "close" server.
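To make that concrete, here is a sketch of a client query to an anycasted address; note there is nothing anycast-specific in it. It assumes the third-party dnspython package, and 9.9.9.9 is just an example of an anycasted public resolver:

import dns.message
import dns.query

# An ordinary UDP query to one IP address; BGP routing decides which physical
# instance of that anycasted address actually receives the packet.
query = dns.message.make_query("www.example.com", "A")
response = dns.query.udp(query, "9.9.9.9", timeout=2.0)
for rrset in response.answer:
    print(rrset)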
Note that anycasting in this way, for DNS, achieves both:
fail-over: if one server dies, with appropriate monitoring, its provider withdraws its IP announcement; that is, this local copy more or less disappears and the traffic will automatically (within seconds) shift to any other instance where the same IP is announced
load-balancing: roughly speaking, if you anycast one IP at 2 locations, each should receive 50% of the traffic. This is not true in practice, and it is very complicated (read: impossible) to predict or even monitor, because it all depends on the peering points, the agreements between the providers and various other policies (simple example: if you peer at two points, where the first has only one provider sending you traffic and the other has 100 providers with whom you exchange traffic, then you may get more connections going to the second instance... except of course if the single provider at the first peering point is an ISP with millions of clients while the other 100 providers are small single organizations...)
So, some nameservers are anycasted. Nowadays all the root ones are (but this was not true 16 months ago, see https://b.root-servers.org/news/2017/04/17/anycast.html as b.root-servers.org was the last one to board the anycast wagon), as are all big TLDs, sometimes even with more than one "Anycast DNS provider".
For any domain name, you can get some providers that will give you a DNS service for it, based on a "cloud" of anycasted nameservers.
See for example:
https://www.pch.net/services/dns_anycast
https://www.netnod.se/dns/netnod-dns-services
https://dyn.com/dns/network-map/
http://www.cdns.net/anycast.html
https://www.rcodezero.at/en/home/
https://aws.amazon.com/route53/
https://cloud.google.com/dns/
and many others.
Now following on a totally different topic:
Besides that, I found out that another small way to gain some speed for the user is to use only A records and no CNAME records anymore.
This is not really something you gain much with, and CNAME records are useful in many other cases.
Again, you need to remember that there are caches.
So even if your configuration is:
www.mywebsite.example CNAME www.mywebsite.example.somegreatCDN.example
www.mywebsite.example.somegreatCDN.example A 192.0.2.128
it is true that on paper this means two DNS requests before an HTTP query can finally be made, but in practice things will be cached (even more so today with big public open resolvers such as 1.1.1.1 or 8.8.8.8 or 9.9.9.9, which in fact are anycasted too), so the difference will be negligible (and only impacts the first lookup, never again until the record expires from cache)... especially in the case of HTTP, where everything that happens later involves opening, frequently, dozens of connections to download JavaScript source code, CSS files, fonts, etc. that may be hosted elsewhere.
A lot of websites use CNAME records without negative impact. See www.amazon.com for example, right now:
;; ANSWER SECTION:
www.amazon.com. 730 IN CNAME www.cdn.amazon.com.
www.cdn.amazon.com. 11 IN CNAME d3ag4hukkh62yn.cloudfront.net.
d3ag4hukkh62yn.cloudfront.net. 11 IN A 54.239.172.122
You may however argue that some names will be more popular than others and hence stay longer in cache, which is certainly the case.
And finally:
Are there some more ways to optimize a nameserver?
Based on what? We touched on various subjects above; all are compromises where you sacrifice something (it may just be "money") to gain something else (redundancy, etc.). There is no generic rule to declare when such a compromise makes sense or not; it will depend a lot on your situation and what you are trying to do.
You are right, and should be congratulated for it, to invest some time in your DNS setup, for both security and performance reasons. A lot of money is often invested in huge HTTP setups to sustain various problems or spikes of activity (but even the best fail sometimes, see the recent Amazon Prime Day opening that was a gigantic failure), yet people often forget about the DNS because it sits at the infrastructure level and so is neither well known nor well understood (using UDP alone already makes it stand out from other well-known protocols, as that is rare).
For example, there is another, completely different but related thing (it is orthogonal to anycasting, so it can work with or without it; the goals are different): "geo-DNS", where a nameserver replies differently depending on where the client asks from. This is meant to give, for example, a different IP for a webserver, one that is closer to the client (so in that case the webserver itself is probably not anycasted). It is done by just looking at the source IP of the DNS packet, but that is often not good enough, because the authoritative nameservers only see the recursive nameserver's IP as the source, not the real end client's, and nowadays, with big open public recursive nameservers, the guessed location can be far off. So there is also a specific DNS option called EDNS Client Subnet that can be passed between recursive and authoritative nameservers so that they get the end client's real IP address (in fact a block, not a single IP, for privacy reasons) and can act upon it.
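As a rough sketch of what the EDNS Client Subnet option looks like on the wire (again assuming the third-party dnspython package; the subnet and the resolver address are illustrative, and in real life it is usually the recursive resolver, not the end client, that adds this option):

import dns.edns
import dns.message
import dns.query

# Attach a truncated client network (a /24, not a full IP, for privacy) to the query
ecs = dns.edns.ECSOption("192.0.2.0", srclen=24)
query = dns.message.make_query("www.example.com", "A", use_edns=0, options=[ecs])
response = dns.query.udp(query, "9.9.9.9", timeout=2.0)
for rrset in response.answer:
    print(rrset)   # a geo-DNS authoritative server may tailor this answer to the subnet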
The short answer is: you are right. The nameservers are where you can optimize, and all "IP Anycast" products I have seen are just a nameserver setup with a lot of locations.
They use the same system as the "root servers of the internet", but this does not mean they have the same function. IP Anycast is simply a method for multiple servers in different locations to serve the same IP address.
From WIKIPEDIA (http://en.wikipedia.org/wiki/Anycast)
On the Internet, anycast is usually implemented by using Border Gateway Protocol to simultaneously announce the same destination IP address range from many different places on the Internet. This results in packets addressed to destination addresses in this range being routed to the "nearest" point on the net announcing the given destination IP address.
If you are using a big ISP like ASCIO or someone using ULTRADNS you probably do not have to worry about this step too much, but if the NS is a local ISP it is worth considering. Make sure you have NS where your visitors are.
I assume this is where you came into contact with "IP Anycast" products. None that I have seen offers anything to attack steps 1-3; rather, they offer a large set of nameservers, which reduces resolving time because the networks are closer to the clients.
Let me know if your understanding is that the offer is for a root nameserver setup, because I would like to see that.

My EC2 instance receives traffic for unrelated hostnames. How does this happen?

I have a couple EC2 instances behind an Elastic Load Balancer. These instances serve HTTP requests for a single web site. I recently started looking at the HOST header of the traffic, because I am planning to split my app into virtual hosts.
With some regularity (dozens of times a day), I log a request for a host name that is totally unrelated to my servers. As a couple examples, today I saw requests with the host names ad.adserverplus.com and r1---sn-upfn-hp5e.c.youtube.com. I looked these up and the IP addresses are not the same as any of my servers, nor of the ELB, so I am trying to develop a theory as to how this happens.
I realize that someone could be spoofing the host header, but it happens often enough that I am pretty sure this is not what is going on. My other idea is that somehow there is stale DNS data that just happens to resolve one of those hosts to my IP address, but again this seems like it could happen once in a great while but not regularly. What are some other possibilities, and how might I verify / discredit them?
EDIT
I looked at some of the unexpected host names today, and it seems that they actually do resolve to an IP that is one of the possible IPs that my domain apex resolves to. I use Route 53 for DNS, and I have the zone apex pointed to the ELB, so when I query the IP address for my domain, I get different answers depending on when I ask. So this makes me very curious, how do these IP addresses get assigned to me and how does EC2 make sure they are not co-opting an IP address that someone else is already using.
There are any number of reasons for this. First, you should understand that the public hostnames for your EC2 instances and load balancers have likely been used before. If you have an Elastic IP associated with your load balancer, it has also probably been used before.
As such, you can get traffic to your servers that is intended for a previous tenant of the hostname or IP address you are currently using.
One thing you can do is configure your web servers to reject (respond with 403 to) requests that do not arrive with the proper hostname specified, or that come from specific external hosts.
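A minimal sketch of that idea with Python's standard library (the allowed hostnames are placeholders; in practice you would do this in your web server or load balancer configuration):

from http.server import BaseHTTPRequestHandler, HTTPServer

ALLOWED_HOSTS = {"www.myapp.example", "myapp.example"}   # your own hostnames

class HostFilterHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        host = (self.headers.get("Host") or "").split(":")[0].lower()
        if host not in ALLOWED_HOSTS:
            self.send_error(403, "Forbidden")   # request was not meant for this site
            return
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.end_headers()
        self.wfile.write(b"ok\n")

if __name__ == "__main__":
    HTTPServer(("0.0.0.0", 8080), HostFilterHandler).serve_forever()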
Your IP or your ELB's IP may also at some point have been an open proxy, meaning that someone is hoping you will forward their requests on to the intended destination.
In general, though, open port 80 to the internet and all kinds of bots and zombies will visit you with a pretty constant flow of dodgy requests. I would imagine that the EC2 IP ranges are a particularly juicy range to search for poorly patched websites to exploit.

Using DNS for failover using multiple A records

It has recently come to my attention that setting up multiple A records for a hostname can be used not only for round-robin load-balancing but also for automatic failover.
So I tried testing it:
I loaded a page from our domain
Noted which of our servers had served the page
Turned off the web server on that host
Reloaded the page
And indeed the browser automatically tried a different server to load the page. This worked in Opera, Safari, IE, and Firefox. Only Chrome failed to try a different server.
But after leaving that server offline for a few minutes and looking at the access logs, I found that the number of requests to the other servers had not significantly increased. With 1 out of 3 servers offline, I had expected accesses to each of the remaining 2 servers to roughly increase by 50%, but instead I only saw 7-10%. That can only mean DNS-based failover does not work for the majority of browsers/visitors, which directly contradicts what I had just tested.
Does anyone have an idea what is up with DNS-based web browser failover? What possible reason could there be why automatic failover works for me but not the majority of our visitors?
What's happening is that the browsers are not doing automatic DNS failover.
If you have multiple A records on a domain then when your nameserver requests the IP for the domain you typed into your browser, it'll request one from the SOA. It could be any of those A records. Then it passes it along.
Some nameservers are 'smart' enough to request a new A record if the one it gets doesn't work and some aren't. So if you set multiple A records then you will have set up a pseudo redundancy failover, but only for those people with 'smart' nameservers. The rest get a toss of the dice on which IP they get and if it works then good, and if not then it will fail to load as it did for you in Chrome.
If you want to test this specifically, you can use your hosts file (C:\Windows\system32\drivers\etc\hosts on Windows, /etc/hosts on Linux) to specify which IP should go with which domain and see whether you get a true failover. What you'll run into in practice is that DNS servers across the net will cache your domain name resolution based on its TTL, so if/when you get a real failure, that cached IP will still be handed out until the record expires and the name is resolved again.
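Another way to test, rather than editing the hosts file, is to resolve every address the name returns and probe each one individually. A sketch with Python's standard library (host and port are examples):

import socket

def probe_all_addresses(host, port=80, timeout=3.0):
    """Try a TCP connection to every address the name resolves to and report each result."""
    results = {}
    addresses = {info[4][0] for info in socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)}
    for addr in sorted(addresses):
        try:
            with socket.create_connection((addr, port), timeout=timeout):
                results[addr] = "up"
        except OSError as exc:
            results[addr] = "down: %s" % exc
    return results

# Example: print(probe_all_addresses("www.example.com"))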
Another possible explanation is that, for most public websites, the bulk of traffic comes from bots not from browsers. Depending on the bot it is possible that they aren't quite as smart as the browsers when it comes to handling multiple A records for a domain.
Also, some bots use keep-alives to keep the TCP connections open & make multiple HTTP requests over the same connection. Given that the DNS lookup is only done when a connection is made, they will continue to make requests to the old IP address at least as long as the connection is kept open.
If the above explanation has any weight you should be able to see it in your logs by examining the user agent strings.
