Using DNS for failover with multiple A records

It has recently come to my attention that setting up multiple A records for a hostname can be used not only for round-robin load-balancing but also for automatic failover.
So I tried testing it:
I loaded a page from our domain
Noted which of our servers had served the page
Turned off the web server on that host
Reloaded the page
And indeed the browser automatically tried a different server to load the page. This worked in Opera, Safari, IE, and Firefox. Only Chrome failed to try a different server.
But after leaving that server offline for a few minutes and looking at the access logs, I found that the number of requests to the other servers had not significantly increased. With 1 out of 3 servers offline, I had expected accesses to each of the remaining 2 servers to roughly increase by 50%, but instead I only saw 7-10%. That can only mean DNS-based failover does not work for the majority of browsers/visitors, which directly contradicts what I had just tested.
Does anyone have an idea what is up with DNS-based web browser failover? What possible reason could there be why automatic failover works for me but not the majority of our visitors?

What's happening is that the browsers are not doing automatic DNS failover.
If you have multiple A records on a domain, then when your resolver looks up the name you typed into your browser, it asks the domain's authoritative nameserver and gets back the set of A records, typically in rotating order, then passes the answer along. Your machine ends up trying one of those addresses.
Some client stacks are 'smart' enough to try another A record if the first one doesn't respond, and some aren't. So if you set multiple A records, you will have set up a pseudo-redundant failover, but only for those people with 'smart' clients. The rest get a toss of the dice on which IP they get: if it works, good; if not, the page fails to load, as it did for you in Chrome.
If you want to specifically test this, you can use your hosts file (C:\Windows\system32\drivers\etc\hosts on Windows, /etc/hosts on Linux) to pin the domain to a particular IP and see whether you get true failover. What you'll run into in practice is that DNS servers across the net cache your domain's resolution for the record's TTL, so if/when you get a real failure, clients that cached the dead IP will keep using it until the cache expires and the name is resolved again.
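As a rough illustration, here is a minimal Python sketch of what a 'smart' client does: resolve all addresses for the name, then fall back to the next one when a connection fails. This is a sketch under stated assumptions, not how any particular browser is implemented, and example.com is a placeholder.

    import socket

    # Minimal sketch (not production code): resolve every address for the
    # name, then try each one until a TCP connection succeeds.
    def connect_with_failover(host, port=80, timeout=3):
        last_error = OSError("no addresses found for %s" % host)
        for family, socktype, proto, _, addr in socket.getaddrinfo(
                host, port, type=socket.SOCK_STREAM):
            sock = socket.socket(family, socktype, proto)
            sock.settimeout(timeout)
            try:
                sock.connect(addr)
                return sock          # first reachable server wins
            except OSError as exc:
                sock.close()
                last_error = exc     # dead server; fall through to next record
        raise last_error

    conn = connect_with_failover("example.com")   # placeholder hostname
    print("connected to", conn.getpeername())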

Another possible explanation is that, for most public websites, the bulk of traffic comes from bots, not from browsers. Depending on the bot, they may not be quite as smart as browsers when it comes to handling multiple A records for a domain.
Also, some bots use keep-alives to hold TCP connections open and make multiple HTTP requests over the same connection. Since the DNS lookup is only done when a connection is made, they will continue to send requests to the old IP address for at least as long as the connection stays open.
If this explanation has any weight, you should be able to see it in your logs by examining the user-agent strings.
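If you want to check this in your own logs, a quick Python sketch along these lines can tally user agents. The log path and the bot-matching pattern are assumptions and would need adjusting for your setup.

    import collections
    import re

    # Rough sketch: tally user agents in a combined-format access log to
    # see how much traffic comes from bots versus browsers.
    BOT_HINTS = re.compile(r"bot|crawler|spider|slurp", re.IGNORECASE)
    counts = collections.Counter()
    with open("/var/log/apache2/access.log") as log:   # assumed log location
        for line in log:
            parts = line.rsplit('"', 2)   # user agent is the last quoted field
            if len(parts) < 2:
                continue
            agent = parts[-2]
            counts["bot" if BOT_HINTS.search(agent) else "browser"] += 1
    print(counts)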

Related

How does load balancing work for very high traffic domains?

Take google.com, for example. If it ultimately resolves to a single IP at any point in time, the packets will land on a single server. Even if all that server does is send a redirect response (to transfer load to other servers), it still has to be capable of handling hundreds of thousands of requests per second.
I can think of a number of non-standard ways to handle this. For example, the router may be programmed to load-balance packets across multiple servers. But that still means google.com depends on a single physical facility, as IP addresses are not portable to another location.
I was hoping the internet fabric itself had some mechanism to handle such things. Multiple A records per domain is one such mechanism. But while researching this, I found that google.com's DNS entry has only one A record, and the IP value differs depending on which site you query it from.
How is it done? In what ways is it better and why has Google chosen to do it this way instead of having multiple A records?
Trying to look up the A record of google.com yields different results from different sites:
https://www.misk.com/tools/#dns/google.com resolves it to 216.58.217.142
https://www.ultratools.com/tools/dnsLookupResult resolves it to 172.217.9.206
This is generally done using dynamic DNS / round-robin DNS / DNS load balancing.
Say you have three web servers at three different locations. When the lookup is done, the DNS server responds with a different IP for each request. Some DNS servers also allow a policy-based configuration, wherein they can answer based on the querying client's location (which explains the differing answers above), or return a certain IP 70% of the time and some other IP 30% of the time.
This document provides a reference on how to do this with Windows Server 2016.
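As a toy illustration of such a 70/30 policy (not how any specific DNS server implements it), a weighted pick might look like this in Python; the IPs are documentation addresses, not real servers:

    import random

    # Hypothetical answer pool with 70/30 weights.
    POOL = {"203.0.113.10": 70, "203.0.113.20": 30}

    def pick_answer():
        # Draw one IP with probability proportional to its weight.
        return random.choices(list(POOL), weights=list(POOL.values()))[0]

    print([pick_answer() for _ in range(10)])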

Single domain on multiple servers

I have a domain that needs to be spread across several servers for load-balancing purposes.
My application can also tell which server is supposed to handle certain requests.
Right now I have it set up to use sub-domains like www1 and www2 and just redirect to each server, but that is ugly.
I need a way to proxy the requests so that users only ever see www, regardless of which IP is actually serving the request...
I have read a bit about Apache's proxying, but I am still confused about how such a setup would deliver the page and resources like videos without changing the www hostname.
You can enter multiple IP addresses per hostname in your DNS zone. If your DNS server supports it, it can rotate these entries on each request to get a simple round-robin load balancer (see http://en.wikipedia.org/wiki/Round-robin_DNS).
However, a much better solution is to have a load-balancing server that handles all requests to your web site. This way you can add and remove web servers from the rotation instantly, so when you need to do maintenance on one server you just take it out of the rotation.
Many load balancers also check whether the web servers are still alive and remove dead servers automatically. This will increase your uptime significantly.
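To make the idea concrete, here is a minimal Python sketch of a round-robin reverse proxy that skips dead backends. It handles only GET, the backend addresses are hypothetical, and a real deployment would use something like nginx, HAProxy, or a hardware load balancer instead.

    import http.client
    import itertools
    from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

    # Hypothetical backend pool.
    BACKENDS = [("10.0.0.1", 8080), ("10.0.0.2", 8080), ("10.0.0.3", 8080)]
    rotation = itertools.cycle(BACKENDS)

    class ProxyHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            # Try each backend once, starting with the next in rotation,
            # so a dead server is skipped automatically.
            for _ in range(len(BACKENDS)):
                host, port = next(rotation)
                try:
                    upstream = http.client.HTTPConnection(host, port, timeout=2)
                    upstream.request("GET", self.path,
                                     headers={"Host": self.headers.get("Host", host)})
                    resp = upstream.getresponse()
                    body = resp.read()
                except OSError:
                    continue  # backend down; try the next one
                self.send_response(resp.status)
                for name, value in resp.getheaders():
                    if name.lower() not in ("transfer-encoding", "connection"):
                        self.send_header(name, value)
                self.end_headers()
                self.wfile.write(body)
                return
            self.send_error(502, "All backends are down")

    if __name__ == "__main__":
        ThreadingHTTPServer(("", 8000), ProxyHandler).serve_forever()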

How do browsers handle a multiple IP response for a single hostname from DNS?

I want to know how this is handled, and whether there is a standard.
Browsers cache DNS responses for a few minutes and typically attempt a connection with the first IP address returned in the DNS response. The same IP is used until the cache expires.
Internet Explorer caches DNS lookups for 30 minutes by default, as specified by the DnsCacheTimeout registry setting. Firefox caches DNS lookups for 1 minute, controlled by the network.dnsCacheExpiration configuration setting.
From: Yahoo Dev Network: Best Practices for Speeding Up Your Web Site
Therefore for multiple IP addresses to be used for load-balancing purposes, the DNS server must change the order of the addresses supplied in the response, choosing the order randomly or in a sequential "round robin" fashion. In fact, this is usually the default behaviour of DNS servers when they respond to hostnames with multiple A records.
There is no standard procedure for deciding which address will be used by the requesting application - a few resolvers attempt to re-order the list to give priority to numerically "closer" networks. Some desktop clients do try alternate addresses after a connection timeout of 30-45 seconds.
From: Wikipedia: Round robin DNS
Generally they iterate through the responses and use the first one they can connect to.
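You can observe the returned address list from a script, although the OS or a local resolver cache may mask the rotation. A quick Python check, with example.com as a placeholder:

    import socket

    # Query the same hostname a few times; with round-robin DNS the order
    # of the returned addresses typically rotates between (uncached) queries.
    for _ in range(3):
        infos = socket.getaddrinfo("example.com", 80, type=socket.SOCK_STREAM)
        print([sockaddr[0] for *_rest, sockaddr in infos])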

Webserver failover

I will be running a dynamic web site, and if the server ever stops responding, I'd like to fail over to a static website that displays a "We are down for maintenance" page. I have been reading, and I found that switching the DNS dynamically may be an option, but how quickly would that change take place? Would everyone see the change immediately? Are there better ways to fail over to another server?
DNS has a TTL (time to live) and gets cached until the TTL expires. So a DNS cutover does not happen immediately. Everyone with a cached DNS lookup of your site still uses the old value. You could set an insanely short TTL but this is crappy for performance. DNS is almost certainly not the right way to accomplish what you are doing.
A load balancer can do this kind of immediate switchover. All traffic always hits the load balancer first, which under normal circumstances proxies requests along to your main web server(s). In the event of a web server crash, you can just have the load balancer direct all web traffic to your failover web server.
Pound, Perlbal, or another software load balancer could do that, I believe, yes.
Perhaps even Apache rewrite rules could allow this? I'm not sure there's a way to branch when the dynamic server is unavailable, though. You could customize Apache's error response to your liking.
First of all, it is important to understand which kind of failure you want to fail over from. If it's an app/db error and the server stays up, you can create a script that runs some checks and fails your website over to a temporary page by changing the Apache config or .htaccess; see the sketch after this answer.
If it is a hardware failure, the DNS solution is OK, but it's not immediate, so you will lose some user traffic.
The ideal solution is to use a proxy (like HAProxy) that forwards HTTP requests to at least two web servers, automatically detects when one of them fails, and switches over to the working one.
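A hedged sketch of the check-script idea from above: poll the application and flip a flag file that the web server's rewrite rules can test to serve a temporary page. The health URL and flag path are assumptions, not a standard layout.

    import pathlib
    import time
    import urllib.request

    CHECK_URL = "http://127.0.0.1/health"             # assumed health endpoint
    FLAG = pathlib.Path("/var/www/maintenance.flag")  # assumed rewrite trigger

    while True:
        try:
            with urllib.request.urlopen(CHECK_URL, timeout=5) as resp:
                healthy = resp.status == 200
        except OSError:
            healthy = False
        if healthy:
            FLAG.unlink(missing_ok=True)   # back to normal serving
        else:
            FLAG.touch()                   # rewrite rules now show the temp page
        time.sleep(30)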
If you're using Amazon AWS, you can use ELB (Elastic Load Balancer).

DNS-based strategies for showing a nice "Currently Offline" page when the server is down

How can I make a site automagically show a nice "Currently Offline" page when the server is down (I mean the full server is down and the request can't even reach IIS)?
Changing the DNS manually is not an option.
Edit: I'm looking for some kind of DNS trick to redirect to another server in case the main server is down. I can make permanent changes to the DNS, but not manual changes as the server goes down.
I have used the uptime services at DNSMadeEasy with great success. In effect, they set the DNS TTL to a very low number (5 minutes) and take care of pinging your server.
In the event of an outage, DNS queries get directed to the secondary IP. It's an excellent option for a "warm spare" in small shops with limited DNS requirements. I've used them for 3 years without a single minute of downtime.
EDIT:
This allows for geographically redundant failover, which the proposed NLB solution does not address. If the network connection is down, both servers in a standard NLB configuration will be unreachable.
Some server needs to dish out the "currently offline" page, so if your server is completely down, some other server will have to serve the file(s). You could set up a cluster of servers (even just two), where the second one is configured only to return the "currently offline" page while the first is down. Once the first server is back up, you can safely take down the second (as server 1 will take all the load).
You probably need a second server with 100% uptime, with some kind of failover load balancer in front of it: if the main server is online, it directs traffic there, and if not, it serves a page saying the server is down.
I believe that if the server is down, there is nothing you can do.
The request will fail with a network error, because when the web address is resolved to an IP, the IP being requested does not respond (because the server is down). If you can't change the DNS entry, then the client browser will continue to hit xxx.xxx.xxx.xxx and will never get a response.
If the server is up, but the website is down, you have options.
EDIT
Your edit mentions that you can make a permanent change to the DNS. But you would still need a two-server setup to achieve what you are talking about. You can point the DNS at a load balancer, which can direct each request to a server that is currently active. However, this still requires 100% uptime for the machine the DNS points to.
No matter what, if the server that the DNS is pointing to (which you must control, in order to redirect the traffic) is down, then all requests will fail with a network error.
EDIT Thanks to brian for pointing out my 404 error error.
Seriously, DNS is not the right answer to server load balancing or failover. Too many systems (including stub resolvers and ISP recursive resolvers) will cache records for much longer than the specified TTL.
If both servers are on the same network, use routing protocols to achieve failover by having both servers present the same IP address to the network, with the failover server taking over only if it detects that the (supposedly) live server is offline.
If the servers run Unix, this is easily done by running Quagga on each server and using OSPF as the local routing protocol. I've personally used this for warm-standby servers where the redundant system was actually in another data center, albeit one connected via a direct link to the main data center.
Certain DNS providers, such as AWS's Route 53, have a health-check option, which can be used to re-route to a static page. AWS has a how-to guide on setting this up.
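For illustration, a boto3 sketch of what such failover records can look like in Route 53; the zone ID, health-check ID, and addresses are placeholders, and the secondary record points at the host that serves the static "Currently Offline" page:

    import boto3

    route53 = boto3.client("route53")
    route53.change_resource_record_sets(
        HostedZoneId="ZEXAMPLE123",  # placeholder hosted zone
        ChangeBatch={"Changes": [
            {"Action": "UPSERT", "ResourceRecordSet": {
                "Name": "www.example.com.", "Type": "A",
                "SetIdentifier": "primary", "Failover": "PRIMARY",
                "TTL": 60,
                "HealthCheckId": "11111111-2222-3333-4444-555555555555",
                "ResourceRecords": [{"Value": "203.0.113.10"}]}},
            {"Action": "UPSERT", "ResourceRecordSet": {
                "Name": "www.example.com.", "Type": "A",
                "SetIdentifier": "secondary", "Failover": "SECONDARY",
                "TTL": 60,
                # Host serving the static offline page.
                "ResourceRecords": [{"Value": "203.0.113.20"}]}},
        ]},
    )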
I'm thinking that if the site is load-balanced, the load balancer itself would detect that the web servers it's trying to direct clients to are down, and would therefore send the user to a backup server with a message describing the technical problems.
Other than that.....
The only thing I can think of is to control the calling page. Obviously that won't work in all circumstances... but if you know that most of your hits to this server will come from a particular source, you could add a JavaScript test to that source and redirect to a "server down" page generated on a different server.
But if you are trying to handle all hits, from all sources (some of which you can't control), then I think you are out of luck. As other folks are saying, when a server is down, the browser just gets a connection error when it attempts to connect.
...perhaps there would be a way, at some point in between, to detect failed connections and replace them with a "server is down" web page. You'd need something like an HTTP-aware firewall or some other intermediate network gear between the server and the web client.
