Understanding load balancing and DNS records

Understanding load balancing and DNS records - dns

I am curious on how to setup multiple load-balancers (with different IP addresses) with a specific domain.
I understand that it is possible to setup multiple A-records in a DNS to all of my load-balancers, but I can understand that this is not ideal.
DNS' doesn't do any kind of is-alive checks, so if a load-balancer dies, the DNS will still send users to this address, right?
So how do you connect a domain/DNS with multiple load-balancers, while preventing a dead load-balancer from getting requests...
I read something about anycast, but is this the only solution?
I am just curious about how this issue is normally handled.
Thanks.

You have multiple solutions.
On a pure DNS level you can publish your records with a low TTL (say 5 minutes), and have your monitoring systems change the content of the zone by removing the dead record when detected. This does not provide immediate fail-over but can be often good enough.
It does not involve too complicated systems.
Also, some DNS servers allow some "programmed part", with a dynamic backend that can compute records based on some external parameters, like doing live checks and replying only with the live records.
Anycast is another solution indeed, and has then no relationship with the DNS anymore (although the DNS itself can be "anycasted" but then it is to resolve its possible failover needs, not the ones of your application).
Basically your multiple systems, on various places in the world, are advertised with the same IP address. So the DNS has only one record.
With the "magic" of BGP, each instance announcing a given IP address will collect all the nearby traffic, so you get load-balancing for free in fact. And you need some specific tooling so that, as soon as some local instance is dead (or in maintenance mode for example), you stop announcing its IP address there, so that all other networks in the world, again because of BGP, learn that to reach "something" behing that IP they need to go somewhere else, to another instance of yours announcing this IP.
This is far more complicated to setup as you need a proven BGP setup (and making errors in BGP can have even greater consequences than in DNS), and multiple instances located in different datacentres, and possibly multiple AS numbers, depending on how you want to do your anycast done. This clearly needs skilled professional in BGP routing where the first solution with only DNS (in the first case of just changing a static zonefile) is reachable by any enthousiastic amateur.
So the answer also slightly depend on the network locations of your load-balancers.

Related

Access load-balanced website when DNS lookup is restricted on server

The scenario is - I need to send push notification to Apple push server hosted at gateway.sandbox.push.apple.com. This Apple server is load balanced and the destination IP address can be anything in 17.x.x.x block.
Now my server which will be requesting Apple server is in secure environment and is behind firewalls. I have got the IP range 17.x.x.x unblocked, but DNS resolving is not possible on that server. That server also doesn't have Internet access on it.
What I did was - I pinged the Apple server from another system and got the Apple server's IP address for the moment. Then I mapped that IP address with the DNS name in the hosts file of my Windows server. This worked, but now the IP address can change anytime at the Apple end, and this will break things.
What can I do in this scenario?

You can talk to your security people and in cooperation with them come up with a proper, internally supported, way to provide what you need. What you need is to look up an address, and then talk to that address. Currently, you are only provided half of that.
What you're asking us for is a way to circumvent your own organization's security policies (policies that admittedly appear stupid, but that's another matter entirely). Even if someone here can come up with a technical way to do what you ask that works for now, it's likely to break at any time, since you're working at odds with your own workplace. Also, what will your bosses say if they find out that you're violating security policies?
Security very often comes down to tradeoffs. As the saying goes, the only truly secure system is one that has been encased in concrete and sunk to the bottom of the sea. But such a system will also be somewhat difficult to get useful work out of, so usually we accept lesser security in order to get work done. In your case, the tradeoff currently sits in a place that prevents you from doing whatever it is you're working on. So your organization needs to make a choice: change the tradeoff so that your machine can look up names, or keep the current tradeoff and accept that your task will not be done.
I'm sorry that I can't give you a straight up "Sure, do this" kind of answer, but your problem really is not technical.

Where should restricting IP address be handled?

We run a reverse proxy in front of our application tier and I'm wondering where the "best practice" place for handling the IP restriction is.
Currently, we use the application security to restrict access to specific resources by IP address but this has caused some issues when we moved to running behind a reverse proxy. It's quite easy to configure the allow/deny rules at the proxy instead of the application but since we run multiple applications behind the proxy, making modifications to the config there has the potential to affect other application (not a huge danger, but still present).
Is it better to do the filter further up the chain or closer to the application?
Are there any gotchas, like what we've encountered by doing application restriction and adding a reverse proxy where all the requests "come from" the proxy, forcing us to use a header to find the "real" IP address.

We filter as early as possible and keep it away from the application; these sort of things are better managed by network operations. The reason being is that app developers or maintainers are not always in on the loop when changing ip addresses and the network ops people are usually the first to know. Also network type tools are usually better at providing / restricting access that software level tools.

I would never restrict by IP address. Restrictions like that are the job of a security layer, not of the Network layer, which is where IP addresses live. I rarely find value in having an Application restrict the implementation of the Network.

This depends on the type of resources that need to be restricted by IP. If parts of the application need to be restricted via IP then the application should be handling it. If the entire application needs to be blocked then you should be further up the chain.
The general rule is to restrict as early as possible without compromising any audit systems you have in place (it is almost always a good idea to know when people try to break your security system).

I restrict by IP addresses as early as possible - this eliminates unnecessary traffic in the following layers or subnetworks. So my advice is similar to u07ch's do it as early as possible.

Cross-colo fail-over design, DNS level fail-over?

I'm interested in cross-colo fail-over strategies for web applications, such that if the main site fails users seamlessly land at the fail-over site in another colo.
The application side of things looks to be mostly figured out with a master-slave database setup between the colos and services designed to recover and be able to pick up mid-stream. I'm trying to figure out the strategy for moving traffic from the main site to the fail-over site. DNS failover, even with low TTLs, seems to carry a fair bit of latency.
What strategies would you recommend for quickly moving traffic between colos, assuming the servers at the main colo are unreachable?
If you have other interesting experience / words of wisdom about cross-colo failover I'd love to hear those as well.

DNS based mechanisms are troublesome, even if you put low TTLs in your zone files.
The reason for this is that many applications (e.g. MSIE) maintain their own caches which ignore the TTL. Other software will do a single gethostbyname() or equivalent call and store the result until the program is restarted.
Worse still, many ISPs' recursive DNS servers are known to ignore TTLs below their own preferred minimum and impose their own higher TTLs.
Ultimately if the site is to run from both data centers without changing its IP address then you need to look at arrangements for "Multihoming" via global BGP4 route announcements.
With multihoming you need to get at least a /24 netblock of "provider independent" (aka "PI") IP address space, and then have that only be announced to the global routing table from the backup site if the main site goes offline.

As for DNS, I like to reference, "Why DNS Based Global Server Load Balancing Doesn't Work". For everything else -- use BGP.
Designing networks in order to load balance using BGP is still not an easy task and I myself certainly am not an expert on this. It's also more complex than Wikipedia can tell you but there are a couple interesting articles on the web that detail how it can be done:
Load Balancing In BGP Networks
Load Sharing in Single and Multi homed environments
There is always more if you search for BGP and load balancing. There are also a couple whitepapers on the net which describe how Akamai does their global loadbalancing (I believe it's BGP too.), which is always interesting to read and learn about.
Beyond the obvious concepts you can use software and hardware to achieve, you might also want to check with your ISP/provider/colo if they can set you up.
Also, no offense in regard to your choice of colo (Who's the provider?), but most places should be setup to deal with downtimes and so on, they should not require you to take actions. Of course floods or aliens can always strike, but in that case I guess there are more important issues. :-)

If you can, Multicast - http://en.wikipedia.org/wiki/Multicast or AnyCast - http://en.wikipedia.org/wiki/Anycast

How does geographic lookup by IP work?

Is which IPs are assigned to which ISPs public information? How do geo IP services obtain this information and maintain this information?
How can I personally figure out where a certain IP belongs without using one of these services?

For what it's worth, I worked at a senior level in the ISP industry for more than a decade so I have quite some experience with this.
Large IP ranges are allocated as needed by IANA to each of the Regional Internet Registries.
The regions are generally continental in size - IP addresses are not assigned on a per-country basis.
The RIRs in turn then allocate IP addresses to ISPs, who in turn assign them to end-users.
Each of the RIRs maintain a whois server which can be queried to find out not only which ISP has been assigned any netblock, but to a certain extent which end-user, and that end-user's address.
Note that many ISPs do not fill out this information for every single customer. Hence if you're a residential subscriber of a DSL service, it's likely that the Geo records will give the address of your ISP, and not your own address.
The various GeoLocation providers mostly work by mining these whois records. Note that the legality of doing so is something of a gray area - RIPE's database copyright statement is here.
IANA also maintains the root zone for the DNS, but that is completely separate from any IP allocation functions. It is very important to maintain the distinction between domain name operations and IP addresses.

To answer the specific question about "how it works": there's alot of manual labor involved, and the databases are to a large extent maintained manually. Just as other answers point out, there's no real correlation between IP ranges and countries, much less specific regions. Recently the system of IP address space distribution has been even more decentralized which means small private vendors can acquire IPv4 address ranges regardless of geographic region. This is why Google acquried Urchin so they could use their services for Google Analytics, which provides very accurate IP-to-geographic-region information.
If you don't want to use a service like MaxMind (free for personal use, and the database is open to some extent) or Google Analytics (free for personal use), there's free (and hence always slightly outdated) databases floating around, sometimes as flat files.

There are a variety of libraries that have mapping tables as well as services you can incorporate into your code.
The most important thing to understand is that there is no direct relationship between an IP address and any part of the world. The addresses are allocated in large blocks to organizations that are roughly geographical, which in turn allocate smaller blocks, this may happen at several levels for any given IP address (Alnitak explains the process well).
The fact is: WHOIS data does not have to be accurate. If I have an address block, I can say it is on Mars. And even if you narrow down the location of the final organization (say a very small ISP in Alaska), the user might be using dialup from Hawaii, or the server might be hosting a company from Guam.
So, there is always an element of risk/estimation in mapping an IP address (or a domain name) to a physical location. This is not to say you should never do it, there are many applications where rough or imperfect information is very useful.

Beware, the data is often slow to be updated, and even slower to replicate. My work place changed ISPs a number of years ago, and we were assigned a block of formerly Canadian IP addresses (we're based in the US), for months Google continued to give us google.ca as our default search engine. About 1/2 the time my home IP address comes up as being from my town, the other 1/2 from a town in another state.
Jason is right that the process is the same, but the updates are even slower and the data less accurate.

Alnitak's answer is pretty much on the mark.
As a side note, if you want to use a .dll to determine the user's location, then you can try this IPAddressExtension found on CodePlex. It has an internal database of ISP's to locations. As mentioned above by Alnitak, each ISP have IP blocks .. so this information is all buried inside the .dll :)
It's really easy to use. Just reference the .dll and then create an instance of a System.Net.IpAddress object! the extensions are listed on it.
I also need to declare that i'm the author of that codeplex project/product.
Please check it out :)
EDIT: added information about me being the author of that product.

Using IP Addresses or Host Headers in IIS

I was wondering about the best practices regarding this? I know there are two ways to use IIS and host multiple websites.
The first is to have an IP for every website
The second is to use host headers, and a single IP Address for IIS
I was wondering which was the best practice, and why one should be preferred over the other?
Thanks!

It's easier to implement and manage SSL if each site has its own IP address/domain name. You simply get a cert for that name and install it on that site. Doing SSL with Host Headers requires a wildcard server certificate that is implemented and synchronized across all sites that share the IP. You also don't have the restriction that all the sites be in the same domain.

I personally separate sites based on the relation to each other. For example all of my business sites share a single IP adddress (1 domain currently). All of my personal/community sites share a second IP address.
The differences can come over time when it comes to sending e-mail as I know that IP comes into play in some blacklisting systems, so if one site with a shared IP address causes problems it CAN cause issues for the other sites using that IP.
I am sure there are other items, reasons, and justifications, but those are at least mine...

Personally I find host header configuration makes life very easy for standard web hosting.
I have literally hundreds of sites running of off single IP addresses on a number of servers - both IIS and *nix Apache, all configured as virtual hosts. In a live web hosting environment it makes life easier both in terms of DNS configuration and server configuration.
The only time I used IP based separation is where I want to run sites on different networks and thus serve the traffic of a different network interface.
I've not seen any performance loss with the host header methodology but would like to hear anyone's horror tales - there have to be some out there :-)

Virtual hosting is usually better than separate IP addresses, but your mileage will vary.
This is really a network vs. systems deployment connection. You want to look at the total number of sites and services you will have on a system. You might want them to live on separate network interfaces (hence multiple IP addresses). You might want them to live off bonded physical interfaces.
You might want web applications to run that need to run separately from others because of security reasons.
The other answers above mention other factors, like SSL, organizational boundaries. (Some software does make associations by IP-address, like spam control). There are probably many other factors I have not thought of.

Host headers are prefered because they conserve IPv4 address space. They have been mandatory since HTTP/1.1.
With https things are a little more complex; you need a modern browser that supports the TLS/SSL server_name extension (RFC 4366 and previously RFC 3546). This includes:
Opera 8.0 or later
Firefox 2.0 or later
IE 7 on Vista
Google Chrome
Of course your server has to support it. If you want to support earlier browsers and use SSL/TLS, you need to us an IP address per virtual host; as those browsers become obsolete you'll be about to share IP addresses for TLS/SSL.

Host Headers versus multiple IPs when hosting several websites

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string