Google has many servers in many locations, each with its own IP address. When I search Google in my web browser, how does DNS map the name to the corresponding IP address? Is a load balancer used first?
A couple of different approaches are used:
Geographic DNS
When a request comes in for a domain name, the DNS server looks at the IP address making the request and returns an IP address of a nearby server.
Some complicated extensions are required to deal with large shared caching DNS servers (like ISP nameservers), but that's the general idea.
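A minimal Python sketch of the geographic DNS idea, assuming a hand-made table that maps client prefixes to a nearby server. All prefixes and server addresses are hypothetical documentation ranges, not real Google data; a real geo-DNS service would use a full IP-geolocation database and would also have to deal with the shared caching resolvers mentioned above.

```python
import ipaddress

# Hypothetical table: requester prefix -> IP of a nearby server.
# All addresses are documentation ranges (RFC 5737), not real data.
GEO_TABLE = [
    (ipaddress.ip_network("198.51.100.0/24"), "203.0.113.10"),  # e.g. "Europe"
    (ipaddress.ip_network("192.0.2.0/24"), "203.0.113.20"),     # e.g. "US"
]
DEFAULT_SERVER = "203.0.113.30"


def answer_for(requester_ip: str) -> str:
    """Return the server IP to put in the A record for this requester.

    Note: the requester is usually a recursive resolver, not the end user.
    """
    addr = ipaddress.ip_address(requester_ip)
    best = None
    # Pick the most specific (longest) matching prefix.
    for net, server in GEO_TABLE:
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, server)
    return best[1] if best else DEFAULT_SERVER


print(answer_for("198.51.100.7"))  # 203.0.113.10
print(answer_for("8.8.8.8"))       # 203.0.113.30 (no match, fall back)
```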
Anycast DNS
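Anycast is a weird routing trick where a single IP range is advertised from multiple places, for example by multiple ASes. Requests to an IP address in that range are then routed to whichever server is closest in routing (BGP) terms, which is usually, but not always, the geographically nearest one.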
If a DNS server is hosted on an anycast IP, different instances of that server can be configured to return different IPs. This can be used as a computationally easier alternative to geographic DNS.
Anycast HTTP
If anycast can be used to route DNS to the closest server, why not just go to the next step and use it to route HTTP as well?
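(It turns out there's a reason why you usually don't want to do this: a routing change in the middle of a connection can send packets to a different server that has no state for that TCP connection, which breaks the HTTP connection. This doesn't affect DNS much, as it's usually used over UDP, where a query is a single request and response. Cloudflare does it anyway, though, and it usually works fine… YMMV.)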
At large scale, a reverse proxy is usually used for this purpose, and it can handle various tasks, including load balancing. To the client it appears that it is connecting to a single server, while the reverse proxy hides the servers behind it.
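To make the load-balancing part concrete, here is a minimal Python sketch of how a reverse proxy might choose a backend for each incoming request using simple round robin. The backend addresses are hypothetical private IPs; real proxies such as nginx or HAProxy offer this along with more elaborate strategies.

```python
import itertools


class RoundRobinPool:
    """Hypothetical backend pool that hands out backends in rotation."""

    def __init__(self, backends):
        if not backends:
            raise ValueError("need at least one backend")
        self._cycle = itertools.cycle(backends)

    def pick(self) -> str:
        # The client only ever sees the proxy's public IP; this choice
        # is invisible to it.
        return next(self._cycle)


pool = RoundRobinPool(["10.0.0.11:8080", "10.0.0.12:8080", "10.0.0.13:8080"])
for request_id in range(4):
    print(request_id, "->", pool.pick())
# 0 -> 10.0.0.11:8080
# 1 -> 10.0.0.12:8080
# 2 -> 10.0.0.13:8080
# 3 -> 10.0.0.11:8080
```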
At small scale, you can do something similar with DNS settings alone, mapping different domain names to different IP addresses. See this article.
Related
I was curious as to why one client site on a shared server was performing very poorly and I wanted to know if there was a way to find out how many other sites were being hosted on the same server. I found this reverse IP lookup site:
http://reverseip.domaintools.com/
which claims that the client site's IP is also used by 3000+ other sites. I did a quick survey of other clients' sites, and this is more than twice the next highest count; most are in the 800-1500 range.
Does this mean that there are 3000+ sites hosted on one server, or could there still be multiple servers sharing an IP? Basically, I want to know whether this is the most likely reason the site is slow.
On the public internet, sharing the same IP address does not mean sharing the same physical server. Here are some ways an IP can be shared while requests are still processed on different physical servers:
Most often, the public IP addresses belong to a load balancer, reverse proxy, or gateway. This device then routes (technically, proxies) each connection to one of the physical servers running behind it. All of these sit within the firewall/network/data center of the "serving" organization.
Unless the setup is designed (or ill-designed) to reveal information about internal IP addresses, there is no way to figure out the IP address of the physical device that actually processed the request.
Anycast allows the same IP address to be available at different geographic locations. Look at Google's public DNS servers (IP address 8.8.8.8): such services are anycast, so requests are served from a nearby location.
This is also true from the server's perspective. A server does not necessarily know the "original" IP address the request came from. Most often, we are proxied and/or NAT'ed by routers and other devices at home and in the office. After all, there are only so many public IP addresses available (at least for IPv4), and we cannot have one public IP address for each device :).
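To illustrate the NAT part, here is a toy source-NAT table in Python. The public address is a made-up documentation IP and the port numbering is arbitrary; real NAT devices also track protocol, destination, and idle timeouts. The point is that the server only ever sees the shared public IP and a translated port, never the private address behind it.

```python
import itertools
from typing import Dict, Tuple

Endpoint = Tuple[str, int]


class ToyNat:
    """Toy source NAT: many private (ip, port) pairs share one public IP."""

    def __init__(self, public_ip: str, first_port: int = 40000):
        self.public_ip = public_ip
        self._ports = itertools.count(first_port)
        self.outbound: Dict[Endpoint, Endpoint] = {}  # private -> public
        self.inbound: Dict[Endpoint, Endpoint] = {}   # public -> private (for replies)

    def translate_out(self, private: Endpoint) -> Endpoint:
        # Allocate a new public port the first time a private endpoint is seen.
        if private not in self.outbound:
            public = (self.public_ip, next(self._ports))
            self.outbound[private] = public
            self.inbound[public] = private
        return self.outbound[private]

    def translate_in(self, public: Endpoint) -> Endpoint:
        # Replies from the server are mapped back to the right private host.
        return self.inbound[public]


nat = ToyNat("203.0.113.7")
laptop = nat.translate_out(("192.168.1.10", 51000))
phone = nat.translate_out(("192.168.1.11", 51000))
print(laptop, phone)            # ('203.0.113.7', 40000) ('203.0.113.7', 40001)
print(nat.translate_in(phone))  # ('192.168.1.11', 51000)
```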
Closing statement: The server and the client only know the ingress/egress points of each other's network. Beyond that, they have no idea of the internal IP addresses of the physical devices.
Yes, it can very well mean that. It is very common, and is the only way companies selling you hosting for pennies can even approach turning a profit.
It is done with virtual hosting support in the web server. This relies on DNS and on the browser/client sending the requested host name (the Host header) to the server as part of the HTTP request. The HTTP server then knows which site the client thinks it is requesting a URI from, and maps the request to that site's tree. Those trees often sit on the same disk, though the sites may be jailed or virtualized.
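A minimal Python sketch of name-based virtual hosting, assuming a hypothetical table from host names to document roots: the server reads the Host header, normalizes it, and picks the site tree. Real web servers (Apache `<VirtualHost>` with `ServerName`, nginx `server_name`, IIS site bindings) do the same thing through configuration rather than code.

```python
from typing import Optional

# Hypothetical sites sharing one IP address on one server.
SITES = {
    "www.example.com": "/srv/www/example.com",
    "shop.example.net": "/srv/www/example.net-shop",
}


def site_root(host_header: Optional[str]) -> Optional[str]:
    """Map an HTTP Host header to the document root of the matching site."""
    if not host_header:
        return None  # no Host header (e.g. some HTTP/1.0 clients)
    # Normalize: lowercase, drop a ":port" suffix and any trailing dot.
    # (IPv6 literals like "[2001:db8::1]" are not handled here.)
    host = host_header.strip().lower().split(":", 1)[0].rstrip(".")
    return SITES.get(host)


print(site_root("WWW.Example.com:80"))   # /srv/www/example.com
print(site_root("unknown.example.org"))  # None -> default site or error page
```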
I've seen numbers higher than 3000, for example.
If you want better, you have to move to a higher quality provider, and/or obtain your own IP addresses.
I have the following dilemma. I have a website on IIS with two internal IPs, each NATed to a different external IP (each external IP is from a different ISP). I also configured round-robin DNS (two A records with the same name but different IPs). Basically, this balances the traffic between the two ISPs, which is what we want. The thing is that this configuration (DNS round robin) is apparently meant for a cluster of servers where each server has its own ISP on its own NIC, so the traffic from each web server to the client goes over that server's ISP.
Right now we are being told that no matter where our inbound traffic comes from, the outbound traffic is always through our main WAN, which is also OK, because we have tested that when the primary WAN link is down, the website keeps working on the secondary link.
OK, the question is: do you think there may be a problem with this configuration? Is DNS round robin also useful in this configuration?
Thanks a lot for your feedback.
Normally, when you host a web service, the responses are much bigger than the inbound traffic (typically you receive a small HTTP GET and send the whole content back), so it would make much more sense to balance the outbound traffic across your ISPs to get value out of the additional bandwidth.
Does it make sense? Yes: you can lose one ISP and your site is still available, assuming you run health checks on your DNS server to determine whether each address is reachable before you hand it out. If you always return both IPs, even when one ISP is down, it won't help you at all.
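A minimal Python sketch of the health-checked variant, assuming the ISP addresses and the `is_up` check are hypothetical placeholders (in practice the check might be an HTTP request to the site through each ISP link). Returning every address when all checks fail is a fail-open choice assumed here, not something the setup above requires.

```python
from typing import Callable, List

# Hypothetical public IPs, one per ISP link.
ISP_ADDRESSES = ["198.51.100.10", "203.0.113.10"]


def a_records(addresses: List[str], is_up: Callable[[str], bool], rotation: int) -> List[str]:
    """Return the A records to hand out for one query.

    Only addresses that pass the health check are included, and the list is
    rotated so successive queries see a different address first.
    """
    healthy = [ip for ip in addresses if is_up(ip)]
    if not healthy:
        # Assumed fail-open behaviour: nothing passes, so return everything.
        return list(addresses)
    k = rotation % len(healthy)
    return healthy[k:] + healthy[:k]


def all_up(ip: str) -> bool:
    return True


def isp1_down(ip: str) -> bool:
    return ip != "198.51.100.10"


print(a_records(ISP_ADDRESSES, all_up, 0))     # ['198.51.100.10', '203.0.113.10']
print(a_records(ISP_ADDRESSES, all_up, 1))     # ['203.0.113.10', '198.51.100.10']
print(a_records(ISP_ADDRESSES, isp1_down, 0))  # ['203.0.113.10']
```

Keep the TTL on these records short, since resolvers that cached an answer before an ISP failed will keep handing out the dead address until it expires.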
It would be better to add another server, or to use policy-based routing on your single server, so that each response is sent out through the interface on which its request was received.
Hope that helps!
I've been reading a lot about servers, load balancing, and similar topics, and a question came to mind.
DNS servers are servers that give you the IP for a given domain name. Is there a "dictator" that knows all the valid DNS servers in the world? If I set up a DNS server and someone requests a website it doesn't have, how would it know which other DNS server to redirect the request to? What if I configure facebook.com with a spoofed IP, so that everyone getting the IP from my DNS server communicates with a fake Facebook server? Obviously, this isn't how it works (at least not to a large degree), because otherwise someone would already have done it to attack hundreds of people.
When one registers a domain, one has to specify the name servers for that domain. What happens during this process? Is a request sent to the DNS server to notify it that there is a new domain to save in its database? If so, how can anyone own top-level domains like .com? And why can't I, for example, create my own top-level domain if I can run my own DNS server?
After looking at nginx as a load-balancing system, I'm starting to wonder a bit. Does a request to http://www.google.com/ work like this? The computer asks a DNS server for the IP address of google.com and then sends the request to it? That would be only one IP, so all requests to Google end up at this one server? And then this IP is connected to an nginx server, or a more basic hardware unit, that routes the request internally to other servers? So all requests go to one server before it forwards them to a data center?
After looking up google.com, I see that its name servers are ns1.google.com etc. But what is the point of them, if you need a different name server to reach ns1.google.com in the first place?
Obviously what I've written doesn't make sense, because if it were true, the web as a whole would be unusable, as people would exploit these possibilities for malicious purposes. And I can't imagine how ONE server could handle ALL the requests thrown at google.com.
I've tried searching Google, but all I get are theoretical explanations that led me to where I am now. It would be great if someone could point me to some articles that explain this thoroughly; hopefully a lot of other people will find this question useful too.
Anyone can run a DNS server, but the challenge is getting someone to use it. Normally the DNS server IP is provided as a DHCP option or is statically assigned. If you can get someone to use your server, you can return any IP for any hostname, including for new top-level domains you make up (subject to any filtering at the client, of course; web browsers might have difficulty with a new TLD, for example). Note that DNSSEC is designed to change this: records are digitally signed, so a resolver or client that validates signatures will reject answers your server fakes, because your server can't produce a valid signature.
DNS servers operate in a tree. When a resolving server receives a request for a domain it does not control (and has not cached), it sends the query on to another DNS server. That server may return the IP itself (this is called the authoritative server), or it may return an NS record (a referral) that points to another server, which must then be queried. The DNS root servers provide the referrals for resolving TLDs.
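The referral chain can be illustrated with a toy model in Python. The "servers" here are just dictionaries with made-up names and addresses; a real resolver sends DNS packets over the network and caches every answer and referral it gets, but the control flow is similar: start at the root, follow NS referrals, and stop when an authoritative server answers.

```python
from typing import Dict, Tuple

# Toy model: each "server" maps a zone to either ("A", address) for an
# authoritative answer or ("NS", next_server) for a referral.
# All names and addresses are hypothetical.
SERVERS: Dict[str, Dict[str, Tuple[str, str]]] = {
    "root": {"com.": ("NS", "com-tld")},
    "com-tld": {"example.com.": ("NS", "ns1.example.com")},
    "ns1.example.com": {"www.example.com.": ("A", "192.0.2.80")},
}


def ask(server: str, name: str) -> Tuple[str, str]:
    """Return the most specific record this server has for `name`."""
    zones = SERVERS[server]
    matches = [z for z in zones if name == z or name.endswith("." + z)]
    if not matches:
        raise LookupError(f"{server} has nothing for {name}")
    return zones[max(matches, key=len)]  # longest matching suffix wins


def resolve(name: str, max_steps: int = 10) -> str:
    """Iteratively resolve `name`, starting at the root and following referrals."""
    server = "root"
    for _ in range(max_steps):
        rtype, value = ask(server, name)
        print(f"asked {server!r}: {rtype} {value}")
        if rtype == "A":
            return value  # authoritative answer
        server = value    # referral: ask the next server down the tree
    raise RuntimeError("too many referrals")


print(resolve("www.example.com."))
# asked 'root': NS com-tld
# asked 'com-tld': NS ns1.example.com
# asked 'ns1.example.com': A 192.0.2.80
# 192.0.2.80
```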
A DNS server does not need to always return the same IP for a given name. It may return a different IP based on region, client IP, or even per request. This is the most typical way to load balance. Multiple DNS servers can also share the load of DNS requests by using anycast routing: many servers share the same public IP, routes to that IP are published from many locations, and each request is routed to one of them (normally the nearest in routing terms).
I have a couple of EC2 instances behind an Elastic Load Balancer. These instances serve HTTP requests for a single website. I recently started looking at the Host header of the traffic, because I am planning to split my app into virtual hosts.
With some regularity (dozens of times a day), I log a request for a host name that is totally unrelated to my servers. For example, today I saw requests with the host names ad.adserverplus.com and r1---sn-upfn-hp5e.c.youtube.com. I looked these up, and their IP addresses don't match any of my servers or the ELB, so I am trying to develop a theory as to how this happens.
I realize that someone could be spoofing the Host header, but it happens often enough that I am pretty sure this is not what is going on. My other idea is that somehow there is stale DNS data that happens to resolve one of those hosts to my IP address, but again, that seems like something that would happen once in a great while, not regularly. What are some other possibilities, and how might I verify or rule them out?
EDIT
I looked at some of the unexpected host names today, and it seems that they actually do resolve to an IP that is one of the possible IPs my domain apex resolves to. I use Route 53 for DNS, and I have the zone apex pointed at the ELB, so when I query the IP address for my domain, I get different answers depending on when I ask. This makes me very curious: how do these IP addresses get assigned to me, and how does EC2 make sure it isn't co-opting an IP address that someone else is already using?
There are any number of reasons for this. First, you should understand that the public host names for your EC2 instances and load balancers have likely been used before. If you have an Elastic IP associated with your load balancer, it has also probably been used before.
As such, you can get traffic to your servers that was intended for a previous tenant of the host name or IP address you are currently using.
One thing you can do is configure your web servers to reject requests (respond with 403) that do not arrive with the proper host name, or that come from specific external hosts.
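One way to do the Host check, sketched as Python WSGI middleware (WSGI is the standard Python web-server interface from PEP 3333). The allowed host names and the `hello_app` application are hypothetical; in nginx or Apache you would more typically get the same effect with a catch-all default virtual host that returns 403 for unknown names.

```python
from typing import Callable, Iterable

# Hypothetical host names this deployment actually serves.
ALLOWED_HOSTS = {"www.example.com", "example.com"}


class HostAllowlist:
    """WSGI middleware: answer 403 for requests whose Host is not ours."""

    def __init__(self, app: Callable, allowed: Iterable[str]):
        self.app = app
        self.allowed = {h.lower() for h in allowed}

    def __call__(self, environ, start_response):
        # HTTP_HOST holds the Host header; strip any ":port" suffix.
        host = environ.get("HTTP_HOST", "").lower().split(":", 1)[0]
        if host not in self.allowed:
            body = b"Forbidden\n"
            start_response("403 Forbidden", [("Content-Type", "text/plain"),
                                             ("Content-Length", str(len(body)))])
            return [body]
        return self.app(environ, start_response)


def hello_app(environ, start_response):
    body = b"Hello\n"
    start_response("200 OK", [("Content-Type", "text/plain"),
                              ("Content-Length", str(len(body)))])
    return [body]


application = HostAllowlist(hello_app, ALLOWED_HOSTS)
```

For a quick local test, `wsgiref.simple_server.make_server("", 8000, application).serve_forever()` will serve it.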
Your IP, or your ELB's IP, may at some point have been an open proxy, meaning that someone is hoping you will forward their requests on to the intended destination.
But in general, if you open port 80 to the internet, all kinds of bots and zombies will visit you with a fairly constant flow of dodgy requests. I would imagine, though, that the EC2 IP ranges are a particularly juicy range to scan for poorly patched websites to exploit.
Short Question:
Since DNS is anycast, is there any way for a DNS server to know the "first" source a DNS query originated from?
Long Question:
I've developed a custom DynDNS server using PowerDNS, and I want users to feed it information through a web interface. I want the web interface to update records for each user "based on IP".
So when the DNS server gets a request, if it could determine the source IP, it would be easy to return the records associated with that IP.
As far as I've tested, the DNS server can only see the IP of the "last" node in the DNS chain, not the original source. Is there any way to get it?
Regards
Google and Yahoo! submitted a draft (draft-vandergaast-edns-client-ip-01) to the IETF DNS Extensions Working Group that proposed a new EDNS0 option within DNS requests that recursive servers could use to indicate their own client's IP address to the upstream authoritative server.
The intent was, in theory, to optimise the use of Content Delivery Networks by ensuring that the web server addresses returned were based on the end user's IP address, rather than on the address of the end user's DNS server.
The idea was not well received and wasn't accepted by the working group because it intentionally broke the caching layer of the DNS, and the draft has subsequently expired.
UPDATE - a variation on this has subsequently been published as RFC 7871.
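For illustration, a Python sketch of how the EDNS Client Subnet option from RFC 7871 is laid out on the wire: option code 8, then the address family, the source prefix length, the scope prefix length (0 in queries), and only as many address bytes as the prefix needs, with the remaining bits zeroed. It builds only the option, not a complete DNS message; libraries such as dnspython (`dns.edns.ECSOption`) provide ready-made support.

```python
import ipaddress
import struct

ECS_OPTION_CODE = 8  # EDNS0 option code for Client Subnet (RFC 7871)


def encode_ecs(subnet: str) -> bytes:
    """Encode an EDNS Client Subnet option (code, length, payload) for a query."""
    net = ipaddress.ip_network(subnet, strict=False)  # zeroes the host bits
    family = 1 if net.version == 4 else 2             # 1 = IPv4, 2 = IPv6
    source_prefix = net.prefixlen
    scope_prefix = 0                                  # must be 0 in queries
    # Only ceil(prefix / 8) address octets are sent.
    n_octets = (source_prefix + 7) // 8
    address = net.network_address.packed[:n_octets]
    payload = struct.pack("!HBB", family, source_prefix, scope_prefix) + address
    return struct.pack("!HH", ECS_OPTION_CODE, len(payload)) + payload


print(encode_ecs("192.0.2.55/24").hex())  # 0008000700011800c00002
```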
Perhaps you have control of the software performing the lookup? If so, you could include the IP address as part of the request, e.g.
23-34-45-56.www.example.com
to which your custom-written server replies
23-34-45-56.www.example.com 1800 CNAME www-europe.example.com
or
23-34-45-56.www.example.com 300 A 34.45.56.67
etc.
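A minimal Python sketch of the server-side logic for this trick, assuming the dashed-address label format shown above. The rule that 23.34.0.0/16 gets the European CNAME is purely hypothetical, and the fallback A record is taken from the example above; hooking this into PowerDNS (for example through its pipe or remote backend) is not shown.

```python
import ipaddress
from typing import Optional, Tuple

# Hypothetical policy: clients in this prefix get the European CNAME.
EUROPE = [ipaddress.ip_network("23.34.0.0/16")]


def client_ip_from_qname(qname: str, zone: str = "www.example.com") -> Optional[ipaddress.IPv4Address]:
    """Extract the IPv4 address encoded as 'a-b-c-d.' in front of `zone`."""
    qname = qname.rstrip(".").lower()
    suffix = "." + zone
    if not qname.endswith(suffix):
        return None
    label = qname[: -len(suffix)]
    try:
        return ipaddress.IPv4Address(label.replace("-", "."))
    except ipaddress.AddressValueError:
        return None  # not an encoded address (or extra labels)


def answer(qname: str) -> Tuple[int, str, str]:
    """Return (ttl, type, data) for a query, mirroring the records above."""
    ip = client_ip_from_qname(qname)
    if ip is not None and any(ip in net for net in EUROPE):
        return (1800, "CNAME", "www-europe.example.com")
    return (300, "A", "34.45.56.67")


print(answer("23-34-45-56.www.example.com"))  # (1800, 'CNAME', 'www-europe.example.com')
```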
If the client is a web browser, complications arise due to NAT, HTTP proxies, and the inability to query host interface addresses directly from JavaScript. However, you might be able to do an AJAX-style lookup to a what's-my-ip service that understands X-Forwarded-For.
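On the what's-my-ip side, a Python sketch of reading X-Forwarded-For, under the assumption that only a known set of proxies is trusted (the proxy address below is hypothetical). The header is a comma-separated list that each proxy appends to, and clients can forge it, so the sketch walks it from the right and stops at the first hop that is not a trusted proxy.

```python
import ipaddress
from typing import Iterable, Optional


def client_ip(peer_ip: str, xff: Optional[str], trusted_proxies: Iterable[str]) -> str:
    """Best-effort original client IP behind a chain of trusted proxies."""
    trusted = {ipaddress.ip_address(p) for p in trusted_proxies}
    # Hops in order: original client first, each proxy in turn, and finally
    # the peer that actually connected to us.
    hops = [h.strip() for h in xff.split(",")] if xff else []
    hops.append(peer_ip)
    result = peer_ip
    for hop in reversed(hops):
        try:
            addr = ipaddress.ip_address(hop)
        except ValueError:
            break  # malformed entry: don't believe anything further left
        result = hop
        if addr not in trusted:
            break  # first untrusted hop is the client as far as we can tell
    return result


print(client_ip("10.0.0.5", "198.51.100.7", ["10.0.0.5"]))  # 198.51.100.7
print(client_ip("203.0.113.9", "1.2.3.4", ["10.0.0.5"]))    # 203.0.113.9 (forged header ignored)
```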
Long answer to Short Question:
DNS is not anycast. Some content DNS server owners use anycasting to distribute servers in multiple physical locations around the world, but the DNS/UDP and DNS/TCP protocols themselves are not anycast. The notion simply doesn't exist at that protocol layer.
Short answer to Long Question:
No.
Expansion
As noted, there's nothing in the DNS protocol for this. Moreover, the relationship between front-end and back-end transactions at a caching resolving proxy DNS server is not one-to-one.
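A toy Python model of why the relationship isn't one-to-one: a caching resolver queries upstream only on a cache miss, so many clients can be served by a single upstream query, and the upstream server only ever sees the resolver's own address. The upstream function, names, and addresses are hypothetical, and TTL handling is reduced to a simple expiry time.

```python
import time
from typing import Callable, Dict, List, Tuple


class CachingResolver:
    """Toy caching resolver: answers from cache, asks upstream only on a miss."""

    def __init__(self, own_ip: str, upstream: Callable[[str, str], Tuple[str, int]],
                 clock: Callable[[], float] = time.monotonic):
        self.own_ip = own_ip
        self.upstream = upstream  # upstream(name, source_ip) -> (answer, ttl)
        self.clock = clock
        self.cache: Dict[str, Tuple[str, float]] = {}

    def query(self, client_ip: str, name: str) -> str:
        # client_ip is known here, but it is never passed upstream.
        hit = self.cache.get(name)
        if hit and hit[1] > self.clock():
            return hit[0]
        answer, ttl = self.upstream(name, self.own_ip)
        self.cache[name] = (answer, self.clock() + ttl)
        return answer


upstream_log: List[Tuple[str, str]] = []


def authoritative(name: str, source_ip: str) -> Tuple[str, int]:
    upstream_log.append((name, source_ip))  # what the content server sees
    return "192.0.2.80", 300


resolver = CachingResolver("198.51.100.53", authoritative)
for client in ["203.0.113.1", "203.0.113.2", "203.0.113.3"]:
    resolver.query(client, "www.example.com")

print(upstream_log)  # [('www.example.com', '198.51.100.53')]
```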
You'll have to use whatever client differentiation mechanisms exist in the actual service protocol that you're using, instead of putting your client differentiation in the name→IP address lookup mechanism. Client differentiation for other services doesn't belong in name→IP address lookup, anyway. Such lookup is common to multiple protocols, for starters. Use the mechanisms of whatever actual service protocol is being used by the clients who are communicating with your servers.