At the moment there is an nginx balancer (CentOS 7, a virtual machine with a public IP address) proxying to a large number of backend Apache servers. It is necessary to implement a failover cluster of two nginx balancers. Fault tolerance is trivially implemented using a virtual IP address (keepalived is used). Can you tell me what to read about running nginx balancers as a pair, or how the following can be implemented: all requests coming to them on the same virtual IP address are distributed evenly between the two of them, but if one of them fails, the remaining one takes everything on itself?
At the moment it turns out that there are two identical balancers and the benefit of the second is only as insurance. While the main one (master) is fully working, the second (backup) sits uselessly idle.
What you are describing is active-active HA. You can find some material on this for NGINX Plus, but from a brief look it is not true active/active in the sense of a single virtual (floating) IP. Instead, active/active is achieved by using two floating IPs (two VRRP groups, with one VIP address active on each nginx node) and then publishing a round-robin DNS A record containing both addresses.
As far as I know, keepalived uses the VRRP protocol, which in some implementations can provide "true" active/active; in any case I am not sure keepalived supports this. Based on the information I was able to look up, it is not possible.
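A minimal keepalived sketch of that two-VIP layout, assuming two balancers with eth0 as the VRRP interface and placeholder VIPs 203.0.113.10 and 203.0.113.11 (the interface name, router IDs, password and addresses are all examples): node A is MASTER for the first VRRP instance and BACKUP for the second, and node B mirrors it the other way around, so under normal operation each nginx owns one VIP.

# keepalived.conf on balancer A (balancer B swaps the MASTER/BACKUP states and priorities)
vrrp_instance VIP_1 {
    state MASTER
    interface eth0
    virtual_router_id 51
    priority 150
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass s3cr3t
    }
    virtual_ipaddress {
        203.0.113.10
    }
}
vrrp_instance VIP_2 {
    state BACKUP
    interface eth0
    virtual_router_id 52
    priority 100
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass s3cr3t
    }
    virtual_ipaddress {
        203.0.113.11
    }
}

With a DNS A record for the balanced hostname pointing at both 203.0.113.10 and 203.0.113.11, clients are spread roughly evenly across the two balancers; if one node dies, keepalived moves its VIP to the survivor, which then answers for both addresses.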
Related
I have two problems.
First problem:
I want to use multiple DNS load balancers (I don't know what load-balancing solutions are out there, so please suggest some) and master-slave (PowerDNS) replication on my DNS servers.
So my approach is roughly: an A record that round-robins across my two or more NS records; those NS records would then resolve to a DNS load-balancer POP in a distributed network of our DNS load balancers, and then our PowerDNS master-slave replication would kick in. A sketch of what that delegation could look like is below.
This round-robin approach is mainly meant as a DDoS-mitigation measure, and we want it. I know people don't usually load-balance DNS servers, but we have to.
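As a rough, hypothetical illustration of the delegation part (all names and addresses below are placeholders), listing several NS records for the zone already gives resolvers round-robin behaviour across the name servers, and each of those servers can be a PowerDNS slave of one hidden master:

; parent-zone delegation for xyz.in - example data only
xyz.in.        86400  IN  NS  ns1.xyz.in.
xyz.in.        86400  IN  NS  ns2.xyz.in.
; glue records pointing at the DNS POPs (each a PowerDNS slave)
ns1.xyz.in.    86400  IN  A   192.0.2.10
ns2.xyz.in.    86400  IN  A   198.51.100.10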
Second problem:
After deploying this system, we want to use DNS to resolve to our distributed web servers around the globe, like an anycast system.
We have two different servers in different countries:
one server = cdn.xyz.in (static content)
second server = xyz.in (dynamic website)
Well, we have those two servers deployed in multiple locations like Singapore, NYC, etc., which we call POPs.
So what I want is to use DNS to resolve the closest POP for the user's request, a kind of GeoDNS (I forget what Amazon Route 53 calls it, but it routes traffic this way). How do they achieve it? A sketch of one common way to do this is below.
Also, if we could add a monitoring system or a DNS analytics system in front of the load balancer to monitor and track our traffic, the way Cloudflare does, that would be great. List me some tools and I will figure it out myself.
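One common way to get that kind of geo-resolution yourself (Route 53 exposes it as geolocation and latency-based routing policies) is split-horizon DNS keyed on the client's location. A minimal, hypothetical named.conf sketch, assuming BIND built with GeoIP support and using made-up zone file names, with Singapore treated as the "asia" POP:

acl "asia-clients" { geoip country SG; geoip country IN; };

view "asia" {
    match-clients { "asia-clients"; };
    zone "xyz.in" { type master; file "xyz.in.asia.zone"; };    # answers with the Singapore POP's addresses
};

view "default" {
    match-clients { any; };
    zone "xyz.in" { type master; file "xyz.in.default.zone"; }; # answers with the NYC POP's addresses
};

Each view serves a different copy of the zone, so clients matched by the GeoIP ACL get A records pointing at the nearest POP; large providers do the same thing at scale, often combined with anycast addressing.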
In my network infrastructure I have multiple subnets intended to segregate different types of devices. I would like the ability to serve different DNS responses from different DNS servers based on the requesting subnet. For example I'd like to use Google's DNS for one subnet but say CloudFlare's anti-malware DNS for another. I would also like the ability to then further lock down by using different "address" declarations on the different subnets.
One way that some people accomplish the first part is to use the "dhcp-option" declaration to serve different DNS server addresses to the different subnets, but this somewhat defeats the purpose of dnsmasq and turns it into basically just a DHCP server; it also defeats using a firewall to restrict access to port 53 to control any hard-coded DNS servers.
The other option I've seen is to run two instances of dnsmasq; however, this creates a highly customized setup which can't use any of the system-level configuration files or run scripts, which I'd like to avoid.
So I'm hoping someone can offer a solution for this.
Thanks in advance.
Presumably you want all of the subnets to use dnsmasq to resolve local domain names, but you want the subnets to use different recursive resolvers for Internet queries?
You should be able to do that with the DHCP settings (so that each subnet will receive two DNS entries: one for dnsmasq and one for another resolver, e.g. 8.8.8.8). These entries will end up in /etc/resolv.conf on each device and will be attempted in order when the device needs to resolve DNS. If dnsmasq is set to resolve local queries only, then the device will be forced to try the second address (e.g. 8.8.8.8) to resolve Internet queries.
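A minimal sketch of that DHCP-based approach in a single dnsmasq instance, assuming two example subnets (the ranges, router addresses and upstream resolvers are placeholders): each DHCP range gets a tag, and each tag hands out dnsmasq's own address first, followed by the per-subnet upstream resolver.

# /etc/dnsmasq.conf excerpt - example values only
# Tag each DHCP range so options can differ per subnet
dhcp-range=set:lan,192.168.1.50,192.168.1.150,12h
dhcp-range=set:iot,192.168.2.50,192.168.2.150,12h

# "lan" subnet: dnsmasq itself first (local names), then Google's resolver
dhcp-option=tag:lan,option:dns-server,192.168.1.1,8.8.8.8

# "iot" subnet: dnsmasq first, then Cloudflare's malware-blocking resolver
dhcp-option=tag:iot,option:dns-server,192.168.2.1,1.1.1.2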
Take google.com for example. If it ultimately resolves to a single IP at any point in time, the packets will land on a single server. Even if all it does is send a redirect response (to transfer load to other servers), it still has to be capable of handling hundreds of thousands of requests per second.
I can think of a number of non-standard ways to handle this. For example, the router may be programmed to load-balance the packets across multiple servers. But it still means that google.com is dependent on a single physical facility, as IP addresses are not portable to another location.
I was hoping the Internet fabric itself has some mechanism to handle such things. Multiple A records per domain is one such mechanism. But while researching this I found that google.com's DNS entry has only one A record, and the IP value is different depending on which site you query it from.
How is it done? In what ways is it better and why has Google chosen to do it this way instead of having multiple A records?
Trying to look up the A record of google.com yields different results from different sites:
https://www.misk.com/tools/#dns/google.com Resolves to 216.58.217.142
https://www.ultratools.com/tools/dnsLookupResult resolves to 172.217.9.206
This is generally done using dynamic DNS / round-robin DNS / DNS load balancing.
Say you have 3 web servers at 3 different locations. When the lookup is done, the DNS server will respond with a different IP for each request. Some DNS servers also allow a policy-based configuration, wherein they can return a certain IP 70% of the time and some other IP 30% of the time.
This document provides reference on how to do this with Windows 2016.
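In its simplest form (the hostname and addresses here are invented for illustration), plain round-robin is just several A records for the same name in the zone; most authoritative servers rotate the order of the answers between queries:

; example zone excerpt - one name, three locations
www.example.com.   300  IN  A  192.0.2.10
www.example.com.   300  IN  A  198.51.100.10
www.example.com.   300  IN  A  203.0.113.10

Running something like dig +short www.example.com a few times against such a zone should show the addresses coming back in a changing order, which is what spreads clients across the servers.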
I need to deliver a lot of HTTP content (to keep it simple: a big storage system with HTTP access, similar to AWS S3).
The bandwidth needed for this exceeds the bandwidth of one server (we get 200 Mbit per server, and the question is not about changing this).
For our program we need 1 Gbit, which would mean 5 servers.
When I connect them together with mod_proxy, I have one server in front which only has 200 Mbit, so that's not the right way.
But these servers must be accessible from the web under one domain name. Is there a possibility to do that? For example: one server gets the HTTP request, but the response comes from a different server?
DNS Round Robin?
Different Idea?
Thanx
If the outbound network traffic is not CPU-limited, you can use this open-source Linux Network Balancer:
http://lnlb.sourceforge.net/
The inbound network speed will remain at 200 Mbit, but with five nodes the maximum outbound limit is 5 × 200 Mbit.
A lot of people condemn round-robin DNS, perhaps assuming that it will take a full TCP timeout to detect a failed node, which is simply not the case. It's a simple way to solve the performance problem and it improves availability a lot. It also helps to solve the potential bottleneck on your LAN without having to go to 10 Gbit Ethernet, which would be a requirement between a router and a load balancer for the rate of traffic you describe.
There may be scope for getting more throughput from your servers and hence only needing 3 or 4 servers rather than 5. But that's a very different question.
After Amazon's failure and reading many articles about what redundant/distributed means in practice, DNS seems to be the weak point. For example, if DNS is set to round-robin among data centers, and one of the data centers fails, it seems that many browsers will have cached that DNS and continue to hit a failed node.
I understand time-to-live (TTL), but of course this may be set to a long time.
So my question is, if a browser does not get a response from an IP, is it smart enough to refresh the DNS in the hope of being routed to another node?
Round-robin DNS failover is a per-browser thing. This is how Mozilla does it:
A single host name may resolve to multiple IP addresses, each of which is stored in the host entity returned after a successful lookup. Netlib preserves the order in which the DNS server returns the IP addresses. If at any point during a connection the IP address currently in use for a host name fails, netlib will use the next IP address stored in the host entity. If that one fails, the next is queried, and so on. This progression through the available IP addresses is accomplished in the NET_FinishConnect() function. Before a URL load is considered complete because its connection went foul, its host entity is consulted to determine whether or not another IP address should be tried for the given host. Once an IP address fails, it's out, removed from the host entity in the cache. If all IP addresses in the host entity fail, netlib propagates the "server not responding" error back up the call chain.
As for Amazon's failure, there was NOTHING wrong with DNS during Amazon's downtime. The DNS servers correctly reported the IP addresses, and the browsers used those IP addresses. The screw-up was on Amazon's side. They re-routed traffic to an overwhelmed cluster. The DNS was dead-on, but the clusters themselves couldn't handle the huge load of traffic.
Amazon says it best themselves:
EC2 provides two very important availability building blocks: Regions and Availability Zones. By design, Regions are completely separate deployments of our infrastructure. Regions are completely isolated from each other and provide the highest degree of independence. Many users utilize multiple EC2 Regions to achieve extremely-high levels of fault tolerance. However, if you want to move data between Regions, you need to do it via your applications as we don’t replicate any data between Regions on our users’ behalf.
In other words, "remember all of that high-availability we told you we have? Yeah it's really still up to you." Due to their own bumbling, they took out both the primary AND secondary nodes in the cluster, and there was nothing left to fail over to. And then when they brought it all back, there was a sudden "re-mirroring storm" as the nodes tried to synchronize simultaneously, causing more denial of service. DNS had nothing to do with any of it.