I have a web application deployed in multiple regions in the Azure cloud, and I'm using Traffic Manager in Performance mode to redirect users to the closest region.
What's concerning me is the following:
Using the site https://www.whatsmydns.net, I checked my web application to see which datacenter gets selected.
The strange thing is that people from California get redirected to the server in West Europe, even though there is a server in US Central too.
So from Traffic Manager's point of view, the latency to the Europe server is lower than to US Central.
But I believe the difference between the two can't be large...
Now I'm worried that a user could jump between US Central and Europe all the time, because they are in a zone where the latencies to the available servers are nearly identical.
I also store files in an Azure Storage account in each region. If a user jumps, I would have to transfer these files between the regions all the time...
So I was wondering: is there a way to redirect users to a specific region by their GeoIP rather than by latency?
In my eyes, one of the benefits of Traffic Manager is that I can use one domain for all regions...
The only solution I can think of is my own cloud service that replaces Traffic Manager and redirects users to the different regions by their IP, e.g. us-center.DOMAIN.com, we-eu.DOMAIN.com, etc.
Are there any other solutions?
Thanks for your help!
Br,
metabolic
If you believe Traffic Manager is routing queries incorrectly, that should be raised with Azure Support.
Traffic Manager 'Performance' mode routing is based on an internal 'IP address to Azure data center' latency map. The source IP of the DNS query (which is typically the IP of your DNS server) is looked up in the map to determine which Azure location will offer the best performance. There is an implicit assumption that the IP address of the DNS server is a good proxy for the location of the end user.
The 'Performance' mode in Azure Traffic Manager is deterministic. Identical queries from the same address will be routed consistently. The only exception is that routing may change during occasional map updates, which affect only a small percentage of the IP address space.
A more common cause of routing changes is customers moving from place to place: for example, during travel, or simply by picking up a Wi-Fi network that uses a DNS service in a different location, with a different IP address.
Geo-IP-based routing is not currently supported by Traffic Manager. However, please note that it would work in the same way as the 'Performance' routing, just with a different map. Users could still be routed to different locations as a result of map updates or changing DNS servers.
As you describe, if your application requires a strong, unbreakable association between a user and a region, one option is to redirect users at the application level (e.g. via an HTTP 302).
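For illustration, here is a minimal sketch of such an application-level redirect in Python/Flask, assuming a hypothetical lookup_region() GeoIP helper and per-region hostnames (all placeholders, not part of your existing setup):

    # Sketch of an application-level geo redirect (hypothetical helper and hostnames).
    from flask import Flask, redirect, request

    app = Flask(__name__)

    # Placeholder mapping; in practice this would come from a GeoIP database or service.
    REGION_HOSTS = {
        "us": "https://us-central.example.com",
        "eu": "https://west-europe.example.com",
    }

    def lookup_region(ip_address: str) -> str:
        """Hypothetical GeoIP lookup; replace with a real GeoIP library or service."""
        return "eu"  # hard-coded for the sketch

    @app.route("/")
    def entry_point():
        region = lookup_region(request.remote_addr)
        # A 302 pins the user to one regional hostname for the rest of the session.
        return redirect(REGION_HOSTS.get(region, REGION_HOSTS["us"]), code=302)

You keep a single entry domain (served from any region), and the redirect then keeps each user, and therefore their files, tied to one region regardless of later DNS changes.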
I currently have 3 Traffic Manager profiles: 1 entry point for our domain, which does geographic routing to 2 other Traffic Manager profiles, one global and one for the US.
These are priority-routed Traffic Manager profiles which point to application gateways. Having the priority profiles allows us to have a 'failover' if one site / application gateway goes down.
The reason we have an application gateway in the different countries is to allow path manipulation, so if the user is from the US they get a /us path instead of /.
I have configured our CNAMEs like www. and blog. in the application gateways for both global and US, which works fine. I can point the CNAME records to the entry Traffic Manager with no problem.
The problem I am having is pointing the root-domain A record to the Traffic Manager. Since Traffic Manager profiles don't have IP addresses, I get an error, because in Azure the root domain can only be pointed at a Traffic Manager profile that uses external endpoints with an IP address.
Has anyone else run into this issue and found a way to solve it?
Thanks
Pointing a root/apex domain at Azure Traffic Manager should be possible because Traffic Manager is integrated with Azure DNS. So, from Azure DNS, you should be able to create an alias A record at the zone apex that targets the Traffic Manager profile.
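A rough sketch of what that could look like with the azure-mgmt-dns Python SDK, under the assumption that the zone is hosted in Azure DNS (the resource group, zone, and profile ID are placeholders, and model names may differ slightly between SDK versions):

    # Sketch: apex alias A record targeting a Traffic Manager profile (placeholder names/IDs).
    from azure.identity import DefaultAzureCredential
    from azure.mgmt.dns import DnsManagementClient
    from azure.mgmt.dns.models import RecordSet, SubResource

    dns_client = DnsManagementClient(DefaultAzureCredential(), "<subscription-id>")

    traffic_manager_profile_id = (
        "/subscriptions/<subscription-id>/resourceGroups/<rg>"
        "/providers/Microsoft.Network/trafficManagerProfiles/<profile-name>"
    )

    # "@" is the zone apex; target_resource makes this an alias record,
    # so no literal IP address is needed.
    dns_client.record_sets.create_or_update(
        resource_group_name="<rg>",
        zone_name="example.com",
        relative_record_set_name="@",
        record_type="A",
        parameters=RecordSet(
            ttl=300,
            target_resource=SubResource(id=traffic_manager_profile_id),
        ),
    )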
Currently, I am trying to set up a Traffic Manager profile for our company's needs. Although I read the articles and documentation, I did not find an answer. Our goal is the following:
Route traffic based on domain name/website:
abc.com - routed to West Europe, with West US as a backup
def.com - routed to West US, with West Europe as a backup
Do we need to create a separate Traffic Manager profile for each website we want to route to these regions? As I understand it, with custom headers it is possible to monitor different websites inside one profile, but obviously not to direct traffic using different rules for each website.
Probably I am missing something.
Thank you in advance.
Azure Traffic Manager supports the following traffic-routing methods:
Priority, Weighted, Performance, Geographic, Multivalue, Subnet.
If you want to direct traffic based on the domain name, you need to create a separate Traffic Manager profile for each domain. Then, in each profile, select the Priority routing method and create two endpoints (primary and backup).
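As a sketch only, one such per-domain profile could be created with the azure-mgmt-trafficmanager Python SDK roughly like this (resource names and endpoint targets are placeholders, and field names may vary by SDK version):

    # Sketch: one priority-routed profile for abc.com (West Europe primary, West US backup).
    from azure.identity import DefaultAzureCredential
    from azure.mgmt.trafficmanager import TrafficManagerManagementClient
    from azure.mgmt.trafficmanager.models import Profile, DnsConfig, MonitorConfig, Endpoint

    client = TrafficManagerManagementClient(DefaultAzureCredential(), "<subscription-id>")

    client.profiles.create_or_update(
        "<resource-group>",
        "abc-com-profile",
        Profile(
            location="global",
            traffic_routing_method="Priority",
            dns_config=DnsConfig(relative_name="abc-com-profile", ttl=30),
            monitor_config=MonitorConfig(protocol="HTTPS", port=443, path="/"),
            endpoints=[
                Endpoint(
                    name="westeurope-primary",
                    type="Microsoft.Network/trafficManagerProfiles/externalEndpoints",
                    target="abc-westeurope.example.com",  # placeholder primary
                    priority=1,
                ),
                Endpoint(
                    name="westus-backup",
                    type="Microsoft.Network/trafficManagerProfiles/externalEndpoints",
                    target="abc-westus.example.com",  # placeholder backup
                    priority=2,
                ),
            ],
        ),
    )

The def.com profile would be the same with the priorities swapped.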
In addition, you could have a look at Azure Front Door. Front Door is a global entry point for delivering web applications to a worldwide audience through Azure. It supports URL-path-based routing for requests, and you can also assign priorities to your different backends when you want to use a primary service backend for all traffic.
I am reading a book about distributed systems. One of the data-replication options mentioned is a multi-leader approach, placing each leader in a different datacenter. The main point of having multiple datacenters is to be geographically close to the user.
The author then discusses all the write conflicts that emerge from having multiple write leaders, but he doesn't say much about how to direct users to the geographically closest datacenter.
For example, a user in Austria makes an HTTP request to https://stackoverflow.com. Stack Overflow has datacenters in Germany and North America. The DNS record points to the datacenter in the US.
Is the initial request always going to go to the datacenter in the US? I know that once a user is identified, I can instruct all AJAX and img requests to point to Germany (by changing the HTML response I send back), but initial requests, such as a page reload, will always point to the US.
This kind of defeats the purpose of being geographically close to users if they always have to connect to the distant server first, and only after that are the inline resources fetched from a nearby server. Am I missing some essential principle here?
It is very much possible to connect to the geographically nearest datacenter. There are lots of companies providing this as a service, e.g. Akamai, AWS, Google Cloud, Cloudflare.
This is generally done at the DNS level. When someone makes a request to your domain, the first request goes to the DNS server to resolve the domain name to an IP address, and that is where the appropriate nearest server location gets resolved.
This is also used for load balancing, and is called DNS load balancing.
This is done through Geo DNS; most cloud service providers offer it.
This article has a good explanation of how Geo DNS works.
For example, in AWS you could set up a geolocation routing policy in Route 53. This lets you choose the resources that serve your traffic based on the geographic location of your users, meaning the location that DNS queries originate from.
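A minimal sketch of that with boto3, assuming a Route 53 hosted zone and two regional endpoints (the zone ID and IP addresses are placeholders):

    # Sketch: Route 53 geolocation routing - EU users get the Germany endpoint,
    # everyone else falls back to the US endpoint (placeholder zone ID and IPs).
    import boto3

    route53 = boto3.client("route53")

    route53.change_resource_record_sets(
        HostedZoneId="<hosted-zone-id>",
        ChangeBatch={
            "Changes": [
                {
                    "Action": "UPSERT",
                    "ResourceRecordSet": {
                        "Name": "www.example.com",
                        "Type": "A",
                        "SetIdentifier": "europe",
                        "GeoLocation": {"ContinentCode": "EU"},
                        "TTL": 60,
                        "ResourceRecords": [{"Value": "198.51.100.10"}],  # Germany
                    },
                },
                {
                    "Action": "UPSERT",
                    "ResourceRecordSet": {
                        "Name": "www.example.com",
                        "Type": "A",
                        "SetIdentifier": "default",
                        "GeoLocation": {"CountryCode": "*"},  # catch-all location
                        "TTL": 60,
                        "ResourceRecords": [{"Value": "203.0.113.10"}],  # US
                    },
                },
            ]
        },
    )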
I am planning to use Azure Traffic Manager to fail my app over from one Azure region to another.
I need some advice on whether that is the correct approach for a failover. We have seen issues with Azure where most of the services in one region go down for a few hours. I understand that Azure Traffic Manager is not tied to a single region, but is it possible that Traffic Manager itself goes down, or that the Traffic Manager endpoint is not reachable even though my backend web app is reachable?
If I use Azure Traffic Manager, what other problems should I be worried about?
I've been working with TM for some time now, so here are a few issues I haven't seen mentioned before:
Keep-Alive
If your service allows Keep-Alive, then your DNS entry will be ignored as long as the connection remains open. I've seen some exceptionally odd behavior result from this, including users being stuck on a fallback page since they kept using the connection, causing it to remain open indefinitely. If you have access to IIS Manager, you can force Keep-Alive to be false.
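The client side of this is easy to reproduce: a Python requests.Session, which uses keep-alive by default, will keep talking to the originally resolved endpoint until the connection is dropped (the hostname below is a placeholder):

    # Sketch: a keep-alive client reuses one TCP connection, so it keeps hitting
    # whichever endpoint DNS returned first, even after a Traffic Manager failover.
    import requests

    session = requests.Session()  # connection pooling / keep-alive by default

    for _ in range(3):
        # These requests reuse the same connection; no new DNS lookup happens.
        session.get("https://myapp.trafficmanager.net/health", timeout=5)

    session.close()  # dropping the connection forces a fresh DNS lookup next time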
Browser DNS Caching
Most browsers have their own DNS cache, and very few honor DNS Time To Live. In my experience Chrome is pretty responsive, while IE and Edge have significant delays if you need them to roll over quickly. I've heard that Opera is particularly bad.
Other DNS Caching
Even if you're not accessing your service through a browser, other components can have DNS caches, and some of them will allow you to manage the cache yourself. In theory this can even depend on the ISP's DNS caching, though reports on the magnitude of this vary significantly.
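One way to check what your resolver path is actually serving, and for how long, is to inspect the record's remaining TTL, e.g. with dnspython (the hostname is a placeholder):

    # Sketch: inspect the answer and its remaining TTL for a Traffic Manager name.
    import dns.resolver  # pip install dnspython

    answer = dns.resolver.resolve("myapp.trafficmanager.net", "A")
    print("Canonical name:", answer.canonical_name)
    print("Remaining TTL (seconds):", answer.rrset.ttl)
    for record in answer:
        print("Resolved address:", record.address)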
Traffic Manager works at the DNS level, which is itself replicated. However, even then, you should still build redundancy into your solution.
Take a look at the Azure Architecture Center under "Make all things redundant" and you will see a recommendation for Traffic Manager:
consider adding another traffic management solution as a failback. If the Azure Traffic Manager service fails, change your CNAME records in DNS to point to the other traffic management service.
The Traffic Manager internal architecture is resilient to the failure of any single Azure region. So, even if a region fails, Traffic Manager should stay up. That applies to all Traffic Manager components: control plane, endpoint monitoring, and DNS name servers.
Since Traffic Manager works at the DNS level, it doesn't have an 'endpoint' that proxies your traffic; it uses DNS to direct clients to the appropriate endpoint, and clients then connect to those endpoints directly. Thus, an unreachable endpoint is an application problem, not a Traffic Manager problem.
That said, if the Traffic Manager DNS name servers are down, you have a serious problem: your DNS resolution path will fail and your customers will be impacted. The only solution is to either accept the risk (small, but it can never be zero) or have a plan in place to use another DNS system, either in parallel or as a failover. This is not a limitation of Traffic Manager; you could say the same about any DNS-based traffic management system.
The earlier answer from DornaDigital is very good (other than the first point, which suggests DNS caching will protect you through a name server outage; it won't). It covers some important points. In short, DNS-based failover works well for new sessions, but existing clients may have to refresh or even close their browser and reconnect.
I also agree with the details provided by dornadigital.
There are considerations for front-end applications as well. Browsers all have different thresholds for how long they maintain persistent connections. Chromium, for example, currently keeps a connection open unless it has been inactive for 300 seconds.
In our web applications, we detect the failover by the presence of a certain number of failed requests to the endpoint. After requests begin failing, we pause requests for 301 seconds to allow the connection to reset. This allows the DNS change from Traffic Manager to be applied to subsequent requests. We pop up a snackbar to indicate to the user that we are having an issue and display a countdown until requests resume, similar to Gmail when it has an issue connecting to its servers.
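The same pattern, sketched as a plain Python client rather than our actual front-end code (the URL and thresholds are illustrative):

    # Sketch: detect repeated failures, then pause long enough for persistent
    # connections to drop and the Traffic Manager DNS change to take effect.
    import time
    import requests

    ENDPOINT = "https://myapp.trafficmanager.net/api/health"  # placeholder
    FAILURE_THRESHOLD = 3
    PAUSE_SECONDS = 301  # just over Chromium's 300 s keep-alive idle timeout

    failures = 0
    while True:
        try:
            requests.get(ENDPOINT, timeout=5).raise_for_status()
            failures = 0
        except requests.RequestException:
            failures += 1
            if failures >= FAILURE_THRESHOLD:
                # In the web app this is where the snackbar and countdown are shown.
                print(f"Endpoint failing; pausing requests for {PAUSE_SECONDS} s")
                time.sleep(PAUSE_SECONDS)
                failures = 0
        time.sleep(5)  # normal polling interval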
I hope that gives you one idea on how to build some redundancy into your web apps.
I disagree with Jonathan, as his understanding of the resiliency of the Traffic Manager service conflicts with Microsoft's own documentation on the subject.
When you provision Azure Traffic Manager, you select a region in which to deploy the service. I (correctly) inferred from this that if that region were to fail, the Traffic Manager service could also be impacted and, in turn, your application would not properly fail over to the secondary region.
According to Microsoft's Azure Application Architecture Guide, under "Make all things redundant", a customer should deploy Traffic Manager into more than one region:
Include redundancy for Traffic Manager. Traffic Manager is a possible failure point. Review the Traffic Manager SLA, and determine whether using Traffic Manager alone meets your business requirements for high availability. If not, consider adding another traffic management solution as a failback. If the Azure Traffic Manager service fails, change your CNAME records in DNS to point to the other traffic management service.
Azure Application Architecture Guide - Make all things redundant
My thought and intention is to not deploy Traffic Manager in the primary service region, but instead to deploy it into the secondary (failover) region, with a tertiary (third) region as a backup.
I am starting to develop a web service that will be hosted in the cloud but needs higher availability than typical cloud SLAs provide.
Typical SLAs, e.g. for Windows Azure, promise an availability of 99.9%, i.e. up to 43 minutes of downtime per month. I am looking for an order of magnitude better availability (less than 5 minutes of downtime per month). While I can configure several load-balanced database back ends to solve that part of the problem, I see a bottleneck at the web server: if the web server fails, the whole service is unavailable to the customer. What are the options for reducing that risk without introducing another single point of failure? I see the following solutions, each with drawbacks:
SRV-record:
I duplicate the whole infrastructure (and make sure the databases are in sync) and add additional SRV records for the domain, so that a user trying to access www.example.com is automatically forwarded to example.cloud1.com or, if that one is offline, to example.cloud2.com. Googling around, it seems that SRV records are not supported by any major browser; is that true?
second A-record:
Add an additional A record as an alternative. Drawbacks:
a) At my hosting provider I do not see any way to add a second A record, only one... is that normal?
b) If one of the two servers is down, I am not sure whether users get automatically redirected to the other one or whether 50% of all users get a 404 or some other error.
Any pointers to a best practice would be appreciated.
Cheers,
Sebastian
The availability of an instance, i.e. the SLA specified by the cloud provider, means that the instance is healthy as a server running in the context of the hypervisor or fabric controller. With that said, you need to make the effort to ensure the instance is not failing because of your app, your OS, or pretty much anything else running inside the instance. There are a few things that devops teams tend to miss and that hit back hard, for instance forgetting to configure OS updates and patches.
The fundamental axiom of availability is redundancy: the more redundant your application and infrastructure are, the more available your app is.
I recommend you look into Azure Traffic Manager and then rework your architecture. You need not worry about SRV records or A records; just a CNAME pointing to the Traffic Manager would do the trick.
The idea of Traffic Manager is simple: you put the Traffic Manager behind the domain name (it handles the domain-name resolution of the app), and the Traffic Manager then decides where to send each request based on factors like round-robin distribution, disaster recovery, etc.
With the combination of Traffic Manager and a multi-region infrastructure setup, you will move towards your high-availability goal.
Links
Azure Traffic Manager Overview
Cloud Power: How to scale Azure Websites globally with Traffic Manager
Maybe you should configure a Corosync cluster with DRBD?
DRBD will ensure that the data on both nodes is replicated (for example, website files and DB files).
Apache as the web server will be available under a virtual IP to which the domain points. If one server goes down, Corosync will move all services to the second server within a few seconds.