I am working on a project in which the web site (all components are hosted in Azure) will have both US and international users. We are using Blob and Table storage for 99% of the data. What I do not understand is how to set up global instances, including multiple tables, etc., and keep everything in sync. Say a user logs into the site from France; how can I ensure they will always hit the same data center (which implies the same Storage instance)? If they hit a different storage instance, their data will not be there and/or will be stale.
Both Compute and Storage are affinitized to a specific data center. There's no global compute or global storage deployment concept.
Having said that: you'll typically host your human-facing app (e.g. a web app) in a single data center. Usually, latency between browser and server is not much of an issue if only a relatively small quantity of data is moving between the two. The majority of bandwidth is typically between the web server and app servers and/or database instances. And in Azure, data doesn't necessarily need to be colocated in the same data center as the web app (though that is the ideal scenario from a latency + egress bandwidth cost perspective).
If you want Compute in multiple data centers, you'd need to have a higher-level mechanism doing some type of load balancing for you (such as Azure's Traffic Manager). However, even with Traffic Manager's "closest" setting, you're not really guaranteed that a user in France will hit the W. Europe vs. N. Europe data center. You'd always have to plan for a visitor hitting any data center. This is why it's much simpler to deal with Compute in a single data center.
Regarding data: if your Compute is in a single data center, there's no need (other than disaster recovery) to write data to multiple data centers. If you do decide to deploy Compute to multiple data centers, you'll need your own method for syncing data. For Azure blobs & table storage, you can consider some type of command pattern (e.g. CQRS) where your operations are queue-driven. This allows you to process each queued data operation against multiple storage accounts across different data centers.
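As a rough illustration of that queue-driven idea, here is a minimal sketch using today's Python storage SDKs. The queue name, table name, entity shape, and connection strings are all hypothetical, and the tables are assumed to already exist; treat it as a pattern sketch, not a prescribed implementation.

```python
# pip install azure-storage-queue azure-data-tables
# Sketch of a queue-driven writer: each queued message describes one table
# operation, and a worker applies it to storage accounts in two regions.
import json
from azure.storage.queue import QueueClient
from azure.data.tables import TableClient

PRIMARY_CONN = "<primary-storage-connection-string>"      # e.g. US data center (placeholder)
SECONDARY_CONN = "<secondary-storage-connection-string>"  # e.g. EU data center (placeholder)

queue = QueueClient.from_connection_string(PRIMARY_CONN, "data-ops")
tables = [
    TableClient.from_connection_string(PRIMARY_CONN, "UserData"),
    TableClient.from_connection_string(SECONDARY_CONN, "UserData"),
]

def process_pending_operations():
    """Drain queued write commands and apply each one to every region."""
    for message in queue.receive_messages():
        # Each message is assumed to be a JSON entity with PartitionKey/RowKey.
        operation = json.loads(message.content)
        for table in tables:
            table.upsert_entity(operation)  # idempotent upsert keeps retries safe
        queue.delete_message(message)       # only delete after all regions succeeded

if __name__ == "__main__":
    process_pending_operations()
```

Because the upsert is idempotent, a failure against one region simply leaves the message on the queue to be reprocessed later, which is what makes this pattern forgiving across data centers.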
Now, you might have data sovereignty issues, where data must reside in a specific data center for specific customers, based on their geo. Again, you'll need to implement this in the app layer. One thought on this is to affinitize a user with a particular data center when they get set up (and just store the data center mapping in a single database along with your web tier). At this point, when a visitor logs in, you can easily look up their correct data center and, within their browsing session, access their data from the specific data center.
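For example, that data-center affinity can be as simple as a small lookup table keyed by user. The table name, region names, and connection strings below are hypothetical; this is only a sketch of the lookup-at-login idea.

```python
# pip install azure-data-tables
# Sketch: look up a user's "home" data center at login, then use that
# region's storage account for the rest of the session.
from azure.data.tables import TableClient

# Hypothetical mapping of region name -> storage connection string.
REGION_CONNECTIONS = {
    "westeurope": "<west-europe-storage-connection-string>",
    "eastus": "<east-us-storage-connection-string>",
}

# The mapping table itself lives in a single, central account alongside the web tier.
affinity_table = TableClient.from_connection_string(
    "<central-storage-connection-string>", "UserRegion")

def storage_for_user(user_id: str) -> TableClient:
    """Return a TableClient pointed at the user's designated data center."""
    entity = affinity_table.get_entity(partition_key="users", row_key=user_id)
    region = entity["Region"]  # e.g. "westeurope"
    return TableClient.from_connection_string(
        REGION_CONNECTIONS[region], "UserData")
```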
I'm a bit confused by the Azure price calculator. In particular, it doesn't explain the bandwidth pricing.
I'm considering Azure for a RESTful API that is going to use blobs for most data storage, together with a SQL Server database for a subset that is easier to manage with a relational approach.
In this application a lot of data will enter the system through the REST API, but only a small fraction will be exposed to the clients (mainly as summary reports). Still, the total bandwidth required should be on the order of 50 GiB/mo.
On Azure's pricing page for data transfer, I see the pricing relates only to outgoing data, but I cannot figure out how this applies to a REST API hosted in Azure App Service.
I mean, it could just mean that I'm going to pay for the bandwidth consumed by HTTPS responses (and not by HTTPS requests), but it seems a bit hard to estimate what this pricing is going to be.
Within a given region, there are no transfer costs at all. You mentioned using App Service, blobs, and SQL Database. As long as those services are within a single region, there are zero bandwidth costs as data flows between them and any other service within that region.
Bandwidth is billed specifically for outbound transfer. So, essentially you're metered for all data leaving a given region.
If you look at the Data Transfers Pricing Details page, it says:
"Data Transfers refer to data moving in and out of Azure data centres other than those explicitly covered by the Content Delivery Network or ExpressRoute pricing. Inbound data transfers (i.e. data going into Azure data centres) are free. Outbound data transfer prices are set at a sliding scale depending on location and bandwidth used."
Inbound traffic is free, so the data coming in can be removed from the equation. Outbound is not free, and you saw the pricing page.
Data transfer covers everything that goes out from every operation you execute.
It is hard to estimate the traffic pricing up front. I would recommend registering for the Azure trial, testing it for a month, and seeing how it goes, because your outbound data is not only what is returned; there is a fair amount of protocol overhead on top of that.
But if you estimate 10 GB/month of outbound traffic, it will start from $0.087 per GB from the fifth GB onward (because the first 5 GB are free). Different regions are described on the pricing page as well, so you should apply the pricing for the region where your website is hosted.
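Using the question's own 50 GiB/month estimate and the $0.087/GB figure quoted above (rates vary by region and change over time, so this is only a back-of-the-envelope sketch, not an official quote):

```python
def monthly_egress_cost(outbound_gb: float, free_gb: float = 5.0,
                        rate_per_gb: float = 0.087) -> float:
    """Estimate outbound bandwidth cost: first few GB free, flat rate after that."""
    billable = max(outbound_gb - free_gb, 0.0)
    return billable * rate_per_gb

print(monthly_egress_cost(10))   # 5 billable GB  -> ~$0.44/month
print(monthly_egress_cost(50))   # 45 billable GB -> ~$3.92/month
```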
I tried to dig through MSDN but could not find a concrete statement on which is the best load-balancing method.
Could someone please shed some light on which of the options below is the best for the given scenario:
Performance
Failover
Round Robin.
Scenario:
x Web Roles hosted on Large VMs in a single data center.
Requirement:
must be 100% up 24x7.
Thank you.
First: Do you really want to offer a 100% uptime SLA to your customers, when Azure itself doesn't offer 100% in its SLAs?
That said: Traffic Manager only load-balances your compute, not your storage. So if you're trying to increase uptime by having a set of backup compute nodes running in another data center, you need to think about data access speed and cost:
With round robin, you'll now have traffic distributed across multiple data centers, guaranteed and constantly. And if your data is in a single data center (keeping data in a single system of record is a good idea, unless you have replication logic fully taken care of), some of your users are going to see increased latency, as the nodes separated from your data will be requesting data across many miles (potentially between continents). Plus, data egress has a $$$ cost to it.
With performance, your users are directed toward the data center which offers them the lowest latency. Again, this now means traffic across multiple data centers, with the same issues as round robin.
With failover, you now have all traffic going to one data center, with another designated as your failover data center (so it's for High Availability). In the event you have an outage in the primary data center, you'd now have a failover data center to rely on. This may help justify the added latency and cost, as you'd only experience this latency+cost when your primary app location becomes unavailable for some reason.
So: If you're going for the high availability route, to help approach the 100% availability mark, I'm guessing you'd be best off with the failover model.
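Traffic Manager implements failover at the DNS level, but the semantics are easy to picture: always prefer the primary deployment, and only fall back to the standby when the primary stops answering its health probe. Here is a rough, application-level sketch of that idea in Python; the endpoint URLs are hypothetical, and this is illustrative rather than how Traffic Manager itself is configured.

```python
# Conceptual sketch of priority/failover routing: always prefer the primary
# endpoint, and only fall back when its health probe fails.
import urllib.request
import urllib.error

ENDPOINTS = [
    "https://myapp-primary.cloudapp.net/health",   # primary data center (placeholder URL)
    "https://myapp-failover.cloudapp.net/health",  # standby data center (placeholder URL)
]

def pick_endpoint(timeout_seconds: float = 3.0) -> str:
    """Return the first endpoint whose health probe answers with HTTP 200."""
    for url in ENDPOINTS:
        try:
            with urllib.request.urlopen(url, timeout=timeout_seconds) as response:
                if response.status == 200:
                    return url
        except (urllib.error.URLError, OSError):
            continue  # probe failed; try the next (lower-priority) endpoint
    raise RuntimeError("No healthy endpoint available")
```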
Traffic Manager comes into the picture only when your application is deployed across multiple cloud services, either within the same data center or in different data centers. If your application is hosted in a single cloud service (with multiple instances, of course), then the instances are load balanced using a Round Robin pattern. This is the default load-balancing pattern and comes to you without any extra charge.
You can read more about traffic manager here: https://azure.microsoft.com/en-us/documentation/articles/traffic-manager-overview/
In my view, there cannot be a single "best" load-balancing method for Azure Traffic Manager. All of them have unique advantages, and the right choice varies depending on the requirements of the application. The most common scenario is to use the performance load-balancing option with Azure Traffic Manager. But as Gaurav said, you will have to have your application hosted across more than one cloud service. If you wish to implement performance load balancing, then here is a link to get you started - http://sanganakauthority.blogspot.com/2014/06/performance-load-balancing-using-azure.html
I have my Azure SQL database located in West Europe and am considering having a database in the States as well. Deploying my website in the States was easy, but that website then queries the database in Europe, which introduces delays.
What do people do in these cases? Having separate databases for different users could work, I guess, but that fails if a user normally served by one server gets routed to the other server; their data is then not in that database. Are there easy solutions for having the same data available in two Azure SQL servers, with Azure maintaining the data sync? What about conflicts when syncing?
It really depends on your requirements and how you implement routing. You can design your distributed application so that user A, once authenticated, always goes to the US server, for instance, even if they are currently in Europe or Asia.
If you want to sync everything everywhere, there is a preview feature named "SQL Data Sync". It can sync data between multiple instances of SQL Server (including on-premises SQL Server installations). It is quite flexible in terms of configuration and syncing options. But again, it really depends on the application's requirements. If I were building a distributed system, I would not sync data across continents. I would design the app so that user-specific data lives in only one data centre. This, of course, is impossible if the user has access to a lot more data than just what relates to their profile.
The best option would be to keep user-specific data in the user's designated data centre, and sync the data that must be available to all users at all locations.
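For the "data that must be available to all users at all locations" part, blob storage supports server-side copies between accounts, so a small sync job can push shared content out to each region. A minimal sketch with today's Python SDK follows; the account connection strings and container name are hypothetical, and the target container is assumed to exist.

```python
# pip install azure-storage-blob
# Sketch: push shared/reference blobs from a source account out to another
# region's account using a server-side copy.
from azure.storage.blob import BlobServiceClient

source = BlobServiceClient.from_connection_string("<eu-storage-connection-string>")
target = BlobServiceClient.from_connection_string("<us-storage-connection-string>")

def sync_shared_container(container_name: str = "shared-data") -> None:
    """Copy every blob in the shared container from the EU account to the US account."""
    source_container = source.get_container_client(container_name)
    for blob in source_container.list_blobs():
        source_url = source_container.get_blob_client(blob.name).url
        # Server-side copy: the target data center pulls directly from the source URL.
        # (For a private container, the source URL would need a SAS token appended.)
        target.get_blob_client(container_name, blob.name).start_copy_from_url(source_url)
```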
I am trying to understand how to design the distribution of an application.
The plan is to replicate the whole app in different geographic regions (EU, US, Asia) and use Azure Traffic Manager to handle request distribution.
The thing is that the app has a special need where the requests should be isolated within a region. The US users should be directed only to US data center, EU users to EU data center and so on.
The requirement is to prevent the traffic randomly going to different data centers, for example: a US user makes few requests to US data center and then few requests to EU data center.
Also, it is important to note that this is not about request stickiness. What I need to achieve is that all users from the same city/country always get directed to the same data center.
Only in the event of a data center failure can ALL the requests be directed to another region.
Is it possible to create such configuration?
From my understanding, it seems you can do this via the "Performance" load-balancing method. This Hands-on-Lab provided by Microsoft gives a step-by-step process for setting up Traffic Manager to handle requests based on geography.
AFAIK, Amazon AWS offers so-called "regions" and "availability zones" to mitigate the risk of partial or complete data center outages. It looks like if I have copies of my application in two "regions" and one "region" goes down, my application can still continue working as if nothing happened.
Is there something like that with Windows Azure? How do I address risk of datacenter catastrophic outage with Windows Azure?
Within a single data center, your Windows Azure application has the following benefits:
Going beyond one compute instance, your VMs are divided into fault domains, across different physical areas. This way, even if an entire server rack went down, you'd still have compute running somewhere else.
With Windows Azure Storage and SQL Azure, storage is triple replicated. This is not eventual replication - when a write call returns, at least one replica has been written to.
Ok, that's the easy stuff. What if a data center disappears? Here are the features that will help you build DR into your application:
For SQL Azure, you can set up Data Sync. This facility synchronizes your SQL Azure database with either another SQL Azure database (presumably in another data center), or an on-premises SQL Server database. More info here. Since this feature is still considered a Preview feature, you have to go here to set it up.
For Azure storage (tables, blobs), you'll need to handle replication to a second data center, as there is no built-in facility today. This can be done with, say, a background task that pulls data every hour and copies it to a storage account somewhere else. EDIT: Per Ryan's answer, there's data geo-replication for blobs and tables. HOWEVER: Aside from a mention in this blog post in December, and possibly at PDC, this is not live.
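For tables, such a background task could be as simple as listing entities in the primary account and upserting them into a secondary account. The sketch below uses today's Python SDK with hypothetical connection strings and table names; a real job would track change timestamps rather than copying everything each run.

```python
# pip install azure-data-tables
# Sketch of the "background task" idea: copy table entities from the primary
# storage account to a secondary account in another data center.
import time
from azure.data.tables import TableClient

primary = TableClient.from_connection_string("<primary-connection-string>", "UserData")
secondary = TableClient.from_connection_string("<secondary-connection-string>", "UserData")

def replicate_once() -> None:
    """Naive full copy: upsert every primary entity into the secondary table."""
    for entity in primary.list_entities():
        secondary.upsert_entity(entity)

if __name__ == "__main__":
    while True:
        replicate_once()
        time.sleep(3600)  # run roughly once an hour, as described above
```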
For Compute availability, you can set up Traffic Manager to load-balance across data centers. This feature is currently in CTP - visit the Beta area of the Windows Azure portal to sign up.
Remember that, with DR, whether in the cloud or on-premises, there are additional costs (such as bandwidth between data centers, storage costs for duplicate data in a secondary data center, and Compute instances in additional data centers).
Just like with on-premises environments, DR needs to be carefully thought out and implemented.
David's answer is pretty good, but one piece is incorrect. For Windows Azure blobs and tables, your data is actually geographically replicated today between sub-regions (e.g. North and South US). This is an async process that has a target of about a 10 min lag or so. This process is also out of your control and is purely for a data center loss. In total, your data is replicated 6 times in 2 different data centers when you use Windows Azure blobs and tables (impressive, no?).
If a data center was lost, they would flip over your DNS for blob and table storage to the other sub-region and your account would appear online again. This is true only for blobs and tables (not queues, not SQL Azure, etc).
So, for a true disaster recovery, you could use Data Sync for SQL Azure and Traffic Manager for compute (assuming you run a hot standby in another sub-region). If a datacenter was lost, Traffic Manager would route to the new sub-region and you would find your data there as well.
The one failure that you didn't account for is the possibility of an error being replicated across data centers. In that scenario, you may want to consider running Azure PaaS as part of the HP Cloud offering, in either a load-balanced or failover scenario.