What is the difference between active geo-replication, auto-failover groups, and read scale-out in Azure SQL?
Active geo-replication provides continuous replication of your primary database to a readable secondary database in a different Azure region.
Auto-failover groups build on geo-replication by adding automated failover management: if the primary server goes down, traffic is routed to the secondary automatically through the group's listener endpoints, so the application's connection string does not change. Read scale-out, by contrast, stays within a single region: it routes read-only connections (ApplicationIntent=ReadOnly) to a readable replica that already exists in the Premium/Business Critical tiers.
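To make the listener behaviour concrete, here is a minimal sketch in Python with pyodbc, assuming a hypothetical failover group named my-fog and placeholder credentials. The read-write listener always resolves to the current primary, and the read-only listener resolves to the secondary:

```python
import pyodbc

# Placeholder database and credentials - substitute your own.
BASE = (
    "DRIVER={ODBC Driver 18 for SQL Server};"
    "DATABASE=mydb;UID=myuser;PWD=mypassword;"
)

# Read-write listener: DNS always points at whichever server is primary,
# so this string survives a failover unchanged.
rw = pyodbc.connect(BASE + "SERVER=tcp:my-fog.database.windows.net,1433;")

# Read-only listener: routes to the secondary for read workloads.
ro = pyodbc.connect(
    BASE
    + "SERVER=tcp:my-fog.secondary.database.windows.net,1433;"
    + "ApplicationIntent=ReadOnly;"
)
```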
I have two Azure SQL managed instances in different regions and have configured a failover group between them. I have also configured transactional replication from an on-prem SQL Server / Azure IaaS VM to the primary managed instance. Now I want to test the failover group by failing over to the secondary and then back to the primary. What is the best or possible way to do this so that replication is not disturbed?
If geo-replication is enabled on a publisher or distributor instance in a failover group, the managed instance administrator must clean up all publications on the old primary and reconfigure them on the new primary after a failover occurs. Please refer to the Microsoft documentation for more information.
When you configure the subscriber, use the failover group read/write listener endpoint instead of the primary managed instance name.
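As a sketch of what that looks like (publication, server, and credential names here are placeholders; run it against the publisher), the subscription points at the listener rather than at a specific instance:

```python
import pyodbc

# Connect to the publisher; all names below are placeholders.
pub = pyodbc.connect(
    "DRIVER={ODBC Driver 18 for SQL Server};"
    "SERVER=tcp:my-publisher.example.com,1433;"
    "DATABASE=PublisherDb;UID=admin;PWD=secret;",
    autocommit=True,
)
pub.execute("""
    EXEC sp_addsubscription
        @publication       = N'MyPublication',
        @subscriber        = N'my-fog.database.windows.net',  -- FG listener, not the MI host name
        @destination_db    = N'SubscriberDb',
        @subscription_type = N'Push';
""")
```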
The following information is available in the Microsoft documentation on this subject:
"If a subscriber SQL Managed Instance is in a failover group, the publication should be configured to connect to the failover group listener endpoint for the subscriber managed instance. In the event of a failover, subsequent action by the managed instance administrator depends on the type of failover that occurred:
For a failover with no data loss, replication will continue working
after failover.
For a failover with data loss, replication will work as well. It will
replicate the lost changes again.
For a failover with data loss, but the data loss is outside of the
distribution database retention period, the SQL Managed Instance
administrator will need to reinitialize the subscription database."
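For the publisher-side cleanup described above, the rough shape of it is to drop the publication on the old primary and recreate it on the new one. A heavily hedged sketch with placeholder names; the exact sequence depends on your topology, so verify the steps against the replication documentation:

```python
import pyodbc

# Hypothetical post-failover cleanup on the old primary. All names are
# placeholders; recreate the publication on the new primary afterwards.
old_primary = pyodbc.connect(
    "DRIVER={ODBC Driver 18 for SQL Server};"
    "SERVER=tcp:old-primary-mi.example.net,1433;"
    "DATABASE=PublisherDb;UID=admin;PWD=secret;",
    autocommit=True,
)
old_primary.execute("""
    -- Remove subscriptions first, then the publication itself.
    EXEC sp_dropsubscription @publication = N'MyPublication',
                             @article     = N'all',
                             @subscriber  = N'all';
    EXEC sp_droppublication  @publication = N'MyPublication';
""")
```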
We have one web app running on Azure which pushes data to Azure Redis; an on-prem component then reads that data from Azure Redis and processes it.
Recently, due to an Azure region failure, that Azure Redis instance went down, and neither the web app nor my on-prem component could contact it.
How can I ensure zero downtime for my web app's access to Azure Redis?
Redis geo-replication doesn't solve my problem, as it is unidirectional and requires manual failover. My web app and on-prem component would also need to know both Redis endpoints and pick between them accordingly, which is not seamless.
Azure Redis doesn't support a cluster with shards in multiple regions.
So my requirement is: the web app and the on-prem component should both use a single cache/database endpoint, without any knowledge of how the cache/database is replicated. If the primary cache/db fails, that endpoint should automatically point to the replicated cache or DB.
Per the Azure documentation, Azure Redis doesn't seem to be the right fit for this requirement. Is there any other Azure component that fits it?
I had a look at Azure SQL with failover groups. Per the documentation: "you can configure a grace period that controls the time between the detection of the outage and the failover itself. It is possible that traffic manager initiates the endpoint failover before the failover group triggers the failover of the database. In that case the web application cannot immediately reconnect to the database. But the reconnections will automatically succeed as soon as the database failover completes." We can set that grace period to 1 hour (minimum).
So does that mean that with Azure SQL too, if one db server fails, my web application will not be able to write to the db for at least 1 hour? Is my understanding correct?
Azure SQL and Azure Cosmos DB both support a single endpoint and HA across regions; you might want to look into those.
Those are not caches, but they do allow for a single endpoint and failover.
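To make the single-endpoint behaviour concrete for Azure SQL, here is a minimal retry sketch against a failover group's read-write listener (Python with pyodbc; the group name and credentials are placeholders). The application never changes its endpoint: connections fail while the failover is in progress and start succeeding again once it completes:

```python
import time
import pyodbc

# Placeholder failover group listener and credentials.
CONN_STR = (
    "DRIVER={ODBC Driver 18 for SQL Server};"
    "SERVER=tcp:my-fog.database.windows.net,1433;"
    "DATABASE=mydb;UID=myuser;PWD=mypassword;"
)

def connect_with_retry(retries: int = 30, delay: int = 10) -> pyodbc.Connection:
    """Retry until the listener's DNS points at the new primary."""
    for _ in range(retries):
        try:
            return pyodbc.connect(CONN_STR, timeout=15)
        except pyodbc.Error:
            time.sleep(delay)
    raise RuntimeError("database did not come back within the retry window")
```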
If I go into the Azure portal, open a SQL Azure db, and click on Geo-Replication, I can select another data center to host a secondary database. I can configure this as "readable." With that done, do I automatically get failover?
So, for example, if my primary db is in Central US and I configure geo-replication to US East 2, will anything automatically fail my db over to US East 2 if there is an error in Central US? Or do I have to initiate the failover through the portal or some code/monitoring solution? And would I have to update my connection string, or does the Azure infrastructure manage this for me?
I've reviewed a few docs below about this but am looking for some more input:
- https://azure.microsoft.com/en-us/documentation/articles/sql-database-designing-cloud-solutions-for-disaster-recovery/?rnd=1
- https://azure.microsoft.com/en-us/documentation/articles/sql-database-geo-replication-failover-portal/
- https://azure.microsoft.com/en-us/documentation/articles/sql-database-geo-replication-overview/
"Do I have to initiate the failover through the portal or some code/monitoring solution?"

Yes, you have to initiate the failover explicitly. There is no automatic failover if the primary goes offline.
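If you want to script the failover rather than click through the portal, one option is the documented ALTER DATABASE ... FAILOVER statement, run in master on the server hosting the secondary. A minimal sketch with placeholder names:

```python
import pyodbc

# Connect to master on the SECONDARY server; names are placeholders.
secondary = pyodbc.connect(
    "DRIVER={ODBC Driver 18 for SQL Server};"
    "SERVER=tcp:secondary-server.database.windows.net,1433;"
    "DATABASE=master;UID=myadmin;PWD=mypassword;",
    autocommit=True,
)
# Planned failover: promotes the secondary after it fully synchronizes.
secondary.execute("ALTER DATABASE [mydb] FAILOVER;")
# If the primary is unreachable, the forced variant accepts data loss:
# secondary.execute("ALTER DATABASE [mydb] FORCE_FAILOVER_ALLOW_DATA_LOSS;")
```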
"Would I have to update my connection string or does the Azure infrastructure manage this for me?"

You would have to update the connection string explicitly as well.
The Failover and DR drill sections of this link should provide the necessary info; it also talks about keeping firewall rules and users in sync between primary and secondary: https://azure.microsoft.com/en-us/blog/spotlight-on-new-capabilities-of-azure-sql-database-geo-replication/
We are trying to use SQL Azure geo-replication for load balancing; 95% of our SQL transactions are read-only and 5% require writes.
With SQL Azure geo-replication we can have only one (primary) database as read-write; the rest are read-only. So we need to separate the RW and RO traffic. Is there an easy way to use multiple connection strings, one for RW and one for RO?
Assuming you have a web app deployed for each replica as described in Pattern 2, you can create a round-robin TM profile (if replicas are in the same region) or performance profile (if replicas are in different regions). This way all connections to the TM endpoint will be routed accordingly. Since you have 5% writes, you should also create a failover profile with a different endpoint. The latter will route all connections to the same web app (and the primary db). You can have up to 4 read-only replicas.
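At the application level, the split can be as simple as holding two connection strings and choosing by intent. A minimal sketch with placeholder server names; the geo-secondary is directly readable, and ApplicationIntent=ReadOnly is the standard way to declare read intent:

```python
import pyodbc

# Placeholder servers: the read-write primary and a read-only geo-replica.
PRIMARY = (
    "DRIVER={ODBC Driver 18 for SQL Server};"
    "SERVER=tcp:primary-server.database.windows.net,1433;"
    "DATABASE=mydb;UID=myuser;PWD=mypassword;"
)
REPLICA = (
    "DRIVER={ODBC Driver 18 for SQL Server};"
    "SERVER=tcp:replica-server.database.windows.net,1433;"
    "DATABASE=mydb;UID=myuser;PWD=mypassword;"
    "ApplicationIntent=ReadOnly;"
)

def get_connection(readonly: bool) -> pyodbc.Connection:
    """Route the ~95% read workload to the replica, writes to the primary."""
    return pyodbc.connect(REPLICA if readonly else PRIMARY)
```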
AFAIK, Amazon AWS offers so-called "regions" and "availability zones" to mitigate the risk of a partial or complete datacenter outage. It looks like if I have copies of my application in two regions and one region goes down, my application can continue working as if nothing happened.
Is there something like that with Windows Azure? How do I address the risk of a catastrophic datacenter outage with Windows Azure?
Within a single data center, your Windows Azure application has the following benefits:
- Going beyond one compute instance, your VMs are divided into fault domains across different physical areas. This way, even if an entire server rack went down, you'd still have compute running somewhere else.
- With Windows Azure Storage and SQL Azure, storage is triple-replicated. This is not eventual replication - when a write call returns, at least one replica has been written to.
Ok, that's the easy stuff. What if a data center disappears? Here are the features that will help you build DR into your application:
- For SQL Azure, you can set up Data Sync. This facility synchronizes your SQL Azure database with either another SQL Azure database (presumably in another data center) or an on-premises SQL Server database. More info here. Since this feature is still considered a preview feature, you have to go here to set it up.
- For Azure storage (tables, blobs), you'll need to handle replication to a second data center yourself, as there is no built-in facility today. This can be done with, say, a background task that pulls data every hour and copies it to a storage account somewhere else (see the sketch after this list). EDIT: per Ryan's answer, there is data geo-replication for blobs and tables. HOWEVER: aside from a mention in this blog post in December, and possibly at PDC, this is not live.
- For compute availability, you can set up Traffic Manager to load-balance across data centers. This feature is currently in CTP - visit the Beta area of the Windows Azure portal to sign up.
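For the storage-replication item above, a background copy job could look like this sketch, written against today's azure-storage-blob package (connection strings and container name are placeholders; it assumes the container already exists in the secondary account, and copying from a private source additionally requires a SAS on the source URL):

```python
from azure.storage.blob import BlobServiceClient

# Placeholder connection strings for the two storage accounts.
SRC = BlobServiceClient.from_connection_string("<primary-account-connection-string>")
DST = BlobServiceClient.from_connection_string("<secondary-account-connection-string>")

def replicate_container(container: str) -> None:
    """Kick off server-side copies of every blob into the secondary account."""
    src_container = SRC.get_container_client(container)
    for blob in src_container.list_blobs():
        source_url = f"{src_container.url}/{blob.name}"  # needs a SAS if private
        DST.get_blob_client(container, blob.name).start_copy_from_url(source_url)

# Run periodically, e.g. hourly, as suggested above.
replicate_container("myimages")
```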
Remember that, with DR, whether in the cloud or on-premises, there are additional costs, such as bandwidth between data centers, storage costs for duplicate data in a secondary data center, and compute instances in additional data centers.
Just like with on-premises environments, DR needs to be carefully thought out and implemented.
David's answer is pretty good, but one piece is incorrect. For Windows Azure blobs and tables, your data is actually geographically replicated today between sub-regions (e.g. North US and South US). This is an async process with a target lag of about 10 minutes. The process is also out of your control and is purely for data center loss. In total, your data is replicated 6 times across 2 different data centers when you use Windows Azure blobs and tables (impressive, no?).
If a data center were lost, they would flip your DNS for blob and table storage over to the other sub-region, and your account would appear online again. This is true only for blobs and tables (not queues, not SQL Azure, etc.).
So, for a true disaster recovery, you could use Data Sync for SQL Azure and Traffic Manager for compute (assuming you run a hot standby in another sub-region). If a datacenter was lost, Traffic Manager would route to the new sub-region and you would find your data there as well.
The one failure mode not accounted for here is an error being replicated across data centers. In that scenario, you may want to consider running Azure PaaS as part of the HP Cloud offering, in either a load-balanced or failover configuration.