Disaster recovery setup for the Azure Data Factory service

I just added the Azure Data Factory service to my subscription. During the setup I was able to select only one region; what happens if a disaster occurs in that region? How does ADF guarantee high availability?
Do we need to wait until the region recovers, or is there a setup similar to ADLS Gen2 (GRS and RA-GRS)?

I could not find any statement about disaster recovery in the official ADF documentation. Based on my research, ADF only provides a cloud-based data integration workflow, so DR is really determined by the data stores that ADF connects to. I can offer some clues for your reference:
1. The explanation of the Location option when you create an ADF instance (the location only determines where the factory's metadata is stored).
2. High availability for the Azure Integration Runtime, which is influenced by the Data Integration Unit (DIU) setting, i.e. the allocation of compute resources (see the sketch after this list): https://learn.microsoft.com/en-us/azure/data-factory/copy-activity-performance-features#data-integration-units
3. High availability for the Self-Hosted Integration Runtime, which improves if you register multiple nodes in the on-premises environment: https://learn.microsoft.com/en-us/azure/data-factory/create-self-hosted-integration-runtime#high-availability-and-scalability
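To make point 2 concrete, here is a minimal sketch (the activity and dataset names are hypothetical placeholders) of a copy activity definition with the dataIntegrationUnits property pinned explicitly; if the property is omitted, ADF chooses the DIU count dynamically.

```python
import json

# Hypothetical copy activity definition; only "dataIntegrationUnits" is the
# property discussed above -- the names and dataset references are placeholders.
copy_activity = {
    "name": "CopyLegacyData",                      # hypothetical activity name
    "type": "Copy",
    "inputs": [{"referenceName": "SourceDataset", "type": "DatasetReference"}],
    "outputs": [{"referenceName": "SinkDataset", "type": "DatasetReference"}],
    "typeProperties": {
        "source": {"type": "BlobSource"},
        "sink": {"type": "BlobSink"},
        "dataIntegrationUnits": 32                 # explicit DIU allocation
    }
}

print(json.dumps(copy_activity, indent=2))
```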

Related

How to implement "High Availability" for Azure Synapse Analytics?

Does Azure Synapse Analytics support geo-redundancy like Storage Accounts and Key Vault? If not, how do I implement high availability for Azure Synapse Analytics? I have the following components as part of the Azure Synapse Analytics solution:
SQL Dedicated Pool
SQL Serverless Pool
Spark Pool
Storage Account (ADLS)
Azure DevOps Git Repo
First, designing and documenting a Disaster Recovery plan is a project unto itself. I’ve been working on one for a client of mine using Synapse for several months part-time.
The first task is to define your Recovery Time Objective (RTO, meaning how long before your solution is back up in the event of a disaster) and your Recovery Point Objective (RPO, meaning how many minutes or hours of data you can afford to lose… and with analytics solutions you can usually reload from the source to catch up). If your RTO and RPO are low for an analytics solution (like 2 hours), then you probably need to spin up parallel environments in another region and load data to both environments in parallel. If your RTO and RPO are typical for an analytics solution (24-48 hours), then you can probably survive by ensuring backups are geo-redundant and restoring in the event of an outage. I would recommend you preconfigure your Synapse workspace and other infrastructure before the outage unless you have a trusted infrastructure-as-code solution. If your RPO and RTO are long (like 7 days), it's extremely unlikely an Azure service or region is going to be down for that long.
ADLS supports RA-GRS redundancy, so you could read all the files from the secondary endpoint in the paired region and copy them to another ADLS account in the secondary region. Unfortunately ADLS accounts don't yet support user-initiated failover.
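A minimal sketch of that read-from-secondary approach with the azure-storage-blob SDK might look like the following (the account name, container and local destination are placeholders; the -secondary suffix on the account URL is what routes reads to the paired region):

```python
import os

from azure.identity import DefaultAzureCredential
from azure.storage.blob import BlobServiceClient

# RA-GRS exposes a read-only secondary endpoint at <account>-secondary.blob.core.windows.net.
# The account name, container and local destination below are placeholders.
secondary = BlobServiceClient(
    account_url="https://mydatalake-secondary.blob.core.windows.net",
    credential=DefaultAzureCredential(),
)

container = secondary.get_container_client("raw")
for blob in container.list_blobs():
    local_path = os.path.join("dr-copy", blob.name)
    os.makedirs(os.path.dirname(local_path) or ".", exist_ok=True)
    with open(local_path, "wb") as f:
        f.write(container.download_blob(blob.name).readall())
    # From here, upload to an ADLS account in the secondary region with the
    # same SDK, AzCopy, or whatever copy tool your DR runbook uses.
```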
Dedicated SQL pools take a built-in geo-redundant backup once a day, but you can't control when it is taken. If that isn't acceptable, then you need to proactively create a user-defined restore point, restore it cross-region, and pause the restored SQL pool.
Synapse serverless SQL pools have no storage, so ensure you have a backup of the schema (views, permissions, external data sources, external tables, etc.) in source control or somewhere similar. The data itself will fail over with ADLS.
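If the serverless objects aren't already in source control, a hedged sketch like the one below (the server, database and authentication settings are placeholders) can dump the view definitions to .sql files for replay in another workspace; external data sources and external tables would need to be scripted similarly.

```python
import pyodbc

# Placeholders: serverless endpoint, database and authentication will differ
# in your environment.
conn = pyodbc.connect(
    "Driver={ODBC Driver 18 for SQL Server};"
    "Server=myworkspace-ondemand.sql.azuresynapse.net;"
    "Database=mydb;Authentication=ActiveDirectoryInteractive;"
)

cursor = conn.cursor()
cursor.execute("""
    SELECT s.name AS schema_name,
           v.name AS view_name,
           OBJECT_DEFINITION(v.object_id) AS definition
    FROM sys.views AS v
    JOIN sys.schemas AS s ON s.schema_id = v.schema_id
""")

# Write each view definition out as a .sql file that can be committed to Git
# and re-run against a workspace in another region.
for schema_name, view_name, definition in cursor.fetchall():
    with open(f"{schema_name}.{view_name}.sql", "w") as f:
        f.write(definition)
```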
For Spark pools, ensure you have your notebook artifacts in source control so you can run them in a different Synapse workspace in another region when needed. Document your cluster configs.
Write out a disaster recovery playbook and do a DR drill periodically (once a quarter or once a year).
Here is another author’s description of the DR plan for Synapse.

How to implement Disaster Recovery for Azure Data Factory?

Currently, we are working on disaster recovery scenarios for Azure Data Factory. Is there a reference that discusses a disaster recovery implementation for Azure Data Factory, possibly with an example in Terraform?
Unlike a database, where you physically store data, ADF is essentially just JSON configuration (code). So ideally,
you can export the ARM templates as a code backup and/or put the factory under source control via Git integration:
https://learn.microsoft.com/en-us/azure/data-factory/source-control
In case of a regional outage, you can recreate the same data factory with the same configuration in another region from this ARM template and your CI/CD setup (a minimal redeployment sketch follows the links below).
The links below can help:
https://learn.microsoft.com/en-us/answers/questions/138430/azure-data-factory-failures-and-disaster-recovery.html
https://www.linkedin.com/pulse/planning-azure-data-factory-disaster-recovery-arvind-periyasamy
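As promised, a minimal redeployment sketch assuming a recent azure-mgmt-resource SDK (deployments.begin_create_or_update); the subscription, resource group, file names and factoryName value are hypothetical, and the exported factory template normally parameterizes the factory name so it can be pointed at a factory in the new region (which you may need to create first):

```python
import json

from azure.identity import DefaultAzureCredential
from azure.mgmt.resource import ResourceManagementClient
from azure.mgmt.resource.resources.models import Deployment, DeploymentProperties

SUBSCRIPTION_ID = "<subscription-id>"          # placeholder
DR_RESOURCE_GROUP = "rg-adf-dr"                # hypothetical DR resource group

client = ResourceManagementClient(DefaultAzureCredential(), SUBSCRIPTION_ID)

# ARM template and parameters previously exported from the factory
# (or produced by the ADF Git/CI-CD publish step). File names are placeholders.
with open("arm_template.json") as f:
    template = json.load(f)
with open("arm_template_parameters.json") as f:
    parameters = json.load(f)["parameters"]

# Point the deployment at the factory in the DR region (hypothetical name).
parameters["factoryName"] = {"value": "my-adf-dr"}

deployment = Deployment(
    properties=DeploymentProperties(
        mode="Incremental",
        template=template,
        parameters=parameters,
    )
)

client.deployments.begin_create_or_update(
    DR_RESOURCE_GROUP, "adf-dr-redeploy", deployment
).result()
```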

Data Migration from Snowflake (on GCP Instance) to Snowflake (Azure Instance)

I am looking for some input on how to do a GCP-to-Azure cloud data migration.
Scenario -
I have a Snowflake instance configured on GCP (multiple databases holding legacy data) and another Snowflake instance configured on Azure (the DWH is created on this instance).
I want to move/copy the data of all the databases (including all child objects - schemas, tables, views, etc.) sitting on the GCP Snowflake instance to the Snowflake instance configured on Azure.
Can you please guide me on the best solution for such a data migration? Any steps or documentation links would be really helpful.
Many thanks - Minti
Please check the database replication mechanism, which can be used as a migration tool for moving a Snowflake account from one cloud platform to another. https://docs.snowflake.com/en/user-guide/database-replication-intro.html
Not something I've done before, to be honest, but if you didn't want to use external tools, one possible method would be to secure-share your GCP databases with your Azure Snowflake account.
You then might be able to create a new database that is a clone of this share (not sure if this is possible).
Most objects get cloned apart from stages and pipes, but tables, views, etc. should carry over.
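For reference, the provider-side share setup in that idea would look roughly like this (all identifiers are hypothetical placeholders, and, per the caveat above, consuming and cloning the share across clouds may not work without replication):

```python
import snowflake.connector

# Connect to the GCP (provider) account; connection parameters are placeholders.
conn = snowflake.connector.connect(
    account="myorg-gcp_account", user="MIGRATION_USER", password="***"
)
cur = conn.cursor()

# Standard share DDL; database, schema and consumer account names are hypothetical.
cur.execute("CREATE SHARE legacy_share")
cur.execute("GRANT USAGE ON DATABASE legacy_db TO SHARE legacy_share")
cur.execute("GRANT USAGE ON SCHEMA legacy_db.public TO SHARE legacy_share")
cur.execute("GRANT SELECT ON ALL TABLES IN SCHEMA legacy_db.public TO SHARE legacy_share")
cur.execute("ALTER SHARE legacy_share ADD ACCOUNTS = myorg.azure_account")
```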
This is a pretty easy process with a couple of prerequisites.
1. Make sure you have Organizations enabled on your GCP account. This feature allows you to self-provision Snowflake accounts on any cloud provider/region; open a support case to enable it.
Introduction to Organizations
2. Create a new account on Azure if you haven't already.
3. Enable replication on both accounts. This can be done while logged into the account with the ORGADMIN role.
4. Replicate your databases (a minimal sketch of the SQL follows these steps).
Note: this gives you a replica of the GCP account's databases in your Azure Snowflake account. If you want to permanently migrate your databases, you need to set up Failover/Failback. This is a Business Critical feature, but Snowflake support will enable it for lower editions until you can complete your migration, at which point they will disable it.
Replicating a Database to Another Account
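A minimal sketch of step 4 with the Snowflake Python connector (account, user and database identifiers are all hypothetical placeholders; the first statement runs against the GCP account and the rest against the Azure account):

```python
import snowflake.connector

# 1) On the source (GCP) account: enable replication of the database to the
#    Azure account. Identifiers below are placeholders.
src = snowflake.connector.connect(
    account="myorg-gcp_account", user="MIGRATION_USER", password="***"
)
src.cursor().execute(
    "ALTER DATABASE legacy_db ENABLE REPLICATION TO ACCOUNTS myorg.azure_account"
)

# 2) On the target (Azure) account: create a local replica and refresh it.
dst = snowflake.connector.connect(
    account="myorg-azure_account", user="MIGRATION_USER", password="***"
)
cur = dst.cursor()
cur.execute("CREATE DATABASE legacy_db AS REPLICA OF myorg.gcp_account.legacy_db")
cur.execute("ALTER DATABASE legacy_db REFRESH")
```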
There are two options.
Option 1: make use of the replication feature.
The high-level steps are:
a. Create the target account - you can use the Organizations feature available in Snowflake (enabled by Snowflake Support upon request).
b. Account-level objects should be created manually in the target account.
Note: the Failover feature is supported for accounts on the Business Critical edition and above. However, for account migration scenarios, Snowflake Support will enable this feature for a temporary period.
c. Replication - the links below can be referenced for a complete understanding of the process:
https://docs.snowflake.com/en/user-guide/database-replication-intro.html#introduction-to-database-replication-across-multiple-accounts
https://docs.snowflake.com/en/user-guide/database-replication-config.html#replicating-a-database-to-another-account
https://docs.snowflake.com/en/user-guide/database-failover-config.html#failing-over-databases-across-multiple-accounts
The link below gives an overview of the associated costs:
https://docs.snowflake.com/en/user-guide/database-replication-billing.html#understanding-billing-for-database-replication
Limitations:
https://docs.snowflake.com/en/user-guide/database-replication-intro.html#current-limitations-of-replication
Option 2: create the target account and use the data unloading and loading features (a rough sketch follows the links below):
https://docs.snowflake.com/en/user-guide-data-unload.html
https://docs.snowflake.com/en/user-guide-data-load.html
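A rough sketch of option 2 using an internal stage and COPY INTO (all identifiers are hypothetical; in practice you would usually unload to an external stage in cloud storage and move the files between clouds with GET/PUT or your own tooling):

```python
import snowflake.connector

# Unload from the GCP account to a user stage, then (after moving the files
# across clouds) load them into the Azure account. All identifiers below are
# placeholders.
src = snowflake.connector.connect(
    account="myorg-gcp_account", user="MIGRATION_USER", password="***"
)
src.cursor().execute(
    "COPY INTO @~/export/orders/ FROM legacy_db.public.orders "
    "FILE_FORMAT = (TYPE = CSV COMPRESSION = GZIP) OVERWRITE = TRUE"
)

dst = snowflake.connector.connect(
    account="myorg-azure_account", user="MIGRATION_USER", password="***"
)
dst.cursor().execute(
    "COPY INTO dwh_db.public.orders FROM @~/import/orders/ "
    "FILE_FORMAT = (TYPE = CSV COMPRESSION = GZIP)"
)
```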

Website deployed on Azure need Disaster Recovery and High Availability

I have a site deployed on Azure. I am using Cloud Services, Storage, and SQL Database.
I want to have high availability and disaster recovery for our Azure website.
My question is: how can we provide this on Azure? Is it already managed by Azure, or do we need to use additional Azure services for it?
Thanks in Advance
Well, I don't think DR is needed, since everything you use is a PaaS service, so if you trust Azure it will handle everything for you. If you don't... well, if you don't, it won't help you ;)
So, in my opinion, the best way to achieve what you are looking for is to use the built-in HA for Cloud Services (increase the instance count), while Storage and Azure SQL are HA by design.
If you really, really want DR, you can implement Traffic Manager with an extra copy of your Cloud Service in another Azure region and implement Storage replication and Azure SQL replication.
I won't give links to the documentation, as all of these can be found in under 5 minutes with any search engine.

Geo-replicate between two regions

I know that by default geo-replication is turned on for the Azure service. However, it only replicates between two places within the same geography. E.g. if I have chosen North Europe, the geo-replica will be located in West Europe. Is it possible to have the replica in the US instead?
I want to build the service such that my database is located in two or more regions, so that the response time when accessing the database is minimal. That is, a user in the US will access the database replica in the US, while a European user will access the replica in Europe.
First, you should know that geo-replication is turned on for the Azure STORAGE service, not for any other Azure services yet. You should also be aware that this geo-replication is mainly (and, as of today, only) for disaster recovery.
If you have to replicate a DB (Windows Azure SQL Database, a.k.a. WASD), you can use SQL Data Sync - the only known way as of today to sync Azure databases (either between different geographic regions, or between Azure and on-premises).
There is no support for Windows Azure Cloud Service geo-replication. If you need to geographically distribute your application, you have to manage cloud service deployments across different data centers on your own.
If this is the case, for Azure storage, I would suggest using a single Storage service for WRITE operations, but Azure CDN for READ operations. Otherwise it might get too complicated. Of course the chosen architectural approach will depend on the requirements of the app (and expected load).
Then you have to combine the different deployments with Azure Traffic Manager using a "Performance" routing setup.
EDIT (NOV 2014)
As of Q3 2014, Azure SQL Database also supports geo-replication, and Azure SQL Data Sync is a deprecated and removed service. Azure Storage replication continues to be offered in 3 different flavours: zone-redundant, geo-redundant, and geo-redundant with read access.
And there is still no option to replicate between geographies (i.e. from EU to US); replication is still only an option between paired regions (same geography).
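A minimal sketch of the SQL Database geo-replication mentioned in the edit, assuming active geo-replication and pyodbc (the statement runs against the master database of the primary logical server; the server, database, credentials and partner server name are placeholders, and the partner server must already exist):

```python
import pyodbc

# Placeholders: primary logical server, admin credentials, database name and
# partner server (the secondary logical server you created elsewhere).
conn = pyodbc.connect(
    "Driver={ODBC Driver 18 for SQL Server};"
    "Server=myserver-eu.database.windows.net;"
    "Database=master;UID=sqladmin;PWD=***;",
    autocommit=True,  # ALTER DATABASE cannot run inside a user transaction
)

# Active geo-replication: seed a readable secondary of the database on the
# partner server and keep it continuously replicated.
conn.cursor().execute(
    "ALTER DATABASE mydb "
    "ADD SECONDARY ON SERVER [myserver-us] "
    "WITH (ALLOW_CONNECTIONS = ALL)"
)
```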
I believe this is not possible today out of the box. You would need to do that on your own using data sync (for SQL Azure) and similar technologies (for Windows Azure Storage).
