We are working through some DR scenarios. Is the management plane / metadata for a Synapse workspace geo-redundant? E.g. the workspace definition, the list of notebooks, the currently deployed pipelines, monitoring run history etc? I appreciate the data has backups for the deployed region etc but I am interested in the actual workspace metadata/definition.
The scenario we are interested in is this: if a local Azure data center ceases to operate, say due to a natural disaster, will the Synapse workspace continue to operate and show the deployed workbooks, pipelines, run history of previous jobs etc? From the documentation it seems like the data will need to be restored and possibly linked services reconnected, but I can't find anything on the workspace metadata / management plane.
If anyone could point me to docs or add clarity, that would be great.
Related
Does Azure Synapse Analytics support geo-redundancy like Storage Account and Key Vault? If not, how do I implement high availability for Azure Synapse Analytics? I have the following components as part of the Azure Synapse Analytics solution:
SQL Dedicated Pool
SQL Serverless Pool
Spark Pool
Storage Account(ADLS)
Azure DevOps Git Repo
First, designing and documenting a Disaster Recovery plan is a project unto itself. I’ve been working on one for a client of mine using Synapse for several months part-time.
The first task is to define your Recovery Time Objective (RTO, meaning how long before your solution is back up in the event of a disaster) and your Recovery Point Objective (RPO, meaning how many minutes or hours of data you can afford to lose… and with analytics solutions you can usually reload from the source to catch up). If your RTO and RPO are low for an analytics solution (like 2 hours) then you probably need to spin up parallel environments in another region and load data to both environments in parallel. If your RTO and RPO are typical for an analytics solution (24-48 hours) then you can probably get by with ensuring backups are geo-redundant and restoring in the event of an outage. I would recommend preconfiguring your Synapse workspace and other infrastructure before any outage, unless you have an infrastructure-as-code solution you trust. If your RPO and RTO are long (like 7 days), it's extremely unlikely an Azure service or region is going to be down for that long.
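The tiers above can be sketched as a small decision helper. This is only an illustration: the enum names and the exact cutoffs (24 hours and 7 days) are assumptions chosen to match the examples in the paragraph, not official guidance.

```python
from enum import Enum


class DrStrategy(Enum):
    PARALLEL_ENVIRONMENTS = "load data into parallel environments in two regions"
    GEO_REDUNDANT_RESTORE = "restore from geo-redundant backups after an outage"
    WAIT_FOR_RECOVERY = "wait for the service or region to recover"


def choose_dr_strategy(rto_hours: float, rpo_hours: float) -> DrStrategy:
    """Map RTO/RPO targets to a coarse DR approach.

    The cutoffs are illustrative assumptions, not fixed rules.
    """
    # The stricter of the two objectives drives the design.
    tightest = min(rto_hours, rpo_hours)
    if tightest < 24:
        # Low targets (the text's example is about 2 hours).
        return DrStrategy.PARALLEL_ENVIRONMENTS
    if tightest < 7 * 24:
        # Typical analytics targets (24-48 hours).
        return DrStrategy.GEO_REDUNDANT_RESTORE
    # Long targets (around 7 days or more).
    return DrStrategy.WAIT_FOR_RECOVERY
```

For example, `choose_dr_strategy(rto_hours=48, rpo_hours=24)` falls in the middle tier, where geo-redundant backups plus a tested restore procedure are probably enough.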
ADLS supports RA-GRS redundancy, so you could read all the files from the secondary endpoint in its paired region and copy them to another ADLS account in the secondary region. Unfortunately ADLS accounts don't yet support user-initiated failover.
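For reference, the read-only secondary endpoint of an RA-GRS account is the primary endpoint with `-secondary` appended to the account name. A minimal sketch of building that URL follows; the function name and the account name in the usage note are hypothetical, and the `core.windows.net` suffix assumes the public Azure cloud.

```python
def secondary_endpoint(account_name: str, service: str = "dfs") -> str:
    """Return the RA-GRS read-only secondary endpoint for a storage account.

    `service` is "dfs" for the ADLS Gen2 endpoint or "blob" for the Blob endpoint.
    Assumes the public Azure cloud DNS suffix.
    """
    if service not in ("dfs", "blob"):
        raise ValueError(f"unsupported service: {service!r}")
    # Azure exposes the secondary as <account>-secondary.<service>.core.windows.net
    return f"https://{account_name}-secondary.{service}.core.windows.net"
```

With a hypothetical account called `mydatalake`, `secondary_endpoint("mydatalake")` gives `https://mydatalake-secondary.dfs.core.windows.net`, which a copy tool can use as the read source during an outage of the primary region.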
Dedicated SQL pools support built-in geo-redundant backups once a day, but you can't control when they are taken. If this isn't acceptable, then you need to proactively create a user-defined restore point, restore it cross-region, and pause the restored SQL pool.
Synapse serverless SQL pools have no storage, so ensure you have a backup of the schema (views, permissions, external data sources, external tables, etc.) in source control or somewhere similar. The data will fail over with ADLS.
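One way to get view definitions into source control is to query the catalog and write one `.sql` file per object. The sketch below covers only the file-writing half and is an assumption about how you might organize it: `export_definitions` is a hypothetical helper, and the query assumes `sys.views`, `sys.schemas` and `sys.sql_modules` are available in your pool. Running the query (for example through an ODBC connection) is left out.

```python
from pathlib import Path
from typing import Iterable, List, Tuple

# Query you might run against the serverless pool to fetch view definitions.
# Assumption: these catalog views are exposed in your environment.
VIEW_DEFINITIONS_SQL = """
SELECT s.name AS schema_name, v.name AS view_name, m.definition
FROM sys.views AS v
JOIN sys.schemas AS s ON s.schema_id = v.schema_id
JOIN sys.sql_modules AS m ON m.object_id = v.object_id
"""


def export_definitions(
    rows: Iterable[Tuple[str, str, str]], out_dir: str
) -> List[Path]:
    """Write each (schema, object, definition) row to <schema>.<object>.sql."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for schema_name, object_name, definition in rows:
        path = out / f"{schema_name}.{object_name}.sql"
        # Normalize the trailing newline so diffs in source control stay clean.
        path.write_text(definition.rstrip() + "\n", encoding="utf-8")
        written.append(path)
    return written
```

Committing the output directory to the Git repo gives you a script set you can replay against a serverless pool in another workspace. Permissions, external data sources and external tables would need their own queries.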
For Spark pools, ensure you have your notebook artifacts in source control; you can always run them in a different Synapse workspace in another region when needed. Document your cluster configs.
Write out a disaster recovery playbook and do a DR drill periodically (once a quarter or once a year).
Here is another author’s description of the DR plan for Synapse.
I am attempting to spin up an Azure Synapse SQL pool in Terraform. From the documentation at https://registry.terraform.io/providers/hashicorp/azurerm/latest/docs/resources/synapse_sql_pool, it appears you have to use a Synapse workspace, which also includes Data Factory integration, Power BI, etc.
Right now we just want the data warehouse, not all the other bells and whistles. As you can see within the Azure Portal, you are free to spin up a Synapse Analytics DW with or without a workspace (see the right image in the box, "formerly SQL DW"):
When you spin that up, you simply have a standalone DW...
Any insight on getting just the data warehouse, as you can in the portal, without the workspace and related components?
I am not a Terraform guy. As for Synapse, you are referring to the new one that is in preview. The new one has the workspace, which supports SQL pools, Spark clusters and pipelines. Although they are supported, they are not created when you deploy a Synapse workspace.
So you can go ahead and create the workspace and one SQL pool, and you will get what you're looking for: the data warehouse engine, named SQL pool.
Some extra notes: there are 2 types of SQL data warehouse in Synapse Analytics: SQL Pools and SQL on demand. The first one is provisioned computing and is the traditional one with all the features. SQL on demand is still in preview, doesn't have all the features and is charged by the terabyte processed by your queries.
Happy data crunching!
I'm new to Azure. Do we have any default job to perform a database backup from Azure Table storage?
Do we have any default job to perform a database backup from Azure Table storage?
No, there is no default job for this.
There is huge demand to back up data directly from Azure Blob/Table storage accounts. To meet compliance today, users have to move the data to a VM and then back it up. A direct backup would simplify the current process, meet the compliance and BCDR requirements, and also save on cost.
You can vote for this feedback item to help get it prioritized, or you can refer to this issue to back up your table storage manually.
I just added the Azure Data Factory service to my subscription. During setup I was able to select only one region. What happens if a disaster occurs in this region? How does ADF guarantee high availability?
Do we need to wait for recovery, or is there a setup similar to ADLS Gen2 (GRS & RA-GRS)?
I could not find any statement about disaster recovery in the official ADF documentation. Based on my research, ADF only provides a cloud-based data integration workflow, so DR actually depends on the data stores that ADF connects to. Here are some pointers for your reference:
1. The statement about the Location option shown when you create an ADF instance:
2. High availability for the Azure Integration Runtime is affected by the DIU (Data Integration Unit) setting, i.e. the allocation of compute resources: https://learn.microsoft.com/en-us/azure/data-factory/copy-activity-performance-features#data-integration-units
3. High availability for the Self-Hosted Integration Runtime is better if you create multiple nodes in the on-premises environment: https://learn.microsoft.com/en-us/azure/data-factory/create-self-hosted-integration-runtime#high-availability-and-scalability
I want to confirm our understanding of how our Azure SQL databases are being backed up to enable point-in-time restore. We have not currently configured geo-replication to have the database available in another region. We may in the future as some data analysis is done. But my understanding is that the database is still being backed up to a geo-redundant location, so I could do a geo-restore if there was an issue with the data center that houses my SQL database. Is that correct, or do I need to enable geo-replication and pay for a second database in order to have a disaster recovery option if the data center had an issue?
To clarify further: I think this article states what I'm saying in the Geo-Restore section.
https://azure.microsoft.com/en-us/documentation/articles/sql-database-business-continuity/
Thanks
Yes, all databases have a geo-replicated copy for disaster recovery purposes. For more details, please see the following: https://azure.microsoft.com/en-us/blog/azure-sql-database-geo-restore/
Geo-restore uses the same technology as point in time restore with one
important difference. It restores the database from a copy of the most
recent daily backup in geo-replicated blob storage (RA-GRS). For each
active database, the service maintains a backup chain that includes a
weekly full backup, multiple daily differential backups, and
transaction logs saved every 5 minutes. These blobs are geo-replicated;
this guarantees that daily backups are available even after a massive
failure in the primary region.
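To make the backup chain concrete, here is a minimal sketch of how a restore to a point in time picks backups from a full/differential/log chain. It is a generic illustration of the technique, not Azure's internal implementation. The `Backup` type and `restore_chain` function are hypothetical, and the sketch simplifies by stopping at the last log backup taken at or before the target time.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass(frozen=True)
class Backup:
    kind: str  # "full", "diff" or "log"
    taken_at: datetime


def restore_chain(backups: List[Backup], target: datetime) -> List[Backup]:
    """Return the backups to apply, in order, to reach `target`."""
    # Only backups taken at or before the target can be used.
    usable = sorted(
        (b for b in backups if b.taken_at <= target), key=lambda b: b.taken_at
    )
    fulls = [b for b in usable if b.kind == "full"]
    if not fulls:
        raise ValueError("no full backup at or before the target time")
    full = fulls[-1]  # start from the most recent full backup
    # A differential is cumulative, so only the latest one after the full is needed.
    diffs = [b for b in usable if b.kind == "diff" and b.taken_at > full.taken_at]
    base = diffs[-1] if diffs else full
    # Then replay every log backup taken after that base.
    logs = [b for b in usable if b.kind == "log" and b.taken_at > base.taken_at]
    return [full] + ([diffs[-1]] if diffs else []) + logs
```

A geo-restore follows the same idea but reads from the geo-replicated copies of these blobs, so how recent the restored data is depends on what had been replicated before the outage.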
Yes, Azure SQL databases are automatically backed up to a different Azure data center using geo-replication. This is an automatic feature of Azure SQL that is baked into the service offering.
Here's a blog post with further information about Azure SQL Data Replication:
https://azure.microsoft.com/en-us/blog/azure-sql-database-standard-geo-replication/