How to implement Disaster Recovery for Azure Data Factory?

Currently, we are working on Disaster Recovery scenarios for Azure Data Factory. Is there a reference that discusses Disaster Recovery Implementation for Azure Data Factory? Possibly with an example from Terraform.

Unlike a database, which physically stores data, an ADF instance is essentially JSON configuration (code). So ideally,
you can export the ARM templates as a code backup and/or put ADF under source control through its Git integration:
https://learn.microsoft.com/en-us/azure/data-factory/source-control
In case of a regional outage, you can recreate the same ADF with the same configuration in another region from this ARM template and your CI/CD setup.
The links below can help:
https://learn.microsoft.com/en-us/answers/questions/138430/azure-data-factory-failures-and-disaster-recovery.html
https://www.linkedin.com/pulse/planning-azure-data-factory-disaster-recovery-arvind-periyasamy
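As a rough illustration of the "recreate from the ARM template" step, here is a minimal Python sketch that derives a parameter file for the secondary factory from the primary one. It assumes the exported template exposes a `factoryName` parameter, as ADF's exported templates typically do; the function name, factory names and the override mechanism are hypothetical and only for illustration.

```python
import copy
import json
from typing import Optional


def build_dr_parameters(primary_params: dict,
                        dr_factory_name: str,
                        overrides: Optional[dict] = None) -> dict:
    """Return a copy of an ARM parameter file retargeted at the DR factory.

    primary_params: parsed content of the primary factory's parameter file.
    dr_factory_name: name of the factory in the secondary region (assumption).
    overrides: other parameter values that differ in the DR region,
               e.g. region-specific linked service settings (assumption).
    """
    dr_params = copy.deepcopy(primary_params)  # never mutate the backup
    values = dr_params.setdefault("parameters", {})
    values["factoryName"] = {"value": dr_factory_name}
    for name, value in (overrides or {}).items():
        values[name] = {"value": value}
    return dr_params


if __name__ == "__main__":
    # Hypothetical primary parameter file, as it might be stored in Git.
    primary = {
        "$schema": "https://schema.management.azure.com/schemas/2015-01-01/deploymentParameters.json#",
        "contentVersion": "1.0.0.0",
        "parameters": {"factoryName": {"value": "adf-primary"}},
    }
    print(json.dumps(build_dr_parameters(primary, "adf-secondary"), indent=2))
```

The resulting file would then be deployed together with the exported template into a resource group and factory in the secondary region, using whatever deployment tooling your CI/CD pipeline already relies on.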

Related

Azure Table Storage Backup

In my Azure subscription I have a storage account with a lot of tables that contain important data.
As far as I know, Azure offers point-in-time backup for storage accounts and blobs, and geo-redundancy in the event of a failover. But I couldn't find anything about backing up Table storage.
The only way to do so seems to be AzCopy, which is fine and logical, but I couldn't make it work: I had permission issues even after assigning the Azure Blob Data Contributor role on my container.
So, as an alternative, I was wondering whether there is a way to implement this in Python: loop through all the tables in a specific container and copy them into another container.
Can anyone enlighten me on this matter please?
Did you set the Azure Storage firewall to allow access from all networks?
Python code is one way, but we can't design the code for you, and there isn't a ready-made example. That kind of request doesn't meet Stack Overflow's guidelines.
If you still can't figure it out with AzCopy, I would suggest using Data Factory to schedule a backup of the data from Table storage to another container.
Create a pipeline with a Copy activity to copy the data from Table storage. See this tutorial: Copy data to and from Azure Table storage by using Azure Data Factory.
Create a schedule trigger for the pipeline so that the jobs run automatically.
If the Table storage has many tables, the easiest way is to use the Copy Data tool.
Update:
Copy data tool source settings:
Sink settings: auto create the table in sink table storage
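For the Python route mentioned in the question, the loop over all tables could be sketched roughly as below. This is a minimal sketch, not a tested backup tool: it assumes the `azure-data-tables` package and two storage accounts (tables live directly in a storage account, not in a blob container), and the function names and connection strings are placeholders. Entities are copied one by one, which is only reasonable for modest data volumes.

```python
def copy_all_tables(source_service, dest_service):
    """Copy every entity of every table from source to destination.

    Both arguments are expected to behave like
    azure.data.tables.TableServiceClient.
    Returns a dict mapping table name -> number of entities copied.
    """
    copied = {}
    for table in source_service.list_tables():
        source_table = source_service.get_table_client(table.name)
        # Creates the backup table on first run, reuses it afterwards.
        dest_table = dest_service.create_table_if_not_exists(table.name)
        count = 0
        for entity in source_table.list_entities():
            # Upsert keeps repeated runs idempotent for unchanged keys.
            dest_table.upsert_entity(entity)
            count += 1
        copied[table.name] = count
    return copied


def backup_between_accounts(source_conn_str, dest_conn_str):
    """Wire copy_all_tables to two real storage accounts.

    The connection strings are placeholders supplied by the caller.
    Requires: pip install azure-data-tables
    """
    from azure.data.tables import TableServiceClient

    source = TableServiceClient.from_connection_string(source_conn_str)
    dest = TableServiceClient.from_connection_string(dest_conn_str)
    return copy_all_tables(source, dest)
```

Calling `backup_between_accounts(...)` with the source and backup connection strings returns the per-table entity counts, which can be logged from whatever scheduler runs the script.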
HTH.

How do I set up an AutoResolve Integration Runtime in Azure Purview

I am trying to test out Azure Purview and connect it to an Azure SQL Server. Since the SQL server is hosted in the cloud, I want to use the default AutoResolve Integration Runtime to connect, but there is not one set up, nor an option to set up a new one. Has anyone else using Purview been able to set up (or needed to set up) an AutoResolve IR?
To connect to Azure SQL DB/MI, you can go directly to the Azure Purview portal, register a new data source, and select Azure SQL DB/MI.
The article Manage data sources in Azure Purview (Preview) explains how to register new data sources, manage collections of data sources, and view sources in Azure Purview (Preview).
You only need to set up a self-hosted integration runtime when the data source to scan is an on-premises SQL Server.
If the data source is located on Azure, you don't need to set up any integration runtime to scan it.
Reference: Register and scan an Azure SQL Database.
CHEEKATLAPRADEEP-MSFT is absolutely correct. To go a step further: since you know what an AutoResolve integration runtime is, you are probably using Azure Data Factory, so in addition to registering your SQL Server, you can also link your Azure Data Factory for data lineage purposes. Based on the pipelines that are executed, Purview will automatically create the data lineage.
Navigation to Link Data Factory
Data lineage created by linking Data Factory
Keep in mind that you have to execute pipelines after linking for Purview to pick up the data lineage. Also, it will not capture lineage for sources or destinations that are not supported yet.

Is there a simple way to ETL from Azure Blob Storage to Snowflake EDW?

I have the following ETL requirements for Snowflake on Azure and would like to implement the simplest possible solution because of timeline and technology constraints.
Requirements :
Load CSV data (only a few MBs) from Azure Blob Storage into Snowflake Warehouse daily into a staging table.
Transform the loaded data above within Snowflake itself where transformation is limited to just a few joins and aggregations to obtain a few measures. And finally, park this data into our final tables in a Datamart within the same Snowflake DB.
Lastly, automate the above pipeline using a schedule OR using an event based trigger (i.e. steps to kick in as soon as file lands in Blob Store).
Constraints :
We cannot use Azure Data Factory to achieve this simplest design.
We cannot use Azure Functions to deploy Python Transformation scripts and schedule them either.
Also, I found that transformation using Snowflake SQL is a limited feature: it only allows certain things as part of the COPY INTO command and does not support JOINs and GROUP BY. Furthermore, although the following THREAD suggests that scheduling SQL is possible, that doesn't address my transformation requirement.
Regards,
Roy
Attaching the following Idea diagram for more clarity.
https://community.snowflake.com/s/question/0D50Z00009Z3O7hSAF/how-to-schedule-jobs-from-azure-cloud-for-loading-data-from-blobscheduling-snowflake-scripts-since-dont-have-cost-for-etl-tool-purchase-for-scheduling
https://docs.snowflake.com/en/user-guide/data-load-transform.html#:~:text=Snowflake%20supports%20transforming%20data%20while,columns%20during%20a%20data%20load.
You can create a Snowpipe on top of your Azure Blob Storage. Once it is created, it monitors the container, and each new file is loaded into your staging table as soon as it arrives. After the data has been copied into the staging table, you can schedule the transformation SQL with a Snowflake task.
For the Snowpipe creation steps for Azure Blob Storage, see the link below:
Snowpipe on Microsoft Azure blob storage
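To make the pipe-plus-task idea more concrete, here is a small Python sketch that only assembles the Snowflake statements involved. The substance is the SQL itself, which can be pasted into a worksheet, so nothing has to be deployed to Azure Functions. It assumes that the external stage and the notification integration already exist (as set up in the Snowpipe guide above); every object name, the CSV file format options and the example transformation are hypothetical.

```python
def snowpipe_and_task_sql(pipe, stage, staging_table,
                          notification_integration,
                          task, warehouse, cron, transform_sql):
    """Build the statements for an auto-ingest pipe and a scheduled task.

    All identifiers are supplied by the caller; nothing is executed here.
    Returns [CREATE PIPE, CREATE TASK, ALTER TASK ... RESUME].
    """
    create_pipe = (
        f"CREATE PIPE {pipe}\n"
        f"  AUTO_INGEST = TRUE\n"
        f"  INTEGRATION = '{notification_integration}'\n"
        f"  AS COPY INTO {staging_table}\n"
        f"     FROM @{stage}\n"
        f"     FILE_FORMAT = (TYPE = 'CSV' SKIP_HEADER = 1)"
    )
    # The task body is ordinary SQL, so JOINs and GROUP BY are allowed here
    # even though they are not allowed inside COPY INTO.
    create_task = (
        f"CREATE TASK {task}\n"
        f"  WAREHOUSE = {warehouse}\n"
        f"  SCHEDULE = 'USING CRON {cron} UTC'\n"
        f"  AS {transform_sql}"
    )
    # Tasks are created suspended and must be resumed to start running.
    resume_task = f"ALTER TASK {task} RESUME"
    return [create_pipe, create_task, resume_task]


if __name__ == "__main__":
    statements = snowpipe_and_task_sql(
        pipe="daily_csv_pipe",
        stage="azure_csv_stage",
        staging_table="stg_sales",
        notification_integration="AZURE_BLOB_NOTIFICATIONS",
        task="build_sales_mart",
        warehouse="etl_wh",
        cron="0 6 * * *",
        transform_sql=(
            "INSERT INTO mart_sales_by_region "
            "SELECT r.region_name, SUM(s.amount) "
            "FROM stg_sales s JOIN dim_region r ON s.region_id = r.region_id "
            "GROUP BY r.region_name"
        ),
    )
    for statement in statements:
        print(statement + ";\n")
```

The split mirrors the answer: the pipe covers the event-based load into staging, and the task covers the scheduled join/aggregation step inside Snowflake.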

Disaster recovery set up for Azure Data Factory service

I just added the Azure Data Factory service to my subscription. During the setup I was able to select only one region. What happens if a disaster occurs in this region? How does ADF guarantee high availability?
Do we need to wait until the region recovers, or is there a setup similar to the one in ADLS Gen2 (GRS & RA-GRS)?
I could not find any statement about disaster recovery in the official ADF documentation. Based on my research, ADF only provides a cloud-based data integration workflow, so DR actually depends on the data stores that ADF works with. Here are some pointers for your reference:
1. The statement on the Location option shown when you create an ADF:
2. High availability for the Azure integration runtime: it is affected by the DIU setting (allocation of compute resources): https://learn.microsoft.com/en-us/azure/data-factory/copy-activity-performance-features#data-integration-units
3. High availability for the self-hosted integration runtime: it is better to create multiple nodes in the on-premises environment: https://learn.microsoft.com/en-us/azure/data-factory/create-self-hosted-integration-runtime#high-availability-and-scalability

Copy files from on-prem to azure

I'm new to the Azure ecosystem. I'm doing some research on copying data from on-prem to Azure. I found the following options:
AzCopy
Azure Data Factory (Copy Data Tool)
Data Management Gateway
Ours is a Microsoft shop, so I'm looking for tools that fit well with the MS platform. Also, down the line, we want to automate the entire thing as much as we can, so I think Azure Storage Explorer is out of the question. Is there a preferred option among the three above, or are there better tools?
I think you are mixing things up. The Copy Data tool is just an Azure Data Factory wizard for setting up simple data movement between resources. Azure Data Factory uses the Data Management Gateway to reach on-premises resources such as files and databases.
What you want to do can be done with Azure Data Factory. I recommend using version 2 (even in its preview version) because its authoring experience is easier to understand if you are new to the tool. You can graphically configure linked services, datasets and pipelines from there.
I hope this helps. If you need further help, just ask!
If you're already familiar with SSIS, there's also the option to use SSIS in ADF that enables on-prem data access via VNet.
