Is it possible to use Azure Data Factory on-premise without data running through the cloud? - azure

Is it possible to use Azure Data Factory on-premises without letting the data run through the cloud? I know Talend has a product where the data is transferred only on our machines and not in the cloud.
I read the documentation on Microsoft.com but didn't find any useful information.

You may use a self-hosted integration runtime to transfer data entirely through your on-premises infrastructure, as long as both the data source and sink are on-premises.
However, the control flow will still happen through the cloud, even if the data itself never leaves your data center. For this reason, setting up a self-hosted integration runtime will still require outbound network access from your infrastructure to Azure.
Check out this piece of documentation for more information: https://learn.microsoft.com/en-us/azure/data-factory/create-self-hosted-integration-runtime?tabs=data-factory#command-flow-and-data-flow
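If you prefer to script this setup instead of using the portal, the self-hosted integration runtime can also be created through the Azure management SDK for Python. The sketch below is illustrative only: the subscription, resource group, factory, and IR names are placeholders, and you would still install the IR software on an on-premises machine and register it with one of the returned keys.

```python
# pip install azure-identity azure-mgmt-datafactory
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import (
    IntegrationRuntimeResource,
    SelfHostedIntegrationRuntime,
)

SUBSCRIPTION_ID = "<subscription-id>"  # placeholder
RESOURCE_GROUP = "my-rg"               # placeholder
FACTORY_NAME = "my-factory"            # placeholder

adf_client = DataFactoryManagementClient(DefaultAzureCredential(), SUBSCRIPTION_ID)

# Create the self-hosted IR definition in the factory (control plane only).
adf_client.integration_runtimes.create_or_update(
    RESOURCE_GROUP,
    FACTORY_NAME,
    "OnPremIR",
    IntegrationRuntimeResource(
        properties=SelfHostedIntegrationRuntime(description="Runs inside our data center")
    ),
)

# Fetch the key used to register the IR software installed on an on-premises
# machine; that machine only needs outbound connectivity to Azure.
keys = adf_client.integration_runtimes.list_auth_keys(
    RESOURCE_GROUP, FACTORY_NAME, "OnPremIR"
)
print(keys.auth_key1)
```

Once a node is registered with that key, copies between two on-premises stores run on the node itself, and only control traffic flows to Azure, as the documentation linked above describes.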

Related

Azure Self Hosted Integration Runtime - Security Concerns

If I want to host a self-hosted integration runtime to process data on my local network, and just use ADF as a data manipulation tool rather than something that submits data to the cloud, will the data ever be sent up to Azure to be processed?
It appears that the data would be processed solely on the nodes of the self-hosted integration runtime, but I wanted to confirm.
Reference: https://learn.microsoft.com/en-us/azure/data-factory/create-self-hosted-integration-runtime?tabs=data-factory
Azure Data Factory, including the Azure Integration Runtime and the self-hosted integration runtime, does not store any temporary data, cache data, or logs except for linked service credentials for cloud data stores, which are encrypted by using certificates.
The self-hosted IR is a hybrid scenario. You can implement additional security measures such as storing credentials in Key Vault, using different ports, and applying different data-encryption techniques (IPsec VPN or Azure ExpressRoute) during transit.
Refer: Security considerations for data movement in Azure Data Factory
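To make the Key Vault suggestion concrete, here is a rough sketch (Python management SDK) of a SQL Server linked service that pulls its password from Key Vault and connects through a self-hosted IR. The linked service, IR, and secret names are placeholders, not anything taken from the question.

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import (
    AzureKeyVaultSecretReference,
    IntegrationRuntimeReference,
    LinkedServiceReference,
    LinkedServiceResource,
    SqlServerLinkedService,
)

adf_client = DataFactoryManagementClient(DefaultAzureCredential(), "<subscription-id>")

# Assumes a Key Vault linked service named "AzureKeyVaultLS" already exists in
# the factory and holds a secret called "sql-password".
kv_password = AzureKeyVaultSecretReference(
    store=LinkedServiceReference(
        type="LinkedServiceReference", reference_name="AzureKeyVaultLS"
    ),
    secret_name="sql-password",
)

# On-premises SQL Server linked service: the credential is resolved from Key
# Vault and the connection goes through the self-hosted IR ("OnPremIR").
onprem_sql = LinkedServiceResource(
    properties=SqlServerLinkedService(
        connection_string="Server=onprem-sql01;Database=Sales;User ID=adf_reader;",
        password=kv_password,
        connect_via=IntegrationRuntimeReference(
            type="IntegrationRuntimeReference", reference_name="OnPremIR"
        ),
    )
)

adf_client.linked_services.create_or_update(
    "my-rg", "my-factory", "OnPremSqlServer", onprem_sql
)
```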

How do I set up an AutoResolve Integration Runtime in Azure Purview

I am trying to test out Azure Purview and connect it to an Azure SQL Server. Since the SQL Server is hosted in the cloud, I want to use the default AutoResolve Integration Runtime to get connected, but there is not one set up, nor an option to set up a new one. Has anyone else using Purview been able to set up (or needed to set up) an AutoResolve IR?
To connect to Azure SQL DB/MI you can directly go to the Azure Purview portal and register new data sources and select Azure SQL DB/MI.
In this article - Manage data sources in Azure Purview (Preview), you learn how to register new data sources, manage collections of data sources, and view sources in Azure Purview (Preview).
You only need to set up a self-hosted integration runtime to scan an on-premises SQL Server. If the data source is located in Azure, you don't need any integration runtime to scan it.
Reference: Register and scan an Azure SQL Database.
CHEEKATLAPRADEEP-MSFT is absolutely correct. To go a step further: since you know what an AutoResolve integration runtime is, you are probably using Azure Data Factory, so in addition to registering your SQL Server you can also link your Data Factory for data lineage purposes. Based on the pipelines that are executed, Purview will automatically build the data lineage.
Navigation to Link Data Factory
Data lineage created by linking Data Factory
Keep in mind that you will have to execute pipelines after linking for Purview to pick up the data lineage. Also, for sources or destinations that are not yet supported, it will not capture the lineage.

How can I use Azure Stream Analytics to use an on-premise SQL Server as an Output?

I'm following the instructions to set up App Insights to spool to SQL using Azure Stream Analytics, but I'm trying to deviate slightly to use an on-premise SQL server (that the web application already uses) over VPN.
At the point of adding the output, this is failing with:
Is it the case that IP addresses are not supported, or is it something more fundamental than that?
You are probably looking for a direct answer to your question, which Jean-Sébastien gives succinctly below. But here is an alternative architecture, if you haven't considered it already...
You could stream to a transient Azure SQL Database or to Blob storage (likely cheaper, depending on your workload), and then use Azure Data Factory, tunnelled via a self-hosted integration runtime, to "send" the data back to on-premises SQL.
Data Factory V2 also has blob triggers, so rather than needing a schedule it can pick up any new blobs in micro-batches.
I say "send" in quotation marks because the integration runtime actually creates an outbound connection from on-premises to Azure, yet gives you the capability for push-like data transfer.
If Data Factory proves useful, here is a guide to creating copy pipelines: https://learn.microsoft.com/en-us/azure/data-factory/tutorial-hybrid-copy-portal
That guide covers on-premises SQL to blob rather than the reverse, but it gives you a strong starting point.
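For the blob-trigger idea above, a rough sketch with the Python management SDK might look like the following; the pipeline name, container path, and storage account are assumptions, and the trigger still has to be started before it fires.

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import (
    BlobEventsTrigger,
    PipelineReference,
    TriggerPipelineReference,
    TriggerResource,
)

adf_client = DataFactoryManagementClient(DefaultAzureCredential(), "<subscription-id>")

# Fire the copy pipeline whenever Stream Analytics drops a new blob into the
# "streamoutput" container; the pipeline then pushes that micro-batch to the
# on-premises SQL Server through the self-hosted integration runtime.
trigger = TriggerResource(
    properties=BlobEventsTrigger(
        events=["Microsoft.Storage.BlobCreated"],
        blob_path_begins_with="/streamoutput/blobs/",
        scope=(
            "/subscriptions/<subscription-id>/resourceGroups/my-rg"
            "/providers/Microsoft.Storage/storageAccounts/mystreamstore"
        ),
        pipelines=[
            TriggerPipelineReference(
                pipeline_reference=PipelineReference(
                    type="PipelineReference", reference_name="CopyToOnPremSql"
                )
            )
        ],
    )
)

adf_client.triggers.create_or_update("my-rg", "my-factory", "NewBlobTrigger", trigger)
```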
At this time only Azure SQL Databases are supported in Azure Stream Analytics.
Sorry for the inconvenience.
Thanks,
JS (Azure Stream Analytics)

Copy files from on-prem to Azure

I'm new to the Azure ecosystem. I'm doing some research on copying data from on-prem to Azure. I found the following options:
AzCopy
Azure Data Factory (Copy Data Tool)
Data Management Gateway
Ours is a Microsoft shop, so I'm looking for tools that gel with the MS platform. Also, down the line, we want to automate the entire thing as much as we can, so I think Azure Storage Explorer is out of the question. Is there a preference among the above three, or are there better tools?
I think you are mixing things up: the Copy Data Tool is just an Azure Data Factory wizard for setting up simple data movement between resources. Azure Data Factory uses the Data Management Gateway to reach on-premises resources such as files and databases.
What you want to do can be done with Azure Data Factory. I recommend using version 2 (even in its preview form) because its authoring experience is easier to understand if you are new to the tool. You can graphically configure linked services, datasets, and pipelines from there.
I hope this helps; if you need further help, just ask away!
If you're already familiar with SSIS, there's also the option to run SSIS in ADF, which enables on-premises data access via a VNet.
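To show what the Data Factory route looks like outside the graphical authoring tool, here is a hedged sketch (Python management SDK) of the two linked services a file copy needs: the on-premises file share reached through the self-hosted integration runtime (the successor to the Data Management Gateway), and the Azure Storage destination. All names and credentials are placeholders.

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import (
    AzureStorageLinkedService,
    FileServerLinkedService,
    IntegrationRuntimeReference,
    LinkedServiceResource,
    SecureString,
)

adf_client = DataFactoryManagementClient(DefaultAzureCredential(), "<subscription-id>")

# Source: an on-premises file share, reached through the self-hosted
# integration runtime installed inside the local network.
onprem_files = LinkedServiceResource(
    properties=FileServerLinkedService(
        host="\\\\fileserver01\\exports",
        user_id="CORP\\adf_reader",
        password=SecureString(value="<password-or-key-vault-reference>"),
        connect_via=IntegrationRuntimeReference(
            type="IntegrationRuntimeReference", reference_name="OnPremIR"
        ),
    )
)

# Destination: an Azure Storage account.
blob_store = LinkedServiceResource(
    properties=AzureStorageLinkedService(
        connection_string=SecureString(
            value="DefaultEndpointsProtocol=https;AccountName=<account>;AccountKey=<key>"
        )
    )
)

adf_client.linked_services.create_or_update("my-rg", "my-factory", "OnPremFiles", onprem_files)
adf_client.linked_services.create_or_update("my-rg", "my-factory", "BlobStore", blob_store)
```

A copy pipeline (or the Copy Data Tool wizard) then just wires a file dataset on the first linked service to a blob dataset on the second; AzCopy remains a reasonable choice for one-off or scripted bulk uploads outside Data Factory.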

Azure Data Factory - moving data from On-Premise SQL to Azure SQL

A simple question: can this be achieved directly? I mean without Azure Blob storage in between (as shown in all the examples)? Can someone provide a code example, please?
Yes, you can do this directly. In fact, you can do direct copies between any of our supported sources and sinks; you don't have to pass through blob. To go from on-premises SQL Server to Azure SQL, you will need to set up a Data Management Gateway connector on your on-premises server. Then you use a linked service of type AzureSqlDatabase and an output dataset of type AzureSqlTable, instead of AzureBlob as shown in the example. The exact steps to set up the DMG, and the JSON code for the linked services, datasets, and pipelines, can be found in our documentation. We are also improving our UI in the near future to make these kinds of copy setups an easy, code-free experience.
https://azure.microsoft.com/en-us/documentation/articles/data-factory-sqlserver-connector/
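That answer describes the ADF v1 JSON authoring model. Purely as an illustration of the same direct copy in today's v2 Python SDK (dataset and linked service names are assumed to already exist, with the on-premises dataset bound to a self-hosted integration runtime), a minimal sketch might look like this:

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import (
    AzureSqlSink,
    CopyActivity,
    DatasetReference,
    PipelineResource,
    SqlServerSource,
)

adf_client = DataFactoryManagementClient(DefaultAzureCredential(), "<subscription-id>")

# A single copy activity that reads from an on-premises SQL Server dataset and
# writes straight to an Azure SQL dataset -- no blob staging in between. Both
# datasets (and their linked services) are assumed to exist already.
copy_activity = CopyActivity(
    name="CopyOnPremToAzureSql",
    inputs=[DatasetReference(type="DatasetReference", reference_name="OnPremSqlTable")],
    outputs=[DatasetReference(type="DatasetReference", reference_name="AzureSqlSalesTable")],
    source=SqlServerSource(),
    sink=AzureSqlSink(),
)

adf_client.pipelines.create_or_update(
    "my-rg", "my-factory", "DirectSqlCopy", PipelineResource(activities=[copy_activity])
)
```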
