Azure Data Factory - moving data from On-Premise SQL to Azure SQL - azure

A simple question: Can this be achieved directly? I mean without the Azure blob storage in between (as showed in all the examples)? Can someone provide some code example please.

yes, you can do this directly. In fact, you can do direct copies from any of our supported sources/sinks, you don't have to pass through blob. To go from on-prem SQL Server-->SQL azure, you will need to setup a Data Management Gateway connector on your on-prem server. Then, you use a linked service of type AzureStorage and an output dataset of type AzureSQLTable as the output dataset, instead of AzureBlob as is shown in the example. The exact steps to setup the DMG and the JSON code for the linked services, datasets, and pipelines can be found in our documentation. We are also improving our UI in the near future to make these kinds of copy setups an easy code-free experience.
https://azure.microsoft.com/en-us/documentation/articles/data-factory-sqlserver-connector/

Related

Is it possible to use Azure Data Factory on-premise without data running through the cloud?

is it possible to use Azure Data Factory on-premise without letting the data run through the cloud? I know Talend got a prodcut, where the data is transfered only on our machines and not in the cloud.
Read documentation on Microsoft.com but didnt find any useful information
You may use a self-hosted integration runtime to transfer data entirely through your on-premises infrastructure, as long as both the data source and sink are on-premises.
However, the control flow will still happen through the cloud, even if the data itself never leaves your data center. For this reason, setting up a self-hosted integration runtime will still require outbound network access from your infrastructure to Azure.
Check out this piece of documentation for more information: https://learn.microsoft.com/en-us/azure/data-factory/create-self-hosted-integration-runtime?tabs=data-factory#command-flow-and-data-flow

Custom Script in Azure Data Factory & Azure Databricks

I have a requirement to parse a lot of small files and load them into a database in a flattened structure. I prefer to use ADF V2 and SQL Database to accomplish it. The file parsing logic is already available using Python script and I wanted to orchestrate it in ADF. I could see an option of using Python Notebook connector to Azure Databricks in ADF v2. May I ask if I will be able to just run a plain Python script in Azure Databricks through ADF? If I do so, will I just run the script in Databricks cluster's driver only and might not utilize the cluster's full capacity. I am also thinking of calling Azure functions as well. Please advise which one is more appropriate in this case.
Just provide some ideas for your reference.
Firstly, you are talking about Notebook and Databricks which means ADF's own copy activity and Data Flow can't meet your needs, since as i know, ADF could meet just simple flatten feature! If you miss that,please try that first.
Secondly,if you do have more requirements beyond ADF features, why not just leave it?Because Notebook and Databricks don't have to be used with ADF,why you want to pay more cost then? For Notebook, you have to install packages by yourself,such as pysql or pyodbc. For Azure Databricks,you could mount azure blob storage and access those files as File System.In addition,i suppose you don't need many workers for cluster,so just configure it as 2 for max.
Databricks is more suitable for managing as a job i think.
Azure Function also could be an option.You could create a blob trigger and load the files into one container. Surely,you have to learn the basic of azure function if you are not familiar with it.However,Azure Function could be more economical.

How can I use Azure Stream Analytics to use an on-premise SQL Server as an Output?

I'm following the instructions to set up App Insights to spool to SQL using Azure Stream Analytics, but I'm trying to deviate slightly to use an on-premise SQL server (that the web application already uses) over VPN.
At the point of adding the output, this is failing with:
Is it the case that IP addresses are not supported, or is it something more fundamental than that?
You are probably looking for answers directly to your question, which Jean-Sébastien answers succinctly. But an alternative architecture, if you haven't considered it already...
You could stream to a transient Azure SQL Database or Blob storage (likely cheaper depending on your workload), and then use Azure Data Factory tunnelled via a Self-Hosted Data Factory Integration Runtime to "send" the data back to on-premise SQL.
Data Factory V2 also has blob triggers, so rather than needing a schedule it could pickup any new blobs in micro batches.
I say "send" in quotation marks as the Integration Runtime actually creates an outgoing connection to from on-premise to Azure, yet gives the capability for push-like data transfer.
If data factory proves useful, here is a guide creating copy pipelines: https://learn.microsoft.com/en-us/azure/data-factory/tutorial-hybrid-copy-portal
Albeit this guide is for on-prem sql to blob, but it gives you a stronger starting point.
At this time only Azure SQL Databases are supported in Azure Stream Analytics.
Sorry for the inconvenience.
Thanks,
JS (Azure Stream Analytics)

Generating and storing JSON files from the run-time parameters passed to Azure Data Factory v2 pipeline?

Can we create a file (preferably json) and store it in its supported storage sinks (like Blob, Azure Data Lake Service etc) using the parameters that are passed to Azure Data Factory v2 pipeline at run-time. I suppose it can be done via Azure Batch but it seems to be an overkill for such a trivial task. Is there a better way to do that?
Here are all the transform activities ADFv2 currently equips with, I'm afraid there isn't a direct way to create a file in ADFv2. You could leverage Custom activity to achieve this by running your customized code logic on an Azure Batch pool of virtual machines. Hope it'll help a little.

Copy files from on-prem to azure

I'm new to Azure eco system. I'm doing some research on copying data from on-prem to azure. I found following options:
AzCopy
Azure Data Factory (Copy Data Tool)
Data Management Gateway
Ours is a Microsoft shop; so, I'm looking for tools that gel with MS platform. Also, down the line, we want to automate the entire thing as much as we can. So, I think, Azure Storage Explorer is out of the question. Is there a preference among the above 3. Or, are there any better tools?
I think you are mixing stuff, Copy Data Tool is just an Azure Data Factory Wizard to make some sample data moving between resources. Azure Data Factory uses the data management gateway to get on premises resources such as files and databases.
What you want to do can be made with Azure Data Factory. I recommend using version 2 (even in its preview version) because its Authoring is easier to understand if you are new to the tool. You can graphically configure linked services, datasets and pipelines from there.
I hope this helped, if you need further help just ask away!
If you're already familiar with SSIS, there's also the option to use SSIS in ADF that enables on-prem data access via VNet.

Resources