SFTP support for Azure Data Factory

I have an SFTP server with my files uploaded to it. I want to use Azure Data Factory to connect to the SFTP server, read the files, and save them in Azure Blob storage.
Is there a way to do this using an Azure pipeline/activity configuration?

ADF has recently added SFTP support. Refer to
https://learn.microsoft.com/en-us/azure/data-factory/data-factory-sftp-connector

EDIT: Data Factory now has native support for SFTP.
Original answer: at the time, it didn't appear that Data Factory supported SFTP natively; however:
If you need to move data to/from a data store that Copy Activity doesn't support, use a custom activity in Data Factory with your own logic for copying/moving data. For details on creating and using a custom activity, see Use custom activities in an Azure Data Factory pipeline.
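If you do end up rolling your own copy logic inside a custom activity, a minimal sketch of what that logic could look like in Python, using paramiko and azure-storage-blob, is below. The host, credentials, container and folder names are placeholders, not values from the question.
```python
# Hypothetical sketch: copy files from an SFTP server into Azure Blob storage.
# Host, credentials, container and folder names are placeholders.
import paramiko
from azure.storage.blob import BlobServiceClient

SFTP_HOST = "sftp.example.com"
SFTP_USER = "user"
SFTP_PASS = "password"
REMOTE_DIR = "/upload"
BLOB_CONN_STR = "<storage-account-connection-string>"
CONTAINER = "landing"

blob_service = BlobServiceClient.from_connection_string(BLOB_CONN_STR)
container_client = blob_service.get_container_client(CONTAINER)

transport = paramiko.Transport((SFTP_HOST, 22))
transport.connect(username=SFTP_USER, password=SFTP_PASS)
sftp = paramiko.SFTPClient.from_transport(transport)

try:
    for name in sftp.listdir(REMOTE_DIR):
        # Stream each remote file straight into a block blob of the same name.
        with sftp.open(f"{REMOTE_DIR}/{name}", "rb") as remote_file:
            container_client.upload_blob(name=name, data=remote_file, overwrite=True)
finally:
    sftp.close()
    transport.close()
```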
Also, Azure Logic Apps does support SFTP natively, which you could use to drop files into Blob storage; however, I'm guessing (I'm soon to find out) that you'll lose the knowledge that the SFTP server failing is a root cause when monitoring the factory.
SFTP is a planned feature on the Azure feedback portal; if it is important to you, I would recommend voting it up.

Related

Ingest Data From On-Premise SFTP Folder To Azure SQL Database (Azure Data Factory)

Use case: I have data files of varying size copied to a specific SFTP folder periodically (daily/weekly). All these files need to be validated and processed, then written to the related tables in Azure SQL. The files are in CSV format and are effectively flat text files, each corresponding directly to a specific table in Azure SQL.
Implementation:
Planning to use Azure Data Factory. So far, from my reading, I can see that I can have a Copy pipeline to copy the data from the on-premises SFTP server to Azure Blob storage, and that an SSIS pipeline can copy data from an on-premises SQL Server to Azure SQL.
But I don't see an existing solution that achieves what I am looking for. Can someone provide some insight on how I can achieve this?
I would try to use Data Factory with a Data Flow to validate/process the files (if possible for your case). If the validation is too complex or depends on other components, I would use Azure Functions and write the resulting files to Blob storage. The Copy activity is also able to import the resulting CSV files into SQL Server.
You can create a pipeline that does the following:
1) Copy data: copy the files from SFTP to Blob storage.
2) Do the data processing/validation via a Data Flow.
3) Sink the results directly to the SQL table (via the Data Flow sink).
Of course, you need an integration runtime that can access the on-premises server (if it is not publicly accessible), either by using VNet integration or by using the self-hosted IR.
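If you also want to start or monitor that pipeline from code rather than the portal, a minimal sketch with the Python management SDK might look like the following; the subscription, resource group, factory and pipeline names are placeholders.
```python
# Hypothetical sketch: start and monitor an ADF pipeline run with the Python SDK.
# Subscription, resource group, factory and pipeline names are placeholders.
import time

from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient

SUBSCRIPTION_ID = "<subscription-id>"
RESOURCE_GROUP = "my-rg"
FACTORY_NAME = "my-factory"
PIPELINE_NAME = "sftp-to-sql"

adf = DataFactoryManagementClient(DefaultAzureCredential(), SUBSCRIPTION_ID)

run = adf.pipelines.create_run(RESOURCE_GROUP, FACTORY_NAME, PIPELINE_NAME)
print(f"Started run {run.run_id}")

# Poll until the run reaches a terminal state (Succeeded / Failed / Cancelled).
while True:
    status = adf.pipeline_runs.get(RESOURCE_GROUP, FACTORY_NAME, run.run_id).status
    if status not in ("Queued", "InProgress"):
        break
    time.sleep(30)
print(f"Pipeline finished with status: {status}")
```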

How to fetch or read a .bak file in Azure Data Factory?

I am trying to transfer data from a .bak file in an Azure storage account to an Azure SQL database with ADF. I know ADF doesn't support this, but is there any workaround?
You would have to have an instance of the SQL Server box product (e.g. SQL Server 2019) running on a virtual machine (VM), or an on-premises SQL instance, for this to work. Restore the backup to that SQL instance via a Stored Procedure activity in Azure Data Factory (ADF) and then use the Copy activity to load the data. You will need a self-hosted integration runtime (SHIR) to access the SQL instance, irrespective of whether it's on premises or in Azure.
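The stored procedure invoked by the Stored Procedure activity would essentially wrap a RESTORE DATABASE statement. As a hedged illustration, the same restore issued from Python with pyodbc could look like this; the server, credentials, database name and file paths are placeholders.
```python
# Hypothetical sketch: restore a .bak onto a SQL Server VM; the same T-SQL could
# live in a stored procedure invoked by an ADF Stored Procedure activity.
# Server, credentials, database name and file paths are placeholders.
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 18 for SQL Server};"
    "SERVER=myvm.example.com;DATABASE=master;UID=sa;PWD=<password>;"
    "TrustServerCertificate=yes;",
    autocommit=True,  # RESTORE cannot run inside a user transaction
)

restore_sql = """
RESTORE DATABASE StagingDb
FROM DISK = N'D:\\backups\\StagingDb.bak'
WITH REPLACE,
     MOVE N'StagingDb'     TO N'D:\\data\\StagingDb.mdf',
     MOVE N'StagingDb_log' TO N'D:\\data\\StagingDb_log.ldf';
"""

cursor = conn.cursor()
cursor.execute(restore_sql)
while cursor.nextset():  # drain the informational messages the restore emits
    pass
conn.close()
```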
For added value, you could use ADF to start and stop your VM via REST API calls as part of your pipeline. I think you could do something along those lines (untested); a rough sketch follows.
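As an illustration only, a hedged Python sketch of the underlying REST call (the same call an ADF Web activity would make with a managed-identity token) might look like this; the subscription, resource group, VM name and API version are assumptions.
```python
# Hypothetical sketch: start a VM via the Azure REST API before running the restore,
# and deallocate it afterwards. Subscription, resource group and VM name are placeholders.
import requests
from azure.identity import DefaultAzureCredential

SUBSCRIPTION_ID = "<subscription-id>"
RESOURCE_GROUP = "my-rg"
VM_NAME = "sql-restore-vm"
API_VERSION = "2023-03-01"  # assumed; use whatever Compute API version you target

token = DefaultAzureCredential().get_token("https://management.azure.com/.default").token
headers = {"Authorization": f"Bearer {token}"}

base = (
    "https://management.azure.com"
    f"/subscriptions/{SUBSCRIPTION_ID}/resourceGroups/{RESOURCE_GROUP}"
    f"/providers/Microsoft.Compute/virtualMachines/{VM_NAME}"
)

# POST .../start powers the VM on; POST .../deallocate stops billing when you are done.
resp = requests.post(f"{base}/start?api-version={API_VERSION}", headers=headers)
resp.raise_for_status()  # 202 Accepted means the start operation was queued
```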
There is no way to access the .bak file directly. I have read about .bak file viewers in the past, but you couldn't build one into an architecture; they would be too unreliable or untrusted.

How can I use Azure Stream Analytics to use an on-premise SQL Server as an Output?

I'm following the instructions to set up App Insights to spool to SQL using Azure Stream Analytics, but I'm trying to deviate slightly to use an on-premises SQL Server (which the web application already uses) over VPN.
At the point of adding the output, this is failing with an error.
Is it the case that IP addresses are not supported, or is it something more fundamental than that?
You are probably looking for a direct answer to your question, which Jean-Sébastien provides succinctly below. But here is an alternative architecture, if you haven't considered it already...
You could stream to a transient Azure SQL Database or to Blob storage (likely cheaper, depending on your workload), and then use Azure Data Factory, tunnelled via a self-hosted Data Factory integration runtime, to "send" the data back to the on-premises SQL Server.
Data Factory V2 also has blob triggers, so rather than needing a schedule it could pick up any new blobs in micro-batches.
I say "send" in quotation marks because the integration runtime actually creates an outgoing connection from on-premises to Azure, yet it gives the capability for push-like data transfer.
If Data Factory proves useful, here is a guide to creating copy pipelines: https://learn.microsoft.com/en-us/azure/data-factory/tutorial-hybrid-copy-portal
Albeit this guide is for on-premises SQL to Blob storage, it gives you a stronger starting point.
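For the blob-trigger route, a hedged sketch of registering a blob event trigger with the Python management SDK is below; the factory, pipeline, container path and storage account are placeholders, and the model names assume a recent azure-mgmt-datafactory version.
```python
# Hypothetical sketch: fire an ADF pipeline whenever a new blob lands, instead of a schedule.
# Factory, pipeline, container path and storage-account IDs are placeholders.
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import (
    BlobEventsTrigger,
    PipelineReference,
    TriggerPipelineReference,
    TriggerResource,
)

SUBSCRIPTION_ID = "<subscription-id>"
RESOURCE_GROUP = "my-rg"
FACTORY_NAME = "my-factory"
STORAGE_ACCOUNT_ID = (
    f"/subscriptions/{SUBSCRIPTION_ID}/resourceGroups/{RESOURCE_GROUP}"
    "/providers/Microsoft.Storage/storageAccounts/mystreamlanding"
)

adf = DataFactoryManagementClient(DefaultAzureCredential(), SUBSCRIPTION_ID)

trigger = BlobEventsTrigger(
    scope=STORAGE_ACCOUNT_ID,
    events=["Microsoft.Storage.BlobCreated"],
    blob_path_begins_with="/streamoutput/blobs/",
    pipelines=[
        TriggerPipelineReference(
            pipeline_reference=PipelineReference(reference_name="CopyToOnPremSql")
        )
    ],
)

adf.triggers.create_or_update(
    RESOURCE_GROUP, FACTORY_NAME, "NewStreamBlobTrigger", TriggerResource(properties=trigger)
)
# The trigger still has to be started (in the portal or via the SDK's start operation)
# before it begins firing.
```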
At this time, only Azure SQL Database is supported as an output in Azure Stream Analytics.
Sorry for the inconvenience.
Thanks,
JS (Azure Stream Analytics)

Azure Blob Storage as Source and FTP as destination

Is there any way I can transfer .txt files from my Azure Blob storage directly to an FTP server, going serverless?
If possible, using SSIS or Azure Data Factory.
Thanks!
You can use an Azure Logic App:
Connectors to Blob storage
Connectors to FTP
A simple Logic App to push a blob to an FTP server would be a Blob storage trigger (e.g. "When a blob is added or modified") followed by the FTP "Create file" action.
SSIS has a lot of connectors that can talk directly to Azure storage. As for FTP, you may have to use third-party software such as WinSCP to upload the file to FTP (if the built-in FTP Task doesn't accomplish it already). If you are looking to go directly from Azure to FTP, you may have to rely on custom C# code; I am not even sure if that is possible.
You could use SSIS. The Azure Data Factory Copy activity doesn't support FTP as a sink.
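If neither Logic Apps nor SSIS fits, the glue code itself is small. A hedged Python sketch that could run in an Azure Function, a WebJob or anywhere else follows; the connection string, container, FTP host and credentials are placeholders.
```python
# Hypothetical sketch: copy .txt blobs from a container to an FTP server.
# Connection string, container name, FTP host and credentials are placeholders.
import io
from ftplib import FTP

from azure.storage.blob import BlobServiceClient

BLOB_CONN_STR = "<storage-account-connection-string>"
CONTAINER = "outbound"
FTP_HOST = "ftp.example.com"
FTP_USER = "user"
FTP_PASS = "password"

container = BlobServiceClient.from_connection_string(BLOB_CONN_STR).get_container_client(CONTAINER)

ftp = FTP(FTP_HOST)
ftp.login(FTP_USER, FTP_PASS)

for blob in container.list_blobs():
    if not blob.name.endswith(".txt"):
        continue
    data = container.download_blob(blob.name).readall()
    # STOR uploads the bytes under the same name on the FTP server.
    ftp.storbinary(f"STOR {blob.name}", io.BytesIO(data))

ftp.quit()
```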

Can Azure Data Factory write to FTP

I want to write the output of a pipeline to an FTP folder. ADF seems to support the on-premises file system but not an FTP folder.
How can I write the output in text format to an FTP folder?
Unfortunately, FTP servers are not a supported data store for ADF right now, so there is no out-of-the-box way to interact with an FTP server for either reading or writing.
However, you can use a custom activity to make it possible; it will require some custom development. A fellow Cloud Solution Architect within Microsoft put together a blog post that talks about how he did it for one of his customers. Please take a look at the following:
https://blogs.msdn.microsoft.com/cloud_solution_architect/2016/07/02/creating-ftp-data-movement-activity-for-azure-data-factory-pipeline/
I hope that this helps.
Upon thinking about it, you might be able to achieve what you want in a mildly convoluted way by writing the output to an Azure Blob storage account and then either:
1) manually: downloading the file from the Blob storage account and pushing it to the FTP site, or
2) automatically: using the Azure CLI to pull the file locally and then pushing it to the FTP site with a batch or shell script, as appropriate.
This is a lighter-weight alternative to custom activities (which are certainly the better option for heavy work).
You may wish to consider using Azure Functions to write to FTP (note there is a timeout when using a Consumption plan, but not in other plans, so it will depend on how big the files are).
https://learn.microsoft.com/en-us/azure/azure-functions/functions-create-storage-blob-triggered-function
You could instruct Data Factory to write to an intermediary Blob storage account, and then use a blob storage trigger in Azure Functions to upload the files to FTP as soon as they appear in Blob storage.
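A hedged sketch of such a blob-triggered function, using the Python v1 programming model with the trigger binding defined in function.json; the FTP host and credentials are placeholders read from app settings.
```python
# Hypothetical sketch of a blob-triggered Azure Function (Python v1 programming model).
# function.json would bind "myblob" to a blobTrigger on the intermediary container.
# FTP host and credentials are placeholders and should come from app settings.
import os
from ftplib import FTP

import azure.functions as func


def main(myblob: func.InputStream):
    # myblob.name looks like "container/path/to/file.txt"; keep just the file name.
    file_name = myblob.name.split("/")[-1]

    ftp = FTP(os.environ.get("FTP_HOST", "ftp.example.com"))
    ftp.login(os.environ.get("FTP_USER", "user"), os.environ.get("FTP_PASS", "password"))
    try:
        # Stream the blob contents straight to the FTP server.
        ftp.storbinary(f"STOR {file_name}", myblob)
    finally:
        ftp.quit()
```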
Alternatively, write to Blob storage and then use a timer in Logic Apps to upload from Blob storage to FTP. Logic Apps hide a tremendous amount of power behind their friendly exterior.
You can write a Logic App that will pick your file up from Azure storage and send it to an FTP site, and then call the Logic App using a Data Factory Web activity.
Make sure you do some error handling in your Logic App so it returns a 400 if the FTP transfer fails.
