I have some Excel files stored in SharePoint Online, and I want to copy files stored in SharePoint folders to Azure Blob storage.
To achieve this, I am creating a new pipeline in Azure Data Factory using the Azure portal. What are the possible ways to copy files from SharePoint to Azure Blob storage using Azure Data Factory pipelines?
I have looked at all the linked service types in Azure Data Factory but couldn't find any suitable type to connect to SharePoint.
Rather than directly accessing the file in SharePoint from Data Factory, you might have to use an intermediate technology and have Data Factory call that. You have a few options:
Use a Logic App to move the file
Use an Azure Function
Use a custom activity and write your own C# to copy the file.
To call a Logic App from ADF, you use a web activity.
You can directly call an Azure Function now.
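To make the Web activity option concrete, here is a minimal sketch of what the pipeline activity JSON might look like, assuming the Logic App exposes an HTTP Request trigger; the URL, activity name, and body fields are placeholders you would replace with your own values:

```json
{
    "name": "CallLogicApp",
    "type": "WebActivity",
    "typeProperties": {
        "url": "https://prod-00.eastus.logic.azure.com/workflows/<workflow-id>/triggers/manual/paths/invoke?api-version=2016-10-01&sp=...&sig=...",
        "method": "POST",
        "headers": { "Content-Type": "application/json" },
        "body": "{\"fileName\":\"myfile.xlsx\",\"sourceFolder\":\"/sites/site1/libraryname\"}"
    }
}
```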
We can create a linked service of type 'File system' by providing the directory URL as the 'Host' value. To authenticate, provide the username and password (or an Azure Key Vault reference).
Note: use a self-hosted integration runtime (IR).
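For reference, a 'File system' linked service definition looks roughly like the sketch below. The host, user, Key Vault linked service, and integration runtime names are all placeholders, and the Key Vault reference is just one way to supply the password:

```json
{
    "name": "FileShareLinkedService",
    "properties": {
        "type": "FileServer",
        "typeProperties": {
            "host": "https://mytenant.sharepoint.com/sites/site1/libraryname",
            "userId": "user@mytenant.onmicrosoft.com",
            "password": {
                "type": "AzureKeyVaultSecret",
                "store": {
                    "referenceName": "MyKeyVaultLinkedService",
                    "type": "LinkedServiceReference"
                },
                "secretName": "sharepoint-password"
            }
        },
        "connectVia": {
            "referenceName": "MySelfHostedIR",
            "type": "IntegrationRuntimeReference"
        }
    }
}
```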
You can use a Logic App to fetch the data from SharePoint and load it into Azure Blob storage, and then use Azure Data Factory to fetch the data from Blob. You can even set an event trigger so that whenever a file lands in the blob container, the ADF pipeline runs automatically.
You can use Power Automate (https://make.powerautomate.com/) to do this task automatically:
Create an Automated cloud flow with a trigger that fires whenever a new file is dropped in a SharePoint folder
Use whichever of the suggested triggers fits your requirement and fill in the SharePoint details
Add an action to create a blob and fill in the details as per your use case
By doing this you will be copying all the SharePoint files to the blob container without even using ADF.
My previous answer was true at the time, but in the last few years, Microsoft has published guidance on how to copy documents from a SharePoint library. You can copy files from SharePoint Online by using a Web activity to authenticate and grab an access token from SPO, then passing it to a subsequent Copy activity that copies the data with the HTTP connector as the source.
I ran into some issues with large files and Logic Apps. It turned out there were some extremely large files to be copied from that SharePoint library. SharePoint has a default limit of 100 MB buffer size, and the Get File Content action doesn’t natively support chunking.
I successfully pulled the files with the web activity and copy activity. But I found the SharePoint permissions configuration to be a bit tricky. I blogged my process here.
You can use a binary dataset if you just want to copy the full file rather than read the data.
If my file is located at https://mytenant.sharepoint.com/sites/site1/libraryname/folder1/folder2/folder3/myfile.CSV, the URL I need to retrieve the file is https://mytenant.sharepoint.com/sites/site1/_api/web/GetFileByServerRelativeUrl('/sites/site1/libraryname/folder1/folder2/folder3/myfile.CSV')/$value.
Be careful about when you get your auth token. Your auth token is valid for 1 hour. If you copy a bunch of files sequentially, and it takes longer than that, you might get a timeout error.
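For illustration, the token-fetching Web activity can look roughly like the sketch below (the tenant ID, client ID, client secret, and activity name are placeholders; the resource value follows the pattern Microsoft documents for SharePoint Online app-only access):

```json
{
    "name": "GetSPOToken",
    "type": "WebActivity",
    "typeProperties": {
        "url": "https://accounts.accesscontrol.windows.net/<tenant-id>/tokens/OAuth/2",
        "method": "POST",
        "headers": { "Content-Type": "application/x-www-form-urlencoded" },
        "body": "grant_type=client_credentials&client_id=<client-id>@<tenant-id>&client_secret=<client-secret>&resource=00000003-0000-0ff1-ce00-000000000000/mytenant.sharepoint.com@<tenant-id>"
    }
}
```

The Copy activity's HTTP source is then given an additional request header along the lines of Authorization: Bearer @{activity('GetSPOToken').output.access_token}, which is why the token lifetime matters for long sequential copies.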
At the moment, I am having to manually upload a file to a Teams channel on a routine basis.
I have managed to create a pipeline to upload the file into my Azure Data Lake. I would now like to push the file from my Azure environment to my Teams Channel. I have found that webhooks cannot work with files and that bots can send files in the chat but not "Upload" them into a channel.
Is there a way to upload files from Azure to MS Teams using Data Factory or other alternatives?
Thank you.
The most appropriate way to partially achieve this would be to use a Teams Adaptive Card connector, but I couldn't find any way to easily set it up and it seems quite complex. The second option would be to use a Teams Post a message connector from an Azure Logic App; unfortunately, it might not yet support attachments, but you can send links to files stored elsewhere (SharePoint, Blob, etc.). Here's what you can try in the meantime.
In ADF:
Create a Copy activity that will send the files to a storage account.
Create a Web activity that will send a notification to a Logic App when the pipeline completes. You only need to use this as a trigger for the Logic App.
In Logic Apps:
Create an HTTP connector (Request trigger) that will receive the pipeline run information; you can also land the files here using the Azure Storage Blob Create blob action.
After that, create a Microsoft Teams Post a message (V3) connector and choose your Team and Channel.
Use the Get blob content using path action to get a URL to the file - you might need to construct this URL.
Create your message using variables from the above connector and paste a link to the file to be downloaded.
Depending on the content of the file, you could also try to retrieve partial content and display it in the message body directly (if you can consider that an alternative for you).
I've not tried using the Adaptive Card Connector but I know it does give you a far richer dynamic experience. You would need to spend some time to design a custom card solution. Use this playground to see if it's something you can explore in the future.
The flow is as follows, and parts of it run in parallel:
ADF Pipeline runs > ADF Copy Activity saves file in Blob storage
ADF Pipeline runs > ADF Web Activity triggers Logic App HTTP Connector > Logic App retrieves file from Blob storage > Sends a message to a Teams channel with a link to the file.
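If it helps, the Web activity body that triggers the Logic App can be as simple as the sketch below. The property names and the fileName pipeline parameter are just examples of what you might pass; pipeline().Pipeline and pipeline().RunId are ADF system variables:

```json
{
    "pipelineName": "@{pipeline().Pipeline}",
    "runId": "@{pipeline().RunId}",
    "blobPath": "teams-files/@{pipeline().parameters.fileName}"
}
```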
Here are all the supported Teams Actions.
We are generating an extract file in Data Factory (blob) that we need to upload to a SharePoint location. Is there any service available in Azure to do this?
We were able to do this via Logic Apps.
Since your source is Blob and your destination is SharePoint, and HTTP is not available as a sink in ADF, you unfortunately cannot use the REST API; there is also no direct connector to SharePoint as a sink.
So you can use a Logic App or an Azure Function for the copy task from Blob to SharePoint.
Can anyone help me with how to load a CSV file from SharePoint Online to Azure Blob storage using Azure Data Factory?
I tried with Logic Apps and succeeded; however, the Logic App will not upload all the files unless a file is changed or a new one is uploaded.
I need to load all the files even when there are no changes.
ADF v2 now supports loading from SharePoint Online via the OData connector with AAD service principal authentication: https://learn.microsoft.com/en-us/azure/data-factory/connector-odata
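A hedged sketch of what such an OData linked service could look like, pointing at the site's OData API endpoint (the site URL, list endpoint, and service principal values are placeholders):

```json
{
    "name": "SharePointOnlineODataLS",
    "properties": {
        "type": "OData",
        "typeProperties": {
            "url": "https://mytenant.sharepoint.com/sites/site1/_api/web/lists",
            "authenticationType": "AadServicePrincipal",
            "servicePrincipalId": "<application-id>",
            "aadServicePrincipalCredentialType": "ServicePrincipalKey",
            "servicePrincipalKey": { "type": "SecureString", "value": "<client-secret>" },
            "tenant": "<tenant-id>",
            "aadResourceId": "https://mytenant.sharepoint.com"
        }
    }
}
```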
You can probably use a Logic App by changing to a Recurrence trigger.
On that interval, you list the files in the library and then take any action on them you want.
How do I delete all files in a source folder (located on an on-premises file system)? I would need help with a .NET custom activity or any out-of-the-box solution in Azure Data Factory.
PS: I did find a delete custom activity, but it's geared more towards Blob storage.
Please help.
Currently, there is NO support for a custom activity on Data Management Gateway. Data Management Gateway only supports copy activity and Stored Procedure activity as of today (02/22/2017).
Workaround: As I do not have a delete facility for on-premises files, I am planning to have source files in folder structures of yyyy-mm-dd. So every date folder (e.g. the 2017-02-22 folder) will contain all the related files. I will then configure my Azure Data Factory job to pull data based on date.
Example: The ADF job on Feb 22nd will look for the 2017-02-22 folder. In the next run, my ADF job will look for the 2017-02-23 folder. This way I don't need to delete the processed files.
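If you are on ADF v2, the date-based folder doesn't have to be hard-coded: you can parameterize the source dataset and pass the trigger date from the pipeline. A minimal sketch of the dataset reference inside the Copy activity, assuming a dataset named SourceFolderDataset with a folderPath parameter (both names made up here); formatDateTime and pipeline().TriggerTime are standard ADF expressions:

```json
{
    "referenceName": "SourceFolderDataset",
    "type": "DatasetReference",
    "parameters": {
        "folderPath": "@formatDateTime(pipeline().TriggerTime, 'yyyy-MM-dd')"
    }
}
```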
Actually, there is a reasonable way to do it. You create an Azure Functions app that accepts a POST with your FTP/SFTP settings (in case you use one) and the name of the file to remove. In the function, you parse the request content as JSON, extract the settings, and use the SSH.NET library to remove the desired file. If you just have a file share, you do not even need to bother with SSH.
Later on, in Data Factory, you add a Web activity with dynamic content in the Body section, constructing the JSON request in the form mentioned above. For the URL you specify the published Azure Function URL plus ?code=<your function key>.
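As an illustration, the Web activity in the pipeline might look like the sketch below. The function app name, the body property names, and the pipeline parameters are all made up for the example; they just need to match whatever your function expects to parse:

```json
{
    "name": "DeleteSourceFile",
    "type": "WebActivity",
    "typeProperties": {
        "url": "https://myfunctionapp.azurewebsites.net/api/DeleteFile?code=<your function key>",
        "method": "POST",
        "headers": { "Content-Type": "application/json" },
        "body": "{\"host\":\"sftp.contoso.com\",\"userName\":\"svc-adf\",\"password\":\"@{pipeline().parameters.sftpPassword}\",\"fileName\":\"@{pipeline().parameters.fileToDelete}\"}"
    }
}
```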
We actually ended up creating a whole bunch of Azure Functions that serve as custom activities for our ADF pipelines.
I want to write the output of a pipeline to an FTP folder. ADF seems to support on-premises file systems but not FTP folders.
How can I write the output in text format to an FTP folder?
Unfortunately FTP Servers are not a supported data store for ADF as of right now. Therefore there is no OOTB way to interact with an FTP Server for either reading or writing.
However, you can use a custom activity to make it possible, but it will require some custom development to make this happen. A fellow Cloud Solution Architect within MS put together a blog post that talks about how he did it for one of his customers. Please take a look at the following:
https://blogs.msdn.microsoft.com/cloud_solution_architect/2016/07/02/creating-ftp-data-movement-activity-for-azure-data-factory-pipeline/
I hope that this helps.
Upon thinking about it, you might be able to achieve what you want in a mildly convoluted way by writing the output to an Azure Blob storage account and then either
1) manually: downloading and pushing the file to the "FTP" site from the Blob storage account or
2) automatically: using Azure CLI to pull the file locally and then push it to the "FTP" site with a batch or shell script as appropriate
As a lighter-weight approach than custom activities (custom activities are certainly the better option for heavy work), you may wish to consider using Azure Functions to write to FTP (note there is a timeout when using a consumption plan, but not in other plans, so it will depend on how big the files are).
https://learn.microsoft.com/en-us/azure/azure-functions/functions-create-storage-blob-triggered-function
You could instruct Data Factory to write to an intermediary Blob storage account,
and use Blob storage triggers in Azure Functions to upload the files as soon as they appear in Blob storage.
Alternatively, write to Blob storage and then use a timer in Logic Apps to upload from Blob storage to FTP. Logic Apps hide a tremendous amount of power behind their friendly exterior.
You can write a Logic App that will pick your file up from Azure Storage and send it to an FTP site, then call the Logic App using a Data Factory Web activity.
Make sure you do some error handling in your Logic App to return a 400 if the FTP transfer fails.