How to load a file from SharePoint Online to Azure Blob Storage using Azure Data Factory

Can anyone help me load a CSV file from SharePoint Online to Azure Blob Storage using Azure Data Factory?
I tried Logic Apps and succeeded; however, the Logic App will not upload all the files unless a file has been changed or a new file uploaded.
I need to load all the files even when there are no changes.

ADF v2 now supports loading from SharePoint Online via the OData connector with AAD service principal authentication: https://learn.microsoft.com/en-us/azure/data-factory/connector-odata
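As a rough illustration, the sketch below creates such a linked service through the azure-mgmt-datafactory Python SDK. It is only a sketch: the property names mirror the connector's JSON definition and should be checked against the SDK version you use, and every resource name, tenant, client ID and secret shown is a placeholder.

```python
# Hypothetical sketch: define an OData linked service pointing at SharePoint Online
# with AAD service principal authentication. All names and credentials are placeholders.
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import (
    LinkedServiceResource,
    ODataLinkedService,
    SecureString,
)

adf_client = DataFactoryManagementClient(DefaultAzureCredential(), "<subscription-id>")

odata_ls = ODataLinkedService(
    # SharePoint Online exposes its lists through an OData endpoint under /_api
    url="https://<tenant>.sharepoint.com/sites/<site>/_api/web/lists",
    authentication_type="AadServicePrincipal",
    service_principal_id="<aad-app-client-id>",
    aad_service_principal_credential_type="ServicePrincipalKey",
    service_principal_key=SecureString(value="<aad-app-client-secret>"),
    tenant="<aad-tenant-id>",
    aad_resource_id="https://<tenant>.sharepoint.com",
)

adf_client.linked_services.create_or_update(
    "<resource-group>", "<data-factory-name>", "SharePointOnlineOData",
    LinkedServiceResource(properties=odata_ls),
)
```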

You can probably use a Logic App by switching to a Recurrence trigger.
On each interval, list the files in the library and then take whatever action on them you want.

Related

Microsoft Sharepoint and Snowflake integrations and automations

I'm trying to integrate a SharePoint site with Snowflake: read the data from SharePoint files and upload it to Snowflake.
We are not looking to use a separate integration tool.
One option I can think of is loading these files into an external stage object (the external stage being Blob Storage in the case of Azure, or an S3 bucket in the case of AWS) and then using Snowpipe to continuously load those files into Snowflake. You can refer to this link to learn more about Snowpipe:
https://docs.snowflake.com/en/user-guide/data-load-snowpipe.html
The other option I can think of is to write .NET code in SharePoint to read the data files and use the Snowflake Connector/Driver to connect to Snowflake and load the data into it.
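For the second option, here is a minimal sketch of what the loading step could look like with the snowflake-connector-python package (Python rather than .NET, purely for illustration). The connection values, table name and file path are placeholders, and it assumes the file has already been pulled out of SharePoint to local disk.

```python
# Hypothetical sketch: stage a local CSV (already downloaded from SharePoint)
# and load it into a Snowflake table. All connection values are placeholders.
import snowflake.connector

conn = snowflake.connector.connect(
    account="<account_identifier>",
    user="<user>",
    password="<password>",
    warehouse="<warehouse>",
    database="<database>",
    schema="<schema>",
)

try:
    cur = conn.cursor()
    # Upload the file to the table's internal stage...
    cur.execute("PUT file:///tmp/sharepoint_export.csv @%MY_TABLE OVERWRITE = TRUE")
    # ...then copy it into the table.
    cur.execute(
        "COPY INTO MY_TABLE "
        "FILE_FORMAT = (TYPE = CSV FIELD_OPTIONALLY_ENCLOSED_BY = '\"' SKIP_HEADER = 1)"
    )
finally:
    conn.close()
```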

Upload file to Sharepoint from Data Factory

We are generating an extract file in Data Factory (written to blob storage) that we need to upload to a SharePoint location. Is there any service available in Azure to do this?
We were able to do this via Logic Apps.
Since your source is blob and your destination is SharePoint, note that HTTP is not available as a sink in ADF, so unfortunately you cannot use the REST API there, and there is also no direct connector to SharePoint.
So you can use a Logic App or an Azure Function for the copy task from blob to SharePoint.
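As a rough sketch of what such an Azure Function (or plain script) could do, the snippet below downloads the blob and pushes it to a SharePoint document library through the SharePoint REST API. The storage connection string, container, site URL, folder path and the way you obtain the bearer token are all placeholders you would need to fill in.

```python
# Hypothetical sketch: copy a blob to a SharePoint document library via the
# SharePoint REST API. Connection string, site, folder and token are placeholders.
import requests
from azure.storage.blob import BlobClient

def copy_blob_to_sharepoint(access_token: str) -> None:
    # 1. Read the extract file from Blob Storage.
    blob = BlobClient.from_connection_string(
        "<storage-connection-string>", container_name="extracts", blob_name="extract.csv"
    )
    content = blob.download_blob().readall()

    # 2. Upload it to a SharePoint folder using the Files/add REST endpoint.
    site_url = "https://<tenant>.sharepoint.com/sites/<site>"
    upload_url = (
        f"{site_url}/_api/web/GetFolderByServerRelativeUrl('/sites/<site>/Shared Documents')"
        "/Files/add(url='extract.csv', overwrite=true)"
    )
    resp = requests.post(
        upload_url,
        headers={
            "Authorization": f"Bearer {access_token}",
            "Accept": "application/json;odata=verbose",
        },
        data=content,
    )
    resp.raise_for_status()
```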

Is there a way to know if a file has been uploaded to a network drive in Azure

I have a network location where a CSV file gets dumped every hour. I need to copy that file to an Azure blob. How do I know that a file has been uploaded to that network drive? Is there something like a file watcher in Azure which monitors this network location? Also, is it possible to copy a file from the network location to an Azure blob through code?
I'm using .NET Core APIs deployed to an Azure App Service.
Please suggest a possible solution.
You could use Azure Event Grid, but as of today Event Grid does not support Azure File Share.
As your file share is on-premises, the only way I see is to write a custom publisher that runs on-premises and sends a custom event to Azure Event Grid, with a subscriber (for example an Azure Function) doing the work you want:
https://learn.microsoft.com/en-us/azure/event-grid/custom-event-quickstart-portal
But that will only be an event, not the file itself that was added or changed; to process the file you would still have to upload it into Azure as well. Since that approach requires you to do two things, I would recommend running custom code on-premises as a cron-like job that looks for new or edited files, uploads them to Azure Blob Storage, and then executes an Azure Function to do your processing task.
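A minimal sketch of that cron-style uploader, assuming the azure-storage-blob package and placeholder folder, container and connection details, could look like this:

```python
# Hypothetical sketch: scan an on-prem folder and upload new files to Blob Storage.
# Run this periodically (cron / Task Scheduler). Connection details are placeholders.
import os
from azure.storage.blob import BlobServiceClient

WATCH_DIR = r"\\fileserver\share\dropzone"                  # the monitored network location
UPLOADED_MARKER = os.path.join(WATCH_DIR, ".uploaded.txt")  # simple state file

def upload_new_files() -> None:
    service = BlobServiceClient.from_connection_string("<storage-connection-string>")
    container = service.get_container_client("incoming")

    already_uploaded = set()
    if os.path.exists(UPLOADED_MARKER):
        with open(UPLOADED_MARKER) as f:
            already_uploaded = set(f.read().splitlines())

    for name in os.listdir(WATCH_DIR):
        path = os.path.join(WATCH_DIR, name)
        if not name.endswith(".csv") or name in already_uploaded:
            continue
        with open(path, "rb") as data:
            container.upload_blob(name=name, data=data, overwrite=True)
        with open(UPLOADED_MARKER, "a") as f:
            f.write(name + "\n")

if __name__ == "__main__":
    upload_new_files()
```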
Since the files are on-premises, you can use PowerShell to monitor a folder for new files and then fire an event to upload the file to an Azure blob.
There is a video showing how to do this here: https://www.youtube.com/watch?v=Usih7UywZYA
The changes you need to make are:
Replace the action with an upload to Azure: https://argonsys.com/microsoft-cloud/library/how-to-upload-files-to-azure-blob-storage-using-powershell-and-azcopy/
Run PowerShell in the context of a user that can upload files.

Azure Data Factory and SharePoint

I have some Excel files stored in SharePoint Online. I want to copy the files stored in SharePoint folders to Azure Blob Storage.
To achieve this, I am creating a new pipeline in Azure Data Factory using the Azure portal. What are the possible ways to copy files from SharePoint to Azure Blob Storage using Azure Data Factory pipelines?
I have looked at all the linked service types in Azure Data Factory but couldn't find any suitable type to connect to SharePoint.
Rather than directly accessing the file in SharePoint from Data Factory, you might have to use an intermediate technology and have Data Factory call that. You have a few options:
Use a Logic App to move the file
Use an Azure Function
Use a custom activity and write your own C# to copy the file.
To call a Logic App from ADF, you use a web activity.
You can directly call an Azure Function now.
You can create a linked service of type 'File system' by providing the directory URL as the 'Host' value. To authenticate, provide a username and password (or Azure Key Vault references).
Note: use a self-hosted integration runtime.
You can use a Logic App to fetch data from SharePoint and load it into Azure Blob Storage, then use Azure Data Factory to fetch the data from blob. You can even set an event trigger so that whenever a file lands in the blob container, the Azure pipeline triggers automatically.
You can use Power Automate (https://make.powerautomate.com/) to do this task automatically:
Create an automated cloud flow that triggers whenever a new file is dropped in a SharePoint library.
Use whichever of the listed triggers fits your requirement and fill in the SharePoint details.
Add an action to create a blob and fill in the details as per your use case.
By using this you copy all the SharePoint files to the blob without even using ADF.
My previous answer was true at the time, but in the last few years Microsoft has published guidance on how to copy documents from a SharePoint library. You can copy a file from SharePoint Online by using a Web activity to authenticate and grab an access token from SPO, then passing it to a subsequent Copy activity that copies the data using the HTTP connector as the source.
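To make the Web activity step concrete, here is a sketch of the same token request issued from Python: a client-credentials call to the SharePoint Online tokens endpoint. The tenant ID, tenant name, client ID and secret are placeholders for the app registration you would create for your site.

```python
# Hypothetical sketch: request an SPO access token the same way the ADF Web
# activity does (client credentials against the SharePoint tokens endpoint).
import requests

TENANT_ID = "<aad-tenant-id>"
CLIENT_ID = "<sharepoint-app-client-id>"
CLIENT_SECRET = "<sharepoint-app-client-secret>"
TENANT_NAME = "mytenant"  # as in mytenant.sharepoint.com

def get_spo_access_token() -> str:
    token_url = f"https://accounts.accesscontrol.windows.net/{TENANT_ID}/tokens/OAuth/2"
    resp = requests.post(
        token_url,
        data={
            "grant_type": "client_credentials",
            "client_id": f"{CLIENT_ID}@{TENANT_ID}",
            "client_secret": CLIENT_SECRET,
            # 00000003-0000-0ff1-ce00-000000000000 is the SharePoint Online principal
            "resource": f"00000003-0000-0ff1-ce00-000000000000/"
                        f"{TENANT_NAME}.sharepoint.com@{TENANT_ID}",
        },
    )
    resp.raise_for_status()
    return resp.json()["access_token"]
```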
I ran into some issues with large files and Logic Apps. It turned out there were some extremely large files to be copied from that SharePoint library. SharePoint has a default limit of 100 MB buffer size, and the Get File Content action doesn’t natively support chunking.
I successfully pulled the files with the web activity and copy activity. But I found the SharePoint permissions configuration to be a bit tricky. I blogged my process here.
You can use a binary dataset if you just want to copy the full file rather than read the data.
If my file is located at https://mytenant.sharepoint.com/sites/site1/libraryname/folder1/folder2/folder3/myfile.CSV, the URL I need to retrieve the file is https://mytenant.sharepoint.com/sites/site1/_api/web/GetFileByServerRelativeUrl('/sites/site1/libraryname/folder1/folder2/folder3/myfile.CSV')/$value.
Be careful about when you get your auth token. Your auth token is valid for 1 hour. If you copy a bunch of files sequentially, and it takes longer than that, you might get a timeout error.
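Putting those pieces together, a rough Python sketch of the download step is below, reusing an access token obtained as in the earlier sketch. The Copy activity's HTTP source performs essentially the same GET; the storage connection string and container here are placeholders.

```python
# Hypothetical sketch: download a file from SharePoint Online with the
# GetFileByServerRelativeUrl(...)/$value endpoint and stage it into Blob Storage.
import requests
from azure.storage.blob import BlobClient

def copy_sharepoint_file_to_blob(access_token: str) -> None:
    file_url = (
        "https://mytenant.sharepoint.com/sites/site1/_api/web/"
        "GetFileByServerRelativeUrl("
        "'/sites/site1/libraryname/folder1/folder2/folder3/myfile.CSV')/$value"
    )
    resp = requests.get(file_url, headers={"Authorization": f"Bearer {access_token}"})
    resp.raise_for_status()

    # Write the raw bytes to a blob (placeholder connection string / container).
    blob = BlobClient.from_connection_string(
        "<storage-connection-string>", container_name="landing", blob_name="myfile.CSV"
    )
    blob.upload_blob(resp.content, overwrite=True)
```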

Can Azure Data Factory write to FTP

I want to write the output of a pipeline to an FTP folder. ADF seems to support on-premises file systems but not FTP folders.
How can I write the output in text format to an FTP folder?
Unfortunately, FTP servers are not a supported data store for ADF as of right now, so there is no out-of-the-box way to interact with an FTP server for either reading or writing.
However, you can use a custom activity to make it possible, although it will require some custom development. A fellow Cloud Solution Architect at Microsoft put together a blog post that talks about how he did it for one of his customers. Please take a look at the following:
https://blogs.msdn.microsoft.com/cloud_solution_architect/2016/07/02/creating-ftp-data-movement-activity-for-azure-data-factory-pipeline/
I hope that this helps.
Upon thinking about it, you might be able to achieve what you want in a mildly convoluted way by writing the output to an Azure Blob Storage account and then either:
1) manually: downloading the file from the Blob Storage account and pushing it to the "FTP" site, or
2) automatically: using the Azure CLI to pull the file locally and then pushing it to the "FTP" site with a batch or shell script as appropriate.
As a lighter-weight approach than custom activities (which are certainly the better option for heavy work), you may wish to consider using Azure Functions to write to FTP (note there is a timeout when using a consumption plan, but not in other plans, so it will depend on how big the files are).
https://learn.microsoft.com/en-us/azure/azure-functions/functions-create-storage-blob-triggered-function
You could instruct Data Factory to write to an intermediary blob storage container, and use a blob storage trigger in an Azure Function to upload the files to FTP as soon as they appear in blob storage.
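A minimal sketch of such a blob-triggered function, using the Python v2 programming model and the standard library's ftplib, might look like this; the container name, FTP host, credentials and target folder are all placeholders.

```python
# Hypothetical sketch: blob-triggered Azure Function (Python v2 model) that
# pushes each new blob to an FTP server. FTP host/credentials are placeholders.
import io
import ftplib
import azure.functions as func

app = func.FunctionApp()

@app.blob_trigger(arg_name="blob", path="outbound/{name}", connection="AzureWebJobsStorage")
def push_blob_to_ftp(blob: func.InputStream) -> None:
    data = blob.read()
    file_name = blob.name.split("/")[-1]  # strip the container prefix from the blob path

    with ftplib.FTP("<ftp-host>", "<ftp-user>", "<ftp-password>") as ftp:
        ftp.cwd("/incoming")
        ftp.storbinary(f"STOR {file_name}", io.BytesIO(data))
```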
Alternatively, write to blob storage and then use a timer in Logic Apps to upload from blob storage to FTP. Logic Apps hide a tremendous amount of power behind their friendly exterior.
You can write a Logic App that will pick your file up from Azure Storage and send it to an FTP site, then call the Logic App using a Data Factory Web activity.
Make sure you do some error handling in your Logic App to return a 400 if the FTP step fails.
