Microsoft Sharepoint and Snowflake integrations and automations - sharepoint

I'm trying to integrate the Sharepoint site with Snowflake. Read the data from SharePoint files and upload it to the snowflake.
We are not looking to use a separate integration tool.

One option that i can think of is loading these files into external stage object, external stage object is Blob Storage In case of Azure Cloud Service Or S3 Bucket in Case of Azure Cloud Service and then using Snowpipe to load continuously using these files into Snowflake. This is link you can refer to know more about Snowpipe.
https://docs.snowflake.com/en/user-guide/data-load-snowpipe.html
Other option i can think of is , You would have to write dot net code in Share point to read the data files in sharepoint and using Snowflake Connector/Driver to Connect to Snowflake and load the data into it.

Related

Parsing files DAT files, CSV files and Image files using Azure services

I have 5 types of EDI files namely - *.DAT, *.XML, *.TXT, *.CSV and Image files which contains data in them whose data are not in standard format.
I need to parse them and extract required data from them and persist them in SQL Database.
Currently, I'm spending time writing parser class libraries for each type of EDI file and not scalable .
I need to know if there are any azure services which can do the parsing work for me and is scalable.
Can I expect a solution on this regards?
I need to parse them and extract required data from them and persist them in SQL Database.
Yes, You can use Azure Functions to process a Files like CSV an import data into Azure SQL Or Azure Data Factory is also helpful to read or copy many file formats and store them in SQL Server Database in specified formats, there is an practical example provided by Microsoft, Please refer here.
To do with Azure Functions, the following steps are:
Create Azure Functions (Stack: .Net 3.1) of type Blob Trigger and define the local storage account connection string in local.settings.json like below:
In the Function.cs, there will be some boilerplate code which gives the logic of showing the uploaded blob name and its size.
In the Run function, you can define your parsing logic of the uploaded blob files.
Create the Azure SQL Database, configure the server with location, pricing tier and the required settings. After that, Select Set Server Firewall on the database overview page. Click Add Client IP to add your IP Address and Save. Test the database whether you're able to connect.
Deploy the project to Azure Functions App from Visual Studio.
Open your Azure SQL Database in the Azure portal and navigate to Connection Strings. Copy the connection string for ADO.NET.
Paste that Connection String in Azure Function App Settings in the portal.
Test the function app from portal and the remaining steps of uploading files from storage to SQL Database were available in this GitHub documentation
Also for parsing the files like CSV, etc. to JSON Format through Azure Functions, please refer here.
Consider using Azure Data Factory. It supports a range of file types.

Upload file to Sharepoint from Data Factory

We are generating an extract file in Data Factory(blob) that we need to upload to a SharePoint location. Is there any service available in azure to do this activity?
We were able to do this via Logic Apps.
since your source is blob and destination s sharepoint , HTTP is not available as a sink in ADF . SO unfortunately you cannot use the REST API and also there is no direct connector to sharepoint.
So you can use Logic app or Azure function for the copy task from blob to sharepoint

Azure Data Factory and SharePoint

I have some Excel files stored in SharePoint online. I want copy files stored in SharePoint folders to Azure Blob storage.
To achieve this, I am creating a new pipeline in Azure Data factory using Azure Portal. What are possible ways to copy files from SharePoint to Azure blob store using Azure Data Factory pipelines?
I have looked at all linked services types in Azure data factory pipeline but couldn't find any suitable type to connect to SharePoint.
Rather than directly accessing the file in SharePoint from Data Factory, you might have to use an intermediate technology and have Data Factory call that. You have a few of options:
Use a Logic App to move the file
Use an Azure Function
Use a custom activity and write your own C# to copy the file.
To call a Logic App from ADF, you use a web activity.
You can directly call an Azure Function now.
We can create a linked service of type 'File system' by providing the directory URL as 'Host' value. To authenticate the user, provide username and password/AKV details.
Note: Use Self-hosted IR
You can use the logic app to fetch data from Sharepoint and load it to azure blob storage and now you can use azure data factory to fetch data from blob even we can set an event trigger so that if any file comes into blob container the azure pipeline will automatically trigger.
You can use Power Automate (https://make.powerautomate.com/) to do this task automatically:
Create an Automated cloud flow trigger whenever a new file is dropped in a SharePoint
Use any mentioned trigger as per your requirement and fill in the SharePoint details
Add an action to create a blob and fill in the details as per your use case
By using this you will be pasting all the SharePoint details to the BLOB without even using ADF.
My previous answer was true at the time, but in the last few years, Microsoft has published guidance on how to copy documents from a SharePoint library. You can copy file from SharePoint Online by using Web activity to authenticate and grab access token from SPO, then passing to subsequent Copy activity to copy data with HTTP connector as source.
I ran into some issues with large files and Logic Apps. It turned out there were some extremely large files to be copied from that SharePoint library. SharePoint has a default limit of 100 MB buffer size, and the Get File Content action doesn’t natively support chunking.
I successfully pulled the files with the web activity and copy activity. But I found the SharePoint permissions configuration to be a bit tricky. I blogged my process here.
You can use a binary dataset if you just want to copy the full file rather than read the data.
If my file is located at https://mytenant.sharepoint.com/sites/site1/libraryname/folder1/folder2/folder3/myfile.CSV, the URL I need to retrieve the file is https://mytenant.sharepoint.com/sites/site1/libraryname/folder1/folder2/folder3/myfile.CSV')/$value.
Be careful about when you get your auth token. Your auth token is valid for 1 hour. If you copy a bunch of files sequentially, and it takes longer than that, you might get a timeout error.

how to load file from share point online to Azure Blob using Azure Data Factory

Can any one help me how to load csv file from share point online to azure Blob storage using Azure Data Factory.
I tried with Logic apps and succeed however logic app will not upload all file unless we made any change to the file or upload new.
I need to load all the file even there is no changes.
ADF v2 now supports loading from sharepoint online by OData connector with AAD service principal authentication: https://learn.microsoft.com/en-us/azure/data-factory/connector-odata
You probably can use a Logic App by changing to a Recurrence Trigger.
On that interval, you List the files in the Library then take any action on them you want.

What are the Azure ML output formats?

Does Azure ML only provide output through it's web services?
Is it possible to feed the output to an Azure SQL database?
Is it possible to feed the output to a Redshift database?
Essentially I am looking to know if I can integrate Azure ML Studio with our existing redshift analytics database.
yes you can write to SQL DB in Azure.
you can also use a Python module to make REST calls so in theory you can write to Redshift.
Writing to SQL DB is possible in Azure ML and so is Writing directly to Azure Blob Storage.
However, unlike #Hai, I do not believe you can write to a Redshift DB since it is clearly stated by the "Python Module" documentation from Microsoft that the Python execution is Sandboxed and therefore can not access resources outside the virtual machine it runs on(i.e Internet resources, on-premises resources, ...)

Resources