Code to connect to SharePoint from PySpark - apache-spark

I want to extract SharePoint list data using PySpark and read it as a PySpark DataFrame. I am not sure how the SharePoint list data is stored or how best to access it.
I have tried these Python libraries:
Sharepy
Slum
Sharepoint and many others

Assuming you are using PySpark from Databricks, I use a different approach. I use Office 365 Power Automate flows to store the SharePoint lists in Azure Storage as CSV files. These flows can be called from Databricks by invoking the Power Automate HTTP triggers in Python, or Power Automate can refresh the files automatically when the data changes. The CSV files can then be mounted as tables in SQL Analytics and used easily in Databricks. The benefit is that Microsoft offers an easy-to-use, no-code way to export SharePoint to Azure Storage, and it also handles all of the security nuances.
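For example, a minimal sketch of calling the flow's HTTP trigger from a Databricks notebook and then reading the exported CSV might look like this. The trigger URL, payload shape, and mount path are placeholders, and `spark` is the session Databricks provides:
```python
import requests

# HTTP trigger URL of the Power Automate flow (placeholder).
flow_trigger_url = "https://prod-00.westus.logic.azure.com/workflows/<workflow-id>/triggers/manual/paths/invoke"

# Kick off the flow; the payload shape depends on how the flow is defined.
resp = requests.post(flow_trigger_url, json={"listName": "MyList"})
resp.raise_for_status()

# Once the flow has written the CSV to the mounted storage, read it with Spark.
df = spark.read.option("header", "true").csv("/mnt/sharepoint-exports/MyList.csv")
df.show()
```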

You can first download the file/list that you want from SharePoint using one of the following packages, then use PySpark to ingest and process it.
https://pypi.org/project/sharepoint/
https://pypi.org/project/sharepy/
Here is a tutorial using the Sharepy package: https://www.mydatahack.com/how-to-get-data-from-sharepoint-with-python/
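For example, a minimal sketch with sharepy might look like this. The site, credentials, file URL, and paths are placeholders, and `spark` is assumed to be an existing SparkSession:
```python
import sharepy

# Authenticate against the SharePoint site.
s = sharepy.connect("example.sharepoint.com", username="user@example.com", password="...")

# Download an exported file to the local/driver filesystem.
s.getfile(
    "https://example.sharepoint.com/sites/MySite/Shared%20Documents/list_export.csv",
    filename="/tmp/list_export.csv",
)

# Then ingest it with PySpark.
df = spark.read.option("header", "true").csv("file:///tmp/list_export.csv")
```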
I hope it helps.

Related

Using KQL (kusto query language) to explore data from a local file (e.g. Excel, CSV, JSON etc.)

Is there any way to use KQL to query a large local file (10k+ rows) such as Excel, CSV etc. alongside data hosted in Kusto (Azure Data Explorer)?
Here is my scenario:
I extensively use KQL to explore data hosted in Kusto (Azure Data Explorer) clusters. Mostly these explorations are very dynamic, one-off investigations of specific situations.
For some data, I just have Excel and CSV files that I want to join with Kusto data. I know I could do this with Pandas, but I'm specifically asking if there's any way to do it with KQL, preferably without setting up a cluster and ingesting the data into a Kusto table.
There are a few ways to do this, but one requirement is that the data must be accessible from the Kusto cluster; for your scenario, that means the files need to be in Azure Storage. The lightest approach is the externaldata operator, but you can also set up an external table.
Also note that you can get your own free cluster for this processing; to create one, go to http://aka.ms/kustofree
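For example, a minimal sketch of running an externaldata query from Python with the azure-kusto-data package might look like this. The cluster, database, blob URL, SAS token, and the CSV schema are placeholders:
```python
from azure.kusto.data import KustoClient, KustoConnectionStringBuilder

# Authenticate via the Azure CLI login; other auth methods are available.
kcsb = KustoConnectionStringBuilder.with_az_cli_authentication("https://<cluster>.kusto.windows.net")
client = KustoClient(kcsb)

# externaldata reads the CSV straight from Azure Storage, no ingestion needed,
# and can be joined against tables in the cluster.
query = """
externaldata(Name: string, Value: long)
[
    "https://<storageaccount>.blob.core.windows.net/<container>/data.csv;<SAS token>"
]
with (format="csv", ignoreFirstRecord=true)
| join kind=inner (MyKustoTable) on Name
"""

response = client.execute("MyDatabase", query)
for row in response.primary_results[0]:
    print(row)
```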

Azure Database -> Excel Query

I have a query being executed periodically on an Azure server, and I need to add some code to it so that it can save some data from tables/views to an Excel file during execution.
I have implemented code like this on other (non-Azure) databases, but executing the same code in Azure gives me messages that Azure doesn't support some of the tools I used.
What should I use to do this? I just need to save some table data to specific sheets in Excel.
Thanks in advance!
If the requirement is specific to Excel file creation, you can use a Logic App to query the Azure SQL database and generate an Excel file, based on the link below:
https://community.dynamics.com/ax/b/d365fortechies/posts/logic-app-for-azure-sql-db-to-azure-file-storage-workflow
Note: you can select Excel file generation in the Logic App rather than the CSV shown in that example, or generate a CSV file and then convert it to Excel.
Since OPENDATASOURCE is not supported in Azure SQL, you can also use other ETL tools to save data from tables/views to Excel, such as Azure Data Factory:
Using the Copy activity in Azure Data Factory, you can query a table, execute your own SQL query, or execute a stored procedure, and then write the result to an Excel file. There are multiple destinations you can choose for storing this Excel file, in the cloud or on a local server.
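If you do end up scripting the export yourself outside the database (for example from a small scheduled job), a minimal pandas sketch might look like this; it is an alternative to the Logic App / Data Factory options above, and the server, credentials, and table names are placeholders:
```python
import pandas as pd
import pyodbc

# Connect to the Azure SQL database (placeholder connection details).
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 18 for SQL Server};"
    "SERVER=myserver.database.windows.net;DATABASE=mydb;UID=user;PWD=secret"
)

# One query per target sheet.
tables = {
    "Sales": "SELECT * FROM dbo.Sales",
    "Customers": "SELECT * FROM dbo.Customers",
}

# Write each result set to its own sheet in a single workbook.
with pd.ExcelWriter("export.xlsx") as writer:
    for sheet_name, query in tables.items():
        pd.read_sql(query, conn).to_excel(writer, sheet_name=sheet_name, index=False)

conn.close()
```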

Which Azure products are needed for a staging database?

I have several external data APIs that I access using some Python scripts. My scripts run from an on-premises server, transform the data, and store it in a SQL Server database on the same server. I suppose it's a rudimentary ETL system run with Python and T-SQL.
The system is about to grow quite a bit with new APIs and will require more complex data pipelines (for example, some of the API data will be spun off to more than one table). I think this would be a good time to move the system onto Azure (we are heavily integrated with Microsoft so it will have to be Azure!).
I have spent a few days researching the Azure products that would let me run Python scripts to access data from web APIs and store the processed data in a cloud database. I'm looking for advice on what sort of Azure products other people have used for similar jobs. At the moment it seems I will need:
Azure SQL Database to hold the processed data that can be accessed by various colleagues.
Azure Data Factory to manage, log, and schedule the pipeline jobs and to run my custom Python scripts (is this even possible?).
Azure Batch to run the aforementioned Python scripts but I'm not sure about this.
I want to put together a proposal basically and start thinking about costs but it would be good to hear from someone who has done something similar - am I on the right track or completely off? Should I just stay on-premises? Thank you in advance.
Azure SQL Database and Azure SQL Data Warehouse are good for relational data. If you want NoSQL, you could go with Azure Cosmos DB, and if you want to store data as files, you could use Azure Data Lake.
For Python scripts, you could use a custom activity or Databricks with Azure Data Factory.
Azure SQL Data Warehouse should only be used if the amount of data you want to load is in the petabyte range; it is also not meant for complex transformations. I would recommend it for plain data loads with PolyBase.
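For reference, a minimal sketch of the kind of Python script an ADF custom activity or Databricks notebook could run, pulling from a web API and landing rows in Azure SQL, might look like this. The API URL, table, columns, and connection string are placeholders:
```python
import requests
import pyodbc

# Pull the latest records from the external API (placeholder URL and schema).
rows = requests.get("https://api.example.com/v1/measurements", timeout=30).json()

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 18 for SQL Server};"
    "SERVER=myserver.database.windows.net;DATABASE=staging;UID=user;PWD=secret"
)
cursor = conn.cursor()
cursor.fast_executemany = True  # speeds up bulk inserts with pyodbc

# Land the transformed rows in a staging table for downstream processing.
cursor.executemany(
    "INSERT INTO dbo.Measurements (SensorId, Value, RecordedAt) VALUES (?, ?, ?)",
    [(r["sensor_id"], r["value"], r["recorded_at"]) for r in rows],
)
conn.commit()
conn.close()
```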

Bulk upload Excel to SQL Azure daily

I have a requirement to bulk upload data from an Excel file to an Azure SQL table on a daily basis. I did some research and found that we could create a VM, install full SQL Server, and use an SSIS package to do this.
Is there any other reliable way to go about this? The Excel file may contain up to 10,000 rows.
I have also read that we could upload the file to blob storage and read it from there, but that did not seem like a very robust approach.
Can anyone suggest whether this is a feasible approach:
Place the Excel file on an Azure Website accessed via FTP
An Azure timer job using SQL bulk copy code to update the SQL table
Any help would be highly appreciated!
You could use Azure Data Factory (see the Azure Data Factory documentation). Place your files in Azure Data Lake and ADF can process them from there.
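If you prefer the scripted route instead, a minimal sketch of the daily load with pandas might look like this. The file path, table name, and connection details are placeholders; at ~10,000 rows this is well within what a simple scheduled script (e.g. an Azure Functions timer trigger) can handle:
```python
import pandas as pd
import sqlalchemy

# SQLAlchemy engine for Azure SQL (placeholder server, database, credentials).
engine = sqlalchemy.create_engine(
    "mssql+pyodbc://user:secret@myserver.database.windows.net/mydb"
    "?driver=ODBC+Driver+18+for+SQL+Server"
)

# Read the daily Excel file and append its rows to the target table.
df = pd.read_excel("daily_upload.xlsx")
df.to_sql("DailyUpload", engine, schema="dbo", if_exists="append", index=False)
```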

How to get data from Azure Table Storage to SharePoint Online

I need to get data from Azure Table Storage and populate it in a SharePoint O365 site as a list. I found a way to achieve this from SQL Azure, but I couldn't find one for Azure Tables. Kindly share your inputs.
Azure Table storage and SharePoint both have APIs, so in the base case you can export from one and import into the other using their APIs. However, it seems you are looking for a tool, not to write code.
To export from Azure Tables you can use the AzCopy tool (http://aka.ms/azcopy) to export the contents into a JSON-formatted local file. Then you can use whatever tool you prefer to import into SharePoint, as long as it understands JSON files.
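If writing a small script is acceptable, a minimal API-based sketch might look like this, assuming the azure-data-tables and Office365-REST-Python-Client packages; the connection string, site URL, credentials, list name, and column mapping are placeholders:
```python
from azure.data.tables import TableClient
from office365.sharepoint.client_context import ClientContext
from office365.runtime.auth.user_credential import UserCredential

# Source: the Azure Table (placeholder connection string and table name).
table = TableClient.from_connection_string("<storage connection string>", table_name="MyTable")

# Target: the SharePoint Online list (placeholder site, credentials, list title).
ctx = ClientContext("https://contoso.sharepoint.com/sites/MySite").with_credentials(
    UserCredential("user@contoso.com", "password")
)
target_list = ctx.web.lists.get_by_title("My List")

for entity in table.list_entities():
    # Map table properties to SharePoint list columns (the Title column is assumed to exist).
    target_list.add_item({"Title": entity["RowKey"], "Value": str(entity.get("Value", ""))})

ctx.execute_batch()  # send the queued item creations to SharePoint in batches
```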
