Best way to extract data from Azure Data Lake to SQL Server - azure

I am looking for the best programmatic way to extract data from Azure Data Lake into a MSSQL database, which is installed on a VM within Azure.
Currently I am considering the following options:
Azure Data Factory
SSIS (Using Azure Data Lake Store Connection Manager)
User-Defined Outputter Example1, Example2
Custom C# code that reads Azure Data Lake data and inserts it into SQL Server DB
Any other good ways I am missing?

Data Factory v2 (currently in public preview) also supports hosting SSIS, giving you a Data Factory AND SSIS option.
Not necessarily a good idea for many scenarios, but Azure Logic Apps has both a Data Lake Store connector and a SQL Server connector, which could be useful in scenarios such as writing lots of small files on a schedule or trigger.
You also may not need to go full-on C#; you could use PowerShell instead, as there are PowerShell modules for both Data Lake Store and SQL Server.
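If you do go down the custom-code route (the custom C# option in your list), a minimal sketch might look like the following. The account name, file path, connection string, and the two-column staging table are placeholders I've assumed purely for illustration; it uses the Microsoft.Azure.DataLake.Store and System.Data.SqlClient packages.

```csharp
// Minimal sketch of the "custom C# code" option: read a CSV from Azure Data Lake Store
// and bulk-insert it into SQL Server. All names, paths and connection strings are placeholders.
using System;
using System.Data;
using System.Data.SqlClient;
using System.IO;
using System.Threading.Tasks;
using Microsoft.Azure.DataLake.Store;
using Microsoft.Rest.Azure.Authentication;

class AdlsToSqlServer
{
    static async Task Main()
    {
        // Service principal credentials for the Data Lake Store account (placeholders).
        var creds = await ApplicationTokenProvider.LoginSilentAsync(
            "<tenant-id>", "<client-id>", "<client-secret>");
        AdlsClient adls = AdlsClient.CreateClient("<account>.azuredatalakestore.net", creds);

        // Assumed staging table: dbo.Staging(Id INT, Name NVARCHAR(100)).
        var table = new DataTable();
        table.Columns.Add("Id", typeof(int));
        table.Columns.Add("Name", typeof(string));

        // Stream the file out of the lake and parse it line by line (naive CSV split).
        using (var reader = new StreamReader(adls.GetReadStream("/raw/customers.csv")))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                var parts = line.Split(',');
                table.Rows.Add(int.Parse(parts[0]), parts[1]);
            }
        }

        // Bulk-insert into the SQL Server instance running on the VM.
        using (var conn = new SqlConnection("Server=<vm-name>;Database=<db>;Integrated Security=true"))
        {
            conn.Open();
            using (var bulk = new SqlBulkCopy(conn) { DestinationTableName = "dbo.Staging" })
            {
                bulk.WriteToServer(table);
            }
        }

        Console.WriteLine($"Loaded {table.Rows.Count} rows.");
    }
}
```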

Related

Export data from Azure SQL managed instance to Azure Data Lake Storage as json

I have a requirement to export data from an Azure SQL managed instance to data lake storage as JSON documents, and I have to use SQL Server Integration Services to accomplish this. I tried using the Flexible File Destination in the data flow, but when I look at the supported file formats, JSON is not supported. What other options do I have to accomplish this?
Azure Data Factory supports data movement between an Azure SQL Managed Instance and a Data Lake account, but unfortunately SSIS also doesn't support the JSON format when the destination is Azure Data Lake Storage.
Azure Data Lake Store Destination
The Azure Data Lake Store Destination component enables an SSIS package to write data to an Azure Data Lake Store. The supported file formats are: Text, Avro, and ORC.
Workaround: A possible workaround is to use the Data Flow activity in Azure Data Factory: load the data from the Managed Instance, transform it using the Pivot transformation, and store the processed data in the Data Lake. This approach doesn't involve SSIS. Check this similar kind of request and approach here.

How to access a Redshift DB through VPN to extract data and load into own Azure environment?

I'm pretty new to the Azure environment and so far my search for information wasn't very successful.
The problem is as follows:
We want to access a Redshift DB which you can only connect to if you are connected to a specific VPN beforehand - this is the main problem.
We then want to build an automated data pipeline which extracts daily updated data from the Redshift DB, and create our own analytics solution from it.
How can that be set up as a fully automated workflow, and in the simplest, most efficient way, with the tools available on the Azure platform?
Thanks for the help.
If the VPN is not the challenge and you just need to extract the data from the Redshift DB and store it in an Azure service like Blob Storage or Azure Synapse Analytics, then the best possible way is to use Azure Data Factory. Azure Data Factory is a fully managed, serverless data integration service.
You can copy data from Amazon Redshift to any supported sink data store using the Copy activity. For a list of data stores that are supported as sources/sinks by the copy activity, see the Supported data stores table.
Specifically, this Amazon Redshift connector supports retrieving data from Redshift using query or built-in Redshift UNLOAD support.
Note: When copying data to an Azure data store, see Azure Data Center IP Ranges for the Compute IP address and SQL ranges used by the Azure data centers.
In case you need to import data into Azure SQL database from AWS Redshift, follow the link.
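If you want to create that Copy activity programmatically rather than through the portal (to keep the daily workflow fully automated), a rough C# sketch using the Data Factory .NET SDK could look like the one below. It assumes the data factory, the Redshift and Blob Storage linked services, and the two datasets it references already exist; every name and ID in it is a placeholder.

```csharp
// Rough sketch: define an ADF v2 pipeline whose Copy activity pulls from Amazon Redshift
// and lands the data in Blob Storage. Assumes the data factory, linked services and the
// two datasets referenced below already exist; all names and IDs are placeholders.
using System;
using System.Collections.Generic;
using Microsoft.Azure.Management.DataFactory;
using Microsoft.Azure.Management.DataFactory.Models;
using Microsoft.IdentityModel.Clients.ActiveDirectory;
using Microsoft.Rest;

class CreateRedshiftCopyPipeline
{
    static void Main()
    {
        // Authenticate with a service principal (placeholders).
        var context = new AuthenticationContext("https://login.microsoftonline.com/<tenant-id>");
        var token = context.AcquireTokenAsync("https://management.azure.com/",
            new ClientCredential("<client-id>", "<client-secret>")).Result;
        var client = new DataFactoryManagementClient(new TokenCredentials(token.AccessToken))
        {
            SubscriptionId = "<subscription-id>"
        };

        // One Copy activity: a Redshift query as the source, Blob Storage as the sink.
        var pipeline = new PipelineResource
        {
            Activities = new List<Activity>
            {
                new CopyActivity
                {
                    Name = "CopyRedshiftToBlob",
                    Inputs = new List<DatasetReference> { new DatasetReference { ReferenceName = "RedshiftDataset" } },
                    Outputs = new List<DatasetReference> { new DatasetReference { ReferenceName = "BlobDataset" } },
                    Source = new AmazonRedshiftSource
                    {
                        Query = "SELECT * FROM public.sales WHERE updated_at >= current_date - 1"
                    },
                    Sink = new BlobSink()
                }
            }
        };

        client.Pipelines.CreateOrUpdate("<resource-group>", "<factory-name>", "DailyRedshiftExtract", pipeline);
        Console.WriteLine("Pipeline created; add a daily trigger to automate it.");
    }
}
```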

Need solution to integrate Grafana with Azure data lake

I want to integrate Azure Data Lake Storage with Grafana for visualization of time-series data. I need to know which tools I can use to make this possible.
I used ADF to extract data from CSV files stored in the data lake and move it to a table in Azure Data Explorer. After that, I used the Azure Data Explorer plugin in Grafana to visualize the same data. It worked fine, but I need to know whether there is any other approach which may be better or more cost-effective.
Integrating Grafana with Azure Data Lake via Azure Data Explorer is the best option compared to the others, because the other options involve data movement using ADF and the additional cost of Azure SQL Data Warehouse, along with the cost of Power BI.
Reason:
Grafana is a leading open source software designed for visualizing time series analytics. It is an analytics and metrics platform that enables you to query and visualize data and create and share dashboards based on those visualizations. Combining Grafana’s beautiful visualizations with Azure Data Explorer’s snappy ad hoc queries over massive amounts of data creates impressive usage potential.
The Grafana and Azure Data Explorer teams have created a dedicated plugin which enables you to connect to and visualize data from Azure Data Explorer using its intuitive and powerful Kusto Query Language. In just a few minutes, you can unlock the potential of your data and create your first Grafana dashboard with Azure Data Explorer.
For more details on visualizing data from Azure Data Explorer in Grafana please visit our documentation, “Visualize data from Azure Data Explorer in Grafana”.
Other options:
For Azure Data Lake Storage Gen1 and Gen2:
You can use a mix of services to create visual representations of data stored in Data Lake Storage.
You can start by using Azure Data Factory to move data from Data Lake Storage to Azure SQL Data Warehouse.
After that, you can integrate Power BI with Azure SQL Data Warehouse to create visual representations of the data.
Hope this helps.
They just released a new guide; this one is for Grafana 5.3:
https://learn.microsoft.com/en-us/azure/data-explorer/grafana
You are able to test this by running Grafana in a Docker container (or for real, if you want). I followed the guide, and it is working almost exactly as expected. The only issue I am having is that Grafana is concatenating the column name and the data in the column, which makes reading and formatting tricky.

Add SQL Server as a data source in Azure Data Lake Analytics

I'm doing some tests with Azure Data Lake Analytics and I can’t add a new SQL Server database as a Data Source. When I click on "Add data source", the only two available options are: "Azure Data Lake Storage Gen1" and "Azure Storage".
What I want is to add one SQL Server database so that I can run U-SQL queries against it.
Our SQL Server firewall is correctly configured to allow access to Azure Services, but I am not allowed to add it as a data source.
How can this be done? Is it a matter of other configuration issues?
Any help would be greatly appreciated.
Per my research, this is not caused by any other configuration issue with the SQL Server data source in DLA. Based on this official doc, DLA only supports two data sources: Data Lake Store and Azure Storage.
As a workaround, I suggest using Azure Data Factory to transfer the data from your SQL Server database to Azure Storage, so that you can run your U-SQL scripts against that data.
If you have any concerns, please let me know.

How to connect Data Lake store in Azure analysis services

How can I connect a Data Lake Store in Azure Analysis Services?
Can we use HIVE ODBC, or are there any other options?
I assume you want to use Azure Data Lake as a data source for Azure Analysis Services (e.g. you have fact and dimension files in the Data Lake).
There is no connector in Azure Analysis Services to pull data directly from Azure Data Lake at present, although hopefully this is something Microsoft will address soon.
As a workaround you could try the following:
Azure Analysis Services will allow you to use Azure Blob Storage as a data source. So once you have transformed your data in Azure Data Lake, you then need to copy the fact and dimension files into Azure Blob Storage (e.g. using Azure Data Factory), and then you should be able to use Azure Analysis Services to build your model.
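If you would rather script that copy step than set up Azure Data Factory, a minimal C# sketch of moving a file from Azure Data Lake Store (Gen1) into Blob Storage could look like this; the account name, paths, and connection string are placeholders, and it assumes the Microsoft.Azure.DataLake.Store and Azure.Storage.Blobs packages.

```csharp
// Minimal sketch: copy one fact/dimension file from Azure Data Lake Store (Gen1) into Blob Storage.
// Account names, paths, credentials and the connection string below are placeholders.
using System;
using System.Threading.Tasks;
using Azure.Storage.Blobs;
using Microsoft.Azure.DataLake.Store;
using Microsoft.Rest.Azure.Authentication;

class AdlsToBlobCopy
{
    static async Task Main()
    {
        // Service principal credentials for the Data Lake Store account (placeholders).
        var creds = await ApplicationTokenProvider.LoginSilentAsync(
            "<tenant-id>", "<client-id>", "<client-secret>");
        AdlsClient adls = AdlsClient.CreateClient("<account>.azuredatalakestore.net", creds);

        // Target blob container (placeholder connection string).
        var container = new BlobContainerClient("<storage-connection-string>", "model-data");
        await container.CreateIfNotExistsAsync();

        // Stream the transformed file straight from the lake into Blob Storage.
        using (var source = adls.GetReadStream("/curated/FactSales.csv"))
        {
            await container.GetBlobClient("FactSales.csv").UploadAsync(source, overwrite: true);
        }

        Console.WriteLine("Copied FactSales.csv to Blob Storage.");
    }
}
```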
Note that the Blob Storage data source option is only available if you build a 1400 compatibility model in Azure Analysis Services. This option is only available if you have the latest version of SQL Server Data Tools for Visual Studio (you may need to upgrade to version 17.1 of SSDT).
I hope this helps
