Azure Data Lake Analytics vs Azure SQL Data Warehouse

I am using ADF to connect to sources and get data into Azure Data Lake Store. After getting data into the Data Lake Store, I want to do some transformation and aggregation, and use that data in SSRS reports and also for creating cubes.
Can anyone suggest which will be the better option (Azure Data Lake Analytics or Azure SQL DW)?
I am trying to decide which one to use after the Data Lake.

Azure SQL DW no longer exists under that name. What we have now is Azure Synapse (the renamed Azure SQL DW) and Azure Synapse Analytics (in place of Azure Data Lake Analytics). Microsoft has stopped developing U-SQL and Azure Data Lake Analytics. If the volume of your data is huge and you want to use the PolyBase technology, the best choice is Azure Synapse and Azure Synapse Analytics. You can also enrich your ADF pipelines by using Databricks for the analytics work. By using PolyBase you can do ELT instead of ETL.
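For illustration, a minimal sketch of the PolyBase-style ELT pattern in a dedicated pool (the data source URL, paths, and table names below are placeholders, and credential setup is omitted):

-- Register the lake as an external data source (placeholder location).
CREATE EXTERNAL DATA SOURCE LakeSource
WITH (TYPE = HADOOP, LOCATION = 'abfss://container@yourlake.dfs.core.windows.net');

-- Describe the layout of the raw files.
CREATE EXTERNAL FILE FORMAT ParquetFormat
WITH (FORMAT_TYPE = PARQUET);

-- Expose the raw files as an external table.
CREATE EXTERNAL TABLE ext.Sales (SaleId INT, Amount DECIMAL(18, 2))
WITH (LOCATION = '/raw/sales/', DATA_SOURCE = LakeSource, FILE_FORMAT = ParquetFormat);

-- Load first, transform afterwards inside the warehouse (the "ELT" part).
CREATE TABLE dbo.Sales
WITH (DISTRIBUTION = HASH(SaleId), CLUSTERED COLUMNSTORE INDEX)
AS SELECT * FROM ext.Sales;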

Microsoft is no longer investing in Azure Data Lake Analytics (ADLA); you can plainly see that there have been almost no enhancements or updates to ADLA in the last couple of years. Azure SQL Data Warehouse, on the other hand, is their flagship service (recently renamed Azure Synapse Analytics) and hence is being enhanced and updated very fast. Synapse is based on an MPP architecture and provides all the capabilities required for big data computing.

What is the size of your data? Azure Data Lake is meant more for petabyte-scale big data processing, while Azure SQL Data Warehouse is for large relational DWH solutions (starting from 250/500 GB and up).
With Azure Data Lake you can even have the data from the lake feed a NoSQL database, an SSAS cube, or a data mart, or go right into Power BI. With Azure SQL Data Warehouse you can have cubes, Power BI reports, and SSRS.
If you need SQL Server Reporting Services, Integration Services (and you have complex SSIS logic), and Analysis Services (SSAS), you may be better off considering an Azure SQL VM.

Related

Difference between the dedicated SQL pool and the dedicated SQL pool inside Azure Synapse Analytics?

While provisioning Azure Synapse Analytics we use the Azure Storage Gen2 layer. As per MSDN, the data will be stored in Azure Storage Gen2, but Gen2 exposes HDFS features, so how does Synapse Analytics make use of the DFS features?
They are both the same thing. You can either create a dedicated SQL pool first and link it to a Synapse workspace, or create the Synapse workspace first and then the dedicated pool inside it.
A dedicated SQL pool offers T-SQL based compute and storage capabilities. After creating a dedicated SQL pool in your Synapse workspace, data can be loaded, modeled, processed, and delivered for faster analytic insight.
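As a small illustration of the kind of T-SQL modeling a dedicated pool supports (the table and column names below are made up for the example):

CREATE TABLE dbo.FactSales
(
    SaleId  INT NOT NULL,
    StoreId INT NOT NULL,
    Amount  DECIMAL(18, 2)
)
WITH
(
    -- Rows are hash-distributed across the pool's compute nodes
    -- and stored in a clustered columnstore index for fast scans.
    DISTRIBUTION = HASH(SaleId),
    CLUSTERED COLUMNSTORE INDEX
);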
Apart from the dedicated SQL pool, Azure Synapse provides serverless SQL and Apache Spark pools. Based on your requirements you can choose the appropriate one.
Serverless SQL pool is a query service over the data in your data lake. It enables you to access your data through the following functionalities:
A familiar T-SQL syntax to query data in place without the need to copy or load data into a specialized store.
Integrated connectivity via the T-SQL interface that offers a wide range of business intelligence and ad-hoc querying tools, including the most popular drivers.
You pass the path of the file stored in Data Lake Gen2 directly in the T-SQL statement. Refer to the example below:
SELECT TOP 10 *
FROM OPENROWSET(
         BULK 'https://pandemicdatalake.blob.core.windows.net/public/curated/covid-19/ecdc_cases/latest/ecdc_cases.csv',
         FORMAT = 'csv',
         PARSER_VERSION = '2.0',
         FIRSTROW = 2
     ) AS rows
For more related information, I recommend you go through this document.

Load data from Databricks to Azure Analysis Services (AAS)

Objective
I'm storing data in Delta Lake format in ADLS Gen2. The tables are also available through the Hive catalog.
It's important to note that we're currently using Power BI, but in the future we may switch to Excel over AAS.
Question
What is the best way (or hack) to connect AAS to my ADLS gen2 data in Delta Lake format?
The issue
Databricks/Hive are not among the supported AAS sources. AAS supports ADLS Gen2 through the Blob connector, but AFAIK it doesn't support the Delta Lake format, only Parquet.
Possible solution
From this article I see that the issue may potentially be solved with the Power BI on-premises data gateway:
One example is the integration between Azure Analysis Services (AAS) and Databricks; Power BI has a native connector to Databricks, but this connector hasn’t yet made it to AAS. To compensate for this, we had to deploy a Virtual Machine with the Power BI Data Gateway and install Spark drivers in order to make the connection to Databricks from AAS. This wasn’t a show stopper, but we’ll be happy when AAS has a more native Databricks connection.
The issue with this solution is that we're planning to stop using Power BI. I don't quite understand how it works, what PBI license it requires, and what implementation/maintenance effort is involved. Could you please provide deeper insight into how it would work?
UPD, 26 Dec 2020
Now that Azure Synapse Analytics is GA, it fully supports SQL on-demand. That means that serverless Synapse may theoretically be used as glue between AAS and Delta Lake. See "Direct Query Databricks' Delta Lake from Azure Synapse".
At the same time, is it possible to query the Databricks catalog (internal/external) from Synapse on-demand using ODBC? Synapse supports ODBC as an external source.
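For concreteness, a minimal sketch of what that glue query could look like, using the Delta reader that serverless SQL pools gained (the storage account, container, and table path are placeholders):

-- Query a Delta table in place from a serverless SQL pool.
SELECT TOP 10 *
FROM OPENROWSET(
         BULK 'https://yourlake.dfs.core.windows.net/yourcontainer/delta/events/',
         FORMAT = 'DELTA'
     ) AS rows;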
Power BI Dataflows now support Parquet files, so you can load from those files into Power BI; however, the standard design pattern is to use Azure SQL Data Warehouse to load the files and then layer Azure Analysis Services (AAS) over that. AAS does not support Parquet; you would have to create a CSV version of the final table, or load it into a SQL database.
As mentioned, the typical architecture is to have Databricks do some or all of the ETL, then have Azure SQL DW sit over it.
Azure SQL DW has now morphed into Azure Synapse, which has the benefit that a Databricks/Spark database now gets a shadow copy accessible through the SQL on Demand functionality. SQL on Demand doesn't require you to have an instance of the data warehouse component of Azure Synapse; it runs on demand, and you pay per TB of data queried. A good outline of how it can help is here. The other option is to have Azure Synapse load the data from an external table into that service, then connect AAS to that.
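As a rough sketch of that last pattern (the database, path, and view name here are illustrative, not from the answer): expose the lake files through a serverless SQL view, then point AAS at that database as an ordinary SQL source.

-- In a serverless SQL pool database, wrap the lake files in a view
-- that AAS (or any SQL client) can model like a normal table.
CREATE VIEW dbo.FinalTable AS
SELECT *
FROM OPENROWSET(
         BULK 'https://yourlake.dfs.core.windows.net/curated/final_table/**',
         FORMAT = 'PARQUET'
     ) AS rows;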

Azure Data Factory architecture with Azure SQL database to Power BI

I'm no MS expert - recently hopped onto the Azure train and apologies in advance if I get some information wrong.
Basically I need some input on an Azure architecture utilising Azure Data Factory (as the ETL/ELT tool) and Azure SQL Database (as the storage), with a BI output - Power BI. My situation is this:
I have on-premises data sources such as Oracle DB, Oracle Cloud SSAS, and MS SQL Server DB.
I'd like to have an MS cloud infrastructure solution for reporting purposes.
No data migration is needed - merely pumping on-prem data into the cloud and producing a BI reporting solution.
Based on my limited knowledge and Google research, Azure Data Factory caters for all my on-prem sources, as well as the future cloud Azure SQL database. If future analysis is needed, Azure Storage and Azure Databricks can be added in to this architecture. I have sketched out the architecture of my proposed solution.
Just confirming my understanding:
Without Azure Storage & Databricks (the 2 pink boxes), the 2 Azure components (ADF & SQL Database) are sufficient to take data from the on-premises sources, process it in the cloud, and output it to Power BI.
With Azure Storage & Databricks (the 2 pink boxes), processing will be more efficient, as their summarised function is to store training data/models and act as an analytics processing engine.
Azure SQL Database is more suitable than Azure SQL Data Warehouse, as my data sources do not exceed 1 TB; it is cheaper cost-wise, AND one of my data sources contains data from call centers, so OLTP is more suitable. Plus, I have Azure Databricks to support the analytical bit that SQL Data Warehouse covers (OLAP).
Any other comments to help me understand this whole architecture will be great!
Taking data from on-prem or IaaS sources like SQL Server on a VM, Oracle, etc. requires a Self-Hosted Integration Runtime (SHIR).
Please review the Modern Data Warehouse pattern which sounds similar to what you are proposing.

Need solution to integrate Grafana with Azure data lake

I want to integrate Azure Data Lake Storage with Grafana for visualization of time series data. I need to know what tools I can use to make this possible.
I used ADF to extract data from CSV files stored in the data lake and move it to a table in Azure Data Explorer. After that, I used the Azure Data Explorer plugin in Grafana to visualize it. It worked fine. But I need to know whether there is any other approach that may be better or more cost-effective.
Integrating Grafana with Azure Data Lake (via Azure Data Explorer) is the best option compared to the others, because the other options involve data movement using ADF plus the additional cost of Azure SQL Data Warehouse along with the cost of Power BI.
Reason:
Grafana is a leading open-source platform designed for visualizing time series analytics. It is an analytics and metrics platform that enables you to query and visualize data and create and share dashboards based on those visualizations. Combining Grafana’s beautiful visualizations with Azure Data Explorer’s snappy ad hoc queries over massive amounts of data creates impressive usage potential.
The Grafana and Azure Data Explorer teams have created a dedicated plugin which enables you to connect to and visualize data from Azure Data Explorer using its intuitive and powerful Kusto Query Language. In just a few minutes, you can unlock the potential of your data and create your first Grafana dashboard with Azure Data Explorer.
For more details on visualizing data from Azure Data Explorer in Grafana please visit our documentation, “Visualize data from Azure Data Explorer in Grafana”.
Other options:
For Azure Data Lake Gen1 and Gen2:
You can use a mix of services to create visual representations of data stored in Data Lake Storage.
You can start by using Azure Data Factory to move data from Data Lake Storage (Gen1 or Gen2) to Azure SQL Data Warehouse.
After that, you can integrate Power BI with Azure SQL Data Warehouse to create a visual representation of the data.
Hope this helps.
They just released a new guide. This is for Grafana 5.3
https://learn.microsoft.com/en-us/azure/data-explorer/grafana
You can test this by running Grafana in a Docker container (or for real, if you want). I followed the guide, and it works almost exactly as expected. The only issue I am having is that Grafana concatenates the column name and the data in the column, making reading and formatting tricky.

Using Azure Data Lake for Analytics

Currently as part of our requirements we are working with the below Azure components
Azure Event Hub
Azure Stream Analytics
Azure Table Storage
Azure Sql DB
Basically, with the first 3 components, we will be building an analytics and reports platform.
Currently, as we have just started, we analyze the data from Azure Table Storage and display it in the analytics dashboard.
Recently we came across a new Azure product, Azure Data Lake. Doing some research on the Microsoft website, we could see that we can easily migrate data from Azure Table Storage (with the help of Azure Data Factory) to Azure Data Lake Store: Creating big data pipelines using Azure Data Lake and Azure Data Factory
As mentioned in the above link, we need to create an Azure Data Lake Analytics pipeline to process the data.
What I am unclear about is where the analytics output data will be saved. Do we need to save the analytics output to some DB, or can we serve real-time analytics through an HTTP request?
We have a huge number of records in Azure Table Storage that will be moved to Azure Data Lake. For this scenario, is it a good option, or can we build an analytics solution on Azure Table Storage itself?
Please share your thoughts
You can store your analytics output data in Azure Data Lake Store (a data repository that enables you to store all kinds of data in their raw format without defining schemas) after processing it through Azure Data Lake Analytics (an analytics service that enables you to run jobs on data sets without having to think about clusters).
As you said, "We have a huge number of records in Azure Table Storage that will be moved to Azure Data Lake", so I think performing the analytics on data placed in Azure Data Lake Store is much more efficient, because it offers unlimited storage with immediate read/write access and scales to the throughput you need for your workloads. It also offers low-latency small writes for big data sets. So I believe it is a better choice than Azure Table Storage.
