Difference between a standalone dedicated SQL pool and a dedicated SQL pool inside Azure Synapse Analytics?

While provisioning Azure Synapse Analytics we use Azure Data Lake Storage Gen2 as the storage layer. As per MSDN the data will be stored in ADLS Gen2, but ADLS Gen2 exposes HDFS features. So how does Synapse Analytics use the DFS feature?

They are both the same thing. You can either create a standalone dedicated SQL pool first and link it to a Synapse workspace, or create the Synapse workspace first and then create the dedicated pool inside it.
As for the storage question: ADLS Gen2 exposes a Hadoop-compatible (HDFS-style) file system through its DFS endpoint, and Synapse uses that endpoint to read and write files in the workspace's default storage account.
A dedicated SQL pool offers T-SQL based compute and storage capabilities. After creating a dedicated SQL pool in your Synapse workspace, data can be loaded, modeled, processed, and delivered for faster analytic insight.
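For example, here is a minimal sketch of that load path in T-SQL; the table definition, storage account, and file path are placeholders, not anything from this thread:
-- Create a distributed table in the dedicated SQL pool; hash distribution
-- plus a clustered columnstore index is a typical choice for large fact tables.
CREATE TABLE dbo.FactSales
(
    SaleId     BIGINT        NOT NULL,
    CustomerId INT           NOT NULL,
    Amount     DECIMAL(18,2) NOT NULL
)
WITH
(
    DISTRIBUTION = HASH(CustomerId),
    CLUSTERED COLUMNSTORE INDEX
);

-- Bulk-load staged CSV files from ADLS Gen2 with the COPY statement.
COPY INTO dbo.FactSales
FROM 'https://<yourstorage>.dfs.core.windows.net/staging/sales/*.csv'
WITH
(
    FILE_TYPE = 'CSV',
    FIRSTROW = 2  -- skip the header row
);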
Apart from dedicated SQL pools, Azure Synapse provides serverless SQL and Apache Spark pools. Based on your requirements you can choose the appropriate one.
Serverless SQL pool is a query service over the data in your data lake. It enables you to access your data through the following functionalities:
A familiar T-SQL syntax to query data in place without the need to copy or load data into a specialized store.
Integrated connectivity via the T-SQL interface that offers a wide range of business intelligence and ad-hoc querying tools, including the most popular drivers.
You pass the path of the file stored in Data Lake Storage Gen2 directly in the T-SQL statement. Refer to the example below:
SELECT TOP 10 *
FROM OPENROWSET(
    BULK 'https://pandemicdatalake.blob.core.windows.net/public/curated/covid-19/ecdc_cases/latest/ecdc_cases.csv',
    FORMAT = 'CSV',
    PARSER_VERSION = '2.0',
    FIRSTROW = 2
) AS rows;
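The same pattern works against your own ADLS Gen2 account over its DFS endpoint, which is the HDFS-compatible surface the question above asks about. A hedged sketch (the account, container, and path are placeholders, and your identity needs read access to the files):
SELECT TOP 10 *
FROM OPENROWSET(
    BULK 'https://<yourstorage>.dfs.core.windows.net/<container>/curated/*.parquet',
    FORMAT = 'PARQUET'
) AS rows;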
For more related information, I recommend going through this document.

Related

Database link for Azure SQL to Azure Synapse Analytics Serverless SQL Pool

A client of mine needs to join tables from his Azure SQL financial data mart with external tables built on a data lakehouse (Parquet files) in Azure Synapse Analytics.
I was wondering if it's possible to create a database link within an Azure SQL database accessing an Azure Synapse Analytics serverless (on-demand) SQL pool.
Yes, it’s possible: open the Integrate hub, then select and add a link connection.
Select Test connection, and make sure the SQL firewall rules are properly configured.
Reference:
Get started with Azure Synapse Link for Azure SQL Database (Preview) - Azure Synapse Analytics | Microsoft Docs

Not able to transform and load from ADLS (CSV) to Dedicated SQL Pool using Azure Synapse's Data Flow

I am trying to transform data from ADLS using Azure Synapse's data flow and store it in a table in a dedicated SQL pool.
I created a dataset 'UserSinkDataset' pointing to this table in the dedicated SQL pool.
This 'UserSinkDataset' is not visible in the sink dataset of the data flow.
There is no option to create a dataset pointing to the dedicated pool from the data flow.
Could someone help me understand why it is not shown in the dropdown?
In a data flow sink there is no option to create a dataset that refers to the 'Azure Synapse Dedicated SQL pool' type; data flows only offer the 'Azure Synapse Analytics' dataset type. That is why 'UserSinkDataset' (created as an Azure Synapse Dedicated SQL pool dataset) is not shown in the dropdown. Instead, use the Azure Synapse Analytics option to point to the table in the dedicated SQL pool and create your dataset.
You can follow the steps given below.
Once you reach the sink step, click on new.
Browse for Azure Synapse Analytics and continue.
Create a new linked service by clicking on new.
Specify your workspace, dedicated SQL pool (the one you want to point to) and authentication for the synapse workspace. Test the connection and create the linked service.
After creating the linked service, you can select dbo.SFUser from your SQL pool and click OK.
Now you can go ahead and set the rest of the properties for sink.
You can also create 'UserSinkDataset' by choosing Azure Synapse Analytics instead of Azure Synapse Dedicated SQL pool before creating the data flow. That way, the dataset will appear in the dropdown for the sink dataset property.

Can I use Azure Synapse functionality outside the Azure environment?

Forum,
I am currently looking into Azure Synapse as an option for migrating our on-prem data architecture. I am excited by the functionality it offers: SQL pools, Spark pools, and the accompanying notebooks. I get that Synapse can function as an all-in-one data platform, where my data scientists and data analysts can use its functionality to deliver insights at will. However, a large part of the work my team does is creating data products.
We currently have a Kubernetes cluster with several stand-alone APIs that perform data-science operations within the larger whole of our software. They can be thought of as microservices. Most of the ETL is done in our SQL Server, and the microservices in our K8s cluster (usually Python + some Python packages + FastAPI) typically get the required data from our SQL Server through a SQL query over an ODBC connector.
Now my question is: how suitable is Synapse for such an architecture? Can I call on the SQL pool or Spark pool to do the heavy data lifting from outside the Azure environment, say from a Kubernetes pod?
Unfortunately, you can't integrate Azure Synapse Analytics with Azure Kubernetes Service directly.
While Synapse SQL helps you run SQL queries, Apache Spark executes batch/stream processing on big data. The SQL pool is used to work with data stored in the dedicated SQL pool, while Spark SQL can be integrated with existing data preparation or data science projects that you may hold in Azure Databricks or Azure Machine Learning.
Also, as per this third-party document, Azure Synapse Analytics can't integrate with Kubernetes Services.
As a workaround, you can copy or move your data from Kubernetes to Azure services such as a dedicated SQL pool, Azure Blob Storage, or Azure Data Lake Storage, and then process it with a Synapse pipeline or Spark pool.

Load data from Databricks to Azure Analysis Services (AAS)

Objective
I'm storing data in Delta Lake format in ADLS Gen2, and it is also available through the Hive catalog.
It's important to note that we're currently using Power BI, but in the future we may switch to Excel over AAS.
Question
What is the best way (or hack) to connect AAS to my ADLS Gen2 data in Delta Lake format?
The issue
There is no Databricks/Hive connector among the AAS supported sources. AAS supports ADLS Gen2 through the Blob connector but, AFAIK, it doesn't support the Delta Lake format, only Parquet.
Possible solution
From this article I see that the issue may potentially be solved with the Power BI on-premises data gateway:
One example is the integration between Azure Analysis Services (AAS) and Databricks; Power BI has a native connector to Databricks, but this connector hasn’t yet made it to AAS. To compensate for this, we had to deploy a Virtual Machine with the Power BI Data Gateway and install Spark drivers in order to make the connection to Databricks from AAS. This wasn’t a show stopper, but we’ll be happy when AAS has a more native Databricks connection.
The issue with this solution is that we're planning to stop using Power BI. I don't quite understand how it works, or what Power BI license and implementation/maintenance effort it requires. Could you please provide deeper insight on how it would work?
UPD, 26 Dec 2020
Now that Azure Synapse Analytics is GA, it has full support for SQL on-demand (serverless). That means serverless Synapse may theoretically be used as glue between AAS and Delta Lake. See "Direct Query Databricks' Delta Lake from Azure Synapse".
At the same time, is it possible to query the Databricks catalog (internal/external) from Synapse on-demand using ODBC? Synapse supports ODBC as an external source.
Power BI Dataflows now supports Parquet files, so you can load from those files into Power BI. However, the standard design pattern is to use Azure SQL Data Warehouse to load the files and then layer Azure Analysis Services (AAS) over that. AAS does not support Parquet; you would have to create a CSV version of the final table or load it into a SQL database.
As mentioned, the typical architecture is to have Databricks do some or all of the ETL and then have Azure SQL DW sit over it.
Azure SQL DW has now morphed into Azure Synapse, which has the benefit that a Databricks/Spark database gets a shadow copy accessible through the SQL on-demand (serverless) functionality. SQL on-demand doesn't require you to have an instance of the data warehouse component of Azure Synapse; it runs on demand, and you pay per TB queried. A good outline of how it can help is here. The other option is to have Azure Synapse load the data from an external table into that service and then connect AAS to that.
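For what it's worth, Synapse serverless SQL later gained native Delta Lake support, so the serverless "glue" query can be as simple as the sketch below (the storage path is a placeholder, and this capability postdates the original thread, so check the current docs):
SELECT TOP 10 *
FROM OPENROWSET(
    BULK 'https://<yourstorage>.dfs.core.windows.net/<container>/delta/sales/',
    FORMAT = 'DELTA'
) AS rows;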

Azure Data Lake Analytics Vs Azure SQL Data Warehouse

I am using ADF to connect to sources and land data in Azure Data Lake Store. After getting the data into Data Lake Store, I want to do some transformation and aggregation, and then use that data in SSRS reports and for building cubes.
Can anyone suggest which would be the best option (Azure Data Lake Analytics or Azure SQL DW)?
I am looking to make a decision on which one to use after the data lake.
Azure SQL DW no longer exists as a separate service: it has become Azure Synapse Analytics (the dedicated SQL pool is the old Azure DW), and Microsoft is stopping development of U-SQL and Azure Data Lake Analytics. If the volume of your data is huge and you want to use the PolyBase technology, the best choice is Azure Synapse Analytics. You can enrich your ADF pipelines by using Databricks for the analytics work. By using PolyBase you can do ELT instead of ETL, as sketched below.
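A minimal sketch of that PolyBase ELT pattern in a dedicated SQL pool (the storage account, paths, and table names are placeholders; non-public storage also needs a database-scoped credential):
-- External data source and file format over the data lake (classic PolyBase).
CREATE EXTERNAL DATA SOURCE LakeSource
WITH (
    LOCATION = 'abfss://data@<yourstorage>.dfs.core.windows.net',
    TYPE = HADOOP
);

CREATE EXTERNAL FILE FORMAT ParquetFormat
WITH (FORMAT_TYPE = PARQUET);

CREATE SCHEMA ext;

-- External table: just a schema over the files, no data movement yet.
CREATE EXTERNAL TABLE ext.Sales
(
    SaleId BIGINT,
    Amount DECIMAL(18,2)
)
WITH (
    LOCATION = '/sales/',
    DATA_SOURCE = LakeSource,
    FILE_FORMAT = ParquetFormat
);

-- ELT: land the raw data in the warehouse first, then transform with T-SQL.
CREATE TABLE dbo.Sales
WITH (DISTRIBUTION = HASH(SaleId), CLUSTERED COLUMNSTORE INDEX)
AS SELECT * FROM ext.Sales;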
Microsoft is no longer investing in Azure Data Lake Analytics (ADLA); you can see this in the fact that the number of enhancements/updates over the last couple of years is almost zero. On the other side, Azure SQL Data Warehouse is their flagship service (recently renamed Azure Synapse Analytics) and hence is being enhanced and updated very quickly. Synapse is based on an MPP architecture and provides all the required capabilities for big data computing.
What is the size of your data? Azure Data Lake is meant more for petabyte-scale big data processing, and Azure SQL Data Warehouse for large relational DWH solutions (starting from roughly 250-500 GB and up).
With Azure Data Lake you can even have the data from the lake feed a NoSQL database, an SSAS cube, a data mart, or go straight into Power BI. With Azure SQL Data Warehouse you can have cubes, Power BI reports, and SSRS.
If you need SQL Server Reporting Services, Integration Services (and you have complex SSIS logic), and Analysis Services (SSAS), you may be better off considering an Azure SQL VM.
