I have a view in an on-demand (or "serverless") SQL pool. My goal is to copy over data from the serverless views and materialize it as tables in the dedicated pool. Is this possible?
There are a couple of options here:
Create a Synapse pipeline with a Copy activity. Use the serverless view as the source and the dedicated SQL pool as the sink. Make sure the 'Auto create table' option is set on the sink.
Create a Synapse notebook that connects via JDBC to the serverless SQL pool (it's just a SQL endpoint, right?), and writes into the dedicated SQL pool via the synapsesql.write method. I did an example of that technique here.
As per the official Microsoft documentation:
Limitations
Views in Synapse SQL are only stored as metadata. Consequently, the following options aren't available:
There isn't a schema binding option
Base tables can't be updated through the view
Views can't be created over temporary tables
There's no support for the EXPAND / NOEXPAND hints
There are no indexed views in Synapse SQL
But, as an alternative, if your table is in a dedicated SQL pool you can use CREATE TABLE AS SELECT (CTAS), which creates a new table based on the output of a SELECT statement. CTAS is the simplest and fastest way to create a copy of a table.
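For illustration, a minimal CTAS sketch (the table names and distribution choice here are hypothetical):

-- Create a copy of dbo.Trip in the dedicated pool; a distribution option
-- is required in CTAS, so ROUND_ROBIN is used here as a placeholder.
CREATE TABLE dbo.TripCopy
WITH
(
    DISTRIBUTION = ROUND_ROBIN,
    CLUSTERED COLUMNSTORE INDEX
)
AS
SELECT * FROM dbo.Trip;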
To know more, please refer to CREATE TABLE AS SELECT (Azure Synapse Analytics).
Related
I am trying to transform data from Salesforce before loading it to dedicated SQL pool.
When I try to create a dataset from Synapse's Dataflow, I am not able to choose Salesforce as a Data store:
Can anyone suggest how to transform data from Salesforce or any other Datasource that is not supported by Dataflow?
As per the official documentation, Dataflows currently do not support Salesforce data as a source or sink.
If you want, you can raise a feature request in the Synapse portal.
As an alternative, you can use the Copy activity in Azure Data Factory to copy data from Salesforce to the dedicated SQL pool, and then transform it using Dataflows in Synapse from the dedicated SQL DB to the dedicated SQL DB.
Follow the steps below to achieve this:
First, create a Data Factory workspace.
Select the Author hub and create a pipeline. Now, drag the Copy activity onto the canvas and select the source. You can see that Salesforce is supported when you select a new source dataset. Select it and create a linked service for it.
Now, select the sink dataset and click on Azure Synapse Analytics.
Create a linked service for the dedicated SQL database and select it.
Then, select the target table in the dedicated SQL pool and copy your data by running the pipeline.
After this copy, go to the Synapse workspace and click on the source of the Dataflow.
Select Azure Synapse Analytics as the source and click on continue.
Now, click on New to create a linked service for the SQL DB. Give the subscription and server name and authenticate with your database.
After creating the linked service, select it and choose the table in the DB that resulted from the copy.
Now, go to the sink, select Azure Synapse Analytics and create another linked service for it in the same way as above, then select the target table in the DB that you want after the transformation.
By following the above process, you can transform Salesforce data and load it into the dedicated SQL DB.
You can try this approach for data stores that are not supported by Data flows. Please refer to this to check the various data stores supported by the Copy activity before applying the process to other data stores.
I am trying to transform data from ADLS by using Azure Synapse's Dataflow and store it in a table in Dedicated SQL Pool.
I created a Dataset 'UserSinkDataset' pointing to this table in Dedicated SQL Pool.
This 'UserSinkDataset' is not visible in the sink dataset dropdown of the dataflow.
There is no option to create a dataset pointing to the dedicated pool from the dataflow.
Could someone help me understand why it is not shown in the dropdown?
There is no option to create a dataset referring to a dedicated SQL pool from a dataflow; it only offers Azure Synapse Analytics instead. That is why 'UserSinkDataset' (of type Azure Synapse Dedicated SQL pool) is not shown in the dropdown. So, you can use the Azure Synapse Analytics option to point to the table in the dedicated SQL pool and create your dataset.
You can follow the steps given below.
Once you reach the sink step, click on new.
Browse for Azure Synapse Analytics and continue.
Create a new linked service by clicking on new.
Specify your workspace, the dedicated SQL pool (the one you want to point to) and authentication for the Synapse workspace. Test the connection and create the linked service.
After creating the linked service, you can select dbo.SFUser from your SQL pool and click ok.
Now you can go ahead and set the rest of the properties for sink.
You can also create ‘UserSinkDataset’ by choosing Azure Synapse Analytics instead of Azure Synapse dedicated SQL pool before creating the dataflow. This way, the dataset will appear in the dropdown list of the sink dataset property.
What is the difference between a dedicated SQL pool and a dedicated SQL pool inside Azure Synapse Analytics?
While provisioning Azure Synapse Analytics, we use the Azure Storage Gen2 layer; as per MSDN, the data will be stored in Azure Storage Gen2, and Gen2 exposes HDFS features. So how does Synapse Analytics use the DFS feature?
They are both the same thing. You can either create a dedicated SQL pool first and link it with a Synapse workspace, or create the Synapse workspace first and then the dedicated pool inside it.
A dedicated SQL pool offers T-SQL based compute and storage capabilities. After creating a dedicated SQL pool in your Synapse workspace, data can be loaded, modeled, processed, and delivered for faster analytic insight.
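For instance, a minimal loading sketch using the COPY statement (the table name, storage account, and container path are hypothetical, and credential options are omitted):

-- Load CSV files from ADLS Gen2 into an existing dedicated pool table.
COPY INTO dbo.Trip
FROM 'https://yourlake.dfs.core.windows.net/container/trips/*.csv'
WITH (
    FILE_TYPE = 'CSV',
    FIRSTROW = 2
);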
Apart from the dedicated SQL pool, Azure Synapse provides serverless SQL and Apache Spark pools. Based on your requirements, you can choose the appropriate one.
Serverless SQL pool is a query service over the data in your data lake. It enables you to access your data through the following functionalities:
A familiar T-SQL syntax to query data in place without the need to copy or load data into a specialized store.
Integrated connectivity via the T-SQL interface that offers a wide range of business intelligence and ad-hoc querying tools, including the most popular drivers.
You pass the path of the file stored in Data Lake Gen2 directly in the T-SQL statement. Refer to the example below:
select top 10 *
from openrowset(
    bulk 'https://pandemicdatalake.blob.core.windows.net/public/curated/covid-19/ecdc_cases/latest/ecdc_cases.csv',
    format = 'csv',
    parser_version = '2.0',
    firstrow = 2
) as rows
For more related information, I recommend you go through this document.
I am working on Tutorial 4 of this doc from the Azure team, where this section, in a dedicated SQL pool (which is actually also a database, as item 2 of the tutorial states), creates a database named nyctaxi with a table nyctaxi.trip as follows:
%%pyspark
spark.sql("CREATE DATABASE IF NOT EXISTS nyctaxi")
df.write.mode("overwrite").saveAsTable("nyctaxi.trip")
Then Tutorial 5 creates another table, NYCTaxiTripSmall. After completing these tutorials, I can see (on the Data hub of Synapse Studio) my dedicated SQL pool as shown below. But when I click on any database object folder, it does not show any db objects (tables, external tables, etc.); instead, it shows a red cross sign (as shown below).
Question: Why am I not seeing the db objects (described above) in the dedicated SQL pool below?
Remarks: Please note that I also created the DataExplorationDB db and an external source in Tutorial 2 using the serverless SQL pool, and, as shown in yellow below, I can see that db and its objects. So why is the same not true for the dedicated SQL pool db? I have also restarted the dedicated SQL pool and it's online, but still no db objects are showing.
When you create a Spark database the tables aren't automatically added to your Dedicated SQL Pool. You can add them as External Tables if you want, but there's no automatic metadata sync between Spark and Dedicated SQL Pool.
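If you go the external-table route, a hedged sketch of what that might look like in the dedicated pool (the data source, credential setup, file path, and column list below are all assumptions and must match what Spark actually wrote):

-- One-time setup: a Hadoop external data source over the workspace's
-- ADLS Gen2 account (a database scoped credential may also be needed,
-- depending on how you authenticate).
CREATE EXTERNAL DATA SOURCE WorkspaceLake
WITH (
    TYPE = HADOOP,
    LOCATION = 'abfss://container@yourlake.dfs.core.windows.net'
);

CREATE EXTERNAL FILE FORMAT ParquetFormat
WITH (FORMAT_TYPE = PARQUET);

-- External table over the folder where Spark saved nyctaxi.trip;
-- the column list is a placeholder for the real Spark table schema.
CREATE EXTERNAL TABLE dbo.nyctaxi_trip
(
    PassengerCount INT,
    TripDistanceMiles FLOAT
)
WITH (
    LOCATION = '/warehouse/nyctaxi.db/trip/',
    DATA_SOURCE = WorkspaceLake,
    FILE_FORMAT = ParquetFormat
);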
Synapse does create a serverless "Lake Database" corresponding to your Spark database, which you can use from SQL Scripts or access with SQL Server reporting tools.
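For example, the Spark table from the tutorial can be queried straight from the serverless endpoint:

-- Runs on the built-in serverless SQL endpoint (not the dedicated pool);
-- nyctaxi is the Spark database and trip the table created in the tutorial.
SELECT TOP 10 *
FROM nyctaxi.dbo.trip;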
I am looking at a Data Lake CSV file and want to create an external table in the serverless SQL pool of Microsoft Synapse. The goal is to query this file with row-level security constraints in place.
When the external table is created on a dedicated SQL pool, I am able to query the file with row-level security constraints in place.
How can I implement row-level security for external tables on a serverless SQL pool?
Unfortunately, row-level security is not supported in serverless SQL pool at the moment.
Can you please vote for this on our User Voice?
https://feedback.azure.com/forums/307516-sql-data-warehouse?category_id=171048
You can't use the feature as it is; T-SQL support on serverless is limited.
For example, CREATE FUNCTION isn't supported:
"This syntax is not supported by serverless SQL pool in Azure Synapse Analytics."
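For context, this is roughly the standard RLS pattern (a predicate function plus a security policy, with hypothetical names) that serverless rejects:

-- Inline table-valued predicate function: CREATE FUNCTION fails on serverless.
CREATE FUNCTION dbo.fn_securitypredicate(@SalesRep AS nvarchar(50))
    RETURNS TABLE
WITH SCHEMABINDING
AS
    RETURN SELECT 1 AS fn_securitypredicate_result
    WHERE @SalesRep = USER_NAME();

-- Security policy binding the predicate to a table: also not supported there.
CREATE SECURITY POLICY SalesFilter
ADD FILTER PREDICATE dbo.fn_securitypredicate(SalesRep)
ON dbo.Sales
WITH (STATE = ON);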
You could of course try to DIY using views, which are supported in serverless.
In that approach, Entitlements would become another CSV and EXTERNAL TABLE that you would create.
You'll have to either find the right function to get the current user and/or role for the view's SELECT query, or provide it via some wrapper code from some other place where you maintain your own context; see the sketch below.
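A minimal sketch of that idea, assuming Sales and Entitlements are external tables you have already created over the lake CSVs (all object names are hypothetical, and SUSER_SNAME() is just one candidate for the current-user function):

-- Each user sees only the rows their entitlement grants them; grant users
-- SELECT on the view rather than on the underlying external tables.
CREATE VIEW dbo.SalesFiltered
AS
SELECT s.*
FROM dbo.Sales AS s
JOIN dbo.Entitlements AS e
    ON e.Region = s.Region
WHERE e.UserLogin = SUSER_SNAME();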
Disclaimer: I've not done this in Serverless so can't say for sure.