What to do when my data source is not supported by Azure Synapse's Data Flow?

I am trying to transform data from Salesforce before loading it into a dedicated SQL pool.
When I try to create a dataset from Synapse's Data flow, I am not able to choose Salesforce as a data store.
Can anyone suggest how to transform data from Salesforce, or from any other data source that is not supported by Data flow?

As per the official documentation, Data flows currently do not support Salesforce as a source or sink.
If you want, you can raise a feature request in the Synapse portal.
As an alternative, you can use the Copy activity in Azure Data Factory to copy data from Salesforce into the dedicated SQL pool, and then transform it using Data flows in Synapse from dedicated SQL pool to dedicated SQL pool; a scripted sketch of the same Copy activity is shown after the steps below.
Follow the steps below to achieve your requirement:
First, create a Data Factory workspace.
Select the Author hub and create a pipeline. Now, drag the Copy activity onto the canvas and select the source. You can see that Salesforce is supported when you select a new source dataset. Select it and create a linked service for it.
Now, select the sink dataset and click on Azure Synapse Analytics.
Create a linked service for the dedicated SQL database and select it.
Then, select the target table in the dedicated SQL pool and copy your data by running the pipeline.
After the copy completes, go to the Synapse workspace and click on the source of the Data flow.
Select Azure Synapse Analytics as the source and click Continue.
Now, click New to create a linked service for the SQL database. Give the subscription and server name and authenticate with your database.
After creating the linked service, select it and choose the table that the copy activity produced in the database.
Now, go to the sink, select Azure Synapse Analytics, create another linked service for it in the same way, and select the target table in the database that should hold the transformed data.
By following the above process, you can transform Salesforce data and land it in the dedicated SQL database.
Can anyone suggest how to transform data from Salesforce or any other data source that is not supported by Dataflow?
You can try this approach for any data store that is not supported by Data flows, and please refer to this to check the various data stores supported by the Copy activity before applying the process to other data stores.
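If you prefer to script the Copy activity rather than build it in the portal, the sketch below shows roughly how the same Salesforce-to-dedicated-SQL-pool copy could be defined with the azure-mgmt-datafactory Python SDK. This is a hedged sketch, not the exact portal walkthrough: every name in angle brackets is a placeholder, and "SalesforceDataset" and "SqlPoolDataset" are assumed to be datasets (with their linked services) that you have already created in the factory.

    # Hedged sketch: define the Salesforce -> dedicated SQL pool Copy activity in code.
    from azure.identity import DefaultAzureCredential
    from azure.mgmt.datafactory import DataFactoryManagementClient
    from azure.mgmt.datafactory.models import (
        CopyActivity, DatasetReference, PipelineResource, SalesforceSource, SqlDWSink
    )

    adf_client = DataFactoryManagementClient(DefaultAzureCredential(), "<subscription-id>")

    copy_activity = CopyActivity(
        name="CopySalesforceToSqlPool",
        inputs=[DatasetReference(reference_name="SalesforceDataset")],   # existing Salesforce source dataset
        outputs=[DatasetReference(reference_name="SqlPoolDataset")],     # existing Azure Synapse Analytics sink dataset
        source=SalesforceSource(),   # a SOQL query can be passed here if only some objects are needed
        sink=SqlDWSink(),
    )

    adf_client.pipelines.create_or_update(
        "<resource-group>", "<data-factory-name>", "CopySalesforcePipeline",
        PipelineResource(activities=[copy_activity]),
    )

Once the pipeline exists you can trigger it with adf_client.pipelines.create_run(...) and then continue with the Data flow transformation exactly as described in the steps above.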

Related

How can I make tables from serverless Azure Synapse views?

I have a view in an on-demand (or "serverless") SQL pool. My goal is to copy data over from the serverless views and materialize it as tables in the dedicated pool. Is this possible?
There are a couple of options here:
create a Synapse pipeline with a Copy activity. Use the serverless view as the source and the dedicated SQL pool as the sink. Make sure the 'Auto create table' option is set on the sink.
create a Synapse notebook that connects via JDBC to the serverless SQL pool (it's just a SQL endpoint, after all) and writes into the dedicated SQL pool via the synapsesql write method; a rough sketch of this is shown below. I did an example of that technique here.
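As a rough illustration of the notebook option, here is a hedged PySpark sketch that reads the serverless view over JDBC and lands it in the dedicated pool. Endpoint, database, credential, view, and table names are placeholders; it uses a plain JDBC write as the portable fallback, whereas the dedicated SQL pool connector (the synapsesql method mentioned above) is generally faster when it is available in your Synapse Spark runtime.

    # Hedged sketch of the notebook approach, with placeholder names throughout.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    # Read the view from the serverless ("on-demand") SQL endpoint over JDBC.
    serverless_df = (
        spark.read.format("jdbc")
        .option("url", "jdbc:sqlserver://<workspace>-ondemand.sql.azuresynapse.net:1433;database=<serverless-db>")
        .option("dbtable", "dbo.MyView")          # the serverless view to materialize
        .option("user", "<sql-user>")
        .option("password", "<sql-password>")
        .load()
    )

    # Write the result into the dedicated SQL pool. Plain JDBC is shown for portability;
    # inside a Synapse notebook the dedicated pool connector's synapsesql writer is the
    # faster, bulk-load-based alternative.
    (
        serverless_df.write.format("jdbc")
        .option("url", "jdbc:sqlserver://<workspace>.sql.azuresynapse.net:1433;database=<dedicated-db>")
        .option("dbtable", "dbo.MyMaterializedTable")
        .mode("overwrite")
        .save()
    )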
As per the official Microsoft documentation:
Limitations
Views in Synapse SQL are only stored as metadata. Consequently, the following options aren't available:
There isn't a schema binding option
Base tables can't be updated through the view
Views can't be created over temporary tables
There's no support for the EXPAND / NOEXPAND hints
There are no indexed views in Synapse SQL
But, as an alternative, if your data is already in the dedicated SQL pool you can use CREATE TABLE AS SELECT (CTAS), which creates a new table based on the output of a SELECT statement. CTAS is the simplest and fastest way to create a copy of a table.
To know more, please refer to CREATE TABLE AS SELECT (Azure Synapse Analytics).
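For illustration, here is a hedged sketch of running a CTAS statement against the dedicated SQL pool from Python with pyodbc. The server, database, credentials, and table names are placeholders, and the distribution option should be chosen to match your workload.

    # Hedged sketch: materialize a copy of a table in the dedicated SQL pool with CTAS.
    import pyodbc

    conn = pyodbc.connect(
        "DRIVER={ODBC Driver 18 for SQL Server};"
        "SERVER=<workspace>.sql.azuresynapse.net;"
        "DATABASE=<dedicated-db>;UID=<sql-user>;PWD=<sql-password>"
    )
    conn.autocommit = True  # dedicated SQL pool does not allow DDL such as CTAS inside a user transaction

    ctas = """
    CREATE TABLE dbo.SalesCopy
    WITH (DISTRIBUTION = ROUND_ROBIN, CLUSTERED COLUMNSTORE INDEX)
    AS
    SELECT *
    FROM dbo.Sales;
    """
    conn.cursor().execute(ctas)
    conn.close()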

Not able to transform and load from ADLS (CSV) to Dedicated SQL Pool by using Azure Synapse's Data flow

I am trying to transform data from ADLS by using Azure Synapse's Data flow and store it in a table in a dedicated SQL pool.
I created a dataset 'UserSinkDataset' pointing to this table in the dedicated SQL pool.
This 'UserSinkDataset' is not visible in the sink dataset dropdown of the data flow.
There is no option to create a dataset pointing to the dedicated pool from the data flow.
Could someone help me understand why it is not shown in the dropdown?
Data flows do not offer a dataset type that refers to the dedicated SQL pool directly; they provide Azure Synapse Analytics instead. That is why UserSinkDataset (created as an Azure Synapse Dedicated SQL pool dataset) is not showing in the dropdown. So, you can use the Azure Synapse Analytics option to point to the table in the dedicated SQL pool and create your dataset.
You can follow the steps given below.
Once you reach the sink step, click on New.
Browse for Azure Synapse Analytics and continue.
Create a new linked service by clicking on New.
Specify your workspace, the dedicated SQL pool (the one you want to point to), and the authentication for the Synapse workspace. Test the connection and create the linked service.
After creating the linked service, you can select dbo.SFUser from your SQL pool and click OK.
Now you can go ahead and set the rest of the properties for the sink.
You can also create 'UserSinkDataset' by choosing Azure Synapse Analytics instead of Azure Synapse Dedicated SQL pool before creating the data flow. This way, the created dataset will appear in the dropdown list of the sink dataset property.

Loading data into Azure Synapse Analytics from Azure SQL Database

I am following this tutorial to move data from SQL to Azure Synapse: https://learn.microsoft.com/en-us/azure/data-factory/load-azure-sql-data-warehouse?tabs=data-factory
However, once I get to step 5c I cannot select a database name. Do I have to create an Azure Synapse database first to copy the data over there? I thought that is what this tutorial would do.
I have a SQL database and I want to move the data into Azure Synapse.
Thanks
Yes, in Azure Data Factory your source and sink need to already exist for database scenarios.
So it is expected that you already have an Azure SQL database and an Azure SQL Data Warehouse (dedicated SQL pool) in place before proceeding with the copy activity.

Add SQL Server as a data source in Azure Data Lake Analytics

I'm doing some tests with Azure Data Lake Analytics and I can’t add a new SQL Server database as a Data Source. When I click on "Add data source", the only two available options are: "Azure Data Lake Storage Gen1" and "Azure Storage".
What I want is to add one SQL Server database so that I can run U-SQL queries against it.
Our SQL Server firewall is correctly configured to allow access to Azure Services, but I am not allowed to add it as a data source.
How can this be done? Is it a matter of other configuration issues?
Any help would be greatly appreciated.
Per my research, this is not a configuration issue on your side: based on the official documentation, Data Lake Analytics only supports two data sources, Data Lake Store and Azure Storage.
As a workaround, I suggest using Azure Data Factory to transfer the data from the SQL Server database to Azure Storage, so that you can then run U-SQL scripts against that data.
If you have any concerns, please let me know.

Error trying to copy data from Azure SQL database to Azure Blob Storage

I have created a pipeline in Azure Data Factory (V1). It is a copy pipeline that has an AzureSqlTable dataset as input and an AzureBlob dataset as output. The AzureSqlTable dataset that I use as input is created as the output of another pipeline. In that pipeline I launch a procedure that copies one table entry to a blob CSV file.
I get the following error when launching the pipeline:
Copy activity encountered a user error: ErrorCode=UserErrorTabularCopyBehaviorNotSupported,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=CopyBehavior property is not supported if the source is tabular data source.,Source=Microsoft.DataTransfer.ClientLibrary,'.
How can I solve this?
According to the error information, the configured action is not supported by Azure Data Factory; however, using an Azure SQL table as input and an Azure Blob dataset as output should be supported.
I also ran a demo test in the Azure portal. You can follow these detailed steps to do the same:
1. Click Copy Data in the Azure portal.
2. Set the copy properties.
3. Select the source.
4. Select the destination data store.
5. Complete the deployment.
6. Check the result in Azure Storage.
Update:
If we want to use an existing dataset we could choose [From Existing Connections]; for more information please refer to the screenshot.
Update 2:
The Data Factory (V1) copy activity settings only support using an existing Azure Blob storage / Azure Data Lake Store dataset. For more detailed information please refer to this link.
If using Data Factory (V2) is acceptable, we could use an existing Azure SQL dataset.
So, actually, if we don't use this awful "Copy data (PREVIEW)" action and instead add an activity to an existing pipeline rather than creating a new pipeline, everything works. The solution is to add a copy activity manually into an existing pipeline.
