I am trying to read data on an Azure SQL instance from an Azure Databricks workspace, while avoiding username/password personal credentials, for an automated, regular data fetch & analysis. I thought a managed identity would do the job; however, it looks to be less smooth than with Azure Functions or Web Services. Is this supported in Databricks?
Following the doc https://learn.microsoft.com/en-us/azure/app-service/overview-managed-identity, I would need environment variables like IDENTITY_ENDPOINT and IDENTITY_HEADER, which do not exist on the Databricks instance.
Any insight would be greatly appreciated!
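For reference, here is a minimal Python sketch of the token request that doc describes, as it would look on a service that does inject those variables (App Service or Functions); the Azure SQL resource URI is my assumption, and this fails on a Databricks cluster precisely because the two environment variables are not set:

# Minimal sketch of the managed identity token request from the linked doc.
# Works where IDENTITY_ENDPOINT and IDENTITY_HEADER are injected; on a
# Databricks cluster these variables are missing, which is the gap described above.
import os
import requests

RESOURCE = "https://database.windows.net/"  # assumed resource URI for Azure SQL

endpoint = os.environ["IDENTITY_ENDPOINT"]  # not set on Databricks clusters
header = os.environ["IDENTITY_HEADER"]      # not set on Databricks clusters

resp = requests.get(
    endpoint,
    params={"resource": RESOURCE, "api-version": "2019-08-01"},
    headers={"X-IDENTITY-HEADER": header},
)
resp.raise_for_status()
access_token = resp.json()["access_token"]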
Related
We are trying to ingest some data from Data Lake to Azure Cosmos DB, and the Spark OLTP Connector seems to be the easiest to use.
But due to the company's policy, we are not supposed to use the master keys and we usually use managed identity for the applications. I see the Cosmos DB Java client builder has the 'TokenCredential' option with sample code:
CosmosAsyncClient client = new CosmosClientBuilder()
    .endpoint("<your-cosmos-account-endpoint>") // endpoint placeholder added so the sample builds; not in the original snippet
    .credential(new DefaultAzureCredentialBuilder().build())
    .buildAsyncClient();
Is there any way to set up the connector to use the same authentication mechanism with a managed identity?
"I see the Cosmos DB Java client builder has the 'TokenCredential' option with sample code"
In CosmosAsyncClient you also have to provide the master key; there is no such way to use managed identities.
"we are not supposed to use the master keys and we usually use managed identity for the applications."
As you want to transfer data from Data Lake to Cosmos DB with managed identities, you can use the Copy Data tool in Azure Data Factory. Create a linked service for Cosmos DB and, for the authentication type, select Managed Identity (either system-assigned or user-assigned).
You can refer to this SO thread by @KarthikBhyresh-MT for more understanding of the Copy Data tool.
Currently, the Spark connector does not support MSI. I see you correctly created the issue on the repo that holds the source code: https://github.com/Azure/azure-sdk-for-java/issues/29958
That will surely be used for tracking purposes, or at least for linking to the work item that tracks progress in that area. The feature will be available in the future, but there is currently no ETA.
I am trying to create an ADF linked service connection to a Synapse Link serverless SQL pool connected to ADLS storage. I can successfully get a connection, but when I try to use a dataset to access the data I get a permission issue.
I can successfully access the data via Synapse Studio:
This is the error I get when I use the dataset in ADF:
I can also look at the schemas in SSMS, where they appear as external tables, but I get a similar credential error at the same point.
Has anyone come across this issue, please?
There are a few pieces of information you haven't supplied in your question, but I believe I know what happened. The external table worked in Synapse Studio because you were connected to the serverless SQL pool with your AAD account, and it passed your AAD credentials through to the data lake and succeeded.
However, when you set up the linked service to the serverless SQL pool, I'm guessing you used a SQL auth account for the credentials. With SQL auth it doesn't know how to authenticate with the data lake, so it looked for a server-scoped credential but couldn't find one.
The same thing happened when you connected from SSMS with a SQL auth account, I'm guessing.
You have several options. If it's important to be able to access the external table with SQL auth, you can execute the following to tell it how to access the data lake. This assumes the Synapse workspace Managed Service Identity has the Storage Blob Data Reader or Storage Blob Data Contributor role on the data lake.
CREATE CREDENTIAL [https://<YourDataLakeName>.dfs.core.windows.net]
WITH IDENTITY = 'Managed Identity';
Or you could change the authentication on the linked service to use the Managed Service Identity.
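As a side note, you can see the difference outside of ADF by connecting to the serverless endpoint yourself: with an AAD token your identity is passed through to the data lake (the path that worked in Synapse Studio), while SQL auth relies on the server-scoped credential created above. A minimal sketch, assuming pyodbc, azure-identity, and ODBC Driver 17 are installed; the server, database, and table names are placeholders:

# Sketch: connect to the serverless SQL endpoint with an AAD access token.
# Server, database, and table names below are placeholders.
import struct

import pyodbc
from azure.identity import DefaultAzureCredential

server = "<your-workspace>-ondemand.sql.azuresynapse.net"
token = DefaultAzureCredential().get_token("https://database.windows.net/.default").token

# The ODBC driver expects the token as a length-prefixed UTF-16-LE byte string.
token_bytes = token.encode("utf-16-le")
token_struct = struct.pack(f"<I{len(token_bytes)}s", len(token_bytes), token_bytes)
SQL_COPT_SS_ACCESS_TOKEN = 1256  # connection attribute for passing an access token

conn = pyodbc.connect(
    f"Driver={{ODBC Driver 17 for SQL Server}};Server={server};Database=<your-database>",
    attrs_before={SQL_COPT_SS_ACCESS_TOKEN: token_struct},
)
print(conn.cursor().execute("SELECT TOP 1 * FROM <your_external_table>").fetchone())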
I am trying to test out Azure Purview and connect it to an Azure SQL server. Since the SQL server is hosted in the cloud, I want to use the default AutoResolve integration runtime to get connected, but there isn't one set up, nor an option to set up a new one. Has anyone else using Purview been able to set up (or needed to set up) an AutoResolve IR?
To connect to Azure SQL DB/MI, you can go directly to the Azure Purview portal, register a new data source, and select Azure SQL DB/MI.
In this article, Manage data sources in Azure Purview (Preview), you learn how to register new data sources, manage collections of data sources, and view sources in Azure Purview (Preview).
You only need to set up a self-hosted integration runtime to scan the data source when connecting to an on-premises SQL Server. If the data source is located in Azure, you don't need any integration runtime to scan it.
Reference: Register and scan an Azure SQL Database.
CHEEKATLAPRADEEP-MSFT is absolutely correct. To go a step further: since you know what an AutoResolve integration runtime is, you are probably using Azure Data Factory, so in addition to registering your SQL Server you can also link your Azure Data Factory for data lineage purposes. Based on the pipelines that are executed, it will automatically create the data lineage.
[Screenshot: Navigation to Link Data Factory]
[Screenshot: Data lineage created by linking Data Factory]
Keep in mind that you will have to execute pipelines after linking for it to pick up the data lineage. Also, for sources or destinations that are not yet supported, it will not capture the lineage.
Does Azure ML only provide output through its web services?
Is it possible to feed the output to an Azure SQL database?
Is it possible to feed the output to a Redshift database?
Essentially, I am looking to know whether I can integrate Azure ML Studio with our existing Redshift analytics database.
Yes, you can write to a SQL DB in Azure.
You can also use a Python module to make REST calls, so in theory you could write to Redshift.
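A rough sketch of what that could look like inside an Execute Python Script module, assuming the Python 3 runtime, a hypothetical REST endpoint sitting in front of Redshift, and that the module's sandbox allows outbound HTTP (the next answer argues it does not):

# Sketch only: Azure ML Studio "Execute Python Script" entry point.
# Assumes a hypothetical REST endpoint that loads rows into Redshift, and that
# the sandbox permits outbound HTTP, which may not be the case.
import urllib.request

def azureml_main(dataframe1=None, dataframe2=None):
    # dataframe1 is the upstream module's output (a pandas DataFrame)
    payload = dataframe1.to_json(orient="records").encode("utf-8")
    req = urllib.request.Request(
        "https://example.com/redshift-loader",  # hypothetical endpoint, not a real service
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    urllib.request.urlopen(req)
    # Execute Python Script modules return a tuple of DataFrames
    return dataframe1,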
Writing to a SQL DB is possible in Azure ML, and so is writing directly to Azure Blob Storage.
However, unlike @Hai, I do not believe you can write to a Redshift DB, since the "Python Module" documentation from Microsoft clearly states that the Python execution is sandboxed and therefore cannot access resources outside the virtual machine it runs on (i.e., internet resources, on-premises resources, ...).
A simple question: can this be achieved directly, i.e. copying from on-premises SQL Server to Azure SQL without Azure Blob storage in between (as shown in all the examples)? Can someone provide a code example, please?
Yes, you can do this directly. In fact, you can do direct copies between any of our supported sources/sinks; you don't have to pass through Blob storage. To go from on-premises SQL Server to Azure SQL, you will need to set up a Data Management Gateway on your on-premises server. Then you use a linked service of type AzureSqlDatabase and an output dataset of type AzureSqlTable, instead of the AzureBlob dataset shown in the example. The exact steps to set up the DMG and the JSON for the linked services, datasets, and pipelines can be found in our documentation. We are also improving our UI in the near future to make these kinds of copy setups an easy, code-free experience.
https://azure.microsoft.com/en-us/documentation/articles/data-factory-sqlserver-connector/