I need to connect to Azure SQL Database from Azure Databricks using a Service Principal. I searched the forums but could not find the right approach. Any help is greatly appreciated.
I tried a similar approach using a SQL user ID and password over a JDBC connection, and it worked. Now I am looking into the Service Principal approach.
P.S.: The SP ID and key should be stored in Azure Key Vault and accessed from Databricks.
You can use the Apache Spark Connector for SQL Server and Azure SQL; an example of what you have to do in Databricks can be found in the following Python file.
As you can see, we are not connecting directly with the Service Principal. Instead, we use the Service Principal to generate an access token that is then used when specifying the connection parameters:
```python
jdbc_df = spark.read.format("com.microsoft.sqlserver.jdbc.spark") \
    .option("url", url) \
    .option("dbtable", db_table) \
    .option("accessToken", access_token) \
    .option("encrypt", "true") \
    .option("databaseName", database_name) \
    .option("hostNameInCertificate", "*.database.windows.net") \
    .load()
```
If you can't or don't want to use the previous library, you can do the same with Spark's built-in JDBC data source and the Microsoft SQL Server JDBC driver (the data source format is `jdbc`; the driver class goes in the `driver` option):
```python
jdbc_df = spark.read.format("jdbc") \
    .option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver") \
    .option("url", url) \
    .option("dbtable", db_table) \
    .option("accessToken", access_token) \
    .option("encrypt", "true") \
    .option("databaseName", database_name) \
    .option("hostNameInCertificate", "*.database.windows.net") \
    .load()
```
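The snippets above assume `access_token` already exists. Below is a minimal sketch of how it could be produced with the client credentials flow, assuming the `msal` package is installed on the cluster and that the tenant ID, client ID, and client secret are stored in a Key Vault-backed secret scope. The helper names, scope, and key names are hypothetical.

```python
def get_sql_access_token(tenant_id: str, client_id: str, client_secret: str) -> str:
    """Acquire an Azure AD token for Azure SQL via the client credentials flow."""
    import msal  # imported lazily so the option builder below works without msal

    app = msal.ConfidentialClientApplication(
        client_id,
        authority=f"https://login.microsoftonline.com/{tenant_id}",
        client_credential=client_secret,
    )
    # "/.default" requests the application permissions for the Azure SQL resource
    result = app.acquire_token_for_client(scopes=["https://database.windows.net/.default"])
    if "access_token" not in result:
        raise RuntimeError(f"Token request failed: {result.get('error_description')}")
    return result["access_token"]


def build_jdbc_options(server: str, database: str, table: str, access_token: str) -> dict:
    """Options for spark.read.format("jdbc") using token-based authentication."""
    return {
        "url": f"jdbc:sqlserver://{server}.database.windows.net:1433;database={database}",
        "driver": "com.microsoft.sqlserver.jdbc.SQLServerDriver",
        "dbtable": table,
        "accessToken": access_token,
        "encrypt": "true",
        "hostNameInCertificate": "*.database.windows.net",
    }

# Usage in a Databricks notebook (scope and key names are hypothetical):
# token = get_sql_access_token(
#     dbutils.secrets.get(scope="kv-scope", key="sp-tenant-id"),
#     dbutils.secrets.get(scope="kv-scope", key="sp-client-id"),
#     dbutils.secrets.get(scope="kv-scope", key="sp-client-secret"),
# )
# df = spark.read.format("jdbc").options(
#     **build_jdbc_options("myserver", "mydb", "dbo.MyTable", token)).load()
```

Tokens acquired this way expire, typically after about an hour, so a long-running job may need to request a fresh one before reconnecting.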
Azure Key Vault support with Azure Databricks
https://docs.azuredatabricks.net/user-guide/secrets/secret-scopes.html#akv-ss
**Here's the working Solution**
```python
sql_url = "jdbc:sqlserver://#SERVER_NAME#.database.windows.net:1433;database=#DATABASE_NAME#"
properties = {
    "user": "#APP_NAME#",
    "password": dbutils.secrets.get(scope="#SCOPE_NAME#", key="#KEYVAULT_SECRET_NAME#"),
    "driver": "com.microsoft.sqlserver.jdbc.SQLServerDriver",
}
```
**APP_NAME** ==> the application created under App registrations in Azure Active Directory.
**SCOPE_NAME** ==> the secret scope you created as described in the docs; follow the
URL (https://docs.azuredatabricks.net/user-guide/secrets/secret-scopes.html)
**KEYVAULT_SECRET_NAME** ==> the name of the secret stored in Azure Key Vault (AKV).
**NOTE: GRANT YOUR APP ACCESS TO THE DATABASE USING THE STEPS BELOW**
```sql
CREATE USER [#APP_NAME#] FROM EXTERNAL PROVIDER;
EXEC sp_addrolemember 'db_owner', '#APP_NAME#';
```
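Newer versions (9.2 and later) of the Microsoft JDBC Driver for SQL Server can perform the Service Principal login themselves via `authentication=ActiveDirectoryServicePrincipal`. In that mode, `user` is the application (client) ID rather than the display name, and `password` is the client secret. The sketch below assumes such a driver version and its dependencies are available on the cluster. The helper names, scope, key, and table names are hypothetical.

```python
def build_sql_url(server: str, database: str) -> str:
    """JDBC URL for an Azure SQL database."""
    return f"jdbc:sqlserver://{server}.database.windows.net:1433;database={database}"


def build_sp_jdbc_properties(client_id: str, client_secret: str) -> dict:
    """Connection properties for spark.read.jdbc using driver-side Service Principal auth."""
    return {
        "driver": "com.microsoft.sqlserver.jdbc.SQLServerDriver",
        # The driver acquires the Azure AD token itself from these credentials
        "authentication": "ActiveDirectoryServicePrincipal",
        "user": client_id,          # application (client) ID, not the display name
        "password": client_secret,  # e.g. read via dbutils.secrets.get(...)
        "encrypt": "true",
        "hostNameInCertificate": "*.database.windows.net",
    }

# Usage in a Databricks notebook (scope, key and table names are hypothetical):
# props = build_sp_jdbc_properties(
#     dbutils.secrets.get(scope="kv-scope", key="sp-client-id"),
#     dbutils.secrets.get(scope="kv-scope", key="sp-client-secret"),
# )
# df = spark.read.jdbc(url=build_sql_url("myserver", "mydb"),
#                      table="dbo.MyTable", properties=props)
```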
You may find this tutorial useful: Configuring AAD Authentication to Azure SQL Databases.
Summary:
Azure SQL is a great service: you get your databases into the cloud without having to manage all that nasty server stuff. One apparent problem with Azure SQL is that you have to authenticate using SQL authentication, that is, a username and password. However, you can also authenticate via Azure Active Directory (AAD) tokens. This is analogous to integrated login using Windows Authentication, except that instead of Active Directory you're using AAD.
There are a number of advantages to AAD Authentication:
You no longer have to share logins, since users log in with their own AAD credentials, so auditing is better
You can manage access to databases using AAD groups
You can enable "app" logins via Service Principals
In order to get this working, you need to:
1. Enable AAD authentication on the Azure SQL Server
2. Create a Service Principal
3. Add logins to the database, granting the service principal whatever rights it requires
4. Add code to get an auth token for accessing the database
In that post, the author walks through creating a service principal, configuring the database for AAD auth, writing code to retrieve a token, and configuring an EF DbContext for AAD auth.
I hope this tutorial helps.
Related
I want to connect Superset to Databricks to query tables. Superset uses SQLAlchemy to connect to databases, and connecting to Databricks this way requires a PAT (Personal Access Token).
I can connect and run queries when I use the PAT I generated for my account through the Databricks web UI, but I do not want to use my personal token in a production environment. However, I could not find how to generate a PAT-like token for a Service Principal.
The working SQLAlchemy URI looks like this:
```
databricks+pyhive://token:XXXXXXXXXX@aaa-111111111111.1.azuredatabricks.net:443/default?http_path=sql%2Fprotocolv1%qqq%wwwwwwwwwww1%eeeeeeee-1111111-foobar00
```
According to the Azure docs, there are two ways to run queries against Databricks from another service:
1. Create a PAT for a Service Principal to be associated with Superset.
2. Create an AD user account for Superset.
With the first (preferred) method I made some progress, but I was not able to generate the Service Principal's PAT:
I registered an app in Azure AD.
I got the tenant ID and client ID, and created a secret for the registered app.
With this information, I was able to curl Azure and receive a JWT for that app.
But all the tokens referred to in the docs are OAuth 2.0 JWTs, which do not seem to work in the SQLAlchemy URI.
I know it's possible to generate a PAT for a Service Principal, since the documentation explains how to read, update and delete a Service Principal's PAT. But it has no information on how to create one.
I prefer to avoid using the second method (creating an AD user for Superset) since I am not allowed to create/manage users for the AD.
In summary, I have a working SQLAlchemy URI, but I want to use a generated token, associated with a Service Principal, instead of using my PAT. But I can't find how to generate that token (I only found documentation on how to generate OAUTH2 tokens).
You can create a PAT for a service principal as follows (the examples are taken from the docs; run `export DATABRICKS_HOST="https://hostname"` before executing them):
Add the service principal to the Databricks workspace using the SCIM API (doc):
```shell
# Double quotes so that $DATABRICKS_HOST is expanded by the shell
curl -X POST "$DATABRICKS_HOST/api/2.0/preview/scim/v2/ServicePrincipals" \
  --header 'Content-Type: application/scim+json' \
  --header 'Authorization: Bearer <personal-access-token>' \
  --data-raw '{
    "schemas": [
      "urn:ietf:params:scim:schemas:core:2.0:ServicePrincipal"
    ],
    "applicationId": "<application-id>",
    "displayName": "test-sp",
    "entitlements": [
      {
        "value": "allow-cluster-create"
      }
    ]
  }'
```
Get an AAD token for the service principal (doc; another option is to use the Azure CLI):
```shell
# The token endpoint returns the token in the "access_token" field
export DATABRICKS_TOKEN=$(curl -s -X POST -H 'Content-Type: application/x-www-form-urlencoded' \
  -d 'grant_type=client_credentials&client_id=<client-id>&resource=2ff814a6-3304-4ab8-85cb-cd0e6f879c1d&client_secret=<application-secret>' \
  https://login.microsoftonline.com/<tenant-id>/oauth2/token | jq -r .access_token)
```
Generate a PAT using the AAD token (doc):
```shell
curl -s -n -X POST "$DATABRICKS_HOST/api/2.0/token/create" --data-raw '{
  "lifetime_seconds": 100,
  "comment": "token for superset"
}' -H "Authorization: Bearer $DATABRICKS_TOKEN"
```
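Once the service principal's PAT exists, it replaces the personal token in the question's SQLAlchemy URI. The sketch below assembles that URI, assuming the `databricks+pyhive` format shown in the question. The helper name, host, and HTTP path are placeholders.

```python
from urllib.parse import quote


def databricks_sqlalchemy_uri(pat: str, host: str, http_path: str,
                              database: str = "default") -> str:
    """Build a databricks+pyhive SQLAlchemy URI that authenticates with a PAT."""
    # Username is the literal string "token", as in the question's working URI;
    # the PAT goes in the password position, and http_path is fully percent-encoded
    return (
        f"databricks+pyhive://token:{quote(pat, safe='')}@{host}:443/{database}"
        f"?http_path={quote(http_path, safe='')}"
    )
```

The `lifetime_seconds` value of 100 in the token request makes the PAT expire almost immediately. A connection that Superset keeps using would likely need a much longer lifetime.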
Currently, I use a device code credential to get access to Azure AD:
```python
device_code_credential = DeviceCodeCredential(
    azure_client_id,
    tenant_id=azure_tenant_id,
    authority=azure_authority_uri)
```
But I still need to use my Azure account username and password to connect to Azure SQL Server:
```python
driver = 'ODBC Driver 17 for SQL Server'
db_connection_string = f'DRIVER={driver};SERVER={server};' \
    f'DATABASE={database};UID={user_name};PWD={password};' \
    f'Authentication=ActiveDirectoryPassword;' \
    'Encrypt=yes;TrustServerCertificate=no;Connection Timeout=30;'
connector = pyodbc.connect(db_connection_string)
```
Is there any way in Python on Linux/macOS to use device_code_credential and an access token to connect to Azure SQL Server?
https://github.com/mkleehammer/pyodbc/issues/228
This link is the only lead I found, and it doesn't seem to work.
Does anyone have a fully working sample?
You could reference this tutorial: AzureAD/azure-activedirectory-library-for-python: Connect to Azure SQL Database.
It is doable to connect to Azure SQL Database by obtaining a token from Azure Active Directory (AAD), via ADAL Python. We do not currently maintain a full sample for it, but this essay outlines some key ingredients.
1. Follow the instructions in Connecting using Access Token to provision your application. There is another similar blog post here.
2. Your SQL admin needs to add permissions for the app registration to the specific database you are trying to access. See details in the blog post Token-based authentication support for Azure SQL DB using Azure AD auth by Mirek H Sztajno.
3. It is not particularly highlighted in either of the documents above, but you need to use https://database.windows.net/ as the resource string. Note that you must keep the trailing slash; otherwise the issued token will not work.
4. Feed the configuration above into ADAL Python's Client Credentials sample.
5. Once you get the access token, use it in pyodbc to connect to SQL Database.
This works with AAD access tokens. Example code to expand the token and prepend the length, as described on the page linked above, in Python 2.x:
```python
import struct
import pyodbc

token = "eyJ0eXAiOi..."
exptoken = ""
for i in token:
    exptoken += i
    exptoken += chr(0)  # expand each character to two bytes
# Prepend the 4-byte length of the expanded token
tokenstruct = struct.pack("=i", len(exptoken)) + exptoken
conn = pyodbc.connect(connstr, attrs_before={1256: bytearray(tokenstruct)})
```
Python 3.x is only slightly more involved because of the annoying str/bytes split:
```python
import struct
import pyodbc

token = b"eyJ0eXAiOi..."
exptoken = b""
for i in token:  # iterating over bytes yields ints
    exptoken += bytes({i})
    exptoken += bytes(1)  # a single zero byte
tokenstruct = struct.pack("=i", len(exptoken)) + exptoken
conn = pyodbc.connect(connstr, attrs_before={1256: tokenstruct})
```
(SQL_COPT_SS_ACCESS_TOKEN is 1256; it's specific to msodbcsql driver so pyodbc does not have it defined, and likely will not.)
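For ASCII input, which a JWT always is, following each byte with a zero byte is exactly UTF-16-LE encoding. The loop can therefore be replaced by `str.encode`. The sketch below is a minimal Python 3 version of that idea. The helper name is hypothetical, and `connstr` is the same connection string as above.

```python
import struct

SQL_COPT_SS_ACCESS_TOKEN = 1256  # msodbcsql-specific pre-connect attribute


def token_to_struct(token: str) -> bytes:
    """Pack an AAD access token into the length-prefixed structure msodbcsql expects."""
    encoded = token.encode("utf-16-le")  # each ASCII character followed by a zero byte
    # 4-byte length prefix in native byte order, as in the answer above
    return struct.pack("=i", len(encoded)) + encoded

# Usage (requires pyodbc and the Microsoft ODBC driver):
# conn = pyodbc.connect(connstr, attrs_before={SQL_COPT_SS_ACCESS_TOKEN: token_to_struct(token)})
```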
Hope this helps.
You can get a token via `azure.identity`:
```python
from azure.identity import DeviceCodeCredential

# It is recommended to register your own client ID in your tenant.
AZURE_CLI_CLIENT_ID = "04b07795-8ddb-461a-bbee-02f9e1bf7b46"
credential = DeviceCodeCredential(client_id=AZURE_CLI_CLIENT_ID)
databaseToken = credential.get_token('https://database.windows.net/.default')
```
Then use databaseToken.token as an AAD Access Token as described in Leon Yue's answer.
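The sketch below combines the two answers for the question's Linux/macOS setup. It assumes `azure-identity`, `pyodbc`, and ODBC Driver 17 for SQL Server are installed. When the token is passed through `attrs_before`, the connection string should not also contain `UID`, `PWD`, or `Authentication`, unlike the question's ActiveDirectoryPassword string. The helper names are hypothetical.

```python
import struct

SQL_COPT_SS_ACCESS_TOKEN = 1256  # msodbcsql-specific pre-connect attribute


def build_token_connstr(server: str, database: str,
                        driver: str = "ODBC Driver 17 for SQL Server") -> str:
    """Connection string for token auth: no UID/PWD/Authentication keywords."""
    return (
        f"DRIVER={{{driver}}};SERVER={server};DATABASE={database};"
        "Encrypt=yes;TrustServerCertificate=no;Connection Timeout=30;"
    )


def connect_with_credential(credential, server: str, database: str):
    """Connect with any azure.identity credential (e.g. DeviceCodeCredential)."""
    import pyodbc  # imported lazily so the string builder works without pyodbc

    token = credential.get_token("https://database.windows.net/.default").token
    encoded = token.encode("utf-16-le")  # same expansion as the byte loop above
    token_struct = struct.pack("=i", len(encoded)) + encoded
    return pyodbc.connect(build_token_connstr(server, database),
                          attrs_before={SQL_COPT_SS_ACCESS_TOKEN: token_struct})

# Usage (server and database names are placeholders):
# from azure.identity import DeviceCodeCredential
# cred = DeviceCodeCredential(client_id=azure_client_id, tenant_id=azure_tenant_id)
# conn = connect_with_credential(cred, "myserver.database.windows.net", "mydb")
```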
According to this page:
https://learn.microsoft.com/en-us/archive/blogs/sqlsecurity/token-based-authentication-support-for-azure-sql-db-using-azure-ad-auth
AAD token-based authentication to Azure SQL DB is supported only when the client runs on Windows.
Do macOS and Linux support AAD token-based authentication to Azure SQL DB?
https://github.com/mkleehammer/pyodbc/issues/228
```python
import struct

import pyodbc

# `context` is an adal.AuthenticationContext created for the tenant (not shown)
token = context.acquire_token_with_client_credentials(
    database_url,
    azure_client_id,
    azure_client_secret
)
print(token)
tokenb = bytes(token["accessToken"], "UTF-8")
exptoken = b''
for i in tokenb:
    exptoken += bytes({i})
    exptoken += bytes(1)
tokenstruct = struct.pack("=i", len(exptoken)) + exptoken
SQL_COPT_SS_ACCESS_TOKEN = 1256
CONNSTRING = "DRIVER={};SERVER={};DATABASE={}".format("ODBC Driver 17 for SQL Server", prod_server, prod_db)
db_connector = pyodbc.connect(CONNSTRING, attrs_before={SQL_COPT_SS_ACCESS_TOKEN: tokenstruct})
```
This is the Python code I run on macOS.
I keep getting this error:
```
pyodbc.InterfaceError: ('28000', "[28000] [Microsoft][ODBC Driver 17 for SQL Server][SQL Server]Login failed for user ''. (18456) (SQLDriverConnect)")
```
Does anyone have an idea?
It seems that you have not added your application service principal to your Azure SQL database.
What you need to do is:
1. Enable AAD authentication for your Azure SQL Server. Set an AAD user as the Azure AD admin in this step.
2. Connect to your Azure SQL Database with the user account you set in step 1.
3. Add your application service principal to your SQL database, and assign an appropriate role to it.
```sql
CREATE USER [Azure_AD_principal_name] FROM EXTERNAL PROVIDER;
EXEC sp_addrolemember 'db_owner', 'Azure_AD_principal_name';
```
Here, Azure_AD_principal_name should be the application's name.
4. Connect to your Azure SQL Database with AAD.
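Application names often contain hyphens or spaces, so they must be bracket-quoted in T-SQL. The sketch below generates the step 3 statements and escapes `]` the way T-SQL's `QUOTENAME` does. It uses `ALTER ROLE ... ADD MEMBER`, which Microsoft recommends over the deprecated `sp_addrolemember`. The helper names are hypothetical, and the statements are meant to be run by the AAD admin from step 2. The role defaults to `db_owner` as in the answer, though a narrower role such as `db_datareader` may be enough for read-only access.

```python
from typing import List


def quote_ident(name: str) -> str:
    """Bracket-quote a T-SQL identifier, doubling any closing bracket."""
    return "[" + name.replace("]", "]]") + "]"


def grant_statements(principal_name: str, role: str = "db_owner") -> List[str]:
    """T-SQL to create a contained AAD user for the app and add it to a role."""
    user = quote_ident(principal_name)
    return [
        f"CREATE USER {user} FROM EXTERNAL PROVIDER;",
        f"ALTER ROLE {quote_ident(role)} ADD MEMBER {user};",
    ]
```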
I'm trying the sample code NativeClient-Headless-DotNet.sln against my B2C tenant.
When I attempt to execute the command:
```csharp
result = authContext.AcquireTokenAsync(todoListResourceId, clientId, uc).Result;
```
using an existing username and password, I get this exception:
```
InnerException = {"unknown_user_type: Unknown User Type"}
```
As far as I know, I've set up all the values correctly in the Web and App config (I'm using the same values as in my Graph API project, which works OK).
Any ideas why this should happen?
Are accounts created with `userType.type = "userName";` found by this method?
Currently, Azure AD B2C doesn't have any direct support for this.
However, work to support the Resource Owner Password Credentials flow in Azure AD B2C is in progress.
This new feature will enable a desktop application to collect a user credential and POST it to the B2C tenant for validation.
I'm trying to add a key to my Azure AD application using the Azure CLI.
But looking through the Azure CLI commands, it seems that there is no such command.
For example,
I'm trying to automate the task from the link below via the Azure CLI:
http://blog.davidebbo.com/2014/12/azure-service-principal.html
I can create the AD application and service principal, but I can't find a way to add a key for the newly created AD application.
I'll appreciate any ideas and directions :)
Thanks in advance !
For a new AD application, you can specify a key with -p while creating. For example,
```shell
azure ad app create -n <your application name> --home-page <the homepage of your application> -i <the identifier URI of your application> -p <your key>
```
For an existing AD application, the Graph API can update the application's credentials. Read this API reference, and you can see that the password credential supports "POST, GET, PATCH". However, using the Graph API directly is quite complicated. I checked the Azure CLI: that functionality is not yet implemented, and I found the source hard to follow. Then I looked at the Azure SDK for Python, because I am familiar with Python, and found that it is already implemented in 2.0.0rc2. See the GitHub repo.
I have written a Python script, but to use it you need to install not only azure 2.0.0rc2 but also msrest and msrestazure.
```python
from azure.common.credentials import UserPassCredentials
from azure.graphrbac import GraphRbacManagementClient, GraphRbacManagementClientConfiguration
from azure.graphrbac.models import ApplicationCreateParameters, PasswordCredential

credentials = UserPassCredentials("<your Azure Account>", "<your password>")
subscription_id = "<your subscription id>"
tenant_id = "<your tenant id>"

graphrbac_client = GraphRbacManagementClient(
    GraphRbacManagementClientConfiguration(
        credentials,
        subscription_id,
        tenant_id
    )
)

application = graphrbac_client.application.get('<your application object id>')

passwordCredential = PasswordCredential(start_date="2016-04-13T06:08:04.0863895Z",
                                        end_date="2018-04-13T06:08:04.0863895Z",
                                        value="<your new key>")

parameters = ApplicationCreateParameters(application.available_to_other_tenants,
                                         application.display_name,
                                         "<the homepage of your AD application>",
                                         application.identifier_uris,
                                         reply_urls=application.reply_urls,
                                         password_credentials=[passwordCredential])

application = graphrbac_client.application.update('<your application object id>', parameters)
```
The only problem with this script is that it overwrites all the existing keys of your AD application; you cannot append a new key. This is a limitation of the Graph API, which does not let you read an existing key. One possible workaround is to store your existing keys somewhere else, but that brings extra security risk.
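The overwrite problem above comes from replacing the whole credential list. The newer Microsoft Graph API has an `addPassword` action on the application resource that appends a new secret and returns its service-generated value once. The sketch below uses only the standard library and assumes you already have a Microsoft Graph access token with permission to update the application. The helper names are hypothetical.

```python
import json
import urllib.request

GRAPH_BASE = "https://graph.microsoft.com/v1.0"


def build_add_password_request(app_object_id: str, display_name: str, graph_token: str,
                               end_date_time: str = None) -> urllib.request.Request:
    """Build a POST to /applications/{object id}/addPassword (appends, never overwrites)."""
    credential = {"displayName": display_name}
    if end_date_time:
        credential["endDateTime"] = end_date_time  # ISO 8601, e.g. "2026-01-01T00:00:00Z"
    body = json.dumps({"passwordCredential": credential}).encode("utf-8")
    return urllib.request.Request(
        f"{GRAPH_BASE}/applications/{app_object_id}/addPassword",
        data=body,
        method="POST",
        headers={"Authorization": f"Bearer {graph_token}",
                 "Content-Type": "application/json"},
    )


def add_password(app_object_id: str, display_name: str, graph_token: str,
                 end_date_time: str = None) -> str:
    """Send the request and return the generated secret (only shown once)."""
    req = build_add_password_request(app_object_id, display_name, graph_token, end_date_time)
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["secretText"]
```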
I don't have any experience automating adding the key, and to be honest I'm not sure it's even possible. However, have a look at the ApplicationEntity documentation in the Graph API; it might be possible using a POST request to the web service.