How to use Temporary credentials from AssumeRole in Spark configuration - apache-spark

I'm currently facing an issue where I'm unable to create a Spark session (through PySpark) that uses temporary credentials (from an assumed role in a different AWS account).
The idea is to assume a role in Account B, get temporary credentials, and create the Spark session in Account A, so that Account A is allowed to interact with Account B through the Spark session.
I've tried almost every possible configuration available in my Spark session. Does anyone have some reference material on creating a Spark session using temporary credentials?
import boto3
from pyspark.sql import SparkSession

role_arn = "arn:aws:iam::account-b:role/example-role"
role_session_name = "spark-temp-credentials"  # any descriptive session name
duration_seconds = 60 * 15  # duration of the session in seconds

# obtain the temporary credentials
credentials = boto3.client("sts").assume_role(
    RoleArn=role_arn,
    RoleSessionName=role_session_name,
    # DurationSeconds=duration_seconds
)['Credentials']
spark = SparkSession \
    .builder \
    .enableHiveSupport() \
    .appName("test") \
    .config("spark.jars", "/usr/local/spark/jars/hadoop-aws-2.10.0.jar,/usr/local/spark/jars/aws-java-sdk-1.7.4.jar") \
    .config("spark.hadoop.fs.s3a.aws.credentials.provider", "com.amazonaws.auth.DefaultAWSCredentialsProviderChain") \
    .config("spark.hadoop.fs.s3a.access.key", credentials['AccessKeyId']) \
    .config("spark.hadoop.fs.s3a.secret.key", credentials['SecretAccessKey']) \
    .config("spark.hadoop.fs.s3a.endpoint", "s3.eu-west-1.amazonaws.com") \
    .getOrCreate()
The above does not seem to work: it does not use the credentials I pass to the Spark session, but falls back to the underlying execution role of the process instead.
Looking at the documentation, there are also some notes about 'short-lived credentials' not being supported. So I wonder how others are able to create a Spark session with temporary credentials?

Update hadoop-aws and its compatible binaries (including the AWS SDK) to a version written in the last eight years,
which will then include the temporary credential support.
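For reference, here is a minimal sketch (not from the answer above) of what that can look like once hadoop-aws and the matching AWS SDK bundle are upgraded: it reuses the credentials dict from the question, selects the S3A temporary-credentials provider explicitly, and passes the session token alongside the access key and secret. The hadoop-aws version below is illustrative only.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("test")
    # hadoop-aws should match the Hadoop version of your Spark build; 3.3.4 is only
    # an example and pulls in the matching aws-java-sdk-bundle transitively
    .config("spark.jars.packages", "org.apache.hadoop:hadoop-aws:3.3.4")
    .config("spark.hadoop.fs.s3a.aws.credentials.provider",
            "org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider")
    .config("spark.hadoop.fs.s3a.access.key", credentials['AccessKeyId'])
    .config("spark.hadoop.fs.s3a.secret.key", credentials['SecretAccessKey'])
    .config("spark.hadoop.fs.s3a.session.token", credentials['SessionToken'])
    .config("spark.hadoop.fs.s3a.endpoint", "s3.eu-west-1.amazonaws.com")
    .getOrCreate()
)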

Related

List databricks secret scope and find referred keyvault in azure databricks

How can we find existing secret scopes in a Databricks workspace, and which Key Vault is referenced by a specific secret scope in Azure Databricks?
This command lists the available scopes on Databricks:
dbutils.secrets.listScopes()
You can do this with either:
Databricks Secrets REST API - the list secret scopes API will give that information
Databricks CLI - the databricks secrets list-scopes command will show your KeyVault URL
You can try this snippet here in Python:
import pandas
import json
import requests
# COMMAND ----------
# MAGIC %md ### define variables
# COMMAND ----------
pat = 'EnterPATHere' # paste your PAT; get it from Settings > User Settings
workspaceURL = 'EnterWorkspaceURLHere' # paste the workspace URL in the format 'https://adb-1234567.89.azuredatabricks.net'; note that the URL must not end with '/'
# COMMAND ----------
# MAGIC %md ### list secret scopes
# COMMAND ----------
response = requests.get(workspaceURL + '/api/2.0/secrets/scopes/list',
                        headers={'Authorization': 'Bearer ' + pat,
                                 'Content-Type': 'application/json'})
pandas.json_normalize(json.loads(response.content), record_path='scopes')
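As a hedged follow-up to the snippet above: for Azure Key Vault-backed scopes, the same list response should also carry the vault reference, so you can print which Key Vault each scope points at. The backend_type and keyvault_metadata field names below are assumptions about the Secrets API response, so verify them against your workspace's output.
# inspect the raw response for Key Vault-backed scopes (field names assumed)
for scope in json.loads(response.content).get('scopes', []):
    if scope.get('backend_type') == 'AZURE_KEYVAULT':
        metadata = scope.get('keyvault_metadata', {})
        print(scope['name'], metadata.get('dns_name'), metadata.get('resource_id'))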
I happen to have written a blog post about this, where a full Python script is provided to manage secret scopes in Azure Databricks.

Authenticate Spark to GCS with HMAC key

We have a Spark application accessing GCS using the GCP connector. We would like to authenticate using a service account HMAC key. Is this possible?
We have tried a few of the authentication configurations here but none seems to work.
Here's an example of what we are trying to do
val spark = SparkSession.builder()
  .config("google.cloud.auth.client.id", "HMAC key id")
  .config("google.cloud.auth.client.secret", "HMAC key secret")
  .master("local[*]")
  .appName("Test App")
  .getOrCreate()

df.write.format("parquet")
  .save("gs://test-project/")
We have tried the JSON keyfile, which works, but HMAC would be a bit more convenient for us.

Access token invalid after configuring Microsoft Azure Active Directory for Snowflake External OAuth

I was trying to configure Microsoft Azure AD for External OAuth as per the Snowflake tutorial: https://docs.snowflake.com/en/user-guide/oauth-azure.html
The configuration steps went ahead without a hitch, and I was able to follow the final step (https://docs.snowflake.com/en/user-guide/oauth-azure.html#testing-procedure) to obtain the access token from AAD.
However, when I tried to use the access token with Snowflake through the JDBC driver, I got the error: "net.snowflake.client.jdbc.SnowflakeSQLException: Invalid OAuth access token".
The Snowflake integration created is of the form:
create security integration ext_oauth_azure_ad
  type = external_oauth
  enabled = true
  external_oauth_type = azure
  external_oauth_issuer = '<issuer-url>'
  external_oauth_jws_keys_url = '<keys-url>/discovery/v2.0/keys'
  external_oauth_audience_list = ('https://<app-id-uri>')
  external_oauth_token_user_mapping_claim = 'upn'
  external_oauth_snowflake_user_mapping_attribute = 'login_name'
  external_oauth_any_role_mode = 'ENABLE';
I tried playing around with this config by changing external_oauth_token_user_mapping_claim to 'email', since that was the attribute in the decoded JWT access token that matched the login_name, but to no avail.
The scope provided in AD is session:role-any, which should allow any role.
Not sure how to proceed from here.
Edit:
The command used to obtain access token is:
curl -X POST \
  -H "Content-Type: application/x-www-form-urlencoded;charset=UTF-8" \
  --data-urlencode "client_id=<ad-client-id>" \
  --data-urlencode "client_secret=<ad-client-secret>" \
  --data-urlencode "username=<ad-user-email>" \
  --data-urlencode "password=<my-password>" \
  --data-urlencode "grant_type=password" \
  --data-urlencode "scope=<scope-as-in-ad>" \
  'https://login.microsoftonline.com/<tenant-id>/oauth2/v2.0/token'
Update:
Tried using the command:
select system$verify_external_oauth_token('<access_token>');
to validate if the token was valid in Snowflake and obtained the result:
Token Validation finished.{"Validation Result":"Failed","Failure Reason":"EXTERNAL_OAUTH_INVALID_SIGNATURE"}
This is strange because I have added the correct issuer based on the configuration step (the entityId from the Federation metadata document).

Connect to Azure SQL Database from DataBricks using Service Principal

I have a requirement to connect to Azure SQL Database from Azure Databricks via a Service Principal. I tried searching the forums but was unable to find the right approach. Any help is greatly appreciated.
I tried a similar approach with a SQL user ID and password over a JDBC connection and it worked successfully. Now I'm looking into the Service Principal approach.
P.S.: The SP ID and key should be placed in Azure Key Vault and need to be accessed from Databricks.
You can use the Apache Spark Connector for SQL Server and Azure SQL,
and an example of what you have to do in Databricks can be found in the following Python file.
As you can see, we are not connecting directly with the Service Principal; instead, we use the Service Principal to generate an access token that is then used when specifying the connection parameters:
jdbc_df = spark.read.format("com.microsoft.sqlserver.jdbc.spark") \
    .option("url", url) \
    .option("dbtable", db_table) \
    .option("accessToken", access_token) \
    .option("encrypt", "true") \
    .option("databaseName", database_name) \
    .option("hostNameInCertificate", "*.database.windows.net") \
    .load()
But if you can't or don't want to use the previous library, you can also do the same with Spark's built-in JDBC data source and the Microsoft SQL Server JDBC driver:
jdbc_df = spark.read.format("jdbc") \
    .option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver") \
    .option("url", url) \
    .option("dbtable", db_table) \
    .option("accessToken", access_token) \
    .option("encrypt", "true") \
    .option("databaseName", database_name) \
    .option("hostNameInCertificate", "*.database.windows.net") \
    .load()
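Both snippets assume that access_token already exists. As a sketch only (not part of the original answer), one way to obtain it with the Service Principal, reading the SP id and key from a Key Vault-backed secret scope, is the msal library; the scope name, secret names and tenant id below are placeholders:
# requires the msal package (e.g. %pip install msal on the cluster)
import msal

client_id = dbutils.secrets.get(scope="<scope-name>", key="<sp-client-id-secret-name>")
client_secret = dbutils.secrets.get(scope="<scope-name>", key="<sp-client-secret-secret-name>")
tenant_id = "<tenant-id>"

app = msal.ConfidentialClientApplication(
    client_id,
    authority="https://login.microsoftonline.com/" + tenant_id,
    client_credential=client_secret,
)
# client-credentials flow against the Azure SQL Database resource
token_response = app.acquire_token_for_client(scopes=["https://database.windows.net/.default"])
access_token = token_response["access_token"]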
Azure Key Vault support with Azure Databricks
https://docs.azuredatabricks.net/user-guide/secrets/secret-scopes.html#akv-ss
**Here's the working Solution**
sql_url = "jdbc:sqlserver://#SERVER_NAME#.database.windows.net:1433;database=#DATABASE_NAME#"
properties = {
    "user": "#APP_NAME#",
    "password": dbutils.secrets.get(scope="#SCOPE_NAME#", key="#KEYVAULT_SECRET_NAME#"),
    "driver": "com.microsoft.sqlserver.jdbc.SQLServerDriver"
}
**APP_NAME** ==> the application created under App registrations in Azure Active Directory.
**SCOPE_NAME** ==> the secret scope you created, as described in the docs: https://docs.azuredatabricks.net/user-guide/secrets/secret-scopes.html
**KEYVAULT_SECRET_NAME** ==> the name of the secret stored in Azure Key Vault.
**NOTE: GRANT ACCESS TO YOUR APP ON THE DATABASE USING THE STEPS BELOW**
CREATE USER #APP_NAME# FROM EXTERNAL PROVIDER
EXEC sp_addrolemember 'db_owner', '#APP_NAME#';
Maybe you can reference this tutorial: Configuring AAD Authentication to Azure SQL Databases.
Summary:
Azure SQL is a great service - you get your databases into the cloud without having to manage all that nasty server stuff. One of the problems with Azure SQL, though, is that you have to authenticate using SQL authentication - a username and password. You can, however, also authenticate via Azure Active Directory (AAD) tokens. This is analogous to integrated login using Windows Authentication - but instead of Active Directory, you're using AAD.
There are a number of advantages to AAD Authentication:
You no longer have to share logins since users log in with their AAD credentials, so auditing is better
You can manage access to databases using AAD groups
You can enable "app" logins via Service Principals
In order to get this working, you need:
To enable AAD authentication on the Azure SQL Server
A Service Principal
Add logins to the database, granting whatever rights are required to the service principal
Add code to get an auth token for accessing the database
In this post, the author walks through creating a service principal, configuring the database for AAD auth, creating code for retrieving a token, and configuring an EF DbContext for AAD auth.
I still hope this tutorial helps.

How to load a key into IBM KeyProtect using Terraform

I would like to use the IBM Terraform provider to provision a KeyProtect instance containing a standard key.
Getting a KeyProtect instance is easy: Use a service instance of type kms.
Does Terraform offer a way of inserting a specified key in the KeyProtect instance?
Not tested, but should work... ;-)
The IBM Terraform provider only covers cloud resources, not "application data". However, there is a REST API provider which allows you to execute calls to REST APIs.
IBM Cloud Key Protect provides such an interface and allows you to either create or import a key. This toolchain deploy script shows an automated way of provisioning Key Protect and creating a new root key (read the security tutorial here). You basically need to code something similar, obtaining the necessary token and other metadata:
curl -s -X POST $KP_MANAGEMENT_URL \
  --header "Authorization: Bearer $KP_ACCESS_TOKEN" \
  --header "Bluemix-Instance: $KP_GUID" \
  --header "Content-Type: application/vnd.ibm.kms.key+json" \
  -d @scripts/root-enckey.json
Update:
The Terraform provider now has ibm_kms_key and some other resources. It allows importing existing keys into either Key Protect or Hyper Protect Crypto Services.
