Azure: passing secrets without depending on Azure Key Vault

I am building an Azure ML Pipeline for batch scoring. In one step I need to access a key stored in the workspace's Azure Key Vault.
However, I want to strictly separate the authoring environment (responsible for creating the datasets, building the environment, building and running the pipeline) and the production environment (responsible for transforming data, running the prediction etc.).
Therefore, code in the production environment should be somewhat Azure-agnostic. I want to be able to submit my inference script to Google Cloud compute instances if needed.
Thus my question is:
What is the best practice for passing secrets to remote runs without having the remote script retrieve them from the key vault itself?
Is there a way to have redacted environment variables or command line arguments?
Thanks!
Example of what I would like to happen:
# import all azure dependencies
from azureml.core import Workspace
from azureml.pipeline.core import Pipeline, PipelineEndpoint
from azureml.pipeline.steps import PythonScriptStep

workspace = Workspace.from_config()
keyvault = workspace.get_default_keyvault()
secret = keyvault.get_secret("my_secret")

pipeline_step = PythonScriptStep(
    script_name="step_script.py",
    arguments=["--input_data", input_data, "--output_data", output_data],
    compute_target=compute,
    params={"secret": secret},  # This will create an env var on the remote?
)
pipeline = Pipeline(workspace, steps=[pipeline_step])
PipelineEndpoint.publish(...)
And within step_script.py:
# No imports from azureml!
import os

secret = os.getenv("AML_PARAMETER_secret")
do_something(secret)

Related

How to place a folder with Terraform files in a Docker container and deploy it to Azure

I have my infrastructure-as-code folder with distinct Terraform files, stored on Azure in a storage account in a resource group that is only used for storing state or secrets used for automation.
How can I place the folder in a Docker container and use it so that the secrets remain private?
Never put secrets in a Docker image. They are easily extracted and aren't treated as secret.
You would normally store your Terraform files (without secrets) in a source repository that has a pipeline attached. The pipeline could have the secrets defined as "secret variables" (different pipeline tools have different terms for the same thing).
For example, say you need to provide a particular API key to talk to a service with Terraform. Often the provider supports reading the credential from an environment variable by default (check its docs); in cases where it doesn't, you can create a Terraform variable for it and set the secret on the pipeline as mentioned earlier.
e.g.
In terraform:
variable "key" {
type = "string"
sensitive = true
}
provider "someprovider" {
project = "..."
region = "..."
key = var.key
}
Then in the pipeline you would define something like:
TF_VAR_key=xxxx-xxxx-xxxx-xxxx
Normally within the pipeline tools you can provide variables to the various steps or Docker images (such as the Terraform image).
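In an Azure DevOps YAML pipeline, for example, that mapping might look like the sketch below; the secret variable key is assumed to be defined in the pipeline (or a variable group), and it has to be mapped into the step's environment explicitly because secret variables are not exposed to scripts automatically.
steps:
  - script: |
      terraform init
      terraform apply -auto-approve
    displayName: Apply Terraform
    env:
      TF_VAR_key: $(key) # becomes var.key inside the Terraform configuration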

Best way to store Terraform variable values without having them in source control

We have a code repo with our IaC in Terraform. This is in GitHub, and we're going to pull the code, build it, etc. However, we don't want the values of our variables in GitHub itself. So this may be a dumb question, but where do we store the values we need for our variables? If my Terraform requires an Azure subscription id, where would I store the subscription id? The vars files won't be in source control. The goal is that we'll be pulling the code into an Azure DevOps pipeline, so the pipeline will have to know where to go to get the input variable values. I hope that makes sense?
You can store your secrets in Azure Key Vault and retrieve them in Terraform using azurerm_key_vault_secret.
data "azurerm_key_vault_secret" "example" {
name = "secret-sauce"
key_vault_id = data.azurerm_key_vault.existing.id
}
output "secret_value" {
value = data.azurerm_key_vault_secret.example.value
}
https://registry.terraform.io/providers/hashicorp/azurerm/latest/docs/data-sources/key_vault_secret
There has to be a source of truth eventually.
You can store your values in the pipeline definitions as variables themselves and pass them into the Terraform configuration.
Usually it's a combination of tfvars files (dependent on the target environment) and some variables from the pipeline, as in the sketch below. If you do have vars in your pipelines though, the pipelines should be in code.
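As an illustrative sketch of that split (the variable names are made up), the committed tfvars file holds only non-sensitive, environment-specific values, while the sensitive ones are injected by the pipeline at run time:
# dev.tfvars - committed to the repo, contains no secrets
environment = "dev"
location    = "westeurope"

# supplied by the pipeline at run time, never committed:
# terraform plan -var-file=dev.tfvars -var "db_password=$DB_PASSWORD"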
If the variables are sensitive then you need to connect to a secret management tool to get those variables.
If you have many environments, say 20, and the infra is all the same with the exception of a single ID, you could have the same pipeline definition (normally JSON or YAML) and reference it for the 20 pipelines you build; each of those 20 would have that unique value baked in for use at execution. That var is passed through to Terraform as the missing piece.
There are other key-value property tracking systems out there but Git definitely works well for this purpose.
You can use Azure DevOps secure files (Pipelines -> Library) for storing your credentials for each environment. You can create a tfvars file for each environment with all your credentials, upload it as a secure file in Azure DevOps, and then download it in the pipeline with a DownloadSecureFile@1 task.
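A rough YAML sketch of that approach (the file and step names are illustrative; DownloadSecureFile@1 is the actual task):
steps:
  - task: DownloadSecureFile@1
    name: tfvars # referenced below via $(tfvars.secureFilePath)
    inputs:
      secureFile: dev.tfvars # the secure file uploaded under Pipelines -> Library
  - script: terraform plan -var-file="$(tfvars.secureFilePath)"
    displayName: Terraform plan with the downloaded tfvars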

How to keep an Azure service principal secret safe

My deploy task uses a PowerShell script, which uses a service principal to connect to Azure Key Vault and pull a secret. The secret (password) is stored in the PowerShell script's code as plain text. Maybe there is another solution to minimize exposure of the token.
I also use PowerShell inline mode (not a separate script) with an Azure DevOps secret variable in the deploy task, but this solution is difficult to support (the script has several different operations, so you have to keep many versions of the script).
The script is stored in a Git repository; anyone who has access to it will be able to see the secret and gain access to other keys. Perhaps I don't understand this concept correctly, but if keys cannot be stored in the code, then what should I do?
In DevOps you can use variable groups and define that the variables are pulled directly from a selected Key Vault (if the service principal you have selected has read/list access to the KV) LINK.
This means that you can define all secrets in Key Vault, and they will be pulled before any task happens in your YAML. To be able to use them in the script you can pass them as environment variables or parameters to your script and just reference $env:variable or $variable, instead of having the secret hardcoded in your script.
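A rough sketch of that wiring (the group, secret, service connection, and script names are illustrative): the variable group is linked to the Key Vault, and the secret is mapped into the script's environment because secret variables are not exposed to scripts automatically.
variables:
  - group: my-keyvault-group # variable group linked to the Key Vault
steps:
  - task: AzurePowerShell@5
    inputs:
      azureSubscription: my-service-connection
      ScriptType: FilePath
      ScriptPath: deploy.ps1
      azurePowerShellVersion: LatestVersion
    env:
      MY_SECRET: $(my-keyvault-secret) # secret pulled from the linked variable group
# inside deploy.ps1: $secret = $env:MY_SECRET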

Exclude Azure Data Factory connections and integration runtime from Azure DevOps sync

So we have configured ADF to use GIT under DevOps.
The problem is that our connection details are getting synced between the dev\qa\master branches, which causes issues as each environment has its own SQL Server.
Is there any way to keep connections and the IR out of the sync operation between branches?
Look at this similar post, which also asks how to use parameters for SQL connection information in ADF.
Your solution should also leverage managed identities for creating the access policies in the Key Vault; this can be done via ARM.
One additional comment: the linked services are where the parameter substitution of these values would occur.
Connections should rather be parameterized than removed from a deployment pipeline.
Parameterization can be done by using "pipeline" and "variable group" variables.
As an example, a pipeline variable adf-keyvault can be used to point to the right Key Vault instance that belongs to a certain environment:
adf-keyvault = "adf-kv-yourProjectName-$(Environment)"
The variable $Environment is declared at the variable group level, so each environment has its own value mapped, for instance:
$Environment = 'dev' #development
$Environment = 'stg' #staging
$Environment = 'prd' #production
Therefore the final value of adf-keyvault, depending on the environment, resolves to:
adf-keyvault = "adf-kv-yourProjectName-dev"
adf-keyvault = "adf-kv-yourProjectName-stg"
adf-keyvault = "adf-kv-yourProjectName-prd"
And each Key Vault stores the connection string to the database server in a secret with the same name across environments. For instance:
adf-sqldb-connectionstring = Server=123.123.123.123;Database=adf-sqldb-dev;User Id=myUsername;Password=myPassword;
Because the initial setup of CI/CD pipelines in Azure Data Factory can be complex at first glance, I recently blogged a step-by-step guide about this topic: Azure Data Factory & DevOps – Setting-up Continuous Delivery Pipeline

GitHub Actions for Terraform - How to provide "terraform.tfvars" file with aws credentials

I am trying to set up GitHub Actions to execute a Terraform template.
My confusion is: how do I provide the *.tfvars file which has the AWS credentials? (I can't check in these files.)
What's the best practice for sharing the variable values expected by Terraform commands like plan or apply when they need aws_access_key and aws_secret_key?
Here is my GitHub project - https://github.com/samtiku/terraform-ec2
Any guidance here...
You don't need to provide all variables through a *.tfvars file. Apart from the -var-file option, the terraform command also provides the -var parameter, which you can use for passing secrets.
In general, secrets are passed to scripts through environment variables. CI tools give you an option to define environment variables in the project configuration. It's a manual step, because as you have already noticed, secrets cannot be stored in the repository.
I haven't used GitHub Actions in particular, but after setting environment variables, all you need to do is run terraform with the secrets read from them:
$ terraform plan -var-file=some.tfvars -var "aws-secret=${AWS_SECRET_ENVIRONMENT_VARIABLE}"
This way no secrets are ever stored in the repository code. If you'd like to run terraform locally, you'll first need to export these variables in your shell:
$ export AWS_SECRET_ENVIRONMENT_VARIABLE="..."
Although Terraform allows providing credentials to some providers via their configuration arguments for flexibility in complex situations, the recommended way to pass credentials to providers is via some method that is standard for the vendor in question.
For AWS in particular, the main standard mechanisms are either a credentials file or environment variables. If you configure the action to follow what is described in one of those guides, then Terraform's AWS provider will automatically find those credentials and use them in the same way that the AWS CLI does.
It sounds like environment variables will be the easier way to go within GitHub Actions, in which case you can just set the necessary environment variables directly and the AWS provider should use them automatically. If you are using the S3 state storage backend, then it will also automatically use the standard AWS environment variables.
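In GitHub Actions that typically means storing the keys as repository secrets and exporting them as environment variables for the Terraform steps; a minimal sketch (the secret names are whatever you configured under the repository settings) could be:
# .github/workflows/terraform.yml (illustrative)
name: terraform
on: push
jobs:
  terraform:
    runs-on: ubuntu-latest
    env:
      AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
      AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
    steps:
      - uses: actions/checkout@v4
      - uses: hashicorp/setup-terraform@v3
      - run: terraform init
      - run: terraform plan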
If your system includes multiple AWS accounts then you may wish to review the Terraform documentation guide Multi-account AWS Architecture for some ideas on how to model that. The summary of what that guide recommends is to have a special account set aside only for your AWS users and their associated credentials, and then configure your other accounts to allow cross-account access via roles, and then you can use a single set of credentials to run Terraform but configure each instance of the AWS provider to assume the appropriate role for whatever account that provider instance should interact with.
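The cross-account pattern described there boils down to provider configurations like the following sketch (the account ID and role name are placeholders): the single set of credentials comes from the standard environment variables, and each provider instance assumes a role in the account it manages.
provider "aws" {
  alias  = "production"
  region = "us-east-1"

  assume_role {
    role_arn = "arn:aws:iam::123456789012:role/TerraformDeploy" # placeholder ARN
  }
}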
