Using the Environment Class with Pipeline Runs - azure-machine-learning-service

I am using an EstimatorStep in a pipeline together with the Environment class, in order to have a custom Docker image, as I need some apt-get packages to be able to install a specific pip package. From the logs, it appears that the pipeline run, unlike the non-pipeline version of the estimator, completely ignores the Docker portion of the environment. Very simply, this seems broken:
I'm running SDK v1.0.65, and my dockerfile is completely ignored. I'm using
FROM mcr.microsoft.com/azureml/base:latest\nRUN apt-get update && apt-get -y install freetds-dev freetds-bin vim gcc
in the base_dockerfile property of my code.
Here's a snippet of my code:
from azureml.core import Environment
from azureml.core.environment import CondaDependencies
conda_dep = CondaDependencies()
conda_dep.add_pip_package('pymssql==2.1.1')
myenv = Environment(name="mssqlenv")
myenv.python.conda_dependencies=conda_dep
myenv.docker.enabled = True
myenv.docker.base_dockerfile = 'FROM mcr.microsoft.com/azureml/base:latest\nRUN apt-get update && apt-get -y install freetds-dev freetds-bin vim gcc'
myenv.docker.base_image = None
This works well when I use an Estimator by itself, but if I insert this estimator in a Pipeline, it fails. Here's my code to launch it from a Pipeline run:
from azureml.pipeline.steps import EstimatorStep
sql_est_step = EstimatorStep(name="sql_step",
                             estimator=est,
                             estimator_entry_script_arguments=[],
                             runconfig_pipeline_params=None,
                             compute_target=cpu_cluster)
from azureml.pipeline.core import Pipeline
from azureml.core import Experiment
pipeline = Pipeline(workspace=ws, steps=[sql_est_step])
pipeline_run = exp.submit(pipeline)
When launching this, the logs for the container building service reveal:
FROM continuumio/miniconda3:4.4.10... etc.
This indicates it's ignoring my FROM mcr.... statement in the Environment I've associated with this Estimator, and my pip install fails.
Am I missing something? Is there a workaround?

I can confirm that this is a bug on the AML Pipeline side. Specifically, the runconfig property environment.docker.base_dockerfile is not being passed through correctly in pipeline jobs. We are working on a fix. In the meantime, you can use the workaround from this thread: build the Docker image first and specify it with environment.docker.base_image (which is passed through correctly).
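For instance, a minimal sketch of that interim workaround (the registry and image name below are placeholders, not real values):
# Point the environment at a pre-built image instead of a dockerfile
myenv.docker.enabled = True
myenv.docker.base_dockerfile = None
myenv.docker.base_image = '<your_acr>.azurecr.io/<your_image_name>:latest'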

I found a workaround for now, which is to build your own Docker image. You can do this by using these options of the DockerSection of the Environment:
myenv.docker.base_image_registry.address = '<your_acr>.azurecr.io'
myenv.docker.base_image_registry.username = '<your_acr>'
myenv.docker.base_image_registry.password = '<your_acr_password>'
myenv.docker.base_image = '<your_acr>.azurecr.io/testimg:latest'
Obviously, use whichever Docker image you built and pushed to the container registry linked to the Azure Machine Learning workspace.
To create the image, you would run something like this at the command line of a machine that can build a Linux-based container (like a Notebook VM):
docker build . -t <your_image_name>
# Tag it for upload
docker tag <your_image_name>:latest <your_acr>.azurecr.io/<your_image_name>:latest
# Login to Azure
az login
# login to the container registry so that the push will work
az acr login --name <your_acr>
# push the image
docker push <your_acr>.azurecr.io/<your_image_name>:latest
Once the image is pushed, you should be able to get that working.
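To tie this back to the pipeline, here is a rough sketch of how an Environment pointing at the pushed image might be attached to the estimator used by the EstimatorStep. It assumes the SDK v1 Estimator's environment_definition parameter, and the entry script name is a placeholder:
from azureml.core import Environment
from azureml.core.conda_dependencies import CondaDependencies
from azureml.train.estimator import Estimator
from azureml.pipeline.steps import EstimatorStep

# Environment that reuses the image pushed to the workspace's ACR
myenv = Environment(name="mssqlenv")
myenv.docker.enabled = True
myenv.docker.base_image = '<your_acr>.azurecr.io/<your_image_name>:latest'
myenv.docker.base_image_registry.address = '<your_acr>.azurecr.io'

# Conda/pip packages are still layered on top of the custom base image
conda_dep = CondaDependencies()
conda_dep.add_pip_package('pymssql==2.1.1')
myenv.python.conda_dependencies = conda_dep

# Attach the environment to the estimator, then wrap it in the pipeline step
est = Estimator(source_directory='.',
                entry_script='train.py',  # placeholder entry script
                compute_target=cpu_cluster,
                environment_definition=myenv)

sql_est_step = EstimatorStep(name="sql_step",
                             estimator=est,
                             estimator_entry_script_arguments=[],
                             compute_target=cpu_cluster)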

I also initially used EstimatorStep for custom images, but recently figured out how to successfully pass Environments first to RunConfigurations, then to PythonScriptSteps (example below).
Another workaround, similar to yours, would be to publish your custom Docker image to Docker Hub; the docker_base_image param then becomes the URI, in our case microsoft/mmlspark:0.16.
def get_environment(env_name, yml_path, user_managed_dependencies, enable_docker, docker_base_image):
    env = Environment(env_name)
    cd = CondaDependencies(yml_path)
    env.python.conda_dependencies = cd
    env.python.user_managed_dependencies = user_managed_dependencies
    env.docker.enabled = enable_docker
    env.docker.base_image = docker_base_image
    return env
spark_env = f.get_environment(env_name='spark_env',
                              yml_path=os.path.join(os.getcwd(), 'compute/aml_config/spark_compute_dependencies.yml'),
                              user_managed_dependencies=False, enable_docker=True,
                              docker_base_image='microsoft/mmlspark:0.16')
# use pyspark framework
spark_run_config = RunConfiguration(framework="pyspark")
spark_run_config.environment = spark_env
roll_step = PythonScriptStep(
    name='rolling window',
    script_name='roll.py',
    arguments=['--input_dir', joined_data,
               '--output_dir', rolled_data,
               '--script_dir', ".",
               '--min_date', '2015-06-30',
               '--pct_rank', 'True'],
    compute_target=compute_target_spark,
    inputs=[joined_data],
    outputs=[rolled_data],
    runconfig=spark_run_config,
    source_directory=os.path.join(os.getcwd(), 'compute', 'roll'),
    allow_reuse=pipeline_reuse
)
A couple of other points (that may be wrong):
PythonScriptStep is effectively a wrapper for ScriptRunConfig, which takes run_config as an argument
Estimator is a wrapper for ScriptRunConfig where RunConfig settings are made available as parameters
IMHO EstimatorStep shouldn't exist, because it is better to define Environments and Steps separately rather than at the same time in one call.

Related

Nextjs App reading configuration from Azure App Service

We have a Next.js project which is built by Docker and deployed into Azure App Service (container). We also set up configuration values within App Service and try to access them, however it's not working as expected.
A few things we tried:
Restarting the App Service after adding new configuration
removing .env file while building the docker image
including .env file while building the docker image
Here's how we try to read the environment variables within the App Service:
const env = process.env.NEXT_PUBLIC_ENV;
const A = process.env.NEXT_PUBLIC_AS_VALUE;
Wondering if this can actually be done?
Just thinking out loud below:
Since we're deploying the Docker image within App Service's container (Linux), does that mean the container can't pull the value from this environment variable?
The Docker image already performs the npm run build, so does that mean the image is in static form (build time)? It would never read from App Service configuration (runtime).
After a day or two, I came up with an alternative solution: passing the environment values in the Dockerfile while building my project.
TLDR
Pass your env values within the Dockerfile.
Set all your env (dev, staging, prod, etc.) variable values in Pipeline variables.
Set a "settable" variable inside the Pipeline variables too, so you can choose which environment to build when triggering your pipeline (e.g., buildEnv).
Set up a bash script to perform variable text replacement (e.g., from firebaseApiKey to DEVfirebaseApiKey) according to the env received from buildEnv.
Use the "replace tokens" task from Azure Pipelines to replace values inside the Dockerfile.
Build your Docker image.
Voilà, now you get your environment-specific build.
Details
Within your Dockerfile you can place your env variables like this:
RUN NEXT_PUBLIC_ENV=#{env}# \
NEXT_PUBLIC_FIREBASE_API_KEY=#{firebaseApiKey}# \
NEXT_PUBLIC_FIREBASE_AUTH_DOMAIN=#{firebaseAuthDomain}# \
NEXT_PUBLIC_FIREBASE_PROJECT_ID=#{firebaseProjectId}# \
NEXT_PUBLIC_FIREBASE_STORAGE_BUCKET=#{firebaseStorageBucket}# \
NEXT_PUBLIC_FIREBASE_MESSAGING_SENDER_ID=#{firebaseMessagingSenderId}# \
NEXT_PUBLIC_FIREBASE_APP_ID=#{firebaseAppId}# \
NEXT_PUBLIC_FIREBASE_MEASUREMENT_ID=#{firebaseMeasurementId}# \
NEXT_PUBLIC_BASE_URL=#{baseURL}# \
npm run build
These values (e.g., baseURL, firebaseMeasurementId, etc.) are only placeholders, because we need to change them later with a bash script according to the buildEnv we receive (buildEnv is settable when you trigger a build).
A sample bash script is below. What it does is look within your Dockerfile for the word env and change it to DEVenv / UATenv / PRODenv based on what you're passing to buildEnv.
#!/bin/bash
# buildEnv is the pipeline variable described above
case $(buildEnv) in
  dev)
    sed -i -e 's/env/DEVenv/g' ./Dockerfile
    ;;
  uat)
    sed -i -e 's/env/UATenv/g' ./Dockerfile
    ;;
  prod)
    sed -i -e 's/env/PRODenv/g' ./Dockerfile
    ;;
  *)
    echo -n "unknown"
    ;;
esac
When this is complete, your "environment specific" Dockerfile is sort of created. Now we'll make use of the "replace tokens" task from Azure Pipelines to replace the values inside the Dockerfile. Make sure you have all your values set up in Pipeline variables!
Lastly, you can build your Docker image and deploy :)

Docker and AzureKeyVault: unable to load shared library 'libsecret-1.so.0'

I have ASP.NET Core xUnit integration tests that connect to MongoDB to test basic repositories on collections. The tests are built and run in a container in AKS. I have set up the test fixture to connect to Azure Key Vault to retrieve the connection string to a MongoDB.
var pathToSetting = Path.GetDirectoryName(Assembly.GetExecutingAssembly().Location);
var configuration = new ConfigurationBuilder()
    .SetBasePath(pathToSetting)
    .AddJsonFile("appsettings.json")
    .AddEnvironmentVariables();

var secretClient = new SecretClient(
    new Uri("url_to_Azure_keyVault"),
    new DefaultAzureCredential(),
    new SecretClientOptions()
    {
        Retry =
        {
            Delay = TimeSpan.FromSeconds(2),
            MaxDelay = TimeSpan.FromSeconds(4),
            MaxRetries = 2,
            Mode = RetryMode.Exponential
        }
    });

configuration.AddAzureKeyVault(secretClient, new KeyVaultSecretManager());
I am using the following Docker file for the integration tests:
#Grab an OS image made to run the .Net Core SDK
FROM mcr.microsoft.com/dotnet/core/sdk:3.1 AS build
#copy files for build
WORKDIR /testProject
COPY . .
RUN dotnet build tests/integrationTest.csproj --output /testProject/artifacts
FROM mcr.microsoft.com/dotnet/core/sdk:3.1 AS final
COPY --from=build ["/testProject/artifacts", "/testProject/artifacts"]
ENTRYPOINT dotnet test /testProject/artifacts/integrationTest.dll
The tests run fine locally from Visual Studio but fail with the exception below when run in a container, both locally and in AKS.
[xUnit.net 00:00:03.10] IntegrationTest1 [FAIL]X
Test1 [1ms]
Error Message:
System.AggregateException : One or more errors occurred. (SharedTokenCacheCredential authentication failed: Persistence check failed. Inspect inner exception for details) (The following constructor parameters did not have matching fixture data: TestFixture testFixture)
---- Azure.Identity.AuthenticationFailedException : SharedTokenCacheCredential authentication failed: Persistence check failed. Inspect inner exception for details
-------- Microsoft.Identity.Client.Extensions.Msal.MsalCachePersistenceException : Persistence check failed. Inspect inner exception for details
------------ System.DllNotFoundException : Unable to load shared library 'libsecret-1.so.0' or one of its dependencies. In order to help diagnose loading problems, consider setting the LD_DEBUG environment
variable: liblibsecret-1.so.0: cannot open shared object file: No such file or directory
Any ideas how to troubleshoot this error?
I came across this potential fix while working on my own issue:
Wherever you create new DefaultAzureCredentialOptions, you should also set the property ExcludeSharedTokenCacheCredential to true.
In your WSL environment install libsecret-1-dev. In Ubuntu for example, run the command sudo apt install libsecret-1-dev. This will add libsecret-1.so.0 to your system so that MSAL can find it.
https://hungyi.net/posts/wsl2-msal-extensions/
It didn't work for me, but I am using a docker container that doesn't have full access to apt. I can't install libsecret-1-dev.
Not a root cause, but same error popped up for me this morning. Rolling Microsoft.Web.Identity package down from 1.7.0 to 1.6.0 did the trick.
Looks like from the GitHub issues on other Azure packages, wrapping these exceptions is a common bug that gets logged.
Switching Azure.Identity 1.2.3 to 1.2.2 did the trick for me (this page helped me https://hungyi.net/posts/wsl2-msal-extensions/).

Dataflow Error with Apache Beam SDK 2.20.0

I am trying to build an Apache Beam pipeline in Python 3.7 with Beam SDK version 2.20.0. The pipeline gets deployed on Dataflow successfully but does not seem to be doing anything. In the worker logs, I can see the following error message reported repeatedly:
Error syncing pod xxxxxxxxxxx (), skipping: Failed to start container
worker log
I have tried everything I could, but this error is quite stubborn. My pipeline looks like this:
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions
from apache_beam.options.pipeline_options import GoogleCloudOptions
from apache_beam.options.pipeline_options import StandardOptions
from apache_beam.options.pipeline_options import WorkerOptions
from apache_beam.options.pipeline_options import SetupOptions
from apache_beam.options.pipeline_options import DebugOptions
options = PipelineOptions()
options.view_as(GoogleCloudOptions).project = PROJECT
options.view_as(GoogleCloudOptions).job_name = job_name
options.view_as(GoogleCloudOptions).region = region
options.view_as(GoogleCloudOptions).staging_location = staging_location
options.view_as(GoogleCloudOptions).temp_location = temp_location
options.view_as(WorkerOptions).zone = zone
options.view_as(WorkerOptions).network = network
options.view_as(WorkerOptions).subnetwork = sub_network
options.view_as(WorkerOptions).use_public_ips = False
options.view_as(StandardOptions).runner = 'DataflowRunner'
options.view_as(StandardOptions).streaming = True
options.view_as(SetupOptions).sdk_location = ''
options.view_as(SetupOptions).save_main_session = True
options.view_as(DebugOptions).experiments = []
print('running pipeline...')
with beam.Pipeline(options=options) as pipeline:
    (
        pipeline
        | 'ReadFromPubSub' >> beam.io.ReadFromPubSub(topic=topic_name).with_output_types(bytes)
        | 'ProcessMessage' >> beam.ParDo(Split())
        | 'WriteToBigQuery' >> beam.io.WriteToBigQuery(table=bq_table_name,
                                                       schema=bq_schema,
                                                       write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND)
    )
    result = pipeline.run()
I have tried supplying a Beam SDK 2.20.0 tar.gz from the compute instance using the sdk_location parameter, but that doesn't work either. I can't use sdk_location = default as that triggers a download from pypi.org. I am working in an offline environment and connectivity to the internet is not an option. Any help would be highly appreciated.
The pipeline itself is deployed in a container, and all libraries that go with Apache Beam 2.20.0 are specified in a requirements.txt file; the Docker image installs all the libraries.
TL;DR: Copy the Apache Beam SDK archive into an accessible path and provide the path as a variable.
I was also struggling with this setup, but finally I found a solution. Even though your question was raised quite some days ago, this answer might help someone else.
There are probably multiple ways to do that, but the following two are quite simple.
As a precondition you'll need to create the apache-beam-sdk source archive as follows:
Clone the Apache Beam GitHub repository
Switch to the required tag, e.g. v2.28.0
cd to beam/sdks/python
Create a tar.gz source archive of your required beam_sdk version as follows:
python setup.py sdist
Now you should have the source archive apache-beam-2.28.0.tar.gz in the path beam/sdks/python/dist/
Option 1 - Use Flex Templates and copy the Apache Beam SDK in the Dockerfile
Documentation: Google Dataflow Documentation
Create a Dockerfile --> you have to include COPY utils/apache-beam-2.28.0.tar.gz /tmp, because this is going to be the path you can set in your SetupOptions.
FROM gcr.io/dataflow-templates-base/python3-template-launcher-base
ARG WORKDIR=/dataflow/template
RUN mkdir -p ${WORKDIR}
WORKDIR ${WORKDIR}
# Due to a change in the Apache Beam base image in version 2.24, you must install
# libffi-dev manually as a dependency. For more information:
# https://github.com/GoogleCloudPlatform/python-docs-samples/issues/4891
# update used packages
RUN apt-get update && apt-get install -y \
libffi-dev \
&& rm -rf /var/lib/apt/lists/*
COPY setup.py .
COPY main.py .
COPY path_to_beam_archive/apache-beam-2.28.0.tar.gz /tmp
ENV FLEX_TEMPLATE_PYTHON_SETUP_FILE="${WORKDIR}/setup.py"
ENV FLEX_TEMPLATE_PYTHON_PY_FILE="${WORKDIR}/main.py"
RUN python -m pip install --user --upgrade pip setuptools wheel
Set sdk_location to the path you've copied the apache-beam SDK tar.gz to:
options.view_as(SetupOptions).sdk_location = '/tmp/apache-beam-2.28.0.tar.gz'
Build the Docker image with Cloud Build
gcloud builds submit --tag $TEMPLATE_IMAGE .
Create a Flex template
gcloud dataflow flex-template build "gs://define-path-to-your-templates/your-flex-template-name.json" \
--image=gcr.io/your-project-id/image-name:tag \
--sdk-language=PYTHON \
--metadata-file=metadata.json
Run the generated Flex Template in your subnetwork (if required):
gcloud dataflow flex-template run "your-dataflow-job-name" \
--template-file-gcs-location="gs://define-path-to-your-templates/your-flex-template-name.json" \
--parameters staging_location="gs://your-bucket-path/staging/" \
--parameters temp_location="gs://your-bucket-path/temp/" \
--service-account-email="your-restricted-sa-dataflow@your-project-id.iam.gserviceaccount.com" \
--region="yourRegion" \
--max-workers=6 \
--subnetwork="https://www.googleapis.com/compute/v1/projects/your-project-id/regions/your-region/subnetworks/your-subnetwork" \
--disable-public-ips
Option 2 - Copy sdk_location from GCS
According to the Beam documentation, you should even be able to provide a GCS / gs:// path directly for the sdk_location option, but it didn't work for me. But the following should work:
Upload the previously generated archive to a bucket which you're able to access from the Dataflow job you'd like to execute, probably something like gs://yourbucketname/beam_sdks/apache-beam-2.28.0.tar.gz
In your source code, copy the apache-beam SDK to e.g. /tmp/apache-beam-2.28.0.tar.gz:
# see: https://cloud.google.com/storage/docs/samples/storage-download-file
from google.cloud import storage

def download_blob(bucket_name, source_blob_name, destination_file_name):
    """Downloads a blob from the bucket."""
    # bucket_name = "your-bucket-name"
    # source_blob_name = "storage-object-name"
    # destination_file_name = "local/path/to/file"
    storage_client = storage.Client()
    bucket = storage_client.bucket(bucket_name)
    # Construct a client side representation of a blob.
    # Note `Bucket.blob` differs from `Bucket.get_blob` as it doesn't retrieve
    # any content from Google Cloud Storage. As we don't need additional data,
    # using `Bucket.blob` is preferred here.
    blob = bucket.blob(source_blob_name)
    blob.download_to_filename(destination_file_name)

download_blob("your-bucket-name", "beam_sdks/apache-beam-2.28.0.tar.gz", "/tmp/apache-beam-2.28.0.tar.gz")
Now you can set the sdk_location to the path you've downloaded the sdk archive.
options.view_as(SetupOptions).sdk_location = '/tmp/apache-beam-2.28.0.tar.gz'
Now your Pipeline should be able to run without internet breakout.

How can I set run_name in mlflow command line?

MLFlow version: 1.4.0
Python version: 3.7.4
I'm running the UI as mlflow server... with all the required command line options.
I am logging to MLflow as an MLflow project, with the appropriate MLproject.yaml file. The project is being run in a Docker container, so the CMD looks like this:
mlflow run . -P document_ids=${D2V_DOC_IDS} -P corpus_path=... --no-conda --experiment-name=${EXPERIMENT_NAME}
Running the experiment like this results in a blank run_name. I know there's a run_id but I'd also like to see the run_name and set it in my code -- either in the command line, or in my code as mlflow.log.....
I've looked at Is it possible to set/change mlflow run name after run initial creation? but I want to programmatically set the run name instead of changing it manually on the UI.
One of the parameters to mlflow.start_run() is run_name. This would give you programmatic access to set the run name with each iteration. See the docs here.
Here's an example:
from datetime import datetime

import mlflow

## Define the name of our run
name = "this run is gonna be bananas " + str(datetime.now())

## Start a new mlflow run and set the run name
with mlflow.start_run(run_name=name):
    ## ...train model, log metrics/params/model...
    ## End the run
    mlflow.end_run()
If you want to set the name as part of an MLflow Project, you'll have to specify it as a parameter in the entry points to the project. This is located in the MLproject file. Then you can pass those values into the mlflow.start_run() function from the command line.
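As a rough sketch of what that looks like (the parameter name run_name below is just an example, not a fixed MLflow convention): the MLproject entry point would declare a run_name parameter, and the entry script would forward it to mlflow.start_run():
import argparse

import mlflow

# Hypothetical entry script of an MLflow project; the MLproject entry point
# is assumed to declare a `run_name` parameter and pass it as --run-name.
parser = argparse.ArgumentParser()
parser.add_argument("--run-name", dest="run_name", default="unnamed-run")
args = parser.parse_args()

with mlflow.start_run(run_name=args.run_name):
    # ...train model, log metrics/params/model...
    pass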
For the CLI, this now seems to be available:
--run-name <runname>
https://mlflow.org/docs/latest/cli.html#cmdoption-mlflow-run-run-name

How to reuse successfully built docker images in Azure ML?

In our company I use Azure ML and I have the following issue. I specify a conda_requirements.yaml file with the PyTorch estimator class, like so (... are placeholders so that I do not have to type everything out):
from azureml.train.dnn import PyTorch
est = PyTorch(source_directory='.', script_params=..., compute_target=..., entry_script=..., conda_dependencies_file_path='conda_requirements.yaml', environment_variables=..., framework_version='1.1')
The conda_requirements.yaml (shortened version of the pip part) looks like this:
dependencies:
- conda=4.5.11
- conda-package-handling=1.3.10
- python=3.6.2
- cython=0.29.10
- scikit-learn==0.21.2
- anaconda::cloudpickle==1.2.1
- anaconda::cffi==1.12.3
- anaconda::mxnet=1.1.0
- anaconda::psutil==5.6.3
- anaconda::pip=19.1.1
- anaconda::six==1.12.0
- anaconda::mkl==2019.4
- conda-forge::openmpi=3.1.2
- conda-forge::pycparser==2.19
- tensorboard==1.13.1
- tensorflow==1.13.1
- pip:
  - torch==1.1.0
  - torchvision==0.2.1
This successfully builds on Azure. Now, in order to reuse the resulting Docker image in that case, I pass it via the custom_docker_image parameter to the Estimator:
from azureml.train.estimator import Estimator
est = Estimator(source_directory='.', script_params=..., compute_target=..., entry_script=..., custom_docker_image='<container registry name>.azurecr.io/azureml/azureml_c3a4f...', environment_variables=...)
But now Azure somehow seems to rebuild the image again and when I run the experiment it cannot install torch. So it seems to only install the conda dependencies and not the pip dependencies, but actually I do not want Azure to rebuild the image. Can I solve this somehow?
I attempted to build a Docker image from my Dockerfile myself and then add it to the registry. I can do az login, and according to https://learn.microsoft.com/en-us/azure/container-registry/container-registry-authentication I should then also be able to do an acr login and push. This does not work.
Even using the credentials from
az acr credential show --name <container registry name>
and then doing a
docker login <container registry name>.azurecr.io -u <username from credentials above> -p <password from credentials above>
does not work.
The error message is authentication required even though I used
az login
successfully. Would also be happy if someone could explain that to me in addition to how to reuse docker images when using Azure ML.
Thank you!
AzureML should actually cache your Docker image once it has been created. The service will hash the base Docker info and the contents of the conda.yaml file and use that as the hash key -- unless you change any of that information, the Docker image should come from the ACR.
As for the custom docker usage, did you set the parameter user_managed=True? Otherwise, AzureML will consider your docker to be a base image on top of which it will create the conda environment per your yaml file.
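For illustration, a minimal sketch along those lines, assuming the SDK v1 Estimator's user_managed parameter (the image name is the placeholder from the question, the entry script name is made up, and the other ... values are placeholders as in the question):
from azureml.train.estimator import Estimator

# Reuse the already-built image as-is; with user_managed=True, AzureML will not
# layer a new conda environment on top, so the image must already contain all
# conda and pip dependencies (including torch).
est = Estimator(source_directory='.',
                entry_script='train.py',  # placeholder
                compute_target=...,
                custom_docker_image='<container registry name>.azurecr.io/azureml/azureml_c3a4f...',
                user_managed=True)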
There is an example of how to use a custom docker image in this notebook:
https://github.com/Azure/MachineLearningNotebooks/blob/4170a394edd36413edebdbab347afb0d833c94ee/how-to-use-azureml/training-with-deep-learning/how-to-use-estimator/how-to-use-estimator.ipynb
