How can I force docker pull from custom images for Google AI Platform Notebooks?

I'm creating custom Docker images for Google AI Platform Notebooks as documented at https://cloud.google.com/ai-platform/notebooks/docs/custom-container
But I can't figure out how to update this Docker image on the instance once it has been created.

You can do this using the instance metadata.
As an example, create a Notebooks instance from the UI, or from the CLI:
gcloud compute instances create nb-container-1 \
--image-project=deeplearning-platform-release \
--image-family=common-container-notebooks \
--machine-type=n1-standard-1 \
--accelerator type=nvidia-tesla-t4,count=1 \
--maintenance-policy TERMINATE \
--metadata="proxy-mode=project_editors,install-nvidia-driver=True,container=gcr.io/deeplearning-platform-release/base-cu101:m49" \
--boot-disk-size 200GB \
--scopes=https://www.googleapis.com/auth/cloud-platform \
--zone=asia-southeast1-b
or
gcloud beta notebooks instances create nb-container-2 \
--machine-type=n1-standard-1 \
--container-repository=gcr.io/deeplearning-platform-release/base-cu101 \
--container-tag=m49 \
--accelerator-type=NVIDIA_TESLA_T4 \
--accelerator-core-count=1 \
--install-gpu-driver \
--location=europe-west3-a
Once the instance is created you can do the following:
Stop the instance
Edit the container metadata entry to point to the tag you want, for example:
container=gcr.io/deeplearning-platform-release/base-cu101:latest
Start the instance
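The same steps can also be done from the CLI; a minimal sketch using the example instance and zone from the first command above (adjust the instance name, zone and tag to your setup):
gcloud compute instances stop nb-container-1 --zone=asia-southeast1-b
gcloud compute instances add-metadata nb-container-1 \
--zone=asia-southeast1-b \
--metadata=container=gcr.io/deeplearning-platform-release/base-cu101:latest
gcloud compute instances start nb-container-1 --zone=asia-southeast1-b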

Related

Dataproc cluster creation fails with free Google Cloud credits

I am using the free Google Cloud credits. I followed the Dataproc tutorial, but when I run the following command I get an error about storage capacity.
gcloud beta dataproc clusters create ${CLUSTER_NAME} \
--region=${REGION} \
--zone=${ZONE} \
--image-version=1.5 \
--master-machine-type=n1-standard-4 \
--worker-machine-type=n1-standard-4 \
--bucket=${BUCKET_NAME} \
--optional-components=ANACONDA,JUPYTER \
--enable-component-gateway \
--metadata 'PIP_PACKAGES=google-cloud-bigquery google-cloud-storage' \
--initialization-actions gs://goog-dataproc-initialization-actions-${REGION}/python/pip-install.sh
Do you have any idea how to fix this? I changed n1-standard-4 to n1-standard-1 but that did not fix it. However, when I remove --image-version=1.5 the command works. Does removing it create any problem for the rest of the program?
Also, from the web interface, when I click the JupyterLab link I cannot see a Python 3 icon among the kernels available on my Dataproc cluster. I only have Python 2, and it keeps saying the connection with the server is gone.
You are seeing the storage capacity error because with the 1.5 image version Dataproc uses bigger 1000 GiB disks for master and worker nodes to improve performance. You can reduce the disk size with the --master-boot-disk-size=100GB and --worker-boot-disk-size=100GB flags:
gcloud beta dataproc clusters create ${CLUSTER_NAME} \
--region=${REGION} \
--zone=${ZONE} \
--image-version=1.5 \
--master-machine-type=n1-standard-4 \
--master-boot-disk-size=100GB \
--worker-machine-type=n1-standard-4 \
--worker-boot-disk-size=100GB \
--bucket=${BUCKET_NAME} \
--optional-components=ANACONDA,JUPYTER \
--enable-component-gateway \
--metadata 'PIP_PACKAGES=google-cloud-bigquery google-cloud-storage' \
--initialization-actions gs://goog-dataproc-initialization-actions-${REGION}/python/pip-install.sh
When you removed the --image-version=1.5 flag, the command used the default 1.3 image version, which does not support Python 3 by default; that's why you are not seeing a Python 3 kernel in JupyterLab.
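To confirm which image version a cluster actually ended up with, you can describe it; a sketch assuming the same ${CLUSTER_NAME} and ${REGION} variables as above:
gcloud dataproc clusters describe ${CLUSTER_NAME} \
--region=${REGION} \
--format="value(config.softwareConfig.imageVersion)"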

Docker run can't find Google authentication: "oauth2google.DefaultTokenSource: google: could not find default credentials"

Hey there, I am trying to figure out why I keep getting this error when running the docker run command. Here is what I am running:
docker run -p 127.0.0.1:2575:2575 -v ~/.config:/home/.config gcr.io/cloud-healthcare-containers/mllp-adapter /usr/mllp_adapter/mllp_adapter --hl7_v2_project_id=****** --hl7_v2_location_id=us-east1 --hl7_v2_dataset_id=***** --hl7_v2_store_id=***** --export_stats=false --receiver_ip=0.0.0.0
I have tried both Ubuntu and Windows, and I get an error that it failed to connect, pointing me to Google's service authentication documentation. I have confirmed the account is active and the keys are exported to the config below.
brandon@ubuntu-VM:~/Downloads$ gcloud auth configure-docker
WARNING: Your config file at [/home/brandon/.docker/config.json] contains these credential helper entries:
{
  "credHelpers": {
    "gcr.io": "gcloud",
    "us.gcr.io": "gcloud",
    "eu.gcr.io": "gcloud",
    "asia.gcr.io": "gcloud",
    "staging-k8s.gcr.io": "gcloud",
    "marketplace.gcr.io": "gcloud"
  }
}
I am thinking it's something to do with the -v flag and how the container picks up the Google authentication. Any help or guidance to fix this would be appreciated. Thank you
-v ~/.config:/root/.config is used to give the container access to the gcloud credentials.
I was facing the same issue for hours, and I decided to check the source code even though I am not a Go developer.
There I figured out that there is a --credentials option to set the credentials file. It is not documented for now.
The docker command should look like this:
docker run \
--network=host \
-v ~/.config:/root/.config \
gcr.io/cloud-healthcare-containers/mllp-adapter \
/usr/mllp_adapter/mllp_adapter \
--hl7_v2_project_id=$PROJECT_ID \
--hl7_v2_location_id=$LOCATION \
--hl7_v2_dataset_id=$DATASET_ID \
--hl7_v2_store_id=$HL7V2_STORE_ID \
--credentials=/root/.config/$GOOGLE_APPLICATION_CREDENTIALS \
--export_stats=false \
--receiver_ip=0.0.0.0 \
--port=2575 \
--api_addr_prefix=https://healthcare.googleapis.com:443/v1 \
--logtostderr
Don't forget to put your credentials file inside your ~/.config folder.
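For example, a minimal sketch of creating a service account key directly in that folder (the service account name and key filename here are placeholders, not from the original post):
gcloud iam service-accounts keys create ~/.config/mllp-adapter-key.json \
--iam-account=mllp-adapter@$PROJECT_ID.iam.gserviceaccount.com
You would then pass --credentials=/root/.config/mllp-adapter-key.json to the adapter, as in the command above.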
It worked fine for me. I hope this helps.
Cheers

Microsoft !GERMAN! Azure Cloud - getting oauth2-proxy to work

I am trying to set up oauth2-proxy to authenticate against Microsoft's German Azure cloud. It's quite a ride, but I got as far as being able to do the OAuth handshake. However, I am getting an error when trying to retrieve the user's mail and name via the Graph API.
I run the proxy within Docker like this:
docker run -it -p 8081:8081 \
--name oauth2-proxy --rm \
bitnami/oauth2-proxy:latest \
--upstream=http://localhost:8080 \
--provider=azure \
--email-domain=homefully.de \
--cookie-secret=super-secret-cookie \
--client-id=$CLIENT_ID \
--client-secret="$CLIENT_SECRET" \
--http-address="0.0.0.0:8081" \
--redirect-url="http://localhost:8081/oauth2/callback" \
--login-url="https://login.microsoftonline.de/common/oauth2/authorize" \
--redeem-url="https://login.microsoftonline.de/common/oauth2/token" \
--resource="https://graph.microsoft.de" \
--profile-url="https://graph.microsoft.de/me"
Right now it's stumbling over the profile URL (which is used to retrieve the identity of the user logging in).
The log output is this:
2019/01/28 09:24:51 api.go:21: 400 GET https://graph.microsoft.de/me {
  "error": {
    "code": "BadRequest",
    "message": "Invalid request.",
    "innerError": {
      "request-id": "1e55a321-87c2-4b85-96db-e80b2a5af1a3",
      "date": "2019-01-28T09:24:51"
    }
  }
}
I would REALLY appreciate suggestions about what I am doing wrong here. So far the documentation has not been very helpful to me. It seems that things are slightly different in the German Azure cloud, but documentation on that is pretty thin. The fact that the Azure docs only describe the US cloud, where all URLs are different (and unfortunately not in a very logical way), makes things a lot harder.
Best,
Matthias
The issue was that the profile URL https://graph.microsoft.de/me was incorrect.
While https://graph.microsoft.com/me is valid for the US cloud, the German cloud requires the API version embedded in the URL, like this:
https://graph.microsoft.de/v1.0/me
This worked for me:
docker run -it -p 8081:8081 \
--name oauth2-proxy --rm \
bitnami/oauth2-proxy:latest \
--upstream=http://localhost:8080 \
--provider=azure \
--email-domain=homefully.de \
--cookie-secret=super-secret-cookie \
--client-id=$CLIENT_ID \
--client-secret="$CLIENT_SECRET" \
--http-address="0.0.0.0:8081" \
--redirect-url="http://localhost:8081/oauth2/callback" \
--login-url="https://login.microsoftonline.de/common/oauth2/authorize" \
--redeem-url="https://login.microsoftonline.de/common/oauth2/token" \
--resource="https://graph.microsoft.de" \
--profile-url="https://graph.microsoft.de/v1.0/me"

PySpark Job fails with workflow template

To follow up on this question, I decided to try the workflow template API.
Here's what it looks like:
gcloud beta dataproc workflow-templates create lifestage-workflow --region europe-west2
gcloud beta dataproc workflow-templates set-managed-cluster lifestage-workflow \
--master-machine-type n1-standard-8 \
--worker-machine-type n1-standard-16 \
--num-workers 6 \
--cluster-name lifestage-workflow-cluster \
--initialization-actions gs://..../init.sh \
--zone europe-west2-b \
--region europe-west2
gcloud beta dataproc workflow-templates add-job pyspark gs://.../main.py \
--step-id prediction \
--region europe-west2 \
--workflow-template lifestage-workflow \
--jars gs://.../custom.jar \
--py-files gs://.../jobs.zip,gs://.../config.ini \
-- --job predict --conf config.ini
The template is correctly created.
The job works when I run it manually from one of my already existing clusters. It also runs when I use an existing cluster instead of asking the workflow to create one.
The thing is, I want the cluster to be created before running the job and deleted just after; that's why I'm using a managed cluster.
But with the managed cluster I just can't make it run. I tried to use the same configuration as my existing clusters, but it doesn't change anything.
I always get the same error.
Any idea why my job runs perfectly except when it is run from a generated cluster?
The problem came from the version of the managed cluster.
By default the image version was 1.2.31, while my existing cluster was using image 1.2.28. When I changed the config to add --image-version=1.2.28, it worked (see the sketch below).
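A minimal sketch of the managed-cluster definition with the image version pinned, reusing the flags from the question (the elided paths are left unchanged):
gcloud beta dataproc workflow-templates set-managed-cluster lifestage-workflow \
--master-machine-type n1-standard-8 \
--worker-machine-type n1-standard-16 \
--num-workers 6 \
--image-version 1.2.28 \
--cluster-name lifestage-workflow-cluster \
--initialization-actions gs://..../init.sh \
--zone europe-west2-b \
--region europe-west2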
Dataproc image 1.2.31 upgraded Spark to 2.2.1, which introduced [SPARK-22472]:
SPARK-22472: added null check for top-level primitive types. Before
this release, for datasets having top-level primitive types, and it
has null values, it might return some unexpected results. For example,
let's say we have a parquet file with schema <a: Int, b: String>, and we read it
into Scala Int. If column a has null values, when transformation is
applied some unexpected value can be returned.
This likely added just enough generated code to push the generated classes over the 64k limit.

How can I include additional jars when starting a Google DataProc cluster to use with Jupyter notebooks?

I am following the instructions for starting a Google DataProc cluster with an initialization script to start a Jupyter notebook.
https://cloud.google.com/blog/big-data/2017/02/google-cloud-platform-for-data-scientists-using-jupyter-notebooks-with-apache-spark-on-google-cloud
How can I include extra JAR files (spark-xml, for example) in the resulting SparkContext in Jupyter notebooks (particularly PySpark)?
The answer depends slightly on which jars you're looking to load. For example, you can use spark-xml with the following when creating a cluster:
$ gcloud dataproc clusters create [cluster-name] \
--zone [zone] \
--initialization-actions \
gs://dataproc-initialization-actions/jupyter/jupyter.sh \
--properties spark:spark.jars.packages=com.databricks:spark-xml_2.11:0.4.1
To specify multiple Maven coordinates, you will need to swap the gcloud dictionary separator character from ',' to something else (as we need to use that to separate the packages to install):
$ gcloud dataproc clusters create [cluster-name] \
--zone [zone] \
--initialization-actions \
gs://dataproc-initialization-actions/jupyter/jupyter.sh \
--properties=^#^spark:spark.jars.packages=artifact1,artifact2,artifact3
Details on how escape characters are changed can be found in gcloud:
$ gcloud help topic escaping
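For example, a sketch of pulling in two packages at once with the separator swapped to '#' (the second artifact coordinate here is illustrative, not from the original answer):
$ gcloud dataproc clusters create [cluster-name] \
--zone [zone] \
--initialization-actions \
gs://dataproc-initialization-actions/jupyter/jupyter.sh \
--properties=^#^spark:spark.jars.packages=com.databricks:spark-xml_2.11:0.4.1,com.databricks:spark-avro_2.11:4.0.0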
