Dataproc is not installing custom Conda package from custom Conda channel - apache-spark

I am attempting to spin up a single node Dataproc "cluster" in GCP that installs additional packages from both conda-forge and a custom Conda channel. The gcloud command I run is:
gcloud beta dataproc clusters create MY_CLUSTER_NAME \
--enable-component-gateway \
--bucket MY_GCS_BUCKET \
--region us-central1 \
--subnet default \
--zone us-central1-a \
--single-node \
--master-machine-type n1-standard-4 \
--master-boot-disk-size 500 \
--image-version 1.5-ubuntu18 \
--properties spark:spark.jars.packages=org.apache.spark:spark-avro_2.12:2.4.4,spark-env:spark.jars.packages=org.apache.spark:spark-avro_2.12:2.4.4 \
--optional-components ANACONDA,JUPYTER \
--max-idle 7200s \
--scopes 'https://www.googleapis.com/auth/cloud-platform' \
--project MY_PROJECT_ID \
--metadata='CONDA_PACKAGES=pandas matplotlib seaborn scikit-learn MY_CUSTOM_PACKAGE' \
--metadata='CONDA_CHANNELS=conda-forge https://MY_CUSTOM_CONDA_CHANNEL'
I have verified that I can conda install -c https://MY_CUSTOM_CONDA_CHANNEL MY_CUSTOM_PACKAGE locally, and that the other packages are being installed. When searching through the cluster's logs, I find no entries about the installation of the additional Conda packages.
Questions:
Where can I find logs that will help me debug this problem?
Is there something wrong with the above command?

It seems that you didn't add the conda-install.sh init action when creating the cluster; see more details in this doc, e.g.:
gcloud dataproc clusters create my-cluster \
--image-version=1.4 \
--region=${REGION} \
--metadata='CONDA_PACKAGES=pandas matplotlib seaborn scikit-learn MY_CUSTOM_PACKAGE' \
--metadata='CONDA_CHANNELS=conda-forge https://MY_CUSTOM_CONDA_CHANNEL' \
--initialization-actions=gs://goog-dataproc-initialization-actions-${REGION}/python/conda-install.sh
You should be able to find the init action log at /var/log/dataproc-initialization-script-0.log; see more details in this doc.
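If you want to tail that log without opening the console, one option is to SSH into the master node. A minimal sketch, assuming Dataproc's default master name MY_CLUSTER_NAME-m and the zone from your create command:
# assumes the default <cluster-name>-m master name and the us-central1-a zone used above
gcloud compute ssh MY_CLUSTER_NAME-m \
--zone us-central1-a \
--command 'sudo tail -n 100 /var/log/dataproc-initialization-script-0.log'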

Related

Private AKS is not able to find the path of my local files to do a deployment through Helm

I'm currently trying to deploy my Helm charts through my private AKS cluster. However, I'm unable to do anything since it can't find the path of my local directory.
This is the command that I'm running:
az aks command invoke \
--resource-group aharo-aks-appgateway01 \
--name aharo-aks02 \
--command "helm install haro ./haro_files_helm_chart"
This is the error message that I'm getting:
command started at 2023-01-06 22:49:46+00:00, finished at 2023-01-06 22:49:46+00:00 with exitcode=1
Error: INSTALLATION FAILED: path "./haro_files_helm_chart" not found
To prove that this type of command can work, I tried one from the Microsoft documentation:
az aks command invoke \
--resource-group aharo-aks-appgateway01 \
--name aharo-aks02 \
--command "helm repo add bitnami https://charts.bitnami.com/bitnami && helm repo update && helm install my-release bitnami/nginx"
What else can I do to find the path of my directory? Do you know if I could be missing any configuration on my cluster?
When you pass the helm install command to the AKS VMs, the VMs (nodes) look for ./haro_files_helm_chart in their own filesystem, not on the machine running the command, hence the "path not found" error.
In the example you shared, the node first downloads the chart from the Bitnami repo and then installs it, so no local path is involved.
To resolve the issue, you should attach the directory of the Helm chart to az aks command invoke as documented here. Below is the part you need:
You can also attach all files in the current directory. For example:
az aks command invoke \
--resource-group myResourceGroup \
--name myAKSCluster \
--command "kubectl apply -f deployment.yaml configmap.yaml -n default" \
--file .
For example, I created a chart called "test-chart" using helm create test-chart and then installed it. The chart is created in the current directory I'm in:
$ ls
test-chart
Then run the same command shared above, changing only the --command argument (without changing the directory):
az aks command invoke \
--resource-group myResourceGroup \
--name myAKSCluster \
--command "helm install test-chart-override-name test-chart" \
--file .
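You can then verify that the release landed using the same invoke mechanism; a quick sketch:
# list Helm releases across namespaces from inside the cluster
az aks command invoke \
--resource-group myResourceGroup \
--name myAKSCluster \
--command "helm list -A"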
The answer to this is the following:
az aks command invoke \
--resource-group aharo-aks-appgateway01 \
--name aharo-aks02 \
--command "helm install haro . "
Another workaround is to upload your Helm charts to your container registry and then download and install them directly, like the example from Microsoft (a rough ACR-specific sketch follows the example below):
az aks command invoke \
--resource-group aharo-aks-appgateway01 \
--name aharo-aks02 \
--command "helm repo add bitnami https://charts.bitnami.com/bitnami && helm repo update && helm install my-release bitnami/nginx"

AKS Helm Install - Not Authorized after connecting with ACR

I have an AKS cluster and an ACR, and I attached the ACR successfully to my AKS using
az aks update -n <AKSNAME> -g <RESOURCE> --attach-acr <ACRNAME>
Yet, when I run the command below from this how-to guide, I get an Error: failed pre-install: timed out waiting for the condition. Upon further investigation with kubectl get events, I find that the image pulls from ACR are failing due to authorization: failed to authorize: failed to fetch anonymous token: unexpected status: 401 Unauthorized.
helm install nginx-ingress ingress-nginx/ingress-nginx \
--namespace ingress-basic \
--set controller.replicaCount=2 \
--set controller.nodeSelector."kubernetes\.io/os"=linux \
--set controller.image.registry=$ACR_URL \
--set controller.image.image=$CONTROLLER_IMAGE \
--set controller.image.tag=$CONTROLLER_TAG \
--set controller.image.digest="" \
--set controller.admissionWebhooks.patch.nodeSelector."kubernetes\.io/os"=linux \
--set controller.admissionWebhooks.patch.image.registry=$ACR_URL \
--set controller.admissionWebhooks.patch.image.image=$PATCH_IMAGE \
--set controller.admissionWebhooks.patch.image.tag=$PATCH_TAG \
--set defaultBackend.nodeSelector."kubernetes\.io/os"=linux \
--set defaultBackend.image.registry=$ACR_URL \
--set defaultBackend.image.image=$DEFAULTBACKEND_IMAGE \
--set defaultBackend.image.tag=$DEFAULTBACKEND_TAG \
--set controller.service.loadBalancerIP=$STATIC_IP \
--set controller.service.annotations."service\.beta\.kubernetes\.io/azure-dns-label-name"=$DNS_LABEL
I find this very strange, because I have the images in my ACR and I have successfully attached the ACR to AKS.
I run everything from the Azure CLI, with Helm version 3.4.0. I found some related issues, which all use attach-acr.
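A quick way to confirm whether the attached ACR actually grants pull access from the cluster is the check-acr helper; a minimal sketch, assuming the same placeholder names and a reasonably recent Azure CLI:
# validates that the AKS cluster's identity can pull from the registry
az aks check-acr \
--resource-group <RESOURCE> \
--name <AKSNAME> \
--acr <ACRNAME>.azurecr.io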

Sequential execution of multiple spark jobs in dataproc / gcp

I would like to launch multiple Spark jobs sequentially in GCP, like
gcloud dataproc jobs submit spark file1.py
gcloud dataproc jobs submit spark file2.py
...
so that each job starts only when the previous one has completed.
Is there any way to do this?
This can be done using Dataproc Workflow Templates.
The workflow will create and delete a managed cluster as part of its execution.
These are the steps you can follow to create the workflow:
Create your workflow template
export REGION=us-central1
gcloud dataproc workflow-templates create workflow-id \
--region $REGION
Set the managed Dataproc cluster that will be used for the jobs
gcloud dataproc workflow-templates set-managed-cluster workflow-id \
--region $REGION \
--master-machine-type machine-type \
--worker-machine-type machine-type \
--num-workers number \
--cluster-name cluster-name
Add the jobs as steps to your workflow
gcloud dataproc workflow-templates add-job pyspark gs://bucket-name/file1.py \
--region $REGION \
--step-id job1 \
--workflow-template workflow-id
The second job needs the parameter --start-after to make sure it runs after the first job.
gcloud dataproc workflow-templates add-job pyspark gs://bucket-name/file2.py \
--region $REGION \
--step-id job2 \
--start-after job1 \
--workflow-template workflow-id
Run the workflow
gcloud dataproc workflow-templates instantiate workflow-id \
--region $REGION
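The instantiate command should wait and stream progress until the workflow finishes (it also supports --async); a minimal sketch for checking on the run afterwards, assuming the same region:
# list recent Dataproc operations in the region, including workflow runs
gcloud dataproc operations list --region $REGION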

Velero installation on a Kubernetes cluster using the Azure provider

While installing Velero on Kubernetes using the Helm chart as below,
helm install --namespace velero \
--set configuration.provider="Microsoft Azure" \
--set-file credentials.secretContents.cloud=<FULL PATH TO FILE> \
--set configuration.backupStorageLocation.name=azure \
--set configuration.backupStorageLocation.bucket=<BUCKET NAME> \
--set configuration.volumeSnapshotLocation.name=<PROVIDER NAME> \
--set configuration.volumeSnapshotLocation.config.region=<REGION> \
--set image.repository=velero/velero \
--set image.tag=v1.2.0 \
--set image.pullPolicy=IfNotPresent \
--set initContainers[0].name=velero-plugin-for-microsoft-azure:v1.0.0 \
--set initContainers[0].image=velero/velero-plugin-for-microsoft-azure:v1.0.0 \
--set initContainers[0].volumeMounts[0].mountPath=/target \
--set initContainers[0].volumeMounts[0].name=plugins \
stable/velero
I have configured the environment variables below in the credentials-velero file, whose path is provided in the above command.
credentials-velero file:
AZURE_SUBSCRIPTION_ID=${AZURE_SUBSCRIPTION_ID}
AZURE_TENANT_ID=${AZURE_TENANT_ID}
AZURE_CLIENT_ID=${AZURE_CLIENT_ID}
AZURE_CLIENT_SECRET=${AZURE_CLIENT_SECRET}
AZURE_RESOURCE_GROUP=${AZURE_RESOURCE_GROUP}
AZURE_CLOUD_NAME=AzurePublicCloud
I am getting the below error:
an error occurred: some backup storage locations are invalid: error getting backup store for location "default": rpc error: code = Unknown desc = unable to get all required environment variables: the following keys do not have values: AZURE_TENANT_ID, AZURE_CLIENT_ID, AZURE_CLIENT_SECRET, AZURE_SUBSCRIPTION_ID
Could you please help with the resolution of the above error?
Your Velero credentials file should contain actual values for those variables, not unexpanded placeholders. With an unquoted heredoc like the one below, the ${...} references are expanded from your shell environment, so make sure those variables are set before you run it:
cat << EOF > ./credentials-velero
AZURE_SUBSCRIPTION_ID=${AZURE_SUBSCRIPTION_ID}
AZURE_TENANT_ID=${AZURE_TENANT_ID}
AZURE_CLIENT_ID=${AZURE_CLIENT_ID}
AZURE_CLIENT_SECRET=${AZURE_CLIENT_SECRET}
AZURE_RESOURCE_GROUP=${AZURE_RESOURCE_GROUP}
AZURE_CLOUD_NAME=AzurePublicCloud
EOF
https://github.com/vmware-tanzu/velero-plugin-for-microsoft-azure#setup
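One way to populate those shell variables before writing the file, loosely following the setup doc linked above (this assumes you create a service principal named velero with the Contributor role; the names are illustrative):
# subscription and tenant of the current az login
AZURE_SUBSCRIPTION_ID=$(az account list --query '[?isDefault].id' -o tsv)
AZURE_TENANT_ID=$(az account list --query '[?isDefault].tenantId' -o tsv)
# create a service principal for Velero and capture its credentials
AZURE_CLIENT_SECRET=$(az ad sp create-for-rbac --name "velero" --role "Contributor" --scopes /subscriptions/$AZURE_SUBSCRIPTION_ID --query 'password' -o tsv)
AZURE_CLIENT_ID=$(az ad sp list --display-name "velero" --query '[0].appId' -o tsv)
AZURE_RESOURCE_GROUP=<MC_ resource group of your AKS cluster>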
Use actual literal values, like this:
AZURE_SUBSCRIPTION_ID=XXXX-XXXXX-XXXXXXX-XXXXXXX
AZURE_TENANT_ID=XXXX-XXXXX-XXXXXXX-XXXXXXX
AZURE_CLIENT_ID=XXXX-XXXXX-XXXXXXX-XXXXXXX
AZURE_CLIENT_SECRET=XXXXXXXXXXXXXXXXXX
AZURE_RESOURCE_GROUP=MC_RESOURCE_GROUP_NAME_OF_AKS # this should be the MC resource group
AZURE_CLOUD_NAME=AzurePublicCloud
Also try using the master image tag:
--set initContainers[0].image=velero/velero-plugin-for-microsoft-azure:master
Use this format:
helm install velero vmware-tanzu/velero --namespace velero \
--set-file credentials.secretContents.cloud=./credentials-velero \
--set configuration.provider=azure \
--set configuration.backupStorageLocation.name=azure \
--set configuration.backupStorageLocation.bucket='velero' \
--set configuration.backupStorageLocation.config.resourceGroup=RESOURCE_GROUP_OF_STORAGEACCOUNT \
--set configuration.backupStorageLocation.config.storageAccount=STORAGE_ACCOUNT_NAME \
--set snapshotsEnabled=true \
--set deployRestic=true \
--set image.repository=velero/velero \
--set image.pullPolicy=Always \
--set initContainers[0].name=velero-plugin-for-microsoft-azure \
--set initContainers[0].image=velero/velero-plugin-for-microsoft-azure:master \
--set initContainers[0].volumeMounts[0].mountPath=/target \
--set initContainers[0].volumeMounts[0].name=plugins
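Once the chart is installed, a quick way to confirm the plugin picked up the credentials is to check the pod and its log; a short sketch using plain kubectl:
# the velero pod should be Running and the log free of the missing-variable error above
kubectl get pods -n velero
kubectl logs deployment/velero -n velero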

Container instances metrics are missing

Metrics for our Azure Container Instances have stopped showing up in the portal and when querying Azure Monitor using the CLI.
I've tried redeploying the instances, restarting the containers, and disabling features such as Log Analytics.
These are the options we pass to az to deploy our containers, with the values redacted:
az container create \
--resource-group "" \
--name "" \
--image \
--registry-username "" \
--registry-password "" \
--ports \
--ip-address public \
--dns-name-label "" \
--azure-file-volume-account-name "" \
--azure-file-volume-account-key "" \
--azure-file-volume-share-name "" \
--azure-file-volume-mount-path "" \
--cpu 1 \
--memory 1 \
--log-analytics-workspace "" \
--log-analytics-workspace-key ""
According to the documentation, metrics should just be there, so I'm curious as to why they have seemingly stopped. Is there some newly introduced option that needs to be enabled?
For your issue, there seems to be no problem with creating the Azure container instance through the CLI command when the correct parameters are supplied. So I guess the possible reason is that the container instance is not in the running state, so the metrics have also stopped.
You can take a look at the steps in Container instance logging with Azure Monitor logs to see if any steps are missing.
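To rule out the stopped-container possibility, you can check the container group's state and query a metric directly from the CLI; a rough sketch, assuming the documented CPUUsage metric name and illustrative resource names:
# confirm the container group is actually running
az container show --resource-group MY_RESOURCE_GROUP --name MY_CONTAINER_GROUP --query instanceView.state -o tsv
# query a metric against the container group's resource ID
CONTAINER_ID=$(az container show --resource-group MY_RESOURCE_GROUP --name MY_CONTAINER_GROUP --query id -o tsv)
az monitor metrics list --resource $CONTAINER_ID --metric CPUUsage --output table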
