Request had insufficient authentication scopes when running Spark job on Dataproc - apache-spark

I am trying to run a Spark job on a Google Dataproc cluster as follows:
gcloud dataproc jobs submit hadoop --cluster <cluster-name> \
--jar file:///usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar \
--class org.apache.hadoop.examples.WordCount \
-- <arg1> <arg2>
But the job throws the following error:
(gcloud.dataproc.jobs.submit.spark) PERMISSION_DENIED: Request had insufficient authentication scopes.
How do I add the auth scopes needed to run the job?

Usually you run into this error when running gcloud from inside a GCE VM that uses VM-metadata-controlled scopes; gcloud installed on a local machine will typically already be using broad scopes that cover all GCP operations.
For Dataproc access, when creating the VM from which you're running gcloud, you need to specify --scopes cloud-platform from the CLI, or, if creating the VM from the Cloud Console UI, you should select "Allow full access to all Cloud APIs".
As another commenter mentioned above, nowadays you can also update the scopes of existing GCE instances to add the CLOUD_PLATFORM scope.
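For example, a minimal sketch of both approaches from the CLI; the instance name and zone below are placeholders, not values from the question:
# Create a new VM with the broad cloud-platform scope
gcloud compute instances create my-gcloud-vm --zone us-central1-a \
--scopes cloud-platform
# Or stop an existing VM, re-attach its (default) service account with broader scopes, and start it again
gcloud compute instances stop my-gcloud-vm --zone us-central1-a
gcloud compute instances set-service-account my-gcloud-vm --zone us-central1-a \
--scopes cloud-platform
gcloud compute instances start my-gcloud-vm --zone us-central1-a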

You need to check the option that allows full API access while creating the Dataproc cluster. Only then can you submit jobs to the cluster using the gcloud dataproc jobs submit command.
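If you create the cluster from the CLI instead, the rough equivalent is passing the scope explicitly; the cluster name and region below are placeholders:
gcloud dataproc clusters create my-cluster --region us-central1 \
--scopes cloud-platform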

Related

Spark cluster on Kubernetes without spark-submit

I have a Spark application and want to deploy it on a Kubernetes cluster.
Following the documentation below, I have managed to create an empty Kubernetes cluster, generate a Docker image using the Dockerfile provided under kubernetes/dockerfiles/spark/Dockerfile, and deploy it on the cluster using spark-submit in a dev environment.
https://spark.apache.org/docs/latest/running-on-kubernetes.html
However, in a 'proper' environment we have a managed Kubernetes cluster (bespoke, unlike EKS etc.) and will have to provide pod configuration files to get the application deployed.
I believe you can supply Pod template file as an argument to the spark-submit command.
https://spark.apache.org/docs/latest/running-on-kubernetes.html#pod-template
How can I do this without spark-submit? And are there any example yaml files?
PS: we have limited access to this cluster, e.g. we can install Helm charts but not an operator or controller.
You could try to use the Kubernetes Spark operator CRD (https://github.com/GoogleCloudPlatform/spark-on-k8s-operator) and provide the pod configuration through it.
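For illustration, a minimal sketch of a SparkApplication manifest as the operator would consume it — the image, jar path, and service account below are placeholder assumptions, not values from the question:
kubectl apply -f - <<'EOF'
apiVersion: sparkoperator.k8s.io/v1beta2
kind: SparkApplication
metadata:
  name: spark-pi
  namespace: default
spec:
  type: Scala
  mode: cluster
  image: my-registry/spark:3.1.1   # image built from kubernetes/dockerfiles/spark/Dockerfile
  mainClass: org.apache.spark.examples.SparkPi
  mainApplicationFile: local:///opt/spark/examples/jars/spark-examples_2.12-3.1.1.jar
  sparkVersion: "3.1.1"
  driver:
    cores: 1
    memory: 512m
    serviceAccount: spark          # needs RBAC permissions to create/watch executor pods
  executor:
    instances: 2
    cores: 1
    memory: 512m
EOF
The operator itself is installable via its Helm chart and then creates the driver and executor pods for you, so no spark-submit is needed from outside the cluster.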

AKS nodepool in a failed state, PODS all pending

Yesterday I was using kubectl on the command line and was getting this message after trying any command. Everything was working fine the previous day and I had not touched anything in my AKS.
Unable to connect to the server: x509: certificate has expired or is not yet valid: current time 2022-01-11T12:57:51-05:00 is after 2022-01-11T13:09:11Z
After doing some googling to solve this issue, I found a guide about rotating certificates:
https://learn.microsoft.com/en-us/azure/aks/certificate-rotation
Following the rotation guide fixed my certificate issue; however, all my pods were still in a pending state, so I then followed this guide: https://learn.microsoft.com/en-us/azure/aks/update-credentials
Then one of my node pools, the one of type user, started working again, but the one of type system is still in a failed state with all pods pending.
I am not sure of the next steps I should be taking to solve this issue. Does anyone have any recommendations? I was going to delete the node pool and make a new one, but I can't do that either because it is the last system node pool.
Assuming you are using an API version older than 2020-03-01 for creating the AKS cluster:
A few limitations apply when you create and manage AKS clusters that support system node pools.
• An API version of 2020-03-01 or greater must be used to set a node pool mode. Clusters created on API versions older than 2020-03-01 contain only user node pools, but can be migrated to contain system node pools by following the update pool mode steps.
• The mode of a node pool is a required property and must be explicitly set when using ARM templates or direct API calls.
You can use the Bicep/JSON code provided in the MS documentation to create the AKS cluster, as it uses an upgraded API version.
You can also follow this MS documentation if you want to create a new AKS cluster with a system node pool or add a dedicated system node pool to an existing AKS cluster.
The following command adds a dedicated node pool of mode type System with a default count of three nodes:
az aks nodepool add \
--resource-group myResourceGroup \
--cluster-name myAKSCluster \
--name systempool \
--node-count 3 \
--node-taints CriticalAddonsOnly=true:NoSchedule \
--mode System
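Alternatively, if you would rather convert an existing user node pool than add a new one (the "update pool mode steps" mentioned above), a sketch with placeholder resource group, cluster, and pool names:
az aks nodepool update \
--resource-group myResourceGroup \
--cluster-name myAKSCluster \
--name existingnodepool \
--mode System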

Authenticating to Google Cloud Firestore from GKE with Workload Identity

I'm trying to write a simple backend that will access my Google Cloud Firestore; it lives in Google Kubernetes Engine. On my local machine I'm using the following code to authenticate to Firestore, as detailed in the Google documentation.
if (process.env.NODE_ENV !== 'production') {
  // Load GOOGLE_APPLICATION_CREDENTIALS (and other variables) from a local .env file
  const result = require('dotenv').config()
  // Additional error handling here
}
This pulls in the GOOGLE_APPLICATION_CREDENTIALS environment variable and points it at my google-application-credentials.json, which I got from creating a service account with the "Cloud Datastore User" role.
So, locally, my code runs fine. I can reach my Firestore and do everything I need to. However, the problem arises once I deploy to GKE.
I followed this Google Documentation to set up a Workload Identity for my cluster, I've created a deployment and verified that the pods all are using the correct IAM Service Account by running:
kubectl exec -it POD_NAME -c CONTAINER_NAME -n NAMESPACE sh
> gcloud auth list
I was under the impression from the documentation that authentication would be handled for my service as long as the above held true. I'm really not sure why, but my Firestore() instance is behaving as if it does not have the necessary credentials to access Firestore.
In case it helps below is my declaration and implementation of the instance:
const firestore = new Firestore()

const server = new ApolloServer({
  schema: schema,
  dataSources: () => {
    return {
      userDatasource: new UserDatasource(firestore)
    }
  }
})
UPDATE:
In a bout of desperation I decided to tear down everything and rebuild it. Following everything step by step, I appear to have either encountered a bug or (more likely) done something mildly wrong the first time. I'm now able to connect to my backend service. However, I'm now getting a different error. Upon sending any request (I'm using GraphQL, but in essence it's any REST call) I get back a 404.
Inspecting the logs yields the following:
'Getting metadata from plugin failed with error: Could not refresh access token: A Not Found error was returned while attempting to retrieve an accesstoken for the Compute Engine built-in service account. This may be because the Compute Engine instance does not have any permission scopes specified: Could not refresh access token: Unsuccessful response status code. Request failed with status code 404'
A cursory search for this issue doesn't seem to return anything related to what I'm trying to accomplish, and so I'm back to square one.
I think your initial assumption was correct! Workload Identity is not functioning properly if you still have to specify scopes. In the Workload Identity article you linked, scopes are not used.
I've been struggling with the same issue and have identified three ways to get authenticated credentials in the pod.
1. Workload Identity (basically the Workload Identity article above with some deployment details added)
This method is preferred because it allows each pod deployment in a cluster to be granted only the permissions it needs.
Create cluster (note: no scopes or service account defined)
gcloud beta container clusters create {cluster-name} \
--release-channel regular \
--identity-namespace {projectID}.svc.id.goog
Then create the k8sServiceAccount, assign roles, and annotate.
gcloud container clusters get-credentials {cluster-name}
kubectl create serviceaccount --namespace default {k8sServiceAccount}
gcloud iam service-accounts add-iam-policy-binding \
--member serviceAccount:{projectID}.svc.id.goog[default/{k8sServiceAccount}] \
--role roles/iam.workloadIdentityUser \
{googleServiceAccount}
kubectl annotate serviceaccount \
--namespace default \
{k8sServiceAccount} \
iam.gke.io/gcp-service-account={googleServiceAccount}
Then I create my deployment, and set the k8sServiceAccount.
(Setting the service account was the part that I was missing)
kubectl create deployment {deployment-name} --image={containerImageURL}
kubectl set serviceaccount deployment {deployment-name} {k8sServiceAccount}
Then expose it with a target port of 8080:
kubectl expose deployment {deployment-name} --name={service-name} --type=LoadBalancer --port 80 --target-port 8080
The googleServiceAccount needs to have the appropriate IAM roles assigned (see below).
2. Cluster Service Account
This method is not preferred, because all VMs and pods in the cluster will have permissions based on the defined service account.
Create cluster with assigned service account
gcloud beta container clusters create [cluster-name] \
--release-channel regular \
--service-account {googleServiceAccount}
The googleServiceAccount needs to have the appropriate IAM roles assigned (see below).
Then deploy and expose as above, but without setting the k8sServiceAccount
3. Scopes
This method is not preferred, because all VMs and pods in the cluster will have permissions based on the defined scopes.
Create the cluster with assigned scopes (Firestore only requires "cloud-platform"; the Realtime Database also requires "userinfo.email"):
gcloud beta container clusters create {cluster-name} \
--release-channel regular \
--scopes https://www.googleapis.com/auth/cloud-platform,https://www.googleapis.com/auth/userinfo.email
Then deploy and expose as above, but without setting the k8sServiceAccount
The first two methods require a Google Service Account with the appropriate IAM roles assigned. Here are the roles I assigned to get a few Firebase products working:
FireStore: Cloud Datastore User (Datastore)
Realtime Database: Firebase Realtime Database Admin (Firebase Products)
Storage: Storage Object Admin (Cloud Storage)
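For reference, a sketch of binding one of those roles to the Google Service Account — the project ID and service-account email below are placeholders:
gcloud projects add-iam-policy-binding my-project-id \
--member serviceAccount:my-gsa@my-project-id.iam.gserviceaccount.com \
--role roles/datastore.user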
Going to close this question.
Just in case anyone stumbles onto it here's what fixed it for me.
1.) I re-followed the steps in the Google documentation link above; this fixed the issue of my pods not launching.
2.) As for my update, I re-created my cluster and gave it the Cloud Datastore permission. I had assumed that those permissions were separate from what Workload Identity needed to function. I was wrong.
I hope this helps someone.

Apache Pulsar geo replication not working - GKE

I have two GKE clusters in two different regions with Apache Pulsar deployed in both of them (using the Streamlio app available in the marketplace). I have configured each cluster to know about the other using:
pulsar-admin clusters create region-2 --url http://<ANOTHER_CLUSTER_IP>:8080 \
--broker-url pulsar://<ANOTHER_CLUSTER_IP>:6650
and the same command in the other cluster.
Then I create the tenants & namespaces in cluster region-1.
First, the tenant:
pulsar-admin tenants create my-tenant-1 \
--admin-roles admin --allowed-clusters region-1,region-2
Then, the namespace
pulsar-admin namespaces set-clusters tenant-1/ns1 --clusters region-1,region-2
I don't see the new tenant or namespace created in region-1 replicated to region-2. Then I tried to grant permissions on the namespace, but I get an auth error.
$ pulsar-admin namespaces grant-permission my-tenant-1/ns1 \
--actions produce,consume \
--role admin
I get the below error:
Authorization is not enabled
Reason: HTTP 501 Not Implemented
Where am I going wrong in setting up geo-replication between two clusters deployed in different regions on GKE? Is there any step I missed?

release-channel attribute in terraform for GKE cluster creation

The documentation at https://cloud.google.com/kubernetes-engine/docs/concepts/release-channels offers the option to specify a release channel on cluster creation for automatic upgrades of the cluster:
gcloud alpha container clusters create [CLUSTER-NAME] \
--zone [ZONE] \
[ADDITIONAL-FLAGS] \
--release-channel rapid
This does not seem to be possible with Terraform.
It would be nice to have this feature in Terraform too, right?
I believe the release-channel feature is still in beta.
It'd be worth raising in https://github.com/terraform-providers/terraform-provider-google-beta (a variant of the Google provider, which "is now necessary to be able to configure products and features that are in beta", according to https://www.terraform.io/docs/providers/google/version_2_upgrade.html#the-google-beta-provider)
When I need a resource that hasn't made it to the Google provider yet, I usually just create a wrapper module that calls the gcloud command. Here's an example:
https://github.com/rojopolis/terraform-google-filestore
