Apache Pulsar geo replication not working - GKE - apache-pulsar

I have two GKE clusters in two different regions, each with Apache Pulsar deployed (using the Streamlio app available in the marketplace). I made the clusters aware of each other using
pulsar-admin clusters create region-2 --url http://<ANOTHER_CLUSTER_IP>:8080 \
--broker-url pulsar://<ANOTHER_CLUSTER_IP>:6650
and ran the same command in the other cluster.
Then I created the tenant and namespace in cluster region-1.
First the tenant:
pulsar-admin tenants create my-tenant-1 \
--admin-roles admin --allowed-clusters region-1,region-2
Then the namespace:
pulsar-admin namespaces set-clusters my-tenant-1/ns1 --clusters region-1,region-2
I don't see the new tenant or namespace created in region-1 replicated to region-2. I then tried to grant permissions on the namespace, but I get an auth error.
$ pulsar-admin namespaces grant-permission my-tenant-1/ns1 \
--actions produce,consume \
--role admin
I get the following error:
Authorization is not enabled
Reason: HTTP 501 Not Implemented
Where am I going wrong in setting up geo-replication between two clusters deployed in different regions on GKE? Is there a step I missed?
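For reference, a minimal end-to-end sequence for enabling geo-replication typically looks like the sketch below (cluster names, IPs, tenant, and namespace are placeholders). Note that the namespace usually has to be created explicitly before set-clusters is called on it:

```shell
# On region-1: register the remote cluster
# (run the mirror-image command on region-2)
pulsar-admin clusters create region-2 \
  --url http://<REGION_2_IP>:8080 \
  --broker-url pulsar://<REGION_2_IP>:6650

# Create a tenant allowed to use both clusters
pulsar-admin tenants create my-tenant-1 \
  --admin-roles admin \
  --allowed-clusters region-1,region-2

# Create the namespace first, then assign the replication clusters
pulsar-admin namespaces create my-tenant-1/ns1
pulsar-admin namespaces set-clusters my-tenant-1/ns1 \
  --clusters region-1,region-2

# Verify which clusters the namespace is replicated to
pulsar-admin namespaces get-clusters my-tenant-1/ns1
```

This assumes each broker can reach the other cluster's service URLs (ports 8080 and 6650) across regions.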

Related

AKS nodepool in a failed state, PODS all pending

Yesterday I was using kubectl in my command line and was getting this message after trying any command. Everything was working fine the previous day and I had not touched anything in my AKS.
Unable to connect to the server: x509: certificate has expired or is not yet valid: current time 2022-01-11T12:57:51-05:00 is after 2022-01-11T13:09:11Z
After some googling to solve this issue, I found a guide about rotating certificates:
https://learn.microsoft.com/en-us/azure/aks/certificate-rotation
Following the rotation guide fixed my certificate issue; however, all my pods were still pending, so I then followed this guide: https://learn.microsoft.com/en-us/azure/aks/update-credentials
Then one of my node pools (of type User) started working again, but the one of type System is still in a failed state with all pods pending.
I am not sure of the next steps I should take to solve this issue. Does anyone have any recommendations? I was going to delete the node pool and make a new one, but I can't do that either because it is the last system node pool.
Assuming you are using an API version older than 2020-03-01 to create the AKS cluster:
A few limitations apply when you create and manage AKS clusters that support system node pools.
• An API version of 2020-03-01 or greater must be used to set a node pool mode. Clusters created on API versions older than 2020-03-01 contain only user node pools, but can be migrated to contain system node pools by following the update pool mode steps.
• The mode of a node pool is a required property and must be explicitly set when using ARM templates or direct API calls.
You can use the Bicep/JSON code provided in the MS document to create the AKS cluster, as it uses the upgraded API version.
You can also follow this MS document if you want to create a new AKS cluster with a system node pool and add a dedicated system node pool to the existing AKS cluster.
The following command adds a dedicated node pool of mode type system with a default count of three nodes.
az aks nodepool add \
--resource-group myResourceGroup \
--cluster-name myAKSCluster \
--name systempool \
--node-count 3 \
--node-taints CriticalAddonsOnly=true:NoSchedule \
--mode System
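To verify the result, a sketch like the following (reusing the resource group and cluster name from the command above) lists each pool with its mode and provisioning state:

```shell
# List node pools and confirm the new pool shows mode "System"
# and provisioningState "Succeeded"
az aks nodepool list \
  --resource-group myResourceGroup \
  --cluster-name myAKSCluster \
  --query "[].{name:name, mode:mode, state:provisioningState}" \
  --output table
```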

Pull images from an Azure container registry to a Kubernetes cluster

I have followed this tutorial microsoft_website to pull images from an Azure container registry. My YAML successfully creates a pod job, which can pull the image, BUT only when it runs on the agentpool node in my cluster.
For example, adding nodeName: aks-agentpool-33515997-vmss000000 to the YAML works fine, but with a different node name, e.g. nodeName: aks-cpu1-33515997-vmss000000, the pod fails. The error message I get with describe pods is Failed to pull image, followed by kubelet Error: ErrImagePull.
What am I missing?
Create secret:
kubectl create secret docker-registry <secret-name> \
--docker-server=<container-registry-name>.azurecr.io \
--docker-username=<service-principal-ID> \
--docker-password=<service-principal-password>
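Creating the secret is only half the setup; the workload has to reference it. A minimal sketch, assuming the secret was named acr-secret and substituting your own registry/image names:

```shell
# Option A: attach the pull secret to the namespace's default service account,
# so every pod using that account can pull from the registry
kubectl patch serviceaccount default \
  -p '{"imagePullSecrets": [{"name": "acr-secret"}]}'

# Option B: reference the secret directly in the pod spec
kubectl apply -f - <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: acr-test
spec:
  containers:
  - name: app
    image: <container-registry-name>.azurecr.io/<image>:<tag>
  imagePullSecrets:
  - name: acr-secret
EOF
```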
As @user1571823 said, the solution to the problem is deleting the old image from the ACR and creating/pushing a new one.
The problem was related to some sort of corruption in the image saved in the Azure Container Registry (ACR). The reason one agent pool could pull the image was that the image already existed on the VM.
Also, as @andov said, it is a good option to open an incident case with Azure support for AKS from the subscription where AKS is deployed. The support team has full access to the AKS service backend and can tell exactly what was causing your problem.
Four things to check:
Is it a subscription issue? Are the nodes in different subscriptions?
Is it a rights issue? Does the service principal of the node have rights to pull the image?
Is it a network issue? Are the nodes on different subnets?
Is there something about the image size or configuration that means it cannot run on the other cluster?
Edit
New-AzAksNodePool has a parameter -DefaultProfile.
It can be an AzContext, AzureRmContext, or AzureCredential.
If this differs between your nodes, it would explain the error.

How to use Cloud Trace with Nodejs on GKE with workload identity enabled?

I'm trying to set up Cloud Trace on a GKE cluster with workload identity enabled. My pod uses a service account, which has the Cloud Trace Agent role. (I also tried giving it the Owner role, to rule out permission issues, but that didn't change the error.)
I followed the Node.js quickstart, which says to add the following snippet to my code:
require('@google-cloud/trace-agent').start();
When I try to add a trace, I get the following error:
@google-cloud/trace-agent DEBUG TraceWriter#publish: Received error while publishing traces to cloudtrace.googleapis.com: Error: Could not refresh access token: A Forbidden error was returned while attempting to retrieve an access token for the Compute Engine built-in service account. This may be because the Compute Engine instance does not have the correct permission scopes specified: Could not refresh access token: Unsuccessful response status code. Request failed with status code 403
(How) can I configure the library to work in this scenario?
To answer your question in the comments above ("correct me if I'm wrong - workload identity is a cluster feature, not connected to a namespace?"), and seeing that you fixed your problem by configuring the binding between the KSA/K8s namespace and the GCP SA, I will add a response with more context that I believe helps clarify this.
Yes, you are right: Workload Identity is a GKE cluster feature that lets you bind an identity from Kubernetes (a Kubernetes Service Account, or KSA) to a GCP identity (a Google Service Account, or GSA), so that your workloads are authenticated as a specific GCP identity with enough permissions to reach certain APIs (depending on the permissions your GCP service account has). Kubernetes namespaces and KSAs play a critical role here, as KSAs are namespaced resources.
Therefore, to authenticate your workloads (containers) correctly with a GCP service account, you need to create them in the configured K8s namespace and with the configured KSA, as mentioned in this doc.
If you create your workloads in a different K8s namespace (meaning with a different KSA), you will not get an authenticated identity for your workloads; instead, your workloads will be authenticated as the Workload Identity Pool/Workload Identity Namespace, which is PROJECT_ID.svc.id.goog. That means that when you create a container with the GCP SDK installed and run gcloud auth list, you will see PROJECT_ID.svc.id.goog as the authenticated identity, which is an IAM object but not an identity with permissions in IAM. So your workloads will lack permissions.
You therefore need to create your containers in the configured namespace and with the configured service account to have a correct identity, with IAM permissions, in your containers.
I'm assuming that the above (authentication without an actual IAM identity, and hence without permissions) is what happened here; as you mentioned in your answer, you just added the needed binding between the GSA and the KSA, meaning that your container was lacking an identity with actual IAM permissions.
Just to be clear on this: Workload Identity allows you to authenticate your workloads with a service account different from the one on your GKE nodes. If your application runs inside a Google Cloud environment that has a default service account, it can retrieve the service account credentials to call Google Cloud APIs; such environments include Compute Engine, Google Kubernetes Engine, App Engine, Cloud Run, and Cloud Functions (see here).
The point of the above is that even if you do not use Workload Identity, your containers are still authenticated because they run on GKE, which by default uses a service account inherited from the nodes by your containers. The default (Compute Engine) service account and its scopes are enough to write to Cloud Trace from containers; that is why you were able to see traces on a GKE cluster with Workload Identity disabled: the default service account was used on both your containers and your nodes.
If you test this in both environments:
GKE cluster with Workload Identity: with the correct config, you will see a service account different from the default authenticating your workloads/containers.
GKE cluster with Workload Identity disabled: you will see on your containers the same service account used by your nodes (by default the Compute Engine service account, with the Editor role and the scopes applied to your nodes).
These tests can be performed by spinning up the same container you used in your answer, which is:
kubectl run -it \
--image google/cloud-sdk:slim \
--serviceaccount KSA_NAME \ ##If needed
--namespace K8S_NAMESPACE \ ##If needed
workload-identity-test
And running `gcloud auth list` to see the identity you are authenticated with in your containers.
Hope this can help somehow!
It turned out I had misconfigured the IAM service account.
I managed to get a more meaningful error message by running a new pod in my namespace with the gcloud CLI installed:
kubectl run -it \
--image gcr.io/google.com/cloudsdktool/cloud-sdk \
--serviceaccount $GKE_SERVICE_ACCOUNT test \
-- bash
after that, just running any gcloud command gave an error message containing (emphasis mine):
Unable to generate access token; IAM returned 403 Forbidden: The caller does not have permission
This error could be caused by a missing IAM policy binding on the target IAM service account.
Running
gcloud iam service-accounts get-iam-policy $SERVICE_ACCOUNT
indeed showed that the binding to the Kubernetes service account was missing.
Adding it manually fixed the issue:
gcloud iam service-accounts add-iam-policy-binding \
--role roles/iam.workloadIdentityUser \
--member "serviceAccount:$PROJECT.svc.id.goog[$NAMESPACE/$GKE_SERVICE_ACCOUNT]" \
$SERVICE_ACCOUNT
After more research, the underlying problem was that I created my service accounts using Config Connector but hadn't properly annotated the Kubernetes namespace with the Google Cloud project to deploy the resources in:
kubectl annotate namespace "$NAMESPACE" cnrm.cloud.google.com/project-id="$PROJECT"
Therefore, Config Connector could not add the IAM policy binding.

Authenticating to Google Cloud Firestore from GKE with Workload Identity

I'm trying to write a simple backend that will access my Google Cloud Firestore; it lives in Google Kubernetes Engine. On my local machine I'm using the following code to authenticate to Firestore, as detailed in the Google documentation.
if (process.env.NODE_ENV !== 'production') {
const result = require('dotenv').config()
//Additional error handling here
}
This populates the GOOGLE_APPLICATION_CREDENTIALS environment variable with the path to my google-application-credentials.json, which I got from creating a service account with the "Cloud Datastore User" role.
So, locally, my code runs fine. I can reach my Firestore and do everything I need to. However, the problem arises once I deploy to GKE.
I followed this Google documentation to set up Workload Identity for my cluster. I've created a deployment and verified that the pods are all using the correct IAM service account by running:
kubectl exec -it POD_NAME -c CONTAINER_NAME -n NAMESPACE sh
> gcloud auth list
I was under the impression from the documentation that authentication would be handled for my service as long as the above held true. I'm really not sure why but my Firestore() instance is behaving as if it does not have the necessary credentials to access the Firestore.
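As a diagnostic sketch independent of the client library, you can ask the GKE metadata server from inside the pod which identity is actually active (pod/container/namespace names as in the command above; this assumes curl is available in the container image):

```shell
# Query the metadata server from inside the pod; with Workload Identity
# configured correctly this should print the bound Google service
# account's email, not PROJECT_ID.svc.id.goog
kubectl exec -it POD_NAME -c CONTAINER_NAME -n NAMESPACE -- \
  curl -s -H "Metadata-Flavor: Google" \
  "http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/email"
```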
In case it helps below is my declaration and implementation of the instance:
const firestore = new Firestore()
const server = new ApolloServer({
schema: schema,
dataSources: () => {
return {
userDatasource: new UserDatasource(firestore)
}
}
})
UPDATE:
In a bout of desperation I decided to tear everything down and re-build it. Following everything step by step, I appear to have either encountered a bug or (more likely) done something mildly wrong the first time. I'm now able to connect to my backend service. However, I'm now getting a different error: upon sending any request (I'm using GraphQL, but in essence it's any REST call), I get back a 404.
Inspecting the logs yields the following:
'Getting metadata from plugin failed with error: Could not refresh access token: A Not Found error was returned while attempting to retrieve an access token for the Compute Engine built-in service account. This may be because the Compute Engine instance does not have any permission scopes specified: Could not refresh access token: Unsuccessful response status code. Request failed with status code 404'
A cursory search for this issue doesn't seem to return anything related to what I'm trying to accomplish, and so I'm back to square one.
I think your initial assumption was correct! Workload Identity is not functioning properly if you still have to specify scopes. In the Workload article you have linked, scopes are not used.
I've been struggling with the same issue and have identified three ways to get authenticated credentials in the pod.
1. Workload Identity (basically the Workload Identity article above with some deployment details added)
This method is preferred because it allows each pod deployment in a cluster to be granted only the permissions it needs.
Create cluster (note: no scopes or service account defined)
gcloud beta container clusters create {cluster-name} \
--release-channel regular \
--identity-namespace {projectID}.svc.id.goog
Then create the k8sServiceAccount, assign roles, and annotate.
gcloud container clusters get-credentials {cluster-name}
kubectl create serviceaccount --namespace default {k8sServiceAccount}
gcloud iam service-accounts add-iam-policy-binding \
--member serviceAccount:{projectID}.svc.id.goog[default/{k8sServiceAccount}] \
--role roles/iam.workloadIdentityUser \
{googleServiceAccount}
kubectl annotate serviceaccount \
--namespace default \
{k8sServiceAccount} \
iam.gke.io/gcp-service-account={googleServiceAccount}
Then I create my deployment, and set the k8sServiceAccount.
(Setting the service account was the part that I was missing)
kubectl create deployment {deployment-name} --image={containerImageURL}
kubectl set serviceaccount deployment {deployment-name} {k8sServiceAccount}
Then expose with a target of 8080
kubectl expose deployment {deployment-name} --name={service-name} --type=LoadBalancer --port 80 --target-port 8080
The googleServiceAccount needs to have the appropriate IAM roles assigned (see below).
2. Cluster Service Account
This method is not preferred, because all VMs and pods in the cluster will have permissions based on the defined service account.
Create cluster with assigned service account
gcloud beta container clusters create [cluster-name] \
--release-channel regular \
--service-account {googleServiceAccount}
The googleServiceAccount needs to have the appropriate IAM roles assigned (see below).
Then deploy and expose as above, but without setting the k8sServiceAccount
3. Scopes
This method is not preferred, because all VMs and pods in the cluster will have permissions based on the scopes defined.
Create cluster with assigned scopes (firestore only requires "cloud-platform", realtime database also requires "userinfo.email")
gcloud beta container clusters create {cluster-name} \
--release-channel regular \
--scopes https://www.googleapis.com/auth/cloud-platform,https://www.googleapis.com/auth/userinfo.email
Then deploy and expose as above, but without setting the k8sServiceAccount
The first two methods require a Google Service Account with the appropriate IAM roles assigned. Here are the roles I assigned to get a few Firebase products working:
FireStore: Cloud Datastore User (Datastore)
Realtime Database: Firebase Realtime Database Admin (Firebase Products)
Storage: Storage Object Admin (Cloud Storage)
Going to close this question.
Just in case anyone stumbles onto it here's what fixed it for me.
1.) I re-followed the steps in the Google Documentation link above, this fixed the issue of my pods not launching.
2.) As for my update, I re-created my cluster and gave it the Cloud Datastore permission. I had assumed that those permissions were separate from what Workload Identity needed to function. I was wrong.
I hope this helps someone.

Request insufficient authentication scopes when running Spark-Job on dataproc

I am trying to run the spark job on the google dataproc cluster as
gcloud dataproc jobs submit hadoop --cluster <cluster-name> \
--jar file:///usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar \
--class org.apache.hadoop.examples.WordCount \
-- arg1 arg2
But the job throws this error:
(gcloud.dataproc.jobs.submit.spark) PERMISSION_DENIED: Request had insufficient authentication scopes.
How do I add the auth scopes to run the JOB?
Usually if you're running into this error, it's because you are running gcloud from inside a GCE VM that uses VM-metadata-controlled scopes; gcloud installed on a local machine will typically already be using broad scopes that include all GCP operations.
For Dataproc access, when creating the VM from which you're running gcloud, you need to specify --scopes cloud-platform on the CLI, or, if creating the VM from the Cloud Console UI, you should select "Allow full access to all Cloud APIs".
As another commenter mentioned above, nowadays you can also update scopes on existing GCE instances to add the CLOUD_PLATFORM scope.
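As a sketch of that in-place scope update (the instance must be stopped first; the instance and zone names are placeholders):

```shell
# Stop the VM, switch it to the broad cloud-platform scope, and restart it
gcloud compute instances stop my-vm --zone us-central1-a
gcloud compute instances set-service-account my-vm \
  --zone us-central1-a \
  --scopes cloud-platform
gcloud compute instances start my-vm --zone us-central1-a
```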
You need to check the option allowing API access while creating the Dataproc cluster. Only then can you submit jobs to the cluster using the gcloud dataproc jobs submit command.
