"insufficient authentication scopes" from Google API when calling from K8S cluster - node.js

I'm trying to report Node.js errors to Google Error Reporting from one of our Kubernetes deployments running on a GCP/GKE cluster with RBAC (i.e. permissions defined in a service account associated with the cluster).
const googleCloud = require('@google-cloud/error-reporting');
const googleCloudErrorReporting = new googleCloud.ErrorReporting();
googleCloudErrorReporting.report('[test] dummy error message');
This works only in certain environments:
it works when run on my laptop, using a service account that has the "Errors Writer" role
it works when running in my cluster as a K8S job, after having added the "Errors Writer" role to that cluster's service account
it causes the following error when called from my Node.js application running in one of my K8S deployments:
ERROR:@google-cloud/error-reporting: Encountered an error while attempting to transmit an error to the Stackdriver Error Reporting API.
Error: Request had insufficient authentication scopes.
It feels like the job did pick up the permission changes of the cluster's service account, whereas my deployment did not.
I did try to re-create the deployment to make it refresh its auth token, but the error is still happening...
Any ideas?
UPDATE: I ended up following Jérémie Girault's suggestion: create a service account and bind it to my deployment. It works!
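For reference, a rough sketch of one way to do that binding; the service account name, key path, and mount path below are illustrative, not from the original setup. Create a dedicated service account with the Errors Writer role, store its key in a Kubernetes secret, and point GOOGLE_APPLICATION_CREDENTIALS at the mounted key in the deployment:
gcloud iam service-accounts create error-reporter
gcloud projects add-iam-policy-binding $PROJECT_ID \
  --member "serviceAccount:error-reporter@$PROJECT_ID.iam.gserviceaccount.com" \
  --role roles/errorreporting.writer
gcloud iam service-accounts keys create key.json \
  --iam-account error-reporter@$PROJECT_ID.iam.gserviceaccount.com
kubectl create secret generic error-reporter-key --from-file=key.json
# In the deployment spec, mount the secret and set:
#   GOOGLE_APPLICATION_CREDENTIALS=/var/secrets/google/key.json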

The error message has to do with the access scopes set on the cluster when using the default service account. You must enable access to the appropriate API.
As you mentioned, creating a separate service account, providing it the appropriate IAM permissions and linking it to your cluster or workload will bypass this error as well.
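If you want to stay on the cluster's default service account instead, note that OAuth scopes are set at node-pool creation time and typically can't be broadened in place, so a hedged sketch (pool, cluster, and zone names are placeholders) would be to add a new node pool with the cloud-platform scope and schedule the deployment there:
gcloud container node-pools create scoped-pool \
  --cluster my-cluster \
  --zone my-zone \
  --scopes=https://www.googleapis.com/auth/cloud-platform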

Related

Use DefaultAzureCredentials to authenticate Service bus in Docker Container

I'm trying to use DefaultAzureCredential to authenticate my Azure Function against Azure Service Bus. In my Azure Function azure-func-service-bus, I call Azure Service Bus like this:
from azure.identity import DefaultAzureCredential
from azure.servicebus import ServiceBusClient

servicebus_client = ServiceBusClient(
    fully_qualified_namespace=MY_SERVICE_BUS_NAMESPACE_NAME + ".servicebus.windows.net",
    credential=DefaultAzureCredential(additionally_allowed_tenants=['*'])
)
I created and pushed Docker container to ACR. When I run the container locally for testing outside of Azure, it does not know what permissions to use.
az acr login --name acr01
docker push acr01.azurecr.io/azure-func-service-bus:v1
docker pull acr01.azurecr.io/azure-func-service-bus:v1
docker run -it --rm -p 8080:80 acr01.azurecr.io/azure-func-service-bus:v1
but got the following error.
DefaultAzureCredential failed to retrieve a token from the included credentials.
Attempted credentials:
EnvironmentCredential: EnvironmentCredential authentication unavailable. Environment variables are not fully configured.
Visit https://aka.ms/azsdk/python/identity/environmentcredential/troubleshoot to troubleshoot this issue.
ManagedIdentityCredential: ManagedIdentityCredential authentication unavailable, no response from the IMDS endpoint.
SharedTokenCacheCredential: SharedTokenCacheCredential authentication unavailable. No accounts were found in the cache.
VisualStudioCodeCredential: Failed to get Azure user details from Visual Studio Code.
AzureCliCredential: Azure CLI not found on path
AzurePowerShellCredential: PowerShell is not installed
To mitigate this issue, please refer to the troubleshooting guidelines here at https://aka.ms/azsdk/python/identity/defaultazurecredential/troubleshoot.
Unexpected error occurred (ClientAuthenticationError('DefaultAzureCredential failed to retrieve a token from the included credentials. ...')). Handler shutting down.
I'm missing a key piece of the puzzle. How can I handle this?
When the Azure Function runs in Azure, it's configured to support ManagedIdentityCredential. For your case, I'd recommend configuring EnvironmentCredential for local testing.
You can find the details in the link, but the short version is:
Create a service principal (Docs) and give it the needed access
Run the container with extra environment variables (see the sketch below):
AZURE_TENANT_ID: service principal's Tenant ID
AZURE_CLIENT_ID: service principal's AppId
AZURE_CLIENT_SECRET: service principal's password
I'd recommend using a .env file to make this easier, but be sure it doesn't get checked in anywhere.
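For example, a minimal sketch of those two steps; the service principal name, role, and scope here are placeholders, not from the original setup:
# Create a service principal and note the appId / password / tenant it prints
az ad sp create-for-rbac --name sp-azure-func-service-bus
# Grant it access to the Service Bus namespace (role name is one common choice)
az role assignment create --assignee <appId> \
  --role "Azure Service Bus Data Sender" \
  --scope <service-bus-namespace-resource-id>
# Run the container with the credentials EnvironmentCredential looks for
docker run -it --rm -p 8080:80 \
  -e AZURE_TENANT_ID=<tenant> \
  -e AZURE_CLIENT_ID=<appId> \
  -e AZURE_CLIENT_SECRET=<password> \
  acr01.azurecr.io/azure-func-service-bus:v1
Or keep those three variables in a .env file and pass --env-file .env to docker run.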
FYI: if your account doesn't use MFA, you can instead use the variables AZURE_USERNAME and AZURE_PASSWORD. But then you've put your username and password in a file or your terminal history, which is concerning. Admittedly the service principal has the same problem, but you can more easily mitigate that by minimizing its access and regularly rolling the secret.
P.S. If you're using Visual Studio for making your Azure Function, you should be able to use something like EnvironmentCredentialExample to automate setting up and using the needed .env file.

GKE: Impossible to delete a cluster

I have a weird issue with GKE. The cluster was created by Terraform, and I tried to make a change that required deleting and re-creating it.
The re-creation failed because I was missing an API, so I enabled it and retried.
The thing is, I now have a cluster that exists, empty, but with a "failed to delete cluster" message on it.
I've never had this issue before, and I've already destroyed and re-created this very resource. I tried to destroy all the resources created by Terraform on this project, but I still get the error "failed to delete cluster".
I also tried to do it by hand in the UI, but I still get the same error.
I tried deleting it using
gcloud container clusters delete <cluster_name>
and got "Failed to delete cluster, name: operation-xxx-xxx..." along with a link to the failed operation.
It's a JSON with a 401 code, with the following message:
Request is missing required authentication credential. Expected OAuth
2 access token, login cookie or other valid authentication credential.
See
https://developers.google.com/identity/sign-in/web/devconsole-project.
I tried to re-authenticate, but it doesn't help; I get the same error.
I'm running out of ideas, can you help me here?
A 401 (Unauthorized) suggests that you have insufficient permissions to delete the cluster.
Either get a role that permits your user account to delete clusters.
Or ask someone who has an account with sufficient permissions to delete it for you.
Or authenticate gcloud (gcloud auth activate-service-account) with the service account that you used to create the cluster (assuming it can delete clusters too) and then use gcloud container clusters delete ..., optionally including --account=${SERVICE_ACCOUNT_EMAIL}, or just ensure the service account is ACTIVE with gcloud auth list.
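A quick sketch of that last option; the key file path and the placeholders below are illustrative:
gcloud auth activate-service-account --key-file=sa-key.json
gcloud auth list   # confirm the service account shows as ACTIVE
gcloud container clusters delete <cluster_name> \
  --zone <zone> \
  --account=${SERVICE_ACCOUNT_EMAIL}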
I did not find a proper solution, but what did work was to delete the whole project and start over.
Luckily for me it was a lab, not a production project...

How to use Cloud Trace with Nodejs on GKE with workload identity enabled?

I'm trying to set up Cloud Trace on a GKE cluster with workload identity enabled. My pod uses a service account, which has the Cloud Trace Agent role. (I also tried giving it the Owner role, to rule out permission issues, but that didn't change the error.)
I followed the Node.js quickstart, which says to add the following snippet to my code:
require('@google-cloud/trace-agent').start();
When I try to add a trace, I get the following error:
@google-cloud/trace-agent DEBUG TraceWriter#publish: Received error while publishing traces to cloudtrace.googleapis.com: Error: Could not refresh access token: A Forbidden error was returned while attempting to retrieve an access token for the Compute Engine built-in service account. This may be because the Compute Engine instance does not have the correct permission scopes specified: Could not refresh access token: Unsuccessful response status code. Request failed with status code 403
(How) can I configure the library to work in this scenario?
To answer your question in the comments above ("correct me if I'm wrong - workload identity is a cluster feature, not connected to a namespace?"):
And seeing that you fixed your problem by configuring the binding between the KSA/K8s namespace and the GCP SA, I will add a response with more context that I believe could help clarify this.
Yes, you are right: Workload Identity is a GKE cluster feature that lets you bind an identity from K8s (a Kubernetes Service Account, or KSA) to a GCP identity (a Google Service Account, or GSA), so that your workloads are authenticated as a specific GCP identity with enough permissions to reach certain APIs (depending on the permissions your GCP service account has). K8s namespaces and KSAs play a critical role here, as KSAs are namespaced resources.
Therefore, in order to authenticate your workloads (containers) correctly with a GCP service account, you need to create them in the configured K8s namespace and with the configured KSA, as mentioned in this doc.
If you create your workloads in a different K8s namespace (meaning with a different KSA), you will not get an authenticated identity for your workloads; instead, your workloads will be authenticated as the Workload Identity pool/namespace, which is PROJECT_ID.svc.id.goog. This means that if you create a container with the GCP SDK installed and run gcloud auth list, you will get PROJECT_ID.svc.id.goog as the authenticated identity, which is an IAM object but not an identity with IAM permissions. So your workloads will be lacking permissions.
So you need to create your containers in the configured namespace and with the configured service account in order to have a correct identity, with IAM permissions, in your containers.
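As a rough sketch of that configuration (the GSA, KSA, namespace, and project names below are placeholders), the standard Workload Identity setup is to let the KSA impersonate the GSA and annotate the KSA accordingly:
# Allow the KSA to act as the GSA
gcloud iam service-accounts add-iam-policy-binding \
  --role roles/iam.workloadIdentityUser \
  --member "serviceAccount:PROJECT_ID.svc.id.goog[K8S_NAMESPACE/KSA_NAME]" \
  GSA_NAME@PROJECT_ID.iam.gserviceaccount.com
# Tell GKE which GSA the KSA maps to
kubectl annotate serviceaccount KSA_NAME \
  --namespace K8S_NAMESPACE \
  iam.gke.io/gcp-service-account=GSA_NAME@PROJECT_ID.iam.gserviceaccount.com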
I'm assuming that the above (authentication without permissions and without an actual IAM identity) is what happened here; as you mentioned in your answer, you just added the needed binding between the GSA and the KSA, meaning that your container had been lacking an identity with actual IAM permissions.
Just to be clear on this, Workload Identity allows you to authenticate your workloads with a service account different from the one on your GKE nodes. If your application runs inside a Google Cloud environment that has a default service account, your application can retrieve the service account credentials to call Google Cloud APIs. Such environments include Compute Engine, Google Kubernetes Engine, App Engine, Cloud Run, and Cloud Functions (see here).
With the above I want to say that even if you do not use Workload Identity, your containers will still be authenticated, because they run on GKE, which by default uses a service account; this service account is inherited from the nodes by your containers. The default service account (the Compute Engine service account) and its scopes are enough to write to Cloud Trace from containers, and that is why you were able to see traces on a GKE cluster with Workload Identity disabled: the default service account was used on your containers and nodes.
If you test this on both environments:
GKE cluster with Workload Identity enabled: with the correct config, you will see a service account different from the default one authenticating your workloads/containers.
GKE cluster with Workload Identity disabled: you will see the same service account used by your nodes (by default the Compute Engine service account, with the Editor role and the scopes applied to your nodes) on your containers.
These tests can be performed by spinning up the same container you used in your answer, which is:
kubectl run -it \
  --image google/cloud-sdk:slim \
  --serviceaccount KSA_NAME \
  --namespace K8S_NAMESPACE \
  workload-identity-test
# --serviceaccount and --namespace only if needed
And then running `gcloud auth list` to see the identity you are authenticated with on your containers.
Hope this can help somehow!
It turned out I had misconfigured the IAM service account.
I managed to get a more meaningful error message by running a new pod in my namespace with the gcloud cli installed:
kubectl run -it \
--image gcr.io/google.com/cloudsdktool/cloud-sdk \
--serviceaccount $GKE_SERVICE_ACCOUNT test \
-- bash
after that, just running any gcloud command gave an error message containing (emphasis mine):
Unable to generate access token; IAM returned 403 Forbidden: The caller does not have permission
This error could be caused by a missing IAM policy binding on the target IAM service account.
Running
gcloud iam service-accounts get-iam-policy $SERVICE_ACCOUNT
indeed showed that the binding to the Kubernetes service account was missing.
Adding it manually fixed the issue:
gcloud iam service-accounts add-iam-policy-binding \
--role roles/iam.workloadIdentityUser \
--member "serviceAccount:$PROJECT.svc.id.goog[$NAMESPACE/$GKE_SERVICE_ACCOUNT]" \
$SERVICE_ACCOUNT
After more research, the underlying problem was that I created my service accounts using Config Connector but hadn't properly annotated the Kubernetes namespace with the Google Cloud project to deploy the resources in:
kubectl annotate namespace "$NAMESPACE" cnrm.cloud.google.com/project-id="$PROJECT"
Therefore, Config Connector could not add the IAM policy binding.

Getting UnknownHostException while accessing key-vault secrets

We have a Spring Boot application deployed in Azure App Service that accesses Azure Key Vault using a service principal. Below are the steps we used:
Created a Service Principal (SP) and a Key Vault
Stored the SP's client secret in Key Vault
Provided the necessary access (Get, List, Set permissions in the Secret Permissions section) for the SP to access Key Vault
Also added a VNet to the Data Lake storage account and provided the necessary access for the SP to access the storage as well
Whitelisted the IP of the App Service in Key Vault as well
Deployed the Spring Boot app in Azure App Service
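For reference, a hedged sketch of the access-policy step above; the vault name and appId are placeholders, and this assumes the vault uses access policies rather than Azure RBAC:
az keyvault set-policy --name <keyvault-name> \
  --spn <service-principal-appId> \
  --secret-permissions get list set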
After deployment the service is up and running perfectly, but sometimes we suddenly get the error below and can't access the hosted web services; because of this error, the container keeps restarting in Azure.
2020-10-08 09:59:53,945 ERROR o.s.b.SpringApplication - Application run failed
java.lang.IllegalStateException: Failed to configure KeyVault property source
at com.microsoft.azure.keyvault.spring.KeyVaultEnvironmentPostProcessorHelper.addKeyVaultPropertySource(KeyVaultEnvironmentPostProcessorHelper.java:77) ~[azure-spring-boot-2.2.4.jar!/:?]
Caused by: java.lang.RuntimeException: Max retries 3 times exceeded. Error Details: keyvault-name.vault.azure.net
Caused by: java.net.UnknownHostException: keyvault-name.vault.azure.net
After 2-3 hours it automatically gets resolved. We are not able to replicate the issue since it's intermittent and we don't know the pattern. Can anyone guide us on where and what the issue might be?

Can't log in service principal from VSTS, but it works in TFS and the Azure Portal states success

I'm on a project where we will move from TFS to VSTS, so we do have a working release definition.
But when I try deploying a Service Fabric cluster, I get the following error:
2018-08-28T09:02:59.8922249Z ##[error]An error occurred attempting to acquire an Azure Active Directory token. Ensure that your service endpoint is configured properly with valid credentials. Error message: Exception calling "AcquireToken" with "3" argument(s): "AADSTS50079: Due to a configuration change made by your administrator, or because you moved to a new location, you must enroll in multi-factor authentication to access '< service principle Id >'.
Trace ID: < guid1 is here >
Correlation ID: < guid2 is here >
Then I go to the Azure portal -> AAD -> Sign-ins -> look up my specific sign-in (based on the correlation ID), and there it states that the sign-in status is Success.
Considering this works for our TFS instance, I assume the service principal is correctly set up. But since the build/deploy agents are now on a VM in Azure instead of on-prem for TFS, is there anything I need to change?
Traffic should be OK; I can navigate to the HTTPS address of the cluster from the VM with the agents.
I've tried googling it, but with no success, so hopefully someone can point me in the right direction of where to look.
And in the portal, 'MFA is required' is set to No, so multi-factor authentication should not be necessary.
Try using certificate-based authentication instead of AAD authentication in the service endpoint configuration.
Reference the same issue here: https://github.com/Microsoft/vsts-tasks/issues/7714
If that still doesn't work, try creating a new endpoint and then try again.
