How to access Files in Google Cloud Storage through GKE pods - node.js

I'm trying to get image files from Google Cloud Storage (GCS) in my Node.js application using the Axios client. In development mode on my PC I pass a Bearer token and everything works properly.
But I need this to work in production, in a cluster hosted on Google Kubernetes Engine (GKE).
I followed the recommended tutorials to create a Google service account (GSA), then bound it to a Kubernetes service account (KSA) via the Workload Identity approach, but when I try to get files through an endpoint in my app, I receive:
{"statusCode":401,"message":"Unauthorized"}
What am I missing?
Update: What I've done:
Create Google Service Account
https://cloud.google.com/iam/docs/creating-managing-service-accounts
Create Kubernetes Service Account
# gke-access-gcs.ksa.yaml file
apiVersion: v1
kind: ServiceAccount
metadata:
  name: gke-access-gcs
kubectl apply -f gke-access-gcs.ksa.yaml
Relate KSAs and GSAs
gcloud iam service-accounts add-iam-policy-binding \
--role roles/iam.workloadIdentityUser \
--member "serviceAccount:cluster_project.svc.id.goog[k8s_namespace/ksa_name]" \
gsa_name@gsa_project.iam.gserviceaccount.com
Annotate the KSA to complete the link between the KSA and GSA
kubectl annotate serviceaccount \
--namespace k8s_namespace \
ksa_name \
iam.gke.io/gcp-service-account=gsa_name@gsa_project.iam.gserviceaccount.com
Set Read and Write role:
gcloud projects add-iam-policy-binding project-id \
--member=serviceAccount:gsa-account@project-id.iam.gserviceaccount.com \
--role=roles/storage.objectAdmin
Test access:
kubectl run -it \
--image google/cloud-sdk:slim \
--serviceaccount ksa-name \
--namespace k8s-namespace \
workload-identity-test
The above command works correctly. Note that --serviceaccount and the workload-identity-test pod name were passed explicitly. Is this necessary on GKE?
PS: I don't know if this has any influence, but I am using Cloud SQL with a proxy in the project.

EDIT
The issue portrayed in the question comes down to the fact that the Axios client does not use the Application Default Credentials (ADC) mechanism (as the official Google libraries do) that Workload Identity takes advantage of. ADC checks:
If the environment variable GOOGLE_APPLICATION_CREDENTIALS is set, ADC uses the service account file that the variable points to.
If the environment variable GOOGLE_APPLICATION_CREDENTIALS isn't set, ADC uses the default service account that Compute Engine, Google Kubernetes Engine, App Engine, Cloud Run, and Cloud Functions provide.
-- Cloud.google.com: Authentication: Production
This means that the Axios client will need to fall back to the Bearer token authentication method to authenticate against Google Cloud Storage.
The authentication with Bearer token is described in the official documentation as following:
API authentication
To make requests using OAuth 2.0 to either the Cloud Storage XML API or JSON API, include your application's access token in the Authorization header in every request that requires authentication. You can generate an access token from the OAuth 2.0 Playground.
Authorization: Bearer OAUTH2_TOKEN
The following is an example of a request that lists objects in a bucket.
JSON API
Use the list method of the Objects resource.
GET /storage/v1/b/example-bucket/o HTTP/1.1
Host: www.googleapis.com
Authorization: Bearer ya29.AHES6ZRVmB7fkLtd1XTmq6mo0S1wqZZi3-Lh_s-6Uw7p8vtgSwg
-- Cloud.google.com: Storage: Docs: Api authentication
I've included a basic example of a code snippet using Axios to query Cloud Storage (requires $ npm install axios):
const Axios = require('axios');

const config = {
  // Replace OAUTH2_TOKEN with a live access token
  headers: { Authorization: 'Bearer OAUTH2_TOKEN' }
};

Axios.get(
  'https://storage.googleapis.com/storage/v1/b/BUCKET-NAME/o/',
  config
).then(
  (response) => {
    console.log(response.data.items);
  },
  (err) => {
    console.log('Oh no. Something went wrong :(');
    // console.log(err) <-- Get the full output!
  }
);
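Since OAUTH2_TOKEN must be a live access token (tokens from the OAuth 2.0 Playground expire after roughly an hour), a more practical approach inside a GKE pod is to fetch a token from the metadata server at runtime; with Workload Identity configured, the metadata server hands out tokens for the bound Google service account. A minimal sketch of that approach:
const axios = require('axios');

// Fetch a short-lived access token from the GKE metadata server.
// With Workload Identity, this returns a token for the bound GSA.
async function getAccessToken() {
  const response = await axios.get(
    'http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/token',
    { headers: { 'Metadata-Flavor': 'Google' } }
  );
  // response.data has the shape { access_token, expires_in, token_type }
  return response.data.access_token;
}

async function listObjects(bucketName) {
  const token = await getAccessToken();
  const response = await axios.get(
    `https://storage.googleapis.com/storage/v1/b/${bucketName}/o/`,
    { headers: { Authorization: `Bearer ${token}` } }
  );
  return response.data.items;
}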
Below I left an example of a Workload Identity setup with an official Node.js library code snippet, as it could be useful to other community members.
Posting this answer as I've managed to use Workload Identity and a simple Node.js app to send and retrieve data from a GCP bucket.
I included some bullet points for troubleshooting potential issues.
Steps:
Check if GKE cluster has Workload Identity enabled.
Check if your Kubernetes service account is associated with your Google Service account.
Check if the example workload is using the correct Google service account when connecting to the APIs.
Check if your Google service account has the correct permissions to access your bucket.
You can also follow the official documentation:
Cloud.google.com: Kubernetes Engine: Workload Identity
Assuming that:
Project (ID) named: awesome-project <- it's only example
Kubernetes namespace named: bucket-namespace
Kubernetes service account named: bucket-service-account
Google service account named: google-bucket-service-account
Cloud storage bucket named: workload-bucket-example <- it's only example
I've included the commands:
$ kubectl create namespace bucket-namespace
$ kubectl create serviceaccount --namespace bucket-namespace bucket-service-account
$ gcloud iam service-accounts create google-bucket-service-account
$ gcloud iam service-accounts add-iam-policy-binding --role roles/iam.workloadIdentityUser --member "serviceAccount:awesome-project.svc.id.goog[bucket-namespace/bucket-service-account]" google-bucket-service-account@awesome-project.iam.gserviceaccount.com
$ kubectl annotate serviceaccount --namespace bucket-namespace bucket-service-account iam.gke.io/gcp-service-account=google-bucket-service-account@awesome-project.iam.gserviceaccount.com
Using the guide linked above, check which service account is used to authenticate to the APIs:
$ kubectl run -it --image google/cloud-sdk:slim --serviceaccount bucket-service-account --namespace bucket-namespace workload-identity-test
The output of $ gcloud auth list should show:
Credentialed Accounts
ACTIVE ACCOUNT
* google-bucket-service-account@awesome-project.iam.gserviceaccount.com
To set the active account, run:
$ gcloud config set account `ACCOUNT`
The Google service account created earlier should be present in the output!
It's also required to grant the service account permissions on the bucket. You can either:
Use Cloud Console
Run: $ gsutil iam ch serviceAccount:google-bucket-service-account@awesome-project.iam.gserviceaccount.com:roles/storage.admin gs://workload-bucket-example
To download a file from the workload-bucket-example bucket, the following code can be used:
// Copyright 2020 Google LLC
/**
 * This application demonstrates how to perform basic operations on files with
 * the Google Cloud Storage API.
 *
 * For more information, see the README.md under /storage and the documentation
 * at https://cloud.google.com/storage/docs.
 */
const path = require('path');
const cwd = path.join(__dirname, '..');

function main(
  bucketName = 'workload-bucket-example',
  srcFilename = 'hello.txt',
  destFilename = path.join(cwd, 'hello.txt')
) {
  const {Storage} = require('@google-cloud/storage');

  // Creates a client
  const storage = new Storage();

  async function downloadFile() {
    const options = {
      // The path to which the file should be downloaded, e.g. "./file.txt"
      destination: destFilename,
    };

    // Downloads the file
    await storage.bucket(bucketName).file(srcFilename).download(options);

    console.log(
      `gs://${bucketName}/${srcFilename} downloaded to ${destFilename}.`
    );
  }

  downloadFile().catch(console.error);
}

main(...process.argv.slice(2));
The code is an exact copy from:
Googleapis.dev: NodeJS: Storage
Github.com: Googleapis: Nodejs-storage: downloadFile.js
Running this code should produce the following output:
root@ubuntu:/# nodejs app.js
gs://workload-bucket-example/hello.txt downloaded to /hello.txt.
root@ubuntu:/# cat hello.txt
Hello there!
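As a complement, sending a file to the bucket works the same way. A minimal upload sketch under the same setup (reusing the example bucket and file names from above):
const {Storage} = require('@google-cloud/storage');

// Creates a client; credentials are picked up automatically via Workload Identity.
const storage = new Storage();

async function uploadFile() {
  // Uploads the local hello.txt to the example bucket under the same name.
  await storage.bucket('workload-bucket-example').upload('hello.txt', {
    destination: 'hello.txt',
  });
  console.log('hello.txt uploaded to gs://workload-bucket-example/hello.txt');
}

uploadFile().catch(console.error);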

Related

Deployed GAE Instance Does Not Have Permissions

I have successfully deployed apollo-server-express on a GAE instance; however, the instance is unable to fetch secrets from Google Secret Manager.
Error from GAE Logs:
Error: 7 PERMISSION_DENIED: The caller does not have permission
index.ts
// @note `SECRET_NAMES` is a comma-separated string of the secrets' paths
const secretNames = process.env.SECRET_NAMES?.split(',') ?? []

for (const secretName of secretNames) {
  // @note `loadSecret` will use the Google Secret Manager SDK to download the payload
  // @note `secretName` is the fully qualified path to the secret located in the Google Secret Manager API
  Object.assign(process.env, await loadSecret(secretName))
}
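The loadSecret helper itself is not shown in the question; a hypothetical implementation with the official @google-cloud/secret-manager client, assuming the secret payload is a JSON object of env-style key/value pairs, might look like:
const {SecretManagerServiceClient} = require('@google-cloud/secret-manager');
const client = new SecretManagerServiceClient();

// Hypothetical loadSecret: fetches a secret version and parses its payload.
// `secretName` is a fully qualified path, e.g. projects/.../secrets/.../versions/latest
async function loadSecret(secretName) {
  const [version] = await client.accessSecretVersion({ name: secretName });
  return JSON.parse(version.payload.data.toString('utf8'));
}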
app.yml
runtime: nodejs16
service: <service-name>
instance_class: F1
env_variables:
  SECRET_NAMES: '<path-1>/versions/latest,<path-2>/versions/latest'
On my local machine, I would use the google-service-key.json to have the server run with an app service's credentials as default. This app service has the following roles:
Secrets Access
SQL Client
However, once I run gcloud app deploy, the server no longer looks for the google-service-key.json and instead uses admin.credentials.applicationDefault() to authenticate.
However, I'm not certain that the default credentials of the GAE instance are the same credentials that I referenced in the google-service-key.json.

How to use Cloud Trace with Nodejs on GKE with workload identity enabled?

I'm trying to set up Cloud Trace on a GKE cluster with workload identity enabled. My pod uses a service account, which has the Cloud Trace Agent role. (I also tried giving it the Owner role, to rule out permission issues, but that didn't change the error.)
I followed the Node.js quickstart, which says to add the following snippet to my code:
require('@google-cloud/trace-agent').start();
When I try to add a trace, I get the following error:
@google-cloud/trace-agent DEBUG TraceWriter#publish: Received error while publishing traces to cloudtrace.googleapis.com: Error: Could not refresh access token: A Forbidden error was returned while attempting to retrieve an access token for the Compute Engine built-in service account. This may be because the Compute Engine instance does not have the correct permission scopes specified: Could not refresh access token: Unsuccessful response status code. Request failed with status code 403
(How) can I configure the library to work in this scenario?
To answer your question from the comments above ("correct me if I'm wrong - workload identity is a cluster feature, not connected to a namespace?"), and seeing that you fixed your problem by configuring the binding between the KSA/K8s namespace and the GCP SA, I will add a response with more context that I believe could help clarify this.
Yes, you are right: Workload Identity is a GKE cluster feature that lets you bind an identity from Kubernetes (a Kubernetes Service Account (KSA)) to a GCP identity (a Google Service Account (GSA)), so that your workloads are authenticated with a specific GCP identity that has enough permissions to reach certain APIs (depending on the permissions your GCP service account has). K8s namespaces and KSAs play a critical role here, as KSAs are namespaced resources.
Therefore, in order to authenticate your workloads (containers) correctly with a GCP service account, you need to create them in the configured K8s namespace and with the configured KSA, as mentioned in this doc.
If you create your workloads in a different K8s namespace (meaning with a different KSA), you will not get an authenticated identity for your workloads; instead, your workloads will be authenticated as the Workload Identity pool/Workload Identity namespace, which is PROJECT_ID.svc.id.goog. This means that when you create a container with the GCP SDK installed and run gcloud auth list, you will see PROJECT_ID.svc.id.goog as the authenticated identity, which is an IAM object but not an identity with permissions in IAM. So your workloads will lack permissions.
You therefore need to create your containers in the configured namespace and with the configured service account in order to have a correct identity in your containers, one with IAM permissions.
I'm assuming that the above (authentication without permissions and without an actual IAM identity) is what happened here: as you mentioned in your response, you just added the needed binding between the GSA and the KSA, meaning that your container had been lacking an identity with actual IAM permissions.
Just to be clear on this: Workload Identity allows you to authenticate your workloads with a service account different from the one on your GKE nodes. If your application runs inside a Google Cloud environment that has a default service account, your application can retrieve the service account credentials to call Google Cloud APIs. Such environments include Compute Engine, Google Kubernetes Engine, App Engine, Cloud Run, and Cloud Functions, here.
With the above comment I want to say that even if you do not use Workload Identity, your containers will still be authenticated, as they run on GKE, which by default uses a service account that is inherited from the nodes by your containers. The default service account (the Compute Engine service account) and its scopes are enough to write to Cloud Trace from containers, and that is why you were able to see traces on a GKE cluster with Workload Identity disabled: the default service account was used by your containers and nodes.
If you test this on both environments:
GKE cluster with Workload Identity: with the correct config, you will see a service account different from the default one authenticating your workloads/containers.
GKE cluster with Workload Identity disabled: you will see the same service account used by your nodes (by default the Compute Engine service account, with the Editor role and the scopes applied to your nodes) on your containers.
These tests can be performed by spinning up the same container you used in your response, which is:
kubectl run -it \
--image google/cloud-sdk:slim \
--serviceaccount KSA_NAME \ ##If needed
--namespace K8S_NAMESPACE \ ##If needed
workload-identity-test
and running gcloud auth list to see the identity you are authenticated with in your containers.
Hope this can help somehow!
It turned out I had misconfigured the IAM service account.
I managed to get a more meaningful error message by running a new pod in my namespace with the gcloud cli installed:
kubectl run -it \
--image gcr.io/google.com/cloudsdktool/cloud-sdk \
--serviceaccount $GKE_SERVICE_ACCOUNT test \
-- bash
after that, just running any gcloud command gave an error message containing (emphasis mine):
Unable to generate access token; IAM returned 403 Forbidden: The caller does not have permission
This error could be caused by a missing IAM policy binding on the target IAM service account.
Running
gcloud iam service-accounts get-iam-policy $SERVICE_ACCOUNT
indeed showed that the binding to the Kubernetes service account was missing.
Adding it manually fixed the issue:
gcloud iam service-accounts add-iam-policy-binding \
--role roles/iam.workloadIdentityUser \
--member "serviceAccount:$PROJECT.svc.id.goog[$NAMESPACE/$GKE_SERVICE_ACCOUNT]" \
$SERVICE_ACCOUNT
After more research, the underlying problem was that I created my service accounts using Config Connector but hadn't properly annotated the Kubernetes namespace with the Google Cloud project to deploy the resources in:
kubectl annotate namespace "$NAMESPACE" cnrm.cloud.google.com/project-id="$PROJECT"
Therefore, Config Connector could not add the IAM policy binding.
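For reference, once the namespace annotation is in place, Config Connector can create the same binding declaratively. A sketch (the resource names here are hypothetical) might look like:
apiVersion: iam.cnrm.cloud.google.com/v1beta1
kind: IAMPolicyMember
metadata:
  name: workload-identity-binding  # hypothetical name
spec:
  member: serviceAccount:$PROJECT.svc.id.goog[$NAMESPACE/$GKE_SERVICE_ACCOUNT]
  role: roles/iam.workloadIdentityUser
  resourceRef:
    apiVersion: iam.cnrm.cloud.google.com/v1beta1
    kind: IAMServiceAccount
    name: my-service-account  # hypothetical Config Connector resource for the GSA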

Authenticating to Google Cloud Firestore from GKE with Workload Identity

I'm trying to write a simple backend that will access my Google Cloud Firestore; it lives in the Google Kubernetes Engine. On my local machine I'm using the following code to authenticate to Firestore, as detailed in the Google documentation.
if (process.env.NODE_ENV !== 'production') {
  const result = require('dotenv').config()
  // Additional error handling here
}
This pulls the GOOGLE_APPLICATION_CREDENTIALS environment variable and populates it with my google-application-credentials.json, which I got from creating a service account with the "Cloud Datastore User" role.
So, locally, my code runs fine. I can reach my Firestore and do everything I need to. However, the problem arises once I deploy to GKE.
I followed this Google documentation to set up Workload Identity for my cluster. I've created a deployment and verified that the pods are all using the correct IAM service account by running:
kubectl exec -it POD_NAME -c CONTAINER_NAME -n NAMESPACE sh
> gcloud auth list
I was under the impression from the documentation that authentication would be handled for my service as long as the above held true. I'm really not sure why, but my Firestore() instance behaves as if it does not have the necessary credentials to access the Firestore.
In case it helps, below is my declaration and implementation of the instance:
const firestore = new Firestore()

const server = new ApolloServer({
  schema: schema,
  dataSources: () => {
    return {
      userDatasource: new UserDatasource(firestore)
    }
  }
})
UPDATE:
In a bout of desperation I decided to tear down everything and re-build it. Following everything over step by step I appear to have either encountered a bug or (more likely) I did something mildly wrong the first time. I'm now able to connect to my backend service. However, I'm now getting a different error. Upon sending any request (I'm using GraphQL, but in essence it's any REST call) I get back a 404.
Inspecting the logs yields the following:
'Getting metadata from plugin failed with error: Could not refresh access token: A Not Found error was returned while attempting to retrieve an access token for the Compute Engine built-in service account. This may be because the Compute Engine instance does not have any permission scopes specified: Could not refresh access token: Unsuccessful response status code. Request failed with status code 404'
A cursory search for this issue doesn't seem to return anything related to what I'm trying to accomplish, and so I'm back to square one.
I think your initial assumption was correct! Workload Identity is not functioning properly if you still have to specify scopes. In the Workload Identity article you linked, scopes are not used.
I've been struggling with the same issue and have identified three ways to get authenticated credentials in the pod.
1. Workload Identity (basically the Workload Identity article above with some deployment details added)
This method is preferred because it allows each pod deployment in a cluster to be granted only the permissions it needs.
Create cluster (note: no scopes or service account defined)
gcloud beta container clusters create {cluster-name} \
--release-channel regular \
--identity-namespace {projectID}.svc.id.goog
Then create the k8sServiceAccount, assign roles, and annotate.
gcloud container clusters get-credentials {cluster-name}
kubectl create serviceaccount --namespace default {k8sServiceAccount}
gcloud iam service-accounts add-iam-policy-binding \
--member serviceAccount:{projectID}.svc.id.goog[default/{k8sServiceAccount}] \
--role roles/iam.workloadIdentityUser \
{googleServiceAccount}
kubectl annotate serviceaccount \
--namespace default \
{k8sServiceAccount} \
iam.gke.io/gcp-service-account={googleServiceAccount}
Then I create my deployment, and set the k8sServiceAccount.
(Setting the service account was the part that I was missing)
kubectl create deployment {deployment-name} --image={containerImageURL}
kubectl set serviceaccount deployment {deployment-name} {k8sServiceAccount}
Then expose with a target of 8080
kubectl expose deployment {deployment-name} --name={service-name} --type=LoadBalancer --port 80 --target-port 8080
The googleServiceAccount needs to have the appropriate IAM roles assigned (see below).
2. Cluster Service Account
This method is not preferred, because all VMs and pods in the cluster will have permissions based on the defined service account.
Create cluster with assigned service account
gcloud beta container clusters create [cluster-name] \
--release-channel regular \
--service-account {googleServiceAccount}
The googleServiceAccount needs to have the appropriate IAM roles assigned (see below).
Then deploy and expose as above, but without setting the k8sServiceAccount
3. Scopes
This method is not preferred, because all VMs and pods in the cluster will have permissions based on the scopes defined.
Create the cluster with assigned scopes (Firestore only requires "cloud-platform"; the Realtime Database also requires "userinfo.email"):
gcloud beta container clusters create $2 \
--release-channel regular \
--scopes https://www.googleapis.com/auth/cloud-platform,https://www.googleapis.com/auth/userinfo.email
Then deploy and expose as above, but without setting the k8sServiceAccount
The first two methods require a Google Service Account with the appropriate IAM roles assigned. Here are the roles I assigned to get a few Firebase products working:
Firestore: Cloud Datastore User (Datastore)
Realtime Database: Firebase Realtime Database Admin (Firebase Products)
Storage: Storage Object Admin (Cloud Storage)
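With any of these methods in place, the client code shouldn't need explicit credentials. As a quick smoke test (a sketch, not part of the original setup), listing collections fails fast if the Firestore client could not resolve credentials:
const {Firestore} = require('@google-cloud/firestore');

// Creates a client using Application Default Credentials.
const firestore = new Firestore();

firestore.listCollections()
  .then((collections) => console.log(collections.map((c) => c.id)))
  .catch(console.error);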
Going to close this question.
Just in case anyone stumbles onto it here's what fixed it for me.
1.) I re-followed the steps in the Google documentation link above; this fixed the issue of my pods not launching.
2.) As for my update, I re-created my cluster and gave it the Cloud Datastore permission. I had assumed that those permissions were separate from what Workload Identity needed to function. I was wrong.
I hope this helps someone.

Google Cloud Vision OCR Error Code 7 - Permission Denied

I am building a webapp that utilizes Google Cloud Vision's OCR. The OCR works fine for about 7-8 requests, after which I get an error like so:
Error: 7 PERMISSION_DENIED: Your application has authenticated using end user credentials from the Google Cloud SDK or Google Cloud Shell which are not supported by the vision.googleapis.com. We recommend configuring the billing/quota_project setting in gcloud or using a service account through the auth/impersonate_service_account setting. For more information about service accounts and how to use them in your application, see https://cloud.google.com/docs/authentication/.
The problem is, I have already set up a billing account and a service account.
I have tried using multiple gcloud commands to fix this, and when I run gcloud auth list, I can see that my service account is the active account. I have also tried generating a JSON key and setting the path to that key in my environment variables, as instructed here: https://cloud.google.com/docs/authentication/getting-started
Has anyone encountered this issue before? For reference, I am running Windows 10 and using Node.js for the webapp. Thanks!
You are authenticating using end user credentials from the Google Cloud SDK or Google Cloud Shell and not service account credentials.
1. Make a new directory:
mkdir ocr
cd ocr
2. Download an image:
curl https://www.python.org/static/apple-touch-icon-144x144-precomposed.png > image.png
3. Install the client library:
sudo pip3 install --upgrade google-cloud-vision
4. Create a service account:
gcloud iam service-accounts create ocr-vision \
--description "ocr-vision" \
--display-name "ocr-vision"
gcloud iam service-accounts list
5. Create a key.json file:
gcloud iam service-accounts keys create key.json \
--iam-account ocr-vision@your-project.iam.gserviceaccount.com
6. Assign the owner role to the service account:
gcloud projects add-iam-policy-binding your-project \
--member serviceAccount:ocr-vision@your-project.iam.gserviceaccount.com \
--role roles/owner
7. Export the env variable:
export GOOGLE_APPLICATION_CREDENTIALS=key.json
8. Run the script:
python script.py
import io
import os

# Imports the Google Cloud client library
from google.cloud import vision
from google.cloud.vision import types

# Instantiates a client
client = vision.ImageAnnotatorClient()

# The name of the image file to annotate
file_name = os.path.abspath('image.png')

# Loads the image into memory
with io.open(file_name, 'rb') as image_file:
    content = image_file.read()

image = types.Image(content=content)

# Performs label detection on the image file
response = client.label_detection(image=image)
labels = response.label_annotations

print('Labels:')
for label in labels:
    print(label.description)
9. Output:
Labels:
Yellow
Font
Line
Material property
Clip art
Logo
Symbol
Icon
Graphics
Illustration
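Since the question mentions a Node.js webapp, a roughly equivalent sketch using the official @google-cloud/vision client (assuming the same key.json is exported via GOOGLE_APPLICATION_CREDENTIALS) would be:
const vision = require('@google-cloud/vision');

// Performs label detection on image.png, mirroring the Python script above.
async function labelImage() {
  const client = new vision.ImageAnnotatorClient();
  const [result] = await client.labelDetection('image.png');
  console.log('Labels:');
  for (const label of result.labelAnnotations) {
    console.log(label.description);
  }
}

labelImage().catch(console.error);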

How to get all running PODs on Kubernetes cluster

This simple Node.js program works fine locally because it pulls the Kubernetes config from my local /root/.kube/config file:
const Client = require('kubernetes-client').Client;
const Config = require('kubernetes-client/backends/request').config;

// Note: the await call below must run inside an async function
const client = new Client({ config: Config.fromKubeconfig(), version: '1.13' });
const pods = await client.api.v1.namespaces('xxxxx').pods.get({ qs: { labelSelector: 'application=test' } });
console.log('Pods: ', JSON.stringify(pods));
Now I want to run it as a Docker container on the cluster and get all of the cluster's running pods (for the same/current namespace). Now of course it fails:
Error: { Error: ENOENT: no such file or directory, open '/root/.kube/config'
So how do I make it work when deployed as a Docker container to the cluster?
This little service needs to scan all running pods... I assume it doesn't need to pull config data since it's already deployed, so it just needs to access the pods on the current cluster.
A couple of concepts to get your head around first:
Service account
Role
Role binding
To achieve your end goal (which, if I understand correctly, is to containerize the Node.js application):
Step 1: Put the application in a container
Step 2: Create a deployment/statefulset/daemonset as per your requirement, using the container created in step 1
Explanation:
In step 2 above, if you do not (explicitly) mention a (custom) serviceaccount, then by default it will be the default account, the credentials of which are mounted inside the container (by default) here
volumeMounts:
- mountPath: /var/run/secrets/kubernetes.io/serviceaccount
  name: default-token-xxxx
  readOnly: true
which can be verified by this command after (successful) pod creation
kubectl get pod -n {yournamespace(by default is default)} POD_NAME -o yaml
Now (gotchas!!) if you cannot access the cluster with those credentials, it comes down to which service account you are using and what rights that serviceaccount has been granted. For example, if you are using an abc serviceaccount which has no rolebinding, you will not be able to view the cluster. In that case you first need to create a role (to read pods) and a rolebinding for that role to the serviceaccount, as sketched below.
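A sketch of such a Role and RoleBinding for reading pods in the default namespace (the names here are illustrative):
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: default
  name: pod-reader
rules:
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: read-pods
  namespace: default
subjects:
- kind: ServiceAccount
  name: default  # or your custom serviceaccount
  namespace: default
roleRef:
  kind: Role
  name: pod-reader
  apiGroup: rbac.authorization.k8s.io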
UPDATE: The problem got resolved by changing Config.fromKubeconfig() to Config.getInCluster(). Ref
Clarification: the fromKubeconfig() function is fine if you are running your application on a node which is part of the Kubernetes cluster and has the cluster access token saved in /$USER/.kube/config, but if you want to run the Node.js application in a container in a pod, then you need Config.getInCluster() to load the token.
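Concretely, the only change from the question's snippet is the config loader (a sketch):
const Client = require('kubernetes-client').Client;
const Config = require('kubernetes-client/backends/request').config;

// Inside a pod, load the serviceaccount token mounted into the container
// instead of reading the kubeconfig file from disk.
const client = new Client({ config: Config.getInCluster(), version: '1.13' });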
if you are nosy enough then check the comments of this answer! :P
Note: the Node.js library under discussion here is this
