Unable to open the kubernetes dashboard in Azure Kubernetes Service - azure

I created a Kubernetes cluster in my Azure resource group using Azure Kubernetes Service and logged into the cluster with the resource group credentials through the Azure CLI. I was able to open the Kubernetes dashboard successfully the first time. After that I deleted my resource group, along with the other resource groups that are created by default alongside the Kubernetes cluster. I then created a resource group and Kubernetes cluster again in my Azure account. Now, when I try to open the Kubernetes dashboard, I get an error saying port 8001 is not open. I tried proxy port-forwarding, but I don't know how to reach the dashboard URL on a different port.
Could anybody suggest how to resolve this issue?

I think you need to delete your kubernetes config and pull a new one with az aks get-credentials or whatever you are using, because you are probably still using the config from the previous cluster (hint: it won't work because it's a different cluster).
You can do that by deleting the file ~/.kube/config, pulling the new one, and trying kubectl get nodes. If that works, try the port-forward. It is not port related; something is wrong with your config/az cli.
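A minimal sketch of that sequence, assuming your new resource group and cluster are named myResourceGroup and myAKSCluster (adjust to your environment):
rm ~/.kube/config                                                             # drop the stale kubeconfig from the deleted cluster
az aks get-credentials --resource-group myResourceGroup --name myAKSCluster   # pull fresh credentials for the new cluster
kubectl get nodes                                                             # confirm the new context works before retrying the dashboard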
OK, I recall in the previous question you mentioned you started using RBAC; you need to add a cluster role binding for the dashboard's service account:
kubectl create clusterrolebinding kubernetes-dashboard --clusterrole=cluster-admin --serviceaccount=kube-system:kubernetes-dashboard
https://learn.microsoft.com/en-us/azure/aks/kubernetes-dashboard#for-rbac-enabled-clusters

Related

Azure AKS - give entire cluster access to Azure Key Vault

I'm trying to find a way to give an entire AKS cluster access to Azure Key Vault. I have temporarily got this working by following the process below:
Go to the VMSS of the cluster -> Identity -> Set System Assigned Status to 'On'
Add this Managed identity as an access policy to Key Vault.
This works; however, whenever I stop and start the cluster, I have to re-create this managed identity and re-add it to Key Vault. I have tried using user-assigned identities for the VMSS as well, but that does not seem to work.
I also cannot use Azure pod identities/the CSI features for other reasons, so I'm just looking for a simple way to give my cluster permanent access to Key Vault.
Thanks in advance
A Pod is the smallest unit in Kubernetes. A Pod is a group of one or more containers that are deployed together on the same host (node).
A Pod runs on a node, which is controlled by the control plane (master).
Pods use OS-level virtualization and consume VMSS resources while they run, based on their requirements.
When you stop and restart the cluster or its nodes, the pods lose their resources, which leads to the loss of those pods. So there will be no pods under the VMSS until you restart. When you restart your cluster/node, new pods are created with different names and different IP addresses.
From this GitHub discussion, I found that the MIC (Managed Identity Controller) removes the identity from the underlying VMSS when no pods are configured to use that identity. So you have to recreate the managed identity for the VMSS.
You can refer to this link for a better understanding of how to access Key Vault from Azure AKS.
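As a rough sketch of the user-assigned-identity route the asker mentioned (an identity created once and attached to the node VMSS, then granted a Key Vault access policy), with all resource names below as placeholders:
az identity create --resource-group MC_myResourceGroup_myAKSCluster_eastus --name myKeyVaultIdentity
az vmss identity assign --resource-group MC_myResourceGroup_myAKSCluster_eastus --name <node-vmss-name> --identities <identity-resource-id>
az keyvault set-policy --name <keyvault-name> --object-id <identity-principal-id> --secret-permissions get list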

Kubernetes pods using invalid Azure Active Directory access tokens

When deploying new jobs and services to Azure Kubernetes Service cluster, the pods fail to request valid AAD access tokens with all permissions available. If new permissions were added on the same day, before or after a deployment, the tokens still do not pick them up. This issue has been observed with permissions granted to Active Directory Groups over Key Vaults, Storage Accounts, and SQL databases scopes so far.
Example: I have a .NET 5.0 C# API running on 3 pods with antiaffinity rules located each on a separate node. The application reads information from a SQL database. I made a release and added the database permissions afterwards. Things I have tried so far to make the application reset the access tokens:
kubectl delete pods --all -n <namespace> which essentially created 3 new pods again failing due to insufficient permissions.
kubectl apply -f deployment.yaml to deploy a new version of the image running in the containers, again all 3 pods kept failing.
kubectl delete -f deployment.yaml followed by kubectl apply -f deployment.yaml to erase the old kubernetes object and create a new one. This resolved the issue on 2/3 pods, however, the third one kept failing due to insufficient permissions.
kubectl delete namespace <namespace> to erase the entire namespace with all configuration available and recreated it again. Surprisingly, again 2/3 pods were running with the correct permissions and the last one did not.
The commands were run more than one hour after the permissions were added to the group. The database tokens are active for 24 hours, and when I have seen this issue occur with cronjobs, I had to wait 1 day for the task to execute correctly (none of the above steps worked in the cronjob scenario). The validity of the tokens kept changing, which implied that the pods were requesting new access tokens, again excluding the most recently added permissions. The only solution I have found that works 100% of the time is to destroy the cluster and recreate it, which is not viable in any production scenario.
The failing pod from my example was the one always running on node 00 which made me think there may be an extra caching layer on the first initial node of the cluster. However, I still do not understand why the other 2 pods were running with no issue and also what is the way to restart my pods or refresh the access token to minimise the wait time until resolution.
Kubernetes version: 1.21.7.
The cluster has no AKS-managed AAD or pod-identity enabled. All RBAC is granted to the cluster MSI via AD groups.
Please check whether the following can work around the issue in your case.
To access the Kubernetes resources, you must have access to the AKS cluster, the Kubernetes API, and the Kubernetes objects. Ensure that you're either a cluster administrator or a user with the appropriate permissions to access the AKS cluster.
Things you need to do, if you haven't already:
Enable Azure RBAC on your existing AKS cluster, using:
az aks update -g myResourceGroup -n myAKSCluster --enable-azure-rbac
Create a role assignment that allows read access to the cluster's Pods and Services:
Add the necessary roles (Azure Kubernetes Service Cluster User Role, Azure Kubernetes Service RBAC Reader/Writer/Admin/Cluster Admin) to the user, as shown in the example after this list. See (Microsoft Docs).
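For example, a role assignment could be granted to a user at the cluster scope with a command along these lines (the user and cluster names are placeholders):
AKS_ID=$(az aks show --resource-group myResourceGroup --name myAKSCluster --query id -o tsv)
az role assignment create --assignee user@contoso.com --role "Azure Kubernetes Service RBAC Reader" --scope $AKS_ID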
Also check Troubleshooting.
Also check whether you need the "Virtual Machine Contributor" and "Storage Account Contributor" roles on the resource group containing the pods, and verify that the namespace is specified for that pod in case you missed it (Stack Overflow reference). Also check whether a firewall is restricting network access for that pod.
Resetting the kubeconfig context using the az aks get-credentials command may clear a previously cached authentication token for the affected user:
az aks get-credentials --resource-group myResourceGroup --name myAKSCluster --overwrite-existing (Reference)
Please also check the other references below:
kubernetes - Permissions error - Stack Overflow
create-role-assignments-for-users-to-access-cluster | microsoft docs
user can't access to AKS cluster with RBAC enabled (github.com)
kubernetes - Stack Overflow

Azure k8s dashboard does not open

I have k8s cluster on Azure and can not access the dashboard.
To access it I was running az aks browse --resource-group <res_group> --name <cluster_name>
It stopped opening after I accidentally deleted the kube-dashboard pod.
Error:
Couldn't find the Kubernetes dashboard pod.
I tried enabling and disabling the dashboard add-on in Azure.
I tried to re-install the k8s dashboard (Azure did not allow it).
Any ideas on how to solve the issue and restart the dashboard?
I found the following solution, which worked for me:
I created another Azure k8s cluster. For each cluster, Azure creates a dashboard deployment.
I copied the YAML with a command like:
kubectl get deployment <kubernetes-dashboard-xxx> -n kube-system -o yaml
for each deployment, ReplicaSet, service, and pod related to the dashboard (see the sketch below).
I recreated them in the old, non-working cluster.
Then I upgraded/downgraded the cluster version to re-deploy the objects.
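A minimal sketch of the copy step, assuming the default object names in kube-system and that kubectl contexts for both clusters are configured (context and file names are placeholders):
kubectl --context new-cluster get deployment kubernetes-dashboard -n kube-system -o yaml > dashboard-deployment.yaml
kubectl --context new-cluster get service kubernetes-dashboard -n kube-system -o yaml > dashboard-service.yaml
# strip cluster-specific fields (status, resourceVersion, uid, clusterIP) from the files before re-applying
kubectl --context old-cluster apply -f dashboard-deployment.yaml -f dashboard-service.yaml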
Depending on your k8s version, AKS may not enable the dashboard when creating a new cluster. You can find details in the link below.
https://learn.microsoft.com/en-us/azure/aks/kubernetes-dashboard
I also suggest installing the dashboard directly from the Kubernetes dashboard page; it installs the dashboard into a separate namespace (which is actually better), and you can create an RBAC account to see all resources with admin privileges.
https://github.com/kubernetes/dashboard
https://github.com/kubernetes/dashboard/blob/master/docs/user/access-control/creating-sample-user.md
You can also use --enable-addons:
https://learn.microsoft.com/en-us/azure/aks/kubernetes-dashboard
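If you prefer the AKS-managed add-on route, a hedged example of turning it back on for an existing cluster (the resource group and cluster names are placeholders):
az aks enable-addons --addons kube-dashboard --resource-group myResourceGroup --name myAKSCluster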

Azure AKS Public IP in Non-standard Resource Group

I've been trying to manage an Azure Kubernetes Service (AKS) instance via Terraform. When I create the AKS instance via the Azure CLI per this MS tutorial, then install an ingress controller with a static public IP, per this MS tutorial, everything works fine. This method implicitly creates a service principal (SP).
When I create an otherwise exact duplicate of the AKS cluster via Terraform, I am forced to supply the service principal explicitly. I gave this new SP "Contributor" access to the cluster's entire resource group yet, when I get to the step to create the ingress controller (using the same command that tutorial 2 provided, above: helm install stable/nginx-ingress --set controller.replicaCount=2 --set controller.service.loadBalancerIP="XX.XX.XX.XX"), the ingress service comes up but it never acquires its public IP. The IP status remains "<pending>" indefinitely, and I can find nothing in any log about why. Are there logs that should tell me why my IP is still pending?
Again, I am fairly certain that, other than the SP, the Terraform AKS cluster is an exact duplicate of the one created based on the MS tutorial. Running terraform plan finds no differences between the two. Does anyone have any idea what permission my AKS SP might need or what else I might be missing here? Strangely, I can't find ANY permissions assigned to the implicitly created principal via the Azure portal, but I can't think of anything else that might be causing this behavior.
Not sure if it's a red herring or not, but other users have complained about a similar problem in the context of issues opened against the second tutorial. Their fix always appears to be "tear down your cluster and retry", but that isn't an acceptable solution in this context. I need a reproducible working cluster and azurerm_kubernetes_cluster doesn't currently allow for building an AKS instance with an implicitly created SP.
I'm going to answer my own question, for posterity. It turns out the problem was the resource group where I created the static public IP. AKS clusters use two resource groups: the group that you explicitly created the cluster in, and a second group which is implicitly created by the cluster. That second, implicit resource group always gets a name starting with "MC_" (the rest of the name is derived from the explicit RG, the cluster name, and the region).
Anyhow, the default AKS configuration requires that the public IP be created within that implicit resource group. Assuming that you created the AKS cluster with Terraform, its name will be exported in ${azurerm_kubernetes_cluster.NAME.node_resource_group}.
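For illustration, creating the static IP in that node resource group might look like the following (the MC_ resource group name and IP name are placeholders for your environment):
az network public-ip create --resource-group MC_myResourceGroup_myAKSCluster_eastus --name myAKSPublicIP --sku Standard --allocation-method static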
EDIT 2019-05-23
Since writing this, we found a use case that the workaround of using the MC_* resource group wasn't good enough for. I opened a support ticket with MS and they directed me to this solution. Add the following annotation to your LoadBalancer (or Ingress controller), and make sure that the AKS SP has at least Network Contributor rights in the destination resource group (myResourceGroup in the example below):
metadata:
  annotations:
    service.beta.kubernetes.io/azure-load-balancer-resource-group: myResourceGroup
This solved it completely for us.
Set Static IP Resource Group when Installing Helm Chart
Here is a minimal helm install command for nginx-controller that works when the static IP is in a different resource group than the cluster managed node resource group.
helm upgrade --install ingress-nginx ingress-nginx \
--repo https://kubernetes.github.io/ingress-nginx \
--namespace ingress-nginx \
--set controller.replicaCount=1 \
--set controller.service.externalTrafficPolicy=Local \
--set controller.service.loadBalancerIP=$ingress_controller_ip \
--set controller.service.annotations."service\.beta\.kubernetes\.io/azure-load-balancer-resource-group"=$STATIC_IP_RESOURCE_GROUP
The key is the last override to provide the resource group of the static IP.
Additional Note: Health Probe Endpoints
Also, note that you may need to customize the load balancer health probe if your root path doesn't return a successful HTTP response. We do this by additionally adding the following (replace /healthz with your probe endpoint):
--set controller.service.annotations."service\.beta\.kubernetes\.io/azure-load-balancer-health-probe-request-path"=/healthz
Versions
Kubernetes 1.22.6
ingress-nginx-4.1.0
ingress-nginx/controller:v1.2.0
I can't comment just yet so putting this addition as answer.
Derek is right; you can absolutely use an existing IP from a resource group different from the one where the AKS cluster was provisioned. Here is the documentation page. Just make sure you've done the two steps below:
Add "Network Contributor" role assignment for your AKS service principal to the resource group where your existing static IP is.
Add service.beta.kubernetes.io/azure-load-balancer-resource-group: myResourceGroup to the ingress controller with the following command:
kubectl annotate service ingress-nginx-controller -n ingress service.beta.kubernetes.io/azure-load-balancer-resource-group=datagate
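A hedged example of the first step (the service principal ID, subscription ID, and resource group name are placeholders for your environment):
az role assignment create --assignee <aks-service-principal-client-id> --role "Network Contributor" --scope /subscriptions/<subscription-id>/resourceGroups/<static-ip-resource-group>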

Error when trying to get Azure Kubernetes Service to use Cluster Load balancer from Service

I'm working to get Streamsets Data Collector running in Azure Kubernetes Service (AKS), and when I run kubectl .... the service appears to be up; however, it is giving this error. This is an RBAC AKS cluster, so I think I need to give the service principal permissions AND/OR create a cluster role binding for that service in Kubernetes. Any ideas?
The error shows invalid client. It probably means that the original service principal secret of your AKS cluster is invalid or expired. See the similar error here.
Following that link, you can find the original client secret that was used when you deployed the AKS cluster, so that you can re-add it as a key to the service principal. On the master and node VMs in the Kubernetes cluster, the service principal credentials are stored in the file /etc/kubernetes/azure.json.
On the VM page, go to Run command, choose RunShellScript, enter cat /etc/kubernetes/azure.json, and click "Run"; in the output you will find the property aadClientSecret.
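The same check can be scripted against a node scale set with the CLI; a sketch, assuming the node resource group and VMSS names below are replaced with yours:
az vmss run-command invoke --resource-group MC_myResourceGroup_myAKSCluster_eastus --name <node-vmss-name> --instance-id 0 --command-id RunShellScript --scripts "cat /etc/kubernetes/azure.json"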
For more details, you could read Service principals with Azure Kubernetes Service (AKS)
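If the secret has indeed expired, an alternative worth noting is rotating the credentials on the cluster rather than re-adding the old key; a hedged sketch with placeholder names:
az aks update-credentials --resource-group myResourceGroup --name myAKSCluster --reset-service-principal --service-principal <app-id> --client-secret <new-client-secret>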
