Unable to pull image from Azure Container Registry - azure

We recently had an issue with our Azure Kubernetes Cluster not reporting back any data through the Azure Portal. To fix this, I updated the Kubernetes version to the latest version as was recommended on GitHub. After the upgrade was complete, we were able to view logs and monitoring data through the portal, but one of the containers stored in our Azure Container Registry is not able to be pulled by the Kubernetes Cluster.
The error I see in the Kubernetes management page is:
Failed to pull image "myacr.azurecr.io/container:190305.191": [rpc error: code = Unknown desc = Error response from daemon: Get https://myacr.azurecr.io/v2/mycontainer/manifests/190305.191: unauthorized: authentication required, rpc error: code = Unknown desc = Error response from daemon: Get https://myacr.azurecr.io/v2/mycontainer/manifests/190305.191: unauthorized: authentication required]
My original setup used the first script provided in this document and it worked correctly without issue. Once I started receiving the error, I ran it again just to make sure.
When I saw that fail, I deleted the account from the permissions on both the ACR and the AKS. Again, it failed to pull the image.
After that, I tried the second method, creating a Kubernetes secret, and received the same error.
At this point, I'm unsure what else to check. I've verified that I can run docker pull on my machine and pull the image, but there seems to be a breakdown between the AKS and the ACR that I cannot sort out.

It's been a while since I originally posted this, but I did stumble across a currently stable solution to the problem.
The service principal, for whatever reason, is not able to maintain a connection to the ACR. So if your cluster ever goes down, you lose the ability to pull from the ACR. I had this happen multiple times over the last year and as I moved more of my Kubernetes deployment to Azure, it became a bigger and bigger issue.
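One rough way to confirm the service principal is the culprit is to check whether its secret has expired; a quick sketch, with the app ID as a placeholder:
az ad sp credential list --id <appId> --output table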
I stumbled across this Microsoft doc and noticed the mention of the --attach-acr flag.
This is what the full command looks like:
az aks create -n myAKSCluster -g myResourceGroup --generate-ssh-keys --attach-acr $MYACR
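If your cluster already exists, the same flag works with az aks update, so nothing has to be recreated (the resource group and cluster names below are placeholders):
az aks update -n myAKSCluster -g myResourceGroup --attach-acr $MYACR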
Since setting it up with that flag, I have had 0 issues with it.
knock on wood

Related

AKS Cluster deployment fails with "ReconcileMSICredentialError"

When I try to deploy a fresh AKS cluster with "Dev/Test" settings via the Portal, I get the following error during deployment:
{"code":"DeploymentFailed","message":"At least one resource deployment operation failed.
Please list deployment operations for details. Please see
https://aka.ms/DeployOperations for usage details.","details":
[{"code":"ReconcileMSICredentialError","message":"Reconcile MSI credential failed.
Details: autorest/azure: Service returned an error. Status=409 Code=\"Conflict\"
Message=\"Secret bf905bf9e9ad86526b26e98d2ea490a0a500ff23907f9df987d95de3a649e751 is
currently being deleted and cannot be re-created; retry later.\" InnerError=
{\"code\":\"ObjectIsBeingDeleted\"}."}]}
However, the resource still gets deployed, but with a notification that "the resource is in a failed state". When I stop the cluster and start it again, the notification disappears, but I'm not sure whether the error remains.
I can avoid the error altogether if I pick a new name for the cluster. However, I'd like to keep the old name.
The same happens when I deploy with different settings (CPU, number of nodes, etc.). I also tried deleting the cluster entirely and deploying it from scratch, but the error persists. I haven't found any explanation for this error on Stack Overflow or Google.
What could be the reason for this error, and how can I avoid it?
I tried to reproduce the same issue in my environment and got the results below.
I created an AKS cluster with the Dev/Test settings.
The cluster was created successfully.
I fetched the credentials for the cluster using the below command:
az aks get-credentials --resource-group Alldemorg --name cluster_name
I created a sample application and deployed it to the cluster, using the following reference for an example deployment file.
The deployment succeeded, and I am able to see all the pods and nodes that were created for the application.
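For reference, these are the checks I mean, assuming kubectl is pointed at the cluster:
kubectl get nodes -o wide
kubectl get pods --all-namespaces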
Note:
1) The "ReconcileMSICredentialError" can be caused by the cluster version; check the version and upgrade to the latest (see the sketch after this note).
2) If the resource is in a failed state, delete the entire resource instead of just the cluster and create it again; stopping and starting the resource may cause the "ReconcileMSICredentialError" again.
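A rough sketch of the version check and upgrade, with placeholder resource group and cluster names:
az aks get-upgrades --resource-group myResourceGroup --name myAKSCluster --output table
az aks upgrade --resource-group myResourceGroup --name myAKSCluster --kubernetes-version <version>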

ImagePullBackOff: minikube k8s

I'm trying to add Spark to minikube, using this blog as a tutorial: https://medium.com/@rewelle/d%C3%A9ploiement-dune-architecture-compl%C3%A8te-big-data-avec-kubernetes-570eaa0e627
I get an ImagePullBackOff status when it tries to create the pod spark-standalone-worker-1.
This is what I get when I run kubectl describe pods spark-standalone-worker-1:
The status ImagePullBackOff means that a Pod couldn’t start, because Kubernetes couldn’t pull a container image. The ‘BackOff’ part means that Kubernetes will keep trying to pull the image, with an increasing delay (‘back-off’).
This is a very common reason for ImagePullBackOff since Docker introduced rate limits on Docker Hub. Once you hit your maximum download limit on Docker Hub, you’ll be blocked and this might cause your ImagePullBackOff error. You’ll either need to sign in with an account, or find another place to get your image from.
This error can also happen if your registry requires SSL/TLS authentication, but you don’t trust its certificate. Make sure you follow the instructions to set up TLS authentication.
Failed to pull image ... authentication required
You need to create a Secret for the Docker Registry, or you have been rate-limited from pulling new containers for a while.
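A minimal sketch of that secret for Docker Hub, with placeholder credentials, plus attaching it to the default service account so pods pick it up automatically:
kubectl create secret docker-registry regcred \
  --docker-server=https://index.docker.io/v1/ \
  --docker-username=<your-username> \
  --docker-password=<your-password>
kubectl patch serviceaccount default \
  -p '{"imagePullSecrets": [{"name": "regcred"}]}'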

Failed to pull image - Azure AKS

I have been following this guide to deploy an application on Azure using AKS.
Everything was fine until I deployed; one pod is in a not-ready state with ImagePullBackOff status.
kubectl describe pod output
Running the below command succeeds, so I am sure authentication is working:
az acr login --name siddacr
and this command lists out the image which was uploaded
az acr repository list --name <acrName> --output table
I figured it out.
The error was in the name of the image in the deployment.yml file.
ImagePullBackOff might be caused by any of the following reasons (a quick verification sketch for the first two follows the list):
The image or tag doesn’t exist
You’ve made a typo in the image name or tag
The image registry requires authentication
You’ve exceeded a rate or download limit on the registry
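To rule out a missing image or a typo, compare what the registry actually holds with what the deployment references (the ACR, repository, and deployment names are placeholders):
az acr repository show-tags --name <acrName> --repository <repositoryName> --output table
kubectl get deployment <deploymentName> -o jsonpath='{.spec.template.spec.containers[*].image}'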

ImagePullBackOff error backoff github packages azure aks

I am deploying my services to Azure AKS. I am running into an issue where I get an ImagePullBackOff error. Here is some context.
I have 2 node pools, one created with the --enable-node-public-ip option and one without it. I am trying to deploy a DaemonSet resource. The container image is hosted on the GitHub package registry. The issue is that the nodes without a public IP are able to pull the images, whereas the nodes with a public IP enabled get an error.
Here is the error:
Failed to pull image "docker.pkg.github.com/xyz": rpc error: code = NotFound desc = failed to pull and unpack image "docker.pkg.github.com/xyz"
I would appreciate help on this.
ImagePullBackOff and ErrImagePull indicate that the image used by a container cannot be loaded from the image registry. Make sure you don't have a typo in the image definition.
Try docker pull docker.pkg.github.com/OWNER/REPOSITORY/IMAGE_NAME:TAG_NAME beforehand. Afterwards, use docker.pkg.github.com/OWNER/REPOSITORY/IMAGE_NAME:TAG_NAME in the DaemonSet definition.
It is not an authentication issue.
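For example, a quick local sanity check along those lines (OWNER, REPOSITORY, IMAGE_NAME, and TAG_NAME are placeholders; log in with a personal access token first only if the registry prompts for credentials):
echo $GITHUB_TOKEN | docker login docker.pkg.github.com -u OWNER --password-stdin
docker pull docker.pkg.github.com/OWNER/REPOSITORY/IMAGE_NAME:TAG_NAME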

Pull images from an Azure container registry to a Kubernetes cluster

I have followed this tutorial (microsoft_website) to pull images from an Azure container registry. My yaml successfully creates a pod job, which can pull the image, BUT only when it runs on the agentpool node in my cluster.
For example, adding nodeName: aks-agentpool-33515997-vmss000000 to the yaml works fine, but specifying a different node name, e.g. nodeName: aks-cpu1-33515997-vmss000000, makes the pod fail. The error message I get with describe pods is Failed to pull image followed by kubelet Error: ErrImagePull.
What am I missing?
Create secret:
kubectl create secret docker-registry <secret-name> \
--docker-server=<container-registry-name>.azurecr.io \
--docker-username=<service-principal-ID> \
--docker-password=<service-principal-password>
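The secret then has to be referenced from the pod spec; a minimal sketch with the same placeholder names:
kubectl apply -f - <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: test-pull
spec:
  containers:
  - name: app
    image: <container-registry-name>.azurecr.io/<image>:<tag>
  imagePullSecrets:
  - name: <secret-name>
EOF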
As @user1571823 said, the solution to the problem is deleting the old image from the ACR and creating/pushing a new one.
The problem was related to some sort of corruption in the image saved in the Azure Container Registry (ACR). The reason one agent pool could pull the image was that the image already existed on the VM.
As @andov said, it is also a good option to open a support case for AKS with Azure from the subscription where AKS is deployed. The support team has full access to the AKS service backend and can tell you exactly what was causing your problem.
Four things to check:
Is it a subscription issue? Are the nodes in different subscriptions?
Is it a rights issue? Does the service principal of the node have rights to pull the image (see the sketch after this list)?
Is it a network issue? Are the nodes on different subnets?
Is there something about the image size or configuration that means it cannot run on the other nodes?
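A rough sketch of the rights check using the Azure CLI (resource group, cluster, and registry names are placeholders):
# find the identity the nodes use to pull images
az aks show -g myResourceGroup -n myAKSCluster --query "identityProfile.kubeletidentity.clientId" -o tsv
# look for an AcrPull assignment on the registry for that identity
az acr show -g myResourceGroup -n myACR --query id -o tsv
az role assignment list --assignee <clientId> --scope <acrResourceId> --output table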
Edit
New-AzAksNodePool has a -DefaultProfile parameter.
It can be an AzContext, AzureRmContext, or AzureCredential.
If this is different between your node pools, it would explain the error.
