How to connect Vault with Consul agent on Kubernetes via Helm chart (Consul server is on Azure managed app) - azure

Requesting some help on how to connect Vault with the Consul agent on Kubernetes via the Helm chart (the Consul server is an Azure managed app).
I'm trying to build a POC for Vault and Consul and have some questions.
Deployed the Azure managed app using https://learn.hashicorp.com/tutorials/consul/hashicorp-consul-service-deploy
Installed the Consul agent on AKS with the steps in https://learn.hashicorp.com/tutorials/consul/hashicorp-consul-service-aks?in=consul/hcs-azure
Consul Helm chart: https://github.com/hashicorp/consul-helm
Installed Vault via the Helm chart: https://github.com/hashicorp/vault-helm
Kubernetes services and pods for Consul:
~$kubectl get svc -n consul
NAME                          TYPE        CLUSTER-IP    EXTERNAL-IP   PORT(S)   AGE
consul-connect-injector-svc   ClusterIP   10.0.252.97   <none>        443/TCP   2d13h
consul-controller-webhook     ClusterIP   10.0.169.80   <none>        443/TCP   2d13h
~$kubectl get pods -n consul
NAME                                                         READY   STATUS    RESTARTS   AGE
consul-27j4j                                                 1/1     Running   0          2d13h
consul-connect-injector-webhook-deployment-9454b8d68-778rd   1/1     Running   0          2d13h
consul-controller-7857456f99-mhzpw                           1/1     Running   1          2d13h
consul-lkhpl                                                 1/1     Running   0          2d13h
consul-webhook-cert-manager-cfbb689f7-fgtlw                  1/1     Running   0          2d13h
consul-zf989                                                 1/1     Running   0          2d13h
vault config as below:
ui:
  enabled: true
  serviceType: LoadBalancer
server:
  ingress:
    enabled: true
    extraPaths:
      - path: /
        backend:
          serviceName: vault-ui
          servicePort: 8200
    hosts:
      - host: vault.something_masked.com
  ha:
    enabled: true
    config: |
      ui = true

      listener "tcp" {
        tls_disable = 1
        address = "[::]:8200"
        cluster_address = "[::]:8201"
      }

      storage "consul" {
        path = "vault/"
        scheme = "https"
        address = "HOST_IP:8500"
      }
Vault pods
kubectl get pods -n vault
NAME                                   READY   STATUS    RESTARTS   AGE
vault-0                                0/1     Running   0          7m14s
vault-1                                0/1     Running   0          7m11s
vault-2                                0/1     Running   0          7m11s
vault-agent-injector-cbbb6f4df-rmbd7   1/1     Running   0          7m22s
ERROR: Vault is unable to communicate with the Consul agent.
Logs for vault-0 pod
kubectl logs vault-0 -n vault
WARNING! Unable to read storage migration status.
2021-06-27T08:37:17.801Z [INFO] proxy environment: http_proxy="" https_proxy="" no_proxy=""
2021-06-27T08:37:18.824Z [WARN] storage migration check error: error="Get "https://10.54.0.206:8500/v1/kv/vault/core/migration": dial tcp 10.54.0.206:8500: connect: connection refused"
Logs for vault-agent-injector pod
kubectl logs vault-agent-injector-cbbb6f4df-rmbd7 -n vault
2021-06-27T08:37:09.189Z [INFO] handler: Starting handler..
Listening on ":8080"...
2021-06-27T08:37:09.218Z [INFO] handler.auto-tls: Generated CA
2021-06-27T08:37:09.219Z [INFO] handler.certwatcher: Updated certificate bundle received. Updating certs...
2021-06-27T08:37:18.252Z [INFO] handler: Request received: Method=POST URL=/mutate?timeout=10s
2021-06-27T08:37:18.452Z [INFO] handler: Request received: Method=POST URL=/mutate?timeout=10s
Any suggestions or advice on the Vault configuration, in case I have missed something?
Thank you in advance.
Regards
Pooja

There are many points to debug here and the problem could be anywhere, but the major issue I am seeing right now is that your Vault pods are running but not in a Ready state.
You have to unseal Vault, since it is still in a sealed state.
Read more about unsealing here: https://learn.hashicorp.com/tutorials/vault/ha-with-consul#step-5-start-vault-and-verify-its-state
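For reference, a minimal unseal sketch, assuming the default Shamir seal and the vault namespace/pod names from your output (the keys are placeholders you get back from init):
# Initialize Vault once and save the unseal keys and root token it prints.
kubectl exec -n vault vault-0 -- vault operator init
# Unseal with three of the five unseal keys (the default threshold).
kubectl exec -n vault vault-0 -- vault operator unseal <unseal-key-1>
kubectl exec -n vault vault-0 -- vault operator unseal <unseal-key-2>
kubectl exec -n vault vault-0 -- vault operator unseal <unseal-key-3>
# Repeat the unseal step for vault-1 and vault-2, then check the seal status.
kubectl exec -n vault vault-0 -- vault status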
Also, how is your Consul interacting with Vault? Using a LB, Ingress, or service name?

I am using the Consul Azure managed app server and I have installed the Consul agent on AKS.
kubectl get svc -n consul
NAME                          TYPE        CLUSTER-IP    EXTERNAL-IP   PORT(S)   AGE
consul-connect-injector-svc   ClusterIP   10.0.252.97   <none>        443/TCP   3d13h
consul-controller-webhook     ClusterIP   10.0.169.80   <none>        443/TCP   3d13h
As I am using only the Consul agent, I do not see a consul-server pod running.
Helm chart config for Consul:
global:
  enabled: false
  name: consul
  datacenter: dc1
  acls:
    manageSystemACLs: true
    bootstrapToken:
      secretName: XXX-sandbox-managed-app-bootstrap-token
      secretKey: token
  gossipEncryption:
    secretName: XXX-sandbox-managed-app-hcs
    secretKey: gossipEncryptionKey
  tls:
    enabled: true
    enableAutoEncrypt: true
    caCert:
      secretName: XXX-sandbox-managed-app-hcs
      secretKey: caCert
externalServers:
  enabled: true
  hosts: ['XXX.az.hashicorp.cloud']
  httpsPort: 443
  useSystemRoots: true
  k8sAuthMethodHost: https://XXX.uksouth.azmk8s.io:443
client:
  enabled: true
  # If you are using Kubenet in your AKS cluster (the default network),
  # uncomment the line below.
  # exposeGossipPorts: true
  join: ['XXX.az.hashicorp.cloud']
connectInject:
  enabled: true
controller:
  enabled: true
Helm chart config for vault
ui:
  enabled: true
  serviceType: LoadBalancer
server:
  ingress:
    enabled: true
    extraPaths:
      - path: /
        backend:
          serviceName: vault-ui
          servicePort: 8200
    hosts:
      - host: something.com
  ha:
    enabled: true
    config: |
      ui = true

      listener "tcp" {
        tls_disable = 1
        address = "[::]:8200"
        cluster_address = "[::]:8201"
      }

      storage "consul" {
        path = "vault/"
        scheme = "https"
        address = "HOST_IP:8500"
      }
Error in the Vault pod, which is unable to connect to the Consul agent:
kubectl logs vault-0 -n vault
WARNING! Unable to read storage migration status.
2021-06-28T08:13:13.041Z [INFO] proxy environment: http_proxy="" https_proxy="" no_proxy=""
2021-06-28T08:13:13.042Z [WARN] storage migration check error: error="Get "https://127.0.0.1:8500/v1/kv/vault/core/migration": dial tcp 127.0.0.1:8500: connect: connection refused"
I am not sure if some configuration is missing in the Consul Helm chart, as I do not see any service running on port 8500 in the consul namespace (a quick check is sketched below).
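For reference, a minimal way to check whether the Consul client DaemonSet exposes its HTTP (8500) or HTTPS (8501) listener on the node (the DaemonSet name consul is an assumption based on the pod names above):
# List the container/host ports declared on the client DaemonSet.
kubectl -n consul get daemonset consul -o jsonpath='{.spec.template.spec.containers[0].ports}'
# Or simply:
kubectl -n consul describe daemonset consul | grep -i port
Note that with TLS and auto-encrypt enabled, the client typically serves HTTPS on 8501 rather than 8500, so a storage stanza with scheme = "https" would also need the matching port.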
Any suggestion would be much appreciated.
Thanks,
pooja

Related

Unable to get Azure Key Vault integrated with Azure Kubernetes Service

Stuck on getting this integration working. I'm following the documentation step-by-step.
The following is everything I have done starting from scratch, so if it isn't listed here, I haven't tried it (I apologize in advance for the long series of commands):
# create the resource group
az group create -l westus -n k8s-test
# create the azure container registery
az acr create -g k8s-test -n k8stestacr --sku Basic -l westus
# create the azure key vault and add a test value to it
az keyvault create --name k8stestakv --resource-group k8s-test -l westus
az keyvault secret set --vault-name k8stestakv --name SECRETTEST --value abc123
# create the azure kubernetes service
az aks create -n k8stestaks -g k8s-test --kubernetes-version=1.19.7 --node-count 1 -l westus --enable-managed-identity --attach-acr k8stestacr -s Standard_B2s
# switch to the aks context
az aks get-credentials -n k8stestaks -g k8s-test
# install helm charts for secrets store csi
helm repo add csi-secrets-store-provider-azure https://raw.githubusercontent.com/Azure/secrets-store-csi-driver-provider-azure/master/charts
helm install csi-secrets-store-provider-azure/csi-secrets-store-provider-azure --generate-name
# create role managed identity operator
az role assignment create --role "Managed Identity Operator" --assignee <k8stestaks_clientId> --scope /subscriptions/<subscriptionId>/resourcegroups/MC_k8s-test_k8stestaks_westus
# create role virtual machine contributor
az role assignment create --role "Virtual Machine Contributor" --assignee <k8stestaks_clientId> --scope /subscriptions/<subscriptionId>/resourcegroups/MC_k8s-test_k8stestaks_westus
# install more helm charts
helm repo add aad-pod-identity https://raw.githubusercontent.com/Azure/aad-pod-identity/master/charts
helm install pod-identity aad-pod-identity/aad-pod-identity
# create identity
az identity create -g MC_k8s-test_k8stestaks_westus -n TestIdentity
# give the new identity a reader role for AKV
az role assignment create --role "Reader" --assignee <TestIdentity_principalId> --scope /subscriptions/<subscription_id/resourceGroups/k8s-test/providers/Microsoft.KeyVault/vaults/k8stestakv
# allow the identity to get secrets from AKV
az keyvault set-policy -n k8stestakv --secret-permissions get --spn <TestIdentity_clientId>
That is pretty much it for az cli commands. Everything up to this point executes fine with no errors. I can go into the portal, see these roles for the MC_ group, the TestIdentity with read-only for secrets, etc.
After that, the documentation has you build secretProviderClass.yaml:
apiVersion: secrets-store.csi.x-k8s.io/v1alpha1
kind: SecretProviderClass
metadata:
  name: azure-kvname
spec:
  provider: azure
  parameters:
    usePodIdentity: "true"
    useVMManagedIdentity: "false"
    userAssignedIdentityID: ""
    keyvaultName: "k8stestakv"
    cloudName: ""
    objects: |
      array:
        - |
          objectName: SECRETTEST
          objectType: secret
          objectVersion: ""
    resourceGroup: "k8s-test"
    subscriptionId: "<subscriptionId>"
    tenantId: "<tenantId>"
And also the podIdentityBinding.yaml:
apiVersion: aadpodidentity.k8s.io/v1
kind: AzureIdentity
metadata:
  name: azureIdentity
spec:
  type: 0
  resourceID: /subscriptions/<subscriptionId>/resourcegroups/MC_k8s-test_k8stestaks_westus/providers/Microsoft.ManagedIdentity/userAssignedIdentities/TestIdentity
  clientID: <TestIdentity_clientId>
---
apiVersion: aadpodidentity.k8s.io/v1
kind: AzureIdentityBinding
metadata:
  name: azure-pod-identity-binding
spec:
  azureIdentity: azureIdentity
  selector: azure-pod-identity-binding-selector
Then just apply them:
# this one executes fine
kubectl apply -f k8s/secret/secretProviderClass.yaml
# this one does not
kubectl apply -f k8s/identity/podIdentityBinding.yaml
Problem #1
With the last one I get:
unable to recognize "k8s/identity/podIdentityBinding.yaml": no matches for kind "AzureIdentity" in version "aadpodidentity.k8s.io/v1"
unable to recognize "k8s/identity/podIdentityBinding.yaml": no matches for kind "AzureIdentityBinding" in version "aadpodidentity.k8s.io/v1"
Not sure why because the helm install pod-identity aad-pod-identity/aad-pod-identity command was successful. Looking at my Pods however...
Problem #2
I've followed these steps three times and every time the issue is the same--the aad-pod-identity-nmi-xxxxx will not launch:
$ kubectl get pods
NAME                                                              READY   STATUS             RESTARTS   AGE
aad-pod-identity-mic-7b4558845f-hwv8t                             1/1     Running            0          37m
aad-pod-identity-mic-7b4558845f-w8mxt                             1/1     Running            0          37m
aad-pod-identity-nmi-4sf5q                                        0/1     CrashLoopBackOff   12         37m
csi-secrets-store-provider-azure-1613256848-cjlwc                 1/1     Running            0          41m
csi-secrets-store-provider-azure-1613256848-secrets-store-m4wth   3/3     Running            0          41m
$ kubectl describe pod aad-pod-identity-nmi-4sf5q
Name: aad-pod-identity-nmi-4sf5q
Namespace: default
Priority: 0
Node: aks-nodepool1-40626841-vmss000000/10.240.0.4
Start Time: Sat, 13 Feb 2021 14:57:54 -0800
Labels: app.kubernetes.io/component=nmi
app.kubernetes.io/instance=pod-identity
app.kubernetes.io/managed-by=Helm
app.kubernetes.io/name=aad-pod-identity
controller-revision-hash=669df55fd8
helm.sh/chart=aad-pod-identity-3.0.3
pod-template-generation=1
tier=node
Annotations: <none>
Status: Running
IP: 10.240.0.4
IPs:
IP: 10.240.0.4
Controlled By: DaemonSet/aad-pod-identity-nmi
Containers:
nmi:
Container ID: containerd://5f9e17e95ae395971dfd060c1db7657d61e03052ffc3cbb59d01c774bb4a2f6a
Image: mcr.microsoft.com/oss/azure/aad-pod-identity/nmi:v1.7.4
Image ID: mcr.microsoft.com/oss/azure/aad-pod-identity/nmi#sha256:0b4e296a7b96a288960c39dbda1a3ffa324ef33c77bb5bd81a4266b85efb3498
Port: <none>
Host Port: <none>
Args:
--node=$(NODE_NAME)
--http-probe-port=8085
--operation-mode=standard
--kubelet-config=/etc/default/kubelet
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Exit Code: 255
Started: Sat, 13 Feb 2021 15:34:40 -0800
Finished: Sat, 13 Feb 2021 15:34:40 -0800
Ready: False
Restart Count: 12
Limits:
cpu: 200m
memory: 512Mi
Requests:
cpu: 100m
memory: 256Mi
Liveness: http-get http://:8085/healthz delay=10s timeout=1s period=5s #success=1 #failure=3
Environment:
NODE_NAME: (v1:spec.nodeName)
FORCENAMESPACED: false
Mounts:
/etc/default/kubelet from kubelet-config (ro)
/run/xtables.lock from iptableslock (rw)
/var/run/secrets/kubernetes.io/serviceaccount from aad-pod-identity-nmi-token-8sfh4 (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
iptableslock:
Type: HostPath (bare host directory volume)
Path: /run/xtables.lock
HostPathType: FileOrCreate
kubelet-config:
Type: HostPath (bare host directory volume)
Path: /etc/default/kubelet
HostPathType:
aad-pod-identity-nmi-token-8sfh4:
Type: Secret (a volume populated by a Secret)
SecretName: aad-pod-identity-nmi-token-8sfh4
Optional: false
QoS Class: Burstable
Node-Selectors: kubernetes.io/os=linux
Tolerations: node.kubernetes.io/disk-pressure:NoSchedule op=Exists
node.kubernetes.io/memory-pressure:NoSchedule op=Exists
node.kubernetes.io/network-unavailable:NoSchedule op=Exists
node.kubernetes.io/not-ready:NoExecute op=Exists
node.kubernetes.io/pid-pressure:NoSchedule op=Exists
node.kubernetes.io/unreachable:NoExecute op=Exists
node.kubernetes.io/unschedulable:NoSchedule op=Exists
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 38m default-scheduler Successfully assigned default/aad-pod-identity-nmi-4sf5q to aks-nodepool1-40626841-vmss000000
Normal Pulled 38m kubelet Successfully pulled image "mcr.microsoft.com/oss/azure/aad-pod-identity/nmi:v1.7.4" in 14.677657725s
Normal Pulled 38m kubelet Successfully pulled image "mcr.microsoft.com/oss/azure/aad-pod-identity/nmi:v1.7.4" in 5.976721016s
Normal Pulled 37m kubelet Successfully pulled image "mcr.microsoft.com/oss/azure/aad-pod-identity/nmi:v1.7.4" in 627.112255ms
Normal Pulling 37m (x4 over 38m) kubelet Pulling image "mcr.microsoft.com/oss/azure/aad-pod-identity/nmi:v1.7.4"
Normal Pulled 37m kubelet Successfully pulled image "mcr.microsoft.com/oss/azure/aad-pod-identity/nmi:v1.7.4" in 794.669637ms
Normal Created 37m (x4 over 38m) kubelet Created container nmi
Normal Started 37m (x4 over 38m) kubelet Started container nmi
Warning BackOff 3m33s (x170 over 38m) kubelet Back-off restarting failed container
I guess I'm not sure if both problems are related and I haven't been able to get the failing Pod to start up.
Any suggestions here?
It looks like this is related to the default network plugin that AKS picks for you if you don't specify "Advanced" for the network options: kubenet.
This integration can still be done with kubenet, as outlined here:
https://azure.github.io/aad-pod-identity/docs/configure/aad_pod_identity_on_kubenet/
If you are creating a new cluster, enable Advanced networking or add the --network-plugin azure flag and parameter, for example as shown below.
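A minimal sketch of creating the cluster with Azure CNI instead of kubenet (resource names reused from the question; the other flags mirror the original az aks create call):
# Same cluster as before, but with the Azure CNI network plugin.
az aks create -n k8stestaks -g k8s-test \
  --network-plugin azure \
  --node-count 1 -l westus \
  --enable-managed-identity \
  --attach-acr k8stestacr -s Standard_B2s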

Kubernetes challenge waiting for http-01 propagation: dial tcp: no such host

I am trying to create a Kubernetes cluster namespace with auto-generated DNS for the ingress, secured with Let's Encrypt TLS certificates. Unfortunately I'm running into some trouble and do not know where to look for the solution.
Deployment is done with a multi-stage YAML pipeline into an AKS cluster; I've set up an nginx ingress controller and cert-manager, both in a separate namespace. The deployment succeeds and everything seems to be running, but the exposed hostnames from the ingress are not reachable. When taking a look at the certificates I see the following:
Name: letsencrypt-tls-cd
Namespace: myApp-dev
Labels: app.kubernetes.io/instance=myApp
app.kubernetes.io/managed-by=Helm
app.kubernetes.io/name=cd
app.kubernetes.io/version=9.3.0
helm.sh/chart=cd-1.0.0
Annotations: <none>
API Version: cert-manager.io/v1alpha3
Kind: Certificate
Metadata:
Creation Timestamp: 2020-06-15T11:59:53Z
Generation: 1
Owner References:
API Version: extensions/v1beta1
Block Owner Deletion: true
Controller: true
Kind: Ingress
Name: myApp-cd
UID: a6cbbf69-749e-4dd1-81cc-37a817051690
Resource Version: 1218430
Self Link: /apis/cert-manager.io/v1alpha3/namespaces/myApp-dev/certificates/letsencrypt-tls-cd
UID: 46ac0acb-71bf-4dbc-a376-c024e92d68ca
Spec:
Dns Names:
cd-myApp-dev.dev
Issuer Ref:
Group: cert-manager.io
Kind: Issuer
Name: letsencrypt-prod
Secret Name: letsencrypt-tls-cd
Status:
Conditions:
Last Transition Time: 2020-06-15T11:59:53Z
Message: ***Waiting for CertificateRequest "letsencrypt-tls-cd-95531636" to complete***
Reason: InProgress
Status: False
Type: Ready
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal GeneratedKey 57m cert-manager Generated a new private key
Normal Requested 57m cert-manager Created new CertificateRequest resource "letsencrypt-tls-cd-95531636"
Looking into the certificate request :
Name: letsencrypt-tls-cd-95531636
Namespace: myApp-dev
Labels: app.kubernetes.io/instance=myApp
app.kubernetes.io/managed-by=Helm
app.kubernetes.io/name=cd
app.kubernetes.io/version=9.3.0
helm.sh/chart=cd-1.0.0
Annotations: cert-manager.io/certificate-name: letsencrypt-tls-cd
cert-manager.io/private-key-secret-name: letsencrypt-tls-cd
API Version: cert-manager.io/v1alpha3
Kind: CertificateRequest
Metadata:
Creation Timestamp: 2020-06-15T11:59:54Z
Generation: 1
Owner References:
API Version: cert-manager.io/v1alpha2
Block Owner Deletion: true
Controller: true
Kind: Certificate
Name: letsencrypt-tls-cd
UID: 46ac0acb-71bf-4dbc-a376-c024e92d68ca
Resource Version: 1218442
Self Link: /apis/cert-manager.io/v1alpha3/namespaces/myApp-dev/certificaterequests/letsencrypt-tls-cd-95531636
UID: 2bef5e93-6722-43c0-bd2c-283d70334b1c
Spec:
Csr: mySecret
Issuer Ref:
Group: cert-manager.io
Kind: Issuer
Name: letsencrypt-prod
Status:
Conditions:
Last Transition Time: 2020-06-15T11:59:54Z
Message: Waiting on certificate issuance from order myApp-dev/letsencrypt-tls-cd-95531636-1679437339: "pending"
Reason: Pending
Status: False
Type: Ready
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal OrderCreated 58m cert-manager Created Order resource myApp-dev/letsencrypt-tls-cd-95531636-1679437339
And the challenge:
Name: letsencrypt-tls-cm-1259919220-2936945618-694921812
Namespace: myApp-dev
Labels: <none>
Annotations: <none>
API Version: acme.cert-manager.io/v1alpha3
Kind: Challenge
Metadata:
Creation Timestamp: 2020-06-15T11:59:55Z
Finalizers:
finalizer.acme.cert-manager.io
Generation: 1
Owner References:
API Version: acme.cert-manager.io/v1alpha2
Block Owner Deletion: true
Controller: true
Kind: Order
Name: letsencrypt-tls-cm-1259919220-2936945618
UID: 4d8eab8e-449b-494e-a751-912a77671223
Resource Version: 1218492
Self Link: /apis/acme.cert-manager.io/v1alpha3/namespaces/myApp-dev/challenges/letsencrypt-tls-cm-1259919220-2936945618-694921812
UID: 8b355336-309a-4192-83b7-41397ebc20ac
Spec:
Authz URL: https://acme-v02.api.letsencrypt.org/acme/authz-v3/5253543313
Dns Name: cm-myApp-dev.dev
Issuer Ref:
Group: cert-manager.io
Kind: Issuer
Name: letsencrypt-prod
Key: 0USdpDsQg7_NY1FB138oj6O3AtVVKn6rkdxUSBQk4KI.qZ3FGlVmwRY6MwBNqUR5iktM1fJWdXxFWZYFOpjSUkQ
Solver:
http01:
Ingress:
Class: nginx
Pod Template:
Metadata:
Spec:
Node Selector:
kubernetes.io/os: linux
Token: 0USdpDsQg7_NY1FB138oj6O3AtVVKn6rkdxUSBQk4KI
Type: http-01
URL: https://acme-v02.api.letsencrypt.org/acme/chall-v3/5253543313/1eUG0g
Wildcard: false
Status:
Presented: true
Processing: true
Reason: Waiting for http-01 challenge propagation: failed to perform self check GET request 'http://cm-myApp-dev.dev/.well-known/acme-challenge/0USdpDsQg7_NY1FB138oj6O3AtVVKn6rkdxUSBQk4KI': Get "http://cm-myApp-dev.dev/.well-known/acme-challenge/0USdpDsQg7_NY1FB138oj6O3AtVVKn6rkdxUSBQk4KI": dial tcp: lookup cm-myApp-dev.dev on 10.0.0.10:53: no such host
State: pending
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Started 2m15s cert-manager Challenge scheduled for processing
Normal Presented 2m14s cert-manager Presented challenge using http-01 challenge mechanism
I'm quite new to Kubernetes and don't know where to look to fix the error below; any help is greatly appreciated.
Waiting for http-01 challenge propagation: failed to perform self check GET request 'http://cm-myApp-dev.dev/.well-known/acme-challenge/0USdpDsQg7_NY1FB138oj6O3AtVVKn6rkdxUSBQk4KI': Get "http://cm-myApp-dev.dev/.well-known/acme-challenge/0USdpDsQg7_NY1FB138oj6O3AtVVKn6rkdxUSBQk4KI": dial tcp: lookup cm-myApp-dev.dev on 10.0.0.10:53: no such host
Looking in the ingress controller, I get the following errors:
7 controller.go:1374] Error getting SSL certificate "myApp-dev/letsencrypt-tls-cd": local SSL certificate myApp-dev/letsencrypt-tls-cd was not found
W0616 06:24:29.033235 7 controller.go:1119] Error getting SSL certificate "myApp-dev/letsencrypt-tls-cm": local SSL certificate myApp-dev/letsencrypt-tls-cm was not found. Using default certificate
W0616 06:24:29.033264 7 controller.go:1374] Error getting SSL certificate "myApp-dev/letsencrypt-tls-cd": local SSL certificate myApp-dev/letsencrypt-tls-cd was not found
I0616 06:24:50.355937 7 status.go:275] updating Ingress myApp-dev/cm-acme-http-solver-9z88h status from [] to [{10.240.0.252 } {10.240.1.58 }]
W0616 06:24:50.363181 7 controller.go:1119] Error getting SSL certificate "myApp-dev/letsencrypt-tls-cm": local SSL certificate myApp-dev/letsencrypt-tls-cm was not found. Using default certificate
W0616 06:24:50.363346 7 controller.go:1374] Error getting SSL certificate "myApp-dev/letsencrypt-tls-cd": local SSL certificate myApp-dev/letsencrypt-tls-cd was not found
I0616 06:24:50.363514 7 event.go:278] Event(v1.ObjectReference{Kind:"Ingress", Namespace:"myApp-dev", Name:"cm-acme-http-solver-9z88h", UID:"1b53f4dc-1b52-4f11-9cd0-6ffe1d0d9d40", APIVersion:"networking.k8s.io/v1beta1", ResourceVersion:"1451371", FieldPath:""}): type: 'Normal' reason: 'UPDATE' Ingress myApp-dev/cm-acme-http-solver-9z88h
If someone Googles this, know that this issue can also be caused by DNS caching in your Kubernetes cluster. In that case it is a transient error, but in some contexts speed could be important (e.g. if you are a managed service provider).
I wrote about it here, but in summary:
cert-manager would emit the "no such host" error for a while, and eventually succeed
my coredns ConfigMap (in the kube-system namespace) stipulated local DNS resolvers, and a 30-second cache
you can fix the delay by (1) removing the cache, and (2) pointing the resolver to Google DNS (or another, depending on your needs)
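A minimal sketch of that CoreDNS change, assuming a stock CoreDNS install in kube-system (on AKS the supported place for overrides is usually the coredns-custom ConfigMap instead):
# Inspect the current Corefile.
kubectl -n kube-system get configmap coredns -o yaml
# Edit it: shrink or drop the cache and forward to an external resolver, e.g.
#   cache 5                      # was: cache 30
#   forward . 8.8.8.8 8.8.4.4    # was: forward . /etc/resolv.conf
kubectl -n kube-system edit configmap coredns
# Restart CoreDNS so the new Corefile is picked up.
kubectl -n kube-system rollout restart deployment coredns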
Hope this pointer is helpful to someone.
The problem was that the top level domain name we were using was not valid, therefore the ingress didn't refer to a valid domain and threw an error.
Creating a valid top level domain and implementing it in our deployment solved the problem.
You can refer to this link to configure cert-manager on AKS. It will automatically create the TLS secret too, once the certificate gets validated and reaches the Ready state.
Remember to add DNS records for the domain, such as A and CNAME records, to route traffic to the Kubernetes load balancer.
E.g. for cm-myApp-dev.dev or any other subdomain (an example follows below).
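For example, if the zone happened to be hosted in Azure DNS (an assumption; any DNS provider works the same way), an A record pointing the hostname at the ingress controller's public IP might look like:
# Find the ingress controller's external IP (the namespace depends on how it was installed).
kubectl get svc -n ingress-nginx
# Create the A record (zone, resource group and record name are placeholders).
az network dns record-set a add-record \
  -g <dns-resource-group> -z <your-zone> \
  -n <record-name> -a <ingress-external-ip>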

K8s Issue connecting to Cassandra on Mac OS (via Node.js)

While trying to set up a Cassandra database in a local Kubernetes cluster on macOS (via Minikube), I am getting connection issues.
It seems like Node.js is not able to resolve DNS settings correctly, but resolving via command line DOES work.
The setup is as following (simplified):
Cassandra Service
apiVersion: v1
kind: Service
metadata:
  labels:
    app: cassandra
  name: cassandra
spec:
  type: NodePort
  ports:
    - port: 9042
      targetPort: 9042
      protocol: TCP
      name: http
  selector:
    app: cassandra
In addition, there's a PersistentVolume and a StatefulSet.
The application itself is very basic
apiVersion: apps/v1beta1
kind: Deployment
metadata:
  name: app1
  labels:
    app: app1
spec:
  replicas: 1
  selector:
    matchLabels:
      app: app1
  template:
    metadata:
      labels:
        app: app1
    spec:
      containers:
        - name: app1
          image: xxxx.dkr.ecr.us-west-2.amazonaws.com/acme/app1
          imagePullPolicy: "Always"
          ports:
            - containerPort: 3003
And a service
apiVersion: v1
kind: Service
metadata:
  name: app1
  namespace: default
spec:
  selector:
    app: app1
  type: NodePort
  ports:
    - port: 3003
      targetPort: 3003
      protocol: TCP
      name: http
There is also a simple Ingress setup:
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: ingress
  annotations:
    kubernetes.io/ingress.class: nginx
spec:
  rules:
    - host: dev.acme.com
      http:
        paths:
          - path: /app1
            backend:
              serviceName: app1
              servicePort: 3003
And I added the Minikube IP address to /etc/hosts:
192.xxx.xx.xxx dev.acme.com
So far so good.
When trying to call dev.acme.com/app1 via Postman, the Node.js app itself is called correctly (I can see it in the logs). HOWEVER, the app cannot connect to Cassandra and times out with the following error:
"All host(s) tried for query failed. First host tried,
92.242.140.2:9042: DriverError: Connection timeout. See innerErrors."
The IP 92.242.140.2 seems to be just a public IP related to my ISP; I believe this is because the app is not able to resolve the service name.
I created a simple node.js script to test dns:
var dns = require('dns')
dns.resolve6('cassandra', (err, res) => console.log('ERR:', err, 'RES:', res))
and the response is
ERR: { Error: queryAaaa ENOTFOUND cassandra
at QueryReqWrap.onresolve [as oncomplete] (dns.js:197:19) errno: 'ENOTFOUND', code: 'ENOTFOUND', syscall: 'queryAaaa', hostname:
'cassandra' } RES: undefined
However, and this is where it gets confusing - when I ssh into the pod (app1), I am able to connect to cassandra service using:
cqlsh cassandra 9042 --cqlversion=3.4.4
So it seems the pod is "aware" of the service name, but the Node.js runtime is not.
Any idea what could cause the node.js to not being able to resolve the service name/dns settings?
UPDATE
After re-installing the whole cluster, including re-installing docker, kubectl and minikube I am getting the same issue.
While running ping cassandra from app1 container via ssh, I am getting the following
PING cassandra.default.svc.cluster.local (10.96.239.137) 56(84) bytes of data.
64 bytes from cassandra.default.svc.cluster.local (10.96.239.137): icmp_seq=1 ttl=61 time=27.0 ms
2 packets transmitted, 2 received, 0% packet loss, time 1001ms
Which seems to be fine.
However, when running from Node.js runtime I am still getting the same error -
"All host(s) tried for query failed. First host tried,
92.242.140.2:9042: DriverError: Connection timeout. See innerErrors."
These are the services
NAME         TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)          AGE
app1         ClusterIP   None            <none>        3003/TCP         11m
cassandra    NodePort    10.96.239.137   <none>        9042:32564/TCP   38h
kubernetes   ClusterIP   10.96.0.1       <none>        443/TCP          38h
And these are the pods (all namespaces)
NAMESPACE     NAME                                        READY   STATUS    RESTARTS   AGE
default       app1-85d889db5-m977z                        1/1     Running   0          2m1s
default       cassandra-0                                 1/1     Running   0          38h
kube-system   calico-etcd-ccvs8                           1/1     Running   0          38h
kube-system   calico-node-thzwx                           2/2     Running   0          38h
kube-system   calico-policy-controller-5bb4fc6cdc-cnhrt   1/1     Running   0          38h
kube-system   coredns-86c58d9df4-z8pr4                    1/1     Running   0          38h
kube-system   coredns-86c58d9df4-zcn6p                    1/1     Running   0          38h
kube-system   default-http-backend-5ff9d456ff-84zb5       1/1     Running   0          38h
kube-system   etcd-minikube                               1/1     Running   0          38h
kube-system   kube-addon-manager-minikube                 1/1     Running   0          38h
kube-system   kube-apiserver-minikube                     1/1     Running   0          38h
kube-system   kube-controller-manager-minikube            1/1     Running   0          38h
kube-system   kube-proxy-jj7c4                            1/1     Running   0          38h
kube-system   kube-scheduler-minikube                     1/1     Running   0          38h
kube-system   kubernetes-dashboard-ccc79bfc9-6jtgq        1/1     Running   4          38h
kube-system   nginx-ingress-controller-7c66d668b-rvxpc    1/1     Running   0          38h
kube-system   registry-creds-x5bhl                        1/1     Running   0          38h
kube-system   storage-provisioner                         1/1     Running   0          38h
UPDATE 2
The code to connect to Cassandra from Node.js:
const cassandra = require('cassandra-driver');
const client = new cassandra.Client({ contactPoints: ['cassandra:9042'], localDataCenter: 'datacenter1', keyspace: 'auth_server' });
const query = 'SELECT * FROM user';
client.execute(query, [])
.then(result => console.log('User with email %s', result.rows[0].email));
It DOES work when replacing cassandra:9042 with 10.96.239.137:9042 (10.96.239.137 is the IP address received from pinging cassandra via the CLI).
The Cassandra driver for Node.js uses resolve4/resolve6 to do its DNS lookup, which bypasses your resolv.conf file. A program like ping uses resolv.conf to resolve 'cassandra' to 'cassandra.default.svc.cluster.local', the actual DNS name assigned to your Cassandra service. For a more detailed explanation of name resolution in Node.js, see here.
The fix is simple: just pass the full service name to your client:
const client = new cassandra.Client({ contactPoints: ['cassandra.default.svc.cluster.local:9042'], localDataCenter: 'datacenter1', keyspace: 'auth_server' });
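A quick way to see the difference from inside the app1 pod (a sketch; the pod name is a placeholder, and it assumes the node binary is available in the app image):
# The short name fails, because resolve4 skips the resolv.conf search domains...
kubectl exec -it <app1-pod> -- node -e 'require("dns").resolve4("cassandra", console.log)'
# ...while the fully qualified service name resolves.
kubectl exec -it <app1-pod> -- node -e 'require("dns").resolve4("cassandra.default.svc.cluster.local", console.log)'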

Cannot access Web API deployed in Azure ACS Kubernetes Cluster

Please help. I am trying to deploy a web API to an Azure ACS Kubernetes cluster. It is a simple web API created in VSTS, and the result should look like this: { "value1", "value2" }.
I plan to make the type ClusterIP, but I want to test and access it first, which is why this is a LoadBalancer. The pods are running with no restarts (I think that's good).
The guide I'm following is: Running Web API using Docker and Kubernetes
NAME                TYPE           CLUSTER-IP     EXTERNAL-IP    PORT(S)        AGE
kubernetes          ClusterIP      10.0.0.1       <none>         443/TCP        3d
sampleapi-service   LoadBalancer   10.0.238.155   102.51.223.6   80:31676/TCP   1h
When I tried to browse the IP 102.51.223.6/api/values it says:
"This site can’t be reached"
service.yaml
kind: Service
apiVersion: v1
metadata:
  name: sampleapi-service
  labels:
    name: sampleapi
    app: sampleapi
spec:
  selector:
    name: sampleapi
  ports:
    - protocol: "TCP"
      # Port accessible inside the cluster
      port: 80
      # Port to forward to inside the pod
      targetPort: 80
      # Port accessible outside the cluster
      #nodePort: 80
  type: LoadBalancer
deployment.yml
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: sampleapi-deployment
spec:
  replicas: 3
  template:
    metadata:
      labels:
        app: sampleapi
    spec:
      containers:
        - name: sampleapi
          image: mycontainerregistry.azurecr.io/sampleapi:latest
          ports:
            - containerPort: 80
POD
Name: sampleapi-deployment-498305766-zzs2z
Namespace: default
Node: c103facs9001/10.240.0.4
Start Time: Fri, 27 Jul 2018 00:20:06 +0000
Labels: app=sampleapi
pod-template-hash=498305766
Annotations: kubernetes.io/created-by={"kind":"SerializedReference","apiVersion":"v1","reference":{"kind":"ReplicaSet","namespace":"default","name":"sampleapi-deployme
-498305766","uid":"d064a8e0-9132-11e8-b58d-0...
Status: Running
IP: 10.244.2.223
Controlled By: ReplicaSet/sampleapi-deployment-498305766
Containers:
sampleapi:
Container ID: docker://19d414c87ebafe1cc99d101ac60f1113533e44c24552c75af4ec197d3d3c9c53
Image: mycontainerregistry.azurecr.io/sampleapi:latest
Image ID: docker-pullable://mycontainerregistry.azurecr.io/sampleapi#sha256:9635a9df168ef76a6a27cd46cb15620d762657e9b57a5ac2514ba0b9a8f47a8d
Port: 80/TCP
Host Port: 0/TCP
State: Running
Started: Fri, 27 Jul 2018 00:20:48 +0000
Ready: True
Restart Count: 0
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-mj5m1 (ro)
Conditions:
Type Status
Initialized True
Ready True
PodScheduled True
Volumes:
default-token-mj5m1:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-mj5m1
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: <none>
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 50m default-scheduler Successfully assigned sampleapi-deployment-498305766-zzs2z to c103facs9001
Normal SuccessfulMountVolume 50m kubelet, c103facs9001 MountVolume.SetUp succeeded for volume "default-token-mj5m1"
Normal Pulling 49m kubelet, c103facs9001 pulling image "mycontainerregistry.azurecr.io/sampleapi:latest"
Normal Pulled 49m kubelet, c103facs9001 Successfully pulled image "mycontainerregistry.azurecr.io/sampleapi:latest"
Normal Created 49m kubelet, c103facs9001 Created container
Normal Started 49m kubelet, c103facs9001 Started container
It seems to me that your service isn't set to a port on the container. You have your targetPort commented out, so the service is reachable on port 80 but doesn't know which port on the pod to target.
You will need to check the service, which exposes the internal port on an external IP:port that can be used in your browser to access the API. Try this after deploying your deployment and service YAML files:
kubectl get service sampleapi-service
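A quick way to verify the Service is actually selecting the pods and forwarding to the right port (generic kubectl checks; names taken from the question):
# Confirm the external IP and the port mapping.
kubectl get service sampleapi-service
# An empty endpoints list means the Service selector does not match the pod labels
# (compare spec.selector in service.yaml with the labels in deployment.yml).
kubectl get endpoints sampleapi-service
# Show the labels the pods actually carry.
kubectl get pods --show-labels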

Does kubernetes require internet access when using a private registry?

I have a question about Kubernetes and network firewall rules. I want to secure my Kubernetes cluster with firewall rules, and was wondering whether the workers/masters need internet access. I'm planning on using a private registry located on my network, but I'm having problems getting it to work when the workers don't have internet access. Here's an example:
Name: foo
Namespace: default
Node: worker003/192.168.30.1
Start Time: Mon, 23 Jan 2017 10:33:07 -0500
Labels: <none>
Status: Pending
IP:
Controllers: <none>
Containers:
foo:
Container ID:
Image: registry.company.org/wop_java/app:nginx
Image ID:
Port:
State: Waiting
Reason: ContainerCreating
Ready: False
Restart Count: 0
Volume Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-3cg0w (ro)
Environment Variables: <none>
Conditions:
Type Status
Initialized True
Ready False
PodScheduled True
Volumes:
default-token-3cg0w:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-3cg0w
QoS Class: BestEffort
Tolerations: <none>
Events:
FirstSeen LastSeen Count From SubObjectPath Type Reason Message
--------- -------- ----- ---- ------------- -------- ------ -------
5m 5m 1 {default-scheduler } Normal Scheduled Successfully assigned foo to worker003
4m 1m 4 {kubelet worker003} Warning FailedSync Error syncing pod, skipping: failed to "StartContainer" for "POD" with ErrImagePull: "image pull failed for gcr.io/google_containers/pause-amd64:3.0, this may be because there are no credentials on this request. details: (Error response from daemon: {\"message\":\"Get https://gcr.io/v1/_ping: dial tcp 74.125.192.82:443: i/o timeout\"})"
3m 3s 9 {kubelet worker003} Warning FailedSync Error syncing pod, skipping: failed to "StartContainer" for "POD" with ImagePullBackOff: "Back-off pulling image \"gcr.io/google_containers/pause-amd64:3.0\""
My question is, does kubernetes require internet access to work? If yes, where is it documented officially?
You need to pass the --pod-infra-container-image argument to the kubelet, as documented here: https://kubernetes.io/docs/admin/kubelet/.
It defaults to gcr.io/google_containers/pause-amd64:3.0, which is unsuccessfully pulled on your machine since gcr.io is unavailable.
You can easily transfer the pause image to your private registry:
docker pull gcr.io/google_containers/pause-amd64:3.0
docker tag gcr.io/google_containers/pause-amd64:3.0 REGISTRY.PRIVATE/google_containers/pause-amd64:3.0
docker push REGISTRY.PRIVATE/google_containers/pause-amd64:3.0
# and pass
kubelet --pod-infra-container-image=REGISTRY.PRIVATE/google_containers/pause-amd64:3.0 ...
The pause container is created prior to your container in order to allocate and keep the network and IPC namespaces across restarts.
Kubernetes does not need any internet access for normal operation when all required containers and components are provided by the private repository. A good starting point is the Bare Metal offline provisioning guide.
They do not need internet access, but you are not getting access to the private registry you are designating. Have you looked at https://kubernetes.io/docs/user-guide/images/? It has a couple of good options on how to get access to a private registry. https://kubernetes.io/docs/tasks/configure-pod-container/pull-image-private-registry/ also has some details on it. We specify imagePullSecrets and it works fine; a sketch follows below.
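A minimal imagePullSecrets sketch, assuming the private registry from the question (the secret name, credentials and registry URL are placeholders):
# Create a docker-registry pull secret.
kubectl create secret docker-registry regcred \
  --docker-server=registry.company.org \
  --docker-username=<user> \
  --docker-password=<password>
# Either reference it per pod under spec.imagePullSecrets,
# or attach it to the default service account for the namespace:
kubectl patch serviceaccount default \
  -p '{"imagePullSecrets": [{"name": "regcred"}]}'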
