Strimzi Kafka listener custom cert configuration - Azure

I am trying to configure Strimzi Kafka listener custom cert, following the documentation: https://strimzi.io/docs/operators/latest/full/configuring.html#ref-alternative-subjects-certs-for-listeners-str
I want to expose these listeners outside of the Azure Kubernetes Service cluster, within the private virtual network.
I have provided a custom certificate and private key generated by an internal CA, created a secret from them, and pointed the Kafka configuration at that secret:
kubectl create secret generic kafka-tls --from-literal=listener.cer=$cert --from-literal=listener.key=$skey -n kafka
listeners:
  - name: external
    port: 9094
    type: loadbalancer
    tls: true
    authentication:
      type: tls
    # Listener TLS config
    configuration:
      brokerCertChainAndKey:
        secretName: kafka-tls
        certificate: listener.cer
        key: listener.key
      bootstrap:
        loadBalancerIP: 10.67.249.253
        annotations:
          service.beta.kubernetes.io/azure-load-balancer-internal: "true"
      brokers:
        - broker: 0
          loadBalancerIP: 10.67.249.251
          annotations:
            service.beta.kubernetes.io/azure-load-balancer-internal: "true"
        - broker: 1
          loadBalancerIP: 10.67.249.252
          annotations:
            service.beta.kubernetes.io/azure-load-balancer-internal: "true"
        - broker: 2
          loadBalancerIP: 10.67.249.250
          annotations:
            service.beta.kubernetes.io/azure-load-balancer-internal: "true"
authorization:
  type: simple
The certificate has the following records:
SAN:
  *.kafka-datalake-prod-kafka-brokers
  *.kafka-datalake-prod-kafka-brokers.kafka.svc
  kafka-datalake-prod-kafka-bootstrap
  kafka-datalake-prod-kafka-bootstrap.kafka.svc
  kafka-datalake-prod-kafka-external-bootstrap
  kafka-datalake-prod-kafka-external-bootstrap.kafka.svc
  kafka-datalake-prod-azure.custom.domain
CN=kafka-datalake-produkty-prod-azure.custom.domain
I have also created an A record in the custom DNS zone pointing kafka-datalake-produkty-prod-azure.custom.domain to 10.67.249.253.
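For reference, a quick way to double-check what actually ended up in the secret and in DNS (the temporary file path is illustrative; the names are taken from the configuration above):

kubectl get secret kafka-tls -n kafka -o jsonpath='{.data.listener\.cer}' | base64 -d > /tmp/listener.cer
# Inspect the CN and the SAN entries of the listener certificate
openssl x509 -in /tmp/listener.cer -noout -subject
openssl x509 -in /tmp/listener.cer -noout -text | grep -A1 "Subject Alternative Name"
# Confirm the custom DNS A record resolves to the bootstrap LoadBalancer IP
nslookup kafka-datalake-produkty-prod-azure.custom.domain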
Then, I created a KafkaUser object:
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaUser
metadata:
  name: customuser
  namespace: kafka
  labels:
    strimzi.io/cluster: kafka-datalake-prod
spec:
  authentication:
    type: tls
  authorization:
    type: simple
    acls:
      - resource:
          type: topic
          name: notify.somecustomapp.prod.topic_name
          patternType: literal
        operations:
          - Create
          - Describe
          - Write
        # host: "*"
When I then retrieve the secrets from the Kafka cluster on AKS:
kubectl get secret kafka-datalake-prod-cluster-ca-cert -n kafka -o jsonpath='{.data.ca\.crt}' | base64 -d > broker.crt
kubectl get secret customuser -n kafka -o jsonpath='{.data.user\.key}' | base64 -d > customuser.key
kubectl get secret customuser -n kafka -o jsonpath='{.data.user\.crt}' | base64 -d > customuser.crt
Communication fails. When I try to connect and send some messages with a producer that uses those three files to authenticate/authorize, I get the following issue:
INFO:kafka.conn:<BrokerConnection node_id=bootstrap-0 host=10.67.249.253:9094 <connecting> [IPv4 ('10.67.249.253', 9094)]>: connecting to 10.67.249.253:9094 [('10.67.249.253', 9094) IPv4]
INFO:kafka.conn:Probing node bootstrap-0 broker version
INFO:kafka.conn:<BrokerConnection node_id=bootstrap-0 host=10.67.249.253:9094 <handshake> [IPv4 ('10.67.249.253', 9094)]>: Loading SSL CA from certs/prod/broker.crt
INFO:kafka.conn:<BrokerConnection node_id=bootstrap-0 host=10.67.249.253:9094 <handshake> [IPv4 ('10.67.249.253', 9094)]>: Loading SSL Cert from certs/prod/customuser.crt
INFO:kafka.conn:<BrokerConnection node_id=bootstrap-0 host=10.67.249.253:9094 <handshake> [IPv4 ('10.67.249.253', 9094)]>: Loading SSL Key from certs/prod/customuser.key
[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:997)
What am I doing wrong?
The communication worked perfectly fine when I used the same connection method while the cluster and its listeners were still using the default certificates generated by Strimzi.
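The same verification failure should be reproducible outside the Kafka client with openssl, which makes it easier to see which chain the listener presents and whether it verifies against the extracted CA file (the IP and file paths below are taken from the log above):

# Attempt a TLS handshake against the bootstrap load balancer with the three extracted files;
# -showcerts prints the chain the listener presents so it can be compared with broker.crt
openssl s_client -connect 10.67.249.253:9094 \
  -CAfile certs/prod/broker.crt \
  -cert certs/prod/customuser.crt \
  -key certs/prod/customuser.key \
  -showcerts </dev/null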
All the best,
Krzysztof

@Turing85 @Jakub
Many thanks for your comments - especially the critical ones.
And thanks, Jakub, for pointing me towards using the CA of the custom certificate. What needed to be done in order to fix this was:
1. Switch the value obtained from the kafka-datalake-prod-cluster-ca-cert secret to the full chain of the root CA, the intermediate signing certificate, and the listener certificate itself.
2. Add the LoadBalancer IPs of the brokers to the certificate SANs - this is stated in the documentation, yet the way it is formulated misled me into thinking that adding hostnames/service names to the SAN was enough (https://strimzi.io/docs/operators/latest/full/configuring.html#tls_listener_san_examples, and later https://strimzi.io/docs/operators/latest/full/configuring.html#external_listener_san_examples). A rough sketch of both changes follows below.
After those changes, everything started to work.
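For anyone hitting the same thing, a minimal sketch of the two changes, assuming the internal CA files are available locally (the file names below are placeholders, not from the original setup):

# 1. Build the full trust chain for clients instead of using only the Strimzi cluster CA secret
cat internal-root-ca.crt internal-intermediate-ca.crt listener.cer > broker.crt
# The listener certificate should now verify against the chain handed to clients
openssl verify -CAfile broker.crt listener.cer

# 2. Re-issue the listener certificate with the LoadBalancer IPs in its SANs, e.g.
#    subjectAltName = DNS:kafka-datalake-prod-azure.custom.domain, IP:10.67.249.253,
#                     IP:10.67.249.251, IP:10.67.249.252, IP:10.67.249.250
#    (the exact SAN list depends on how clients connect)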
Thank you for help.

Related

Hyperledger on multiple clusters: ingress and DNS routing failing

I am using 2 AKS (Azure Kubernetes Service) clusters with VNet peering to spin up Hyperledger Fabric components (one org on cluster-1, and org2 plus the orderers on cluster-2).
The issue I am facing is with using the ingress for the services/pods to communicate with each other.
Between the 2 clusters we have a private DNS to route requests to the appropriate ingress points in the clusters, and ingress resources configured (with FQDNs) to further route them to the services.
But they are failing to talk to each other. As a first priority, I am concentrating on getting the orderers to talk and elect a leader. All the orderers are in the same cluster, so using the FQDN each orderer queries the DNS for the IP of the ingress point (which it is able to resolve, confirmed by using ping in the pod, though the ping itself fails), and from then onwards I am not sure what is happening.
The requests strangely end up erroring at the IP addresses of the same 2 pods every time.
2022-06-21 01:28:24.294 UTC [core.comm] ServerHandshake -> ERRO 19558 Server TLS handshake failed in 5.000908746s with error read tcp 10.192.0.72:7050->10.192.0.55:24182: i/o timeout server=Orderer remoteaddress=10.192.0.55:24182
2022-06-21 01:28:24.294 UTC [grpc] Warningf -> DEBU 19559 grpc: Server.Serve failed to complete security handshake from "10.192.0.55:24182": read tcp 10.192.0.72:7050->10.192.0.55:24182: i/o timeout
2022-06-21 01:28:25.495 UTC [core.comm] ServerHandshake -> ERRO 1955a Server TLS handshake failed in 5.000462596s with error read tcp 10.192.0.72:7050->10.192.0.4:11226: i/o timeout server=Orderer remoteaddress=10.192.0.4:11226
2022-06-21 01:28:25.495 UTC [grpc] Warningf -> DEBU 1955b grpc: Server.Serve failed to complete security handshake from "10.192.0.4:11226": read tcp 10.192.0.72:7050->10.192.0.4:11226: i/o timeout
2022-06-21 01:28:28.781 UTC [orderer.consensus.etcdraft] Step -> INFO 1955c 2 is starting a new election at term 1 channel=syschannel node=2
The 2 IP addresses 10.192.0.4 and 10.192.0.55 belong to the Kubernetes azure-ip-masq-agent/calico-node pods. Below is my ingress resource.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: ingress-orderer3
  namespace: istakapazatestnet
  uid: 03736608-2e29-4744-8b07-eaa67408f72e
  resourceVersion: '6818484'
  generation: 1
  creationTimestamp: '2022-06-20T14:47:55Z'
  labels:
    component: orderer3
    type: orderer
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: >
      {"apiVersion":"networking.k8s.io/v1","kind":"Ingress","metadata":{"annotations":{"kubernetes.io/ingress.class":"nginx","nginx.ingress.kubernetes.io/rewrite-target":"/","nginx.ingress.kubernetes.io/ssl-redirect":"true"},"labels":{"component":"orderer3","type":"orderer"},"name":"ingress-orderer3","namespace":"istakapazatestnet"},"spec":{"rules":[{"host":"orderer3.istakapaza.com","http":{"paths":[{"backend":{"service":{"name":"orderer3-service","port":{"number":7050}}},"path":"/","pathType":"ImplementationSpecific"}]}}],"tls":[{"hosts":["orderer3.istakapaza.com"],"secretName":"ssl-secret"}]}}
    kubernetes.io/ingress.class: nginx
    nginx.ingress.kubernetes.io/rewrite-target: /
  selfLink: >-
    /apis/networking.k8s.io/v1/namespaces/istakapazatestnet/ingresses/ingress-orderer3
status:
  loadBalancer:
    ingress:
      - ip: 10.224.0.6
spec:
  tls:
    - hosts:
        - orderer3.istakapaza.com
      secretName: ssl-secret
  rules:
    - host: orderer3.istakapaza.com
      http:
        paths:
          - path: /
            pathType: ImplementationSpecific
            backend:
              service:
                name: orderer3-service
                port:
                  number: 7050
I also tried a couple of combinations for the orderer ports and addresses in configtx.yaml, along with attaching SSL certs to the ingress. It looks like I am not able to figure this out. Any help will be much appreciated.
I know it sounds simple to route all the services through the ingress, but somehow I am missing something.
If anyone needs further resources, please just ping me. Below is one combination I tried for configtx. I think we are not supposed to change configtx, but I tried it anyway.
Orderer: &OrdererDefaults
  OrdererType: etcdraft
  EtcdRaft:
    Consenters:
      - Host: orderer.istakapaza.com
        Port: 443
        ClientTLSCert: /scripts/crypto-config/ordererOrganizations/istakapaza.com/orderers/orderer.istakapaza.com/tls/server.crt
        ServerTLSCert: /scripts/crypto-config/ordererOrganizations/istakapaza.com/orderers/orderer.istakapaza.com/tls/server.crt
      - Host: orderer2.istakapaza.com
        Port: 443
        ClientTLSCert: /scripts/crypto-config/ordererOrganizations/istakapaza.com/orderers/orderer2.istakapaza.com/tls/server.crt
        ServerTLSCert: /scripts/crypto-config/ordererOrganizations/istakapaza.com/orderers/orderer2.istakapaza.com/tls/server.crt
      - Host: orderer3.istakapaza.com
        Port: 443
        ClientTLSCert: /scripts/crypto-config/ordererOrganizations/istakapaza.com/orderers/orderer3.istakapaza.com/tls/server.crt
        ServerTLSCert: /scripts/crypto-config/ordererOrganizations/istakapaza.com/orderers/orderer3.istakapaza.com/tls/server.crt
  Addresses:
    - https://orderer.istakapaza.com
    - https://orderer2.istakapaza.com
    - https://orderer3.istakapaza.com
  BatchTimeout: 2s
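As a first step in narrowing this down, it may help to confirm from inside the cluster that an orderer FQDN resolves to the ingress and that a TLS session can actually be established through it (the debug image is an assumption; hostnames are taken from the resources above). Note that ping failing against an ingress or load balancer IP is not by itself conclusive, since ICMP is often blocked even when TCP/TLS works.

# Run a throwaway debug pod and test DNS resolution plus a TLS handshake through the ingress
kubectl run net-debug --rm -it --image=nicolaka/netshoot -- sh -c '
  nslookup orderer3.istakapaza.com &&
  openssl s_client -connect orderer3.istakapaza.com:443 -servername orderer3.istakapaza.com </dev/null | head -n 30'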

Istio Multicluster primary-remote on Azure AKS

I am trying to create a multi-cluster Istio primary-remote setup.
First I created two Azure AKS clusters, used Azure CNI for the network configuration, and the following are the settings of the clusters.
First cluster
vnet istioclusterone - 10.10.0.0/20
subnet default - 10.10.0.0/20
k8s service address range 10.100.0.0/16
DNS service ip - 10.100.0.10
Docker Bridge address - 172.17.0.1/16
DNS-prefix - app-cluster-dns
Second cluster
vnet istioclusterone - 10.11.0.0/20
subnet default - 10.11.0.0/20
k8s service address range 10.101.0.0/16
DNS service ip - 10.101.0.10
Docker Bridge address - 172.18.0.1/16
DNS-prefix - processing-cluster-dns
Other than this, I went with the default settings.
Next, I followed the articles below to set up the Istio multi-cluster:
Before you begin
Primary-remote
The last step in the second article, setting up cluster2 as the remote, failed.
I found the errors below in the logs of the istio-ingressgateway pod:
2022-04-11T07:51:00.352057Z warning envoy config StreamAggregatedResources gRPC config stream closed since 431s ago: 14, connection error: desc = "transport: authentication handshake failed: x509: certificate signed by unknown authority (possibly because of \"crypto/rsa: verification error\" while trying to verify candidate authority certificate \"cluster.local\")"
2022-04-11T07:51:08.514428Z warning envoy config StreamAggregatedResources gRPC config stream closed since 439s ago: 14, connection error: desc = "transport: authentication handshake failed: x509: certificate signed by unknown authority (possibly because of \"crypto/rsa: verification error\" while trying to verify candidate authority certificate \"cluster.local\")"
2022-04-11T07:51:12.462140Z warning envoy config StreamAggregatedResources gRPC config stream closed since 443s ago: 14, connection error: desc = "transport: authentication handshake failed: x509: certificate signed by unknown authority (possibly because of \"crypto/rsa: verification error\" while trying to verify candidate authority certificate \"cluster.local\")"
2022-04-11T07:51:39.950935Z warning envoy config StreamAggregatedResources gRPC config stream closed since 471s ago: 14, connection error: desc = "transport: authentication handshake failed: x509: certificate signed by unknown authority (possibly because of \"crypto/rsa: verification error\" while trying to verify candidate authority certificate \"cluster.local\")"
Has anyone tried this scenario? Please share your insights.
Thanks.
Update:
I used custom certs for both clusters and the previous error was solved.
Then I created a gateway in both clusters:
apiVersion: networking.istio.io/v1alpha3
kind: Gateway
metadata:
  name: cluster-aware-gateway
  namespace: istio-system
spec:
  selector:
    istio: ingressgateway
  servers:
    - port:
        number: 443
        name: tls
        protocol: TLS
      tls:
        mode: AUTO_PASSTHROUGH
      hosts:
        - "*.local"
Now I am getting a new error. See the logs below from the istio-ingressgateway-575ccb4d79 pod in cluster2:
2022-04-13T09:14:04.650502Z warning envoy config StreamAggregatedResources gRPC config stream closed since 60s ago: 14, connection error: desc = "transport: Error while dialing dial tcp <publicIPofEastWestgateway>:15012: i/o timeout"
2022-04-13T09:14:27.026016Z warning envoy config StreamAggregatedResources gRPC config stream closed since 83s ago: 14, connection error: desc = "transport: Error while dialing dial tcp <publicIPofEastWestgateway>:15012: i/o timeout"
What I understood here: I have an east-west gateway installed in cluster1 as in the documentation (linkToDoc).
cluster2 is trying to access cluster1 using the public IP of the east-west gateway on port 15012, and that is failing.
I checked the security groups and the port is open. I tried telnet from a test pod within the cluster to check; it is failing.
Can anyone help me here?
Thanks
It looks like a firewall issue. Not sure if it will help, but try opening ports 15012 and 15443 on the remote cluster's outbound rules, towards the east-west gateway ELB IP (primary cluster).
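A rough way to check both ends, assuming the standard east-west gateway setup from the Istio multi-cluster docs (the service name, debug image, and kubectl contexts below are assumptions):

# In the primary cluster: confirm the east-west gateway service has an external IP
# and exposes the expected ports (15012 for the control plane, 15443 for cross-cluster mTLS)
kubectl --context="$CTX_CLUSTER1" -n istio-system get svc istio-eastwestgateway

# From the remote cluster: probe those ports on the gateway's public IP
kubectl --context="$CTX_CLUSTER2" run port-check --rm -it --image=nicolaka/netshoot -- \
  sh -c 'nc -vz <publicIPofEastWestgateway> 15012; nc -vz <publicIPofEastWestgateway> 15443'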

How to connect Vault with Consul agent on Kubernetes via Helm chart (Consul server is on Azure managed app)

I am requesting some help on: how to connect Vault with the Consul agent on Kubernetes via a Helm chart (the Consul server is an Azure managed app).
I'm trying to build a POC for Vault and Consul and have some questions.
Deployed the Azure managed app using https://learn.hashicorp.com/tutorials/consul/hashicorp-consul-service-deploy
Installed the Consul agent on AKS with the steps in https://learn.hashicorp.com/tutorials/consul/hashicorp-consul-service-aks?in=consul/hcs-azure
Consul Helm chart: https://github.com/hashicorp/consul-helm
Installed Vault via the Helm chart: https://github.com/hashicorp/vault-helm
Kubernetes services and pods for consul.
~$kubectl get svc -n consul
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
consul-connect-injector-svc ClusterIP 10.0.252.97 <none> 443/TCP 2d13h
consul-controller-webhook ClusterIP 10.0.169.80 <none> 443/TCP 2d13h
~$kubectl get pods -n consul
NAME READY STATUS RESTARTS AGE
consul-27j4j 1/1 Running 0 2d13h
consul-connect-injector-webhook-deployment-9454b8d68-778rd 1/1 Running 0 2d13h
consul-controller-7857456f99-mhzpw 1/1 Running 1 2d13h
consul-lkhpl 1/1 Running 0 2d13h
consul-webhook-cert-manager-cfbb689f7-fgtlw 1/1 Running 0 2d13h
consul-zf989 1/1 Running 0 2d13h
vault config as below:
ui:
  enabled: true
  serviceType: LoadBalancer
server:
  ingress:
    enabled: true
    extraPaths:
      - path: /
        backend:
          serviceName: vault-ui
          servicePort: 8200
    hosts:
      - host: vault.something_masked.com
  ha:
    enabled: true
    config: |
      ui = true
      listener "tcp" {
        tls_disable = 1
        address = "[::]:8200"
        cluster_address = "[::]:8201"
      }
      storage "consul" {
        path = "vault/"
        scheme = "https"
        address = "HOST_IP:8500"
      }
Vault pods
kubectl get pods -n vault
NAME READY STATUS RESTARTS AGE
vault-0 0/1 Running 0 7m14s
vault-1 0/1 Running 0 7m11s
vault-2 0/1 Running 0 7m11s
vault-agent-injector-cbbb6f4df-rmbd7 1/1 Running 0 7m22s
ERROR: Vault is unable to communicate with the Consul agent.
Logs for vault-0 pod
kubectl logs vault-0 -n vault
WARNING! Unable to read storage migration status.
2021-06-27T08:37:17.801Z [INFO] proxy environment: http_proxy="" https_proxy="" no_proxy=""
2021-06-27T08:37:18.824Z [WARN] storage migration check error: error="Get "https://10.54.0.206:8500/v1/kv/vault/core/migration": dial tcp 10.54.0.206:8500: connect: connection refused"
Logs for vault-agent-injector pod
kubectl logs vault-agent-injector-cbbb6f4df-rmbd7 -n vault
2021-06-27T08:37:09.189Z [INFO] handler: Starting handler..
Listening on ":8080"...
2021-06-27T08:37:09.218Z [INFO] handler.auto-tls: Generated CA
2021-06-27T08:37:09.219Z [INFO] handler.certwatcher: Updated certificate bundle received. Updating certs...
2021-06-27T08:37:18.252Z [INFO] handler: Request received: Method=POST URL=/mutate?timeout=10s
2021-06-27T08:37:18.452Z [INFO] handler: Request received: Method=POST URL=/mutate?timeout=10s
Any suggestions or advice on the Vault configuration, in case I have missed something?
Thank you in advance.
Regards
Pooja
There are many points to debug here and the problem could be anywhere.
The major issue I am seeing right now is that your Vault pods are running but not in a Ready state.
You have to unseal Vault from its sealed status.
Read more about unsealing: https://learn.hashicorp.com/tutorials/vault/ha-with-consul#step-5-start-vault-and-verify-its-state
Also, how is your Consul interacting with Vault? Using an LB, Ingress, or service name?
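Given the connection-refused error above, one quick thing to verify is whether anything is actually listening on the address and port that Vault's storage stanza points at (the debug image is an assumption; substitute the HOST_IP value your pods actually resolve):

# Probe the Consul client HTTP(S) port from inside the cluster
kubectl -n vault run consul-port-check --rm -it --image=nicolaka/netshoot -- \
  nc -vz <HOST_IP> 8500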
I am using the Consul Azure managed app server, and I have installed the Consul agent on AKS.
kubectl get svc -n consul
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
consul-connect-injector-svc ClusterIP 10.0.252.97 <none> 443/TCP 3d13h
consul-controller-webhook ClusterIP 10.0.169.80 <none> 443/TCP 3d13h
As I am using the Consul agent, I do not see a consul-server running.
Helm chart config for consul
global:
  enabled: false
  name: consul
  datacenter: dc1
  acls:
    manageSystemACLs: true
    bootstrapToken:
      secretName: XXX-sandbox-managed-app-bootstrap-token
      secretKey: token
  gossipEncryption:
    secretName: XXX-sandbox-managed-app-hcs
    secretKey: gossipEncryptionKey
  tls:
    enabled: true
    enableAutoEncrypt: true
    caCert:
      secretName: XXX-sandbox-managed-app-hcs
      secretKey: caCert
externalServers:
  enabled: true
  hosts: ['XXX.az.hashicorp.cloud']
  httpsPort: 443
  useSystemRoots: true
  k8sAuthMethodHost: https://XXX.uksouth.azmk8s.io:443
client:
  enabled: true
  # If you are using Kubenet in your AKS cluster (the default network),
  # uncomment the line below.
  # exposeGossipPorts: true
  join: ['XXX.az.hashicorp.cloud']
connectInject:
  enabled: true
controller:
  enabled: true
Helm chart config for vault
ui:
  enabled: true
  serviceType: LoadBalancer
server:
  ingress:
    enabled: true
    extraPaths:
      - path: /
        backend:
          serviceName: vault-ui
          servicePort: 8200
    hosts:
      - host: something.com
  ha:
    enabled: true
    config: |
      ui = true
      listener "tcp" {
        tls_disable = 1
        address = "[::]:8200"
        cluster_address = "[::]:8201"
      }
      storage "consul" {
        path = "vault/"
        scheme = "https"
        address = "HOST_IP:8500"
      }
Error in the Vault pod, which is unable to connect to the Consul agent:
kubectl logs vault-0 -n vault
WARNING! Unable to read storage migration status.
2021-06-28T08:13:13.041Z [INFO] proxy environment: http_proxy="" https_proxy="" no_proxy=""
2021-06-28T08:13:13.042Z [WARN] storage migration check error: error="Get "https://127.0.0.1:8500/v1/kv/vault/core/migration": dial tcp 127.0.0.1:8500: connect: connection refused"
I am not sure if some configuration is missing in the Consul Helm chart, as I do not see any service running on port 8500 in the consul namespace.
Any suggestion would be much appreciated.
Thanks,
pooja

Kubernetes challenge waiting for http-01 propagation: dial tcp: no such host

I am trying to create a Kubernetes cluster namespace with auto-generated DNS for the ingress, secured with Let's Encrypt TLS certificates. Unfortunately I'm running into some trouble and do not know where to look for the solution.
Deployment is done with a multi-stage YAML pipeline into an AKS cluster; I've set up an nginx ingress controller and cert-manager, both in a separate namespace. The deployment succeeds and everything seems to be running, but the exposed hostnames from the ingress are not reachable. When taking a look at the certificates I see the following:
Name: letsencrypt-tls-cd
Namespace: myApp-dev
Labels: app.kubernetes.io/instance=myApp
app.kubernetes.io/managed-by=Helm
app.kubernetes.io/name=cd
app.kubernetes.io/version=9.3.0
helm.sh/chart=cd-1.0.0
Annotations: <none>
API Version: cert-manager.io/v1alpha3
Kind: Certificate
Metadata:
Creation Timestamp: 2020-06-15T11:59:53Z
Generation: 1
Owner References:
API Version: extensions/v1beta1
Block Owner Deletion: true
Controller: true
Kind: Ingress
Name: myApp-cd
UID: a6cbbf69-749e-4dd1-81cc-37a817051690
Resource Version: 1218430
Self Link: /apis/cert-manager.io/v1alpha3/namespaces/myApp-dev/certificates/letsencrypt-tls-cd
UID: 46ac0acb-71bf-4dbc-a376-c024e92d68ca
Spec:
Dns Names:
cd-myApp-dev.dev
Issuer Ref:
Group: cert-manager.io
Kind: Issuer
Name: letsencrypt-prod
Secret Name: letsencrypt-tls-cd
Status:
Conditions:
Last Transition Time: 2020-06-15T11:59:53Z
Message: Waiting for CertificateRequest "letsencrypt-tls-cd-95531636" to complete
Reason: InProgress
Status: False
Type: Ready
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal GeneratedKey 57m cert-manager Generated a new private key
Normal Requested 57m cert-manager Created new CertificateRequest resource "letsencrypt-tls-cd-95531636"
Looking into the certificate request :
Name: letsencrypt-tls-cd-95531636
Namespace: myApp-dev
Labels: app.kubernetes.io/instance=myApp
app.kubernetes.io/managed-by=Helm
app.kubernetes.io/name=cd
app.kubernetes.io/version=9.3.0
helm.sh/chart=cd-1.0.0
Annotations: cert-manager.io/certificate-name: letsencrypt-tls-cd
cert-manager.io/private-key-secret-name: letsencrypt-tls-cd
API Version: cert-manager.io/v1alpha3
Kind: CertificateRequest
Metadata:
Creation Timestamp: 2020-06-15T11:59:54Z
Generation: 1
Owner References:
API Version: cert-manager.io/v1alpha2
Block Owner Deletion: true
Controller: true
Kind: Certificate
Name: letsencrypt-tls-cd
UID: 46ac0acb-71bf-4dbc-a376-c024e92d68ca
Resource Version: 1218442
Self Link: /apis/cert-manager.io/v1alpha3/namespaces/myApp-dev/certificaterequests/letsencrypt-tls-cd-95531636
UID: 2bef5e93-6722-43c0-bd2c-283d70334b1c
Spec:
Csr: mySecret
Issuer Ref:
Group: cert-manager.io
Kind: Issuer
Name: letsencrypt-prod
Status:
Conditions:
Last Transition Time: 2020-06-15T11:59:54Z
Message: Waiting on certificate issuance from order myApp-dev/letsencrypt-tls-cd-95531636-1679437339: "pending"
Reason: Pending
Status: False
Type: Ready
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal OrderCreated 58m cert-manager Created Order resource myApp-dev/letsencrypt-tls-cd-95531636-1679437339
And the challenge:
Name: letsencrypt-tls-cm-1259919220-2936945618-694921812
Namespace: myApp-dev
Labels: <none>
Annotations: <none>
API Version: acme.cert-manager.io/v1alpha3
Kind: Challenge
Metadata:
Creation Timestamp: 2020-06-15T11:59:55Z
Finalizers:
finalizer.acme.cert-manager.io
Generation: 1
Owner References:
API Version: acme.cert-manager.io/v1alpha2
Block Owner Deletion: true
Controller: true
Kind: Order
Name: letsencrypt-tls-cm-1259919220-2936945618
UID: 4d8eab8e-449b-494e-a751-912a77671223
Resource Version: 1218492
Self Link: /apis/acme.cert-manager.io/v1alpha3/namespaces/myApp-dev/challenges/letsencrypt-tls-cm-1259919220-2936945618-694921812
UID: 8b355336-309a-4192-83b7-41397ebc20ac
Spec:
Authz URL: https://acme-v02.api.letsencrypt.org/acme/authz-v3/5253543313
Dns Name: cm-myApp-dev.dev
Issuer Ref:
Group: cert-manager.io
Kind: Issuer
Name: letsencrypt-prod
Key: 0USdpDsQg7_NY1FB138oj6O3AtVVKn6rkdxUSBQk4KI.qZ3FGlVmwRY6MwBNqUR5iktM1fJWdXxFWZYFOpjSUkQ
Solver:
http01:
Ingress:
Class: nginx
Pod Template:
Metadata:
Spec:
Node Selector:
kubernetes.io/os: linux
Token: 0USdpDsQg7_NY1FB138oj6O3AtVVKn6rkdxUSBQk4KI
Type: http-01
URL: https://acme-v02.api.letsencrypt.org/acme/chall-v3/5253543313/1eUG0g
Wildcard: false
Status:
Presented: true
Processing: true
Reason: Waiting for http-01 challenge propagation: failed to perform self check GET request 'http://cm-myApp-dev.dev/.well-known/acme-challenge/0USdpDsQg7_NY1FB138oj6O3AtVVKn6rkdxUSBQk4KI': Get "http://cm-myApp-dev.dev/.well-known/acme-challenge/0USdpDsQg7_NY1FB138oj6O3AtVVKn6rkdxUSBQk4KI": dial tcp: lookup cm-myApp-dev.dev on 10.0.0.10:53: no such host
State: pending
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Started 2m15s cert-manager Challenge scheduled for processing
Normal Presented 2m14s cert-manager Presented challenge using http-01 challenge mechanism
I'm quite new to Kubernetes and don't know where to look to fix the error below; any help is greatly appreciated.
Waiting for http-01 challenge propagation: failed to perform self check GET request 'http://cm-myApp-dev.dev/.well-known/acme-challenge/0USdpDsQg7_NY1FB138oj6O3AtVVKn6rkdxUSBQk4KI': Get "http://cm-myApp-dev.dev/.well-known/acme-challenge/0USdpDsQg7_NY1FB138oj6O3AtVVKn6rkdxUSBQk4KI": dial tcp: lookup cm-myApp-dev.dev on 10.0.0.10:53: no such host
Looking at the ingress controller, I get the following errors:
7 controller.go:1374] Error getting SSL certificate "myApp-dev/letsencrypt-tls-cd": local SSL certificate myApp-dev/letsencrypt-tls-cd was not found
W0616 06:24:29.033235 7 controller.go:1119] Error getting SSL certificate "myApp-dev/letsencrypt-tls-cm": local SSL certificate myApp-dev/letsencrypt-tls-cm was not found. Using default certificate
W0616 06:24:29.033264 7 controller.go:1374] Error getting SSL certificate "myApp-dev/letsencrypt-tls-cd": local SSL certificate myApp-dev/letsencrypt-tls-cd was not found
I0616 06:24:50.355937 7 status.go:275] updating Ingress myApp-dev/cm-acme-http-solver-9z88h status from [] to [{10.240.0.252 } {10.240.1.58 }]
W0616 06:24:50.363181 7 controller.go:1119] Error getting SSL certificate "myApp-dev/letsencrypt-tls-cm": local SSL certificate myApp-dev/letsencrypt-tls-cm was not found. Using default certificate
W0616 06:24:50.363346 7 controller.go:1374] Error getting SSL certificate "myApp-dev/letsencrypt-tls-cd": local SSL certificate myApp-dev/letsencrypt-tls-cd was not found
I0616 06:24:50.363514 7 event.go:278] Event(v1.ObjectReference{Kind:"Ingress", Namespace:"myApp-dev", Name:"cm-acme-http-solver-9z88h", UID:"1b53f4dc-1b52-4f11-9cd0-6ffe1d0d9d40", APIVersion:"networking.k8s.io/v1beta1", ResourceVersion:"1451371", FieldPath:""}): type: 'Normal' reason: 'UPDATE' Ingress myApp-dev/cm-acme-http-solver-9z88h
If someone Googles this, then know that this issue can also be caused by DNS caching in your Kubernetes cluster. In this case it is a transient error, but in some contexts speed could be important (e.g. if you are a managed service provider).
I wrote about it here, but in summary:
cert-manager would emit the "no such host" error for a while, and eventually succeed
my coredns ConfigMap (in kube-system namespace) stipulated local DNS resolvers, and a 30 sec cache
you can fix the delay by (1) removing the cache, and (2) pointing the resolver to Google DNS (or another, depending on your needs)
Hope this pointer is helpful to someone.
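A couple of commands that can make the caching/resolution behaviour visible, assuming the hostname from the question and the cluster DNS IP shown in the error (the debug image is an assumption):

# See how the cluster's DNS currently resolves the challenge hostname
kubectl run dns-check --rm -it --image=nicolaka/netshoot -- dig cm-myApp-dev.dev @10.0.0.10

# Inspect the CoreDNS ConfigMap for forwarders and cache settings
kubectl -n kube-system get configmap coredns -o yaml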
The problem was that the top level domain name we were using was not valid, therefore the ingress didn't refer to a valid domain and threw an error.
Creating a valid top level domain and implementing it in our deployment solved the problem.
You can refer to this link to configure cert-manager on AKS. It will automatically create the TLS secret too, once the certificate gets validated and reaches the Ready state.
Remember to add DNS records for the domain, such as A and CNAME records, to route traffic to the Kubernetes load balancer, e.g. for cm-myApp-dev.dev or any other subdomains.

Kubernetes ingress "an error on the server ("") has prevented the request from succeeding"

I have a managed Azure cluster (AKS) with an nginx ingress in it.
It was working fine, but now the nginx ingress has stopped working:
# kubectl -v=7 logs nginx-ingress-<pod-hash> -n nginx-ingress
GET https://<PRIVATE-IP-SVC-Kubernetes>:443/version?timeout=32s
I1205 16:59:31.791773 9 round_trippers.go:423] Request Headers:
I1205 16:59:31.791779 9 round_trippers.go:426] Accept: application/json, */*
Unexpected error discovering Kubernetes version (attempt 2): an error on the server ("") has prevented the request from succeeding
# kubectl describe svc kubernetes
Name: kubernetes
Namespace: default
Labels: component=apiserver
provider=kubernetes
Annotations: <none>
Selector: <none>
Type: ClusterIP
IP: <PRIVATE-IP-SVC-Kubernetes>
Port: https 443/TCP
TargetPort: 443/TCP
Endpoints: <PUBLIC-IP-SVC-Kubernetes>:443
Session Affinity: None
Events: <none>
When I try to curl https://<PRIVATE-IP-SVC-Kubernetes>:443/version?timeout=32s, I always see the same output:
curl: (35) SSL connect error
On my OCP 4.7 (OpenShift Container Platform) instance with 3 master and 2 worker nodes, the following error appears after kubectl and oc commands.
$ kubectl version
Client Version: version.Info{Major:"1", Minor:"20", GitVersion:"v1.20.1-5-g76a04fc", GitCommit:"e29b355", GitTreeState:"clean", BuildDate:"2021-06-03T21:19:58Z", GoVersion:"go1.15.7", Compiler:"gc", Platform:"linux/amd64"}
Error from server (InternalError): an error on the server ("") has prevented the request from succeeding
$ oc get nodes
Error from server (InternalError): an error on the server ("") has prevented the request from succeeding
Also, when I wanted to login to the OCP dashboard, the following error occurred:
error_description": "The authorization server encountered an unexpected condition that prevented it from fulfilling the request
I restarted all the master node machines and then the problem was solved.
I faced the same issue with a three-manager cluster that I was accessing through a UCP client bundle. I figured out that 2 out of 3 manager nodes were in a NotReady state. On debugging further, I found a disk space issue on those NotReady boxes. After cleaning up a little (mainly the /var folder) and restarting Docker, those nodes came back to the Ready state and I am no longer getting this error.
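For reference, the kind of cleanup described above might look roughly like this on each NotReady manager node (run over SSH; the commands assume a Docker-based node):

# Check where /var is filling up
df -h /var
du -sh /var/lib/docker /var/log 2>/dev/null

# Reclaim space from unused images and containers, then restart the runtime
docker system prune -f
sudo systemctl restart docker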
On Windows: edit host file (vi /etc/hosts) and replace the line with:
127.0.0.1 ~/.kube/config
Worked for me !!!
