Hyperledger on multiple clusters: ingress and DNS routing failing - Azure

I am using two AKS (Azure Kubernetes Service) clusters with VNet peering to spin up Hyperledger Fabric components (one org on cluster-1, and org2 plus the orderers on cluster-2).
The issue I am facing is with using ingress for the services/pods to communicate with each other.
Between the two clusters we have a private DNS zone to route requests to the appropriate ingress points in each cluster, and ingress resources configured (with FQDNs) to route them further to the services.
But they are failing to talk to each other. As a first priority, I am concentrating on getting the orderers to talk to each other and elect a leader. All the orderers are in the same cluster, so using the FQDN each orderer queries the DNS for the IP of the ingress point (which it is able to resolve - confirmed with ping from inside the pod, although the ping itself fails), and from then onwards I am not sure what is happening.
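One way to see which certificate is actually presented at the FQDN (a sketch, run from a pod or a peered VM that can resolve the private DNS zone; the hostname comes from the ingress shown below):
openssl s_client -connect orderer3.istakapaza.com:443 -servername orderer3.istakapaza.com </dev/null 2>/dev/null | openssl x509 -noout -subject -issuer
# if the subject belongs to the ingress controller's certificate rather than the orderer's own TLS cert,
# the TLS handshake is being terminated at the ingress instead of reaching the orderer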
The requests end up in an error involving the IP addresses of the same two pods every time.
2022-06-21 01:28:24.294 UTC [core.comm] ServerHandshake -> ERRO 19558 Server TLS handshake failed in 5.000908746s with error read tcp 10.192.0.72:7050->10.192.0.55:24182: i/o timeout server=Orderer remoteaddress=10.192.0.55:24182
2022-06-21 01:28:24.294 UTC [grpc] Warningf -> DEBU 19559 grpc: Server.Serve failed to complete security handshake from "10.192.0.55:24182": read tcp 10.192.0.72:7050->10.192.0.55:24182: i/o timeout
2022-06-21 01:28:25.495 UTC [core.comm] ServerHandshake -> ERRO 1955a Server TLS handshake failed in 5.000462596s with error read tcp 10.192.0.72:7050->10.192.0.4:11226: i/o timeout server=Orderer remoteaddress=10.192.0.4:11226
2022-06-21 01:28:25.495 UTC [grpc] Warningf -> DEBU 1955b grpc: Server.Serve failed to complete security handshake from "10.192.0.4:11226": read tcp 10.192.0.72:7050->10.192.0.4:11226: i/o timeout
2022-06-21 01:28:28.781 UTC [orderer.consensus.etcdraft] Step -> INFO 1955c 2 is starting a new election at term 1 channel=syschannel node=2
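For reference, one way to map those remote addresses back to pods is to list pod IPs across all namespaces (a sketch, run against the cluster hosting the orderers):
kubectl get pods --all-namespaces -o wide | grep -E '10\.192\.0\.(4|55)'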
The two IP addresses 10.192.0.4 and 10.192.0.55 belong to the Kubernetes azure-ip-masq-agent / calico-node pods. Below is my ingress resource.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: ingress-orderer3
  namespace: istakapazatestnet
  uid: 03736608-2e29-4744-8b07-eaa67408f72e
  resourceVersion: '6818484'
  generation: 1
  creationTimestamp: '2022-06-20T14:47:55Z'
  labels:
    component: orderer3
    type: orderer
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: >
      {"apiVersion":"networking.k8s.io/v1","kind":"Ingress","metadata":{"annotations":{"kubernetes.io/ingress.class":"nginx","nginx.ingress.kubernetes.io/rewrite-target":"/","nginx.ingress.kubernetes.io/ssl-redirect":"true"},"labels":{"component":"orderer3","type":"orderer"},"name":"ingress-orderer3","namespace":"istakapazatestnet"},"spec":{"rules":[{"host":"orderer3.istakapaza.com","http":{"paths":[{"backend":{"service":{"name":"orderer3-service","port":{"number":7050}}},"path":"/","pathType":"ImplementationSpecific"}]}}],"tls":[{"hosts":["orderer3.istakapaza.com"],"secretName":"ssl-secret"}]}}
    kubernetes.io/ingress.class: nginx
    nginx.ingress.kubernetes.io/rewrite-target: /
  selfLink: >-
    /apis/networking.k8s.io/v1/namespaces/istakapazatestnet/ingresses/ingress-orderer3
status:
  loadBalancer:
    ingress:
      - ip: 10.224.0.6
spec:
  tls:
    - hosts:
        - orderer3.istakapaza.com
      secretName: ssl-secret
  rules:
    - host: orderer3.istakapaza.com
      http:
        paths:
          - path: /
            pathType: ImplementationSpecific
            backend:
              service:
                name: orderer3-service
                port:
                  number: 7050
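One setting that is often relevant for Fabric's mutual-TLS gRPC traffic behind ingress-nginx is SSL passthrough, so that the orderer itself, not the ingress, completes the TLS handshake. This is only a sketch and assumes the ingress-nginx controller is running with --enable-ssl-passthrough; note that with passthrough enabled, the rewrite-target annotation and the ssl-secret on the ingress are not used for that traffic:
kubectl -n istakapazatestnet annotate ingress ingress-orderer3 nginx.ingress.kubernetes.io/ssl-passthrough="true" --overwrite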
I also tried a couple of combinations for the orderer ports and addresses in configtx.yaml, as well as attaching SSL certs to the ingress, but I cannot figure this out. Any help will be much appreciated.
I know it sounds simple to route all the services through ingress, but somehow I am missing something.
If anyone needs further resources, please just ping me. Below is one combination I tried for configtx.yaml; I think we are not supposed to change configtx, but I tried it anyway.
Orderer: &OrdererDefaults
  OrdererType: etcdraft
  EtcdRaft:
    Consenters:
      - Host: orderer.istakapaza.com
        Port: 443
        ClientTLSCert: /scripts/crypto-config/ordererOrganizations/istakapaza.com/orderers/orderer.istakapaza.com/tls/server.crt
        ServerTLSCert: /scripts/crypto-config/ordererOrganizations/istakapaza.com/orderers/orderer.istakapaza.com/tls/server.crt
      - Host: orderer2.istakapaza.com
        Port: 443
        ClientTLSCert: /scripts/crypto-config/ordererOrganizations/istakapaza.com/orderers/orderer2.istakapaza.com/tls/server.crt
        ServerTLSCert: /scripts/crypto-config/ordererOrganizations/istakapaza.com/orderers/orderer2.istakapaza.com/tls/server.crt
      - Host: orderer3.istakapaza.com
        Port: 443
        ClientTLSCert: /scripts/crypto-config/ordererOrganizations/istakapaza.com/orderers/orderer3.istakapaza.com/tls/server.crt
        ServerTLSCert: /scripts/crypto-config/ordererOrganizations/istakapaza.com/orderers/orderer3.istakapaza.com/tls/server.crt
  Addresses:
    - https://orderer.istakapaza.com
    - https://orderer2.istakapaza.com
    - https://orderer3.istakapaza.com
  BatchTimeout: 2s

Related

Strimzi Kafka Listener Custom Cert configuration

I am trying to configure a custom certificate for a Strimzi Kafka listener, following the documentation: https://strimzi.io/docs/operators/latest/full/configuring.html#ref-alternative-subjects-certs-for-listeners-str
I want to expose this listener outside of the Azure Kubernetes Service cluster, within the private virtual network.
I have provided a custom certificate with a private key generated by an internal CA, and pointed to that secret in the Kafka configuration:
kubectl create secret generic kafka-tls --from-literal=listener.cer=$cert --from-literal=listener.key=$skey -n kafka
listeners:
  - name: external
    port: 9094
    type: loadbalancer
    tls: true
    authentication:
      type: tls
    # Listener TLS config
    configuration:
      brokerCertChainAndKey:
        secretName: kafka-tls
        certificate: listener.cer
        key: listener.key
      bootstrap:
        loadBalancerIP: 10.67.249.253
        annotations:
          service.beta.kubernetes.io/azure-load-balancer-internal: "true"
      brokers:
        - broker: 0
          loadBalancerIP: 10.67.249.251
          annotations:
            service.beta.kubernetes.io/azure-load-balancer-internal: "true"
        - broker: 1
          loadBalancerIP: 10.67.249.252
          annotations:
            service.beta.kubernetes.io/azure-load-balancer-internal: "true"
        - broker: 2
          loadBalancerIP: 10.67.249.250
          annotations:
            service.beta.kubernetes.io/azure-load-balancer-internal: "true"
authorization:
  type: simple
The certificate has the following records:
SAN:
*.kafka-datalake-prod-kafka-brokers
*.kafka-datalake-prod-kafka-brokers.kafka.svc
kafka-datalake-prod-kafka-bootstrap
kafka-datalake-prod-kafka-bootstrap.kafka.svc
kafka-datalake-prod-kafka-external-bootstrap
kafka-datalake-prod-kafka-external-bootstrap.kafka.svc
kafka-datalake-prod-azure.custom.domain
CN=kafka-datalake-produkty-prod-azure.custom.domain
I have also created an A record in the custom DNS zone, pointing kafka-datalake-produkty-prod-azure.custom.domain to 10.67.249.253.
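As a sanity check on that record (a sketch, run from a machine in the virtual network that uses the custom DNS):
nslookup kafka-datalake-produkty-prod-azure.custom.domain
# should return 10.67.249.253, the bootstrap LoadBalancer IP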
Then, I created a KafkaUser object:
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaUser
metadata:
  name: customuser
  namespace: kafka
  labels:
    strimzi.io/cluster: kafka-datalake-prod
spec:
  authentication:
    type: tls
  authorization:
    type: simple
    acls:
      - resource:
          type: topic
          name: notify.somecustomapp.prod.topic_name
          patternType: literal
        operations:
          - Create
          - Describe
          - Write
        # host: "*"
When I then retrieve the secrets from the Kafka cluster on AKS:
kubectl get secret kafka-datalake-prod-cluster-ca-cert -n kafka -o jsonpath='{.data.ca\.crt}' | base64 -d > broker.crt
kubectl get secret customuser -n kafka -o jsonpath='{.data.user\.key}' | base64 -d > customuser.key
kubectl get secret customuser -n kafka -o jsonpath='{.data.user\.crt}' | base64 -d > customuser.crt
Communication fails: when I try to connect and send some messages with a producer using those three files to authenticate/authorize, I get the following issue:
INFO:kafka.conn:<BrokerConnection node_id=bootstrap-0 host=10.67.249.253:9094 <connecting> [IPv4 ('10.67.249.253', 9094)]>: connecting to 10.67.249.253:9094 [('10.67.249.253', 9094) IPv4]
INFO:kafka.conn:Probing node bootstrap-0 broker version
INFO:kafka.conn:<BrokerConnection node_id=bootstrap-0 host=10.67.249.253:9094 <handshake> [IPv4 ('10.67.249.253', 9094)]>: Loading SSL CA from certs/prod/broker.crt
INFO:kafka.conn:<BrokerConnection node_id=bootstrap-0 host=10.67.249.253:9094 <handshake> [IPv4 ('10.67.249.253', 9094)]>: Loading SSL Cert from certs/prod/customuser.crt
INFO:kafka.conn:<BrokerConnection node_id=bootstrap-0 host=10.67.249.253:9094 <handshake> [IPv4 ('10.67.249.253', 9094)]>: Loading SSL Key from certs/prod/customuser.key
[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:997)
What am I doing wrong?
The communication worked perfectly fine when I was using the same method of connecting while the cluster and listeners were still using the default certificates generated by Strimzi.
All the best,
Krzysztof
@Turing85 @Jakub
Many thanks for your comments, especially the critical ones.
And thanks, Jakub, for pointing me towards using the CA of the custom certificate. What needed to be done to fix this was:
Replace the value obtained from the kafka-datalake-prod-cluster-ca-cert secret with the full chain: root CA, intermediate signing certificate, and the listener certificate itself.
Add the LoadBalancer IPs of the brokers to the certificate SANs - this is stated in the documentation, yet the way it is formulated misled me into thinking that adding hostnames/service names to the SAN was enough (https://strimzi.io/docs/operators/latest/full/configuring.html#tls_listener_san_examples, and later https://strimzi.io/docs/operators/latest/full/configuring.html#external_listener_san_examples).
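For the first step, a sketch of building that bundle (the file names for the internal CA certificates are hypothetical):
cat listener.cer intermediate-ca.crt root-ca.crt > broker.crt
# broker.crt (the full chain) is then used by the producer as the CA bundle instead of the Strimzi cluster CA cert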
After those changes, everything started to work.
Thank you for help.

Hyperledger Fabric: TLS handshake failed with error remote error: tls: bad certificate server=Orderer remoteaddress

This seems to be a common issue with the HLF channel creation command.
Here is my command to create the channel:
peer channel create -o orderer1.workspace:7050 -c base-main-channel -f ./config/channel.tx --tls --cafile /opt/gopath/src/github.com/hyperledger/fabric/peer/crypto/ordererOrganizations/base.order/orderers/orderer1.base.order/msp/tlscacerts/tlsca.base.order-cert.pem
The error I am getting from the orderer node is:
ERRO 02d TLS handshake failed with error remote error: tls: bad certificate server=Orderer remoteaddress=172.23.0.7:36982
I've tried the solution from this question: TLS handshake failed with error remote error: tls: bad certificate server=Orderer
But it doesn't work for me.
The only difference is that I am using the Raft ordering service instead of Kafka.
Here is my Raft config:
Raft:
  <<: *ChannelDefaults
  Capabilities:
    <<: *ChannelCapabilities
  Orderer:
    <<: *OrdererDefaults
    OrdererType: etcdraft
    EtcdRaft:
      Consenters:
        - Host: orderer.base
          Port: 7050
          ClientTLSCert: crypto-config/ordererOrganizations/base.order/orderers/orderer1.base.order/tls/server.crt
          ServerTLSCert: crypto-config/ordererOrganizations/base.order/orderers/orderer1.base.order/tls/server.crt
    Addresses:
      - orderer.base:7050
You are using the incorrect folder path for --cafile in your peer channel create command.
Instead of
--cafile /opt/gopath/src/github.com/hyperledger/fabric/peer/crypto/ordererOrganizations/base.order/orderers/orderer1.base.order/msp/tlscacerts/tlsca.base.order-cert.pem
Use
--cafile /opt/gopath/src/github.com/hyperledger/fabric/peer/crypto/ordererOrganizations/base.order/tlsca/tlsca.base.order-cert.pem
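Before retrying, you can confirm that the orderer's TLS server certificate actually chains to the CA file you pass with --cafile (a sketch using the paths from the question's crypto-config layout):
openssl verify -CAfile crypto-config/ordererOrganizations/base.order/tlsca/tlsca.base.order-cert.pem crypto-config/ordererOrganizations/base.order/orderers/orderer1.base.order/tls/server.crt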
Another solution is to remove the containers' volumes
docker volume rm $(docker volume ls -q)
and restart the network.

TLS handshake failed with error remote error: tls: bad certificate server=Orderer

I am trying to set up Hyperledger Fabric on a VM manually. I have generated all the artifacts and configured orderer.yaml and core.yaml. I have the orderer running on 127.0.0.1:7050. When I try to create a channel using the peer CLI channel create command, I get a context deadline exceeded message in the peer terminal.
./bin/peer channel create -o 127.0.0.1:7050 -c $CHANNEL_NAME -f ./channel-artifacts/channel.tx --tls --cafile /home/fabric-release/mynetwork/crypto-config/ordererOrganizations/example.com/orderers/orderer.example.com/msp/tlscacerts/tlsca.example.com-cert.pem
Error: failed to create deliver client: orderer client failed to connect to 127.0.0.1:7050: failed to create new connection: context deadline exceeded
On the orderer terminal I am getting the following error:
2019-04-23 09:22:03.707 EDT [core.comm] ServerHandshake -> ERRO 01b TLS handshake failed with error remote error: tls: bad certificate server=Orderer remoteaddress=127.0.0.1:38618
2019-04-23 09:22:04.699 EDT [core.comm] ServerHandshake -> ERRO 01c TLS handshake failed with error remote error: tls: bad certificate server=Orderer remoteaddress=127.0.0.1:38620
2019-04-23 09:22:06.187 EDT [core.comm] ServerHandshake -> ERRO 01d TLS handshake failed with error remote error: tls: bad certificate server=Orderer remoteaddress=127.0.0.1:38622
I have gone through the configurations a few times; I am not sure what I am missing. Following is my orderer.yaml:
General:
  LedgerType: file
  ListenAddress: 127.0.0.1
  ListenPort: 7050
  TLS:
    Enabled: true
    PrivateKey: /home/fabric-release/mynetwork/crypto-config/ordererOrganizations/example.com/orderers/orderer.example.com/tls/server.key
    Certificate: /home/fabric-release/mynetwork/crypto-config/ordererOrganizations/example.com/orderers/orderer.example.com/tls/server.crt
    RootCAs:
      - /home/fabric-release/mynetwork/crypto-config/ordererOrganizations/example.com/orderers/orderer.example.com/tls/ca.crt
    ClientAuthRequired: true
  Keepalive:
    ServerMinInterval: 60s
    ServerInterval: 7200s
    ServerTimeout: 20s
  GenesisMethod: file
  GenesisProfile: OneOrgOrdererGenesis
  GenesisFile: channel-artifacts/genesis.block
  LocalMSPDIR: /home/fabric-release/mynetwork/crypto-config/ordererOrganizations/example.com/orderers/orderer.example.com/msp
  LocalMSPID: OrdererMSP
  Authentication:
    TimeWindow: 15m
FileLedger:
  Location: /var/hyperledger/production/orderer
  Prefix: hyperledger-fabric-ordererledger
The issue is that the TLS server certificate used by the orderer does not have a SAN matching "127.0.0.1". You can add "localhost" and/or "127.0.0.1" to your TLS certificates by using a custom crypto-config.yaml when generating your artifacts with cryptogen:
# ---------------------------------------------------------------------------
# "OrdererOrgs" - Definition of organizations managing orderer nodes
# ---------------------------------------------------------------------------
OrdererOrgs:
  # ---------------------------------------------------------------------------
  # Orderer
  # ---------------------------------------------------------------------------
  - Name: Orderer
    Domain: example.com
    EnableNodeOUs: false
    # ---------------------------------------------------------------------------
    # "Specs" - See PeerOrgs below for complete description
    # ---------------------------------------------------------------------------
    Specs:
      - Hostname: orderer
        SANS:
          - "localhost"
          - "127.0.0.1"
# ---------------------------------------------------------------------------
# "PeerOrgs" - Definition of organizations managing peer nodes
# ---------------------------------------------------------------------------
PeerOrgs:
  # ---------------------------------------------------------------------------
  # Org1
  # ---------------------------------------------------------------------------
  - Name: org1
    Domain: org1.example.com
    EnableNodeOUs: true
    Template:
      Count: 2
      SANS:
        - "localhost"
        - "127.0.0.1"
    Users:
      Count: 1
  - Name: org2
    Domain: org2.example.com
    EnableNodeOUs: false
    Template:
      Count: 2
      SANS:
        - "localhost"
        - "127.0.0.1"
    Users:
      Count: 1
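After regenerating the crypto material, the new SANs can be checked on the orderer's TLS certificate (a sketch; the path follows the question's layout):
openssl x509 -in /home/fabric-release/mynetwork/crypto-config/ordererOrganizations/example.com/orderers/orderer.example.com/tls/server.crt -noout -text | grep -A1 "Subject Alternative Name"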
I also faced the same problem. In my case, the issue was that I had made some changes to the local directory files, and apparently those changes were not reflected when the files were mounted back into the Docker containers. What fixed the problem for me was
docker volume rm $(docker volume ls -q)
I restarted the network again and didn't see any more certificate errors. Worth a try.
When the TLS handshake fails between two orderers, it is most likely that there is an error in the configuration parameters used when generating the TLS files.
If you registered with TLS via fabric-ca, you need to check whether the CSR properties in the TLS files of the two orderers are the same. You can inspect them with the following command: openssl x509 -in certificate.crt -text -noout.
You also need to check whether the --csr.names, -m and other parameters of the orderer enroll command are duplicated or incorrect.
In cases where the contents of the TLS files are consistent and the hostname is specified, it is rare for the handshake to fail.
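A quick way to compare the CSR-derived subject fields of the two orderers' TLS certificates side by side (a sketch; the certificate paths are hypothetical):
diff <(openssl x509 -in orderer1/tls/server.crt -noout -subject) <(openssl x509 -in orderer2/tls/server.crt -noout -subject)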

TLS Handshake Error while Creating Hyperledger Fabric Channel with Multiple Organisation Orderers

Scenario: I have two organisations with two peers in each organisation. Now, I want each organisation to provide an orderer node as well.
Below is my crypto-config.yaml file:
OrdererOrgs:
  - Name: Orderer1
    Domain: org1.xyz.com
    Template:
      Count: 1
  - Name: Orderer2
    Domain: org2.xyz.com
    Template:
      Count: 1
Below is my configtx.yaml file:
- &OrdererOrg1
  Name: OrdererOrg01
  ID: Orderer1MSP
  MSPDir: crypto-config/ordererOrganizations/org1.xyz.com/msp
  Policies:
    Readers:
      Type: Signature
      Rule: "OR('Orderer1MSP.member')"
    Writers:
      Type: Signature
      Rule: "OR('Orderer1MSP.member')"
    Admins:
      Type: Signature
      Rule: "OR('Orderer1MSP.admin')"
- &OrdererOrg2
  Name: OrdererOrg02
  ID: Orderer2MSP
  MSPDir: crypto-config/ordererOrganizations/org2.xyz.com/msp
  Policies:
    Readers:
      Type: Signature
      Rule: "OR('Orderer2MSP.member')"
    Writers:
      Type: Signature
      Rule: "OR('Orderer2MSP.member')"
    Admins:
      Type: Signature
      Rule: "OR('Orderer2MSP.admin')"
Below is my docker-compose-cli.yaml file:
services:
  orderer.xyz.com:
    extends:
      file: base/docker-compose-base.yaml
      service: orderer.xyz.com
    container_name: orderer.xyz.com
    networks:
      - byfn
  orderer0.xyz.com:
    extends:
      file: base/docker-compose-base.yaml
      service: orderer0.xyz.com
    container_name: orderer0.xyz.com
    networks:
      - byfn
I try to create a channel with the following command:
peer channel create -o orderer.xyz.com:7050 -t 60s -c bay -f ./channel-artifacts/channel.tx --tls true --cafile /opt/gopath/src/github.com/hyperledger/fabric/peer/crypto/ordererOrganizations/org1.xyz.com/orderers/orderer0.org1.xyz.com/msp/tlscacerts/tlsca.org1.xyz.com-cert.pem
I get the following ERROR in the orderer container logs while creating the channel:
[core.comm] ServerHandshake -> ERRO 015 TLS handshake failed with error remote error: tls: bad certificate {"server": "Orderer", "remote address": "172.22.0.18:48594"}
So, is it possible for the organisations providing peers to also provide an orderer node each, or must a separate third organisation provide the orderer nodes (as observed in the tutorials)? And why am I getting this error?
Thanks for your time, and let me know if you require any further information.
I was finally able to find the actual reason behind this issue. The issue was with the service names of the orderer containers in the docker-compose-cli.yaml file. The service name should match the name specified in the crypto-config.yaml file, following the hostname.domain pattern.
So, I changed the orderer configurations in the docker-compose-cli.yaml file like below:
services:
  orderer0.telco1.vodworks.com:
    extends:
      file: base/docker-compose-base.yaml
      service: orderer.vodworks.com
    container_name: orderer.vodworks.com
    networks:
      - byfn
  orderer0.telco2.vodworks.com:
    extends:
      file: base/docker-compose-base.yaml
      service: orderer0.vodworks.com
    container_name: orderer0.vodworks.com
    networks:
      - byfn
After this, I modified the peer channel commands in the script.sh and utils.sh scripts by adding the correct orderer names. After these couple of changes I was able to run my network successfully and verified the deployment by installing chaincodes as well.
Thanks to @arnaud-j-le-hors for the sample application, which helped me figure out this issue.
I do not know how you defined the structure of organizations and peers in your network, but looking at the path you specify for --cafile and at the config files, it seems to me that telco1.vodworks.com is not specified as an orderer organization.
Overall, may I ask: are you sure that the path for --cafile is correct?
I'm not an expert here, but I'm not sure why you are trying to connect to orderer.xyz.com. I have a setup that looks like what you're trying to do, and for that you should give a name to each of the ordering nodes you want to create by adding the following lines to your crypto-config file (for both orderers):
Specs:
  - Hostname: orderer
And you should define two corresponding containers, one called orderer.org1.xyz.com and the other orderer.org2.xyz.com in your compose file.
You should then be able to create the channel by contacting orderer.org1.xyz.com.
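A sketch of what the channel creation command could look like once the node is reachable as orderer.org1.xyz.com (the paths follow the question's layout and are illustrative, not verified):
peer channel create -o orderer.org1.xyz.com:7050 -t 60s -c bay -f ./channel-artifacts/channel.tx --tls true --cafile /opt/gopath/src/github.com/hyperledger/fabric/peer/crypto/ordererOrganizations/org1.xyz.com/orderers/orderer.org1.xyz.com/msp/tlscacerts/tlsca.org1.xyz.com-cert.pem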
In my case I got this error:
[core.comm] ServerHandshake -> ERRO 025 TLS handshake failed with error remote error: tls: internal error {"server": "Orderer", "remote address": "190.22.189.42:40746"}
This happened when I used a Fabric SDK to connect to a Fabric network with TLS enabled.
To solve this, you need to ensure that the connection profile uses the hostnameOverride property in the orderers section. This is an example:
orderers:
  orderer.example.com:
    url: grpcs://localhost:7050
    # these are standard properties defined by the gRPC library
    # they will be passed in as-is to gRPC client constructor
    grpcOptions:
      hostnameOverride: orderer.example.com
      grpc-max-send-message-length: 15
      grpc.keepalive_time_ms: 360000
      grpc.keepalive_timeout_ms: 180000
Please check the following example for more information: https://github.com/hyperledger/fabric-sdk-java/blob/master/src/test/fixture/sdkintegration/network_configs/network-config-tls.yaml
I really spent days working on this error and finally found the solution.
For more information, Fabric training, or developing blockchain solutions for business and government based on Hyperledger Fabric in Chile and Latin America, please visit www.blockchainempresarial.com

Kubernetes DNS fails in Kubernetes 1.2

I'm attempting to set up DNS support in Kubernetes 1.2 on CentOS 7. According to the documentation, there are two ways to do this. The first applies to a "supported kubernetes cluster setup" and involves setting environment variables:
ENABLE_CLUSTER_DNS="${KUBE_ENABLE_CLUSTER_DNS:-true}"
DNS_SERVER_IP="10.0.0.10"
DNS_DOMAIN="cluster.local"
DNS_REPLICAS=1
I added these settings to /etc/kubernetes/config and rebooted, with no effect, so either I don't have a supported kubernetes cluster setup (what's that?), or there's something else required to set its environment.
The second approach requires more manual setup. It adds two flags to kubelets, which I set by updating /etc/kubernetes/kubelet to include:
KUBELET_ARGS="--cluster-dns=10.0.0.10 --cluster-domain=cluster.local"
and restarting the kubelet with systemctl restart kubelet. Then it's necessary to start a replication controller and a service. The doc page cited above provides a couple of template files for this that require some editing, both for local changes (my Kubernetes API server listens to the actual IP address of the hostname rather than 127.0.0.1, making it necessary to add a --kube-master-url setting) and to remove some Salt dependencies. When I do this, the replication controller starts four containers successfully, but the kube2sky container gets terminated about a minute after completing initialization:
[david#centos dns]$ kubectl --server="http://centos:8080" --namespace="kube-system" logs -f kube-dns-v11-t7nlb -c kube2sky
I0325 20:58:18.516905 1 kube2sky.go:462] Etcd server found: http://127.0.0.1:4001
I0325 20:58:19.518337 1 kube2sky.go:529] Using http://192.168.87.159:8080 for kubernetes master
I0325 20:58:19.518364 1 kube2sky.go:530] Using kubernetes API v1
I0325 20:58:19.518468 1 kube2sky.go:598] Waiting for service: default/kubernetes
I0325 20:58:19.533597 1 kube2sky.go:660] Successfully added DNS record for Kubernetes service.
F0325 20:59:25.698507 1 kube2sky.go:625] Received signal terminated
I've determined that the termination is done by the healthz container after reporting:
2016/03/25 21:00:35 Client ip 172.17.42.1:58939 requesting /healthz probe servicing cmd nslookup kubernetes.default.svc.cluster.local 127.0.0.1 >/dev/null
2016/03/25 21:00:35 Healthz probe error: Result of last exec: nslookup: can't resolve 'kubernetes.default.svc.cluster.local', at 2016-03-25 21:00:35.608106622 +0000 UTC, error exit status 1
Aside from this, all other logs look normal. However, there is one anomaly: it was necessary to specify --validate=false when creating the replication controller, as the command otherwise gets the message:
error validating "skydns-rc.yaml": error validating data: [found invalid field successThreshold for v1.Probe, found invalid field failureThreshold for v1.Probe]; if you choose to ignore these errors, turn validation off with --validate=false
Could this be related? These arguments come directly from the Kubernetes documentation. If not, what's needed to get this running?
Here is the skydns-rc.yaml I used:
apiVersion: v1
kind: ReplicationController
metadata:
  name: kube-dns-v11
  namespace: kube-system
  labels:
    k8s-app: kube-dns
    version: v11
    kubernetes.io/cluster-service: "true"
spec:
  replicas: 1
  selector:
    k8s-app: kube-dns
    version: v11
  template:
    metadata:
      labels:
        k8s-app: kube-dns
        version: v11
        kubernetes.io/cluster-service: "true"
    spec:
      containers:
      - name: etcd
        image: gcr.io/google_containers/etcd-amd64:2.2.1
        resources:
          # TODO: Set memory limits when we've profiled the container for large
          # clusters, then set request = limit to keep this container in
          # guaranteed class. Currently, this container falls into the
          # "burstable" category so the kubelet doesn't backoff from restarting it.
          limits:
            cpu: 100m
            memory: 500Mi
          requests:
            cpu: 100m
            memory: 50Mi
        command:
        - /usr/local/bin/etcd
        - -data-dir
        - /var/etcd/data
        - -listen-client-urls
        - http://127.0.0.1:2379,http://127.0.0.1:4001
        - -advertise-client-urls
        - http://127.0.0.1:2379,http://127.0.0.1:4001
        - -initial-cluster-token
        - skydns-etcd
        volumeMounts:
        - name: etcd-storage
          mountPath: /var/etcd/data
      - name: kube2sky
        image: gcr.io/google_containers/kube2sky:1.14
        resources:
          # TODO: Set memory limits when we've profiled the container for large
          # clusters, then set request = limit to keep this container in
          # guaranteed class. Currently, this container falls into the
          # "burstable" category so the kubelet doesn't backoff from restarting it.
          limits:
            cpu: 100m
            # Kube2sky watches all pods.
            memory: 200Mi
          requests:
            cpu: 100m
            memory: 50Mi
        livenessProbe:
          httpGet:
            path: /healthz
            port: 8080
            scheme: HTTP
          initialDelaySeconds: 60
          timeoutSeconds: 5
          successThreshold: 1
          failureThreshold: 5
        readinessProbe:
          httpGet:
            path: /readiness
            port: 8081
            scheme: HTTP
          # we poll on pod startup for the Kubernetes master service and
          # only setup the /readiness HTTP server once that's available.
          initialDelaySeconds: 30
          timeoutSeconds: 5
        args:
        # command = "/kube2sky"
        - --domain="cluster.local"
        - --kube-master-url=http://192.168.87.159:8080
      - name: skydns
        image: gcr.io/google_containers/skydns:2015-10-13-8c72f8c
        resources:
          # TODO: Set memory limits when we've profiled the container for large
          # clusters, then set request = limit to keep this container in
          # guaranteed class. Currently, this container falls into the
          # "burstable" category so the kubelet doesn't backoff from restarting it.
          limits:
            cpu: 100m
            memory: 200Mi
          requests:
            cpu: 100m
            memory: 50Mi
        args:
        # command = "/skydns"
        - -machines=http://127.0.0.1:4001
        - -addr=0.0.0.0:53
        - -ns-rotate=false
        - -domain="cluster.local"
        ports:
        - containerPort: 53
          name: dns
          protocol: UDP
        - containerPort: 53
          name: dns-tcp
          protocol: TCP
      - name: healthz
        image: gcr.io/google_containers/exechealthz:1.0
        resources:
          # keep request = limit to keep this container in guaranteed class
          limits:
            cpu: 10m
            memory: 20Mi
          requests:
            cpu: 10m
            memory: 20Mi
        args:
        - -cmd=nslookup kubernetes.default.svc.cluster.local 127.0.0.1 >/dev/null
        - -port=8080
        ports:
        - containerPort: 8080
          protocol: TCP
      volumes:
      - name: etcd-storage
        emptyDir: {}
      dnsPolicy: Default  # Don't use cluster DNS.
and skydns-svc.yaml:
apiVersion: v1
kind: Service
metadata:
  name: kube-dns
  namespace: kube-system
  labels:
    k8s-app: kube-dns
    kubernetes.io/cluster-service: "true"
    kubernetes.io/name: "KubeDNS"
spec:
  selector:
    k8s-app: kube-dns
  clusterIP: "10.0.0.10"
  ports:
  - name: dns
    port: 53
    protocol: UDP
  - name: dns-tcp
    port: 53
    protocol: TCP
I just commented out the lines that contain the successThreshold and failureThreshold values in skydns-rc.yaml, then re-ran the kubectl commands:
kubectl create -f skydns-rc.yaml
kubectl create -f skydns-svc.yaml
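Once both objects are created and the DNS pod is running, the service can be sanity-checked from any node by querying the clusterIP from skydns-svc.yaml directly (a minimal check):
nslookup kubernetes.default.svc.cluster.local 10.0.0.10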
