How to fix etcd cluster misconfigured error - coreos

I have two servers: pg1 (10.80.80.195) and pg2 (10.80.80.196).
etcd version:
etcd Version: 3.2.0
Git SHA: 66722b1
Go Version: go1.8.3
Go OS/Arch: linux/amd64
I'm trying to run it like this:
On the pg1 server:
etcd --name infra0 --initial-advertise-peer-urls http://10.80.80.195:2380 --listen-peer-urls http://10.80.80.195:2380 --listen-client-urls http://10.80.80.195:2379,http://127.0.0.1:2379 --advertise-client-urls http://10.80.80.195:2379 --initial-cluster-token etcd-cluster-1 --initial-cluster infra0=http://10.80.80.195:2380,infra1=http://10.80.80.196:2380 --initial-cluster-state new
On the pg2 server:
etcd --name infra1 --initial-advertise-peer-urls http://10.80.80.196:2380 --listen-peer-urls http://10.80.80.196:2380 --listen-client-urls http://10.80.80.196:2379,http://127.0.0.1:2379 --advertise-client-urls http://10.80.80.196:2379 --initial-cluster-token etcd-cluster-1 --initial-cluster infra0=http://10.80.80.195:2380,infra1=http://10.80.80.196:2380 --initial-cluster-state new
When trying to check the health state on pg1:
etcdctl cluster-health
I get an error:
cluster may be unhealthy: failed to list members
Error: client: etcd cluster is unavailable or misconfigured; error #0: client: endpoint http://127.0.0.1:2379 exceeded header timeout
; error #1: dial tcp 127.0.0.1:4001: getsockopt: connection refused
error #0: client: endpoint http://127.0.0.1:2379 exceeded header timeout
error #1: dial tcp 127.0.0.1:4001: getsockopt: connection refused
What am I doing wrong, and how do I fix it?
Both servers run on virtual machines with a bridged adapter.

I got a similar error when I set up etcd clusters using systemd, following the official tutorial from Kubernetes. It was three CentOS 7 medium instances on AWS. I'm pretty sure the security groups were correct. I just ran:
$ systemctl restart network
and then
$ etcdctl cluster-health
gave a healthy result.
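Worth adding for anyone who lands here: the v2 etcdctl probes http://127.0.0.1:2379 and the legacy http://127.0.0.1:4001 by default, which is exactly where the two errors above come from. A header timeout on the localhost endpoint can also simply mean the member is up but the two peers cannot reach each other on 2380, so no leader gets elected. Either way, pointing etcdctl at an advertised client URL explicitly rules out the endpoint defaults (IP taken from the question):
etcdctl --endpoints http://10.80.80.195:2379 cluster-health
# equivalently, via the v2 environment variable:
export ETCDCTL_ENDPOINT=http://10.80.80.195:2379
etcdctl cluster-health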

Related

rafthttp: dial tcp timeout on etcd 3-node cluster creation

I don't have access to the etcd part of the project's source code; however, I do have access to /var/log/syslog.
The goal is to set up a 3-node cluster.
(1) The very first etcd error that comes up is:
rafthttp: failed to dial 76e7ffhh20007a98 on stream MsgApp v2 (dial tcp 10.0.0.134:2380: i/o timeout)
Before continuing, I should say that I can ping all three nodes from each of the nodes, and I have tried opening TCP port 2380; still no success, same error.
(2) Before that error I had the following messages from etcd, which in my opinion confirm that the cluster is set up correctly:
etcdserver/membership: added member 76e7ffhh20007a98 [https://server2:2380]
etcdserver/membership: added member 222e88db3803e816 [https://server1:2380]
etcdserver/membership: added member 999115e00e17123d [https://server3:2380]
In the /etc/hosts file these DNS names are resolved as:
server2 10.0.0.135
server1 10.0.0.134
server3 10.0.0.136
(3) The initial startup on each node, however, looks like this:
embed: listening for peers on https://127.0.0.1:2380
embed: listening for client requests on 127.0.0.1:2379
So, to sum up: each node produces this initial startup log (3), then adds the members (2), and once these steps are done it fails with (1). As far as I know, etcd cluster creation follows this pattern: https://etcd.io/docs/v3.5/tutorials/how-to-setup-cluster/
Without access to the source code this is really hard to debug, but maybe someone has ideas on the error and what could cause it?
UPD: etcdctl cluster-health output (ETCDCTL_ENDPOINT is exported):
cluster may be unhealthy: failed to list members
Error: client: etcd cluster is unavailable or misconfigured; error #0: client: endpoint http://127.0.0.1:2379 exceeded header timeout
; error #1: dial tcp 127.0.0.1:4001: connect: connection refused
error #0: client: endpoint http://127.0.0.1:2379 exceeded header timeout
error #1: dial tcp 127.0.0.1:4001: connect: connection refused
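One detail stands out from log (3): each node is listening for peers on https://127.0.0.1:2380, so a raft dial from another node to 10.0.0.134:2380 has nothing listening to answer it, which would match error (1). A minimal sketch of the listener flags bound to the node's own address instead, in the spirit of the linked tutorial (names and IPs taken from the question; the TLS certificate flags that https listeners also require are omitted):
etcd --name server1 \
  --listen-peer-urls https://10.0.0.134:2380 \
  --initial-advertise-peer-urls https://server1:2380 \
  --listen-client-urls https://10.0.0.134:2379,https://127.0.0.1:2379 \
  --advertise-client-urls https://server1:2379 \
  --initial-cluster server1=https://server1:2380,server2=https://server2:2380,server3=https://server3:2380 \
  --initial-cluster-state new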

Errors seen when setting up logspout in Hyperledger fabric 2.2

Following the steps described here to set up logspout:
https://hyperledger-fabric.readthedocs.io/en/release-2.2/deploy_chaincode.html
Running this produces the errors below:
./monitordocker.sh net_test
Starting monitoring on all containers on the network net_test
xxxx
docker: Error response from daemon: network net_test not found.
curl: (7) Failed to connect to 127.0.0.1 port 8000: Connection refused
xxx#xxxx:/home/fabric/fabric-samples/test-network#
xxx#xxxx:/home/fabric/fabric-samples/test-network# ./monitordocker.sh
Starting monitoring on all containers on the network basicnetwork_basic
xxxx
docker: Error response from daemon: network basicnetwork_basic not found.
curl: (7) Failed to connect to 127.0.0.1 port 8000: Connection refused
xxx#xxxx:/home/fabric/fabric-samples/test-network#
xxxx#xxxx:/home/fabric/fabric-samples/test-network# ./monitordocker.sh net_basic
Starting monitoring on all containers on the network net_basic
xxxx
docker: Error response from daemon: network net_basic not found.
curl: (7) Failed to connect to 127.0.0.1 port 8000: Connection refused
A few questions:
There is no process running on the default port 8000, so the connection refused error is expected. Do we need to use a different port?
What network name should be passed when running monitordocker.sh?
Any other troubleshooting info is appreciated.
OK, found the issue. The network name is fabric_test, so I issued the command:
./monitordocker.sh fabric_test
This resolved the problem.
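For reference, the network name doesn't have to be guessed; Docker can list the networks the sample scripts actually created:
# The argument to monitordocker.sh must match one of these names exactly.
docker network ls --format '{{.Name}}'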

Cilium clustermesh with azure

I'm deploying a clustermesh using aks-engine. I have installed Cilium on two different clusters. Following the clustermesh installation guide, everything looks correct: the nodes are listed, the status is correct, and no errors appear in the etcd-operator log. However, I cannot access external endpoints; the example app always answers from the current cluster.
Following the troubleshooting guide, I found in the debuginfo from the agents that no external endpoints are declared. Each cluster has a master and two worker nodes. I attach the node list and status from both clusters; I can provide additional logs if required.
Any help would be appreciated.
Cluster1
kubectl -nkube-system exec -it cilium-vg8sm cilium node list
Name                              IPv4 Address   Endpoint CIDR    IPv6 Address   Endpoint CIDR
cluster1/k8s-cilium2-29734124-0   172.18.2.5     192.168.1.0/24
cluster1/k8s-cilium2-29734124-1   172.18.2.4     10.4.0.0/16
cluster1/k8s-master-29734124-0    172.18.1.239   10.239.0.0/16
cluster2/k8s-cilium2-14610979-0   172.18.2.6     192.168.2.0/24
cluster2/k8s-cilium2-14610979-1   172.18.2.7     10.7.0.0/16
cluster2/k8s-master-14610979-0    172.18.2.239   10.239.0.0/16
kubectl -nkube-system exec -it cilium-vg8sm cilium status
KVStore: Ok etcd: 1/1 connected: https://cilium-etcd-client.kube-system.svc:2379 - 3.3.11
ContainerRuntime: Ok docker daemon: OK
Kubernetes: Ok 1.15 (v1.15.1) [linux/amd64]
Kubernetes APIs: ["CustomResourceDefinition", "cilium/v2::CiliumNetworkPolicy", "core/v1::Endpoint", "core/v1::Namespace", "core/v1::Node", "core/v1::Pods", "core/v1::Service", "networking.k8s.io/v1::NetworkPolicy"]
Cilium: Ok OK
NodeMonitor: Disabled
Cilium health daemon: Ok
IPv4 address pool: 10/65535 allocated from 10.4.0.0/16
Controller Status: 48/48 healthy
Proxy Status: OK, ip 10.4.0.1, port-range 10000-20000
Cluster health: 6/6 reachable (2019-08-09T10:11:22Z)
Cluster2
kubectl -nkube-system exec -it cilium-rl8gt cilium node list
Name                              IPv4 Address   Endpoint CIDR    IPv6 Address   Endpoint CIDR
cluster1/k8s-cilium2-29734124-0   172.18.2.5     192.168.1.0/24
cluster1/k8s-cilium2-29734124-1   172.18.2.4     10.4.0.0/16
cluster1/k8s-master-29734124-0    172.18.1.239   10.239.0.0/16
cluster2/k8s-cilium2-14610979-0   172.18.2.6     192.168.2.0/24
cluster2/k8s-cilium2-14610979-1   172.18.2.7     10.7.0.0/16
cluster2/k8s-master-14610979-0    172.18.2.239   10.239.0.0/16
kubectl -nkube-system exec -it cilium-rl8gt cilium status
KVStore: Ok etcd: 1/1 connected: https://cilium-etcd-client.kube-system.svc:2379 - 3.3.11
ContainerRuntime: Ok docker daemon: OK
Kubernetes: Ok 1.15 (v1.15.1) [linux/amd64]
Kubernetes APIs: ["CustomResourceDefinition", "cilium/v2::CiliumNetworkPolicy", "core/v1::Endpoint", "core/v1::Namespace", "core/v1::Node", "core/v1::Pods", "core/v1::Service", "networking.k8s.io/v1::NetworkPolicy"]
Cilium: Ok OK
NodeMonitor: Disabled
Cilium health daemon: Ok
IPv4 address pool: 10/65535 allocated from 10.7.0.0/16
Controller Status: 48/48 healthy
Proxy Status: OK, ip 10.7.0.1, port-range 10000-20000
Cluster health: 6/6 reachable (2019-08-09T10:40:39Z)
This problem is fixed by https://github.com/cilium/cilium/issues/8849 and the fix will be available in version 1.6.
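Until that release, it's also worth double-checking the usual clustermesh requirement: a service is only load-balanced across clusters when it is marked global in both of them, and the agent's service table should then show backends from the remote cluster. A quick check along these lines (the pod name comes from Cluster1 above; rebel-base is the service name from Cilium's clustermesh example, so substitute your own app):
# Mark the service global in both clusters; without this annotation
# the agents never import the remote backends.
kubectl annotate service rebel-base io.cilium/global-service="true"
# The agent's service table should then list backend IPs from both clusters.
kubectl -nkube-system exec -it cilium-vg8sm -- cilium service list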

failed to create new connection: desc = transport: error while dialing: dial tcp 172.19.0.4:9051: connect: connection refused Hyperledger fabric

Can anyone help me fix the error below? I'm trying to install chaincode on a peer via the cli container. I configured the cli container correctly, but somehow I'm getting this error:
grpc: addrConn.createTransport failed to connect to {peer0.org1.example.com:7051 0 <nil>}. Err :connection error: desc = "transport: Error while dialing dial tcp: lookup peer0.org1.example.com on 127.0.0.11:53: connection refused"
Here is my docker-compose-cli.yaml
You can run into odd DNS resolution issues depending on the configuration of DNS on your host system. The easiest thing to try is to add the dns_search config value to your Compose file:
cli:
  container_name: cli
  image: hyperledger/fabric-tools:$IMAGE_TAG
  tty: true
  stdin_open: true
  dns_search: .
See https://stackoverflow.com/a/45916717/6160507 as well; you might need this for all of your services.
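One step that's easy to miss: Compose doesn't apply new DNS settings to a running container, so the cli service has to be recreated after the edit. Something like this, assuming the docker-compose-cli.yaml file name from the question:
# Recreate only the cli service so it picks up dns_search.
docker-compose -f docker-compose-cli.yaml up -d --force-recreate cli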
Solution 1: add a public nameserver to /etc/resolv.conf and start once again. Note that a bare sudo echo "nameserver 8.8.8.8" only prints the line; it has to be written to the file:
echo "nameserver 8.8.8.8" | sudo tee -a /etc/resolv.conf
Solution 2: check your container logs:
docker logs <container-id>
Solution 3: add dns_search: . to your docker-compose.yaml file and start once again, as below:
dns_search: .

I can't connect to cassandra from nodejs app in docker container

I have a Node.js app inside a Docker container (node:7.8.0) using the 'bridge' network. I use the cassandra-driver to connect to a Cassandra server, but it raises a timeout exception when initializing the connection:
Error: The host 172.16.210.101:9042 did not reply before timeout 12000 ms
at OperationTimedOutError.DriverError (node_modules/cassandra-driver/lib/errors.js:14:19)
at new OperationTimedOutError (node_modules/cassandra-driver/lib/errors.js:104:33)
at Connection.onTimeout (node_modules/cassandra-driver/lib/connection.js:645:20)
at Timeout._onTimeout (node_modules/cassandra-driver/lib/connection.js:620:10)
at ontimeout (timers.js:386:14)
at tryOnTimeout (timers.js:250:5)
at Timer.listOnTimeout (timers.js:214:5)
From inside the container I can ping the Cassandra server and open a telnet connection to it.
Using the 'host' network works, and running the app in a "standard" (non-Docker) environment works too.
Any help is appreciated.
OK, I found the solution.
The MTU of my host machine is 1450, while docker0 uses 1500 by default, which causes the error.
So I changed the MTU for Docker and it works.
I use a Debian host and followed these steps:
Copy the service file:
cp /lib/systemd/system/docker.service /etc/systemd/system/docker.service
Then, edit the "ExecStart" line like this:
ExecStart=/usr/bin/dockerd -H fd:// --mtu=1400
Finally, restart docker:
sudo systemctl daemon-reload
sudo systemctl restart docker
Source:
https://rahulait.wordpress.com/2016/02/28/modifying-default-mtu-for-docker-containers/
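To confirm a mismatch like this before changing anything, compare the MTU of the host's uplink interface with that of the docker0 bridge (the interface name eth0 is an assumption; substitute your own):
# Host uplink (1450 in the case above) vs. docker0 (1500 by default).
ip link show eth0 | grep -o 'mtu [0-9]*'
ip link show docker0 | grep -o 'mtu [0-9]*'
On newer Docker versions the same fix can also live in /etc/docker/daemon.json instead of the systemd unit (this overwrites the file, so merge with any existing keys):
echo '{ "mtu": 1400 }' | sudo tee /etc/docker/daemon.json
sudo systemctl restart docker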
