Consul members fail - agent

I am trying to follow this tutorial: tutorialConsul
When I run:
ikerlan@ikerlan-docker:~$ consul members
Node            Address           Status  Type    Build  Protocol
ikerlan-docker  172.16.8.37:8301  alive   client  0.3.0  2
172.16.8.191    10.0.2.15:8301    alive   server  0.3.0  2
172.16.8.192    10.0.2.15:8301    alive   server  0.3.0  2
172.16.8.193    10.0.2.15:8301    alive   server  0.3.0  2
ikerlan@ikerlan-docker:~$ consul members
Node            Address           Status  Type    Build  Protocol
ikerlan-docker  172.16.8.37:8301  alive   client  0.3.0  2
172.16.8.191    10.0.2.15:8301    failed  server  0.3.0  2
172.16.8.192    10.0.2.15:8301    alive   server  0.3.0  2
172.16.8.193    10.0.2.15:8301    failed  server  0.3.0  2
ikerlan@ikerlan-docker:~$ consul members
Node            Address           Status  Type    Build  Protocol
ikerlan-docker  172.16.8.37:8301  alive   client  0.3.0  2
172.16.8.191    10.0.2.15:8301    failed  server  0.3.0  2
172.16.8.192    10.0.2.15:8301    alive   server  0.3.0  2
172.16.8.193    10.0.2.15:8301    alive   server  0.3.0  2
I can see the members alive, but if I run the command again some of those members show as failed, and on the next run different ones are alive or failed.
I think the leader marks the members as failed and removes them; they try to reconnect, but then they are removed again.
I am working with VirtualBox machines, and I see that Consul reports the same address for all of them, even though I have configured Consul to use the eth1 interface, whose IP is the one that appears as the Node name.
What could it be?
Thanks very much.

Two different nodes must not have the same address. This will confuse Consul, as Consul uses this address to connect to the nodes.
Use the bind_addr configuration option to ensure that Consul uses the right interface and IP address on each machine if Consul doesn't pick the right address automatically. By default Consul picks the "first" private address it finds, and this might not work in your setup.
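For example, a minimal sketch of pinning the bind address on one of the VMs (the 172.16.8.x address is the eth1 IP from your output; repeat with the correct IP on each machine):
# equivalent to setting "bind_addr" in the agent's config file; add -server on the server nodes
consul agent -data-dir=/tmp/consul -bind=172.16.8.191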
The version you're running seems to be quite old (0.3.0). The current version is 0.6.0.

Related

rafthttp: dial tcp timeout on etcd 3-node cluster creation

I don't have access to the etcd part of the project's source code, but I do have access to /var/log/syslog.
The goal is to set up a 3-node cluster.
(1) The very first etcd error that comes up is:
rafthttp: failed to dial 76e7ffhh20007a98 on stream MsgApp v2 (dial tcp 10.0.0.134:2380: i/o timeout)
Before continuing, I would say that I can ping all three nodes from each of the nodes. I have also tried opening TCP port 2380, with no success - same error.
(2) So, before that error I had the following messages from etcd, which in my opinion confirm that the cluster is set up correctly:
etcdserver/membership: added member 76e7ffhh20007a98 [https://server2:2380]
etcdserver/membership: added member 222e88db3803e816 [https://server1:2380]
etcdserver/membership: added member 999115e00e17123d [https://server3:2380]
In the /etc/hosts file these DNS names are resolved as:
server2 10.0.0.135
server1 10.0.0.134
server3 10.0.0.136
(3) The initial setup on each node, however, looks like this:
embed: listening for peers on https://127.0.0.1:2380
embed: listening for client requests on 127.0.0.1:2379
So, to sum up, each node has this initial setup log (3), then adds the members (2), and once these steps are done it fails with (1). As far as I know, etcd cluster creation follows this pattern: https://etcd.io/docs/v3.5/tutorials/how-to-setup-cluster/
Without access to the source code it is really hard to debug, but maybe someone has ideas on the error and what could cause it?
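For context, the log in (3) shows the peer listener bound to 127.0.0.1, so other members cannot reach it even though the peer URLs in (2) advertise the server names. A static bootstrap that binds the listeners to the node's own IP would be started roughly like this (a sketch using the names and IPs from above, not the project's actual configuration; TLS flags omitted):
# run on server1; the other nodes use their own --name, --listen-* and advertise URLs
etcd --name server1 \
  --listen-peer-urls https://10.0.0.134:2380 \
  --initial-advertise-peer-urls https://server1:2380 \
  --listen-client-urls https://10.0.0.134:2379,https://127.0.0.1:2379 \
  --advertise-client-urls https://server1:2379 \
  --initial-cluster server1=https://server1:2380,server2=https://server2:2380,server3=https://server3:2380 \
  --initial-cluster-state new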
UPD: etcdctl cluster-health output (ETCDCTL_ENDPOINT is exported):
cluster may be unhealthy: failed to list members
Error: client: etcd cluster is unavailable or misconfigured; error #0: client: endpoint http://127.0.0.1:2379 exceeded header timeout; error #1: dial tcp 127.0.0.1:4001: connect: connection refused
error #0: client: endpoint http://127.0.0.1:2379 exceeded header timeout
error #1: dial tcp 127.0.0.1:4001: connect: connection refused
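Note that both errors point at loopback addresses (127.0.0.1:2379 and the legacy 127.0.0.1:4001), which matches the loopback-only client listener from (3). Once the client listener is bound to the node's IP, the same check can be pointed at it, e.g. (a sketch; adjust the scheme if client TLS is enabled):
ETCDCTL_ENDPOINT=http://10.0.0.134:2379 etcdctl cluster-health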

AKS http-application-routing-nginx-ingress-controller Port 80 is already in use

I have two AKS K8s clusters (ver 1.11.1, in West and North Europe) with the http-application-routing addon enabled. Suddenly today the pod named addon-http-application-routing-nginx-ingress-controller-xxxx crashed and showed this state:
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Exit Code: 255
kubectl logs addon-http-application-routing-nginx-ingress-controller-xxxx
shows:
I1003 20:21:21.129694 7 flags.go:162] Watching for ingress class: addon-http-application-routing
W1003 20:21:21.129745 7 flags.go:165] only Ingress with class "addon-http-application-routing" will be processed by this ingress controller
F1003 20:21:21.129819 7 main.go:59] Port 80 is already in use. Please check the flag --http-port
If I connect to any node on either cluster and check open ports with netstat -latun, it shows no service on port 80.
Node restart didn't help.
I just killed the affected node and it started working again. Here's a link where a similar solution also worked:
https://github.com/kubernetes/ingress-nginx/issues/3177
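For reference, one way to recycle a node like this is to drain and delete it so it gets replaced (a sketch; <node-name> is a placeholder, and depending on how the cluster was provisioned you may also need to delete or reimage the underlying VM in Azure):
# find which node hosts the crashing controller pod
kubectl get pods -n kube-system -o wide | grep addon-http-application-routing-nginx-ingress-controller
# evict the workloads, then remove the node from the cluster
kubectl drain <node-name> --ignore-daemonsets
kubectl delete node <node-name>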

Memsql Master Node is not running

I have a memsql cluster with 1 master and 4 leaf nodes.
My problem is that the master node is not running, yet it shows as connected in the cluster, and I can still read and write data to the cluster.
While trying to restart the master node it shows this error:
2018-03-31 20:54:22: Jb2ae955f6 [ERROR] Failed to connect to MemSQL node BD60BED7C8082966F375CBF983A46A9E39FAA791: ProcessHandshakeResponsePacket() failed. Sending back 1045: Access denied for user 'root'@'xx.xx.xx.xx' (using password: NO)
ProcessHandshakeResponsePacket() failed. Sending back 1045: Access denied for user 'root'@'10.254.34.135' (using password: NO)
Cluster status
Index  ID       Agent Id  Process State  Cluster State  Role    Host           Port  Version
1      BD60BED  Afb08cd   NOT RUNNING    CONNECTED      MASTER  10.254.34.135  3306  5.8.10
2      D84101F  A10aad5   RUNNING        CONNECTED      LEAF    10.254.42.244  3306  5.8.10
3      3D2A2AF  Aa2ac03   RUNNING        CONNECTED      LEAF    10.254.38.76   3306  5.8.10
4      D054B1C  Ab6c885   RUNNING        CONNECTED      LEAF    10.254.46.99   3306  5.8.10
5      F8008F7  Afb08cd   RUNNING        CONNECTED      LEAF    10.254.34.135  3307  5.8.10
That error means that while the node is online, memsql-ops is unable to log in to the node, most likely because the root user's password is misconfigured somewhere in the system - memsql-ops is configured with no password for that node, but likely the memsql node does have a root password set.
Did you set a root password in memsql? Are you able to connect to the master node directly via mysql client?
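For example (host and port taken from the cluster status above; a sketch, not an exact command):
mysql -h 10.254.34.135 -P 3306 -u root -p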
If yes, you can fix this by logging in to the memsql master node directly and changing the root password to blank:
GRANT ALL PRIVILEGES ON *.* TO 'root'@'%' IDENTIFIED BY '' WITH GRANT OPTION;
Then, after ensuring that connectivity is restored, you can update the root password with the memsql-ops command documented at https://docs.memsql.com/memsql-ops-cli-reference/v6.0/memsql-update-root-password/.

How to configure cassandra for remote connection

I am trying to configure Cassandra DataStax Community Edition for remote connections on Windows.
The Cassandra server is installed on a Windows 7 PC, and with the local cqlsh it connects perfectly to the local server.
But when I try to connect with cqlsh from another PC on the same network, I get this error message:
Connection error: ('Unable to connect to any servers', {'MYHOST':
error(10061, "Tried connecting to [('HOST_IP', 9042)]. Last error: No
connection could be made because the target machine actively refused
it")})
So I am wondering how to configure the Cassandra server correctly (what changes should I make in the cassandra.yaml config file) to allow remote connections.
Thank you in advance!
How about this:
Make these changes in the cassandra.yaml config file:
start_rpc: true
rpc_address: 0.0.0.0
broadcast_rpc_address: [node-ip]
listen_address: [node-ip]
seed_provider:
- class_name: ...
- seeds: "[node-ip]"
reference: https://gist.github.com/andykuszyk/7644f334586e8ce29eaf8b93ec6418c4
For Cassandra 2.0, remote access is via its Thrift port. In Cassandra 2.0.x, the default cqlsh listen port is 9160, which is defined in cassandra.yaml by the rpc_port parameter. By default, Cassandra 2.0.x and earlier enable Thrift by setting start_rpc to true in the cassandra.yaml file.
In Cassandra 2.1, the cqlsh utility uses the native protocol. In Cassandra 2.1, which uses the Datastax python driver, the default cqlsh listen port is 9042.
The Cassandra node should be bound to the IP address of your server's network card - it shouldn't be 127.0.0.1 or localhost, which is the loopback interface's IP; binding to that will prevent direct remote access. To configure the bound address, use the rpc_address parameter in cassandra.yaml. Setting it to 0.0.0.0 will listen on all network interfaces.
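For example, once the node is listening on a routable address, the remote cqlsh connection would look like this (the IP is a placeholder; pick the port that matches your version):
# Cassandra 2.1+ (native protocol)
cqlsh 192.168.1.50 9042
# Cassandra 2.0.x and earlier (Thrift)
cqlsh 192.168.1.50 9160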
Have you checked that the remote machine can connect to the Cassandra node? Is there a firewall between the machines? You can try these steps to test this out:
1) Ensure you can connect to that IP from the server you are on:
$ ssh user@xxx.xxx.xx.xx
2) Check the node's status and also confirm it shows the same IP:
$ nodetool status
3) Run the command to connect with the IP (only specify the port if you are not using the default):
$ cqlsh xxx.xxx.xx.xx
Alternate solution to Kat's answer. This worked with Ubuntu 16.04.
ssh into the server: ssh server_user@**.**.**.**
Stop cassandra if running:
Check if running with ps aux | grep cassandra
If it is running, this will output a large block of commands/flags, e.g.
ubuntu 14018 4.6 70.1 2335692 712080 pts/2 Sl+ 04:15 0:11 java -Xloggc:./../logs/gc.log ........
Note 14018 in the example is the process id
Stop with kill <process_id> (in this case 14018)
Edit the cassandra.yaml file so it contains the following:
rpc_address: 0.0.0.0
broadcast_rpc_address: **.**.**.** <- your server's IP (cannot be set to 0.0.0.0)
Restart cassandra ./bin/cassandra -f (from within cassandra root)
Open another terminal on local machine & connect via cqlsh **.**.**.** (your server's IP) to test.
The ./bin/nodetool status address reported my localhost IP (127.0.0.1), but cqlsh remotely still worked despite that.

cassandra 1.2 nodetool getting 'Failed to connect' when trying to connect to remote node

I am running a 6 node cluster of cassandra 1.2 on an Amazon Web Service VPC with Oracle's 64-bit JVM version 1.7.0_10.
When I'm logged on to one of the nodes (ex. 10.0.12.200) I can run nodetool -h 10.0.12.200 status just fine.
However, if I try to use another ip address in the cluster (10.0.32.153) from that same terminal I get Failed to connect to '10.0.32.153:7199: Connection refused'.
On the 10.0.32.153 node I am trying to connect to I've made the following checks.
From 10.0.12.200 I can run telnet 10.0.32.153 7199 and I get a connection, so it doesn't appear to be a security group/firewall issue to port 7199.
On 10.0.32.153 if I run netstat -ant|grep 7199 I see
tcp 0 0 0.0.0.0:7199 0.0.0.0:* LISTEN
so cassandra does appear to be listening on the port
The cassandra-env.sh file on 10.0.32.153 has all of the JVM_OPTS for jmx active
-Dcom.sun.management.jmxremote.port=7199 -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false
The only shot in the dark I've seen while searching the interwebs for this problem is to set the following:
JVM_OPTS="$JVM_OPTS -Djava.rmi.server.hostname=10.0.32.153"
But when I do this I don't even get a response. It just hangs.
Any guidance would be greatly appreciated.
The issue did end up being a firewall/security group issue. While it is true that JMX port 7199 is used, other ports are apparently chosen randomly for RMI. See: Cassandra port usage - how are the ports used?
So the solution is to open up the firewall and then configure cassandra-env.sh to include
JVM_OPTS="$JVM_OPTS -Djava.rmi.server.hostname=<ip>"
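A related sketch: on JVMs recent enough to support the com.sun.management.jmxremote.rmi.port property (an assumption; the property is not available on very old JDK 7 builds), you can pin the RMI port to 7199 as well so that only one port needs to be opened between nodes:
# in cassandra-env.sh; <ip> is the node's own address, as above
JVM_OPTS="$JVM_OPTS -Dcom.sun.management.jmxremote.rmi.port=7199"
JVM_OPTS="$JVM_OPTS -Djava.rmi.server.hostname=<ip>"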
