Cassandra OpsCenter "agents failed to connect" on removed nodes

I have some EC2 instances where Cassandra provisioning failed. I terminated the instances and the machines no longer exist.
OpsCenter keeps nagging me about "agents failed to connect" on these machines.
The machines do not show up in nodetool status nor in the system.peers table.
Where does OpsCenter store the list of nodes it connects to, so I can delete these zombie nodes?
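For reference, this is roughly how I checked (run from any live node):
nodetool status
cqlsh -e "SELECT peer, data_center, rack FROM system.peers;"   # terminated machines are absent here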

This is actually a bug in OpsCenter that will be addressed in a future release. To mitigate the issue for now, just restart OpsCenter and those messages should cease.
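If it helps, on a package install that is just a service bounce; a minimal sketch (the service name below is the usual one, but it may differ on your setup):
sudo service opscenterd restart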

Related

While restarting one node other nodes are showing down in the Cassandra cluster

Whenever I restart any Cassandra node in my cluster, a few minutes later other nodes start showing as down, and sometimes other nodes hang as well. We then need to restart those nodes to bring the services back up.
During a restart the cluster seems unstable: one node after another shows DN status, even though the JVM and nodetool are running fine, and when we describe the cluster it reports nodes as unreachable.
We don't have much traffic or load in our environment. Can you please give me any suggestions?
Cassandra version is 3.11.2.
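For reference, these are the checks we run after a restart (assuming nodetool is on the PATH):
nodetool status            # other nodes show up as DN here
nodetool describecluster   # this is where nodes are reported unreachable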
Do you see any errors or warnings in your system.log after restarting the node?
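For example, something like this would surface recent errors and warnings, assuming the default package-install log path (adjust if yours differs):
grep -E 'ERROR|WARN' /var/log/cassandra/system.log | tail -n 50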

How to set up stomp_interface for a failover node for Cassandra OpsCenter

How do I set up a failover node for Cassandra OpsCenter? The OpsCenter data is stored on the OpsCenter node itself, so to set up a failover node I need to set up a second OpsCenter separate from the current one and sync the OpsCenter data and config files between the two.
The stomp_interface on the nodes in the cluster points at OpsCenter_1; how will it change automatically to OpsCenter_2 when failover occurs?
There are steps in the DataStax documentation that cover this in detail. At a minimum:
Mirror the configuration directories stored on the OpsCenter primary to the OpsCenter backup using whatever method you prefer.
On the backup OpsCenter, create a primary_opscenter_location configuration file in the failover directory that contains the IP address of the primary OpsCenter daemon to monitor.
The stomp_interface setting on the agents is changed (the address.yaml file is updated as well) when failover occurs. This is why the documentation recommends making sure no third-party configuration management is applied to it.
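A minimal sketch of the second step, assuming the default failover directory and an example primary IP of 10.1.1.10 (check failover_configuration_directory in opscenterd.conf if you have changed it):
# On the backup OpsCenter: tell it which primary to monitor.
echo "10.1.1.10" | sudo tee /var/lib/opscenter/failover/primary_opscenter_location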
Three things:
If you have a firewall on, allow the corresponding ports to communicate (61620, 61621, 9160, 9042, 7199); see the example rules below.
Always verify that Cassandra is up and running, so the agent can actually connect to something.
Stop the agent, check address.yaml again, then restart the agent.
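For the firewall point, a rough iptables example (assumes TCP and the default INPUT chain; adapt to your environment):
# 61620/61621 OpsCenter<->agent, 9160 Thrift, 9042 CQL native, 7199 JMX
for p in 61620 61621 9160 9042 7199; do
  sudo iptables -A INPUT -p tcp --dport "$p" -j ACCEPT
done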

OpsCenter Error: "Unable to start Repair Service. Cluster state was not verified."

We are running OpsCenter 5.2.1. When attempting to start the Repair Service, the following message appears in the Event Log:
"Unable to start Repair Service. Cluster state was not verified. Make sure all nodes are up, the cluster topology is stable, and all agents are connected."
We are currently using NetworkTopologyStrategy with a 9-node cluster spanning 3 DCs.
Has anyone else experienced this error?
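In case it helps anyone comparing notes, the three conditions in the message can be checked per node; a rough sketch:
nodetool status            # every node in all 3 DCs should report UN
nodetool describecluster   # all nodes should agree on topology and schema
# agent connectivity is shown on the OpsCenter dashboard's agent count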

OpsCenter is having trouble connecting with the agents

I am trying to set up a two-node cluster in Cassandra. As far as I can tell, my nodes connect fine: when I run nodetool status it shows both nodes in the same data center and the same rack, and I can run cqlsh on either node and query data. The second node can see data from the first node, and so on.
My first node is the seed node both in cassandra.yaml and in the cluster config file.
To rule out any potential security issues, I flushed my iptables rules and allowed everything on all ports for both nodes. They are also on the same virtual network.
iptables -P INPUT ACCEPT
When I start OpsCenter on either machine it sees both nodes, but it only has information on the node I am viewing OpsCenter from. It can tell whether the other node is up or down, but I cannot view any detailed information. It sometimes initially says "2 Agents Connected", but after a while it says "1 agent failed to connect". It keeps prompting me to install OpsCenter on the other node even though it is already there.
The opscenterd.log doesn't reveal much. There don't appear to be any errors, but I see INFO: Nodes with agents that appear to no longer be running.
I am not sure what else to check, as everything but OpsCenter seems to be working fine.
You should install OpsCenter on a single node rather than on all nodes. The OpsCenter GUI will then prompt you to install the agent on each of the nodes in the cluster. Use nodetool status or nodetool ring to make sure that the cluster is functioning properly and all nodes are Up and Normal (status = UN).
In the address.yaml file you can set stomp_interface to the IP address of the OpsCenter server to force the agents to the correct address.
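For example (the IP is illustrative, and the address.yaml path below is the typical package-install location; adjust to your setup):
# Add or edit the stomp_interface line, then restart the agent.
echo 'stomp_interface: "10.0.0.5"' | sudo tee -a /var/lib/datastax-agent/conf/address.yaml
sudo service datastax-agent restart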

datastax-agent and OpsCenter not communicating on fresh AWS EC2 datastax instance

I've got a two-node Amazon Web Services Cassandra cluster created using the "DataStax Auto-Clustering AMI 2.5.1-pv". OpsCenter is running on node 0, as is the datastax-agent, but they don't seem to be fully connected. OpsCenter says "0 of 0 agents connected" and the connection icon next to "New Cluster" is blinking red.
OpsCenter screenshot: http://i.stack.imgur.com/Z6Tnx.png
A nodetool status output would really help here.
Check the agent logs on either node when the agent starts. If you don't see any errors, try "Add a new cluster", then "Manage existing cluster", and add the seed IPs of your two nodes; OpsCenter will try to update the agents if needed.
BTW: upgrade to OpsCenter 5.1; a lot of bugs have already been fixed.
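To look at the agent logs, something like this should work (the path is the usual package-install default on the AMI; adjust if needed):
tail -n 100 /var/log/datastax-agent/agent.log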
