We have a Cassandra cluster of 3 nodes. Yesterday I stopped one of the nodes and started it again today. Surprisingly, the restarted node now shows up in a different ring. Why is it showing as a separate ring, given that there are no error messages in the logs?
ring 1: nodetool status
UN 1.2.3.4
UN 5.6.7.8
ring 2: nodetool status
UN 9.10.11.12
When I look at the logs on ring 1, both nodes show the same message:
WARN [WRITE-/9.10.11.12] 2013-11-05 14:04:51,221 SSLFactory.java (line
139) Filtering out TLS_RSA_WITH_AES_256_CBC_SHA as it isnt supported
by the socket
Ring 2:
It has no errors
Both cluster names are the same, both rings are on the same network, and all three nodes are seed nodes. Any help would be appreciated.
Just a guess, but this may be related to "How to remove a node from gossip in cassandra", in that you may have a ring that is in a bad state.
I faced a similar issue. Clearing the Cassandra cache and hints directories before starting the node again solved it for me.
I have a single-node Cassandra cluster with around 44 GB of data on it (/var/lib/cassandra/data/my_keyspace). The current storage is 1 TB, and I need to migrate all the data to another VM with the same setup (single-node cluster). Data is pushed to my node every second, so I can't afford any downtime (some sensors are pushing time-series data).
Keyspace: CREATE KEYSPACE my_keyspace WITH replication = {'class': 'SimpleStrategy', 'replication_factor': '1'} AND durable_writes = true;
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address    Load      Tokens  Owns (effective)  Host ID                               Rack
UN  127.0.0.1  43.4 GiB  256     100.0%            e0ae36db-f639-430c-91ad-6af3ffb6f906  rack1
After a bit of research I decided it's best to add the new node to the existing cluster, let the old node stream all the data to it, and decommission the old node once streaming is done.
Source :- https://docs.datastax.com/en/archived/cassandra/2.0/cassandra/operations/ops_add_node_to_cluster_t.html
1. Configure old node as seed node for the new node
2. Add new node to the ring (auto_bootstrap = true)
3. Once the status is UN for both nodes, run nodetool cleanup on old node
4. Decommission the old node
My only concern is whether I will face any data loss. Is this approach appropriate?
Please let me know if I am missing anything here.
Thanks
First, a disclaimer: running a single node of C* defeats the purpose of a distributed database. The minimal cluster size tends to be 3, so that some nodes can go offline without downtime (I'm sure you've seen this warning before). Now with that out of the way, let's discuss the process.
1. Configure old node as seed node for the new node
Yep.
1.5. (Potentially missing step) You also need to verify the consistency level of your queries. I see you're using a replication_factor of 1 for all keyspaces in use, so make sure your queries use a CONSISTENCY_LEVEL of ONE.
2. Add new node to the ring (auto_bootstrap = true)
Sounds good. Make sure you've configured the various ports, listen_address, etc.
3. Once the status is UN for both nodes,
Once you reach UN, double-check that the client isn't seeing any consistency errors.
3.5. run nodetool cleanup on old node
(Redundant step) You don't need to run nodetool cleanup. You won't care about leftover data on the decommissioned node, as all of its data will be moved to the new node replacing it.
4. Decommission the old node
Yep.
(Missing step) You'll have to modify the new node to see itself as a seed once you've decommissioned the old node, or it won't be able to restart.
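To make the "wait until both nodes are UN" check a bit more concrete, here is a minimal Python sketch that reads the text printed by nodetool status and reports whether every listed node is in the UN state. The function names node_states and all_nodes_up_normal are hypothetical helpers, not part of Cassandra, and the parsing assumes the plain column layout shown in the question.

```python
import re

# Node rows start with a two-letter code: status (U/D) then state (N/L/J/M),
# followed by the node address, as in the output shown in the question.
NODE_ROW = re.compile(r"^([UD][NLJM])\s+(\S+)")


def node_states(status_text):
    """Return {address: code} for every node row in `nodetool status` text."""
    states = {}
    for line in status_text.splitlines():
        match = NODE_ROW.match(line.strip())
        if match:
            states[match.group(2)] = match.group(1)
    return states


def all_nodes_up_normal(status_text):
    """True only if at least one node is listed and every node is UN."""
    states = node_states(status_text)
    return bool(states) and all(code == "UN" for code in states.values())


# Sample input copied from the question's nodetool status output.
SAMPLE = """\
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns (effective) Host ID Rack
UN 127.0.0.1 43.4 GiB 256 100.0% e0ae36db-f639-430c-91ad-6af3ffb6f906 rack1
"""

if __name__ == "__main__":
    print(node_states(SAMPLE))
    print(all_nodes_up_normal(SAMPLE))
```

In practice you would feed it the captured output of nodetool status and only move on to the decommission step once it returns True for both nodes.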
Any reason why com.datastax.driver.core.Metadata:getHosts() would return state UP for a host that has shut down?
However, nodetool status returns DN for that host.
No matter how many times I check Host.getState(), it still says UP for that dead host.
This is how I'm querying Metadata:
cluster = DseCluster.builder()
        .addContactPoints("192.168.1.1", "192.168.1.2", "192.168.1.3")
        .withPort(9042)
        .withReconnectionPolicy(new ConstantReconnectionPolicy(2000))
        .build();
cluster.getMetadata().getAllHosts();
EDIT: Updated the code to reflect that I'm trying to connect to 3 hosts. I should've stated that the cluster I'm connecting to has 3 nodes, 2 in DC1 and another in DC2.
Also, whenever I relaunch my Java process running this code, the behavior changes. Sometimes it gives me the right states, then when I restart it again, it gives me the wrong states, and so on.
I will post an answer which I got from the datastaxacademy slack:
Host.getState() is the driver's view of what it thinks the host
state is, whereas nodetool status is what that C* node thinks the
state of all nodes in the cluster is from its view (propagated via
gossip). There is not a way to get that via the driver.
I am new to Cassandra and am using the latest version, Cassandra 3.10. I have 3 nodes that should join one cluster. The cluster name (Test Cluster) is the same on all three nodes, as are the datacenter (dc1), the rack (rack1) and the snitch (GossipingPropertyFileSnitch). The nodes are configured as follows:
Node A:
- seeds: "A,B,C address"
listen_address and rpc_address are both set to node A's IP address
Node B:
- seeds: "A,B,C address"
listen_address and rpc_address are both set to node B's IP address
Node C:
- seeds: "A,B,C address"
listen_address and rpc_address are both set to node C's IP address
What I want to achieve:
i) If node A fails, get the data from nodes B and C.
ii) If any one or two nodes fail, get the data from the remaining nodes. How should I configure the nodes for this?
I am using SimpleStrategy with a replication factor of 3.
When a node fails, the data should be retrieved from another node. Is my seeds setting correct, or is it a mistake? Please briefly explain what to do.
Answering your questions:
If a Node A goes down, then you want to fetch data from node B and C.
If one or two node goes down, you want to fetch data from other node.
To achieve the above, the replication factor you have configured is enough to handle the node failures. What is wrong in your configuration is making every node a seed node.
A seed node is used to bootstrap other nodes, so usually the first node started in a data center acts as its seed node. Suppose you have 2 data centers; then you should have 2 seed nodes, as mentioned in the DataStax docs below:
http://docs.datastax.com/en/cassandra/3.0/cassandra/initialize/initSingleDS.html
As per your last comment, you mentioned "schema version mismatch detected", which means your nodes do not all agree on the same schema version. Check the schema using nodetool while all your nodes are running:
nodetool describecluster
This lists the schema version reported by each node. All nodes should report the same schema version.
If any node reports a different version, restart that node until the schema versions match.
Once you fix this schema error, you will be able to create the keyspace.
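If you want to script that check, here is a minimal Python sketch that looks for schema version lines in the text printed by nodetool describecluster and tells you whether all nodes agree. It assumes each version is printed as "<uuid>: [address, address, ...]" under a "Schema versions:" heading, which may differ between Cassandra versions. The helper names and the sample UUIDs and addresses are made up for illustration.

```python
import re

# Assumed line shape under "Schema versions:": "<uuid>: [ip, ip, ...]".
VERSION_LINE = re.compile(r"^\s*([0-9a-fA-F-]{36}):\s*\[(.*)\]\s*$")


def schema_versions(describecluster_text):
    """Return {schema_version: [addresses]} parsed from describecluster text."""
    versions = {}
    for line in describecluster_text.splitlines():
        match = VERSION_LINE.match(line)
        if match:
            hosts = [h.strip() for h in match.group(2).split(",") if h.strip()]
            versions[match.group(1)] = hosts
    return versions


def schema_agrees(describecluster_text):
    """True when exactly one schema version is reported."""
    return len(schema_versions(describecluster_text)) == 1


# Hypothetical sample: two versions, so one node disagrees with the others.
SAMPLE_MISMATCH = """\
Cluster Information:
    Name: Test Cluster
    Schema versions:
        11111111-1111-1111-1111-111111111111: [10.0.0.1, 10.0.0.2]
        22222222-2222-2222-2222-222222222222: [10.0.0.3]
"""

if __name__ == "__main__":
    for version, hosts in schema_versions(SAMPLE_MISMATCH).items():
        print(version, hosts)
    print("schema agrees:", schema_agrees(SAMPLE_MISMATCH))
```

With the sample above, the node listed alone under the second version would be the one to restart.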
I'm trying but failing to join a new (well old, but wiped out) node to an existing cluster.
Currently the cluster consists of 2 nodes and runs C* 2.1.2. I start a third node with 2.1.2, it gets to the joining state and bootstraps, i.e. streams some data as shown by nodetool netstats, but after some time it gets stuck. From that point nothing gets streamed and the new node stays in the joining state. I restarted the node twice; every time it streamed more data, but then got stuck again. (I'm currently on a third round like that.)
Other facts:
I don't see any errors in the log on any of the nodes.
The connectivity seems fine: I can ping and netcat to port 7000 in all directions.
I have 267 GB load per running node, replication 2, 16 tokens.
The load of the new node is around 100 GB now.
I'm guessing that the node after few rounds of restarts, will finally suck in all of the data from running nodes and join the cluster. But definitely it's not the way it should work.
EDIT: I discovered some more info:
The bootstrapping process stops in the middle of streaming some table, always after sending exactly 10MB of some SSTable, e.g.:
$ nodetool netstats | grep -P -v "bytes\(100"
Mode: NORMAL
Bootstrap e0abc160-7ca8-11e4-9bc2-cf6aed12690e
/192.168.200.16
Sending 516 files, 124933333900 bytes total
/home/data/cassandra/data/leadbullet/page_view-2a2410103f4411e4a266db7096512b05/leadbullet-page_view-ka-13890-Data.db 10485760/167797071 bytes(6%) sent to idx:0/192.168.200.16
Read Repair Statistics:
Attempted: 2016371
Mismatch (Blocking): 0
Mismatch (Background): 168721
Pool Name Active Pending Completed
Commands n/a 0 55802918
Responses n/a 0 425963
I can't diagnose the error & I'll be grateful for any help!
Try to telnet from one node to another using the correct port.
Make sure the node is joining the cluster with the correct name.
Try running: nodetool repair
You might be pinging the external IP addresses, while your cluster communicates using internal IP addresses.
If you are running on Amazon AWS, make sure the firewall is open for both internal IP addresses.
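If telnet isn't installed on the nodes, the same port check can be sketched in a few lines of Python. This only tests that a TCP connection can be opened; it says nothing about whether streaming itself works. The function name can_connect is a hypothetical helper. The address and port in the usage example are the ones mentioned in the question (192.168.200.16 and storage port 7000), so substitute the internal addresses your nodes actually use.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


if __name__ == "__main__":
    # Address and port taken from the question; run this from each node
    # against every other node's internal address.
    print(can_connect("192.168.200.16", 7000))
```

Running it from each node towards every other node, using the internal addresses, should show quickly whether a firewall rule is blocking one direction.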
I'm setting up a Cassandra cluster. I made the configuration changes (adding seeds). When I check the ring for that cluster, it shows only one node, but there are actually 2 nodes in my cluster. What change should be made?
Thanks in advance..
As well as adding seeds, you need to configure the listen_address for your nodes so they aren't just listening on localhost (the same goes for the rpc_address). You need to enable auto-bootstrap for your new node, or set its token manually so that it takes on a portion of the keyspace from the original node.
See http://wiki.apache.org/cassandra/MultinodeCluster for details.
If you're setting up the cluster using virtual machines, this is a common scenario. Here's why: http://wiki.apache.org/cassandra/FAQ#cloned
Even if this isn't your case, as a solution, you can use the nodetool move command to re-assign the token space.
For example, on a 4-node cluster:
nodetool -h NodeA move 0
nodetool -h NodeB move 42535295865117307932921825928971026431
nodetool -h NodeC move 85070591730234615865843651857942052863
nodetool -h NodeD move 127605887595351923798765477786913079295
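The token values above look like an even split of the range 0 to 2^127 - 1 (the range used by RandomPartitioner, which is an assumption on my part based on the numbers). A minimal Python sketch of that calculation, with the hypothetical helper name balanced_tokens, would be:

```python
def balanced_tokens(node_count, ring_size=2**127 - 1):
    """Evenly spaced initial tokens: node i gets floor(i * ring_size / node_count)."""
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    return [i * ring_size // node_count for i in range(node_count)]


if __name__ == "__main__":
    # Node names as used in the example commands above.
    nodes = ["NodeA", "NodeB", "NodeC", "NodeD"]
    for name, token in zip(nodes, balanced_tokens(len(nodes))):
        print(f"nodetool -h {name} move {token}")
```

For 4 nodes this reproduces the four tokens used in the commands above; for a different cluster size, change the list of node names.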