Adding single token nodes to existing datastax cassandra Cluster and data transfer is not working - cassandra

Adding a new single token per nodes to existing datastax cluster and data transfer is not working. Process followed is mentioned below. Please update me if the process i followed is wrong.Thanks
We have 3 Single token range datastax nodes in our AWS EC2 Datacenter, both Search and Graph enabled. We are planning to add 3 more nodes into into our datacenter. We are currently using DseSimpleSnitch and Simple network topology for our keyspace.Also our current replication factor is 2.
Node 1 : 10.10.1.36
Node 2 : 10.10.1.46
Node 3 : 10.10.1.56
cat /etc/default/dse | grep -E 'GRAPH_ENABLED=|SOLR_ENABLED='
GRAPH_ENABLED=1
SOLR_ENABLED=1
Datacenter : SearchGraph
Address Rack Status State Load Owns Token
10.10.1.46 rack1 Up Normal 760.14 MiB ? -9223372036854775808
10.10.1.36 rack1 Up Normal 737.69 MiB ? -3074457345618258603
10.10.1.56 rack1 Up Normal 752.25 MiB ? 3074457345618258602
Step (1) For adding 3 new node into our datacenter first we changed our keyspace topology and snitch to network aware.
1)Changed the snitch.
cat /etc/dse/cassandra/cassandra.yaml | grep endpoint_snitch:
endpoint_snitch: GossipingPropertyFileSnitch
cat /etc/dse/cassandra/cassandra-rackdc.properties |grep -E 'dc=|rack='
dc=SearchGraph
rack=rack1
2)
(a) Shut down all the nodes, then restart them.
(b) Run a sequential repair and nodetool cleanup on each node.
3)Changed keyspace topology.
ALTER KEYSPACE tech_app1 WITH REPLICATION = {'class' : 'NetworkTopologyStrategy', 'SearchGraph' : 2};
ALTER KEYSPACE tech_app2 WITH REPLICATION = {'class' : 'NetworkTopologyStrategy', 'SearchGraph' : 2};
ALTER KEYSPACE tech_chat WITH REPLICATION = {'class' : 'NetworkTopologyStrategy', 'SearchGraph' : 2};
Reference : http://docs.datastax.com/en/cassandra/3.0/cassandra/operations/opsChangeKSStrategy.html , http://docs.datastax.com/en/cassandra/3.0/cassandra/operations/opsSwitchSnitch.html
Step (2) For updating token range and setting up new cassandra node, we follow below process.
1) Recalculate token range
root#ip-10-10-1-36:~# token-generator
DC #1:
Node #1: -9223372036854775808
Node #2: -6148914691236517206
Node #3: -3074457345618258604
Node #4: -2
Node #5: 3074457345618258600
Node #6: 6148914691236517202
2) Installed Datastax enterprise same version on new nodes.
3) Stopped the node service and and cleared the data.
4) (a) Assigned token range in following manner to new node.
Node 4: 10.10.2.96 Range: -2
Node 5: 10.10.2.97 Range: 3074457345618258600
Node 6: 10.10.2.86 Range: 6148914691236517202
4) (b) Configured cassandra.yaml on each new node:
Node 4 :
cluster_name: 'SearchGraph'
num_tokens: 1
initial_token: -2
parameters:
- seeds: "10.10.1.46, 10.10.1.56"
listen_address: 10.10.2.96
rpc_address: 10.10.2.96
endpoint_snitch: GossipingPropertyFileSnitch
Node 5 :
cluster_name: 'SearchGraph'
num_tokens: 1
initial_token: 3074457345618258600
parameters:
- seeds: "10.10.1.46, 10.10.1.56"
listen_address: 10.10.2.97
rpc_address: 10.10.2.97
endpoint_snitch: GossipingPropertyFileSnitch
Node 6 :
cluster_name: 'SearchGraph'
num_tokens: 1
initial_token: 6148914691236517202
parameters:
- seeds: "10.10.1.46, 10.10.1.56"
listen_address: 10.10.2.86
rpc_address: 10.10.2.86
endpoint_snitch: GossipingPropertyFileSnitch
5) Changed the snitch.
cat /etc/dse/cassandra/cassandra.yaml | grep endpoint_snitch:
endpoint_snitch: GossipingPropertyFileSnitch
cat /etc/dse/cassandra/cassandra-rackdc.properties |grep -E 'dc=|rack='
dc=SearchGraph
rack=rack1
6) Start DataStax Enterprise on each new node in two minutes intervals with consistent.rangemovement turned off:
JVM_OPTS="$JVM_OPTS -Dcassandra.consistent.rangemovement=false
7) After the new nodes are fully bootstrapped, used nodetool move to assign the new initial_token for existing nodes as per token recalculation done at step 4(a). Process done on each node one at a time.
On Node 1(10.10.1.36) : nodetool move -3074457345618258603
On Node 2(10.10.1.46) : nodetool move -9223372036854775808
On Node 3(10.10.1.56) : nodetool move 3074457345618258602
Datacenter: SearchGraph
Address Rack Status State Load Owns Token
10.10.1.46 rack1 Up Normal 852.93 MiB ? -9223372036854775808
10.10.1.36 rack1 Up Moving 900.12 MiB ? -3074457345618258603
10.10.2.96 rack1 UP Normal 465.02 KiB ? -2
10.10.2.97 rack1 Up Normal 109.16 MiB ? 3074457345618258600
10.10.1.56 rack1 Up Moving 594.49 MiB ? 3074457345618258602
10.10.2.86 rack1 Up Normal 663.94 MiB ? 6148914691236517202
Post Updated:
But we are getting following error while joining nodes.
AbstractSolrSecondaryIndex.java:1884 - Cannot find core chat.chat_history
AbstractSolrSecondaryIndex.java:1884 - Cannot find core chat.history
AbstractSolrSecondaryIndex.java:1884 - Cannot find core search.business_units
AbstractSolrSecondaryIndex.java:1884 - Cannot find core search.feeds
AbstractSolrSecondaryIndex.java:1884 - Cannot find core search.feeds_2
AbstractSolrSecondaryIndex.java:1884 - Cannot find core search.knowledegmodule
AbstractSolrSecondaryIndex.java:1884 - Cannot find core search.userdetails
AbstractSolrSecondaryIndex.java:1884 - Cannot find core search.userdetails_2
AbstractSolrSecondaryIndex.java:1884 - Cannot find core search.vault_details
AbstractSolrSecondaryIndex.java:1884 - Cannot find core search.workgroup
AbstractSolrSecondaryIndex.java:1884 - Cannot find core cloud.feeds
AbstractSolrSecondaryIndex.java:1884 - Cannot find core cloud.knowledgemodule
AbstractSolrSecondaryIndex.java:1884 - Cannot find core cloud.organizations
AbstractSolrSecondaryIndex.java:1884 - Cannot find core cloud.userdetails
AbstractSolrSecondaryIndex.java:1884 - Cannot find core cloud.vaults
AbstractSolrSecondaryIndex.java:1884 - Cannot find core cloud.workgroup
Node joining failed with following error :
ERROR [main] 2017-08-10 04:22:08,449 DseDaemon.java:488 - Unable to start DSE server.
com.datastax.bdp.plugin.PluginManager$PluginActivationException: Unable to activate plugin com.datastax.bdp.plugin.SolrContainerPlugin
Caused by: java.lang.IllegalStateException: Cannot find secondary index for core ekamsearch.userdetails_2, did you create it?
If yes, please consider increasing the value of the dse.yaml option load_max_time_per_core, current value in minutes is: 10
ERROR [main] 2017-08-10 04:22:08,450 CassandraDaemon.java:705 - Exception encountered during startup
java.lang.RuntimeException: com.datastax.bdp.plugin.PluginManager$PluginActivationException: Unable to activate plugin
Has anyone encountered these errors or warnings before?

Token Assign Issue ::
1) I had wrongly assigned token range in Step 4) (a). Assign token which
bisect or trisect the value which are generated using
"token-generator"
Node 4: 10.10.2.96 Range: -6148914691236517206
Node 5: 10.10.2.97 Range: -2
Node 6: 10.10.2.86 Range: 6148914691236517202
Note : We don't need to change the token range of existing nodes in data
center.No need to follow procedure in Step 7 which i have mentioned
above.
Solr Issue resolved : Cannot find cor ::
Increased load_max_time_per_core value in dse.yaml configuration file,
still i was receving the error.Finalys solved the issue
by following method
1) Started the new nodes as non-solr and wait for all cassandra data
to migrate to joining nodes.
2) Add the parameter auto_bootstrap: False directive to the
cassandra.yaml file
3) Re-start the same nodes after enabling solr. Changed parameter
SOLR_ENABLED=1 in /etc/default/dse
3) Re-index in all new joined nodes. I had to reloaded all core
required with the reindex=true and distributed=false parameters in
new joined nodes.
Ref : http://docs.datastax.com/en/archived/datastax_enterprise/4.0/datastax_enterprise/srch/srchReldCore.html

Related

cassandra service (3.11.5) stops automaticall after it starts/restart on AWS linux

cassandra service (3.11.5) stops automatically after it starts/restart on AWS linux.
I have fresh installation of cassandra on new instance of AWS linux (t3.xlarge) and
sudo service cassandra start
or
sudo service cassandra restart
after 1 or 2 seconds, the service stop automatically. I looked into logs and I found these.
I am not sure, I havent change configs related to snitch and its always SimpleSnitch. I dont have any multiple cassandras. Just only on single EC2.
Logs
INFO [main] 2020-02-12 17:40:50,833 ColumnFamilyStore.java:426 - Initializing system.schema_aggregates
INFO [main] 2020-02-12 17:40:50,836 ViewManager.java:137 - Not submitting build tasks for views in keyspace system as storage service is not initialized
INFO [main] 2020-02-12 17:40:51,094 ApproximateTime.java:44 - Scheduling approximate time-check task with a precision of 10 milliseconds
ERROR [main] 2020-02-12 17:40:51,137 CassandraDaemon.java:759 - Cannot start node if snitch's data center (datacenter1) differs from previous data center (dc1). Please fix the snitch configuration, decommission and rebootstrap this node or use the flag -Dcassandra.ignore_dc=true.
Installation steps
sudo curl -OL https://www.apache.org/dist/cassandra/redhat/311x/cassandra-3.11.5-1.noarch.rpm
sudo rpm -i cassandra-3.11.5-1.noarch.rpm
sudo pip install cassandra-driver
export CQLSH_NO_BUNDLED=true
sudo chkconfig --levels 3 cassandra on
The issue is in your log file:
ERROR [main] 2020-02-12 17:40:51,137 CassandraDaemon.java:759 - Cannot start node if snitch's data center (datacenter1) differs from previous data center (dc1). Please fix the snitch configuration, decommission and rebootstrap this node or use the flag -Dcassandra.ignore_dc=true.
It seems that you started the cluster, stopped it and renamed the datacenter from dc1 to datacenter1.
In order to fix:
If no data is stored, delete the data directories
If data is stored, rename the datacenter back to dc1 in the config
I had the same problem , where cassandra service immediately stops after it was started.
in the cassandra configuration file located at /etc/cassandra/cassandra.yaml change the cluster_name to the previous one, like this:
...
# The name of the cluster. This is mainly used to prevent machines in
# one logical cluster from joining another.
cluster_name: 'dc1'
# This defines the number of tokens randomly assigned to this node on the ring
# The more tokens, relative to other nodes, the larger the proportion of data
...

Cassandra restart issues while restoring to a new cluster

I am restoring to a fresh new Cassandra 2.2.5 cluster consisting of 3 nodes.
Initial cluster health of the NEW cluster:
-- Address Load Tokens Owns Host ID Rack
UN 10.40.1.1 259.31 KB 256 ? d2b29b08-9eac-4733-9798-019275d66cfc uswest1adevc
UN 10.40.1.2 230.12 KB 256 ? 5484ab11-32b1-4d01-a5fe-c996a63108f1 uswest1adevc
UN 10.40.1.3 248.47 KB 256 ? bad95fe2-70c5-4a2f-b517-d7fd7a32bc45 uswest1cdevc
As part of the restore instructions in Datastax docs, i do the following on the new cluster:
1) cassandra stop on all of the three nodes one by one.
2) Edit cassandra.yaml for all of the three nodes with the backup'ed token ring information. [Step 2 from docs]
3) Remove the contents from /var/lib/cassandra/data/system/* [Step 4 from docs]
4) cassandra start on nodes 10.40.1.1, 10.40.1.2, 10.40.1.3 respectively.
Result:
10.40.1.1 restarts back successfully:
-- Address Load Tokens Owns Host ID Rack
UN 10.40.1.1 259.31 KB 256 ? 2d23add3-9eac-4733-9798-019275d125d3 uswest1adevc
But the second and the third nodes fail to restart stating:
java.lang.RuntimeException: A node with address 10.40.1.2 already exists, cancelling join. Use cassandra.replace_address if you want to replace this node.
at org.apache.cassandra.service.StorageService.checkForEndpointCollision(StorageService.java:546) ~[apache-cassandra-2.2.5.jar:2.2.5]
at org.apache.cassandra.service.StorageService.prepareToJoin(StorageService.java:766) ~[apache-cassandra-2.2.5.jar:2.2.5]
at org.apache.cassandra.service.StorageService.initServer(StorageService.java:693) ~[apache-cassandra-2.2.5.jar:2.2.5]
at org.apache.cassandra.service.StorageService.initServer(StorageService.java:585) ~[apache-cassandra-2.2.5.jar:2.2.5]
at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:300) [apache-cassandra-2.2.5.jar:2.2.5]
at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:516) [apache-cassandra-2.2.5.jar:2.2.5]
at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:625) [apache-cassandra-2.2.5.jar:2.2.5]
INFO [StorageServiceShutdownHook] 2016-08-09 18:13:21,980 Gossiper.java:1449 - Announcing shutdown
java.lang.RuntimeException: A node with address 10.40.1.3 already exists, cancelling join. Use cassandra.replace_address if you want to replace this node.
...
Eventual cluster health:
-- Address Load Tokens Owns Host ID Rack
UN 10.40.1.1 259.31 KB 256 ? 2d23add3-9eac-4733-9798-019275d125d3 uswest1adevc
DN 10.40.1.2 230.12 KB 256 ? 6w2321ad-32b1-4d01-a5fe-c996a63108f1 uswest1adevc
DN 10.40.1.3 248.47 KB 256 ? 9et4944d-70c5-4a2f-b517-d7fd7a32bc45 uswest1cdevc
I understand that the HostID of a node might change after system dirs are removed.
My question is:
Do i need to explicitly state during the start to replace itself? Are the docs incomplete or am i missing something in my steps?
Turns out there were stale directories commit_log and saved_caches which i missed to delete earlier. The instructions work correctly with those directories deleted.
Usually on a situation like this, after i do a
$ systemctl stop cassandra
It i will run the
$ ps awxs | grep cassandra
will notice cassandra still has some features up.
I usually do a
$ kill -9 cassandra.pid
and
$ rm -rf /var/lib/cassandra/data/* && /var/lib/cassandra/commitlog/*
java.lang.RuntimeException: A node with address 10.40.1.3 already exists, cancelling join. Use cassandra.replace_address if you want to replace this node.
If you are still facing this above error, that means your cassandra process is running on that node. Login to 10.40.1.3 node firstly. Then follow the following steps-
$ jps
You see some processes running. For example:
9107 Jps
1112 CassandraDaemon
Then kill the CassandraDaemon process by the process id you see after executing jps. In my example, here process id 1112 for CassandraDaemon.
$ kill -9 1112
Then check processes again after a while-
$ jps
You will see CassandraDaemon will no longer be available.
9170 Jps
Then remove your saved_caches and commitlog and start cassandra again.
Do this for all nodes you are suffering with above error you mentioned.

Cassandra host in cluster with null ID

Note: We are seeing this issue in our Cassandra 2.1.12.1047 (DSE 4.8.4) cluster with 6 nodes across 3 regions (2 in each region).
Trying to update schemas on our cluster recently, we found the updates were failing. We suspected one node in the cluster was not accepting the change.
When checking the system.peers table of one of our servers in us-east-1, that it had an anomaly, it had what seemed to be a complete entry for a host that does not exist.
cassandra#cqlsh> SELECT peer, host_id FROM system.peers WHERE peer IN ('54.158.22.187', '54.196.90.253');
peer | host_id
---------------+--------------------------------------
54.158.22.187 | 8ebb7f2c-8f81-44af-814b-a537b84834e0
As that host did not exist, I tried to remove it using nodetool removenode but that failed error: Cannot remove self
-- StackTrace --
java.lang.UnsupportedOperationException: Cannot remove self
We know that the .187 server was abruptly terminated a few weeks ago due to an EC2 issue.
We had numerous attempts at trying to make the server healthy, but then in the end simply terminated the server that was reporting this .187 host in the system.peers, ran a nodetool removenode from one of the other servers, and then brought a new server online.
The new server came online, and in an hour or so seemed to have caught up on the backlog of activity needed to bring it inline with the other servers (assumption based purely on CPU monitoring).
However, things are now very odd because the .187 that was reported in the system.peers tables is appearing when we run a nodetool status from any server in the cluster other than the new one we just brought online.
$ nodetool status
Datacenter: DC1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns Host ID Rack
DN 54.158.22.187 ? 256 ? null r1
Datacenter: cassandra-ap-southeast-1-A
======================================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns Host ID Rack
UN 54.255.xx.xx 7.9 GB 256 ? a0c45f3f-8479-4046-b3c0-b2dd19f07b87 ap-southeast-1a
UN 54.255.xx.xx 8.2 GB 256 ? b91c5863-e1e1-4cb6-b9c1-0f24a33b4baf ap-southeast-1b
Datacenter: cassandra-eu-west-1-A
=================================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns Host ID Rack
UN 176.34.xx.xxx 8.51 GB 256 ? 30ff8d00-1ab6-4538-9c67-a49e9ad34672 eu-west-1b
UN 54.195.xx.xxx 8.4 GB 256 ? f00dfb85-6099-40fa-9eaa-cf1dce2f0cd7 eu-west-1c
Datacenter: cassandra-us-east-1-A
=================================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns Host ID Rack
UN 54.225.xx.xxx 8.17 GB 256 ? 0e0adf3d-4666-4aa4-ada7-4716e7c49ace us-east-1e
UN 54.224.xx.xxx 3.66 GB 256 ? 1f9c6bef-e479-49e8-a1ea-b1d0d68257c7 us-east-1d
As there is no way I know of to delete a node that does not have a Host ID, I am quite perplexed.
What can I do to get rid of this rogue node?
Note: Here is the result from a describecluster
$ nodetool describecluster
Cluster Information:
Name: XXX
Snitch: org.apache.cassandra.locator.DynamicEndpointSnitch
Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
Schema versions:
d140bc9b-134c-3dbe-929f-7a84c2cd4532: [54.255.17.28, 176.34.207.151, 54.225.11.249, 54.195.174.72, 54.224.182.94, 54.255.64.1]
UNREACHABLE: [54.158.22.187]
I've never had to do this myself, but probably the only thing left for you to do is to assassinate the endpoint. This was made into a nodetool command (nodetool assassinate) in Cassandra 2.2. But prior to that version, the only way to do it is via JMX. Here's a Gist with detailed instructions (instructions and code by Justen Walker).
Prerequisites
Log onto existing cluster alive node
Download JMX Term
wget
$ wget -q -O jmxterm.jar
> http://downloads.sourceforge.net/cyclops-group/jmxterm-1.0-alpha-4-uber.jar
> curl
or
$ curl -s -o jmxterm.jar
http://downloads.sourceforge.net/cyclops-group/jmxterm-1.0-alpha-4-uber.jar
Run jmxterm
$ java -jar ./jmxterm.jar
Welcome to JMX terminal. Type "help" for available commands.
$>
Assassinate node
Example bad node: 10.0.0.100
Connect to the local cluster
Select the Gossiper MBean Run the
unsafeAssassinateEndpoint with the ip of the bad node
$>open
localhost:7199
#Connection to localhost:7199 is opened
$>bean org.apache.cassandra.net:type=Gossiper
#bean is set to org.apache.cassandra.net:type=Gossiper
$>run unsafeAssassinateEndpoint 10.0.0.100
#calling operation unsafeAssassinateEndpoint of mbean org.apache.cassandra.net:type=Gossiper
#operation returns: null
$>quit
Update 20160308:
I've never had to do this myself
Just had to do this myself. Totally looked-up and followed the steps in my own answer, too.
Update 20220925:
As of Cassandra 3.0, you can complete this task simply by running:
nodetool assassinate 10.0.0.100

Cassandra Cluster configuration

I am trying to configure two windows servers in my network as Cassandra cluster.
I did some reading in various sites and changed the below in Cassandra.yalm
after changing the default value of 127.0.0.1 to actual IP the Cassandra service is not starting.
I also added the map to actual IP to localhost in (windows) hosts file.
After doing the above change, the service is coming up when I start the service. it is stopping immediately.
The reason I am changing this IP is to make this a cluster with two node setup,
Please let me know if I miss some thing.
Version: Datastax community version of Cassandra
Server : windows.
Thx
Muthu
Message from Cassandra.txt in logs dir:
ERROR [main] 2014-09-18 11:43:12,155 DatabaseDescriptor.java (line 116) Fatal configuration error
org.apache.cassandra.exceptions.ConfigurationException: Invalid yaml Caused by: Can't construct a java object for tag:yaml.org,2002:org.apache.cassandra.config.Config; exception=Cannot create property=seed_provider for JavaBean=org.apache.cassandra.config.Config#34e5190a; No suitable constructor with 2 arguments found for class org.apache.cassandra.config.SeedProviderDef in 'reader', line 8, column 1: cluster_name: 'Test Cluster'
If you want to create Cassandra cluster you must have at least two nodes and configure /etc/cassandra/cassandra.yaml
cassandra.yaml
cluster_name: 'Some Cluster Name'
listen_address: [Current IP]
rpc_address: [Current IP]
seed_provuder:
- seeds: "[Current IP], [Remote IP]"
Note: seeds must have at least two IPs which must be reachable for each other
Clean and start Cassandra instance
sudo rm -rf /var/lib/cassandra/* /var/log/cassandra/*
Note: Cassandra instance must be killed before cleaning those folders.

Node is unreachable in single node Cassandra installation

I have a problem with a single node Cassandra installation.
I can start it without any errors in the log.
I can create a keyspace, create tables, insert and delete data.
However truncate is not working
cqlsh> CREATE KEYSPACE mykeyspace WITH REPLICATION = {'class' : 'SimpleStrategy', 'replication_factor': 1};
cqlsh> use mykeyspace;
cqlsh:mykeyspace> create table test1 (num int, primary key (num));
cqlsh:mykeyspace> insert into test1 (num) values (12);
cqlsh:mykeyspace> select * from test1;
num
-----
12
(1 rows)
cqlsh:mykeyspace> truncate test1;
Unable to complete request: one or more nodes were unavailable.
Also if I try to run nodetool describecluster it doesn't return complete response
[XXXX#XXXX dsc-cassandra-2.0.6]$ ./bin/nodetool describecluster
Cluster Information:
Name: Test Cluster
Snitch: org.apache.cassandra.locator.DynamicEndpointSnitch
Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
Schema versions:
UNREACHABLE: [127.0.0.1]
I'm using
Cassandra DSC 2.0.6.
Red Hat 5.8.
java version "1.7.0_51"
Java(TM) SE Runtime Environment (build 1.7.0_51-b13)
Java HotSpot(TM) 64-Bit Server VM (build 24.51-b03, mixed mode)
I get responses for ping 127.0.0.1 and ping localhost
I checked all the ports that I am aware of cassandra may need (7000, 9160, 7199, 9042) using telnet - for example
telnet 127.0.0.1 7199
telnet localhost 7199
I can connect to these ports.
I'm using the default cassandra.yaml. These are the lines where either IP or hostname shows up
listen_address: localhost
rpc_address: localhost
seed_provider:
- class_name: org.apache.cassandra.locator.SimpleSeedProvider
parameters:
- seeds: "127.0.0.1"
I also looked into the source code. I believe the problem can be close to the method org.apache.cassandra.service.StorageProxyMBean.describeSchemaVersions(). Most likely I get no response to the SCHEMA_CHECK message.
I tried to enable TRACE log in log4j for nodetool (conf/log4j-tools.properties) to get more information about the issue, but somehow log4j didn't start logging (it did create the file that I set in the appender, but the file was empty.)
There must be something specific to this environment because I can't repeat this problem in any other environments. So I can't figure out what's causing it.
The problem was that Cassandra couldn't load snappy.
org.xerial.snappy.SnappyError: [FAILED_TO_LOAD_NATIVE_LIBRARY] null
at org.xerial.snappy.SnappyLoader.load(SnappyLoader.java:239)
at org.xerial.snappy.Snappy.<clinit>(Snappy.java:48)
at org.xerial.snappy.SnappyOutputStream.<init>(SnappyOutputStream.java:79)
at org.xerial.snappy.SnappyOutputStream.<init>(SnappyOutputStream.java:66)
at org.apache.cassandra.net.OutboundTcpConnection.connect(OutboundTcpConnection.java:359)
at org.apache.cassandra.net.OutboundTcpConnection.run(OutboundTcpConnection.java:150)
I turned off compression in cassanda.yaml
internode_compression: none
Now both nodetool describecluster and I truncate work.
I also found a similar post here Cassandra Startup Error 1.2.6 on Linux x86_64
Since I can't install another glibc on this machine for the sake of testing I downloaded snappy-java-1.0.4.1.jar and replaced libsnappyjava.so in my snappy-java-1.0.5.jar
With this jar I was able to run cassandra with
internode_compression: all
(I have glibc 2.5 installed)

Resources