I am trying to set up a Cassandra multi-node cluster on CentOS 7 with OpenJDK.
I have 2 Nodes:
node1 10.99.189.49
node2 10.99.189.50
I have done the following so far:
Downloaded the Cassandra tarball from the PlanetCassandra site.
Extracted it into my Documents folder.
Created all the necessary directories (data/saved_cache, data/commitlog, data/data) as mentioned in the YAML file.
And I have made 3 changes in my conf/cassandra.yaml file as follows:
On node 10.99.189.49:
seeds: "10.99.189.49"
listen_address: 10.99.189.49
rpc_address: 10.99.189.49
On node 10.99.189.50:
seeds: "10.99.189.49"
listen_address: 10.99.189.50
rpc_address: 10.99.189.50
Now I run cassandra on node 10.99.189.49
and then I run cassandra on the other node.
Cassandra starts normally on both nodes,
BUT
when I do:
bin/nodetool status
I can see only one node in the output.
Can anyone point out what I am doing wrong, or am I missing something?
So I started adding tips in the comments, and the third time around I thought I'd put them all together in an actual answer.
DataStax does a pretty good job documenting how this should work. Make sure that you've gone through these docs (specifically the first one) and that you're following all the steps:
Initializing a multiple node cluster (single data center)
Adding nodes to an existing cluster
In addition to everything you have mentioned above, make sure that the cluster_name is the same on each node.
I find it easier to make this work using the GossipingPropertyFileSnitch. Set that in your cassandra.yaml on each node:
endpoint_snitch: GossipingPropertyFileSnitch
Then make sure that each of your nodes is specifying the same default data center in the cassandra-rackdc.properties file:
dc=DC1
Get your first node (.49) up-and-running. Verify it with nodetool status.
Also verify that you have opened the necessary ports in your firewall. From .49, try telneting your way to the other node on the ports that Cassandra requires. I recommend 7000, as that is the port for non-SSL inter-node communication.
telnet 10.99.189.50 7000
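On CentOS 7, assuming the default firewalld is in use, the ports could be opened like this (a sketch; adjust ports and zones to your environment):
sudo firewall-cmd --permanent --add-port=7000/tcp   # inter-node communication (non-SSL)
sudo firewall-cmd --permanent --add-port=9042/tcp   # native protocol clients
sudo firewall-cmd --permanent --add-port=7199/tcp   # JMX, used by nodetool
sudo firewall-cmd --reload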
Once you're sure all that works and everything is configured properly, then bring up .50. I remember reading that you should wait at least 2 minutes before bringing up another node, so do that just to be on the safe side. Tail the logs to make sure it handshakes with the other node, or to see any errors:
tail -f /var/log/cassandra/system.log
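You can also grep for handshake activity once the second node comes up; the exact log wording varies by version:
grep -i handshak /var/log/cassandra/system.log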
Notes: Your log location may vary. I'm assuming you're running 2.2. If you are using a different version of Cassandra, please indicate it.
Hope this helps!
On both nodes use:
seeds: "10.99.189.49,10.99.189.50"
and then restart Cassandra on both nodes.
I am about to use a simple Cassandra cluster (3 nodes, x.x.x.104-106). I'm using CentOS 7, so I used the DataStax repository, Cassandra 3.0.
I read on a forum that it is better to install cassandra-stress outside the cluster; otherwise it consumes CPU on the node.
Could you please help me figure out how to install it?
I tried copying cassandra-stress.sh separately, but it depends on some Cassandra files (probably created during installation).
So I decided to install the whole of Cassandra on a separate server in the same network. Now I'm struggling with the correct setup for running the cassandra-stress tool against the Cassandra cluster.
In cassandra.yaml I set the cluster name, listen_address to the public IP, rpc_address to the loopback address, and seeds to the Cassandra cluster nodes (x.x.x.104-106)... but in general it does not make sense to set all this up, since I don't want to create another node in the cluster.
Could you please help me?
Edit: Maybe using something like this might be the correct way?
cassandra-stress user profile=/usr/cassandra/stress-file.yaml ops(insert=1,books=1) n=10000 -node x.x.x.104,x.x.x.105,x.x.x.106 -port native= ?
telnet [cassandra_node_ip_address] 7000 works fine.
If you have your Cassandra cluster running with the proper ports open (by default 9042 for clients and 7199 for JMX) and a Cassandra installation directory on a different machine, then you should be able to run cassandra-stress from outside the cluster against it, simply by passing the -node option with the IP of one of the nodes in your cluster (say x.x.x.104). For example,
$CASSANDRA_HOME/tools/bin/cassandra-stress write -node x.x.x.104
should work. You can see more options with
$CASSANDRA_HOME/tools/bin/cassandra-stress help
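As a further illustration (the count and IP are placeholders), a read-heavy mixed run after an initial write run might look like:
$CASSANDRA_HOME/tools/bin/cassandra-stress mixed ratio\(write=1,read=3\) n=100000 -node x.x.x.104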
on every node:
in cassandra.yaml, set rpc_address to the node's IP address
in cassandra-env.sh, set LOCAL_JMX=no and the JMX option -Dcom.sun.management.jmxremote.authenticate=false
open firewall port 7199
restart the firewall and Cassandra
on cassandra-stress server:
cassandra-stress user profile=/usr/cassandra/stress-books.yaml ops\(insert=1,books=1\) n=10000 \
    -node 172.16.20.104,172.16.20.105,172.16.20.106 -port native=9042 thrift=9160 jmx=7199
Note! With authentication disabled like this, JMX communication is not secured.
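For reference, a user profile like the stress-books.yaml above might look roughly like this - a minimal sketch, with keyspace, table, and column names of my own invention:
keyspace: stress_ks
keyspace_definition: |
  CREATE KEYSPACE stress_ks WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3};
table: books
table_definition: |
  CREATE TABLE books (
    id uuid PRIMARY KEY,
    title text,
    author text
  );
queries:
  books:
    cql: SELECT * FROM books WHERE id = ?
    fields: samerow
With a profile like this, ops(insert=1,books=1) mixes the built-in insert operation with the user-defined books query in a 1:1 ratio.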
I've been tasked with re-writing some sub-par Ansible playbooks to stand up a Cassandra cluster in CentOS. Quite frankly, there doesn't seem to be much information on Cassandra out there.
I've managed to get the service running on all three nodes at the same time, using the following configuration file, info scrubbed.
HOSTIP=10.0.0.1
MSIP=10.10.10.10
ADMIN_EMAIL=my@email.com
LICENSE_FILE=/tmp/license.conf
USE_LDAP_REMOTE_HOST=n
ENABLE_AX=y
MP_POD=gateway
REGION=test-1
USE_ZK_CLUSTER=y
ZK_HOSTS="10.0.0.1 10.0.0.2 10.0.0.3"
ZK_CLIENT_HOSTS="10.0.0.1 10.0.0.2 10.0.0.3"
USE_CASS_CLUSTER=y
CASS_HOSTS="10.0.0.1:1,1 10.0.0.2:1,1 10.0.0.3:1,1"
CASS_USERNAME=test
CASS_PASSWORD=test
The HOSTIP changes depending on which node the configuration file is on.
The problem is, when I run nodetool ring, each node reports only two nodes in the cluster: itself and one other, seemingly at random from the remaining two.
What are some basic sanity checks to determine a "healthy" Cassandra cluster? Why is nodetool saying each one thinks there's a different node missing from the cluster?
nodetool status - overview of the cluster (load, state, ownership)
nodetool info - more granular details at the node-level
As for the node mismatch, I would check the following (see the quick port check after this list):
cassandra-topology.properties - identical across the cluster (all 3 IPs listed)
cassandra.yaml - I typically keep this file the same across all nodes. The parameters that MUST stay the same across the cluster are: cluster_name, seeds, partitioner, and snitch.
verify all nodes can reach each other (ping, telnet, etc.)
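A minimal sketch of such a reachability check, run from any one node and assuming nc (netcat) is installed; the IPs are the ones from the question:
for ip in 10.0.0.1 10.0.0.2 10.0.0.3; do
  for port in 7000 9042 7199; do   # inter-node, native clients, JMX
    nc -zv -w 2 "$ip" "$port"
  done
done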
DataStax (Cassandra Vendor) has some good documentation. Please note that some features are only available on DataStax Enterprise -
http://docs.datastax.com/en/landing_page/doc/landing_page/current.html
Also check out the Apache Cassandra site -
http://cassandra.apache.org/community/
As well as the user forums -
https://www.mail-archive.com/user@cassandra.apache.org/
Actually, the thing you really want to check is whether all the nodes AGREE on the schema_id. nodetool status shows whether nodes are up, down, or joining, yet it does not really mean the cluster is 'healthy' enough to make schema changes or other changes.
The simplest way is:
nodetool describecluster
Cluster Information:
Name: FooBarCluster
Snitch: org.apache.cassandra.locator.GossipingPropertyFileSnitch
DynamicEndPointSnitch: enabled
Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
Schema versions:
43fe9177-382c-327e-904a-c8353a9df590: [10.136.2.1, 10.136.2.2, 10.136.2.3]
If the schema IDs do not match, you need to wait for the schema to settle, or run repairs. A mismatch looks like this, for example:
43fe9177-382c-327e-904a-c8353a9df590: [10.136.2.1, 10.136.2.2]
43fe9177-382c-327e-904a-c8353a9dxxxx: [10.136.2.3]
However, running nodetool is 'heavy' and hard to parse.
The same information is available inside the database itself; you can query:
SELECT schema_version, release_version FROM system.local;
SELECT peer, schema_version, release_version FROM system.peers;
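For example, these can be run non-interactively from a shell with cqlsh (the host IP is a placeholder):
cqlsh 10.136.2.1 -e "SELECT schema_version, release_version FROM system.local;"
cqlsh 10.136.2.1 -e "SELECT peer, schema_version, release_version FROM system.peers;"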
Then you compare schema_version across all nodes... if they match, the cluster is very likely healthy. You should ALWAYS check this before making any changes to schema.
Now, during a rolling upgrade, when changing engine versions, the release_version differs between nodes, so to support automatic rolling upgrades you need to check schema_id agreement within each release_version separately.
I'm not sure all of the problems you might be having, but...
Check the cassandra.yaml file. At a minimum, 3 things need to be the same across nodes - the seeds: list (but do not list all nodes as seeds!), cluster_name, and the snitch. Make sure your listen_address is correct.
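A minimal sketch of the matching settings; the cluster name and IPs here are my own placeholders:
cluster_name: 'MyCluster'                        # identical on every node
seed_provider:
  - class_name: org.apache.cassandra.locator.SimpleSeedProvider
    parameters:
      - seeds: "10.0.0.1,10.0.0.2"               # same short list everywhere, not all nodes
endpoint_snitch: GossipingPropertyFileSnitch     # identical on every node
listen_address: 10.0.0.1                         # unique per node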
If you are using the GossipingPropertyFileSnitch, then check the cassandra-topology.properties and/or cassandra-rackdc.properties files for accuracy.
Don't start all the nodes at the same time. Start the seed nodes first - the other nodes will "gossip" with the seed nodes to learn the cluster topology. Shut the seed nodes down last.
Don't use shared storage. That defeats the purpose of distributed data and is considered a Cassandra anti-pattern.
If you're in AWS, don't use auto-scaling groups unless you know what you're doing.
Once you've done all that, use nodetool status, nodetool ring, nodetool info, or JMX to see what the cluster is doing.
DataStax does have decent documentation for Cassandra.
I just installed a 3 node cassandra (2.0.11) community cluster with a single seed node. I installed opscenter (5.0.2) on the seed node and everything is working fairly well. The only issue I am having is that any node actions I perform (stop, start, compact, etc) apply only to the seed node. Even if I choose a different node on the ring or list, the action always happens on the seed node.
I watched the opscenter logs and can see requests for /ops/compact/ip_address, and the IP address is the correct node that I chose, but the action always runs on the seed instance.
All agents have been installed on all the nodes and the cluster is fully operational. I can run nodetool compact on each node and see the compaction progress in opscenter.
I have each node configured to listen on an internal address and have verified that the rpc server is open on the network. I have also tried adding the cluster using a non-seed node but all actions continue to run on the seed node.
Posted the answer above but I'll explain more in detail for anyone else with this issue.
I changed rpc_address and listen_address in cassandra.yaml in order to listen on a private ip address. I restarted cassandra and the cluster could communicate easily. The datastax-agent was still reporting 127.0.0.1 to opscenter as the rpc address. I found this out by enabling trace logging in opscenter.
If you modify anything in cassandra.yaml, make sure you restart the datastax-agent as it apparently caches the data.
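For example, with a packaged install the restart would be something like this (the service name is assumed from the DataStax packages):
sudo service datastax-agent restart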
I have a four-node Apache Cassandra Community 1.2 cluster in a single datacenter, with one seed node.
All configurations are similar in cassandra.yaml file.
I am facing the following issues; please help.
1] Though the fourth node isn't listed in the nodetool ring or status output, system.log shows only that this node isn't communicating via the gossip protocol with the other nodes.
However, both the JMX and telnet ports are enabled, with the proper listen/seed addresses configured.
2] Though OpsCenter is able to recognize all four nodes, the agents are not getting installed from OpsCenter.
However, the same JVM version is installed and JAVA_HOME is set on all four nodes.
I further observed that the problematic node runs 64-bit Ubuntu while the other nodes run 32-bit Ubuntu; could that be the reason?
What version of Cassandra are you using? I reported a similar kind of bug in Cassandra 1.2.4 and was told to move to subsequent versions.
Are you using the gossiping property file snitch? If that's the case, your problem should be solved by making sure your cassandra-topology.properties files are up to date on every node.
If all of that is fine, check your TCP-level connections via netstat and tcpdump. If the connections are getting dropped at the application layer, then consider a rolling restart.
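A sketch of such a check; the interface name and peer IP are placeholders:
# established inter-node connections on the storage port
netstat -tn | grep 7000
# watch gossip traffic to one peer
sudo tcpdump -i eth0 host 10.0.0.2 and port 7000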
Your statement is actually very raw; my assumption is that your server-level configuration might be wrong.
I would suggest you check that cassandra-topology.properties and cassandra-rackdc.properties are consistent across all nodes.
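One quick way to verify that consistency; hostnames and paths are placeholders:
for h in node1 node2 node3 node4; do
  ssh "$h" md5sum /etc/cassandra/cassandra-topology.properties /etc/cassandra/cassandra-rackdc.properties
done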
Is there any detailed step-by-step document covering a multi-node Cassandra installation on Windows? I read some documents/blogs and tried it on Windows 7 workstations/Windows 2008 servers, but was not able to establish a connection from the 2nd node to the 1st node.
When I was setting up my first cluster on Windows, I found this blog post to be excellent. It covers many aspects of the setup, including:
Firewall / Networking issues.
Running Cassandra as a service.
Monitoring and maintenance.
If you want to create a complete setup using just Cassandra, have a look at this blog.
But to set up a multi-node cluster, you basically need to have the correct ports open on your servers. When it comes to configuration, you are basically going to have identical cassandra.yaml configs across all your nodes, with the same seeds list; the only two fields that need to change are listen_address and possibly rpc_address (although you could just listen on all interfaces for rpc_address by setting it to):
rpc_address: 0.0.0.0
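Since this question is about Windows, a sketch of opening the usual ports with netsh; the rule names are my own:
netsh advfirewall firewall add rule name="Cassandra inter-node" dir=in action=allow protocol=TCP localport=7000
netsh advfirewall firewall add rule name="Cassandra native clients" dir=in action=allow protocol=TCP localport=9042
netsh advfirewall firewall add rule name="Cassandra JMX" dir=in action=allow protocol=TCP localport=7199
Depending on your Cassandra version you may also need 9160 (Thrift) and 7001 (SSL inter-node).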