Install multi-node Cassandra on Windows

Is there a detailed step-by-step document that covers installing multi-node Cassandra on Windows? I read some documents and blogs and tried on Windows 7 workstations and Windows 2008 servers, but I was not able to establish a connection from the second node to the first.

When I was setting up my first cluster on Windows, I found this blog post to be excellent. It covers many aspects of the setup, including:
Firewall / Networking issues.
Running Cassandra as a service.
Monitoring and maintenance.
If you want to create a complete setup using just Cassandra, have a look at this blog.
To set up a multi-node cluster, you basically need to have the correct ports open on your servers. When it comes to configuration, you are going to have identical cassandra.yaml configs across all your nodes, with the same seeds list. The only two fields that need to change per node are listen_address and possibly rpc_address, although for rpc_address you could just listen on all interfaces by setting it to:
rpc_address: 0.0.0.0
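Since the original problem was that the second node could not reach the first, a quick reachability check can help separate firewall issues from configuration issues. The following is only a minimal sketch: it assumes the default ports 7000 (inter-node traffic) and 9042 (CQL clients), and the address passed to check_node is a hypothetical placeholder for your first node.

```python
import socket


def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


def check_node(host, ports=(7000, 9042)):
    """Map each port to whether it is reachable from this machine.

    The default ports are an assumption (Cassandra defaults for
    inter-node and CQL traffic); adjust them to your cassandra.yaml.
    """
    return {port: port_open(host, port) for port in ports}


# Example call, run from the second node (the address is a placeholder):
# print(check_node("192.168.1.10"))
```

If a port shows as unreachable while Cassandra is running on the target node, the Windows firewall or the listen_address binding is the first thing to look at.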

Related

cassandra-stress: how to install and set it up outside the Cassandra cluster

I am about to use a simple Cassandra cluster (3 nodes, x.x.x.104-106). I'm using CentOS 7, so I used the DataStax repository, Cassandra 3.0.
I read on a forum that it is better to install cassandra-stress outside the cluster, otherwise it consumes CPU on the node.
Could you please help me with how to install it?
I tried to copy cassandra-stress.sh separately, but it depends on some Cassandra files (probably created during installation).
So I decided to install the whole of Cassandra on a separate server in the same network. Now I'm struggling with the correct setup for running the cassandra-stress tool against the Cassandra cluster.
In cassandra.yaml I set the cluster name, listen_address to the public IP, rpc_address to the loopback address, and seeds to the Cassandra cluster nodes (x.x.x.104-106)... but in general it does not make sense to set this up, since I don't want to create another node in the Cassandra cluster.
Could you please help me?
Edit: Maybe using something like this might be the correct way?
cassandra-stress user profile=/usr/cassandra/stress-file.yaml ops(insert=1,books=1) n=10000 -node x.x.x.104,x.x.x.105,x.x.x.106 -port native= ?
telnet [cassandra_node_ip_address] 7000 works fine.
If your Cassandra cluster is running with the proper ports open (by default 9042 for clients and 7199 for JMX), and you have a Cassandra directory on a different machine, then you should be able to run cassandra-stress from outside the cluster, against your cluster, simply by passing the -node option with the IP of one of the nodes in your cluster (say x.x.x.104). For example,
$CASSANDRA_HOME/tools/bin/cassandra-stress write -node x.x.x.104
should work. You can see more options with
$CASSANDRA_HOME/tools/bin/cassandra-stress help
On every node:
in cassandra.yaml, set rpc_address to the node's IP address
in cassandra-env.sh, set LOCAL_JMX=no and set the JMX option authenticate=false (-Dcom.sun.management.jmxremote.authenticate=false)
open firewall port 7199
restart the firewall and Cassandra
On the cassandra-stress server:
cassandra-stress user profile=/usr/cassandra/stress-books.yaml ops\(insert=1,books=1\) n=10000 -node 172.16.20.104,172.16.20.105,172.16.20.106 -port native=9042 thrift=9160 jmx=7199
Note: with authentication disabled, JMX communication is not secured.
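If the stress run is launched from a script rather than typed into a shell, the same invocation can be assembled as an argument list. This is only a sketch: the helper name build_stress_command is hypothetical, the profile path and addresses are the ones quoted above, and it only builds the command, it does not run it.

```python
import shlex


def build_stress_command(profile, ops, count, nodes, native_port=9042):
    """Build the argument list for a `cassandra-stress user` run.

    Only options that appear in the command above are used.
    """
    ops_spec = ",".join(f"{name}={weight}" for name, weight in ops.items())
    return [
        "cassandra-stress", "user",
        f"profile={profile}",
        f"ops({ops_spec})",        # no backslashes needed in a list
        f"n={count}",
        "-node", ",".join(nodes),  # comma-separated contact points
        "-port", f"native={native_port}",
    ]


command = build_stress_command(
    profile="/usr/cassandra/stress-books.yaml",
    ops={"insert": 1, "books": 1},
    count=10000,
    nodes=["172.16.20.104", "172.16.20.105", "172.16.20.106"],
)
# shlex.join shows a shell-safe form of the same command.
print(shlex.join(command))
```

Passing the list to subprocess.run(command) avoids the shell entirely, which is why the parentheses in ops(...) need no escaping here, unlike on the command line.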

What address should I use for listen_address in cassandra.yaml?

I am trying to set up a multi-node Cassandra database on two different machines.
How am I supposed to configure the cassandra.yaml file?
The DataStax documentation says:
listen_address
(Default: localhost) The IP address or hostname that other Cassandra nodes use to connect to this node. If left unset, the hostname must resolve to the IP address of this node using /etc/hostname, /etc/hosts, or DNS. Do not specify 0.0.0.0.
When I use 'localhost' as the value of listen_address, it runs fine on the local machine, but when I use my IP address, it fails to connect. Why is that?
Configuring nodes and seed nodes is fairly simple in Cassandra, but certain steps must be followed. The procedure for setting up a multi-node cluster is well documented, and I will quote from the linked document.
I think it is easier to illustrate the setup with 4 nodes instead of 2, since a 2-node cluster makes little sense for a real Cassandra deployment. If you had 4 nodes split between 2 machines, with 1 seed node on each machine, the conceptual configuration would appear as follows:
node1 86.82.155.1 (seed 1)
node2 86.82.155.2
node3 192.82.156.1 (seed 2)
node4 192.82.156.2
If each of these machines is the same in terms of layout, you can use the same cassandra.yaml file across all nodes. As the documentation puts it:
If the nodes in the cluster are identical in terms of disk layout, shared libraries, and so on, you can use the same copy of the cassandra.yaml file on all of them.
You will need to list the IP address of each seed node under the seeds setting in cassandra.yaml:
-seeds: internal IP address of each seed node
parameters:
- seeds: "86.82.155.1,192.82.156.1"
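To make the split between shared and per-node values concrete, here is a small sketch using the four example addresses above. It assumes you set listen_address explicitly on each node rather than leaving it unset; the function name node_settings and the is_seed flag are illustrative only and are not cassandra.yaml settings.

```python
# Node addresses and seed list taken from the example above.
NODES = {
    "node1": "86.82.155.1",
    "node2": "86.82.155.2",
    "node3": "192.82.156.1",
    "node4": "192.82.156.2",
}
SEEDS = ["86.82.155.1", "192.82.156.1"]


def node_settings(nodes, seeds):
    """Return, per node, its own listen_address and the shared seeds string."""
    unknown = [ip for ip in seeds if ip not in nodes.values()]
    if unknown:
        raise ValueError(f"seed addresses not in the node list: {unknown}")
    seed_list = ",".join(seeds)  # identical string on every node
    return {
        name: {
            "listen_address": ip,    # differs on every node
            "seeds": seed_list,      # same on every node
            "is_seed": ip in seeds,  # informational, not a yaml key
        }
        for name, ip in nodes.items()
    }


for name, settings in node_settings(NODES, SEEDS).items():
    print(name, settings)
```

The point of the check at the top is the mistake described in this answer: a seeds entry that does not match any node's address.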
Understanding the difference between a node and a seed node is important. If you get these IP addresses crossed, you may experience issues similar to what you are describing; from your comment, it appears you have corrected the configuration.
Seed nodes do not bootstrap, which is the process of a new node joining an existing cluster. For new clusters, the bootstrap process on seed nodes is skipped.
If you are having trouble grasping the node-based architecture, read the Architecture in Brief document or watch the Understanding Core Concepts class.

How to use Zookeeper with Azure HDInsight Linux cluster?

Obviously I need to start a zookeeper server on one of the cluster machines, then I need other client machines to connect to this server.
What I did was connect to the head node over ssh, where I found a ZooKeeper server running on port 2181. I then used ifconfig to get the machine's IP address (for example 10.0.0.8) and had my worker nodes connect to:
10.0.0.8:2181.
However, my MR job now completes, but it runs slowly and the output is not correct. I suspect that I'm doing something wrong with ZooKeeper, especially since I didn't follow a tutorial and improvised my steps.
HDInsight has multiple ZooKeeper servers. I am not sure whether specifying only one might be the cause of the problem you are seeing.
I wrote an example a while back that uses Storm to write to HBase (both servers on the same Azure Virtual Network), and as part of the configuration I had to specify the three ZooKeeper servers for the component that writes to HBase (https://azure.microsoft.com/en-us/documentation/articles/hdinsight-storm-sensor-data-analysis/ is the article).
From the cluster head node, you can probably ping zookeeper0, zookeeper1, and zookeeper2 to find the IP address of each.
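ZooKeeper clients generally accept a comma-separated list of host:port pairs, so all three servers can be given instead of one. A minimal sketch, assuming the hostnames zookeeper0 to zookeeper2 actually resolve on your cluster and that every server listens on port 2181 as observed on the head node (the helper name is hypothetical):

```python
def zk_connect_string(hosts, port=2181):
    """Join ZooKeeper hosts into a 'host:port,host:port' connection string."""
    if not hosts:
        raise ValueError("at least one ZooKeeper host is required")
    return ",".join(f"{host}:{port}" for host in hosts)


# Hostnames are an assumption; verify them with ping from the head node.
ZK_HOSTS = ["zookeeper0", "zookeeper1", "zookeeper2"]
print(zk_connect_string(ZK_HOSTS))
# prints: zookeeper0:2181,zookeeper1:2181,zookeeper2:2181
```

With the full list, a client can fail over to another server if the one it is connected to becomes unavailable.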

Open ports for Cassandra in Google Cloud Compute production environment

I've been working with Storm topologies and Cassandra databases for a relatively short period of time. I recently realized that my development environment's spec is not strong enough for my testing, so I deployed a 3-node Cassandra cluster on Google Cloud. Now I'd like to let a Storm topology (hosted on a separate box) insert into Cassandra. Obviously, this is not enabled by default, and I'd like a guideline on how to securely open Cassandra for database queries from a different IP in a production scenario. (I suspect that Google protects its instances with a firewall as well?)
Following Carlos Rojas's directions in THIS LINK, I could open the ports to access Cassandra from a computer outside the network. You can also open ports in your firewall using this line (from THIS LINK):
gcutil addfirewall cassandra-rule --allowed="tcp:9042,tcp:9160" --network="default" --description="Allow external Cassandra Thrift/CQL connections"

Dynamically adding new nodes in Cassandra

Is it possible to add new hosts to a Cassandra cluster dynamically?
What I'm trying to do is set up a program that can:
Set up a local version of the database for each user
Each user's machine will become part of the cluster (the machines will be hosts)
Data will be replicated across all the clusters
Building a cluster of multiple hosts usually entails configuring the cassandra.yaml to store the seeds, listen_address and rpc_address of each host.
My idea is to edit these files through Java and insert the new host addresses as required, but making sure that the data is accurate across each user's cassandra.yaml file would be challenging.
I'm wondering if someone has done something similar or has any advice on a better way to achieve this.
Yes, it is possible. Look at Netflix's Priam for a complete example of dynamic Cassandra cluster management (although it is designed to work with Amazon EC2).
For rpc_address and listen_address, you can set up a startup script that fixes cassandra.yaml if the values are not correct.
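The startup-script idea could look roughly like the following. It is a sketch under stated assumptions, not Priam's approach: it treats cassandra.yaml as plain text and rewrites only uncommented top-level "key: value" lines, and the function name apply_overrides is hypothetical.

```python
import re


def apply_overrides(yaml_text, overrides):
    """Return yaml_text with each top-level `key: value` line replaced.

    Commented-out and indented lines are left untouched. Keys that do
    not occur as an uncommented top-level line are appended at the end.
    """
    lines = yaml_text.splitlines()
    remaining = dict(overrides)
    for i, line in enumerate(lines):
        match = re.match(r"^([A-Za-z_]+):", line)
        if match and match.group(1) in remaining:
            key = match.group(1)
            lines[i] = f"{key}: {remaining.pop(key)}"
    lines.extend(f"{key}: {value}" for key, value in remaining.items())
    return "\n".join(lines) + "\n"


# Usage idea (path and address are placeholders, not from the question):
# text = open("/etc/cassandra/cassandra.yaml").read()
# new_text = apply_overrides(text, {"listen_address": "10.0.0.5",
#                                   "rpc_address": "10.0.0.5"})
# open("/etc/cassandra/cassandra.yaml", "w").write(new_text)
```

A line-based rewrite keeps the comments in the file intact, which a load-and-dump through a YAML library would typically discard. It does not handle nested keys such as the seeds list, which is one reason the custom seed provider is suggested for that part.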
For seeds, you can configure a custom seed provider. Look at the seed provider used by Netflix's Priam for some ideas on how to implement it.
The most difficult part will be managing the tokens assigned to each node in an efficient way. Cassandra 1.2 is around the corner and will include a feature called virtual nodes that, IMO, will work well in your case. See the Acunu presentation about it.
