Elassandra Integration with Existing Cassandra Instance

I'm trying to learn Elassandra and am having an issue configuring it to my current Cassandra instance (I'm learning Cassandra as well).
I downloaded version 3.11.3 of Cassandra to my local computer. I didn't change anything except the cluster_name in cassandra.yaml. It runs fine, and I used bin/cqlsh to create a keyspace and a "user" table with a couple of rows for testing.
I followed the steps on the Elassandra integration page. I downloaded version 6.2.3.10 of Elassandra. I replaced the cassandra.yaml, cassandra-rackdc.properties and cassandra-topology.properties in the Elassandra conf with the ones from the Cassandra conf (I assume those last two are the "snitch configuration file" mentioned in the instructions, but I'm not sure). I stopped my Cassandra instance and then ran bin/cassandra -e -f from my Elassandra directory.
When I run curl -X GET localhost:9200, the output seems to have my correct cluster name, etc.
However, if I run bin/cqlsh from my Elassandra directory and run describe keyspaces, the keyspace I created under Cassandra isn't there. I tried copying the data directory from Cassandra to Elassandra and that seemed to work, but I feel this can't possibly be the actual solution.
Can someone point me to what I am missing regarding this configuration? Given that the steps are listed on the website, I'm sure there must be some dumb thing I'm missing.
Thanks in advance.

Related

Cassandra multinode cluster setup issue

I am trying to set up a Cassandra multinode cluster on CentOS 7 with OpenJDK.
I have 2 Nodes:
node1 10.99.189.49
node2 10.99.189.50
I have done the following things so far:
Downloaded the Cassandra tarball from the PlanetCassandra site.
Extracted it into the Documents folder.
Created all the necessary directories (data/saved_cache, data/commitlog, data/data) as mentioned in the YAML file.
And I have made 3 changes in my conf/cassandra.yaml file as follows:
On node 10.99.189.49:
seeds: "10.99.189.49"
listen_address: 10.99.189.49
rpc_address: 10.99.189.49
On node 10.99.189.50:
seeds: "10.99.189.49"
listen_address: 10.99.189.50
rpc_address: 10.99.189.50
Now I run cassandra on node 10.99.189.49
and then I run cassandra on the other node.
Cassandra starts normally on both nodes,
BUT
when I do:
bin/nodetool status
I can see only one node in it.
Can anyone point out what I am doing wrong or what I am missing?
So I started adding tips in the comments, and the third time around I thought I'd put them all together in an actual answer.
DataStax does a pretty good job documenting how this should work. Make sure that you've gone through these docs (specifically the first one) and that you're following all the steps:
Initializing a multiple node cluster (single data center)
Adding nodes to an existing cluster
In addition to everything you have mentioned above, make sure that the cluster_name is the same on each node.
I find it easier to make this work using the GossipingPropertyFileSnitch. Set that in your cassandra.yaml on each node:
endpoint_snitch: GossipingPropertyFileSnitch
Then make sure that each of your nodes is specifying the same default data center in the cassandra-rackdc.properties file:
dc=DC1
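For example, a minimal cassandra-rackdc.properties sketch, identical on every node (the rack value is just an illustration; use whatever matches your layout):
dc=DC1
rack=RAC1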
Get your first node (.49) up-and-running. Verify it with nodetool status.
Also verify that you have opened the necessary ports in your firewall. From .49, try telnetting your way to the other node on the ports that Cassandra requires. I recommend 7000, as that is the port for non-SSL inter-node communication.
telnet 10.99.189.50 7000
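If that connection is refused and you are using firewalld on CentOS 7 (an assumption based on your setup), opening the usual Cassandra ports looks roughly like this:
firewall-cmd --permanent --add-port=7000/tcp   # non-SSL inter-node communication
firewall-cmd --permanent --add-port=9042/tcp   # CQL native transport
firewall-cmd --reload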
Once you're sure all that works and everything is configured properly, then bring up .50. I remember reading that you should wait at least 2 minutes before bringing up another node, so do that just to be on the safe side. Tail the logs to make sure it handshakes with the other node, or to see any errors:
tail -f /var/log/cassandra/system.log
Notes: Your log location may vary. I'm assuming you're running 2.2. If you are using a different version of Cassandra, please indicate it.
Hope this helps!
On both nodes use
seeds: "10.99.189.49,10.99.189.50"
and also restart Cassandra on both nodes.

Cassandra: where to modify opscenter agent for a newly added node to existing cluster

I have a single-node Cassandra cluster on EC2 (launched from a DataStax AMI), and I manually added a new node, also backed by the same DataStax AMI, after deleting the data directory and modifying cassandra.yaml. I can see two nodes in the Nodes section of OpsCenter, but the OpsCenter agent is not installed on the new node (1 of 2 agents are connected). It looks like the new node has its own OpsCenter installation and that somehow conflicts with the OpsCenter installation on the first node. I guess I have to fix some configuration file of the OpsCenter agent on the new node so that it points to the OpsCenter installation on the first node, but I can't find where to modify it.
Thanks!
It is the stomp_interface setting in /var/lib/datastax-agent/conf/address.yaml.
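For illustration, a minimal address.yaml sketch (the IP is a placeholder for wherever OpsCenter itself runs):
# /var/lib/datastax-agent/conf/address.yaml
# point the agent at the OpsCenter server, not at the local node
stomp_interface: 10.1.2.3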
I had to manually put stomp_interface into the configuration file. Also, I noticed that the process was looking for /etc/datastax-agent/address.yaml and never looked for /var/lib/datastax-agent/conf/address.yaml
Also, local_interface was not necessary to get things to work for me. YMMV.
I'm not sure where this gets set, or if this changed between agent versions at some point in time. FWIW, I installed both opscenter and the agents via packages.

How to use sstableloader from a node not in Cassandra Cluster ring

We are using apache-cassandra version 1.1.9 on a production Cassandra cluster on Linux. I want to upload some data using sstableloader.
I was able to generate sstables for a small data set and then tried to upload these sstables into the Cassandra cluster using sstableloader from another machine (which is on the same network but not in the Cassandra cluster ring), but I get the error below:
"Could not retrieve endpoint ranges:"
I do not understand why this error occurs.
This machine, where I am running sstableloader, has the same Cassandra installation. I copied the cassandra.yaml from the production Cassandra install into my host machine's apache-cassandra/conf folder.
My sstables are in the following directory structure:
/path/to/keyspace dir/Keyspace/*.db
The sstableloader command I am running is below:
./sstableloader -d -i , /home/Data/Keyspace/
Could not retrieve endpoint ranges:
Please advise if I am doing something wrong here.
Found the solution.
The sstableloader command needs to be executed from the directory containing the Keyspace subdirectory.
For example,
if /home/Data is the directory, under which there are subdirectories keyspace/ColumnFamily/,
then execute a command like the one below from the /home/Data/ directory.
~/apache-cassandra/bin/sstableloader -d /keyspace/ColumnFamily
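To make that concrete, a hypothetical invocation (10.0.0.5 stands in for one of your cluster nodes):
cd /home/Data
~/apache-cassandra/bin/sstableloader -d 10.0.0.5 keyspace/ColumnFamily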
This is a bit old, but I ran into the "Could not retrieve endpoint ranges" error recently with a different root cause.
In our case, data was being exported from a production system and loaded to a new development instance. The development instance had been incorrectly set up, so the sstables were generated using DSE 4.7 while the sstableloader being run was from DSE 4.6.
Note that it is possible to ingest tables from 4.6 into DSE 4.7 for debugging, etc., but it is necessary to run nodetool upgradesstables first. That isn't what was going on here.
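For reference, the upgrade step mentioned above is just a nodetool call (with no arguments it rewrites the sstables of every keyspace on that node):
nodetool upgradesstables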

Cassandra migration

I have Cassandra 0.8.0 running with data on server 1, and a clean install of Cassandra 1.0.3 on server 2.
Is it possible to just copy some files from server 1 to server 2? Or do I have to write my own import/export code?
Both servers can be taken down, restarted, etc.
Why would you not upgrade server1? Upgrade details here (either way, read this first):
http://svn.apache.org/viewvc/cassandra/branches/cassandra-1.0/NEWS.txt?view=markup
But if you do want to change machines, follow the procedures for 'nodetool snapshot' as detailed here:
http://wiki.apache.org/cassandra/Operations#Backing_up_data
Re-create the schema on the new node, then add the snapshots to the data directory (as described above), restart Cassandra, then issue a nodetool scrub.
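A rough command sketch of that procedure, assuming a keyspace named my_keyspace (the name is just a placeholder):
nodetool snapshot my_keyspace     # on the old node
# copy the snapshot SSTables into the new node's data directory for that keyspace,
# re-create the schema and restart Cassandra, then on the new node:
nodetool scrub my_keyspace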
Thanks zznate, it had to do with hardware.
Here are some links I found useful:
http://jonathanhui.com/cassandra-data-maintenance-backup-and-system-recovery
http://wiki.apache.org/cassandra/StorageConfiguration
http://www.memonic.com/user/pneff/folder/database/id/1bZvk
If it looks like nothing happened after migrating, make sure you create the column families on the new node using cassandra-cli.

How do you use the Cassandra tool sstableloader?

I'm trying to use sstableloader to load data into an existing Cassandra ring, but I can't figure out how to actually get it to work. I'm trying to run it on a machine that has a running Cassandra node on it, but when I run it I get an error saying that port 7000 is already in use, which is the port the running Cassandra node is using for gossip.
So does that mean I can only use sstableloader on a machine that is in the same network as the target cassandra ring, but isn't actually running a cassandra node?
Any details would be useful, thanks.
Played around with sstableloader, read the source code, and finally figured out how to run sstableloader on the same machine that hosts a running Cassandra node. There are two key points to get this running. First, you need to create a copy of the Cassandra install folder for sstableloader. This is because sstableloader reads the yaml file to figure out which IP address to use for gossip, and the existing yaml file is being used by Cassandra. The second point is that you'll need to create a new loopback IP address (something like 127.0.0.2) on your machine. Once this is done, change the yaml file in the copied Cassandra install folder to listen on this IP address.
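On Linux, the second loopback address and the edits to the copied yaml would look roughly like this (127.0.0.2 follows the example above):
ip addr add 127.0.0.2/8 dev lo      # add a second loopback address
# in the copied install's conf/cassandra.yaml:
listen_address: 127.0.0.2
rpc_address: 127.0.0.2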
I wrote a tutorial going more into detail about how to do this here: http://geekswithblogs.net/johnsPerfBlog/archive/2011/07/26/how-to-use-cassandrs-sstableloader.aspx
The Austin Cassandra Users Group just had a presentation on this:
http://www.slideshare.net/alex_araujo/etl-with-cassandra-streaming-bulk-loading/
I have used the sstableloader utility provided in cassandra-0.8.4 to successfully load sstables into Cassandra. From some of the issues I have faced, here are a few tips:
If you are running it on a single machine, you have to create a copy of the Cassandra installation folder and run sstableloader from that copy. Also change the listen address and rpc address, and provide the IP address of the running Cassandra node as the seed in the copied cassandra.yaml file. Check that the cluster name in both cassandra.yaml files is the same.
These sstables have to be in a directory whose name is the name of the keyspace
It requires a directory containing a cassandra.yaml configuration file in the classpath.
Note that the schema for the column families to be loaded should be defined beforehand
For reference, see: Using Cassandra SSTableloader
For reference, see: Using Cassandra SSTableloader for bulk loading data into Cassandra:
http://ramuprograms.blogspot.com/2014/07/bulk-loading-data-into-cassandra-using.html
If you are looking to do this in Java, see the utility class below:
BulkWriterLoader
import java.util.ArrayList;
import java.util.List;

import org.apache.cassandra.tools.BulkLoadException;
import org.apache.cassandra.tools.BulkLoader;
import org.apache.cassandra.tools.LoaderOptions;

// params is the caller's own settings object: hosts is a comma-separated list of
// nodes, cassYaml the path to cassandra.yaml, and fullpath the SSTable directory.
List<String> argList = new ArrayList<>();
argList.add("-v");               // verbose output
argList.add("-d");               // initial hosts to connect to
argList.add(params.hosts);
argList.add("-f");               // path to the cassandra.yaml config file
argList.add(params.cassYaml);
argList.add(params.fullpath);    // directory containing the SSTables to stream
LoaderOptions options = LoaderOptions.builder()
        .parseArgs(argList.stream().toArray(String[]::new))
        .build();
try
{
    BulkLoader.load(options);
}
catch (BulkLoadException e)
{
    e.printStackTrace();
}
...
The utility also generates the sstable files using the CQLSSTableWriter class.
Things have improved, and the whole procedure of using sstableloader is much easier, including an easier way to generate sstables with CQLSSTableWriter.
For all the details:
https://docs.datastax.com/en/archived/cassandra/3.0/cassandra/tools/toolsBulkloader.html
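As a rough illustration of CQLSSTableWriter, here is a minimal sketch; the keyspace, table, and output path are hypothetical, not taken from any of the posts above:
import java.io.File;
import org.apache.cassandra.io.sstable.CQLSSTableWriter;

public class SSTableGeneratorSketch
{
    public static void main(String[] args) throws Exception
    {
        // Hypothetical schema; replace with your own keyspace and table.
        String schema = "CREATE TABLE demo_ks.users (id int PRIMARY KEY, name text)";
        String insert = "INSERT INTO demo_ks.users (id, name) VALUES (?, ?)";

        // sstableloader expects the <keyspace>/<table> directory layout.
        File outputDir = new File("/tmp/sstables/demo_ks/users");
        outputDir.mkdirs();

        CQLSSTableWriter writer = CQLSSTableWriter.builder()
                .inDirectory(outputDir)
                .forTable(schema)
                .using(insert)
                .build();

        writer.addRow(1, "alice");
        writer.addRow(2, "bob");
        writer.close();   // flushes the sstables to outputDir
    }
}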
