How to use sstableloader from a node not in the Cassandra cluster ring

We are using apache-cassandra version 1.1.9 on a production Cassandra cluster on Linux. I want to upload some data using sstableloader.
I was able to generate SSTables for a small data set and then tried to upload them into the Cassandra cluster using sstableloader from another machine (which is on the same network but not in the Cassandra cluster ring), but I get the error below:
"Could not retrieve endpoint ranges:"
I do not understand why this error occurs.
The machine where I am running sstableloader has the same Cassandra installation. I copied cassandra.yaml from the production Cassandra into my host machine's apache-cassandra/conf folder.
My SSTables are in the following directory structure:
/path/to/keyspace dir/Keyspace/*.db
The sstableloader command I am running is:
./sstableloader -d -i , /home/Data/Keyspace/
Could not retrieve endpoint ranges:
Please advise if I am doing something wrong here.

Found the solution.
The sstableloader command needs to be executed from the directory containing the Keyspace subdirectory.
For example, if /home/Data is the directory under which the subdirectories keyspace/ColumnFamily/ live, then execute a command like the one below from the /home/Data/ directory:
~/apache-cassandra/bin/sstableloader -d /keyspace/ColumnFamily
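To illustrate, a layout and invocation like the following should work (the host 10.0.0.1 and the SSTable file names here are hypothetical; exact file names depend on your Cassandra version):

/home/Data/
    keyspace/
        ColumnFamily/
            keyspace-ColumnFamily-hd-1-Data.db
            keyspace-ColumnFamily-hd-1-Index.db
            ...

cd /home/Data
~/apache-cassandra/bin/sstableloader -d 10.0.0.1 keyspace/ColumnFamily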

This is a bit old, but I ran into the "Could not retrieve endpoint ranges" error recently with a different root cause.
In our case, data was being exported from a production system and loaded into a new development instance. The development instance had been set up incorrectly, so the SSTables were generated using DSE 4.7 while the sstableloader being run came from DSE 4.6.
Note that it is possible to ingest tables from DSE 4.6 into DSE 4.7 for debugging, etc., but it is necessary to run nodetool upgradesstables first. That isn't what was going on here.
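For reference, that upgrade is run via nodetool on each node of the ingesting cluster; the keyspace and table names below are placeholders:

nodetool upgradesstables                 # rewrite all SSTables on older formats
nodetool upgradesstables ks table1       # or restrict it to specific column families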

Related

How to migrate Cassandra data from a Windows server onto a Linux server?

I am not sure if my Windows-based Cassandra installation will work as-is on Linux-based Cassandra nodes.
My data resides in a Cassandra database on Windows, and I plan to shift it onto a Linux server in order to use Elassandra.
Can the same data files be copied from Windows to Linux into the same Cassandra directories?
As the two use different file systems, I have some doubts whether that will work.
If not, what is the workaround to migrate all the data?
The issue with the files has more to do with the version of Cassandra, rather than the OS. Cassandra's implementation in Java makes the underlying OS somewhat (albeit not completely) irrelevant.
Each version of Cassandra has a specific format for writing its SSTable files. As long as the version of Cassandra is the same between each server, copying the files should work.
Otherwise, if the Windows and Linux servers can see each other on the network, the easiest way to migrate would be to join the Linux server to the "cluster" on Windows. Just give the Linux server the IP of the Windows machine as its seed, set the cluster_name to be the same, and it should join. Then adjust the keyspace replication and run a repair.
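As a sketch, the relevant cassandra.yaml settings on the Linux node would be along these lines (the cluster name and Windows IP are hypothetical):

cluster_name: 'MyCluster'    # must match the Windows node's cluster_name
seed_provider:
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
          - seeds: "192.168.1.20"    # IP of the Windows machine

Once the node joins, the keyspace change and repair would look roughly like:

ALTER KEYSPACE my_keyspace WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 2};
nodetool repair my_keyspace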
You should not repair but instead stream data from your existing DC with nodetool rebuild -- [source_dc_name]:
1. Start all nodes in your new DC with auto_bootstrap: false in conf/cassandra.yaml.
2. Run nodetool rebuild on these nodes.
3. Remove auto_bootstrap: false from conf/cassandra.yaml.
If you start Elassandra in your new DC, Elasticsearch indices will be rebuilt while streaming or repairing, so have fun with it!
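A minimal sketch of that sequence, assuming the existing datacenter is named DC1 (substitute your own DC name):

# in conf/cassandra.yaml on each node in the new DC, before first start:
# auto_bootstrap: false
nodetool rebuild -- DC1
# when rebuild completes, delete the auto_bootstrap line again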

Elassandra Integration with Existing Cassandra Instance

I'm trying to learn Elassandra and am having an issue configuring it to my current Cassandra instance (I'm learning Cassandra as well).
I downloaded version 3.11.3 of Cassandra to my local computer. I didn't change anything except the cluster_name inside cassandra.yaml. It runs fine, and I used bin/cqlsh to create a keyspace and a "user" table with a couple of rows for testing.
I followed the steps on the Elassandra integration page. I downloaded version 6.2.3.10 of Elassandra. I replaced the cassandra.yaml, cassandra-rackdc.properties and cassandra-topology.properties in the Elassandra conf with the ones from the Cassandra conf (I am assuming those last two are the "snitch configuration file" mentioned in the instructions, but I'm not sure). I stopped my Cassandra instance and then ran bin/cassandra -e -f from my Elassandra directory.
When I run curl -X GET localhost:9200, the output seems to have my correct cluster name, etc.
However, if I run bin/cqlsh from my Elassandra directory and run describe keyspaces, the keyspace I created under Cassandra isn't there. I tried copying the data directory from Cassandra to Elassandra and that seemed to work, but I feel this can't possibly be the actual solution.
Can someone point me to what I am missing in regards to this configuration? With the steps being listed on the website, I'm sure there must be some dumb thing I'm missing.
Thanks in advance.

Migrate from Open Source Cassandra to Datastax Enterprise

We have our current production cluster running on Cassandra 2.2.4
[cqlsh 5.0.1 | Cassandra 2.2.4 | CQL spec 3.3.1 | Native protocol v4]
We want to migrate this setup to a new cluster with DSE 5.0 without disturbing our current production.
What are the steps to do this, with zero/minimal downtime?
We want to have this as a separate cluster.
Can we use sstableloader from the source to the destination cluster and run sstableupgrade at the destination?
Should we stop compaction on the existing cluster when running sstableloader?
How do we transfer the SSTables newly created by ongoing production traffic?
Should we make the application write to both clusters, but read only from the old cluster, until the new cluster is in sync with the old one?
Should we run sstableloader from the old data directory or from the snapshot directory? What is the difference between the two approaches?
1. Set up your new cluster using DSE 5.0.
2. Begin writing to both of your clusters.
3. SCP your SSTables over from the old cluster to the new cluster.
4. Use sstableloader, from your new cluster, on the SSTables you just copied (see the sketch after this list).
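A rough sketch of steps 3 and 4, with hypothetical hosts and paths (sstableloader expects the directory path to end in keyspace/table):

# on a node in the old cluster, for each table:
scp -r /path/to/data/my_keyspace/my_table user@new-node:/tmp/load/my_keyspace/my_table
# then, from the new cluster:
sstableloader -d new-node-ip /tmp/load/my_keyspace/my_table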
Should we run sstableloader from the old data directory or from the snapshot directory? What is the difference between the two approaches?
It depends. Snapshots are exactly that: a snapshot of the specific state your cluster was in at some point in time. If you want the freshest data, use the SSTables currently in your data directory.

Enable Spark on Same Node As Cassandra

I am trying to test out Spark so I can summarize some data I have in Cassandra. I've been through all the DataStax tutorials, and they are very vague as to how you actually enable Spark. The only indication I can find is that it comes enabled automatically when you select the "Analytics" node type during install. However, I have an existing Cassandra node, and I don't want to have to use a different machine for testing, as I am just evaluating everything on my laptop.
Is it possible to just enable Spark on the same node and deal with any performance implications? If so, how can I enable it so that it can be tested?
I see the folders there for Spark (although I'm not positive all the files are present), but when I check whether it's set as the Spark master, it says that no Spark nodes are enabled:
dsetool sparkmaster
I am using Linux Ubuntu Mint.
I'm just looking for a quick and dirty way to get my data averaged and so forth, and Spark seems like the way to go since it's a massive amount of data, but I want to avoid having to pay to host multiple machines (at least for now, while testing).
Yes, Spark is also able to interact with a cluster even if it is not on all the nodes.
Package install
Edit the /etc/default/dse file, changing the appropriate lines depending on the type of node you want:
...
Spark nodes:
SPARK_ENABLED=1
HADOOP_ENABLED=0
SOLR_ENABLED=0
Then restart the DSE service
http://docs.datastax.com/en/datastax_enterprise/4.5/datastax_enterprise/reference/refDseServ.html
Tar Install
Stop DSE on the node and then restart it using the following command from the install directory:
...
Spark only node: $ bin/dse cassandra -k - Starts Spark trackers on a cluster of Analytics nodes.
http://docs.datastax.com/en/datastax_enterprise/4.5/datastax_enterprise/reference/refDseStandalone.html
Enable Spark by setting SPARK_ENABLED=1
using the command: sudo nano /usr/share/dse/resources/dse/conf/dse.default
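Putting it together for a package install, the sequence is roughly as follows (the service command is a sketch; verify the paths on your system):

sudo nano /etc/default/dse     # set SPARK_ENABLED=1
sudo service dse restart
dsetool sparkmaster            # should now print the Spark master address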

How do you use the Cassandra tool sstableloader?

I'm trying to use sstableloader to load data into an existing Cassandra ring, but I can't figure out how to actually get it to work. I'm trying to run it on a machine that has a running Cassandra node on it, but when I run it I get an error saying that port 7000 is already in use, which is the port the running Cassandra node uses for gossip.
So does that mean I can only use sstableloader on a machine that is on the same network as the target Cassandra ring, but isn't actually running a Cassandra node?
Any details would be useful, thanks.
Played around with sstableloader, read the source code, and finally figured out how to run sstableloader on the same machine that hosts a running Cassandra node. There are two key points to get this running. First, you need to create a copy of the Cassandra install folder for sstableloader. This is because sstableloader reads the yaml file to figure out what IP address to use for gossip, and the existing yaml file is being used by Cassandra. Second, you'll need to create a new loopback IP address (something like 127.0.0.2) on your machine. Once this is done, change the yaml file in the copied Cassandra install folder to listen on this IP address.
I wrote a tutorial going more into detail about how to do this here: http://geekswithblogs.net/johnsPerfBlog/archive/2011/07/26/how-to-use-cassandrs-sstableloader.aspx
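On Linux, the extra loopback address can be added with something like the following sketch (interface configuration varies by distribution):

sudo ip addr add 127.0.0.2/8 dev lo

Then set listen_address (and the RPC address) in the copied cassandra.yaml to 127.0.0.2.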
The Austin Cassandra Users Group just had a presentation on this:
http://www.slideshare.net/alex_araujo/etl-with-cassandra-streaming-bulk-loading/
I have used the sstableloader utility provided in cassandra-0.8.4 to successfully load SSTables into Cassandra. Based on some of the issues I faced, I have the following tips (a cassandra.yaml sketch follows the list):
If you are running it on a single machine, you have to create a copy of the Cassandra installation folder and run sstableloader from that copy. In the copied cassandra.yaml, change the listen address and RPC address, and provide the IP address of the running Cassandra node as the seed. Check that the cluster name in both cassandra.yaml files is the same.
The SSTables have to be in a directory whose name is the name of the keyspace.
sstableloader requires a directory containing a cassandra.yaml configuration file on the classpath.
Note that the schema for the column families to be loaded should be defined beforehand.
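As a sketch, the edits from the first tip in the copied cassandra.yaml would look roughly like this (the addresses and cluster name are hypothetical, and the exact yaml layout varies by Cassandra version):

listen_address: 127.0.0.2
rpc_address: 127.0.0.2
cluster_name: 'ProdCluster'    # must match the target cluster
# seeds should list a running node of the target cluster, e.g. 10.0.0.1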
For reference, see: Using Cassandra SStableloader; Using Cassandra SStableloader for bulk loading data into Cassandra; and http://ramuprograms.blogspot.com/2014/07/bulk-loading-data-into-cassandra-using.html
If you are looking to do this in Java, see the BulkWriterLoader utility class below:
BulkWriterLoader
import java.util.ArrayList;
import java.util.List;

import org.apache.cassandra.tools.BulkLoadException;
import org.apache.cassandra.tools.BulkLoader;
import org.apache.cassandra.tools.LoaderOptions;

// Build the same arguments you would pass to sstableloader on the command line.
List<String> argList = new ArrayList<>();
argList.add("-v");               // verbose output
argList.add("-d");
argList.add(params.hosts);       // comma-separated initial hosts
argList.add("-f");
argList.add(params.cassYaml);    // path to a cassandra.yaml for client settings
argList.add(params.fullpath);    // directory containing the SSTables (ends in keyspace/table)

LoaderOptions options = LoaderOptions.builder()
        .parseArgs(argList.stream().toArray(String[]::new))
        .build();
try
{
    BulkLoader.load(options);    // streams the SSTables to the cluster
}
catch (BulkLoadException e)
{
    e.printStackTrace();
}
...
The code will also generate the sstable files using the CQLSSTableWriter class.
Things have improved, and the whole procedure of using sstableloader is now much easier, including an easier way to generate SSTables with the CQLSSTableWriter class.
For all the details:
https://docs.datastax.com/en/archived/cassandra/3.0/cassandra/tools/toolsBulkloader.html
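As a minimal sketch of the writer side (the keyspace, table, and output directory below are hypothetical; CQLSSTableWriter ships in the cassandra-all artifact):

import java.io.File;

import org.apache.cassandra.io.sstable.CQLSSTableWriter;

public class WriterSketch
{
    public static void main(String[] args) throws Exception
    {
        // Schema and prepared insert for the hypothetical target table.
        String schema = "CREATE TABLE ks.users (id int PRIMARY KEY, name text)";
        String insert = "INSERT INTO ks.users (id, name) VALUES (?, ?)";

        // The output directory must already exist and follow the keyspace/table layout.
        File dir = new File("/tmp/sstables/ks/users");
        dir.mkdirs();

        CQLSSTableWriter writer = CQLSSTableWriter.builder()
                .inDirectory(dir)
                .forTable(schema)
                .using(insert)
                .build();

        writer.addRow(1, "alice");
        writer.addRow(2, "bob");
        writer.close();    // flush; the directory can now be fed to sstableloader
    }
}

The resulting directory can then be loaded with sstableloader -d <host> /tmp/sstables/ks/users.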
