How to load data to a remote server using sstableloader? - cassandra

I am new to Cassandra. The situation is:
[1] I want to bulk load my Cassandra data from my client PC into "remote server A".
[2] The IP address of remote server A is 192.168..
[3] So I typed the following on my client PC:
$ sstableloader -d 192.168.**.** [path/to/my/clientPC's/cassandra/columnFamily/Directory]
[4] Cassandra is running on both the client PC and remote server A.
Then I get a message like this:
Could not retrieve endpoint ranges:
I can't work out what is going on here. Can somebody please help me?

Ensure that you are running the command from your C* data directory root, and then pass the relative path for the keyspace and columnFamily. The target database must also have the same keyspace name and column family name.
So if your C* data dir in cassandra.yaml is defined as /cassandra/data, your keyspace is ks1, and your column family is my_cf, then cd to /cassandra/data and run sstableloader -d <ip> ks1/my_cf.
From http://www.datastax.com/docs/1.1/references/bulkloader
Using sstableloader
In binary installations, sstableloader is located in the <install_location>/bin directory.
The sstableloader bulk loads the SSTables found in the directory <dir_path> to the configured cluster. The parent directory of <dir_path> is used as the keyspace name. For example, to load an SSTable named Standard1-he-1-Data.db into keyspace Keyspace1, the files Keyspace1-Standard1-he-1-Data.db and Keyspace1-Standard1-he-1-Index.db must be in a directory called Keyspace1/Standard1/.
sstableloader [options] <dir_path>
Example:
$ ls -1 Keyspace1/Standard1/
Keyspace1-Standard1-he-1-Data.db
Keyspace1-Standard1-he-1-Index.db
$ <install_location>/bin/sstableloader -d localhost <keyspace>/<dir_name>/
Also, make sure any sstableloader defaults (such as port) match your target C* cluster.
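As a rough illustration of the directory convention described above, the following shell sketch derives the keyspace and column family names from an SSTable directory path and prints the sstableloader command that would be run from the data directory root. The function names (keyspace_of, table_of, loader_command) and the host 10.0.0.5 are hypothetical and only for illustration; the command form sstableloader -d <ip> ks1/my_cf is the one given in the answer.

```shell
#!/bin/sh
# keyspace_of DIR: name of the parent directory of the SSTable directory,
# which sstableloader treats as the keyspace name.
keyspace_of() {
    parent=$(dirname "${1%/}")
    basename "$parent"
}

# table_of DIR: name of the SSTable directory itself (the column family).
table_of() {
    basename "${1%/}"
}

# loader_command HOST DIR: print (do not run) the sstableloader invocation,
# using the relative keyspace/columnfamily path expected at the data dir root.
loader_command() {
    host=$1
    dir=${2%/}
    printf 'sstableloader -d %s %s/%s\n' "$host" "$(keyspace_of "$dir")" "$(table_of "$dir")"
}
```

For example, loader_command 10.0.0.5 /cassandra/data/ks1/my_cf prints the command to run after cd /cassandra/data. The target cluster still needs a keyspace and column family with the same names.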

Related

How to transfer data from local file system (linux) to a Hadoop Cluster made on Google Cloud Platform

I am a beginner in Hadoop. I set up a Hadoop cluster (one master and two slaves) on Google Cloud Platform.
I accessed the master of the cluster from the local file system (Linux) using: ssh -i key key@public_ip_of_master
Then I ran sudo su - inside the cluster because Hadoop functions only appear while being root.
Then I started HDFS using start-dfs.sh and start-all.sh.
Now the problem is that I want to transfer files from the local Linux file system to the Hadoop cluster and vice versa using the following command (entered inside the cluster as root):
root@master:~# hdfs dfs -put /home/abas1/Desktop/chromFa.tar.gz /Hadoop_File
The problem is that the local path, /home/abas1/Desktop/chromFa.tar.gz, is never recognized, and I do not know what to do.
I am sure I am missing something trivial, but I do not know what it is. I have to use either -copyFromLocal or -put.
local path is never recognized
That is not a Hadoop problem, then. You are on the master node (over SSH), as the root user. There is a /root folder with files, and probably no /home/abas1.
In other words, run ls -l /home, and you see what local files are available.
To upload files from that terminal session, you first need to copy them to the master server with scp from the machine that has them.
Exit the SSH session, then run:
scp -i key /home/abas1/Desktop/chromFa.tar.gz root@master-ip:/tmp
ssh -i key root@master-ip
Then you can do this:
hdfs dfs -mkdir /Hadoop_File
ls -l /tmp | grep chromFa # for example, to check file
hdfs dfs -put /tmp/chromFa.tar.gz /Hadoop_File/
Hadoop functions only appear while being root.
Please do not use root for interacting with Hadoop services. Create unique user accounts for HDFS, YARN, Zookeeper, etc. with restricted permissions, as you would for any other Unix process.
Using Dataproc will do this for you, and you can still SSH to it, so you should really consider using it instead of a manual GCE cluster.
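The root cause above (the file exists on the laptop, not on the master) can be caught before calling HDFS at all. The sketch below is a minimal guard, assuming the hdfs client is on PATH on the machine where it runs; the function names local_file_ok and put_checked are made up for this example.

```shell
#!/bin/sh
# local_file_ok PATH: succeed only if PATH is a readable regular file
# on THIS machine (the one where the shell session is running).
local_file_ok() {
    [ -f "$1" ] && [ -r "$1" ]
}

# put_checked SRC DEST: upload SRC into the HDFS directory DEST,
# failing early with a hint when SRC is not present on this host.
put_checked() {
    src=$1
    dest=$2
    if ! local_file_ok "$src"; then
        echo "not found on $(hostname): $src (copy it here with scp first)" >&2
        return 1
    fi
    # Create the target directory if needed, then upload.
    hdfs dfs -mkdir -p "$dest" && hdfs dfs -put "$src" "$dest"
}
```

On the master, put_checked /tmp/chromFa.tar.gz /Hadoop_File would upload the file copied over with scp, while put_checked /home/abas1/Desktop/chromFa.tar.gz /Hadoop_File would stop with the hint instead of an HDFS error.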

Unable to run cqlsh (connection refused)

I'm getting the connection error "unable to connect to any server" when I run the ./cqlsh command from the bin directory of my node.
I'm using an edited yaml file containing only the following settings (all other values present in the default yaml have been omitted):
cluster name, num tokens, partitioner, data file directories, commitlog directory, commitlog sync, commitlog sync period, saved cache directory, seed provider info, listen address and endpoint snitch.
Is this error because I have not included some important parameter in the yaml, like rpc address? Please help.
OS: RHEL 6.9
Cassandra: 3.0.14
The values in the cassandra yaml file can be modified, but you should not delete rows to make your own yaml file. And yes, rpc address is needed in the yaml file.
When writing directory settings such as data_file_directories, you should follow the same indentation as:
data_file_directories:
    - /path/to/access
Cassandra is very strict about indentation in its yaml file. I once faced an issue due to wrong indentation in data_file_directories.
Finally, run ./cqlsh, and provide the ip_address if it is a remote server.
Check the nodetool status and confirm whether the node is up and normal.
Check the following:
Cassandra is running: nodetool status / ps -elf | grep cassa
Port 9042 (default for CQL) is not used by something else: netstat -an | grep 9042
Try running cqlsh `hostname -i`
Good luck.
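To see quickly which settings a trimmed-down yaml no longer defines, a small check like the following may help. It is only a sketch: the function name missing_yaml_keys is hypothetical, and which keys to pass is an assumption (rpc_address comes from the answer above; start_native_transport and native_transport_port are other cassandra.yaml settings related to the CQL port that may be worth checking).

```shell
#!/bin/sh
# missing_yaml_keys FILE KEY...: print each KEY that does not appear as an
# uncommented top-level "key:" entry in FILE.
# A commented-out line such as "# rpc_address: localhost" counts as missing.
missing_yaml_keys() {
    file=$1
    shift
    for key in "$@"; do
        if ! grep -Eq "^${key}:" "$file"; then
            echo "$key"
        fi
    done
}
```

For example: missing_yaml_keys conf/cassandra.yaml rpc_address start_native_transport native_transport_port. Any key it prints is a candidate to restore from the default yaml shipped with the distribution.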

Setting up Cassandra on Cloud9 IDE

I've followed these instructions to install Cassandra: http://docs.datastax.com/en/cassandra/2.0/cassandra/install/installDeb_t.html
When I run $ cqlsh, the terminal replies with
Connection error: Could not connect to localhost:9160
I read that the issue might be with the configuration file cassandra.yaml.
However, it turned out I can't access it. My etc/cassandra folder is empty, as the attached image shows.
How do I access cassandra.yaml?
Where is Cassandra stored in my project?
Is there a way to check if Cassandra is actually set up in the project?
The image you have attached shows the ~/.cassandra directory in your home dir. That's not the same as /etc/cassandra. You should be able to confirm this with the following command:
$ ls -al /etc/cassandra/cassandra.yaml
-rw-r--r-- 1 cassandra cassandra 43985 Mar 11 12:46 /etc/cassandra/cassandra.yaml
To verify if Cassandra is even running, this should work for you if you have successfully completed the packaged install:
$ sudo service cassandra status
Otherwise, simply running this should work, too:
$ ps -ef | grep cassandra
When you set up Cassandra, you'll want to set the listen_address and rpc_address to the machine's hostname or IP. They're set to localhost by default, so if it's running, cqlsh should connect to that automatically.
My guess is that Cassandra is not starting for you. Check the system.log file, which (for the packaged install) is stored in /var/log/cassandra:
$ cat /var/log/cassandra/system.log
Check out that file, and you might find some clues as to what is happening here.
Also, did you really install Cassandra 2.0? That version has been deprecated, so for a new install you shouldn't go any lower than Cassandra 2.1.
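Since the confusion here was between ~/.cassandra and /etc/cassandra, a tiny helper that checks a list of candidate directories for cassandra.yaml can make the difference visible. This is a sketch only; the function name find_cassandra_yaml is hypothetical, and the candidate directories you pass are up to you.

```shell
#!/bin/sh
# find_cassandra_yaml DIR...: print the path of the first cassandra.yaml
# found in the given directories; return non-zero if none contains it.
find_cassandra_yaml() {
    for dir in "$@"; do
        if [ -f "$dir/cassandra.yaml" ]; then
            echo "$dir/cassandra.yaml"
            return 0
        fi
    done
    return 1
}
```

For example, find_cassandra_yaml "$HOME/.cassandra" /etc/cassandra prints /etc/cassandra/cassandra.yaml on a packaged install, and prints nothing (with a non-zero exit status) if the package was never installed.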

Cassandra moving data_file_directories

Regarding the location of cassandra created data files and system files, I need to move the "commitlog_directory", "data_file_directories" and "saved_caches_directory" which have settings in the "cassandra.yaml" config file. It is currently at the default location "/var/lib/cassandra". The data is only some test data and of course the system generated keyspaces which are
dse_perf
dse_system
OpsCenter
system
system_traces
There are also the commitlog and saved_caches.db to move.
I am thinking of moving the keyspace directories with Linux shell commands, but I'm very unsure whether they will become corrupt somehow. There is simply no space on the default drive, and we need to move everything to the secondary and tertiary mounted drives.
Right now I'm in the process of moving all the files and resetting the yaml settings.
I have two questions:
[1] Besides the cassandra.yaml file, are there any other files that depend on the location of commitlog_directory, data_file_directories and saved_caches_directory, such that a 'wrong location' will cause failure once I move all these files? I am also concerned that the files inside the tables themselves (like the db files) hold references to their own location and will cause failure once they are moved.
[2] If I just change the three settings commitlog_directory, data_file_directories and saved_caches_directory, will dse/cassandra actually create all the system keyspaces (system_traces, dse_perf, system, OpsCenter, dse_system), the commitlog and the saved_caches.db, and will any other upstream config files be out of sync with that (same as the first part of question 1)?
It is a very new installation, so reinstalling would not be the end of the world, but I really don't want to because we have Kerberos and all kinds of other stuff on top of this cluster now.
The OS is Ubuntu 14.04 and the DSE version is 4.7.
I just finished doing this. My instances are in AWS EC2 so your process may vary, but in essence:
create a new volume and attach it to the instance. My new device was /dev/xvdg.
create a new mount point: sudo mkdir /new_data
format the new volume: sudo mkfs -t ext4 /dev/xvdg
edit /etc/fstab so that your mount will survive reboots, and add this line: /dev/xvdg /new_data ext4 defaults,nofail,nobootwait 0 2
mount the new volume: sudo mount -a
make the new directories: sudo mkdir -p /new_data/lib/cassandra/commitlog
change the ownership: sudo chown -R cassandra:cassandra /new_data/lib/cassandra
change cassandra.yaml to point to the new dirs
drain the node. If you're moving the data dir, copy over the data from the old location to the new location. If you're moving the commitlog only, just restart cassandra.
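The last few steps can be sketched as a small script with a dry-run switch, so the commands can be reviewed before anything is touched. This is an illustration under assumptions, not the answerer's exact procedure: the helper names run and move_cassandra_dir are hypothetical, stopping the service before copying is an added precaution, and the service name cassandra is a guess (on DSE package installs the service is usually called dse).

```shell
#!/bin/sh
# run CMD...: execute the command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# move_cassandra_dir OLD NEW: copy one Cassandra directory to a new location.
move_cassandra_dir() {
    old=$1
    new=$2
    run nodetool drain                       # flush memtables to disk
    run sudo service cassandra stop          # assumption: adjust the service name
    run sudo mkdir -p "$new"
    run sudo cp -a "$old/." "$new/"          # copy contents, keeping modes and timestamps
    run sudo chown -R cassandra:cassandra "$new"
    echo "next: point cassandra.yaml at $new, then start the service"
}
```

Running DRY_RUN=1 move_cassandra_dir /var/lib/cassandra/data /new_data/lib/cassandra/data prints the commands without executing them. The old directory is left in place, so it can be removed once the node is confirmed healthy.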
I was able to move all the files and the commitlog as well. I changed the yaml and pointed it to where I wanted it to go. Remember to run the following command afterward -
chown -R cassandra:cassandra
And voila! Everything is reading/writing as it should. Cassandra is neato.

Where is Cassandra backup stored on Windows

I am using the Cassandra 1.2 db on Windows 7.
I want to take a backup of a keyspace.
I am doing the following:
C:\Workspace\apache-cassandra-1.2.4-bin\bin> nodetool -h localhost -p 7199 snapshot myDb
Starting NodeTool
Requested snapshot for: myDb
Snapshot directory: 1371534210892
C:\Workspace\apache-cassandra-1.2.4-bin\bin>
So it shows the snapshot directory as 1371534210892. What does that mean?
Where can I find the snapshot just created?
TL;DR:
Check C:\var\lib\cassandra\data\myDb\<column family>\snapshots\1371534210892
Before I provide details, it's important that you know my environment so you can compare.
How I setup Cassandra
I downloaded the zip from Apache's website then I unzipped it to C:\apache-cassandra-1.2.5 and finally I added the CASSANDRA_HOME environment variable.
How I start / backup Cassandra
I start cassandra by running cassandra.bat in the bin folder:
C:\apache-cassandra-1.2.4\bin\cassandra.bat
I back up cassandra by running the same command that you did (I backed up system because it was a fresh cassandra install):
nodetool -h localhost snapshot system
# output:
Starting NodeTool
Requested snapshot for: system
Snapshot directory: 1371547087563
I then browsed to the following directory where I found the 1371547087563 folder:
C:\var\lib\cassandra\data\system\local\snapshots
The snapshot is created for every column family in the keyspace, so with a clean install I could also find it in:
C:\var\lib\cassandra\data\system\schema_columns\snapshots
C:\var\lib\cassandra\data\system\schema_columnfamilies\snapshots
C:\var\lib\cassandra\data\system\schema_keyspaces\snapshots
So basically it backs up every column family of the keyspace that you provide at the end as a parameter to the nodetool command. Because I specified system as the param, the command created snapshots under the system keyspace's column family directories (local, schema_columns, schema_columnfamilies, schema_keyspaces).
In your case, the myDb keyspace directory is the one you are after.
Find the 1371534210892 folder inside cassandra/data/yourkeyspacename (equivalent to /var/lib/cassandra/data/yourkeyspacename on Linux). There, each CF has a 1371534210892 folder under its snapshots directory, which is the latest one.
This base cassandra folder is the data folder created during installation, not the one containing bin and the other directories.
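On a Unix-like shell (Linux, or something like Git Bash on Windows, which is an assumption here), the per-column-family snapshot folders can be listed in one go with find. The function name find_snapshot is hypothetical; the data directory and snapshot id are whatever your install and nodetool output give you.

```shell
#!/bin/sh
# find_snapshot DATA_DIR SNAPSHOT_ID: list every directory named SNAPSHOT_ID
# that sits directly under a "snapshots" directory below DATA_DIR.
find_snapshot() {
    data_dir=$1
    snapshot_id=$2
    find "$data_dir" -type d -path "*/snapshots/$snapshot_id"
}
```

For example, find_snapshot /var/lib/cassandra/data/myDb 1371534210892 prints one line per column family that has that snapshot.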
