After starting Cassandra and beginning batch writes, the system disk fills up, as df -h shows. But I can't find which files are using the space. I tried inspecting with du -h, with no success. After restarting the machine, the problem still exists.
When I delete some files and start Cassandra again, I get about 11GB available.
Any advice on how to solve this problem?
Thanks
Check the following locations for data and commit log files. You can configure them in the cassandra.yaml file.
data_file_directories
The directory location where table data (SSTables) is stored. Cassandra distributes data evenly across the location, subject to the granularity of the configured compaction strategy. Default locations:
Package installations: /var/lib/cassandra/data
Tarball installations: install_location/data/data
commitlog_directory
The directory where the commit log is stored. Default locations:
Package installations: /var/lib/cassandra/commitlog
Tarball installations: install_location/data/commitlog
For more information read this.
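To see how much space those two directories actually take, a small helper along these lines may be useful. It is only a sketch: the function name dir_usage_kb is made up for illustration, and the paths in the usage comment are the package-install defaults quoted above, so adjust them to whatever your cassandra.yaml says.

```shell
# Print the size in KiB of each directory passed as an argument, largest first.
# Directories that do not exist are reported on stderr and skipped.
# (dir_usage_kb is a hypothetical helper name, not a Cassandra tool.)
dir_usage_kb() {
    local dir
    for dir in "$@"; do
        if [ -d "$dir" ]; then
            # -s: one total per directory, -k: sizes in KiB
            du -sk "$dir"
        else
            echo "missing: $dir" >&2
        fi
    done | sort -rn
}

# Example with the default package-install locations:
# dir_usage_kb /var/lib/cassandra/data /var/lib/cassandra/commitlog
```

Sorting by the numeric first column puts the largest directory on top, which makes it easier to tell whether SSTables or commit logs are filling the disk.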
Related
I'm trying to set up Cassandra 3.11.6.1 (I also tried 3.11.4.1) but failed to make it work. In the YAML configuration I set the commitlog, hints, data and saved_cache directories to another root dir, but according to the logs Cassandra ignores these settings and tries to open directories under the default root dir:
WARN [HintsWriteExecutor:1] 2020-05-06 11:30:34,864 NativeLibrary.java:306 - open(/var/lib/cassandra/hints, O_RDONLY) failed, errno (2).
ERROR [HintsWriteExecutor:1] 2020-05-06 11:30:34,864 HintsCatalog.java:167 - Unable to open directory /var/lib/cassandra/hints
The group/owner is correctly set and chmod is 0777 to avoid any user rights problems.
The last thing I've tried is to create a symlink /var/lib/cassandra pointing to my datastore directory but it doesn't change anything.
Is it possible to use a directory configuration other than the default one?
Has anyone faced this problem and solved it? (And how, please?)
The problem was that Cassandra reads /etc/cassandra/default.conf/cassandra.yaml even if you create your own cassandra.yaml in /etc/cassandra.
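One way to spot this kind of mismatch is to print the directory settings from every candidate config file side by side. The following is a rough sketch: show_dir_settings is a hypothetical helper, and it only looks at uncommented top-level keys, so it is not a real YAML parser.

```shell
# Show the directory-related settings found in each cassandra.yaml passed in.
# Files that do not exist are silently skipped.
# (show_dir_settings is a made-up name used for illustration.)
show_dir_settings() {
    local file
    for file in "$@"; do
        [ -f "$file" ] || continue
        echo "== $file"
        awk '
            # Uncommented top-level directory keys
            /^(data_file_directories|commitlog_directory|hints_directory|saved_caches_directory):/ {
                print
                # data_file_directories is a YAML list; remember to print its items
                inlist = ($1 == "data_file_directories:")
                next
            }
            # List items ("- /some/path") that follow data_file_directories
            inlist && /^[ \t]*-/ { print; next }
            { inlist = 0 }
        ' "$file"
    done
}

# Example with the two locations mentioned above:
# show_dir_settings /etc/cassandra/cassandra.yaml /etc/cassandra/default.conf/cassandra.yaml
```

If the two files disagree, the paths in the log messages tell you which one the running node actually loaded.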
Regarding the location of cassandra created data files and system files, I need to move the "commitlog_directory", "data_file_directories" and "saved_caches_directory" which have settings in the "cassandra.yaml" config file. It is currently at the default location "/var/lib/cassandra". The data is only some test data and of course the system generated keyspaces which are
dse_perf
dse_system
OpsCenter
system
system_traces
There are also the commitlog and saved_caches.db to move.
I am thinking of moving the keyspace directories with linux shell commands but I'm very unsure if they will become corrupt somehow. There is simply no space in the default drive and we need to move everything to the secondary and tertiary mounted drives.
Right now I'm in the process of moving all the files and resetting the yaml settings.
I have two questions -
Regarding the cassandra.yaml file, are there any other files besides this that are depended upon to have the location of the commitlog_directory and data_file_directories and saved_caches_directory, and their 'wrong location' will cause failure once I move all these files? I am also concerned the files (like the db files) inside the tables themselves have references to their own location and cause failure once they are moved.
If I just change the three settings commitlog_directory, data_file_directories and saved_caches_directory, will DSE/Cassandra actually create all the system keyspaces (system_traces, dse_perf, system, OpsCenter, dse_system), the commitlog and the saved_caches.db in the new locations, and will any other upstream config files be out of sync with that (same as the first part of question 1)?
It is a very new installation, so reinstalling would not be the end of the world, but I really don't want to because we have Kerberos and all kinds of other stuff on top of this cluster now.
This OS is ubuntu 14.0.4 and the DSE version is 4.7.
I just finished doing this. My instances are in AWS EC2 so your process may vary, but in essence:
create a new volume and attach it to the instance. my new device was /dev/xvdg.
create new mount point: sudo mkdir /new_data
format the new volume: sudo mkfs -t ext4 /dev/xvdg
edit /etc/fstab so that your mount will survive reboots and add this line: /dev/xvdg /new_data ext4 defaults,nofail,nobootwait 0 2
mount the new volume: sudo mount -a
make the new directories: sudo mkdir -p /new_data/lib/cassandra/commitlog
chown the ownership: sudo chown -R cassandra:cassandra /new_data/lib/cassandra
change cassandra.yaml to point to the new dirs
drain the node. if you're moving the data dir, copy over the data from the old location to the new location. if you're moving commitlog only, just restart cassandra.
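The copy step at the end can be sketched roughly as below. This is an illustration, not a tested migration script: copy_and_verify is a hypothetical helper, it assumes the node has already been drained and stopped, and it assumes the destination directory starts out empty (otherwise the file-count comparison is meaningless).

```shell
# Copy a directory tree, preserving modes and timestamps (and ownership when
# run as root), then compare the number of regular files on both sides as a
# basic sanity check. Returns non-zero if the copy fails or the counts differ.
# (copy_and_verify is a made-up name; the destination is assumed to be empty.)
copy_and_verify() {
    local src=$1 dst=$2
    local n_src n_dst
    mkdir -p "$dst" || return 1
    # "src/." copies the contents of src, including dotfiles, into dst
    cp -a "$src/." "$dst/" || return 1
    n_src=$(find "$src" -type f | wc -l)
    n_dst=$(find "$dst" -type f | wc -l)
    [ "$n_src" -eq "$n_dst" ]
}

# Possible usage, with the paths from the steps above:
#   nodetool drain
#   (stop the Cassandra service)
#   sudo bash -c '. ./copy_and_verify.sh; copy_and_verify /var/lib/cassandra/data /new_data/lib/cassandra/data'
#   sudo chown -R cassandra:cassandra /new_data/lib/cassandra
```

Keeping the old directory in place until the node has restarted cleanly from the new location gives you an easy way back.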
I was able to move all the files and the commitlog as well. I changed the yaml and pointed it to where I wanted it to go. Remember to run the following command afterward -
chown -R cassandra:cassandra
And voila! Everything is reading/writing as it should. Cassandra is neato.
I am a newbie to Cassandra. The situation is:
[1] I want to bulkload(bulk-upload) my cassandra data from my client PC into the "remote server A"
[2] the IPAddress of the remote server A is 192.168..
[3] so I typed as follows from my client PC:
$ sstableloader -d 192.168.**.** [path/to/my/clientPC's/cassandra/columnFamily/Directory]
[4] Cassandra is running on both the client PC and remote server A
Then I get a message like this:
Could not retrieve endpoint ranges:
I can't figure out what on earth is going on here. Could somebody please help me?
Ensure that you are running the command from your C* data directory root, and then pass the relative path for the keyspace and columnFamily. The target database must also have the same keyspace name and column family name.
So if your C* data dir in cassandra.yaml is defined as /cassandra/data, your keyspace is ks1 and your column family is my_cf, then cd to /cassandra/data and run sstableloader -d <ip> ks1/my_cf.
From http://www.datastax.com/docs/1.1/references/bulkloader
Using sstableloader
In binary installations, sstableloader is located in the /bin directory.
The sstableloader bulk loads the SSTables found in the given directory to the configured cluster. The parent directory of that directory is used as the keyspace name. For example, to load an SSTable named Standard1-he-1-Data.db into keyspace Keyspace1, the files Keyspace1-Standard1-he-1-Data.db and Keyspace1-Standard1-he-1-Index.db must be in a directory called Keyspace1/Standard1/.
bash sstableloader [options]
Example:
$ ls -1 Keyspace1/Standard1/
Keyspace1-Standard1-he-1-Data.db
Keyspace1-Standard1-he-1-Index
$ /bin/sstableloader -d localhost //
Also, make sure any sstableloader defaults (such as port) match your target C* cluster.
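If the SSTable files are not already sitting in a keyspace/column-family directory, one option is to stage copies in that layout first. The sketch below assumes exactly that; stage_sstables and the staging path are hypothetical, and only the -d option of sstableloader mentioned above is used in the usage comment.

```shell
# Copy the SSTable files of one column family into
# <staging>/<keyspace>/<column_family>/ so that the parent directories carry
# the names sstableloader expects. Prints the resulting directory.
# Subdirectories of the source (such as snapshots) are not copied.
# (stage_sstables is a made-up helper name.)
stage_sstables() {
    local src=$1 keyspace=$2 table=$3 staging=$4
    local target="$staging/$keyspace/$table"
    mkdir -p "$target" || return 1
    # Regular files directly inside src only
    find "$src" -maxdepth 1 -type f -exec cp {} "$target"/ \; || return 1
    echo "$target"
}

# Possible usage, following the ks1/my_cf example above:
#   dir=$(stage_sstables /cassandra/data/ks1/my_cf ks1 my_cf /tmp/staging)
#   sstableloader -d <ip> "$dir"
```

Working on a staged copy also keeps the loader away from the live data directory of the local node.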
I am using Cassandra 1.2 db on windows 7.
I want to take a backup of a keyspace.
I am doing the following:
C:\Workspace\apache-cassandra-1.2.4-bin\bin> nodetool -h localhost -p 7199 snapshot myDb
Starting NodeTool
Requested snapshot for: myDb
Snapshot directory: 1371534210892
C:\Workspace\apache-cassandra-1.2.4-bin\bin>
So it shows snapshot directory as 1371534210892 . What does it mean?
Where can I find the snapshot just created ?
TL;DR;
Check C:\var\lib\cassandra\data\myDb\<column family>\snapshots\1371534210892
Before I provide details, it's important that you know my environment so you can compare.
How I setup Cassandra
I downloaded the zip from Apache's website then I unzipped it to C:\apache-cassandra-1.2.5 and finally I added the CASSANDRA_HOME environment variable.
How I start / backup Cassandra
I start Cassandra by running cassandra.bat in the bin folder:
C:\apache-cassandra-1.2.4\bin\cassandra.bat
I back up Cassandra by running the same command that you did (I backed up the system keyspace because it was a fresh Cassandra install):
nodetool -h localhost snapshot system
# output:
Starting NodeTool
Requested snapshot for: system
Snapshot directory: 1371547087563
I then browsed to the following directory where I found the 1371547087563 folder:
C:\var\lib\cassandra\data\system\local\snapshots
The snapshot is also created for every other column family in the keyspace, so with a clean install I could also find it in:
C:\var\lib\cassandra\data\system\schema_columns\snapshots
C:\var\lib\cassandra\data\system\schema_columnfamilies\snapshots
C:\var\lib\cassandra\data\system\schema_keyspaces\snapshots
So basically the command snapshots every column family of the keyspace that you provide as the last parameter. Note that local, schema_columns, schema_columnfamilies and schema_keyspaces are column families inside the system keyspace, not separate keyspaces; because I specified system as the parameter, each of them got its own snapshots folder.
In your case, look under the folder of your own keyspace: C:\var\lib\cassandra\data\myDb\<column family>\snapshots\1371534210892.
Look for the 1371534210892 folder inside cassandra/data/yourkeyspacename (the equivalent of /var/lib/cassandra/data/yourkeyspacename on Linux). Each CF has a 1371534210892 folder under its snapshots directory, and that is the latest snapshot.
This base cassandra folder is the data folder created during installation, not the one containing bin and the other program directories.
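On Linux, a quick way to list every column family folder that holds a given snapshot is to search the data directory for the snapshot tag. This is a minimal sketch: find_snapshot is a hypothetical helper, and the data directory in the usage comment is the package-install default, which may differ on your system.

```shell
# List all directories named <tag> that sit directly under a "snapshots"
# folder anywhere below the given data directory.
# (find_snapshot is a made-up helper name.)
find_snapshot() {
    local data_dir=$1 tag=$2
    find "$data_dir" -type d -path "*/snapshots/$tag"
}

# Possible usage with the snapshot tag from the question:
# find_snapshot /var/lib/cassandra/data 1371534210892
```

Each line of output corresponds to one column family, so a keyspace with several column families produces several paths for the same tag.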
I have a single-instance Hadoop cluster configured to run with an IP address (instead of localhost) on CentOS Linux. I was able to run the example MapReduce job correctly, which tells me that the Hadoop setup appears to be fine.
I have also added a couple of data files to HDFS under the "/data" folder, and they are visible through the "dfs" command:
bin/hadoop dfs -ls /data
I am trying to connect to this HDFS system from PDI/Kettle. In the HDFS file browser, if I enter the HDFS connection parameters incorrectly, e.g. an incorrect port, it says it cannot connect to the HDFS server. If instead I enter all parameters correctly (server, port, user, password) and click 'connect', it gives no error, meaning it is able to connect. But the file list shows only "/".
It doesn't show the data folder. What could be going wrong?
I've already tried this:
tried chmod 777 to the datafiles using "bin/hadoop dfs -chmod -R 777 /data"
tried using root and also hdfs linux user in the PDI file browser
tried adding the data files in some other location
re-formatting hdfs several times and adding data files again
copying the hadoop-core jar file from hadoop installable to PDI extlib
but it does not list the files in the PDI browser. I cannot see anything in the PDI log either. Need quick help, thanks!
-abhay
I got past this issue. On Windows, PDI was not logging anything in the log file. I tried the same thing on Linux, where the log showed that it was missing a library from Apache, commons-configuration. I downloaded the latest version of it, put it under the extlib/pentaho folder, and boom! It worked!
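To check up front whether that library is present, something like the following could be used. It is only a sketch: find_jar is a hypothetical helper, and the PDI install path in the usage comment is a placeholder for wherever your copy lives.

```shell
# List jar files below a directory whose names start with the given prefix.
# Prints nothing and returns non-zero when no matching jar is found.
# (find_jar is a made-up helper name.)
find_jar() {
    local dir=$1 prefix=$2
    local matches
    matches=$(find "$dir" -type f -name "${prefix}*.jar")
    [ -n "$matches" ] && echo "$matches"
}

# Possible usage (the PDI install path is an assumption):
# find_jar /path/to/pdi/extlib/pentaho commons-configuration || echo "commons-configuration jar is missing"
```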