The Cassandra documentation says that the "nodetool snapshot" command takes a snapshot of table data. However, I can also see schema.cql and manifest.json files in the snapshot directory where all the snapshot files are generated.
Is this expected behavior? Also, can I use this schema.cql file to restore the schema if needed?
My Cassandra version:
cqlsh> show version
[cqlsh 5.0.1 | Cassandra 3.0.9 | CQL spec 3.4.0 | Native protocol v4]
>nodetool version
ReleaseVersion: 3.0.9
EDIT:
Is it mandatory to use the cql file from the snapshot when restoring data? Suppose I have the CREATE TABLE cql stored somewhere else. Can I use that?
I performed some tests. When I re-created the table using the cql from the snapshot, the ID in the table's directory name stayed the same: "employee-42a71380966111e8870f97a01282a56a". However, when I re-created the table using my original cql, the ID changed. Can this be a problem, and is that why we should use the cql from the snapshot?
Note: when I restored data from the snapshot, it loaded fine in both of the above cases.
This cql file is for the table. Can we get cql from the snapshot to create the keyspace?
Is the cql file generated only for user-defined tables? I can't see a cql file generated for system tables.
Yes, these files are necessary for restoring this particular table. schema.cql captures the structure of the table at the time of the snapshot, because you need to restore the snapshot into a table with the same structure.
You can find a more detailed description in the DataStax documentation.
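The test in the question found that recreating the table from the snapshot's cql kept the ID in the directory name, while the original cql produced a new ID. A minimal Python sketch for checking which case you are in before a restore, assuming (1) the table directory is named <table>-<table ID as 32 hex digits> and (2) schema.cql records the ID in a "WITH ID = <uuid>" clause. The second point is an assumption inferred from that test, so check your own schema.cql; the function names are hypothetical.

```python
import re
import uuid

# Assumption: schema.cql declares the table ID as "WITH ID = <uuid>".
_ID_CLAUSE = re.compile(r"WITH\s+ID\s*=\s*([0-9a-fA-F-]{36})", re.IGNORECASE)


def parse_table_dir(dirname):
    """Split 'employee-42a7...' into ('employee', UUID); the suffix is the ID without dashes."""
    table, sep, suffix = dirname.rpartition("-")
    if not sep or len(suffix) != 32:
        raise ValueError(f"not a <table>-<id> directory name: {dirname!r}")
    return table, uuid.UUID(hex=suffix)


def schema_table_id(schema_cql):
    """Return the ID declared in schema.cql text, or None if there is no WITH ID clause."""
    match = _ID_CLAUSE.search(schema_cql)
    return uuid.UUID(match.group(1)) if match else None


def ids_match(dirname, schema_cql):
    """True if the schema text would recreate the table with the ID in its directory name."""
    _, dir_id = parse_table_dir(dirname)
    return schema_table_id(schema_cql) == dir_id


if __name__ == "__main__":
    # Hypothetical schema text for illustration only.
    sample = (
        "CREATE TABLE IF NOT EXISTS ks.employee (id int PRIMARY KEY)\n"
        "    WITH ID = 42a71380-9661-11e8-870f-97a01282a56a;"
    )
    print(ids_match("employee-42a71380966111e8870f97a01282a56a", sample))  # True
```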
Update after more questions were added:
Having the schema in the snapshot makes life easier: schemas quite often evolve, and you can use a non-snapshot schema only if you can guarantee that it matches the data in the snapshot.
nodetool snapshot generates schemas only for tables, not for the keyspace.
It's better not to mess with system tables.
Here is a detailed knowledge base article from DataStax support about backup/restore.
The doc link you have given is for Apache Cassandra, while the answer refers to DataStax. I have taken snapshots and restored them in apache-cassandra 2.0.4, and it doesn't take any schema backup. All schemas need to be copied separately and created manually in the new cluster.
This is an example scenario, and we want to understand whether it is possible to recover from it, and also to understand the schema better.
In a hypothetical scenario with just 1 node running Cassandra 3.11, I have 1 keyspace and 1 table.
root@dd85fa9a3c41:/# cqlsh -k cycling -e "describe tables;"
rank_by_year_and_name
Now I reset my schema and restart Cassandra (there are no other nodes to get the schema from):
root@dd85fa9a3c41:/# nodetool resetlocalschema
With the new schema, I no longer "see" my keyspace and table:
root@dd85fa9a3c41:/# cqlsh -e "describe keyspaces;"
system_traces system_schema system_auth system system_distributed
I lost my original schema, which contained my keyspace and table. But the data is still on disk:
root@dd85fa9a3c41:/# ls -l /var/lib/cassandra/data/cycling/
total 0
drwxr-xr-x 1 root root 14 Nov 22 11:32 rank_by_year_and_name-4eedbbf0
How could I restore that keyspace in this scenario? With sstableloader I could recreate the keyspace and table and import the data.
I would like to recover this schema and see my keyspace+table again.
I haven't found any way to do this without manually recreating and importing with sstableloader.
Thanks in advance for any help!
On-disk data and schema are two different things in Cassandra.
To be able to restore a keyspace schema, you first need to back it up using nodetool snapshot. It backs up the SSTables (as hard links) and creates a schema.cql file containing the schema.
See the official doc here: https://cassandra.apache.org/doc/3.11/cassandra/operating/backups.html
I realise it's a hypothetical scenario, but running resetlocalschema on a single-node cluster is a bad idea. The node is supposed to drop its copy of the schema and request the latest copy from other nodes, but in a single-node cluster there are no nodes to get the schema from.
You really shouldn't run resetlocalschema on a single-node cluster unless you're doing some specific test or edge case activity as discussed in CASSANDRA-5094.
Now, to your question on how you would restore the schema: most enterprises keep a copy of their schema, usually in a Change Management system (or CI/Config Management System). Before updates can be made to the schema in production, they usually go through testing, peer review and staging/pre-production validation, and are finally deployed to production through an approved Change Request (terms might differ between organisations but the net intent is the same).
Similarly, when you perform regular backups, the nodetool snapshot command stores a copy of the schema together with the SSTable backups. In this example I posted at https://dba.stackexchange.com/questions/316520/, you can see that the snapshots/ folder contains both a manifest.json (an inventory of the SSTables included in the snapshot) and a schema.cql (the schema at the time of the snapshot):
data/
  community/
    users-6140f420a4a411ea9212efde68e7dd4b/
      snapshots/
        1591083719993/
          manifest.json
          mc-1-big-CompressionInfo.db
          mc-1-big-Data.db
          mc-1-big-Digest.crc32
          mc-1-big-Filter.db
          mc-1-big-Index.db
          mc-1-big-Statistics.db
          mc-1-big-Summary.db
          mc-1-big-TOC.txt
          schema.cql
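Since manifest.json is an inventory of the SSTables in the snapshot, it can be used to check that a copied snapshot is complete before restoring from it. A minimal Python sketch, assuming the manifest has the shape {"files": ["mc-1-big-Data.db", ...]} (check the file your version actually writes); the function names are hypothetical.

```python
import json
from pathlib import Path


def missing_files(manifest_text, present):
    """Names listed under "files" in the manifest that are absent from the set `present`."""
    listed = json.loads(manifest_text).get("files", [])
    return [name for name in listed if name not in present]


def check_snapshot_dir(snapshot_dir):
    """Compare a snapshot directory's manifest.json with the files actually in that directory."""
    snap = Path(snapshot_dir)
    present = {p.name for p in snap.iterdir() if p.is_file()}
    return missing_files((snap / "manifest.json").read_text(), present)
```

An empty list from check_snapshot_dir means every file the manifest names is present; it does not validate the file contents.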
From the above you should be able to see that you have two options available:
recreate the schema from a copy that's been submitted/peer-reviewed in your Change Management System, or
recreate the schema from the snapshot.
The choice depends on what you're trying to achieve. Cheers!
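For the second option, the per-table schema.cql files from one snapshot can be gathered into a single script to run with cqlsh -f. A minimal Python sketch, assuming the data/<keyspace>/<table-dir>/snapshots/<tag>/schema.cql layout shown above; the function names are hypothetical, and the keyspace is assumed to exist already, since the sketch only collects what the schema.cql files contain.

```python
from pathlib import Path


def collect_snapshot_schemas(data_dir, keyspace, tag):
    """Return (table_dir_name, schema_cql_text) for each table of `keyspace` that has snapshot `tag`."""
    ks_dir = Path(data_dir) / keyspace
    found = []
    for table_dir in sorted(p for p in ks_dir.iterdir() if p.is_dir()):
        schema = table_dir / "snapshots" / tag / "schema.cql"
        if schema.is_file():
            found.append((table_dir.name, schema.read_text()))
    return found


def build_restore_script(schemas):
    """Join the schema texts into one CQL script, one commented section per table directory."""
    parts = []
    for name, text in schemas:
        stmt = text.strip()
        if not stmt.endswith(";"):
            stmt += ";"  # make sure each section is terminated for cqlsh
        parts.append(f"-- from {name}\n{stmt}\n")
    return "\n".join(parts)
```

For example, build_restore_script(collect_snapshot_schemas("/var/lib/cassandra/data", "community", "1591083719993")) could be written to a file and passed to cqlsh -f; the paths and tag here are only illustrative.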
I was reading a blog post by DataStax which says it is not recommended to recreate a table with the same name, that is, to drop the table and create it again with the same name. Here is the link to the DataStax recreate table FAQ.
It talks about the JIRA ticket CASSANDRA-5202, which was fixed in 2.1.
I have some questions. I am on Cassandra 2.1.16.
Is it safe to recreate table or keyspace with same name after dropping?
What precautions must we take if we recreate a table or keyspace with the same name?
I wrote that post 6 years ago. :)
As it clearly states, the problem existed in older versions of Cassandra. In C* 2.1 (and newer), a table ID (time UUID) is added to the table directory name on disk to prevent the problems I outlined in that post (CASSANDRA-5202). Cheers!
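Because each incarnation of a table gets its own ID, dropping and recreating a table with the same name can leave the old <name>-<id> directory next to the new one on disk. A small Python sketch that lists table names with more than one such directory in a keyspace directory; it assumes the 32-hex-digit directory suffix described above, and the function names are hypothetical.

```python
import uuid
from collections import defaultdict
from pathlib import Path


def tables_with_multiple_ids(dir_names):
    """Group '<table>-<32 hex digits>' names by table; keep tables seen with more than one ID."""
    by_table = defaultdict(list)
    for name in dir_names:
        table, sep, suffix = name.rpartition("-")
        if not sep or len(suffix) != 32:
            continue  # not a table directory, e.g. 'snapshots'
        try:
            by_table[table].append(uuid.UUID(hex=suffix))
        except ValueError:
            continue  # suffix is not hexadecimal
    return {table: ids for table, ids in by_table.items() if len(ids) > 1}


def scan_keyspace(keyspace_dir):
    """Apply the check to the directories actually present under one keyspace directory."""
    names = [p.name for p in Path(keyspace_dir).iterdir() if p.is_dir()]
    return tables_with_multiple_ids(names)
```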
We tried to drop CFs using cassandra-cli:
DROP COLUMN FAMILY cfName
When we list the CFs from the CLI, it is not there, but when I tried to get the existing CFs via Hector,
I could still see the CF name:
KeyspaceDefinition keyspaceDefinition = newConnection().describeKeyspace(keyspaceName);
keyspaceDefinition.getCfDefs();
The data inside the CF is gone; however, the CF is still listed. After listing the CFs via Hector, if I list the column families in cassandra-cli, I can see my deleted CF again.
I had to deal with this issue back on Cassandra 1.1 as well. Basically, my column family had become corrupted, and the only way to alter its schema, was to drop/restore the keyspace (which DataStax walked me through, at the time).
If you have a support contract with DataStax, I would HIGHLY recommend contacting them before proceeding. The first thing they'll tell you is that this is a bug in specific versions of Cassandra 1.1 and that you should upgrade. I haven't tested it, but according to them an in-place upgrade would allow you to modify your schema in the new version. So you might be able to fix this by upgrading to 1.2 or 2.0.
In my case (production, enterprise environment) upgrading on-the-spot was not an option. To fix this, I basically had to drop my entire keyspace, re-create it (and my column families), and recover from a snapshot. I loosely followed the instructions found here:
Take a snapshot of the keyspace on each node. The snapshot files should be stored in the [keyspaceName]/snapshots dir, but I copied mine to another non-Cassandra location just to be on the safe side.
DROP your keyspace.
Stop all nodes.
On each node, delete the .db files in the keyspace directory (but not the snapshot dir).
Copy the files from the snapshot directory back into the keyspace directory.
Restart one node.
From that node's cassandra-cli, re-create your keyspace.
Verify that your data is there.
Restart the remaining nodes.
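The delete-and-copy steps above can be sketched in Python as follows. This is a hypothetical helper, not part of any Cassandra tooling: it must only run while Cassandra is stopped on that node, it never touches the snapshots subdirectory, and it skips manifest.json and schema.cql on the assumption that newer versions write those into snapshots even though they are not SSTable components.

```python
import shutil
from pathlib import Path

# Files newer versions write into a snapshot that are not SSTable components (assumption).
NON_SSTABLE = {"manifest.json", "schema.cql"}


def plan_restore(table_files, snapshot_files):
    """Return (live *.db files to delete, snapshot files to copy back), both sorted."""
    to_delete = sorted(f for f in table_files if f.endswith(".db"))
    to_copy = sorted(f for f in snapshot_files if f not in NON_SSTABLE)
    return to_delete, to_copy


def restore_table(table_dir, snapshot_dir):
    """Apply the plan. Only run this while Cassandra is stopped on this node."""
    table, snap = Path(table_dir), Path(snapshot_dir)
    # is_file() skips subdirectories, so the snapshots/ directory itself is never touched.
    to_delete, to_copy = plan_restore(
        [p.name for p in table.iterdir() if p.is_file()],
        [p.name for p in snap.iterdir() if p.is_file()],
    )
    for name in to_delete:
        (table / name).unlink()
    for name in to_copy:
        shutil.copy2(snap / name, table / name)
    return to_copy
```

Splitting the plan from the file operations lets you print plan_restore's result first and check it before anything is deleted.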
I need to take a dump of a keyspace from the server and restore that dump to my local Cassandra.
I know how to do this in MySQL, but how do I do it in NoSQL?
I learned from a site that nodetool, snapshots and the CSV file format can achieve this, but I wasn't able to get it working.
You can do this with "nodetool". For a good reference documentation take a look here: http://www.datastax.com/docs/1.1/backup_restore
Roughly you need to perform the following steps:
take a "snapshot" of the keyspace using: nodetool snapshot <keyspace-name>. Run this on the server where you want to take the "snapshot". It stores a "snapshot" for each table of the keyspace.
copy the "snapshots" to your local server. Do this for each keyspace table: <cassandra-dir>/data/<keyspace-name>/<table-name>/snapshots/ (look for the "latest" taken snapshot - when you take the snapshot it tells you the "name"/"ID" of the snapshot taken).
on your local server, before you place the "server" snapshots, do the following: stop Cassandra and delete the contents of that keyspace (again for each keyspace table: <cassandra-dir>/data/<keyspace>/<table-name>/), then place the "server" snapshots in each respective keyspace table directory (directly in <cassandra-dir>/data/<keyspace>/<table-name>/, not in the "snapshots" directory).
restart the local server, and you should have the data from the server in your local server.
HTH.
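For the "latest" snapshot in the second step: assuming nodetool names an untagged snapshot after the time it was taken in epoch milliseconds (as in a directory name like 1591083719993), the newest untagged snapshot is the one with the largest number. A minimal Python sketch under that assumption; it ignores tagged snapshots (nodetool snapshot -t <name>), and the function names are hypothetical.

```python
from pathlib import Path


def latest_default_snapshot(snapshot_names):
    """Pick the newest untagged snapshot name (all digits, compared numerically), or None."""
    numeric = [n for n in snapshot_names if n.isdigit()]
    return max(numeric, key=int) if numeric else None


def latest_snapshot_dirs(data_dir, keyspace):
    """Map each table directory of `keyspace` to its newest untagged snapshot directory."""
    result = {}
    for table_dir in (Path(data_dir) / keyspace).iterdir():
        snaps_root = table_dir / "snapshots"
        if snaps_root.is_dir():
            name = latest_default_snapshot([p.name for p in snaps_root.iterdir() if p.is_dir()])
            if name:
                result[table_dir.name] = snaps_root / name
    return result
```

If you passed a tag when taking the snapshot, it is simpler to look up that name directly instead.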
To do this by snapshot..
Command for taking the snapshot-
<path to cassandra's bin folder>/nodetool -h <server host name/IP> -p <server JMX port> snapshot
This creates snapshots directories under the data directory (/var/lib/cassandra/data by default), containing a snapshot of the server's current data, which you can use as a dump for your local server.
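If you script this step, building the argument list explicitly avoids shell quoting problems. A minimal Python sketch; the host and keyspace in the usage are hypothetical, -p is the JMX port (7199 by default), and -t optionally names the snapshot so it is easy to find afterwards.

```python
import subprocess


def nodetool_snapshot_cmd(host, jmx_port=7199, keyspaces=(), tag=None, nodetool="nodetool"):
    """Build the argv for 'nodetool -h HOST -p PORT snapshot [-t TAG] [KEYSPACE...]'."""
    cmd = [nodetool, "-h", host, "-p", str(jmx_port), "snapshot"]
    if tag:
        cmd += ["-t", tag]
    cmd += list(keyspaces)
    return cmd


def take_snapshot(host, **options):
    """Run nodetool; raises CalledProcessError if nodetool exits non-zero."""
    subprocess.run(nodetool_snapshot_cmd(host, **options), check=True)


if __name__ == "__main__":
    # Hypothetical host and keyspace; print the command instead of running it.
    print(nodetool_snapshot_cmd("10.0.0.5", keyspaces=["mykeyspace"], tag="dump1"))
```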
While nodetool is the preferred way, if you don't have direct access to the underlying file structure I would recommend using something like cassandradump:
$ python cassandradump.py --keyspace system --export-file dump.cql
Exporting schema for keyspace system
Exporting schema for column family system.peers
Exporting data for column family system.peers
Exporting schema for column family system.range_xfers
Exporting data for column family system.range_xfers
Exporting schema for column family system.schema_columns
Exporting data for column family system.schema_columns
...
How do I rename a keyspace and a column family in Cassandra 1.2? I know that the cassandra-cli rename API is no longer supported (How to rename keyspace in Cassandra). Maybe there is some API in CQL3? Or some API for creating a new column family and copying all the data from the old column family to the new one?
Renaming is disabled internally, not just within the thrift API. So there isn't a CQL command to do it either.
However, there is a manual process which is described here:
https://issues.apache.org/jira/browse/CASSANDRA-1585
To rename only a column family, you can also follow these instructions:
http://mail-archives.apache.org/mod_mbox/cassandra-user/201201.mbox/%3C4EF306AC-98D5-45BE-A29C-B68187FBA9C9@thelastpickle.com%3E
Basically, you create the new CF and copy the SSTables from the old column family into it, renaming the files.
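The renaming step can be sketched in Python. This assumes the Cassandra 1.x file-name layout <keyspace>-<cf>-<version>-<generation>-<component> (for example ks-oldcf-ic-1-Data.db); newer versions name files differently (such as mc-1-big-Data.db), so check your own data directory first. The function names and example names are hypothetical, and the sketch copies rather than moves, so the original files stay in place.

```python
import shutil
from pathlib import Path


def renamed_sstable(filename, keyspace, old_cf, new_cf):
    """Map 'ks-oldcf-ic-1-Data.db' to 'ks-newcf-ic-1-Data.db'; None if the file is not old_cf's."""
    prefix = f"{keyspace}-{old_cf}-"
    if not filename.startswith(prefix):
        return None
    return f"{keyspace}-{new_cf}-{filename[len(prefix):]}"


def copy_renamed(src_dir, dst_dir, keyspace, old_cf, new_cf):
    """Copy old_cf's SSTable files from src_dir into dst_dir under new_cf's name."""
    copied = []
    for path in sorted(Path(src_dir).iterdir()):
        new_name = renamed_sstable(path.name, keyspace, old_cf, new_cf)
        if new_name and path.is_file():
            shutil.copy2(path, Path(dst_dir) / new_name)
            copied.append(new_name)
    return copied
```

Matching on the full "<keyspace>-<cf>-" prefix keeps a column family such as oldcf_by_day from being picked up when renaming oldcf.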