I was going to a blog by Datastax which says it is not recommended to recreate table with same name. That is drop the table and create with the same name. Here is the link for Datastax recreate table faq.
It talks about jira ticket CASSANDRA-5202. It was fixed in 2.1.
I have questions, I am on Cassandra 2.1.16
Is it safe to recreate table or keyspace with same name after dropping?
What precautions we must take if we recreate table or keyspace with same name?
I wrote that post 6 years ago. :)
As it clearly states, the problem existed in older versions of Cassandra. In C* 2.1 (and newer), a table ID (time UUID) is added to the table directory name on disk to prevent the problems I outlined in that post (CASSANDRA-5202). Cheers!
Related
Cassandra doc mentions that "nodetool snapshot" command takes snapshot of table data. However, I am also able to see schema.cql and manifest.json file in my snapshot directory where all snapshot files are generated.
Is this expected behavior? Also can I use this schema.cql file to restore the schema if needed?
My cassandra version
cqlsh> show version
[cqlsh 5.0.1 | Cassandra 3.0.9 | CQL spec 3.4.0 | Native protocol v4]
>nodetool version
ReleaseVersion: 3.0.9
EDIT:
Is it mandatory to use cql file from snapshot while restoring data? Suppose I have create table cql stored somewhere else. Can I use that?
I performed some tests. When I re-created table using cql from snapshot, ID in table name remains same "employee-42a71380966111e8870f97a01282a56a". However when I re-created table using my original cql, ID in table name changed. Can this be a problem and that's why we should use cql from snapshot?
Note-: When I restored data from snapshot, it loaded fine in both above cases
This cql file is for table. Can we get cql from snapshot to create keyspace?
Does cql file gets generated only for user defined table? I can't see cql file getting generated for system tables..
Yes, these files are necessary for restoring this particular table. And schema.cql captures the structure of table on the time of the snapshot because you need to restore snapshot to table with the same structure.
You can find more detailed description in the DataStax documentation.
Update after addition of more questions:
Presence of schema in snapshot makes life easier - quite often the schema evolve, and you can use non-snapshot schema if you guarantee that schema will match to data in snapshot;
nodetool snapshot generates only table's schemas
It's better not to mess-up with system tables...
Here is detailed knowledge base article from DataStax support about backup/restore.
Doc link you have given is for apache Cassandra, while the answer given is with reference to Datastax, I have done taking snaphosts and restore it back in apache-cassandra 2.0.4, It doesn't take any schema backup. All schemas need to be copied separately and need to be created manually in new cluster.
While deciding on the technology stack for my own product, I decided to go with scyllaDB for database due to it's impressive performance.
For local development, I setup Cassandra on my Macbook.
Considering ScyllaDB now supports (experimental) MV (Materialized View), it made the development easy. For dev server, I'm running ScyllaDB on Ubuntu 16.04 hosted on Linod.
I am facing following issues :
After a few weeks, one day when I deleted an entry from base table (from ScyllaDB running on Ubuntu) using the partition key, the respective MV still showed the respective entry for the deleted record.
It was fixed after I dropped the whole Key-Space and recreated it, but I'm unable to pinpoint what caused this inconsistency.
When I dropped the MV and recreated it, it did not copy the old data.
I tried to search, but could not find a way to force MV to read from base table and populate itself.
For the first issue, I would like to know if anyone faced similar scenario. Also if there is anything I can do to prevent this from happening or if it can't be prevented and that is what it means to be "experimental".
Any help or reference is appreciated.
In 2.1 Scylla lacked view building (that is, using existing data to populate a view on creation), but that is solved in 2.2.
Indeed the MV status of 2.1 is incomplete. It gotten much better in 2.2 which will be released this week. It's still not GA yet but we have a branch on top of 2.2 that merged newer changes from master which is almost there. It should reach GA quality within 2 months.
Note that the Cassandra MV status is experimental and we have been opening JIRA tickets everywhere we identified there is design flaw in C*'s MV.
tldr; I would suggest you either stick with cassandra if you want MV, or manually do the MV's in scylla.
Materialized views are super experimental. I ran them for about 6 months in production replacing their functionality manually. This was done to improve performance. So if performance is your goal here, I suggest avoiding them.
I can attest that the materialized views if created on a already populated table will infact populate the materialized view on their own so this seems like a scylladb problem. Cassandra has a different problem where the writes will crater the DB if you do this on a large production table.
I also did not have issues with truncating the primary table and seeing the reflection in cassandra.
Additionally I had tried scylladb for a spike for performance reasons. I found it very difficult to work with and dropped it after spending a week trying to get it to do what I knew cassandra would do.
Thanks #Highstead for confirming the automatic population of MV if base table has entries while creating the MV.
For the main query of the inconsistency in tables and MV, I found out that it was due to truncate query on base table.
Also found an issue for it https://github.com/scylladb/scylla/issues/3188
It states that currently, truncating the base table wont clear the MVs created from that table.
Vice-versa, you can run truncate query on the MV and it won't throw an exception (where it should've) and MV will be cleared even when base table contains entries.
So solution for now is to truncate each MV along with tables separately.
We tried to drop CF's using cassandra cli
DROP COLUMN FAMILY cfName
And when we list the CF from CLI it was not there and when i tried to get the existing CF's via hector
I still could see the CF name
KeyspaceDefinition keyspaceDefinition = newConnection().describeKeyspace(keyspaceName);
keyspaceDefinition.getCfDefs();
Data inside the CF is not there however, the CF is still listed, after listing the CFs via hector if i do a cassandra -cli list column families i can see my deleted CF again
I had to deal with this issue back on Cassandra 1.1 as well. Basically, my column family had become corrupted, and the only way to alter its schema, was to drop/restore the keyspace (which DataStax walked me through, at the time).
If you have a support contract with DataStax, I would HIGHLY recommend contacting them before proceeding. The first thing they'll tell you, is that this is a bug in specific versions of Cassandra 1.1, and that you should upgrade. I haven't tested it, but according to them an in-place upgrade would allow you to modify your schema in the new version. So you might be able to fix this by upgrading to 1.2 or 2.0.
In my case (production, enterprise environment) upgrading on-the-spot was not an option. To fix this, I basically had to drop my entire keyspace, re-create it (and my column families), and recover from a snapshot. I loosely followed the instructions found here:
Take a snapshot of the keyspace on each node. The snapshot files should be stored in the [keyspaceName]/snapshots dir, but I copied mine to another non-Cassandra location just to be on the safe side.
DROP your keyspace.
Stop all nodes.
On each node, delete the .db files in the keyspace directory (but not the snapshot dir).
Copy the files from the snapshot directory back into the keyspace directory.
Restart one node
From that node's cassandra-cli re-create your keyspace.
Verify that your data is there.
Restart the remaining nodes.
When trying to run PIG against a CQL3 created Cassandra Schema,
-- This script simply gets a row count of the given column family
rows = LOAD 'cassandra://Keyspace1/ColumnFamily/' USING CassandraStorage();
counted = foreach (group rows all) generate COUNT($1);
dump counted;
I get the following Error.
Error: Column family 'ColumnFamily' not found in keyspace 'KeySpace1'
I understand that this is by design, but I have been having trouble finding the correct method to load CQL3 tables into PIG.
Can someone point me in the right direction? Is there a missing bit of documentation?
This is now supported in Cassandra 1.2.8
As you mention this is by design because if thrift was updated to allow for this it would compromise backwards computability. Instead of creating keyspaces and column families using CQL (I'm guessing you used cqlsh) try using the C* CLI.
Take a look at these issues as well:
https://issues.apache.org/jira/browse/CASSANDRA-4924
https://issues.apache.org/jira/browse/CASSANDRA-4377
Per this https://github.com/alexliu68/cassandra/pull/3, it appears that this fix is planned for the 1.2.6 release of Cassandra. It sounds like they're trying to get that out in the reasonably near future, but of course there's no certain ETA.
As e90jimmy said, its supported in Cassandra 1.2.8, but we have a issue when using counter column type. This was fixed by Alex Liu but due to regression problem in 1.2.7 the patch doesn't go ahead:
https://issues.apache.org/jira/browse/CASSANDRA-5234
To correct this, wait until 2.0 become production ready or download the source, apply the patch from the above link by yourself and rebuild the cassandra .jar. Worked for me by now...
The best way to access Cql3 Tables in Pig is by using the CqlStorage Handler
The syntax is similar to what you have a above
row = Load 'cql://Keyspace/ColumnFamily/' Using CqlStorage()
More info In the Dev Blog Post
How to rename keyspace and columnfamily in cassandra 1.2? I know that cassandra-cli rename api is no longer supported - How to rename keyspace in Cassandra. Maybe there are some api in CQL3? Or some api for creating new columnfamily and coping all data from old to new columnfamily?
Renaming is disabled internally, not just within the thrift API. So there isn't a CQL command to do it either.
However, there is a manual process which is described here:
https://issues.apache.org/jira/browse/CASSANDRA-1585
For rename only a column family also you can follow the next instructions:
http://mail-archives.apache.org/mod_mbox/cassandra-user/201201.mbox/%3C4EF306AC-98D5-45BE-A29C-B68187FBA9C9#thelastpickle.com%3E
Basically is create the new CF, copy the SStables from the old column family to the one renaming the files.