DROP COLUMN FAMILY from cassandra CLI will not drop the CF

We tried to drop a CF using cassandra-cli:
DROP COLUMN FAMILY cfName
When we listed the CFs from the CLI afterwards, it was no longer there, but when I fetched the existing CFs via Hector I could still see the CF name:
KeyspaceDefinition keyspaceDefinition = newConnection().describeKeyspace(keyspaceName);
keyspaceDefinition.getCfDefs();
The data inside the CF is gone, but the CF is still listed. Moreover, after listing the CFs via Hector, listing the column families in cassandra-cli shows my deleted CF again.

I had to deal with this issue back on Cassandra 1.1 as well. Basically, my column family had become corrupted, and the only way to alter its schema was to drop and restore the keyspace (which DataStax walked me through at the time).
If you have a support contract with DataStax, I would HIGHLY recommend contacting them before proceeding. The first thing they'll tell you is that this is a bug in specific versions of Cassandra 1.1, and that you should upgrade. I haven't tested it, but according to them an in-place upgrade would allow you to modify your schema in the new version. So you might be able to fix this by upgrading to 1.2 or 2.0.
In my case (production, enterprise environment), upgrading on the spot was not an option. To fix this, I basically had to drop my entire keyspace, re-create it (and my column families), and recover from a snapshot. I loosely followed the instructions found here:
1. Take a snapshot of the keyspace on each node. The snapshot files should be stored in the [keyspaceName]/snapshots dir, but I copied mine to another non-Cassandra location just to be on the safe side.
2. DROP your keyspace.
3. Stop all nodes.
4. On each node, delete the .db files in the keyspace directory (but not the snapshot dir).
5. Copy the files from the snapshot directory back into the keyspace directory.
6. Restart one node.
7. From that node's cassandra-cli, re-create your keyspace.
8. Verify that your data is there.
9. Restart the remaining nodes.
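The "delete the .db files, then copy the snapshot back" part of the procedure above could be scripted roughly as follows. This is only a sketch: the function name, the snapshot tag argument, and the assumed layout (live files directly in the directory, the snapshot under snapshots/<tag> inside it) are illustrative assumptions, not something taken from the DataStax instructions. Check your actual on-disk layout first, and run it only while the node is stopped.

```python
import shutil
from pathlib import Path


def restore_from_snapshot(data_dir: Path, snapshot_tag: str) -> list[str]:
    """Replace the live .db files in data_dir with those from one snapshot.

    Assumption (hypothetical layout): the live files sit directly in
    data_dir and the snapshot lives in data_dir/snapshots/<snapshot_tag>.
    """
    snapshot_dir = data_dir / "snapshots" / snapshot_tag
    if not snapshot_dir.is_dir():
        raise FileNotFoundError(f"no snapshot directory: {snapshot_dir}")

    # Delete the live .db files. glob() is not recursive here, so nothing
    # under the snapshots directory is touched.
    for live_file in data_dir.glob("*.db"):
        live_file.unlink()

    # Copy every snapshot file back to where the live files used to be.
    restored = []
    for snap_file in sorted(snapshot_dir.iterdir()):
        if snap_file.is_file():
            shutil.copy2(snap_file, data_dir / snap_file.name)
            restored.append(snap_file.name)
    return restored
```

The snapshot itself is left in place, so the copy can be repeated if something goes wrong before the keyspace is re-created.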

Related

How to migrate data from Cassandra 2.1.9 to a fresh 3.5 installation

I tried to use sstableloader to load data into Cassandra 3.5. The data was captured using nodetool snapshot under Cassandra 2.1.9. All the tables loaded fine except one. It's small, only 2 columns and 20 rows. So, I entered this bug: https://issues.apache.org/jira/browse/CASSANDRA-11806. The bug was quickly closed as a duplicate. It doesn't seem to be a duplicate, since the original case is upgrading a node in-place, not loading data with sstableloader.
Even so, I tried to apply the advice given to run upgradesstable [sic].
The directions given to upgrade from one version of Cassandra to another seem sketchy at best. Here's what I did based on my working backup/restore and info garnered from various Cassandra docs on how to upgrade:
Snapshot the data from prod (Cassandra 2.1.9), as usual
Restore data to Cassandra 2.1.14 running on my workstation
Verify the restore to 2.1.14 (it worked)
Copy the data/data/makeyourcase into a Cassandra 3.5 install
Fire up Cassandra 3.5
Run nodetool upgradesstables to upgrade the sstables to 3.5
nodetool upgradesstables fails:
>./bin/nodetool upgradesstables
error: Unknown column role in table makeyourcase.roles
-- StackTrace --
java.lang.AssertionError: Unknown column role in table makeyourcase.roles
So, the questions: Is it possible to upgrade directly from 2.1.x to 3.5? What's the actual upgrade process? The process at http://docs.datastax.com/en/latest-upgrade/upgrade/cassandra/upgradeCassandraDetails.html is seemingly missing important details.

This turned out to be a problem with the changing state of the table over time.
Since the table was small, I was able to migrate the data by using COPY to export the data to CSV and then importing it into the new version.
Have a look at https://issues.apache.org/jira/browse/CASSANDRA-11806 for discussion of another workaround and a coming bug fix.

How to inspect the local hints directory on a Cassandra node?

I'm encountering the same problem as in "Cassandra system.hints table is empty even when the one of the node is down":
I am learning Cassandra from academy.datastax.com. I am trying the Replication and Consistency demo on local machine. RF = 3 and Consistency = 1.
When my Node3 is down and I am updating my table using update command, the SYSTEM.HINTS table is expected to store hint for node3 but it is always empty.
@amalober pointed out that this was due to a difference in the Cassandra version being used. From the Cassandra docs at DataStax:
In Cassandra 3.0 and later, the hint is stored in a local hints directory on each node for improved replay.
This same question was asked 3 years ago in "How to access the local data of a Cassandra node", but the accepted solution was to
...Hack something together using the Cassandra source that reads SSTables and have that feed the local client you're hoping to build. A great starting point would be looking at the source of org.apache.cassandra.tools.SSTableExport which is used in the sstable2json tool.
Is there an easier way to access the local hints directory of a Cassandra node?

Is there an easier way to access the local hints directory of a Cassandra node?
The hints directory is defined in the $CASSANDRA_HOME/conf/cassandra.yaml file (the file is sometimes located under /etc/cassandra instead, depending on how you installed Cassandra).
Look for the property hints_directory.
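As a rough illustration of looking the property up, the sketch below scans the text of cassandra.yaml for a top-level hints_directory key. The function name is made up for this example, and it deliberately uses plain line matching rather than a YAML parser to stay dependency-free, so treat it as an approximation that only handles a simple "key: value" line.

```python
from typing import Optional


def find_hints_directory(yaml_text: str) -> Optional[str]:
    """Return the value of the top-level hints_directory key, if it is set."""
    for line in yaml_text.splitlines():
        # Skip commented-out lines and indented (nested) lines.
        if line.startswith("#") or line[:1].isspace():
            continue
        key, sep, value = line.partition(":")
        if sep and key.strip() == "hints_directory":
            # Drop a trailing comment and any surrounding quotes.
            value = value.split(" #", 1)[0].strip().strip("'\"")
            return value or None
    return None
```

You would call it with the contents of your own config file, for example find_hints_directory(open("/etc/cassandra/cassandra.yaml").read()). If it returns None, the key is absent or commented out and Cassandra is using its default location.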

I guess you are using ccm. So, the hint file should be in $CASSANDRA_HOME/.ccm/yourcluster/yournode/hints directory

I haven't been able to reproduce your issue with not getting a hints file. Every attempt I made resulted in the hints file as expected. There is now an easier way to view the hints, though.
We added a dump for hints in sstable-tools that you can use to view the mutations in the HH files. We may in the future add the ability to use the HH files like sstables in the shell (use mutations to build a memtable and include it in queries), but for now it's pretty raw.
It's pretty simple (sans metadata setup) if you want to analyze the data yourself. You can see what we did here and change it to your needs: https://github.com/tolbertam/sstable-tools/blob/master/src/main/java/org/apache/cassandra/hints/HintsTool.java#L39

Loading Cassandra data with SStableloader from different Cassandra cluster

I have two different independent machines running Cassandra and I want to migrate the data from one machine to the other.
Thus, I first took a snapshot of my Cassandra Cluster on machine 1 according to the datastax documentation.
Then I moved the data to machine 2, where I'm trying to import it with sstableloader.
As a note: The keyspace (open_weather) and table name (raw_weather_data) on machine 2 have been created and are the same as on machine 1.
The command I'm using looks as follows:
bin/sstableloader -d localhost "path_to_snapshot"/open_weather/raw_weather_data
And then get the following error:
Established connection to initial hosts
Opening sstables and calculating sections to stream
For input string: "CompressionInfo.db"
java.lang.NumberFormatException: For input string: "CompressionInfo.db"
at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
at java.lang.Integer.parseInt(Integer.java:580)
at java.lang.Integer.parseInt(Integer.java:615)
at org.apache.cassandra.io.sstable.Descriptor.fromFilename(Descriptor.java:276)
at org.apache.cassandra.io.sstable.Descriptor.fromFilename(Descriptor.java:235)
at org.apache.cassandra.io.sstable.Component.fromFilename(Component.java:120)
at org.apache.cassandra.io.sstable.SSTable.tryComponentFromFilename(SSTable.java:160)
at org.apache.cassandra.io.sstable.SSTableLoader$1.accept(SSTableLoader.java:84)
at java.io.File.list(File.java:1161)
at org.apache.cassandra.io.sstable.SSTableLoader.openSSTables(SSTableLoader.java:78)
at org.apache.cassandra.io.sstable.SSTableLoader.stream(SSTableLoader.java:162)
at org.apache.cassandra.tools.BulkLoader.main(BulkLoader.java:106)
Unfortunately, I have no idea why.
I'm not sure if it is related to the issue, but on machine 1 my *.db files are named rather "strangely" compared to the *.db files I already have on machine 2.
*.db files from machine 1:
la-53-big-CompressionInfo.db
la-53-big-Data.db
...
la-54-big-CompressionInfo.db
...
*.db files from machine 2:
open_weather-raw_weather_data-ka-5-CompressionInfo.db
open_weather-raw_weather_data-ka-5-Data.db
What am I missing? Any help would be highly appreciated. I'm also open to any other suggestions. The COPY command will most probably not work, since it is limited to 99999999 rows as far as I know.
P.s. I didn't want to create a overly huge post, but if you need any further information to help me out, just let me know.
EDIT:
Note that I'm using Cassandra in the stand-alone mode.
EDIT2:
After installing the same version (2.1.4) on my destination machine (machine 2), I still get the same errors. With SSTableLoader I still get the error mentioned above, and after copying the files manually (as described by LHWizard), I still get empty tables after starting Cassandra again and performing a SELECT command.
Regarding the initial tokens, I get a huge list of tokens if I run nodetool ring on machine 1. I'm not sure what to do with those.

Your data is already in the form of a snapshot (or backup). What I have done in the past is the following:
1. Install the same version of cassandra on the restore node.
2. Edit cassandra.yaml on the restore node - make sure that cluster_name and snitch are the same.
3. Edit seeds: list and any other properties that were altered in the original node.
4. Get the schema from the original node using cqlsh DESC KEYSPACE.
5. Start cassandra on the restore node and import the schema.
(steps 6 & 7 may not be completely necessary, but this is what I do.)
6. Stop cassandra, delete the contents of /var/lib/cassandra/data/, commitlog/, and saved_caches/* folders.
7. Restart cassandra on the restore node to recreate the correct folders, then stop it.
8. Copy the contents of the snapshots folder to each corresponding table folder in the restore node, then start cassandra. You probably want to run nodetool repair.
You don't really need to bulk import the data, it's already in the correct format if you are using the same version of cassandra, although you didn't specify that in your original question.
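Whether both machines really write the same SSTable format can be checked from the file names quoted in the question, which follow two different layouts. The sketch below parses both; the regular expressions are inferred only from those example names, and the class and function names are made up for illustration. As far as I know, the two-letter token ("ka", "la") identifies the SSTable format version, so different tokens on the two machines would suggest the files were written by different Cassandra versions.

```python
import re
from typing import NamedTuple, Optional


class SSTableName(NamedTuple):
    version: str      # two-letter format token, e.g. "ka" or "la"
    generation: int   # SSTable generation number
    component: str    # e.g. "Data.db"


# Layout seen on machine 1: <version>-<generation>-<format>-<component>
NEW_STYLE = re.compile(
    r"^(?P<version>[a-z]{2})-(?P<generation>\d+)-[a-z]+-(?P<component>[A-Za-z]+\.db)$")
# Layout seen on machine 2: <keyspace>-<table>-<version>-<generation>-<component>
OLD_STYLE = re.compile(
    r"^.+-(?P<version>[a-z]{2})-(?P<generation>\d+)-(?P<component>[A-Za-z]+\.db)$")


def parse_sstable_name(filename: str) -> Optional[SSTableName]:
    """Parse a *.db file name in either layout; return None if neither fits."""
    for pattern in (NEW_STYLE, OLD_STYLE):
        match = pattern.match(filename)
        if match:
            return SSTableName(match["version"], int(match["generation"]),
                               match["component"])
    return None


def format_versions(filenames) -> set:
    """Collect the distinct format tokens found in a list of file names."""
    parsed = (parse_sstable_name(name) for name in filenames)
    return {p.version for p in parsed if p is not None}
```

Running format_versions over a directory listing from each machine and comparing the results is a quick sanity check before copying snapshot files across by hand.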

Cassandra - Delete Old Versions of Tables and Backup Database

Looking in my keyspace directory I see several versions of most of my tables. I am assuming this is because I dropped them at some point and recreated them as I was refining the schema.
table1-b3441432142142sdf02328914104803190
table1-ba234143018dssd810412asdfsf2498041
These generated table names are very cumbersome to work with. Try changing to one of the directories without copy-pasting the directory name from the terminal window... Painful. So easy to mistype something.
That side note aside, how do I tell which directory is the most current version of the table? Can I automatically delete the old versions? I am not clear if these are considered snapshots or not since each directory also can contain snapshots. I read in another post you can stop autosnapshot, but I'm not sure I want that. I'd rather just automatically delete any tables not being currently used (i.e.: that are not the latest version).
I stumbled across this while trying to do a backup. I realized I am forced to go to every table directory and copy out the snapshot files (there are like 50 directories, not including all the old table versions), which seems like a terrible design (maybe I'm missing something?).
I assumed I could do a snapshot of the whole keyspace and get one file back or at least output all the files to a single directory that represents the snapshot of the entire keyspace. At the very least it would be nice knowing what the current versions are so I can grab the correct files and offload them to storage somewhere.
DataStax Enterprise has a backup feature but it only supports AWS and I am using Azure.
So to clarify:
How do I automatically delete old table versions and know which is the current version?
How can I backup the most recent versions of the tables and output the files to a single directory that I can offload somewhere? I only have two nodes, so simply relying on the repair is not a good option for me if a node goes down.

You can see the active version of a table by looking in the system keyspace and checking the cf_id field. For example, to see the version for a table in the 'test' keyspace with table name 'temp', you could do this:
cqlsh> SELECT cf_id FROM system.schema_columnfamilies WHERE keyspace_name='test' AND columnfamily_name='temp' allow filtering;
cf_id
--------------------------------------
d8ea9830-20e9-11e5-afc0-c381f961c62a
As far as I know, it is safe to delete (rm -r) outdated table version directories that are no longer active. I imagine they don't delete them automatically so that you can recover the data if you dropped them by mistake. I don't know of a way to have them removed automatically even if auto snapshot is disabled.
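To make the comparison less error-prone, a small helper can match the cf_id against the directory names. This is a sketch under the assumption that each table directory is named <table>-<cf_id with the hyphens removed>; the function name is hypothetical, and it only reports candidates rather than deleting anything.

```python
import uuid


def stale_table_dirs(dir_names, table_name, active_cf_id):
    """Return the directories of table_name whose suffix is not the active cf_id.

    Assumption: directories are named "<table>-<cf_id without hyphens>".
    """
    active_suffix = uuid.UUID(active_cf_id).hex  # cf_id with hyphens removed
    prefix = table_name + "-"
    stale = []
    for name in dir_names:
        # Ignore directories that belong to other tables.
        if not name.startswith(prefix):
            continue
        if name[len(prefix):] != active_suffix:
            stale.append(name)
    return stale
```

For the example above you would pass the listing of the 'test' keyspace directory, "temp", and the cf_id returned by the query, then review the result by hand before removing anything.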
I don't think there is a command to write all the snapshot files to a single directory. According to the documentation on snapshot, "After the snapshot is complete, you can move the backup files to another location if needed, or you can leave them in place." So it's left up to the application developer how they want to handle archiving the snapshot files.
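Gathering the files yourself is not much code, though. The following sketch copies one snapshot of every table in a keyspace into a single destination directory, one subdirectory per table. It assumes the snapshot files sit under <keyspace_dir>/<table_dir>/snapshots/<tag>/; the function name and arguments are illustrative, not part of any Cassandra tool.

```python
import shutil
from pathlib import Path


def collect_snapshot(keyspace_dir: Path, tag: str, dest: Path) -> int:
    """Copy <table_dir>/snapshots/<tag>/* into dest/<table_dir>/.

    Returns the number of files copied. Assumes the layout
    <keyspace_dir>/<table_dir>/snapshots/<tag>/ for every table.
    """
    copied = 0
    for snap_dir in sorted(keyspace_dir.glob(f"*/snapshots/{tag}")):
        if not snap_dir.is_dir():
            continue
        # snap_dir is .../<table_dir>/snapshots/<tag>, so go up two levels.
        target = dest / snap_dir.parent.parent.name
        target.mkdir(parents=True, exist_ok=True)
        for snap_file in snap_dir.iterdir():
            if snap_file.is_file():
                shutil.copy2(snap_file, target / snap_file.name)
                copied += 1
    return copied
```

Because only directories that actually contain the given snapshot tag are visited, old table versions without that snapshot are skipped automatically, and the destination directory can then be archived or uploaded as one unit.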

Altering a column family in cassandra in a multiple node topology

I'm having the following issue when trying to alter a table in Cassandra.
I'm altering the table in a straightforward way:
ALTER TABLE posts ADD is_black BOOLEAN;
In a single-node environment, both on an EC2 server and on localhost, everything works perfectly: select, delete and so on.
When I alter the table on a cluster with 3 nodes, things get messy.
When I perform
select().all().from(tableName).where..
I'm getting the following exception:
java.lang.IllegalArgumentException: is_black is not a column defined in this metadata
at com.datastax.driver.core.ColumnDefinitions.getAllIdx(ColumnDefinitions.java:273)
at com.datastax.driver.core.ColumnDefinitions.getFirstIdx(ColumnDefinitions.java:279)
at com.datastax.driver.core.ArrayBackedRow.getIndexOf(ArrayBackedRow.java:69)
at com.datastax.driver.core.AbstractGettableData.getString(AbstractGettableData.java:137)
Apparently I'm not the only one who's having this behaviour:
reference
P.S. Dropping and re-creating the keyspace is not an option for me, since I cannot delete the data contained in the table.

The bug was resolved :-)
The issue was that DataStax maintains an in-memory cache that contains the configuration of each node, and this cache wasn't updated when I altered the table, since I used cqlsh instead of their SDK.
After restarting all the nodes, the in-memory cache was dropped and the bug was resolved.
