Altering a column family in Cassandra in a multiple-node topology


I'm having the following issue when trying to alter a table in Cassandra.
I'm altering the table in a straightforward way:
ALTER TABLE posts ADD is_black BOOLEAN;
On a single-node environment, both on an EC2 server and on localhost, everything works perfectly: select, delete and so on.
When I alter the table on a cluster with 3 nodes, things get messy.
When I perform
select().all().from(tableName).where..
I'm getting the following exception:
java.lang.IllegalArgumentException: is_black is not a column defined in this metadata
at com.datastax.driver.core.ColumnDefinitions.getAllIdx(ColumnDefinitions.java:273)
at com.datastax.driver.core.ColumnDefinitions.getFirstIdx(ColumnDefinitions.java:279)
at com.datastax.driver.core.ArrayBackedRow.getIndexOf(ArrayBackedRow.java:69)
at com.datastax.driver.core.AbstractGettableData.getString(AbstractGettableData.java:137)
Apparently I'm not the only one who's having this behaviour:
reference
P.S. Dropping and re-creating the keyspace is not an option for me, since I cannot delete the data contained in the table.

The bug was resolved :-)
The issue was that DataStax maintains an in-memory cache containing the configuration of each node. This cache wasn't updated when I altered the table, since I used cqlsh instead of their SDK.
After restarting all the nodes, the in-memory cache was dropped and the bug was resolved.
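To illustrate the failure mode described above, here is a toy model in Java. It is not the DataStax driver's real code, and the class names and the id/title columns are hypothetical: a client snapshots a table's column list once and resolves column names against that snapshot, so a column added later from cqlsh stays invisible until the snapshot is rebuilt.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model only (hypothetical classes, not the DataStax driver's implementation):
// column metadata is captured once and then used for every name lookup.
class CachedColumnMetadata {
    private final Map<String, Integer> indexByName = new HashMap<>();

    CachedColumnMetadata(List<String> columns) {
        // Copy the names now; later changes to the source list are not seen.
        for (int i = 0; i < columns.size(); i++) {
            indexByName.put(columns.get(i), i);
        }
    }

    int indexOf(String name) {
        Integer idx = indexByName.get(name);
        if (idx == null) {
            // Same wording as the exception in the question.
            throw new IllegalArgumentException(name + " is not a column defined in this metadata");
        }
        return idx;
    }
}

class StaleMetadataDemo {
    public static void main(String[] args) {
        List<String> serverSchema = new ArrayList<>(List.of("id", "title"));
        CachedColumnMetadata cached = new CachedColumnMetadata(serverSchema);

        // Stand-in for running ALTER TABLE posts ADD is_black BOOLEAN; from cqlsh.
        serverSchema.add("is_black");

        try {
            cached.indexOf("is_black");
        } catch (IllegalArgumentException e) {
            System.out.println("stale: " + e.getMessage());
        }

        // Rebuilding the cache from the current schema makes the column resolvable.
        CachedColumnMetadata refreshed = new CachedColumnMetadata(serverSchema);
        System.out.println("refreshed index of is_black: " + refreshed.indexOf("is_black"));
    }
}
```

Running main prints the stale-lookup error first and then index 2 for is_black once the cache has been rebuilt from the altered schema.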

Related

Cassandra not starting due to unknown Column Family

After adding/removing tables and views in a keyspace, we got problems with inconsistencies and errors referring to previously deleted tables. We tried restarting the cluster nodes, which only resulted in the nodes not starting due to java.lang.IllegalArgumentException: Unknown CF.
The current error is thrown from a view that refers to a non-existent table (the table does exist, but has a new id). Is it possible to somehow fix this while Cassandra is not running?

It may be that you have a schema mismatch. First, run nodetool describecluster to verify that all nodes report the same schema version, and make sure all nodes are reachable.
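If you want to script that check, here is a minimal Java sketch that parses the "Schema versions:" block of nodetool describecluster output. It assumes the layout I've seen from 2.x/3.x nodetool (one `<version>: [host, host]` line per schema version, plus an `UNREACHABLE: [...]` line when nodes are down), so verify it against your own nodetool output; the class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DescribeClusterCheck {
    // One entry per line, e.g. "86afa796-...: [10.0.0.1, 10.0.0.2]" or "UNREACHABLE: [10.0.0.3]".
    private static final Pattern ENTRY = Pattern.compile("^\\s*([^:\\s]+):\\s*\\[(.*)\\]\\s*$");

    /** Maps each schema version (or "UNREACHABLE") to the hosts listed for it. */
    static Map<String, List<String>> parseSchemaVersions(String output) {
        Map<String, List<String>> versions = new LinkedHashMap<>();
        boolean inSection = false;
        for (String line : output.split("\\R")) {
            if (line.trim().equals("Schema versions:")) {
                inSection = true;
                continue;
            }
            if (!inSection) {
                continue;
            }
            Matcher m = ENTRY.matcher(line);
            if (!m.matches()) {
                break; // the first line that is not a version entry ends the section
            }
            List<String> hosts = new ArrayList<>();
            for (String host : m.group(2).split(",")) {
                if (!host.isBlank()) {
                    hosts.add(host.trim());
                }
            }
            versions.put(m.group(1), hosts);
        }
        return versions;
    }

    /** True only when exactly one schema version is reported and no node is unreachable. */
    static boolean inAgreement(Map<String, List<String>> versions) {
        return versions.size() == 1 && !versions.containsKey("UNREACHABLE");
    }
}
```

Feed it the captured stdout of nodetool describecluster; anything other than a single version with no UNREACHABLE entry means the nodes disagree on the schema or some of them cannot be reached.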
The only other time I've seen something like this is when there is corrupt data on a node, in which case you'll want to run nodetool removenode for that node and provision a new one.
As an aside, MATERIALIZED VIEWS were marked experimental in 3.11 and are not recommended for production use, so I would suggest that you roll your own.

How do I evict a prepared statement from the cache in Cassandra 2.1?

I am attempting to add a field to a user-defined type in Cassandra 2.1.2, using the Node.js driver from DataStax. I added the field using ALTER TYPE in cqlsh. When I attempt to add a row containing the UDT with a value for the new field, it gets inserted with a null value instead of the value I supplied. I strongly suspect this has to do with the way the cluster is caching the prepared statement. Because I recall reading that prepared statements are indexed by a hash of the query, I tried changing some whitespace in the query to see if it helped. This actually seemed to work, but only once. Subsequent inserts result in this error:
message: 'Operation timed out - received only 0 responses.',
info: 'Represents an error message from the server',
code: 4352,
consistencies: 10,
received: 0,
blockFor: 1,
writeType: 'SIMPLE',
coordinator: '127.0.0.1:9042',
and it would seem the new rows are not added... until I restart Cassandra, at which point not only do the inserts that I thought had failed show up, but subsequent ones work fine. This is very disconcerting, but fortunately I have only done this in test instances. I do need to make this change in production, however, and restarting the cluster to add a single field is not really an option. Is there a better way to get the cluster to evict the cached prepared statement?
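For context on the whitespace trick: as far as I know, Cassandra 2.1 derives a prepared statement's id from an MD5 digest of the query string (prefixed with the current keyspace when one is set), so any textual change, even whitespace, produces a new id and forces a fresh prepare. The sketch below reproduces that idea with the JDK's MessageDigest; treat the exact hashing formula as an assumption rather than Cassandra's verified code, and the ks keyspace and table t as hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class PreparedIdSketch {
    // Assumption: id = MD5(keyspace + queryString), with the keyspace omitted when null.
    static String preparedId(String keyspace, String query) {
        String toHash = keyspace == null ? query : keyspace + query;
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            byte[] digest = md5.digest(toHash.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java SE platform is required to provide MD5.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String a = "INSERT INTO t (id, u) VALUES (?, ?)";
        String b = "INSERT INTO t (id,  u) VALUES (?, ?)"; // same statement, one extra space
        System.out.println(preparedId("ks", a));
        System.out.println(preparedId("ks", b));
    }
}
```

The two printed ids differ even though the statements are semantically identical, which is consistent with the whitespace change causing a re-prepare. It does not by itself explain the later timeouts.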

I strongly suspect this has to do with the way the cluster is caching the prepared statement.
Put the Cassandra log in DEBUG mode to confirm that the prepared statement cache is the root cause. If it is, create a JIRA ticket so the dev team can fix it.
Optionally, you can also enable tracing to see what is going on server-side:
To enable tracing in cqlsh, just type TRACING ON.
To enable tracing with the Java driver, just call enableTracing() on the statement object.

Cassandra 2.1 system schema missing

I have a six-node cluster running Cassandra 2.1.6. Yesterday I tried to drop a column family and received the message "Column family ID mismatch". I tried running nodetool repair, but after the repair was complete I got the same message. I then tried selecting from the column family but got the message "Column family not found". I ran the following query to get a list of all column families in my schema:
select columnfamily_name from system.schema_columnfamilies where keyspace_name = 'xxx';
At this point I received the message "Keyspace 'system' not found." I tried the command DESCRIBE KEYSPACES and, sure enough, system was not in the list of keyspaces.
I then tried nodetool resetlocalschema on one of the nodes missing the system keyspace, and when that failed to resolve the problem I tried nodetool rebuild, but got the same messages after the rebuild was complete. I stopped the nodes missing the system keyspace and restarted them; once the restart was completed, the system keyspace was back and I was able to execute the above query successfully. However, the table I had tried to drop previously was not listed, so I tried to recreate it and once again received the message "Column family ID mismatch".
Finally, I shut down the cluster and restarted it... and everything works as expected.
My questions are: How/why did the system keyspace disappear? What happened to the data being inserted into my column families while the system keyspace was missing from two of the six nodes? (my application didn't seem to have any problems) Is there a way I can detect problems like this automatically or do I have to manually check up on my keyspaces each day? Is there a way to fix the missing system keyspace and/or the Column family ID mismatch without restarting the entire cluster?
EDIT
As per Jim Meyers' suggestion, I queried the cf_id on each node of the cluster and confirmed that all nodes return the same value.
select cf_id from system.schema_columnfamilies where columnfamily_name = 'customer' allow filtering;
cf_id
--------------------------------------
cbb51b40-2b75-11e5-a578-798867d9971f
I then ran ls on my data directory and can see that there are multiple entries for a few of my tables
customer-72bc62d0ff7611e4a5b53386c3f1c9f9
customer-cbb51b402b7511e5a578798867d9971f
My application dynamically creates tables at run time (always using IF NOT EXISTS), so it seems likely that the application issued the same CREATE TABLE command on separate nodes at the same time, resulting in the schema mismatch.
Since I've restarted the cluster everything seems to be working fine.
Is it safe to delete the extra file?
i.e. customer-72bc62d0ff7611e4a5b53386c3f1c9f9

The cause of this problem is a CREATE TABLE statement collision. Do not generate tables dynamically from multiple clients, even with IF NOT EXISTS. The first thing you need to do is fix your code so that this does not happen. Create your tables manually from cqlsh, allowing time for the schema to settle, and always wait for schema agreement when modifying the schema.
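The "wait for schema agreement" advice can be sketched as a small polling helper. This is an illustrative Java sketch: the agreement check itself is passed in as a BooleanSupplier (for example, something built on counting the schema versions reported by nodetool describecluster), and the class and method names are hypothetical rather than part of any driver API.

```java
import java.time.Duration;
import java.util.function.BooleanSupplier;

class SchemaAgreementWait {
    /**
     * Polls inAgreement until it returns true or the timeout elapses.
     * Returns true if agreement was observed, false on timeout.
     */
    static boolean waitForAgreement(BooleanSupplier inAgreement, Duration timeout, Duration pollInterval)
            throws InterruptedException {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (true) {
            if (inAgreement.getAsBoolean()) {
                return true;
            }
            long remainingMillis = (deadline - System.nanoTime()) / 1_000_000;
            if (remainingMillis <= 0) {
                return false;
            }
            // Never sleep past the deadline.
            Thread.sleep(Math.min(pollInterval.toMillis(), remainingMillis));
        }
    }
}
```

The intended usage is to issue each DDL statement from a single client only, then call this before the next DDL statement or before any client starts using the new schema, and treat a false result as a reason to stop and investigate rather than retry blindly.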
Here's the fix:
1) Change your code to not automatically re-create tables (even with IF NOT EXISTS).
2) Run a rolling restart to ensure the schema matches across nodes. Run nodetool describecluster across your cluster and check that there is only one schema version.
ON EACH NODE:
3) Check your filesystem and see if you have two directories for the table in question in the data directory.
If THERE ARE TWO OR MORE DIRECTORIES:
4) Identify from system.schema_columnfamilies which cf_id is the "new" one (currently in use):
cqlsh -e "select * from system.schema_columnfamilies"|grep
5) Move the data from the "old" one to the "new" one and remove the old directory. 
6) If there are multiple "old" ones repeat 5 for every "old" directory.
7) Run nodetool refresh for the keyspace and table.
IF THERE IS ONLY ONE DIRECTORY:
No further action is needed.
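Steps 3, 4 and 6 can be partially scripted. The Java sketch below lists the directories for one table that do not match the cf_id currently in use, relying on the naming visible in the question (customer-cbb51b402b7511e5a578798867d9971f is the table name plus the cf_id with its dashes removed). It deliberately does not move any files: SSTable file names carry generation numbers, so check for name collisions by hand before step 5. The class and method names are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class StaleTableDirs {
    /** Directory name for a table: name + "-" + cf_id with dashes removed (as seen in the question). */
    static String dirName(String table, String cfId) {
        return table + "-" + cfId.replace("-", "").toLowerCase();
    }

    /** Returns the table's directories under keyspaceDir whose id suffix is not the current cf_id. */
    static List<Path> oldDirs(Path keyspaceDir, String table, String currentCfId) throws IOException {
        String current = dirName(table, currentCfId);
        String prefix = table + "-";
        try (Stream<Path> entries = Files.list(keyspaceDir)) {
            return entries
                    .filter(Files::isDirectory)
                    .filter(p -> {
                        String name = p.getFileName().toString();
                        // Only "<table>-<32 hex chars>" entries belong to this table.
                        return name.startsWith(prefix)
                                && name.substring(prefix.length()).matches("[0-9a-f]{32}")
                                && !name.equals(current);
                    })
                    .sorted()
                    .collect(Collectors.toList());
        }
    }
}
```

For the question's data, calling oldDirs(keyspaceDir, "customer", "cbb51b40-2b75-11e5-a578-798867d9971f") should return only the customer-72bc62d0ff7611e4a5b53386c3f1c9f9 directory, which is the "old" one to drain and remove.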
Futures
Schema collisions will continue to be an issue until CASSANDRA-9424 is addressed.
Here's an example of it occurring, reported on Jira and closed as "Not a Problem": CASSANDRA-8387.

When you create a table in Cassandra it is assigned a unique id that should be the same on all nodes. Somehow it sounds like your table did not have the same id on all nodes. I'm not sure how that might happen, but maybe there was a glitch when the table was created and it was created multiple times, etc.
You should always use the IF NOT EXISTS clause when creating tables.
To check if your id's are consistent, try this on each node:
In cqlsh, run: SELECT cf_id FROM system.schema_columnfamilies WHERE columnfamily_name = 'yourtablename' ALLOW FILTERING;
Look in the data directory under the name of the keyspace the table was created in. You should see a single directory for the table named table_name-cf_id (with the dashes removed from the cf_id).
If things are correct you should see the same cf_id in all these places. If you see different ones, then somehow things got out of sync.
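Once you've collected the cf_id each node reports for the table (by running the query above on every node), comparing them is mechanical. Here is a minimal Java sketch, with hypothetical names, that groups nodes by the cf_id they reported so a disagreement shows exactly which nodes are out of sync.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class CfIdConsistency {
    /** Groups node addresses by the cf_id each one reported for the same table. */
    static Map<String, List<String>> nodesByCfId(Map<String, String> cfIdByNode) {
        Map<String, List<String>> grouped = new TreeMap<>();
        for (Map.Entry<String, String> e : cfIdByNode.entrySet()) {
            // Normalize whitespace and case so the same UUID always groups together.
            String cfId = e.getValue().trim().toLowerCase();
            grouped.computeIfAbsent(cfId, k -> new ArrayList<>()).add(e.getKey());
        }
        for (List<String> nodes : grouped.values()) {
            Collections.sort(nodes);
        }
        return grouped;
    }

    /** True if every node reported the same cf_id. */
    static boolean consistent(Map<String, String> cfIdByNode) {
        return nodesByCfId(cfIdByNode).size() <= 1;
    }
}
```

A result with a single key means the nodes agree; more than one key points to the nodes whose view of the table diverged.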
As for the other symptoms, like the system keyspace disappearing, I don't have an explanation other than that you hit some kind of bug in the software. If you see a lot of strange symptoms like this, you may have some kind of data corruption. You might want to think about backing up your data in case things go south and you need to rebuild the cluster.

Modifying columnfamily metadata in Cassandra produces errors in datastax driver on server restart

I'm seeing some very strange effects after modifying the column metadata of a column family with the following CQL query: ALTER TABLE keyspace_name.table_name ADD column_name cql_type;
I have a cluster of 4 nodes on two data centers (Cassandra version 2.0.9). I also have two application servers talking to the Cassandra cluster via the datastax java driver (version 2.0.4).
After executing this kind of query I see no abnormal behaviour whatsoever (no exceptions detected at all), however long I wait. But once I restart my application on one of the servers I immediately start seeing errors on the other server. What I mean by errors is that after getting my data into a ResultSet, I try to deserialize it row by row and get 'null' values or values from other columns instead of the ones I expect. After restarting the second server (the one that is getting the errors) everything gets back to normal.
I've investigated the datastax-agent and Cassandra logs on both servers, but there is nothing to be found.
Is there a 'proper procedure' to altering the columnfamily? Does anyone have any idea as to what may be the problem?
Thanks!

DROP COLUMN FAMILY from cassandra CLI will not drop the CF

We tried to drop CFs using cassandra-cli:
DROP COLUMN FAMILY cfName
When we listed the CFs from the CLI, it was not there, but when I fetched the existing CFs via Hector I could still see the CF name:
KeyspaceDefinition keyspaceDefinition = newConnection().describeKeyspace(keyspaceName);
keyspaceDefinition.getCfDefs();
The data inside the CF is gone; however, the CF is still listed. After listing the CFs via Hector, if I list the column families in cassandra-cli, I can see my deleted CF again.

I had to deal with this issue back on Cassandra 1.1 as well. Basically, my column family had become corrupted, and the only way to alter its schema, was to drop/restore the keyspace (which DataStax walked me through, at the time).
If you have a support contract with DataStax, I would HIGHLY recommend contacting them before proceeding. The first thing they'll tell you is that this is a bug in specific versions of Cassandra 1.1, and that you should upgrade. I haven't tested it, but according to them an in-place upgrade would allow you to modify your schema in the new version, so you might be able to fix this by upgrading to 1.2 or 2.0.
In my case (production, enterprise environment) upgrading on-the-spot was not an option. To fix this, I basically had to drop my entire keyspace, re-create it (and my column families), and recover from a snapshot. I loosely followed the instructions found here:
Take a snapshot of the keyspace on each node. The snapshot files should be stored in the [keyspaceName]/snapshots dir, but I copied mine to another non-Cassandra location just to be on the safe side.
DROP your keyspace.
Stop all nodes.
On each node, delete the .db files in the keyspace directory (but not the snapshot dir).
Copy the files from the snapshot directory back into the keyspace directory.
Restart one node.
From that node's cassandra-cli re-create your keyspace.
Verify that your data is there.
Restart the remaining nodes.
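The file-handling part of those steps (deleting the live .db files and copying the snapshot back) can be sketched in Java as below. This is an illustrative sketch with hypothetical names, meant to run only while the node is stopped; the directory paths are parameters because the exact data and snapshot layout depends on your Cassandra version. The copy refuses to overwrite existing files rather than silently clobbering them.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

class SnapshotRestore {
    /** Deletes regular *.db files directly inside dataDir; subdirectories such as snapshots are untouched. */
    static List<Path> deleteLiveDbFiles(Path dataDir) throws IOException {
        List<Path> toDelete = new ArrayList<>();
        try (DirectoryStream<Path> files = Files.newDirectoryStream(dataDir, "*.db")) {
            for (Path f : files) {
                if (Files.isRegularFile(f)) {
                    toDelete.add(f);
                }
            }
        }
        // Delete after iterating so the directory stream is not modified mid-scan.
        for (Path f : toDelete) {
            Files.delete(f);
        }
        return toDelete;
    }

    /** Copies every regular file from snapshotDir into dataDir; throws if a target already exists. */
    static List<Path> copySnapshotBack(Path snapshotDir, Path dataDir) throws IOException {
        List<Path> copied = new ArrayList<>();
        try (DirectoryStream<Path> files = Files.newDirectoryStream(snapshotDir)) {
            for (Path f : files) {
                if (Files.isRegularFile(f)) {
                    Path target = dataDir.resolve(f.getFileName());
                    Files.copy(f, target); // no REPLACE_EXISTING: FileAlreadyExistsException on collision
                    copied.add(target);
                }
            }
        }
        return copied;
    }
}
```

Both methods return the paths they touched, so you can log them and compare against your off-Cassandra copy of the snapshot before restarting the first node.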
