Houston, we have a problem.
Trying to create a new table with cqlsh on an existing Cassandra (v2.1.3) keyspace results in:
ServerError:
<ErrorMessage code=0000 [Server error] message="java.lang.RuntimeException:
java.util.concurrent.ExecutionException:
java.lang.RuntimeException:
org.apache.cassandra.exceptions.ConfigurationException: Column family ID mismatch (found e8c03790-c952-11e4-a753-5981ea73cd7c; expected e8b14370-c952-11e4-a844-8f10bfb9c386)">
After the first create attempt, trying once more will result in:
AlreadyExists: Table 'ks.metrics' already exists
But retrieving the list of existing tables for the keyspace (desc tables;) does not report the new table.
The issue seems related to CASSANDRA-8387, except that there is only one client trying to create the table: cqlsh.
We do have a bunch of Spark jobs that will create the keyspaces and tables at startup, potentially doing this in parallel. Would this render the keyspace corrupt?
Creating a new keyspace and adding a table to it works as expected.
Any ideas?
UPDATE
Found a workaround: issue a repair on the keyspace, after which the tables appear (desc tables) and are also functional.
Short answer: They have a race condition, which they think they resolved in 1.1.8...
Long answer:
I get that error all the time on one of my clusters. I have test machines that have really slow hard drives and creating one or two tables is enough to get the error when I have 4 nodes on two separate computers.
Below is a copy of the stack trace from my Cassandra 3.9 installation. Although your version was 2.1.3, I would be surprised if this part of the code had changed that much.
As we can see, the exception happens in the validateCompatibility() function. This requires that the new and old versions of the MetaData have these equal:
ksName (keyspace name)
cfName (columnfamily name)
cfId (columnfamily UUID)
flags (isSuper, isCounter, isDense, isCompound)
comparator (key sorting comparator)
If any one of these values does not match between the old and the new metadata, the process raises an exception. In our case, the cfId values are different.
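Conceptually, the check looks something like this. This is a simplified, self-contained sketch of the idea only, not the actual org.apache.cassandra.config.CFMetaData source:

// Simplified sketch of the compatibility check -- not the real CFMetaData code.
class TableMetadataSketch
{
    String ksName;         // keyspace name
    String cfName;         // column family (table) name
    java.util.UUID cfId;   // column family UUID

    void validateCompatibility(TableMetadataSketch updated)
    {
        if (!ksName.equals(updated.ksName) || !cfName.equals(updated.cfName))
            throw new IllegalStateException("Keyspace or table name mismatch");
        if (!cfId.equals(updated.cfId))
            // This is the branch behind the error in the question.
            throw new IllegalStateException(String.format(
                    "Column family ID mismatch (found %s; expected %s)",
                    updated.cfId, cfId));
        // ... plus similar checks on the flags and the comparator
    }
}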
Going up the stack, we have apply(), which calls validateCompatibility() immediately.
Next we have updateTable(). Similarly, it calls apply() nearly immediately; first, though, it calls getCFMetaData() to retrieve the current column family data (the "old" metadata) that is going to be compared against the new data.
Next we see updateKeyspace(). That function calculates a diff to know what changed, then applies the change for each kind of schema object; tables come second, after types...
Before that there is mergeSchema(), which calculates what changed at the keyspace level. It drops keyspaces that were deleted and generates new keyspace entries for those that were updated (and for new keyspaces). Finally, it loops over the new keyspaces, calling updateKeyspace() for each one of them.
Next in the stack we see an interesting function: mergeSchemaAndAnnounceVersion(). This one updates the version once the keyspaces have been updated in memory and on disk. The version of the schema includes that cfId that is not compatible, and that is what generates the exception. The "announce" part sends a gossip message to the other nodes telling them that this node now knows of the new version of a certain schema.
Next we see something called MigrationTask. This is the message used to migrate changes between Cassandra nodes. The message payload is a collection of mutations (those handled by the mergeSchema() function).
The rest of the stack just shows the run() methods of the various executors used to handle messages.
In my case, the problem resolves itself a little later and all is well; I have nothing to do for the schema to finally get in sync, as expected. However, it prevents me from creating all my tables in one go. So my take, looking at this, is that the migration messages do not arrive in the expected order. There must be a timeout that is handled by resending the event, and that generates the mix-up.
So, let's look at the code that sends the message in the first place; you find it in the MigrationManager. Here we have a MIGRATION_DELAY_IN_MS parameter linked to an old issue, Schema push/pull race, which was there to avoid a race condition. Well... there you go. So they are aware that a race condition is possible, and to try to avoid it they added a little delay there. One part of that fix includes a version check: if the versions are already equal, the update is skipped altogether (i.e. that gossip is ignored).
if (Schema.instance.getVersion().equals(currentVersion))
{
    logger.debug("not submitting migration task for {} because our versions match", endpoint);
    return;
}
The delay we are talking about is one minute:
public static final int MIGRATION_DELAY_IN_MS = 60000;
One would think that one whole minute would suffice, but somehow I still get the error all the time.
The fact is that their code does not expect multiple changes happening one after another combined with large delays like the ones I have. So if I were to create one table and then do other things, I'd be just fine. On the other hand, when I want to create 20 tables in a row on those slow machines, the gossip message from a previous schema change arrives late (i.e. after the new CREATE TABLE command has reached that node). That's when I get the error. The worst part, I guess, is that it is a spurious error: it tells me that the gossip arrived late, not that my schema is invalid; the schema in the gossip message is simply an old one.
org.apache.cassandra.exceptions.ConfigurationException: Column family ID mismatch (found 122a2d20-9e13-11e6-b830-55bace508971; expected 1213bef0-9e
at org.apache.cassandra.config.CFMetaData.validateCompatibility(CFMetaData.java:790) ~[apache-cassandra-3.9.jar:3.9]
at org.apache.cassandra.config.CFMetaData.apply(CFMetaData.java:750) ~[apache-cassandra-3.9.jar:3.9]
at org.apache.cassandra.config.Schema.updateTable(Schema.java:661) ~[apache-cassandra-3.9.jar:3.9]
at org.apache.cassandra.schema.SchemaKeyspace.updateKeyspace(SchemaKeyspace.java:1350) ~[apache-cassandra-3.9.jar:3.9]
at org.apache.cassandra.schema.SchemaKeyspace.mergeSchema(SchemaKeyspace.java:1306) ~[apache-cassandra-3.9.jar:3.9]
at org.apache.cassandra.schema.SchemaKeyspace.mergeSchemaAndAnnounceVersion(SchemaKeyspace.java:1256) ~[apache-cassandra-3.9.jar:3.9]
at org.apache.cassandra.service.MigrationTask$1.response(MigrationTask.java:92) ~[apache-cassandra-3.9.jar:3.9]
at org.apache.cassandra.net.ResponseVerbHandler.doVerb(ResponseVerbHandler.java:53) [apache-cassandra-3.9.jar:3.9]
at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:64) [apache-cassandra-3.9.jar:3.9]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_111]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) [na:1.8.0_111]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_111]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_111]
at java.lang.Thread.run(Thread.java:745) [na:1.8.0_111]
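As a practical client-side mitigation when creating many tables programmatically, you can wait for schema agreement after each DDL statement before sending the next one (the driver also does some waiting of its own after schema-altering statements, but an explicit check helps on slow clusters). A hedged sketch, assuming the DataStax Java driver 3.x, where Metadata.checkSchemaAgreement() is available; the keyspace and table DDL below are placeholders:

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;

// Hedged sketch: create tables one at a time and wait until all reachable
// nodes report the same schema version before issuing the next CREATE TABLE.
public class CreateTablesOneByOne
{
    public static void main(String[] args) throws InterruptedException
    {
        String[] ddl = {
            "CREATE TABLE IF NOT EXISTS ks.t1 (id uuid PRIMARY KEY, v text)",
            "CREATE TABLE IF NOT EXISTS ks.t2 (id uuid PRIMARY KEY, v text)",
        };

        try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
             Session session = cluster.connect())
        {
            for (String statement : ddl)
            {
                session.execute(statement);
                // Poll for up to ~60 s (roughly MIGRATION_DELAY_IN_MS).
                long deadline = System.currentTimeMillis() + 60_000;
                while (!cluster.getMetadata().checkSchemaAgreement()
                        && System.currentTimeMillis() < deadline)
                {
                    Thread.sleep(500);
                }
            }
        }
    }
}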
I had two different table schemas with the same table name by mistake, so this issue happened (I was using express-cassandra).
Related
I'm running open-source Cassandra 3.11.9 with the Datastax Java Driver 3.8.0. I have a Cassandra keyspace that has multiple tables functioning as lookup tables / search indices. Whenever I receive a new POST request to my endpoint, I parse the object and insert it in the corresponding Cassandra table. I also issue inserts to each corresponding lookup table (10-20 per object).
When ingesting a lot of data into the system, I've been running into WriteTimeoutExceptions in the driver.
I tried to serialize the insert requests into the lookup tables by introducing Apache Camel and putting all the Statements into a queue that the Session could work off of, but it did not help.
With Camel, since the exceptions are now happening in the Camel thread, the test continues to run, instead of failing on the first exception. Eventually, the test seems to crash Cassandra. (Nothing in the Cassandra logs though)
I also tried to turn off my lookup tables and instead insert into the main table 15x per object (to simulate a similar number of writes as if I had the lookup tables on). This test passed with no exception, which makes me think the large number of tables is the problem.
Is a large number (2k+) of Cassandra tables a code smell? Should we rearchitect, or just throw more resources at it? Nothing indicative has shown up in the logs (mostly just some status about the number of tables, etc.; no exceptions).
Can the Datastax Java Driver be used multithreaded like this? It says it is threadsafe.
There is a direct effect of a high number of tables on performance - see this doc (the whole series is a good source of information) and this blog post for more details. Basically, with ~1000 tables you get ~20-25% degradation of performance.
That could be a reason; not completely direct, but related. For each table, Cassandra needs to allocate memory, reserve a part of the memtable for it, keep metadata about it, etc. This specific problem could come from blocked memtable flushes or something similar. Check nodetool tpstats and nodetool tablestats for blocked or pending memtable flushes. It is better to set up a continuous monitoring solution, such as a metrics collector for Apache Cassandra, and watch the important metrics (which include this information) over a period of time.
I am attempting to add a field to a user-defined type in Cassandra 2.1.2, using the Node.js driver from Datastax. I added the field using ALTER TYPE in cqlsh. When I attempt to add a row containing the UDT with a value for the new field, it gets inserted with a null value instead of the value I supplied. I strongly suspect this has to do with the way the cluster is caching the prepared statement. Because I recall reading that prepared statements are indexed by a hash of the query, I tried changing some whitespace in the query to see if it helped. This actually seemed to work, but only once. Subsequent inserts result in this error:
message: 'Operation timed out - received only 0 responses.',
info: 'Represents an error message from the server',
code: 4352,
consistencies: 10,
received: 0,
blockFor: 1,
writeType: 'SIMPLE',
coordinator: '127.0.0.1:9042',
and it would seem the new rows are not added... until I restart Cassandra, at which point not only do the inserts that I thought had failed show up, but subsequent ones work fine. This is very disconcerting, but fortunately I have only done this in test instances. I do need to make this change in production, however, and restarting the cluster to add a single field is not really an option. Is there a better way to get the cluster to evict the cached prepared statement?
I strongly suspect this has to do with the way the cluster is caching the prepared statement.
Put the Cassandra log in DEBUG mode to be sure the prepared statement cache is the root cause. If it is, create a JIRA so the dev team can fix it...
Optionally you can also enable tracing to see what is going on server-side
To enable tracing in cqlsh, just type TRACING ON
To enable tracing with the Java driver, just call enableTracing() on the statement object
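For example, with the DataStax Java driver, a hedged sketch assuming the 3.x API and an already-open Session; the keyspace/table names below are placeholders:

import com.datastax.driver.core.*;

// Hedged sketch: enable tracing on a single statement and print the trace
// events returned by the coordinator.
Statement stmt = new SimpleStatement(
        "INSERT INTO ks.metrics (id, value) VALUES (uuid(), 42)")
        .enableTracing();

ResultSet rs = session.execute(stmt);
QueryTrace trace = rs.getExecutionInfo().getQueryTrace();
System.out.printf("trace %s took %d microseconds%n",
        trace.getTraceId(), trace.getDurationMicros());
for (QueryTrace.Event event : trace.getEvents())
    System.out.printf("  %s (source: %s, elapsed: %d us)%n",
            event.getDescription(), event.getSource(), event.getSourceElapsedMicros());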
I have 3 similar Cassandra tables: A1, A2, and A3.
All have the same columns, but different partition keys.
Data is inserted into all three tables at the same time through sequential inserts using the mapper library (cassandra-driver-mapping-2.1.8.jar).
However, there has been inconsistency in a few columns.
E.g. sometimes A1.colX and A2.colX are the same but A3.colX has an old value (not updated), while all the other columns in these three tables have exactly the same values.
Another time, A1.colY and A3.colY may have the same value but A2.colY has an old value (not updated), while all the other columns in these three tables have exactly the same values.
I am using MappingManager to save the input data in Cassandra.
Is this a known problem with MappingManager, or is something wrong in my approach?
Sample code:
public void insertInTables(String inputString) {
    .
    .
    ClassNameA1 classObjectA1 = new Gson().fromJson(inputString, ClassNameA1.class);
    ClassNameA2 classObjectA2 = new Gson().fromJson(inputString, ClassNameA2.class);
    ClassNameA3 classObjectA3 = new Gson().fromJson(inputString, ClassNameA3.class);
    MappingManager manager = new MappingManager(session);
    Mapper<ClassNameA1> mapperA1 = manager.mapper(ClassNameA1.class);
    Mapper<ClassNameA2> mapperA2 = manager.mapper(ClassNameA2.class);
    Mapper<ClassNameA3> mapperA3 = manager.mapper(ClassNameA3.class);
    mapperA1.save(classObjectA1);
    mapperA2.save(classObjectA2);
    mapperA3.save(classObjectA3);
    .
    .
}
It might happen because Cassandra is an eventually consistent store, not a strongly consistent one. Typical reasons for similar behaviour I've witnessed in my experience are:
issues with read/write consistency levels. If you have RF=3 but write data with CL=ONE, some nodes might fail to replicate your value on write for some reason (like a network/hardware glitch). Then if you read with CL=QUORUM (or ONE), the quorum may decide to show you the old column value because the new one was not propagated to all nodes correctly. So make sure you're writing with CL=ALL/QUORUM and reading with CL=QUORUM (see the sketch after this list).
issues with hinted hand-off (which is meant to protect you from the previous issue). I once observed strange behaviour where the first read of a column was stale/inconsistent (in 1% of all queries), but the second (or third) read showed the correct column value every time. So try re-reading your inconsistent column multiple times (and think about possible hardware/network failures).
internal database errors due to hardware failure or to Cassandra itself.
Most of the issues described above can be fixed with nodetool repair. You can do a full repair and see if that helps.
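For the consistency-level point above, the write CL can be set per save call with the mapper. A hedged sketch, assuming a cassandra-driver-mapping 2.1.x release recent enough to provide Mapper.Option (the mapper and entity objects are the ones from the question):

import com.datastax.driver.core.ConsistencyLevel;
import com.datastax.driver.mapping.Mapper.Option;

// Hedged sketch: write each object at QUORUM so a majority of replicas
// acknowledge the value before save() returns; read back at QUORUM as well.
mapperA1.save(classObjectA1, Option.consistencyLevel(ConsistencyLevel.QUORUM));
mapperA2.save(classObjectA2, Option.consistencyLevel(ConsistencyLevel.QUORUM));
mapperA3.save(classObjectA3, Option.consistencyLevel(ConsistencyLevel.QUORUM));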
I am wondering if there is a way to check whether a RethinkDB shard is in use before performing a ReQL query on it.
I am currently calling two functions back to back, the first creating a RethinkDB table and inserting data, the second will read the data from that newly created table. This works okay if the data being inserted is minimal, but once the size of the data set being inserted increases, I start getting:
Unhandled rejection RqlRuntimeError: Cannot perform write: Primary replica for shard ["", +inf) not available
This is because the primary shard is still doing the write from the previous function. I guess I am wondering if there is some RethinkDB-specific way of avoiding this, or if I need to emit/listen for events or something?
You can probably use the wait command for this. From the docs:
Wait for a table or all the tables in a database to be ready. A table may be temporarily unavailable after creation, rebalancing or reconfiguring. The wait command blocks until the given table (or database) is fully up to date.
http://rethinkdb.com/api/javascript/wait/
In my live Cassandra cluster, I accidentally dropped the keyspace. Using snapshots, I have recovered the data, but now the response time is very high, even though Cassandra's recentReadLatencyMicros is < 2 ms on all nodes.
After the restore, I am getting the following exception very frequently. I have created all the column families again but am still getting the exception. How do I know from the cfId which column family I am missing? I have also checked schema_columnfamilies, but this cfId doesn't exist there. Any help is greatly appreciated.
ERROR [RequestResponseStage:1094556] 2014-04-01 03:12:05,583 AbstractCassandraDaemon.java (line 132) Exception in thread Thread[RequestResponseStage:1094556,5,main]
java.io.IOError: org.apache.cassandra.db.UnknownColumnFamilyException: Couldn't find cfId=1118
    at org.apache.cassandra.service.AbstractRowResolver.preprocess(AbstractRowResolver.java:71)
    at org.apache.cassandra.service.AsyncRepairCallback.response(AsyncRepairCallback.java:47)
    at org.apache.cassandra.net.ResponseVerbHandler.doVerb(ResponseVerbHandler.java:45)
    at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:59)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:722)
Caused by: org.apache.cassandra.db.UnknownColumnFamilyException: Couldn't find cfId=1118
    at org.apache.cassandra.db.ColumnFamilySerializer.deserialize(ColumnFamilySerializer.java:126)
    at org.apache.cassandra.db.Row$RowSerializer.deserialize(Row.java:72)
    at org.apache.cassandra.db.ReadResponseSerializer.deserialize(ReadResponse.java:109)
    at org.apache.cassandra.db.ReadResponseSerializer.deserialize(ReadResponse.java:81)
    at org.apache.cassandra.service.AbstractRowResolver.preprocess(AbstractRowResolver.java:64)
I. In cqlsh, run this to get the list of CFs in your restored (broken) keyspace:
SELECT columnfamily_name FROM system.schema_columnfamilies
WHERE keyspace_name='your_keyspace';
(Replace your_keyspace with an appropriate name.)
II. Go to your snapshot and see what directories there are.
III. Check whether any items in the second list are missing from the first. If you have hundreds of CFs, you may want to use some scripting to quickly find the missing table (see the sketch below), or just paste both lists into a spreadsheet and sort; then by eyeballing them you should be able to quickly spot the mismatched row.
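If you want to script the comparison, here is a hedged sketch in Java. It assumes you saved the output of the cqlsh query above to a text file (one column family name per line) and that the snapshot directory contains one sub-directory per column family; the file and path names are placeholders.

import java.io.IOException;
import java.nio.file.*;
import java.util.*;
import java.util.stream.*;

// Hedged sketch: report snapshot directories that have no matching
// column family in the restored schema.
public class FindMissingColumnFamilies
{
    public static void main(String[] args) throws IOException
    {
        // Column family names as reported by system.schema_columnfamilies.
        Set<String> inSchema = Files.readAllLines(Paths.get("schema_cfs.txt"))
                .stream()
                .map(String::trim)
                .filter(name -> !name.isEmpty())
                .collect(Collectors.toSet());

        // Directory names found in the restored snapshot.
        Set<String> onDisk;
        try (Stream<Path> entries = Files.list(Paths.get("/path/to/snapshot")))
        {
            onDisk = entries.filter(Files::isDirectory)
                            .map(p -> p.getFileName().toString())
                            .collect(Collectors.toCollection(TreeSet::new));
        }

        onDisk.removeAll(inSchema);
        System.out.println("Directories with no matching column family: " + onDisk);
    }
}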