In my live Cassandra cluster, I accidentally dropped a keyspace. Using snapshots, I have recovered the data, but now the response time is very high, even though cassandra recentReadLatencyMicros is < 2ms on all nodes.
After the restore, I am getting the following exception very frequently. I have created all the column families again but still get the exception. How do I find out from the cfId which column family I am missing? I also checked schema_columnfamilies, but this cfId doesn't exist there. Any help is greatly appreciated.
ERROR [RequestResponseStage:1094556] 2014-04-01 03:12:05,583 AbstractCassandraDaemon.java (line 132) Exception in thread Thread[RequestResponseStage:1094556,5,main]
java.io.IOError: org.apache.cassandra.db.UnknownColumnFamilyException: Couldn't find cfId=1118
    at org.apache.cassandra.service.AbstractRowResolver.preprocess(AbstractRowResolver.java:71)
    at org.apache.cassandra.service.AsyncRepairCallback.response(AsyncRepairCallback.java:47)
    at org.apache.cassandra.net.ResponseVerbHandler.doVerb(ResponseVerbHandler.java:45)
    at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:59)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:722)
Caused by: org.apache.cassandra.db.UnknownColumnFamilyException: Couldn't find cfId=1118
    at org.apache.cassandra.db.ColumnFamilySerializer.deserialize(ColumnFamilySerializer.java:126)
    at org.apache.cassandra.db.Row$RowSerializer.deserialize(Row.java:72)
    at org.apache.cassandra.db.ReadResponseSerializer.deserialize(ReadResponse.java:109)
    at org.apache.cassandra.db.ReadResponseSerializer.deserialize(ReadResponse.java:81)
    at org.apache.cassandra.service.AbstractRowResolver.preprocess(AbstractRowResolver.java:64)
I. In cqlsh, run this to get the list of column families in your restored (broken) keyspace:
SELECT columnfamily_name FROM system.schema_columnfamilies
WHERE keyspace_name='your_keyspace';
(Replace your_keyspace with an appropriate name.)
II. Go to your snapshot and see what column family directories are there.
III. Check whether any items in the second list are missing from the first. If you have hundreds of CFs, you may want to use some scripting to quickly find the missing tables (see the sketch below), or just paste both lists into a spreadsheet and sort; by eyeballing you should then be able to quickly spot the mismatched row.
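For step III, here is a minimal sketch (assuming you saved the cqlsh output, one name per line, to cql_tables.txt, and that each column family has its own directory under the restored keyspace's data directory; adjust the paths to your layout, and strip any suffix if your directory names carry one):
import os
import sys

# Column family names reported by system.schema_columnfamilies (saved from cqlsh, one per line)
with open("cql_tables.txt") as f:
    in_schema = {line.strip() for line in f if line.strip()}

# First argument: the restored keyspace's directory, e.g. /var/lib/cassandra/data/your_keyspace
snapshot_dir = sys.argv[1]
on_disk = {d for d in os.listdir(snapshot_dir)
           if os.path.isdir(os.path.join(snapshot_dir, d))}

# Column families present on disk but absent from the schema are the ones still to recreate
for cf in sorted(on_disk - in_schema):
    print(cf)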
During a Cassandra decommission I noticed that the node tries to send hints, which takes an extremely long time and never completes. I checked the hints folder and found hints that are more than 9 months old. I am not sure why those old hints were still present in the folder, so I decided to delete them. After I deleted them I noticed the following entry in the system.log:
INFO [HintsDispatcher:1070] 2021-07-08 11:32:01,056 HintsDispatchExecutor.java:141 - Transferring all hints to /10.199.190.233: 7935f1b5-4725-4dc2-ad6d-b883d53d907d
ERROR [HintsDispatcher:1070] 2021-07-08 11:32:01,061 CassandraDaemon.java:207 - Exception in thread Thread[HintsDispatcher:1070,1,RMI Runtime]
java.lang.RuntimeException: java.nio.file.NoSuchFileException: /data/cassandra/data/hints/ce6bb0e3-849f-487d-9274-38b8536b89cf-1603947885707-1.hints
Where does Cassandra keep metadata for the hints, as the system.hints folder didn't have any entries?
Cassandra version is 3.0.12.
There is a catalog of hints held in memory on each Cassandra node for tracking.
If you manually delete the contents of the hints directory on a node, the entries in the hints catalog become stale and you run into the NoSuchFileException you posted.
The correct way of deleting hints is with the nodetool truncatehints command. Cheers!
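For example, you can truncate everything on the node, or only the hints destined for one endpoint (the IP below is the one from your log):
nodetool truncatehints
nodetool truncatehints 10.199.190.233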
Cassandra System Log:
ERROR [ReadStage:8468] 2016-05-09 08:58:28,029 SliceQueryFilter.java (line 206) Scanned over 100000 tombstones in AAAAA.EVENT_QUEUE_DATA; query aborted (see tombstone_failure_threshold)
ERROR [ReadStage:8468] 2016-05-09 08:58:28,029 CassandraDaemon.java (line 258) Exception in thread Thread[ReadStage:8468,5,main]
java.lang.RuntimeException: org.apache.cassandra.db.filter.TombstoneOverwhelmingException
at org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:2008)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Application Log:
! java.net.SocketException: Broken pipe
! at java.net.SocketOutputStream.socketWrite0(Native Method) ~[na:1.8.0_45]
! at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:109) ~[na:1.8.0_45]
! at java.net.SocketOutputStream.write(SocketOutputStream.java:153) ~[na:1.8.0_45]
! at java.io.BufferedOutputStream.write(BufferedOutputStream.java:122) ~[na:1.8.0_45]
I don't know the exact cause of this yet. My guess is that a burst of many delete calls to Cassandra might have caused this situation. Any advice would be very helpful to me at this moment. Thanks a lot.
As a temporary workaround, you can increase tombstone_failure_threshold in cassandra.yaml.
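For example (the numbers below are illustrative, not recommendations; the stock defaults are tombstone_warn_threshold: 1000 and tombstone_failure_threshold: 100000):
# cassandra.yaml -- raise the failure threshold only as a stop-gap while you fix the data model
tombstone_warn_threshold: 10000
tombstone_failure_threshold: 500000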
My guess from the AAAAA.EVENT_QUEUE_DATA name is that you've implemented a queue. This is an anti-pattern, and it would cause exactly what you're describing. It will continue to get worse and cause a lot of GC-style issues and performance problems down the road.
Knowing that doesn't really help you today, though. I would suggest you increase your failure threshold (above) and update your compaction strategy to help in the future. Here's an idea:
ALTER TABLE footable WITH
  compaction = {'class': 'LeveledCompactionStrategy',
                'sstable_size_in_mb': '256',
                'tombstone_compaction_interval': '14400',
                'unchecked_tombstone_compaction': 'true',
                'tombstone_threshold': '0.05'}
  AND gc_grace_seconds = 14400;  -- assuming you consume everything in the queue within this window of seconds
But you will want to make changes in your application. Keep in mind that more aggressive tombstone removal creates a possibility for a delete to be "lost", but it's not very likely and is better than being down.
Tombstones are generated when you "delete" your data; they are logical markers for the delete and part of the mechanism that prevents deleted data from reappearing (ghost columns). If you deleted a lot of data, you can easily hit the tombstone warning threshold and even the failure threshold (as in your case). The gc_grace_seconds setting on your table defines the retention time for tombstones. Also, try to avoid selecting everything: make SELECT statements target actual data instead of range queries (see the example below).
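As an illustration only (the real EVENT_QUEUE_DATA schema isn't shown, so queue_id and payload below are hypothetical column names), restrict reads to a single partition so tombstones in other partitions are never scanned:
-- hypothetical: queue_id is assumed to be the partition key, payload a regular column
SELECT payload FROM "AAAAA"."EVENT_QUEUE_DATA" WHERE queue_id = 'orders' LIMIT 100;
-- avoid unbounded scans like this, which walk every partition and all of its tombstones:
-- SELECT * FROM "AAAAA"."EVENT_QUEUE_DATA";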
I am struggling with a problem where a straightforward combination of a Java app (that prepares sstables using the CQLSSTableWriter API) with sstableloader fails to insert all rows.
The only suspect message I see during the creation of the sstables is:
[Reference-Reaper:1] ERROR o.a.cassandra.utils.concurrent.Ref - LEAK DETECTED: a reference (org.apache.cassandra.utils.concurrent.Ref$State#4651731d) to class org.apache.cassandra.io.util.SafeMemory$MemoryTidy#1452490935:Memory#[7f89dc05adc0..7f89dc05dfc0) was not released before the reference was garbage collected
The sstableloader does not list anything suspect. After the load completes the number of rows does not match.
I checked key uniqueness and that does not seem to be the issue.
Anyone any thoughts on how to go about fixing this?
Many thanks indeed!
Peter
Houston, we have a problem.
Trying to create a new table with cqlsh on an existing Cassandra (v2.1.3) keyspace results in:
ServerError: <ErrorMessage code=0000 [Server error] message="java.lang.RuntimeException: java.util.concurrent.ExecutionException: java.lang.RuntimeException: org.apache.cassandra.exceptions.ConfigurationException: Column family ID mismatch (found e8c03790-c952-11e4-a753-5981ea73cd7c; expected e8b14370-c952-11e4-a844-8f10bfb9c386)">
After the first create attempt, trying once more will result in:
AlreadyExists: Table 'ks.metrics' already exists
But retrieving the list of existing tables for the keyspace with desc tables; does not report the new table.
The issue seems related to Cassandra-8387 except that there's only one client trying to create the table: cqlsh
We do have a bunch of Spark jobs that will create the keyspaces and tables at startup, potentially doing this in parallel. Would this render the keyspace corrupt?
Creating a new keyspace and adding a table to it works as expected.
Any ideas?
UPDATE
Found a workaround: issue a repair on the keyspace and the tables will appear (desc tables) and are also functional.
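For reference, the repair was run with nodetool against the affected keyspace (example command; ks is the keyspace from the error above):
nodetool repair ks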
Short answer: They have a race condition, which they think they resolved in 1.1.8...
Long answer:
I get that error all the time on one of my clusters. I have test machines that have really slow hard drives and creating one or two tables is enough to get the error when I have 4 nodes on two separate computers.
Below I have a copy of the stack trace from my Cassandra 3.7 installation. Although your version was 2.1.3, I would be surprised if this part of the code has changed that much.
As we can see, the exception happens in the validateCompatibility() function. This requires that the new and old versions of the MetaData have these equal:
ksName (keyspace name)
cfName (columnfamily name)
cfId (columnfamily UUID)
flags (isSuper, isCounter, isDense, isCompound)
comparator (key sorting comparator)
If any one of these values does not match between the old and new metadata, the process raises an exception. In our case, the cfId values are different.
Going up the stack, we have apply(), which calls validateCompatibility() immediately.
Next we have updateTable(). Similarly, it calls apply() almost immediately; first it calls getCFMetaData() to retrieve the current column family data (the "old" metadata) that is going to be compared against the new data.
Next we see updateKeyspace(). That function calculates a diff to know what changed, then saves the changes for each kind of schema object; tables are handled second, after types.
Above that is mergeSchema(), which calculates what changed at the keyspace level. It drops keyspaces that were deleted and generates new keyspaces for those that were updated (and for new keyspaces). Finally, it loops over the new keyspaces, calling updateKeyspace() for each one.
Next in the stack we see an interesting function: mergeSchemaAndAnnounceVersion(). This one updates the schema version once the keyspaces have been updated in memory and on disk. The schema version includes that cfId that is not compatible and thus generates the exception. The "announce" part sends a gossip message to the other nodes to tell them this node now knows of the new version of the schema.
Next we see something called MigrationTask. This is the message used to migrate changes between Cassandra nodes. The message payload is a collection of mutations (those handled by the mergeSchema() function).
The rest of the stack just shows run() functions that are various types of functions used to handle messages.
In my case, the problem resolves itself a little later and all is well; I have nothing to do for the schema to finally get in sync, as expected. However, it prevents me from creating all my tables in one go. So my take, looking at this, is that the migration messages do not arrive in the expected order. There must be a timeout that is handled by resending the event, and that generates the mix-up.
So let's look at the code sending the message in the first place; you find it in MigrationManager. Here we have a MIGRATION_DELAY_IN_MS parameter linked to an old issue, Schema push/pull race, which was meant to avoid a race condition. Well... there you go. So they are aware that there is a possible race condition, and to try to avoid it they added a little delay there. One part of that fix includes a version check: if the versions are already equal, the update is skipped altogether (i.e. that gossip is ignored).
if (Schema.instance.getVersion().equals(currentVersion))
{
    logger.debug("not submitting migration task for {} because our versions match", endpoint);
    return;
}
The delay we are talking about is one minute:
public static final int MIGRATION_DELAY_IN_MS = 60000;
One would think that one whole minute would suffice, but somehow I still get the error all the time.
The fact is that their code does not expect multiple changes happening one after the other with large delays like I have. So if I were to create one table and then do other things, I'd be just fine. On the other hand, when I want to create 20 tables in a row on those slow machines, the gossip message from a previous schema change arrives late (i.e. after the new CREATE TABLE command arrived at that node). That's when I get the error. The worst part, I guess, is that it is a spurious error: it is telling me that the gossip arrived late, not that my schema is invalid or that the schema in the gossip message is an old one.
org.apache.cassandra.exceptions.ConfigurationException: Column family ID mismatch (found 122a2d20-9e13-11e6-b830-55bace508971; expected 1213bef0-9e
at org.apache.cassandra.config.CFMetaData.validateCompatibility(CFMetaData.java:790) ~[apache-cassandra-3.9.jar:3.9]
at org.apache.cassandra.config.CFMetaData.apply(CFMetaData.java:750) ~[apache-cassandra-3.9.jar:3.9]
at org.apache.cassandra.config.Schema.updateTable(Schema.java:661) ~[apache-cassandra-3.9.jar:3.9]
at org.apache.cassandra.schema.SchemaKeyspace.updateKeyspace(SchemaKeyspace.java:1350) ~[apache-cassandra-3.9.jar:3.9]
at org.apache.cassandra.schema.SchemaKeyspace.mergeSchema(SchemaKeyspace.java:1306) ~[apache-cassandra-3.9.jar:3.9]
at org.apache.cassandra.schema.SchemaKeyspace.mergeSchemaAndAnnounceVersion(SchemaKeyspace.java:1256) ~[apache-cassandra-3.9.jar:3.9]
at org.apache.cassandra.service.MigrationTask$1.response(MigrationTask.java:92) ~[apache-cassandra-3.9.jar:3.9]
at org.apache.cassandra.net.ResponseVerbHandler.doVerb(ResponseVerbHandler.java:53) [apache-cassandra-3.9.jar:3.9]
at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:64) [apache-cassandra-3.9.jar:3.9]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_111]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) [na:1.8.0_111]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_111]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_111]
at java.lang.Thread.run(Thread.java:745) [na:1.8.0_111]
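A possible mitigation (just a workaround idea on my part, not an official fix) is to wait for schema agreement between consecutive CREATE TABLE statements; you can poll the schema version and only issue the next DDL once every node reports the same value:
-- all nodes agree when system.local and every row of system.peers show the same schema_version
SELECT schema_version FROM system.local;
SELECT peer, schema_version FROM system.peers;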
In my case, I had two different table schemas with the same table name by mistake, so this issue happened (I was using express-cassandra).
I'm using Cassandra 0.7.4 on CentOS 5.5 x86_64 with JDK 1.6.0_24 64-bit.
When I restart it, it throws:
ERROR 11:37:32,009 Exception encountered during startup.
java.io.IOError: org.apache.cassandra.config.ConfigurationException: Attempt to assign id to existing column family.
at org.apache.cassandra.config.DatabaseDescriptor.loadSchemas(DatabaseDescriptor.java:476)
at org.apache.cassandra.service.AbstractCassandraDaemon.setup(AbstractCassandraDaemon.java:138)
at org.apache.cassandra.service.AbstractCassandraDaemon.activate(AbstractCassandraDaemon.java:314)
at org.apache.cassandra.thrift.CassandraDaemon.main(CassandraDaemon.java:79)
Caused by: org.apache.cassandra.config.ConfigurationException: Attempt to assign id to existing column family.
at org.apache.cassandra.config.CFMetaData.map(CFMetaData.java:223)
at org.apache.cassandra.config.DatabaseDescriptor.loadSchemas(DatabaseDescriptor.java:472)
... 3 more
I tried to locate the problem: when I delete the system keyspace files, it restarts successfully!
So I think this problem is caused by the system keyspace, down at the CF schema level.
Then I built a new test environment and confirmed that the problem is caused by this operation:
update keyspace system with replication_factor=3;
But now how can I repair it?
There is a lot of data on this cluster, and I cannot lose it.
I have already run update keyspace system with replication_factor=1; but the problem still exists.
I tried using nodetool repair, after and before a flush, with no effect.
How can I restart Cassandra without losing data? Who can help me?
You should never modify the system keyspace unless you really, really know what you are doing. (If you have to ask, you don't. :)
So, the answer is: don't do that.
To recover, you should set initial_token in cassandra.yaml to your node's current token (which you can see with "nodetool ring"), then delete the system keyspace and restart. Then you'll need to recreate your columnfamily definitions, but your data will not be affected.
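For example (the token below is a placeholder; use the exact value nodetool ring reports for this node):
# cassandra.yaml
initial_token: 85070591730234615865843651857942052864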