We have a 15-node Cassandra cluster across 3 DCs.
We are using Cassandra 3.0.9.
One of the nodes in one of our DCs has died with the startup error below:
CassandraDaemon.java:709 - Exception encountered during startup
java.lang.IllegalArgumentException: Unknown CF 111111-111111111-11111111
What we have tried:
Bootstrap-replacing the node - https://docs.datastax.com/en/archived/cassandra/3.0/cassandra/operations/opsReplaceNode.html
This works for a while, and then the bootstrap process hangs with the SAME error in the logs:
CassandraDaemon.java:709 - Exception encountered during startup
java.lang.IllegalArgumentException: Unknown CF
Provisioning a new blank node (https://docs.datastax.com/en/archived/cassandra/3.0/cassandra/operations/opsAddNodeToCluster.html) and trying to join it to the cluster (with the intention of removing the dead node afterwards). It refuses to start, with the same error.
Has anyone ever run across this before?
I am running neo4j 3.5.14 packaged by Bitnami on AWS linux "5.4.0-1078-aws". The error began when the disk filled up and the database crashed. After clearing disk space, I attempted to restart the database. I have tried starting neo4j at both the CLI and service levels. When attempting to start from the CLI, I get the following log in neo4j.log. I have run this as both the neo4j user and the root user.
2022-06-17 17:13:50.745+0000 INFO ======== Neo4j 3.5.14 ========
2022-06-17 17:13:50.768+0000 INFO Starting...
2022-06-17 17:13:54.203+0000 ERROR Failed to start Neo4j: Starting Neo4j failed: Component 'org.neo4j.server.database.LifecycleManagingDatabase#3a6f2de3' was successfully initialized, but failed to start. Please see the attached cause exception "Unknown entry type -10 for version -10. At position LogPosition{logVersion=0, byteOffset=7639039} and entry version V3_0_10". Starting Neo4j failed: Component 'org.neo4j.server.database.LifecycleManagingDatabase#3a6f2de3' was successfully initialized, but failed to start. Please see the attached cause exception "Unknown entry type -10 for version -10. At position LogPosition{logVersion=0, byteOffset=7639039} and entry version V3_0_10".
org.neo4j.server.ServerStartupException: Starting Neo4j failed: Component 'org.neo4j.server.database.LifecycleManagingDatabase#3a6f2de3' was successfully initialized, but failed to start. Please see the attached cause exception "Unknown entry type -10 for version -10. At position LogPosition{logVersion=0, byteOffset=7639039} and entry version V3_0_10".
at org.neo4j.server.exception.ServerStartupErrors.translateToServerStartupError(ServerStartupErrors.java:45)
at org.neo4j.server.AbstractNeoServer.start(AbstractNeoServer.java:187)
at org.neo4j.server.ServerBootstrapper.start(ServerBootstrapper.java:124)
at org.neo4j.server.ServerBootstrapper.start(ServerBootstrapper.java:91)
at org.neo4j.server.CommunityEntryPoint.main(CommunityEntryPoint.java:32)
Caused by: org.neo4j.kernel.lifecycle.LifecycleException: Component 'org.neo4j.server.database.LifecycleManagingDatabase#3a6f2de3' was successfully initialized, but failed to start. Please see the attached cause exception "Unknown entry type -10 for version -10. At position LogPosition{logVersion=0, byteOffset=7639039} and entry version V3_0_10".
at org.neo4j.kernel.lifecycle.LifeSupport$LifecycleInstance.start(LifeSupport.java:473)
at org.neo4j.kernel.lifecycle.LifeSupport.start(LifeSupport.java:111)
at org.neo4j.server.AbstractNeoServer.start(AbstractNeoServer.java:180)
... 3 more
Caused by: java.lang.RuntimeException: Error starting org.neo4j.graphdb.facade.GraphDatabaseFacadeFactory, /var/lib/neo4j/data/databases
at org.neo4j.graphdb.facade.GraphDatabaseFacadeFactory.initFacade(GraphDatabaseFacadeFactory.java:232)
at org.neo4j.graphdb.facade.GraphDatabaseFacadeFactory.newFacade(GraphDatabaseFacadeFactory.java:148)
at org.neo4j.server.database.CommunityGraphFactory.newGraphDatabase(CommunityGraphFactory.java:41)
at org.neo4j.server.database.LifecycleManagingDatabase.start(LifecycleManagingDatabase.java:90)
at org.neo4j.kernel.lifecycle.LifeSupport$LifecycleInstance.start(LifeSupport.java:452)
... 5 more
Caused by: org.neo4j.kernel.lifecycle.LifecycleException: Component 'org.neo4j.kernel.NeoStoreDataSource#19648c40' was successfully initialized, but failed to start. Please see the attached cause exception "Unknown entry type -10 for version -10. At position LogPosition{logVersion=0, byteOffset=7639039} and entry version V3_0_10".
at org.neo4j.kernel.lifecycle.LifeSupport$LifecycleInstance.start(LifeSupport.java:473)
at org.neo4j.kernel.lifecycle.LifeSupport.start(LifeSupport.java:111)
at org.neo4j.kernel.impl.transaction.state.DataSourceManager.start(DataSourceManager.java:116)
at org.neo4j.kernel.lifecycle.LifeSupport$LifecycleInstance.start(LifeSupport.java:452)
at org.neo4j.kernel.lifecycle.LifeSupport.start(LifeSupport.java:111)
at org.neo4j.graphdb.facade.GraphDatabaseFacadeFactory.initFacade(GraphDatabaseFacadeFactory.java:227)
... 9 more
Caused by: java.lang.RuntimeException: Error reading transaction logs, recovery not possible. To force the database to start anyway, you can specify 'unsupported.dbms.tx_log.fail_on_corrupted_log_files=false'. This will try to recover as much as possible and then truncate the corrupt part of the transaction log. Doing this means your database integrity might be compromised, please consider restoring from a consistent backup instead.
at org.neo4j.kernel.recovery.Recovery.throwUnableToCleanRecover(Recovery.java:160)
at org.neo4j.kernel.recovery.LogTailScanner.findLogTail(LogTailScanner.java:147)
at org.neo4j.kernel.recovery.LogTailScanner.getTailInformation(LogTailScanner.java:260)
at org.neo4j.kernel.impl.transaction.log.LogVersionUpgradeChecker.check(LogVersionUpgradeChecker.java:48)
at org.neo4j.kernel.NeoStoreDataSource.start(NeoStoreDataSource.java:349)
at org.neo4j.kernel.lifecycle.LifeSupport$LifecycleInstance.start(LifeSupport.java:452)
... 14 more
Caused by: java.io.IOException: java.lang.IllegalArgumentException: Unknown entry type -10 for version -10. At position LogPosition{logVersion=0, byteOffset=7639039} and entry version V3_0_10
at org.neo4j.kernel.impl.transaction.log.entry.VersionAwareLogEntryReader.readLogEntry(VersionAwareLogEntryReader.java:115)
at org.neo4j.kernel.impl.transaction.log.LogEntryCursor.next(LogEntryCursor.java:54)
at org.neo4j.kernel.recovery.LogTailScanner.findLogTail(LogTailScanner.java:99)
... 18 more
Caused by: java.lang.IllegalArgumentException: Unknown entry type -10 for version -10. At position LogPosition{logVersion=0, byteOffset=7639039} and entry version V3_0_10
at org.neo4j.kernel.impl.transaction.log.entry.LogEntryVersion.entryParser(LogEntryVersion.java:130)
at org.neo4j.kernel.impl.transaction.log.entry.VersionAwareLogEntryReader.readLogEntry(VersionAwareLogEntryReader.java:81)
... 20 more
2022-06-17 17:13:54.217+0000 INFO Neo4j Server shutdown initiated by request
I should also add that, while I can see that the data of the database is intact in the databases folder, the neo4j-admin dump command is dumping an empty database file.
You can try the following:
Shut down the database.
Make a copy of the data folder as a backup.
Use neo4j-admin check-consistency to check your database integrity.
Add this line to your /etc/neo4j/neo4j.conf: unsupported.dbms.tx_log.fail_on_corrupted_log_files=false
Restart the db.
If that doesn't help, shut down the db again.
Remove the data/transactions folder.
Restart the db.
If that doesn't help, shut down the db again.
You can also use neo4j-admin copy to make a copy of the database while skipping corrupt records.
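The backup and config steps in the answer above could be scripted roughly as follows. This is only a minimal sketch, not a Neo4j tool: the helper names backup_data_dir and ensure_setting are hypothetical, all paths are passed in by the caller, and it assumes the database has already been shut down.

```python
import shutil
from pathlib import Path

# Setting name taken from the error message in the question.
SETTING = "unsupported.dbms.tx_log.fail_on_corrupted_log_files"


def backup_data_dir(data_dir, backup_dir):
    """Copy the whole data folder before changing anything (database must be stopped)."""
    # copytree raises if backup_dir already exists, so an older backup is never overwritten.
    shutil.copytree(data_dir, backup_dir)
    return Path(backup_dir)


def ensure_setting(conf_path, key, value):
    """Set key=value in a neo4j.conf-style file.

    Replaces an existing uncommented entry for the key, or appends a new line.
    Returns True if an existing entry was replaced, False if a line was appended.
    """
    path = Path(conf_path)
    lines = path.read_text().splitlines()
    wanted = f"{key}={value}"
    found = False
    for i, line in enumerate(lines):
        stripped = line.strip()
        if stripped.startswith("#"):
            continue  # leave commented-out entries alone
        if stripped.split("=", 1)[0].strip() == key:
            lines[i] = wanted
            found = True
    if not found:
        lines.append(wanted)
    path.write_text("\n".join(lines) + "\n")
    return found
```

For the setup in the question, the calls would presumably look like backup_data_dir("/var/lib/neo4j/data", some_backup_path) followed by ensure_setting("/etc/neo4j/neo4j.conf", SETTING, "false"). Taking the backup first matters because the setting lets Neo4j truncate the corrupt part of the transaction log.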
I'm trying to start a Spark jobserver. Here are the steps I'm following:
I configure the local.sh based on the template.
Then I run ./bin/server_deploy.sh and it finishes without any error.
Configure local.conf.
Run ./bin/server_start.sh in the deploy server.
But when I do the last step I get the following error:
Error: Exception thrown by the agent : java.lang.NullPointerException
Note: I'm using Spark 1.4.1 and version 0.5.2 of jobserver (https://github.com/spark-jobserver/spark-jobserver/tree/v0.5.2).
Any idea how I can fix this (or at least debug it)?
Thanks
The error log does not provide much information.
I encountered the same error. In my case, I had another instance of the JobServer running (and somehow ./bin/server_stop.sh did not catch it). It worked after I manually killed the other process.
Hint : Error: Exception thrown by the agent : java.lang.NullPointerException when starting Java application
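One way to check for a leftover instance before starting is to see whether a port the server needs is already taken. The sketch below is an assumption-laden illustration: the thread does not say which port was involved, so the port number is left to the caller, and port_in_use is a hypothetical helper name.

```python
import socket


def port_in_use(port, host="127.0.0.1"):
    """Return True if something is already accepting connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(1.0)
        # connect_ex returns 0 on success instead of raising an exception.
        return s.connect_ex((host, port)) == 0
```

If this returns True for a port that the job server (or its JMX agent) is configured to use, another process is probably still running and has to be stopped first.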
I have a 6-node cluster running 1.1.5. I added a new node running the same version. The new node joined the cluster without any issues, but we are seeing the error messages below in system.log:
ERROR 10:15:53,455 Exception in thread Thread[ReadStage:320,5,main]
java.lang.AssertionError: Unknown keyspace
Any thoughts?
I am trying to upgrade a Cassandra 2.1.0 cluster to 2.1.8 (latest release).
When I start the first node with the 2.1.8 runtime, I get an error and the node refuses to start.
This is the error's stack trace:
org.apache.cassandra.io.FSReadError: java.lang.NullPointerException
at org.apache.cassandra.db.ColumnFamilyStore.removeUnfinishedCompactionLeftovers(ColumnFamilyStore.java:642) ~[apache-cassandra-2.1.8.jar:2.1.8]
at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:302) [apache-cassandra-2.1.8.jar:2.1.8]
at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:524) [apache-cassandra-2.1.8.jar:2.1.8]
at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:613) [apache-cassandra-2.1.8.jar:2.1.8]
Caused by: java.lang.NullPointerException: null
at org.apache.cassandra.db.ColumnFamilyStore.removeUnfinishedCompactionLeftovers(ColumnFamilyStore.java:634) ~[apache-cassandra-2.1.8.jar:2.1.8]
... 3 common frames omitted
FSReadError in Failed to remove unfinished compaction leftovers (file: /home/nudgeca2/datas/data/main/segment-97b5ba00571011e49a928bffe429b6b5/main-segment-ka-15432-Statistics.db). See log for details.
at org.apache.cassandra.db.ColumnFamilyStore.removeUnfinishedCompactionLeftovers(ColumnFamilyStore.java:642)
at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:302)
at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:524)
at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:613)
Caused by: java.lang.NullPointerException
at org.apache.cassandra.db.ColumnFamilyStore.removeUnfinishedCompactionLeftovers(ColumnFamilyStore.java:634)
... 3 more
Exception encountered during startup: java.lang.NullPointerException
The cluster has 7 nodes and runs on AWS Linux EC2 instances.
The node I am trying to upgrade was stopped after a nodetool drain.
Then I tried to go back to the 2.1.0 runtime, but I now get a similar error.
I also tried to stop and start another node, and everything was OK: the node restarted without any problem.
I tried to touch the missing file (since it was going to be removed anyway, I thought its content might not matter). Two other files produced the same error, so I touched those as well. The node now fails later on, while trying to read these files.
Does anyone have any idea what I should do?
Thank you for any help.
It might be worth opening a Jira for that issue, so if nothing else, they can catch the NPE and provide a better error message.
It looks like it's trying to open:
file: /home/nudgeca2/datas/data/main/segment-97b5ba00571011e49a928bffe429b6b5/main-segment-ka-15432-Statistics.db
It's possible that it's trying to read that file because it finds the associated data file: (/home/nudgeca2/datas/data/main/segment-97b5ba00571011e49a928bffe429b6b5/main-segment-ka-15432-Data.db). Does that data file exist? I'd be tempted to move it out of the way, and see if it starts properly.
I have a cluster running datastax-cassandra 1.2.5, and it works fine. Because of an issue with vnodes and the leveled compaction strategy, I tried upgrading it to 1.2.6.
The upgrade involved:
1 - stopping all the nodes
2 - deleting 1.2.5 rpm
3 - installing 1.2.6 rpm
4 - fixing cassandra.yaml
5 - starting cassandra.
Problem Statement - All the nodes are now up and running, but not in one cluster. Each of them is operating in its own cluster, even though the seeds setting in the yaml points to the original seed.
nodetool status also shows just the one node (the node we are on).
The system log shows one error:
ERROR [WRITE-/10.93.3.46] 2013-10-21 19:43:29,101 CassandraDaemon.java (line 192)
Exception in thread Thread[WRITE-/10.10.10.10,5,main]
java.lang.NoClassDefFoundError: Could not initialize class org.xerial.snappy.Snappy
at org.xerial.snappy.SnappyOutputStream.<init>(SnappyOutputStream.java:79)
at org.xerial.snappy.SnappyOutputStream.<init>(SnappyOutputStream.java:66)
at org.apache.cassandra.net.OutboundTcpConnection.connect(OutboundTcpConnection.java:351)
at org.apache.cassandra.net.OutboundTcpConnection.run(OutboundTcpConnection.java:143)
**** 10.10.10.10 is the seed IP
Any help on how to get past this?
Try setting internode_compression to none. This disables compression between nodes, which is what is failing because Snappy cannot initialize:
internode_compression: none
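Since the change has to be made in cassandra.yaml on every node, it can help to apply it with a small script. The following is a sketch only: set_yaml_scalar is a hypothetical helper that edits the file text line by line (so comments and ordering are preserved), and it assumes the key is an uncommented top-level entry.

```python
import re


def set_yaml_scalar(yaml_text, key, value):
    """Replace a top-level 'key: ...' line in YAML text, or append one if missing."""
    pattern = re.compile(r"^" + re.escape(key) + r"\s*:.*$", re.MULTILINE)
    new_line = f"{key}: {value}"
    if pattern.search(yaml_text):
        # A function replacement avoids backslash interpretation in the value.
        return pattern.sub(lambda m: new_line, yaml_text, count=1)
    sep = "" if (not yaml_text or yaml_text.endswith("\n")) else "\n"
    return yaml_text + sep + new_line + "\n"
```

Usage would be to read cassandra.yaml, call set_yaml_scalar(text, "internode_compression", "none"), write the result back, and then restart the node.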