Cassandra 2.1.8: Node refuses to start with NPE in removeUnfinishedCompactionLeftovers

I am trying to upgrade a Cassandra 2.1.0 cluster to 2.1.8 (the latest release).
When I start the first node on the 2.1.8 runtime, I get an error and the node refuses to start.
Here is the stack trace:
org.apache.cassandra.io.FSReadError: java.lang.NullPointerException
at org.apache.cassandra.db.ColumnFamilyStore.removeUnfinishedCompactionLeftovers(ColumnFamilyStore.java:642) ~[apache-cassandra-2.1.8.jar:2.1.8]
at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:302) [apache-cassandra-2.1.8.jar:2.1.8]
at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:524) [apache-cassandra-2.1.8.jar:2.1.8]
at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:613) [apache-cassandra-2.1.8.jar:2.1.8]
Caused by: java.lang.NullPointerException: null
at org.apache.cassandra.db.ColumnFamilyStore.removeUnfinishedCompactionLeftovers(ColumnFamilyStore.java:634) ~[apache-cassandra-2.1.8.jar:2.1.8]
... 3 common frames omitted
FSReadError in Failed to remove unfinished compaction leftovers (file: /home/nudgeca2/datas/data/main/segment-97b5ba00571011e49a928bffe429b6b5/main-segment-ka-15432-Statistics.db). See log for details.
at org.apache.cassandra.db.ColumnFamilyStore.removeUnfinishedCompactionLeftovers(ColumnFamilyStore.java:642)
at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:302)
at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:524)
at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:613)
Caused by: java.lang.NullPointerException
at org.apache.cassandra.db.ColumnFamilyStore.removeUnfinishedCompactionLeftovers(ColumnFamilyStore.java:634)
... 3 more
Exception encountered during startup: java.lang.NullPointerException
The cluster has 7 nodes and runs on AWS Linux EC2 instances.
The node I am trying to upgrade was stopped after a nodetool drain.
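For reference, the drain-and-stop sequence on that node was essentially the following (a sketch; service management differs per install):

nodetool drain                # flush memtables and stop accepting writes
sudo service cassandra stop   # stop the 2.1.0 runtime
# swap the runtime to 2.1.8, then:
sudo service cassandra start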
I then tried to roll back to the 2.1.0 runtime, but I now get a similar error.
I also tried stopping and starting another node, and it restarted without any problem.
I tried touching the missing file (since it is supposed to be removed, I thought it might not need any specific content). Two other files gave the same error, so I touched those as well. The node then fails further along while trying to read these files.
Does anyone have any idea what I should do?
Thank you for any help.

It might be worth opening a Jira for that issue, so if nothing else, they can catch the NPE and provide a better error message.
It looks like it's trying to open:
file: /home/nudgeca2/datas/data/main/segment-97b5ba00571011e49a928bffe429b6b5/main-segment-ka-15432-Statistics.db
It's possible that it's trying to read that file because it finds the associated data file (/home/nudgeca2/datas/data/main/segment-97b5ba00571011e49a928bffe429b6b5/main-segment-ka-15432-Data.db). Does that data file exist? I'd be tempted to move it out of the way and see if it starts properly.
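A sketch of that experiment, using the paths from the error message (the quarantine directory is an arbitrary choice):

# does the associated data file exist?
ls -l /home/nudgeca2/datas/data/main/segment-97b5ba00571011e49a928bffe429b6b5/main-segment-ka-15432-Data.db
# if so, move the whole SSTable generation out of the way and retry startup
mkdir -p /home/nudgeca2/quarantine
mv /home/nudgeca2/datas/data/main/segment-97b5ba00571011e49a928bffe429b6b5/main-segment-ka-15432-* /home/nudgeca2/quarantine/
sudo service cassandra start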

Related

Neo4j failing to start

I am running neo4j 3.5.14 packaged by Bitnami on AWS Linux "5.4.0-1078-aws". The error began when the disk filled up and the database crashed. After clearing disk space, I attempted to restart the database. I have tried starting neo4j at both the CLI and service levels, and I have run it as both the neo4j user and the root user. When starting from the CLI, I get the following log in neo4j.log.
2022-06-17 17:13:50.745+0000 INFO ======== Neo4j 3.5.14 ========
2022-06-17 17:13:50.768+0000 INFO Starting...
2022-06-17 17:13:54.203+0000 ERROR Failed to start Neo4j: Starting Neo4j failed: Component 'org.neo4j.server.database.LifecycleManagingDatabase#3a6f2de3' was successfully initialized, but failed to start. Please see the attached cause exception "Unknown entry type -10 for version -10. At position LogPosition{logVersion=0, byteOffset=7639039} and entry version V3_0_10". Starting Neo4j failed: Component 'org.neo4j.server.database.LifecycleManagingDatabase#3a6f2de3' was successfully initialized, but failed to start. Please see the attached cause exception "Unknown entry type -10 for version -10. At position LogPosition{logVersion=0, byteOffset=7639039} and entry version V3_0_10".
org.neo4j.server.ServerStartupException: Starting Neo4j failed: Component 'org.neo4j.server.database.LifecycleManagingDatabase#3a6f2de3' was successfully initialized, but failed to start. Please see the attached cause exception "Unknown entry type -10 for version -10. At position LogPosition{logVersion=0, byteOffset=7639039} and entry version V3_0_10".
at org.neo4j.server.exception.ServerStartupErrors.translateToServerStartupError(ServerStartupErrors.java:45)
at org.neo4j.server.AbstractNeoServer.start(AbstractNeoServer.java:187)
at org.neo4j.server.ServerBootstrapper.start(ServerBootstrapper.java:124)
at org.neo4j.server.ServerBootstrapper.start(ServerBootstrapper.java:91)
at org.neo4j.server.CommunityEntryPoint.main(CommunityEntryPoint.java:32)
Caused by: org.neo4j.kernel.lifecycle.LifecycleException: Component 'org.neo4j.server.database.LifecycleManagingDatabase#3a6f2de3' was successfully initialized, but failed to start. Please see the attached cause exception "Unknown entry type -10 for version -10. At position LogPosition{logVersion=0, byteOffset=7639039} and entry version V3_0_10".
at org.neo4j.kernel.lifecycle.LifeSupport$LifecycleInstance.start(LifeSupport.java:473)
at org.neo4j.kernel.lifecycle.LifeSupport.start(LifeSupport.java:111)
at org.neo4j.server.AbstractNeoServer.start(AbstractNeoServer.java:180)
... 3 more
Caused by: java.lang.RuntimeException: Error starting org.neo4j.graphdb.facade.GraphDatabaseFacadeFactory, /var/lib/neo4j/data/databases
at org.neo4j.graphdb.facade.GraphDatabaseFacadeFactory.initFacade(GraphDatabaseFacadeFactory.java:232)
at org.neo4j.graphdb.facade.GraphDatabaseFacadeFactory.newFacade(GraphDatabaseFacadeFactory.java:148)
at org.neo4j.server.database.CommunityGraphFactory.newGraphDatabase(CommunityGraphFactory.java:41)
at org.neo4j.server.database.LifecycleManagingDatabase.start(LifecycleManagingDatabase.java:90)
at org.neo4j.kernel.lifecycle.LifeSupport$LifecycleInstance.start(LifeSupport.java:452)
... 5 more
Caused by: org.neo4j.kernel.lifecycle.LifecycleException: Component 'org.neo4j.kernel.NeoStoreDataSource#19648c40' was successfully initialized, but failed to start. Please see the attached cause exception "Unknown entry type -10 for version -10. At position LogPosition{logVersion=0, byteOffset=7639039} and entry version V3_0_10".
at org.neo4j.kernel.lifecycle.LifeSupport$LifecycleInstance.start(LifeSupport.java:473)
at org.neo4j.kernel.lifecycle.LifeSupport.start(LifeSupport.java:111)
at org.neo4j.kernel.impl.transaction.state.DataSourceManager.start(DataSourceManager.java:116)
at org.neo4j.kernel.lifecycle.LifeSupport$LifecycleInstance.start(LifeSupport.java:452)
at org.neo4j.kernel.lifecycle.LifeSupport.start(LifeSupport.java:111)
at org.neo4j.graphdb.facade.GraphDatabaseFacadeFactory.initFacade(GraphDatabaseFacadeFactory.java:227)
... 9 more
Caused by: java.lang.RuntimeException: Error reading transaction logs, recovery not possible. To force the database to start anyway, you can specify 'unsupported.dbms.tx_log.fail_on_corrupted_log_files=false'. This will try to recover as much as possible and then truncate the corrupt part of the transaction log. Doing this means your database integrity might be compromised, please consider restoring from a consistent backup instead.
at org.neo4j.kernel.recovery.Recovery.throwUnableToCleanRecover(Recovery.java:160)
at org.neo4j.kernel.recovery.LogTailScanner.findLogTail(LogTailScanner.java:147)
at org.neo4j.kernel.recovery.LogTailScanner.getTailInformation(LogTailScanner.java:260)
at org.neo4j.kernel.impl.transaction.log.LogVersionUpgradeChecker.check(LogVersionUpgradeChecker.java:48)
at org.neo4j.kernel.NeoStoreDataSource.start(NeoStoreDataSource.java:349)
at org.neo4j.kernel.lifecycle.LifeSupport$LifecycleInstance.start(LifeSupport.java:452)
... 14 more
Caused by: java.io.IOException: java.lang.IllegalArgumentException: Unknown entry type -10 for version -10. At position LogPosition{logVersion=0, byteOffset=7639039} and entry version V3_0_10
at org.neo4j.kernel.impl.transaction.log.entry.VersionAwareLogEntryReader.readLogEntry(VersionAwareLogEntryReader.java:115)
at org.neo4j.kernel.impl.transaction.log.LogEntryCursor.next(LogEntryCursor.java:54)
at org.neo4j.kernel.recovery.LogTailScanner.findLogTail(LogTailScanner.java:99)
... 18 more
Caused by: java.lang.IllegalArgumentException: Unknown entry type -10 for version -10. At position LogPosition{logVersion=0, byteOffset=7639039} and entry version V3_0_10
at org.neo4j.kernel.impl.transaction.log.entry.LogEntryVersion.entryParser(LogEntryVersion.java:130)
at org.neo4j.kernel.impl.transaction.log.entry.VersionAwareLogEntryReader.readLogEntry(VersionAwareLogEntryReader.java:81)
... 20 more
2022-06-17 17:13:54.217+0000 INFO Neo4j Server shutdown initiated by request
I should also add that while the database's data appears intact in the databases folder, the neo4j-admin dump command produces an empty database file.
You can:
1. Shut down the database.
2. Make a copy of the data folder as a backup.
3. Use neo4j-admin check-consistency to check your database integrity.
4. Add this line to your /etc/neo4j/neo4j.conf: unsupported.dbms.tx_log.fail_on_corrupted_log_files=false
5. Restart the db.
If that doesn't help, shut down the db again, remove the data/transactions folder, and restart the db.
If that still doesn't help, you can also use neo4j-admin copy to make a copy of the database while skipping corrupt records (see the sketch below).
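A minimal shell sketch of that sequence, assuming a package install with data under /var/lib/neo4j (as in the stack trace) and the config at /etc/neo4j/neo4j.conf:

# steps 1-2: stop the database and back up the data folder
sudo neo4j stop                     # or: sudo systemctl stop neo4j
sudo cp -a /var/lib/neo4j/data /var/lib/neo4j/data.bak

# step 3: check database integrity (3.5.x default database name assumed)
sudo -u neo4j neo4j-admin check-consistency --database=graph.db

# steps 4-5: tolerate the corrupt tx log tail, then restart
echo 'unsupported.dbms.tx_log.fail_on_corrupted_log_files=false' | sudo tee -a /etc/neo4j/neo4j.conf
sudo neo4j start

# fallback: stop again and drop the transaction logs
# (path as given above; on 3.5 the tx log files may live under data/databases/ instead)
sudo neo4j stop
sudo rm -rf /var/lib/neo4j/data/transactions
sudo neo4j start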

CassandraDaemon.java:709 - Exception encountered during startup java.lang.IllegalArgumentException: Unknown CF

We have a 15-node Cassandra setup across 3 DCs, running Cassandra 3.0.9.
One of the nodes in one of our DCs has died with the startup error below:
CassandraDaemon.java:709 - Exception encountered during startup
java.lang.IllegalArgumentException: Unknown CF 111111-111111111-11111111
What we have tried:
Bootstrap replacing the node: https://docs.datastax.com/en/archived/cassandra/3.0/cassandra/operations/opsReplaceNode.html (the key flag is sketched after this question)
This works for a while, and then the bootstrap process hangs with the SAME error in the logs:
CassandraDaemon.java:709 - Exception encountered during startup
java.lang.IllegalArgumentException: Unknown CF
Provisioning a new blank node (https://docs.datastax.com/en/archived/cassandra/3.0/cassandra/operations/opsAddNodeToCluster.html) and trying to join it to the cluster, with the intention of removing the dead node afterwards. It refuses to start with the same error.
Has anyone ever run across this before?
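For reference, the bootstrap-replace procedure from that first guide boils down to one JVM flag on the replacement node (the IP below is a placeholder for the dead node's address):

# in cassandra-env.sh on the replacement node
JVM_OPTS="$JVM_OPTS -Dcassandra.replace_address=10.0.0.12"   # dead node's IP
# then start Cassandra and watch the streaming progress
sudo service cassandra start
nodetool netstats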

CDIIntegrationService errors found when upgrading WebLogic from 12.1.2 to 12.1.3

I upgraded the WebLogic version on a Linux server by changing the WL_HOME path in setDomainEnv.sh from 12.1.2 to 12.1.3 and restarting. On restart it gives the errors below.
I would appreciate any ideas about this.
java.lang.IllegalAccessError: tried to access method com.bea.logging.LogBufferHandler.bufferLogObject(Ljava/lang/Object;)V from class weblogic.logging.log4j.WLLog4jMemoryBufferAppender
java.lang.IllegalStateException: Unable to perform operation: post construct on weblogic.diagnostics.lifecycle.LoggingServerService
java.lang.IllegalArgumentException: While attempting to resolve the dependencies of weblogic.diagnostics.lifecycle.DiagnosticFoundationService errors were found
java.lang.IllegalStateException: Unable to perform operation: resolve on weblogic.diagnostics.lifecycle.DiagnosticFoundationService
java.lang.IllegalArgumentException: While attempting to resolve the dependencies of com.oracle.injection.integration.CDIIntegrationService errors were found
java.lang.IllegalStateException: Unable to perform operation: resolve on com.oracle.injection.integration.CDIIntegrationService
Use the wllog4j.jar that is part of the WLS 12.1.3 build, the one in the WLS_HOME/server/lib directory.
If your domain-home/lib folder holds the older wllog4j.jar that was part of 12.1.2, you will face this issue.
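A sketch of the cleanup, with placeholder paths for the domain home and WLS_HOME:

# remove the stale 12.1.2 jar from the domain's lib folder (path is an example)
rm /u01/domains/mydomain/lib/wllog4j.jar
# the 12.1.3 server ships its own copy under WLS_HOME/server/lib:
ls /u01/oracle/wls12130/wlserver/server/lib/wllog4j.jar
# then restart the server so it picks up the 12.1.3 jar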

(bdutil) Unable to get hadoop/spark cluster working with a fresh install

I'm setting up a tiny cluster on GCE to play around with, but although the instances are created, some failures prevent it from working. I'm following the steps at https://cloud.google.com/hadoop/downloads
So far I'm using the (as of now) latest versions of gcloud (143.0.0) and bdutil (1.3.5), freshly installed.
./bdutil deploy -e extensions/spark/spark_env.sh
using debian-8 as the image (as bdutil still uses debian-7-backports).
At some point I got:
Fri Feb 10 16:19:34 CET 2017: Command failed: wait ${SUBPROC} on line 326.
Fri Feb 10 16:19:34 CET 2017: Exit code of failed command: 1
full debug output is in https://gist.github.com/jlorper/4299a816fc0b140575ed70fe0da1f272
(project id and bucket names changed)
The instances are created, but Spark is not even installed. Digging a bit, I've managed to run the Spark installation and start Hadoop commands on the master after ssh-ing in. But it fails badly when starting the spark-shell:
17/02/10 15:53:20 INFO gcs.GoogleHadoopFileSystemBase: GHFS version: 1.4.5-hadoop1
17/02/10 15:53:20 INFO gcsio.FileSystemBackedDirectoryListCache: Creating '/hadoop_gcs_connector_metadata_cache' with createDirectories()...
java.lang.RuntimeException: java.lang.RuntimeException: java.nio.file.AccessDeniedException: /hadoop_gcs_connector_metadata_cache
at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:522)
and I am not able to import sparkSQL. From what I've read, everything should be started automatically.
At this point I'm a bit lost and don't know what else to do.
Am I missing a step? Are any of the commands faulty? Thanks in advance.
Update: solved
As pointed out in the accepted solution, I cloned the repo and the cluster was created without issues. When trying to start the spark-shell, though, it gave:
java.lang.RuntimeException: java.io.IOException: GoogleHadoopFileSystem has been closed or not initialized.
That sounded to me like the connectors were not initialized properly, so after running
./bdutil --env_var_files extensions/spark/spark_env.sh,bigquery_env.sh run_command_group install_connectors
it worked as expected.
The latest version of bdutil on https://cloud.google.com/hadoop/downloads is a bit stale, and I'd instead recommend using the version of bdutil at head on GitHub: https://github.com/GoogleCloudPlatform/bdutil.
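A sketch of that approach (the deploy command and env file are taken from the question; everything else is standard git usage):

git clone https://github.com/GoogleCloudPlatform/bdutil.git
cd bdutil
./bdutil deploy -e extensions/spark/spark_env.sh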

Spark JobServer NullPointerException

I'm trying to start a Spark JobServer; here are the steps I'm following:
1. Configure local.sh based on the template.
2. Run ./bin/server_deploy.sh; it finishes without any error.
3. Configure local.conf.
4. Run ./bin/server_start.sh on the deploy server.
But when I do the last step I get the following error:
Error: Exception thrown by the agent : java.lang.NullPointerException
Note: I'm using Spark 1.4.1 and version 0.5.2 of the jobserver (https://github.com/spark-jobserver/spark-jobserver/tree/v0.5.2).
Any idea how I can fix this (or at least debug it)?
Thanks
The error log does not provide much information.
I encountered the same error. In my case, I had another instance of the JobServer running (and somehow ./bin/server_stop.sh did not catch it). It worked after I manually killed the other process.
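A quick sketch of that cleanup (the pgrep pattern is a guess; match whatever your process list actually shows, and the PID is a placeholder):

# look for a lingering JobServer JVM
pgrep -af JobServer
# kill the leftover process by its PID, then start again
kill 12345
./bin/server_start.sh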
Hint: Error: Exception thrown by the agent : java.lang.NullPointerException when starting Java application
