Cassandra EC2 node is shutting down with java.nio.file.FileSystemException

This is the log I see
Caused by: java.nio.file.FileSystemException: /var/lib/cassandra/data/dev_fortis_mtd/explanationofbenefit-5fb6576031e511ec8611d5b080c74d01/snapshots/dropped-1666726203042-explanationofbenefit/mc-1-big-Summary.db -> /var/lib/cassandra/data/dev_fortis_mtd/explanationofbenefit-5fb6576031e511ec8611d5b080c74d01/mc-1-big-Summary.db: Operation not permitted
at sun.nio.fs.UnixException.translateToIOException(UnixException.java:91) ~[na:1.8.0_342]
at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102) ~[na:1.8.0_342]
at sun.nio.fs.UnixFileSystemProvider.createLink(UnixFileSystemProvider.java:476) ~[na:1.8.0_342]
at java.nio.file.Files.createLink(Files.java:1086) ~[na:1.8.0_342]
at org.apache.cassandra.io.util.FileUtils.createHardLink(FileUtils.java:100) ~[apache-cassandra-3.11.11.jar:3.11.11]
... 23 common frames omitted
ERROR [InternalResponseStage:6] 2022-10-25 19:30:03,044 DefaultFSErrorHandler.java:66 - Stopping transports as disk_failure_policy is stop
ERROR [InternalResponseStage:6] 2022-10-25 19:30:03,044 StorageService.java:518 - Stopping gossiper
WARN [InternalResponseStage:6] 2022-10-25 19:30:03,044 StorageService.java:360 - Stopping gossip by operator request
INFO [InternalResponseStage:6] 2022-10-25 19:30:03,044 Gossiper.java:1683 - Announcing shutdown
INFO [InternalResponseStage:6] 2022-10-25 19:30:03,046 StorageService.java:2480 - Node /172.X.X.X state jump to shutdown

It looks like a problem with mc-1-big-Summary.db for the dev_fortis_mtd.explanationofbenefit table. Can you check the data directory to see if mc-1... has a complete sstable set? If not, can you remove the incomplete set and then repair this table to pull the data from another node in the cluster?
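A rough sketch of that check and cleanup from the shell, assuming the default data directory, a systemd-managed install, and the keyspace/table names from the log above:
# A complete mc-* sstable set normally has Data, Index, Summary, Statistics,
# Filter, CompressionInfo, Digest and TOC components for the same generation.
ls -l /var/lib/cassandra/data/dev_fortis_mtd/explanationofbenefit-*/mc-1-big-*
# If components are missing, stop the node and move the incomplete set aside.
sudo systemctl stop cassandra
mkdir -p /tmp/incomplete-sstables
sudo mv /var/lib/cassandra/data/dev_fortis_mtd/explanationofbenefit-*/mc-1-big-* /tmp/incomplete-sstables/
# Restart, then repair the table so the data is streamed back from the replicas.
sudo systemctl start cassandra
nodetool repair dev_fortis_mtd explanationofbenefit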

Related

K8ssandra pod is replaying a large commit log and is not responding

We have a 3-node Cassandra 4 cluster. At some point (I don't know why) we get this on one of the nodes:
CommitLog.java:173 - Replaying /opt/cassandra/data/commitlog/CommitLog-7-1674673652744.log
followed by a long list of similar log lines.
We can see in the metrics that disk throughput was about 17 GB.
During this time we see the following on the other 2 nodes (the replaying node is unresponsive for almost 2 minutes):
NoSpamLogger.java:98 - /20.9.1.45:7000->prod-k8ssandra-seed-service/20.9.0.242:7000-SMALL_MESSAGES-[no-channel] failed to connect
java.nio.channels.ClosedChannelException: null
at org.apache.cassandra.net.OutboundConnectionInitiator$Handler.channelInactive(OutboundConnectionInitiator.java:248)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:262)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:248)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:241)
at io.netty.channel.DefaultChannelPipeline$HeadContext.channelInactive(DefaultChannelPipeline.java:1405)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:262)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:248)
at io.netty.channel.DefaultChannelPipeline.fireChannelInactive(DefaultChannelPipeline.java:901)
at io.netty.channel.AbstractChannel$AbstractUnsafe$8.run(AbstractChannel.java:819)
at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164)
at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:472)
at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:384)
at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at java.base/java.lang.Thread.run(Unknown Source)
Questions:
What is the reason for this commit log replay?
Can we mitigate this node outage risk?
Update:
It seems the restart of the node was something initiated by K8ssandra... this can explain the replay. What is the reason for the HTTP 500? I can't seem to see an
INFO [nioEventLoopGroup-2-2] 2023-01-25 19:07:10,694 Cli.java:617 - address=/127.0.0.6:53027 url=/api/v0/probes/liveness status=200 OK
INFO [nioEventLoopGroup-2-1] 2023-01-25 19:07:12,698 Cli.java:617 - address=http url=/api/v0/probes/readiness status=500 Internal Server Error
INFO [epollEventLoopGroup-38-1] 2023-01-25 19:07:20,700 Clock.java:47 - Using native clock for microsecond precision
WARN [epollEventLoopGroup-38-2] 2023-01-25 19:07:20,701 AbstractBootstrap.java:452 - Unknown channel option 'TCP_NODELAY' for channel '[id: 0x919a5c8b]'
WARN [epollEventLoopGroup-38-2] 2023-01-25 19:07:20,703 Loggers.java:39 - [s33] Error connecting to Node(endPoint=/tmp/cassandra.sock, hostId=null, hashCode=71aac1d0), trying next node (AnnotatedConnectException: connect(..) failed: Connection refused: /tmp/cassandra.sock)
INFO [nioEventLoopGroup-2-2] 2023-01-25 19:07:20,703 Cli.java:617 - address=/127.0.0.6:51773 url=/api/v0/probes/readiness status=500 Internal Server Error
INFO [epollEventLoopGroup-39-1] 2023-01-25 19:07:25,393 Clock.java:47 - Using native clock for microsecond precision
WARN [epollEventLoopGroup-39-2] 2023-01-25 19:07:25,394 AbstractBootstrap.java:452 - Unknown channel option 'TCP_NODELAY' for channel '[id: 0x80b52436]'
WARN [epollEventLoopGroup-39-2] 2023-01-25 19:07:25,395 Loggers.java:39 - [s34] Error connecting to Node(endPoint=/tmp/cassandra.sock, hostId=null, hashCode=cc8ec36), trying next node (AnnotatedConnectException: connect(..) failed: Connection refused: /tmp/cassandra.sock)
INFO [pool-2-thread-1] 2023-01-25 19:07:25,602 LifecycleResources.java:186 - Started Cassandra
When a Cassandra node doesn't shut down cleanly, it doesn't have a chance to persist the contents of the memtables to disk, so when it is restarted it replays the commit logs to repopulate the memtables.
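For planned restarts this replay can be kept small by draining the node first; a minimal sketch, assuming shell access to the node (on K8ssandra the operator normally handles this for you):
# Flush all memtables and stop accepting writes; a drained node has little
# or nothing left to replay from the commit log on the next start.
nodetool drain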
It seems like you're confusing cause and effect. The K8ssandra operator restarted the pod because it was unresponsive -- the restart is the effect, not the cause.
You will need to review the Cassandra logs on the pod for clues as to why it became unresponsive. From your description that there was a large commitlog replayed on restart, I would suspect that there was a lot of traffic to the cluster (a large commitlog is a result of lots of writes) and an overloaded node would explain why it became unresponsive. Again, you will need to review the logs to determine the cause.
K8ssandra monitors the pods using "liveness" and "readiness" probes (aka health checks) and the HTTP 500 error would have been a result of the node being unresponsive. This would have triggered the operator to initiate a restart of the pod to automatically recover it. Cheers!
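To review those logs on a K8ssandra pod, something along these lines works; the pod, namespace, and container names below are placeholders for whatever your cluster actually uses:
# Dump the logs from the previous (restarted) container instance.
kubectl logs -n k8ssandra prod-k8ssandra-dc1-default-sts-0 -c cassandra --previous
# Show recent restart and probe-failure events for the pod.
kubectl describe pod -n k8ssandra prod-k8ssandra-dc1-default-sts-0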

Apache Pulsar Zookeeper: Unable to access datadir, exiting abnormally

I am using these steps to run Apache Pulsar on Docker: https://github.com/streamnative/tgip/blob/master/episodes/001/demo.md
I was able to use these steps before to install and use Pulsar, but for some reason now the data directory I create becomes write-protected, and the Pulsar ZooKeeper container exits with the following logs as soon as it is created:
ERROR org.apache.zookeeper.server.ZooKeeperServerMain - Unable to access datadir, exiting abnormally
org.apache.zookeeper.server.persistence.FileTxnSnapLog$DatadirException: Unable to create data directory data/zookeeper/version-2
at org.apache.zookeeper.server.persistence.FileTxnSnapLog.<init>(FileTxnSnapLog.java:136) ~[org.apache.zookeeper-zookeeper-3.6.3.jar:3.6.3]
at org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:137) ~[org.apache.zookeeper-zookeeper-3.6.3.jar:3.6.3]
at org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:112) ~[org.apache.zookeeper-zookeeper-3.6.3.jar:3.6.3]
at org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:67) [org.apache.zookeeper-zookeeper-3.6.3.jar:3.6.3]
at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:140) [org.apache.zookeeper-zookeeper-3.6.3.jar:3.6.3]
at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:90) [org.apache.zookeeper-zookeeper-3.6.3.jar:3.6.3]
Unable to access datadir, exiting abnormally
23:36:15.223 [main] INFO org.apache.zookeeper.audit.ZKAuditProvider - ZooKeeper audit is disabled.
23:36:15.226 [main] ERROR org.apache.zookeeper.util.ServiceUtils - Exiting JVM with code 3
23:36:15.196 [PurgeTask] ERROR org.apache.zookeeper.server.DatadirCleanupManager - Error occurred while purging.
org.apache.zookeeper.server.persistence.FileTxnSnapLog$DatadirException: Unable to create data directory data/zookeeper/version-2
at org.apache.zookeeper.server.persistence.FileTxnSnapLog.<init>(FileTxnSnapLog.java:136) ~[org.apache.zookeeper-zookeeper-3.6.3.jar:3.6.3]
at org.apache.zookeeper.server.PurgeTxnLog.purge(PurgeTxnLog.java:80) ~[org.apache.zookeeper-zookeeper-3.6.3.jar:3.6.3]
at org.apache.zookeeper.server.DatadirCleanupManager$PurgeTask.run(DatadirCleanupManager.java:141) [org.apache.zookeeper-zookeeper-3.6.3.jar:3.6.3]
at java.util.TimerThread.mainLoop(Timer.java:556) [?:?]
at java.util.TimerThread.run(Timer.java:506) [?:?]
23:36:15.229 [PurgeTask] INFO org.apache.zookeeper.server.DatadirCleanupManager - Purge task completed
I have made sure that SELinux is disabled and tried changing permissions with chmod 777 data/ and every other step I could find to resolve this, but I am still unable to fix it. Please help me with a possible resolution.
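Since the error is about creating data/zookeeper/version-2, one thing worth checking is which user the container runs as and whether that user owns the bind-mounted directory; a hedged sketch, with the container name and UID purely illustrative:
# Find the UID the ZooKeeper process runs under (container name is a placeholder).
docker exec pulsar-zookeeper id
# Give that UID ownership of the host directory backing the datadir
# (replace 10000 with whatever the command above printed).
sudo chown -R 10000:0 data/zookeeper
sudo chmod -R u+rwX data/zookeeper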

Error committing 10k records to JanusGraph with Cassandra

I'm fetching around 10 million records from an Oracle DB and trying to persist them to JanusGraph with Cassandra as the storage backend [using the Spark framework].
When I iterate over the records in a loop and commit every 10k, I get the error below:
ERROR StandardJanusGraph: Could not commit transaction [1] due to storage exception in commit
org.janusgraph.core.JanusGraphException: Could not execute operation due to backend exception
When I fetch only the first 1L (100,000) records from Oracle and commit every 1K, it works fine.
Can someone help me to resolve this error? Appreciate your help. Thank you!!
Update:
WARN [ReadStage-3] 2019-09-29 08:39:28,327 AbstractLocalAwareExecutorService.java:167 - Uncaught exception on thread Thread[ReadStage-3,5,main]: {}
WARN [MemtableFlushWriter:17] 2019-09-29 09:09:40,843 NativeLibrary.java:304 - open(/var/lib/cassandra/data/circuit_equipment/system_properties-eeef4cb0e29711e9af61a34111381c19, O_RDONLY) failed, errno (2).
ERROR [MemtableFlushWriter:17] 2019-09-29 09:09:40,846 LogTransaction.java:272 - Transaction log [md_txn_flush_de900e80-e298-11e9-af61-a34111381c19.log in /var/lib/cassandra/data/circuit_equipment/system_properties-eeef4cb0e29711e9af61a34111381c19] indicates txn was not completed, trying to abort it now
ERROR [MemtableFlushWriter:17] 2019-09-29 09:09:40,847 LogTransaction.java:275 - Failed to abort transaction log [md_txn_flush_de900e80-e298-11e9-af61-a34111381c19.log in /var/lib/cassandra/data/circuit_equipment/system_properties-eeef4cb0e29711e9af61a34111381c19]
ERROR [MemtableFlushWriter:17] 2019-09-29 09:09:40,848 LogTransaction.java:222 - Unable to delete /var/lib/cassandra/data/circuit_equipment/system_properties-eeef4cb0e29711e9af61a34111381c19/md_txn_flush_de900e80-e298-11e9-af61-a34111381c19.log as it does not exist, see debug log file for stack trace
ERROR [MemtablePostFlush:9] 2019-09-29 09:09:40,849 CassandraDaemon.java:228 - Exception in thread Thread[MemtablePostFlush:9,5,main]
WARN [StorageServiceShutdownHook] 2019-09-29 09:09:40,849 StorageService.java:4591 - Caught exception while waiting for memtable flushes during shutdown hook
ERROR [StorageServiceShutdownHook] 2019-09-29 09:09:40,931 AbstractCommitLogSegmentManager.java:308 - Failed to force-recycle all segments; at least one segment is still in use with dirty CFs.
WARN [main] 2019-09-29 09:09:44,580 NativeLibrary.java:187 - Unable to lock JVM memory (ENOMEM). This can result in part of the JVM being swapped out, especially with mmapped I/O enabled. Increase RLIMIT_MEMLOCK or run Cassandra as root.
WARN [main] 2019-09-29 09:09:44,581 StartupChecks.java:169 - JMX is not enabled to receive remote connections. Please see cassandra-env.sh for more info.
WARN [main] 2019-09-29 09:09:44,591 SigarLibrary.java:174 - Cassandra server running in degraded mode. Is swap disabled? : false, Address space adequate? : true, nofile limit adequate? : true, nproc limit adequate? : true
WARN [main] 2019-09-29 09:09:44,593 StartupChecks.java:311 - Maximum number of memory map areas per process (vm.max_map_count) 65530 is too low, recommended value: 1048575, you can change it with sysctl.
WARN [Native-Transport-Requests-1] 2019-09-29 09:12:12,841 CompressionParams.java:383 - The sstable_compression option has been deprecated. You should use class instead
WARN [Native-Transport-Requests-1] 2019-09-29 09:12:12,842 CompressionParams.java:334 - The chunk_length_kb option has been deprecated. You should use chunk_length_in_kb instead
WARN [main] 2019-09-29 12:59:57,584 NativeLibrary.java:187 - Unable to lock JVM memory (ENOMEM). This can result in part of the JVM being swapped out, especially with mmapped I/O enabled. Increase RLIMIT_MEMLOCK or run Cassandra as root.
WARN [main] 2019-09-29 12:59:57,585 StartupChecks.java:169 - JMX is not enabled to receive remote connections. Please see cassandra-env.sh for more info.
WARN [main] 2019-09-29 12:59:57,599 SigarLibrary.java:174 - Cassandra server running in degraded mode. Is swap disabled? : false, Address space adequate? : true, nofile limit adequate? : true, nproc limit adequate? : true
WARN [main] 2019-09-29 12:59:57,602 StartupChecks.java:311 - Maximum number of memory map areas per process (vm.max_map_count) 65530 is too low, recommended value: 1048575, you can change it with sysctl.
From these messages, you should disable swap (this is actually one of the main recommendations for running Cassandra):
WARN [main] 2019-09-29 09:09:44,580 NativeLibrary.java:187 - Unable to lock JVM memory (ENOMEM). This can result in part of the JVM being swapped out, especially with mmapped I/O enabled. Increase RLIMIT_MEMLOCK or run Cassandra as root.
WARN [main] 2019-09-29 09:09:44,591 SigarLibrary.java:174 - Cassandra server running in degraded mode. Is swap disabled? : false, Address space adequate? : true, nofile limit adequate? : true, nproc limit adequate? : true
You should also change max_map_count. You can use this guide to set the other values for production environments. From this message:
WARN [main] 2019-09-29 12:59:57,602 StartupChecks.java:311 - Maximum number of memory map areas per process (vm.max_map_count) 65530 is too low, recommended value: 1048575, you can change it with sysctl.
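A minimal sketch of both changes on a typical Linux host (file locations may differ on your distribution):
# Disable swap now and keep it disabled across reboots.
sudo swapoff -a
sudo sed -i.bak '/ swap / s/^/#/' /etc/fstab
# Raise the memory-map limit now and persist it.
sudo sysctl -w vm.max_map_count=1048575
echo 'vm.max_map_count = 1048575' | sudo tee /etc/sysctl.d/99-cassandra.conf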

Streaming error occurred org.apache.cassandra.io.FSReadError: java.io.IOException: Broken pipe

I am trying to add a node to the cluster. Adding the new node fails with a broken pipe, and Cassandra dies within 2 minutes of starting. I removed the node from the ring, and adding it back fails as well.
OS info: 4.4.0-59-generic #80-Ubuntu SMP x86_64 x86_64 x86_64 GNU/Linux.
This is the error I get on the node that I am trying to bootstrap.
Cassandra version: 2.2.7. Getting a broken pipe exception.
ERROR [STREAM-OUT-/123.120.56.71] 2017-04-10 23:46:15,410 StreamSession.java:532 - Stream #cbb7a150-1e47-11e7-a556-a98ec456f4de Streaming error occurred
org.apache.cassandra.io.FSReadError: java.io.IOException: Broken pipe
at org.apache.cassandra.io.util.ChannelProxy.transferTo(ChannelProxy.java:144) ~[apache-cassandra-2.2.7.jar:2.2.7]
at org.apache.cassandra.streaming.compress.CompressedStreamWriter$1.apply(CompressedStreamWriter.java:91) ~[apache-cassandra-2.2.7.jar:2.2.7]
at org.apache.cassandra.streaming.compress.CompressedStreamWriter$1.apply(CompressedStreamWriter.java:88) ~[apache-cassandra-2.2.7.jar:2.2.7]
at org.apache.cassandra.io.util.BufferedDataOutputStreamPlus.applyToChannel(BufferedDataOutputStreamPlus.java:297) ~[apache-cassandra-2.2.7.jar:2.2.7]
at org.apache.cassandra.streaming.compress.CompressedStreamWriter.write(CompressedStreamWriter.java:87) ~[apache-cassandra-2.2.7.jar:2.2.7]
at org.apache.cassandra.streaming.messages.OutgoingFileMessage.serialize(OutgoingFileMessage.java:90) ~[apache-cassandra-2.2.7.jar:2.2.7]
at org.apache.cassandra.streaming.messages.OutgoingFileMessage$1.serialize(OutgoingFileMessage.java:48) ~[apache-cassandra-2.2.7.jar:2.2.7]
at org.apache.cassandra.streaming.messages.OutgoingFileMessage$1.serialize(OutgoingFileMessage.java:40) ~[apache-cassandra-2.2.7.jar:2.2.7]
at org.apache.cassandra.streaming.messages.StreamMessage.serialize(StreamMessage.java:47) ~[apache-cassandra-2.2.7.jar:2.2.7]
at org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.sendMessage(ConnectionHandler.java:389) ~[apache-cassandra-2.2.7.jar:2.2.7]
at org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.run(ConnectionHandler.java:361) ~[apache-cassandra-2.2.7.jar:2.2.7]
at java.lang.Thread.run(Thread.java:745) [na:1.8.0_101]
Caused by: java.io.IOException: Broken pipe
at sun.nio.ch.FileChannelImpl.transferTo0(Native Method) ~[na:1.8.0_101]
at sun.nio.ch.FileChannelImpl.transferToDirectlyInternal(FileChannelImpl.java:428) ~[na:1.8.0_101]
at sun.nio.ch.FileChannelImpl.transferToDirectly(FileChannelImpl.java:493) ~[na:1.8.0_101]
at sun.nio.ch.FileChannelImpl.transferTo(FileChannelImpl.java:608) ~[na:1.8.0_101]
at org.apache.cassandra.io.util.ChannelProxy.transferTo(ChannelProxy.java:140) ~[apache-cassandra-2.2.7.jar:2.2.7]
... 11 common frames omitted
INFO [STREAM-OUT-/123.120.56.71] 2017-04-10 23:46:15,424 StreamResultFuture.java:183 - Stream #cbb7a150-1e47-11e7-a556-a98ec456f4de Session with /123.120.56.71 is complete
WARN [STREAM-OUT-/123.120.56.71] 2017-04-10 23:46:15,425 StreamResultFuture.java:210 - Stream #cbb7a150-1e47-11e7-a556-a98ec456f4de Stream failed
This can be due to corrupted data, a wrong SSL configuration, schema disagreement, or network failures.
It looks like you have corrupted data or schema disagreement, so try the following:
1) Remove all the data from your data and commitlog directories, and then try to start.
2) If that doesn't help, try to start with auto_bootstrap: false in cassandra.yaml. After the node starts and is up, run nodetool rebuild.
If it fails, please attach all the errors here.
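A rough command sequence for option 2, assuming a package install with the default paths (/etc/cassandra, /var/lib/cassandra):
# Wipe the partial bootstrap state on the joining node.
sudo service cassandra stop
sudo rm -rf /var/lib/cassandra/data/* /var/lib/cassandra/commitlog/* /var/lib/cassandra/saved_caches/*
# In /etc/cassandra/cassandra.yaml set:
#   auto_bootstrap: false
sudo service cassandra start
# Once the node shows Up/Normal in `nodetool status`, stream its data in.
nodetool rebuild
# In a multi-DC cluster, pass the source datacenter name: nodetool rebuild <dc-name>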

Rexster refuses to start with extension but does not display errors

I have a small Rexster/Titan cluster using Cassandra. A Rexster extension is used to query the graph. I did some benchmarking and started and stopped Rexster/Titan many times. But now I run into a strange issue: Rexster refuses to start but does not display any error message.
I tried to figure out what is causing this and reduced the cluster to a single node 192.168.0.4.
If I remove my extension Rexster manages to start up.
# console output
Forking Cassandra...
Running `nodetool statusthrift`..... OK
(returned exit status 0 and printed string "running").
Forking Titan + Rexster...
Connecting to Titan + Rexster (127.0.0.1:8184)...... OK
(connected to 127.0.0.1:8184).
Run rexster-console.sh to connect.
But when I place my extension's uber-JAR in the ext folder, Rexster refuses to start.
# console output
Forking Cassandra...
Running `nodetool statusthrift`..... OK
(returned exit status 0 and printed string "running").
Forking Titan + Rexster...
Connecting to Titan + Rexster (127.0.0.1:8184)............................
timeout exceeded (60 seconds): could not connect to 127.0.0.1:8184
See /var/lib/titan/bin/../log/rexstitan.log for Rexster log output.
If I now check rexstitan.log, as suggested by the console output, I cannot find any error message.
# rexstitan.log
0 [main] INFO com.tinkerpop.rexster.Application - .:Welcome to Rexster:.
73 [main] INFO com.tinkerpop.rexster.server.RexsterProperties -
Using [/var/lib/titan/rexhome/../conf/rexster-cassandra-cluster.xml]
as configuration source.
78 [main] INFO com.tinkerpop.rexster.Application - Rexster is watching
[/var/lib/titan/rexhome/../conf/rexster-cassandra-cluster.xml] for change.
244 [main] INFO com.netflix.astyanax.connectionpool.impl.ConnectionPoolMBeanManager -
Registering mbean: com.netflix.MonitoredResources:type=ASTYANAX,
name=ClusterTitanConnectionPool,ServiceType=connectionpool
252 [main] INFO com.netflix.astyanax.connectionpool.impl.CountingConnectionPoolMonitor -
AddHost: 192.168.0.4
537 [main] INFO com.netflix.astyanax.connectionpool.impl.ConnectionPoolMBeanManager -
Registering mbean: com.netflix.MonitoredResources:type=ASTYANAX,
name=KeyspaceTitanConnectionPool,ServiceType=connectionpool
538 [main] INFO com.netflix.astyanax.connectionpool.impl.CountingConnectionPoolMonitor -
AddHost: 192.168.0.4
1951 [main] INFO com.thinkaurelius.titan.graphdb.configuration.GraphDatabaseConfiguration -
Set cluster.partition=false from store features
1971 [main] INFO com.thinkaurelius.titan.graphdb.configuration.GraphDatabaseConfiguration -
Set default timestamp provider MICRO
2019 [main] INFO com.thinkaurelius.titan.graphdb.configuration.GraphDatabaseConfiguration -
Generated unique-instance-id=7f0000012902-node1
2045 [main] INFO com.netflix.astyanax.connectionpool.impl.ConnectionPoolMBeanManager -
Registering mbean: com.netflix.MonitoredResources:type=ASTYANAX,
name=ClusterTitanConnectionPool,ServiceType=connectionpool
2046 [main] INFO com.netflix.astyanax.connectionpool.impl.CountingConnectionPoolMonitor -
AddHost: 192.168.0.4
2053 [main] INFO com.netflix.astyanax.connectionpool.impl.ConnectionPoolMBeanManager -
Registering mbean: com.netflix.MonitoredResources:type=ASTYANAX,
name=KeyspaceTitanConnectionPool,ServiceType=connectionpool
2054 [main] INFO com.netflix.astyanax.connectionpool.impl.CountingConnectionPoolMonitor -
AddHost: 192.168.0.4
2228 [main] INFO com.thinkaurelius.titan.diskstorage.Backend -
Initiated backend operations thread pool of size 4
6619 [main] INFO com.thinkaurelius.titan.diskstorage.log.kcvs.KCVSLog -
Loaded unidentified ReadMarker start time Timepoint[1423479705116000 μs]
into com.thinkaurelius.titan.diskstorage.log.kcvs.KCVSLog$MessagePuller#212f3ff1
6625 [main] INFO com.tinkerpop.rexster.RexsterApplicationGraph -
Graph [graph] - configured with allowable namespace [*:*]
The only entry that looks strange to me is the one concerning the log:
6619 [main] INFO com.thinkaurelius.titan.diskstorage.log.kcvs.KCVSLog -
Loaded unidentified ReadMarker start time Timepoint[1423479705116000 μs]
into com.thinkaurelius.titan.diskstorage.log.kcvs.KCVSLog$MessagePuller#212f3ff1
My extension uses the logger for debugging. You can see the instantiation and usage on GitHub: https://github.com/sebschlicht/titan-graphity-kribble/blob/master/src/main/java/de/uniko/sebschlicht/titan/extensions/GraphityExtension.java#L22
Though Rexster failed to start, there is a process with the PID displayed in the console, but curl fails to connect to Rexster:
$ curl 192.168.0.4:8182
curl: (7) Failed to connect to 192.168.0.4 port 8182: Connection refused
Why doesn't Rexster throw an exception? How can I debug this situation?
Edit:
I removed all log messages from my code and removed all exceptions that might be thrown during startup. Rexster still refuses to start with my extension, and the only hint in the log files is the unidentified read marker. I have no clue what prevents Rexster from starting.
The log message is nothing to worry about.
After rebuilding the application step by step in another project, Rexster is now able to start with the extension. During this rebuild I noticed two situations that can cause the behaviour described:
Missing dependency
If your project depends on a second project you might use Maven to inject it as a dependency. However, if you use
mvn clean package
to build the extension's JAR file, it does not contain this dependency by default. You need to use a Maven plugin (e.g. maven-shade-plugin) to create a shaded JAR that contains all the dependencies your extension needs. Set the dependency scope to provided for all Titan/Rexster/Blueprints related dependencies. Use the shaded uber-JAR to deploy the extension to Rexster.
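One way to sanity-check the shaded artifact before deploying it, with the JAR name below purely illustrative:
# Build the shaded uber-JAR.
mvn clean package
# The classes of the second project must be inside the JAR...
jar tf target/my-rexster-extension.jar | grep -i graphity
# ...while anything declared with <scope>provided</scope> (Titan/Rexster/Blueprints)
# should not be bundled, so this should print nothing.
jar tf target/my-rexster-extension.jar | grep 'com/tinkerpop/rexster'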
However, this was not new to me and should not have caused the problem in my case. There might be more situations that cause this problem, or maybe there was a problem with Maven that messed up the shaded JAR. Feel free to browse the commit on GitHub to catch this voodoo.
Missing extension
Another cause of this behaviour is a missing extension.
If you specify an extension in the com.tinkerpop.rexster.extension.RexsterExtension resource file that is not present on startup, Rexster neither logs nor throws an exception, but refuses to start.
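A quick way to check what the JAR actually advertises, again with the JAR name as a placeholder:
# The services file (typically under META-INF/services) must list only
# extension classes that really exist in the JAR.
unzip -p ext/my-rexster-extension.jar META-INF/services/com.tinkerpop.rexster.extension.RexsterExtension
# Confirm each listed class is present.
unzip -l ext/my-rexster-extension.jar | grep GraphityExtension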
