JanusGraph error: "Could not find type for id" during a concurrent load operation - Cassandra

While performing a concurrent bulk load operation, I received this error. Subsequently, all my queries failed, and I kept getting the same error.
The exception I got is as follows:
java.lang.NullPointerException: Could not find type for id: 52237
at com.google.common.base.Preconditions.checkNotNull(Preconditions.java:250)
at org.janusgraph.graphdb.types.vertices.JanusGraphSchemaVertex.name(JanusGraphSchemaVertex.java:57)
at org.janusgraph.graphdb.vertices.AbstractVertex.label(AbstractVertex.java:121)
at org.apache.tinkerpop.gremlin.structure.util.reference.ReferenceElement.<init>(ReferenceElement.java:57)
at org.apache.tinkerpop.gremlin.structure.util.reference.ReferenceVertex.<init>(ReferenceVertex.java:46)
at org.apache.tinkerpop.gremlin.structure.util.reference.ReferenceFactory.detach(ReferenceFactory.java:48)
at org.apache.tinkerpop.gremlin.structure.util.reference.ReferenceFactory.detach(ReferenceFactory.java:69)
at org.apache.tinkerpop.gremlin.structure.util.reference.ReferenceFactory.detach(ReferenceFactory.java:80)
at org.apache.tinkerpop.gremlin.process.traversal.strategy.decoration.HaltedTraverserStrategy.halt(HaltedTraverserStrategy.java:60)
at org.apache.tinkerpop.gremlin.server.util.TraverserIterator.next(TraverserIterator.java:64)
at org.apache.tinkerpop.gremlin.server.op.traversal.TraversalOpProcessor.handleIterator(TraversalOpProcessor.java:529)
at org.apache.tinkerpop.gremlin.server.op.traversal.TraversalOpProcessor.lambda$iterateBytecodeTraversal$4(TraversalOpProcessor.java:382)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Some additional context:
storage.batch-loading was NOT enabled
The bulk write operation I was running was highly concurrent and under heavy load
I used about 100 instances of Gremlin Server connecting to a Cassandra/ES backend
I did not explicitly define a schema
It would be great if someone could give me an idea of what could have caused this.
Thanks!

This happens if multiple instances of gremlin-server are running. It can occur because a Gremlin Server was not shut down or killed properly, or because the VM on which gremlin-server runs was restarted.
The solution is to log in to the Gremlin Console and run your commands based on your backend. In my case it's Cassandra and Elasticsearch, so I will run:
Method 1
:remote connect tinkerpop.server conf/remote.yaml session
:remote console session
or
graph=JanusGraphFactory.open('conf/janusgraph-cql-es.properties');
g=graph.traversal()
If you are running containers, your command will be similar to this:
graph=JanusGraphFactory.open('/etc/opt/janusgraph/janusgraph.properties');
g=graph.traversal()
After running those, you can run:
mgmt = graph.openManagement()
mgmt.getOpenInstances()
It will display all the instances, e.g.:
ac12000231-a9ffbcbb0e921
ac12000230-a9ffbcbb0e921(current)
Close every instance except the current one:
mgmt.forceCloseInstance('ac12000231-a9ffbcbb0e921')
After closing all the other instances, commit the changes:
mgmt.commit()
Now restart your Gremlin Server and run your query; it should work.
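If there are several stale instances, you can close everything except the current one in a single pass. A minimal Gremlin Console sketch, relying on getOpenInstances() marking the current instance with a "(current)" suffix as in the example output above:
mgmt = graph.openManagement()
// close every registered instance except the one this console is using
mgmt.getOpenInstances().each { instance ->
    if (!instance.contains('(current)')) {
        mgmt.forceCloseInstance(instance)
    }
}
mgmt.commit()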
Method 2
If the problem persists, kill your gremlin-server and start it again a few times; the load command should then work.
Another reason this happens is if the data was not restored properly. If you are using a cluster, take the backup on all the nodes, then restore it on your destination node or nodes. I used nodetool for backup and sstableloader for restoring data.
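As a rough sketch of that backup/restore flow (the keyspace and table names are assumptions: for JanusGraph on Cassandra the keyspace is typically janusgraph and the main table is edgestore; adjust to your setup):
# on every source node: snapshot the keyspace
nodetool snapshot -t mybackup janusgraph
# copy the snapshot SSTables to the destination, laid out as <keyspace>/<table>,
# then stream them into the new cluster
sstableloader -d <destination-node> /path/to/janusgraph/edgestore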

Related

Databricks error: Internal error, sorry. Attach your notebook to a different cluster or restart the current cluster

I'm working in Databricks 8.1. If I run my PySpark code line by line using Shift + Enter, it runs fine, but if I run the entire notebook using the Run All button, I get an internal error message. The complete error message is:
Internal error, sorry. Attach your notebook to a different cluster or restart the current cluster.
com.databricks.rpc.RPCResponseTooLarge: rpc response (of 20984709 bytes) exceeds limit of 20971520 bytes
at com.databricks.rpc.Jetty9Client$$anon$1.onContent(Jetty9Client.scala:370)
at shaded.v9_4.org.eclipse.jetty.client.api.Response$Listener$Adapter.onContent(Response.java:248)
at shaded.v9_4.org.eclipse.jetty.client.ResponseNotifier.notifyContent(ResponseNotifier.java:135)
at shaded.v9_4.org.eclipse.jetty.client.ResponseNotifier.notifyContent(ResponseNotifier.java:126)
at shaded.v9_4.org.eclipse.jetty.client.HttpReceiver.responseContent(HttpReceiver.java:340)
at shaded.v9_4.org.eclipse.jetty.client.http.HttpReceiverOverHTTP.content(HttpReceiverOverHTTP.java:283)
at shaded.v9_4.org.eclipse.jetty.http.HttpParser.parseContent(HttpParser.java:1762)
at shaded.v9_4.org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:1490)
at shaded.v9_4.org.eclipse.jetty.client.http.HttpReceiverOverHTTP.parse(HttpReceiverOverHTTP.java:172)
at shaded.v9_4.org.eclipse.jetty.client.http.HttpReceiverOverHTTP.process(HttpReceiverOverHTTP.java:135)
at shaded.v9_4.org.eclipse.jetty.client.http.HttpReceiverOverHTTP.receive(HttpReceiverOverHTTP.java:73)
at shaded.v9_4.org.eclipse.jetty.client.http.HttpChannelOverHTTP.receive(HttpChannelOverHTTP.java:133)
at shaded.v9_4.org.eclipse.jetty.client.http.HttpConnectionOverHTTP.onFillable(HttpConnectionOverHTTP.java:151)
at shaded.v9_4.org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:311)
at shaded.v9_4.org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:103)
at shaded.v9_4.org.eclipse.jetty.io.ssl.SslConnection$DecryptedEndPoint.onFillable(SslConnection.java:426)
at shaded.v9_4.org.eclipse.jetty.io.ssl.SslConnection.onFillable(SslConnection.java:320)
at shaded.v9_4.org.eclipse.jetty.io.ssl.SslConnection$2.succeeded(SslConnection.java:158)
at shaded.v9_4.org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:103)
at shaded.v9_4.org.eclipse.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:117)
at shaded.v9_4.org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:336)
at shaded.v9_4.org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:313)
at shaded.v9_4.org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:171)
at shaded.v9_4.org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:129)
at shaded.v9_4.org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:367)
at shaded.v9_4.org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:782)
at shaded.v9_4.org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:914)
at java.base/java.lang.Thread.run(Thread.java:834)
I restarted the cluster several times, but I'm still getting the same issue.
Can you suggest steps to resolve this issue?

prestodb | worker not found & v1/collector/general is not returning any values

Hi, I have configured prestodb with one coordinator and one worker.
When I run the worker, I get messages like:
Discovery server connect succeeded for refresh (presto/general)
Discovery server connect succeeded for refresh (collector/general)
io.airlift.discovery.client.Announcer Discovery server connect succeeded for announce
However, when I run a query it says the worker is not available.
Also, I tried to see whether the below URLs work:
http://<master>/v1/service/presto/general - works (I can see both nodes)
However, when I use
http://<master>/v1/service/collector/general - it doesn't work; below is the result:
{"environment":"dev","services":[]}
Server config.properties
coordinator=true
node-scheduler.include-coordinator=false
http-server.http.port=8000
query.max-memory=50GB
query.max-memory-per-node=3GB
discovery-server.enabled=true
discovery.uri=http://gdcrtdev01.[domain]:8000
Worker config.properties
coordinator=false
http-server.http.port=8000
query.max-memory=50GB
query.max-memory-per-node=1GB
discovery.uri=http://gdcrtdev01.[domain]:8000

Tinkerpop Gremlin server MissingPropertyException for SparkGraphComputer in remote mode

I am new to TinkerPop, Gremlin, and Groovy.
I have configured a TinkerPop Gremlin Server and Console [v3.2.3] with verified integration with HDFS and Spark.
When I execute the code below using the Gremlin Console in local mode, everything works fine: a Spark job is submitted and successfully processed.
:load data/grateful-dead-janusgraph-schema.groovy
graph = JanusGraphFactory.open('conf/connection.properties')
defineGratefulDeadSchema(graph)
graph.close()
hdfs.copyFromLocal('data/grateful-dead.kryo','data/grateful-dead.kryo')
graph = GraphFactory.open('conf/hadoop-graph/hadoop-load.properties')
blvp = BulkLoaderVertexProgram.build().writeGraph('conf/connection.properties').create(graph)
graph.compute(SparkGraphComputer).program(blvp).submit().get()
Next, I connect the Gremlin Console to the Gremlin Server as a remote using the command below.
:remote connect tinkerpop.server conf/remote.yaml
After this I execute the above code, prefixing each statement with ":> ". As soon as I submit the last line, which hands processing to SparkGraphComputer, I get the below exception at the server:
[WARN] AbstractEvalOpProcessor - Exception processing a script on request [RequestMessage{, requestId=097785d6-7114-44fb-acbc-1b116dfdaac2, op='eval', processor='', args={gremlin=graph.compute(SparkGraphComputer).program(blvp).submit().get(), bindings={}, batchSize=64}}].
groovy.lang.MissingPropertyException: No such property: SparkGraphComputer for class: Script4
at org.codehaus.groovy.runtime.ScriptBytecodeAdapter.unwrap(ScriptBytecodeAdapter.java:53)
at org.codehaus.groovy.runtime.callsite.PogoGetPropertySite.getProperty(PogoGetPropertySite.java:52)
at org.codehaus.groovy.runtime.callsite.AbstractCallSite.callGroovyObjectGetProperty(AbstractCallSite.java:307)
at Script4.run(Script4.groovy:1)
at org.apache.tinkerpop.gremlin.groovy.jsr223.GremlinGroovyScriptEngine.eval(GremlinGroovyScriptEngine.java:619)
at org.apache.tinkerpop.gremlin.groovy.jsr223.GremlinGroovyScriptEngine.eval(GremlinGroovyScriptEngine.java:448)
at javax.script.AbstractScriptEngine.eval(AbstractScriptEngine.java:233)
at org.apache.tinkerpop.gremlin.groovy.engine.ScriptEngines.eval(ScriptEngines.java:119)
at org.apache.tinkerpop.gremlin.groovy.engine.GremlinExecutor.lambda$eval$2(GremlinExecutor.java:287)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
I am unable to understand what MissingPropertyException means in Groovy; is it similar to NoClassDefFoundError in Java?
I believe some configuration is missing at the server end. Can someone help me out?
Well, there are two ways to go about this. You can simply import SparkGraphComputer in the script that you're sending, or you can add it to the scriptEngines configuration for your Gremlin Server. Something like:
scriptEngines: {
  gremlin-groovy: {
    imports: [your.full.path.to.TheClass],
    staticImports: [your.full.path.to.TheClass.StaticVar]
  }
}
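For the first option, keep in mind that each ":>" submission is evaluated as an independent script, so the import has to travel with the statement that uses it. A sketch, assuming the stock TinkerPop Spark plugin where the class lives at org.apache.tinkerpop.gremlin.spark.process.computer.SparkGraphComputer (and assuming blvp is bound as in your earlier submissions):
:> import org.apache.tinkerpop.gremlin.spark.process.computer.SparkGraphComputer; graph.compute(SparkGraphComputer).program(blvp).submit().get()
For the second option, the imports entry in the YAML above would name that same fully-qualified class.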

Accumulo's createtable command gets stuck and does not create a table

I was trying to create a table in Accumulo using the createtable command and found that it was getting stuck. I waited for around 20 minutes before cancelling the createtable command.
createtable test_table
I have one master and two tablet servers, and found that my master and one of the tablet servers had died. I could not telnet to port 9997 of that particular tablet server, and I could not even telnet to port 29999 (master.port.client in accumulo-site.xml). In the tserver logs of the dead server, I saw the following entries:
2016-05-10 02:12:07,456 [zookeeper.DistributedWorkQueue] INFO : Got unexpected zookeeper event: None for /accumulo/be4f66be-1508-4314-9bff-888b56d9b0ce/recovery
2016-05-10 02:12:23,883 [zookeeper.ZooCache] WARN : Saw (possibly) transient exception communicating with ZooKeeper, will retry
org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /accumulo/be4f66be-1508-4314-9bff-888b56d9b0ce/tables
at org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1472)
at org.apache.accumulo.fate.zookeeper.ZooCache$1.run(ZooCache.java:210)
at org.apache.accumulo.fate.zookeeper.ZooCache.retry(ZooCache.java:162)
at org.apache.accumulo.fate.zookeeper.ZooCache.getChildren(ZooCache.java:221)
at org.apache.accumulo.core.client.impl.Tables.exists(Tables.java:142)
at org.apache.accumulo.server.tabletserver.LargestFirstMemoryManager.tableExists(LargestFirstMemoryManager.java:149)
at org.apache.accumulo.server.tabletserver.LargestFirstMemoryManager.getMemoryManagementActions(LargestFirstMemoryManager.java:175)
at org.apache.accumulo.tserver.TabletServerResourceManager$MemoryManagementFramework.manageMemory(TabletServerResourceManager.java:408)
at org.apache.accumulo.tserver.TabletServerResourceManager$MemoryManagementFramework.access$400(TabletServerResourceManager.java:318)
at org.apache.accumulo.tserver.TabletServerResourceManager$MemoryManagementFramework$2.run(TabletServerResourceManager.java:346)
at org.apache.accumulo.fate.util.LoggingRunnable.run(LoggingRunnable.java:35)
at java.lang.Thread.run(Thread.java:745)
2016-05-10 02:12:23,884 [zookeeper.ZooCache] WARN : Saw (possibly) transient exception communicating with ZooKeeper, will retry
org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /accumulo/be4f66be-1508-4314-9bff-888b56d9b0ce/tables/!0/conf/table.classpath.context
at org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1045)
at org.apache.accumulo.fate.zookeeper.ZooCache$2.run(ZooCache.java:264)
at org.apache.accumulo.fate.zookeeper.ZooCache.retry(ZooCache.java:162)
at org.apache.accumulo.fate.zookeeper.ZooCache.get(ZooCache.java:289)
at org.apache.accumulo.fate.zookeeper.ZooCache.get(ZooCache.java:238)
at org.apache.accumulo.server.conf.ZooCachePropertyAccessor.get(ZooCachePropertyAccessor.java:117)
at org.apache.accumulo.server.conf.ZooCachePropertyAccessor.get(ZooCachePropertyAccessor.java:103)
at org.apache.accumulo.server.conf.TableConfiguration.get(TableConfiguration.java:99)
at org.apache.accumulo.tserver.constraints.ConstraintChecker.classLoaderChanged(ConstraintChecker.java:93)
at org.apache.accumulo.tserver.tablet.Tablet.checkConstraints(Tablet.java:1225)
at org.apache.accumulo.tserver.TabletServer$8.run(TabletServer.java:2848)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
2016-05-10 02:12:23,887 [zookeeper.ZooReader] WARN : Saw (possibly) transient exception communicating with ZooKeeper
org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /accumulo/be4f66be-1508-4314-9bff-888b56d9b0ce/tservers/accumulo.tablet.2:9997
at org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1045)
at org.apache.accumulo.fate.zookeeper.ZooReader.getStatus(ZooReader.java:132)
at org.apache.accumulo.fate.zookeeper.ZooLock.process(ZooLock.java:383)
at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:522)
at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
2016-05-10 02:12:24,252 [watcher.MonitorLog4jWatcher] INFO : Changing monitor log4j address to accumulo.master:4560
2016-05-10 02:12:24,252 [watcher.MonitorLog4jWatcher] INFO : Enabled log-forwarding
Even the master server's logs had the same stack trace. My ZooKeeper is running.
At first I thought it was a disk issue, maybe there was no space, but that was not the case: I ran fsck on the Accumulo instance.volumes and it returned the HEALTHY status.
Does anyone know what exactly happened and, if possible, how to avoid it?
EDIT: Even the tracer_accumulo.master.log had the same stack trace.
ZooKeeper session expirations occur when a thread inside the ZooKeeper client does not get run within the necessary time (by default, 30s) to maintain the session, which is in-memory state shared between the ZooKeeper client and server. There is no single explanation for this, but there are many common culprits:
JVM garbage collection pauses in the client. Accumulo should log a warning if it experienced a pause.
Lack of CPU time. If the host itself is overburdened, Accumulo might not have the cycles to run all of the tasks it needs to in a timely manner.
Lack of sockets/file handles: Accumulo could be trying to connect to ZooKeeper but be unable to open new connections.
ZooKeeper might be rate-limiting connections as a denial-of-service prevention. Check the ZooKeeper logs for errors about dropping/denying new connections from a specific IP, and, if you see these errors, consider increasing maxClientCnxns in zoo.cfg (see the sketch below).
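If rate-limiting or session timeouts look like the culprit, the knobs to try are maxClientCnxns on the ZooKeeper side and the client session timeout on the Accumulo side (instance.zookeeper.timeout in accumulo-site.xml, 30s by default). A sketch; the values are examples to tune, not recommendations:
# zoo.cfg: raise the per-IP connection cap (ZooKeeper's default is 60)
maxClientCnxns=120
<!-- accumulo-site.xml: give slow clients more headroom before the session expires -->
<property>
  <name>instance.zookeeper.timeout</name>
  <value>60s</value>
</property>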

Cassandra Streaming error - Unknown keyspace system_traces

In our dev cluster, which had been running smoothly before, the following failure occurs whenever we replace a node (which we have been doing constantly) and prevents the replacement node from joining.
The Cassandra version is 2.0.7.
What can be done about it?
ERROR [STREAM-IN-/10.128.---.---] 2014-11-19 12:35:58,007 StreamSession.java (line 420) [Stream #9cad81f0-6fe8-11e4-b575-4b49634010a9] Streaming error occurred
java.lang.AssertionError: Unknown keyspace system_traces
at org.apache.cassandra.db.Keyspace.<init>(Keyspace.java:260)
at org.apache.cassandra.db.Keyspace.open(Keyspace.java:110)
at org.apache.cassandra.db.Keyspace.open(Keyspace.java:88)
at org.apache.cassandra.streaming.StreamSession.addTransferRanges(StreamSession.java:239)
at org.apache.cassandra.streaming.StreamSession.prepare(StreamSession.java:436)
at org.apache.cassandra.streaming.StreamSession.messageReceived(StreamSession.java:368)
at org.apache.cassandra.streaming.ConnectionHandler$IncomingMessageHandler.run(ConnectionHandler.java:289)
at java.lang.Thread.run(Thread.java:745)
I got the same error while I was trying to set up my cluster. As I was experimenting with different switches in cassandra.yaml, I restarted the service multiple times and removed the system dir under the data directory (/var/lib/cassandra/data as mentioned here).
I guess for some reason Cassandra tries to load the system_traces keyspace (the other dir under /var/lib/cassandra/data) and fails, and nodetool throws this error. You can just remove both system and system_traces before starting the Cassandra service, or even better, delete all content of the commitlog, data, and saved_caches directories there.
Obviously, this only works if you don't have any data in the system yet.
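A sketch of that cleanup on a stock package install (paths assume the default /var/lib/cassandra layout; only do this on a node whose data you can afford to lose):
sudo service cassandra stop
# remove the stale system keyspaces so they get rebuilt on startup
sudo rm -rf /var/lib/cassandra/data/system /var/lib/cassandra/data/system_traces
sudo service cassandra start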
