TinkerPop Gremlin Server MissingPropertyException for SparkGraphComputer in remote mode - groovy

I am new to TinkerPop, Gremlin, and Groovy.
I have configured a TinkerPop Gremlin Server and Console [v3.2.3] with verified integration with HDFS and Spark.
When I execute the code below using the Gremlin Console in local mode, everything works fine: a Spark job is submitted and processed successfully.
:load data/grateful-dead-janusgraph-schema.groovy
graph = JanusGraphFactory.open('conf/connection.properties')
defineGratefulDeadSchema(graph)
graph.close()
hdfs.copyFromLocal('data/grateful-dead.kryo','data/grateful-dead.kryo')
graph = GraphFactory.open('conf/hadoop-graph/hadoop-load.properties')
blvp = BulkLoaderVertexProgram.build().writeGraph('conf/connection.properties').create(graph)
graph.compute(SparkGraphComputer).program(blvp).submit().get()
Next, I connect the Gremlin Console to the Gremlin Server as a remote using the command below.
:remote connect tinkerpop.server conf/remote.yaml
After this, I execute the same code, prefixing each statement with ":> ". As soon as I submit the last line, which hands processing off to SparkGraphComputer, I get the exception below at the server:
[WARN] AbstractEvalOpProcessor - Exception processing a script on request [RequestMessage{, requestId=097785d6-7114-44fb-acbc-1b116dfdaac2, op='eval', processor='', args={gremlin=graph.compute(SparkGraphComputer).program(blvp).submit().get(), bindings={}, batchSize=64}}].
groovy.lang.MissingPropertyException: No such property: SparkGraphComputer for class: Script4
at org.codehaus.groovy.runtime.ScriptBytecodeAdapter.unwrap(ScriptBytecodeAdapter.java:53)
at org.codehaus.groovy.runtime.callsite.PogoGetPropertySite.getProperty(PogoGetPropertySite.java:52)
at org.codehaus.groovy.runtime.callsite.AbstractCallSite.callGroovyObjectGetProperty(AbstractCallSite.java:307)
at Script4.run(Script4.groovy:1)
at org.apache.tinkerpop.gremlin.groovy.jsr223.GremlinGroovyScriptEngine.eval(GremlinGroovyScriptEngine.java:619)
at org.apache.tinkerpop.gremlin.groovy.jsr223.GremlinGroovyScriptEngine.eval(GremlinGroovyScriptEngine.java:448)
at javax.script.AbstractScriptEngine.eval(AbstractScriptEngine.java:233)
at org.apache.tinkerpop.gremlin.groovy.engine.ScriptEngines.eval(ScriptEngines.java:119)
at org.apache.tinkerpop.gremlin.groovy.engine.GremlinExecutor.lambda$eval$2(GremlinExecutor.java:287)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
I am unable to understand what MissingPropertyException means in Groovy. Is it similar to NoClassDefFoundError in Java?
I believe some configuration is missing at the server end. Can someone help me out?

Well, there are two ways to go about this. You can simply import SparkGraphComputer in the script that you're sending, or you can add it to the scriptEngines configuration for your Gremlin Server. Something like:
scriptEngines: {
  gremlin-groovy: {
    imports: [your.full.path.to.TheClass],
    staticImports: [your.full.path.to.TheClass.StaticVar]
  }
}
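For the first option, a minimal Groovy sketch (assuming SparkGraphComputer lives in the standard TinkerPop Spark package; verify the fully qualified name against your installed version):
// Option A: import the class at the top of the script sent to the server
import org.apache.tinkerpop.gremlin.spark.process.computer.SparkGraphComputer
graph.compute(SparkGraphComputer).program(blvp).submit().get()
// Option B: skip the import and reference the class by its fully qualified name
graph.compute(org.apache.tinkerpop.gremlin.spark.process.computer.SparkGraphComputer).program(blvp).submit().get()
For the second option, the imports list in your Gremlin Server YAML would carry that same fully qualified class name.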

Related

cucumber defaults publishing to some URL?

I am trying to generate a report using Cucumber-JVM 6.11.0, and it works fine on my machine when I put these properties in junit-platform.properties:
cucumber.publish.enabled=true
cucumber.plugin=pretty, json:build/reports/cucumber/report.json
cucumber.junit-platform.naming-strategy=long
However, when I run it on Jenkins, I get a ConnectException during publication:
java.lang.RuntimeException: java.net.ConnectException: Connection timed out (Connection timed out)
at io.cucumber.core.plugin.MessageFormatter.writeMessage(MessageFormatter.java:36)
at io.cucumber.core.eventbus.AbstractEventPublisher.send(AbstractEventPublisher.java:51)
at io.cucumber.core.eventbus.AbstractEventBus.send(AbstractEventBus.java:12)
at io.cucumber.core.runtime.SynchronizedEventBus.send(SynchronizedEventBus.java:47)
at io.cucumber.core.runtime.CucumberExecutionContext.emitTestRunFinished(CucumberExecutionContext.java:102)
at io.cucumber.core.runtime.CucumberExecutionContext.finishTestRun(CucumberExecutionContext.java:74)
at io.cucumber.junit.platform.engine.CucumberEngineExecutionContext.finishTestRun(CucumberEngineExecutionContext.java:98)
at io.cucumber.junit.platform.engine.CucumberEngineDescriptor.after(CucumberEngineDescriptor.java:37)
at io.cucumber.junit.platform.engine.CucumberEngineDescriptor.after(CucumberEngineDescriptor.java:10)
at org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$6(NodeTestTask.java:149)
at org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
at org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$7(NodeTestTask.java:149)
at org.junit.platform.engine.support.hierarchical.Node.around(Node.java:137)
...
Caused by: java.net.ConnectException: Connection timed out (Connection timed out)
at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:490)
at java.base/sun.net.www.protocol.http.HttpURLConnection$10.run(HttpURLConnection.java:1963)
at java.base/sun.net.www.protocol.http.HttpURLConnection$10.run(HttpURLConnection.java:1958)
at java.base/java.security.AccessController.doPrivileged(Native Method)
at java.base/sun.net.www.protocol.http.HttpURLConnection.getChainedException(HttpURLConnection.java:1957)
at java.base/sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1525)
at java.base/sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1509)
at java.base/java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:527)
at java.base/sun.net.www.protocol.https.HttpsURLConnectionImpl.getResponseCode(HttpsURLConnectionImpl.java:329)
at io.cucumber.core.plugin.UrlOutputStream.getResponseBody(UrlOutputStream.java:111)
at io.cucumber.core.plugin.UrlOutputStream.sendRequest(UrlOutputStream.java:83)
I tried different combinations of properties, and I see it starts happening the moment I enable publishing, even with only:
cucumber.publish.enabled=true
I cannot find the default behavior in the documentation. Once publishing is enabled, where does the report get published by default? Does it really try to upload it over HTTP? (I guess the proxy is not configured when running on Jenkins, while it is when running on my machine, hence the different behavior.)
How come I still get this error when I simply try to write the HTML or JSON report to disk?
When you enable report publishing, the test results are uploaded to the Cucumber cloud service and you get a unique URL that you (or anyone you share that link with) can use to access your report.
The report self-destructs after 24 hours. You can find more details on the official Cucumber blog.
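If the goal is only to write the HTML or JSON report to disk without uploading anything, a minimal junit-platform.properties sketch (an assumption on my part, building on the properties shown in the question; paths are placeholders):
cucumber.publish.enabled=false
cucumber.plugin=pretty, html:build/reports/cucumber/report.html, json:build/reports/cucumber/report.json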

JanusGraph Error : "Could not find type for id" during a concurrent load operation

While performing a concurrent bulk load operation, I received this error. Subsequently, all my queries failed, and I kept getting the same error.
The exception I got is as follows:
java.lang.NullPointerException: Could not find type for id: 52237
at com.google.common.base.Preconditions.checkNotNull(Preconditions.java:250)
at org.janusgraph.graphdb.types.vertices.JanusGraphSchemaVertex.name(JanusGraphSchemaVertex.java:57)
at org.janusgraph.graphdb.vertices.AbstractVertex.label(AbstractVertex.java:121)
at org.apache.tinkerpop.gremlin.structure.util.reference.ReferenceElement.<init>(ReferenceElement.java:57)
at org.apache.tinkerpop.gremlin.structure.util.reference.ReferenceVertex.<init>(ReferenceVertex.java:46)
at org.apache.tinkerpop.gremlin.structure.util.reference.ReferenceFactory.detach(ReferenceFactory.java:48)
at org.apache.tinkerpop.gremlin.structure.util.reference.ReferenceFactory.detach(ReferenceFactory.java:69)
at org.apache.tinkerpop.gremlin.structure.util.reference.ReferenceFactory.detach(ReferenceFactory.java:80)
at org.apache.tinkerpop.gremlin.process.traversal.strategy.decoration.HaltedTraverserStrategy.halt(HaltedTraverserStrategy.java:60)
at org.apache.tinkerpop.gremlin.server.util.TraverserIterator.next(TraverserIterator.java:64)
at org.apache.tinkerpop.gremlin.server.op.traversal.TraversalOpProcessor.handleIterator(TraversalOpProcessor.java:529)
at org.apache.tinkerpop.gremlin.server.op.traversal.TraversalOpProcessor.lambda$iterateBytecodeTraversal$4(TraversalOpProcessor.java:382)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Some additional context:
storage.batch-loading was NOT enabled
The bulk write operation I was running was highly concurrent and with high load
I used about 100 instances of gremlin server connecting to Cassandra/ES backend
I did not explicitly define a schema
It would be great if someone could give me an idea about what could have caused this.
Thanks!
This happens if multiple instances of gremlin-server are running.
It can also happen because a Gremlin Server was not shut down or killed properly, or because the VM on which gremlin-server is running was restarted.
The solution is to log in to the Gremlin Console and run the commands for your backend. In my case that is Cassandra and Elasticsearch, so I run the following.
Method 1
:remote connect tinkerpop.server conf/remote.yaml session
:remote console session
or
graph=JanusGraphFactory.open('conf/janusgraph-cql-es.properties');
g=graph.traversal()
If you are running containers, then your command will be similar to this:
graph=JanusGraphFactory.open('/etc/opt/janusgraph/janusgraph.properties');
g=graph.traversal()
After running those, you can run:
mgmt = graph.openManagement()
mgmt.getOpenInstances()
It will display all the instances, e.g.:
ac12000231-a9ffbcbb0e921
ac12000230-a9ffbcbb0e921(current)
Close every instance except the current one:
mgmt.forceCloseInstance('ac12000231-a9ffbcbb0e921')
After closing all the instances, commit the changes:
mgmt.commit()
Now restart your Gremlin Server and run your query; it should work.
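Putting the cleanup together, a compact Groovy sketch (a sketch only; it assumes the instance suffixed with "(current)" is the one you are connected from):
mgmt = graph.openManagement()
// force-close every registered instance except the current one
mgmt.getOpenInstances().findAll { !it.contains('(current)') }.each { stale ->
    mgmt.forceCloseInstance(stale)
}
// commit the changes to the management system
mgmt.commit()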
Method 2
If the problem persists, just kill your gremlin-server and start it again a few times; it should work, and the load command should then work as well.
Another reason this happens is if the data was not restored properly.
If you are using a cluster, take the backup on all the nodes, then restore it on your destination node or nodes.
I used nodetool for backup and sstableloader for restoring data.
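For reference, a rough sketch of that nodetool/sstableloader flow (the keyspace, table, and paths are placeholders; adjust them to your cluster):
# on each source node: snapshot the keyspace (writes SSTable snapshots under the data directory)
nodetool snapshot mykeyspace
# on the destination: stream the copied SSTables into the cluster;
# -d takes the contact point(s) of the destination cluster
sstableloader -d 127.0.0.1 /path/to/data/mykeyspace/mytable/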

Instrumenting Spark JDBC with javaagent

I am attempting to instrument JDBC calls using the Kamon JDBC Kanela agent in my Spark app.
I am able to successfully instrument JDBC calls in a non-spark test app by passing in -javaagent:kanela-agent-1.0.1.jar on the command line when I run the app from the JAR. When I do this, I see the Kanela banner display in the console, and can see that my failed statement processor is getting called when there is a SQL error.
From my research, I should be able to inject a javaagent into the executor of a Spark app by passing in the following to spark-submit: --conf "spark.executor.extraJavaOptions=-javaagent:kanela-agent-1.0.1.jar". However, when I do this, although the Kamon banner IS displaying on the console upon my call to Kamon.init(), my failed statement processor is NOT getting called when there is a SQL error.
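For reference, the shape of the spark-submit invocation I am describing is roughly the following (a sketch only; the --files flag for shipping the agent jar to the executors, the driver-side option, and the application class and jar names are illustrative assumptions):
spark-submit \
  --files kanela-agent-1.0.1.jar \
  --conf "spark.driver.extraJavaOptions=-javaagent:kanela-agent-1.0.1.jar" \
  --conf "spark.executor.extraJavaOptions=-javaagent:kanela-agent-1.0.1.jar" \
  --class com.example.MySparkApp \
  my-spark-app.jar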
Things I'm wondering:
Is there something about the way that spark-jdbc makes these JDBC calls that would prevent a javaagent from "seeing" them?
Does my call to Kamon.init() somehow only apply to code in the Spark driver, and not the executor?
Any other reason that you can think of that would be preventing this from working?

Unable to connect to Neo4j Community Server via Spark Shell : The client is unauthorized due to authentication failure

Versions
Spark: 2.4.0
Neo4j: Neo4j Community Version 3.5.6
Problem Statement:
I am trying to connect the Spark shell with the Neo4j Community server. Everything is run locally. The end goal is to query Neo4j and load the data in the form of RDDs; later on, I want to convert these RDDs to a JSON structure. I am using this connector: https://github.com/neo4j-contrib/neo4j-spark-connector
At the moment, I am facing authentication problems with the Neo4j server. The basic commands that make the connection and set up the Neo4j context seem to work fine, but when I try to run rdd.count or rdd.first.schema.fieldName I run into authentication errors saying the client is not authenticated.
Spark Shell Commands:
spark-shell --master spark://10.62.10.71:7077 --conf spark.neo4j.bolt.username=neo4j spark.neo4j.bolt.password=<password> --jars C:/Users/khalid-admin/Desktop/jar_files/neo4j-spark-connector-full-2.4.0-M6
import org.neo4j.spark._
val neo = Neo4j(sc)
val rdd = neo.cypher("MATCH (n:Person) RETURN id(n) as id ").loadRowRdd
Error:
[Stage 0:> (0 + 1) / 1]2019-08-22 00:25:17 WARN TaskSetManager:66 - Lost task 0.0 in stage 0.0 (TID 0, 10.62.10.71, executor 0): org.neo4j.driver.v1.exceptions.AuthenticationException: The client is unauthorized due to authentication failure.
at org.neo4j.driver.internal.util.Futures.blockingGet(Futures.java:122)
at org.neo4j.driver.internal.DriverFactory.verifyConnectivity(DriverFactory.java:346)
at org.neo4j.driver.internal.DriverFactory.newInstance(DriverFactory.java:93)
at org.neo4j.driver.v1.GraphDatabase.driver(GraphDatabase.java:136)
at org.neo4j.driver.v1.GraphDatabase.driver(GraphDatabase.java:119)
at org.neo4j.spark.Neo4jConfig.driver(Neo4jConfig.scala:15)
at org.neo4j.spark.Neo4jConfig.driver(Neo4jConfig.scala:19)
at org.neo4j.spark.Executor$.execute(Neo4j.scala:394)
at org.neo4j.spark.Neo4jRDD.compute(Neo4j.scala:458)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
at org.apache.spark.scheduler.Task.run(Task.scala:121)
at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:402)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:408)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
...
Steps tried so far:
I have made sure that I am not using the default credentials of Neo4j.
When I first logged in to the Neo4j Browser, it prompted me with a Change Password window and I changed my password. Everything else seems to be working fine.
Can anybody suggest further steps I can take to narrow down the problem?

Cassandra 2.1, Datastax Java driver 2.1.1 - Custom authentication/authorization plugin causes NoHostAvailableException

Environment is Red Hat, Cassandra 2.1, Datastax Java driver 2.1.1.
I have developed custom authentication/authorization plugins for Cassandra, and they work beautifully when I try them with cqlsh - I can see my plugins being called, users are authenticated/authorized accordingly, etc. - bottom line, everything works exactly as expected.
Then I tried to test using the Datastax driver. I'm connecting to Cassandra with:
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;

public class CassandraConnection {
    private final Cluster cluster;
    private final Session session;

    public CassandraConnection(final String node, final int port) {
        this.cluster = Cluster.builder()
                .addContactPoint(node)
                .withPort(port)
                .withCredentials("someuser", "somepassword")
                .build();
        this.session = cluster.connect();
    }
    // Etc....
The call to cluster.connect() generates an exception:
Exception in thread "main" com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: localhost/127.0.0.1:9042 (com.datastax.driver.core.TransportException: [localhost/127.0.0.1:9042] Cannot connect))
at com.datastax.driver.core.ControlConnection.reconnectInternal(ControlConnection.java:196)
at com.datastax.driver.core.ControlConnection.connect(ControlConnection.java:80)
at com.datastax.driver.core.Cluster$Manager.init(Cluster.java:1145)
at com.datastax.driver.core.Cluster.init(Cluster.java:149)
at com.datastax.driver.core.Cluster.connect(Cluster.java:225)
at com.<company...packages...>.CassandraConnection.<init>(CassandraConnection.java:21)
Here is the puzzling part: although I can see my plugins being called when I test them using cqlsh, they are never accessed when I use the Datastax driver - I have added log messages in the beginning of each method, and they are never called. There are no errors in the logs indicating any sort of initialization problem, and I do see a message indicating that my plugins will be used.
That exact same client code works with no problem when:
I don't have my plugin running.
I use Cassandra's PasswordAuthenticator.
So, it looks like there is some problem with my plugins, but how can that be if 1) they work fine with cqlsh and 2) none of their methods are being called when the datastax driver is being used?
A couple of additional points - if I try to connect using Datastax's DevCenter, I see the same behavior as my client, with the exact same exception, so that rules out my (very simple) client code. I have also tried to:
cluster.getConfiguration().getSocketOptions().setReadTimeoutMillis(10000);
before calling connect() as suggested in other posts, but that didn't help either - when I step through the client with the debugger, I see the error as soon as I call cluster.connect(), so it's not a time out issue either.
Any help is appreciated.
