scala nutch gora-cassandra - RuntimeException: job failed - cassandra

I'm trying to run Nutch and load the crawled data into Cassandra.
My sbt file has these dependencies:
"org.apache.gora" % "gora-cassandra" % "0.3",
"org.apache.nutch" % "nutch" % "2.2.1",
"com.datastax.cassandra" % "cassandra-driver-core" % "2.1.2"
and am kicking off the job
ToolRunner.run(NutchConfiguration.create(), new Crawler(), Array("urls"));
but I am hitting this rather vague error:
EDIT - updated to be full logs from start of request
[Ljava.lang.String;@526950c7
****file:/home/abdev/Working/Qordaoba/gl/web-crawling-services/crawling-services/urls
[error] play - Cannot invoke the action, eventually got an error: java.lang.RuntimeException: job failed: name=generate: null, jobid=job_local_0002
[error] application -
! #6kemm159h - Internal server error, for (POST) [/nutch/job] ->
play.api.Application$$anon$1: Execution exception[[RuntimeException: job failed: name=generate: null, jobid=job_local_0002]]
at play.api.Application$class.handleError(Application.scala:296) ~[play_2.11-2.3.6.jar:2.3.6]
at play.api.DefaultApplication.handleError(Application.scala:402) [play_2.11-2.3.6.jar:2.3.6]
at play.core.server.netty.PlayDefaultUpstreamHandler$$anonfun$3$$anonfun$applyOrElse$4.apply(PlayDefaultUpstreamHandler.scala:320) [play_2.11-2.3.6.jar:2.3.6]
at play.core.server.netty.PlayDefaultUpstreamHandler$$anonfun$3$$anonfun$applyOrElse$4.apply(PlayDefaultUpstreamHandler.scala:320) [play_2.11-2.3.6.jar:2.3.6]
at scala.Option.map(Option.scala:145) [scala-library-2.11.1.jar:na]
Caused by: java.lang.RuntimeException: job failed: name=generate: null, jobid=job_local_0002
at org.apache.nutch.util.NutchJob.waitForCompletion(NutchJob.java:54) ~[nutch-2.2.1.jar:na]
at org.apache.nutch.crawl.GeneratorJob.run(GeneratorJob.java:199) ~[nutch-2.2.1.jar:na]
at org.apache.nutch.crawl.Crawler.runTool(Crawler.java:68) ~[nutch-2.2.1.jar:na]
at org.apache.nutch.crawl.Crawler.run(Crawler.java:152) ~[nutch-2.2.1.jar:na]
at org.apache.nutch.crawl.Crawler.run(Crawler.java:250) ~[nutch-2.2.1.jar:na]
In Cassandra, the keyspace webpage and the tables sc, p, and f are created before the error is thrown.
EDIT: If I put all of the jars listed below in my lib folder (sorry, it's a long list), the job runs, and the first few log lines are about connecting to Cassandra. I don't see those log lines when I use only the sbt dependencies.
Logs when running with the jar files below:
SLF4J: The following set of substitute loggers may have been accessed
SLF4J: during the initialization phase. Logging calls during this
SLF4J: phase were not honored. However, subsequent logging calls to these
SLF4J: loggers will work as normally expected.
SLF4J: See also http://www.slf4j.org/codes.html#substituteLogger
SLF4J: org.webjars.WebJarExtractor
[info] Compiling 5 Scala sources and 1 Java source to /home/abdev/Working/Qordaoba/gl/web-crawling-services/crawling-services/target/scala-2.11/classes...
14/12/10 07:31:03 INFO play: Application started (Dev)
14/12/10 07:31:03 INFO slf4j.Slf4jLogger: Slf4jLogger started
[Ljava.lang.String;@3a6f1296
14/12/10 07:31:05 INFO connection.CassandraHostRetryService: Downed Host Retry service started with queue size -1 and retry delay 10s
14/12/10 07:31:05 INFO service.JmxMonitor: Registering JMX me.prettyprint.cassandra.service_Test Cluster:ServiceType=hector,MonitorType=hector
14/12/10 07:31:06 INFO crawl.InjectorJob: InjectorJob: Using class org.apache.gora.cassandra.store.CassandraStore as the Gora storage class.
14/12/10 07:31:06 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
14/12/10 07:31:06 INFO input.FileInputFormat: Total input paths to process : 1
Full list of Jar files
activation-1.1.jar
antlr-3.2.jar
aopalliance-1.0.jar
apache-cassandra-1.2.19.jar
apache-cassandra-clientutil-1.2.19.jar
apache-cassandra-thrift-1.2.19.jar
apache-nutch-2.2.1.jar
asm-3.2.jar
avro-1.3.3.jar
commons-beanutils-1.7.0.jar
commons-beanutils-core-1.8.0.jar
commons-cli-1.1.jar
commons-cli-1.2.jar
commons-codec-1.2.jar
commons-codec-1.4.jar
commons-collections-3.2.1.jar
commons-configuration-1.6.jar
commons-digester-1.8.jar
commons-el-1.0.jar
commons-httpclient-3.1.jar
commons-io-2.4.jar
commons-lang-2.6.jar
commons-logging-1.1.1.jar
commons-math-2.1.jar
commons-net-1.4.1.jar
compress-lzf-0.8.4.jar
concurrentlinkedhashmap-lru-1.3.jar
cql-internal-only-1.4.1.zip
crawler-commons-0.2.jar
cxf-api-2.5.2.jar
cxf-common-utilities-2.5.2.jar
cxf-rt-bindings-xml-2.5.2.jar
cxf-rt-core-2.5.2.jar
cxf-rt-frontend-jaxrs-2.5.2.jar
cxf-rt-transports-common-2.5.2.jar
cxf-rt-transports-http-2.5.2.jar
elasticsearch-0.19.4.jar
geronimo-javamail_1.4_spec-1.7.1.jar
geronimo-stax-api_1.0_spec-1.0.1.jar
gora-cassandra-0.3.jar
gora-core-0.3.jar
guava-11.0.2.jar
guava-13.0.1.jar
hadoop-core-1.2.0.jar
hamcrest-core-1.3.jar
hector-core-1.1-4.jar
high-scale-lib-1.1.2.jar
hsqldb-2.2.8.jar
httpclient-4.1.1.jar
httpcore-4.1.jar
icu4j-4.0.1.jar
jackson-core-asl-1.8.8.jar
jackson-core-asl-1.9.2.jar
jackson-jaxrs-1.7.1.jar
jackson-mapper-asl-1.8.8.jar
jackson-mapper-asl-1.9.2.jar
jackson-xc-1.7.1.jar
jamm-0.2.5.jar
jaxb-api-2.2.2.jar
jaxb-impl-2.2.3-1.jar
jbcrypt-0.3m.jar
jdom-1.1.jar
jersey-core-1.8.jar
jersey-json-1.8.jar
jersey-server-1.8.jar
jettison-1.3.1.jar
jetty-6.1.26.jar
jetty-client-6.1.26.jar
jetty-sslengine-6.1.26.jar
jetty-util5-6.1.26.jar
jetty-util-6.1.26.jar
jline-0.9.1.jar
jline-1.0.jar
json-simple-1.1.jar
jsr305-1.3.9.jar
jsr311-api-1.1.1.jar
junit-4.11.jar
juniversalchardet-1.0.3.jar
libthrift-0.7.0.jar
log4j-1.2.16.jar
lucene-analyzers-3.6.0.jar
lucene-core-3.6.0.jar
lucene-highlighter-3.6.0.jar
lucene-memory-3.6.0.jar
lucene-queries-3.6.0.jar
lz4-1.1.0.jar
metrics-core-2.2.0.jar
neethi-3.0.1.jar
org.osgi.core-4.0.0.jar
org.restlet.ext.jackson-2.0.5.jar
org.restlet-2.0.5.jar
oro-2.0.8.jar
paranamer-2.2.jar
paranamer-ant-2.2.jar
paranamer-generator-2.2.jar
qdox-1.10.1.jar
serializer-2.7.1.jar
servlet-api-2.5-6.1.14.jar
servlet-api-2.5-20081211.jar
slf4j-api-1.6.6.jar
slf4j-api-1.7.2.jar
slf4j-log4j12-1.6.1.jar
slf4j-log4j12-1.7.2.jar
snakeyaml-1.6.jar
snappy-java-1.0.5.jar
snaptree-0.1.jar
solr-solrj-3.4.0.jar
spring-aop-3.0.6.RELEASE.jar
spring-asm-3.0.6.RELEASE.jar
spring-beans-3.0.6.RELEASE.jar
spring-context-3.0.6.RELEASE.jar
spring-core-3.0.6.RELEASE.jar
spring-expression-3.0.6.RELEASE.jar
spring-web-3.0.6.RELEASE.jar
stax2-api-3.1.1.jar
stax-api-1.0.1.jar
stax-api-1.0-2.jar
thrift-python-internal-only-0.7.0.zip
tika-core-1.3.jar
woodstox-core-asl-4.1.1.jar
wsdl4j-1.6.2.jar
wstx-asl-3.2.7.jar
xercesImpl-2.9.1.jar
xml-apis-1.3.04.jar
xmlenc-0.52.jar
xmlParserAPIs-2.6.2.jar
xmlschema-core-2.0.1.jar
zookeeper-3.3.1.jar
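One thing worth checking in a list like this is whether the same artifact appears in more than one version (for example commons-cli 1.1 and 1.2, or guava 11.0.2 and 13.0.1 above), since sbt would normally resolve each artifact to a single version. The following is a minimal Python sketch, not part of the original setup, that groups jar file names by artifact and reports those present in several versions. The function name find_version_conflicts and the file-name pattern are assumptions made for illustration; names that do not look like artifact-version.jar are skipped. Whether any of these duplicates is related to the failure is not established here.

```python
import re
from collections import defaultdict

# Assumed naming scheme: "<artifact>-<version>.jar" (or .zip), where the
# version starts at the first hyphen that is followed by a digit.
JAR_NAME = re.compile(r"^(?P<artifact>.+?)-(?P<version>\d[\w.\-]*)\.(?:jar|zip)$")


def find_version_conflicts(filenames):
    """Return {artifact: sorted versions} for artifacts listed in 2+ versions."""
    versions = defaultdict(set)
    for name in filenames:
        match = JAR_NAME.match(name.strip())
        if match:  # skip names that do not fit the assumed scheme
            versions[match.group("artifact")].add(match.group("version"))
    return {a: sorted(v) for a, v in versions.items() if len(v) > 1}


if __name__ == "__main__":
    # A few entries copied from the list above.
    sample = [
        "commons-cli-1.1.jar",
        "commons-cli-1.2.jar",
        "guava-11.0.2.jar",
        "guava-13.0.1.jar",
        "gora-core-0.3.jar",
        "slf4j-api-1.6.6.jar",
        "slf4j-api-1.7.2.jar",
    ]
    for artifact, found in sorted(find_version_conflicts(sample).items()):
        print(artifact, "->", ", ".join(found))
```

Running the same function over the output of ls lib/ and over the jars sbt actually resolves would show where the two classpaths differ.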
Thanks,
Brent


How to show mlcp copy summary at the end of the copy process

With mlMlcpVersion=10.0.6.2, one could see the summary of the mlcp copy process like how many records were copied and how many failed etc.
Below is one example.
2022-09-01 16:45:48 INFO LocalJobRunner:231 - com.marklogic.mapreduce.MarkLogicCounter:
2022-09-01 16:45:48 INFO LocalJobRunner:235 - ESTIMATED_INPUT_RECORDS: 12
2022-09-01 16:45:48 INFO LocalJobRunner:235 - INPUT_RECORDS: 12
2022-09-01 16:45:48 INFO LocalJobRunner:235 - OUTPUT_RECORDS: 12
2022-09-01 16:45:48 INFO LocalJobRunner:235 - OUTPUT_RECORDS_COMMITTED: 12
2022-09-01 16:45:48 INFO LocalJobRunner:235 - OUTPUT_RECORDS_FAILED: 0
2022-09-01 16:45:48 INFO LocalJobRunner:239 - Total execution time: 6 sec
However, since upgrading to mlMlcpVersion=10.0.9.2, there is no longer a summary at the end of the mlcp process. How can I enable it? (The latest version of MLCP is 10.0.9.2.)
Here is the output of running 10.0.9.2 for the very same copy command.
Successfully started process 'command 'C:\Program Files\OpenJDK\openjdk-11.0.16_8\bin\java.exe''
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/C:/Users/xx/.gradle/caches/modules-2/files-2.1/ch.qos.logback/logback-classic/1.2.3/7c4f3c474fb2c041d8028740440937705ebb473a/logback-classic-1.2.3.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/C:/Users/xx/.gradle/caches/modules-2/files-2.1/org.apache.logging.log4j/log4j-slf4j-impl/2.17.1/84692d456bcce689355d33d68167875e486954dd/log4j-slf4j-impl-2.17.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/C:/Users/xx/.gradle/caches/modules-2/files-2.1/org.slf4j/slf4j-log4j12/1.7.10/b3eeae7d1765f988a1f45ea81517191315c69c9e/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [ch.qos.logback.classic.util.ContextSelectorStaticBinder]
04:49:35.639 [main] ERROR org.apache.hadoop.util.Shell - Failed to locate the winutils binary in the hadoop binary path
java.io.IOException: Could not locate executable null\bin\winutils.exe in the Hadoop binaries.
at org.apache.hadoop.util.Shell.getQualifiedBinPath(Shell.java:356)
at org.apache.hadoop.util.Shell.getWinUtilsPath(Shell.java:371)
at org.apache.hadoop.util.Shell.<clinit>(Shell.java:364)
at org.apache.hadoop.util.GenericOptionsParser.preProcessForWindows(GenericOptionsParser.java:440)
at org.apache.hadoop.util.GenericOptionsParser.parseGeneralOptions(GenericOptionsParser.java:486)
at org.apache.hadoop.util.GenericOptionsParser.<init>(GenericOptionsParser.java:170)
at org.apache.hadoop.util.GenericOptionsParser.<init>(GenericOptionsParser.java:153)
at com.marklogic.contentpump.ContentPump.runCommand(ContentPump.java:120)
at com.marklogic.contentpump.ContentPump.main(ContentPump.java:74)
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by org.apache.hadoop.security.authentication.util.KerberosUtil (file:/C:/Users/ling/.gradle/caches/modules-2/files-2.1/org.apache.hadoop/hadoop-auth/2.7.2/bf613cfec06a1f3d3a91d7f82f9e4af75bc01f72/hadoop-auth-2.7.2.jar) to method sun.security.krb5.Config.getInstance()
WARNING: Please consider reporting this to the maintainers of org.apache.hadoop.security.authentication.util.KerberosUtil
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
04:49:36.358 [main] WARN o.a.hadoop.util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
:mCopyNISOData (Thread[Daemon worker Thread 6,5,main]) completed. Took 5.882 secs.
Deprecated Gradle features were used in this build, making it incompatible with Gradle 8.0.
You can use '--warning-mode all' to show the individual deprecation warnings and determine if they come from your own scripts or plugins.
See https://docs.gradle.org/7.2/userguide/command_line_interface.html#sec:command_line_warnings
BUILD SUCCESSFUL in 9s
It is important to see the summary information and copy progress.

Can not generate nodejs-server with openapi-generator

How can I generate a nodejs-server with OAS 2.0 as input using openapi-generator?
I ran openapi-generator in two versions, 3.3.4 and 4.0.0.
The results are listed below.
■in 3.3.4
java -jar openapi-generator-cli-3.3.4.jar generate -i petstore.json -g nodejs-server -o stub
[main] WARN o.o.c.ignore.CodegenIgnoreProcessor - Output directory does not exist, or is inaccessible. No file (.openapi-generator-ignore) will be evaluated.
[main] WARN o.o.c.languages.NodeJSServerCodegen -
=======================================================================================
Currently, Node.js server doesn't work as its dependency doesn't support OpenAPI Spec3.
For further details, see https://github.com/OpenAPITools/openapi-generator/issues/34
=======================================================================================
[main] INFO o.o.codegen.DefaultGenerator - Model Pets not generated since it's an alias to array (without property)
Exception in thread "main" java.lang.RuntimeException: Could not generate api file for 'Pets'
at org.openapitools.codegen.DefaultGenerator.generateApis(DefaultGenerator.java:651)
at org.openapitools.codegen.DefaultGenerator.generate(DefaultGenerator.java:891)
at org.openapitools.codegen.cmd.Generate.run(Generate.java:355)
at org.openapitools.codegen.OpenAPIGenerator.main(OpenAPIGenerator.java:62)
Caused by: java.lang.IllegalArgumentException: character to be escaped is missing
at java.util.regex.Matcher.appendReplacement(Matcher.java:809)
at java.util.regex.Matcher.replaceAll(Matcher.java:955)
at java.lang.String.replaceAll(String.java:2223)
at org.openapitools.codegen.languages.NodeJSServerCodegen.apiFilename(NodeJSServerCodegen.java:192)
at org.openapitools.codegen.DefaultGenerator.generateApis(DefaultGenerator.java:595)
... 3 more
■in 4.0.0
java -jar openapi-generator-cli-4.0.0.jar generate -i petstore.json -g nodejs-server -o stub
[main] WARN o.o.c.ignore.CodegenIgnoreProcessor - Output directory does not exist, or is inaccessible. No file (.openapi-generator-ignore) will be evaluated.
[main] INFO o.o.codegen.DefaultGenerator - OpenAPI Generator: nodejs-server (server)
[main] INFO o.o.codegen.DefaultGenerator - Generator 'nodejs-server' is considered stable.
[main] WARN o.o.c.languages.NodeJSServerCodegen -
=======================================================================================
Currently, Node.js server doesn't work as its dependency doesn't support OpenAPI Spec3.
For further details, see https://github.com/OpenAPITools/openapi-generator/issues/34
=======================================================================================
[main] INFO o.o.codegen.DefaultGenerator - Model Pets not generated since it's an alias to array (without property) and `generateAliasAsModel` is set to false (default)
Exception in thread "main" java.lang.RuntimeException: Could not generate api file for 'Pets'
at org.openapitools.codegen.DefaultGenerator.generateApis(DefaultGenerator.java:666)
at org.openapitools.codegen.DefaultGenerator.generate(DefaultGenerator.java:922)
at org.openapitools.codegen.cmd.Generate.run(Generate.java:396)
at org.openapitools.codegen.OpenAPIGenerator.main(OpenAPIGenerator.java:60)
Caused by: java.lang.IllegalArgumentException: character to be escaped is missing
at java.util.regex.Matcher.appendReplacement(Matcher.java:809)
at java.util.regex.Matcher.replaceAll(Matcher.java:955)
at java.lang.String.replaceAll(String.java:2223)
at org.openapitools.codegen.languages.NodeJSServerCodegen.apiFilename(NodeJSServerCodegen.java:181)
at org.openapitools.codegen.DefaultGenerator.generateApis(DefaultGenerator.java:611)
... 3 more
The input OAS is the following:
https://raw.githubusercontent.com/OAI/OpenAPI-Specification/master/examples/v2.0/json/petstore.json
The execution environment is as follows.
・windows10
・java 1.8.0_202
It works for me:
java -jar modules/openapi-generator-cli/target/openapi-generator-cli.jar generate -g nodejs-server -i https://raw.githubusercontent.com/OAI/OpenAPI-Specification/master/examples/v2.0/json/petstore.json -o /tmp/nodejs-server
[main] INFO o.o.codegen.DefaultGenerator - OpenAPI Generator: nodejs-server (server)
[main] INFO o.o.codegen.DefaultGenerator - Generator 'nodejs-server' is considered stable.
[main] WARN o.o.c.languages.NodeJSServerCodegen -
=======================================================================================
Currently, Node.js server doesn't work as its dependency doesn't support OpenAPI Spec3.
For further details, see https://github.com/OpenAPITools/openapi-generator/issues/34
=======================================================================================
[main] INFO o.o.codegen.DefaultCodegen - Skipped overwriting README.md as the file already exists in /tmp/java2//README.md
[main] INFO o.o.codegen.DefaultGenerator - Model Pets not generated since it's an alias to array (without property) and `generateAliasAsModel` is set to false (default)
[main] INFO o.o.codegen.AbstractGenerator - writing file /tmp/java2/service/PetsService.js
[main] INFO o.o.codegen.AbstractGenerator - writing file /tmp/java2/controllers/Pets.js
[main] INFO o.o.codegen.AbstractGenerator - writing file /tmp/java2/utils/writer.js
[main] INFO o.o.codegen.AbstractGenerator - writing file /tmp/java2/api/openapi.yaml
[main] INFO o.o.codegen.AbstractGenerator - writing file /tmp/java2/index.js
[main] INFO o.o.codegen.AbstractGenerator - writing file /tmp/java2/package.json
[main] INFO o.o.codegen.AbstractGenerator - writing file /tmp/java2/.openapi-generator/VERSION
But as mentioned in the warning, the nodejs-server generator no longer works as expected as one of its dependencies does not support OpenAPI spec v3.
Please refer to https://github.com/OpenAPITools/openapi-generator/issues/2828 for the latest development of creating a new NodeJS Express generator.
UPDATE (2019/09): we've added a new nodejs-express-server generator. Please refer to https://twitter.com/oas_generator/status/1160000504455319553 for more information.
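A note on why the same command might fail on Windows and succeed elsewhere: in Java, String.replaceAll treats a backslash in the replacement string as an escape character, and "character to be escaped is missing" is the message Matcher.appendReplacement gives when the replacement ends in a backslash. On Windows the path separator is a backslash, so a replacement built from it could trigger exactly this. That reading of the stack trace is an inference, not something confirmed in this thread. The sketch below uses Python's re.sub, which applies similar escape processing to replacement strings, to show the failure and one way around it. The function names and the example path are hypothetical; in Java, the counterpart of the safe variant would be Matcher.quoteReplacement.

```python
import re


def swap_folder_naive(path, old, new, sep):
    """Replace the folder `old` with `new`, passing the replacement as a template."""
    # The replacement string is parsed for escapes, so a backslash separator
    # is read as the start of an escape sequence and can raise re.error.
    return re.sub(re.escape(sep + old + sep), sep + new + sep, path)


def swap_folder_safe(path, old, new, sep):
    """Same replacement, but the new text is inserted verbatim."""
    replacement = sep + new + sep
    # A callable replacement bypasses template parsing entirely.
    return re.sub(re.escape(sep + old + sep), lambda _match: replacement, path)


if __name__ == "__main__":
    windows_path = "stub\\controllers\\Pets.js"  # hypothetical output path
    try:
        swap_folder_naive(windows_path, "controllers", "service", "\\")
    except re.error as exc:
        print("naive replacement failed:", exc)
    print(swap_folder_safe(windows_path, "controllers", "service", "\\"))
```

With a forward slash as separator both variants behave the same, which would be consistent with the command working on a non-Windows machine.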

Presto unable to create injector with localfile connector

I am using presto-server-0.149 on macOS 10.11. For testing purposes, I run a single node, and everything works. When I add etc/catalog/localfile.properties with:
connector.name=localfile
presto-logs.http-request-log-location=/var/log/apache2/access_log
I get the following error:
2016-07-04T12:02:45.435-0700 INFO main io.airlift.bootstrap.LifeCycleManager Life cycle starting...
2016-07-04T12:02:45.435-0700 INFO main io.airlift.bootstrap.LifeCycleManager Life cycle startup complete. System ready.
2016-07-04T12:02:45.436-0700 INFO main com.facebook.presto.metadata.CatalogManager -- Added catalog jmx using connector jmx --
2016-07-04T12:02:45.436-0700 INFO main com.facebook.presto.metadata.CatalogManager -- Loading catalog etc/catalog/localfile.properties --
2016-07-04T12:02:45.797-0700 INFO main Bootstrap PROPERTY DEFAULT RUNTIME DESCRIPTION
2016-07-04T12:02:45.797-0700 INFO main Bootstrap presto-logs.http-request-log.pattern null null If log location is a directory this glob is used to match the file names in the directory
2016-07-04T12:02:45.797-0700 INFO main Bootstrap presto-logs.http-request-log.location var/log/http-request.log var/log/http-request.log Directory or file where http request logs are written
2016-07-04T12:02:45.797-0700 INFO main Bootstrap
2016-07-04T12:02:45.797-0700 WARN main Bootstrap UNUSED PROPERTIES
2016-07-04T12:02:45.797-0700 WARN main Bootstrap presto-logs.http-request-log-location=/var/log/apache2/access_log
2016-07-04T12:02:45.797-0700 WARN main Bootstrap
2016-07-04T12:02:45.989-0700 ERROR main com.facebook.presto.server.PrestoServer Unable to create injector, see the following errors:
1) Configuration property 'presto-logs.http-request-log-location=/var/log/apache2/access_log' was not used
at io.airlift.bootstrap.Bootstrap.lambda$initialize$2(Bootstrap.java:235)
1 error
com.google.inject.CreationException: Unable to create injector, see the following errors:
1) Configuration property 'presto-logs.http-request-log-location=/var/log/apache2/access_log' was not used
at io.airlift.bootstrap.Bootstrap.lambda$initialize$2(Bootstrap.java:235)
1 error
at com.google.inject.internal.Errors.throwCreationExceptionIfErrorsExist(Errors.java:466)
at com.google.inject.internal.InternalInjectorCreator.initializeStatically(InternalInjectorCreator.java:155)
at com.google.inject.internal.InternalInjectorCreator.build(InternalInjectorCreator.java:107)
at com.google.inject.Guice.createInjector(Guice.java:96)
at io.airlift.bootstrap.Bootstrap.initialize(Bootstrap.java:242)
at com.facebook.presto.localfile.LocalFileConnectorFactory.create(LocalFileConnectorFactory.java:64)
at com.facebook.presto.connector.ConnectorManager.createConnector(ConnectorManager.java:315)
at com.facebook.presto.connector.ConnectorManager.addCatalogConnector(ConnectorManager.java:169)
at com.facebook.presto.connector.ConnectorManager.createConnection(ConnectorManager.java:162)
at com.facebook.presto.connector.ConnectorManager.createConnection(ConnectorManager.java:148)
at com.facebook.presto.metadata.CatalogManager.loadCatalog(CatalogManager.java:99)
at com.facebook.presto.metadata.CatalogManager.loadCatalogs(CatalogManager.java:77)
at com.facebook.presto.server.PrestoServer.run(PrestoServer.java:115)
at com.facebook.presto.server.PrestoServer.main(PrestoServer.java:63)
UPDATE
Based on Dain Sundstrom's answer below, I was able to fix my problem. It turned out that the Facebook documentation for the Local File Connector is incorrect. Since I needed some input to test the localfile connector, I changed the configured file path to Presto's own request log:
presto-logs.http-request-log.location=/var/presto/data/var/log/http-request.log
You have a typo in the configuration property. It should be:
presto-logs.http-request-log.location=/var/log/apache2/access_log
Also, this connector can only process the http log format created by Presto itself, so you would need to reconfigure your Apache2 server to output the same format.
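A typo like this can be caught before starting the server by comparing the keys in the catalog file with the property names the connector prints in its Bootstrap log table. The following Python sketch is one way to do that. The helper names parse_properties and suggest_fixes are made up for illustration, and the list of supported keys is an assumption copied from the log output in the question plus connector.name; it is not an authoritative list for the connector.

```python
import difflib

# Assumption: keys taken from the Bootstrap PROPERTY table in the question,
# plus connector.name, which selects the connector itself.
SUPPORTED_KEYS = [
    "connector.name",
    "presto-logs.http-request-log.location",
    "presto-logs.http-request-log.pattern",
]


def parse_properties(text):
    """Parse simple key=value lines, skipping blanks and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def suggest_fixes(props, supported_keys=SUPPORTED_KEYS):
    """Map each unsupported key to its closest supported key (or None)."""
    suggestions = {}
    for key in props:
        if key not in supported_keys:
            close = difflib.get_close_matches(key, supported_keys, n=1)
            suggestions[key] = close[0] if close else None
    return suggestions


if __name__ == "__main__":
    catalog = (
        "connector.name=localfile\n"
        "presto-logs.http-request-log-location=/var/log/apache2/access_log\n"
    )
    for bad, good in suggest_fixes(parse_properties(catalog)).items():
        print(f"unknown property {bad!r}; did you mean {good!r}?")
```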

FileNotFoundException with Titan (titan-all)

I'm trying to set up a basic Titan example. In following the docs, I tried running bin/gremlin-server.sh -i com.thinkaurelius.titan titan-all 1.0.0 which throws;
Could not install the dependency: java.io.FileNotFoundException: /usr/share/titan/ext/titan-all/plugin/titan-all-1.0.0.jar (No such file or directory)
java.lang.RuntimeException: java.io.FileNotFoundException: /usr/share/titan/ext/titan-all/plugin/titan-all-1.0.0.jar (No such file or directory)
at org.codehaus.groovy.vmplugin.v7.IndyInterface.selectMethod(IndyInterface.java:215)
at org.apache.tinkerpop.gremlin.groovy.util.DependencyGrabber.getAdditionalDependencies(DependencyGrabber.groovy:165)
at org.codehaus.groovy.vmplugin.v7.IndyInterface.selectMethod(IndyInterface.java:215)
at org.apache.tinkerpop.gremlin.groovy.util.DependencyGrabber.copyDependenciesToPath(DependencyGrabber.groovy:99)
at org.apache.tinkerpop.gremlin.server.util.GremlinServerInstall.main(GremlinServerInstall.java:38)
Caused by: java.io.FileNotFoundException: /usr/share/titan/ext/titan-all/plugin/titan-all-1.0.0.jar (No such file or directory)
at java.util.zip.ZipFile.open(Native Method)
at java.util.zip.ZipFile.<init>(ZipFile.java:219)
at java.util.zip.ZipFile.<init>(ZipFile.java:149)
at java.util.jar.JarFile.<init>(JarFile.java:166)
at java.util.jar.JarFile.<init>(JarFile.java:130)
at org.codehaus.groovy.vmplugin.v7.IndyInterface.selectMethod(IndyInterface.java:215)
at org.apache.tinkerpop.gremlin.groovy.util.DependencyGrabber.getAdditionalDependencies(DependencyGrabber.groovy:148)
... 3 more
I also tried it from gremlin.sh;
root@ubuntu:/usr/share/titan# bin/gremlin.sh
\,,,/
(o o)
-----oOOo-(3)-oOOo-----
plugin activated: aurelius.titan
plugin activated: tinkerpop.server
plugin activated: tinkerpop.utilities
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/share/titan/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/share/titan/lib/logback-classic-1.1.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
14:45:44 INFO org.apache.tinkerpop.gremlin.hadoop.structure.HadoopGraph - HADOOP_GREMLIN_LIBS is set to: /usr/share/titan/lib
plugin activated: tinkerpop.hadoop
plugin activated: tinkerpop.tinkergraph
gremlin> :install com.thinkaurelius.titan titan-all 1.0.0
==>java.io.FileNotFoundException: /usr/share/titan/ext/titan-all/plugin/titan-all-1.0.0.jar (No such file or directory)
gremlin>
I've confirmed that groovy has the file;
root@ubuntu:/usr/share/titan# ls ~/.groovy/grapes/com.thinkaurelius.titan/titan-all/jars
titan-all-1.0.0.jar
So now I'm stumped.. Has anyone come across this before?
EDIT: Some notes on how I got here..
My first attempt at getting this working was to use the all-inclusive zip file as per the docs... I changed gremlin-server.yaml to;
graph: conf/titan-cassandra-es.properties
That threw;
407 [main] WARN org.apache.tinkerpop.gremlin.server.GremlinServer - Graph [graph] configured at [conf/titan-cassandra-es.properties] could not be instantiated and will not be available in Gremlin Server. GraphFactory message: Configuration must contain a valid 'gremlin.graph' setting
java.lang.RuntimeException: Configuration must contain a valid 'gremlin.graph' setting
Ok, simple google search tells me I need to add this to conf/titan-cassandra-es.properties;
gremlin.graph=com.thinkaurelius.titan.core.TitanFactory
At which point, I get..
484 [main] WARN org.apache.tinkerpop.gremlin.server.GremlinServer - Graph [graph] configured at [conf/titan-cassandra-es.properties] could not be instantiated and will not be available in Gremlin Server. GraphFactory message: GraphFactory could not instantiate this Graph implementation [class com.thinkaurelius.titan.core.TitanFactory]
java.lang.RuntimeException: GraphFactory could not instantiate this Graph implementation [class com.thinkaurelius.titan.core.TitanFactory]
This leads me to believe that I'm missing com.thinkaurelius.titan.core.TitanFactory. Which is curious, since $TITAN_HOME/lib does in fact contain titan-all-1.0.0.jar. So I assumed (perhaps wrongly) that I need to run the titan-all install to make it actually load the jars..
The basic install for Titan is to unzip titan-1.0.0-hadoop1.zip. That is it!
Download it from http://titandb.io
http://s3.thinkaurelius.com/docs/titan/1.0.0/getting-started.html
It is already packaged with the Titan plugins, so you don't need to install them into the Gremlin Console or Gremlin Server.
If you want to try the Titan Server, there is a pre-packaged titan.sh script which automatically starts Cassandra and Elasticsearch with the server.
http://s3.thinkaurelius.com/docs/titan/1.0.0/server.html#_getting_started
For anyone that comes across this strangeness, read the whole stack trace. It turns out waaay at the bottom, it actually had the real issue; it couldn't connect to Cassandra because I had not enabled Thrift.
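Since the decisive line was buried at the end of the trace, it can help to pull out the innermost cause mechanically. Below is a small Python sketch (the function name root_cause is made up for illustration) that returns the text of the last "Caused by:" line in a Java stack trace, falling back to the first line that is not a stack frame. The sample input is a shortened copy of the titan-all trace from the question, not the Cassandra trace, which was not posted.

```python
def root_cause(stack_trace):
    """Return the innermost cause of a Java stack trace, or None if empty."""
    lines = [line.strip() for line in stack_trace.splitlines() if line.strip()]
    causes = [line for line in lines if line.startswith("Caused by:")]
    if causes:
        # The last "Caused by:" entry is the deepest exception in the chain.
        return causes[-1][len("Caused by:"):].strip()
    for line in lines:
        # No cause chain: use the first line that is not a frame or "... N more".
        if not line.startswith("at ") and not line.startswith("..."):
            return line
    return None


if __name__ == "__main__":
    trace = """\
java.lang.RuntimeException: java.io.FileNotFoundException: /usr/share/titan/ext/titan-all/plugin/titan-all-1.0.0.jar (No such file or directory)
at org.apache.tinkerpop.gremlin.server.util.GremlinServerInstall.main(GremlinServerInstall.java:38)
Caused by: java.io.FileNotFoundException: /usr/share/titan/ext/titan-all/plugin/titan-all-1.0.0.jar (No such file or directory)
at java.util.zip.ZipFile.open(Native Method)
... 3 more
"""
    print(root_cause(trace))
```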

Rexster refuses to start with extension but does not display errors

I have a small Rexster/Titan cluster using Cassandra. A Rexster extension is used to query the graph. I did some benchmarking, starting and stopping Rexster/Titan many times. Now I have run into a strange issue: Rexster refuses to start but does not display any error message.
I tried to figure out what is causing this and reduced the cluster to a single node 192.168.0.4.
If I remove my extension Rexster manages to start up.
# console output
Forking Cassandra...
Running `nodetool statusthrift`..... OK
(returned exit status 0 and printed string "running").
Forking Titan + Rexster...
Connecting to Titan + Rexster (127.0.0.1:8184)...... OK
(connected to 127.0.0.1:8184).
Run rexster-console.sh to connect.
but when I place my extension uber JAR in the ext folder Rexster refuses to start.
# console output
Forking Cassandra...
Running `nodetool statusthrift`..... OK
(returned exit status 0 and printed string "running").
Forking Titan + Rexster...
Connecting to Titan + Rexster (127.0.0.1:8184)............................
timeout exceeded (60 seconds): could not connect to 127.0.0.1:8184
See /var/lib/titan/bin/../log/rexstitan.log for Rexster log output.
If I now check rexstitan.log, as suggested by the console output, I can not find any error message.
# rexstitan.log
0 [main] INFO com.tinkerpop.rexster.Application - .:Welcome to Rexster:.
73 [main] INFO com.tinkerpop.rexster.server.RexsterProperties -
Using [/var/lib/titan/rexhome/../conf/rexster-cassandra-cluster.xml]
as configuration source.
78 [main] INFO com.tinkerpop.rexster.Application - Rexster is watching
[/var/lib/titan/rexhome/../conf/rexster-cassandra-cluster.xml] for change.
244 [main] INFO com.netflix.astyanax.connectionpool.impl.ConnectionPoolMBeanManager -
Registering mbean: com.netflix.MonitoredResources:type=ASTYANAX,
name=ClusterTitanConnectionPool,ServiceType=connectionpool
252 [main] INFO com.netflix.astyanax.connectionpool.impl.CountingConnectionPoolMonitor -
AddHost: 192.168.0.4
537 [main] INFO com.netflix.astyanax.connectionpool.impl.ConnectionPoolMBeanManager -
Registering mbean: com.netflix.MonitoredResources:type=ASTYANAX,
name=KeyspaceTitanConnectionPool,ServiceType=connectionpool
538 [main] INFO com.netflix.astyanax.connectionpool.impl.CountingConnectionPoolMonitor -
AddHost: 192.168.0.4
1951 [main] INFO com.thinkaurelius.titan.graphdb.configuration.GraphDatabaseConfiguration -
Set cluster.partition=false from store features
1971 [main] INFO com.thinkaurelius.titan.graphdb.configuration.GraphDatabaseConfiguration -
Set default timestamp provider MICRO
2019 [main] INFO com.thinkaurelius.titan.graphdb.configuration.GraphDatabaseConfiguration -
Generated unique-instance-id=7f0000012902-node1
2045 [main] INFO com.netflix.astyanax.connectionpool.impl.ConnectionPoolMBeanManager -
Registering mbean: com.netflix.MonitoredResources:type=ASTYANAX,
name=ClusterTitanConnectionPool,ServiceType=connectionpool
2046 [main] INFO com.netflix.astyanax.connectionpool.impl.CountingConnectionPoolMonitor -
AddHost: 192.168.0.4
2053 [main] INFO com.netflix.astyanax.connectionpool.impl.ConnectionPoolMBeanManager -
Registering mbean: com.netflix.MonitoredResources:type=ASTYANAX,
name=KeyspaceTitanConnectionPool,ServiceType=connectionpool
2054 [main] INFO com.netflix.astyanax.connectionpool.impl.CountingConnectionPoolMonitor -
AddHost: 192.168.0.4
2228 [main] INFO com.thinkaurelius.titan.diskstorage.Backend -
Initiated backend operations thread pool of size 4
6619 [main] INFO com.thinkaurelius.titan.diskstorage.log.kcvs.KCVSLog -
Loaded unidentified ReadMarker start time Timepoint[1423479705116000 μs]
into com.thinkaurelius.titan.diskstorage.log.kcvs.KCVSLog$MessagePuller@212f3ff1
6625 [main] INFO com.tinkerpop.rexster.RexsterApplicationGraph -
Graph [graph] - configured with allowable namespace [*:*]
The only entry that looks strange to me is the one concerning the log:
6619 [main] INFO com.thinkaurelius.titan.diskstorage.log.kcvs.KCVSLog -
Loaded unidentified ReadMarker start time Timepoint[1423479705116000 μs]
into com.thinkaurelius.titan.diskstorage.log.kcvs.KCVSLog$MessagePuller@212f3ff1
My extension uses the logger for debugging. You can see the instantiation and usage on GitHub: https://github.com/sebschlicht/titan-graphity-kribble/blob/master/src/main/java/de/uniko/sebschlicht/titan/extensions/GraphityExtension.java#L22
Though Rexster failed to start there is a process with the PID displayed in the console but curl fails to connect to Rexster:
$ curl 192.168.0.4:8182
curl: (7) Failed to connect to 192.168.0.4 port 8182: Connection refused
Why doesn't Rexster throw an exception? How can I debug this situation?
edit:
I removed all log messages from my code. I removed all exceptions that may be thrown during startup. Still, Rexster refuses to start with my extension, and the only hint in the log files is the unidentified read marker. I have no clue what prevents Rexster from starting.
The log message is nothing to worry about.
After rebuilding the application step by step in another project, Rexster is now able to start with the extension. During this rebuild I noticed two situations that can cause the behaviour described:
Missing dependency
If your project depends on a second project you might use Maven to inject it as a dependency. However, if you use
mvn clean package
to build the extension's JAR file it does not contain this dependency by default. You need to use a Maven plugin (e.g. maven-shade-plugin) to create a shaded JAR that contains all the dependencies your extension needs. Set the dependency scope to provided for all Titan/Rexster/Blueprints related dependencies. Use the shaded uber-JAR to deploy the extension to Rexster.
However, this was not new to me and should not have caused the problem in my case. There might be more situations that cause this problem or maybe there was a problem with Maven that messed up the shaded JAR. Feel free to browse the commit on github to catch this voodoo.
Missing extension
Another cause of this behaviour is a missing extension.
If you specify an extension in the com.tinkerpop.rexster.extension.RexsterExtension resource file that is not present on startup, Rexster neither logs nor throws an exception; it simply refuses to start.
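Because Rexster stays silent in this case, a quick pre-deployment check of the JAR can save some guessing. The Python sketch below lists every class named in the resource file and reports those without a matching .class entry in the JAR. It assumes the resource file sits under META-INF/services/, the usual java.util.ServiceLoader layout; the function name missing_extensions and the class names in the demo are hypothetical.

```python
import os
import tempfile
import zipfile

# Assumed location of the resource file inside the JAR (standard
# java.util.ServiceLoader layout); adjust if your build puts it elsewhere.
SERVICE_FILE = "META-INF/services/com.tinkerpop.rexster.extension.RexsterExtension"


def missing_extensions(jar_path, service_file=SERVICE_FILE):
    """Return class names listed in the service file that have no .class entry."""
    with zipfile.ZipFile(jar_path) as jar:
        entries = set(jar.namelist())
        if service_file not in entries:
            return []  # nothing registered, so nothing can be missing
        listed = jar.read(service_file).decode("utf-8").splitlines()
    missing = []
    for line in listed:
        name = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if name and name.replace(".", "/") + ".class" not in entries:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Build a throwaway JAR with hypothetical class names to show the check.
    with tempfile.TemporaryDirectory() as tmp:
        jar_path = os.path.join(tmp, "extension.jar")
        with zipfile.ZipFile(jar_path, "w") as jar:
            jar.writestr(
                SERVICE_FILE,
                "com.example.PresentExtension\ncom.example.GoneExtension\n",
            )
            jar.writestr("com/example/PresentExtension.class", b"")
        print(missing_extensions(jar_path))
```

Pointing the function at the shaded uber-JAR before copying it into the ext folder covers both situations described above, since a class dropped by the build shows up the same way as a stale entry in the resource file.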
