Cannot Start JanusGraph as per documentation - tinkerpop3

I am following the JanusGraph documentation and extracted the distribution as described. Running
./gremlin.sh
works fine and starts the Gremlin Console prompt.
This code also works fine:
graph = JanusGraphFactory.open('inmemory')
g = graph.traversal()
Problem
When I run the following, I get a huge stack trace:
graph = JanusGraphFactory.open('conf/janusgraph-berkeleyje-es.properties')
gremlin> graph = JanusGraphFactory.open('conf/janusgraph-berkeleyje-es.properties')
12:15:49 WARN org.janusgraph.diskstorage.es.rest.RestElasticSearchClient - Unable to determine Elasticsearch server version. Default to FIVE.
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
at org.apache.http.impl.nio.reactor.DefaultConnectingIOReactor.processEvent(DefaultConnectingIOReactor.java:171)
at org.apache.http.impl.nio.reactor.DefaultConnectingIOReactor.processEvents(DefaultConnectingIOReactor.java:145)
at org.apache.http.impl.nio.reactor.AbstractMultiworkerIOReactor.execute(AbstractMultiworkerIOReactor.java:348)
at org.apache.http.impl.nio.conn.PoolingNHttpClientConnectionManager.execute(PoolingNHttpClientConnectionManager.java:192)
at org.apache.http.impl.nio.client.CloseableHttpAsyncClientBase$1.run(CloseableHttpAsyncClientBase.java:64)
at java.lang.Thread.run(Thread.java:748)
Could not instantiate implementation: org.janusgraph.diskstorage.es.ElasticSearchIndex
Type ':help' or ':h' for help.
Display stack trace? [yN]

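The java.net.ConnectException: Connection refused in the log means nothing is listening where JanusGraph expects Elasticsearch. As of JanusGraph 0.2.0, you cannot run Elasticsearch as an embedded node, so you need to start an Elasticsearch node yourself. You could download and deploy your own Elasticsearch node, or you could take advantage of the JanusGraph pre-packaged distribution: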
bin/janusgraph.sh start
This command will start up one Cassandra node, one Elasticsearch node, and a Gremlin Server. Note that you will need the Java Development Kit (JDK) to run this command.
Alternatively, you could start only the Elasticsearch node from the JanusGraph pre-packaged distribution:
elasticsearch/bin/elasticsearch
which will start Elasticsearch in the foreground.
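Before calling JanusGraphFactory.open(...) with the -es properties file, you can check whether anything is listening where Elasticsearch is expected. This is a minimal JDK-only sketch: the class name EsProbe is hypothetical, and the defaults (host 127.0.0.1, port 9200, Elasticsearch's default HTTP port) are assumptions you should match to the index settings in your properties file. It only tests that the TCP port accepts connections, not that the server behind it is a healthy Elasticsearch.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

/** Minimal TCP reachability probe (hypothetical helper, not part of JanusGraph). */
class EsProbe {

    /** Returns true if a TCP connection to host:port succeeds within timeoutMs. */
    static boolean isReachable(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            // ConnectException ("Connection refused"), timeouts, unknown hosts, ...
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumption: Elasticsearch on its default HTTP port 9200 on localhost.
        String host = args.length > 0 ? args[0] : "127.0.0.1";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 9200;
        if (isReachable(host, port, 2000)) {
            System.out.println("Something is listening at " + host + ":" + port);
        } else {
            System.out.println("Nothing listening at " + host + ":" + port
                    + "; start Elasticsearch before opening the graph");
        }
    }
}
```

If this reports nothing listening, JanusGraphFactory.open(...) will fail the same way the console session above did.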

Related

Using Titan DynamoDB on AWS and querying from NodeJs

I've read most of their documentation and looked into TinkerPop. I've tried setting up Docker instances and EC2 instances using the AWS CloudFormation template they recommend for Titan 1.0.0, but still can't work it out.
I can start the Titan database, connect Gremlin to it and run queries, but how do I use it from Node.js? Since the upgrade to 1.0.0, the documentation doesn't seem to explain this well. As far as I'm aware, Rexster is gone and was replaced by Gremlin Server, but I still can't find anything on working with it remotely.
I'm really tempted to drop it and move over to Neo4j, but I don't want to be bound to a single machine; I want the scalability that Titan allows. I've managed to get older versions of Titan working with Rexster, but I need to get the new version running.
Can anyone explain what I need to do, or whether it's perhaps broken? Or just point me in the right direction?
Thanks
Gremlin Server is the replacement for Rexster in TinkerPop3, which Titan 1.0 uses. The Gremlin Server documentation has far more detail on configuration than the Titan docs.
In titan-1.0.0-hadoop1/conf/gremlin-server/gremlin-server.yaml you can find the configuration settings for the server. Out of the box, it uses WebSockets and a BerkeleyDB backend. You can update those settings to match your setup. For example, here's a Titan server configuration for Cassandra and Elasticsearch. If you plan to connect to it from a different computer, make sure to update the host property.
Start the server with bin/gremlin-server.sh conf/gremlin-server/gremlin-server.yaml. You can then connect to it remotely: as described in the TinkerPop docs, you can connect with a Gremlin Console and issue commands to the remote server:
gremlin> :remote connect tinkerpop.server conf/remote.yaml
gremlin> :> g.V().values('name')
To use it from Node.js, you can use this WebSocket Gremlin client. You can find client libraries for other languages on the TinkerPop homepage.
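Client libraries such as the Node.js one talk to Gremlin Server by sending request messages over the WebSocket. To illustrate what such a message looks like, here is a sketch that builds the JSON for an "eval" request equivalent to the :> g.V().values('name') above. The field layout (requestId, op, processor, args.gremlin, args.language) follows the TinkerPop 3 Gremlin Server request format; check the Gremlin Server documentation for the exact format and serializer your server version expects. The class name is hypothetical, and the code only builds the string; it does not open a connection.

```java
import java.util.UUID;

/** Builds a Gremlin Server "eval" request message (sketch; hypothetical class name). */
class GremlinRequest {

    /** Escapes a string for inclusion in a JSON string literal. */
    static String jsonEscape(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        // Other control characters become a 4-digit hex escape.
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.toString();
    }

    /** Returns the JSON body asking the server to evaluate a Gremlin script. */
    static String evalRequest(UUID requestId, String script) {
        return "{\"requestId\":\"" + requestId + "\","
                + "\"op\":\"eval\","
                + "\"processor\":\"\","
                + "\"args\":{\"gremlin\":\"" + jsonEscape(script) + "\","
                + "\"language\":\"gremlin-groovy\"}}";
    }

    public static void main(String[] args) {
        System.out.println(evalRequest(UUID.randomUUID(), "g.V().values('name')"));
    }
}
```

In practice you would let the client library build and send these messages; the sketch is only meant to make the protocol less of a black box when debugging.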

Where to find Titan error logs?

I'm using Titan with Cassandra, Elasticsearch and Rexster.
Everything is properly set up and I can add/remove nodes and edges to the graph through Rexster as well as the REST API.
When it crashes, I have to kill the java process and start it again. The error I get in Rexster is:
Could not get the vertices of graphs from Rexster.
It happens often and I don't know what the problem is. I'm not sure which part of the stack (Titan, Rexster or Elasticsearch) fails.
Where can I find a log file that I could look at to find out what the problem is?
I assume you are using the Titan Server distribution. By default there should be a log directory in the root of your Titan installation directory. It should contain two files:
cassandra.log - the Cassandra log
rexstitan.log - the Rexster log. As Rexster hosts Titan, the Titan logging messages should be in here as well.
It also depends on your Titan configuration, i.e. whether Cassandra is remote or embedded. Cassandra logs are usually stored in /var/log/cassandra/, so check there as well.
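After a crash, the useful part of a log is usually its end. A minimal JDK-only sketch for printing the last lines of a log file: the class name is hypothetical, the default path log/rexstitan.log assumes you run it from the Titan Server install directory as described above, and you can pass another path (for example a file under /var/log/cassandra/). It reads the whole file into memory, which is fine for typical log sizes.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

/** Prints the last lines of a Titan/Rexster log (hypothetical helper). */
class LogTail {

    /** Returns the last n elements of lines (all of them if there are fewer than n). */
    static List<String> tail(List<String> lines, int n) {
        if (n <= 0) {
            return new ArrayList<>();
        }
        int from = Math.max(0, lines.size() - n);
        return new ArrayList<>(lines.subList(from, lines.size()));
    }

    public static void main(String[] args) throws IOException {
        // Assumed default location relative to the Titan Server install directory.
        Path log = Paths.get(args.length > 0 ? args[0] : "log/rexstitan.log");
        // ISO-8859-1 maps every byte to a char, so odd bytes in the log never cause a decode error.
        List<String> lines = Files.readAllLines(log, StandardCharsets.ISO_8859_1);
        for (String line : tail(lines, 50)) {
            System.out.println(line);
        }
    }
}
```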

How to setup Titan with embedded Cassandra and Rexster

I am trying to set up Titan (server 0.4.4) with embedded Cassandra. My environment is Windows 8.1 x64 + Cygwin.
The install is in E:\titan-server-0.4.4.
I also need to be able to access this setup via Rexster.
For my configuration, I referred to https://github.com/thinkaurelius/titan/wiki/Using-Cassandra.
I've modified the graph section of the graph configuration file
E:\titan-server-0.4.4\conf\rexster-cassandra-es.xml
to:
<graph>
  <graph-name>graph</graph-name>
  <graph-type>com.thinkaurelius.titan.tinkerpop.rexster.TitanGraphConfiguration</graph-type>
  <graph-read-only>false</graph-read-only>
  <properties>
    <auto-type>none</auto-type>
    <storage.batch-loading>true</storage.batch-loading>
    <storage.cassandra-config-dir>file:///E:\titan-server-0.4.4\conf\cassandra.yaml</storage.cassandra-config-dir>
    <storage.backend>embeddedcassandra</storage.backend>
    <storage.index.search.backend>elasticsearch</storage.index.search.backend>
    <storage.index.search.directory>../db/es</storage.index.search.directory>
    <storage.index.search.client-only>false</storage.index.search.client-only>
    <storage.index.search.local-mode>true</storage.index.search.local-mode>
  </properties>
  <extensions>
    <allows>
      <allow>tp:gremlin</allow>
    </allows>
  </extensions>
</graph>
(Note: the settings <auto-type>none</auto-type> and <storage.batch-loading>true</storage.batch-loading> are there to allow bulk inserts. The whole point of embedded Cassandra is to improve insertion performance.)
However, when I tried starting the service with ./bin/titan.sh -v start, the start failed with:
org.apache.cassandra.exceptions.ConfigurationException: localhost/127.0.0.1:7000 is in use by another process. Change listen_address:storage_port in cassandra.yaml to values that do not conflict with other services
at org.apache.cassandra.net.MessagingService.getServerSocket(MessagingService.java:439)
at org.apache.cassandra.net.MessagingService.listen(MessagingService.java:387)
at org.apache.cassandra.service.StorageService.joinTokenRing(StorageService.java:549)
at org.apache.cassandra.service.StorageService.initServer(StorageService.java:514)
at org.apache.cassandra.service.StorageService.initServer(StorageService.java:411)
at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:278)
at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:366)
at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:409)
at com.thinkaurelius.titan.diskstorage.cassandra.utils.CassandraDaemonWrapper.start(CassandraDaemonWrapper.java:51)
at com.thinkaurelius.titan.diskstorage.cassandra.embedded.CassandraEmbeddedStoreManager.<init>(CassandraEmbeddedStoreManager.java:102)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:408)
at com.thinkaurelius.titan.diskstorage.Backend.instantiate(Backend.java:344)
at com.thinkaurelius.titan.diskstorage.Backend.getImplementationClass(Backend.java:367)
at com.thinkaurelius.titan.diskstorage.Backend.getStorageManager(Backend.java:311)
at com.thinkaurelius.titan.diskstorage.Backend.<init>(Backend.java:121)
at com.thinkaurelius.titan.graphdb.configuration.GraphDatabaseConfiguration.getBackend(GraphDatabaseConfiguration.java:1173)
at com.thinkaurelius.titan.graphdb.database.StandardTitanGraph.<init>(StandardTitanGraph.java:75)
at com.thinkaurelius.titan.core.TitanFactory.open(TitanFactory.java:40)
at com.thinkaurelius.titan.tinkerpop.rexster.TitanGraphConfiguration.configureGraphInstance(TitanGraphConfiguration.java:25)
at com.tinkerpop.rexster.config.GraphConfigurationContainer.getGraphFromConfiguration(GraphConfigurationContainer.java:119)
at com.tinkerpop.rexster.config.GraphConfigurationContainer.<init>(GraphConfigurationContainer.java:54)
at com.tinkerpop.rexster.server.XmlRexsterApplication.reconfigure(XmlRexsterApplication.java:99)
at com.tinkerpop.rexster.server.XmlRexsterApplication.<init>(XmlRexsterApplication.java:47)
at com.tinkerpop.rexster.Application.<init>(Application.java:96)
at com.tinkerpop.rexster.Application.main(Application.java:188)
localhost/127.0.0.1:7000 is in use by another process. Change listen_address:storage_port in cassandra.yaml to values that do not conflict with other services
I tried modifying the ports in "E:\titan-server-0.4.4\conf\cassandra.yaml", but after some investigation I realized that the port is actually taken by Cassandra itself; that is, in this configuration ./bin/titan.sh -v start tries to start multiple instances of Cassandra?!
I copied cassandra.yaml to cassandra2.yaml with different port settings and specified the path to cassandra2.yaml in the graph configuration XML.
After this, I was able to start Rexster with Titan and Cassandra embedded by running ./bin/titan.sh -v start.
However, I strongly believe that something is wrong with this setup. Besides, the system does not behave well: sometimes I cannot save a graph in Rexster's (web-based) Gremlin shell using g.commit(). The command succeeds, but nothing gets saved.
So what is the right way to run Titan with embedded Cassandra? What is the configuration supposed to be?
If you use Titan Server via the shell or bat script, it will automatically start a Cassandra instance for you, and Titan will attempt to connect to it over localhost.
When you configure Titan to use embedded Cassandra as well, the two Cassandra instances naturally conflict.
Is there a particular reason you want to use embedded Cassandra? I'd strongly encourage you to try the out-of-the-box setup first. Embedded Cassandra is mostly meant for low-latency applications and requires a solid understanding of the JVM.
Good luck!
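To see the conflict for yourself, you can check whether the storage port from the error message is already taken before and after ./bin/titan.sh start. A minimal JDK-only sketch: the class name is hypothetical, and it only tests 127.0.0.1, matching the listen_address in the error above. A failed bind usually means another process (here, most likely the Cassandra instance the script launched) holds the port, though it can also fail for other reasons such as permissions.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;

/** Checks whether a local port can still be bound (hypothetical helper). */
class PortCheck {

    /** Returns true if a server socket can be bound to 127.0.0.1:port right now. */
    static boolean isPortFree(int port) {
        try (ServerSocket probe = new ServerSocket(port, 1, InetAddress.getByName("127.0.0.1"))) {
            return true;   // bind succeeded; the socket is closed again immediately
        } catch (IOException e) {
            return false;  // typically java.net.BindException: Address already in use
        }
    }

    public static void main(String[] args) {
        // 7000 is the storage_port from the error message above.
        int port = args.length > 0 ? Integer.parseInt(args[0]) : 7000;
        System.out.println("127.0.0.1:" + port + (isPortFree(port) ? " is free" : " is already in use"));
    }
}
```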

org.jboss.netty.channel.ChannelPipelineException: Failed to initialize a pipeline

I have an application that connects to Cassandra using the Java Driver, fetches some configuration and, based on the results, generates and executes some Pig scripts.
I can connect to Cassandra successfully when the jars required for Pig are not on the classpath. Similarly, I can launch the PigServer class and execute scripts/statements using the entire DSE stack when I am not connecting to Cassandra with the Java driver to retrieve the configuration.
When I use both, I get the following exception:
org.jboss.netty.channel.ChannelPipelineException: Failed to initialize a pipeline.
at org.springframework.beans.factory.support.SimpleInstantiationStrategy.instantiate(SimpleInstantiationStrategy.java:181)
at org.springframework.beans.factory.support.ConstructorResolver.instantiateUsingFactoryMethod(ConstructorResolver.java:570)
... 35 more
Caused by: org.jboss.netty.channel.ChannelPipelineException: Failed to initialize a pipeline.
at org.jboss.netty.bootstrap.ClientBootstrap.connect(ClientBootstrap.java:208)
at org.jboss.netty.bootstrap.ClientBootstrap.connect(ClientBootstrap.java:182)
at com.datastax.driver.core.Connection.<init>(Connection.java:100)
at com.datastax.driver.core.Connection.<init>(Connection.java:51)
at com.datastax.driver.core.Connection$Factory.open(Connection.java:376)
at com.datastax.driver.core.ControlConnection.tryConnect(ControlConnection.java:207)
at com.datastax.driver.core.ControlConnection.reconnectInternal(ControlConnection.java:170)
at com.datastax.driver.core.ControlConnection.connect(ControlConnection.java:87)
at com.datastax.driver.core.Cluster$Manager.init(Cluster.java:576)
at com.datastax.driver.core.Cluster$Manager.access$100(Cluster.java:520)
at com.datastax.driver.core.Cluster.<init>(Cluster.java:67)
at com.datastax.driver.core.Cluster.buildFrom(Cluster.java:94)
at com.datastax.driver.core.Cluster$Builder.build(Cluster.java:501)
I see others have hit a similar exception, but when executing Cassandra statements from MapReduce tasks, which is not my case:
https://groups.google.com/a/lists.datastax.com/forum/#!topic/java-driver-user/FhW_8e4FyAI
http://www.datastax.com/dev/blog/the-native-cql-java-driver-goes-ga#comment-297187
Thanks!
The DSE stack connects to Cassandra through the Thrift API, which is different from the Cassandra Java Driver.
You can't use the Cassandra Java driver for Pig/Hadoop until CASSANDRA-6311 is resolved.
If you are using certificates, the problem may also be a bad or expired security certificate.
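When a failure appears only once two sets of jars share one classpath, as here, it can also help to check whether conflicting copies of the same library are present. A JDK-only sketch (hypothetical class name) that lists every classpath location providing a class; the default is the Netty 3 class org.jboss.netty.bootstrap.ClientBootstrap from the stack trace above. Several different jars in the output are a hint of a version conflict, not proof; with the usual classpath ordering, the first copy listed is normally the one that gets loaded.

```java
import java.io.IOException;
import java.net.URL;
import java.util.Collections;
import java.util.List;

/** Lists every classpath location that provides a given class (hypothetical helper). */
class WhereIs {

    /** Returns the URLs of all copies of className visible to the given class loader. */
    static List<URL> findAll(String className, ClassLoader loader) throws IOException {
        String resource = className.replace('.', '/') + ".class";
        return Collections.list(loader.getResources(resource));
    }

    public static void main(String[] args) throws IOException {
        // Default: a Netty 3 class from the stack trace above.
        String className = args.length > 0 ? args[0] : "org.jboss.netty.bootstrap.ClientBootstrap";
        List<URL> copies = findAll(className, ClassLoader.getSystemClassLoader());
        if (copies.isEmpty()) {
            System.out.println(className + " is not on the classpath");
        }
        for (URL url : copies) {
            System.out.println(url);
        }
        if (copies.size() > 1) {
            System.out.println("WARNING: " + copies.size() + " copies of " + className + " found");
        }
    }
}
```

Run it with the same classpath as the failing application (e.g. java -cp <your classpath> WhereIs).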

Cassandra DataStax Java driver cannot connect to server: "No handler set for stream"

If I create a new project with this code:
cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
it works.
But if I copy all the jars from this project into my own project, the same code doesn't work and it says:
13/07/01 16:27:16 ERROR core.Connection: [/127.0.0.1-1] No handler set for stream 1 (this is a bug, either of this driver or of Cassandra, you should report it)
com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: [/127.0.0.1])
What version of Cassandra are you running? Have you enabled the native protocol (start_native_transport: true) in your cassandra.yaml?
In Cassandra 1.2.0-1.2.4 the native protocol was disabled by default, but in 1.2.5+ it's on by default.
See https://github.com/apache/cassandra/blob/cassandra-1.2.5/conf/cassandra.yaml#L335
That's the most common reason I've seen for not being able to connect with the driver.
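To check the setting without reading through the whole file, here is a small sketch that scans cassandra.yaml for start_native_transport, the 1.2 setting that turns the native protocol on (its port is native_transport_port, 9042 by default). It is JDK-only with a hypothetical class name; the default path conf/cassandra.yaml is an assumption, so pass the file your node actually uses. It is a plain line scan of top-level keys, not a YAML parser.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

/** Reports the start_native_transport setting from a cassandra.yaml (hypothetical helper). */
class NativeTransportCheck {

    private static final String KEY = "start_native_transport:";

    /**
     * Returns TRUE/FALSE for a top-level "start_native_transport:" line,
     * or null if the setting is absent or commented out.
     */
    static Boolean nativeTransportEnabled(List<String> lines) {
        for (String raw : lines) {
            String line = raw;
            int hash = line.indexOf('#');
            if (hash >= 0) {
                line = line.substring(0, hash);   // drop comments
            }
            if (line.startsWith(KEY)) {
                String value = line.substring(KEY.length()).trim();
                return Boolean.valueOf(value);    // "true" (any case) -> TRUE, anything else -> FALSE
            }
        }
        return null;
    }

    public static void main(String[] args) throws IOException {
        // Assumption: path to the cassandra.yaml your node actually uses.
        List<String> lines = Files.readAllLines(
                Paths.get(args.length > 0 ? args[0] : "conf/cassandra.yaml"), StandardCharsets.ISO_8859_1);
        Boolean enabled = nativeTransportEnabled(lines);
        if (enabled == null) {
            System.out.println("start_native_transport is not set in this file");
        } else {
            System.out.println("start_native_transport: " + enabled);
        }
    }
}
```

Remember to restart Cassandra after changing the setting.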
