Where to find Titan error logs? - cassandra

I'm using Titan with Cassandra, Elasticsearch and Rexster.
Everything is properly set up and I can add/remove nodes and edges to the graph through Rexster as well as the REST API.
When it crashes, I have to kill java and run it again. The error that I get in Rexster is:
Could not get the vertices of graphs from Rexster.
It happens often and I don't know what the problem is. I'm not sure what part of the stack -- Titan, Rexster or Elasticsearch -- fails.
Where can I find a log file that I could look at to find out what the problem is?

I assume that you are using Titan Server distribution. By default there should be a log directory in the root of your titan installation directory. It should contain two files:
cassandra.log - obviously for cassandra
rexstitan.log - Rexster logs. As Rexster hosts Titan, the Titan logging messages should be in here as well.

It also depends of your Titan configuration, whether is remote or embedded. Cassandra logs are usually stored in /var/log/cassandra/. Check there also.

Related

Cassandra JMX need to connect all the nodes

I am trying to get to know Cassandra cfstats information from all the machines using JMX. This can be done using OpsCenter, but I do not want to use it. I started building my own utility. For now, my java program connects to JMX and fetching cfstats information such as estimateKeys, No of SSTables ..etc.
My requirement is, This is a java jar file, will run from one Cassandra node and should be able to connect to all the machines and fetch cfstats using their respective JMX per node.
I am planning to use java driver for this, as java driver will be able to connects all the machines in the cluster using system.peers coumnFamily. Once java driver connect to the machines, I will form the service:jmx:rmi using respective hostname and JMX port(7199). Then I will be able to connect to NodeProbe using this information.
My question is, after connecting to the another node using java driver, will I be able to retain state there and after forming service:jmx:rmi url, will this url really connects to the current node JMX and pull cfstats information from the current node. Because JMX host name it will take from the Cassandra-env.sh file. Can some one please help me in this.
Does this idea works or is there another best way to achieve this?
It's possible to use JMX remotely, but that's not the easiest thing to do.
But if you are writing your tool - maybe it's worth to check out a different connection. E.g. you can easily convert JMX calls to HTTP using Jolokia

Unable to start DSE using SPARK_ENABLED=1

We are running 6 node cluster with:
HADOOP_ENABLED=0
SOLR_ENABLED=0
SPARK_ENABLED=0
CFS_ENABLED=0
Now, we would like to add Spark to all of them. It seems like "adding" is not the right term because this would not fail. Anyways, the steps we've done:
1. drained one of the nodes
2. changed /etc/default/dse to SPARK_ENABLED=1 and HADOOP_ENABLED=0
3. sudo service dse restart
And got the following in the log:
ERROR [main] 2016-05-17 11:51:12,739 CassandraDaemon.java:294 - Fatal exception during initialization
org.apache.cassandra.exceptions.ConfigurationException: Cannot start node if snitch's data center (Analytics) differs from previous data center (Cassandra). Please fix the snitch configuration, decommission and rebootstrap this node or use the flag -Dcassandra.ignore_dc=true.
There are two related questions that have been already answered:
Unable to start solr aspect of DSE search
Two node DSE spark cluster error setting up second node. Why?
Unfortunately, clearing the data on the node is not an option - why would I do that? I need the data to be intact.
Using "-Dcassandra.ignore_rack=true -Dcassandra.ignore_dc=true" is a bit scary in production. I don't understand why DSE wants to create another DC and why can't it just use the existing one?
I know that according to datastax's doc one should partition the load using different DC for different workloads. In our case we just want to run SPARK jobs on the same nodes that Cassandra is running using the same DC.
Is that possible?
Thanks!
The other answers are correct. The issue here is trying to warn you that you have previously identified this node as being in another DC. This means that it probably doesn't have the right data for any key-spaces with Network Topology Strategy. For example if you had a NTS keyspace which had only one replica in "Cassandra" and changed the DC to "Analytics" you could inadvertently lose all of the data.
This warning and the accompanying flag are telling you that you are doing something that you should not be doing in a production cluster.
The real solution to this is to explicitly name your dc's using GossipingFileSnitch and not rely on SimpleSnitch which names based on the DSE workload.
In this case, switch to GPFS and set the DC name to Cassandra.

Script to run Rexster as a daemon in linux

I'm setting up Titan graph database for the first time in a production environment on Debian virtual machines, and I am utilising Rexster to provide the interface into Titan. However after googling around I cannot find any scripts to allow rexster to run as a daemon in the background. As per titan rexster with external cassandra instance I have split off Cassandra, Elasticsearch, and Rexster to start as their own processes. Cassandra and Elasticsearch conveniently have Debian packages that deploy the daemon scripts out of the box, however there is nothing for Rexster. Has anyone made a script that allows Rexster to run as a daemon?
Looking at the rexster.sh script in titan download zip ../$titan_base/bin/ it calls java to start Rexster up, so I'm thinking that some kind of wrapper like JSVC could be used to start it up, unless there is an easier way?
A simple, generic tool to handle this is Daemonize. More details in this post.
If your Debian is new enough to be using Systemd, look into creating a service script. The key commands for using your script would be:
systemctl start rexster.service
systemctl enable rexster.service

titan rexster with external cassandra instance

I have a cassandra cluster (2.1.0) running fine.
After installing titan 5.1, and editing the titan-cassandra.properties to point to cluster hostname list rather than localhost, i run following -
titan.sh -c conf/titan-cassandra.properties start
It is able to recognize running cassandra instance, starts elastic search, but times out while connecting to rexster.
If i run it with local cassandra, everything runs fine using following ->br>
titan.sh start
do i need to make any change in rexster properties to point to running cassandra cluster..
Thanks in advance
Titan Server started by titan.sh represents a quick way to get started with Titan/Rexster/ES. It is designed to simplify running all those things with one startup script. Once you start breaking things apart (e.g. a separate cassandra cluster), you might not want to use titan.sh anymore because, it still forks a cassandra process when it starts up. Presumably, you don't need that anymore, given that you have a separate cassandra cluster.
Given the more advanced nature of your environment, I would simply download Rexster and configure it to connect to your cluster.

How to setup Titan with embedded Cassandra and Rexster

I am trying to setup Titan (server 0.4.4) with Cassandra embedded. My
environment is Windows 8.1 x64 + Cygwin.
The install is in E:\titan-server-0.4.4.
I also need to be able to access this setup via Rexster.
For my configuration, I referred to https://github.com/thinkaurelius/titan/wiki/Using-Cassandra.
I've modified graph configuration
E:\titan-server-0.4.4\conf\rexster-cassandra-es.xml
graph section to
<graph>
<graph-name>graph</graph-name>
<graph-type>com.thinkaurelius.titan.tinkerpop.rexster.TitanGraphConfiguration</graph-type>
<graph-read-only>false</graph-read-only>
<properties>
<auto-type>none</auto-type>
<storage.batch-loading>true</storage.batch-loading>
<storage.cassandra-config-dir>file:///E:\titan-server-0.4.4\conf\cassandra.yaml</storage.cassandra-config-dir>
<storage.backend>embeddedcassandra</storage.backend>
<storage.index.search.backend>elasticsearch</storage.index.search.backend>
<storage.index.search.directory>../db/es</storage.index.search.directory>
<storage.index.search.client-only>false</storage.index.search.client-only>
<storage.index.search.local-mode>true</storage.index.search.local-mode>
</properties>
<extensions>
<allows>
<allow>tp:gremlin</allow>
</allows>
</extensions>
</graph>
(Note
<auto-type>none</auto-type>
<storage.batch-loading>true</storage.batch-loading>
these are to allow bulk insert. The whole idea of embedded Cassandra is to improve the insertion performance.)
However, when I tried starting the service with ./bin/titan.sh -v start, the start failed with:
org.apache.cassandra.exceptions.ConfigurationException:
localhost/127.0.0.1:7000 is in use by another process. Change
listen_address:storage_port in cassandra.yaml to values that do not
conflict with other services
at org.apache.cassandra.net.MessagingService.getServerSocket(MessagingService.java:439)
at org.apache.cassandra.net.MessagingService.listen(MessagingService.java:387)
at org.apache.cassandra.service.StorageService.joinTokenRing(StorageService.java:549)
at org.apache.cassandra.service.StorageService.initServer(StorageService.java:514)
at org.apache.cassandra.service.StorageService.initServer(StorageService.java:411)
at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:278)
at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:366)
at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:409)
at com.thinkaurelius.titan.diskstorage.cassandra.utils.CassandraDaemonWrapper.start(CassandraDaemonWrapper.java:51)
at com.thinkaurelius.titan.diskstorage.cassandra.embedded.CassandraEmbeddedStoreManager.(CassandraEmbeddedStoreManager.java:102)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:408)
at com.thinkaurelius.titan.diskstorage.Backend.instantiate(Backend.java:344)
at com.thinkaurelius.titan.diskstorage.Backend.getImplementationClass(Backend.java:367)
at com.thinkaurelius.titan.diskstorage.Backend.getStorageManager(Backend.java:311)
at com.thinkaurelius.titan.diskstorage.Backend.(Backend.java:121)
at com.thinkaurelius.titan.graphdb.configuration.GraphDatabaseConfiguration.getBackend(GraphDatabaseConfiguration.java:1173)
at com.thinkaurelius.titan.graphdb.database.StandardTitanGraph.(StandardTitanGraph.java:75)
at com.thinkaurelius.titan.core.TitanFactory.open(TitanFactory.java:40)
at com.thinkaurelius.titan.tinkerpop.rexster.TitanGraphConfiguration.configureGraphInstance(TitanGraphConfiguration.java:25)
at com.tinkerpop.rexster.config.GraphConfigurationContainer.getGraphFromConfiguration(GraphConfigurationContainer.java:119)
at com.tinkerpop.rexster.config.GraphConfigurationContainer.(GraphConfigurationContainer.java:54)
at com.tinkerpop.rexster.server.XmlRexsterApplication.reconfigure(XmlRexsterApplication.java:99)
at com.tinkerpop.rexster.server.XmlRexsterApplication.(XmlRexsterApplication.java:47)
at com.tinkerpop.rexster.Application.(Application.java:96)
at com.tinkerpop.rexster.Application.main(Application.java:188)
localhost/127.0.0.1:7000 is in use by another process. Change
listen_address:storage_port in cassandra.yaml to values that do not
conflict with other services
I tried mofiying the ports in "E:\titan-server-0.4.4\conf\cassandra.yaml", but after some investigation, I've realized that the port is actually taken by Cassandra itself, i.e. in this configuration, ./bin/titan.sh -v start tries to start multiple instances of Cassandra?!
I copied cassandra.yaml to cassandra2.yaml with different port settings and specified path to cassandra2.yaml in the graph configuration xml.
After this, I was able to start Rexster with Titan and Cassandra embedded by running ./bin/titan.sh -v start.
However, I strongly believe that something is wrong with this setup. Besides, the system does not behave well - sometime I cannot save a graph in Rexster's (Web based) Gremlin shell by using g.commit() - the command succeeds, but nothing gets saved.
So is the right way to run Titan with Cassandra embedded? What is the configuration supposed to be?
If you use Titan server via the shell or bat script, it will automatically start a Titan instance for you and attempt to connect to it over localhost.
When you configured it to use Cassandra embedded, the two instances naturally conflict.
Is there a particular reason you want to use Cassandra embedded. I'd strongly encourage you to try the out-of-the-box version first. Cassandra embedded is mostly meant for low latency applications and requires a solid understanding of the JVM.
Good luck!

Resources