Using Titan DynamoDB on AWS and querying from Node.js

I've read most of their documentation and looked into TinkerPop. I've tried setting up Docker instances and EC2 instances using the AWS CloudFormation template they recommend for Titan 1.0.0, but I still can't work it out.
I can start the Titan database, connect Gremlin to it and run queries, but how do I use it from Node.js? Since the upgrade to 1.0.0 the documentation doesn't seem to explain this very well. As far as I'm aware, Rexster is gone and was replaced by Gremlin Server, but I still can't find anything on working with it remotely.
I'm really tempted to drop it and move over to Neo4j, but I don't want to be bound to a single machine; I want the scalability that Titan allows. I've managed to get older versions of Titan working with Rexster, but I need to get the new version running.
Can anyone explain what I need to do, or whether it's perhaps broken? Or just point me in the right direction?
Thanks

Gremlin Server is the replacement for Rexster in TinkerPop3, which Titan 1.0 uses. The Gremlin Server documentation covers configuration in much more detail than the Titan docs.
The server's configuration settings are in titan-1.0.0-hadoop1/conf/gremlin-server/gremlin-server.yaml. Out of the box, it uses WebSockets and a BerkeleyDB backend. You can update those settings to match your setup. For example, here's a Titan server configuration for Cassandra and Elasticsearch. If you plan to connect to it from a different computer, make sure to update the host property.
Start the server with bin/gremlin-server.sh conf/gremlin-server/gremlin-server.yaml, then connect to it with a remote connection. As described in the TinkerPop docs, you can connect with the Gremlin Console and then issue commands to the remote server:
gremlin> :remote connect tinkerpop.server conf/remote.yaml
gremlin> :> g.V().values('name')
From Node, you can use this WebSockets Gremlin client. You can find client libraries for other languages on the TinkerPop homepage.
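If you'd rather see what such a client does on the wire, here is a minimal TypeScript sketch of the JSON messages, based on my reading of the TinkerPop 3 driver documentation: an "eval" request carries a requestId, an op, a processor and the script in args, and the server may answer in several frames (status 206 for partial content, 200 for the last one). The helper names newRequestId, buildEvalRequest and collectResults are made up for illustration, and the actual WebSocket transport is left out; a client library handles all of this for you.

```typescript
// Shape of a Gremlin Server request (per the TinkerPop 3 driver docs).
interface GremlinRequest {
  requestId: string;
  op: string;
  processor: string;
  args: {
    gremlin: string;
    bindings: Record<string, unknown>;
    language: string;
  };
}

// Shape of one response frame; a single request can yield several frames.
interface GremlinResponse {
  requestId: string;
  status: { code: number; message: string };
  result: { data: unknown };
}

// Hypothetical helper: random UUID-shaped id (use a real UUID library in practice).
function newRequestId(): string {
  return "xxxxxxxx-xxxx-4xxx-yxxx-xxxxxxxxxxxx".replace(/[xy]/g, (c: string): string => {
    const r = Math.floor(Math.random() * 16);
    const v = c === "x" ? r : (r & 0x3) | 0x8;
    return v.toString(16);
  });
}

// Hypothetical helper: build the "eval" message for a Gremlin script.
function buildEvalRequest(
  gremlin: string,
  bindings: Record<string, unknown> = {},
  requestId: string = newRequestId()
): GremlinRequest {
  return {
    requestId,
    op: "eval",
    processor: "", // empty string selects the default processor
    args: { gremlin, bindings, language: "gremlin-groovy" },
  };
}

// Hypothetical helper: fold the response frames of one request into a result list.
// 206 = partial content (more frames follow), 200 = final frame, 204 = no content.
function collectResults(frames: GremlinResponse[]): unknown[] {
  const out: unknown[] = [];
  for (const frame of frames) {
    const code = frame.status.code;
    if (code !== 200 && code !== 204 && code !== 206) {
      throw new Error(`Gremlin Server error ${code}: ${frame.status.message}`);
    }
    const data = frame.result.data;
    if (Array.isArray(data)) {
      for (const item of data) {
        out.push(item);
      }
    }
    if (code !== 206) {
      break; // last frame for this request
    }
  }
  return out;
}

// The text that would be sent as one WebSocket frame.
const request = buildEvalRequest("g.V().values('name')");
const payload: string = JSON.stringify(request);
```

Here payload is the text you would send as a single WebSocket frame to the server's host and port (8182 by default, as far as I know), and collectResults would be fed the parsed frames that come back with the same requestId.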

Related

Can't use Tableau on an EMR Spark cluster

I have a client that wants to use Tableau on their EMR Spark cluster.
The documentation seems straightforward but I'm getting errors when I try to connect.
Here is the setup:
The EMR cluster's master doesn't have a public IP, but from the Tableau Desktop EC2 instance I can ping it and telnet to port 10001, where Thrift is running
I can test Thrift with Beeline and it connects fine
I am not using SSL or authentication, given the limited access to the cluster
I have installed both DataDirect 8.0 and the Simba ODBC driver
I'm using emr-5.13.0, the Hadoop distribution is Amazon 2.8.3 and the Spark version is 2.3.0.
The error is
Unable to connect to the ODBC Data Source. Check that the necessary drivers are installed and that the connection properties are valid.
[Simba][ThriftExtension] (5) Error occurred while contacting server: No more data to read.. This could be because you are trying to establish a non-SSL connection to an SSL-enabled server.
Unable to connect to the server "IP". Check that the server is running and that you have access privileges to the requested database.
I simply followed the documentation provided by Tableau, which says to install the driver only (and not touch the ODBC configuration), then use it in Tableau. I verified that SSL and authentication were both turned off before trying to connect. I also verified connectivity by running a query with DataGrip from the Tableau EC2 instance, which works as expected.
I resolved the issue by ignoring the documentation and setting up the ODBC driver directly, then choosing it as the source instead of Spark SQL.

Gremlin-Server Cassandra

I am starting to work with Titan and I am using Cassandra as the backend store.
When I run titan.sh, Cassandra and Elasticsearch start but Gremlin Server does not.
Looking at titan.sh, I see that it starts Gremlin Server with conf/gremlin-server/gremlin-server.yaml. The problem is that gremlin-server.yaml is configured like this:
graphs: {
graph: conf/gremlin-server/titan-berkeleyje-server.properties}
That uses BerkeleyDB. I have not seen a Cassandra YAML for Gremlin Server.
How can I configure it for Cassandra?
Thanks
A fix from Stephen has been checked in to address this: https://github.com/thinkaurelius/titan/commit/89c0a2b30e798a13e098949c219730b228bcc82a

Titan Rexster with external Cassandra instance

I have a Cassandra cluster (2.1.0) running fine.
After installing Titan 5.1 and editing titan-cassandra.properties to point to the cluster's hostname list rather than localhost, I run the following:
titan.sh -c conf/titan-cassandra.properties start
It recognizes the running Cassandra instance and starts Elasticsearch, but times out while connecting to Rexster.
If I run it with a local Cassandra, everything works fine using the following:
titan.sh start
Do I need to change anything in the Rexster properties to point to the running Cassandra cluster?
Thanks in advance
The Titan Server started by titan.sh is a quick way to get started with Titan/Rexster/ES; it is designed to simplify running all of them with one startup script. Once you start breaking things apart (e.g. a separate Cassandra cluster), you might not want to use titan.sh anymore, because it still forks a Cassandra process when it starts up. Presumably you don't need that, given that you have a separate Cassandra cluster.
Given the more advanced nature of your environment, I would simply download Rexster and configure it to connect to your cluster.

Where to find Titan error logs?

I'm using Titan with Cassandra, Elasticsearch and Rexster.
Everything is properly set up and I can add/remove nodes and edges to the graph through Rexster as well as the REST API.
When it crashes, I have to kill java and run it again. The error that I get in Rexster is:
Could not get the vertices of graphs from Rexster.
It happens often and I don't know what the problem is. I'm not sure what part of the stack -- Titan, Rexster or Elasticsearch -- fails.
Where can I find a log file that I could look at to find out what the problem is?
I assume that you are using the Titan Server distribution. By default there should be a log directory in the root of your Titan installation directory. It should contain two files:
cassandra.log - obviously for cassandra
rexstitan.log - Rexster logs. As Rexster hosts Titan, the Titan logging messages should be in here as well.
It also depends on your Titan configuration, whether it is remote or embedded. Cassandra logs are usually stored in /var/log/cassandra/, so check there as well.

Connecting to Cassandra Cluster instead of specific node

I am trying to learn Cassandra and have set up a 2-node Cassandra cluster. I have written a client in Java using the Cassandra JDBC driver, which currently connects to a single hard-coded node in the cluster. Ideally, I would like my client to connect to the "cluster" rather than a specific node,
so that the client code automatically connects to the other node if the first node is down.
Is this possible using the Cassandra JDBC driver? I currently use the code below to create the connection:
DriverManager.getConnection("jdbc:cassandra://localhost:9160/testdb");
Yes. If you're using the DataStax Java driver, you can get all of these benefits and more. From the documentation:
The driver has the following features:
connection pooling
node discovery
automatic failover
load balancing
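To make "automatic failover" concrete, here is a minimal sketch of the idea in TypeScript: give the client several contact points and fall through to the next one when a node is unreachable. This is not the DataStax driver's implementation or API; connectWithFailover and the connect callback are hypothetical stand-ins, and a real driver adds node discovery, pooling and load balancing on top.

```typescript
// Hypothetical helper: try each contact point in order and return the first
// connection that succeeds. `connect` stands in for a real driver call and
// is expected to throw when the host cannot be reached.
function connectWithFailover<T>(
  contactPoints: string[],
  connect: (host: string) => T
): { host: string; connection: T } {
  const failures: string[] = [];
  for (const host of contactPoints) {
    try {
      return { host, connection: connect(host) };
    } catch (err) {
      // Remember why this host failed, then move on to the next one.
      failures.push(`${host}: ${err instanceof Error ? err.message : String(err)}`);
    }
  }
  throw new Error(`No contact point reachable (${failures.join("; ")})`);
}

// Usage with a fake connector: node1 is "down", node2 answers.
const fakeConnect = (host: string): string => {
  if (host === "node1") {
    throw new Error("connection refused");
  }
  return `session@${host}`;
};
const session = connectWithFailover(["node1", "node2"], fakeConnect);
```

With the fake connector above, the first host fails and the call falls through to the second. The point for the original question is that the client should be given more than one node address instead of a single hard-coded one.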
Which language are you using? If it's Java, I suggest the Hector framework.
http://hector-client.github.io/hector/build/html/index.html
I think it works very well for communicating with a Cassandra database.
