Not loading data from Titan graph with Cassandra backend using Gremlin

I have added data to Titan (Cassandra backend) using the Blueprints API with Java. I used the following configuration in Java for inserting data:
TitanGraph getTitanGraph()
{
    conf2 = new BaseConfiguration();
    conf2.setProperty("storage.backend", "cassandra");
    conf2.setProperty("storage.directory", "/some/directory");
    conf2.setProperty("storage.read-only", "false");
    conf2.setProperty("attributes.allow-all", true);
    return TitanFactory.open(conf2);
}
Now I am trying to query that database using Gremlin. I used the following command to load it:
g = TitanFactory.open("bin/cassandra.local");
The following is my cassandra.local file:
conf = new BaseConfiguration();
conf.setProperty("storage.backend","cassandra");
conf.setProperty("storage.hostname","127.0.0.1");
conf.setProperty("storage.read-only", "false");
conf.setProperty("attributes.allow-all", true)
But when I run g.V, I get an empty graph. Please help.
Thanks

Make sure that you commit the changes to your TitanGraph after making graph mutations in your Java program. If you're using Titan 0.5.x, the call is graph.commit(). If you're using Titan 0.9.x, the call is graph.tx().commit().
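For example, a minimal sketch against the Blueprints calls used in the question (the property name here is purely illustrative):
// Mutations become visible to other processes, such as the Gremlin console,
// only after the transaction is committed.
TitanGraph graph = getTitanGraph();   // the method from the question above
Vertex v = graph.addVertex(null);
v.setProperty("name", "example");     // hypothetical property for illustration
graph.commit();                       // Titan 0.5.x; on 0.9.x use graph.tx().commit()
graph.shutdown();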
Note that storage.directory isn't valid for a Cassandra backend, however the default value for storage.hostname is 127.0.0.1 so those should be the same between your Java program and cassandra.local. It might be easier to use a properties file to store your connection properties.
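For example, a single properties file along these lines (the path and contents are only a sketch) can be opened both from the Java program and from the Gremlin console:
# conf/titan-cassandra.properties (hypothetical path)
storage.backend=cassandra
storage.hostname=127.0.0.1
and then, in either place:
g = TitanFactory.open("conf/titan-cassandra.properties")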

Related

Datastax Node.js Cassandra driver: When to use a Mapper vs. a Query

I'm working with the Datastax Node.js driver and I can't figure out when to use a mapper vs. query. Both seem to be able to perform the same CRUD operations.
With a query:
const q = "SELECT * FROM mykeyspace.mytable WHERE id='12345'";
client.execute(q).then(result => console.log('This is the data', result));
With mapper:
const tableRow = await tableMapper.find({ id: '12345' });
When should I use the mapper over a query and vice versa?
Mapper is a feature of cassandra-driver released in 2018. Using the Mapper, cassandra-driver maps your Cassandra table to objects in Node.js, so you can handle rows in your application like a set of documents.
Using the Mapper you can run selects or inserts against your database, as described in this article:
https://www.datastax.com/blog/2018/12/introducing-datastax-nodejs-mapper-apache-cassandra
With the query method, you get the raw result rows back, so any mapping from those rows to the objects your application works with is up to you.
The short answer is: whatever you find more comfortable.
The Mapper lets you deal with database data as documents (JavaScript objects), builds the CQL query for you, executes the query and maps the results.
On the other hand, the core driver only supports executing CQL queries that you have to write yourself.

MemSQL Spark Job

I am trying to read a CSV file in a Spark job using the MemSQL Extractor, do some enrichment using a Transformer, and load into a MemSQL database using Java.
I see there is a memsql-spark interface jar, but I can't find any useful Java API documentation or examples.
I have started writing an extractor to read from the CSV, but I don't know how to move further.
public Option<RDD<byte[]>> nextRDD(SparkContext sparkContext, UserExtractConfig config, long batchInterval, PhaseLogger logger) {
    RDD<String> inputFile = sparkContext.textFile(filePath, minPartitions);
    RDD<byte[]> bytes = inputFile.map(ByteUtils.utf8StringToBytes(filePath), String.class); // compilation error
    return bytes; // compilation error
}
Would appreciate it if someone could point me in the right direction to get started...
Thanks...
First, configure the SingleStore (formerly MemSQL) Spark connector in Java using the following code:
SparkConf conf = new SparkConf();
conf.set("spark.datasource.singlestore.clientEndpoint", "singlestore-host");
conf.set("spark.datasource.singlestore.user", "admin");
conf.set("spark.datasource.singlestore.password", "s3cur3-pa$$word");
After running the above code, Spark is configured for the connector. You can then read the CSV into a Spark DataFrame, transform and manipulate the data according to your requirements, and write the DataFrame to a database table.
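A minimal sketch of that flow in Java, assuming the singlestore-spark-connector data source and placeholder endpoint, credentials, file path and table name:
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class CsvToSingleStore {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("csv-to-singlestore")
                .config("spark.datasource.singlestore.clientEndpoint", "singlestore-host")
                .config("spark.datasource.singlestore.user", "admin")
                .config("spark.datasource.singlestore.password", "s3cur3-pa$$word")
                .getOrCreate();

        // Read the CSV into a DataFrame (path is a placeholder).
        Dataset<Row> csv = spark.read()
                .option("header", "true")
                .csv("/path/to/input.csv");

        // ... enrichment / transformations go here ...

        // Write the DataFrame to a SingleStore table (database.table is a placeholder).
        csv.write()
                .format("singlestore")
                .mode("append")
                .save("mydatabase.mytable");
    }
}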
Also attaching a link for your reference:
spark-singlestore.

Storm Cassandra Integration

I am a newbie in both Storm and Cassandra. I want to use a Bolt to write the strings emitted by a Spout into a column family in Cassandra. I have read the example here, which seems a little bit complex for me, as it uses different classes for writing into the Cassandra DB. Furthermore, I want to know how many times the strings are written into the Cassandra DB. In the example it is not clear to me how we can control the number of strings written to the Cassandra DB.
Simply put, I need a Bolt to write the strings emitted by a Spout to a Cassandra column family, e.g., 200 records.
Thanks in advance!
You can either use the DataStax Cassandra driver or the storm-cassandra library you posted earlier.
Your requirement is unclear. Do you only want to store 200 tuples?
Anyway, run the topology with sample data and, after the stream is finished, query Cassandra and see what is there.
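For instance, a quick check with the DataStax Java driver (3.x); the contact point, keyspace and table names below simply mirror the placeholders used in the topology code further down:
// Quick sanity check after the topology has run (uses com.datastax.driver.core).
Cluster cluster = Cluster.builder().addContactPoint("ip-address-of-cassandra-machine").build();
Session session = cluster.connect();
ResultSet rs = session.execute("SELECT count(*) FROM Storm_Output.table_name;");
System.out.println("Rows written: " + rs.one().getLong(0));
cluster.close();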
Apache Storm and Apache Cassandra are quite deep and extensive projects. There is no way around learning them and doing sample projects in order to learn.
Hope this will help.
/* Main class */
TopologyBuilder builder = new TopologyBuilder();
Config conf = new Config();
conf.put("cassandra.keyspace", "Storm_Output");               // keyspace name
conf.put("cassandra.nodes", "ip-address-of-cassandra-machine");
conf.put("cassandra.port", 9042);                             // port on which Cassandra is running (default: 9042)

builder.setSpout("generator", new RandomSentenceSpout(), 1);
builder.setBolt("counter", new CassandraInsertionBolt(), 1).shuffleGrouping("generator");
builder.setBolt("CassandraBolt", new CassandraWriterBolt(
        async(
            simpleQuery("INSERT INTO Storm_Output.table_name (field1, field2) VALUES (?, ?);")
                .with(
                    fields("field1", "field2")
                )
        )
), 1).globalGrouping("counter");

conf.setDebug(true);
conf.setNumWorkers(1);
StormSubmitter.submitTopologyWithProgressBar("Cassandra-Insertion", conf, builder.createTopology());

/* Bolt sending data for insertion into Cassandra (CassandraInsertionBolt) */
public void execute(Tuple tuple, BasicOutputCollector basicOutputCollector) {
    Random rand = new Random();
    basicOutputCollector.emit(new Values(rand.nextInt(20), rand.nextInt(20)));
}

public void declareOutputFields(OutputFieldsDeclarer outputFieldsDeclarer) {
    outputFieldsDeclarer.declare(new Fields("field1", "field2"));
}

How to create and retrieve a graph database using the Java API in Cassandra

I am trying to create a graph having nodes and edges with some weights in a Cassandra database using the Titan Graph API. How do I retrieve this graph so that I can visualize it?
Is Rexster or Gremlin the solution for it? If so, please tell me the process.
I have a 4-node Cassandra cluster and run Titan on top of it. I made this program which creates two nodes, then creates an edge between them, and finally queries and prints the result.
public static void main(String args[])
{
    BaseConfiguration baseConfiguration = new BaseConfiguration();
    baseConfiguration.setProperty("storage.backend", "cassandra");
    baseConfiguration.setProperty("storage.hostname", "192.168.1.10");
    TitanGraph titanGraph = TitanFactory.open(baseConfiguration);

    Vertex rash = titanGraph.addVertex(null);
    rash.setProperty("userId", 1);
    rash.setProperty("username", "rash");
    rash.setProperty("firstName", "Rahul");
    rash.setProperty("lastName", "Chaudhary");
    rash.setProperty("birthday", 101);

    Vertex honey = titanGraph.addVertex(null);
    honey.setProperty("userId", 2);
    honey.setProperty("username", "honey");
    honey.setProperty("firstName", "Honey");
    honey.setProperty("lastName", "Anant");
    honey.setProperty("birthday", 201);

    Edge frnd = titanGraph.addEdge(null, rash, honey, "FRIEND");
    frnd.setProperty("since", 2011);
    titanGraph.commit();

    Iterable<Vertex> results = rash.query().labels("FRIEND").has("since", 2011).vertices();
    for (Vertex result : results)
    {
        System.out.println("Id: " + result.getProperty("userId"));
        System.out.println("Username: " + result.getProperty("username"));
        System.out.println("Name: " + result.getProperty("firstName") + " " + result.getProperty("lastName"));
    }
}
In my pom.xml, I just have this dependency:
<dependency>
    <groupId>com.thinkaurelius.titan</groupId>
    <artifactId>titan-cassandra</artifactId>
    <version>0.5.4</version>
</dependency>
I am running Cassandra 2.1.2.
I don’t know much about Gremlin, but I believe Gremlin is the shell that you can use to query your database from the command line. Rexster is an API that you can use on top of any other API that uses Blueprints (like Titan) to make your query/code accessible to others via a REST API. Since you want to use embedded Java with Titan, you don’t need Gremlin. With the dependency that I have stated, the Blueprints API comes with it, and using that API (like I have done in my code) you can do everything with your graph.
You might find this cheatsheet useful: http://www.fromdev.com/2013/09/Gremlin-Example-Query-Snippets-Graph-DB.html
First note that TitanGraph uses the Blueprints API, thus the Titan API is the Blueprints API. As you are using Blueprints you can use Gremlin, Rexster or any part of the TinkerPop stack to process your graph.
How you visualize your graph once it is in Titan depends on the graph visualization tool you choose. If I assume you are using Gephi or a similar tool that can consume GraphML, then the easiest way to get the data out of Titan would be to open a Gremlin REPL, get a reference to the graph and just do:
g = TitanFactory.open(...)
g.saveGraphML('/tmp/my-graph.xml')
From there you could import my-graph.xml into Gephi and visualize it.

How to delete graph in Titan with Cassandra storage backend?

I use Titan 0.4.0 (the "all" package), running Rexster in shared VM mode on Ubuntu 12.04.
How could I properly delete a graph in Titan which is using the Cassandra storage backend?
I have tried TitanCleanup.clear(graph), but it does not delete everything; the indices are still there. My real issue is that I have an index which I don't want (it crashes every query); however, as I understand Titan's documentation, it is impossible to remove an index once it is created.
You can clear all the edges/vertices with:
g.V.remove()
but as you have found, that won't clear the types/indices previously created. The cleanest option would be to just delete the Cassandra data directory.
If you are executing the delete via a unit test you might try to do this as part of your test setup:
this.config = new BaseConfiguration() {{
    addProperty("storage.backend", "berkeleyje")
    addProperty("storage.directory", "/tmp/titan-schema-test")
}}
GraphDatabaseConfiguration graphconfig = new GraphDatabaseConfiguration(config)
graphconfig.getBackend().clearStorage()
g = (StandardTitanGraph) TitanFactory.open(config)
Be sure to call g.shutdown() in your test teardown method.
Just to update this answer.
With Titan 1.0.0 this can be done programmatically in Java with:
TitanGraph graph = TitanFactory.open(config);
graph.close();
TitanCleanup.clear(graph);
For the continuation of Titan called JanusGraph, the command is JanusGraphFactory.clear(graph) but is soon to be JanusGraphCleanup.clear(graph).
As was mentioned in one of the comments to the earlier answer, DROPping the titan keyspace using cqlsh should do it:
cqlsh> DROP KEYSPACE titan;
The name of the keyspace Titan uses is set with the storage.cassandra.keyspace configuration option. You can change it to whatever name you want that is acceptable to Cassandra.
storage.cassandra.keyspace=hello_titan
When Cassandra starts up, it prints out the keyspace's name as follows:
INFO 19:50:32 Create new Keyspace: KSMetaData{name=hello_titan,
strategyClass=SimpleStrategy, strategyOptions={replication_factor=1},
cfMetaData={}, durableWrites=true,
userTypes=org.apache.cassandra.config.UTMetaData@767d6a9f}
In 0.9.0-M1, the name appears in Titan's log in DEBUG (set log4j.rootLogger=DEBUG, stdout in conf/log4j-server.properties):
[DEBUG] AstyanaxStoreManager - Found keyspace titan
or the following when it doesn't:
[DEBUG] AstyanaxStoreManager - Creating keyspace titan...
[DEBUG] AstyanaxStoreManager - Created keyspace titan
