I want to create a Cassandra keyspace with 'network topology'. I can do it using the CLI like this:
CREATE KEYSPACE test
WITH placement_strategy = 'NetworkTopologyStrategy'
AND strategy_options={us-east:6,us-west:3};
How can I achieve the same using Hector?
Thanks,
Bhathiya
You shouldn't do that. Back when Hector was the primary choice of client driver, the recommendation was to create your keyspace via the cassandra-cli.
Having said that, I would suggest that you use an up-to-date driver, and I recommend the DataStax binary protocol driver.
For the record, this works.
// Datacenter names map to per-DC replication factors
Map<String, String> options = new HashMap<String, String>();
options.put("dc1", "3");
options.put("dc2", "1");

ThriftKsDef kd = (ThriftKsDef) HFactory.createKeyspaceDefinition(
        keyspaceName, strategyClass, replicationFactor, cfDefs);
kd.setStrategyOptions(options);

// true = block until the schema change has propagated
getCluster().addKeyspace(kd, true);
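For anyone on a current Cassandra version: the same keyspace can be created with plain CQL (via cqlsh or any modern driver). The statement below mirrors the datacenter names and replication counts from the original CLI command:

```sql
CREATE KEYSPACE test
  WITH replication = {
    'class': 'NetworkTopologyStrategy',
    'us-east': 6,
    'us-west': 3
  };
```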
I have written a custom LoadBalancerPolicy for spark-cassandra-connector and now I want to ensure that it really works!
I have a Cassandra cluster with 3 nodes and a keyspace with a replication factor of 2, so when we want to retrieve a record, there will be only two nodes on cassandra which hold the data.
The thing is that I want to ensure the spark-cassandra-connector (with my load-balancer-policy) is still token-aware and will choose the right node as coordinator for each "SELECT" statement.
Now I'm wondering whether we can write a trigger on the SELECT statement for each node, so that if a node that does not hold the data serves the query, the trigger creates a log entry and I know the load-balancer-policy is not working properly. How can we write a trigger on SELECT in Cassandra? Is there a better way to accomplish this?
I already checked the documentation for creating triggers, and it is quite limited:
Official documentation
Documentation at DataStax
Example implementation in official repo
You can do it from the program side: if you use prepared statements, get the routing key for your bound statement, find the replicas for it via the Metadata class, and then check whether the coordinator host reported in the ExecutionInfo (which you can get from the ResultSet) is one of them.
According to what Alex said, we can do it as below:
After creating SparkSession, we should make a connector:
import com.datastax.spark.connector.cql.CassandraConnector
val connector = CassandraConnector.apply(sparkSession.sparkContext.getConf)
Now we can define a preparedStatement and do the rest:
connector.withSessionDo(session => {
  val selectQuery = "select * from test where id=?"
  val preparedStatement = session.prepare(selectQuery)
  val protocolVersion = session.getCluster.getConfiguration.getProtocolOptions.getProtocolVersion
  // We have to explicitly bind every parameter the partition key is based on,
  // otherwise the routingKey will be null.
  val boundStatement = preparedStatement.bind(s"$id")
  val routingKey = boundStatement.getRoutingKey(protocolVersion, null)
  // All of the nodes that hold a replica of the row
  val replicas = session.getCluster.getMetadata.getReplicas("test", routingKey)
  val resultSet = session.execute(boundStatement)
  // The node that actually served us the row
  val host = resultSet.getExecutionInfo.getQueriedHost
  // Final step: check whether that host is one of the replicas
  if (replicas.contains(host)) println("It works!")
})
The important thing is that we have to explicitly bind all of the parameters the partition key is based on (i.e. we cannot hard-code them in the SELECT statement); otherwise the routingKey will be null.
Just installed Cassandra (3.11.1) and the DataStax Node.js driver (3.3.0).
Created a simple structure:
CREATE KEYSPACE CaseBox
WITH REPLICATION={
'class': 'SimpleStrategy',
'replication_factor': 1};
USE CaseBox;
CREATE TABLE Regions (
regionId int,
name varchar,
PRIMARY KEY ((regionId))
);
Trying to connect to the keyspace, with no success:
const client = new cassandra.Client({
contactPoints: ['localhost'],
keyspace: 'CaseBox'
});
await client.connect();
Error: Keyspace 'CaseBox' does not exist.
If I change the desired keyspace to system, it works.
What's wrong? Please assist.
You can find my full Cassandra config, CQL and js scripts here.
Keyspace names are forced lower case when unquoted in CQL.
Use CREATE KEYSPACE "CaseBox" in CQL or keyspace: 'casebox' in Node.
Generally, just use snake_case names, so you won't get caught out like this as often.
I am a newbie in both Storm and Cassandra. I want to use a Bolt to write the strings emitted by a Spout into a column family in Cassandra. I have read the example here, which seems a little complex to me, as it uses different classes for writing into the Cassandra DB. Furthermore, I want to know how many times the strings are written to the Cassandra DB; from the example it is not clear to me how we can control the number of strings entered.
Simply put, I need a Bolt that writes the strings emitted by a Spout into a Cassandra column family, e.g. 200 records.
Thanks in advance!
You can either use the DataStax Cassandra driver, or you can use the storm-cassandra library you posted earlier.
Your requirement is unclear. Do you only want to store 200 tuples?
Anyway, run the topology with sample data and, after the stream is finished, query Cassandra and see what is there.
Apache Storm and Apache Cassandra are quite deep and extensive projects. There is no way around learning them and doing sample projects in order to learn.
Hope this helps.
/* Main class */
TopologyBuilder builder = new TopologyBuilder();

Config conf = new Config();
conf.put("cassandra.keyspace", "Storm_Output");                 // keyspace name
conf.put("cassandra.nodes", "ip-address-of-cassandra-machine");
conf.put("cassandra.port", 9042);                               // port Cassandra listens on (default: 9042)

builder.setSpout("generator", new RandomSentenceSpout(), 1);
builder.setBolt("counter", new CassandraInsertionBolt(), 1).shuffleGrouping("generator");
builder.setBolt("CassandraBolt", new CassandraWriterBolt(
        async(
            simpleQuery("INSERT INTO Storm_Output.table_name (field1, field2) VALUES (?, ?);")
                .with(fields("field1", "field2"))
        )
    ), 1).globalGrouping("counter");

conf.setDebug(true);
conf.setNumWorkers(1);
StormSubmitter.submitTopologyWithProgressBar("Cassandra-Insertion", conf, builder.createTopology());
/* CassandraInsertionBolt -- the bolt that emits the values the writer bolt inserts into Cassandra */
public void execute(Tuple tuple, BasicOutputCollector basicOutputCollector) {
    Random rand = new Random();
    basicOutputCollector.emit(new Values(rand.nextInt(20), rand.nextInt(20)));
}

public void declareOutputFields(OutputFieldsDeclarer outputFieldsDeclarer) {
    outputFieldsDeclarer.declare(new Fields("field1", "field2"));
}
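As for capping the number of records written (e.g. 200): one simple approach, independent of the storm-cassandra example above, is a counter consulted before each write. A minimal sketch of that logic follows; the class and method names are illustrative, not part of any Storm or Cassandra API. Inside a bolt you would call tryAcquire() at the top of execute() and skip the insert once it returns false:

```java
// Sketch of the cap logic a bolt could apply before writing to Cassandra.
// Names are illustrative; in a real topology this check would sit at the
// top of execute(), so tuples past the limit are acked but not written.
public class RecordLimiter {
    private final int maxRecords;
    private int written = 0;

    public RecordLimiter(int maxRecords) {
        this.maxRecords = maxRecords;
    }

    // Returns true while more records may be written, counting each success.
    public synchronized boolean tryAcquire() {
        if (written >= maxRecords) {
            return false;
        }
        written++;
        return true;
    }

    public static void main(String[] args) {
        RecordLimiter limiter = new RecordLimiter(200);
        int accepted = 0;
        for (int i = 0; i < 1000; i++) {
            if (limiter.tryAcquire()) {
                accepted++;
            }
        }
        System.out.println(accepted); // prints 200
    }
}
```

Note that each bolt instance keeps its own counter, so in a multi-worker topology the limit is per executor; set the bolt's parallelism to 1 if you need a global cap.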
I use Titan 0.4.0 (the "all" package), running Rexster in shared VM mode on Ubuntu 12.04.
How could I properly delete a graph in Titan which is using the Cassandra storage backend?
I have tried the TitanCleanup.clear(graph), but it does not delete everything. The indices are still there. My real issue is that I have an index which I don't want (it crashes every query), however as I understand Titan's documentation it is impossible to remove an index once it is created.
You can clear all the edges/vertices with:
g.V.remove()
but as you have found, that won't clear the types/indices previously created. The cleanest option is to simply delete the Cassandra data directory.
If you are executing the delete via a unit test you might try to do this as part of your test setup:
this.config = new BaseConfiguration() {{
    addProperty("storage.backend", "berkeleyje");
    addProperty("storage.directory", "/tmp/titan-schema-test");
}};
GraphDatabaseConfiguration graphconfig = new GraphDatabaseConfiguration(config);
graphconfig.getBackend().clearStorage();
g = (StandardTitanGraph) TitanFactory.open(config);
Be sure to call g.shutdown() in your test teardown method.
Just to update this answer.
With Titan 1.0.0 this can be done programmatically in Java with:
TitanGraph graph = TitanFactory.open(config);
graph.close();
TitanCleanup.clear(graph);
For the continuation of Titan called JanusGraph, the command is JanusGraphFactory.clear(graph) but is soon to be JanusGraphCleanup.clear(graph).
As was mentioned in one of the comments to the earlier answer, DROPping the titan keyspace using cqlsh should do it:
cqlsh> DROP KEYSPACE titan;
The name of the keyspace Titan uses is set with the storage.cassandra.keyspace configuration option. You can change it to whatever name you want that is acceptable to Cassandra.
storage.cassandra.keyspace=hello_titan
When Cassandra is starting up, it prints the keyspace's name as follows:
INFO 19:50:32 Create new Keyspace: KSMetaData{name=hello_titan,
strategyClass=SimpleStrategy, strategyOptions={replication_factor=1},
cfMetaData={}, durableWrites=true,
userTypes=org.apache.cassandra.config.UTMetaData@767d6a9f}
In 0.9.0-M1, the name appears in Titan's log at DEBUG level (set log4j.rootLogger=DEBUG, stdout in conf/log4j-server.properties):
[DEBUG] AstyanaxStoreManager - Found keyspace titan
or the following when it doesn't exist yet:
[DEBUG] AstyanaxStoreManager - Creating keyspace titan...
[DEBUG] AstyanaxStoreManager - Created keyspace titan
Is there an easy way to check if table (column family) is defined in Cassandra using CQL (or API perhaps, using com.datastax.driver)?
Right now I am leaning towards executing SELECT 1 FROM table and checking for exception but maybe there is a better way?
As of 1.1 you should be able to query the schema_columnfamilies column family in the system keyspace. If you know which keyspace you want to check, this CQL will list all column families in it:
SELECT columnfamily_name
FROM system.schema_columnfamilies WHERE keyspace_name='myKeyspaceName';
The JIRA issue describing this functionality is here: https://issues.apache.org/jira/browse/CASSANDRA-2477
Note, though, that some of the system column names changed between 1.1 and 1.2, so you might have to adjust the query a little to get your desired results.
Edit 20160523 - Cassandra 3.x Update:
Note that for Cassandra 3.0 and up, you'll need to make a few adjustments to the above query:
SELECT table_name
FROM system_schema.tables WHERE keyspace_name='myKeyspaceName';
The Java driver (since you mentioned it in your question) also maintains a local representation of the schema.
Driver 3.x and below:
KeyspaceMetadata ks = cluster.getMetadata().getKeyspace("myKeyspace");
TableMetadata table = ks.getTable("myTable");
boolean tableExists = (table != null);
Driver 4.x and above:
Metadata metadata = session.getMetadata();
boolean tableExists =
metadata.getKeyspace("myKeyspace")
.flatMap(ks -> ks.getTable("myTable"))
.isPresent();
I just needed to manually check for the existence of a table using cqlsh, so here is some possibly useful general info.
describe keyspace_name.table_name
If it doesn't exist, you'll get 'table_name' not found in keyspace 'keyspace'.
If it does exist, you'll get a description of the table.
For the .NET driver CassandraCSharpDriver version 3.17.1 the following code creates a table if it doesn't exist yet:
var ks = _cassandraSession.Cluster.Metadata.GetKeyspace(keyspaceName);
var tableNames = ks.GetTablesNames();
if(!tableNames.Contains(tableName.ToLowerInvariant()))
{
var stmt = new SimpleStatement($"CREATE TABLE {tableName} (id text PRIMARY KEY, name text, price decimal, volume int, time timestamp)");
_cassandraSession.Execute(stmt);
}
You will need to adapt the list of table columns to your needs. The call can also be awaited by using await _cassandraSession.ExecuteAsync(stmt).ConfigureAwait(false) in an async method.
Also, I want to mention that I'm using Cassandra version 4.0.1.