Kundera - Cassandra Replication Factor using EntityManagerFactory

I have an application which uses Kundera to generate tables from objects, and I want to change the Cassandra replication factor. I use an EntityManagerFactory directly to interact with the database for initializing, persisting records, etc.
I know we can create a separate kundera-cassandra.xml file and specify the replication factor there. However, that approach throws an error for me saying the keyspace doesn't exist, and I don't want to use it anyway.
I want to set the replication factor through the EntityManagerFactory properties instead, but it doesn't work.
Here is my initialize function:
Map<String, Object> props = new HashMap<>();
props.put(KUNDERA_NODES_KEY, host);
props.put(KUNDERA_PORT_KEY, String.valueOf(port));
props.put(KUNDERA_KEYSPACE_KEY, databaseName);
props.put(CassandraConstants.CQL_VERSION, CassandraConstants.CQL_VERSION_3_0);
props.put("replication_factor", 2);
entityManagerFactory = Persistence.createEntityManagerFactory(
        DataServiceConfiguration.KUNDERA_PERSISTENCE_UNIT, props);
LOG.info("DataServiceImpl initialized with Properties: " + props);
Note: I have tried setting the replication factor value as a String as well, and have also tried using the CassandraConstants. Please let me know what I am doing incorrectly.

This is not possible with the current code. I have added a fix for this on GitHub (track issue #1005).
The fix will be available in the next Kundera release, or you can build the code from source to use it right away.
-Karthik
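In the meantime, if the keyspace already exists, its replication factor can be changed outside Kundera with a plain CQL ALTER KEYSPACE. A minimal sketch using the DataStax Java driver (the keyspace name mydb and the SimpleStrategy settings are assumptions; any CQL client, including cqlsh, works the same way):

import com.datastax.oss.driver.api.core.CqlSession;

public class AlterReplicationFactor {
    public static void main(String[] args) {
        // Connects to the driver's default contact point (localhost:9042).
        try (CqlSession session = CqlSession.builder().build()) {
            session.execute(
                "ALTER KEYSPACE mydb WITH replication = "
              + "{'class': 'SimpleStrategy', 'replication_factor': 2}");
        }
    }
}

After raising the replication factor, remember to run nodetool repair so existing data is streamed to the new replicas.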

Related

Schema disagreements with Cassandra 4.0 using the Java driver

We have a 3-node dev Cassandra cluster running 3.11.13 that we have upgraded to 4.0.7. We send DDL statements through our Java applications using spring-data-cassandra:3.4.6, which uses DataStax Java Driver 4.14.1, and we never faced any issues with this until the upgrade to 4.0.7.
The main issue we're facing with 4.0.7 is schema disagreement on tables created programmatically, which had been a non-issue for us since 3.11.x. DDL statements made through cqlsh work as expected; it's only with programmatic creation that we see the schema disagreements.
We’ve tried different cluster setups, C* versions, and Ubuntu versions, but we still face the same issue:
3-node, single-rack DC (Ubuntu 18.04, 20.04, 22.04) (4.0.x, 4.1.x)
3-node, 3-rack DC (Ubuntu 18.04, 20.04, 22.04) (4.0.x, 4.1.x) — This is the setup we’ve been using since 3.11.x
We've also tried fiddling with the driver configuration, like adjusting the timeouts and disabling debouncing, but with no luck; we still face the same issue:
advanced.control-connection {
  schema-agreement {
    interval = 500 milliseconds
    timeout = 10 seconds
    warn-on-failure = true
  }
}
advanced.metadata {
  topology-event-debouncer {
    window = 1 milliseconds
    max-events = 1
  }
  schema {
    request-timeout = 5 seconds
    debouncer {
      window = 1 milliseconds
      max-events = 1
    }
  }
}
We’re creating tables programmatically through the following snippets:
@Override
protected abstract List<String> getStartupScripts();

@Bean
SessionFactoryInitializer sessionFactoryInitializer(SessionFactory sessionFactory) {
    SessionFactoryInitializer initializer = new SessionFactoryInitializer();
    initializer.setSessionFactory(sessionFactory);
    final ResourceKeyspacePopulator resourceKeyspacePopulator = new ResourceKeyspacePopulator();
    getStartupScripts().forEach(script -> resourceKeyspacePopulator.addScript(scriptOf(script)));
    initializer.setKeyspacePopulator(resourceKeyspacePopulator);
    return initializer;
}
And create one like:
@Override
protected List<String> getStartupScripts() {
    return Arrays.asList(testTable());
}

private String testTable() {
    return "CREATE TABLE IF NOT EXISTS test_table ("
            + "test text, "
            + "test2 text, "
            + "createdat bigint, "
            + "PRIMARY KEY(test, test2))";
}
But we end up in a loop until the operation times out due to the schema disagreement, with the following errors:
DEBUG com.datastax.oss.driver.internal.core.metadata.SchemaAgreementChecker - [s1] Schema agreement not reached yet ([09989a2c-7348-3117-8b4a-d5cad549bc09, f4c8755d-6fec-38fe-984f-4083f4a0a0a0]), rescheduling in 500 ms
WARN org.springframework.context.support.GenericApplicationContext - Exception encountered during context initialization - cancelling refresh attempt: org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'sessionFactoryInitializer' defined in com.bitcoin.wallet.config.CassandraConfig: Bean instantiation via factory method failed; nested exception is org.springframework.beans.BeanInstantiationException: Failed to instantiate [org.springframework.data.cassandra.core.cql.session.init.SessionFactoryInitializer]: Factory method 'sessionFactoryInitializer' threw exception; nested exception is org.springframework.data.cassandra.core.cql.session.init.ScriptStatementFailedException: Failed to execute CQL script statement #1 of Byte array resource [resource loaded from byte array]: CREATE TABLE IF NOT EXISTS test_table (test text,test2 text,createdat bigint,PRIMARY KEY(test, test2)); nested exception is com.datastax.oss.driver.api.core.DriverTimeoutException: Query timed out after PT10S
So two things come to mind when reading through this:
Schema disagreements are often a symptom of some larger issue.
Does the node have its CPU pegged at 100%? Schema disagreement. Inefficient network routing? Schema disagreement. Disk IOPS maxed-out causing write back-pressure? Schema disagreement.
I'd have a look at the activity on the nodes and see if any of the above stand out.
Programmatic schema changes are often problematic.
Each node needs to store the complete schema, so each schema change gets sent to all nodes, essentially making schema changes run at an asynchronous ALL level of consistency. Because of that, there's no margin for error. And programmatic schema changes are often sent from within an application much faster than Cassandra can reconcile them.
My recommendations for making any schema changes:
Execute during off-peak times.
Only run them when all nodes are UN (Up/Normal in nodetool status).
Run them using cqlsh (not from application code).
Verify each individual change using nodetool describecluster.
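If schema changes really must come from application code, a defensive pattern is to send the DDL one statement at a time and verify schema agreement before sending the next. A minimal sketch with the DataStax Java driver 4.x (the statement list and error handling are illustrative, not the original poster's code):

import com.datastax.oss.driver.api.core.CqlSession;
import java.util.List;

public class SafeDdlRunner {
    // Runs DDL statements serially so a statement never races a pending
    // schema change from the previous one.
    static void runDdl(CqlSession session, List<String> statements) {
        for (String ddl : statements) {
            session.execute(ddl);
            // Polls the cluster using the intervals configured under
            // advanced.control-connection.schema-agreement.
            if (!session.checkSchemaAgreement()) {
                throw new IllegalStateException("Schema agreement not reached after: " + ddl);
            }
        }
    }
}

The 4.x driver already waits for schema agreement after each DDL response, so this mainly makes per-statement failures explicit instead of letting several CREATEs pile up behind one disagreement.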

Storm Cassandra Integration

I am a newbie in both Storm and Cassandra. I want to use a Bolt to write the strings emitted by a Spout to a column family in Cassandra. I have read the example here, which seems a bit complex to me, as it uses different classes for writing to the Cassandra DB. Furthermore, I want to know how many times the strings are written to the Cassandra DB; in the example, it is not clear to me how we can control the number of strings entered into the Cassandra DB.
Simply put, I need a Bolt that writes the strings emitted by a Spout to a Cassandra column family, limited to e.g. 200 records.
Thanks in advance!
You can either use the DataStax Cassandra driver or you can use the storm-cassandra library you posted earlier.
Your requirement is unclear. Do you only want to store 200 tuples?
Either way, run the topology with sample data and, after the stream is finished, query Cassandra and see what is there.
Apache Storm and Apache Cassandra are quite deep and extensive projects. There is no way around learning them and doing sample projects in order to learn them.
Hope this helps.
/* Main class */
TopologyBuilder builder = new TopologyBuilder();
Config conf = new Config();
conf.put("cassandra.keyspace", "Storm_Output"); // keyspace name
conf.put("cassandra.nodes", "ip-address-of-cassandra-machine");
conf.put("cassandra.port", 9042); // port on which Cassandra is running (default: 9042)
builder.setSpout("generator", new RandomSentenceSpout(), 1);
builder.setBolt("counter", new CassandraInsertionBolt(), 1).shuffleGrouping("generator");
builder.setBolt("CassandraBolt", new CassandraWriterBolt(
        async(
            simpleQuery("INSERT INTO Storm_Output.table_name (field1, field2) VALUES (?, ?);")
                .with(fields("field1", "field2"))
        )
), 1).globalGrouping("counter");
conf.setDebug(true);
conf.setNumWorkers(1);
StormSubmitter.submitTopologyWithProgressBar("Cassandra-Insertion", conf, builder.createTopology());

/* CassandraInsertionBolt: generates the tuples that the CassandraWriterBolt persists */
public class CassandraInsertionBolt extends BaseBasicBolt {

    public void execute(Tuple tuple, BasicOutputCollector basicOutputCollector) {
        Random rand = new Random();
        basicOutputCollector.emit(new Values(rand.nextInt(20), rand.nextInt(20)));
    }

    public void declareOutputFields(OutputFieldsDeclarer outputFieldsDeclarer) {
        outputFieldsDeclarer.declare(new Fields("field1", "field2"));
    }
}
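To address the "200 records" part of the question: Storm itself doesn't cap how many tuples reach a bolt, so a common trick is a small pass-through bolt that stops forwarding after N tuples. A minimal sketch (the class name and threshold are illustrative; imports assume a recent Storm where the API lives under org.apache.storm, older releases used backtype.storm):

import org.apache.storm.topology.BasicOutputCollector;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseBasicBolt;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Tuple;
import org.apache.storm.tuple.Values;

// Forwards only the first `limit` tuples it sees; later tuples are still
// acked (BasicBolt semantics) but never reach the writer bolt.
public class LimitBolt extends BaseBasicBolt {
    private final int limit;
    private int seen; // tuples seen by this executor

    public LimitBolt(int limit) {
        this.limit = limit;
    }

    @Override
    public void execute(Tuple tuple, BasicOutputCollector collector) {
        if (seen < limit) {
            seen++;
            collector.emit(new Values(tuple.getValue(0), tuple.getValue(1)));
        }
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("field1", "field2"));
    }
}

Wire it in with parallelism 1 so the cap is global, e.g. builder.setBolt("limiter", new LimitBolt(200), 1).shuffleGrouping("counter"); and point the writer bolt at "limiter" instead of "counter".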

How to delete graph in Titan with Cassandra storage backend?

I use Titan 0.4.0 (titan-all), running Rexster in shared VM mode on Ubuntu 12.04.
How could I properly delete a graph in Titan which is using the Cassandra storage backend?
I have tried TitanCleanup.clear(graph), but it does not delete everything; the indices are still there. My real issue is that I have an index which I don't want (it crashes every query), but as I understand Titan's documentation, it is impossible to remove an index once it is created.
You can clear all the edges/vertices with:
g.V.remove()
but as you have found, that won't clear the types/indices previously created. The cleanest option would be to just delete the Cassandra data directory.
If you are executing the delete via a unit test you might try to do this as part of your test setup:
this.config = new BaseConfiguration() {{
    addProperty("storage.backend", "berkeleyje");
    addProperty("storage.directory", "/tmp/titan-schema-test");
}};
GraphDatabaseConfiguration graphconfig = new GraphDatabaseConfiguration(config);
graphconfig.getBackend().clearStorage();
g = (StandardTitanGraph) TitanFactory.open(config);
Be sure to call g.shutdown() in your test teardown method.
Just to update this answer.
With Titan 1.0.0 this can be done programmatically in Java with:
TitanGraph graph = TitanFactory.open(config);
graph.close();
TitanCleanup.clear(graph);
For the continuation of Titan called JanusGraph, the command is JanusGraphFactory.clear(graph), but it is soon to be JanusGraphCleanup.clear(graph).
As was mentioned in one of the comments on the earlier answer, DROPping the titan keyspace using cqlsh should do it:
cqlsh> DROP KEYSPACE titan;
The name of the keyspace Titan uses is set with the storage.cassandra.keyspace configuration option. You can change it to any name you want that is acceptable to Cassandra.
storage.cassandra.keyspace=hello_titan
When Cassandra starts up, it prints the keyspace's name as follows:
INFO 19:50:32 Create new Keyspace: KSMetaData{name=hello_titan,
strategyClass=SimpleStrategy, strategyOptions={replication_factor=1},
cfMetaData={}, durableWrites=true,
userTypes=org.apache.cassandra.config.UTMetaData@767d6a9f}
In 0.9.0-M1, the name appears in Titan's log at DEBUG level (set log4j.rootLogger=DEBUG, stdout in conf/log4j-server.properties):
[DEBUG] AstyanaxStoreManager - Found keyspace titan
or the following when it doesn't:
[DEBUG] AstyanaxStoreManager - Creating keyspace titan...
[DEBUG] AstyanaxStoreManager - Created keyspace titan

_replicate db does not exist in CouchDB on Android

I am trying to set up replication between an Android tablet and another system. I use a CouchDbInstance object to set up the replication.
This is my code:
/**
 * @param builder the {@link ReplicationCommand.Builder} for the replication command
 * @param couchDbInstance the CouchDB instance to replicate against
 * @return the {@link ReplicationStatus} for the replication command
 */
private ReplicationStatus replicate(ReplicationCommand.Builder builder, CouchDbInstance couchDbInstance) {
    int retryCount = 0;
    ReplicationStatus replicationStatus = null;
    while (retryCount < REPLICATION_RETRY_MAX) {
        replicationStatus = couchDbInstance.replicate(builder.build());
        if (replicationStatus.isOk()) {
            break;
        }
        retryCount++;
    }
    return replicationStatus;
}
In the CouchDB logs I see that the POST to _replicate returns 404.
We use couchbasemobile, and I know it is no longer supported. Can someone tell me whether the _replicate way of replication is unsupported, and whether I should use the _replicator way instead?
I don't know much Java, so I'm guessing here, but I think your problem is a misunderstanding of how _replicate is used.
The documentation here explains it: http://wiki.apache.org/couchdb/Replication It's not in the official docs anymore, as I think they want people to use _replicator.
To start a continuous replication, you POST to _replicate:
{"source":"example-database","target":"target-db", "continuous": true}
In response, you get:
{"ok":true,"_local_id":"127c65ee56bcd253d9a019f5a6f84f16+continuous+create_target"}
To get the status of the replication, you GET _active_tasks. In response, for each active replication, you get:
{"ok":true,"_local_id":"127c65ee56bcd253d9a019f5a6f84f16+continuous+create_target"}
If the "_local_id" is not in _active_tasks, the replication is not happening.
I think that your problem is here:
replicationStatus = couchDbInstance.replicate(builder.build());
I don't know the libraries you're using, but this seems wrong. You should be checking _active_tasks to see if the _local_id is there. Also, you seem to be implementing continuous replication yourself.
With couchbasemobile, I have found that there are some bugs with continuous replication, and it is a good idea to periodically GET _active_tasks to check whether the continuous replications are still going, restarting them if they're not. But you should still use continuous replication.
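For illustration, here is the raw HTTP flow described above as a minimal Java sketch using only java.net (the host, database names, and missing error handling are placeholders, not the poster's setup):

import java.io.IOException;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.Scanner;

public class CouchReplicationCheck {
    static final String COUCH = "http://localhost:5984"; // placeholder host

    public static void main(String[] args) throws IOException {
        // Start a continuous replication; CouchDB answers with a _local_id.
        String body = "{\"source\":\"example-database\",\"target\":\"target-db\",\"continuous\":true}";
        HttpURLConnection post = (HttpURLConnection) new URL(COUCH + "/_replicate").openConnection();
        post.setRequestMethod("POST");
        post.setRequestProperty("Content-Type", "application/json");
        post.setDoOutput(true);
        try (OutputStream out = post.getOutputStream()) {
            out.write(body.getBytes(StandardCharsets.UTF_8));
        }
        System.out.println("_replicate -> HTTP " + post.getResponseCode());

        // Check _active_tasks; if the _local_id is absent, the replication
        // is not running and should be restarted.
        HttpURLConnection get = (HttpURLConnection) new URL(COUCH + "/_active_tasks").openConnection();
        try (Scanner in = new Scanner(get.getInputStream(), "UTF-8")) {
            System.out.println(in.useDelimiter("\\A").next());
        }
    }
}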

Simplest way to insert data into a fresh Cassandra database using the Hector API?

I've followed numerous examples on inserting data into a Cassandra database and every time I get an exception about unconfigured column families.
Exception in thread "main" me.prettyprint.hector.api.exceptions.HInvalidRequestException: InvalidRequestException(why:unconfigured columnfamily TestColumnFamily)
at me.prettyprint.cassandra.service.ExceptionsTranslatorImpl.translate(ExceptionsTranslatorImpl.java:45)
at me.prettyprint.cassandra.connection.HConnectionManager.operateWithFailover(HConnectionManager.java:252)
at me.prettyprint.cassandra.model.ExecutingKeyspace.doExecuteOperation(ExecutingKeyspace.java:97)
at me.prettyprint.cassandra.model.MutatorImpl.execute(MutatorImpl.java:243)
at me.prettyprint.cassandra.model.MutatorImpl.insert(MutatorImpl.java:69)
at CassandraInterface.main(CassandraInterface.java:101)
Caused by: InvalidRequestException(why:unconfigured columnfamily TestColumnFamily)
at org.apache.cassandra.thrift.Cassandra$batch_mutate_result.read(Cassandra.java:19477)
at org.apache.cassandra.thrift.Cassandra$Client.recv_batch_mutate(Cassandra.java:1035)
at org.apache.cassandra.thrift.Cassandra$Client.batch_mutate(Cassandra.java:1009)
at me.prettyprint.cassandra.model.MutatorImpl$3.execute(MutatorImpl.java:246)
at me.prettyprint.cassandra.model.MutatorImpl$3.execute(MutatorImpl.java:243)
at me.prettyprint.cassandra.service.Operation.executeAndSetResult(Operation.java:103)
at me.prettyprint.cassandra.connection.HConnectionManager.operateWithFailover(HConnectionManager.java:246)
... 4 more
So I looked up how to configure them and found
BasicColumnFamilyDefinition cfdef = new BasicColumnFamilyDefinition();
cfdef.setKeyspaceName(keyspaceName);
cfdef.setName(columnFamilyName);
cfdef.setKeyValidationClass(ComparatorType.UTF8TYPE.getClassName());
cfdef.setComparatorType(ComparatorType.UTF8TYPE);
That didn't configure the column family.
All of the examples I have found are fragments without any context, so I don't know what to import or set up. In addition, some examples appear to mix the Hector API v2 and the original Hector API, so when I use them, I get "class not found" or "function not found" compiler errors.
Hector CassandraClusterTest.java
@Test
public void testAddDropColumnFamily() throws Exception {
    ColumnFamilyDefinition cfDef = HFactory.createColumnFamilyDefinition("Keyspace1", "DynCf");
    cassandraCluster.addColumnFamily(cfDef);
    String cfid2 = cassandraCluster.dropColumnFamily("Keyspace1", "DynCf");
    assertNotNull(cfid2);
    // Let's wait for agreement
    cassandraCluster.addColumnFamily(cfDef, true);
    cfid2 = cassandraCluster.dropColumnFamily("Keyspace1", "DynCf", true);
    assertNotNull(cfid2);
}
Long story short, the keyspace and column family need to exist before you try to insert data into them. You can either manage this in your code, checking whether they exist, with the example above as a nice reference (see the sketch below), or create them via the command-line interface (cassandra-cli).
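A minimal sketch of that check-then-create pattern with Hector (the keyspace and column family names are placeholders):

import java.util.Arrays;

import me.prettyprint.cassandra.service.ThriftKsDef;
import me.prettyprint.hector.api.Cluster;
import me.prettyprint.hector.api.ddl.ColumnFamilyDefinition;
import me.prettyprint.hector.api.ddl.KeyspaceDefinition;
import me.prettyprint.hector.api.factory.HFactory;

public class EnsureSchema {
    public static void main(String[] args) {
        Cluster cluster = HFactory.getOrCreateCluster("Test Cluster", "localhost:9160");
        KeyspaceDefinition ksDef = cluster.describeKeyspace("TestKeyspace");
        if (ksDef == null) {
            ColumnFamilyDefinition cfDef =
                    HFactory.createColumnFamilyDefinition("TestKeyspace", "TestColumnFamily");
            // true = block until schema agreement is reached across the cluster
            cluster.addKeyspace(
                    HFactory.createKeyspaceDefinition(
                            "TestKeyspace",
                            ThriftKsDef.DEF_STRATEGY_CLASS, // SimpleStrategy
                            1,                              // replication factor
                            Arrays.asList(cfDef)),
                    true);
        }
    }
}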
Hector Unit Tests
Hopefully you've been able to do this by now but this is how I've done it.
I have a Cassandra install (1.1.4), and assuming you have all the necessary directories created:
/var/lib/cassandra
/var/lib/cassandra/data
/var/lib/cassandra/commitlogs
/var/lib/cassandra/saved_caches
I start it using:
bin/cassandra -f
I create a simple script called schema_create.txt:
CREATE KEYSPACE TEST
WITH strategy_class = 'org.apache.cassandra.locator.SimpleStrategy'
AND strategy_options:replication_factor='1';
use TEST;
CREATE COLUMNFAMILY TestColumnFamily(
userid varchar,
firstname varchar,
lastname varchar,
PRIMARY KEY (userid));
Then from the command line you can run this script using the new CQL tool that comes with cassandra as follows:
bin/cqlsh --cql3 < schema_create.txt
This will install a keyspace named test with a column family named testcolumnfamily into Cassandra.
Now from within your Java application you can simply create a test class that has a main method (I will assume your development environment has all the necessary dependencies if using Maven):
try {
    Mutator<String> mutator = HFactory.createMutator(keyspace, StringSerializer.get());
    mutator.addInsertion("iamauser", "testcolumnfamily", HFactory.createStringColumn("firstname", "John"));
    mutator.addInsertion("iamauser", "testcolumnfamily", HFactory.createStringColumn("lastname", "Smith"));
    mutator.execute();
} catch (HectorException hex) {
    hex.printStackTrace();
} finally {
    cluster.getConnectionManager().shutdown();
}
Now go back to the command line and open cqlsh again:
$ bin/cqlsh --cql3
use test;
select * from testcolumnfamily;
The code above inserts a row into your Cassandra DB with the key iamauser and the name John Smith, which you can verify as shown using the cqlsh tool.
Hope this helps.
