Simplest way to insert data into a fresh Cassandra database using the Hector API? - cassandra

I've followed numerous examples on inserting data into a Cassandra database and every time I get an exception about unconfigured column families.
Exception in thread "main" me.prettyprint.hector.api.exceptions.HInvalidRequestException: InvalidRequestException(why:unconfigured columnfamily TestColumnFamily)
at me.prettyprint.cassandra.service.ExceptionsTranslatorImpl.translate(ExceptionsTranslatorImpl.java:45)
at me.prettyprint.cassandra.connection.HConnectionManager.operateWithFailover(HConnectionManager.java:252)
at me.prettyprint.cassandra.model.ExecutingKeyspace.doExecuteOperation(ExecutingKeyspace.java:97)
at me.prettyprint.cassandra.model.MutatorImpl.execute(MutatorImpl.java:243)
at me.prettyprint.cassandra.model.MutatorImpl.insert(MutatorImpl.java:69)
at CassandraInterface.main(CassandraInterface.java:101)
Caused by: InvalidRequestException(why:unconfigured columnfamily TestColumnFamily)
at org.apache.cassandra.thrift.Cassandra$batch_mutate_result.read(Cassandra.java:19477)
at org.apache.cassandra.thrift.Cassandra$Client.recv_batch_mutate(Cassandra.java:1035)
at org.apache.cassandra.thrift.Cassandra$Client.batch_mutate(Cassandra.java:1009)
at me.prettyprint.cassandra.model.MutatorImpl$3.execute(MutatorImpl.java:246)
at me.prettyprint.cassandra.model.MutatorImpl$3.execute(MutatorImpl.java:243)
at me.prettyprint.cassandra.service.Operation.executeAndSetResult(Operation.java:103)
at me.prettyprint.cassandra.connection.HConnectionManager.operateWithFailover(HConnectionManager.java:246)
... 4 more
So I looked up how to configure them and found
BasicColumnFamilyDefinition cfdef = new BasicColumnFamilyDefinition();
cfdef.setKeyspaceName(keyspaceName);
cfdef.setName(columnFamilyName);
cfdef.setKeyValidationClass(ComparatorType.UTF8TYPE.getClassName());
cfdef.setComparatorType(ComparatorType.UTF8TYPE);
That didn't configure the column family.
All of the examples I have found are fragments without any context, so I don't know what to import or set up. In addition, some examples appear to mix the Hector API v2 and the original Hector API, so when I use them, I get "class not found" or "function not found" compiler errors.

Hector CassandraClusterTest.java
@Test
public void testAddDropColumnFamily() throws Exception {
ColumnFamilyDefinition cfDef = HFactory.createColumnFamilyDefinition("Keyspace1", "DynCf");
cassandraCluster.addColumnFamily(cfDef);
String cfid2 = cassandraCluster.dropColumnFamily("Keyspace1", "DynCf");
assertNotNull(cfid2);
// Let's wait for agreement
cassandraCluster.addColumnFamily(cfDef, true);
cfid2 = cassandraCluster.dropColumnFamily("Keyspace1", "DynCf", true);
assertNotNull(cfid2);
}
Long story short, the keyspace and column family need to exist before you try to insert data into them. You can either manage this in your code, checking whether they exist and using the example above as a reference, or create them via the command-line interface (cassandra-cli).
Hector Unit Tests

Hopefully you've been able to do this by now but this is how I've done it.
I have a Cassandra install (version 1.1.4). Assuming you have created all the necessary directories:
/var/lib/cassandra
/var/lib/cassandra/data
/var/lib/cassandra/commitlogs
/var/lib/cassandra/saved_caches
I start it using:
bin/cassandra -f
I create a simple script called schema_create.txt:
CREATE KEYSPACE TEST
WITH strategy_class = 'org.apache.cassandra.locator.SimpleStrategy'
AND strategy_options:replication_factor='1';
use TEST;
CREATE COLUMNFAMILY TestColumnFamily(
userid varchar,
firstname varchar,
lastname varchar,
PRIMARY KEY (userid));
Then, from the command line, you can run this script using the new CQL tool that comes with Cassandra:
bin/cqlsh --cql3 < schema_create.txt
This will install a keyspace named test with a column family named testcolumnfamily into Cassandra.
Now, from within your Java application, you can create a test class that has a main method (I will assume your development environment has all the necessary dependencies, for example via Maven):
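// Cluster name and host are assumptions; adjust them to your installation.
Cluster cluster = HFactory.getOrCreateCluster("Test Cluster", "localhost:9160");
Keyspace keyspace = HFactory.createKeyspace("test", cluster);
try {
Mutator<String> mutator = HFactory.createMutator(keyspace, StringSerializer.get());
// Queue both columns for row key "iamauser", then send them in one batch.
mutator.addInsertion("iamauser", "testcolumnfamily", HFactory.createStringColumn("firstname", "John"));
mutator.addInsertion("iamauser", "testcolumnfamily", HFactory.createStringColumn("lastname", "Smith"));
mutator.execute();
}
catch (HectorException hex) { hex.printStackTrace(); }
finally { cluster.getConnectionManager().shutdown(); }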
Now go back to the command line and open a CQL shell against Cassandra:
bin/cqlsh --cql3
use test;
select * from testcolumnfamily;
The Java code inserts a row into your Cassandra database with the key iamauser and the name John Smith, which you can verify with the cqlsh query shown above.
Hope this helps.

Related

Schema disagreements with Cassandra 4.0 using the Java driver

We have a 3-node dev Cassandra cluster that we upgraded from 3.11.13 to 4.0.7. We send DDL statements through our Java applications using spring-data-cassandra:3.4.6, which uses the DataStax Java Driver version 4.14.1, and we had not faced any issues with this until the upgrade to 4.0.7.
The main issue we’re facing with 4.0.7 is schema disagreements caused by tables created programmatically, which was a non-issue for us on 3.11.x. DDL statements made through cqlsh work as expected; we see the schema disagreements only with programmatic creation.
We’ve tried different cluster setups, C* versions, and Ubuntu versions, but we still face the same issue:
3-node, single-rack DC (Ubuntu 18.04, 20.04, 22.04) (4.0.x, 4.1.x)
3-node, 3-rack DC (Ubuntu 18.04, 20.04, 22.04) (4.0.x, 4.1.x) — This is the setup we’ve been using since 3.11.x
We’ve also tried adjusting the driver configuration, such as changing the timeouts and disabling debouncing, but with no luck; we face the same issue.
advanced.control-connection {
schema-agreement {
interval = 500 milliseconds
timeout = 10 seconds
warn-on-failure = true
}
},
advanced.metadata {
topology-event-debouncer {
window = 1 milliseconds
max-events = 1
}
schema {
request-timeout = 5 seconds
debouncer {
window = 1 milliseconds
max-events = 1
}
}
}
We’re creating tables programmatically through the following snippets:
@Override
protected abstract List<String> getStartupScripts();
@Bean
SessionFactoryInitializer sessionFactoryInitializer(SessionFactory sessionFactory) {
SessionFactoryInitializer initializer = new SessionFactoryInitializer();
initializer.setSessionFactory(sessionFactory);
final ResourceKeyspacePopulator resourceKeyspacePopulator = new ResourceKeyspacePopulator();
getStartupScripts().forEach(script ->
{
resourceKeyspacePopulator.addScript(scriptOf(script));
});
initializer.setKeyspacePopulator(resourceKeyspacePopulator);
return initializer;
}
And create one like:
@Override
protected List<String> getStartupScripts() {
return Arrays.asList(testTable());
}
private String testTable() {
return "CREATE TABLE IF NOT EXISTS test_table ("
+ "test text, "
+ "test2 text, "
+ "createdat bigint, "
+ "PRIMARY KEY(test, test2))";
}
But we end up in a loop until it times out due to the schema disagreement, with the following errors:
DEBUG com.datastax.oss.driver.internal.core.metadata.SchemaAgreementChecker - [s1] Schema agreement not reached yet ([09989a2c-7348-3117-8b4a-d5cad549bc09, f4c8755d-6fec-38fe-984f-4083f4a0a0a0]), rescheduling in 500 ms
WARN org.springframework.context.support.GenericApplicationContext - Exception encountered during context initialization - cancelling refresh attempt: org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'sessionFactoryInitializer' defined in com.bitcoin.wallet.config.CassandraConfig: Bean instantiation via factory method failed; nested exception is org.springframework.beans.BeanInstantiationException: Failed to instantiate [org.springframework.data.cassandra.core.cql.session.init.SessionFactoryInitializer]: Factory method 'sessionFactoryInitializer' threw exception; nested exception is org.springframework.data.cassandra.core.cql.session.init.ScriptStatementFailedException: Failed to execute CQL script statement #1 of Byte array resource [resource loaded from byte array]: CREATE TABLE IF NOT EXISTS test_table (test text,test2 text,createdat bigint,PRIMARY KEY(test, test2)); nested exception is com.datastax.oss.driver.api.core.DriverTimeoutException: Query timed out after PT10S
So two things come to mind when reading through this:
Schema disagreements are often a symptom of some larger issue.
Does the node have its CPU pegged at 100%? Schema disagreement. Inefficient network routing? Schema disagreement. Disk IOPS maxed-out causing write back-pressure? Schema disagreement.
I'd have a look at the activity on the nodes and see if any of the above stand out.
Programmatic schema changes are often problematic.
Each node needs to store the complete schema, so each schema change gets sent to all nodes, which effectively means schema changes run at an asynchronous ALL level of consistency. Because of that, there's no margin for error. And programmatic schema changes are often sent from within an application much faster than Cassandra can reconcile them.
My recommendations for making any schema changes:
Execute during off-peak times.
Only run when all nodes are UN.
Run them using cqlsh (not from application code).
Verify each individual change using nodetool describecluster.

Getting SyntaxException programmatically creating a table with the Cassandra Python driver

Error:
cassandra.protocol.SyntaxException: \
<Error from server: code=2000 [Syntax error in CQL query] \
message="line 1:36 no viable alternative at input '(' \
(CREATE TABLE master_table(dict_keys[(]...)">
Code:
cluster = Cluster(cloud=cloud_config, auth_provider=auth_provider)
session=cluster.connect('firstkey')
ColName={"qty_dot_url": "int",
"qty_hyphen_url": "int",
"qty_underline_url": "int",
"qty_slash_url": "int"}
columns = ColName.keys()
values = ColName.values()
session.execute('CREATE TABLE master_table({ColName} {dataType}),PRIMARY KEY(qty_dot_url)'.format(ColName=columns, dataType=values))
How to resolve above mentioned error?
So I replaced the session.execute with a print, and it produced this:
CREATE TABLE master_table(dict_keys(['qty_dot_url', 'qty_hyphen_url', 'qty_underline_url', 'qty_slash_url']) dict_values(['int', 'int', 'int', 'int'])),PRIMARY KEY(qty_dot_url)
That is not valid CQL. It needs to look like this:
CREATE TABLE master_table(qty_dot_url int, qty_hyphen_url int,
qty_underline_url int, qty_slash_url int, PRIMARY KEY(qty_dot_url))
I was able to create that by making these adjustments to your code:
createTableCQL = "CREATE TABLE master_table("
for key, value in ColName.items():
createTableCQL += key + " " + value + ", "
createTableCQL += "PRIMARY KEY(qty_dot_url))"
You could then follow that with a session.execute(createTableCQL).
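As a variation on the loop above, here is a minimal sketch that builds the same statement with str.join, which avoids managing the separators by hand. The helper name build_create_table is hypothetical, and it assumes a single-column primary key and column names that need no quoting.

```python
def build_create_table(table, columns, primary_key):
    """Return a CREATE TABLE statement for a dict of column name -> CQL type."""
    if primary_key not in columns:
        raise ValueError(f"primary key {primary_key!r} is not a defined column")
    # One "name type" pair per column, in dict insertion order.
    column_defs = [f"{name} {cql_type}" for name, cql_type in columns.items()]
    # The PRIMARY KEY clause goes inside the parentheses, after the columns.
    column_defs.append(f"PRIMARY KEY ({primary_key})")
    return f"CREATE TABLE {table} ({', '.join(column_defs)})"


col_name = {
    "qty_dot_url": "int",
    "qty_hyphen_url": "int",
    "qty_underline_url": "int",
    "qty_slash_url": "int",
}
create_table_cql = build_create_table("master_table", col_name, "qty_dot_url")
print(create_table_cql)
```

The resulting string can be passed to session.execute in the same way as createTableCQL.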
Notes:
The PRIMARY KEY definition must be inside the paren list.
Creating schema from inside application code is often problematic, and can create a schema disagreement in the cluster. It's almost always better to create tables outside of code.
The syntax exception is a result of your Python code generating invalid CQL, which Aaron pointed out in his response.
To add to his answer, you need to add additional steps whenever you are programmatically making schema changes. In particular, you need to make sure that you check for schema agreement (i.e. the schema change has been propagated to all nodes) before moving on to the next bit in your code.
You will need to modify your code to save the result from the schema change, for example:
resultset = session.execute(SimpleStatement("CREATE TABLE ..."))
then call this in your code:
resultset.response_future.is_schema_agreed
You'll need to loop through this check until True is returned. Depending on how long you want to wait (default max_schema_agreement_wait is 10 seconds), you'll need to implement some logic to do [something] when schema agreement is not achieved (because a node is down for example) -- this requires manual intervention from an operator to investigate the cluster.
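One way to sketch such a loop, independent of the driver's own check, is to compare the schema versions the nodes report in system.local and system.peers. This is only an illustration under stated assumptions: the helper names are hypothetical, session is anything with an execute(query) method that returns rows exposing a schema_version attribute (as the Python driver's Session does with its default row type), and both queries are assumed to reach the same coordinator node.

```python
import time


def schema_versions(session):
    """Collect the schema versions reported by the queried node and its peers."""
    versions = {
        row.schema_version
        for row in session.execute("SELECT schema_version FROM system.local")
    }
    versions.update(
        row.schema_version
        for row in session.execute("SELECT schema_version FROM system.peers")
    )
    # Skip rows that carry no schema_version value.
    versions.discard(None)
    return versions


def wait_for_schema_agreement(session, timeout=10.0, interval=0.5, sleep=time.sleep):
    """Poll until exactly one schema version is reported; return False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        if len(schema_versions(session)) == 1:
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval)
```

A caller would run this after each schema change and stop issuing further DDL (and alert an operator) when it returns False.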
As Aaron already said, performing schema changes programmatically is very problematic and we discourage doing this unless you fully understand the pitfalls and know how to handle failures. Cheers!

mybatis spring batch +sybase: trying to get the database identity value after insertion to assign it to the id field in pojo

my code looks like the sample code given below.
--table create statement
CREATE TABLE LOG
(
uniqueID NUMERIC(20,0) IDENTITY,
NAME VARCHAR(20) NOT NULL,
DESCRIPTION VARCHAR(200) NOT NULL,
USR VARCHAR(20) NOT NULL
)
--pojo class
public class Log
{
private long identifier;
private String name;
private String description;
private String user;
//getters+setters......
}
--insert statement in mapper
<insert id="insertRecord" parameterType="com.xxx.yyy.zzz.model.Log" useGeneratedKeys="true" keyProperty="identifier" keyColumn="uniqueID">
INSERT INTO LOG (NAME, DESCRIPTION, USR)
VALUES (#{log.name}, #{log.description}, #{log.user})
</insert>
Issue: when I run this code against a Sybase database, I get a NullPointerException. When I debugged it, the error came from within SybStatement.class. Sorry, I am not able to provide the entire stack trace due to copy/paste constraints at my workstation.
I am able to run the same code against an H2 database successfully. Records get inserted, and "identifier" in the Log object holds the same identity value as the database row.
Has anyone faced this issue in Sybase? Please share any code showing the usage of the MyBatis "useGeneratedKeys" feature with Sybase.
Note:
I am running this insert statement using MybatisBatchItemWriter.
I tried to use two different sqlsessiontemplate objects for chunk reader & chunk writer and it didn't resolve the issue.
I am using jconn3 sybase jdbc jar, mybatis 3.4.4 and mybatis-spring 1.3.1 jar.
Thanks in advance
In SQL terms, you need to do SELECT @@IDENTITY to pick up the generated value. The question is whether your framework generates such SQL...

Composite key in Cassandra with Pig and where_clause for part of the key in the where clause

I basically have the same problem as the following Composite key in Cassandra with Pig. The only difference is I try to query for a part of the composite key within the where_clause of pig.
The data structure is similar to the earlier mentioned issue, I'll copy some code/context to minimize the reading of that issue.
We have a CQL table that looks something like this:
CREATE table data (
occurday text,
seqnumber int,
occurtimems bigint,
unique bigint,
fields map<text, text>,
primary key ((occurday, seqnumber), occurtimems, unique)
)
Instead of querying for both seqnumber and occurday (as in the previously mentioned question), I try to query on only one of the keys.
If I execute this query as part of a LOAD from within Pig, however, things don't work.
-- Need to URL encode the query
data = LOAD 'cql://ks/data?where_clause=occurday%3D%272013-10-01%27' USING CqlStorage();
gives
java.lang.RuntimeException
at org.apache.cassandra.hadoop.cql3.CqlPagingRecordReader$RowIterator.executeQuery(CqlPagingRecordReader.java:665)
at org.apache.cassandra.hadoop.cql3.CqlPagingRecordReader$RowIterator.<init>(CqlPagingRecordReader.java:301)
at org.apache.cassandra.hadoop.cql3.CqlPagingRecordReader.initialize(CqlPagingRecordReader.java:167)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.initialize(PigRecordReader.java:181)
at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.initialize(MapTask.java:522)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:763)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
Caused by: InvalidRequestException(why:occurday cannot be restricted by more than one relation if it includes an Equal)
at org.apache.cassandra.thrift.Cassandra$prepare_cql3_query_result$prepare_cql3_query_resultStandardScheme.read(Cassandra.java:51017)
at org.apache.cassandra.thrift.Cassandra$prepare_cql3_query_result$prepare_cql3_query_resultStandardScheme.read(Cassandra.java:50994)
at org.apache.cassandra.thrift.Cassandra$prepare_cql3_query_result.read(Cassandra.java:50933)
at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:78)
at org.apache.cassandra.thrift.Cassandra$Client.recv_prepare_cql3_query(Cassandra.java:1756)
at org.apache.cassandra.thrift.Cassandra$Client.prepare_cql3_query(Cassandra.java:1742)
at org.apache.cassandra.hadoop.cql3.CqlPagingRecordReader$RowIterator.prepareQuery(CqlPagingRecordReader.java:605)
at org.apache.cassandra.hadoop.cql3.CqlPagingRecordReader$RowIterator.executeQuery(CqlPagingRecordReader.java:635)
... 7 more
Basically my question is, what am I doing wrong or what don't I understand?
As I understand from CqlPagingRecorderReader Used when Partition Key Is Explicitly Stated
I should be able to query with just part of the partition key?
Also while reading
Add CqlRecordReader to take advantage of native CQL pagination
I get the impression this should be possible, but I am swimming around with (in my opinion) no clear direction on how to accomplish this.
Any help is very very welcome at this point.
Regards,
Lennart Weijl
PS.
I am running on Cassandra 2.0.9 with Pig 0.13.0
According to CASSANDRA-6311, I believe you need to apply the 6331-v2-2.0-branch.txt patch, recompile pig, and then update your LOAD statement to:
data = LOAD 'cql://ks/data?where_clause=occurday%3D%272013-10-01%27' USING CqlInputFormat();
The key change being USING CqlInputFormat() which triggers the use of the new CqlRecordReader that was released in Cassandra 2.0.7.
Edit: Note that the exception is thrown from CqlPagingRecordReader which means you're still using the old record reader.
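The LOAD statements in this thread pass the where_clause URL-encoded. As a small sketch of how that encoded value can be produced, the following uses Python's standard urllib.parse.quote; the helper name cql_load_url is hypothetical and only assembles the string.

```python
from urllib.parse import quote


def cql_load_url(keyspace, table, where_clause):
    """Build a cql:// location string with a percent-encoded where_clause."""
    # safe="" makes quote() encode every reserved character, including '=' and "'".
    encoded = quote(where_clause, safe="")
    return f"cql://{keyspace}/{table}?where_clause={encoded}"


url = cql_load_url("ks", "data", "occurday='2013-10-01'")
print(url)  # cql://ks/data?where_clause=occurday%3D%272013-10-01%27
```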

How to check if a Cassandra table exists

Is there an easy way to check if table (column family) is defined in Cassandra using CQL (or API perhaps, using com.datastax.driver)?
Right now I am leaning towards executing SELECT 1 FROM table and checking for exception but maybe there is a better way?
As of 1.1 you should be able to query the system keyspace, schema_columnfamilies column family. If you know which keyspace you want to check, this CQL should list all column families in a keyspace:
SELECT columnfamily_name
FROM system.schema_columnfamilies WHERE keyspace_name='myKeyspaceName';
The report describing this functionality is here: https://issues.apache.org/jira/browse/CASSANDRA-2477
Although, they do note that some of the system column names have changed between 1.1 and 1.2. So you might have to mess around with it a little to get your desired results.
Edit 20160523 - Cassandra 3.x Update:
Note that for Cassandra 3.0 and up, you'll need to make a few adjustments to the above query:
SELECT table_name
FROM system_schema.tables WHERE keyspace_name='myKeyspaceName';
The Java driver (since you mentioned it in your question) also maintains a local representation of the schema.
Driver 3.x and below:
KeyspaceMetadata ks = cluster.getMetadata().getKeyspace("myKeyspace");
TableMetadata table = ks.getTable("myTable");
boolean tableExists = (table != null);
Driver 4.x and above:
Metadata metadata = session.getMetadata();
boolean tableExists =
metadata.getKeyspace("myKeyspace")
.flatMap(ks -> ks.getTable("myTable"))
.isPresent();
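For comparison, the system_schema.tables query above can be wrapped in a small helper when working from Python. This is a sketch under assumptions: table_exists is a hypothetical name, session is anything whose execute(query, parameters) returns a result with a one() method (as the Python driver's Session does), the cluster runs Cassandra 3.0 or later, and the identifiers were created unquoted (and are therefore stored in lowercase).

```python
def table_exists(session, keyspace, table):
    """Return True if system_schema.tables has a row for keyspace.table."""
    query = (
        "SELECT table_name FROM system_schema.tables "
        "WHERE keyspace_name = %s AND table_name = %s"
    )
    # Assumption: unquoted identifiers, which Cassandra stores in lowercase.
    row = session.execute(query, (keyspace.lower(), table.lower())).one()
    return row is not None
```

Unlike SELECT 1 FROM table, this does not rely on catching an exception to detect a missing table.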
I just needed to manually check for the existence of a table using cqlsh.
Possibly useful general info.
describe keyspace_name.table_name
If it doesn't exist you'll get 'table_name' not found in keyspace 'keyspace'
If it does exist you'll get a description of the table.
For the .NET driver CassandraCSharpDriver version 3.17.1 the following code creates a table if it doesn't exist yet:
var ks = _cassandraSession.Cluster.Metadata.GetKeyspace(keyspaceName);
var tableNames = ks.GetTablesNames();
if(!tableNames.Contains(tableName.ToLowerInvariant()))
{
var stmt = new SimpleStatement($"CREATE TABLE {tableName} (id text PRIMARY KEY, name text, price decimal, volume int, time timestamp)");
_cassandraSession.Execute(stmt);
}
You will need to adapt the list of table columns to your needs. This can also be awaited by using await _cassandraSession.ExecuteAsync(stmt).ConfigureAwait(false) in an async method.
Also, I want to mention that I'm using Cassandra version 4.0.1.

Resources