I use wso2 dss to insert data into a cassandra table.
for exemple this table :
CREATE TABLE logs.test (id int,code int, PRIMARY KEY (id));
Inside wso2 dss, I defined code column with default value like this : #{NULL}
When I Try the dss service like this without given the code parameter:
<p:test xmlns:p="http://ws.wso2.org/dataservice">
<xs:id xmlns:xs="http://ws.wso2.org/dataservice">1</xs:id>
</p:test>
I get this error :
<axis2ns56:source_data_service>
<axis2ns56:data_service_name>Cassandra</axis2ns56:data_service_name>
<axis2ns56:description>N/A</axis2ns56:description>
<axis2ns56:location>\Cassandra.dbs</axis2ns56:location>
<axis2ns56:default_namespace>http://ws.wso2.org/dataservice</axis2ns56:default_namespace>
</axis2ns56:source_data_service>
<axis2ns56:ds_code>UNKNOWN_ERROR</axis2ns56:ds_code>
<axis2ns56:nested_exception>java.lang.NumberFormatException: null</axis2ns56:nested_exception>
Nested Exception:- java.lang.NumberFormatException: For input string: "null"
Best regards,
Nicolas
Would it be possible to get the source of the dataservice?
Did you try with the following payload
<p:test xmlns:p="http://ws.wso2.org/dataservice">
<p:id>1</p:id>
<p:code>2</p:code>
</p:test>
So I guess your issue is in this part
<param defaultValue="#{NULL}" name="code" sqlType="INTEGER"/>.
I do not know your use case but if I remember well it's not so nice to insert null values in Cassandra because it create tombstones.
You could as well have a second query that simply inserts the id like
insert test (id) values (:id).
The execption sound to be raised by dss not cassandra, looks like it is not able to set a null value for integer field
I find a workaround, I use the jdbc cassandra instead of com.datatasax driver.
And it work well. The only problem is that I just can call only one node for the connection and not the cluster.
I hope the problem will be resolve soon and I will use the Dss Cassandra datasource connection again.
Thks for your help
Related
Error:
cassandra.protocol.SyntaxException: \
<Error from server: code=2000 [Syntax error in CQL query] \
message="line 1:36 no viable alternative at input '(' \
(CREATE TABLE master_table(dict_keys[(]...)">
Code:
cluster = Cluster(cloud=cloud_config, auth_provider=auth_provider)
session=cluster.connect('firstkey')
ColName={"qty_dot_url": "int",
"qty_hyphen_url": "int",
"qty_underline_url": "int",
"qty_slash_url": "int"}
columns = ColName.keys()
values = ColName.values()
session.execute('CREATE TABLE master_table({ColName} {dataType}),PRIMARY KEY(qty_dot_url)'.format(ColName=columns, dataType=values))
How to resolve above mentioned error?
So I replaced the session.execute with a print, and it produced this:
CREATE TABLE master_table(dict_keys(['qty_dot_url', 'qty_hyphen_url', 'qty_underline_url', 'qty_slash_url']) dict_values(['int', 'int', 'int', 'int'])),PRIMARY KEY(qty_dot_url)
That is not valid CQL. It needs to look like this:
CREATE TABLE master_table(qty_dot_url int, qty_hyphen_url int,
qty_underline_url int, qty_slash_url int, PRIMARY KEY(qty_dot_url))
I was able to create that by making these adjustments to your code:
createTableCQL = "CREATE TABLE master_table("
for key, value in ColName.items():
createTableCQL += key + " " + value + ", "
createTableCQL += "PRIMARY KEY(qty_dot_url))"
You could then follow that with a session.execute(createTableCQL).
Notes:
The PRIMARY KEY definition must be inside the paren list.
Creating schema from inside application code is often problematic, and can create a schema disagreement in the cluster. It's almost always better to create tables outside of code.
The syntax exception is a result of your Python code generating an invalid CQL which Aaron pointed out in his response.
To add to his answer, you need to add additional steps whenever you are programatically making schema changes. In particular, you need to make sure that you check for schema agreement (i.e. the schema change has been propagated to all nodes) before moving on to the next bit in your code.
You will need to modify your code to save the result from the schema change, for example:
resultset = session.execute(SimpleStatement("CREATE TABLE ..."))
then call this in your code:
resultset.response_future.is_schema_agreed
You'll need to loop through this check until True is returned. Depending on how long you want to wait (default max_schema_agreement_wait is 10 seconds), you'll need to implement some logic to do [something] when schema agreement is not achieved (because a node is down for example) -- this requires manual intervention from an operator to investigate the cluster.
As Aaron already said, performing schema changes programatically is very problematic and we discourage doing this unless you fully understand the pitfalls and know how to handle failures. Cheers!
I am planning to use the Update...USING TIMESTAMP... statement to make sure that I do not overwrite fresh data with stale data while having to avoid doing at least LOCAL_QUORUM writes.
Here is my table structure.
Table=DocumentStore
DocumentID (primaryKey, bigint)
Document(text)
Version(int)
If the service receives 2 write requests with Version=1 and Version=2, regardless of the order of arrival, the business requirement is that we end up with Version=2 in the database.
Can I use the following CQL Statement?
Update DocumentStore using <versionValue>
SET Document=<documentValue>,
Version=<versionValue>
where DocumentID=<documentIDValue>;
Has anybody used something like this? If so was the behavior as expected?
Yes, this is a known technique. Although it should be
UPDATE "DocumentStore" USING TIMESTAMP <versionValue>
SET "Document" = <documentValue>,
"Version" = <versionValue>
WHERE "DocumentID" = <documentIDValue>;
You missed a TIMESTAMP keyword, and also since you are using case sensitive names, you should enclose them in quotes.
I basically have the same problem as the following Composite key in Cassandra with Pig. The only difference is I try to query for a part of the composite key within the where_clause of pig.
The data structure is similar to the earlier mentioned issue, I'll copy some code/context to minimize the reading of that issue.
We have a CQL table that looks something like this:
CREATE table data (
occurday text,
seqnumber int,
occurtimems bigint,
unique bigint,
fields map<text, text>,
primary key ((occurday, seqnumber), occurtimems, unique)
)
Instead of querying for both the seqnumber and the occurday (as was the issue in previously mentioned issue) I try to query one of the keys.
If I execute this query as part of a LOAD from within Pig, however, things don't work.
-- Need to URL encode the query
data = LOAD 'cql://ks/data?where_clause=occurday%3D%272013-10-01%27' USING CqlStorage();
gives
java.lang.RuntimeException
at org.apache.cassandra.hadoop.cql3.CqlPagingRecordReader$RowIterator.executeQuery(CqlPagingRecordReader.java:665)
at org.apache.cassandra.hadoop.cql3.CqlPagingRecordReader$RowIterator.<init>(CqlPagingRecordReader.java:301)
at org.apache.cassandra.hadoop.cql3.CqlPagingRecordReader.initialize(CqlPagingRecordReader.java:167)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.initialize(PigRecordReader.java:181)
at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.initialize(MapTask.java:522)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:763)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
Caused by: InvalidRequestException(why:occurday cannot be restricted by more than one relation if it includes an Equal)
at org.apache.cassandra.thrift.Cassandra$prepare_cql3_query_result$prepare_cql3_query_resultStandardScheme.read(Cassandra.java:51017)
at org.apache.cassandra.thrift.Cassandra$prepare_cql3_query_result$prepare_cql3_query_resultStandardScheme.read(Cassandra.java:50994)
at org.apache.cassandra.thrift.Cassandra$prepare_cql3_query_result.read(Cassandra.java:50933)
at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:78)
at org.apache.cassandra.thrift.Cassandra$Client.recv_prepare_cql3_query(Cassandra.java:1756)
at org.apache.cassandra.thrift.Cassandra$Client.prepare_cql3_query(Cassandra.java:1742)
at org.apache.cassandra.hadoop.cql3.CqlPagingRecordReader$RowIterator.prepareQuery(CqlPagingRecordReader.java:605)
at org.apache.cassandra.hadoop.cql3.CqlPagingRecordReader$RowIterator.executeQuery(CqlPagingRecordReader.java:635)
... 7 more
Basically my question is, what am I doing wrong or what don't I understand?
As I understand from CqlPagingRecorderReader Used when Partition Key Is Explicitly Stated
I should be able to query with just part of the partition key?
Also while reading
Add CqlRecordReader to take advantage of native CQL pagination
I get the impression this should be possible, but I am swimming around with (in my opinion) no clear direction on how to accomplish this.
Any help is very very welcome at this point.
Regards,
Lennart Weijl
PS.
I am running on Cassandra 2.0.9 with Pig 0.13.0
According to CASSANDRA-6311, I believe you need to apply the 6331-v2-2.0-branch.txt patch, recompile pig, and then update your LOAD statement to:
data = LOAD 'cql://ks/data?where_clause=occurday%3D%272013-10-01%27' USING CqlInputFormat();
The key change being USING CqlInputFormat() which triggers the use of the new CqlRecordReader that was released in Cassandra 2.0.7.
Edit: Note that the exception is thrown from CqlPagingRecordReader which means you're still using the old record reader.
Is there an easy way to check if table (column family) is defined in Cassandra using CQL (or API perhaps, using com.datastax.driver)?
Right now I am leaning towards executing SELECT 1 FROM table and checking for exception but maybe there is a better way?
As of 1.1 you should be able to query the system keyspace, schema_columnfamilies column family. If you know which keyspace you want to check, this CQL should list all column families in a keyspace:
SELECT columnfamily_name
FROM schema_columnfamilies WHERE keyspace_name='myKeyspaceName';
The report describing this functionality is here: https://issues.apache.org/jira/browse/CASSANDRA-2477
Although, they do note that some of the system column names have changed between 1.1 and 1.2. So you might have to mess around with it a little to get your desired results.
Edit 20160523 - Cassandra 3.x Update:
Note that for Cassandra 3.0 and up, you'll need to make a few adjustments to the above query:
SELECT table_name
FROM system_schema.tables WHERE keyspace_name='myKeyspaceName';
The Java driver (since you mentioned it in your question) also maintains a local representation of the schema.
Driver 3.x and below:
KeyspaceMetadata ks = cluster.getMetadata().getKeyspace("myKeyspace");
TableMetadata table = ks.getTable("myTable");
boolean tableExists = (table != null);
Driver 4.x and above:
Metadata metadata = session.getMetadata();
boolean tableExists =
metadata.getKeyspace("myKeyspace")
.flatMap(ks -> ks.getTable("myTable"))
.isPresent();
I just needed to manually check for the existence of a table using cqlsh.
Possibly useful general info.
describe keyspace_name.table_name
If it doesn't exist you'll get 'table_name' not found in keyspace 'keyspace'
If it does exist you'll get a description of the table.
For the .NET driver CassandraCSharpDriver version 3.17.1 the following code creates a table if it doesn't exist yet:
var ks = _cassandraSession.Cluster.Metadata.GetKeyspace(keyspaceName);
var tableNames = ks.GetTablesNames();
if(!tableNames.Contains(tableName.ToLowerInvariant()))
{
var stmt = new SimpleStatement($"CREATE TABLE {tableName} (id text PRIMARY KEY, name text, price decimal, volume int, time timestamp)");
_cassandraSession.Execute(stmt);
}
You will need to adapt the list of table columns to your needs. This can also be awaited by using await _cassandraSession.ExecuteAsync(stmt).ConfigureAwait(false) in an async method.
Also, I want to mention that I'm using Cassandra version 4.0.1.
trying to write a query which will paginate through all rows in a column family using astyanax client and RowSliceQuery.
keyspace.prepareQuery(COLUMN_FAMILY).getKeyRange(null, null, null, null, 100);
Done this successfully using hector where 1st call is done with null start and end keys. After retrieving 1st page I use last key from the result to make query for second page and etc. This is code for 1st page using hector.
HFactory.createRangeSlicesQuery(keyspace,
LongSerializer.get(), new CompositeSerializer(),
BytesArraySerializer.get())
.setColumnFamily(COLUMN_FAMILY)
.setRange(null, null, false, 100).setRowCount(100);
Now when I am trying to do this with astyanax I am getting errors about null and non-null keys and tokens. Not sure what tokens do in this query. Also I am able to use allRows(), but would like to do this using key range query as it gives me more flexibility.
Does anybody have an example of key range query using astyanax? I cannot find an example neither in "getting started" documentation or anywhere else on the net.
Thanks!
Anton
What you are referring to is the getRowRange method:
keyspace.prepareQuery(CF_STANDARD1)
.getRowRange(startKey, endKey, startToken, endToken, count)
Note however that this works only when the ByteOrderedPartitioner is used. Since by default Cassandra uses the Murmur3Partitioner, this will usually not work. Using an index to do this instead is recommended. Astyanax also provides the reverse index search recipe which takes advantage of a second column family which stores your keys as columns to allow efficient range searches on the original data.
Check this sample code. I hope this code will help you in doing the paging.
IndexQuery<String, String> query = keyspace
.prepareQuery(CF_STANDARD1).searchWithIndex()
.setRowLimit(10).autoPaginateRows(true).addExpression()
.whereColumn("Index2").equals().value(42);
Best,