Composite key in Cassandra with Pig, with only part of the key in the where_clause - cassandra

I basically have the same problem as in Composite key in Cassandra with Pig. The only difference is that I try to query for only part of the composite key in the where_clause of the Pig LOAD.
The data structure is the same as in that question; I'll copy some code/context here to save you reading it.
We have a CQL table that looks something like this:
CREATE table data (
occurday text,
seqnumber int,
occurtimems bigint,
unique bigint,
fields map<text, text>,
primary key ((occurday, seqnumber), occurtimems, unique)
)
Instead of querying for both seqnumber and occurday (as in the previously mentioned question), I try to query on just one of the partition key columns.
If I execute such a query as part of a LOAD from within Pig, however, things don't work.
-- Need to URL encode the query
data = LOAD 'cql://ks/data?where_clause=occurday%3D%272013-10-01%27' USING CqlStorage();
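-- The where_clause above decodes to: occurday='2013-10-01'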
gives
java.lang.RuntimeException
at org.apache.cassandra.hadoop.cql3.CqlPagingRecordReader$RowIterator.executeQuery(CqlPagingRecordReader.java:665)
at org.apache.cassandra.hadoop.cql3.CqlPagingRecordReader$RowIterator.<init>(CqlPagingRecordReader.java:301)
at org.apache.cassandra.hadoop.cql3.CqlPagingRecordReader.initialize(CqlPagingRecordReader.java:167)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.initialize(PigRecordReader.java:181)
at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.initialize(MapTask.java:522)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:763)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
Caused by: InvalidRequestException(why:occurday cannot be restricted by more than one relation if it includes an Equal)
at org.apache.cassandra.thrift.Cassandra$prepare_cql3_query_result$prepare_cql3_query_resultStandardScheme.read(Cassandra.java:51017)
at org.apache.cassandra.thrift.Cassandra$prepare_cql3_query_result$prepare_cql3_query_resultStandardScheme.read(Cassandra.java:50994)
at org.apache.cassandra.thrift.Cassandra$prepare_cql3_query_result.read(Cassandra.java:50933)
at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:78)
at org.apache.cassandra.thrift.Cassandra$Client.recv_prepare_cql3_query(Cassandra.java:1756)
at org.apache.cassandra.thrift.Cassandra$Client.prepare_cql3_query(Cassandra.java:1742)
at org.apache.cassandra.hadoop.cql3.CqlPagingRecordReader$RowIterator.prepareQuery(CqlPagingRecordReader.java:605)
at org.apache.cassandra.hadoop.cql3.CqlPagingRecordReader$RowIterator.executeQuery(CqlPagingRecordReader.java:635)
... 7 more
Basically my question is, what am I doing wrong or what don't I understand?
As I understand from "CqlPagingRecordReader Used when Partition Key Is Explicitly Stated",
I should be able to query with just part of the partition key?
Also while reading
Add CqlRecordReader to take advantage of native CQL pagination
I get the impression this should be possible, but I am swimming around with (in my opinion) no clear direction on how to accomplish this.
Any help is very very welcome at this point.
Regards,
Lennart Weijl
PS.
I am running on Cassandra 2.0.9 with Pig 0.13.0

According to CASSANDRA-6311, I believe you need to apply the 6331-v2-2.0-branch.txt patch, recompile Pig, and then update your LOAD statement to:
data = LOAD 'cql://ks/data?where_clause=occurday%3D%272013-10-01%27' USING CqlInputFormat();
The key change is USING CqlInputFormat(), which triggers the use of the new CqlRecordReader released in Cassandra 2.0.7.
Edit: Note that the exception is thrown from CqlPagingRecordReader, which means you're still using the old record reader.

Related

Apache Cassandra "no viable alternative at input 'OR' "

My table looks like this:
CREATE TABLE prod_cust (
pid bigint,
cid bigint,
effective_date date,
expiry_date date,
PRIMARY KEY ((pid, cid))
);
The query below gives a no viable alternative at input 'OR' error:
SELECT * FROM prod_cust
where
pid=101 and cid=201
OR
pid=102 and cid=202;
Does Cassandra not support the OR operator? If not, is there an alternative way to achieve my result?
CQL does not support the OR operator. Sometimes you can get around that by using IN. But even IN won't let you do what you're attempting.
I see two options:
Submit each side of your OR as individual queries (see the sketch below).
Restructure the table to better suit what you're trying to do. Doing a "port-over" from an RDBMS to Cassandra almost never works as intended.
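For illustration, a minimal sketch of the first option against the prod_cust table above, using the values from the question; the application then merges the two result sets:
SELECT * FROM prod_cust WHERE pid = 101 AND cid = 201;
SELECT * FROM prod_cust WHERE pid = 102 AND cid = 202;
Each query restricts the full partition key (pid, cid), so both are plain single-partition reads.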

Wso2 Dss insert null cassandra

I use WSO2 DSS to insert data into a Cassandra table.
For example, this table:
CREATE TABLE logs.test (id int,code int, PRIMARY KEY (id));
Inside WSO2 DSS, I defined the code column with a default value like this: #{NULL}
When I try the DSS service like this, without giving the code parameter:
<p:test xmlns:p="http://ws.wso2.org/dataservice">
<xs:id xmlns:xs="http://ws.wso2.org/dataservice">1</xs:id>
</p:test>
I get this error:
<axis2ns56:source_data_service>
<axis2ns56:data_service_name>Cassandra</axis2ns56:data_service_name>
<axis2ns56:description>N/A</axis2ns56:description>
<axis2ns56:location>\Cassandra.dbs</axis2ns56:location>
<axis2ns56:default_namespace>http://ws.wso2.org/dataservice</axis2ns56:default_namespace>
</axis2ns56:source_data_service>
<axis2ns56:ds_code>UNKNOWN_ERROR</axis2ns56:ds_code>
<axis2ns56:nested_exception>java.lang.NumberFormatException: null</axis2ns56:nested_exception>
Nested Exception:- java.lang.NumberFormatException: For input string: "null"
Best regards,
Nicolas
Would it be possible to get the source of the dataservice?
Did you try with the following payload?
<p:test xmlns:p="http://ws.wso2.org/dataservice">
<p:id>1</p:id>
<p:code>2</p:code>
</p:test>
So I guess your issue is in this part:
<param defaultValue="#{NULL}" name="code" sqlType="INTEGER"/>.
I do not know your use case, but if I remember well it's not good to insert null values in Cassandra because it creates tombstones.
You could also have a second query that simply inserts the id, like
INSERT INTO test (id) VALUES (:id);
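For illustration, a minimal sketch in plain CQL of the two insert variants, assuming the logs.test table above (the :id and :code parameter names are just placeholders):
INSERT INTO logs.test (id, code) VALUES (:id, :code);
INSERT INTO logs.test (id) VALUES (:id);
The second variant simply omits the code column, so no null value (and no tombstone) is ever written.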
The exception seems to be raised by DSS, not Cassandra; it looks like it is unable to set a null value for an integer field.
I found a workaround: I use the JDBC Cassandra driver instead of the com.datastax driver,
and it works well. The only problem is that I can only point the connection at a single node rather than the whole cluster.
I hope the problem will be resolved soon so that I can use the DSS Cassandra datasource connection again.
Thanks for your help.

Cassandra 2.0, CQL 3.x Update ...USING TIMESTAMP

I am planning to use the UPDATE ... USING TIMESTAMP ... statement to make sure that I do not overwrite fresh data with stale data, while avoiding having to do at least LOCAL_QUORUM writes.
Here is my table structure.
Table=DocumentStore
DocumentID (primaryKey, bigint)
Document(text)
Version(int)
If the service receives 2 write requests with Version=1 and Version=2, regardless of the order of arrival, the business requirement is that we end up with Version=2 in the database.
Can I use the following CQL Statement?
Update DocumentStore using <versionValue>
SET Document=<documentValue>,
Version=<versionValue>
where DocumentID=<documentIDValue>;
Has anybody used something like this? If so was the behavior as expected?
Yes, this is a known technique. Although it should be
UPDATE "DocumentStore" USING TIMESTAMP <versionValue>
SET "Document" = <documentValue>,
"Version" = <versionValue>
WHERE "DocumentID" = <documentIDValue>;
You missed the TIMESTAMP keyword, and since you are using case-sensitive names, you should enclose them in double quotes.
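As an illustration with made-up values, the write carrying the higher timestamp wins regardless of arrival order, so even if the Version=1 request is processed second it does not overwrite Version=2:
UPDATE "DocumentStore" USING TIMESTAMP 2
SET "Document" = 'contents of version 2', "Version" = 2
WHERE "DocumentID" = 42;
UPDATE "DocumentStore" USING TIMESTAMP 1
SET "Document" = 'contents of version 1', "Version" = 1
WHERE "DocumentID" = 42;
After both statements run, reading the row back returns Version = 2, because Cassandra resolves conflicts per cell by the highest write timestamp.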

Composite key in Cassandra with Pig

We have a CQL table that looks something like this:
CREATE table data (
occurday text,
seqnumber int,
occurtimems bigint,
unique bigint,
fields map<text, text>,
primary key ((occurday, seqnumber), occurtimems, unique)
)
I can query this table from cqlsh like this:
select * from data where seqnumber = 10 AND occurday = '2013-10-01';
This query works and returns the expected data.
If I execute this query as part of a LOAD from within Pig, however, things don't work.
-- Need to URL encode the query
data = LOAD 'cql://ks/data?where_clause=seqnumber%3D10%20AND%20occurday%3D%272013-10-01%27' USING CqlStorage();
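-- The where_clause above decodes to: seqnumber=10 AND occurday='2013-10-01'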
gives
InvalidRequestException(why:seqnumber cannot be restricted by more than one relation if it includes an Equal)
at org.apache.cassandra.thrift.Cassandra$prepare_cql3_query_result.read(Cassandra.java:39567)
at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:78)
at org.apache.cassandra.thrift.Cassandra$Client.recv_prepare_cql3_query(Cassandra.java:1625)
at org.apache.cassandra.thrift.Cassandra$Client.prepare_cql3_query(Cassandra.java:1611)
at org.apache.cassandra.hadoop.cql3.CqlPagingRecordReader$RowIterator.prepareQuery(CqlPagingRecordReader.java:591)
at org.apache.cassandra.hadoop.cql3.CqlPagingRecordReader$RowIterator.executeQuery(CqlPagingRecordReader.java:621)
Shouldn't these behave the same? Why does the version through Pig fail when the straight cqlsh command works?
Hadoop is using CqlPagingRecordReader to try to load your data. This is leading to queries that are not identical to what you have entered. The paging record reader is trying to obtain small slices of Cassandra data at a time to avoid timeouts.
This means that your query is executed as
SELECT * FROM "data" WHERE token("occurday","seqnumber") > ? AND
token("occurday","seqnumber") <= ? AND occurday='A Great Day'
AND seqnumber=1 LIMIT 1000 ALLOW FILTERING
And this is why you are seeing your repeated key error. I'll submit a bug to the Cassandra Project.
Jira:
https://issues.apache.org/jira/browse/CASSANDRA-6151

How do I specify a primary key on table creation with haskellDB?

Currently I am using something like this:
dbCreateTable db "MyTable" [ ("Col1", (StringT, False)), ("Col2", (StringT, False)) ]
which works fine, but I'd like to make "Col1" the primary key. Do I need to go back to raw SQL?
Edit:
This still seems to hold:
"The part of creating a database from Haskell itself is not very
useful, for example you cannot express foreign- and primary keys,
indexes and constraints. Even the most simple database will need
one of these."
From http://www.mijnadres.net/published/HaskellDB.pdf
As the edit notes, HaskellDB is not very good at creating tables at the moment. It's best to build a database first, and then extract the info.
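If you do drop down to raw SQL for the table creation, a minimal sketch matching the two text columns from the question (plain SQL, executed outside HaskellDB) could look like:
CREATE TABLE MyTable (
    Col1 VARCHAR(255) NOT NULL,
    Col2 VARCHAR(255) NOT NULL,
    PRIMARY KEY (Col1)
);
You would then point HaskellDB at the existing table rather than creating it through dbCreateTable.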
