Cassandra/Hector: Add a counter on a composite primary key - cassandra

I've created a table in CQL3 console (no single primary key constituent is unique, together they will be):
CREATE TABLE aggregate_logs (
bpid varchar,
jid int,
month int,
year int,
value counter,
PRIMARY KEY (bpid, jid, month, year));
then been able to update and query by using:
UPDATE aggregate_logs SET value = value + 1 WHERE bpid='1' and jid=1 and month=1 and year=2000;
This works as expected. I wanted to do the same update in Hector (in Scala):
val aggregateMutator:Mutator[Composite] = HFactory.createMutator(keyspace, compositeSerializer)
val compKey = new Composite()
compKey.addComponent(bpid, stringSerializer)
compKey.addComponent(new Integer(jid), intSerializer)
compKey.addComponent(new Integer(month), intSerializer)
compKey.addComponent(new Integer(year), intSerializer)
aggregateMutator.incrementCounter(compKey, LogsAggregateFamily, "value", 1)
but I get an error with the message:
...HInvalidRequestException: InvalidRequestException(why:String didn't validate.)
Running the query direct from hector with:
val query = new me.prettyprint.cassandra.model.CqlQuery(keyspace, compositeSerializer, stringSerializer, new IntegerSerializer())
query.setQuery("UPDATE aggregate_logs SET value = value + 1 WHERE 'bpid'=1 and jid=1 and month=1 and year=2000")
query.execute()
which gives me the error:
InvalidRequestException(why:line 1:59 mismatched input 'and' expecting EOF)
I've not seem any other examples which use a counter under a composite primary key. Is it even possible?

It's definitely possible using directly cql (both via CQLSH and C++, at least):
cqlsh:goh_master> describe table daily_caps;
CREATE TABLE daily_caps
( caps_type ascii, id ascii, value counter, PRIMARY KEY
(caps_type, id) ) WITH COMPACT STORAGE AND comment='' AND
caching='KEYS_ONLY' AND read_repair_chance=0.100000 AND
gc_grace_seconds=864000 AND replicate_on_write='true' AND
compaction_strategy_class='SizeTieredCompactionStrategy' AND
compression_parameters:sstable_compression='SnappyCompressor';
cqlsh:goh_master> update daily_caps set value=value +1 where caps_type='xp' and id ='myid';
cqlsh:goh_master> select * from daily_caps;
caps_type | id | value
-----------+------+-------
xp | myid | 1

CQL3 and the thrift API are not compatible. So creating a column family with CQL3 and accessing it with Hector or another thrift based client will not work. For more information see:
https://issues.apache.org/jira/browse/CASSANDRA-4377

Related

Yugabyte YCQL check if a set contain a value?

Is there there any way to query on a SET type(or MAP/LIST) to find does it contain a value or not?
Something like this:
CREATE TABLE test.table_name(
id text,
ckk SET<INT>,
PRIMARY KEY((id))
);
Select * FROM table_name WHERE id = 1 AND ckk CONTAINS 4;
Is there any way to reach this query with YCQL api?
And can we use a SET type in SECONDRY INDEX?
Is there any way to reach this query with YCQL api?
YCQL does not support the CONTAINS keyword yet (feel free to open an issue for this on the YugabyteDB GitHub).
One workaround can be to use MAP<INT, BOOLEAN> instead of SET<INT> and the [] operator.
For instance:
CREATE TABLE test.table_name(
id text,
ckk MAP<int, boolean>,
PRIMARY KEY((id))
);
SELECT * FROM table_name WHERE id = 'foo' AND ckk[4] = true;
And can we use a SET type in SECONDRY INDEX?
Generally, collection types cannot be part of the primary key, or an index key.
However, "frozen" collections (i.e. collections serialized into a single value internally) can actually be part of either primary key or index key.
For instance:
CREATE TABLE table2(
id TEXT,
ckk FROZEN<SET<INT>>,
PRIMARY KEY((id))
) WITH transactions = {'enabled' : true};
CREATE INDEX table2_idx on table2(ckk);
Another option is to use with compound primary key and defining ckk as clustering key:
cqlsh> CREATE TABLE ybdemo.tt(id TEXT, ckk INT, PRIMARY KEY ((id), ckk)) WITH CLUSTERING ORDER BY (ckk DESC);
cqlsh> SELECT * FROM ybdemo.tt WHERE id='foo' AND ckk=4;

nested map in cassandra data modelling

I have following requirement of my dataset, need to unserstand what datatype should I use and how to save my data accordingly :-
CREATE TABLE events (
id text,
evntoverlap map<text, map<timestamp,int>>,
PRIMARY KEY (id)
)
evntoverlap = {
'Dig1': {{'2017-10-09 04:10:05', 0}},
'Dig2': {{'2017-10-09 04:11:05', 0},{'2017-10-09 04:15:05', 0}},
'Dig3': {{'2017-10-09 04:11:05', 0},{'2017-10-09 04:15:05', 0},{'2017-10-09 04:11:05', 0}}
}
This gives an error :-
Error from server: code=2200 [Invalid query] message="Non-frozen collections are not allowed inside collections: map<text, map<timestamp, int>>"
How should I store this type of data in single column . Please suggest datatype and insert command for the same.
Thanks,
There is limitation of Cassandra - you can't nest collection (or UDT) inside collection without making it frozen. So you need to "froze" one of the collections - either nested:
CREATE TABLE events (
id text,
evntoverlap map<text, frozen<map<timestamp,int>>>,
PRIMARY KEY (id)
);
or top-level:
CREATE TABLE events (
id text,
evntoverlap frozen<map<text, map<timestamp,int>>>,
PRIMARY KEY (id)
);
See documentation for more details.
CQL collections limited to 64kb, if putting things like maps in maps you might push that limit. Especially with frozen maps you are deserializing the entire map, modifying it, and re inserting. Might be better off with a
CREATE TABLE events (
id text,
evnt_key, text
value map<timestamp, int>,
PRIMARY KEY ((id), evnt_key)
)
Or even a
CREATE TABLE events (
id text,
evnt_key, text
evnt_time timestamp
value int,
PRIMARY KEY ((id), evnt_key, evnt_time)
)
It would be more efficient and safer while giving additional benefits like being able to order the event_time's in ascending or descending order.

Cassandra QueryBuilder not returning any result, whereas same query works fine in CQL shell

SELECT count(*) FROM device_stats
WHERE orgid = 'XYZ'
AND regionid = 'NY'
AND campusid = 'C1'
AND buildingid = 'C1'
AND floorid = '2'
AND year = 2017;
The above CQL query returns correct result - 32032, in CQL Shell
But when I run the same query using QueryBuilder Java API , I see the count as 0
BuiltStatement summaryQuery = QueryBuilder.select()
.countAll()
.from("device_stats")
.where(eq("orgid", "XYZ"))
.and(eq("regionid", "NY"))
.and(eq("campusid", "C1"))
.and(eq("buildingid", "C1"))
.and(eq("floorid", "2"))
.and(eq("year", "2017"));
try {
ResultSetFuture tagSummaryResults = session.executeAsync(tagSummaryQuery);
tagSummaryResults.getUninterruptibly().all().stream().forEach(result -> {
System.out.println(" totalCount > "+result.getLong(0));
});
I have only 20 partitions and 32032 rows per partition.
What could be the reason QueryBuilder not executing the query correctly ?
Schema :
CREATE TABLE device_stats (
orgid text,
regionid text,
campusid text,
buildingid text,
floorid text,
year int,
endofwindow timestamp,
categoryid timeuuid,
devicestats map<text,bigint>,
PRIMARY KEY ((orgid, regionid, campusid, buildingid, floorid,year),endofwindow,categoryid)
) WITH CLUSTERING ORDER BY (endofwindow DESC,categoryid ASC);
// Using the keys function to index the map keys
CREATE INDEX ON device_stats (keys(devicestats));
I am using cassandra 3.10 and com.datastax.cassandra:cassandra-driver-core:3.1.4
Moving my comment to an answer since that seems to solve the original problem:
Changing .and(eq("year", "2017")) to .and(eq("year", 2017)) solves the issue since year is an int and not a text.

com.datastax.driver.core.exceptions.InvalidQueryException: Invalid operator IN for PRIMARY KEY part

I have cassandra 2.1.15.
I have this table
CREATE TABLE ks_mobapp.messages (
pair_id text,
belong_to text,
message_id timeuuid,
cli_time bigint,
sender text,
text text,
time bigint,
PRIMARY KEY ((pair_id, belong_to), message_id)
) WITH CLUSTERING ORDER BY (message_id DESC)
I was trying to delete multiple record as
instances.getCqlSession().execute(QueryBuilder.delete()
.from(AppConstants.KEYSPACE, "messages")
.where(QueryBuilder.eq("pair_id", pairId))
.and(QueryBuilder.eq("belong_to", currentUser.value("userId")))
.and(QueryBuilder.in("message_id", msgId)));
I am getting error:
Caused by: com.datastax.driver.core.exceptions.InvalidQueryException: Invalid operator IN for PRIMARY KEY part message_id
Then I tried:
Session session = instances.getCqlSession();
PreparedStatement statement = session.prepare("DELETE FROM ks_mobApp.messages WHERE pair_id = ? AND belong_to = ? AND message_id = ?;");
Iterator<String> iterator = msgId.iterator();
while(iterator.hasNext()) {
try {
session.executeAsync(statement.bind(pairId, currentUser.value("userId"), UUID.fromString(iterator.next())));
} catch(Exception ex) {
}
}
Its working nice. Is this the correct way? I can't use IN for same partition key ?
DELETE in Query only supported for partition key.
Delete IN relation is only supported for partition key)
There are some WHERE clause restrictions for the UPDATE and DELETE statements in cassandra 2.x
more specifically you can only use the IN operator on the last partition key column. So in your case the last partition column is belong_to. so IN can only be used on that column.
However these limitation are removed in cassandra 3.0. and it will allow
IN to be specified on any partition key column
IN to be specified on any clustering column
Here is the patch https://issues.apache.org/jira/browse/CASSANDRA-6237
Read this also http://www.datastax.com/dev/blog/a-deep-look-to-the-cql-where-clause

Cassandra CQL range query rejected despite equality operator and secondary index

From the table schema below, I am trying to select all pH readings that are below 5.
I have followed these three pieces of advice:
Use ALLOW FILTERING
Include an equality comparison
Create a secondary index on the reading_value column.
Here is my query:
select * from todmorden_numeric where sensor_name = 'pHradio' and reading_value < 5 allow filtering;
Which is rejected with this message:
Bad Request: No indexed columns present in by-columns clause with Equal operator
I tried adding a secondary index to the sensor_name column and was told that it was already part of the key and therefore already indexed.
I created the index after the table had been in use for a while - could that be the problem? I ran "nodetool refresh" in the hope it would make the index available but this did not work. Here is the output of describe table todmorden_numeric :
CREATE TABLE todmorden_numeric (
sensor_name text,
reading_time timestamp,
reading_value float,
PRIMARY KEY ((sensor_name), reading_time)
) WITH
bloom_filter_fp_chance=0.010000 AND
caching='KEYS_ONLY' AND
comment='Data that suits being stored as floats' AND
dclocal_read_repair_chance=0.000000 AND
gc_grace_seconds=864000 AND
index_interval=128 AND
read_repair_chance=0.100000 AND
replicate_on_write='true' AND
populate_io_cache_on_flush='false' AND
default_time_to_live=0 AND
speculative_retry='99.0PERCENTILE' AND
memtable_flush_period_in_ms=0 AND
compaction={'class': 'SizeTieredCompactionStrategy'} AND
compression={'sstable_compression': 'LZ4Compressor'};
CREATE INDEX todmorden_numeric_reading_value_idx ON todmorden_numeric (reading_value);
Cassandra allows range search only on:
a) Partition Key only if ByteOrderPartitioner is used (default now is murmur3).
b) any single clustering key ONLY IF any clustering keys defined BEFORE the target column in the primary key definition are already specified by an = operator in the predicate.
They don't work on secondary indices.
Consider the following table definition:
CREATE TABLE tod1 (name text, time timestamp,
val float, PRIMARY KEY (name, time));
You CAN'T do a range on the val in this case.
Consider this one:
CREATE TABLE tod2 (name text, time timestamp,
val float, PRIMARY KEY (name, time, val));
Then the following is valid:
SELECT * FROM tod2 WHERE name='X' AND time='timehere' AND val < 5;
Kinda pointless, but this is not valid:
SELECT * from tod2 WHERE name='X' AND val < 5;
It's not valid as you haven't filtered by a previous clustering key in the primary key def (in this case, time).
For your query, you may want to do this:
CREATE TABLE tod3 (name text, time timestamp,
val float, PRIMARY KEY (name, val, time));
Note the order of columns in the primary key: val's before time.
This will allow you to do:
SELECT * from tod3 WHERE name='asd' AND val < 5;
On a different note, how long do you intend to hold data? How frequently do you get readings? This can cause your partition to grow quite large quite quickly. You may want to bucket it readings into multiple partitions (manual sharding). Perhaps one partition per day? Of course, such things would greatly depend on your access patterns.
Hope that helps.

Resources