From the table schema below, I am trying to select all pH readings that are below 5.
I have followed these three pieces of advice:
Use ALLOW FILTERING
Include an equality comparison
Create a secondary index on the reading_value column.
Here is my query:
select * from todmorden_numeric where sensor_name = 'pHradio' and reading_value < 5 allow filtering;
Which is rejected with this message:
Bad Request: No indexed columns present in by-columns clause with Equal operator
I tried adding a secondary index to the sensor_name column and was told that it was already part of the key and therefore already indexed.
I created the index after the table had been in use for a while - could that be the problem? I ran "nodetool refresh" in the hope it would make the index available but this did not work. Here is the output of describe table todmorden_numeric :
CREATE TABLE todmorden_numeric (
sensor_name text,
reading_time timestamp,
reading_value float,
PRIMARY KEY ((sensor_name), reading_time)
) WITH
bloom_filter_fp_chance=0.010000 AND
caching='KEYS_ONLY' AND
comment='Data that suits being stored as floats' AND
dclocal_read_repair_chance=0.000000 AND
gc_grace_seconds=864000 AND
index_interval=128 AND
read_repair_chance=0.100000 AND
replicate_on_write='true' AND
populate_io_cache_on_flush='false' AND
default_time_to_live=0 AND
speculative_retry='99.0PERCENTILE' AND
memtable_flush_period_in_ms=0 AND
compaction={'class': 'SizeTieredCompactionStrategy'} AND
compression={'sstable_compression': 'LZ4Compressor'};
CREATE INDEX todmorden_numeric_reading_value_idx ON todmorden_numeric (reading_value);
Cassandra allows range search only on:
a) the partition key, and only if ByteOrderedPartitioner is used (the default is now Murmur3Partitioner).
b) a single clustering column, and only if every clustering column defined before it in the primary key is already restricted with an = operator in the predicate.
Range searches don't work on secondary indices.
Consider the following table definition:
CREATE TABLE tod1 (name text, time timestamp,
val float, PRIMARY KEY (name, time));
You CAN'T do a range on the val in this case.
Consider this one:
CREATE TABLE tod2 (name text, time timestamp,
val float, PRIMARY KEY (name, time, val));
Then the following is valid:
SELECT * FROM tod2 WHERE name='X' AND time='timehere' AND val < 5;
That query is fairly pointless, since it pins time to a single value. This one, however, is not valid:
SELECT * from tod2 WHERE name='X' AND val < 5;
It's not valid as you haven't filtered by a previous clustering key in the primary key def (in this case, time).
For your query, you may want to do this:
CREATE TABLE tod3 (name text, time timestamp,
val float, PRIMARY KEY (name, val, time));
Note the order of columns in the primary key: val comes before time.
This will allow you to do:
SELECT * from tod3 WHERE name='asd' AND val < 5;
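To see why the column order matters, here is a minimal Python sketch (not Cassandra code) that models one partition as a list of rows kept sorted by the clustering columns. The `rows_below` helper and the sample readings are made up for illustration; the only point is that when val is the first clustering column, every row with val < 5 sits in one contiguous slice.

```python
from bisect import bisect_left

# One partition of the tod3 table, modelled as rows sorted by the
# clustering columns (val, time), in that order. Sample data is invented.
partition = sorted([
    (6.8, "2014-01-01 10:00"),
    (4.2, "2014-01-01 11:00"),
    (7.1, "2014-01-01 12:00"),
    (3.9, "2014-01-01 13:00"),
])

def rows_below(rows, limit):
    """Return the rows whose val is strictly below limit.

    Rows are sorted by val first, so the matches form a prefix of the
    list and a single binary search finds where that prefix ends.
    """
    end = bisect_left(rows, (limit,))
    return rows[:end]

print(rows_below(partition, 5))
```

With time as the first clustering column instead, the matching rows would be scattered through the partition, which is why tod1 and tod2 cannot serve this query efficiently.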
On a different note, how long do you intend to hold data? How frequently do you get readings? Depending on the answers, your partition can grow quite large quite quickly. You may want to bucket the readings into multiple partitions (manual sharding). Perhaps one partition per day? Of course, such things would greatly depend on your access patterns.
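As a rough illustration of one-partition-per-day bucketing, the Python sketch below derives a composite partition key from the sensor name and the calendar day of the reading. The `bucket_key` helper, the day granularity, and the sample readings are assumptions for the example, not part of your schema.

```python
from collections import defaultdict
from datetime import datetime

def bucket_key(sensor_name, reading_time):
    """Build a hypothetical (sensor_name, day) partition key for one reading."""
    return (sensor_name, reading_time.strftime("%Y-%m-%d"))

# Invented sample readings: (sensor_name, reading_time, reading_value).
readings = [
    ("pHradio", datetime(2014, 1, 1, 23, 59), 6.8),
    ("pHradio", datetime(2014, 1, 2, 0, 1), 4.9),
    ("pHradio", datetime(2014, 1, 2, 12, 0), 5.2),
]

# Group readings the way the bucketed table would spread them over partitions.
partitions = defaultdict(list)
for name, when, value in readings:
    partitions[bucket_key(name, when)].append((when, value))

for key, rows in sorted(partitions.items()):
    print(key, len(rows))
```

The trade-off is that a query spanning several days has to visit one partition per day, so the bucket size should follow how you usually read the data.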
Hope that helps.
Is there any way to query a SET column (or MAP/LIST) to find out whether it contains a given value?
Something like this:
CREATE TABLE test.table_name(
id text,
ckk SET<INT>,
PRIMARY KEY((id))
);
Select * FROM table_name WHERE id = 1 AND ckk CONTAINS 4;
Is there any way to reach this query with YCQL api?
And can we use a SET type in a SECONDARY INDEX?
Is there any way to reach this query with YCQL api?
YCQL does not support the CONTAINS keyword yet (feel free to open an issue for this on the YugabyteDB GitHub).
One workaround can be to use MAP<INT, BOOLEAN> instead of SET<INT> and the [] operator.
For instance:
CREATE TABLE test.table_name(
id text,
ckk MAP<int, boolean>,
PRIMARY KEY((id))
);
SELECT * FROM table_name WHERE id = 'foo' AND ckk[4] = true;
And can we use a SET type in a SECONDARY INDEX?
Generally, collection types cannot be part of the primary key, or an index key.
However, "frozen" collections (i.e. collections serialized into a single value internally) can actually be part of either primary key or index key.
For instance:
CREATE TABLE table2(
id TEXT,
ckk FROZEN<SET<INT>>,
PRIMARY KEY((id))
) WITH transactions = {'enabled' : true};
CREATE INDEX table2_idx on table2(ckk);
Another option is to use a compound primary key and define ckk as a clustering key:
cqlsh> CREATE TABLE ybdemo.tt(id TEXT, ckk INT, PRIMARY KEY ((id), ckk)) WITH CLUSTERING ORDER BY (ckk DESC);
cqlsh> SELECT * FROM ybdemo.tt WHERE id='foo' AND ckk=4;
I have the following requirement for my dataset. I need to understand which datatype I should use and how to save my data accordingly:
CREATE TABLE events (
id text,
evntoverlap map<text, map<timestamp,int>>,
PRIMARY KEY (id)
)
evntoverlap = {
'Dig1': {{'2017-10-09 04:10:05', 0}},
'Dig2': {{'2017-10-09 04:11:05', 0},{'2017-10-09 04:15:05', 0}},
'Dig3': {{'2017-10-09 04:11:05', 0},{'2017-10-09 04:15:05', 0},{'2017-10-09 04:11:05', 0}}
}
This gives an error :-
Error from server: code=2200 [Invalid query] message="Non-frozen collections are not allowed inside collections: map<text, map<timestamp, int>>"
How should I store this type of data in a single column? Please suggest a datatype and an insert command for it.
Thanks,
This is a limitation of Cassandra: you can't nest a collection (or UDT) inside a collection without making it frozen. So you need to freeze one of the collections, either the nested one:
CREATE TABLE events (
id text,
evntoverlap map<text, frozen<map<timestamp,int>>>,
PRIMARY KEY (id)
);
or top-level:
CREATE TABLE events (
id text,
evntoverlap frozen<map<text, map<timestamp,int>>>,
PRIMARY KEY (id)
);
See documentation for more details.
CQL collections are limited to 64kb, so if you put things like maps in maps you might push that limit. With frozen maps in particular, you are deserializing the entire map, modifying it, and re-inserting it. You might be better off with a
CREATE TABLE events (
id text,
evnt_key text,
value map<timestamp, int>,
PRIMARY KEY ((id), evnt_key)
);
Or even a
CREATE TABLE events (
id text,
evnt_key text,
evnt_time timestamp,
value int,
PRIMARY KEY ((id), evnt_key, evnt_time)
);
It would be more efficient and safer, while giving additional benefits such as being able to order the evnt_time values in ascending or descending order.
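To make the second layout concrete, here is a small Python sketch that flattens a nested map like the one in the question into one row per (id, evnt_key, evnt_time). The `flatten_events` helper and the id value 'evt1' are hypothetical; timestamps are kept as plain strings to keep the example short.

```python
def flatten_events(event_id, evntoverlap):
    """Turn {evnt_key: {evnt_time: value}} into sorted row tuples.

    Each tuple is (id, evnt_key, evnt_time, value), matching the columns
    of the fully flattened events table.
    """
    rows = []
    for evnt_key, inner in evntoverlap.items():
        for evnt_time, value in inner.items():
            rows.append((event_id, evnt_key, evnt_time, value))
    return sorted(rows)

# Data shaped like the question's example ('evt1' is an invented id).
evntoverlap = {
    "Dig1": {"2017-10-09 04:10:05": 0},
    "Dig2": {"2017-10-09 04:11:05": 0, "2017-10-09 04:15:05": 0},
}

for row in flatten_events("evt1", evntoverlap):
    print(row)
```

Each tuple would then become one INSERT, so a single event can be added or changed without rewriting a whole collection.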
I am running into a strange problem using Cassandra 1.2 (DSE 3.1.1). I have a table called JSESSION and here is the structure:
cqlsh> use recommender;
cqlsh:recommender> describe table jsession;
CREATE TABLE jsession (
sessionid text,
accessdate timestamp,
atompaths set<text>,
filename text,
processed boolean,
processedtime timestamp,
userid text,
usertag bigint,
PRIMARY KEY (sessionid, accessdate)
) WITH
bloom_filter_fp_chance=0.010000 AND
caching='KEYS_ONLY' AND
comment='' AND
dclocal_read_repair_chance=0.000000 AND
gc_grace_seconds=864000 AND
read_repair_chance=0.100000 AND
replicate_on_write='true' AND
populate_io_cache_on_flush='false' AND
compaction={'class': 'SizeTieredCompactionStrategy'} AND
compression={'sstable_compression': 'SnappyCompressor'};
CREATE INDEX processed_index ON jsession (processed);
You can see that the table is indexed on the field 'processed' which is boolean. When I started coding on this table, the following query used to work fine:
cqlsh:recommender> select * from jsession where processed = false limit 100;
But now that the size is more than 100,000 (not a large number at all), the query has stopped working suddenly, and I couldn't figure out a workaround yet.
cqlsh:recommender> select count(*) from jsession limit 1000000;
count
--------
142320
cqlsh:recommender> select * from jsession where processed = false limit 100;
Request did not complete within rpc_timeout.
I tried several options, such as increasing rpc_timeout to 60 seconds and starting Cassandra with more memory (it is 8GB now), but I still have the same problem. Do you have any solution for this?
The deeper question is what is the right way to model a boolean field in CQL3 so that I can search for that field and update it as well. I need to set the field 'processed' to true after I have processed that session.
You don't have a boolean modeling problem. You just need to paginate the results.
select * from jsession where processed = false and token(sessionid) > token('ABC') limit 1000;
Where 'ABC' is the last session id you read (or '' for the first query). Just keep feeding the token id back into this query until you've read everything.
See also http://www.datastax.com/documentation/cql/3.1/webhelp/index.html#cql/cql_reference/../cql_using/paging_c.html
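The feed-the-last-key-back loop described above can be sketched in Python as follows. This is not driver code: `fetch_page` is a hypothetical callable standing in for the SELECT with the token() restriction, and `fake_fetch` is an in-memory stand-in where plain string order plays the role of token order. Using None for the first page is a choice made for this sketch.

```python
def read_all(fetch_page, page_size=1000):
    """Collect every row by repeatedly asking for the page after the last key.

    fetch_page(last_key, limit) is a hypothetical stand-in for the paged
    query; it must return (sessionid, processed) tuples in token order.
    """
    rows = []
    last_key = None  # None means "start from the beginning"
    while True:
        page = fetch_page(last_key, page_size)
        if not page:
            break
        rows.extend(page)
        last_key = page[-1][0]  # feed the last session id back in
        if len(page) < page_size:
            break  # a short page means nothing is left
    return rows

# In-memory stand-in for the table: 25 unprocessed sessions, one row each.
DATA = [("s%03d" % i, False) for i in range(25)]

def fake_fetch(last_key, limit):
    remaining = [r for r in DATA if last_key is None or r[0] > last_key]
    return remaining[:limit]

print(len(read_all(fake_fetch, page_size=10)))
```

The sketch assumes one row per session id. If a partition holds several rows and a page ends in the middle of it, restarting strictly after that session id would skip the rest of that partition, so the page boundary needs care in that case.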
I am using the following Cassandra/CQL versions:
[cqlsh 4.0.1 | Cassandra 2.0.1 | CQL spec 3.1.1 | Thrift protocol 19.37.0]
I am trying to insert data into a pre-existing CF with case sensitive column names. I hit "unknown identifier" errors when trying to insert data.
Following is how the column family is described:
CREATE TABLE "Sample_List_CS" (
key text,
column1 text,
"fName" text,
"ipSubnet" text,
"ipSubnetMask" text,
value text,
PRIMARY KEY (key, column1)
) WITH COMPACT STORAGE AND
bloom_filter_fp_chance=0.010000 AND
caching='KEYS_ONLY' AND
comment='' AND
dclocal_read_repair_chance=0.000000 AND
gc_grace_seconds=0 AND
index_interval=128 AND
read_repair_chance=0.000000 AND
replicate_on_write='false' AND
populate_io_cache_on_flush='false' AND
default_time_to_live=0 AND
speculative_retry='NONE' AND
memtable_flush_period_in_ms=0 AND
compaction={'class': 'SizeTieredCompactionStrategy'} AND
compression={'sstable_compression': 'LZ4Compressor'};
CREATE INDEX ipSubnet ON "Sample_List_CS" ("ipSubnet");
The insert statements result in errors:
cqlsh:Sample_KS> INSERT INTO "Sample_List_CS" (key,column1,"fName") VALUES ('123','1','myValue');
Bad Request: Unknown identifier fName
cqlsh:Sample_KS> INSERT INTO "Sample_List_CS" (key,column1,"ipSubnet") VALUES ('123','1','255');
Bad Request: Unknown identifier ipSubnet
Any idea what I am doing wrong?
As I understand it, when using WITH COMPACT STORAGE, a table may only have one column other than the primary key.
As quoted in the manual:
Using the compact storage directive prevents you from adding more than
one column that is not part of the PRIMARY KEY.
For you that means you can only have one of these 4 columns in your table:
"fName"
"ipSubnet"
"ipSubnetMask"
value
(Alternatively, you could add 3 of them to the primary key definition.)
Thus it makes sense that the other three columns lead to an Unknown identifier error.
I've created a table in CQL3 console (no single primary key constituent is unique, together they will be):
CREATE TABLE aggregate_logs (
bpid varchar,
jid int,
month int,
year int,
value counter,
PRIMARY KEY (bpid, jid, month, year));
I was then able to update and query it using:
UPDATE aggregate_logs SET value = value + 1 WHERE bpid='1' and jid=1 and month=1 and year=2000;
This works as expected. I wanted to do the same update in Hector (in Scala):
val aggregateMutator:Mutator[Composite] = HFactory.createMutator(keyspace, compositeSerializer)
val compKey = new Composite()
compKey.addComponent(bpid, stringSerializer)
compKey.addComponent(new Integer(jid), intSerializer)
compKey.addComponent(new Integer(month), intSerializer)
compKey.addComponent(new Integer(year), intSerializer)
aggregateMutator.incrementCounter(compKey, LogsAggregateFamily, "value", 1)
but I get an error with the message:
...HInvalidRequestException: InvalidRequestException(why:String didn't validate.)
Running the query direct from hector with:
val query = new me.prettyprint.cassandra.model.CqlQuery(keyspace, compositeSerializer, stringSerializer, new IntegerSerializer())
query.setQuery("UPDATE aggregate_logs SET value = value + 1 WHERE 'bpid'=1 and jid=1 and month=1 and year=2000")
query.execute()
which gives me the error:
InvalidRequestException(why:line 1:59 mismatched input 'and' expecting EOF)
I've not seen any other examples that use a counter under a composite primary key. Is it even possible?
It's definitely possible using CQL directly (via both cqlsh and C++, at least):
cqlsh:goh_master> describe table daily_caps;
CREATE TABLE daily_caps
( caps_type ascii, id ascii, value counter, PRIMARY KEY
(caps_type, id) ) WITH COMPACT STORAGE AND comment='' AND
caching='KEYS_ONLY' AND read_repair_chance=0.100000 AND
gc_grace_seconds=864000 AND replicate_on_write='true' AND
compaction_strategy_class='SizeTieredCompactionStrategy' AND
compression_parameters:sstable_compression='SnappyCompressor';
cqlsh:goh_master> update daily_caps set value=value +1 where caps_type='xp' and id ='myid';
cqlsh:goh_master> select * from daily_caps;
caps_type | id | value
-----------+------+-------
xp | myid | 1
CQL3 and the thrift API are not compatible. So creating a column family with CQL3 and accessing it with Hector or another thrift based client will not work. For more information see:
https://issues.apache.org/jira/browse/CASSANDRA-4377