Why is CQL suggesting I use ALLOW FILTERING on an UPDATE?

My simple update query is throwing the following error:
InvalidRequest: Error from server: code=220 [Invalid query]
message="Cannot execute this query as it might involve data filtering and
thus may have unpredictable performance. If you want to execute this
query despite the performance unpredictability, use ALLOW FILTERING"
As far as I know, ALLOW FILTERING adds indexing and is usually used with SELECT queries. I tried to understand the reason behind this error but could not find anything. Also, I have just started learning CQL, so a few concepts are new to me.
Here is the schema of my table:
CREATE TABLE blogplatform.users (
user_id int PRIMARY KEY,
active_status int,
password text,
user_name text
) WITH bloom_filter_fp_chance = 0.01
AND caching = {'keys': 'ALL', 'rows_per_partition': 'ALL'}
AND comment = ''
AND compaction = {'class': 'SizeTieredCompactionStrategy'}
AND compression = {'sstable_compression': 'org.apache.cassandra.io.compress.LZ4Compressor'}
AND crc_check_chance = 1.0
AND dclocal_read_repair_chance = 0.1
AND default_time_to_live = 0
AND gc_grace_seconds = 864000
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND read_repair_chance = 0.0
AND speculative_retry = '99.0PERCENTILE';
Query I am trying to execute:
UPDATE users SET active_status = 0 WHERE user_name='Heath';
Output:
InvalidRequest: Error from server: code=220 [Invalid query]
message="Cannot execute this query as it might involve data filtering and
thus may have unpredictable performance. If you want to execute this
query despite the performance unpredictability, use ALLOW FILTERING"
I even tried to add ALLOW FILTERING in the query:
UPDATE users SET active_status = 0 WHERE user_name='Heath' ALLOW FILTERING;
Output:
SyntaxException: line 1:59 : syntax error...
Could someone please explain why I am getting this error on a simple update? Also, can anyone guide me on how to resolve it?
Thank you.

You cannot use a regular (non-key) column in the WHERE clause without ALLOW FILTERING, and ALLOW FILTERING is not a magic word: it just puts a lot of load on the database. This restriction applies to all types of queries, not just SELECT. Note that UPDATE statements do not accept the ALLOW FILTERING directive at all, which is why adding it produced a syntax error.
A better option for you would be to make user_name the PRIMARY KEY (partition key) of the table.
With that table definition you can execute:
UPDATE users SET active_status = 0 WHERE user_name='Heath';
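A minimal sketch of that revised table (the users_by_name name is hypothetical), assuming user names are unique:
CREATE TABLE blogplatform.users_by_name (
    user_name text PRIMARY KEY,
    user_id int,
    active_status int,
    password text
);
-- now the update targets a single partition and needs no filtering:
UPDATE blogplatform.users_by_name SET active_status = 0 WHERE user_name = 'Heath';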

Related

In ScyllaDB, how can I query records with multiple conditions without using ALLOW FILTERING?

I have a table in ScyllaDB:
CREATE TABLE taxiservice.operatoragentsauditlog (
hourofyear int,
operationtime bigint,
action text,
actiontype text,
appname text,
entityid text,
entitytype text,
operatorid text,
operatoripaddress text,
operatorname text,
payload text,
PRIMARY KEY (hourofyear, operationtime)
) WITH CLUSTERING ORDER BY (operationtime DESC)
AND bloom_filter_fp_chance = 0.01
AND caching = {'keys': 'ALL', 'rows_per_partition': 'ALL'}
AND comment = ''
AND compaction = {'class': 'LeveledCompactionStrategy'}
AND compression = {'sstable_compression': 'org.apache.cassandra.io.compress.LZ4Compressor'}
AND crc_check_chance = 1.0
AND dclocal_read_repair_chance = 0.0
AND default_time_to_live = 0
AND gc_grace_seconds = 864000
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND read_repair_chance = 0.0
AND speculative_retry = '99.0PERCENTILE';
CREATE INDEX auditactiontype ON taxiservice.operatoragentsauditlog (actiontype);
CREATE INDEX auditid ON taxiservice.operatoragentsauditlog (entityid);
CREATE INDEX agentid ON taxiservice.operatoragentsauditlog (operatorid);
CREATE INDEX auditaction ON taxiservice.operatoragentsauditlog (action);
I have written this query:
select * from taxiService.operatoragentsauditlog
where hourOfYear =3655
and actionType ='XYZ'
and operatorId in ('100','200') limit 500;
And Scylla throws this error:
InvalidRequest: Error from server: code=2200 [Invalid query]
message="Cannot execute this query as it might involve data
filtering and thus may have unpredictable performance. If you
want to execute this query despite the performance
unpredictability, use ALLOW FILTERING"
All the columns I used in the conditions are indexed in the table, yet it still throws the above error.
How can I fetch the details without adding ALLOW FILTERING to the query?
I rewrote all the Scylla queries with ALLOW FILTERING and deployed the changes to production; then the server started throwing an internal error (NoHostAvailableException) when fetching data from ScyllaDB.
How can I resolve the NoHostAvailableException in Scylla?
com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (no host was tried)
at com.datastax.driver.core.exceptions.NoHostAvailableException.copy(NoHostAvailableException.java:83) ~[cassandra-driver-core-3.10.2.jar:?]
at com.datastax.driver.core.exceptions.NoHostAvailableException.copy(NoHostAvailableException.java:37) ~[cassandra-driver-core-3.10.2.jar:?]
at com.datastax.driver.core.DriverThrowables.propagateCause(DriverThrowables.java:35) ~[cassandra-driver-core-3.10.2.jar:?]
at com.datastax.driver.core.DefaultResultSetFuture.getUninterruptibly(DefaultResultSetFuture.java:293) ~[cassandra-driver-core-3.10.2.jar:?]
at com.datastax.driver.core.AbstractSession.execute(AbstractSession.java:58) ~[cassandra-driver-core-3.10.2.jar:?]
at com.datastax.driver.mapping.MethodMapper.invoke(MethodMapper.java:184) ~[cassandra-driver-mapping-3.10.2.jar:?]
at com.datastax.driver.mapping.AccessorInvocationHandler.invoke(AccessorInvocationHandler.java:67) ~[cassandra-driver-mapping-3.10.2.jar:?]
at com.sun.proxy.$Proxy161.getRideAuditLog(Unknown Source) ~[?:?]
at com.mycomany.myproduct.auditLog.AuditLogService.getRideAuditLog(AuditLogService.java:21) ~[taxiopconsoleservice-1.1.0.jar:?]
With distributed databases like Cassandra and Scylla, the idea is to build your tables to suit your queries. To that end, you could build another table and duplicate the data into it. In this new table, the primary key definition should look like this:
PRIMARY KEY (hourOfYear, actionType, operatorId)
That will support this query without the dreaded ALLOW FILTERING directive.
select * from taxiService.operatoragentsauditlog_by_hourofyear_and_actiontype
where hourOfYear =3655
and actionType ='XYZ'
and operatorId in ('100','200');
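A hedged sketch of that new table (the name is taken from the query above; keeping operationtime as the last clustering column is an assumption, so that multiple audit entries per operator are not overwritten):
CREATE TABLE taxiservice.operatoragentsauditlog_by_hourofyear_and_actiontype (
    hourofyear int,
    operationtime bigint,
    action text,
    actiontype text,
    appname text,
    entityid text,
    entitytype text,
    operatorid text,
    operatoripaddress text,
    operatorname text,
    payload text,
    -- hourofyear is the partition key; actiontype, operatorid and operationtime cluster within it
    PRIMARY KEY (hourofyear, actiontype, operatorid, operationtime)
) WITH CLUSTERING ORDER BY (actiontype ASC, operatorid ASC, operationtime DESC);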
But as the original table is partitioned on hourOfYear, the query is restricted to a single partition. So even with ALLOW FILTERING it might not be that bad.
Your query
select * from taxiService.operatoragentsauditlog
where hourOfYear =3655
and actionType ='XYZ'
and operatorId in ('100','200') limit 500;
will look at either the actionType or operatorId index to find all the rows matching the restriction on that column, and then needs to go over all the candidate rows to check whether they also match the other two restrictions. That's why you need ALLOW FILTERING. Theoretically, actionType = 'XYZ' may match a million rows, so this query would need to go over a million candidate rows just to return the handful that match all the restrictions.
Some search engines have an efficient way to intersect two index lookups - maybe actionType = 'XYZ' has a million matches, and operatorId in ('100', '200') has a million matches, but their intersection is just 10 rows. Search engines use a skip list mechanism to allow the intersection to be calculated efficiently. But Scylla doesn't have this feature. It will pick just one of the two indexes (you don't know which), and go over its matches one by one. By the way, please note that even if Scylla did support efficient index intersection, your hourOfYear = 3655 restriction isn't indexed, so would need row-by-row filtering anyway.
As Aaron noted in his answer, the solution can be to use ALLOW FILTERING if a single index match results in a small enough number of matches (it would be better to have just one index, not two, so you know exactly which index is being used), or to change your table schema to something that better matches your queries. The latter is good advice in many situations.
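For reference, using the directive just means appending it to the original query:
select * from taxiService.operatoragentsauditlog
where hourOfYear = 3655
and actionType = 'XYZ'
and operatorId in ('100','200')
limit 500 ALLOW FILTERING;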
Regarding the NoHostAvailableException: it means the Scylla replicas failed to perform this query for some reason. It might indicate a bug, or a timeout (which would also be a bug, because a scan like this should page, not time out). Please check the Scylla log for an error message that appears at the time of this request, and report the problem in the Scylla bug tracker at https://github.com/scylladb/scylla/issues.

Problems performing an update in Cassandra with a compound partition key

I have this table in Cassandra:
CREATE TABLE wear_dealer.product_color_size_stock (
productcode text,
colorcode text,
sizecode text,
ean text,
shortdescription text,
stock int,
PRIMARY KEY (productcode, colorcode, sizecode)
) WITH CLUSTERING ORDER BY (colorcode ASC, sizecode ASC)
AND bloom_filter_fp_chance = 0.01
AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
AND comment = ''
AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
AND crc_check_chance = 1.0
AND dclocal_read_repair_chance = 0.1
AND default_time_to_live = 0
AND gc_grace_seconds = 864000
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND read_repair_chance = 0.0
AND speculative_retry = '99PERCENTILE';
CREATE INDEX product_color_size_stock_stock_idx ON wear_dealer.product_color_size_stock (stock);
How can I update shortdescription when I have only the value for productcode?
When I perform this query:
cqlsh:wear_dealer> update seasons_product_color_size
set shortdescription ='AAA'
where productcode='RUNTS';
I get the following error:
InvalidRequest: Error from server: code=2200 [Invalid query] message="Some partition key parts are missing: seasoncode"
Any strategy to overcome this?
Many thanks in advance!
Unfortunately, CQL does not allow writes for a partial key. Remember that Cassandra treats INSERTs and UPDATEs the same. So when this:
UPDATE seasons_product_color_size
SET shortdescription ='AAA'
WHERE productcode='RUNTS';
Returns this: "Some partition key parts are missing: seasoncode"
It's saying that Cassandra doesn't know which node to write the data to, because the partition key is incomplete. In SQL, the database would just iterate over all rows in the table and update them according to your WHERE clause. But Cassandra is specifically designed not to allow operations like that.
For this query you will need to figure out the missing seasoncodes separately, and UPDATE each row individually.
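A minimal sketch of that read-then-update pattern, shown against the product_color_size_stock table from the question ('RED' and 'M' are placeholder values):
-- 1. read the clustering key columns for the partition:
SELECT colorcode, sizecode FROM wear_dealer.product_color_size_stock
WHERE productcode = 'RUNTS';
-- 2. for each (colorcode, sizecode) pair returned, update with the full primary key:
UPDATE wear_dealer.product_color_size_stock SET shortdescription = 'AAA'
WHERE productcode = 'RUNTS' AND colorcode = 'RED' AND sizecode = 'M';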
Cassandra only supports writes that specify the complete partition key. As you supplied only part of the partition key, you cannot update with it; you need to include the missing column named in the error, for example (assuming an integer seasoncode):
UPDATE seasons_product_color_size SET shortdescription ='AAA' WHERE seasoncode=10 AND productcode='RUNTS';

How to use both ORDER BY and IN together in a query in Cassandra?

I use Cassandra 3.0.5.
I'm having a problem using ORDER BY and IN together.
Schema:
CREATE TABLE my_status.user_status_updates (
username text,
id timeuuid,
body text,
PRIMARY KEY (username, id))
WITH CLUSTERING ORDER BY (id ASC)
AND bloom_filter_fp_chance = 0.01
AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
AND comment = ''
AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
AND crc_check_chance = 1.0
AND dclocal_read_repair_chance = 0.1
AND default_time_to_live = 0
AND gc_grace_seconds = 864000
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND read_repair_chance = 0.0
AND speculative_retry = '99PERCENTILE';
Query:
SELECT username, id, UNIXTIMESTAMPOF(id), body
FROM user_status_updates
WHERE username IN ('carol', 'dave')
ORDER BY id DESC
LIMIT 2;
InvalidRequest: code=2200 [Invalid query] message="Cannot page queries with both ORDER BY and a IN restriction on the partition key; you must either remove the ORDER BY or the IN and sort client side, or disable paging for this query"
I'm sure I've seen people run this query without errors, so I know there is a way to get around this. What do I need to do to make this query work, or is it simply inefficient to use ORDER BY and IN together?
You've set the Clustering Key to be ordered ASC, but are requesting it be ordered DESC in your query. These are at odds and are counter-productive. If you change the Clustering Key to DESC, then you won't need the ORDER BY clause in the query. If you truly do need to have the Clustering Key be ASC for other queries, then I would suggest a second table with it being DESC. Design your tables for what the query will require. Hope that helps.
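A minimal sketch of that second table (the name is hypothetical):
CREATE TABLE my_status.user_status_updates_desc (
    username text,
    id timeuuid,
    body text,
    PRIMARY KEY (username, id)
) WITH CLUSTERING ORDER BY (id DESC);
-- within a partition, rows now come back newest-first with no ORDER BY clause:
SELECT username, id, UNIXTIMESTAMPOF(id), body
FROM my_status.user_status_updates_desc
WHERE username = 'carol'
LIMIT 2;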
Adam
Using both IN and ORDER BY requires turning off paging with the PAGING OFF command in cqlsh:
cqlsh> PAGING OFF

Cassandra timeout during read query at consistency ONE

I have a problem with the Cassandra DB and hope somebody can help me. I have a table “log” into which I have inserted about 10,000 rows. Everything works fine. I can run
select * from
select count(*) from
As soon as I insert 100,000 rows with TTL 50, I receive an error with
select count(*) from
Version: cassandra 2.1.8, 2 nodes
Cassandra timeout during read query at consistency ONE (1 responses
were required but only 0 replica responded)
Does someone have an idea what I am doing wrong?
CREATE TABLE test.log (
day text,
date timestamp,
ip text,
iid int,
request text,
src text,
tid int,
txt text,
PRIMARY KEY (day, date, ip)
) WITH read_repair_chance = 0.0
AND dclocal_read_repair_chance = 0.1
AND gc_grace_seconds = 864000
AND bloom_filter_fp_chance = 0.01
AND caching = { 'keys' : 'ALL', 'rows_per_partition' : 'NONE' }
AND comment = ''
AND compaction = { 'class' : 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy' }
AND compression = { 'sstable_compression' : 'org.apache.cassandra.io.compress.LZ4Compressor' }
AND default_time_to_live = 0
AND speculative_retry = '99.0PERCENTILE'
AND min_index_interval = 128
AND max_index_interval = 2048;
That error message indicates a problem with the READ operation. Most likely it is a read timeout. You may need to update your cassandra.yaml with a larger read timeout, as described in this SO answer.
Example for 200 seconds:
read_request_timeout_in_ms: 200000
If updating that does not work, you may need to tweak the JVM settings for Cassandra. See DataStax's "Tuning Java Ops" for more information.
count(*) is a very costly operation: imagine Cassandra needing to scan all the rows on all the nodes just to give you the count. With a small number of rows it works, but on bigger data sets you should use other approaches to avoid timeouts.
First of all, forget about count(*); we have to retrieve the rows in chunks and count them ourselves.
We should make several (dozens? hundreds?) queries, each filtering by partition and clustering key, and sum the number of rows retrieved by each query.
Here is a good explanation of clustering and partition keys. In your case day is the partition key, and the clustering key consists of two columns: date and ip.
It is most likely impossible to do this with the cqlsh command-line client, so you should write a script yourself. Official drivers exist for popular programming languages: http://docs.datastax.com/en/developer/driver-matrix/doc/common/driverMatrix.html
Example of one of such queries:
select day, date, ip, iid, request, src, tid, txt from test.log where day='Saturday' and date='2017-08-12 00:00:00' and ip='127.0.0.1'
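Alternatively, a hedged sketch of a per-slice count (the time bounds are hypothetical): each slice stays small enough not to time out, and the per-slice counts are summed client-side:
SELECT count(*) FROM test.log
WHERE day = 'Saturday'
AND date >= '2017-08-12 00:00:00'
AND date < '2017-08-12 06:00:00';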
Remarks:
If you need just the count and nothing more, it probably makes sense to look at a tool like https://github.com/brianmhess/cassandra-count
If Cassandra refuses to run your query without ALLOW FILTERING, that means the query is not efficient: https://stackoverflow.com/a/38350839/2900229

OperationTimedOut during cqlsh alter table

I am receiving an OperationTimedOut error while running an ALTER TABLE command in cqlsh. How is that possible? Since this is just a table metadata update, shouldn't the operation run almost instantaneously?
Specifically, this is an excerpt from my cqlsh session
cqlsh:metric> alter table metric with gc_grace_seconds = 86400;
OperationTimedOut: errors={}, last_host=sandbox73vm230
The metric table currently has gc_grace_seconds set to 864000. I am seeing this behavior in a 2-node cluster and in a 6-node, 2-datacenter cluster. My nodes seem to be communicating fine in general (e.g. I can insert on one node and read from another). Here is the full table definition (a cyanite 0.1.3 schema with DateTieredCompactionStrategy, clustering, and caching changes):
CREATE TABLE metric.metric (
tenant text,
period int,
rollup int,
path text,
time bigint,
data list<double>,
PRIMARY KEY ((tenant, period, rollup, path), time)
) WITH CLUSTERING ORDER BY (time ASC)
AND bloom_filter_fp_chance = 0.01
AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'
AND comment = ''
AND compaction = {'timestamp_resolution': 'SECONDS', 'class': 'org.apache.cassandra.db.compaction.DateTieredCompactionStrategy'}
AND compression = {'sstable_compression': 'org.apache.cassandra.io.compress.LZ4Compressor'}
AND dclocal_read_repair_chance = 0.0
AND default_time_to_live = 0
AND gc_grace_seconds = 864000
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND read_repair_chance = 0.0
AND speculative_retry = 'NONE';
I realize at this point the question is pretty old, and you may have either figured out the answer or otherwise moved on, but wanted to post this in case others stumbled upon it.
The default cqlsh request timeout is 10 seconds. You can adjust this by starting up cqlsh with the --request-timeout option set to some value that allows your ALTER TABLE to run to completion, e.g.:
cqlsh --request-timeout=1000000