Problems performing an UPDATE in Cassandra with a compound partition key

I have this table in Cassandra:
CREATE TABLE wear_dealer.product_color_size_stock (
productcode text,
colorcode text,
sizecode text,
ean text,
shortdescription text,
stock int,
PRIMARY KEY (productcode, colorcode, sizecode)
) WITH CLUSTERING ORDER BY (colorcode ASC, sizecode ASC)
AND bloom_filter_fp_chance = 0.01
AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
AND comment = ''
AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
AND crc_check_chance = 1.0
AND dclocal_read_repair_chance = 0.1
AND default_time_to_live = 0
AND gc_grace_seconds = 864000
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND read_repair_chance = 0.0
AND speculative_retry = '99PERCENTILE';
CREATE INDEX product_color_size_stock_stock_idx ON wear_dealer.product_color_size_stock (stock);
How can I update shortdescription when I have only the value for productcode?
When I perform this query:
cqlsh:wear_dealer> update seasons_product_color_size
set shortdescription ='AAA'
where productcode='RUNTS';
I get the following error:
InvalidRequest: Error from server: code=2200 [Invalid query] message="Some partition key parts are missing: seasoncode"
Any strategy to overcome this?
Many thanks in advance!

Unfortunately, CQL does not allow writes for a partial key. Remember that Cassandra treats INSERTs and UPDATEs the same. So when this:
UPDATE seasons_product_color_size
SET shortdescription ='AAA'
WHERE productcode='RUNTS';
returns this: "Some partition key parts are missing: seasoncode"
Cassandra is telling you that it doesn't know which node to write the data to, because the partition key is incomplete. In SQL, the engine would simply iterate over all rows in the table and update them according to your WHERE clause. But Cassandra is specifically designed not to allow operations like that.
For this query you will need to figure out the missing seasoncodes separately, and UPDATE each row individually.
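A minimal sketch of that approach (assuming, per the error message, that the partition key of seasons_product_color_size is (productcode, seasoncode); on Cassandra 3.6+ the lookup needs ALLOW FILTERING and scans the table, so treat it as a one-off fix, not a hot path):
SELECT productcode, seasoncode
FROM seasons_product_color_size
WHERE productcode = 'RUNTS' ALLOW FILTERING;
-- then one UPDATE per key found (plus any clustering columns the table defines):
UPDATE seasons_product_color_size
SET shortdescription = 'AAA'
WHERE productcode = 'RUNTS' AND seasoncode = '...';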

Cassandra only supports writes that specify the complete partition key, so with a partial partition key you cannot update. Per the error message the missing part is seasoncode, so it has to appear in the WHERE clause (along with any clustering columns the table defines):
UPDATE seasons_product_color_size SET shortdescription = 'AAA' WHERE productcode = 'RUNTS' AND seasoncode = '...';

Related

Cassandra queries perform a full table scan if no rows exist for a specific partition key

I have a very large table like
CREATE TABLE IF NOT EXISTS profile (
account_id text,
user_id uuid,
user_data text,
creation_date timestamp,
update_date timestamp,
PRIMARY KEY ((account_id, user_id))
) WITH bloom_filter_fp_chance = 0.01
AND caching = {'keys': 'ALL', 'rows_per_partition': '10'}
AND comment = ''
AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
AND crc_check_chance = 1.0
AND dclocal_read_repair_chance = 0.1
AND default_time_to_live = 0
AND gc_grace_seconds = 864000
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND read_repair_chance = 0.0
AND speculative_retry = '99PERCENTILE';
The following query will run a full table scan if the table has no rows matching the partial partition key (account_id = 'D-F-8CX7PGX'):
SELECT * FROM profile WHERE account_id = 'D-F-8CX7PGX' AND user_id = 123e4567-e89b-12d3-a456-426614174000;
I would expect Cassandra to return quickly with no rows found rather than scan the full table.
Someone suggested inserting a dummy row with (account_id = 'D-F-8CX7PGX' AND user_id = '00000000-0000-0000-0000-000000000000') could avoid the full table scan. But I don't understand why it is needed.
Has anyone encountered a similar issue?
A single partition query does not do a full table scan.
Since the partition key is (account_id, user_id) and your query filters on a single partition, Cassandra will attempt to retrieve the partition from the relevant replica(s) without scanning the whole table. Cheers!
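If you want to verify this on your own cluster, cqlsh tracing shows the read path for a single statement (same table and key as above; note the uuid literal is unquoted in CQL):
TRACING ON
SELECT * FROM profile
WHERE account_id = 'D-F-8CX7PGX'
  AND user_id = 123e4567-e89b-12d3-a456-426614174000;
TRACING OFF
The trace should show a single-partition read against the owning replicas, not a range scan.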

How to do pagination and sorting on a post table in Cassandra?

I am using Cassandra v3.0 as my database.
My table schema is as below:
CREATE TABLE db_name.post (
postcreatedby timeuuid,
contenttype text,
createdat bigint,
friendid timeuuid,
posttype text,
id timeuuid,
PRIMARY KEY (postcreatedby, contenttype, createdat, friendid, posttype, id)
) WITH CLUSTERING ORDER BY (contenttype ASC, createdat ASC, friendid ASC, posttype ASC, id ASC)
AND bloom_filter_fp_chance = 0.01
AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
AND comment = ''
AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
AND crc_check_chance = 1.0
AND dclocal_read_repair_chance = 0.1
AND default_time_to_live = 0
AND gc_grace_seconds = 864000
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND read_repair_chance = 0.0
AND speculative_retry = '99PERCENTILE';
And I am trying to run the following query:
SELECT * from post WHERE postcreatedby = timeuuid AND contenttype IN ('text', 'text') AND createdat < bigint AND friendid = timeuuid AND posttype = 'text';
And I get the following error:
InvalidRequest: Error from server: code=2200 [Invalid query] message="Clustering column "friendid" cannot be restricted (preceding column "createdat" is restricted by a non-EQ relation)"
My question is:
I need to use all the columns for filtering the data, and I need to sort it as well. I am using the createdat column to maintain sorting and pagination.
My problem is: if I make createdat the last clustering key, I can use all the columns for filtering but cannot sort the data. And if I put createdat before the other clustering columns, I cannot use the columns after it as filters.
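One common way out (a sketch, not from the original thread; the table name is assumed) is a second, query-specific table that puts the equality-filtered columns first and the range/sort column last in the clustering order:
CREATE TABLE db_name.post_by_filters (
    postcreatedby timeuuid,
    contenttype text,
    friendid timeuuid,
    posttype text,
    createdat bigint,
    id timeuuid,
    PRIMARY KEY (postcreatedby, contenttype, friendid, posttype, createdat, id)
) WITH CLUSTERING ORDER BY (contenttype ASC, friendid ASC, posttype ASC, createdat DESC, id ASC);
-- equality (or IN) on the leading clustering columns, non-EQ only on createdat:
SELECT * FROM post_by_filters
WHERE postcreatedby = <timeuuid>
  AND contenttype IN ('text', 'image')
  AND friendid = <timeuuid>
  AND posttype = 'text'
  AND createdat < <bigint>;
Within each (contenttype, friendid, posttype) combination, rows come back newest-first, which is what paging needs; if your version rejects the IN on a non-final clustering column, run one query per content type and merge client-side.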

Cassandra Delete Records

I'm new to Cassandra and I've been having some issues trying to delete records. I have a table defined as follows:
CREATE TABLE wire_journal (
persistence_id text,
partition_nr bigint,
sequence_nr bigint,
timestamp timeuuid,
timebucket text,
event blob,
event_manifest text,
message blob,
ser_id int,
ser_manifest text,
tag1 text,
tag2 text,
tag3 text,
used boolean static,
writer_uuid text,
PRIMARY KEY ((persistence_id, partition_nr), sequence_nr, timestamp, timebucket)
) WITH CLUSTERING ORDER BY (sequence_nr ASC, timestamp ASC, timebucket ASC)
AND bloom_filter_fp_chance = 0.01
AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
AND comment = ''
AND compaction = {'bucket_high': '1.5', 'bucket_low': '0.5', 'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'enabled': 'true', 'max_threshold': '32', 'min_sstable_size': '50', 'min_threshold': '4', 'tombstone_compaction_interval': '86400', 'tombstone_threshold': '0.2', 'unchecked_tombstone_compaction': 'false'}
AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
AND crc_check_chance = 1.0
AND dclocal_read_repair_chance = 0.1
AND default_time_to_live = 0
AND gc_grace_seconds = 864000
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND read_repair_chance = 0.0
AND speculative_retry = '99PERCENTILE';
And Indexes defined as follows:
CREATE CUSTOM INDEX timestamp_idx ON wire_journal (timestamp) USING 'org.apache.cassandra.index.sasi.SASIIndex';
CREATE CUSTOM INDEX manifest_idx ON wire_journal (event_manifest) USING 'org.apache.cassandra.index.sasi.SASIIndex';
I would like to be able to delete by timestamp and event_manifest.
I can query by an event manifest for example:
select event_manifest, dateOf(timestamp) from wire_journal where event_manifest = '011000028';
The query above works. However, if I try to delete by the same criteria:
delete from wire_journal where event_manifest = '011000028';
I get the following error:
InvalidRequest: code=2200 [Invalid query] message="Some partition key parts are missing: persistence_id, partition_nr"
I've tried including those columns in my delete as follows:
delete persistence_id, partition_nr from wire_journal where event_manifest = 'aba:011000028';
and I get the following error:
InvalidRequest: code=2200 [Invalid query] message="Invalid identifier persistence_id for deletion (should not be a PRIMARY KEY part)"
How can I go about deleting all the records that match that condition?
Your partition key is (persistence_id, partition_nr), and Cassandra only deletes records by primary key; a DELETE cannot be restricted by a regular column such as event_manifest, even an indexed one. A delete that removes whole partitions needs to look like:
delete from wire_journal where persistence_id = x AND partition_nr = y;
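To delete only the rows matching an event_manifest (a sketch; the placeholder values are assumed), first use the indexed SELECT to collect the full primary keys, then issue one row-level DELETE per result:
SELECT persistence_id, partition_nr, sequence_nr, timestamp, timebucket
FROM wire_journal
WHERE event_manifest = 'aba:011000028';
-- then, for each row returned:
DELETE FROM wire_journal
WHERE persistence_id = <persistence_id>
  AND partition_nr = <partition_nr>
  AND sequence_nr = <sequence_nr>
  AND timestamp = <timestamp>
  AND timebucket = <timebucket>;
Keep in mind that each DELETE writes a tombstone, so large purges will put pressure on compaction.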

How to use both ORDER BY and IN together in a query in Cassandra?

I use Cassandra 3.0.5.
I'm having a problem using ORDER BY and IN together.
Schema:
CREATE TABLE my_status.user_status_updates (
username text,
id timeuuid,
body text,
PRIMARY KEY (username, id))
WITH CLUSTERING ORDER BY (id ASC)
AND bloom_filter_fp_chance = 0.01
AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
AND comment = ''
AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
AND crc_check_chance = 1.0
AND dclocal_read_repair_chance = 0.1
AND default_time_to_live = 0
AND gc_grace_seconds = 864000
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND read_repair_chance = 0.0
AND speculative_retry = '99PERCENTILE';
Query:
SELECT username, id, UNIXTIMESTAMPOF(id), body
FROM user_status_updates
WHERE username IN ('carol', 'dave')
ORDER BY id DESC
LIMIT 2;
InvalidRequest: code=2200 [Invalid query] message="Cannot page queries with both ORDER BY and a IN restriction on the partition key; you must either remove the ORDER BY or the IN and sort client side, or disable paging for this query"
I'm sure I've seen people query this without errors, so I know there is a way to get around this. What do I need to do to make this query work, or is it inefficient to query both ORDER BY and IN together?
You've set the Clustering Key to be ordered ASC, but are requesting it be ordered DESC in your query. These are at odds and are counter-productive. If you change the Clustering Key to DESC, then you won't need the ORDER BY clause in the query. If you truly do need to have the Clustering Key be ASC for other queries, then I would suggest a second table with it being DESC. Design your tables for what the query will require. Hope that helps.
Adam
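A sketch of that second table (the name is assumed): with the clustering order reversed, the ORDER BY disappears from the query, and the IN restriction no longer conflicts with paging:
CREATE TABLE my_status.user_status_updates_desc (
    username text,
    id timeuuid,
    body text,
    PRIMARY KEY (username, id)
) WITH CLUSTERING ORDER BY (id DESC);
SELECT username, id, UNIXTIMESTAMPOF(id), body
FROM user_status_updates_desc
WHERE username IN ('carol', 'dave')
LIMIT 2;
Rows still come back grouped by partition, so getting the two newest across both users may need a small client-side merge.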
Using both IN and ORDER BY will require turning off paging with the PAGING OFF command in cqlsh:
cqlsh> PAGING OFF

Select distinct gives incorrect values even if performed on primary key Cassandra

I'm running Cassandra version 2.1.2 and cqlsh 5.0.1.
Here is the table weather.log; weather is the keyspace, and queries run at consistency level ONE.
I have 2 nodes configured.
CREATE KEYSPACE weather WITH replication = {'class': 'NetworkTopologyStrategy', 'us-east': '1'} AND durable_writes = true;
CREATE TABLE weather.log (
ip inet,
ts timestamp,
city text,
country text,
PRIMARY KEY (ip, ts)
) WITH CLUSTERING ORDER BY (ts DESC)
AND bloom_filter_fp_chance = 0.01
AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'
AND comment = ''
AND compaction = {'min_threshold': '4', 'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32'}
AND compression = {'sstable_compression': 'org.apache.cassandra.io.compress.LZ4Compressor'}
AND dclocal_read_repair_chance = 0.1
AND default_time_to_live = 0
AND gc_grace_seconds = 864000
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND read_repair_chance = 0.0
AND speculative_retry = '99.0PERCENTILE';
When we run the query:
select distinct ip from weather.log
we get inconsistent, wrong responses: one run returns 99 rows, the next 1600, and so on (the actual number should be > 2000).
I have tried the query with the consistency level set to ALL as well. It didn't work.
Why is this happening? I need to get all the keys. How can I get all the primary keys?
It looks like you might be affected by CASSANDRA-8940. I'd suggest updating to the latest 2.1.x release and verifying whether the issue is fixed for you.
