Cassandra: table and secondary index update - is it atomic?

Please clarify:
If table A has a secondary index on column COL1, and a new row is inserted into A, are A and A's index updated transactionally? Is there a window where A and its index hold inconsistent state?
Sources saying table and index ARE NOT updated transactionally:
Secondary index update issue
https://dba.stackexchange.com/questions/136640/why-does-cassandra-recommend-against-creating-an-index-on-high-cardinality-colum
Sources saying table and index ARE updated transactionally:
https://wiki.apache.org/cassandra/SecondaryIndexes

Secondary indexes are consistent once an index is built. For performance reasons, writes to the base table and its index are not applied atomically; instead, checks are applied lazily at read time to ensure consistency.
Sam Tunnicliffe, who helped implement this, explains how consistency is maintained in Improving Secondary Index Write Performance in 1.2 and the related CASSANDRA-2897.

Does a Secondary index lock anything when it is being created?

Given the following table schema:
CREATE TABLE Record (
  -- uuidv4
  recordId STRING(36) NOT NULL,
  -- uuidv4
  userId STRING(36),
  isActive BOOL,
  lastUpdate TIMESTAMP NOT NULL OPTIONS (allow_commit_timestamp=true),
  ...
) PRIMARY KEY (recordId);

CREATE NULL_FILTERED INDEX RecordByUser
  ON Record (userId, isActive);
For every record created we write an entry to the index so we can fetch all of a user's records by their userId. Depending on what may be needed, there could be an extra STORING clause with additional information columns.
My understanding is that as I add records to the Record table, Spanner will trigger a write to the index. Since the index is non-interleaved, the data itself may have a different locality from the original record.
Under that assumption, will that write to the secondary index lock the Record table until it is completed, or does one not affect the other?
I'm going to guess they are totally independent, since an index can be created after the fact and Spanner will trigger a backfill operation that does not affect the operational status of the Record table.
Writing the index still has to take some resources from the node(s), though, so I would imagine that is the real limitation. Under a high-write scenario for the Record table, we would also effectively be invoking a second write to the index table RecordByUser, consuming a bit more of the node(s)' write throughput capacity.
So the act of adding to a secondary index doesn't require any locking on the source table (Record in this case); the primary concern would be the write throughput and any hotspots from those writes. For example, if we indexed on a timestamp as the first part of the index, the writes to the index would bunch up. Is my understanding here correct?
During the act of creating the index on an existing table, does the backfill process hold an exclusive lock on the index, like Postgres does, for example:
https://www.postgresql.org/docs/current/index-locking.html
Or can new writes land in the index while the backfill is taking place?
I can imagine a backfill process on Spanner's end that takes a read snapshot and starts writing. Given Spanner's fancy clocks, if it encounters a row in the index newer than the row it is attempting to write, it just drops the old row on the floor and carries on.
Thanks for the question. Google engineer here, happy to help.
+1 to chainicko's answer for the general locking mechanism. It is not "locked" in the sense that you can still read from and write to the original table while the backfill is running.
Reads/queries against the index itself are not allowed during the backfill, but writes to the original table are; new writes are added to the index concurrently. After the backfill, Spanner will make sure only the latest data is presented when the index is queried.
As for the example of "indexing on a timestamp as the first part of the index": since it creates a hotspot on the index, it would still have a negative impact on the system as a whole, even though it does not lock the original table.
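One common way to avoid such a hotspot is to prefix the index with a synthetic shard key derived from the row key. A sketch only: the shardId column, the modulus of 16, and the index name are illustrative assumptions, not part of the original schema.

-- Illustrative only: a stored generated column spreads index writes
-- across 16 shards instead of one ever-increasing timestamp range.
ALTER TABLE Record
  ADD COLUMN shardId INT64
  AS (MOD(ABS(FARM_FINGERPRINT(recordId)), 16)) STORED;

CREATE INDEX RecordByUpdateSharded
  ON Record (shardId, lastUpdate);

-- Time-range queries then fan out over the 16 shard values.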

Cassandra hard vs soft delete

I have multiple tables whose deleted data I want to keep.
I thought of two options to achieve that:
Create a new table called deleted_x and, when deleting from x, immediately insert into deleted_x.
Advantage: querying from only one table.
Disadvantages:
An insert for each delete
When the original table structure changes, I will have to change the deleted table too.
Have a column called is_deleted, put it in the partition key of each of these tables, and set it to true when deleting a row.
Advantage: one table structure
Disadvantage: having to mention is_deleted in every query against the table
Are there any performance considerations I should think of additionally?
Which way is the better way?
Option #1 is awkward, but it's probably the right way to do things in Cassandra. You could issue the two mutations (one DELETE and one INSERT) in a single logged batch, which guarantees that both are written.
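A minimal sketch of that batch (the table and column names are hypothetical, not from the question):

-- Logged batch: both mutations are guaranteed to eventually apply together.
BEGIN BATCH
  INSERT INTO deleted_x (id, payload) VALUES (42, 'some data');
  DELETE FROM x WHERE id = 42;
APPLY BATCH;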
Option #2 isn't really as easy as you may expect if you're coming from a relational background, because adding an is_deleted column to a table in Cassandra and expecting to be able to query against it isn't trivial. The primary reason is that Cassandra performs significantly better when querying against the primary key (partition key(s) plus optional clustering key(s)) than against secondary indexes. Therefore, for maximum performance, you'd need to model this as a clustering key. Doing so then prohibits you from simply issuing an update: you'd need to delete + insert anyway.
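To illustrate with a hypothetical table (not from the question):

-- is_deleted as a clustering column makes it queryable, but primary-key
-- columns cannot be changed with UPDATE.
CREATE TABLE x (
  id int,
  is_deleted boolean,
  payload text,
  PRIMARY KEY (id, is_deleted)
);

-- A "soft delete" therefore becomes a delete + re-insert:
DELETE FROM x WHERE id = 42 AND is_deleted = false;
INSERT INTO x (id, is_deleted, payload) VALUES (42, true, 'some data');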
Option #2 becomes somewhat more viable in 3.0+ with Materialized Views - if you're looking at Cassandra 3.0+, it may be worth considering.
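A sketch of what that could look like, reusing the hypothetical table above (the view definition is illustrative):

-- Cassandra maintains the view automatically as the base table changes,
-- so queries for active rows don't repeat is_deleted logic everywhere.
CREATE MATERIALIZED VIEW active_x AS
  SELECT * FROM x
  WHERE id IS NOT NULL AND is_deleted IS NOT NULL
  PRIMARY KEY (is_deleted, id);

-- Note: a boolean partition key yields only two (potentially huge)
-- partitions, so this shows the mechanism, not a recommended model.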
Are there any performance considerations I should think of additionally?
You will effectively double the write load and storage size of your cluster by inserting your data twice. This also applies to compactions, repairs, bootstrapping new nodes, and backups.
Which way is the better way?
Let me suggest a 3rd option instead.
Create a table all_data that contains every row and is never deleted from.
Create a table active_data using the same partition key. This table will only contain non-deleted rows (Edit: not the full data, just the key!).
Checking whether a key is in active_data before reading from all_data lets you read only non-deleted rows, as sketched below.
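A sketch of that layout (table and column names are hypothetical):

-- Full history: rows here are never deleted.
CREATE TABLE all_data (
  id int PRIMARY KEY,
  payload text
);

-- Marker table: holds only the keys of non-deleted rows.
CREATE TABLE active_data (
  id int PRIMARY KEY
);

-- Soft delete: remove only the marker.
DELETE FROM active_data WHERE id = 42;

-- Read path: check the marker first, then fetch the data.
SELECT id FROM active_data WHERE id = 42;
SELECT * FROM all_data WHERE id = 42;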

Cassandra UPDATE not working after deletion

I'm using a wide row schema in Cassandra. My table definition is as follows:
CREATE TABLE usertopics (
  key text,
  topic text,
  score counter,
  PRIMARY KEY (key, topic)
);
I'm inserting entries using:
UPDATE usertopics SET score = score + ? WHERE key=? AND topic=?
such that if the key does not exist it will insert, and if it exists it will update.
I'm deleting entries from using:
DELETE FROM usertopics WHERE key IN ?
But after deletion, when I try to update again, it is not updating. It gives no error, but it is not reflected in the DB either.
It inserts perfectly again when I truncate the table. I'm using the DataStax Java driver to access Cassandra. Any suggestions?
From the Cassandra documentation:
Counter removal is intrinsically limited. For instance, if you issue very quickly the sequence "increment, remove, increment" it is possible for the removal to be lost (if for some reason the remove happens to be the last received message). Hence, removal of counters is provided for definitive removal only, that is, when the deleted counter is not incremented afterwards. This holds for row deletion too: if you delete a row of counters, incrementing any counter in that row (that existed before the deletion) will result in undetermined behavior. Note that if you need to reset a counter, one option (that is unfortunately not concurrent safe) could be to read its value and add -value.
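That read-then-negate reset would look like this against the usertopics table above (a sketch; as the docs note, it is not safe under concurrent increments):

-- Step 1: read the current value.
SELECT score FROM usertopics WHERE key = 'user1' AND topic = 'news';
-- Suppose it returns 42.

-- Step 2: add the negation to bring the counter back to zero.
UPDATE usertopics SET score = score - 42 WHERE key = 'user1' AND topic = 'news';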
Once deleted, a counter with the same key cannot / should not be reused. Please use the links below for further info:
https://docs.datastax.com/en/cql/3.1/cql/cql_using/use_counter_t.html
https://wiki.apache.org/cassandra/Counters

maximum secondary indexes on a columnfamily

Is it a performance issue if we have two or more secondary indexes on a columnfamily? I have orderid, city, and shipmenttype. So I thought I'd create the primary key on orderid and secondary indexes on city and shipmenttype, and use a combination of the secondary index columns while querying. Is that bad modelling?
Consider the data that will be placed in the secondary index. Looking at the docs, you want to avoid columns with high cardinality. If your city and shipment type values vary greatly (or conversely, too similarly) then a secondary index may not be the right fit.
Look into potentially maintaining a separate table with this information. This would behave as a manual index of sorts, but with the additional benefit of behaving the way you expect a Cassandra table to. When you create or update records, be sure to update this index table as well. Writes are cheap; performing multiple writes in the course of updating a record is not unheard of. A sketch of the pattern follows.
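For example (a sketch; the orders table and the column types are assumptions based on the question):

-- Base table, keyed by order.
CREATE TABLE orders (
  orderid text PRIMARY KEY,
  city text,
  shipmenttype text
);

-- Manual "index": the same data keyed by city for city lookups.
CREATE TABLE orders_by_city (
  city text,
  orderid text,
  shipmenttype text,
  PRIMARY KEY (city, orderid)
);

-- Write to both tables whenever an order is created or updated.
INSERT INTO orders (orderid, city, shipmenttype) VALUES ('o1', 'Paris', 'express');
INSERT INTO orders_by_city (city, orderid, shipmenttype) VALUES ('Paris', 'o1', 'express');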
When looking at your access patterns will you be using the partition key as part of the WHERE clause or just the secondary indexes?
If you perform a query against a secondary index along with the partition key, you will achieve better performance than when you query with the secondary index alone.
For example, with WHERE orderid = 'foo' AND shipmenttype = 'bar' the request will only be sent to nodes responsible for the partition where foo is stored. Then the secondary index will be consulted for shipmenttype = 'bar' and your results will be returned.
When you run a query with just WHERE shipmenttype = 'bar' the query is sent to all nodes in the cluster before the secondary indexes are consulted for looking up rows. This is less than ideal.
Additionally, should you query against multiple secondary indexes in a single request, you must use ALLOW FILTERING. Only ONE secondary index will be consulted during your request, usually the more specific of the indexes referenced. This causes a performance hit, as every record returned from the first index must then be checked against the other values listed in your WHERE clause.
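Concretely (a sketch that assumes the hypothetical orders table above, with secondary indexes on city and shipmenttype):

-- Routed only to replicas owning the 'o1' partition, then the
-- shipmenttype index is consulted within that partition:
SELECT * FROM orders WHERE orderid = 'o1' AND shipmenttype = 'express';

-- Fans out to every node; only one index is consulted and the other
-- predicate is checked row by row, hence ALLOW FILTERING is required:
SELECT * FROM orders WHERE city = 'Paris' AND shipmenttype = 'express' ALLOW FILTERING;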
Should you be using a secondary index, always strive to include the partition key portion of the query. Second, do NOT use multiple secondary indexes when querying a table; this will cause a major performance hit.
Ultimately your performance is determined by how you construct your queries against the partition and secondary indexes.

Why am I reading many tombstones in Cassandra table although my access pattern should avoid them

I know this is not the best way to use Cassandra, but the type of my data requires reading all data from the last week. However, when using collection types in CQL3, I ran into certain limitations that prevent me from doing normal date-range queries.
So I have set up Cassandra (currently single node, probably more in the future) with the following table
CREATE TABLE cache (
  tag text,
  id int,
  tags map<text,text>,
  PRIMARY KEY (tag, id)
);
ALTER TABLE cache WITH GC_GRACE_SECONDS = 0;
I am inserting with a TTL of one week to automatically remove the items from the cache.
I tried to follow the suggestions mentioned in this article to avoid reading many tombstones by selecting by "minimum id", which I persist elsewhere to avoid reading old data:
SELECT * FROM cache WHERE tag = ? AND id >= ?
The id is basically some sort of timestamp which is constantly increasing, i.e. I only insert higher values over time and constantly remove older ids from the table.
But I still get warnings about thresholds being reached
WARN 08:59:06,286 Read 5001 live and 5702 tombstoned cells in cache (see tombstone_warn_threshold)
And if I do not run manual compaction/scrubbing regularly, I get exceptions and queries fail.
However, based on my understanding of the articles and documentation, I should be avoiding most if not all tombstones here: I query on equality for the tag, which allows Cassandra to look only at those areas, and I use a minimum id, which allows Cassandra to start reading only after most of the tombstones. So why are tombstone warnings/exceptions still being reported?
Each map key/value pair is actually a column (name, value, and timestamp). So if you are issuing a lot of deletions of map elements (expiry by TTL is also a deletion), that is the source of this warning: you are still reading full maps, with lots of tombstones in them. Also, the TTL setting on a map is applied per element.
Second, this is multiplied by the >= predicate in your SELECT query.
If this is the case, you should remodel your data access pattern to use only EQ relations in the SELECT query and bump the id more often. This access pattern would also allow you to get rid of the clustering part of your PRIMARY KEY.
So, if you do not issue lots of deletions on that map, you can try the model tag text, time timeuuid, name text, data text and slice it precisely by time, as sketched below.
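A minimal sketch of that remodel (column names taken from the answer; the tag and timestamp literals are illustrative):

-- One column per entry instead of a map, so TTL expiry of individual
-- entries no longer forces reads through piles of map tombstones.
CREATE TABLE cache (
  tag text,
  time timeuuid,
  name text,
  data text,
  PRIMARY KEY (tag, time)
);

-- Slice precisely by time; rows older than the cutoff are never read.
SELECT * FROM cache
WHERE tag = 'mytag'
  AND time > maxTimeuuid('2014-01-01 00:00:00+0000');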
