I have 2 Cassandra nodes. I have a table with 3 text fields (all keys) and 2 counters. RF is 2.
I added another counter column to the table, then mistakenly issued a DROP on that column. I reverted the application to the old version so that it no longer uses the column.
I then added another counter column, with a different name, to replace the dropped one, and changed the application to use that new column.
Now all my queries that have a WHERE clause fail with this error:
ReadFailure: Error from server: code=1300 [Replica(s) failed to execute read] message="Operation failed - received 0 responses and 2 failures" info={'failures': 2, 'received_responses': 0, 'required_responses': 1, 'consistency': 'ONE'}
And I see this in debug.log:
java.lang.IllegalStateException: [payable, revenue, to_pay, value] is not a subset of [revenue to_pay value]
payable is the old column, that was dropped, and to_pay is the new column.
What is happening?
Cassandra version is 3.11.
PS. I tried repairing, and it is running. Will it help?
EDIT:
Table schema:
CREATE TABLE backend_platform_prod.stats_counters (
date text,
key text,
revenue counter,
to_pay counter,
value counter,
PRIMARY KEY (date, key)
) WITH CLUSTERING ORDER BY (key ASC)
AND bloom_filter_fp_chance = 0.01
AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
AND comment = ''
AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
AND crc_check_chance = 1.0
AND dclocal_read_repair_chance = 0.1
AND default_time_to_live = 0
AND gc_grace_seconds = 864000
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND read_repair_chance = 0.0
AND speculative_retry = '99PERCENTILE';
There was a payable counter field, that is dropped.
I tried backing up the data with COPY TO, dropping the table and recreating it, and restoring with COPY FROM. It is working now, although some data seems to be missing (not very important).
I see payable column in system_schema.dropped_columns, but not in system_schema.columns.
Please check the system_schema keyspace. If the column still exists there, delete that row.
When doing a repair on a Cassandra node, I sometimes see a lot of tombstone logs. The error looks like this:
org.apache.cassandra.db.filter.TombstoneOverwhelmingException: Scanned over 100001 tombstone rows during query 'SELECT * FROM my_keyspace.table_foo WHERE token(<my params>) >= token(<my params>) AND token(<my params>) <= 2988334221698479200 LIMIT 2147385647' (last scanned row partition key was ((<my params>), 7c650d21-797e-4476-93d5-b1248e187f22)); query aborted
I have read here that tombstones are inserted as a way to mark a record as deleted. However, I don't see any code in this project that runs a delete on this table - just a read and an insert. What am I missing - how can I prevent these TombstoneOverwhelmingExceptions?
Here is the table definition:
CREATE TABLE my_keyspace.table_foo(
foo1 text,
year int,
month int,
foo2 text,
PRIMARY KEY ((foo1, year, month), foo2)
) WITH CLUSTERING ORDER BY (foo2 ASC)
AND bloom_filter_fp_chance = 0.01
AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
AND comment = ''
AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
AND crc_check_chance = 1.0
AND default_time_to_live = 6912000
AND gc_grace_seconds = 864000
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND speculative_retry = '99PERCENTILE';
However, I don't see any code in this project that runs a delete on this table - just a read and an insert.
The code might not be running DELETEs, but the table definition tells Cassandra to delete anything >= 80 days old. TTLs create tombstones.
AND default_time_to_live = 6912000
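As a quick check of the 80-day figure, the TTL can be converted from seconds to days. This is only arithmetic on the value from the table definition; the function name is made up for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86400


def ttl_in_days(ttl_seconds):
    """Convert a default_time_to_live value (in seconds) to days."""
    return ttl_seconds / SECONDS_PER_DAY


print(ttl_in_days(6912000))  # 80.0
```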
The thought behind TTLs in a time series model is that rows are typically clustered by timestamp in descending order. Most use cases care only about recent data, so the descending order leaves the tombstones at the "bottom" of the partition, where they are rarely (if ever) queried.
To create that effect, you'd need to create a new table with a definition something like this:
PRIMARY KEY ((foo1, year, month), created_time, foo2)
) WITH CLUSTERING ORDER BY (created_time DESC, foo2 ASC)
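To illustrate why the clustering order matters, here is a toy model in Python, not Cassandra itself. It assumes a read walks the partition in clustering order and stops once it has collected enough live rows; the partition contents and the function name are hypothetical.

```python
def tombstones_scanned(rows, limit):
    """Count tombstones read before `limit` live rows are collected.

    `rows` is a list of (created_time, is_tombstone) tuples already in
    clustering order; the scan stops once `limit` live rows are found.
    """
    scanned = 0
    live = 0
    for _created_time, is_tombstone in rows:
        if live >= limit:
            break
        if is_tombstone:
            scanned += 1
        else:
            live += 1
    return scanned


# Hypothetical partition: times 0..99, everything older than t=80 has expired.
rows_asc = [(t, t < 80) for t in range(100)]
rows_desc = list(reversed(rows_asc))

print(tombstones_scanned(rows_asc, limit=10))   # 80
print(tombstones_scanned(rows_desc, limit=10))  # 0
```

In this simplified picture, the ascending layout has to step over every expired row before it reaches live data, while the descending layout returns the newest rows without touching a tombstone.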
@anthony, here is my point of view.
As a first step, avoid inserting tombstones into the table in the first place.
Use the full primary key on the read path so that reads can skip the tombstones. Data modeling is key: design the tables around the access patterns required on the read side.
You could lower min_threshold to 2 for more aggressive tombstone eviction.
Similarly, you could tweak other compaction options (e.g. set unchecked_tombstone_compaction to true) to evict them faster.
I would encourage you to view a similar question and the answers that are documented here.
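A rough sketch of what such a compaction change could look like, built as a CQL string in Python. The helper name is hypothetical and the option values are only examples taken from the suggestions in this answer, not tuned recommendations; try any such change on a non-production table first.

```python
def compaction_alter_statement(table, options):
    """Build an ALTER TABLE ... WITH compaction statement from a dict of options."""
    pairs = ", ".join(f"'{key}': '{value}'" for key, value in options.items())
    return f"ALTER TABLE {table} WITH compaction = {{{pairs}}};"


statement = compaction_alter_statement(
    "my_keyspace.table_foo",
    {
        "class": "SizeTieredCompactionStrategy",
        "min_threshold": "2",                      # example value from this answer
        "unchecked_tombstone_compaction": "true",  # example value from this answer
    },
)
print(statement)
```

The resulting string can be pasted into cqlsh or passed to session.execute() in the Python driver.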
I used Cassandra 3.6 Database and the table definition is this.
CREATE TABLE sg.products (
date_updated text,
time_added int,
id text,
best_seller text,
company text,
PRIMARY KEY (date_updated, time_added, id)
) WITH CLUSTERING ORDER BY (time_added ASC, id ASC)
AND bloom_filter_fp_chance = 0.01
AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
AND comment = ''
AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
AND crc_check_chance = 1.0
AND dclocal_read_repair_chance = 0.1
AND default_time_to_live = 0
AND gc_grace_seconds = 864000
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND read_repair_chance = 0.0
AND speculative_retry = '99PERCENTILE';
The table has millions of rows.
I dropped the column best_seller from the "products" table, and the drop succeeded.
But when I checked the disk space, it had not decreased.
I searched on Google and found the term "tombstone".
So Cassandra did not delete the data, but kept it as some kind of tombstone.
Now my question is: how do I delete the tombstone data so I can free up the space?
Or is there any other way to free up the space?
Thanks in advance.
Tombstones drop
Cassandra will fully drop those tombstones when a compaction triggers, only after local_delete_time + gc_grace_seconds as defined on the table the data belongs to. Remember that all the nodes are supposed to have been repaired within gc_grace_seconds to ensure a correct distribution of the tombstones and prevent deleted data from reappearing.
See this line from your table definition:
AND gc_grace_seconds = 864000
That is the time period which tombstones will live for. 864000 seconds == 10 days. Tombstones exist for that duration to allow them adequate time to be distributed to the other nodes in your cluster. That way all of the other nodes are aware of the delete(s), and do not return the obsoleted values.
Once that 10 day period has passed, and the next time this table triggers compaction (after that 10 days), the tombstones will be removed.
Note that you can shorten that period by modifying that property on your table definition. Just make sure that you're running repairs within that timeframe.
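To make the timing concrete, here is a small Python sketch that computes the earliest moment a tombstone becomes eligible for removal. The deletion date is a made-up example and the function name is hypothetical; the actual removal still only happens when a compaction runs after that point.

```python
from datetime import datetime, timedelta, timezone


def purgeable_after(deleted_at, gc_grace_seconds):
    """Earliest time at which a compaction may drop a tombstone."""
    return deleted_at + timedelta(seconds=gc_grace_seconds)


deleted_at = datetime(2017, 7, 1, tzinfo=timezone.utc)  # hypothetical delete time
print(purgeable_after(deleted_at, 864000))  # 2017-07-11 00:00:00+00:00
```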
I'm learning Cassandra, started off with v3.8. My sample keyspace/table looks like this
CREATE TABLE digital.usage (
provider decimal,
deviceid text,
date text,
hours varint,
app text,
flat text,
usage decimal,
PRIMARY KEY ((provider, deviceid), date, hours)
) WITH CLUSTERING ORDER BY (date ASC, hours ASC)
AND bloom_filter_fp_chance = 0.01
AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
AND comment = ''
AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
AND crc_check_chance = 1.0
AND dclocal_read_repair_chance = 0.1
AND default_time_to_live = 0
AND gc_grace_seconds = 864000
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND read_repair_chance = 0.0
AND speculative_retry = '99PERCENTILE';
Using a composite PRIMARY KEY with partition key as provider and deviceId, so that the uniqueness and distribution is done across the cluster nodes. Then the clustering keys are date and hours.
I have a few observations:
1) For PRIMARY KEY((provider, deviceid), date, hours), when inserting multiple entries for the hours field, only the latest is kept and the previous ones disappear.
2) For PRIMARY KEY((provider, deviceid), date), when inserting multiple entries for the same date field, only the latest is kept and the previous ones disappear.
Though I'm happy with the behaviour in point 1, I want to know what's happening in the background. Do I have to understand more about clustering keys?
A PRIMARY KEY is meant to be unique.
Most RDBMSs throw an error if you insert a duplicate value into a PRIMARY KEY.
Cassandra does not do a read before write. It creates a new version of the record with the latest timestamp. When you insert data with the same values for the primary key columns, the new data is written with a newer timestamp, and a query (SELECT) returns only the data with the latest timestamp.
Example:
PRIMARY KEY((provider, deviceid), date, hours)
INSERT INTO digital.usage (provider, deviceid, date, hours, app, flat) VALUES (1.0, 'a', '2017-07-27', 1, 'test', 'test');
---- This will create a new record with let's say timestamp as 1
INSERT INTO digital.usage (provider, deviceid, date, hours, app, flat) VALUES (1.0, 'a', '2017-07-27', 1, 'test1', 'test1');
---- This will create a new record with let's say timestamp as 2
SELECT app, flat FROM digital.usage WHERE provider=1.0 AND deviceid='a' AND date='2017-07-27' AND hours=1;
Will give
------------
| app | flat |
|-----|------|
|test1|test1 |
------------
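The same behaviour can be mimicked with a toy model in Python. This is a simplified sketch, not how Cassandra stores data: it keeps one entry per primary key and, for each column, keeps the value with the highest write timestamp. The class and method names are made up for illustration.

```python
class ToyTable:
    """Toy model of upserts: one row per primary key, newest cell wins."""

    def __init__(self):
        self.rows = {}  # primary key tuple -> {column: (timestamp, value)}

    def insert(self, primary_key, columns, timestamp):
        cells = self.rows.setdefault(primary_key, {})
        for name, value in columns.items():
            current = cells.get(name)
            # Keep the incoming value only if it is strictly newer;
            # timestamp ties are not modelled in this toy.
            if current is None or timestamp > current[0]:
                cells[name] = (timestamp, value)

    def select(self, primary_key):
        cells = self.rows.get(primary_key, {})
        return {name: value for name, (_ts, value) in cells.items()}


table = ToyTable()
key = (1.0, "a", "2017-07-27", 1)  # (provider, deviceid, date, hours)
table.insert(key, {"app": "test", "flat": "test"}, timestamp=1)
table.insert(key, {"app": "test1", "flat": "test1"}, timestamp=2)

print(len(table.rows))    # 1
print(table.select(key))  # {'app': 'test1', 'flat': 'test1'}
```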
We have the table below with a TTL of 24 hours (1 day). We have a 4-node Cassandra 3.0 cluster, and a Spark job processes this table. Once processing finishes, it truncates all the data in the table and a new batch of data is inserted. This is a continuous process.
The problem I am seeing is that we are getting more tombstones because the data is truncated every day after Spark finishes processing.
If I leave gc_grace_seconds at the default, there will be more tombstones. If I reduce gc_grace_seconds to 1 day, will that be an issue? And if I run repair on that table every day, will that be enough?
How should I approach this problem? I know frequent deletes are an antipattern in Cassandra. Is there any other way to solve this issue?
CREATE TABLE b.stag (
xxxid bigint PRIMARY KEY,
xxxx smallint,
xx smallint,
xxr int,
xxx text,
xxx smallint,
exxxxx smallint,
xxxxxx tinyint,
xxxx text,
xxxx int,
xxxx text,
xxxxx text,
xxxxx timestamp
) WITH bloom_filter_fp_chance = 0.01
AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
AND comment = ''
AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
AND crc_check_chance = 1.0
AND dclocal_read_repair_chance = 0.1
AND default_time_to_live = 86400
AND gc_grace_seconds = 864000
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND read_repair_chance = 0.0
AND speculative_retry = '99PERCENTILE';
thank you
A truncate of a table should not create tombstones, so when you say "truncating" I assume you mean deleting. As you have already mentioned, you can lower the gc_grace_seconds value. However, this means you have a smaller window in which repairs must run to reconcile the data and make sure each node has the right tombstone for a given key; otherwise old data could reappear. It's a trade-off.
However, to be fair, if you are clearing out the table each time, why not use the TRUNCATE command? That way you empty the table with no tombstones.
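A minimal sketch of issuing the TRUNCATE from Python, assuming a session object with an execute() method like the one the Python driver provides. The helper name reset_table is hypothetical, and RecordingSession is only a stand-in so the snippet runs without a cluster.

```python
def reset_table(session, keyspace, table):
    """Empty a table with TRUNCATE instead of row-by-row DELETEs."""
    session.execute(f"TRUNCATE {keyspace}.{table}")


class RecordingSession:
    """Stand-in for a driver session that only records statements."""

    def __init__(self):
        self.statements = []

    def execute(self, statement):
        self.statements.append(statement)


session = RecordingSession()
reset_table(session, "b", "stag")
print(session.statements)  # ['TRUNCATE b.stag']
```

With a real cluster, you would pass the session returned by the driver's connect() call instead of the stand-in.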
I tried to CQL Python driver to insert 100k rows,
# no_of_rows = 100k
for row in range(no_of_rows):
    session.execute("INSERT INTO test_table (key1, key2, key3) VALUES ('test', 'test', 'test'"))
but only one row was inserted into test_table (checked using the Cassandra CQL shell and select * from test_table). How do I fix the issue?
UPDATE
If I tried
for row in range(no_of_rows):
    session.execute("INSERT INTO test_table (key1, key2, key3) VALUES ('test' + str(row), 'test', 'test'"))
no rows were inserted, here key1 is the primary key.
describe test_table,
CREATE TABLE test_keyspace.test_table (
key1 text PRIMARY KEY,
key2 text,
key3 text
) WITH bloom_filter_fp_chance = 0.01
AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
AND comment = ''
AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
AND crc_check_chance = 1.0
AND dclocal_read_repair_chance = 0.1
AND default_time_to_live = 0
AND gc_grace_seconds = 864000
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND read_repair_chance = 0.0
AND speculative_retry = '99PERCENTILE';
Cassandra primary keys are unique. 100000 in-place writes to the same key(s) leaves you with 1 row.
Which means if your primary key structure is PRIMARY KEY(key1,key2,key3) and you INSERT 'test','test','test' 100000 times...
...it'll write 'test','test','test' to the same partition 100000 times.
To get your Python code to work, I made some adjustments, such as creating a separate variable for the key (key1) and using a prepared statement:
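# Prepare the statement once; the driver binds the ? placeholders on each execute
pStatement = session.prepare("""
    INSERT INTO test_table (key1, key2, key3) VALUES (?, ?, ?);
""")
no_of_rows = 100000
for row in range(no_of_rows):
    # Build a distinct primary key per iteration so each INSERT creates a new row
    key = 'test' + str(row)
    session.execute(pStatement, [key, 'test', 'test'])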
using Cassandra CQL Shell and select * from test_table
I feel compelled to mention that both multi-key queries (querying for more than one partition key at a time) and unbound queries (SELECTs without a WHERE clause) are definite anti-patterns in Cassandra. They may appear to work fine in a dev/test environment. But when you get to a production-scale cluster with dozens of nodes, these types of queries will introduce a lot of network time into the equation, as they will have to scan each node to compile the query results.
Your new code has a bug in string concatenation. It should be:
for row in range(no_of_rows):
    session.execute("INSERT INTO test_table (key1, key2, key3) VALUES ('test" + str(row) + "', 'test', 'test')")
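To see the difference without a cluster, you can print the CQL text each version produces. This is a simplified sketch: the two function names are made up, and the buggy variant only reproduces the misplaced quotes from the question.

```python
def buggy_statement(row):
    # The "+ str(row)" sits inside the string literal, so it is sent to
    # Cassandra as literal text instead of being evaluated by Python.
    return "INSERT INTO test_table (key1, key2, key3) VALUES ('test' + str(row), 'test', 'test')"


def fixed_statement(row):
    # The Python string is closed before "+ str(row)", so the row number
    # becomes part of the quoted CQL value.
    return "INSERT INTO test_table (key1, key2, key3) VALUES ('test" + str(row) + "', 'test', 'test')"


print(buggy_statement(7))
print(fixed_statement(7))
```

The first statement still contains the text + str(row), while the second contains the value 'test7'.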