How to delete tombstones of a Cassandra table?

My OpsCenter gives me a 'Failed' result on the Tombstone count performance service. I read this paper and found that the insertion of NULL values may be the cause.
So I try to fix this problem using the following procedures:
Set the NULL columns of the channels and articles tables to ''. And as a check, there are no inserts going to these two tables.
Set gc_grace_seconds to 0 using commands:
alter table channels with gc_grace_seconds = 0;
alter table articles with gc_grace_seconds = 0;
Truncate the bestpractice_results table in the OpsCenter keyspace.
Restart agents and OpsCenter using commands:
service datastax-agent restart
service opscenterd restart
But when OpsCenter ran its routine performance check (every minute), the same 'Failed' message appeared again, and the number of tombstones did not change (i.e., 23552 and 1374).
So I have these questions:
How do I remove these tombstones when there are no insert operations on these two tables?
Do I need to repair the cluster?
OpsCenter version: 6.0.3, Cassandra version: 2.1.15.1423, DataStax Enterprise version: 4.8.10

With Cassandra 3.10+, use
nodetool garbagecollect keyspace_name table_name
Check https://docs.datastax.com/en/dse/5.1/dse-admin/datastax_enterprise/tools/nodetool/toolsGarbageCollect.html
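If your cluster is on a version that supports it, a run against the two tables from the question might look like the following (the keyspace name mykeyspace is a placeholder; substitute your own, and run it on every node, since garbagecollect only rewrites the local node's SSTables):
nodetool garbagecollect mykeyspace channels
nodetool garbagecollect mykeyspace articles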

Please go through the link below for complete information about deletes and tombstones; it may be helpful for you.
http://thelastpickle.com/blog/2016/07/27/about-deletes-and-tombstones.html

Related

Cassandra dropped keyspaces still on HDD

I noticed an increase in the number of open files on my Cassandra cluster and went to check its health. nodetool status reported only 300 GB in use per node of the 3 TB each has allocated.
Shortly thereafter I began to see heap OOM errors showing up in the Cassandra logs.
These nodes had been running for 3-4 months with no issues, but had a series of test data sets populated into and then dropped from them.
After checking the hard drives via the df command, I was able to determine they were all between 90-100% full in a JBOD scenario.
Edit: further investigation shows that the remaining files are in the 'snapshots' subfolder and the data subfolder itself has no db tables.
My question is, has anyone seen this? Why did compaction not free these tombstones? Is this a bug?
Snapshots aren't tombstones - they are a backup of your data.
As Highstead says, you can drop any unused snapshots via the clearsnapshot command.
You can disable the automatic snapshot facility via cassandra.yaml:
https://docs.datastax.com/en/cassandra/2.1/cassandra/configuration/configCassandra_yaml_r.html#reference_ds_qfg_n1r_1k__auto_snapshot
Also check whether you have a non-default value of true for snapshot_before_compaction.
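For reference, a minimal sketch of the relevant cassandra.yaml settings (the file usually lives at /etc/cassandra/cassandra.yaml or conf/cassandra.yaml depending on the install; a node restart is needed for changes to take effect):
# do not take an automatic snapshot on TRUNCATE or DROP
auto_snapshot: false
# default is false; a non-default true here also leaves snapshots behind
snapshot_before_compaction: false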
Snapshots accumulate over the lifetime of the Cassandra cluster. These snapshots are not reflected in nodetool status but still occupy space. In this case the snapshots consuming all the space were created when a table was dropped.
To retrieve a list of current snapshots, use the command nodetool listsnapshots.
This feature can be disabled by editing /etc/cassandra/cassandra.yaml and setting auto_snapshot to false. Alternatively, these snapshots can be purged via the command nodetool clearsnapshot <name>.
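For example, the cleanup on one node might look like this (the snapshot name is whatever listsnapshots reports for your dropped keyspace, and mykeyspace is a placeholder):
nodetool listsnapshots
# remove one named snapshot
nodetool clearsnapshot -t <snapshot_name>
# or remove all snapshots belonging to a keyspace
nodetool clearsnapshot mykeyspace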

Cassandra : Error while using INSERT INTO with 'IF NOT EXISTS' from within cqlsh

I've created a table called 'test1'.
CREATE TABLE test1(
link text PRIMARY KEY,
title text,
descp text,
pubdate text,
ts timestamp
);
Then I insert a record into:
INSERT INTO test1(title,link,descp,pubdate, ts) VALUES('T3','http://link.com/a3','D3','date3', toTimestamp(now())) IF NOT EXISTS;
This results in an error (red colored text in cqlsh): NoHostAvailable
The Cassandra setup uses Cassandra version 3.9 on Mac OS El Capitain.
The key space is this:
CREATE KEYSPACE testkeyspace
WITH replication = {'class':'SimpleStrategy', 'replication_factor' : 3};
These are the configuration parameters I changed according to answers on Stack Overflow:
start_rpc: true (from false)
start_native_transport: true (from false)
Still, I can't seem to pinpoint why I can't run this INSERT INTO statement with the "IF NOT EXISTS" keywords at the end.
Note that I started Cassandra using "cassandra -f"
Please help if you know what's wrong here.
A possibility to try: many of the drivers now use LOCAL_ONE as the default consistency level. With SimpleStrategy, you can get cases where, even with all the nodes up, this request can fail (CASSANDRA-12053) if none of the nodes in your DC has the data. That should be exposed as an UnavailableException rather than a NoHostAvailable, but it's worth trying NetworkTopologyStrategy with a per-DC replication factor instead.
Is this a one-node cluster? Having a replication factor greater than the number of nodes will doubtless cause issues, so per the comment, setting it to 1 is a good idea.
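If it really is a single-node dev cluster, a quick way to test this (keyspace name taken from the question) is to drop the replication factor to 1 and retry the conditional insert:
ALTER KEYSPACE testkeyspace WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};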
Was Cassandra still running at the time of the query? With -f you need to keep Cassandra running in the foreground, or cqlsh will lose its connection and give a NoHostAvailable exception.

How can I disable system_traces keyspace in Cassandra?

We use Cassandra 2.1.5 in a small dev environment (2 DCs, 3 nodes each).
We don't have much space on the dev machines and face disk space errors almost every day. The main culprit is the system_traces keyspace:
.../system_traces]# du -sh
8.1G .
I tried to turn tracing off in cqlsh:
cqlsh> tracing off;
Tracing is not enabled.
I tried nodetool settraceprobability 0, but the tables are still getting populated.
I can't delete tables and keyspace:
cqlsh> drop keyspace system_traces;
Unauthorized: code=2100 [Unauthorized] message="Cannot DROP <keyspace system_traces>"
The only working solution is "truncate system_traces.sessions; truncate system_traces.events;", but those tables fill up with rows again quite soon.
How do I disable it once and for all?
There's a chance the trace probability was enabled too. You can disable it via nodetool at runtime:
nodetool settraceprobability 0
but this would have to be done on each node. You can truncate the events/sessions tables:
cqlsh> truncate system_traces.events;
cqlsh> truncate system_traces.sessions;
but you may then want to clear snapshots if the truncate triggered one:
nodetool clearsnapshot system_traces
You must still have tracing enabled somewhere. Try looking at a few rows; that might give you a hint as to what is generating these traces. For instance, when I trace a CQL query manually, the query string appears in sessions.parameters.
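Assuming you have cqlsh access, a quick way to sample what is being traced is:
cqlsh> SELECT session_id, parameters FROM system_traces.sessions LIMIT 10;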
nodetool settraceprobability 1 helped me start tracing and getting data into those system_traces tables such as sessions and events. Earlier they were not showing any data for me. I did the same step on all nodes. So that makes it clear that nodetool settraceprobability 0 should stop the logging if we do it on all nodes.

Remove all data from Cassandra?

I have an eight-node Cassandra setup. I am saving data with a 3-day TTL. But the data is useless after I take a summary (using my Java script: counts of things, etc.). I want to delete all the data in a table. I can stop Cassandra for some time to do the deletion, so that the data is removed from all nodes.
Should I run truncate and nodetool repair afterwards, or should I flush first and then delete? What's the proper way to do it?
You can drop the tables or truncate them... but keep in mind that Cassandra will snapshot your tables by default, so you'll also need to run nodetool clearsnapshot on all of your nodes afterwards. There is no need to stop Cassandra while you do this delete.
I don't know that there is a right way per se... but what I do when I need to clear a table is: first, I run truncate on the table using cqlsh. Then I run nodetool clearsnapshot on my nodes using pssh (https://code.google.com/p/parallel-ssh/).
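As a rough sketch (the keyspace, table, and pssh hosts file names are placeholders), the sequence would be:
cqlsh> TRUNCATE mykeyspace.mytable;
# then, on every node (here via pssh):
pssh -h cassandra_hosts.txt "nodetool clearsnapshot mykeyspace"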
Hope this helps

Cassandra Java Driver returns deleted columns

I'm using Datastax Cassandra Java Driver 2.1.0 to delete a set of rows in the database. My test environment is based on a single node with Cassandra 2.0.7.
I ran the delete statement and then checked the result by running a query to select the deleted rows.
The problem is that the second query returns the rows, but if I check it via cqlsh, the rows are indeed deleted.
The query trace reports that the rows are marked as tombstoned, so why does the select query retrieve the data anyway?
Here is the code for the delete task:
Statement query = QueryBuilder.delete().from(QueryBuilder.quote(CF_MESSAGES))
.where(QueryBuilder.in(CF_MESSAGES_KEY, (Object[]) rowKeyArray));
session.execute(query);
And here the code for the select:
query = QueryBuilder.select().all().from(QueryBuilder.quote(CF_MESSAGES))
.where(QueryBuilder.in(CF_MESSAGES_KEY, (Object[]) rowKeyArray))
.and(QueryBuilder.lte(CF_MESSAGES_COLUMN1, "2:" + Character.MAX_VALUE));
ResultSet queryResult = session.execute(query);
Thank you!
Repair is an anti-entropy mechanism that should be run roughly weekly, or at least more often than your gc_grace_seconds, in order to prevent deleted data from coming back as zombies. DataStax OpsCenter has a Repair Service that automates this task.
Manually you can run:
nodetool repair
in one node or
nodetool repair -pr
in each of your nodes. The -pr option will ensure you only repair a node's primary ranges.
You should try a nodetool repair... I had the same issue:
Cassandra and defuncting connection (see comments)
If your environment is a test environment, try reducing gc_grace_seconds, and check whether any node was down when the delete happened (you can check with the Linux uptime command).
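In a test environment, lowering gc_grace_seconds can be done per table (the table name here is a placeholder); on a real cluster, just remember to run repairs more often than the new value:
ALTER TABLE mykeyspace.messages WITH gc_grace_seconds = 3600;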
