Cassandra CQLSSTableWriter Issue

In our production environment we are using CQLSSTableWriter to create SSTables for later bulk loading (sample gist). Once we booted Cassandra after the SSTables were created, we got the following exceptions. All our CFs are counter column families.
Versions tried: 2.1.5 and 2.0.8
Exception 1 and Exception 2
When looking through the code, we found that in ColumnFamily.java, cellInternal is of type BufferedCounterUpdateCell, whose diff operation is not supported; the reconcile method is also called on the wrong type.
To confirm the issue, we used the sstable2json utility to inspect the data. As we guessed:
Cells of a properly dumped SSTable had valid counters:
(":1:c05:desktop/*/*/*/*/*/*/*/*/*/*/*/*/*/*/*/*/*/","00010000db5ad3b00d4711e5b52dab7bf928868d00000000000000590000000000000059",1433835586985,"c",-9223372036854775808),
(":1:c05:*/*/*/*/*/*/*/*/*/*/*/*/*/*/*/*/*/*/","00010000db5ad3b00d4711e5b52dab7bf928868d00000000000000590000000000000059",1433835586985,"c",-9223372036854775808)
whereas the faulty ones had buffered counters, and hence sstable2json failed:
("*:0:c01:*/direct/*/*/*/*/*/*/*/*/*/*/*/*/*/*/*/*/","0000000000000001",1433924262793),
("*:0:c01:*/*/singapore/*/*/*/*/*/*/*/*/*/*/*/*/*/*/*/","0000000000000002",1433924262793),
Basically, sstable2json doesn't support dumping BufferedCounterUpdateCells, so it assumes such cells are of the normal type and dumps them anyway.
It is evident from the error logs and the sstable2json output that instead of writing CounterColumns, CQLSSTableWriter wrote CounterUpdateColumns (counter cells with a different mask), which caused errors when Cassandra tried to load SSTables containing such columns.
We see this issue every time SSTables are created via CQLSSTableWriter.
While going through issues reported on the same topic, we stumbled on this one, something related to switching CFs for writing. We guess our problem could be along the same lines. Any input is welcome.
----Update
On further debugging we figured out that CQLSSTableWriter fails to convert CounterUpdateColumns to CounterColumns, as is usually done for all other row mutations. This might need a patch.

Counters are not supported by CQLSSTableWriter. See https://issues.apache.org/jira/browse/CASSANDRA-10258 for more information.
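For non-counter tables the writer does work. For reference, a minimal sketch of the API, assuming a matching cassandra-all jar on the classpath; the demo keyspace/table, insert statement, and output directory are made-up examples:

import org.apache.cassandra.io.sstable.CQLSSTableWriter;

public class SSTableWriterSketch {
    public static void main(String[] args) throws Exception {
        String schema = "CREATE TABLE demo.events (id uuid PRIMARY KEY, payload text)";
        String insert = "INSERT INTO demo.events (id, payload) VALUES (?, ?)";

        CQLSSTableWriter writer = CQLSSTableWriter.builder()
                .inDirectory("/tmp/demo/events")   // where the SSTables are written
                .forTable(schema)
                .using(insert)
                .build();
        try {
            writer.addRow(java.util.UUID.randomUUID(), "hello");
        } finally {
            writer.close();                        // flushes the final SSTable
        }
    }
}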

Related

Could my large amount of tables (2k+) be causing my write timeout exceptions?

I'm running open-source Cassandra 3.11.9 with the Datastax Java Driver 3.8.0. I have a Cassandra keyspace with multiple tables functioning as lookup tables / search indices. Whenever I receive a new POST request to my endpoint, I parse the object and insert it into the corresponding Cassandra table. I also issue inserts into each corresponding lookup table (10-20 per object).
When ingesting a lot of data into the system, I've been running into WriteTimeoutExceptions in the driver.
I tried to serialize the insert requests into the lookup tables by introducing Apache Camel and putting all the Statements into a queue that the Session could work off of, but it did not help.
With Camel, since the exceptions now happen in the Camel thread, the test continues to run instead of failing on the first exception. Eventually, the test seems to crash Cassandra (nothing in the Cassandra logs, though).
I also tried to turn off my lookup tables and instead insert into the main table 15x per object (to simulate a similar number of writes as if I had the lookup tables on). This test passed with no exception, which makes me think the large number of tables is the problem.
Is a large number (2k+) of Cassandra tables a code smell? Should we rearchitect, or just throw more resources at it? Nothing indicative has shown up in the logs, mostly just some status about the number of tables, etc. (no exceptions).
Can the Datastax Java Driver be used multithreaded like this? It says it is threadsafe.
There is a direct effect of a high number of tables on performance; see this doc (the whole series is a good source of information) and this blog post for more details. Basically, with ~1000 tables you get ~20-25% performance degradation.
That could be the reason; not completely direct, but related. For each table, Cassandra needs to allocate memory, reserve a share of the memtable space for it, keep metadata about it, etc. This specific problem could come from blocked memtable flushes or something similar. Check nodetool tpstats and nodetool tablestats for blocked or pending memtable flushes. It's better to set up a continuous monitoring solution, such as the metrics collector for Apache Cassandra, and watch the important metrics (which include this information) over a period of time.
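Independently of the table count, write timeouts during bulk ingest are often a sign of the client issuing more concurrent async writes than the cluster can absorb. Below is a minimal sketch of client-side throttling with the 3.x driver; the contact point, keyspace, table, and the cap of 128 in-flight requests are all made-up values to tune for your setup:

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.PreparedStatement;
import com.datastax.driver.core.ResultSetFuture;
import com.datastax.driver.core.Session;
import java.util.concurrent.Semaphore;

public class ThrottledIngest {
    private static final int MAX_IN_FLIGHT = 128;  // made-up cap; tune for your cluster

    public static void main(String[] args) throws InterruptedException {
        Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
        Session session = cluster.connect("my_ks");
        PreparedStatement ps =
                session.prepare("INSERT INTO lookup_by_name (name, id) VALUES (?, ?)");

        Semaphore inFlight = new Semaphore(MAX_IN_FLIGHT);
        for (int i = 0; i < 100_000; i++) {
            inFlight.acquire();  // block when too many writes are outstanding
            ResultSetFuture f = session.executeAsync(ps.bind("name" + i, "id" + i));
            // Release the permit when the write completes, successfully or not.
            f.addListener(inFlight::release, Runnable::run);
        }
        inFlight.acquire(MAX_IN_FLIGHT);  // wait for all outstanding writes to finish
        cluster.close();
    }
}

The driver itself is indeed thread-safe: the intended pattern is a single Cluster and a single Session shared by all application threads, so multithreaded use is fine as long as the request rate is bounded.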

Clearing prepared statement cache in cassandra 3.0.10

We have Cassandra 3.0.10 installed on CentOS. The developers made some coding mistakes when preparing statements. The result is that the prepared statement cache is overrunning, and we constantly get eviction messages. The message is shown below:
INFO [ScheduledTasks:1] 2017-12-07 10:38:28,216 QueryProcessor.java:134 - 7 prepared statements discarded in the last minute because cache limit reached (8178944 bytes)
We have corrected the prepared statements and would like to flush the prepared statement cache to start from scratch. We have stopped and restarted the Cassandra instance, but the prepared statement count was not reset.
Cassandra 3.0.10 is installed on CentOS, and we are using svcadm disable/enable cassandra to stop/start Cassandra.
I noticed that in later versions of Cassandra, e.g. 3.11.1, there is a prepared_statements table under the system keyspace. Shutting Cassandra down, deleting the files ${CASSANDRA_HOME}/data/data/system/prepared_statements-*, and then restarting Cassandra actually resets the prepared statement cache.
Appreciate any help on this.
Thanks.
Update: 2018-06-01
We are currently using a workaround to clear prepared statements associated with certain tables: dropping an index and then recreating it on the table (a sketch is shown below). This discards prepared statements that depend on the defined index. For now, this is the most we can do. The problem is that this won't work for tables that don't have an index defined on them.
Still need a better way of doing this, e.g. some admin command to clear the cache.
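For reference, the drop/recreate-index workaround described in the update can be scripted with the Java driver; a sketch, where the keyspace, table, index, and column names are all hypothetical:

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;

public class EvictPreparedStatements {
    public static void main(String[] args) {
        try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
             Session session = cluster.connect()) {
            // Dropping and recreating an index invalidates every prepared
            // statement that depends on it, evicting those from the cache.
            session.execute("DROP INDEX IF EXISTS my_ks.my_table_col_idx");
            session.execute("CREATE INDEX my_table_col_idx ON my_ks.my_table (col)");
        }
    }
}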

Could not read commit log descriptor in file

I started to use Cassandra 3.7 and I always have problems with the commitlog. When the PC shuts down unexpectedly, for example due to a power outage, the Cassandra service doesn't restart. I try to start it from the command line, but the error "could not read commit log descriptor in file" always appears.
I have to delete all the commit logs to get the Cassandra service to start, and in doing so I lose a lot of data. I tried increasing the replication factor to 3, but the result is the same.
What can I do to decrease the amount of lost data?
PS: I have only one PC to run the Cassandra database on; it is not possible to add more machines.
I think your only option here is to work around the issue, since it's unlikely there is a guaranteed way to prevent commit log files from getting corrupted by a sudden power outage. Having only a single node makes it harder to recover the data; increasing the replication factor to 3 on a single-node cluster is not going to help.
One thing you can try is to increase the frequency at which the memtables are flushed: when a memtable is flushed, its entries in the commit log are discarded, so less unflushed data is at risk when you have to delete corrupted commit logs. Details here. This will, however, not resolve the root issue.
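One knob for this is the per-table memtable_flush_period_in_ms option, which forces periodic flushes. A sketch, assuming hypothetical keyspace/table names and an arbitrary one-minute period:

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;

public class PeriodicFlush {
    public static void main(String[] args) {
        try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
             Session session = cluster.connect()) {
            // Flush this table's memtable at least once a minute, so that at
            // any moment little data lives only in the commit log.
            session.execute(
                "ALTER TABLE my_ks.my_table WITH memtable_flush_period_in_ms = 60000");
        }
    }
}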

Cassandra : Rows missing from table after CQLSSTableWriter / sstableloader

I am struggling with a problem where a straightforward combination of a Java app (which prepares SSTables using the CQLSSTableWriter API) and sstableloader fails to insert all rows.
The only suspect message I see during the creation of the SSTables is:
[Reference-Reaper:1] ERROR o.a.cassandra.utils.concurrent.Ref - LEAK
DETECTED: a reference
(org.apache.cassandra.utils.concurrent.Ref$State#4651731d) to class
org.apache.cassandra.io.util.SafeMemory$MemoryTidy#1452490935:Memory#[7f89dc05adc0..7f89dc05dfc0)
was not released before the reference was garbage collected
The sstableloader output does not list anything suspect, but after the load completes the number of rows does not match.
I checked key uniqueness and that does not seem to be the issue.
Anyone any thoughts on how to go about fixing this?
Many thanks indeed!
Peter
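Not a confirmed diagnosis, but one thing worth double-checking with this API: CQLSSTableWriter buffers rows in memory and writes the final SSTable only when the writer is closed, so the tail of the data can silently go missing if close() is never called. A sketch of the safe lifecycle, with placeholder schema and output directory:

import org.apache.cassandra.io.sstable.CQLSSTableWriter;

public class WriterLifecycle {
    public static void main(String[] args) throws Exception {
        // try-with-resources guarantees close(), which flushes the final
        // buffered SSTable segment to disk.
        try (CQLSSTableWriter writer = CQLSSTableWriter.builder()
                .inDirectory("/tmp/out/ks/tbl")
                .forTable("CREATE TABLE ks.tbl (k text PRIMARY KEY, v int)")
                .using("INSERT INTO ks.tbl (k, v) VALUES (?, ?)")
                .build()) {
            writer.addRow("a", 1);
        }
    }
}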

Data in Cassandra not consistent even with Quorum configuration

I encountered a consistency problem using Hector and Cassandra when we have QUORUM for both read and write.
I use MultigetSubSliceQuery to query rows from a super column with a limit of 100, read them, then delete them, and then start another round.
I found that rows which should have been deleted by my prior query were still returned by the next query.
Also, in a normal column family, I updated the value of one column from status='FALSE' to status='TRUE', and the next time I queried it, the status was still 'FALSE'.
More detail:
It does not happen every time (about 1 in 10,000)
The time between the two queries is around 500 ms (but we found one pair of queries with 2 seconds between them that still showed the inconsistency)
We use ntp as our cluster time synchronization solution.
We have 6 nodes, and replication factor is 3
I understand that Cassandra is supposed to be "eventually consistent", and that a read may not reflect a recent write. But for two seconds?! And if so, isn't it then meaningless to have QUORUM or other consistency-level configurations?
So first of all, is this the correct behavior for Cassandra, and if not, what data do we need to analyze for further investigation?
After checking the source code against the system log, I found the root cause of the inconsistency.
Three factors cause the problem:
Creating and updating the same record from different nodes
Local system time is not synchronized accurately enough (although we use NTP)
Consistency level is QUORUM
Here is the problem; take the following as the event sequence:
seqID  NodeA         NodeB         NodeC
1.     New(.050)     New(.050)     New(.050)
2.     Delete(.030)  Delete(.030)  -
First, the Create request comes from Node C with local timestamp 00:00:00.050; assume it is first recorded on Node A and Node B, and later synchronized with Node C.
Then the Delete request comes from Node A with local timestamp 00:00:00.030, and is recorded on Node A and Node B.
When a read request comes in, Cassandra does a version-conflict merge, but the merge depends only on the timestamps. So although the Delete happened after the Create, the final merge result is "New", which carries the later timestamp due to the local clock synchronization issue.
I also faced a similar issue. The issue occurred because the Cassandra driver uses the server timestamp by default to decide which write is latest. However, in recent versions of the Cassandra driver they have changed this, and by default they now use the client timestamp.
I have described the details of the issue here.
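With the Java driver 3.x, for example, client-side timestamps can be made explicit and monotonic within a client process; a sketch, with placeholder contact point, keyspace, and query:

import com.datastax.driver.core.AtomicMonotonicTimestampGenerator;
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;

public class ClientSideTimestamps {
    public static void main(String[] args) {
        // Timestamps are generated client-side and strictly increase within
        // this process, so a later delete cannot get an older timestamp from
        // a node whose clock is behind.
        try (Cluster cluster = Cluster.builder()
                .addContactPoint("127.0.0.1")
                .withTimestampGenerator(new AtomicMonotonicTimestampGenerator())
                .build();
             Session session = cluster.connect("my_ks")) {
            session.execute("UPDATE my_table SET status = 'TRUE' WHERE id = 1");
        }
    }
}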
The deleted rows may be showing up as "range ghosts" because of the way that distributed deletes work: see http://wiki.apache.org/cassandra/FAQ#range_ghosts
If you are reading and writing individual columns, both at CL_QUORUM, then you should always get full consistency, regardless of the time interval (provided strict ordering is still observed, i.e. you are certain the read always happens after the write): with a replication factor of 3, a quorum is 2 replicas, so any quorum read overlaps the preceding quorum write on at least one replica (2 + 2 > 3). If you are not seeing this, then something, somewhere, is wrong.
To start with, I'd suggest a) verifying that the clients are syncing to NTP properly, and/or reproducing the problem with times cross-checked between clients somehow, and b) trying to reproduce the problem with CL_ALL.
Another thought - are your clients synced with NTP, or just the Cassandra server nodes? Remember that Cassandra uses the client timestamps to determine which value is the most recent.
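The question uses Hector, but for anyone reading along with the modern Java driver, pinning both sides to QUORUM looks roughly like this (a sketch with placeholder schema and contact point):

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ConsistencyLevel;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.SimpleStatement;
import com.datastax.driver.core.Statement;

public class QuorumReadAfterWrite {
    public static void main(String[] args) {
        try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
             Session session = cluster.connect("my_ks")) {
            Statement write = new SimpleStatement(
                    "UPDATE my_table SET status = 'TRUE' WHERE id = 1")
                    .setConsistencyLevel(ConsistencyLevel.QUORUM);
            session.execute(write);

            Statement read = new SimpleStatement(
                    "SELECT status FROM my_table WHERE id = 1")
                    .setConsistencyLevel(ConsistencyLevel.QUORUM);
            Row row = session.execute(read).one();
            System.out.println(row.getString("status"));  // expected: TRUE
        }
    }
}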
I'm running into this problem with one of my clients/nodes. The other 2 clients I'm testing with (and 2 other nodes) run smoothly. I have a test that uses QUORUM for all reads and all writes, and it fails very quickly. Actually, some processes do not see anything from the others, and others may still see data even after I remove it at QUORUM.
In my case I turned on the logs and watched them with the tail -F command:
tail -F /var/lib/cassandra/log/system.log
to see whether I was getting some errors as presented here. To my surprise the tail process itself returned an error:
tail: inotify cannot be used, reverting to polling: Too many open files
and according to another thread this means that some processes will fail to open files. In other words, the Cassandra node is likely not responding as expected because it cannot properly access data on disk.
I'm not too sure whether this is related to the problem of the user who posted the question, but tail -F is certainly a good way to determine whether the limit of open files was reached.
(FYI, I have 5 relatively heavy servers running on the same machine, so I'm not too surprised about that. I'll have to look into increasing the ulimit. I'll report here again if I get it fixed this way.)
More info about the file limit and the ulimit command line option: https://askubuntu.com/questions/181215/too-many-open-files-how-to-find-the-culprit
--------- Update 1
Just in case: I first tested using Java 1.7.0-11 from Oracle (and, as mentioned below, I first used a limit of 3,000 without success!). The same error would pop up at about the same point when running my Cassandra test (and even with the ulimit of 3,000 the tail -F error would still appear...)
--------- Update 2
Okay! That worked. I changed the ulimit to 32,768 and the problems are gone. Note that I had to enlarge the per user limit in /etc/security/limits.conf and run sudo sysctl -p before I could bump the maximum to such a high number. Somehow the default upper limit of 3000 was not enough even though the old limit was only 1024.
