We have been using Cassandra in our current live project for almost a year. We are on Cassandra 2.1.14, and we sometimes see a synchronization problem between Cassandra and Presto: when a row has just been updated through Cassandra and I then run a query from Presto, it returns no data even though the data exists in the database.
The second issue is that delete and update statements sometimes don't take effect. No error is shown, but the change is never applied.
The Presto Cassandra connector caches Cassandra metadata and does not refresh it immediately, so you may not see recent changes. I suggest setting cassandra.schema-cache-ttl to 0s; the plan is to remove caching from the Cassandra connector altogether soon.
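For context, that property lives in the connector's catalog file, typically etc/catalog/cassandra.properties in a Presto installation. A minimal sketch, with placeholder contact points (adjust for your environment):

```properties
# etc/catalog/cassandra.properties -- Presto Cassandra connector
connector.name=cassandra
# Cassandra nodes to connect to (placeholder hostnames)
cassandra.contact-points=host1,host2
# Disable schema metadata caching so recent changes are visible immediately
cassandra.schema-cache-ttl=0s
```

A restart of the Presto coordinator and workers is generally needed for catalog property changes to take effect.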
We have set up an Apache Ignite 2.9.0 cluster with native persistence, running on Kubernetes in Azure with 4 nodes. To update some cache configuration, we restarted all the Ignite nodes. After the restart, running any SQL query on one particular table causes 2 Ignite nodes to restart, after which we see a lost-partitions exception.
If we restart all nodes to recover from the lost partitions, everything is fine until we run a SQL query on that table again, at which point 2 nodes restart and we get the lost-partitions exception once more.
Is there any way to recover from the lost partitions and overcome this problem? We would also like to understand why it is occurring; we could not find any logs related to it.
When all owners of a partition have left the grid, the partition is considered lost; you can think of this as a special internal marker. Depending on the PartitionLossPolicy, Ignite may either ignore this fact and allow cache operations, or disallow them to protect data consistency.
If you use native persistence, then most likely there was no physical data loss, and all you need to do is tell Ignite that you are aware of the situation, that all the data is in place, and that it is safe to remove the "lost" mark from the partitions.
I think the simplest way to handle this is to run the control script from within a pod:
control.sh --cache reset_lost_partitions cacheName1,cacheName2,...
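On Kubernetes that would look roughly like the following. The namespace, pod name, cache names, and control.sh path are all placeholders here; the path shown is typical for the official Ignite Docker image, but yours may differ:

```
# Exec into one of the Ignite pods (names are examples, adjust to your cluster)
kubectl -n ignite exec -it ignite-0 -- \
  /opt/ignite/apache-ignite/bin/control.sh \
  --cache reset_lost_partitions cacheName1,cacheName2
```

Running it against any one node is enough; the reset applies cluster-wide.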
More details:
https://ignite.apache.org/docs/latest/configuring-caches/partition-loss-policy#handling-partition-loss
While setting up Cassandra on 15 nodes, we found that one of the nodes went into the draining state after we created keyspaces.
We don't know why the node went into the draining state. We query with consistency level ALL, so in this situation we are unable to query Cassandra at all.
Later we tried to upgrade DC/OS to the latest version along with the latest Cassandra version, but this upgrade corrupted the data on all the nodes. From that point on we were unable to load the existing data back, as it kept getting corrupted.
I understand the concept of SSTables in Cassandra. I have also examined the different versions of the files created by inserts after running nodetool flush.
I have also set up snapshot backups and incremental backups and verified that they work.
For testing purposes I deleted all the SSTable files from all the nodes. Strangely, I am still able to select the data.
Can someone please explain where Cassandra is fetching the data from?
Regards
Sid
The records you queried were served from memory: they were still in the memtable (and possibly the row cache), not only in the SSTable files you deleted.
That is also why you will, perhaps surprisingly, get the results back after restarting the node: the commit log is replayed on startup, eventually rebuilding those SSTables for you.
Clear all the SSTables and the commit logs, then restart the node. After that you will observe that your query returns no records.
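The behaviour described above can be sketched with a toy model. This is only an illustration of the write path (commit log, memtable, SSTables), not Cassandra's actual code, and all names are invented:

```python
class ToyNode:
    """Toy model of Cassandra's write path: commit log + memtable + SSTables."""

    def __init__(self):
        self.commit_log = []   # durable append-only log on disk
        self.memtable = {}     # recent writes held in memory
        self.sstables = []     # immutable on-disk tables (modelled as dicts)

    def write(self, key, value):
        self.commit_log.append((key, value))  # logged first, for durability
        self.memtable[key] = value

    def flush(self):
        # nodetool flush: the memtable becomes a new SSTable and is cleared
        if self.memtable:
            self.sstables.append(dict(self.memtable))
            self.memtable = {}

    def read(self, key):
        # the memtable is checked first, then SSTables from newest to oldest
        if key in self.memtable:
            return self.memtable[key]
        for table in reversed(self.sstables):
            if key in table:
                return table[key]
        return None

    def delete_sstable_files(self):
        self.sstables = []     # what the question did: rm the SSTable files

    def restart(self):
        # on startup the commit log is replayed, rebuilding in-memory state
        self.memtable = {}
        for key, value in self.commit_log:
            self.memtable[key] = value


node = ToyNode()
node.write("k", "v")             # sits in the memtable and the commit log
node.delete_sstable_files()      # deleting SSTable files changes nothing...
assert node.read("k") == "v"     # ...the read is served from the memtable
node.restart()
assert node.read("k") == "v"     # commit log replay restores the data
node.commit_log = []             # clear the commit log as well...
node.restart()
assert node.read("k") is None    # ...and only now is the record gone
```

This matches the advice above: the record only disappears once both the SSTables and the commit log are cleared and the node is restarted.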
Offsite backups for Cassandra seem like a challenging thing. You basically have to make yet another copy of ALL your data, on top of the copies that already exist due to the replication factor. Snapshots make backups easy when you don't mind storing them on the same disk the node already uses. I'm curious: in the event of a catastrophic failure of that disk, is it possible to recover the node using the nodes the data was replicated to?
Yes, you can restore data on a crashed node using the procedure in the documentation: "Replacing a dead node or dead seed node". It is written for Cassandra 3.x; pick your Cassandra version from the drop-down menu at the top of the page.
But please note that you still need to take backups if your data is valuable. If you are using AWS, you can use this project to back up Cassandra to S3 storage.
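The core of that procedure is starting the replacement node with the replace_address flag so it streams the dead node's data from the surviving replicas. A rough sketch for a package install; the IP address is a placeholder and the exact file and option vary by version and install method:

```
# On the replacement node, before its first start.
# Add the dead node's address to the JVM options:
echo 'JVM_OPTS="$JVM_OPTS -Dcassandra.replace_address=10.0.0.5"' \
  >> /etc/cassandra/cassandra-env.sh
# Then start Cassandra; it will bootstrap and stream the dead node's
# data from the other replicas before serving requests.
sudo service cassandra start
```

The flag should be removed again after the replacement completes, so it is not applied on subsequent restarts.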
If you are looking for offsite or off-host backups, you can also look at OpsCenter from DataStax or Talena software (my company). Both give you the ability to back up your database locally or to S3. As you might expect, you can also restore data after hardware failures, user errors, or logical corruption, which replicas will not protect you against.
Yes, it is possible. Just run "nodetool repair" in a terminal on the node with the missing data. It can take a long time. I would also recommend running a repair on each node every month to keep your data fully replicated, because Cassandra does not repair data automatically (for example, after a node goes down).
Our QA team has requested that we completely clear all data within the app's keyspace on our Cassandra 2.1.3 server prior to testing. (Cassandra 2.1.3 is running on an Ubuntu 14.04 LTS Azure D12 instance with 4 cores and 28 GB of memory.)
We attempted to TRUNCATE the column families, and afterwards had corruption problems with both the Cassandra and Stargate indexes (queries returned incorrect data or no data at all).
We attempted to DELETE the data from the column families and had the same index problems, plus tombstoning.
We were told to use DROP KEYSPACE with snapshots turned off. This resulted in Cassandra shutting down with all remote connections forcibly closed, and on several occasions a partially deleted state in which we could still access the keyspace via DevCenter even though it no longer appeared in the schema_keyspaces table, and/or corrupted indexes.
There are fewer than 100,000 records across 30 column families, so not a whole lot of data.
We cannot upgrade Cassandra to the latest version because Stargate only supports C* 2.1.3.
Any other recommendations of how we can resolve this problem?
We answered the question internally.
Remove Stargate. Once we removed Stargate, TRUNCATE and DROP KEYSPACE began to work correctly again.
We notified Stargate support.