I am running into a strange problem where 1 out of 4 materialized views on a base table is out of sync on all nodes.
I have tried the options below but still have not been able to find a solution.
nodetool refresh on all nodes
nodetool repair on keyspace
Also, I ran nodetool compact to clear tombstones.
Finally, I dropped and recreated the materialized view, and since the data is huge the view is stuck in the build process. I can see the build progress in OpsCenter and in the system.views_builds_in_progress table.
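That table can be queried directly from cqlsh, for example (the exact columns vary by Cassandra version):
SELECT * FROM system.views_builds_in_progress;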
Then I manually stopped the build process with nodetool stop VIEW_BUILD and ran the compaction again. The issue still persists.
Is it because one of the primary key columns of my materialized view is a column for which about 60% of the data in the base table is NULL?
i.e. the primary key of the materialized view is (key1, key2, key3), and for key1 about 60% of the rows in the base table are null.
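For context, every primary key column of a materialized view has to be declared IS NOT NULL, and base rows where such a column is null are never written to the view at all, so those ~60% of rows will simply not appear in it. A minimal sketch with hypothetical names (base_table, key1, key2, key3):
CREATE MATERIALIZED VIEW base_table_by_key1 AS
SELECT * FROM base_table
WHERE key1 IS NOT NULL AND key2 IS NOT NULL AND key3 IS NOT NULL
PRIMARY KEY (key1, key2, key3);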
My cluster consists of 20 nodes, all in the same DC and rack.
Keyspace DDL is:
create keyspace hello with replication = {'class': 'org.apache.cassandra.locator.SimpleStrategy', 'replication_factor': '1'};
Table DDL is:
create table world
(
foo text,
bar text,
baz text,
primary key (foo, bar)
)
with compression = {'chunk_length_in_kb': '16', 'class':
'org.apache.cassandra.io.compress.LZ4Compressor'}
and gc_grace_seconds = 86400;
create index idx_world_bar
on world (bar);
The situation is that one of the nodes has a disk failure, so the status of that node changes to DN (as reported by nodetool status).
In this situation, when I use a query like:
select * from hello.world where bar = '..';
every query results in NoHostAvailable.
(I know this query is a bad pattern.)
I think the reason is:
the secondary index is a local index;
the RF of the hello keyspace is 1;
so the coordinator has to fan the query out to every node;
the status of one node is Down, so NoHostAvailable is raised.
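To illustrate the difference (the values are placeholders): a query that also restricts the partition key only needs the single replica that owns that partition, and fails only if that particular node happens to be down, whereas the index-only query above has to contact every node, so any down node breaks it:
select * from hello.world where foo = '..' and bar = '..';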
These are the solutions I can think of, but neither seems like the best solution.
1st solution
Stop Cassandra on the down node, replace the faulty disk, and restart Cassandra on that node.
But the replace + restart takes a long time, and the node stays down during that whole time.
2nd solution
Remove the down node with nodetool removenode.
Replace the faulty disk, clear the data, and bootstrap Cassandra again (roughly the commands sketched below).
This solution causes data loss because the data is cleared, and nodetool repair cannot recover it because RF=1.
This solution may also cause token redistribution.
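For concreteness, the second solution corresponds roughly to these commands (the host ID is a placeholder and the data paths assume the default directory layout):
nodetool status   # note the Host ID of the DN node
nodetool removenode <host-id-of-dead-node>
# on the repaired node, after replacing the disk:
rm -rf /var/lib/cassandra/data/* /var/lib/cassandra/commitlog/* /var/lib/cassandra/saved_caches/*
systemctl start cassandra   # the node bootstraps with fresh data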
Is there any other way
to avoid NoHostAvailable while keeping the node down?
Or
to restore the data and token ranges after removing the node?
Or could you suggest the best approach for this situation?
I am trying to extract data from a table as part of a migration job.
The schema is as follows:
CREATE TABLE IF NOT EXISTS ${keyspace}.entries (
username text,
entry_type int,
entry_id text,
PRIMARY KEY ((username, entry_type), entry_id)
);
In order to query the table we need the partition keys, the first part of the primary key.
Hence, if we know the username and the entry_type, we can query the table.
In this case the username can be whatever, but the entry_type is an integer in the range 0-9.
When doing the extraction we query the table 10 times for every username to make sure we try all values of entry_type.
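In other words, for each known username the extraction runs a query like the following once per entry_type value (the username is just an example):
SELECT * FROM ${keyspace}.entries WHERE username = 'alice' AND entry_type = 0;
SELECT * FROM ${keyspace}.entries WHERE username = 'alice' AND entry_type = 1;
-- ... and so on up to entry_type = 9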
We can no longer find any entries as we have depleted our list of usernames. But nodetool tablestats reports that there is still data left in the table, gigabytes even. Hence we assume the table is not empty.
But I cannot find a way to inspect the table to figure out what usernames remain in it. If I could inspect it, I could add the usernames left in the table to our extraction job and eventually we could deplete the table. But I cannot simply query the table like this:
SELECT * FROM ${keyspace}.entries LIMIT 1
as Cassandra requires the partition key to make meaningful queries.
What can I do to figure out what is left in our table?
As per the comment, the migration process includes a DELETE operation on the Cassandra table, but the engine will delay actually removing the affected records from disk; this process is controlled internally with tombstones and the gc_grace_seconds attribute of the table. The reason for this delay is fully explained in this blog entry; tl;dr: if the default value is still in place, Cassandra will wait at least 10 days (864,000 seconds) after the execution of the delete before the actual removal of the data.
For your case, one way to proceed is:
Ensure that all your nodes are "Up" and "Healthy" (UN)
Decrease the gc_grace_seconds attribute of your table; the example below sets it to 1 minute, while the default is 864,000 seconds (10 days):
ALTER TABLE ${keyspace}.entries WITH GC_GRACE_SECONDS = 60;
Manually compact the table:
nodetool compact ${keyspace} entries
Once the process has completed, nodetool tablestats should be up to date.
To answer your first question, I would like to shed some more light on the gc_grace_seconds property.
In Cassandra, data isn't deleted in the same way it is in RDBMSs. Cassandra is designed for high write throughput and avoids reads-before-writes. So in Cassandra, a delete is actually an update, and updates are actually inserts. A "tombstone" marker is written to indicate that the data is now (logically) deleted (also known as a soft delete). Records marked with a tombstone must eventually be removed to reclaim the storage space, which is done by a process called compaction. But remember that tombstones are eligible for physical deletion / garbage collection only after a specific number of seconds known as gc_grace_seconds. This is a very good blog to read about it in more detail: https://thelastpickle.com/blog/2016/07/27/about-deletes-and-tombstones.html
So possibly you are looking at the table size before gc_grace_seconds has elapsed, and the data is still physically there.
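If you want to confirm what gc_grace_seconds is currently set to, it is visible in the schema tables (Cassandra 3.x and later); the keyspace name below is a placeholder:
SELECT gc_grace_seconds FROM system_schema.tables
WHERE keyspace_name = '<your_keyspace>' AND table_name = 'entries';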
Coming to your second issue, where you want to fetch some samples from the table without providing partition keys: you can analyze your table content using Spark. The Spark Cassandra Connector allows you to create Java applications that use Spark to analyze database data. You can follow these articles / documentation to write a quick, handy Spark application to analyze the Cassandra data:
https://www.instaclustr.com/support/documentation/cassandra-add-ons/apache-spark/using-spark-to-sample-data-from-one-cassandra-cluster-and-write-to-another/
https://docs.datastax.com/en/dse/6.0/dse-dev/datastax_enterprise/spark/sparkJavaApi.html
I would recommend not deleting records while you do the migration. Rather, first complete the migration and then do a quick validation / verification to ensure all records were migrated successfully (this can easily be done with Spark by comparing dataframes from the old and new tables). After successful verification, truncate the old table, as TRUNCATE does not create tombstones and is hence more efficient. Note that a huge number of tombstones is not good for cluster health.
I have a table which is relatively big. I want to create a materialized view in Cassandra. While the view is being populated, if the base table gets updated, will the view also get updated with those changes? How does it work? Because, in order to execute a batchlog on the view, the partition in the base table will be locked, so it cannot wait until the population has finished.
In my case, I will perform only inserts or deletes on the base table, which simplifies things, I guess. But what if I also performed updates? Would Cassandra check the timestamps to somehow detect which value is most recent?
I am trying to update a column in base table which is a partition key in the materialized view and trying to understand its performance implications in a production environment.
Base Table:
CREATE TABLE if not exists data.test
(
foreignid uuid,
id uuid,
kind text,
version text,
createdon timestamp,
certid text,
PRIMARY KEY(foreignid,createdon,id)
);
Materialized view:
CREATE MATERIALIZED VIEW if not exists data.test_by_certid
AS
SELECT *
FROM data.test
WHERE id IS NOT NULL AND foreignid IS NOT NULL AND createdon IS NOT NULL AND certid IS NOT NULL
PRIMARY KEY (certid, foreignid, createdon, id);
So, certid is the new partition key in our materialized view
What takes place:
1. When we first insert into the test table, the certid is usually empty, so we replace it with the string "none" and insert that into the test base table.
2. The row gets inserted into the materialized view as well.
3. When the user provides us with a certid, the row gets updated in the test base table with the new certid.
4. The action gets mirrored and the row is updated in the materialized view, where the partition key certid changes from "none" to the new value (see the sketch below).
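As an illustration of steps 3 and 4 (all values below are made up), the base-table update looks something like this; as I understand it, Cassandra turns it into a delete of the old view row under the 'none' partition plus an insert of a new view row under the new certid:
UPDATE data.test SET certid = 'CERT-123'
WHERE foreignid = 123e4567-e89b-12d3-a456-426614174000
AND createdon = '2020-01-01 00:00:00+0000'
AND id = 00112233-4455-6677-8899-aabbccddeeff;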
Questions:
1. What is the performance implication of updating the partition key certid in the materialized view?
2. For my use case, is it better to create a new table with certid as the partition key (inserting only when certid is non-empty) and manually maintain all CRUD operations on the new table, or should I use the MV and let Cassandra do the bookkeeping?
Note that performance is an important criterion, since this will be used in a production environment.
Thanks
Updating a table for which one or more views exist is always more expensive than updating a table with no views, due to the overhead of performing a read-before-write and locking the partition to ensure concurrent updates play well with the read-before-write. You can read more about the internals of materialized views in Cassandra in ScyllaDB's wiki.
If changing the certid is a one-time operation, then the performance impact shouldn't be too much of a worry. Regardless, it is always a better idea to let Cassandra deal with updating the MV because it will take care of anomalies (such as what happens when the node storing the view is partitioned away and the update is unable to propagate), and eventually ensure consistency.
If you are worried about performance, consider replacing Cassandra with Scylla.
I'm trying to create a materialized view on an existing Cassandra table with 1,200K+ records.
It's been a couple of hours since I executed the query for the view creation, but the row count in the view is still 0.
I have already tried nodetool flush keyspace-name.
Is there a way to find out where it is stuck, or what the progress is?
Is there any way to speed this up?
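Depending on your Cassandra version, nodetool viewbuildstatus <keyspace> <view_name> may report per-node build progress, and similar information is recorded in a system table you can query from cqlsh:
SELECT * FROM system_distributed.view_build_status;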