Does Cassandra support creating a materialized view with mutable keys?
Does Cassandra support the use case of updating a partition key while rows are being fetched from a materialized view via pagination? For the fetched rows, the partition key will be updated to a different value.
Does CouchDB have a partition key and a sort key? If so, what is the sorting logic and theory behind it?
Recent versions of CouchDB support partitioned databases, letting you specify an explicit partition as a document id of the form mypartition:documentid. For certain use cases, this can be a very efficient way of organising your data, with partitioned view queries not having to visit every shard. Note that there are some quirks to be aware of in terms of partition sizes etc. A database can be created as partitioned, but an existing (non-partitioned) database can't be made partitioned at a later stage. Cloudant wrote a handy blog post introducing partitioned databases.
There is no separate sort key; ordering is determined by the document id (or for views, the view key). The ordering is lexicographical on the key.
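To see what lexicographic ordering of document ids means in practice, here is a small Python sketch. Plain string sorting is an approximation of CouchDB's actual view collation (which follows ICU rules), so treat this as illustrative only; the ids are made up.

```python
# Sketch: CouchDB orders documents by _id (and views by key), compared
# character by character, not numerically.
doc_ids = ["user:42", "user:7", "order:2021-01-05", "order:2021-01-04"]

print(sorted(doc_ids))
# Note that "user:7" sorts AFTER "user:42", because '7' > '4' in the first
# differing character -- zero-pad numeric parts if you need numeric order.
```

This is also why id schemes like `mypartition:documentid` cluster nicely: all ids sharing a prefix are adjacent in the ordering.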
If I store documents without providing a partition key, will documentId be treated as the partition key of the logical partition?
If yes: what about a billion logical partitions in that collection? My queries only look up documents by documentId.
Now, inside the document JSON:
I have multiple fields, and I have provided /asset as the partitionKey. Is this now a composite partition key, /asset/documentId?
Or does /asset tell Cosmos DB which partition to search in for the documentId?
If I store documents without providing a partition key, will documentId be treated as the partition key of the logical partition?
No. If you create a document without a partition key, the document id will not be treated as the partition key. The Cosmos DB engine puts all documents without a partition key value into a hidden logical partition. This particular partition can be accessed by specifying the partition key as {}.
You define the partition key when you create the collection (according to the screenshot, asset is the partition key in your case). If you don't provide a partition key when you create a collection, it will be limited to 10 GB of data (because Cosmos DB wouldn't be able to shard it without a partition key).
Only the partition key is used to determine the partition of a document. Other fields are irrelevant when deciding which partition the document belongs to.
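The placement rule can be sketched in a few lines of Python. This is a conceptual illustration, not the actual Cosmos DB engine; the field names and the "{}" marker for the hidden partition are stand-ins.

```python
# Illustrative sketch: documents are grouped into logical partitions purely
# by their partition key value (/asset here); all other fields are ignored
# for placement. Documents with no partition key value fall into one hidden
# partition, shown as "{}".
from collections import defaultdict

docs = [
    {"id": "1", "asset": "pump", "kind": "sensor"},
    {"id": "2", "asset": "pump", "kind": "meter"},
    {"id": "3", "asset": "valve", "kind": "sensor"},
    {"id": "4"},  # no partition key value -> hidden partition
]

partitions = defaultdict(list)
for d in docs:
    partitions[d.get("asset", "{}")].append(d["id"])

print(dict(partitions))
# {'pump': ['1', '2'], 'valve': ['3'], '{}': ['4']}
```

Note that `kind` plays no role in placement, and documentId is not combined with /asset into a composite key: within the "pump" partition, documents are simply distinguished by their id.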
I am running into a strange problem where one out of four materialized views on a base table is out of sync on all nodes.
I have tried the options below but still haven't found a solution:
nodetool refresh on all nodes
nodetool repair on keyspace
Also, I ran nodetool compaction to clear tombstones.
Finally, I dropped and recreated the materialized view, and as the data is huge the view is stuck in the build process. I can see the build progress in OpsCenter and in the system.views_builds_in_progress table.
Then I manually stopped the build process with nodetool stop VIEW_BUILD and ran the compaction again. Still the issue persists.
Is it because one of the primary key columns of my materialized view is NULL for a large share of the base table?
i.e. the materialized view's primary key is (key1, key2, key3), and key1 is null in about 60% of the base table's rows.
I am trying to update a column in base table which is a partition key in the materialized view and trying to understand its performance implications in a production environment.
Base Table:
CREATE TABLE IF NOT EXISTS data.test
(
    foreignid uuid,
    id uuid,
    kind text,
    version text,
    createdon timestamp,
    certid text,
    PRIMARY KEY (foreignid, createdon, id)
);
Materialized view:
CREATE MATERIALIZED VIEW IF NOT EXISTS data.test_by_certid
AS
    SELECT *
    FROM data.test
    WHERE certid IS NOT NULL AND foreignid IS NOT NULL
      AND createdon IS NOT NULL AND id IS NOT NULL
    PRIMARY KEY (certid, foreignid, createdon, id);
So, certid is the new partition key in our materialized view.
What takes place:
1. When we first insert into the test table, the certid is usually empty, so it is replaced with the string "none" before being inserted into the test base table.
2. The row gets inserted into the materialized view as well.
3. When the user provides us with a certid, the row gets updated in the test base table with the new certid.
4. The action gets mirrored, and the row is updated in the materialized view, where the partition key certid changes from "none" to the new value.
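The steps above can be sketched as a toy simulation. This is a simplified mental model, not Cassandra's actual implementation (which also locks the base partition and writes view tombstones); the point is that changing a view partition key turns one base-table update into a read-before-write plus a delete-and-reinsert in the view.

```python
# Conceptual sketch of maintaining a materialized view whose partition key
# (certid) can change. Dicts stand in for the base table and the view.
base = {}   # (foreignid, createdon, id) -> row
view = {}   # (certid, foreignid, createdon, id) -> row

def upsert(foreignid, createdon, id_, certid):
    key = (foreignid, createdon, id_)
    old = base.get(key)                 # read-before-write on the base row
    if old is not None:
        # the old view row must be removed (a tombstone in the real system)
        del view[(old["certid"], foreignid, createdon, id_)]
    row = {"certid": certid}
    base[key] = row
    view[(certid, foreignid, createdon, id_)] = row

upsert("f1", "t1", "d1", "none")      # steps 1-2: initial insert
upsert("f1", "t1", "d1", "cert-42")   # steps 3-4: certid update
print(list(view))                     # only the cert-42 view row remains
```

The extra read and the delete/insert pair are exactly the overhead the answer below refers to.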
Questions:
1. What is the performance implication of updating the partition key certid in the materialized view?
2. For my use case, is it better to create a new table with certid as the partition key (inserting only when certid is non-empty) and manually maintain all CRUD operations on that table, or should I use the MV and let Cassandra do the bookkeeping?
It is to be noted that performance is an important criteria since it will be used in a production environment.
Thanks
Updating a table for which one or more views exist is always more expensive than updating a table with no views, due to the overhead of performing a read-before-write and locking the partition to ensure concurrent updates play well with the read-before-write. You can read more about the internals of materialized views in Cassandra in ScyllaDB's wiki.
If changing the certid is a one-time operation, then the performance impact shouldn't be too much of a worry. Regardless, it is always a better idea to let Cassandra deal with updating the MV because it will take care of anomalies (such as what happens when the node storing the view is partitioned away and the update is unable to propagate), and eventually ensure consistency.
If you are worried about performance, consider replacing Cassandra with Scylla.
What is the relationship between a node and a partition key in Cassandra? Data is stored on a node according to the partition key's hash value. Does that mean there is a one-to-one relationship between a node and a partition key, i.e. one node contains only one hashed partition key value, or can a node contain multiple hashed partition key values?
As I'm new to Cassandra, I got confused on this basic point.
Partition keys determine the locality of the data. In a Cassandra cluster with RF=1 there will be only a single copy of every item, and all the items with the same partition key will be stored on the same node. Depending on your use case, this can be good or bad.
Back to your question: it is NOT true that "one node contains only one value of hashed value of partition key"; rather, it's the other way around: all the items with the same partition key are stored on one node (along with other partition keys, potentially).
Each node in Cassandra is responsible for a range of the hash values of partition keys (consistent hashing).
By default, Cassandra uses the Murmur3 partitioner.
So on each node in Cassandra there will be multiple partition keys available. For the same partition key there will be only one record on one node; other copies will be available on other nodes based on the replication factor. See: Consistent Hashing in Cassandra.
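A toy consistent-hashing sketch makes the one-node-to-many-keys relationship concrete. This is an illustration only: Cassandra uses Murmur3 over a 64-bit token range with vnodes, while here md5 and a simple modulo stand in for both.

```python
# Toy sketch: many partition keys hash to the same node; a node never holds
# just one partition key's hash value.
import hashlib

NODES = ["node-a", "node-b", "node-c"]

def token(partition_key: str) -> int:
    # Cassandra uses the Murmur3 partitioner; md5 stands in for illustration.
    return int(hashlib.md5(partition_key.encode()).hexdigest(), 16)

def owner(partition_key: str) -> str:
    # Simplified placement: modulo instead of real token ranges / vnodes.
    return NODES[token(partition_key) % len(NODES)]

for pk in ["alice", "bob", "carol", "dave"]:
    print(pk, "->", owner(pk))
```

With four keys and three nodes, at least one node necessarily owns two or more partition keys, which is the normal state of a real cluster too.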