Cassandra partition key growth limit?

What does it mean that partitions may grow large? I thought Cassandra could handle very large partitions. Why do they use two partition key columns in this example?
And what do I do if, even with both partition key columns, the partitions are too large?

The example you gave is one of the ways of preventing partitions from becoming too large. In Cassandra the partition key (part of the primary key) is used to group a similar set of rows.
Here, in the left-side data model, user_id is the partition key, which means every video interaction by that user will be placed in the same partition. As mentioned in the example's comment, if a user is active and has 1,000 interactions daily, then in 60 days (2 months) you will have 60,000 rows for that user. This may breach Cassandra's permissible partition size (in terms of data stored in a single partition).
There are several ways to avoid letting a partition grow too big. For example, you can:
Make another column from the table a part of the partition key. This is what is done in the example above: video_id is made part of the partition key along with user_id.
Bucketing - this strategy is generally used with time-series data, where you split one partition key into multiple buckets. For example, if date is your partition key, you can create 24 buckets: date_1, date_2, ..., date_24. You have now divided one big partition into 24 smaller ones. (Both techniques are sketched in CQL below.)
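As a rough sketch, here is what both techniques might look like in CQL; the table and column names are hypothetical, not taken from the original example:

-- Technique 1 (hypothetical names): video_id moved into the partition key,
-- so one user's interactions are spread across many partitions.
CREATE TABLE interactions_by_user_video (
    user_id uuid,
    video_id uuid,
    interaction_time timestamp,
    interaction_type text,
    PRIMARY KEY ((user_id, video_id), interaction_time)
) WITH CLUSTERING ORDER BY (interaction_time DESC);

-- Technique 2 (hypothetical names): time bucketing, one partition per user per day.
CREATE TABLE interactions_by_user_day (
    user_id uuid,
    day date,                      -- the bucket, derived from the interaction timestamp
    interaction_time timestamp,
    video_id uuid,
    PRIMARY KEY ((user_id, day), interaction_time)
);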
The main idea is to keep your partitions from growing too big in size. This is a data modeling technique that you should be aware of while creating a data model for Cassandra.
If you still end up with large partitions, you need to remodel your data using the various data modeling techniques available. For that I would recommend: understand your data, estimate its rate of growth, calculate the estimated partition size, and if your data model does not meet the partition size requirement, refine it.
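For a hypothetical illustration of such an estimate (the row size is an assumption): if an active user generates 1,000 interactions per day and each row takes roughly 200 bytes, that user's partition grows by about 200 KB per day - around 12 MB after 60 days and over 70 MB after a year, which already approaches the 100 MB rule of thumb discussed below.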

Related

Wide partition pattern in Cassandra

What is the 'wide partition pattern' in Cassandra? In the book 'Cassandra: The Definitive Guide' it seems to be a recommended thing, but in some online articles I see it is something to be avoided.
So what actually is it, and is it preferable or not?
A partition in Cassandra represents a grouping of similar kinds of rows. In Cassandra it is recommended to model your data so that similar kinds of rows fall into the same partition. This is called the wide partition pattern.
Searching in Cassandra is super fast when using the partition key, so the wide partition pattern is recommended. But with this recommendation comes a warning: your wide partitions should not become too large.
The reason for the warning (avoiding large partitions) is that searching within a partition becomes slow as the partition grows, and large partitions put a lot of pressure on the heap.
For a better understanding, I would recommend reading this blog post: https://thelastpickle.com/blog/2019/01/11/wide-partitions-cassandra-3-11.html
A wide partition is a partition which contains many cells/values (number_of_cells = columns × rows).
The commonly used time-series design pattern is a great example of wide-partition table design: you use a date bucket as the partition key, and each bucket contains all the timestamps for the date range defined by that bucket.
This model illustrates why it's called a "wide" partition.
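A minimal CQL sketch of such a time-series table (all names are illustrative):

CREATE TABLE readings_by_day (
    date_bucket date,              -- partition key: one wide partition per day
    reading_time timestamp,        -- clustering key: orders rows within the partition
    sensor_id uuid,
    reading_value double,
    PRIMARY KEY (date_bucket, reading_time, sensor_id)
);
-- Every reading for a given day lands in the same partition, which accumulates
-- many cells (columns x rows) - hence "wide".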
Another DynamoDB example follows the same idea (a sort key in DynamoDB is the same as a clustering key in Cassandra).

Does having a lot of small, unique partitions in a table affect performance or create extra load in Cassandra?

I have a table with 4 million unique partition keys:
select count(*) from "KS".table;
count
4355748
(1 rows)
I have read that the cardinality of the partition key should be neither too high nor too low, which means: don't make the partition key too unique. Is that correct?
The table does not have any clustering key. Will changing the data partitioning help with the load?
It really depends on the use case... If you don't have natural clustering by partition, then there may be little sense in introducing it. Also, what are the read patterns? Do you need to read multiple rows in one go, or not?
The number of partitions has an effect on the size of the bloom filter, key cache, etc., so as you increase the number of partitions, the bloom filter grows and the key cache gets fewer hits (until you increase its size).
As far as I know, Cassandra uses consistent hashing to map a partition key to a physical partition, so cardinality should not matter.

Cassandra partition keys organisation

I am trying to store the following structure in Cassandra:
ShopID, UserID, FirstName, LastName, etc.
Most of the queries on it are:
select * from table where ShopID = ? and UserID = ?
That's why it is useful to set (ShopID, UserID) as the primary key.
According to the docs, the default partition key in Cassandra is the first column of the primary key - in my case that's ShopID. But I want to distribute the data uniformly across the Cassandra cluster, and I cannot allow all the data for one ShopID to be stored in a single partition, because some shops have 10M records and some only 1k.
I can set up (ShopID, UserID) as the partition key, and then I can achieve a uniform distribution of records across the Cassandra cluster. But after that I cannot retrieve all the users that belong to some ShopID:
select *
from table
where ShopID = ?
Obviously this query demands a full scan of the whole cluster, but I have no way to do that. And it looks like a very hard constraint.
My question is: how do I reorganize the data to solve both problems (uniform data partitioning and the possibility of full-shop queries) at the same time?
In general you need to make the user id a clustering column and add some artificial information to your table and partition key when saving. This lets you break a large natural partition into multiple synthetic ones. But then you need to query all the synthetic partitions during reads to combine them back into the natural partition. So the goal is to find a reasonable trade-off between the number (and size) of synthetic partitions and the read queries needed to combine all of them. (A CQL sketch follows after the links below.)
A comprehensive description of possible implementations can be found here and here
(Example 2: User Groups).
Also take a look at the solution (Example 3: User Groups by Join Date), where querying/ordering/grouping is performed on a clustering column of date type. It can be useful if you have similar queries.
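Here is a minimal CQL sketch of the synthetic-partition idea, assuming a fixed number of buckets; all names and the bucket count of 4 are illustrative:

-- Write path: the application computes bucket = hash(user_id) % 4
-- (or any stable function) before inserting.
CREATE TABLE users_by_shop (
    shop_id int,
    bucket int,
    user_id uuid,
    first_name text,
    last_name text,
    PRIMARY KEY ((shop_id, bucket), user_id)
);

-- A point lookup still works (the application recomputes the bucket):
SELECT * FROM users_by_shop WHERE shop_id = ? AND bucket = ? AND user_id = ?;

-- Reading a whole shop means querying every synthetic partition:
SELECT * FROM users_by_shop WHERE shop_id = ? AND bucket IN (0, 1, 2, 3);

A larger bucket count spreads a 10M-record shop more evenly, at the cost of more partitions to read back.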
Each node in Cassandra is responsible for some token ranges. Cassandra derives a token from a row's partition key using hashing and sends the record to the node whose token range includes that token. Different records can have the same token, and they are grouped into partitions. For simplicity we can assume that each Cassandra node stores the same number of partitions. We also want partitions to be roughly equal in size for uniform distribution between nodes. If we have a partition that is too huge, one of our nodes needs more resources to process it. But if we break it into multiple smaller ones, we increase the chance that they will be evenly distributed across all nodes.
However, the distribution of token ranges between nodes is not related to the distribution of records between partitions. When we add a new node, it just assumes responsibility for an even portion of the token ranges from the other nodes, and as a result an even share of the partitions. If we had 2 nodes with 3 GB of data each, after adding a third node each node would store 2 GB of data. That's why scalability isn't affected by partitioning, and you don't need to change your historical data after adding a new node.

Azure Cosmos DB - Understanding Partition Key

I'm setting up our first Azure Cosmos DB - I will be importing into the first collection the data from a table in one of our SQL Server databases. In setting up the collection, I'm having trouble understanding the meaning of, and the requirements around, the partition key, which I specifically have to name while setting up this initial collection.
I've read the documentation here (https://learn.microsoft.com/en-us/azure/cosmos-db/documentdb-partition-data) and am still unsure how to proceed with the naming convention of this partition key.
Can someone help me understand how I should be thinking in naming this partition key? See the screenshot below for the field I'm trying to fill in.
In case it helps, the table I'm importing consists of 7 columns, including a unique primary key, a column of unstructured text, a column of URLs, and several other secondary identifiers for that record's URL. I'm not sure whether any of that information has any bearing on how I should name my partition key.
EDIT: I've added a screenshot of several records from the table from which I'm importing, per request from #Porschiey.
Honestly the video here* was a MAJOR help to understanding partitioning in CosmosDb.
But, in a nutshell:
The PartitionKey is a property that will exist on every single object and that is best used to group similar objects together.
Good examples include Location (like City), Customer Id, Team, and more. Naturally, it wildly depends on your solution; so perhaps if you were to post what your object looks like we could recommend a good partition key.
EDIT: Should be noted that PartitionKey isn't required for collections under 10GB. (thanks David Makogon)
* The video used to live on this MS docs page entitled, "Partitioning and horizontal scaling in Azure Cosmos DB", but has since been removed. A direct link has been provided, above.
The partition key acts as a logical partition.
Now, what is a logical partition, you may ask? A logical partition may vary depending on your requirements; suppose you have data that can be categorized on the basis of your customers. Then the customer "Id" will act as the logical partition, and the info for the users will be placed according to their customer Id.
What effect does this have on the query?
While querying, you would pass your partition key in the feed options and not include it in your filter.
e.g.: if your query was
SELECT * FROM T WHERE T.CustomerId= 'CustomerId';
it will now be:
var options = new FeedOptions { PartitionKey = new PartitionKey(CustomerId) };
var query = _client.CreateDocumentQuery(CollectionUri, "SELECT * FROM T", options).AsDocumentQuery();
I've put together a detailed article here Azure Cosmos DB. Partitioning.
What's a logical partition?
Cosmos DB is designed to scale horizontally based on the distribution of data between physical partitions (PPs) (think of a PP as a separately deployable, underlying, self-sufficient node) and logical partitions (LPs) - buckets of documents with the same characteristic (partition key) which are supposed to be stored fully on the same PP. So an LP can't have part of its data on PP1 and another part on PP2.
There are two main limitations on physical partitions:
Max throughput: 10k RUs
Max data size (sum of the sizes of all LPs stored in this PP): 50 GB
A logical partition has one limit: 20 GB in size.
NOTE: Since the initial releases of Cosmos DB these size limits have grown, and I wouldn't be surprised if they increase again soon.
How to select the right partition key for my container?
Based on the Microsoft recommendation, for maintainable data growth you should select a partition key with the highest cardinality (like the id of the document or a composite field). The main reason:
Spread request unit (RU) consumption and data storage evenly across all logical partitions. This ensures even RU consumption and storage distribution across your physical partitions.
It is critical to analyze the application's data consumption pattern when considering the right partition key. In very rare scenarios larger partitions might work, though at the same time such solutions should implement data archiving to keep the DB size in check from the get-go (see the example below explaining why). Otherwise you should be ready for increasing operational costs just to maintain the same DB performance, along with potential PP data skew, unexpected "splits", and "hot" partitions.
Having a very granular and small partitioning strategy will lead to an RU overhead (definitely not a multiplication of RUs, but rather a couple of additional RUs per request) when consuming data distributed between a number of physical partitions (PPs), but it will be negligible compared to the issues that occur when data starts growing beyond 50, 100, 150 GB.
Why large partitions are a terrible choice in most cases, even though the documentation says "select whatever works best for you"
The main reason is that Cosmos DB is designed to scale horizontally, and the provisioned throughput per PP is limited to [total provisioned per container (or DB)] / [number of PPs].
Once a PP split occurs due to exceeding the 50 GB size, your max throughput for the existing PPs, as well as for the two newly created PPs, will be lower than it was before the split.
So imagine the following scenario (consider days as the measure of time between actions):
You've created a container with 10k RUs provisioned and CustomerId as the partition key (which will generate one underlying physical partition, PP1). The maximum throughput per PP is 10k/1 = 10k RUs.
Gradually adding data to the container, you end up with 3 big customers: C1 [10GB], C2 [20GB], and C3 [10GB] of invoices.
When another customer is onboarded to the system with C4 [15GB] of data, Cosmos DB has to split PP1's data into two newly created PPs, PP2 (30GB) and PP3 (25GB). The maximum throughput per PP is 10k/2 = 5k RUs.
Two more customers, C5 [10GB] and C6 [15GB], are added to the system, and both end up in PP2, which leads to another split -> PP4 (20GB) and PP5 (35GB). The maximum throughput per PP is now 10k/3 = 3.333k RUs.
IMPORTANT: As a result, on [Day 2] C1's data could be queried with up to 10k RUs, but on [Day 4] with at most 3.333k RUs, which directly impacts the execution time of your queries.
This is the main thing to remember when designing partition keys in the current version of Cosmos DB (12.03.21).
Cosmos DB can be used to store any amount of data. How does it do that on the back end? By using the partition key. Is it the same as the primary key? - No.
Primary key: uniquely identifies the data.
Partition key: helps in sharding the data (for example, one partition for the city New York when city is the partition key).
Partitions have a limit of 10GB, and the better we spread the data across partitions, the more we can use them, though it will eventually take more connections to get data from all the partitions. Example: getting data from the same partition in a query will always be faster than getting data from multiple partitions.
The partition key is used for sharding; it acts as a logical partition for your data and provides Cosmos DB with a natural boundary for distributing data across partitions.
You can read more about it here: https://learn.microsoft.com/en-us/azure/cosmos-db/partition-data
Each partition in a table can store up to 10GB (and a single table can store as many document schema types as you like). You have to choose your partition key so that all the documents stored against that key (i.e. that fall into that partition) stay under that 10GB limit.
I'm thinking about this too right now - should the partition key be a date range of some type? In that case, it would really depend on how much data gets stored in a given period of time.
You are defining a logical partition.
Underneath, the data is physically split into physical partitions by Azure.
Ideally the partition key should be a primary key, or a field with high cardinality, to ensure proper distribution, with the self-generated id field within that partition also set to the primary key; that will make fetching a document by its id much faster.
You cannot change a partition key once the container is created.
Looking at the dataset, captureId is a good candidate for the partition key, with id set manually to this field rather than to an auto-generated Cosmos one.
There is documentation available from Microsoft about partition keys. In my opinion, you need to check the queries and operations that you plan to perform with Cosmos DB. Are they read-heavy or write-heavy? If read-heavy, it is ideal to choose a partition key that appears in the WHERE clause of your queries; if write-heavy, look for a key with high cardinality.
Point reads/writes are always better, since they consume far fewer RUs than running other queries.

Is Cassandra unable to store relationships that cross the partition size limit?

I've noticed that relationships cannot be properly stored in C* due to its 100MB partition limit. Denormalization doesn't help in this case, and the fact that C* can have 2B cells per partition doesn't help either, as 2B cells of just longs amount to 16GB ?!? Doesn't that cross the 100MB partition size limit?
Which is what I don't understand in general: C* proclaims it can have 2B cells, but a partition's size should not cross 100MB ???
What is the idiomatic way to do this? People say this is an ideal use case for TitanDB or JanusGraph, which scale well to billions of nodes and edges. How do these databases, which use C* under the hood, model this data?
My use case is described here: https://groups.google.com/forum/#!topic/janusgraph-users/kF2amGxBDCM
Note that I'm fully aware the answer to this question is "use an extra partition key column to decrease partition size", but honestly, which of us has that possibility? Especially when modeling relationships... I'm not interested in a relationship that happened in a particular hour...
The maximum number of cells (rows × columns) in a partition is 2 billion, and a single column value can be up to 2 GB in size (1 MB is recommended).
Source : http://docs.datastax.com/en/cql/3.1/cql/cql_reference/refLimits.html
The 100MB partition size is not a hard upper limit. If you check the DataStax docs:
For efficient operation, partitions must be sized within certain limits in Apache Cassandra™. Two measures of partition size are the number of values in a partition and the partition size on disk. Sizing the disk space is more complex, and involves the number of rows and the number of columns, primary key columns and static columns in each table. Each application will have different efficiency parameters, but a good rule of thumb is to keep the maximum number of rows below 100,000 items and the disk size under 100 MB
You can see that, for efficient operation and low heap pressure, they simply made a good rule of thumb: keep the number of rows below 100,000 and the disk size under 100MB in a single partition.
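As a rough way to do that sizing, a commonly cited estimate for the number of values in a partition is Nv = Nr x (Nc - Npk - Ns) + Ns, where Nr is the number of rows, Nc the number of columns, Npk the number of primary key columns, and Ns the number of static columns; multiplying Nv by an average value size gives an approximate on-disk partition size before compression.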
TitanDB and JanusGraph store graphs in adjacency-list format, which means a graph is stored as a collection of vertices together with their adjacency lists. The adjacency list of a vertex contains all of the vertex's incident edges (and properties).
If you use Cassandra as the storage backend, the vertex id is used as the partition key, the property key id or edge id as the clustering key, and the property value or edge properties as regular columns.
In TitanDB or JanusGraph the same rule applies for efficient operation and low heap pressure: keep the number of edges and properties of a vertex below 100,000 and the size under 100MB.
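Conceptually the layout resembles the following CQL sketch (this is a simplification; the real JanusGraph edgestore serializes everything as opaque blobs, and the names here are illustrative):

CREATE TABLE adjacency (
    vertex_id blob,               -- partition key: one partition per vertex
    relation_id blob,             -- clustering key: property key id or edge id
    property_value blob,          -- property value or serialized edge properties
    PRIMARY KEY (vertex_id, relation_id)
);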
