Force Cassandra to store particular key values on a specific node - cassandra

How can I use the ByteOrderedPartitioner (BOP) to force specific key values to be partitioned according to a custom requirement? I want Cassandra to partition and replicate data according to custom requirements. Without introducing a custom partitioner, how far can I control this behavior, and how?
Overall: I want data whose keys start with a particular ID to live on a predefined node, because I know that data will be accessed heavily from that node. I would also like the data to be replicated to nearby nodes.

I want data whose keys start with a particular ID to live on a predefined node, because I know that data will be accessed heavily from that node.
It looks like you are talking about the data locality problem, which is really important in big-data computations (Spark, Hadoop, etc.). But the general approach there isn't to pin data to a specific node; it is to move the computation to the data itself.
Pinning data to a specific node may cause problems such as:
What should you do if your node goes down?
How evenly will the data be distributed across the cluster? Will there be hotspots or bottlenecks because some nodes are over- or under-utilized?
How can you scale your cluster in the future?
Moving computation to the data avoids these questions; the approach you are going to choose does not.

Found the answer here...
http://www.mail-archive.com/user%40cassandra.apache.org/msg14997.html
By changing the initial_token setting in cassandra.yaml, you can divide the nodes into key ranges. The partitioner then decides which node stores the first replica of the data, and the SimpleStrategy replication class places the remaining replicas on the following nodes in the ring. So by arranging the nodes (and their tokens) the way you want, you can exploit the replication strategy.
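To make the placement rule concrete, here is a minimal Python sketch of how a key maps to replicas on a ring with one manually chosen token per node, assuming an ordered partitioner (where the token is simply the key's bytes) and SimpleStrategy in a single data center. The node names and token values are hypothetical, and real Cassandra code paths handle many more details.

```python
import bisect

def primary_index(ring, key):
    """Index of the node whose range (previous token, own token] contains key."""
    tokens = [token for token, _ in ring]
    # bisect_left finds the first token >= key; wrap to 0 past the last token
    return bisect.bisect_left(tokens, key) % len(ring)

def simple_strategy_replicas(ring, key, rf):
    """Primary replica plus the next rf - 1 nodes clockwise around the ring."""
    start = primary_index(ring, key)
    count = min(rf, len(ring))
    return [ring[(start + step) % len(ring)][1] for step in range(count)]

# Hypothetical initial_token choices, sorted. With an ordered partitioner the
# token is the key's raw bytes, so byte strings compare lexicographically.
ring = [(b"c", "node1"), (b"k", "node2"), (b"s", "node3"), (b"z", "node4")]

print(simple_strategy_replicas(ring, b"m-user42", 2))    # ['node3', 'node4']
print(simple_strategy_replicas(ring, b"zz-archive", 3))  # ['node1', 'node2', 'node3']
```

The second call shows the wrap-around: a key that sorts after the highest token belongs to the node with the lowest token, and its extra replicas follow clockwise from there.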

Related

How to ensure that consistent hashing works?

I'm going to implement consistent hashing over a set of nodes. Each node has a limited capacity (say 1 GB). I start with one node, and when it gets full I add another node, use consistent hashing to redistribute the data, and keep going by adding new nodes. However, there is still a chance that a node gets full. I know some NoSQL databases such as Cassandra use consistent hashing to do something similar to what I'm doing. How can I prevent nodes from overflowing when using consistent hashing?
Cassandra does not use consistent hashing in the way you described.
Each table has a partition key (you can think of it as the primary key, or the first part of it, in RDBMS terminology), and with the default Murmur3Partitioner this key is hashed using the Murmur3 algorithm. The whole hash space forms a continuous ring from the lowest possible hash to the highest. Each node claims a number of token ranges on this ring (vnodes; num_tokens is 256 per node by default before Cassandra 4.0, which lowered the default to 16), so the ring ends up split into many small chunks spread fairly among the nodes. Each node hosts not only its own part of the ring but also replicas of other nodes' ranges, according to the replication factor.
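Why many small ranges per node balance better than one big range can be checked with a minimal Python sketch. It assumes plain random token selection for each node and uses MD5 as a stand-in for Murmur3, which is not in the Python standard library; the node names, key names and seed are hypothetical. It compares how evenly keys land on three nodes with num_tokens = 1 versus num_tokens = 256.

```python
import bisect
import hashlib
import random
from collections import Counter

def token_for(key: str) -> int:
    # Stand-in hash (MD5 instead of Murmur3); only the uniform spread matters here.
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big", signed=True)

def build_ring(nodes, num_tokens, seed):
    """Give every node num_tokens random tokens in the signed 64-bit range."""
    rng = random.Random(seed)
    return sorted((rng.randint(-2**63, 2**63 - 1), node)
                  for node in nodes for _ in range(num_tokens))

def key_shares(ring, keys):
    """Fraction of keys whose primary range is owned by each node."""
    tokens = [t for t, _ in ring]
    counts = Counter(
        ring[bisect.bisect_left(tokens, token_for(k)) % len(ring)][1] for k in keys
    )
    nodes = {node for _, node in ring}
    return {node: counts[node] / len(keys) for node in nodes}

nodes = ["node1", "node2", "node3"]
keys = [f"partition-{i}" for i in range(30000)]
for num_tokens in (1, 256):
    shares = key_shares(build_ring(nodes, num_tokens, seed=7), keys)
    spread = max(shares.values()) - min(shares.values())
    print(num_tokens, {n: round(s, 3) for n, s in sorted(shares.items())},
          "spread", round(spread, 3))
```

With a single token per node the split depends heavily on where the random tokens happen to fall; with 256 tokens per node each node's share stays close to one third.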
Cassandra's approach solves a lot of problems:
It balances data load among all cluster nodes, so no specific node gets overloaded (data size, reads, and writes are evenly distributed; there are no hot spots).
If you add a new node to the cluster, it takes over its own part of the ring and pulls the required ranges from other nodes automatically. No manual resharding is needed.
If a node fails, replication means you won't lose any data, because it is already stored on other nodes. You can then remove the failed node from the cluster (nodetool removenode), and the remaining nodes take over its part of the ring. There is no need for complex failover scenarios for failed DB nodes.
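The "no manual resharding" point rests on a property of the token ring: when a node joins with its own tokens, the only keys that change owner are the ones that now fall into the new node's ranges. A minimal Python sketch, assuming random tokens per node and MD5 as a stand-in for Murmur3; node names, key names and the seed are hypothetical:

```python
import bisect
import hashlib
import random

def token_for(key):
    # MD5 stand-in for Murmur3 (not in the stdlib); only the spread matters here.
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big", signed=True)

def random_tokens(num_tokens, rng):
    return [rng.randint(-2**63, 2**63 - 1) for _ in range(num_tokens)]

def owners(node_tokens, keys):
    """Map each key to the node owning the first token >= the key's token."""
    ring = sorted((t, node) for node, toks in node_tokens.items() for t in toks)
    tokens = [t for t, _ in ring]
    return {k: ring[bisect.bisect_left(tokens, token_for(k)) % len(ring)][1]
            for k in keys}

rng = random.Random(42)
cluster = {n: random_tokens(256, rng) for n in ["A", "B", "C"]}
keys = [f"user{i}" for i in range(20000)]
before = owners(cluster, keys)

cluster["D"] = random_tokens(256, rng)   # the new node joins with its own tokens
after = owners(cluster, keys)

moved = [k for k in keys if before[k] != after[k]]
print(f"moved {len(moved) / len(keys):.1%} of keys")
print(all(after[k] == "D" for k in moved))  # every moved key went to the new node
```

Roughly a quarter of the keys move, and all of them move to D; the other nodes never exchange data with each other.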
Of course, you can always implement similar behaviour on top of any RDBMS in your application layer, but it is much harder and more error-prone than using an existing, battle-tested solution.
I assume you know how keys get moved from one node to another when a node is added or removed. Coming to your question of how uniform distribution is achieved:
You can implement your own logic for this. Keep monitoring all the nodes on the ring, and if any node is getting hot (handling more keys), insert another node just before it so that its load is split between the old and the new node. Similarly, if any node is underutilized, you can remove it so that its load shifts to the next node.
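The "insert a node before the hot node" idea can be sketched in Python on a ring with one token per node. This is a minimal sketch under stated assumptions: a hypothetical toy token space of 0..99, hypothetical key tokens with a skewed load, and a helper `split_token` (also hypothetical) that places the new node halfway through the hot node's range.

```python
import bisect
from collections import Counter

RING_SIZE = 100  # hypothetical toy token space 0..99, for readability only

def owner(tokens, key_token):
    """One token per node: the node with token T owns (previous token, T]."""
    return tokens[bisect.bisect_left(tokens, key_token) % len(tokens)]

def split_token(tokens, hot):
    """Token for a new node inserted just before `hot`, halving hot's range."""
    i = tokens.index(hot)
    prev = tokens[i - 1]                 # tokens[-1] when i == 0: range wraps
    width = (hot - prev) % RING_SIZE
    return (prev + width // 2) % RING_SIZE

tokens = [10, 40, 90]
key_tokens = [5, 20, 45, 50, 55, 60, 70, 80, 85, 95]   # hypothetical, skewed

load = Counter(owner(tokens, k) for k in key_tokens)
hot = max(load, key=load.get)            # the node handling the most keys
new = split_token(tokens, hot)
new_tokens = sorted(tokens + [new])
new_load = Counter(owner(new_tokens, k) for k in key_tokens)

print(dict(load))      # {10: 2, 40: 1, 90: 7}
print(new)             # 65
print(dict(new_load))  # {10: 2, 40: 1, 65: 4, 90: 3}
```

Splitting at the token midpoint halves the range, not necessarily the load; a real implementation would pick the split point from observed key counts. Removing an underutilized node works the other way around: its range simply merges into its successor's.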
Hope this helps!

How does Cassandra partitioning work when replication factor == cluster size?

Background:
I'm new to Cassandra and still trying to wrap my mind around the internal workings.
I'm thinking of using Cassandra in an application that will only ever have a limited number of nodes (fewer than 10, most commonly 3). Ideally each node in my cluster would have a complete copy of all of the application data, so I'm considering setting the replication factor to the cluster size. When additional nodes are added, I would alter the keyspace to increment the replication factor (and run nodetool repair so the new replica receives the data it needs).
I would be using the NetworkTopologyStrategy for replication to take advantage of knowledge about datacenters.
In this situation, how does partitioning actually work? I've read about a combination of nodes and partition keys forming a ring in Cassandra. If all of my nodes are "responsible" for each piece of data regardless of the hash value calculated by the partitioner, do I just have a ring of one partition key?
Are there major downsides to this type of Cassandra deployment? I'm guessing there would be lots of asynchronous replication going on in the background as data is propagated to every node, but this is one of the design goals, so I'm okay with it.
The consistency level on reads would probably generally be "one" or "local_one".
The consistency level on writes would generally be "two".
Actual questions to answer:
Is replication factor == cluster size a common (or even a reasonable) deployment strategy aside from the obvious case of a cluster of one?
Do I actually have a ring of one partition where all possible values generated by the partitioner go to the one partition?
Is each node considered "responsible" for every row of data?
If I were to use a write consistency of "one" does Cassandra always write the data to the node contacted by the client?
Are there other downsides to this strategy that I don't know about?
Do I actually have a ring of one partition where all possible values generated by the partitioner go to the one partition?
Is each node considered "responsible" for every row of data?
If all of my nodes are "responsible" for each piece of data regardless of the hash value calculated by the partitioner, do I just have a ring of one partition key?
Not exactly. C* nodes still have token ranges, and C* still assigns a primary replica to the "responsible" node. But with RF = N (where N is the number of nodes), every node also holds a replica of every row. So in practice the effect is the same as what you described.
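The distinction between "primary replica" and "every node holds a copy" can be checked with a minimal Python sketch. It assumes a single data center and a SimpleStrategy-style walk clockwise over distinct nodes (NetworkTopologyStrategy additionally considers racks and data centers); the node names, the 8 tokens per node and the seed are hypothetical.

```python
import bisect
import random

def replicas(ring, token, rf):
    """Walk clockwise from the owning range, collecting rf distinct nodes."""
    tokens = [t for t, _ in ring]
    start = bisect.bisect_left(tokens, token) % len(ring)
    chosen = []
    for step in range(len(ring)):
        node = ring[(start + step) % len(ring)][1]
        if node not in chosen:
            chosen.append(node)
        if len(chosen) == rf:
            break
    return chosen

rng = random.Random(1)
nodes = ["n1", "n2", "n3"]
ring = sorted((rng.randint(-2**63, 2**63 - 1), n) for n in nodes for _ in range(8))
probe = [rng.randint(-2**63, 2**63 - 1) for _ in range(1000)]

primaries = {replicas(ring, t, 1)[0] for t in probe}
full = all(set(replicas(ring, t, len(nodes))) == set(nodes) for t in probe)
print(sorted(primaries))  # different tokens still have different primary replicas
print(full)               # True: with RF = N every node holds every token
```

The ring and its token ranges still exist; RF = N just makes the replica set for every token the whole cluster.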
Are there major downsides to this type of Cassandra deployment?
Are there other downsides to this strategy that I don't know about?
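Not that I can think of. I suspect you might be more susceptible than average to inconsistent data: with reads at ONE and writes at TWO, once RF is 3 or more a read is not guaranteed to reach a replica that acknowledged the write. Use C*'s anti-entropy mechanisms (repair, read repair, hinted handoff) to counter this.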
Consistency levels QUORUM or ALL would start to get expensive as RF grows, but I see you don't intend to use them.
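The arithmetic behind both points is short enough to sketch in Python: a QUORUM request needs floor(RF / 2) + 1 replica responses, and a read is only guaranteed to overlap the latest acknowledged write when the read and write replica counts together exceed RF. The helper names below are hypothetical.

```python
def quorum(rf: int) -> int:
    """Replicas that must respond to a QUORUM request: floor(rf / 2) + 1."""
    return rf // 2 + 1

def overlaps(rf: int, read_replicas: int, write_replicas: int) -> bool:
    """True when every read must reach a replica that acknowledged the write."""
    return read_replicas + write_replicas > rf

for rf in range(3, 10):
    print(f"RF={rf}: QUORUM={quorum(rf)}, ALL={rf}, "
          f"ONE read + TWO write overlap={overlaps(rf, 1, 2)}")
```

With RF equal to the cluster size, QUORUM and ALL grow with every node you add, while ONE/TWO never guarantees overlap for RF of 3 or more, which is why repair matters here.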
Is replication factor == cluster size a common (or even a reasonable) deployment strategy aside from the obvious case of a cluster of one?
It's not common. I guess you are looking for very high availability and all your data fits on one box. I don't think I've ever seen a C* deployment with RF > 5; by far the most common choice is RF = 3.
If I were to use a write consistency of "one" does Cassandra always write the data to the node contacted by the client?
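This depends on the load balancing policy in your driver. Often we choose a token-aware policy (assuming you're using one of the DataStax drivers), in which case requests are routed automatically to a replica that owns the data. Since every node is a replica in your case, you could use round robin and get the same effect. Note that the consistency level only controls how many acknowledgements the coordinator waits for; the write is still sent to every replica.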
The primary downside is increased write cost at the coordinator level as you add nodes. The largest number of replicas written to that I've seen is around 8 (5 in other data centers and 3 local replicas).
In practice this means reduced stability during large or batched writes (greater than 1 MB), or lower per-node write throughput (TPS).
The primary advantage is that you can do a lot of things that would normally be awful or impossible. Want to use secondary indexes? They will probably work reasonably well (assuming cardinality and partition size don't become your bottleneck). Want a custom UDF that does a GROUP BY, or very large IN queries? They'll probably work too.
As @Phact mentions, it is not a common usage pattern. I primarily saw it used with DSE Search for low-write-throughput use cases that needed 'single node' features from Solr. For those same use cases with pure Cassandra, you'd get some benefits on the read side and be able to run expensive queries that are normally impossible in a more distributed cluster.

Problems In Cassandra ByteOrderedPartitioner in Cluster Environment

I am using Cassandra 1.2.15 with the ByteOrderedPartitioner in a 4-node cluster with 2 replicas. What are the drawbacks of using this partitioner in a cluster environment? After a long search I found one drawback, and I need to know what its consequences are:
1) Data will not be distributed evenly.
What type of problem will occur if data is not distributed evenly?
Is there any other drawback with this partitioner in a cluster environment? If so, what are the consequences of those drawbacks? Please explain clearly.
One more question: if I go with Murmur3Partitioner, the data will be distributed evenly, but the order will not be preserved. However, this drawback can be overcome with clustering order (the second key in the primary key). Is my understanding correct?
As you are using Cassandra 1.2.15, I found a doc for Cassandra 1.2 that explains why using the ByteOrderedPartitioner (BOP) is a bad idea:
http://www.datastax.com/documentation/cassandra/1.2/cassandra/architecture/architecturePartitionerBOP_c.html
Difficult load balancing: More administrative overhead is required to load balance the cluster. An ordered partitioner requires administrators to manually calculate partition ranges (formerly token ranges) based on their estimates of the row key distribution. In practice, this requires actively moving node tokens around to accommodate the actual distribution of data once it is loaded.
Sequential writes can cause hot spots: If your application tends to write or update a sequential block of rows at a time, then the writes are not distributed across the cluster; they all go to one node. This is frequently a problem for applications dealing with timestamped data.
Uneven load balancing for multiple tables: If your application has multiple tables, chances are that those tables have different row keys and different distributions of data. An ordered partitioner that is balanced for one table may cause hot spots and uneven distribution for another table in the same cluster.
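The "sequential writes cause hot spots" point can be reproduced with a minimal Python sketch that routes one hour of timestamp-keyed writes through an ordered ring and through a hashed ring. The tokens, node names and keys are hypothetical, and MD5 stands in for the partitioner's hash (RandomPartitioner used MD5; Murmur3 is not in the Python standard library).

```python
import bisect
import hashlib
from collections import Counter

def owner(ring, token):
    """ring: sorted (token, node) pairs; node with token T owns (previous, T]."""
    tokens = [t for t, _ in ring]
    return ring[bisect.bisect_left(tokens, token) % len(ring)][1]

def md5_token(key: str) -> int:
    # Hash-based placement: MD5 stand-in, first 8 bytes as a signed 64-bit token
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big", signed=True)

# Hypothetical hand-picked byte tokens for an ordered (BOP-style) ring
bop_ring = [(b"2012", "A"), (b"2013", "B"), (b"2014", "C"), (b"2015", "D")]
# Hypothetical evenly spaced tokens for a hashed ring
hash_ring = [(-2**62, "A"), (0, "B"), (2**62, "C"), (2**63 - 1, "D")]

# One hour of sequential, timestamp-keyed writes
keys = [f"2014-06-01T12:{m:02d}:00" for m in range(60)]

bop_load = Counter(owner(bop_ring, k.encode()) for k in keys)
hash_load = Counter(owner(hash_ring, md5_token(k)) for k in keys)
print("ordered:", dict(bop_load))   # {'D': 60}
print("hashed: ", dict(hash_load))  # spread over several nodes
```

Under the ordered ring every key sorts between b"2014" and b"2015", so all 60 writes hit node D; hashing scatters the same keys across the ring.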
For these reasons, the BOP has been identified as a Cassandra anti-pattern. Matt Dennis has a SlideShare presentation on Cassandra Anti-Patterns that includes a slide about the BOP.
So seriously, do not use the BOP.
"however this drawback can be overcome with cluster ordering (Second key in the primary keys). Whether my understanding is correct?"
Somewhat, yes. In Cassandra you can dictate the order of your rows (within a partition key) by using a clustering key. If you wanted to keep track of (for example) station-based weather data, your table definition might look something like this:
CREATE TABLE stationreads (
    stationid uuid,
    readingdatetime timestamp,
    temperature double,
    windspeed double,
    PRIMARY KEY ((stationid), readingdatetime)
);
With this structure, you could query all of the readings for a particular weather station and order them by readingdatetime. However, if you queried all of the data (e.g. SELECT * FROM stationreads;) the results will probably not be in any discernible order. That's because the total result set will be ordered by the (random) hashed values of the partition key (stationid in this case). So while "yes" you can order your results in Cassandra, you can only do so within the context of a particular partition key.
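The resulting order can be modelled with a minimal Python sketch: rows are sorted first by the hashed token of the partition key and then by the clustering column, which is why a full scan groups each station together in hash order while keeping readings within a station sorted by time. The station names are hypothetical strings standing in for the uuid column, and MD5 stands in for Murmur3.

```python
import hashlib
from datetime import datetime, timedelta

def token(partition_key: str) -> int:
    # Stand-in for Murmur3 (not in the stdlib): any well-mixed hash shows the effect.
    return int.from_bytes(hashlib.md5(partition_key.encode()).digest()[:8],
                          "big", signed=True)

start = datetime(2014, 6, 1)
stations = ["station-a", "station-b", "station-c"]   # hypothetical stationid values
rows = [(s, start + timedelta(hours=h), 20.0 + h)    # (stationid, readingdatetime, temperature)
        for h in range(3) for s in stations]         # written interleaved by time

# Full-scan order: partitions by token, rows within a partition by readingdatetime.
scan = sorted(rows, key=lambda r: (token(r[0]), r[1]))
for station, ts, temp in scan:
    print(station, ts.isoformat(), temp)
```

Each station's readings come out contiguous and in time order, but the order of the stations themselves follows their hash values, not their names or the insertion order.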
Also, there have been many improvements in Cassandra since 1.2.15. You should definitely consider using a more recent (2.x) version.

Cassandra cluster, 1 table: how to plan ahead

I am planning to create an application that will use just one Cassandra table. The replication factor will probably be 2 or 3. I might start with 2 Cassandra servers and then keep adding servers as needed. But I am not sure whether I need to plan anything in advance so that the table is distributed uniformly when I add more servers. Are there any best practices or things I need to be aware of? I read about tokens (http://www.datastax.com/docs/1.1/initialize/token_generation), but I am not sure what I need to do.
I suppose the keys have to be distributed uniformly across the cluster, so:
How will that happen when, say, I add the 2nd server and the 1st one already has 1 million keys?
Do I need to plan the keyspace or tables in advance?
I can suggest two things.
First, when designing your schema, pick a good partition key (the first component of the primary key). You need to ensure a couple of things:
There are enough distinct values that you can distribute them across an arbitrary number of nodes. For example, sex would be a bad partition key, because it has only two values and can therefore be distributed to at most two nodes.
The distribution of rows across partition key values is more or less uniform. For example, country might not be the best choice, because most of your rows will likely belong to just a few countries.
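Both criteria can be checked against sample data before committing to a schema. A minimal Python sketch, assuming a hypothetical 6-node cluster whose token range is split into equal contiguous slices (real clusters use vnodes and Murmur3; MD5 is a stand-in here) and hypothetical sample values for three candidate partition keys:

```python
import hashlib
from collections import Counter

def node_for(value: str, num_nodes: int) -> int:
    """Evenly split token ring: node i owns the i-th equal slice of 0..2**64-1."""
    token = int.from_bytes(hashlib.md5(value.encode()).digest()[:8], "big")
    return (token * num_nodes) >> 64

def placement(values, num_nodes):
    """Return (nodes that receive any data, largest share held by one node)."""
    load = Counter(node_for(v, num_nodes) for v in values)
    return len(load), max(load.values()) / len(values)

NODES = 6
candidates = {
    "sex": ["F", "M"] * 500,                                   # only two values
    "country": ["US"] * 700 + ["DE"] * 200 + ["FR"] * 100,     # hypothetical skew
    "user_id": [f"user-{i}" for i in range(1000)],             # high cardinality
}
results = {name: placement(values, NODES) for name, values in candidates.items()}
for name, (used, biggest) in results.items():
    print(f"{name}: {used}/{NODES} nodes used, largest node share {biggest:.0%}")
```

sex can never use more than two nodes, country leaves one node holding at least the US rows, and a high-cardinality key such as a user ID spreads across all nodes.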
Second, to ease the deployment of new nodes later, consider setting up your cluster to use virtual nodes (vnodes). If you do that, you can skip a few steps when expanding your cluster; in particular, you won't need to calculate and assign tokens by hand.
To configure virtual nodes, set num_tokens in cassandra.yaml to a value greater than 1. This determines how many virtual nodes each node will have. A recommended value is 256.
Later, when you add new nodes, make sure auto_bootstrap is true for them (it defaults to true when it is not set in cassandra.yaml). Then configure the network parameters as usual to match your cluster, and finally start the node. It should bootstrap automatically and start streaming the appropriate data. After everything has settled down, run cleanup (nodetool cleanup) on your other nodes so they purge data they are no longer responsible for.
For more detailed documentation, please see http://www.datastax.com/documentation/cassandra/2.0/cassandra/operations/ops_add_node_to_cluster_t.html

How to make Riak data localized?

I'm designing a Riak cluster at the moment and wondering whether it is possible to hint to Riak that a specific set of keys should be placed on a single node of the cluster.
For example, there is some private data for a user that only she is able to access. This data consists of ~10k documents (too large to be kept in one key/document), and to serve one page we need to retrieve ~100 of them. It would be better to keep the whole set on a single node and run the application on the same instance to make this faster.
AFAIK this is easy in Cassandra: just use an ordered partitioner (e.g. ByteOrderedPartitioner) and keys like <hash(username)>/<private data key>. That way, almost all of a user's keys will be kept on a single node.
One of the points of using Riak is that your data is replicated and evenly distributed throughout the cluster, thus improving your tolerance for network partitions and outages. Placing data on specific nodes goes against that goal and increases your vulnerability.
