Can I force a query to fetch from the local node itself? We have two data centers with a replication factor of 3 and 3, and I want to check whether replication is being done properly 1) across nodes and 2) across data centers. Can I force the query to check only a particular node and see if the data is present on that node? I know nodetool getendpoints will tell me the replicas if I give it IDs, but if I want to check table updates in general and see whether the data is being replicated, how can I do this? Apart from LOCAL_QUORUM, do we have any other option? Thanks
I can understand why someone might want to do this (sanity check), and I'll put some instructions on how to do it below. Firstly, though, it's not really necessary: Cassandra is a distributed system, and under normal circumstances there is no need to check that data is on a given node; replication will place a given row on a node according to where the snitch and replication strategy determine placement. So for a replication factor like DC1:3 and DC2:3, a row may be on any 3 nodes in each DC. As long as you can query the cluster as a whole, or each DC, and get the right results, you know that your replication is working OK.
Having said that, here is how you find a key. The caveat is that it has to be flushed to disk (you might need to invoke a nodetool flush). It may seem convoluted, but this is how you trace a key to an SSTable, so you may find it useful to know (a rough command sketch follows the note below):
Use nodetool getendpoints to locate the nodes the key is on
Use nodetool getsstables to find the sstables the key is present in
Locate the file on disk and then use sstable2json to view the contents of the table, or use sstablekeys to just view the keys
Note: the sstable2json tools are only listed under the Cassandra 1.2 docs, but they should still be present in 2.1; at least I verified they are, and I have used them in DSE 4.7 and 4.8.
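As a rough sketch, the whole trace might look like the commands below, run against one of the replica nodes. The keyspace (ks), table (t), key value (abc), and data path are placeholders, not anything from the original question:
nodetool flush ks t                        # make sure the partition is on disk
nodetool getendpoints ks t abc             # which nodes hold partition key 'abc'
nodetool getsstables ks t abc              # on one of those nodes: which SSTables hold it
sstable2json /var/lib/cassandra/data/ks/t/ks-t-ka-1-Data.db   # dump the contents
sstablekeys  /var/lib/cassandra/data/ks/t/ks-t-ka-1-Data.db   # or just the keys
The exact *-Data.db file name and path depend on your Cassandra version; use whatever getsstables reports.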
I am a newbie to Cassandra. I have created a keyspace in Cassandra with NetworkTopologyStrategy and 2 replicas in one datacenter. Is there a CQL command or some other way to view my data in the two replicas?
Like SELECT * FROM tablename in replica1 / replica2
Is there another way such that I can visually see the data in the two replicas?
Thanks in advance.
Your question is not entirely clear ("see the data in 2 replicas"), but if you ever want to validate your data, you can run some commands to visually inspect things.
The first thing you'd want to do is log onto the node you want to investigate and go to the data directory of the table of interest: DataDir/keyspace/table. In there you'll see one or more files that look like *Data.db; those are your SSTables. Data in memory is flushed to SSTables in certain scenarios, so if you're validating, you want to be sure your data has been flushed from memory to disk (you may not find what you're looking for otherwise). To do that, issue a "nodetool flush" command (you can pass the keyspace and table as parameters if you only want to flush that specific table).
After that, everything in memory will have been flushed to disk, so you'll be able to see your SSTables (again, the *Data.db files). Once you have those SSTables, you can run the "sstabledump" command on each one to see the data it contains, thus validating your data.
If you only have a few rows to validate and a lot of nodes, you can find which nodes the rows reside on by running "nodetool getendpoints" with the keyspace, table, and partition key. That will tell you every node that should have the data, so you're not guessing which node the row(s) should be on. Unfortunately, there is no way to know which SSTable the rows are in (and it could be more than one if updates, deletes, etc. occurred), so you'll have to go through each SSTable on the specific node(s).
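Putting that together, a minimal sketch of the commands (my_ks, my_table, some_key, and the data path are placeholders for your own names):
nodetool flush my_ks my_table
nodetool getendpoints my_ks my_table some_key
# then, on each node returned, dump every SSTable of that table:
for f in /var/lib/cassandra/data/my_ks/my_table-*/*-Data.db; do sstabledump "$f"; done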
Hope that helps answer your question?
Good luck.
-Jim
You can for a specific partition. If you are sure host1 is a replica (from nodetool getendpoints or from a query trace), then if you make your query with CL.ONE and send it explicitly to that host, the coordinator will always prefer reading from itself. So:
Statement q = new SimpleStatement("SELECT * FROM tablename WHERE key = X");
q.setConsistencyLevel(ConsistencyLevel.ONE);
q.setHost(host1);   // host1 being the Host object for that replica, taken from the cluster metadata
where host1 owns X.
For SELECT * FROM tablename it's a bit harder, because you are reading over the entire data set and the coordinator will send out multiple queries, one for each part of the ring. If you run the query with CL.ONE it will still only go to one node for each part of the range, so if you call q.enableTracing() you can see which node answered for each range. You have no control over which coordinator the driver picks, so it may take a few queries.
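A minimal sketch of reading that trace with the 3.x Java driver (the Cluster/Session setup is omitted, and the keyspace/table names are placeholders):
import com.datastax.driver.core.*;

Statement scan = new SimpleStatement("SELECT * FROM my_ks.tablename");
scan.setConsistencyLevel(ConsistencyLevel.ONE);
scan.enableTracing();
ResultSet rs = session.execute(scan);
QueryTrace trace = rs.getExecutionInfo().getQueryTrace();
for (QueryTrace.Event event : trace.getEvents()) {
    // the source address shows which node handled each step for each range
    System.out.println(event.getSource() + " : " + event.getDescription());
}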
If you just want to see whether there are differences, you can use a preview repair: nodetool repair --preview --full.
I have a 3-node cluster with a replication factor of 3. nodetool status shows that one node has 100 GB of data, another 90 GB, and another 30 GB. Each node owns 100% of the data.
I'm using a unique URL as my clustering key, so I would imagine the data should be spread around evenly. Even so, since the RF is 3, all nodes should contain the same amount of data. Any ideas what's going on?
Thanks.
What is the write consistency level being used? I guess it might be "consistency ONE", and hence the data would only get replicated eventually, especially if the data was dumped in one shot. Try using "consistency local_quorum" to avoid this issue in the future.
Try running a "nodetool repair" and it should bring the data back in sync on all nodes.
Remember that writes from "cqlsh" use "consistency ONE" by default.
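For example, in cqlsh you can raise the consistency level for the session before loading data (the keyspace, table, and column names here are made-up placeholders):
cqlsh> CONSISTENCY LOCAL_QUORUM;
cqlsh> INSERT INTO my_ks.my_table (id, value) VALUES (1, 'example');
and then, from a shell on any node, bring the existing data back in sync:
nodetool repair my_ks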
How can I use the ByteOrderedPartitioner (BOP) to force specific key values to be partitioned according to a custom requirement? I want to force Cassandra to partition and replicate data according to custom requirements; without introducing a custom partitioner, how far can I control this behavior, and how?
Overall: I want my data starting with a particular ID to be on a predefined node, because I know the data will be accessed from that node heavily. I would also like the data to be replicated to nearby nodes.
I want my data starting with a particular ID to be on a predefined node because I know the data will be accessed from that node heavily.
It looks like you are talking about the data locality problem, which is really important in big-data-style computation (Spark, Hadoop, etc.). But the general approach there isn't to pin data to a specific node; it's to move the whole computation to the data itself.
Pinning data to a specific node may cause problems like:
what should you do if your node goes down?
how evenly will the data be distributed among the cluster? Will there be any hotspots or bottlenecks because of node over- or under-utilization?
how can you scale your cluster in future?
Moving the computation to the data has no issues with these questions, but the approach you are going to choose does.
Found the answer here...
http://www.mail-archive.com/user%40cassandra.apache.org/msg14997.html
By changing the "initial_token" setting in each node's cassandra.yaml we can divide the nodes into key ranges; the partitioner will choose the node that stores the first replica of the data, and the SimpleStrategy replication strategy will place the remaining replicas on the subsequent nodes in the ring. So by arranging the nodes the way you want, you can exploit the replication strategy.
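As an illustrative sketch only (the token value is a placeholder; with the ByteOrderedPartitioner the token format depends on your keys, so check the docs for your version before relying on specific values):
# in each node's cassandra.yaml
partitioner: org.apache.cassandra.dht.ByteOrderedPartitioner
initial_token: <start-of-the-key-range-this-node-should-own>
Give each node a different initial_token so the ranges line up with the IDs you want it to own; SimpleStrategy then places the additional replicas on the next nodes in the ring.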
I have 4 nodes in a Cassandra cluster. If I have a replication factor of 4 for a keyspace, then taking a backup from one node will guarantee that the entire data set is backed up. But if I set the replication factor to 2 or 3, then taking a backup of one node will not back up the entire data set; it will only back up the data present on that node. For example, if I have 4 nodes A, B, C, D and the replication factor is 3, and the data is distributed as follows,
node A: 1-10,11-20,21-30
node B: 11-20,21-30,31-40
node C: 21-30,31-40,1-10
node D: 31-40,1-10,11-20
Now if I take the backup from node A and restore the data to some other cluster, I will only get records 1-10, 11-20, and 21-30, and I will lose records 31-40. What is the solution for this? Can't we take a backup of the entire data set from one node, irrespective of the replication factor?
The short answer is no. At least automatic backups are a no-go.
You do have two other options, but they require "extra labour":
Create a side keyspace with RF=1 and back it up on all 4 nodes (no need for custom scripts; just enable snapshots). This way you can have a second storage setup just for these backups (mount the backup dir in fstab). You will have "two writes per write", so use batch inserts.
Although your logic for replica location is correct, your conclusion is not. You only have to back up any two nodes, since with 4 nodes and RF=3 every combination of two nodes holds the whole range of keys. You will have to watch out when/if you decide to add more nodes.
Option one will require a lot of work if you have to restore data, since you will need to perform a full keyspace read in order to find missing keys.
Option two will be easier in case of irreversible data loss. You just have to run a repair on the keyspaces.
Since I don't know your use case I can't give you a suggestion, but in most failure scenarios Cassandra recovers pretty well by itself with minimal to no downtime to your app.
The rule of thumb is to bet on the storage system (using RAID or JBOD).
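For option two, the backup itself can be as simple as taking a snapshot on two nodes that together cover the full token range (the tag, keyspace, and choice of nodes here are placeholders):
# run on node A and on node C
nodetool snapshot -t nightly_backup my_keyspace
# the snapshot files then appear under each table's data directory:
# .../data/my_keyspace/<table>/snapshots/nightly_backup/
Copy those snapshot directories off the nodes and you have a restorable copy of every key range.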
Unfortunately there is no solution for this. Normally backups are run on all nodes.
I am planning to create an application that will use just one Cassandra table. The replication factor will probably be 2 or 3. I might start with 2 Cassandra servers initially and then keep adding servers as needed. But I am not sure whether I need to pre-plan anything so that the table is distributed uniformly when I add more servers. Are there any best practices or things I need to be aware of? I read about tokens, http://www.datastax.com/docs/1.1/initialize/token_generation , but I am not sure what I need to do.
I suppose the keys have to be distributed uniformly in the cluster, so:
how will that happen, i.e. when I add the 2nd server and, say, the 1st one already has 1 million keys?
do I need to pre-plan the keyspace or tables?
I can suggest two things.
First, when designing your schema, pick a good partition key (the first column in the primary key). You need to ensure a couple of things (a sketch follows this list):
There are enough distinct values that you can distribute the data across an arbitrary number of nodes. For example, sex would be a bad partition key, because it has only two values and can therefore only be distributed to two nodes.
The distribution across different partition key values is more or less uniform. For example, country might not be the best choice, because you will most likely have most of your rows in just a few countries.
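As a hypothetical CQL sketch (the keyspace, table, and column names are made up for illustration):
-- good: user_id is high-cardinality and spreads evenly across the ring
CREATE TABLE app.events_by_user (
    user_id  uuid,
    event_ts timestamp,
    payload  text,
    PRIMARY KEY (user_id, event_ts)
);

-- risky: country has only a few distinct values, so a handful of partitions
-- (and the nodes that own them) receive almost all of the data
CREATE TABLE app.events_by_country (
    country  text,
    event_ts timestamp,
    payload  text,
    PRIMARY KEY (country, event_ts)
);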
Secondly, to ease deployment of new nodes later consider setting up your cluster to use virtual nodes (vnodes). If you do that you will be able to skip a few steps when expanding your cluster.
To configure virtual nodes, set num_tokens in cassandra.yaml to more than 1. This will decide how many virtual nodes your node will have. A recommended value is 256.
Later, when you add new nodes, make sure auto_bootstrap is true for the new nodes (it defaults to true, so only check it if it has been set explicitly in cassandra.yaml). Then configure the network parameters as usual to match your cluster and start the node. It should automatically bootstrap and begin streaming the appropriate data. After everything has settled down, you can run a cleanup (nodetool cleanup) on your other nodes to make sure they purge the data they are no longer responsible for.
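A minimal sketch of the relevant settings and the follow-up command (the values shown are illustrative):
# cassandra.yaml on every node, including new ones
num_tokens: 256
# auto_bootstrap defaults to true; only set it explicitly if you need to override it

# after the new node has finished joining, on each pre-existing node:
nodetool cleanup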
For more detailed documentation, please see http://www.datastax.com/documentation/cassandra/2.0/cassandra/operations/ops_add_node_to_cluster_t.html