Cassandra: move data to another cluster [closed] - cassandra

I have a 3-node Cassandra cluster (3.11.0) and I want to move this cluster to another location.
I have 3 VMs in my old cluster (Tashkent cluster):
192.168.1.11 - node 1
192.168.1.12 - node 2
192.168.1.13 - node 3
and I have a new (empty) cluster:
172.17.5.10 - node 1
172.17.5.11 - node 2
172.17.5.12 - node 3
I want to move all the data from the old cluster to the new one. My steps are (the old cluster has RF=3 for all keyspaces and uses NetworkTopologyStrategy, BTW):
1) add the old nodes as seeds to the new cluster
2) bootstrap the new nodes
3) run nodetool cleanup on the old nodes
4) then run nodetool decommission on the old nodes, one by one
5) nodetool removenode <oldnodeip>
So is my plan correct? Should I run nodetool repair after step 3?
thanks

Instead of doing bootstrap/decommission, it could be much faster to do the following:
1) Add all new nodes as a new data center
2) Adjust the replication factor to use the 2nd data center
3) Run nodetool rebuild from every node of the new DC
4) Switch applications to the new DC
5) Decommission the old DC
Steps 1-3 are described in the following documentation, step 5 is described here.
Another possibility is to replace the nodes one by one.
The main advantage in both cases is that you minimize data movement between the nodes. Otherwise you'll move the data when adding nodes, and then move it again when doing decommission or removenode.
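A rough, hedged sketch of that new-DC approach (the keyspace and data-center names my_ks, DC_old and DC_new are placeholders; the exact options depend on your version):
# 1) Configure the new nodes with a new DC name in their snitch settings and start them (they join empty).
# 2) Extend the replication of every keyspace to the new DC, once, from any node:
cqlsh -e "ALTER KEYSPACE my_ks WITH replication = {'class': 'NetworkTopologyStrategy', 'DC_old': 3, 'DC_new': 3};"
# 3) On every node of the new DC, stream the existing data from the old DC:
nodetool rebuild -- DC_old
# 4) Point the applications at the new DC, then drop the old DC from replication:
cqlsh -e "ALTER KEYSPACE my_ks WITH replication = {'class': 'NetworkTopologyStrategy', 'DC_new': 3};"
# 5) Finally, run nodetool decommission on each old node.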

nodetool cleanup should be replaced by nodetool repair. Cleanup removes data a node no longer owns, so running it on the old nodes will not be beneficial, while repair streams any missing data. Also, nodetool removenode will not be required after decommission; both perform the same function, but in different ways.
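For reference, a minimal sketch of the two commands (the keyspace name is illustrative; run them on one node at a time):
nodetool repair -full my_ks   # streams any data this node is missing from its replicas
nodetool cleanup my_ks        # deletes data for token ranges this node no longer owns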

Related

Can I upgrade a Cassandra cluster by swapping in new nodes running the updated version?

I am relatively new to Cassandra... both as a User and as an Operator. Not what I was hired for, but it's now on my plate. If there's an obvious answer or detail I'm missing, I'll be more than happy to provide it... just let me know!
I am unable to find any recent or concrete documentation that explicitly spells out how tolerant Cassandra nodes will be when a node with a higher Cassandra version is introduced to an existing cluster.
Hypothetically, let's say I have 4 nodes in a cluster running 3.0.16 and I wanted to upgrade the cluster to 3.0.24 (the latest version as of posting; 2021-04-19). For reasons that are not important here, running an 'in-place' upgrade on each existing node is not possible. That is: I cannot simply stop Cassandra on the existing nodes and then do a nodetool drain; service cassandra stop; apt upgrade cassandra; service cassandra start.
I've looked at the change log between 3.0.17 and 3.0.24 (inclusive) and don't see anything that looks like a major breaking change w/r/t the transport protocol.
So my question is: Can I introduce new nodes (running 3.0.24) to the c* cluster (comprised of 3.0.16 nodes) and then run nodetool decommission on each of the 3.0.16 nodes to perform a "one for one" replacement to upgrade the cluster?
Do I risk any data integrity issues with this procedure? Is there a specific reason why the procedure outlined above wouldn't work? What about if the number of tokens each node is responsible for is increased on the new nodes? E.g.: the 3.0.16 nodes equally split the keyspace over 128 tokens each, but the new 3.0.24 nodes will split everything across 256 tokens.
EDIT: After some back/forth on the #cassandra channel on the Apache Slack, it appears as though there's no issue with the procedure. There were some other comorbid issues caused by other bits of automation that did threaten the data integrity of the cluster, however. In short, each new node was adding ITSELF to the list of seed nodes as well. This can be seen in the logs: This node will not auto bootstrap because it is configured to be a seed node.
Each new node failed to bootstrap, but did not fail to take new writes.
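A quick, hedged way to spot that condition on a new node (assuming the stock package locations for cassandra.yaml and system.log):
# The node's own address should normally not appear in its own seed list
grep -A 3 'seed_provider' /etc/cassandra/cassandra.yaml
# If bootstrap was skipped, the log says so explicitly
grep -i 'seed node' /var/log/cassandra/system.log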
EDIT2: I am not in a k8s environment; this is 'basic' EC2. Likewise, the volume of data / node size is quite small, ranging from tens of megabytes to a few hundred gigs in production. In all cases, the cluster is fewer than 10 nodes. The case I outlined above was for a test/dev cluster, which is normally 2 nodes in each of two distinct racks/AZs for a total of 4 nodes in the cluster.
Running bootstrap & decommission will take quite a long time, especially if you have a lot of data - you will stream all data twice, and this will increase the load on the cluster. The simpler solution would be to replace the old nodes by copying their data onto new nodes that have the same configuration as the old nodes, but with a different IP and with 3.0.24 installed (don't start that node!). Step-by-step instructions are in this answer; when it's done correctly you will have minimal downtime and won't need to wait for bootstrap/decommission.
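A hedged sketch of that copy-based replacement for a single node (paths and hostnames are placeholders; the full checklist is in the linked answer):
# On the old node: flush memtables and stop Cassandra cleanly
nodetool drain
sudo service cassandra stop
# Copy the data directories over to the prepared (not yet started) 3.0.24 node
rsync -a /var/lib/cassandra/ newnode:/var/lib/cassandra/
# On the new node: keep cluster_name, num_tokens and rack/DC identical,
# set listen_address/rpc_address to the new IP, then start Cassandra 3.0.24
sudo service cassandra start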
Another possibility, if you can't stop the running nodes, is to add all the new nodes as a new data center, adjust the replication factor to include it, use nodetool rebuild to force copying of the data to the new DC, switch the application to the new data center, and then decommission the whole old data center without streaming the data. In this scenario you stream the data only once. Also, this plays better if the new nodes are to have a different num_tokens value - it's not recommended to have different num_tokens on the nodes of the same DC.
P.S. It's usually not recommended to make changes to the cluster topology when you have nodes of different versions, but it could be OK for 3.0.16 -> 3.0.24.
To echo Alex's answer, 3.0.16 and 3.0.24 still use the same SSTable file format, so the complexity of the upgrade decreases dramatically. They'll still be able to stream data between the different versions, so your idea should work. If you're in a K8s-like environment, it might just be easier to redeploy with the new version and attach the old volumes to the replacement instances.
"What about if the number of tokens each node was responsible for was increased with the new nodes? E.G.: 0.16 nodes equally split the keyspace over 128 tokens but the new nodes 0.24 will split everything across 256 tokens."
A couple of points jump out at me about this one.
First of all, it is widely recognized by the community that the default num_tokens value of 256 is waaaaaay too high. Even 128 is too high. I would recommend something along the lines of 12 to 24 (we use 16).
I would definitely not increase it.
Secondly, changing num_tokens requires a data reload. The reason is that the token ranges change, and thus each node's responsibility for specific data changes. I have changed this before by standing up a new, logical data center and then switching over to it. But I would recommend not changing it if at all possible.
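If you do stand up a new logical DC with a lower token count, a minimal sketch for its nodes (assuming the stock config path; num_tokens only takes effect on a node that has never joined the cluster):
# Set num_tokens before the node's first start; it is ignored after bootstrap
sudo sed -i 's/^num_tokens:.*/num_tokens: 16/' /etc/cassandra/cassandra.yaml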
"In short, each new node was adding ITSSELF to list list of seed nodes as well."
So, while that's not recommended (every node a seed node), it's not a show-stopper. You can certainly run a nodetool repair/rebuild afterward to stream data to them. But yes, if you can get to the bottom of why each node is adding itself to the seed list, that would be ideal.

Added new nodes to cassandra cluster and data is missing

I have added 4 new nodes to an existing 4-node cluster. Now some data is missing from the cluster.
What could be the reason for this? What can I do to resolve it?
The keyspace with the missing data had RF=1 while I was adding the nodes to the cluster. Could that be the issue?
Note: once I added the new nodes to the cluster, I executed the repair command on all nodes.
You really shouldn't be running a RF of 1.
I imagine that if you added them all in a short timeframe with a low RF, the vnodes got shuffled from one node to another without settling. I'm surprised a full repair didn't do anything.
You might check the HDDs of the original nodes to see whether the old data is still there (i.e. whether the repair didn't delete it). If it is, you may be able to remove the new nodes (temporarily) and then add each node back in one by one while repairing.
Edit: additionally, you should probably use an odd number of nodes.
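Following the "you really shouldn't be running an RF of 1" point above, a hedged sketch of raising the replication factor and then repairing (keyspace and strategy are placeholders for your actual setup):
cqlsh -e "ALTER KEYSPACE my_ks WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3};"
# Existing data is not copied automatically; repair each node to materialize the new replicas
nodetool repair -full my_ks   # on older versions a plain 'nodetool repair' is already a full repair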

Looking for an open-source in-memory database with indexes [closed]

We are looking for an open-source in-memory database that supports indexes.
The use case is that we have a lot of items, and the data set is going to grow significantly.
Each item has a few fields on which we need to query.
Currently we store the data in the application's memory. However, with increasing data, we have to think about distributing/sharding the DB.
We have looked at a few options
Redis cluster could be used, but it does not have the concept of indexes or SQL-like queries.
Apache Ignite is in-memory and distributed, and it provides SQL queries. However, the problem is that Ignite fires all queries at all master nodes, so the final result will be as slow as the slowest of those queries. This seems like a problem because a non-performing/slow node out of a number of nodes can really slow down the application a lot. Further, in Ignite reads are done from the masters and slaves are not used, so it is difficult to scale the queries. Increasing the number of nodes will have a negative impact, as the number of queries will increase and it will be even slower.
Cassandra - the in-memory option in Cassandra could be used, but it seems that the max size of a table per node is 1 GB. If our table is more than 1 GB, we will have to resort to partitioning, which will in turn lead Cassandra to make multiple queries (one per node), and that is a problem (same as Ignite). Not sure whether reads on a Cassandra in-memory table can be scaled by increasing the number of slaves.
We are open to other solutions, but we wonder whether the multi-query issue will be a problem everywhere (e.g., Hazelcast).
The ideal solution for our use case would be an in-memory database with indexes whose reads could be scaled by increasing the number of slaves. Making it distributed/sharded will lead to multiple queries, and we are reluctant because one erring node could slow the whole system down.
Hazelcast supports indexes (sorted & unsorted), and what is important is that there is no multi-query problem with Hazelcast.
Hazelcast supports a PartitionPredicate that restricts the execution of a query to the node that is the primary replica of the key passed to the constructor of the PartitionPredicate. So if you know where the data resides, you can just query that node. There is no need to fix or implement anything to support it; you can use it right away.
It's probably not reasonable to use it all the time. Depends on your use-case.
For complex queries that scan a lot of data but return small results it's better to use OBJECT inMemoryFormat. You should get excellent execution times and low latencies.
Disclaimer: I am a GridGain employee and an Apache Ignite committer.
Several comments on your concerns:
1) Slow nodes will lead to problems in virtually any clustered environment, so I would not consider this a disadvantage. This is a reality you should embrace and accept. It is necessary to understand why a node is slow and fix/upgrade it.
2) Ignite is able to perform reads from backups (slaves) both for regular cache operations [1] and for SQL queries executed over REPLICATED caches. In fact, using a REPLICATED cache for reference data is one of the most important features allowing Ignite to scale smoothly.
3) As you correctly mentioned, currently a query is broadcast to all data nodes. We are going to improve this. First, we will let users specify the partitions to execute the query against [2]. Second, we are going to improve our optimizer so that it tries to calculate the target data nodes in advance to avoid the broadcast [3], [4]. Both improvements will be released very soon.
4) Last but not least, a persistence layer will be released in several months [5], meaning that Ignite will become a distributed database with both in-memory and persistence capabilities.
[1] https://ignite.apache.org/releases/mobile/org/apache/ignite/configuration/CacheConfiguration.html#isReadFromBackup()
[2] https://issues.apache.org/jira/browse/IGNITE-4523
[3] https://issues.apache.org/jira/browse/IGNITE-4509
[4] https://issues.apache.org/jira/browse/IGNITE-4510
[5] http://apache-ignite-developers.2346864.n4.nabble.com/GridGain-Donates-Persistent-Distributed-Store-To-ASF-Apache-Ignite-tc16788.html
I can give some opinions on Cassandra. The max size of your table per node is configurable and tunable, so it depends on the amount of memory you are willing to pay for. Partitioning is built into Cassandra, so basically Cassandra manages it for you, and it's relatively simple to do. Basically, the first part of the primary key syntax is the partition key, and it determines which node in the cluster the data lives on.
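For illustration (the keyspace, table, and column names are made up, and the keyspace is assumed to already exist), a minimal sketch of how the partition key is declared, run through cqlsh:
# item_id is the partition key (it decides the owning node); created_at is a clustering column
cqlsh -e "CREATE TABLE my_ks.items (item_id text, created_at timestamp, field1 text, PRIMARY KEY ((item_id), created_at));"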
But I also guess you are aware of this, since you mention multiple queries per node. I guess there is no nice way around it.
Just one slight remark: there are no masters and slaves in Cassandra; every node is equal. Basically the client asks any node in the cluster, that node then becomes the coordinator node, and since it gets the partition key it knows which node to ask for the data, which it then hands back to the client.
Other than that, I guess you have read up on Cassandra enough (from what I can see in your question).
Basically it comes down to the access pattern: if you know how you are going to access your data, then it's the way to go. But other databases are also pretty decent.
Indexing with Cassandra usually hides some potential performance problems. People usually avoid it because in Cassandra an index has to be built for every record there is on the whole cluster, and it's done per node. This doesn't really scale. Basically you always have to do the query first, no matter how you put it, with Cassandra.
Plus, the in-memory option seems to be part of DSE Cassandra, not the open-source or community one. You have to take this into account as well.

Cassandra Query for a Specific Node

I am using a Cassandra 1.2.15 cluster with 4 nodes, and I have a keyspace with a replication factor of 2 in simple network topology. I am using Murmur3Partitioner. I have used the default configurations that are available in the yaml file. The first node is the seed node; the other 3 nodes point to the first node as the seed node.
The first node's yaml configuration is:
initial_token: left empty
num_tokens: 256
auto_bootstrap: false
The other 3 nodes' yaml configuration is:
initial_token: left empty
num_tokens: 256
auto_bootstrap: true
I have three questions; my main question is Question 1.
Question 1:
I need to query a specific node in the cluster, i.e. in a four-node cluster I need to make a query to select all the rows in a column family from node 2 alone. Is it possible? If yes, how do I proceed?
Question 2:
Is my yaml configuration correct for the above approach?
Question 3:
Will this configuration cause any trouble in the future if I add two more nodes to the cluster?
Q1: I need to query a specific node in the cluster, i.e. in a four-node cluster I need to make a query to select all the rows in a column family from node 2 alone. Is it possible? If yes, how do I proceed?
Nope, not possible. What you can do is query a specific datacenter using the LOCAL_QUORUM or EACH_QUORUM consistency levels. Or you can connect to a specific node and query the system keyspace, which is specific to each node (by specifying the address in either cqlsh or your driver). There are times when this can be useful, but it's not what you're after.
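For example, a hedged sketch of pointing cqlsh at one particular node (the hostname is a placeholder; only the system keyspace is truly node-specific):
# Connect cqlsh directly to one node
cqlsh node2.example.com
# then, inside cqlsh, inspect that node's local view:
#   SELECT * FROM system.local;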
Q2: Is my yaml configuration correct for the above approach?
In 1.2 I think it might be a better idea to populate the tokens on your own for your initial nodes rather than leaving that to C*.
As for auto_bootstrap, false is the right choice for a fresh cluster node:
This setting has been removed from default configuration. It makes new (non-seed)
nodes automatically migrate the right data to themselves. When initializing a
fresh cluster with no data, add auto_bootstrap: false.
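A small, hedged illustration of that advice (the config path is the usual package default; only do this when initializing a cluster with no data):
# The option is absent from the default cassandra.yaml, so append it explicitly
echo 'auto_bootstrap: false' | sudo tee -a /etc/cassandra/cassandra.yaml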
Q3: Will this configuration cause any trouble in the future if I add two more nodes to the cluster?
I'd advise you to move away from simple network topology, simply because it complicates the process of expanding to multiple data centres. Another thing to remember is to enable auto-bootstrap for your new nodes, and it should work quite nicely with vnodes.

Usage of Cassandra 1.2 Vnodes in production [closed]

One year has passed since vnodes were released with Cassandra 1.2. I have read a couple of Datastax articles describing this feature, and they say the feature is awesome, but I want to ask the people who use it in production:
Is it really stable and ready for production?
What about repair speed and disk usage overhead while repair is running? This is very important for us.
What about rebalancing speed?
What about Hadoop stability/performance while using it with Cassandra vnodes enabled?
When should I avoid using vnodes?
We have 1.5 TB per node with RF=3. When I turn vnodes on, will all the data be redistributed? My concern is the network.
I can't answer all of your questions, but here's what I can help with.
Repair is only very slightly affected by vnodes. Assuming you have 256 tokens per node, there are 256 times as many repair tasks with each one being 256 times smaller. For anything other than a very small amount of data, the extra overhead in creating the extra tasks is negligible. So I don't think you will notice any difference with repair with 1.5 TB of data.
You don't need to rebalance with vnodes. When you add and remove nodes the cluster remains balanced.
Upgrading to vnodes is the biggest challenge. Practically all data needs to be redistributed. This can be done with shuffle (which in practice doesn't work very well so is not recommended), decommissioning and bootstrapping each node (which leaves one node temporarily storing a copy of all your data) or by duplicating your hardware and creating a new virtual data center and then decommissioning the old one.

Resources