Cassandra - alter Keyspace without downtime

I'm coming from a relational database background. In PostgreSQL or MySQL, any ALTER statement locks the entire table.
I have a Cassandra cluster (3 nodes) that uses SimpleStrategy for all its keyspaces. One keyspace has 4 tables, and each table holds 500 GB of data.
So if I alter the keyspace to change the replication strategy, is there any lock or blocking?

The change itself is fast: a keyspace in Cassandra is not a physical object, it's just metadata. But when you change the replication settings, you need to run a repair operation, as per the documentation:
Simply altering the keyspace may lead to faulty data replication.
If you have a single datacenter you may not hit this problem, but it is still better to run the repair.
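A rough sketch of the whole operation (my_ks and dc1 are placeholder names, and the replication factor is only an example): first the metadata-only ALTER, then a repair on each node so the replicas match the new settings.
ALTER KEYSPACE my_ks
  WITH REPLICATION = {'class': 'NetworkTopologyStrategy', 'dc1': 3};
nodetool repair -full my_ks
The ALTER only rewrites schema metadata, so it does not lock or block reads and writes on the 500 GB tables; the repair is the part that actually moves data around.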

Related

How to flush data in all tables of a keyspace in Cassandra?

I am currently writing tests in Go and I want to get rid of all the data in the tables after the tests finish. I was wondering if it is possible to flush the data of all tables in Cassandra.
FYI: I am using Cassandra 3.11.
The term "flush" is ambiguous in this case.
In Cassandra, "flush" is an operation where data is "flushed" from memory and written to disk as SSTables. Flushing can happen automatically based on certain triggers or can be done manually with the nodetool flush command.
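For reference, a manual flush of a single table is just a nodetool call (keyspace and table names here are placeholders):
nodetool flush ks_name table_name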
However, based on your description, what you want is to "truncate" the contents of the tables. You can do this using the following CQL command:
cqlsh> TRUNCATE ks_name.table_name;
You will need to iterate over each table in the keyspace. For more info, see the CQL TRUNCATE command. Cheers!
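Since every table has to be hit, one way to script the iteration is to read the table names from system_schema.tables and truncate each one. A rough sketch (ks_name is a placeholder, and the awk filtering of cqlsh's tabular output is approximate):
# list the tables of the keyspace, then truncate them one by one
cqlsh -e "SELECT table_name FROM system_schema.tables WHERE keyspace_name = 'ks_name';" \
  | awk 'NR > 3 && NF > 0 && $0 !~ /rows\)/ {print $1}' \
  | while read -r t; do
      cqlsh -e "TRUNCATE ks_name.\"$t\";"
    done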

Cassandra multi-DC: write on LOCAL and read from any DC

We use a multi-data-center (DC) Cassandra cluster. During writes to the cluster, I want only the LOCAL DC to perform the write on its nodes, because we already route each write request to the desired DC based on where the write originates. So I want only the LOCAL DC to process the write and no other DC to perform the write on its nodes. But later on, by virtue of replication among nodes across DCs, I want the written data to be replicated across DCs. Is this replication across DCs possible when I am restricting the write to only one DC in the first place? If I do not open connections to REMOTE hosts in other DCs during my write operation, is data replication among DCs still possible later on? The reason I definitely need replicas of the data in all DCs is that during reads from the cluster we want the data to be readable from whichever DC the read request lands on, not necessarily the LOCAL one.
Does anyone have a solution to this?
You may want to use LOCAL_QUORUM consistency for writes if you want them to be confirmed only within the local DC.
Check the keyspace definition for the keyspace you want this behaviour on. It should use the NetworkTopologyStrategy class and define a replication factor in both DCs, something like this:
ALTER KEYSPACE <Keyspace_name> WITH REPLICATION =
{'class' : 'NetworkTopologyStrategy', 'dc1' : 3, 'dc2' : 2};
This means that once the consistency level is satisfied in the local DC, Cassandra will still propagate the writes to the other DC in the background.
Use QUORUM consistency for reads if they are not restricted to one DC, but be aware that it might add a bit of latency because Cassandra has to read data from the other data center as well.
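As a small illustration in a cqlsh session (the keyspace, table, and column names below are made up): the write is confirmed once a quorum of local-DC replicas acknowledge, while the coordinator still ships the mutation to the remote DC asynchronously; the read at plain QUORUM counts replicas across all DCs.
CONSISTENCY LOCAL_QUORUM
INSERT INTO my_ks.events (id, payload) VALUES (uuid(), 'hello');
CONSISTENCY QUORUM
SELECT payload FROM my_ks.events;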

Decommissioning One of Two Datacenters

I have one Cassandra datacenter; let's name it DC1. Then I added a new datacenter to grow the node count; let's name it DC2. I use replication factors DC1:3 and DC2:3. I write all my data with LocalDC=DC2 and ConsistencyLevel.LOCAL_QUORUM, so I am sure that all write requests go to DC2. I want to remove DC1, and I don't want to run a nodetool repair command; I don't want to wait.
Can I simply change all keyspaces to replication DC2:3 only and run nodetool decommission on the DC1 nodes?
Yes.
As you say you are sure that all writes already reached DC2 and there is no replication lag between the two data centers, you can skip the repair step:
Change the replication of every keyspace using ALTER KEYSPACE to drop DC1 (see the sketch below)
Decommission the DC1 nodes one by one
See this: https://docs.datastax.com/en/cassandra/2.0/cassandra/operations/ops_decomission_dc_t.html
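A minimal sketch of those two steps (my_ks is a placeholder; repeat the ALTER for every non-system keyspace before touching the nodes): drop DC1 from each keyspace's replication map, then run the decommission on each DC1 node, one at a time.
ALTER KEYSPACE my_ks
  WITH REPLICATION = {'class': 'NetworkTopologyStrategy', 'DC2': 3};
nodetool decommission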

Cassandra: Migrate keyspace data from Multinode cluster to SingleNode Cluster

I have a keyspace in a multi-node cluster in a QA environment. I want to copy that keyspace to my local single-node cluster. Is there any direct way to do this? I can't afford to write code such as an SSTableLoader implementation at this point. Please suggest the quickest way.
Make sure you have plenty of free disk space on your new node, and that you've properly set the replication factor and consistency levels in your tests/build for your new, single-node "cluster".
First, restore the exact schema from the old cluster to your new node (a sketch of that step is at the end of this answer). After that, the data can be loaded in two ways:
1.) Execute the "sstableloader" utility on every node in your old cluster and point it at your new node. sstableloader is token aware, but in your case it will end up shipping all data to your new, single node cluster.
sstableloader -d NewNode /Path/To/OldCluster/SStables
2.) Snapshot the keyspace and copy the raw sstable files from the snapshot folders of each table in your old cluster to your new node. Once they're all there, copy the files to their corresponding table directory and run "nodetool refresh."
# Rinse and repeat for all tables
nodetool snapshot -t MySnapshot
cd /Data/keyspace/table-UUID/snapshots/MySnapshot/
rsync -avP ./*.db User@NewNode:/NewData/Keyspace/table-UUID
...
# when finished, exec the following for all tables in your new node
nodetool refresh keyspace table
Option #1 is probably best because it will stream the data and compact naturally on the new node. It's also less manual work. Option #2 is good, quick, and dirty if you don't have a direct line from one cluster to the other. You probably won't notice much difference since it's probably a relatively small keyspace for QA.
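For the schema-restore step mentioned above, a rough sketch (the keyspace name and host names are placeholders) is to export the DDL with cqlsh on the old cluster and replay it on the new node:
# on a node of the old cluster: dump the keyspace DDL
cqlsh OldNode -e "DESCRIBE KEYSPACE my_ks" > my_ks_schema.cql
# on the new single-node cluster: recreate it
# (adjust the replication settings in the file for one node first)
cqlsh NewNode -f my_ks_schema.cql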

Cassandra cluster with each node total replication

Hi, I'm new to Cassandra. I have a 2-node Cassandra cluster. For reasons imposed by the front end I need...
Total replication of all data on each of the two nodes.
Eventually consistent writes, so the node being written to responds with an acknowledgement to the front end straight away, without waiting for the replication.
Can anyone tell me whether this is possible? Is it done in the YAML file? I know there are properties there for consistency, but I don't see that any of the partitioners suit my needs. Where can I set the replication factor?
Thanks
You set the replication factor when you create the keyspace. So if you use (and plan to keep using) a single-data-center setup, you create the keyspace using cqlsh like so:
CREATE KEYSPACE "Excalibur"
WITH REPLICATION = {'class' : 'SimpleStrategy', 'replication_factor' : 3};
Check out the documentation regarding CREATE KEYSPACE. How this is handled internally depends on the cluster's snitch definition and on the strategy option defined per keyspace. In the case of the SimpleStrategy above, Cassandra simply assumes a ring topology of your cluster and places the replicas clockwise around that ring.
Regarding consistency, you can use different consistency levels for write and read operations in your client/driver on a per-operation basis:
Cassandra extends the concept of eventual consistency by offering tunable consistency―for any given read or write operation, the client application decides how consistent the requested data should be.
Read the doc
If you use Java in your clients and the DataStax Java driver, you can set the consistency level using
QueryOptions.setConsistencyLevel(ConsistencyLevel consistencyLevel)
"One" is the default setting.
Hope that helps
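For the specific setup in the question (2 nodes, a full copy on each, writes acknowledged straight away), a minimal sketch could be a replication factor of 2 combined with consistency level ONE in cqlsh (the keyspace name is a placeholder): with RF 2 on a 2-node cluster every node holds a full copy, and at consistency ONE the write is acknowledged as soon as one replica has accepted it, while the other replica is updated asynchronously.
CREATE KEYSPACE frontend_ks
  WITH REPLICATION = {'class' : 'SimpleStrategy', 'replication_factor' : 2};
CONSISTENCY ONE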
