Hazelcast list, partition data loss

When I have a 2 node cluster, list.add(xxx) splits the data across the two nodes.
e.g.: Node-1 carries 3 instances,
Node-2 carries 3 instances,
for a total of 6.
When I perform an abrupt shutdown of one of the nodes, the data/instances on that node are lost and only 3 remain available.
Is this the expected behaviour, or can it be tuned so that the other 3 are also retained?
Any help would be appreciated!
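A minimal sketch of one way to tune this, assuming the collection is a Hazelcast IList configured programmatically (the list name mylist is a placeholder): giving the list a backup count of 1 asks Hazelcast to keep a synchronous copy of every item on another member, so the items held by a member that is shut down abruptly are still available from the surviving node.

    import com.hazelcast.config.Config;
    import com.hazelcast.core.Hazelcast;
    import com.hazelcast.core.HazelcastInstance;
    import java.util.List;

    public class BackupListExample {
        public static void main(String[] args) {
            Config config = new Config();
            // Keep one synchronous backup of every list item on another member,
            // so an abrupt shutdown of a single node does not lose its share of the data.
            config.getListConfig("mylist")
                  .setBackupCount(1)
                  .setAsyncBackupCount(0);

            HazelcastInstance hz = Hazelcast.newHazelcastInstance(config);
            List<String> list = hz.getList("mylist");
            list.add("xxx");
        }
    }

The same backup-count setting can also be declared in hazelcast.xml under the list element instead of in code.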

Related

Slurm: can I create a sub-queue using a subset of resources in a single node?

I have a use case with Slurm and I wonder if there is a way to handle it.
Constraints:
I would like to run several jobs (say 60 jobs).
Each one takes a few hours, e.g. 3h/job.
In the cluster managed by Slurm, I use a queue with 2 nodes with 4 GPUs each (so I can restrict my batch script to one node).
Each job takes 1 GPU.
Problem: if I put everything in the queue, I will block 4 GPUs even if I specify only 1 node.
Desired solution: avoid blocking a whole machine by taking, say, 2 GPUs only.
How can I put them in the queue without them taking all 4 GPUs?
Could I create a kind of sub-queue that would be limited to a subset of a node's resources, for example?
You can use the Slurm consumable trackable resources plug-in (cons_tres, enabled in your slurm.conf file; more info here: https://slurm.schedmd.com/cons_res.html#using_cons_tres) to do one of the following (a job-script sketch follows the list):
Specify the --gpus-per-task=X
-or-
Bind a specific number of gpus to the task with --gpus=X
-or-
Bind the task to a specific gpu by its ID with --gpu-bind=GPUID
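For example, a minimal job-script sketch for one of the 60 jobs using the --gpus=1 option above (the partition name gpu_queue and the command ./my_job are placeholders); because each job requests only a single GPU, the scheduler can pack several such jobs onto one node without blocking all 4 GPUs:

    #!/bin/bash
    #SBATCH --job-name=one-gpu-job     # placeholder name
    #SBATCH --partition=gpu_queue      # placeholder partition/queue name
    #SBATCH --nodes=1
    #SBATCH --ntasks=1
    #SBATCH --gpus=1                   # request a single GPU, not the whole node

    srun ./my_job                      # placeholder command

Submitting 60 copies of such a script (or a job array) then lets Slurm schedule them GPU by GPU, provided cons_tres is enabled as described above.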

Adding a new node in the topology after the given time interval

I am writing an algorithm for which I want to add a new node to the topology every minute for 5 minutes. Initially the topology contains 5 nodes, so after 5 minutes it should have 10 nodes in total. How can I implement this in the simulation script? Which behaviour would be best suited to do this?
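The question does not name the simulation framework; the mention of "behaviour" suggests a JADE-style agent, so the following is only a sketch under that assumption (addNode() is a hypothetical helper standing in for whatever adds a node to your topology). A TickerBehaviour firing every 60 seconds and stopping itself after 5 ticks would match the requirement:

    import jade.core.Agent;
    import jade.core.behaviours.TickerBehaviour;

    public class TopologyAgent extends Agent {
        @Override
        protected void setup() {
            // Fire once every 60 seconds.
            addBehaviour(new TickerBehaviour(this, 60000) {
                @Override
                protected void onTick() {
                    addNode();                 // hypothetical helper: add one node to the topology
                    if (getTickCount() >= 5) { // 5 nodes added after 5 minutes: stop ticking
                        stop();
                    }
                }
            });
        }

        private void addNode() {
            // placeholder: create and register the new node in your simulation here
        }
    }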

Can we modify the number of tablets/shards per table after we create a universe?

As described in this example, each tserver started with 12 tablets because we set the number of shards to 4.
When we added a new node, the number of tablets per tserver became 9; it seems the total number of tablets, which is 36, will not increase.
My question is:
How many nodes could we add while we have 36 total tablets (in this example)?
And is it possible to increase the shard count in a running universe to be able to add more nodes?
How many nodes could we add while we have 36 total tablets (in this example)?
In this example, you can expand to 12 nodes (each node would end up with 1 leader and 2 followers).
Reasoning: there are 36 total tablets for this table and the replication factor is 3, so there will be 12 tablet leaders and 24 tablet followers. Leaders are responsible for handling writes and reads (unless you're doing follower reads; let's assume that is not the case). If you go to 12 nodes, each node would have at least one leader and be doing some work.
Ideally, you should create enough tablets upfront so that you end up with 4 tablets per node eventually.
And is it possible to increase the shard count in a running universe to be able to add more nodes?
This is currently not possible, but it is being worked on and is getting close to the finish; it is expected to be released in Q1 2020. If you are interested in this feature, please subscribe to this GitHub issue for updates.
Until that is ready, as a workaround, you can split the table into a sufficient number of tablets.
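As an illustration of that workaround, a sketch assuming the YCQL API, where the tablet count can be set as a table property at creation time (the keyspace, table, and the value 48 are placeholders):

    CREATE KEYSPACE IF NOT EXISTS example;
    -- Create the table with more tablets up front (48 is an arbitrary example value),
    -- so the cluster can later be expanded beyond the limit imposed by 36 tablets.
    CREATE TABLE example.users (
        id INT PRIMARY KEY,
        name TEXT
    ) WITH tablets = 48;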

Cassandra cluster is taking a lot of time when I add a new node

I have a 5 node Cassandra cluster and each node currently owns 1 TB of data. When I tried adding another node, it took 15+ hours to reach the 'UN' state.
Is there a way to make it fast?
Cassandra version: 3.0.13
Environment: AWS, m4.2xlarge machines.
1 TB is a lot of data per node. Since you have a 5 node cluster and you're adding a new node, that node will take on roughly 0.833 TB of data that has to be streamed from the other nodes. That is the equivalent of 6.67 Tbit, or 6990507 Mbit. Cassandra has a default value for stream_throughput_outbound_megabits_per_sec of 200. 6990507 / 200 = 34952.535 seconds, or about 9.7 hours, to transfer all the data. Since you're probably running other traffic at the same time, this could very well take 15 hours.
Solution: Change the stream_throughput_outbound_megabits_per_sec on all nodes to a higher value.
Note: Don't forget to run nodetool cleanup after the node has joined the cluster.
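For example (the value 800 is illustrative), the throughput can be raised permanently in cassandra.yaml or adjusted on a running node with nodetool, and nodetool cleanup then reclaims space on the pre-existing nodes once the new node has joined:

    # cassandra.yaml on every node (restart required)
    stream_throughput_outbound_megabits_per_sec: 800

    # or change it live on each node, value in megabits per second
    nodetool setstreamthroughput 800

    # after the new node reaches the UN state, on each pre-existing node
    nodetool cleanup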

Why does hinted handoff not work sometimes?

I was testing the effect of playing around with some of the parameters relevant to hinted handoff.
I have 3 nodes in the datacenter with the replication strategy set to SimpleStrategy. I wrote 200k entries (12 MB) to 2 nodes while the 3rd node was down. After the write was successful, I brought up the 3rd node.
I then left the system undisturbed for 3 minutes, and after 3 minutes I turned off the 2 nodes that had been up since the beginning. Then I queried the 3rd node via CQL.
The above procedure was repeated three times. All the configurations were exactly the same in the 1st and 3rd iterations.
The parameters I played with were hinted_handoff_throttle_in_kb and max_hints_delivery_threads.
In the 1st and 3rd iterations, I had set hinted_handoff_throttle_in_kb: 2048 and max_hints_delivery_threads: 4, and in the 2nd iteration I had set hinted_handoff_throttle_in_kb: 1024 and max_hints_delivery_threads: 2.
The observations:
In the 1st iteration, node 3 contained more than 195k rows.
In the 2nd iteration, node 3 contained more than 60k rows.
In the 3rd iteration, node 3 contained 0 rows.
I'm not able to understand what makes hinted handoff work in the first 2 cases but not in the 3rd, despite the fact that during the 1st and 3rd iterations all the configurations were exactly the same.
System: RHEL
Cassandra Version: 3.0.14
