I am writing an algorithm for which I want to add new nodes to the topology every minute for 5 minutes. Initially the topology contains 5 nodes, so after 5 minutes it should have a total of 10 nodes. How can I implement this in the simulation script? Which behaviour is best suited for this?
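The question doesn't name the simulator, so the following is only an assumption: if this is a JADE-style agent platform where each node is an agent, a TickerBehaviour that fires every 60 seconds and stops itself after five ticks is one natural fit. A minimal sketch ("NodeAgent" is a placeholder class name):

    import jade.core.Agent;
    import jade.core.behaviours.TickerBehaviour;
    import jade.wrapper.AgentController;
    import jade.wrapper.StaleProxyException;

    public class TopologyAgent extends Agent {

        private int added = 0;                     // new nodes created so far
        private static final int NODES_TO_ADD = 5; // one per minute for 5 minutes

        protected void setup() {
            // Fires every 60 000 ms; after the 5th tick the topology has 10 nodes.
            addBehaviour(new TickerBehaviour(this, 60000) {
                protected void onTick() {
                    try {
                        // "NodeAgent" is a placeholder for the class representing a node.
                        AgentController node = myAgent.getContainerController()
                                .createNewAgent("node-" + (6 + added), "NodeAgent", null);
                        node.start();
                        added++;
                    } catch (StaleProxyException e) {
                        e.printStackTrace();
                    }
                    if (added >= NODES_TO_ADD) {
                        stop(); // stop ticking after five new nodes
                    }
                }
            });
        }
    }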
Here's the thing: I am trying to simulate a manufacturing plant. Machine A is operated by 5 workers, and statistics show that, on average, every 20 working days there will be 2 workers off for 1 day. How do I set this up in Arena so that, out of these 5 people, 2 will have a day off? I was thinking about using Failure, but I can't find a function to take a random 2 out of the 5 workers.
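Arena itself would normally express this with resource Schedules or Failures rather than code, but just to make the intended logic concrete, here is a small sketch (plain Java, worker names made up) of the random draw itself: on every 20th working day, pick 2 of the 5 workers to be off for that day.

    import java.util.Arrays;
    import java.util.Collections;
    import java.util.List;

    public class DayOffDraw {
        public static void main(String[] args) {
            List<String> workers = Arrays.asList("W1", "W2", "W3", "W4", "W5");

            // On every 20th working day, draw 2 of the 5 workers at random
            // to be off for that single day.
            Collections.shuffle(workers);
            List<String> offToday = workers.subList(0, 2);
            System.out.println("Off today: " + offToday);
        }
    }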
I have the following SLA for my big data project: regardless of the number of concurrent Spark tasks, the max execution time shouldn't exceed 5 minutes. For example: if there are 10 concurrent Spark tasks,
the slowest task should take < 5 minutes, and as the number of tasks increases I have to be sure that this time won't exceed 5 minutes. The usual autoscaling is not appropriate here because adding new nodes takes a couple of minutes, and it doesn't solve the problem of sudden growth in the number of tasks (e.g. a jump from 10 concurrent tasks to 30 concurrent tasks).
I came across the idea of spinning up a new cluster on demand to meet the SLA requirements. Let's say I have found the maximum number of concurrent tasks (all of them are roughly equal and take the same resources) that can be executed simultaneously on my cluster within 5 minutes, e.g. 30 tasks. When the number of tasks approaches that threshold, a new cluster is spun up. The idea of this pattern is to work around the slowness of autoscaling and meet the SLA.
My question is: are there any alternative options to this pattern besides autoscaling a single cluster (which is not suitable for my use case because of the slowness of my Spark provider)?
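To make the pattern concrete, here is a rough sketch of the threshold check. The monitoring and provisioning calls, countPendingTasks() and provisionNewCluster(), are hypothetical placeholders for whatever your Spark provider's API offers; 30 is the measured capacity from the example above and 0.8 is an assumed safety margin.

    public class ClusterScaler {

        private static final int MAX_TASKS_PER_CLUSTER = 30; // measured 5-minute capacity
        private static final double SAFETY_MARGIN = 0.8;     // act before the limit is reached

        private int activeClusters = 1;

        public void checkLoad() {
            int pending = countPendingTasks();
            int capacity = activeClusters * MAX_TASKS_PER_CLUSTER;

            // Provision early, because a new cluster takes a couple of minutes to come up.
            if (pending > capacity * SAFETY_MARGIN) {
                provisionNewCluster();
                activeClusters++;
            }
        }

        // Placeholders: query the task queue / call the provider's provisioning API.
        private int countPendingTasks() { return 0; }
        private void provisionNewCluster() { }
    }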
We have 8 Cassandra nodes in a 2-DC configuration, with around 250k record insertions per second, and our TTL is 1 year. Recently we upgraded to 3.0.14. Our main problem is slow read performance.
To improve this we increased the compaction period from 1 hour to 1 week (full compaction still needs to run). But now we are thinking of installing new Cassandra nodes and dividing the data into 2 pieces (6-month periods each), so that the new Cassandra nodes are responsible only for querying the first 6 months. Is it possible to move data like this? What is the procedure to follow? Thanks.
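One way this kind of time split is sometimes approached (an illustration, not a recommendation for your specific cluster): put the new nodes in their own logical datacenter and create a keyspace that replicates only there, so the historical 6 months are served by those nodes only. The keyspace and DC names below are made up, and actually moving the old data (e.g. with sstableloader) is a separate step not shown. A rough sketch with the DataStax Java driver:

    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.Session;

    public class TimeSplitSketch {
        public static void main(String[] args) {
            Cluster cluster = Cluster.builder().addContactPoint("10.0.0.1").build();
            Session session = cluster.connect();

            // Keyspace that lives only on the new nodes, which sit in their own
            // logical datacenter (here called "DC_archive" -- a made-up name).
            session.execute(
                "CREATE KEYSPACE IF NOT EXISTS archive_6m WITH replication = "
                + "{'class': 'NetworkTopologyStrategy', 'DC_archive': 3}");

            // Queries for the first 6 months would then target archive_6m.<table>,
            // while the existing keyspace keeps serving the most recent data.
            cluster.close();
        }
    }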
When I have a 2-node cluster, list.add(xxx) distributes the data across the two nodes,
e.g. Node-1 carries 3 instances and Node-2 carries 3 instances, for a total of 6.
When I perform an abrupt shutdown of one of the nodes, the data/instances on that node are lost and only 3 remain available.
Is this the expected behaviour, or can it be tuned so that the other 3 are also available?
Any help would be appreciated!
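The product isn't named here, so this is only an assumption: if the list is, for example, a Hazelcast distributed list, then whether entries survive an abrupt member shutdown comes down to the backup count configured for the structure. A minimal sketch with Hazelcast 3.x, keeping one synchronous backup:

    import com.hazelcast.config.Config;
    import com.hazelcast.core.Hazelcast;
    import com.hazelcast.core.HazelcastInstance;
    import com.hazelcast.core.IList;

    public class BackupListSketch {
        public static void main(String[] args) {
            // Keep one synchronous backup copy of the list on another member,
            // so a single abrupt node shutdown does not lose its entries.
            Config config = new Config();
            config.getListConfig("instances").setBackupCount(1);

            HazelcastInstance hz = Hazelcast.newHazelcastInstance(config);
            IList<String> list = hz.getList("instances");
            list.add("instance-1");
        }
    }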
I'm currently exploring CouchDB replication and trying to figure out the difference between the max_replication_retry_count and retries_per_request configuration options in the [replicator] section of the configuration file.
Basically I want to configure continuous replication of a local CouchDB to a remote instance that would never stop replication attempts, considering potentially long periods of being offline (days or even weeks). So I'd like to have infinite replication attempts with a maximum retry interval of 5 minutes or so. Can I do this? Do I need to change the default configuration to achieve this?
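(For reference, the retry settings discussed below are separate from the replication definition itself; a continuous replication is typically defined by a document in the _replicator database along these lines, with placeholder URLs:)

    {
      "_id": "push-to-remote",
      "source": "http://localhost:5984/mydb",
      "target": "https://user:pass@remote.example.com/mydb",
      "continuous": true
    }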
Here are the replies I got on the CouchDB mailing lists:
If we are talking about Couch 1.6, the attribute retries_per_request
controls the number of attempts a running replication will make to
read the _changes feed before giving up. The attribute
max_replication_retry_count controls the number of times the whole
replication job will be retried by the replication manager.
Setting this attribute to "infinity" should make the replication
manager never give up.
I don't think the interval between those attempts is configurable. As
far as I understand, it starts at 2.5 seconds between retries and then
doubles until it reaches 10 minutes, which is a hard upper limit.
Extended answer:
The answer is slightly different depending on whether you're using the
1.x/2.0 releases or current master.
If you're using 1.x or 2.0 release: Set "max_replication_retry_count =
infinity" so it will always retry failed replications. That setting
controls how the whole replication job restarts if there is any error.
Then "retries_per_request" can be used to handle errors for individual
replicator HTTP requests. Basically the case where a quick immediate
retry succeeds. The default value for "retries_per_request" is 10.
After the first failure, there is a 0.25 second wait. Then on next
failure it doubles to 0.5 and so on. Max wait interval is 5 minutes.
But if you expect to be offline routinely, maybe it's not worth
retrying individual requests for too long, so reduce
"retries_per_request" to 6 or 7. Individual requests would then retry
a few times for about 10 - 20 seconds, and then the whole replication
job will crash and retry.
If you're using current master, which has the new scheduling
replicator: No need to set "max_replication_retry_count", that setting
is gone and all replication jobs will always retry for as long as the
replication document exists. But "retries_per_request" works the same
as above. The replication scheduler also does exponential backoff when
replication jobs fail consecutively. First backoff is 30 seconds. Then
it doubles to 1 minute, 2 minutes, and so on. Max backoff wait is
about 8 hours. But if you don't want to wait 4 hours on average for
the replication to restart when network connectivity is restored, and
want it to be about 5 minutes or so, set "max_history = 8" in the
"replicator" config section. max_history controls how much history of
past events is retained for each replication job. If there is less
history of consecutive crashes, that backoff wait interval will also
be shorter.
So to summarize, for 1.x/2.0 releases:

[replicator]
max_replication_retry_count = infinity
retries_per_request = 6

For current master:

[replicator]
max_history = 8
retries_per_request = 6
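As a rough sanity check on retries_per_request = 6: with the first per-request retry wait at 0.25 seconds and doubling on each failure (as described above), the waits add up to about

    0.25 + 0.5 + 1 + 2 + 4 + 8 = 15.75 seconds

before the whole replication job crashes and is retried, which matches the "10 - 20 seconds" mentioned earlier.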