Extending a Cassandra cluster with a datacenter in China (behind the GFW)

I need to extend my cluster with a new datacenter in mainland China, behind the Great Firewall. Currently I have datacenters in the US and Europe, so the cluster already meets the requirements of the Geographical Location Scenario.
At this point I have the Chinese infrastructure ready for Cassandra, but the network statistics from the past few days are a bit troubling, and I am worried about whether and how this could affect my current cluster, and whether the new datacenter will be functional at all.
My actual questions are:
How does Cassandra handle heavy packet loss during replication? (occasionally up to 40%)
How does it affect the cluster when the network connection between two datacenters is really bad (only a few kilobits per second, and latency as above) for hours?
Will the Chinese DC be considered dead, or will Cassandra still try to use the limited bandwidth?
Can this cause any problems in the non-Chinese datacenters? E.g. they become slow, which results in client request timeouts.
Is it possible to enforce that only one of my non-Chinese datacenters communicates with the Chinese one, or should I trust Cassandra to handle this? (I am trying to avoid possibly harming all my datacenters.)
Is there any way to speed up the initial data replication (nodetool rebuild)? At the current speed it would take weeks to replicate our existing data.
Any suggestions or remarks are welcome, thanks!

How does Cassandra handle heavy packet loss during replication? (occasionally up to 40%)
Usually packet loss will cause a large number of read repairs. In some cases it can cause requests to fail, depending on your replication factor and consistency level. Also, be prepared for very costly repairs, which will create a lot of tiny SSTables and a substantial amount of IO.
I would suggest running a test in a development environment to see the actual behavior in your system. There are plenty of tools for simulating a bad network.
How does it affect the cluster when the network connection between two datacenters is really bad (only a few kilobits per second, and latency as above) for hours? Will the Chinese DC be considered dead, or will Cassandra still try to use the limited bandwidth? Can this cause any problems in the non-Chinese datacenters?
It largely depends on how bad the connection is and what consistency level/replication factor you are running with. In some cases it will just cause rather high latency between datacenters. However, if the connection is bad enough that nodes start marking each other as down, then you are looking at issues in all datacenters. Your existing datacenters will struggle with performance caused by requests timing out. This will in turn cause requests to be held longer in memory, which can lead to GC pressure. (It can cause a number of other issues in your other datacenters as well.)
The threshold for how sensitive the failure detector is can be adjusted and tuned to suit your use case. Raising phi_convict_threshold in cassandra.yaml decreases the likelihood of a node being marked as down. If you find the sweet spot where your nodes are not marked down despite being slow to respond, Cassandra can make use of what little bandwidth it has to work with.
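On the client side, a complementary measure (not from the original answer, just a sketch of a common approach) is to pin application requests to their local datacenter and use LOCAL_* consistency levels, so a degraded remote datacenter never sits on the read/write critical path. Below is a minimal sketch using the DataStax Java driver 3.x; the contact point and datacenter name are placeholders.

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ConsistencyLevel;
import com.datastax.driver.core.QueryOptions;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.policies.DCAwareRoundRobinPolicy;

public class LocalDcClient {
    public static void main(String[] args) {
        // Route requests only to nodes in the client's own datacenter and require
        // acknowledgement from local replicas only; the remote (e.g. Chinese)
        // datacenter is then replicated to asynchronously, off the request path.
        Cluster cluster = Cluster.builder()
                .addContactPoint("10.0.0.1")               // placeholder local node
                .withLoadBalancingPolicy(DCAwareRoundRobinPolicy.builder()
                        .withLocalDc("us_east")            // placeholder datacenter name
                        .build())
                .withQueryOptions(new QueryOptions()
                        .setConsistencyLevel(ConsistencyLevel.LOCAL_QUORUM))
                .build();
        try {
            Session session = cluster.connect();
            session.execute("SELECT release_version FROM system.local");
        } finally {
            cluster.close();
        }
    }
}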
Is it possible to enforce that only one of my non-Chinese datacenters communicates with the Chinese one, or should I trust Cassandra to handle this? (trying to avoid possibly harming all my datacenters)
There is not really a way to tell Cassandra to limit which datacenters it speaks to. You are more or less stuck communicating with every datacenter you include in your keyspace's replication settings.
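For context, that replication is defined per keyspace: only the datacenters named in the keyspace's NetworkTopologyStrategy receive replicas, and therefore inter-datacenter write traffic. A minimal sketch of setting per-datacenter replica counts, executed through the DataStax Java driver 3.x; the keyspace name, datacenter names and counts are placeholders.

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;

public class ReplicationSetup {
    public static void main(String[] args) {
        try (Cluster cluster = Cluster.builder().addContactPoint("10.0.0.1").build();
             Session session = cluster.connect()) {
            // Each datacenter listed here gets the given number of replicas;
            // a datacenter that is left out receives no data for this keyspace.
            session.execute(
                "ALTER KEYSPACE my_keyspace WITH replication = {"
              + " 'class': 'NetworkTopologyStrategy',"
              + " 'us_east': 3, 'eu_west': 3, 'cn_north': 2 }");
        }
    }
}

In other words, the only real lever is which datacenters you replicate a keyspace to and with how many replicas; there is no supported way to funnel all traffic to China through a single "gateway" datacenter.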
Is there any way to speed up the initial data replication (nodetool rebuild)? At the current speed it would take weeks to replicate our existing data.
I would recommend against using sstableloader, since it functions very similarly to rebuild and requires a snapshot to operate. If the network is what is causing the slow speed, then changing the way you stream is not going to make much difference.
In my opinion, the first thing to do is measure where the bottleneck actually is in your system. If the slow network really is the bottleneck, you could add more nodes so that you stream from more sources at the same time, but ultimately you will still be hampered by the slow network connection.

Related

Cassandra Scaling: Is it a good idea to use a common mount for multi node Cassandra DB?

I have Cassandra running in two different DCs, and now it's time to scale it up and add more storage. Unfortunately, I'm not able to add storage on the existing partitions due to restrictions/limitations. I'd like to know whether it would be a good idea to use one common mount (NFS) to store the data. I know Cassandra is distributed across many nodes, but can they share a common mount to access the data?
Thank you,
No, it is not a good idea to do that. Essentially, you're trading disk I/O for network I/O; so it'll perform terribly. Also, you're introducing a single point of failure into your cluster.
DataStax published a blog post on this a couple of years ago. The important thing to remember is that blog posts don't usually get written about isolated incidents. They happen because someone sees the same thing causing problems over and over again, and they're trying to stop others from rationalizing that same mistake.
https://www.datastax.com/dev/blog/impact-of-shared-storage-on-apache-cassandra

RethinkDB with more throughput?

I am looking to build out a realtime pub/sub database backend. RethinkDB is actually a perfect package for what I need, mainly because of its very low-latency changefeeds. But RethinkDB seems to be a DB where you can expect about 10k-20k inserts per second on two machines, whereas I have seen postings claiming people get around 1 million inserts per second on DBs like Cassandra with comparable hardware; Cassandra, however, doesn't have the realtime changefeed feature.
So my question is: is there another DB, or combination of open-source systems, that can provide the low-latency changefeed functionality of RethinkDB, but at a much larger scale than RethinkDB? Both the number of inserts per second and the number of users subscribed to changefeeds are important requirements and need to be as high as possible.
RethinkDB might still fit your needs if you can scale out to a robust cluster (lots of nodes). Below is a link to a report they generated with performance metrics scaling up to a 16-node cluster.
https://rethinkdb.com/docs/2-1-5-performance-report/

Hazelcast: server node and client node best practices for a beginner

I am new to Hazelcast and want to understand the roles of client and server instances in a cluster.
Let's say I have 4 different servers/machines (not referring to Hazelcast servers) and I want to maximize RAM utilization:
Do I start 4 server instances, one on each server/machine?
Do I start 4 client instances, one on each server/machine?
Is business logic written only in the client instances? If so, do the server instances not contain any logic apart from managing the lifecycle?
I know this would vary per requirement, but I want to get a general idea.
Adding on to Ernest's statements: you would usually expect data to be held in the cache and processing to be done on the client. However, with Hazelcast it doesn't have to be that way; check out features like ExecutorService and EntryProcessors in the documentation.
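To illustrate the EntryProcessor idea (shipping a small piece of logic to the member that owns the key instead of pulling the value to the client), here is a minimal sketch assuming the Hazelcast 3.x API; the map name and the increment logic are purely illustrative.

import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.core.IMap;
import com.hazelcast.map.AbstractEntryProcessor;

import java.util.Map;

public class ServerSideIncrement {
    public static void main(String[] args) {
        HazelcastInstance member = Hazelcast.newHazelcastInstance(); // embedded "server" member
        IMap<String, Long> counters = member.getMap("counters");
        counters.put("page-hits", 0L);

        // The processor is serialized and executed on the member that owns the key,
        // so the value itself never crosses the network: the logic travels to the data.
        counters.executeOnKey("page-hits", new AbstractEntryProcessor<String, Long>() {
            @Override
            public Object process(Map.Entry<String, Long> entry) {
                entry.setValue(entry.getValue() + 1);
                return entry.getValue();
            }
        });

        System.out.println(counters.get("page-hits")); // prints 1
        Hazelcast.shutdownAll();
    }
}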
You may also want to look at the concept of a near cache, where you still hold the data on dedicated Hazelcast instances (servers) while maintaining a near cache in the client. Be wary of the data synchronization challenges around this, though it works well in most cases (again, very subjective).
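A rough sketch of that setup, again assuming the Hazelcast 3.x API: a dedicated member holds the map, and a client keeps a near cache of hot entries so repeated reads avoid the network round trip. The member address and map name are placeholders.

import com.hazelcast.client.HazelcastClient;
import com.hazelcast.client.config.ClientConfig;
import com.hazelcast.config.NearCacheConfig;
import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.core.IMap;

public class NearCacheClientExample {
    public static void main(String[] args) {
        // Stand-in for a dedicated server member (in production this runs on its own machine).
        HazelcastInstance member = Hazelcast.newHazelcastInstance();

        ClientConfig config = new ClientConfig();
        config.getNetworkConfig().addAddress("127.0.0.1:5701");      // placeholder member address
        config.addNearCacheConfig(new NearCacheConfig("products"));  // enable a near cache for this map

        HazelcastInstance client = HazelcastClient.newHazelcastClient(config);
        IMap<String, String> products = client.getMap("products");
        products.put("sku-42", "widget");
        products.get("sku-42"); // first read goes to the cluster
        products.get("sku-42"); // subsequent reads are served from the client's near cache

        client.shutdown();
        member.shutdown();
    }
}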
Hope these pointers give you some ideas to start off with. All the best!
There is no single answer to your question; there are many factors to consider. For example, one of your questions is where the business logic resides. This depends heavily on how Hazelcast is used. Let's say Hazelcast is used purely for caching purposes: the business logic then resides entirely on the client side.
Alternatively, if Hazelcast holds rich POJOs and domain-driven design is used, then the logic lies entirely on the Hazelcast instances themselves. Usually, in real life, the truth is somewhere in between.
In terms of memory utilization, again this depends very much on your setup, budget and so on. If you have one server with a lot of RAM and you don't use any commercial add-ons from Hazelcast such as off-heap memory, then running several Hazelcast instances on the same machine with a limited amount of memory each is more beneficial than running a single node with a lot of memory.
Also note that allocating more than 32 GB of heap pushes you into the "64-bit universe": the JVM can no longer use compressed object pointers, so object references take up more memory.
Again, this depends on many factors. If you have a live, interactive application, you cannot tolerate long GC pauses, so you would lean toward more Hazelcast instances with small heaps. If you have a non-interactive application that tolerates long GC pauses, it is the other way around: you can have a big heap. So, as you can see, there is no simple answer to your question.

Sizing an Azure Geo Replicated Database

If I have an S2 SQL database and I create a secondary geo-replicated database, should it be the same size (S2)? I see that you get charged for the secondary DB, but the DTUs reported against that secondary are at 0%, which seems to indicate that S2 is too large.
Obviously, we'd like to save the cost and move the secondary to a smaller size if at all possible.
Considerations
I understand that if we need to fail over to the secondary, it would at that point need to be bumped up to S2 to meet the production workload, but I am assuming we could do that at the time of failover?
I also get that if we were actively using the replicated DB for reporting, etc., we'd have to size it accordingly to meet that demand. But currently we are not using the secondary for anything other than as a failover target if it is ever needed.
At this point both primary and secondary must be in the same edition but can have different performance objectives (DTU size). We are working on lifting that limitation so that geo-replicated databases can scale to a different edition when needed without breaking the replication link (e.g. Standard to Premium).
Regarding sizing the secondary, you *can* make it smaller in DTU than the primary if you believe that the updates take less capacity than reads (a high read/write ratio). But as noted earlier, you will have to upsize it right after the failover, and that may take time during which your app's performance will be impacted. In general, we do not recommend making the secondary more than one level smaller. E.g. S3 -> S1 is not a good idea, as it will likely cause replication lag and may result in excessive data loss after failover.
You can safely change the tier of the secondary database, but bear in mind that in the case of failover you will face performance issues. Also, you can't scale past your current performance tier (so both databases ought to be in the same tier).
And yes, you can change the size after failover, but the process is manual.
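If you want to script that manual step, one option is to issue the T-SQL scale command against the logical server right after failover. Below is a sketch using plain JDBC; the server, credentials, database name and target service objective are all placeholders, and the scale operation completes asynchronously after the statement returns.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class UpsizeSecondaryAfterFailover {
    public static void main(String[] args) throws Exception {
        // Connect to the master database on the logical server that now hosts the primary.
        String url = "jdbc:sqlserver://myserver.database.windows.net:1433;"
                   + "database=master;user=admin@myserver;password=<password>;encrypt=true;";
        try (Connection conn = DriverManager.getConnection(url);
             Statement stmt = conn.createStatement()) {
            // Bump the former secondary back up to the production tier.
            stmt.execute("ALTER DATABASE [MyAppDb] MODIFY (SERVICE_OBJECTIVE = 'S2')");
        }
    }
}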

synthetic multi-node crossbar system implementation

I am implementing a system composed of a collection of small systems, i.e. Raspberry Pi, Yun, BeagleBone, the occasional PC. Crossbar.io has real promise ... but, as I understand it, it doesn't currently support multiple nodes. Am I correct? Does anyone know when that might happen?
In the meantime it occurred to me that each individual node can offer an HTTP interface that I might be able to use for my purposes. My initial thought is to create workers that wrap access to the web interface on the subsidiary nodes. This fits the overall architecture of the system I want to create. Does it have any merit? Is it tractable? I'm new to WebSockets, and any insight would be a great help.
Thanks for your time,
Al
In general that does sound like a fit for Crossbar.io.
There is no timeline on multi-node (i.e. multiple routers), but we hope to have at least hot-standby nodes for high availability ready in Q1. Other than for high availability, I think that a single instance should provide sufficient performance for most applications out there - on a single current (non-high-end) Xeon we're talking tens of thousands of events a second, and concurrent connections are mostly limited by RAM (and 100s of thousands on a single box are definitely not a problem). (If you need more than that then I'd be very interested in your specific use case - we want to learn more about our users.)
I don't completely understand the second part of your question: What precisely is the architecture you're planning here? If you're talking about the integrated Web server, then with recent optimizations (it can now use multiple cores) this should be enough for even moderately big sites, and with SPAs you're not likely to ever run into performance issues.
Hope this helps, and I'll be glad to answer in more detail once you've clarified the second part.
