Multiple instances of Cassandra on each node in the cluster - cassandra

Is it possible to have a cluster in Cassandra where each server runs multiple instances of Cassandra (each instance being part of the same cluster)?
I'm aware that if there's a single server in the cluster, it's possible to run multiple instances of Cassandra on it, but is it also possible to have multiple such servers in the cluster? If yes, what would the configuration look like (listen address, ports, etc.)?
Even if it is possible, I understand that there might not be any performance benefit at all; I just want to know whether it's theoretically possible.

Yes, it's possible, and such a setup is often used for testing, for example with CCM (Cassandra Cluster Manager), although CCM creates multiple addresses on the loopback interface (127.0.0.2, ...). DataStax Enterprise also has a feature called Multi-Instance.
You need to configure your instances carefully, separating ports, directories, etc. Nowadays, using Docker could be a simpler way to implement it.
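To make "separating ports, etc." concrete, here is a sketch of a per-instance plan for several servers that each run several instances. It assumes every server already has one extra IP address (alias) per instance, similar to how CCM uses 127.0.0.1, 127.0.0.2, ... locally. With a distinct IP per instance the default storage and native transport ports can be reused, while the JMX port still differs per instance because JMX binds to localhost by default. The host names, addresses, directory layout, cluster name and dc name are all hypothetical.

```typescript
// Sketch only: one Cassandra instance per IP alias on each physical server.
// Assumption: the aliases (e.g. 10.0.0.11, 10.0.0.12 on server-a) already exist.
interface Host {
  name: string;
  addresses: string[]; // one address per instance on this host
}

interface InstancePlan {
  host: string;    // physical server, reused as the rack name
  index: number;   // instance number on this host, starting at 1
  address: string; // listen_address and rpc_address
  jmxPort: number; // goes into JMX_PORT in cassandra-env.sh, not cassandra.yaml
  baseDir: string; // hypothetical per-instance directory layout
}

function planInstances(hosts: Host[]): InstancePlan[] {
  return hosts.flatMap((h) =>
    h.addresses.map((address, i) => ({
      host: h.name,
      index: i + 1,
      address,
      // JMX binds to localhost by default, so instances sharing a host
      // need different JMX ports even though their IPs differ.
      jmxPort: 7199 + i,
      baseDir: `/var/lib/cassandra-${i + 1}`,
    })),
  );
}

// cassandra.yaml fragment. Each instance has its own IP, so the default
// storage (7000) and native transport (9042) ports can stay the same.
function renderYaml(p: InstancePlan, clusterName: string, seeds: string[]): string {
  return [
    `cluster_name: '${clusterName}'`,
    `listen_address: ${p.address}`,
    `rpc_address: ${p.address}`,
    `storage_port: 7000`,
    `native_transport_port: 9042`,
    `data_file_directories:`,
    `  - ${p.baseDir}/data`,
    `commitlog_directory: ${p.baseDir}/commitlog`,
    `saved_caches_directory: ${p.baseDir}/saved_caches`,
    `hints_directory: ${p.baseDir}/hints`,
    `seed_provider:`,
    `  - class_name: org.apache.cassandra.locator.SimpleSeedProvider`,
    `    parameters:`,
    `      - seeds: "${seeds.join(",")}"`,
  ].join("\n");
}

// cassandra-rackdc.properties for GossipingPropertyFileSnitch: using the
// physical server as the rack lets NetworkTopologyStrategy spread replicas
// across servers instead of placing them all on one machine.
function renderRackdc(p: InstancePlan, dc: string): string {
  return `dc=${dc}\nrack=${p.host}`;
}

const plans = planInstances([
  { name: "server-a", addresses: ["10.0.0.11", "10.0.0.12"] },
  { name: "server-b", addresses: ["10.0.0.21", "10.0.0.22"] },
]);
// One seed per physical server.
const seeds = plans.filter((p) => p.index === 1).map((p) => p.address);
for (const p of plans) {
  console.log(`# ${p.host} instance ${p.index} (JMX_PORT=${p.jmxPort})`);
  console.log(renderYaml(p, "MultiInstanceCluster", seeds));
  console.log(renderRackdc(p, "dc1"));
}
```

The rack mapping matters more than the ports: without it, a replication factor of 3 could put all three replicas of a partition on instances of the same physical server.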
But why do you need to do it? Unless you have a really beefy machine, with a lot of RAM and multiple SSDs, this won't bring you additional performance.

Yes, it is possible. I have even worked with 5 instances running on one server in a production cluster.
Trust me, it is still running, but the general issues I had were constant high GC, dropped mutations and high latency, so of course it is not good to have this kind of setup.
But to answer your question: yes, it is possible, and it can run in production too.

Related

Do all Apache Cassandra nodes need to use the same Garbage Collector?

I have recently upgraded our Cassandra cluster from 3.11 to 4.0, with the long-term goal of also upgrading the Java version. I did not want to do both at once for obvious reasons; however, we have now been running on Cassandra 4.0 for just over two weeks, and I'm looking to upgrade the Java version from JDK 8 to JDK 11 and also move from the CMS garbage collector to G1GC.
We wanted to get an idea of the impact of moving to G1GC before going big bang across all nodes.
Is it safe to use a different garbage collector on different nodes, or should this be set up in a test environment and monitored there?
Thanks in advance.
Yes! That is actually the recommended practice when changing/testing new GC types, assuming that you cannot fully simulate production workloads in a lower environment.
I'd advise making the switch on one or two nodes, and then monitor their performance relative to the CMS nodes.
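One way to do the "monitor relative to the CMS nodes" step is to pool GC pause times per collector and compare a high percentile. A minimal sketch, assuming pause durations in milliseconds have already been collected per node (for example from GC logs or a metrics system; that part is not shown). The NodeGcSamples shape, the nearest-rank percentile and the sample numbers are illustrative choices, not part of Cassandra.

```typescript
// Sketch: compare GC pauses on canary G1 nodes with the remaining CMS nodes.
type Collector = "CMS" | "G1";

interface NodeGcSamples {
  node: string;
  collector: Collector;
  pausesMs: number[];
}

// Nearest-rank percentile, p in (0, 100].
function percentile(values: number[], p: number): number {
  if (values.length === 0) throw new Error("no samples");
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(rank, 1) - 1];
}

function compareCollectors(samples: NodeGcSamples[], p = 99) {
  // Pool all pauses from the nodes running the given collector.
  const pooled = (c: Collector) =>
    samples.filter((s) => s.collector === c).flatMap((s) => s.pausesMs);
  const cms = percentile(pooled("CMS"), p);
  const g1 = percentile(pooled("G1"), p);
  // ratio < 1 means the G1 canaries pause for less time at this percentile.
  return { cms, g1, ratio: g1 / cms };
}

// Made-up numbers, only to show the call.
const result = compareCollectors([
  { node: "cass-1", collector: "CMS", pausesMs: [12, 30, 45, 180] },
  { node: "cass-2", collector: "G1", pausesMs: [20, 25, 40, 90] },
]);
console.log(result); // { cms: 180, g1: 90, ratio: 0.5 }
```

Pause times are only one signal; read and write latency from the canary nodes should be compared the same way before rolling G1 out further.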
Logically you can do it, since they are separate Java processes running on different machines. But since the actual intention behind this activity is testing, you should analyze the impact in a test environment first, and then apply the changes in production if you find the test results suitable.

Node.js Cluster module vs Microservices

They both solve the same issue - scalability. When to use which?
And is there any point in using the cluster API for a Node app running inside a Docker container?
They're not really equivalent. Microservices solve an organizational and code-management problem (scalability in a very dynamic way, reducing tight coupling, and keeping bugs isolated to one microservice). cluster solves scalability in a very limited way, by spinning up cluster workers on the same machine. If you have one large app and generally scale vertically (by increasing the amount of computing power your hosts have), cluster is great. If not, breaking things down into services (or further down into microservices) is also great.
You can also do both (your second question), for example running Node apps in containers on Kubernetes, where the Node apps use cluster. Depending on how your containers get run and how many vCPUs they're allocated, it may or may not have any effect, but it's only a couple lines of code so it doesn't hurt to add it.
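The "couple lines of code" can look like this sketch using Node's built-in cluster module (cluster.isPrimary needs Node 16 or later). The HTTP server on port 3000 and the WEB_CONCURRENCY override are illustrative assumptions; the override exists because, inside a container, os.cpus() may report more CPUs than the container is actually allowed to use.

```typescript
import cluster from "node:cluster";
import * as http from "node:http";
import * as os from "node:os";

// Decide how many workers to fork. WEB_CONCURRENCY is an assumed variable
// name (a common convention, not something Node itself reads).
export function workerCount(env: Record<string, string | undefined>, cpus: number): number {
  const override = Number.parseInt(env.WEB_CONCURRENCY ?? "", 10);
  if (Number.isInteger(override) && override > 0) return override;
  return Math.max(1, cpus);
}

// Call main() from the application's entry point. The primary forks the
// workers; each worker re-runs the same file and takes the else branch.
export function main(port = 3000): void {
  if (cluster.isPrimary) {
    const n = workerCount(process.env, os.cpus().length);
    for (let i = 0; i < n; i++) cluster.fork();
    // Replace a worker that dies so the number of workers stays constant.
    cluster.on("exit", (worker) => {
      console.log(`worker ${worker.process.pid} exited, starting a new one`);
      cluster.fork();
    });
  } else {
    // All workers listen on the same port; the primary hands out connections
    // (round-robin by default on every platform except Windows).
    http
      .createServer((_req, res) => res.end(`handled by worker ${process.pid}\n`))
      .listen(port);
  }
}
```

With a single vCPU and no override, this forks one worker, which is the case where cluster adds little beyond automatic restarts.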

Which MongoDB scaling strategy (Sharding, Replication) is suitable for concurrent connections?

Consider the following scenario:
I have multiple devclouds (remote workplaces for developers); they are all virtual machines running on the same bare-metal server.
In the past, each one used its own MongoDB container running on Docker, so the number of MongoDB containers can add up to over 50 instances across devclouds.
The problem is that while 50 instances are running at the same time, only 5 people actually perform read/write operations against their own instances, so the other 45 running instances waste the server's resources.
Should I use only one MongoDB cluster, combining a set of MongoDB instances, for everyone, so that they all connect to a single endpoint (via the internal network) to avoid wasting resources?
I am considering the sharding strategy, but the problem is that a node may be taken down (one VM shut down); is that OK for availability (redundancy)?
I am pretty new to sharding and replication and look forward to hearing your solutions. Thank you.
If each developer expects to have full control over their database deployment, you can't combine the deployments. Otherwise one developer can delete all data in the deployment, etc.
If each developer expects to have access to one database, you can deploy a single replica set serving all developers and assign one database per developer (via authentication).
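For the one-database-per-developer layout, a sketch of what each developer would get: a user that can only read and write their own database, plus a connection string for the shared replica set. The dev_ prefix, host names and replica set name are hypothetical; the createUser document is meant to be run against that developer's database, for example with db.getSiblingDB(...).runCommand(...) in mongosh. A three-member replica set also covers the availability concern in the question: 2 of 3 members are still a majority, so a primary can be elected if one VM is shut down.

```typescript
// Sketch: one database per developer on a single shared replica set.
interface DeveloperAccess {
  database: string;
  // Run against `database`, e.g. db.getSiblingDB(database).runCommand(cmd) in mongosh.
  createUserCommand: {
    createUser: string;
    pwd: string;
    roles: { role: string; db: string }[];
  };
  connectionString: string;
}

function planDeveloperAccess(
  developer: string,
  password: string,
  hosts: string[],
  replicaSet: string,
): DeveloperAccess {
  const database = `dev_${developer}`; // hypothetical naming convention
  const credentials = `${encodeURIComponent(developer)}:${encodeURIComponent(password)}`;
  return {
    database,
    // readWrite on their own database only, so one developer cannot touch another's data.
    createUserCommand: {
      createUser: developer,
      pwd: password,
      roles: [{ role: "readWrite", db: database }],
    },
    // Listing several members as seeds lets the driver connect even if one VM is down.
    connectionString:
      `mongodb://${credentials}@${hosts.join(",")}/${database}` +
      `?replicaSet=${encodeURIComponent(replicaSet)}&authSource=${database}`,
  };
}

const alice = planDeveloperAccess("alice", "p@ss", ["db1:27017", "db2:27017", "db3:27017"], "rs0");
console.log(alice.connectionString);
```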
Sharding in the MongoDB sense (a sharded cluster) is not really going to help in this scenario, since an application generally uses all of the shards. You can of course "shard manually" by setting up multiple replica sets.

Is running an Elasticsearch service on the same server as a Node server with auto scaling a good idea?

I'm trying to deploy a project on a t3.large server with auto scaling.
I have my Elasticsearch service deployed on the same system as the Node and React projects (not using AWS Elasticsearch).
Will it face issues in the future, and will I need to move the Elasticsearch service to a separate server?
It's always nice to have a separate dedicated server for running Elasticsearch, but since you are using AWS, here are some things you can do to minimize the issues:
Elasticsearch is a stateful application, in contrast to your Node and React apps (unless you are storing state there as well, which is not a good idea). Because those applications are stateless, autoscaling is very useful for them: you can scale the instances up or down on demand based on CPU, memory or other metrics.
But for Elasticsearch and other stateful applications, it becomes tricky: when you scale instances up or down, shards get relocated if they are not reachable within a threshold, which can lead to an unbalanced Elasticsearch cluster.
To minimize these issues:
Make sure you store the Elasticsearch indices on a network-attached disk so that there is no data loss when autoscaling brings up a new instance, and make sure the new instance reuses the earlier network-attached EBS volume (where your data is stored).
Make sure you don't create a new Elasticsearch process when your autoscaling policy scales instances up or down; the Elasticsearch processes should be fixed and scaled up or down only with manual intervention.
If you have to scale up the Elasticsearch cluster, make sure you disable shard allocation to avoid the issues mentioned earlier.
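Shard allocation is toggled through the cluster settings API. Elastic's rolling-restart procedure sets cluster.routing.allocation.enable to "primaries" before a node goes down and back to null (the default) afterwards; the sketch below only builds those two requests, and sending them (curl, Kibana Dev Tools or an HTTP client) is left out. The EsRequest shape is an illustrative assumption.

```typescript
// Sketch: request bodies for toggling shard allocation via the cluster settings API.
type AllocationMode = "all" | "primaries" | "new_primaries" | "none";

interface EsRequest {
  method: "PUT";
  path: string;
  body: string;
}

function setShardAllocation(mode: AllocationMode | null): EsRequest {
  return {
    method: "PUT",
    path: "/_cluster/settings",
    // null removes the override so the default ("all") applies again.
    body: JSON.stringify({ persistent: { "cluster.routing.allocation.enable": mode } }),
  };
}

// Before an instance is replaced: stop replica shards from being reallocated.
const before = setShardAllocation("primaries");
// After the node has rejoined the cluster: restore normal allocation.
const after = setShardAllocation(null);
console.log(before, after);
```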
These are some known issues you might face, and there could be even more depending on your configuration. While writing this answer I felt it is much simpler to just have a dedicated instance for Elasticsearch and avoid these weird issues.
I would add the following to the other answers:
Elasticsearch performs best if it has enough RAM to keep the indexes entirely in memory. If Elasticsearch is competing with the Node application for RAM, its performance will suffer.
From a maintenance/performance perspective, you should consider having at least a 3-node cluster, even if that means using smaller machines. If AWS is upgrading infrastructure and you have only 1 machine, then when that 0.05% of unavailability hits, your search is down. If you need to do maintenance or upgrades on a node, having multiple machines will help with availability.
Depending on how you use Elasticsearch, how often you update/delete items in the indexes, and how fast your indexes will grow, adding more machines/nodes to the cluster will help with growth.
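The RAM and 3-node points can be turned into rough arithmetic. This is a sketch, not official sizing: it assumes Elastic's heap guideline (at most half the memory, kept below the roughly 32 GB compressed-oops threshold, with 31 GB used here as a conservative stand-in) applied to whatever memory is left after the Node/React processes, and it uses the fact that electing a master needs a majority of master-eligible nodes.

```typescript
// Rough sizing arithmetic for a host shared by the app and Elasticsearch.
const HEAP_CAP_GB = 31; // conservative value under the ~32 GB threshold

function suggestedEsHeapGb(totalRamGb: number, appReservedGb: number): number {
  const available = totalRamGb - appReservedGb;
  if (available <= 0) throw new Error("no memory left for Elasticsearch");
  // The other half stays free for the OS filesystem cache, which is what keeps
  // index files in memory.
  return Math.min(available / 2, HEAP_CAP_GB);
}

// Electing a master needs a majority of master-eligible nodes.
function tolerableNodeFailures(masterEligibleNodes: number): number {
  const majority = Math.floor(masterEligibleNodes / 2) + 1;
  return masterEligibleNodes - majority;
}

// t3.large has 8 GiB of RAM; the 2 GiB for the apps is an assumed figure.
console.log(suggestedEsHeapGb(8, 2)); // 3
for (const n of [1, 2, 3, 5]) console.log(n, tolerableNodeFailures(n));
```

A 2-node cluster tolerates no more failures than a single node (both give 0), while 3 nodes tolerate 1, which is one reason 3 is the usual minimum.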
There are probably many more things to consider, but that depends entirely on your application, budget, SLAs, etc.

Keeping replicas of Redis instances in sync?

Is it possible to create replicas of a Redis instance? If yes, what is the overhead of keeping them in sync (apart from the network traffic)?
See the Redis documentation on setting up replication scenarios.
Since there is a delay in synchronization, you'll probably need additional application-side logic to route access that uses the same data to the same server instance. In some cases you may also need to issue additional 'slaveof' commands (called REPLICAOF since Redis 5.0) to one instance when another goes down or comes back up.
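A minimal sketch of that application-side routing, assuming a fixed list of instances: hashing the key picks the instance, so every access to the same data lands on the same server and a client does not see a value go "backwards" by alternating between replicas that are at different points in replication. FNV-1a is used here only as a simple, stable hash, and the instance addresses are hypothetical.

```typescript
// Sketch: route all accesses for a key to the same Redis instance.
function fnv1a(key: string): number {
  let h = 0x811c9dc5; // 32-bit FNV offset basis
  for (let i = 0; i < key.length; i++) {
    h ^= key.charCodeAt(i); // hashes UTF-16 code units, not UTF-8 bytes
    h = Math.imul(h, 0x01000193) >>> 0; // multiply by the FNV prime, mod 2^32
  }
  return h >>> 0;
}

function pickInstance(key: string, instances: string[]): string {
  if (instances.length === 0) throw new Error("no instances configured");
  return instances[fnv1a(key) % instances.length];
}

// Hypothetical instance addresses.
const instances = ["redis-a:6379", "redis-b:6379", "redis-c:6379"];
console.log(pickInstance("user:42", instances)); // same instance on every call for user:42
```

Plain modulo remaps most keys when an instance is added or removed; consistent hashing limits that churn if the instance list changes often.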
If you need more concrete info, you should elaborate a bit on your use case, i.e. in what environment you're using Redis (e.g. Rails app cluster, custom client...).
