Cassandra machine requirements: general-purpose or I/O-intensive?

Most cloud providers offer general-purpose, CPU-intensive, and memory-intensive machine types. I am curious which category would be a good fit for deploying a Cassandra cluster.

Related

Do clustering in Node.js and auto-scaling a web application with Kubernetes serve the same purpose?

Node.js has introduced the Cluster module to scale applications up for better performance. Kubernetes seems to do the same thing.
I'm confused: do both serve the same purpose? My assumption is that clustering can spawn at most 8 processes (if there are 4 CPU cores with 2 threads each), whereas Kubernetes has no such limitation.
Kubernetes and the Node.js Cluster module operate at different levels.
Kubernetes is in charge of orchestrating containers (amongst many other things). From its perspective, there are resources to be allocated, and deployments that require or use a specific amount of resources.
The Node.js Cluster module behaves as a load balancer: it forks N worker processes and spreads requests between them, all within the limits defined by its environment (CPU, RAM, network, etc.).
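To make the "load balancer that forks N times" idea concrete, here is a toy Python model of round-robin dispatch, which the Node.js documentation describes as the Cluster module's default scheduling policy on all platforms except Windows. It does not fork real processes; `dispatch_round_robin` and the request names are hypothetical, and `num_workers` is just a parameter (forking one worker per logical CPU is a common convention, not a hard cap).

```python
from itertools import cycle


def dispatch_round_robin(requests, num_workers):
    """Toy model: hand each incoming request to the next worker in turn.

    This only simulates the primary process's bookkeeping; nothing is forked.
    """
    if num_workers < 1:
        raise ValueError("num_workers must be at least 1")
    assignments = {worker: [] for worker in range(num_workers)}
    next_worker = cycle(range(num_workers))  # 0, 1, ..., N-1, 0, 1, ...
    for request in requests:
        assignments[next(next_worker)].append(request)
    return assignments


if __name__ == "__main__":
    # Hypothetical workload: 10 requests spread over 4 workers.
    plan = dispatch_round_robin([f"req-{i}" for i in range(10)], num_workers=4)
    for worker, handled in plan.items():
        print(f"worker {worker}: {handled}")
```

Worker 0 ends up with req-0, req-4 and req-8, worker 3 with req-3 and req-7. Note that all workers live on the same host, so they share its fate if that machine goes down.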
In practice, Kubernetes has the possibility to spawn additional Node.js containers (scaling horizontally). On the other hand, Node.js can only grow within its environment (scaling vertically). You can read about this here.
While both approaches may perform similarly (you can use the same number of cores in both cases), the problem with scaling vertically on a single machine is that you lose the high availability that Kubernetes provides. If instead you deploy several Node.js containers on different machines, you are much more tolerant of one of them going down.
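A quick back-of-the-envelope calculation shows why spreading instances across machines helps. Assuming (hypothetically) that each machine is up 99% of the time and that machines fail independently, the chance that at least one of n instances is up is 1 - (1 - p)^n:

```python
def availability_at_least_one(p_up: float, replicas: int) -> float:
    """Probability that at least one of `replicas` machines is up.

    Assumes each machine is up with probability p_up, independently of the others.
    """
    if not 0.0 <= p_up <= 1.0:
        raise ValueError("p_up must be in [0, 1]")
    if replicas < 1:
        raise ValueError("replicas must be >= 1")
    return 1.0 - (1.0 - p_up) ** replicas


# Hypothetical 99% per-machine uptime, 1 to 3 machines.
for n in (1, 2, 3):
    print(n, round(availability_at_least_one(0.99, n), 6))
```

Real failures are often correlated (same rack, same zone, same bad deploy), so treat these figures as an optimistic upper bound rather than a guarantee.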

Kafka using Docker for production clusters

We need to build a Kafka production cluster with 3-5 nodes.
We have the following options:
Kafka in Docker containers (the Kafka cluster includes ZooKeeper and Schema Registry on each node)
Kafka cluster not using Docker (the Kafka cluster includes ZooKeeper and Schema Registry on each node)
Since we are talking about a production cluster, we need good performance: we have heavy reads/writes to disk (disk size is 10 TB), so we need good I/O performance, etc.
So does Kafka in Docker meet the requirements for production clusters?
More info: https://www.infoq.com/articles/apache-kafka-best-practices-to-optimize-your-deployment/
It can be done, sure. I have no personal experience with it, but if you don't otherwise have experience managing other stateful containers, I'd suggest avoiding it.
As far as "getting started" with Kafka in containers, Kubernetes is the most documented way, and Strimzi (free, optional commercial support by Lightbend) or Confluent Operator (commercial support by Confluent) can make this easy when using Kubernetes or OpenShift. Alternatively, DC/OS offers a Kafka service over Mesos/Marathon. If you don't already run any of these platforms, then I think it's apparent that you should favor not using containers.
From what I have experienced, bare-metal or virtualized deployments are much easier to maintain than hand-deployed containerized ones, particularly for logging, metric gathering, and statically assigned Kafka listener mappings over the network. Confluent provides Ansible scripts for deploying to such environments.
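One reason hand-deployed containers get painful is that every broker needs its own, statically assigned advertised address that clients can actually reach from outside the container network. The sketch below only renders the relevant `server.properties` lines per broker; the hostnames, the port, and the helper name `broker_properties` are hypothetical, and ZooKeeper, Schema Registry, log directories, and security settings are deliberately left out.

```python
def broker_properties(broker_id: int, host: str, port: int = 9092) -> str:
    """Render a minimal, hypothetical server.properties fragment for one broker.

    listeners: where the broker binds inside its container/host.
    advertised.listeners: the address handed back to clients, so it must be
    reachable from wherever the clients run (e.g. the host-mapped name/port).
    """
    lines = [
        f"broker.id={broker_id}",
        f"listeners=PLAINTEXT://0.0.0.0:{port}",
        f"advertised.listeners=PLAINTEXT://{host}:{port}",
    ]
    return "\n".join(lines)


# Hypothetical three-broker cluster with fixed, per-broker hostnames.
hosts = ["kafka-1.example.internal", "kafka-2.example.internal", "kafka-3.example.internal"]
for broker_id, host in enumerate(hosts, start=1):
    print(broker_properties(broker_id, host))
    print()
```

Every broker gets a different `broker.id` and advertised address, and those values have to stay stable across container restarts and rescheduling, which is exactly what is easy on fixed VMs and fiddly with hand-managed containers.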
That isn't to say there aren't companies that have been successful at it, or at least tried. IBM, Red Hat, and Shopify immediately pop up in my searches, for example.
Here are a few talks about things to consider when running Kafka in containers:
https://www.confluent.io/kafka-summit-london18/kafka-in-containers-in-docker-in-kubernetes-in-the-cloud
https://kafka-summit.org/sessions/running-kafka-kubernetes-practical-guide/

Kubernetes cluster architecture

Does it make sense to create a separate Kubernetes cluster for my Cassandra instances and another one for the application layer? Is the DB cluster accessible from the service cluster when both are in the same region and zone?
Or is it better to have one cluster with different node pools: one pool for the service layer and one pool for the DB nodes?
This is more of a toss-up or opinion in terms of how you want to design your whole architecture. Here are some things to consider:
Same cluster:
Pros
Workloads don't need to reach a different pod CIDR to get their data.
You can optimize resource usage across the same set of servers.
This is one of the main reasons people use container orchestrators and containers.
It allows you to run multiple different types of workloads on the same set of resources.
Cons
If you have an issue with the cluster running Cassandra, you risk losing your data, or temporarily losing access to it if you have backups (longer downtime).
If you'd like to strongly isolate the DB and the app in terms of security, it may be harder.
Different clusters:
Pros
'Safer' if one of your clusters goes down.
More separation in terms of security for your data at rest.
Cons
Resources may not be optimally utilized, leaving some CPUs, memory, etc. idle.
More infrastructure management.
Different node pools:
Pros
Separation of data at rest
Traffic still goes through the same pod CIDR.
Cons
More management of different node pools.
Resources may not be optimally utilized.
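For the node-pool option, pods are usually pinned to a pool through node labels. Here is a minimal sketch of two Pod manifests built as Python dicts using `nodeSelector`. The label key `pool`, the pool names, the helper `pod_spec_for_pool`, and the `example/api:1.0` image are assumptions (on GKE, for example, node pools carry the label `cloud.google.com/gke-nodepool`), and a real Cassandra deployment would normally use a StatefulSet rather than bare Pods.

```python
import json


def pod_spec_for_pool(name: str, image: str, pool: str,
                      pool_label_key: str = "pool") -> dict:
    """Build a minimal Pod manifest (as a dict) pinned to one node pool.

    The scheduler will only place this pod on nodes whose labels include
    {pool_label_key: pool}. The label key is an assumption; use whatever
    label your provider or your own node setup puts on the pool's nodes.
    """
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name, "labels": {"app": name}},
        "spec": {
            "nodeSelector": {pool_label_key: pool},
            "containers": [{"name": name, "image": image}],
        },
    }


# Hypothetical split: DB pods on "db-pool", service pods on "service-pool".
db = pod_spec_for_pool("cassandra-0", "cassandra:4.1", "db-pool")
api = pod_spec_for_pool("api", "example/api:1.0", "service-pool")
print(json.dumps(db, indent=2))
```

`nodeSelector` only attracts pods to a pool; to also keep application pods off the DB nodes, you would taint those nodes and give only the Cassandra pods a matching toleration.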

Is IMap.get() expensive in Hazelcast if the Hazelcast cluster is running in the cloud?

I have a distributed map stored in Hazelcast. My Hazelcast cluster runs in a cloud, either private or public. My app may not run on the same network as the Hazelcast cluster.
My app accesses the distributed map using IMap.get(), possibly thousands of times per second. I tried to measure the performance of this operation by running a Hazelcast cluster on my local machine, and I could read everything in 15-20 ms. But I am not getting the same performance when the Hazelcast cluster runs in the cloud.
If I read a map more frequently, will it increase the load on Hazelcast in the cloud environment? If yes, why?
Software running locally will always perform differently than in a distributed environment, even more so when the servers are located elsewhere; network latency is the most prominent factor.
Servers in the cloud with the app running locally is not the recipe for the best performance. Either move all cluster components, servers and app clients alike, into one network (aim for the same availability zone if you are looking for the best performance), or expect delays. It's not the cloud in particular that deteriorates performance; it's the way the VMs are set up in the cloud. For example, if one VM is in us-east-1, another is in London, and your app is in Tokyo, then expect inferior performance numbers.
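The effect of network latency on a chatty client can be estimated with simple arithmetic: a blocking IMap.get() waits at least one network round trip, so a single calling thread can complete at most 1000 / RTT(ms) calls per second. The RTT values below (0.5 ms within a zone, 80 ms across regions) are illustrative assumptions, not measurements, the helper names are hypothetical, and server-side processing time is ignored:

```python
import math


def max_sequential_ops_per_sec(rtt_ms: float) -> float:
    """Upper bound on blocking request/response calls per second from ONE thread.

    Each call has to wait at least one network round trip before the next starts.
    """
    if rtt_ms <= 0:
        raise ValueError("rtt_ms must be positive")
    return 1000.0 / rtt_ms


def threads_needed(target_ops_per_sec: float, rtt_ms: float) -> int:
    """Minimum number of concurrent callers to reach the target (latency-bound only)."""
    per_thread = max_sequential_ops_per_sec(rtt_ms)
    return math.ceil(target_ops_per_sec / per_thread)


# Hypothetical target of 5,000 get() calls per second.
for label, rtt in (("same zone", 0.5), ("cross region", 80.0)):
    print(label, max_sequential_ops_per_sec(rtt), threads_needed(5000, rtt))
```

The server does roughly the same work per get() in both cases; what changes is how long each caller sits waiting on the network. That is why reducing round trips (batching, for example with IMap.getAll(), or issuing calls concurrently) tends to matter more than raw server speed when the client is far away.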

What's the difference between Apache Mesos, Mesosphere and DC/OS?

It looks to me like Apache Mesos is a distributed systems kernel, and Mesosphere is something like a Linux distribution based on Apache Mesos.
For example, it's like the Linux kernel (Apache Mesos) and Ubuntu (Mesosphere).
Am I right about this?
And is DC/OS a free edition of Mesosphere, like Red Hat vs. Red Hat Enterprise?
Let me try ;-)
Apache Mesos - open-source cluster resource manager, kernel of DC/OS
Mesosphere - company contributing to both Apache Mesos and DC/OS
DC/OS - open-source distribution around Apache Mesos, including a UI, networking, and many other pieces. Mesosphere also offers an Enterprise Edition of DC/OS with support and some advanced features.
Hope this helped!
My two cents, drawn from various online sources...
DC/OS is a datacenter operating system, which is also a distributed system. The operating system is based on the Apache Mesos distributed kernel. See more details about Apache Mesos below.
It comprises three main components:
A cluster manager,
A container platform, and
An operating system.
Essentially, DC/OS abstracts the underlying infrastructure with Mesos and provides powerful tools for running services and applications; more importantly, you will find the complete SMACK stack pulled in under one OS platform. DC/OS has a built-in, self-healing distributed system.
It is agnostic to the infrastructure layer, meaning the hosts may consist of either virtual or physical hardware as long as they provide compute, storage, and networking. It is designed to run anywhere: on-premises and/or on virtual machines in AWS, Azure, etc. See https://dcos.io/docs/1.10/overview/
Apache Mesos is a distributed kernel and the backbone of DC/OS. It lets you program against your datacenter as if it were a single pool of resources, abstracting CPU, memory, storage, and other computing resources. It also provides an API for resource management and scheduling across datacenter and cloud environments. It can scale up to tens of thousands of nodes, so it can definitely be considered a solution for large production clusters. It supports container orchestration platforms like Kubernetes and, of course, Marathon.
Mesosphere - DC/OS is created and maintained by Mesosphere
js84 provides an excellent and concise answer above. Just to drive home the point, here is an analogy to the Linux ecosystem:
Mesos is akin to the Linux kernel (as identified by a kernel version such as 2.6, found with the command $ uname -a)
DC/OS is akin to a Linux operating system (as identified by the distribution/release in a file such as /etc/redhat-release: RHEL 7.1, CentOS 7.2), with a whole bunch of binaries and utilities in /bin, /usr/bin, ...
Mesosphere is akin to Red Hat, the company that contributes a lot to the open-source Linux kernel and Linux distributions, and also provides paid support to enterprise customers along with additional features required by enterprises.
This is a good overview of what DC/OS is:
https://docs.mesosphere.com/1.11/overview/what-is-dcos/
Apache Mesos is the open-source distributed orchestrator for container as well as non-container workloads. It is a cluster manager that simplifies the complexity of running applications on a shared pool of servers, and it is responsible for sharing resources across application frameworks by using schedulers and executors.
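The "scheduler and executor" split refers to Mesos's two-level scheduling: the Mesos master offers available agent resources to each framework, the framework's scheduler decides which of its tasks to launch on them, and an executor on the agent then runs those tasks. Below is a toy Python model of the framework side only; `Offer`, `Task`, `schedule`, and the greedy first-fit policy are hypothetical and are not the Mesos API:

```python
from dataclasses import dataclass


@dataclass
class Offer:
    """Resources one agent currently has available, as offered by the master."""
    agent: str
    cpus: float
    mem_mb: int


@dataclass
class Task:
    """A unit of work the framework wants to run."""
    name: str
    cpus: float
    mem_mb: int


def schedule(offers, pending):
    """Toy framework scheduler: greedily place pending tasks into offers (first fit).

    Returns (launched, remaining). Leftover capacity in an offer is simply unused
    here; a real framework would decline it so the master can offer it elsewhere.
    """
    launched = []
    remaining = list(pending)
    for offer in offers:
        cpus, mem = offer.cpus, offer.mem_mb
        for task in list(remaining):  # iterate over a copy so removal is safe
            if task.cpus <= cpus and task.mem_mb <= mem:
                launched.append((task.name, offer.agent))
                cpus -= task.cpus
                mem -= task.mem_mb
                remaining.remove(task)
    return launched, remaining


# Hypothetical offers from two agents and four pending tasks.
offers = [Offer("agent-1", 2.0, 4096), Offer("agent-2", 1.0, 1024)]
tasks = [Task("web", 1.0, 2048), Task("cache", 1.0, 2048),
         Task("job", 1.0, 512), Task("big", 4.0, 8192)]
print(schedule(offers, tasks))
```

The "big" task stays pending because no single offer is large enough; in Mesos the framework would keep waiting for future offers rather than the master deciding placement for it.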
DC/OS (Datacenter Operating System) is built on top of Apache Mesos. Open-source DC/OS adds service discovery, Universe packages for different frameworks, CLI and GUI support for management, and volume support for persistent storage. DC/OS uses a unified API to manage multiple systems in the cloud or on-premises, such as deploying containers, distributed services, etc. Unlike traditional operating systems, DC/OS spans multiple machines within a network, aggregating their resources to maximize utilization by distributed applications.
Mesosphere (the company) has products that are built on top of Apache Mesos. Mesosphere contributes to both Apache Mesos and open-source DC/OS. Mesosphere offers a layer of software that organizes your machines, VMs, and cloud instances and lets applications draw from a single pool of intelligently and dynamically allocated resources, increasing efficiency and reducing operational complexity.
This is how I understood it; I might be wrong.
DC/OS gives you more features, as js84 said, and Mesos gives you a subset of what DC/OS offers.
Apologies for the bad writing on the board/diagram.
