What's the difference between Apache Mesos, Mesosphere, and DC/OS? - linux

It looks to me like Apache Mesos is a distributed systems kernel, and Mesosphere is something like a Linux distribution based on Apache Mesos.
For example, it's like the Linux kernel (Apache Mesos) and Ubuntu (Mesosphere).
Am I right about this?
And is DC/OS a free edition of Mesosphere, like Red Hat vs. Red Hat Enterprise?

Let me try ;-)
Apache Mesos - an open-source cluster resource manager, and the kernel of DC/OS.
Mesosphere - the company contributing to both Apache Mesos and DC/OS.
DC/OS - an open-source distribution built around Apache Mesos, including a UI, networking, and many other pieces. Mesosphere also offers an Enterprise Edition of DC/OS with support and some advanced features.
Hope this helped!

My two cents, from various online sources...
DC/OS is a datacenter operating system, and itself a distributed system. It is based on the Apache Mesos distributed kernel (see more details on Apache Mesos below).
It comprises three main components:
A cluster manager,
A container platform, and
An operating system.
Essentially, DC/OS abstracts away the underlying infrastructure via Mesos and provides powerful tools to run services and applications; more importantly, you will find the complete SMACK stack (Spark, Mesos, Akka, Cassandra, Kafka) pulled together under one platform. DC/OS also has built-in self-healing as a distributed system.
It is agnostic to the infrastructure layer, meaning the hosts may be either virtual or physical hardware as long as they provide compute, storage, and networking. It is designed to run anywhere: on-premises and/or in the cloud (AWS, Azure, ...). See https://dcos.io/docs/1.10/overview/
Apache Mesos is a distributed kernel and the backbone of DC/OS. You program against your datacenter as if it were a single pool of resources: Mesos abstracts CPU, memory, storage, and other compute resources, and provides an API for resource management and scheduling across datacenter and cloud environments. It can scale to tens of thousands of nodes, so it can definitely be considered a solution for large production clusters. It supports container orchestration platforms like Kubernetes and, of course, Marathon.
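To make the "single pool of resources" idea concrete: the Mesos master exposes cluster state over plain HTTP. Here is a minimal sketch in Python, assuming the requests library and a hypothetical master address; the /state-summary endpoint and default port 5050 come from the Mesos documentation, not from this answer.
```python
# Query the Mesos master for a summary of the cluster's agents and print
# the raw resources each one contributes to the shared pool.
import requests

# Hypothetical master address; replace with your own.
state = requests.get("http://mesos-master.example.com:5050/state-summary").json()

for agent in state["slaves"]:
    res = agent["resources"]
    print(f"{agent['hostname']}: {res['cpus']} cpus, {res['mem']} MB mem, {res['disk']} MB disk")
```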
Mesosphere - the company that created and maintains DC/OS.

js84 provides an excellent and concise answer above. Just to drive the point home, here is
an analogy to the Linux ecosystem:
Mesos is akin to the Linux kernel (identified by a kernel version such as 2.6, found with the command $ uname -a).
DC/OS is akin to a Linux operating system (identified by a distribution/release in a file such as /etc/redhat-release: RHEL 7.1, CentOS 7.2), with a whole bunch of binaries and utilities in /bin, /usr/bin, ...
Mesosphere is akin to Red Hat: the company that contributes heavily to the open-source Linux kernel and Linux distributions, and also provides paid support and additional features required by enterprise customers.
This is a good overview of what DC/OS is:
https://docs.mesosphere.com/1.11/overview/what-is-dcos/

Apache Mesos is an open-source distributed orchestrator for container as well as non-container workloads. It is a cluster manager that simplifies the complexity of running applications on a shared pool of servers, and it is responsible for sharing resources across application frameworks by means of schedulers and executors.
DC/OS (Datacenter Operating System) is built on top of Apache Mesos. Open-source DC/OS adds service discovery, the Universe package repository for different frameworks, CLI and GUI support for management, and volume support for persistent storage. DC/OS uses a unified API to manage multiple systems on cloud or on-premises infrastructure: deploying containers, distributed services, and so on. Unlike traditional operating systems, DC/OS spans multiple machines within a network, aggregating their resources to maximize utilization by distributed applications.
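As a concrete taste of that unified API, below is a hedged sketch of deploying a service through Marathon's REST API (the scheduler DC/OS uses for long-running services). The hostname and the app definition are made up for illustration; the /v2/apps endpoint is part of Marathon's documented API.
```python
# Submit a simple two-instance service definition to Marathon.
import requests

app = {
    "id": "/hello",                        # app path in Marathon
    "cmd": "python3 -m http.server 8080",  # command each instance runs
    "cpus": 0.1,                           # CPU share per instance
    "mem": 64,                             # MB of memory per instance
    "instances": 2,
}

# Hypothetical Marathon address; replace with your own.
resp = requests.post("http://marathon.example.com:8080/v2/apps", json=app)
resp.raise_for_status()
print("deployed", resp.json()["id"])
```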
The Mesosphere company has products that are built on top of Apache Mesos, and it contributes to both Apache Mesos and open-source DC/OS. Mesosphere offers a layer of software that organizes your machines, VMs, and cloud instances and lets applications draw from a single pool of intelligently and dynamically allocated resources, increasing efficiency and reducing operational complexity.

This is how I understood it (I might be wrong):
DC/OS gives you more features, as js84 said, while plain Mesos gives you a subset of what DC/OS offers.

Related

Kafka using Docker for production clusters

We need to build a Kafka production cluster with 3-5 nodes.
We have the following options:
Kafka in Docker containers (the cluster includes ZooKeeper and Schema Registry on each node)
Kafka cluster not using Docker (the cluster includes ZooKeeper and Schema Registry on each node)
Since we are talking about a production cluster, we need good performance: we have heavy reads/writes to disk (disk size is 10 TB), and need good I/O throughput, etc.
So does Kafka in Docker meet the requirements for production clusters?
More info: https://www.infoq.com/articles/apache-kafka-best-practices-to-optimize-your-deployment/
It can be done, sure. I have no personal experience with it, but if you don't otherwise have experience managing stateful containers, I'd suggest avoiding it.
As far as getting started with Kafka in containers goes, Kubernetes is the most documented route, and Strimzi (free; commercial support available from Red Hat) or Confluent Operator (commercial support by Confluent) can make this easy when using Kubernetes or OpenShift. Alternatively, DC/OS offers a Kafka service over Mesos/Marathon. If you don't already run any of these platforms, then I think it's apparent that you should favor not using containers.
Bare-metal or virtualized deployments are much easier to maintain than hand-deployed containerized ones, from what I have experienced, particularly for logging, metrics gathering, and statically assigned Kafka listener mappings over the network. Confluent provides Ansible playbooks for deployments to such environments.
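Listener mapping is exactly where containerized Kafka tends to bite, so an end-to-end smoke test is worth having. A minimal sketch, assuming the kafka-python package, a broker whose advertised listener resolves to localhost:9092 from the host, and topic auto-creation enabled (all assumptions, not part of the original answer):
```python
# Round-trip one message through the broker; if the advertised listeners
# are misconfigured, the metadata step or the consume step will time out.
from kafka import KafkaConsumer, KafkaProducer

producer = KafkaProducer(bootstrap_servers="localhost:9092")
producer.send("healthcheck", b"ping").get(timeout=10)  # block until acked
producer.flush()

consumer = KafkaConsumer(
    "healthcheck",
    bootstrap_servers="localhost:9092",
    auto_offset_reset="earliest",
    consumer_timeout_ms=5000,  # give up instead of blocking forever
)
print(next(iter(consumer)).value)  # b'ping' if the round trip worked
```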
That isn't to say there aren't companies that have been successful at it, or at least have tried: IBM, Red Hat, and Shopify immediately pop up in my searches, for example.
Here are a few talks about things to consider when running Kafka in containers:
https://www.confluent.io/kafka-summit-london18/kafka-in-containers-in-docker-in-kubernetes-in-the-cloud
https://kafka-summit.org/sessions/running-kafka-kubernetes-practical-guide/

Configuration management for Linux / Hadoop Cluster

I have to set up a small Hadoop cluster on Linux (Ubuntu) machines. For that I have to install the JDK, Python, and some other Linux utilities on all systems, and after that configure Hadoop on each system one by one. Is there any tool available that lets me install all of these from a single system? For example, if I have to install the JDK on some system, the tool should install it there. I would prefer a web-based tool.
Apache Ambari and Cloudera Manager are purpose-built to accomplish these tasks for Hadoop.
They also monitor the cluster and provision extra services that communicate with it, like Kafka, HBase, Spark, etc.
That only gets you so far, though; you'll want something like Ansible to deploy custom configurations (AWX is a web UI for Ansible). Puppet and Chef are alternatives too.
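To get a feel for what those tools automate, here is a bare-bones sketch of the same idea using paramiko; the hostnames, username, and package list are hypothetical, and real configuration management adds idempotence, inventories, and error handling on top of this.
```python
# SSH into each node from a single control machine and install the JDK
# and Python, roughly what an Ansible apt task would do under the hood.
import paramiko

HOSTS = ["node1", "node2", "node3"]          # hypothetical inventory
PACKAGES = "openjdk-8-jdk python3"

for host in HOSTS:
    client = paramiko.SSHClient()
    client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
    client.connect(host, username="ubuntu")   # assumes key-based auth
    _, stdout, _ = client.exec_command(
        f"sudo apt-get update -qq && sudo apt-get install -y {PACKAGES}"
    )
    print(host, "exit status:", stdout.channel.recv_exit_status())
    client.close()
```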

Physical resources for a Spark cluster

I am trying to learn Spark in order to implement one of our algorithms and reduce its execution time. I downloaded a pre-compiled version onto my local machine to run it in local mode, and I enjoyed creating some toy apps.
The next step is to use cluster mode (standalone, at this stage).
I found a lot of amazing tutorials about how to configure the cluster and about the difference between local and cluster modes, and that part is really clear (I will be back here if I have trouble with it).
My question for now is:
What physical infrastructure should we use for the Spark cluster?
No downvotes please, I will explain: for now, we have 2 dedicated servers with 32 GB of RAM and 8 CPUs each.
Now I am asking:
What is the best way to fully exploit these resources with Spark?
Which is better:
Use virtualization (ESXi / Proxmox) to create virtual machines that will be my cluster nodes?
Or just use the two servers as-is and create a 2-node cluster?
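For reference, whichever layout is chosen, the resources are ultimately claimed when an application attaches to the standalone master. A minimal PySpark sketch, with a hypothetical master URL and executor sizes that leave some headroom on 8-core / 32 GB machines:
```python
# Attach to a standalone cluster and size executors explicitly; the
# master hostname and the 4-core / 12 GB figures are illustrative only.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .master("spark://server1:7077")          # standalone master URL
    .appName("resource-sizing-demo")
    .config("spark.executor.cores", "4")     # cores per executor
    .config("spark.executor.memory", "12g")  # leave room for OS + daemons
    .getOrCreate()
)

print(spark.range(1_000_000).count())  # trivial distributed job
spark.stop()
```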

Linux HA vs Apache Hadoop

I'm using Cloudera (Apache Hadoop), so I have a pretty good idea about it.
However, I just found out about the Linux-HA project, and
I cannot figure out the difference between Linux-HA and Apache Hadoop.
When should we use Apache Hadoop, and when should we use Linux-HA?
Thank you!
Linux-HA is a software-based high-availability cluster service used to improve the availability of many kinds of services. That means Linux-HA is used to keep the desired services up and running with no downtime. It uses the concept of a heartbeat to track service state across the cluster. For example, if you have a web server running on hostA, it is replicated to run on hostB as well; whenever hostA goes down, hostB takes over and serves requests, so the service has no downtime.
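As a toy illustration of the heartbeat idea (the hostnames and port below are made up), the monitor simply polls the primary and directs traffic to the standby when the primary stops answering:
```python
# Probe the primary over TCP; fail over to the standby if it is unreachable.
import socket

def is_alive(host: str, port: int = 80, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

active = "hostA" if is_alive("hostA") else "hostB"
print(f"serving requests from {active}")
```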
Apache Hadoop, by contrast, is a framework that solves the problem of storing large amounts of data and processing it.

Can HBase, MapReduce and HDFS work on a single machine that has Hadoop installed and running on it?

I am working on a search engine design which is to be run on the cloud.
We have just started, and do not have much idea about Hadoop.
Can anyone tell me if HBase, MapReduce and HDFS can work on a single machine that has Hadoop installed and running on it?
Yes, you can. You can even create a virtual machine and run it all there on a single "computer" (which is what I have :) ).
The key is to simply install Hadoop in "pseudo-distributed mode", which is described in the Hadoop Quickstart.
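For a sense of what pseudo-distributed mode amounts to, here is a hypothetical helper that writes the two key settings; the target directory, property names, and NameNode port are assumptions drawn from the Hadoop Quickstart of that era, not from this answer.
```python
# Generate minimal core-site.xml and hdfs-site.xml files that point the
# filesystem at a local NameNode and keep a single block replica.
from pathlib import Path

conf_dir = Path("hadoop-conf")  # in a real install: $HADOOP_HOME/conf
conf_dir.mkdir(exist_ok=True)

def config_xml(name: str, value: str) -> str:
    return (
        "<configuration>\n"
        f"  <property><name>{name}</name><value>{value}</value></property>\n"
        "</configuration>\n"
    )

(conf_dir / "core-site.xml").write_text(config_xml("fs.default.name", "hdfs://localhost:9000"))
(conf_dir / "hdfs-site.xml").write_text(config_xml("dfs.replication", "1"))
```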
If you use the Cloudera distribution, they have even created the configs needed for that in an RPM. Look here for more info on that.
HTH
Yes. In my development environment, I run
NameNode (HDFS)
SecondaryNameNode (HDFS)
DataNode (HDFS)
JobTracker (MapReduce)
TaskTracker (MapReduce)
Master (HBase)
RegionServer (HBase)
QuorumPeer (ZooKeeper - needed for HBase)
In addition, I run my applications, and the map and reduce tasks launched by the TaskTracker.
Running so many processes on the same machine results in a lot of contention for CPU cores, memory, and disk I/O, so it's definitely not great for high performance, but there is no limitation other than the amount of resources available.
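A quick way to confirm that all of those daemons are actually up on one machine is to check the output of jps. A small sketch, assuming a JDK's jps tool on the PATH; note that the HBase processes appear in jps as HMaster, HRegionServer, and HQuorumPeer.
```python
# Compare the JVM process names reported by `jps` against the daemon
# list above and report anything that is not running.
import subprocess

expected = {
    "NameNode", "SecondaryNameNode", "DataNode",
    "JobTracker", "TaskTracker",
    "HMaster", "HRegionServer", "HQuorumPeer",
}

output = subprocess.check_output(["jps"], text=True)
running = {line.split()[-1] for line in output.splitlines() if line.strip()}

missing = expected - running
print("all daemons up" if not missing else f"missing: {sorted(missing)}")
```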
Same here; I am running Hadoop/HBase/Hive on a single computer.
If you really, really want to see distributed computing on a single computer, grab lots of RAM and some hard disk space and go like this:
make one or two virtual machines (use VirtualBox)
install Hadoop on each of them; make your real installation (not a virtual one) the master and the rest slaves
configure Hadoop for a real distributed environment
now when Hadoop starts, you should actually have a cluster of multiple computers (one real, the rest virtual)
This should remain just an experiment, though, because unless you have a decent multi-CPU or multi-core system, such a configuration will spend more resources maintaining itself than giving you any performance.
Good luck.
--l4l
