Linux Hadoop services: monitoring tool and restart if down

I have configured Hadoop 2.7.5 with HBase. It is a 5-node cluster in fully distributed mode. I have to monitor the Hadoop/HBase daemons and want to trigger some action (e.g. mail) if a daemon goes down. Is there any built-in solution?
Also I want to start Hadoop at boot time. How can I do this?

I am assuming that you aren't using major distributions like Cloudera or Hortonworks; they have this feature built into their stacks.
For automated start at boot time, you can use an init.d script (or a systemd unit).
For emailing out in the event of a failure, there is a scripting solution in this answer: Bash script to monitor process and sendmail if failed.
In enterprise organisations, most will have a monitoring solution in place, such as Tivoli, which you can hook into.
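As a rough illustration of the scripting approach, here is a minimal watchdog sketch; the daemon name, mail address and restart command are assumptions you would adapt, and it could run from cron every few minutes on each node:

    #!/bin/bash
    # Check whether the NameNode JVM is still running; alert (and optionally restart) if not.
    # Assumes jps is on the PATH and mailx is configured for outbound mail.
    DAEMON="NameNode"
    ALERT_TO="ops@example.com"   # hypothetical address
    if ! jps | grep -q "$DAEMON"; then
        echo "$DAEMON is down on $(hostname) at $(date)" \
            | mailx -s "$DAEMON down on $(hostname)" "$ALERT_TO"
        # Optionally restart it with the Hadoop 2.7.x sbin script:
        # $HADOOP_HOME/sbin/hadoop-daemon.sh start namenode
    fi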

Related

Is there a shell-command for spark that says what jobs are queued or running?

Environment: Spark 1.6.2; Linux 2.6.x (Red Hat 4.4.x); Hadoop 2.4.x.
I launched a job this morning through spark-submit but do not see the files it was supposed to write. I've read a bit about the web UI for monitoring spark jobs, but at this point, my only visibility into what is happening on the Hadoop cluster and HDFS is through a bash-shell terminal.
Question: what are the standard ways from the command-line to get a quick readout on spark jobs, and any log trail they might leave behind (during or after job execution)?
Thanks.
You can use yarn application -list.
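Assuming the jobs were submitted to YARN (spark-submit --master yarn), a few command-line checks look like this; the application id shown is only a placeholder:

    # list applications currently known to the ResourceManager
    yarn application -list
    # narrow to what is still queued or running
    yarn application -list -appStates ACCEPTED,RUNNING
    # status of one specific application
    yarn application -status application_1466000000000_0001
    # after it finishes, fetch the aggregated logs (requires log aggregation to be enabled)
    yarn logs -applicationId application_1466000000000_0001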

Configuration management for Linux / Hadoop Cluster

I have to set up a small Hadoop cluster on Linux (Ubuntu) machines. For that I have to install the JDK, Python and some other Linux utilities on all systems. After that I have to configure Hadoop for each system one by one. Is there any tool available so that I can install all these tools from a single system? For example, if I have to install the JDK on some system, that tool should install it there. I would prefer the tool to be web based.
Apache Ambari or Cloudera Manager are purpose-built to accomplish these tasks for Hadoop.
They also monitor the cluster and provision extra services that communicate with it, like Kafka, HBase, Spark, etc.
That only gets you so far, though, and you'll want to have something like Ansible to deploy custom configurations (AWX is a web UI for Ansible). Puppet and Chef are alternatives too.
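For the "install the JDK, Python and utilities everywhere" part, a minimal Ansible sketch might look like this; the inventory group name and package list are assumptions for an Ubuntu cluster:

    # site.yml
    - hosts: hadoop_nodes
      become: yes
      tasks:
        - name: Install JDK, Python and common utilities on every node
          apt:
            name:
              - openjdk-8-jdk
              - python3
              - rsync
              - ntp
            state: present
            update_cache: yes

You would run it once from a single control machine with: ansible-playbook -i inventory.ini site.yml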

How to setup Jenkins with HA?

Currently we are using Jenkins as our CI system; there is one master server plus slaves, which are provisioned by SaltStack on OpenStack. If our Jenkins master server goes down, we need to create a new master and pull the files from the old master into the new one, which takes at least 30 minutes.
Is there any way to set up Jenkins with High Availability?
I have already looked at the Gearman Plugin; however, if the Gearman server goes down for some reason, we need to set up HA for Gearman as well.
Are there any other ways to set up High Availability for Jenkins?
Jenkins doesn't have a great HA story; the best you can do with the open source version is to put all of the files in $JENKINS_HOME on a shared file system, and then have a cold standby master machine that you can spin up if the active master goes down. That would reduce your failover time to however long it takes for the master to restart, which is usually just a few minutes.
You could also look at CloudBees' Jenkins Enterprise offering, which includes a High Availability Plugin.
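A rough sketch of that cold-standby layout, assuming $JENKINS_HOME is /var/lib/jenkins and an NFS export is available (the server name and paths are assumptions):

    # /etc/fstab on both the active and the standby master
    # nfs-server:/export/jenkins_home  /var/lib/jenkins  nfs  defaults,_netdev  0 0
    mount /var/lib/jenkins

    # normal operation: Jenkins runs only on the active master
    systemctl start jenkins

    # failover: confirm the old master is really stopped, then bring Jenkins up on the standby
    systemctl stop jenkins     # on the old master, if it is still reachable
    systemctl start jenkins    # on the standby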
I used the Clusters from Scratch doc to create a Jenkins WAN-HA active/passive cluster. See the attached architecture diagram for Jenkins HA using Pacemaker.
/etc/init.d/jenkins will need to be converted into an OCF resource agent script. Currently I manually start Jenkins via systemd on the pcmk-2 server when pcmk-1 is down.
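If you would rather not write an OCF agent by hand, Pacemaker can also manage the stock systemd unit directly. A hedged pcs sketch, where the resource names and the floating IP are assumptions:

    # floating IP that clients use to reach whichever master is active
    pcs resource create jenkins_vip ocf:heartbeat:IPaddr2 ip=192.168.1.50 cidr_netmask=24 op monitor interval=30s
    # let Pacemaker drive the existing systemd unit instead of a hand-written OCF script
    pcs resource create jenkins_svc systemd:jenkins op monitor interval=60s
    # keep the service and the IP on the same node, and start the IP first
    pcs constraint colocation add jenkins_svc with jenkins_vip INFINITY
    pcs constraint order jenkins_vip then jenkins_svc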

Script to run Rexster as a daemon in linux

I'm setting up the Titan graph database for the first time in a production environment on Debian virtual machines, and I am utilising Rexster to provide the interface into Titan. However, after googling around I cannot find any scripts to allow Rexster to run as a daemon in the background. As per "titan rexster with external cassandra instance", I have split off Cassandra, Elasticsearch, and Rexster to start as their own processes. Cassandra and Elasticsearch conveniently have Debian packages that deploy the daemon scripts out of the box, but there is nothing for Rexster. Has anyone made a script that allows Rexster to run as a daemon?
Looking at the rexster.sh script in the Titan download zip (../$titan_base/bin/), it calls java to start Rexster up, so I'm thinking that some kind of wrapper like JSVC could be used to start it, unless there is an easier way?
A simple, generic tool to handle this is Daemonize. More details in this post.
If your Debian is new enough to be using systemd, look into creating a service unit. The key commands for using your service would be:
systemctl start rexster.service
systemctl enable rexster.service
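As a minimal sketch, a unit file at /etc/systemd/system/rexster.service might look like the following; the install path, user, and the exact rexster.sh arguments are assumptions to adapt to your layout (run systemctl daemon-reload after creating it):

    [Unit]
    Description=Rexster graph server for Titan
    After=network.target cassandra.service elasticsearch.service

    [Service]
    Type=simple
    User=titan
    WorkingDirectory=/opt/titan
    # adjust to however you currently launch Rexster from bin/rexster.sh
    ExecStart=/opt/titan/bin/rexster.sh -s
    Restart=on-failure

    [Install]
    WantedBy=multi-user.target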

Restart TaskTracker and JobTracker of Hadoop CDH4 using Cloudera services

I have made a few entries in mapred-site.xml; to pick up these changes I need to restart the TT and JT running on my cluster nodes.
Is there any way I can restart them using Cloudera Manager web services from the command line?
That way I can automate those steps: any time a change is made to the Hadoop configuration files, it will restart the TT and JT.
Since version 4.0, Cloudera Manager exposes its functionality through an HTTP API which allows you to do the operations through "curl" from the shell. The API is available in both the Free Edition and the Enterprise Edition.
Their repository hosts a set of client-side utilities for communicating with the Cloudera Manager API. You can find more on the documentation page.
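As a hedged example of driving this from the shell with curl: the host, credentials, cluster and service names below are assumptions, so list the clusters and services first to discover the real names used by your CM instance:

    # discover cluster and service names as your CM instance knows them
    curl -u admin:admin 'http://cm-host:7180/api/v1/clusters'
    curl -u admin:admin 'http://cm-host:7180/api/v1/clusters/Cluster1/services'

    # restart the MapReduce service (JobTracker and TaskTrackers) after editing mapred-site.xml
    curl -u admin:admin -X POST \
        'http://cm-host:7180/api/v1/clusters/Cluster1/services/mapreduce1/commands/restart'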