Restart TaskTracker and JobTracker of Hadoop CDH4 using Cloudera services - Linux

I have made a few entries in mapred-site.xml; to pick up these changes I need to restart the TaskTracker (TT) and JobTracker (JT) running on my cluster nodes.
Is there any way I can restart them using the Cloudera Manager web services from the command line?
That way I can automate those steps, so that any time the Hadoop configuration files change, the TT and JT are restarted.

Since version 4.0, Cloudera Manager exposes its functionality through an HTTP API which allows you to perform these operations with curl from the shell. The API is available in both the Free Edition and the Enterprise Edition.
Cloudera's repository also hosts a set of client-side utilities for communicating with the Cloudera Manager API; you can find more on the documentation page.
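As a rough sketch of how that can look from the shell — the host, credentials, cluster name ("cluster1") and service name ("mapreduce1") below are assumptions you would replace with the values returned by the discovery calls:

CM="http://cm-host.example.com:7180/api/v1"   # Cloudera Manager host (assumption)
AUTH="admin:admin"                            # your CM credentials

# Discover the cluster and service names first
curl -s -u "$AUTH" "$CM/clusters"
curl -s -u "$AUTH" "$CM/clusters/cluster1/services"

# Restart the MapReduce service (JobTracker + all TaskTrackers)
curl -s -u "$AUTH" -X POST "$CM/clusters/cluster1/services/mapreduce1/commands/restart"

The restart call returns a command object with an id, which you can poll under /api/v1/commands/<id> if your automation needs to wait for the restart to finish.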

Related

Creating Linux Service for Cassandra DDAC

Creating Linux service for DataStax Distribution of Apache Cassandra (DDAC)
Hi,
I installed the DataStax Distribution of Apache Cassandra (DDAC), the Cassandra community version by DataStax.
I used this link:
https://docs.datastax.com/en/ddac/doc/datastax_enterprise/install/installDDAC.html
At the end of the instructions, it says to start Cassandra with an interactive command, not as a service:
$ bin/cassandra
Also, there is NO option to create a service for Cassandra using:
$ service cassandra start
I get:
Failed to start cassandra.service: Unit not found.
Does DDAC support starting as a service?
Regards,
You are right, DDAC's instructions launch the process from the command line; if you want to set it up as a service, my guess is that DataStax provides that as part of their enterprise product.
You can still create the systemd service unit yourself; there are multiple examples on GitHub like this one.
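For reference, a minimal sketch of such a unit — the /opt/ddac install path and the cassandra user are assumptions based on a tarball install:

# /etc/systemd/system/cassandra.service
[Unit]
Description=Apache Cassandra (DDAC)
After=network.target

[Service]
Type=forking
User=cassandra
# bin/cassandra backgrounds itself; -p makes it write a pid file systemd can track
ExecStart=/opt/ddac/bin/cassandra -p /run/cassandra/cassandra.pid
PIDFile=/run/cassandra/cassandra.pid
RuntimeDirectory=cassandra
LimitNOFILE=100000
LimitMEMLOCK=infinity
Restart=on-failure

[Install]
WantedBy=multi-user.target

After dropping the file in place, run systemctl daemon-reload and systemctl enable --now cassandra to get the service behaviour you were looking for.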

Configuration management for Linux / Hadoop Cluster

I have to set up a small Hadoop cluster on Linux (Ubuntu) machines. For that I have to install the JDK, Python and some other Linux utilities on all systems. After that I have to configure Hadoop on each system one by one. Is there any tool available so that I can install all of these from a single system? For example, if I have to install the JDK on some system, the tool should install it there. I would prefer a web-based tool.
Apache Ambari and Cloudera Manager are purpose-built to accomplish these tasks for Hadoop.
They also monitor the cluster and provision extra services that communicate with it, like Kafka, HBase, Spark, etc.
That only gets you so far, though, and you'll want something like Ansible to deploy custom configurations (AWX is a web UI for Ansible); Puppet and Chef are alternatives too. A minimal playbook is sketched below.
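To give an idea of the Ansible side, here is a minimal playbook sketch for pushing the common packages to every node — the group name "hadoopnodes" and the package list are assumptions, adjust them to your inventory:

# site.yml
- hosts: hadoopnodes
  become: yes
  tasks:
    - name: Install JDK, Python and common utilities
      apt:
        name:
          - openjdk-8-jdk
          - python3
          - rsync
          - ntp
        state: present
        update_cache: yes

Run it with ansible-playbook -i inventory.ini site.yml; the same playbook can later be extended to template the Hadoop *-site.xml files out to every node.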

Linux Hadoop Services monitoring tool and restart if down

I have configured Hadoop 2.7.5 with HBase. It is a 5-system cluster in fully distributed mode. I have to monitor the Hadoop/HBase daemons and want to trigger some action (e.g. mail) if a daemon goes down. Is there any built-in solution?
Also, I want to start Hadoop at boot time. How can I do this?
I am assuming that you aren't using major distributions like Cloudera or Hortonworks; they have this feature built into their stack.
For automated starts at boot time, you can use init.d (or systemd), for example with a unit like the sketch below.
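Here is a sketch of a unit for one daemon (the NameNode); the /opt/hadoop path, the hadoop user and JAVA_HOME are assumptions, and you would create analogous units for the DataNodes, HMaster, RegionServers, etc.:

# /etc/systemd/system/hadoop-namenode.service
[Unit]
Description=Hadoop HDFS NameNode
After=network-online.target

[Service]
Type=oneshot
RemainAfterExit=yes
User=hadoop
Environment=JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
ExecStart=/opt/hadoop/sbin/hadoop-daemon.sh start namenode
ExecStop=/opt/hadoop/sbin/hadoop-daemon.sh stop namenode

[Install]
WantedBy=multi-user.target

Enable it with systemctl enable hadoop-namenode so it comes up at boot.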
For an example of emailing out in the event of a failure, there is a scripting solution in this answer: Bash script to monitor process and sendmail if failed.
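Along the same lines, a small cron-driven watchdog is often enough. This sketch assumes the daemon being watched is the HBase HMaster, that mailx provides the mail command, and that ops@example.com is a placeholder address:

#!/usr/bin/env bash
# check_hmaster.sh -- run from cron, e.g. */5 * * * * /opt/scripts/check_hmaster.sh
DAEMON="HMaster"
ALERT="ops@example.com"
HBASE_HOME=${HBASE_HOME:-/opt/hbase}   # install path is an assumption

# jps lists the JVM daemons running on this node (NameNode, DataNode, HMaster, ...)
if ! jps | grep -qw "$DAEMON"; then
    echo "$DAEMON is down on $(hostname), attempting restart" |
        mail -s "$DAEMON down on $(hostname)" "$ALERT"
    "$HBASE_HOME/bin/hbase-daemon.sh" start master
fi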
In enterprise organisations, most will have a monitoring solution in place, such as Tivoli, which you can hook into.

What version of Apache spark is used in my IBM Analytics for Apache Spark for IBM Cloud service?

I saw an email indicating the sunset of support for Apache Spark 1.6 within IBM Cloud. I am pretty sure my version is 2.x, but I wanted to confirm. I couldn't find anywhere in the UI that indicated the version, and the bx CLI command that I thought would show it didn't.
[chrisr#oc5287453221 ~]$ bx service show "Apache Spark-bc"
Invoking 'cf service Apache Spark-bc'...
Service instance: Apache Spark-bc
Service: spark
Bound apps:
Tags:
Plan: ibm.SparkService.PayGoPersonal
Description: IBM Analytics for Apache Spark for IBM Cloud.
Documentation url: https://www.ng.bluemix.net/docs/services/AnalyticsforApacheSpark/index.html
Dashboard: https://spark-dashboard.ng.bluemix.net/dashboard
Last Operation
Status: create succeeded
Message:
Started: 2018-01-22T16:08:46Z
Updated: 2018-01-22T16:08:46Z
How do I determine the version of Spark that I am using? Also, I tried going to the "Dashboard" URL from above, and I got an "Internal Server Error" message after logging in.
The information found on How to check the Spark version doesn't seem to help, because it relates to locally installed Spark instances. I need to find the information from the IBM Cloud (i.e. Bluemix) using either the UI or the Bluemix CLI. Another possibility would be running some command from a Jupyter Notebook in IPython running in Data Science Experience (part of IBM Cloud).
The answer was given by ptitzler above; I am just adding it as an answer as requested by the email I was sent.
The Spark service itself is not version specific. To find out whether or not you need to migrate, you need to inspect the apps/tools that utilize the service. For example, if you've created notebooks in DSX, you associated them with a kernel that was bound to a specific Spark version, and you'd need to open each notebook to find out which Spark version they are utilizing. – ptitzler Jan 31 at 16:32
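So the practical check happens in the notebook itself. In a Python notebook in DSX the Spark context is pre-created, so something like this prints the kernel's Spark version (a sketch, assuming a Python kernel):

print(sc.version)       # reports e.g. '1.6.x' or '2.x.y'
print(spark.version)    # the SparkSession object only exists on Spark 2.x kernels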

Spark metrics for gmond / Ganglia

OS: CentOS 6.4
ISSUE:
Installed gmond, gmetad and gweb on a server. Installed a Spark worker on the same server.
Configured $SPARK_HOME/conf/metrics.properties as below...
CONFIGURATION (metrics.properties in spark):
org.apache.spark.metrics.sink.GangliaSink
host localhost
port 8649
period 10
unit seconds
ttl 1
mode multicast
We are not able to see any metrics in the Ganglia web UI.
Please do the needful.
-pradeep samudrala
In the first place, those lines are just documentation of the GangliaSink's default settings; you should not simply uncomment them. Taken from the metrics section of the Spark documentation (spark page):
To install the GangliaSink you’ll need to perform a custom build of Spark. Note that by embedding this library you will include LGPL-licensed code in your Spark package. For sbt users, set the SPARK_GANGLIA_LGPL environment variable before building. For Maven users, enable the -Pspark-ganglia-lgpl profile. In addition to modifying the cluster’s Spark build user applications will need to link to the spark-ganglia-lgpl artifact.
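Once you have a Ganglia-enabled build, the sink is configured with *.sink.ganglia.* keys rather than the bare lines above. A sketch for your setup — the host/mode must match the udp_recv_channel in your gmond.conf, and the values below assume gmond's default multicast group:

*.sink.ganglia.class=org.apache.spark.metrics.sink.GangliaSink
*.sink.ganglia.host=239.2.11.71
*.sink.ganglia.port=8649
*.sink.ganglia.period=10
*.sink.ganglia.unit=seconds
*.sink.ganglia.ttl=1
*.sink.ganglia.mode=multicast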
