Spark as a Linux Service - linux

I've been tasked with deploying Spark into a production environment. I typically manage everything with Ansible. I've packaged up ZooKeeper and Kafka and can deploy those as Linux services, but I'm having problems with Spark.
It just doesn't seem to be set up to be started and stopped as a service (I mean init.d services). Is anyone running Spark in cluster mode with an init.d script to start and stop it? Or what's the general consensus on how to set this up?
This is what I have tried so far:
Spark init.d service:
#!/bin/bash
SPARK_BASE_DIR=/opt/spark-2.0.0-bin-hadoop2.7
SPARK_SBIN=$SPARK_BASE_DIR/sbin
PID=''
if [ -f $SPARK_BASE_DIR/conf/spark-env.sh ];then
source $SPARK_BASE_DIR/conf/spark-env.sh
else
echo "$SPARK_BASE_DIR/conf/spark-env.sh does not exist. Can't run script."
exit 1
fi
check_status() {
PID=$(ps ax | grep 'org.apache.spark.deploy.master.Master' | grep java | grep -v grep | awk '{print $1}')
if [ -n "$PID" ]
then
return 1
else
return 0
fi
}
start() {
check_status
if [ "$?" -ne 0 ]
then
echo "Master already running"
exit 1
fi
echo -n "Starting master and workers ... "
su user -c "$SPARK_SBIN/start-all.sh" spark &>/dev/null
sleep 5
check_status
if [ "$?" -eq 0 ]
then
echo "FAILURE"
exit 1
fi
echo "SUCCESS"
exit 0
}
stop() {
check_status
if [ "$?" -eq 0 ]
then
echo "No master running ..."
return 1
else
echo "Stopping master and workers ..."
su user -c "$SPARK_SBIN/stop-all.sh" spark &>/dev/null
sleep 4
echo "done"
return 0
fi
}
status() {
check_status
if [ "$?" -eq 0 ]
then
echo "No master running"
exit 1
else
echo -n "master running: "
echo $PID
exit 0
fi
}
case "$1" in
start)
start
;;
stop)
stop
;;
restart)
stop
start
;;
status)
status
;;
*)
echo "Usage: $0 {start|stop|restart|status}"
exit 1
esac
exit 0
I'm running the service from the master node to start all the cluster nodes.
Some info about my environment:
Ubuntu 16.04
Spark 2.0.0 with Hadoop 2.7

I solved it. The issue came from my Ansible role: I hadn't set the group ownership of the log folder. Now it works fine.
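Since the fix was an ownership problem on the log folder, a small pre-flight check can catch it before start-all.sh runs. The following is a minimal sketch, not part of the original role: the helper name dir_owned_by is hypothetical, and it assumes GNU stat (as shipped on Ubuntu).

```shell
#!/bin/sh
# Hypothetical helper (not from the original post): succeed only when DIR
# exists and is owned by the expected user and group.
# Assumes GNU coreutils stat, which supports -c with %U (owner) and %G (group).
dir_owned_by() {
    dir=$1
    want_user=$2
    want_group=$3
    # Return 2 when the directory is missing, 1 on an ownership mismatch.
    [ -d "$dir" ] || return 2
    [ "$(stat -c '%U' "$dir")" = "$want_user" ] || return 1
    [ "$(stat -c '%G' "$dir")" = "$want_group" ] || return 1
    return 0
}
```

It could be called near the top of the init script, for example as dir_owned_by "$SPARK_BASE_DIR/logs" spark spark || exit 1. Both the logs path and the spark user and group names are assumptions about the deployment.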

Related

Exit from the `nohup` called from shell script

I have a shell script from which I want to start the jmeter-server on a machine. Here is my code:
#!/bin/sh
#Shell script for starting the jmeter server in agent machine
startServer() {
checkProcess=$(lsof -t -i :1099)
if [ -z "$checkProcess" ] ; then
JmeterDirectory='/home/performance/PerfTool/JMeter/apache-jmeter-5.2.1/bin'
cd ${JmeterDirectory} || exit
if [ $? -eq 0 ] ; then
pwd=$(pwd)
echo "Directory changed to $pwd"
startJmeterServer=$(nohup ./jmeter-server &) || exit
if [ "$startJmeterServer" -eq 0 ] ; then
echo "service started successfully"
else
echo "failed to start the service"
fi
else
echo "Went wrong"
fi
else
totalNumberofService=$(lsof -t -i :1099 | wc -l)
if [ "$totalNumberofService" -gt 1 ] ; then
echo "There are multiple JMeter-Server process are running"
else
procssKiller=$(kill -9 "$checkProcess")
${procssKiller}
if [ $? -eq 0 ] ; then
echo "Process has been killed with processId $checkProcess"
else
echo "Failed to kill the process with processId $checkProcess"
fi
fi
fi
}
startServer
This works fine, but after starting the JMeter server the terminal does not return unless I press CTRL+Z. What have I missed here?
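A likely cause, offered as an assumption because the post has no answer: a command substitution such as $(nohup ./jmeter-server &) keeps reading until every process holding its output pipe has exited, and the backgrounded server inherits that pipe. The sketch below starts a command detached, with its output sent to a log file so nothing keeps the caller's pipe or terminal open. The names start_detached and the log file argument are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: run a command in the background, detached from the
# caller's stdin/stdout/stderr, and print its PID.
# Usage: start_detached LOGFILE COMMAND [ARGS...]
start_detached() {
    log=$1
    shift
    # Redirect all three standard streams so the child does not keep the
    # caller's pipe or terminal open; nohup makes it ignore SIGHUP.
    nohup "$@" >"$log" 2>&1 </dev/null &
    echo "$!"
}
```

With this, pid=$(start_detached /tmp/jmeter-server.log ./jmeter-server) returns immediately, and the PID can then be tested with kill -0 "$pid" instead of comparing the captured output to 0.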

Shell script segmentation fault - AWS

I've been following a tutorial on connecting a Raspberry Pi to AWS Greengrass and I keep getting a segmentation fault on the final step. AWS provided me with this greengrassd shell script, but when I run it I get a segmentation fault. I have no idea why it's throwing this error, so any help would be appreciated.
AWS Greengrass Tutorial / RaspberryPi
Error
pi@raspberrypi:/greengrass/ggc/packages/1.1.0 $ sudo ./greengrassd start
Setting up greengrass daemon
Validating execution environment
Found cgroup subsystem: cpu
Found cgroup subsystem: cpuacct
Found cgroup subsystem: blkio
Found cgroup subsystem: memory
Found cgroup subsystem: devices
Found cgroup subsystem: freezer
Found cgroup subsystem: net_cls
Starting greengrass daemon./greengrassd: line 158: 2254 Segmentation fault nohup $COMMAND > /dev/null 2> $CRASH_LOG < /dev/null
Greengrass daemon 2254 failed to start
greengrassd script
#!/usr/bin/env bash
##########Environment Requirement for Greengrass Daemon##########
# by default, the daemon assumes it's going to be launched from a directory
# that has the following structure:
# GREENGRASS_ROOT/
# greengrassd
# bin/daemon
# configuration/
# group/group.json
# certs/server.crt
# lambda/
# system_lambda1/...
# system_lambda2/...
# root cgroup has to be mounted separately, this script doesn't do that for you.
#################################################################
set -e
PWD=$(cd $(dirname "$0"); pwd)
GGC_PKG_HOME=$(readlink -f $PWD)
GG_HOME=$(cd $GGC_PKG_HOME/../../; pwd)
CRASH_LOG=$GG_HOME/var/log/crash.log
GGC_ROOT_FS=$GGC_PKG_HOME/ggc_root
PID_FILE=/var/run/greengrassd.pid
FS_SETTINGS=/proc/sys/fs
GGC_GROUP=ggc_group
GGC_USER=ggc_user
MAX_DAEMON_KILL_WAIT_SECONDS=60
RETRY_SIGTERM_INTERVAL_SECONDS=20
if [ -z "$COMMAND" ]; then
COMMAND="$GGC_PKG_HOME/bin/daemon -core-dir=$GGC_PKG_HOME -greengrassdPid=$$"
fi
# Function ran as part of initial setup
setup() {
echo "Setting up greengrass daemon"
mkdir -p $GGC_ROOT_FS
# Mask greengrass directory for containers
mknod $GGC_ROOT_FS/greengrass c 1 3 &>/dev/null || true
mkdir -p $(dirname "$CRASH_LOG")
}
validatePlatformSecurity() {
if [[ -f $FS_SETTINGS/protected_hardlinks &&
-f $FS_SETTINGS/protected_symlinks ]]; then
PROT_HARDLINK_VAL=$(cat $FS_SETTINGS/protected_hardlinks)
PROT_SOFTLINK_VAL=$(cat $FS_SETTINGS/protected_symlinks)
if [[ "$PROT_HARDLINK_VAL" -ne 1 || "$PROT_SOFTLINK_VAL" -ne 1 ]]; then
echo "AWS Greengrass detected insecure OS configuration: No hardlink/softlink protection enabled." | tee -a $CRASH_LOG
exit 1
fi
fi
}
validateEnvironment() {
echo "Validating execution environment"
# ensure all commands that the installation script is going to use are available
if ! type grep >/dev/null ; then
echo "grep command is NOT on the path or is NOT installed on the system"
exit 1
fi
if ! type cat >/dev/null ; then
echo "cat command is NOT on the path or is NOT installed on the system"
exit 1
fi
if ! type awk >/dev/null ; then
echo "awk command is NOT on the path or is NOT installed on the system"
exit 1
fi
if ! type id >/dev/null ; then
echo "id command is NOT on the path or is NOT installed on the system"
exit 1
fi
if ! type ps >/dev/null ; then
echo "ps command is NOT on the path or is NOT installed on the system"
exit 1
fi
if ! type sqlite3 >/dev/null ; then
echo "sqlite3 command is NOT on the path or is NOT installed on the system"
exit 1
fi
# the script needs to be run as root
if [ ! $(id -u) = 0 ]; then
echo "The script needs to be run using sudo"
exit 1
fi
if ! id $GGC_USER >/dev/null ; then
echo "${GGC_USER} doesn't exist. Please add a user ${GGC_USER} on the system"
exit 1
fi
if ! grep -q $GGC_GROUP /etc/group ; then
echo "${GGC_GROUP} doesn't exist. Please add a group ${GGC_GROUP} on the system"
exit 1
fi
# ensure that kernel supports cgroup
if [ ! -e /proc/cgroups ]; then
echo "The kernel in use does NOT support cgroup."
exit 1
fi
# assume that all kernel supported subsystems, which are listed in /proc/cgroups, are going to be used
# so check whether all of them are mounted.
for d in `awk '$4 == 1 {print $1}' /proc/cgroups`; do
if cat /proc/self/cgroup | grep -q $d; then
echo "Found cgroup subsystem: $d"
else
# exit with error if can't find cgroup
echo "The cgroup subsystem is not mounted: $d"
exit 1
fi
done
}
finish() {
pid=$1
echo "$pid" > $PID_FILE
echo ""
echo -e "\e[0;32mGreengrass successfully started with PID: $pid\e[0m"
exit 0
}
start() {
setup
if [[ $INSECURE -ne 1 ]]; then
validatePlatformSecurity
fi
validateEnvironment
trap 'finish $pid' SIGUSR1
echo ""
echo -n "Starting greengrass daemon"
if nohup $COMMAND >/dev/null 2>$CRASH_LOG < /dev/null &
then
pid=$!
# sleep 10 seconds to wait for daemon to start or exit
sleep 10 &
wait $!
echo ""
echo "Greengrass daemon $pid failed to start"
echo -e "\e[0;31m$(cat $CRASH_LOG)\e[0m"
exit 1
else
echo "Failed to start Greengrass daemon"
exit 1
fi
}
version() {
$GGC_PKG_HOME/bin/daemon --version
}
stop() {
if [ -f $PID_FILE ]; then
PID=$(cat $PID_FILE)
echo "Stopping greengrass daemon of PID: $PID"
if [ ! -e "/proc/$PID" ]; then
rm $PID_FILE
echo "Process with pid $PID does not exist already"
return 0
fi
echo -n "Waiting"
kill "$PID" > /dev/null 2>&1
total_sleep_seconds=0
until [ "$total_sleep_seconds" -ge "$MAX_DAEMON_KILL_WAIT_SECONDS" ]; do
sleep 1
# If the pid no longer exists, we're done, remove the pid file and exit. Otherwise, just increment the loop counter
if [ ! -e "/proc/$PID" ]; then
rm $PID_FILE
echo -e "\nStopped greengrass daemon, exiting with success"
break
else
total_sleep_seconds=$(($total_sleep_seconds+1))
echo -n "."
fi
# If it has been $RETRY_SIGTERM_INTERVAL_SECONDS since the last SIGTERM, send SIGTERM
if [ $(($total_sleep_seconds % $RETRY_SIGTERM_INTERVAL_SECONDS)) -eq "0" ]; then
kill "$PID" > /dev/null 2>&1
fi
done
if [ $total_sleep_seconds -ge $MAX_DAEMON_KILL_WAIT_SECONDS ] && [ -e "/proc/$PID" ]; then
# If we are here, we never exited in the previous loop and the pid still exists. Exit with failure.
kill -9 "$PID" > /dev/null 2>&1
echo -e "\nProcess with pid $PID still alive after timeout of $MAX_DAEMON_KILL_WAIT_SECONDS seconds. Forced kill process, exiting with failure."
exit 1
fi
fi
}
usage() {
echo ""
echo "Usage: $0 [FLAGS] {start|stop|restart}"
echo ""
echo -e "[FLAGS]: \n -i, --insecure \t Run GGC in insecure mode without hardlink/softlink protection, (highly discouraged for production use) \n -v, --version \t\t Outputs the version of GGC."
echo ""
exit 1
}
if [[ $# -eq 0 ]]; then
usage
fi
for var in "$@"
do
case "$var" in
-v|--version)
version
exit 0
;;
esac
done
while [[ $# -gt 0 ]]
do
key="$1"
case $key in
-i|--insecure)
mkdir -p $(dirname "$CRASH_LOG")
echo "Warning! You are running in insecure mode, this is highly discouraged!" | tee -a $CRASH_LOG
INSECURE=1
;;
-h|--help)
usage
;;
start)
stop
start
;;
stop)
stop
;;
restart)
stop
start
;;
*)
usage
esac
shift
done
@Jim Maybe check the model of Pi you are using?
It seems that the Pi version of Greengrass is built for ARMv7-A. I got this problem too, and I'm using an older Model 1 B+, which is ARMv6Z (https://en.wikipedia.org/wiki/Raspberry_Pi#Specifications).
The error we're seeing for line 158 is the ./greengrassd script waiting for the actual process to run:
sudo /greengrass/ggc/packages/1.1.0/bin/daemon -core-dir=/greengrass/ggc/packages/1.1.0 -greengrassdPid=641
/greengrass/ggc/packages/1.1.0/bin/daemon is the binary. If you run the above command directly in the console it exits with the same segmentation fault error.
AWS do recommend using the Pi 3 so I'm guessing it will work on that.
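To check this before running the daemon, the CPU architecture reported by uname -m can be compared with what the binary needs. This is a minimal sketch, assuming the usual Linux machine strings (armv6l on the original Pi models, armv7l or aarch64 on newer ones); the function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the given machine string (as printed by
# `uname -m`) denotes ARMv7 or a newer ARM architecture.
is_armv7_or_newer() {
    case "$1" in
        armv7*|armv8*|aarch64*) return 0 ;;
        *) return 1 ;;
    esac
}

# Report for the machine this runs on.
if is_armv7_or_newer "$(uname -m)"; then
    echo "CPU is ARMv7 or newer"
else
    echo "CPU is $(uname -m), not ARMv7 or newer"
fi
```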

Service status not working

I have the following code for a service that I'm trying to have automatically start on boot.
#!/bin/sh
# Source function library.
. /etc/rc.d/init.d/functions
RETVAL=0
prog='foo'
exec="/usr/sbin/$prog"
pidfile="/var/run/$prog.pid"
lock_file="/var/lock/subsys/$prog"
logfile="/var/log/$prog"
if [ -f /etc/default/foo ]; then
. /etc/default/foo
fi
if [ -z $QUEUE_TYPE ]; then
echo 'ENV variable QUEUE_TYPE has not been set, please set it in /etc/default/foo'
exit 1
fi
get_pid() {
cat "$pidfile"
}
is_running() {
[ -f "$pidfile" ] && ps `get_pid` > /dev/null 2>&1
}
case "$1" in
start)
echo -n "Starting Consul daemon: "
#
daemon --pidfile $pidfile --check foo --user my-user "my app stuff here"
echo
;;
stop)
echo -n 'Stopping Consul daemon: '
killproc foo
echo
;;
status)
status $pidfile
RETVAL=$?
#status -p $pidfile -l $prog
#[ $RETVAL -eq 0 ] && RETVAL=$?
#RETVAL=$?
#if is_running; then
# echo 'Running'
#else
# echo 'Not Running'
#fi
#status foo
#RETVAL=$?
;;
restart)
$0 stop
$0 start
RETVAL=$?
;;
*)
echo 'Usage: foo {start|stop|status|restart}'
exit 1
esac
exit $RETVAL
When I run sudo service foo status it says that it hasn't been started which is correct. After running sudo service foo start and then running the status command, it tells me that the service hasn't been started. I'm not sure what is causing this to happen. I looked at the configurations for other init.d scripts to see how they were handling this and tried to follow their lead. Is there something obvious here that I'm doing wrong or something else that I may be unaware of that's causing this problem?
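Independently of the helpers in /etc/rc.d/init.d/functions, a status check can be built directly on the pid file. This is a sketch under assumptions, not a diagnosis of the script above: pid_file_running is a hypothetical name, and kill -0 only succeeds when the caller is allowed to signal the process (for example when run as root).

```shell
#!/bin/sh
# Hypothetical helper: succeed if PIDFILE exists and names a live process.
pid_file_running() {
    pidfile=$1
    [ -f "$pidfile" ] || return 1
    pid=$(cat "$pidfile")
    # Reject empty or non-numeric content before handing it to kill.
    case "$pid" in
        ''|*[!0-9]*) return 1 ;;
    esac
    # Signal 0 checks existence and permission without sending a signal.
    kill -0 "$pid" 2>/dev/null
}
```

In the status) branch this could replace the commented-out is_running experiment, for example: if pid_file_running "$pidfile"; then echo "$prog is running"; else echo "$prog is stopped"; RETVAL=3; fi. The exit code 3 follows the LSB convention for a service that is not running.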

monitoring gearman in nagios

I am trying to monitor Gearman with Nagios, and for that I am using the script check_gearman.sh.
The Gearman server is running on localhost.
When I run
./check_gearman.sh -H localhost -p 4730 -t 1000
It results in:
CRITICAL: gearman: gearman_client_run_tasks : gearman_wait(GEARMAN_TIMEOUT) timeout reached, 1 servers were poll(), no servers were available, pipe:false -> libgearman/universal.cc:331: pid(613)
Can someone please help me out with this?
The script is below:
#!/bin/sh
#
# gearman check for nagios
# written by Georg Thoma (georg@thoma.cn)
# Last modified: 07-04-2014
#
# Description:
#
#
#
PROGNAME=`/usr/bin/basename $0`
PROGPATH=`echo $0 | sed -e 's,[\\/][^\\/][^\\/]*$,,'`
REVISION="0.04"
export TIMEFORMAT="%R"
. $PROGPATH/utils.sh
# Defaults
hostname=localhost
port=4730
timeout=50
# search for gearmanstuff
GEARMAN_BIN=`which gearman 2>&1 | grep -v "no gearman in"`
if [ "x$GEARMAN_BIN" == "x" ] ; then # result of check is empty
echo "gearman executable not found in path"
exit $STATE_UNKNOWN
fi
GEARADMIN_BIN=`which gearadmin 2>&1 | grep -v "no gearadmin in"`
if [ "x$GEARADMIN_BIN" == "x" ] ; then # result of check is empty
echo "gearadmin executable not found in path"
exit $STATE_UNKNOWN
fi
print_usage() {
echo "Usage: $PROGNAME [-H hostname -p port -t timeout]"
echo "Usage: $PROGNAME --help"
echo "Usage: $PROGNAME --version"
}
print_help() {
print_revision $PROGNAME $REVISION
echo ""
print_usage
echo ""
echo "gearman check plugin for nagios"
echo ""
support
}
# Make sure the correct number of command line
# arguments have been supplied
if [ $# -lt 1 ]; then
print_usage
exit $STATE_UNKNOWN
fi
# Grab the command line arguments
exitstatus=$STATE_WARNING #default
while test -n "$1"; do
case "$1" in
--help)
print_help
exit $STATE_OK
;;
-h)
print_help
exit $STATE_OK
;;
--version)
print_revision $PROGNAME $REVISION
exit $STATE_OK
;;
-V)
print_revision $PROGNAME $REVISION
exit $STATE_OK
;;
-H)
hostname=$2
shift
;;
--hostname)
hostname=$2
shift
;;
-t)
timeout=$2
shift
;;
--timeout)
timeout=$2
shift
;;
-p)
port=$2
shift
;;
--port)
port=$2
shift
;;
*)
echo "Unknown argument: $1"
print_usage
exit $STATE_UNKNOWN
;;
esac
shift
done
# check if server is running and replies to version query
VERSION_RESULT=`$GEARADMIN_BIN -h $hostname -p $port --server-version 2>&1 `
if [ "x$VERSION_RESULT" == "x" ] ; then # result of check is empty
echo "CRITICAL: Server is not running / responding"
exitstatus=$STATE_CRITICAL
exit $exitstatus
fi
# drop function echo to remove functions without workers
DROP_RESULT=`$GEARADMIN_BIN -h $hostname -p $port --drop-function echo_for_nagios 2>&1 `
# check for worker echo_for_nagios and start a new one if needed
CHECKWORKER_RESULT=`$GEARADMIN_BIN -h $hostname -p $port --status | grep echo_for_nagios`
if [ "x$CHECKWORKER_RESULT" == "x" ] ; then # result of check is empty
nohup $GEARMAN_BIN -h $hostname -p $port -w -f echo_for_nagios -- echo echo >/dev/null 2>&1 &
fi
# check the time to get the status from gearmanserver
CHECKWORKER_TIME=$( { time $GEARADMIN_BIN -h $hostname --status ; } 2>&1 |tail -1 )
# check if worker returns "echo"
CHECK_RESULT=`cat /dev/null | $GEARMAN_BIN -h $hostname -p $port -t $timeout -f echo_for_nagios 2>&1`
# validate result and set message and exitstatus
if [ "$CHECK_RESULT" = "echo" ] ; then # we got echo back
echo "OK: got an echo back from gearman server version: $VERSION_RESULT, responded in $CHECKWORKER_TIME sec|time=$CHECKWORKER_TIME;;;"
exitstatus=$STATE_OK
else # timeout reached, no echo
echo "CRITICAL: $CHECK_RESULT"
exitstatus=$STATE_CRITICAL
fi
exit $exitstatus
Thanks in advance.
The mod_gearman package contains a much better and more fully featured check_gearman plugin for Nagios.
With your current plugin, the error message shows that the check script cannot connect to the gearman daemon.
You should verify that gearmand is listening on port 4730 on localhost (the port passed to the check), and that no local firewall is blocking connections. It is likely that gearmand is installed on a different port, or is listening only on the network interface and not on localhost. Or maybe it is not running at all, or is on a different server from the one running the check.
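The connectivity check suggested above can be done without extra tools. This is a minimal sketch, assuming bash is available (it relies on bash's /dev/tcp redirection, which plain sh does not provide); port_open is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if a TCP connection to HOST:PORT can be opened.
# Relies on bash's /dev/tcp redirection, so it needs bash rather than plain sh.
port_open() {
    local host=$1 port=$2
    # The descriptor is opened inside a subshell and closed when it exits.
    (exec 3<>"/dev/tcp/$host/$port") 2>/dev/null
}
```

For the setup in the question, port_open localhost 4730 && echo "gearmand port reachable" tells whether anything accepts connections on the port the check uses. It does not confirm that the listener is actually gearmand.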

starting WSO2 carbon with su

I installed the WSO2 Identity Server on an Ubuntu 10.4 server and connected it to a MySQL database. I then created a user wso2user and gave this user full permissions over the WSO2 folders. When I start the server with the following script:
#! /bin/sh
su wso2user -c '/opt/identitywso2/bin/wso2server.sh'
the server starts and I can log in, but my command prompt stays in the shell, showing the last log messages:
[2014-05-19 14:14:27,938] INFO {org.wso2.carbon.identity.entitlement.internal.EntitlementServiceComponent} - Started thrift entitlement service at port:10500
[2014-05-19 14:14:43,534] INFO {org.wso2.carbon.identity.entitlement.internal.SchemaBuilder} - XACML policy schema loaded successfully.
What could be wrong? I want to start the server without having to stay in the shell.
Thanks for any hints.
Lucas
Here is my script. It is based on WSO2 API Manager, but you can also use it for any other WSO2 product. The script was written for SUSE EE SP3. Put this file in /etc/init.d and register it with chkconfig.
#!/bin/sh
#
# /etc/init.d/wso2
# init script for wso2.
#
# chkconfig: 2345 90 60
# description: wso2 indexer service
#
RETVAL=0
. /etc/rc.status
BAD_USER="This script should be run as root or as wso2 user. Exiting......."
cmd="/bin/sh -c"
if [ "$USER" != 'root' -a "$USER" != 'wso2' -a "$USER" != '' ]; then echo $BAD_USER && exit 1;fi
if [ "$USER" == 'root' -o "$USER" == '' ]; then cmd="su - wso2 -c";fi
wso2pid=`pidof java`
wso2_start() {
echo Starting wso2...
$cmd "/opt/wso2/am/bin/wso2server.sh --start"
}
wso2_stop() {
echo Stopping wso2...
$cmd "/opt/wso2/am/bin/wso2server.sh --stop"
if [ -n "$wso2pid" ]
then
echo -n "Waiting for wso2 ($wso2pid)"
while [[ ( -d /proc/$wso2pid ) ]]
do
echo -n "."
sleep 1
done
echo "Stopped"
fi
}
wso2_restart() {
echo Restarting wso2...
$cmd "/opt/wso2/am/bin/wso2server.sh --restart"
}
wso2_status() {
echo -n "Status of wso2 is "
if [ -n "$wso2pid" ]
then echo "Running. ($wso2pid)"
else echo "Stopped."
fi
}
case "$1" in
status)
wso2_status
;;
start)
wso2_start
;;
stop)
wso2_stop
;;
restart)
wso2_restart
;;
*)
echo "Usage: $0 {start|stop|restart|status}"
exit 1
;;
esac
exit $RETVAL
