How would you make a shell script to monitor mounts and log issues?

I am looking for a good way to monitor and log mounts on a CentOS 6.5 box. Since I am new to Linux shell scripting, I am somewhat at a loss as to whether there is something already around and proven that I could just plug in, or whether there is a good method I should direct my research toward to build my own.
In the end, what I am hoping to have running is a check of each of the 9 mounts on the server to confirm they are up and working. If there is an issue, I would like to log the information to a file, possibly email out the info, and check the next mount. 5-10 minutes later I would like to run it again. I know this probably isn't needed, but we are trying to gather evidence in case there is an issue, or to show a vendor that what they say is the issue is not actually a problem.

This shell script will test each mountpoint and send mail to root if any of them is not mounted:
#!/bin/bash
while sleep 10m;
do
status=$(for mnt in /mnt/disk1 /mnt/disk2 /mnt/disk3; do mountpoint -q "$mnt" || echo "$mnt missing"; done)
[ "$status" ] && echo "$status" | mail root -s "Missing mount"
done
My intention here is not to give a complete turn-key solution but, instead, to give you a starting point for your research.
To make this fit your precise needs, you will need to learn about bash and shell scripts, cron jobs, and other of Unix's very useful tools.
How it works
#!/bin/bash
This announces that this is a bash script.
while sleep 10m; do
This repeats the commands in the loop once every 10 minutes.
status=$(for mnt in /mnt/disk1 /mnt/disk2 /mnt/disk3; do mountpoint -q "$mnt" || echo "$mnt missing"; done)
This cycles through mount points /mnt/disk1, /mnt/disk2, and /mnt/disk3 and tests that each one is mounted. If it isn't, a message is created and stored in the shell variable status.
You will want to replace /mnt/disk1 /mnt/disk2 /mnt/disk3 with your list of mount points, whatever they are.
This uses the command mountpoint, which is standard on modern Linux versions. It is part of the util-linux package. It might be missing on old installations.
[ "$status" ] && echo "$status" | mail root -s "Missing mount"
If status contains any messages, they will be mailed to root with the subject line Missing mount.
There are a few different versions of the mail command. You may need to adjust the argument list to work with the version on your system.
done
This marks the end of the while loop.
Notes
The above script uses a while loop that runs the tests every ten minutes. If you are familiar with the cron system, you may want to use that to run the commands every 10 minutes instead of the while loop.
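Since the question also asks for a log file, here is a minimal sketch of a cron-friendly variant that appends to a log instead of mailing. The function name check_mounts, the log path, and the mount points in the comments are placeholders, not anything from your system.

```shell
#!/bin/bash
# check_mounts LOGFILE MOUNTPOINT...
# Appends one timestamped line to LOGFILE for every path that is not
# currently a mount point. Returns 0 if all are mounted, 1 otherwise.
check_mounts() {
    local logfile=$1 mnt failed=0
    shift
    for mnt in "$@"; do
        if ! mountpoint -q "$mnt"; then
            printf '%s %s is not mounted\n' "$(date '+%F %T')" "$mnt" >> "$logfile"
            failed=1
        fi
    done
    return "$failed"
}

# Example call (hypothetical paths; replace with your own):
#   check_mounts /var/log/mount-check.log /mnt/disk1 /mnt/disk2 /mnt/disk3
```

Saved as a script with the example call enabled, it could be run from a crontab entry such as */10 * * * * /usr/local/bin/check-mounts.sh (the script path is again an assumption).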

Related

shell: rebooting bunch of servers all at the same time

I want to be able to reboot a bunch of servers all at the same time (in a bash script).
Currently, what I do is something like that:
function reboot_servers() {
echo "Rebooting servers..."
for server in "${servers[@]}"
do
sshpass -p 'password' ssh -o StrictHostKeyChecking=no root@$server 'reboot'
done
}
(servers is an array of 4 servers, sometimes 8, and in the future probably more)
Now, I am aware that in theory I cannot really have them rebooted all at the exact same time, but I'd like it to be as simultaneous as possible, and the above solution is far from optimal for me.
In my current script, if every iteration takes (say) a few hundred milliseconds on average (the ssh login sometimes lags and is unpredictable), the time from when the first server launches the reboot command until the last one does could amount to seconds, which makes the approach ineffective.
I should also mention that the clocks on all the servers are synced. To give you some context, the above function is run over and over again in something similar to this:
function main() {
iteration=0
while true
do
echo "------> Iteration $((++iteration)) <------"
wait_random_time
reboot_servers
wait_for_servers
if bug_reproduced
then
echo "Bug was reproduced."
exit 0
else
echo "No reproduction, trying again..."
fi
done
}
I read a little bit about the at command, but I'm not sure how to use it for my benefit here.

I would recommend using parallel-ssh.

I ended up using pdsh, which gave quite impressive results...
$> pdsh -l root -w server0[0-3] date "+%T.%3N"
server00: 12:29:45.845
server01: 12:29:45.830
server02: 12:29:45.870
server03: 12:29:45.893
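
If installing pdsh or parallel-ssh is not an option, plain shell job control gets part of the way there: start one ssh per host in the background and wait for all of them. This is only a sketch. The names run_parallel and reboot_one are made up for illustration, and it assumes key-based login as root is already set up (BatchMode=yes makes ssh fail rather than prompt for a password).

```shell
#!/bin/bash
# run_parallel CMD HOST...
# Starts CMD once per host in the background, then waits for all of them.
# It does not collect the individual exit statuses.
run_parallel() {
    local cmd=$1 host
    shift
    for host in "$@"; do
        "$cmd" "$host" &
    done
    wait
}

# Hypothetical per-host action: reboot one server over ssh.
reboot_one() {
    ssh -o BatchMode=yes "root@$1" reboot
}

# Usage, with the servers array from the question:
#   run_parallel reboot_one "${servers[@]}"
```

The ssh logins still take a variable amount of time, so the spread between hosts depends on the slowest login rather than on the sum of all logins.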

Using "Deep Sleep" / "Power Saving" mode through SSH in NAS OS?

I'm a user of a LACIE 2-BIG-NAS. Up to NAS OS version 4.1.9.2 I had the "Deep Sleep" option in the Home menu, but the next upgrade removed it.
I tried to downgrade to the previous version by following the manual steps, but it was not possible; only upgrades are available.
I asked Lacie's support service, but their solution is to back up my data, do a fresh install, and upgrade only as far as 4.1.9.2. From my point of view this isn't a solution.
Now I am trying to enter deep sleep mode from an SSH connection, because NAS OS is a Linux-based OS. I tried all the possibilities with the initng command that NAS OS uses (sudo ngc -0 and -1), but then it's impossible to wake the NAS with wake-on-LAN (the OS powers off but there is no answer to the wake-on-LAN request).
The wake-on-LAN code is correct, because it works when I schedule the deep sleep mode, but I don't know how to enter deep sleep mode on demand.
I googled and tried other options, but I think these were the closest to a solution.
Please, can you help me find the correct SSH command line to enter deep sleep mode on the Lacie 2-big-nas?
Best regards.

I found the solution in cron. There is a scheduled command, /sbin/smart_shutdown, so if you execute that script as root, the 2-big-nas goes into Deep Sleep mode.
This is the content of the script "smart_shutdown":
#!/bin/sh
#
# This script is intended to handle a user shutdown request.
# It will probably (but not necesseraly) called from a crontab.
#
PATH=/bin:/sbin:/usr/bin:/usr/sbin
valid_runlevels="shutdown halt sleep reboot"
runlevel="sleep"
check_runlevel()
{
req_runlevel=$1
for valid in ${valid_runlevels}; do
[ "${req_runlevel}" = "${valid}" ] && return 0
done
logger "smart_shutdown: request invalid runlevel ${req_runlevel}"
return 1
}
request_runlevel()
{
dbus-send --system --dest=com.lacie.Unicorn --type=method_call --print-reply --reply-timeout=1000 /com/lacie/Unicorn com.lacie.Unicorn.switch_runlevel string:"$1"
}
if [ ! -z "$1" ]; then
check_runlevel "$1" || exit 1
runlevel=$1
fi
request_runlevel ${runlevel}
exit 0
I hope you can take advantage of this in the future.

LDAP - SSH script across multiple VM's

So I'm ssh'ing into a router that has several VMs. It is set up using LDAP so that each VM has the same files, settings, etc. However, they have different cores allocated and different libraries and packages installed. Instead of logging into each VM individually and running the command, I want to automate it by putting the script in .bashrc.
So what I have so far:
export LD_LIBRARY_PATH=/lhome/username
# .so files are in ~/ to avoid permission denied problems
output=$(cat /proc/cpuinfo | grep "^cpu cores" | uniq | tail -c 2)
current=server_name
if [[ "$(hostname -s)" != "$current" ]]; then
ssh $current
fi
/path/to/program --hostname "$(hostname -s)" --threads $((output*2))
Each VM, upon logging in, will execute this script, so I have to check if the current VM has the hostname to avoid an SSH loop. The idea is to run the program, then exit back out to the origin to resume the script. The problem is of course that the process will die upon logging out.
It's been suggested to me to use TMUX on an array of the hostnames, but I would have no idea on how to approach this.

You could install clusterSSH, set up a list of hostnames, and execute things from the terminal windows opened. You may use screen/tmux/nohup to allow processes started to keep running, even after logout.
Yet, if you still want to play around with scripting, you may install tmux, and use:
while read host; do
scp "script_to_run_remotely" ${host}:~/
# -n keeps ssh from consuming the rest of hostlist on stdin; -d already starts the session detached
ssh -n "${host}" "tmux new-session -d '~/script_to_run_remotely'"
done < hostlist
Note: hostlist should be a list of hostnames, one per line.

Is there a variable in Linux that shows me the last time the machine was turned on?

I want to create a script that, after knowing that my machine has been turned on for at least 7h, it does something.
Is this possible? Is there a system variable or something like that that shows me the last time the machine was turned on?

The following command placed in /etc/rc.local:
echo 'touch /tmp/test' | at -t $(date -d "+7 hours" +%m%d%H%M)
will create a job that will run a touch /tmp/test in seven hours.
To protect against frequent reboots and prevent adding multiple jobs you could use one at queue exclusively for this type of jobs (e.g. c queue). Adding -q c to the list of at parameters will place the job in the c queue. Before adding new job you can delete all jobs from c queue:
for job in $(atq -q c | sed 's/[ \t].*//'); do atrm $job; done

You can parse the output of uptime I suppose.

As Pavel and thkala point out below, this is not a robust solution. See their comments!
The uptime command shows you how long the system has been running.
To accomplish your task, you can make a script that first does sleep 25200 (25200 seconds = 7 hours), and then does something useful. Have this script run at startup, for example by adding it to /etc/rc.local. This is a better idea than polling the uptime command to see if the machine has been up for 7 hours (which is comparable to a kid in the backseat of a car asking "are we there yet?" :-))

Just wait for uptime to equal seven hours.
http://linux.die.net/man/1/uptime

I don't know if this is what you are looking for, but the uptime command will tell you how long the computer has been running since the last reboot.

$ cut -d ' ' -f 1 </proc/uptime
This will give you the current system uptime in seconds, in floating point format.
The following could be used in a bash script:
if [[ "$(cut -d . -f 1 </proc/uptime)" -gt "$(($HOURS * 3600))" ]]; then
...
fi
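
As a sketch, the same test can be wrapped in a small function. The name up_at_least and the optional second argument are my own additions (the second argument only exists so the function can be tried against a sample file). Note that it uses "at least" (-ge), where the snippet above uses "more than" (-gt).

```shell
#!/bin/bash
# up_at_least HOURS [UPTIME_FILE]
# Succeeds if the uptime read from UPTIME_FILE (default /proc/uptime)
# is at least HOURS hours.
up_at_least() {
    local hours=$1 file=${2:-/proc/uptime} secs rest
    read -r secs rest < "$file" || return 2
    secs=${secs%%.*}                      # drop the fractional part
    [ "$secs" -ge $((hours * 3600)) ]
}

# Usage:
#   if up_at_least 7; then
#       echo "up for at least seven hours"
#   fi
```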

Add the following to your crontab:
@reboot sleep 7h; /path/to/job
Put it in /etc/crontab, /etc/cron.d/, or your user's crontab, depending on whether you want to run it as root or as the user. Don't forget to put "root" after "@reboot" if you put it in /etc/crontab or cron.d.
This has the benefit that if you reboot multiple times, the jobs get cancelled at shutdown, so you won't get a bunch of them stacking up if you reboot several times within 7 hours. The "@reboot" time specification triggers the job once when the system boots. "sleep 7h;" waits for 7 hours before running "/path/to/job".

How to make sure an application keeps running on Linux

I'm trying to ensure a script remains running on a development server. It collates stats and provides a web service so it's supposed to persist, yet a few times a day, it dies off for unknown reasons. When we notice we just launch it again, but it's a pain in the rear and some users don't have permission (or the knowhow) to launch it up.
The programmer in me wants to spend a few hours getting to the bottom of the problem but the busy person in me thinks there must be an easy way to detect if an app is not running, and launch it again.
I know I could cron-script ps through grep:
ps -A | grep appname
But again, that's another hour of my life wasted on doing something that must already exist... Is there not a pre-made app that I can pass an executable (optionally with arguments) and that will keep a process running indefinitely?
In case it makes any difference, it's Ubuntu.

I have used a simple script with cron to make sure that the program is running. If it is not, then it will start it up. This may not be the perfect solution you are looking for, but it is simple and works rather well.
#!/bin/bash
#make-run.sh
#make sure a process is always running.
export DISPLAY=:0 #needed if you are running a simple gui app.
process=YourProcessName
makerun="/usr/bin/program"
if ps ax | grep -v grep | grep $process > /dev/null
then
exit
else
$makerun &
fi
exit
Then add a cron job every minute, or every 5 minutes.

Monit is perfect for this :)
You can write simple config files which tell monit to watch e.g. a TCP port, a PID file etc
monit will run a command you specify when the process it is monitoring is unavailable/using too much memory/is pegging the CPU for too long/etc. It will also pop out an email alert telling you what happened and whether it could do anything about it.
We use it to keep a load of our websites running while giving us early warning when something's going wrong.
-- Your faithful employee, Monit

Notice: Upstart is in maintenance mode and was abandoned by Ubuntu, which now uses systemd. Check the systemd manual for details on how to write a service definition.
Since you're using Ubuntu, you may be interested in Upstart, which has replaced the traditional sysV init. One key feature is that it can restart a service if it dies unexpectedly. Fedora has moved to upstart, and Debian is in experimental, so it may be worth looking into.
This may be overkill for this situation though, as a cron script will take 2 minutes to implement.
#!/bin/bash
if [[ ! `pidof -s yourapp` ]]; then
invoke-rc.d yourapp start
fi

If you are using a systemd-based distro such as Fedora or a recent Ubuntu release, you can use systemd's "Restart" capability for services. It can be set up as a system service, or as a user service if it needs to be managed by, and run as, a particular user, which is more likely the case in OP's situation.
The Restart option takes one of no, on-success, on-failure, on-abnormal, on-watchdog, on-abort, or always.
To run it as a user, simply place a file like the following into ~/.config/systemd/user/something.service:
[Unit]
Description=Something
[Service]
ExecStart=/path/to/something
Restart=on-failure
[Install]
WantedBy=default.target
then:
systemctl --user daemon-reload
systemctl --user [status|start|stop|restart] something
No root privilege / modification of system files needed, no cron jobs needed, nothing to install, flexible as hell (see all the related service options in the documentation).
See also https://wiki.archlinux.org/index.php/Systemd/User for more information about using the per-user systemd instance.

I have used this from cron: "killall -0 programname || /etc/init.d/programname start". killall returns an error if the process doesn't exist. If it does exist, a null signal (signal 0) is sent, which only performs the existence and permission check and is never actually delivered to the process.
This idiom is simple to remember (IMHO). Generally I use this while I'm still trying to discover why the service itself is failing. IMHO a program shouldn't just disappear unexpectedly :)
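
A related sketch, swapping in a different technique: the shell's built-in kill -0 with a PID file, which avoids matching by process name. The function names and the PID file path are hypothetical, and it assumes the program stays in the foreground rather than forking itself into the background.

```shell
#!/bin/bash
# pid_alive PID: succeeds if a process with that PID exists and we are
# allowed to signal it (signal 0 is only an existence/permission check).
pid_alive() {
    kill -0 "$1" 2>/dev/null
}

# ensure_running PIDFILE CMD [ARGS...]
# Starts CMD in the background unless the PID recorded in PIDFILE is alive.
ensure_running() {
    local pidfile=$1 pid
    shift
    if [ -r "$pidfile" ] && read -r pid < "$pidfile" && pid_alive "$pid"; then
        return 0
    fi
    "$@" &
    echo $! > "$pidfile"
}

# Usage from a cron script (hypothetical paths):
#   ensure_running /var/run/programname.pid /usr/bin/programname
```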

Put your run in a loop- so when it exits, it runs again... while(true){ run my app.. }
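
In shell, that idea might look like the sketch below. The function name keep_running and its two leading arguments are assumptions added so the loop can be bounded and tried out; a real wrapper would probably use a very large limit or a plain while true loop.

```shell
#!/bin/bash
# keep_running MAX_RUNS DELAY CMD [ARGS...]
# Runs CMD in the foreground; whenever it exits, logs the exit status to
# stderr, sleeps DELAY seconds and starts it again, up to MAX_RUNS runs.
keep_running() {
    local max=$1 delay=$2 runs=0 status
    shift 2
    while [ "$runs" -lt "$max" ]; do
        status=0
        "$@" || status=$?
        runs=$((runs + 1))
        echo "$(date '+%F %T') '$1' exited with status $status (run $runs of $max)" >&2
        sleep "$delay"
    done
}

# Usage (hypothetical program path):
#   keep_running 1000 5 /usr/bin/program --some-option
```

The delay matters: without it, a program that fails immediately on startup would be restarted in a tight loop.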

I couldn't get Chris Wendt's solution to work for some reason, and it was hard to debug. This one is pretty much the same but easier to debug, and it excludes bash from the pattern matching. To debug, just run: bash /root/makerun-mysql.sh. The following example uses mysql-server; just replace the values of the variables process and makerun with those for your process.
Create a BASH-script like this (nano /root/makerun-mysql.sh):
#!/bin/bash
process="mysql"
makerun="/etc/init.d/mysql restart"
if ps ax | grep -v grep | grep -v bash | grep --quiet $process
then
printf "Process '%s' is running.\n" "$process"
exit
else
printf "Starting process '%s' with command '%s'.\n" "$process" "$makerun"
$makerun
fi
exit
Make sure it's executable by adding proper file permissions (i.e. chmod 700 /root/makerun-mysql.sh)
Then add this to your crontab (crontab -e):
# Keep processes running every 5 minutes
*/5 * * * * bash /root/makerun-mysql.sh

The supervise tool from daemontools would be my preference - but then everything Dan J Bernstein writes is my preference :)
http://cr.yp.to/daemontools/supervise.html
You have to create a particular directory structure for your application startup script, but it's very simple to use.

First of all, how do you start this app? Does it fork itself to the background? Is it started with nohup .. & etc? If it's the latter, check nohup.out to see why it died; if it's the former, build in logging.
As for your main question: you could cron it, or run another process in the background (not the best choice) and use pidof in a bash script, which is easy enough:
if ! pidof -s app > /dev/null; then
nohup app &
fi

You could make it a service launched from inittab (although some Linuxes have moved on to something newer in /etc/event.d). These built in systems make sure your service keeps running without writing your own scripts or installing something new.

It's a job for a DMD (daemon monitoring daemon). There are a few around, but I usually just write a script that checks whether the daemon is running, starts it if not, and put it in cron to run every minute.

Check out 'nanny' referenced in Chapter 9 (p197 or thereabouts) of "Unix Hater's Handbook" (one of several sources for the book in PDF).

A nice, simple way to do this is as follows:
Write your server to die if it can't listen on the port it expects
Set a cronjob to try to launch your server every minute
If it isn't running it'll start, and if it is running it won't. In any case, your server will always be up.

I think a better solution is to test the function, too. For example, if you have to test an Apache server, it is not enough to test only whether "apache" processes exist on the system.
If you want to test whether Apache is OK, try to download a simple web page and test whether your unique code is in the output.
If not, kill Apache with -9 and then restart it. Also send a mail to root (which is a forwarded mail address to the admins of the company/server/project).
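
A minimal sketch of such a functional check follows. The helper name output_contains, the URL, the marker string and the restart command in the comments are all hypothetical. It assumes curl is installed.

```shell
#!/bin/bash
# output_contains MARKER CMD [ARGS...]
# Runs CMD and succeeds only if its standard output contains MARKER
# as a fixed string (not a regular expression).
output_contains() {
    local marker=$1
    shift
    "$@" 2>/dev/null | grep -qF -- "$marker"
}

# Hypothetical use from a cron script:
#   if ! output_contains "my-unique-code" \
#           curl -fsS --max-time 10 http://localhost/health.html; then
#       echo "health page check failed" | mail -s "apache check failed" root
#       /etc/init.d/apache2 restart   # replace with your own kill/restart steps
#   fi
```

Passing the fetch command as arguments keeps the check itself independent of curl, so the same helper works with wget or any other client.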

It's even simpler:
#!/bin/bash
export DISPLAY=:0
process=processname
makerun="/usr/bin/processname"
if ! pgrep $process > /dev/null
then
$makerun &
fi
You have to remember though to make sure processname is unique.

One can install minutely monitoring cronjob like this:
crontab -l > crontab; echo '* * * * * export DISPLAY=":0.0"; for app in "eiskaltdcpp-qt" "transmission-gtk" "nicotine"; do ps aux | grep -v grep | grep -q "$app" || "$app" & done' >> crontab; crontab crontab
The disadvantage is that each app name you enter has to be found in the output of ps aux | grep "appname" and, at the same time, has to be launchable under that name: "appname" &

You can also use pm2, a process manager for Node.js that is installed through npm:
sudo npm install pm2 -g
Then you can run the service.
linux service:
sudo pm2 start [service_name]
npm service app:
pm2 start index.js
