shell: rebooting a bunch of servers at the same time - linux

I want to be able to reboot a bunch of servers all at the same time (in a bash script).
Currently, what I do is something like this:
function reboot_servers() {
    echo "Rebooting servers..."
    for server in "${servers[@]}"
    do
        sshpass -p 'password' ssh -o StrictHostKeyChecking=no root@"$server" 'reboot'
    done
}
(servers is an array of 4 servers, sometimes 8, and in the future probably more)
Now, I am aware that in theory I cannot really have them reboot at the exact same time, but I'd like them to be as simultaneous as possible, and the above solution is far from optimal for me.
In my current script, if every iteration takes, say, a few hundred milliseconds on average (the ssh login sometimes lags and is unpredictable), the time from when the first server launches the reboot command until the last one does can add up to seconds, which is far too slow for my purposes.
I should also mention that the clocks on all the servers are synced, and, to give you some context, the above function is being run over and over again in something similar to this:
function main() {
    iteration=0
    while true
    do
        echo "------> Iteration $((++iteration)) <------"
        wait_random_time
        reboot_servers
        wait_for_servers
        if bug_reproduced
        then
            echo "Bug was reproduced."
            exit 0
        else
            echo "No reproduction, trying again..."
        fi
    done
}
I read a little bit about the at command, but I'm not sure how to use it for my benefit here.

I would recommend using parallel-ssh.
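For illustration, a hedged sketch of what that could look like (flag names vary between pssh packages, so check your version's man page; a hosts.txt file listing one server per line is an assumption):
# fan the command out to all hosts concurrently; -A prompts for the
# password once, -i prints each host's output inline
parallel-ssh -h hosts.txt -l root -A -i 'reboot'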

I ended up using pdsh, which gave quite impressive results:
$> pdsh -l root -w server0[0-3] date "+%T.%3N"
server00: 12:29:45.845
server01: 12:29:45.830
server02: 12:29:45.870
server03: 12:29:45.893
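Applied to the actual reboot, the call would look something like this (a sketch; setting PDSH_RCMD_TYPE=ssh is only needed if ssh is not your build's default rcmd module):
PDSH_RCMD_TYPE=ssh pdsh -l root -w server0[0-3] reboot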

Related

How to use the "watch" command with SSH

I have a script that monitors a specific server, giving me the disk usage, CPU usage, etc. I am using 2 Ubuntu VMs: I run the script on the server via SSH (ssh user@ip < script.sh from the first VM), and I want to make it show values in real time, so I tried 2 approaches I found on here:
1. while loop with clear
The first approach is using a while loop with "clear" to make the script run multiple times, giving new values every time and clearing the previous output like so:
while true
do
clear;
# bunch of code
done
The problem here is that it doesn't clear the terminal; it just keeps printing the new results one after another.
2. watch
The second approach uses watch:
watch -n 1 Script.sh
This works fine on the local machine (to monitor the current machine where the script is), but I can't find a way to make it run via SSH. Something like
ssh user@ip 'watch -n 1 script.sh'
works in principle, but it requires the script to be present on the server, which I want to avoid. Is there any way to run watch for the remote execution (via SSH) of a script that is present on the local machine?
For your second approach (using watch), what you can do instead is to run watch locally (from within the first VM) with an SSH command and piped-in script like this:
watch -n 1 'ssh user@ip < script.sh'
The drawback of this is that it will reconnect in each watch iteration (i.e., once a second), which some server configurations might not allow. See here for how to let SSH re-use the same connection for serial ssh runs.
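For reference, connection re-use can be configured with OpenSSH's ControlMaster options; a minimal ~/.ssh/config sketch (the host alias and timings are assumptions):
Host monitored
    HostName ip
    User user
    ControlMaster auto
    ControlPath ~/.ssh/cm-%r@%h:%p
    ControlPersist 10m
With that in place, watch -n 1 'ssh monitored < script.sh' re-uses one connection instead of reconnecting every second.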
But if what you want to do is to monitor servers, what I really recommend is to use a monitoring system like 'telegraf'.

How would you make a shell script to monitor mounts and log issues?

I am looking for a good way to monitor and log mounts on a CentOS 6.5 box. Since I am new to Linux shell scripting, I am somewhat at a loss as to whether there is something proven already around that I could just plug in, or whether there is a good method I should direct my research toward to build my own.
In the end, what I hope to have running is a check of each of the 9 mounts on the server to confirm they are up and working. If there is an issue, I would like to log the information to a file, possibly email out the info, and check the next mount, then run it all again 5-10 minutes later. I know this probably isn't needed, but we are trying to gather evidence of whether there is an issue, or show a vendor that what they claim is the issue is not actually a problem.
This shell script will test each mountpoint and send mail to root if any of them is not mounted:
#!/bin/bash
while sleep 10m;
do
status=$(for mnt in /mnt/disk1 /mnt/disk2 /mnt/disk3; do mountpoint -q "$mnt" || echo "$mnt missing"; done)
[ "$status" ] && echo "$status" | mail root -s "Missing mount"
done
My intention here is not to give a complete turn-key solution but, instead, to give you a starting point for your research.
To make this fit your precise needs, you will need to learn about bash and shell scripts, cron jobs, and other of Unix's very useful tools.
How it works
#!/bin/bash
This announces that this is a bash script.
while sleep 10m; do
This repeats the commands in the loop once every 10 minutes.
status=$(for mnt in /mnt/disk1 /mnt/disk2 /mnt/disk3; do mountpoint -q "$mnt" || echo "$mnt missing"; done)
This cycles through mount points /mnt/disk1, /mnt/disk2, and /mnt/disk3 and tests that each one is mounted. If it isn't, a message is created and stored in the shell variable status.
You will want to replace /mnt/disk1 /mnt/disk2 /mnt/disk3 with your list of mount points, whatever they are.
This uses the mountpoint command, which is standard on modern Linux systems. It is part of the util-linux package. It might be missing on old installations.
[ "$status" ] && echo "$status" | mail root -s "Missing mount"
If status contains any messages, they will be mailed to root with the subject line Missing mount.
There are a few different versions of the mail command. You may need to adjust the argument list to work with the version on your system.
done
This marks the end of the while loop.
Notes
The above script uses a while loop that runs the tests every ten minutes. If you are familiar with the cron system, you may want to use that to run the commands every 10 minutes instead of the while loop.
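For example, a sketch of the cron variant (the script path is an assumption; the script itself would then contain only the status and mail lines, without the while loop):
*/10 * * * * /usr/local/bin/check-mounts.sh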

Linux, Bash execute commands in script after spawned process has ended

I am writing a small bash script.
I have a command that spawns a process then returns, leaving the spawned process running. I need to wait for the spawned process to terminate, then run some commands. How can I do this?
The specific case is:
VBoxManage startvm "my_vm"
#when my_vm closes
do_things
However, I've encountered this issue before in other contexts, so if possible I'm looking for a general solution rather than one specific to VirtualBox VMs.
I have an answer, and it's not pretty
tail --pid=$pid -f /dev/null
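As a general pattern, that looks something like this (a sketch; locating the spawned process with pgrep is an assumption, adapt it to however your process can be identified):
spawn_command                    # returns immediately, worker keeps running
pid=$(pgrep -o -f my_worker)     # hypothetical: oldest process matching the worker's command line
tail --pid="$pid" -f /dev/null   # blocks until that PID exits
do_things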
In the context of a VBox vm, the following heroic one-liner has proved successful.
VBoxManage startvm "my_vm"; tail --pid=$(awk '/Process ID:/ {print $4;}' /path_to/my_vm/Logs/VBox.log) -f /dev/null; echo hello
Running this, I was able to see that 'hello' was not output until after my_vm had shut down.
Good grief, there needs to be a better way of doing that than a side effect of an option of a command out of left field. Any better answers? Please...
Unfortunately, I do not think you can generalize VirtualBox (or any process that spawns processes in the background) into a one-size-fits-all solution.
Here is a VirtualBox specific answer.
As you noticed VBox runs the machine (and a bunch other things) in the background. But you can query VBoxManage for what is running:
vm_name="my_vm"
VBoxManage startvm "$vm_name"
while VBoxManage list runningvms | grep -q "$vm_name"; do
    echo "Sleeping ..."
    sleep 5
done
echo "Done!"
do_things
Hopefully you can tune this to your specific needs.

Is there a variable in Linux that shows me the last time the machine was turned on?

I want to create a script that does something once it knows my machine has been turned on for at least 7 hours.
Is this possible? Is there a system variable or something like that that shows me the last time the machine was turned on?
The following command placed in /etc/rc.local:
echo 'touch /tmp/test' | at -t $(date -d "+7 hours" +%m%d%H%M)
will create a job that runs touch /tmp/test in seven hours.
To protect against frequent reboots and prevent adding multiple jobs, you could use one at queue exclusively for this type of job (e.g. the c queue). Adding -q c to the list of at parameters will place the job in the c queue. Before adding a new job, you can delete all jobs from the c queue:
for job in $(atq -q c | sed 's/[ \t].*//'); do atrm $job; done
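Putting the two pieces together, the /etc/rc.local snippet might look like this (a sketch using the same c queue and example job as above):
# clear any jobs left in queue c by a previous boot
for job in $(atq -q c | sed 's/[ \t].*//'); do atrm "$job"; done
# schedule the job to run seven hours from now, in queue c
echo 'touch /tmp/test' | at -q c -t $(date -d "+7 hours" +%m%d%H%M)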
You can parse the output of uptime I suppose.
As Pavel and thkala point out below, this is not a robust solution. See their comments!
The uptime command shows you how long the system has been running.
To accomplish your task, you can make a script that first does sleep 25200 (25200 seconds = 7 hours), and then does something useful. Have this script run at startup, for example by adding it to /etc/rc.local. This is a better idea than polling the uptime command to see if the machine has been up for 7 hours (which is comparable to a kid in the backseat of a car asking "are we there yet?" :-))
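A minimal sketch of that approach (the job path is hypothetical; backgrounding the subshell keeps rc.local from blocking the boot for 7 hours):
# in /etc/rc.local
( sleep 25200 && /usr/local/bin/do-something ) &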
Just wait for uptime to equal seven hours.
http://linux.die.net/man/1/uptime
I don't know if this is what you are looking for, but the uptime command will tell you how long the computer has been running since the last reboot.
$ cut -d ' ' -f 1 </proc/uptime
This will give you the current system uptime in seconds, in floating point format.
The following could be used in a bash script (with HOURS set beforehand to the threshold, e.g. HOURS=7):
if [[ "$(cut -d . -f 1 </proc/uptime)" -gt "$((HOURS * 3600))" ]]; then
...
fi
Add the following to your crontab:
@reboot sleep 7h; /path/to/job
Either /etc/crontab, /etc/cron.d/, or your user's crontab, depending on whether you want to run it as root or as the user -- don't forget to put "root" after "@reboot" if you put it in /etc/crontab or cron.d.
This has the benefit that if you reboot multiple times, the jobs get cancelled at shutdown, so you won't get a bunch of them stacking up if you reboot several times within 7 hours. The "@reboot" time specification triggers the job to run once when the system is rebooted. "sleep 7h;" waits for 7 hours before running "/path/to/job".

How to make sure an application keeps running on Linux

I'm trying to ensure a script remains running on a development server. It collates stats and provides a web service so it's supposed to persist, yet a few times a day, it dies off for unknown reasons. When we notice we just launch it again, but it's a pain in the rear and some users don't have permission (or the knowhow) to launch it up.
The programmer in me wants to spend a few hours getting to the bottom of the problem but the busy person in me thinks there must be an easy way to detect if an app is not running, and launch it again.
I know I could cron-script ps through grep:
ps -A | grep appname
But again, that's another hour of my life wasted on doing something that must already exist... Is there not a pre-made app that I can pass an executable (optionally with arguments) and that will keep a process running indefinitely?
In case it makes any difference, it's Ubuntu.
I have used a simple script with cron to make sure that the program is running. If it is not, then it will start it up. This may not be the perfect solution you are looking for, but it is simple and works rather well.
#!/bin/bash
#make-run.sh
#make sure a process is always running.
export DISPLAY=:0 #needed if you are running a simple gui app.
process=YourProcessName
makerun="/usr/bin/program"
if ps ax | grep -v grep | grep "$process" > /dev/null
then
exit
else
$makerun &
fi
exit
Then add a cron job every minute, or every 5 minutes.
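For example, a crontab entry running it every 5 minutes (the script path is an assumption):
*/5 * * * * /path/to/make-run.sh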
Monit is perfect for this :)
You can write simple config files which tell monit to watch e.g. a TCP port, a PID file etc
monit will run a command you specify when the process it is monitoring is unavailable/using too much memory/is pegging the CPU for too long/etc. It will also pop out an email alert telling you what happened and whether it could do anything about it.
We use it to keep a load of our websites running while giving us early warning when something's going wrong.
-- Your faithful employee, Monit
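To illustrate the kind of config file meant above, a minimal monitrc stanza might look like this (a sketch; the pidfile, init script, and port are assumptions about your app):
check process myapp with pidfile /var/run/myapp.pid
    start program = "/etc/init.d/myapp start"
    stop program = "/etc/init.d/myapp stop"
    if failed port 8080 protocol http then restart
    if 5 restarts within 5 cycles then timeout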
Note: Upstart is in maintenance mode and was abandoned by Ubuntu, which now uses systemd. Check the systemd manual for details on how to write a service definition.
Since you're using Ubuntu, you may be interested in Upstart, which has replaced the traditional sysV init. One key feature is that it can restart a service if it dies unexpectedly. Fedora has moved to upstart, and Debian is in experimental, so it may be worth looking into.
This may be overkill for this situation though, as a cron script will take 2 minutes to implement.
#!/bin/bash
if ! pidof -s yourapp > /dev/null; then
    invoke-rc.d yourapp start
fi
If you are using a systemd-based distro such as Fedora and recent Ubuntu releases, you can use systemd's "Restart" capability for services. It can be setup as a system service or as a user service if it needs to be managed by, and run as, a particular user, which is more likely the case in OP's particular situation.
The Restart option takes one of no, on-success, on-failure, on-abnormal, on-watchdog, on-abort, or always.
To run it as a user, simply place a file like the following into ~/.config/systemd/user/something.service:
[Unit]
Description=Something
[Service]
ExecStart=/path/to/something
Restart=on-failure
[Install]
WantedBy=graphical.target
then:
systemctl --user daemon-reload
systemctl --user [status|start|stop|restart] something
No root privilege / modification of system files needed, no cron jobs needed, nothing to install, flexible as hell (see all the related service options in the documentation).
See also https://wiki.archlinux.org/index.php/Systemd/User for more information about using the per-user systemd instance.
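One caveat (assuming a standard systemd setup): per-user services normally run only while the user is logged in. To have the unit start at boot and survive logout, enable it and turn on lingering for that user:
systemctl --user enable something
sudo loginctl enable-linger username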
From cron I have used killall -0 programname || /etc/init.d/programname start. killall -0 will error if no process by that name exists; if one does exist, it delivers a null signal to the process (which the kernel ignores and does not bother passing on).
This idiom is simple to remember (IMHO). Generally I use it while I'm still trying to discover why the service itself is failing. IMHO a program shouldn't just disappear unexpectedly :)
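In crontab form, the idiom might look like this (the program name and init script are placeholders):
* * * * * killall -0 programname || /etc/init.d/programname start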
Put your run in a loop, so that when it exits, it runs again: while true; do run-my-app; done
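A minimal shell rendering of that idea (the binary path is assumed; the sleep guards against a tight respawn loop if the app crashes immediately at startup):
#!/bin/bash
while true
do
    /usr/bin/myapp
    sleep 2
done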
I couldn't get Chris Wendt's solution to work for some reason, and it was hard to debug. This one is pretty much the same, but easier to debug and excludes bash from the pattern matching. To debug, just run bash /root/makerun-mysql.sh. In the following example with mysql-server, just replace the values of the process and makerun variables for your process.
Create a BASH-script like this (nano /root/makerun-mysql.sh):
#!/bin/bash
process="mysql"
makerun="/etc/init.d/mysql restart"
if ps ax | grep -v grep | grep -v bash | grep --quiet "$process"
then
printf "Process '%s' is running.\n" "$process"
exit
else
printf "Starting process '%s' with command '%s'.\n" "$process" "$makerun"
$makerun
fi
exit
Make sure it's executable by adding proper file permissions (i.e. chmod 700 /root/makerun-mysql.sh)
Then add this to your crontab (crontab -e):
# Keep processes running every 5 minutes
*/5 * * * * bash /root/makerun-mysql.sh
The supervise tool from daemontools would be my preference - but then everything Dan J Bernstein writes is my preference :)
http://cr.yp.to/daemontools/supervise.html
You have to create a particular directory structure for your application startup script, but it's very simple to use.
First of all, how do you start this app? Does it fork itself to the background, or is it started with nohup ... & etc.? If it's the latter, check nohup.out to see why it died; if it's the former, build in logging.
As for your main question: you could cron it, or run another process in the background (not the best choice) and use pidof in a bash script, easily enough:
if ! pidof -s app > /dev/null; then
    nohup app &
fi
You could make it a service launched from inittab (although some Linuxes have moved on to something newer in /etc/event.d). These built-in systems make sure your service keeps running without you writing your own scripts or installing something new.
It's a job for a DMD (daemon monitoring daemon). There are a few around, but I usually just write a script that checks whether the daemon is running and starts it if not, and put it in cron to run every minute.
Check out 'nanny' referenced in Chapter 9 (p197 or thereabouts) of "Unix Hater's Handbook" (one of several sources for the book in PDF).
A nice, simple way to do this is as follows:
Write your server to die if it can't listen on the port it expects
Set a cronjob to try to launch your server every minute
If it isn't running it'll start, and if it is running it won't. In any case, your server will always be up.
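The cron entry for that scheme could be as simple as this (the server path is an assumption; a second copy exits on its own because the port is already bound):
* * * * * /usr/local/bin/myserver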
I think a better solution is to test the functionality, too. For example, if you have to test Apache, it is not enough to check whether "apache" processes exist on the system.
If you want to test whether Apache is actually OK, try downloading a simple web page and check that your unique code is in the output.
If it isn't, kill Apache with -9 and then restart it. And send a mail to root (a forwarded mail address that reaches the people responsible for the company/server/project).
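A sketch of such a functional check (the URL, marker string, service name, and restart command are all assumptions to adapt):
#!/bin/bash
# fetch the page and look for the unique marker; restart and alert on failure
if ! curl -fsS --max-time 10 http://localhost/check.html | grep -q 'UNIQUE-CODE'; then
    killall -9 apache2
    service apache2 start
    echo "apache restarted on $(hostname)" | mail -s "apache check failed" root
fi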
It's even simpler:
#!/bin/bash
export DISPLAY=:0
process=processname
makerun="/usr/bin/processname"
if ! pgrep "$process" > /dev/null
then
$makerun &
fi
You have to remember though to make sure processname is unique.
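Note that pgrep matches the pattern anywhere in the process name by default; pgrep -x "$process" requires an exact match, which helps avoid false positives from similarly named processes.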
One can install a minutely monitoring cron job like this (a single line, shown wrapped here):
crontab -l > crontab; echo '* * * * * export DISPLAY=":0.0" && for app in "eiskaltdcpp-qt" "transmission-gtk" "nicotine"; do ps aux | grep -v grep | grep -q "$app" || "$app" & done' >> crontab; crontab crontab
The disadvantage is that each app name you enter has to be found in the ps aux | grep "appname" output and at the same time be launchable using that same name ("appname" &).
You can also use the pm2 library. It is distributed via npm, so install it globally:
sudo npm install pm2 -g
Then you can run the service.
Linux service:
sudo pm2 start [service_name]
Node app:
pm2 start index.js
