Long running service check in Nagios

I found a service check on the Nagios Exchange site that works well for small directories, but not for larger ones where it takes longer than 30 or 60 seconds to complete:
http://exchange.nagios.org/directory/Plugins/Uncategorized/Operating-Systems/Linux/CheckDirSize/details
I need a service check that Nagios runs once a day and that is allowed to stay open for up to 1440 minutes (one day). The directory is huge, and sizing it takes many hours (up to 20).
This is my service check. It runs once a day, and the NRPE timeout is 86400 seconds, which is also one day. For some reason, even though I can see du -sk running on the server with ps -ef | grep du, Nagios reports "(Service Check Timed Out)":
define service {
    use                  generic-service,srv-pnp
    host_name            IMAGEServer1
    service_description  Images
    check_command        check_nrpe!check_dirsize -t 86400
    check_interval       1440
}
In my nrpe.cfg file on the Linux server I also have these two directives:
command_timeout=86400
connection_timeout=86400
How can I get Nagios to complete the check and not time out? I was under the impression that my directives above were correct.

What's timing out is the check_nrpe command on the local side (it has a default timeout of 2 minutes). You could edit its command definition to use a long timeout.
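As a rough sketch of the first suggestion, a separate command definition could carry the long timeout through check_nrpe's -t option. The name check_nrpe_long is hypothetical, and the $USER1$ plugin path and argument layout are assumptions based on a typical check_nrpe definition, not something taken from the asker's configuration. Nagios also enforces its own service_check_timeout in nagios.cfg, so if the check still times out after this change, that setting may be worth checking too.

```
# Hypothetical command definition with a one-day timeout for check_nrpe
define command {
    command_name  check_nrpe_long
    command_line  $USER1$/check_nrpe -H $HOSTADDRESS$ -t 86400 -c $ARG1$
}

# The service would then reference it like this:
#   check_command  check_nrpe_long!check_dirsize
```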
Alternatively, you might want to do this as a passive check on IMAGEServer1, running as a cron job.
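The passive-check alternative could look something like the sketch below: a script run from cron on IMAGEServer1 measures the directory and pushes the result to Nagios with send_nsca. This assumes NSCA is installed on both sides and that the Images service accepts passive results. The function names, the thresholds, and the /etc/send_nsca.cfg path are illustrative assumptions.

```shell
#!/bin/sh
# Map a directory size in KB to a Nagios state code (0=OK, 1=WARNING, 2=CRITICAL).
# $1 = size in KB, $2 = warning threshold, $3 = critical threshold
size_state() {
    if [ "$1" -ge "$3" ]; then
        echo 2
    elif [ "$1" -ge "$2" ]; then
        echo 1
    else
        echo 0
    fi
}

# Format one passive service result in the tab-separated form that
# send_nsca reads on stdin: host<TAB>service<TAB>return_code<TAB>output
passive_result() {
    printf '%s\t%s\t%s\t%s\n' "$1" "$2" "$3" "$4"
}

# Measure a directory and submit the result.
# $1 = directory to measure, $2 = address of the Nagios server
submit_dirsize() {
    size_kb=$(du -sk "$1" | cut -f1)
    # Hypothetical thresholds in KB; adjust to the real directory.
    state=$(size_state "$size_kb" 500000000 800000000)
    passive_result "IMAGEServer1" "Images" "$state" "size=${size_kb}KB" |
        send_nsca -H "$2" -c /etc/send_nsca.cfg
}

# Do nothing unless both arguments are supplied.
if [ "$#" -ge 2 ]; then
    submit_dirsize "$1" "$2"
fi
```

Because cron starts the script and Nagios only receives the final result, no Nagios-side timeout applies while du is running.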

Related

Setting up a cronjob on Google Compute Engine

I am new to setting up cronjobs and I'm trying to do it on a virtual machine in google compute engine. After a bit of research, I found this StackOverflow question: Running Python script at Regular intervals using Cron in Virtual Machine (Google Cloud Platform)
As per the answer, I managed to enter the crontab -e edit mode and set up a test cronjob like 10 8 * * * /usr/bin/python /scripts/kite-data-pull/dataPull.py. I also checked the system time, which was in UTC, and entered the time according to that.
The step I'm supposed to take, as per the answer, is to run sudo systemctl restart cron which is throwing an error for me:
sudo systemctl restart cron
System has not been booted with systemd as init system (PID 1). Can't operate.
Failed to connect to bus: Host is down
Any suggestions on what I can do to set up this cronjob correctly?

Edit your cron jobs with crontab -e and insert a line:
* * * * * echo test123 >> /your_homedir_path/file.log
That appends test123 to file.log every minute.
Then run tail -f /your_homedir_path/file.log and wait a couple of minutes. You should see test123 lines appearing in the file (and on screen).
If that works, try running your Python file, but first make the .py file executable with "chmod +x script.py".
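The systemctl error in the question indicates that PID 1 on that machine is not systemd. A small sketch for checking which init system is running and picking a matching restart command follows. The function names are hypothetical, and the service name cron is the Debian/Ubuntu one (an assumption about the VM image). Note that entries saved through crontab -e are normally picked up by a running cron daemon without any restart.

```shell
#!/bin/sh
# Print the name of the process running as PID 1 (Linux-specific).
pid1_name() {
    cat /proc/1/comm 2>/dev/null || echo unknown
}

# Suggest a restart command for cron based on the init system name in $1.
cron_restart_hint() {
    case "$1" in
        systemd) echo "sudo systemctl restart cron" ;;
        *)       echo "sudo service cron restart" ;;
    esac
}

cron_restart_hint "$(pid1_name)"
```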

Crontab reboot job reboots multiple times instead of once

I am trying to schedule my Debian Jessie machine to reboot at 9:00 p.m. every 3 days. I currently use a cron job:
00 21 */3 * * root bash /home/pi/scripts/reboot.sh
where reboot.sh is:
sudo reboot
The machine goes down on schedule, but strangely it then keeps rebooting several times. How can I get rid of this issue? Could it be that the RTC does not have enough time to update itself, so the cron job thinks the time is still 9:00? I really doubt this. Any help is appreciated.

It's better to call the shutdown command directly instead of going through a script. shutdown now shuts the computer down immediately, and the -r flag reboots the system instead. You can also pass a specific time instead of now, as in shutdown -r 11:00.
For now, you can use:
shutdown -r now
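One plausible cause of the repeated reboots, which is an assumption here and not something the thread confirms, is that the job fires again shortly after boot, for example because the clock is adjusted after startup. A minimal sketch of a guard that skips the reboot when the system has only just come up is shown below. The function names and the 300-second threshold are hypothetical.

```shell
#!/bin/sh
# Succeed only if the uptime exceeds a minimum number of seconds.
# $1 = current uptime in seconds, $2 = minimum uptime (default 300)
uptime_exceeds() {
    [ "$1" -gt "${2:-300}" ]
}

# Read whole seconds of uptime from /proc/uptime (Linux-specific).
current_uptime() {
    cut -d. -f1 /proc/uptime
}

# Reboot only when called with the argument "run", so that sourcing
# or testing this file never restarts the machine.
if [ "${1:-}" = "run" ]; then
    if uptime_exceeds "$(current_uptime)"; then
        /sbin/shutdown -r now
    fi
fi
```

The cron entry would then call this script with the run argument instead of calling reboot directly.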

Using cron Chef cookbook to run a command every 30 mins

I am using the cron cookbook to run a command every 30 minutes in the following way:
cron_d 'logrotate_check' do
  minute "*/30"
  command "logrotate -s /var/log/logstatus /etc/logrotate.d/consul_logs"
  user 'root'
end
Is this correct?

Yes, that is fine. In the future, please just try it yourself rather than asking the internet and waiting 10 hours.

How to efficiently monitor system stats using vmstat?

I am getting real-time memory stats from the vmstat command, using the following steps:
$ nohup vmstat 60 > vmstatrecord.app &
The command runs in the background and writes its log to the file vmstatrecord.app. When I run
$ ps -A | grep stat
I can see vmstat running in the background, and I can follow the log with tail:
$ tail -f vmstatrecord.app
The file is updated every 60 seconds.
My question is:
1. The process keeps writing to the file, so what happens if I leave it running for days?
Assumption:
If the process writes to the file forever, I am afraid the file might grow too large.
If my assumption is correct and my steps are inefficient, is there an alternative way to achieve this?

This question would be better asked on superuser.com or serverfault.com, as it's not about programming.
Yes, your file will keep growing. That's what the second parameter of vmstat is for: run vmstat 60 1440 to stop after a day (1440 samples at 60-second intervals, since 1440 = 60 minutes * 24 hours). When I had this problem, I made a crontab entry:
0 0 * * * vmstat 60 1440 > /some/where/vmstat.out
to restart the output every day.
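The crontab entry above overwrites the same file each day. If older days should be kept for a while, a wrapper script along these lines could write one dated file per day and prune old ones. It is only a sketch: the function names, the file naming scheme, and the 7-day retention are assumptions, and find -delete requires GNU or BSD find.

```shell
#!/bin/sh
# Build the output path for one day's samples.
# $1 = log directory, $2 = date stamp (YYYY-MM-DD)
vmstat_log_path() {
    printf '%s/vmstat-%s.out\n' "$1" "$2"
}

# Record one day of samples, then delete files older than keep_days days.
# $1 = log directory, $2 = days to keep (default 7)
record_day() {
    log_dir=$1
    keep_days=${2:-7}
    mkdir -p "$log_dir"
    vmstat 60 1440 > "$(vmstat_log_path "$log_dir" "$(date +%F)")"
    find "$log_dir" -name 'vmstat-*.out' -mtime +"$keep_days" -delete
}

# Run only when a log directory is given, e.g. from cron:
#   0 0 * * * /usr/local/bin/vmstat-daily.sh /var/log/vmstat
if [ "$#" -gt 0 ]; then
    record_day "$@"
fi
```

Keeping the date formatting inside the script also avoids having to escape % characters in the crontab line.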

About system time in a bash script

I'm starting work on a bash script that will shut down my computer at a certain time of day, but I'm not sure what specifically needs to go into it. Could someone post an example of how they would do it?

If you want to shut down at 22:15 every day, be root, run crontab -e, then add
15 22 * * * shutdown -h 5
on a line by itself.
Then save. At 22:15 every day, you will get a warning that the system will shut down in 5 minutes, and five minutes later, true to its word, it shuts down. If you want to abort the shutdown, run shutdown -c as root after the warning.
