Logging VMStat data to file - linux

I am trying to create some capacity planning reports, and one of the requirements is to have info on memory usage for a few Unix servers.
Now my knowledge of Unix is very low. I usually just log on and run a few scripts.
But for this report I need to gather vmstat data and produce reports based on the previous week's data, broken down by hour, where each hourly figure is an average of vmstat samples taken every 10 seconds.
So, first question: is vmstat logging enabled by default, and if so, where on the server is the data written?
If not, how can I set this up?
Thanks

vmstat is a command that you run.
To generate one week of virtual memory stats spaced at ten-second intervals (less the last one) you need 60,479 samples: 604,800 seconds in a week, divided by 10, minus one.
So the command you want is:
nohup vmstat 10 60479 > myvmstatfile.dat &
This will make a very big file myvmstatfile.dat
EDIT (RobKielty): The & puts this job in the background; nohup prevents the task from being killed by the hangup signal when you log out of the command shell. If you run this command, it would be prudent to monitor the disk partition the file is being written to. Use df -h /path/to/directory/where/outputfile/resides to monitor the disk space usage.
I have no idea what you need to do with the data, so I can't help you there.
Create a crontab entry (crontab -e) like this:
0 0 * * 0 /path/to/my/vmstat_script.sh
The file vmstat_script.sh will contain the following bash commands:
#!/bin/bash
# vmstat_script.sh
vmstat 10 60479 > myvmstatfile.dat
mv myvmstatfile.dat myvmstatfile.dat.`date +%Y-%m-%d`
This will create one file per week with a name like myvmstatfile.dat.2012-07-01
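Since the original question asks for hourly breakdowns, here is a minimal post-processing sketch, assuming the default procps vmstat column layout where free memory is column 4 (swap the column number for whichever metric you report):
awk '!/procs|free/ {          # skip the header lines vmstat prints
    sum += $4; n++            # column 4 = free memory (KiB) by default
    if (n == 360) {           # 360 ten-second samples = one hour
        printf "hour %d: avg free %.0f KiB\n", ++h, sum/n
        sum = 0; n = 0
    }
}' myvmstatfile.dat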

The command I use for monitoring the Linux vm metrics is below:
nohup vmstat 10 720 | (while read -r; do echo "$(date '+%d-%m-%Y %H:%M:%S') $REPLY"; done) >> nameofLogfile.log &
Here nohup keeps the pipeline alive after you log out, and the trailing & runs it in the background.
It will run for 2 hours (720 samples) at an interval of 10 secs.
This is a convenient format for generating graphs and reports, as a timestamp is included on each line along with the metrics, so the logs can be filtered by time.
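For example, to pull out a single hour for graphing (the date here is purely illustrative; the pattern follows the %d-%m-%Y %H format used above):
grep '^01-07-2012 14:' nameofLogfile.log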

Related

rabbitmq inet_gethost reading ~50 times per second /etc/hosts file

My task is to reduce the load on a Linux machine running RabbitMQ. First place in top is taken by inet_gethost 4 (there are two such processes, but one is constantly sitting at the top of top). I started analyzing that process with strace, and it shows a huge number of opens and reads of /etc/hosts. strace -fp 4571 -e open 2> count && wc -l count revealed over 50 reads of that file per second. The question is whether that kind of behavior is normal, or whether it is the result of a badly configured RabbitMQ or some networking settings.
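As an aside, a hedged way to quantify that rate without counting lines by hand is strace's syscall-summary mode, sampled for a fixed window (this assumes GNU coreutils timeout is available; on newer kernels the call may show up as openat):
timeout 10 strace -c -f -e trace=open,openat -p 4571
# divide the reported call counts by 10 to get calls per second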

Bash: How to record highest memory/cpu consumption during execution of a bash script?

I have a function in a bash script that executes a long process called runBatch. Basically runBatch takes a file as an argument and loads the contents into a db. (runBatch is just a wrapper function for a database command that loads the content of the file)
My function has a loop that looks something like the below, where I am currently recording start time and elapsed time for the process to variables.
for batchFile in `ls $batchFilesDir`
do
    echo "Batch file is $batchFile"
    START_TIME=$(($(date +%s%N)/1000000))
    runBatch $batchFile
    ELAPSED_TIME=$(($(($(date +%s%N)/1000000))-START_TIME))
    IN_SECONDS=$(awk "BEGIN {printf \"%.2f\",${ELAPSED_TIME}/1000}")
done
Then I am writing some information on each batch (such as time, etc.) to a table in a html page I am generating.
How would I go about recording the highest memory/CPU usage while runBatch is running, along with the time, etc.?
Any help appreciated.
Edit: I managed to get this done. I added a wrapper script around this script that runs it in the background. I pass its PID with $! to another script in the wrapper that monitors the process's CPU and memory usage with top every second. I compile everything into an HTML page at the end, once the PID is no longer alive. Cheers for the pointers.
You should be able to get the PID of the process using $!,
runBatch $batchFile &
myPID=$!
and then you can run top -b -p $myPID to print out a ticking summary of CPU and memory usage for that process.
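To turn that into the peak figures the question asks for, here is a minimal sketch; the $9/$10 field positions assume procps top's default column layout (%CPU and %MEM):
runBatch "$batchFile" &
myPID=$!
peaks="0 0"
while kill -0 "$myPID" 2>/dev/null; do
    # keep running maxima of %CPU and %MEM for this PID
    peaks=$(top -b -n 1 -p "$myPID" | awk -v p="$myPID" -v c="${peaks% *}" -v m="${peaks#* }" \
        '$1 == p { if ($9 > c) c = $9; if ($10 > m) m = $10 } END { print c, m }')
    sleep 1
done
echo "peak %CPU: ${peaks% *}  peak %MEM: ${peaks#* }"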
Memory:
cat /proc/meminfo
Then grep for whatever you want (see the example after this list).
CPU is more complicated - see /proc/stat explained.
Average load:
cat /proc/loadavg
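For example, assuming a reasonably recent kernel (MemAvailable appeared in 3.14; use MemFree on older ones):
grep -E 'MemTotal|MemAvailable' /proc/meminfo
cut -d ' ' -f1-3 /proc/loadavg    # the 1-, 5- and 15-minute load averages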
For timing "runBatch" use
time runBatch
like
time sleep 10
Once you've got the pid of your process (e.g. like answered here) you can use (with watch(1) & cat(1) or grep(1)) the proc(5) file system, e.g.
watch cat /proc/$myPID/stat
(or use /proc/$myPID/status or /proc/$myPID/statm, or /proc/$myPID/maps for the address space, etc...)
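Since the question is about the highest memory usage, note that the kernel already tracks peaks for you in /proc/$myPID/status, so a one-liner may be all you need:
grep -E 'VmPeak|VmHWM' /proc/$myPID/status    # peak virtual size and peak resident set size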
BTW, to run batch jobs you should consider batch (and you might look into crontab(5) to run things periodically)

Get the load, cpu usage and time of executing a bash script

I have a bash script that I plan to run every 5 or 15 mins using crontab, depending on the load it puts on the server.
I can find the running time of the script, but I am not sure how to find the load, memory usage and CPU usage.
Can someone help me?
Also, any suggestions for a rough benchmark that would help me decide whether the script puts too much load on the server and should be run every 15 mins rather than every 5 mins.
Thanks in Advance!
You can use "top -b", top gives the CPU usage, memory usage etc,
Insert these lines in your script, this will process in background and will terminate the process as soon as your testing overs.
ssh server_name "nohup top -b -d 0.5 >> file_name &"
\top process will run in background because of &, -d 0.5 will give you the cpu status at every 0.5 secs, redirect the output in file_name for latter analysis.
for killing the process after your test, insert following in your script,
ssh server_name "kill \`ps -elf | grep 'top -b' | grep -v grep | sed 's/ */ /g' |cut -d ' ' -f4\`"
Your main testing script should sit between the top command and the command that kills top.
I presumed you are running the script from the client side; if not, ignore the ssh server_name parts.
If you are running it from the client side, ssh will prompt for a password each time; to avoid this, set up passwordless SSH key authentication (e.g. ssh-keygen followed by ssh-copy-id).
That will get rid of the password prompts.
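As an aside, assuming pkill is available on the server, the whole ps/grep/sed/cut pipeline can be replaced with something simpler:
ssh server_name "pkill -f 'top -b'"    # -f matches against the full command line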
You can check the following utilities:
pidstat for CPU load (man page)
pmap for memory load (man page)
Note that you might also need to measure the child processes of your executable in order to collect summary information.
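For example, a minimal sketch with sysstat's pidstat ($myPID stands for the PID of your script):
pidstat -u -r -p "$myPID" 1    # -u: CPU usage, -r: memory (RSS/%MEM), sampled every second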
For memory, use free -m. Your actual memory available is the second number next to +/- buffers/cache (in megabytes with -m) (source).
For CPU, it's a bit more complicated. Start by looking at cat /proc/stat | grep 'cpu ' (note the space). You'll see something like this:
cpu 2255 34 2290 22625563 6290 127 456
The columns are from left to right, "user, nice, system, idle". CPU usage is usually calculated as (user+nice+system) / (user+nice+system+idle). However, these numbers show the number of "time units" that the CPU has spent doing that thing since boot, and thus are always increasing. If you were to do the aforementioned calculation, you'd get the CPU usage average since boot. To get a point-in-time usage, you have to take 2 samples, find their difference, and calculate the usage from that. To be clear, that will be the average CPU usage between your samples. (source)
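A minimal bash sketch of that two-sample calculation, reading only the first four fields of the aggregate cpu line to match the formula above:
# first sample: the "cpu" label, then user nice system idle ...
read -r _ u1 n1 s1 i1 rest < /proc/stat
sleep 1
# second sample, one second later
read -r _ u2 n2 s2 i2 rest < /proc/stat
busy=$(( (u2 + n2 + s2) - (u1 + n1 + s1) ))
total=$(( busy + (i2 - i1) ))
echo "CPU usage over the last second: $(( 100 * busy / total ))%"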

Linux display average CPU load for last week

On a Linux box, I need to display the average CPU utilisation per hour for the last week. Is that information logged somewhere? Or do I need to write a script that wakes up every 15 minutes to copy /proc/loadavg to a logfile?
EDIT: I'm not allowed to use any tools other than those that come with Linux.
You might want to check out sar (man page), it fits your use case nicely.
System Activity Reporter (SAR) - captures important system performance metrics at periodic intervals.
Example from IBM Developer Works Article:
Add an entry to your root crontab
# Collect measurements at 10-minute intervals
0,10,20,30,40,50 * * * * /usr/lib/sa/sa1
# Create daily reports and purge old files
0 0 * * * /usr/lib/sa/sa2 -A
Then you can simply query this information using a sar command (display all of today's info):
root ~ # sar -A
Or just for a certain days log file:
root ~ # sar -f /var/log/sa/sa16
You can usually find it in the sysstat package for your Linux distro.
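To get the per-hour averages for the last week that the question asks about, a sketch along these lines works; the /var/log/sa/saDD file naming is the common default but varies by distro, and the Average: line's last column is %idle:
for f in /var/log/sa/sa*; do                    # one file per day of the month
    for h in $(seq 0 23); do
        # bound each query to a one-hour window; 100 - %idle = utilisation
        sar -u -s "$(printf '%02d:00:00' "$h")" -e "$(printf '%02d:59:59' "$h")" -f "$f" 2>/dev/null |
            awk -v d="$f" -v h="$h" '/^Average/ {printf "%s hour %02d: %.1f%% used\n", d, h, 100-$NF}'
    done
done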
As far as I know it's not stored anywhere... It's a trivial thing to write, anyway. Just add something like
*/15 * * * * echo "$(date) $(cat /proc/loadavg)" >> /var/log/loads
to your crontab; the date prefix gives each sample the timestamp you will need for hourly averages.
Note that there are monitoring tools (like Munin) which can do this kind of thing for you, and generate pretty graphs of it to boot... they might be overkill for your situation though.
I would recommend looking at Multi Router Traffic Grapher (MRTG).
Using snmpd to read the load average, MRTG will automatically calculate averages over any time interval and length, and produce nice charts for analysis.
Someone has already posted a CPU usage example.

linux uptime history

How can I get a history of uptimes for my Debian box? After a reboot, I don't see an option for the uptime command to print a history of uptimes. If it matters, I would like to use these uptimes for graphing a page in PHP to show my webserver's uptime lengths between boots.
Update:
Not sure if it is based on a length of time or if last gets reset on reboot, but I only get the most recent boot timestamp with the last command. last -x also does not return any further info. Sounds like a script is my best bet.
Update:
Uptimed gives the information I am looking for; I'm just not sure how to grep that info in code. Managing my own script with a db sounds like the best fit for the application.
Install uptimed. It does exactly what you want.
Edit:
You can apparently include it in a PHP page as easily as this:
<? system("/usr/local/bin/uprecords -a -B"); ?>
The last command will give you the reboot times of the system. You could take the difference between each successive reboot, and that should give the uptime of the machine.
update
1800 INFORMATION's answer is a better solution.
You could create a simple script which runs uptime and dumps it to a file.
uptime >> uptime.log
Then set up a cron job for it, e.g. */10 * * * * to run it every ten minutes.
Try this out:
last | grep reboot
according to last manual page:
The pseudo user reboot logs in each time the system is rebooted.
Thus last reboot will show a log of all reboots since the log file
was created.
So the last column of the last reboot command gives you the uptime history:
#last reboot
reboot system boot **************** Sat Sep 21 03:31 - 08:27 (1+04:56)
reboot system boot **************** Wed Aug 7 07:08 - 08:27 (46+01:19)
This isn't stored between boots, but The Uptimes Project is a third-party option to track it, with software for a range of platforms.
Another tool available on Debian is uptimed which tracks uptimes between boots.
I would create a cron job to run at the required resolution (say 10 minutes) by entering the following [on one single line - I've just separated it for formatting purposes] in your crontab (crontab -l to list, crontab -e to edit).
0,10,20,30,40,50 * * * *
/bin/echo $(/bin/date +\%Y-\%m-\%d) $(/usr/bin/uptime)
>>/tmp/uptime.hist 2>&1
This appends the date, time and uptime to the uptime.hist file every ten minutes while the machine is running. You can then examine this file manually to figure out the information or write a script to process it as you see fit.
Whenever the uptime reduces, there's been a reboot since the previous record. When there are large gaps between lines (i.e., more than the expected ten minutes), the machine's been down during that time.
This information is not normally saved. However, you can sign up for an online service that will do this for you. You just install a client that will send your uptime to the server every 5 minutes and the site will present you with a graph of your uptimes:
http://uptimes-project.org/
I don't think this information is saved between reboots.
If the machine is shut down properly, you could run a command on shutdown that saves the uptime; that way you could read it back after booting up.
Or you can use tuptime (https://sourceforge.net/projects/tuptime/) for the total uptime.
You can use tuptime, a simple command for reporting the total uptime in Linux, keeping it between reboots.
http://sourceforge.net/projects/tuptime/
Since I haven't found an answer here that would help retroactively, maybe this will help someone.
kern.log (depending on your distribution) should log a timestamp.
It will be something like:
2019-01-28T06:25:25.459477+00:00 someserver kernel: [44114473.614361] somemessage
"44114473.614361" represents seconds since last boot, from that you can calculate the uptime without having to install anything.
Nagios can also produce very nice graphs of this.
Use Syslog
For anyone coming here searching for their past uptime.
The solution from 1800 INFORMATION is good advice for the future, but I needed to find information about my past uptimes on a specific date.
Therefore I used syslog to determine when the system was started that day (the first log entry of that day) and when the system was shut down again.
Boot time
To get the system start time grep for the month and day and show only the first lines:
sudo grep "May 28" /var/log/syslog* | head
Shutdown time
To get the system shutdown time grep for the month and day and show only the last few lines:
sudo grep "May 28" /var/log/syslog* | tail
