How to make atop exclude the statistics since boot? - linux

I have a Linux box whose resource utilization I need to monitor every hour. By resources, I mean mainly CPU, memory and network. I am using atop for CPU and memory, and nethogs for network utilization monitoring. I am thinking of redirecting the reports to text files and sending them to my email, but the initial startup screen for atop shows all statistics since boot, which makes the text look messy. So, is there a way to make atop skip the initial statistics?

I would suggest using something other than atop. There are many other tools like top, free -m, etc. for your CPU, memory and network statistics. The only disadvantage would be that you would have to write them independently.
I landed on your question as I was looking for just that. SeaLion actually works well for this purpose, and you also wouldn't need to store the data in files. It's all presented on a timeline, so you can just "Jump to" whenever you want to check your data. You don't even have to record the data manually.
I suppose this is all you need.

Having the same problem right now, I came up with
atop -PCPU,NET,DSK | sed -n -e '/SEP/,$p'
The -P.. flag instructs atop to emit only the requested information in parseable form, so roll your own label list. The important bit is the sed command, which skips lines until the first line containing SEP is found; this effectively skips over the first block of data, the summary since boot time.
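To get the hourly e-mailed reports the question asks about, this combines naturally with cron. A sketch, assuming a configured mail command and the two-sample trick (atop's first sample is the since-boot block that sed strips; the second is a real 60-second interval):

# Hypothetical crontab entry: one 60-second atop sample per hour,
# since-boot block stripped, result mailed.
0 * * * * atop -PCPU,NET,DSK 60 2 | sed -n -e '/SEP/,$p' | mail -s "hourly atop report" you@example.com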

I am not sure, but I think you can't, because atop produces statistics over an interval. On the initial run there is no previous data point, so atop produces stats from boot up to the current point. But you can easily use, for example, awk to parse the output:
atop 1 2 | awk '/ seconds elapsed$/ {output=1;} {if (output) print}'
This is the simplest way to solve the problem with atop, but there are tons of other tools probably better suited for this job.

Related


Space Issues in a filesystem on Linux
Let's call it FILESYSTEM1.
Normally, space in FILESYSTEM1 is only about 40-50% used.
Occasionally clients run reports or queries that produce massive files, about 4-5GB in size, and these instantly fill up FILESYSTEM1.
We have some cleanup scripts in place, but they never catch this because it happens in a matter of minutes, while the cleanup scripts usually clean data that is more than 5-7 days old.
Another set of scripts is also in place; these report when free space in a filesystem is less than a certain threshold.
We thought of possible solutions to detect and act on this proactively:
Increase the FILESYSTEM1 file system to double its size.
Set the threshold in the alert scripts for this filesystem to alert when 50% full.
This will hopefully give us enough time to catch it and act before the client reports issues due to FILESYSTEM1 being full.
Even though this solution works, it does not seem to be the best way to deal with the situation.
Any suggestions / comments / solutions are welcome.
Thanks
It sounds like what you've found is that simple threshold-based monitoring doesn't work well for the usage patterns you're dealing with. I'd suggest something that pairs high-frequency sampling (say, once a minute) with a monitoring tool that can do some kind of regression on your data to predict when space will run out.
In addition to knowing when you've already run out of space, you also need to know whether you're about to run out of space. Several tools can do this, or you can write your own. One existing tool is Zabbix, which has predictive trigger functions that can be used to alert when file system usage seems likely to cross a threshold within a certain period of time. This may be useful in reacting to rapid changes that, left unchecked, would fill the file system.
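If you'd rather not deploy a full monitoring stack, the same idea can be roughed out in a shell script. A minimal sketch, linearly extrapolating time-to-full from one-minute samples (the mount point, thresholds, and mail address are assumptions, not part of any particular tool):

#!/bin/sh
# Sample usage once a minute; warn if the current growth rate would
# fill the filesystem within the next hour. FS and the mail address
# are placeholders.
FS=/mnt/filesystem1
PREV=$(df -P "$FS" | awk 'NR==2 {print $3}')        # used KB
while sleep 60; do
    CUR=$(df -P "$FS" | awk 'NR==2 {print $3}')
    AVAIL=$(df -P "$FS" | awk 'NR==2 {print $4}')   # available KB
    RATE=$((CUR - PREV))                            # KB consumed per minute
    if [ "$RATE" -gt 0 ] && [ $((AVAIL / RATE)) -lt 60 ]; then
        echo "$FS may fill within $((AVAIL / RATE)) minutes" \
            | mail -s "disk space warning" admin@example.com
    fi
    PREV=$CUR
done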

Monitor network usage of a process

This question might sound fairly repetitive, but there are subtle details which make it a bit different.
I am looking for a simple tool (for Ubuntu/Linux) to monitor network usage such that it gives the min, max, average, and a time-plot of the network usage by 1) a single process; and 2) the system; only during the time when the process was running. The major requirement is that I am not looking for a GUI (or terminal GUI like top) based tool; I want this monitoring information pushed to a file so that I can perform some post-processing on it.
I came across the following link, which lists various options: http://www.binarytides.com/linux-commands-monitor-network/. However, most of the tools are GUI based, and the ones that are not do not provide the above information.
Any help would be much appreciated.
Wireshark might work, depending on how far you're willing to relax the non-GUI requirement and whether locating your target processes is simple. Wireshark is of course a GUI app, but the tshark command which comes with it is headless and can be used to capture packets to a file. After capturing all packets on an interface, you can run tshark again on the pcap file to filter the file using Wireshark "Display filters" and extract just the packets for your process. That's one part that may or may not be simple, depending on whether you can identify your process from network traffic content, port(s), or by adding some sentinel dummy data. You'll then have two pcap files, one for the whole network interface and one for just your process.
The capinfos command will report the average throughput. Wireshark can be used to generate a time-plot of the traffic with millisecond (or other) granularity via the menu "Statistics >> IO Graph". As for min and max, you can either eyeball that from the time-plot or use editcap to split the pcap files into chunks, run capinfos on each chunk, and calculate the min and max over all chunks.
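A rough sketch of that workflow (the interface name and the port used to identify the process are assumptions you'd adapt):

# Capture everything on the interface while the process runs.
tshark -i eth0 -w all.pcap
# Extract just your process's traffic, here assumed identifiable by port.
tshark -r all.pcap -Y 'tcp.port == 5000' -w proc.pcap
# Average throughput of each capture.
capinfos all.pcap proc.pcap
# For min/max: split into 1-second chunks, then run capinfos on each.
editcap -i 1 proc.pcap chunk.pcap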
That might not be the simplest approach, it's just what occurred to me off the top of my head.

Linux: get amount of memory swapped in/out over a time period

Is there an (easy?) way to get the amount of data moved to/from swap over a certain time? Maybe either integrated over all processes and time, or integrated over specific processes and time?
Story: I have a machine which tends to swap. However, I do not know if swap is 'actively' used, i.e. whether it is constantly swapping, or whether, say, just the rarely used shared libraries get swapped out after some time while 'active' memory usage ends up happening in RAM.
Thus, I am looking for a way to reassure myself that the swap usage may not be serious...
Cheers and thanks for ideas,
Thomas
This can be done relatively easily (if you know the kernel MM subsystem) via SystemTap.
You need to know the names of the functions which do swap-in/swap-out, create corresponding probes, and keep two counters incremented from those probes. Finally, you need a timer which fires every N seconds, dumps the current counters, and resets them.
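A minimal sketch of that idea, assuming swap_readpage/swap_writepage are the relevant functions on your kernel (the names vary between kernel versions, so check yours first):

# Count swap-ins/outs and dump the counters every 5 seconds.
stap -e 'global si, so
probe kernel.function("swap_readpage") { si++ }
probe kernel.function("swap_writepage") { so++ }
probe timer.s(5) {
    printf("swapin=%d swapout=%d\n", si, so)
    si = 0; so = 0
}'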
Here is my temporary solution to get the overall number of pages swapped in/out between two calls, using vmstat:
#!/bin/sh
# Must be sourced, so that SWAPPEDIN/SWAPPEDOUT persist between calls.
OLDSWAPPEDIN=${SWAPPEDIN:-0}
OLDSWAPPEDOUT=${SWAPPEDOUT:-0}
# "vmstat -s" prints "N pages swapped in" and "N pages swapped out";
# flattening both lines leaves the two counts in fields 1 and 5.
PAGEINOUT=$(vmstat -s | grep swapped)
SWAPPEDIN=$(echo $PAGEINOUT | awk '{print $1}')
SWAPPEDOUT=$(echo $PAGEINOUT | awk '{print $5}')
SWAPPEDINDIFF=$(expr $SWAPPEDIN - $OLDSWAPPEDIN)
SWAPPEDOUTDIFF=$(expr $SWAPPEDOUT - $OLDSWAPPEDOUT)
I tried to avoid temporary files for storing the variables (so the script either has to be sourced, or the variables have to be created at login).
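For example, assuming the snippet above is saved as swapdiff.sh:

. ./swapdiff.sh     # first call establishes the baseline (diff is since boot)
sleep 60
. ./swapdiff.sh     # second call: the diff now covers the last 60 seconds
echo "swapped in: $SWAPPEDINDIFF pages, out: $SWAPPEDOUTDIFF pages"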

How to monitor a process in Linux CPU, Memory and time

How can I benchmark a process in Linux? I need something like "top" and "time" put together for a particular process name (it is a multiprocess program, so many PIDs will be given).
Moreover, I would like to have a plot over time of memory and CPU usage for these processes, not just final numbers.
Any ideas?
I typically throw together a simple script for this type of work.
Take a look at the kernel documentation for the proc filesystem (Google 'linux proc.txt').
The first line of /proc/stat (Section 1.8 in proc.txt) will give you cumulative cpu usage stats (i.e. user, nice, system, idle, ...). For each process, the file /proc/$PID/stat (Table 1-4 in proc.txt) will provide you with both process-specific cpu usage stats and memory usage stats (see rss).
If you google a bit you'll find plenty of detailed info on these files, and pointers to libraries / apps / code snippets that can help you obtain / derive the values you need. With that in mind, I'll focus on the high-level strategy.
For CPU stats, use your favorite scripting language to create an executable that takes a set of process ids for monitoring. At a fixed interval (ex: 1 second) poll / calculate the cumulative totals for each process and the system as a whole. During each poll interval, write all results on a single line to stdout.
For memory stats, write a similar script, but simply log the per-process memory usage. Memory is a bit easier as we directly obtain the instantaneous values.
Run these scripts for the duration of your test, passing the set of process ids that you'd like to monitor and redirecting the output to log files:
./logcpu $(pidof foo) $(pidof bar) > cpustats
./logmem $(pidof foo) $(pidof bar) > memstats
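A minimal sketch of what such a CPU logger might look like (a hypothetical logcpu, not a finished tool; a logmem analog would read rss, field 24 of the same file):

#!/bin/sh
# Hypothetical "logcpu": once a second, log cumulative CPU ticks for each
# PID given as an argument, plus the system-wide totals from /proc/stat.
# Caveat: the awk field positions break if a process name contains spaces.
while true; do
    line=$(date +%s)
    for pid in "$@"; do
        # Fields 14 and 15 of /proc/$pid/stat are utime and stime (ticks).
        line="$line $(awk '{print $14, $15}' /proc/$pid/stat)"
    done
    line="$line $(head -n1 /proc/stat)"   # cumulative system-wide jiffies
    echo "$line"
    sleep 1
done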
Import the contents of these files into a spreadsheet (for certain applications this is as easy as copy / paste). For CPU, you are after instantaneous values but have cumulative values, so you'll need to do some minor spreadsheet work to derive these values (it's just the delta 't(x + 1) - t(x)'). Of course you could have your cpu logger write the delta, but you'll be spending a bit more time up front on the script.
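If you'd rather skip the spreadsheet step for the deltas, a short awk pass over the log works too (column 2 here is assumed to be one cumulative tick counter from the logger above):

# Print the per-interval delta of column 2 alongside the timestamp.
awk 'NR > 1 {print $1, $2 - prev} {prev = $2}' cpustats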
Finally, use your spreadsheet to generate a nice plot.
The following are tools for monitoring a Linux system:
System commands like top, free -m, vmstat, iostat, iotop, sar, netstat, etc. Nothing comes near these Linux utilities when you are debugging a problem. These commands give you a clear picture of what is going on inside your server.
SeaLion: The agent executes all the commands mentioned in #1 (also user-defined ones), and the outputs of these commands can be accessed in a beautiful web interface. This tool comes in handy when you are debugging across hundreds of servers, as installation is simple. And it's free.
Nagios: The mother of all monitoring/alerting tools. It is highly customizable but quite difficult to set up for beginners. There is a set of tools called Nagios plugins that covers pretty much all the important Linux metrics.
Munin
Server Density: A cloud-based paid service that collects important Linux metrics and gives users the ability to write their own plugins.
New Relic: Another well-known hosted monitoring service.
Zabbix

Using "top" in Linux as semi-permanent instrumentation

I'm trying to find the best way to use 'top' as semi-permanent instrumentation in the development of a box running embedded Linux. (The instrumentation will be removed from the final-test and production releases.)
My first pass is to simply add this to init.d:
top -b -d 15 >/tmp/toploop.out &
This runs top in "batch" mode every 15 seconds. Let's assume that /tmp has plenty of space…
Questions:
Is 15 seconds a good value to choose for general-purpose monitoring?
Other than disk space, how seriously is this perturbing the state of the system?
What other (perhaps better) tools could be used like this?
Look at collectd. It's a very lightweight system monitoring framework coded for performance.
We use sysstat to monitor things like this.
You might find that vmstat and iostat with a delay and no repeat counter is a better option.
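For example, something along these lines in place of the top loop (flags and paths are just illustrative):

# Same idea as the top loop: one sample every 15 seconds, forever.
vmstat 15 >/tmp/vmstat.out &
iostat -x 15 >/tmp/iostat.out &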
I suspect 15 seconds would be more than adequate unless you actually want to watch what's happening in real time, but that doesn't appear to be the case here.
As far as load, on an idling PIII 900MHz with 768MB of RAM running Ubuntu (not sure which version, but not more than a year old), I have top updating every 0.5 seconds and it's about 2% CPU utilization. At 15s updates, I'm seeing 0.1% CPU utilization.
Depending upon what exactly you want, you could use the output of uptime, free, and ps to get most, if not all, of top's information.
If you are looking for overall load, uptime is probably sufficient. However, if you want specific information about processes, are adventurous, and have the /proc filesystem enabled, you may want to write your own tools. The primary benefit in this environment is that you can focus on exactly what you want and minimize the load introduced to the system.
The proc file system gives your application read access to the kernel memory that keeps track of many of the interesting variables. Reading from /proc is one of the lightest ways to get this information. Additionally, you may be able to get more information than provided by top. I've done this in the past to get amount of time spent in user and system by this process. Additionally, you can use this to get information about the number of file descriptors open by the process. You might also use this to get detailed information about how the network system is working.
Much of this information is pre-processed by other applications which can be used if you get the information you need. However, it is rather straight-forward to read the raw information. Do a man proc for more information.
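For instance, a few of the values mentioned above can be pulled straight from /proc (the PID is a hypothetical example):

pid=1234                        # hypothetical PID of interest
ls /proc/$pid/fd | wc -l        # number of open file descriptors
grep VmRSS /proc/$pid/status    # resident memory usage
cat /proc/$pid/net/dev          # network interface counters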
Pity you haven't said what you are monitoring for.
You should decide whether 15 seconds is ok or not. Feel free to drop it way lower if you wish (and have a fast HDD)
No worries unless you are running a soft real-time system.
Have a look at the tools suggested in the other answers. I'll add another suggestion: iotop, for answering "who is thrashing the HDD" questions.
At work for system monitoring during stress tests we use a tool called nmon.
What I love about nmon is it has the ability to export to XLS and generate beautiful graphs for you.
It generates statistics for:
Memory Usage
CPU Usage
Network Usage
Disk I/O
Good luck :)
