PBS walltime: how much was actually used?

How do I figure out how much walltime (mem? vmem?) a PBS job (PBS Pro) actually ended up using, if it's not reported in the stdout/stderr logs?

In Torque, this information is visible in the accounting log and in the qstat -f output for the job. In qstat -f, you want to look at the resources_used fields.
This may have diverged somewhat in PBS Pro, but my guess is they have something similar.
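For example, something like this (a sketch: the job ID and values are made up, the exact field names can differ between Torque and PBS Pro versions, and in PBS Pro you may need qstat -x -f to see a finished job if job history is enabled):
$ qstat -f 1234.server | grep resources_used
    resources_used.cput = 02:34:06
    resources_used.mem = 1048576kb
    resources_used.vmem = 2097152kb
    resources_used.walltime = 01:20:33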

Wall time is always measured outside of the system. That's why it refers to the "clock on the wall".
This is important because it often encompasses elements that some systems fail to measure, or measure poorly. To illustrate: before a system can capture the time, some code must run to allocate memory for the timestamp, and then more code must run to store the value into that memory. Everything that happens before that point is misreported as having "cost" no time at all.
While I may have described the essence of wall time, do look to dbeer's excellent answer for capturing a time close to wall-clock time (and hopefully solving your metric-gathering problem).
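As a minimal sketch of the point about measurement overhead (my illustration, assuming Linux and clock_gettime, not something from the answers above): two back-to-back reads of a monotonic clock show roughly how much time the act of timing itself consumes, time the measured code is never charged for.

#include <stdio.h>
#include <time.h>

int main(void)
{
    struct timespec t0, t1;

    /* Two back-to-back reads: the difference approximates the cost of
       the timing call itself, which the code being timed is never
       charged for. */
    clock_gettime(CLOCK_MONOTONIC, &t0);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    long ns = (t1.tv_sec - t0.tv_sec) * 1000000000L
            + (t1.tv_nsec - t0.tv_nsec);
    printf("timer overhead: ~%ld ns\n", ns);
    return 0;
}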

Related

What may slow down an ATA read-verify command sent to HDD on linux?

I am writing a C program to scan hard drives using the ATA read-verify (0x40) command on Linux, like what MHDD's scan does on DOS.
I issue the command using the HDIO_DRIVE_TASK ioctl, and measure the ioctl's blocking time using CLOCK_MONOTONIC.
I run the program as root and have its ionice set to real-time, but the readouts are always larger than what MHDD shows. Also, MHDD's results don't vary much, while my program's results often vary a lot.
I tried issuing the command twice for each block and measuring the block time of the second run.
This fixed part of the problem, but my results still vary a lot.
What factors may slow down my command? How should I avoid them?
P.S. I have some spare drives of varying health for testing.

how can I simply benchmark my linux application

Imagine you write an application as an alternative to an existing one, and you want to compare whether it is more efficient or not.
You can simply use time, like
time yourcommand
time oldcommand
and compare the execution times to see the difference, but this isn't very detailed.
Is there a similar command that reports more data, such as memory usage, CPU utilization, CPU peak, memory peak, etc.?
A good implementation of time actually tells you a lot more than wallclock time. Most Linux systems have one, but Bash tends to obscure it in favor of its built-in time, so you have to call it as /usr/bin/time:
$ /usr/bin/time python -c "import numpy as np; np.empty(100000)"
0.12user 0.00system 0:00.13elapsed 96%CPU (0avgtext+0avgdata 12860maxresident)k
0inputs+0outputs (0major+3777minor)pagefaults 0swaps
That's CPU use, memory usage, and several other statistics for a simple Python command. See the time(1) manpage for everything time can report.
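If you want the same numbers from inside your own program rather than from the time wrapper, getrusage() exposes most of them. A minimal C sketch (assuming Linux, where ru_maxrss is reported in kilobytes; which fields are populated varies by kernel):

#include <stdio.h>
#include <sys/resource.h>

int main(void)
{
    /* ... the work you want to measure goes here ... */

    struct rusage ru;
    if (getrusage(RUSAGE_SELF, &ru) == 0) {
        printf("user CPU:    %ld.%06ld s\n",
               (long)ru.ru_utime.tv_sec, (long)ru.ru_utime.tv_usec);
        printf("system CPU:  %ld.%06ld s\n",
               (long)ru.ru_stime.tv_sec, (long)ru.ru_stime.tv_usec);
        printf("max RSS:     %ld kB\n", ru.ru_maxrss);
        printf("page faults: %ld major, %ld minor\n",
               ru.ru_majflt, ru.ru_minflt);
    }
    return 0;
}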
There is no single best way to do what you're talking about, as it depends a lot on your application, as well as what you wish to profile.
But this post offers some suggestions on ways to profile Linux as a whole or a specific application, which may point you in the right direction.
You will likely find better answers if you can tell us more specifically what you're hoping to profile, which language(s) you're using, etc.

Accurate way of measuring overhead in kernel space

I recently implemented a security mechanism for Linux which hooks into system calls. Now I have to measure the overhead it causes. The project requires comparing the execution time of typical Linux apps with and without the mechanism. By typical Linux apps I mean things like gzipping a 1 GB file, running 'find /', and grepping files. The main goal is to show the overhead for different types of tasks: CPU bound, I/O bound, etc.
The question is: how do I organise the tests so that they are reliable? The first important thing is that my mechanism works only in kernel space, so it is the system time that is relevant to compare. I can use the 'time' command for that, but is it the most accurate way of measuring system time? Another idea is to run those apps in long loops to minimize error. Should the loops then be inside or outside the time command? If they are outside, I will get many results - should I choose the min, max, median, or average?
Thanks for any suggestions.
I think you want to measure a typical application payload (as Ninjajl's comment suggests, compiling the kernel could be a good payload), rather than the overhead inside each syscall itself, or even inside the kernel as a whole.
The reason is that most applications spend much more time and resources in user space than in kernel space (i.e. in syscalls), so the overhead inside syscalls is a "second-order" effect and probably doesn't matter as much. Of course, there are probably exceptions.
Perhaps the Phoronix Test Suite might be relevant.
You might be interested in oprofile.
See also this answer and this question
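If you do end up measuring per-command system time, one approach (my sketch, not from the answers above) is to fork/exec the workload, read the child's rusage from wait4(), repeat the run several times, and report the median, which is more robust to scheduler noise than the mean. The gzip command line below is a placeholder workload:

#define _DEFAULT_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/wait.h>
#include <sys/resource.h>

static int cmp(const void *a, const void *b)
{
    double x = *(const double *)a, y = *(const double *)b;
    return (x > y) - (x < y);
}

int main(void)
{
    enum { RUNS = 9 };
    double sys_s[RUNS];

    for (int i = 0; i < RUNS; i++) {
        pid_t pid = fork();
        if (pid == 0) {
            /* placeholder workload; substitute the app under test */
            execl("/bin/sh", "sh", "-c",
                  "gzip -c /tmp/bigfile > /dev/null", (char *)NULL);
            _exit(127);
        }
        int status;
        struct rusage ru;
        wait4(pid, &status, 0, &ru);   /* rusage of this child only */
        sys_s[i] = ru.ru_stime.tv_sec + ru.ru_stime.tv_usec / 1e6;
    }

    /* the median is less sensitive to outlier runs than the mean */
    qsort(sys_s, RUNS, sizeof(double), cmp);
    printf("median system time: %.3f s\n", sys_s[RUNS / 2]);
    return 0;
}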

Microsecond accurate (or better) process timing in Linux

I need a very accurate way to time parts of my program. I could use the regular high-resolution clock for this, but that returns wall-clock time, which is not what I need: I need the time spent running only my process.
I distinctly remember seeing a Linux kernel patch that would allow me to time my processes to nanosecond accuracy, except I forgot to bookmark it and I forgot the name of the patch as well :(.
I remember how it works though:
On every context switch, it will read out the value of a high-resolution clock, and add the delta of the last two values to the process time of the running process. This produces a high-resolution accurate view of the process' actual process time.
The regular process time is kept using the regular clock, which is, I believe, millisecond accurate (1000Hz), which is much too coarse for my purposes.
Does anyone know what kernel patch I'm talking about? I also remember it was like a word with a letter before or after it -- something like 'rtimer' or something, but I don't remember exactly.
(Other suggestions are welcome too)
The Completely Fair Scheduler suggested by Marko is not what I was looking for, but it looks promising. The problem I have with it is that the calls I can use to get process time still don't return values that are granular enough.
times() is returning values 21, 22, in milliseconds.
clock() is returning values 21000, 22000, same granularity.
getrusage() is returning values like 210002, 22001 (and suchlike); they look like they have a bit better accuracy, but the values look conspicuously similar.
So now the problem I'm probably having is that the kernel has the information I need, I just don't know the system call that will return it.
If you are looking for this level of timing resolution, you are probably trying to do some micro-optimization. If that's the case, you should look at PAPI. Not only does it provide both wall-clock and virtual (process only) timing information, it also provides access to CPU event counters, which can be indispensable when you are trying to improve performance.
http://icl.cs.utk.edu/papi/
See this question for some more info.
Something I've used for such things is gettimeofday(). It provides a structure with seconds and microseconds. Call it before the code, and again after. Then just subtract the two structs using timersub, and you can read the elapsed time from the result's tv_sec and tv_usec fields.
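Roughly like this (a sketch; timersub is a BSD/glibc macro, so strict compiles may need _DEFAULT_SOURCE):

#define _DEFAULT_SOURCE
#include <stdio.h>
#include <sys/time.h>

int main(void)
{
    struct timeval start, end, diff;

    gettimeofday(&start, NULL);
    /* ... code being timed ... */
    gettimeofday(&end, NULL);

    timersub(&end, &start, &diff);   /* diff = end - start */
    printf("elapsed: %ld.%06ld s\n",
           (long)diff.tv_sec, (long)diff.tv_usec);
    return 0;
}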
If you need very small time units for (I assume) testing the speed of your software, I would recommend just running the parts you want to time in a loop millions of times, taking the time before and after the loop, and calculating the average, as sketched below. A nice side effect of doing this (apart from not needing to figure out how to use nanoseconds) is that you get more consistent results, because the random overhead caused by the OS scheduler is averaged out.
Of course, unless your program needs to run millions of times a second, it's probably fast enough if its running time is too short to measure in milliseconds.
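A sketch of the averaging idea (work() is a placeholder for the code under test; note that a real compiler may delete a loop whose body has no observable effect, hence the volatile sink):

#include <stdio.h>
#include <sys/time.h>

/* placeholder for the code under test; the volatile write keeps the
   compiler from optimizing the loop away entirely */
static volatile long sink;
static void work(void) { sink++; }

int main(void)
{
    const long N = 1000000;
    struct timeval start, end;

    gettimeofday(&start, NULL);
    for (long i = 0; i < N; i++)
        work();
    gettimeofday(&end, NULL);

    double total = (end.tv_sec - start.tv_sec)
                 + (end.tv_usec - start.tv_usec) / 1e6;
    printf("avg per call: %.1f ns\n", total / N * 1e9);
    return 0;
}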
I believe CFS (the Completely Fair Scheduler) is what you're looking for.
You can use the High Precision Event Timer (HPET) if you have a fairly recent 2.6 kernel. Check out Documentation/hpet.txt on how to use it. This solution is platform dependent though and I believe it is only available on newer x86 systems. HPET has at least a 10MHz timer so it should fit your requirements easily.
I believe several PowerPC implementations from Freescale support a cycle exact instruction counter as well. I used this a number of years ago to profile highly optimized code but I can't remember what it is called. I believe Freescale has a kernel patch you have to apply in order to access it from user space.
http://allmybrain.com/2008/06/10/timing-cc-code-on-linux/
might be of help to you (directly if you are doing it in C/C++, but I hope it will give you pointers even if you're not)... It claims to provide microsecond accuracy, which just meets your criterion. :)
I think I found the kernel patch I was looking for. Posting it here so I don't forget the link:
http://user.it.uu.se/~mikpe/linux/perfctr/
http://sourceforge.net/projects/perfctr/
Edit: It works for my purposes, though not very user-friendly.
Try the CPU's timestamp counter? Wikipedia seems to suggest using clock_gettime() instead.
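For process-only time (rather than reading the raw TSC yourself), clock_gettime with CLOCK_PROCESS_CPUTIME_ID is probably the closest modern answer to the original question: it counts only CPU time charged to the calling process, at nanosecond resolution (older glibc may need -lrt). A minimal sketch:

#include <stdio.h>
#include <time.h>

int main(void)
{
    struct timespec t0, t1;

    clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &t0);
    /* ... code being timed: only CPU time charged to this process
       is counted, not wall-clock time spent waiting ... */
    clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &t1);

    double ns = (t1.tv_sec - t0.tv_sec) * 1e9
              + (t1.tv_nsec - t0.tv_nsec);
    printf("process CPU time: %.0f ns\n", ns);
    return 0;
}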

Using "top" in Linux as semi-permanent instrumentation

I'm trying to find the best way to use 'top' as semi-permanent instrumentation in the development of a box running embedded Linux. (The instrumentation will be removed from the final-test and production releases.)
My first pass is to simply add this to init.d:
top -b -d 15 >/tmp/toploop.out &
This runs top in "batch" mode every 15 seconds. Let's assume that /tmp has plenty of space…
Questions:
Is 15 seconds a good value to choose for general-purpose monitoring?
Other than disk space, how seriously is this perturbing the state of the system?
What other (perhaps better) tools could be used like this?
Look at collectd. It's a very lightweight system-monitoring framework coded for performance.
We use sysstat to monitor things like this.
You might find that vmstat and iostat with a delay and no repeat counter is a better option.
I suspect 15 seconds would be more than adequate unless you actually want to watch what's happening in real time, but that doesn't appear to be the case here.
As far as load goes: on an idling PIII 900MHz with 768MB of RAM running Ubuntu (not sure which version, but not more than a year old), I have top updating every 0.5 seconds and it's at about 2% CPU utilization. At 15-second updates, I'm seeing 0.1% CPU utilization.
Depending upon what exactly you want, you could use the output of uptime, free, and ps to get most, if not all, of top's information.
If you are looking for overall load, uptime is probably sufficient. However, if you want specific information about processes, are adventurous, and have the /proc filesystem enabled, you may want to write your own tools. The primary benefit in this environment is that you can focus on exactly what you want and minimize the load introduced to the system.
The proc file system gives your application read access to the kernel memory that keeps track of many of the interesting variables. Reading from /proc is one of the lightest ways to get this information. Additionally, you may be able to get more information than provided by top. I've done this in the past to get amount of time spent in user and system by this process. Additionally, you can use this to get information about the number of file descriptors open by the process. You might also use this to get detailed information about how the network system is working.
Much of this information is pre-processed by other applications, which you can use if they already give you the information you need. However, it is rather straightforward to read the raw information. Do a man proc for more information.
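As an illustration of reading /proc directly (my sketch, not from the answer above): utime and stime for the current process are fields 14 and 15 of /proc/self/stat, reported in clock ticks.

#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
    char buf[1024];
    FILE *f = fopen("/proc/self/stat", "r");
    if (!f || !fgets(buf, sizeof buf, f))
        return 1;
    fclose(f);

    /* comm (field 2) may contain spaces and parens, so parse from the
       last closing paren; utime and stime are fields 14 and 15, in
       clock ticks */
    char *p = strrchr(buf, ')');
    unsigned long utime, stime;
    if (!p || sscanf(p + 2,
            "%*c %*d %*d %*d %*d %*d %*u %*lu %*lu %*lu %*lu %lu %lu",
            &utime, &stime) != 2)
        return 1;

    long hz = sysconf(_SC_CLK_TCK);
    printf("user: %.2f s, system: %.2f s\n",
           (double)utime / hz, (double)stime / hz);
    return 0;
}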
Pity you haven't said what you are monitoring for.
You should decide whether 15 seconds is ok or not. Feel free to drop it way lower if you wish (and have a fast HDD)
No worries unless you are running a soft real-time system
Have a look at the tools suggested in other answers. I'll add another suggestion: iotop, for answering the "who is thrashing the HDD" question.
At work for system monitoring during stress tests we use a tool called nmon.
What I love about nmon is it has the ability to export to XLS and generate beautiful graphs for you.
It generates statistics for:
Memory Usage
CPU Usage
Network Usage
Disk I/O
Good luck :)
