How accurate is the Linux bash time command?

I want to timestamp some events in a log file from a bash script, and I need these timestamps to be as accurate as possible. The standard way of getting such a timestamp from bash seems to be the date command, which can produce a nanosecond timestamp with the +%s%N format.
However, I remember from C that the various timekeeping functions use different clock sources, and that not all of them are equally accurate or make the same guarantees (e.g. being monotonic). How do I know what clock source time (or date) uses?

The man 1 time is rather clear:
These statistics consist of (i) the elapsed real time between invocation and termination, (ii) the user CPU time (the sum of the tms_utime and tms_cutime values in a struct tms as returned by times(2)), and (iii) the system CPU time (the sum of the tms_stime and tms_cstime values in a struct tms as returned by times(2)).
So we can go to man 3p times, which just states: "The accuracy of the times reported is intentionally left unspecified to allow implementations flexibility in design, from uniprocessor to multi-processor networks." We can then go to man 2 times and learn that it is all measured in clock_t, and that maybe we should use clock_gettime instead.
How do I know what clock source time uses?
As usual on a GNU system, all the programs are open source, so you can download the sources of the kernel and your shell and inspect them to see how they work. In bash's time_command() there are several methods available, and nowadays bash uses getrusage() as a replacement for times().
How accurate is the Linux bash time command?
Both getrusage() and times() are system calls themselves, so the values come straight from the kernel. My guess would be that they are measured with the accuracy the kernel can give us, i.e. in jiffies (1/HZ).
The resolution of the measurement will therefore be one jiffy, so with the usual HZ=300 that's about 3.333 ms, if my math is right. The accuracy will depend on your hardware and perhaps also the workload; my (over)estimated guess is that the values will be right to within one or two jiffies, so up to roughly 7 milliseconds.
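As a sanity check on that guess, clock_getres() reports the resolution the kernel claims for each clock source on your machine. A minimal C sketch:

/* clock_res.c - query the resolution the kernel advertises for a few clocks.
 * A sketch for illustration; build with: cc clock_res.c -o clock_res */
#include <stdio.h>
#include <time.h>

static void report(const char *name, clockid_t id)
{
    struct timespec res;
    if (clock_getres(id, &res) == 0)
        printf("%-26s %ld s %ld ns\n", name, (long)res.tv_sec, res.tv_nsec);
    else
        perror(name);
}

int main(void)
{
    report("CLOCK_REALTIME", CLOCK_REALTIME);
    report("CLOCK_MONOTONIC", CLOCK_MONOTONIC);
    report("CLOCK_PROCESS_CPUTIME_ID", CLOCK_PROCESS_CPUTIME_ID);
    return 0;
}

Keep in mind this reports the granularity of the clock (typically 1 ns on kernels with high-resolution timers), not the accuracy of the CPU-time accounting behind getrusage()/times(), which is what the jiffies estimate above is about.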

Related

How to implement sleep utility in RISC-V?

I want to implement a sleep utility that receives a number of seconds as input and pauses for that many seconds, on the educational xv6 operating system running on RISC-V processors.
The OS already has a system call that gets a number of ticks and pauses: https://github.com/mit-pdos/xv6-riscv/blob/riscv/kernel/sysproc.c#L56
Timers are initialized using a timer vector: https://github.com/mit-pdos/xv6-riscv/blob/riscv/kernel/kernelvec.S#L93
The timer vector is set up with CLINT_MTIMECMP, which tells the timer hardware when to raise the next interrupt.
What I do not understand is how to know the time between ticks and how many ticks occur in one second.
Edit: A quick Google search for "qemu timebase riscv mtime" found a Google Groups thread stating that RDTIME counts nanoseconds since boot and that mtime is an emulated 10 MHz clock.
I haven't done a search to find the exact information you need, but I think I have some contextual information that will help you find it. I would recommend searching the QEMU documentation and code (probably via GitHub search) for how mtime and mtimecmp work.
In section 10.1 (Counter - Base Counter and Timers) of the unprivileged specification [1], it is explained that the RDTIME pseudo-instruction should have some fixed tick rate that can be determined from the implementation [2]. That tick rate is also shared with mtimecmp and mtime as defined in the privileged specification [3].
I would presume the ticks used by the sleep system call are the same as these ticks from the specifications. In that case, xv6 is just a kernel and does not itself define how many ticks per second there are. Since xv6 is made to run on top of QEMU, the ticks-per-second value should be defined somewhere in the QEMU code and might be documented. (A sketch of a user-level sleep based on this is given after the footnotes below.)
From the old wiki for QEMU-riscv it should be clear that the SiFive CLINT provides the features xv6 needs to work, but I doubt it specifies how to determine the tick rate. Spike also supports the CLINT interface, so it may also be instructive to look at the code in Spike that handles it.
[1] I used version 20191213 of the unprivileged specification as a reference.
[2] "The RDTIME pseudoinstruction reads the low XLEN bits of the time CSR, which counts wall-clock real time that has passed from an arbitrary start time in the past. RDTIMEH is an RV32I-only instruction that reads bits 63–32 of the same real-time counter. The underlying 64-bit counter should never overflow in practice. The execution environment should provide a means of determining the period of the real-time counter (seconds/tick). The period must be constant. The real-time clocks of all harts in a single user application should be synchronized to within one tick of the real-time clock. The environment should provide a means to determine the accuracy of the clock."
[3] Section 3.1.10, Machine Timer Registers (mtime and mtimecmp): "Platforms provide a real-time counter, exposed as a memory-mapped machine-mode read-write register, mtime. mtime must run at constant frequency, and the platform must provide a mechanism for determining the timebase of mtime."
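Putting those pieces together, a user-level sleep utility for xv6 could look like the sketch below. It assumes the stock xv6-riscv/qemu setup, where the kernel asks for a timer interrupt every 1,000,000 cycles of the roughly 10 MHz mtime clock, i.e. about 10 kernel ticks per second; TICKS_PER_SECOND here is an assumption you should verify against kernel/start.c in your tree.

#include "kernel/types.h"
#include "user/user.h"

// Assumed tick rate: the stock xv6-riscv timer fires every 1,000,000 cycles
// of the ~10 MHz mtime clock under qemu, i.e. roughly 10 ticks per second.
// Check this against kernel/start.c before relying on it.
#define TICKS_PER_SECOND 10

int
main(int argc, char *argv[])
{
  if(argc != 2){
    fprintf(2, "usage: sleep seconds\n");
    exit(1);
  }
  int seconds = atoi(argv[1]);
  sleep(seconds * TICKS_PER_SECOND);   // sleep() counts kernel ticks, not seconds
  exit(0);
}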

Using /proc/*/stat for profiling

On Linux, a process' (main thread's) last program-counter value is presented in /proc/$PID/stat. This seems to be a really simple and easy way to do some sampled profiling without having to instrument a program in any way whatsoever.
I'm wondering if this has any caveats when it comes to sampling quality, however. I'm assuming this value is updated whenever the process runs out of its timeslice, which should happen at effectively random points in the program code, and that samples taken at intervals longer than a timeslice should therefore be uniformly distributed according to where the program actually spends its time. But that's just an assumption, and I realize it could be wrong in any number of ways.
Does anyone know?
Why not try a modern built-in Linux tool like perf (https://perf.wiki.kernel.org/index.php/Main_Page)?
It has a record mode with an adjustable frequency (-F 100 for 100 Hz) and many events; for example, it can sample on the software event task-clock without using hardware performance counters (stop perf with Ctrl-C, or append sleep 10 to the command to sample for 10 seconds):
perf record -p $PID -e task-clock -o perf.output.file
perf works on all threads without any instrumentation (no recompilation or code editing) and does not interfere with program execution (only the timer interrupt handling is slightly affected). There is also some support for stack-trace sampling with the -g option.
The output can be parsed offline with perf report (only this step will try to parse the binary and shared libraries):
perf report -i perf.output.file
or converted to raw PC (EIP) samples with perf script -i perf.output.file.
PS: The EIP pointer in the /proc/$pid/stat file is mentioned in the official Linux man page proc(5), http://man7.org/linux/man-pages/man5/proc.5.html, as kstkeip - "The current EIP (instruction pointer)." It is read in fs/proc/array.c:do_task_stat() as eip = KSTK_EIP(task);, but I'm not sure where and when it is filled. It may be written on a task switch (both involuntary, when the timeslice ends, and voluntary, when the task does something like sched_yield) or on blocking syscalls, so it is probably not the best choice as a sampling source.
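For completeness, here is a minimal C sketch of reading that field yourself: it pulls field 30 (kstkeip, per proc(5)) out of /proc/$PID/stat. Note that on current kernels kstkesp/kstkeip are reported as 0 unless the reader has ptrace access to the target process, which is one more reason this is a weak sampling source.

/* sample_eip.c - read the kstkeip field from /proc/$PID/stat (a sketch). */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

static unsigned long long read_kstkeip(const char *pid)
{
    char path[64], buf[4096];
    snprintf(path, sizeof path, "/proc/%s/stat", pid);
    FILE *f = fopen(path, "r");
    if (!f) { perror("fopen"); return 0; }
    size_t n = fread(buf, 1, sizeof buf - 1, f);
    fclose(f);
    buf[n] = '\0';

    /* comm (field 2) may contain spaces, so start scanning after the last ')'. */
    char *p = strrchr(buf, ')');
    if (!p)
        return 0;
    p += 2;                             /* skip ") " to field 3 (state) */
    for (int field = 3; p && *p; field++) {
        if (field == 30)                /* kstkeip, per proc(5) */
            return strtoull(p, NULL, 10);
        p = strchr(p, ' ');
        if (p)
            p++;
    }
    return 0;
}

int main(int argc, char **argv)
{
    if (argc != 2) { fprintf(stderr, "usage: %s pid\n", argv[0]); return 1; }
    printf("kstkeip: 0x%llx\n", read_kstkeip(argv[1]));
    return 0;
}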
If it works, which it might, it will have the shortcomings of prof, which gprof was supposed to remedy. gprof in turn has its own shortcomings, which have led to numerous more modern profilers. Some of us consider stack sampling to be the most effective approach, and it can be done with a tool as simple as pstack or lsstack.

Unique time stamp in Linux

Is there a clock in Linux with nanosecond precision that is strictly increasing and maintained through a power cycle? I am attempting to store time series data in a database where each row has a unique time stamp. Do I need to use an external time source such as a GPS receiver to do this? I would like the time stamp to be in or convertible to GPS time.
This is not a duplicate of How to create a high resolution timer in Linux to measure program performance?. I am attempting to store absolute times, not calculate relative time differences. The clock must persist over a power cycle.
Most computers now run software that periodically corrects the system time over the internet, which means the system clock can jump up or down by a few milliseconds every so often. Remember also that the computer's clock drifts. If you don't want problems with leap seconds, use a timescale without leap-second corrections, such as GPS time or TAI. NTP will be off in the microsecond-to-millisecond range because of variations in network latency. Clocks that would truly meet your requirements start at around fifty thousand dollars.
Based on the question, the other answers, and discussion in comments...
You can get "nanosecond precision that is strictly increasing and maintained through a power cycle" by combining the results of clock_gettime() using the CLOCK_REALTIME and from using CLOCK_MONOTONIC - with some caveats.
First, turn off NTP. Run NTP once at each system restart to sync your time with the world, but do not update the time while the system is up. This will avoid rolling the time backwards. Without doing this you could get a sequence such as
20160617173556 1001
20160617173556 1009
20160617173556 1013
20160617173555 1020 (notice the second went backward)
20160617173556 1024
(For this example, I'm just using YYYYMMDDhhmmss followed by a fictional monotonic-clock value.)
Then you face business decisions.
• How important is matching the world's time, compared to strictly increasing uniqueness? (Hardware clock drift could throw off accuracy relative to world time.)
• Given that decision, is it worth the investment in specialized hardware, rather than a standard (or even high-end) PC?
• If two events actually happen during the same nanosecond, is it acceptable to have duplicate entries?
• etc.
There are many tradeoffs possible based on the true requirements that lead to developing this application.
In no particular order:
To be sure your times are free of daylight-saving-time changes, use date's -u argument to get UTC. UTC is always increasing, barring time corrections from system administrators.
The trouble with %N is that the precision you actually get depends on the hardware and can be much less than what %N allows you to express. Run a few experiments to find out; warnings about this are everywhere, but it is still overlooked.
If you are writing C-ish code, get the time with time() or clock_gettime(), and use gmtime() rather than the localtime()-type functions to convert it to text. Look at strftime() to format the integer part of the time; you will find that the strftime() format fields match those of the date command, because date basically calls strftime(). The truly paranoid, willing to write additional code, can use CLOCK_MONOTONIC to be sure their time is increasing.
If you truly require strictly increasing times, you may need to write your own command or function that remembers the last time it handed out. If it is called again during the same instant, add an offset of 1, and keep incrementing the offset as needed to guarantee unique times until the hardware time gets ahead of your adjusted time (see the sketch after this list).
Linux tends to favor NTP to obtain network time. The function just described, which ensures increasing times, will also help with backward jumps, since those jumps are usually small.
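Tying the last two points together, here is a minimal C sketch, assuming CLOCK_REALTIME as the source and a single process handing out the values: it formats the integer seconds in UTC with gmtime()/strftime(), appends the nanoseconds, and bumps the previous value by one nanosecond whenever the clock has not moved past it, so the output is strictly increasing.

/* unique_ts.c - strictly increasing, UTC-formatted nanosecond timestamps.
 * Assumptions (for illustration only): CLOCK_REALTIME as the source, one
 * process/thread handing out timestamps, and no large backward time steps. */
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <time.h>

static uint64_t last_ns;  /* last value handed out, ns since the Epoch */

/* Return a strictly increasing nanosecond count since the Epoch. */
static uint64_t unique_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_REALTIME, &ts);
    uint64_t now = (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;
    if (now <= last_ns)
        now = last_ns + 1;   /* clock did not advance (or stepped back): bump */
    last_ns = now;
    return now;
}

int main(void)
{
    char text[32];
    for (int i = 0; i < 5; i++) {
        uint64_t ns = unique_ns();
        time_t sec = (time_t)(ns / 1000000000ULL);
        struct tm tm_utc;
        gmtime_r(&sec, &tm_utc);                        /* UTC, not local time */
        strftime(text, sizeof text, "%Y%m%d%H%M%S", &tm_utc);
        printf("%s.%09" PRIu64 "\n", text, ns % 1000000000ULL);
    }
    return 0;
}

This only guarantees uniqueness within a single process; coordinating across processes or machines would need something more, such as a database sequence.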
If nanosecond precision is really sufficient for you:
date +%s%N

Linux time command - real vs user vs system

I am running a jar file in Linux with time command. Below is the output after execution.
15454.58s real 123464.61s user 6455.55s system
Below is the command executed.
time java -jar -Xmx7168m Batch.jar
But the actual time taken to execute the process was 9270 seconds.
Why are the actual time (wall-clock time) and the reported real time different?
Can anyone explain this? It's running on a multi-core machine (32 cores).
Maybe this explains the deviation you are experiencing. From the time Wikipedia article:
Because a program may fork children whose CPU times (both user and sys) are added to the values reported by the time command, but on a multicore system these tasks are run in parallel, the total CPU time may be greater than the real time.
Apart from that, your understanding of real time conforms with the definition given in time(7):
Real time is defined as time measured from some fixed point, either from a standard point in the past (see the description of the Epoch and calendar time below), or from some point (e.g., the start) in the life of a process (elapsed time).
See also bash(1) (although its documentation on the time command is not overly comprehensive).
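To see that effect concretely, here is a small self-contained C sketch (an illustration, with the thread count and loop size chosen arbitrarily): four threads do pure user-space work in parallel, so running it under time should show user time roughly four times the real time on a machine with at least four idle cores.

/* cpu_vs_wall.c - user CPU time exceeding real time on a multi-core machine.
 * Build: cc cpu_vs_wall.c -o cpu_vs_wall -pthread */
#include <pthread.h>
#include <stdio.h>
#include <sys/resource.h>
#include <time.h>

#define NTHREADS 4

static void *burn(void *arg)
{
    (void)arg;
    volatile unsigned long x = 0;
    for (unsigned long i = 0; i < 200000000UL; i++)
        x += i;                          /* pure user-space work */
    return NULL;
}

int main(void)
{
    struct timespec t0, t1;
    pthread_t tids[NTHREADS];

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < NTHREADS; i++)
        pthread_create(&tids[i], NULL, burn, NULL);
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(tids[i], NULL);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double real = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;

    struct rusage ru;
    getrusage(RUSAGE_SELF, &ru);         /* CPU time of the whole process */
    double user = ru.ru_utime.tv_sec + ru.ru_utime.tv_usec / 1e6;
    double sys  = ru.ru_stime.tv_sec + ru.ru_stime.tv_usec / 1e6;

    printf("real %.2fs  user %.2fs  sys %.2fs\n", real, user, sys);
    return 0;
}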
If seconds are exact enough for you, this little wrapper can help:
#!/bin/bash
starttime=$( date +"%s" )
# run your program here
endtime=$( date +"%s" )
duration=$(( endtime-starttime ))
echo "Execution took ${duration} s."
Note: If the system time is changed while your program is running, the results will be incorrect.
From what I remember, user time is the time the process spends running in user space, system is the time spent running in kernel space (syscalls, for example), and real is also called the wall-clock time (the actual time you could measure with a stopwatch). I don't know exactly how this is calculated on an SMP system.

Microsecond accurate (or better) process timing in Linux

I need a very accurate way to time parts of my program. I could use the regular high-resolution clock for this, but that returns wall-clock time, which is not what I need: I need the time spent running only my process.
I distinctly remember seeing a Linux kernel patch that would allow me to time my processes to nanosecond accuracy, except I forgot to bookmark it and I forgot the name of the patch as well :(.
I remember how it works though:
On every context switch, it reads out the value of a high-resolution clock and adds the delta of the last two values to the process time of the running process. This gives a high-resolution, accurate view of the process's actual CPU time.
The regular process time is kept using the regular clock, which I believe is millisecond accurate (1000 Hz), which is much too coarse for my purposes.
Does anyone know what kernel patch I'm talking about? I also remember it was like a word with a letter before or after it -- something like 'rtimer' or something, but I don't remember exactly.
(Other suggestions are welcome too)
The Completely Fair Scheduler suggested by Marko is not what I was looking for, but it looks promising. The problem I have with it is that the calls I can use to get process time still do not return values that are granular enough.
times() is returning values of 21, 22 - in milliseconds.
clock() is returning values of 21000, 22000 - the same granularity.
getrusage() is returning values like 210002, 22001 (and suchlike); they look to have a bit better accuracy, but the values look conspicuously similar.
So the problem I'm probably having now is that the kernel has the information I need, I just don't know which system call will return it.
If you are looking for this level of timing resolution, you are probably trying to do some micro-optimization. If that's the case, you should look at PAPI. Not only does it provide both wall-clock and virtual (process only) timing information, it also provides access to CPU event counters, which can be indispensable when you are trying to improve performance.
http://icl.cs.utk.edu/papi/
See this question for some more info.
Something I've used for such things is gettimeofday(). It fills in a structure with seconds and microseconds. Call it before the code and again after, then subtract the two structs with timersub(); the result gives you the elapsed seconds in tv_sec and the remaining microseconds in tv_usec.
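A minimal sketch of that approach, with the caveat that it measures wall-clock time rather than the process-only CPU time asked about here (the usleep() call just stands in for the code being timed):

/* tv_delta.c - wall-clock timing with gettimeofday() and timersub(). */
#define _DEFAULT_SOURCE
#include <stdio.h>
#include <sys/time.h>
#include <unistd.h>

int main(void)
{
    struct timeval start, end, delta;

    gettimeofday(&start, NULL);
    usleep(12345);                      /* the code you want to time goes here */
    gettimeofday(&end, NULL);

    timersub(&end, &start, &delta);     /* delta = end - start */
    printf("elapsed: %ld.%06ld s\n", (long)delta.tv_sec, (long)delta.tv_usec);
    return 0;
}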
If you need very small time units for (I assume) testing the speed of your software, I would recommend just running the parts you want to time in a loop millions of times, taking the time before and after the loop, and calculating the average. A nice side effect of doing this (apart from not needing to figure out how to use nanoseconds) is that you get more consistent results, because the random overhead caused by the OS scheduler is averaged out.
Of course, unless your program needs to run millions of times per second, it's probably fast enough if its running time is too short to measure in milliseconds.
I believe CFS (the Completely Fair Scheduler) is what you're looking for.
You can use the High Precision Event Timer (HPET) if you have a fairly recent 2.6 kernel. Check out Documentation/hpet.txt for how to use it. This solution is platform dependent, though, and I believe it is only available on newer x86 systems. HPET has at least a 10 MHz timer, so it should fit your requirements easily.
I believe several PowerPC implementations from Freescale support a cycle exact instruction counter as well. I used this a number of years ago to profile highly optimized code but I can't remember what it is called. I believe Freescale has a kernel patch you have to apply in order to access it from user space.
http://allmybrain.com/2008/06/10/timing-cc-code-on-linux/
might be of help to you (directly if you are doing it in C/C++, but I hope it will give you pointers even if you're not)... It claims to provide microsecond accuracy, which just passes your criterion. :)
I think I found the kernel patch I was looking for. Posting it here so I don't forget the link:
http://user.it.uu.se/~mikpe/linux/perfctr/
http://sourceforge.net/projects/perfctr/
Edit: It works for my purposes, though not very user-friendly.
Try the CPU's timestamp counter? Wikipedia seems to suggest using clock_gettime().
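On reasonably modern kernels you may not need a patch at all: clock_gettime() accepts CLOCK_PROCESS_CPUTIME_ID (and CLOCK_THREAD_CPUTIME_ID), which count only the CPU time charged to your process or thread, at nanosecond resolution. A minimal sketch:

/* proc_cputime.c - time a region using process-only CPU time.
 * CLOCK_PROCESS_CPUTIME_ID counts CPU time charged to this process (all
 * threads); CLOCK_THREAD_CPUTIME_ID would count only the calling thread. */
#include <stdio.h>
#include <time.h>

static double elapsed(const struct timespec *a, const struct timespec *b)
{
    return (b->tv_sec - a->tv_sec) + (b->tv_nsec - a->tv_nsec) / 1e9;
}

int main(void)
{
    struct timespec c0, c1;

    clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &c0);

    /* The code to be timed goes here; a dummy workload for illustration. */
    volatile double sum = 0.0;
    for (long i = 1; i < 20000000L; i++)
        sum += 1.0 / i;

    clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &c1);

    printf("process CPU time: %.6f s\n", elapsed(&c0, &c1));
    return 0;
}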
