Measure total time spent by a process on IO - linux

When I run one of my programs under the time command, I get the following output:
real 1m33.523s
user 0m15.156s
sys 0m1.312s
Here the real time is much larger than user+sys. This is most likely due to time spent on IO wait/calls. I want to measure the total time spent by the program in IO wait or IO calls. Is there any way to do that?
I tried using iotop. However, it does not report the total time spent by the program performing IO.

Yes: strace can provide per-system-call statistics.
Example 1
I want to measure time spent on I/O while accessing stackoverflow.com:
$ time curl stackoverflow.com >/dev/null 2>&1
curl stackoverflow.com > /dev/null 2>&1 0.00s user 0.01s system 2% cpu 0.392 total
OK, 2% CPU and 0.01 s in system. Let's find out:
$ strace -c curl stackoverflow.com >/dev/null
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 240k 0 240k 0 0 127k 0 --:--:-- 0:00:01 --:--:-- 130k
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
54.12 0.005497 11 506 write
18.16 0.001845 43 43 fstat
11.95 0.001214 30 41 poll
5.75 0.000584 32 18 recvfrom
3.40 0.000345 3 101 mmap
2.51 0.000255 4 62 mprotect
1.98 0.000201 4 50 close
1.84 0.000187 31 6 getsockname
0.29 0.000029 1 42 1 open
It is especially useful to compare these results with those measured when running curl without arguments.
Anyway. strace shows that curl mostly spends time in write, fstat and poll.
Another example
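The first approach seems to show incorrect results for sleep. According to the man page, -c attempts to report system time (CPU time spent in the kernel) rather than wall-clock time, so time spent blocked inside a call is not counted. If you are not satisfied with the first approach, you can print the time spent in each syscall (strace -T) and then process that output to sum the time per syscall.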
$ strace 2>&1 -T curl stackoverflow.com >/dev/null | head -n 20
execve("/usr/bin/curl", ["curl", "stackoverflow.com"], [/* 62 vars */]) = 0 <0.000219>
brk(0) = 0x186e000 <0.000175>
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fc04c9e6000 <0.000166>
access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory) <0.000238>
open("/etc/ld.so.cache", O_RDONLY) = 3 <0.000144>
fstat(3, {st_mode=S_IFREG|0644, st_size=96498, ...}) = 0 <0.000175>
mmap(NULL, 96498, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7fc04c9ce000 <0.000164>
close(3) = 0 <0.000160>
open("/usr/lib64/libcurl.so.4", O_RDONLY) = 3 <0.000047>
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\240\333\300\">\0\0\0"..., 832) = 832 <0.000160>
fstat(3, {st_mode=S_IFREG|0755, st_size=346008, ...}) = 0 <0.000216>
mmap(0x3e22c00000, 2438600, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x3e22c00000 <0.000189>
mprotect(0x3e22c51000, 2097152, PROT_NONE) = 0 <0.000032>
mmap(0x3e22e51000, 12288, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x51000) = 0x3e22e51000 <0.000119>
close(3) = 0 <0.000110>
open("/lib64/libidn.so.11", O_RDONLY) = 3 <0.000257>
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\0/#U1\0\0\0"..., 832) = 832 <0.000051>
fstat(3, {st_mode=S_IFREG|0755, st_size=209088, ...}) = 0 <0.000041>
mmap(0x3155400000, 2301736, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x3155400000 <0.000037>
mprotect(0x3155432000, 2093056, PROT_NONE) = 0 <0.000037>
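The post-processing step mentioned above (summing the per-call times printed by strace -T) could look roughly like the sketch below. It assumes each trace line has the shape shown in the output above, i.e. a syscall name followed by "(" and ending in a "<seconds>" field; the function name summarize and the regular expression are illustrative, and lines that do not fit that shape (signals, unfinished calls) are simply skipped.

```python
import re
from collections import defaultdict

# Assumed line shape: name(args...) = result <seconds>
# An optional "[pid N] " prefix (as printed with -f) is tolerated.
LINE_RE = re.compile(r'^(?:\[pid\s+\d+\]\s+)?(\w+)\(.*<(\d+\.\d+)>\s*$')

def summarize(trace_lines):
    """Return (syscall, total_seconds, calls) tuples, largest total first."""
    totals = defaultdict(lambda: [0.0, 0])
    for line in trace_lines:
        m = LINE_RE.match(line)
        if not m:
            continue  # not a completed syscall line; ignore it
        name, seconds = m.group(1), float(m.group(2))
        totals[name][0] += seconds
        totals[name][1] += 1
    rows = [(name, t, c) for name, (t, c) in totals.items()]
    return sorted(rows, key=lambda row: row[1], reverse=True)

# A few lines copied from the trace above, used as sample input.
sample = [
    'brk(0) = 0x186e000 <0.000175>',
    'open("/etc/ld.so.cache", O_RDONLY) = 3 <0.000144>',
    'close(3) = 0 <0.000160>',
    'open("/usr/lib64/libcurl.so.4", O_RDONLY) = 3 <0.000047>',
    'close(3) = 0 <0.000110>',
]

for name, seconds, calls in summarize(sample):
    print(f"{name:10s} {seconds:.6f} s in {calls} call(s)")
```

For a real run, the trace could be saved with strace -T -o trace.txt and the opened file passed to summarize instead of the sample list. The totals are wall-clock durations of each call, so they include time spent blocked.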

Related

Why does this strace on a pipeline not finish

I have a directory with a single file, one.txt. If I run ls | cat, it works fine. However, if I try to strace both sides of this pipeline, I do see the output of the command as well as strace, but the process doesn't finish.
strace ls 2> >(stdbuf -o 0 sed 's/^/command1:/') | strace cat 2> >(stdbuf -o 0 sed 's/^/command2:/')
The output I get is:
command2:execve("/usr/bin/cat", ["cat"], [/* 50 vars */]) = 0
command2:brk(0) = 0x1938000
command2:mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f87e5a93000
command2:access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory)
<snip>
command2:open("/usr/lib/locale/locale-archive", O_RDONLY|O_CLOEXEC) = 3
command2:fstat(3, {st_mode=S_IFREG|0644, st_size=106070960, ...}) = 0
command2:mmap(NULL, 106070960, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f87def8a000
command2:close(3) = 0
command2:fstat(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 2), ...}) = 0
command2:fstat(0, {st_mode=S_IFIFO|0600, st_size=0, ...}) = 0
command2:fadvise64(0, 0, 0, POSIX_FADV_SEQUENTIAL) = -1 ESPIPE (Illegal seek)
command2:read(0, "command1:execve(\"/usr/bin/ls\", ["..., 65536) = 4985
command1:execve("/usr/bin/ls", ["ls"], [/* 50 vars */]) = 0
command1:brk(0) = 0x1190000
command1:mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fae869c3000
command1:access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory)
<snip>
command1:close(3) = 0
command1:fstat(1, {st_mode=S_IFIFO|0600, st_size=0, ...}) = 0
command2:write(1, "command1:close(3) "..., 115) = 115
command2:read(0, "command1:mmap(NULL, 4096, PROT_R"..., 65536) = 160
command1:mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fae869c2000
one.txt
command1:write(1, "one.txt\n", 8) = 8
command2:write(1, "command1:mmap(NULL, 4096, PROT_R"..., 160) = 160
command2:read(0, "command1:close(1) "..., 65536) = 159
command1:close(1) = 0
command1:munmap(0x7fae869c2000, 4096) = 0
command1:close(2) = 0
command2:write(1, "command1:close(1) "..., 159) = 159
command2:read(0, "command1:exit_group(0) "..., 65536) = 53
command1:exit_group(0) = ?
command2:write(1, "command1:exit_group(0) "..., 53) = 53
command2:read(0, "command1:+++ exited with 0 +++\n", 65536) = 31
command1:+++ exited with 0 +++
command2:write(1, "command1:+++ exited with 0 +++\n", 31) = 31
and it hangs from then on. ps reveals that both commands in the pipeline (ls and cat here) are running.
I am on RHEL7 running Bash version 4.2.46.
I put a strace on your strace:
strace bash -c 'strace true 2> >(cat > /dev/null)'
It hangs on a wait4, indicating that it's stuck waiting on children. ps f confirms this:
24740 pts/19 Ss 0:00 /bin/bash
24752 pts/19 S+ 0:00 \_ strace true
24753 pts/19 S+ 0:00 \_ /bin/bash
24755 pts/19 S+ 0:00 \_ cat
Based on this, my working theory is that this is a deadlock because:
strace waits on all of its children, even the ones it didn't spawn directly
Bash spawns the process substitution as a child of the process it is attached to (here, strace). Since the process substitution is attached to stderr, it essentially waits for the parent to exit.
This suggests at least two workarounds, both of which appear to work:
strace -D ls 2> >(nl)
{ strace ls; true; } 2> >(nl)
-D, to quote the man page, "[runs the] tracer process as a detached grandchild, not as parent of the tracee". The second workaround adds another command after strace, which forces bash to fork once more in order to run strace.
In both cases, the extra forks mean that the process substitution doesn't end up as strace's child, avoiding the issue.

Why do strace's timings for -c and -T disagree?

I have an rsync running a no-op (all files are already there) directory copy operation to a network-mounted file system.
Because all files are already there, the only thing that rsync does is lstat() syscalls.
If I strace -c this, I get this:
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
100.00 0.076780 30 2524 lstat
------ ----------- ----------- --------- --------- ----------------
100.00 0.076780 2524 total
real 0m5.451s
But if I strace -T (showing time per syscall), I get this:
lstat("file1", {st_mode=S_IFREG|0644, st_size=32820, ...}) = 0 <0.005523>
lstat("file2", {st_mode=S_IFREG|0644, st_size=20816, ...}) = 0 <0.001529>
lstat("file3", {st_mode=S_IFREG|0644, st_size=1828312, ...}) = 0 <0.001991>
lstat("file4", {st_mode=S_IFREG|0644, st_size=1823258, ...}) = 0 <0.001326>
lstat("file5", {st_mode=S_IFREG|0644, st_size=32820, ...}) = 0 <0.006562>
lstat("file6", {st_mode=S_IFREG|0644, st_size=22578, ...}) = 0 <0.002151>
lstat("file7", {st_mode=S_IFREG|0644, st_size=32835, ...}) = 0 <0.001705>
lstat("file8", {st_mode=S_IFREG|0644, st_size=25493, ...}) = 0 <0.001492>
lstat("file9", {st_mode=S_IFREG|0644, st_size=1783930, ...}) = 0 <0.001974>
The times are completely off!
-c claims each lstat takes roughly 30 usecs/call, while -T shows about 2 ms/call.
The 2 ms make sense, that's in the order of ping speed for the network mount, but 30 microseconds is just plain impossible.
Why is the value in the usecs/call column bogus? Am I misunderstanding it?
From the strace man page:
-c Count time, calls, and errors for each system call and report a summary on program exit. On Linux, this attempts to show system time (CPU time spent running in the kernel) independent of wall clock time. If -c is used with -f or -F (below), only aggregate totals for all traced processes are kept.
(Emphasis added by me.) Most I/O will just make the actual asynchronous call and context switch away, rather than doing some kind of busy loop. -T will instead show the wall clock time duration between calling into the kernel, and that call returning.
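The difference between system time and wall-clock time can be reproduced without strace. The sketch below is only an illustration: time.sleep stands in for a call that blocks (such as an lstat on a network mount), and the helper name measure is made up for this example. It compares the elapsed wall-clock time with the CPU time the process was charged while the call was in progress.

```python
import os
import time

def measure(blocking_call):
    """Run blocking_call and report wall-clock, user and system time used."""
    wall_start = time.perf_counter()
    cpu_start = os.times()          # user/system CPU time of this process
    blocking_call()
    cpu_end = os.times()
    wall_end = time.perf_counter()
    return {
        "wall": wall_end - wall_start,
        "user": cpu_end.user - cpu_start.user,
        "system": cpu_end.system - cpu_start.system,
    }

# The process is blocked for about 0.2 s but uses almost no CPU meanwhile.
result = measure(lambda: time.sleep(0.2))
print(f"wall   {result['wall']:.4f} s")
print(f"user   {result['user']:.4f} s")
print(f"system {result['system']:.4f} s")
```

The wall figure corresponds to what -T reports for a call, while the system figure is closer in spirit to what -c summarises by default.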
Edit: In later versions, -w gives you wait times, rather than system times, so -c -w will give you times that should match -T.
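As a rough consistency check, the numbers quoted in the question can be put side by side. The sketch below only redoes the arithmetic; the 2 ms figure is the asker's own estimate read off the -T output, not a measured average.

```python
calls = 2524                   # "calls" column from strace -c
cpu_seconds_total = 0.076780   # "seconds" column from strace -c
real_seconds = 5.451           # wall-clock time reported for the run
approx_wall_per_call = 0.002   # "about 2 ms/call" as read from strace -T

# What -c reports per call: CPU time in the kernel divided by call count.
usecs_per_call = cpu_seconds_total / calls * 1e6

# What the run costs in wall-clock terms if every lstat blocks ~2 ms.
estimated_wall_total = calls * approx_wall_per_call

print(f"-c: {usecs_per_call:.1f} usecs/call of system time")
print(f"-T: {estimated_wall_total:.3f} s estimated wall time for all calls")
print(f"real: {real_seconds} s")
```

The system-time total accounts for well under a tenth of a second, while the wall-clock estimate of roughly 5 s is in line with the real time of the run, which is consistent with the explanation above.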

Implicit system calls in UNIX commands

I've been studying UNIX and system calls and I came across a low-level and tricky question. The question asks what system calls are called for this command:
grep word1 word2 > file.txt
I did some research, but I was unable to find many resources on the underlying UNIX calls. However, it seems to me that the answer would be open (to open file.txt and obtain a file descriptor for it), then dup2 (to make the STDOUT of grep refer to that file descriptor), then write to write the STDOUT of grep (which is now the file descriptor of file.txt), and finally close(), to close the file descriptor of file.txt... However, I have no idea whether I am right or on the correct path. Can anyone with experience in UNIX enlighten me on this topic?
You are on the right track. This command is very helpful for tracing the system calls of any program:
strace
On my PC it shows the following output (without stream redirection):
$ strace grep abc ss.txt
execve("/bin/grep", ["grep", "abc", "ss.txt"], [/* 237 vars */]) = 0
brk(0) = 0x13de000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f1785694000
close(3) = 0
ioctl(1, SNDCTL_TMR_TIMEBASE or TCGETS, {B38400 opost isig icanon echo ...}) = 0
stat("ss.txt", {st_mode=S_IFREG|0644, st_size=13, ...}) = 0
open("ss.txt", O_RDONLY) = 3
ioctl(3, SNDCTL_TMR_TIMEBASE or TCGETS, 0x7fffa0e4f370) = -1 ENOTTY (Inappropriate ioctl for device)
read(3, "abc\n123\n321\n\n", 32768) = 13
fstat(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 2), ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f178568c000
write(1, "abc\n", 4abc
) = 4
read(3, "", 32768) = 0
close(3) = 0
close(1) = 0
munmap(0x7f178568c000, 4096) = 0
close(2) = 0
exit_group(0) = ?
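The open/dup2 sequence guessed in the question is normally carried out by the shell in a forked child before the command starts, which is why the trace of grep itself only shows write(1, ...). A minimal sketch of that sequence, assuming a POSIX system, is given below; run_redirected is a hypothetical helper written for this example and is not how any particular shell is implemented.

```python
import os
import sys
import tempfile

def run_redirected(argv, out_path):
    """Run argv with stdout sent to out_path, roughly like: argv... > out_path"""
    pid = os.fork()
    if pid == 0:
        # Child: set up the redirection, then become the command.
        try:
            fd = os.open(out_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o666)
            os.dup2(fd, 1)            # fd 1 (stdout) now refers to the file
            os.close(fd)              # the extra descriptor is no longer needed
            os.execvp(argv[0], argv)  # replaces this process; does not return
        except OSError:
            os._exit(127)             # open or exec failed
    # Parent: wait for the command to finish and return its exit status.
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status) if os.WIFEXITED(status) else -1

# Demo with a command that is certain to exist: the running interpreter.
demo_path = os.path.join(tempfile.mkdtemp(), "file.txt")
rc = run_redirected([sys.executable, "-c", "print('hello')"], demo_path)
with open(demo_path) as f:
    print(rc, repr(f.read()))
```

Calling run_redirected(["grep", "word1", "word2"], "file.txt") would mimic the command from the question. The command being run never opens file.txt itself; it simply writes to descriptor 1, which already points at the file.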

linux /proc/<pid>/exe & valgrind

As per the man page, /proc/pid/exe is a symlink containing the actual path of the executed command.
When I run valgrind on my program, I see that /proc/pid/exe points to /usr/lib64/valgrind/amd64-linux/memcheck
lnx-host> which valgrind
/usr/bin/valgrind
Any idea why /proc/pid/exe points to /usr/lib64/valgrind/amd64-linux/memcheck when I am invoking it as valgrind?
In my code I am trying to get the executable name from the pid, and in this case expecting to see valgrind.
memcheck is the default tool used by Valgrind, unless you tell it to use another of the tools, such as callgrind.
Use --tool=<name> to specify the tool you want to invoke.
Side-note: is your /usr/bin/valgrind also a script just like it is by default? Why not play with that to do what you want to achieve? On my system that invokes first of all /usr/bin/valgrind.bin and then the respective (backend) tool (/usr/lib/valgrind/memcheck-amd64-linux).
Relevant output from strace:
execve("/usr/bin/valgrind", ["valgrind", "./myprog"], [/* 35 vars */]) = 0
stat("/home/user/HEAD/myprog", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
execve("/usr/bin/valgrind.bin", ["/usr/bin/valgrind.bin", "./myprog"], [/* 39 vars */]) = 0
open("./myprog", O_RDONLY) = 3
execve("/usr/lib/valgrind/memcheck-amd64-linux", ["/usr/bin/valgrind.bin", "./myprog"], [/* 40 vars */]) = 0
getcwd("/home/user/HEAD/myprog", 4095) = 25
open("./myprog", O_RDONLY) = 3
stat("./myprog", {st_mode=S_IFREG|0755, st_size=1886240, ...}) = 0
readlink("/proc/self/fd/3", "/home/user/HEAD/myprog/myprog", 4096) = 31
readlink("/proc/self/fd/3", "/home/user/HEAD/myprog/myprog", 4096) = 31
open("./myprog", O_RDONLY) = 3
write(1015, "./myprog", 8) = 8
write(1016, "==23547== Command: ./myprog\n", 28==23547== Command: ./myprog
stat("/home/user/HEAD/myprog/myprog", {st_mode=S_IFREG|0755, st_size=1886240, ...}) = 0
open("/home/user/HEAD/myprog/myprog", O_RDONLY) = 3
stat("/home/user/HEAD/myprog/myprog", {st_mode=S_IFREG|0755, st_size=1886240, ...}) = 0
open("/home/user/HEAD/myprog/myprog", O_RDONLY) = 3
open("/home/user/HEAD/myprog/myprog", O_RDONLY) = 3
readlink("/proc/self/fd/3", "/home/user/HEAD/myprog/myprog", 4096) = 31
getcwd("/home/user/HEAD/myprog", 4096) = 25
lstat("/home/user/HEAD/myprog/myprog", {st_mode=S_IFREG|0755, st_size=1886240, ...}) = 0
open("/home/user/HEAD/myprog/datafile", O_RDONLY) = 3
access("/home/user/HEAD/myprog/datafile", F_OK) = 0
open("/home/user/HEAD/myprog/datafile", O_RDONLY) = 3
open("/home/user/HEAD/myprog/datafile", O_RDONLY) = 4
You'll notice that none of the execve calls refer to ./myprog; instead they refer to the Valgrind wrapper script, then the binary, and then the backend tool:
execve("/usr/bin/valgrind", ["valgrind", "./myprog"], [/* 35 vars */]) = 0
execve("/usr/bin/valgrind.bin", ["/usr/bin/valgrind.bin", "./myprog"], [/* 39 vars */]) = 0
execve("/usr/lib/valgrind/memcheck-amd64-linux", ["/usr/bin/valgrind.bin", "./myprog"], [/* 40 vars */]) = 0
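For code that maps a pid to a name, it may help to look at both /proc/<pid>/exe and /proc/<pid>/cmdline. The sketch below is Linux-specific, and describe_process is a hypothetical helper. Judging by the execve arguments in the trace above, argv[0] of the tool process would still name the Valgrind launcher, but that depends on how Valgrind is installed and is an assumption here.

```python
import os

def describe_process(pid):
    """Return (exe_path, argv) for a pid using the Linux /proc filesystem."""
    # exe is a symlink to the binary that was last passed to execve().
    exe = os.readlink(f"/proc/{pid}/exe")
    # cmdline holds the argv strings separated by NUL bytes.
    with open(f"/proc/{pid}/cmdline", "rb") as f:
        raw = f.read()
    # Empty fields are dropped, so an empty-string argument would be lost.
    argv = [arg.decode(errors="replace") for arg in raw.split(b"\0") if arg]
    return exe, argv

exe, argv = describe_process(os.getpid())
print("exe :", exe)
print("argv:", argv)
```

The exe link reflects the last binary executed, which is why it names the memcheck tool, whereas argv is whatever the caller chose to pass along.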

what functions are called when i do vi

When I run vi filename from the command prompt, which FUSE functions are called if I am using the fusexmp example? I would guess that mknod and open are called.
When I write the file, i.e. when I do :wq, write is called. Is that right?
There's no fantastically easy way to see which FUSE functions are called for any given file operation, but running strace(1) will record the system calls, which is quite close to the FUSE functions:
$ strace -o /tmp/vim.all vim /etc/motd
A lot of those system calls aren't related to the one file specifically, but to the process of loading vim, its dynamically linked libraries, your local configuration, and all its supporting files.
Here are some selected lines that refer to the /etc/motd that I opened:
stat("/etc/motd", {st_mode=S_IFREG|0644, st_size=183, ...}) = 0
stat("/etc/motd", {st_mode=S_IFREG|0644, st_size=183, ...}) = 0
stat("/etc/motd", {st_mode=S_IFREG|0644, st_size=183, ...}) = 0
stat("/etc/motd", {st_mode=S_IFREG|0644, st_size=183, ...}) = 0
access("/etc/motd", W_OK) = -1 EACCES (Permission denied)
open("/etc/motd", O_RDONLY) = 7
close(7) = 0
open("/etc/motd", O_RDONLY) = 7
read(7, "Welcome to Ubuntu 11.04 (GNU/Lin"..., 8192) = 183
read(7, "", 65536) = 0
close(7) = 0
stat("/etc/motd", {st_mode=S_IFREG|0644, st_size=183, ...}) = 0
The intervening lines make the repeated stat(2) calls a little less silly looking.
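Picking out the lines for one file can be automated. The sketch below is a simplified filter written for this example (lines_for_path and both regular expressions are illustrative): it keeps lines that mention the path and, for descriptors returned by a successful open of that path, the calls that take that descriptor as their first argument, until it is closed. It ignores dup, multiple processes and other ways a descriptor can be shared.

```python
import re

# open(...) or openat(...) that succeeded and returned a descriptor number.
OPEN_RE = re.compile(r'^open(?:at)?\(.*\)\s+=\s+(\d+)')
# Any call whose first argument is a bare descriptor number, e.g. read(7, ...
CALL_RE = re.compile(r'^(\w+)\((\d+)[,)]')

def lines_for_path(trace_lines, path):
    """Return trace lines that touch path, by name or by open descriptor."""
    quoted = '"%s"' % path
    open_fds = set()
    selected = []
    for line in trace_lines:
        line = line.rstrip("\n")
        if quoted in line:
            selected.append(line)
            m = OPEN_RE.match(line)
            if m:
                open_fds.add(m.group(1))   # remember the descriptor
            continue
        m = CALL_RE.match(line)
        if m and m.group(2) in open_fds:
            selected.append(line)
            if m.group(1) == "close":
                open_fds.discard(m.group(2))
    return selected

sample = [
    'stat("/etc/motd", {st_mode=S_IFREG|0644, st_size=183, ...}) = 0',
    'open("/etc/passwd", O_RDONLY) = 5',
    'close(5) = 0',
    'open("/etc/motd", O_RDONLY) = 7',
    'read(7, "Welcome to Ubuntu 11.04 (GNU/Lin"..., 8192) = 183',
    'read(7, "", 65536) = 0',
    'close(7) = 0',
]

for line in lines_for_path(sample, "/etc/motd"):
    print(line)
```

With the trace saved as in the command above, the opened /tmp/vim.all file could be passed in place of the sample list.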

Resources