Why does this strace on a pipeline not finish - linux

I have a directory with a single file, one.txt. If I run ls | cat, it works fine. However, if I try to strace both sides of this pipeline, I do see the output of the command as well as strace, but the process doesn't finish.
strace ls 2> >(stdbuf -o 0 sed 's/^/command1:/') | strace cat 2> >(stdbuf -o 0 sed 's/^/command2:/')
The output I get is:
command2:execve("/usr/bin/cat", ["cat"], [/* 50 vars */]) = 0
command2:brk(0) = 0x1938000
command2:mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f87e5a93000
command2:access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory)
<snip>
command2:open("/usr/lib/locale/locale-archive", O_RDONLY|O_CLOEXEC) = 3
command2:fstat(3, {st_mode=S_IFREG|0644, st_size=106070960, ...}) = 0
command2:mmap(NULL, 106070960, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f87def8a000
command2:close(3) = 0
command2:fstat(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 2), ...}) = 0
command2:fstat(0, {st_mode=S_IFIFO|0600, st_size=0, ...}) = 0
command2:fadvise64(0, 0, 0, POSIX_FADV_SEQUENTIAL) = -1 ESPIPE (Illegal seek)
command2:read(0, "command1:execve(\"/usr/bin/ls\", ["..., 65536) = 4985
command1:execve("/usr/bin/ls", ["ls"], [/* 50 vars */]) = 0
command1:brk(0) = 0x1190000
command1:mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fae869c3000
command1:access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory)
<snip>
command1:close(3) = 0
command1:fstat(1, {st_mode=S_IFIFO|0600, st_size=0, ...}) = 0
command2:write(1, "command1:close(3) "..., 115) = 115
command2:read(0, "command1:mmap(NULL, 4096, PROT_R"..., 65536) = 160
command1:mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fae869c2000
one.txt
command1:write(1, "one.txt\n", 8) = 8
command2:write(1, "command1:mmap(NULL, 4096, PROT_R"..., 160) = 160
command2:read(0, "command1:close(1) "..., 65536) = 159
command1:close(1) = 0
command1:munmap(0x7fae869c2000, 4096) = 0
command1:close(2) = 0
command2:write(1, "command1:close(1) "..., 159) = 159
command2:read(0, "command1:exit_group(0) "..., 65536) = 53
command1:exit_group(0) = ?
command2:write(1, "command1:exit_group(0) "..., 53) = 53
command2:read(0, "command1:+++ exited with 0 +++\n", 65536) = 31
command1:+++ exited with 0 +++
command2:write(1, "command1:+++ exited with 0 +++\n", 31) = 31
and it hangs from then on. ps reveals that both commands in the pipeline (ls and cat here) are running.
I am on RHEL7 running Bash version 4.2.46.

I put a strace on your strace:
strace bash -c 'strace true 2> >(cat > /dev/null)'
It hangs on a wait4, indicating that it's stuck waiting on children. ps f confirms this:
24740 pts/19 Ss 0:00 /bin/bash
24752 pts/19 S+ 0:00 \_ strace true
24753 pts/19 S+ 0:00 \_ /bin/bash
24755 pts/19 S+ 0:00 \_ cat
Based on this, my working theory is that this is a deadlock, because:
strace waits on all of its children, even the ones it didn't spawn directly.
Bash spawns the process substitution as a child of the process that then execs strace, so it ends up as a child of strace. Since the process substitution reads from strace's stderr, it only sees end-of-file, and exits, once strace itself exits.
This suggests at least two workarounds, both of which appear to work:
strace -D ls 2> >(nl)
{ strace ls; true; } 2> >(nl)
-D, to quote the man page, "[runs the] tracer process as a detached grandchild, not as parent of the tracee". The second workaround forces bash to fork once more to run strace, because another command follows it in the group.
In both cases, the extra forks mean that the process substitution doesn't end up as strace's child, avoiding the issue.
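A third option, not taken from the answer above, is to avoid process substitution altogether and send stderr through an ordinary pipe, so that the filter is a sibling of strace rather than its child. The sketch below is a minimal, hypothetical helper (prefix_stderr is an invented name). It assumes the tag contains no characters that are special to sed, and it has not been checked against the RHEL7 setup from the question.

```shell
# Hypothetical helper: run a command and prefix every stderr line with a tag.
# stdout is passed through untouched by parking it on file descriptor 3.
# Assumption: the tag contains no "/" or other characters special to sed.
prefix_stderr() {
    tag=$1
    shift
    # 2>&1   : stderr goes into the pipe that feeds sed
    # 1>&3   : stdout goes back to the caller's original stdout
    # 3>&-   : the command itself does not need fd 3
    { "$@" 2>&1 1>&3 3>&- | sed "s/^/$tag/" >&2; } 3>&1
}
```

If the reasoning above holds, the original pipeline could then be written as: prefix_stderr command1: strace ls | prefix_stderr command2: strace cat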

Related

Measure total time spent by a process on IO

While running time command, one of the programs gives following output:
real 1m33.523s
user 0m15.156s
sys 0m1.312s
Here the real time and the user+sys time differ a lot. This is most likely due to time spent in IO waits or IO calls. I want to measure the total time spent by the program in IO wait or IO calls. Is there any way to do that?
I tried using iotop. However, it does not report the total time spent by the program performing IO.
Yes: strace can provide per-system-call statistics.
Example 1
I want to measure time spent on I/O while accessing stackoverflow.com:
$ time curl stackoverflow.com >/dev/null 2>&1
curl stackoverflow.com > /dev/null 2>&1 0.00s user 0.01s system 2% cpu 0.392 total
OK, 2% CPU and 0.01 s in system. Let's find out:
$ strace -c curl stackoverflow.com >/dev/null
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  240k    0  240k    0     0   127k      0 --:--:--  0:00:01 --:--:--  130k
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 54.12    0.005497          11       506           write
 18.16    0.001845          43        43           fstat
 11.95    0.001214          30        41           poll
  5.75    0.000584          32        18           recvfrom
  3.40    0.000345           3       101           mmap
  2.51    0.000255           4        62           mprotect
  1.98    0.000201           4        50           close
  1.84    0.000187          31         6           getsockname
  0.29    0.000029           1        42         1 open
It is especially useful to compare these results with those measured when running curl without arguments.
In any case, strace shows that curl spends most of its system-call time in write, fstat and poll.
Another example
The first approach seems to show incorrect results for sleep. If you are not satisfied with it, you can print the time spent in each syscall (strace -T), then post-process that output to get the total time per syscall.
$ strace 2>&1 -T curl stackoverflow.com >/dev/null | head -n 20
execve("/usr/bin/curl", ["curl", "stackoverflow.com"], [/* 62 vars */]) = 0 <0.000219>
brk(0) = 0x186e000 <0.000175>
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fc04c9e6000 <0.000166>
access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory) <0.000238>
open("/etc/ld.so.cache", O_RDONLY) = 3 <0.000144>
fstat(3, {st_mode=S_IFREG|0644, st_size=96498, ...}) = 0 <0.000175>
mmap(NULL, 96498, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7fc04c9ce000 <0.000164>
close(3) = 0 <0.000160>
open("/usr/lib64/libcurl.so.4", O_RDONLY) = 3 <0.000047>
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\240\333\300\">\0\0\0"..., 832) = 832 <0.000160>
fstat(3, {st_mode=S_IFREG|0755, st_size=346008, ...}) = 0 <0.000216>
mmap(0x3e22c00000, 2438600, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x3e22c00000 <0.000189>
mprotect(0x3e22c51000, 2097152, PROT_NONE) = 0 <0.000032>
mmap(0x3e22e51000, 12288, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x51000) = 0x3e22e51000 <0.000119>
close(3) = 0 <0.000110>
open("/lib64/libidn.so.11", O_RDONLY) = 3 <0.000257>
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\0/#U1\0\0\0"..., 832) = 832 <0.000051>
fstat(3, {st_mode=S_IFREG|0755, st_size=209088, ...}) = 0 <0.000041>
mmap(0x3155400000, 2301736, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x3155400000 <0.000037>
mprotect(0x3155432000, 2093056, PROT_NONE) = 0 <0.000037>
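The post-processing step mentioned above can be sketched with awk. This is a minimal, hypothetical helper (sum_syscall_times is an invented name) that assumes one complete syscall per line in the format shown in the trace, with the duration in angle brackets as the last field. Lines that do not match, such as unfinished or resumed calls, are skipped.

```shell
# Hypothetical helper: sum the <seconds> field that strace -T appends to each
# line, grouped by syscall name. Prints "name total_seconds calls", sorted by
# total time, largest first. Reads files given as arguments, or stdin.
sum_syscall_times() {
    awk '
        # Match lines such as: open("/etc/ld.so.cache", O_RDONLY) = 3 <0.000144>
        /^[a-z_0-9]+\(.*<[0-9.]+>$/ {
            name = $0
            sub(/\(.*/, "", name)      # keep only the text before the first "("
            secs = $NF
            gsub(/[<>]/, "", secs)     # strip the angle brackets
            total[name] += secs
            calls[name]++
        }
        END {
            for (n in total)
                printf "%s %.6f %d\n", n, total[n], calls[n]
        }
    ' "$@" | sort -k2,2nr -k1,1
}
```

For the command above, the trace could be piped straight in: strace 2>&1 -T curl stackoverflow.com >/dev/null | sum_syscall_times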

Implicit system calls in UNIX commands

I've been studying UNIX and system calls and I came across a low-level and tricky question. The question asks which system calls are made for this command:
grep word1 word2 > file.txt
I did some research and was unable to find many resources on the underlying UNIX calls. However, it seems to me that the answer would be open (to open file.txt and obtain a file descriptor for it), then dup2 (to point the STDOUT of grep at that file descriptor), then write (to write the output of grep to STDOUT, which is now file.txt), and finally close (to close the file descriptor of file.txt). However, I have no idea whether I am right or on the correct path. Can anyone with experience in UNIX enlighten me on this topic?
You are on the right track in your research. This command is very helpful for tracing the system calls made by any program:
strace
On my PC it shows output (without stream redirection):
$ strace grep abc ss.txt
execve("/bin/grep", ["grep", "abc", "ss.txt"], [/* 237 vars */]) = 0
brk(0) = 0x13de000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f1785694000
close(3) = 0
ioctl(1, SNDCTL_TMR_TIMEBASE or TCGETS, {B38400 opost isig icanon echo ...}) = 0
stat("ss.txt", {st_mode=S_IFREG|0644, st_size=13, ...}) = 0
open("ss.txt", O_RDONLY) = 3
ioctl(3, SNDCTL_TMR_TIMEBASE or TCGETS, 0x7fffa0e4f370) = -1 ENOTTY (Inappropriate ioctl for device)
read(3, "abc\n123\n321\n\n", 32768) = 13
fstat(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 2), ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f178568c000
write(1, "abc\n", 4abc
) = 4
read(3, "", 32768) = 0
close(3) = 0
close(1) = 0
munmap(0x7f178568c000, 4096) = 0
close(2) = 0
exit_group(0) = ?
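Note that in the command from the question it is the shell, not grep, that opens file.txt: the shell forks, opens the file, duplicates the descriptor onto standard output and only then executes grep. That is why a trace of grep alone shows write(1, ...) but never an open of the output file. The sketch below spells out the same sequence with shell builtins. It is an illustration, not what the shell literally runs, and grep_to_file is a hypothetical name.

```shell
# Hypothetical helper mirroring the steps behind: grep PATTERN INPUT > OUTPUT
grep_to_file() {
    pattern=$1
    input=$2
    output=$3
    exec 3> "$output"             # open(): create or truncate the file on fd 3
    grep "$pattern" "$input" >&3  # dup2(): grep's fd 1 becomes a copy of fd 3
    status=$?                     # grep itself only sees write(1, ...)
    exec 3>&-                     # close(): release the descriptor
    return "$status"
}
```

To see the calls made by the shell as well as by grep, the whole command line can be traced with child-following enabled, for example strace -f sh -c 'grep word1 word2 > file.txt'.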

How to start multiple processes inside a linux service construct?

I am attempting to start multiple memcached processes from the linux service framework using the following logic:
RETVAL=0
pcount="$CACHES"
if [ ! -z "$pcount" ]; then
    while [ $pcount -gt 0 ];
    do
        (( pcount-- ))
        (( port=PORT + pcount ))
        daemon --pidfile ${pidfile}${pcount}.pid memcached -d -p $port -u $USER -m $CACHESIZE -c $MAXCONN -P ${pidfile}${pcount}.pid $OPTIONS
        (( RETVAL=RETVAL + $? ))
    done
else
    daemon --pidfile ${pidfile}.pid memcached -d -p $PORT -u $USER -m $CACHESIZE -c $MAXCONN -P ${pidfile}.pid $OPTIONS
    RETVAL=$?
fi
When run using the command service memcached start, it creates and updates pid files for each cycle in the loop, but only the last instance of the process remains running. That is, while each of the /var/run/memcached/memcached(1 through 5).pid are created and updated with a PID; those processes do not exist. /var/run/memcached/memcached0.pid is also created and updated and the PID points to a running process.
I turned on tracing and I can see that the loop is executed and the process invocation is made; however, the process does not start (or, more likely, it starts and immediately terminates, so I don't see it running).
On the other hand, running this script directly as /etc/init.d/memcached start results in all the processes getting started correctly.
Can someone help me understand why the service framework is preventing the starting of the other instances except the last one?
As suggested by @nos, I added strace -f to trace the calls during the service memcached start operation. I compared the traced calls of an unsuccessful (terminated) process with those of the successful process. The only significant differences I found were:
< bind(26, {sa_family=AF_INET, sin_port=htons(11216), sin_addr=inet_addr("0.0.0.0")}, 16) = -1 EACCES (Permission denied)
< dup(2) = 27
< fcntl(27, F_GETFL) = 0x8002 (flags O_RDWR|O_LARGEFILE)
< fstat(27, {st_mode=S_IFCHR|0666, st_rdev=makedev(1, 3), ...}) = 0
< ioctl(27, SNDCTL_TMR_TIMEBASE or TCGETS, 0x7fff20d5d780) = -1 ENOTTY (Inappropriate ioctl for device)
< mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f5dae958000
< lseek(27, 0, SEEK_CUR) = 0
< write(27, "bind(): Permission denied\n", 26) = 26
< close(27) = 0
< munmap(0x7f5dae958000, 4096) = 0
< close(26) = 0
< dup(2) = 26
< fcntl(26, F_GETFL) = 0x8002 (flags O_RDWR|O_LARGEFILE)
< fstat(26, {st_mode=S_IFCHR|0666, st_rdev=makedev(1, 3), ...}) = 0
< ioctl(26, SNDCTL_TMR_TIMEBASE or TCGETS, 0x7fff20d5d730) = -1 ENOTTY (Inappropriate ioctl for device)
< mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f5dae958000
< lseek(26, 0, SEEK_CUR) = 0
< write(26, "failed to listen on TCP port 112"..., 54) = 54
< close(26) = 0
< munmap(0x7f5dae958000, 4096) = 0
< exit_group(71) = ?
---
> bind(26, {sa_family=AF_INET, sin_port=htons(11211), sin_addr=inet_addr("0.0.0.0")}, 16) = 0
> listen(26, 1024) = 0
> epoll_ctl(3, EPOLL_CTL_ADD, 26, {EPOLLIN, {u32=26, u64=26}}) = 0
> socket(PF_INET6, SOCK_STREAM, IPPROTO_TCP) = 27
> fcntl(27, F_GETFL) = 0x2 (flags O_RDWR)
> fcntl(27, F_SETFL, O_RDWR|O_NONBLOCK) = 0
> setsockopt(27, SOL_IPV6, IPV6_V6ONLY, [1], 4) = 0
> setsockopt(27, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
> setsockopt(27, SOL_SOCKET, SO_KEEPALIVE, [1], 4) = 0
> setsockopt(27, SOL_SOCKET, SO_LINGER, {onoff=0, linger=0}, 8) = 0
> setsockopt(27, SOL_TCP, TCP_NODELAY, [1], 4) = 0
> bind(27, {sa_family=AF_INET6, sin6_port=htons(11211), inet_pton(AF_INET6, "::", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, 28) = 0
> listen(27, 1024) = 0
> epoll_ctl(3, EPOLL_CTL_ADD, 27, {EPOLLIN, {u32=27, u64=27}}) = 0
> socket(PF_NETLINK, SOCK_RAW, 0) = 28
> bind(28, {sa_family=AF_NETLINK, pid=0, groups=00000000}, 12) = 0
> getsockname(28, {sa_family=AF_NETLINK, pid=31943, groups=00000000}, [12]) = 0
> gettimeofday({1393735036, 191154}, NULL) = 0
> sendto(28, "\24\0\0\0\26\0\1\3|\265\22S\0\0\0\0\0\0\0\0", 20, 0, {sa_family=AF_NETLINK, pid=0, groups=00000000}, 12) = 20
> recvmsg(28, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000}, msg_iov(1)=[{"0\0\0\0\24\0\2\0|\265\22S\307|\0\0\2\10\200\376\1\0\0\0\10\0\1\0\177\0\0\1"..., 4096}], msg_controllen=0, msg_flags=0}, 0) = 108
> recvmsg(28, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000}, msg_iov(1)=[{"#\0\0\0\24\0\2\0|\265\22S\307|\0\0\n\200\200\376\1\0\0\0\24\0\1\0\0\0\0\0"..., 4096}], msg_controllen=0, msg_flags=0}, 0) = 128
> recvmsg(28, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000}, msg_iov(1)=[{"\24\0\0\0\3\0\2\0|\265\22S\307|\0\0\0\0\0\0\1\0\0\0\24\0\1\0\0\0\0\0"..., 4096}], msg_controllen=0, msg_flags=0}, 0) = 20
> close(28) = 0
> socket(PF_INET6, SOCK_DGRAM, IPPROTO_IP) = 28
The top block (<) is from a terminated process and the bottom block (>) is from the last (successful) process. It is clear that the process terminates because it lacks permission to bind to the port. On looking further, I realized that SELinux was in enforcing mode, which was preventing the memcached service from binding to any port other than 11211 (the default port).
As far as I could figure out, when I ran the script directly, without the service command, memcached ran as an ordinary process (not a service) and hence the port restriction was not being enforced.
Turning off SELinux enforcing mode got the service memcached start command working!
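A narrower alternative to disabling enforcing mode would be to allow the extra ports in the SELinux policy. The sketch below only prints the commands and does not run them. It reuses the port arithmetic of the init script loop (PORT + pcount). The helper names are hypothetical, and the port type memcache_port_t is an assumption that should be checked on the target system with semanage port -l before use.

```shell
# Hypothetical helper: list the TCP ports the init script loop would use.
# $1 is the base port (PORT), $2 is the number of instances (CACHES).
memcached_ports() {
    base=$1
    count=$2
    while [ "$count" -gt 0 ]; do
        count=$((count - 1))
        echo $((base + count))
    done
}

# Hypothetical dry run: print one semanage command per port, nothing is executed.
# Assumption: memcache_port_t is the port type used by the local policy.
# A port that already carries the type (such as the default) may be rejected
# by semanage as already defined.
print_semanage_commands() {
    for port in $(memcached_ports "$1" "$2"); do
        echo "semanage port -a -t memcache_port_t -p tcp $port"
    done
}
```

With a base port of 11211 and six instances, this would cover ports 11216 down to 11211, which matches the failing bind on 11216 and the successful bind on 11211 in the trace above.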

ldconfig includes libraries from default path also when i have a customized ld-<me>.so.conf

I have cross-compiled libraries and a Linux loader.
I have placed a custom ld-<me>.so.conf under /etc. The conf file has a path that contains all the cross-compiled libraries and the loader.
But when I run ldconfig,
ldconfig -C /etc/ld-<me>.so.cache -f /etc/ld-<me>.so.conf
all the system libraries and their paths are present in the cache file.
I need the generated cache file to contain only my cross-compiled libraries.
Strace of ldconfig operation is as below:
strace /opt/me/ldconfig -C /etc/ld-me.so.cache -f /etc/ld-me.so.conf
execve("/opt/me/ldconfig", ["/opt/me/ldc"..., "-C", "/etc/ld-me.so.cache", "-f", "/etc/ld-me.so.conf"], [/* 38 vars */]) = 0
uname({sys="Linux", node="ip-172-31-32-236", ...}) = 0
brk(0) = 0x10c1000
brk(0x10c2180) = 0x10c2180
arch_prctl(ARCH_SET_FS, 0x10c1860) = 0
brk(0x10e3180) = 0x10e3180
brk(0x10e4000) = 0x10e4000
open("/usr/lib/locale/locale-archive", O_RDONLY) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=99154480, ...}) = 0
mmap(NULL, 99154480, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f46155a4000
close(3) = 0
open("/etc/ld-me.so.conf", O_RDONLY) = 3
fstat(3, {st_mode=S_IFREG|0640, st_size=25, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f46155a3000
read(3, "/opt/me/lib\n", 4096) = 25
stat("/opt/me/lib", {st_mode=S_IFDIR|0750, st_size=4096, ...}) = 0
read(3, "", 4096) = 0
close(3) = 0
munmap(0x7f46155a3000, 4096) = 0
stat("/lib", {st_mode=S_IFDIR|0555, st_size=4096, ...}) = 0
stat("/lib64", {st_mode=S_IFDIR|0555, st_size=12288, ...}) = 0
stat("/usr/lib", {st_mode=S_IFDIR|0555, st_size=4096, ...}) = 0
stat("/usr/lib64", {st_mode=S_IFDIR|0555, st_size=12288, ...}) = 0
open("/opt/me/lib", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 3
Can anybody tell me why the system libraries are getting added?
Because that is the defined behavior of ldconfig:
ldconfig creates the necessary links and cache to the most
recent shared libraries found in the directories specified on the
command line, in the file /etc/ld.so.conf, and in the trusted
directories (/lib and /usr/lib). The cache is used by the run-time
linker, ld.so or ld-linux.so. ldconfig checks the header and
filenames of the libraries it encounters when
determining which versions should have their links updated.
The trusted directory list was probably updated with lib64 dirs sometime after the man page I cut'n'pasted from was written.
You could create a directory structure based on your conf file, with symlinks or bind mounts pointing to the real directories, and then use -r directory so that ldconfig builds the cache against empty versions of the system directories.
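The directory structure described above could be prepared roughly as follows. This is an untested sketch under stated assumptions: make_ldconfig_root is a hypothetical name, the staging root is a scratch directory of your choosing, and the conf file is written to the default location etc/ld.so.conf inside that root. The bind mount and the ldconfig run need root privileges, so they are left as comments.

```shell
# Hypothetical helper: build a staging root whose system library directories
# are empty, plus a conf file that lists only the custom library directory.
# $1 is the staging root, $2 is the library directory (e.g. /opt/me/lib).
make_ldconfig_root() {
    root=$1
    libdir=$2
    mkdir -p "$root/etc" "$root/lib" "$root/lib64" \
             "$root/usr/lib" "$root/usr/lib64" "$root$libdir" || return 1
    printf '%s\n' "$libdir" > "$root/etc/ld.so.conf"
}

# Assumed follow-up steps, to be run as root and adjusted to the real paths:
#   mount --bind /opt/me/lib "$root/opt/me/lib"
#   ldconfig -r "$root"
# The resulting cache would then be expected under "$root/etc".
```

Because the trusted directories inside the staging root are empty, only the libraries under the bind-mounted directory should end up in the cache.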

linux /proc/<pid>/exe & valgrind

As per the man page, /proc/pid/exe is a symlink containing the actual path of the executed command.
When I run valgrind on my program, I see that /proc/pid/exe points to /usr/lib64/valgrind/amd64-linux/memcheck
lnx-host> which valgrind
/usr/bin/valgrind
Any idea why /proc/pid/exe points to /usr/lib64/valgrind/amd64-linux/memcheck when I am invoking it as valgrind?
In my code I am trying to get the executable name from the pid, and in this case I am expecting to see valgrind.
memcheck is the default tool used by Valgrind, unless you tell it to use another of the tools, such as callgrind.
Use --tool=<name> to specify the tool you want to invoke.
Side-note: is your /usr/bin/valgrind also a script just like it is by default? Why not play with that to do what you want to achieve? On my system that invokes first of all /usr/bin/valgrind.bin and then the respective (backend) tool (/usr/lib/valgrind/memcheck-amd64-linux).
Relevant output from strace:
execve("/usr/bin/valgrind", ["valgrind", "./myprog"], [/* 35 vars */]) = 0
stat("/home/user/HEAD/myprog", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
execve("/usr/bin/valgrind.bin", ["/usr/bin/valgrind.bin", "./myprog"], [/* 39 vars */]) = 0
open("./myprog", O_RDONLY) = 3
execve("/usr/lib/valgrind/memcheck-amd64-linux", ["/usr/bin/valgrind.bin", "./myprog"], [/* 40 vars */]) = 0
getcwd("/home/user/HEAD/myprog", 4095) = 25
open("./myprog", O_RDONLY) = 3
stat("./myprog", {st_mode=S_IFREG|0755, st_size=1886240, ...}) = 0
readlink("/proc/self/fd/3", "/home/user/HEAD/myprog/myprog", 4096) = 31
readlink("/proc/self/fd/3", "/home/user/HEAD/myprog/myprog", 4096) = 31
open("./myprog", O_RDONLY) = 3
write(1015, "./myprog", 8) = 8
write(1016, "==23547== Command: ./myprog\n", 28==23547== Command: ./myprog
stat("/home/user/HEAD/myprog/myprog", {st_mode=S_IFREG|0755, st_size=1886240, ...}) = 0
open("/home/user/HEAD/myprog/myprog", O_RDONLY) = 3
stat("/home/user/HEAD/myprog/myprog", {st_mode=S_IFREG|0755, st_size=1886240, ...}) = 0
open("/home/user/HEAD/myprog/myprog", O_RDONLY) = 3
open("/home/user/HEAD/myprog/myprog", O_RDONLY) = 3
readlink("/proc/self/fd/3", "/home/user/HEAD/myprog/myprog", 4096) = 31
getcwd("/home/user/HEAD/myprog", 4096) = 25
lstat("/home/user/HEAD/myprog/myprog", {st_mode=S_IFREG|0755, st_size=1886240, ...}) = 0
open("/home/user/HEAD/myprog/datafile", O_RDONLY) = 3
access("/home/user/HEAD/myprog/datafile", F_OK) = 0
open("/home/user/HEAD/myprog/datafile", O_RDONLY) = 3
open("/home/user/HEAD/myprog/datafile", O_RDONLY) = 4
You'll notice that none of the execve calls refer to ./myprog. Instead they refer to the Valgrind wrapper script, then the binary and then the backend tool:
execve("/usr/bin/valgrind", ["valgrind", "./myprog"], [/* 35 vars */]) = 0
execve("/usr/bin/valgrind.bin", ["/usr/bin/valgrind.bin", "./myprog"], [/* 39 vars */]) = 0
execve("/usr/lib/valgrind/memcheck-amd64-linux", ["/usr/bin/valgrind.bin", "./myprog"], [/* 40 vars */]) = 0
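Since /proc/pid/exe follows the last execve, code that wants the name a process was invoked under may be better served by the argument vector in /proc/pid/cmdline. In the trace above, the backend tool is executed with "/usr/bin/valgrind.bin" as its first argument, so that is what cmdline would be expected to start with. The sketch below is Linux-specific, and both helper names are hypothetical.

```shell
# Hypothetical helper: path of the binary currently executing in process $1.
exe_of_pid() {
    readlink "/proc/$1/exe"
}

# Hypothetical helper: first argument (argv[0]) of process $1.
# /proc/<pid>/cmdline separates arguments with NUL bytes.
argv0_of_pid() {
    tr '\0' '\n' < "/proc/$1/cmdline" | head -n 1
}
```

For a process started through Valgrind, comparing the two values shows the backend tool on one side and the front end on the other. The program under test appears as a later entry in cmdline.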
