Visualising strace output - linux

Is there a simple tool, or a method, to turn strace output into something that can be visualised or is otherwise easier to sift through? I have to figure out where an application is going wrong, but stracing it produces massive amounts of data. Trying to follow what this application and its threads are doing (or trying to do) at a larger scale is proving very difficult when it means reading every system call.
I have no budget for anything, and we are a pure Linux shop.

If your problem is a network one, you could try to limit the strace output to the network-related syscalls:
strace -e trace=network your_program
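strace also supports other syscall classes besides network (see the strace man page for the full list), so you can narrow the output the same way for other kinds of problems, for example:
strace -e trace=file your_program      # file-related calls only (open, stat, unlink, ...)
strace -e trace=process your_program   # process management only (fork, exec, wait, ...)
strace -e trace=signal your_program    # signal-related calls only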

Yes, use the -c parameter to count the time, calls, and errors for each syscall and report a summary as a table, e.g.
$ strace -c -fp $(pgrep -n php)
Process 11208 attached
^CProcess 11208 detached
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 83.78    0.112292          57      1953       152 stat
  7.80    0.010454          56       188           lstat
  7.79    0.010439          28       376           access
  0.44    0.000584           0      5342        32 recvfrom
  0.15    0.000203           0      3985           sendto
  0.04    0.000052           0     27184           gettimeofday
  0.00    0.000000           0         6           write
  0.00    0.000000           0      3888           poll
------ ----------- ----------- --------- --------- ----------------
100.00    0.134024                 42922       184 total
This identifies the problem without having to analyse large amounts of data.
Another way is to filter by specific syscalls (such as recvfrom/sendto) to see the data being received and sent, for example when debugging a PHP process:
strace -e recvfrom,sendto -fp $(pgrep -n php) -s 1000 2>&1 | while read -r line; do
    printf "%b" "$line"
done | strings
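Along the same lines, once you have a raw log you can boil it down to which calls fail most often. A rough sketch, assuming the usual "name(...) = -1 ERRNO" line format and a hypothetical log file name:
strace -o app.log -fp $(pgrep -n php)    # let it run for a while, then stop with Ctrl-C
# count failing syscalls by name
grep ' = -1 E' app.log | awk -F'(' '{ n = split($1, a, " "); print a[n] }' | sort | uniq -c | sort -rn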
Related: How to parse strace in shell into plain text?

Related

calculating correct memory utilization from invalid output of pidstat

I used the command pidstat -r -p <pid> <interval> >> log_Path/MemStat.CSV & to collect the memory statistics.
After running this command I found that the RSS, VSZ and %MEM values increase continuously, which is not expected, as pidstat reports the values per interval.
After searching on the net I found that there is a bug in pidstat and that I need to update the sysstat package.
(Please see the last few statements by the pidstat author at this link: http://sebastien.godard.pagesperso-orange.fr/tutorial.html)
Now, my question is: how do I calculate the correct %MEM utilization from the current output, as we cannot run the test again?
Sample output:
Time          PID  minflt/s  majflt/s       VSZ     RSS  %MEM
9:55:22 AM  18236      1280         0  26071488  119136  0.36
9:55:23 AM  18236      4273         0  27402768  126276  0.38
9:55:24 AM  18236      9831         0  27402800  162468  0.49
9:55:25 AM  18236       161         0  27402800  169092  0.51
9:55:26 AM  18236        51         0  27402800  175416  0.53
9:55:27 AM  18236      6859         0  27402800  198340  0.6
9:55:28 AM  18236      1440         0  27402800  203608  0.62
On the sysstat tutorial page you refer to, it says:
I noticed that pidstat had a memory footprint (VSZ and RSS fields)
that was constantly increasing as the time went by. I quickly found
that I had forgotten to close a file descriptor in a function of my
code and that was responsible for the memory leak...!
So, the output of pidstat has never been invalid; on the contrary, the author wrote
pidstat helped me to detect a memory leak in the pidstat command
itself.
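That said, if you still want to sanity-check the %MEM column against the raw RSS values, %MEM is essentially RSS divided by the machine's total physical memory. A rough sketch, assuming the log has the same layout as the sample above (the 12-hour timestamp splits into two fields, so RSS lands in column 7) and that it is run on the same machine the test ran on:
total_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)   # total RAM in KB
awk -v t="$total_kb" '$7 ~ /^[0-9]+$/ {printf "%s %s pid=%s rss=%s KB mem=%.2f%%\n", $1, $2, $3, $7, 100*$7/t}' log_Path/MemStat.CSV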

How to measure CPU usage

I would like to log CPU usage at a frequency of 1 second.
One possible way to do it is via vmstat 1 command.
The problem is that the time between each output is not always exactly one second, especially on a busy server. I would like to be able to output the timestamp along with the CPU usage every second. What would be a simple way to accomplish this, without installing special tools?
There are many ways to do that. Besides top, another way is to use the sar utility. So something like
sar -u 1 10
will give you the CPU utilization 10 times, once every second. At the end it will print averages for each of sys, user, iowait and idle.
Another utility is mpstat, which gives you similar information to sar.
Use the well-known UNIX tool top that is normally available on Linux systems:
top -b -d 1 > /tmp/top.log
The first line of each output block from top contains a timestamp.
I see no command line option to limit the number of rows that top displays.
Section 5a. SYSTEM Configuration File and 5b. PERSONAL Configuration File of the top man page describes pressing W when running top in interactive mode to create a $HOME/.toprc configuration file.
I did this, then edited my .toprc file and changed all maxtasks values so that they are maxtasks=4. Then top only displays 4 rows of output.
For completeness, the alternative way to do this using pipes is:
top -b -d 1 | awk '/load average/ {n=10} {if (n-- > 0) {print}}' > /tmp/top.log
You might want to try htop and atop. htop is beautifully interactive while atop gathers information and can report CPU usage even for terminated processes.
I found a neat way to get the timestamp information to be displayed along with the output of vmstat.
Sample command:
vmstat -n 1 3 | while read line; do echo "$(date --iso-8601=seconds) $line"; done
Output:
2013-09-13T14:01:31-0700 procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
2013-09-13T14:01:31-0700  r  b    swpd  free   buff    cache   si   so    bi  bo   in   cs us sy id wa
2013-09-13T14:01:31-0700  1  1 4197640 29952 124584 12477708   12    5   449 147    2    0  7  4 82  7
2013-09-13T14:01:32-0700  3  0 4197780 28232 124504 12480324  392  180 15984 180 1792 1301 31 15 38 16
2013-09-13T14:01:33-0700  0  1 4197656 30464 124504 12477492  344    0  2008   0 1892 1929 32 14 43 10
To monitor disk usage, CPU and load I created a small bash script that writes the values to a log file every 10 seconds.
This log file is processed by Logstash, Kibana and Riemann.
#!/usr/bin/env bash
LOGPATH="/var/log/systemstatus.log"

# Define a timestamp function
timestamp() {
    date +"%Y-%m-%dT%T.%N"
}

while ( sleep 10 ) ; do
    # server load
    echo -n "$(timestamp) linux::systemstatus::load " >> $LOGPATH
    cat /proc/loadavg >> $LOGPATH

    # cpu usage
    echo -n "$(timestamp) linux::systemstatus::cpu " >> $LOGPATH
    top -bn 1 | sed -n 3p >> $LOGPATH

    # disk usage
    echo -n "$(timestamp) linux::systemstatus::storage " >> $LOGPATH
    df --total | grep total | sed "s/total//g" | sed 's/^ *//' >> $LOGPATH
done

Getting CPU utilization information

How can I get the CPU utilization of a process over time in Linux? Basically I want to let my application run overnight and, at the same time, monitor its CPU utilization during that period.
I tried top | grep appName >& log, but it does not seem to write anything to the log. Could someone help me with this?
Thanks.
vmstat and iostat can both give you periodic information of this nature; I would suggest either setting up the number of times manually, or putting a single poll into a cron job, and then redirecting the output to a file:
vmstat 20 4320 >> cpu_log_file
This would give you a snapshot of usage every 20 seconds for 24 hours (4320 samples × 20 s = 24 hours).
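If you go the cron route mentioned above instead, a minimal crontab entry could look like this (the log path is just an example; the second vmstat sample is taken because the first report only shows averages since boot):
* * * * * echo "$(date) $(vmstat 1 2 | tail -1)" >> /var/log/cpu_usage.log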
install sysstat package and run sar
nohup sar -o output.file 12 8 >/dev/null 2>&1 &
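The -o file is binary sar data, so read it back later with -f (shown here for the CPU view; other report flags such as -r or -b work the same way):
sar -u -f output.file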
use the top or watch command
PID COMMAND %CPU TIME #TH #WQ #PORT #MREG RPRVT RSHRD RSIZE VPRVT VSIZE PGRP PPID STATE UID FAULTS COW MSGSENT MSGRECV SYSBSD SYSMACH CSW PAGEINS USER
10764 top 8.4 00:01.04 1/1 0 24 33 2000K 244K 2576K 17M 2378M 10764 10719 running 0 9908+ 54 564790+ 282365+ 3381+ 283412+ 838+ 27 root
10763 taskgated 0.0 00:00.00 2 0 25 27 432K 244K 1004K 27M 2387M 10763 1 sleeping 0 376 60 140 60 160 109 11 0 root
Write a program that invokes your process and then calls getrusage(2) and reports statistics for its children.
You can monitor the time used by your program with top while it is running.
Alternatively, you can launch your application with the time command, which will print the total amount of CPU time used by your program at the end of its execution. Just type time ./my_app instead of just ./my_app
For more info, man 1 time

How to see top processes sorted by actual memory usage?

I have a server with 12G of memory. A fragment of top is shown below:
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM     TIME+  COMMAND
12979 frank     20   0  206m  21m  12m S   11  0.2  26667:24  krfb
   13 root      15  -5     0    0    0 S    1  0.0  36:25.04  ksoftirqd/3
   59 root      15  -5     0    0    0 S    0  0.0   4:53.00  ata/2
 2155 root      20   0  662m  37m 8364 S    0  0.3 338:10.25  Xorg
 4560 frank     20   0  8672 1300  852 R    0  0.0   0:00.03  top
12981 frank     20   0  987m  27m  15m S    0  0.2  45:10.82  amarok
24908 frank     20   0 16648  708  548 S    0  0.0   2:08.84  wrapper
    1 root      20   0  8072  608  572 S    0  0.0   0:47.36  init
    2 root      15  -5     0    0    0 S    0  0.0   0:00.00  kthreadd
The output of free -m shows the following:
             total       used       free     shared    buffers     cached
Mem:         12038      11676        362          0        599       9745
-/+ buffers/cache:       1331      10706
Swap:         2204        257       1946
If I understand correctly, the system has only 362 MB of available memory. My question is: How can I find out which process is consuming most of the memory?
Just as background info, the system is running 64bit OpenSuse 12.
A quick tip: use the top command on Linux/Unix
$ top
and then hit Shift+m (i.e. write a capital M).
From man top
SORTING of task window
For compatibility, this top supports most of the former top sort keys.
Since this is primarily a service to former top users, these commands do
not appear on any help screen.
command   sorted-field                  supported
  A       start time (non-display)      No
  M       %MEM                          Yes
  N       PID                           Yes
  P       %CPU                          Yes
  T       TIME+                         Yes
Or alternatively: hit Shift + f, then choose to order the display by memory usage by pressing the n key, then press Enter. You will see the active processes ordered by memory usage.
First, repeat this mantra for a little while: "unused memory is wasted memory". The Linux kernel keeps around huge amounts of file metadata and files that were requested, until something that looks more important pushes that data out. It's why you can run:
find /home -type f -name '*.mp3'
find /home -type f -name '*.aac'
and have the second find instance run at ridiculous speed.
Linux only leaves a little bit of memory 'free' to handle spikes in memory usage without too much effort.
Second, you want to find the processes that are eating all your memory; in top use the M command to sort by memory use. Feel free to ignore the VIRT column, that just tells you how much virtual memory has been allocated, not how much memory the process is using. RES reports how much memory is resident, or currently in ram (as opposed to swapped to disk or never actually allocated in the first place, despite being requested).
But, since RES will count e.g. /lib/libc.so.6 memory once for nearly every process, it isn't exactly an awesome measure of how much memory a process is using. The SHR column reports how much memory is shared with other processes, but there is no guarantee that another process is actually sharing -- it could be sharable, just no one else wants to share.
The smem tool is designed to help users better gauge just how much memory should really be blamed on each individual process. It does some clever work to figure out what is really unique and what is shared, and proportionally tallies the shared memory to the processes sharing it. smem may help you understand where your memory is going better than top will, but top is an excellent first tool.
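If smem is installed (it is usually packaged under that name), something along these lines sorts processes by PSS, the proportional figure it computes; the exact flags may vary between versions:
smem -k -s pss -r | head -n 15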
ps aux | awk '{print $2, $4, $11}' | sort -k2rn | head -n 10
(Adding -n numeric flag to sort command.)
First you should read an explanation of the output of free. Bottom line: you have at least 10.7 GB of memory readily usable by processes.
Then you should define what "memory usage" is for a process (it's not easy or unambiguous, trust me).
Then we might be able to help more :-)
List and Sort Processes by Memory Usage:
ps -e -orss=,args= | sort -b -k1,1n | pr -TW$COLUMNS
ps aux --sort '%mem'
from procps' ps (default on Ubuntu 12.04) generates output like:
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
...
tomcat7 3658 0.1 3.3 1782792 124692 ? Sl 10:12 0:25 /usr/lib/jvm/java-7-oracle/bin/java -Djava.util.logging.config.file=/var/lib/tomcat7/conf/logging.properties -D
root 1284 1.5 3.7 452692 142796 tty7 Ssl+ 10:11 3:19 /usr/bin/X -core :0 -seat seat0 -auth /var/run/lightdm/root/:0 -nolisten tcp vt7 -novtswitch
ciro 2286 0.3 3.8 1316000 143312 ? Sl 10:11 0:49 compiz
ciro 5150 0.0 4.4 660620 168488 pts/0 Sl+ 11:01 0:08 unicorn_rails worker[1] -p 3000 -E development -c config/unicorn.rb
ciro 5147 0.0 4.5 660556 170920 pts/0 Sl+ 11:01 0:08 unicorn_rails worker[0] -p 3000 -E development -c config/unicorn.rb
ciro 5142 0.1 6.3 2581944 239408 pts/0 Sl+ 11:01 0:17 sidekiq 2.17.8 gitlab [0 of 25 busy]
ciro 2386 3.6 16.0 1752740 605372 ? Sl 10:11 7:38 /usr/lib/firefox/firefox
So here Firefox is the top consumer with 16% of my memory.
You may also be interested in:
ps aux --sort '%cpu'
Building on gaoithe's answer, I attempted to make the memory units display in megabytes, sorted by memory in descending order and limited to 15 entries:
ps -e -orss=,args= |awk '{print $1 " " $2 }'| awk '{tot[$2]+=$1;count[$2]++} END {for (i in tot) {print tot[i],i,count[i]}}' | sort -n | tail -n 15 | sort -nr | awk '{ hr=$1/1024; printf("%13.2fM", hr); print "\t" $2 }'
588.03M /usr/sbin/apache2
275.64M /usr/sbin/mysqld
138.23M vim
97.04M -bash
40.96M ssh
34.28M tmux
17.48M /opt/digitalocean/bin/do-agent
13.42M /lib/systemd/systemd-journald
10.68M /lib/systemd/systemd
10.62M /usr/bin/redis-server
8.75M awk
7.89M sshd:
4.63M /usr/sbin/sshd
4.56M /lib/systemd/systemd-logind
4.01M /usr/sbin/rsyslogd
Here's an example alias to use it in a bash config file:
alias topmem="ps -e -orss=,args= |awk '{print \$1 \" \" \$2 }'| awk '{tot[\$2]+=\$1;count[\$2]++} END {for (i in tot) {print tot[i],i,count[i]}}' | sort -n | tail -n 15 | sort -nr | awk '{ hr=\$1/1024; printf(\"%13.2fM\", hr); print \"\t\" \$2 }'"
Then you can just type topmem on the command line.
How to total up used memory by process name:
Sometimes even looking at the biggest single processes there is still a lot of used memory unaccounted for. To check if there are a lot of the same smaller processes using the memory you can use a command like the following which uses awk to sum up the total memory used by processes of the same name:
ps -e -orss=,args= |awk '{print $1 " " $2 }'| awk '{tot[$2]+=$1;count[$2]++} END {for (i in tot) {print tot[i],i,count[i]}}' | sort -n
e.g. output
9344 docker 1
9948 nginx: 4
22500 /usr/sbin/NetworkManager 1
24704 sleep 69
26436 /usr/sbin/sshd 15
34828 -bash 19
39268 sshd: 10
58384 /bin/su 28
59876 /bin/ksh 29
73408 /usr/bin/python 2
78176 /usr/bin/dockerd 1
134396 /bin/sh 84
5407132 bin/naughty_small_proc 1432
28061916 /usr/local/jdk/bin/java 7
You can specify which column to sort by with the following steps:
* run top
* press Shift + F
* select a column from the list (e.g. n means sort by memory)
* press Enter
* confirm with ok
You can see memory usage by executing this code in your terminal:
$ watch -n2 free -m
$ htop
This very second in time
ps -U $(whoami) -eom pid,pmem,pcpu,comm | head -n4
Continuously updating
watch -n 1 'ps -U $(whoami) -eom pid,pmem,pcpu,comm | head -n4'
I also added a few goodies here you might appreciate (or you might ignore)
-n 1 watch and update every second
-U $(whoami) To show only your processes. $(some command) evaluates now
| head -n4 To only show the header and 3 processes at a time, because often you just need the high-usage line items
${1-4} means: use my first argument $1, defaulting to 4 if I don't provide one (this is used in the function below)
If you are using a mac you may need to install watch first
brew install watch
Alternatively you might use a function
psm(){
    watch -n 1 "ps -eom pid,pmem,pcpu,comm | head -n ${1-4}"
    # EXAMPLES:
    # psm
    # psm 10
}
You have this simple command:
$ free -h

How should strace be used?

A colleague once told me that the last option when everything has failed to debug on Linux was to use strace.
I tried to learn the science behind this strange tool, but I am not a system admin guru and I didn’t really get results.
So,
What is it exactly and what does it do?
How and in which cases should it be used?
How should the output be understood and processed?
In brief, in simple words, how does this stuff work?
Strace Overview
strace can be seen as a lightweight debugger. It allows a programmer/user to quickly find out how a program is interacting with the OS. It does this by monitoring system calls and signals.
Uses
Good for when you don't have source code or don't want to be bothered to really go through it.
Also, useful for your own code if you don't feel like opening up GDB, but are just interested in understanding external interaction.
A good little introduction
Here is a gentle introduction to using strace to debug process hangs: strace introduction
In simple words, strace traces all system calls issued by a program along with their return codes. Think things such as file/socket operations and a lot more obscure ones.
It is most useful if you have some working knowledge of C since here system calls would more accurately stand for standard C library calls.
Let's say your program is /usr/local/bin/cough. Simply use:
strace /usr/local/bin/cough <any required argument for cough here>
or
strace -o <out_file> /usr/local/bin/cough <any required argument for cough here>
to write into 'out_file'.
All strace output goes to stderr (beware, the sheer volume of it often calls for redirection to a file). In the simplest cases, your program will abort with an error and you'll be able to see what its last interactions with the OS were in the strace output.
More information should be available with:
man strace
strace lists all system calls done by the process it's applied to. If you don't know what system calls mean, you won't be able to get much mileage from it.
Nevertheless, if your problem involves files or paths or environment values, running strace on the problematic program and redirecting the output to a file and then grepping that file for your path/file/env string may help you see what your program is actually attempting to do, as distinct from what you expected it to.
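A concrete version of that workflow, with the program name, log file and search strings as placeholders (-v makes execve print the full environment so env strings are greppable too):
strace -f -v -o /tmp/app.trace ./myprog
grep -n '/etc/myprog.conf' /tmp/app.trace   # where did it actually look for the file?
grep -n 'MY_SETTING' /tmp/app.trace         # did the expected environment reach the process?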
Strace stands out as a tool for investigating production systems where you can't afford to run these programs under a debugger. In particular, we have used strace in the following two situations:
Program foo seems to be in deadlock and has become unresponsive. This could be a target for gdb; however, we haven't always had the source code, or sometimes we were dealing with scripted languages that weren't straightforward to run under a debugger. In this case, you run strace on an already running program and you get the list of system calls being made. This is particularly useful when investigating a client/server application or an application that interacts with a database.
Investigating why a program is slow. In particular, we had just moved to a new distributed file system and the new throughput of the system was very slow. You can specify strace with the '-T' option which will tell you how much time was spent in each system call. This helped to determine why the file system was causing things to slow down.
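For the second case, a typical invocation (foo stands for the program mentioned above) attaches to the already running process, follows its threads, and records the time spent in each call:
strace -f -T -p "$(pgrep -n foo)" -o foo.trace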
For an example of analysing a problem with strace, see my answer to this question.
I use strace all the time to debug permission issues. The technique goes like this:
$ strace -e trace=open,stat,read,write gnome-calculator
Where gnome-calculator is the command that you want to run.
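To surface only the denials in that output, one option is to filter on the usual errno names (strace writes to stderr, hence the redirect):
strace -f gnome-calculator 2>&1 | grep -E 'EACCES|EPERM'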
strace -tfp PID will monitor the system calls of the process with that PID, so we can debug/monitor the status of our process/program.
Strace can be used as a debugging tool, or as a primitive profiler.
As a debugger, you can see how given system calls were called, executed, and what they return. This is very important, as it allows you to see not only that a program failed, but WHY a program failed. Usually it's just a result of lousy coding not catching all the possible outcomes of a program. Other times it's just hardcoded paths to files. Without strace you get to guess what went wrong, where, and how. With strace you get a breakdown of the syscalls; usually just looking at a return value tells you a lot.
Profiling is another use. You can use it to time the execution of each syscall individually, or as an aggregate. While this might not be enough to fix your problems, it will at least greatly narrow down the list of potential suspects. If you see a lot of open/close pairs on a single file, you are probably opening and closing the file unnecessarily on every iteration of a loop, instead of opening and closing it outside the loop.
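For the aggregate view, the -c flag shown earlier does the counting, and -T adds per-call timings; spotting the open/close-in-a-loop pattern could look like this (my_app is a placeholder):
strace -c ./my_app                                          # totals: time, calls, errors per syscall
strace -T -e trace=open,openat,close ./my_app 2>&1 | less   # individual durations for file open/close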
ltrace is strace's close cousin, and also very useful. You must learn to differentiate where your bottleneck is. If the total execution takes 8 seconds and you spend only 0.05 s in system calls, then stracing the program is not going to do you much good; the problem is in your code, which usually means a logic problem, or the program actually needs that long to run.
The biggest problem with strace/ltrace is reading their output. If you don't know how the calls are made, or at least the names of the syscalls/functions, it's going to be difficult to decipher the meaning. Knowing what the functions return can also be very beneficial, especially for the different error codes. While it's a pain to decipher, they sometimes really return a pearl of knowledge; once I saw a situation where I had run out of inodes, but not out of free space, so all the usual utilities didn't give me any warning, I just couldn't make a new file. Reading the error code from strace's output pointed me in the right direction.
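(Once you suspect the out-of-inodes case, by the way, it is easy to confirm:)
df -i    # inode usage per filesystem; IUse% at 100% means no new files can be created there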
Minimal runnable example
If a concept is not clear, there is a simpler example that you haven't seen that explains it.
In this case, that example is the Linux x86_64 assembly freestanding (no libc) hello world:
hello.S
.text
.global _start
_start:
    /* write */
    mov $1, %rax    /* syscall number (write) */
    mov $1, %rdi    /* stdout */
    mov $msg, %rsi  /* buffer */
    mov $len, %rdx  /* buffer len */
    syscall

    /* exit */
    mov $60, %rax   /* syscall number (exit) */
    mov $0, %rdi    /* exit status */
    syscall
msg:
    .ascii "hello\n"
len = . - msg
GitHub upstream.
Assemble and run:
as -o hello.o hello.S
ld -o hello.out hello.o
./hello.out
Outputs the expected:
hello
Now let's use strace on that example:
env -i ASDF=qwer strace -o strace.log -s999 -v ./hello.out arg0 arg1
cat strace.log
We use:
env -i ASDF=qwer to control the environment variables: https://unix.stackexchange.com/questions/48994/how-to-run-a-program-in-a-clean-environment-in-bash
-s999 -v to show fuller information on the logs
strace.log now contains:
execve("./hello.out", ["./hello.out", "arg0", "arg1"], ["ASDF=qwer"]) = 0
write(1, "hello\n", 6) = 6
exit(0) = ?
+++ exited with 0 +++
With such a minimal example, every single character of the output is self evident:
execve line: shows how strace executed hello.out, including CLI arguments and environment as documented at man execve
write line: shows the write system call that we made. 6 is the length of the string "hello\n".
= 6 is the return value of the system call, which as documented in man 2 write is the number of bytes written.
exit line: shows the exit system call that we've made. There is no return value, since the program quit!
More complex examples
The application of strace is of course to see which system calls complex programs are actually doing to help debug / optimize your program.
Notably, most system calls that you are likely to encounter in Linux have glibc wrappers, many of them from POSIX.
Internally, the glibc wrappers use inline assembly more or less like this: How to invoke a system call via sysenter in inline assembly?
The next example you should study is a POSIX write hello world:
main.c
#define _XOPEN_SOURCE 700
#include <unistd.h>

int main(void) {
    char *msg = "hello\n";
    write(1, msg, 6);
    return 0;
}
Compile and run it under strace as before:
gcc -std=c99 -Wall -Wextra -pedantic -o main.out main.c
strace -o strace.log -s999 -v ./main.out
This time, you will see that a bunch of system calls are being made by glibc before main to setup a nice environment for main.
This is because we are now not using a freestanding program, but rather a more common glibc program, which allows for libc functionality.
Then, at the very end, strace.log contains:
write(1, "hello\n", 6) = 6
exit_group(0) = ?
+++ exited with 0 +++
So we conclude that the write POSIX function uses, surprise!, the Linux write system call.
We also observe that return 0 leads to an exit_group call instead of exit. Ha, I didn't know about this one! This is why strace is so cool. man exit_group then explains:
This system call is equivalent to exit(2) except that it terminates not only the calling thread, but all threads in the calling process's thread group.
And here is another example where I studied which system call dlopen uses: https://unix.stackexchange.com/questions/226524/what-system-call-is-used-to-load-libraries-in-linux/462710#462710
Tested in Ubuntu 16.04, GCC 6.4.0, Linux kernel 4.4.0.
Strace is a tool that tells you how your application interacts with your operating system.
It does this by telling you what OS system calls your application uses and with what parameters it calls them.
So, for instance, you can see which files your program tries to open, and whether each call succeeds.
You can debug all sorts of problems with this tool. For instance, if an application says that it cannot find a library that you know you have installed, strace will tell you where the application is looking for that file.
And that is just the tip of the iceberg.
strace is a good tool for learning how your program makes various system calls (requests to the kernel) and also reports the ones that have failed along with the error value associated with that failure. Not all failures are bugs. For example, a code that is trying to search for a file may get a ENOENT (No such file or directory) error but that may be an acceptable scenario in the logic of the code.
One good use case for strace is debugging race conditions during temporary file creation. For example, a program that creates files by appending the process ID (PID) to some predetermined string may run into problems in multi-threaded scenarios. [Using PID+TID (process id + thread id), or better, a library function such as mkstemp, will fix this.]
It is also good for debugging crashes. You may find this (my) article on strace and debugging crashes useful.
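In shell scripts, the rough equivalent of mkstemp is the mktemp utility, which sidesteps the predictable-name race altogether:
tmpfile=$(mktemp /tmp/myapp.XXXXXX)   # creates a unique file atomically and prints its name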
Here are some examples of how I use strace to dig into websites. Hope this is helpful.
Check for time to first byte like so:
time php index.php > timeTrace.txt
See what percentage of actions are doing what. Lots of lstat and fstat could be an indication that it's time to clear the cache:
strace -s 200 -c php index.php > traceLstat.txt
Output a full trace to a file so you can see exactly which calls are being made:
strace -Tt -o Fulltrace.txt php index.php
Use this to check whether anything took between 0.1 and 0.9 seconds to load:
cat Fulltrace.txt | grep "[<]0.[1-9]" > traceSlowest.txt
See which missing files or directories got caught in the strace. This will output a lot of stuff involving our system; the only relevant bits involve the customer's files:
strace -vv php index.php 2>&1 | sed -n '/= -1/p' > traceFailures.txt
I liked some of the answers that say strace shows you how a program interacts with the operating system.
This is exactly what we see: the system calls. If you compare strace and ltrace the difference becomes more obvious.
$>strace -c cd
Desktop Documents Downloads examples.desktop Music Pictures Public Templates Videos
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
  0.00    0.000000           0         7           read
  0.00    0.000000           0         1           write
  0.00    0.000000           0        11           close
  0.00    0.000000           0        10           fstat
  0.00    0.000000           0        17           mmap
  0.00    0.000000           0        12           mprotect
  0.00    0.000000           0         1           munmap
  0.00    0.000000           0         3           brk
  0.00    0.000000           0         2           rt_sigaction
  0.00    0.000000           0         1           rt_sigprocmask
  0.00    0.000000           0         2           ioctl
  0.00    0.000000           0         8         8 access
  0.00    0.000000           0         1           execve
  0.00    0.000000           0         2           getdents
  0.00    0.000000           0         2         2 statfs
  0.00    0.000000           0         1           arch_prctl
  0.00    0.000000           0         1           set_tid_address
  0.00    0.000000           0         9           openat
  0.00    0.000000           0         1           set_robust_list
  0.00    0.000000           0         1           prlimit64
------ ----------- ----------- --------- --------- ----------------
100.00    0.000000                    93        10 total
On the other hand, there is ltrace, which traces library calls.
$>ltrace -c cd
Desktop Documents Downloads examples.desktop Music Pictures Public Templates Videos
% time     seconds  usecs/call     calls function
------ ----------- ----------- --------- --------------------
 15.52    0.004946         329        15 memcpy
 13.34    0.004249          94        45 __ctype_get_mb_cur_max
 12.87    0.004099        2049         2 fclose
 12.12    0.003861          83        46 strlen
 10.96    0.003491         109        32 __errno_location
 10.37    0.003303         117        28 readdir
  8.41    0.002679         133        20 strcoll
  5.62    0.001791         111        16 __overflow
  3.24    0.001032         114         9 fwrite_unlocked
  1.26    0.000400         100         4 __freading
  1.17    0.000372          41         9 getenv
  0.70    0.000222         111         2 fflush
  0.67    0.000214         107         2 __fpending
  0.64    0.000203         101         2 fileno
  0.62    0.000196         196         1 closedir
  0.43    0.000138         138         1 setlocale
  0.36    0.000114         114         1 _setjmp
  0.31    0.000098          98         1 realloc
  0.25    0.000080          80         1 bindtextdomain
  0.21    0.000068          68         1 opendir
  0.19    0.000062          62         1 strrchr
  0.18    0.000056          56         1 isatty
  0.16    0.000051          51         1 ioctl
  0.15    0.000047          47         1 getopt_long
  0.14    0.000045          45         1 textdomain
  0.13    0.000042          42         1 __cxa_atexit
------ ----------- ----------- --------- --------------------
100.00    0.031859                   244 total
Although I have checked the manuals several times, I haven't found the origin of the name strace, but it most likely stands for "system call trace", since that is what it does.
There are three bigger notes to say about strace.
Note 1: Both of these tools, strace and ltrace, use the ptrace system call, so ptrace is effectively how strace works under the hood.
The ptrace() system call provides a means by which one process (the
"tracer") may observe and control the execution of another process
(the "tracee"), and examine and change the tracee's memory and
registers. It is primarily used to implement breakpoint debugging
and system call tracing.
Note 2: There are various parameters you can use with strace, since its output can be very verbose. I like to experiment with -c, which gives a summary of everything. With -e you can select a single system call, e.g. -e trace=open, and you will see only that call. This can be interesting if you are examining which files are opened by the command you are tracing.
And of course, you can use grep for the same purpose, but note that you need to redirect stderr, as in 2>&1 | grep ..., for example to find out which config files are read when the command is issued.
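Putting those two notes together, a typical run could look like this (mycommand is a placeholder; openat is included because modern libcs open most files through it):
strace -f -e trace=open,openat mycommand 2>&1 | grep '\.conf'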
Note 3: I find this a very important note. You are not limited to a specific architecture; strace will blow your mind, since it can trace binaries of different architectures.
