linux strace: How to filter system calls that take more than a second

I'm using "strace -f -T -tt -o foo.txt -p 1234" to print the time spent in system calls. This makes the output file huge, is there a way to just print the system calls that took greater than 1second. I can grep it out from the file later, but is there a better way?

If we simply omit the -o foo.txt argument, the output goes to standard error. We can redirect it to standard output, pipe it through grep, and redirect to the file:
strace -f -T -tt -p 1234 2>&1 | grep pattern > foo.txt
To watch the output at the same time:
strace -f -T -tt -p 1234 2>&1 | grep pattern | tee foo.txt
If a command prints only to a file that is passed as an argument, and we want to filter/redirect its output, the first step is to check whether it implements the dash convention: can you specify standard input or output by passing - as the filename argument:
some_command - | our_pipe > file.txt
If not, then the recourse is to use Bash process substitution syntax: >(output command) and <(input command):
some_command >(our_pipe > file.txt)
The process substitution syntax expands into a token that is suitable as a filename argument for a command or function. When the program opens that token, it gets a file descriptor to the command's input or output, depending on direction.
With process substitution, we can redirect the input or output of stubborn programs which work only with files passed by name as arguments, and which do not support any convention for requesting that standard input or output be used in place of a file.
The token used by process substitution is platform-dependent; we can see what it is using echo. For instance on GNU/Linux, Bash takes advantage of the /dev/fd operating system feature:
$ echo <(ls -l)
/dev/fd/63
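Applied to the original question, this means the -o option can be pointed at a process substitution instead of a plain file (a sketch, assuming the traced process is PID 1234 and a grep pattern that matches the slow calls):
strace -f -T -tt -o >(grep pattern > foo.txt) -p 1234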

You can use the following command:
strace -T command 2>&1 >/dev/null | awk '{gsub(/[<>]/,"",$NF)}$NF+.0>1.0'
Explanation:
strace -T appends the time spent in each syscall to the end of the line, enclosed in <...>
2>&1 >/dev/null | awk pipes stderr to awk and discards stdout. (strace writes its output to stderr!)
The awk command removes the <> from the last field $NF and prints lines where the time spent is greater than one second.
You'll probably also want to pass the threshold as a variable to the awk command:
strace -T command 2>&1 >/dev/null \
| awk -v thres=0.001 '{gsub(/[<>]/,"",$NF)}$NF+.0>thres+.0'
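If you have already captured a full trace with -o foo.txt, the same filter works on the file after the fact (a sketch; it assumes the default -T layout where the elapsed time is the last field):
awk '{ gsub(/[<>]/, "", $NF) } $NF + 0 > 1.0' foo.txt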

Related

How to reference the output of the previous command twice in Linux command?

For instance, if I'd like to reference the output of the previous command once, I can use the command below:
ls *.txt | xargs -I % ls -l %
But how to reference the output twice? Like how can I implement something like:
ls *.txt | xargs -I % 'some command' % > %
PS: I know how to do it in shell script, but I just want a simpler way to do it.
You can pass this argument to bash -c:
ls *.txt | xargs -I % bash -c 'ls -l "$1" > "out.$1"' - %
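The lone - here fills $0 for the inline script, so the filename from xargs lands in $1. A quick sketch to see the placeholder behaviour:
bash -c 'echo "0=$0 1=$1"' - foo.txt
# prints: 0=- 1=foo.txt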
You can look up 'tpipe' on SO; it will also lead you to 'pee' (which is not a good search term elsewhere on the internet). Basically, they're variants of the tee command which write to multiple processes instead of writing to files the way tee does.
However, with Bash, you can use Process Substitution:
ls *.txt | tee >(cmd1) >(cmd2)
This will write the input to tee to each of the commands cmd1 and cmd2.
You can arrange to lose standard output in at least two different ways:
ls *.txt | tee >(cmd1) >(cmd2) >/dev/null
ls *.txt | tee >(cmd1) | cmd2
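For instance, to count and sort the same listing in one pass (an illustrative pairing; any two filters would do):
ls *.txt | tee >(wc -l > count.txt) >(sort -r > sorted.txt) > /dev/null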

Shell script to log output of console

I want to grep the output of my script - which itself contains call to different binaries...
Since the script invokes multiple binaries, I can't simply use exec and dump the output to a file (that does not capture the output from the binaries)...
And to let you know, I am monitoring the script output to determine if the system has got stuck!
Why don't you append instead?
mybin1 | grep '...' >> mylog.txt
mybin2 | grep '...' >> mylog.txt
mybin3 | grep '...' >> mylog.txt
Does this not work?
#!/bin/bash
# Save the original stdout and stderr on fds 11 and 12, then route both through tee.
exec 11>&1 12>&2 > >(exec tee /var/log/somewhere) 2>&1 ## Or add -a option to tee to append.
# call your binaries here
# Close the redirections, restore the original stdout/stderr, and drop the backup fds.
exec >&- 2>&- >&11 2>&12 11>&- 12>&-
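For illustration, a minimal self-contained sketch of the same pattern, with echo standing in for the real binaries and logging to ./mylog.txt (a hypothetical path):
#!/bin/bash
exec 11>&1 12>&2 > >(exec tee mylog.txt) 2>&1
echo "output from binary 1"       # shown on the terminal and written to mylog.txt
echo "error from binary 2" >&2    # stderr is captured too, thanks to 2>&1
exec >&- 2>&- >&11 2>&12 11>&- 12>&-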

using sed and pstree to display the type of terminal being used

I've been trying to display the type of terminal being used as the name only. For instance if I was using konsole it would display konsole. So far I've been using this command.
pstree -A -s $$
That outputs this.
systemd---konsole---bash---pstree
I have the following that can extract konsole from that line
pstree -A -s $$ | sed 's/systemd---//g;s/---.*//g' | head -1
and that outputs konsole properly. But some people have output from just the pstree command that can look like this.
systemd---kdeinit4---terminator---bash---pstree
or this
systemd---kdeinit4---lxterminal---bash---pstree
and then when I add the sed command it extracts kdeinit4 instead of terminator. I can think of a couple of scenarios for extracting the terminal type, but none that don't involve conditional statements checking for specific terminals. The problem is that I can't accurately predict how many unrelated entries may appear before or after the terminal name, what they will be, or what the terminal name itself will be. Does anyone have any ideas for a solution?
You could use
ps -p "$PPID" -o comm=
Or
ps -p "$PPID" -o fname=
If your shell does not have PPID variable set you could get it with
ps -p "$(ps -p "$$" -o ppid= | sed 's|\s\+||')" -o fname=
Another theory is that the first ancestor of the current shell that doesn't belong to the same tty as the shell could actually be the one that provides the virtual terminal, so we could find it like this as well:
#!/bin/bash
shopt -s extglob
SHELLTTY=$(exec ps -p "$$" -o tty=)
P=$$
while read P < <(exec ps -p "$P" -o ppid=) && [[ $P == +([[:digit:]]) ]]; do
    if read T < <(exec ps -p "$P" -o tty=) && [[ $T != "$SHELLTTY" ]]; then
        ps -p "$P" -o comm=
        break
    fi
done
I don't know how to isolate the terminal name on your system, but as a parsing exercise, and assuming the terminal is directly running your bash, you could pipe the pstree output through:
awk -F"---bash---" 'NF == 2 { count = split($1, arr, "---"); print arr[count] }'
This will find the word prior to the "---bash---" which in your examples is
konsole
terminator
lxterminal
If you want different shell types, you could expand the field separator to include them like:
awk -F"---(bash|csh)---" ' NF == 2 { count = split( $1, arr, "---" ); print arr[count]; }'
Considering an imaginary line like:
systemd---imaginary---monkey---csh---pstree
the awk would find "monkey" as the terminal name as well as anything from your test set.
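You can test the parsing without a live pstree by feeding sample lines in directly (using printf to simulate the pstree output):
printf '%s\n' 'systemd---imaginary---monkey---csh---pstree' \
| awk -F"---(bash|csh)---" 'NF == 2 { count = split($1, arr, "---"); print arr[count] }'
# prints: monkey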
No guarantees here, but I think this will work most of the time, on linux:
ps -ocomm= $(lsof -tl /proc/$$/fd/0 | grep -Fxf <(lsof -t /dev/ptmx))
A little explanation is probably in order, but see man ps, man lsof and (especially) man pts for information.
/dev/ptmx is a pseudo-tty master (on modern linux systems, and some other unix(-like) systems). A program will have one of these open if it is a terminal emulator, a telnet/ssh daemon, or some other program which needs a captive terminal (screen, for example). The emulator writes to the pseudo-tty master what it wants to "type", and reads the result from the pseudo-tty slave.
/proc/$$/fd/0 is stdin of process $$ (i.e. the shell in which the command is executed). If stdin has not been redirected, this will be a symlink to some slave pseudo-tty, /dev/pts/#. That is the other side of the /dev/ptmx device, and consequently all of the programs listed above which have /dev/ptmx open will also have some /dev/pts/# slave open as well. (You might think that you could use /dev/stdin or /dev/fd/0 instead of /proc/$$/fd/0, but those would be opened by lsof itself, and consequently would be its stdin; because of the way lsof is implemented, that won't work.) The -l option to lsof causes it to follow symlinks, so it will show the processes which have the same pts open as the current shell.
The -t option to lsof causes it to produce "terse" output, consisting only of pids, one per line. The -Fx options to grep cause it to match strings, rather than regex, and to force a full line match; the -f FILE option causes it to accept the strings to match from FILE (which in this case is a process substitution), one per line.
Finally, ps -ocomm= prints out the "command" (chopped, by default, to 8 characters) corresponding to a pid.
In short, the command finds a list of terminal emulators and other similar programs which hold a master pseudo-tty, and a list of processes which use the same pseudo-tty slave as the current shell; it finds the intersection of the two, and then looks up the command name for whatever results.
On Debian-style systems you can ask update-alternatives which terminal emulator is configured as the default:
curTerm=$(update-alternatives --query x-terminal-emulator | grep '^Best:')
curTerm=${curTerm##*/}
printf "%s\n" "$curTerm"
And the result is
terminator
Of course it can be different.
Now you can use $curTerm variable in your sed command.
But I am not sure if this is going to work properly with symlinks.
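For instance, the variable could be dropped into a sed command like this (one possible way, assuming the name appears verbatim in the pstree output):
pstree -A -s $$ | sed -n "s/.*---\($curTerm\)---.*/\1/p" | head -1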

How can I use a pipe or redirect in a qsub command?

There are some commands I'd like to run on a grid using qsub (SGE 8.1.3, CentOS 5.9) that need to use a pipe (|) or a redirect (>). For example, let's say I have to parallelize the command
echo 'hello world' > hello.txt
(Obviously a simplified example: in reality I might need to redirect the output of a program like bowtie directly to samtools). If I did:
qsub echo 'hello world' > hello.txt
the resulting content of hello.txt would look like
Your job 123454321 ("echo") has been submitted
Similarly if I used a pipe (echo "hello world" | myprogram), that message is all that would be passed to myprogram, not the actual stdout.
I'm aware I could write a small bash script containing the command with the pipe/redirect, and then do qsub ./myscript.sh. However, I'm trying to run many parallelized jobs at the same time using a script, so I'd have to write many such bash scripts, each with a slightly different command. When scripted, this solution starts to feel very hackish. An example of such a script in Python:
for i, (infile1, infile2, outfile) in enumerate(files):
    command = ("bowtie -S %s %s | " +
               "samtools view -bS - > %s\n") % (infile1, infile2, outfile)
    script = "job" + str(i) + ".sh"
    open(script, "w").write(command)
    os.system("chmod 755 %s" % script)
    os.system("qsub -cwd ./%s" % script)
This is frustrating for a few reasons, among them that my program can't even delete the many jobXX.sh scripts afterwards to clean up after itself, since I don't know how long the job will be waiting in the queue, and the script has to be there when the job starts.
Is there a way to provide my full echo 'hello world' > hello.txt command to qsub without having to create another file containing the command?
You can do this by turning it into a bash -c command, which lets you put the | in a quoted statement:
qsub bash -c "cmd <options> | cmd2 <options>"
As @spuder has noted in the comments, it seems that in other versions of qsub (not SGE 8.1.3, which I'm using), one can solve the problem with:
echo "cmd <options> | cmd2 <options>" | qsub
as well.
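On versions of qsub that read the job script from standard input, a here-document is an equivalent way to pass the pipeline without quoting headaches (a sketch; the same version caveat applies):
qsub <<'EOF'
cmd <options> | cmd2 <options>
EOF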
Although my answer is a bit late, I am adding it for any incoming viewers. To use a pipe/redirect and submit that as a qsub job you need to do a couple of things. But first, using qsub at the end of a pipe like you're doing will only result in one job being sent to the queue (i.e. your code will run serially rather than in parallel).
Run qsub with binary mode enabled, since by default qsub expects a job script rather than a command line. For that, use the "-b y" flag to qsub, and you'll avoid errors of the sort "command required for a binary mode" or "script length does not match declared length".
Echo each call to qsub and then pipe that to a shell.
Suppose you have a file params-query.txt which hold several bowtie commands and piped calls to samtools of the following form:
bowtie -q query -1 param1 -2 param2 ... | samtools ...
To send each query as a separate job, first prepare your command-line units and feed them to qsub from stdin via xargs. Note that the quotes around the braces are important if you are submitting a command made of piped parts; that way your entire query is treated as a single unit.
cat params-query.txt | xargs -i echo qsub -b y -o output_log -e error_log -N job_name \"{}\" | sh
If that didn't work as expected then you're probably better off generating an intermediate output between bowtie and samtools before calling samtools to accept that intermediate output. You won't need to change the qsub call through xargs but the code in params-query.txt should look like:
bowtie -q query -o intermediate_query_out -1 param1 -2 param2 && samtools read_from_intermediate_query_out
This page has interesting qsub tricks you might like:
grep http *.job | awk -F: '{print $1}' | sort -u | xargs -I {} qsub {}
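Since grep -l prints just the names of matching files, each at most once, the same one-liner can arguably be shortened (assuming GNU grep):
grep -l http *.job | xargs -I {} qsub {}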

What is meant by 'output to stdout'

New to bash programming. I am not sure what is meant by 'output to stdout'. Does it mean print out to the command line?
If I have a simple bash script:
#!/bin/bash
wget -q http://192.168.0.1/test -O - | grep -m 1 'Hello'
it outputs a string to the terminal. Does this mean it's 'outputting to stdout' ?
Thanks
Yes, stdout is the terminal (unless it's redirected to a file using the > operator or into the stdin of another process using |).
In your specific example, you're actually piping wget's output through grep, and grep's output then goes to the terminal.
Every process on a Linux system (and most others) has at least 3 open file descriptors:
stdin (0)
stdout (1)
stderr (2)
Normally, each of these file descriptors points to the terminal from which the process was started. Like this:
cat file.txt # all file descriptors are pointing to the terminal where you type the command
However, bash allows you to modify this behaviour using input/output redirection:
cat < file.txt # will use file.txt as stdin
cat file.txt > output.txt # redirects stdout to a file (will not appear on terminal anymore)
cat file.txt 2> /dev/null # redirects stderr to /dev/null (will not appear on terminal anymore)
The same is happening when you are using the pipe symbol like:
wget -q http://192.168.0.1/test -O - | grep -m 1 'Hello'
What is actually happening is that the stdout of the wget process (the process before the |) is redirected to the stdin of the grep process. So wget's stdout isn't a terminal anymore, while grep's stdout is still the current terminal. If you want to redirect grep's output to a file, for example, then use this:
wget -q http://192.168.0.1/test -O - | grep -m 1 'Hello' > output.txt
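A quick way to see that the two output streams are independent (echo writes to stdout, and >&2 sends a message to stderr):
{ echo out; echo err >&2; } > /dev/null   # only "err" reaches the terminal
{ echo out; echo err >&2; } 2> /dev/null  # only "out" reaches the terminal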
Unless redirected, standard output is the text terminal which initiated the program.
Here's a wikipedia article: http://en.wikipedia.org/wiki/Standard_streams#Standard_output_.28stdout.29
