Bash Multiple Process Substitution Redirect Order - linux

I am trying to use multiple process substitutions in a BASH command but I seem to be misunderstanding the order in which they resolve and redirect to each other.
The System
Ubuntu 18.04
BASH version - GNU bash, version 4.4.20(1)-release (x86_64-pc-linux-gnu)
The Problem
I am trying to redirect an output of a command into tee, have that redirect into ts (adding a timestamp) and then have that redirect into split (splitting the output into separate files). I can get the output to redirect into tee and ts but when redirecting into split I run into a problem.
My Attempts
command >(tee -a >(ts '[%Y-%m-%d %H:%M:%S]' > tempfile.txt)) - this redirects the output into the process substitution of tee, then redirects to the process substitution of ts, which adds the timestamp, and then redirects to tempfile.txt. This is what I would expect.
command >(tee -a >(ts '[%Y-%m-%d %H:%M:%S]' >(split -d -b 10 -))) - this does nothing, even though I would expect the result to be a bunch of 10-byte files with timestamps on the different rows.
To continue testing I tried with echo instead to see what happens
command >(tee -a >(ts '[%Y-%m-%d %H:%M:%S]' >(echo))) - the print from the initial tee prints (as it should) but the echo prints an empty line. Apparently this is irrelevant because of a new result I got - see the edit at the bottom
command >(tee -a >(ts '[%Y-%m-%d %H:%M:%S]') >(split -d -b 10 -)) - This prints the command output with the timestamp (as tee and ts should) and in addition creates 10-byte files with the command output (no timestamp on them). This is what I expected and makes sense, as tee gets redirected to both process substitutions separately; it was mostly a sanity check.
What I think is happening
From what I can tell >(ts '[%Y-%m-%d %H:%M:%S]' >(split -d -b 10 -)) is resolving first as a complete and separate command of its own. Thus split (and echo) receive empty output from ts, which has no output on its own. Only after this does the actual command resolve and send its output to its substitution, tee.
This doesn't explain why command >(tee -a >(ts '[%Y-%m-%d %H:%M:%S]' > tempfile.txt)) does work, as by this theory tee by itself has no output, so ts should be receiving no input and should also output a blank.
All this is to say I am not really sure what is happening.
What I want
Basically I just want to understand how to make command >(tee -a >(ts '[%Y-%m-%d %H:%M:%S]' >(split -d -b 10 -))) work in the way it seems it should. I need the command's output to go to the process substitution tee, which will send it to the process substitution ts and add the timestamps, which will then send it to split and split the output into several small files.
I have tried command > >(echo) and saw the output is blank, which is not what I expected (I expected echo to receive the command output and then print it). I think I am just very much misunderstanding how process substitution works at this point.

You can send the error stream from the command into a different pipeline than the output, if that is desired:
{ { cmd 2>&3 | ts ... | split; } 3>&1 >&4 | ts ... | split; } 4>&1
This sends the output of cmd to the first pipeline, while the error stream from cmd goes into the second pipeline. File descriptor 3 is introduced to keep the error streams of ts and split out of the second pipeline, but that may be undesirable. fd 4 is introduced to prevent the output of split from being consumed by the second pipeline, and that may be unnecessary (if split does not produce any output, for example.)
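For illustration, here is the same fd juggling with the commands from the question filled in (a sketch; the out_ and err_ prefixes and the 10-byte split size are placeholders):
{ { command 2>&3 | ts '[%Y-%m-%d %H:%M:%S]' | split -d -b 10 - out_; } 3>&1 >&4 | ts '[%Y-%m-%d %H:%M:%S]' | split -d -b 10 - err_; } 4>&1
With this, stdout ends up timestamped and split into out_* files, while stderr ends up timestamped and split into err_* files.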

One thing you could do if you really want to have one command redirect stdout/stderr to a separate ts|tee|split is this:
command 1> >(ts '[%Y-%m-%d %H:%M:%S]' | tee -a >(split -d -b 10 -)) 2> >(ts '[%Y-%m-%d %H:%M:%S]' | tee -a >(split -d -b 10 -))
But the downside is tee only prints after the prompt gets printed. There is probably a way to avoid this by duplicating file descriptors, but this is the best I could think of.

This:
ts '[%Y-%m-%d %H:%M:%S]' >(split -d -b 10 -)
expands the file name generated by the process substitution on the command line of ts, so what gets run is something like ts '[%Y-%m-%d %H:%M:%S]' /dev/fd/63. ts then tries to open the fd that goes to split to read input from there, instead of reading from the original stdin.
That's probably not what you want, and on my machine, I got some copies of ts and split stuck in the background while testing. Possibly successfully connected to each other, which may explain the lack of error messages.
You probably meant to write
ts '[%Y-%m-%d %H:%M:%S]' > >(split -d -b 10 -)
^
with a redirection to the process substitution.
That said, you could just use a pipe there between ts and split.
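For example, keeping the structure from the question but replacing the inner process substitution with a plain pipe (a sketch, assuming command writes to the file name it is given, as in the question):
command >(tee -a >(ts '[%Y-%m-%d %H:%M:%S]' | split -d -b 10 -))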

Related

Store output of every command with timestamp in a file in Linux

Is there any way to store the output of every command in a log file with a timestamp?
I have tried this script but it did nothing.
mkdir /home/my_name/demo |& tee /home/my_name/My_log.log
mkdir produces no output, so there is nothing to see. Also, you need to use ts to get the timestamp.
echo hello | ts '[%Y-%m-%d %H:%M:%S]' | tee ~/my_name/My_log.log
ts might not be installed on your system, but it can be found in the package moreutils.
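On Debian or Ubuntu systems, for example, that would be:
sudo apt-get install moreutils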
If you have multiple commands you want to log, you can put them in a script and then pipe the output of the script through the pipeline above:
myscript | ts '[%Y-%m-%d %H:%M:%S]' | tee ~/my_name/My_log.log
Use the >> operator to write your output to a file. You can use the tee command as well; the only difference is that >> doesn't also write the output to STDOUT.
Have your script or command execute something like below:
customScript | ts '[%Y-%m-%d %H:%M:%S]' | tee -a /home/my_name/My_log.log
or
customScript | ts '[%Y-%m-%d %H:%M:%S]' >> /home/my_name/My_log.log

linux strace: How to filter system calls that take more than a second

I'm using "strace -f -T -tt -o foo.txt -p 1234" to print the time spent in system calls. This makes the output file huge, is there a way to just print the system calls that took greater than 1second. I can grep it out from the file later, but is there a better way?
If we simply omit the -o foo.txt argument, the trace output goes to standard error (strace writes its trace to stderr). We can merge it into standard output with 2>&1, pipe it through grep and redirect to the file:
strace -f -T -tt -p 1234 2>&1 | grep pattern > foo.txt
To watch the output at the same time:
strace -f -T -tt -p 1234 2>&1 | grep pattern | tee foo.txt
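If the goal is specifically "longer than one second", the pattern can match the elapsed time that -T appends in angle brackets at the end of each line, for example (a sketch; durations look like <1.234567>):
strace -f -T -tt -p 1234 2>&1 | grep -E '<[1-9][0-9]*\.[0-9]+>$' | tee foo.txt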
If a command prints only to a file that is passed as an argument, and we want to filter/redirect its output, the first step is to check whether it implements the dash convention: can you specify standard input or output using - as a filename argument:
some_command - | our_pipe > file.txt
If not, then the recourse is to use Bash process substitution syntax: >(output command) and <(input command):
some_command >(our_pipe > file.txt)
The process substitution syntax expands into a token that is suitable as a filename argument for a command or function. When the program opens that token, it gets a file descriptor to the command's input or output, depending on direction.
With process substitution, we can redirect the input or output of stubborn programs which work only with files passed by name as arguments, and which do not support any convention for requesting that standard input or output be used in place of a file.
The token used by process substitution is platform-dependent; we can see what it is using echo. For instance on GNU/Linux, Bash takes advantage of the /dev/fd operating system feature:
$ echo <(ls -l)
/dev/fd/63
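A classic illustration of the same idea, unrelated to strace: diff only accepts file names, yet with process substitution it can compare the output of two commands (dir1 and dir2 standing in for any two directories):
diff <(ls dir1) <(ls dir2)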
You can use the following command:
strace -T command 2>&1 >/dev/null | awk '{gsub(/[<>]/,"",$NF)}$NF+.0>1.0'
Explanation:
strace -T adds the time spent in the syscall at the end of the line, enclosed in <...>
2>&1 >/dev/null | awk pipes stderr to awk. (strace writes its output to stderr!)
The awk command removes the <> from the last field $NF and prints lines where the time spent is higher than a second.
Probably you'll also want to pass the threshold as a variable to the awk command:
strace -T command 2>&1 >/dev/null \
| awk -v thres=0.001 '{gsub(/[<>]/,"",$NF)}$NF+.0>thres+.0'

tee command not working when run in background

I want a daemon that continuously watches a named pipe for input, and then creates two files: one that contains all the data that went through the named pipe, and one that contains a filtered version of it.
When I run this command from the command line it works as intended, $ACTIONABLE_LOG_FILE is created as a filtered version of $LOG_FILE_BASENAME:
cat $NAMED_PIPE | tee -a "$LOG_FILE_BASENAME" | grep -P -v "$EXCEPTIONS" >> "$ACTIONABLE_LOG_FILE" &
But when I leave the following code running in the background, nothing gets appended to $ACTIONABLE_LOG_FILE:
while true
do
cat $NAMED_PIPE | tee -a "$LOG_FILE_BASENAME" | grep -P -v "$EXCEPTIONS" >> "$ACTIONABLE_LOG_FILE" &
wait $!
done
The file $ACTIONABLE_LOG_FILE gets created, but nothing gets appended to it. What's going on here?
My suspicion would be that when daemonized, you do not have a full environment available, and hence no $PATH. The full path to the command (likely /usr/bin/tee) might help a lot. You can confirm that locally with which tee.
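As a sketch, that would mean hard-coding the paths reported by which cat, which tee and which grep on your system (the /bin and /usr/bin locations below are assumptions; check them locally):
while true
do
/bin/cat "$NAMED_PIPE" | /usr/bin/tee -a "$LOG_FILE_BASENAME" | /bin/grep -P -v "$EXCEPTIONS" >> "$ACTIONABLE_LOG_FILE" &
wait $!
done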

How can I use a pipe or redirect in a qsub command?

There are some commands I'd like to run on a grid using qsub (SGE 8.1.3, CentOS 5.9) that need to use a pipe (|) or a redirect (>). For example, let's say I have to parallelize the command
echo 'hello world' > hello.txt
(Obviously a simplified example: in reality I might need to redirect the output of a program like bowtie directly to samtools). If I did:
qsub echo 'hello world' > hello.txt
the resulting content of hello.txt would look like
Your job 123454321 ("echo") has been submitted
Similarly if I used a pipe (echo "hello world" | myprogram), that message is all that would be passed to myprogram, not the actual stdout.
I'm aware I could write small bash scripts that each contain the command with the pipe/redirect, and then do qsub ./myscript.sh. However, I'm trying to run many parallelized jobs at the same time using a script, so I'd have to write many such bash scripts, each with a slightly different command. When scripted, this solution can start to feel very hackish. An example of such a script in Python:
import os

for i, (infile1, infile2, outfile) in enumerate(files):
    command = ("bowtie -S %s %s | " +
               "samtools view -bS - > %s\n") % (infile1, infile2, outfile)
    script = "job" + str(i) + ".sh"
    open(script, "w").write(command)
    os.system("chmod 755 %s" % script)
    os.system("qsub -cwd ./%s" % script)
This is frustrating for a few reasons, among them that my program can't even delete the many jobXX.sh scripts afterwards to clean up after itself, since I don't know how long the job will be waiting in the queue, and the script has to be there when the job starts.
Is there a way to provide my full echo 'hello world' > hello.txt command to qsub without having to create another file containing the command?
You can do this by turning it into a bash -c command, which lets you put the | in a quoted statement:
qsub bash -c "cmd <options> | cmd2 <options>"
As @spuder has noted in the comments, it seems that in other versions of qsub (not SGE 8.1.3, which I'm using), one can solve the problem with:
echo "cmd <options> | cmd2 <options>" | qsub
as well.
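Using the example from the question, that alternative would look something like this (a sketch; only on qsub versions that accept the job script on stdin, per the note above):
echo "echo 'hello world' > hello.txt" | qsub -cwd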
Although my answer is a bit late I am adding it for any incoming viewers. To use a pipe/redirect and submit that as a qsub job you need to do a couple of things. But first, using qsub at the end of a pipe like you're doing will only result in one job being sent to the queue (i.e. your code will run serially rather than get parallelized).
Run qsub with binary mode enabled, since by default qsub expects a script file rather than a command line. For that you use the "-b y" flag to qsub, and you'll avoid errors of the sort "command required for a binary mode" or "script length does not match declared length".
Echo each call to qsub and then pipe that to the shell.
Suppose you have a file params-query.txt which hold several bowtie commands and piped calls to samtools of the following form:
bowtie -q query -1 param1 -2 param2 ... | samtools ...
To send each query as a separate job, first prepare your command-line units from STDIN through xargs. Notice the quotes around the braces are important if you are submitting a command made of piped parts; that way your entire query is treated as a single unit.
cat params-query.txt | xargs -i echo qsub -b y -o output_log -e error_log -N job_name \"{}\" | sh
If that didn't work as expected then you're probably better off generating an intermediate output between bowtie and samtools before calling samtools to accept that intermediate output. You won't need to change the qsub call through xargs but the code in params-query.txt should look like:
bowtie -q query -o intermediate_query_out -1 param1 -2 param2 && samtools read_from_intermediate_query_out
This page has interesting qsub tricks you might like
grep http *.job | awk -F: '{print $1}' | sort -u | xargs -I {} qsub {}

How to log output in bash and see it in the terminal at the same time?

I have some scripts where I need to see the output and log the result to a file, with the simplest example being:
$ update-client > my.log
I want to be able to see the output of the command while it's running, but also have it logged to the file. I also log stderr, so I would want to be able to log the error stream while seeing it as well.
update-client 2>&1 | tee my.log
2>&1 redirects standard error to standard output, and tee sends its standard input to standard output and the file.
Just use tail to watch the file as it's updated. Background your original process by adding & after the command above. After you execute it, just use
$ tail -f my.log
It will continuously update. (Note it won't tell you when the command has finished running, so you can output something to the log to tell you it finished. Ctrl-C to exit tail.)
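Put together with the command from the question, that would be:
update-client > my.log 2>&1 &   # 2>&1 also captures stderr, per the question
tail -f my.log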
You can use the tee command for that:
command | tee /path/to/logfile
The equivalent without writing to the shell would be:
command > /path/to/logfile
If you want to append (>>) and show the output in the shell, use the -a option:
command | tee -a /path/to/logfile
Please note that the pipe will catch stdout only; errors sent to stderr are not processed by the pipe with tee. If you want to log errors (from stderr), use:
command 2>&1 | tee /path/to/logfile
This means: run command and redirect the stderr stream (2) to stdout (1). That will be passed to the pipe with the tee application.
Learn more about this at the Ask Ubuntu site.
Another option is to use block-based output capture from within the script (not sure if that is the correct technical term).
Example
#!/bin/bash
{
echo "I will be sent to screen and file"
ls ~
} 2>&1 | tee -a /tmp/logfile.log
echo "I will be sent to just terminal"
I like to have more control and flexibility - so I prefer this way.

Resources