How to capture and prefix the output of processes started within a bash script? - linux

I'm writing a small bash script that compiles 2 programs and then executes them in the background. These 2 programs output some generic text. However, I need to prefix these outputs, like PROGRAM1: xxxxx. How do I achieve that? I have found several answers here, but they weren't exactly applicable to this situation.
Here's the code:
#!/bin/bash
echo "This program compiles 2 programs, executes them (executes the 2nd one first, then sleeps for 0.01 second which then executes program 1), then observes the outputs"
gcc -O3 -std=c11 one.c -o one
gcc -O3 -std=c11 two.c -o two
./two &
sleep 0.01
TWO_PID=$(pgrep two)
./one $TWO_PID
#"Prefix output statements here"
#add code here
#Cleanup
rm -f one two

You can do something like this
#! /bin/bash
# ...
label() {
    while read -r l; do
        echo "$1: $l"
    done
}
./two | label "two" &
two_pid=$(pgrep two)
./one $two_pid | label "one"
You don't need the sleep and be careful as pgrep could match more than one process.
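As a side note (not part of the original answer), bash already records the PID of the most recently started background job in $!, which avoids both the sleep and the risk of pgrep matching more than one process. One caveat: with ./two | label "two" &, $! would be the PID of the label stage, so the sketch below redirects into a process substitution instead, keeping ./two itself as the backgrounded process (label is assumed to be defined as above):
#!/bin/bash
./two > >(label "two") &   # background ./two itself; its output still goes through label
two_pid=$!                 # PID of ./two, recorded by the shell - no pgrep, no sleep needed
./one "$two_pid" | label "one"
wait                       # optionally wait for the background job (./two) to finish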
Also, instead of compiling and removing you should use make.
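For the last suggestion, a minimal Makefile sketch (assuming the sources are one.c and two.c, as in the question; recipe lines must start with a tab) might look like:
CFLAGS = -O3 -std=c11

all: one two            # 'make' builds both programs

one: one.c
	$(CC) $(CFLAGS) -o $@ $<

two: two.c
	$(CC) $(CFLAGS) -o $@ $<

clean:                  # 'make clean' replaces the rm -f at the end of the script
	rm -f one two
Then the script only needs make at the top and make clean at the bottom instead of calling gcc and rm directly.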

Related

Running a regression with shell script and make utility

I want to run a regression with a shell script which should start each test via make command. The following is a simple version of my script:
#!/bin/sh
testlist="testlist.txt"
while read line;
do
    test_name=$(echo $line | awk '{print $1}')
    program_path=$(echo $line | awk '{print $2}')
    make sim TEST_NAME=$test_name PROGRAM_PATH=$program_path
done < "$testlist"
The problem with the above script is that when the make command starts a program, the script goes on to the next iteration without waiting for that program to complete, and just continues reading the next line from the file.
Is there any option in the make utility to make sure that it waits for the completion of the program? Maybe I'm missing something else.
This is the related part of Makefile:
sim:
	vsim -i -novopt \
	-L $(QUESTA_HOME)/uvm-1.1d \
	-L questa_mvc_lib \
	-do "add wave top/AL_0/*;log -r /*" \
	-G/path=$(PROGRAM_PATH) \
	+UVM_TESTNAME=$(TEST_NAME) +UVM_VERBOSITY=UVM_MEDIUM -sv_seed random
As the OP stated in a comment, he noticed that vsim forks a few other processes that are still running after vsim itself has finished. So he needs to wait until those other processes are finished.
I emulated your vsim command with a script that forks some sleep processes:
#!/bin/bash
forksome() {
    sleep 3 &
    sleep 2 &
    sleep 5 &
}
echo "Forking some sleep"
forksome
I made a Makefile that shows your problem with make normal.
When you know which processes are forked, you can build a solution as demonstrated with make new.
normal: sim
	ps -f

new: sim mywait
	ps -f

sim:
	./vsim

mywait:
	echo "Waiting for process(es):"
	while pgrep sleep; do sleep 1; done
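If you also control the script that forks the processes (here the ./vsim emulation), another option, not shown above, is to make that script wait for its own background children before it exits, so that make itself blocks until they are done:
#!/bin/bash
# Variant of the emulation script: same forks, but the script does not
# return until every background child has finished.
forksome() {
    sleep 3 &
    sleep 2 &
    sleep 5 &
}
echo "Forking some sleep"
forksome
wait    # blocks until all background jobs of this shell have exited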

Concatenate two instructions in bash script

I would like to run make with the -j option after checking how many cores the CPU has.
#!/bin/bash
x="grep -c ^processor /proc/cpuinfo"
make -j${x}
The x variable should hold the number of cores, but make -j${x} is not working.
You'll want to capture the output of the command into a variable using command substitution (which runs the command in a subshell):
#!/bin/bash
x=$(grep -c ^processor /proc/cpuinfo)
make -j"${x}"
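On systems with GNU coreutils you can also get the core count from nproc instead of parsing /proc/cpuinfo; a minimal equivalent sketch:
#!/bin/bash
make -j"$(nproc)"    # nproc prints the number of available processing units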

Does there exist a dummy command that doesn't change input and output?

Does there exist a dummy command in Linux that doesn't change the input and output, and takes arguments that don't do anything?
cmd1 | dummy --tag="some unique string here" | cmd2
My interest in such a dummy command is to be able to identify the processes when I have multiple cmd1 | cmd2 pipelines running.
You can use a variable for that:
cmd1 | USELESS_VAR="some unique string here" cmd2
This has the advantage, which Jonathan noted in the comments, of not making an extra copy of all the data.
To create the command that you want, make a file called dummy with the contents:
#!/bin/sh
cat
Place the file in your PATH and make it executable (chmod a+x dummy). Because dummy ignores its arguments, you may place any argument on the command line that you choose. Because the body of dummy runs cat (with no arguments), dummy will echo stdin to stdout.
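With such a dummy script on the PATH, the tag then appears in the full command line of the running process, so (as one possible way to use it, assuming the pgrep from procps) you could locate a particular pipeline with pgrep -f:
cmd1 | dummy --tag="some unique string here" | cmd2 &

# -f matches the pattern against the full command line, -a also prints it;
# note the shell strips the quotes, so they are not part of what ps/pgrep see
pgrep -af 'dummy --tag=some unique string here'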
Alternative
My interest in such a dummy command is to be able to identify the processes when I have multiple cmd1 | cmd2 pipelines running.
As an alternative, consider
pgrep cmd1
It will show a process ID for every copy of cmd1 that is running. As an example:
$ pgrep getty
3239
4866
4867
4893
This shows that four copies of getty are running.
I'm not quite clear on your use case, but if you want to be able to "tag" different invocations of a command for the convenience of a sysadmin looking at a ps listing, you might be able to rename argv[0] to something suitably informative:
$:- perl -E 'exec {shift} "FOO", @ARGV' sleep 60 &
$:- perl -E 'exec {shift} "BAR", @ARGV' sleep 60 &
$:- perl -E 'exec {shift} "BAZ", @ARGV' sleep 60 &
$:- ps -o cmd=
-bash
FOO 60
BAR 60
BAZ 60
ps -o cmd
You don't have to use perl's exec. There are other conveniences for this, too, such as DJB's argv0, RedHat's doexec, and bash itself.
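For instance, the bash route can be done with exec's -a option, which sets argv[0] for the command that replaces the shell; a small sketch:
# run sleep with a custom argv[0] so it shows up as FOO in ps listings
bash -c 'exec -a FOO sleep 60' &
ps -o cmd= -p "$!"    # prints: FOO 60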
You should find a better way of identifying the processes, but anyway, you can use:
ln -s /dev/null /tmp/some-unique-file
cmd1 | tee /tmp/some-unique-file | cmd2

Why aren't positional variables changed every time one runs a command

I am learning shell scripting and I have this situation.
We say that positional variables are environmental variables, but why don't they change every time a command is executed?
Take a look at this
set v1set v2set v3set v4set
old=$#
#Just a random command
ls -l
new=$#
echo $old $new
It outputs 4 4. If environmental variables are global, why isn't it 4 1, as I ran ls -l and it should have updated positional variables?
Interesting question - you've got a good point.
To understand it, you need to understand what happens when you run any command, like ls -l. It has nothing to do with "variables being restored" or similar...
When you run any command:
- bash FORKS itself into two identical copies
- one copy (called the child) replaces itself with the wanted command (e.g. with ls -l)
- at this moment, the child process gets the correct count of positional variables in $#
- remember - this happens in the child process; the second (parent) process knows NOTHING about it
- the parent simply waits until the child finishes (and of course, ITS $# does not change, because from the parent's point of view nothing happens - it only waits)
- when the child (ls -l) finishes, the parent continues to run - and of course, there was no reason for its $# to change...
ps: the above is simplified. In fact, after the fork the two copies are not fully identical but differ in one number - the parent gets the child's process ID, while in the child this number is '0'.
If environmental variables are global, why isn't it 4 1
I presume that you are asking why running the command ls -l does not change the positional parameters from four to one with the one being -l.
It does set them to -l for the program ls. When the program ls queries its positional parameters, it is told that it has a single one consisting of -l. Once ls terminates, however, the positional parameters in the shell are back to what they were before.
If environmental variables are global,
Even for global environmental variables, changes made to them in a child process never appear in the parent process. The communication of environmental variables is a one-way street: from parent to child only.
For example:
$ cat test1.sh
echo "in $0, before, we have $# pos. params with values=$*"
bash test2.sh calling test2 from test1
echo "in $0, after , we have $# pos. params with values=$*"
$ cat test2.sh
echo "in $0, we have $# pos. params with values=$*"
$ bash test1.sh -l
in test1.sh, before, we have 1 pos. params with values=-l
in test2.sh, we have 4 pos. params with values=calling test2 from test1
in test1.sh, after , we have 1 pos. params with values=-l
And, another example, this one showing that a child's changes to an environment variable do not affect the parent:
$ cat test3.sh
export myvar=1
echo "in $0, before, myvar=$myvar"
bash test4.sh
echo "in $0, after, myvar=$myvar"
$ cat test4.sh
export myvar=2
echo "in $0, myvar=$myvar"
$ bash test3.sh
in test3.sh, before, myvar=1
in test4.sh, myvar=2
in test3.sh, after, myvar=1
I don't think $# applies to an interactive shell. It works fine in a script. Try this.
$ cat try.sh
#!/bin/bash
echo $*
echo $#
$ ./try.sh one
one
1
$ ./try.sh one two
one two
2
$ ./try.sh one two three
one two three
3

How can I use a pipe or redirect in a qsub command?

There are some commands I'd like to run on a grid using qsub (SGE 8.1.3, CentOS 5.9) that need to use a pipe (|) or a redirect (>). For example, let's say I have to parallelize the command
echo 'hello world' > hello.txt
(Obviously a simplified example: in reality I might need to redirect the output of a program like bowtie directly to samtools). If I did:
qsub echo 'hello world' > hello.txt
the resulting content of hello.txt would look like
Your job 123454321 ("echo") has been submitted
Similarly if I used a pipe (echo "hello world" | myprogram), that message is all that would be passed to myprogram, not the actual stdout.
I'm aware I could write a small bash script containing the command with the pipe/redirect, and then do qsub ./myscript.sh. However, I'm trying to run many parallelized jobs at the same time using a script, so I'd have to write many such bash scripts, each with a slightly different command. When scripted, this solution starts to feel very hackish. An example of such a script in Python:
for i, (infile1, infile2, outfile) in enumerate(files):
    command = ("bowtie -S %s %s | " +
               "samtools view -bS - > %s\n") % (infile1, infile2, outfile)
    script = "job" + str(i) + ".sh"
    open(script, "w").write(command)
    os.system("chmod 755 %s" % script)
    os.system("qsub -cwd ./%s" % script)
This is frustrating for a few reasons, among them that my program can't even delete the many jobXX.sh scripts afterwards to clean up after itself, since I don't know how long the job will be waiting in the queue, and the script has to be there when the job starts.
Is there a way to provide my full echo 'hello world' > hello.txt command to qsub without having to create another file containing the command?
You can do this by turning it into a bash -c command, which lets you put the | in a quoted statement:
qsub bash -c "cmd <options> | cmd2 <options>"
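Applied to the question's simplified example, that would look something like the following (depending on the qsub version you may also need the -b y flag discussed in the next answer; -cwd, to run in the submission directory, is just an assumption about the desired setup):
qsub -cwd bash -c "echo 'hello world' > hello.txt"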
As @spuder has noted in the comments, it seems that in other versions of qsub (not SGE 8.1.3, which I'm using), one can solve the problem with:
echo "cmd <options> | cmd2 <options>" | qsub
as well.
Although my answer is a bit late, I am adding it for any incoming viewers. To use a pipe/redirect and submit that as a qsub job you need to do a couple of things. But first, note that using qsub at the end of a pipe like you're doing will only result in one job being sent to the queue (i.e. your code will run serially rather than get parallelized).
Run qsub with binary mode enabled, since by default qsub expects a script file rather than an inline command. For that you use the "-b y" flag to qsub, and you'll avoid errors of the sort "command required for a binary mode" or "script length does not match declared length".
Echo each call to qsub and then pipe that to a shell.
Suppose you have a file params-query.txt which holds several bowtie commands and piped calls to samtools of the following form:
bowtie -q query -1 param1 -2 param2 ... | samtools ...
To send each query as a separate job, first prepare your command-line units from STDIN through xargs. Notice that the quotes around the braces are important if you are submitting a command made of piped parts; that way your entire query is treated as a single unit.
cat params-query.txt | xargs -i echo qsub -b y -o output_log -e error_log -N job_name \"{}\" | sh
If that doesn't work as expected, then you're probably better off generating an intermediate output file between bowtie and samtools, and then calling samtools on that intermediate output. You won't need to change the qsub call through xargs, but the code in params-query.txt should look like:
bowtie -q query -o intermediate_query_out -1 param1 -2 param2 && samtools read_from_intermediate_query_out
This page has interesting qsub tricks you might like
grep http *.job | awk -F: '{print $1}' | sort -u | xargs -I {} qsub {}
