GNU parallel inheriting environment variables while executing a local script - linux

Suppose I have foo.sh that calls bar.sh using parallel:
#! /bin/bash
# foo.sh
parallel -N 3 bar.sh ::: $(seq 10)
My bar.sh works like this: if an environment variable (e.g. DEBUG=1) is set, it outputs lots of debug info.
Ideally I want to simply execute my foo.sh like this:
$ DEBUG=1 foo.sh
Normally foo.sh sees the $DEBUG value, and so does bar.sh. But now that I am using GNU parallel to call bar.sh, which is a local program, my bar.sh no longer has DEBUG set.
I read that --env only works with remote execution (-S), and indeed it did not seem to work when I tried it.
Is there a way to get my parallel'ed bar.sh to simply "inherit" the environment settings of my foo.sh? I really don't want to spell out each and every environment variable and its value when calling bar.sh in parallel.
TIA

You are looking for env_parallel, which does exactly this.
Put this in $HOME/.bashrc:
. `which env_parallel.bash`
E.g. by doing:
echo '. `which env_parallel.bash`' >> $HOME/.bashrc
aliases
alias myecho='echo aliases'
env_parallel myecho ::: work
env_parallel -S server myecho ::: work
env_parallel --env myecho myecho ::: work
env_parallel --env myecho -S server myecho ::: work
functions
myfunc() { echo functions $*; }
env_parallel myfunc ::: work
env_parallel -S server myfunc ::: work
env_parallel --env myfunc myfunc ::: work
env_parallel --env myfunc -S server myfunc ::: work
variables
myvar=variables
env_parallel echo '$myvar' ::: work
env_parallel -S server echo '$myvar' ::: work
env_parallel --env myvar echo '$myvar' ::: work
env_parallel --env myvar -S server echo '$myvar' ::: work
arrays
myarray=(arrays work, too)
env_parallel -k echo '${myarray[{}]}' ::: 0 1 2
env_parallel -k -S server echo '${myarray[{}]}' ::: 0 1 2
env_parallel -k --env myarray echo '${myarray[{}]}' ::: 0 1 2
env_parallel -k --env myarray -S server echo '${myarray[{}]}' ::: 0 1 2
env_parallel is part of GNU Parallel 20160722. It is beta quality, so please report bugs if you find any.
If you know your UNIX, you will know that you cannot use aliases, non-exported functions, non-exported variables, or non-exported arrays in shells started from the current shell (e.g. in bash -c), and especially not if the shell is remote (e.g. ssh server myalias). With env_parallel this common knowledge has to be revised: you cannot do it without cheating.
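Applied to the question's foo.sh, the change is minimal. A sketch, assuming env_parallel.bash has been sourced in $HOME/.bashrc as shown above:
#! /bin/bash
# foo.sh: env_parallel accepts the same options as parallel but copies the
# calling shell's environment, so `DEBUG=1 foo.sh` makes DEBUG visible in
# every bar.sh job
env_parallel -N 3 bar.sh ::: $(seq 10)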

To copy the entire environment, use _ as the variable given to --env:
parallel --env _ -N 3 bar.sh ::: $(seq 10)

Related

Passing static variables to GNU Parallel [closed]

In a bash script I am trying to pass multiple distinct fastq files and several user-provided static variables to GNU Parallel. I can't hardcode the static variables: while they do not change within the script, they are set by the user and vary between uses. I have tried a few different ways but get the error argument -b/--bin: expected one argument.
Attempt 1:
binSize="10000"
outputDir="output"
errors="1"
minReads="10"
ls fastq_F* | parallel "python myscript.py -f split_fastq_F{} -b $binSize -o $outputDir -e $errors -p -t $minReads"
Attempt 2:
my_func() {
    python InDevOptimizations/DemultiplexUsingBarcodes_New_V1.py \
        -f split_fastq_F$1 \
        -b $binSize \
        -o $outputDir \
        -e $errors \
        -p \
        -t $minReads
}
export -f my_func
ls fastq_F* | parallel my_func
It seems clear that I am not correctly passing the static variables... but I can't seem to grasp what the correct way to do this is.
Always try --dr (short for --dryrun) when GNU Parallel does not do what you expect.
binSize="10000"
outputDir="output"
errors="1"
minReads="10"
ls fastq_F* | parallel --dr "python myscript.py -f split_fastq_F{} -b $binSize -o $outputDir -e $errors -p -t $minReads"
You are using " and not ', so the variables should be substituted by the shell before GNU Parallel starts.
If the commands are run locally (i.e. not remote) you can use export VARIABLE.
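For the script in the question, that could look like this sketch (same variables as Attempt 1, just exported before GNU Parallel starts, with single quotes so the spawned shell expands them):
binSize="10000"
outputDir="output"
errors="1"
minReads="10"
# export puts the variables into the environment inherited by the shells
# that GNU Parallel spawns
export binSize outputDir errors minReads
ls fastq_F* | parallel 'python myscript.py -f split_fastq_F{} -b $binSize -o $outputDir -e $errors -p -t $minReads'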
If run on remote servers, use env_parallel:
env_parallel --session
alias myecho='echo aliases'
env_parallel -S server myecho ::: work
myfunc() { echo functions $*; }
env_parallel -S server myfunc ::: work
myvar=variables
env_parallel -S server echo '$myvar' ::: work
myarray=(arrays work, too)
env_parallel -k -S server echo '${myarray[{}]}' ::: 0 1 2
env_parallel --end-session

Executing `sh -c` in a bash script

I have a test.sh file which takes a bash command as its parameters, does some logic (i.e. setting and checking some env vars), and then executes that input command.
#!/bin/bash
#Some other logic here
echo "Run command: $#"
eval "$#"
When I run it, here's the output
% ./test.sh echo "ok"
Run command: echo ok
ok
But the issue is, when I pass something like sh -c 'echo "ok"', I don't get the output.
% ./test.sh sh -c 'echo "ok"'
Run command: sh -c echo "ok"
%
So I tried replacing eval with exec, tried executing $@ directly (without eval or exec), and even tried executing it and saving the output to a variable, still with no luck.
Is there any way to run the passed command in this format and get the output?
Use case:
The script is used as an entrypoint for the docker container, it receives the parameters from docker CMD and executes those to run the container.
As a quick fix I can remove the sh -c and pass the command without it, but I want to make the script reusable and not have to change the commands.
TL;DR:
This is a typical use case (perform some business logic in a Docker entrypoint script before running a compound command, given at command line) and the recommended last line of the script is:
exec "$#"
Details
To further explain this line, some remarks and hyperlinks:
As per the Bash user manual, exec is a POSIX shell builtin that replaces the shell [with the command supplied] without creating a new process.
As a result, using exec like this in a Docker entrypoint context is important because it ensures that the CMD program that is executed will still have PID 1 and can directly handle signals, including that of docker stop (see also that other SO answer: Speed up docker-compose shutdown). A minimal sketch of such an entrypoint is given after these remarks.
The double quotes ("$@") are also important to avoid word splitting (namely, to ensure that each positional argument is passed as is, even if it contains spaces). See e.g.:
#!/usr/bin/env bash
printargs () { for arg; do echo "$arg"; done; }
test0 () {
    echo "test0:"
    printargs $@
}
test1 () {
    echo "test1:"
    printargs "$@"
}
test0 /bin/sh -c 'echo "ok"'
echo
test1 /bin/sh -c 'echo "ok"'
This outputs:
test0:
/bin/sh
-c
echo
"ok"
test1:
/bin/sh
-c
echo "ok"
Finally, eval is a powerful bash builtin that is (1) unneeded for your use case, and (2) not advised in general, in particular for security reasons, e.g. if the string argument of eval relies on some user-provided input… For details on this issue, see https://mywiki.wooledge.org/BashFAQ/048 (which recaps the few situations where one would want to use this builtin; typically, the command eval "$(ssh-agent -s)").
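For concreteness, here is a minimal sketch of such an entrypoint along the lines of the asker's test.sh (the file name and the logging line are illustrative, not part of the original):
#!/bin/bash
# entrypoint.sh (illustrative): run the setup logic, then hand PID 1 over
# to the compound command supplied by docker CMD or `docker run` arguments
set -e
echo "Run command: $*"
exec "$@"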

qsub Job using GNU parallel not running

I am trying to execute a qsub job on multiple nodes (2) with a PPN of 20 using GNU parallel; however, it shows an error.
#!/bin/bash
#PBS -l nodes=2:ppn=20
#PBS -l walltime=02:00:00
#PBS -N down
cd $PBS_O_WORKDIR
module load gnu-parallel
for cdr in /scratch/data/v/mt/Downscale/*; do
    (cp /scratch/data/v/mt/DWN_FILE_NEW/* $cdr/)
    (cd $cdr && parallel -j20 --sshloginfile $PBS_NODEFILE 'echo {} | ./vari_1st_imge' ::: *.DS0)
done
When I run the above code I get the following error (please note that all the paths are properly checked, and the same code without qsub runs properly on a normal computer):
$ ./down
parallel: Error: Cannot open echo {} | ./vari_1st_imge.
& for $qsub down -- no output is creating
I am using parallel --version
GNU parallel 20140622
Please help to solve the problem
First try adding --dryrun to parallel.
But my feeling is that $PBS_NODEFILE is not set for some reason, and that GNU Parallel tries to read the command as the --sshloginfile.
To test this:
echo $PBS_NODEFILE
(cd $cdr && parallel --sshloginfile $PBS_NODEFILE -j20 'echo {} | ./vari_1st_imge' ::: *.DS0 )
If GNU Parallel now tries to open -j20, then it is clear that $PBS_NODEFILE is empty.
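To fail fast instead, a small guard could be added to the job script before the parallel call (a sketch):
# abort early if PBS_NODEFILE is unset or names an empty file
if [ ! -s "${PBS_NODEFILE:-}" ]; then
    echo "PBS_NODEFILE is unset or empty" >&2
    exit 1
fi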

Setting environment variables for multiple commands in bash one-liner

Let's say I have the following command:
$> MYENVVAR=myfolder echo $MYENVVAR && MYENVVAR=myfolder ls $MYENVVAR
My point is that MYENVVAR=myfolder is repeated.
Is it possible to set it once for both "&&" separated commands while keeping the command on one line?
Assuming you actually need it as an environment variable (even though the example code does not really need an environment variable; some shell variables are not environment variables):
(export MYENVVAR=myfolder; echo $MYENVVAR && ls $MYENVVAR)
If you don't need it as an environment variable, then:
(MYENVVAR=myfolder; echo $MYENVVAR && ls $MYENVVAR)
The parentheses create a sub-shell; environment variables (and plain variables) set in the sub-shell do not affect the parent shell. In both commands shown, the variable is set once and then used twice, once by each of the two commands.
Parentheses spawn a new process, which can set its own variables:
( MYENVVAR=myfolder; echo 1: $MYENVVAR; ); echo 2: $MYENVVAR;
1: myfolder
2:
Wrapping the commands into a string and using eval on them is one way not yet mentioned:
a=abc eval 'echo $a; echo $a'
a=abc eval 'echo $a && echo $a'
This works because eval is a builtin: the temporary assignment a=abc is in effect while eval runs, and the single quotes delay the expansion of $a until eval parses the string.
Or, if you want to use a general-purpose many-to-many mapping between environment variables and commands, without the need to quote your commands, you can use my trap-based function below:
envMulti()
{
    shopt -s extdebug;
    PROMPT_COMMAND="$(trap -p DEBUG | tee >(read -n 1 || echo "trap - DEBUG")); $(shopt -p extdebug); PROMPT_COMMAND=$PROMPT_COMMAND";
    eval "trap \"\
      [[ \\\"\\\$BASH_COMMAND\\\" =~ ^trap ]] \
      || { eval \\\"$@ \\\$BASH_COMMAND\\\"; false; }\" DEBUG";
}
Usage:
envMulti a=aaa b=bbb; eval 'echo $a'; eval 'echo $b'
Note: the eval 'echo...'s above have nothing to do with my script; you can never do a=aaa echo $a directly, because the $a gets expanded too early.
Or use it with env if you prefer (it actually prefixes any commands with anything):
echo -e '#!/bin/bash\n\necho $a' > echoScript.sh
chmod +x echoScript.sh
envMulti env a=aaa; ./echoScript.sh; ./echoScript.sh
Note: created a test script just to demonstrate usage with env, which can't accept built-ins like eval as used in the earlier demo.
Oh, and the above were all intended for running your own shell commands by-hand. If you do anything other than that, make sure you know all the cautions about using eval -- i.e. make sure you trust the source of the commands, etc.
Did you consider using export, like
export MYENVVAR=myfolder
and then typing your commands like echo $MYENVVAR (that would work even in sub-shells), etc.?

bash - errors trying to pipe commands to run to separate function

I'm trying to get this function for parallelizing my bash scripts working. The idea is simple: instead of running each command sequentially, I pipe the commands I want to run to this function; it does a while read line, runs the jobs in the background for me, and takes care of the logistics... it doesn't work, though. I added set -x around where the commands are executed, and it looks like I'm getting weird quotes around the stuff I want executed. What should I do?
runParallel () {
    while read line
    do
        while [ "`jobs | wc -l`" -eq 8 ]
        do
            sleep 2
        done
        {
            set -x
            ${line}
            set +x
        } &
    done
    while [ "`jobs | wc -l`" -gt 0 ]
    do
        sleep 1
        jobs >/dev/null 2>/dev/null
        echo sleeping
    done
}

for H in `ypcat hosts | grep fmez | grep -v mgmt | cut -d\  -f2 | sort -u`
do
    echo 'ping -q -c3 $H 2>/dev/null 1>/dev/null && echo $H - UP || echo $H - DOWN'
done | runParallel
When I run it, I get output like the following:
> ./myscript.sh
+ ping -q -c3 '$H' '2>/dev/null' '1>/dev/null' '&&' echo '$H' - UP '||' echo '$H' - DOWN
Usage: ping [-LRUbdfnqrvVaA] [-c count] [-i interval] [-w deadline]
[-p pattern] [-s packetsize] [-t ttl] [-I interface or address]
[-M mtu discovery hint] [-S sndbuf]
[ -T timestamp option ] [ -Q tos ] [hop1 ...] destination
+ set +x
sleeping
>
The quotes in the set -x output are not the problem, at most they are another result of the problem. The main problem is that ${line} is not the same as eval ${line}.
When a variable is expanded, the resulting words are not treated as shell reserved constructs. This is expected; it means that e.g.
A="some text containing > ; && and other weird stuff"
echo $A
does not shout about invalid syntax but prints the variable value.
But in your function it means that all the words in ${line}, including 2>/dev/null and the like, are passed as arguments to ping, which set -x output nicely shows, and so ping complains.
If you want to execute complicated command lines with redirections and conditionals from variables, you will have to use eval.
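A minimal sketch of that fix inside the asker's function, changing only the job body (set -x kept for the same tracing effect):
{
    set -x
    # eval re-parses the line, so 2>/dev/null, && and || take effect
    eval "${line}"
    set +x
} &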
If I'm understanding this correctly, you probably don't want single quotes in your echo command. Single quotes are literal strings, and don't interpret your bash variable $H.
Like many users of GNU Parallel you seem to have written your own parallelizer.
If you have GNU Parallel http://www.gnu.org/software/parallel/ installed you can do this:
cat hosts | parallel -j8 'ping -q -c3 {} 2>/dev/null 1>/dev/null && echo {} - UP || echo {} - DOWN'
You can install GNU Parallel simply by:
wget http://git.savannah.gnu.org/cgit/parallel.git/plain/src/parallel
chmod 755 parallel
cp parallel sem
Watch the intro videos for GNU Parallel to learn more:
https://www.youtube.com/playlist?list=PL284C9FF2488BC6D1
Put your command in an array.
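That one-liner presumably means something like the following sketch: an array keeps each word of the command intact, while redirections and && / || still have to be written literally, outside the array:
# build the command as an array instead of a string
cmd=(ping -q -c3 "$H")
# "${cmd[@]}" expands to the original words, unsplit and unglobbed
"${cmd[@]}" >/dev/null 2>&1 && echo "$H - UP" || echo "$H - DOWN"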
