execute program in various folders sequentially - linux

I have two folders named one and two, placed in /home/. There is an (identical) executable prog.x in each folder, which takes 1 hour to finish after being executed. I would like to first execute /home/one/prog.x and then /home/two/prog.x (so not at the same time).
I would like to make a script that does this for me. Is there a smart way to achieve this? Note that prog.x works by loading in a data file, which is located in the same folder.

Since your program (unwisely?) has an implicit dependence on its working directory, you may want to consider using subshells, with a ; to separate the sequential commands.
In a bash-style shell you could do something like:
(cd /home/one ; ./prog.x) ; (cd /home/two ; ./prog.x)
If you want to make a more general solution, you could use a for loop over a list:
for d in one two ; do cd /home/$d ; ./prog.x ; done
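If you also want the second run to happen only when the first succeeds, a slightly more defensive variant wraps each iteration in a subshell and stops on the first failure (a sketch; the folder names are from the question):
#!/bin/bash
for d in one two; do
    # the subshell keeps the cd from leaking into the next iteration
    ( cd "/home/$d" && ./prog.x ) || { echo "prog.x failed in /home/$d" >&2; exit 1; }
done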

Related

Use more than one core in bash

I have a Linux tool that (greatly simplifying) cuts out the sequences specified in an IlluminaSeq file. I have 32 files to process. One file takes about 5 hours to process. I have a CentOS server with 128 cores.
I've found a few solutions, but each one ends up using only one core. The last one seems to fire off 32 nohups, but it still pushes everything through a single core.
My question is, does anyone have any idea how to use the server's potential? Because basically every file can be processed independently, there are no relations between them.
This is the current version of the script, and I don't know why it only uses one core. I wrote it with the help of advice here on Stack Overflow and elsewhere on the Internet:
#!/bin/bash
FILES=/home/daw/raw/*
count=0
for f in $FILES
do
    base=${f##*/}
    echo "process $f file..."
    nohup /home/daw/scythe/scythe -a /home/daw/scythe/illumina_adapters.fa -o "OUT$base" $f &
    (( count++ ))
    if (( count = 31 )); then
        wait
        count=0
    fi
done
To explain: FILES is the list of files from the raw folder.
The "core" line executes nohup: the first path is the path to the tool, the -a path is the file with the adapter patterns to cut, and -o writes the output under the same file name as the input with OUT prepended. The last parameter is the input file to be processed.
Here is the tool's readme:
https://github.com/vsbuffalo/scythe
Does anybody know how you can handle it?
P.S. I also tried moving nohup before count, but it still uses one core. There are no limits set on the server.
IMHO, the most likely solution is GNU Parallel, so you can run, say, up to 64 jobs in parallel with something like this:
parallel -j 64 /home/daw/scythe/scythe -a /home/daw/scythe/illumina_adapters.fa -o OUT{/} {} ::: /home/daw/raw/*
This has the benefit that jobs are not batched: 64 stay running at all times, with a new one started as each job finishes. That beats waiting potentially 4.9 hours for all 32 of your jobs to finish before starting the last one, which then takes a further 5 hours. Note that I chose 64 jobs arbitrarily here; if you don't specify otherwise, GNU Parallel runs 1 job per CPU core.
Useful additional parameters are:
parallel --bar ... gives a progress bar
parallel --dry-run ... does a dry run so you can see what it would do without actually doing anything
If you have multiple servers available, you can add them in a list and GNU Parallel will distribute the jobs amongst them too:
parallel -S server1,server2,server3 ...
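As a concrete usage example, you could preview the generated commands before committing 5 hours per file by combining --dry-run with the command from above (a sketch; the paths and -j value are taken from the question, and {/} is the basename of each input file):
parallel --dry-run -j 64 /home/daw/scythe/scythe -a /home/daw/scythe/illumina_adapters.fa -o OUT{/} {} ::: /home/daw/raw/*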

balancing the bash calculations

We have a tool for cutting adaptors, https://github.com/vsbuffalo/scythe/blob/master/README.md, and we want to run it on all the files in the raw folder, writing the output for each file separately as OUT+filename.
Something is wrong with the script I wrote: it doesn't take each file separately, and the whole thing doesn't work properly. It just generates empty files named OUT+filename.
The expected operation looks like:
take file1, use scythe on it, write output as OUTfile1
take file2 etc.
#!/bin/bash
FILES=/home/dave/raw/*
for f in $FILES
do
    echo "Processing the $f file..."
    /home/dave/scythe/scythe -a /home/dave/scythe/illumina_adapters.fa -o "OUT"+$f $f
done
Additionally, I noticed (testing on a single file) that the script uses only one core out of the 130 available. Is there any way to improve that?
There is no string concatenation operator in shell. Use juxtaposition instead; it's "OUT$f", not "OUT"+$f.
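A corrected version of the loop might look like this (a sketch: besides dropping the +, it also strips the directory part with ${f##*/}, since prefixing OUT onto an absolute path like /home/dave/raw/file1 would point into a non-existent OUT/home/... directory):
#!/bin/bash
for f in /home/dave/raw/*
do
    base=${f##*/}    # strip the leading directories, keep only the filename
    echo "Processing the $f file..."
    /home/dave/scythe/scythe -a /home/dave/scythe/illumina_adapters.fa -o "OUT$base" "$f"
done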

abyss-pe: variables to assemble multiple genomes with one command

How do I rewrite the following to properly replace the variable with my genomeID? (I have this method working in the SPAdes and MaSuRCA assemblers, so it's something about ABySS that doesn't like this approach, and I need a work-around.)
I am trying to run abyss on a cluster server but am running into trouble with how abyss-pe is reading my variable input:
my submit file loads a script for each genome listed in a .txt file
my script writes in the genome name throughout the script
the abyss assembly fumbles the variable replacement
Input.sub:
queue genomeID from genomelisttest.txt
Input.sh:
#!/bin/bash
genomeID=$1
cp /mnt/gluster/harrow2/trim_output/${genomeID}_trim.tar.gz ./
tar -xzf ${genomeID}_trim.tar.gz
rm ${genomeID}_trim.tar.gz
for k in `seq 86 10 126`; do
    mkdir k$k
    abyss-pe -C k$k name=${genomeID} k=$k lib='pe1 pe2' pe1='../${genomeID}_trim/${genomeID}_L1_1.fq.gz ../${genomeID}_trim/${genomeID}_L1_2.fq.gz' pe2='../${genomeID}_trim/${genomeID}_L2_1.fq.gz ../${genomeID}_trim/${genomeID}_L2_2.fq.gz'
done
Error that I get:
`../enome_trim/enome_L1_1.fq.gz': No such file or directory
This is where "enome" is supposed to be replaced with a five-digit genomeID. The replacement happens properly in the earlier parts of the script, right up to the point where abyss comes in.
pe1='../'"$genomeID"'_trim/'"$genomeID"'_L1_1.fq.gz ...'
I closed the single quote before each variable and reopened it after, so the variable itself sits in double quotes where the shell can expand it.
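Applied to the command in the script, both libraries would then read (a sketch; plain double quotes around each whole value are equivalent to the mixed quoting above, just easier to read):
abyss-pe -C k$k name=${genomeID} k=$k lib='pe1 pe2' \
    pe1="../${genomeID}_trim/${genomeID}_L1_1.fq.gz ../${genomeID}_trim/${genomeID}_L1_2.fq.gz" \
    pe2="../${genomeID}_trim/${genomeID}_L2_1.fq.gz ../${genomeID}_trim/${genomeID}_L2_2.fq.gz"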

Shell script Fetching data from 5 different directories

I'm trying to run a shell script to get data from multiple directories.
My target (targetDir) lists 5 directories, so when the program is executed it should search for data in these 5 different directories. But when I execute it, it treats all 5 folders as one string. Any advice?
targetDir="snavis_bub snavis_bub2 snavis_bub3 snavis_hdw snavis_ldw"
datadir=/opt/pkg/home/tools/zform/marnel/$targetDir/of_inspect
Upon execute:
./orsInspect.sh: line 60:
cd: /opt/pkg/home/tools/zform/marnel/snavis_bub,snavis_bub2,snavis_bub3,snavis_hdw,snavis_ldw/oref_inspect: No such file or directory
There are many things you can do. For example, you can use an array and a for loop, and perform the task on each iteration:
#!/bin/bash
declare -a targetDirs=("snavis_bub" "snavis_bub2" "snavis_bub3" "snavis_hdw" "snavis_ldw")
for the_dir in "${targetDirs[@]}"; do
datadir="/opt/pkg/home/tools/zform/marnel/${the_dir}/of_inspect"
echo "$datadir"
# ... do something for each datadir
done
example output (just echoing):
/opt/pkg/home/tools/zform/marnel/snavis_bub/of_inspect
/opt/pkg/home/tools/zform/marnel/snavis_bub2/of_inspect
/opt/pkg/home/tools/zform/marnel/snavis_bub3/of_inspect
/opt/pkg/home/tools/zform/marnel/snavis_hdw/of_inspect
/opt/pkg/home/tools/zform/marnel/snavis_ldw/of_inspect
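Alternatively, since the original targetDir is a plain space-separated string, an unquoted expansion in the for loop also iterates word by word (a sketch; the array version above is safer if a name could ever contain spaces):
targetDir="snavis_bub snavis_bub2 snavis_bub3 snavis_hdw snavis_ldw"
for the_dir in $targetDir; do    # deliberately unquoted: word splitting yields one name at a time
    datadir="/opt/pkg/home/tools/zform/marnel/${the_dir}/of_inspect"
    echo "$datadir"
done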

What does this bash script command mean (sed -e)?

I'm totally new to bash scripting, but I want to solve this problem.
the command is:
objfil=`echo ${srcfil} | sed -e "s,c$,o,"`
The idea of the bash script is to check the source files and see whether there is a matching object file in the OBJ directory. If there is, the rest of the program runs smoothly; if not, the iteration terminates, the current source file is skipped, and it moves on to the next one. It works with .c files but not with headers, since the object filenames are derived from the .c files. I want to rewrite this command so that it checks the object files not just for the .c files but for the .h files too, but without skipping them. I know I have to change something else as well, but I need to understand exactly what this line does before moving on. Thanks. (Sorry for my English.)
UPDATE:
if test -r ${curOBJdir}/${objfil}
then
    cp -v ${srcfil} ./SAVEDSRC/${srcfil}
    fdone="NO"
    linenums=ALL
else
    fdone="YES"
    err="${curOBJdir}/${objfil} is missing - ${srcfil} skipped)"
    echo ${err}
    echo ${err} >>${log}
fi
while test ${fdone} == "NO"
do
#rest of code ...
Here is the rest of the program. I tried to comment out the "test" part to skip the comparison, because I only want the script to work on .h files without checking whether e.g. abc.h has an abc.o file. (The object file generation is needed because at the end of the script there is a comparison between the hexdumps of the original and modified object files.) The whole script is for replacing basic types with typedefs, e.g. int with sint32_t.
This particular command substitutes a c right before the line end with an o:
srcfil=abcd.c
objfil=`echo ${srcfil} | sed -e "s,c$,o,"`
echo $objfil
Output:
abcd.o
P.S. It uses a non-default match/replace separator for the s command: , instead of the usual /.
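If you also want header names mapped to object names (assuming abc.h should be checked against abc.o), one possible tweak is a character class, so a trailing c or h both become o. A sketch:
srcfil=defs.h
objfil=`echo ${srcfil} | sed -e "s,[ch]$,o,"`
echo $objfil
Output:
defs.o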
