Shell script fetching data from 5 different directories - Linux

I'm trying to run a shell script to get data from multiple directories.
My target (targetDir) has 5 directories. So the program, when executed, should search for data in these 5 different directories, but when I execute it, it treats all 5 folders as a single string. Any advice?
targetDir="snavis_bub snavis_bub2 snavis_bub3 snavis_hdw snavis_ldw"
datadir=/opt/pkg/home/tools/zform/marnel/$targetDir/of_inspect
Upon execution I get:
./orsInspect.sh: line 60:
cd: /opt/pkg/home/tools/zform/marnel/snavis_bub,snavis_bub2,snavis_bub3,snavis_hdw,snavis_ldw/oref_inspect: No such file or directory

There are many things you can do. For example, you can use an array and a for loop, performing a task on each iteration of the loop:
#!/bin/bash
declare -a targetDirs=("snavis_bub" "snavis_bub2" "snavis_bub3" "snavis_hdw" "snavis_ldw")
for the_dir in "${targetDirs[@]}"; do
    datadir="/opt/pkg/home/tools/zform/marnel/${the_dir}/of_inspect"
    echo "$datadir"
    # ... do something for each datadir
done
example output (just echoing):
/opt/pkg/home/tools/zform/marnel/snavis_bub/of_inspect
/opt/pkg/home/tools/zform/marnel/snavis_bub2/of_inspect
/opt/pkg/home/tools/zform/marnel/snavis_bub3/of_inspect
/opt/pkg/home/tools/zform/marnel/snavis_hdw/of_inspect
/opt/pkg/home/tools/zform/marnel/snavis_ldw/of_inspect
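If you'd rather keep the original space-separated string from your script, you can also let the shell's word splitting iterate over it. A minimal sketch (the variable is deliberately left unquoted in the for line so it splits on whitespace):

```shell
#!/bin/bash
# space-separated list, as in the original script
targetDir="snavis_bub snavis_bub2 snavis_bub3 snavis_hdw snavis_ldw"

for the_dir in $targetDir; do   # unquoted on purpose: split on whitespace
    datadir="/opt/pkg/home/tools/zform/marnel/${the_dir}/of_inspect"
    echo "$datadir"
done
```

The array form above is still preferable when directory names may contain spaces, since word splitting would then break them apart.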

Related

How to execute sequential commands in Linux

I have what I guess is a pretty basic question about Linux. Let us say I have a series of files which come in pairs. Each pair of files is the input of my command, which produces a single output. Then I would like to execute the same command with the following pair, producing a new output.
Let us say, for the example, that I have four files: F1.R1, F1.R2, F2.R1, and F2.R2
My first command would be:
myfunction F1.R1 F1.R2 -output F1
And the second:
myfunction F2.R1 F2.R2 -output F2
I would like to produce a command so that all pairs are treated sequentially until all files are processed.
Thanks a lot for your help.
Best
Loop over the .R1 files, replace .R1 with .R2 to get the other filename of the pair, then execute the command:
for file in *.R1
do
    file2=${file/.R1/.R2}
    output=${file%.R1}
    myfunction "$file" "$file2" -output "$output"
done
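If some .R1 files might lack a matching .R2 partner, or the glob might match nothing at all, a slightly more defensive version of the same loop could look like this (a sketch; myfunction stands for the command from the question):

```shell
#!/bin/bash
shopt -s nullglob               # *.R1 expands to nothing when there are no matches

for file in *.R1; do
    file2=${file%.R1}.R2        # swap the .R1 suffix for .R2
    output=${file%.R1}          # strip the suffix for the output name
    if [ -e "$file2" ]; then
        myfunction "$file" "$file2" -output "$output"
    else
        printf 'no matching .R2 for %s, skipping\n' "$file" >&2
    fi
done
```

Without nullglob, a directory with no .R1 files would run the loop once with the literal string `*.R1`.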

Using bash to loop through nested folders to run script in current working directory

I've got (what feels like) a fairly simple problem but my complete lack of experience in bash has left me stumped. I've spent all day trying to synthesize a script from many different SO threads explaining how to do specific things with unintuitive commands, but I can't figure out how to make them work together for the life of me.
Here is my situation: I've got a directory full of nested folders each containing a file with extension .7 and another file with extension .pc, plus a whole bunch of unrelated stuff. It looks like this:
Folder A
Folder 1
Folder x
data_01.7
helper_01.pc
...
Folder y
data_02.7
helper_02.pc
...
...
Folder 2
Folder z
data_03.7
helper_03.pc
...
...
Folder B
...
I've got a script that I need to run in each of these folders that takes in the name of the .7 file as an input.
pc_script -f data.7 -flag1 -other_flags
The current working directory needs to be the folder with the .7 file when running the script and the helper.pc file also needs to be present in it. After the script is finished running, there are a ton of new files and directories. However, I need to take just one of those output files, result.h5, and copy it to a new directory maintaining the same folder structure but with a new name:
Result Folder/Folder A/Folder 1/Folder x/new_result1.h5
I then need to run the same script again with a different flag, flag2, and copy the new version of that output file to the same result directory with a different name, new_result2.h5.
The folders all have pretty arbitrary names, though there aren't any spaces or special characters beyond underscores.
Here is an example of what I've tried:
#!/bin/bash
DIR=".../project/data"
for d in */ ; do
for e in */ ; do
for f in */ ; do
for PFILE in *.7 ; do
echo "$d/$e/$f/$PFILE"
cd "$DIR/$d/$e/$f"
echo "Performing operation 1"
pc_script -f "$PFILE" -flag1
mkdir -p ".../results/$d/$e/$f"
mv "results.h5" ".../project/results/$d/$e/$f/new_results1.h5"
echo "Performing operation 2"
pc_script -f "$PFILE" -flag 2
mv "results.h5" ".../project/results/$d/$e/$f/new_results2.h5"
done
done
done
done
Obviously, this didn't work. I've also tried using find with -execdir but then I couldn't figure out how to insert the name of the file into the script flag. I'd appreciate any help or suggestions on how to carry this out.
Another, perhaps more flexible, approach to the problem is to use the find command with the -exec option to run a short "helper-script" for each file ending in ".7" found below a directory path. The -name option allows find to locate all files ending in ".7" below a given directory using simple file-globbing (wildcards). The helper-script then performs the same operation on each file found by find and handles copying result.h5 to the proper directory.
The form of the command will be:
find /path/to/search -type f -name "*.7" -exec /path/to/helper-script '{}' \;
Where -type f tells find to only return regular files (not directories) and -name "*.7" matches names ending in ".7". Your helper-script needs to be executable (e.g. chmod +x helper-script) and, unless it is in your PATH, you must provide the full path to the script in the find command. The '{}' will be replaced by the filename (including relative path) and passed as an argument to your helper-script. The \; simply terminates the command executed by -exec.
(note there is another form for -exec called -execdir and another terminator '+' that can be used to process the command on all files in a given directory -- that is a bit safer, but has additional PATH requirements for the command being run. Since you have only one ".7" file per-directory -- there isn't much benefit here)
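To see the -execdir behaviour concretely, here is a small self-contained demo (the folder and file names are made up for illustration): it builds a throwaway tree, then prints the directory find switched into for each ".7" file it matched:

```shell
#!/bin/bash
tmp=$(mktemp -d)                          # throwaway tree for the demo
mkdir -p "$tmp/FolderA/Folderx"
touch "$tmp/FolderA/Folderx/data_01.7"

# -execdir runs the command from the matched file's own directory,
# and passes the filename as ./basename
find "$tmp" -type f -name "*.7" -execdir sh -c 'printf "%s %s\n" "$PWD" "$1"' _ '{}' \;

rm -rf "$tmp"                             # clean up the demo tree
```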
The helper-script just does what you need to do in each directory. Based on your description it could be something like the following:
#!/bin/bash

dir="${1%/*}"                     ## trim file.7 from end of path

cd "$dir" || {                    ## change to directory or handle error
    printf "unable to change to directory %s\n" "$dir" >&2
    exit 1
}

destdir="/Result_Folder/$dir"     ## set destination dir for result.h5

mkdir -p "$destdir" || {          ## create with all parent dirs or exit
    printf "unable to create directory %s\n" "$destdir" >&2
    exit 1
}

ls *.pc >/dev/null 2>&1 || exit 1 ## check a .pc file exists or exit

file7="${1##*/}"                  ## trim path from file.7 name

pc_script -f "$file7" -flags1 -other_flags   ## first run
## check result.h5 exists and is non-empty, then copy to destdir
[ -s "result.h5" ] && cp -a "result.h5" "$destdir/new_result1.h5"

pc_script -f "$file7" -flags2 -other_flags   ## second run
## check result.h5 exists and is non-empty, then copy to destdir
[ -s "result.h5" ] && cp -a "result.h5" "$destdir/new_result2.h5"
Which essentially stores the path part of the file.7 argument in dir and changes to that directory. If the script is unable to change to the directory (due to read permissions, etc.) the error is handled and the script exits. Next, the full directory structure is created below your Result_Folder with mkdir -p, with the same error handling if the directory cannot be created.
ls is used as a simple check to verify that a file ending in ".pc" exists in that directory. There are other ways to do this, such as piping the results to wc -l, but that spawns additional subshells that are best avoided.
(also note that Linux and Mac have files ending in ".pc" used by pkg-config when building programs from source -- they should not conflict with your files -- but be aware they exist in case you start chasing down weird ".pc" files)
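If you want to avoid parsing ls output entirely, the same existence check can be done by expanding the glob into an array, a bash-specific sketch:

```shell
#!/bin/bash
# expand *.pc into an array; with no matches the literal pattern remains
# as the single element, so test whether that first element actually exists
pcfiles=( *.pc )
if [ -e "${pcfiles[0]}" ]; then
    echo "found ${#pcfiles[@]} .pc file(s)"
else
    echo "no .pc file here" >&2
fi
```

This stays entirely within the shell, with no external command and no subshell.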
After all tests are performed, the path is trimmed from the current ".7" filename, storing just the filename in file7. The file7 variable is then used in your pc_script command (which should also include the full path to the script if it is not in your PATH). After pc_script is run, [ -s "result.h5" ] is used to verify that result.h5 exists and is non-empty before copying that file to your Result_Folder location.
That should get you started. Using find to locate all .7 files is a simple way to let the tool designed to find files do its job -- rather than trying to hand-roll your own solution. That way you only have to concentrate on what should be done for each file found. (note: I don't have pc_script or the files, so I have not tested this end-to-end, but it should be very close if not right-on-the-money)
There is nothing wrong with writing your own routine, but using find eliminates a lot of the area where bugs can hide in your own solution.
Let me know if you have further questions.

abyss-pe: variables to assemble multiple genomes with one command

How do I rewrite the following to properly replace the variable with my genomeID? (I have it working with this method in the Spades and Masurca assemblers, so it's something about Abyss that doesn't like this approach and I need a work-around.)
I am trying to run abyss on a cluster server but am running into trouble with how abyss-pe is reading my variable input:
my submit file loads a script for each genome listed in a .txt file
my script writes in the genome name throughout the script
the abyss assembly fumbles the variable replacement
Input.sub:
queue genomeID from genomelisttest.txt
Input.sh:
#!/bin/bash
genomeID=$1
cp /mnt/gluster/harrow2/trim_output/${genomeID}_trim.tar.gz ./
tar -xzf ${genomeID}_trim.tar.gz
rm ${genomeID}_trim.tar.gz
for k in `seq 86 10 126`; do
mkdir k$k
abyss-pe -C k$k name=${genomeID} k=$k lib='pe1 pe2' pe1='../${genomeID}_trim/${genomeID}_L1_1.fq.gz ../${genomeID}_trim/${genomeID}_L1_2.fq.gz' pe2='../${genomeID}_trim/${genomeID}_L2_1.fq.gz ../${genomeID}_trim/${genomeID}_L2_2.fq.gz'
done
Error that I get:
`../enome_trim/enome_L1_1.fq.gz': No such file or directory
This is where "enome" is supposed to be replaced with a five-digit genomeID. The substitution happens properly in the earlier part of the script, up to the point where abyss comes in.
Inside single quotes the shell does not expand variables, so ${genomeID} was being passed through literally. Close the single quote before each variable and wrap the variable itself in double quotes so the shell expands it, then reopen the single quote:
pe1='../'"$genomeID"'_trim/'"$genomeID"'_L1_1.fq.gz ...'
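Equivalently, you can build the whole pe1/pe2 values in double quotes, which expand variables while preserving the string as one argument. A sketch using the paths from the question, with echo standing in so you can verify the expansion (the real script would pass pe1/pe2 to abyss-pe instead):

```shell
#!/bin/bash
genomeID=${1:-12345}    # hypothetical five-digit genome ID for the demo

pe1="../${genomeID}_trim/${genomeID}_L1_1.fq.gz ../${genomeID}_trim/${genomeID}_L1_2.fq.gz"
pe2="../${genomeID}_trim/${genomeID}_L2_1.fq.gz ../${genomeID}_trim/${genomeID}_L2_2.fq.gz"

# in the real script this would be something like:
#   abyss-pe -C k$k name="$genomeID" k=$k lib='pe1 pe2' pe1="$pe1" pe2="$pe2"
echo "$pe1"
echo "$pe2"
```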

Using two variables in a loop to access two diff file types in the same folder shell scripting linux

I am trying to access two different file types in one loop.
I have a1.in-a10.in files and b1.out-b10.out files.
I want to access both files simultaneously, without using nested loops.
for f1,f2 `ls *.in` `ls *.out`;do
echo "$f1 $f2"
done
I get an "f1,f2: not a valid identifier" error.
You can process these with a simple counting loop, generating both filenames from the same number:
for num in $(seq 1 10); do
    echo "a$num.in b$num.out"   # processing command here
done
One way is this (here assuming bash):
$ touch a{1..10}.in b{1..10}.out
$ ls
a1.in   a2.in  a4.in  a6.in  a8.in  b1.out   b2.out  b4.out  b6.out  b8.out
a10.in  a3.in  a5.in  a7.in  a9.in  b10.out  b3.out  b5.out  b7.out  b9.out
$ for i in {1..10}; do echo a$i.in b$i.out; done
a1.in b1.out
a2.in b2.out
a3.in b3.out
a4.in b4.out
a5.in b5.out
a6.in b6.out
a7.in b7.out
a8.in b8.out
a9.in b9.out
a10.in b10.out
Here I'm just echoing the strings, but you can use any command you like (diff, cat, etc.) instead of echo.
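If either sequence might be incomplete, you can guard each pair before running the real command. A sketch, with diff standing in for whatever processing you need:

```shell
#!/bin/bash
for i in {1..10}; do
    in="a$i.in"
    out="b$i.out"
    if [ -e "$in" ] && [ -e "$out" ]; then
        diff "$in" "$out"                        # or any other command on the pair
    else
        echo "pair $i incomplete, skipping" >&2  # one file of the pair is missing
    fi
done
```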

execute program in various folders sequentially

I have two folders named one and two, placed in /home/. There is an (identical) executable prog.x in each folder, which takes 1 hour to finish after being executed. I would like to first execute /home/one/prog.x and then /home/two/prog.x (so not at the same time).
I would like to make a script that does this for me. Is there a smart way to achieve this? Note that prog.x works by loading in a data file, which is located in the same folder.
Since your program (unwisely?) has an implicit dependence on its executing directory, you may want to consider using subshells, and a ; to separate sequential commands
in a bash style shell you could do something like:
(cd /home/one ; ./prog.x) ; (cd /home/two ; ./prog.x)
If you want to make a more general solution, you could use a for loop over a list:
for d in one two ; do cd /home/$d ; ./prog.x ; done
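Combining the two ideas, a loop that uses a subshell per iteration keeps each cd local and can stop the sequence on the first failure. A sketch using the /home/one and /home/two paths from the question:

```shell
#!/bin/bash
for d in /home/one /home/two; do
    ( cd "$d" && ./prog.x ) || {    # subshell: the cd does not leak out
        echo "run in $d failed" >&2
        break                       # stop the sequence on failure
    }
done
```

Because the cd happens inside the subshell, the loop's own working directory never changes, so each iteration starts from the same place.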
