Using "for" to align paired end sequences - linux

I have a folder with many paired-end files (1.1.fq 1.2.fq 2.1.fq 2.2.fq ...). I want to use a "for" loop to do the alignment for each pair (*.1.fq *.2.fq) and generate 2 outputs, *.stats.txt and *.sam.
I wrote the following command:
for x in *.fq ; do
~/Pedro_Dias/Mamão/Single_end/novocraft/novoalign -d cpapaya.novoIndex -f demultiplex-fq/$x *.1.fq demultiplex-fq/$x *.2.fq -x 3 -H -a -o SAM 2> demultiplex-sam/$x *.stats.txt > demultiplex-sam/$x *.sam;
done
The code returns the error:
demultiplex-sam/demultiplex-fq/98.1.fq*.stats.txt:No file or directory
P.s. My files were in demultiplex-fq folder and the output must go to the demultiplex-sam folder. I'm working in a folder that contains the demultiplex-fq demultiplex-sam folders.

You should just loop over one file in the pairs. Then replace .1.fq with .2.fq to get the other file in that pair.
The wildcards need to include the directory name, and then you have to replace the directory when generating the output filenames.
for x in demultiplex-fq/*.1.fq
do
    # Second file of the pair: 98.1.fq -> 98.2.fq
    y=${x/.1.fq/.2.fq}
    # Same name, but in the output directory
    stats=${x/demultiplex-fq/demultiplex-sam}.stats.txt
    sam=${x/demultiplex-fq/demultiplex-sam}.sam
    ~/Pedro_Dias/Mamão/Single_end/novocraft/novoalign -d cpapaya.novoIndex -f "$x" "$y" -x 3 -H -a -o SAM 2> "$stats" > "$sam"
done
You don't use wildcards in the command, you just use the $x and $y variables.
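The name derivation can be checked without running the aligner. Here is a minimal sketch, assuming the inputs are named N.1.fq / N.2.fq and that the outputs should be called N.stats.txt and N.sam in demultiplex-sam (so without the .1.fq part in the name). The helper name pair_names is made up for illustration, and it only prints the names:

```shell
# Hypothetical helper: given the first file of a pair, print the mate file,
# the stats file and the SAM file, one per line. Nothing is aligned here.
pair_names() {
    x=$1
    y=${x%.1.fq}.2.fq        # mate file: .../98.1.fq -> .../98.2.fq
    base=${x##*/}            # drop the directory: 98.1.fq
    base=${base%.1.fq}       # drop the suffix: 98
    printf '%s\n' "$y" "demultiplex-sam/$base.stats.txt" "demultiplex-sam/$base.sam"
}

pair_names demultiplex-fq/98.1.fq
```

Once the printed names look right, the same expansions can be dropped into the loop in place of the echo-style check.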

I used this code and it works:
for x in $( ls -1 dem*/*.fq | rev | cut -d . -f 3- | rev | sort -u ) ; do ~/Pedro_Dias/Mamão/Single_end/novocraft/novoalign -d cpapaya.novoIndex -f $x.1.fq $x.2.fq -x 3 -H -a -o SAM 2> ./bams/$(echo $x | tr / _).stats.txt > ./bams/$(echo $x | tr / _).sam; done

Related

Linux: append all filenames in path to text file

I want to add the filenames of all files of a certain type (*.cub) in the path to a text file in the same path. This file will become the batch (.submit) file. That I can run overnight. I also need to adapt the name a bit.
I do not really know how to describe it better, so I'll give an example:
Let's say I have three files: 001.cub, 002.cub & 003.cub
Then the final text file must be:
[program] -i 001.cub -o 001.vdb
[program] -i 002.cub -o 002.vdb
[program] -i 003.cub -o 003.vdb
It seems a fairly easy operation, but I simply can't get it right.
Also, it really has to become a .submit (or at least some text) file. I cannot run the program immediately.
I hope someone can help!
A simple for loop will do the job:
for i in *.cub
do
    # Strip the .cub extension to get the base name
    b=$(basename "$i" .cub)
    echo "program -i \"$b.cub\" -o \"$b.vdb\""
done >output.txt
1. Create an empty sh file
2. List the files *.cub and loop through them
3. Store the sequence by splitting on dot [.]
4. echo the required string and append to the sh file of step 1
echo -n "" > 'Run.sh'
for filename in `ls *.cub`
do
sequence=`echo $filename | cut -d "." -f1`
echo "Program -i $filename -o $sequence.vdb" >> Run.sh
done
Directly put the stream into the file as below:
for filename in `ls *.cub`
do
sequence=`echo $filename | cut -d "." -f1`
echo "Program -i $filename -o $sequence.vdb"
done > Run.sh
For everything before the extension to be retained in the variable:
for filename in `ls *.cub`
do
sequence=`echo $filename | rev | cut -d "." -f2- | rev`
echo "Program -i $filename -o $sequence.vdb"
done > Run.sh
For extracting only the numbers from the filename and use accordingly:
for filename in `ls *.cub`
do
sequence=`echo $filename | sed 's/[^0-9]*//g'`
echo "Program -i $filename -o $sequence.vdb"
done > Run.sh
This oneliner will do what you want:
ls *.cub | sort | awk '{split($1,x,"."); print "[program] -i "$1" -o "x[1]".vdb "}' > something.sh
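The variants above can also be written without parsing the output of ls, using only a glob and suffix removal. This is a sketch under assumptions: "program" and the file name batch.submit are placeholders, and the demo files are created in a scratch directory just to show the result.

```shell
# Sketch: write one command line per .cub file into a batch file.
# "program" is a placeholder for the real executable name.
make_batch() {
    out=$1
    for f in *.cub; do
        [ -e "$f" ] || continue      # no matches: the pattern stays literal, skip it
        # ${f%.cub} removes the extension, keeping everything before it
        printf 'program -i "%s" -o "%s"\n' "$f" "${f%.cub}.vdb"
    done > "$out"
}

# Demo in a scratch directory
demo=$(mktemp -d)
cd "$demo" || exit 1
touch 001.cub 002.cub 003.cub
make_batch batch.submit
cat batch.submit
```

The glob expands in sorted order, so no separate sort step is needed, and file names containing spaces survive because nothing is word-split.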

bash - loop through subdirectories, cat files and rename with directory name

I have a folder structure like this ...
data/
---B1/
name_x_1.gz
name_y_1.gz
name_z_2.gz
name_p_2.gz
---C1
name_s_1.gz
name_t_1.gz
name_u_2.gz
name_v_2.gz
I need to go in to each subdirectory (e.g. B1) and perform the following:
cat *_1.gz > B1_1.gz
cat *_2.gz > B1_2.gz
I'm having problems with the file naming part. I can get in directories using the following:
for d in */; do
cat *_1.gz > $d_1.gz
cat *_2.gz > $d_2.gz
done
However I get an error that $d is a directory -- how do I strip the name to create the concatenated filename?
Thanks
Taking your question verbatim: If you have a variable d, where you know that it ends in / (as is the case in your example), you can get the value with this last character stripped by writing ${d:0:-1} (i.e. the substring starting at the beginning, up to but excluding the last character).
Of course in your case, I would rather write the loop as
for d in *; do
which already creates the names without a trailing slash. But this is still probably not what you want, because d would take the names of the entries in the directory you have cd'ed to, whereas you want the name of the directory itself. You can obtain this, for instance, with $(basename "$PWD"), which turns your loop into (for example)
cd B1
prefix=$(basename "$PWD") # This set prefix to B1
for f in *
do
# Since your original code indicates that you want to create a *copy* of the file
# with a new name, I do the same here.
cp -v "$f" "${prefix}_$f" #
done
You can also use cat, as in your original solution, if you prefer.
If you're calling bash, you can use parameter expansion and do everything natively in the shell without spawning an external process. The expansion used here is POSIX compliant:
#!/bin/bash
for dir in data/*; do
cat "$dir/"*_1.gz > "$dir/${dir##*/}_1.gz"
cat "$dir/"*_2.gz > "$dir/${dir##*/}_2.gz"
done
Sure, just descend into the directory.
# assuming PWD = data/
for d in */; do
(
cd "$d"
cat *_1.gz > "$(basename "$d")"_1.gz
cat *_2.gz > "$(basename "$d")"_2.gz
)
done
how do I strip the name to create the concatenated filename?
The simplest and most portable is with basename.
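For comparison, the directory name can also be obtained with parameter expansion alone: ${d%/} strips the trailing slash and ${d##*/} keeps only the last path component. The sketch below is one way to put that together, with assumptions: the function name concat_per_dir is made up, and the merged files are written to a separate output directory (not something the question asks for) so that a second run does not pick up an earlier result through the *_1.gz glob. Concatenated gzip streams still form a valid gzip file, so plain cat is enough.

```shell
# Sketch: for each subdirectory of $1, concatenate its *_1.gz and *_2.gz
# files into <dirname>_1.gz and <dirname>_2.gz under the output directory $2.
concat_per_dir() {
    src=$1
    out=$2
    mkdir -p "$out"
    for d in "$src"/*/; do
        [ -d "$d" ] || continue
        d=${d%/}                 # strip trailing slash: data/B1/ -> data/B1
        name=${d##*/}            # keep the directory name only: B1
        for n in 1 2; do
            cat "$d"/*_"$n".gz > "$out/${name}_$n.gz"
        done
    done
}

# Demo with dummy (non-gzip) files, just to show the naming
demo=$(mktemp -d)
mkdir -p "$demo/data/B1" "$demo/data/C1"
printf 'x' > "$demo/data/B1/name_x_1.gz"
printf 'y' > "$demo/data/B1/name_y_1.gz"
printf 'z' > "$demo/data/B1/name_z_2.gz"
printf 's' > "$demo/data/C1/name_s_1.gz"
printf 'u' > "$demo/data/C1/name_u_2.gz"
concat_per_dir "$demo/data" "$demo/merged"
ls "$demo/merged"
```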
This requires Ed, which should hopefully be present on your machine. If not, I trust your distribution will have a package for it.
#!/bin/sh
cat >> edprint+.txt << EOF
1p
q
EOF
cat >> edpop+.txt << EOF
1d
wq
EOF
b1="${PWD}/data/B1"
c1="${PWD}/data/C1"
find "${b1}" -maxdepth 1 -type f > b1stack
find "${c1}" -maxdepth 1 -type f > c1stack
while [ $(wc -l b1stack | cut -d' ' -f1) -gt 0 ]
do
b1line=$(ed -s b1stack < edprint+.txt)
b1name=$(basename "${b1line}")
b1suffix=$(echo "${b1name}" | cut -d'_' -f3)
b1fixed=$(echo "B1_${b1suffix}")
# find already printed the full path, so use it directly as the source
mv -v "${b1line}" "${b1}/${b1fixed}"
ed -s b1stack < edpop+.txt
done
while [ $(wc -l c1stack | cut -d' ' -f1) -gt 0 ]
do
c1line=$(ed -s c1stack < edprint+.txt)
c1name=$(basename "${c1line}")
c1suffix=$(echo "${c1name}" | cut -d'_' -f3)
c1fixed=$(echo "C1_${c1suffix}")
# find already printed the full path, so use it directly as the source
mv -v "${c1line}" "${c1}/${c1fixed}"
ed -s c1stack < edpop+.txt
done
rm -v ./edprint+.txt
rm -v ./edpop+.txt
rm -v ./b1stack
rm -v ./c1stack

Batch file rename where part of the old name became directories [duplicate]

I don't know if this is possible using bash, but it would be nice to be able to do this using bash alone.
I regularly receive a bunch of files with the following name pattern:
xxx___yyy___abc__def.pdf
xxxa___y_yy___fg-h___ijdfdak.pdf
xx___v-vv___a_fasl-bk___os___23l.pdf
etc.
And I need to rename and move them into directories:
~/xxx/yyy/abc/def.pdf
~/xxxa/y_yy/fg-h/ijdfdak.pdf
~/xx/v-vv/a_fasl-bk/os/23l.pdf
Is it possible? Please help.
Make a folder based on two arguments like
mkdir -p ~/xxx/yyy/abc
move the file inside the folder
mv xxx___yyy___abc__def.pdf ~/xxx/yyy/abc/def.pdf
Or just make a script accept the file as argument
#!/bin/bash
FOLDER="$(echo "$1" | tr -s '_' | cut -d "_" -f1)"
SUBFOLDER="$(echo "$1" | tr -s '_' | cut -d "_" -f2)"
SUBSUBFOLDER="$(echo "$1" | tr -s '_' | cut -d "_" -f3)"
FILE="$(echo "$1" | tr -s '_' | cut -d "_" -f4)"
# Use $HOME here: a tilde inside double quotes is not expanded
mkdir -p "$HOME/${FOLDER}/${SUBFOLDER}/${SUBSUBFOLDER}"
mv "$1" "$HOME/${FOLDER}/${SUBFOLDER}/${SUBSUBFOLDER}/${FILE}"
Usage: ./script.sh xxx___yyy___abc__def.pdf
Not fancy but it works.
Using parameter expansion.
#!/bin/sh
for f in *.pdf; do
last=${f##*_} first=${f%%_*}
third=${f%$last*} third="${third%??*}"
third=${third##*_} second=${f%$third*}
second=${second%???*} second=${second##*_}
echo mkdir -p ~/"$first/$second/$third" && \
echo mv -v "$f" ~/"$first/$second/$third/$last"
done
As per update of the OP's question. Should remove all underscores.
#!/bin/bash
for f in *.pdf; do
new=$(awk -F'[_]+' -vOFS='/' '{$1=$1}1' <<< "$f")
echo mkdir -p ~/"${new%/*}/" && \
echo mv -v "$f" ~/"$new"
done
Remove the echo to actually rename the files.
The actual output without the echo:
copied 'xx___v-vv___a_fasl-bk__os23l.pdf' -> '/home/Pelangi/xx/v-vv/a/fasl-bk/os23l.pdf'
removed 'xx___v-vv___a_fasl-bk__os23l.pdf'
copied 'xxxa___y_yy___fg-h__ijdfdak.pdf' -> '/home/Pelangi/xxxa/y/yy/fg-h/ijdfdak.pdf'
removed 'xxxa___y_yy___fg-h__ijdfdak.pdf'
copied 'xxx___vvv___abk__osl.pdf' -> '/home/Pelangi/xxx/vvv/abk/osl.pdf'
removed 'xxx___vvv___abk__osl.pdf'
copied 'xxx___yyy___abc__def.pdf' -> '/home/Pelangi/xxx/yyy/abc/def.pdf'
removed 'xxx___yyy___abc__def.pdf'
copied 'xxx___yyy___fgh__ijk.pdf' -> '/home/Pelangi/xxx/yyy/fgh/ijk.pdf'
removed 'xxx___yyy___fgh__ijk.pdf'
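Note that splitting on every underscore turns y_yy into y/yy, as the output above shows, whereas the question asks for ~/xxxa/y_yy/fg-h/ijdfdak.pdf. If single underscores should be kept, one option is to treat only runs of two or more underscores as separators. A minimal sketch, assuming that rule matches the real file names; the helper name target_path is made up, and the loop only prints the commands:

```shell
# Hypothetical helper: turn every run of two or more underscores into "/",
# leaving single underscores untouched.
target_path() {
    printf '%s\n' "$1" | sed 's|___*|/|g'
}

# Dry run: print the commands instead of executing them.
for f in 'xxx___yyy___abc__def.pdf' 'xxxa___y_yy___fg-h___ijdfdak.pdf'; do
    new=$(target_path "$f")
    echo mkdir -p "$HOME/${new%/*}"
    echo mv -v "$f" "$HOME/$new"
done
```

As with the scripts above, removing the echo would make the loop actually create the directories and move the files.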

Script which subtract two file sizes

I would like to subtract the sizes of two files. I found the location of those files and then I used the command:
du -h /bin/ip | cut -d "K" -f1
I got 508 and I wanted to create variable
x=$((du -h /bin/ip | cut -d "K" -f1))
but at the result I got
"-bash: du -h /bin/ip | cut -d 'K' -f1: division by 0 (error token is "bin/ip | cut -d 'K' -f1")"
What did I do wrong? How can I put this value in a variable?
What did I do wrong?
You used arithmetic expansion $(( ... )) instead of command substitution $( ... ). As a result, the shell parsed the contents as an arithmetic expression: the / in /bin/ip was treated as the division operator and bin as an unset variable, which evaluates to 0, so it tried to divide by 0.
How can I put this value in a variable?
Use a command substitution:
x=$(du -h /bin/ip | cut -d "K" -f1)
But it would be way more reliable to use stat for collecting information about files:
x=$(stat -c %s /bin/ip)
To subtract two file sizes, you can again use command substitutions to get the sizes, but use arithmetic expansion to calculate the difference.
difference=$(( $(stat -c %s file1) - $(stat -c %s file2) ))
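The -c %s option is specific to GNU stat, so on systems without it the same idea can be sketched with wc -c, which is specified by POSIX and counts bytes. This is only an illustration: the function name size_diff and the demo files are made up.

```shell
# Sketch: byte-size difference of two files, using wc -c instead of GNU stat.
size_diff() {
    a=$(wc -c < "$1") || return 1    # command substitution captures the size
    b=$(wc -c < "$2") || return 1
    echo $(( a - b ))                # arithmetic expansion does the subtraction
}

# Demo with two small files of known size
demo=$(mktemp -d)
printf '1234567890' > "$demo/file1"   # 10 bytes
printf '1234' > "$demo/file2"         # 4 bytes
size_diff "$demo/file1" "$demo/file2"   # prints 6
```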
Perl to the rescue!
perl -le 'print((-s "file1.txt") - (-s "file2.txt"))'
-l adds newline to print
-s returns a file size (see -x)

Remove certain characters from file name

I have four files in a directory let's say as following
Test_File_20170101_20170112_1.txt
Test_File_20170101_20170112_2.txt
Test_File_20170101_20170112_3.txt
Test_File_20170101_20170112_4.txt
and I want to merge them in order and want the final file as
Test_File_20170101_20170112.txt
You can do something like this:
ls *_[1-9].txt \
| sed 's/_[1-9]\.txt//' \
| sort -u \
| xargs -n 1 -I {} sh -c "cat {}_*.txt > {}.txt"
Explaining each step:
ls *_[1-9].txt: list all files ending with _1.txt, _2.txt etc
sed 's/_[1-9]\.txt//': remove the extension and number suffix
sort -u: unique file names (e.g. Test_File_20170101_20170112)
xargs ...: for each file name, catenate each numbered file into a new file
You could extend this to a larger sequence, e.g. _10.txt etc, but you would need to be aware that the order would not be correct, as it would be in alphabetical order at the expansion of the final *, e.g. _1, _10, _2... Here are some approaches for this: cat files in specific order based on number in filename
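One way to get the numeric order right for suffixes such as _10 is to extract the number, sort on it numerically, and only then concatenate. The following is a sketch, assuming file names without tabs or newlines; the function name merge_parts is made up for illustration.

```shell
# Sketch: merge <prefix>_<n>.txt files into <prefix>.txt in numeric order of n.
merge_parts() {
    prefix=$1
    for f in "$prefix"_*.txt; do
        [ -e "$f" ] || continue
        n=${f#"$prefix"_}            # strip the prefix: "10.txt"
        n=${n%.txt}                  # strip the extension: "10"
        printf '%s\t%s\n' "$n" "$f"  # number, tab, file name
    done | sort -n | cut -f2- | while IFS= read -r f; do
        cat "$f"
    done > "$prefix.txt"
}

# Demo: parts 1, 2, 3 and 10 in a scratch directory
demo=$(mktemp -d)
cd "$demo" || exit 1
for i in 1 2 3 10; do
    echo "part $i" > "Test_File_20170101_20170112_$i.txt"
done
merge_parts Test_File_20170101_20170112
cat Test_File_20170101_20170112.txt
```

With a plain glob the part numbered 10 would be concatenated right after part 1; here it comes last.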
I ended up with the following. Your concern is valid, but I was stuck at one point where @cmbuckley helped, and my code now works as I expected. Thanks for your help. Feel free to correct my code if anything is wrong, but it is working fine.
#!/bin/bash
PDIR=$1
FDIR=$2
# Change the current working directory
cd "$PDIR" || exit;
# Count number of files present in the current working directory
c=$(ls -p | grep -v / | wc -l)
# Count number of iteration it needed for Loop
n=$(expr "$c" / 4)
# Move the first 4 sorted files.
# 1.Header File 2.Column Name File 3.Detail records file 4.Trailer record file
i=1
while [ "$i" -le "$n" ];
# Move first 4 files from source folder to processing folder
do mv `ls -p | grep -v / | head -4` "$PDIR"merge/;
# Change directory to processing
cd "$PDIR"merge/ || exit;
# Look for files ending with "_[1-9], sort and then merge them to one file and remove numeric from output file
ls *_[1-9].txt | sort -u | sed 's/_[1-9]\.txt//' | xargs -n 1 -I {} sh -c "cat {}_*.txt > {}.txt";
# Remove processed files
rm -f *_[1-9].txt;
# Move output file to Target directory
mv *.txt "$FDIR";
cd "$PDIR" || exit;
i=$(($i + 1));
done

Resources