Executing the same command for several files in the same directory in Linux

I'd like to execute the following command for several files in the same directory in Linux:
../../../../../openSMILE-2.1.0/SMILExtract -C ../../../../../openSMILE-2.1.0/config/IS13_ComParE.conf -I inputfilename.wav -D outputfilename.csv
There are several files (named 1.wav, 2.wav, 3.wav) in the directory, and if I execute
../../../../../openSMILE-2.1.0/SMILExtract -C ../../../../../openSMILE-2.1.0/config/IS13_ComParE.conf -nologfile 1 -noconsoleoutput 1 -I 1.wav -D 1.csv
it outputs 1.csv.
How can I create 1.csv, 2.csv, 3.csv, ... by executing just one single command in Linux? (Or do I have to write a .sh file?)

It's probably cleaner to put the following into a script, but you can type it directly into the bash command line as well:
#! /bin/bash
for file in *.wav ; do
    prefix=${file%.wav}  # Remove the .wav suffix from the right.
    ../../../../../openSMILE-2.1.0/SMILExtract \
        -C ../../../../../openSMILE-2.1.0/config/IS13_ComParE.conf \
        -I "$file" -D "$prefix".csv
done
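Typed directly at the prompt, the same loop collapses to a single line; here is a sketch using the exact flags from the question (adjust the relative paths to wherever your openSMILE build actually lives):

for file in *.wav; do ../../../../../openSMILE-2.1.0/SMILExtract -C ../../../../../openSMILE-2.1.0/config/IS13_ComParE.conf -nologfile 1 -noconsoleoutput 1 -I "$file" -D "${file%.wav}.csv"; done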

Related

Create directory, download file and execute command from a list of URLs

I am working on a Red Hat Linux server. My end goal is to run CRB-BLAST on multiple fasta files and have the results from those in separate directories.
My approach is to download the fasta files using wget and then run CRB-BLAST. I have multiple files and would like to download each one into its own directory (perhaps the name should come from the URL list file), then run CRB-BLAST.
Example URLs:
http://assemblies/Genomes/final_assemblies/10x_assemblies_v0.1/TC_3370_chr.v0.1.liftover.CDS.fasta.gz
http://assemblies/Genomes/final_assemblies/10x_assemblies_v0.1/TC_CB_chr.v0.1.liftover.CDS.fasta.gz
http://assemblies/Genomes/final_assemblies/10x_assemblies_v0.1/TC_13_chr.v0.1.liftover.CDS.fasta.gz
http://assemblies/Genomes/final_assemblies/10x_assemblies_v0.1/TC_37_chr.v0.1.liftover.CDS.fasta.gz
http://assemblies/Genomes/final_assemblies/10x_assemblies_v0.1/TC_123_chr.v0.1.liftover.CDS.fasta.gz
http://assemblies/Genomes/final_assemblies/10x_assemblies_v0.1/TC_195_chr.v0.1.liftover.CDS.fasta.gz
http://assemblies/Genomes/final_assemblies/10x_assemblies_v0.1/TC_31_chr.v0.1.liftover.CDS.fasta.gz
Ideally, the file name determines the directory name, for example, TC_3370/.
I think there might be a solution along the lines of cat URL.txt | mkdir | cd | wget | crb-blast.
Currently I just run the commands one by one:
mkdir TC_3370
cd TC_3370/
wget http://assemblies/Genomes/final_assemblies/10x_meta_assemblies_v1.0/TC_3370_chr.v1.0.maker.CDS.fasta.gz
crb-blast -q TC_3370_chr.v1.0.maker.CDS.fasta.gz -t TCV2_annot_cds.fna -e 1e-20 -h 4 -o rbbh_TC
Try this Shellcheck-clean program:
#! /bin/bash -p
while read -r url; do
    file=${url##*/}
    dir=${file%%_chr.*}
    mkdir -v -- "$dir"
    (
        cd "./$dir" || exit 1
        wget -- "$url"
        crb-blast -q "$file" -t TCV2_annot_cds.fna -e 1e-20 -h 4 -o rbbh_TC
    )
done <URL.txt
See "Removing part of a string" (BashFAQ/100: How do I do string manipulation in bash?) for an explanation of ${url##*/} etc.
The subshell ( ... ) is used to ensure that the cd doesn't affect the main program.
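For example, with one of the URLs from the question, the two expansions behave like this (a quick sketch you can paste into a shell):

url='http://assemblies/Genomes/final_assemblies/10x_assemblies_v0.1/TC_3370_chr.v0.1.liftover.CDS.fasta.gz'
file=${url##*/}      # strip everything up to the last '/'
dir=${file%%_chr.*}  # strip from the first '_chr.' onward
echo "$file"         # TC_3370_chr.v0.1.liftover.CDS.fasta.gz
echo "$dir"          # TC_3370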
Another implementation
#!/bin/sh
# Read lines into url for as long as input lasts
while read -r url
do
    # Get the file name by stripping anything up to the last / from the url
    file_name=${url##*/}
    # Get the destination dir name by stripping anything from the first _chr
    dest_dir=${file_name%%_chr*}
    # Compose the wget output path
    fasta_path="$dest_dir/$file_name"
    if
        # Successfully created the destination directory AND
        mkdir -p -- "$dest_dir" &&
        # Successfully downloaded the file
        wget --output-document="$fasta_path" --quiet -- "$url"
    then
        # Run crb-blast with the downloaded fasta as the query
        fna_path="$dest_dir/TCV2_annot_cds.fna"
        crb-blast -q "$fasta_path" -t "$fna_path" -e 1e-20 -h 4 -o rbbh_TC
    else
        # Cleanup: remove the destination directory if mkdir or wget failed
        rm -fr -- "$dest_dir"
    fi
# reading from the URL.txt file for the whole while loop
done < URL.txt
Downloading files from a list is a task for wget's -i file option; if you have a file named, say, urls.txt with one URL per line, you can simply do
wget -i urls.txt
Note that this will put all the files in the current working directory, so if you want them in separate dirs, you will need to move them after wget finishes.
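If the files follow the _chr naming scheme from the URLs above, a small follow-up loop can sort them into per-assembly directories; a sketch, assuming everything landed in the current directory:

for f in *_chr.*.fasta.gz; do
    d=${f%%_chr.*}
    mkdir -p -- "$d" && mv -- "$f" "$d/"
done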

How to make the bash script work with one command after another?

I have a bash script like the one below. First it takes sorted.bam files as input and uses the "stringtie" tool to produce a GTF for each sample. Then the path of each sample GTF is written to mergelist.txt, and "stringtie --merge" is run on them to get "stringtie_merged.gtf".
I have 40 sorted.bam files in total.
for sample in /path/*.sorted.bam
do
    dir="/pathto/hisat2_output"
    dir2="/pathto/folder"
    base=`basename $sample '.sorted.bam'`
    "stringtie -p 8 -G gencode.v27.primary_assembly.annotation_nochr.gtf -o ${dir2}/stringtie_output/${base}/${base}_GRCh38.gtf -l ${dir2}/stringtie_output/${base}/${base} ${dir}/${base}.sorted.bam; ls ${dir2}/stringtie_output/*/*_GRCh38.gtf > mergelist.txt; stringtie --merge -p 8 -G gencode.v27.primary_assembly.annotation_nochr.gtf -o ${dir2}/stringtie_output/stringtie_merged.gtf mergelist.txt"
done
I separated the commands with ;. After running the script on all the sorted.bam files and letting the job complete, I see that mergelist.txt has paths for only 33 sample GTFs, which means the paths for the other 7 sample GTFs are missing from mergelist.txt.
Is separating the commands with ; the right approach, or is there another way?
The script should run the first command, write the resulting paths into the text file, and then run the other command.
You haven't separated the commands with semi-colons; you've invoked a single command that has semi-colons embedded in it. Consider the simple script:
"ls; pwd"
This script does not call ls followed by pwd. Instead, the shell will search the PATH looking for a file named ls; pwd (that is, a file with a semi-colon and a space in its name), probably not find one and respond with an error message. You need to remove the double quotes.
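For example, at an interactive bash prompt you would typically see:

$ "ls; pwd"
bash: ls; pwd: command not found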
What's wrong with multiple lines? You already have more than one line:
dir="/pathto/hisat2_output"
dir2="/pathto/folder"
for sample in /path/*.sorted.bam ;do
base=$(basename ${sample} '.sorted.bam')
stringtie -p 8 -G gencode.v27.primary_assembly.annotation_nochr.gtf -o ${dir2}/stringtie_output/${base}/${base}_GRCh38.gtf -l ${dir2}/stringtie_output/${base}/${base} ${dir}/${base}.sorted.bam
ls ${dir2}/stringtie_output/*/*_GRCh38.gtf > mergelist.txt
stringtie --merge -p 8 -G gencode.v27.primary_assembly.annotation_nochr.gtf -o ${dir2}/stringtie_output/stringtie_merged.gtf mergelist.txt
done
Anyway, I don't see the point in having the second stringtie command inside the loop; it should work fine just after it.
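That is, something like this sketch, built from the paths in the question (the loop writes each GTF, and the list plus merge happen once at the end):

for sample in /path/*.sorted.bam; do
    base=$(basename "$sample" '.sorted.bam')
    stringtie -p 8 -G gencode.v27.primary_assembly.annotation_nochr.gtf \
        -o "$dir2/stringtie_output/$base/${base}_GRCh38.gtf" \
        -l "$dir2/stringtie_output/$base/$base" "$dir/$base.sorted.bam"
done
ls "$dir2"/stringtie_output/*/*_GRCh38.gtf > mergelist.txt
stringtie --merge -p 8 -G gencode.v27.primary_assembly.annotation_nochr.gtf \
    -o "$dir2/stringtie_output/stringtie_merged.gtf" mergelist.txt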
If stringtie is able to process STDIN, you might get away without the mergelist.txt by using:
stringtie --merge -p 8 -G gencode.v27.primary_assembly.annotation_nochr.gtf -o ${dir2}/stringtie_output/stringtie_merged.gtf <<< $(echo ${dir2}/stringtie_output/*/*_GRCh38.gtf)
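Note that the <<< $(echo ...) construct joins all the paths onto one line; if stringtie expects the list one path per line (as in mergelist.txt) and can read it from stdin at all, printf reproduces that layout. A sketch, assuming stringtie accepts - to mean stdin, which you would need to verify against its documentation:

printf '%s\n' "$dir2"/stringtie_output/*/*_GRCh38.gtf |
    stringtie --merge -p 8 -G gencode.v27.primary_assembly.annotation_nochr.gtf \
        -o "$dir2/stringtie_output/stringtie_merged.gtf" -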
You should double-quote your variables and use $( command ) instead of backticks.
base=$( basename $sample '.sorted.bam' ) :
Do you have spaces in filenames??
Prefer:
base=$( basename "$sample" '.sorted.bam' )  # works with or without spaces
If you have spaces, you must double-quote (note also ${base}_GRCh38: without the braces, bash would look for a variable named base_GRCh38):
stringtie -p 8 \
    -G gencode.v27.primary_assembly.annotation_nochr.gtf \
    -o "$dir2/stringtie_output/$base/${base}_GRCh38.gtf" \
    -l "$dir2/stringtie_output/$base/$base" \
    "$dir/$base.sorted.bam"
ls "$dir2"/stringtie_output/*/*_GRCh38.gtf > mergelist.txt
...
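A quick illustration of the brace pitfall mentioned above:

base=sample1
echo "$base_GRCh38.gtf"    # prints ".gtf": bash expands the (unset) variable base_GRCh38
echo "${base}_GRCh38.gtf"  # prints "sample1_GRCh38.gtf"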

Execute commands in specific location and depending on answer of previous command

I am currently working on a text-to-speech project and I need to write a bash script which, when called, executes two commands. If the first command returns the proper answer (if it returns an answer at all), the second command will be executed.
My question is: how can I write a script that executes shell commands in a specific filesystem location?
For example, I need to be in the directory /opt/text/example and execute this command:
sudo ./bin/sample_read -I ../languages/ -I ../languages -v dave -T 2 \
-i /opt/text/example.txt -F 22 -O embedded-pro -o out_file.pcm
then wait for the answer and, if it is good, execute the second command.
The second command is
aplay -f S16_LE -r 22050 -c 1 out_file.pcm
This should help:
pushd /path/to/directory
my_var=$(command1)
if [ "$my_var" == "expected_result" ]; then
    command2
fi
popd
You basically run command1 and store its output in my_var, then compare the content of $my_var with whatever you're expecting.
Also, pushd <path> and popd let you move to a directory and back.
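Adapted to the commands in the question, a minimal sketch might look like this (the test on $answer is an assumption; replace it with whatever sample_read actually prints on success):

#!/bin/bash
pushd /opt/text/example || exit 1
answer=$(sudo ./bin/sample_read -I ../languages/ -I ../languages -v dave -T 2 \
    -i /opt/text/example.txt -F 22 -O embedded-pro -o out_file.pcm)
if [ -n "$answer" ]; then
    aplay -f S16_LE -r 22050 -c 1 out_file.pcm
fi
popd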

What does `!:-` do?

I am very new to bash scripting and to the Ubuntu/Debian package system.
Today I am studying the contents of a preinst file, the script that is executed before a package is unpacked from its Debian archive (.deb) file.
My first doubt is about a line containing this:
!:-
It is probably a stupid question but, using Google, I can't find an answer.
It inserts the last command without its last argument (bash history expansion). Given the previous command
/usr/sbin/ab2 -f TLS1 -S -n 1000 -c 100 -t 2 http://www.google.com/
then
!:- http://www.stackoverflow.com/
is the same as
/usr/sbin/ab2 -f TLS1 -S -n 1000 -c 100 -t 2 http://www.stackoverflow.com/
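You can watch the expansion at an interactive prompt, since bash prints the expanded command before running it:

$ echo a b c
a b c
$ !:- d
echo a b d
a b d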

Creating a script from a complex bash command

I need to run the following command in a folder containing a lot of gzipped files.
perl myscript.pl -d <path> -f <protocol> "<startdate time>" "<enddate time>" -o "0:00 23:59" -v -g -b <outputfilename>
My problem is that the command does not take gzipped files as input, so I would need to unzip all those gzipped files on the fly and run the command on the unzipped files. The gzipped files are in another folder where I am not allowed to unzip them. I want a shell script that takes the path of the remote gzipped files and stores them under <path> (which is also going to be an argument to the script), does the unzipping, and then runs the above command.
N.B.: The "protocol", "startdate time", "enddate time" and "outputfilename" don't have to be arguments; for now I will just put them directly in the script so that it is less complex.
You can do:
for fname in path/to/*.gz; do gunzip -c "$fname" | perl myscript.pl ; done
Expanded:
for fname in path/to/*.gz; do
    gunzip -c "$fname" | perl myscript.pl
done
And to make it accept filenames with spaces (note that because "$fname" is quoted, the glob loop above already handles spaces; changing IFS here is extra caution):
old_IFS=$IFS
IFS=$'\n'
for fname in path/to/*.gz; do
    gunzip -c "$fname" | perl myscript.pl -f <protocol> "<startdate time>" \
        "<enddate time>" -o "0:00 23:59" -v -g -b <outputfilename>
done
IFS=$old_IFS
This way, you make the script read standard input, which will contain the file content, without having to use temporary files.
EDIT: Here's a wrapper script that solves the problem as initially suggested in the question:
`myscriptwrapper`:
#!/bin/bash
gzip_path="$1"
temp_path="$2"
# loop through the files from gzip_path
for fname in "$gzip_path"/*.gz; do
    # strip the directory and the .gz suffix, since the output is uncompressed
    basename=$(basename "$fname" .gz)
    # unzip them into the target dir
    gunzip -c "$fname" > "$temp_path/$basename"
done
# finally, call our script
perl myscript.pl -d "$temp_path" -f <protocol> "<startdate time>" "<enddate time>" -o "0:00 23:59" -v -g -b <outputfilename>
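A hypothetical invocation, with both paths as placeholders for your actual locations:

./myscriptwrapper /remote/folder/with/gzipped/files /tmp/unzipped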
EDIT 2: Using tar.gz files:
`myscriptwrapper`:
#!/bin/bash
gzip_path="$1"
temp_path="$2"
cd "$temp_path" || exit 1
# loop through the tarballs from gzip_path
for fname in "$gzip_path"/*.tar.gz; do
    tar -xzf "$fname"
done
# finally, call our script
perl myscript.pl -d "$temp_path" -f <protocol> "<startdate time>" "<enddate time>" -o "0:00 23:59" -v -g -b <outputfilename>
