Launching program several times - linux

I am using macOS. This is the command-line code to launch my program (two parts):
nucmer --mum file1.txt file2.txt
show-snps -Clr -x 2 out.delta > out_file1.snps
The first part of the program creates the file out.delta. My file2.txt is always the same, but I want to run both parts 35000 times, each time with a different file1.txt. All the file1 files are located in the same directory.
Is it possible to do it using BASH?

Keep all the input files in one directory and create a wrapper script that invokes nucmer and then show-snps. The wrapper takes the path to that directory as its argument, iterates over every file in it, and calls the two programs for each one.

You could do something along these lines:
find . -maxdepth 1 -type f -name '*.txt' ! -name file2.txt -print | while IFS= read -r f
do
b=$(basename "$f" .txt)
nucmer --mum "$f" file2.txt
show-snps -Clr -x 2 out.delta > "out_${b}.snps"
done
The find command lists the .txt files in the current directory and skips file2.txt; restricting the match to *.txt also keeps out.delta and any out_*.snps files from earlier runs out of the loop. The basename line strips off the leading ./ and the trailing .txt extension, and then your two programs are run with the input file name and an output file name based on the basename result.

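A plain for loop also works. The shell expands the glob itself, so the loop is not subject to the "argument list too long" limit that applies to external commands:
for f in file*.txt; do [ "$f" = file2.txt ] && continue; nucmer --mum "$f" file2.txt; show-snps -Clr -x 2 out.delta > "out_${f%.txt}.snps"; done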
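The wrapper-script idea can be sketched as follows. This is a minimal sketch, not a tested pipeline: the function name run_all, the NUCMER and SHOW_SNPS override variables, and the out_ naming are illustrative, and it assumes the installed nucmer accepts -p to set the prefix of the .delta file, so that each input gets its own delta instead of sharing out.delta.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: run nucmer and show-snps once per input file.
# Usage: ./run_all.sh /path/to/inputs /path/to/file2.txt

run_all() {
    local dir=$1 fixed=$2 f b
    for f in "$dir"/*.txt; do
        [ -f "$f" ] || continue              # the glob matched nothing
        [ "$f" -ef "$fixed" ] && continue    # skip the fixed file itself
        b=$(basename "$f" .txt)
        # -p names the delta file after the input (assumed nucmer option)
        "${NUCMER:-nucmer}" --mum -p "$dir/$b" "$f" "$fixed" || continue
        "${SHOW_SNPS:-show-snps}" -Clr -x 2 "$dir/$b.delta" > "$dir/out_$b.snps"
    done
}

# Run only when both arguments are given on the command line.
if [ "$#" -ge 2 ]; then
    run_all "$1" "$2"
fi
```

Because every input writes its own .delta and .snps file, a failed run can be repeated for a single file without touching the others.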

Related

how to run grep from script and store output in a file in the destination directory from bash script

I am trying to filter out lines from a file through a bash script. I am able to find the path of the file from script location by running the command
Fgff=`find $D -maxdepth 1 -type f -name "*.gff"`
I can add a column to the found .gff file by running the command
sed -i '1 s/$/\tsample/; 1! s/$/\t'${D##*/}'/' $Fpsi
However, if I try to filter the file and write the output to another file in the same folder, it does not work.
grep 'ENSG00000155657\|ENSG00000198947' $Fgff > "$Fgff$filtered"
I want to know why grep is not working?
How can I filter all the lines having substring ENSG00000155657 or ENSG00000198947 in file apple.gff at ./dira/dirb/apple.gff and store it in ./dira/dirb/applefiltered.gff?
thanks
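Provided that $Fgff contains the correct file name, the pattern is not the problem: with GNU grep, \| has the lowest precedence in a basic regular expression, so 'ENSG00000155657\|ENSG00000198947' already matches either ID, exactly like '\(ENSG00000155657\)\|\(ENSG00000198947\)'. The more likely culprit is the redirection target. "$Fgff$filtered" appends the value of a variable named filtered, and if that variable is unset the output file is $Fgff itself, which the shell truncates before grep reads it. To get applefiltered.gff, build the name from the input path, for example "${Fgff%.gff}filtered.gff".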
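A minimal sketch of the filtering step, assuming a grep that supports -E and that the output should sit next to the input with filtered inserted before the extension. The temporary directory and the three sample lines are made up for the demonstration; only the two gene IDs and the file names come from the question.

```shell
#!/usr/bin/env bash
# Demo data in a scratch directory (hypothetical content).
demo=$(mktemp -d)
mkdir -p "$demo/dira/dirb"
printf '%s\n' 'chr1 ENSG00000155657 exon' \
              'chr2 ENSG00000000001 exon' \
              'chrX ENSG00000198947 exon' > "$demo/dira/dirb/apple.gff"

Fgff="$demo/dira/dirb/apple.gff"
# Strip the .gff suffix, then append the new ending: .../applefiltered.gff
out="${Fgff%.gff}filtered.gff"

# -E enables extended regular expressions, where | needs no backslash.
grep -E 'ENSG00000155657|ENSG00000198947' "$Fgff" > "$out"
cat "$out"
```

Writing to a name that differs from the input matters here: redirecting grep's output onto its own input file empties that file before grep can read it.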

how to copy file to multiple sub directories linux

I have a file that needs to be copied into every directory named test.
The directory structure is as follows:
/contentroot/path/a/x/test
/contentroot/path/a/y/test
/contentroot/path/a/z/test
--------------------------
As shown above, I have more than 250 such test directories.
I tried the command below (using an asterisk), but it copies into only one test directory and reports an error (cp: omitting directory):
cp myfile.txt /contentroot/path/a/*/test
Any help?
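cp treats only its last argument as the destination, so once the glob expands, all the other test directories are taken as sources (hence "omitting directory"). Perhaps a for loop?
for FOLDER in /contentroot/path/a/*/test; do
cp myfile.txt "$FOLDER"
done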
You can feed the output of an echo as an input to xargs. xargs will then run the cp command three times, appending the next directory path piped to it from the echo.
The -n 1 option on the xargs command is so it only appends one of those arguments at a time to the cp each time it runs.
echo /contentroot/path/a/x/test /contentroot/path/a/y/test /contentroot/path/a/z/test | xargs -n 1 cp myfile.txt
Warnings! Firstly, this will overwrite files (if they exist), and secondly, any bash command should be tested and used at the runner's risk! ;)
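Both approaches can be tried safely in a scratch directory first. The sketch below is an illustration under assumptions: the directory tree is recreated under a temporary root rather than /contentroot, and the xargs variant uses printf with NUL separators and xargs -0 (available in GNU and BSD xargs) instead of echo, so the 250 paths need not be typed out and names containing spaces survive.

```shell
#!/usr/bin/env bash
# Build a throwaway copy of the layout from the question.
root=$(mktemp -d)
mkdir -p "$root/path/a/x/test" "$root/path/a/y/test" "$root/path/a/z/test"
echo "payload" > "$root/myfile.txt"

# Variant 1: loop over the glob, one cp per destination.
for folder in "$root"/path/a/*/test; do
    [ -d "$folder" ] || continue        # skip if the glob matched nothing
    cp "$root/myfile.txt" "$folder"/
done

# Variant 2: let the shell expand the glob and hand each path to xargs.
# cp receives "myfile.txt <one directory>" on every invocation.
printf '%s\0' "$root"/path/a/*/test | xargs -0 -n 1 cp "$root/myfile.txt"

ls "$root"/path/a/*/test
```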

Bash script to get all file with desired extensions

I'm trying to write a bash script that, given a text file containing some extensions and a folder, produces an output file listing all files that match the desired extensions, searching recursively in all subdirectories.
The extension list file is my first parameter and the folder is my second parameter.
I have tried:
for i in $1 ; do
find . -name $2\*.$i -print>>result.txt
done
but it doesn't work.
As noted in the comments:
It is not a good idea to write to a hard-coded file name.
The example below fixes only the code given in the question.
Of course, it is even better to call the script with
x.sh y . > blabla
and remove the file name from the script itself, but my intention is only to fix the code as posted.
The following bash script, named x.sh
#!/bin/bash
: > result.txt # delete old content
while IFS= read -r i; do # read a line from the file
find "$2" -name "*.$i" -print >> result.txt # run one find per extension
done < "$1" # read from the file named by the first command-line argument
with a text file named y with the following content
txt
sh
and called with:
./x.sh y .
results in a file result.txt whose content is:
a.txt
b.txt
x.sh
OK, let's give some additional hints taken from the comments:
If the results file should not keep any content from earlier runs of the script, it can be simplified to:
#!/bin/bash
while IFS= read -r i; do # read a line from the file
find "$2" -name "*.$i" -print # run one find per extension
done < "$1" > result.txt # read from the file named by the first command-line argument
And as already mentioned:
The hard coded result.txt could be removed and the call can be something like
./x.sh y . > result.txt
Give this one-liner command a try.
Replace /mydir with the folder to search.
Change the list of extensions in the pattern passed to grep -E:
find /mydir -type f | grep -E '[.](txt|xml)$' >> result.txt
Inside the parentheses, separate the extensions with |.
The . character is matched literally by writing it as [.], and the trailing $ anchors the match to the end of the file name.
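The per-extension loop can also be folded into a single find call. The following is a sketch under assumptions: the script name find_ext.sh and the function name find_by_ext are hypothetical, the argument order (extension list first, folder second) follows the question, and the list file holds one extension per line without the leading dot. The function collects the -name tests in its positional parameters so that no pattern is re-split or expanded by the shell.

```shell
#!/usr/bin/env bash
# Usage (hypothetical script name): ./find_ext.sh extensions.txt /some/folder > result.txt

find_by_ext() {
    local list=$1 dir=$2 ext
    set --                                  # start with an empty argument list
    while IFS= read -r ext || [ -n "$ext" ]; do
        [ -n "$ext" ] || continue           # ignore blank lines
        if [ "$#" -gt 0 ]; then
            set -- "$@" -o                  # join the tests with a logical OR
        fi
        set -- "$@" -name "*.$ext"
    done < "$list"
    [ "$#" -gt 0 ] || return 0              # nothing to search for
    # The escaped parentheses group the OR chain so -type f applies to all of it.
    find "$dir" -type f \( "$@" \) -print
}

# Run only when both arguments are given on the command line.
if [ "$#" -ge 2 ]; then
    find_by_ext "$1" "$2"
fi
```

The directory tree is walked once regardless of how many extensions are listed, and the caller decides where the output goes.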

Command line bash for entering multiple directories and executing a command

I'm new to this site (and to programming, more or less), but I'm hoping you can help.
I have numerous directories named 3K, 4K, 5K, etc. Within each directory I have 12 subdirectories named v1 to v12, each containing a file called OUTCAR. I am trying to write a bash command that will allow me to enter each of the subdirectories and gather data from OUTCAR.
The function works with no issues when I enter each subdirectory individually.
I'm using
for file in v{1..12} ; do grep "key_string" OUTCAR | awk '{print(relevant_stuff)}' > output.dat ; done
From the *K directory that contains the v{1..12} subdirectories.
However, I'm getting an error telling me that OUTCAR doesn't exist for each v{1..12}. I know it does, so I'm guessing that I haven't properly directed the command to cd into each subdirectory first. Any tips?
Thanks!
You would be better off using this find command from the top-level directory where these subdirectories exist:
find . -type f -path '*/v[0-9]*/OUTCAR' \
-exec awk '/key_string/ {print FILENAME ":" $0}' {} + >> output.dat
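The loop from the question can also be repaired directly. It fails because the loop variable is never used: grep always looks for OUTCAR in the current directory, and > output.dat would overwrite the file on every pass. A minimal sketch, assuming a made-up tree with two subdirectories and an awk action that prints the second field in place of the question's relevant_stuff:

```shell
#!/usr/bin/env bash
# Throwaway stand-in for one *K directory (hypothetical contents).
top=$(mktemp -d)
mkdir -p "$top/3K/v1" "$top/3K/v2"
printf 'key_string 1.5\nother 9\n' > "$top/3K/v1/OUTCAR"
printf 'key_string 2.5\n'          > "$top/3K/v2/OUTCAR"

cd "$top/3K" || exit 1

# Name the file through the loop variable, and redirect once after "done"
# so every subdirectory contributes to the same output.dat.
for d in v*/; do
    [ -f "${d}OUTCAR" ] || continue
    awk -v dir="${d%/}" '/key_string/ {print dir, $2}' "${d}OUTCAR"
done > output.dat

cat output.dat
```

The glob v*/ visits the subdirectories in lexical order (v1, v10, v11, ...); use v{1..12} in bash if numeric order matters.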

How to open all files in a directory in Bourne shell script?

How can I use the relative path or absolute path as a single command line argument in a shell script?
For example, suppose my shell script is on my Desktop and I want to loop through all the text files in a folder that is somewhere in the file system.
I tried sh myshscript.sh /home/user/Desktop, but this doesn't seem feasible. And how would I avoid directory names and file names with whitespace?
myshscript.sh contains:
for i in `ls`
do
cat $i
done
Superficially, you might write:
cd "${1:-.}" || exit 1
for file in *
do
cat "$file"
done
except you don't really need the for loop in this case:
cd "${1:-.}" || exit 1
cat *
would do the job. And you could avoid the cd operation with:
cat "${1:-.}"/*
which concatenates (cats) all the files in the given directory, even if the directory or the file names contain spaces, newlines or other difficult-to-manage characters. You can use any appropriate glob pattern in place of *; if you want files ending in .txt, use *.txt as the pattern, for example.
This breaks down if you might have so many files that the argument list is too long. In that case, you probably need to use find:
find "${1:-.}" -maxdepth 1 -type f -exec cat {} +
(Note that -maxdepth is not part of POSIX, although both GNU and BSD find support it. GNU find expects it before tests such as -type.)
Avoid using ls to generate lists of file names, especially if the script has to be robust in the face of spaces, newlines etc in the names.
Use a glob instead of ls, and quote the loop variable:
for i in "$1"/*.txt
do
cat "$i"
done
PS: ShellCheck automatically points this out.
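When each file needs more than a plain cat, find can hand the names to a small inline script, which keeps whitespace and newlines in names intact without parsing ls. A minimal sketch, assuming a scratch directory with two made-up text files and a loop body that just prints a header before each file:

```shell
#!/usr/bin/env bash
# Scratch directory with awkward file names (hypothetical data).
dir=$(mktemp -d)
printf 'one\n' > "$dir/plain.txt"
printf 'two\n' > "$dir/with space.txt"

# find passes the matching paths as arguments to the inline sh script;
# the first "sh" after the script fills $0, the paths arrive as "$@".
find "$dir" -maxdepth 1 -type f -name '*.txt' -exec sh -c '
    for f in "$@"; do
        printf "== %s ==\n" "${f##*/}"
        cat "$f"
    done
' sh {} + > "$dir/combined.out"

cat "$dir/combined.out"
```

find does not guarantee any particular order, so sort the names inside the inline script if the order of the files matters.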
