How to run grep from a bash script and store the output in a file in the destination directory - linux

I am trying to filter out lines from a file through a bash script. I am able to find the path of the file from script location by running the command
Fgff=`find $D -maxdepth 1 -type f -name "*.gff"`
I can add a column to the found .gff file by running the command
sed -i '1 s/$/\tsample/; 1! s/$/\t'${D##*/}'/' $Fpsi
However, if I try to filter the file and write the output to another file in the same folder, it does not work.
grep 'ENSG00000155657\|ENSG00000198947' $Fgff > "$Fgff$filtered"
I want to know why grep is not working.
How can I filter all lines containing the substring ENSG00000155657 or ENSG00000198947 from apple.gff at ./dira/dirb/apple.gff and store them in ./dira/dirb/applefiltered.gff?
thanks

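Provided that $Fgff contains the correct filename, the pattern itself is fine: in GNU grep, \| is alternation with the lowest precedence, so 'ENSG00000155657\|ENSG00000198947' matches lines containing either ID. The likely problem is the redirection target. $filtered is an unset variable, so "$Fgff$filtered" expands to the input file's own name, and the shell truncates that file before grep reads it. Build the output name from the input name instead, for example "${Fgff%.gff}filtered.gff".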
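A minimal sketch of the filtering step, assuming GNU or BSD grep and a single .gff file in the directory. The temporary directory and the three sample lines are made up for the demo; only the two gene IDs and the apple.gff name come from the question.

```shell
#!/usr/bin/env bash
# Sketch: filter a .gff file into a sibling "<name>filtered.gff" file.
# The demo directory and its sample lines are invented for illustration.
D=$(mktemp -d)
printf '%s\n' \
    'chr1 ENSG00000155657 exon' \
    'chr2 ENSG00000000001 exon' \
    'chrX ENSG00000198947 gene' > "$D/apple.gff"

# Locate the input file, as in the question.
Fgff=$(find "$D" -maxdepth 1 -type f -name '*.gff')

# Derive the output name by stripping ".gff" and appending "filtered.gff".
out="${Fgff%.gff}filtered.gff"

# -E enables extended regex, where a plain | means "or".
grep -E 'ENSG00000155657|ENSG00000198947' "$Fgff" > "$out"
cat "$out"
```

Writing to a name that differs from the input is the important part: redirecting grep's output onto its own input file empties the file before grep can read it.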

Related

How to rename file based on parent and child folder name in bash script

I would like to rename a file based on the names of its parent and grandparent directories.
For example:
test.xml files are located at
/usr/local/data/A/20180101
/usr/local/data/A/20180102
/usr/local/data/B/20180101
How can I save the test.xml files in /usr/local/data/output as
A_20180101_test.xml
A_20180102_test.xml
B_20180101_test.xml
I tried the shell script below, but it does not work.
#!/usr/bin/env bash
target_dir_path="/usr/local/data/output"
for file in /usr/local/data/*/*/test.xml; do
l1="${file%%/*}"
l2="${file#*/}"
l2="${l2%%/*}"
filename="${file##*/}"
target_file_name="${l1}_${l2}_${filename}"
echo cp "$file" "${target_dir_path}/${target_file_name}"
done
Is there anything I am doing wrong in this shell script?
You can use the following command to do this operation:
source_folder="usr/local/data/";target_folder="target"; find $source_folder -type f -name test.xml | awk -v targetF=$target_folder 'BEGIN{FS="/"; OFS="_"}{printf $0" "; print targetF"/"$(NF-2),$(NF-1),$NF}' | xargs -n2 cp;
or on several lines for readability:
source_folder="usr/local/data/";
target_folder="target";
find $source_folder -type f -name test.xml |\
awk -v targetF=$target_folder 'BEGIN{FS="/"; OFS="_"}{printf $0" "; print targetF"/"$(NF-2),$(NF-1),$NF}' |\
xargs -n2 cp;
where
target_folder is your target folder
source_folder is your source folder
the find command searches for all files named test.xml under the source folder
the awk command receives the target folder as a variable; in the BEGIN block you define the field separator and the output field separator, then you print the original file name followed by the new one
xargs passes that output to cp two arguments at a time, which performs the copy
TODO:
You just need to set the source_folder and target_folder variables to match your environment, optionally put the commands in a script, and you are good to go!
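The pipeline can be tried safely on a throwaway tree first. The sketch below assumes paths without spaces (xargs -n2 splits on whitespace) and uses a temporary directory and dummy file contents that are not part of the original answer. It also prints the source path with a "%s" format instead of using the path itself as the printf format string.

```shell
#!/usr/bin/env bash
# Demo of the find | awk | xargs pipeline on a temporary tree (all paths are made up).
root=$(mktemp -d)
source_folder="$root/data"
target_folder="$root/target"
mkdir -p "$target_folder"
for d in A/20180101 A/20180102 B/20180101; do
    mkdir -p "$source_folder/$d"
    echo "$d" > "$source_folder/$d/test.xml"   # content records where the file came from
done

# awk splits each path on "/" and prints "<source> <target>";
# xargs -n2 then calls cp once per pair.
find "$source_folder" -type f -name test.xml |
    awk -v targetF="$target_folder" 'BEGIN{FS="/"; OFS="_"}
        {printf "%s ", $0; print targetF"/"$(NF-2),$(NF-1),$NF}' |
    xargs -n2 cp

ls "$target_folder"
```

If file or directory names may contain spaces, a plain for loop over a glob is the safer choice, since it never re-splits the names.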
I've modified your code a little to get it to work. See the comments in the code.
target_dir_path="/usr/local/data/output"
for file in /usr/local/data/*/*/test.xml; do
tmp=${file%/*/*/*} # Strip the last three path components
curr="${file#"$tmp/"}" # Extract the wanted part of the filename
mod=${curr//[\/]/_} # Replace forward slashes with underscores
mv "$file" "$target_dir_path/$mod" # Move the file
done
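To see why the script in the question produced the wrong names, it can help to print what each parameter expansion yields for one sample path. This is only an illustration; the variable names follow the question and the answer above, and nothing is copied or moved.

```shell
#!/usr/bin/env bash
# Sketch: what each parameter expansion yields for one sample path.
file="/usr/local/data/A/20180101/test.xml"

# From the question: %%/* removes the longest suffix that starts with "/".
# For an absolute path that is the whole string, so l1 ends up empty.
l1="${file%%/*}"

tmp="${file%/*/*/*}"      # last three components removed
curr="${file#"$tmp/"}"    # what remains after that prefix
mod="${curr//\//_}"       # every "/" replaced by "_"

printf 'l1=[%s]\ntmp=%s\ncurr=%s\nmod=%s\n' "$l1" "$tmp" "$curr" "$mod"
```

The empty l1 explains the failure in the question: with an absolute path there is nothing in front of the first slash, so the directory names have to be taken from the end of the path instead.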
If you have the Perl-based rename command:
$ for f in tst/*/*/test.xml; do
rename -n 's|.*/([^/]+)/([^/]+)/(test.xml)|./$1_$2_$3|' "$f"
done
rename(tst/A/20180101/test.xml, ./A_20180101_test.xml)
rename(tst/A/20180102/test.xml, ./A_20180102_test.xml)
rename(tst/B/20180101/test.xml, ./B_20180101_test.xml)
the -n option is for a dry run; remove it after testing
change tst to /usr/local/data and ./ to /usr/local/data/output/ for your use case
.*/ to ignore file path
([^/]+)/([^/]+)/(test.xml) capture required portions
$1_$2_$3 re-arrange as required

Bash script to get all file with desired extensions

I'm trying to write a bash script that, given a text file containing some extensions and a folder, writes an output file listing all files that match those extensions, searching recursively in all subdirectories.
The extension list file is my first parameter and the folder is my second parameter.
I have tried:
for i in $1 ; do
find . -name $2\*.$i -print>>result.txt
done
but it doesn't work.
As noted in a comment:
It is not a good idea to write to a hard-coded file name.
The example below fixes only the code given in the question.
Of course, it is even better to call it with
x.sh y . > blabla
and remove the filename from the script itself, but my intention here is only to fix the code as asked.
The following bash script, named x.sh
#!/bin/bash
echo -n >result.txt # delete old content
while read -r i; do # read a line from file
find "$2" -name "*.$i" -print >>result.txt # for every item do a find
done <"$1" # read from file named with first arg from cmdline
with a text file named y with the following content
txt
sh
and called with:
./x.sh y .
results in a file result.txt whose content is:
a.txt
b.txt
x.sh
OK, here are some additional hints based on the comments:
If the results file should not keep content from earlier runs of the script, it can be simplified to:
#!/bin/bash
while read -r i; do # read a line from file
find "$2" -name "*.$i" -print # for every item do a find
done <"$1" >result.txt # read from file named with first arg from cmdline
And as already mentioned:
The hard coded result.txt could be removed and the call can be something like
./x.sh y . > result.txt
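Put together, the version that writes to standard output could look like the sketch below. The function name list_by_ext, the temporary demo tree, and the -type f restriction are additions for illustration and not part of the answer above.

```shell
#!/usr/bin/env bash
# Sketch: list files under a folder whose extension appears in a list file.
# list_by_ext is a hypothetical helper; $1 = extension list file, $2 = folder.
list_by_ext() {
    local ext
    while IFS= read -r ext; do
        [ -n "$ext" ] || continue                # skip blank lines
        find "$2" -type f -name "*.$ext" -print  # quoted so the shell does not expand *
    done < "$1"
}

# Demo on a throwaway tree.
demo=$(mktemp -d)
mkdir -p "$demo/files/sub"
touch "$demo/files/a.txt" "$demo/files/sub/b.txt" "$demo/files/x.sh" "$demo/files/c.xml"
printf '%s\n' txt sh > "$demo/exts"

# The caller decides where the result goes.
list_by_ext "$demo/exts" "$demo/files" | sort > "$demo/result.txt"
cat "$demo/result.txt"
```

Keeping the result file outside the searched folder also avoids result.txt matching the txt extension and listing itself.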
Give this one-liner a try.
Replace /mydir with the folder to search.
Change the list of extensions passed as the pattern to grep -E (the modern spelling of egrep):
find /mydir -type f | grep -E '[.]txt$|[.]xml$' >> result.txt
In the pattern, separate the extensions with |.
Write the dot as [.] so that it matches a literal dot, and end each extension with $ so that it only matches at the end of the name.

Command line bash for entering multiple directories and executing a command

I'm new to this site (and to programming, more or less), but I'm hoping you can help.
I have numerous directories named 3K, 4K, 5K, etc. Within each directory I have 12 subdirectories named v1 to v12, each containing a file called OUTCAR. I am trying to write a bash command that will allow me to enter each of the subdirectories and gather data from OUTCAR.
The function works with no issues when I enter each subdirectory individually.
I'm using
for file in v{1..12} ; do grep "key_string" OUTCAR | awk '{print(relevant_stuff)}' > output.dat ; done
from the *K directory that contains the v{1..12} subdirectories.
However, I'm getting an error telling me that OUTCAR doesn't exist for each v{1..12}. I know it does, so I'm guessing that I haven't properly directed the command to cd into each subdirectory first. Any tips?
Thanks!
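Your loop never uses $file, so grep looks for OUTCAR in the current directory rather than in each v* subdirectory. You would be better off using this find command from the top-level directory that contains the *K directories:
find . -type f -path './*K/v*/OUTCAR' \
-exec awk '/key_string/ {print FILENAME ":" $0}' {} + >> output.dat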
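If one output file per subdirectory is preferred, a loop that puts the loop variable into the path works without any cd. This is a sketch under assumptions: "key_string", the awk fields, and the sample OUTCAR contents are placeholders, and the demo builds a small temporary tree instead of using real data.

```shell
#!/usr/bin/env bash
# Sketch: visit every *K/v*/ directory and write matching lines next to its OUTCAR.
# "key_string" and the sample OUTCAR contents are placeholders.
top=$(mktemp -d)
for k in 3K 4K; do
    for v in v1 v2; do
        mkdir -p "$top/$k/$v"
        printf 'noise\nkey_string %s %s\n' "$k" "$v" > "$top/$k/$v/OUTCAR"
    done
done

for dir in "$top"/*K/v*/; do
    # Use the loop variable in the path instead of relying on the current directory.
    grep "key_string" "${dir}OUTCAR" | awk '{print $2, $3}' > "${dir}output.dat"
done

cat "$top/3K/v1/output.dat"
```

The trailing slash in the glob restricts matches to directories, so "${dir}OUTCAR" always points inside one of them.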

Using grep to overwrite its current file

I have a list of directories within directories and this is what I am trying to attempt:
find a specific file format which is .xml
within all these .xml files, read the contents in the files and remove line 3
For line 3, its string is as follows: dxflib <Name of whatever folder it is in>.dxb
I tried find -name "*.xml" | xargs grep -v "dxflib" in the terminal (I am using Linux). The command works and displays the result, but it does not write the changes back to the files.
From what I found online, I would need to add something like >> output.txt.
Is there any way to make it save to, or overwrite, the file it read from?
This removes the third line of a file in place:
sed -i '3d' file
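To apply that to every .xml file in a directory tree, sed can be combined with find. The sketch below runs on a temporary tree with made-up file contents, and it uses -i.bak, which keeps a backup copy of each file and is accepted by both GNU and BSD sed.

```shell
#!/usr/bin/env bash
# Sketch: delete line 3 of every .xml file under a directory, keeping .bak backups.
# The directory layout and file contents are invented for the demo.
root=$(mktemp -d)
mkdir -p "$root/partA" "$root/partB/inner"
printf '%s\n' '<root>' '<item/>' 'dxflib partA.dxb' '</root>' > "$root/partA/one.xml"
printf '%s\n' '<root>' '<item/>' 'dxflib inner.dxb' '</root>' > "$root/partB/inner/two.xml"

# With -i, sed edits each file separately, so "3d" means line 3 of every file.
find "$root" -type f -name '*.xml' -exec sed -i.bak '3d' {} +

cat "$root/partA/one.xml"
```

If the dxflib line is not always on line 3, the address can be a pattern instead, for example '/dxflib/d'.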

Launching program several times

I am using Mac OS. This is the command-line code to launch my program (two parts):
nucmer --mum file1.txt file2.txt
show-snps -Clr -x 2 out.delta > out_file1.snps
The first part of the program creates the file out.delta. My file2.txt is always the same, but I want to run both parts 35000 times with a different file1.txt each time. All the file1 files are located in the same directory.
Is it possible to do it using BASH?
Keep all the input files in a directory. Create a wrapper script to invoke nucmer script and then show-snps script. Your wrapper script will accept path to file directory as input. Iterate over all files in the directory and call your two scripts.
You could do something along these lines:
find . -maxdepth 1 -type f -print | grep -v './out_' | while read -r f
do
b=$(basename "$f" .txt)
nucmer --mum "$f" file2.txt
show-snps -Clr -x 2 out.delta > "out_${b}.snps"
done
The find bit finds all files in the current directory. grep filters out any previous output files, in case you've run some previously. The basename line strips off the leading ./ and the trailing .txt extension, and then your two programs get run with the input file name and an output filename based on the basename output.
If you don't get an "argument list too long" error, you could just use for:
for f in file*.txt; do nucmer --mum "$f" file2.txt; show-snps -Clr -x 2 out.delta > "out_${f%.txt}.snps"; done
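Before starting 35000 runs, it may be worth checking the generated file names with a dry run that only prints the commands. In this sketch the input directory and the sample file names are placeholders, and nucmer and show-snps are never actually called.

```shell
#!/usr/bin/env bash
# Dry run: print the two commands per input file instead of executing them.
# The input directory and sample file names are placeholders.
inputs=$(mktemp -d)
touch "$inputs/sampleA.txt" "$inputs/sampleB.txt"
reference="file2.txt"

plan=$(
    for f in "$inputs"/*.txt; do
        b=$(basename "$f" .txt)   # strip the directory and the .txt extension
        printf 'nucmer --mum %s %s\n' "$f" "$reference"
        printf 'show-snps -Clr -x 2 out.delta > out_%s.snps\n' "$b"
    done
)
printf '%s\n' "$plan"
```

Once the printed commands look right, the printf lines can be replaced by the real calls.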
