How to rename file based on parent and child folder name in bash script - linux

I would like to rename files based on their parent and sub-parent directory names.
For example:
a test.xml file is located in each of
/usr/local/data/A/20180101
/usr/local/data/A/20180102
/usr/local/data/B/20180101
How can I save each test.xml file in /usr/local/data/output as
A_20180101_test.xml
A_20180102_test.xml
B_20180101_test.xml
I tried the shell script below, but it does not help.
#!/usr/bin/env bash
target_dir_path="/usr/local/data/output"
for file in /usr/local/data/*/*/test.xml; do
l1="${file%%/*}"
l2="${file#*/}"
l2="${l2%%/*}"
filename="${file##*/}"
target_file_name="${l1}_${l2}_${filename}"
echo cp "$file" "${target_dir_path}/${target_file_name}"
done
What am I doing wrong in this shell script?

You can use the following command to do this operation:
source_folder="/usr/local/data/";target_folder="target"; find "$source_folder" -type f -name test.xml | awk -v targetF="$target_folder" 'BEGIN{FS="/"; OFS="_"}{printf "%s ", $0; print targetF"/"$(NF-2),$(NF-1),$NF}' | xargs -n2 cp;
or, split over several lines for readability:
source_folder="/usr/local/data/";
target_folder="target";
find "$source_folder" -type f -name test.xml |\
awk -v targetF="$target_folder" 'BEGIN{FS="/"; OFS="_"}{printf "%s ", $0; print targetF"/"$(NF-2),$(NF-1),$NF}' |\
xargs -n2 cp;
where
target_folder is your target folder
source_folder is your source folder
the find command searches for all files named test.xml under the source folder
the awk command then receives the target folder as a variable so it can use it; the BEGIN block sets the field separator (/) and the output field separator (_), and for each line awk prints the original file name followed by the new one
xargs passes that output, grouped in pairs, to the cp command, and the trick is done (since xargs splits its input on whitespace, this assumes the paths contain no spaces)
To do:
set the source_folder and target_folder variables to match your environment, optionally put everything in a script, and you are good to go!
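For comparison, here is a pure-bash sketch of the same copy that avoids awk and xargs, so it also copes with spaces in paths. It also shows why the loop in the question misbehaved: on an absolute path such as /usr/local/data/A/20180101/test.xml, ${file%%/*} removes everything from the first slash onward and yields an empty string (and the cp was prefixed with echo, so it only printed the command). Here the directory names are peeled off from the right instead. The function name copy_with_prefix is hypothetical, and the sketch assumes the two-level <parent>/<child>/test.xml layout from the question.

```shell
#!/usr/bin/env bash
# Hypothetical helper: copy every <src>/<parent>/<child>/test.xml to
# <dst>/<parent>_<child>_test.xml (two-level layout assumed, as in the question).
copy_with_prefix() {
  local src=$1 dst=$2 file dir parent child
  mkdir -p -- "$dst"
  for file in "$src"/*/*/test.xml; do
    [ -e "$file" ] || continue     # the glob matched nothing
    dir=${file%/*}                 # e.g. /usr/local/data/A/20180101
    child=${dir##*/}               # e.g. 20180101
    parent=${dir%/*}               # e.g. /usr/local/data/A
    parent=${parent##*/}           # e.g. A
    cp -- "$file" "$dst/${parent}_${child}_${file##*/}"
  done
}
```

Usage would be something like copy_with_prefix /usr/local/data /usr/local/data/output.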

I've modified your code a little to get it to work. See the comments in the code:
target_dir_path="/usr/local/data/output"
for file in /usr/local/data/*/*/test.xml; do
tmp=${file%/*/*/*} # Strip the last three path components: /usr/local/data
curr="${file#"$tmp/"}" # Extract wanted part of the filename: A/20180101/test.xml
mod=${curr//[\/]/_} # Replace forward slash with underscore
mv "$file" "$target_dir_path/$mod" # Move the file
done
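The trick in the loop above is to cut off the common prefix and then turn the remaining slashes into underscores. A small sketch of just that step as a reusable function, assuming you always want the last three path components (parent, child, file name); the name flatten_tail is hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the last three components of a path joined by "_",
# e.g. /usr/local/data/A/20180101/test.xml -> A_20180101_test.xml
flatten_tail() {
  local path=$1 prefix rest
  prefix=${path%/*/*/*}           # drop the shortest "/x/y/z" suffix -> /usr/local/data
  rest=${path#"$prefix/"}         # -> A/20180101/test.xml
  printf '%s\n' "${rest//\//_}"   # replace every "/" with "_"
}
```

In the loop it would be used as mv "$file" "/usr/local/data/output/$(flatten_tail "$file")".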

If you have the Perl-based rename command:
$ for f in tst/*/*/test.xml; do
rename -n 's|.*/([^/]+)/([^/]+)/(test.xml)|./$1_$2_$3|' "$f"
done
rename(tst/A/20180101/test.xml, ./A_20180101_test.xml)
rename(tst/A/20180102/test.xml, ./A_20180102_test.xml)
rename(tst/B/20180101/test.xml, ./B_20180101_test.xml)
the -n option is for a dry run; remove it after testing
change tst to /usr/local/data and ./ to /usr/local/data/output/ for your use case
.*/ matches (and discards) the leading part of the path
([^/]+)/([^/]+)/(test.xml) captures the required portions
$1_$2_$3 rearranges them as required
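If the Perl rename is not installed, the same capture-and-rearrange idea can be expressed with bash's own =~ operator and the BASH_REMATCH array. A minimal dry-run sketch that, like rename -n, only prints what it would do; the function name plan_renames is hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical dry-run helper: for each path .../<dir1>/<dir2>/test.xml given as
# an argument, print the mv command that would move it to <dest>/<dir1>_<dir2>_test.xml.
plan_renames() {
  local dest=$1 f
  shift
  local re='.*/([^/]+)/([^/]+)/(test\.xml)$'   # same captures as the rename regex
  for f in "$@"; do
    if [[ $f =~ $re ]]; then
      printf 'mv %q %q\n' "$f" "$dest/${BASH_REMATCH[1]}_${BASH_REMATCH[2]}_${BASH_REMATCH[3]}"
    fi
  done
}
```

For example, plan_renames /usr/local/data/output /usr/local/data/*/*/test.xml prints one mv line per file; piping that to sh would perform the moves.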

Related

How to rename fasta header based on filename in multiple files?

I have a directory with multiple FASTA files named as follows:
BC-1_bin_1_genes.faa
BC-1_bin_2_genes.faa
BC-1_bin_3_genes.faa
BC-1_bin_4_genes.faa
etc. (about 200 individual files)
The FASTA headers look like this:
>BC-1_k127_3926653_6 # 4457 # 5341 # -1 # ID=2_6;partial=01;start_type=Edge;rbs_motif=None;rbs_spacer=None;gc_cont=0.697
I now want to add the filename to the header, since I want to annotate the sequences for each file. I tried the following:
for file in *.faa;
do
sed -i "s/>.*/${file%%.*}/" "$file" ;
done
It worked partially, but it removed the ">" from the header, which is essential for a FASTA file. I tried to modify the "${file%%.*}" part to keep the carrot, but it always failed with "bad substitution" errors.
I also tried this:
awk '/>/{sub(">","&"FILENAME"_");sub(/\.faa/,x)}1' *.faa
This worked in theory but only printed everything on my terminal rather than changing it in the respective files.
Could someone assist with this?
It's not clear whether you want to replace the existing header or add to it. Both scenarios are easy to handle. Don't replace text you don't want to replace.
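for file in ./*.faa;
do
name=${file##*/} # strip the leading ./ (with it, ${file%%.*} would be empty)
name=${name%.faa} # strip the .faa extension
sed -i "s/^>.*/>${name}/" "$file"
done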
will replace the header, but include a leading > in the replacement, effectively preserving it; and
for file in ./*.faa;
do
name=${file##*/} # strip the leading ./
name=${name%.faa} # strip the .faa extension
sed -i "s/^>.*/& ${name}/" "$file"
done
will append the file name at the end of the header (& in the replacement string evaluates to the string we are replacing, again effectively preserving it).
For another variation, try
for file in *.faa;
do
sed -i "/^>/s/\$/ ${file%%.*}/" "$file"
done
which says: on lines that match the regex ^>, replace the empty string at the end of the line ($) with a space followed by the file name.
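sed -i is not portable: GNU sed accepts a bare -i, while BSD/macOS sed requires a backup-suffix argument (for example -i ''). If the loop has to run on both, one option is to write to a temporary file instead. A minimal sketch of the last variant (append the name to header lines only) done that way; the function name tag_fasta_headers is hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: append " <name>" to every FASTA header line (lines
# starting with ">") of one file, where <name> is the file name without
# directory and without .faa. Edits via a temp file, so GNU and BSD sed both work.
tag_fasta_headers() {
  local file=$1 name
  name=${file##*/}     # drop any directory part
  name=${name%.faa}    # drop the .faa extension
  # Assumes the name contains no "/", "&" or "\" (special in a sed replacement).
  sed "/^>/s/\$/ ${name}/" "$file" >"$file.tmp" && mv -- "$file.tmp" "$file"
}
```

Usage: for file in ./*.faa; do tag_fasta_headers "$file"; done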
Of course, your Awk script could easily be fixed, too. Standard Awk has no equivalent of sed's -i (in-place) option, but you can easily use a temporary file:
for file in ./*.faa;
do
awk '/^>/ { name = FILENAME; sub(/^\.\//, "", name); sub(/\.faa$/, "", name); $0 = $0 " " name } 1' "$file" >"$file.tmp" &&
mv "$file.tmp" "$file"
done
GNU Awk also has an -i inplace extension which you could simply add to the options of your existing script if you have GNU Awk.
Since FASTA files typically contain multiple headers, adding to the header rather than replacing all headers in a file with the same string seems more useful, so I changed your Awk script to do that instead.
For what it's worth, the name of the character ^ is caret (carrot is 🥕). The character > is called greater than or right angle bracket, or right broket or sometimes just wedge.
You just need to detect the pattern to replace and use regex to implement it:
fasta_helper.sh
location=$1
for file in "$location"/*.faa
do
full_filename=${file##*/}
filename="${full_filename%.*}"
# escape slashes so the name is safe inside the sed replacement
filename=$(printf '%s\n' "$filename" | sed 's_/_\\/_g')
echo "adding file name: $filename to: $full_filename"
# only header lines (starting with ">"): replace everything up to the first "#"
sed -i -E "/^>/s/^[^#]+/>$filename /" "$file"
done
usage:
Just pass the folder with fasta files:
bash fasta_helper.sh /foo/bar
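Because a bare ^[^#]+ pattern also matches sequence lines (they contain no #), it is worth confirming after a run that only header lines changed. A minimal check, assuming you kept a backup copy of each file before running the script; same_sequences is a hypothetical name:

```shell
#!/usr/bin/env bash
# Hypothetical check: succeed only if two FASTA files have identical
# sequence (non-header) lines, i.e. an edit touched nothing but headers.
same_sequences() {
  cmp -s <(grep -v '^>' "$1") <(grep -v '^>' "$2")
}
```

For example: cp BC-1_bin_1_genes.faa /tmp/BC-1_bin_1_genes.faa.bak, run bash fasta_helper.sh ., then same_sequences /tmp/BC-1_bin_1_genes.faa.bak BC-1_bin_1_genes.faa && echo "sequences unchanged".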
Further reading:
Regex: matching up to the first occurrence of a character
Extract filename and extension in Bash
https://unix.stackexchange.com/questions/78625/using-sed-to-find-and-replace-complex-string-preferrably-with-regex
Locating your files
I suggest first identifying your files with the find or ls command.
find . -type f -name "*.faa" -printf "%f\n"
A find command that prints only the names of files with the .faa extension, including those in subdirectories of the current directory.
ls -1 *.faa
An ls command that prints files and directories with the .faa extension in the current directory only.
Processing your files
Once you have the correct file list, iterate over it and apply the sed command.
find . -type f -name "*.faa" -print0 | while IFS= read -r -d '' filePath; do
fileName=${filePath##*/} # strip the directory part
strippedFileName=${fileName%.faa} # strip extension .faa
sed -i "1s|\$| $strippedFileName|" "$filePath" # append value of strippedFileName at end of line 1
done

How to move files in Linux based on its name in a folder with a corresponding name?

I need to move a series of files into certain folders via a script. The files have the format xxxx.date.0000, and I have to move each one into a folder whose name is its date value.
For example:
file hello.20190131.0000
in folder 20190131
Ideally I would create the folders before moving the files, but that is not a priority because I can create them by hand. I managed to print the dates on screen with
ls *.0000 | awk -F. '{print $2}'
Does anyone have any suggestions on how to proceed?
The initial awk command provided much of the answer. You just need to do something with the directory name you extract:
A simple option:
ls *.0000 | awk -F. '{printf "mkdir -p \047%s\047; mv \047%s\047 \047%s\047\n", $2, $0, $2}' | sh
This might be more efficient with a large number of files:
ls *.0000 | awk -F. '{print $2}' |\
sort | uniq |\
while read dir; do
mkdir -p "$dir"
mv *."$dir".0000 "$dir"
done
I would do something like this:
ls *.0000 |\
sort |\
while read -r f; do
foldername="$(echo "$f" | cut -d. -f2)"
echo mkdir -p "$foldername/" # remove "echo" to actually create the folder
echo mv "$f" "$foldername/" # remove "echo" to actually move the file
done
That is: for each of your files, I build the folder name using the cut command with a dot as the field separator, taking the second field (the date in this case); then I create that folder with mkdir -p (the -p flag avoids an error if the folder already exists), and finally I move the file into the new folder.
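The same loop can also be written with bash parameter expansion instead of cut or awk, which avoids parsing ls output. A minimal sketch, assuming the files sit in the current directory and the part before the first dot contains no dots itself; the function name move_by_date is hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: move each name.DATE.0000 file in the current directory
# into a folder called DATE, creating the folder if needed.
move_by_date() {
  local f date
  for f in *.*.0000; do
    [ -e "$f" ] || continue      # no matching files
    date=${f#*.}                 # hello.20190131.0000 -> 20190131.0000
    date=${date%%.*}             # 20190131.0000 -> 20190131
    mkdir -p -- "$date" && mv -- "$f" "$date/"
  done
}
```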
You can do that with rename, a.k.a. Perl rename.
Try it on a COPY of your files in a temporary directory.
If you use the -p option, it will create any necessary directories for you automatically. If you use the --dry-run option, you can see what it would do without actually doing anything.
rename --dry-run -p 'my @X=split /\./; $_=$X[1] . "/" . $_' hello*
Sample Output
'hello.20190131.0000' would be renamed to '20190131/hello.20190131.0000'
'hello.20190137.0000' would be renamed to '20190137/hello.20190137.0000'
All you need to know is that it passes you the current name of the file in a variable called $_ and it expects you to change that to return the new filename you would like.
So, I split the current name into the elements of an array @X, with the dot (period) as the separator:
my @X = split /\./
That gives me the output directory in $X[1]. Now I can set the new filename I want by putting the new directory, a slash and the old filename into $_:
$_=$X[1] . "/" . $_
You could also try this shorter version:
rename --dry-run -p 's/.*\.(\d+)\..*/$1\/$_/' hello*
On Arch Linux, the package you would use is called perl-rename.
On Debian, it is called rename.
On macOS, use Homebrew like this: brew install rename

Bash script to get all file with desired extensions

I'm trying to write a bash script that, given a text file containing a list of extensions and a folder, produces an output file listing all files that match those extensions, searching recursively in all subdirectories.
The extension list file is my first parameter and the folder is my second.
I have tried:
for i in $1 ; do
find . -name $2\*.$i -print>>result.txt
done
but it doesn't work.
As noted in the comments:
It is not a good idea to write to a hard-coded file name.
The given example only fixes the code from the OP's question.
Yes, of course, it is even better to call it with
x.sh y . > blabla
and remove the file name from the script itself. But my intention here is only to fix the question's code, not to redesign it.
The following bash script, named x.sh:
#!/bin/bash
: >result.txt # delete old content
while IFS= read -r i; do # read a line from file
find "$2" -name "*.$i" -print >>result.txt # for every item do a find
done <"$1" # read from file named with first arg from cmdline
with a text file named y with the following content:
txt
sh
and called with:
./x.sh y .
results in a file result.txt whose contents are:
a.txt
b.txt
x.sh
OK, here are some additional hints from the comments:
If the result file does not need to collect any content other than this loop's output, the script can be simplified to:
#!/bin/bash
while IFS= read -r i; do # read a line from file
find "$2" -name "*.$i" -print # for every item do a find
done <"$1" >result.txt # read from file named with first arg; write all output to result.txt
And as already mentioned:
The hard coded result.txt could be removed and the call can be something like
./x.sh y . > result.txt
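Calling find once per extension walks the directory tree once per line of y. A hedged alternative sketch builds a single find expression (-name '*.txt' -o -name '*.sh' ...) from the extension file in a bash array, so the tree is walked only once. The function name list_by_ext is hypothetical, blank lines in the list are skipped, and a missing final newline is tolerated:

```shell
#!/usr/bin/env bash
# Hypothetical variant of x.sh: read extensions from file "$1" and list all
# matching files under directory "$2" with a single find traversal.
list_by_ext() {
  local extfile=$1 dir=$2 ext
  local -a expr=()
  while IFS= read -r ext || [ -n "$ext" ]; do
    [ -n "$ext" ] || continue               # skip blank lines
    [ ${#expr[@]} -eq 0 ] || expr+=(-o)     # OR the tests together
    expr+=(-name "*.$ext")
  done <"$extfile"
  [ ${#expr[@]} -gt 0 ] || return 1         # empty extension list
  find "$dir" -type f \( "${expr[@]}" \) -print
}
```

Called as list_by_ext y . > result.txt, it fits the "no hard-coded output file" advice above.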
Give this one-liner command a try.
Replace /mydir with the folder to search.
Change the list of extensions in the pattern passed to grep -E:
find /mydir -type f | grep -E '[.](txt|xml)$' >> result.txt
Inside the parentheses, each extension should be separated with |.
The . character must be escaped, for example as [.], and the trailing $ anchors the match to the end of the file name.

Append directory name to the end of the files and move them

I am finding some files in a directory using this command:
find /Users/myname -type f
output is:
/Users/myname/test01/logs1/err.log
/Users/myname/test01/logs1/std
/Users/myname/test01/logs2/std
/Users/myname/test02/logs2/velocity.log
/Users/myname/test03/logs3/err.log
/Users/myname/test03/logs3/invalid-arg
I need to move these files to a different directory, appending the test directory name to the end of each file name, like below:
err.log-test01
std-test01
std-test01
velocity.log-test02
err.log-test03
invalid-arg-test03
I am trying with the cut command but not getting the desired output.
find /Users/myname -type f | cut -d'/' -f6,4
Plus, I also need to move the files to a different directory. I guess there could be a suitable way using sed, but I am not proficient with sed. How can this be achieved efficiently?
You can let find create the mv command, use sed to modify it and then have it run by the shell:
find /Users/myname -type f -printf "mv %p /other/dir/%f\n" |
sed 's,/\(test[0-9]*\)/\(.*\),/\1/\2-\1,' | sh
This assumes there are no spaces in any argument; otherwise, liberally add ' or ". Also, run it without the final | sh first to see what it actually wants to do. Note that -printf is a GNU find extension that the BSD find shipped with macOS does not support. If you need to anchor the test[0-9]* pattern better, you can include part of the left or right string in the match:
's,myname/\(test[0-9]*\)/\(.*\),myname/\1/\2-\1,'
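A space-safe variant without -printf (so it also works with BSD find) can read NUL-separated paths and build the new name with parameter expansion. A minimal sketch, assuming the layout <src>/<testdir>/<subdir>/<file> from the question, a source path without a trailing slash, and a destination outside the source tree; the function name move_with_test_suffix is hypothetical. Note that the desired output in the question maps both std files under test01 to std-test01, so this sketch refuses to overwrite and reports the collision instead:

```shell
#!/usr/bin/env bash
# Hypothetical helper: move every file under "$1" (laid out as
# $1/<testdir>/<subdir>/<file>) into "$2" as <file>-<testdir>.
# Assumes "$1" has no trailing slash and "$2" is outside "$1".
move_with_test_suffix() {
  local src=$1 dst=$2 path rel testdir target
  mkdir -p -- "$dst"
  find "$src" -type f -print0 |
  while IFS= read -r -d '' path; do
    rel=${path#"$src"/}              # e.g. test01/logs1/err.log
    testdir=${rel%%/*}               # e.g. test01
    target="$dst/${path##*/}-$testdir"
    if [ -e "$target" ]; then        # e.g. a second "std" file under test01
      echo "skipping $path: $target already exists" >&2
    else
      mv -- "$path" "$target"
    fi
  done
}
```

Usage: move_with_test_suffix /Users/myname /other/dir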
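You can move each file to /dst_dir, appending its test directory name, by building the target name with awk. When find is run from /Users/myname, each path looks like ./test01/logs1/err.log, so awk -F/ '{print $NF "-" $2}' yields err.log-test01. The full command could be as simple as:
cd /Users/myname &&
find . -type f | while IFS= read -r i
do mv "$i" "/dst_dir/$(echo "$i" | awk -F/ '{print $NF "-" $2}')"
done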
There are enough things going on that you may want to use a helper script with find, to ensure you can validate that the directory you move the files to exists, etc. A script might take the form of:
#!/bin/bash
if [ -z "$1" ] || [ -z "$2" ]; then # validate at least 2 arguments
printf "error: insufficient input\n"
exit 1
fi
ffn="$1" # full file name provided by find
newdir="$2" # the target directory
# validate existence of 'newdir' or create/exit on failure
[ -d "$newdir" ] || mkdir -p "$newdir"
[ -d "$newdir" ] || { printf "error: unable to create '%s'\n" "$newdir"; exit 1; }
tmp="${ffn##*test}" # strip through the last "test" (assumes file names don't contain "test")
num="${tmp%%/*}" # the test## number
fn="${ffn##*/}" # remove existing path from ffn
mv "$ffn" "${newdir}/${fn}-test${num}" # move to new location
exit 0
Save it somewhere accessible under a name like myscript and make it executable (e.g. chmod 0755 myscript). You may also choose to put it in a directory in your PATH. You can then call the script for every file returned by find with:
find /Users/myname -type f -exec ./path/to/myscript '{}' somedir \;
Where somedir is the target directory for the renamed file. Helper scripts generally provide the ability to do required validation that would otherwise not be done in one-liners.

Removing 10 Characters of Filename in Linux

I just downloaded about 600 files from my server and need to remove the last 11 characters from the filename (not including the extension). I use Ubuntu and I am searching for a command to achieve this.
Some examples are as follows:
aarondyne_kh2_13thstruggle_or_1250556383.mus should be renamed to aarondyne_kh2_13thstruggle_or.mus
aarondyne_kh2_darknessofunknow_1250556659.mp3 should be renamed to aarondyne_kh2_darknessofunknow.mp3
It seems that some duplicates might exist after I do this, but if the command fails to complete and tells me what the duplicates would be, I can always remove those manually.
Try using the rename command. It allows you to rename files based on a regular expression:
The following line should work out for you:
rename 's/_\d+(\.[a-z0-9A-Z]+)$/$1/' *
The following changes will occur:
aarondyne_kh2_13thstruggle_or_1250556383.mus renamed as aarondyne_kh2_13thstruggle_or.mus
aarondyne_kh2_darknessofunknow_1250556659.mp3 renamed as aarondyne_kh2_darknessofunknow.mp3
You can check what rename would do by specifying the -n flag, like this:
rename -n 's/_\d+(\.[a-z0-9A-Z]+)$/$1/' *
For more information on how to use rename simply open the manpage via: man rename
Not the prettiest, but very simple:
echo "$filename" | sed -e 's!\(.*\)...........\(\.[^.]*\)!\1\2!'
You'll still need to write the rest of the script, but it's pretty simple.
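A minimal sketch of that "rest of the script", reusing the sed expression above with an added $ anchor. It assumes the files are in the current directory, skips names too short to match, refuses to overwrite (reporting duplicates, as the question asked), and supports a -n dry run; the function name strip11 and its -n flag are hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: strip the 11 characters before the last extension of
# every file in the current directory. With -n, only print the mv commands.
strip11() {
  local dry=0 f new
  [ "${1-}" = -n ] && dry=1
  for f in *.*; do
    [ -f "$f" ] || continue
    new=$(printf '%s\n' "$f" | sed -e 's!\(.*\)...........\(\.[^.]*\)$!\1\2!')
    [ "$new" != "$f" ] || continue            # name too short to match
    if [ -e "$new" ]; then
      echo "duplicate: $f -> $new already exists" >&2
    elif [ "$dry" -eq 1 ]; then
      printf 'mv %q %q\n' "$f" "$new"
    else
      mv -- "$f" "$new"
    fi
  done
}
```

Run strip11 -n first, check the list, then run strip11.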
find . -type f -exec sh -c 'mv -- "$1" "$(printf "%s\n" "$1" | sed -E "s/[^/]{11}(\.[^./]+)?$/\1/")"' sh {} ";"
One way to go:
get a list of your files, one per line (with ls, for example), then:
ls....|awk '{o=$0;sub(/_[^_.]*\./,".",$0);print "mv "o" "$0}'
This prints the mv a b commands, e.g.:
kent$ echo "aarondyne_kh2_13thstruggle_or_1250556383.mus"|awk '{o=$0;sub(/_[^_.]*\./,".",$0);print "mv "o" "$0}'
mv aarondyne_kh2_13thstruggle_or_1250556383.mus aarondyne_kh2_13thstruggle_or.mus
To execute them, just pipe the output to sh.
This assumes there are no spaces in your filenames.
This script assumes each file has just one extension: only the part after the last dot is treated as the extension. To keep all extensions (everything after the first dot), remove one hash mark (#) from the first line of the loop body. It also assumes that the base of each filename has at least 12 characters, so that removing 11 doesn't leave you with an empty name; shorter names are skipped.
for f in *; do
ext=${f##*.}
base=${f%???????????."$ext"}
new_f=$base.$ext
if [ "$base" = "$f" ] || [ -z "$base" ]; then
echo "Will not rename $f, name is too short" >&2
elif [ -e "$new_f" ]; then
echo "Will not rename $f, $new_f already exists" >&2
else
mv -- "$f" "$new_f"
fi
done
