replacing file name and content : shell script - linux

I have written the code for replacing content but quite confused with replacing file name, both in the same script. I want my script to run this way:
suppose I give input : ./myscript.sh abc xyz
abc is my string to be replaced
xyz is the string to replace with
If some directory or any subdirectory or file in it has name abc, that should also change to xyz.
Here is my code:
filepath="/misc/home3/babitasandooja/fol1"
for file in $(grep -lR $1 $filepath)
do
sed -i "s/$1/$2/g" $file
echo "Modified: " $file
done
Now, how should I code the filename replacement as well?
I tried:
if( *$1*==$file )
then
rename $1 $2 $filepath
fi
and
find -iname $filepath -exec mv $1 $2 \;
but neither of them works correctly. What should I do? Which approach should I take?
Any help would be appreciated.
Thanks :)

#!/bin/bash
dir=$1 str=$2 rep=$3
while IFS= read -rd '' file; do
sed -i "s/$str/$rep/g" -- "$file"
base=${file##*/} dir=${file%/*}
[[ $base == *"$str"* ]] && mv "$file" "$dir/${base//$str/$rep}"
done < <(exec grep -ZFlR "$str" "$dir")
Usage:
bash script.sh dir string replacement
Note: rename would also rename the directory part.
grep -Z makes grep produce null-delimited output, i.e. every filename it prints is terminated by a 0x00 byte instead of a newline.
IFS= read -rd '' makes read consume that input: -d '' sets the delimiter to 0x00, -r prevents backslashes from being interpreted, and IFS= prevents read from trimming leading and trailing whitespace.
base=${file##*/} removes the directory part. It's the same as base=$(basename "$file")
dir=${file%/*} removes the file part.
[[ $base == *"$str"* ]] checks whether the filename contains something to rename. && runs the command that follows only if the previous one returns a zero (true) exit code. Think of it as a single-line if statement.
"$dir/${base//$str/$rep}" forms the new filename. ${base//$str/$rep} replaces every part of $base that matches the value of $str with the value of $rep.

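The expansions described above can be tried in isolation. This is only a minimal sketch, not part of the answer's script: the function name renamed_path is made up for illustration, and it prints the new path instead of moving anything. It quotes "$str" inside the substitution so that the string is matched literally rather than as a glob pattern.

```bash
#!/usr/bin/env bash
# renamed_path FILE STR REP  (hypothetical helper, for illustration only)
# Prints the path FILE would get if STR were replaced by REP in its last component.
# Assumes FILE contains at least one "/".
renamed_path() {
    local file=$1 str=$2 rep=$3
    local base=${file##*/}          # strip everything up to the last "/"
    local dir=${file%/*}            # strip the last "/" and what follows it
    local new=${base//"$str"/$rep}  # quoted "$str" is matched literally
    printf '%s\n' "$dir/$new"
}

renamed_path /tmp/abc/abc_notes.txt abc xyz
```

Only the last path component changes here; the directory named abc is left for a later pass, which is why the answer handles one path at a time.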
Related

Replace filename to a string of the first line in multiple files in bash

I have multiple fasta files, where the first line always contains a > with multiple words, for example:
File_1.fasta:
>KY620313.1 Hepatitis C virus isolate sP171215 polyprotein gene, complete cds
File_2.fasta:
>KY620314.1 Hepatitis C virus isolate sP131957 polyprotein gene, complete cds
File_3.fasta:
>KY620315.1 Hepatitis C virus isolate sP127952 polyprotein gene, complete cds
I would like to take the word starting with sP* from each file and rename each file to this string (for example: File_1.fasta to sP171215.fasta).
So far I have this:
$ for match in "$(grep -ro '>')";do
fname=$("echo $match|awk '{print $6}'")
echo mv "$match" "$fname"
done
But it doesn't work, I always get the error:
grep: warning: recursive search of stdin
I hope you can help me!
You can use something like this:
grep '>' *.fasta | while IFS= read -r line ; do
new_name="$(echo "$line" | cut -d' ' -f 6)"
old_name="$(echo "$line" | cut -d':' -f 1)"
mv -- "$old_name" "$new_name.fasta"
done
It searches the *.fasta files and handles every matching line:
it splits each grep result on spaces and takes the 6th field as the new name
it splits each grep result on : and takes the first field as the old name
it moves/renames the old filename to the new filename
There are several things going on with this code.
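For a start, I don't actually get this particular error, which is most likely down to different grep versions: older GNU grep prints this warning when -r is given without a file or directory operand, while newer versions search the current directory instead. Passing the files explicitly (for example *.fasta) avoids the problem.
Note that '>' inside single quotes is not treated as a redirection by bash, so the quoting itself is fine. I would avoid "\>" here, because GNU grep reads it as an end-of-word anchor rather than a literal >.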
Secondly:
fname=$("echo $match|awk '{print $6}'")
The quotes inside the command substitution make the shell treat the whole pipeline as a single command name, which is not what you want. Your code should look like this, if anything:
fname="$(echo $match|awk '{print $6}')"
Lastly, to properly retrieve your data, this should be your final code:
grep -H '>' *.fasta | while IFS= read -r match; do
fname="$(echo "$match" | cut -d: -f1)"
new_fname="$(echo "$match" | grep -o "sP[^ ]*")".fasta
echo mv "$fname" "$new_fname"
done
Explanations:
grep -H -> you want your grep to explicitly use "Include Filename", just in case other shell environments decide to alias grep to grep -h (no filenames)
you don't want to be doing grep -o on your file search, as you want to have both the filename and the "new filename" in one data entry.
Although, I don't see why you would search for '>' and not directly for 'sP', as such:
grep -Ho "sP[0-9]*" *.fasta | while IFS= read -r match; do
This is not exactly the same behaviour and has different edge cases, but it just might work for you.
Quite straightforward in (g)awk :
create a file "script.awk":
FNR == 1 {
for (i=1; i<=NF; i++) {
if (index($i, "sP")==1) {
print "mv", FILENAME, $i ".fasta"
nextfile
}
}
}
use it :
awk -f script.awk *.fasta > cmmd.txt
check the content of the output.
mv File_1.fasta sP171215.fasta
mv File_2.fasta sP131957.fasta
if ok, launch rename with . cmmd.txt
For all fasta files in directory, search their first line for the first word starting with sP and rename them using that word as the basename.
Using a bash array:
for f in *.fasta; do
arr=( $(head -1 "$f") )
for word in "${arr[@]}"; do
[[ "$word" =~ ^sP ]] && echo mv "$f" "${word}.fasta" && break
done
done
or using grep:
for f in *.fasta; do
word=$(head -1 "$f" | grep -o "\bsP\w*")
[ -z "$word" ] || echo mv "$f" "${word}.fasta"
done
Note: remove echo after you are ok with testing.
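The same idea can also be written with bash's own regex matching, without grep or awk. This is a sketch under assumptions: the helper name sp_name is invented for illustration, and the loop is a dry run that only prints the mv commands.

```bash
#!/usr/bin/env bash
# sp_name FILE  (hypothetical helper)
# Prints the first word starting with "sP" on the first line of FILE.
# Returns non-zero if the line has no such word.
sp_name() {
    local first re='(^|[[:space:]])(sP[^[:space:]]*)'
    # read fails on a last line without newline, so also accept a non-empty result
    IFS= read -r first < "$1" || [[ -n $first ]] || return 1
    [[ $first =~ $re ]] && printf '%s\n' "${BASH_REMATCH[2]}"
}

# Dry run: print the mv commands instead of executing them.
for f in *.fasta; do
    [[ -e $f ]] || continue            # the glob matched nothing
    if word=$(sp_name "$f"); then
        echo mv -- "$f" "$word.fasta"
    fi
done
```

As with the other answers, drop the echo once the printed commands look right.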

Creating a directory name based on a file name

In my script I am taking a text file and splitting into sections. Before doing any splitting, I am reformatting the name of the text file. PROBLEM: Creating a folder/directory and naming it the formatted file name. This is where segments are placed. However the script breaks when the text file has spaces in it. But that is the reason I am trying to reformat the name first and then do the rest of the operations. How could I do so in that sequence?
execute script: text_split.sh -s "my File .txt" -c 2
text_split.sh
# remove whitespace and format file name
FILE_PATH="/archive/"
find $FILE_PATH -type f -exec bash -c 'mv "$1" "$(echo "$1" \
| sed -re '\''s/^([^-]*)-\s*([^\.]*)/\L\1\E-\2/'\'' -e '\''s/ /_/g'\'' -e '\''s/_-/-/g'\'')"' - {} \;
sleep 1
# arg1: path to input file / source
# create directory
function fallback_out_file_format() {
__FILE_NAME=`rev <<< "$1" | cut -d"." -f2- | rev`
__FILE_EXT=`rev <<< "$1" | cut -d"." -f1 | rev`
mkdir -p $FILE_PATH${__FILE_NAME};
__OUT_FILE_FORMAT="$FILE_PATH${__FILE_NAME}"/"${__FILE_NAME}-part-%03d.${__FILE_EXT}"
echo $__OUT_FILE_FORMAT
exit 1
}
# Set variables and default values
OUT_FILE_FORMAT=''
# Grab input arguments
while getopts "s:c:" OPTION
do
case $OPTION in
s) SOURCE=$(echo "$OPTARG" | sed 's/ /\\ /g' ) ;;
c) CHUNK_LEN="$OPTARG" ;;
?) usage
exit 1
;;
esac
done
if [ -z "$OUT_FILE_FORMAT" ] ; then
OUT_FILE_FORMAT=$(fallback_out_file_format $SOURCE)
fi
Your script takes a filename argument, specified with -s, then modifies a hard-coded directory by renaming the files it contains, then uses the initial filename to generate an output directory and filename. It definitely sounds like the workflow should be adjusted. For instance, instead of trying to correct all the bad filenames in /archive/, just fix the name of the file specified with -s.
To get filename and extension, use bash's string manipulation ability, as shown in this question:
filename="${fullfile##*/}"
extension="${filename##*.}"
name="${filename%.*}"
You can trim whitespace from the input string using tr -d ' '.
You can then join this to your FILE_PATH variable with something like this:
FILE_NAME=$(echo $1 | tr -d ' ')
FILE_PATH="/archive/"
FILE_PATH=$FILE_PATH$FILE_NAME
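Putting those pieces together, the output pattern can be derived from the single file passed with -s, without touching the rest of /archive/. The following is a rough sketch rather than a drop-in replacement: out_file_format is a hypothetical name, the archive directory is passed as a parameter, and the source name is assumed to contain a dot.

```bash
#!/usr/bin/env bash
# out_file_format ARCHIVE_DIR SOURCE  (hypothetical helper)
# Prints a printf-style pattern ARCHIVE_DIR/NAME/NAME-part-%03d.EXT,
# with spaces removed from NAME. Assumes SOURCE has an extension.
out_file_format() {
    local archive=${1%/} source=$2
    local filename=${source##*/}    # drop any leading directories
    filename=${filename// /}        # remove spaces, like tr -d ' '
    local name=${filename%.*}       # part before the last dot
    local ext=${filename##*.}       # part after the last dot
    printf '%s\n' "$archive/$name/$name-part-%03d.$ext"
}

out_file_format /archive "my File .txt"
```

The caller would then create the directory with something like mkdir -p "$(dirname "$fmt")", keeping the double quotes so that any remaining special characters survive.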
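You can escape the space using a backslash (\).
The user may not always supply the backslash, so the script can use sed to convert every space to a backslash followed by a space:
sed 's/ /\\ /g'
You can then obtain the new directory name as:
dir_name=$(echo "$1" | sed 's/ /\\ /g')
Note that such a backslash only helps if the string is parsed by a shell again later; when the variable is simply expanded, quoting it as "$dir_name" is what keeps the name in one piece.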

sed or operator in set of regex

The bash script I wrote is supposed to modify my text files. The problem is the speed of operation. There are 4 lines of each file I want to modify.
This is my bash script to modify all .txt files in a given folder:
srcdir="$1" # source directory
cpr=$2 # given string argument
find $srcdir -name "*.txt" | while read i; do
echo "#############################"
echo "$i"
echo "Custom string: $cpr"
echo "#############################"
# remove document name and title
sed -i 's_document=.*\/[0-9]\{10\}\(, User=team\)\?__g' $i
# remove document date
sed -i 's|document date , [0-9]\{2\}\/[0-9]\{2\}\/[0-9]\{4\} [0-9]\{2\}:[0-9]\{2\}:[0-9]\{2\} MDT||g' $i
# remove document id
sed -i 's|document id = 878h67||g' $i
# replace new producer
sed_arg="-i 's|Reproduced by $cpr|john smith|g' $i"
eval sed "$sed_arg"
done
I don't know how to concatenate all my sed commands into one or two commands, so that the job would be done faster (I think!).
I have tried the regex OR operator |, but with no success.
Have you tried
sed -i -e 's/pattern/replacement/g' -e 's/pattern1/replace1/g' file
sed -i '
s_document=.*\/[0-9]\{10\}\(, User=team\)\?__g;
s|document date , [0-9]\{2\}\/[0-9]\{2\}\/[0-9]\{4\} [0-9]\{2\}:[0-9]\{2\}:[0-9]\{2\} MDT||g;
s|document id = 878h67||g;
s|Reproduced by '"$cpr"'|john smith|g' $i
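To see the effect of chaining expressions without touching any files, the same technique can be applied to a stream. This is a small sketch using two of the substitutions from the question; clean_stream is an invented name, and the value passed as CPR is assumed not to contain | or regex metacharacters.

```bash
#!/usr/bin/env bash
# clean_stream CPR  (hypothetical helper)
# Reads text on stdin and applies both substitutions in a single sed process.
clean_stream() {
    local cpr=$1
    sed -e 's|document id = 878h67||g' \
        -e "s|Reproduced by $cpr|john smith|g"
}

printf '%s\n' 'document id = 878h67' 'Reproduced by ACME' | clean_stream ACME
```

Each file is then read and rewritten once instead of four times, which is where the saving comes from.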

Iterate over a list of files with spaces

I want to iterate over a list of files. This list is the result of a find command, so I came up with:
getlist() {
for f in $(find . -iname "foo*")
do
echo "File found: $f"
# do something useful
done
}
It's fine except if a file has spaces in its name:
$ ls
foo_bar_baz.txt
foo bar baz.txt
$ getlist
File found: foo_bar_baz.txt
File found: foo
File found: bar
File found: baz.txt
What can I do to avoid the split on spaces?
You could replace the word-based iteration with a line-based one:
find . -iname "foo*" | while read f
do
# ... loop body
done
There are several workable ways to accomplish this.
If you wanted to stick closely to your original version it could be done this way:
getlist() {
IFS=$'\n'
for file in $(find . -iname 'foo*') ; do
printf 'File found: %s\n' "$file"
done
}
This will still fail if file names have literal newlines in them, but spaces will not break it.
However, messing with IFS isn't necessary. Here's my preferred way to do this:
getlist() {
while IFS= read -d $'\0' -r file ; do
printf 'File found: %s\n' "$file"
done < <(find . -iname 'foo*' -print0)
}
If you find the < <(command) syntax unfamiliar you should read about process substitution. The advantage of this over for file in $(find ...) is that files with spaces, newlines and other characters are correctly handled. This works because find with -print0 will use a null (aka \0) as the terminator for each file name and, unlike newline, null is not a legal character in a file name.
The advantage to this over the nearly-equivalent version
getlist() {
find . -iname 'foo*' -print0 | while read -d $'\0' -r file ; do
printf 'File found: %s\n' "$file"
done
}
is that any variable assignment in the body of the while loop is preserved. That is, if you pipe to while as above, the body of the while runs in a subshell, which may not be what you want.
The advantage of the process substitution version over find ... -print0 | xargs -0 is minimal: The xargs version is fine if all you need is to print a line or perform a single operation on the file, but if you need to perform multiple steps the loop version is easier.
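The subshell difference is easy to observe with a counter. The sketch below assumes bash with its default settings (in particular, shopt lastpipe is off); the function names are made up for this illustration.

```bash
#!/usr/bin/env bash
# Both functions try to count the regular files find reports below DIR.
count_with_pipe() {
    local n=0 file
    find "$1" -type f -print0 | while IFS= read -r -d '' file; do
        n=$((n + 1))               # runs in a subshell; the change is lost
    done
    printf '%s\n' "$n"
}

count_with_procsub() {
    local n=0 file
    while IFS= read -r -d '' file; do
        n=$((n + 1))               # runs in the current shell
    done < <(find "$1" -type f -print0)
    printf '%s\n' "$n"
}

demo=$(mktemp -d)
touch "$demo/foo one" "$demo/foo two"
count_with_pipe "$demo"       # prints 0
count_with_procsub "$demo"    # prints 2
rm -rf "$demo"
```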
EDIT: Here's a nice test script so you can get an idea of the difference between different attempts at solving this problem
#!/usr/bin/env bash
dir=/tmp/getlist.test/
mkdir -p "$dir"
cd "$dir"
touch 'file not starting foo' foo foobar barfoo 'foo with spaces'\
'foo with'$'\n'newline 'foo with trailing whitespace '
# while with process substitution, null terminated, empty IFS
getlist0() {
while IFS= read -d $'\0' -r file ; do
printf 'File found: '"'%s'"'\n' "$file"
done < <(find . -iname 'foo*' -print0)
}
# while with process substitution, null terminated, default IFS
getlist1() {
while read -d $'\0' -r file ; do
printf 'File found: '"'%s'"'\n' "$file"
done < <(find . -iname 'foo*' -print0)
}
# pipe to while, newline terminated
getlist2() {
find . -iname 'foo*' | while read -r file ; do
printf 'File found: '"'%s'"'\n' "$file"
done
}
# pipe to while, null terminated
getlist3() {
find . -iname 'foo*' -print0 | while read -d $'\0' -r file ; do
printf 'File found: '"'%s'"'\n' "$file"
done
}
# for loop over subshell results, newline terminated, default IFS
getlist4() {
for file in "$(find . -iname 'foo*')" ; do
printf 'File found: '"'%s'"'\n' "$file"
done
}
# for loop over subshell results, newline terminated, newline IFS
getlist5() {
IFS=$'\n'
for file in $(find . -iname 'foo*') ; do
printf 'File found: '"'%s'"'\n' "$file"
done
}
# see how they run
for n in {0..5} ; do
printf '\n\ngetlist%d:\n' $n
eval getlist$n
done
rm -rf "$dir"
There is also a very simple solution: rely on bash globbing
$ mkdir test
$ cd test
$ touch "stupid file1"
$ touch "stupid file2"
$ touch "stupid file 3"
$ ls
stupid file 3 stupid file1 stupid file2
$ for file in *; do echo "file: '${file}'"; done
file: 'stupid file 3'
file: 'stupid file1'
file: 'stupid file2'
Note that I am not sure this behavior is the default one but I don't see any special setting in my shopt so I would go and say that it should be "safe" (tested on osx and ubuntu).
find . -iname "foo*" -print0 | xargs -L1 -0 echo "File found:"
find . -name "fo*" -print0 | xargs -0 ls -l
See man xargs.
Since you aren't doing any other type of filtering with find, you can use the following as of bash 4.0:
shopt -s globstar
getlist() {
for f in **/foo*
do
echo "File found: $f"
# do something useful
done
}
The **/ will match zero or more directories, so the full pattern will match foo* in the current directory or any subdirectories.
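A variant of the same idea that takes the starting directory as an argument is sketched below. The function name list_foo is hypothetical, and nullglob is an addition so that the loop body is skipped when nothing matches. Unlike find -iname, the glob is case sensitive unless nocaseglob is also set.

```bash
#!/usr/bin/env bash
# globstar needs bash 4.0 or later; nullglob makes an unmatched pattern expand to nothing.
shopt -s globstar nullglob

# list_foo DIR  (hypothetical helper)
# Prints every path under DIR whose name starts with "foo".
list_foo() {
    local f
    for f in "$1"/**/foo*; do
        printf 'File found: %s\n' "$f"
    done
}

list_foo .
```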
I really like for loops and array iteration, so I figure I will add this answer to the mix...
I also liked marchelbling's stupid file example. :)
$ mkdir test
$ cd test
$ touch "stupid file1"
$ touch "stupid file2"
$ touch "stupid file 3"
Inside the test directory:
readarray -t arr <<< "`ls -A1`"
This adds each file listing line into a bash array named arr with any trailing newline removed.
Let's say we want to give these files better names...
for i in "${!arr[@]}"
do
newname=`echo "${arr[$i]}" | sed 's/stupid/smarter/; s/  */_/g'`;
mv "${arr[$i]}" "$newname"
done
${!arr[@]} expands to 0 1 2 so "${arr[$i]}" is the ith element of the array. The quotes around the variables are important to preserve the spaces.
The result is three renamed files:
$ ls -1
smarter_file1
smarter_file2
smarter_file_3
find has an -exec argument that loops over the find results and executes an arbitrary command. For example:
find . -iname "foo*" -exec echo "File found: {}" \;
Here {} represents the found files, and wrapping it in "" allows for the resultant shell command to deal with spaces in the file name.
In many cases you can replace that last \; (which starts a new command) with a \+, which will put multiple files in the one command (not necessarily all of them at once though, see man find for more details).
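As a sketch of the + form, printf works well as the command because it reuses its format string for every argument it receives. The wrapper name report_foo is invented for illustration, and an external printf binary is assumed to be on the PATH, since find cannot run shell builtins.

```bash
#!/usr/bin/env bash
# report_foo DIR  (hypothetical helper)
# With "+", find passes many paths to one printf call; printf repeats the
# format for each argument, so every path still gets its own line.
report_foo() {
    find "$1" -iname 'foo*' -exec printf 'File found: %s\n' {} +
}

report_foo .
```

Here {} stands alone as its own argument, so names with spaces are passed through intact without any extra quoting.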
I recently had to deal with a similar case, and I built a FILES array to iterate over the filenames:
eval FILES=($(find . -iname "foo*" -printf '"%p" '))
The idea here is to surround each filename with double quotes, separate them with spaces and use the result to initialize the FILES array.
The use of eval is necessary to evaluate the double quotes in the find output correctly for the array initialization.
To iterate over the files, just do:
for f in "${FILES[@]}"; do
# Do something with $f
done
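An alternative that fills the same kind of array without eval is sketched here. It assumes bash 4.4 or later, where mapfile accepts -d for a custom delimiter, and a find that supports -print0; collect_foo is a hypothetical helper name. File names containing double quotes or newlines, which would break the eval approach, are handled as well.

```bash
#!/usr/bin/env bash
# collect_foo DIR  (hypothetical helper)
# Fills the global array FILES with the paths find reports, one element per file.
collect_foo() {
    FILES=()
    # -d '' splits on NUL bytes, -t drops the delimiter from each element
    mapfile -d '' -t FILES < <(find "$1" -iname 'foo*' -print0)
}

collect_foo .
for f in "${FILES[@]}"; do
    printf 'File found: %s\n' "$f"
done
```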
In some cases, for example if you just need to copy or move a list of files, you could pipe that list to awk as well.
The escaped quotes \" ... \" around the field $0 are important (each line of the list is one file name).
find . -iname "foo*" | awk '{print "mv \""$0"\" ./MyDir2" | "sh" }'
Ok - my first post on Stack Overflow!
Though my problems with this have always been in csh not bash the solution I present will, I'm sure, work in both. The issue is with the shell's interpretation of the "ls" returns. We can remove "ls" from the problem by simply using the shell expansion of the * wildcard - but this gives a "no match" error if there are no files in the current (or specified folder) - to get around this we simply extend the expansion to include dot-files thus: * .* - this will always yield results since the files . and .. will always be present. So in csh we can use this construct ...
foreach file (* .*)
echo $file
end
if you want to filter out the standard dot-files then that is easy enough ...
foreach file (* .*)
if ("$file" == .) continue
if ("$file" == ..) continue
echo $file
end
The code in the first post on this thread would be written thus:-
getlist() {
for f in * .*
do
echo "File found: $f"
# do something useful
done
}
Hope this helps!
Another solution for the job.
The goal was to:
select/filter filenames recursively in directories
handle each name (whatever spaces are in the path)
#!/bin/bash -e
## Trick in order to handle files with spaces in their path
OLD_IFS=${IFS}
IFS=$'\n'
files=($(find ${INPUT_DIR} -type f -name "*.md"))
for filename in ${files[*]}
do
# do your stuff
# ....
done
IFS=${OLD_IFS}

How can I add a string to the beginning of each file in a folder in bash?

I want to be able to prepend a string to the beginning of each text file in a folder. How can I do this using bash on Linux?
This will do that. You could make it more efficient if you are doing the same text to each file...
for f in *; do
echo "whatever" > tmpfile
cat "$f" >> tmpfile
mv tmpfile "$f"
done
You can do it like this without a loop and cat
sed -i '1i whatever' *
if you want to back up your files, use -i.bak
Or using awk
awk 'FNR==1{$0="whatever\n"$0;}{print $0>FILENAME}' *
And you can do this using sed in 1 single command as well
for f in *; do
sed -i.bak '1i\
foo-bar
' ${f}
done
This should do the trick.
FOLDER='path/to/your/folder'
TEXT='Text to prepend'
cd "$FOLDER" || exit 1
for i in *; do
CONTENTS=$(cat "$i")
echo "$TEXT" > "$i" # use echo -n if you want the prepended text on the same line as the original first line
echo "$CONTENTS" >> "$i"
done
I wouldn't recommend doing this if your files are very big though.
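For larger files, a version that streams through a temporary file avoids holding the contents in a shell variable. This is a sketch under assumptions: prepend_line is a made-up helper name, and the replacement file gets the permissions mktemp assigns, so the original mode is not preserved.

```bash
#!/usr/bin/env bash
# prepend_line TEXT FILE  (hypothetical helper)
# Writes TEXT as a new first line of FILE via a temporary file next to it.
prepend_line() {
    local text=$1 file=$2 tmp
    tmp=$(mktemp "${file}.XXXXXX") || return 1
    # printf emits the new line, cat streams the old contents after it;
    # the original is only replaced if both succeeded.
    { printf '%s\n' "$text"; cat -- "$file"; } > "$tmp" && mv -- "$tmp" "$file"
}
```

It could be called from a loop such as for f in *.txt; do prepend_line 'Text to prepend' "$f"; done, with the variables quoted so that names with spaces work.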
You can do this as well:
for f in *; do
cat <(echo "someline") "$f" > tempfile
mv tempfile "$f"
done
It's not much different from the first post, but it does show how to treat the output of the echo statement as a file without having to create a temporary file to store the value.
You may use the ed command to do without temporary files if you like:
for file in *; do
(test ! -f "${file}" || test ! -w "${file}") && continue # sort out non-files and non-writable files
if test -s "${file}" && ! grep -Iqs '.*' "${file}"; then continue; fi # sort out binary files
printf '\n%s\n\n' "FILE: ${file}"
# cf. http://wiki.bash-hackers.org/howto/edit-ed
printf '%s\n' H 0a "foobar" . ',p' q | ed -s "${file}" # dry run (just prints to stdout)
#printf '%s\n' H 0a "foobar" . wq | ed -s "${file}" # in-place file edit without any backup
done | less
This is the easiest I have worked out.
sed -i '1s/^/Text to add then new file\n/' /file/to/change
Here is an example :
for f in *;
do
mv "$f" "whatever_$f"
done
A one-liner: rename '' string_ *

Resources