How to replace N repeated special characters in Bash?

I want to replace any run of special characters (anything that is not a number or letter) with one single '-'.
I tried the code below with some characters, but it doesn't work when a character is repeated more than once, because the result still contains more than one '-'.
#!/bin/bash
for f in *; do mv "$f" "${f// /-}"; done
for f in *; do mv "$f" "${f//_/-}"; done
for f in *; do mv "$f" "${f//-/-}"; done
What I want:
test---file -> test-file
test file -> test-file
test______file -> test-file
teeesst--ffile -> teeesst-ffile
test555----file__ -> test555-file
Please explain your answer, because I don't know much about Bash or regular expressions.

There are a couple of different rename (or prename) commands available in various distributions of Linux that will handle regex substitutions.
But you can also use Bash's extended globbing to do some of that. The pattern ${var//+([-_ ])/-} says to replace any runs of one or more characters that are listed in the square brackets with one hyphen.
shopt -s extglob
# demonstration:
for file in test---file 'test file' test______file teeesst--ffile test555----file__
do
echo "${file//+([-_ ])/-}"
done
Output:
test-file
test-file
test-file
teeesst-ffile
test555-file-
The extended glob +(...) is similar to (...)+ in a regular expression: one or more occurrences of the enclosed pattern. Other Bash extended globs (from man bash):
?(pattern-list)
Matches zero or one occurrence of the given patterns
*(pattern-list)
Matches zero or more occurrences of the given patterns
+(pattern-list)
Matches one or more occurrences of the given patterns
@(pattern-list)
Matches one of the given patterns
!(pattern-list)
Matches anything except one of the given patterns
Note that the final hyphen is not removed here, but it could be removed with an additional parameter expansion:
file=${file/%-/}
which says to remove a hyphen at the end of the string.
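Putting the two expansions together, here is a minimal sketch of a complete rename. It swaps the bracket [-_ ] for [^[:alnum:]] so that any non-alphanumeric character counts, as the question asks; be aware that this also turns the dot of an extension into a hyphen and leaves a leading hyphen in place. sanitize_name and rename_all are hypothetical helper names, and rename_all is only defined, not run.

```shell
#!/usr/bin/env bash
# extglob must be enabled before the function below is parsed.
shopt -s extglob

# Print NAME with every run of non-alphanumeric characters turned into one '-'
# and a single trailing '-' removed.
sanitize_name() {
    local name=$1
    name=${name//+([^[:alnum:]])/-}   # longest runs of non-alnum chars -> '-'
    name=${name%-}                    # drop one trailing '-', if any
    printf '%s\n' "$name"
}

# Demonstration on the names from the question (prints, does not rename).
for f in 'test---file' 'test file' 'test______file' 'teeesst--ffile' 'test555----file__'; do
    sanitize_name "$f"
done

# Rename every entry in the current directory (defined here, not called).
rename_all() {
    local f new
    for f in *; do
        [[ -e $f ]] || continue                 # '*' matched nothing
        new=$(sanitize_name "$f")
        [[ -n $new && $new != "$f" ]] || continue
        mv -n -- "$f" "$new"                    # -n: never overwrite
    done
}
```

The demonstration loop prints test-file three times, then teeesst-ffile and test555-file. Using mv -n means two names that clean up to the same result will not silently overwrite each other.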

You can use tr (as suggested in a comment on the question), but sed actually makes more sense in this case. For example, given your list of filenames:
$ cat fnames
test---file
test file
test______file
teeesst--ffile
test555----file__
You can use the sed expression:
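sed -e 's/[[:punct:] ][[:punct:] ]*/-/g' -e 's/[[:punct:] ]*$//'
(The g flag makes the first expression collapse every run of punctuation or spaces in a name, not just the first one; the second expression then removes any trailing run.)
Example Use/Output
$ sed -e 's/[[:punct:] ][[:punct:] ]*/-/g' -e 's/[[:punct:] ]*$//' fnames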
test-file
test-file
test-file
teeesst-ffile
test555-file
Depending on how your filenames are stored, you can either use command substitution individually, or you can use process substitution and feed the updated names into a while loop or something similar.
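Here is a minimal sketch of the process-substitution route, assuming the names contain no newlines and that it is run inside the directory to be cleaned. It uses the two sed expressions with the g flag on the first one, so every run is collapsed (a--b__c becomes a-b-c). Because [[:punct:]] includes '.', extensions are mangled as well. clean_name and rename_with_sed are hypothetical names.

```shell
#!/usr/bin/env bash
# Print NAME with each run of punctuation/spaces replaced by one '-'
# and any trailing run removed.
clean_name() {
    printf '%s\n' "$1" | sed -e 's/[[:punct:] ][[:punct:] ]*/-/g' -e 's/[[:punct:] ]*$//'
}

# Rename entries in the current directory (defined here, not called).
# Process substitution keeps the while loop in the current shell.
rename_with_sed() {
    local f new
    while IFS= read -r f; do
        [[ -e $f ]] || continue                 # '*' matched nothing
        new=$(clean_name "$f")
        [[ -n $new && $new != "$f" ]] || continue
        mv -n -- "$f" "$new"                    # -n: never overwrite
    done < <(printf '%s\n' *)
}
```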

Related

How to replace date part in filename with current date

How can I replace only the date part of the names of all files in a directory with the current date, in Unix?
Folder path: C:/shan
Sample files:
CN_Apria_837p_20180924.txt
DN_Apria_837p_20150502.txt
GN_Apria_837p_20160502.txt
CH_Apria_837p_20170502.txt
CU_Apria_837p_20180502.txt
PN_Apria_837p_20140502.txt
CN_Apria_837p_20101502.txt
Desired result should be:
CN_Apria_837p_20190502.txt
DN_Apria_837p_20190502.txt
GN_Apria_837p_20190502.txt
CH_Apria_837p_20190502.txt
CU_Apria_837p_20190502.txt
PN_Apria_837p_20190502.txt
CN_Apria_837p_20190502.txt
Edit:
I'm completely new to Unix shell scripting. I tried the script below, but it's not working.
#!/bin/bash
for i in `ls $1 | grep -E '[0-9]{4}-[0-9]{2}-[0-9]{2}'`
do
x=`echo $i | grep -oE '[0-9]{4}-[0-9]{2}-[0-9]{2}'`
y=`echo $i | sed "s/$x/$(date +%F)/g"`
mv $1/$i $1/$y 2>/dev/null #incase if old date is same as current date
done
I would use regular expressions here. From the bash man-page:
An additional binary operator, =~, is available, with the same
precedence as == and !=. When it is used, the string to the right
of the operator is considered an extended regular expression and
matched accordingly (as in regex(3)). The return value is 0 if the
string matches the pattern, and 1 otherwise. .... Substrings
matched by parenthesized subexpressions within the regular
expression are saved in the array variable BASH_REMATCH. ...
The element of BASH_REMATCH with index n is the portion of the
string matching the nth parenthesized sub-expression.
Hence, assuming that the variable x holds the name of one of the files
in question, the code
if [[ $x =~ ^(.*_)[0-9]+([.]txt$) ]]
then
mv "$x" "${BASH_REMATCH[1]}$(date +%Y%m%d)${BASH_REMATCH[2]}"
fi
first tests roughly whether the file indeed follows the required naming scheme, and then modifies the name accordingly.
Of course in practice, you will tailor the regexp to match your application better. Only you can know what variations in the file name are permitted.
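As one possible tailoring, here is a sketch that wraps the test in a function and applies it to a directory. It assumes the names look like PREFIX_YYYYMMDD.txt, as in the samples, and keeps the regex in a variable so no quoting is needed on the right-hand side of =~. with_date and redate_dir are hypothetical names, and redate_dir is only defined, not run.

```shell
#!/usr/bin/env bash
# with_date NAME YYYYMMDD: print NAME with its 8-digit date replaced, or NAME
# unchanged if it does not look like PREFIX_YYYYMMDD.txt.
with_date() {
    local name=$1 today=$2
    local re='^(.*_)[0-9]{8}([.]txt)$'
    if [[ $name =~ $re ]]; then
        # [1] = everything up to the last '_', [2] = the '.txt' suffix
        printf '%s\n' "${BASH_REMATCH[1]}${today}${BASH_REMATCH[2]}"
    else
        printf '%s\n' "$name"
    fi
}

# redate_dir DIR: rename matching files in DIR to today's date.
redate_dir() {
    local dir=$1 today path name new
    today=$(date +%Y%m%d)
    for path in "$dir"/*.txt; do
        [[ -e $path ]] || continue          # the glob matched nothing
        name=${path##*/}
        new=$(with_date "$name" "$today")
        [[ $new != "$name" ]] || continue   # no date, or already today's date
        mv -i -- "$path" "$dir/$new"        # -i: ask before overwriting
    done
}
```

For example, redate_dir /path/to/dir would rename CN_Apria_837p_20180924.txt to CN_Apria_837p_ followed by today's date and .txt.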
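The following should do this:
while IFS= read -r -d '' f
do
# Replace each date-like 8-digit run in the path with today's date (-r is GNU sed)
newname=$(printf '%s\n' "$f" | sed -r "s/[12][0-9]{3}[01][0-9][0-3][0-9]/$(date '+%Y%m%d')/g")
[ "$newname" != "$f" ] && mv -- "$f" "$newname"
done < <(find /path/to/files -name "*_*_*_*.txt" -print0)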
Try this Shellcheck-clean code:
#! /bin/bash -p
readonly dir=$1
shopt -s nullglob # Make glob patterns that match nothing expand to nothing
readonly dateglob='20[0-9][0-9][0-9][0-9][0-9][0-9]'
currdate=$(date '+%Y%m%d')
# shellcheck disable=SC2231
for path in "$dir"/*_${dateglob}.* ; do
name=${path##*/}
newname=${name/_${dateglob}./_${currdate}.}
if [[ $newname != "$name" ]] ; then
newpath="$dir/$newname"
printf "%q -> %q\\n" "$path" "$newpath"
mv -i -- "$path" "$newpath"
fi
done
shopt -s nullglob stops the code from trying to process a garbage path if nothing matches the glob pattern in for path in ....
The pattern assigned to dateglob assumes that you will not have to process dates before 2000 (or after 2099!). Change it if that assumption is not valid.
The # shellcheck ... line is to prevent Shellcheck warning about the use of ${dateglob} without quotes. The quotes would be wrong in this case because they would prevent the glob pattern being expanded.
The pattern used to match filenames (*_${dateglob}.*) will match many more forms of filename than the examples given (e.g. A_20180313.tar.gz). You might want to change it.
See Removing part of a string (BashFAQ/100 (How do I do string manipulation in bash?)) for information about the Bash string manipulation mechanisms used (${path##...}, ${name/...}).
I've added a printf to output details of what is being moved.
The -i option to mv prompts for confirmation if a file would be overwritten. This turns out to be an issue for the example files because both CN_Apria_837p_20180924.txt and CN_Apria_837p_20101502.txt are identical except for the date, so the code tries to rename them to the same thing.
If any of the files with dates in their names have names beginning with '.', the code will not process them. Add the line shopt -s dotglob somewhere before the loop if that is an issue.

Expanding when one has command substitution

I have to echo the output of a command substitution concatenated with a string. The string to be prepended is in fact the pathname. I need the absolute path together with the filename because the filenames contain a special character, -, at the beginning. I've come up with a draft that only works as planned for the first line of output. How do I extend it to the other lines as well?
The example scenario is as provided below.
Inside /tmp directory the files are:
-foo 1.txt
-bar 1.txt
Command and the output is:
$ echo "$PWD/$(ls | grep "^-")"
/tmp/-bar 1.txt
-foo 1.txt
While I want it to be like
/tmp/-bar 1.txt
/tmp/-foo 1.txt
I read about the brace expansion feature, but I'm not sure if it works for variables, or for command substitution for that matter, as stated here. I also want a separate line for each file, with the words of each filename left unsplit, which seems to depend on when brace expansion is carried out. (Honestly, I don't understand much of the literature about features such as brace expansion!)
Also, are there other more convenient ways to do this? Any help is appreciated.
To do what you're asking, getting a list of full paths for files starting with -, you can use readlink:
$ readlink -f ./-*
/tmp/-bar 1.txt
/tmp/-foo 1.txt
However, to do directly what you mentioned in comments (using filenames starting with - as arguments for pdfgrep), you can take advantage of the common convention that -- marks the end of options, so everything after it is recognized as a filename:
pdfgrep 'pattern' -- -*
See also POSIX Utility Syntax Guidelines (Guideline 10):
The first -- argument that is not an option-argument should be accepted as a delimiter indicating the end of options. Any following arguments should be treated as operands, even if they begin with the - character.
Don't use ls at all. You can use a glob, and your loop can be implicit:
$ printf '%s\n' /tmp/-*
/tmp/-bar 1.txt
/tmp/-foo 1.txt
or explicit:
$ for f in /tmp/-*; do echo "$f"; done
/tmp/-bar 1.txt
/tmp/-foo 1.txt
Use this:
for f in /tmp/-*
do
echo "$f"
done
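One caveat with the glob answers above: if no name starts with -, the pattern is left as it is and the loop prints the literal /tmp/-*. A minimal sketch that guards against that, using $PWD as the question did; abs_dash_files is a hypothetical name.

```shell
#!/usr/bin/env bash
# Print the absolute path of every entry in the current directory whose name
# starts with '-', one per line, in the shell's sorted glob order.
abs_dash_files() {
    local f
    for f in "$PWD"/-*; do
        [[ -e $f ]] || continue   # no match: the glob stayed literal, skip it
        printf '%s\n' "$f"
    done
}
```

Running cd /tmp && abs_dash_files would print /tmp/-bar 1.txt and /tmp/-foo 1.txt on separate lines, with the space inside each name preserved.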

How to delete numbers, dashes and underscores in the beginning of a file name

I have thousands of mp3 files, all with unusual file names such as 1-2songone.mp3, 2songtwo.mp3, 2_2_3_songthree.mp3. I want to remove all the numbers, dashes and underscores at the beginning of these filenames and get the result:
songone.mp3
songtwo.mp3
songthree.mp3
This can be done using extended globbing:
$ ls
1-2songone.mp3 2_2_3_songthree.mp3 2songtwo.mp3
$ shopt -s extglob
$ for fname in *.mp3; do mv -- "$fname" "${fname##*([-_[:digit:]])}"; done
$ ls
songone.mp3 songthree.mp3 songtwo.mp3
This uses parameter expansion: ${fname##pattern} removes the longest possible match from the beginning of fname. As the pattern, we use *([-_[:digit:]]), where *(pattern) stands for "zero or more matches of pattern", and the actual pattern is a bracket expression for hyphens, underscores and digits.
Remarks:
The -- after mv indicates the end of options for mv and makes sure that filenames starting with - aren't interpreted as options.
The *() expression requires the extglob shell option. As pointed out, if you don't want extended globs later, you have to unset it again with shopt -u extglob.
As per Gordon Davisson's comment: this will clobber files if you have, for example, something like 1file.mp3 and 2file.mp3. To avoid that, you can either use mv -i (or --interactive), which will prompt you before overwriting a file, or mv -n (or --no-clobber), which will just not overwrite any files.
triplee points out that this needlessly moves files onto themselves if they don't start with a hyphen, underscore or digit. To avoid that, we can iterate only over matching files with
for fname in [-_[:digit:]]*.mp3; do mv -- "$fname" "${fname##*([-_[:digit:]])}"; done
which makes sure that there is something to rename.
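The remarks above can be combined into one sketch: the prefix-only glob, the extglob expansion, and mv -n. It also adds a check, not in the answer, that skips names which would become hidden files (for example 123.mp3 turning into .mp3). strip_prefix and rename_mp3s are hypothetical names; extglob has to be enabled before the functions are defined.

```shell
#!/usr/bin/env bash
# extglob must be on before *(...) below is parsed.
shopt -s extglob

# Print NAME with its leading run of digits, '-' and '_' removed.
strip_prefix() {
    printf '%s\n' "${1##*([-_[:digit:]])}"
}

# Rename matching .mp3 files in the current directory (defined, not called).
rename_mp3s() {
    local f new
    for f in [-_[:digit:]]*.mp3; do
        [[ -e $f ]] || continue       # the glob matched nothing
        new=$(strip_prefix "$f")
        [[ $new == .* ]] && continue  # would become a hidden file: skip
        mv -n -- "$f" "$new"          # -n: never overwrite an existing file
    done
}
```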
Benjamin W.'s answer is helpful and efficient, but has two drawbacks:
It requires setting global shell option extglob, which should be restored to its previous value afterward (the alternative, at the cost of creating an extra process, is to use a subshell: (shopt -s extglob; for fname ...)).
The extglob syntax, an extension to regular glob syntax, is familiar to few people and still less powerful than true regular expressions.
Using Bash's regex-matching operator, =~:
for f in *.mp3; do [[ $f =~ ^[0-9_-]+(.+)$ ]] && echo mv "$f" "${BASH_REMATCH[1]}"; done
Remove the echo to perform actual renaming.
$f =~ ^[0-9_-]+(.+)$ matches the longest nonempty sequence of digits, hyphens, and underscores at the start of the filename, followed by any nonempty sequence of characters captured in a parenthesized subexpression (capture group).
If the match succeeds (&&), the mv command is invoked, with the captured subexpression, accessible as element 1 of the special Bash array variable BASH_REMATCH (${BASH_REMATCH[1]}), forming the target filename.
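A small variation, sketched under the assumption of bash 3.2 or later: keeping the regex in a variable avoids quoting surprises on the right-hand side of =~ (a quoted right-hand side is matched as a literal string), and the else branch makes the non-matching case explicit. strip_prefix_re is a hypothetical name.

```shell
#!/usr/bin/env bash
# Print NAME without its leading run of digits, '_' and '-', or unchanged
# if it does not start with one of those characters.
strip_prefix_re() {
    local re='^[0-9_-]+(.+)$'
    if [[ $1 =~ $re ]]; then
        printf '%s\n' "${BASH_REMATCH[1]}"   # capture group 1: the rest
    else
        printf '%s\n' "$1"
    fi
}
```

In the loop, this would be used as new=$(strip_prefix_re "$f") followed by mv -n -- "$f" "$new" when the two differ.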
You can also do it this way:
find . -type f -name "*.mp3" -print0 | while IFS= read -r -d '' line
do
mv "$line" "$( sed -E 's!(.*)/[^[:alpha:]]*([[:alpha:]].*mp3)$!\1/\2!' <<<"$line")" 2>/dev/null
done
Using sed gives you more control over the regex, I guess. Also, the 2>/dev/null is for ignoring the mv error for already converted/correct filenames.
Note:
This will recursively change the filenames across subfolders too.

Understanding sed expression 's/^\.\///g'

I'm studying Bash programming and I found this example, but I don't understand what it means:
filtered_files=`echo "$files" | sed -e 's/^\.\///g'`
In particular the argument passed to sed after '-e'.
It's a bad example; you shouldn't follow it.
First, understanding the sed expression at hand.
s/pattern/replacement/flags is a sed command, described in detail in man sed. In this case, pattern is a regular expression; replacement is what that pattern gets replaced with when/where found; and flags describe details about how that replacement should be done.
In this case, the s/^\.\///g breaks down as follows:
s is the sed command being run.
/ is the sigil used to separate the sections of this command. (Any character can be used as a sigil, and the person who chose to use / for this expression was, to be charitable, not thinking about what they were doing very hard).
^\.\/ is the pattern to be replaced. The ^ means that this replaces anything only at the beginning; \. matches only a period, vs . (which is regex for matching any character); and \/ matches only a / (vs /, which would go on to the next section of this sed command, being the selected sigil).
The next section is an empty string, which is why there's no content between the two following sigils.
g in the flags section indicates that more than one replacement can happen on each line. In conjunction with ^, this has no meaning, since there can only be one beginning-of-the-line per line; further evidence that the person who wrote your example wasn't thinking much.
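To see the breakdown above in action, here is a small sketch that runs the original expression and a '#'-delimited version without g on a made-up, newline-separated list. Only a ./ at the very start of each line is removed; the ./ in the middle of dir/./b.txt survives, and dropping g changes nothing.

```shell
#!/usr/bin/env bash
# Sample newline-separated list (made up for illustration).
files=$'./a.txt\n./dir/./b.txt\nc.txt'

# Original expression: ^ anchors each match to the start of a line.
with_g=$(printf '%s\n' "$files" | sed -e 's/^\.\///g')
# Same substitution with '#' as the delimiter and no g flag.
without_g=$(printf '%s\n' "$files" | sed -e 's#^[.]/##')

printf '%s\n' "$with_g"
```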
Using the same data structures, doing it better:
All of the below are buggy when handling arbitrary filenames, because storing arbitrary filenames in scalar variables is buggy in general.
Still using sed:
# Use printf instead of echo to avoid bugginess if your "files" string is "-n" or "-e"
# Use "#" as your sigil to avoid needing to backslash-escape all the "/"s
filtered_files=$(printf '%s\n' "$files" | sed -e 's#^[.]/##g')
Replacing sed with a bash builtin:
# This is much faster than shelling out to any external tool
filtered_files=${files//.\//}
Using better data structures
Instead of running
files=$(find .)
...instead:
files=( )
while IFS= read -r -d '' filename; do
files+=( "$filename" )
done < <(find . -print0)
That stores files in an array; it looks complex, but it's far safer -- works correctly even with filenames containing spaces, quote characters, newline literals, etc.
Also, this means you can do the following:
# Remove the leading ./ from each name; don't remove ./ at any other position in a name
filtered_files=( "${files[@]#./}" )
This means that a file named
./foo/this directory name (which has spaces) ends with a period./bar
will correctly be transformed to
foo/this directory name (which has spaces) ends with a period./bar
rather than
foo/this directory name (which has spaces) ends with a periodbar
...which would have happened with the original approach.
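The array approach above can be wrapped into a self-contained function as a sketch. list_relative is a hypothetical name, and -mindepth 1 (supported by GNU and BSD find, though not by POSIX) just keeps "." itself out of the list. Printing one name per line is for display only; the array itself stays safe for any filename.

```shell
#!/usr/bin/env bash
# list_relative DIR: print every path below DIR with the leading "./" removed.
list_relative() {
    local -a files=()
    local filename
    # -print0 / read -d '' keep spaces, quotes and newlines inside each element.
    while IFS= read -r -d '' filename; do
        files+=( "$filename" )
    done < <(cd -- "$1" && find . -mindepth 1 -print0)
    # "${files[@]#./}" strips "./" only from the start of each element.
    (( ${#files[@]} )) && printf '%s\n' "${files[@]#./}"
}
```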
man sed. In particular:
-e script, --expression=script
add the script to the commands to be executed
And:
s/regexp/replacement/
Attempt to match regexp against the pattern space. If successful, replace that portion matched with replacement. The replacement may contain the special character & to refer to that portion of the pattern space which matched, and the special escapes \1 through \9 to refer to the corresponding matching sub-expressions in the regexp.
In this case, it replaces any occurrence of ./ at the beginning of a line with the empty string, in other words removing it.

How to rename a bunch of files with a specific pattern

I want to rename the files in a directory which are named with this pattern:
string1-number.html
for example:
English-5.html
What I want to do is rename the files like this:
string2-number.string3
for example:
Dictionary-5.en
How can I do this?
I used this script, but nothing happened:
echo "English-5.html" | sed 's%\({English}\).\(\.*\)\(html\)%dictionary\2\en%'
I would suggest using the mmv tool: http://linux.dsplabs.com.au/mmv-copy-append-link-move-multiple-files-under-linux-shell-bash-by-wildcard-patterns-p5/
With that you can do:
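mmv '*-*.html' 'Dictionary-#2.en'
(The patterns are quoted so that the shell passes them to mmv instead of expanding them itself.)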
echo "English-5.html" | sed 's%English\(-[0-9][0-9]*\.\)html%dictionary\1en%'
Explanation:
I'm looking for English
followed by a dash, one or more digits, and a literal dot: -[0-9][0-9]*\. (I surround this part with escaped parentheses to make it a group, group 1).
followed by html
In the replacement text, I use \1 to output the contents of group 1, along with the changed text.
You have 2 errors: The {...} is not required, and you confused \. and .
\. matches a literal dot, while . matches a single character.
echo "English-5.html" |
sed 's%\(English\)\(.*\)\.\(html\)%dictionary\2.en%'
This answer shows some minor optimizations for sed commands already posted and shows how to actually rename the files (in the current folder):
# \+ is a GNU sed extension; with POSIX sed use \{1,\} instead
for f in English-*.html; do
mv -- "$f" "$(echo "$f" | sed 's/^English-\([0-9]\+\)\.html$/dictionary-\1.en/')"
done
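A pure-Bash alternative for this rename can be sketched with parameter expansion instead of sed: it strips the English- prefix with ${f#...} and the .html suffix with ${f%...}, and only renames when what remains is all digits. It produces Dictionary-N.en as the question asked. dict_name and rename_dict are hypothetical names, and rename_dict is only defined, not run.

```shell
#!/usr/bin/env bash
# dict_name NAME: English-N.html -> Dictionary-N.en; anything else unchanged.
dict_name() {
    local f=$1 num
    case $f in
        English-*.html)
            num=${f#English-}     # drop the prefix
            num=${num%.html}      # drop the suffix
            if [[ $num =~ ^[0-9]+$ ]]; then
                printf 'Dictionary-%s.en\n' "$num"
                return
            fi ;;
    esac
    printf '%s\n' "$f"
}

# Rename matching files in the current directory (defined here, not called).
rename_dict() {
    local f new
    for f in English-*.html; do
        [[ -e $f ]] || continue           # the glob matched nothing
        new=$(dict_name "$f")
        [[ $new != "$f" ]] || continue
        mv -n -- "$f" "$new"              # -n: never overwrite
    done
}
```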
