basename command confusion - linux

Given the following command:
$(basename "/this-directory-does-not-exist/*.txt" ".txt")
it outputs not only txt files but other files as well. On the other hand if I change ".txt" to something like "gobble de gook" it returns:
*.txt
I'm confused with regard to why it returns the other extension types.

Your problem doesn't stem from basename, but from inadvertent use of the shell's pathname expansion (globbing) feature due to lack of quoting:
If you use the result of your command substitution ($(...)) unquoted:
$ echo $(basename "/this-directory-does-not-exist/*.txt" ".txt")
you effectively execute the following:
$ echo * # unquoted '*' expands to all files and folders in the current dir
because basename "/this-directory-does-not-exist/*.txt" ".txt" returns literal * (it strips the extension from filename *.txt;
the reason that the filename pattern *.txt didn't expand to an actual filename is that the shell leaves globbing patterns that don't match anything unmodified (by default).)
If you double-quote the command substitution, the problem goes away:
$ echo "$(basename "/this-directory-does-not-exist/*.txt" ".txt")" # -> *
However, even with this problem resolved, your basename command will only work correctly if the glob expands to one matching file, because the syntax form you're using only supports one filename argument.
GNU basename and BSD basename support the non-POSIX -s option, which allows for multiple file operands from which to strip the extension:
basename -s .txt "/some-dir/"*.txt
Assuming you use bash, you can put it all together robustly as follows:
#!/usr/bin/env bash
names=() # initialize result array
files=( *.txt ) # perform globbing and capture matching paths in an array
# Since the shell by default returns a pattern as-is if there are no matches,
# we test the first array item for existence; if it refers to an existing
# file or dir., we know that at least 1 match was found.
if [[ -e ${files[0]} ]]; then
# Apply the `basename` command with suffix-stripping to all matches
# and read the results robustly into an array.
# Note that just `names=( $(basename ...) )` would NOT work robustly.
readarray -t names < <(basename -s '.txt' "${files[@]}")
# Note: `readarray` requires Bash 4; in Bash 3.x, use the following:
# IFS=$'\n' read -r -d '' -a names < <(basename -s '.txt' "${files[@]}")
fi
# "${names[@]}" now contains an array of suffix-stripped basenames,
# or is empty, if no files matched.
printf '%s\n' "${names[@]}" # print names line by line
Note: The -e test comes with a tiny caveat: if there are matches and the first match is a broken symlink, the test will mistakenly conclude that there are no matches.
A more robust option is to use shopt -s nullglob to make the shell expand non-matching globs to the empty string, but note that this is a shell-global option, and it is good practice to return it to its previous value afterward, which makes that approach more cumbersome.
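One way to avoid having to restore nullglob is to enable it inside a function whose body runs in a subshell, so the option never leaks into the caller. The following is a minimal sketch, assuming bash and a basename that supports -s; the function name stripped_names is made up for illustration:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the suffix-stripped basenames of all files
# in directory $1 whose names end in suffix $2, one per line.
# The body is wrapped in ( ... ), so nullglob is only set in a subshell.
stripped_names() (
  shopt -s nullglob
  files=( "$1"/*"$2" )
  # With nullglob on, a non-matching glob yields an empty array.
  (( ${#files[@]} > 0 )) || exit 0
  basename -s "$2" "${files[@]}"
)
```

It could then be combined with the array-reading technique from the script above, for example readarray -t names < <(stripped_names /some-dir .txt). The price is one extra process per call.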

Try putting quotes around the whole thing. What you are seeing is globbing: the output of your command is *, which the shell then expands to all files in the current directory. This does not happen inside single or double quotes.

Related

How to replace date part in filename with current date

How can I replace only the date part with the current date for all files present in a directory in Unix?
Folder path: C:/shan
Sample files:
CN_Apria_837p_20180924.txt
DN_Apria_837p_20150502.txt
GN_Apria_837p_20160502.txt
CH_Apria_837p_20170502.txt
CU_Apria_837p_20180502.txt
PN_Apria_837p_20140502.txt
CN_Apria_837p_20101502.txt
Desired result should be:
CN_Apria_837p_20190502.txt
DN_Apria_837p_20190502.txt
GN_Apria_837p_20190502.txt
CH_Apria_837p_20190502.txt
CU_Apria_837p_20190502.txt
PN_Apria_837p_20190502.txt
CN_Apria_837p_20190502.txt
Edit:
I'm completely new to Unix shell scripting. I tried the script below, but it's not working.
#!/bin/bash
for i in `ls $1 | grep -E '[0-9]{4}-[0-9]{2}-[0-9]{2}'`
do
x=`echo $i | grep -oE '[0-9]{4}-[0-9]{2}-[0-9]{2}'`
y=`echo $i | sed "s/$x/$(date +%F)/g"`
mv $1/$i $1/$y 2>/dev/null #incase if old date is same as current date
done
I would use regular expressions here. From the bash man-page:
An additional binary operator, =~, is available, with the same
precedence as == and !=. When it is used, the string to the right
of the operator is considered an extended regular expression and
matched accordingly (as in regex(3)). The return value is 0 if the
string matches the pattern, and 1 otherwise. .... Substrings
matched by parenthesized subexpressions within the regular
expression are saved in the array variable BASH_REMATCH. ...
The element of BASH_REMATCH with index n is the portion of the
string matching the nth parenthesized sub-expression.
Hence, assuming that the variable x holds the name of one of the files
in question, the code
if [[ $x =~ ^(.*_)[0-9]+([.]txt$) ]]
then
mv "$x" "${BASH_REMATCH[1]}$(date +%Y%m%d)${BASH_REMATCH[2]}"
fi
first tests roughly whether the file indeed follows the required naming scheme, and then modifies the name accordingly.
Of course in practice, you will tailor the regexp to match your application better. Only you can know what variations in the file name are permitted.
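Here is a sketch of how that test might be wrapped in a loop over a directory. The function name redate and its single directory argument are assumptions for illustration. It skips a rename when the target already exists, since two of the sample names differ only in their date and would collapse to the same target.

```shell
#!/usr/bin/env bash
# Hypothetical helper: for every *.txt file in directory $1, replace the
# digit run just before ".txt" with today's date (YYYYMMDD).
redate() {
  local dir=$1 today path name target
  today=$(date +%Y%m%d)
  for path in "$dir"/*.txt; do
    name=${path##*/}
    if [[ $name =~ ^(.*_)[0-9]+([.]txt)$ ]]; then
      target=$dir/${BASH_REMATCH[1]}${today}${BASH_REMATCH[2]}
      if [[ -e $target ]]; then
        # Do not overwrite an existing file
        printf 'skipping %s: %s exists\n' "$path" "$target" >&2
      else
        mv -- "$path" "$target"
      fi
    fi
  done
}
```

Calling redate /path/to/files would then rename the matching files in that directory only; files that do not follow the naming scheme are left alone.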
The below should do this
for f in $(find /path/to/files -name "*_*_*_*.txt")
do
newname=$(echo "$f" | sed -r "s/[12][0-9]{3}[01][0-9][0-3][0-9]/$(date '+%Y%m%d')/g")
mv "$f" "$newname"
done
Try this Shellcheck-clean code:
#! /bin/bash -p
readonly dir=$1
shopt -s nullglob # Make glob patterns that match nothing expand to nothing
readonly dateglob='20[0-9][0-9][0-9][0-9][0-9][0-9]'
currdate=$(date '+%Y%m%d')
# shellcheck disable=SC2231
for path in "$dir"/*_${dateglob}.* ; do
name=${path##*/}
newname=${name/_${dateglob}./_${currdate}.}
if [[ $newname != "$name" ]] ; then
newpath="$dir/$newname"
printf "%q -> %q\\n" "$path" "$newpath"
mv -i -- "$path" "$newpath"
fi
done
shopt -s nullglob stops the code trying to process a garbage path if nothing matches the glob pattern in for path in ....
The pattern assigned to dateglob assumes that you will not have to process dates before 2000 (or after 2099!). Change it if that assumption is not valid.
The # shellcheck ... line is to prevent Shellcheck warning about the use of ${dateglob} without quotes. The quotes would be wrong in this case because they would prevent the glob pattern being expanded.
The pattern used to match filenames (*_${dateglob}.*) will match many more forms of filename than the examples given (e.g. A_20180313.tar.gz). You might want to change it.
See Removing part of a string (BashFAQ/100 (How do I do string manipulation in bash?)) for information about the Bash string manipulation mechanisms used (${path##...}, ${name/...}).
I've added a printf to output details of what is being moved.
The -i option to mv prompts for confirmation if a file would be overwritten. This turns out to be an issue for the example files because both CN_Apria_837p_20180924.txt and CN_Apria_837p_20101502.txt are identical except for the date, so the code tries to rename them to the same thing.
If any of the files with dates in their names have names beginning with '.', the code will not process them. Add line shopt -s dotglob somewhere before the loop if that is an issue.

How to delete numbers, dashes and underscores in the beginning of a file name

I have thousands of mp3 files but all with unusual file names such as 1-2songone.mp3, 2songtwo.mp3, 2_2_3_songthree.mp3. I want to remove all the numbers, dashes and underscores in the beginning of these files and get the result:
songone.mp3
songtwo.mp3
songthree.mp3
This can be done using extended globbing:
$ ls
1-2songone.mp3 2_2_3_songthree.mp3 2songtwo.mp3
$ shopt -s extglob
$ for fname in *.mp3; do mv -- "$fname" "${fname##*([-_[:digit:]])}"; done
$ ls
songone.mp3 songthree.mp3 songtwo.mp3
This uses parameter expansion: ${fname##pattern} removes the longest possible match from the beginning of fname. As the pattern, we use *([-_[:digit:]]), where *(pattern) stands for "zero or more matches of pattern", and the actual pattern is a bracket expression for hyphens, underscores and digits.
Remarks:
The -- after mv indicates the end of options for mv and makes sure that filenames starting with - aren't interpreted as options.
The *() expression requires the extglob shell option. As pointed out, if you don't want extended globs later, you have to unset it again with shopt -u extglob.
As per Gordon Davisson's comment: this will clobber files if you have, for example, something like 1file.mp3 and 2file.mp3. To avoid that, you can either use mv -i (or --interactive), which will prompt you before overwriting a file, or mv -n (or --no-clobber), which will just not overwrite any files.
triplee points out that this needlessly moves files onto themselves if they don't start with a hyphen, underscore or digit. To avoid that, we can iterate only over matching files with
for fname in [-_[:digit:]]*.mp3; do mv -- "$fname" "${fname##*([-_[:digit:]])}"; done
which makes sure that there is something to rename.
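To check what the expansion produces before touching any files, a dry run along these lines may help. It is only a sketch: the function name strip_prefix and the extra sample name plain.mp3 are made up for illustration, and nothing is renamed.

```shell
#!/usr/bin/env bash
shopt -s extglob   # needed for the *( ... ) pattern below

# Hypothetical helper: print $1 with any leading digits, hyphens and
# underscores removed.
strip_prefix() {
  printf '%s\n' "${1##*([-_[:digit:]])}"
}

# Dry run over the sample names from the question; no files are touched.
for fname in 1-2songone.mp3 2songtwo.mp3 2_2_3_songthree.mp3 plain.mp3; do
  printf '%s -> %s\n' "$fname" "$(strip_prefix "$fname")"
done
```

The last name shows the case triplee mentions: a name without such a prefix comes back unchanged.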
Benjamin W.'s answer is helpful and efficient, but has two drawbacks:
It requires setting global shell option extglob, which should be restored to its previous value afterward (the alternative, at the cost of creating an extra process, is to use a subshell: (shopt -s extglob; for fname ...)).
The extglob syntax, an extension to regular glob syntax, is familiar to few people and still less powerful than true regular expressions.
Using Bash's regex-matching operator, =~:
for f in *.mp3; do [[ $f =~ ^[0-9_-]+(.+)$ ]] && echo mv "$f" "${BASH_REMATCH[1]}"; done
Remove the echo to perform actual renaming.
$f =~ ^[0-9_-]+(.+)$ matches the longest nonempty sequence of digits, hyphens, and underscores at the start of the filename, followed by any nonempty sequence of characters captured in a parenthesized subexpression (capture group).
If the match succeeds (&&), the mv command is invoked, with the captured subexpression - accessible via element 1 of special BASH array variable ${BASH_REMATCH[@]} - forming the target filename.
You may do it this way too :
find . -type f -name "*.mp3" -print0 | while read -r -d '' line
do
mv "$line" "$( sed -E 's!(.*)/[^[:alpha:]]*([[:alpha:]].*mp3)$!\1/\2!' <<<"$line")" 2>/dev/null
done
Using sed gives you more control over the regex, I guess. Also, the 2>/dev/null is for ignoring the mv error for already converted/correct filenames.
Note:
This will recursively change the filenames across subfolders too.

Bash and Variable Substitution for file with space in their name: application for gpsbabel

I am trying to write a script to run gpsbabel. I am stuck on handling files whose names contain (white) spaces.
My problem is in the bash syntax. Any help or insight from bash programmers will be much appreciated.
gpsbabel is software that permits merging tracks recorded by GPS devices.
The syntax for my purpose and which is working is:
gpsbabel -i gpx -f "file 1.gpx" -f "file 2.gpx" -o gpx -F output.gpx -x track,merge
The input format of the GPS data is given by -i , the output format by -o.
The input data files are listed after -f, and the resulting file after -F
(ref. gpsbabel manual, see example 4.9)
I am trying to write a script that runs this command with a number of input files that is not known in advance. That means the sequence -f "name_of_the_input_file" has to be repeated for each input file passed as a script parameter.
Here is a script that works for files with no spaces in their names:
#!/bin/bash
# Append multiple gpx files easily
# batch name merge_gpx.sh
# Usage:
# merge_gpx.sh track_*.gpx
gpsbabel -i gpx $(echo $* | for GPX; do echo -n " -f $GPX "; done) \
-o gpx -F appended.gpx
So I tried to modify this script to also handle filenames containing spaces.
I got lost in the bash substitution and wrote a more step-by-step script for debugging purposes, with no success.
Here is one of my attempts.
I get the error "Extra arguments on command line" from gpsbabel, suggesting that I made a mistake in the variable usage.
#/bin/bash
# Merging all tracks in a single one
old_IFS=$IFS # Backup internal separator
IFS=$'\n' # New IFS
let i=0
echo " Merging GPX files"
for file in $(ls -1 "$@")
do
let i++
echo "i=" $i "," "$file"
tGPX[$i]=$file
done
IFS=$old_IFS #
#
echo "Number of files:" ${#tGPX[@]}
echo
# List of the datafile to treat (each name protected with a ')
LISTE=$(for (( ifile=1; ifile<=${#tGPX[@]} ; ifile++)) ;do echo -ne " -f '""${tGPX[$ifile]}""'"; done)
echo "LISTE: " $(echo -n $LISTE)
echo "++Merging .."
if (( $i>=1 )); then
gpsbabel -t \
-i gpx $(echo -n $LISTE) \
-x track,merge,title="TEST COMPIL" \
-o gpx -F track_compil.gpx
else
echo "Wrong selection of input file"
fi
#end
You are making things way more complicated for yourself than they need to be.
Any reasonably posix/gnu-compatible utility which takes an option in the form of two command-line arguments (-f STRING, or equivalently -f FILENAME) should also accept a single command-line argument -fSTRING. If the utility uses either getopt or getopt_long, this is automatic. gpsbabel appears to not use standard posix or gnu libraries for argument parsing, but I believe it still gets this right.
Apparently, your script expects its arguments to be a list of filenames; presumably, if the filenames include whitespace, you will quote the names which include whitespace:
./myscript "file 1.gpx" "file 2.gpx"
In that case, you only need to change the list of arguments by prepending -f to each one, so that the argument list becomes, in effect:
"-ffile 1.gpx" "-ffile 2.gpx"
That's extremely straightforward. We'll use the bash-specific find-and-replace syntax, described in the bash manual: (I highlighted the two features this solution uses)
${parameter/pattern/string}
Pattern substitution. The pattern is expanded to produce a pattern just as in pathname expansion. Parameter is expanded and the longest match of pattern against its value is replaced with string. If pattern begins with /, all matches of pattern are replaced with string. Normally only the first match is replaced. If pattern begins with #, it must match at the beginning of the expanded value of parameter. If pattern begins with %, it must match at the end of the expanded value of parameter. If string is null, matches of pattern are deleted and the / following pattern may be omitted. If parameter is @ or *, the substitution operation is applied to each positional parameter in turn, and the expansion is the resultant list. If parameter is an array variable subscripted with @ or *, the substitution operation is applied to each member of the array in turn, and the expansion is the resultant list.
So, "${@/#/-f}" is the list of arguments (@), with the empty pattern at the beginning (#) replaced with -f:
#!/bin/bash
# Merging all tracks in a single one
# $# is the number of arguments to the script.
if (( $# > 0 )); then
gpsbabel -t \
-i gpx "${@/#/-f}" \
-x track,merge,title="TEST COMPIL" \
-o gpx -F track_compil.gpx
else
# I changed the error message to make it more clear, sent it to stderr
# and cause the script to fail.
echo "No input files specified" >> /dev/stderr
exit 1
fi
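To see what the substitution yields without involving gpsbabel, the expansion can be printed one word per line. This is only an illustrative sketch; the function name show_prefixed is made up.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print each argument after prefixing it with -f,
# one per line, wrapped in <...> so that word boundaries are visible.
show_prefixed() {
  printf '<%s>\n' "${@/#/-f}"
}

show_prefixed "file 1.gpx" "file 2.gpx"
# Prints:
# <-ffile 1.gpx>
# <-ffile 2.gpx>
```

Each filename stays a single word despite the embedded space, which is what gpsbabel needs to receive.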
Use an array:
files=()
for f; do
files+=(-f "$f")
done
gpsbabel -i gpx "${files[@]}" -o gpx -F appended.gpx
for f; do is short for for f in "$@"; do; most often you want to use $@ to access the command-line arguments instead of $*. Quoting "${files[@]}" produces a list of words, one per element, that are treated as if they were quoted, so each array element containing whitespace is treated as a single word.
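The same loop can be wrapped in a function and its result inspected before it is handed to gpsbabel. A small sketch; the names build_gpx_args and gpx_args are assumptions for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: fill the global array gpx_args with one
# "-f" "<name>" pair per argument.
build_gpx_args() {
  gpx_args=()
  local f
  for f; do
    gpx_args+=(-f "$f")
  done
}

build_gpx_args "file 1.gpx" "file 2.gpx"
printf '<%s>\n' "${gpx_args[@]}"
# Prints:
# <-f>
# <file 1.gpx>
# <-f>
# <file 2.gpx>
```

Unlike the -fSTRING form, this keeps -f and the filename as separate words, so it does not depend on how the utility parses combined options.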

How to remove the extension of a file?

I have a folder that is full of .bak files and some other files also. I need to remove the extension of all .bak files in that folder. How do I make a command which will accept a folder name and then remove the extension of all .bak files in that folder ?
Thanks.
To remove a string from the end of a BASH variable, use the ${var%ending} syntax. It's one of a number of string manipulations available to you in BASH.
Use it like this:
# Run in the same directory as the files
for FILENAME in *.bak; do mv "$FILENAME" "${FILENAME%.bak}"; done
That works nicely as a one-liner, but you could also wrap it as a script to work in an arbitrary directory:
# If we're passed a parameter, cd into that directory. Otherwise, do nothing.
if [ -n "$1" ]; then
cd "$1"
fi
for FILENAME in *.bak; do mv "$FILENAME" "${FILENAME%.bak}"; done
Note that the quotes around "$FILENAME" are what keep this safe: the glob in for FILENAME in *.bak yields each matching name as a single word, even if it contains spaces. If nothing matches, however, the loop runs once with the literal string *.bak. Read David W.'s answer for a more-robust solution, and this document for alternative solutions.
There are several ways to remove file suffixes:
In BASH and Kornshell, you can use parameter expansion to filter a variable's value. Search for ${parameter%word} in the BASH manpage for complete information. Basically, # is a left filter and % is a right filter. You can remember this because # is to the left of % on a US keyboard.
If you use a double filter (i.e. ## or %%), you are trying to filter on the biggest match. If you have a single filter (i.e. # or %), you are trying to filter on the smallest match.
What matches is filtered out and you get the rest of the string:
file="this/is/my/file/name.txt"
echo "${file#*/}" # Matches "this/" and will print out "is/my/file/name.txt"
echo "${file##*/}" # Matches "this/is/my/file/" and will print out "name.txt"
echo "${file%/*}" # Matches "/name.txt" and will print out "this/is/my/file"
echo "${file%%/*}" # Matches "/is/my/file/name.txt" and will print out "this"
Notice this is a glob match and not a regular expression match! If you want to remove a file suffix:
file_sans_ext=${file%.*}
The .* will match on the period and all characters after it. Since it is a single %, it will match on the smallest glob on the right side of the string. If the filter can't match anything, the result is the same as your original string.
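A few sample values make the shortest-match behavior concrete. This is a sketch with a made-up function name, strip_ext:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print $1 without its last extension.
strip_ext() {
  printf '%s\n' "${1%.*}"
}

strip_ext "this/is/my/file/name.txt"   # this/is/my/file/name
strip_ext "archive.tar.gz"             # archive.tar
strip_ext "README"                     # README (no match, unchanged)
```

The pattern is not path-aware: for a value such as dir.d/file, the shortest match of .* is .d/file, so it is worth stripping the directory first with ${var##*/} when directory names may contain dots.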
You can verify a file suffix with something like this:
if [ "${file}" != "${file%.bak}" ]
then
echo "$file is a type '.bak' file"
else
echo "$file is not a type '.bak' file"
fi
Or you could do this:
file_suffix=${file##*.}
echo "My file is a file '.$file_suffix'"
Note that this will remove the period of the file extension.
Next, we will loop:
find . -name "*.bak" -print0 | while read -d $'\0' file
do
echo "mv '$file' '${file%.bak}'"
done | tee find.out
The find command finds the files you specify. The -print0 separates out the names of the files with a NUL symbol -- which is one of the few characters not allowed in a file name. The -d $'\0' means that your input separators are NUL symbols. See how nicely find -print0 and read -d $'\0' work together?
You should almost never use the for file in $(ls *.bak) method, that is, looping over the output of a command substitution. This will fail if the files have any white space in the name.
Notice that this command doesn't actually move any files. Instead, it produces a find.out file with a list of all the file renames. You should always do something like this when you do commands that operate on massive amounts of files just to be sure everything is fine.
Once you've determined that all the commands in find.out are correct, you can run it like a shell script:
$ bash find.out
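If the filenames might contain quotes or other shell-special characters, the generated commands can be made safer to replay by letting bash's printf %q do the quoting. A sketch under that assumption; the function name plan_renames is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print one shell-quoted "mv" command per .bak file
# found under directory $1, without renaming anything.
plan_renames() {
  local file
  find "$1" -name '*.bak' -print0 |
    while IFS= read -r -d '' file; do
      # %q quotes each name so the line can be re-read by bash as-is
      printf 'mv -- %q %q\n' "$file" "${file%.bak}"
    done
}
```

For example, plan_renames . > find.out writes the plan, which can be reviewed and then run with bash find.out as described above. IFS= and -r keep read from trimming whitespace or interpreting backslashes in the names.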
rename .bak '' *.bak
(rename is in the util-linux package)
Caveat: there is no error checking:
#!/bin/bash
cd "$1"
for i in *.bak ; do mv -f "$i" "${i%%.bak}" ; done
You can always use the find command to get all the subdirectories
for FILENAME in `find . -name "*.bak"`; do mv --force "$FILENAME" "${FILENAME%.bak}"; done

Shell Script: Truncating String

I have two folders full of trainings and corresponding testfiles and I'd like to run the fitting pairs against each other using a shell script.
This is what I have so far:
for x in SpanishLS.train/*.train
do
timbl -f $x -t SpanishLS.test/$x.test
done
This is supposed to take file1(-n).train in one directory, look for file1(-n).test in the other, and run them through a tool called timbl.
What it does instead is look for a file called SpanishLS.train/file1(-n).train.test which of course doesn't exist.
What I tried to do, to no avail, is truncate $x in a way that lets the script find the correct file, but whenever I do this, $x is truncated way too early, resulting in the script not even finding the .train file.
How should I code this?
If I got you right, this will do the job:
for x in SpanishLS.train/*.train
do
y=${x##*/} # strip basepath
y=${y%.*} # strip extension
timbl -f $x -t SpanishLS.test/$y.test
done
Use basename:
for x in SpanishLS.train/*.train
do
timbl -f $x -t SpanishLS.test/$(basename "$x" .train).test
done
That removes the directory prefix and the .train suffix from $x, and builds up the name you want.
In bash (and other POSIX-compliant shells), you can do the basename operation with two shell parameter expansions without invoking an external program. (I don't think there's a way to combine the two expansions into one.)
for x in SpanishLS.train/*.train
do
y=${x##*/} # Remove path prefix
timbl -f $x -t SpanishLS.test/${y%.train}.test # Remove .train suffix
done
Beware: bash supports quite a number of (useful) expansions that are not defined by POSIX. For example, ${y//.train/.test} is a bash-only notation (or bash and compatible shells notation).
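For scripts that must run under a plain POSIX sh, the two expansions can be packaged in a small function. A sketch, with a made-up function name test_file_for and the directory name taken from the question:

```shell
#!/bin/sh
# Hypothetical helper: map a training-file path to its test-file path
# using only POSIX parameter expansions.
test_file_for() {
  name=${1##*/}                                 # drop the directory part
  printf 'SpanishLS.test/%s.test\n' "${name%.train}"
}

test_file_for "SpanishLS.train/file1.train"     # SpanishLS.test/file1.test
```

Because only the trailing .train is removed, a name such as my.train.set.train keeps its inner .train, which the bash-only ${y//.train/.test} form would also rewrite.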
Replace all occurrences of .train in the path with .test:
timbl -f "$x" -t "$(echo "$x" | sed 's/\.train/.test/g')"

Resources