Check file extension BASH - linux

I'm trying to compare two files that have the same name and the same contents, but I only want to keep one of them.
Let's say I have the following:
/root/dir1/file1.doc
/root/dir1/file1.pdf
I only want to keep the .doc file. What would be the best way to go about this?

From the dash manpage (and I believe all bash, ksh, and zsh "inherit" these features):
${parameter%word} Remove Smallest Suffix Pattern. The word is expanded to produce a pattern. The parameter expansion then results in parameter, with the smallest portion of the suffix matched by the pattern deleted.
${parameter%%word} Remove Largest Suffix Pattern. The word is expanded to produce a pattern. The parameter expansion then results in parameter, with the largest portion of the suffix matched by the pattern deleted.
${parameter#word} Remove Smallest Prefix Pattern. The word is expanded to produce a pattern. The parameter expansion then results in parameter, with the smallest portion of the prefix matched by the pattern deleted.
${parameter##word} Remove Largest Prefix Pattern. The word is expanded to produce a pattern. The parameter expansion then results in parameter, with the largest portion of the prefix matched by the pattern deleted.
In practice:
shortSuff(){ printf '%s\n' "${1##*.}"; }
#^applies the string op to the $1 positional parameter; ## removes the largest prefix, which leaves the shortest suffix
#`printf '%s\n'` is like `echo` but doesn't break on e.g., hyphen-prefix arguments
file=/root/dir1/file1.doc
[ "$(shortSuff "$file")" = doc ] && echo "Yes, the short suffix is doc"
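To see how the four operators differ, here is a minimal sketch that applies each one to the same value. The sample path and the function names are made up for illustration; a name with two dots is used so that the "smallest" and "largest" matches give different results.

```shell
#!/bin/sh
# Hypothetical sample path with two dots in the file name.
file=/root/dir1/archive.tar.gz

short_suffix_removed() { printf '%s\n' "${1%.*}"; }   # % : drop the smallest suffix matching ".*"
long_suffix_removed()  { printf '%s\n' "${1%%.*}"; }  # %%: drop the largest suffix matching ".*"
short_prefix_removed() { printf '%s\n' "${1#*.}"; }   # # : drop the smallest prefix matching "*."
long_prefix_removed()  { printf '%s\n' "${1##*.}"; }  # ##: drop the largest prefix matching "*."

short_suffix_removed "$file"   # /root/dir1/archive.tar
long_suffix_removed  "$file"   # /root/dir1/archive
short_prefix_removed "$file"   # tar.gz
long_prefix_removed  "$file"   # gz
```

Note that the pattern is a shell glob, not a regular expression, so the "." is a literal period.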

You can write a short script to do this for you:
#!/bin/bash
EXT="${1}"
if [ -z "${EXT}" ]; then EXT=".doc"; fi
TD="$(mktemp -d)"
mv ./*"${EXT}" "${TD}"
rm ./*
mv "${TD}"/* .
rmdir "${TD}"
Assuming your script is called keep_copy.sh, you would call it as follows: keep_copy.sh .doc.
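If deleting everything else in the directory is too aggressive, a narrower variant is possible. The sketch below is an assumption about what the asker wants, not part of the script above: it removes a file only when a sibling with the same stem and the preferred extension exists. The function name remove_duplicates and its argument order are hypothetical, and it compares names only, not contents.

```shell
#!/bin/sh
# remove_duplicates DIR KEEP_EXT DROP_EXT   (hypothetical helper)
# Deletes DIR/name.DROP_EXT only if DIR/name.KEEP_EXT also exists.
remove_duplicates() {
  dir=$1 keep=$2 drop=$3
  for f in "$dir"/*."$drop"; do
    [ -e "$f" ] || continue        # the glob matched nothing
    stem=${f%."$drop"}             # strip the extension that is to be dropped
    if [ -e "$stem.$keep" ]; then
      rm -- "$f"
    fi
  done
}

# Example call (not executed here): remove_duplicates /root/dir1 doc pdf
```

A .pdf without a matching .doc is left alone, which makes the operation safer to run on a directory that holds unrelated files.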

Related

Rename filenames in shellscript substring

I have files which are named as "images123.jpg", "images456.jpg" etc
I would want to mv these files into testfolder folder and rename them accordingly to "123.jpg", "456.jpg" etc.
This is what i tried
for file in *.jpg; do
mv $file testfolder/($file | cut -c7-)
done
The below script should do what you need.
#!/bin/sh
for file in images*.jpg; do
mv -- "${file}" "testfolder/${file#images}"
done
The key part is ${file#images}. That is a bash shell parameter expansion:
${parameter#word}
${parameter##word}
The word is expanded to produce a pattern and matched according to the rules described below (see Pattern Matching). If the pattern matches the beginning of the expanded value of parameter, then the result of the expansion is the expanded value of parameter with the shortest matching pattern (the ‘#’ case) or the longest matching pattern (the ‘##’ case) deleted.
In this particular case it matches and strips images from the start of each file name.
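One edge case the loop does not cover: if no file matches images*.jpg, the shell leaves the pattern unexpanded and mv is called on a file that does not exist. A minimal sketch of a guarded version follows; the function name move_stripped and its parameters are hypothetical, introduced only for illustration.

```shell
#!/bin/sh
# move_stripped SRC_DIR DEST_DIR PREFIX   (hypothetical helper)
# Moves SRC_DIR/PREFIX*.jpg into DEST_DIR with PREFIX removed from each name.
move_stripped() {
  src=$1 dest=$2 prefix=$3
  mkdir -p -- "$dest"
  for path in "$src"/"$prefix"*.jpg; do
    [ -e "$path" ] || continue     # pattern matched no files
    name=${path##*/}               # drop the directory part
    new=${name#"$prefix"}          # drop the literal prefix
    mv -- "$path" "$dest/$new"
  done
}

# Example call (not executed here): move_stripped . testfolder images
```

Quoting "$prefix" inside the expansion makes it match literally, even if the prefix happens to contain glob characters.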

Don't understand bash parameter declaration

I have a problem with reading this parameter below:
I don't understand the purpose of $(basename "$0") or where it comes from.
${BINARY%/*} seems to be trying to get the directory path, but why exactly does it need to be written like this?
DIR_NAME=$(dirname "$0")
FILE_NAME=$(basename "$0")
BINARY=`readlink ${ROOT_DIR}/${DIR_NAME}/${FILE_NAME} -f`
BIN_PATH=${BINARY%/*}
$0 is the pathname of the script being run. So $(dirname "$0") returns the directory of the script, and $(basename "$0") is the filename.
${BINARY%/*} is explained in Shell Parameter Expansion
${parameter%word}
${parameter%%word}
The word is expanded to produce a pattern just as in filename expansion. If the pattern matches a trailing portion of the expanded value of parameter, then the result of the expansion is the value of parameter with the shortest matching pattern (the ‘%’ case) or the longest matching pattern (the ‘%%’ case) deleted.
So this finds the trailing portion of $BINARY that matches /* and removes it, which returns the directory portion. It's equivalent to $(dirname "$BINARY")
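The equivalence with dirname holds for an absolute path such as the one readlink -f returns, but the two differ at the edges. The following sketch compares them on three sample paths; the paths and the helper name strip_last_component are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: the same expansion as ${BINARY%/*}, applied to an argument.
strip_last_component() { printf '%s\n' "${1%/*}"; }

for p in /usr/local/bin/tool tool /tool; do
  printf '%-20s expansion=[%s] dirname=[%s]\n' \
    "$p" "$(strip_last_component "$p")" "$(dirname "$p")"
done
```

For /usr/local/bin/tool both give /usr/local/bin. For tool, which has no slash, the expansion leaves the value unchanged while dirname prints a dot. For /tool the expansion yields an empty string while dirname prints /.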

Understanding sed expression 's/^\.\///g'

I'm studying Bash programming and I find this example but I don't understand what it means:
filtered_files=`echo "$files" | sed -e 's/^\.\///g'`
In particular the argument passed to sed after '-e'.
It's a bad example; you shouldn't follow it.
First, understanding the sed expression at hand.
s/pattern/replacement/flags is a sed command, described in detail in man sed. In this case, pattern is a regular expression; replacement is what that pattern gets replaced with when/where found; and flags describe details about how that replacement should be done.
In this case, the s/^\.\///g breaks down as follows:
s is the sed command being run.
/ is the sigil used to separate the sections of this command. (Any character can be used as a sigil, and the person who chose to use / for this expression was, to be charitable, not thinking about what they were doing very hard).
^\.\/ is the pattern to be replaced. The ^ means that this replaces anything only at the beginning; \. matches only a period, vs . (which is regex for matching any character); and \/ matches only a / (vs /, which would go on to the next section of this sed command, being the selected sigil).
The next section is an empty string, which is why there's no content between the two following sigils.
g in the flags section indicates that more than one replacement can happen each line. In conjunction with ^, this has no meaning, since there can only be one beginning-of-the-line per line; further evidence that the person who wrote your example wasn't thinking much.
Using the same data structures, doing it better:
All of the below are buggy when handling arbitrary filenames, because storing arbitrary filenames in scalar variables is buggy in general.
Still using sed:
# Use printf instead of echo to avoid bugginess if your "files" string is "-n" or "-e"
# Use "#" as your sigil to avoid needing to backslash-escape all the "/"s
filtered_files=$(printf '%s\n' "$files" | sed -e 's#^[.]/##g')
Replacing sed with a bash builtin:
# This is much faster than shelling out to any external tool
# Caveat: this removes every "./" in the string, not only the ones at the start of a line
filtered_files=${files//.\//}
Using better data structures
Instead of running
files=$(find .)
...instead:
files=( )
while IFS= read -r -d '' filename; do
files+=( "$filename" )
done < <(find . -print0)
That stores files in an array; it looks complex, but it's far safer -- works correctly even with filenames containing spaces, quote characters, newline literals, etc.
Also, this means you can do the following:
# Remove the leading ./ from each name; don't remove ./ at any other position in a name
filtered_files=( "${files[@]#./}" )
This means that a file named
./foo/this directory name (which has spaces) ends with a period./bar
will correctly be transformed to
foo/this directory name (which has spaces) ends with a period./bar
rather than
foo/this directory name (which has spaces) ends with a periodbar
...which is what the unanchored ${files//.\//} substitution above would produce.
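The difference between the anchored and unanchored forms can be checked directly. This is a small sketch assuming bash (arrays are not available in plain sh); the two function names are hypothetical, and the file name is a shortened version of the example above.

```shell
#!/usr/bin/env bash
# Hypothetical helpers that apply each expansion to every argument.
strip_leading() {
  local -a names=( "$@" )
  printf '%s\n' "${names[@]#./}"      # anchored: only a leading "./" is removed
}
strip_everywhere() {
  local -a names=( "$@" )
  printf '%s\n' "${names[@]//.\//}"   # unanchored: every "./" is removed
}

strip_leading    './foo/ends with a period./bar'   # foo/ends with a period./bar
strip_everywhere './foo/ends with a period./bar'   # foo/ends with a periodbar
```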
man sed. In particular:
-e script, --expression=script
add the script to the commands to be executed
And:
s/regexp/replacement/
Attempt to match regexp against the pattern space. If successful,
replace that portion matched with replacement. The
replacement may contain the special character & to refer to that
portion of the pattern space which matched, and the special
escapes \1 through \9 to refer to the corresponding matching
sub-expressions in the regexp.
In this case, it replaces any occurrence of ./ at the beginning of a line with the empty string, in other words removing it.
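A quick way to confirm this behaviour is to run the expression on a value that contains ./ in more than one place. The sketch below wraps the original expression and a simpler equivalent in two hypothetical functions; the sample input is made up.

```shell
#!/bin/sh
# The expression from the question, unchanged.
strip_original() { printf '%s\n' "$1" | sed -e 's/^\.\///g'; }
# Same effect with "#" as the separator and without the g flag.
strip_simpler()  { printf '%s\n' "$1" | sed -e 's#^\./##'; }

strip_original './foo/./bar'   # foo/./bar
strip_simpler  './foo/./bar'   # foo/./bar
```

Only the leading ./ is removed in both cases, because ^ can match only at the start of the line.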

Use sed/grep to get string from tail to a char in middle

I have some strings (variables) i need to edit in the bash for further analysis
They consist of things like
str="~/folder/item"
How can I use sed or grep to grab just "item" (meaning from tail to char '/')?
Refer to Shell Parameter Expansion. You can say:
$ str="~/folder/item"
$ echo ${str##*/}
item
Quoting from the manual:
${parameter##word}
The word is expanded to produce a pattern just as in filename
expansion (see Filename Expansion). If the pattern matches the
beginning of the expanded value of parameter, then the result of the
expansion is the expanded value of parameter with the shortest
matching pattern (the ‘#’ case) or the longest matching pattern (the
‘##’ case) deleted. If parameter is ‘@’ or ‘*’, the pattern removal
operation is applied to each positional parameter in turn, and the
expansion is the resultant list. If parameter is an array variable
subscripted with ‘@’ or ‘*’, the pattern removal operation is applied
to each member of the array in turn, and the expansion is the
resultant list.
Using grep:
$ grep -Po '.*/\K.*' <<< "$str"
item
Assuming you are dealing with a text file containing lines like the one shown, then:
sed 's%.*/\([^/]*\)"%\1%' <<< 'str="~/folder/item"'
This yields:
item
If you are dealing with a variable str that contains a string ~/folder/item, then you can use:
basename "$str"
or:
echo "${str##*/}"
basename "${str}" gives the last part of the string after the final /, including any file extensions you may have. If you want to do the opposite and grab the directory, use dirname "${str}"
echo "$str" | awk -F"/" '{print $NF}'
or
echo "$str" | perl -pe 's/.*\///g'
basename is also suitable for this, for example:
str="~/folder/item"
name=$(basename "$str")
echo "$name"
The output would be "item".
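The parameter expansion and basename agree on the string from the question, but they differ when the value ends in a slash. The following sketch shows the difference; the function names are hypothetical and the second input is an assumed variation on the asker's string.

```shell
#!/bin/sh
last_by_expansion() { printf '%s\n' "${1##*/}"; }   # everything after the final "/"
last_by_basename()  { basename "$1"; }              # external command, ignores a trailing "/"
# Hypothetical variant: strip one trailing slash first, then expand.
last_component()    { p=${1%/}; printf '%s\n' "${p##*/}"; }

last_by_expansion '~/folder/item'    # item
last_by_basename  '~/folder/item'    # item
last_by_expansion '~/folder/item/'   # prints an empty line
last_by_basename  '~/folder/item/'   # item
last_component    '~/folder/item/'   # item
```

last_component handles a single trailing slash only; it is a sketch, not a full replacement for basename.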

Extract file basename without path and extension in bash [duplicate]

Given file names like these:
/the/path/foo.txt
bar.txt
I hope to get:
foo
bar
Why doesn't this work?
#!/bin/bash
fullfile=$1
fname=$(basename $fullfile)
fbname=${fname%.*}
echo $fbname
What's the right way to do it?
You don't have to call the external basename command. Instead, you could use the following commands:
$ s=/the/path/foo.txt
$ echo "${s##*/}"
foo.txt
$ s=${s##*/}
$ echo "${s%.txt}"
foo
$ echo "${s%.*}"
foo
Note that this solution should work in all recent (post 2004) POSIX compliant shells, (e.g. bash, dash, ksh, etc.).
Source: Shell Command Language 2.6.2 Parameter Expansion
More on bash String Manipulations: http://tldp.org/LDP/LG/issue18/bash.html
The basename command has two different invocations; in one, you specify just the path, in which case it gives you the last component, while in the other you also give a suffix that it will remove. So, you can simplify your example code by using the second invocation of basename. Also, be careful to correctly quote things:
fbname=$(basename "$1" .txt)
echo "$fbname"
A combination of basename and cut works fine, even in case of double ending like .tar.gz:
fbname=$(basename "$fullfile" | cut -d. -f1)
It would be interesting to know whether this solution needs less processing power than Bash parameter expansion.
Here are oneliners:
$(basename "${s%.*}")
$(basename "${s}" ".${s##*.}")
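The answers above differ in how they treat names with more than one dot. As a sketch of the two behaviours using only parameter expansion (the function names and the sample paths other than /the/path/foo.txt are assumptions for illustration):

```shell
#!/bin/sh
# Drop the directory, then only the final extension (like ${fname%.*}).
stem_last()  { f=${1##*/}; printf '%s\n' "${f%.*}"; }
# Drop the directory, then everything from the first dot (like cut -d. -f1).
stem_first() { f=${1##*/}; printf '%s\n' "${f%%.*}"; }

stem_last  /the/path/foo.txt     # foo
stem_first /the/path/foo.txt     # foo
stem_last  /tmp/backup.tar.gz    # backup.tar
stem_first /tmp/backup.tar.gz    # backup
```

Both functions return an empty string for a dotfile such as .bashrc, because the whole name matches the ".*" pattern; a script that may see such names needs to treat them separately.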
I needed this, the same as asked by bongbang and w4etwetewtwet.
Pure bash, no basename, no variable juggling. Set a string and echo:
p=/the/path/foo.txt
echo "${p//+(*\/|.*)}"
Output:
foo
Note: the bash extglob option must be "on", (Ubuntu sets extglob "on" by default), if it's not, do:
shopt -s extglob
Walking through the ${p//+(*\/|.*)}:
1. ${p -- start with $p.
2. // substitute every instance of the pattern that follows.
3. +( match one or more of the pattern list in parentheses (i.e. until item 7 below).
4. 1st pattern: *\/ matches anything before a literal "/" char.
5. pattern separator | which in this instance acts like a logical OR.
6. 2nd pattern: .* matches anything after a literal "." -- that is, in bash the "." is just a period char, and not a regex dot.
7. ) end pattern list.
8. } end parameter expansion. With a string substitution, there's usually another / there, followed by a replacement string. But since there's no / there, the matched patterns are substituted with nothing; this deletes the matches.
Relevant man bash background:
pattern substitution:
${parameter/pattern/string}
Pattern substitution. The pattern is expanded to produce a pattern
just as in pathname expansion. Parameter is expanded and
the longest match of pattern against its value is replaced with
string. If pattern begins with /, all matches of pattern are
replaced with string. Normally only the first match is
replaced. If pattern begins with #, it must match at the beginning
of the expanded value of parameter. If pattern begins with
%, it must match at the end of the expanded value of parameter.
If string is null, matches of pattern are deleted and the / following
pattern may be omitted. If parameter is @ or *, the substitution
operation is applied to each positional parameter in
turn, and the expansion is the resultant list. If parameter is
an array variable subscripted with @ or *, the substitution
operation is applied to each member of the array in turn, and
the expansion is the resultant list.
extended pattern matching:
If the extglob shell option is enabled using the shopt builtin, several
extended pattern matching operators are recognized. In the following
description, a pattern-list is a list of one or more patterns separated
by a |. Composite patterns may be formed using one or more of the
following sub-patterns:
?(pattern-list)
Matches zero or one occurrence of the given patterns
*(pattern-list)
Matches zero or more occurrences of the given patterns
+(pattern-list)
Matches one or more occurrences of the given patterns
@(pattern-list)
Matches one of the given patterns
!(pattern-list)
Matches anything except one of the given patterns
Here is another (more complex) way of getting either the filename or the extension: reverse the file path with the rev command, cut at the first ., and then reverse the result again, like this:
filename=`rev <<< "$1" | cut -d"." -f2- | rev`
fileext=`rev <<< "$1" | cut -d"." -f1 | rev`
If you want to play nice with Windows file paths (under Cygwin) you can also try this:
fname=${fullfile##*[/\\]}
This will account for backslash separators when using Bash on Windows.
Just an alternative that I came up with to extract an extension, using the posts in this thread with my own small knowledge base that was more familiar to me.
ext="$(rev <<< "$(cut -f "1" -d "." <<< "$(rev <<< "file.docx")")")"
Note: Please advise on my use of quotes; it worked for me but I might be missing something on their proper use (I probably use too many).
Use the basename command. Its manpage is here: http://unixhelp.ed.ac.uk/CGI/man-cgi?basename

Resources