Bash Pattern Matching - linux

I'm trying to use pattern matching to find all files within a directory that have an extension of either .jpg or .jpeg.
ls *.[jJ][pP][eE][gG] <- this obviously will only find the .jpeg file extension. The question is, how do I make the [eE] optional?

Match harder.
ls *.[jJ][pP]{[eE],}[gG]

As well as the standard (simple) glob patterns, bash has extended globbing.
It is off by default. To turn it on, use: shopt -s extglob
With extglob you have access to extended pattern-matching operators as well as the standard patterns. Furthermore, in your particular situation, you can tailor the glob's behaviour further by enabling case-insensitive globbing, which is also off by default. To turn it on, use: shopt -s nocaseglob
Enabling extglob does not alter how standard globs work, and you can mix the two forms. You only need to be aware of the extended pattern syntax. E.g., in the example below, the only extended part is ?(e), which matches zero or one occurrence of e. The rest is standard glob expansion, with case-insensitivity enabled.
The extended, case-insensitive glob for your situation is:
shopt -s extglob nocaseglob
ls -l *.jp?(e)g
You can find more info and examples at: Bash Extended Globbing.
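To see the pattern in action without touching real files, here is a minimal sketch. The scratch directory and the file names are made up for illustration, and nullglob is added only so the array ends up empty when nothing matches.

```shell
#!/usr/bin/env bash
# Sketch: match .jpg and .jpeg in any letter case with extglob + nocaseglob.
shopt -s extglob nocaseglob nullglob

demo_dir=$(mktemp -d)            # hypothetical scratch directory
touch "$demo_dir"/{a.jpg,b.JPEG,c.Jpg,d.png,e.jpgg}

# ?(e) matches zero or one "e"; nocaseglob makes the match case-insensitive.
matches=( "$demo_dir"/*.jp?(e)g )

# Print only the file names, without the directory prefix.
printf '%s\n' "${matches[@]##*/}"
rm -r "$demo_dir"
```

Only a.jpg, b.JPEG and c.Jpg should be listed; d.png and e.jpgg do not fit the pattern. Note that shopt -s extglob has to run before bash parses the line containing ?(e), so keep it on its own line near the top of the script.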

Related

Why isn't nullglob behaviour default in Bash?

I recently needed to use a Bash for loop to recurse through a few files in a directory:
for video in **/*.{mkv,mp4,webm}; do
echo "$video"
done
After way too much time spent debugging, I realised that the loop was run even when the pattern didn't match, resulting in:
file1.mkv
file2.mp4
**/*.webm # literal pattern printed when no .webm files can be found
Some detailed searching eventually revealed that this is known behaviour in Bash, for which enabling the shell's nullglob option with shopt -s nullglob is intended to be the solution.
Is there a reason that this counter-intuitive behaviour is the default and needs to be explicitly disabled with nullglob, instead of the other way around? Or to put the question another way, are there any disadvantages to always having nullglob enabled?
From man 7 glob:
Empty lists
The nice and simple rule given above: "expand a wildcard pattern
into the list of matching pathnames" was the original UNIX
definition. It allowed one to have patterns that expand into an
empty list, as in
xv -wait 0 *.gif *.jpg
where perhaps no *.gif files are present (and this is not an
error). However, POSIX requires that a wildcard pattern is left
unchanged when it is syntactically incorrect, or the list of
matching pathnames is empty. With bash one can force the
classical behavior using this command:
shopt -s nullglob
(Similar problems occur elsewhere. For example, where old
scripts have
rm `find . -name "*~"`
new scripts require
rm -f nosuchfile `find . -name "*~"`
to avoid error messages from rm called with an empty argument
list.)
In short, it is the behavior required to be POSIX compatible.
Granted though, you can now ask what the rationale for POSIX was to specify that behavior. See this unix.stackexchange.com question for some reasons/history.
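For a side-by-side comparison of the two behaviours, here is a small sketch. It uses a made-up scratch directory and a flat *.{mkv,mp4,webm} pattern rather than **, so it does not depend on globstar.

```shell
#!/usr/bin/env bash
# Sketch: compare the default (POSIX) behaviour with nullglob in a loop.
demo_dir=$(mktemp -d)            # hypothetical scratch directory
touch "$demo_dir"/file1.mkv "$demo_dir"/file2.mp4

shopt -u nullglob                # the default setting
default_seen=()
for video in "$demo_dir"/*.{mkv,mp4,webm}; do
    default_seen+=( "${video##*/}" )   # ends up including the literal "*.webm"
done

shopt -s nullglob
nullglob_seen=()
for video in "$demo_dir"/*.{mkv,mp4,webm}; do
    nullglob_seen+=( "${video##*/}" )  # the unmatched pattern expands to nothing
done

printf 'default:  %s\n' "${default_seen[*]}"
printf 'nullglob: %s\n' "${nullglob_seen[*]}"
rm -r "$demo_dir"
```

One practical downside of leaving nullglob on everywhere: a command such as ls *.webm with no matches is run as plain ls, which lists the whole directory instead of reporting that nothing matched. Setting it only around the loops that need it is one way to limit that risk.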

Copying files with even number in its name - bash

I want to copy all files from /usr/lib whose names end with .X.0.0, where X is an even number. Is there a better way than the following one to select all the files?
ls /usr/lib | grep "[02468].0.0$"
My problem with this solution is that if there are files with names like "xy.800.0.0", I need to use the bracket three times, etc.
Just use a glob expansion to match the files:
cp /usr/lib/*.*[02468].0.0 /path/to/destination
The shell expands this pattern to the list of files before passing them as arguments to cp.
Since you tagged Bash, you can make the match more strict by using an extended glob:
shopt -s extglob failglob
cp /usr/lib/*.*([0-9])[02468].0.0 /path/to/destination
This matches 0 or more other digits followed by an even digit, and doesn't run the command at all if no files match.
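If you want to try the stricter pattern safely first, here is a sketch that runs it against a scratch directory. The directories and library names are invented for the demo, and it swaps failglob for nullglob plus an explicit count check, which is an alternative way of not running cp when nothing matches.

```shell
#!/usr/bin/env bash
# Sketch: copy only files whose version number before ".0.0" is even.
shopt -s extglob nullglob

src=$(mktemp -d)     # stands in for /usr/lib
dest=$(mktemp -d)    # stands in for /path/to/destination
touch "$src"/{libA.so.2.0.0,libB.so.3.0.0,libC.so.800.0.0,libD.so.11.0.0,libE.so.x2.0.0}

# *([0-9]) matches zero or more digits; [02468] forces the last digit to be even.
even=( "$src"/*.*([0-9])[02468].0.0 )

# Only call cp when the pattern matched something.
if (( ${#even[@]} > 0 )); then
    cp -- "${even[@]}" "$dest"/
fi

copied=( "$dest"/* )
printf '%s\n' "${copied[@]##*/}"
rm -r "$src" "$dest"
```

Here libA.so.2.0.0 and libC.so.800.0.0 are copied. libE.so.x2.0.0 is skipped by the strict pattern, although the simpler *.*[02468].0.0 would have accepted it.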
You could use extended grep regular expressions to only match even numbers:
ls -1q /usr/lib | grep -E "\.[0-9]*[02468]\.0\.0$"
However, as Tom suggested, there are better options than parsing the output of ls. It's generally safer to use glob expansion.

Patterns and Regular Expressions in Shell/bash

the command ls @(*xx|*AK) in bash
what is the purpose of the @ sign here and what does the | do?
From man bash, under "Pathname Expansion":
@(pattern-list)
Matches one of the given patterns
A pattern-list is a list of one or more patterns separated by a |.
It's not a regular expression. It's one of Bash's extended glob patterns. See Pattern Matching in Bash's manual for details.
By default the extended glob pattern support is not enabled. To enable it, run shopt -s extglob. For more details about the shopt command, see The shopt Builtin.
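A quick way to see what @(pattern-list) does is to run it against a few throwaway files. This is only a sketch; the scratch directory and file names are made up.

```shell
#!/usr/bin/env bash
# Sketch: @(a|b) matches names that fit one of the listed patterns.
shopt -s extglob nullglob

demo_dir=$(mktemp -d)            # hypothetical scratch directory
touch "$demo_dir"/{fooxx,barAK,bazak,plain}

# Names ending in "xx" or in "AK" (matching is case-sensitive by default).
picked=( "$demo_dir"/@(*xx|*AK) )

printf '%s\n' "${picked[@]##*/}"
rm -r "$demo_dir"
```

This lists barAK and fooxx. bazak is left out because the match is case-sensitive, and plain fits neither alternative.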

Why does grep work differently in Debian vs CentOs?

Take this simple shell function...
function search (){
grep -roiI --include $2 $1 . | sort -u
}
And then use it like this:
# search drop *.properties
In CentOs it will function as desired returning a list of grep'd results. However, in Debian, it parses the special chars in "*.properties" as a regex, thus not grep'ing properly. Why is Debian parsing special chars and CentOs not?
This sounds like different settings for the nullglob shell option, which controls what happens when you use a glob (something with a wildcard) and no files match it. With nullglob turned on, "*.properties" is treated as a list of files, even if that list is empty; with nullglob turned off, "*.properties" is passed through as a literal string if it doesn't match any files. You can disable nullglob with shopt -u nullglob and turn it back on with shopt -s nullglob.
However, in this case, where you do NOT want *.properties to be treated as a glob ever and you want the string passed directly into your function, you should either escape the *, as in search drop \*.properties, or quote the string with double or single quotes: search drop '*.properties'. Similarly, in your search function, you should enclose the $2 and $1 parameters in double quotes.
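Putting both pieces of advice together, a quoted version of the function might look like the sketch below. The scratch directory and sample files are invented for the demo, and the -- before the search text is an extra precaution that is not in the original function.

```shell
#!/usr/bin/env bash
# Sketch: the search function with its parameters quoted.
search() {
    # $1 = text to look for, $2 = file-name glob that grep itself should apply
    grep -roiI --include="$2" -- "$1" . | sort -u
}

demo_dir=$(mktemp -d)            # hypothetical scratch directory
mkdir "$demo_dir/conf"
printf 'action=drop\n' > "$demo_dir/app.properties"
printf 'action=DROP\n' > "$demo_dir/conf/db.properties"
printf 'drop me\n'     > "$demo_dir/notes.txt"

# The single quotes stop the shell from expanding *.properties before the call.
result=$(cd "$demo_dir" && search drop '*.properties')

printf '%s\n' "$result"
rm -r "$demo_dir"
```

Without the quotes, the shell would expand *.properties against the current directory first, so in this demo only app.properties would reach grep and conf/db.properties would be missed. Whether files match in the directory you run the command from may be part of why two machines appear to behave differently.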
Maybe grep is not the issue. It can be a shell expansion problem.
On bash:
Bash scans each word for the characters ‘*’, ‘?’, and ‘[’. If one of these
characters appears, then the word is regarded as a pattern, and replaced with
an alphabetically sorted list of file names matching the pattern.

SHELL: How do I use a or operator when defining a string

This may not even be possible, but I'm writing my first shell script and I need to use a regexp-type operator in my string (shown below):
FILES=tif2/name(45|79)*.pdf
Is this possible? Or would I just have to have two strings.
FILES=tif2/name45*.pdf
FILES=tif2/name79*.pdf
Alternatives in the shell globbing syntax use a comma-separated list enclosed in curly braces. Your example becomes:
FILES=tif2/name{45,79}*.pdf
There's a pretty nice quick reference here to the glob syntax supported by most shells.
For the more esoteric bash-specific glob syntax, see http://www.gnu.org/software/bash/manual/bashref.html#Shell-Expansions
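One caveat for bash specifically: brace expansion is not performed on the right-hand side of a plain name=value assignment, so the braces would be stored literally. A sketch of one way around that is to collect the matches in an array, where brace and pathname expansion both apply. The scratch directory and file names below are made up.

```shell
#!/usr/bin/env bash
# Sketch: gather the files matching either alternative into an array.
shopt -s nullglob                # unmatched alternatives simply drop out

demo_dir=$(mktemp -d)            # hypothetical scratch directory
mkdir "$demo_dir/tif2"
touch "$demo_dir"/tif2/{name45a.pdf,name79b.pdf,name12c.pdf}

# Inside ( ... ) the braces expand first, then each resulting glob is matched.
FILES=( "$demo_dir"/tif2/name{45,79}*.pdf )

for f in "${FILES[@]}"; do
    printf '%s\n' "${f##*/}"
done
rm -r "$demo_dir"
```

This prints name45a.pdf and name79b.pdf. Iterating over "${FILES[@]}" also keeps file names containing spaces intact, which a space-separated string would not.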
In Bash, zsh, pdksh and ksh93, you can use extended globbing:
shopt -s extglob # Bash
setopt KSH_GLOB # zsh
FILES=tif2/name@(45|79)*.pdf
The @() operator matches one of the patterns inside it, which are separated by pipe characters.
If your specific shell doesn't support such advanced globbing, you can always use grep:
FILES=`ls tif2/name[0-9][0-9]*.pdf|egrep "name(45|79)" | tr "\012" " "`
If you just want the shell to ignore any special characters in the string, enclose it with single quotes.
FILES='tif2/name(45|79)*.pdf'

Resources