Patterns and Regular Expressions in Shell/bash - linux

What does the command ls @(*xx|*AK) do in bash?
What is the purpose of the @ sign here, and what does the | do?

From man bash, under "Pathname Expansion":
@(pattern-list)
Matches one of the given patterns
A pattern-list is a list of one or more patterns separated by a |.

It's not a regular expression. It's one of Bash's extended glob patterns. See Pattern Matching in Bash's manual for details.
By default the extended glob pattern support is not enabled. To enable it, run shopt -s extglob. For more details about the shopt command, see The shopt Builtin.
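To see what the pattern does without touching any files, here is a minimal sketch that tests a few made-up names against it with [[ ]]. The names are assumptions for illustration only; the same pattern works in pathname expansion (as in the ls command) once extglob is on.

```shell
# bash, not sh
shopt -s extglob   # enable before any line that uses an extended pattern

# Hypothetical names, used only for illustration
matched=()
for name in fooxx barAK baz.txt xxfoo; do
  # @(*xx|*AK) matches a name ending in xx or a name ending in AK
  if [[ $name == @(*xx|*AK) ]]; then
    matched+=("$name")
  fi
done

printf '%s\n' "${matched[@]}"
```

Only the names ending in xx or AK end up in the matched array.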

Related

pattern matching in Bash of Linux

Does anybody know how to delete the pattern "#TechCrunch:" in the following str by sed in Linux?
str="0,RT #TechCrunch: The Tyranny Of Government And Our Duty Of Confidentiality As Bloggers."
So the desired output will be:
"0,RT The Tyranny Of Government And Our Duty Of Confidentiality As Bloggers."
I tried many ways but none of them works, e.g.:
echo $str | sed 's/#[a-zA-Z]*\ //'
Using sed (or any other external tool) for a single line that's already in a shell variable is hideously inefficient. Much easier to have the shell do the replacement itself.
#!/bin/bash
# ^- must be /bin/bash, not /bin/sh, for extglobs to be available
shopt -s extglob # put this somewhere early in your script to enable extended globs
str="0,RT #TechCrunch: The Tyranny Of Government And Our Duty Of Confidentiality As Bloggers."
echo "${str//#+([[:alpha:]]): /}"
This uses extglob syntax to provide more powerful pattern matches with built-in shell pattern matching; +(foo) is the extglob equivalent to the regex form (foo)+.
You were close - just missing the :.
perl -pe 's/#\w*:\s//i'
Or in sed:
sed -e 's/#[a-z]*: //i'
: is not matched by [a-zA-Z]. Also, there's no need to backslash the space.
sed 's/#[a-zA-Z]*: //'
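For comparison, a small sketch that runs the sed fix and the shell-only replacement on the same string and prints both results. The variable names via_sed and via_shell are mine, not from the question.

```shell
# bash, not sh
shopt -s extglob   # needed for +( ) in the parameter expansion below

str="0,RT #TechCrunch: The Tyranny Of Government And Our Duty Of Confidentiality As Bloggers."

# External tool: '#', then letters, then the colon and the space
via_sed=$(printf '%s\n' "$str" | sed 's/#[a-zA-Z]*: //')

# Shell only: the same match written as an extended glob
via_shell=${str//#+([[:alpha:]]): /}

printf '%s\n' "$via_sed" "$via_shell"
```

Both lines of output should be the desired string from the question.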

My regular expression isn't working in grep

Here's the text of the file I'm working with:
(4 spaces)Hi, everyone
(1 tab)yes
When I run this command - grep '^[[:space:]]+' myfile - it doesn't print anything to stdout.
Why doesn't it match the whitespace in the file?
I'm using GNU grep version 2.9.
There are several different regular expression syntaxes. The default for grep is called basic syntax in the grep documentation.
From man grep(1):
In basic regular expressions the meta-characters
?, +, {, |, (, and ) lose their special meaning; instead
use the backslashed versions \?, \+, \{, \|, \(, and \).
Therefore instead of + you should have typed \+:
grep '^[[:space:]]\+' FILE
If you need more power from your regular expressions, I also encourage you to take a look at Perl regular expression syntax, which is generally considered the most expressive. There is a C library called PCRE which emulates it, and grep can be built against it. To use it (instead of basic syntax) you can use grep -P.
You could use -E:
grep -E '^[[:space:]]+' FILE
This enables extended regex. Without it you get BREs (basic regexes), which have a more limited syntax. Alternatively you could run egrep with the same result, though recent GNU grep releases mark egrep as obsolescent in favor of grep -E.
I found you need to escape the +:
grep '^[[:space:]]\+' FILE
Try grep -P '^\s+' instead, provided you’re using GNU grep. It’s a lot easier to type, and has better regexes.
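A quick way to check the difference between the syntaxes is to count matching lines in a scratch file. This is a sketch assuming GNU grep; the third line without indentation is my addition so that there is something that should not match.

```shell
sample=$(mktemp)   # scratch file standing in for "myfile"
printf '    Hi, everyone\n\tyes\nno indent\n' > "$sample"

# Basic syntax: a bare + is an ordinary character, so no line matches.
# grep exits non-zero on zero matches, hence the "|| true".
bre_bare=$(grep -c '^[[:space:]]+' "$sample" || true)

# Basic syntax with \+ (a GNU extension): one or more whitespace characters
bre_escaped=$(grep -c '^[[:space:]]\+' "$sample" || true)

# Extended syntax: + is a quantifier without the backslash
ere=$(grep -cE '^[[:space:]]+' "$sample" || true)

printf 'bare=%s escaped=%s extended=%s\n' "$bre_bare" "$bre_escaped" "$ere"
rm -f "$sample"
```

grep -P is left out on purpose, since it only works when grep was built with PCRE support.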

Bash Pattern Matching

I'm trying to use pattern matching to find all files within a directory that have an extension of either .jpg or .jpeg.
ls *.[jJ][pP][eE][gG] <- this obviously will only find the .jpeg file extension. The question is, how do I make the [eE] optional?
Match harder.
ls *.[jJ][pP]{[eE],}[gG]
As well as the standard (simple) glob patterns, bash also has extended globbing.
It is off by default. To turn it on, use: shopt -s extglob
With extglob you have access to extended pattern operators as well as the standard patterns. Furthermore, in your particular situation, you can tailor your glob's behaviour even further by enabling case-insensitive globbing, which is also off by default. To turn it on, use: shopt -s nocaseglob
Enabling extglob does not alter how standard globs work. You can mix the two forms. The only issue is that you have to be aware of the special extended glob syntax. E.g., in the example below, the only part which is an extended pattern is ?(e), which matches zero or one e. The rest is standard glob expansion, with case-insensitivity enabled.
The extended, case-insensitive glob for your situation is:
shopt -s extglob nocaseglob
ls -l *.jp?(e)g
You can find more info and examples at: Bash Extended Globbing.
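If you want to try this without touching real images, here is a rough sketch using a scratch directory. The file names are hypothetical, and the results go into an array instead of ls so they are easy to inspect.

```shell
# bash, not sh
shopt -s extglob nocaseglob

demo_dir=$(mktemp -d)   # scratch directory with hypothetical file names
touch "$demo_dir"/a.jpg "$demo_dir"/b.JPEG "$demo_dir"/c.Jpg "$demo_dir"/d.png

# ?(e) makes the e optional; nocaseglob handles upper/lower case
images=( "$demo_dir"/*.jp?(e)g )
printf '%s\n' "${images[@]##*/}"   # print the names without the directory part

shopt -u nocaseglob     # switch case-insensitive globbing off again
rm -rf "$demo_dir"
```

Turning nocaseglob off afterwards is a design choice: it keeps later globs in the same script from silently matching more than expected.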

Specify multiple possible patterns for a single command

Basically there are a few lines which contain a common format, but different wording at the end. The command will work for all of them, but I want to match all possible patterns, thereby needing only 1 line in the script. As an example, I know how to make the script work like so:
/pattern1/ s/asdf/ghjk/g
/pattern2/ s/asdf/ghjk/g
/pattern3/ s/asdf/ghjk/g
Any ideas?
If your patterns are really as similar as in your example, you can use
sed -e '/pattern[1-3]/ s/asdf/ghjk/g'
If the patterns aren't so similar and your sed command supports extended regular expressions, you can use
sed -E -e '/(pattern1|pattern2|pattern3)/ s/asdf/ghjk/g'
# ^^ use extended regular expressions
# for GNU sed, use -r or escape (, |, and ) with \
If your sed command doesn't support extended regular expressions, you might have to turn to awk or perl:
perl -ple '/(pattern1|pattern2|pattern3)/ && s/asdf/ghjk/g'
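A small sketch of the alternation form on made-up input, assuming a sed that accepts -E. The "something else" line is there to show that lines not matching any of the patterns are left alone.

```shell
# Hypothetical input: three lines that match an address pattern, one that does not
input='pattern1 asdf
pattern2 asdf
something else asdf
pattern3 asdf asdf'

# The substitution runs only on lines matching one of the three patterns
output=$(printf '%s\n' "$input" | sed -E '/(pattern1|pattern2|pattern3)/ s/asdf/ghjk/g')

printf '%s\n' "$output"
```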

SHELL: How do I use an or operator when defining a string

This may not even be possible, but I'm writing my first shell script and I need to use a regexp-type operator in my string (shown below).
FILES=tif2/name(45|79)*.pdf
Is this possible? Or would I just have to have two strings.
FILES=tif2/name45*.pdf
FILES=tif2/name79*.pdf
Alternatives in the shell are written as a comma-separated list enclosed in curly braces (brace expansion). Your example becomes:
FILES=tif2/name{45,79}*.pdf
There's a pretty nice quick reference here to the glob syntax supported by most shells.
For the more esoteric bash-specific glob syntax, see http://www.gnu.org/software/bash/manual/bashref.html#Shell-Expansions
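One caveat worth checking in your own shell: in bash, a plain variable assignment does not perform brace expansion, so the braces are stored literally. A sketch of the difference, using an array to hold the expanded file list (the scratch directory and file names are hypothetical):

```shell
# bash, not sh
demo_dir=$(mktemp -d)   # scratch directory with hypothetical file names
mkdir "$demo_dir/tif2"
touch "$demo_dir/tif2/name45a.pdf" "$demo_dir/tif2/name79b.pdf" "$demo_dir/tif2/name12c.pdf"

# Scalar assignment: no brace expansion, the text is stored as written
literal=tif2/name{45,79}*.pdf

# Array assignment: brace expansion first, then pathname expansion
files=( "$demo_dir"/tif2/name{45,79}*.pdf )

printf 'literal: %s\n' "$literal"
printf 'file: %s\n' "${files[@]##*/}"
rm -rf "$demo_dir"
```

An array also keeps file names with spaces intact, which a space-separated string would not.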
In Bash, zsh, pdksh and ksh93, you can use extended globbing:
shopt -s extglob # Bash
setopt KSH_GLOB # zsh
FILES=tif2/name@(45|79)*.pdf
The @() operator matches exactly one of the patterns inside it, which are separated by pipe characters.
If your specific shell doesn't support such advanced globbing, you can always use grep:
FILES=`ls tif2/name[0-9][0-9]*.pdf|egrep "name(45|79)" | tr "\012" " "`
If you just want the shell to ignore any special characters in the string, enclose it with single quotes.
FILES='tif2/name(45|79)*.pdf'

Resources