SHELL: How do I use an OR operator when defining a string - linux

This may not even be possible, but I'm writing my first shell script and I need to use a regexp-style OR operator in my string (shown below):
FILES=tif2/name(45|79)*.pdf
Is this possible, or would I have to use two strings?
FILES=tif2/name45*.pdf
FILES=tif2/name79*.pdf

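Alternatives in shell pattern syntax are written as a comma-separated list enclosed in curly braces. Strictly speaking this is brace expansion (supported by bash, zsh and ksh93, but not by plain POSIX sh), which runs before globbing. A plain variable assignment performs neither brace expansion nor globbing, so collect the results in an array. Your example becomes:
FILES=(tif2/name{45,79}*.pdf)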
There's a pretty nice quick reference here to the glob syntax supported by most shells.
For the more esoteric bash-specific glob syntax, see http://www.gnu.org/software/bash/manual/bashref.html#Shell-Expansions
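A minimal bash sketch of the difference between the array form and a plain assignment, assuming a throwaway directory with made-up sample file names (tif2/name45a.pdf and friends are hypothetical):

```shell
#!/usr/bin/env bash
# Work in a throwaway directory holding hypothetical sample files.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir" || exit 1
mkdir tif2
touch tif2/name45a.pdf tif2/name79b.pdf tif2/name12c.pdf

# Brace expansion first produces two words, tif2/name45*.pdf and
# tif2/name79*.pdf; each word is then globbed against the directory.
FILES=(tif2/name{45,79}*.pdf)

# A plain (non-array) assignment performs neither brace expansion nor
# globbing, so this variable just holds the original text.
LITERAL=tif2/name{45,79}*.pdf

printf '%s\n' "${FILES[@]}"
printf '%s\n' "$LITERAL"
```

If one of the alternatives matches no file, that word is left in the array as the literal pattern unless the nullglob option is set.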

In Bash, zsh, pdksh and ksh93, you can use extended globbing:
shopt -s extglob # Bash
setopt KSH_GLOB # zsh
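FILES=(tif2/name@(45|79)*.pdf)
The @(...) operator matches exactly one of the pipe-separated patterns inside it. The parentheses around the value make FILES an array, because a plain assignment would store the pattern unexpanded.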
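A bash sketch of the same idea, assuming hypothetical sample files in a scratch directory. Note that shopt -s extglob has to run on an earlier line than the pattern, because bash needs the option on when it parses @(...):

```shell
#!/usr/bin/env bash
# extglob must be switched on before bash parses any line that uses @(...).
shopt -s extglob

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir" || exit 1
mkdir tif2
touch tif2/name45a.pdf tif2/name79b.pdf tif2/name12c.pdf

# @(45|79) matches exactly one of the |-separated alternatives.
FILES=(tif2/name@(45|79)*.pdf)
printf '%s\n' "${FILES[@]}"
```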

If your shell doesn't support such advanced globbing, you can always filter the file list with grep (this breaks on file names that contain whitespace):
FILES=$(ls tif2/name[0-9][0-9]*.pdf | grep -E 'name(45|79)' | tr '\n' ' ')
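Another option for shells without extended globbing, not part of the answer above: POSIX case patterns already accept | between alternatives, so a loop can filter an ordinary glob without calling ls or grep. A sketch assuming the same hypothetical sample files:

```shell
#!/bin/sh
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir" || exit 1
mkdir tif2
touch tif2/name45a.pdf tif2/name79b.pdf tif2/name12c.pdf

FILES=
for f in tif2/name*.pdf; do
  # case patterns are globs, and | separates alternatives even in POSIX sh.
  case $f in
    tif2/name45*|tif2/name79*) FILES="$FILES $f" ;;
  esac
done
FILES=${FILES# }   # strip the leading space added by the first append
printf '%s\n' "$FILES"
```

Like the grep version, this builds a space-separated string, so file names containing spaces still cause trouble.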

If you just want the shell to ignore any special characters in the string, enclose it with single quotes.
FILES='tif2/name(45|79)*.pdf'

Related

how to escape file path in bash script variable

I would like to escape a file path that is stored in a variable in a bash script.
I read several threads about escaping backticks, but none of the approaches seems to work as it should.
I have this variable, whose value the user enters as a parameter while the bash script runs:
CONFIG="/home/teams/blabla/blabla.yaml"
I need to change it to: \/home\/teams\/blabla\/blabla.yaml
How can I do that within the script, via sed or similar (not manually)?
With GNU bash and its Parameter Expansion:
echo "${CONFIG//\//\\/}"
Output:
\/home\/teams\/blabla\/blabla.yaml
Using the solution from this question, in your case it will look like this:
CONFIG=$(echo "/home/teams/blabla/blabla.yaml" | sed -e 's/[]\/$*.^[]/\\&/g')
echo "/home/teams/blabla/blabla.yaml" | sed 's/\//\\\//g'
\/home\/teams\/blabla\/blabla.yaml
explanation:
A backslash toggles the meaning of the character after it: it makes a special character literal (or, in some regex syntaxes, makes an ordinary character special). A double backslash stands for a literal backslash.
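To see why the escaped form is useful, here is a sketch that feeds it into a /-delimited sed substitution. The PLACEHOLDER text and the input line are hypothetical, chosen only for illustration:

```shell
#!/usr/bin/env bash
CONFIG="/home/teams/blabla/blabla.yaml"

# Put a backslash in front of every / (same sed command as above).
ESCAPED=$(printf '%s\n' "$CONFIG" | sed 's/\//\\\//g')

# The escaped value can now sit inside a /-delimited s command:
# each \/ in the replacement stands for a literal /.
result=$(printf 'config: PLACEHOLDER\n' | sed "s/PLACEHOLDER/$ESCAPED/")

printf '%s\n' "$ESCAPED" "$result"
```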
Why does that need escaping? Is this an XY Problem?
If the issue is that you are trying to use that variable in a substitution regex, then the examples given should work, but you might benefit from avoiding some of the "leaning toothpick syndrome", which many tools allow simply by using a different match delimiter. sed, for example:
$: sed "s,SOME_PLACEHOLDER_VALUE,$CONFIG," <<< SOME_PLACEHOLDER_VALUE
/home/teams/blabla/blabla.yaml
Be very careful with this, though. Commas are perfectly valid characters in a file name, as is almost any character except NUL. Know your data.

Patterns and Regular Expressions in Shell/bash

the command ls @(*xx|*AK) in bash
what is the purpose of the @ sign here and what does the | do?
From man bash, under "Pathname Expansion":
@(pattern-list)
Matches one of the given patterns
a pattern-list is a list of one or more patterns separated by a |.
It's not a regular expression; it's one of Bash's extended glob patterns. See Pattern Matching in Bash's manual for details.
By default the extended glob pattern support is not enabled. To enable it, run shopt -s extglob. For more details about the shopt command, see The shopt Builtin.
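A small bash sketch of the question's pattern, assuming a scratch directory with three made-up file names; it also shows the same pattern list used in a [[ ]] string test:

```shell
#!/usr/bin/env bash
shopt -s extglob   # enable before any line using @(...) is parsed

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir" || exit 1
touch report.xx data.AK notes.txt

# @(*xx|*AK): names ending in "xx" or in "AK".
matches=(@(*xx|*AK))
printf '%s\n' "${matches[@]}"

# The same pattern list also works on a string inside [[ ... ]].
name=notes.txt
if [[ $name == @(*xx|*AK) ]]; then verdict=match; else verdict=no-match; fi
printf '%s: %s\n' "$name" "$verdict"
```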

Why does grep work differently in Debian vs CentOs?

Take this simple shell function...
function search (){
grep -roiI --include $2 $1 . | sort -u
}
And then use it like this:
# search drop *.properties
In CentOS it works as desired, returning a list of grep'd results. In Debian, however, it treats the special characters in "*.properties" as a regex, so grep doesn't match properly. Why does Debian interpret the special characters and CentOS not?
This sounds like a different setting of the nullglob shell option, which controls what happens when a glob (a word with a wildcard) matches no files. With nullglob turned on, *.properties is treated as a list of files, even if that list is empty; with nullglob turned off, an unmatched *.properties is passed on as the literal string. You can disable nullglob with shopt -u nullglob and turn it back on with shopt -s nullglob.
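A bash sketch of the two behaviours, assuming an empty scratch directory so that *.properties is guaranteed to match nothing:

```shell
#!/usr/bin/env bash
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir" || exit 1   # empty directory: *.properties matches nothing

shopt -u nullglob
set -- *.properties
count_off=$#        # 1: the unmatched pattern is kept as a literal word
first_off=$1

shopt -s nullglob
set -- *.properties
count_on=$#         # 0: the unmatched pattern disappears entirely
shopt -u nullglob   # restore the default

printf 'off: %s (%s)  on: %s\n' "$count_off" "$first_off" "$count_on"
```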
However, in this case you never want *.properties to be treated as a glob; you want the string passed directly to your function. So either escape the * (search drop \*.properties) or quote the string with double or single quotes (search drop '*.properties'). Likewise, inside the function you should enclose the $1 and $2 parameters in double quotes.
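A sketch of the question's search function with both fixes applied, quoting at the call site and inside the function. The sample files and their contents are hypothetical, and grep's --include option is assumed (GNU grep has it):

```shell
#!/usr/bin/env bash
# Same function as in the question, with the parameters quoted.
search() {
  grep -roiI --include "$2" "$1" . | sort -u
}

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir" || exit 1
printf 'drop table\n' > a.properties
printf 'Drop it\n' > b.txt
touch c.properties   # an unquoted *.properties would now expand to two names

# Quoting the glob hands the literal text *.properties to grep's --include.
out=$(search drop '*.properties')
printf '%s\n' "$out"
```

If the call site were unquoted here, bash would expand *.properties to a.properties c.properties first, so the function would receive a.properties as $2 and --include would filter on that single name.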
Maybe grep is not the issue; it may be a shell expansion problem.
On bash:
Bash scans each word for the characters ‘*’, ‘?’, and ‘[’. If one of these
characters appears, then the word is regarded as a pattern, and replaced with
an alphabetically sorted list of file names matching the pattern.

My regular expression isn't working in grep

Here's the text of the file I'm working with:
(4 spaces)Hi, everyone
(1 tab)yes
When I run this command - grep '^[[:space:]]+' myfile - it doesn't print anything to stdout.
Why doesn't it match the whitespace in the file?
I'm using GNU grep version 2.9.
There are several different regular expression syntaxes. The default for grep is called basic syntax in the grep documentation.
From man grep(1):
In basic regular expressions the meta-characters
?, +, {, |, (, and ) lose their special meaning; instead
use the backslashed versions \?, \+, \{, \|, \(, and \).
Therefore instead of + you should have typed \+:
grep '^[[:space:]]\+' FILE
If you need more power from your regular expressions, I also encourage you to look at Perl regular expression syntax, which is generally considered the most expressive flavor. The PCRE C library implements it, and GNU grep can use it when built with PCRE support: enable it with grep -P instead of the default basic syntax.
You could use -E:
grep -E '^[[:space:]]+' FILE
This enables extended regular expressions (EREs). Without it you get BREs (basic regular expressions), in which + is an ordinary character unless escaped. Alternatively, egrep gives the same result, although recent GNU grep releases warn that egrep is obsolescent.
I found you need to escape the +:
grep '^[[:space:]]\+' FILE
Try grep -P '^\s+' instead, provided you’re using GNU grep. It’s a lot easier to type, and has better regexes.
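A sketch comparing the variants on the question's sample file (reconstructed here with printf). The \+ form relies on a GNU extension, while \{1,\} is the standard POSIX BRE way to say "one or more"; -P is left out because PCRE support depends on how grep was built:

```shell
#!/usr/bin/env bash
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
printf '    Hi, everyone\n\tyes\n' > "$workdir/myfile"

# BRE: + is an ordinary character, so this looks for whitespace followed by "+".
bre_plain=$(grep -c '^[[:space:]]+' "$workdir/myfile")
# BRE with the GNU \+ extension.
bre_gnu=$(grep -c '^[[:space:]]\+' "$workdir/myfile")
# Portable POSIX BRE: the interval \{1,\} means "one or more".
bre_posix=$(grep -c '^[[:space:]]\{1,\}' "$workdir/myfile")
# ERE: + is special without a backslash.
ere=$(grep -cE '^[[:space:]]+' "$workdir/myfile")

printf '%s %s %s %s\n' "$bre_plain" "$bre_gnu" "$bre_posix" "$ere"
```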

Specify multiple possible patterns for a single command

Basically, there are a few lines that share a common format but have different wording at the end. The same command works for all of them, but I want a single address that matches all the possible patterns, so the script needs only one line. For example, I know how to make the script work like this:
/pattern1/ s/asdf/ghjk/g
/pattern2/ s/asdf/ghjk/g
/pattern3/ s/asdf/ghjk/g
Any ideas?
If your patterns are really as similar as in your example, you can use
sed -e '/pattern[1-3]/ s/asdf/ghjk/g'
If the patterns aren't so similar and your sed command supports extended regular expressions, you can use
sed -E -e '/(pattern1|pattern2|pattern3)/ s/asdf/ghjk/g'
# ^^ use extended regular expressions
# older GNU sed releases want -r; in basic syntax, GNU sed also accepts \(, \|, and \)
If your sed command doesn't support extended regular expressions, you might have to turn to awk or perl:
perl -ple '/(pattern1|pattern2|pattern3)/ && s/asdf/ghjk/g'
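A sketch that runs the sed -E version on a few made-up input lines, next to an awk equivalent (awk regular expressions are extended by default, so | needs no flag):

```shell
#!/usr/bin/env bash
input=$(printf '%s\n' 'pattern1 asdf' 'pattern2 asdf asdf' 'other asdf')

# ERE alternation in the address (current GNU sed and BSD sed accept -E).
via_sed=$(printf '%s\n' "$input" | sed -E -e '/(pattern1|pattern2|pattern3)/ s/asdf/ghjk/g')

# awk: substitute only on matching lines, print every line.
via_awk=$(printf '%s\n' "$input" | awk '/pattern1|pattern2|pattern3/ { gsub(/asdf/, "ghjk") } { print }')

printf '%s\n' "$via_sed"
```

Lines that match none of the alternatives (here "other asdf") pass through unchanged in both versions.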

Resources