Question about shell commands and grep - linux

Does anyone know why
grep "p\{2\}" textfile
will find "apple" if it's in the file, but
grep p\{2\} textfile
won't?
I'm new to using a command line and regular expressions, and this is puzzling me.

This has already been answered, but since you are new to all this, here is how to debug it:
-- Get the PID of the current shell (using ps):
PID TTY TIME CMD
1611 pts/0 00:00:00 su
1619 pts/0 00:00:00 bash
1763 pts/0 00:00:00 ps
-- From another shell, attach strace (the system call tracer) to that PID (here 1619):
strace -f -o <output_file> -p 1619
-- Run both the commands that you tried
-- Open the output file and look for the exec-family calls (execve) for the process of interest, here grep.
The output on my machine is something like:
1723 execve("/bin/grep", ["grep", "--color=auto", "p{2}", "foo"], [/* 19 vars */]) = 0
1725 execve("/bin/grep", ["grep", "--color=auto", "p\\{2\\}", "foo"], [/* 19 vars */]) = 0
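Now you can see how grep was invoked differently in the two cases (strace prints a backslash doubled, as \\) and figure out the problem yourself. :)
(The -e flag mystery mentioned in another answer is still unsolved, though; note that it is -E, not -e, that selects extended regular expressions.)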
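If attaching strace to a running shell feels heavy, bash can show you much the same thing itself. A minimal sketch, assuming bash: set -x (xtrace) prints each command to stderr after expansion and quote removal, just before it runs, so you can compare the argument grep actually receives. The throwaway textfile is an assumption made for the demo.

```shell
#!/usr/bin/env bash
# Sketch: use bash's xtrace mode instead of strace to see grep's final arguments.
# Assumption: a throwaway "textfile" in a scratch directory, for illustration only.
cd "$(mktemp -d)" || exit 1
printf 'apple\n' > textfile

set -x                          # trace each command after expansion (to stderr)
grep "p\{2\}" textfile          # trace shows grep receiving p\{2\}; prints apple
grep p\{2\} textfile || true    # trace shows p{2}: the backslashes are gone, no match
set +x
```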

Without the quotes, the shell will try to expand the arguments. In your case, the curly braces '{}' have a special meaning in the shell, much like the asterisk '*', which is a wildcard.

With quotes, your complete regex gets passed directly to grep. Without the quotes, grep sees your regex as p{2}.
Edit:
To clarify, without the quotes your backslashes are removed by the shell before your regex is passed to grep.
Try:
echo grep p\{2\} test.txt
and you'll see the output:
grep p{2} test.txt
The quotes prevent the shell from stripping the backslashes before the pattern gets to grep. You could also escape your backslashes, and it will work without quotes: grep p\\{2\\} test.txt
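A quick way to see exactly what each spelling hands to grep is to print the argument instead of searching with it. This sketch uses printf '%s\n' rather than echo, because some echo implementations interpret backslash escape sequences themselves.

```shell
#!/bin/sh
# Print the single argument the shell would pass to grep for each spelling.
printf '%s\n' p\{2\}      # unquoted: the shell removes the backslashes -> p{2}
printf '%s\n' "p\{2\}"    # double-quoted: \{ is not special inside "", so it stays -> p\{2\}
printf '%s\n' 'p\{2\}'    # single-quoted: nothing is changed -> p\{2\}
printf '%s\n' p\\{2\\}    # unquoted, doubled backslashes: each \\ becomes \ -> p\{2\}
```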

The first one treats the pattern as a regex, so it matches pp:
echo "apple" | grep 'p\{2\}'
The second one searches for the pattern literally, so it matches p{2}:
echo "ap{2}le" | grep p\{2\}
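Feeding both kinds of input through both spellings makes the difference concrete. A minimal sketch; the two sample lines are made up for the demo, and it relies on the rule that an unescaped { is an ordinary character in a basic regular expression.

```shell
#!/bin/sh
# Two sample lines: one contains "pp", the other the literal text "p{2}".
sample=$(mktemp)
printf '%s\n' apple 'ap{2}le' > "$sample"

# Quoted: grep receives p\{2\}, a BRE interval meaning "exactly two p's".
grep 'p\{2\}' "$sample"    # prints: apple

# Unquoted: grep receives p{2}, which in a BRE is a literal search for "p{2}".
grep p\{2\} "$sample"      # prints: ap{2}le

rm -f "$sample"
```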

From the grep man page
In basic regular expressions the meta-characters ?, +, {, |, (, and ) lose their special meaning; instead use the backslashed versions \?, \+, \{, \|, \(, and \).
so these two are functionally equivalent:
egrep 'p{2}'
and
grep "p\{2\}"
The first uses EREs (Extended Regular Expressions) and the second uses BREs (Basic Regular Expressions). In your example you're using plain grep, which uses BREs unless you pass the -E switch, and the pattern is enclosed in quotes, so "\{" reaches grep intact and is treated as a special BRE character.
Your second instance doesn't work because you're just looking for the literal string p{2}, which doesn't exist in your file.
You can demonstrate that grep is interpreting your string as a BRE by trying:
grep "p\{2"
grep will complain
grep: Unmatched \{
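The man-page rule can be checked directly: the ERE form and the backslashed BRE form match the same input, and an unbalanced \{ is rejected by grep itself rather than by the shell. A short sketch; it uses grep -E, which is what egrep runs (newer GNU grep versions warn that egrep is obsolescent), and the exit status of 2 for an invalid pattern is GNU grep's convention.

```shell
#!/bin/sh
# ERE (grep -E, formerly egrep): { is special without a backslash.
echo apple | grep -E 'p{2}'      # prints: apple

# BRE (plain grep): the backslashed \{ \} form is the interval operator.
echo apple | grep 'p\{2\}'       # prints: apple

# An unbalanced \{ is a regex syntax error, reported by grep on stderr.
status=0
echo apple | grep 'p\{2' || status=$?
echo "exit status: $status"      # GNU grep uses 2 for errors (1 means "no match")
```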

Related

How to extract lines that start with either this string or that string? [duplicate]

(This question was closed as a duplicate of "Use grep to find either of two strings without changing the order of the lines?")
Newbie UNIX user question ...
The input file (location.txt) is this:
WGS_LAT deg 12
WGS_LAT min 30
WGS_LAT sec 05
WGS_LAT hsec 29
WGS_LAT northSouth North
WGS_DLAT decimalDegreesLatitude 12.501469
WGS_LONG deg 07
WGS_LONG min 00
WGS_LONG sec 05
WGS_LONG hsec 61
WGS_LONG eastWest West
WGS_DLONG decimalDegreesLongitude -70.015606
I want to get all lines that start with WGS_LAT or WGS_DLAT.
First, is grep the tool you recommend for this job?
Second, if it is, then how to express the pattern? All of these failed:
grep ^WGS_LAT|^WGS_DLAT location.txt
grep ^(WGS_LAT|WGS_DLAT) location.txt
grep ^WGS_D?LAT location.txt
What is the correct pattern, please?
Grep can handle two types of regular expressions:
Basic regular expressions (BRE) which you call using grep PATTERN file
Extended regular expressions (ERE) which you call using grep -E PATTERN file
So by default grep makes use of BRE.
In the grep man page you'll find:
Basic vs Extended Regular Expressions
In basic regular expressions the meta-characters ?, +, {, |, (, and ) lose their
special meaning; instead use the backslashed versions \?, \+, \{, \|, \(, and \).
So, in your case the answer is:
$ grep "^\(WGS_LAT\|WGS_DLAT\)" location.txt
$ grep -E "^(WGS_LAT|WGS_DLAT)" location.txt
$ grep "^WGS_D\?LAT" location.txt
$ grep -E "^WGS_D?LAT" location.txt
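To confirm that the four commands are interchangeable, you can recreate the sample file and compare their output. A sketch assuming GNU grep, since \| and \? in a basic regex are GNU extensions rather than POSIX; location.txt is written to a scratch directory.

```shell
#!/bin/sh
# Sketch: check that the four answers select the same lines (assumes GNU grep).
cd "$(mktemp -d)" || exit 1
cat > location.txt <<'EOF'
WGS_LAT deg 12
WGS_LAT min 30
WGS_LAT sec 05
WGS_LAT hsec 29
WGS_LAT northSouth North
WGS_DLAT decimalDegreesLatitude 12.501469
WGS_LONG deg 07
WGS_LONG min 00
WGS_LONG sec 05
WGS_LONG hsec 61
WGS_LONG eastWest West
WGS_DLONG decimalDegreesLongitude -70.015606
EOF

a=$(grep '^\(WGS_LAT\|WGS_DLAT\)' location.txt)   # BRE alternation
b=$(grep -E '^(WGS_LAT|WGS_DLAT)' location.txt)   # ERE alternation
c=$(grep '^WGS_D\?LAT' location.txt)              # BRE optional D
d=$(grep -E '^WGS_D?LAT' location.txt)            # ERE optional D

if [ "$a" = "$b" ] && [ "$b" = "$c" ] && [ "$c" = "$d" ]; then
    echo "all four agree:"
    printf '%s\n' "$d"
else
    echo "results differ" >&2
    exit 1
fi
```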
First, you should always quote your regular expressions to protect them from the shell. For example, | has special meaning in the shell: it is the pipe operator, which passes the output of one program as input to another. So the unquoted grep ^WGS_LAT|^WGS_DLAT location.txt is interpreted as "run grep ^WGS_LAT and pipe its output into the command ^WGS_DLAT location.txt", which fails because there is no command named ^WGS_DLAT.
Next, grep uses Basic Regular Expressions by default, so to make | mean OR you need to either escape it as \| (a GNU extension) or use the -E flag to enable extended regular expressions (or -P, if you are using GNU grep, to enable Perl-compatible regular expressions). So all of these should work for you:
grep -E '^WGS_LAT|^WGS_DLAT' location.txt
grep -E '^(WGS_LAT|WGS_DLAT)' location.txt
grep '^WGS_LAT\|^WGS_DLAT' location.txt
Or, more simply, grep for lines starting with WGS_ and an optional D followed by LAT:
grep -E '^WGS_D?LAT' location.txt

Bash grep mac address unix/linux with colons

Why doesn't this work? Shouldn't this output 30:84:A9:9B:2A:67 from my text file?
grep [A-F0-9]\:{5}[A-F0-9] textfile.txt
Try this:
$ echo 30:84:A9:9B:2A:67 | grep -P "([A-F0-9]{2}:){5}[A-F0-9]{2}"
30:84:A9:9B:2A:67
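In your question, "[A-F0-9]:{5}" was trying to match a single hex character plus a colon five times: X:X:X:X:X:. But each part of a MAC address has two hex digits, and without parentheses around the character and the colon, {5} applies only to the colon.
Also, grep uses basic regular expressions (BRE) by default, so without -E or -P you need to escape the braces and parentheses (\{ \} \( \)).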
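If your grep lacks -P (it is a GNU option, and not every build has PCRE support), the same pattern can be written with -E, or as a plain BRE with escaped braces and parentheses. A sketch; like the answer above it only accepts upper-case hex digits, -o (print only the matched part) is a GNU/BSD option rather than POSIX, and the sample line with "mac=" around the address is made up.

```shell
#!/bin/sh
line='mac=30:84:A9:9B:2A:67 up'    # hypothetical sample line

# ERE: two hex digits and a colon, five times, then two more hex digits.
echo "$line" | grep -oE '([A-F0-9]{2}:){5}[A-F0-9]{2}'

# The same pattern in grep's default BRE syntax: the grouping
# parentheses and the interval braces need backslashes.
echo "$line" | grep -o '\([A-F0-9]\{2\}:\)\{5\}[A-F0-9]\{2\}'
```

Both commands print 30:84:A9:9B:2A:67.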

Egrep results are current command and garbage

I'm trying to egrep lines that contain nothing but a single occurrence of "Hihihihihihihi!", with arbitrarily many 'hi's.
Here is what I wrote:
egrep "^Hi(hi)*!$" myfile.txt
But it didn't work. After pressing enter, the command was displayed again:
egrep "^Hi(hi)*myfile.txt" myfile.txt
Can anyone help me?
Thanks!
The shell is interpreting !$ as a history expansion, substituting the last argument of the previous command.
To disable these shell substitutions, change the double quotes to single quotes.
egrep '^Hi(hi)*!$' myfile.txt
Alternatively, you can use the -x switch to match only whole lines, obviating the need for the ^ and $ characters, and thus avoiding the fatal !$ argument substitution:
egrep -x "Hi(hi)*!" myfile.txt
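Both fixes can be checked against a small test file. A sketch: the sample lines are invented, and it uses grep -E, which is what egrep runs. History expansion only happens in interactive shells, so inside a script like this even the double-quoted !$ would not be mangled; in an interactive bash session, set +H turns history expansion off altogether.

```shell
#!/bin/sh
f=$(mktemp)
printf '%s\n' 'Hi!' 'Hihihihihihihi!' 'Hihi! and more' 'Hello!' > "$f"

# Anchored pattern in single quotes: no history expansion inside '...'.
grep -E '^Hi(hi)*!$' "$f"    # prints Hi! and Hihihihihihihi!

# -x requires the whole line to match, so ^ and $ are not needed.
grep -Ex 'Hi(hi)*!' "$f"     # same two lines

rm -f "$f"
```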
You don't say what shell, but I suspect the problem you have is that the exclamation mark (!) is extra special to the shell. You need to escape that:
egrep "^Hi(hi)*\!$" myfile.txt
It should work in most shells where that's the case.
Changing the double quotes to single quotes is not enough in all shells: in tcsh, the exclamation mark is still special inside single quotes. I tested all this in tcsh; other shells will behave differently.
Try it with single quotes. I think the $ is being interpreted by bash as something, though I'm not sure what:
egrep '^Hi(hi)*!$' myfile.txt

Who reads the regex, Shell or the command?

When we use a regex to filter results (or for any other purpose), who interprets it: the command itself or the shell?
Take ls *.txt | sed -e 's/[AB]/a/': the *.txt is interpreted by the shell (this is not a regex; it is called globbing), and the sed script 's/[AB]/a/', with its regex [AB], is interpreted by sed.
See http://wiki.bash-hackers.org/syntax/expansion/globs for more about how bash does it.
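One way to watch this division of labour: the shell replaces *.txt with the matching file names before ls ever runs, while the quoted sed script reaches sed untouched. A sketch in a scratch directory; A.txt and B.txt are made-up file names.

```shell
#!/bin/sh
cd "$(mktemp -d)" || exit 1
touch A.txt B.txt

# Globbing: the shell expands *.txt, so printf receives two arguments.
printf '%s\n' *.txt              # prints A.txt and B.txt

# Quoting: the sed script is passed through as one literal argument.
printf '%s\n' 's/[AB]/a/'        # prints s/[AB]/a/

# sed, not the shell, applies the regex [AB] to each line of ls output.
ls *.txt | sed -e 's/[AB]/a/'    # prints a.txt twice
```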

Complex shell wildcard

I want to use echo to display the names (not the contents) of directories whose names are at least 2 characters long but don't begin with "an".
For example, if I had the following in the directory:
a as an23 an23 blue
I would only get back
as blue
I tried echo ^an* but that returns the directory with 1 character too.
Is there any way I can do this in the form echo <glob pattern>?
You can use the shell's extended globbing feature. In bash:
bash$ shopt -s extglob
bash$ echo !(?|an*)
The !(...) construct matches anything except the enclosed patterns; see the bash manual's Pattern Matching section for more.
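Here is the bash approach run in a scratch directory, as a sketch assuming bash. The directory names are the ones from the question, with the duplicated an23 created once. Note that extglob has to be switched on before bash parses the line that uses !(...), which is why it sits on its own line.

```shell
#!/usr/bin/env bash
cd "$(mktemp -d)" || exit 1
mkdir a as an23 blue     # sample names from the question

shopt -s extglob         # enable !(...) and the other extended patterns
echo !(?|an*)            # everything except one-character names and names starting with "an"
```

This prints as blue.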
In zsh:
zsh$ setopt extendedglob
zsh$ print *~(?|an*)
In this case the ~ removes the names matching the pattern after it, (?|an*), from the names matching the pattern before it, *. See the manual for more.
Since you want at least two characters in the names, you can use printf '%s\n' ??* to echo each such name on a separate line. You can then eliminate those names that start with an with grep -v '^an', leading to:
printf '%s\n' ??* | grep -v '^an'
The quotes aren't strictly necessary in the grep command with modern shells. Once upon a time, a quarter of a century or so ago, the Bourne shell had ^ as a synonym for |, so I still use quotes around carets.
If you absolutely must use echo instead of printf, then you'll have to map white space to newlines (assuming you don't have any names that contain white space).
I'm trying to do it with just the echo command, no grep either. Is that possible?
What about:
echo [!a]?* a[!n]*
The first term lists all the two-plus character names not beginning with a; the second lists all the two-plus character names where the first is a and the second is not n.
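The plain-glob answer can be checked in the same kind of scratch directory. A sketch; the two terms are expanded separately, so names starting with a are listed after the others, and in sh and bash a term that matches nothing is passed through literally (bash's nullglob option changes that).

```shell
#!/bin/sh
cd "$(mktemp -d)" || exit 1
mkdir a as an23 blue     # sample names from the question

# [!a]?* : two or more characters, the first is not "a"
# a[!n]* : the first character is "a", the second is not "n"
echo [!a]?* a[!n]*       # prints: blue as
```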
This should do it, but you'd likely be better off with ls or even find:
echo * | tr ' ' '\012' | egrep '..' | egrep -v '^an'
Shell globbing is a form of pattern matching similar to regexes, but it's not as powerful as egrep's regexes.
