Grep issues: Why is grep -i not grabbing the instance with special characters

I want to find this:
gxxcc/issdd/jaabb/krrss/lxxnn
On this (all capital letters)
AXXRR/BVVTTS/CRRTTDD/DEETTFF/EAABBRR/FRSSTT/GXXCC/ISSDD/JAABB_KRRSS_LXXNN/LL
I tried this
grep -i 'GXXCC/ISSDD/JAABB.KRRSS.LXXNN' filename.txt
grep -i 'GXXCC*ISSDD*JAABB*KRRSS*LXXNN' filename.txt
but, neither of those work. Any solution and explanation?

This, from your example, works fine, highlighting a portion of the text:
echo AXXRR/BVVTTS/CRRTTDD/DEETTFF/EAABBRR/FRSSTT/GXXCC/ISSDD/JAABB_KRRSS_LXXNN/LL |
egrep -i --color 'gxxcc/issdd/jaabb.krrss.lxxnn'
Avoid patterns like D*JAABB*K. That would match text like 'JAABBBBBBK' or 'DDDDDJAABK', rather than slashes or underscores.
The . dot matches exactly one character. Is your _ underscore maybe some crazy UTF-8 multibyte sequence, rather than ASCII 95? One way to tell is to put Kleene stars in your pattern: egrep -i --color 'jaabb.*krrss.*lxxnn'. Another is to view your text with hexdump -C.
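To see why the dot version matches and the star version does not, here is a small sketch that runs both patterns against the line from the question. The variable names (line, dot_hits, star_hits, sep_hex) are made up for illustration, and od stands in for hexdump on the assumption that it is more widely available.

```shell
line='AXXRR/BVVTTS/CRRTTDD/DEETTFF/EAABBRR/FRSSTT/GXXCC/ISSDD/JAABB_KRRSS_LXXNN/LL'

# '.' stands for exactly one character, so it covers both '/' and '_'.
dot_hits=$(printf '%s\n' "$line" | grep -ci 'gxxcc.issdd.jaabb.krrss.lxxnn' || true)

# '*' repeats the preceding character (C*, D*, ...); it never matches '/' or '_'.
star_hits=$(printf '%s\n' "$line" | grep -ci 'gxxcc*issdd*jaabb*krrss*lxxnn' || true)

echo "dot pattern: $dot_hits matching line(s), star pattern: $star_hits"

# Dump the byte of a separator to check that it is a plain ASCII '_' (hex 5f).
sep_hex=$(printf '_' | od -An -tx1 | tr -d ' \n')
echo "separator byte: $sep_hex"
```

If the separator in your real file dumps as more than one byte, a single dot may not cover it in every locale, which is where .* helps.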

Related

grep: Invalid regular expression

I have a text file which looks like this:
haha1,haha2,haha3,haha4
test1,test2,test3,test4,[offline],test5
letter1,letter2,letter3,letter4
output1,output2,[offline],output3,output4
check1,[core],check2
num1,num2,num3,num4
I need to exclude all the lines that contain "[ ]" and write the remaining lines to another file.
I'm currently using this command:
grep ",[" loaded.txt | wc -l > newloaded.txt
But it's giving me an error:
grep: Invalid regular expression
Use grep -F to treat the search pattern as a fixed string. You could also replace wc -l with grep -c.
grep -cF ",[" loaded.txt > newloaded.txt
If you're curious, [ is a special character. If you don't use -F then you'll need to escape it with a backslash.
grep -c ",\[" loaded.txt > newloaded.txt
By the way, I'm not sure why you're using wc -l anyways...? From your problem description, it sounds like grep -v might be more appropriate. -v inverts grep's normal output, printing lines that don't match.
grep -vF ",[" loaded.txt > newloaded.txt
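As a rough check of the three variants, the sketch below runs them against the sample data from the question. The temporary file and the variable names are assumptions for the demo only.

```shell
# Sample data from the question, written to a throwaway file.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
haha1,haha2,haha3,haha4
test1,test2,test3,test4,[offline],test5
letter1,letter2,letter3,letter4
output1,output2,[offline],output3,output4
check1,[core],check2
num1,num2,num3,num4
EOF

# -F: ",[" is a literal string; -c counts the matching lines.
with_bracket=$(grep -cF ',[' "$tmp")

# Same count with a regex, escaping the bracket.
with_bracket_re=$(grep -c ',\[' "$tmp")

# -v: keep only the lines that do NOT contain ",[".
kept=$(grep -vF ',[' "$tmp")

echo "fixed string: $with_bracket, escaped regex: $with_bracket_re"
printf '%s\n' "$kept"
rm -f "$tmp"
```

Single quotes are used around the pattern here so the shell passes it to grep untouched.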
An alternative method to Grep
It's unclear if you want to remove lines that might contain either bracket [], or only the ones where the brackets specifically surround characters. Regardless of which method you intend to use, sed can easily remove lines that fit a definitive pattern:
To delete only lines that contained both brackets surrounding characters [...]:
sed '/\[.*\]/d' loaded.txt > newloaded.txt
Another approach might be to remove any line that contained either bracket:
sed '/\[/d;/\]/d' loaded.txt > newloaded.txt
(eg. lines containing either [ or ] would be deleted)
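The difference between the two sed commands only shows up on lines with an unpaired bracket. A minimal sketch, using a made-up three-line input (the "c,[broken,d" line is hypothetical and not from the question):

```shell
# Hypothetical input: one line with a bracket pair, one with a lone '[', one clean.
input='a,[offline],b
c,[broken,d
e,f,g'

# Deletes only lines where '[' is later followed by ']'.
pair_only=$(printf '%s\n' "$input" | sed '/\[.*\]/d')

# Deletes lines containing either bracket.
any_bracket=$(printf '%s\n' "$input" | sed '/\[/d;/\]/d')

printf 'pair_only:\n%s\n' "$pair_only"
printf 'any_bracket:\n%s\n' "$any_bracket"
```

The first command keeps the "c,[broken,d" line, the second drops it.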
Your grep command doesn't seem to be excluding anything. Also, why are you using wc? I thought you want the lines, not their count.
So if you just want the lines, as you say, that don't have [], then this should work:
grep -v "\[" loaded.txt > new.txt
You can also use awk for this:
awk -F\[ 'NF==1' file > newfile
cat newfile
haha1,haha2,haha3,haha4
letter1,letter2,letter3,letter4
num1,num2,num3,num4
Or this:
awk '!/\[/' file
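If you would rather not escape anything in awk either, a fixed-string test with index() is one possible variant. This is only a sketch on a shortened, made-up input; the variable names are illustrative.

```shell
input='haha1,haha2
test1,[offline],test2
num1,num2'

# index($0, "[") returns 0 when the line has no '[', so no regex escaping is needed.
clean=$(printf '%s\n' "$input" | awk 'index($0, "[") == 0')
printf '%s\n' "$clean"
```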

Trying to use grep to find something, then output a different part of the line

Say for instance I'm searching a line that is like this:
Color asdf
and I use grep to find that line, like grep asdf file.txt
How would I then display Color? Learning linux is hard.
With the command line tool sed you can replace strings by using regular expressions:
echo "Color asdf" | sed 's/\([^ ]*\).*/\1/'
This part: \([^ ]*\).* is a regular expression. The first part, [^ ]*, matches as many non-space characters as possible, and whatever sits between \( and \) is captured and can be referred to as \1. The trailing .* matches the rest of the line, so the whole line is replaced by just the first word, via \1 in the replacement part of the sed command.
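To tie this back to the original question (find the line containing asdf, then show the first word), the same substitution can be limited to matching lines. A small sketch, with a made-up three-line input:

```shell
# Hypothetical file contents: only the second line contains "asdf".
input='Shape qwer
Color asdf
Size zxcv'

# /asdf/ selects the matching lines; the s command keeps only the captured
# first word (\1); -n plus the p flag prints just those lines.
first_word=$(printf '%s\n' "$input" | sed -n '/asdf/s/\([^ ]*\).*/\1/p')
echo "$first_word"
```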
Here some more info about sed:
http://linux.about.com/od/commands/a/Example-Uses-Of-Sed-Cmdsedxa.htm
You could use sed:
sed -n 's/[[:space:]][[:space:]]*asdf$//p' file.txt
Details:
The -n option tells sed not to print the pattern space automatically. Basically, it doesn't output anything unless you tell it to.
The s command of sed replaces text. Here, if a line ends with asdf, preceded by at least one whitespace character, we replace all of that with nothing and then print the line (notice the p flag at the end of the s command). The printing is only done if something was actually replaced. More information about the s command can be found e. g. in the GNU sed manual.
Edit for clarity: When using single quotes, parameter expansion does not work and thus, variables won't be replaced. To use variables, use double quotes:
search=asdf
sed -n "s/[[:space:]][[:space:]]*${search}\$//p" file.txt
If you'd really like to use grep here, you could pipe the output from grep into cut:
grep -h asdf *.txt | cut -s -d ' ' -f 1
Note that the blank after -d has to be quoted (' '). Unquoted, the shell treats it as ordinary whitespace between arguments, and cut would take the next word as its delimiter instead. I'm assuming your fields are blank-delimited rather than tab-delimited.
But, yeah, sed or awk are probably your friends here... :-)
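For comparison, here is a sketch of the awk route next to the grep-plus-cut route, on a made-up input. It assumes the search term is always the second whitespace-separated field, which the question suggests but does not state.

```shell
input='Shape qwer
Color asdf
Size zxcv'

# awk splits each line on whitespace; print field 1 of lines whose field 2 is asdf.
by_awk=$(printf '%s\n' "$input" | awk '$2 == "asdf" { print $1 }')

# The grep-then-cut route, with the blank delimiter quoted for the shell.
by_cut=$(printf '%s\n' "$input" | grep asdf | cut -s -d ' ' -f 1)

echo "awk: $by_awk, cut: $by_cut"
```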
You can also highlight the pattern in the line using grep:
grep --colour -o 'asdf' file.txt
Edit: the -o option prints only the matched part of each line.

Replace string between square brackets with sed

I have some strings in a textfile that look like this:
[img:3gso40ßf]
I want to replace them to look like normal BBCode:
[img]
How can I do that with sed? I tried this one but it doesn't do anything:
sed -i 's/^[img:.*]/[img]/g' file.txt
Escape those square brackets
Square brackets are metacharacters: they have a special meaning in POSIX regular expressions. If you mean [ and ] literally, you need to escape those characters in your regexp:
$ sed -i .bak 's/\[img:.*\]/\[img\]/g' file.txt
Use [^]]* instead of .*
Because * is greedy, .* will capture more than what you want; see Jidder's comment. To fix this, use [^]]*, which captures a sequence of characters up to (but excluding) the first ] encountered.
$ sed -i .bak 's/\[img:[^]]*\]/\[img\]/g' file.txt
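The greediness is easiest to see on a line with more than one tag. A minimal sketch, run on a hypothetical line through a pipe rather than with -i so no file is touched:

```shell
# Hypothetical line with two tags and a stray ']' at the end.
line='[img:3gso40ff] text [img:4t5457th] more]'

# .* runs to the LAST ']' on the line, swallowing everything in between.
greedy=$(printf '%s\n' "$line" | sed 's/\[img:.*\]/[img]/g')

# [^]]* stops at the FIRST ']' after each "[img:".
bounded=$(printf '%s\n' "$line" | sed 's/\[img:[^]]*\]/[img]/g')

echo "greedy:  $greedy"
echo "bounded: $bounded"
```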
Are you using an incorrect sed -i syntax?
(Thanks to j.a. for his comment.)
Depending on the flavour of sed that you're using, you may be allowed to use sed -i without specifying any <extension> argument, as in
$ sed -i 's/foo/bar/' file.txt
However, in other versions of sed, such as the one that ships with Mac OS X, sed -i expects a mandatory <extension> argument, as in
$ sed -i .bak 's/foo/bar/' file.txt
If you omit that extension argument (.bak, here), you'll get a syntax error. You should check out your sed's man page to figure out whether that argument is optional or mandatory.
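One spelling that both GNU sed and the BSD sed on Mac OS X accept is to attach the suffix directly to -i, with no space in between. A sketch on a throwaway directory (the directory and file names are made up for the demo):

```shell
# Work on a throwaway copy so nothing real is modified.
dir=$(mktemp -d)
printf 'foo one\nfoo two\n' > "$dir/file.txt"

# Suffix attached to -i (no space): understood by both GNU and BSD/macOS sed.
sed -i.bak 's/foo/bar/' "$dir/file.txt"

edited=$(cat "$dir/file.txt")
backup=$(cat "$dir/file.txt.bak")
printf 'edited:\n%s\nbackup:\n%s\n' "$edited" "$backup"
rm -rf "$dir"
```

The backup keeps the original text, so you can compare before deleting it.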
Match a specific number of characters
Is there a way to tell sed that there are always 8 random characters after the colon?
Yes, there is. If the number of characters between the colon and the closing square bracket is always the same (8, here), you can make your command more specific:
$ sed -i .bak 's/\[img:[^]]\{8\}\]/\[img\]/g' file.txt
Example
# create some content in file.txt
$ printf "[img:3gso40ßf]\nfoo [img:4t5457th]\n" > file.txt
# inspect the file
$ cat file.txt
[img:3gso40ßf]
foo [img:4t5457th]
# carry out the substitutions
$ sed -i .bak 's/\[img:[^]]\{8\}\]/\[img\]/g' file.txt
# inspect the file again and make sure everything went smoothly
$ cat file.txt
[img]
foo [img]
# if you're happy, delete the backup that sed created
$ rm file.txt.bak

Grep Syntax with Capitals

I'm trying to write a script with a file as an argument that greps the text file to find any word that starts with a capital and has 8 letters following it. I'm bad with syntax so I'll show you my code, I'm sure it's an easy fix.
grep -o '[A-Z][^ ]*' $1
I'm not sure how to specify that:
a) it starts with a capital letter, and
b)that it's a 9 letter word.
Cheers
EDIT:
As an edit I'd like to add my new code:
while read p
do
echo $p | grep -Eo '^[A-Z][[:alpha:]]{8}'
done < $1
I still can't get it to work, any help on my new code?
'[A-Z][^ ]*' will match one character between A and Z, followed by zero or more non-space characters. So it would match any A-Z character on its own.
Use \b to indicate a word boundary, and a quantifier inside braces, for example:
grep '\b[A-Z][a-z]\{8\}\b'
If you just did grep '[A-Z][a-z]\{8\}' that would match (for example) "aaaaHellosailor".
I use \{8\}; the braces need to be escaped unless you use grep -E, also known as egrep, which uses Extended Regular Expressions. Vanilla grep, which you are using, uses Basic Regular Expressions. Also note that \b is not part of the standard, but it is commonly supported.
If you use ^ at the beginning and $ at the end then it will not find "Wiltshire" in "A Wiltshire pig makes great sausages"; it will only find lines that consist of a single 9-letter capitalized word and nothing else.
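A quick sketch of the two behaviours on that sentence. It assumes a grep that supports -o and \b (GNU grep does; neither is required by POSIX), and the variable names are only for illustration.

```shell
sentence='A Wiltshire pig makes great sausages'

# \b on both sides: a capital followed by exactly 8 lowercase letters, as a whole word.
word=$(printf '%s\n' "$sentence" | grep -o '\b[A-Z][a-z]\{8\}\b' || true)

# ^...$ instead requires the entire line to be that one word, so nothing matches.
whole_line=$(printf '%s\n' "$sentence" | grep -c '^[A-Z][a-z]\{8\}$' || true)

echo "word match: $word, whole-line matches: $whole_line"
```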
This works for me:
$ echo "one-Abcdefgh.foo" | grep -o -E '[A-Z][[:alpha:]]{8}'
$ echo "one-Abcdefghi.foo" | grep -o -E '[A-Z][[:alpha:]]{8}'
Abcdefghi
$
Note that this doesn't handle extensions or prefixes. If you want to FORCE the input to be a 9-letter capitalized word, we need to be more explicit:
$ echo "one-Abcdefghij.foo" | grep -o -E '\b[A-Z][[:alpha:]]{8}\b'
$ echo "Abcdefghij" | grep -o -E '\b[A-Z][[:alpha:]]{8}\b'
$ echo "Abcdefghi" | grep -o -E '\b[A-Z][[:alpha:]]{8}\b'
Abcdefghi
$
I have a test file named 'testfile' with the following content:
Aabcdefgh
Babcdefgh
cabcdefgh
eabcd
Now you can use the following command to grep in this file:
grep -Eo '^[A-Z][[:alpha:]]{8}' testfile
The code above is equivalent to:
cat testfile | grep -Eo '^[A-Z][[:alpha:]]{8}'
This matches
Aabcdefgh
Babcdefgh
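Putting the pieces together, the script from the question could look roughly like this. It is a sketch, not the only way to do it: find_cap_words is a hypothetical helper name, LC_ALL=C is an assumption to keep [A-Z] limited to ASCII capitals, and \b again relies on a grep that supports it.

```shell
# Hypothetical helper: print every 9-letter capitalized word in the file given as $1.
find_cap_words() {
    # grep reads the file itself, so no while-read loop is needed;
    # quoting "$1" keeps file names with spaces intact.
    LC_ALL=C grep -Eo '\b[A-Z][[:alpha:]]{8}\b' "$1"
}

# Small demo on a throwaway file.
demo=$(mktemp)
printf 'Aabcdefgh and Babcdefgh\ncabcdefgh eabcd Toolongword\n' > "$demo"
found=$(find_cap_words "$demo")
printf '%s\n' "$found"
rm -f "$demo"
```

Unlike the ^ version, this also finds matching words in the middle of a line.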

Grep not as a regular expression

I need to search for a PHP variable $someVar. However, Grep thinks that I am trying to run a regex and is complaining:
$ grep -ir "Something Here" * | grep $someVar
Usage: grep [OPTION]... PATTERN [FILE]...
Try `grep --help' for more information.
$ grep -ir "Something Here" * | grep "$someVar"
<<Here it returns all rows with "someVar", not only those with "$someVar">>
I don't see an option for telling grep not to interpret the string as a regex, but to include the $ as just another string character.
Use fgrep (deprecated), grep -F or grep --fixed-strings, to make it treat the pattern as a list of fixed strings, instead of a regex.
For reference, the documentation mentions (excerpts):
-F --fixed-strings Interpret the pattern as a list of fixed
strings (instead of regular expressions), separated by newlines, any
of which is to be matched. (-F is specified by POSIX.)
fgrep is the same as grep -F. Direct invocation as fgrep is
deprecated, but is provided to allow historical applications that rely
on them to run unmodified.
For the complete reference, check:
https://www.gnu.org/savannah-checkouts/gnu/grep/manual/grep.html
grep -F is a standard way to tell grep to interpret argument as a fixed string, not a pattern.
You have to tell grep that you are using a fixed string instead of a pattern, using -F:
grep -ir "Something Here" * | grep -F \$someVar
In this question, the main issue is not about grep interpreting $ as a regex. It's about the shell substituting $someVar with the value of the environment variable someVar, likely the empty string.
So in the first example, it's like calling grep without any argument, and that's why it gives you a usage output. The second example should not return all rows containing someVar but all lines, because the empty string is in all lines.
To tell the shell to not substitute, you have to use '$someVar' or \$someVar. Then you'll have to deal with the grep interpretation of the $ character, hence the grep -F option given in many other answers.
So one valid answer would be:
grep -ir "Something Here" * | grep '$someVar'
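The two layers (shell expansion first, then grep's own interpretation) can be seen in a small sketch. The two input lines are made up, and someVar is set to the empty string to stand in for an unset variable.

```shell
# Two hypothetical lines of PHP-ish text; only the first contains "$someVar".
input='echo $someVar;
echo someVar;'
someVar=''   # stands in for a variable that is not set in the shell

# Double quotes: the shell expands $someVar to an empty pattern,
# and the empty pattern matches every line.
expanded=$(printf '%s\n' "$input" | grep -c "$someVar" || true)

# Single quotes keep the dollar sign away from the shell;
# -F keeps it away from the regex engine.
literal=$(printf '%s\n' "$input" | grep -cF '$someVar' || true)

echo "double-quoted: $expanded line(s), single-quoted with -F: $literal line(s)"
```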
+1 for the -F option, it shall be the accepted answer.
Also, I had a "strange" behaviour while searching for the -I.. pattern in my files, as the -I was treated as an option of grep. To avoid this kind of error, we can explicitly mark the end of the command's options using --.
Example:
grep -HnrF -- <pattern> <files>
Hope that'll help someone.
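A short sketch of that case, on two made-up compiler command lines. The -e form is shown as an alternative way to mark the next argument as the pattern.

```shell
input='gcc -I.. -c foo.c
gcc -O2 -c bar.c'

# "--" ends option parsing, so -I.. is taken as the (fixed-string) pattern.
hits=$(printf '%s\n' "$input" | grep -cF -- '-I..' || true)

# -e PATTERN is another way to say "the next argument is the pattern".
hits_e=$(printf '%s\n' "$input" | grep -cF -e '-I..' || true)

echo "with --: $hits, with -e: $hits_e"
```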
Escape the $ by putting a \ in front of it.
