How to grep non-keyboard characters? - linux

I'm trying to use grep to find every μ in the files under a directory. Unfortunately, μ is not a keyboard character. Any ideas?
BTW, for normal keyboard words, I could use
find / -type f -print | xargs grep -inE <search_word> 2>/dev/null
to find out all plain text files that contain the search word.

Would you mind using sed instead of grep?
sed -n '/\xb5/p'
However, grep should also work:
grep -P '\xb5'

In Bash, you can use the shell's quoting facilities to pass non-ASCII content. In order to correctly identify the search string, we need to know the encoding of the files you are grepping. If they are in UTF-8, you need a different search string than if they are in ISO-8859-1 or UTF-16.
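If your shell's locale matches the encoding of the files, this should all work out of the box; otherwise, here are a few workarounds. Note that ISO-8859-1 has no Greek letter mu: its byte B5 is the micro sign U+00B5 (C2 B5 in UTF-8), which looks the same as U+03BC, so the μ you are after may be either character.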
# grep ISO-8859-1 \xB5
grep $'\xB5' file
# grep UTF-8 U+03BC
grep $'\xCE\xBC' file
# grep UTF-16be U+03BC
grep $'\x03\xBC' file
# grep UTF-16le U+03BC
grep $'\xBC\x03' file
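To see which byte sequence finds which file, here is a small bash sketch that searches a directory recursively, as in the question. It assumes bash (for $'...' quoting and printf's \x escapes) and GNU grep; the temporary directory and file names are made up for the demo. Running grep under LC_ALL=C makes it compare raw bytes, so the result does not depend on your locale.

```shell
#!/usr/bin/env bash
# Sketch: find mu / micro sign in files of different encodings.
# Assumes bash and GNU grep; the directory and file names are hypothetical.
dir=$(mktemp -d)
printf 'speed 5 \xce\xbcs\n' > "$dir/utf8_greek.txt"  # U+03BC GREEK SMALL LETTER MU, UTF-8
printf 'speed 5 \xc2\xb5s\n' > "$dir/utf8_micro.txt"  # U+00B5 MICRO SIGN, UTF-8
printf 'speed 5 \xb5s\n'     > "$dir/latin1.txt"      # U+00B5 MICRO SIGN, ISO-8859-1
printf 'speed 5 us\n'        > "$dir/ascii.txt"       # no mu at all

# List (recursively, sorted) the files that contain the byte sequence "$1".
# LC_ALL=C makes grep and sort work byte by byte.
find_bytes() {
  (cd "$dir" && LC_ALL=C grep -rl -- "$1" . | LC_ALL=C sort)
}

greek=$(find_bytes $'\xce\xbc')
micro_utf8=$(find_bytes $'\xc2\xb5')
micro_byte=$(find_bytes $'\xb5')   # also hits the UTF-8 micro sign, whose 2nd byte is B5

printf 'Greek mu, UTF-8:\n%s\n' "$greek"
printf 'Micro sign, UTF-8:\n%s\n' "$micro_utf8"
printf 'Byte B5 anywhere:\n%s\n' "$micro_byte"
rm -rf "$dir"
```

The last search shows why the encoding matters: the single byte B5 also occurs inside the UTF-8 encoding of the micro sign, so a byte-level search for it can report UTF-8 files as well as ISO-8859-1 ones.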
Some older versions of grep have a problem with non-ASCII characters; as a workaround, you can also use Perl.
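perl -CSD -ne 'print if /\x{03BC}/' file
Here -CSD makes Perl decode its input and encode its output as UTF-8, and \x{03BC} is Perl's code-point escape (inside a regex, \u would instead uppercase the next character). If your files are in another encoding, you will have to adjust Perl's I/O layers accordingly.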

Related

GREP to show files WITH text and WITHOUT text

I am trying to search for files that contain specific text but do not contain some other text, and to show only the file names.
Here is my code:
grep -v "TEXT1" *.* | grep -ils "ABC2"
However, it returns:
(standard input)
Please suggest. Thanks a lot.
The output should only show the filenames.
Here's one way to do it, assuming you want to match these terms anywhere in the file.
grep -LZ 'TEXT1' *.* | xargs -0 grep -li 'ABC2'
-L lists the files that do not contain the given search term
use -LiZ if you want to match TEXT1 irrespective of case
The -Z option makes grep terminate each file name with a NUL character, and xargs -0 then splits its input on NUL characters, so file names containing spaces are passed through intact
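A small sketch of the file-level version, with file names containing spaces. It assumes GNU grep and GNU xargs (-r skips running grep when the list is empty); the file names and contents are made up for the demo.

```shell
#!/usr/bin/env bash
# Sketch: files containing ABC2 (any case) but not TEXT1 anywhere.
# Assumes GNU grep/xargs; the demo files are hypothetical.
dir=$(mktemp -d)
cd "$dir" || exit 1
printf 'abc2 here\n'        > 'keep me.txt'   # ABC2, no TEXT1 -> listed
printf 'ABC2\nTEXT1\n'      > 'drop me.txt'   # contains TEXT1 -> excluded
printf 'nothing relevant\n' > 'other.txt'     # no ABC2 -> excluded

# -L: names of files WITHOUT a match; -Z: NUL after each name;
# xargs -0: split on NUL, so the spaces in the names survive.
result=$(grep -LZ 'TEXT1' *.* | xargs -0 -r grep -li 'ABC2')
printf '%s\n' "$result"
```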
If you want to check these two conditions on the same line instead of anywhere in the file:
grep -lP '^(?!.*TEXT1).*(?i:ABC2)' *.*
-P enables PCRE, which I assume your grep supports since the question is tagged linux
(?!regexp) is a negative lookahead construct, so ^(?!.*TEXT1) will ensure the line doesn't have TEXT1
(?i:ABC2) will match ABC2 case insensitively
Use grep -liP '^(?!.*TEXT1).*ABC2' if you want to match both terms irrespective of case
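To see the difference from the file-level version, here is a sketch that runs the lookahead pattern over three made-up files. It assumes GNU grep built with PCRE support.

```shell
#!/usr/bin/env bash
# Sketch: the -P lookahead checks both conditions on the SAME line.
# Assumes GNU grep with -P; the demo files are hypothetical.
dir=$(mktemp -d)
cd "$dir" || exit 1
printf 'abc2 alone\n'          > a.txt  # ABC2 and no TEXT1 on that line -> listed
printf 'abc2 TEXT1 together\n' > b.txt  # TEXT1 on the same line -> not listed
printf 'TEXT1\nabc2\n'         > c.txt  # on different lines -> listed

result=$(grep -lP '^(?!.*TEXT1).*(?i:ABC2)' *.*)
printf '%s\n' "$result"
```

c.txt is listed even though it contains TEXT1, because its abc2 line does not; the -L pipeline would exclude it.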
(standard input)
You get this output because grep -l is used in a pipeline: your second grep reads from stdin, not from a file, so -l prints (standard input) instead of a file name.
Alternatively, you can do it with a single awk command:
awk '/ABC2/ && !/TEXT1/ {print FILENAME; nextfile}' *.* 2>/dev/null

Grep issues: Why is grep -i not grabbing the instance with special characters

I want to find this:
gxxcc/issdd/jaabb/krrss/lxxnn
In this (all capital letters):
AXXRR/BVVTTS/CRRTTDD/DEETTFF/EAABBRR/FRSSTT/GXXCC/ISSDD/JAABB_KRRSS_LXXNN/LL
I tried this
grep -i 'GXXCC/ISSDD/JAABB.KRRSS.LXXNN' filename.txt
grep -i 'GXXCC*ISSDD*JAABB*KRRSS*LXXNN' filename.txt
but neither of those works. Any solution and explanation?
This, from your example, works fine, highlighting a portion of the text:
echo AXXRR/BVVTTS/CRRTTDD/DEETTFF/EAABBRR/FRSSTT/GXXCC/ISSDD/JAABB_KRRSS_LXXNN/LL |
egrep -i --color 'gxxcc/issdd/jaabb.krrss.lxxnn'
Avoid patterns like D*JAABB*K. In a regular expression, * is not a wildcard as in a shell glob; it means "zero or more of the preceding character", so that pattern would match text like 'JAABBBBBBK' or 'DDDDDJAABK' rather than slashes or underscores.
The . dot matches exactly one character. Is your _ underscore perhaps some odd UTF-8 multibyte sequence rather than ASCII 95? One way to tell is to put Kleene stars in your pattern: egrep -i --color 'jaabb.*krrss.*lxxnn'. Another is to view your text with hexdump -C.
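A quick sketch that runs the three kinds of pattern against the line from the question and counts matching lines (assuming GNU grep; od -c is the POSIX tool, and hexdump -C gives a similar view):

```shell
#!/usr/bin/env bash
# Sketch: '.' vs '*' vs '.*' on the example line (GNU grep assumed).
line='AXXRR/BVVTTS/CRRTTDD/DEETTFF/EAABBRR/FRSSTT/GXXCC/ISSDD/JAABB_KRRSS_LXXNN/LL'

# '.' matches exactly one character, so it covers the '_' separators.
dot=$(printf '%s\n' "$line" | grep -ic 'gxxcc/issdd/jaabb.krrss.lxxnn')

# '*' repeats the previous character: 'C*I' needs an I right after the C's.
star=$(printf '%s\n' "$line" | grep -ic 'GXXCC*ISSDD*JAABB*KRRSS*LXXNN')

# '.*' is the regex counterpart of a shell glob's '*'.
dotstar=$(printf '%s\n' "$line" | grep -ic 'gxxcc.*issdd.*jaabb.*krrss.*lxxnn')

echo "dot=$dot star=$star dotstar=$dotstar"

# Inspect the actual characters to rule out multibyte look-alikes.
printf '%s\n' "$line" | od -c
```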

xargs error: File name too long

I have a file that contains a list of malicious file names. Many of the file names contain blank spaces. I need to find them and change their permissions. I have tried the following:
grep -E ". " suspicious.txt | xargs -0 chmod 000
But I am getting an error:
:File name too long
Any ideas?
OK, you have one filename per line in your file, and the problem is that xargs without -0 will treat spaces and tabs as well as the newlines as file separators, while xargs with -0 expects the filenames to be separated by NUL characters and won't care about the newlines at all.
So turn the newlines into NULs before feeding the result into the xargs -0 command:
grep -E ". " suspicious.txt | tr '\n' '\0' | xargs -0 chmod 000
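A sketch of the newline-to-NUL fix, using a made-up list of paths with spaces (GNU tools assumed; the file names are hypothetical). The commented line shows GNU xargs's -d option, which can replace the tr step.

```shell
#!/usr/bin/env bash
# Sketch: one path per line in a list file, some containing spaces.
# Assumes GNU grep/tr/xargs/find; the demo files are hypothetical.
dir=$(mktemp -d)
cd "$dir" || exit 1
touch 'bad file one' 'bad file two' 'plain'
printf '%s\n' "$dir/bad file one" "$dir/bad file two" "$dir/plain" > suspicious.txt

# Keep lines containing a space, turn the newline terminators into NULs,
# and let xargs -0 pass each whole line to chmod as a single argument.
grep -E '. ' suspicious.txt | tr '\n' '\0' | xargs -0 chmod 000

# GNU xargs alternative without tr:
#   grep -E '. ' suspicious.txt | xargs -d '\n' chmod 000

# Files whose mode is now exactly 000.
locked=$(find . -type f -perm 000 | LC_ALL=C sort)
printf '%s\n' "$locked"
```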
Update:
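See Mark Reed's correct answer. This was wrong because the NULs were needed between the file names read from the file, not after file names printed by grep: -Z only changes the character that follows a file name grep prints (as with -l), while matched lines still end in newlines.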
Original:
You need something more like this:
grep -Z -E ". " suspicious.txt | xargs -0 chmod 000
From xargs man page:
Because Unix filenames can contain blanks and newlines, this default behaviour is often problematic; filenames containing blanks and/or newlines are incorrectly processed by xargs. In these situations it is better to use the -0 option, which prevents such problems. When using this option you will need to ensure that the program which produces the input for xargs also uses a null character as a separator.
From grep man page:
-Z, --null
Output a zero byte (the ASCII NUL character) instead of the character that normally follows a file name. For example, grep -lZ outputs a zero byte after each file name instead of the usual newline. This option makes the output unambiguous, even in the presence of file names containing unusual characters like newlines. This option can be used with commands like find -print0, perl -0, sort -z, and xargs -0 to process arbitrary file names, even those that contain newline characters.

Grep and inserting a string

I have a text file with a bunch of file paths, such as:
web/index.erb
web/contact.erb
...
and so on. I need to insert a line of code before the </head> tag in every one of these files, and I'm trying to figure out how to do this without opening each file, of course. I've heard of sed, but I've never used it before. I was hoping there would be a grep command, maybe?
Thanks
xargs can be used to apply sed (or any other command) to each filename or argument in a list. So combining that with Rom1's answer gives:
xargs sed -i 's/<\/head>/myline\n<\/head>/g' < fileslist.txt
while IFS= read -r f ; do
sed -i '/<\/head>/i*iamthelineofcode*' "$f"
done <iamthefileoffiles.list
or
sed -i '/<\/head>/i*iamthelineofcode*' $(cat iamthefileoffiles.list)
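A runnable sketch of the while-read approach. It assumes GNU sed (in-place -i without a suffix, and the one-line `i text` form); the web/*.erb files, the list file name and the inserted line are hypothetical stand-ins for yours.

```shell
#!/usr/bin/env bash
# Sketch: insert a line before </head> in every file named in a list.
# Assumes GNU sed; files, list name and inserted line are hypothetical.
dir=$(mktemp -d)
cd "$dir" || exit 1
mkdir web
for f in web/index.erb web/contact.erb; do
  printf '<html>\n<head>\n<title>t</title>\n</head>\n<body></body>\n</html>\n' > "$f"
done
printf '%s\n' web/index.erb web/contact.erb > files.list

line='<link rel="stylesheet" href="/site.css">'

# IFS= and -r keep leading spaces and backslashes in the paths intact.
while IFS= read -r f; do
  sed -i "/<\/head>/i $line" "$f"
done < files.list

cat web/index.erb
```

If the inserted line itself contains backslashes, sed will interpret them, so escape them first.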

Grep Search all files in directory for string1 AND string2

How can I use grep in Cygwin to find all files that contain BOTH words?
This is what I use to search all files in a directory recursively for one word:
grep -r "db-connect.php" .
How can I extend the above to look for files that contain both "db-connect.php" AND "version".
I tried this: grep -r "db-connect.php\|version" . but this is an OR, i.e. it gets files that contain one or the other.
Thanks all for any help
grep -r db-connect.php . | grep version
If you want the files that contain several strings, possibly on different lines, use the following command:
grep -rl expr1 | xargs grep -l expr2 | xargs grep -l expr3
This will give you a list of files that contain expr1, expr2, and expr3.
Note that if any of the file names contain spaces, the pipeline will fail on those files. You can fix this by adding -Z to every grep that feeds a pipe and -0 to every xargs: grep -rlZ expr1 | xargs -0 grep -lZ expr2 | xargs -0 grep -l expr3
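A sketch of the NUL-separated chain on made-up files, one of them in a directory whose name contains a space (GNU grep and xargs assumed):

```shell
#!/usr/bin/env bash
# Sketch: files containing BOTH strings, anywhere in the file.
# Assumes GNU grep/xargs; the demo files are hypothetical.
dir=$(mktemp -d)
cd "$dir" || exit 1
mkdir -p 'sub dir'
printf 'db-connect.php\nversion 2\n' > 'sub dir/both.php'  # both, on different lines
printf 'db-connect.php only\n'       > 'sub dir/one.php'   # only the first string
printf 'version only\n'              > two.php             # only the second string

# First grep emits NUL-terminated names; xargs -0 hands them to the second grep.
result=$(grep -rlZ 'db-connect\.php' . | xargs -0 -r grep -l 'version')
printf '%s\n' "$result"
```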
grep "db-connect.php" * | cut -d: -f1 | xargs grep "version"
I didn't try it in recursive mode, but it should work the same way.
To AND together multiple searches, use multiple lookahead assertions, one per search term except the last one:
instead of writing
grep -P A * | grep B
you write
grep -P '^(?=.*A).*B' *
grep -Pr '^(?=.*db-connect\.php).*version' .
Don’t write
grep -P 'A.*B|B.*A' *
because that fails on overlaps, whereas the (?=…)(?=…) technique does not.
You can also add NOT operators. To search for lines that don’t match X, you would normally use -v on the command line, but you can’t do that when X is only part of a larger pattern. In that case, you add (?=^(?:(?!X).)*$) to the pattern to exclude any line containing X.
So imagine you want to match lines with all three of A, B, and then either of C or D, but which don’t have X or Y in them. All you need is this:
grep -P '^(?=.*A)(?=.*B)(?=^(?:(?!X).)*$)(?=^(?:(?!Y).)*$).*(?:C|D)' *
In some shells and settings, you’ll have to escape the ! if it’s your history-substitution character.
There, isn’t that pretty cool?
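A quick way to check lookahead patterns is to run them over a few made-up lines. This sketch assumes GNU grep built with PCRE (-P); A, B, C, D, X and Y are the placeholders from the text. It also shows that without a leading ^, a pattern like (?=.*A)B only looks for A to the right of the point where B starts.

```shell
#!/usr/bin/env bash
# Sketch of the lookahead AND/NOT technique (GNU grep -P assumed).
# The test lines are hypothetical demo data.
lines='A then B
B then A
only A
A B C
A B D X
A B D'

# A and B on the same line, in either order.
both=$(printf '%s\n' "$lines" | grep -cP '^(?=.*A).*B')

# Unanchored: the lookahead only sees text from B's position onward.
unanchored=$(printf '%s\n' "$lines" | grep -cP '(?=.*A)B')

# A, B, one of C or D, and neither X nor Y.
combo=$(printf '%s\n' "$lines" |
  grep -P '^(?=.*A)(?=.*B)(?=^(?:(?!X).)*$)(?=^(?:(?!Y).)*$).*(?:C|D)')

echo "both=$both unanchored=$unanchored"
printf '%s\n' "$combo"
```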
In my Cygwin setup, the given answers didn't work, but the following did:
grep -l firststring `grep -r -l secondstring . `
Do you mean "string1" and "string2" on the same line?
grep 'string1.*string2'
On the same line but in indeterminate order?
grep -E '(string1.*string2)|(string2.*string1)'
Or must both strings appear anywhere in the file?
grep -rlZ string1 . | xargs -0 grep -l string2
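This uses PCRE (Perl-Compatible Regular Expressions) and returns the names of files that contain both strings (AND rather than OR). The -z option makes grep read each file (as long as it contains no NUL bytes) as a single record, and (?s) lets . match newlines, so the two strings may be on different lines:
grep -Plrz '(?s)db-connect\.php.*version|version.*db-connect\.php' .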
Why stick to grep only? This prints the lines that contain both strings:
perl -lne 'print if /db-connect\.php/ && /version/' *
