Removing the </p> tag from grep output - linux

I have I bash script that will find phones numbers inside .htm or .html files in a directory (or recursivly down if I want it) to find phone numbers in the format (ddd)ddd-dddd or ddd-ddd-dddd (Where d represents a digit).
This is my code:
find ./ -maxdepth 1 -regex ".*\(html\|htm\)$" | xargs grep '\(([0-9]\{3\})\|[0-9]\{3\}\)[-]\?[0-9]\{3\}-[0-9]\{4\}'
The output is:
./dash_only_phone.htm:800-555-1212</p>
./paren_phone.htm:(800)555-1212</p>
I was wondering how I would change the grep command to remove the html p tag printout at the end.
Thanks,

If your grep supports Perl Compatible Regular Expressions, as do GNU and OS X grep:
grep -Po '(\([0-9]{3}\)|[0-9]{3})-?[0-9]{3}-[0-9]{4}(?=</p>)'
Note the changes in escaping (which are similar to or the same as for grep -E).

Why not just pass the output through a sed filter to remove it, as in the following transcript:
pax$ echo './dash_only_phone.htm:800-555-1212</p>' | sed 's?</p>$??'
./dash_only_phone.htm:800-555-1212
This will get rid of any </p> sequences that appear at the end of a line.

You can just add the -o switch to get the IP
find ./ -maxdepth 1 -regex ".*\(html\|htm\)$" | xargs grep -o '\(([0-9]\{3\})\|[0-9]\{3\}\)[-]\?[0-9]\{3\}-[0-9]\{4\}'

Related

GREP to show files WITH text and WITHOUT text

I am trying to search for files with specific text but excluding a certain text and showing only the files.
Here is my code:
grep -v "TEXT1" *.* | grep -ils "ABC2"
However, it returns:
(standard input)
Please suggest. Thanks a lot.
The output should only show the filenames.
Here's one way to do it, assuming you want to match these terms anywhere in the file.
grep -LZ 'TEXT1' *.* | xargs -0 grep -li 'ABC2'
-L will match files not containing the given search term
use -LiZ if you want to match TEXT1 irrespective of case
The -Z option is needed to separate filenames with NUL character and xargs -0 will then separate out filenames based on NUL character
If you want to check these two conditions on same line instead of anywhere in the file:
grep -lP '^(?!.*TEXT1).*(?i:ABC2)' *.*
-P enables PCRE, which I assume you have since linux is tagged
(?!regexp) is a negative lookahead construct, so ^(?!.*TEXT1) will ensure the line doesn't have TEXT1
(?i:ABC2) will match ABC2 case insensitively
Use grep -liP '^(?!.*TEXT1).*ABC2' if you want to match both terms irrespective of case
(standard input)
This error is due to use of grep -l in a pipeline as your second grep command is reading input from stdin not from a file and -l option is printing (standard input) instead of the filename.
You can use this alternate solution in a single awk command:
awk '/ABC2/ && !/TEXT1/ {print FILENAME; nextfile}' *.* 2>/dev/null

Need to delete first N lines of grep result files

I'm trying to get rid of a hacker issue on some of my wordpress installs.
This guy puts 9 lines of code in the head of multiple files on my server... I'm trying to use grep and sed to solve this.
Im trying:
grep -r -l "//360cdn.win/c.css" | xargs -0 sed -e '1,9d' < {}
But nothing is happening, if I remove -0 fromxargs, the result of the files found are clean, but they are not overwriting the origin file with thesed` result, can anyone help me with that?
Many thanks!
You should use --null option in grep command to output a NUL byte or \0 after each filename in the grep output. Also use -i.bak in sed for inline editing of each file:
grep -lR --null '//360cdn.win/c\.css' . | xargs -0 sed -i.bak '1,9d'
What's wrong with iterating over the files directly¹?
And you might want to add the -i flat to sed so that files are edited in-place
grep -r -l "//360cdn.win/c.css" | while read f
do
sed -e '1,9d' -i "${f}"
done
¹ well, you might get problems if your files contain newlines and the like.
but then...if your website contains files with newlines, you probably have other problems anyhow...

Search and Replace SED Linux

I need to convert my all url relative path
https://www.example.com/image/abc.gif to /image/abc.gif
i tried this command but not worked for https:\\ section . How can i use https\\ in this command .
grep -rl "http://www.example.com" /root/ | xargs sed -i 's/http://www.example.com//g'
The following will do the job:
echo "https://www.example.com/image/abc.gif" | sed 's/https:\/\/www\.example\.com//g'
OUTPUT
/image/abc.gif
You can use regular expressions, the -r indicates extended regexes. The ? says 0 or 1 occurrence of 's'.
echo "http://www.example.com/image/abc.gif" | sed -r 's/https?:\/\/www\.example\.com//g'

Grep Search all files in directory for string1 AND string2

How can I make use of grep in cygwin to find all files that contain BOTH words.
This is what I use to search all files in a directory recursively for one word:
grep -r "db-connect.php" .
How can I extend the above to look for files that contain both "db-connect.php" AND "version".
I tried this: grep -r "db-connect.php\|version" . but this is an OR i.e. it gets file that contain one or the other.
Thanks all for any help
grep -r db-connect.php . | grep version
If you want to grep for several strings in a file which have different lines, use the following command:
grep -rl expr1 | xargs grep -l expr2 | xargs grep -l expr3
This will give you a list of files that contain expr1, expr2, and expr3.
Note that if any of the file names in the directory contains spaces, these files will produce errors. This can be fixed by adding -0 I think to grep and xargs.
grep "db-connect.php" * | cut -d: -f1 | xargs grep "version"
I didn't try it in recursive mode but it should be the same.
To and together multiple searches, use multiple lookahead assertions, one per thing looked for apart from the last one:
instead of writing
grep -P A * | grep B
you write
grep -P '(?=.*A)B' *
grep -Pr '(?=.*db-connect\.php)version' .
Don’t write
grep -P 'A.*B|B.*A' *
because that fails on overlaps, whereas the (?=…)(?=…) technique does not.
You can also add in NOT operators as well. To search for lines that don’t match X, you normally of course use -v on the command line. But you can’t do that if it is part of a larger pattern. When it is, you add (?=(?!X).)*$) to the pattern to exclude anything with X in it.
So imagine you want to match lines with all three of A, B, and then either of C or D, but which don’t have X or Y in them. All you need is this:
grep -P '(?=^.*A)(?=^.*B)(?=^(?:(?!X).)*$)(?=^(?:(?!Y).)*$)C|D' *
In some shells and in some settings. you’ll have to escape the ! if it’s your history-substitution character.
There, isn’t that pretty cool?
In my cygwin the given answers didn't work, but the following did:
grep -l firststring `grep -r -l secondstring . `
Do you mean "string1" and "string2" on the same line?
grep 'string1.*string2'
On the same line but in indeterminate order?
grep '(string1.*string2)|(string2.*string1)'
Or both strings must appear in the file anywhere?
grep -e string1 -e string2
The uses PCRE (Perl-Compatible Regular Expressions) with multiline matching and returns the filenames of files that contain both strings (AND rather than OR).
grep -Plr '(?m)db-connect\.php(.*\n)*version|version(.*\n)*db-connect\.php' .
Why to stick to only grep:
perl -lne 'print if(/db-connect.php/&/version/)' *

Pipe output to use as the search specification for grep on Linux

How do I pipe the output of grep as the search pattern for another grep?
As an example:
grep <Search_term> <file1> | xargs grep <file2>
I want the output of the first grep as the search term for the second grep. The above command is treating the output of the first grep as the file name for the second grep. I tried using the -e option for the second grep, but it does not work either.
You need to use xargs's -i switch:
grep ... | xargs -ifoo grep foo file_in_which_to_search
This takes the option after -i (foo in this case) and replaces every occurrence of it in the command with the output of the first grep.
This is the same as:
grep `grep ...` file_in_which_to_search
Try
grep ... | fgrep -f - file1 file2 ...
If using Bash then you can use backticks:
> grep -e "`grep ... ...`" files
the -e flag and the double quotes are there to ensure that any output from the initial grep that starts with a hyphen isn't then interpreted as an option to the second grep.
Note that the double quoting trick (which also ensures that the output from grep is treated as a single parameter) only works with Bash. It doesn't appear to work with (t)csh.
Note also that backticks are the standard way to get the output from one program into the parameter list of another. Not all programs have a convenient way to read parameters from stdin the way that (f)grep does.
I wanted to search for text in files (using grep) that had a certain pattern in their file names (found using find) in the current directory. I used the following command:
grep -i "pattern1" $(find . -name "pattern2")
Here pattern2 is the pattern in the file names and pattern1 is the pattern searched for
within files matching pattern2.
edit: Not strictly piping but still related and quite useful...
This is what I use to search for a file from a listing:
ls -la | grep 'file-in-which-to-search'
Okay breaking the rules as this isn't an answer, just a note that I can't get any of these solutions to work.
% fgrep -f test file
works fine.
% cat test | fgrep -f - file
fgrep: -: No such file or directory
fails.
% cat test | xargs -ifoo grep foo file
xargs: illegal option -- i
usage: xargs [-0opt] [-E eofstr] [-I replstr [-R replacements]] [-J replstr]
[-L number] [-n number [-x]] [-P maxprocs] [-s size]
[utility [argument ...]]
fails. Note that a capital I is necessary. If i use that all is good.
% grep "`cat test`" file
kinda works in that it returns a line for the terms that match but it also returns a line grep: line 3 in test: No such file or directory for each file that doesn't find a match.
Am I missing something or is this just differences in my Darwin distribution or bash shell?
I tried this way , and it works great.
[opuser#vjmachine abc]$ cat a
not problem
all
problem
first
not to get
read problem
read not problem
[opuser#vjmachine abc]$ cat b
not problem xxy
problem abcd
read problem werwer
read not problem 98989
123 not problem 345
345 problem tyu
[opuser#vjmachine abc]$ grep -e "`grep problem a`" b --col
not problem xxy
problem abcd
read problem werwer
read not problem 98989
123 not problem 345
345 problem tyu
[opuser#vjmachine abc]$
You should grep in such a way, to extract filenames only, see the parameter -l (the lowercase L):
grep -l someSearch * | xargs grep otherSearch
Because on the simple grep, the output is much more info than file names only. For instance when you do
grep someSearch *
You will pipe to xargs info like this
filename1: blablabla someSearch blablabla something else
filename2: bla someSearch bla otherSearch
...
Piping any of above line makes nonsense to pass to xargs.
But when you do grep -l someSearch *, your output will look like this:
filename1
filename2
Such an output can be passed now to xargs
I have found the following command to work using $() with my first command inside the parenthesis to have the shell execute it first.
grep $(dig +short) file
I use this to look through files for an IP address when I am given a host name.

Resources