Why does the find -regex command differ from find | grep? - linux

The find command below outputs nothing, and does not find any "include" files or directories.
find -regex "*include*" 2>/dev/null
However, piping the find output into grep -E seems to find most include files.
find ./ 2>/dev/null | grep -E "*include*"
I've left out the output since the first is blank and the second matches quite a few files.
I'm starting to need to dig through Linux system files to find the answers I need (especially to find macro values). To do that I have been using find | grep -E to find the files that should have the macro I am looking for.
Below is the line I tried today with find (my root directory is /), and nothing is output. I don't want to run the command as root, so I redirect the errors to /dev/null. I checked the errors for regex syntax errors but found nothing. It's still looping through all directories, since I still get a "find: /var/lib: Permission denied" error.
find -regex "*include*" 2>/dev/null
However, this seems to work and gives me everything I want.
find ./ 2>/dev/null | grep -E "*include*"
So my main question is: why does find -regex not output the same as find | grep -E?

Regular expressions are not a single language, but a general mathematical construct with many different notations and dialects.
For simple patterns you can often get away with ignoring this fact, since most dialects use very similar notation, but because you are specifying an ill-defined pattern with a leading asterisk, you get into engine-specific behavior.
grep -E uses the GNU implementation of POSIX ERE, and interprets your pattern as ()*includ(e)*, therefore matching includ followed by zero or more es. (POSIX says that the behavior of a leading asterisk is undefined.)
find uses Emacs regex syntax, and interprets it as \*includ(e)*, therefore requiring a literal asterisk in the filename.
If you want the same result from both, you can use find -regextype posix-egrep, or you can specify a regex that is equivalent in both, such as .*include.* to match include as a substring.
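For example, either of the following should behave like the grep pipeline (a quick sketch; note that find's -regex must match the entire path, which is why the leading and trailing .* are needed):
find . -regextype posix-egrep -regex ".*include.*" 2>/dev/null
find . -regex ".*include.*" 2>/dev/null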

As I understand your question, you want to find files in Linux directories.
You can use the locate utility instead:
yum install locate
If you use Ubuntu:
sudo apt-get install locate
Build its file database first:
sudo updatedb
Then start searching:
locate include
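If locate returns too many hits, you can narrow them down with grep, for example:
locate include | grep '/usr/include'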

Related

Finding .php files with certain variable declaration (string search) on command line (Shell)

I'm attempting to find all .php files that are at a certain depth of a directory tree (at least 4 levels down, but not more than 5 levels in).
I'm logged into my Centos server with root authority via shell.
The string I want to search for is:
$slides='';
What I have in front of me I would expect to work. I tried to escape the $ with a \ (I thought perhaps it works like regex, needing special chars escaped). I tried without the ='' portion, tried adding \'\' to that part, and tried removing the ='' altogether to simplify. Nothing.
find . -maxdepth 5 -mindepth 4 -type f -name '*.php' -print | xargs grep "\$slides=''" *
I'm already running it under the directory under which I want to recursively search.
Also, I have the filter looking for *.php only, but I still get a bunch of directory names back, with a warning that says grep: [dir_name]: Is a directory
Clearly I am missing something here as far as the syntax of the grep command goes, or how the filter works. I use grep more in PHP, so this is quite a transition for me!
So you were almost right. The problem looks to have been the grep part of the command:
grep "\$slides=''" *
Namely, the * was the issue. From the Bash manual:
After word splitting, unless the -f option has been set (see The Set
Builtin), Bash scans each word for the characters ‘*’, ‘?’, and ‘[’.
If one of these characters appears, and is not quoted, then the word
is regarded as a pattern, and replaced with an alphabetically sorted
list of filenames matching the pattern
When you piped the found files into xargs and also appended *, the shell expanded the * into a list of the filenames and directories in the current directory and handed those to grep as well. grep cannot search directories without the -r flag, so it printed the Is a directory errors.
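You can see exactly what that stray * expands to by printing it yourself (the output will of course depend on the directory you run it in):
printf '%s\n' *
Any directories in that list are what produced the Is a directory warnings.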
Instead, what you wanted to do is pipe the found files into xargs so that it can append the list of filenames to the grep command, as that's what xargs does. From the xargs manual:
xargs reads items from the standard input, delimited by blanks (which
can be protected with double or single quotes or a backslash) or
newlines, and executes the command (default is /bin/echo) one or more
times with any initial-arguments followed by items read from
standard input. Blank lines on the standard input are ignored.
That makes the correct command:
find . -maxdepth 5 -mindepth 4 -type f -name '*.php' -print0 | xargs -0 grep "\$slides=''"
This uses the -print0 flag in find and the -0 flag in xargs to make NUL the delimiter, in case any filenames contain newlines.
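To see why that matters, here is a small sketch with a deliberately awkward filename (purely illustrative):
touch $'bad\nname.php'
find . -name '*.php' -print | xargs grep "\$slides=''"
find . -name '*.php' -print0 | xargs -0 grep "\$slides=''"
The first pipeline splits the name at the newline and grep complains about files that don't exist; the NUL-delimited version handles it fine.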
If you want to use shell_exec from your PHP code, note that it is a program-execution function which allows you to run a command like 'ls -al' in the operating system shell and get the result returned into a variable. Query strings are not commands you can use in this way.
Or do you mean running PHP from the command line, so that it runs from the shell, not from the web server:
php -r 'echo "hello world\n";'
If you run PHP 4.3 or above, you can use the PHP Command Line Interface (CLI), which can also execute scripts stored in files. Have a look at the syntax and examples at http://php.net/features.commandline
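For example, once you have saved a script to a file, you can run it directly (the path here is just a placeholder):
php /path/to/your_script.php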

Copying files with even number in its name - bash

I want to copy all the files from /usr/lib whose names end with .X.0.0, where X is an even number. Is there a better way than the following to select all the files?
ls /usr/lib | grep "[02468].0.0$"
My problem with this solution is that for files with names like "xy.800.0.0" I would need to repeat the bracket expression three times, and so on for longer numbers.
Just use a glob expansion to match the files:
cp /usr/lib/*.*[02468].0.0 /path/to/destination
The shell expands this pattern to the list of files before passing them as arguments to cp.
Since you tagged Bash, you can make the match more strict by using an extended glob:
shopt -s extglob failglob
cp /usr/lib/*.*([0-9])[02468].0.0 /path/to/destination
This matches 0 or more other digits followed by an even digit, and doesn't run the command at all if no files match.
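If you'd rather check what the pattern matches before copying anything, you can echo the glob first:
shopt -s extglob failglob
echo /usr/lib/*.*([0-9])[02468].0.0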
You could use extended grep regular expressions to only match even numbers:
ls -1q /usr/lib | grep -E "\.[0-9]*[02468]\.0\.0$"
However, as Tom suggested, there are better options than parsing the output of ls. It's generally safer and faster to use glob expansion, and more maintainable to just put everything in a Python script.

Using grep to identify a pattern

I have several documents hosted on a cloud instance. I want to extract all words conforming to a specific pattern into a .txt file. This is the pattern:
ABC123A
ABC123B
ABC765A
and so on. Essentially the words start with a specific character string 'ABC', have a fixed number of numerals, and end with a letter. This is my code:
grep -oh ABC[0-9].*[a-zA-Z]$ > /home/user/abcLetterMatches.txt
When I execute the query, it runs for several hours without generating any output. I have over 1100 documents. However, when I run this query:
grep -r ABC[0-9].*[a-zA-Z]$ > /home/user/abcLetterMatches.txt
the list of files containing the strings is generated in a matter of seconds.
What do I need to correct in my query? Also, what is causing the delay?
UPDATE 1
Based on the answers, it's evident that the command is missing the file name on which it needs to be executed. I want to run the code on multiple document files (>1000).
The documents I want searched are in multiple sub-directories within a directory. What is a good way to search through them? Doing
grep -roh ABC[0-9].*[a-zA-Z]$ > /home/user/abcLetterMatches.txt
only returns the file names.
UPDATE 2
If I use the updated code from the answer below:
find . -exec grep -oh "ABC[0-9].*[a-zA-Z]$" >> ~/abcLetterMatches.txt {} \;
I get a "No such file or directory" error.
UPDATE 3
The pattern can be anywhere in the line.
You can use this regexp:
grep -E "^ABC[0-9]{3}[A-Z]$" docs > filename
ABC123A
ABC123B
ABC765A
There is no delay; grep is just waiting for the input you didn't give it (by default it reads standard input). You can correct your command by supplying a filename argument:
grep -oh "ABC[0-9].*[a-zA-Z]$" file.txt > /home/user/abcLetterMatches.txt
Source (man grep):
SYNOPSIS
grep [OPTIONS] PATTERN [FILE...]
To perform the same grep on several files recursively, combine it with the find command:
find . -exec grep -oh "ABC[0-9].*[a-zA-Z]$" >> ~/abcLetterMatches.txt {} \;
This does what you ask for:
grep -hr '^ABC[0-9]\{3\}[A-Za-z]$'
-h to not get the filenames.
-r to search recursively. If no directory is given (as above) the current one is used. Otherwise just specify one as the last argument.
Quotes around the pattern to avoid accidental globbing, etc.
^ at the beginning of the pattern to — together with $ at the end — only match whole lines. (Not sure if this was a requirement, but the sample data suggests it.)
\{3\} to specify that there should be three digits.
No .* as that would match a whole lot of other things.
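Since UPDATE 3 says the pattern can appear anywhere in a line, a variant without the anchors, using -o to print only the matching tokens themselves, might look like this (adjust the directory to taste):
grep -rhoE 'ABC[0-9]{3}[A-Za-z]' . >> ~/abcLetterMatches.txt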

find only files with extension using ls

I need to find only the files in a directory which have an extension, using ls (I can't use find).
I tried ls *.*, but if the dir doesn't contain any file with an extension it returns "No such file or directory".
I don't want that error; I want ls to return silently to the command prompt if there are no files with an extension.
I have been trying to use grep with ls to achieve the same.
ls | grep "*.*" doesn't work,
but ls | grep "\." works.
I have no idea why grep "*.*" doesn't work. Any help is appreciated!
Thanks!
I think the correct solution is this:
( shopt -s nullglob ; echo *.* )
It's a bit verbose, but it will always work no matter what kind of funky filenames you have. (The problem with piping ls to grep is that typical systems allow really bizarre characters in filenames, including, for example, newlines.)
The shopt -s nullglob part enables ("sets") the nullglob shell option, which tells Bash that if no files have names matching *.*, then the *.* should be removed (i.e., should expand into nothing) rather than being left alone.
The parentheses (...) are to set up a subshell, so the nullglob option is only enabled for this small part of the script.
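If you need to do more with the names than print them, one sketch is to capture the matches in a Bash array, which also makes it easy to test whether anything matched at all:
( shopt -s nullglob
  files=(*.*)
  (( ${#files[@]} > 0 )) && printf '%s\n' "${files[@]}" )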
It's important to understand the difference between a shell pattern and a regular expression. Shell patterns are a bit simpler, but less flexible. grep matches using a regular expression. A shell pattern like
*.*
would be done with a regular expression as
.*\..*
but the regular expressions in grep are not anchored, which means it searches for a match anywhere on the line, making the two .* parts unnecessary.
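A toy example of the same test written both ways:
name="report.txt"
case "$name" in *.*) echo "shell pattern: matched" ;; esac
echo "$name" | grep -q '\.' && echo "regex: matched"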
Try
ls -1 | grep "\."
This lists only files with an extension, and nothing (an empty list) if there is no such file: exactly what you need.
With Linux grep, you can add -v to invert the match and get a list of files with no extension.
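For example:
ls -1 | grep -v "\."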

Looking for tool to search text in files on command line

Hello
I'm looking for a script or program that searches for keywords or patterns in files (e.g. php, html, etc.) and shows where each file is.
I use the command cat /home/* | grep "keyword"
but I have too many folders and files, and this command takes a very long time. :/
I need this script to find fake websites (paypal, ebay, etc).
find /home -exec grep -s "keyword" {} \; -print
You don't really say which OS (and shell) you are using. You might want to retag your question to help us out.
Because you mention cat | ..., I am assuming you are using a Unix/Linux variant, so here are some pointers for looking at files. (bmargulies' solution is good too.)
I'm looking for a script or program that searches for keywords or patterns in files
grep is the basic program for searching files for text strings. Its usage is
grep [-options] 'search target' file1 file2 .... filen
(Note that 'search target' contains a space; if you don't surround spaces in your search target with double or single quotes, you will have a minor error to debug.)
(Also note that 'search target' can use a wide range of wildcard characters, like ., ?, +, *, and many more, but that is beyond the scope of your question.) ... anyway ...
As I guess you have discovered, you can only cram so many files at a time onto the command line, even when using wildcard filename expansion. Unix/Linux almost always has a utility that can help with that:
startDir=/home
find ${startDir} -print | xargs grep -l 'Search Target'
This, as one person will be happy to remind you, will require further enhancements if your filenames contain whitespace characters or newlines.
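The usual enhancement is to switch to NUL delimiters, something like:
find ${startDir} -print0 | xargs -0 grep -l 'Search Target'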
The options available for grep can vary wildly based on which OS you are using. If you're lucky, typing the following gets you the man page for your local grep:
man grep
If you don't have your page buffer set up for a large size, you might need to do
man grep | page
so you can see the top of the 'document'. Press any key to advance to the next page and when you are at the end of the document, the last key press returns you to the command prompt.
Some options that most greps have that might be useful to you are
-i (ignore case)
-l (list filenames only, where the text is found)
There is also fgrep, which strictly stands for fixed-string grep, though it is sometimes remembered as 'file' grep
because you can give it a file of search targets to scan for; it is used like
fgrep [-other_options] -f srchTargetsFile file1 file2 ... filen
I need this script to find fake websites (paypal, ebay, etc)
Final solution
you can make a srchFile like
paypal.fake.com
ebay.fake.com
etc.fake.com
and then combined with above, run the following
startDir=/home
find ${startDir} -print | xargs fgrep -il -f srchFile
Some greps require that the -f and srchFile be written together, as -fsrchFile.
Now you are finding all files starting from /home and searching them with fgrep for paypal, ebay, etc. The -l says it will ONLY print the filename where a match is found. You can remove the -l and then you will see the output of what is found, prepended with the filename.
IHTH.
