how to find files containing a string using egrep - linux

I would like to find the files containing specific string under linux.
I tried something like but could not succeed:
find . -name *.txt | egrep mystring

Here you are sending the file names (output of the find command) as input to egrep; you actually want to run egrep on the contents of the files.
Here are a couple of alternatives:
find . -name "*.txt" -exec egrep mystring {} \;
or even better
find . -name "*.txt" -print0 | xargs -0 egrep mystring
Check the find command help to check what the single arguments do.
The first approach will spawn a new process for every file, while the second will pass more than one file as argument to egrep; the -print0 and -0 flags are needed to deal with potentially nasty file names (allowing to separate file names correctly even if a file name contains a space, for example).

try:
find . -name '*.txt' | xargs egrep mystring
There are two problems with your version:
Firstly, *.txt will first be expanded by the shell, giving you a listing of files in the current directory which end in .txt, so for instance, if you have the following:
[dsm#localhost:~]$ ls *.txt
test.txt
[dsm#localhost:~]$
your find command will turn into find . -name test.txt. Just try the following to illustrate:
[dsm#localhost:~]$ echo find . -name *.txt
find . -name test.txt
[dsm#localhost:~]$
Secondly, egrep does not take filenames from STDIN. To convert them to arguments you need to use xargs

find . -name *.txt | egrep mystring
That will not work as egrep will be searching for mystring within the output generated by find . -name *.txt which are just the path to *.txt files.
Instead, you can use xargs:
find . -name *.txt | xargs egrep mystring

You could use
find . -iname *.txt -exec egrep mystring \{\} \;

Here's an example that will return the file paths of a all *.log files that have a line that begins with ERROR:
find . -name "*.log" -exec egrep -l '^ERROR' {} \;

there's a recursive option from egrep you can use
egrep -R "pattern" *.log

If you only want the filenames:
find . -type f -name '*.txt' -exec egrep -l pattern {} \;
If you want filenames and matches:
find . -type f -name '*.txt' -exec egrep pattern {} /dev/null \;

Related

Linux find command get all text in the file and print file path

I need to get all the texts in the matching file in the folder. However, at the same time need to get the matching file path as well. How can I get the matching file path as well using the following command.
find . -type f -name release.txt | xargs cat
try
find . -type f -name release.txt -exec grep -il {} \; | xargs cat
Skip xargs, just do:
find . -type f -name release.txt -exec sh -c 'echo "$1"; cat "$1"' _ {} \;

Using 'find' to return filenames without extension

I have a directory (with subdirectories), of which I want to find all files that have a ".ipynb" extension. But I want the 'find' command to just return me these filenames without the extension.
I know the first part:
find . -type f -iname "*.ipynb" -print
But how do I then get the names without the "ipynb" extension?
Any replies greatly appreciated...
To return only filenames without the extension, try:
find . -type f -iname "*.ipynb" -execdir sh -c 'printf "%s\n" "${0%.*}"' {} ';'
or (omitting -type f from now on):
find "$PWD" -iname "*.ipynb" -execdir basename {} .ipynb ';'
or:
find . -iname "*.ipynb" -exec basename {} .ipynb ';'
or:
find . -iname "*.ipynb" | sed "s/.*\///; s/\.ipynb//"
however invoking basename on each file can be inefficient, so #CharlesDuffy suggestion is:
find . -iname '*.ipynb' -exec bash -c 'printf "%s\n" "${#%.*}"' _ {} +
or:
find . -iname '*.ipynb' -execdir basename -s '.sh' {} +
Using + means that we're passing multiple files to each bash instance, so if the whole list fits into a single command line, we call bash only once.
To print full path and filename (without extension) in the same line, try:
find . -iname "*.ipynb" -exec sh -c 'printf "%s\n" "${0%.*}"' {} ';'
or:
find "$PWD" -iname "*.ipynb" -print | grep -o "[^\.]\+"
To print full path and filename on separate lines:
find "$PWD" -iname "*.ipynb" -exec dirname "{}" ';' -exec basename "{}" .ipynb ';'
Here's a simple solution:
find . -type f -iname "*.ipynb" | sed 's/\.ipynb$//1'
I found this in a bash oneliner that simplifies the process without using find
for n in *.ipynb; do echo "${n%.ipynb}"; done
If you need to have the name with directory but without the extension :
find . -type f -iname "*.ipynb" -exec sh -c 'f=$(basename $1 .ipynb);d=$(dirname $1);echo "$d/$f"' sh {} \;
find . -type f -iname "*.ipynb" | grep -oP '.*(?=[.])'
The -o flag outputs only the matched part. The -P flag matches according to Perl regular expressions. This is necessary to make the lookahead (?=[.]) work.
Perl One Liner
what you want
find . | perl -a -F/ -lne 'print $F[-1] if /.*.ipynb/g'
Then not your code
what you do not want
find . | perl -a -F/ -lne 'print $F[-1] if !/.*.ipynb/g'
NOTE
In Perl you need to put extra .. So your pattern would be .*.ipynb
If there's no occurrence of this ".ipynb" string on any file name other than a suffix, then you can try this simpler way using tr:
find . -type f -iname "*.ipynb" -print | tr -d ".ipbyn"
If you don't know that the extension is or there are multiple you could use this:
find . -type f -exec basename {} \;|perl -pe 's/(.*)\..*$/$1/;s{^.*/}{}'
and for a list of files with no duplicates (originally differing in path or extension)
find . -type f -exec basename {} \;|perl -pe 's/(.*)\..*$/$1/;s{^.*/}{}'|sort|uniq
Another easy way which uses basename is:
find . -type f -iname '*.ipynb' -exec basename -s '.ipynb' {} +
Using + will reduce the number of invocations of the command (manpage):
-exec command {} +
This variant of the -exec action runs the specified command on
the selected files, but the command line is built by appending
each selected file name at the end; the total number of
invocations of the command will be much less than the number
of matched files. The command line is built in much the same
way that xargs builds its command lines. Only one instance of
'{}' is allowed within the command, and (when find is being
invoked from a shell) it should be quoted (for example, '{}')
to protect it from interpretation by shells. The command is
executed in the starting directory. If any invocation with
the `+' form returns a non-zero value as exit status, then
find returns a non-zero exit status. If find encounters an
error, this can sometimes cause an immediate exit, so some
pending commands may not be run at all. For this reason -exec
my-command ... {} + -quit may not result in my-command
actually being run. This variant of -exec always returns
true.
Using -s with basename runs accepts multiple filenames and removes a specified suffix (manpage):
-a, --multiple
support multiple arguments and treat each as a NAME
-s, --suffix=SUFFIX
remove a trailing SUFFIX; implies -a

prevent space from splitting filenames using backticks

Using find to select files to pass to another command using backticks/backquotes, I've noted that filenames that contain spaces will be split, and therfore not found.
Is it possible to avoid this behaviour? The command I issued looks like this
wc `find . -name '*.txt'`
but for example when there is a file named a b c.txt in directory x it reports
$ wc `find . -name '*.txt'`
wc: ./x/a: No such file or directory
wc: b: No such file or directory
wc: c.txt: No such file or directory
When used with multiple files wc will show the output of each file, and a final summary line with the totals of all files. that's why I want to execute wc once.
I tried escaping spaces with sed, but wc produces the same output (splits filenames with spaces).
wc `find . -name '*.txt' | sed 's/ /\\\ /pg'`
Use the -print0 option to find and the corresponding -0 option to xargs:
find . -name '*.txt' -print0 | xargs -0 wc
You can also use the -exec option to find:
find . -name '*.txt' -exec wc {} +
from this very similar question (should I flag my question as a duplicate?) I found another answer to this using bash's ** expansion:
wc **/*.txt
for this to work I had to
shopt -s globstar

Using find and grep in a specifc file in a specific directory

I have the following directory structure:
./A1
./A2
./A3
./B
./C
In each one of the A* directories I have:
./A*/logs
./A*/test
in the logs directory I have:
./log-jan-1
./log-jan-2
./log-feb-1
How do I grep for a string in all January logs in the A directories?
I tried this, but it did not find the string although it is present in the log files:
find . -type d -name 'A*' print | xargs -n1 -I PATH grep string - PATH/logs/log-jan*
What am I doing wrong?
Why don't you simply use
grep string ./A*/logs/log-jan*
?
If it is not a typo, you should use -print (or -print0) instead of print.
But as find + xargs + grep constructs are hard to debug, you should test in sequence :
find . -type d -name 'A*' -print
find . -type d -name 'A*' -print0 | xargs -0 -n1 -I PATH echo grep string - PATH
and finally :
find . -type d -name 'A*' -print0 | xargs -0 -n1 -I PATH grep string - PATH/logs/log-jan*
In you use case, -print and -print0 should give same results, but for having been burnt with -print, I always use -print0 before xargs

In Linux terminal, how to delete all files in a directory except one or two

In a Linux terminal, how to delete all files from a folder except one or two?
For example.
I have 100 image files in a directory and one .txt file.
I want to delete all files except that .txt file.
From within the directory, list the files, filter out all not containing 'file-to-keep', and remove all files left on the list.
ls | grep -v 'file-to-keep' | xargs rm
To avoid issues with spaces in filenames (remember to never use spaces in filenames), use find and -0 option.
find 'path' -maxdepth 1 -not -name 'file-to-keep' -print0 | xargs -0 rm
Or mixing both, use grep option -z to manage the -print0 names from find
In general, using an inverted pattern search with grep should do the job. As you didn't define any pattern, I'd just give you a general code example:
ls -1 | grep -v 'name_of_file_to_keep.txt' | xargs rm -f
The ls -1 lists one file per line, so that grep can search line by line. grep -v is the inverted flag. So any pattern matched will NOT be deleted.
For multiple files, you may use egrep:
ls -1 | grep -E -v 'not_file1.txt|not_file2.txt' | xargs rm -f
Update after question was updated:
I assume you are willing to delete all files except files in the current folder that do not end with .txt. So this should work too:
find . -maxdepth 1 -type f -not -name "*.txt" -exec rm -f {} \;
find supports a -delete option so you do not need to -exec. You can also pass multiple sets of -not -name somefile -not -name otherfile
user#host$ ls
1.txt 2.txt 3.txt 4.txt 5.txt 6.txt 7.txt 8.txt josh.pdf keepme
user#host$ find . -maxdepth 1 -type f -not -name keepme -not -name 8.txt -delete
user#host$ ls
8.txt keepme
Use the not modifier to remove file(s) or pattern(s) you don't want to delete, you can modify the 1 passed to -maxdepth to specify how many sub directories deep you want to delete files from
find . -maxdepth 1 -not -name "*.txt" -exec rm -f {} \;
You can also do:
find -maxdepth 1 \! -name "*.txt" -exec rm -f {} \;
In bash, you can use:
$ shopt -s extglob # Enable extended pattern matching features
$ rm !(*.txt) # Delete all files except .txt files

Resources