"find" and "ls" with GNU parallel - linux

I'm trying to use GNU parallel to post a lot of files to a web server. In my directory, I have some files:
file1.xml
file2.xml
and I have a shell script that looks like this:
#! /usr/bin/env bash
CMD="curl -X POST -d#$1 http://server/path"
eval $CMD
There's some other stuff in the script, but this was the simplest example. I tried to execute the following command:
ls | parallel -j2 script.sh {}
Which is what the GNU parallel pages show as the "normal" way to operate on files in a directory. This seems to pass the name of the file into my script, but curl complains that it can't load the data file passed in. However, if I do:
find . -name '*.xml' | parallel -j2 script.sh {}
it works fine. Is there a difference between how ls and find are passing arguments to my script? Or do I need to do something additional in that script?

GNU parallel is a variant of xargs. They both have very similar interfaces, and if you're looking for help on parallel, you may have more luck looking up information about xargs.
That being said, the way they both operate is fairly simple. With their default behavior, both programs read input from STDIN, then break the input up into tokens based on whitespace. Each of these tokens is then passed to a provided program as an argument. The default for xargs is to pass as many tokens as possible to the program, and then start a new process when the limit is hit. I'm not sure how the default for parallel works.
Here is an example:
> echo "foo bar \
baz" | xargs echo
foo bar baz
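To see the batching behavior explicitly, you can cap the number of tokens handed to each invocation with -n (a quick illustration; with GNU xargs the grouping comes out as shown):
> echo "a b c d e" | xargs -n 2 echo
a b
c d
e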
There are some problems with the default behavior, so it is common to see several variations.
The first issue is that because whitespace is used to tokenize, any filenames containing whitespace will cause parallel and xargs to break. One solution is to tokenize around the NULL character instead. find even provides an option to make this easy to do:
> echo "Success!" > bad\ filename
> find . "bad\ filename" -print0 | xargs -0 cat
Success!
The -print0 option tells find to separate files with the NULL character instead of whitespace.
The -0 option tells xargs to use the NULL character to tokenize each argument.
Note that parallel is a little better than xargs in that its default behavior is to tokenize around newlines only, so there is less of a need to change the default behavior.
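As a rough illustration of that difference (assuming GNU parallel with its default newline delimiter, in a directory containing only this one file):
> touch 'name with spaces.xml'
> ls | xargs -n 1 echo
name
with
spaces.xml
> ls | parallel echo
name with spaces.xml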
Another common issue is that you may want to control how the arguments are passed to xargs or parallel. If you need a specific placement of the arguments passed to the program, you can use {} to specify where the argument is to be placed (parallel understands {} by default; with xargs you need the -I option).
> mkdir new_dir
> find -name '*.xml' | xargs -I{} mv {} new_dir
This will move all files in the current directory and subdirectories into the new_dir directory. It actually breaks down into the following:
> find -name '*.xml' | xargs -I{} echo mv {} new_dir
> mv foo.xml new_dir
> mv bar.xml new_dir
> mv baz.xml new_dir
So taking into consideration how xargs and parallel work, you should hopefully be able to see the issue with your command. find . -name '*.xml' will generate a list of xml files to be passed to the script.sh program.
> find . -name '*.xml' | parallel -j2 echo script.sh {}
> script.sh foo.xml
> script.sh bar.xml
> script.sh baz.xml
However, ls | parallel -j2 script.sh {} will generate a list of ALL files in the current directory to be passed to the script.sh program.
> ls | parallel -j2 echo script.sh {}
> script.sh some_directory
> script.sh some_file
> script.sh foo.xml
> ...
A more correct variant on the ls version would be as follows:
> ls *.xml | parallel -j2 script.sh {}
However, an important difference between this and the find version is that find will search through all subdirectories for files, while ls will only search the current directory. The equivalent find version of the above ls command would be as follows:
> find -maxdepth 1 -name '*.xml'
This will only search the current directory.

Since it works with find you probably want to see what command GNU Parallel is running (using -v or --dryrun) and then try to run the failing commands manually.
ls *.xml | parallel --dryrun -j2 script.sh
find -maxdepth 1 -name '*.xml' | parallel --dryrun -j2 script.sh
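With hypothetical files file1.xml and file2.xml in the directory, --dryrun prints the commands GNU Parallel would run without executing them, so the output should look roughly like:
script.sh file1.xml
script.sh file2.xml
If those lines look right but fail when run by hand, the problem is in the script rather than in parallel.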

I have not used parallel, but there is a difference between ls and find . -name '*.xml'. ls will list all the files and directories, whereas find . -name '*.xml' will list only the files (and directories) whose names end with .xml.
As suggested by Paul Rubel, just print the value of $1 in your script to check this. Additionally you may want to consider filtering the input to files only in find with the -type f option.
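A minimal way to do both checks (a debugging sketch, not the original script; the echo line and the -type f test are the additions):
#! /usr/bin/env bash
echo "script.sh received: $1" >&2      # shows exactly what parallel passed in
curl -X POST -d @"$1" http://server/path
Then call it with only plain files as input:
find . -name '*.xml' -type f | parallel -j2 script.sh {}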
Hope this helps!

Neat.
I had never used parallel before. It appears, though, that there are two of them. One is GNU Parallel, and the one that was installed on my system has Tollef Fog Heen listed as the author in the man pages.
As Paul mentioned, you should use
set -x
Also, the paradigm that you mentioned above doesn't seem to work with my parallel; rather, I have to do the following:
$ cat ../script.sh
+ cat ../script.sh
#!/bin/bash
echo $#
$ parallel -ij2 ../script.sh {} -- $(find -name '*.xml')
++ find -name '*.xml'
+ parallel -ij2 ../script.sh '{}' -- ./b.xml ./c.xml ./a.xml ./d.xml ./e.xml
./c.xml
./b.xml
./d.xml
./a.xml
./e.xml
$ parallel -ij2 ../script.sh {} -- $(ls *.xml)
++ ls --color=auto a.xml b.xml c.xml d.xml e.xml
+ parallel -ij2 ../script.sh '{}' -- a.xml b.xml c.xml d.xml e.xml
b.xml
a.xml
d.xml
c.xml
e.xml
find does provide different input: it prepends the relative path ./ to the name.
Maybe that is what is messing up your script?

Related

Grep files in subdirectories and write out files for each directory

I am working on a bioinformatics workflow in which the tool in question, 'salmon', creates multiple directories containing a 'quant.sf' file. I want to find all 'lnc' entries within these files and save them as 'lnc.sf' for all directories.
I was previously running
cat quant.sf | grep 'lnc' > lnc.sf
in all directories individually, which seemed to solve my problem. Now I want to write a script that goes into each directory and generates an lnc.sf file.
I have tried doing
find . -name "quant.sf" | while read A
do
cat $A | grep 'lnc' > lnc.sf
done
But this just creates a concatenated lnc.sf file in the current directory. Any help is highly appreciated.
Thank You!
If all your quant.sf files are at the same hierarchy level, the following should work, assuming a folder structure like month/day/quant.sf:
grep -h 'lnc' */*/quant.sf > lnc.sf
Otherwise, find the files; be aware of the pitfalls of find+read compared to -exec or xargs; understand variable expansion with whitespace; get rid of the redundant cat process; and write the file to the correct directory:
find . -name 'quant.sf' | while IFS= read -r A
do
grep 'lnc' "$A" > "${A%/*}/lnc.sf"
done
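For reference, the ${A%/*} expansion strips everything from the last / onward, which is what puts each lnc.sf next to its quant.sf (a small illustration with a hypothetical path):
A=./sample_01/quant.sf
echo "${A%/*}"          # -> ./sample_01
echo "${A%/*}/lnc.sf"   # -> ./sample_01/lnc.sf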
If you have GNU find + xargs, use -print0 combined with -0:
find . -name 'quant.sf' -print0 | xargs -0 -n1 sh -c 'grep "lnc" "$1" > "${1%/*}/lnc.sf"' -
Or use -exec of find, which avoids problems with weird files names:
find . -name 'quant.sf' -exec sh -c 'grep "lnc" "$1" > "${1%/*}/lnc.sf"' - ';'

No such file or directory when piping. Each command works separately, but not when piping

I have 2 folders: folder_a & folder_b. In each of these folders there are a bunch of files. I am trying to use sed to move all of these files out of these folders and into the working directory I am currently in.
My folder structure looks like this:
mytest:
a:
1.txt
2.txt
3.txt
b:
4.txt
5.txt
The command I am trying to use is:
find . -type d ! -iname '*.*' # find all folders other than root
| sed -r 's/.*/&\/*/' # add '/*' to each of the arguments
| sed -r 'p;s/.*/./' # output: a/* . b/* .
| xargs -n 2 mv # should be creating two commands: 'mv a/* .' and 'mv b/* .'
Unfortunately I get an error:
mv: cannot stat './aaa/*': No such file or directory
I also get the same error when I try this other strategy (using ls instead of mv):
for dir in */; do
ls $dir;
done;
Even if I use sed to replace the spaces in each directory name with '\ ', or surround the directory names with quotes I get the same error.
I'm not sure if these 2 examples are related in my misunderstanding of bash but they both seem to demonstrate my ignorance of how bash translates the output from one command into the input of another command.
Can anyone shed some light on this?
Update: Completely rewritten.
As @EtanReisner and @melpomene have noted, mv */* . or, more specifically, mv a/* b/* . is the most straightforward solution, but you state that this is in part a learning exercise, so the remainder of the answer shows an efficient find-based solution and explains the problem with the original command.
An efficient find-based solution
Generally, if feasible, it's best and most efficient to let find itself do the work, without involving additional tools; find's -exec action is like a built-in xargs, with {} representing the path at hand (with terminator \;) / all paths (with +):
find . -type f -exec echo mv -t . {} +
To be safe, this will just print the mv commands that would be executed; remove the echo to actually execute them.
This will execute a single[1] mv command to which all matching files are passed, and -t . moves them all to the current dir.
[1] If the resulting command line is too long (which is unlikely), it is split up into multiple commands, just as with xargs.
Operating on files (-type f) bypasses the need for globbing, as find will then enumerate all files for you (it also bypasses the need to exclude . explicitly).
Note that this solution works on entire subtrees, not just (immediate) subdirectories.
It's tempting to consider turning on Bash 4's globstar option and using mv */** ., but that won't work, because it will attempt to move directories as well, not just the files in them.
A caveat re -exec with +: it only works if {} - the placeholder for all paths - is the token immediately before the +.
Since you're on Linux, we can satisfy this condition by specifying the target folder for mv with option -t before the {}; on BSD-based systems such as OSX, you could not do that, because mv doesn't support -t there, so you'd have to use terminator \;, which means that mv is called once for every path, which is obviously much slower.
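On those BSD-based systems the fallback would therefore look something like this (one mv per file; shown with echo as a dry run, mirroring the command above):
find . -type f -exec echo mv {} . ';'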
Why your command didn't work:
As @EtanReisner points out in a comment, xargs invokes the command specified without (implicitly) involving a shell, so globbing won't work; you can verify this with the following command:
echo '*' | xargs echo # -> '*' - NO globbing
If we leave the globbing issue aside, additional work would have been necessary to make your xargs command work correctly with folder names with embedded spaces (or other shell metacharacters):
find . -mindepth 1 -type d |
sed -r "s/.*/'&'\/* ./" | # -> '<input-path>'/* . (including single-quotes)
xargs -n 2 echo mv # NOTE: still won't work due to lack of globbing
Note how the (combined) sed command now produces a single output line '<input-path>'/* ., with the input path enclosed in embedded single-quotes, which is required for xargs to recognize <input-path> as a single argument, even if it contains embedded spaces.
(If your filenames contain single-quotes, you'd have to do more work; also note that since now all arguments for a given dir. are on a single line, you could use xargs -L 1 ....)
Also note how -mindepth 1 (only process paths at the subdirectory level or below) is used to skip processing of . itself.
The only way to make globbing happen is to get the shell involved:
find . -mindepth 1 -type d |
sed -r "s/.*/'&'\/* ./" | # -> '<input-path>'/* . (including single-quotes)
xargs -I {} sh -c 'echo mv {}' # works, but is inefficient
Note the use of xargs' -I option to treat each input line as its own argument ({} is a self-chosen placeholder for the input).
sh -c invokes the (default) shell to execute the resulting command, at which globbing does happen.
However, overall, this is quite inefficient:
A pipeline with 3 segments is used.
A shell instance is invoked for every input path, which in turn calls the mv utility.
Compare this to the efficient find-only solution above, which (typically) creates only 2 processes in total.

Remove files not containing a specific string

I want to find the files not containing a specific string (in a directory and its sub-directories) and remove those files. How can I do this?
The following will work:
find . -type f -print0 | xargs --null grep -Z -L 'my string' | xargs --null rm
This will firstly use find to print the names of all the files in the current directory and any subdirectories. These names are printed with a null terminator rather than the usual newline separator (try piping the output to od -c to see the effect of the -print0 argument).
Then the --null parameter to xargs tells it to accept null-terminated inputs. xargs will then call grep on a list of filenames.
The -Z argument to grep works like the -print0 argument to find, so grep will print out its results null-terminated (which is why the final call to xargs needs a --null option too). The -L argument to grep causes grep to print the filenames of those files on its command line (that xargs has added) which don't match the regular expression:
my string
If you want simple matching without regular expression magic then add the -F option. If you want more powerful regular expressions then give a -E argument. It's a good habit to use single quotes rather than double quotes, as this protects you against any shell magic being applied to the string (such as variable substitution).
Finally you call xargs again to get rid of all the files that you've found with the previous calls.
The problem with calling grep directly from the find command with the -exec argument (terminated with \;) is that grep then gets invoked once per file rather than once for a whole batch of files as xargs does. Batching is much faster if you have lots of files. Also don't be tempted to do stuff like:
rm $(some command that produces lots of filenames)
It's always better to pass it to xargs as this knows the maximum command-line limits and will call rm multiple times each time with as many arguments as it can.
Note that this solution would have been simpler without the need to cope with files containing white space and new lines.
Alternatively
grep -r -L -Z 'my string' . | xargs --null rm
will work too (and is shorter). The -r argument to grep causes it to read all files in the directory and recursively descend into any subdirectories. Use the find ... approach if you want to do some other tests on the files as well (such as age or permissions).
Note that any of the single letter arguments, with a single dash introducer, can be grouped together (for instance as -rLZ). But note also that find does not use the same conventions and has multi-letter arguments introduced with a single dash. This is for historical reasons and hasn't ever been fixed because it would have broken too many scripts.
GNU grep and bash.
grep -rLZ "$str" . | while IFS= read -rd '' x; do rm "$x"; done
Use a find solution if portability is needed. This is slightly faster.
EDIT: This is how you SHOULD NOT do this! Reason is given here. Thanks to @ormaaj for pointing it out!
find . -type f | grep -v "exclude string" | xargs rm
Note: the grep pattern will match against the full file path relative to the current directory (see the find . -type f output)
One possibility is
find . -type f '!' -exec grep -q "my string" {} \; -exec echo rm {} \;
You can remove the echo if the output of this preview looks correct.
The equivalent with -delete is
find . -type f '!' -exec grep -q "user_id" {} \; -delete
but then you don't get the nice preview option.
To remove files not containing a specific string:
Bash:
To use them, enable the extglob shell option as follows:
shopt -s extglob
Then just remove all files whose names don't contain the string "fix":
rm !(*fix*)
If you don't want to delete any file whose name contains either "fix" or "class":
rm !(*fix*|*class*)
Zsh:
To use them, enable the extended glob zsh shell option as follows:
setopt extended_glob
Remove all files whose names don't contain the string, in this example "fix":
rm -- ^*fix*
If you don't want to delete any file whose name contains either "fix" or "class":
rm -- ^(*fix*|*class*)
It's possible to use this for extensions too; you only need to change the pattern: *.zip, *.doc, etc.
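For example, in bash with extglob enabled, this would delete everything except .zip and .doc files (hypothetical extensions; adjust to taste):
rm !(*.zip|*.doc)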
Here are the sources:
https://www.tecmint.com/delete-all-files-in-directory-except-one-few-file-extensions/
https://codeday.me/es/qa/20190819/1296122.html
I can think of a few ways to approach this. Here's one: find and grep to generate a list of files with no match, and then xargs rm them.
find yourdir -type f -exec grep -F -L 'yourstring' '{}' + | xargs -d '\n' rm
This assumes GNU tools (grep -L and xargs -d are non-portable) and of course no filenames with newlines in them. It has the advantage of not running grep and rm once per file, so it'll be reasonably fast. I recommend testing it with "echo" in place of "rm" just to make sure it picks the right files before you unleash the destruction.
This worked for me, you can remove the -f if you're okay with deleting directories.
myString="keepThis"
for x in $(find ./)
do if [[ -f "$x" && ! "$x" =~ $myString ]]
then rm "$x"
fi
done
Another solution (although not as fast). The top solution didn't work in my case because the string I needed to use in place of 'my string' has special characters.
find -type f ! -name "*my string*" -exec rm {} \; -print

Extract arguments from stdout and pipe

I was trying to execute a script n times with a different file as argument each time using this:
ls /user/local/*.log | xargs script.pl
(script.pl accepts file name as argument)
But the script is executed only once. How do I resolve this? Am I not doing it correctly?
ls /user/local/*.log | xargs -rn1 script.pl
I guess your script only expects one parameter, so you need to tell xargs to pass the arguments one at a time with -n1.
Passing -r prevents the command from running at all if the input list is empty.
Note that something like the following is, in general, better:
find /user/local/ -maxdepth 1 -type f -name '*.log' -print0 |
xargs -0rn1 script.pl
It will handle quoting and directories more safely.
To see what xargs actually executes use the -t flag.
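For example, -t echoes each constructed command line to stderr before running it, so you can confirm the script really is invoked once per file (a sketch using hypothetical log names):
ls /user/local/*.log | xargs -rn1 -t script.pl
# stderr shows something like:
# script.pl /user/local/a.log
# script.pl /user/local/b.log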
I hope this helps:
for i in $(ls /usr/local/*.log)
do
script.pl $i
done
I would avoid using ls, as it is often an alias on many systems. E.g. if your ls produces colored output, then this will not work:
for i in $(ls); do ls "$i"; done
Rather, you can just let the shell expand the wildcard for you:
for i in /usr/local/*.log; do script.pl "$i"; done

How do I recursively grep all directories and subdirectories?

How do I recursively grep all directories and subdirectories?
find . | xargs grep "texthere" *
grep -r "texthere" .
The first parameter represents the regular expression to search for, while the second one represents the directory that should be searched. In this case, . means the current directory.
Note: This works for GNU grep, and on some platforms like Solaris you must specifically use GNU grep as opposed to legacy implementation. For Solaris this is the ggrep command.
If you know the extension or pattern of the file you would like, another method is to use --include option:
grep -r --include "*.txt" texthere .
You can also mention files to exclude with --exclude.
Ag
If you frequently search through code, Ag (The Silver Searcher) is a much faster alternative to grep, that's customized for searching code. For instance, it's recursive by default and automatically ignores files and directories listed in .gitignore, so you don't have to keep passing the same cumbersome exclude options to grep or find.
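For example, a recursive search from the current directory is just (assuming ag is installed):
ag "texthere"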
I now always use (even on Windows with GoW -- Gnu on Windows):
grep --include="*.xxx" -nRHI "my Text to grep" *
(As noted by kronen in the comments, you can add 2>/dev/null to avoid permission-denied output)
That includes the following options:
--include=PATTERN
Recurse in directories only searching file matching PATTERN.
-n, --line-number
Prefix each line of output with the line number within its input file.
(Note: phuclv adds in the comments that -n decreases performance a lot, so you might want to skip that option)
-R, -r, --recursive
Read all files under each directory, recursively; this is equivalent to the -d recurse option.
-H, --with-filename
Print the filename for each match.
-I
Process a binary file as if it did not contain matching data;
this is equivalent to the --binary-files=without-match option.
And I can add 'i' (-nRHIi), if I want case-insensitive results.
I can get:
/home/vonc/gitpoc/passenger/gitlist/github #grep --include="*.php" -nRHI "hidden" *
src/GitList/Application.php:43: 'git.hidden' => $config->get('git', 'hidden') ? $config->get('git', 'hidden') : array(),
src/GitList/Provider/GitServiceProvider.php:21: $options['hidden'] = $app['git.hidden'];
tests/InterfaceTest.php:32: $options['hidden'] = array(self::$tmpdir . '/hiddenrepo');
vendor/klaussilveira/gitter/lib/Gitter/Client.php:20: protected $hidden;
vendor/klaussilveira/gitter/lib/Gitter/Client.php:170: * Get hidden repository list
vendor/klaussilveira/gitter/lib/Gitter/Client.php:176: return $this->hidden;
...
Also:
find ./ -type f -print0 | xargs -0 grep "foo"
but grep -r is a better answer.
globbing **
Using grep -r works, but it may be overkill, especially in large folders.
For more practical usage, here is the syntax which uses globbing (**):
grep "texthere" **/*.txt
which greps only the files matching the selected pattern. It works in shells that support it, such as Bash 4+ or zsh.
To activate this feature in Bash, run: shopt -s globstar.
See also: How do I find all files containing specific text on Linux?
git grep
For projects under Git version control, use:
git grep "pattern"
which is much quicker.
ripgrep
For larger projects, the quickest grepping tool is ripgrep which greps files recursively by default:
rg "pattern" .
It's built on top of Rust's regex engine which uses finite automata, SIMD and aggressive literal optimizations to make searching very fast. Check the detailed analysis here.
On POSIX systems, there is no -r parameter for grep, so grep -rn "stuff" . won't run, but if you use the find command it will:
find . -type f -exec grep -n "stuff" {} \; -print
This works on Solaris and HP-UX.
If you only want to follow actual directories, and not symbolic links,
grep -r "thingToBeFound" directory
If you want to follow symbolic links as well as actual directories (be careful of infinite recursion),
grep -R "thing to be found" directory
Since you're trying to grep recursively, the following options may also be useful to you:
-H: outputs the filename with the line
-n: outputs the line number in the file
So if you want to find all files containing Darth Vader in the current directory or any subdirectories and capture the filename and line number, but do not want the recursion to follow symbolic links, the command would be
grep -rnH "Darth Vader" .
If you want to find all mentions of the word cat in the directory
/home/adam/Desktop/TomAndJerry
and you're currently in the directory
/home/adam/Desktop/WorldDominationPlot
and you want to capture the filename but not the line number of any instance of the string "cats", and you want the recursion to follow symbolic links if it finds them, you could run either of the following
grep -RH "cats" ../TomAndJerry #relative directory
grep -RH "cats" /home/adam/Desktop/TomAndJerry #absolute directory
Source:
running "grep --help"
A short introduction to symbolic links, for anyone reading this answer and confused by my reference to them:
https://www.nixtutor.com/freebsd/understanding-symbolic-links/
To recursively find the names (with paths) of files containing a particular string, use the commands below.
for UNIX:
find . | xargs grep "searched-string"
for Linux:
grep -r "searched-string" .
To find a file by name on a UNIX server:
find . -type f -name file_name
To find a file by name on a Linux server:
find . -name file_name
Just the filenames can be useful too:
grep -r -l "foo" .
Another syntax to grep a string recursively in all files on a Linux system:
grep -irn "string"
The -r indicates a recursive search: it looks for the specified string in the given directory and its subdirectories, in files, programs, etc.
-i ignores case.
-n prints the line number of each match.
NB: this prints a massive amount of output to the console, so you might need to filter it by piping it through other tools and dropping the less interesting bits. It also searches binary programs, so you might want to filter some of those results out.
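One way to tame that output (a sketch: -I skips binary files, and piping into less pages the rest):
grep -Irn "string" . | less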
ag is my favorite way to do this now: github.com/ggreer/the_silver_searcher. It's basically the same thing as ack but with a few more optimizations.
Here's a short benchmark. I clear the cache before each test (cf https://askubuntu.com/questions/155768/how-do-i-clean-or-disable-the-memory-cache )
ryan@3G08$ sync && echo 3 | sudo tee /proc/sys/vm/drop_caches
3
ryan@3G08$ time grep -r "hey ya" .
real 0m9.458s
user 0m0.368s
sys 0m3.788s
ryan@3G08$ sync && echo 3 | sudo tee /proc/sys/vm/drop_caches
3
ryan@3G08$ time ack-grep "hey ya" .
real 0m6.296s
user 0m0.716s
sys 0m1.056s
ryan@3G08$ sync && echo 3 | sudo tee /proc/sys/vm/drop_caches
3
ryan@3G08$ time ag "hey ya" .
real 0m5.641s
user 0m0.356s
sys 0m3.444s
ryan@3G08$ time ag "hey ya" . #test without first clearing cache
real 0m0.154s
user 0m0.224s
sys 0m0.172s
This should work:
grep -R "texthere" *
If you are looking for specific content in all files in a directory structure, you may use find, since it is clearer what you are doing:
find -type f -exec grep -l "texthere" {} +
Note that -l (lowercase L) shows the name of the file that contains the text. Remove it if you instead want to print the match itself. Or use -H to get the file together with the match. All together, other alternatives are:
find -type f -exec grep -Hn "texthere" {} +
Where -n prints the line number.
This is the one that worked for my case on my current machine (git bash on windows 7):
find ./ -type f -iname "*.cs" -print0 | xargs -0 grep "content pattern"
I always forget the -print0 and -0 for paths with spaces.
EDIT: My preferred tool is now instead ripgrep: https://github.com/BurntSushi/ripgrep/releases . It's really fast and has better defaults (like recursive by default). Same example as my original answer but using ripgrep: rg -g "*.cs" "content pattern"
grep -r "texthere" . (notice period at the end)
(^credit: https://stackoverflow.com/a/1987928/1438029)
Clarification:
grep -r "texthere" / (recursively grep all directories and subdirectories)
grep -r "texthere" . (recursively grep these directories and subdirectories)
grep recursive
grep [options] PATTERN [FILE...]
[options]
-R, -r, --recursive
Read all files under each directory, recursively.
This is equivalent to the -d recurse or --directories=recurse option.
http://linuxcommand.org/man_pages/grep1.html
grep help
$ grep --help
$ grep --help |grep recursive
-r, --recursive like --directories=recurse
-R, --dereference-recursive
Alternatives
ack (http://beyondgrep.com/)
ag (http://github.com/ggreer/the_silver_searcher)
Throwing my two cents here. As others already mentioned grep -r doesn't work on every platform. This may sound silly but I always use git.
git grep "texthere"
Even if the directory is not staged, I just stage it and use git grep.
Below are the commands to search for a string recursively in Unix and Linux environments.
The UNIX command is:
find . -type f -exec grep "string to be searched" {} \;
The Linux command is:
grep -r "string to be searched" .
In 2018, you want to use ripgrep or the-silver-searcher because they are way faster than the alternatives.
Here is a directory with 336 first-level subdirectories:
% find . -maxdepth 1 -type d | wc -l
336
% time rg -w aggs -g '*.py'
...
rg -w aggs -g '*.py' 1.24s user 2.23s system 283% cpu 1.222 total
% time ag -w aggs -G '.*py$'
...
ag -w aggs -G '.*py$' 2.71s user 1.55s system 116% cpu 3.651 total
% time find ./ -type f -name '*.py' | xargs grep -w aggs
...
find ./ -type f -name '*.py' 1.34s user 5.68s system 32% cpu 21.329 total
xargs grep -w aggs 6.65s user 0.49s system 32% cpu 22.164 total
On OSX, this installs ripgrep: brew install ripgrep. This installs silver-searcher: brew install the_silver_searcher.
In my IBM AIX Server (OS version: AIX 5.2), use:
find ./ -type f -print -exec grep -n -i "stringYouWannaFind" {} \;
This will print out the path/file name and the line number within the file, like:
./inc/xxxx_x.h
2865: /** Description : stringYouWannaFind */
Anyway, it works for me :)
For a list of available flags:
grep --help
Returns all matches for the regexp texthere in the current directory, with the corresponding line number:
grep -rn "texthere" .
Returns all matches for texthere, starting at the root directory, with the corresponding line number and ignoring case:
grep -rni "texthere" /
flags used here:
-r recursive
-n print line number with output
-i ignore case
Note that find . -type f | xargs grep whatever sorts of solutions will run into "Argument list too long" errors when there are too many files matched by find.
The best bet is grep -r but if that isn't available, use find . -type f -exec grep -H whatever {} \; instead.
I guess this is what you're trying to write
grep myText $(find .)
and this may be something else helpful if you want to find the files grep hit
grep myText $(find .) | cut -d : -f 1 | sort | uniq
For .gz files, this recursively scans all files and directories (change the file type, or put * to cover all files):
find . -name \*.gz -print0 | xargs -0 zgrep "STRING"
Just for fun, a quick and dirty search of *.txt files if the @christangrant answer is too much to type :-)
grep -r texthere . | grep .txt
Here's a recursive function (tested lightly with bash and sh) that traverses all subfolders of a given folder ($1) and uses grep to search for a given string ($3) in files with a given name ($2):
$ cat script.sh
#!/bin/sh
cd "$1"
loop () {
for i in *
do
if [ -d "$i" ]
then
# echo entering "$i"
cd "$i"
loop "$1" "$2"
fi
done
if [ -f "$1" ]
then
grep -l "$2" "$PWD/$1"
fi
cd ..
}
loop "$2" "$3"
Running it and an example output:
$ sh script.sh start_folder filename search_string
/home/james/start_folder/dir2/filename
Get the files matched by the first grep command, then from those files find the ones that do (first command below) or don't (second command, with -L) contain a second word; the input files for the second grep come from the result files of the first grep command.
grep -l -r --include "*.js" "FIRSTWORD" * | xargs grep "SECONDwORD"
grep -l -r --include "*.js" "FIRSTWORD" * | xargs grep -L "SECONDwORD"
grep -l -r --include "*.js" "SEARCHWORD" * | awk -F'/' '{print $NF}' | xargs -I{} sh -c 'echo {}; grep -l -r --include "*.html" -w --include=*.js -e {} *; echo '''
