How to do a recursive grep (search) and automatically look inside archive files especially jar ones? - search

I am looking for a command that would work like grep -r sometext . but that will know to look inside archives (at least .jar).
Extra bonus ;) if we can make it ignore directories like .svn .hg.

The low-tech solution:
find . -name "*.jar" -exec jar tf '{}' \| grep -H "SearchTerm" \;
If you do that sort of thing often, consider replacing grep with ack (plus it does avoid version control directories by default)

To search in archives (e.g. zip, tar, pax, jar) the open source ugrep tool with option -z does exactly that:
ugrep -z "sometext" file.jar

Related

linux diff on folder and file structure [duplicate]

I have two directories with the same list of files. I need to compare all the files present in both the directories using the diff command. Is there a simple command line option to do it, or do I have to write a shell script to get the file listing and then iterate through them?
You can use the diff command for that:
diff -bur folder1/ folder2/
This will output a recursive diff that ignore spaces, with a unified context:
b flag means ignoring whitespace
u flag means a unified context (3 lines before and after)
r flag means recursive
If you are only interested to see the files that differ, you may use:
diff -qr dir_one dir_two | sort
Option "q" will only show the files that differ but not the content that differ, and "sort" will arrange the output alphabetically.
Diff has an option -r which is meant to do just that.
diff -r dir1 dir2
diff can not only compare two files, it can, by using the -r option, walk entire directory trees, recursively checking differences between subdirectories and files that occur at comparable points in each tree.
$ man diff
...
-r --recursive
Recursively compare any subdirectories found.
...
Another nice option is the über-diff-tool diffoscope:
$ diffoscope a b
It can also emit diffs as JSON, html, markdown, ...
If you specifically don't want to compare contents of files and only check which one are not present in both of the directories, you can compare lists of files, generated by another command.
diff <(find DIR1 -printf '%P\n' | sort) <(find DIR2 -printf '%P\n' | sort) | grep '^[<>]'
-printf '%P\n' tells find to not prefix output paths with the root directory.
I've also added sort to make sure the order of files will be the same in both calls of find.
The grep at the end removes information about identical input lines.
If it's GNU diff then you should just be able to point it at the two directories and use the -r option.
Otherwise, try using
for i in $(\ls -d ./dir1/*); do diff ${i} dir2; done
N.B. As pointed out by Dennis in the comments section, you don't actually need to do the command substitution on the ls. I've been doing this for so long that I'm pretty much doing this on autopilot and substituting the command I need to get my list of files for comparison.
Also I forgot to add that I do '\ls' to temporarily disable my alias of ls to GNU ls so that I lose the colour formatting info from the listing returned by GNU ls.
When working with git/svn or multiple git/svn instances on disk this has been one of the most useful things for me over the past 5-10 years, that somebody might find useful:
diff -burN /path/to/directory1 /path/to/directory2 | grep +++
or:
git diff /path/to/directory1 | grep +++
It gives you a snapshot of the different files that were touched without having to "less" or "more" the output. Then you just diff on the individual files.
In practice the question often arises together with some constraints. In that case following solution template may come in handy.
cd dir1
find . \( -name '*.txt' -o -iname '*.md' \) | xargs -i diff -u '{}' 'dir2/{}'
Here is a script to show differences between files in two folders. It works recursively. Change dir1 and dir2.
(search() { for i in $1/*; do [ -f "$i" ] && (diff "$1/${i##*/}" "$2/${i##*/}" || echo "files: $1/${i##*/} $2/${i##*/}"); [ -d "$i" ] && search "$1/${i##*/}" "$2/${i##*/}"; done }; search "dir1" "dir2" )
Try this:
diff -rq /path/to/folder1 /path/to/folder2

Find multiple files and rename them in Linux

I am having files like a_dbg.txt, b_dbg.txt ... in a Suse 10 system. I want to write a bash shell script which should rename these files by removing "_dbg" from them.
Google suggested me to use rename command. So I executed the command rename _dbg.txt .txt *dbg* on the CURRENT_FOLDER
My actual CURRENT_FOLDER contains the below files.
CURRENT_FOLDER/a_dbg.txt
CURRENT_FOLDER/b_dbg.txt
CURRENT_FOLDER/XX/c_dbg.txt
CURRENT_FOLDER/YY/d_dbg.txt
After executing the rename command,
CURRENT_FOLDER/a.txt
CURRENT_FOLDER/b.txt
CURRENT_FOLDER/XX/c_dbg.txt
CURRENT_FOLDER/YY/d_dbg.txt
Its not doing recursively, how to make this command to rename files in all subdirectories. Like XX and YY I will be having so many subdirectories which name is unpredictable. And also my CURRENT_FOLDER will be having some other files also.
You can use find to find all matching files recursively:
find . -iname "*dbg*" -exec rename _dbg.txt .txt '{}' \;
EDIT: what the '{}' and \; are?
The -exec argument makes find execute rename for every matching file found. '{}' will be replaced with the path name of the file. The last token, \; is there only to mark the end of the exec expression.
All that is described nicely in the man page for find:
-exec utility [argument ...] ;
True if the program named utility returns a zero value as its
exit status. Optional arguments may be passed to the utility.
The expression must be terminated by a semicolon (``;''). If you
invoke find from a shell you may need to quote the semicolon if
the shell would otherwise treat it as a control operator. If the
string ``{}'' appears anywhere in the utility name or the argu-
ments it is replaced by the pathname of the current file.
Utility will be executed from the directory from which find was
executed. Utility and arguments are not subject to the further
expansion of shell patterns and constructs.
For renaming recursively I use the following commands:
find -iname \*.* | rename -v "s/ /-/g"
small script i wrote to replace all files with .txt extension to .cpp extension under /tmp and sub directories recursively
#!/bin/bash
for file in $(find /tmp -name '*.txt')
do
mv $file $(echo "$file" | sed -r 's|.txt|.cpp|g')
done
with bash:
shopt -s globstar nullglob
rename _dbg.txt .txt **/*dbg*
find -execdir rename also works for non-suffix replacements on basenames
https://stackoverflow.com/a/16541670/895245 works directly only for suffixes, but this will work for arbitrary regex replacements on basenames:
PATH=/usr/bin find . -depth -execdir rename 's/_dbg.txt$/_.txt' '{}' \;
or to affect files only:
PATH=/usr/bin find . -type f -execdir rename 's/_dbg.txt$/_.txt' '{}' \;
-execdir first cds into the directory before executing only on the basename.
Tested on Ubuntu 20.04, find 4.7.0, rename 1.10.
Convenient and safer helper for it
find-rename-regex() (
set -eu
find_and_replace="$1"
PATH="$(echo "$PATH" | sed -E 's/(^|:)[^\/][^:]*//g')" \
find . -depth -execdir rename "${2:--n}" "s/${find_and_replace}" '{}' \;
)
GitHub upstream.
Sample usage to replace spaces ' ' with hyphens '-'.
Dry run that shows what would be renamed to what without actually doing it:
find-rename-regex ' /-/g'
Do the replace:
find-rename-regex ' /-/g' -v
Command explanation
The awesome -execdir option does a cd into the directory before executing the rename command, unlike -exec.
-depth ensure that the renaming happens first on children, and then on parents, to prevent potential problems with missing parent directories.
-execdir is required because rename does not play well with non-basename input paths, e.g. the following fails:
rename 's/findme/replaceme/g' acc/acc
The PATH hacking is required because -execdir has one very annoying drawback: find is extremely opinionated and refuses to do anything with -execdir if you have any relative paths in your PATH environment variable, e.g. ./node_modules/.bin, failing with:
find: The relative path ‘./node_modules/.bin’ is included in the PATH environment variable, which is insecure in combination with the -execdir action of find. Please remove that entry from $PATH
See also: https://askubuntu.com/questions/621132/why-using-the-execdir-action-is-insecure-for-directory-which-is-in-the-path/1109378#1109378
-execdir is a GNU find extension to POSIX. rename is Perl based and comes from the rename package.
Rename lookahead workaround
If your input paths don't come from find, or if you've had enough of the relative path annoyance, we can use some Perl lookahead to safely rename directories as in:
git ls-files | sort -r | xargs rename 's/findme(?!.*\/)\/?$/replaceme/g' '{}'
I haven't found a convenient analogue for -execdir with xargs: https://superuser.com/questions/893890/xargs-change-working-directory-to-file-path-before-executing/915686
The sort -r is required to ensure that files come after their respective directories, since longer paths come after shorter ones with the same prefix.
Tested in Ubuntu 18.10.
Script above can be written in one line:
find /tmp -name "*.txt" -exec bash -c 'mv $0 $(echo "$0" | sed -r \"s|.txt|.cpp|g\")' '{}' \;
If you just want to rename and don't mind using an external tool, then you can use rnm. The command would be:
#on current folder
rnm -dp -1 -fo -ssf '_dbg' -rs '/_dbg//' *
-dp -1 will make it recursive to all subdirectories.
-fo implies file only mode.
-ssf '_dbg' searches for files with _dbg in the filename.
-rs '/_dbg//' replaces _dbg with empty string.
You can run the above command with the path of the CURRENT_FOLDER too:
rnm -dp -1 -fo -ssf '_dbg' -rs '/_dbg//' /path/to/the/directory
You can use this below.
rename --no-act 's/\.html$/\.php/' *.html */*.html
This command worked for me. Remember first to install the perl rename package:
find -iname \*.* | grep oldname | rename -v "s/oldname/newname/g
To expand on the excellent answer #CiroSantilliПутлерКапут六四事 : do not match files in the find that we don't have to rename.
I have found this to improve performance significantly on Cygwin.
Please feel free to correct my ineffective bash coding.
FIND_STRING="ZZZZ"
REPLACE_STRING="YYYY"
FIND_PARAMS="-type d"
find-rename-regex() (
set -eu
find_and_replace="${1}/${2}/g"
echo "${find_and_replace}"
find_params="${3}"
mode="${4}"
if [ "${mode}" = 'real' ]; then
PATH="$(echo "$PATH" | sed -E 's/(^|:)[^\/][^:]*//g')" \
find . -depth -name "*${1}*" ${find_params} -execdir rename -v "s/${find_and_replace}" '{}' \;
elif [ "${mode}" = 'dryrun' ]; then
echo "${mode}"
PATH="$(echo "$PATH" | sed -E 's/(^|:)[^\/][^:]*//g')" \
find . -depth -name "*${1}*" ${find_params} -execdir rename -n "s/${find_and_replace}" '{}' \;
fi
)
find-rename-regex "${FIND_STRING}" "${REPLACE_STRING}" "${FIND_PARAMS}" "dryrun"
# find-rename-regex "${FIND_STRING}" "${REPLACE_STRING}" "${FIND_PARAMS}" "real"
In case anyone is comfortable with fd and rnr, the command is:
fd -t f -x rnr '_dbg.txt' '.txt'
rnr only command is:
rnr -f -r '_dbg.txt' '.txt' *
rnr has the benefit of being able to undo the command.
On Ubuntu (after installing rename), this simpler solution worked the best for me. This replaces space with underscore, but can be modified as needed.
find . -depth | rename -d -v -n "s/ /_/g"
The -depth flag is telling find to traverse the depth of a directory first, which is good because I want to rename the leaf nodes first.
The -d flag on rename tells it to only rename the filename component of the path. I don't know how general the behavior is but on my installation (Ubuntu 20.04), it could be the file or the directory as long as it is the leaf node of the path.
I recommend the -n (no action) flag first along with -v, so you can see what would get renamed and how.
Using the two flags together, it renames all the files in a directory first and then the directory itself. Working backwards. Which is exactly what I needed.
classic solution:
for f in $(find . -name "*dbg*"); do mv $f $(echo $f | sed 's/_dbg//'); done

Searching for information in files in several directories

I need to check several files which are in different locations for a specific information.
So, how to make a script which checks for the argument word through several directories?
The directories are in different locations. For ex.
/home/check1/
/opt/log/
/var/status/
You could also do (next to ´find´)
do a
for DIR in /home/check1 /opt/log /var/status ; do
grep -R searchword $DIR;
done
At the very simplest, it boils down to
find . -name '*.c' | xargs grep word
to find a given word in all the .c files in the current directory and below.
grep -R may also work for you, but it can be a problem if you don't want to search all files.
Use the grep -R (recursive) option and give grep multiple directory arguments.
Try find http://content.hccfl.edu/pollock/Unix/FindCmd.htm using your searchwords and the directories.
The man page of grep should explain what you need. Anyway, if you need to search recursively you can use:
grep -R --include=PATTERN "string_to_search" $directory
You can also use:
--exclude=PATTERN to skip some file
--exclude-dir=PATTERN to skip some directories
The other option is use find to get the files and pipe it to grep to search the strings.

Exclude .svn directories from grep [duplicate]

This question already has answers here:
How can I exclude directories from grep -R?
(14 answers)
Closed 6 years ago.
When I grep my Subversion working copy directory, the results include a lot of files from the .svn directories. Is it possible to recursively grep a directory, but exclude all results from .svn directories?
If you have GNU Grep, it should work like this:
grep --exclude-dir=".svn"
If happen to be on a Unix System without GNU Grep, try the following:
grep -R "whatever you like" *|grep -v "\.svn/*"
For grep >=2.5.1a
You can put this into your environment (e.g. .bashrc)
export GREP_OPTIONS='--exclude-dir=".svn"'
PS: thanks to Adrinan, there are extra quotes in my version:
export GREP_OPTIONS='--exclude-dir=.svn'
PPS: This env option is marked for deprecation: https://www.gnu.org/software/grep/manual/html_node/Environment-Variables.html "As this causes problems when writing portable scripts, this feature will be removed in a future release of grep, and grep warns if it is used. Please use an alias or script instead."
If you use ack (a 'better grep') it will handle this automatically (and do a lot of other clever things too!). It's well worth checking out.
psychoschlumpf is correct, but it only works if you have the latest version of grep. Earlier versions do not have the --exclude-dir option. However, if you have a very large codebase, double-grep-ing can take forever. Drop this in your .bashrc for a portable .svn-less grep:
alias sgrep='find . -path "*/.svn" -prune -o -print0 | xargs -0 grep'
Now you can do this:
sgrep some_var
... and get expected results.
Of course, if you're an insane person like me who just has to use the same .bashrc everywhere, you could spend 4 hours writing an overcomplicated bash function to put there instead. Or, you could just wait for an insane person like me to post it online:
http://gist.github.com/573928
grep --exclude-dir=".svn"
works because the name ".svn" is rather unique. But this might fail on a more generalized name.
grep --exclude-dir="work"
is not bulletproof, if you have "/home/user/work" and "/home/user/stuff/work" it will skip both. It is not possible to define "/*/work/*"
to restrict the exclusion to only the former folder name.
As far as I could experiment, in GNU grep the simple --exclude won't exclude directories.
On my GNU grep 2.5, --exclude-dirs is not a valid option. As an alternative, this worked well for me:
grep --exclude="*.svn-base"
This should be a better solution than excluding all lines which contain .svn/ since it wouldn't accidentally filter out such lines in a real file.
Two greps will do the trick:
The first grep will get everything.
The second grep will use output of first grep as input (via piping). By using the -v flag, grep will select the lines which DON'T match the search terms. Voila. You are left with all the ouputs from the first grep which do not contain .svn in the filepath.
-v, --invert-match
Invert the sense of matching, to select non-matching lines.
grep the_text_you_want_to_search_for * | grep -v .svn
I tried double grep'in on my huge code base and it took forever so I got this solution with the help of my co-worker
Pruning is much faster as it stops find from processing those directories compared to 'grep -v' which processes everything and only excludes displaying results
find . -name .svn -prune -o -type f -print0 | xargs -0 egrep 'YOUR STRING'
You can also alias this command in your .bashrc as
alias sgrep='find . -name .svn build -prune -o -type f -print0 | xargs -0 egrep '
Now simply use
sgrep 'whatever'
Another option, albeit one that may not be perceived as an acceptable answer is to clone the repo into git and use git grep.
Rarely, I run into svn repositories that are so massive, it's just impractical to clone via git-svn. In these rare cases, I use a double grep solution, svngrep, but as many answers here indicate, this could be slow on large repositories, and exclude '.svn' occurrences that aren't directories. I would argue that these would be extremely seldom though...
Also regarding slow performance of multiple greps, once you've used something like git, pretty much everything seems slow in svn!
One last thing.., my variation of svngrep passes through colorization, beware, the implementation is ugly! Roughly grep -rn "$what" $where | egrep -v "$ignore" | grep --color "$what"
For grep version 2.5.1 you can add multiple --exclude items to filter out the .svn files.
$ grep -V | grep grep
grep (GNU grep) 2.5.1
GREP_OPTIONS="--exclude=*.svn-base --exclude=entries --exclude=all-wcprops" grep -l -R whatever ./
I think the --exclude option of recursion is what you are searching for.

What's the best way to find a string/regex match in files recursively? (UNIX)

I have had to do this several times, usually when trying to find in what files a variable or a function is used.
I remember using xargs with grep in the past to do this, but I am wondering if there are any easier ways.
grep -r REGEX .
Replace . with whatever directory you want to search from.
The portable method* of doing this is
find . -type f -print0 | xargs -0 grep pattern
-print0 tells find to use ASCII nuls as the separator and -0 tells xargs the same thing. If you don't use them you will get errors on files and directories that contain spaces in their names.
* as opposed to grep -r, grep -R, or grep --recursive which only work on some machines.
This is one of the cases for which I've started using ack (http://petdance.com/ack/) in lieu of grep. From the site, you can get instructions to install it as a Perl CPAN component, or you can get a self-contained version that can be installed without dealing with dependencies.
Besides the fact that it defaults to recursive searching, it allows you to use Perl-strength regular expressions, use regex's to choose files to search, etc. It has an impressive list of options. I recommend visiting the site and checking it out. I've found it extremely easy to use, and there are tips for integrating it with vi(m), emacs, and even TextMate if you use that.
If you're looking for a string match, use
fgrep -r pattern .
which is faster than using grep.
More about the subject here: http://www.mkssoftware.com/docs/man1/grep.1.asp
grep -r if you're using GNU grep, which comes with most Linux distros.
On most UNIXes it's not installed by default so try this instead:
find . -type f | xargs grep regex
If you use the zsh shell you can use
grep REGEX **/*
or
grep REGEX **/*.java
This can run out of steam if there are too many matching files.
The canonical way though is to use find with exec.
find . -name '*.java' -exec grep REGEX {} \;
or
find . -type f -exec grep REGEX {} \;
The 'type f' bit just means type of file and will match all files.
I suggest changing the answer to:
grep REGEX -r .
The -r switch doesn't indicate regular expression. It tells grep to recurse into the directory provided.
This is a great way to find the exact expression recursively with one or more file types:
find . \\( -name '\''*.java'\'' -o -name '\''*.xml'\'' \\) | xargs egrep
(internal single quotes)
Where
-name '\''*.<filetype>'\'' -o
(again single quotes here)
is repeated in the parenthesis ( ) for how many more filetypes you want to add to your recursive search
an alias looks like this in bash
alias fnd='find . \\( -name '\''*.java'\'' -o -name '\''*.xml'\'' \\) | xargs egrep'

Resources