Exclude .svn directories from grep [duplicate] - linux

This question already has answers here:
How can I exclude directories from grep -R?
(14 answers)
Closed 6 years ago.
When I grep my Subversion working copy directory, the results include a lot of files from the .svn directories. Is it possible to recursively grep a directory, but exclude all results from .svn directories?

If you have GNU Grep, it should work like this:
grep --exclude-dir=".svn"
If happen to be on a Unix System without GNU Grep, try the following:
grep -R "whatever you like" *|grep -v "\.svn/*"

For grep >=2.5.1a
You can put this into your environment (e.g. .bashrc)
export GREP_OPTIONS='--exclude-dir=".svn"'
PS: thanks to Adrinan, there are extra quotes in my version:
export GREP_OPTIONS='--exclude-dir=.svn'
PPS: This env option is marked for deprecation: https://www.gnu.org/software/grep/manual/html_node/Environment-Variables.html "As this causes problems when writing portable scripts, this feature will be removed in a future release of grep, and grep warns if it is used. Please use an alias or script instead."

If you use ack (a 'better grep') it will handle this automatically (and do a lot of other clever things too!). It's well worth checking out.

psychoschlumpf is correct, but it only works if you have the latest version of grep. Earlier versions do not have the --exclude-dir option. However, if you have a very large codebase, double-grep-ing can take forever. Drop this in your .bashrc for a portable .svn-less grep:
alias sgrep='find . -path "*/.svn" -prune -o -print0 | xargs -0 grep'
Now you can do this:
sgrep some_var
... and get expected results.
Of course, if you're an insane person like me who just has to use the same .bashrc everywhere, you could spend 4 hours writing an overcomplicated bash function to put there instead. Or, you could just wait for an insane person like me to post it online:
http://gist.github.com/573928

grep --exclude-dir=".svn"
works because the name ".svn" is rather unique. But this might fail on a more generalized name.
grep --exclude-dir="work"
is not bulletproof, if you have "/home/user/work" and "/home/user/stuff/work" it will skip both. It is not possible to define "/*/work/*"
to restrict the exclusion to only the former folder name.
As far as I could experiment, in GNU grep the simple --exclude won't exclude directories.

On my GNU grep 2.5, --exclude-dirs is not a valid option. As an alternative, this worked well for me:
grep --exclude="*.svn-base"
This should be a better solution than excluding all lines which contain .svn/ since it wouldn't accidentally filter out such lines in a real file.

Two greps will do the trick:
The first grep will get everything.
The second grep will use output of first grep as input (via piping). By using the -v flag, grep will select the lines which DON'T match the search terms. Voila. You are left with all the ouputs from the first grep which do not contain .svn in the filepath.
-v, --invert-match
Invert the sense of matching, to select non-matching lines.
grep the_text_you_want_to_search_for * | grep -v .svn

I tried double grep'in on my huge code base and it took forever so I got this solution with the help of my co-worker
Pruning is much faster as it stops find from processing those directories compared to 'grep -v' which processes everything and only excludes displaying results
find . -name .svn -prune -o -type f -print0 | xargs -0 egrep 'YOUR STRING'
You can also alias this command in your .bashrc as
alias sgrep='find . -name .svn build -prune -o -type f -print0 | xargs -0 egrep '
Now simply use
sgrep 'whatever'

Another option, albeit one that may not be perceived as an acceptable answer is to clone the repo into git and use git grep.
Rarely, I run into svn repositories that are so massive, it's just impractical to clone via git-svn. In these rare cases, I use a double grep solution, svngrep, but as many answers here indicate, this could be slow on large repositories, and exclude '.svn' occurrences that aren't directories. I would argue that these would be extremely seldom though...
Also regarding slow performance of multiple greps, once you've used something like git, pretty much everything seems slow in svn!
One last thing.., my variation of svngrep passes through colorization, beware, the implementation is ugly! Roughly grep -rn "$what" $where | egrep -v "$ignore" | grep --color "$what"

For grep version 2.5.1 you can add multiple --exclude items to filter out the .svn files.
$ grep -V | grep grep
grep (GNU grep) 2.5.1
GREP_OPTIONS="--exclude=*.svn-base --exclude=entries --exclude=all-wcprops" grep -l -R whatever ./

I think the --exclude option of recursion is what you are searching for.

Related

linux diff on folder and file structure [duplicate]

I have two directories with the same list of files. I need to compare all the files present in both the directories using the diff command. Is there a simple command line option to do it, or do I have to write a shell script to get the file listing and then iterate through them?
You can use the diff command for that:
diff -bur folder1/ folder2/
This will output a recursive diff that ignore spaces, with a unified context:
b flag means ignoring whitespace
u flag means a unified context (3 lines before and after)
r flag means recursive
If you are only interested to see the files that differ, you may use:
diff -qr dir_one dir_two | sort
Option "q" will only show the files that differ but not the content that differ, and "sort" will arrange the output alphabetically.
Diff has an option -r which is meant to do just that.
diff -r dir1 dir2
diff can not only compare two files, it can, by using the -r option, walk entire directory trees, recursively checking differences between subdirectories and files that occur at comparable points in each tree.
$ man diff
...
-r --recursive
Recursively compare any subdirectories found.
...
Another nice option is the über-diff-tool diffoscope:
$ diffoscope a b
It can also emit diffs as JSON, html, markdown, ...
If you specifically don't want to compare contents of files and only check which one are not present in both of the directories, you can compare lists of files, generated by another command.
diff <(find DIR1 -printf '%P\n' | sort) <(find DIR2 -printf '%P\n' | sort) | grep '^[<>]'
-printf '%P\n' tells find to not prefix output paths with the root directory.
I've also added sort to make sure the order of files will be the same in both calls of find.
The grep at the end removes information about identical input lines.
If it's GNU diff then you should just be able to point it at the two directories and use the -r option.
Otherwise, try using
for i in $(\ls -d ./dir1/*); do diff ${i} dir2; done
N.B. As pointed out by Dennis in the comments section, you don't actually need to do the command substitution on the ls. I've been doing this for so long that I'm pretty much doing this on autopilot and substituting the command I need to get my list of files for comparison.
Also I forgot to add that I do '\ls' to temporarily disable my alias of ls to GNU ls so that I lose the colour formatting info from the listing returned by GNU ls.
When working with git/svn or multiple git/svn instances on disk this has been one of the most useful things for me over the past 5-10 years, that somebody might find useful:
diff -burN /path/to/directory1 /path/to/directory2 | grep +++
or:
git diff /path/to/directory1 | grep +++
It gives you a snapshot of the different files that were touched without having to "less" or "more" the output. Then you just diff on the individual files.
In practice the question often arises together with some constraints. In that case following solution template may come in handy.
cd dir1
find . \( -name '*.txt' -o -iname '*.md' \) | xargs -i diff -u '{}' 'dir2/{}'
Here is a script to show differences between files in two folders. It works recursively. Change dir1 and dir2.
(search() { for i in $1/*; do [ -f "$i" ] && (diff "$1/${i##*/}" "$2/${i##*/}" || echo "files: $1/${i##*/} $2/${i##*/}"); [ -d "$i" ] && search "$1/${i##*/}" "$2/${i##*/}"; done }; search "dir1" "dir2" )
Try this:
diff -rq /path/to/folder1 /path/to/folder2

linux : listing files that contain several words

I try to find a way to list all the files in the directory tree (recursively) that contain several words.
While searching I found example such as egrep -R -l 'toto|tata' . but | induce OR. I would like AND...
Thank you for your help
Using GNU grep with GNU xargs,
grep -ERl 'toto' | xargs -r grep 'tata'
The first grep lists those files containing the pattern toto which is then fed to xargs and with the second grep those files containing tata is retrieved. The -r flag is to ensure second grep doesn't run on an empty output.
The -r flag in xargs from the man page,
-r, --no-run-if-empty
If the standard input does not contain any nonblanks, do not run the command.
Normally, the command is run once even if there is no input. This option is a GNU
extension.
agrep tool is designed for providing AND to grep with usage:
agrep 'pattern1;pattern2' file
In your case you could run
find . -type f -exec agrep 'toto;tata' {} \; #apply -l to display the file names
PS1: For current directory you can just agrep 'pattern1;pattern2' *.*
PS2: Unfortunatelly agrep does not support -R option.

removing files with numerals in the beginning of the file name

I'm working with Ubuntu recently and I have been asked to remove files with numerals at the beginning.
How do I remove ordinary files from current directory that have numerals at the first three characters?
Since nobody else bothered to post this,
rm [0-9][0-9][0-9]*
First of all: Be careful when trying out such delete commands! Try running in a directory with test files or files that are backed up well.
You could try something like this from shell:
find . -regex './[0-9]{3}.*' -exec 'rm {}' \;
For debugging, try running it without the rm-command first, listing the files that will be deleted:
find . -regex './[0-9]{3}.*'
You may have to escape the curly braces - at least I had to in FreeBSD, using zsh-shell:
find . -regex './[0-9]\{3\}.*'
How about something like
ls | egrep '^[0-9]{3}' | xargs rm
The ls lists all the files, the egrep filters the list so that it only contains filenames that start with three digits, and the xargs applies rm to each of the filenamess that egrep lets through.

How to do a recursive grep (search) and automatically look inside archive files especially jar ones?

I am looking for a command that would work like grep -r sometext . but that will know to look inside archives (at least .jar).
Extra bonus ;) if we can make it ignore directories like .svn .hg.
The low-tech solution:
find . -name "*.jar" -exec jar tf '{}' \| grep -H "SearchTerm" \;
If you do that sort of thing often, consider replacing grep with ack (plus it does avoid version control directories by default)
To search in archives (e.g. zip, tar, pax, jar) the open source ugrep tool with option -z does exactly that:
ugrep -z "sometext" file.jar

What's the best way to find a string/regex match in files recursively? (UNIX)

I have had to do this several times, usually when trying to find in what files a variable or a function is used.
I remember using xargs with grep in the past to do this, but I am wondering if there are any easier ways.
grep -r REGEX .
Replace . with whatever directory you want to search from.
The portable method* of doing this is
find . -type f -print0 | xargs -0 grep pattern
-print0 tells find to use ASCII nuls as the separator and -0 tells xargs the same thing. If you don't use them you will get errors on files and directories that contain spaces in their names.
* as opposed to grep -r, grep -R, or grep --recursive which only work on some machines.
This is one of the cases for which I've started using ack (http://petdance.com/ack/) in lieu of grep. From the site, you can get instructions to install it as a Perl CPAN component, or you can get a self-contained version that can be installed without dealing with dependencies.
Besides the fact that it defaults to recursive searching, it allows you to use Perl-strength regular expressions, use regex's to choose files to search, etc. It has an impressive list of options. I recommend visiting the site and checking it out. I've found it extremely easy to use, and there are tips for integrating it with vi(m), emacs, and even TextMate if you use that.
If you're looking for a string match, use
fgrep -r pattern .
which is faster than using grep.
More about the subject here: http://www.mkssoftware.com/docs/man1/grep.1.asp
grep -r if you're using GNU grep, which comes with most Linux distros.
On most UNIXes it's not installed by default so try this instead:
find . -type f | xargs grep regex
If you use the zsh shell you can use
grep REGEX **/*
or
grep REGEX **/*.java
This can run out of steam if there are too many matching files.
The canonical way though is to use find with exec.
find . -name '*.java' -exec grep REGEX {} \;
or
find . -type f -exec grep REGEX {} \;
The 'type f' bit just means type of file and will match all files.
I suggest changing the answer to:
grep REGEX -r .
The -r switch doesn't indicate regular expression. It tells grep to recurse into the directory provided.
This is a great way to find the exact expression recursively with one or more file types:
find . \\( -name '\''*.java'\'' -o -name '\''*.xml'\'' \\) | xargs egrep
(internal single quotes)
Where
-name '\''*.<filetype>'\'' -o
(again single quotes here)
is repeated in the parenthesis ( ) for how many more filetypes you want to add to your recursive search
an alias looks like this in bash
alias fnd='find . \\( -name '\''*.java'\'' -o -name '\''*.xml'\'' \\) | xargs egrep'

Resources