When comparing directories with diff is there a way to exclude certain file differences from the output?

When comparing directories with diff is there a way to exclude certain file differences from the output? - linux

I am running this command:
diff --recursive a.git b.git
It shows differences for some files that do not concern me. Is there a way to for example not have it show lines that end in:
".xaml.g.cs"
or lines that start in:
"Binary files"
or lines that have this text in them:
".git/objects"

A simple but effective way is to pipe output to grep. With grep -Ev you can ignore lines using regular expressions.
diff --recursive a.git b.git | grep -Ev "^< .xaml.g.cs|^> .xaml.g.cs" | grep -Ev "Binary files$" | grep -v ".git/objects"
This ignores all lines with matching text. As for the regular expressions: ^ means line starts with, $ means line ends with. But at least for ^ you have to adjust it to the diff output (where lines normally start with < or >).
Also note that diff provides a flag --ignore-matching-lines=RE but it might not work as you would expect as mentioned in this question/answer. And because it does not work as I would expect I rather use grep for filtering.

man diff
gives following examples:
...
-x, --exclude=PAT
exclude files that match PAT
-X, --exclude-from=FILE
exclude files that match any pattern in FILE
...
Did you try using those switches (like diff -x=".xaml.g.cs" --recursive a.git b.git)?

Related

grep recursively for a specific file type on Linux

Can we search a term (eg. "onblur") recursively in some folders only in specific files (html files)?
grep -Rin "onblur" *.html
This returns nothing. But,
grep -Rin "onblur" .
returns "onblur" search result from all available files, like in text(".txt"), .mako, .jinja etc.

Consider checking this answer and that one.
Also this might help you: grep certain file types recursively | commandlinefu.com.
The command is:
grep -r --include="*.[ch]" pattern .
And in your case it is:
grep -r --include="*.html" "onblur" .

grep -r --include "*.html" onblur .
Got it from :
How do I grep recursively?

You might also like ag 'the silver searcher' -
ag --html onblur
it searches by regexp and is recursive in the current directory by default, and has predefined sets of extensions to search - in this case --html maps to .htm, .html, .shtml, .xhtml. Also ignores binary files, prints filenames, line numbers, and colorizes output by default.
Some options -
-Q --literal
Do not parse PATTERN as a regular expression. Try to match it literally.
-S --smart-case
Match case-sensitively if there are any uppercase letters in PATTERN,
case-insensitively otherwise. Enabled by default.
-t --all-text
Search all text files. This doesn't include hidden files.
--hidden
Search hidden files. This option obeys ignored files.
For the list of supported filetypes run ag --list-file-types.
The only thing it seems to lack is being able to specify a filetype with an extension, in which case you need to fall back on grep with --include.

To be able to grep only from .py files by typing grepy mystring I added the following line to my bashrc:
alias grepy='grep -r --include="*.py"'
Also note that grep accepts The following:
grep mystring *.html
for .html search in current folder
grep mystring */*.html
for recursive search (excluding any file in current dir!).
grep mystring .*/*/*.html
for recursive search (all files in current dir and all files in subdirs)

Have a look at this answer instead, to a similar question: grep, but only certain file extensions
This worked for me. In your case just type the following:
grep -inr "onblur" --include \*.html ./
consider that
grep: command
-r: recursively
-i: ignore-case
-n: each output line is preceded by its relative line number in the file
--include \*.html: escape with \ just in case you have a directory with asterisks in the filenames
./: start at current directory.

LINUX Shell commands cat and grep

I am a windows user having basic idea about LINUX and i encountered this command:
cat countryInfo.txt | grep -v "^#" >countryInfo-n.txt
After some research i found that cat is for concatenation and grep is for regular exp search (don't know if i am right) but what will the above command result in (since both are combined together) ?
Thanks in Advance.
EDIT: I am asking this as i dont have linux installed. Else, i could test it.

Short answer: it removes all lines starting with a # and stores the result in countryInfo-n.txt.
Long explanation:
cat countryInfo.txt reads the file countryInfo.txt and streams its content to standard output.
| connects the output of the left command with the input of the right command (so the right command can read what the left command prints).
grep -v "^#" returns all lines that do not (-v) match the regex ^# (which means: line starts with #).
Finally, >countryInfo-n.txt stores the output of grep into the specified file.

It will remove all lines starting with # and put the output in countryInfo-n.txt

This command would result in removing lines starting with # from the file countryInfo.txt and place the output in the file countryInfo-n.txt.
This command could also have been written as
grep -v "^#" countryInfo.txt > countryInfo-n.txt
See Useless Use of Cat.

Listing entries in a directory using grep

I'm trying to list all entries in a directory whose names contain ONLY upper-case letters. Directories need "/" appended.
#!/bin/bash
cd ~/testfiles/
ls | grep -r *.*
Since grep by default looks for upper-case letters only (right?), I'm just recursively searching through the directories under testfiles for all names who contain only upper-case letters.
Unfortunately this doesn't work.
As for appending directories, I'm not sure why I need to do this. Does anyone know where I can start with some detailed explanations on what I can do with grep? Furthermore how to tackle my problem?

No, grep does not only consider uppercase letters.
Your question I a bit unclear, for example:
from your usage of the -r option, it seems you want to search recursively, however you don't say so. For simplicity I assume you don't need to; consider looking into #twm's answer if you need recursion.
you want to look for uppercase (letters) only. Does that mean you don't want to accept any other (non letter) characters, but which are till valid for file names (like digits or dashes, dots, etc.)
since you don't say th it i not permissible to have only on file per line, I am assuming it is OK (thus using ls -1).
The naive solution would be:
ls -1 | grep "^[[:upper:]]\+$"
That is, print all lines containing only uppercase letters. In my TEMP directory that prints, for example:
ALLBIG
LCFEM
WPDNSE
This however would exclude files like README.TXT or FILE001, which depending on your requirements (see above) should most likely be included.
Thus, a better solution would be:
ls -1 | grep -v "[[:lower:]]\+"
That is, print all lines not containing an lowercase letter. In my TEMP directory that prints for example:
ALLBIG
ALLBIG-01.TXT
ALLBIG005.TXT
CRX_75DAF8CB7768
LCFEM
WPDNSE
~DFA0214428CD719AF6.TMP
Finally, to "properly mark" directories with a trailing '/', you could use the -F (or --classify) option.
ls -1F | grep -v "[[:lower:]]\+"
Again, example output:
ALLBIG
ALLBIG-01.TXT
ALLBIG005.TXT
CRX_75DAF8CB7768
LCFEM/
WPDNSE/
~DFA0214428CD719AF6.TMP
Note a different option would to be use find, if you can live with the different output (e.g. find ! -regex ".*[a-z].*"), but that will have a different output.

The exact regular expression depend on the output format of your ls command. Assuming that you do not use an alias for ls, you can try this:
ls -R | grep -o -w "[A-Z]*"
note that with -R in ls you will recursively list directories and files under the current directory. The grep option -o tells grep to only print the matched part of the text. The -w options tell grep to consider as match only for whole words. The "[A-Z]*" is a regexp to filter only upper-cased words.
Note that this regexp will print TEST.txt as well as TEXT.TXT. In other words, it will only consider names that are formed by letters.

It's ls which lists the files, not grep, so that is where you need to specify that you want "/" appended to directories. Use ls --classify to append "/" to directories.
grep is used to process the results from ls (or some other source, generally speaking) and only show lines that match the pattern you specify. It is not limited to uppercase characters. You can limit it to just upper case characters and "/" with grep -E '^[A-Z/]*$ or if you also want numbers, periods, etc. you could instead filter out lines that contain lowercase characters with grep -v -E [a-z].
As grep is not the program which lists the files, it is not where you want to perform the recursion. ls can list paths recursively if you use ls -R. However, you're just going to get the last component of the file paths that way.
You might want to consider using find to handle the recursion. This works for me:
find . -exec ls -d --classify {} \; | egrep -v '[a-z][^/]*/?$'
I should note, using ls --classify to append "/" to the end of directories may also append some other characters to other types of paths that it can classify. For instance, it may append "*" to the end of executable files. If that's not OK, but you're OK with listing directories and other paths separately, this could be worked around by running find twice - once for the directories and then again for other paths. This works for me:
find . -type d | egrep -v '[a-z][^/]*$' | sed -e 's#$#/#'
find . -not -type d | egrep -v '[a-z][^/]*$'

grep based on blacklist -- without procedural code?

It's a well-known task, simple to describe:
Given a text file foo.txt, and a blacklist file of exclusion strings, one per line, produce foo_filtered.txt that has only the lines of foo.txt that do not contain any exclusion string.
A common application is filtering compiler warnings from a build log, but to ignore warnings on files that are not yours. The file foo.txt is the warnings file (itself filtered from the build log), and a blacklist file excluded_filenames.txt with file names, one per line.
I know how it's done in procedural languages like Perl or AWK, and I've even done it with combinations of Linux commands such as cut, comm, and sort.
But I feel that I should be really close with xargs, and just can't see the last step.
I know that if excluded_filenames.txt has only 1 file name in it, then
grep -v foo.txt `cat excluded_filenames.txt`
will do it.
And I know that I can get the filenames one per line with
xargs -L1 -a excluded_filenames.txt
So how do I combine those two into a single solution, without explicit loops in a procedural language?
Looking for the simple and elegant solution.

You should use the -f option (or you can use fgrep which is the same):
grep -vf excluded_filenames.txt foo.txt
You could also use -F which is more directly the answer to what you asked:
grep -vF "`cat excluded_filenames.txt`" foo.txt
from man grep
-f FILE, --file=FILE
Obtain patterns from FILE, one per line. The empty file contains zero patterns, and therefore matches nothing.
-F, --fixed-strings
Interpret PATTERN as a list of fixed strings, separated by newlines, any of which is to be matched.

Highlight text similar to grep, but don't filter out text [duplicate]

This question already has answers here:
Colorized grep -- viewing the entire file with highlighted matches
(24 answers)
Closed 7 years ago.
When using grep, it will highlight any text in a line with a match to your regular expression.
What if I want this behaviour, but have grep print out all lines as well? I came up empty after a quick look through the grep man page.

Use ack. Checkout its --passthru option here: ack. It has the added benefit of allowing full perl regular expressions.
$ ack --passthru 'pattern1' file_name
$ command_here | ack --passthru 'pattern1'
You can also do it using grep like this:
$ grep --color -E '^|pattern1|pattern2' file_name
$ command_here | grep --color -E '^|pattern1|pattern2'
This will match all lines and highlight the patterns. The ^ matches every start of line, but won't get printed/highlighted since it's not a character.
(Note that most of the setups will use --color by default. You may not need that flag).

You can make sure that all lines match but there is nothing to highlight on irrelevant matches
egrep --color 'apple|' test.txt
Notes:
egrep may be spelled also grep -E
--color is usually default in most distributions
some variants of grep will "optimize" the empty match, so you might want to use "apple|$" instead (see: https://stackoverflow.com/a/13979036/939457)

EDIT:
This works with OS X Mountain Lion's grep:
grep --color -E 'pattern1|pattern2|$'
This is better than '^|pattern1|pattern2' because the ^ part of the alternation matches at the beginning of the line whereas the $ matches at the end of the line. Some regular expression engines won't highlight pattern1 or pattern2 because ^ already matched and the engine is eager.
Something similar happens for 'pattern1|pattern2|' because the regex engine notices the empty alternation at the end of the pattern string matches the beginning of the subject string.
[1]: http://www.regular-expressions.info/engine.html
FIRST EDIT:
I ended up using perl:
perl -pe 's:pattern:\033[31;1m$&\033[30;0m:g'
This assumes you have an ANSI-compatible terminal.
ORIGINAL ANSWER:
If you're stuck with a strange grep, this might work:
grep -E --color=always -A500 -B500 'pattern1|pattern2' | grep -v '^--'
Adjust the numbers to get all the lines you want.
The second grep just removes extraneous -- lines inserted by the BSD-style grep on Mac OS X Mountain Lion, even when the context of consecutive matches overlap.
I thought GNU grep omitted the -- lines when context overlaps, but it's been awhile so maybe I remember wrong.

You can use my highlight script from https://github.com/kepkin/dev-shell-essentials
It's better than grep cause you can highlight each match with it's own color.
$ command_here | highlight green "input" | highlight red "output"

Since you want matches highlighted, this is probably for human consumption (as opposed to piping to another program for instance), so a nice solution would be to use:
less -p <your-pattern> <your-file>
And if you don't care about case sensitivity:
less -i -p <your-pattern> <your-file>
This also has the advantage of having pages, which is nice when having to go through a long output

You can do it using only grep by:
reading the file line by line
matching a pattern in each line and highlighting pattern by grep
if there is no match, echo the line as is
which gives you the following:
while read line ; do (echo $line | grep PATTERN) || echo $line ; done < inputfile

If you want to print "all" lines, there is a simple working solution:
grep "test" -A 9999999 -B 9999999
A => After
B => Before

If you are doing this because you want more context in your search, you can do this:
cat BIG_FILE.txt | less
Doing a search in less should highlight your search terms.
Or pipe the output to your favorite editor. One example:
cat BIG_FILE.txt | vim -
Then search/highlight/replace.

If you are looking for a pattern in a directory recursively, you can either first save it to file.
ls -1R ./ | list-of-files.txt
And then grep that, or pipe it to the grep search
ls -1R | grep --color -rE '[A-Z]|'
This will look of listing all files, but colour the ones with uppercase letters. If you remove the last | you will only see the matches.
I use this to find images named badly with upper case for example, but normal grep does not show the path for each file just once per directory so this way I can see context.

Maybe this is an XY problem, and what you are really trying to do is to highlight occurrences of words as they appear in your shell. If so, you may be able to use your terminal emulator for this. For instance, in Konsole, start Find (ctrl+shift+F) and type your word. The word will then be highlighted whenever it occurs in new or existing output until you cancel the function.

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string

When comparing directories with diff is there a way to exclude certain file differences from the output? - linux

I am running this command: diff --recursive a.git b.git It shows differences for some files that do not concern me. Is there a way to for example not have it show lines that end in: ".xaml.g.cs" or lines that start in: "Binary files" or lines that have this text in them: ".git/objects"

man diff gives following examples: ... -x, --exclude=PAT exclude files that match PAT -X, --exclude-from=FILE exclude files that match any pattern in FILE ... Did you try using those switches (like diff -x=".xaml.g.cs" --recursive a.git b.git)?

Related

grep recursively for a specific file type on Linux

LINUX Shell commands cat and grep

Listing entries in a directory using grep

grep based on blacklist -- without procedural code?

Highlight text similar to grep, but don't filter out text [duplicate]

Categories

Resources