Grep regular files in a linux File System and show their content

Grep regular files in a linux File System and show their content - linux

How do I display the content of files regular files matched with grep command? For example I grep a directory in order to see the regular files it has. I used the next line to see the regular files only:
ls -lR | grep ^-
Then I would like to display the content of the files found there. How do I do it?

I would do something like:
$ cat `ls -lR | egrep "^-" | rev | cut -d ' ' -f 1 | rev`
Use ls to find the files
grep finds your pattern
reverse the whole result
cut out the first file separated field to get the file name (files with spaces are problematic)
reverse the file name back to normal direction
Backticks will execute that and return the list of file names to cat.
or the way I would probably do it is use vim to look at each file.
$ vim `ls -lR | egrep "^-" | rev | cut -d ' ' -f 1 | rev`
It feels like you are trying to find only the files recursively. This is what I do in those cases:
$ vim `find . -type f -print`

There are multiple ways of doing it. Would try to give you a few easy and clean ways here. All of them handle filenames with space.
$ find . -type f -print0 | xargs -0 cat
-print0 adds a null character '\0' delimiter and you need to call xargs -0 to recognise the null delimiter. If you don't do that, whitespace in the filename create problems.
e.g. without -print0 filenames: abc 123.txt and 1.inc would be read as three separate files abc, 123.txt and 1.inc.
with -print0 this becomes abc 123.txt'\0' and 1.inc'\0' and would be read as abc 123.txt and 1.inc
As for xargs, it can accept the input as a parameter. command1 | xargs command2 means the output of command1 is passed to command2.
cat displays the content of the file.
$ find . -type f -exec echo {} \; -exec cat {} \;
This is just using the find command. It finds all the files (type f), calls echo to output the filename, then calls cat to display its content.
If you don't want the filename, omit -exec echo {} \;
Alternatively you can use cat command and pass the output of find.
$ cat `find . -type f -print`
If you want to scroll through the content of multiple files one by one. You can use.
$ less `find . -type f -print`
When using less, you can navigate through :n and :p for next and previous file respectively. press q to quit less.

Related

delete all files except a pattern list file

I need to delete all the files in the current directory except a list of patterns that are described in a whitelist file (delete_whitelist.txt) like this:
(.*)dir1(/)?
(.*)dir2(/)?
(.*)dir2/ser1(/)?(.*)
(.*)dir2/ser2(/)?(.*)
(.*)dir2/ser3(/)?(.*)
(.*)dir2/ser4(/)?(.*)
(.*)dir2/ser5(/)?(.*)
How can I perform this in one bash line?

Any bash script can fit on one line:
find . -type f -print0 | grep -EzZvf delete_whitelist.txt | xargs -0 printf '%s\n'
Check the output and then, if it's OK:
find . -type f -print0 | grep -EzZvf delete_whitelist.txt | xargs -0 rm

Select files by extension using grep

I need to count all the .txt files in the current folder.
I tried ls | grep .txt but if my folder content is: a.txt btxt c.c it will select a.txt and btxt and I only want files that end with .txt. I tried various combinations of regexp but with no result.

Find may be better than in this case since it is designed for handling file names:
find . -maxdepth 0 -name '*.txt' | wc -l
Buf if you are very cautious about possibly strange file names:
find . -maxdepth 0 -name '*.txt' -exec echo 1 \; | wc -l

For Grep, using the character '.' means: "any character"... so you'll need to escape the dot:
ls | grep -e "\.txt"
edit in fact the -e option is not even necessary. this will do the trick:
ls | grep "\.txt"

If all you need is number of files with extension '.txt' in current directory only, then this will also help.
ls -l *.txt | wc -l

How to count number of files in each directory?

I am able to list all the directories by
find ./ -type d
I attempted to list the contents of each directory and count the number of files in each directory by using the following command
find ./ -type d | xargs ls -l | wc -l
But this summed the total number of lines returned by
find ./ -type d | xargs ls -l
Is there a way I can count the number of files in each directory?

This prints the file count per directory for the current directory level:
du -a | cut -d/ -f2 | sort | uniq -c | sort -nr

Assuming you have GNU find, let it find the directories and let bash do the rest:
find . -type d -print0 | while read -d '' -r dir; do
files=("$dir"/*)
printf "%5d files in directory %s\n" "${#files[#]}" "$dir"
done

find . -type f | cut -d/ -f2 | sort | uniq -c
find . -type f to find all items of the type file, in current folder and subfolders
cut -d/ -f2 to cut out their specific folder
sort to sort the list of foldernames
uniq -c to return the number of times each foldername has been counted

You could arrange to find all the files, remove the file names, leaving you a line containing just the directory name for each file, and then count the number of times each directory appears:
find . -type f |
sed 's%/[^/]*$%%' |
sort |
uniq -c
The only gotcha in this is if you have any file names or directory names containing a newline character, which is fairly unlikely. If you really have to worry about newlines in file names or directory names, I suggest you find them, and fix them so they don't contain newlines (and quietly persuade the guilty party of the error of their ways).
If you're interested in the count of the files in each sub-directory of the current directory, counting any files in any sub-directories along with the files in the immediate sub-directory, then I'd adapt the sed command to print only the top-level directory:
find . -type f |
sed -e 's%^\(\./[^/]*/\).*$%\1%' -e 's%^\.\/[^/]*$%./%' |
sort |
uniq -c
The first pattern captures the start of the name, the dot, the slash, the name up to the next slash and the slash, and replaces the line with just the first part, so:
./dir1/dir2/file1
is replaced by
./dir1/
The second replace captures the files directly in the current directory; they don't have a slash at the end, and those are replace by ./. The sort and count then works on just the number of names.

Here's one way to do it, but probably not the most efficient.
find -type d -print0 | xargs -0 -n1 bash -c 'echo -n "$1:"; ls -1 "$1" | wc -l' --
Gives output like this, with directory name followed by count of entries in that directory. Note that the output count will also include directory entries which may not be what you want.
./c/fa/l:0
./a:4
./a/c:0
./a/a:1
./a/a/b:0

Slightly modified version of Sebastian's answer using find instead of du (to exclude file-size-related overhead that du has to perform and that is never used):
find ./ -mindepth 2 -type f | cut -d/ -f2 | sort | uniq -c | sort -nr
-mindepth 2 parameter is used to exclude files in current directory. If you remove it, you'll see a bunch of lines like the following:
234 dir1
123 dir2
1 file1
1 file2
1 file3
...
1 fileN
(much like the du-based variant does)
If you do need to count the files in current directory as well, use this enhanced version:
{ find ./ -mindepth 2 -type f | cut -d/ -f2 | sort && find ./ -maxdepth 1 -type f | cut -d/ -f1; } | uniq -c | sort -nr
The output will be like the following:
234 dir1
123 dir2
42 .

Everyone else's solution has one drawback or another.
find -type d -readable -exec sh -c 'printf "%s " "$1"; ls -1UA "$1" | wc -l' sh {} ';'
Explanation:
-type d: we're interested in directories.
-readable: We only want them if it's possible to list the files in them. Note that find will still emit an error when it tries to search for more directories in them, but this prevents calling -exec for them.
-exec sh -c BLAH sh {} ';': for each directory, run this script fragment, with $0 set to sh and $1 set to the filename.
printf "%s " "$1": portably and minimally print the directory name, followed by only a space, not a newline.
ls -1UA: list the files, one per line, in directory order (to avoid stalling the pipe), excluding only the special directories . and ..
wc -l: count the lines

This can also be done with looping over ls instead of find
for f in */; do echo "$f -> $(ls $f | wc -l)"; done
Explanation:
for f in */; - loop over all directories
do echo "$f -> - print out each directory name
$(ls $f | wc -l) - call ls for this directory and count lines

This should return the directory name followed by the number of files in the directory.
findfiles() {
echo "$1" $(find "$1" -maxdepth 1 -type f | wc -l)
}
export -f findfiles
find ./ -type d -exec bash -c 'findfiles "$0"' {} \;
Example output:
./ 6
./foo 1
./foo/bar 2
./foo/bar/bazzz 0
./foo/bar/baz 4
./src 4
The export -f is required because the -exec argument of find does not allow executing a bash function unless you invoke bash explicitly, and you need to export the function defined in the current scope to the new shell explicitly.

My answer is a little different, due to the options of find, you can actually be much more flexible. Just try:
find . -type f -printf "%h\n" | sort | uniq -c
With the "%h" option to "-printf", find prints only the directory of the files it found. Then sort and count with "uniq -c". This prints the number of search result entries with the same directory, per directory.
Using further options on find, you can be much more flexible. For example, to get an overview how many files in which directory have been modified at a certain date, use:
find . -newermt "2022-01-01 00:00:00" -type f -printf "%TY-%Tm-%Td %h\n" | sort | uniq -c
This finds all files that have been modified since 1. January 2022, prints (with "-printf") the modification date and the directory, then sorts and counts them. In this example, each line in the result has the number of files, the date of modification (without time), and the directory.
Note that "-printf" may not be available in all versions of find I think.

I combined #glenn jackman's answer and #pcarvalho's answer(in comment list, there is something wrong with pcarvalho's answer because the extra style control function of character '`'(backtick)).
My script can accept path as an augument and sort the directory list as ls -l, also it can handles the problem of "space in file name".
#!/bin/bash
OLD_IFS="$IFS"
IFS=$'\n'
for dir in $(find $1 -maxdepth 1 -type d | sort);
do
files=("$dir"/*)
printf "%5d,%s\n" "${#files[#]}" "$dir"
done
FS="$OLD_IFS"
My first answer in stackoverflow, and I hope it can help someone ^_^

THis could be another way to browse through the directory structures and provide depth results.
find . -type d | awk '{print "echo -n \""$0" \";ls -l "$0" | grep -v total | wc -l" }' | sh

find . -type f -printf '%h\n' | sort | uniq -c
gives for example:
5 .
4 ./aln
5 ./aln/iq
4 ./bs
4 ./ft
6 ./hot

I tried with some of the others here but ended up with subfolders included in the file count when I only wanted the files. This prints ./folder/path<tab>nnn with the number of files, not including subfolders, for each subfolder in the current folder.
for d in `find . -type d -print`
do
echo -e "$d\t$(find $d -maxdepth 1 -type f -print | wc -l)"
done

This will give the overall count.
for file in */; do echo "$file -> $(ls $file | wc -l)"; done | cut -d ' ' -f 3| py --ji -l 'numpy.sum(l)'

A super fast miracle command, which recursively traverses files to count the number of images in a directory and organize the output by image extension:
find . -type f | sed -e 's/.*\.//' | sort | uniq -c | sort -n | grep -Ei '(tiff|bmp|jpeg|jpg|png|gif)$'
Credits: https://unix.stackexchange.com/a/386135/354980

I edited the script in order to exclude all node_modules directories inside the analyzed one.
This can be used to check if the project number of files is exceeding the maximum number that the file watcher can handle.
find . -type d ! -path "*node_modules*" -print0 | while read -d '' -r dir; do
files=("$dir"/*)
printf "%5d files in directory %s\n" "${#files[#]}" "$dir"
done
To check the maximum files that your system can watch:
cat /proc/sys/fs/inotify/max_user_watches
node_modules folder should be added to your IDE/editor excluded paths in slow systems, and the other files count shouldn't ideally exceed the maximum (which can be changed though).

Easy Method:
find ./|grep "Search_file.txt" |cut -d"/" -f2|sort |uniq -c

In my case I needed the count at subfolder level, so I did:
du -a | cut -d/ -f3 | sort | uniq -c | sort -nr

Easy way to recursively find files of a given type. In this case, .jpg files for all folders in current directory:
find . -name *.jpg -print | wc -l

omg why the complex commands. just use something like
find whatever_folder | wc -l

Unix Command to List files containing string but NOT containing another string

How do I recursively view a list of files that has one string and specifically doesn't have another string? Also, I mean to evaluate the text of the files, not the filenames.
Conclusion:
As per comments, I ended up using:
find . -name "*.html" -exec grep -lR 'base\-maps' {} \; | xargs grep -L 'base\-maps\-bot'
This returned files with "base-maps" and not "base-maps-bot". Thank you!!

Try this:
grep -rl <string-to-match> | xargs grep -L <string-not-to-match>
Explanation: grep -lr makes grep recursively (r) output a list (l) of all files that contain <string-to-match>. xargs loops over these files, calling grep -L on each one of them. grep -L will only output the filename when the file does not contain <string-not-to-match>.

The use of xargs in the answers above is not necessary; you can achieve the same thing like this:
find . -type f -exec grep -q <string-to-match> {} \; -not -exec grep -q <string-not-to-match> {} \; -print
grep -q means run quietly but return an exit code indicating whether a match was found; find can then use that exit code to determine whether to keep executing the rest of its options. If -exec grep -q <string-to-match> {} \; returns 0, then it will go on to execute -not -exec grep -q <string-not-to-match>{} \;. If that also returns 0, it will go on to execute -print, which prints the name of the file.
As another answer has noted, using find in this way has major advantages over grep -Rl where you only want to search files of a certain type. If, on the other hand, you really want to search all files, grep -Rl is probably quicker, as it uses one grep process to perform the first filter for all files, instead of a separate grep process for each file.

These answers seem off as the match BOTH strings. The following command should work better:
grep -l <string-to-match> * | xargs grep -c <string-not-to-match> | grep '\:0'

Here is a more generic construction:
find . -name <nameFilter> -print0 | xargs -0 grep -Z -l <patternYes> | xargs -0 grep -L <patternNo>
This command outputs files whose name matches <nameFilter> (adjust find predicates as you need) which contain <patternYes>, but do not contain <patternNo>.
The enhancements are:
It works with filenames containing whitespace.
It lets you filter files by name.
If you don't need to filter by name (one often wants to consider all the files in current directory), you can strip find and add -R to the first grep:
grep -R -Z -l <patternYes> | xargs -0 grep -L <patternNo>

find . -maxdepth 1 -name "*.py" -exec grep -L "string-not-to-match" {} \;
This Command will get all ".py" files that don't contain "string-not-to-match" at same directory.

To match string A and exclude strings B & C being present in the same line I use, and quotes to allow search string to contain a space
grep -r <string A> | grep -v -e <string B> -e "<string C>" | awk -F ':' '{print $1}'
Explanation: grep -r recursively filters all lines matching in output format
filename: line
To exclude (grep -v) from those lines the ones that also contain either -e string B or -e string C. awk is used to print only the first field (the filename) using the colon as fieldseparator -F

Linux command: How to 'find' only text files?

After a few searches from Google, what I come up with is:
find my_folder -type f -exec grep -l "needle text" {} \; -exec file {} \; | grep text
which is very unhandy and outputs unneeded texts such as mime type information. Any better solutions? I have lots of images and other binary files in the same folder with a lot of text files that I need to search through.

I know this is an old thread, but I stumbled across it and thought I'd share my method which I have found to be a very fast way to use find to find only non-binary files:
find . -type f -exec grep -Iq . {} \; -print
The -I option to grep tells it to immediately ignore binary files and the . option along with the -q will make it immediately match text files so it goes very fast. You can change the -print to a -print0 for piping into an xargs -0 or something if you are concerned about spaces (thanks for the tip, #lucas.werkmeister!)
Also the first dot is only necessary for certain BSD versions of find such as on OS X, but it doesn't hurt anything just having it there all the time if you want to put this in an alias or something.
EDIT: As #ruslan correctly pointed out, the -and can be omitted since it is implied.

Based on this SO question :
grep -rIl "needle text" my_folder

Why is it unhandy? If you need to use it often, and don't want to type it every time just define a bash function for it:
function findTextInAsciiFiles {
# usage: findTextInAsciiFiles DIRECTORY NEEDLE_TEXT
find "$1" -type f -exec grep -l "$2" {} \; -exec file {} \; | grep text
}
put it in your .bashrc and then just run:
findTextInAsciiFiles your_folder "needle text"
whenever you want.
EDIT to reflect OP's edit:
if you want to cut out mime informations you could just add a further stage to the pipeline that filters out mime informations. This should do the trick, by taking only what comes before :: cut -d':' -f1:
function findTextInAsciiFiles {
# usage: findTextInAsciiFiles DIRECTORY NEEDLE_TEXT
find "$1" -type f -exec grep -l "$2" {} \; -exec file {} \; | grep text | cut -d ':' -f1
}

find . -type f -print0 | xargs -0 file | grep -P text | cut -d: -f1 | xargs grep -Pil "search"
This is unfortunately not space save. Putting this into bash script makes it a bit easier.
This is space safe:
#!/bin/bash
#if [ ! "$1" ] ; then
echo "Usage: $0 <search>";
exit
fi
find . -type f -print0 \
| xargs -0 file \
| grep -P text \
| cut -d: -f1 \
| xargs -i% grep -Pil "$1" "%"

Another way of doing this:
# find . |xargs file {} \; |grep "ASCII text"
If you want empty files too:
# find . |xargs file {} \; |egrep "ASCII text|empty"

How about this:
$ grep -rl "needle text" my_folder | tr '\n' '\0' | xargs -r -0 file | grep -e ':[^:]*text[^:]*$' | grep -v -e 'executable'
If you want the filenames without the file types, just add a final sed filter.
$ grep -rl "needle text" my_folder | tr '\n' '\0' | xargs -r -0 file | grep -e ':[^:]*text[^:]*$' | grep -v -e 'executable' | sed 's|:[^:]*$||'
You can filter-out unneeded file types by adding more -e 'type' options to the last grep command.
EDIT:
If your xargs version supports the -d option, the commands above become simpler:
$ grep -rl "needle text" my_folder | xargs -d '\n' -r file | grep -e ':[^:]*text[^:]*$' | grep -v -e 'executable' | sed 's|:[^:]*$||'

Here's how I've done it ...
1 . make a small script to test if a file is plain text
istext:
#!/bin/bash
[[ "$(file -bi $1)" == *"file"* ]]
2 . use find as before
find . -type f -exec istext {} \; -exec grep -nHi mystring {} \;

Here's a simplified version with extended explanation for beginners like me who are trying to learn how to put more than one command in one line.
If you were to write out the problem in steps, it would look like this:
// For every file in this directory
// Check the filetype
// If it's an ASCII file, then print out the filename
To achieve this, we can use three UNIX commands: find, file, and grep.
find will check every file in the directory.
file will give us the filetype. In our case, we're looking for a return of 'ASCII text'
grep will look for the keyword 'ASCII' in the output from file
So how can we string these together in a single line? There are multiple ways to do it, but I find that doing it in order of our pseudo-code makes the most sense (especially to a beginner like me).
find ./ -exec file {} ";" | grep 'ASCII'
Looks complicated, but not bad when we break it down:
find ./ = look through every file in this directory. The find command prints out the filename of any file that matches the 'expression', or whatever comes after the path, which in our case is the current directory or ./
The most important thing to understand is that everything after that first bit is going to be evaluated as either True or False. If True, the file name will get printed out. If not, then the command moves on.
-exec = this flag is an option within the find command that allows us to use the result of some other command as the search expression. It's like calling a function within a function.
file {} = the command being called inside of find. The file command returns a string that tells you the filetype of a file. Regularly, it would look like this: file mytextfile.txt. In our case, we want it to use whatever file is being looked at by the find command, so we put in the curly braces {} to act as an empty variable, or parameter. In other words, we're just asking for the system to output a string for every file in the directory.
";" = this is required by find and is the punctuation mark at the end of our -exec command. See the manual for 'find' for more explanation if you need it by running man find.
| grep 'ASCII' = | is a pipe. Pipe take the output of whatever is on the left and uses it as input to whatever is on the right. It takes the output of the find command (a string that is the filetype of a single file) and tests it to see if it contains the string 'ASCII'. If it does, it returns true.
NOW, the expression to the right of find ./ will return true when the grep command returns true. Voila.

I have two issues with histumness' answer:
It only list text files. It does not actually search them as
requested. To actually search, use
find . -type f -exec grep -Iq . {} \; -and -print0 | xargs -0 grep "needle text"
It spawns a grep process for every file, which is very slow. A better solution is then
find . -type f -print0 | xargs -0 grep -IZl . | xargs -0 grep "needle text"
or simply
find . -type f -print0 | xargs -0 grep -I "needle text"
This only takes 0.2s compared to 4s for solution above (2.5GB data / 7700 files), i.e. 20x faster.
Also, nobody cited ag, the Silver Searcher or ack-grep¸as alternatives. If one of these are available, they are much better alternatives:
ag -t "needle text" # Much faster than ack
ack -t "needle text" # or ack-grep
As a last note, beware of false positives (binary files taken as text files). I already had false positive using either grep/ag/ack, so better list the matched files first before editing the files.

Although it is an old question, I think this info bellow will add to the quality of the answers here.
When ignoring files with the executable bit set, I just use this command:
find . ! -perm -111
To keep it from recursively enter into other directories:
find . -maxdepth 1 ! -perm -111
No need for pipes to mix lots of commands, just the powerful plain find command.
Disclaimer: it is not exactly what OP asked, because it doesn't check if the file is binary or not. It will, for example, filter out bash script files, that are text themselves but have the executable bit set.
That said, I hope this is useful to anyone.

I do it this way:
1) since there're too many files (~30k) to search thru, I generate the text file list daily for use via crontab using below command:
find /to/src/folder -type f -exec file {} \; | grep text | cut -d: -f1 > ~/.src_list &
2) create a function in .bashrc:
findex() {
cat ~/.src_list | xargs grep "$*" 2>/dev/null
}
Then I can use below command to do the search:
findex "needle text"
HTH:)

I prefer xargs
find . -type f | xargs grep -I "needle text"
if your filenames are weird look up using the -0 options:
find . -type f -print0 | xargs -0 grep -I "needle text"

bash example to serach text "eth0" in /etc in all text/ascii files
grep eth0 $(find /etc/ -type f -exec file {} \; | egrep -i "text|ascii" | cut -d ':' -f1)

If you are interested in finding any file type by their magic bytes using the awesome file utility combined with power of find, this can come in handy:
$ # Let's make some test files
$ mkdir ASCII-finder
$ cd ASCII-finder
$ dd if=/dev/urandom of=binary.file bs=1M count=1
1+0 records in
1+0 records out
1048576 bytes (1.0 MB, 1.0 MiB) copied, 0.009023 s, 116 MB/s
$ file binary.file
binary.file: data
$ echo 123 > text.txt
$ # Let the magic begin
$ find -type f -print0 | \
xargs -0 -I ## bash -c 'file "$#" | grep ASCII &>/dev/null && echo "file is ASCII: $#"' -- ##
Output:
file is ASCII: ./text.txt
Legend: $ is the interactive shell prompt where we enter our commands
You can modify the part after && to call some other script or do some other stuff inline as well, i.e. if that file contains given string, cat the entire file or look for a secondary string in it.
Explanation:
find items that are files
Make xargs feed each item as a line into one liner bash
command/script
file checks type of file by magic byte, grep checks if ASCII
exists, if so, then after && your next command executes.
find prints results null separated, this is good to escape
filenames with spaces and meta-characters in it.
xargs , using -0 option, reads them null separated, -I ##
takes each record and uses as positional parameter/args to bash
script.
-- for bash ensures whatever comes after it is an argument even
if it starts with - like -c which could otherwise be interpreted
as bash option
If you need to find types other than ASCII, simply replace grep ASCII with other type, like grep "PDF document, version 1.4"

find . -type f | xargs file | grep "ASCII text" | awk -F: '{print $1}'
Use find command to list all files, use file command to verify they are text (not tar,key), finally use awk command to filter and print the result.

How about this
find . -type f|xargs grep "needle text"

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string

Grep regular files in a linux File System and show their content - linux

Related

delete all files except a pattern list file

Select files by extension using grep

How to count number of files in each directory?

Unix Command to List files containing string but NOT containing another string

Linux command: How to 'find' only text files?

Categories

Resources

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string

Grep regular files in a linux File System and show their content - linux

Related

delete all files except a pattern list file

Select files by extension using grep

How to count number of files in each directory?

Unix Command to List files containing string but *NOT* containing another string

Linux command: How to 'find' only text files?

Categories

Resources

Unix Command to List files containing string but NOT containing another string