find files not in a list - linux

I have a list of files in file.lst.
Now I want to find all files in a directory dir which are older than 7 days, except those listed in file.lst. How can I either modify the find command or remove all entries in file.lst from the result?
Example:
file.lst:
a
b
c
Execute:
find -mtime +7 -print > found.lst
found.lst:
a
d
e
So what I expect is:
d
e

Pipe your find command through grep -Fxvf:
find -mtime +7 -print | grep -Fxvf file.lst
What the flags mean:
-F, --fixed-strings
Interpret PATTERN as a list of fixed strings, separated by newlines, any of which is to be matched.
-x, --line-regexp
Select only those matches that exactly match the whole line.
-v, --invert-match
Invert the sense of matching, to select non-matching lines.
-f FILE, --file=FILE
Obtain patterns from FILE, one per line. The empty file contains zero patterns, and therefore matches nothing.
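One caveat: GNU find prints paths like ./a, while file.lst holds bare names, so the -x exact match can silently find nothing. A minimal sketch of a workaround, assuming GNU find (for -printf) and test files staged by hand:
$ printf '%s\n' a b c > file.lst
$ touch -d '10 days ago' a d e
$ find . -mtime +7 -printf '%f\n' | grep -Fxvf file.lst
d
e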

Pipe the find-command to grep using the -v and -f switches
find -mtime +7 -print | grep -vf file.lst > found.lst
grep options:
-v : invert the match
-f FILE : obtain patterns from FILE, one per line
example:
$ ls
a b c d file.lst
$ cat file.lst
a$
b$
c$
$ find . | grep -vf file.lst
.
./file.lst
./d
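Those $ anchors in file.lst matter: without -F and -x, each line is treated as an unanchored regex, so a bare a would also exclude any path that merely contains an a. A quick illustration with hypothetical names:
$ touch bar d
$ printf '%s\n' a b c > list.txt
$ find . -maxdepth 1 | grep -vf list.txt
.
./list.txt
./d
Note that ./bar disappeared: the unanchored pattern a matched it as a substring.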

Related

Count all occurrences of a tag in lots of log files using grep

I need to count occurrences of a tag, for example "103=16", across lots of files, but I want to list only the files that have one or more occurrences.
I'm using:
find . /opt/FIXLOGS/l51prdsrv* -iname "TRADX_*oe*.log" -type f -exec grep -F 103=16 -c {} /dev/null \;
which finds the file where the tag is and shows the number of matches, but it also shows the 0 occurrences
returns
file1.log:0
file2.log:0
file3.log:6
file4.log:0
Using a -i to exclude the 0, or grep -v :0, hasn't worked for me; I get the result:
grep: :0: No such file or directory
How can I get only the files where the count is more than 0?
Have you tried piping into grep to negate the ones with zeroes after the find/exec?
E.g., like this works for me:
find . -type f -iname "TRADX_*oe*.log" -exec grep -cFH "103=16" {} \; | grep -v ":0"
Using awk to do everything in one place
find . -type f -iname "TRADX_*oe*.log" -exec awk '/103=16/{c++} END { if(c)print FILENAME, c}' {} \;
That is how the -c option of grep works:
-c, --count
Suppress normal output; instead print a count of matching lines for each input file. With the -v, --invert-match
option (see below), count non-matching lines.
So it will print the 0 counts too; the only options are to remove the zeros with another grep using -v, or to use awk:
awk '/search_pattern/{f[FILENAME]+=1} END {for(i in f){print i":"f[i]}}' /path/to/files*
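To illustrate the zero-count behaviour and the -v ":0" filter, here is a run with two hypothetical log files:
$ printf '103=16\n103=16\n' > file3.log
$ : > file1.log
$ grep -cHF "103=16" file1.log file3.log
file1.log:0
file3.log:2
$ grep -cHF "103=16" file1.log file3.log | grep -v ":0"
file3.log:2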
It worked when I piped into grep after the \; to exclude the zeros with | grep -v ":0",
ending like this:
find . route -iname "TRAD_*oe*.log" -type f -exec grep -cHF "103=16" {} \; | grep -v ":0"

Grep regular files in a linux File System and show their content

How do I display the content of regular files matched with the grep command? For example, I grep a directory in order to see the regular files it has. I used the following line to see the regular files only:
ls -lR | grep ^-
Then I would like to display the content of the files found there. How do I do it?
I would do something like:
$ cat `ls -lR | egrep "^-" | rev | cut -d ' ' -f 1 | rev`
Use ls to find the files
grep finds your pattern
reverse the whole result
cut out the first space-separated field to get the (reversed) file name (files with spaces are problematic)
reverse the file name back to normal direction
Backticks will execute that and return the list of file names to cat.
Or, the way I would probably do it: use vim to look at each file.
$ vim `ls -lR | egrep "^-" | rev | cut -d ' ' -f 1 | rev`
It feels like you are trying to find only the files recursively. This is what I do in those cases:
$ vim `find . -type f -print`
There are multiple ways of doing it. I'll give you a few easy and clean ways here. All of them handle filenames with spaces.
$ find . -type f -print0 | xargs -0 cat
-print0 adds a null character '\0' delimiter, and you need to call xargs -0 to recognise the null delimiter. If you don't do that, whitespace in filenames creates problems.
e.g. without -print0, the filenames abc 123.txt and 1.inc would be read as three separate files: abc, 123.txt and 1.inc.
With -print0 this becomes abc 123.txt'\0' and 1.inc'\0', which is read as abc 123.txt and 1.inc.
As for xargs, it turns its input into arguments for another command: command1 | xargs command2 means the output of command1 is passed as arguments to command2.
cat displays the content of the file.
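To see the failure mode that -print0/-0 avoids, here is a small reproduction with a hypothetical file name containing a space:
$ touch 'abc 123.txt'
$ find . -type f -print | xargs cat
cat: ./abc: No such file or directory
cat: 123.txt: No such file or directory
$ find . -type f -print0 | xargs -0 cat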
$ find . -type f -exec echo {} \; -exec cat {} \;
This is just using the find command. It finds all the files (type f), calls echo to output the filename, then calls cat to display its content.
If you don't want the filename, omit -exec echo {} \;
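As a side note, if you don't need the filename header at all, you can terminate -exec with + instead of \; so that find batches many files into a single cat invocation (this is standard POSIX find):
$ find . -type f -exec cat {} +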
Alternatively you can use cat command and pass the output of find.
$ cat `find . -type f -print`
If you want to scroll through the content of multiple files one by one, you can use:
$ less `find . -type f -print`
When using less, you can navigate with :n and :p for the next and previous file respectively. Press q to quit less.

How can I use grep to get all the lines that contains string1 and string2 separated by space?

Line1: .................
Line2: #hello1 #hello2 #hello3
Line3: .................
Line4: .................
Line5: #hello1 #hello4 #hello3
Line6: #hello1 #hello2 #hello3
Line7: .................
I have files that look similar in terms of lines on one of my project directories. I want to get the counts of all the lines that contain #hello1 and #hello2. In this case I would get 2 as a result only for this file. However, I want to do this recursively.
The canonical way to "do something recursively" is to use the find command. If you want to find lines that have two words on them, a simple regex will do:
grep -lr '#hello1.*#hello2' .
The option -l instructs grep to show us only filenames rather than file content, and the option -r tells grep to traverse the filesystem recursively. The start of the search is the path at the end of the line. Once you have the list of files, you can parse that list using commands run by xargs.
For example, this will count all the lines in files matching the pattern you specified.
grep -lr '#hello1.*#hello2' . | xargs -n 1 wc -l
This uses xargs to run the wc command on each of the files listed by grep. You could probably also run this without the -n 1, unless you're dealing with many many thousands of files that would exceed your maximum command line length.
Or, if I'm interpreting your question correctly, the following will count just the patterns in those files.
grep -lr '#hello1.*#hello2' . | xargs -n 1 grep -Hc '#hello1.*#hello2'
This runs a similar grep to the one used to generate your recursive list of files, and presents the output with filename (-H) and count (-c).
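With a couple of hypothetical files, the output of that pipeline would look like this:
$ grep -lr '#hello1.*#hello2' . | xargs -n 1 grep -Hc '#hello1.*#hello2'
./notes.txt:2
./sub/todo.txt:1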
But if you want complex rules like finding two patterns possibly on different lines in the file, then grep probably is not the optimal tool, unless you use multiple greps launched by find:
find /path/to/base -type f \
-exec grep -q '#hello1' {} \; \
-exec grep -q '#hello2' {} \; \
-print
(Lines split for easier reading.)
This is somewhat costly, as find needs to launch up to two children for each file. So another approach would be to use awk instead:
find /path/to/base -type f \
-exec awk '/#hello1/{a=1} /#hello2/{b=1} END{exit !(a&&b)}' {} \; \
-print
Alternately, if your shell is bash version 4 or above, you can avoid using find and use the bash option globstar:
$ shopt -s globstar
$ awk 'FNR==1{a=b=0} /#hello1/{a=1} /#hello2/{b=1} a&&b{print FILENAME;nextfile}' **/*
Note: none of this is tested.
If you are not interested in the per-file counts either,
then just something along the lines of:
find $BASEDIRECTORY -type f -print0 | xargs -0 grep -h PATTERN | wc -l
If you want to count lines containing #hello1 and #hello2 separated by space in a specific file you can:
$ grep -c '#hello1 #hello2' file
If you want to count in more than one file:
$ grep -c '#hello1 #hello2' file1 file2 ...
And if you want to get the grand total, add -h so grep prints bare counts (otherwise the file1: prefixes would trip up bc):
$ grep -hc '#hello1 #hello2' file1 file2 ... | paste -s -d+ - | bc
Of course you can let your shell expand the file names. So, for example:
$ grep -hc '#hello1 #hello2' *.txt | paste -s -d+ - | bc
or so...
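For what it's worth, the intermediate stages of that pipeline look like this (hypothetical counts):
$ grep -hc '#hello1 #hello2' *.txt
2
1
$ grep -hc '#hello1 #hello2' *.txt | paste -s -d+ -
2+1
$ grep -hc '#hello1 #hello2' *.txt | paste -s -d+ - | bc
3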
find . -type f | xargs -n 1 awk '/#hello1/ && /#hello2/{c++} END{print FILENAME, c+0}'

How to count the number of files whose name contains a vowel

I was trying to code a script that counts the number of files in a directory whose name contains a vowel.
If I use
find $1 -type f | wc -l
I get the number of files in the directory $1, but I do not know how to use grep to count just the ones with a vowel. I was trying something like this:
find $1 -type f | grep -l '[a,e,i,o,u,A,E,I,O,U]' | wc -l
You can use this gnu find command to count all the files with at least one vowel:
find . -maxdepth 1 -type f -iname '*[aeiou]*' -printf ".\n" | wc -l
The -iname '*[aeiou]*' glob pattern will match only filenames with at least one of a,e,i,o,u (ignoring case).
Remove -maxdepth 1 if you want to count files recursively in sub directories as well.
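For example, with four hypothetical files, only two of which contain a vowel (the capitalised Echo is caught because -iname ignores case):
$ touch bcd xyz apple Echo
$ find . -maxdepth 1 -type f -iname '*[aeiou]*' -printf ".\n" | wc -l
2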
If you can accept counting directories:
ls -d *[aeiouyAEIOUY]* | wc -l
Otherwise:
find $1 -type f | grep -i '[aeiouy]' | wc -l
Your attempt fails for two reasons. First, -l does not make sense when grep is reading from a pipeline, since the purpose of -l is to print only the names of the input files that matched, and here the only input is stdin. Second, your syntax is wrong. Try:
... | grep -i '[aeiou]' | ...
Please don't use commas in a character group expression (the thing in [] brackets).
The best way is to first do a find(1) to get the files you want to scan. Then you need the base names, since the directory part of the path may itself contain vowels and should not be counted. Then grep with [aeiouAEIOU] keeps only the names with a vowel, and finally wc(1) counts the lines.
find ${DIRECTORY} -type f -print | sed -e 's#^.*/##' | grep '[aeiouAEIOU]' | wc -l
-type f allows you to select just files (not directories). The sed(1) command edits the output, line by line, eliminating the first part of the name up to the last / character. The grep filters names with at least one vowel and discards the others, and finally wc -l counts the lines.
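To see why the sed step matters, consider a hypothetical directory whose own name contains a vowel:
$ mkdir -p data && touch data/bcd data/sky
$ find data -type f | grep '[aeiouAEIOU]' | wc -l
2
$ find data -type f | sed -e 's#^.*/##' | grep '[aeiouAEIOU]' | wc -l
0
Without the sed step, the a in data/ makes both files count; with it, neither bcd nor sky contains a vowel from the class.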

How do I count the number of instances a string/sub-string appears in the filenames of a certain directory in Unix?

Say my current working directory is called my_dir and I have a few different files in them:
file1
file_dog
filedog_1
dog_file
How do I count the number of instances "dog" appears (should be 3) in my directory?
Thank you!
You can use the -c option from grep
grep -c:
Suppress normal output; instead print a count of matching lines for each input file.
ls -1 | grep -c dog
or:
ls -1 *[dD][oO][gG]* | wc -l
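Both give the same answer on the four example files from the question:
$ ls -1 | grep -c dog
3
$ ls -1 *[dD][oO][gG]* | wc -l
3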
The following solution works even if filenames contain newline characters:
$ touch file1 file_dog filedog_1 dog_file
$ find . -maxdepth 1 -name '*dog*' -print0 | grep -zc .
3
How it works:
find . -maxdepth 1 -name '*dog*' -print0
This finds all files in the current directory with dog in their name and prints them out in a nul-separated list. Since the nul character is one of the few not allowed in a file name, this is safe. If you want a case-insensitive match (so that Dog is matched as well), replace -name with -iname.
grep -zc .
This reads in the nul-separated list and counts the entries.
-z tells grep that the input is nul-separated
-c tells grep to suppress normal output and count the number of matches
. tells grep to match anything.
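And to see why the nul-separated variant is more robust, a contrived hypothetical file whose name contains a newline inflates a line-based count but not the nul-based one:
$ touch $'dog\ndog2'
$ ls -1 | grep -c dog
5
$ find . -maxdepth 1 -name '*dog*' -print0 | grep -zc .
4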
