Find all files without broken links - Linux

I need to find all source files, including symlinked files, but avoid broken links.
I have a C/C++ source code area, with some files symlinked to files in other directories, and I want to index them using cscope. When there are broken links, cscope gives an error:
cscope: cannot find file /...file....
What I actually need is to create a clean cscope.files without broken symlinks.
What I'm currently doing is:
find $code_path -type f -name '*.h' -o -name '*.c' -o -name '*.cpp' >> cscope.files

find, with a little help from file to get the file type, printing only the files that do not have a broken symbolic link target (note the \( \) grouping, so that -exec and -print apply to all three -name patterns):
find "$code_path" -type f -name '*.h' -o -name '*.c' -o -name '*.cpp' \
-exec sh -c 'file "$1" | grep -qv broken' _ {} \; -print
This has the caveat of processing one file at a time.
If you do not have any filenames with whitespace or control characters, you can leverage find's -exec {} + capability of passing the maximum number of files per invocation without exceeding ARG_MAX, and extract the desired filenames with awk (splitting on the colon that file appends):
find "$code_path" -type f -name '*.h' -o -name '*.c' -o -name '*.cpp' \
-exec file {} + | awk '!/broken/ {print $1}'
Side note: Quote your variable expansions to prevent word splitting and pathname expansion from taking place on the expansion.
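
Alternatively, note that with find's default -P behavior, -type f never matches a symlink at all; the POSIX -L option makes find follow links, and a dangling link then fails the -type f test. So, as a minimal sketch assuming a find with -L (which POSIX requires), you can drop broken links without invoking file at all:
# -L: follow symlinks; a broken link is then not "a regular file",
# so -type f silently drops it while keeping links that resolve.
find -L "$code_path" -type f \( -name '*.h' -o -name '*.c' -o -name '*.cpp' \) > cscope.files
The trade-off is that -L also follows symlinks to directories during the traversal, so the walk may cover more of the tree.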

Related

Using linux find command to identify files that (A) match either of two names (with wildcards) and (B) that also contain a string

The find command is really useful for identifying files with a given name that also contain a string somewhere inside them.
For instance, let's say I'm looking for the string "pacf(" in an R markdown file somewhere in my current directory.
find . -name "*.Rmd" -exec grep -ls "pacf(" {} \;
I get useful results.
However, sometimes I'm not sure whether the file I am looking for is an .R file or an .Rmd file, so I might also run:
find . -name "*.R" -exec grep -ls "pacf(" {} \;
And let's say there are no .R files containing this string, so that returns nothing.
One thing I'd like to do is look in both .R and .Rmd files for this string. I would think that I could run
find . -name "*.Rmd" -o -name "*.R" -exec grep -ls "pacf(" {} \;
But that returns no results.
However if I run
find . -name "*.R" -o -name "*.Rmd" -exec grep -ls "pacf(" {} \
I get the same results as just searching the .Rmd files. So it seems like it is only running the stuff in exec for the second set of files.
Is there a way I could change these commands to look through both the .R and .Rmd files at once?
Add parentheses '()'
find . \( -name '*.R' -o -name '*.Rmd' \) -exec grep -ls "pacf(" {} \;
You can also pass "*[.Rmd]" to -name, like this:
find . -name "*[.Rmd]" -exec grep -ls "pacf(" {} \;
Be aware, though, that [.Rmd] is a bracket expression matching a single character, so this matches any filename ending in '.', 'R', 'm', or 'd', which is broader than just .R and .Rmd files.

Find different types of files and move to a specific directory

Finding *.mkv and *.mp4 works
find /home6/movies/ -name '*.mp4' -o -name '*.mkv'
but moving them partially fails for some reason and moves only the .mkv files:
find /home6/movies/ -name '*.mp4' -o -name '*.mkv' -exec mv {} /home6/archive/ \;
Am I using the find switch -o incorrectly for this task?
Looks like you need to surround the OR expression in parentheses so the -exec applies to both matches.
This is a similar question: `find -name` pattern that matches multiple patterns
find /home6/movies/ \( -name '*.mp4' -o -name '*.mkv' \) -exec mv {} /home6/archive/ \;
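
If your mv supports -t (GNU coreutils does), you can also batch the moves by ending the -exec with + instead of \;, so find appends many files to a single mv invocation; a GNU-specific sketch:
# -t names the destination up front, letting find append many sources per mv call
find /home6/movies/ \( -name '*.mp4' -o -name '*.mkv' \) -exec mv -t /home6/archive/ {} +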

Find Command with multiple file extensions

I'm looking through many subdirectories, finding all the files ending in .JPG, .jpg and .png, and copying them to a separate directory; however, right now it's only finding the .JPG files.
Could someone explain what I'm doing wrong?
find /root/TEST/Images -name '*.png' -o -name '*.jpg' -o -name '*.JPG' -exec cp -t /root/TEST/CopiedImages {} +
You have to group the -o conditions because -a, the implied AND between the last -name '*.JPG' and -exec, has higher precedence:
find /root/TEST/Images \( -name '*.png' -o -name '*.jpg' -o -name '*.JPG' \) -exec cp -t /root/TEST/CopiedImages {} +
Grouping is done with parentheses, but they have to be escaped (or quoted) due to their special meaning in shell.
Unrelated to this, you can shorten the overall expression by combining filters for jpg and JPG with the case-insensitive -iname (as noted in comments):
find /root/TEST/Images \( -name '*.png' -o -iname '*.jpg' \) -exec cp -t /root/TEST/CopiedImages {} +
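
Note that cp -t is GNU-specific too. On systems without it (e.g. BSD/macOS), one portable sketch is to hand the batched filenames to a small sh wrapper, which can put the destination last:
# _ fills $0; the found files become "$@" inside the wrapper,
# so each sh invocation copies one batch into the target directory.
find /root/TEST/Images \( -name '*.png' -o -iname '*.jpg' \) \
-exec sh -c 'cp -- "$@" /root/TEST/CopiedImages' _ {} +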

How can I make find pass file names to exec without the leading directory name?

Someone created directories with names like source.c. I am doing a find over all the directories in a tree. I do want find to search in the source.c directory, but I do not want source.c itself to be passed to the grep I am running on what is found.
How can I make find not pass directory names to grep? Here is what my command line looks like:
find sources* \( -name "*.h" -o -name "*.cpp" -o -name "*.c" \) -exec grep -Hi -e "ThingToFind" {} \;
Add -a -type f to your find command. This will force find to output only files, not directories (it will still search inside directories):
find sources* \( -name "*.h" -o -name "*.cpp" -o -name "*.c" \) -a -type f -exec grep -Hi -e "ThingToFind" {} \;
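
As a side note, ending the -exec with + instead of \; passes many files to each grep invocation, which is much faster on large trees; -H keeps the filename prefix even when a batch happens to contain a single file:
find sources* -type f \( -name "*.h" -o -name "*.cpp" -o -name "*.c" \) -exec grep -Hi -e "ThingToFind" {} +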

Exclude list of files from find

If I have a list of filenames in a text file that I want to exclude when I run find, how can I do that? For example, I want to do something like:
find /dir -name "*.gz" -exclude_from skip_files
and get all the .gz files in /dir except for the files listed in skip_files. But find has no -exclude_from flag. How can I skip all the files in skip_files?
I don't think find has an option like this. You could build the command with printf and your exclude list:
find /dir -name "*.gz" $(printf "! -name %s " $(cat skip_files))
Which is the same as doing:
find /dir -name "*.gz" ! -name first_skip ! -name second_skip .... etc
Alternatively you can pipe from find into grep:
find /dir -name "*.gz" | grep -vFf skip_files
This is what I usually do to remove some files from the result (in this case I looked for all text files but wasn't interested in a bunch of Valgrind memcheck reports we have here and there):
find . -type f -name '*.txt' ! -name '*mem*.txt'
It seems to be working.
I think you can try something like:
find /dir \( -name "*.gz" ! -name skip_file1 ! -name skip_file2 ... \)
find /var/www/test/ -type f \( -iname "*.*" ! -iname "*.php" ! -iname "*.jpg" ! -iname "*.png" \)
The above command gives a list of all files, excluding those with the .php, .jpg and .png extensions. This command works for me in PuTTY.
Josh Jolly's grep solution works, but has O(N**2) complexity, making it too slow for long lists. If the lists are sorted first (O(N*log(N)) complexity), you can use comm, which has O(N) complexity:
find /dir -name '*.gz' |sort >everything_sorted
sort skip_files >skip_files_sorted
comm -23 everything_sorted skip_files_sorted | xargs . . . etc
See your system's comm(1) man page for details.
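
If you use bash, process substitution lets you write the same pipeline as a one-liner without temporary files; a sketch assuming bash, GNU xargs, and newline-free filenames:
# comm -23 keeps lines unique to the first (sorted) input;
# xargs -r skips running md5sum on empty input, -d '\n' copes with spaces in names.
comm -23 <(find /dir -name '*.gz' | sort) <(sort skip_files) | xargs -r -d '\n' md5sum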
This solution will go through all files (it does not exactly exclude them from the find command itself), but will produce output that skips the files on an exclusion list.
I found that useful while running a time-consuming command (find /dir -exec md5sum {} \;).
You can create a shell script to handle the skipping logic and run commands on the files found (make it executable with chmod, replace echo with other commands):
$ cat skip_file.sh
#!/bin/bash
# Look for the filename as an exact, literal line in the skip list
# (-F: fixed string, -x: whole-line match)
found=$(grep -Fx -- "$1" files_to_skip.txt)
if [ -z "$found" ]; then
    # run your command
    echo "$1"
fi
Create a file with the list of files to skip named files_to_skip.txt (on the dir you are running from).
Then use find using it:
find /dir -name "*.gz" -exec ./skip_file.sh {} \;
This should work:
find * -name "*.gz" $(printf "! -path %s " $(<skip_files.txt))
How it works
Assuming skip_files.txt has one filename per line, you can get the list of filenames via $(<skip_files.txt). E.g. echo $(<skip_files.txt) should print them all out.
For each filename you want to have a ! -path filename expression. To build this, use $(printf "! -path %s " $(<skip_files.txt))
Then, put it together with a filter on -name "*.gz"
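
To see what the command substitution actually generates, you can run the printf by itself; with a hypothetical skip_files.txt containing data/old.gz and backup/2019.gz, you would get:
$ printf "! -path %s " $(<skip_files.txt)
! -path data/old.gz ! -path backup/2019.gz
printf reuses its format string for each remaining argument, which is what turns every line of the file into its own ! -path test. The usual caveat applies: this breaks on filenames containing whitespace or glob characters.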
