GNU find, how to use debug options, -D - linux

I've been using find more successfully in my daily work, especially with xargs, parallel, grep and pcregrep.
I was reading the man pages for find tonight and found a section I'd always skipped right over: debug options.
I promptly tried:
find -D help
# then
find -D tree . -type f -name "fjdalfj"
# and
find -D search . -type f -name "fjaldkj"
Wow, some really interesting output... however, I was unsuccessful in tracking down what it means.
Can someone point me to a resource I can read about this or perhaps they care to explain a bit about how they would use these options or when they'd be useful.

Fedora '$ man find' provides illumination. Notice carefully where the "-D" options are placed: before the selection expression and before the search directories are listed. Most of the debug mode deals with how the selection expressions are parsed, as these can get quite involved.
tree: shows the generated parse tree for the expression.
search: verbosely log the navigation of the searched directories.
stat: trace calls to stat(2) and friends when getting the file attributes, such as its size.
rates: indicate how frequently each expression predicate is successful.
opt: provide details on how the selection expression was optimized after it is parsed.
exec: show details about the -exec, -execdir, and -okdir programs run.
Your mileage may vary; your version of find(1) may have different debug options. A '$ find -D help' is probably in order.
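The debug options can also be combined with a comma, and pair naturally with find's -O optimisation level if you want to see what the optimizer did with your expression. A hypothetical example (path and pattern are placeholders):
# print both the optimisation decisions and the resulting parse tree for one query
find -D opt,tree -O3 . -type f -name "*.log" -mtime -7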

Related

What is/how does the "search" command work?

Put briefly, can I use the 'search' command to search through file contents? If so, how?
I am trying to find an easier way to search through file contents in Linux Mint than having to type
grep -rnw . -e 'my search text'
I just noticed there is a 'search' command
$ which search
/usr/local/bin/search
However, when I look at its help text, I am presented with this:
usage: search [arguments] [options]
arguments:
for text
in directory
I am unable to make sense of this; any arguments I try passing in seem to inevitably lead me back to the same help text.
There is also no man page entry; I tried that as well.
I have tried e.g.
$ search "my search text" .
$ search . "test"
$ search . . . .
Lacking a way to get this to work, I may opt for an alias. Sadly, though, 'search' is already taken.
I just did a clean install and went to have a look at this 'search' executable. It turns out to be a shell script that ultimately runs the following command:
find $directory -type f -exec grep -$case$verbose "$text" --color=auto -n {} \;
The four $variables above are populated, respectively, with the search directory, case sensitivity, verbosity and, finally, the single-word search text.
I have no idea why this facility exists, but as its usage text suggests, you can only use it to search for a single word (no spaces) within the specified directory.
An example searching case-insensitively and without verbosity would thus be:
search for bar in ~/home/foo
The command was apparently added in Linux Mint in 2008, as indicated in this post, which also details its use.
Holy smokes. I never thought you may need to interpret a manual so literally;
$ search for text in directory
find: ‘directory’: No such file or directory
I'm still not entirely sure how to use it, though; I am puzzled as to whether it supports regexes/wildcards, etc.
I do get good results for simple case-insensitive text with
~/mytargetfolder/ $ search for singlewordspacesdontwork in .
However, it breaks as soon as there is a space; escaping it with ", \ or ' does not work.
Given my experience, I'd recommend steering clear of this command.
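If what you really want is just an easier way to run that grep, a small shell function avoids the quirks entirely. The name ftext below is my own placeholder, not anything shipped with Mint:
# Recursive, case-insensitive, line-numbered search; the second argument
# is the directory to search and defaults to the current one.
ftext() {
    grep -rni --color=auto -e "$1" "${2:-.}"
}
# Usage: ftext 'my search text' ~/projects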
I'd also like to recommend not giving answers on Stack Overflow, or asking questions like these, since I seem to be having a case of the downvotes just now.

Using the -s command in a bash script

I have a trivial error that I can't seem to get around. I'm trying to return the various section numbers of, let's say, "man", since it resides in all the sections. I am using the -s command but am having problems. Every time I use it I keep getting "What manual page do you want?". Any help?
In the case of getting the section number of a command, you want something like man -k "page_name" | awk -F'-' '/^page_name \(/ {print $1}', replacing any occurrence of page_name with whatever command you need.
This won't work for all systems necessarily as the format for the "man" output is "implementation-defined". In other words, the format on FreeBSD, OS X, various flavours of Linux, etc. may not be the same. For example, mine is:
page_name (1) - description
If you want the section number only, I'm sure there is something you can do such as saving the result of that line in a shell variable and use parameter expansion to remove the parentheses around the section number:
man -k "page_name" | awk -F'-' '/^page_name \(/ {print $1}' | while IFS= read -r sect ; do
    sect="${sect##*[(]}"
    sect="${sect%[)]*}"
    printf '%s\n' "$sect"
done
To get the number of sections a command appears in, add | wc -l at the end on the same line as the done keyword. For the mount command, I have 3:
2
2freebsd
8
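If all you want is the list of sections a page appears in, man -f (i.e. whatis) can be simpler, assuming a whatis database exists on your system (on Linux it is usually built by mandb). A minimal sketch:
# print the section number(s) in which "man" has a page, one per line
man -f man | awk '{ gsub(/[()]/, "", $2); print $2 }'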
You've misinterpreted the nature of -s. From man man:
-S list, -s list, --sections=list
    List is a colon- or comma-separated list of `order specific' manual
    sections to search. This option overrides the $MANSECT environment
    variable. (The -s spelling is for compatibility with System V.)
So when man sees man -s man it thinks you want to look for a page in section "man" (which most likely doesn't exist, since it is not a normal section), but you didn't say what page, so it asks:
What manual page do you want?
BTW, wrt "man is just the test case cuz i believe its in all the sections" -- nope, it is probably only in one, and AFAIK there isn't any word with a page in all sections. More than 2 or 3 would be very unusual.
The various standard sections are described in man man too.
The correct syntax requires an argument. Typically you're looking for either
man -s 1 man
to read the documentation for the man(1) command, or
man -s 7 man
to read about the man(7) macro package.
If you want a list of standard sections, the former contains that. You may have additional sections installed locally, though. A directory listing of /usr/local/share/man might reveal additional sections, for example.
(Incidentally, -s is not a "command" in this context, it's an option.)

Why doesn't cscope search for definitions in header files?

I use the following command to generate my cscope database:
tmpfile=$(mktemp)
find dir1/ dir2/ dir3/ -type f -regex ".*\.\([chlysS]\(xx\|pp\)*\|cc\|hh\|inl\|inc\|ld\)$" -print > "$tmpfile"
cscope -q -b -U -i "$tmpfile" -f cscope.out
In Vim, a :cs f g myfunction only takes me to the definition in the C file, never to the one in a header file.
Make sure you got the terminology right. In C, usually the function definitions are put in C files, whereas declarations go into header files.
The cscope command f g (find definition) should correctly take you to the function definition. In the case where you actually have definitions in a header file (for example inline functions) the find definition command takes you there as well. If this is not the case, you should file a bug report to the cscope team.
Cscope unfortunately does not provide functionality for showing only a declaration. You could use the find symbol command (f s) but this might show a lot of results if the function is called from many places in your code.
You can use ctags, which usually lets you choose between the declaration and the definition. I usually use a mix of cscope and ctags within my projects because neither of them provides all the functionality I want.
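If you go that route, you can feed ctags the same file list used for cscope. A sketch, assuming Exuberant or Universal Ctags (both accept -L to read a list of files):
tmpfile=$(mktemp)
find dir1/ dir2/ dir3/ -type f -regex ".*\.\([chlysS]\(xx\|pp\)*\|cc\|hh\|inl\|inc\|ld\)$" -print > "$tmpfile"
cscope -q -b -U -i "$tmpfile" -f cscope.out   # cscope database, as before
ctags -L "$tmpfile"                           # writes a 'tags' file from the same list
rm -f "$tmpfile"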

grep but indexable?

I have over 200 MB of source code files that I have to search constantly (I am part of a very big team). I notice that grep does not create an index, so a lookup requires going through the entire source code database each time.
Is there a command line utility similar to grep which has indexing ability?
The solutions below are rather simple. There are a lot of corner cases that they do not cover:
searching for start of line ^
filenames containing \n or : will fail
filenames containing white space will fail (though that can be fixed by using GNU Parallel instead of xargs)
searching for a string that matches the path of another file will be suboptimal
The good part about the solutions is that they are very easy to implement.
Solution 1: one big file
Fact: Seeking is dead slow, reading one big file is often faster.
Given those facts the idea is to simply make an index containing all the files with all their content - each line prepended with the filename and the line number:
Index a dir:
find . -type f -print0 | xargs -0 grep -Han . > .index
Use the index:
grep foo .index
Solution 2: one big compressed file
Fact: Hard drives are slow. Seeking is dead slow. Multi-core CPUs are normal.
So it may be faster to read a compressed file and decompress it on the fly than reading the uncompressed file - especially if you have RAM enough to cache the compressed file but not enough for the uncompressed file.
Index a dir:
find . -type f -print0 | xargs -0 grep -Han . | pbzip2 > .index
Use the index:
pbzcat .index | grep foo
Solution 3: use index for finding potential candidates
Generating the index can be time consuming and you might not want to do that for every single change in the dir.
To speed that up only use the index for identifying filenames that might match and do an actual grep through those (hopefully limited number of) files. This will discover files that no longer match, but it will not discover new files that do match.
The sort -u is needed to avoid grepping the same file multiple times.
Index a dir:
find . -type f -print0 | xargs -0 grep -Han . | pbzip2 > .index
Use the index:
pbzcat .index | grep foo | sed s/:.*// | sort -u | xargs grep foo
Solution 4: append to the index
Re-creating the full index can be very slow. If most of the dir stays the same, you can simply append to the index with newly changed files. The index will again only be used for locating potential candidates, so if a file no longer matches it will be discovered when grepping through the actual file.
Index a dir:
find . -type f -print0 | xargs -0 grep -Han . | pbzip2 > .index
Append to the index:
find . -type f -newer .index -print0 | xargs -0 grep -Han . | pbzip2 >> .index
Use the index:
pbzcat .index | grep foo | sed s/:.*// | sort -u | xargs grep foo
It can be even faster if you use pzstd instead of pbzip2/pbzcat.
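For example, a pzstd variant of solution 4 might look like this (assuming pzstd is installed; its flags mirror zstd's):
# Index a dir with parallel zstd compression:
find . -type f -print0 | xargs -0 grep -Han . | pzstd -c > .index.zst
# Use the index:
pzstd -dc .index.zst | grep foo | sed 's/:.*//' | sort -u | xargs grep foo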
Solution 5: use git
git grep can grep through a git repository. But it seems to do a lot of seeks and is 4 times slower on my system than solution 4.
The good part is that the .git index is smaller than the .index.bz2.
Index a dir:
git init
git add .
Append to the index:
git add .
Use the index:
git grep foo
Solution 6: optimize git
Git puts its data into many small files. This results in seeking. But you can ask git to compress the small files into a few bigger files:
git gc --aggressive
This takes a while, but it packs the index very efficiently into a few files.
Now you can do:
find .git -type f | xargs cat >/dev/null
git grep foo
git will do a lot of seeking into the index, but by running cat first, you put the whole index into RAM.
Adding to the index is the same as in solution 5, but run git gc now and then to avoid many small files, and git gc --aggressive to save more disk space, when the system is idle.
git will not free disk space if you remove files. So if you remove large amounts of data, remove .git and do git init; git add . again.
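Putting that together, a sketch of the rebuild step described above (only do this if .git exists purely as a search index, since it throws away all history):
rm -rf .git
git init
git add .
git gc --aggressive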
There is the https://code.google.com/p/codesearch/ project, which is capable of creating an index and searching it quickly. Regexps are supported and evaluated using the index (actually, only a subset of the regexp can use the index to filter the file set; the real regexp is then re-evaluated on the matched files).
The index from codesearch is usually 10-20% of the source code size, building it is about as fast as running classic grep 2 or 3 times, and searching is almost instantaneous.
The ideas used in the codesearch project come from Google's Code Search site (RIP). E.g. the index contains a map from n-grams (3-grams, or every 3-byte sequence found in your sources) to the files, and the regexp is translated to 4-grams when searching.
PS: There are also ctags and cscope for navigating C/C++ sources. Ctags can find declarations/definitions; cscope is more capable, but has problems with C++.
PPS: There are also clang-based tools for the C/C++/ObjC languages: http://blog.wuwon.id.au/2011/10/vim-plugin-for-navigating-c-with.html and clang-complete.
I notice that grep does not create an index so lookup requires going through the entire source code database each time.
Without addressing the indexing ability part, git grep will have, with Git 2.8 (Q1 2016), the ability to run in parallel!
See commit 89f09dd, commit 044b1f3, commit b6b468b (15 Dec 2015) by Victor Leschuk (vleschuk).
(Merged by Junio C Hamano -- gitster -- in commit bdd1cc2, 12 Jan 2016)
grep: add --threads=<num> option and grep.threads configuration
"git grep" can now be configured (or told from the command line) how
many threads to use when searching in the working tree files.
grep.threads:
Number of grep worker threads to use.
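A minimal usage sketch once you are on Git 2.8 or later (the thread count here is arbitrary):
# one-off: use 8 worker threads for this search
git grep --threads=8 foo
# or set it once per repository (add --global to set it everywhere)
git config grep.threads 8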
ack is a code searching tool that is optimized for programmers, especially programmers dealing with large heterogeneous source code trees: http://beyondgrep.com/
Are some of your searches ones where you only want to look at a certain type of file, like only Java files? Then you can do
ack --java function
ack does not index the source code, but that may not matter depending on what your search patterns are like. In many cases, only searching certain types of files gives the speedup you need because you're not also searching all those other XML, etc. files.
And if ack doesn't do it for you, here is a list of many tools designed for searching source code: http://beyondgrep.com/more-tools/
We use a tool internally to index very large log files and make efficient searches of them. It has been open-sourced. I don't know how well it scales to large numbers of files, though. It multithreads by default, it searches inside gzipped files, and it caches indexes of previously searched files.
https://github.com/purestorage/4grep
This grep-cache article has a script for caching grep results. His examples were run on Windows with Linux tools installed, so it can easily be used on *nix/Mac with little modification. It's mostly just a Perl script anyway.
Also, the filesystem itself (assuming you're using *nix) often caches recently read data, making later greps faster since grep is then effectively searching memory instead of disk.
You can drop that cache via /proc/sys/vm/drop_caches if you want to erase it manually and compare an uncached grep with a cached one.
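For reference, the usual way to drop the page cache by hand (requires root, and it affects the whole system, not just grep):
sync
echo 3 | sudo tee /proc/sys/vm/drop_caches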
Since you mention various kinds of text files that are not really code, I suggest you have a look at GNU ID utils. For example:
cd /tmp
# create index file named 'ID'
mkid -m /dev/null -d text /var/log/messages.*
# query index
gid -r 'spamd|kernel'
These tools focus on tokens, so queries on strings of tokens are not possible. There is minimal integration in emacs for the gid command.
For the more specific case of indexing source code, I prefer to use GNU global, which I find more flexible. For example:
cd sourcedir
# index source tree
gtags .
# look for a definition
global -x main
# look for a reference
global -xr printf
# look for another kind of symbol
global -xs argc
Global natively supports C/C++ and Java, and with a bit of configuration, can be extended to support many more languages. It also has very good integration with emacs: successive queries are stacked, and updating a source file updates the index efficiently. However I'm not aware that it is able to index plain text (yet).

linux find command is not working properly

I am using Linux (Ubuntu) and am trying to find some files, but it is not working properly.
I have created some files in my directory structure, for example: World/India/Maharashtra/Pune/filename.xml
When I use the find command like:
find /home/lokesh/Desktop/Testing_India2/Test/World/India/Maharashtra/ -name filename*.xml -mmin -3000
It is giving the result perfectly.
But, when I am using the same command at "World" or "India" level:
find /home/lokesh/Desktop/Testing_India2/Test/World/ -name filename*.xml -mmin -3000
it does not give any result.
I have lots of directories at "India" level as well as at "Maharashtra" level and may be some directories within "Maharashtra's" inner directories. I have to find each file created in all directories.
And I have mounted all the folders from different machines (I mean, some states come from one machine and some from another).
If someone knows how to solve this problem, please reply as soon as possible.
Double-quote your search string and add -L to make find follow symbolic links:
find -L /home/lokesh/Desktop/Testing_India2/Test/World/ -name "filename*.xml" -mmin -30000
This is something I ran into earlier today, actually, when using the '*' wildcard. I couldn't get it to continually traverse the subdirectories unless I escaped the * with a backslash.
Give this a try:
find -L /home/lokesh/Desktop/Testing_India2/Test/World/ -name filename\*.xml -mmin -30000
Yes, as mentioned, you have to double-quote your -name argument or put a backslash before the *. The reason it fails in one directory but works fine in others is that the * character is used for filename generation (globbing) by your shell. This of course happens before the find command is executed. Therefore, if you have a file in your current directory that matches the filename*.xml pattern, it will be substituted before find runs, which is not what you want. On the other hand, if there is no match in the current directory, the * character is passed on to find unmodified. By quoting, you protect the string from shell filename generation.
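A quick way to see the difference (the filename here is hypothetical) is to let echo show what actually reaches find:
# In a directory containing e.g. filename_old.xml, the shell expands the
# unquoted pattern before find ever runs:
echo find . -name filename*.xml      # becomes: find . -name filename_old.xml
# Quoting (or escaping) passes the pattern through to find unchanged:
echo find . -name "filename*.xml"    # stays:   find . -name filename*.xml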
Regards
