Text indexer search tool which can filter by punctuation?

This is not a programming question per se, but a question about searching source code files, which helps me in programming.
I use a search tool, X1, which quickly tells me which source code files contain the keywords I am looking for. However, it doesn't work well for keywords that have punctuation attached. For example, if I search for "show()", X1 shows everything that contains "show", including far too many results from "MessageBox.Show(.....)", which I don't want to see.
Another example: I need to find ".parent" (note the dot) without also getting everything that contains "parent" (no dot).
Does anyone know a text search tool that can filter by keywords containing punctuation? I really prefer a desktop app to a web-based tool like Google (I find it clunky).
I am looking for a tool that indexes words, not a general file searcher like Windows File Explorer.

If you want to search code files efficiently for keywords and punctuation, consider the SD Source Code Search Engine. It indexes each source language according to language-specific rules, so it knows exactly which identifiers, keywords, strings, comments and operators occur in that language, and it indexes the code according to those elements. It handles a wide variety of languages (C, C++, Java, VB6, C#, COBOL) all at once.
Your first query would be posed as:
I=show - I=MessageBox ... '('
(locate identifiers named "show", but eliminate those that are overlapped by a match for MessageBox followed by a left parenthesis).
Your second query would be posed as simply:
'.' I=parent
See http://www.semanticdesigns.com/Products/SearchEngine/index.html
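For readers without the SD engine, the underlying idea (matching token sequences rather than bare words) can be illustrated with a small, unindexed Python sketch. This is not how the SD engine works internally: the regex lexer is a rough assumption for C-family code, it works line by line and does not understand comments, and "tokens" and "find_sequence" are hypothetical helper names. Because tokens are compared exactly and case-sensitively, ".parent" only matches a dot followed by the identifier "parent", and "show()" does not match "MessageBox.Show(...)".

```python
import re

# Rough C-family lexer (an assumption, not a real parser): identifiers,
# integers, double-quoted strings, or any single non-space character.
TOKEN_RE = re.compile(r'[A-Za-z_]\w*|\d+|"(?:\\.|[^"\\])*"|\S')


def tokens(text):
    """Split text into a list of token strings."""
    return TOKEN_RE.findall(text)


def find_sequence(lines, query):
    """Return (line_number, line) pairs whose tokens contain the query's
    tokens as a contiguous, case-sensitive run."""
    pattern = tokens(query)
    n = len(pattern)
    hits = []
    for number, line in enumerate(lines, start=1):
        toks = tokens(line)
        if any(toks[i:i + n] == pattern for i in range(len(toks) - n + 1)):
            hits.append((number, line))
    return hits


if __name__ == "__main__":
    source = [
        "x = node.parent;",
        "parent = null;",
        'MessageBox.Show("hi");',
        "dlg.show();",
    ]
    print(find_sequence(source, ".parent"))  # [(1, 'x = node.parent;')]
    print(find_sequence(source, "show()"))   # [(4, 'dlg.show();')]
```

A real tool would build an index of these tokens once instead of re-lexing every file per query; the sketch only shows why token-level matching separates ".parent" from "parent".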

This seems to be the job of tools like ctags and cscope.
Ctags indexes declarations in source files (many languages are supported), and Cscope does in-depth analysis of C files.
In my opinion, these tools are better suited to per-project use. Moreover, you need another tool to use these indexes; I use Vim myself for this purpose, but many text editors support ctags.

Try the tools from DTSearch.com.

Related

Why doesn't Eclipse JDT have a global symbol search?

In CDT there is an "Open Element" command to search for global symbols, but JDT has nothing similar.
It only offers "Java Search", which is obviously not very convenient. Why doesn't JDT provide a function like this?
Anywhere in Eclipse you can use the general File Search to search for words regardless of their position in the text. This search can be limited to *.java files; the "Whole word" option may also be relevant for this question.
If you want more precise search results, JDT offers language-aware search, but for this added precision you need to specify the kind of symbol you are interested in (Search For). Without specifying the kind, search would be very similar to plain text search.
Both CDT and JDT use an index for search. The CDT index is said to be faster because it is more complete, whereas JDT search needs to operate in two phases: index-based match candidates plus exact matching using the resolved AST. In fact, efforts have started to port the concept of the more complete CDT index to JDT as well, for improved search speed. As of Oxygen, however, this effort has not been completed.
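The two-phase scheme (a cheap index lookup for candidates, then an exact but more expensive check on each candidate) is a general pattern. The Python sketch below illustrates it with an inverted word index and a caller-supplied verification function. It is not JDT's code: "build_index", "search" and "declares_class_show" are hypothetical names, and a regex stands in for what JDT does with a resolved AST.

```python
import re
from collections import defaultdict

WORD_RE = re.compile(r"\w+")


def build_index(files):
    """Index phase: map each word to the set of file names containing it.
    `files` maps file name -> source text."""
    index = defaultdict(set)
    for name, text in files.items():
        for word in WORD_RE.findall(text):
            index[word].add(name)
    return index


def search(files, index, word, verify):
    """Phase 1: take candidate files from the index (fast but imprecise).
    Phase 2: keep only candidates accepted by `verify` (exact but slower)."""
    candidates = index.get(word, set())
    return sorted(name for name in candidates if verify(files[name]))


def declares_class_show(text):
    """Stand-in for exact, AST-based matching: is `Show` declared as a class?"""
    return re.search(r"\bclass\s+Show\b", text) is not None


if __name__ == "__main__":
    files = {
        "A.java": "class Show { }",
        "B.java": "void f() { Show(); }",
        "C.java": "int x;",
    }
    index = build_index(files)
    print(search(files, index, "Show", declares_class_show))  # ['A.java']
```

The index alone would return both A.java and B.java; only the second phase knows that B.java merely calls Show rather than declaring it.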
Anyone who sees a substantial benefit in being able to search for more than one kind at a time is invited to chime in at Bug 221081.

Creating tags for a scripting language for easy browsing in Vim

I use ctags+Vim for a lot of my projects and I really like the ability to easily browse through large chunks of code quickly.
I am also using Stata, a statistical package, which has a script language. Even though you can have routines in the code, its code tends to be series of commands that perform data and statistics operations. And the code files can be very long. So I always find myself in need of a way to browse it efficiently.
Since I use Vim, I can use marks. But I was wondering if I could use ctags to do this. That is, I want to create a tag marker that (1) won't cause a problem when I run the script and (2) is easy to make ctags recognize.
Because it must not break the script, it needs to be a comment. In Stata, comment lines start with *, and block comments can be made with /* ... */.
It would be great, for example, to have sections in the code marked by comments:
* Section: Data Manipulation
And ctags would pick up "Data Manipulation" as the tag. Then I could see a list of sections and jump to them easily without needing to create marks.
Is there any way to do this? I'd appreciate any comments.
You need a way to generate a tags database for your Stata files. The format is simple; see :help tags-file-format. The default tags program, Exuberant Ctags, can be extended with regular expressions (--langmap, --regex); that probably yields only an approximate parsing for complex languages, but it should suffice for custom section marks, and you might even be able to extract interesting language keywords directly.
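To show how simple the format is, here is a minimal Python sketch that writes a Vim tags file for section markers like "* Section: Data Manipulation". The marker syntax is the one proposed in the question (not a Stata or ctags convention), and the script name in the usage comment is hypothetical. Each line has the form {tagname}<Tab>{tagfile}<Tab>{tagaddress}, with the line number as the address. Spaces in section names are replaced by underscores so that ":tag Data_Manipulation" works, and lines are sorted so Vim can binary-search the file.

```python
import re
import sys
from pathlib import Path

# Matches Stata comment lines such as "* Section: Data Manipulation".
# The "Section:" convention is the one proposed in the question.
SECTION_RE = re.compile(r"^\s*\*\s*Section:\s*(.+?)\s*$")


def section_tags(path):
    """Yield (tagname, filename, line_number) for each section marker."""
    with open(path, encoding="utf-8") as f:
        for number, line in enumerate(f, start=1):
            m = SECTION_RE.match(line)
            if m:
                # Tag names must not contain tabs; underscores keep :tag simple.
                yield re.sub(r"\s+", "_", m.group(1)), str(path), number


def write_tags(paths, out="tags"):
    """Write a Vim tags file (name<Tab>file<Tab>line), sorted by name."""
    entries = sorted(
        f"{name}\t{file}\t{lineno}"
        for p in paths
        for name, file, lineno in section_tags(p)
    )
    Path(out).write_text("".join(e + "\n" for e in entries), encoding="utf-8")
    return entries


if __name__ == "__main__":
    # Usage (hypothetical script name): python stata_tags.py analysis.do more.do
    if len(sys.argv) > 1:
        write_tags(sys.argv[1:])
    else:
        print("usage: python stata_tags.py FILE.do [FILE.do ...]")
```

Run it in the project directory and Vim picks up the resulting "tags" file with its default 'tags' setting; the Exuberant Ctags --regex route achieves the same without a separate script.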

arch options in the Stanford tagger?

Other than the standard arch options like left3words, left5words, bidirectional and bi5words, what do the rest of the options mean, and what arguments do they need?
I can't seem to find the documentation anywhere!
I'm afraid that the arch options are at present only documented in the source code :-(.
See the ExtractorFrames and ExtractorFramesRare classes.
A first thing to do would be to look at the arch options that are used in the distributed taggers. You can find them in the *.props files in the models subdirectory.
In brief:
"generic" gives you a decent basic set of word and tag features (current, previous, and next word features, previous tag and previous two tags, and conjunctions of previous tag and current word and current and previous word). It's a good place to start.
There are various options that turn on a whole bunch of extractors to give known good configurations for English and Chinese (bidirectional, sighan2005, naacl2003unknowns).
Other options, often with a parameter, turn on sets of features in sensible ways that can be mixed together. You can see this in the definitions of the distributed Chinese and Arabic taggers. E.g., suffix(6) includes as features all word-ending substrings of length up to 6.
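To make these descriptions concrete, here is a small Python sketch of what a "generic"-style feature set and a suffix(6)-style extractor compute for one position in a sentence. The feature names and string formats are invented for illustration; they are not the Stanford tagger's internal representation, which lives in ExtractorFrames and ExtractorFramesRare.

```python
def generic_features(words, tags, i):
    """Features for position i, following the "generic" description:
    current/previous/next word, previous tag, previous two tags, and the
    conjunctions (previous tag + current word) and (previous + current word).
    `tags` holds the tags already assigned to positions before i."""
    def word(j):
        return words[j] if 0 <= j < len(words) else "<NA>"

    def tag(j):
        return tags[j] if 0 <= j < len(tags) else "<NA>"

    return [
        f"w0={word(i)}",
        f"w-1={word(i - 1)}",
        f"w+1={word(i + 1)}",
        f"t-1={tag(i - 1)}",
        f"t-2,t-1={tag(i - 2)},{tag(i - 1)}",
        f"t-1,w0={tag(i - 1)},{word(i)}",
        f"w-1,w0={word(i - 1)},{word(i)}",
    ]


def suffix_features(word, max_len=6):
    """suffix(6)-style features: every word-ending substring of length 1..max_len."""
    return [f"suf={word[-k:]}" for k in range(1, min(max_len, len(word)) + 1)]


if __name__ == "__main__":
    print(generic_features(["the", "dog", "barks"], ["DT", "NN"], 2))
    print(suffix_features("tagging"))
    # ['suf=g', 'suf=ng', 'suf=ing', 'suf=ging', 'suf=gging', 'suf=agging']
```

Suffix features matter mostly for rare and unknown words, where the word identity itself carries little information.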

Using Emacs for big big projects

Maybe this is an often-repeated question here, but I couldn't find anything similar using the search.
The point is that I like to use Emacs for my personal projects, usually very small applications in C or Python, but I was wondering how to use it for my work as well, where we have projects with about 10k source files, so they are very big (currently I use Source Insight, which is a very nice tool, but Windows-only). My questions are:
Searching: What is the most convenient way to search for a string across the whole project?
Navigating through functions: I mean something like putting the cursor over a function, define, or variable and jumping to its definition.
Refactoring
Also, if you have any experience with this and want to share your thoughts, I would find that very interesting.
The "traditional" way of navigating C source files is to use "etags" to make a file called TAGS, then use ALT-. to go to functions across files.
For searching for strings in files, I usually use "grep". You could make a shell script with all the directories you want to search or something if you get tired of typing them in each time.
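Instead of a shell script, the same "search all these directories" idea can be sketched in Python. The directory list is a placeholder, and "search_dirs" and the script name in the usage comment are hypothetical. The output follows grep's file:line:text format, which Emacs's grep buffers can parse, so running it via M-x grep (replacing the suggested command) lets you step through hits with next-error.

```python
import fnmatch
import os
import re
import sys

# Placeholder list: replace with the directories of your project.
SEARCH_DIRS = ["src", "include", "tests"]


def search_dirs(pattern, dirs, glob="*"):
    """Yield 'file:line:text' for every line matching the regex `pattern`
    in files under `dirs` whose names match `glob`."""
    regex = re.compile(pattern)
    for top in dirs:
        for root, _subdirs, files in os.walk(top):
            for name in fnmatch.filter(files, glob):
                path = os.path.join(root, name)
                try:
                    with open(path, encoding="utf-8", errors="replace") as f:
                        for number, line in enumerate(f, start=1):
                            if regex.search(line):
                                yield f"{path}:{number}:{line.rstrip()}"
                except OSError:
                    continue  # unreadable file: skip it silently, like grep -s


if __name__ == "__main__":
    # Usage (hypothetical script name): python multigrep.py 'regexp' '*.c'
    if len(sys.argv) > 1:
        glob = sys.argv[2] if len(sys.argv) > 2 else "*"
        for hit in search_dirs(sys.argv[1], SEARCH_DIRS, glob):
            print(hit)
    else:
        print("usage: python multigrep.py REGEXP [GLOB]")
```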
My projects typically live in git, so I put this together to quickly search them:
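;; There's something similar (but fancier) in vc-git.el: vc-git-grep
(require 'grep)  ; provides `grep' and declares `grep-use-null-device' special

;; -I means don't search through binary files;
;; -n adds line numbers so the grep buffer can jump to each hit
(defcustom git-grep-switches "--extended-regexp -I -n --ignore-case"
  "Switches to pass to `git grep'."
  :type 'string)

(defvar git-grep-history nil
  "Minibuffer history for `git-grep'.")

(defun git-grep (command-args)
  "Run `git grep' with COMMAND-ARGS and show the hits in a grep buffer."
  (interactive
   (list (read-shell-command "Run git-grep (like this): "
                             (format "git grep %s -e "
                                     git-grep-switches)
                             'git-grep-history)))
  ;; git grep doesn't need /dev/null appended to the command line
  (let ((grep-use-null-device nil))
    (grep command-args)))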
There is also the Emacs Code Browser. It makes exploring projects a lot simpler.
Regarding searches in the whole project, I find the rgrep command extremely useful.
Also, imenu is quite handy to jump to a function definition in the same file.
These are my 2p.
Look at EDE from CEDET; it provides basic support for projects.
ECB is too heavyweight for my taste. I have had good results with xcscope. Needless to say it doesn't help too much with Python.
http://www.emacswiki.org/emacs/CScopeAndEmacs
In addition to using TAGS as others have mentioned, I find igrep and igrep-find very useful. There is also Emacs' built in grep and grep-find, but I find their interface more clumsy.
My standard search is:
M-x igrep-find some_regexp RET ~/work_area/*.cxx
This looks in all *.cxx files under ~/work_area and shows the lines matching some_regexp. Like all the search utilities, it populates a compilation-like buffer you can navigate using C-x ` (aka M-x next-error).
There are many ways that Icicles can help with projects. Likewise, Bookmark+ and even Dired+.
These libraries can help you create, organize, and manage projects, wherever their files and directories might reside. And they can help you navigate and search in various ways.
Some of the features are unique -- quite different from other approaches. I could list some of the project support here, but this is the best place to start.

Intelligent file search for windows that can ignore whitespace and search in code?

Does anybody know a Windows-based search tool that is easy to use and programmer-friendly?
The features I am looking for:
Ignore whitespace in search, i.e. be able to find
myTestFunction ( $parameter, $another_parameter, $yet_another_parameter )
{ doThis();
using the query
myTestFunction($parameter,$another_parameter,$yet_another_parameter){doThis();
without Regexes.
Search code "semantically" (for me, it would have to be PHP):
Search in comments only
Search in function names only
Search for parameters that are named $xyz
Search in (insert code construct here) only
If there is none around, it's high time somebody developed it! :)
I have opened a bounty for this.
See our SD Search Engine. This is a language-sensitive search engine designed to search large code bases, with special language classifiers for C, C++, Java, C#, COBOL, JavaScript, Ada, Python, Ruby and lots of other languages, including your specific target language, PHP (PHP4 and PHP5).
I think it does everything you requested.
It indexes the language elements, so searches across large code bases are extremely fast (Linux kernel, ~7.5 million lines: 2.5 seconds). (The indexing step runs on Windows, but the display engine is in Java.)
Search hits are shown in a hit window with one line of context per hit, showing the file and line number as well as the line with the hit highlighted. Clicking a hit brings up the source code, with tabs expanded appropriately and line counts correct even for languages with odd line-counting rules (such as GCC with respect to form-feed characters), and with the hit line and hit text highlighted. Clicking in the source window launches your favorite editor on the file.
Because it understands language elements, it ignores language-specific whitespace. It skips over comments unless you insist they be inspected. Searches thus ignore whitespace, comments and line boundaries (if the language treats line boundaries as whitespace, which is why there are language-specific scanners). The query language lets you specify which language tokens you want (specific tokens in quotes, or generic tokens such as identifiers I, numbers N, strings S, operators O and punctuation P), with constraints on the token values, as well as sequences of tokens.
Your example search:
myTestFunction($parameter,$another_parameter,$yet_another_parameter){doThis();
would be expressed to the search engine precisely as:
I=myTestFunction '(' I ',' I ',' I ')' '{' I=dothis '(' ')' ';'
but it would probably be easier (less typing) to find it as:
I=myTest* ... I=dothis
where I=myTest* means an identifier starting with myTest and ... means "near".
The Search Engine also offers regular-expression searches on the text, if you insist. So you still have grep-like searches (a lot slower than indexed searches), but with the hit window and source display windows too.
I use ack very successfully for this kind of thing, particularly when trying to find things in large codebases. I run it on Linux myself, but I don't see any reason why it wouldn't run on Windows, or in Cygwin at the very least. Check it out; I think you'll find it is exactly what you're looking for.
Search code "semantically" (for me, it would have to be PHP):
For this you could (and I think should) write some custom code using PHP's token_get_all().
See also the available tokens
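The token_get_all() approach is PHP-specific. As an illustration of the same idea, here is a Python sketch that swaps in a different language: it uses Python's own standard tokenize module on Python source (not PHP) to search only in comments, only in names, or only in the names of defined functions. "search_tokens" and its kind values are hypothetical. A PHP version would walk the token_get_all() result in the same way and test token types such as T_COMMENT or T_FUNCTION.

```python
import io
import tokenize

# Token types that carry no code and should not count as the "previous token".
_SKIP = {tokenize.COMMENT, tokenize.NL, tokenize.NEWLINE,
         tokenize.INDENT, tokenize.DEDENT}


def search_tokens(source, needle, kind):
    """Return (line, token_text) hits for one kind of code element.

    kind == "comment":  comments containing `needle` as a substring
    kind == "name":     NAME tokens (identifiers, also keywords) equal to `needle`
    kind == "function": names equal to `needle` directly after `def`
    """
    hits = []
    prev = None
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if kind == "comment":
            match = tok.type == tokenize.COMMENT and needle in tok.string
        elif kind == "name":
            match = tok.type == tokenize.NAME and tok.string == needle
        elif kind == "function":
            match = (tok.type == tokenize.NAME and tok.string == needle
                     and prev is not None and prev.string == "def")
        else:
            raise ValueError(f"unknown kind: {kind!r}")
        if match:
            hits.append((tok.start[0], tok.string))
        if tok.type not in _SKIP:
            prev = tok
    return hits


if __name__ == "__main__":
    sample = "# show the parent\ndef show(parent):\n    return parent.show()  # call show\n"
    print(search_tokens(sample, "show", "comment"))   # [(1, '# show the parent'), (3, '# call show')]
    print(search_tokens(sample, "show", "function"))  # [(2, 'show')]
```

Note that the method call "parent.show()" is not reported as a function hit, because the token before it is ".", not "def".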
Ignore white space in search
A simple regex should be sufficient: allow optional whitespace (\s* in most regex libraries) between the tokens of the query. The exact syntax depends on your regex library.
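If you want this without writing regexes by hand, the regex can be generated from the literal query. A minimal Python sketch, assuming C/PHP-like tokens where a word is a run of letters, digits, "_" or "$" (the name "whitespace_insensitive" is hypothetical): the query is split into words and single punctuation characters, each piece is escaped, optional whitespace (\s*, which also matches newlines) is allowed between pieces, and at least one whitespace character (\s+) is required between two adjacent words.

```python
import re

# Words are runs of letters, digits, "_" or "$" (so PHP variables stay whole);
# every other non-space character is its own token.
TOKEN_RE = re.compile(r"[\w$]+|\S")
WORD_RE = re.compile(r"[\w$]+")


def whitespace_insensitive(query):
    """Compile a regex that matches `query` regardless of the whitespace
    (including newlines) between its tokens."""
    tokens = TOKEN_RE.findall(query)
    parts = []
    for prev, tok in zip([None] + tokens, tokens):
        if prev is not None:
            # Two adjacent words need some whitespace ("int x" must not match
            # "intx"); punctuation may touch its neighbours or be spaced out.
            both_words = WORD_RE.fullmatch(prev) and WORD_RE.fullmatch(tok)
            parts.append(r"\s+" if both_words else r"\s*")
        parts.append(re.escape(tok))
    return re.compile("".join(parts))


if __name__ == "__main__":
    code = ("myTestFunction ( $parameter, $another_parameter, $yet_another_parameter )\n"
            "{ doThis();")
    query = "myTestFunction($parameter,$another_parameter,$yet_another_parameter){doThis();"
    print(bool(whitespace_insensitive(query).search(code)))  # True
```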
For my Windows desktop search, I use Agent Ransack. I use this as a replacement for the windows search.
You can use regular expressions, but there is a nice entry screen if you want to avoid entering them directly.
Take a look at the Google Desktop API; it has a very powerful set of methods to do what you're looking for.
Of course, it requires you to have Google Desktop installed.
Having reviewed it a little more, I'd say it provides some of this functionality, but not as specifically as you require.
I really like Crimson Editor and it allows RegEx searches. It has helped me a bunch over the past six years. I think it will fit your needs. Try it.
I use TextPad for searching code files in Windows. It has a very handy find-in-files function (Search / Find In Files) and you can use regex which should meet any search requirements. In the search results it will list the file location, line number and a snippet from that line.

Resources