I noticed that ellipsis '...' is ignored in all search engines I tested, including in stackoverflow search engine, when trying to search literature to address this POST, is there any way to avoid this?
Sadly not directly possible, most search engines ignore all punctuation. Google has made an effort to allow for certain types to be searched like "," or ".", but all of these are special cases and don't follow a general rule.
What you can do however is to search for "sizeof ellipsis" which brings the results of thoughtful writers.
Sidenote: "..." in particular is kind of special in common texts, there are lots of different ways to use it in printing, with differences in spacing, bracing, vertical position etc.. Also there is the possibility of using the single character HTML … which results in … as does Unicode U+2026.
Related
Question about Lucene,
I have a file that I would like to index and search by different analyzers. My goal is to be able to change how I search.
In one case I would like to search exact phrase with punctuation IE. for "one,two" and only return exact matchings w/ punctuation.
I would also like to be able to search the exact phrase without punctuation. IE. for "one two." As in the StandardAnalyzer
Essentially I need to change the search functionality on one field.
How can I change the search on the same file. Ive tried using two analyzers (standard and whitespace) however this makes the indexing time very long.
My second thought is to use just a WhitespaceAnalyzer and when searching pass a query that further tokenizes each string if needed? However I am not sure which API has this if any do.
Also is there a good reading on how analyzers and tokens work and are implemented.
Thanks
What do you mean you tried two analyzers? Duplicate the content to 2 seperate fields with different analyzers? That would be my suggestion.
I found the grammar error "it's" as a possessive on one page of a large project. I'm trying to search for any other usages of this on pages to correct it, but I'm getting results containing hundreds of comments. I just want to filter for the important user-facing portions of the project. Is there a way to exclude comments from the results of a global search?
In more recent versions, at least in PyCharm 2018 (similar to IntelliJ), there is a filter option "Except comments," as shown here:
(Click the small filter icon to show the dropdown.)
Note: The selected filter option persists during a session, and the active filter option is not immediately apparent unless you open the dropdown. To prevent accidentally limiting subsequent searches, it may be a good idea to switch back to "Anywhere" afterward.
Another approach would be to enable the "Regular expression" (or "Regex") checkbox in the search dialog, then use some kind of negative lookaround to exclude comments.
In one case, I needed to exclude lines with single-line comments (e.g. # this is a comment) from a search, but not lines with inline comments (e.g. a=b+1 # this is an inline comment). The following did the trick, searching for something (for Python comments, starting with #):
^((?!#).)*something.*$
Please note I'm not a regex-expert, so this regex pattern can probably be improved upon greatly, but it illustrates the idea. You can play around with this on regex101. Any comments to improve the pattern are most welcome, of course.
Note sure if this approach could be extended to multiline comments though. (as in """ several lines here """).
It's difficult to suggest something without really looking at the code, but since it seems like a one-off thing, I would use global search just in comments to temporary replace "it's" with some #temporary-token#, then use global search everywhere, you should everything what's left. Then rollback temporary token for comments. Should be easy to try with VCS. Just an idea.
As you can see, with "Comments only" option, only one #token is found.
Does anybody know a Windows based searching tool that is easy to use and is programmer
friendly.
The functions I am looking for:
Ignore white space in search
= capable to find
myTestFunction ( $parameter, $another_parameter, $yet_another_parameter )
{ doThis();
using the query
myTestFunction($parameter,$another_parameter,$yet_another_parameter){doThis();
without Regexes.
Search code "semantically" (for me, it would have to be PHP):
Search in comments only
Search in function names only
Search for parameters that are named $xyz
Search in (insert code construct here) only
If there is none around, it's high time somebody developed it! :)
I have opened a bounty for this.
See our SD Search Engine. This is a language-sensitive search engine designed to search large code bases, with special language classifiers for C, C++, Java, C#, COBOL, JavaScript, Ada, Python, Ruby and lot of other languages, including your specific target langauge PHP (PHP4 and PHP5).
I think it does everything you requested.
It indexes the language elements so search across large code bases are extremely fast (Linux Kernal ~~ 7.5 Million lines --> 2.5 seconds). (The indexing step runs
on Windows, but the display engine is in Java.)
Search hits are shown in one-line context hit window showing the file and line number, as well as the line with the hit highlighted. Clicks on hits bring up the source code, tabs expanded appropriately, and the line count right even for languages which have odd line counting rules (such as GCC WRT form characters), with the hit line and hit text highlighted. Clicking in the source window will launch your favorite editor on the file.
Because it understands language elements, it ignores language-specific whitespace. It skips over comments unless you insist they be inspected. Searches thus ignore whitespace, comments and lineboundaries (if the language thinks lineboundaries are whitespace, which is why there are langauge-specific scanners). The query language allows you to specify which language tokens you want (specific tokens in quotes, or generic tokens such as identifiers I, numbers N, strings S, operators O and punctuation P) with constraints on the token value as well as a series of tokens.
Your example search:
myTestFunction($parameter,$another_parameter,$yet_another_parameter){doThis();
would be expressed to the search engine precisely as:
I=myTestFunction '(' I ',' I ',' I ')' '{' I=dothis '(' ')' ';'
but it would probably be easier (less typing) to find it as:
I=myTest* ... I=dothis
where I=myTest* means an identifier starting with myTest and ... means "near".
The Search Engine also offer regular expressions searches on the text, if you insist.
So you still have grep-like searches (a lot slower than indexed searches)
but with the hit window and source display windows too.
I use ack really successfully for this kind of thing, particularly when trying to find things in large codebases. I run it linux myself but I don't see any reason why it won't run on windows or in Cygwin at the very least. Check it out, I think you'll find it is exactly what you're looking for.
Search code "semantically" (for me, it would have to be PHP):
For this you could (and I think should) use some custom code using token_get_all()
See also the available tokens
Ignore white space in search
A simple regex should be sufficient. It depends on your regex-library, but most come with a whitespace modifier/flag.
For my Windows desktop search, I use Agent Ransack. I use this as a replacement for the windows search.
You can use regular expressions, but there is a nice entry screen if you want to avoid entering them directly.
Take a look at Google Desktop API, it has very powerful set of methods to do what you're looking for.
Of course it requires you to have the Google Desktop installed.
After reviewing it a little, it provides some functionality but not that specific as what you require.
I really like Crimson Editor and it allows RegEx searches. It has helped me a bunch over the past six years. I think it will fit your needs. Try it.
I use TextPad for searching code files in Windows. It has a very handy find-in-files function (Search / Find In Files) and you can use regex which should meet any search requirements. In the search results it will list the file location, line number and a snippet from that line.
I know the topic of removing www.experts-echange.com has been beaten to death but having to type -site:www.experts-exchange.com is tedious. Even the ability to auto add strings to a query would solve this problem. I can probably wrap this into some wget mess but this seems like basic functionality many users would base their search engine of choice on. If you have discovered some easy method to do this for yourself please let me know.
I imagine there is a really slick toolbar that feeds google your text plus the additional strings you choose. There is some internal limit to the number of words and or operators Google searches process (with good reason I suppose).
The Google Custom Search API allows you to include or exclude sites from your search. You can add a custom search engine to your iGoogle home page.
Google custom search: http://www.google.com/coop/cse/
2 easy ways in Firefox:
Write a Grease Monkey script.
Use a search keyword. You type the keyword plus a string in the address bar to trigger a search. In this case the URL is http://www.google.com/search?q=-site:expertsexchange.com%20%s. To search on the site your tab is currently in, use javascript:location='http://www.google.com/search?num=100&q=site:'%20+%20escape(location.hostname)%20+%20'%20%S'%20;%20void%200
As an alternative to excluding results, I have a greasemonkey script that highlights google search results by domain. I configure subtle colors for a few sites of interest to me, like wikipedia & stackoverflow. But I use red for expertsexchange, which allows me to visually skip right over it.
I can publish my script if there is interest...
If you want to whip up your own script, you need to operate on two kinds of elements. Here are the two XPath expressions that I use:
//cite[contains(., '" + domain + "')]/ancestor::li[1]
//span[#class='a'][contains(., '" + domain + "')]/ancestor::div[#class='g']
Then I just apply background-color styles to matching elements. Pretty straight forward.
This is not a programming question per se but a question about searching source code files, which help me in programming.
I use a search tool, X1, which quickly tells me which source code files contain some keywords I am looking for. However it doesn't work well for keywords which have punctuation attached to them. For example, if I search for "show()", X1 shows everything that has "show" in it including the too many results from "MessageBox.Show(.....)" which I don't want to see.
Another example: I need to filter to show ".parent" (notice the dot) and not show everything that has "parent" (no dot) in it.
Anyone knows a text search tool which can filter by keywords that have punctuation? I really prefer a desktop app instead of web based tool like Google (I find it clunky).
I am looking for a tool which indexes words and not a general file searcher like Windows File Explorer.
If you want to search code files efficiently for keywords and punctuation,
consider the SD Source Code Search Engine. It indexes each source langauge according
to langage-specific rules, so it knows exactly the identifiers, keywords,
strings, comments, operators in that langauge and indexes it according to
those elements. It will handle a wide variety of languages: C, C++, Java, VB6, C#, COBOL,
all at once.
Your first query would be posed as:
I=show - I=MessageBox ... '('
(locate identifiers named "show" but eliminate those that are overlapped by
MessageBox leftparen).
You second query would be posed as simply
'.' I=parent
See http://www.semanticdesigns.com/Products/SearchEngine/index.html
It seem to be the job of tools like ctags and cscope.
Ctags is used to index declarations of source files (many languages supported) and Cscope for in-depth c file analysis.
These tools are more suited for a per project use in my opinion. Moreover, you may need to use another tool to use these index, I use vim myself for this purpose, but many text editors use ctags.
The tool from DTSearch.com.