How to use RegexQuery with Lucene.Net?

I am building a simple search engine project in C# using Lucene.Net. I have finished indexing and simple searching. One module of my project extracts all the email IDs on a particular page, and for this I need some regular-expression logic. I have searched the web extensively but couldn't find any example of how to search the index using RegexQuery or any other regular-expression tool. Please help!
I am using Lucene.Net version 2.9.2.
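If the goal is only to pull the email addresses out of each page, a regular expression can be run over the page text itself, before or instead of querying the index. A minimal sketch in Python rather than C#, assuming the page text is already available as a string; the pattern is a deliberately simplified assumption and does not cover every address the email RFCs allow:

```python
import re
from typing import List

# Simplified pattern (an assumption): local part, "@", dotted domain, 2+ letter TLD.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def extract_emails(page_text: str) -> List[str]:
    """Return the distinct email addresses in page_text, in first-seen order."""
    # dict.fromkeys drops duplicates while keeping insertion order.
    return list(dict.fromkeys(EMAIL_RE.findall(page_text)))


if __name__ == "__main__":
    page = "Contact alice@example.com or bob.smith@mail.example.org. Again: alice@example.com"
    print(extract_emails(page))
```

The same pattern could be ported to .NET's `System.Text.RegularExpressions`, but that port is not shown here.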

I just started researching the same thing and have come to the conclusion that regex searches weren't supported in 2.9.2.
On the brighter side, it looks like the newest version does support regex searches:
http://incubator.apache.org/lucene.net/
To quote the site on version 2.9.4: "A couple of new features: Search.Regex, Simple Faceted Search, and simple phrase analysis in the Fast Vector Highlighter"
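Conceptually, a regex query over an index tests the pattern against the indexed terms rather than the original text, then collects the documents that contain any matching term. Whether an email address survives as a single term depends on the analyzer. The toy Python sketch below assumes a plain dictionary from term to document ids as a stand-in for the real term dictionary and postings lists; it is not Lucene.Net code, and the whole-term (`fullmatch`) semantics are an assumption of the sketch. It also shows why such queries can be slow: every term in the vocabulary is tested.

```python
import re
from typing import Dict, Set


def regex_search(index: Dict[str, Set[int]], pattern: str) -> Set[int]:
    """Return ids of documents containing at least one term that matches pattern.

    index maps each indexed term to the set of document ids containing it,
    a toy stand-in for a term dictionary plus postings lists.
    """
    rx = re.compile(pattern)
    hits: Set[int] = set()
    # Test every term in the dictionary, then union the postings of the matches.
    for term, doc_ids in index.items():
        if rx.fullmatch(term):  # whole-term matching is an assumption of this sketch
            hits |= doc_ids
    return hits


if __name__ == "__main__":
    index = {
        "alice@example.com": {1},
        "bob@test.org": {2},
        "sandwich": {1, 3},
    }
    print(sorted(regex_search(index, r"[^@\s]+@[^@\s]+\.[a-z]+")))
```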

Related

Implementing a docs search for multiple docs sites

We have many different documentation sites and I would like to search a keyword across all of these sites. How can I do that?
I already thought about implementing a simple web scraper, but this seems like a very ugly solution.
An alternative may be to use Elasticsearch and somehow point it to the different doc repos.
Are there better suggestions?
Algolia is the absolute best solution that I can think of. There are also Typesense and Meilisearch, of course.
Algolia is built specifically for situations like yours, so it even comes with a crawler.
https://www.algolia.com/products/search-and-discovery/crawler/
https://www.algolia.com/
https://typesense.org/
https://www.meilisearch.com/
Here's a fun page comparing them (probably a little biased in Typesense's favor):
https://typesense.org/typesense-vs-algolia-vs-elasticsearch-vs-meilisearch/
Here are some example sites that use Algolia Search:
https://developers.cloudflare.com/
https://getbootstrap.com/docs/5.1/getting-started/introduction/
https://reactjs.org/
https://hn.algolia.com/
If you personally are just trying to search for a keyword, and the sites are indexed by Google, you can always search with the format site:{domain} "keyword".
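To run that query against several documentation sites, one option is to generate one search URL per domain. A small Python sketch; the domains and keyword are example values, and opening results through the `https://www.google.com/search?q=` URL form is an assumption about how you want to use them:

```python
from typing import List
from urllib.parse import urlencode


def site_search_urls(domains: List[str], keyword: str) -> List[str]:
    """Build one Google search URL per documentation domain."""
    urls = []
    for domain in domains:
        # Same shape as the query format above: site:{domain} "keyword"
        query = f'site:{domain} "{keyword}"'
        urls.append("https://www.google.com/search?" + urlencode({"q": query}))
    return urls


if __name__ == "__main__":
    for url in site_search_urls(["developers.cloudflare.com", "getbootstrap.com"], "grid"):
        print(url)
```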
You can check out Meilisearch for your use case. Meilisearch is an open-source search engine written in Rust.
Meilisearch comes with a documentation scraper tool (https://github.com/meilisearch/docs-scraper) that can scrape content and then index it.
To use it, you define exactly which content you want to extract in the scraper's configuration file, and then you run the tool using Docker.
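The scraper configuration is a JSON file. The Python sketch below only assembles and prints one. The key names (`index_uid`, `start_urls`, and `selectors` with heading levels such as `lvl0` and a `text` entry) follow my reading of the docs-scraper README and should be checked against it. The URLs and CSS selectors are hypothetical placeholders for your own sites' markup. The Docker invocation itself is described in the README and not reproduced here.

```python
import json
from typing import Dict, List


def build_scraper_config(index_uid: str, start_urls: List[str],
                         selectors: Dict[str, str]) -> dict:
    """Assemble a docs-scraper style config (key names assumed from its README)."""
    return {
        "index_uid": index_uid,    # Meilisearch index the scraped records go into
        "start_urls": start_urls,  # pages where crawling starts, one per docs site
        "selectors": selectors,    # CSS selectors for heading levels and body text
    }


if __name__ == "__main__":
    config = build_scraper_config(
        "docs",
        ["https://docs.example.com/", "https://help.example.org/"],  # hypothetical sites
        {
            "lvl0": "main h1",  # hypothetical selectors
            "lvl1": "main h2",
            "lvl2": "main h3",
            "text": "main p, main li",
        },
    )
    print(json.dumps(config, indent=2))
```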

Fuzzy Search in the Search API

The Azure Search API offers a fuzzy parameter for suggestions, like this:
https://blssuggestions.search.windows.net/indexes/cities/docs/suggest?api-version=2015-02-28&suggesterName=default&fuzzy=true&search=berlen
This returns "Berlin" as a result for "berlen".
I can't find any documentation on how to activate this in a normal search.
Setting fuzzy=true there does not seem to change anything:
https://blssuggestions.search.windows.net/indexes/cities/docs?api-version=2015-02-28&search=berlen&fuzzy=true
[Update]: Please see the other responses about using queryType=full, as this response is no longer correct.
This is correct. Fuzzy search is currently only available in the suggestions API.
You need to call:
https://blssuggestions.search.windows.net/indexes/cities/docs/suggest?api-version=2015-02-28&suggesterName=default&queryType=full&search=berlen~
You were missing queryType=full and the tilde after the term that you want to run a fuzzy search on.
This is now in the preview version of the API:
https://{yourSite}.search.windows.net/indexes/{yourIndex}/docs?search={fieldToSearch}:{lookupValue}~&queryType=Full&api-version=2015-02-28-preview
Note the ~ and queryType=Full, both of which are required to force fuzzy matching.
Documentation is here:
https://msdn.microsoft.com/library/azure/mt589323.aspx
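For reference, the same request can be assembled programmatically. A minimal Python sketch: the service name, index name and the `name` field are example values, `distance` maps to Lucene's `term~N` syntax, and the `api-key` header that the service requires is not shown:

```python
from typing import Optional
from urllib.parse import urlencode


def fuzzy_search_url(service: str, index: str, field: str, value: str,
                     api_version: str = "2015-02-28-preview",
                     distance: Optional[int] = None) -> str:
    """Build a search URL that asks for fuzzy matching on one field."""
    # Lucene syntax: a trailing "~" marks a fuzzy term; "~N" caps the edit distance.
    term = f"{field}:{value}~" + ("" if distance is None else str(distance))
    params = {
        "search": term,
        "queryType": "full",  # full Lucene syntax, so "~" is honored
        "api-version": api_version,
    }
    return f"https://{service}.search.windows.net/indexes/{index}/docs?" + urlencode(params)


if __name__ == "__main__":
    print(fuzzy_search_url("blssuggestions", "cities", "name", "berlen"))
    print(fuzzy_search_url("blssuggestions", "cities", "name", "berlen", distance=1))
```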
CAVEAT: The fuzzy search is very fuzzy! For example, dog will match any three-letter word that shares just a single letter with it: dim, now, bag.
I am trying to figure out how to tune and tweak it, but as it is still in preview, the documentation is sparse.
UPDATE: I just reread the documentation, and it has since been updated with details of an optional distance parameter. I will investigate.
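The "very fuzzy" behavior with dog follows from edit distance: changing two letters of a three-letter word leaves only one matching letter, and a fuzzy term without an explicit number allows up to two edits (the default distance is 2). The Python sketch below computes plain Levenshtein distance as an approximation of the service's matching. Under that assumption, writing `dog~1` instead of `dog~` should limit matches to words one edit away.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: insertions, deletions and substitutions each cost 1."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        cur = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = cur
    return prev[-1]


if __name__ == "__main__":
    for word in ["dim", "now", "bag", "cat"]:
        print("dog", word, edit_distance("dog", word))
```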

How to use lucene query syntax on Orchard CMS

I would like to use the full Lucene query syntax on an Orchard CMS based Website.
Currently, after enabling indexing and search in Orchard, I can search the website on the fields I selected on the Orchard search administration page,
but I cannot search on one particular field only (without changing the behavior of the entire search),
and I cannot use fuzzy search.
From the logs, I can see that Orchard takes care of that part (giving Lucene a valid query), but I would like to do it myself.
For example, when searching for "wel" on the website, Orchard sends this query to Lucene: title:wel* body:wel* (if I have the title and body fields activated for search).
I have seen some blogs about coding custom search features, but I would like to be sure I'm not missing something before switching to developer mode :)
There are so many scenarios that can be done with search that there is no way to provide such coverage out of the box, which is why the API is very simple to use if you need custom searching capabilities.
You should copy the controller from the search module and use the Parse() method of ISearchBuilder with the escape parameter set to false. This will parse a raw Lucene query. You can also use WithField("body", "value") to do a simpler field search.
I don't believe anyone has released modules that provide additional search functionality, because if you need it, it is so simple to develop ^_^ So yes, you will have to go into dev mode to do custom field search.
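With escaping turned off, the string passed to Parse() is read as Lucene query syntax, so most of the work is building that string. A Python sketch of the string building only (not Orchard or C# code). The helper names are hypothetical. The escaped characters are the ones the classic Lucene QueryParser treats as special, plus `/`, which newer parsers also reserve.

```python
from typing import Iterable

# Characters with special meaning in classic Lucene query syntax.
_SPECIAL = set('\\+-!():^[]"{}~*?|&/')


def escape_term(term: str) -> str:
    """Backslash-escape query-syntax characters in a user-supplied term."""
    return "".join("\\" + ch if ch in _SPECIAL else ch for ch in term)


def prefix_query(fields: Iterable[str], term: str) -> str:
    """Reproduce the default behavior from the question: prefix match on every field."""
    t = escape_term(term)
    return " ".join(f"{field}:{t}*" for field in fields)


def field_query(field: str, term: str, fuzzy: bool = False) -> str:
    """Query one field only, optionally as a fuzzy term (trailing "~")."""
    return f"{field}:{escape_term(term)}" + ("~" if fuzzy else "")


if __name__ == "__main__":
    print(prefix_query(["title", "body"], "wel"))
    print(field_query("body", "welcome", fuzzy=True))
```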

Can Drupal's search module search for a substring? (Partial Search)

Drupal's core search module only searches for whole keywords, e.g. "sandwich". Can I make it search for a substring, e.g. "sandw", and return my sandwich results?
Maybe there is a plugin that does that?
The most direct module for it is probably Fuzzy Search. I have not tried it. If you have more advanced search needs on a small to medium-sized site, Search Lucene API is a fine solution. For a larger site, or truly advanced needs, Solr is the premier solution.
Recently I made a patch for Drupal's core search module that gives it partial search (aka n-gram search) capability. It is tested against the Drupal 6.15 and 6.16 releases. You might want to read about patching.
On the other hand, you can use Apache Solr Search Integration, Search Lucene API or other third-party search solutions, which take more time to implement.
PorterStemmer module has its own different story in which you might be interested, too.
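The idea behind n-gram (partial) search is to index every short character sequence of the text, so a fragment like sandw can be looked up through its own n-grams. A minimal in-memory Python sketch, assuming trigrams; it illustrates the general technique, not the code of the Drupal patch:

```python
from collections import defaultdict
from typing import Dict, Set

N = 3  # trigram size, an arbitrary choice for this sketch


def ngrams(text: str, n: int = N) -> Set[str]:
    text = text.lower()
    return {text[i:i + n] for i in range(len(text) - n + 1)}


class NgramIndex:
    def __init__(self) -> None:
        self.postings: Dict[str, Set[int]] = defaultdict(set)
        self.docs: Dict[int, str] = {}

    def add(self, doc_id: int, text: str) -> None:
        self.docs[doc_id] = text
        for gram in ngrams(text):
            self.postings[gram].add(doc_id)

    def search(self, fragment: str) -> Set[int]:
        grams = ngrams(fragment)
        if not grams:
            # Fragment shorter than N: fall back to scanning every document.
            candidates = set(self.docs)
        else:
            # Only documents containing every n-gram of the fragment can match.
            candidates = set.intersection(*(self.postings.get(g, set()) for g in grams))
        # Verify, since sharing all n-grams does not guarantee a substring match.
        frag = fragment.lower()
        return {d for d in candidates if frag in self.docs[d].lower()}


if __name__ == "__main__":
    idx = NgramIndex()
    idx.add(1, "Club sandwich recipe")
    idx.add(2, "Sand castle tips")
    idx.add(3, "Sandwich board")
    print(sorted(idx.search("sandw")))
```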
Yes. Fuzzy Search (module) does it. https://drupal.org/project/fuzzysearch
Drupal Finder does it in a way: it has an autosuggest feature, so if you start typing "sand", it should suggest a node containing "sandwich".
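Autosuggest of this kind is often a prefix lookup over a sorted list of titles. A Python sketch, assuming suggestions come from node titles and that matching is a case-insensitive prefix match. This says nothing about how Drupal Finder actually implements it.

```python
import bisect
from typing import Iterable, List


class Suggester:
    def __init__(self, titles: Iterable[str]) -> None:
        # Sort once by lowercase key so prefix matches sit next to each other.
        self.entries = sorted((t.lower(), t) for t in titles)
        self.keys = [key for key, _ in self.entries]

    def suggest(self, typed: str, limit: int = 5) -> List[str]:
        """Return up to `limit` titles starting with what the user typed so far."""
        prefix = typed.lower()
        i = bisect.bisect_left(self.keys, prefix)  # first key >= prefix
        out: List[str] = []
        while i < len(self.keys) and self.keys[i].startswith(prefix) and len(out) < limit:
            out.append(self.entries[i][1])
            i += 1
        return out


if __name__ == "__main__":
    s = Suggester(["Sandwich", "Salad", "Sand castle", "Soup"])
    print(s.suggest("sand"))
```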

Grails (On App Engine) - Basic Search Functionality

What I need is search scaffolding, but in its absence I was wondering if you could point me toward any really simple examples of adding search to a domain class.
I can't use the Searchable plugin as it conflicts with the App Engine plugin (unless someone has got this to work?). I just need to be able to filter the scaffolded list so that it contains only the results that match the query. I don't need a plain text-box solution; I imagine it looking exactly like the 'create' form, except that when you submit it you get a list of matching objects.
I hope this makes sense, thanks in advance!
Gav
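The search form described in the question is essentially query by example: every filled-in field becomes a constraint and blank fields are ignored. A language-neutral sketch in Python (not Grails or GORM code). The `Customer` class is hypothetical, and so is the rule that text fields match by case-insensitive substring while other fields must match exactly.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class Customer:  # hypothetical domain class
    name: str
    city: str
    age: int


def matches(obj: Any, criteria: Dict[str, Any]) -> bool:
    """True if obj satisfies every non-blank criterion."""
    for field, wanted in criteria.items():
        if wanted is None or wanted == "":
            continue  # blank form field: no constraint
        value = getattr(obj, field)
        if isinstance(wanted, str):
            # Text fields: case-insensitive substring match.
            if wanted.lower() not in str(value).lower():
                return False
        elif value != wanted:
            # Other fields (numbers, dates, ...): exact match.
            return False
    return True


def filter_by_example(items: List[Any], criteria: Dict[str, Any]) -> List[Any]:
    """Keep the items that match the submitted 'create'-style form."""
    return [item for item in items if matches(item, criteria)]


if __name__ == "__main__":
    people = [Customer("Ann Lee", "Leeds", 34), Customer("Bob Stone", "York", 41),
              Customer("Ann Marsh", "York", 29)]
    print(filter_by_example(people, {"name": "ann", "city": "york", "age": None}))
```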
Google App Engine - Full Text Search
