Lucene: How to index ngrams and search in them - search

I guess the only way to search for partial words and phrases in Lucene is by using NGramTokenizer. If there are other ways, please suggest them. I am a C# programmer, new to Lucene and Java programming. I am using the Eclipse IDE and Lucene 5.3.0. I managed to create a Lucene index of a text file and search it with StandardAnalyzer and EnglishAnalyzer, but I am running into errors when indexing n-gram tokens. I followed a few examples online, and none of them work. I need a simple, straightforward example of how to create an index and search it for partial words and phrases using n-grams in Lucene 5.3.0. I hope the experts here can help me out with this.
Thanks in advance.
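To make the idea behind n-gram indexing concrete, here is a minimal plain-Java sketch. It is not the Lucene API: the names NGramSketch, ngrams, buildIndex and search are made up for illustration, and the toy inverted index only stands in for what an n-gram analyzer plus an index would do. It assumes a query is matched as one n-gram term, so the partial word must have a length between minGram and maxGram.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Plain-Java illustration of character n-gram indexing; NOT the Lucene API.
class NGramSketch {

    // Returns every substring of text whose length is between minGram and maxGram.
    static List<String> ngrams(String text, int minGram, int maxGram) {
        List<String> grams = new ArrayList<>();
        for (int start = 0; start < text.length(); start++) {
            for (int len = minGram; len <= maxGram && start + len <= text.length(); len++) {
                grams.add(text.substring(start, start + len));
            }
        }
        return grams;
    }

    // Maps each lowercased n-gram to the ids of the documents that contain it.
    static Map<String, Set<Integer>> buildIndex(List<String> docs, int minGram, int maxGram) {
        Map<String, Set<Integer>> index = new HashMap<>();
        for (int id = 0; id < docs.size(); id++) {
            for (String gram : ngrams(docs.get(id).toLowerCase(), minGram, maxGram)) {
                index.computeIfAbsent(gram, g -> new TreeSet<>()).add(id);
            }
        }
        return index;
    }

    // Looks up a partial word as a single n-gram term.
    static Set<Integer> search(Map<String, Set<Integer>> index, String partial) {
        return index.getOrDefault(partial.toLowerCase(), new TreeSet<>());
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList("Lucene in Action", "Learning Java", "Eclipse IDE");
        Map<String, Set<Integer>> index = buildIndex(docs, 2, 4);
        System.out.println(ngrams("java", 2, 3)); // prints [ja, jav, av, ava, va]
        System.out.println(search(index, "cen")); // prints [0]
        System.out.println(search(index, "lips")); // prints [2]
    }
}
```

The same gram sizes have to be used at index time and at query time; a partial word longer than maxGram would need to be split into n-grams itself before lookup.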

Related

Is it possible to use RegEx to find a string of its superset in mongodb and node.js?

I am using node.js and Mongoose, and I am wondering how to find a word by its superset word with RegEx.
For example, is it possible to find the string example by its superset string examplee, or foo by the string foox?
I tried to google it and read the MongoDB documentation but haven't found a solution. Thanks in advance!
This is a search use case.
You could find the subset from the superset with a phrase search that uses a slop parameter, or you can use fuzzy search.
But these will bring back not only the subset but also some other terms.
For example, when you search for foods you may expect foo to be returned but not fee. It may return fee, but with the lowest ranking. You can filter it out on the front end.
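The front-end filtering step can be sketched as follows. This is a hypothetical post-filter written in plain Java, not a MongoDB or Mongoose API; the class SupersetFilter and the method containedIn are invented for illustration, and the hit list is hard-coded in place of real fuzzy search results.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical post-filter for fuzzy search hits; not a MongoDB or Mongoose API.
class SupersetFilter {

    // Keeps only the candidates that occur as a substring of the query (the "superset" word).
    static List<String> containedIn(String query, List<String> candidates) {
        List<String> kept = new ArrayList<>();
        for (String candidate : candidates) {
            if (!candidate.isEmpty() && query.contains(candidate)) {
                kept.add(candidate);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Pretend these came back from a fuzzy search for "foods".
        List<String> hits = Arrays.asList("foo", "fee", "food");
        System.out.println(containedIn("foods", hits)); // prints [foo, food]
    }
}
```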

Azure search, search by partial terms

Here are two examples of searches in the portal. I would expect to get some results in the second search, even with one letter missing.
The search is in Hebrew.
The full term returns some results.
The same term with one letter missing returns no results.
There are a few ways you can search for partial terms in Azure Search. You'll need to decide which of the following methods will work best in your scenario. Based on the example, it seems either fuzzy search or prefix search will do the job. You can learn about the differences between these methods in the documentation.
Fuzzy search: blog, documentation
Wildcard search, specifically prefix search: documentation
Regular expression search: documentation
Index partial terms by defining a custom analyzer: blog, documentation
Let me know if you have any questions about any of the above.
Check this answer: I solved this by using a regex and changing the GET request to a POST request.

How to combine fuzzy search and field boosting

I'm developing a Lucene search for my Zend 1.12 site. I would like to combine fuzzy search and field boosting. I tried syntax like
title:"query"^10~0.8 OR description:"query"~0.8
It does not seem to change the results. I've also tried to find hints on the Internet, but nobody seems to have had a similar problem. The query is built for a particular setting, so field boosting cannot be set in advance.
The question is: does Lucene support such a combination of modifiers? Is this syntax correct?

How to use RegexQuery with Lucene.net?

I am doing a simple search engine project in C#, using Lucene.Net. I am done with indexing and simple searching. My project has one module that extracts all email IDs on a particular page. For this I need to use some regular expression logic. I searched and searched on the net but couldn't find any example of how to search the index using RegexQuery or any other regular expression tool. Please help!
I am using Lucene.Net version 2.9.2.
I just started researching the same thing and have come to the conclusion that regex searches weren't supported in 2.9.2.
On the brighter side, it looks like the newest version does do regex searches.
http://incubator.apache.org/lucene.net/
To quote the site on version 2.9.4: "A couple of new features: Search.Regex, Simple Faceted Search, and simple phrase analysis in the Fast Vector Highlighter"
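Separately from whether the index supports regex queries, the email extraction itself can be run over the raw page text. Below is a minimal sketch in Java using java.util.regex (the question uses C#, where the Regex class in System.Text.RegularExpressions plays the same role). The class EmailExtractor and its pattern are assumptions made for illustration; the pattern is deliberately simple and does not cover every valid address.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative helper; not part of Lucene or Lucene.Net.
class EmailExtractor {

    // Deliberately simple pattern: local part, "@", domain labels, and a final label of 2+ letters.
    private static final Pattern EMAIL =
            Pattern.compile("[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,}");

    // Returns every substring of pageText that matches the pattern, in order of appearance.
    static List<String> extract(String pageText) {
        List<String> found = new ArrayList<>();
        Matcher matcher = EMAIL.matcher(pageText);
        while (matcher.find()) {
            found.add(matcher.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String page = "Contact alice@example.com or bob.smith@mail.example.org for details";
        System.out.println(extract(page)); // prints [alice@example.com, bob.smith@mail.example.org]
    }
}
```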

Multilingual Search in Alfresco

I want to implement multilingual search in Alfresco.
I know that Alfresco has a multilingual function that lets you upload different language versions of a document.
But I don't know how to relate them to each other. That is, when I search for "cat", I want to get the Japanese version of the document with "ねこ" (meaning 'cat') in it.
But I can only get the English version as a search result.
Could anyone tell me how to get all the related documents (all the translated versions of this document) as search results? Thanks.
Look at the Alfresco Wiki. If you have your locale set to Japanese, then you should only get the Japanese version.
I think you'll need to refer to this Alfresco documentation: http://docs.alfresco.com/4.0/tasks/tuh-multilingual.html
