Search in Single-Token-Field using Lucene.NET

Search in Single-Token-Field using Lucene.NET - search

I´m using Lucene.NET 3.0.3 for indexing the content of word-, excel-, etc. documents and some custom fields for each document.
If I index a field named "title" as Field.Index.NOT_ANALYZED the Lucene-Index stored the field in correct form. The hole title is stored in a single token. That´s what I want.
e.g. title of document is "Lorem ipsum dolor"
field in Lucene-index: "Lorem ipsum dolor"
If I search using exact search in this field I get no results.
My searchterm looks like: title:"Lorem ipsum dolor"
For searching i´m use the same StandardAnalzer.
Why I can´t find the document?

StandardAnalyzer is sensitive to whitespace, among other delimiters. That is, it tokenizes the search term into three tokens:
( Lorem, ipsum, dolor )
But you indexed field title using Field.Index.NOT_ANALYZED so none of the three tokens above can match the single token in this field:
( Lorem ipsum dolor )
Use KeywordAnalyzer, which tokenizes the entire field value as a single token. As always, you need to use the same analyzer for both indexing and searching.

Related

How to disable tokenization for Azure Search Autocomplete?

I've created Azure Search Suggester for "full_name" index field in order to support autocomplete functionality. Now when I use Azure autocomplete REST endpoint by using "search" parameter as a let's say "Lor" I only get back the result "Lorem" not the "Lorem Ipsum". Is there any way to disable tokenization for suggester and to get back full name like "Lorem Ipsum" for the search term "Lor" for autocomplete?

The Autocomplete API is meant to suggest search terms based on incomplete terms one is typing into to the search box (type-ahead). It supports three modes:
oneTerm – Only one term is suggested. If the query has two terms, only
the last term is completed. For example:
"washington medic" -> "medicaid", "medicare", "medicine"
twoTerms – Matching two-term phrases in the index will be suggested,
for example:
"medic" -> "medicare coverage", "medical assistant"
oneTermWithContext – Completes the last term in a query with two or
more terms, where the last two terms are a phrase that exists in the
index, for example:
"washington medic" -> "washington medicaid", "washington medical"
The twoTerms mode might work for you. If you're looking for an API that suggests documents based on an incomplete query term, try the Suggestions API. It returns the entire contents of a field that has a Suggester enabled for all documents that matched the query.

Hybris: Use same field for search and facet

I have to use a field "manufacturerName" for both solr search and solr facet in Hybris. While the solr free text search requires the field type to be text, the facet only works properly in string type.
Is there any way to use this same field for both search and facet. I think there is one way by using "copyField" but I searched a lot, and still don't know how to use it?
Any help would be highly appreciated!
PS: On keeping the field type string, free text search doesn't fetch proper results. On keeping the field type text, facet shows truncated values.

Using a copyField instruction is the way to go, but that require you to define an alternative field - meaning you have one field with the type text and the associated tokenization, and one field of the type string which isn't processed in any way. There is no way in Solr to combine these in a single field that I know of.
You'll then use the name of the string field to generate the facets, while you use the other field when you're querying.
<copyField source="text_search_field" dest="string_facet_field" />
You'll then have to refer to the name string_facet_field when you're filtering or faceting on the field. You'll want to filter against the facet field after the user selects a facet, since you otherwise would end up with documents from other facets possibly leaking into your document result set (for example if the facet was "Foo Bar", you'd suddenly get documents that had "Baz Foo Bar Spam" as the facet, since both words are present in the search string.

I was not able to implement the "copyField" approach, but I found another easy way to do this. In solr.impex, I had already added my new field manufacturerNameFacet of type string, but there is a parameter "fieldValueProvider" and "valueProviderParameter". I provided these values as "springELValueProvider" and the field I wanted to use for search and facet "manufacturerName". After a solr full indexing, it worked like a charm. No other setting was required. The search and facet both were working as expected.

Finding tweets with a word contaning "_" (underscore) with twitter API

I am using Twitter API for searching tweets, by querying the /search/tweets endpoint.
There is a problem with my search I guess
All those words that contains "_" (underscore) in them, results in weird tweets.
For example when i query using "happy hour" Or "AAPL" the search results are perfect, But the problem occurs when search words like "CL_F" Or "FN_F"
I have googled this problem but didn't get any proper solution.
Any Help would be greatly appreciated.
Thanks in advance

When you search using the Twitter REST API endpoint search/tweets, Twitter tokenizes the string you pass in the q parameter. An underscore happens to be one of characters used as a word separator. So, searching for "CL_F" will always be treated as a search for tweets containing both "CL" and "F". You will have to filter your search results for the pattern "CL_F".

Solr search using contains, sound like

Problem:
I have a movie information in solr. Two string fields define the movie title and director name. A copy field define another field which solr search for default.
I would like to have google like search with limited scope as follows. How to achieve it.
1)How to search solr for contains
E.g.
a) If the movie director name is "John Cream", searching for joh won't return anything. However, searchign for John return the correct result.
b) If there is a movie title called aaabbb and another one called aaa, searching for aaa returns only one result. I need to return the both results.
2) How to account for misspelling
E.g.
If the movie director name is "John Cream", searching for Jon returns no results. Is there a good sounds like (soundex) implementation for solr. If so how to enable it?
You can use solr query syntax

Searching for contains is obviously possible using wildcards (eg: title:*aaa* will match 'aaabbb' and also 'cccaaabbb'), but be careful about it, becouse it doesn't use indexes efficently. Do you really need this?
A soundex like search is possible applying solr.PhoneticFilterFactory filter to both your index and query. To achieve this define your fieldType like this in schema:
<fieldType name="text_soundex" class="solr.TextField">
...
<filter class="solr.PhoneticFilterFactory" encoder="Soundex" inject="true"/>
</fieldType>
If you define your "director" field as "text_soundex" you'll be able to search for "Jon" and find "John"
See http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters for more information.

Things you are asking, the first one is definitely achievable from Solr. I don't know about soundex.
1)How to search solr for contains
You can store data into string type of field or text type of field. In string field by wild card searching you can achieve the result (E.g field1:"John*"). Also you should look into different types of analyzers. But before everything, please look into the Solr reference http://wiki.apache.org/solr/.

def self.get_search_deals(search_q, per = 50)
data = Sunspot.search(Deal) do
fulltext '*'+search_q +'*', fields: :title
paginate page: page_no, per_page: per
end
data.results
end
searchable do
text :title
end
just pass string as "*sam*"

Is there any way to search through CouchDB documents for substring

CouchDB gives an opportunity to search values from startkey, for exact key-value pair etc
But is there any way to search for substring in specified field?
The problem is like this. Our news database consists of about 40,000 news documents. Say, they have title, content and url fields. We want to find news documents which have "restaurant" in their title. Is there any way to do it?
View Collation wiki page tells nothing :( And it seems strange to me that there's no tool to handle this problem and all I can to do is just parsing JSON results with Python, PHP or smth else. In MySQL it's simply LOCATE() function..

Use couchdb-lucene.

Be careful here. Lucene is not always the best answer.
If your only searching one limited field and only searching for a word like restaurant then lucene which is really meant to tokenize large texts/documents can be way overkill, you can get the same effect by splitting the title.
function(doc){
var stringarray = doc.title.split(" ");
for(var idx in stringarray)
emit(stringarray[idx],doc);
}
Also Lucene and Couchdb do not support substring search, where the string is not in the beginning of a word.

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string

Search in Single-Token-Field using Lucene.NET - search

Related

How to disable tokenization for Azure Search Autocomplete?

Hybris: Use same field for search and facet

Finding tweets with a word contaning "_" (underscore) with twitter API

Solr search using contains, sound like

Is there any way to search through CouchDB documents for substring

Categories

Resources