I have been tasked with using Lucene to search our product table. I have created an index and am searching with a QueryParser over multiple fields, but the results are not what I need.
I have a product stored as LM10. I want to find it when the search term is LM 10, and it must also match when the search term is Fred LM10 or Fred LM 10.
Any ideas how I can do this in Lucene?
Thanks in advance
Use a Tokenizer that splits tokens on word/number changes, and apply it at both index and query time. You might use solr.WordDelimiterFilterFactory and avoid having to write a custom one.
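To see why running the same analysis at index and query time solves the LM10 / LM 10 problem, here is a minimal pure-Python sketch (not Lucene code). It imitates a lower-casing step followed by a word-delimiter step: each word is split on letter/digit changes, and the parts are also emitted joined together, roughly what WordDelimiterFilterFactory does with generateWordParts, generateNumberParts and catenateAll enabled. The function names are made up for illustration, and the shared-token count only stands in for Lucene's real scoring.

```python
import re


def word_delimiter_tokens(text):
    """Lower-case, split on whitespace, then split each word on letter/digit
    changes and also emit the parts joined back together.

    A simplified imitation of a lower-case filter plus a word-delimiter filter
    with word parts, number parts and catenate-all enabled (ASCII only).
    """
    tokens = []
    for word in text.lower().split():
        parts = re.findall(r"[a-z]+|[0-9]+", word)  # "lm10" -> ["lm", "10"]
        tokens.extend(parts)
        if len(parts) > 1:
            tokens.append("".join(parts))  # catenated form: "lm10"
    return tokens


def shared_token_count(indexed_text, query_text):
    """Number of distinct tokens the indexed value and the query have in common.

    The same analysis runs on both sides; that is the key point, because index
    time and query time must agree on how "LM10" and "LM 10" are tokenized.
    """
    return len(set(word_delimiter_tokens(indexed_text))
               & set(word_delimiter_tokens(query_text)))


if __name__ == "__main__":
    for query in ["LM 10", "Fred LM10", "Fred LM 10"]:
        print(query, "->", shared_token_count("LM10", query))
```

With LM10 in the index, LM 10, Fred LM10 and Fred LM 10 all share tokens with it, so a default OR query would find the product in each case. In Solr the real equivalent is configured in the field type's analyzer chain rather than written by hand.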
Can somebody explain in detail the difference between the SEARCH and FILTER keywords?
I have already gone through the SEARCH vs FILTER section of https://www.arangodb.com/learn/search/tutorial/.
Does anybody have any other experience with the difference?
Thanks,
Nilotpal
FILTER corresponds to the WHERE clause in SQL. It does what the name says: it uses all sorts of arithmetic and AQL operators to filter the result. It can make use of regular indexes. Filtered results are not ranked. Filters operate on single-collection result sets.
SEARCH offers a full-fledged search engine, very much like what you get from regular search engines such as Google's page ranking, based on a grammar you can formulate yourself, and it can operate on the contents of multiple collections. Its most natural use is full-text search and ranking, where it is a much more powerful version of the full-text index. But it can do much more: normalisation, language-based tokenisation ...
The list goes on and on. Please refer to the ArangoSearch documentation here:
https://www.arangodb.com/docs/stable/arangosearch.html
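As a rough illustration of the distinction (a toy in plain Python, not how ArangoDB implements either keyword): a FILTER-style step only evaluates a true/false condition per document, while a SEARCH-style step analyses the text, scores every document against the query and ranks by that score. The scorer here is a basic TF-IDF formula chosen for brevity; ArangoSearch ships its own scorers, such as BM25() and TFIDF().

```python
import math
from collections import Counter


def filter_docs(docs, predicate):
    """FILTER-like: keep documents for which a boolean condition holds.
    Every match is equally 'true'; there is no score to order by."""
    return sorted(doc_id for doc_id, text in docs.items() if predicate(text))


def search_docs(docs, query):
    """SEARCH-like: analyse the text, score each document against the query
    with a basic TF-IDF formula, and return document ids best-first."""
    analysed = {doc_id: text.lower().split() for doc_id, text in docs.items()}
    n = len(analysed)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in analysed.values() for term in set(tokens))
    scores = {}
    for doc_id, tokens in analysed.items():
        tf = Counter(tokens)
        score = sum(tf[t] * math.log(1 + n / df[t])
                    for t in query.lower().split() if t in df)
        if score > 0:
            scores[doc_id] = score
    return sorted(scores, key=lambda doc_id: -scores[doc_id])


docs = {
    "a": "The quick brown fox",
    "b": "The lazy brown dog",
    "c": "Quick quick fox jumps",
}
print(filter_docs(docs, lambda text: "brown" in text))  # ['a', 'b'] (sorted by id only)
print(search_docs(docs, "quick fox"))                   # ['c', 'a'] (ranked by score)
```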
Our search service uses Azure Cognitive Search in the following way:
Non-fuzzy search (i.e. a full match of the query string).
Fuzzy search (i.e. 1-2 letters of the query string are allowed to differ).
Merge the results according to a certain rule.
This way, full-match results always end up at the top.
But now we want to introduce pagination, and doing that with two separate queries is difficult and inefficient.
An alternative would be a single query that combines fuzzy and non-fuzzy search but with different scoring profiles: one with higher weights for the full-match part and another with lower weights for the fuzzy part.
Something like:
search=rabbit&scoringProfile=highWeightsProfile | search=rabbit~&scoringProfile=lowWeightsProfile
Is there any way to do this, either in the API or in the SDK?
Are there any alternative solutions for fuzzy search that still give higher scores to full matches?
Boosting individual subqueries with the full Lucene query syntax worked well for me. It is maybe not as flexible as separate scoring profiles for the fuzzy and non-fuzzy parts, but it is still good.
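A minimal sketch of that approach, assuming the REST API's POST .../docs/search endpoint and a single-word search term without Lucene special characters: the exact term and its fuzzy variant go into one full Lucene query, with a boost on the exact clause (the value 10 is an arbitrary assumption). Because it is a single query, skip and top page through the combined ranking directly. The helper name and its defaults are hypothetical.

```python
import json


def build_search_body(term, exact_boost=10, max_edits=1, page=0, page_size=20):
    """Body for a POST to an Azure Cognitive Search index's docs/search endpoint.

    One query, two subqueries: the exact term weighted by `exact_boost` and a
    fuzzy variant (Lucene `~` with an edit distance of `max_edits`).
    Assumes `term` is a single word without Lucene special characters.
    """
    return {
        "search": f"{term}^{exact_boost} OR {term}~{max_edits}",
        "queryType": "full",   # full Lucene syntax is needed for ^ and ~
        "skip": page * page_size,
        "top": page_size,
        "count": True,         # total hit count, handy for page navigation
    }


print(json.dumps(build_search_body("rabbit", page=2), indent=2))
```

Documents that contain rabbit exactly satisfy both clauses and get the boost, so they should usually rank above fuzzy-only matches; it is still worth checking the ordering on real data and tuning the boost.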
I am building a database in Neo4j and am trying to build a match query using the full-text search. The search has to be quite robust, because it will take queries from users who are not familiar with the search terms and must return the node that best matches. I am aware of a few ways of doing this, but all of them require that the search term is fuzzied, not the returned term. My current rules rely on contains / does-not-contain checks and loops. Without building a new database, is there a way to fuzzy the search term so that, essentially, the nodes search through the term provided and not the other way round?
I realise this may not make sense; it is only my third day with Neo4j. Please let me know if you need any more clarification.
Edit: I figure I can combine the does/does-not-contain search terms with a fuzzied search term, increasing the score for terms that are contained and decreasing it for terms that are not.
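One way to fuzz the search term, sketched in Python under the assumption that the nodes are already covered by a fulltext index: Neo4j's db.index.fulltext.queryNodes procedure accepts a Lucene query string, so the user's words can be turned into fuzzy clauses (word~1) before querying. Words that should raise the score get an extra boosted clause. As far as I know, Lucene does not accept negative boosts, so this sketch excludes does-not-contain words with '-' instead of lowering their score. The index name productNames and the helper names are hypothetical.

```python
import re

# Characters that Lucene's classic query parser treats as syntax.
_LUCENE_SPECIAL = re.compile(r'([\\+\-!():^\[\]"{}~*?|&/])')


def escape_term(term):
    """Backslash-escape Lucene special characters in a single user word."""
    return _LUCENE_SPECIAL.sub(r"\\\1", term)


def build_fuzzy_query(user_text, boosted=(), excluded=(), max_edits=1, boost=2.0):
    """Turn raw user input into a Lucene query string for a fulltext index.

    - every word the user typed becomes a fuzzy clause (word~1),
    - `boosted` words are added again with a boost, raising nodes that contain them,
    - `excluded` words are prefixed with '-', removing nodes that contain them.
    """
    clauses = [f"{escape_term(w.lower())}~{max_edits}" for w in user_text.split()]
    clauses += [f"{escape_term(w.lower())}~{max_edits}^{boost:g}" for w in boosted]
    clauses += [f"-{escape_term(w.lower())}" for w in excluded]
    return " ".join(clauses)


# Hypothetical index name "productNames"; pass the query string as parameter $q.
CYPHER = (
    'CALL db.index.fulltext.queryNodes("productNames", $q) YIELD node, score '
    "RETURN node, score ORDER BY score DESC LIMIT 5"
)

print(build_fuzzy_query("Potatoe sald", boosted=["mash"], excluded=["sweet"]))
# potatoe~1 sald~1 mash~1^2 -sweet
```

With the official Python driver, the string would be passed as a query parameter, e.g. session.run(CYPHER, q=query), so user input never ends up spliced into the Cypher text itself.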
Let's say I have my list of ingredients:
{'potato','rice','carrot','corn'}
and I want to return lists from a database that are most similar to mine:
{'beans','potato','oranges','lettuce'},
{'carrot','rice','corn','apple'}
{'onion','garlic','radish','eggs'}
My query would return this first:
{'carrot','rice','corn','apple'}
I've used Solr, and have looked at CloudSearch, ElasticSearch, Algolia, Searchify and Swiftype. These engines only seem to let me put in one query string and then filter by other facets.
In a real scenario my search list will be about 200 items long and will be matching against about a million lists in my database.
What technology should I use to accomplish what I want to do?
Should I look away from search indexers and more towards database-style tools such as MongoDB, MapReduce or Hadoop? I only know the names of these technologies; I just need someone to point me in the right direction on which technology path I should be exploring.
With so much data I can't really loop through it; I need to query everything at once.
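To pin down what "most similar" means in the example, one plausible reading (an assumption) is the number of items a stored list shares with the search list. The brute-force version below loops over every list, so it won't scale to a million lists, but it can serve as a reference for checking whatever engine is chosen against a small sample.

```python
def rank_by_overlap(mine, candidates):
    """Brute-force reference: order candidate lists by the number of items they
    share with `mine`, most shared first (ties keep their original order)."""
    wanted = set(mine)
    return sorted(candidates, key=lambda items: -len(wanted & set(items)))


mine = ["potato", "rice", "carrot", "corn"]
candidates = [
    ["beans", "potato", "oranges", "lettuce"],
    ["carrot", "rice", "corn", "apple"],
    ["onion", "garlic", "radish", "eggs"],
]
print(rank_by_overlap(mine, candidates)[0])  # ['carrot', 'rice', 'corn', 'apple']
```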
I wonder what keeps you from trying it with Solr, as Solr provides much of what you need. You can declare the field as type="string" multiValued="true" and save each list item as a value. Then, when querying, you specify each item of the list you are looking for as a search term for that field, and Solr will, by default, rank the closest matches first.
If you need exact control over what counts as a match (e.g. at least 40% of the terms from the search list have to be in a matching list), you can use the mm (minimum should match) parameter of the eDisMax query parser, cf. the Solr Wiki.
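A sketch of the request parameters for this approach, assuming a multi-valued string field named ingredients (hypothetical) and the standard /select handler: each list item becomes one quoted clause in q, and mm sets how many of those clauses must match. The 40% default mirrors the example above.

```python
from urllib.parse import urlencode


def build_solr_params(items, field="ingredients", min_match="40%", rows=10):
    """eDisMax parameters asking for documents whose multi-valued string
    field shares at least `min_match` of the listed items."""
    # Quote each item so multi-word values ("green beans") stay one clause
    # against a string field; embedded double quotes are escaped.
    q = " ".join('"{}"'.format(item.replace('"', '\\"')) for item in items)
    return {
        "defType": "edismax",
        "q": q,
        "qf": field,       # hypothetical field name
        "mm": min_match,   # minimum should match: share of optional clauses required
        "rows": rows,
        "fl": "id,score",
    }


params = build_solr_params(["potato", "rice", "carrot", "corn"])
print("/select?" + urlencode(params))
```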
Having said that, I must add that I’ve never searched for 200 query terms (do I understand correctly that the list whose contents should be searched will contain about 200 items?) and do not know how well that performs. But I guess that setting up a test core and filling it with random lists using a script should not take more than a few hours, so it should be possible to evaluate the performance of this approach without investing too much time.
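For such a test, a small Python script along these lines could generate reproducible random lists and group them into JSON arrays for Solr's /update handler. The field name ingredients, the synthetic vocabulary and the list-length range are assumptions to adjust to the real data.

```python
import json
import random
from itertools import islice


def random_lists(n_docs, vocabulary, min_len=5, max_len=200, seed=0):
    """Yield Solr documents, each with a random list of distinct items."""
    rng = random.Random(seed)  # seeded so a test run can be reproduced
    for doc_id in range(n_docs):
        size = rng.randint(min_len, max_len)
        yield {"id": str(doc_id), "ingredients": rng.sample(vocabulary, size)}


def json_batches(docs, batch_size=1000):
    """Group documents into JSON arrays, one array per update request."""
    it = iter(docs)
    while batch := list(islice(it, batch_size)):
        yield json.dumps(batch)


vocabulary = [f"item{i}" for i in range(5000)]  # synthetic item names
batches = list(json_batches(random_lists(3000, vocabulary)))
print(len(batches))  # 3
```

Each payload would then be POSTed to /solr/<core>/update (core name hypothetical) with Content-Type: application/json, committing once after the last batch rather than after every request.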
I have multiple indices populated in my Elasticsearch engine, and one text search box that is supposed to query all of them for possible hits. I am planning to query these indices with fuzzy matching and autocomplete. Any suggestions on what the implementation should look like?
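Use either the GET /_all/_search endpoint, or create an alias that groups all the indices you want and use GET /[alias_name]/_search.
As to which field to search, I think the _all field could be a good match, depending on how your mappings are configured (whether _all is disabled or not). Note that _all was deprecated in Elasticsearch 6.0 and removed in 7.0; on newer versions, use a custom catch-all field populated with copy_to, or list the fields to search explicitly.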
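A sketch of both request bodies, built as Python dicts so they can be sent with any HTTP client: one for POST /_aliases that puts the indices under a single alias, and one for GET /[alias_name]/_search using a multi_match query of type bool_prefix, which applies fuzziness to the completed words and treats the last word as a prefix for autocomplete. The index, alias and field names are hypothetical, and bool_prefix needs Elasticsearch 7.2 or later.

```python
import json


def alias_actions(indices, alias):
    """Body for POST /_aliases: put every index under one alias."""
    return {"actions": [{"add": {"index": index, "alias": alias}} for index in indices]}


def search_as_you_type(text, fields, size=10):
    """Body for GET /<alias>/_search: fuzzy matching on complete words plus
    prefix matching on the last word, which the user may still be typing."""
    return {
        "size": size,
        "query": {
            "multi_match": {
                "query": text,
                "type": "bool_prefix",  # last term becomes a prefix query
                "fields": list(fields),
                "fuzziness": "AUTO",    # edit distance chosen from term length
            }
        },
    }


print(json.dumps(alias_actions(["products", "articles"], "site-search")))
print(json.dumps(search_as_you_type("wireles keyb", ["title", "description"])))
```

Listing the fields explicitly keeps the query independent of whether a catch-all field exists in the mappings.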