ElasticSearch mixing query types - node.js

I'm having a profoundly hard time getting this Elasticsearch query to cooperate. I'm currently trying to use a bool query to get results for both exact and analyzed/fulltext searches.
It seems I cannot, for whatever reason, use match queries inside a bool query:
{
  "bool": {
    "should": [
      "match": { "mainInfo.states": "Wisconsin" },
      "match": { "mainInfo.cities": "WI" }
    ]
  }
}
This throws a parser error, telling me that match is not a query type.
I need the ability to have fulltext searches inside of a bool query that also has term filters. So essentially I'd like to have a query somewhat like this:
{
  "bool": {
    "must": [
      "term": { "mainInfo.clientId": "123456" }
    ],
    "should": [
      "match": { "mainInfo.states": "Wisconsin" },
      "match": { "mainInfo.cities": "WI" }
    ]
  }
}
Here I can have one or two terms that must exist, and several fulltext searches of which at least one should match.
I'm also using a function_score query to return random, page-able results using a random_seed.
My problem is that I cannot get fulltext queries to run inside my bool query; it seems that only term filters work. Also, when I put the match queries outside the bool query, either the match query works or the bool query works, but never both.
I cannot seem to form a query that will match several term queries and several match queries, where the terms must exist, whereas the matches should exist (so at least one of those filters matches).
I'm by no means an elasticsearch expert, and the docs seem pretty vague when you're looking for information on more complicated queries.
If anyone could lend an example as how to query in such a manner, it would be greatly appreciated. However, please don't just link me to the docs for Bool Queries, or any other ElasticSearch docs. I assure you I've read them thoroughly, and I've tried a number of different ways to execute this query, but none seem to have the expected behavior.
I'm using ElasticSearch 1.3.2

It seems I cannot, for whatever reason, use match queries inside a bool query:
The syntax of your query is not correct: when you have multiple clauses, each clause must be wrapped in its own object. Also, custom_filters_score was removed in Elasticsearch 1.0, so on 1.3.2 you should use function_score instead:
{
  "query": {
    "function_score": {
      "query": {
        "nested": {
          "path": "mainInfo",
          "query": {
            "bool": {
              "should": [
                { "match": { "mainInfo.states": "Wisconsin" } },
                { "match": { "mainInfo.cities": "WI" } }
              ]
            }
          }
        }
      },
      "functions": [
        {
          "filter": { "term": { "mainInfo.clientId": "123456" } },
          "script_score": { "script": "yourscriptgoeshere" }
        }
      ]
    }
  }
}
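If you're building this from Node.js, here is a minimal sketch (plain objects, illustrative helper name) of the bool query with every clause wrapped in its own object, which is the shape Elasticsearch expects:

```javascript
// Each clause in a bool must/should array has to be its own object,
// e.g. [{ "match": ... }, { "match": ... }] -- the bare "match": entries
// in the original query are what trigger the parse error.
function buildBoolQuery(clientId, states, cities) {
  return {
    bool: {
      must: [
        { term: { 'mainInfo.clientId': clientId } }
      ],
      should: [
        { match: { 'mainInfo.states': states } },
        { match: { 'mainInfo.cities': cities } }
      ],
      // require at least one of the should clauses to match
      minimum_should_match: 1
    }
  };
}

const query = buildBoolQuery('123456', 'Wisconsin', 'WI');
console.log(JSON.stringify(query, null, 2));
```

Note that without minimum_should_match, a bool query that already has a must clause treats should clauses as optional score boosters only.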

Related

Usage of TSVECTOR and to_tsquery to filter records in Sequelize

I've been trying to get full-text search to work for a while now without any success. The current documentation has this example:
[Op.match]: Sequelize.fn('to_tsquery', 'fat & rat') // match text search for strings 'fat' and 'rat' (PG only)
So I've built the following query:
Title.findAll({
  where: {
    keywords: {
      [Op.match]: Sequelize.fn('to_tsquery', 'test')
    }
  }
})
And keywords is defined as a TSVECTOR field.
keywords: {
type: DataTypes.TSVECTOR,
},
It seems to generate the query properly, but I'm not getting the expected results. This is the query being generated by Sequelize:
Executing (default): SELECT "id" FROM "Tests" AS "Test" WHERE "Test"."keywords" @@ to_tsquery('test');
And I know that there are multiple records in the database that have 'test' in their vector, such as the following one:
{
"id": 3,
"keywords": "'keyword' 'this' 'test' 'is' 'a'",
}
so I'm unsure as to what's going on. What would be the proper way to search for matches based on a TSVECTOR field?
It's funny, but these days I'm working on the same thing and running into the same problem.
I think part of the solution is here (How to implement PostgresQL tsvector for full-text search using Sequelize?), but I haven't been able to get it to work yet.
If you find examples, I'm interested. Otherwise, as soon as I find a solution that works 100% I will update this answer.
One thing I've also noticed: when I add data (seeds) from Sequelize, it doesn't add the lexeme positions after the data of the field in question. Do you see the same behavior?
Last thing: did you create the index?
CREATE INDEX tsv_idx ON data USING gin(column);
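As a rough illustration of what the @@ operator checks, here is a toy model in JavaScript (not how Postgres actually executes it; real to_tsquery also stems and normalizes terms, which this sketch ignores):

```javascript
// Toy model of tsvector matching: pull the quoted lexemes out of a
// tsvector string and test a single-term tsquery against them.
// Real Postgres also stems terms ('tests' -> 'test'); this ignores that.
function parseTsvector(tsv) {
  const lexemes = new Set();
  for (const m of tsv.matchAll(/'([^']+)'/g)) {
    lexemes.add(m[1]);
  }
  return lexemes;
}

function matchesTsquery(tsv, term) {
  return parseTsvector(tsv).has(term.toLowerCase());
}

console.log(matchesTsquery("'keyword' 'this' 'test' 'is' 'a'", 'test')); // true
```

So a record like the one above should match a to_tsquery('test') search; if it doesn't, the problem is likely in how the vector was stored (e.g. seeds not passing through to_tsvector).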

How to fuzzy query against multiple fields in elasticsearch?

Here's my query as it stands:
"query": {
  "fuzzy": {
    "author": {
      "value": query,
      "fuzziness": 2
    },
    "career_title": {
      "value": query,
      "fuzziness": 2
    }
  }
}
This is part of a callback in Node.js. Query (which is being plugged in as a value to compare against) is set earlier in the function.
What I need it to be able to do is to check both the author and the career_title of a document, fuzzily, and return any documents that match in either field. The above statement never returns anything, and whenever I try to access the object it should create, it says it's undefined. I understand that I could write two queries, one to check each field, then sort the results by score, but I feel like searching every object for one field twice will be slower than searching every object for two fields once.
https://www.elastic.co/guide/en/elasticsearch/guide/current/fuzzy-match-query.html
As you can see here, you can specify fuzziness in a multi_match query:
{
  "query": {
    "multi_match": {
      "fields": [ "text", "title" ],
      "query": "SURPRIZE ME!",
      "fuzziness": "AUTO"
    }
  }
}
Something like this. Hope this helps.
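Applied to the fields from your question (author and career_title), building the request body in Node.js would look something like this sketch:

```javascript
// multi_match runs one query string against several fields and,
// unlike the single-field fuzzy query, accepts a fuzziness option,
// so both fields are checked in a single search.
function buildFuzzyMultiMatch(queryString) {
  return {
    query: {
      multi_match: {
        fields: ['author', 'career_title'],
        query: queryString,
        fuzziness: 2
      }
    }
  };
}

const body = buildFuzzyMultiMatch('some search text');
console.log(JSON.stringify(body, null, 2));
```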

Elastic search synonym match involving numeric characters

I have documents indexed in an Elastic cluster with the below mapping. Basically, I have a field named model which holds car model names like "Silverado 2500HD", "Silverado 1500HD", "LX 350", etc.
POST /location-test-no-boost
{
  "settings": {
    "analysis": {
      "analyzer": {
        "mysynonym": {
          "tokenizer": "standard",
          "filter": [
            "standard", "lowercase", "stop", "mysynonym"
          ],
          "ignore_case": true
        }
      },
      "filter": {
        "mysynonym": {
          "type": "synonym",
          "synonyms": [
            "2500 HD=>2500HD",
            "chevy silverado=>Silverado"
          ]
        }
      }
    }
  },
  "mappings": {
    "vehicles": {
      "properties": {
        "id": {
          "type": "long",
          "ignore_malformed": true
        },
        "model": {
          "type": "string",
          "index_analyzer": "standard",
          "search_analyzer": "mysynonym"
        }
      }
    }
  }
}
The sample document content is
POST /location-test-no-boost/vehicles/10
{
  "model": "Silverado 2500HD"
}
When I tried to search with the query string "Chevy silverado", the synonym matches perfectly to Silverado and gives back the result. On the contrary, when I tried to search with the query string "2500 HD", it returned 0 results. I tried different combinations of synonyms involving numbers and found that the Elasticsearch synonym mapper does not seem to support numbers; is this correct?
Is there any way I can add a mapping so that when a user searches for "2500 HD", the query is mapped to "2500HD"?
OK, here's your problem:
You defined a filter that tries to merge "2500 HD" into "2500HD" for searching.
But the analyzer works like this:
1. The char_filter runs first (if any).
2. The tokenizer runs next; it is standard in your definition, hence "2500 HD" is split into the two terms 2500, HD.
3. The token filters run after that, transforming the terms into 2500, hd. Your synonym rules are ignored because none of them match the individual tokens passed to the filter.
So when you query for "2500 HD", you actually search for 2500 and hd. And no documents match, since the indexed term is 2500hd.
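A toy sketch of that order of operations (approximating the standard tokenizer with a whitespace split) shows why the synonym rule never sees "2500 HD" as one string:

```javascript
// Approximate the analyzer: the tokenizer runs first, then token
// filters. The synonym filter therefore sees single terms like
// "2500" and "hd" -- never the original two-word string "2500 HD" --
// so the rule "2500 HD=>2500HD" can never fire.
function analyze(text) {
  return text
    .split(/\s+/)               // tokenizer: split on whitespace
    .filter(t => t.length > 0)  // drop empty tokens
    .map(t => t.toLowerCase()); // filter: lowercase
}

console.log(analyze('2500 HD'));          // [ '2500', 'hd' ]
console.log(analyze('Silverado 2500HD')); // [ 'silverado', '2500hd' ]
```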
I suggest replacing your synonym filter with a word_delimiter filter, something like this:
"filter": {
  "my_delimiter": {
    "type": "word_delimiter",
    "preserve_original": true
  }
}
It will transform your document's 2500HD into 2500hd, 2500, hd, and hence it will match the query "2500 HD", which is transformed into 2500, hd. Please refer to the documentation link to find out more options.
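Roughly speaking, word_delimiter with preserve_original splits a token on letter/digit boundaries while also keeping the original; a toy JavaScript version of that behavior:

```javascript
// Toy word_delimiter: split a token on letter<->digit boundaries and,
// like preserve_original, also keep the unsplit token. The output is
// lowercased as a subsequent lowercase filter would do.
function wordDelimiter(token) {
  const parts = token.match(/[a-zA-Z]+|[0-9]+/g) || [];
  const out = [token];
  if (parts.length > 1) {
    out.push(...parts);
  }
  return out.map(t => t.toLowerCase());
}

console.log(wordDelimiter('2500HD')); // [ '2500hd', '2500', 'hd' ]
```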
You don't need to define a synonym filter like that. If you actually do want transformations like your current definitions, define another tokenizer instead of using the standard tokenizer.
P.S.: You can install the inquisitor plugin to see how terms are analyzed: https://github.com/polyfractal/elasticsearch-inquisitor

elasticsearch prefix query for multiple words to solve the autocomplete use case

How do I get Elasticsearch to solve a simple autocomplete use case that involves multiple words?
Let's say I have a document with the following title - Elastic search is a great search tool built on top of lucene.
So if I use the prefix query and construct it in the form -
{
  "prefix": { "title": "Elas" }
}
It will return that document in the result set.
However if I do a prefix search for
{
  "prefix": { "title": "Elastic sea" }
}
I get no results.
What sort of query do I need to construct so as to present to the user that result for a simple autocomplete use case.
A prefix query made on Elastic sea would match a term like Elastic search in the index, but that doesn't appear in your index if you tokenize on whitespaces. What you have is elastic and search as two different tokens. Have a look at the analyze api to find out how you are actually indexing your text.
Using a bool query like the one in your answer, you wouldn't take into account the position of the terms. For example, you would get the following document as a result:
Elastic model is a framework to store your Moose object and search
through them.
For auto-complete purposes you might want to make a phrase query and use the last term as a prefix. That's available out of the box using the match_phrase_prefix type in a match query, which was made available exactly for your use case:
{
  "match": {
    "message": {
      "query": "elastic sea",
      "type": "phrase_prefix"
    }
  }
}
With this query your example document would match but mine wouldn't since elastic is not close to search there.
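For autocomplete you may also want to cap how many terms the final prefix can expand to via max_expansions; a sketch of building the request body (field name taken from the example, max_expansions value illustrative):

```javascript
// match with type phrase_prefix: terms must appear in order, and the
// last term is treated as a prefix. max_expansions caps how many
// terms that trailing prefix may expand to (keeps autocomplete fast).
function autocompleteQuery(field, text) {
  return {
    match: {
      [field]: {
        query: text,
        type: 'phrase_prefix',
        max_expansions: 10
      }
    }
  };
}

console.log(JSON.stringify(autocompleteQuery('title', 'Elastic sea'), null, 2));
```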
To achieve that result, you will need to use a bool query. The partial word needs to go in a prefix query and the complete word or phrase in a match clause. There are other tweaks available for the query, like must, should, etc., that can be applied as needed.
{
  "query": {
    "bool": {
      "must": [
        {
          "prefix": {
            "name": "sea"
          }
        },
        {
          "match": {
            "name": "elastic"
          }
        }
      ]
    }
  }
}

Fuzziness settings in ElasticSearch

Need a way for my search engine to handle small typos in search strings and still return the right results.
According to the ElasticSearch docs, there are three values that are relevant to fuzzy matching in text queries: fuzziness, max_expansions, and prefix_length.
Unfortunately, there is not a lot of detail available on exactly what these parameters do, and what sane values for them are. I do know that fuzziness is supposed to be a float between 0 and 1.0, and the other two are integers.
Can anyone recommend reasonable "starting point" values for these parameters? I'm sure I will have to tune by trial and error, but I'm just looking for ballpark values to correctly handle typos and misspellings.
I found it helpful when using fuzzy search to actually use both a plain match query and a fuzzy match query (with the same term), in order to retrieve results for typos while also ensuring that instances of the exact search word entered appear highest in the results.
I.E.
{
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "_all": search_term
          }
        },
        {
          "match": {
            "_all": {
              "query": search_term,
              "fuzziness": "1",
              "prefix_length": 2
            }
          }
        }
      ]
    }
  }
}
A few more details are listed here: https://medium.com/@wampum/fuzzy-queries-ae47b66b325c
According to the Fuzzy Query doc, default values are 0.5 for min_similarity (which looks like your fuzziness option), "unbounded" for max_expansions and 0 for prefix_length.
This answer should help you understand the min_similarity option. 0.5 seems to be a good start.
prefix_length and max_expansions will affect performance: you can try and develop with the default values, but be sure it will not scale (lucene developers were even considering setting a default value of 2 for prefix_length). I would recommend to run benchmarks to find the right values for your specific case.
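To make "fuzziness" concrete: for text queries it is essentially a Levenshtein edit distance, so a fuzziness of 1 covers a single insertion, deletion, or substitution, while prefix_length requires that many leading characters to match exactly. A plain-JavaScript sketch of that distance:

```javascript
// Plain Levenshtein edit distance: the number of single-character
// insertions, deletions, or substitutions needed to turn a into b.
// fuzziness: 1 matches terms within distance 1; prefix_length: 2
// additionally requires the first 2 characters to match exactly.
function editDistance(a, b) {
  const d = Array.from({ length: a.length + 1 }, (_, i) =>
    Array.from({ length: b.length + 1 }, (_, j) => (i === 0 ? j : j === 0 ? i : 0))
  );
  for (let i = 1; i <= a.length; i++) {
    for (let j = 1; j <= b.length; j++) {
      d[i][j] = Math.min(
        d[i - 1][j] + 1,                                  // deletion
        d[i][j - 1] + 1,                                  // insertion
        d[i - 1][j - 1] + (a[i - 1] === b[j - 1] ? 0 : 1) // substitution
      );
    }
  }
  return d[a.length][b.length];
}

console.log(editDistance('surprize', 'surprise')); // 1
```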
