elasticsearch wildcard search with and operator - node.js

I am developing an application that uses Elasticsearch, and in some cases I want to search by term and locales. I am testing this on localhost
http://localhost:9200/index/type/_search
and parameters
"query" : {
  "wildcard" : { "term" : "myterm*" }
},
"filter" : {
  "and" : [
    {
      "term" : { "lang" : "en" }
    },
    {
      "term" : { "translations.lang" : "tr" } // this is a subdocument search
    }
  ]
}
Here is an example document:
{
  "_index": "twitter",
  "_type": "tweet",
  "_id": "5084151d2c6e5d5b11000008",
  "_score": null,
  "_source": {
    "lang": "en",
    "term": "photograph",
    "translations": [
      {
        "_id": "5084151d2c6e5d5b11000009",
        "lang": "tr",
        "translation": "fotoğraf",
        "score": "0",
        "createDate": "2012-10-21T15:30:37.994Z",
        "author": "anonymous"
      },
      {
        "_id": "50850346532b865c2000000a",
        "lang": "tr",
        "translation": "resim",
        "score": "0",
        "createDate": "2012-10-22T08:26:46.670Z",
        "author": "anonymous"
      }
    ],
    "author": "anonymous",
    "createDate": "2012-10-21T15:30:37.994Z"
  }
}
I am trying to get terms with a wildcard (for autocomplete), with input language "en" and output language "tr". The query returns terms that contain "myterm", but the and filter is not applied. Any suggestion would be appreciated.
Thanks in advance.

I would guess that the translations element has the nested type. If this is the case, you should use a nested query:
curl -XPOST "http://localhost:9200/twitter/tweet/_search" -d '{
query: {
wildcard: {
"term": "term*"
}
},
filter: {
and: [{
term: {
"lang": "en"
}
}, {
"nested": {
"path": "translations",
"query": {
"term" : { "translations.lang" : "tr" }
}
}
}]
}
}'
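For completeness, a minimal mapping sketch (not from the original answer) in which translations is declared as nested; the index and type names follow the example document, and the exact field list is an assumption:
# mapping sketch: translations as a nested field
curl -XPUT "http://localhost:9200/twitter/tweet/_mapping" -d '{
  "tweet": {
    "properties": {
      "lang": { "type": "string" },
      "term": { "type": "string" },
      "translations": {
        "type": "nested",
        "properties": {
          "lang": { "type": "string" },
          "translation": { "type": "string" }
        }
      }
    }
  }
}'
Without the nested type, a term filter on translations.lang would still match, but you could not constrain lang/translation pairs to the same translation object; note that changing an existing object field to nested requires reindexing.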

I have managed to solve my problem with the following query:
"query" : {
  "wildcard" : { "term" : "myterm*" }
},
"filter" : {
  "and" : [
    {
      "term" : { "lang" : "en" }
    },
    {
      "term" : { "translations.lang" : "tr" } // this is a subdocument search
    }
  ]
},
"sort" : {
  "term" : "desc"
}
An important point here: you need to map your sorting field as not_analyzed, since you cannot reliably sort on an analyzed field.
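For illustration only (not part of the original answer), one way to keep the analyzed term field for the wildcard search while still having a sortable copy is a not_analyzed sub-field on the same twitter/tweet example; the "raw" name is made up, and the "fields" sub-field syntax assumes Elasticsearch 1.x or later (older versions use the multi_field type):
# mapping sketch: analyzed "term" for search, not_analyzed "term.raw" for sorting
curl -XPUT "http://localhost:9200/twitter/tweet/_mapping" -d '{
  "tweet": {
    "properties": {
      "term": {
        "type": "string",
        "fields": {
          "raw": { "type": "string", "index": "not_analyzed" }
        }
      }
    }
  }
}'
The wildcard query can keep targeting term, while the sort clause uses term.raw.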

Related

How to define a default value when creating an index in Elasticsearch

I need to create an index in Elasticsearch and assign a default value to a field. For example, in Python 3:
request_body = {
    "settings": {
        "number_of_shards": 1,
        "number_of_replicas": 1
    },
    "mappings": {
        "properties": {
            "name": {
                "type": "keyword"
            },
            "school": {
                "type": "keyword"
            },
            "pass": {
                "type": "keyword"
            }
        }
    }
}

from elasticsearch import Elasticsearch

es = Elasticsearch(['https://....'])
es.indices.create(index="test-index", ignore=400, body=request_body)
In the above scenario, the index will be created with those fields, but I need to give "pass" a default value of True. Can I do that here?
Elasticsearch is schema-less: it allows any number of fields and any content in those fields, without logical constraints.
In a distributed system, integrity checking can be expensive, so RDBMS-style checks are not available in Elasticsearch.
The best way is to do validation on the client side.
Another approach is to use an ingest pipeline.
Ingest pipelines let you perform common transformations on your data before indexing. For example, you can use pipelines to remove fields, extract values from text, and enrich your data.
**For testing**
POST _ingest/pipeline/_simulate
{
  "pipeline": {
    "processors": [
      {
        "script": {
          "lang": "painless",
          "source": "if (ctx.pass === null) { ctx.pass = 'true' }"
        }
      }
    ]
  },
  "docs": [
    {
      "_index": "index",
      "_type": "type",
      "_id": "2",
      "_source": {
        "name": "a",
        "school": "aa"
      }
    }
  ]
}
PUT _ingest/pipeline/default-value_pipeline
{
  "description": "Set default value",
  "processors": [
    {
      "script": {
        "lang": "painless",
        "source": "if (ctx.pass === null) { ctx.pass = 'true' }"
      }
    }
  ]
}
**Indexing document**
POST my-index-000001/_doc?pipeline=default-value_pipeline
{
  "name": "sss",
  "school": "sss"
}
**Result**
{
  "_index" : "my-index-000001",
  "_type" : "_doc",
  "_id" : "hlQDGXoB5tcHqHDtaEQb",
  "_score" : 1.0,
  "_source" : {
    "school" : "sss",
    "pass" : "true",
    "name" : "sss"
  }
}
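As an aside (not from the original answer), the same default could be applied with the built-in set processor instead of a script, using "override": false so that documents which already carry a pass value are left untouched; the pipeline name below is just an example:
PUT _ingest/pipeline/default-value-set-pipeline
{
  "description": "Set default value for pass",
  "processors": [
    {
      "set": {
        "field": "pass",
        "value": "true",
        "override": false
      }
    }
  ]
}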

Where can I find the complete reference document for CouchDb Design Docs syntax?

Please don't tell me to "google it"!
I have been poring over the Apache pages and the IBM pages for days, trying to find the full allowed syntax for a design doc.
From the above readings:
the 'map' property is always a JavaScript function
the 'options' property may be one or both of local_seq or include_design.
When I use Fauxton to edit a Mango Query, however, I see that the reality is much broader.
I defined a query ...
{
  "selector": {
    "data.type": {
      "$eq": "invoice"
    },
    "data.idib": {
      "$gt": 0,
      "$lt": 99999
    }
  },
  "sort": [
    {
      "data.type": "desc"
    },
    {
      "data.idib": "desc"
    }
  ]
}
... with an accompanying index ...
{
  "index": {
    "fields": [
      "foo"
    ]
  },
  "name": "foo-json-index",
  "type": "json"
}
... and then looked at the design doc produced ...
{
  "_id": "_design/5b1cf1be5a6b7013019ba4afac2b712fc06ea82f",
  "_rev": "1-1e6c5b7bc622d9b3c9b5f14cb0fcb672",
  "language": "query",
  "views": {
    "invoice_code": {
      "map": {
        "fields": {
          "data.type": "desc",
          "data.idib": "desc"
        },
        "partial_filter_selector": {}
      },
      "reduce": "_count",
      "options": {
        "def": {
          "fields": [
            {
              "data.type": "desc"
            },
            {
              "data.idib": "desc"
            }
          ]
        }
      }
    }
  }
}
Both of the published syntax rules are broken!
map is not a function
options defines the fields of the index
Where can I find a full description of all the allowed properties of a Design Document?

Elasticsearch term suggester return stemmed results

Why are the Elasticsearch term suggester results stemmed?
When I do this query:
curl -XPOST 'localhost:9200/posts/_suggest' -d '{
  "my-suggestion" : {
    "text" : "manger",
    "term" : {
      "field" : "body"
    }
  }
}'
The expected result should be "manager", but I get back "manag":
{
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "my-suggest-1": [
    {
      "text": "mang",
      "offset": 0,
      "length": 6,
      "options": [
        {
          "text": "manag",
          "score": 0.75,
          "freq": 180
        },
        {
          "text": "mani",
          "score": 0.75,
          "freq": 6
        }
      ]
    }
  ]
}
EDIT:
I found a solution for my problem: I added a standard analyzer to my query.
curl -XPOST 'localhost:9200/posts/_suggest' -d '{
  "my-suggestion" : {
    "text" : "manger",
    "term" : {
      "analyzer" : "standard",
      "field" : "body"
    }
  }
}'
Now the results are good:
{
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "my-suggest": [
    {
      "text": "mang",
      "offset": 0,
      "length": 6,
      "options": [
        {
          "text": "manager",
          "score": 0.75,
          "freq": 180
        },
        {
          "text": "manuel",
          "score": 0.75,
          "freq": 6
        }
      ]
    }
  ]
}
But I've run into another similar problem with aggregations:
{
  "aggs" : {
    "cities" : {
      "terms" : { "field" : "location" }
    }
  }
}
The results I get are trimmed:
{
  "took": 4,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 473,
    "max_score": 0.0,
    "hits": []
  },
  "aggregations": {
    "cities": {
      "buckets": [
        { "key": "londr", "doc_count": 244 },
        { "key": "pari", "doc_count": 244 },
        { "key": "tang", "doc_count": 12 },
        { "key": "agad", "doc_count": 8 }
      ]
    }
  }
}
The terms aggregation works on the terms that are produced from the original text via tokenization and stemming. You need to mark the field as not_analyzed in your index mappings to disable tokenization and stemming.
I have never used suggesters, but I think you need to disable stemming for that field while keeping tokenization. You can have two versions of the field in the index: one for search (tokenized and stemmed) and one for suggesters (tokenized, but not stemmed), as sketched below.
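A rough sketch of that two-versions idea, assuming a pre-5.x string mapping and borrowing the posts index and the body/location fields from the question; the custom analyzer name, the type name, and the english stemming analyzer are only placeholders for whatever the index actually uses:
curl -XPUT 'localhost:9200/posts' -d '{
  "settings": {
    "analysis": {
      "analyzer": {
        "tokenized_not_stemmed": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase"]
        }
      }
    }
  },
  "mappings": {
    "post": {
      "properties": {
        "body": {
          "type": "string",
          "analyzer": "english",
          "fields": {
            "suggest": {
              "type": "string",
              "analyzer": "tokenized_not_stemmed"
            }
          }
        },
        "location": {
          "type": "string",
          "analyzer": "english",
          "fields": {
            "raw": {
              "type": "string",
              "index": "not_analyzed"
            }
          }
        }
      }
    }
  }
}'
The term suggester would then point at body.suggest and the terms aggregation at location.raw, while searches keep using the stemmed body and location fields.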

Search query to retrieve nested documents in elasticsearch with _source disabled

I have the following mapping
{
  "cloth": {
    "dynamic": false,
    "_source": { "enabled": false },
    "properties": {
      "name": {
        "type": "string",
        "index": "analyzed"
      },
      "variation": {
        "type": "nested",
        "properties": {
          "size": {
            "type": "string",
            "index": "not_analyzed"
          },
          "color": {
            "type": "string",
            "index": "not_analyzed"
          }
        }
      }
    }
  }
}
I am not able to figure out a way to retrieve the nested object fields using the fields query.
{
  "fields" : ["name", "variation.size", "variation.color"],
  "query" : {
    "nested" : {
      "path" : "variation",
      "query" : {
        "bool" : {
          "must" : [
            { "term" : { "variation.size" : "XXL" } },
            { "term" : { "variation.color" : "red" } }
          ]
        }
      }
    }
  }
}
The above query returns
"_id" : "1",
"_score" : 1.987628,
"fields" : {
"variation.size" : [ "XXL", "XL" ],
"variation.color" : [ "red", "black" ],
"name" : [ "Test shirt" ]
}
When I tried
"fields" : ["name" , "variation"]
I got the error
status: 400
reason: "ElasticsearchIllegalArgumentException[field [variation] isn't a leaf field]"
Which is as expected.
How can I get the variation object as it is?
Expected result: I need to retrieve the variation object as a whole, so that I can preserve the association of size and color, like "red" with "XXL":
"variation" : { "XXL" , "red" }
Update: Source is disabled for this Index Type.
If you use source filtering, it will return the nested objects as a whole; your query would be:
{
  "_source": [
    "name",
    "variation"
  ],
  "query": {
    "nested": {
      "path": "variation",
      "query": {
        "bool": {
          "must": [
            {
              "term": {
                "variation.size": "XXL"
              }
            },
            {
              "term": {
                "variation.color": "red"
              }
            }
          ]
        }
      }
    }
  }
}
You should use this:
"script_fields": {
  "variation": {
    "script": {
      "inline": "doc['variation.size'].value + ' ' + doc['variation.color'].value"
    }
  }
}
I use Elasticsearch v5.1.1.

Elasticsearch wildcard search on not_analyzed field

I have an index with the following settings and mapping:
{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "analyzer_keyword": {
            "tokenizer": "keyword",
            "filter": "lowercase"
          }
        }
      }
    }
  },
  "mappings": {
    "product": {
      "properties": {
        "name": {
          "analyzer": "analyzer_keyword",
          "type": "string",
          "index": "not_analyzed"
        }
      }
    }
  }
}
I am struggling to implement a wildcard search on the name field. My example data looks like this:
[
  {"name": "SVF-123"},
  {"name": "SVF-234"}
]
When I perform the following query:
http://localhost:9200/my_index/product/_search -d '
{
  "query": {
    "filtered" : {
      "query" : {
        "query_string" : {
          "query": "*SVF-1*"
        }
      }
    }
  }
}'
It returns SVF-123 and SVF-234. I think it still tokenizes the data. It must return only SVF-123.
Could you please help on this?
Thanks in advance
There's a couple of things going wrong here.
First, you are saying that you don't want terms analyzed at index time. Then, there's an analyzer configured (which is used at search time) that generates incompatible terms. (They are lowercased.)
By default, all terms end up in the _all field with the standard analyzer. That is where you end up searching. Since it tokenizes on "-", you end up with an OR of "*SVF" and "1*".
Try to do a terms facet on _all and on name to see what's going on.
Here's a runnable Play and gist: https://www.found.no/play/gist/3e5fcb1b4c41cfc20226 (https://gist.github.com/alexbrasetvik/3e5fcb1b4c41cfc20226)
You need to make sure the terms you index are compatible with what you search for. You probably want to disable _all, since it can muddy what's going on.
#!/bin/bash
export ELASTICSEARCH_ENDPOINT="http://localhost:9200"
# Create indexes
curl -XPUT "$ELASTICSEARCH_ENDPOINT/play" -d '{
"settings": {
"analysis": {
"text": [
"SVF-123",
"SVF-234"
],
"analyzer": {
"analyzer_keyword": {
"type": "custom",
"tokenizer": "keyword",
"filter": [
"lowercase"
]
}
}
}
},
"mappings": {
"type": {
"properties": {
"name": {
"type": "string",
"index": "not_analyzed",
"analyzer": "analyzer_keyword"
}
}
}
}
}'
# Index documents
curl -XPOST "$ELASTICSEARCH_ENDPOINT/_bulk?refresh=true" -d '
{"index":{"_index":"play","_type":"type"}}
{"name":"SVF-123"}
{"index":{"_index":"play","_type":"type"}}
{"name":"SVF-234"}
'
# Do searches
# See all the generated terms.
curl -XPOST "$ELASTICSEARCH_ENDPOINT/_search?pretty" -d '
{
"facets": {
"name": {
"terms": {
"field": "name"
}
},
"_all": {
"terms": {
"field": "_all"
}
}
}
}
'
# Analyzed, so no match
curl -XPOST "$ELASTICSEARCH_ENDPOINT/_search?pretty" -d '
{
"query": {
"match": {
"name": {
"query": "SVF-123"
}
}
}
}
'
# Not analyzed according to `analyzer_keyword`, so matches. (Note: term, not match)
curl -XPOST "$ELASTICSEARCH_ENDPOINT/_search?pretty" -d '
{
"query": {
"term": {
"name": {
"value": "SVF-123"
}
}
}
}
'
curl -XPOST "$ELASTICSEARCH_ENDPOINT/_search?pretty" -d '
{
"query": {
"term": {
"_all": {
"value": "svf"
}
}
}
}
'
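As a side note on the "disable _all" suggestion above (a sketch, not part of the original answer), that is done in the mapping when the index is created; play2 is just a placeholder index name:

# Sketch: create an index with the _all field disabled
curl -XPUT "$ELASTICSEARCH_ENDPOINT/play2" -d '{
  "mappings": {
    "type": {
      "_all": { "enabled": false },
      "properties": {
        "name": {
          "type": "string",
          "index": "not_analyzed"
        }
      }
    }
  }
}'

With _all disabled, queries that relied on the implicit _all field (such as a bare query_string) then need an explicit field to target.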
My solution adventure
I started with the setup you can see in my question. Whenever I changed one part of my settings, one part started to work but another part stopped working. Let me give my solution history:
1.) I indexed my data with the defaults. This means my data was analyzed by default, which caused problems on my side. For example:
When the user starts to search for a keyword like SVF-1, the system runs this query:
{
  "query": {
    "filtered" : {
      "query" : {
        "query_string" : {
          "analyze_wildcard": true,
          "query": "*SVF-1*"
        }
      }
    }
  }
}
and the results are:
SVF-123
SVF-234
This is normal, because the name field of my documents is analyzed. The query is split into the tokens SVF and 1, and SVF matches my documents even though 1 does not. I skipped this approach and created a mapping that makes my fields not_analyzed:
{
  "mappings": {
    "product": {
      "properties": {
        "name": {
          "type": "string",
          "index": "not_analyzed"
        },
        "site": {
          "type": "string",
          "index": "not_analyzed"
        }
      }
    }
  }
}
But my problem continued.
2.) After lots of research, I wanted to try another way and decided to use a wildcard query.
My query is:
{
  "query": {
    "wildcard" : {
      "name" : {
        "value" : "*SVF-1*"
      }
    }
  },
  "filter": {
    "term": { "site": "pro_en_GB" }
  }
}
This query worked, but with one problem: my fields are not_analyzed now and I am making a wildcard query, so case sensitivity becomes an issue. If I search for svf-1 it returns nothing, and users may well type the lowercase version of the query.
3.) I changed my document structure to:
{
  "mappings": {
    "product": {
      "properties": {
        "name": {
          "type": "string",
          "index": "not_analyzed"
        },
        "nameLowerCase": {
          "type": "string",
          "index": "not_analyzed"
        },
        "site": {
          "type": "string",
          "index": "not_analyzed"
        }
      }
    }
  }
}
I added one more field for name, called nameLowerCase. When I index a document, I set it like this:
{
  "name": "SVF-123",
  "nameLowerCase": "svf-123",
  "site": "pro_en_GB"
}
Here, I convert the query keyword to lowercase, run the search against the new nameLowerCase field, and display the name field.
The final version of my query is:
{
  "query": {
    "wildcard" : {
      "nameLowerCase" : {
        "value" : "*svf-1*"
      }
    }
  },
  "filter": {
    "term": { "site": "pro_en_GB" }
  }
}
Now it works. There is also another way to solve this problem, by using multi_field; my query contains a dash (-), and I faced some problems along the way. A sketch of the multi_field idea is shown below this answer.
Lots of thanks to @Alex Brasetvik for his detailed explanation and effort.
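Not part of the original answer, but a rough sketch of what the multi_field variant mentioned above could look like, reusing the analyzer_keyword analyzer from the question; the lowercase sub-field name is made up, and the "fields" syntax assumes Elasticsearch 1.x or later (older versions use the multi_field type):
curl -XPUT "http://localhost:9200/my_index" -d '{
  "settings": {
    "analysis": {
      "analyzer": {
        "analyzer_keyword": {
          "tokenizer": "keyword",
          "filter": "lowercase"
        }
      }
    }
  },
  "mappings": {
    "product": {
      "properties": {
        "name": {
          "type": "string",
          "index": "not_analyzed",
          "fields": {
            "lowercase": {
              "type": "string",
              "analyzer": "analyzer_keyword"
            }
          }
        },
        "site": {
          "type": "string",
          "index": "not_analyzed"
        }
      }
    }
  }
}'
The wildcard query would then target name.lowercase with a lowercased pattern such as *svf-1*, while name itself keeps the original casing for display, so no separate nameLowerCase field has to be maintained by the client.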
Adding to Hüseyin's answer, we can use AND as the default operator, so *SVF and 1* will be joined with the AND operator, giving us the correct results.
"query": {
"filtered" : {
"query" : {
"query_string" : {
"default_operator": "AND",
"analyze_wildcard": true,
"query": "*SVF-1*"
}
}
}
}
@Viduranga Wijesooriya, as you stated, "default_operator": "AND" will check for the presence of both SVF and 1, but an exact match alone is still not possible. It will, however, filter the results in a more appropriate way, leaving all combinations of SVF and 1 and sorting the results by relevance, which will promote SVF-1 up the order.
For pulling out the exact result:
"settings": {
"analysis": {
"analyzer": {
"analyzer_keyword": {
"type": "custom",
"tokenizer": "keyword",
"filter": [
"lowercase"
]
}
}
}
},
"mappings": {
"type": {
"properties": {
"name": {
"type": "string",
"analyzer": "analyzer_keyword"
}
}
}
}
and the query is
{
  "query": {
    "bool": {
      "must": [
        {
          "query_string" : {
            "fields": ["name"],
            "query" : "*svf-1*",
            "analyze_wildcard": true
          }
        }
      ]
    }
  }
}
result
{
  "took": 4,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 1,
    "hits": [
      {
        "_index": "play",
        "_type": "type",
        "_id": "AVfXzn3oIKphDu1OoMtF",
        "_score": 1,
        "_source": {
          "name": "SVF-123"
        }
      }
    ]
  }
}
