I'm trying to construct a search query based on an object's property value inside a nested array.
This works fine:
curl -s -XGET 'http://localhost:9200/products/product/_search?q=items.sku:ABC123'
This doesn't work. Why? What am I missing?
{
"_source" : "items",
"query": {
"term" : {
"items.sku" : "ABC123"
}
}
}
The item I want to retrieve looks like this:
{
"name" : "Example product",
"brand" : {
"id" : 123,
"name" : "Nike"
},
"items" : [
{
"id" : 234,
"sku" : "ABC123"
},
{
"id" : 456,
"sku" : "XYZ963"
}
]
}
Your query doesn't work because the term query doesn't behave the same way as the q parameter.
The q parameter is a "shortcut" for a query_string query. The equivalent of your curl command is in fact the following:
{
"_source": "items",
"query": {
"query_string": {
"query": "items.sku:ABC123"
}
}
}
which returns the document as expected:
"hits": {
"total": 1,
"max_score": 0.19178301,
"hits": [
{
"_index": "products",
"_type": "product",
"_id": "AU_VTrjXeQSEeO15aKQi",
"_score": 0.19178301,
"_source": {
"items": [
{
"id": 234,
"sku": "ABC123"
},
{
"id": 456,
"sku": "XYZ963"
}
]
}
}
]
}
The difference is that the term query looks for the exact value without analyzing it (in your example, ABC123), whereas query_string decides whether to analyze the value according to the mapping of the field being queried.
I think that in your case the mapping of your field items.sku is simply string, which means that by default the field is analyzed with the standard analyzer: your value has been indexed as abc123 because it was lowercased (see the standard analyzer documentation).
I advise you to read the analysis section of the Elasticsearch Definitive Guide (see here) to gain a better understanding of this.
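As a quick check, here is a minimal sketch with the Python client (assuming a local node and the index/field names from the question):
from elasticsearch import Elasticsearch

es = Elasticsearch(['http://localhost:9200'])
# The standard analyzer lowercases ABC123 into the single token "abc123"...
print(es.indices.analyze(index="products",
                         body={"analyzer": "standard", "text": "ABC123"}))
# ...so a term query on the lowercased token finds the document:
resp = es.search(index="products",
                 body={"query": {"term": {"items.sku": "abc123"}}})
print(resp["hits"]["total"])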
Related
I need to create an index in Elasticsearch and assign a default value to a field. For example,
in Python 3:
request_body = {
"settings":{
"number_of_shards":1,
"number_of_replicas":1
},
"mappings":{
"properties":{
"name":{
"type":"keyword"
},
"school":{
"type":"keyword"
},
"pass":{
"type":"keyword"
}
}
}
}
from elasticsearch import Elasticsearch
es = Elasticsearch(['https://....'])
es.indices.create(index="test-index", ignore=400, body=request_body)
In the above scenario the index will be created with those fields, but I need to give "pass" a default value of True. Can I do that here?
Elasticsearch is schema-less: it allows any number of fields with any content, without logical constraints.
In a distributed system integrity checks can be expensive, so RDBMS-style constraints and defaults are not available in Elasticsearch.
The best way is to apply defaults and validations on the client side.
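For instance, a minimal client-side sketch reusing the index and client from the question (the sample document values are made up for illustration):
from elasticsearch import Elasticsearch

es = Elasticsearch(['https://....'])
doc = {"name": "john", "school": "abc"}  # hypothetical input document
doc.setdefault("pass", True)             # apply the default before indexing
es.index(index="test-index", body=doc)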
Another approach is to use an ingest pipeline.
Ingest pipelines let you perform common transformations on your data before indexing. For example, you can use pipelines to remove fields, extract values from text, and enrich your data.
**For testing**
POST _ingest/pipeline/_simulate
{
"pipeline": {
"processors": [
{
"script": {
"lang": "painless",
"source": "if (ctx.pass ===null) { ctx.pass='true' }"
}
}
]
},
"docs": [
{
"_index": "index",
"_type": "type",
"_id": "2",
"_source": {
"name": "a",
"school":"aa"
}
}
]
}
PUT _ingest/pipeline/default-value_pipeline
{
"description": "Set default value",
"processors": [
{
"script": {
"lang": "painless",
"source": "if (ctx.pass ===null) { ctx.pass='true' }"
}
}
]
}
**Indexing document**
POST my-index-000001/_doc?pipeline=default-value_pipeline
{
"name":"sss",
"school":"sss"
}
**Result**
{
"_index" : "my-index-000001",
"_type" : "_doc",
"_id" : "hlQDGXoB5tcHqHDtaEQb",
"_score" : 1.0,
"_source" : {
"school" : "sss",
"pass" : "true",
"name" : "sss"
}
}
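From the Python client used in the question, you can attach the pipeline at index time; a sketch (the index name and pipeline id match the example above, the client URL is the placeholder from the question):
from elasticsearch import Elasticsearch

es = Elasticsearch(['https://....'])
es.index(index="my-index-000001",
         body={"name": "sss", "school": "sss"},
         pipeline="default-value_pipeline")  # runs the script processor before indexing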
I have the following collection:
{
"_id" : ObjectId("5b18d14cbc83fd271b6a157c"),
"status" : "pending",
"description" : "You have to complete the challenge...",
}
{
"_id" : ObjectId("5b18d31a27a37696ec8b5773"),
"status" : "completed",
"description" : "completed...",
}
{
"_id" : ObjectId("5b18d31a27a37696ec8b5775"),
"status" : "pending",
"description" : "pending...",
}
{
"_id" : ObjectId("5b18d31a27a37696ec8b5776"),
"status" : "inProgress",
"description" : "inProgress...",
}
I need to group by status and dynamically get all the keys that occur in status:
[
{
"completed": [
{
"_id": "5b18d31a27a37696ec8b5773",
"status": "completed",
"description": "completed..."
}
]
},
{
"pending": [
{
"_id": "5b18d14cbc83fd271b6a157c",
"status": "pending",
"description": "You have to complete the challenge..."
},
{
"_id": "5b18d31a27a37696ec8b5775",
"status": "pending",
"description": "pending..."
}
]
},
{
"inProgress": [
{
"_id": "5b18d31a27a37696ec8b5776",
"status": "inProgress",
"description": "inProgress..."
}
]
}
]
Not that I think it's a good idea, mostly because I don't see any "aggregation" here at all, but you can do it: after "grouping" by the "status" key and adding each document to an array with $push, you similarly $push all that content into a single array of key/value pairs, and then convert that array into the keys of a document using $arrayToObject inside a $replaceRoot:
db.collection.aggregate([
{ "$group": {
"_id": "$status",
"data": { "$push": "$$ROOT" }
}},
{ "$group": {
"_id": null,
"data": {
"$push": {
"k": "$_id",
"v": "$data"
}
}
}},
{ "$replaceRoot": {
"newRoot": { "$arrayToObject": "$data" }
}}
])
Returns:
{
"inProgress" : [
{
"_id" : ObjectId("5b18d31a27a37696ec8b5776"),
"status" : "inProgress",
"description" : "inProgress..."
}
],
"completed" : [
{
"_id" : ObjectId("5b18d31a27a37696ec8b5773"),
"status" : "completed",
"description" : "completed..."
}
],
"pending" : [
{
"_id" : ObjectId("5b18d14cbc83fd271b6a157c"),
"status" : "pending",
"description" : "You have to complete the challenge..."
},
{
"_id" : ObjectId("5b18d31a27a37696ec8b5775"),
"status" : "pending",
"description" : "pending..."
}
]
}
That might be okay IF you actually "aggregated" beforehand, but on any practically sized collection all this does is try to force the whole collection into a single document, which is likely to break the 16MB BSON limit, so I would not recommend even attempting it without "grouping" something else before this step.
Frankly, the following code does the same thing, without aggregation tricks and with no BSON limit problem:
var obj = {};
// Using forEach as a premise for representing "any" cursor iteration form
db.collection.find().forEach(d => {
if (!obj.hasOwnProperty(d.status))
obj[d.status] = [];
obj[d.status].push(d);
})
printjson(obj);
Or a bit shorter:
var obj = {};
// Using forEach as a premise for representing "any" cursor iteration form
db.collection.find().forEach(d =>
obj[d.status] = [
...(obj.hasOwnProperty(d.status)) ? obj[d.status] : [],
d
]
)
printjson(obj);
Aggregations are used for "data reduction", and anything that simply "reshapes results" without actually reducing the data returned from the server is usually better handled in client code anyway. You're still returning all the data no matter what you do, the client-side processing of the cursor has considerably less overhead, and there are NO restrictions.
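The same client-side grouping translates directly to other drivers; a sketch in Python with pymongo (connection and database names are assumptions):
from collections import defaultdict
from pymongo import MongoClient

coll = MongoClient("mongodb://localhost:27017")["test"]["collection"]
grouped = defaultdict(list)
for doc in coll.find():  # any cursor iteration form works here
    grouped[doc["status"]].append(doc)
print(dict(grouped))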
I am trying to get "search by example" functionality out of ElasticSearch.
I have a number of objects which have fields, e.g. name, description, objectID, etc.
I want to perform a search where, for example, "name=123" and "description=ABC"
Mapping:
{
"settings": {
"number_of_replicas": 1,
"number_of_shards": 3,
"refresh_interval": "5s",
"index.mapping.total_fields.limit": "500"
},
"mappings": {
"CFS": {
"_routing": {
"required": true
},
"properties": {
"objectId": {
"store": true,
"type": "keyword",
"index": "not_analyzed"
},
"name": {
"type": "text",
"analyzer": "standard"
},
"numberOfUpdates": {
"type": "long"
},
"dateCreated": {
"type": "date",
"format": "strict_date_optional_time||epoch_millis"
},
"lastModified": {
"type": "date",
"format": "strict_date_optional_time||epoch_millis",
"index": "not_analyzed"
}
}
}
}
}
Trying a very simple search, without a field name, gives the correct result:
Request: GET http://localhost:9200/repository/CFS/_search?routing=CFS&q=CFS3
Returns:
{
"took": 1,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"failed": 0
},
"hits": {
"total": 1,
"max_score": 0.7831944,
"hits": [
{
"_index": "repository",
"_type": "CFS",
"_id": "589a9a62-1e4d-4545-baf9-9cc7bf4d582a",
"_score": 0.7831944,
"_routing": "CFS",
"_source": {
"doc": {
"name": "CFS3",
"description": "CFS3Desc",
"objectId": "589a9a62-1e4d-4545-baf9-9cc7bf4d582a",
"lastModified": 1480524291530,
"dateCreated": 1480524291530
}
}
}
]
}
}
But prefixing with a field name fails (and this happens on all fields, e.g. objectId):
Request: GET http://localhost:9200/repository/CFS/_search?routing=CFS&q=name:CFS3
Returns:
{
"took": 6,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"failed": 0
},
"hits": {
"total": 0,
"max_score": null,
"hits": []
}
}
Eventually I want to do something like:
{
"bool" : {
"must" : [
{
"wildcard" : {
"name" : {
"wildcard" : "*CFS3*",
"boost" : 1.0
}
}
},
{
"wildcard" : {
"description" : {
"wildcard" : "*CFS3Desc*",
"boost" : 1.0
}
}
}
]
}
}
Maybe related? When I try to use a "multi_match" to do this, I have to prefix my field name with a wildcard, e.g.
POST http://localhost:9200/repository/CFS/_search?routing=CFS
{
"query": {
"multi_match" : {
"query" : "CFS3",
"fields" : ["*name"]
}
}
}
If I don't prefix it, nothing is found. I've spent two days searching Stack Overflow and the Elasticsearch documentation, but these issues don't seem to be mentioned.
There's plenty about wildcards in search terms, and even mention of wildcards AFTER the field name, but nothing about wildcards BEFORE the field name.
What piece of information am I missing from the field name that I have to paper over by specifying a wildcard?
I think the types of my fields in the mapping are correct, and I'm specifying an analyzer.
I found out the answer to this :(
I had been keen to use "upserts", to avoid having to check whether the object already existed and thereby keep performance high.
As shown at this link https://www.elastic.co/guide/en/elasticsearch/guide/current/partial-updates.html and this one https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-update.html, when calling the update REST endpoint you specify your payload as:
{
"doc" : {
"tags" : [ "testing" ],
"views": 0
}
}
When implementing the equivalent using the Java client, I didn't follow the examples exactly. Instead of what was suggested:
UpdateRequest updateRequest = new UpdateRequest();
updateRequest.index("index");
updateRequest.type("type");
updateRequest.id("1");
updateRequest.doc(jsonBuilder()
.startObject()
.field("gender", "male")
.endObject());
client.update(updateRequest).get();
I had implemented:
JsonObject state = extrapolateStateFromEvent( event );
JsonObject doc = new JsonObject();
doc.add( "doc", state );
UpdateRequest updateRequest = new UpdateRequest( indexName, event.getEntity().getType(), event.getEntity().getObjectId() );
updateRequest.routing( event.getEntity().getType() );
updateRequest.doc( doc.toString() );
updateRequest.upsert( doc.toString() );
UpdateResponse response = client.update( updateRequest ).get();
I had wrapped my payload/"state" in a "doc" object, thinking it was needed. But this had a large impact on how I interacted with my data, and at no point was I warned about it.
I guess I had accidentally created a nested object: every field was actually indexed under paths like doc.name, which is why q=name:CFS3 found nothing while the un-prefixed q=CFS3 still matched, and why the multi_match field *name worked (the wildcard matched doc.name).
How could this be improved? Maybe the mapping could default to disallowing nested objects? Or could there be some kind of validation a programmer could perform?
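In hindsight, a minimal sketch of the same update from the Python client (index, type, id and routing reused from the example above) makes the distinction visible:
from elasticsearch import Elasticsearch

es = Elasticsearch(['http://localhost:9200'])
state = {"name": "CFS3", "description": "CFS3Desc"}  # the extrapolated state
# The Python client's update() takes the raw REST body, so the "doc"
# wrapper IS required here; the Java client's updateRequest.doc(...)
# adds that wrapper itself, so pre-wrapping the state there nests
# everything under an extra "doc" field.
es.update(index="repository", doc_type="CFS",
          id="589a9a62-1e4d-4545-baf9-9cc7bf4d582a",
          routing="CFS",
          body={"doc": state, "doc_as_upsert": True})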
My requirement is this:
If I pass multiple search words as a list, ES should return documents matching a subset of the words, along with the words that matched, so I can tell which document matched which subset.
Suppose I need to search for words such as Football, Cricket, Tennis, Golf, etc.
in three documents.
I am going to store these files in corresponding documents. The mapping for the "mydocuments" index looks like this:
{
"mydocuments" : {
"mappings" : {
"docs" : {
"properties" : {
"file_content" : {
"type" : "string"
}
}
}
}
}
}
First Document
{ _id: 1, file_content: "I love tennis and cricket"}
Second document:
{ _id: 2, file_content: "tennis and football are very popular"}
Third document:
{ _id: 3, file_content: "football and cricket are originated in england"}
I should be able to search a single file or multiple files for Football, Tennis,
Cricket, Golf, and it should return something like this:
"hits":{
"total" : 3,
"hits" : [
{
"_index" : "twitter",
"_type" : "tweet",
"_id" : "1",
"_source" : {
"file_content" : ["football","cricket"],
"postDate" : "2009-11-15T14:12:12",
}
},
{
"_index" : "twitter",
"_type" : "tweet",
"_id" : "2",
"_source" : {
"file_content" : ["football","tennis"],
"postDate" : "2009-11-15T14:12:12",
}
}
]
Or, in case of multiple file searches, an array of the above search results.
Any idea how we can do this using Elasticsearch?
If this really cannot be done using Elasticsearch, I am ready to evaluate other options (native Lucene, Solr).
EDIT
My bad, I probably did not provide enough details. @Andrew: what I meant by "file" is the text content of a file stored as a string field (full text) in a document in ES. Assume one file corresponds to one document, with the text content string in a field called "file_content".
The closest thing you can get to what you want is highlighting, meaning emphasizing the searched terms in the documents.
Sample query:
{
"query": {
"match": {
"file_content": "football tennis cricket golf"
}
},
"highlight": {
"fields": {"file_content":{}}
}
}
Result:
"hits": {
"total": 3,
"max_score": 0.027847305,
"hits": [
{
"_index": "test_highlight",
"_type": "docs",
"_id": "1",
"_score": 0.027847305,
"_source": {
"file_content": "I love tennis and cricket"
},
"highlight": {
"file_content": [
"I love <em>tennis</em> and <em>cricket</em>"
]
}
},
{
"_index": "test_highlight",
"_type": "docs",
"_id": "2",
"_score": 0.023869118,
"_source": {
"file_content": "tennis and football are very popular"
},
"highlight": {
"file_content": [
"<em>tennis</em> and <em>football</em> are very popular"
]
}
},
{
"_index": "test_highlight",
"_type": "docs",
"_id": "3",
"_score": 0.023869118,
"_source": {
"file_content": "football and cricket are originated in england"
},
"highlight": {
"file_content": [
"<em>football</em> and <em>cricket</em> are originated in england"
]
}
}
]
}
As you can see, the terms that were found are highlighted (wrapped in <em> tags) under a special highlight section.
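To recover the matched-words list the question asks for, you can post-process the highlight fragments; a sketch with the Python client (index and field names reused from above):
import re
from elasticsearch import Elasticsearch

es = Elasticsearch(['http://localhost:9200'])
resp = es.search(index="test_highlight", body={
    "query": {"match": {"file_content": "football tennis cricket golf"}},
    "highlight": {"fields": {"file_content": {}}}
})
for hit in resp["hits"]["hits"]:
    # every <em>...</em> fragment marks a term that actually matched
    matched = {m for frag in hit["highlight"]["file_content"]
                 for m in re.findall(r"<em>(.*?)</em>", frag)}
    print(hit["_id"], sorted(matched))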
I am developing an application that uses Elasticsearch, and in some cases I want to search according to a term and locales. I am testing this on localhost
http://localhost:9200/index/type/_search
and parameters
query : {
wildcard : { "term" : "myterm*" }
},
filter : {
and : [
{
term : { "lang" : "en" }
},
{
term : { "translations.lang" : "tr" } //this is subdocument search
}
]
}
Here is an example document:
{
"_index": "twitter",
"_type": "tweet",
"_id": "5084151d2c6e5d5b11000008",
"_score": null,
"_source": {
"lang": "en",
"term": "photograph",
"translations": [
{
"_id": "5084151d2c6e5d5b11000009",
"lang": "tr",
"translation": "fotoğraf",
"score": "0",
"createDate": "2012-10-21T15:30:37.994Z",
"author": "anonymous"
},
{
"_id": "50850346532b865c2000000a",
"lang": "tr",
"translation": "resim",
"score": "0",
"createDate": "2012-10-22T08:26:46.670Z",
"author": "anonymous"
}
],
"author": "anonymous",
"createDate": "2012-10-21T15:30:37.994Z"
}
}
I am trying to get terms with a wildcard (for autocomplete), with input language "en" and output language "tr". It fetches terms that contain "myterm" but doesn't apply the and filter on them. Any suggestion would be appreciated.
Thanks in advance
I would guess that the translations element has the nested type. If this is the case, you should use a nested query:
curl -XPOST "http://localhost:9200/twitter/tweet/_search" -d '{
query: {
wildcard: {
"term": "term*"
}
},
filter: {
and: [{
term: {
"lang": "en"
}
}, {
"nested": {
"path": "translations",
"query": {
"term" : { "translations.lang" : "tr" }
}
}
}]
}
}'
I have managed to solve my problem with the following query:
query : {
wildcard : { "term" : "myterm*" }
},
filter : {
and : [
{
term : { "lang" : "en" }
},
{
term : { "translations.lang" : "tr" } //this is subdocument search
}
]
},
sort : [
{"term" : "desc"}
]
An important point here is that you need to map your sorting field as not_analyzed, since you cannot sort on an analyzed field.
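For example, a minimal sketch of such a mapping with the Python client (index/type names reused from the example; this assumes the pre-5.x multi-field syntax matching the era of the question), keeping "term" analyzed for the wildcard search while exposing a raw sub-field to sort on:
from elasticsearch import Elasticsearch

es = Elasticsearch(['http://localhost:9200'])
es.indices.create(index="twitter", body={
    "mappings": {
        "tweet": {
            "properties": {
                "term": {
                    "type": "string",  # analyzed, for the wildcard search
                    "fields": {
                        "raw": {"type": "string", "index": "not_analyzed"}
                    }
                }
            }
        }
    }
})
# ...then sort on "term.raw" instead of "term".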