I'm new to Elasticsearch and can't solve this problem.
I have two requests:
1) curl -XGET 'host/process_test_1/1/_search?title:*New*'
It returns:
{
"took": 1,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 116,
"max_score": 1.0,
"hits": [
{
"_index": "process_test_1",
"_type": "1",
"_id": "7118_folder_1",
"_score": 1.0,
"_source": {
"obj_type": "folder",
"obj_id": 7118,
"title": "sadasd"
}
},
{
"_index": "process_test_1",
"_type": "1",
"_id": "6780_folder_1",
"_score": 1.0,
"_source": {
"obj_type": "folder",
"obj_id": 6780,
"title": "New Object"
}
}
]
}
}
Why does it return an object with the title "sadasd"?
The second request:
curl -XGET 'host/process_test_1/1/_search' -d '{"query":{"match":{"text":{"query":"*New*","operator":"and"}}}}'
It returns:
{
"took": 1,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 0,
"max_score": null,
"hits": [
]
}
}
Why does it return nothing when I really do have an element that matches? (In fact I have more than 50 elements with such a name and different ids.)
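First, your first query is missing the parameter name q= in front of title:*New*. Without it, that part of the URL is not treated as a query at all, so the search apparently matches every document (note the total of 116 and the uniform score of 1.0), which is why the "sadasd" object comes back:
curl -XGET 'host/process_test_1/1/_search?q=title:*New*'
(the q= right after the ? is the part that was missing)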
Second, the match query doesn't interpret the * character as a wildcard, so if you want an equivalent query using the DSL for the first query above, you need to be using the query_string query instead:
curl -XGET 'host/process_test_1/1/_search' -d '{
"query": {
"query_string": {
"query": "*New*",
"default_field": "text"
}
}
}'
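For anyone sending this from Python rather than curl, the same body can be built as a plain dict. This is only a minimal sketch: wildcard_query_body is a hypothetical helper name, not part of any client library, and the resulting dict would be passed as the search request body of whichever client you use.

```python
import json


def wildcard_query_body(pattern, field):
    """Build a query_string request body, which keeps * as a wildcard.

    Hypothetical helper for illustration; it only assembles the JSON body.
    """
    return {
        "query": {
            "query_string": {
                "query": pattern,
                "default_field": field,
            }
        }
    }


# Same pattern and field as in the curl example above.
body = wildcard_query_body("*New*", "text")
print(json.dumps(body, indent=2))
```

Keeping the field as a parameter makes it easy to point the same query at title instead of text if that is where the data actually lives.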
I have data like
id, title
'Deploying SQL Server Databases from Test to Live'
'Deploying SQL Server Databases for clients'
'Merge SQL Server databases'
'SQL Server : gather data from different databases'
.......
.......
There are millions of records like these.
My search query looks like this:
from elasticsearch import Elasticsearch
es = Elasticsearch([{'host': 'localhost', 'port': 9200}])
bd={
'query':{
'match': {
'title': "Deploying SQL Server Databases from Test to Live"
}
},
'sort': {
'_score': {
'order': 'desc'
}
}
}
res = es.search(index='abc-index', body=bd)
My search result:
{
"took": 1297,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 10000,
"relation": "gte"
},
"max_score": 38.9089,
"hits": [
{
"_index": "abc-index",
"_type": "_doc",
"_id": "1",
"_score": 38.9089, #normalized this value to some range [a,b]
"_source": {
"id": 1,
"title": "Deploying SQL Server Databases from Test to Live"
}
},
{
"_index": "abc-index",
"_type": "_doc",
"_id": "2",
"_score": 25.427029, #normalized this value to some range [a,b]
"_source": {
"id": 2,
"title": "Deploying SQL Server Databases for clients"
}
},
{
"_index": "abc-index",
"_type": "_doc",
"_id": "3",
"_score": 19.293251, #normalized this value to some range [a,b]
"_source": {
"id": 3,
"title": "Merge SQL Server databases"
}
},
{
"_index": "abc-index",
"_type": "_doc",
"_id": "4",
"_score": 18.969624, #normalized this value to some range [a,b]
"_source": {
"id": 4,
"title": "SQL Server : gather data from different databases"
}
}
.......... # 10,000 query result
]
}
}
I want the _score values normalized to some range [a,b], for example [0,2]. Can anyone please help me with how to do that?
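One possible client-side approach (an assumption on my part, not something confirmed in this thread) is to min-max rescale the scores after the response comes back. The sketch below works on the hits list of a response; normalize_scores is a hypothetical helper name. It normalizes relative to the hits actually returned, so the result depends on which page of results you fetched.

```python
def normalize_scores(hits, a=0.0, b=2.0):
    """Return copies of the hits with _score min-max rescaled into [a, b]."""
    if not hits:
        return []
    scores = [hit["_score"] for hit in hits]
    lowest, highest = min(scores), max(scores)
    span = highest - lowest
    normalized = []
    for hit in hits:
        # If every score is identical there is nothing to spread out,
        # so map all of them to the upper bound b.
        if span == 0:
            scaled = b
        else:
            scaled = a + (hit["_score"] - lowest) * (b - a) / span
        # Copy the hit so the original response is left untouched.
        normalized.append({**hit, "_score": scaled})
    return normalized


# Scores taken from the response shown above.
hits = [
    {"_id": "1", "_score": 38.9089},
    {"_id": "2", "_score": 25.427029},
    {"_id": "3", "_score": 19.293251},
    {"_id": "4", "_score": 18.969624},
]
for hit in normalize_scores(hits, 0.0, 2.0):
    print(hit["_id"], round(hit["_score"], 4))
```

The best hit lands on b and the worst on a; everything else is spaced proportionally in between.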
Using Elasticsearch aggregations, is it possible to return only the first hit from each aggregation? I have not found this functionality detailed in the Elastic docs.
{
took: 1,
timed_out: false,
_shards: {
total: 5,
successful: 5,
failed: 0
},
hits: {
total: 2,
max_score: 0.7380617,
hits: [
{},
{}
]
}
}
I use top_hits aggregation to ensure that the first hit of each aggregation is the hit which is relevant, so it would be neat if I could return only the first hit of each aggregation in a separate list. Is this at all possible, or does it require looping through the aggregated query results programmatically?
When you perform an aggregation, you want to check the aggregations json in your result, not the hits. Since you already know Top hits Aggregation, be aware that it provides a size option, so just set it to 1 and you'll have one hit per bucket.
In this example I am aggregating by a field in my index called catL1, and top-categories is the name I chose to give to my aggregation:
{
"aggs": {
"top-categories": {
"terms": {
"field": "catL1"
},
"aggs": {
"top-categories_hits": {
"top_hits": {
"size" : 1
}
}
}
}
}
}
Now my result is:
{
"took": 33,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 1248280,
"max_score": 1,
"hits": [
...
]
},
"aggregations": {
"top-categories": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 217939,
"buckets": [
{
"key": "category1",
"doc_count": 412189,
"top-categories_hits": {
"hits": {
"total": 412189,
"max_score": 1,
"hits": [
ONLY_1_HIT
]
}
}
},
{
"key": "category2",
"doc_count": 3000189,
"top-categories_hits": {
"hits": {
"total": 3000189,
"max_score": 1,
"hits": [
ONLY_1_HIT
]
}
}
}
]
}
}
}
You can see that the response has a JSON object called aggregations, which contains only one hit per bucket (I replaced the hits with a placeholder).
EDIT:
You may of course also be interested in the total hits, but aggregations is what you are looking for in the context of this question.
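If you still want those single hits as one flat list, a short loop over the buckets on the client side is enough. The following is a minimal sketch assuming the response shape shown above; first_hit_per_bucket is a hypothetical helper name and the _id values in the sample are made up for illustration.

```python
def first_hit_per_bucket(response, agg_name, top_hits_name):
    """Collect the first top_hits entry of every bucket into one flat list."""
    buckets = response["aggregations"][agg_name]["buckets"]
    first_hits = []
    for bucket in buckets:
        bucket_hits = bucket[top_hits_name]["hits"]["hits"]
        if bucket_hits:  # skip a bucket if it comes back without hits
            first_hits.append(bucket_hits[0])
    return first_hits


# Trimmed-down response in the same shape as above (illustrative ids).
response = {
    "aggregations": {
        "top-categories": {
            "buckets": [
                {"key": "category1",
                 "top-categories_hits": {"hits": {"hits": [{"_id": "a1"}]}}},
                {"key": "category2",
                 "top-categories_hits": {"hits": {"hits": [{"_id": "b7"}]}}},
            ]
        }
    }
}
flat = first_hit_per_bucket(response, "top-categories", "top-categories_hits")
print([hit["_id"] for hit in flat])
```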
I need help. I have these documents on elasticsearch 1.6
{
"name":"Sam",
"age":25,
"description":"Something"
},
{
"name":"Michael",
"age":23,
"description":"Something else"
}
with this query:
GET /MyIndex/MyType/_search?q=Michael
Elasticsearch returns this object:
{
"name":"Michael",
"age":23,
"description":"Something else"
}
... That's right, but I want to get the exact key where the text "Michael" was found. Is that possible? Thanks a lot.
I assume that by key you mean the document ID.
When indexing the following documents:
PUT my_index/my_type/1
{
"name":"Sam",
"age":25,
"description":"Something"
}
PUT my_index/my_type/2
{
"name":"Michael",
"age":23,
"description":"Something else"
}
And searching for:
GET /my_index/my_type/_search?q=Michael
You'll get the following response:
{
"took": 8,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 1,
"max_score": 0.15342641,
"hits": [
{
"_index": "my_index",
"_type": "my_type",
"_id": "2",
"_score": 0.15342641,
"_source": {
"name": "Michael",
"age": 23,
"description": "Something else"
}
}
]
}
}
As you can see, the hits array contains an object for each search hit.
The key for Michael in this case is "_id": "2", which is its document ID.
Hope it helps.
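To pull those IDs out of a response programmatically, something like the following would do. It is a minimal sketch that assumes the response has already been parsed into a Python dict; hit_ids is a hypothetical helper name.

```python
def hit_ids(response):
    """Return the _id of every hit in a search response, in result order."""
    return [hit["_id"] for hit in response["hits"]["hits"]]


# Trimmed-down version of the response shown above.
response = {
    "hits": {
        "total": 1,
        "hits": [
            {"_index": "my_index", "_type": "my_type", "_id": "2",
             "_source": {"name": "Michael", "age": 23}},
        ],
    }
}
print(hit_ids(response))
```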
I have a document in the form of:
curl -XPOST localhost:9200/books/book/1 -d '{
"user_id": 1,
"pages": [ {"page_id": 1, "count": 1}, {"page_id": 2, "count": 3}]
}'
Now let's say the user reads page 1 again, so I want to increment the count. The document should become:
{
"user_id": 1,
"pages": [ {"page_id": 1, "count": 2}, {"page_id": 2, "count": 3}]
}
But how do you update a single element of a list like this, conditionally, based on its page_id?
An example of a simple update in Elasticsearch is as follows:
curl -XPOST localhost:9200/books/book/2 -d '{
"user_id": 1,
"pages": {
"page_1": 1,
"page_2": 2
}
}'
curl -XPOST localhost:9200/books/book/2/_update -d '
{
"script": "ctx._source.pages.page_1+=1"
}'
The document now becomes:
{
"user_id": 1,
"pages": {
"page_1": 2,
"page_2": 2
}
}
However, this simpler document format loses page_id as an explicit field, so the id itself acts as the field name. Similarly, the value associated with that field has no name describing what it is. Thus this isn't a great solution.
Anyway, it would be great to have any ideas on how to update the array accordingly, or any ideas on structuring the data.
Note: I'm using ES 1.4.4. You also need to add script.disable_dynamic: false to your elasticsearch.yml file.
Assuming I'm understanding your problem correctly, I would probably use a parent/child relationship.
To test it, I set up an index with a "user" parent and "page" child, as follows:
PUT /test_index
{
"settings": {
"number_of_shards": 1
},
"mappings": {
"user": {
"_id": {
"path": "user_id"
},
"properties": {
"user_id": {
"type": "integer"
}
}
},
"page": {
"_parent": {
"type": "user"
},
"_id": {
"path": "page_id"
},
"properties": {
"page_id": {
"type": "integer"
},
"count": {
"type": "integer"
}
}
}
}
}
(I used the "path" parameter in the "_id"s because it makes the indexing less redundant; the ES docs say that path is deprecated in ES 1.5, but they don't say what it's being replaced with.)
Then indexed a few docs:
POST /test_index/_bulk
{"index":{"_type":"user"}}
{"user_id":1}
{"index":{"_type":"page","_parent":1}}
{"page_id":1,"count":1}
{"index":{"_type":"page","_parent":1}}
{"page_id":2,"count":1}
Now I can use a scripted partial update to increment the "count" field of a page. Because of the parent/child relationship, I have to use the parent parameter to tell ES how to route the request.
POST /test_index/page/2/_update?parent=1
{
"script": "ctx._source.count+=1"
}
Now if I search for that document, I will see that it was updated as expected:
POST /test_index/page/_search
{
"query": {
"term": {
"page_id": {
"value": "2"
}
}
}
}
...
{
"took": 3,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"failed": 0
},
"hits": {
"total": 1,
"max_score": 1,
"hits": [
{
"_index": "test_index",
"_type": "page",
"_id": "2",
"_score": 1,
"_source": {
"page_id": 2,
"count": 2
}
}
]
}
}
Here is the code all in one place:
http://sense.qbox.io/gist/9c977f15b514ec251aef8e84e9510d3de43aef8a
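The parent/child layout sidesteps the array entirely. If the original nested-list layout is kept instead, the per-element logic the question asks about looks roughly like the sketch below. It is plain Python operating on the document's _source as a dict and does not talk to Elasticsearch; increment_page_count is a hypothetical helper name. Note that a client-side read-modify-write like this is not atomic on its own, so concurrent readers of the same book could overwrite each other's increments.

```python
def increment_page_count(source, page_id, step=1):
    """Increment the count of the entry in source["pages"] matching page_id.

    Returns True if a matching entry was found and updated, else False.
    """
    for page in source.get("pages", []):
        if page.get("page_id") == page_id:
            page["count"] = page.get("count", 0) + step
            return True
    return False


# The document from the question.
doc = {
    "user_id": 1,
    "pages": [{"page_id": 1, "count": 1}, {"page_id": 2, "count": 3}],
}
found = increment_page_count(doc, page_id=1)
print(found, doc)
```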
I'm pretty new to Elasticsearch and I want to use synonyms. I added these lines to the configuration file:
index :
    analysis :
        analyzer :
            synonym :
                type : custom
                tokenizer : whitespace
                filter : [synonym]
        filter :
            synonym :
                type : synonym
                synonyms_path: synonyms.txt
Then I created an index named test:
"mappings" : {
"test" : {
"properties" : {
"text_1" : {
"type" : "string",
"analyzer" : "synonym"
},
"text_2" : {
"search_analyzer" : "standard",
"index_analyzer" : "synonym",
"type" : "string"
},
"text_3" : {
"type" : "string",
"analyzer" : "synonym"
}
}
}
}
and inserted a document of type test with this data:
{
"text_3" : "foo dog cat",
"text_2" : "foo dog cat",
"text_1" : "foo dog cat"
}
synonyms.txt contains "foo,bar,baz". When I search for foo it returns what I expect, but when I search for baz or bar it returns zero results:
{
"query":{
"query_string":{
"query" : "bar",
"fields" : [ "text_1"],
"use_dis_max" : true,
"boost" : 1.0
}}}
result:
{
"took":1,
"timed_out":false,
"_shards":{
"total":5,
"successful":5,
"failed":0
},
"hits":{
"total":0,
"max_score":null,
"hits":[
]
}
}
I don't know whether your problem is caused by a badly defined synonym for "bar". Since you said you are pretty new, I'm going to give an example similar to yours that works. I want to show how Elasticsearch deals with synonyms at search time and at index time. Hope it helps.
First, create the synonym file:
foo => foo bar, baz
Now I create the index with the particular settings you are trying to test:
curl -XPUT 'http://localhost:9200/test/' -d '{
"settings": {
"index": {
"analysis": {
"analyzer": {
"synonym": {
"tokenizer": "whitespace",
"filter": ["synonym"]
}
},
"filter" : {
"synonym" : {
"type" : "synonym",
"synonyms_path" : "synonyms.txt"
}
}
}
}
},
"mappings": {
"test" : {
"properties" : {
"text_1" : {
"type" : "string",
"analyzer" : "synonym"
},
"text_2" : {
"search_analyzer" : "standard",
"index_analyzer" : "standard",
"type" : "string"
},
"text_3" : {
"type" : "string",
"search_analyzer" : "synonym",
"index_analyzer" : "standard"
}
}
}
}
}'
Note that synonyms.txt must be in the same directory as the configuration file, since that path is relative to the config dir.
Now index a doc:
curl -XPUT 'http://localhost:9200/test/test/1' -d '{
"text_3": "baz dog cat",
"text_2": "foo dog cat",
"text_1": "foo dog cat"
}'
Now the searches
Searching in field text_1
curl -XGET 'http://localhost:9200/test/_search?q=text_1:baz'
{
"took": 3,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 1,
"max_score": 0.15342641,
"hits": [
{
"_index": "test",
"_type": "test",
"_id": "1",
"_score": 0.15342641,
"_source": {
"text_3": "baz dog cat",
"text_2": "foo dog cat",
"text_1": "foo dog cat"
}
}
]
}
}
You get the document because baz is a synonym of foo, and at index time foo is expanded with its synonyms.
Searching in field text_2
curl -XGET 'http://localhost:9200/test/_search?q=text_2:baz'
result:
{
"took": 2,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 0,
"max_score": null,
"hits": []
}
}
I don't get hits because I didn't expand synonyms while indexing (standard analyzer). And, since I'm searching baz and baz is not in the text, I don't get any result.
Searching in field text_3
curl -XGET 'http://localhost:9200/test/_search?q=text_3:foo'
{
"took": 3,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 1,
"max_score": 0.15342641,
"hits": [
{
"_index": "test",
"_type": "test",
"_id": "1",
"_score": 0.15342641,
"_source": {
"text_3": "baz dog cat",
"text_2": "foo dog cat",
"text_1": "foo dog cat"
}
}
]
}
}
Note: text_3 is "baz dog cat".
text_3 was indexed without expanding synonyms. Since I'm searching for foo, which has "baz" as one of its synonyms, the query is expanded at search time and I get the result.
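The three cases above can be mimicked with a toy model in plain Python. This is only a sketch to make the index-time versus search-time distinction concrete, not how Elasticsearch implements analysis: it splits on whitespace, treats foo as expanding to the single words foo, bar and baz (a simplification of the rule in synonyms.txt), and calls a document a match if it shares any token with the query. All names in it are hypothetical.

```python
# Simplified one-way rule: foo expands to foo, bar and baz.
SYNONYMS = {"foo": ["foo", "bar", "baz"]}


def analyze(text, expand_synonyms):
    """Split on whitespace and optionally expand each word with its synonyms."""
    tokens = set()
    for word in text.split():
        if expand_synonyms:
            tokens.update(SYNONYMS.get(word, [word]))
        else:
            tokens.add(word)
    return tokens


def matches(doc_text, query_text, index_expand, search_expand):
    """True if the analyzed document and the analyzed query share a token."""
    doc_tokens = analyze(doc_text, index_expand)
    query_tokens = analyze(query_text, search_expand)
    return bool(doc_tokens & query_tokens)


# text_1: synonyms expanded at index time and at search time, query "baz"
print(matches("foo dog cat", "baz", index_expand=True, search_expand=True))
# text_2: no expansion on either side, query "baz"
print(matches("foo dog cat", "baz", index_expand=False, search_expand=False))
# text_3: expansion at search time only, query "foo"
print(matches("baz dog cat", "foo", index_expand=False, search_expand=True))
```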
If you want to debug, you can use the _analyze endpoint, for example:
curl -XGET 'http://localhost:9200/test/_analyze?text=foo&analyzer=synonym&pretty=true'
result:
{
"tokens": [
{
"token": "foo",
"start_offset": 0,
"end_offset": 3,
"type": "SYNONYM",
"position": 1
},
{
"token": "baz",
"start_offset": 0,
"end_offset": 3,
"type": "SYNONYM",
"position": 1
},
{
"token": "bar",
"start_offset": 0,
"end_offset": 3,
"type": "SYNONYM",
"position": 2
}
]
}