Elasticsearch terms stats query not grouping correctly

I have a terms stats query very similar to this one:
Sum Query in Elasticsearch
However, my key_field is a date.
I was expecting to receive results grouped by the full key_field value ["2014-01-20", "2014-01-21", "2014-01-22"] but it appears to be splitting the key field when it encounters a "-". What I received is actually grouped by ["2014", "01", "20", "21", "22"].
Why is it splitting my key?

You probably have your key_field mapped as a string using the standard analyzer.
That'll tokenize 2014-01-20 into 2014, 01, and 20.
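You can verify this with the _analyze API (the endpoint is assumed to be local, as in the script below):
curl "http://localhost:9200/_analyze?analyzer=standard&pretty" -d '2014-01-20'
It should return the three tokens 2014, 01, and 20.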
You probably want to index your date with type date. Alternatively, you can keep it as a string but leave it not analyzed.
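For example, an explicit date mapping might look like this (the index and type names here are placeholders):
curl -XPUT "http://localhost:9200/my_index" -d '{
    "mappings": {
        "type": {
            "properties": {
                "date": {
                    "type": "date"
                }
            }
        }
    }
}'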
Here's a runnable example you can play with: https://www.found.no/play/gist/5eb6b8d176e1cc72c9b8
#!/bin/bash
export ELASTICSEARCH_ENDPOINT="http://localhost:9200"
# Create indexes
curl -XPUT "$ELASTICSEARCH_ENDPOINT/play" -d '{
"settings": {},
"mappings": {
"type": {
"properties": {
"date_as_a_string": {
"type": "string"
},
"date_as_nonanalyzed_string": {
"type": "string",
"index": "not_analyzed"
}
}
}
}
}'
# Index documents
curl -XPOST "$ELASTICSEARCH_ENDPOINT/_bulk?refresh=true" -d '
{"index":{"_index":"play","_type":"type"}}
{"date":"2014-01-01T00:00:00.000Z","date_as_a_string":"2014-01-01T00:00:00.000Z","date_as_nonanalyzed_string":"2014-01-01T00:00:00.000Z","x":42}
'
# Do searches
curl -XPOST "$ELASTICSEARCH_ENDPOINT/_search?pretty" -d '
{
"facets": {
"date": {
"terms_stats": {
"key_field": "date",
"value_field": "x"
}
},
"date_as_a_string": {
"terms_stats": {
"key_field": "date_as_a_string",
"value_field": "x"
}
},
"date_as_nonanalyzed_string": {
"terms_stats": {
"key_field": "date_as_nonanalyzed_string",
"value_field": "x"
}
}
},
"size": 0
}
'

Related

How can I sort a price string in Elasticsearch?

In my example I tried to sort, but without success. The problem is that my price is a string, formatted like 1.300,00. When I sort on the string price I get, for example: 0,00 | 1,00 | 1.000,00 | 2,00.
I want to convert it to a double (or something similar) for sorting.
How can I do that?
It is not a good idea to keep price as a keyword in Elasticsearch. The best approach is to map the price as a scaled_float, like this:
New Mapping:
PUT [index_name]/_mapping
{
    "properties": {
        "price2": {
            "type": "scaled_float",
            "scaling_factor": 100
        }
    }
}
To solve your problem, you can add the new mapping and then convert your existing values from strings to numbers:
Update by query:
POST [index_name]/_update_by_query
{
    "query": {
        "match_all": {}
    },
    "script": {
        "source": "ctx._source['price2'] = ctx._source['price'].replace(',','')"
    }
}
This query copies your keyword value, with commas removed, into another field named price2, where the scaled_float mapping coerces it to a number. You will then need an ingest pipeline to apply the same processing to new entries:
Ingest pipeline:
POST _ingest/pipeline/_simulate
{
    "pipeline": {
        "processors": [
            {
                "script": {
                    "description": "Copy 'price' into 'price2' with commas removed",
                    "lang": "painless",
                    "source": "ctx['price2'] = ctx['price'].replace(',','')"
                }
            }
        ]
    },
    "docs": [
        {
            "_source": {
                "price": "5,000.00"
            }
        }
    ]
}
Once you are happy with the simulated result, remove _simulate, register the pipeline under a real name, and attach it to your index.
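A minimal sketch of those two steps, assuming a pipeline name of price-pipeline:
PUT _ingest/pipeline/price-pipeline
{
    "processors": [
        {
            "script": {
                "lang": "painless",
                "source": "ctx['price2'] = ctx['price'].replace(',','')"
            }
        }
    ]
}
PUT [index_name]/_settings
{
    "index.default_pipeline": "price-pipeline"
}
With index.default_pipeline set, every newly indexed document gets a numeric price2 field, which you can then sort on with "sort": [{ "price2": "asc" }].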

Index and search analyzers in Elasticsearch: trouble hitting the exact string as the first result

I am running tests with Elasticsearch, indexing Wikipedia topics.
My settings are below.
The result I expect is for the first hit to match the exact string, especially when the string is a single word.
Instead:
Searching for "g"
curl "http://localhost:9200/my_index/_search?q=name:g&pretty=True"
returns
[Changgyeonggung, Lopadotemachoselachogaleokranioleipsanodrimhypotrimmatosilphioparaomelitokatakechymenokichlepikossyphophattoperisteralektryonoptekephalliokigklopeleiolagoiosiraiobaphetraganopterygon, ..] as first results (yes, serendipity time! that is a greek dish if you are curious [http://nifty.works/about/BgdKMmwV6B3r4pXJ/] :)
I thought it was because results containing more "g" letters are weighted higher relative to other words... but:
Searching for "google":
curl "http://localhost:9200/my_index/_search?q=name:google&pretty=True"
returns
[Googlewhack, IGoogle, Google+, Google, ..] as the first results, and I would expect Google to be first.
What is wrong in my settings, such that the exact keyword is not the top hit when it exists?
I used index and search analyzers for the reason suggested in this answer: [https://stackoverflow.com/a/15932838/305883]
Settings
# make index with mapping
curl -X PUT localhost:9200/test-ngram -d '{
    "settings": {
        "analysis": {
            "analyzer": {
                "index_analyzer": {
                    "type": "custom",
                    "tokenizer": "lowercase",
                    "filter": ["asciifolding", "title_ngram"]
                },
                "search_analyzer": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": ["standard", "lowercase", "stop", "asciifolding"]
                }
            },
            "filter": {
                "title_ngram": {
                    "type": "nGram",
                    "min_gram": 1,
                    "max_gram": 10
                }
            }
        }
    },
    "mappings": {
        "topic": {
            "properties": {
                "name": {
                    "type": "string",
                    "boost": 10.0,
                    "index": "analyzed",
                    "index_analyzer": "index_analyzer",
                    "search_analyzer": "search_analyzer"
                }
            }
        }
    }
}'
That's because relevance works differently by default (check the part about TF/IDF: https://www.elastic.co/guide/en/elasticsearch/guide/current/relevance-intro.html).
If you want the exact term match at the top of the results while also matching substrings etc., you need to index name as a multifield, like this:
"name": {
"type": "string",
"index": "analyzed",
// other analyzer stuff here
"fields": {
"raw": { "type": "string", "index": "not_analyzed" }
}
}
Then, in a bool query, you need to query both name and name.raw, and boost the results from name.raw.
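A minimal sketch of such a query (index name as in the settings above; the boost value is arbitrary):
curl -XPOST "localhost:9200/test-ngram/_search?pretty" -d '{
    "query": {
        "bool": {
            "should": [
                { "match": { "name": "google" } },
                { "term": { "name.raw": { "value": "Google", "boost": 10 } } }
            ]
        }
    }
}'
Note that name.raw is not_analyzed and therefore case-sensitive; if you want case-insensitive exact matching, index the raw subfield with a custom analyzer made of the keyword tokenizer plus a lowercase filter instead.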

"stop" filter behaving differently in Elasticsearch when using "_all"

I'm trying to implement a match search in Elasticsearch, and I noticed that the behavior differs depending on whether I use _all or enter a specific field name in my query.
To give some context, I've created an index with the following settings:
{
    "settings": {
        "analysis": {
            "analyzer": {
                "default": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": [
                        "standard",
                        "lowercase",
                        "stop",
                        "kstem",
                        "word_delimiter"
                    ]
                }
            }
        }
    }
}
If I create a document like:
{
    "name": "Hello.World"
}
And I execute a search using _all like:
curl -d '{"query": { "match" : { "_all" : "hello" } }}' http://localhost:9200/myindex/mytype/_search
It will correctly match the document (since I'm using the stop filter to split the words at the dot), but if I execute this query instead:
curl -d '{"query": { "match" : { "name" : "hello" } }}' http://localhost:9200/myindex/mytype/_search
However, nothing is returned. How is this possible?
Issue a GET for /myindex/mytype/_mapping and check whether your index is configured the way you think it is; for instance, see if that name field is not_analyzed.
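For example (hostname assumed to be local, as in the question):
curl "http://localhost:9200/myindex/mytype/_mapping?pretty"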
Even better, run the following query to see how the name field was actually indexed:
{
    "query": {
        "match": {
            "name": "hello"
        }
    },
    "fielddata_fields": ["name"]
}
You should see something like this in the result:
"fields": {
"name": [
"hello",
"world"
]
}
If you don't, then you know something's wrong with your mapping for the name field.
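If the mapping is wrong, a hedged fix (using the index and type names from the question) is to drop and recreate the index so the name field is analyzed with your custom default analyzer, then reindex:
curl -XDELETE "http://localhost:9200/myindex"
curl -XPUT "http://localhost:9200/myindex" -d '{
    "settings": {
        "analysis": {
            "analyzer": {
                "default": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": ["standard", "lowercase", "stop", "kstem", "word_delimiter"]
                }
            }
        }
    },
    "mappings": {
        "mytype": {
            "properties": {
                "name": { "type": "string", "index": "analyzed" }
            }
        }
    }
}'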

Elasticsearch sort based on the number of occurrences a string appears in an array

I have an array field containing a list of strings, e.g.: ["NY", "CA"]
At search time I have a filter which matches any of the strings in the array.
I would like to sort the results so that documents with the most occurrences of the searched string ("NY") come first.
The results should include:
document 1: ["CA", "NY", "NY"]
document 2: ["NY", "FL"]
document 3: ["NY", "CA", "NY", "NY"]
and they should be ordered as:
document 3, document 1, document 2
Is this possible? If so, how?
For those curious: I was not able to boost based on how many occurrences of the word appear in the array. I did, however, accomplish what I needed with the following:
curl -X POST "http://localhost:9200/index/document/1" -d '{"id":1,"states_ties":["CA"],"state_abbreviation":"CA","worked_in_states":["CA"],"training_in_states":["CA"]}'
curl -X POST "http://localhost:9200/index/document/2" -d '{"id":2,"states_ties":["CA","NY"],"state_abbreviation":"FL","worked_in_states":["NY","CA"],"training_in_states":["NY","CA"]}'
curl -X POST "http://localhost:9200/index/document/3" -d '{"id":3,"states_ties":["CA","NY","FL"],"state_abbreviation":"NY","worked_in_states":["NY","CA"],"training_in_states":["NY","FL"]}'
curl -X GET 'http://localhost:9200/index/_search?per_page=10&pretty' -d '{
    "query": {
        "custom_filters_score": {
            "query": {
                "terms": {
                    "states_ties": ["CA"]
                }
            },
            "filters": [
                {
                    "filter": {
                        "term": {
                            "state_abbreviation": "CA"
                        }
                    },
                    "boost": 1.03
                },
                {
                    "filter": {
                        "terms": {
                            "worked_in_states": ["CA"]
                        }
                    },
                    "boost": 1.02
                },
                {
                    "filter": {
                        "terms": {
                            "training_in_states": ["CA"]
                        }
                    },
                    "boost": 1.01
                }
            ],
            "score_mode": "multiply"
        }
    },
    "sort": [
        { "_score": "desc" }
    ]
}'
Results (id: score):
1: 0.75584483
2: 0.73383
3: 0.7265643
This would be accomplished by the standard Lucene scoring implementation. If you simply search for "NY" without specifying an order, results are sorted by relevance, and the highest relevance is assigned to documents with more occurrences of the term, all else being equal.
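As a minimal sketch (assuming the array field is states_ties, as in the example above), a plain match query already ranks documents with more "NY" occurrences higher:
curl -X GET 'http://localhost:9200/index/_search?pretty' -d '{
    "query": {
        "match": {
            "states_ties": "NY"
        }
    }
}'
With default TF/IDF scoring, a document whose array contains "NY" three times outscores one containing it once, all else being equal.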

Indexing geospatial data in Elasticsearch results in error?

{
    "title": "abcccc",
    "price": 3300,
    "price_per": "task",
    "location": { "lat": -33.8756, "lon": 151.204 },
    "description": "asdfasdf"
}
The above is the JSON that I want to index. However, when I index it, the error is:
{"error":"MapperParsingException[Failed to parse [location]]; nested: ElasticSearchIllegalArgumentException[unknown property [lat]]; ","status":400}
If I remove the "location" field, everything works.
How do I index geo? I read the tutorial and I'm still confused how it works. It should work like this, right...?
You are getting this error message because the field location is not mapped correctly. It's possible that at some point you tried to index a string in this field, and it is now mapped as a string. Elasticsearch cannot automatically detect that a field contains a geo_point; it has to be specified explicitly in the mapping. Otherwise, Elasticsearch maps such a field as a string, number, or object, depending on the type of geo_point representation used in the first indexed record. Once a field is added to the mapping, its type can no longer be changed. So, in order to fix the situation, you will need to delete the mapping for this type and create it again. Here is an example of specifying a mapping for a geo_point field:
curl -XDELETE "localhost:9200/geo-test/"
echo
# Set proper mapping. Elasticsearch cannot automatically detect that something is a geo_point:
curl -XPUT "localhost:9200/geo-test" -d '{
"settings": {
"index": {
"number_of_replicas" : 0,
"number_of_shards": 1
}
},
"mappings": {
"doc": {
"properties": {
"location" : {
"type" : "geo_point"
}
}
}
}
}'
echo
# Put some test data in Sydney
curl -XPUT "localhost:9200/geo-test/doc/1" -d '{
"title": "abcccc",
"price": 3300,
"price_per": "task",
"location": { "lat": -33.8756, "lon": 151.204 },
"description": "asdfasdf"
}'
curl -XPOST "localhost:9200/geo-test/_refresh"
echo
# Search, and calculate distance to Brisbane
curl -XPOST "localhost:9200/geo-test/doc/_search?pretty=true" -d '{
"query": {
"match_all": {}
},
"script_fields": {
"distance": {
"script": "doc['\''location'\''].arcDistanceInKm(-27.470,153.021)"
}
},
"fields": ["title", "location"]
}
'
echo
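If you also want to restrict hits by distance rather than just computing it, here is a sketch using a geo_distance filter (the 200km radius is arbitrary; the coordinates are Sydney's, as in the test data above):
curl -XPOST "localhost:9200/geo-test/doc/_search?pretty=true" -d '{
    "query": {
        "filtered": {
            "query": { "match_all": {} },
            "filter": {
                "geo_distance": {
                    "distance": "200km",
                    "location": { "lat": -33.8756, "lon": 151.204 }
                }
            }
        }
    }
}'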
Since you don't specify how you parse it on the client side, this question may also shed some light:
Parsing through JSON in JSON.NET with unknown property names
