"stop" filter behaving differently in Elasticsearch when using "_all" - search

I'm trying to implement a match search in Elasticsearch, and I noticed that the behavior differs depending on whether I use _all or a specific field name in my query.
To give some context, I've created an index with the following settings:
{
"settings": {
"analysis": {
"analyzer": {
"default": {
"type": "custom",
"tokenizer": "standard",
"filter": [
"standard",
"lowercase",
"stop",
"kstem",
"word_delimiter"
]
}
}
}
}
}
If I create a document like:
{
"name": "Hello.World"
}
And I execute a search using _all like:
curl -d '{"query": { "match" : { "_all" : "hello" } }}' http://localhost:9200/myindex/mytype/_search
It will correctly match the document (since I'm using the stop filter to split the words at the dot), but if I execute this query instead:
curl -d '{"query": { "match" : { "name" : "hello" } }}' http://localhost:9200/myindex/mytype/_search
Nothing is returned. How is this possible?

Issue a GET for /myindex/mytype/_mapping and see if your index is configured the way you think it is. Meaning, see if that "name" field is not_analyzed, for example.
Furthermore, run the following query to see how the name field is actually indexed:
{
"query": {
"match": {
"name": "hello"
}
},
"fielddata_fields": ["name"]
}
You should see something like this in the result:
"fields": {
"name": [
"hello",
"world"
]
}
If you don't, then you know something's wrong with your mapping for the name field.
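To make the not_analyzed hypothesis concrete, here is a toy Python model of the two indexing modes. It is not Elasticsearch code: toy_analyze, toy_not_analyzed and matches are hypothetical helpers that only approximate "split on punctuation and lowercase" versus "store the value as one exact term".

```python
import re

def toy_analyze(text):
    """Rough stand-in for an analyzed field: split on non-alphanumerics, then lowercase."""
    return [tok.lower() for tok in re.split(r"[^0-9A-Za-z]+", text) if tok]

def toy_not_analyzed(text):
    """A not_analyzed field is indexed as a single, unmodified term."""
    return [text]

def matches(query, indexed_terms):
    # A match query also analyzes the query text; this toy version only lowercases it.
    return query.lower() in indexed_terms

doc_value = "Hello.World"
print(toy_analyze(doc_value))                            # terms of the analyzed field
print(matches("hello", toy_analyze(doc_value)))          # analyzed field
print(matches("hello", toy_not_analyzed(doc_value)))     # not_analyzed field
```

If the name field were stored as the single term Hello.World, a match for hello would find nothing, which is the symptom described in the question.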

How to Configure the Elasticsearch with fuzzy search

I have a requirement to install Elasticsearch so that it can be used for fuzzy search.
How do I install and configure it on a Linux box?
Thanks
You don't need any extra configuration to use fuzzy search in Elasticsearch. What matters is how you write the query.
https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-fuzzy-query.html
To install Elasticsearch on Linux, you can refer to the official Elasticsearch documentation.
There are several ways to run a fuzzy search, depending on your use case:
1. You can use match with fuzziness parameter
2. You can use fuzzy query
Adding a working example with index data, mapping, search query, and search result
Index Mapping:
{
"mappings": {
"properties": {
"name": {
"type": "text"
}
}
}
}
Index Data:
{
"name": "breadsticks"
}
Search Query using Match Query:
Searching for breadstiks instead of breadsticks
{
"query":{
"match":{
"name":{
"query":"breadstiks",
"fuzziness":"auto"
}
}
}
}
Search Result:
"hits": [
{
"_index": "66962659",
"_type": "_doc",
"_id": "1",
"_score": 0.25891387,
"_source": {
"name": "breadsticks"
}
}
]
You can set the fuzziness value according to your use case
Search Query using Fuzzy query:
{
"query": {
"fuzzy": {
"name": {
"value": "breadstiks"
}
}
}
}
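To see why breadstiks still finds breadsticks, the sketch below computes the edit distance between the two terms and the number of edits that "fuzziness": "auto" allows for a term of that length, assuming the default AUTO thresholds of 3 and 6. The helpers levenshtein and auto_max_edits are illustrative names, and this is plain Levenshtein distance; Elasticsearch additionally counts a transposition of two adjacent characters as a single edit by default.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete a character from a
                            curr[j - 1] + 1,      # insert a character into a
                            prev[j - 1] + cost))  # substitute (or keep) a character
        prev = curr
    return prev[-1]

def auto_max_edits(term):
    """Edits allowed by fuzziness AUTO, assuming its default thresholds (3 and 6)."""
    if len(term) <= 2:
        return 0
    if len(term) <= 5:
        return 1
    return 2

query_term = "breadstiks"
indexed_term = "breadsticks"
distance = levenshtein(query_term, indexed_term)
print(distance, auto_max_edits(query_term))
print(distance <= auto_max_edits(query_term))
```

The misspelled term is one edit away from the indexed term (the missing c), which is within the two edits allowed for a ten-character term.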

Logic App: Finding an element in a JSON object array (like XPath for XML)

In my logic app, I have a JSON object (parsed from an API response) that contains an array of objects.
How can I find a specific element based on attribute values? In the example below, I want to find the first active one:
{
"MyList" : [
{
"Descrip" : "This is the first item",
"IsActive" : "N"
},
{
"Descrip" : "This is the second item",
"IsActive" : "N"
},
{
"Descrip" : "This is the third item",
"IsActive" : "Y"
}
]
}
Well, the answer was in plain sight: there's a Filter array action, which works on a JSON object (from a Parse JSON action). Coupling it with a first() expression gives the desired outcome.
You can use the Parse JSON action to parse your JSON and a Condition to filter on the IsActive attribute:
Use the following schema to parse the JSON:
{
"type": "object",
"properties": {
"MyList": {
"type": "array",
"items": {
"type": "object",
"properties": {
"Descrip": {
"type": "string"
},
"IsActive": {
"type": "string"
}
},
"required": [
"Descrip",
"IsActive"
]
}
}
}
}
Here is how it looks (I included the sample data you provided to test it):
Then you can add the Condition:
And perform whatever action you want within the If true section.
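For readers who want to check the logic outside the designer, the "filter the array, then take the first element" idea can be mirrored in a few lines of Python. This is only an illustration of the logic, not Logic Apps code; first_active is a hypothetical helper name, and the payload is the sample from the question.

```python
import json

payload = json.loads("""
{
  "MyList": [
    {"Descrip": "This is the first item",  "IsActive": "N"},
    {"Descrip": "This is the second item", "IsActive": "N"},
    {"Descrip": "This is the third item",  "IsActive": "Y"}
  ]
}
""")

def first_active(items):
    """Return the first item whose IsActive is "Y", or None if there is none."""
    # The generator plays the role of the filter step; next() plays the role of first().
    return next((item for item in items if item.get("IsActive") == "Y"), None)

match = first_active(payload["MyList"])
print(match["Descrip"] if match else "no active item")
```

Returning None for an empty result corresponds to the case where the filtered array is empty, which the workflow should also handle.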

Index and search analyzers in Elasticsearch: trouble getting the exact string as the first result

I am running tests with Elasticsearch, indexing Wikipedia's topics.
My settings are below.
I expect the first result to match the exact string, especially if the string consists of a single word.
Instead:
Searching for "g"
curl "http://localhost:9200/my_index/_search?q=name:g&pretty=True"
returns
[Changgyeonggung, Lopadotemachoselachogaleokranioleipsanodrimhypotrimmatosilphioparaomelitokatakechymenokichlepikossyphophattoperisteralektryonoptekephalliokigklopeleiolagoiosiraiobaphetraganopterygon, ..] as first results (yes, serendipity time! that is a greek dish if you are curious [http://nifty.works/about/BgdKMmwV6B3r4pXJ/] :)
I thought this was because the scoring gives more weight to titles containing many "g" letters than to other words, but:
Searching for "google":
curl "http://localhost:9200/my_index/_search?q=name:google&pretty=True"
returns
[Googlewhack, IGoogle, Google+, Google, ..] as first results, and I would expect Google to be the first.
What is wrong in my settings that prevents the exact keyword, when it exists, from being the first hit?
I used separate index and search analyzers for the reason suggested in this answer: [https://stackoverflow.com/a/15932838/305883]
Settings
# make index with mapping
curl -X PUT localhost:9200/test-ngram -d '
{
"settings": {
"analysis": {
"analyzer": {
"index_analyzer": {
"type" : "custom",
"tokenizer": "lowercase",
"filter": ["asciifolding", "title_ngram"]
},
"search_analyzer": {
"type": "custom",
"tokenizer": "standard",
"filter": ["standard", "lowercase", "stop", "asciifolding"]
}
},
"filter": {
"title_ngram" : {
"type" : "nGram",
"min_gram" : 1,
"max_gram" : 10
}
}
}
},
"mappings": {
"topic": {
"properties": {
"name": {
"type": "string",
"boost": 10.0,
"index": "analyzed",
"index_analyzer": "index_analyzer",
"search_analyzer": "search_analyzer"
}
}
}
}
}
'
That's because relevance works differently by default (see the part about TF/IDF:
https://www.elastic.co/guide/en/elasticsearch/guide/current/relevance-intro.html).
If you want an exact term match at the top of the results while also matching substrings etc., you need to index name as a multi-field like this:
"name": {
"type": "string",
"index": "analyzed",
// other analyzer stuff here
"fields": {
"raw": { "type": "string", "index": "not_analyzed" }
}
}
Then in the boolean query you need to query both name and name.raw and boost results from name.raw
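A minimal sketch of such a boolean query, built as a Python dict so it can be serialized and sent to the _search endpoint. The function name exact_first_query and the boost value of 10.0 are assumptions chosen for illustration; only the name and name.raw fields come from the mapping above.

```python
import json

def exact_first_query(text, raw_boost=10.0):
    """Build a bool query that matches on name and boosts exact hits on name.raw."""
    return {
        "query": {
            "bool": {
                "should": [
                    # Analyzed field: matches substrings through the ngram analysis.
                    {"match": {"name": text}},
                    # Unanalyzed sub-field: only the exact stored value matches.
                    {"term": {"name.raw": {"value": text, "boost": raw_boost}}},
                ]
            }
        }
    }

body = exact_first_query("Google")
print(json.dumps(body, indent=2))
```

Because a not_analyzed field keeps the original casing, the term clause only fires when the input has the same casing as the stored title (Google, not google); lowercasing the exact sub-field would need a separate analyzer.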

How can I use prefix query on Korean word in Elasticsearch?

I've been doing well using Elasticsearch on English documents.
However, I got stuck on prefix queries when using Korean words.
In detail, a document contains a word such as "한글", and I want to retrieve the document with a prefix query using not only "한" but also "ㅎ" as the search term.
I could not do that using the default settings.
I saw that it's related to icu_normalizer, NFD decomposition, or something similar.
But I could not fully understand what I have to do to get the result "한글" using the search term "ㅎ".
Can anyone help me?
Thanks in advance.
Maybe this code helps you.
curl -XPUT '127.0.0.1:9200/test' -d '{
"settings" : {
"analysis": {
"tokenizer" : {
"autocomplete_tokenizer" : {
"type" : "edgeNGram",
"min_gram" : "1",
"max_gram" : "30",
"token_chars": ["letter", "digit"]
}
},
"char_filter" : {
"nfd_normalizer" : {
"type" : "icu_normalizer",
"name": "nfc",
"mode": "decompose"
}
},
"analyzer": {
"autocomplete_analyzer": {
"type": "custom",
"char_filter": ["nfd_normalizer"],
"tokenizer": "autocomplete_tokenizer"
}
}
}
}
}'
curl '127.0.0.1:9200/test/_analyze?pretty=1&analyzer=autocomplete_analyzer' -d '아버지가 방에 들어가신다. 태권-V'
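The decomposition that the char filter relies on can be inspected with Python's standard unicodedata module. This sketch only shows the Unicode behavior and is not the ICU plugin itself; codepoints is a hypothetical helper. It also shows one detail that may matter here: a standalone "ㅎ" typed on a keyboard is usually a compatibility jamo (U+314E), which plain NFD leaves unchanged, so a compatibility form such as NFKD may be needed on the query side.

```python
import unicodedata

def codepoints(s):
    """List the code points of a string in U+XXXX notation."""
    return [f"U+{ord(ch):04X}" for ch in s]

word = "\ud55c\uae00"        # "한글" as two precomposed syllables
compat_hieut = "\u314e"      # "ㅎ" as a Hangul compatibility jamo

# NFD splits each syllable into its leading consonant, vowel and trailing consonant.
decomposed = unicodedata.normalize("NFD", word)
print(codepoints(decomposed))

# NFD leaves the compatibility jamo alone; NFKD maps it to the conjoining jamo.
query_nfd = unicodedata.normalize("NFD", compat_hieut)
query_nfkd = unicodedata.normalize("NFKD", compat_hieut)
print(codepoints(query_nfd), decomposed.startswith(query_nfd))
print(codepoints(query_nfkd), decomposed.startswith(query_nfkd))
```

Only the NFKD form of the query is a prefix of the decomposed word, so it is worth checking with the _analyze call above which code points the analyzer actually emits for both the indexed text and the search term.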

Query all unique values of a field with Elasticsearch

How do I search for all unique values of a given field with Elasticsearch?
I want the equivalent of a query like select full_name from authors, so I can display the list to users on a form.
You could make a terms facet on your 'full_name' field. But in order to do that properly you need to make sure you're not tokenizing it while indexing, otherwise every entry in the facet will be a different term that is part of the field content. You most likely need to configure it as 'not_analyzed' in your mapping. If you are also searching on it and you still want to tokenize it you can just index it in two different ways using multi field.
You also need to take into account that depending on the number of unique terms that are part of the full_name field, this operation can be expensive and require quite some memory.
For Elasticsearch 1.0 and later, you can leverage terms aggregation to do this,
query DSL:
{
"aggs": {
"NAME": {
"terms": {
"field": "",
"size": 10
}
}
}
}
A real example:
{
"aggs": {
"full_name": {
"terms": {
"field": "authors",
"size": 0
}
}
}
}
Then you can get all unique values of authors field.
size=0 means the number of terms is not limited (this requires Elasticsearch 1.1.0 or later).
Response:
{
...
"aggregations" : {
"full_name" : {
"buckets" : [
{
"key" : "Ken",
"doc_count" : 10
},
{
"key" : "Jim Gray",
"doc_count" : 10
}
]
}
}
}
see Elasticsearch terms aggregations.
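Once the response has been parsed into a dict, the unique values are simply the bucket keys. A minimal sketch, assuming the response shape shown above; unique_values is a hypothetical helper and sample is a hand-written stand-in for a real response.

```python
def unique_values(response, agg_name):
    """Return the bucket keys of a terms aggregation, in response order."""
    buckets = response["aggregations"][agg_name]["buckets"]
    return [bucket["key"] for bucket in buckets]

# Hand-written stand-in for the parsed JSON response shown above.
sample = {
    "aggregations": {
        "full_name": {
            "buckets": [
                {"key": "Ken", "doc_count": 10},
                {"key": "Jim Gray", "doc_count": 10},
            ]
        }
    }
}

print(unique_values(sample, "full_name"))
```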
Intuition:
In SQL parlance:
Select distinct full_name from authors;
is equivalent to
Select full_name from authors group by full_name;
So, we can use the grouping/aggregate syntax in ElasticSearch to find distinct entries.
Assume the following is the structure stored in Elasticsearch:
[{
"author": "Brian Kernighan"
},
{
"author": "Charles Dickens"
}]
What did not work: Plain aggregation
{
"aggs": {
"full_name": {
"terms": {
"field": "author"
}
}
}
}
I got the following error:
{
"error": {
"root_cause": [
{
"reason": "Fielddata is disabled on text fields by default...",
"type": "illegal_argument_exception"
}
]
}
}
What worked like a charm: Appending .keyword to the field name
{
"aggs": {
"full_name": {
"terms": {
"field": "author.keyword"
}
}
}
}
And the sample output could be:
{
"aggregations": {
"full_name": {
"buckets": [
{
"doc_count": 372,
"key": "Charles Dickens"
},
{
"doc_count": 283,
"key": "Brian Kernighan"
}
],
"doc_count": 1000
}
}
}
Bonus tip:
Let us assume the field in question is nested as follows:
[{
"authors": [{
"details": [{
"name": "Brian Kernighan"
}]
}]
},
{
"authors": [{
"details": [{
"name": "Charles Dickens"
}]
}]
}
]
Now the correct query becomes:
{
"aggregations": {
"full_name": {
"aggregations": {
"author_details": {
"terms": {
"field": "authors.details.name"
}
}
},
"nested": {
"path": "authors.details"
}
}
},
"size": 0
}
Working for Elasticsearch 5.2.2
curl -XGET 'http://localhost:9200/articles/_search?pretty' -d '
{
"aggs" : {
"whatever" : {
"terms" : { "field" : "yourfield", "size":10000 }
}
},
"size" : 0
}'
The "size":10000 means get (at most) 10000 unique values. Without this, if you have more than 10 unique values, only 10 values are returned.
The "size":0 means that in result, "hits" will contain no documents. By default, 10 documents are returned, which we don't need.
Reference: bucket terms aggregation
Also note, according to this page, facets have been replaced by aggregations in Elasticsearch 1.0, which are a superset of facets.
The existing answers did not work for me in Elasticsearch 5.X, for the following reasons:
I needed to tokenize my input while indexing.
"size": 0 failed to parse because "[size] must be greater than 0."
"Fielddata is disabled on text fields by default." This means that by default you cannot aggregate on the analyzed full_name text field. However, an unanalyzed keyword field can be used for aggregations.
Solution 1: use the Scroll API. It works by keeping a search context and making multiple requests, each time returning subsequent batches of results. If you are using Python, the elasticsearch module has the scan() helper function to handle scrolling for you and return all results.
Solution 2: use the Search After API. It is similar to Scroll, but provides a live cursor instead of keeping a search context. Thus it is more efficient for real-time requests.
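With either approach the unique values have to be collected on the client. The sketch below shows that step for any iterable of hits and deliberately has no Elasticsearch dependency; unique_field_values is a hypothetical helper, and the index and field names in the comment are assumptions. In practice the hits could come from elasticsearch.helpers.scan, as noted in the comment.

```python
def unique_field_values(hits, field):
    """Collect distinct _source values of one field, keeping first-seen order."""
    seen = set()
    ordered = []
    for hit in hits:
        value = hit.get("_source", {}).get(field)
        if value is not None and value not in seen:
            seen.add(value)
            ordered.append(value)
    return ordered

# Assumed real-world usage (requires the elasticsearch package and a running cluster):
#   from elasticsearch.helpers import scan
#   hits = scan(client, index="authors", query={"query": {"match_all": {}}})
# Here a small hand-written list stands in for the scrolled hits.
fake_hits = [
    {"_source": {"full_name": "Ken"}},
    {"_source": {"full_name": "Jim Gray"}},
    {"_source": {"full_name": "Ken"}},
    {"_source": {}},
]

print(unique_field_values(fake_hits, "full_name"))
```

This reads every document, so for large indices an aggregation on a keyword field is usually cheaper when it is available.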
