Searching by name on ElasticSearch

Say I have an index with thousands of client names, and I need to be able to search them easily in an administration panel, like this:
John Anders
John Smith
Sarah Smith
Bjarne Stroustrup
I want to have full search capabilities on it, which means that:
If I search for John, I should get John Anders and John Smith.
If I search for Smith, I should get both Smiths.
If I search for sarasmit or sara smit, I should get Sarah Smith, because I searched for the beginning of each part of the name and whitespace doesn't matter.
If I search for johers or joh ers, I should get John Anders, because I searched for strings contained in the name.
I already figured out that I could use an analyser with a lowercase filter and a keyword tokenizer, but that doesn't work for every case.
What is the right combination of tokenizers/analysers/queries to use?

Have a look at this; it is a question I asked regarding a similar data set. Below are the index settings and mapping I used to produce some decent results. Development on this has stopped for the time being, but this is what I've produced so far. You can then develop your queries against it:
{
"settings": {
"number_of_shards": 5,
"number_of_replicas": 0,
"analysis": {
"filter": {
"synonym": {
"type": "synonym",
"synonyms_path": "synonyms/synonyms.txt"
},
"my_metaphone": {
"type": "phonetic",
"encoder": "metaphone",
"replace": false
}
},
"analyzer": {
"synonym": {
"tokenizer": "whitespace",
"filter": [
"lowercase",
"synonym"
]
},
"metaphone": {
"tokenizer": "standard",
"filter": [
"my_metaphone"
]
},
"porter": {
"tokenizer": "standard",
"filter": [
"lowercase",
"porter_stem"
]
}
}
}
},
"mappings": {
"mes": {
"_all": {
"enabled": false
},
"properties": {
"pty_forename": {
"type": "multi_field",
"store": "yes",
"fields": {
"pty_forename": {
"type": "string",
"analyzer": "simple"
},
"metaphone": {
"type": "string",
"analyzer": "metaphone"
},
"porter": {
"type": "string",
"analyzer": "porter"
},
"synonym": {
"type": "string",
"analyzer": "synonym"
}
}
},
"pty_full_name": {
"type": "string",
"index": "not_analyzed",
"store": "yes"
},
"pty_surname": {
"type": "multi_field",
"store": "yes",
"fields": {
"pty_surname": {
"type": "string",
"analyzer": "simple"
},
"metaphone": {
"type": "string",
"analyzer": "metaphone"
},
"porter": {
"type": "string",
"analyzer": "porter"
},
"synonym": {
"type": "string",
"analyzer": "synonym"
}
}
}
}
}
}
}
Note that I have used the phonetic plugin, and I also have a comprehensive list of synonyms, which is critical for returning results for richard when dick is entered.
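As a rough illustration of what the synonym filter contributes, the sketch below parses comma-separated synonym lines (the Solr-style format commonly used for synonym files) and expands a token into its whole group. The entries are invented for the example; the real synonyms/synonyms.txt is not shown in this answer, and the helper names are hypothetical.

```python
def load_synonyms(lines):
    """Map every term to the full, sorted group it belongs to."""
    table = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        group = sorted({t.strip().lower() for t in line.split(",") if t.strip()})
        for term in group:
            table[term] = group
    return table


def expand(token, table):
    """Return the token's synonym group, or the token alone if it has none."""
    token = token.lower()
    return table.get(token, [token])


# Hypothetical file content, for illustration only.
SYNONYMS = load_synonyms([
    "# nicknames",
    "richard, dick, rick",
    "william, bill, will",
])
```

With a table like this, expand("Dick", SYNONYMS) yields the group containing richard, which is the effect the synonym analyzer is relied on for here.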

Related

Azure Resource Manager Query with multiple dynamic tag filters

I'm trying to query the Azure Cost Management API and I want to be able to filter the results based off of 2 different types of resource tags but I am having trouble figuring out the format. I can get the single tag filter working, but I'm blanking on the format for multiple. Can anyone throw in their 2 cents?
Working single filter query:
{
"type": "Usage",
"timeframe": "{TimeFrame}",
"dataset": {
"granularity": "None",
"filter": {
"tags": {
"name": "Environment",
"operator": "In",
"values": [
{Environment}
]
}
},
"aggregation": {
"totalCost": {
"name": "PreTaxCost",
"function": "Sum"
}
},
"grouping": [
{
"type": "Dimension",
"name": "{Aggregation}"
}
]
}
}
My attempt at adding more than one filter:
{
"type": "Usage",
"timeframe": "{TimeFrame}",
"dataset": {
"granularity": "None",
"filter": {
"tags": [
{
"name": "Environment",
"operator": "In",
"values": [
{Environment}
]
},
{
"name": "Location",
"operator": "In",
"values": [
{Location}
]
}
]
},
"aggregation": {
"totalCost": {
"name": "PreTaxCost",
"function": "Sum"
}
},
"grouping": [
{
"type": "Dimension",
"name": "{Aggregation}"
}
]
}
}
I am very new to Azure so please don't roast me too hard lol.
Thank you to everyone who took a look at my question, much appreciated even if you don't have an answer for me.
There was an issue with the way my parameters were set causing a bad query. Here is the working code with multiple tag attributes for filtering:
{
"type": "Usage",
"timeframe": "{TimeFrame}",
"dataset": {
"granularity": "None",
"filter": {
"and": [
{
"tags": {
"name": "Location",
"operator": "In",
"values": [{LocationTag}]
}
},
{
"tags": {
"name": "Environment",
"operator": "In",
"values": [{EnvironmentTag}]
}
},
{
"tags": {
"name": "Integrated-System",
"operator": "In",
"values": [{IntegratedSystemTag}]
}
}
]
},
"aggregation": {
"totalCost": {
"name": "PreTaxCost",
"function": "Sum"
}
},
"grouping": [
{
"type": "Dimension",
"name": "{Aggregation}"
}
]
}
}
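Since the number of tag filters is dynamic, the filter object can be generated instead of hand-written. The following is a minimal sketch, assuming the shape shown in the working queries here: a lone tag expression for one tag and an "and" list for several. The function names and the sample tag values are hypothetical.

```python
def tag_expression(name, values):
    """One tag comparison, shaped like the single-filter query."""
    return {"tags": {"name": name, "operator": "In", "values": list(values)}}


def build_filter(tag_values):
    """Build the dataset filter for any number of tags.

    tag_values maps a tag name to the list of accepted values.
    """
    expressions = [tag_expression(n, v) for n, v in tag_values.items()]
    if not expressions:
        return None  # caller should leave the filter out of the request
    if len(expressions) == 1:
        return expressions[0]  # a lone tag needs no "and" wrapper
    return {"and": expressions}


# Example with made-up tag values.
example = build_filter({"Location": ["eastus"], "Environment": ["prod", "dev"]})
```

The single-tag branch is kept separate because the single-filter query in the question places the tags expression directly under filter; whether a one-element "and" list would also be accepted is not something this thread confirms.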

Kibana: Search within text for string

I have a log message in Kibana that contains this:
org.hibernate.exception.GenericJDBCException: Cannot open connection
at org.springframework.orm.hibernate3.HibernateTransactionManager.doBegin(HibernateTransactionManager.java:597)
Actual search that isn't returning results: log_message: "hibernate3"
If I search for "hibernate3" this message will not appear. I am using an Elasticsearch template and have indexed the field, but also want to be able to do case-insensitive full-text searching. Is this possible?
Template that is in use:
{
"template": "filebeat-*",
"mappings": {
"mainProgram": {
"properties": {
"@timestamp": {
"type": "date",
"format": "strict_date_optional_time||epoch_millis"
},
"@version": {
"type": "text"
},
"beat": {
"properties": {
"hostname": {
"type": "text"
},
"name": {
"type": "text"
}
}
},
"class_method": {
"type": "text",
"fielddata": "true",
"index": "true"
},
"class_name": {
"type": "text",
"fielddata": "true"
},
"clientip": {
"type": "ip",
"index": "not_analyzed"
},
"count": {
"type": "long"
},
"host": {
"type": "text",
"index": "not_analyzed"
},
"input_type": {
"type": "text",
"index": "not_analyzed"
},
"log_level": {
"type": "text",
"fielddata": "true",
"index": "true"
},
"log_message": {
"type": "text",
"index": "true"
},
"log_timestamp": {
"type": "text"
},
"log_ts": {
"type": "long",
"index": "not_analyzed"
},
"message": {
"type": "text"
},
"offset": {
"type": "long",
"index": "not_analyzed"
},
"query_params": {
"type": "text",
"index": "true"
},
"sessionid": {
"type": "text",
"index": "true"
},
"source": {
"type": "text",
"index": "not_analyzed"
},
"tags": {
"type": "text"
},
"thread": {
"type": "text",
"index": "true"
},
"type": {
"type": "text"
},
"user_account_combo": {
"type": "text",
"index": "true"
},
"version": {
"type": "text"
}
}
},
"access": {
"properties": {
"@timestamp": {
"type": "date",
"format": "strict_date_optional_time||epoch_millis"
},
"@version": {
"type": "text"
},
"beat": {
"properties": {
"hostname": {
"type": "text"
},
"name": {
"type": "text"
}
}
},
"clientip": {
"type": "ip",
"index": "not_analyzed"
},
"count": {
"type": "long",
"index": "not_analyzed"
},
"host": {
"type": "text",
"index": "true"
},
"input_type": {
"type": "text",
"index": "not_analyzed"
},
"log_timestamp": {
"type": "text"
},
"log_ts": {
"type": "long",
"index": "not_analyzed"
},
"message": {
"type": "text"
},
"offset": {
"type": "long",
"index": "not_analyzed"
},
"query_params": {
"type": "text",
"index": "true"
},
"response_time": {
"type": "long"
},
"sessionid": {
"type": "text",
"index": "true"
},
"source": {
"type": "text",
"index": "not_analyzed"
},
"statuscode": {
"type": "long"
},
"tags": {
"type": "text"
},
"thread": {
"type": "text",
"index": "true"
},
"type": {
"type": "text",
"index": "true"
},
"uripath": {
"type": "text",
"index": "true"
},
"user_account_combo": {
"type": "text",
"index": "true"
},
"verb": {
"type": "text",
"index": "true"
}
}
}
}
}
message: *.hibernate3.*
also works (note that no quotes are needed for this).
For your scenario, what you're looking for is an analyzed string field, which is analyzed first and then indexed. A quote from the doc:
In other words, index this field as full text.
So make sure the mapping of the relevant fields is set up properly, so that you can run a full-text search on the docs.
Assuming the log line is under the field message in Kibana, you could simply search for the word with:
message:"hibernate3"
You might also want to refer to this to understand the difference between term-based and full-text queries.
EDIT
Have the mapping of the field log_message as such:
"log_message": {
"type": "string", <- to make it analyzed
"index": "true"
}
Also try doing a wildcard search as such:
{"wildcard":{"log_message":"*.hibernate3.*"}}
With Kibana 6.4.1 I used the "%" as wildcard.
message: %hibernate3%
For me it was because I was using the ".keyword".
My key was called "message" and I had "message" and "message.keyword" available.
Full text search isn't working on ".keyword".
Not working:
message.keyword : hello
Working:
message : hello
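The difference between the two fields can be imitated in a few lines of Python. This is a sketch under simplifying assumptions: the regex tokenizer is only a rough stand-in for an analyzed text field, and the function names are made up for illustration.

```python
import re


def keyword_terms(value):
    """A keyword sub-field indexes the whole value as a single term."""
    return [value]


def text_terms(value):
    """Rough stand-in for an analyzed text field: lowercase word tokens."""
    return re.findall(r"\w+", value.lower())


def field_matches(query, terms, analyzed):
    """Match one query string against the indexed terms.

    analyzed=True lowercases the query, as an analyzed field would.
    """
    needle = query.lower() if analyzed else query
    return needle in terms
```

For a document whose message is "Hello world", the keyword terms only match the complete, exactly cased string, while the text terms match the single word hello.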

mapper_parsing_exception in new elasticsearch 2.1.1 version

Problem: I created a mapping that works fine in Elasticsearch 1.7.1, but after updating to 2.1.1 it throws an exception.
EXCEPTION
response: '{"error":{"root_cause":[{"type":"mapper_parsing_exception","reason":"analyzer on field [_all] must be set when search_analyzer is set"}],"type":"mapper_parsing_exception","reason":"Failed to parse mapping [movie]: analyzer on field [_all] must be set when search_analyzer is set","caused_by":{"type":"mapper_parsing_exception","reason":"analyzer on field [_all] must be set when search_analyzer is set"}},"status":400}',
toString: [Function],
toJSON: [Function] }
{
"settings": {
"number_of_shards": 1,
"number_of_replicas": 0,
"analysis": {
"filter": {
"nGram_filter": {
"type": "nGram",
"min_gram": 2,
"max_gram": 20,
"token_chars": [
"letter",
"digit",
"punctuation",
"symbol"
]
}
},
"analyzer": {
"nGram_analyzer": {
"type": "custom",
"tokenizer": "whitespace",
"filter": [
"lowercase",
"asciifolding",
"nGram_filter"
]
},
"whitespace_analyzer": {
"type": "custom",
"tokenizer": "whitespace",
"filter": [
"lowercase",
"asciifolding"
]
}
}
}
},
"mappings": {
"movie": {
"_all": {
"index_analyzer": "nGram_analyzer",
"search_analyzer": "whitespace_analyzer"
},
"properties": {
"movieName": {
"type": "string",
"index": "not_analyzed"
},
"movieYear": {
"type": "double"
},
"imageUrl": {
"type": "string"
},
"genre": {
"type": "string"
},
"director": {
"type": "string"
},
"producer": {
"type": "string"
},
"cast": {
"type": "string"
},
"writer": {
"type": "string"
},
"synopsis": {
"type": "string"
},
"rating": {
"type": "double"
},
"price": {
"type": "double"
},
"format": {
"type": "string"
},
"offer": {
"type": "double"
},
"offerString": {
"type": "string"
},
"language": {
"type": "string"
}
}
}
}
}
The error is quite clear if you ask me: you need to specify analyzer for _all in your movie mapping. The index_analyzer setting was removed in Elasticsearch 2.0, so use analyzer in its place:
"_all": {
"analyzer": "nGram_analyzer",
"search_analyzer": "whitespace_analyzer"
},
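If the same setting appears in several places of a larger mapping, the rename can be applied programmatically before re-creating the index. The following is a minimal sketch, assuming the mapping has been loaded into Python dicts and lists; the helper name is hypothetical and it does nothing beyond renaming the key.

```python
def rename_index_analyzer(node):
    """Return a copy of a mapping with index_analyzer renamed to analyzer.

    Note: if a dict already has both keys, the later one wins, so check
    such mappings by hand.
    """
    if isinstance(node, dict):
        renamed = {}
        for key, value in node.items():
            new_key = "analyzer" if key == "index_analyzer" else key
            renamed[new_key] = rename_index_analyzer(value)
        return renamed
    if isinstance(node, list):
        return [rename_index_analyzer(item) for item in node]
    return node  # strings, numbers and booleans are returned unchanged
```

The function returns a new structure and leaves the original mapping untouched, which makes it easy to compare the two before sending anything to the cluster.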

Prioritize exact words first and then others while using query_string query in elasticsearch

If I search for the term "Liebe", the current query and analyzers rank results that contain "Liebe" as part of a different word, such as "Verlieben", above results that contain the word on its own.
It should be the other way around.
I am also using some advanced filters and aggregations, but here is the most basic query that I use to search.
{
"query": {
"query_string": {
"query": "Liebe",
"default_operator": "AND",
"analyzer": "my_analyzer1"
}
},
"size": "10",
"from": 0
}
The analyzers and index settings are as follows:
{
"settings": {
"analysis": {
"filter": {
"nGram_filter": {
"type": "nGram",
"min_gram": 2,
"max_gram": 20,
"token_chars": [
"letter",
"digit",
"punctuation",
"symbol"
]
}
},
"analyzer": {
"nGram_analyzer": {
"type": "custom",
"tokenizer": "whitespace",
"filter": [
"lowercase",
"asciifolding",
"nGram_filter"
]
},
"whitespace_analyzer": {
"type": "custom",
"tokenizer": "whitespace",
"filter": [
"lowercase",
"asciifolding"
]
},
"my_analyzer1":{
"type": "custom",
"tokenizer": "whitespace",
"filter": [
"lowercase",
"my_stop'.$s_id.'",
"asciifolding"
]
}
}
}
},
"mappings": {
"product": {
"_all": {
"index_analyzer": "nGram_analyzer",
"search_analyzer": "whitespace_analyzer"
},
"properties": {
'.$mapping.'
}
}
}
}
Please ignore the $mapping. These are dynamic fields that reside in the index based on some settings in my framework.
Can anyone point me in a direction where I can get the ranking described above without changing much?
I have looked at many options, such as the match query, but I don't have a fixed set of fields, so I can't use that. I want both exact matches and partial matches (using nGrams).
Please help!
Thanks!
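To see why both kinds of document are found for the same search, it helps to list what an nGram filter with min_gram 2 and max_gram 20 produces for each word. The sketch below is plain Python that imitates those settings on a single lowercase token; it is an illustration of the indexed terms, not a diagnosis of how the scores are computed.

```python
def ngrams(token, min_gram=2, max_gram=20):
    """All substrings of token with length between min_gram and max_gram."""
    grams = []
    for start in range(len(token)):
        for size in range(min_gram, max_gram + 1):
            if start + size > len(token):
                break  # longer grams would run past the end of the token
            grams.append(token[start:start + size])
    return grams


# Both words produce the term "liebe", so the index alone cannot tell
# a whole word from a fragment of a longer word.
whole_word = ngrams("liebe")
inside_word = ngrams("verlieben")
```

Because the gram liebe is present for both tokens, any preference for the exact word has to come from somewhere other than the nGram-analyzed _all field.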

Elasticsearch wildcard query string with fuzziness

We have an index of items on which I'm attempting to do a fuzzy wildcard search on the item's name.
The query:
{
"from": 0,
"size": 10,
"query": {
"bool": {
"must": {
"query_string": {
"fields": [
"name.suggest"
],
"query": "avacado*",
"fuzziness": 0.7
}
}
}
}
}
The field in the index and the analyzers at play:
"suggest_analyzer":{
"type": "custom",
"tokenizer": "standard",
"filter": ["standard", "lowercase", "shingle", "punctuation"]
}
"punctuation" : {
"type" : "word_delimiter",
"preserve_original": "true"
}
"name": {
"fields": {
"name": {
"type": "string",
"analyzer": "stem"
},
"suggest":{
"type": "string",
"analyzer": "suggest_analyzer"
},
"untouched": {
"include_in_all": false,
"index": "not_analyzed",
"index_options": "docs",
"omit_norms": true,
"type": "string"
},
"untouched_lowercase": {
"type": "string",
"index_analyzer": "lowercase",
"search_analyzer": "lowercase"
}
},
"type": "multi_field"
},
The problem is this:
An item with the name "Avocado Test" will match the following
avocado*
avo*
avacado
but fails to match for
avacado*
ava*
ava~2
I can't seem to make fuzzy work with wildcards; either fuzzy works or wildcards work, but not both in combination.
The ES version is 1.3.1.
Note that my query is simplified; we have other filtering going on, but I boiled it down to just the query to remove any ambiguity from the results. I've attempted to use the suggest features, but they don't allow the level of filtering we need.
Is there any other way to handle suggest/typeahead-style searching with fuzziness to catch misspellings?
You could try EdgeNgramTokenFilter: use it in an analyzer applied to the desired field and do a fuzzy search on it.
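The idea behind that suggestion can be sketched in plain Python: index every prefix of the word, then accept a typed prefix if it is within a small edit distance of one of those prefixes. This is only an illustration under stated assumptions; plain Levenshtein distance is used as a stand-in for Elasticsearch's fuzzy matching, and all function names are hypothetical.

```python
def edge_ngrams(token, min_gram=1, max_gram=20):
    """Prefixes of token, the terms an edge n-gram filter would emit."""
    return [token[:n] for n in range(min_gram, min(max_gram, len(token)) + 1)]


def edit_distance(a, b):
    """Classic Levenshtein distance (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # delete from a
                               current[j - 1] + 1,       # insert into a
                               previous[j - 1] + cost))  # substitute
        previous = current
    return previous[-1]


def fuzzy_prefix_hit(typed, indexed_token, max_edits=1):
    """True if typed is within max_edits of any edge n-gram of the token."""
    return any(edit_distance(typed, gram) <= max_edits
               for gram in edge_ngrams(indexed_token))
```

Under this sketch the misspelled prefix ava is one edit away from the indexed prefix avo of avocado, which is the case the wildcard query could not cover.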
