Global Search in Elasticsearch

Working on Elasticsearch, my use case is very straightforward. When a user types in a search box, I want to search my entire data set irrespective of field or column or any other condition (search all data and return every occurrence of the searched word across documents).
This might be covered in the documentation, but I'm not able to understand it. Can somebody explain this?

The easiest way to search across all fields in an index is to use the _all field.
The _all field is a catch-all field which concatenates the values of all of the other fields into one big string, using space as a delimiter, which is then analyzed and indexed, but not stored.
For example:
PUT my_index/user/1
{
  "first_name": "John",
  "last_name": "Smith",
  "date_of_birth": "1970-10-24"
}

GET my_index/_search
{
  "query": {
    "match": {
      "_all": "john smith 1970"
    }
  }
}
Highlighting is supported so matching occurrences can be returned in your search results.
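For instance, a highlight block can be added to the search request. This is a sketch assuming the same my_index example as above; require_field_match is set to false so the original fields are highlighted even though the query itself matched on _all:
GET my_index/_search
{
  "query": {
    "match": {
      "_all": "john smith 1970"
    }
  },
  "highlight": {
    "require_field_match": false,
    "fields": {
      "*": {}
    }
  }
}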
Drawbacks
There are two main drawbacks to this approach:
Additional disk space and memory are needed to store the _all field
You lose flexibility in how the data and search terms are analysed
A better approach is to disable the _all field and instead list out the fields you are interested in:
GET /_search
{
  "query": {
    "query_string": {
      "query": "this AND that OR thus",
      "fields": [
        "name",
        "addressline1",
        "dob",
        "telephone",
        "country",
        "zipcode"
      ]
    }
  }
}
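Disabling _all itself is done in the index mapping. A minimal sketch, assuming an older Elasticsearch version where _all still exists (it was deprecated in 6.x and removed in 7.x) and the same user type as in the earlier example:
PUT my_index
{
  "mappings": {
    "user": {
      "_all": { "enabled": false },
      "properties": {
        "first_name": { "type": "string" },
        "last_name": { "type": "string" },
        "date_of_birth": { "type": "date" }
      }
    }
  }
}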

The query_string query (link) can do this job for you.
It supports partial search effectively; here is my analysis: https://stackoverflow.com/a/43321606/2357869
query_string is more powerful than the match, term and wildcard queries.
Scenario 1 - Suppose you want to search for "Hello":
Then go with:
{
  "query": {
    "query_string": { "query": "*Hello*" }
  }
}
It will match words like ABCHello, HelloABC, ABCHelloABC.
By default it will search for hello in all fields (_all).
Scenario 2 - Suppose you want to search for "Hello" or "World":
Then go with:
{
  "query": {
    "query_string": { "query": "*Hello* *World*" }
  }
}
It will match words like ABCHello, HelloABC, ABCHelloABC, ABCWorldABC, ABChello, ABCworldABC, etc.
It searches for Hello OR World, so any document containing either Hello or World is returned.
By default query_string (link) uses the OR operator; you can change that.
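For example, to require both words instead, the default operator can be switched to AND, a sketch of the same query with default_operator set:
{
  "query": {
    "query_string": {
      "query": "*Hello* *World*",
      "default_operator": "AND"
    }
  }
}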

Related

elastic search exact phrase matching

I am new to ES. I am having trouble finding exact phrase matches.
Let's assume my index has a field called movie_name.
Let's assume I have 3 documents with the following values
movie_name = Mad Max
movie_name = mad max
movie_name = mad max 3d
If my search query is Mad Max, I want the first 2 documents to be returned but not the 3rd.
If I do the "not_analyzed" solution I will get only document 1 but not 2.
What am I missing?
I was able to do it using the following commands: basically, create a custom analyzer that uses the keyword tokenizer to prevent tokenization, then use that analyzer in the "mappings" for the desired field, in this case "movie_name".
PUT /movie
{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "keylower": {
            "tokenizer": "keyword",
            "filter": "lowercase"
          }
        }
      }
    }
  },
  "mappings": {
    "search": {
      "properties": {
        "movie_name": { "type": "string", "analyzer": "keylower" }
      }
    }
  }
}
Use phrase matching like this:
{
  "query": {
    "match_phrase": {
      "movie_name": "a"
    }
  }
}
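Putting the two answers together, a quick sanity check might look like this. This is a sketch assuming the keylower mapping above; the three sample documents come from the question, and only documents 1 and 2 should be returned:
PUT /movie/search/1
{ "movie_name": "Mad Max" }

PUT /movie/search/2
{ "movie_name": "mad max" }

PUT /movie/search/3
{ "movie_name": "mad max 3d" }

GET /movie/search/_search
{
  "query": {
    "match_phrase": {
      "movie_name": "mad max"
    }
  }
}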

How to filter out fields that do not exist in elastic search?

I would like to check if a field exists, and return results for documents where it does not exist. I am using the Golang library Elastic: https://github.com/olivere/elastic
I tried the following but it does not work:
e := elastic.NewExistsFilter("my_tag")
n := elastic.NewNotFilter(e)
filters = append(filters, n)
OK, I won't go deep into your language's query API. Since you want to search for documents where a field does not exist (is null), use an exists filter inside a must_not (if you use bool filters):
{
  "query": {
    "filtered": {
      "filter": {
        "bool": {
          "must_not": [
            {
              "exists": {
                "field": "your_field"
              }
            }
          ]
        }
      }
    }
  },
  "from": 0,
  "size": 500
}
Hope this helps!
Thanks
You can use the exists query with a bool query's must_not:
GET /_search
{
  "query": {
    "bool": {
      "must_not": {
        "exists": {
          "field": "your_field"
        }
      }
    }
  }
}
Tested in Elasticsearch 6.5
You can create a bool query for not exists like this:
existsQuery := elastic.NewExistsQuery(fieldName)
existsBoolQuery := elastic.NewBoolQuery().MustNot(existsQuery)
I won't try to provide a complete solution, since I'm not really familiar with the library you're using (or, indeed, the Go language).
However, Lucene doesn't support purely negative querying as you have here. Lucene needs to be told what to match; negations like this serve strictly to prohibit search results, but do not implicitly match everything else.
In order to do what you are looking for, you would want to use a boolean query to combine your not filter with a match all (which I see is available in the library), as sketched below.
Note: As with anytime you use a match all, performance may suffer.
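In query DSL terms, the match-all-plus-negation combination described above would look roughly like this, a sketch using the my_tag field from the question:
{
  "query": {
    "bool": {
      "must": {
        "match_all": {}
      },
      "must_not": {
        "exists": {
          "field": "my_tag"
        }
      }
    }
  }
}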

Searching a number in a string field with query_string on Elasticsearch

Among other text fields, I've got this string field in my Elasticsearch index:
"user": { "type": "string", "analyzer": "simple", "norms": { "enabled": False } }
It gets filled with a typical username, e.g. "simon".
Using query_string I can limit my search results for "other search terms" to this particular user:
'query': { 'query_string': { 'query': 'user:simon other search terms' } }
Default operator is set to "AND". However, in case a username only consists of a number (saved and indexed as string), Elasticsearch appears to ignore the "user:..." statement. For example:
'query': { 'query_string': { 'query': 'user:111 other search terms' } }
yields the same results as
'query': { 'query_string': { 'query': 'other search terms' } }
Any idea what might be the cause or how to fix it?
You are using the simple analyzer. As the documentation says:
An analyzer of type simple that is built using a Lower Case Tokenizer.
The lower case tokenizer performs the function of the letter tokenizer and the lower case token filter together. The problem with your specific test data is that the letter tokenizer divides text at non-letters, and digits are non-letters. This method from the Java API defines what exactly counts as a letter; in contrast, this method defines what counts as a digit.
You may want to look at the standard tokenizer instead.
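A minimal sketch of what that could look like, assuming a hypothetical my_index with the user field remapped to the standard analyzer (which keeps digit-only tokens such as 111, so user:111 would then match):
PUT my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "user": {
          "type": "string",
          "analyzer": "standard",
          "norms": { "enabled": false }
        }
      }
    }
  }
}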

Elastic search having "not_analyzed" and "analyzed" together

I'm new to Elasticsearch. My business requirement is that I should also do partial matching on searchable fields, so I ended up with wildcard queries. My query is like this:
{
  "query": {
    "wildcard": "*search_text_here*"
  }
}
Suppose I'm searching for Red Flowers. Before the above query I was using an analyzed match query, which gave me results for Red and Flowers individually, but now my query only works when Red Flowers appear together.
Use a match_phrase query as shown below; for more information refer to the ES docs:
GET /my_index/my_type/_search
{
  "query": {
    "match_phrase": {
      "title": "red flowers"
    }
  }
}

How do I sort the search results according to the number of items in ElasticSearch?

Let's say that I store documents like this in ElasticSearch:
{
  'name': 'user name',
  'age': 43,
  'location': 'CA, USA',
  'bio': 'into java, scala, python ..etc.',
  'tags': ['java', 'scala', 'python', 'django', 'lift']
}
And let's say that I search using location=CA; how can I sort the results according to the number of items in 'tags'?
I would like to list the people with the most tags on the first page.
You can do it by indexing an additional field that contains the number of tags, on which you can then easily sort your results. Otherwise, if you are willing to pay a small performance cost at query time, there's a nice solution that doesn't require reindexing your data: you can sort based on a script like this:
{
  "query": {
    "match_all": {}
  },
  "sort": {
    "_script": {
      "script": "doc['tags'].values.length",
      "type": "number",
      "order": "desc"
    }
  }
}
As you can read from the script based sorting section:
Note, it is recommended, for single custom based script based sorting,
to use custom_score query instead as sorting based on score is faster.
That means that it'd be better to use a custom score query to influence your score, and then sort by score, like this:
{
  "query": {
    "custom_score": {
      "query": {
        "match_all": {}
      },
      "script": "_score * doc['tags'].values.length"
    }
  }
}
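For the first option mentioned above (indexing an additional count field), a rough sketch might look like this; tags_count is a hypothetical field that your indexing code would have to keep in sync with the tags array:
PUT my_index/user/1
{
  "name": "user name",
  "location": "CA, USA",
  "tags": ["java", "scala", "python", "django", "lift"],
  "tags_count": 5
}

GET my_index/_search
{
  "query": {
    "match": { "location": "CA" }
  },
  "sort": [
    { "tags_count": { "order": "desc" } }
  ]
}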
