Elastic Search size to unlimited - node.js

I'm new to Elasticsearch. I'm having trouble writing a search query that returns all matching records in my collection. The following is my search query:
{
  "size": "total no of record",  // Here I need to get the total no of records in the collection
  "query": {
    "match": {
      "first_name": "vineeth"
    }
  }
}
Running this query returns at most 10 records, but I'm sure there are more than 10 matching records in my collection. I searched a lot and finally found the size parameter, but in my case I don't know the total record count. I think giving an arbitrarily large number to size is not good practice, so how should I handle this situation? Please help me solve this issue. Thanks.

It's not very common to display all results at once; instead, use from and size to specify a range of results to fetch. So your query (fetching the first 10 results) should look something like this:
{
  "from": 0,
  "size": 10,
  "query": {
    "match": {
      "first_name": "vineeth"
    }
  }
}
This should work better than setting size to a ridiculously large value. To check how many documents matched your query you can get the hits.total (total number of hits) from the response.
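As an illustration, here is a small Node.js sketch of how from/size paging and hits.total fit together (the helper names are hypothetical; note that hits.total changed shape in Elasticsearch 7.0, which the second helper accounts for):

```javascript
// Hypothetical helpers around from/size paging.

// Convert a 0-based page number into from/size request parameters.
function pageParams(page, pageSize) {
  return { from: page * pageSize, size: pageSize };
}

// Read the total hit count from a search response. Before 7.0,
// hits.total is a plain number; from 7.0 on it is an object such as
// { value: 42, relation: "eq" }.
function totalHits(response) {
  const total = response.hits.total;
  return typeof total === "number" ? total : total.value;
}
```

With these, page 3 at 10 results per page becomes `{ from: 30, size: 10 }`, and you can read the match count from any response version.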

To fetch all the records you can also use the scroll API. It's like a cursor in a database: with scroll you get the docs batch by batch, which keeps CPU and memory usage down.
For more info refer to:
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-request-scroll.html

To get all records, per the docs, you should use scroll.
Here is the doc:
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-request-scroll.html
But the idea is to specify your search and indicate that you want to scroll it:
curl -XGET 'localhost:9200/twitter/tweet/_search?scroll=1m' -d '
{
  "query": {
    "match": {
      "title": "elasticsearch"
    }
  }
}'
In the scroll parameter you specify how long you want the search context to stay available.
Then you can retrieve the results with the returned scroll_id and the scroll API.
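A minimal sketch of that retrieval loop with the legacy `elasticsearch` Node.js client (the function name, the 1-minute scroll window, and the batch size of 100 are my own choices, not from the answer):

```javascript
// Sketch of a scroll loop; assumes a client with promise-returning
// client.search / client.scroll methods (legacy `elasticsearch` client API).
async function scrollAll(client, index, query, batchHandler) {
  // The initial search opens a scroll context kept alive for 1 minute.
  let response = await client.search({
    index,
    scroll: "1m",
    size: 100, // docs per batch; tune to your document size
    body: { query },
  });
  // Keep pulling batches until an empty page comes back.
  while (response.hits.hits.length > 0) {
    batchHandler(response.hits.hits);
    response = await client.scroll({
      scroll_id: response._scroll_id,
      scroll: "1m",
    });
  }
}
```

Each round trip hands one batch to `batchHandler`, so the full result set is never held in memory at once.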

In newer versions of Elasticsearch (e.g. 7.x), it is better to use pagination (search_after) than scroll, parts of which are deprecated:
https://www.elastic.co/guide/en/elasticsearch/reference/current/paginate-search-results.html
Deprecated in 7.0.0:
GET /_search/scroll/<scroll_id>
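For completeness, a sketch of what a search_after request body can look like (the timestamp sort field is an assumption; in practice you sort on your own fields plus a unique tiebreaker, and pass the sort values of the last hit of one page into the next request):

```javascript
// Sketch of a search_after request body. The "timestamp" field is an
// assumed example; the trailing _id sort acts as a tiebreaker so paging
// stays stable when documents share the same timestamp.
function nextPageBody(query, lastHitSort, size = 1000) {
  const body = {
    size,
    query,
    sort: [{ timestamp: "asc" }, { _id: "asc" }],
  };
  if (lastHitSort) {
    // sort values of the last hit from the previous page
    body.search_after = lastHitSort;
  }
  return body;
}
```

The first page omits search_after; every later page carries the previous page's last sort values forward.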


How do I get the top 20 most-searched queries in Elasticsearch?

I have stored sentences in Elasticsearch for autosuggestion, in this format:
{
  "text": "what is temperature in chicago"
}
It suggests correctly when w, wha, or what is typed, but I am wondering if there is any way I can fetch the most-searched sentences from Elasticsearch.
Sounds like what you need is a terms aggregation. Your request body should look something like this:
{
  "query": {
    // your query
  },
  "aggs": {
    "common": {
      "terms": { "field": "text.keyword", "size": 20 }
    }
  }
}
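Once the response comes back, the top sentences live in the aggregation buckets; a small Node.js sketch of pulling them out (topTerms is a hypothetical helper name; "common" matches the aggregation name used above):

```javascript
// Pull the top sentences and their document counts out of the
// "common" terms aggregation in a search response.
function topTerms(response) {
  return response.aggregations.common.buckets.map((b) => ({
    text: b.key,
    count: b.doc_count,
  }));
}
```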
If I understand your question correctly, you want the most common searches for a given input, and a simple solution can be implemented.
Just track what the user finally selects (which ES document) and increment a counter keyed by its _id.
A batch system that syncs/indexes this data back into ES gives each document a counter value.
Use this when serving suggestions, i.e. sort on the count field.
This will start working properly as users start using the system.
Your ES document would look like:
{
  "text": "what is temperature in chicago",
  "count": 10
}
Admittedly this is a very raw solution and there are many alternatives, but it's a nice place to start.
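The increment step can be sketched as an Update API request body (the field name count follows the example above; the Painless script itself is my assumption, not from the answer):

```javascript
// Body for POST /<index>/_update/<id> that bumps the "count" field,
// creating it on first use. Script content is an illustrative sketch.
function incrementCountBody() {
  return {
    script: {
      lang: "painless",
      source:
        "if (ctx._source.count == null) { ctx._source.count = 1 } " +
        "else { ctx._source.count += 1 }",
    },
  };
}
```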

Elasticsearch Pagination From - Size Result window is too large

I would like to paginate my documents, with 1,000 results per page.
Collection size: 95,000 docs.
Starting from product number 10,000, I would like to fetch 1,000 results from that point. However, using this query I get the error below.
Query:
this.es.search({
  index: indexName,
  type: type,
  from: 10000,
  size: 1000,
  body: {}
});
Error:
{
"msg": "[query_phase_execution_exception] Result window is too large, from + size must be less than or equal to: [10000] but was [16000]. See the scroll api for a more efficient way to request large data sets. This limit can be set by changing the [index.max_result_window] index level setting.",
"path": "/products/Product/_search",
"query": {},
"body": "{\"from\":10000,\"size\":1000}",
"statusCode": 500,
"response": "{\"error\":{\"root_cause\":[{\"type\":\"query_phase_execution_exception\",\"reason\":\"Result window is too large, from + size must be less than or equal to: [10000] but was [16000]. See the scroll api for a more efficient way to request large data sets. This limit can be set by changing the [index.max_result_window] index level setting.\"}],\"type\":\"search_phase_execution_exception\",\"reason\":\"all shards failed\",\"phase\":\"query\",\"grouped\":true,\"failed_shards\":[{\"shard\":0,\"index\":\"products\",\"node\":\"bTGkra6dTB-0Z_8jVUNByQ\",\"reason\":{\"type\":\"query_phase_execution_exception\",\"reason\":\"Result window is too large, from + size must be less than or equal to: [10000] but was [16000]. See the scroll api for a more efficient way to request large data sets. This limit can be set by changing the [index.max_result_window] index level setting.\"}}]},\"status\":500}"
}
How can I increase the paging limit? Any other method is also welcome, thank you.
Based on the Elasticsearch documentation (https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-from-size.html):
Note that from + size cannot be more than the index.max_result_window index setting, which defaults to 10,000.
As an alternative, you can use scroll or search_after. For more real-time queries, search_after is more suitable.
Scroll documentation
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-scroll.html
SearchAfter documentation
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-search-after.html
Increase max_result_window as below; it works:
PUT indexName/_settings
{
  "index": {
    "max_result_window": 10000000
  }
}
As the error shows, Elasticsearch will only serve up to 10,000 rows this way, so to fetch more than that you need to use the scroll API:
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-scroll.html
An example of a JS search with scroll can be found here:
async await with elasticsearch search/scroll
Well, I could increase the pagination limit using this curl command:
curl -XPUT "http://localhost:9200/my_index/_settings" -d '{ "index" : { "max_result_window" : 500000 } }'
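The same settings change can be made from Node.js; a sketch using the legacy `elasticsearch` client's indices.putSettings (the modern @elastic/elasticsearch client exposes a similar method):

```javascript
// Raise index.max_result_window for one index. Note this trades
// memory/CPU for deeper from/size paging; scroll or search_after
// remain the better fit for very deep result sets.
async function raiseResultWindow(client, index, limit) {
  return client.indices.putSettings({
    index,
    body: { index: { max_result_window: limit } },
  });
}
```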

How to fuzzy query against multiple fields in elasticsearch?

Here's my query as it stands:
"query": {
  "fuzzy": {
    "author": {
      "value": query,
      "fuzziness": 2
    },
    "career_title": {
      "value": query,
      "fuzziness": 2
    }
  }
}
This is part of a callback in Node.js. Query (which is being plugged in as a value to compare against) is set earlier in the function.
What I need it to be able to do is to check both the author and the career_title of a document, fuzzily, and return any documents that match in either field. The above statement never returns anything, and whenever I try to access the object it should create, it says it's undefined. I understand that I could write two queries, one to check each field, then sort the results by score, but I feel like searching every object for one field twice will be slower than searching every object for two fields once.
If you look here, you'll see that in a multi_match query you can specify the fuzziness:
https://www.elastic.co/guide/en/elasticsearch/guide/current/fuzzy-match-query.html
{
  "query": {
    "multi_match": {
      "fields": [ "text", "title" ],
      "query": "SURPRIZE ME!",
      "fuzziness": "AUTO"
    }
  }
}
Something like this. Hope this helps.
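Applied to the question's fields, the request body could be built like this in Node.js (fuzzyMultiMatch is a hypothetical helper name; author and career_title come from the question):

```javascript
// Build a fuzzy multi_match body covering both fields from the
// question, so one query matches typos in either field.
function fuzzyMultiMatch(query) {
  return {
    query: {
      multi_match: {
        fields: ["author", "career_title"],
        query,
        fuzziness: 2,
      },
    },
  };
}
```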

Search within single document using Elasticsearch

If I want to search an index I can use:
$ curl -XGET 'X/index1/_search?q=title:ES'
If I want to search a document type I can use:
$ curl -XGET 'X/index1/docType1/_search?q=title:ES'
But if I want to search a specific document, this doesn't work:
$ curl -XGET 'X/index1/docType1/documentID/_search?q=title:ES'
Is there a simple work around for this so that I can search within a single document as opposed to an entire index or an entire document type? To explain why I need this, I have to do some resource intensive queries to find what I'm looking for. Once I find the documents I need, I don't actually need the whole document, just the highlighted portion that matches the query. But I don't want to store all the highlighted hits in memory because I might not need them for a few hours and at times they could take up a lot of space (I would also prefer not to write them to disk). I'd rather store a list of document ids so that when I need the highlighted portion of a document I can just run the highlighted query on a specific document and get back the highlighted portion. Thanks in advance for your help!
You can index the document's id as a field, then when you query, include the unique document id as a term to narrow the results just to that single document.
curl -XPOST 'X/index1/docType1/_search' -d '{
  "query": {
    "bool": {
      "must": [
        { "match": { "doc": "223" } },
        { "match": { "title": "highlight me please" } }
      ]
    }
  }
}'
You can use the ids query in Elasticsearch to search within a single document. By default, Elasticsearch indexes a field called _uid, which is the combination of type and id, so that it can be used for queries, aggregations, scripts, and sorting.
So the query you need is as follows:
curl -XGET 'X/index1/_search' -d '{
"query": {
"bool": {
"must": [
{
"match": {
"title": "ES"
}
},
{
"ids": {
"type" : "docType1",
"values": [
"documentID"
]
}
}
]
}
}
}'
If you need to search across multiple documents, specify their document ids in the values array of the ids query.
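Putting the answer together as a Node.js body builder (a hypothetical helper; note that newer Elasticsearch versions drop the type parameter from the ids query, so it is omitted here):

```javascript
// Restrict a match query to a single document by combining it with
// an ids query inside a bool/must.
function searchWithinDoc(docId, titleText) {
  return {
    query: {
      bool: {
        must: [
          { match: { title: titleText } },
          { ids: { values: [docId] } },
        ],
      },
    },
  };
}
```

Passing several ids in the values array generalizes this to a small set of documents.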

Fuzziness settings in ElasticSearch

I need a way for my search engine to handle small typos in search strings and still return the right results.
According to the ElasticSearch docs, there are three values that are relevant to fuzzy matching in text queries: fuzziness, max_expansions, and prefix_length.
Unfortunately, there is not a lot of detail available on exactly what these parameters do, and what sane values for them are. I do know that fuzziness is supposed to be a float between 0 and 1.0, and the other two are integers.
Can anyone recommend reasonable "starting point" values for these parameters? I'm sure I will have to tune by trial and error, but I'm just looking for ballpark values to correctly handle typos and misspellings.
I found it helpful, when using fuzzy matching, to combine an exact match query and a fuzzy match query (with the same term) in order to retrieve results for typos while still ensuring that exact instances of the entered search word appear highest in the results.
I.e.:
{
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "_all": search_term
          }
        },
        {
          "match": {
            "_all": {
              "query": search_term,
              "fuzziness": "1",
              "prefix_length": 2
            }
          }
        }
      ]
    }
  }
}
A few more details are listed here: https://medium.com/@wampum/fuzzy-queries-ae47b66b325c
According to the fuzzy query doc, the default values are 0.5 for min_similarity (which corresponds to your fuzziness option), "unbounded" for max_expansions, and 0 for prefix_length.
This answer should help you understand the min_similarity option; 0.5 seems to be a good start.
prefix_length and max_expansions will affect performance: you can develop with the default values, but be sure it will not scale (Lucene developers were even considering a default of 2 for prefix_length). I would recommend running benchmarks to find the right values for your specific case.
