Elasticsearch Pagination From - Size Result window is too large - node.js

I would like to page through my documents, with 1,000 results per page.
Index size: 95,000 documents.
Starting from document number 10,000, I would like to get the next 1,000 results. However, the query below fails with the error shown after it.
Query:
this.es.search({
index: indexName,
type: type,
from: 10000,
size: 1000,
body: {}
});
Error:
{
"msg": "[query_phase_execution_exception] Result window is too large, from + size must be less than or equal to: [10000] but was [16000]. See the scroll api for a more efficient way to request large data sets. This limit can be set by changing the [index.max_result_window] index level setting.",
"path": "/products/Product/_search",
"query": {},
"body": "{\"from\":10000,\"size\":1000}",
"statusCode": 500,
"response": "{\"error\":{\"root_cause\":[{\"type\":\"query_phase_execution_exception\",\"reason\":\"Result window is too large, from + size must be less than or equal to: [10000] but was [16000]. See the scroll api for a more efficient way to request large data sets. This limit can be set by changing the [index.max_result_window] index level setting.\"}],\"type\":\"search_phase_execution_exception\",\"reason\":\"all shards failed\",\"phase\":\"query\",\"grouped\":true,\"failed_shards\":[{\"shard\":0,\"index\":\"products\",\"node\":\"bTGkra6dTB-0Z_8jVUNByQ\",\"reason\":{\"type\":\"query_phase_execution_exception\",\"reason\":\"Result window is too large, from + size must be less than or equal to: [10000] but was [16000]. See the scroll api for a more efficient way to request large data sets. This limit can be set by changing the [index.max_result_window] index level setting.\"}}]},\"status\":500}"
}
How can I increase the pagination limit?
Any other approach is also welcome, thank you.

According to the Elasticsearch documentation (https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-from-size.html):
Note that from + size can not be more than the index.max_result_window
index setting which defaults to 10,000
As an alternative, you can use scroll or search_after. For real-time queries, search_after is more suitable.
Scroll documentation
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-scroll.html
SearchAfter documentation
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-search-after.html
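For illustration, here is a rough node.js sketch of the search_after approach mentioned above. It is not taken from the question: it assumes that client.search() resolves to the response body (as the older elasticsearch npm client does) and that every document has a unique, sortable field, here given the hypothetical name productId.

```javascript
// Sketch only: pages through an index with search_after instead of from/size.
// Assumptions (not from the question): client.search() resolves to the response
// body, and each document has a unique sortable field named "productId".
async function fetchPage(client, indexName, lastSort, pageSize = 1000) {
  const body = {
    size: pageSize,
    query: { match_all: {} },
    // search_after needs a deterministic sort, ideally on a unique field.
    sort: [{ productId: 'asc' }]
  };
  if (lastSort) {
    // Sort values of the last hit on the previous page.
    body.search_after = lastSort;
  }
  const response = await client.search({ index: indexName, body });
  const hits = response.hits.hits;
  // Remember where this page ended so the next call can continue from there.
  const nextSort = hits.length > 0 ? hits[hits.length - 1].sort : null;
  return { hits, nextSort };
}
```

Call fetchPage with lastSort set to null for the first page, then pass the returned nextSort into the following call. Unlike from/size, this does not allow jumping straight to an arbitrary page number.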

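Increasing max_result_window as shown below works. Keep in mind that search requests take heap memory and time proportional to from + size, so a very large window can be expensive.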
PUT indexName/_settings
{
"index": {
"max_result_window": 10000000
}
}

As the error shows, Elasticsearch will by default only return results as long as from + size is at most 10000.
To go beyond that number you need to use the scroll API:
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-scroll.html
An example of a JavaScript search with scroll can be found here:
async await with elasticsearch search/scroll
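In case the linked example is not at hand, a minimal async/await sketch of a scroll loop could look like the following. It is an illustration under assumptions rather than the linked code: client.search() and client.scroll() are assumed to resolve to the response body, the scroll ID is passed as scrollId (as in the older elasticsearch npm client), and onBatch is a hypothetical callback supplied by the caller.

```javascript
// Sketch only: reads every document of an index in batches via the scroll API.
// Assumptions: the client resolves to the response body, and accepts the
// scroll ID under the name "scrollId".
async function scrollAll(client, indexName, onBatch, batchSize = 1000) {
  // The first request opens the scroll context and returns the first batch.
  let response = await client.search({
    index: indexName,
    scroll: '1m', // keep the scroll context alive for one minute per round trip
    size: batchSize,
    body: { query: { match_all: {} } }
  });
  let total = 0;
  // An empty batch means all hits have been consumed.
  while (response.hits.hits.length > 0) {
    await onBatch(response.hits.hits);
    total += response.hits.hits.length;
    response = await client.scroll({
      scrollId: response._scroll_id,
      scroll: '1m'
    });
  }
  return total;
}
```

Scroll reads the result set sequentially, so it suits exports and batch processing better than user-facing page links.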

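Well, I was able to increase the pagination limit with this curl command (the Content-Type header is required by Elasticsearch 6.0 and later):
curl -XPUT "http://localhost:9200/my_index/_settings" -H "Content-Type: application/json" -d '{ "index" : { "max_result_window" : 500000 } }'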
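Since the question is about node.js, the same settings change can probably be made through the client instead of curl. This is a minimal sketch, assuming a client object that exposes indices.putSettings (both the older and the newer official JavaScript clients do); the helper name raiseResultWindow is made up for this example.

```javascript
// Sketch only: updates index.max_result_window through the JS client.
// "raiseResultWindow" is a hypothetical helper name, not part of any library.
function raiseResultWindow(client, indexName, maxResultWindow) {
  return client.indices.putSettings({
    index: indexName,
    body: { index: { max_result_window: maxResultWindow } }
  });
}
```

For example, raiseResultWindow(this.es, 'my_index', 500000) would mirror the curl command above.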

Related

Buildfire - How to load sorted pages dynamically from publicData/datastore

I am trying to lazy load data from publicData using the options that BuildFire describes in the wiki. I have set up some code to test it, and it does not seem to work no matter how I configure the request options. Here is the code that I am using:
var loadSortedPages = function(page) {
var skip = page*50;
var options = {
"filter": {},
"sort": {"points": 1},
"pageSize": "50",
"skip": skip.toString()
}
buildfire.publicData.search(options, 'users', function(err, records) {
console.log("RECORDS SORTED ASCENDING BY POINTS FOR PAGE " + page, records);
});
}
loadSortedPages(0);
loadSortedPages(1);
loadSortedPages(2);
I have tried, it seems, every conceivable combination of "page" and "skip", as both string and number values. Nothing works: I always get back the first 50 sorted records for each of the loadSortedPages calls, even though I am passing in different page numbers. Is this something on BuildFire's end?
Here is the documentation on how to use Datastore search https://github.com/BuildFire/sdk/wiki/How-to-use-Datastore#buildfiredatastoresearchoptions-tag-optional-callback
It seems like you are mixing two pagination methods. For pagination you can either use:
page: a number that determines which page to retrieve.
pageSize: the number of records per page; the max value is 20.
Or use:
skip: the number of records to skip.
limit: the number of records to return for this call; the max value is 20.
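Applied to the code from the question, the first of the two methods might look roughly like this. It is a sketch under the assumptions stated in this answer (numeric page and pageSize, at most 20 records per page); buildSearchOptions and loadSortedPage are names made up for the example, and buildfire is the global provided by the SDK.

```javascript
// Sketch only: uses page/pageSize and does not mix in skip/limit.
// Values are numbers, not strings, and pageSize stays within the
// maximum of 20 stated in the answer above.
function buildSearchOptions(page, pageSize = 20) {
  return {
    filter: {},
    sort: { points: 1 }, // ascending by points, as in the question
    page: page,
    pageSize: pageSize
  };
}

function loadSortedPage(page, callback) {
  // "buildfire" is the global object provided by the BuildFire SDK.
  buildfire.publicData.search(buildSearchOptions(page), 'users', callback);
}
```

Calling loadSortedPage(0, cb), loadSortedPage(1, cb) and so on should then request successive pages instead of the same first page.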

How to get top hits by document type

I have a number of simple queries I'm sending that look roughly like the one shown after this paragraph.
My question is how to sort the results such that I get at least some top 'n' hits from each document type. I've been playing with boosting, and some results from docType1 are so powerful that with a limit of 30, they push all relevant hits from the other document types out of the search results.
{
from: 0, limit: 30,
index: 'myIndex',
type: 'docType1,docType2,docType3',
body: {
query: {
simple_query_string: {
query: 'foo',
}
}
}
}
I've looked into sorting, but that's not really what I want. I've also looked into aggregation, but I'm having trouble finding the right formula that would get things there. I've also looked into top_hits but frankly am having a tough time understanding the documentation or whether this is applicable to the use case.
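For reference, one common shape for "top n per group" is a terms aggregation with a top_hits sub-aggregation. The sketch below only builds a request body and is an assumption about how this could apply here, not a confirmed answer: it presumes an Elasticsearch version that still has mapping types and allows aggregating on the _type field, and the names by_type, top_docs and buildTopHitsPerTypeBody are made up.

```javascript
// Sketch only: one bucket per document type, each carrying its own best hits.
// Assumes the _type field can be used in a terms aggregation.
function buildTopHitsPerTypeBody(queryText, hitsPerType = 10) {
  return {
    size: 0, // skip the flat hit list; results come from the aggregation
    query: { simple_query_string: { query: queryText } },
    aggs: {
      by_type: {
        terms: { field: '_type' },
        aggs: {
          // The highest-scoring documents within each type bucket.
          top_docs: { top_hits: { size: hitsPerType } }
        }
      }
    }
  };
}
```

With this body, the hits for each type would be found under aggregations.by_type.buckets[i].top_docs.hits.hits in the response, so a strong docType1 can no longer crowd out the other types.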

reduce output must shrink more rapidly, on adding new document

I have a couple of documents in CouchDB, each having a cId field, such as:
{
"_id": "ccf8a36e55913b7cf5b015d6c50009f7",
"_rev": "8-586130996ad60ccef54775c51599e73f",
"cId": 1,
"Status": true
}
I have a simple view that tries to return the maximum cId, with map and reduce functions as follows:
Map
function(doc) {
emit(null, doc.cId);
}
Reduce
function(key, values, rereduce){
return Math.max.apply(null, values);
}
This works fine (the output is 1) until I add one more document with cId = 2 to the db. I expect the output to be 2, but instead the view starts failing with the error "Reduce output must shrink more rapidly". When I delete this document, things are back to normal again. What can be the issue here? Is there any alternative way to achieve this?
Note: There are more views in the db, which perform different roles, and a few return JSON as well. They also start failing after this change.
You could simply use the built-in _stats reduce function to get the maximum value. It is returned in the "max" field.
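As a rough illustration, a design document using _stats and a helper that reads the maximum from the view response might look like this. The design document ID, the view name max_cid and the helper maxFromStatsResponse are hypothetical; the type check in the map function is an addition, since _stats only accepts numeric values.

```javascript
// Sketch only: a view that delegates the reduce step to the built-in _stats.
const designDoc = {
  _id: '_design/stats', // hypothetical design document name
  views: {
    max_cid: {
      // Emit only numeric cId values, because _stats works on numbers.
      map: 'function (doc) { if (typeof doc.cId === "number") { emit(null, doc.cId); } }',
      reduce: '_stats'
    }
  }
};

// _stats reduces to an object with sum, count, min, max and sumsqr.
function maxFromStatsResponse(viewResponse) {
  const row = viewResponse.rows[0];
  return row ? row.value.max : null;
}
```

For two documents with cId 1 and 2, the reduced view would return a single row whose value has max set to 2.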

MongoDB fulltext search: Overflow sort stage buffered data usage

I am trying to implement MongoDB text search in my Node (Express.js) application.
Here is my code:
Collection.find({$text: {$search: searchString}}
, {score: {$meta: "textScore"}})
.sort({score: {$meta: 'textScore'}})
.exec(function(err, docs) {
//Process docs
});
I am getting the following error when the text search is performed on a large dataset:
MongoError: Executor error: Overflow sort stage buffered data usage of 33554558 bytes exceeds internal limit of 33554432 bytes
I am aware that MongoDB can sort at most 32MB of data in memory, and that this error can be avoided by adding an index on the field the collection is sorted by. But in my case I am sorting by textScore, and I am not sure whether it is possible to create an index for this field. If not, is there any workaround?
NOTE: I am aware there are similar questions on SO, but most of them do not use textScore as the sort criterion, so my question is different.
You can use aggregate to circumvent the limit.
Collection.aggregate([
{ $match: { $text: { $search: searchString } } },
{ $sort: { score: { $meta: "textScore" } } }
])
The $sort stage has a 100 MB limit. If you need more, you can use allowDiskUse, which writes to temporary files while sorting takes place. To do that, just add allowDiskUse: true to the aggregate options.
If your result is greater than 16MB (i.e. MongoDB's document size limit), you need to request a cursor to iterate through your data. Just add .cursor() before your exec; here's a detailed example: http://mongoosejs.com/docs/api.html#aggregate_Aggregate-cursor
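Put together, the aggregate version with disk use enabled might look like the sketch below. It assumes a Mongoose model is passed in and uses Mongoose's chained allowDiskUse() helper; the function names buildTextSearchPipeline and searchByText are made up for this example.

```javascript
// Sketch only: text search sorted by relevance, with the sort allowed to spill to disk.
function buildTextSearchPipeline(searchString) {
  return [
    { $match: { $text: { $search: searchString } } },
    { $sort: { score: { $meta: 'textScore' } } }
  ];
}

// "Model" is assumed to be a Mongoose model such as Collection in the question.
function searchByText(Model, searchString) {
  return Model.aggregate(buildTextSearchPipeline(searchString))
    .allowDiskUse(true) // lets $sort write temporary files beyond its memory limit
    .exec();            // resolves to the matching documents
}
```

Usage would be along the lines of searchByText(Collection, searchString).then(docs => ...).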

Elastic Search size to unlimited

I am new to Elasticsearch. I am having trouble writing a search query that returns all matching records in my collection. The following is my query:
{
"size":"total no of record" // Here i need to get total no of records in collection
"query": {
"match": {
"first_name": "vineeth"
}
}
}
Running this query, I only get a maximum of 10 records, although I am sure there are more than 10 matching records in my collection. I searched a lot and finally found the size parameter. But in my case I don't know the total count of records, and I think passing an arbitrarily large number as size is not good practice. How should I handle this situation? Thanks.
It's not very common to display all results; instead, use from and size to specify a range of results to fetch. So your query (for fetching the first 10 results) should look something like this:
{
"from": 0,
"size": 10,
"query": {
"match": {
"first_name": "vineeth"
}
}
}
This should work better than setting size to a ridiculously large value. To check how many documents matched your query you can get the hits.total (total number of hits) from the response.
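A small sketch of reading that total in node.js follows. It assumes the response body is already at hand; the helper names totalHits and pageCount are made up. One detail worth knowing: before Elasticsearch 7, hits.total is a plain number, while from 7.0 on it is an object with value and relation fields (and by default is only tracked accurately up to 10,000).

```javascript
// Sketch only: reads the number of matching documents from a search response.
// Handles both the numeric form (before 7.0) and the object form (7.0 and later).
function totalHits(response) {
  const total = response.hits.total;
  return typeof total === 'number' ? total : total.value;
}

// Number of pages needed to show every match at the given page size.
function pageCount(response, pageSize) {
  return Math.ceil(totalHits(response) / pageSize);
}
```

With the page count known, from can be computed as pageNumber * pageSize for each page, as long as from + size stays within index.max_result_window.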
To fetch all the records you can also use the scroll API. It works like a cursor in a database.
With scroll, you get the documents batch by batch, which reduces CPU and memory usage.
For more info, refer to:
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-request-scroll.html
To get all records, per the docs, you should use scroll.
Here is the doc:
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-request-scroll.html
The idea is to specify your search and indicate that you want to scroll it:
curl -XGET 'localhost:9200/twitter/tweet/_search?scroll=1m' -d '
{
"query": {
"match" : {
"title" : "elasticsearch"
}
}
}'
In the scroll parameter you specify how long the search results should be kept available.
You can then retrieve them with the returned scroll_id and the scroll API.
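In newer versions of Elasticsearch (e.g. 7.x), the documentation recommends search_after over scroll for deep pagination:
https://www.elastic.co/guide/en/elasticsearch/reference/current/paginate-search-results.html
Note that what is marked as deprecated in 7.0.0 is passing the scroll ID in the URL path, not the scroll API as a whole:
GET /_search/scroll/<scroll_id>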

Resources