Refresh index mapping and reduce number of total fields - node.js

In my current application I have set up search for my (web-)shop items. Until now I have always worked with dynamic mappings, and I have now run into the problem that I have reached the default index.mapping.total_fields.limit of 1000.
What I want to do now is reduce the total number of fields in the mapping simply by putting a new mapping in which I set dynamic to false on most of the unnecessary properties.
When I do this, however, not only can I not reduce the number of total fields, I also get the 1000-total-fields-limit error while putting the new mapping to the items index. Is there a way to refresh the mapping on an existing index without having to recreate the index with the correct mappings?
Thanks in advance

No. Mappings are only applied when an index is created; if you want to change the mapping of any field, you will need to create a new index with the correct mapping and reindex your current index into it.
Alternatively, you can increase the limit on total mapping fields, but having too many fields can hurt performance.
PUT your_index/_settings
{
"index.mapping.total_fields.limit": 5000
}
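If you do go the reindex route, here is a minimal sketch with the Node.js client (the @elastic/elasticsearch v7-style API, the field layout and the index names items / items_v2 are assumptions, not code from this thread): create the new index with dynamic set to false on the noisy object, then copy the data across with the Reindex API.

// Sketch only: v7-style @elastic/elasticsearch client, placeholder names.
const { Client } = require('@elastic/elasticsearch');
const client = new Client({ node: 'http://localhost:9200' });

async function migrate() {
  // 1. Create the replacement index; dynamic: false keeps the noisy object
  //    in _source without adding its sub-fields to the mapping.
  await client.indices.create({
    index: 'items_v2',
    body: {
      mappings: {
        properties: {
          name: { type: 'text' },
          price: { type: 'double' },
          attributes: { type: 'object', dynamic: false }
        }
      }
    }
  });

  // 2. Copy the existing documents into the new index.
  await client.reindex({
    body: {
      source: { index: 'items' },
      dest: { index: 'items_v2' }
    }
  });
}

migrate().catch(console.error);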

Related

How can I use Dynamic feature to find an item by value

I have a table that has many randomly generated items in it and I need to choose one. Loading the table into a collection and then looping through it takes a lot of work and time. Is there a way to use the Dynamic feature in Blue Prism to achieve the same thing faster and in fewer steps?
A combination of a wildcard for the ID and the Dynamic feature for the value or match index will do the trick. In the action, add a Navigate stage, choose the item, and pass in the value or match-index number as a parameter.

MongoDB API pagination

Imagine a situation where a client has a feed of objects with a limit of 10.
When the next 10 are required, it sends a request with skip 10 and limit 10.
But what if some new objects were added to (or deleted from) the collection since the first request with offset == 0?
Then on the second request (with offset == 10) the response may return objects in the wrong order.
Sorting on creation time does not work here, because I have some feeds that are sorted on some other numeric field.
You can add a time field like created_at or updated_at. It must be updated whenever the document is created or modified, and the field must be unique.
Then query the DB for a range of time using $gte and $lte, along with a sort on this time field.
This ensures that any changes made outside the time window will not get reflected in the pagination, provided that the time field does not have duplicates. If you include microtime, duplicates most probably won't happen.
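As a rough sketch of that idea with the Node.js MongoDB driver (the connection string, database/collection names and the createdAt field are placeholders; snapshotTime is whatever timestamp the client captured on its first request):

// Sketch only: pages forward through documents created no later than
// snapshotTime, using the createdAt of the last document already delivered
// as the cursor into the next page.
const { MongoClient } = require('mongodb');

async function nextPage(lastCreatedAt, snapshotTime, pageSize = 10) {
  const client = await MongoClient.connect('mongodb://localhost:27017');
  try {
    const items = client.db('shop').collection('items');
    return await items
      .find({ createdAt: { $gt: lastCreatedAt, $lte: snapshotTime } })
      .sort({ createdAt: 1 }) // relies on createdAt being unique, as noted above
      .limit(pageSize)
      .toArray();
  } finally {
    await client.close();
  }
}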
It really depends on what you want the result to be.
If you want the original objects in their original order regardless of delete and add operations, then you need to make a copy of the list (or at least of the order) and page through that: copy every Id to a new collection that doesn't change once the page has loaded, and then paginate through that.
Alternatively, and perhaps more likely, what you want is to see the next 10 after the last one in the current set, including any delete or add operations that have taken place since. For this, you can use the sorted order in which you are viewing them plus a filter: $gt whatever the last item was. But that doesn't work when there are duplicates in the field on which you are sorting. To get around that, you will need to index on that field plus some other field which is unique per record, for example the _id field. Now you can take the last record in the first set and look for records that are $eq the indexed value and $gt the _id, OR are simply $gt the indexed value.
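A rough sketch of that second approach with the Node.js driver, where score stands in for your numeric sort field and lastScore / lastId come from the final document of the previous page:

// Sketch only: keyset pagination on a non-unique numeric field ("score")
// with _id as the unique tie-breaker, backed by a compound index
// { score: 1, _id: 1 } that the sort below matches.
async function nextPage(items, lastScore, lastId, pageSize = 10) {
  return items
    .find({
      $or: [
        { score: { $gt: lastScore } },              // strictly past the last value
        { score: lastScore, _id: { $gt: lastId } }  // same value: break the tie on _id
      ]
    })
    .sort({ score: 1, _id: 1 })
    .limit(pageSize)
    .toArray();
}

Here items is an open collection handle (e.g. client.db('shop').collection('items')).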

What is the most efficient way to update record(s) value when using SummingCombiner?

I have a table with a SummingCombiner on minC and majC. Every day I need to update the value for a small number of records. What is the most efficient way to do so?
My current implementation is to create a new record whose value is the amount to increase or decrease by (a new mutation with the same row, column family, and column qualifier as the existing record(s)).
Yes, the most efficient way to update the value is to insert a new record and let the SummingCombiner add the new value into the existing value. You probably also want to have the SummingCombiner configured on the scan scope, so that scans will see the updated value right away, before a major compaction has occurred.

Changing your 20 indexed columns

I have a large SharePoint Office 365 list of over 15,000 items, and I have already used all 20 indexed columns. I now need to filter by a different column. Is it safe for me to remove an indexed column and change the index to a different field? Do I have to reindex the list if I do that?
I'm afraid you'll find that creating or removing column indexes is among the operations that are restricted once you surpass SharePoint's list view threshold, as documented here.
In an on-premises SharePoint farm (or an otherwise traditional SharePoint farm on cloud-hosted infrastructure), you'd have access to Central Administration, where you could temporarily increase the threshold, set a time window during which the threshold doesn't apply, or even use PowerShell to temporarily set the list's EnableThrottling property to false, allowing you to make your indexed-column changes. With Office 365, however, you won't have any of those options.
Depending on the circumstances, you can still work around the list view threshold when filtering: first filter the list by one or more of your indexed columns so that fewer than 5,000 items are returned; you should then be able to filter that subset of results by your unindexed column.
Another alternative would be to use SharePoint's search service to find the items in your list that match the given metadata. Since the search crawl index is generated ahead of time (rather than by a live query), it is not beholden to the list view threshold. The only problem there is that the results might be stale, depending on the frequency of search crawls.
Since you already have 20 indexed columns, you may be able to query the list on an already-indexed column to get a response that obeys the list view threshold (a 'Date Created' range or 'Created By' might be useful columns).
Once you have that initial response, you can then filter it on the unindexed column of interest.
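As a very rough sketch of that approach from Node.js (the site URL, list name, field names, date range and bearer token are all hypothetical, and fetch assumes Node 18+): filter server-side on an indexed column such as Created to stay under the threshold, then filter the smaller result set on the unindexed column in code.

// Sketch only: hypothetical tenant/list/field names; auth handling omitted.
async function findDiscontinuedItems(token) {
  const url =
    "https://tenant.sharepoint.com/sites/shop/_api/web/lists/getbytitle('Items')/items" +
    "?$filter=Created ge datetime'2024-01-01T00:00:00Z'" + // indexed column keeps the query under the threshold
    "&$select=Id,Title,Status&$top=4999";

  const res = await fetch(url, {
    headers: {
      Accept: 'application/json;odata=nometadata',
      Authorization: `Bearer ${token}`
    }
  });
  const { value } = await res.json();

  // Filter the (now sub-threshold) result set on the unindexed column in code.
  return value.filter(item => item.Status === 'Discontinued');
}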

How to iterate over a SOLR shard which has over 100 million documents?

I would like to iterate over all these documents without loading the entire result set into memory, which is apparently what happens: QueryResponse.getResults() returns a SolrDocumentList, which is an ArrayList.
I can't find anything in the documentation. I am using SOLR 4.
Note on the background of problem: I need to do this when adding a new SOLR shard to the existing shard cluster. In that case, I would like to move some documents from the existing shards to the newly added shard(s) based on consistent hashing. Our data grows constantly and we need to keep introducing new shards.
You can set the 'rows' and 'start' query params to paginate a result set. Query first with start = 0, then start = rows, start = 2*rows, etc. until you reach the end of the complete result set.
http://wiki.apache.org/solr/CommonQueryParameters#start
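A rough sketch of that loop against Solr's HTTP API from Node.js (the core URL, the sort field id and the page size are placeholders, and fetch assumes Node 18+); keep in mind that start/rows paging gets progressively slower the deeper the offset on a result set this large:

// Sketch only: walks the whole result set one page at a time with start/rows.
const SOLR_SELECT = 'http://localhost:8983/solr/items/select';
const ROWS = 1000;

async function iterateAll(handleDoc) {
  for (let start = 0; ; start += ROWS) {
    const url = `${SOLR_SELECT}?q=*:*&sort=id+asc&start=${start}&rows=${ROWS}&wt=json`;
    const { response } = await (await fetch(url)).json();

    response.docs.forEach(handleDoc);             // process one page, not all 100M docs
    if (start + ROWS >= response.numFound) break; // stop once the last page is in
  }
}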
I have a possible solution I'm testing, pasted from Solr paging 100 Million Document result set:
I am trying to do deep paging of very large result sets (e.g., over 100 million documents) using a separate indexed field (integer) into which I insert a random variable (between 0 and some known MAXINT). When querying large result sets, I do the initial field query with no rows returned and then based on the count, I divide the range 0 to MAXINT in order to get on average PAGE_COUNT results by doing the query again across a sub-range of the random variable and grabbing all the rows in that range. Obviously the actual number of rows will vary but it should follow a predictable distribution.
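A sketch of that range-partitioning idea (the rand_i field name, the select URL and the constants are assumptions): take the total count, split 0..MAXINT into buckets sized to yield roughly PAGE_COUNT documents each, then fetch each bucket with a range filter query.

// Sketch only: assumes every document carries an indexed random integer
// "rand_i" drawn uniformly from [0, MAX_RAND].
const MAX_RAND = 2147483647;
const PAGE_COUNT = 1000;

async function* randomRangePages(solrSelectUrl) {
  const total = (await (await fetch(`${solrSelectUrl}?q=*:*&rows=0&wt=json`)).json())
    .response.numFound;
  const buckets = Math.max(1, Math.ceil(total / PAGE_COUNT));
  const step = Math.ceil(MAX_RAND / buckets);

  for (let lo = 0; lo <= MAX_RAND; lo += step) {
    const hi = Math.min(lo + step - 1, MAX_RAND);
    const fq = encodeURIComponent(`rand_i:[${lo} TO ${hi}]`);
    // Ask for a generous rows value: bucket sizes only average PAGE_COUNT.
    const page = await (await fetch(
      `${solrSelectUrl}?q=*:*&fq=${fq}&rows=${PAGE_COUNT * 10}&wt=json`
    )).json();
    yield page.response.docs; // roughly PAGE_COUNT docs per bucket, actual size varies
  }
}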
