MongoDB API pagination - node.js

Imagine a situation where a client has a feed of objects with a limit of 10.
When the next 10 are required, it sends a request with skip 10 and limit 10.
But what if some new objects were added to (or deleted from) the collection since the first request with offset == 0?
Then on the 2nd request (with offset == 10) the response may contain objects in the wrong order.
Sorting by creation time does not work here, because some of my feeds are sorted on some other numeric field.

You can add a time field like created_at or updated_at. It must be updated whenever the document is created or modified, and the field must be unique.
Then query the DB for the range of time using $gte and $lte, along with a sort on this time field.
This ensures that any changes made outside the time window will not get reflected in the pagination, provided that the time field has no duplicates. If you include microtime, duplicates most probably won't happen.
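For illustration, a minimal sketch of this approach with the Node.js mongodb driver; the posts collection, the created_at field, and the connection details are assumptions rather than anything from the question:

// Sketch: time-anchored pagination. Assumes a unique, high-resolution
// created_at field; collection and field names are illustrative.
const { MongoClient } = require('mongodb');

async function fetchPage(db, lastCreatedAt, pageSize = 10) {
  // anchor on the last seen timestamp instead of an offset
  const filter = lastCreatedAt ? { created_at: { $lt: lastCreatedAt } } : {};
  return db.collection('posts')
    .find(filter)
    .sort({ created_at: -1 }) // newest first
    .limit(pageSize)
    .toArray();
}

// Usage: pass the created_at of the last item of the previous page.
// const client = await MongoClient.connect('mongodb://localhost:27017');
// const page2 = await fetchPage(client.db('app'), page1[9].created_at);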

It really depends on what you want the result to be.
If you want the original objects in their original order regardless of delete and add operations, then you need to make a copy of the list (or at least of the order): copy every _id to a new collection that doesn't change once the first page has loaded, and paginate through that.
Alternatively, and perhaps more likely, what you want is to see the next 10 after the last one in the current set, including any delete or add operations that have taken place since. For this, you can use the sorted order in which you are viewing them and a filter: $gt whatever the last item was. But that doesn't work when there are duplicates in the field on which you are sorting. To get around that, you will need to index on that field plus some other field which is unique per record, for example the _id field. Now you can take the last record in the first set and look for records that are $eq the indexed value and $gt the _id, or are simply $gt the indexed value.
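A sketch of that tie-breaker query with the Node.js mongodb driver; the feed collection and the score sort field are assumed names:

// Sketch: keyset pagination on a non-unique sort field (score), with _id
// as the tie-breaker. lastScore/lastId come from the last item of the
// previous page; collection and field names are illustrative.
async function nextPage(db, lastScore, lastId, pageSize = 10) {
  return db.collection('feed')
    .find({
      $or: [
        { score: { $gt: lastScore } },             // strictly past the last score
        { score: lastScore, _id: { $gt: lastId } } // same score: break ties on _id
      ]
    })
    .sort({ score: 1, _id: 1 }) // matches a compound index { score: 1, _id: 1 }
    .limit(pageSize)
    .toArray();
}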

Related

Identify negative values in Collection using Blue Prism

I have one collection named "Total Amount" with a column named "Amount". I am fetching some amounts from one application and putting them into the above collection under that column, and some negative amounts exist in it. Ideally my robot should recognize a negative amount under the "Amount" column and, if one exists, stop the bot.
It's not clear to me whether you want to loop through the 'Total Amount' collection and filter the negative amounts out of there, or skip appending negative amounts to the collection when you're filling it.
Also it's not clear why you would want to stop the robot if you can just remove the negative values from the collection.
What I would suggest is to use the 'Filter Collection' action in the 'Utility - Collection Manipulation' object.
This action basically checks every item in the collection and matches it to your filter query (in this case "Amount < 0").
If the result is True, the item will be put into the output collection; if not, it will be omitted.
Another way of doing it is to loop through the collection and program the action you want to take with a decision stage when you come across a negative number, like Esqew already said in their comment.
Hope this helps :).
You can filter the collection to check whether there are negative values in a specific column. The Filter action under Utility - Collection Manipulation will let you save the filtered data into another collection. Check the count of the generated collection: if it is greater than zero, the collection contains negative values; otherwise it does not.

Query CouchDB by date while maintaining sort order

I am new to CouchDB. I have looked at the docs and SO posts, but for some reason this simple query is still eluding me:
SELECT TOP 10 * FROM x WHERE DATE BETWEEN startdate AND enddate ORDER BY score
UPDATE: It cannot be done. This is unfortunate, since to get this type of data you have to pull back potentially millions of records (a few fields) from CouchDB and then do the filtering, sorting, or limiting yourself to get the desired results. I am now going back to my original solution of using _changes to capture and store elsewhere the data I do need to perform that query on.
Here is my updated view (thanks to Dominic):
emit([d.getUTCFullYear(), d.getUTCMonth() + 1, d.getUTCDate(), score], doc.name);
What I need to do is:
Always sort by score descending
Optionally filter by date range (for instance, TODAY only)
Limit by x
Update: Thanks to Dominic I am much closer, but still having an issue.
?startkey=[2017,1,13,{}]&endkey=[2017,1,10]&descending=true&limit=10&include_docs=true
This brings back documents between the dates, sorted by score.
However, if I want the top 10 regardless of date, then I only get back the top 10 sorted by date (and not score).
For starters, when using complex keys in CouchDB, you can only sort from left to right. This is a common misconception, but read up on Views Collation for a more in-depth explanation. (while you're at it, read the entire Guide to Views as well since you're getting started)
If you want to be able to sort by score, but filter by date only, you can accomplish this by breaking down your timestamp to only show the degree you care about.
function (doc) {
  var d = new Date(doc.date)
  // day-precision key with the score last, so rows sort by score within a day
  emit([ d.getUTCFullYear(), d.getUTCMonth() + 1, d.getUTCDate(), doc.score ])
}
You'll end up outputting a more complex key than what you currently have, but you query it like so:
startkey=[2017,1,1]&endkey=[2017,1,1,{}]
This will pick out all the documents on 1-1-2017, and it'll be sorted by score already! (in ascending order, simply swap startkey and endkey to get descending order, no change to the view needed)
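For example, from Node that query could look like the following; the nano client and the design/view names are assumptions:

// Sketch using the nano CouchDB client; design doc and view names
// ('feed'/'by_day_and_score') are illustrative.
const nano = require('nano')('http://localhost:5984');
const db = nano.db.use('mydb');

async function scoresForDay(year, month, day) {
  const result = await db.view('feed', 'by_day_and_score', {
    startkey: [year, month, day],
    endkey: [year, month, day, {}],
    include_docs: true // fetch full docs rather than emitting them as values
  });
  return result.rows;  // already sorted by score within the day
}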
As an aside, avoid emitting the entire doc as the value in your view. It is likely more efficient to leverage the include_docs=true parameter, and leaving the value of your emit empty. (please refer to this SO question for more information)
With this exact setup, you'd need separate views in order to query by different precisions. For example, to query by month you just use the year/month and so on.
However, if you are willing/able to sort your scores in your application, you can use a single view to get all the date precision you want. For example:
function (doc) {
  var d = new Date(doc.date)
  // full-precision key; emit the score as the value so a reduce can aggregate it
  emit([ d.getUTCFullYear(), d.getUTCMonth() + 1, d.getUTCDate(),
         d.getUTCHours(), d.getUTCMinutes(), d.getUTCSeconds(),
         d.getUTCMilliseconds() ], doc.score)
}
With this view and the group_level parameter, you can get all the scores by year, month, date, hour, etc. As I mentioned, in this case it won't be sorted by score yet, but maybe this opens up other queries to you. (eg: what users participated this month?)
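As a sketch of how that could be wired up, pairing the map above with a built-in reduce lets group_level roll scores up by key prefix; the design doc and view names here are assumptions:

// Sketch: design doc with the full-precision timestamp key and a _sum
// reduce, so group_level can aggregate scores by year/month/day/...
// Names are illustrative.
const designDoc = {
  _id: '_design/feed',
  views: {
    scores_by_time: {
      map: `function (doc) {
        var d = new Date(doc.date)
        emit([ d.getUTCFullYear(), d.getUTCMonth() + 1, d.getUTCDate(),
               d.getUTCHours(), d.getUTCMinutes(), d.getUTCSeconds(),
               d.getUTCMilliseconds() ], doc.score)
      }`,
      reduce: '_sum' // or '_stats' for count/min/max as well
    }
  }
};

// Daily totals: GET /mydb/_design/feed/_view/scores_by_time?group_level=3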

What is the most efficient way to update record(s) value when using SummingCombiner?

I have a table with a SummingCombiner on minC and majC. Every day I need to update the value for a small number of records. What is the most efficient way to do so?
My current implementation is to create a new record whose value is set to the amount to increase or decrease (a new mutation with Row, CF, and CQ equal to the existing record(s)).
Yes, the most efficient way to update the value is to insert a new record and let the SummingCombiner add the new value into the existing value. You probably also want to have the SummingCombiner configured on the scan scope, so that scans will see the updated value right away, before a major compaction has occurred.

Building a pagination cursor

I have activities that are stored in a graph database. Multiple activities are grouped and aggregated into 1 activity in some circumstances.
A processed activity feed could look like this:
Activity 1
Activity 2
Grouped Activity
Activity 3
Activity 4
Activity 5
Activities have an updated timestamp and a unique id.
The activities are ordered by their updated time and in the case of a grouped activity, the most recent updated time within its child activities is used.
Activities can be inserted anywhere in the list (for example, if we start following someone, their past activities would be inserted into the list).
Activities can be removed from anywhere in the list.
Due to the amount of data, using the timestamp with microseconds can still result in conflicts (2 items can have the same timestamp).
Cursor identifiers should be unique and stable. Adding and removing feed items should not change the identifier.
I would like to introduce cursor-based paging to allow clients to paginate through the feed, similar to Twitter's. There doesn't seem to be much information on how such cursors are built; I have only found this blog post talking about implementing them. However, that approach seems to have a problem if the cursor's identifier happens to point to an item that was removed.
With the above, how can I produce an identifier that can be used as a cursor for the above? Initially, I considered combining the timestamp with the unique id: 1371813798111111.myuniqueid. However, if the item at 1371813798111111.myuniqueid is deleted, I can get the items with the 1371813798111111 timestamp, but would not be able to determine which item with that timestamp I should start with.
Another approach I had was to assign an incrementing number to each feed result. Since the number is incrementing and in order, if the number/id is missing, I can just choose the next one. However, the problem with this is that the cursor ids will change if I start removing and adding feed items in the middle of the feed. One solution I had to this problem is to have a huge gap between each number, but it is difficult to determine how new items can be added to the space between each number in a deterministic way. In addition, as the new items are added, and the gaps are being filled up, we would end up with the same problem.
Simply put, if I have a list of items where items can be added and removed from anywhere in the list, what is the best way to generate an id for each list item such that if the item for the id is deleted, I can still determine its position in the list?
You need an additional (or existing) column that increases sequentially for every new row added to the target table. Let's call this column seq_id.
When client request cursor for the first time:
GET /api/v1/items?sort_by={sortingFieldName}&size={count}
where sortingFieldName is the name of the field by which we sort.
What happens under the hood:
SELECT * FROM items
WHERE ... // apply search params
ORDER BY sortingFieldName, seq_id
LIMIT :count
Response:
{
  "data": [...],
  "cursor": {
    "prev_field_name": "{result[0].sortingFieldName}",
    "prev_id": "{result[0].seq_id}",
    "next_field_name": "{result[count-1].sortingFieldName}",
    "next_id": "{result[count-1].seq_id}",
    "prev_results_link": "/api/v1/items?size={count}&cursor=bw_{prev_field_name}_{prev_id}",
    "next_results_link": "/api/v1/items?size={count}&cursor=fw_{next_field_name}_{next_id}"
  }
}
The next part of the cursor will not be present in the response if we retrieved fewer than count rows.
The prev part of the cursor will not be present in the response if there was no cursor in the request or there is no data to return.
When the client performs a request again, it needs to use the cursor. Forward cursor:
GET /api/v1/items?size={count}&cursor=fw_{next_field_name}_{next_id}
What happens under the hood:
SELECT * FROM items
WHERE ... // apply search params
AND ((sortingFieldName = :cursor.next_field_name AND seq_id > :cursor.next_id) OR
     sortingFieldName > :cursor.next_field_name)
ORDER BY sortingFieldName, seq_id
LIMIT :count
Or backward cursor:
GET /api/v1/items?size={count}&cursor=bw_{prev_field_name}_{prev_id}
What happens under the hood:
SELECT * FROM items
WHERE ... // apply search params
AND ((sortingFieldName = :cursor.prev_field_name AND seq_id < :cursor.prev_id) OR
     sortingFieldName < :cursor.prev_field_name)
ORDER BY sortingFieldName DESC, seq_id DESC
LIMIT :count
The response will be similar to the previous one.
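A sketch of the forward-cursor request in Node with the pg client; the items table, the sort column, and the cursor token format follow the answer above, and everything else is an assumption:

// Sketch: decode a forward cursor token ("fw_{value}_{seqId}") and run the
// keyset query from the answer. Table/column names are illustrative.
const { Pool } = require('pg');
const pool = new Pool();

async function itemsAfter(cursor, count) {
  const match = cursor.match(/^fw_(.+)_(\d+)$/); // the value may itself contain '_'
  if (!match) throw new Error('malformed cursor');
  const [, value, seqId] = match;
  const { rows } = await pool.query(
    `SELECT * FROM items
     WHERE (sort_field = $1 AND seq_id > $2) OR sort_field > $1
     ORDER BY sort_field, seq_id
     LIMIT $3`,
    [value, Number(seqId), count]
  );
  return rows;
}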

CouchDB view collation sorted by date

I am using a CouchDB database.
I can get all documents by category and paginate results with a key like ["category","document_id"] and a query like startkey=["category","document_id"]&endkey=["category",{}].
Now I want to sort those results by date to have latest documents first.
I tried a lot of keys such as ["category","date","document_id"]
but nothing works (or I can't get it working).
I would use something like
startkey=["queried_category","queried_date","queried_document_id"]&endkey=["queried_category"]
but ignore the "queried_date" key part (sort but do not take documents where "document_id" > "queried_document_id")
EDIT:
Example :
With a key like :
startkey=["apple","2012-12-27","ZZZ"]&endkey=["apple",{}]&descending=true
I will have (and it is the normal behavior)
"apple","2012-12-27","ABC"
"apple","2012-05-01","EFG"
...
"apple","2012-02-13","ZZZ"
...
But the result set I want should start with
"apple","2012-02-13","ZZZ"
Emit the category and the timestamp (you don't need the document_id):
emit([doc.category, doc.timestamp]);
And then filter on the category:
?startkey=[":category"]&endkey=[":category",{}]
You must understand that this is only a sort, so you need the startkey to be before the first row, and the endkey to be after the last row.
Last but not least, don't forget to use a representation for the timestamp that sorts correctly.
The problem with paginating by timestamp instead of doc ID is that the timestamp is not unique. That's why you will have problems paging with Aurélien's solution.
I would stay with what you tried, but use the timestamp as a number (standard UNIX milliseconds since 1970). You can reverse the order of a single numeric field just by multiplying by -1:
emit([doc.category, -doc.timestamp, doc._id])
This way the result, sorted lexicographically (ascending), will be ordered according to your needs:
first dates descending,
then document ids ascending.
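As a sketch of how paging could then work against that view (the key values here are illustrative, not from the question):

First page (10 newest "apple" documents):
?startkey=["apple"]&endkey=["apple",{}]&limit=10
Next page, resuming just after the last row of the previous page (startkey is inclusive, hence skip=1, with the key taken from that last row):
?startkey=["apple",-1355356800000,"ZZZ"]&endkey=["apple",{}]&skip=1&limit=10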
