Fixed-Size Bucket - Nested JSON Array - Azure

I have a stream of events coming from a particular user. I am using Cosmos DB to store my user profile. Take the following example JSON object. Here I want to store only a limited set of events, say only 2. As soon as the 3rd event comes in, I want to remove the oldest one and add the 3rd, never exceeding my bucket size. The easy way is, for each update, to pull the record for that user, modify it, and update it. I was wondering if there is a more efficient way to achieve the same.
{
  "id": "4717cd3c-78d9-4a0e-bf5d-4645c97bd55c",
  "email": "abc@acme.org",
  "events": [
    {
      "event": "USER_INSTALL",
      "time": 1641232180,
      "data": {
        "app": "com.abc"
      }
    },
    {
      "time": 1641232181,
      "event": "USER_POST_INSTALL",
      "data": {
        "app": "com.xyz"
      }
    }
  ]
}

There is no option to limit an array's size within a document. You need to trim it yourself, as you're currently doing. Even if you stored each array item as a separate document, you would still need to periodically purge older documents on your own. At least with independent documents per event you could consider purging older events via TTL, but you still wouldn't be able to specify an exact number of documents to keep.
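If you stay with read-modify-replace, you can at least make it safe under concurrent writers by using the document's ETag as a precondition. Below is a minimal sketch using the @azure/cosmos Node.js SDK; the database/container names and the assumption that id doubles as the partition key are illustrative, not taken from your setup.

const { CosmosClient } = require("@azure/cosmos");

const client = new CosmosClient({
  endpoint: process.env.COSMOS_ENDPOINT, // hypothetical configuration
  key: process.env.COSMOS_KEY
});
const container = client.database("profiles").container("users"); // hypothetical names

async function appendEvent(userId, newEvent, bucketSize = 2) {
  const item = container.item(userId, userId); // assumes id is also the partition key
  const { resource: user } = await item.read();

  user.events.push(newEvent);
  user.events = user.events.slice(-bucketSize); // drop the oldest, keep the newest N

  // Optimistic concurrency: the replace fails with a 412 if the doc changed since the read
  await item.replace(user, {
    accessCondition: { type: "IfMatch", condition: user._etag }
  });
}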

Related

How to specify feed when using 'updateActivities' with Stream?

The updateActivities method in the Stream API is perplexing, as the docs seem to indicate that a feed is not specified during this operation. How is this supposed to work?
The other activity methods (addActivity, removeActivity) are performed on a feed object, which makes sense. But the docs show updateActivities as a method on the client object, with no way to specify the feed containing the activity.
From the docs:
var now = new Date();

activity = {
  "actor": "1",
  "verb": "like",
  "object": "3",
  "time": now.toISOString(),
  "foreign_id": "like:3",
  "popularity": 100
};

// first time the activity is added
user1.addActivity(activity);

// update the popularity value for the activity
activity.popularity = 10;

// send the update to the APIs
client.updateActivities([activity]);
My expectation (and the only thing that makes sense, as far as I can tell) would be that the updateActivities method would be on the feed object, since a foreign_id is not unique across all feeds.
(Previous assumption based on lots of experience using identical foreign IDs across multiple feeds.)
When an activity is added to a feed, Stream generates a unique ID for it and uses that ID to propagate the activity to the direct feed and, if any, to all follower feeds. In fact, only references to activities are stored inside feeds.
Stream also guarantees that IDs are consistent for the same time and foreign_id values. This means that if you add an activity with the same time and foreign_id, it will always end up with the same ID.
This allows you to control activity uniqueness and to update all occurrences of an activity without keeping track of all the feeds that may hold a copy (the to targets and follow relationships would make this a very complex task!).
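Concretely, that means the docs' snippet is complete as written: you add through a feed once, then update through the client with the same time and foreign_id. A sketch with the getstream Node client (the credentials and feed name are placeholders):

const stream = require("getstream");
const client = stream.connect("api_key", "api_secret"); // placeholder credentials

async function demo() {
  const activity = {
    actor: "1",
    verb: "like",
    object: "3",
    time: "2022-01-03T17:49:40.000000", // time + foreign_id determine the activity ID
    foreign_id: "like:3",
    popularity: 100
  };

  // Add via a feed; Stream fans out references to any follower feeds
  await client.feed("user", "1").addActivity(activity);

  // Update via the client; every feed holding a reference sees the new value
  activity.popularity = 10;
  await client.updateActivities([activity]);
}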

How can I reduce a collection by keeping only change points?

I have a collection, sampled below. The data is pulled from an endpoint every twenty minutes by a cron job.
{"id":AFFD6,"empty":8,"capacity":15,"ready":6,"t":1474370406,"_id":"kROabyTIQ5eNoIf1"}
{"id":AFFD6,"empty":9,"capacity":15,"ready":5,"t":1474116005,"_id":"kX0DpoZ5fkMr2ezg"}
{"id":AFFD6,"empty":9,"capacity":15,"ready":5,"t":1474684808,"_id":"ken1WRN47PTW159H"}
{"id":AFFD6,"empty":9,"capacity":15,"ready":5,"t":1474117205,"_id":"kes1gDlG1sBjgV1R"}
{"id":AFFD6,"empty":10,"capacity":15,"ready":4,"t":1474264806,"_id":"khILUjzGEPOn0c2P"}
{"id":AFFD6,"empty":9,"capacity":15,"ready":5,"t":1474275606,"_id":"ko9r8u860es7E2hI"}
{"id":AFFD6,"empty":9,"capacity":15,"ready":5,"t":1474591207,"_id":"kpLS6mCtkIiffTrN"}
I want to discard any document (row) that doesn't show a change in empty (and, consequently, ready) relative to the previous document. My goal is to find the most recent timestamp at which these values changed within this collection.
Better illustrated, I want to reduce it to just the documents where the values change, like so:
{"id":AFFD6,"empty":8,"capacity":15,"ready":6,"t":1474370406,"_id":"kROabyTIQ5eNoIf1"}
{"id":AFFD6,"empty":9,"capacity":15,"ready":5,"t":1474117205,"_id":"kes1gDlG1sBjgV1R"}
{"id":AFFD6,"empty":10,"capacity":15,"ready":4,"t":1474264806,"_id":"khILUjzGEPOn0c2P"}
{"id":AFFD6,"empty":9,"capacity":15,"ready":5,"t":1474591207,"_id":"kpLS6mCtkIiffTrN"}
Can I do this with a MongoDB query, or am I better off with a JavaScript filter function?
MongoDB allows you to specify a unique constraint on an index. Such constraints prevent applications from inserting documents that have duplicate values for the indexed fields.
Use the following code to make the id field unique:
db.collection.createIndex( { "id": 1 }, { unique: true } )
Also refer to the MongoDB documentation for more clarification.
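As for the reduction you asked about, a plain JavaScript pass over the time-sorted documents is the simplest route. A sketch, assuming the documents have already been fetched into an array:

// Sort by timestamp, then keep each document whose `empty` differs from its predecessor
function changePoints(docs) {
  const sorted = [...docs].sort((a, b) => a.t - b.t);
  return sorted.filter((doc, i) => i === 0 || doc.empty !== sorted[i - 1].empty);
}

// The last element of changePoints(docs) then carries the most recent change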

Update a million records in MongoDB, each with a subdocument containing an array that also needs updating

I'm a newbie in Node.js and MongoDB, so please excuse my silly doubts :D, but I need help right now.
{
  "_id": "someid",
  "data": "some_data",
  "subData": [
    {
      "_id": "someid",
      "data": "some_data"
    },
    {
      "_id": "some_id",
      "data": "some_data"
    }
  ]
}
I have a schema like the above, and imagine I have millions of documents with that schema. Now I want to update those documents.
Based on a condition, I want to select a set of them, modify their subData arrays, and update them.
I know there is no way to do that in one query (see the Jira issue for that feature, linked in the references), but my question now is: what is the most efficient way to update a million records in MongoDB?
Thanks in advance :)
Going by the schema you have posted, it is good that you are maintaining a specific _id for each subdocument; one is added automatically if you are using Mongoose (in case the backend is Node.js).
I would like to quote something from the issue you linked alongside your main post:
It doesn't just not work for updating multiple items in one array, it also doesn't work for updating a single item in an array for multiple documents.
So the relevant option there goes out of the window. There is no way to update large chunks in a single command; you'll have to target them individually.
If you are going to target them individually, it is advisable to target them using the specific unique ids being generated, and to automate the whole process you can choose whichever method suits your backend.
You can run several processes in parallel to finish the task in less time, but it won't be possible to do everything in one go, because MongoDB doesn't support that.
It is also advisable that, instead of maintaining several subdocuments, you go for a separate collection, as that will ease the whole process. Maintain a field to map the two collections. A sketch of the individual-targeting approach follows.
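To make the "target them individually" approach concrete, here is a rough sketch using the official Node.js driver, batching the per-document updates through bulkWrite. The collection name, selection filter, and new value are placeholders:

// `db` is assumed to be a connected Db handle from the official mongodb driver
async function updateSubData(db) {
  const coll = db.collection("mainDocs"); // placeholder collection name
  const cursor = coll.find({ /* your selection condition */ });

  let ops = [];
  for await (const doc of cursor) {
    for (const sub of doc.subData) {
      ops.push({
        updateOne: {
          filter: { _id: doc._id, "subData._id": sub._id },
          // the positional $ operator updates the matched array element
          update: { $set: { "subData.$.data": "new_value" } }
        }
      });
    }
    if (ops.length >= 1000) {                        // flush in bounded batches
      await coll.bulkWrite(ops, { ordered: false }); // unordered lets the server parallelise
      ops = [];
    }
  }
  if (ops.length) await coll.bulkWrite(ops, { ordered: false });
}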
References
https://jira.mongodb.org/browse/SERVER-831
https://jira.mongodb.org/browse/SERVER-1243
https://www.nodechef.com/docs/cloud-search/updates-documents-nested-in-arrays

Mongoose: Only return one embedded document from array of embedded documents

I've got a model which contains an array of embedded documents. These embedded documents keep track of the points a user has earned in a given activity. Since a user can be part of several activities or just one, it makes sense to keep the activities in an array. Now, I want to extract the hall of fame: the top ten users for a given activity. Currently I'm doing it like this:
userModel.find({ "stats.activity": "soccer" }, ["stats", "email"])
  .desc("stats.points")
  .limit(10)
  .run (err, users) ->
(if you are wondering about the syntax, it's CoffeeScript)
where "stats" is the array of embedded documents/activities.
Now, this actually works, but currently I'm only testing with accounts that have just one activity. I assume that something will go wrong (sorting-wise) once a user has more activities. Is there any way I can tell Mongoose to only return the embedded document where "activity" == "soccer" alongside the top-level document?
Btw, I realize I can do this another way, by having stats in its own collection with a db-ref to the relevant user, but I'm wondering if it's possible to do it like this before I consider any rewrites.
Thanks!
You are correct that this won't work once you have multiple activities in your array.
Specifically, since you can't return just an arbitrary subset of an array along with the rest of the document, you'll get back all of it, and the sort will apply across all points, not just the ones "paired" with "activity": "soccer".
There is a pretty simple tweak you could make to your schema to get around this, though. Don't store the activity name as a value; use it as the key.
{
  _id: userId,
  email: email,
  stats: [
    { soccer: points },
    { rugby: points },
    { dance: points }
  ]
}
Now you will be able to query and sort like so:
users.find({"stats.soccer":{$gt:0}}).sort({"stats.soccer":-1})
Note that when you move to version 2.2 (currently available only as the unstable development release 2.1), you will be able to use the aggregation framework to get exactly the results you want (only the particular subset of an array or subdocument that matches your query) without changing your schema.
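For reference, that aggregation pipeline would look roughly like this in the shell, keeping your current schema (field names are taken from the question; treat it as a sketch):

db.users.aggregate([
  { $match: { "stats.activity": "soccer" } },  // only users with a soccer entry
  { $unwind: "$stats" },                       // one document per embedded stat
  { $match: { "stats.activity": "soccer" } },  // keep just the soccer element
  { $sort: { "stats.points": -1 } },
  { $limit: 10 },
  { $project: { email: 1, stats: 1 } }
])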

CouchDB partial/differential writes

Basic problem
I have some large, but logically organised, documents, and I would like to perform updates on just a sub-section of an individual document.
Example
Given this simple document:
{
  "_id": "123456",
  "_rev": "3242342",
  "name": "Stephen",
  "type": "Person",
  "hobbies": ["sky-diving"]
}
In my application I might have an addHobbies method that would use a view to retrieve just:
{
  "_id": "123456",
  "_rev": "3242342",
  "hobbies": ["sky-diving"]
}
So that it can then add an additional hobby to the hobbies array, and then PUT just this subset of data back to the document.
Question
As I understand it, CouchDB [1.2] does not allow partial updates like this, so I believe it would be necessary to fetch the whole document, merge my changes, and then PUT the whole document back on every single save.
Is there another way of doing this (am I wrong about CouchDB's capabilities)?
Are there any libraries (I'm using express on node.js) to handle this kind of operation?
You are correct. That is, in fact, what a document database means: check-outs and check-ins.
You can create (or use) shim code to simulate what you want, letting you focus on the important parts. On the server side, you can use update functions.
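For example, an update function in a design document can apply an "add a hobby" change server-side, so the client only sends the delta. A minimal sketch (the design doc and function names are illustrative):

// In _design/app under "updates"; invoked as
// POST /db/_design/app/_update/addHobby/123456
function (doc, req) {
  var hobby = JSON.parse(req.body).hobby;
  doc.hobbies = doc.hobbies || [];
  doc.hobbies.push(hobby);
  return [doc, "added " + hobby]; // CouchDB persists doc and returns the message
}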
There are many solutions on the client side.
cradle.js will give you fake partial updates with the merge method.
If you only want to update one or more attributes, and leave the others untouched, you can use the merge() method:
db.merge('luke', { jedi: true }, function (err, res) {
  // Luke is now a jedi,
  // but remains on the dark side of the force.
});
https://github.com/cloudhead/cradle/
Related, and also for Node.js, is Transaction, for performing arbitrary atomic transactions on CouchDB documents.
I would say that cradle is currently missing a real partial-update feature, one which would also support updating a path to a key inside the field value's JSON data, like Apache demonstrates here, rather than being limited to updating a single key in a document, like the db.merge method.
In fact, looking at the cradle source, I see that there is a Database.prototype.update method (in lib/cradle/document.js), but this method doesn't seem to be documented.
It would be elegant if this could be made an integral part of cradle, eliminating the need for separate requests to CouchDB's update handlers just for partial updates.
