Mongoose: Only return one embedded document from array of embedded documents - node.js

I've got a model which contains an array of embedded documents. These embedded documents keep track of the points a user has earned in a given activity. Since a user can take part in one activity or several, it makes sense to keep these activities in an array. Now I want to extract the hall of fame: the top ten users for a given activity. Currently I'm doing it like this:
userModel.find({ "stats.activity": "soccer" }, ["stats", "email"])
.desc("stats.points")
.limit(10)
.run (err, users) ->
(If you are wondering about the syntax, it's CoffeeScript.)
where "stats" is the array of embedded documents/activities.
Now this actually works, but so far I'm only testing with accounts that have a single activity. I assume that something will go wrong (sorting-wise) once a user has more activities. Is there any way I can tell Mongoose to return only the embedded document where "activity" == "soccer" alongside the top-level document?
By the way, I realize I can do this another way, by having stats in its own collection with a db-ref to the relevant user, but I'm wondering if it's possible to do it like this before I consider any rewrites.
Thanks!

You are correct that this won't work once you have multiple activities in your array.
Specifically, since you can't return just an arbitrary subset of an array along with the document, you'll get back the whole array, and the sort will apply across all points, not just the ones "paired" with "activity":"soccer".
There is a pretty simple tweak that you could make to your schema to get around this, though: don't store the activity name as a value; use it as the key.
{ _id: userId,
  email: email,
  stats: [
    {soccer: points},
    {rugby: points},
    {dance: points}
  ]
}
Now you will be able to query and sort like so:
users.find({"stats.soccer":{$gt:0}}).sort({"stats.soccer":-1})
Note that once you move to MongoDB 2.2 (currently only available as the unstable development version 2.1), you will be able to use the aggregation framework to get exactly the results you want (only the subset of an array or subdocument that matches your query) without changing your schema.
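As a rough sketch of what such an aggregation could look like (not tested against the asker's data): assuming the original schema, where each element of stats has activity and points fields, a pipeline can unwind the array, keep only the element for the requested activity, and then sort and limit. The helper name hallOfFamePipeline is hypothetical.

```javascript
// Builds an aggregation pipeline for the top N users of one activity.
// Assumes documents shaped like:
//   { email: "...", stats: [{ activity: "soccer", points: 12 }, ...] }
function hallOfFamePipeline(activity, limit) {
  return [
    // Cheap pre-filter: skip users who never took part in the activity.
    { $match: { "stats.activity": activity } },
    // One output document per element of the stats array.
    { $unwind: "$stats" },
    // Keep only the element for the requested activity.
    { $match: { "stats.activity": activity } },
    // Sort by that element's points only, highest first.
    { $sort: { "stats.points": -1 } },
    { $limit: limit },
    // Return just the fields the hall of fame needs.
    { $project: { _id: 0, email: 1, points: "$stats.points" } }
  ];
}
```

The resulting array would be passed to the collection's aggregate method, for example db.users.aggregate(hallOfFamePipeline("soccer", 10)) in the mongo shell, where users is an assumed collection name.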

Related

Trying to understand mongodb indexes for finding documents with exact and unique value(s)

I am reading through the MongoDB docs for the Node.js driver, particularly this index section https://www.mongodb.com/docs/drivers/node/current/fundamentals/indexes/#geospatial-indexes and it looks like all of the indexes they mention are for sortable / searchable data. So I wanted to ask whether I need indexes for the following use case:
I have this user document structure
{
email: string,
version: number,
otherData: ...
}
As far as I understand, I can query each user by _id, and this already has a default unique index applied to it? I also want to query users by email, so I created the following unique index:
collection.createIndex({ email: 1 }, { unique: true })
Is my understanding correct that by creating this index I guarantee that:
Email is always unique
My queries like collection.findOne({email: 'my#email.com'}) are optimised?
Next, I want to perform update operations on user documents, but only on specific versions, so:
collection.updateOne({email: '...', version: 2}, update)
What index do I need to create in order to optimise this query? Should I be somehow looking into compound indexes for this as I am now using email and version?
Yes, the unique constraint is enforced at the database layer, so by definition emails will be unique. It is worth mentioning that this can affect insert/update performance, since the check has to run on each of those operations. In my experience you only start feeling this overhead at larger scale (hundreds of millions of documents in a single collection plus thousands of inserts a minute).
Yes, that findOne query will use the index; there is no way to optimize it further.
What index do I need to create in order to optimise this query? Should I be somehow looking into compound indexes for this as I am now using email and version?
You want to create a compound index; the syntax looks like this:
collection.createIndex({ email: 1, version: 1 }, { unique: true })
I will just say that, by definition, the (first) email index ensures uniqueness, so any additional filtering you add to the query and index will not really affect anything, as there will always be only one document with a given email in the DB. Basically, why bother adding a "version" field to the query? If you need it for filtering that's fine, but then you don't need to alter the existing index.
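To make the update-by-version case concrete, here is a minimal sketch, assuming the Node.js driver's updateOne, whose result reports matchedCount. The helper name updateIfVersion and the changes argument are hypothetical. Because email is already unique, the existing { email: 1 } index locates the single candidate document, and the version condition is simply checked on that one document.

```javascript
// Applies `changes` only if the stored document is still at `expectedVersion`.
// `collection` is assumed to be a MongoDB driver Collection (or anything
// exposing a compatible updateOne method).
async function updateIfVersion(collection, email, expectedVersion, changes) {
  const filter = { email: email, version: expectedVersion };
  const result = await collection.updateOne(filter, { $set: changes });
  // matchedCount is 0 when no document has this email at this version.
  return result.matchedCount === 1;
}
```

A caller can treat a false return value as "the document was not at the expected version (or does not exist)" and decide whether to re-read and retry.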

Get IDs of nodes via the edge collection only

I am writing an application that stores external data in ArangoDB for further processing inside the application. Let's assume I am talking about Photos in Photosets here.
Due to the nature of the APIs used, I need to fetch Photosets before I can load Photos. In the Photosets API reply, there is a list of Photo IDs that I later use to fetch the Photos. So I created an edge collection called photosInSets and store the edges between Photosets and Photos, although the Photos are not there yet.
Later on, I need to get a list of all needed Photos to load them via the API. All IDs are numeric. At the moment, I use the following AQL query to fetch the IDs of all required Photos:
FOR edge
IN photosInSets
RETURN DISTINCT TO_NUMBER(
SUBSTITUTE(edge._from, "photos/", "")
)
However... this does not look like a nice solution. I'd like to (at least) get rid of the string operation to remove the collection name. What's the nice way to do that?
One way you can find this is with a join on the photosInSets edge collection back to the photos collection.
Try a query that looks like this:
FOR e IN photosInSets
LET item = (FOR v IN photos FILTER e._from == v._id RETURN v._key)
RETURN item
This joins the _from reference in photosInSets with the _id back in the photos collection, then pulls the _key from photos, which won't have the collection name as part of it.
Have a look at a photo item and you'll see there is _id, _key and _rev as system attributes. It's fine to use the _key value if you want a string, it's not necessary to implement your own unique id unless there is a burning reason why you can't expose _key.
With a little manipulation, you could even return an array of objects stating which photo._key is a member of which photoSet, you'll just have to have two LET commands and return both results. One looking at the Photo, one looking at the photoSet.
I'm not official ArangoDB support, but I'm interested if they have another way of doing this.

Is list function a good candidate for my scenario?

I have a view in couchDb that is defined like this
function (doc) {
  if (doc.url) {
    var a = new Date(doc.postedOn);
    emit([a.toLocaleDateString(), doc.count, doc.userId], {
      _id: doc.userId,
      postTitle: doc.postTitle,
      postSummary: doc.postSummary,
      url: doc.url,
      count: doc.count
    });
  }
};
This gives me the result in the format that I want: sorted first by date, then by count, and then by userId.
However, I have trouble querying it. What I want is to query this view just by userId, that is, leave the date and the count parameters null.
_view/viewName?limit=20&descending=true&endkey=["","","userId"]
does not give me the desired result.
Should I be using a list function to filter the results of the view? Is there any impact on performance if I do this?
This quote from the definitive guide first gave me the idea that list functions could be used to filter and aggregate results.
The powerful iterator API allows for flexibility to filter and aggregate rows on the fly, as well as output raw transformations for an easy way to make Atom feeds, HTML lists, CSV files, config files, or even just modified JSON.
A list function has nothing to do with your case. From the docs you've linked to yourself:
While Show functions are used to customize document presentation, List functions are used for same purpose, but against View functions results.
Show functions are used to represent documents in various formats, commonly as HTML page with nicer formatting. They can also be used to run server-side functions without requiring a pre-existing document.
To solve your problem just change the order of the emitted keys, putting userId first, i.e.:
[ doc.userId, a.toLocaleDateString(), doc.count ]
and update your query appropriately.
If changing the order of emitted keys is not an option, just create another view.
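A sketch of what the reordered view and the matching key range might look like. The emit stub and the userRange helper are hypothetical scaffolding so the map function can be exercised outside CouchDB. The range relies on CouchDB's view collation, in which an object such as {} sorts after strings and numbers, so [userId, {}] is an upper bound for every key that starts with that userId.

```javascript
// Stand-in for CouchDB's emit(), so the map function can run locally.
const rows = [];
function emit(key, value) {
  rows.push({ key: key, value: value });
}

// Same view as in the question, with userId moved to the front of the key.
function map(doc) {
  if (doc.url) {
    var a = new Date(doc.postedOn);
    emit([doc.userId, a.toLocaleDateString(), doc.count], {
      _id: doc.userId,
      postTitle: doc.postTitle,
      postSummary: doc.postSummary,
      url: doc.url,
      count: doc.count
    });
  }
}

// Query parameters selecting every row of one user.
// With descending=true the two bounds have to be swapped.
function userRange(userId, descending) {
  const low = JSON.stringify([userId]);
  const high = JSON.stringify([userId, {}]);
  return descending
    ? { startkey: high, endkey: low, descending: true }
    : { startkey: low, endkey: high };
}
```

For an assumed user id "u1", the ascending form corresponds to a request like _view/viewName?limit=20&startkey=["u1"]&endkey=["u1",{}].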

Is a type property the correct way to store different data entities in CouchDB?

I'm trying to wrap my head around CouchDB. I'm trying to switch from MongoDB to CouchDB because I think the concept of views is more appealing to me. In CouchDB it looks like all records are stored in a single database. There is no concept of collections or anything like that, as there is in MongoDB. So, when storing different data entities such as users, blog posts, comments, etc., how do you differentiate between them from within your map reduce functions? I was thinking about just using some sort of type property, and for each item I'd just have to make sure to always specify the type. This line of thought was sort of reinforced when I read over the CouchDB cookbook website, in which an example does the same thing.
Is this the most reliable way of doing this, or is there a better method? I was thinking of alternatives, and I think the only other alternative way is to basically embed as much as I can into logical documents. Like, the immediate records inside of the database would all be User records, and each User would have an array of Posts, in which you just add all of the Posts to. The downside here would be that embedded documents wouldn't get their own id properties, correct?
Using type is convenient and fast when creating views. Alternatively you can consider using a part of the JSON document. I.e., instead of defining:
{
type: "user",
firstname: "John",
lastname: "Smith"
}
You would have:
{
user: {
firstname: "John",
lastname: "Smith"
}
}
And then in the view for emitting documents containing user information, instead of using:
function (doc) {
if (doc.type === "user") emit(null, doc);
}
You would write:
function (doc) {
if (doc.user) emit(null, doc);
}
As you can see, there is not much difference. As you have already realized, the first approach is the most widely used, but the second is (as far as I know) also well accepted.
Regarding the question of storing all Posts of one User in a single document: it depends on how you plan to update your document. Remember that you need to write the whole document each time you update it (unless you use attachments). That means that each time a user writes a new Post you need to retrieve the document containing the array of Posts, add or modify one element, and write the document back. That is probably too heavy.

CouchDB partial/differential writes

Basic problem
I have some large, but logically organised documents - and would like to perform updates on just a sub-section of an individual document.
Example
Given this simple document:
_id: 123456,
_rev: 3242342,
name: 'Stephen',
type: 'Person',
hobbies: [ 'sky-diving' ]
In my application I might have an addHobbies method, that would use a view that just retrieves:
_id: 123456,
_rev: 3242342,
hobbies: [ 'sky-diving' ]
So that it can then add an additional hobby to the hobbies array, and then PUT just this sub-set of data back to the document.
Question
As I understand it, CouchDB [1.2] does not allow partial updates like this, and so I believe it would be necessary to grab the whole document during the save operation, merge my changes, then PUT the whole document back on every single save.
Is there another way of doing this (am I wrong about CouchDB's capabilities)?
Are there any libraries (I'm using express on node.js) to handle this kind of operation?
You are correct. That is, in fact, what document database means: check-outs and check-ins.
You can create (or use) shim code to simulate what you want, letting you focus on the important parts. On the server side, you can use update functions.
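As an illustration of the server-side option, here is a minimal sketch of a CouchDB update function that appends a hobby to the example document. The function name addHobby and the request body shape {"hobby": "..."} are assumptions; in CouchDB the function would be stored under the updates key of a design document. The server still rewrites the whole document, but the client only has to send the new hobby.

```javascript
// Update function: receives the stored document (or null) and the request,
// and returns [documentToSave, responseBody].
function addHobby(doc, req) {
  if (!doc) {
    // Returning null as the document tells CouchDB not to save anything.
    return [null, "document not found"];
  }
  var hobby = JSON.parse(req.body).hobby;
  doc.hobbies = doc.hobbies || [];
  // Only append the hobby if it is not already in the list.
  if (doc.hobbies.indexOf(hobby) === -1) {
    doc.hobbies.push(hobby);
  }
  return [doc, "hobby added"];
}
```

It would typically be invoked with a PUT to the design document's _update/addHobby path followed by the document id, with the JSON body described above.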
There are many solutions on the client side.
cradle.js will give you fake partial updates with the merge method.
If you only want to update one or more attributes, and leave the others untouched, you can use the merge() method:
db.merge('luke', {jedi: true}, function (err, res) {
// Luke is now a jedi,
// but remains on the dark side of the force.
});
https://github.com/cloudhead/cradle/
Related, and also for Node.js, is Transaction, for performing arbitrary atomic transactions on CouchDB documents.
I would say that cradle is currently missing a real partial update feature, which would also support updating a path to a key inside the field value's JSON data, like Apache demonstrates here, rather than being limited to updating only a single key in a document, like the db.merge method.
In fact, looking at the cradle source, I see that there is a Database.prototype.update method (in lib/cradle/document.js), but this method doesn't seem to be documented.
It would be elegant if this could be made an integral part of cradle, eliminating the need to make separate requests to CouchDB view's updates only for partial updates.

Resources