Finding items by simpler IDs in MongooseJS - node.js

I'm playing with mongoose schemas. I have 3 items, each one with a corresponding ID field like this:
"_id": "508dcb79cb1c8ad910000001",
"_id": "508dcba389e2f32e11000002",
"_id": "508dcba389e2f32e11000003",
Is there a built-in way to turn that long ID into a cleaner one (for example, just the last digits) without having to set up my own custom numeric IDs?
Or should I use a regex or something similar so that I can fetch them by ID like this:
app.get('/fields/:id', function(req, res) { ; });
And access them with /fields/1, /fields/3 or whatever.

As described here, the ObjectID for a document is made up of a timestamp, a machine id, a process id, and a counter. I don't see a way to take the counter by itself and come up with the ObjectID. Using a regular expression to search the index probably isn't a good idea because you would end up iterating over all ObjectIDs in the index. As pointed out here, only simple prefix regular expressions can take advantage of the index. Furthermore, I think multiple documents could have the same counter, e.g. if a write occurred on two different machines to the same collection.
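To make that layout concrete, here is a minimal sketch that splits a 24-character hex ObjectID into the four fields described above. It assumes the legacy 12-byte layout (4-byte timestamp, 3-byte machine id, 2-byte process id, 3-byte counter); newer MongoDB versions replace the machine and process ids with a 5-byte random value. The helper name splitObjectId is hypothetical.

```javascript
// Hypothetical helper: split a 24-hex-character ObjectID string into the
// fields of the legacy layout (4-byte timestamp, 3-byte machine id,
// 2-byte process id, 3-byte counter).
function splitObjectId(hex) {
  if (!/^[0-9a-f]{24}$/i.test(hex)) {
    throw new Error('Expected a 24-character hex ObjectID, got: ' + hex);
  }
  const field = (start, end) => parseInt(hex.slice(start, end), 16);
  return {
    timestamp: new Date(field(0, 8) * 1000), // seconds since the Unix epoch
    machineId: field(8, 14),
    processId: field(14, 18),
    counter: field(18, 24),
  };
}

// The IDs from the question: the counters happen to be 1, 2 and 3, but the
// first ID was generated by a different machine/process than the other two,
// so the counter is only meaningful together with the other bytes.
const ids = [
  '508dcb79cb1c8ad910000001',
  '508dcba389e2f32e11000002',
  '508dcba389e2f32e11000003',
];
for (const id of ids) {
  const parts = splitObjectId(id);
  console.log(id, parts.timestamp.toISOString(), parts.machineId, parts.processId, parts.counter);
}
```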
So the short answer is "no". If you really want short IDs you probably should roll your own sequence numbers, use something like shortid, or just look the documents up by a naturally unique index (if your collection happens to have one), like email, username, etc.
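If you go the "roll your own sequence numbers" route, a common pattern is a separate counters collection that is incremented atomically. A minimal sketch, assuming the official MongoDB Node.js driver (4.x or later, for the returnDocument option); the helper name nextSequence, the counters collection and its { _id, seq } shape are illustrative, not part of any library:

```javascript
// Hypothetical helper: atomically increment and return a named counter stored
// in a counters collection ({ _id: name, seq: n }). `counters` is assumed to be
// a MongoDB Node.js driver Collection.
async function nextSequence(counters, name) {
  const res = await counters.findOneAndUpdate(
    { _id: name },
    { $inc: { seq: 1 } },
    { upsert: true, returnDocument: 'after' } // create the counter on first use
  );
  // Driver v6+ resolves to the document itself; v4/v5 wrap it as { value: doc }.
  const doc = res && res.value !== undefined ? res.value : res;
  if (!doc) throw new Error('Counter ' + name + ' could not be read');
  return doc.seq;
}
```

Each new document would then store something like seq: await nextSequence(db.collection('counters'), 'fields'), and the route from the question could look it up with findOne({ seq: Number(req.params.id) }); an index on seq keeps that lookup from scanning the collection.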

Related

Trying to understand mongodb indexes for finding documents with exact and unique value(s)

I am reading through the MongoDB docs for the Node.js driver, particularly this index section https://www.mongodb.com/docs/drivers/node/current/fundamentals/indexes/#geospatial-indexes, and it looks like all of the indexes they mention are for sortable/searchable data. So I wanted to ask whether I need indexes for the following use case:
I have this user document structure:
{
email: string,
version: number,
otherData: ...
}
As far as I understand, I can query each user by _id, and that field already has a default unique index applied to it? I also want to query users by email, so I created the following unique index:
collection.createIndex({ email: 1 }, { unique: true })
Is my understanding correct that by creating this index I guarantee that:
Email is always unique
My queries like collection.findOne({email: 'my#email.com'}) are optimised?
Next, I want to perform update operations on user documents, but only on specific versions, so:
collection.updateOne({email: '...', version: 2}, update)
What index do I need to create in order to optimise this query? Should I be somehow looking into compound indexes for this as I am now using email and version?
Yes, the unique constraint is enforced at the database layer, so by definition the email will be unique. It is worth mentioning that this can affect insert/update performance, as the check has to run on every insert and update; from my experience you only start feeling this overhead at larger scale (hundreds of millions of documents in a single collection plus thousands of inserts a minute).
Yes. There is no other way to optimize this further.
What index do I need to create in order to optimise this query? Should I be somehow looking into compound indexes for this as I am now using email and version?
You want to create a compound index; the syntax looks like this:
collection.createIndex({ email: 1, version: 1 }, { unique: true })
I will just say that by definition the (first) email index ensures uniqueness, so any additional filtering you add to the query and index will not really affect anything, as there will always be only one document with a given email in the DB. So why bother adding a "version" field to the query? If you need it for filtering, that's fine, but you won't need to alter the existing index.
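One common reason to keep version in the filter (an assumption here; the question does not say why) is optimistic concurrency: the update only applies if nobody has changed the document since it was read, and the version is bumped in the same atomic operation. The unique email index can still serve this filter. A minimal sketch using the MongoDB Node.js driver's updateOne; the helper name updateIfVersion is hypothetical:

```javascript
// Hypothetical helper: apply `changes` only if the stored document still has
// `expectedVersion`, and bump the version in the same atomic updateOne.
// `collection` is assumed to be a MongoDB Node.js driver Collection.
// `changes` must not contain `version` itself, since $set and $inc would then
// target the same field and the server rejects the update.
async function updateIfVersion(collection, email, expectedVersion, changes) {
  const result = await collection.updateOne(
    { email, version: expectedVersion },
    { $set: changes, $inc: { version: 1 } }
  );
  // matchedCount === 0 means another writer got there first (or no user has
  // this email); the caller can re-read the document and retry.
  return result.matchedCount === 1;
}
```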

Adding extra value at Model.create with Mongoose

I have a large array that comes in from an API that I'd like to store straight into MongoDB.
Model.create(largeArray) ... // many documents created
The problem is, I have one additional key:value pair that I need to set for all documents in that array. It's a user id, and many documents are created for a given user once per API call. So for a given Model.create call, the user id is the same for every doc in the array.
Without mapping over the array, is there an efficient way of adding a field with a consistent value? Something like Model.create(myLargeArray, {userId: someUserId}) would be ideal, but I know this isn't the case with the Mongoose API.
function addDocsForUser(largeArray, someUserId) {
// each element of largeArray needs to have `userId: someUserId` added to it
return Model.create(largeArray)
}
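As far as I know, neither Model.create nor Model.insertMany accepts a field to be added to every document, so some pass over the array seems unavoidable; for a large batch, that copy is usually cheap compared with the database round trip. A minimal sketch, where withUserId is a hypothetical helper and Model is the asker's Mongoose model:

```javascript
// Hypothetical helper: return shallow copies of `docs` with `userId` added
// (overwriting any existing userId), leaving the original API payload untouched.
function withUserId(docs, userId) {
  return docs.map((doc) => ({ ...doc, userId }));
}

// Inside the asker's function (Model is their Mongoose model):
// function addDocsForUser(largeArray, someUserId) {
//   return Model.create(withUserId(largeArray, someUserId));
// }
```

If mutating the incoming array is acceptable, largeArray.forEach((doc) => { doc.userId = someUserId; }) avoids allocating the copies.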

How can I reduce a collection by keeping only change points?

I have a collection, sampled below. This data is pulled from an endpoint every twenty minutes by a cron job.
{"id":"AFFD6","empty":8,"capacity":15,"ready":6,"t":1474370406,"_id":"kROabyTIQ5eNoIf1"}
{"id":"AFFD6","empty":9,"capacity":15,"ready":5,"t":1474116005,"_id":"kX0DpoZ5fkMr2ezg"}
{"id":"AFFD6","empty":9,"capacity":15,"ready":5,"t":1474684808,"_id":"ken1WRN47PTW159H"}
{"id":"AFFD6","empty":9,"capacity":15,"ready":5,"t":1474117205,"_id":"kes1gDlG1sBjgV1R"}
{"id":"AFFD6","empty":10,"capacity":15,"ready":4,"t":1474264806,"_id":"khILUjzGEPOn0c2P"}
{"id":"AFFD6","empty":9,"capacity":15,"ready":5,"t":1474275606,"_id":"ko9r8u860es7E2hI"}
{"id":"AFFD6","empty":9,"capacity":15,"ready":5,"t":1474591207,"_id":"kpLS6mCtkIiffTrN"}
I want to discard any document (row) that doesn't show a change in empty (and consequently ready). My goal is to find the most recent timestamp at which these values changed within this collection.
To illustrate, I want to reduce it to just the rows where the values change, like this:
{"id":"AFFD6","empty":8,"capacity":15,"ready":6,"t":1474370406,"_id":"kROabyTIQ5eNoIf1"}
{"id":"AFFD6","empty":9,"capacity":15,"ready":5,"t":1474117205,"_id":"kes1gDlG1sBjgV1R"}
{"id":"AFFD6","empty":10,"capacity":15,"ready":4,"t":1474264806,"_id":"khILUjzGEPOn0c2P"}
{"id":"AFFD6","empty":9,"capacity":15,"ready":5,"t":1474591207,"_id":"kpLS6mCtkIiffTrN"}
Can I do this in a MongoDB query? Or am I better off with a JavaScript filter function?
MongoDB allows you to specify a unique constraint on an index. These constraints prevent applications from inserting documents that have duplicate values for the inserted fields.
Use the following code to create a unique index:
db.collection.createIndex( { "id": 1 }, { unique: true } )
Also refer to the MongoDB documentation for more clarification.
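For the change-point question itself, a JavaScript filter along the lines the asker suggested could look like the sketch below. It assumes the documents are already in the order you care about (the question lists them in _id order; for a time series you would normally sort by t first) and, to match the expected output in the question, keeps the last document of each run of identical empty values. The name changePoints is hypothetical, and the rows are the question's data trimmed to the relevant fields.

```javascript
// Hypothetical helper: keep only the last document of each run of consecutive
// documents that share the same `empty` value (ready follows from empty).
function changePoints(docs) {
  return docs.filter((doc, i) => i === docs.length - 1 || docs[i + 1].empty !== doc.empty);
}

const rows = [
  { _id: 'kROabyTIQ5eNoIf1', empty: 8, ready: 6, t: 1474370406 },
  { _id: 'kX0DpoZ5fkMr2ezg', empty: 9, ready: 5, t: 1474116005 },
  { _id: 'ken1WRN47PTW159H', empty: 9, ready: 5, t: 1474684808 },
  { _id: 'kes1gDlG1sBjgV1R', empty: 9, ready: 5, t: 1474117205 },
  { _id: 'khILUjzGEPOn0c2P', empty: 10, ready: 4, t: 1474264806 },
  { _id: 'ko9r8u860es7E2hI', empty: 9, ready: 5, t: 1474275606 },
  { _id: 'kpLS6mCtkIiffTrN', empty: 9, ready: 5, t: 1474591207 },
];

console.log(changePoints(rows).map((d) => d._id));
// → [ 'kROabyTIQ5eNoIf1', 'kes1gDlG1sBjgV1R', 'khILUjzGEPOn0c2P', 'kpLS6mCtkIiffTrN' ]
```

To keep the first document of each run instead, compare each document with docs[i - 1] rather than docs[i + 1].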

Pagination in CouchDB using variable keys

There's a bunch of questions on here related to pagination using CouchDB, but none that quite fit what I'm wondering about.
Basically, I have a result set ranked by number of votes, and I want to page through the set in descending order.
Here's the map for reference.
function(doc) {
emit(doc.votes);
}
Now, the problem. I found out that startkey_docid doesn't work on its own. You have to use it in combination with startkey. The thing is, for the query, I don't use a startkey parameter (I'm not looking to restrict the results, just get the most->least). I was thinking I could just use startkey={{doc.votes}}&startkey_docid={{doc._id}} instead, but the number of votes for a document could have changed by the time someone clicks the "Next Page" link.
The way to solve this seemed obvious: just set startkey=99999999 so that it will return all documents in the database, and I can just use startkey_docid to start at the one where we left off last time. Oddly, when I do that, startkey_docid stops working and all results are returned again. Apparently startkey needs to exactly equal the key of the document whose _id is used in startkey_docid.
What I'm asking is whether anyone knows a workaround for using startkey_docid to page when the actual startkey could have changed by the time you want to use it. Should my application just look up the document by _id and immediately use the doc.votes value, hoping it hasn't changed in the few milliseconds between requests? Even that doesn't seem very reliable.
EDIT: Ended up switching to Mongo for the speed, so this question turned out to be kinda moot.
I have never done something like this, but I think I have some idea how to do it. What you can do is take a snapshot of the ratings and refer to it on every page. You probably don't want your view to consume too much space, so you should not map separate copies of documents whose votes have not changed since the snapshot was taken. So, you can do the following:
Add some history of ratings with timestamp to your document.
Map the ratings AND history like this.
In your app get the current time: start_time = Date.now() and query all pages.
Clean up the history older than the oldest active session.
The problem is that if you emit [votes, date] and try to paginate, you will never know how many documents you have to fetch to get the desired number per page. There can always be some older version that you will have to skip, and then you will have to make another request to the DB. That's why you can consider emitting [date, votes], reading the view twice every time (once for start_time and once for the current time), and merging and sorting the results (as in merge sort).
Ad.1:
{ ...,
votes: 12,
history: [
{date: 1357390271342, votes: 10},
{date: 1357390294682, votes: 11}
]
}
Ad.2:
function (doc) {
emit([{}, doc.votes], null);
doc.history && doc.history.forEach(function(h) {
emit([h.date, h.votes], null);
});
}
Ad.3:
?startkey=[start_time, votes]&limit=items_per_page_plus1
?startkey=[{}, votes]&limit=items_per_page_plus1
Merge the lists and sort by votes in your app (or in a list function).
If you have problems using startkey_docid, you can emit [date, votes, id] and query with the ID explicitly. Even when this particular doc changes its votes, it will still be available in the history.
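The two requests and the merge step from Ad.3 can be sketched as follows. This is an untested sketch under several assumptions: view keys are JSON-encoded and URL-encoded in the query string (as CouchDB expects), each row has been reduced to a hypothetical { id, votes } object, and both lists have already been sorted by votes, highest first, before merging. pageQueries and mergeByVotes are hypothetical names.

```javascript
// Hypothetical helper: build the two view query strings from Ad.3.
// `{}` collates after every number in CouchDB, so [{}, votes] addresses the
// "current" rows emitted by the map function in Ad.2.
function pageQueries(startTime, votes, itemsPerPage) {
  const limit = itemsPerPage + 1; // one extra row tells us where the next page starts
  const qs = (key) => '?startkey=' + encodeURIComponent(JSON.stringify(key)) + '&limit=' + limit;
  return { snapshot: qs([startTime, votes]), current: qs([{}, votes]) };
}

// Merge step of merge sort: both inputs must already be sorted by votes,
// highest first; on ties, rows from `a` come before rows from `b`.
function mergeByVotes(a, b) {
  const out = [];
  let i = 0;
  let j = 0;
  while (i < a.length && j < b.length) {
    out.push(a[i].votes >= b[j].votes ? a[i++] : b[j++]);
  }
  return out.concat(a.slice(i), b.slice(j));
}
```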
Ad.4:
If you emit [date, votes], then you can get the outdated history with ?startkey=[0]&endkey=[oldest_active_session_time]&inclusive_end=false and update those documents with an update handler:
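function (doc, req) {
  // Update handler: keep only history entries at or after ?date=<ms timestamp>.
  if (!doc || !doc.history) return [null, 'Error'];
  var oldest = Number(req.query.date);
  // Without this check a missing or malformed date would drop every entry.
  if (isNaN(oldest)) return [null, 'Error: invalid date'];
  doc.history = doc.history.filter(function (h) {
    return h.date >= oldest;
  });
  return [doc, 'OK'];
}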
Note: I have not tested it, so it is expected not to run without modifications :)
As far as I know, CouchDB uses b-tree shadowing to make updates, and in principle it should be possible to access older revisions of the view. I am not into the CouchDB design, so this is just a guess, and there does not seem to be any (documented) API for this.
I can't figure out any simple solution right now, but there are options:
Occasionally replicate your sorted list to a small dedicated db, so it will be much more stale than stale=ok
Modify your schema in a way that lets you sort by some more stable data. Look at the banking/ledger example in the CouchDB guide: http://guide.couchdb.org/draft/recipes.html#banking. Try to log every vote and reduce them hourly, for example. As a bonus you'll get history/trends :)
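In the spirit of the ledger recipe, each vote could be stored as its own small document and summed per document and per hour. A minimal in-memory sketch of that aggregation, assuming hypothetical vote events of the form { docId, delta, time } with time in milliseconds; inside CouchDB the same grouping would be a view emitting [docId, hour] keys with the built-in _sum reduce, queried with group=true.

```javascript
const HOUR_MS = 60 * 60 * 1000;

// Hypothetical helper: sum vote deltas per document and per hour bucket.
// Returns a Map keyed by "docId|hourStart" (hourStart in ms since the epoch).
function hourlyTotals(votes) {
  const totals = new Map();
  for (const { docId, delta, time } of votes) {
    const hourStart = Math.floor(time / HOUR_MS) * HOUR_MS;
    const key = docId + '|' + hourStart;
    totals.set(key, (totals.get(key) || 0) + delta);
  }
  return totals;
}
```

Summing a document's hourly buckets gives its current vote count, and the buckets themselves are the history/trend data the answer mentions.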
I'm kind of surprised this question has been left unanswered, because CouchDB's Futon basically does this when you paginate through the results of a map function. I opened Firebug to see what was happening as I paginated and saw that for every set of paginated results it passes the startkey along with startkey_docid. So although the question asks how to paginate without including startkey, CouchDB specifies that the startkey is required, and Futon demonstrates how it can work. The endkey is not specified, so if there is only one result for the given startkey, the next set of paginated results will continue on to the following keys in the sort order.
So to clarify a bit, the answer to this problem is that as you are paginating and keeping track of the startkey_docid, you also need to capture the startkey of the same document that will be the start of the next set of results. When you are calling the paginated results use both the captured startkey and startkey_docid as couchdb requires. Leave endkey off so that the results will continue on to the next key of the sorted results.
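The approach above (capture both the startkey and the startkey_docid of the document that starts the next page) is commonly implemented by requesting one row more than the page size. A minimal sketch, assuming rows shaped like a CouchDB view response ({ id, key, value }) and descending=true for most-to-least votes; nextPage is a hypothetical helper.

```javascript
// Hypothetical helper: given `rows` from a view request made with
// limit = pageSize + 1, split them into the page to display and the query
// string for the next page. The extra row becomes the first row of the next
// page, because startkey/startkey_docid are inclusive.
function nextPage(rows, pageSize) {
  const page = rows.slice(0, pageSize);
  const extra = rows[pageSize]; // undefined when this is the last page
  if (!extra) return { page, next: null };
  const next =
    '?descending=true&limit=' + (pageSize + 1) +
    '&startkey=' + encodeURIComponent(JSON.stringify(extra.key)) + // keys are JSON
    '&startkey_docid=' + encodeURIComponent(extra.id); // doc ids are plain strings
  return { page, next };
}
```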
The use case of paginating without specifying a key is kind of odd. Let's say the start docid of the next page of results did change its key value drastically, from 9 to 3. We are also assuming that the docid appears only once in the map results, even though it could potentially appear multiple times (which I believe is why the startkey needs to be specified). As the user clicks the next button, their paginated results would move from rank 9 to rank 3. But if you include the startkey in addition to the startkey_docid, the paginated results would just start over at the beginning of the rank 9 results, which is a more logical progression than potentially jumping over a large set of results.

Mongoose: Only return one embedded document from array of embedded documents

I've got a model which contains an array of embedded documents. These embedded documents keep track of the points the user has earned in a given activity. Since a user can be part of several activities or just one, it makes sense to keep these activities in an array. Now I want to extract the hall of fame, the top ten users for a given activity. Currently I'm doing it like this:
userModel.find({ "stats.activity": "soccer" }, ["stats", "email"])
.desc("stats.points")
.limit(10)
.run (err, users) ->
(if you are wondering about the syntax, it's coffeescript)
where "stats" is the array of embedded documents/activities.
Now this actually works, but currently I'm only testing with accounts that have just one activity. I assume that something will go wrong (sorting-wise) once a user has more activities. Is there any way I can tell Mongoose to return only the embedded document where "activity" == "soccer" alongside the top-level document?
Btw, I realize I can do this another way, by having stats in its own collection with a DBRef to the relevant user, but I'm wondering if it's possible to do it like this before I consider any rewrites.
Thanks!
You are correct that this won't work once you have multiple activities in your array.
Specifically, since you can't return just an arbitrary subset of an array's elements, you'll get back all of it, and the sort will apply across all points, not just the ones paired with "activity":"soccer".
There is a pretty simple tweak that you could make to your schema to get around this though. Don't store the activity name as a value, use it as the key.
{ _id: userId,
email: email,
stats: [
{soccer : points},
{rugby: points},
{dance: points}
]
}
Now you will be able to query and sort like so:
users.find({"stats.soccer":{$gt:0}}).sort({"stats.soccer":-1})
Note that when you move to version 2.2 (currently only available as the unstable development version 2.1), you will be able to use the aggregation framework to get the exact results you want (only the particular subset of an array or subdocument that matches your query) without changing your schema.
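With the original schema (activity stored as a value) and a server that supports the aggregation framework, the hall of fame could be computed along these lines. This is a sketch, not tested against a live database: it unwinds stats so each activity becomes its own document, keeps only the requested activity, sorts by points and takes the top n. hallOfFamePipeline is a hypothetical helper; its result would be passed to Mongoose's Model.aggregate.

```javascript
// Hypothetical helper: aggregation pipeline for the top `n` users in one
// activity, assuming documents shaped like { email, stats: [{ activity, points }] }.
function hallOfFamePipeline(activity, n) {
  return [
    { $match: { 'stats.activity': activity } }, // narrows input; an index on stats.activity can help
    { $unwind: '$stats' }, // one document per array element
    { $match: { 'stats.activity': activity } }, // drop the other activities
    { $sort: { 'stats.points': -1 } },
    { $limit: n },
    { $project: { _id: 0, email: 1, points: '$stats.points' } },
  ];
}

// Usage with the asker's model (not run here):
// const top = await userModel.aggregate(hallOfFamePipeline('soccer', 10));
```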
