How to order by twice with MongoDB, Mongoose, and NodeJS [duplicate]

I am looking to get a random record from a huge collection (100 million records).
What is the fastest and most efficient way to do so?
The data is already there, and there is no field from which I can generate a random number to obtain a random row.

Starting with the 3.2 release of MongoDB, you can get N random docs from a collection using the $sample aggregation pipeline operator:
// Get one random document from the mycoll collection.
db.mycoll.aggregate([{ $sample: { size: 1 } }])
If you want to select the random document(s) from a filtered subset of the collection, prepend a $match stage to the pipeline:
// Get one random document matching {a: 10} from the mycoll collection.
db.mycoll.aggregate([
{ $match: { a: 10 } },
{ $sample: { size: 1 } }
])
As noted in the comments, when size is greater than 1, there may be duplicates in the returned document sample.
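If the pipeline is assembled in application code, a small helper can keep the optional filter and the sample size in one place. This is only a minimal sketch: `buildSamplePipeline` is a hypothetical name, and the array it returns is what you would pass to `aggregate()`.

```javascript
// Hypothetical helper: builds the stage array shown above.
// `size` is the number of documents to sample; `filter` is an optional query object.
function buildSamplePipeline(size, filter) {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError('size must be a positive integer');
  }
  const pipeline = [];
  // Only add a $match stage when a non-empty filter is supplied.
  if (filter && Object.keys(filter).length > 0) {
    pipeline.push({ $match: filter });
  }
  pipeline.push({ $sample: { size: size } });
  return pipeline;
}

console.log(JSON.stringify(buildSamplePipeline(1, { a: 10 })));
```

Something like `db.mycoll.aggregate(buildSamplePipeline(1, { a: 10 }))` would then be the equivalent of the second example.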

Do a count of all records, generate a random number between 0 and the count, and then do:
db.yourCollection.find().limit(-1).skip(yourRandomNumber).next()
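The random offset has to be an integer from 0 to count - 1, otherwise skip() can run past the last document. A minimal sketch of that step, where `randomSkip` is a hypothetical helper name and `rng` is injectable only to make it testable:

```javascript
// Returns an integer in [0, count - 1], suitable as a skip() offset.
// `rng` defaults to Math.random, which yields values in [0, 1).
function randomSkip(count, rng = Math.random) {
  if (!Number.isInteger(count) || count < 1) {
    throw new RangeError('count must be a positive integer');
  }
  return Math.floor(rng() * count);
}

console.log(randomSkip(100));
```

Note that skip() still has to walk past the skipped documents, so this stays slow for large offsets.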

Update for MongoDB 3.2
3.2 introduced $sample to the aggregation pipeline.
There's also a good blog post on putting it into practice.
For older versions (previous answer)
This was actually a feature request: http://jira.mongodb.org/browse/SERVER-533 but it was filed under "Won't fix."
The cookbook has a very good recipe to select a random document out of a collection: http://cookbook.mongodb.org/patterns/random-attribute/
To paraphrase the recipe, you assign random numbers to your documents:
db.docs.save( { key : 1, ..., random : Math.random() } )
Then select a random document:
rand = Math.random()
result = db.docs.findOne( { key : 2, random : { $gte : rand } } )
if ( result == null ) {
result = db.docs.findOne( { key : 2, random : { $lte : rand } } )
}
Querying with both $gte and $lte is necessary to find the document with a random number nearest rand.
And of course you'll want to index on the random field:
db.docs.ensureIndex( { key : 1, random :1 } )
If you're already querying against an index, simply drop it, append random: 1 to it, and add it again.
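To see how the two-step lookup behaves, here is an in-memory sketch of it. It is an illustration rather than the cookbook's code: `pickByRandomKey` is a hypothetical name, and it assumes the first query scans upwards in index order and that the fallback returns the closest value below, which findOne() without an explicit sort does not strictly guarantee.

```javascript
// In-memory stand-in for the two findOne() calls above.
// `docs` is an array of objects that each carry a numeric `random` field.
function pickByRandomKey(docs, rand) {
  const sorted = docs.slice().sort((a, b) => a.random - b.random);
  // First attempt: the smallest stored value that is >= rand.
  const above = sorted.find(d => d.random >= rand);
  if (above) return above;
  // Fallback: every stored value is below rand, so take the largest one.
  return sorted.length > 0 ? sorted[sorted.length - 1] : null;
}

const docs = [{ key: 2, random: 0.2 }, { key: 2, random: 0.7 }];
console.log(pickByRandomKey(docs, Math.random()));
```

In this sketch a document is chosen whenever rand falls in the gap just below its stored value, so documents with a small gap are picked less often.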

You can also use MongoDB's geospatial indexing feature to select the documents 'nearest' to a random number.
First, enable geospatial indexing on a collection:
db.docs.ensureIndex( { random_point: '2d' } )
To create a bunch of documents with random points on the X-axis:
for ( i = 0; i < 10; ++i ) {
db.docs.insert( { key: i, random_point: [Math.random(), 0] } );
}
Then you can get a random document from the collection like this:
db.docs.findOne( { random_point : { $near : [Math.random(), 0] } } )
Or you can retrieve several documents nearest to a random point:
db.docs.find( { random_point : { $near : [Math.random(), 0] } } ).limit( 4 )
This requires only one query and no null checks, plus the code is clean, simple and flexible. You could even use the Y-axis of the geopoint to add a second randomness dimension to your query.

The following recipe is a little slower than the mongo cookbook solution (add a random key on every document), but returns more evenly distributed random documents. It's a little less-evenly distributed than the skip( random ) solution, but much faster and more fail-safe in case documents are removed.
function draw(collection, query) {
// query: mongodb query object (optional)
var query = query || { };
query['random'] = { $lte: Math.random() };
var cur = collection.find(query).sort({ random: -1 });
if (! cur.hasNext()) {
delete query.random;
cur = collection.find(query).sort({ random: -1 });
}
var doc = cur.next();
doc.random = Math.random();
collection.update({ _id: doc._id }, doc);
return doc;
}
It also requires a "random" field on your documents, so don't forget to add it when you create them. You may need to initialize your collection as shown by Geoffrey:
function addRandom(collection) {
collection.find().forEach(function (obj) {
obj.random = Math.random();
collection.save(obj);
});
}
db.eval(addRandom, db.things);
Benchmark results
This method is much faster than the skip() method (of ceejayoz) and generates more uniformly random documents than the "cookbook" method reported by Michael:
For a collection with 1,000,000 elements:
This method takes less than a millisecond on my machine
the skip() method takes 180 ms on average
The cookbook method will cause large numbers of documents to never get picked because their random number does not favor them.
This method will pick all elements evenly over time.
In my benchmark it was only 30% slower than the cookbook method.
the randomness is not 100% perfect but it is very good (and it can be improved if necessary)
This recipe is not perfect - the perfect solution would be a built-in feature as others have noted.
However it should be a good compromise for many purposes.

Here is a way using the default ObjectId values for _id and a little math and logic.
// Get the "min" and "max" timestamp values from the _id in the collection and the
// diff between.
// 4-bytes from a hex string is 8 characters
var min = parseInt(db.collection.find()
.sort({ "_id": 1 }).limit(1).toArray()[0]._id.str.substr(0,8),16)*1000,
max = parseInt(db.collection.find()
.sort({ "_id": -1 }).limit(1).toArray()[0]._id.str.substr(0,8),16)*1000,
diff = max - min;
// Get a random value from diff and divide/multiply by 1000 for the "_id" precision:
var random = Math.floor(Math.floor(Math.random()*diff)/1000)*1000;
// Use "random" in the range and pad the hex string to a valid ObjectId
var _id = new ObjectId(((min + random)/1000).toString(16) + "0000000000000000")
// Then query for the single document:
var randomDoc = db.collection.find({ "_id": { "$gte": _id } })
.sort({ "_id": 1 }).limit(1).toArray()[0];
That's the general logic in shell representation and easily adaptable.
So in points:
Find the min and max primary key values in the collection
Generate a random number that falls between the timestamps of those documents.
Add the random number to the minimum value and find the first document that is greater than or equal to that value.
This uses "padding" from the timestamp value in "hex" to form a valid ObjectId value since that is what we are looking for. Using integers as the _id value is essentially simpler, but the basic idea in the points is the same.
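The padding step can be isolated into plain functions that run without a database. This is a sketch under assumptions: `objectIdHexFromSeconds` and `randomSecondBetween` are hypothetical names, and the resulting 24-character string would still have to be passed to `ObjectId(...)` before querying.

```javascript
// Builds a 24-character hex string: 4-byte timestamp (8 hex chars) + 16 zeros.
function objectIdHexFromSeconds(seconds) {
  if (!Number.isInteger(seconds) || seconds < 0 || seconds > 0xffffffff) {
    throw new RangeError('seconds must fit in an unsigned 32-bit integer');
  }
  return seconds.toString(16).padStart(8, '0') + '0'.repeat(16);
}

// Picks a whole second between the two bounds (both inclusive).
function randomSecondBetween(minSeconds, maxSeconds, rng = Math.random) {
  return minSeconds + Math.floor(rng() * (maxSeconds - minSeconds + 1));
}

const second = randomSecondBetween(1400000000, 1500000000);
console.log(objectIdHexFromSeconds(second));
```

Because documents are rarely created at an even rate, the draw is uniform over time rather than over documents.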

Now you can use the aggregate.
Example:
db.users.aggregate(
[ { $sample: { size: 3 } } ]
)
See the doc.

In Python using pymongo:
import random
def get_random_doc():
    count = collection.count()
    return collection.find()[random.randrange(count)]

Using Python (pymongo), the aggregate function also works.
collection.aggregate([{'$sample': {'size': sample_size }}])
This approach is a lot faster than running a query for a random number (e.g. collection.find()[random_int]). This is especially the case for large collections.

It is tough if there is no data there to key off of. What is the _id field? Are they MongoDB ObjectIds? If so, you could get the highest and lowest values:
lowest = db.coll.find().sort({_id:1}).limit(1).next()._id;
highest = db.coll.find().sort({_id:-1}).limit(1).next()._id;
then if you assume the id's are uniformly distributed (but they aren't, but at least it's a start):
unsigned long long L = first_8_bytes_of(lowest)
unsigned long long H = first_8_bytes_of(highest)
V = (H - L) * random_from_0_to_1();
N = L + V;
oid = N concat random_4_bytes();
randomobj = db.coll.find({_id:{$gte:oid}}).limit(1);

You can pick a random timestamp and search for the first object that was created afterwards.
It will only scan a single document, though it doesn't necessarily give you a uniform distribution.
var randRec = function() {
// replace with your collection
var coll = db.collection
// get unixtime of first and last record
var min = coll.find().sort({_id: 1}).limit(1)[0]._id.getTimestamp() - 0;
var max = coll.find().sort({_id: -1}).limit(1)[0]._id.getTimestamp() - 0;
// allow to pass additional query params
return function(query) {
if (typeof query === 'undefined') query = {}
var randTime = Math.round(Math.random() * (max - min)) + min;
var hexSeconds = Math.floor(randTime / 1000).toString(16);
var id = ObjectId(hexSeconds + "0000000000000000");
query._id = {$gte: id}
return coll.find(query).limit(1)
};
}();

My solution on php:
/**
* Get random docs from Mongo
* @param $collection
* @param $where
* @param $fields
* @param $limit
* @author happy-code
* @url happy-code.com
*/
private function _mongodb_get_random (MongoCollection $collection, $where = array(), $fields = array(), $limit = false) {
// Total docs
$count = $collection->find($where, $fields)->count();
if (!$limit) {
// Get all docs
$limit = $count;
}
$data = array();
for( $i = 0; $i < $limit; $i++ ) {
// Skip documents
$skip = rand(0, ($count-1) );
if ($skip !== 0) {
$doc = $collection->find($where, $fields)->skip($skip)->limit(1)->getNext();
} else {
$doc = $collection->find($where, $fields)->limit(1)->getNext();
}
if (is_array($doc)) {
// Catch document
$data[ $doc['_id']->{'$id'} ] = $doc;
// Ignore current document when making the next iteration
$where['_id']['$nin'][] = $doc['_id'];
}
// Every iteration catch document and decrease in the total number of document
$count--;
}
return $data;
}

In order to get a fixed number of random docs without duplicates:
first get all ids
get the number of documents
loop, getting a random index and skipping duplicates
number_of_docs=7
db.collection('preguntas').find({},{_id:1}).toArray(function(err, arr) {
count=arr.length
idsram=[]
rans=[]
while(number_of_docs!=0){
var R = Math.floor(Math.random() * count);
if (rans.indexOf(R) > -1) {
continue
} else {
rans.push(R)
idsram.push(arr[R]._id)
number_of_docs--
}
}
db.collection('preguntas').find({_id: {$in: idsram}}).toArray(function(err1, doc1) {
if (err1) { console.log(err1); return; }
res.send(doc1)
});
});
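The retry loop above can spin for a long time when the requested number is close to the collection size, and never ends when it is larger. One alternative, swapped in here as a sketch and not the answer's own method, is a partial Fisher-Yates shuffle over the id array; `sampleWithoutReplacement` is a hypothetical name.

```javascript
// Returns `n` distinct elements of `items` chosen uniformly at random.
// Works on a copy, so the input array is not modified.
function sampleWithoutReplacement(items, n, rng = Math.random) {
  const pool = items.slice();
  const k = Math.min(n, pool.length);
  for (let i = 0; i < k; i++) {
    // Choose an index from the not-yet-selected tail [i, pool.length).
    const j = i + Math.floor(rng() * (pool.length - i));
    [pool[i], pool[j]] = [pool[j], pool[i]];
  }
  return pool.slice(0, k);
}

console.log(sampleWithoutReplacement(['a', 'b', 'c', 'd', 'e'], 3));
```

The selected ids could then go into a single `{ _id: { $in: ... } }` query.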

The best way in Mongoose is to make an aggregation call with $sample.
However, aggregation results are plain objects rather than Mongoose documents, which matters especially if populate() is to be applied as well.
For getting a "lean" array from the database:
/*
Sample model should be init first
const Sample = mongoose …
*/
const samples = await Sample.aggregate([
{ $match: {} },
{ $sample: { size: 33 } },
]).exec();
console.log(samples); //a lean Array
For getting an array of mongoose documents:
const samples = (
await Sample.aggregate([
{ $match: {} },
{ $sample: { size: 27 } },
{ $project: { _id: 1 } },
]).exec()
).map(v => v._id);
const mongooseSamples = await Sample.find({ _id: { $in: samples } });
console.log(mongooseSamples); //an Array of mongoose documents

I would suggest using map/reduce, where you use the map function to only emit when a random value is above a given probability.
function mapf() {
if(Math.random() <= probability) {
emit(1, this);
}
}
function reducef(key,values) {
return {"documents": values};
}
res = db.questions.mapReduce(mapf, reducef, {"out": {"inline": 1}, "scope": { "probability": 0.5}});
printjson(res.results);
The reducef function above works because only one key ('1') is emitted from the map function.
The value of the "probability" is defined in the "scope", when invoking mapReduce(...)
Using mapReduce like this should also be usable on a sharded db.
If you want to select exactly n of m documents from the db, you could do it like this:
function mapf() {
if(countSubset == 0) return;
var prob = countSubset / countTotal;
if(Math.random() <= prob) {
emit(1, {"documents": [this]});
countSubset--;
}
countTotal--;
}
function reducef(key,values) {
var newArray = new Array();
for(var i=0; i < values.length; i++) {
newArray = newArray.concat(values[i].documents);
}
return {"documents": newArray};
}
res = db.questions.mapReduce(mapf, reducef, {"out": {"inline": 1}, "scope": {"countTotal": 4, "countSubset": 2}})
printjson(res.results);
Where "countTotal" (m) is the number of documents in the db, and "countSubset" (n) is the number of documents to retrieve.
This approach might give some problems on sharded databases.
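The "exactly n of m" map function is a form of selection sampling: each document is kept with probability (still needed) / (still remaining). A plain JavaScript sketch of the same logic, outside mapReduce and with the hypothetical name `selectExactly`, shows why it always yields exactly n items:

```javascript
// Selection sampling: walks the items once and keeps exactly `n` of them
// (or all of them when fewer than `n` exist).
function selectExactly(items, n, rng = Math.random) {
  let needed = Math.min(n, items.length);
  let remaining = items.length;
  const picked = [];
  for (const item of items) {
    if (needed === 0) break;
    // Keep this item with probability needed / remaining.
    // When needed === remaining the ratio is 1, so every later item is kept.
    if (rng() < needed / remaining) {
      picked.push(item);
      needed--;
    }
    remaining--;
  }
  return picked;
}

console.log(selectExactly(['q1', 'q2', 'q3', 'q4'], 2));
```

The counters are shared mutable state, which is presumably why the mapReduce version is fragile on a sharded cluster: each shard would count on its own.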

You can pick random _id and return corresponding object:
db.collection.count( function(err, count){
db.collection.distinct( "_id" , function( err, result) {
if (err)
return res.send(err)
var randomId = result[Math.floor(Math.random() * result.length)]
db.collection.findOne( { _id: randomId } , function( err, result) {
if (err)
return res.send(err)
console.log(result)
})
})
})
Here you don't need to spend space on storing random numbers in the collection.

The following aggregation operation randomly selects 3 documents from the collection:
db.users.aggregate(
[ { $sample: { size: 3 } } ]
)
https://docs.mongodb.com/manual/reference/operator/aggregation/sample/

MongoDB now has $rand
To pick n non-repeating items, aggregate with { $addFields: { _f: { $rand: {} } } }, then $sort by _f and $limit n.

I'd suggest adding a random int field to each object. Then you can just do a
findOne({random_field: {$gte: rand()}})
to pick a random document. Just make sure you ensureIndex({random_field:1})

When I was faced with a similar problem, I backtracked and found that the business request was actually for creating some form of rotation of the inventory being presented. In that case, there are much better options, which have answers from search engines like Solr, not data stores like MongoDB.
In short, with the requirement to "intelligently rotate" content, what we should do instead of a random number across all of the documents is to include a personal q score modifier. To implement this yourself, assuming a small population of users, you can store a document per user that has the productId, impression count, click-through count, last seen date, and whatever other factors the business finds as being meaningful to compute a q score modifier. When retrieving the set to display, typically you request more documents from the data store than requested by the end user, then apply the q score modifier, take the number of records requested by the end user, then randomize the page of results, a tiny set, so simply sort the documents in the application layer (in memory).
If the universe of users is too large, you can categorize users into behavior groups and index by behavior group rather than user.
If the universe of products is small enough, you can create an index per user.
I have found this technique to be much more efficient, but more importantly more effective in creating a relevant, worthwhile experience of using the software solution.

None of the solutions worked well for me, especially when there are many gaps and the set is small.
This worked very well for me (in PHP):
$count = $collection->count($search);
$skip = mt_rand(0, $count - 1);
$result = $collection->find($search)->skip($skip)->limit(1)->getNext();

My PHP/MongoDB sort/order by RANDOM solution. Hope this helps anyone.
Note: I have numeric ID's within my MongoDB collection that refer to a MySQL database record.
First I create an array with 10 randomly generated numbers
$randomNumbers = [];
for($i = 0; $i < 10; $i++){
$randomNumbers[] = rand(0,1000);
}
In my aggregation I use the $addFields pipeline operator combined with $arrayElemAt and $mod (modulus). The modulus operator will give me a number from 0 - 9 which I then use to pick a number from the array with random generated numbers.
$aggregate[] = [
'$addFields' => [
'random_sort' => [ '$arrayElemAt' => [ $randomNumbers, [ '$mod' => [ '$my_numeric_mysql_id', 10 ] ] ] ],
],
];
After that you can use the sort Pipeline.
$aggregate[] = [
'$sort' => [
'random_sort' => 1
]
];

My simplest solution to this ...
db.coll.find()
.limit(1)
.skip(Math.floor(Math.random() * 500))
.next()
This assumes the collection has at least 500 items.

If you have a simple id key, you could store all the id's in an array, and then pick a random id. (Ruby answer):
ids = @coll.find({},fields:{_id:1}).to_a
@coll.find(ids.sample).first

Using Map/Reduce, you can certainly get a random record, just not necessarily very efficiently depending on the size of the resulting filtered collection you end up working with.
I've tested this method with 50,000 documents (the filter reduces it to about 30,000), and it executes in approximately 400ms on an Intel i3 with 16GB ram and a SATA3 HDD...
db.toc_content.mapReduce(
/* map function */
function() { emit( 1, this._id ); },
/* reduce function */
function(k,v) {
var r = Math.floor((Math.random()*v.length));
return v[r];
},
/* options */
{
out: { inline: 1 },
/* Filter the collection to "A"ctive documents */
query: { status: "A" }
}
);
The Map function simply creates an array of the id's of all documents that match the query. In my case I tested this with approximately 30,000 out of the 50,000 possible documents.
The Reduce function simply picks a random integer between 0 and the number of items (-1) in the array, and then returns that _id from the array.
400ms sounds like a long time, and it really is, if you had fifty million records instead of fifty thousand, this may increase the overhead to the point where it becomes unusable in multi-user situations.
There is an open issue for MongoDB to include this feature in the core... https://jira.mongodb.org/browse/SERVER-533
If this "random" selection was built into an index-lookup instead of collecting ids into an array and then selecting one, this would help incredibly. (go vote it up!)

This works nice, it's fast, works with multiple documents and doesn't require populating rand field, which will eventually populate itself:
add index to .rand field on your collection
use find and refresh, something like:
// Install packages:
// npm install mongodb async
// Add index in mongo:
// db.ensureIndex('mycollection', { rand: 1 })
var mongodb = require('mongodb')
var async = require('async')
// Find n random documents by using "rand" field.
function findAndRefreshRand (collection, n, fields, done) {
var result = []
var rand = Math.random()
// Append documents to the result based on criteria and options, if options.limit is 0 skip the call.
var appender = function (criteria, options, done) {
return function (done) {
if (options.limit > 0) {
collection.find(criteria, fields, options).toArray(
function (err, docs) {
if (!err && Array.isArray(docs)) {
Array.prototype.push.apply(result, docs)
}
done(err)
}
)
} else {
async.nextTick(done)
}
}
}
async.series([
// Fetch docs with uninitialized .rand.
// NOTE: You can comment out this step if all docs have initialized .rand = Math.random()
appender({ rand: { $exists: false } }, { limit: n - result.length }),
// Fetch on one side of random number.
appender({ rand: { $gte: rand } }, { sort: { rand: 1 }, limit: n - result.length }),
// Continue fetch on the other side.
appender({ rand: { $lt: rand } }, { sort: { rand: -1 }, limit: n - result.length }),
// Refresh fetched docs, if any.
function (done) {
if (result.length > 0) {
var batch = collection.initializeUnorderedBulkOp({ w: 0 })
for (var i = 0; i < result.length; ++i) {
batch.find({ _id: result[i]._id }).updateOne({ $set: { rand: Math.random() } })
}
batch.execute(done)
} else {
async.nextTick(done)
}
}
], function (err) {
done(err, result)
})
}
// Example usage
mongodb.MongoClient.connect('mongodb://localhost:27017/core-development', function (err, db) {
if (!err) {
findAndRefreshRand(db.collection('profiles'), 1024, { _id: true, rand: true }, function (err, result) {
if (!err) {
console.log(result)
} else {
console.error(err)
}
db.close()
})
} else {
console.error(err)
}
})
ps. How to find random records in mongodb question is marked as duplicate of this question. The difference is that this question asks explicitly about single record as the other one explicitly about getting random documents.

For me, I wanted to get the same records, in a random order, so I created an empty array used to sort, then generated random numbers between one and 7( I have seven fields). So each time I get a different value, I assign a different random sort.
It is 'layman' but it worked for me.
//generate a random integer from 1 to 7 (one per sortable field)
const randomval = Math.floor(Math.random() * 7) + 1;
//declare sort array and initialize to empty
const sort = [];
//write a conditional if else to get to decide which sort to use
if(randomval == 1)
{
sort.push(...['createdAt',1]);
}
else if(randomval == 2)
{
sort.push(...['_id',1]);
}
....
else if(randomval == n)
{
sort.push(...['n',1]);
}

If you're using mongoid, the document-to-object wrapper, you can do the following in
Ruby. (Assuming your model is User)
User.all.to_a[rand(User.count)]
In my .irbrc, I have
def rando klass
klass.all.to_a[rand(klass.count)]
end
so in rails console, I can do, for example,
rando User
rando Article
to get documents randomly from any collection.

you can also use shuffle-array after executing your query
var shuffle = require('shuffle-array');
Accounts.find(qry,function(err,results_array){
newIndexArr=shuffle(results_array);
});

What works efficiently and reliably is this:
Add a field called "random" to each document and assign a random value to it, add an index for the random field and proceed as follows:
Let's assume we have a collection of web links called "links" and we want a random link from it:
link = db.links.find().sort({random: 1}).limit(1)[0]
To ensure the same link won't pop up a second time, update its random field with a new random number:
db.links.update({_id: link._id}, {$set: {random: Math.random()}})

Related

Mongoose get a random element except one

I have a collection of articles in mongodb. I choose an article that i want to render, and I want two other articles chosen randomly. I want to pick two articles in my collection that are not the same, and are not the article I have chosen before.
Been on this problem for hours, search for a solution but only found how to pick an element randomly, but not except one...
Here is what I have now :
article.find({}, function(err, articles){
var articleChosen = articles.filter(selectArticleUrl, articleUrl)[0];
article.find({})
.lean()
.distinct("_id")
.exec(function(err, arrayIds){
var articleChosenIndex = arrayIds.indexOf(articleChosen._id);
arrayIds.splice(articleChosenIndex, 1);
chooseRdmArticle(arrayIds, function(articleRdm1Id){
var articleRmd1 = articles.filter(selectArticleId, articleRdm1Id)[0];
var articleRdm1Index = arrayIds.indexOf(articleRdm1Id);
arrayIds.splice(articleRdm1Index, 1);
chooseRdmArticle(arrayIds, function(articleRdm2Id){
var articleRmd2 = articles.filter(selectArticleId, articleRdm2Id)[0];
// do stuff with articleChosen, articleRmd1 and articleRmd2
})
})
})
})
where the function which choose rdm article is :
function chooseRdmArticle(articles, callback){
var min = Math.ceil(0);
var max = Math.floor(articles.length);
var rdm = Math.floor(Math.random() * (max - min)) + min;
callback(articles[rdm])
}
and the function which select the article from its url is :
function selectArticleUrl(element){
return element.url == this
}
My idea was to work on the array containing all the ObjectId (arrayIds here), to choose two Ids randomly after removing the articleChosen id. But I understood that arrayIds.indexOf(articleRdm1Id); couldn't work because ObjectIds are not strings ... Is there a method to find the index of the Id I want? Or any better idea ?
Thanks a lot !

Run two queries where the first fetches the chosen document and the other uses the aggregation framework to run a pipeline with the $sample operator to return 2 random documents from the collection except the chosen one.
The following query uses Mongoose's built-in Promises to demonstrate this:
let chosenArticle = article.find({ "url": articleUrl }).exec();
let randomArticles = article.aggregate([
{ "$match": { "url": { "$ne": articleUrl } } },
{ "$sample": { "size": 2 } }
]).exec();
Promise.all([chosenArticle, randomArticles]).then(articles => {
console.log(articles);
});
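On the narrower question of locating an ObjectId inside an array: indexOf() compares object references, so two ObjectId instances holding the same value do not match. Comparing their string forms does. The sketch below uses a stand-in class (`FakeId`, hypothetical) so that it runs without a database; `indexOfId` is likewise an illustrative name.

```javascript
// Stand-in for an ObjectId: an object whose toString() yields its hex value.
class FakeId {
  constructor(hex) { this.hex = hex; }
  toString() { return this.hex; }
}

// Finds the position of `target` by value rather than by reference.
function indexOfId(ids, target) {
  const wanted = String(target);
  return ids.findIndex(id => String(id) === wanted);
}

const ids = [new FakeId('aa'), new FakeId('bb'), new FakeId('cc')];
console.log(ids.indexOf(new FakeId('bb')));    // reference comparison
console.log(indexOfId(ids, new FakeId('bb'))); // value comparison
```

The driver's ObjectId also has an equals() method that can be used inside findIndex() for the same purpose.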

There is the MongoDB $sample aggregation stage, which returns documents selected at random.
Example from the documentation :
db.users.aggregate( [ { $sample: { size: 3 } } ] )

I had the same problem and this works for me
const suggestedArticles = await Article.find({
articleId: { $ne: req.params.articleId },
}).limit(2);

How to define a sort function in Mongoose

I'm developing a small NodeJS web app using Mongoose to access my MongoDB database. A simplified schema of my collection is given below:
var MySchema = mongoose.Schema({
content: { type: String },
location: {
lat: { type: Number },
lng: { type: Number },
},
modifierValue: { type: Number }
});
Unfortunately, I'm not able to sort the retrieved data from the server the way it is more convenient for me. I wish to sort my results according to their distance from a given position (location) but taking into account a modifier function with a modifierValue that is also considered as an input.
What I intend to do is written below. However, this sort of sort functionality seems to not exist.
MySchema.find({})
.sort( modifierFunction(location,this.location,this.modifierValue) )
.limit(20) // I only want the 20 "closest" documents
.exec(callback)
The modifierFunction returns a Double.
So far, I've studied the possibility of using mongoose's $near function, but this doesn't seem to sort, nor allow for a modifier function.
Since I'm fairly new to node.js and mongoose, I may be taking a completely wrong approach to my problem, so I'm open to complete redesigns of my programming logic.
Thank you in advance,

You might have found an answer to this already given the question date, but I'll answer anyway.
For more advanced sorting algorithms you can do the sorting in the exec callback. For example
MySchema.find({})
.limit(20)
.exec(function(err, instances) {
// Boilerplate output that has nothing to do with the sorting.
let response = { };
if (err) {
response = handleError(err);
} else {
response.status = HttpStatus.OK;
response.message = mySort(instances); // Sorting here, only when the query succeeded
}
res.status(response.status).json(response.message);
})
mySort() has the found array from the query execution as input and the sorted array as output. It could for instance be something like this
function mySort (array) {
array.sort(function (a, b) {
let distanceA = Math.sqrt(a.location.lat**2 + a.location.lng**2);
let distanceB = Math.sqrt(b.location.lat**2 + b.location.lng**2);
if (distanceA < distanceB) {
return -1;
} else if (distanceA > distanceB) {
return 1;
} else {
return 0;
}
})
return array;
}
This sorting algorithm is just an illustration of how sorting could be done. You would of course have to write the proper algorithm yourself. Remember that the result of the query is an array that you can manipulate as you want. array.sort() is your friend. You can find information about it here.
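To tie this back to the question, the comparator can be built from a reference position and a modifier function. The sketch below keeps the question's call shape `modifierFunction(location, doc.location, doc.modifierValue)`, but the formula inside it (planar distance times modifierValue) is an assumption made up for illustration, as is the name `closestByScore`.

```javascript
// Illustrative modifier: planar distance scaled by modifierValue.
// The real formula is not given in the question; this one is an assumption.
function modifierFunction(from, to, modifierValue) {
  const dLat = to.lat - from.lat;
  const dLng = to.lng - from.lng;
  return Math.sqrt(dLat * dLat + dLng * dLng) * modifierValue;
}

// Returns a new array with the `limit` lowest-scoring documents, lowest first.
function closestByScore(docs, position, limit) {
  return docs
    // Compute each score once instead of on every comparison.
    .map(doc => ({ doc, score: modifierFunction(position, doc.location, doc.modifierValue) }))
    .sort((a, b) => a.score - b.score)
    .slice(0, limit)
    .map(entry => entry.doc);
}

const docs = [
  { content: 'far', location: { lat: 10, lng: 0 }, modifierValue: 1 },
  { content: 'near', location: { lat: 1, lng: 0 }, modifierValue: 1 },
];
console.log(closestByScore(docs, { lat: 0, lng: 0 }, 20).map(d => d.content));
```

With this approach the limit has to be applied after sorting; a .limit(20) on the query itself would sort only an arbitrary 20 documents.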

Add an array as a subdocument to a Mongoose model instance

I am building an array with data and want to push that array to sub-document.
var pubArray = [];
var count = 5
for (i = 0; i < count; i++) {
pubArray.push({publicationName: req.body.publicationName[i], dateSent:req.body.dateSent[i]});
};
Students.findOne({studentNumber: filter}, function (err, student) {
student.publications.push({pubArray});
student.save();
});
If I use the {publicationName: req.body.publicationName[i], dateSent:req.body.dateSent[i]} inside the student.publications.push it works fine. If I try to push the array, nothing happens.

Note that the .push() method in Mongoose works just like its JavaScript equivalent in that it is "pushing" a single element onto the array, rather than a whole array. So you can either assign the whole array or just construct in the loop:
student.publications = pubArray;
or:
// Construct with .push in loop:
Students.findOne({ "studentNumber": filter },function(err,student) {
for ( var i = 0; i < count; i++ ) {
student.publications.push({
"publicationName": req.body.publicationName[i],
"dateSent": req.body.dateSent[i]
});
}
student.save(function(err) {
// Complete
});
});
But really you would be better off using an "atomic" operator of $push with $each in a direct update. This is then just one trip to the server, rather than two:
Students.update(
{ "studentNumber": filter },
{ "$push": { "publications": { "$each": pubArray } } },
function(err,numAffected) {
}
);
That is generally worlds better than the "find/modify/save" pattern, and not only in being more efficient, but it also avoids potential conflicts or overwriting data since the object and array is modified "in-place" in the database, with the state current to the time of modification.
Atomic operators should always be favoured for the performance benefits as well as lack of conflicts in modification.
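For reference, the update document for the $push/$each form can be built from the request body in one small function. This is a sketch: `buildPublicationsUpdate` is a hypothetical name, and it takes the item count from the shorter of the two arrays rather than from the fixed count of 5 used in the question, which is an assumption.

```javascript
// Builds { $push: { publications: { $each: [...] } } } from the parallel
// publicationName / dateSent arrays in the request body.
function buildPublicationsUpdate(body) {
  const count = Math.min(body.publicationName.length, body.dateSent.length);
  const pubArray = [];
  for (let i = 0; i < count; i++) {
    pubArray.push({
      publicationName: body.publicationName[i],
      dateSent: body.dateSent[i],
    });
  }
  return { $push: { publications: { $each: pubArray } } };
}

const body = { publicationName: ['A', 'B'], dateSent: ['2014-01-01', '2014-02-01'] };
console.log(JSON.stringify(buildPublicationsUpdate(body)));
```

The result would be passed as the second argument of the update call shown above.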

The publications property of the student object is an Array. You can simply assign this property to the pubArray created earlier:
Students.findOne({studentNumber: filter}, function (err, student) {
student.publications = pubArray;
student.save();
});

Shuffle sub documents in mongoose query

I have following models:
Question Model
var OptionSchema = new Schema({
correct : {type:Boolean, default:false},
value : String
});
var QuestionSchema = new Schema({
value : String
, choices : [OptionSchema]
, quiz : {type:ObjectId, ref:'quizzes'}
, createdOn : {type:Date, default:Date.now}
...
});
var Question = mongoose.model('questions', QuestionSchema);
Quiz Model
var QuizSchema = new Schema({
name : String
, questions : [{type:ObjectId, ref:'questions'}]
,company : {type:ObjectId, ref:'companies'}
...
});
var Quiz = mongoose.model('quizzes', QuizSchema);
Company Model
var CompanySchema = new Schema({
name :String
...
});
I want to shuffle choices of each question per each query, and I am doing It as follows :
shuffle = function(v){
//+ Jonas Raoni Soares Silva
//# http://jsfromhell.com/array/shuffle [rev. #1]
for(var j, x, i = v.length; i; j = parseInt(Math.random() * i), x = v[--i], v[i] = v[j], v[j] = x);
return v;
};
app.get('/api/companies/:companyId/quizzes', function(req, res){
var Query = Quiz.find({company:req.params.companyId});
Query.populate('questions');
Query.exec(function(err, docs){
docs.forEach(function(doc) {
doc.questions.forEach(function(question) {
question.choices = shuffle(question.choices);
})
});
res.json(docs);
});
});
My Question is :
Could I randomize the choices array without looping through all documents as now I am doing?

shuffle = function(v){
//+ Jonas Raoni Soares Silva
//# http://jsfromhell.com/array/shuffle [rev. #1]
for(var j, x, i = v.length; i; j = parseInt(Math.random() * i), x = v[--i], v[i] = v[j], v[j] = x);
return v;
};
app.get('/api/companies/:companyId/quizzes', function(req, res){
var Query = Quiz.find({company:req.params.companyId});
Query.populate('questions');
Query.exec(function(err, docs){
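// find() returns an array of documents, so convert each one to a plain object
var raw = docs.map(function(doc) { return doc.toObject(); });
raw.forEach(function(quiz) {
//shuffle choices
quiz.questions.forEach(el => shuffle(el.choices));
//if you need to shuffle the questions too
shuffle(quiz.questions);
//if you need to limit the output questions, especially when output questions needs to be a subset of a pool of questions
//(limit is a number you define yourself)
quiz.questions.splice(limit);
});
res.json(raw); // output quizzes with shuffled questions and answers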
});
});
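The one-line shuffle is compact but hard to read and changes its input in place. A more explicit Fisher-Yates version that returns copies might look like the sketch below; `shuffled` and `withShuffledChoices` are hypothetical names, and the quiz is assumed to be a plain object (for example the result of toObject()).

```javascript
// Fisher-Yates shuffle that returns a new array and leaves the input untouched.
function shuffled(values, rng = Math.random) {
  const out = values.slice();
  for (let i = out.length - 1; i > 0; i--) {
    // Swap position i with a random position in [0, i].
    const j = Math.floor(rng() * (i + 1));
    [out[i], out[j]] = [out[j], out[i]];
  }
  return out;
}

// Returns a copy of a plain quiz object with every question's choices shuffled.
function withShuffledChoices(quiz, rng = Math.random) {
  return {
    ...quiz,
    questions: quiz.questions.map(q => ({ ...q, choices: shuffled(q.choices, rng) })),
  };
}

const quiz = { name: 'demo', questions: [{ value: 'Q1', choices: ['a', 'b', 'c', 'd'] }] };
console.log(JSON.stringify(withShuffledChoices(quiz)));
```

This still visits every question once; it only makes the loop explicit and free of side effects.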

The essence of the question comes down to "Can I randomly shuffle results and have MongoDB do the work for me?". Well yes you can, but the important thing to remember here is that "populate" is no longer going to be your friend in helping you do so, and you will need to perform the work it does yourself.
The short part of this is we are going to "hand-off" your client side "shuffle" to mapReduce in order to process the shuffling of the "choices" on the server. Just for kicks, I'm adding in a technique to shuffle your "questions" as well:
var Query = Quiz.findOne({ company: "5382a58bb7ea27c9301aa9df" });
Query.populate('company', 'name -_id');
Query.exec(function(err,quiz) {
var shuffle = function(v) {
for(var j, x, i = v.length; i; j = parseInt(Math.random() * i), x = v[--i], v[i] = v[j], v[j] = x);
};
if (err)
throw err;
var raw = quiz.toObject();
shuffle( raw.questions );
Question.mapReduce(
{
map: function() {
shuffle( this.choices );
var found = -1;
for ( var n=0; n<inputs.length; n++ ) {
if ( this._id.toString() == inputs[n].toString() ) {
found = n;
break;
}
}
emit( found, this );
},
reduce: function() {},
scope: { inputs: raw.questions, shuffle: shuffle },
query: { "_id": { "$in": raw.questions } }
},
function(err,results) {
if (err)
throw err;
raw.questions = results.map(function(x) {
return x.value;
});
console.log( JSON.stringify( raw, undefined, 4 ) );
}
);
});
So the essential part of this is rather than allowing "populate" to pull all the related question information into your schema object, you are doing a manual replacement using mapReduce.
Note that the "schema document" must be converted to a plain object which is done by the .toObject() call in there in order to allow us to replace "questions" with something that would not match the schema type.
We give mapReduce a query to select the required questions from the model by simply passing in the "questions" array as an argument to match on _id. Really nothing directly different to what "populate" does for you behind the scenes, it's just that we are going to handle the "merge" manually.
The "shuffle" function is now executed on the server. Because it was declared as a var, we can easily pass it in via the "scope", and the "choices" array will be shuffled before it is emitted and eventually returned.
The other option, as I said, is that we are also "shuffling" the questions. This is done simply by calling "shuffle" on the _id values of the "questions" array and then passing the result into the "scope". Note that this array is also passed to the query via $in, but that alone does not guarantee the return order.
The trick employed here is that mapReduce hands the emitted keys to later stages in ascending order. So by looking up the position of the current _id value in the "inputs" array from scope and emitting that index as the "key", the results respect the order of the shuffle already done.
The "merging" is then quite simple, as we just replace the "questions" array with the values returned from mapReduce. The .map() Array function helps to clean up the results from the way mapReduce returns things.
Aside from the fact that your "choices" are now actually shuffled on the server rather than through a loop, this should give you ideas of how to "custom populate" for other functions such as "slicing" and "paging" the array of referenced "questions", if that is something else you might want to look at.
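The ordering trick described above (using each document's position in an already shuffled list of ids as its sort key) can also be sketched in plain JavaScript on the client, without mapReduce. The helper name orderByIds and the sample data are hypothetical; this only illustrates the positional-key idea and is not the answer's server-side code.

```javascript
// Hypothetical helper: arrange `docs` so they follow the order of `ids`.
// Ids are compared as strings, as the map function above does with toString().
function orderByIds(docs, ids) {
  const position = new Map();
  ids.forEach(function (id, index) {
    position.set(String(id), index);
  });
  return docs
    .filter(function (doc) { return position.has(String(doc._id)); })
    .sort(function (a, b) {
      return position.get(String(a._id)) - position.get(String(b._id));
    });
}

// Example: documents come back from an $in query in arbitrary order.
const fetched = [{ _id: 'a' }, { _id: 'b' }, { _id: 'c' }];
const ordered = orderByIds(fetched, ['c', 'a', 'b']);
```

Because filter returns a new array, the fetched array itself is not reordered.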

Creating incrementing numbers with mongoDB

We have an order system where every order has an id. For accounting purposes we need a way to generate invoices with incrementing numbers. What is the best way to do this without using an SQL database?
We are using node to implement the application.

http://www.mongodb.org/display/DOCS/How+to+Make+an+Auto+Incrementing+Field
The first approach is keeping counters in a side document:
One can keep a counter of the current _id in a side document, in a
collection dedicated to counters. Then use FindAndModify to atomically
obtain an id and increment the counter.
The other approach is to loop optimistically and handle the duplicate key error (code 11000) by incrementing the id and retrying in the edge case of a collision. That works well unless there are highly concurrent writes to a specific collection.
One can do it with an optimistic concurrency "insert if not present"
loop.
But be aware of the warning on that page:
Generally in MongoDB, one does not use an auto-increment pattern for
_id's (or other fields), as this does not scale up well on large database clusters. Instead one typically uses Object IDs.
Other things to consider:
Timestamp - unique long but not incrementing (based on the epoch)
Hybrid Approach - apps don't necessarily have to pick one storage option.
Come up with your id mechanism based on things like customer, date/time parts etc... that you generate and handle collisions for. Depending on the scheme, collisions can be much less likely. Not necessarily incrementing but is unique and has a well defined readable pattern.
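The side-document counter approach quoted above might look roughly like this with the official Node.js driver. This is a sketch under assumptions: the helper name nextSequence, the counter document shape { _id, seq }, and the counter name are illustrative, and the returnDocument option assumes driver version 4 or later. The driver's findOneAndUpdate plays the role of FindAndModify here.

```javascript
// Hypothetical helper: `counters` is assumed to be a collection handle from
// the official MongoDB Node.js driver, dedicated to counter documents.
async function nextSequence(counters, name) {
  const result = await counters.findOneAndUpdate(
    { _id: name },
    { $inc: { seq: 1 } },                      // atomic increment on the server
    { upsert: true, returnDocument: 'after' }  // create the counter on first use
  );
  // Some driver versions wrap the document in { value: ... },
  // others return the document directly; accept both shapes.
  const doc = result && result.value !== undefined ? result.value : result;
  return doc.seq;
}

// Usage (assuming `db` is a connected Db instance):
// const invoiceNumber = await nextSequence(db.collection('counters'), 'invoice');
```

Since the increment and the read happen in one server-side operation, two concurrent callers should not receive the same number.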

I did not find any working solution, so I implemented the "optimistic loop" in node.js to get auto-incrementing integer ID fields. It uses the async module to implement the while loop.
// Insert the document to the targetCollection. Use auto-incremented integer IDs instead of UIDs.
function insertDocument(targetCollection, document, callback) {
var keepRunning = true;
var seq = 1;
// $type 16/18: Integer Values
var isNumericQuery = {$or : [{"_id" : { $type : 16 }}, {"_id" : { $type : 18 }}]};
async.whilst(testFunction, mainFunction, afterFinishFunction);
// Called before each execution of mainFunction(). Works like the stop criteria of a while function.
function testFunction() {
return keepRunning;
}
// Called each time the testFunction() passes. It is passed a function (next) which must be called after it has completed.
function mainFunction(next) {
findCursor(targetCollection, findCursorCallback, isNumericQuery, { _id: 1 });
function findCursorCallback(cursor) {
cursor.sort( { _id: -1 } ).limit(1);
cursor.each(cursorEachCallback);
}
function cursorEachCallback(err, doc) {
if (err) console.error("ERROR: " + err);
if (doc != null) {
seq = doc._id + 1;
document._id = seq;
targetCollection.insert(document, insertCallback);
}
if (seq === 1) {
document._id = 1;
targetCollection.insert(document, insertCallback);
}
}
function insertCallback(err, result) {
if (err) {
console.dir(err);
}
else {
keepRunning = false;
}
next();
}
}
// Called once after the testFunction() fails and the loop has ended.
function afterFinishFunction(err) {
callback(err, null);
}
}
// Call find() with optional query and projection criteria and return the cursor object.
function findCursor(collection, callback, optQueryObject, optProjectionObject) {
if (optProjectionObject === undefined) {
optProjectionObject = {};
}
var cursor = collection.find(optQueryObject, optProjectionObject);
callback(cursor);
}
Call with
insertDocument(db.collection(collectionName), documentToSave, function(err) {if(err) console.error(err);});
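For comparison, the same optimistic loop can be sketched more compactly with async/await. This is not the author's code: the function name insertWithIncrementingId and the maxAttempts parameter are assumptions, as is the use of the official Node.js driver's find, sort, limit, toArray and insertOne on a collection handle. It retries only on the duplicate key error code 11000 mentioned earlier.

```javascript
// Hypothetical helper: insert `doc` under the next free integer _id.
async function insertWithIncrementingId(collection, doc, maxAttempts = 10) {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    // Look up the highest numeric _id currently stored.
    const found = await collection
      .find({ _id: { $type: 'number' } }, { projection: { _id: 1 } })
      .sort({ _id: -1 })
      .limit(1)
      .toArray();
    const nextId = found.length > 0 ? found[0]._id + 1 : 1;
    const candidate = Object.assign({}, doc, { _id: nextId });
    try {
      await collection.insertOne(candidate);
      return nextId;
    } catch (err) {
      // 11000 = duplicate key: another writer took this id, so try again.
      if (err.code !== 11000) throw err;
    }
  }
  throw new Error('Could not allocate an id after ' + maxAttempts + ' attempts');
}
```

Capping the number of attempts keeps the loop from spinning indefinitely under heavy write contention.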

I am looking to get a random record from a huge collection (100 million records). What is the fastest and most efficient way to do so? The data is already there and there is no field in which I can generate a random number and obtain a random row.

Do a count of all records, generate a random number between 0 and the count, and then do:
db.yourCollection.find().limit(-1).skip(yourRandomNumber).next()

Now you can use the aggregate. Example:
db.users.aggregate( [ { $sample: { size: 3 } } ] )
See the doc.

In Python using pymongo:
import random
def get_random_doc():
    count = collection.count()
    return collection.find()[random.randrange(count)]

Using Python (pymongo), the aggregate function also works:
collection.aggregate([{'$sample': {'size': sample_size }}])
This approach is a lot faster than running a query for a random index (e.g. collection.find()[random_int]). This is especially the case for large collections.

The following aggregation operation randomly selects 3 documents from the collection:
db.users.aggregate( [ { $sample: { size: 3 } } ] )
https://docs.mongodb.com/manual/reference/operator/aggregation/sample/

MongoDB now has $rand. To pick n non-repeating items, aggregate with { $addFields: { _f: { $rand: {} } } }, then $sort by _f and $limit n.
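Spelled out as a full pipeline, the $rand approach could look like the sketch below. The function name randomSamplePipeline is an illustrative assumption, and the final $project stage, which hides the temporary _f field, is an addition not mentioned in the answer.

```javascript
// Builds an aggregation pipeline that returns n distinct documents in random order.
function randomSamplePipeline(n) {
  return [
    { $addFields: { _f: { $rand: {} } } }, // attach a random number to each document
    { $sort: { _f: 1 } },                  // order documents by that number
    { $limit: n },                         // keep the first n
    { $project: { _f: 0 } }                // drop the temporary field (an addition)
  ];
}

// Usage (assuming `collection` is a driver collection handle):
// const docs = await collection.aggregate(randomSamplePipeline(3)).toArray();
```

Note that sorting on a computed field has to touch every document, so this is likely to be slow on very large collections.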

I'd suggest adding a random int field to each object. Then you can just do a findOne({random_field: {$gte: rand()}}) to pick a random document. Just make sure you ensureIndex({random_field:1})

None of the solutions worked well for me, especially when there are many gaps and the set is small. This worked very well for me (in PHP):
$count = $collection->count($search);
$skip = mt_rand(0, $count - 1);
$result = $collection->find($search)->skip($skip)->limit(1)->getNext();

My simplest solution to this:
db.coll.find().limit(1).skip(Math.floor(Math.random() * 500)).next()
This assumes you have at least 500 items in the collection.

If you have a simple id key, you could store all the ids in an array, and then pick a random id. (Ruby answer):
ids = @coll.find({},fields:{_id:1}).to_a
@coll.find(ids.sample).first

You can also use shuffle-array after executing your query:
var shuffle = require('shuffle-array');
Accounts.find(qry,function(err,results_array){
newIndexArr=shuffle(results_array);
});
