Delete documents from Collection with limit & skip - MongoDB & Mongoose - node.js

Mongoose 6.6.1, MongoDB 6.0.1
Our team uses Mongo/Mongoose every day - but just the basics. I experiment with more advanced features, so I got a reputation as the local "Mongoose Guru" - but in the real world, I'm so not.
I clean up our log collections with a deleteMany for docs older than a week - no problem, simple filter.
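Something like this (model and field names are just illustrative):

// delete log docs older than 7 days; createdAt is our timestamp field
const cutoff = new Date(Date.now() - 7 * 24 * 60 * 60 * 1000);
await LogModel.deleteMany({ createdAt: { $lt: cutoff } });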
But now I just want to keep the last 1000 docs & delete the rest - assumed it would be trivial, but how?
Mongoose 6.6.1 documents deleteMany methods on two object types: Model & Query
MyModel.deleteMany([filter], [options]) calls the deleteMany method on the Model -
MyModel.find().deleteMany([filter], [options]) calls it on the Query.
Slightly different syntax
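For example (the filter here is just for illustration):

// Model form: filter passed directly
await MyModel.deleteMany({ level: 'debug' });

// Query form: chained; Query#deleteMany merges its filter with the query's conditions
await MyModel.find({ level: 'debug' }).deleteMany();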
Since queries are chainable, my first assumption was that query.deleteMany() WITHOUT ANY ARGUMENTS would only delete the documents that were already filtered by the previous query. I played with both skip & limit - similar results.
MyModel.find() is a Query that matches all docs - so deleteMany removes them all
MyModel.find().limit(10) matches 10 docs.
Logically I assumed MyModel.find().limit(10).deleteMany(); should delete only those 10
BUT - that generates an error:
MongoServerError: The limit field in delete objects must be 0 or 1. Got 10
I can hack that but it's messy and I like elegant - am I missing something about deleteMany?
Just in case anyone is interested, the hack:
// keep only the newest n docs (by _id), delete the rest
async function keepLast(myModel, n = 10) {
  // the (n+1)th-newest doc - the newest one we want to delete
  const edge = await myModel.findOne().sort({ _id: -1 }).skip(n);
  if (!edge) return null; // fewer than n+1 docs, nothing to delete
  return myModel.deleteMany({ _id: { $lte: edge._id } });
}
I'd like to think there was a better way that would also give more insight...

Related

Problem: find latest collection

I can't find much information about how to solve this problem.
On my MongoDB I create a collection every 60 seconds with the name "test" + Date.now(). So far everything works: it creates different collections named testXXXXXX1, testXXXXX2, etc.
My problem is with the Mongoose find() method: I can't find my last-created collection.
let test = mongoose.model('test' + Date.now(), Schema); // Date.now(), not date.now()
test.find({}, function (err, response) { /* ... */ });
How do I find the latest collection in the stream? Thank you!
MongoDB, by default, does not keep documents in any guaranteed sequence.
For that purpose you're going to have to add a specific field for sorting, or sort on field properties your documents already have.
After that, use the .sort() cursor method:
Collection.find().sort([...]);
Read this article for more info
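As a sketch of that approach (assuming you can switch to a single collection with a timestamp field, instead of creating a new collection every minute):

const mongoose = require('mongoose');

// one 'Test' collection; each document carries its own createdAt timestamp
const testSchema = new mongoose.Schema({
  payload: mongoose.Schema.Types.Mixed,
  createdAt: { type: Date, default: Date.now },
});
const Test = mongoose.model('Test', testSchema);

// newest documents first - no need to hunt for the latest collection
const latest = await Test.find().sort({ createdAt: -1 }).limit(100);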

In Sequelize How to order by `column_a - column_b`?

In my project I am using node.js v10, sequelize 5.8.9 and Postgres 11.
In the database there is a Task table and a few child tables. I have done the Sequelize model definitions, and create, update, and query through Sequelize work perfectly.
Today I wanted to make the query ordered by Task.end_time - Task.start_time. Sounds easy, doesn't it? However, it proved very tricky, if not impossible, with Sequelize.
Since the options object passed into the findAll method contains nested include arrays to retrieve the child tables, the generated SQL is actually complicated and contains subqueries.
I have tried sequelize.literal and sequelize.fn; neither worked, and for the same reason: Sequelize puts the ORDER BY clause in two places - once inside a subquery, where the field names are still in underscored form (e.g. start_time and end_time), and once at the end of the SQL, where the field names have been aliased to startTime and endTime. The order expression I specify can never satisfy both at the same time.
I also thought of the Sequelize.VIRTUAL data type and found this post, but the trick didn't work either. The SQL generated was wrong:
"Task"."endTime - startTime as runningTime" FROM "task"
Any advice / suggestion is appreciated.
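For reference, one avenue that might be worth trying (a sketch, not verified against this schema): Sequelize's long-undocumented subQuery: false option stops findAll from wrapping the parent query in a subquery, so there is only one ORDER BY and a single literal against the underscored column names has a chance to survive:

const tasks = await Task.findAll({
  include: [/* nested child includes as before */],
  subQuery: false, // don't wrap the parent query in a subquery
  // one ORDER BY, against the real (underscored) column names
  order: [[sequelize.literal('"Task"."end_time" - "Task"."start_time"'), 'DESC']],
});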

mongoose query using sort and skip on populate is too slow

I'm using an AJAX request from the front end to load more comments for a post from the back end, which uses Node.js and Mongoose. I won't bore you with the front-end code or the route code, but here's the query code:
Post.findById(req.params.postId).populate({
  path: type, // type will either contain "comments" or "answers"
  populate: {
    path: 'author',
    model: 'User'
  },
  options: {
    sort: sortBy, // sortBy contains either "-date" or "-votes"
    skip: parseInt(req.params.numberLoaded), // how many are already shown
    limit: 25 // I only load this many new comments at a time
  }
}).exec(function(err, foundPost){
  console.log("query executed"); // code takes too long to get to this line
  if (err){
    res.send("database error, please try again later");
  } else {
    res.send(foundPost[type]);
  }
});
As mentioned in the title, everything works; my problem is just that it's too slow - the request takes about 1.5-2.5 seconds. Surely Mongoose has a way of doing this that loads faster. I poked around the Mongoose docs and Stack Overflow, but didn't really find anything useful.
The skip-and-limit approach is slow by nature in MongoDB, because without an index the server has to retrieve the matching documents, sort them, and only then return the desired segment of the results.
What you need to do to make it faster is to define indexes on your collections.
According to MongoDB's official documents:
Indexes support the efficient execution of queries in MongoDB. Without indexes, MongoDB must perform a collection scan, i.e. scan every document in a collection, to select those documents that match the query statement. If an appropriate index exists for a query, MongoDB can use the index to limit the number of documents it must inspect.
-- https://docs.mongodb.com/manual/indexes/
Using indexes may increase the collection's storage size, but they improve query efficiency a lot.
Indexes are commonly defined on fields that are frequently used in queries. In this case, you may want to define indexes on the date and/or votes fields.
Read mongoose documentation to find out how to define indexes in your schemas:
http://mongoosejs.com/docs/guide.html#indexes
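For example, in this case (a sketch - the comment schema fields are assumed from the sort options above):

const commentSchema = new mongoose.Schema({
  author: { type: mongoose.Schema.Types.ObjectId, ref: 'User' },
  text: String,
  date: Date,
  votes: Number,
});

// descending indexes to match the "-date" / "-votes" sort options
commentSchema.index({ date: -1 });
commentSchema.index({ votes: -1 });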

couchDB conflicts when supplying own ID with large inserts using _bulk_docs

The same code works fine when letting Couch auto-generate UUIDs. I am starting off with a completely new, empty database, yet I keep getting this:
error: conflict
reason: Document update conflict
To reiterate: I am posting new documents to an empty database, so I'm not sure how I can get update conflicts when nothing is being updated. Even stranger, the conflicting documents still show up in the DB with only a single revision, yet overall there are missing records.
I am trying to insert about 38,000 records with _bulk_docs in batches of 100. I am getting these records (100 at a time) from a RETS server; each record already has a unique ID that I want to use for the CouchDB _id instead of its auto-generated UUIDs. I am using a promise-based library to get the records and axios to insert them into Couch. After getting the first batch of 100, I run this code to add an _id to each of the 100 records before inserting:
// add the RETS ListingKey as the couch _id on each record
const batch = records.results.map((listing) => {
  let temp = listing; // a reference, so this mutates the original record
  temp._id = listing.ListingKey;
  return temp;
});
Then insert:
axios.post('http://127.0.0.1:5984/rets_store/_bulk_docs', { docs: batch })
This is all inside of a function that I call recursively.
I know this probably won't be enough to see the issue, but I thought I'd start here. I'm sure it has something to do with my map() and adding _id = ListingKey.
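Worth noting: _bulk_docs reports exactly this kind of conflict when two docs arrive with the same _id, so duplicate ListingKeys - within a batch or across batches - would explain both the conflicts and the missing records. A quick sanity check (a sketch; the seen set is hypothetical and would live outside the recursive function):

const seen = new Set(); // every _id inserted so far, across all batches

function findDuplicateIds(batch) {
  const dupes = [];
  for (const doc of batch) {
    if (seen.has(doc._id)) dupes.push(doc._id); // these would come back as conflicts
    seen.add(doc._id);
  }
  return dupes;
}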
Thanks!

Speeding up my cloudant query

I was wondering whether someone could provide some advice on my Cloudant query below. It now takes upwards of 20 seconds to execute against a DB of 50,000 documents - I suspect I could be getting better speed than this.
The purpose of the query is to find all of my documents with the attribute "searchCode" equalling a specific value plus a further list of specific IDs.
Both searchCode and _id are indexed - any ideas why my query would be taking so long / what I could do to speed it up?
mydb.find({ selector: { "$or": [{ "searchCode": searchCode }, { "_id": { "$in": idList } }] } }, function (err, result) {
  if (!err) {
    fulfill(result.docs);
  } else {
    console.error(err);
  }
});
Thanks,
James
You could try doing two separate calls for the queries:
1. find me documents where the searchCode = 'some value'
2. find me documents whose ids match a list of ids
The first can be achieved with a find call and a query like so:
{ selector: {"searchCode": searchCode} }
The second can be achieved by hitting the database's _all_docs endpoint, passing in the list of ids as a keys parameter, e.g.
GET /db/_all_docs?keys=["a","b","c"]
You might find that running both requests in parallel and merging the results gives you better performance.
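A rough sketch of that (assuming a promise-based client; mydb.fetch here is a hypothetical helper standing in for the _all_docs request - the exact method name depends on your library):

// run both queries in parallel, then merge and de-duplicate by _id
const [byCode, byId] = await Promise.all([
  mydb.find({ selector: { searchCode: searchCode } }),
  mydb.fetch({ keys: idList }), // hypothetical: POST /db/_all_docs?include_docs=true
]);

const merged = new Map();
for (const doc of byCode.docs) merged.set(doc._id, doc);
for (const row of byId.rows) if (row.doc) merged.set(row.id, row.doc);
fulfill([...merged.values()]);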
