Change string indexes to be ObjectID ones in large MongoDB instance - node.js

So, I've got a large production database dump where the _id field is a string. The string length differs between collections, and there are a lot of relations between them. I need a way to change the string _ids to ObjectIds.
What I've tried already:
1) Searching the Mongoose/MongoDB documentation for a single command that does this: nothing found.
2) A Node.js migration script that grabs all the entries in one collection and wraps each string id in an ObjectId. It fails either with FATAL ERROR: CALL_AND_RETRY_LAST Allocation failed - JavaScript heap out of memory (with the delete-and-recreate approach) or with an error about a bad string length, saying an ObjectId cannot be created from that string.
Will attach data samples and/or mongoose schema a bit later.

A simple (if inefficient) way to avoid running out of JavaScript heap is to stream the documents through a cursor and process them one at a time. Since _id cannot be edited in place, you have to insert a new document and remove the old one.
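const mongoose = require('mongoose');

async function migrateIds(Model) {
  // Stream raw documents one at a time instead of loading the whole collection.
  const cursor = Model.find().lean().cursor();
  for await (const doc of cursor) {
    // The string must be a 24-character hex ObjectId, otherwise it can't be converted.
    if (typeof doc._id === 'string' && /^[0-9a-fA-F]{24}$/.test(doc._id)) {
      const newId = new mongoose.Types.ObjectId(doc._id);
      // _id is immutable: insert a copy with the new id, then delete the original.
      // Using the native Model.collection avoids the schema casting _id back to a string.
      await Model.collection.insertOne({ ...doc, _id: newId });
      await Model.collection.deleteOne({ _id: doc._id });
    }
  }
}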
However, if you get errors about invalid ids, it may be because the string ids are not actually the hex form of MongoDB ObjectIds. In that case they cannot be converted directly, and the relation cannot be preserved this way.
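Before running a migration like the one above, it may help to check which string ids can be converted at all. The sketch below is plain JavaScript with no driver dependency; `splitConvertibleIds` is a hypothetical helper name, the sample ids other than the first are made up, and it assumes that only 24-character hex strings should be treated as convertible.

```javascript
// Only 24-character hexadecimal strings map 1:1 onto an ObjectId's 12 bytes.
const OBJECT_ID_HEX = /^[0-9a-fA-F]{24}$/;

// Hypothetical helper: split string ids into those that new ObjectId(id)
// can represent faithfully and those that would throw or be misread.
function splitConvertibleIds(ids) {
  const convertible = [];
  const rejected = [];
  for (const id of ids) {
    if (typeof id === 'string' && OBJECT_ID_HEX.test(id)) {
      convertible.push(id);
    } else {
      rejected.push(id);
    }
  }
  return { convertible, rejected };
}

// Example with made-up ids: the last two have the wrong form for an ObjectId.
const { convertible, rejected } = splitConvertibleIds([
  '57b40595866fdab90268321e',
  'user-42',
  '57b40595866fdab9',
]);
console.log(convertible); // [ '57b40595866fdab90268321e' ]
console.log(rejected);    // [ 'user-42', '57b40595866fdab9' ]
```

Running something like this over the `_id` values of each collection would show whether the "bad string length" error comes from ids that were never ObjectIds in the first place.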

Related

Converting string to ObjectId is failing in mongoose 4.6.0

I am trying to convert a string to ObjectId using
var body={};
var objId="57b40595866fdab90268321e";
body.id=mongoose.Types.ObjectId(objId);
myModel.collection.insert(body,function(err,data){
//causing err;
});
The above code works fine with mongoose 4.4.16, but if I update mongoose to the latest version (4.6.0) the problem occurs.
Error:
object [
{
"_bsontype":"ObjectID",
"id":{"0":87,"1":180,"2":5,"3":235,"4":134,"5":111,"6":218,"7":185,"8":2,"9":104,"10":50,"11":111}
}
]
is not a valid ObjectId
The right way to insert a new document is:
var newDocument = new myModel({
_id: new mongoose.Types.ObjectId("57b40595866fdab90268321e")
});
newDocument.save();
In your case:
It stopped working because of differences between the versions of Mongoose and the native MongoDB driver.
You can still do this as in the example above or, if you want to keep using an insert, use myModel.insertMany (passing an object instead of an array).
See http://mongoosejs.com/docs/api.html#model_Model.insertMany
I don't have time to test it, but if I remember correctly, id is a plain string and _id is the ObjectId, i.e. either
body.id="57b40595866fdab90268321e"
or
body._id = new mongoose.Types.ObjectId("57b40595866fdab90268321e");
That said, does it have to be that specific id? If not, you can use new myModel() and an id will be automatically created.

Update by Id not finding document

I am attempting to perform a relatively simple update using Mongoose:
var location = locationService.getLocation(typedLat, typedLong, travelRadius);
DocumentModel.update({_id : passedInId }, { location : location }, function(err, resp){
next(err, resp);
});
In this case passedInId is the string:
"561ee6bbe4b0f25b4aead5c8"
And in the object (test data I created) the id is:
"_id": ObjectId("561ee6bbe4b0f25b4aead5c8")
When I run the above however, matched documents is 0. I assume this is because passedInId is being treated like a string, however when I type it to an ObjectId:
var queryId = ObjectId(passedInId)
The result is the same, the document doesn't match. What am I missing? This seems like it should be obvious....
Mongoose will correctly cast a string to an ObjectId in the query, so one of the following must be the case:
1) That record is not in the collection. Run a query in the mongo shell to check.
2) Mongoose is looking in a collection other than the one containing your test data. Remember that, by default, Mongoose lowercases the name under which you register your model and pluralizes it (typically by adding an "s").
3) Lastly, and your answer speaks to this, maybe your model just isn't being updated with any new information.
This happened because I had not yet added the location element to the model in Mongoose. There is no error; the update just doesn't match or do anything.

Set field to empty for mongo object using mongoose

I have a document object that has an embedded sub-document.
To "clear" the sub-document, I try this:
obj.mysub = {};
obj.save();
This doesn't work, my object still has the contents of the mysub sub-document.
But this:
obj.mysub = undefined;
obj.save();
This does work, it removes my sub-document from the object.
My question is why doesn't the first version work? What is going on in Mongodb / Mongoose in the first example?
[edit] Why doesn't the empty object get saved in the first example above.
Mongoose sort of "protects" you from a lot of logic like this through its own internal resolution. So if you really need to do this, do it at a lower level, closer to the driver, as in:
YourModel.updateOne(
{ /* query matching your document */ },
{ "$unset": { "mysub": 1 } }
)
Following normal MongoDB logic, this will work and remove that field from the selected document. See the $unset operator for more.

MongoDB's BSON "_id" looks like scientific notation in Node.js

Occasionally I get a document with an _id value which JavaScript could and does interpret as scientific notation.
Just to illustrate the problem, I ran the following query:
db.users.find({$where:'this._id > 1'}).count()
2
There are hundreds of docs in this collection, but those 2 that evaluate as numbers cause problems when they get used in {$in:[]} clauses.
db.users.findOne({$where:'this._id > 1'})._id
ObjectId("5225141850742e0000002331") - see how it looks like scientific notation?
I think I run into trouble when I store that _id as a string in another collection, say as
friendsCollection:
{
_uid:"5225141850742e0000002331"
//more properties here
}
When I retrieve that value, Node (or Mongoose) interprets it as a number like "Infinity". Then my code ends up trying to search for the user with {_id:{$in:[Infinity]}} which throws an error.
I'm guessing there's a more robust way to store _id values or to handle properties you know to be _ids, but I don't know how.
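The coercion can be reproduced in plain JavaScript. A hex id whose only non-digit character is a single `e` is valid number syntax, so numeric conversion reads it as a mantissa and a large exponent and overflows to `Infinity`, while most other hex ids convert to `NaN`. This sketch assumes the trouble comes from something along the way coercing the stored string to a number (a `>` comparison against a number does exactly that); `looksNumeric` is a hypothetical helper, not part of Mongoose or the driver.

```javascript
const weirdId = '5225141850742e0000002331'; // digits, one "e", digits
const normalId = '5205c4bd7c21105d0d99648c';

// Numeric coercion treats "e" as an exponent marker, so this overflows.
console.log(Number(weirdId));  // Infinity
console.log(weirdId > 1);      // true: the string is coerced to a number
console.log(Number(normalId)); // NaN
console.log(normalId > 1);     // false: NaN compares false with everything

// Hypothetical guard: true when a string id survives numeric coercion,
// i.e. it is exactly the kind of value that can be silently mangled.
function looksNumeric(id) {
  return typeof id === 'string' && id.trim() !== '' && !Number.isNaN(Number(id));
}

console.log(looksNumeric(weirdId));  // true
console.log(looksNumeric(normalId)); // false
```

Keeping such references as ObjectIds (or at least never passing them through `Number()`/`parseFloat()`) avoids the string ever being mistaken for a number.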
Converting from hex string to binary representation of ObjectID
If you want to convert a 24-character hex string representation of an _id value into a binary ObjectId as stored in MongoDB, you can use ObjectId.createFromHexString:
// Require the ObjectId class if not already included (older drivers export it as ObjectID)
const { ObjectId } = require('mongodb');
const uid = ObjectId.createFromHexString("5205c4bd7c21105d0d99648c");
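What `createFromHexString` does with its input can be illustrated with Node's built-in `Buffer`: the 24 hex characters decode to the 12 raw bytes an ObjectId stores. This is only an illustration of the encoding; `hexToObjectIdBytes` is a hypothetical name, and in real code the driver's ObjectId class should do the conversion.

```javascript
// Decode a 24-character hex id into the 12 raw bytes an ObjectId holds.
function hexToObjectIdBytes(hex) {
  if (!/^[0-9a-fA-F]{24}$/.test(hex)) {
    throw new TypeError(`not a 24-character hex string: ${hex}`);
  }
  return Buffer.from(hex, 'hex');
}

const bytes = hexToObjectIdBytes('5205c4bd7c21105d0d99648c');
console.log(bytes.length);          // 12
console.log(bytes.toString('hex')); // 5205c4bd7c21105d0d99648c (round-trips)
```

The validation step matters because `Buffer.from(..., 'hex')` silently stops at the first invalid character instead of throwing.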
Comparing ObjectIDs
You should be using the $gt operator for ObjectID comparison rather than $where. The $where operator evaluates a JavaScript string and cannot take advantage of indexes; it will be much less performant (particularly for this use case).
So the findOne() example to find an _id greater than a given ObjectID should instead be:
db.users.findOne(
{ _id: { $gt: ObjectId("5205c4bd7c21105d0d99648c") } }
)._id
For a predictable outcome on finding the next higher ObjectId you should specify an explicit sort order using find() with sort and limit:
// Find next _id greater than ObjectId("5205c4bd7c21105d0d99648c")
db.users.find(
{_id: { $gt: ObjectId("5205c4bd7c21105d0d99648c") } }
).sort({_id:1}).limit(1).toArray()[0]._id
You'll notice that these find examples don't explicitly call createFromHexString. The default ObjectId() constructor will try to create an appropriate ObjectId based on whether the given value is a 24-character hex string, a 12-byte binary string, or a Number. If you know what sort of value you are providing, it is better to call the expected constructor to limit unexpected conversions (for example, if you accidentally provided a Number instead of a hex string).
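Why `$gt` plus `sort({_id: 1})` finds "the next" id can be seen in plain JavaScript: ObjectIds are compared byte by byte, and for same-length, lowercase hex strings that is the same order as ordinary string comparison. The sketch below mimics `find({_id: {$gt: x}}).sort({_id: 1}).limit(1)` over an in-memory array; it assumes lowercase hex (how drivers print ObjectIds), `nextHigherId` is a hypothetical helper rather than a driver function, and the third sample id is made up.

```javascript
// Byte-wise ObjectId order equals string order for same-length lowercase hex.
function nextHigherId(ids, afterId) {
  let best = null;
  for (const id of ids) {
    // Keep the smallest id that is strictly greater than afterId.
    if (id > afterId && (best === null || id < best)) {
      best = id;
    }
  }
  return best; // null when nothing is greater, like an empty find() result
}

const ids = [
  '5225141850742e0000002331',
  '5205c4bd7c21105d0d99648c',
  '5210aa0000000000000000ff',
];
console.log(nextHigherId(ids, '5205c4bd7c21105d0d99648c')); // 5210aa0000000000000000ff
```

On the server the `_id` index does this work, which is why the indexed `$gt` query is much cheaper than a `$where` scan.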
Database References (DBRefs)
At the time of writing MongoDB did not support joins (the $lookup aggregation stage was added later, in MongoDB 3.2), but there is a convention for storing database references (DBRefs) when you want to store the _id of a related document as a reference in another document. Mongoose has a ref option that simplifies working with references; see 'Population' in the Mongoose docs.
At some point I had problems when querying _id using the native driver. I fixed it by using ObjectId in my queries. You might find this helpful:
const { ObjectId } = require("mongodb"); // exported as ObjectID by older drivers
query._id = { $gt: new ObjectId(idString) };
Ahhh, maybe I should use the Mongoose data type "ObjectId" in my schemas.
For example I was using mongoose schemas like this:
locations:{
_uid:{type:String},//<--probably not good
lat:{type:Number},
lng:{type:Number}
},
Started using the Schema type "ObjectId":
locations:{
_uid:Schema.Types.ObjectId,//<--better?
lat: {type:Number},
lng: {type:Number}
},
I have to rebuild a lot of data to support this change. Also, I won't be sure until another one of those scientific-notation _ids pops up (I deleted the offending docs). However, this seems promising.

Query Mongoose Schema by ObjectId

I need your help again; hopefully the answer to this will be the last one I need for this project. I've seen that this is a fairly common question, and I've tried the tips from another Stack Overflow post and from the Google Group, but those solutions haven't worked for me.
My code looks roughly like this:
var mongoose = require('mongoose');
var Schema = mongoose.Schema;
mongoose.connect(MONGO_SERVER);
var ObjectId = Schema.ObjectId;
var RacesSchema = new Schema({
  venue_id : { type: mongoose.Schema.ObjectId, ref: 'Venues' },
  racetype : String
});
var races = mongoose.model('Races', RacesSchema );
function test() {
  var MyObjectId = require('mongoose').Types.ObjectId;
  var queryVenue = new MyObjectId("50e3dcdbf30375180c078f64");
  races.find({venue_id: queryVenue, racetype:'Test'})
    .exec(function(err, data) {
      console.log(err, data); // data comes back empty
    });
}
test();
But I get no results, even though I know matching documents exist.
Many thanks in advance!
UPDATE
I have minimized the code sample above. The test works if I query on the string value alone; only querying by ObjectId fails, and I know the document exists.
JSON UPDATE
Here is what I am looking for:
{
"_id" : ObjectId("50e3dcddf30375180c078f85"),
"venue_id" : "50e3dcdbf30375180c078f64",
"racetype" : "A"
}
And all of a sudden, I believe the answer has become clear to me. Is it simply because venue_id is actually a string? And if so, can I keep my Mongoose schema the way it is and cast the query value to a string when doing the find? Or should I change the way these values are stored (they are inserted by a separate .NET application I developed) so that they are inserted as ObjectIds?
Currently, for another query, populate() works quite well with the current Mongoose schema and the way the database is actually set up, filling in the referenced table (venue_id). The only difference from the query above is that there I don't specify venue_id...
Thanks.
Right, the problem is happening because the data type of venue_id in the schema (ObjectId) doesn't match the one in the doc (String). So the find is looking for ObjectId values but doesn't find a match because it's a string in the doc.
The right fix for this is to write a little program to update the venue_id values in your docs to be ObjectIds instead of strings and then your query will work. That will also shrink the size of those fields from 24 bytes to 12.
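A minimal sketch of that "little program", assuming the native driver's `bulkWrite` and that only `venue_id` needs converting. To keep it runnable without a database, `buildVenueIdFixOps` (a hypothetical name) only builds the `updateOne` operations; the ObjectId constructor is passed in as `toObjectId`, so in real use it would be `(s) => new ObjectId(s)` from the `mongodb` package and the result would go to `collection.bulkWrite(ops)`. The sample `_id` values are placeholders.

```javascript
// Build bulkWrite "updateOne" operations that rewrite string venue_id values
// as ObjectIds. toObjectId is injected (e.g. s => new ObjectId(s) in real use).
function buildVenueIdFixOps(docs, toObjectId) {
  const ops = [];
  for (const doc of docs) {
    // Skip values already converted or not in 24-character hex form.
    if (typeof doc.venue_id !== 'string' || !/^[0-9a-fA-F]{24}$/.test(doc.venue_id)) {
      continue;
    }
    ops.push({
      updateOne: {
        filter: { _id: doc._id },
        update: { $set: { venue_id: toObjectId(doc.venue_id) } },
      },
    });
  }
  return ops;
}

// Stand-in converter for this demo only; it just wraps the string.
const fakeObjectId = (hex) => ({ fakeObjectId: hex });
const ops = buildVenueIdFixOps(
  [
    { _id: 'race-1', venue_id: '50e3dcdbf30375180c078f64', racetype: 'A' },
    { _id: 'race-2', venue_id: { fakeObjectId: 'already-done' }, racetype: 'B' },
  ],
  fakeObjectId,
);
console.log(ops.length); // 1
```

In a real run the documents would come from a cursor such as `races.collection.find({ venue_id: { $type: 'string' } })`, and the operations would be sent to `bulkWrite` in modest batches (1000 is an arbitrary choice) so memory use stays bounded.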
