Mongoose Node Express need _id Number with at least 18 positions - node.js

I am using MEAN Stack with Node, Express and MongoDB (mongoose ODM) in the backend.
Now, I have a schema model for a collection whose _id value is overwritten, and it should be represented by a long or another big-number datatype with at least 18 digits. I wanted to use mongoose-long for that.
The reason is that I build a kind of "compound key" across two collections: 12 digits come from one collection, and the remaining 6 digits are a counter for the other collection that I increment by 1. This way I can tell which documents from the two collections belong together by the number prefix.
However, when I generate some data and enter a long number like "123456789123456778" for _id, it gets saved, but as a float: the last few digits appear rounded (to the hundreds) when I query the _id field.
Also, if I increment the number by 1, the entry cannot be saved because of a duplicate key. Is this a mongoose/MongoDB problem, or rather a JavaScript or Python client-side problem? I have read here: "Mongodb can't find object with too long _id", that you can't query for overly long id numbers; maybe the same applies when inserting them...
name: 'MongoError',
message: 'E11000 duplicate key error collection: dnz.fps index: _id_ dup key: { : 1.234567891234568e+17 }',
driver: true,
code: 11000,
index: 0,
errmsg: 'E11000 duplicate key error collection: dnz.fps index: _id_ dup key: { : 1.234567891234568e+17 }',
getOperation: [Function],
toJSON: [Function],
toString: [Function] }
Is there some workaround for this? Or for getting the big number sizes right in MongoDB?
I also thought about changing my data model and using two ids for my collection, but that would change all my query logic again, and I don't know whether it gets messy when I create new instances of my model, e.g. I would have to pass the id value from the created instance and insert it into the other collection somehow...
Does anybody have a better idea for working around this big-number problem?
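For background, the rounding is a JavaScript client-side effect: a JS Number is an IEEE 754 double, which represents integers exactly only up to Number.MAX_SAFE_INTEGER (9007199254740991, 16 digits), so an 18-digit _id is rounded before it ever reaches MongoDB; that is also why the duplicate-key error above shows 1.234567891234568e+17. Below is a minimal sketch of the mongoose-long approach mentioned above (model and field names are made up for illustration):

const mongoose = require('mongoose');
require('mongoose-long')(mongoose);  // registers the 64-bit BSON Long type with mongoose
const Long = mongoose.Schema.Types.Long;

const fpSchema = new mongoose.Schema({
  _id: Long  // overrides the default ObjectId _id
});
const Fp = mongoose.model('Fp', fpSchema);

// Build the value from a string so it never passes through a lossy JS double:
const doc = new Fp({ _id: mongoose.Types.Long.fromString('123456789123456778') });
doc.save();

Incrementing would then also have to go through the Long API (e.g. someLong.add(mongoose.Types.Long.fromInt(1))) rather than ordinary JS arithmetic.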

Related

Mongoose not updating key value pairs stored with Object Schema

This is for a rating feature in my application. I want to use the user's email as the key and the user's rating as the value, e.g.:
ratings: {
  "user1#gmail.com": 5,
  "user2#gmail.com": 4
}
I'd rather not use arrays, since there can only be a single rating from each user.
I tried inserting a new key-value pair in Mongo using Compass and it worked fine, but when I did it through mongoose, with the field typed as Object in Express, it did not work: only the first key-value pair is stored, and user2's key-value pair never gets added.
Thanks in advance.
Schema type: Object
I solved this by calling markModified("fieldname") before save().
Example:
// (mongooseSchema here is actually a loaded mongoose *document*, not the schema.)
// "ratings" is a plain Object path, so mongoose cannot see in-place changes:
mongooseSchema.markModified("ratings");
mongooseSchema.save();
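For context, a hedged end-to-end sketch of this fix (model and field names are hypothetical; only the ratings field mirrors the question):

const mongoose = require('mongoose');

const movieSchema = new mongoose.Schema({
  title: String,
  ratings: { type: Object, default: {} }  // schemaless path: mongoose cannot track changes inside it
});
const Movie = mongoose.model('Movie', movieSchema);

// Inside an async function:
const movie = await Movie.findOne({ title: 'Example' });
movie.ratings['user2#gmail.com'] = 4;  // in-place mutation is invisible to change tracking
movie.markModified('ratings');         // explicitly flag the path as dirty
await movie.save();                    // now the second key-value pair is persisted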

mongodb (mongoose) throws E11000 duplicate key error collection

I have a State model, a lookup with just a name, which I have set to unique because I don't want two states with the same name. I also have a Vacation model with a state property whose type I set to the state schema. An "E11000 duplicate key error collection" is thrown after the first vacation is inserted, when I insert a second vacation with the same state. I know MongoDB throws the exception because I already have a vacation with the same state.name as the first one.
const stateSchema = new mongoose.Schema({
  name: {
    type: String,
    required: true,
    minlength: 1,
    maxlength: 30,
    unique: true
  }
});
const State = mongoose.model("State", stateSchema);
------------------------------------------------------
const vacationSchema = new mongoose.Schema({
  // -- some properties
  state: {
    type: stateSchema,
    required: true
  }
  // -- some properties
});
const Vacation = mongoose.model("Vacation", vacationSchema);
So how can I force unique state names in the states collection, but still allow multiple vacations to have the same state in the vacations collection? Do I have to explicitly change state: {type: stateSchema} to {type: new Schema(...)}, etc.?
You have two options:
If there is only one field (name) in the State collection, if it never changes (nothing like a rename/alter), and if every vacation document contains only one state name, then you could store the state name directly in the state field of your Vacation document. That saves you from hitting the other collection every time you query Vacation, and from maintaining two collections at all. Note that there is little point in keeping a State collection while also storing plain names in Vacation: nothing validates that a given name exists in State unless you do a DB call before each insert into Vacation, because writes to the two collections are independent. For this option, remove unique: true from stateSchema and your code should work as-is, or simply make state a String in the Vacation schema (see the sketch below) and keep stateSchema only for storing data into the State collection.
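A minimal sketch of that first option (names reused from the question; the other vacation properties are omitted):

// Option 1: denormalize - store the state's name as a plain string on the vacation;
// validation against the states collection must then happen in application code.
const vacationSchema = new mongoose.Schema({
  state: { type: String, required: true, minlength: 1, maxlength: 30 }
});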
Or, if you really need to maintain a separate collection as you have now, then you need a mapping between documents in State and documents in Vacation. This is especially helpful if most Vacation documents will hold multiple state names in an array field, while state names stay unique in the State collection. The usual way to express the relation is through the _id of the State documents. To do that, change the Vacation schema to:
state: {
  type: [{ type: Schema.Types.ObjectId, ref: 'State' }], // can be a single value instead of an array
  required: true
}
After this, you need to use .populate() on reads (ref: mongoose populate).
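For illustration, a hedged sketch of the reference-plus-populate flow, assuming the ObjectId-array version of the schema above and ignoring the other vacation properties:

// Inside an async function:
const ca = await State.create({ name: 'California' });     // unique: true is still enforced here
await Vacation.create({ state: [ca._id] });                // no E11000 on vacations...
await Vacation.create({ state: [ca._id] });                // ...any number may share a state
const vacations = await Vacation.find().populate('state'); // ids are replaced by full State docs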
If you instead want to store the state name itself as the value in the Vacation collection, make the state field a String; you can then still join on reads with mongoose populate virtuals, or with MongoDB's roughly equivalent native $lookup.
Note:
Uniqueness on the State name helps when you have transactions related to State. A separate collection also pays off when you make frequent changes only to State-related data, or when reads need only State data or only Vacation data: you keep them independent, and since State is unique you can refer to it from Vacation.
Don't rely on the mongoose schema option alone: also create a unique index on the field, so uniqueness is enforced at the DB level whether queries run directly against the DB or through code (the mongoose schema option only takes effect through code). Also, if creating a unique index on an existing collection fails, it means duplicate data already exists for that field.
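For example, in the mongo shell (assuming the collection is named states):

// Enforce uniqueness at the DB level, independently of mongoose:
db.states.createIndex({ name: 1 }, { unique: true })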

Mongo index [String] property

I was just reviewing some code and saw this property in a mongoose schema:
names: {
  type: [String],
  index: true
}
As far as I understand how indexes work, they are binary trees, so how is a property like this going to be organized as a node of a tree? Does indexing such a property make any sense at all?
"If you index a field that holds an array value, MongoDB creates separate index entries for every element of the array," per the MongoDB documentation on multikey indexes.
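To make that concrete, a small shell sketch (hypothetical users collection):

db.users.createIndex({ names: 1 })                    // becomes multikey once array values are inserted
db.users.insert({ names: ["anna", "annie", "ann"] })  // creates three index entries, one per element
db.users.find({ names: "annie" })                     // equality match on one element can use the index
db.users.find({ names: "annie" }).explain()           // the IXSCAN stage reports isMultiKey: true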

mongodb: another "how to add a random record" thread

I've come across many versions of this same question here on Stack Overflow, none providing a solid solution, so here we go:
I need to pick a random document from around 5 million documents in my MongoDB database in an efficient way.
I've tried getting the .count and using .skip to get a random document, but it takes almost three seconds and is very, very inefficient.
I can't make changes to the documents (like adding a "random" entry to each document) or change their _id's.
I've tried the solution of adding documents with an incremental _id (so I can pick a random _id and bypass .skip), but this brought more headaches than it solved once I tried to add many documents in a short amount of time.
Adding data in an incremental way, or picking a random document, should not be this hard. I'm either missing some common knowledge, or doing something wrong, or this is just how it is...
Wanted to bring up the topic and get your responses.
Here is a way using the default ObjectId values for _id and a little math and logic.
// Get the "min" and "max" timestamp values from the _id in the collection and the
// diff between.
// 4-bytes from a hex string is 8 characters
var min = parseInt(db.collection.find()
.sort({ "_id": 1 }).limit(1).toArray()[0]._id.str.substr(0,8),16)*1000,
max = parseInt(db.collection.find()
.sort({ "_id": -1 })limit(1).toArray()[0]._id.str.substr(0,8),16)*1000,
diff = max - min;
// Get a random value from diff and divide/multiply be 1000 for The "_id" precision:
var random = Math.floor(Math.floor(Math.random(diff)*diff)/1000)*1000;
// work out a "random" _id value in the range:
var _id = new ObjectId(((min + random)/1000).toString(16) + "0000000000000000")
// Then query for the single document:
var randomDoc = db.collection.find({ "_id": { "$gte": _id } })
.sort({ "_id": 1 }).limit(1).toArray()[0];
That's the general logic in shell representation and easily adaptable.
So in points:
Find the min and max primary key values in the collection
Generate a random number that falls between the timestamps of those documents.
Add the random number to the minimum value and find the first document that is greater than or equal to that value.
This uses "padding" from the timestamp value in "hex" to form a valid ObjectId value since that is what we are looking for. Using integers as the _id value is essentially simplier but the same basic idea in the points.

Storing data efficiently in MongoLab and in general

I have an app that listens to a websocket and stores usernames/userIDs (usernames are 1-20 bytes, userIDs are 17 bytes). This is not a big deal because it's only one document. However, for every round a user participates in, it pushes the round ID (24 bytes) and a 'score' decimal value (e.g. 1190.0015239999999).
The thing is, there is no limit to how many rounds there are and I can't afford to pay so much per month for mongolab. What's the best way to handle this data?
My thoughts:
- If there is a way to replace the _id field in MongoDB, I will replace it with the userID, which is 17 bytes long. Not sure if I can do that, though.
- Store user data with timestamps and remove OLD data that has a score value less than 200.
- Cut off user names that are more than 10 characters.
- Completely remove round IDs (or replace the _id field with roundId). (Won't work, since there are multiple round IDs in each document.)
- Round the decimal value to two places.
- Remove round IDs after 30 days.
tl;dr
Need to store data efficiently (< 500 MB) in MongoLab.
Documents consist of username (1-20 characters), userID (17 characters), rounds (object array) = [{round ID (24 characters), score (1190.0015239999999)}].
Thanks in advance!
Edit:
Document Schema:
userID: { type: String },
userName: { type: String },
rounds: [{ roundID: String, score: String }]
Modelling 1:n relationships as embedded documents is not the best approach except for very rare cases, because there is a 16 MB size limit for BSON documents at the time of this writing.
A better (read: more scalable and efficient) approach is to use document references.
First, you need your player data, of course. Here is an example:
{
  _id: "SomeUserId",
  name: "SomeName"
}
There is no need for an extra userId field, since each document needs an _id field with unique values anyway. Contrary to popular belief, this field's value does not have to be an ObjectId. So we have already reduced the size you need for your player data by 1/3, if I am not mistaken.
Next, the results of each round:
{
  _id: {
    round: "SomeString",
    player: "SomeUserId"
  },
  score: 5,
  createdAt: ISODate("2015-04-13T01:03:04.0002Z")
}
A few things to note here. First and foremost: do NOT use strings to record numeric values. Even grades should be stored as their corresponding numerical values, otherwise you cannot compute averages and the like; I'll show more of that later. We are using a compound value for _id here, which is perfectly valid. Furthermore, it gives us a free index optimizing a few of the most likely queries, like "How did player X score in round Y?"
db.results.find({"_id.player":"X","_id.round":"Y"})
or "What where the results of round Y?"
db.results.find({"_id.round":"Y"})
or "What we're the scores of Player X in all rounds?"
db.results.find({"_id.player":"X"})
However, by not using a string to save the score, even some nifty stats become rather cheap, for example "What was the average score of round Y?"
db.results.aggregate([
  { $match: { "_id.round": "Y" } },
  { $group: { _id: "$_id.round", averageScore: { $avg: "$score" } } }
])
or "What is the average score of each player in all rounds?"
db.results.aggregate([
  { $group: { _id: "$_id.player", averageAll: { $avg: "$score" } } }
])
While you could do these calculations in your application, MongoDB can do them much more efficiently, since the data does not have to be sent to your app prior to processing.
Next, data expiration. We have a createdAt field of type ISODate, so we let MongoDB take care of the rest by creating a TTL index:
db.results.ensureIndex(
  { "createdAt": 1 },
  { expireAfterSeconds: 60 * 60 * 24 * 30 }  // 30 days
)
So, all in all, this should be pretty much the most efficient way of storing and expiring your data, while improving scalability at the same time.
So currently you are storing three data points in the array for each record.
Setting _id: false prevents mongoose from automatically creating an id for each subdocument. If you don't need roundID, you can use the following, which stores only one data point per array entry:
rounds: [{ _id: false, score: String }]
Otherwise, if roundID actually has meaning, use the following, which stores two data points per entry:
rounds: [{ _id: false, roundID: String, score: String }]
Lastly, if you just need an id for reference purposes, use the following, which stores two data points per entry, the auto-generated id and the score:
rounds: [{ score: String }]
