How does MongoDB compare/sort by _id? - node.js

How does MongoDB apply comparison operators and sorting to _ids? Does it do it by the timestamp portion of the _id? Also, does it make a difference if the ObjectId was generated on the client or the server?
If so, would paging be reliable on this field? e.g. _id: { $gte: last_idOnPage }

Looking at the documentation for ObjectId() you can see that _id is a hexadecimal string representing a 12-byte value which consists of:
4-byte value representing the seconds since the Unix epoch,
3-byte machine identifier,
2-byte process id, and
3-byte counter, starting with a random value.
So you are partially correct: the timestamp is used for sorting, but so are the other parts. Because this string represents a number, Mongo just compares the numbers to find which one is bigger.
Regarding your second question (does it make a difference whether the _id was generated by the application layer or by the database): it does not make any difference. Mongo still compares only the numbers.

Timestamp is the first portion of BSON::ObjectId value. So basically yes, it first sorts by timestamp and then by other parts.
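For the paging question specifically: because ObjectIds compare bytewise with the timestamp leading, keyset paging on _id works for insertion-ordered data (with the caveat that IDs generated in the same second on different machines need not reflect exact insertion order). A minimal sketch using the official Node.js driver; the collection and page size are illustrative:

const { ObjectId } = require('mongodb');

async function nextPage(collection, lastId, pageSize = 20) {
  // Use $gt rather than $gte so the last document of the previous
  // page is not repeated at the start of the next one.
  const filter = lastId ? { _id: { $gt: new ObjectId(lastId) } } : {};
  return collection
    .find(filter)
    .sort({ _id: 1 }) // bytewise ObjectId order: timestamp first, then the rest
    .limit(pageSize)
    .toArray();
}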

Decimal value in Postgresql returned as String in Node.js

When I run a query against my PostgreSQL database from a Node.js server, the value I get back is of type string, when in fact it is a decimal in the PostgreSQL database.
I am not sure why my decimals or even bigints are returned as strings. I am using Knex as my query builder, if that makes a difference. What I have read online so far is not very clear about what to do, but it seems this happens automatically to preserve precision.
What is the best workaround for this? Is it best to convert the string value returned from my query into a decimal using parseFloat?
Both decimal and bigint types may contain values that are too large to "fit" in JavaScript's Number:
Number.MAX_SAFE_INTEGER (JS): 9007199254740991
bigint: -9223372036854775808 to 9223372036854775807
decimal: up to 131072 digits before the decimal point; up to 16383 digits after the decimal point
If you're quite certain that the values in your database will fit in a Number, you can convert them (I don't know Knex, but perhaps it has some sort of hook system you can use to transform data retrieved from the database), or change your database schema to use "smaller" column types.
Alternatively, there are also various "big integer" packages for Node that you might be able to use.
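If the values are known to fit, one concrete hook for this is the type-parser registry of node-postgres (pg), the driver Knex uses for PostgreSQL. A hedged sketch; note that these parsers are global to the pg driver:

const { types } = require('pg');

// OID 1700 = NUMERIC/DECIMAL, OID 20 = INT8/BIGINT.
// Only safe if you know every stored value fits in a JS Number.
types.setTypeParser(1700, (val) => parseFloat(val));
types.setTypeParser(20, (val) => parseInt(val, 10));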

Complex data structures in Redis

I am refactoring the backend of a Node/Express app from MongoDB to Redis.
My data currently consist of a few dozen (~70) documents, each composed of a NAME, an abbreviation ABBR, a GeoJSON LOCATION, and an array of integer PARAMETERs. The PARAMETERs for each document are updated every few minutes, but the rest of the attributes remain fixed. The length of the PARAMETER attribute may vary (and it can also be empty). I would like to perform many queries on the data to check the nearest locations to a given point, and display the name, the abbreviation, and the parameters.
An example document:
{
  _id: ObjectId("1"),
  name: 'A place',
  abbr: 'PLC',
  location: { type: "Point", coordinates: [ -130.922, 33.289 ] },
  parameters: [3, 4, 28],
}
I am familiar with the GEOADD command in Redis, but I don't see how to use it to build a more complex data structure for my data: if I use GEOADD to store a location and then try HMSET to add name and abbreviation fields under the same key, I get a WRONGTYPE error.
I appreciate the error, because I value referential transparency and I like it when types are taken seriously. But I also think I might be fundamentally misunderstanding how Redis stores data. When I originally began refactoring, after learning about Redis conceptually, I envisioned being able to store my data in a form something like
1 name 'A Place' abbr 'PLC' location -130.922 33.289 parameters 3 4 28
Or if not quite that, a way to easily query the nearness of locations in my set along with the other attributes.
Redis core data structures cannot be nested. In your example you should use a different key (and data structure) for each level, e.g. a Hash for the properties, a Geo set for the location, and perhaps another Hash for the parameters.
Once you have that in place, your query should consist of three reads to compose the final answer.
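A minimal sketch of that layout with the ioredis client; the key names are illustrative, and a List is used for the variable-length parameters instead of a second Hash, since they are a plain array:

const Redis = require('ioredis');
const redis = new Redis();

async function addPlace(id, name, abbr, lon, lat, parameters) {
  await redis.geoadd('places:geo', lon, lat, id);               // Geo set: location only
  await redis.hset(`place:${id}`, 'name', name, 'abbr', abbr);  // Hash: fixed attributes
  await redis.del(`place:${id}:params`);                        // List: refreshed every few minutes
  if (parameters.length) {
    await redis.rpush(`place:${id}:params`, ...parameters);
  }
}

async function nearest(lon, lat, radiusKm) {
  // GEORADIUS returns the member ids sorted by distance; two further
  // reads per id compose the final answer, as described above.
  const ids = await redis.georadius('places:geo', lon, lat, radiusKm, 'km', 'ASC');
  return Promise.all(ids.map(async (id) => ({
    ...(await redis.hgetall(`place:${id}`)),
    parameters: await redis.lrange(`place:${id}:params`, 0, -1),
  })));
}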

How can I query MongoDB in a blocking way? (not sure if this is the right way though)

Let me introduce my problem: I'm currently developing a web app using Node.js, Express and MongoDB (mongoose driver), and when the user requests /save, I would like to generate a unique ID (made of random letters and digits) in order to redirect the request to /save/id.
Therefore I want my /save route to query MongoDB for a list of existing IDs, and generate a random ID which is not present in that list.
Any idea on how to do that?
I think both the terms and their relation to each other are a bit unclear here. We first need a more precise definition of the _id field and its default field type, ObjectId.
The _id field
Every document must have a unique _id; the only constraint on that field is that its value has to be unique. If this field is not explicitly contained in the incoming data, MongoDB will create one holding an ObjectId. However, _id can hold any value, even subdocuments, as long as it is unique. The following is a perfectly valid document and might even make sense in some use cases:
{
  _id: {
    name: "Roger",
    surname: "Rabbit"
  }
}
To enforce uniqueness, a unique index is created over _id.
ObjectId
ObjectId is the default field type of _id. It is an alphanumeric, truncated representation of several values, namely
the seconds since the Unix epoch,
a machine identifier (itself constructed from various values, if I recall correctly),
the ID of the process generating the ObjectId, and
a counter with random initialisation.
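To see the timestamp portion in practice, the ObjectId class shipped with the Node.js driver exposes it directly (a small illustrative snippet):

const { ObjectId } = require('mongodb');

const id = new ObjectId();
console.log(id.toHexString());  // 24 hex characters = 12 bytes
console.log(id.getTimestamp()); // a Date decoded from the leading 4 bytes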
The answer to your question
Contrary to what has been claimed many times (and here again), there is absolutely no drawback in using a custom value for the _id field instead of the default ObjectId. Quite the contrary: if you have a field which is (semantically) unique and you query by it, it is an awesome candidate, since you will most likely need an index on it anyway. Let's take an easy example: the social security number (SSN) in an insurance company's application. Compare the next two documents:
{
  _id: <SomeObjectId>,
  SSN: 12345678,
  name: {
    first: "Roger",
    last: "Rabbit"
  }
}
In order to query for the values you need, you need at least three indices here: the default one on _id, a unique one on SSN, and one on name. For any given query, only one index can be used. This might become a problem in aggregations (which the math guys at your insurance company will do a lot). Additionally, you carry the pretty useless ObjectId as extra data, which can become costly if you have several million documents.
Now compare to this document:
{
  _id: {
    SSN: 12345678,
    firstname: "Roger",
    lastname: "Rabbit"
  }
}
First of all: we hold the same information without useless data, and we have only a single index, which can easily translate into gigabytes of freed RAM. Second, we added some implicit semantics, because uniqueness is now enforced over the SSN and the complete name, which might come in handy if a name changes due to a marriage. Whether or not this makes sense is a design decision; I decided to use the full name as part of the _id mainly for the sake of this example.
So it is perfectly valid to create an _id yourself, as long as you ensure it is unique. And here is the problem: creating something unique isn't that easy. For the above example, it might be a hash of the _id's values (never use clear-text SSNs!). This could be done like this (kudos to @maerics):
var crypto = require('crypto'),
    shasum = crypto.createHash('sha1');

// Feed the values that make up the document's identity into the hash
shasum.update("Roger");
shasum.update("Rabbit");
shasum.update("12345678");

console.log(shasum.digest('hex'));
// Something like "0beec7b5ea3f0fdbc95d0dd47f3c5bc275da8a33"
You could add this hash to the _id field, though it might make sense to keep it in a separate field instead, increasing our index count to two.
Please note that this hashing procedure is only necessary if you would otherwise disclose sensitive data in the URLs!
However, just using ObjectId for the _id because it is a convenient way to create a unique value is lazy system design. As shown, smart system design may save you gigabytes of RAM, which for large databases easily translates into whole servers.
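Applied to the question asked here: instead of fetching the list of all existing IDs, you can let the unique index on _id do the collision check and retry on a duplicate-key error. A sketch assuming an Express app and a Mongoose model Page whose schema declares _id as a String (all names are illustrative):

const crypto = require('crypto');

function randomId(length = 8) {
  // Random letters and digits; hex gives us [0-9a-f]
  return crypto.randomBytes(length).toString('hex').slice(0, length);
}

app.get('/save', async (req, res, next) => {
  try {
    for (;;) {
      try {
        const doc = await Page.create({ _id: randomId() });
        return res.redirect(`/save/${doc._id}`);
      } catch (err) {
        if (err.code !== 11000) throw err; // 11000 = duplicate key, so retry
      }
    }
  } catch (err) {
    next(err);
  }
});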

update_sequence changed semantics in cloudant db?

I use a Cloudant CouchDB and I've noticed that the "_changes" query on the database returns an "update_sequence" that is not a number, e.g.
"437985-g1AAAADveJzLYWBgYM..........".
What is more, the response is not stable: I get 3 different update_sequences if I query the db 3 times.
Is there any change in the known semantics of the "update_sequence", "since", etc. or what?
Regards,
Vangelis
Paraphrasing an answer that Robert has previously given:
The update sequence values are opaque. In CouchDB, they are currently integers but, in Cloudant, the value is an encoding of the sequence value for each shard of the database. CouchDB will likely adopt this in future as clustering support is added (via the BigCouch merge).
In both CouchDB and Cloudant, _changes will return a "seq" value with every row that is guaranteed to return newer updates if you pass it back as "since". In cases of failover, that might include changes you've already seen.
So the correct way to read changes since a particular update sequence is this:
1. Call /dbname/_changes?since=<your checkpoint seq>.
2. Read the entire response, applying the changes as you go.
3. Record the last_seq value as your new checkpoint seq value.
Do not interpret the seq values, and do not compare them, not even for equality. You can, if you need to, record any "seq" value you encounter in step 2 as your current checkpoint seq value. The key thing you cannot do is compare them.
It will jump around: the representation is a packed base64 string encoding the update_seq of the various replicas of each shard of your database. It can't be a simple integer because it's a snapshot of a distributed database.
As for CouchDB, treat the update_seq as opaque JSON and you'll be fine.
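A minimal sketch of that checkpoint loop in Node (built-in fetch, Node 18+; the database URL, applyChange and the checkpoint storage are illustrative):

async function pullChanges(dbUrl, since = '0') {
  const res = await fetch(`${dbUrl}/_changes?since=${encodeURIComponent(since)}`);
  const body = await res.json();

  for (const row of body.results) {
    applyChange(row); // row.seq is opaque: you may record it, never compare it
  }
  return body.last_seq; // store this opaque value as the new checkpoint
}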

What is the most effective way to store 48-digit hex value in mongoDB?

I have a unique 48-digit hexadecimal (192-bit) hash for one kind of object, and as far as I can see, Mongo stores these values as strings. Is there a way to store them like numbers? Would storing them as numbers be more efficient than storing them as strings?
The most efficient way would simply be the binary data type. It is efficient because BSON, the format in which MongoDB stores documents, is a "binary JSON".
The usage depends on the programming language you use.
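For example, with the Node.js driver a Buffer is serialized as the BSON binary type, so the 48 hex digits (24 bytes, 192 bits) are stored at half their string size. A sketch with an illustrative collection:

async function storeHash(collection, hex) {
  // 48 hex digits = 24 bytes = 192 bits, stored as BSON Binary
  await collection.insertOne({ hash: Buffer.from(hex, 'hex') });
}

async function findByHash(collection, hex) {
  return collection.findOne({ hash: Buffer.from(hex, 'hex') });
}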
