MongoDB - Is find() realtime? - node.js

Using the node-mongodb-native npm package, in a node.js app, if I acquire a collection object early in a long-running node.js async script, like this:
var collection = await db.collection(collectionName);
If the collection gets modified before I execute the find() method of this collection object, will the results of find({}) be current, or will they only show the data as it was at the time I acquired the collection object?
For example, let's hypothetically assume that 10 minutes later the script gets to a line like this:
let cursor = await collection.find({});
Additionally assume that during this lapse of time, items were added, removed and modified before find() was called.
Will the resulting cursor navigate current data or will the data be as it was at the time that I acquired the collection object (at the beginning of the script)?

I really doubt it would take a snapshot of the collection when you acquire it.
See:
https://docs.mongodb.com/manual/reference/method/db.getCollection/
The return value of find() will be a cursor over the collection's current state.
Will the resulting cursor navigate current data or will the data be as it was at the time that I acquired the collection object (at the beginning of the script)?
The resulting cursor runs through current data.
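As a quick illustration, here is a minimal sketch (assuming a local mongod and a recent node-mongodb-native driver; the database and collection names are illustrative):
const { MongoClient } = require('mongodb')

async function main() {
  const client = await MongoClient.connect('mongodb://localhost:27017')
  const collection = client.db('test').collection('items')

  // ...time passes; other clients add, remove and modify documents...
  await new Promise(resolve => setTimeout(resolve, 10 * 60 * 1000))

  // The collection object is only a handle to the namespace; the query
  // itself runs now, against whatever the collection contains at this moment.
  const docs = await collection.find({}).toArray()
  console.log(docs)

  await client.close()
}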

Related

immutable _id error when performing MongoDB bulkWrite replaceOne on first attempt only

I'm working on a little web application that will crawl and update baseball standings by day and track teams positions (among other things) over time.
I have an API I grab all of this from and a collection in MongoDB that stores all the team data and information for the current day. Right now I just run this manually but eventually it'll be automated to run at like 3am or whenever.
The API supplies a unique ID for each team that never changes. So I take the team data from the API and pass it to a function that extracts each team's data (there is other data in the response object I don't need), puts it into an object for replacement, and then, wherever that team ID exists in the collection, replaces its document in a bulkWrite.
async function currentStandings(db, team_standings, callback) {
  const current_standings = db.collection('current_standings');
  let replacePool = [];
  for (const single_team of team_standings.data.standing) {
    let replaceOnePusher = {
      replaceOne: {
        "filter": { "team_id": single_team.team_id },
        "replacement": single_team
      }
    };
    replacePool.push(replaceOnePusher);
  }
  await current_standings.bulkWrite(replacePool);
  callback();
}
However, when I execute this code for the first time each day, I get an error reading BulkWriteError: After applying the update, the (immutable) field '_id' was found to have been altered to _id: ObjectId('5f26e57b6831761ac840bf1d') (not the same ID every day), and if I look in Compass the data isn't updated. If I immediately run the script again, it goes through successfully without error, and refreshing the data in Compass shows the correct data.
Can someone explain to me what is going wrong here? This is actually my first time using MongoDB; I wanted to learn it, and this pet project seemed like a good place to start.

node-storage close a storage and save changes before creating a new storage

I am using node-storage in the following code to store a value in a file; however, when I create a new storage object, changes from another storage object are not yet saved. I need a way to save the changes before creating the new storage object.
Below is a program called code.js which I am running like so in the console: node code.js. If you run it, you will see that the first time it is run the key-value pair doesn't yet exist; however, it does exist the second time.
key = "key"
storage = require('node-storage')
const store1 = new storage("file")
const store2 = new storage("file")
store1.put(key,'val')
console.log(store2.get(key))
My motivation for this is that I want to have a function called "set" which takes a key and a value and sets the key-value pair in a dictionary of values that is stored in a file. I want to be able to refer to this dictionary later, for example with a 'get' function, and have the changes present.
I am thinking there might be a function called "save" or something similar that applies the changes to the file. Is there such a function or some other solution?
node-storage saves the changes in the dictionary to disk after every call to put or remove. This is not the issue.
Your problem is that the dictionary in store2 has not been updated with the new properties. node-storage only loads the file from disk when the object is first created.
My suggestion would be to only have one instance of storage per file.
However, if this is not possible, then you might want to consider updating store2's cache before you get the property. This can be done using:
store2.store = store2._load();
This may not be the best for performance, as _load loads the entire file from disk synchronously every time it is called, so try to limit its use.
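For completeness, here is a minimal sketch of the set/get wrapper the question describes, built on a single shared instance (the file name is illustrative; put and get are the node-storage calls used in the question):
const Storage = require('node-storage')
const store = new Storage('file')   // one shared instance per file

function set(key, value) {
  store.put(key, value)             // node-storage persists to disk on every put
}

function get(key) {
  return store.get(key)             // reads from the single in-memory dictionary
}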

mongodb change stream resume from timestamp

In the MongoDB docs (https://docs.mongodb.com/manual/changeStreams/) there is a quote:
The oplog must have enough history to locate the operation associated with the token or the timestamp, if the timestamp is in the past.
So it seems that it is possible to resume and get all the events that were added to the oplog from a certain time.
There is a parameter that seems to accomplish what I need:
watch([], {startAtOperationTime: ...})
https://github.com/mongodb/specifications/blob/master/source/change-streams/change-streams.rst#startatoperationtime
The parameter is a Timestamp; I don't understand how to translate a particular date into the correct timestamp.
startAtOperationTime is a new parameter for changestreams introduced in MongoDB 4.0 and newer driver versions. It allows you to ensure that you're not missing any writes just in case the stream was interrupted, and you don't have access to the resume token.
One caveat of using startAtOperationTime is that your app needs to be prepared to accept that it may see a write event twice when resuming the changestream, since you're resuming from an arbitrary point in time.
In node, this can be done by constructing a Timestamp object and passing it into watch():
const { MongoClient, Timestamp } = require('mongodb')

async function run() {
  const con = await MongoClient.connect(uri, { useNewUrlParser: true })
  const ts = new Timestamp(1, 1560812065)
  con.db('test').collection('test').watch([], { startAtOperationTime: ts })
    .on('change', console.log)
}
The Timestamp object itself is created in the form:
new Timestamp(ordinal, unix_epoch_in_seconds)
A detailed explanation can be found in BSON Timestamp.
In node, you can get the current epoch (in milliseconds) using e.g.:
(new Date).getTime()
bearing in mind that this needs to be converted to seconds for creating the Timestamp object needed for startAtOperationTime.
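For example, to resume from a particular calendar date (a sketch; the date here is arbitrary, and the constructor form matches the one above):
const { Timestamp } = require('mongodb')

const date = new Date('2019-06-17T00:00:00Z')      // arbitrary start date
const seconds = Math.floor(date.getTime() / 1000)  // milliseconds -> seconds
const ts = new Timestamp(1, seconds)               // (ordinal, seconds)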

node cluster: ensure only one instance of a function is running at a time

I have a cluster of node worker servers that handle hitting an API and inserting data into a MongoDB database. The problem I am having is that one of these functions appears, every so often, to insert two copies of the same document. It checks whether the document has already been created with a query like so:
gameDetails.findOne({ gameId: gameId }, function(err, gameCheck) {
  if (!gameCheck) { /* insert the document */ }
});
How can I ensure that this function only ever runs one instance at a time? Alternatively, if I have not deduced the actual root problem, what could cause a Mongo query like this to sometimes result in multiple documents with the same gameId being inserted?
findOne is being called multiple times before the document has had time to be inserted, i.e. something like the following is happening:
findThenInsert()
findThenInsert()
findThenInsert()
// findOne returns null, insert called
// findOne returns null, insert called
// document gets inserted
// findOne returns a gameCheck
// document gets inserted
You should use a unique index to prevent duplicates. Then, your node instances could optimistically call insert straight away, and simply handle the error if they were too late, which is similar to your 'if found do nothing' logic.
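For example (a sketch; the collection and field names follow the question, and 11000 is MongoDB's duplicate-key error code):
// Run once: enforce uniqueness at the database level
gameDetails.createIndex({ gameId: 1 }, { unique: true })

// Each worker can then insert optimistically and treat a duplicate-key
// error as "another worker got there first"
gameDetails.insertOne({ gameId: gameId /* , ...other fields */ }, function(err) {
  if (err && err.code !== 11000) throw err   // 11000 = duplicate key; safe to ignore
})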
Alternatively if you don't mind the document being updated each time, you can use the upsert method, which is atomic:
db.collection.update(query, update, {upsert: true})
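With the Node driver, the equivalent upsert might look like this (a sketch; lastChecked is a hypothetical field):
gameDetails.updateOne(
  { gameId: gameId },                      // query
  { $set: { lastChecked: new Date() } },   // update
  { upsert: true }                         // insert if no document matches
)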
Also see
MongoDB atomic "findOrCreate": findOne, insert if nonexistent, but do not update

node-postgres: how to prepare a statement without executing the query?

I want to create a "prepared statement" in postgres using the node-postgres module. I want to create it without binding it to parameters because the binding will take place in a loop.
In the documentation I read:
query(object config, optional function callback) : Query
If _text_ and _name_ are provided within the config, the query will result in the creation of a prepared statement.
I tried
client.query({"name":"mystatement", "text":"select id from mytable where id=$1"});
but when I try passing only the text and name keys in the config object, I get an exception; the (translated) message is: binding 0 parameters but the prepared statement expects 1.
Is there something I am missing? How do you create/prepare a statement without binding it to specific values, in order to avoid re-preparing the statement on every step of a loop?
I just found an answer on this issue by the author of node-postgres.
With node-postgres the first time you issue a named query it is parsed, bound, and executed all at once. Every subsequent query issued on the same connection with the same name will automatically skip the "parse" step and only rebind and execute the already planned query.
Currently node-postgres does not support a way to create a named, prepared query and not execute the query. This feature is supported within libpq and the client/server protocol (used by the pure javascript bindings), but I've not directly exposed it in the API. I thought it would add complexity to the API without any real benefit.
Since named statements are bound to the client in which they are created, if the client is disconnected and reconnected or a different client is returned from the client pool, the named statement will no longer work (it requires a re-parsing).
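In practice this means you get most of the benefit by simply reusing the same name and text; a sketch using the question's statement (client is an already-connected node-postgres client):
client.query({
  name: 'mystatement',
  text: 'select id from mytable where id=$1',
  values: [1]
}, function(err, result) { /* parsed, bound and executed */ })

client.query({
  name: 'mystatement',
  text: 'select id from mytable where id=$1',
  values: [2]
}, function(err, result) { /* parse skipped; rebound and executed */ })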
You can use pg-prepared for that:
var prep = require('pg-prepared')
// First prepare the statement without binding parameters
var item = prep('select id from mytable where id=${id}')
// Then execute the query, binding the parameters inside the loop
for (const id of [1, 2, 3]) {
  client.query(item({id: id}), function(err, result) { /* ... */ })
}
Update: Reading your question again, here's what I believe you need to do. You need to pass a "values" array as well.
Just to clarify: where you would normally "prepare" your query, just prepare the object you pass to query(), without the values array. Then, where you would normally "execute" the query, set the values array on the object and pass it to query(). The first time, the driver will do the actual prepare for you, and will simply do binding and execution for the rest of the iterations.
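Concretely, that pattern might look like this (a sketch following the question's statement; client is an already-connected node-postgres client):
// "Prepare": build the config object once, without the values array
var query = {
  name: 'mystatement',
  text: 'select id from mytable where id=$1'
}

// "Execute": set the values and issue the query inside the loop; node-postgres
// parses on the first call only, then just rebinds and executes
for (const id of [1, 2, 3]) {
  query.values = [id]
  client.query(query, function(err, result) { /* ... */ })
}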
