Concurrent access to a document with mongoose - node.js

I am writing a web application where multiple users can perform simultaneous operations on the same document in MongoDB.
I use the mean.io stack, but I am quite new to it.
I was wondering how mongoose manages concurrency. Every "user click" operation first performs a read to get the document, then a save after some calculations. Of course the read-calculate-save sequence is not atomic.
Does mongoose use a 'last change wins' policy, or does it throw a versioning error?
Does it make sense in this case to use a queue?
Thanks, best regards.

Yes, the last change will win.
A queue could be a good option to solve the problem, but I'll suggest 2 other ways:
You could use more advanced MongoDB update operators, such as $inc (http://docs.mongodb.org/manual/reference/operator/update/inc/), to compute atomically (if your computations are too complicated this may not be possible); see the sketch after these options.
If you don't necessarily need the correct count available at all times, you could take a 'big data' approach and just store the raw click information.
Whenever you need the data (or, say, every hour or day), you could then use the MongoDB aggregation framework, or its map-reduce feature, to compute the correct count.
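As a rough illustration of the first option, here is a minimal sketch with mongoose; the Counter model and its clicks field are assumptions, not part of the original question:
// Minimal sketch: atomic increment with mongoose (Counter and 'clicks' are assumed names).
const mongoose = require('mongoose');

const Counter = mongoose.model('Counter', new mongoose.Schema({
  name: String,
  clicks: { type: Number, default: 0 }
}));

function registerClick(name) {
  // $inc is applied atomically on the server, so concurrent clicks never
  // overwrite each other the way a read-calculate-save cycle can.
  return Counter.findOneAndUpdate(
    { name: name },
    { $inc: { clicks: 1 } },
    { new: true, upsert: true }   // return the updated doc, create it if missing
  );
}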

Related

Robot's Tracker Threads and Display

Application: The proposed application has a TCP server able to handle several connections with the robots.
I chose to work with a database rather than files, so I'm using a SQLite db to save information about the robots and their full history, robot models, tasks, etc...
The robots send us several kinds of data, like odometry, task information, and so on...
I create a thread for every new robot connection to handle the messages and update the robots' information in the database. Now let's talk about my problems:
The application has to show information about the robots in real time, and I was thinking about using QSqlQueryModel, setting the right query and then showing it on a QTableView, but then I ran into some problems/solutions to think about:
Problem number 1: There is information to show on the QTableView that is not in the database: I have the current consumption and the current charge (in capacity) in the database, but I also want to show the remaining battery time in my table. How can I add that column, with the right behaviour (the math implemented), to my TableView?
Problem number 2: I will be receiving messages every second for each robot, so updating the db and then the GUI (re-running the query) may not be the best solution when I have a big number of robots connected. Is it better to update the table, and only update the db every minute or so? If I use this method I can't use QSqlQueryModel to update the tables, so what approach do you recommend?
Thanks
SancheZ
I have run into a similar problem before; my conclusion was that QSqlQueryModel is not the best option for display purposes. You may want some processing on the query results, or you may want to create, remove, or change display data based on the results for a fancier GUI. I think it is best to implement your own delegates and override the relevant view/model methods - setData, setEditorData.
This way you have control over all your columns and a direct mapping between the raw data and its display equivalent (i.e. the EditRole/UserRole data).
Yes, it is better to update your view in real time and run a batch execute at a lower frequency to update the database. In general the app is the middle layer and the db is the bottom layer for data monitoring, unless you use the db as an in-memory shared cache.
EDIT: One important point, you cannot run updates from multiple threads (well, you can, but SQLite blocks the thread until it gets the lock), so it is best to run updates from a single thread.

How to use a mongodb cursor with node.js

Let's say that I have a collection in my database called rabbits. My app uses this database and currently there are multiple users using my app. The users want to see the rabbits one by one; when they start the app they see 1 rabbit and then they press 'next' to see the next one, and so on.
I don't want to query the database every time the user presses next, so I decided to use cursors. I am thinking of creating a simple map data structure (working as a cache) that maps a user to its cursor. So before querying the database again we simply check in the map first.
Is this good practice? Should I perhaps use Redis here instead?
There are probably a million answers to this question and most would be correct. Just some possibilities:
Of course you can use Redis, and read it from memory.
You can also scale down a bit and use something like node-cache, which will have less overhead and is simpler to implement (see the sketch below).
You can take the cursor --> array --> JSON, and if you are not worried about constant new rabbits (after all, rabbits do multiply fast :) you can write the rabbits to a JSON file and let the client page through it.
You can of course aggregate your MongoDB cursor... or have a cron job run every few minutes to create a new 'rabbit pick' cursor.
On and on it goes.
The critical thing is to match what you decide with the services, memory and cores on your server(s).
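For example, the node-cache idea might look roughly like this; it is only a sketch, and the per-user cache key and the 'rabbits' collection name are assumptions:
// Minimal sketch: cache each user's rabbit list in memory with node-cache.
const NodeCache = require('node-cache');
const cache = new NodeCache({ stdTTL: 600 });   // entries expire after 10 minutes

async function getRabbit(db, userId, index) {
  let rabbits = cache.get(userId);
  if (!rabbits) {
    // Only hit MongoDB when this user's list isn't cached yet.
    rabbits = await db.collection('rabbits').find().toArray();
    cache.set(userId, rabbits);
  }
  return rabbits[index];   // pressing 'next' simply requests index + 1
}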

Mongodb, can i trigger secondary replication only at the given time or manually?

I'm not a MongoDB expert, so I'm a little unsure about the server setup now.
I have a single instance running MongoDB 3.0.2 with WiredTiger, accepting both read and write ops. It collects logs from clients, so the write load is decent. Once a day I want to process these logs and calculate some metrics using the aggregation framework; the data set to process is roughly all logs from the last month, and all the calculations take about 5-6 hours.
I'm thinking about splitting writes and reads to avoid locks on my collections (the server continues to write logs while I'm reading; newly written logs may match my queries, but I can skip them, because I don't need 100% accuracy).
In other words, I want to make a setup with a secondary for reads, where replication does not run continuously but starts at a configured time, or better, is triggered before the read operations start.
I'm doing all my processing from node.js, so one option I see here is to export data created in some period like [yesterday, today], import it into the read instance myself, and run the calculations after the import is done. I was looking at replica sets and master/slave replication as possible setups, but I didn't see how to configure them to achieve the described scenario.
So maybe I'm wrong and missing something here? Are there any other options to achieve this?
Your idea of using a replica-set is flawed for several reasons.
First, a replica-set always replicates the whole mongod instance. You can't enable it for individual collections, and certainly not only for specific documents of a collection.
Second, deactivating replication and enabling it before you start your report generation is not a good idea either. When you enable replication, the new slave will not be immediately up-to-date. It will take a while until it has processed the changes since its last contact with the master. There is no way to tell how long this will take (you can check how far a secondary is behind the primary using rs.status() and comparing the secondary's optimeDate with its lastHeartbeat date).
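Roughly, that check could look like this in the mongo shell; this is only a sketch of the comparison mentioned above:
// Rough sketch: estimate how far each secondary is behind (run in the mongo shell).
var status = rs.status();
status.members.forEach(function (m) {
  if (m.stateStr === 'SECONDARY') {
    // lastHeartbeat and optimeDate are Date objects; their difference is in milliseconds.
    print(m.name + ' approx. lag (ms): ' + (m.lastHeartbeat - m.optimeDate));
  }
});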
But when you want to perform data-mining on a subset of your documents selected by timespan, there is another solution.
Transfer the documents you want to analyze to a new collection. You can do this with an aggregation pipeline consisting only of a $match which matches the documents from the last month followed by an $out. The out-operator specifies that the results of the aggregation are not sent to the application/shell, but instead written to a new collection (which is automatically emptied before this happens). You can then perform your reporting on the new collection without locking the actual one. It also has the advantage that you are now operating on a much smaller collection, so queries will be faster, especially those which can't use indexes. Also, your data won't change between your aggregations, so your reports won't have any inconsistencies between them due to data changing between them.
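From node.js, a minimal sketch of that $match/$out pipeline might look like this; the database, collection and field names are assumptions:
// Copy last month's documents into a separate collection for reporting.
// Assumes a 'logs' collection with a 'createdAt' date field and the MongoDB Node.js driver (3.x+).
const { MongoClient } = require('mongodb');

async function snapshotLastMonth() {
  const client = await MongoClient.connect('mongodb://localhost:27017');
  const db = client.db('logsdb');
  const since = new Date(Date.now() - 30 * 24 * 60 * 60 * 1000);   // roughly one month back
  await db.collection('logs').aggregate([
    { $match: { createdAt: { $gte: since } } },
    { $out: 'logs_report' }   // (re)creates logs_report with the matched documents
  ]).toArray();               // consuming the cursor actually runs the pipeline
  await client.close();
}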
When you are certain that you will need a second server for report generation, you can still use replication and perform the aggregation on the secondary. However, I would really recommend you to build a proper replica-set (consisting of primary, secondary and an arbiter) and leave replication active at all times. Not only will that make sure that your data isn't outdated when you generate your reports, it also gives you the important benefit of automatic failover should your primary go down for some reason.

How to account for a failed write or add process in Mongodb

So I've been trying to wrap my head around this one for weeks, but I just can't seem to figure it out. MongoDB isn't equipped to deal with rollbacks as we typically understand them (i.e. a client adds information to the database, like a username for example, but quits in the middle of the registration process). Now the DB is left with some "hanging" information that isn't associated with anything. How can MongoDB handle that? Or if no one can answer that question, maybe they can point me to a source/example that can? Thanks.
MongoDB does not support transactions; you can't perform atomic multi-statement transactions to ensure consistency. You can only perform an atomic operation on a single document at a time. When dealing with NoSQL databases you need to validate your data as much as you can; they seldom complain about anything. There are some workarounds or patterns to achieve SQL-like transactions. For example, in your case, you can store the user's information in a temporary collection, check data validity, and store it in the users collection afterwards.
This should be straightforward, but things get more complicated when we deal with multiple documents. In this case, you need to create a designated collection for transactions. For instance,
transaction collection:
{
  _id: ...,
  state: "new_transaction",
  value1: <values from document_1 before updating document_1>,
  value2: <values from document_2 before updating document_2>
}
// update document 1
// update document 2
Ooohh!! something went wrong while updating document 1 or 2? No worries, we can still restore the old values from the transaction collection.
This pattern is known as compensation to mimic the transactional behavior of SQL.
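A rough sketch of that compensation pattern with the Node.js driver could look like the following; the collection names, the two-document update, and the field names are all assumptions:
// Minimal sketch: save the old state, apply the updates, and compensate on failure.
async function updateTwoDocs(db, id1, id2, update1, update2) {
  const docs = db.collection('documents');
  const txns = db.collection('transactions');

  // Record the pre-update state of both documents.
  const before1 = await docs.findOne({ _id: id1 });
  const before2 = await docs.findOne({ _id: id2 });
  const { insertedId: txnId } = await txns.insertOne({
    state: 'new_transaction',
    value1: before1,
    value2: before2
  });

  try {
    await docs.updateOne({ _id: id1 }, { $set: update1 });
    await docs.updateOne({ _id: id2 }, { $set: update2 });
    await txns.updateOne({ _id: txnId }, { $set: { state: 'done' } });
  } catch (err) {
    // Something went wrong: restore the old values from the transaction record.
    await docs.replaceOne({ _id: id1 }, before1);
    await docs.replaceOne({ _id: id2 }, before2);
    await txns.updateOne({ _id: txnId }, { $set: { state: 'rolled_back' } });
    throw err;
  }
}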

Multiple node instances with a single database

I'm currently writing a Node app and I'm thinking ahead about scaling. As I understand it, horizontal scaling is one of the easier ways to scale an application to handle more concurrent requests. My working copy currently uses MongoDB on the backend.
My question is thus this: I have a data structure that resembles a linked list and requires its order to be strictly maintained. My (imaginary) concern is that when there is a race condition on the database between multiple node instances, the resolution of the linked list could be incorrect.
To give an example: imagine the server holding the list a->b. Instance 1 comes in with object c and instance 2 comes in with object d. It is possible that both instances read a->b and decide to append their own object to the list. Instance 1 will then think its insertion produced a->b->c while instance 2 thinks it produced a->b->d, when the database actually holds a->b->c->d.
In general, this sounds like a job for optimistic locking; however, as I understand it, neither MongoDB nor Redis (the other database I am considering) does transactions in the SQL manner.
I therefore imagine the solution to be one of the below:
Implement my own transaction in MongoDB using flags. The client does a findAndModify on the lock variable and if successful, performs the operations. If unsuccessful, the client retries after a certain timeout.
Use Redis transactions and pubsub to achieve the same effect. I'm not exactly sure how to do this yet, but it sounds like it might be plausible.
Implement some sort of smart load balancing. If multiple clients are operating on the same item, route them to the same instance. Since JS is single threaded, the problem would be solved. Unfortunately, I didn't find a straightforward way to do that.
I'm sure there exists a better, more elegant way to achieve the above, and I would love to hear any solutions or suggestions. Thank you!
If I understood correctly, and the list is being stored as one single document, you might be looking at row versioning. So add a property to the document that will hold the version; when you update, you increase (or change) the version and make it a conditional update:
// update(condition, newValue) - the condition only matches if the version is unchanged
update({ _id: docId, version: whateverYouReceivedWhenYouDidFind },
       { $set: newValue, $inc: { version: 1 } })
Hope it helps.
Gus
You want the findAndModify command in MongoDB, which guarantees an atomic modification while returning the newly modified doc. As the changes are serial and atomic, instance 1 will have a->b->c and instance 2 will have a->b->c->d.
Cheers
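For instance, an atomic append with mongoose's findOneAndUpdate might look like this; the List model and its items array are assumptions:
// Minimal sketch, inside an async handler: atomic append to a list stored in one document.
const updated = await List.findOneAndUpdate(
  { _id: listId },
  { $push: { items: newItem } },   // $push is applied atomically on the server
  { new: true }                    // return the document after the update
);
// updated.items reflects this append plus any concurrent appends that landed before it.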
If all you are doing is adding new elements to the list, you could use a Redis list and include the time in every value you add. The list may be unsorted in Redis but should be quickly sortable when retrieved.
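A minimal sketch of that with the node-redis client (v4-style API; the key name and entry shape are assumptions):
// Minimal sketch: append list entries with a timestamp so order can be rebuilt on read.
const { createClient } = require('redis');

async function appendAndRead(newItem) {
  const client = createClient();
  await client.connect();

  // Each entry carries the time it was added.
  await client.rPush('list:items', JSON.stringify({ value: newItem, at: Date.now() }));

  // On read, parse and sort by the stored time.
  const raw = await client.lRange('list:items', 0, -1);
  await client.quit();
  return raw.map(s => JSON.parse(s)).sort((a, b) => a.at - b.at);
}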
