Inserting unique items into a Meteor collection from an asynchronous callback - node.js

I have defined a callback function with the Meteor.bindEnvironment wrapper as described in the
Meteor Async Guide. I used the wrapper so that a Meteor collection would be available to this asynchronous callback. Within the callback I am trying to insert only documents with unique values for an attribute called 'title'. I have found several resources demonstrating the Mongo way of handling this, but the required functions (e.g. findAndModify, or the upsert option to update) are not yet implemented by Meteor.
I have resorted to performing a query for the incoming title value and inserting a new document only if the query returns no matching documents. However, this check-then-insert is not atomic: because the callback runs asynchronously, two callbacks can both see "no match" before either insert completes, and duplicates end up being inserted into the collection.
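For illustration, the racy check-then-insert described above looks roughly like this (a sketch; the Items collection and the shape of the incoming message are made up for the example):

var insertIfNew = Meteor.bindEnvironment(function (message) {
  // Race window: two overlapping callbacks can both see "no match" here
  // before either insert has been written.
  if (!Items.findOne({ title: message.title })) {
    Items.insert({ title: message.title, body: message.body });
  }
}, function (err) {
  console.log('bindEnvironment failed', err);
});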
Is there a Meteor or Node.js pattern for wrapping a critical section like this with a lock?
Thanks!

If only one actor (client or server) is doing the polling, I don't see how you would get an async bypass of your 'already received' block list unless your update call was going off too often. Is there nothing in the API to return only messages after a timestamp, or to subscribe only to changes?
If you have multiple clients each polling a common outside source for updates and then trying to sync to a merged common collection (which you don't say, but which would have the problem you describe), make each stream unique by title + userid.
You also may want just local collections on the client to track what the client has seen, and a separate one on the server if you are trying for an audit trail.

Related

How to reliably wait for an index to be ready in mongodb with the node.js driver?

Let's say I have a feature in my application which relies on an index to be fully functional to work properly.
For example a unique index which guards against creating duplicates.
Is there a way to find out if the indexes are ready for a specific collection using the MongoDB Node.js driver?
Ideally I would like to write a function with this contract
async function waitForIndexes(collection:Collection):Promise<void>
I found that it's possible to use admin commands to poll background tasks and see if one of them corresponds to an index creation, but that seems a bit tricky to implement in a reliable way. Is there a better way?
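One approach (a sketch, not a definitive answer): poll the collection's index list until the expected index name shows up. This assumes you know the index name in advance and that, on your server version, an index is only listed once it is usable; the poll interval, index name, and connection details below are made up for the example.

const { MongoClient } = require('mongodb');

// Resolve once an index with the given name is listed on the collection.
async function waitForIndex(collection, indexName, pollMs = 500) {
  while (!(await collection.indexExists(indexName))) {
    await new Promise(resolve => setTimeout(resolve, pollMs));
  }
}

// Usage sketch:
// const client = await MongoClient.connect('mongodb://localhost:27017');
// const users = client.db('app').collection('users');
// await users.createIndex({ email: 1 }, { unique: true, name: 'email_unique' });
// await waitForIndex(users, 'email_unique');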

React Flux dispatcher vs Node.js EventEmitter - scalable?

When you use Node's EventEmitter, you subscribe to a single event. Your callback is only executed when that specific event is fired:
eventBus.on('some-event', function(data){
  // data is specific to 'some-event'
});
In Flux, you register your store with the dispatcher, then your store gets called when every single event is dispatched. It is the job of the store to filter through every event it gets, and determine if the event is important to the store:
eventBus.register(function(data){
  switch(data.type){
    case 'some-event':
      // now data is specific to 'some-event'
      break;
  }
});
In this video, the presenter says:
"Stores subscribe to actions. Actually, all stores receive all actions, and that's what keeps it scalable."
Question
Why and how is sending every action to every store [presumably] more scalable than only sending actions to specific stores?
The scalability referred to here is more about scaling the codebase than scaling in terms of how fast the software is. Data in Flux systems is easy to trace because every store is registered to every action, and the actions define every app-wide event that can happen in the system. Each store can determine how it needs to update itself in response to each action, without the programmer needing to decide which stores to wire up to which actions, and in most cases you can change or read the code for a store without needing to worry about how it affects any other store.
At some point the programmer will need to register the store. The store is very specific to the data it'll receive from the event. How exactly is looking up the data inside the store better than registering for a specific event, and having the store always expect the data it needs/cares about?
The actions in the system represent the things that can happen in a system, along with the relevant data for that event. For example:
A user logged in; comes with user profile
A user added a comment; comes with comment data, item ID it was added to
A user updated a post; comes with the post data
So, you can think about actions as the database of things the stores can know about. Any time an action is dispatched, it's sent to each store. So, at any given time, you only need to think about your data mutations a single store + action at a time.
For instance, when a post is updated, you might have a PostStore that watches for the POST_UPDATED action, and when it sees it, it will update its internal state to store off the new post. This is completely separate from any other store which may also care about the POST_UPDATED event—any other programmer from any other team working on the app can make that decision separately, with the knowledge that they are able to hook into any action in the database of actions that may take place.
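A rough sketch of what that could look like with Facebook's Dispatcher (the PostStore shape and the action payload are assumptions for illustration):

var Dispatcher = require('flux').Dispatcher;
var dispatcher = new Dispatcher();

var PostStore = { posts: {}, dispatchToken: null };

// The store receives every action and decides which ones it cares about.
PostStore.dispatchToken = dispatcher.register(function (action) {
  switch (action.type) {
    case 'POST_UPDATED':
      PostStore.posts[action.post.id] = action.post;
      break;
    // all other actions are simply ignored by this store
  }
});

// Somewhere else in the app:
dispatcher.dispatch({ type: 'POST_UPDATED', post: { id: 1, title: 'Hello' } });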
Another reason this is useful and scalable in terms of the codebase is inversion of control; each store decides what actions it cares about and how to respond to each action; all the data logic is centralized in that store. This is in contrast to a pattern like MVC, where a controller is explicitly set up to call mutation methods on models, and one or more other controllers may also be calling mutation methods on the same models at the same time (or different times); the data update logic is spread through the system, and understanding the data flow requires understanding each place the model might update.
Finally, another thing to keep in mind is that registering vs. not registering is sort of a matter of semantics; it's trivial to abstract away the fact that the store receives all actions. For example, in Fluxxor, the stores have a method called bindActions that binds specific actions to specific callbacks:
this.bindActions(
  "FIRST_ACTION_TYPE", this.handleFirstActionType,
  "OTHER_ACTION_TYPE", this.handleOtherActionType
);
Even though the store receives all actions, under the hood it looks up the action type in an internal map and calls the appropriate callback on the store.
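A minimal sketch of how that kind of lookup could be implemented (this is not Fluxxor's actual code, just the idea):

// The store still receives every action, but an internal map routes each
// action type to the matching handler; unmatched actions are ignored.
function bindActions(store, dispatcher, handlers) {
  return dispatcher.register(function (action) {
    var handler = handlers[action.type];
    if (handler) {
      handler.call(store, action);
    }
  });
}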
I've been asking myself the same question, and can't see technically how registering adds much beyond simplification. I will pose my understanding of the system so that, hopefully, if I am wrong, I can be corrected.
TL;DR: EventEmitter and Dispatcher serve similar purposes (pub/sub) but focus their efforts on different features. Specifically, the 'waitFor' functionality (which allows one event handler to ensure that a different one has already been called) is not available with EventEmitter; Dispatcher has focused its efforts on that feature.
The end result of the system is to communicate to the stores that an action has happened. Whether the store 'subscribes to all events, then filters' or 'subscribes to a specific event' (with the filtering done at the dispatcher) should not affect that result: data is transferred through your application either way. (A handler always just switches on the event type and processes it, e.g. it doesn't want to operate on ALL events.)
As you said, "At some point the programmer will need to register the store." It is just a question of the fidelity of the subscription, and I don't think a change in fidelity has any effect on 'inversion of control', for instance.
The added (killer) feature in Facebook's Dispatcher is its ability to 'waitFor' a different store to handle the event first. The question is: does this feature require that each store have only one event handler?
Let's look at the process. When you dispatch an action on the Dispatcher, it (omitting some details):
iterates over all registered subscribers (to the dispatcher)
calls the registered callback (one per store)
the callback can call 'waitFor()' and pass a 'dispatchToken'. This internally references the callback registered by a different store. It is executed synchronously, causing the other store to receive the action and be updated first. This requires that 'waitFor()' is called before the code which handles the action.
the callback invoked via 'waitFor' switches on the action type to execute the correct code.
the original callback can now run its code, knowing that its dependencies (other stores) have already been updated.
the callback switches on the action 'type' to execute the correct code.
This seems a very simple way to allow event dependencies.
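For concreteness, here is a rough sketch of that dependency mechanism using Facebook's Dispatcher (the store names and the 'add-item' action are made up):

var Dispatcher = require('flux').Dispatcher;
var dispatcher = new Dispatcher();

var ItemStore = { items: [] };
var CountStore = { count: 0 };

ItemStore.dispatchToken = dispatcher.register(function (action) {
  if (action.type === 'add-item') {
    ItemStore.items.push(action.item);
  }
});

CountStore.dispatchToken = dispatcher.register(function (action) {
  if (action.type === 'add-item') {
    // Ensure ItemStore has processed this action before we read from it.
    dispatcher.waitFor([ItemStore.dispatchToken]);
    CountStore.count = ItemStore.items.length;
  }
});

dispatcher.dispatch({ type: 'add-item', item: { text: 'buy milk' } });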
Basically all callbacks are eventually called, but in a specific order, and each then switches to only execute specific code. So it is as if we only triggered a handler for the 'add-item' event on each store, in the correct order.
If subscriptions were at a callback level (not 'store' level), would this still be possible? It would mean:
Each store would register multiple callbacks to specific events, keeping references to their 'dispatchTokens' (as is done currently)
Each callback would have its own 'dispatchToken'
The user would still 'waitFor' a specific callback, but it would now be a specific handler on a specific store
The dispatcher would then only need to dispatch to callbacks of a specific action, in the same order
Possibly the smart people at Facebook have figured out that adding the complexity of individual callbacks would actually be less performant, or possibly it is just not a priority.

Mongoose Schema Post 'save' callback ordering

I'm using the mongoose Schema.post('save', postSaveCallback) method in order to send updates over a socket, so that subscribed clients in a web browser can display the state of the world in the database. I am wondering: is the post-save callback guaranteed to be executed in the same order that the save method was called? That would guarantee that the state represented in the client view is the accurate state of the world. If the ordering of these post-save callbacks is not guaranteed to match the order in which the mongoose save method is called, the client's view could potentially get out of sync with the real database representation.
Is there a better way to do this or is my approach sensible?
Furthermore, is it guaranteed that when postSaveCallback is called the save operation on the underlying mongodb has fully completed and was successful?
Would be very grateful for any pointers on this.
Thanks in advance.
As with async things in general, order is not defined. The postSaveCallback is called when the save operation returns and then is executed when Node gets around to it. Some saves take longer than others which may have been kicked off before, so the callbacks could occur in pretty much any order. You'll have to modify how your callbacks coordinate with each other to ensure whatever kind of consistency you require in your state.
The save callback takes an err argument, so naturally the mere fact that the callback fires doesn't mean that the operation succeeded; you still need to check err.
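One way to cope with the undefined ordering (a sketch, not the only approach): have Mongoose maintain an updatedAt timestamp and let clients discard updates older than what they already hold. The schema, event name, io/socket objects, and render() function below are assumptions for illustration.

var mongoose = require('mongoose');

// timestamps: true makes Mongoose maintain createdAt/updatedAt on each save.
var thingSchema = new mongoose.Schema({ name: String, state: String }, { timestamps: true });

// Server side: emit the saved document along with its updatedAt stamp.
thingSchema.post('save', function (doc) {
  io.emit('thing:saved', { id: String(doc._id), updatedAt: doc.updatedAt, data: doc.toObject() }); // io is an assumed socket.io server
});

// Client side: drop updates that are older than what was already rendered.
var latestSeen = {};
socket.on('thing:saved', function (msg) {
  var prev = latestSeen[msg.id];
  if (prev && new Date(msg.updatedAt) <= new Date(prev)) {
    return; // stale update; a newer one already arrived
  }
  latestSeen[msg.id] = msg.updatedAt;
  render(msg.data); // render() is an assumed UI update function
});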

How does nodejs-redis(&connect-redis) deal with sync and async?

I use connect-redis for my session store, and when I use req.session it seems all the operations on it are synchronous; it's like operating on ordinary JavaScript variables, and the code obeys the order. But I checked the source code, which works asynchronously, so I wonder why req.session acts like that.
Another question: if I have multiple Redis queries,
client.sadd('test', 1);
client.del('test');
client.sadd('test', 2);
client.sadd('test', 3);
no matter where I put the del operation, the results are always the same. I thought these queries might be run in any order, right? Since they are all called asynchronously, I expected the results to be different every time.
Thanks for your help.
The fact that roundtrips to the Redis server are managed asynchronously does not mean the queries will be sent in random order.
Redis (and therefore most Redis client libraries) supports pipelining, generally used to optimize the number of roundtrips. The idea is to send multiple queries, and then wait for the replies. The order is critical, because it is used by the client to match queries and replies.
Node.js is very well suited to support this kind of mechanism. Matt Ranney's node_redis client supports pipelining in a transparent way. Provided the same client object is used, all the queries will be serialized and executed in order.
In your example, it is normal that the queries are always executed in the same order. You can check this point by using the monitor command to display the flow of queries sent to Redis.
Now, it is important that the last query of the pipeline is associated with a callback; otherwise your program will never know when the last query is complete.
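A sketch of that last point with the callback-style node_redis API (the key names are just the ones from the question):

var redis = require('redis');
var client = redis.createClient();

// Commands issued on the same client are sent and executed in order.
client.sadd('test', 1);
client.del('test');
client.sadd('test', 2);
client.sadd('test', 3, function (err, reply) {
  // Only the last command carries a callback; when it fires, every earlier
  // command on this client has already completed.
  if (err) throw err;
  console.log('all queued commands finished, last reply:', reply);
});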

Question about mongodb capped collections + tailable cursors

I'm building a queueing system that passes a message from one process to another via a queue implemented in MongoDB with capped collections and tailable cursors.
The receiving process loops infinitely looking for new documents in the capped collection, and when it finds one it performs an operation.
My question is, if I implement multiple receiving processes is there a way to guarantee that a new document will only be read once by one of the processes using a tailable cursor? The goal is to avoid the operation being performed twice if there are two receiving processes looking for new messages in the queue. I'm relatively new to mongodb programming so I'm still getting a feel for all of its features.
The MongoDB documentation contains a thorough description of ways to achieve an atomic update. You cannot ensure that only one process receives the new document, but you can implement an atomic update after receiving it to ensure that only one process acts on it.
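A sketch of that claim-after-read idea with the current Node.js driver (the claimed field name is made up; each message would be inserted with claimed: false so the update does not change the document size, which capped collections require):

// A consumer that reads a document from the tailable cursor tries to
// atomically flip claimed from false to true; the filter and update happen
// as one operation, so only one consumer's update matches.
async function tryClaim(collection, doc) {
  const result = await collection.updateOne(
    { _id: doc._id, claimed: false },
    { $set: { claimed: true } }
  );
  return result.modifiedCount === 1; // true only for the winning consumer
}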
I have recently been looking into this problem and I would be interested to know if there are other ways to have multiple readers (consumers) without relying on atomic updates.
This is what I have come up with: divide your logic into two "modules". The first module is responsible for fetching new documents from the tailable cursor; the second module is responsible for working with an arbitrary document. In this manner, you have only one consumer (module one) fetching documents, which it then hands off to multiple document workers (module two).
Both modules can be implemented in different processes and even in different languages. For example, a Node.js app could be fetching the documents and sending them to a pool of scripts written in Python ready to process documents concurrently.
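A rough sketch of that split using the Node.js driver (the connection URL, database and collection names are made up; handleDocument stands in for whatever module two is, e.g. a queue feeding worker processes):

const { MongoClient } = require('mongodb');

async function runFetcher(handleDocument) {
  const client = await MongoClient.connect('mongodb://localhost:27017');
  const messages = client.db('app').collection('messages'); // assumed capped collection

  // Module one: the single tailable-cursor consumer.
  const cursor = messages.find({}, { tailable: true, awaitData: true });
  for await (const doc of cursor) {
    // Module two: hand the document off for processing.
    handleDocument(doc);
  }
}

// runFetcher(doc => console.log('got message', doc));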

Resources