When I add a startAt(priority) to an on("value", fn) call, I don't get callbacks after the node is created.
var myDataRef = firebaseRef.push(data);
myDataRef.startAt(200).on("value", function(snap) { /* handle snapshot */ });
My setup:
I have a client that pushes data to Firebase.
The client adds a value callback on the data ref returned in the step above.
I have a server component that uses firebase-work-queue to extract items from Firebase as they arrive. But instead of removing the items from the queue, I set the priority of each item to 100 to mark it as claimed by a worker.
The server component then does its thing with the data and, when finished, updates the data and sets the priority to 200, marking the task as complete.
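A hedged sketch of those two worker steps, using the legacy Firebase JavaScript API (itemRef, a reference to the claimed queue item, and updatedData are hypothetical names, not from my actual code):
itemRef.setPriority(100);                  // claim: changes only the priority, not the value
// ... do the work ...
itemRef.setWithPriority(updatedData, 200); // complete: writes the new value with priority 200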
On the client, if I DON'T USE startAt(200), I get the following callbacks:
A null value when the data is first pushed.
Then I get a callback with the data that I actually pushed.
Finally, I get a callback when the data is updated and priority 200 is set.
You'll notice that I didn't get a callback when the server claimed the queue entry and set the priority to 100. I've noticed that just changing a node's priority doesn't generate value callbacks. I guess that's technically correct since you're not changing the value, just the priority. And technically correct is the best kind of correct.
I've structured my code to ignore the extraneous callbacks in steps 1 and 2 above. But I'd like to use startAt(200) to prevent those callbacks. However, when I apply startAt(200), I only get the first callback with null data (step 1). This doesn't make sense, as the node doesn't even have a priority at that point.
So my question is what is going on here?
I'm working on a project that has a Feeds page showing all types of posts by all users, a type-specific List page, and a Detail page for each post.
Pages-
1. Feeds
2. List (Type specific)
3. Detail (detail of a post)
So I have the following Mongo collections -
1. Feed
2. type1 post
3. type2 post
4. type3...
Now when a user posts a new Post, I save it to the respective collection, let's say 'type1 post', and return success to the browser. But I also want to update my 'Feed' collection with the same data. I don't want that to happen before the response is sent, because it would increase the user's wait time. Hence I have used events. Here's my code -
const events = require('events');
const emitter = new events.EventEmitter();

function savePost(req, res) {
    // Code to save data to Mongo collection
    emitter.emit('addToFeeds', data);
    console.log('emit done');
    return res.json(data);
}

emitter.on('addToFeeds', function (data) {
    // code to save data to Feeds collection
    console.log('emitter msg - ', data);
});
Now when I check the console.log output, it shows "emitter msg -" first and then "emit done". That's why I'm assuming the emitter.on code executes before res.json(data);
Now I'm wondering: are events blocking code? If I have to update Feeds in the background, or after the response is sent, what is the right way? In the future I also want to implement caching, so I will also have to update the cache whenever a post is added; that too I want to do after the response is sent, or in the background.
Yes, events are synchronous and blocking. They are implemented with simple function calls. If you look at the eventEmitter code, to send an event to all listeners, it literally just iterates through an array of listeners and calls each listener callback, one after the other.
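A minimal standalone sketch (not from the question's code) that demonstrates this ordering:
const EventEmitter = require('events').EventEmitter;
const emitter = new EventEmitter();

emitter.on('ping', function () {
    console.log('listener ran'); // runs synchronously inside emit()
});

console.log('before emit');
emitter.emit('ping'); // invokes every registered listener before returning
console.log('after emit');
// Output: "before emit", "listener ran", "after emit"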
Now I'm wondering: are events blocking code?
Yes. In the doc for .emit(), it says this: "Synchronously calls each of the listeners registered for the event named eventName, in the order they were registered, passing the supplied arguments to each."
And there is further info in the doc, in the section Asynchronous vs. Synchronous, where it says this:
The EventEmitter calls all listeners synchronously in the order in which they were registered. This is important to ensure the proper sequencing of events and to avoid race conditions or logic errors. When appropriate, listener functions can switch to an asynchronous mode of operation using the setImmediate() or process.nextTick() methods:
If I have to update Feeds in background or after response is sent what is the right way?
Your event listener can schedule when it wants to actually execute its code with setTimeout(), setImmediate(), or process.nextTick() if it wants the other listeners and other synchronous code to finish running before it does its work. So you register a normal listener (which will get called synchronously), and inside it you put the actual work in a setTimeout(), setImmediate(), or process.nextTick() callback. This delays running your code until after the Javascript that triggered the initial event is done running.
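For example, a hedged sketch of the asker's addToFeeds listener with the write deferred via setImmediate() (saveToFeeds is a hypothetical function standing in for the actual Feeds-collection write):
emitter.on('addToFeeds', function (data) {
    // The listener itself still runs synchronously inside emit(), but the
    // real work is deferred until the current call stack (including
    // res.json) has finished.
    setImmediate(function () {
        saveToFeeds(data); // hypothetical: write to the Feeds collection
    });
});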
There is no actual "background processing" in node.js for pure Javascript code. node.js is single threaded, so while you're running some Javascript, no other Javascript can run. Actual background processing would have to be done either with existing asynchronous operations (that use native code to run things in the background, such as network I/O or disk I/O) or by running another process to do the work (that other process can be any type of code, including another node.js process).
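A minimal sketch of that second option, assuming a hypothetical worker script at ./feed-worker.js that listens for process messages:
const fork = require('child_process').fork;
const worker = fork('./feed-worker.js'); // hypothetical worker script

// Hand the work to the other process; the current process continues immediately.
worker.send({ type: 'addToFeeds', data: data });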
Events are synchronous and will block. This is done so that you can bind events in a specific order and cascade them in that order. You can make each handler asynchronous, and if you're making HTTP requests in those handlers, those will happen asynchronously, but the handlers themselves are invoked synchronously.
See: https://nodejs.org/api/events.html#events_emitter_emit_eventname_args
And: https://nodejs.org/api/events.html#events_asynchronous_vs_synchronous
In my Meteor application, to implement a turn-based multiplayer game server, the clients receive the game state via publish/subscribe, and can call a Meteor method sendTurn to send turn data to the server (they cannot update the game state collection directly).
var endRound = function(gameRound) {
    // check if gameRound has already ended /
    // if round results have already been determined
    // --> yes: do nothing
    // --> no:
    //     determine round results
    //     update collection
    //     create next gameRound
};
Meteor.methods({
sendTurn: function(turnParams) {
// find gameRound data
// validate turnParams against gameRound
// store turn (update "gameRound" collection object)
// have all clients sent in turns for this round?
// yes --> call "endRound"
// no --> wait for other clients to send turns
}
});
To implement a time limit, I want to wait for a certain time period (to give clients time to call sendTurn), and then determine the round result - but only if the round result has not already been determined in sendTurn.
How should I implement this time limit on the server?
My naive approach to implement this would be to call Meteor.setTimeout(endRound, <roundTimeLimit>).
Questions:
What about concurrency? I assume I should update collections synchronously (without callbacks) in sendTurn and endRound (?), but would this be enough to eliminate race conditions? (Reading the 4th comment on the accepted answer to this SO question about synchronous database operations also yielding, I doubt that)
In that regard, what does "per request" mean in the Meteor docs in my context (the function endRound called by a client method call and/or in server setTimeout)?
In Meteor, your server code runs in a single thread per request, not in the asynchronous callback style typical of Node.
In a multi-server / clustered environment, (how) would this work?
Great question, and it's trickier than it looks. First off I'd like to point out that I've implemented a solution to this exact problem in the following repos:
https://github.com/ldworkin/meteor-prisoners-dilemma
https://github.com/HarvardEconCS/turkserver-meteor
To summarize, the problem basically has the following properties:
Each client sends in some action on each round (you call this sendTurn)
When all clients have sent in their actions, run endRound
Each round has a timer that, if it expires, automatically runs endRound anyway
endRound must execute exactly once per round regardless of what clients do
Now, consider the properties of Meteor that we have to deal with:
Each client can have exactly one outstanding method to the server at a time (unless this.unblock() is called inside a method). Subsequent method calls wait for the first.
All timeout and database operations on the server can yield to other fibers
This means that whenever a method call goes through a yielding operation, values in Node or the database can change. This can lead to the following potential race conditions (these are just the ones I've fixed, but there may be others):
In a 2-player game, for example, two clients call sendTurn at exactly the same time. Both call a yielding operation to store the turn data. Both methods then check whether 2 players have sent in their turns, find the affirmative, and endRound gets run twice.
A player calls sendTurn right as the round times out. In that case, endRound is called by both the timeout and the player's method, resulting in it running twice again.
Incorrect fixes to the above problems can result in starvation where endRound never gets called.
You can approach this problem in several ways, either synchronizing in Node or in the database.
Since only one Fiber can actually change values in Node at a time, if you don't call a yielding operation you are guaranteed to avoid possible race conditions. So you can cache things like the turn states in memory instead of in the database. However, this requires that the caching is done correctly, and the approach doesn't carry over to clustered environments.
Move the endRound code outside of the method call itself, using something else to trigger it. This is the approach I've taken which ensures that only the timer or the final player triggers the end of the round, not both (see here for an implementation using observeChanges).
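A rough sketch of that idea, assuming a GameRounds collection with hypothetical finished and turnCount fields and an expectedPlayerCount value (none of these names are from the question):
// Server-side: trigger endRound from an observer instead of inside sendTurn
GameRounds.find({ finished: false }).observeChanges({
    changed: function (id, fields) {
        // Fires whenever a watched document changes; endRound must still
        // guard internally against running twice (e.g. with the conditional
        // update shown below).
        if (fields.turnCount === expectedPlayerCount) {
            endRound(id);
        }
    }
});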
In a clustered environment you will have to synchronize using only the database, probably with conditional update operations and atomic operators. Something like the following:
var currentVal;
while (true) {
    currentVal = Foo.findOne(id).val; // yields
    if (Foo.update({_id: id, val: currentVal}, {$inc: {val: 1}}) > 0) {
        // Operation went as expected
        // (your code here, e.g. endRound)
        break;
    }
    else {
        // Race condition detected, try again
    }
}
The above approach is primitive and probably results in bad database performance under high loads; it also doesn't handle timers, but I'm sure with some thinking you can figure out how to extend it to work better.
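One hedged way to extend it, assuming a hypothetical finished flag on the round document: both the timeout and the final sendTurn call the same guarded function, and the conditional update guarantees that only one caller wins:
function tryEndRound(roundId) {
    // Atomically flip 'finished' from false to true; exactly one caller
    // will see a successful update.
    if (GameRounds.update({_id: roundId, finished: false},
                          {$set: {finished: true}}) > 0) {
        endRound(roundId); // safe: we won the race
    }
}

// Called from both places:
Meteor.setTimeout(function () { tryEndRound(roundId); }, roundTimeLimit);
// ...and from sendTurn when the last turn arrives: tryEndRound(roundId);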
You may also want to see this timers code for some other ideas. I'm going to extend it to the full setting that you described once I have some time.
I have been writing a lot of NodeJS recently and that has forced me to attack some problems from a different perspective. I was wondering what patterns had developed for the problem of processing chunks of data sequentially (rather than in parallel) in an asynchronous request-environment, but I haven't been able to find anything directly relevant.
So to summarize the problem:
I have a list of data stored in an array format that I need to process.
I have to send this data to a service asynchronously, but the service will only accept a few at a time.
The data must be processed sequentially to meet the restrictions on the service, meaning making a number of parallel asynchronous requests is not allowed
Working in this domain, the simplest pattern I've come up with is a recursive one. Something like
function processData(data, start, step, callback){
    if(start < data.length){
        var chunk = data.slice(start, start + step);
        queryService(chunk, start, step, function(e, d){
            //Assume no errors
            //Could possibly do some matching between d and 'data' here to
            //update data with anything that the service may have returned
            processData(data, start + step, step, callback);
        });
    }
    else{
        callback(data);
    }
}
Conceptually, this steps through the data correctly, but it feels unintuitive and overly complex. I feel like there should be a simpler way of doing this. Does anyone have a pattern they tend to follow when approaching this kind of problem?
My first thought would be to rely on object encapsulation: create an object that contains all of the information about what needs to be processed, what has been processed, and what is being processed. The callback function just calls the object's 'next' function, which in turn starts processing the next piece of data and updates the object. Essentially it works like an asynchronous for-loop.
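A minimal sketch of that idea (the names are illustrative, and queryService is assumed to have the same signature as in the question):
function ChunkProcessor(data, step, done) {
    this.data = data; // the array to process
    this.step = step; // how many items the service accepts at a time
    this.done = done; // called once everything has been processed
    this.pos = 0;     // progress marker
}

ChunkProcessor.prototype.next = function () {
    var self = this;
    if (self.pos >= self.data.length) {
        return self.done(self.data); // all chunks processed
    }
    var chunk = self.data.slice(self.pos, self.pos + self.step);
    queryService(chunk, self.pos, self.step, function (e, d) {
        // merge d back into self.data here if the service returns updates
        self.pos += self.step;
        self.next(); // only now move on to the next chunk
    });
};

new ChunkProcessor(data, 10, function (result) { /* all done */ }).next();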
I'm working with Azure Event Hubs, and initially, to try to calculate batch size when sending data, I had code similar to the below that would call EventData.GetBytes:
EventHubClient client;//initialized before the relevant code
EventData curr = new EventData(data);
//Setting a partition key, and other operations.
long itemLength = curr.GetBytes().LongLength;
client.SendAsync(curr);
Unfortunately I would receive an exception in the SDK code.
The message body cannot be read multiple times. To reuse it store the value after reading.
While removing the ultimately unnecessary call to GetBytes meant that I could send messages, the rationale for this exception is rather puzzling. Calling GetBytes() twice in a row is an easy way to reproduce the same exception, but even a single call means that the EventData cannot be sent successfully.
It seems likely that underneath a Message is used and this is set to throw an exception if called more than once as Message.GetBody documents; however, there is no documentation to this effect in EventData's methods GetBodyStream, GetBody w/serializer, GetBody, or GetBytes.
I imagine this should either be documented or corrected, since currently it surfaces as an unpleasant surprise on a separate thread.
Have you tried using EventData.SerializedSizeInBytes to get the size? That is a much more accurate way to get the size for batching calculations.
This example confuses my understanding of how node.js works:
// 1.
numbers.forEach(function(number) {
queue.push(Q.call(slowFunction, this, number));
});
// 2.
// Q.all: execute an array of 'promises' and 'then' call either a resolve
// callback (fulfilled promises) or reject callback (rejected promises)
Q.all(queue).then(function(ful) {
// All the results from Q.all are on the argument as an array
console.log('fulfilled', ful);
}, function(rej) {
// The first rejected (error thrown) will be here only
console.log('rejected', rej);
}).fail(function(err) {
// If something went wrong, then we catch it here, usually when there is no
// rejected callback.
console.log('fail', err);
}).fin(function() {
// Finally statement; executed no matter of the above results
console.log('finally');
});
Why is it assumed here that parts 1 and 2 of the code will be executed sequentially?
So where is the guarantee that Q.all(queue) works on all the queue elements pushed in 1? Could it be that the numbers array in 1 is so big that 1 ends up running in parallel with 2?
These ideas come from my understanding that node.js handles 1 and 2 first of all with the event loop and then hands them to workers, which are effectively analogous to normal threads.
So the question: will 1 and 2 be executed in parallel with each other, each started sequentially from the node.js event loop, or will they be executed strictly sequentially (1 pushes all elements into the queue, and only after that does 2 start handling each element in the queue)?
Please provide arguments with some direct links to documentation on this topic.
At the topmost level, 1 and 2's Q.all(queue).then(...).fail(...).fin(...) method chain will unambiguously be executed sequentially.
The precise timing of execution of functions defined/called within 1 and 2 depends very much on the nature of slowFunction.
If slowFunction is performed wholly by synchronous javascript (eg. some extensive Math), then 1 will have completed in its entirety before 2 starts. In this case the callbacks specified in 2 will execute very shortly after 2's method chain has finished executing, because any promises returned by slowFunction will be (or at least should be) already resolved.
If, however, slowFunction involves one or more asynchronous node I/O processes (eg. file handling or resource fetching), then each call to it will (at least in part) be undertaken by a non-blocking worker thread (not javascript); in this case, provided slowFunction is correctly written, queue will accumulate a set of promises, each of which will be resolved or rejected later. The callbacks specified in 2 will execute after 2's method chain has finished executing AND all the promises have been either resolved or rejected.
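For instance, a hedged sketch of an asynchronous slowFunction built on Node file I/O (the filename pattern is purely illustrative):
var Q = require('q');
var fs = require('fs');

function slowFunction(number) {
    // Q.nfcall wraps a Node-style callback API in a promise; the read is
    // handed off to the I/O layer and the promise resolves (or rejects) later.
    return Q.nfcall(fs.readFile, 'data-' + number + '.txt', 'utf8');
}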
For me, a rephrased version of 2's introductory text would go a long way toward explaining the order of execution:
Q.all: Wait for every 'promise' in the queue array to be either resolved or rejected, 'then' call the corresponding callback function.
Ref: http://rickgaribay.net/archive/2012/01/28/node-is-not-single-threaded.aspx