How can I catch and handle an exception in a chain of async call between contracts?
Suppose, that my transaction initiate the following calls:
contractA.run()
-> do changes in contractA
-> calls contractB.run()
-> do changes in contractB
-> then calls another method on contractA: contractA.callback()
* callback() crashes
After an exception in a Promise, NEAR is not rolling back changes occured in past promises. I also don't see any method for handling exceptions in near-sdk.
One idea would be to return errors instead of throwing exceptions and create bunch of private functions to update the state after error value and adding / releasing mutexes. However this won't solve sometimes we can't control that, eg in external smart-contracts (eg, if contractB.do would panic in the example above).
The only way to catch an exception is to have a callback on the promise that generated the exception.
In the explained scenario, the contractA.callback() shouldn't crash. You need to construct the contract carefully enough to avoid failing on the callback. Most of the time it's possible to do, since you control the input to the callback and the amount gas attached. If the callback fails, it's similar to having an exception within an exception handling code.
Also note, that you can make sure the callback is scheduled properly with the enough gas attached in contractA.run(). If it's not the case and for example you don't have enough gas attached to run, the scheduling of callback and other promise will fail and the entire state from run changes is rolled back.
But once run completes, the state changes from run are committed and callback has to be carefully processed.
We have a few places in lockup contract where the callback is allowed to fail: https://github.com/near/core-contracts/blob/6fb13584d5c9eb1b372cfd80cd18f4a4ba8d15b6/lockup/src/owner_callbacks.rs#L7-L24
And also most of the places where the callback doesn't fail: https://github.com/near/core-contracts/blob/6fb13584d5c9eb1b372cfd80cd18f4a4ba8d15b6/lockup/src/owner_callbacks.rs#L28-L61
To point out there are some situation where the contract doesn't want to rely on the stability of other contracts, e.g. when the flow is A --> B --> A --> B. In this case B can't attach the callback to the resource given to A. For these scenarios we were discussing a possibility of adding a specific construct that is an atomic and has a resolving callback once it's dropped. We called it Safe: https://github.com/nearprotocol/NEPs/pull/26
EDIT
What if contractB.run fails and I will like to update the state in contractA to rollback changes from contractA.run?
In this case contractA.callback() is still called, but it has PromiseResult::Failed for its dependency contractB.run.
So callback() can modify the state of contractA to revert changes.
For example, a callback from the lockup contract implementation to handle withdrawal from the staking pool contract: https://github.com/near/core-contracts/blob/6fb13584d5c9eb1b372cfd80cd18f4a4ba8d15b6/lockup/src/foundation_callbacks.rs#L143-L185
If we adapt names to match the example:
The lockup contract (contractA) tries to withdraws funds (run()) from the staking pool (contractB), but the funds might still be locked due to recent unstaking, so the withdrawal fails (contractB.run() fails).
The callback is called (contractA.callback()) and it checks the success of the promise (of contractB.run). Since withdrawal failed, callback reverts the state back to the original (reverts the status).
Actually, it's slightly more complicated because the actual sequence is A.withdraw_all -> B.get_amount -> A.on_amount_for_withdraw -> B.withdraw(amount) -> A.on_withdraw
Related
I was following the guide here for setting up a presignup trigger.
However, when I used callback(null, event) my lambda function would never actually return and I would end up getting an error
{ code: 'UnexpectedLambdaException',
name: 'UnexpectedLambdaException',
message: 'arn:aws:lambda:us-east-2:642684845958:function:proj-dev-confirm-1OP5DB3KK5WTA failed with error Socket timeout while invoking Lambda function.' }
I found a similar link here that says to use context.done().
After switching it works perfectly fine.
What's the difference?
exports.confirm = (event, context, callback) => {
event.response.autoConfirmUser = true;
context.done(null, event);
//callback(null, event); does not work
}
Back in the original Lambda runtime environment for Node.js 0.10, Lambda provided helper functions in the context object: context.done(err, res) context.succeed(res) and context.fail(err).
This was formerly documented, but has been removed.
Using the Earlier Node.js Runtime v0.10.42 is an archived copy of a page that no longer exists in the Lambda documentation, that explains how these methods were used.
When the Node.js 4.3 runtime for Lambda was launched, these remained for backwards compatibility (and remain available but undocumented), and callback(err, res) was introduced.
Here's the nature of your problem, and why the two solutions you found actually seem to solve it.
Context.succeed, context.done, and context.fail however, are more than just bookkeeping – they cause the request to return after the current task completes and freeze the process immediately, even if other tasks remain in the Node.js event loop. Generally that’s not what you want if those tasks represent incomplete callbacks.
https://aws.amazon.com/blogs/compute/node-js-4-3-2-runtime-now-available-on-lambda/
So with callback, Lambda functions now behave in a more paradigmatically correct way, but this is a problem if you intend for certain objects to remain on the event loop during the freeze that occurs between invocations -- unlike the old (deprecated) done fail succeed methods, using the callback doesn't suspend things immediately. Instead, it waits for the event loop to be empty.
context.callbackWaitsForEmptyEventLoop -- default true -- was introduced so that you can set it to false for those cases where you want the Lambda function to return immediately after you call the callback, regardless of what's happening in the event loop. The default is true because false can mask bugs in your function and can cause very erratic/unexpected behavior if you fail to consider the implications of container reuse -- so you shouldn't set this to false unless and until you understand why it is needed.
A common reason false is needed would be a database connection made by your function. If you create a database connection object in a global variable, it will have an open socket, and potentially other things like timers, sitting on the event loop. This prevents the callback from causing Lambda to return a response, until these operations are also finished or the invocation timeout timer fires.
Identify why you need to set this to false, and if it's a valid reason, then it is correct to use it.
Otherwise, your code may have a bug that you need to understand and fix, such as leaving requests in flight or other work unfinished, when calling the callback.
So, how do we parse the Cognito error? At first, it seemed pretty unusual, but now it's clear that it is not.
When executing a function, Lambda will throw an error that the tasked timed out after the configured number of seconds. You should find this to be what happens when you test your function in the Lambda console.
Unfortunately, Cognito appears to have taken an internal design shortcut when invoking a Lambda function, and instead of waiting for Lambda to timeout the invocarion (which could tie up resources inside Cognito) or imposing its own explicit timer on the maximum duration Cognito will wait for a Lambda response, it's relying on a lower layer socket timer to constrain this wait... thus an "unexpected" error is thrown while invoking the timeout.
Further complicating interpreting the error message, there are missing quotes in the error, where the lower layer exception is interpolated.
To me, the problem would be much more clear if the error read like this:
'arn:aws:lambda:...' failed with error 'Socket timeout' while invoking Lambda function
This format would more clearly indicate that while Cognito was invoking the function, it threw an internal Socket timeout error (as opposed to Lambda encountering an unexpected internal error, which was my original -- and incorrect -- assumption).
It's quite reasonable for Cognito to impose some kind of response time limit on the Lambda function, but I don't see this documented. I suspect a short timeout on your Lambda function itself (making it fail more promptly) would cause Cognito to throw a somewhat more useful error, but in my mind, Cognito should have been designed to include logic to make this an expected, defined error, rather than categorizing it as "unexpected."
As an update the Runtime Node.js 10.x handler supports an async function that makes use of return and throw statements to return success or error responses, respectively. Additionally, if your function performs asynchronous tasks then you can return a Promise where you would then use resolve or reject to return a success or error, respectively. Either approach simplifies things by not requiring context or callback to signal completion to the invoker, so your lambda function could look something like this:
exports.handler = async (event) => {
// perform tasking...
const data = doStuffWith(event)
// later encounter an error situation
throw new Error('tell invoker you encountered an error')
// finished tasking with no errors
return { data }
}
Of course you can still use context but its not required to signal completion.
Here are some simple questions based on behaviour I noticed in the following example running in node:
Q('THING 1').then(console.log.bind(console));
console.log('THING 2');
The output for this is:
> "THING 2"
> "THING 1"
Questions:
1) Why is Q implemented to wait before running the callback on a value that is immediately known? Why isn't Q smart enough to allow the first line to synchronously issue its output before the 2nd line runs?
2) What is the time lapse between "THING 2" and "THING 1" being output? Is it a single process tick?
3) Could there be performance concerns with values that are deeply wrapped in promises? For example, does Q(Q(Q("THING 1"))) asynchronously wait 3 times as long to complete, even though it can be efficiently synchronously resolved?
This is actually done on purpose. It is to make it consistent whether or not the value is known or not. That way there is only one order of evaluation and you can depend on the fact that no matter if the promise has already settled or not, that order will be the same.
Also, doing it otherwise would make it possible to write a code to test if the promise has settled or not and by design it should not be known and acted upon.
This is pretty much the as doing callback-style code like this:
function fun(args, callback) {
if (!args) {
process.nextTick(callback, 'error');
}
// ...
}
so that anyone who calls it with:
fun(x, function (err) {
// A
});
// B
can be sure that A will never run before B.
The spec
See the Promises/A+ Specification, The then Method section, point 4:
onFulfilled or onRejected must not be called until the execution context stack contains only platform code.
See also the the note 1:
Here "platform code" means engine, environment, and promise implementation code. In practice, this requirement ensures that onFulfilled and onRejected execute asynchronously, after the event loop turn in which then is called, and with a fresh stack. This can be implemented with either a "macro-task" mechanism such as setTimeout or setImmediate, or with a "micro-task" mechanism such as MutationObserver or process.nextTick. Since the promise implementation is considered platform code, it may itself contain a task-scheduling queue or "trampoline" in which the handlers are called.
So this is actually mandated by the spec.
It was discussed extensively to make sure that this requirement is clear - see:
https://github.com/promises-aplus/promises-spec/pull/70
https://github.com/promises-aplus/promises-spec/pull/104
https://github.com/promises-aplus/promises-spec/issues/100
https://github.com/promises-aplus/promises-spec/issues/139
https://github.com/promises-aplus/promises-spec/issues/229
I am trying to implement a couple of state handler funcitons in my javascript code, in order to perform 2 different distinct actions in each state. This is similar to a state design pattern of Java (https://sourcemaking.com/design_patterns/state).
Conceptually, my program need to remain connected to an elasticsearch instance (or any other server for that matter), and then parse and POST some incoming data to el. If there is no connection available to elasticsearch, my program would keep tring to connect to el endlessly with some retry period.
In a nutshell,
When not connected, keep trying to connect
When connected, start POSTing the data
The main run loop is calling itself recurssively,
function run(ctx) {
logger.info("run: running...");
// initially starts with disconnected state...
return ctx.curState.run(ctx)
.then(function(result) {
if (result) ctx.curState = connectedSt;
// else it remains in old state.
return run(ctx);
});
}
This is not a truly recursive fn in the sense that each invocation is calling itself in a tight loop. But I suspect it ends up with many promises in the chain, and in the long run it will consume more n more memory and hence eventually hang.
Is my assumption / understanding right? Or is it OK to write this kinda code?
If not, should I consider calling setImmediate / process.nextTick etc?
Or should I consider using TCO (Tail Cost Optimization), ofcourse I am yet to fully understand this concept.
Yes, by returning a new promise (the result of the recursive call to run()), you effectively chain in another promise.
Neither setImmediate() nor process.nextTick() are going to solve this directly.
When you call run() again, simply don't return it and you should be fine.
In my Meteor application to implement a turnbased multiplayer game server, the clients receive the game state via publish/subscribe, and can call a Meteor method sendTurn to send turn data to the server (they cannot update the game state collection directly).
var endRound = function(gameRound) {
// check if gameRound has already ended /
// if round results have already been determined
// --> yes:
do nothing
// --> no:
// determine round results
// update collection
// create next gameRound
};
Meteor.methods({
sendTurn: function(turnParams) {
// find gameRound data
// validate turnParams against gameRound
// store turn (update "gameRound" collection object)
// have all clients sent in turns for this round?
// yes --> call "endRound"
// no --> wait for other clients to send turns
}
});
To implement a time limit, I want to wait for a certain time period (to give clients time to call sendTurn), and then determine the round result - but only if the round result has not already been determined in sendTurn.
How should I implement this time limit on the server?
My naive approach to implement this would be to call Meteor.setTimeout(endRound, <roundTimeLimit>).
Questions:
What about concurrency? I assume I should update collections synchronously (without callbacks) in sendTurn and endRound (?), but would this be enough to eliminate race conditions? (Reading the 4th comment on the accepted answer to this SO question about synchronous database operations also yielding, I doubt that)
In that regard, what does "per request" mean in the Meteor docs in my context (the function endRound called by a client method call and/or in server setTimeout)?
In Meteor, your server code runs in a single thread per request, not in the asynchronous callback style typical of Node.
In a multi-server / clustered environment, (how) would this work?
Great question, and it's trickier than it looks. First off I'd like to point out that I've implemented a solution to this exact problem in the following repos:
https://github.com/ldworkin/meteor-prisoners-dilemma
https://github.com/HarvardEconCS/turkserver-meteor
To summarize, the problem basically has the following properties:
Each client sends in some action on each round (you call this sendTurn)
When all clients have sent in their actions, run endRound
Each round has a timer that, if it expires, automatically runs endRound anyway
endRound must execute exactly once per round regardless of what clients do
Now, consider the properties of Meteor that we have to deal with:
Each client can have exactly one outstanding method to the server at a time (unless this.unblock() is called inside a method). Following methods wait for the first.
All timeout and database operations on the server can yield to other fibers
This means that whenever a method call goes through a yielding operation, values in Node or the database can change. This can lead to the following potential race conditions (these are just the ones I've fixed, but there may be others):
In a 2-player game, for example, two clients call sendTurn at exactly same time. Both call a yielding operation to store the turn data. Both methods then check whether 2 players have sent in their turns, finding the affirmative, and then endRound gets run twice.
A player calls sendTurn right as the round times out. In that case, endRound is called by both the timeout and the player's method, resulting running twice again.
Incorrect fixes to the above problems can result in starvation where endRound never gets called.
You can approach this problem in several ways, either synchronizing in Node or in the database.
Since only one Fiber can actually change values in Node at a time, if you don't call a yielding operation you are guaranteed to avoid possible race conditions. So you can cache things like the turn states in memory instead of in the database. However, this requires that the caching is done correctly and doesn't carry over to clustered environments.
Move the endRound code outside of the method call itself, using something else to trigger it. This is the approach I've taken which ensures that only the timer or the final player triggers the end of the round, not both (see here for an implementation using observeChanges).
In a clustered environment you will have to synchronize using only the database, probably with conditional update operations and atomic operators. Something like the following:
var currentVal;
while(true) {
currentVal = Foo.findOne(id).val; // yields
if( Foo.update({_id: id, val: currentVal}, {$inc: {val: 1}}) > 0 ) {
// Operation went as expected
// (your code here, e.g. endRound)
break;
}
else {
// Race condition detected, try again
}
}
The above approach is primitive and probably results in bad database performance under high loads; it also doesn't handle timers, but I'm sure with some thinking you can figure out how to extend it to work better.
You may also want to see this timers code for some other ideas. I'm going to extend it to the full setting that you described once I have some time.
I'm attempting to learn Clojure from the API and documentation available on the site. I'm a bit unclear about mutable storage in Clojure and I want to make sure my understanding is correct. Please let me know if there are any ideas that I've gotten wrong.
Edit: I'm updating this as I receive comments on its correctness.
Disclaimer: All of this information is informal and potentially wrong. Do not use this post for gaining an understanding of how Clojure works.
Vars always contain a root binding and possibly a per-thread binding. They are comparable to regular variables in imperative languages and are not suited for sharing information between threads. (thanks Arthur Ulfeldt)
Refs are locations shared between threads that support atomic transactions that can change the state of any number of refs in a single transaction. Transactions are committed upon exiting sync expressions (dosync) and conflicts are resolved automatically with STM magic (rollbacks, queues, waits, etc.)
Agents are locations that enable information to be asynchronously shared between threads with minimal overhead by dispatching independent action functions to change the agent's state. Agents are returned immediately and are therefore non-blocking, although an agent's value isn't set until a dispatched function has completed.
Atoms are locations that can be synchronously shared between threads. They support safe manipulation between different threads.
Here's my friendly summary based on when to use these structures:
Vars are like regular old variables in imperative languages. (avoid when possible)
Atoms are like Vars but with thread-sharing safety that allows for immediate reading and safe setting. (thanks Martin)
An Agent is like an Atom but rather than blocking it spawns a new thread to calculate its value, only blocks if in the middle of changing a value, and can let other threads know that it's finished assigning.
Refs are shared locations that lock themselves in transactions. Instead of making the programmer decide what happens during race conditions for every piece of locked code, we just start up a transaction and let Clojure handle all the lock conditions between the refs in that transaction.
Also, a related concept is the function future. To me, it seems like a future object can be described as a synchronous Agent where the value can't be accessed at all until the calculation is completed. It can also be described as a non-blocking Atom. Are these accurate conceptions of future?
It sounds like you are really getting Clojure! good job :)
Vars have a "root binding" visible in all threads and each individual thread can change the value it sees with out affecting the other threads. If my understanding is correct a var cannot exist in just one thread with out a root binding that is visible to all and it cant be "rebound" until it has been defined with (def ... ) the first time.
Refs are committed at the end of the (dosync ... ) transaction that encloses the changes but only when the transaction was able to finish in a consistent state.
I think your conclusion about Atoms is wrong:
Atoms are like Vars but with thread-sharing safety that blocks until the value has changed
Atoms are changed with swap! or low-level with compare-and-set!. This never blocks anything. swap! works like a transaction with just one ref:
the old value is taken from the atom and stored thread-local
the function is applied to the old value to generate a new value
if this succeeds compare-and-set is called with old and new value; only if the value of the atom has not been changed by any other thread (still equals old value), the new value is written, otherwise the operation restarts at (1) until is succeeds eventually.
I've found two issues with your question.
You say:
If an agent is accessed while an action is occurring then the value isn't returned until the action has finished
http://clojure.org/agents says:
the state of an Agent is always immediately available for reading by any thread
I.e. you never have to wait to get the value of an agent (I assume the value changed by an action is proxied and changed atomically).
The code for the deref-method of an Agent looks like this (SVN revision 1382):
public Object deref() throws Exception{
if(errors != null)
{
throw new Exception("Agent has errors", (Exception) RT.first(errors));
}
return state;
}
No blocking is involved.
Also, I don't understand what you mean (in your Ref section) by
Transactions are committed on calls to deref
Transactions are committed when all actions of the dosync block have been completed, no exceptions have been thrown and nothing has caused the transaction to be retried. I think deref has nothing to do with it, but maybe I misunderstand your point.
Martin is right when he say that Atoms operation restarts at 1. until is succeeds eventually.
It is also called spin waiting.
While it is note really blocking on a lock the thread that did the operation is blocked until the operation succeeds so it is a blocking operation and not an asynchronously operation.
Also about Futures, Clojure 1.1 has added abstractions for promises and futures.
A promise is a synchronization construct that can be used to deliver a value from one thread to another. Until the value has been delivered, any attempt to dereference the promise will block.
(def a-promise (promise))
(deliver a-promise :fred)
Futures represent asynchronous computations. They are a way to get code to run in another thread, and obtain the result.
(def f (future (some-sexp)))
(deref f) ; blocks the thread that derefs f until value is available
Vars don't always have a root binding. It's legal to create a var without a binding using
(def x)
or
(declare x)
Attempting to evaluate x before it has a value will result in
Var user/x is unbound.
[Thrown class java.lang.IllegalStateException]