What to do with "release" from unwrapResumable?

I wrote a simple Wai-to-uwsgi proxy, but in doing so, I had to use unwrapResumable. That gives an unwrapped Pipe and a "release" function that needs to be called eventually. The release function's type is ResourceT IO (), and I think I want to register it with my current resource, but to do that I'd need the release to just be IO (). What should I be doing with the release function?

The release action should already be registered with your ResourceT. In proper conduit code, there are two different ways of taking care of resource cleanup:
Within the Pipe itself. This cleanup will be called as early as possible, but is not exception safe.
From ResourceT. This is exception safe, but may be delayed.
The cleanup action provided by unwrapResumable allows you to reclaim the "as early as possible" aspect. But if you would only be calling the cleanup outside of the ResourceT block anyway, there's no need to worry about it.
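For illustration, here is a minimal, hedged sketch of what that can look like, assuming conduit's ResumableSource API of the time; sendToUwsgi is a hypothetical sink standing in for the proxy's real consumer:

import Data.ByteString (ByteString)
import Data.Conduit
import Control.Monad.Trans.Resource (ResourceT)

proxyBody :: Sink ByteString (ResourceT IO) ()      -- hypothetical consumer
          -> ResumableSource (ResourceT IO) ByteString
          -> ResourceT IO ()
proxyBody sendToUwsgi rsrc = do
  (src, release) <- unwrapResumable rsrc
  src $$ sendToUwsgi
  release   -- reclaim resources as early as possible; ResourceT would
            -- otherwise run the same cleanup later anyway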

Related

Accessing state in an IO-Monad

I'm trying to access a state of a State Monad inside an IO action.
More specifically: I'm trying to write a state-dependent signal handler using installHandler from System.Posix.Signals, which requires IO; however, I'd like to do different actions and change the state from inside the handler. I took a look at unliftio, but I read that State monads shouldn't be unlifted.
Is this possible? I'm more looking for an explanation than for a "copy-paste" solution. If unlifting State inside IO doesn't work, what would a solution look like, for when one wants to do some state-aware processing inside IO?
A value of type State a b does not contain a state. It is just an encapsulated function that can provide a resulting state and a result if you pass it a starting state (using the runState function). There is no way to access an intermediate (or "current") state from outside this function. This is what makes the State monad "pure".
You seem to intend to have a handler that does not always behave the same (even if invoked with the same parameters), but depends on an outside state. This kind of behaviour is "impure", and cannot be achieved using only pure means. So what you need in this case is something that encapsulates the impurity in a way that lets you access the "current value" of some state from the handler, without the current value itself being passed into the handler.
As you already know from the comments, the go-to tool for providing mutable state to an IO action is an IORef. IORefs work because the Haskell runtime (traditionally, before multithreading at least) serializes IO actions, so the concept of a "current value" always makes sense. After every IO action, the value pointed to by each IORef is fixed. The order in which IO actions happen is also fixed: it is the order in which you chain them inside do blocks or with the >>= operator. Signal handling is performed by the Haskell runtime in a deterministic way; roughly, every time you chain two IO actions, the runtime checks for pending signals and invokes the corresponding handler.
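A minimal, hedged sketch of what that looks like in practice (the choice of sigUSR1 and the counter are made up for illustration); the handler and the main loop communicate only through the IORef:

import Control.Concurrent (threadDelay)
import Control.Monad (forever)
import Data.IORef
import System.Posix.Signals

main :: IO ()
main = do
  hits <- newIORef (0 :: Int)                  -- shared mutable state
  _ <- installHandler sigUSR1
         (Catch (modifyIORef' hits (+ 1)))     -- handler updates the state
         Nothing
  forever $ do
    n <- readIORef hits                        -- main loop reads the state
    putStrLn ("signals received so far: " ++ show n)
    threadDelay 1000000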
In case you want to write code that manipulates data in an imperative way (where you can have a lot of variables, and even arrays whose single elements you update), you could write your code as an I/O action and use IORef and IOArray for it. But there is also a special "lite" version of IO that supports mutable state in the same way as I/O without being able to interact with the environment. The shared state needs to be created, read and written from inside the same "capsule" of this IO lite, so that running the whole action does not interact with outside state, only with its internal state; the capsule as a whole is thus pure, even if single statements inside the capsule can be considered impure. This lite version of IO is called ST, which is short for "state thread".
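As a small sketch of that idea, the STRef below is created, updated and read entirely inside runST, so the function as a whole remains pure:

import Control.Monad.ST
import Data.STRef

sumST :: [Int] -> Int
sumST xs = runST $ do
  acc <- newSTRef 0                        -- a mutable "variable" local to this state thread
  mapM_ (\x -> modifySTRef' acc (+ x)) xs  -- imperative-style updates
  readSTRef acc                            -- the final value escapes as a pure result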

Difference between cancel and uninterruptibleCancel (from the Async library)

Context:
I'm trying to understand the difference between cancel and uninterruptibleCancel from the Control.Concurrent.Async package. I believe it has something to do with the underlying concepts of mask, uninterruptibleMask, and interruptible operations. Here's what I have understood so far:
Asynchronous exceptions are thrown by thread-A, but need to be handled by thread-B. This is precisely what throwTo does. In some way, this can also be considered as a form of inter-thread communication.
Asynchronous exceptions are used by one thread to kill/cancel another thread.
Handling asynchronous exceptions creates a problem in the target/receiving thread, because one usually doesn't expect exceptions to be raised at any random point in the code. One puts try / catch around certain operations and expects/handles only certain exceptions. But an asynchronous exception can be delivered while the target thread is at any point in its execution.
mask allows us to protect critical sections in the target/receiving thread from the delivery of asynchronous exceptions. The operation protected by mask doesn't need to deal with asynchronous exceptions until the point it calls restore.
At this point uninterruptibleMask comes into the picture, and I start losing the plot. I thought the whole point of mask was to NOT deliver asynchronous-exceptions while executing a protected piece of code. However, here is what the docs say about "interruptible actions":
It is useful to think of mask not as a way to completely prevent asynchronous exceptions, but as a way to switch from asynchronous mode to polling mode. The main difficulty with asynchronous exceptions is that they normally can occur anywhere, but within a mask an asynchronous exception is only raised by operations that are interruptible (or call other interruptible operations). In many cases these operations may themselves raise exceptions, such as I/O errors, so the caller will usually be prepared to handle exceptions arising from the operation anyway. To perform an explicit poll for asynchronous exceptions inside mask, use allowInterrupt.
Questions:
Even within a code block protected by mask, if there are some points where it is safe to handle asynchronous exceptions, one can call allowInterrupt. This implicitly means that, unless allowInterrupt is called, asynchronous exceptions will NOT be delivered while executing masked code. What, then, is the purpose of uninterruptibleMask?
Consequently, what is the need for uninterruptibleCancel? IIUC, thread A is trying to cancel thread B, but thread A itself is trying to protect itself from some sort of asynchronous exception, possibly initiated by a third thread C, right? In the code for cancel (given below), which part is so critical that it needs the ultimate form of protection from asynchronous exceptions? Isn't throwTo an atomic/masked operation itself? Further, even if an asynchronous exception is delivered to thread A while executing waitCatch, what difference does it make? Actually, if I think about it, why do we need to mask this code in the first place (let alone use uninterruptibleMask)?
cancel a@(Async t _) = throwTo t AsyncCancelled <* waitCatch a
Under no masking, asynchronous exceptions can happen wherever. Under mask, asynchronous exceptions can only appear from interruptible actions (which are generally blocking). Under uninterruptibleMask, asynchronous exceptions are completely out of the picture. Also, please note that allowInterrupt is just one of the interruptible actions; there are a ton more, e.g. takeMVar. With just mask, it is e.g. impossible to block on an MVar without opening yourself up to exceptions, but uninterruptibleMask lets you do it (though you shouldn't).
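A small, hedged demonstration of that point: the thread below blocks in takeMVar under mask, yet it can still be cancelled, because takeMVar is interruptible. Under uninterruptibleMask_ the same takeMVar could not be interrupted, and cancel itself would then block forever waiting for the thread to finish.

import Control.Concurrent
import Control.Concurrent.Async
import Control.Exception

demo :: IO ()
demo = do
  mv <- newEmptyMVar :: IO (MVar ())    -- never filled, so takeMVar blocks
  a  <- async $ mask_ $ takeMVar mv     -- interruptible even though masked
  threadDelay 100000
  cancel a                              -- AsyncCancelled is delivered; cancel returns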
uninterruptibleCancel is useful because cancel waits for the target thread to finish. This is a blocking operation, so, as is convention, it is also interruptible. Thus, when you use cancel, you open yourself up to receiving unexpected exceptions, whether you are masked or not. When you use uninterruptibleCancel, you are 100% guaranteed to not get an exception. That's it. Remember that exceptions are non-local; even if nothing in cancel is critical, leaving it unprotected means an exception can leak into something that is.
mask $ do
cancel something -- whoops, this can receive an exception, even though it's masked
someCleanup -- therefore this might not get called
vs.
mask $ do
uninterruptibleCancel something -- no exceptions
someCleanup -- so this will definitely happen (assuming the target thread ends)

Haskell STM alwaysSucceeds

There is a function in haskell's stm library with the following type signature:
alwaysSucceeds :: STM a -> STM ()
From what I understand of STM in haskell, there are three ways that something can "go wrong" (using that term loosely) while an STM computation is executing:
The value of a TVar that has been read is changed by another thread.
A user-specified invariant is violated. This seems to usually be triggered by calling retry to make it start over. This effectively makes the thread block and then retry once a TVar in the read set changes.
An exception is thrown. Calling throwSTM causes this. This one differs from the first two because the transaction doesn't get restarted. Instead, the error is propagated and either crashes the program or is caught in the IO monad.
If these are accurate (and if they are not, please tell me), I can't understand what alwaysSucceeds could possibly do. The always function, which appears to be built on top of it, seems like it could be written without alwaysSucceeds as:
--This is probably wrong
always :: STM Bool -> STM ()
always stmBool = stmBool >>= check
The documentation for alwaysSucceeds says:
alwaysSucceeds adds a new invariant that must be true when passed to alwaysSucceeds, at the end of the current transaction, and at the end of every subsequent transaction. If it fails at any of those points then the transaction violating it is aborted and the exception raised by the invariant is propagated.
But since the argument is of type STM a (polymorphic in a), it can't use the value that the transaction returns for any part of the decision making. So, it seems like it would be looking for the different types of failures that I listed earlier. But what's the point of that? The STM monad already handles the failures. How would wrapping it in this function affect it? And why does the variable of type a get dropped, resulting in STM ()?
The special effect of alwaysSucceeds is not how it checks for failure at the spot it's run (running the "invariant" action alone should do the same thing), but how it reruns the invariant check at the end of transactions.
Basically, this function creates a user-specified invariant as in (2) above, that has to hold not just right now, but also at the end of later transactions.
Note that a "transaction" doesn't refer to every single subaction in the STM monad, but to a combined action that is passed to atomically.
I guess the a is dropped just for convenience so you don't have to convert an action to STM () (e.g. with void) before passing it to alwaysSucceeds. The return value will be useless anyhow for the later repeated checks.
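For what it's worth, here is a hedged sketch of how such an invariant might be registered (with stm versions that still export alwaysSucceeds; the account example and the non-negativity rule are made up):

import Control.Concurrent.STM
import Control.Monad (when)

newAccount :: Int -> STM (TVar Int)
newAccount initial = do
  tv <- newTVar initial
  -- checked now, and again at the end of every later transaction
  alwaysSucceeds $ do
    balance <- readTVar tv
    when (balance < 0) $ throwSTM (userError "negative balance")
  return tv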

Correct design for Haskell exception handling

I'm currently trying to wrap my mind around the correct way to use exceptions in Haskell. How exceptions work is straight-forward enough; I'm trying to get a clear picture of the correct way to interpret them.
The basic position is that, in a well-designed application, exceptions shouldn't escape to the top-level. Any exception that does is clearly one which the designer did not anticipate - i.e., a program bug (e.g., divide by zero), rather than an unusual run-time occurrence (e.g., file not found).
To that end, I wrote a simple top-level exception handler that catches all exceptions and prints a message to stderr saying "this is a bug" (before rethrowing the exception to terminate the program).
However, suppose the user presses Ctrl+C. This causes an exception to be thrown. Clearly this is not any kind of program bug. However, failing to anticipate and react to a user abort such as this could be considered a bug. So perhaps the program should catch this and handle it appropriately, doing any necessary cleanup before exiting.
The thing is, though... The code that handles this is going to catch the exception, release any resources or whatever, and then rethrow the exception! So if the exception makes it to the top-level, that doesn't necessarily mean it was unhandled. It just means we wanted to exit quickly.
So, my question: Should exceptions be used for flow-control in this manner? Should every function that explicitly catches UserInterrupt use explicit flow-control constructs to exit manually rather than rethrow the exception? (But then how does the caller know to also exit?) Is it OK for UserInterrupt to reach the top-level? But in that case, is it OK for ThreadKilled too, by the same argument?
In short, should the interrupt handler make a special case for UserInterrupt (and possibly ThreadKilled)? What about a HeapOverflow or StackOverflow? Is that a bug? Or is that "circumstance beyond the program's control"?
Cleaning up in the presence of exceptions
However, failing to anticipate and react to a user abort such as this could be considered a bug. So perhaps the program should catch this and handle it appropriately, doing any necessary cleanup before exiting.
In some sense you are right — the programmer should anticipate exceptions. But not by catching them. Instead, you should use exception-safe functions, such as bracket.
For example:
import Control.Exception
data Resource
acquireResource :: IO Resource
releaseResource :: Resource -> IO ()
workWithResource = bracket acquireResource releaseResource $ \resource -> ...
This way the resources will be cleaned up regardless of whether the program will be aborted by Ctrl+C.
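A concrete (hedged) instance of the same pattern, using a file handle as the resource; the file name is made up:

import Control.Exception (bracket)
import System.IO

withLogFile :: (Handle -> IO a) -> IO a
withLogFile = bracket (openFile "app.log" AppendMode) hClose
-- hClose runs even if the inner action throws or the user presses Ctrl+C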
Should exceptions reach top level?
Now, I'd like to address another statement of yours:
The basic position is that, in a well-designed application, exceptions shouldn't escape to the top-level.
I would argue that, in a well-designed application, exceptions are a perfectly fine way to abort. If there are any problems with this, then you're doing something wrong (e.g. want to execute a cleanup action at the end of main — but that should be done in bracket!).
Here's what I often do in my programs:
Define a data type that represents any possible error — anything that might go wrong. Some of them often wrap other exceptions.
data ProgramError
= InputFileNotFound FilePath IOException
| ParseError FilePath String
| ...
Define how to print errors in a user-friendly way:
instance Show ProgramError where
show (InputFileNotFound path e) = printf "File '%s' could not be read: %s" path (show e)
...
Declare the type as an exception:
instance Exception ProgramError
Throw these exceptions in the program whenever I feel like it.
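As a hedged sketch of that last point (building on the ProgramError type above; readConfig and its argument are made up), an anticipated IOException is caught at the point of failure and rethrown wrapped in the richer ProgramError:

import Control.Exception

readConfig :: FilePath -> IO String
readConfig path =
  readFile path `catch` \e -> throwIO (InputFileNotFound path e)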
Should I catch exceptions?
Exceptions that you anticipate must be caught and wrapped (e.g. in InputFileNotFound) to give them more context. What about the exceptions that you don't anticipate?
I can see some value in printing "it's a bug" to the users, so that they report the problem back to you. If you do this, you should anticipate UserInterrupt — it's not a bug, as you say. How you should treat ThreadKilled depends on your application — literally, whether you anticipate it!
This, however, is orthogonal to the "good design" and depends more on what kind of users you're targeting, what you expect of them and what they expect of your program.
The response may range from just printing the exception to a dialog that says "we're very sorry, would you like to submit a report to the developers?".
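Such a top-level handler might look like the following hedged sketch: UserInterrupt is treated as a normal abort, and anything else is reported as a probable bug before being rethrown (the exit code and wording are made up):

import Control.Exception
import System.Exit
import System.IO

reportBugs :: IO a -> IO a
reportBugs act = act `catch` \e ->
  case fromException e of
    Just UserInterrupt ->
      exitWith (ExitFailure 130)       -- user abort: not a bug
    _ -> do
      hPutStrLn stderr ("This is probably a bug: " ++ displayException (e :: SomeException))
      throwIO e                        -- rethrow to terminate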
Should exceptions be used for flow-control in this manner?
Yes. I highly recommend you read Breaking from a loop, which shows how Either and EitherT are, at their core, nothing more than abstractions for exiting from a code block early. Exceptions are just a special case of this behavior where you exit because of an error, but there is no reason why that should be the only case in which you exit prematurely.
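To make the connection concrete, here is a tiny sketch of early exit with plain Either, no exceptions involved (the function name and predicate are invented); Left short-circuits the traversal exactly the way throwing would:

-- Either's Monad instance stops at the first Left
firstNegative :: [Int] -> Either Int ()
firstNegative = mapM_ (\x -> if x < 0 then Left x else Right ())

-- firstNegative [1, 2, -3, 4]  ==  Left (-3)
-- firstNegative [1, 2, 3]      ==  Right ()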

Clojure mutable storage types

I'm attempting to learn Clojure from the API and documentation available on the site. I'm a bit unclear about mutable storage in Clojure and I want to make sure my understanding is correct. Please let me know if there are any ideas that I've gotten wrong.
Edit: I'm updating this as I receive comments on its correctness.
Disclaimer: All of this information is informal and potentially wrong. Do not use this post for gaining an understanding of how Clojure works.
Vars always contain a root binding and possibly a per-thread binding. They are comparable to regular variables in imperative languages and are not suited for sharing information between threads. (thanks Arthur Ulfeldt)
Refs are locations shared between threads that support atomic transactions that can change the state of any number of refs in a single transaction. Transactions are committed upon exiting sync expressions (dosync) and conflicts are resolved automatically with STM magic (rollbacks, queues, waits, etc.)
Agents are locations that enable information to be asynchronously shared between threads with minimal overhead, by dispatching independent action functions that change the agent's state. Dispatches return immediately and are therefore non-blocking, although an agent's value isn't set until a dispatched function has completed.
Atoms are locations that can be synchronously shared between threads. They support safe manipulation between different threads.
Here's my friendly summary based on when to use these structures:
Vars are like regular old variables in imperative languages. (avoid when possible)
Atoms are like Vars but with thread-sharing safety that allows for immediate reading and safe setting. (thanks Martin)
An Agent is like an Atom, but rather than blocking it spawns a new thread to calculate its value; it only blocks if it is in the middle of changing a value, and it can let other threads know that it has finished assigning.
Refs are shared locations that lock themselves in transactions. Instead of making the programmer decide what happens during race conditions for every piece of locked code, we just start up a transaction and let Clojure handle all the lock conditions between the refs in that transaction.
Also, a related concept is the function future. To me, it seems like a future object can be described as a synchronous Agent where the value can't be accessed at all until the calculation is completed. It can also be described as a non-blocking Atom. Are these accurate conceptions of future?
It sounds like you are really getting Clojure! good job :)
Vars have a "root binding" visible in all threads, and each individual thread can change the value it sees without affecting the other threads. If my understanding is correct, a var cannot exist in just one thread without a root binding that is visible to all, and it can't be "rebound" until it has been defined with (def ...) the first time.
Refs are committed at the end of the (dosync ... ) transaction that encloses the changes but only when the transaction was able to finish in a consistent state.
I think your conclusion about Atoms is wrong:
Atoms are like Vars but with thread-sharing safety that blocks until the value has changed
Atoms are changed with swap! or, at a lower level, with compare-and-set!. This never blocks anything. swap! works like a transaction with just one ref:
1. the old value is taken from the atom and stored thread-locally
2. the function is applied to the old value to generate a new value
3. if this succeeds, compare-and-set is called with the old and new values; only if the value of the atom has not been changed by any other thread (it still equals the old value) is the new value written, otherwise the operation restarts at (1) until it eventually succeeds.
I've found two issues with your question.
You say:
If an agent is accessed while an action is occurring then the value isn't returned until the action has finished
http://clojure.org/agents says:
the state of an Agent is always immediately available for reading by any thread
I.e. you never have to wait to get the value of an agent (I assume the value changed by an action is proxied and changed atomically).
The code for the deref-method of an Agent looks like this (SVN revision 1382):
public Object deref() throws Exception{
    if(errors != null)
        {
        throw new Exception("Agent has errors", (Exception) RT.first(errors));
        }
    return state;
}
No blocking is involved.
Also, I don't understand what you mean (in your Ref section) by
Transactions are committed on calls to deref
Transactions are committed when all actions of the dosync block have been completed, no exceptions have been thrown and nothing has caused the transaction to be retried. I think deref has nothing to do with it, but maybe I misunderstand your point.
Martin is right when he says that the atom operation restarts at (1) until it eventually succeeds.
It is also called spin waiting.
While it is not really blocking on a lock, the thread that performed the operation is blocked until the operation succeeds, so it is a blocking operation and not an asynchronous one.
Also, about futures: Clojure 1.1 added abstractions for promises and futures.
A promise is a synchronization construct that can be used to deliver a value from one thread to another. Until the value has been delivered, any attempt to dereference the promise will block.
(def a-promise (promise))
(deliver a-promise :fred)
Futures represent asynchronous computations. They are a way to get code to run in another thread, and obtain the result.
(def f (future (some-sexp)))
(deref f) ; blocks the thread that derefs f until value is available
Vars don't always have a root binding. It's legal to create a var without a binding using
(def x)
or
(declare x)
Attempting to evaluate x before it has a value will result in
Var user/x is unbound.
[Thrown class java.lang.IllegalStateException]
