What forces drove WAI Application to be redesigned five times? - haskell

I took a curious look at the WAI interface and, while it looks simple, I was surprised by how many iterations it took to stabilize at its current form!
I had assumed that the CPS style used for resource safety would be the most interesting part, but it turns out there is much more to learn from!
$ git log -p --reverse -- wai/Network/Wai.hs | grep '\+type Application'
+type Application = Request -> Iteratee B.ByteString IO Response
+type Application = Request -> ResourceT IO Response
+type Application = Request -> C.ResourceT IO Response
+type Application = Request -> IO Response
+type Application = Request -> (forall b. (Response -> IO b) -> IO b)
+type Application = Request -> (Response -> IO ResponseReceived) -> IO ResponseReceived
Some archeology yields somewhat unsatisfactory results:
$ git log --reverse -G 'type Application' --pretty=oneline -- wai/Network/Wai.hs | cat
879d4a23047c3585e1cba4cdd7c3e8fc13e17592 Moved everything to wai subfolder
360442ac74f7e79bb0e320110056b3f44e15107c Began moving wai/warp to conduit
af7d1a79cbcada0b18883bcc5e5e19a1cd06ae7b conduit 0.3
fe2032ad4c7435709ed79683acac3b91110bba04 Pass around an InternalState instead of living in ResourceT
63ad533299a0a5bad01a36171d98511fdf8d5821 Application uses bracket pattern
1e1b8c222cce96c3d58cd27318922c318642050d ResponseReceived, to avoid existential issues

All the designs seem to be driven by three main concerns:
1. Requests can have streamed bodies (so that we don't have to load them fully into memory before starting to process them). How best to represent that?
2. Responses can be streamed as well. How best to represent that?
3. How do we ensure that resources allocated while producing a response are properly freed? (For example, how do we ensure that file handles are closed after serving a file?)
type Application = Request -> Iteratee B.ByteString IO Response
This version uses iteratees, which were an early solution for streaming data in Haskell. Iteratee consumers had to be written in a "push-based" way, which was arguably less natural than the "pull-based" consumers used in modern streaming libraries.
The streamed body of the request is fed to the iteratee and we get a Response value at the end. The Response contains an enumerator (a function that feeds streamed response bytes to a response iteratee supplied by the server). Presumably, the enumerator would control resource allocation using functions like bracket.
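To make the last point concrete, here is a sketch in the style of the old enumerator package (my own illustration from memory, not actual wai 0.x code): the enumerator owns the file handle, so bracket can guarantee cleanup even if the consuming iteratee fails.

import Control.Exception (bracket)
import qualified Data.ByteString as B
import Data.Enumerator (Iteratee, run_, ($$))
import Data.Enumerator.Binary (enumHandle)
import System.IO (IOMode (ReadMode), hClose, openFile)

-- The server supplies an iteratee that writes to the socket; this
-- "response enumerator" feeds the file to it and owns the handle.
serveFile :: FilePath -> Iteratee B.ByteString IO a -> IO a
serveFile path iter =
    bracket (openFile path ReadMode) hClose $ \h ->
        run_ (enumHandle 4096 h $$ iter)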
type Application = Request -> ResourceT IO Response
This version uses the resourcet monad transformer for resource management, instead of doing it in the enumerator. There is a special Source type inside both the Request and the Response which handles streamed data (and which is a bit hard to understand, IMHO).
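A minimal sketch of the resourcet pattern itself (my illustration, not wai code; assumes a local data.txt): allocate registers a cleanup action with the enclosing ResourceT scope, and the server would only close that scope after the response had been sent.

import Control.Monad.IO.Class (liftIO)
import Control.Monad.Trans.Resource (allocate, runResourceT)
import System.IO (IOMode (ReadMode), hClose, hGetContents, openFile)

main :: IO ()
main = runResourceT $ do
    -- freed when the block ends, normally or via an exception
    (_key, h) <- allocate (openFile "data.txt" ReadMode) hClose
    liftIO (hGetContents h >>= putStr)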
type Application = Request -> IO Response
This version still uses the streaming abstractions from conduit, but eschews resourcet and instead provides a bracket-like responseSourceBracket function for handling resources in streamed responses.
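I don't have the exact signature at hand, but from memory its shape was roughly the following (a reconstruction, not verbatim wai 2.x):

-- reconstruction from memory: acquire, release, and a function that
-- builds the streamed response from the acquired resource
responseSourceBracket
    :: IO a
    -> (a -> IO ())
    -> (a -> IO (Status, ResponseHeaders, Source IO (Flush Builder)))
    -> IO Response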
type Application = Request -> (forall b. (Response -> IO b) -> IO b)
type Application = Request -> (Response -> IO ResponseReceived) -> IO ResponseReceived
This version moves to a continuation-based approach which enables the handler function to use regular bracket-like functions to control resource allocation. Back to square one, in that respect!
Conduits are no longer used for streaming. Now there is a plain Request -> IO ByteString function for reading chunks of the request body, and a (Builder -> IO ()) -> IO () -> IO () function in the Response for generating the response stream. (The Builder -> IO () write function along with a flush action are supplied by the server.)
Like the resourcet-based versions, and unlike the iteratee-based version, this implementation lets you overlap reading the request body with streaming the response.
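As a quick illustration against the current API (the file path and handler are mine): an ordinary withFile suffices for cleanup, because respond, and with it the streaming body, runs inside the bracket.

{-# LANGUAGE OverloadedStrings #-}
import qualified Data.ByteString as BS
import Data.ByteString.Builder (byteString)
import Network.HTTP.Types (status200)
import Network.Wai (Application, responseStream)
import System.IO (IOMode (ReadMode), withFile)

app :: Application
app _req respond =
    withFile "data.txt" ReadMode $ \h ->
        respond $ responseStream status200 [("Content-Type", "text/plain")] $
            \write flush -> do
                let loop = do
                        chunk <- BS.hGetSome h 4096
                        if BS.null chunk
                            then flush
                            else write (byteString chunk) >> loop
                loop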
The polymorphic handler is a neat trick to ensure that the response-taking callback Response -> IO b is always called: the handler needs to return a b, and the only way to get one is to actually invoke the callback!
This polymorphic solution seems to have caused some problems (perhaps with storing handlers in containers?). Instead of relying on polymorphism, we can use a ResponseReceived token without a public constructor. The effect is the same: the only way for handler code to get hold of the token it needs to return is to invoke the callback.
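The token itself is trivial; hiding the constructor does all the work (it is defined in Network.Wai.Internal, and Network.Wai re-exports only the type):

-- from Network.Wai.Internal
data ResponseReceived = ResponseReceived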

Related

Creating a Handle that guarantees hClose failure

I'd like to create a Handle that guarantees failure (exception) when it's passed into hClose. I need this for testing purposes.
How do I create such a Handle?
The module GHC.IO.Handle of the base package has the function mkFileHandle:
mkFileHandle :: (IODevice dev, BufferedIO dev, Typeable dev) => dev -> FilePath -> IOMode -> Maybe TextEncoding -> NewlineMode -> IO Handle
IODevice and BufferedIO are typeclasses that provide basic handle operations for a device. In particular, IODevice has the close method.
You can create your own dummy device type, define those two instances for it (with a close that throws an exception), and then use mkFileHandle to obtain a usable Handle.
See the code of the knob package for an example of how to do this.
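A minimal sketch of that recipe (the exact set of methods you must implement varies a bit across GHC versions, so treat this as a starting point):

{-# LANGUAGE DeriveDataTypeable #-}
import Data.Typeable (Typeable)
import GHC.IO.Buffer (Buffer (..), newByteBuffer)
import GHC.IO.BufferedIO (BufferedIO (..))
import GHC.IO.Device (IODevice (..), IODeviceType (Stream))
import GHC.IO.Handle (mkFileHandle)
import System.IO (Handle, IOMode (WriteMode), noNewlineTranslation)

data FailingDevice = FailingDevice deriving Typeable

instance IODevice FailingDevice where
    ready _ _ _ = return True
    devType _ = return Stream
    close _ = ioError (userError "hClose fails by design")

instance BufferedIO FailingDevice where
    newBuffer _ st = newByteBuffer 4096 st
    fillReadBuffer _ buf = return (0, buf)            -- pretend EOF
    fillReadBuffer0 _ buf = return (Nothing, buf)
    flushWriteBuffer _ buf = return buf { bufL = 0, bufR = 0 }
    flushWriteBuffer0 _ buf = return (0, buf)

-- hClose on this handle always throws
mkFailingHandle :: IO Handle
mkFailingHandle =
    mkFileHandle FailingDevice "<failing>" WriteMode Nothing noNewlineTranslation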

Distributing Haskell on a cluster

I have a piece of code that processes files:
processFiles :: [FilePath] -> (FilePath -> IO ()) -> IO ()
This function spawns an async process that executes an IO action. This IO action must be submitted to a cluster through a job scheduling system (e.g. Slurm).
Because I must use the job scheduling system, it's not possible to use cloudHaskell to distribute the closure. Instead, the program writes a new Main.hs containing the desired computations, which is copied to the cluster node together with all the modules that main depends on, and then executed remotely with "runhaskell Main.hs [opts]". The async process should then periodically ask the job scheduling system (sleeping with threadDelay between polls) whether the job is done.
Is there a way to avoid creating a new Main? Can I serialize the IO action and execute it somehow in the node?
Yep. There is a magical library called packman. It allows you to turn any Haskell value into data (as long as it does not have IORefs or related things in it). Here are the things you would need:
trySerialize :: a -> IO (Serialized a)
deserialize :: Serialized a -> IO a
instance Typeable a => Binary (Serialized a)
Yep, those are the exact types. You can package up your IO actions using trySerialize, use Binary to transfer it to wherever, and then deserialize to get the IO action out, ready for use.
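A sketch of the round trip (packman's module is GHC.Packing; shipping the file to the node is your scheduler's job):

import Data.Binary (decodeFile, encodeFile)
import GHC.Packing (deserialize, trySerialize)

-- submitting side: freeze the action into a file the node can fetch
writeJob :: FilePath -> IO () -> IO ()
writeJob path action = trySerialize action >>= encodeFile path

-- cluster node: thaw the action and run it
runJob :: FilePath -> IO ()
runJob path = do
    job <- decodeFile path      -- Serialized (IO ()), via the Binary instance
    action <- deserialize job
    action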
The caveats for packman are:
It stores things as thunks. This is probably what you want, so that the node can do the evaluating.
That said, if your thunk is huge, the Binary encoding will probably be huge too. Evaluating the thunk before serializing can fix this.
Like I said, mutable references are a no-no. One thing to watch out for is them hiding inside thunks without you knowing it.
Other than that, this seems like what you want!

Concurrency considerations between pipes and non-pipes code

I'm in the process of wrapping a C library for some encoding in a pipes interface, but I've hit upon some design decisions that need to be made.
After the C library is set up, we hold on to an encoder context. With this, we can either encode, or change some parameters (let's call the Haskell interface to this last function tune :: Context -> Int -> IO ()). There are two parts to my question:
The encoding part is easily wrapped up in a Pipe Foo Bar IO (), but I would also like to expose tune. Since simultaneous use of the encoding context must be lock protected, I would need to take a lock at every iteration in the pipe and protect tune by taking the same lock. But now I feel I'm forcing hidden locks on the user. Am I barking up the wrong tree here? How is this kind of situation normally resolved in the pipes ecosystem? In my case I expect the pipe that my specific code is part of to always run in its own thread, with tuning happening concurrently, but I don't want to force this point of view upon any users. Other packages in the pipes ecosystem do not seem to constrain their users like that either.
An encoding context that is no longer used needs to be properly de-initialized. How does one, in the pipes ecosystem, ensure that such things (in this case, performing some IO actions) are taken care of when the pipe is destroyed?
A concrete example would be wrapping a compression library, in which case the above can be:
The compression strength is tunable. We set up the pipe and it runs along merrily. How should one best go about allowing the compression strength setting to be changed while the pipe keeps running, assuming that concurrent access to the compression codec context must be serialized?
The compression library allocated a bunch of memory off the Haskell heap when set up, and we'll need to call some library function to clean this up when the pipe is torn down.
Thanks… this might all be obvious, but I'm quite new to the pipes ecosystem.
Edit: Reading this after posting, I'm quite sure it's the vaguest question I've ever asked here. Ugh! Sorry ;-)
Regarding (1), the general solution is to change your Pipe's type to:
Pipe (Either (Context, Int) Foo) Bar IO ()
In other words, it accepts both Foo inputs and tune requests, which it processes internally.
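The pipe then dispatches on the two cases internally; a sketch using the types from the question, where encode stands in for the actual encoding step:

import Control.Monad (forever)
import Control.Monad.IO.Class (liftIO)
import Pipes

yourPipe :: Pipe (Either (Context, Int) Foo) Bar IO ()
yourPipe = forever $ do
    msg <- await
    case msg of
        Left (ctx, n) -> liftIO (tune ctx n)   -- apply the tune request
        Right foo     -> yield (encode foo)    -- encode :: Foo -> Bar, hypothetical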
So let's then assume that you have two concurrent Producers corresponding to inputs and tune requests:
producer1 :: Producer Foo IO ()
producer2 :: Producer (Context, Int) IO ()
You can use pipes-concurrency to create a buffer that they both feed into, like this:
example = do
    (output, input) <- spawn Unbounded   -- newer pipes-concurrency spells this 'spawn unbounded'
    -- output :: Output (Either (Context, Int) Foo)
    -- input  :: Input  (Either (Context, Int) Foo)
    let io1 = runEffect $ producer1 >-> Pipes.Prelude.map Right >-> toOutput output
        io2 = runEffect $ producer2 >-> Pipes.Prelude.map Left  >-> toOutput output
    as <- mapM async [io1, io2]
    runEffect (fromInput input >-> yourPipe >-> someConsumer)
    mapM_ wait as
You can learn more about the pipes-concurrency library by reading this tutorial.
By forcing all tune requests to go through the same single-threaded Pipe you can ensure that you don't accidentally have two concurrent invocations of the tune function.
Regarding (2), there are two ways you can acquire a resource using pipes. The more sophisticated approach is to use the pipes-safe library, which provides a bracket function that you can use within a Pipe; that is probably overkill for your purpose, though, and mainly exists for acquiring and releasing multiple resources over the lifetime of a pipe. A simpler solution is to use the following with idiom to acquire the pipe:
withEncoder :: (Pipe Foo Bar IO () -> IO r) -> IO r
withEncoder k = bracket acquire release $ \resource ->
    k (createPipeFromResource resource)
Then a user would just write:
withEncoder $ \yourPipe ->
    runEffect (someProducer >-> yourPipe >-> someConsumer)
You can optionally use the managed package, which simplifies the types a bit and makes it easier to acquire multiple resources. You can learn more about it from reading this blog post of mine.

Ensure IO computations are run in a specific thread

I need to make sure that some actions are run on a specific OS thread. I wrote an API where this thread runs a loop listening to a TQueue and executes the given commands. From the API user side, there is an opaque value that is really just a newtype over this queue.
One problem is that what I really need is to embed arbitrary actions (of type IO a), but I believe I can't directly exchange messages of that type. So I currently have something like this (pseudocode):
makeSafe :: RubyInterpreter -> IO a -> IO (Either RubyError a)
makeSafe (RubyInterpreter q) a = do
    mv <- newEmptyTMVarIO
    -- embedded is of type IO (), letting me send it in my queue
    let embedded = handleErrors a >>= atomically . putTMVar mv
    atomically (writeTQueue q (SomeMessage embedded))
    atomically (readTMVar mv)
(for more details, this is for the hruby package)
edit - clarifications :
Being able to send actions of type IO a would be nicer, but is not my main objective.
My main problem is that you can shoot yourself in the foot with this API: for example, if the IO action passed as a parameter itself contains a makeSafe call, it will deadlock.
My secondary problem is that this solution feels a bit contrived, and I wondered if there was a nicer/safer solution around.
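For what it's worth, the receiving end can be a plain loop pinned to its own OS thread; a sketch with hypothetical names (not hruby's actual code):

import Control.Concurrent (forkOS)
import Control.Concurrent.STM (TQueue, atomically, readTQueue)
import Control.Monad (forever)

data Message = SomeMessage (IO ())

-- all queued actions execute on the forked OS thread, in order
startWorker :: TQueue Message -> IO ()
startWorker q = do
    _ <- forkOS $ forever $ do
        SomeMessage act <- atomically (readTQueue q)
        act
    return ()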

Safer Handles in Haskell?

I felt a bit insecure when using Haskell Handles. Namely, I'm looking for two features (maybe they are already there and in that case please forgive my ignorance).
1. When I've obtained a handle (e.g., returned by Network.accept) which is both readable and writable, I wish to convert it into a pair of read-only and write-only handles, such that writing to a read-only handle won't type check and vice versa. (Perhaps one can achieve this using phantom types and wrappers around the IO functions?)
2. In a concurrent setting, I found that it is possible for multiple threads to write to the same handle, which gives rise to quite nasty consequences. How could one prevent that through the type system (if possible), or at least get notified of such a case via a thrown exception at run time?
Any idea is welcome.
It looks like the safer-file-handles library does what you want. The first part is handled pretty clearly. The concurrency-safety appears to be handled by RegionT from the regions library. I haven't used this at all, but it looks like a pretty common approach.
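For the read/write split specifically, the phantom-type idea from the question can also be done by hand; a sketch with invented names (keep the TypedHandle constructor unexported):

import qualified Data.ByteString as BS
import System.IO (Handle)

data ReadEnd
data WriteEnd

-- the phantom parameter records the permitted direction
newtype TypedHandle dir = TypedHandle Handle

splitHandle :: Handle -> (TypedHandle ReadEnd, TypedHandle WriteEnd)
splitHandle h = (TypedHandle h, TypedHandle h)

hGetTyped :: TypedHandle ReadEnd -> Int -> IO BS.ByteString
hGetTyped (TypedHandle h) = BS.hGetSome h

hPutTyped :: TypedHandle WriteEnd -> BS.ByteString -> IO ()
hPutTyped (TypedHandle h) = BS.hPut h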
You may want to consider using the network conduit package. It describes a network application as something that is given two "endpoints": a sink that pushes data into the socket and a source that reads data from it:
type Application m = AppData m -> m ()
data AppData m    -- abstract
appSource :: AppData m -> Source m ByteString
appSink :: AppData m -> Sink ByteString m ()
This cleanly separates the writing and the reading part. Now you can do whatever you like with such a source and a sink, even passing each to a different thread and processing input and output separately. Of course, each of them can only read or write, depending on what endpoint you give to it.
If you want to enforce single-threaded processing, you can restrict yourself to implementing your program components as Conduit ByteString m ByteString. Such a conduit can be easily turned into an Application like this:
asApp :: MonadIO m => Conduit ByteString m ByteString -> Application m
asApp cond ad = appSource ad $= cond $$ appSink ad
But a conduit can only request data using await and write output using yield; otherwise it has no access to any kind of handle and never sees its endpoints, so it can't expose or leak them anywhere.
