Creating a Handle that guarantees hClose failure

I'd like to create a Handle that guarantees failure (exception) when it's passed into hClose. I need this for testing purposes.
How do I create such a Handle?

The module GHC.IO.Handle of the base package has the function mkFileHandle:
mkFileHandle :: (IODevice dev, BufferedIO dev, Typeable dev) => dev -> FilePath -> IOMode -> Maybe TextEncoding -> NewlineMode -> IO Handle
IODevice and BufferedIO are typeclasses that provide basic handle operations for a device. In particular, IODevice has the close method.
You can create your own dummy device type, define those two instances for it (with a close that throws an exception) and then use mkFileHandle to obtain a useable Handle.
See the code of the knob package for an example of how to do this.
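For instance, a minimal sketch of such a dummy device might look like this (untested; the exact method sets vary a little between base versions, and newer base releases also want a RawIO instance for mkFileHandle):

{-# LANGUAGE DeriveDataTypeable #-}
import Control.Exception (ErrorCall (..), throwIO)
import Data.Typeable (Typeable)
import GHC.IO.Buffer (Buffer (..), newByteBuffer)
import GHC.IO.BufferedIO (BufferedIO (..))
import GHC.IO.Device (IODevice (..), IODeviceType (Stream))
import GHC.IO.Handle (Handle, mkFileHandle)
import System.IO (IOMode (WriteMode), nativeNewlineMode)

-- A device that does nothing except blow up on close.
data BrokenDevice = BrokenDevice deriving Typeable

instance IODevice BrokenDevice where
  ready _ _ _ = return True
  devType _   = return Stream
  close _     = throwIO (ErrorCall "hClose failed on purpose")

instance BufferedIO BrokenDevice where
  newBuffer _ state       = newByteBuffer 4096 state
  fillReadBuffer _ buf    = return (0, buf)
  fillReadBuffer0 _ buf   = return (Nothing, buf)
  flushWriteBuffer _ buf  = return buf { bufL = 0, bufR = 0 }
  flushWriteBuffer0 _ buf = return (bufR buf - bufL buf, buf { bufL = 0, bufR = 0 })

-- hClose on this Handle always throws.
mkBrokenHandle :: IO Handle
mkBrokenHandle =
  mkFileHandle BrokenDevice "<broken>" WriteMode Nothing nativeNewlineMode

Then mkBrokenHandle >>= hClose should throw in your test.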

Related

What forces drove WAI Application to be redesigned five times?

I took a curious look at the WAI interface, and while it looks simple, I was surprised to see how many iterations it took to stabilize in its current form!
I had assumed that the CPS style used for resource safety would be the most interesting thing, but it looks like there is much more to learn from!
$ git log -p --reverse -- wai/Network/Wai.hs | grep '\+type Application'
+type Application = Request -> Iteratee B.ByteString IO Response
+type Application = Request -> ResourceT IO Response
+type Application = Request -> C.ResourceT IO Response
+type Application = Request -> IO Response
+type Application = Request -> (forall b. (Response -> IO b) -> IO b)
+type Application = Request -> (Response -> IO ResponseReceived)
-> IO ResponseReceived
Some archeology yields somewhat unsatisfactory results:
$ git log --reverse -G 'type Application' --pretty=oneline -- wai/Network/Wai.hs | cat
879d4a23047c3585e1cba4cdd7c3e8fc13e17592 Moved everything to wai subfolder
360442ac74f7e79bb0e320110056b3f44e15107c Began moving wai/warp to conduit
af7d1a79cbcada0b18883bcc5e5e19a1cd06ae7b conduit 0.3
fe2032ad4c7435709ed79683acac3b91110bba04 Pass around an InternalState instead of living in ResourceT
63ad533299a0a5bad01a36171d98511fdf8d5821 Application uses bracket pattern
1e1b8c222cce96c3d58cd27318922c318642050d ResponseReceived, to avoid existential issues
All the designs seem to be driven by three main concerns:
Requests can have streamed bodies (so we don't have to load them all in memory before starting to process them). How to best represent it?
Responses can be streamed as well. How to best represent it?
How to ensure that resources allocated in the production of a response are properly freed? (For example, how to ensure that file handles are freed after serving a file?)
type Application = Request -> Iteratee B.ByteString IO Response
This version uses iteratees, which were an early solution for streaming data in Haskell. Iteratee consumers had to be written in a "push-based" way, which was arguably less natural than the "pull-based" consumers used in modern streaming libraries.
The streamed body of the request is fed to the iteratee and we get a Response value at the end. The Response contains an enumerator (a function that feeds streamed response bytes to a response iteratee supplied by the server). Presumably, the enumerator would control resource allocation using functions like bracket.
type Application = Request -> ResourceT IO Response
This version uses the resourcet monad transformer for resource management, instead of doing it in the enumerator. There is a special Source type inside both Request and Response which handles streamed data (and which is a bit hard to understand, IMHO).
type Application = Request -> IO Response
This version uses the streaming abstractions from conduit, but eschews resourcet and instead provides a bracket-like responseSourceBracket function for handling resources in streamed responses.
type Application = Request -> (forall b. (Response -> IO b) -> IO b)
type Application = Request -> (Response -> IO ResponseReceived) -> IO ResponseReceived
This version moves to a continuation-based approach which enables the handler function to use regular bracket-like functions to control resource allocation. Back to square one, in that respect!
Conduits are no longer used for streaming. Now there is a plain Request -> IO ByteString function for reading chunks of the request body, and a (Builder -> IO ()) -> IO () -> IO () function in the Response for generating the response stream. (The Builder -> IO () write function along with a flush action are supplied by the server.)
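For illustration (not part of the original discussion), a streaming handler under wai 3.x can be sketched like this:

{-# LANGUAGE OverloadedStrings #-}
-- The server supplies the chunk writer and the flush action.
import Data.ByteString.Builder (byteString)
import Network.HTTP.Types (status200)
import Network.Wai (Application, responseStream)

streamingApp :: Application
streamingApp _req respond =
  respond $ responseStream status200 [("Content-Type", "text/plain")] $
    \write flush -> do
      write (byteString "first chunk\n")
      flush
      write (byteString "second chunk\n")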
Like the resourcet-based versions, and unlike the iteratee-based version, this implementation lets you overlap reading the request body with streaming the response.
The polymorphic handler is a neat trick to ensure that the response-taking callback Response -> IO b is always called: the handler needs to return a b, and the only way to get one is to actually invoke the callback!
This polymorphic solution seems to have caused some problems (perhaps with storing handlers in containers?). Instead of using polymorphism, we can use a ResponseReceived token without a public constructor. The effect is the same: the only way for handler code to get hold of the token it needs to return is to invoke the callback.
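A sketch of what a handler looks like under the final design (the file path is just a placeholder): bracket works fine here, and the only way to produce the ResponseReceived the handler must return is to call respond.

{-# LANGUAGE OverloadedStrings #-}
import Control.Exception (bracket)
import qualified Data.ByteString as BS
import qualified Data.ByteString.Lazy as BL
import Network.HTTP.Types (status200)
import Network.Wai (Application, responseLBS)
import System.IO (IOMode (ReadMode), hClose, openFile)

app :: Application
app _req respond =
  bracket (openFile "static/hello.txt" ReadMode) hClose $ \h -> do
    body <- BS.hGetContents h  -- read strictly before the handle is closed
    respond (responseLBS status200 [("Content-Type", "text/plain")] (BL.fromStrict body))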

Distributing Haskell on a cluster

I have a piece of code that processes files,
processFiles :: [FilePath] -> (FilePath -> IO ()) -> IO ()
This function spawns an async process that executes an IO action. This IO action must be submitted to a cluster through a job scheduling system (e.g. Slurm).
Because I must use the job scheduling system, it's not possible to use Cloud Haskell to distribute the closure. Instead the program writes a new Main.hs containing the desired computations, which is copied to the cluster node together with all the modules that main depends on, and then it is executed remotely with "runhaskell Main.hs [opts]". Then the async process should periodically ask the job scheduling system (using threadDelay) whether the job is done.
Is there a way to avoid creating a new Main? Can I serialize the IO action and execute it somehow in the node?
Yep. There is a magical library called packman. It allows you to turn any Haskell thing into data (as long as it does not have IORefs or related things in it). Here are the things you would need:
trySerialize :: a -> IO (Serialized a)
deserialize :: Serialized a -> IO a
instance Typeable a => Binary (Serialized a)
Yep, those are the exact types. You can package up your IO action using trySerialize, use Binary to transfer it to wherever it needs to go, and then deserialize to get the IO action out, ready for use.
Caveats for packman are that:
It stores things as thunks. This is probably what you want, so that the node can do the evaluating.
That said, if your thunk is huge, the Binary will probably be huge. Evaluating the thunk can fix this.
Like I said, mutable references are a no-no. One thing to watch out for is them being inside thunks without you knowing it.
Other than that, this seems like what you want!
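A rough, untested sketch of the workflow (module name as in the packman package on Hackage; note that the serialized data can generally only be decoded by the same executable, so the same binary would need to be available on the cluster node):

import Data.Binary (decodeFile, encodeFile)
import GHC.Packing (Serialized, deserialize, trySerialize)

-- On the submitting side: capture the action as a thunk and write it to a
-- file that the job scheduler will make visible to the node.
submitAction :: IO () -> FilePath -> IO ()
submitAction action path = do
    s <- trySerialize action        -- :: Serialized (IO ())
    encodeFile path s               -- uses the Binary instance for Serialized

-- On the cluster node (same executable, run in "worker" mode): read the
-- action back and run it.
runSubmitted :: FilePath -> IO ()
runSubmitted path = do
    s <- decodeFile path :: IO (Serialized (IO ()))
    action <- deserialize s
    action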

Concurrency considerations between pipes and non-pipes code

I'm in the process of wrapping a C library for some encoding in a pipes interface, but I've hit upon some design decisions that need to be made.
After the C library is set up, we hold on to an encoder context. With this, we can either encode, or change some parameters (let's call the Haskell interface to this last function tune :: Context -> Int -> IO ()). There are two parts to my question:
The encoding part is easily wrapped up in a Pipe Foo Bar IO (), but I would also like to expose tune. Since simultaneous use of the encoding context must be lock protected, I would need to take a lock at every iteration in the pipe, and protect tune by taking the same lock. But now I feel I'm forcing hidden locks on the user. Am I barking up the wrong tree here? How is this kind of situation normally resolved in the pipes ecosystem? In my case I expect the pipe that my specific code is part of to always run in its own thread, with tuning happening concurrently, but I don't want to force this point of view upon any users. Other packages in the pipes ecosystem do not seem to force their users like this either.
An encoding context that is no longer used needs to be properly de-initialized. How does one, in the pipes ecosystem, ensure that such things (in this case performing some IO actions) are taken care of when the pipe is destroyed?
A concrete example would be wrapping a compression library, in which case the above can be:
The compression strength is tunable. We set up the pipe and it runs along merrily. How should one best go about allowing the compression strength setting to be changed while the pipe keeps running, assuming that concurrent access to the compression codec context must be serialized?
The compression library allocates a bunch of memory outside the Haskell heap when set up, and we'll need to call some library function to clean this up when the pipe is torn down.
Thanks… this might all be obvious, but I'm quite new to the pipes ecosystem.
Edit: Reading this after posting, I'm quite sure it's the vaguest question I've ever asked here. Ugh! Sorry ;-)
Regarding (1), the general solution is to change your Pipe's type to:
Pipe (Either (Context, Int) Foo) Bar IO ()
In other words, it accepts both Foo inputs and tune requests, which it processes internally.
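A hedged sketch of what that internal processing could look like, reusing the question's tune :: Context -> Int -> IO () and assuming a hypothetical encodeOne :: Context -> Foo -> IO Bar wrapper around the C encode call:

import Pipes

-- Handle tune requests in-line; encode everything else.
tuningEncoder :: Context -> Pipe (Either (Context, Int) Foo) Bar IO ()
tuningEncoder ctx = for cat $ \msg -> case msg of
    Left (ctx', n) -> lift (tune ctx' n)                  -- apply a tune request
    Right foo      -> lift (encodeOne ctx foo) >>= yield  -- encode and pass on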
So let's then assume that you have two concurrent Producers corresponding to inputs and tune requests:
producer1 :: Producer Foo IO ()
producer2 :: Producer (Context, Int) IO ()
You can use pipes-concurrency to create a buffer that they both feed into, like this:
example = do
    (output, input) <- spawn Unbounded
    -- input  :: Input  (Either (Context, Int) Foo)
    -- output :: Output (Either (Context, Int) Foo)
    let io1 = runEffect $ producer1 >-> Pipes.Prelude.map Right >-> toOutput output
        io2 = runEffect $ producer2 >-> Pipes.Prelude.map Left  >-> toOutput output
    as <- mapM async [io1, io2]
    runEffect (fromInput input >-> yourPipe >-> someConsumer)
    mapM_ wait as
You can learn more about the pipes-concurrency library by reading this tutorial.
By forcing all tune requests to go through the same single-threaded Pipe you can ensure that you don't accidentally have two concurrent invocations of the tune function.
Regarding (2) there are two ways you can acquire a resource using pipes. The more sophisticated approach is to use the pipes-safe library, which provides a bracket function that you can use within a Pipe, but that is probably overkill for your purpose and only exists for acquiring and releasing multiple resources over the lifetime of a pipe. A simpler solution is just to use the following with idiom to acquire the pipe:
withEncoder :: (Pipe Foo Bar IO () -> IO r) -> IO r
withEncoder k = bracket acquire release $ \resource ->
    k (createPipeFromResource resource)
Then a user would just write:
withEncoder $ \yourPipe ->
    runEffect (someProducer >-> yourPipe >-> someConsumer)
You can optionally use the managed package, which simplifies the types a bit and makes it easier to acquire multiple resources. You can learn more about it from reading this blog post of mine.
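For example (a sketch; someProducer and someConsumer are the same placeholders as above), the withEncoder above wraps up into managed like this:

import Control.Monad.IO.Class (liftIO)
import Control.Monad.Managed (Managed, managed, runManaged)
import Pipes

-- Turn the with-style acquisition into a Managed value.
managedEncoder :: Managed (Pipe Foo Bar IO ())
managedEncoder = managed withEncoder

main :: IO ()
main = runManaged $ do
    yourPipe <- managedEncoder
    liftIO $ runEffect (someProducer >-> yourPipe >-> someConsumer)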

Safer Handles in Haskell?

I felt a bit insecure when using Haskell Handles. Namely, I'm looking for two features (maybe they are already there and in that case please forgive my ignorance).
When I've obtained a handle (e.g., one returned by Network.accept) which is both readable and writable, I wish to convert it into a pair of read-only and write-only handles, such that writing to a read-only handle won't type check and vice versa. (Perhaps one can achieve this using phantom types and wrappers around the IO functions?)
In a concurrent setting, I found that it is possible for multiple threads to write to the same handle, which gives rise to quite nasty consequences. How could one prevent that through the type system (if possible), or at least be notified of such a case via an exception thrown at run-time?
Any idea is welcome.
It looks like the safer-file-handles library does what you want. The first part is handled pretty clearly. The concurrency-safety appears to be handled by RegionT from the regions library. I haven't used this at all, but it looks like a pretty common approach.
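To illustrate the phantom-type idea from the first part of the question (just a minimal sketch, not the safer-file-handles API):

{-# LANGUAGE EmptyDataDecls #-}
import Data.ByteString (ByteString)
import qualified Data.ByteString as BS
import qualified System.IO as IO

data ReadOnly
data WriteOnly

-- Tag a Handle with its allowed direction; only export the restricted wrappers.
newtype TaggedHandle dir = TaggedHandle IO.Handle

splitHandle :: IO.Handle -> (TaggedHandle ReadOnly, TaggedHandle WriteOnly)
splitHandle h = (TaggedHandle h, TaggedHandle h)

hGet :: TaggedHandle ReadOnly -> Int -> IO ByteString
hGet (TaggedHandle h) n = BS.hGet h n

hPut :: TaggedHandle WriteOnly -> ByteString -> IO ()
hPut (TaggedHandle h) = BS.hPut h
-- Calling hPut on a TaggedHandle ReadOnly is now a type error.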
You may want to consider using the network conduit package. It describes a network application as something that is given two "endpoints": a sink that pushes data into the socket and a source that reads data from the socket:
type Application m = AppData m -> m ()
data AppData m  -- abstract
appSource :: AppData m -> Source m ByteString
appSink :: AppData m -> Sink ByteString m ()
This cleanly separates the writing and the reading part. Now you can do whatever you like with such a source and a sink, even passing each to a different thread and processing input and output separately. Of course, each of them can only read or write, depending on what endpoint you give to it.
If you want to enforce single-threaded processing, you can restrict yourself to implementing your program components as Conduit ByteString m ByteString. Such a conduit can be easily turned into an Application like
asApp :: MonadIO m => Conduit ByteString m ByteString -> Application m
asApp cond ad = appSource ad $= cond $$ appSink ad
But a conduit can only request data using await and write output using yield; otherwise it has no access to any kind of handles and never sees any of its endpoints, so it can't expose or leak them anywhere.
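For example, a component that upper-cases whatever the peer sends never touches a socket or handle (a small sketch using the same older conduit API as the asApp snippet above):

import Data.ByteString (ByteString)
import qualified Data.ByteString.Char8 as Char8
import Data.Char (toUpper)
import Data.Conduit
import qualified Data.Conduit.List as CL

-- Upper-case each incoming chunk; the component never sees its endpoints.
upperEcho :: Monad m => Conduit ByteString m ByteString
upperEcho = CL.map (Char8.map toUpper)

asApp upperEcho then gives a complete Application that you can hand to a server runner such as runTCPServer.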

Nondeterministically interleaving conduit's Sources

I was hoping to see a nondeterministic interleaving operation for sources, with a type signature like
interleave :: WhateverIOMonadClassItWouldWant m => [(k, Source m a)] -> Source m (k, a)
The use case is that I have a p2p application that maintains open connections to many nodes on the network, and it is mostly just sitting around waiting for messages from any of them. When a message arrives, it doesn't care where it came from, but needs to process the message as soon as possible. In theory this kind of application (at least when used for socket-like sources) could bypass GHC's IO manager entirely and run the select/epoll/etc. calls directly, but I don't particularly care how it's implemented, as long as it works.
Is something like this possible with conduit? A less general but probably more feasible approach might be to write a [(k, Socket)] -> Source m (k, ByteString) function that handles receiving on all the sockets for you.
I noticed the ResumableSource operations in conduit, but they all seem to want to be aware of a particular Sink, which feels like a bit of an abstraction leak, at least for this operation.
The stm-conduit package provides mergeSources, which performs something similar (though not identical) to what you're looking for. It's probably a good place to start.
Yes, it is possible.
You can poll a bunch of Sources without blocking by forking one thread per Source, where each thread pairs its Source up with a Sink that sends the output to some concurrency channel:
concur :: (WhateverIOMonadClassItWouldWant m) => TChan a -> Sink a m r
... and then you define a Source that reads from that channel:
synchronize :: (WhateverIOMonadClassItWouldWant m) => TChan a -> Source m a
Notice that this would be no different than just forking threads to poll the sockets themselves, but it would be useful to other users of conduit that might want to poll things other than sockets using Sources they have defined, because it's more general.
If you combined those capabilities into one function, then the overall API of the call would look something like:
poll :: (WhateverIOMonadClassItWouldWant m) => [Source m a] -> m (Source m a)
... but you can still throw in those ks if you want.
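A hedged sketch of that approach for conduit Sources (the sinkTChan/sourceTChan helpers are illustrative rather than from any particular package, and the forked polling threads are never cleaned up here):

import Control.Concurrent.Async (async)
import Control.Concurrent.STM (TChan, atomically, newTChanIO, readTChan, writeTChan)
import Control.Monad (forM_, forever)
import Control.Monad.IO.Class (MonadIO, liftIO)
import Data.Conduit
import qualified Data.Conduit.List as CL

-- Push everything this Sink receives into the shared channel.
sinkTChan :: MonadIO m => TChan a -> Sink a m ()
sinkTChan chan = CL.mapM_ (liftIO . atomically . writeTChan chan)

-- Read the merged stream back out of the channel.
sourceTChan :: MonadIO m => TChan a -> Source m a
sourceTChan chan = forever $ do
    a <- liftIO (atomically (readTChan chan))
    yield a

-- Poll every tagged Source in its own thread and merge the results.
interleave :: [(k, Source IO a)] -> IO (Source IO (k, a))
interleave sources = do
    chan <- newTChanIO
    forM_ sources $ \(k, src) ->
        async (src $$ CL.map ((,) k) =$ sinkTChan chan)
    return (sourceTChan chan)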
