I'm trying to write a library aiming to reproduce Qt's threading semantics: signals can be connected to slots, and all slots execute in a known thread, so that slots tied to the same thread are threadsafe with regards to each other.
I have the following API:
data Signal a = Signal Unique a
data Slot a = Slot Unique ThreadId (a -> IO ())
mkSignal :: IO (Signal a)
mkSlot :: ThreadId -> (Slot a -> a -> IO ()) -> IO (Slot a)
connect :: Signal a -> Slot a -> IO ()
-- callable from any thread
emit :: Signal a -> a -> IO ()
-- runs in Slot's thread as a result of `emit`
execute :: Slot a -> a -> IO ()
execute (Slot _ _ f) arg = f arg
The problem is getting from emit to execute. The argument needs to be stored at runtime somehow, and then an IO action performed, but I can't seem to get past the type checker.
The things I need:
Type safety: signals shouldn't be connected to slots expecting a different type.
Type-independence: there can be more than one slot for any given type (perhaps this can be relaxed with newtype and/or TH).
Ease of use: since this is a library, signals and slots should be easy to create.
The things I've tried:
Data.Dynamic: makes the whole thing really fragile, and I haven't found a way to perform a correctly-typed IO action on a Dynamic. There's dynApply, but it's pure.
Existential types: I need to execute the function passed to mkSlot, as opposed to an arbitrary function based on the type.
Data.HList: I'm not smart enough to figure it out.
What am I missing?
Firstly, are you sure Slots really want to execute in a specific thread? It's easy to write thread-safe code in Haskell, and threads are very lightweight in GHC, so you're not gaining much by tying all event-handler execution to a specific Haskell thread.
Also, mkSlot's callback doesn't need to be given the Slot itself: you can use recursive do-notation to bind the slot in its callback without adding the concern of tying the knot to mkSlot.
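As a rough illustration of that knot-tying, here is a minimal sketch using fixIO (the RecursiveDo extension's mdo/rec blocks desugar to mfix, which for IO is fixIO). The Slot type below is a simplified, hypothetical stand-in that uses an Int identifier instead of a Unique and has no thread or group field; slotId, runSlot and selfAwareSlot are names invented for this example.

```haskell
import Data.IORef
import System.IO (fixIO)

-- Simplified, hypothetical slot: an identifier plus a callback.
data Slot a = Slot Int (a -> IO ())

-- The callback no longer receives the slot as an argument.
mkSlot :: Int -> (a -> IO ()) -> IO (Slot a)
mkSlot ident f = return (Slot ident f)

slotId :: Slot a -> Int
slotId (Slot ident _) = ident

runSlot :: Slot a -> a -> IO ()
runSlot (Slot _ f) = f

-- The callback refers to the slot that is being constructed. This is safe
-- because 'slot' is only inspected when the callback runs, which happens
-- after construction has finished.
selfAwareSlot :: IORef [String] -> IO (Slot String)
selfAwareSlot logRef =
  fixIO $ \slot ->
    mkSlot 7 $ \msg ->
      modifyIORef' logRef ((show (slotId slot) ++ ": " ++ msg) :)
```

The point of the design is that mkSlot stays simple, and only the callers that actually need self-reference pay for it.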
Anyway, you don't need anything as complicated as those solutions. I expect when you talk about existential types, you're thinking about sending something like (a -> IO (), a) through a TChan (which you mentioned using in the comments) and applying it on the other end, but you want the TChan to accept values of this type for any a, rather than just one specific a. The key insight here is that if you have (a -> IO (), a) and don't know what a is, the only thing you can do is apply the function to the value, giving you an IO () — so we can just send those through the channel instead!
Here's an example:
import Data.Unique
import Control.Applicative
import Control.Monad
import Control.Concurrent
import Control.Concurrent.STM
newtype SlotGroup = SlotGroup (IO () -> IO ())
data Signal a = Signal Unique (TVar [Slot a])
data Slot a = Slot Unique SlotGroup (a -> IO ())
-- When executed, this produces a function taking an IO action and returning
-- an IO action that writes that action to the internal TChan. The advantage
-- of this approach is that it's impossible for clients of newSlotGroup to
-- misuse the internals by reading the TChan or similar, and the interface is
-- kept abstract.
newSlotGroup :: IO SlotGroup
newSlotGroup = do
    chan <- newTChanIO
    _ <- forkIO . forever . join . atomically . readTChan $ chan
    return $ SlotGroup (atomically . writeTChan chan)
mkSignal :: IO (Signal a)
mkSignal = Signal <$> newUnique <*> newTVarIO []
mkSlot :: SlotGroup -> (a -> IO ()) -> IO (Slot a)
mkSlot group f = Slot <$> newUnique <*> pure group <*> pure f
connect :: Signal a -> Slot a -> IO ()
connect (Signal _ v) slot = atomically $ do
    slots <- readTVar v
    writeTVar v (slot:slots)
emit :: Signal a -> a -> IO ()
emit (Signal _ v) a = atomically (readTVar v) >>= mapM_ (`execute` a)
execute :: Slot a -> a -> IO ()
execute (Slot _ (SlotGroup send) f) a = send (f a)
This uses a TChan to send actions to the worker thread each slot is tied to.
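To see the core idea in isolation, here is a minimal sketch that uses only base (a Chan instead of a TChan). The names newWorker and demoWorker are made up for this illustration. Handlers with different argument types end up on the same channel because each one is applied to its argument before being sent, leaving a plain IO ().

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.Chan (newChan, readChan, writeChan)
import Control.Concurrent.MVar
  (modifyMVar_, newEmptyMVar, newMVar, putMVar, readMVar, takeMVar)
import Control.Monad (forever, join)

-- Hypothetical helper: start a worker thread and return a function that
-- queues an action for it.
newWorker :: IO (IO () -> IO ())
newWorker = do
  chan <- newChan
  _ <- forkIO (forever (join (readChan chan)))
  return (writeChan chan)

-- Queue handlers of two different argument types on the same worker and
-- collect what they logged, in execution order.
demoWorker :: IO [String]
demoWorker = do
  send <- newWorker
  logVar <- newMVar []
  done <- newEmptyMVar
  let onInt :: Int -> IO ()
      onInt n = modifyMVar_ logVar (return . (("int " ++ show n) :))
      onStr :: String -> IO ()
      onStr s = modifyMVar_ logVar (return . (("str " ++ s) :))
  send (onInt 1)
  send (onStr "two")
  send (putMVar done ())   -- marker: everything queued before it has run
  takeMVar done
  reverse <$> readMVar logVar
```

Because a single worker drains the channel in FIFO order, the handlers run one at a time and in the order they were queued.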
Note that I'm not very familiar with Qt, so I may have missed some subtlety of the model. You can also disconnect Slots with this:
disconnect :: Signal a -> Slot a -> IO ()
disconnect (Signal _ v) (Slot u _ _) = atomically $ do
    slots <- readTVar v
    writeTVar v $ filter keep slots
  where
    -- Slot has three fields: its Unique, its SlotGroup and its callback.
    keep (Slot u' _ _) = u' /= u
You might want something like Map Unique (Slot a) instead of [Slot a] if this is likely to be a bottleneck.
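A minimal sketch of that variant might look like the following. It assumes the stm and containers packages, and it deliberately simplifies the types (no SlotGroup, no Unique on the signal), so treat the definitions as hypothetical stand-ins rather than drop-in replacements. slotCount exists only to make the behaviour observable.

```haskell
import Control.Concurrent.STM (TVar, atomically, modifyTVar', newTVarIO, readTVarIO)
import qualified Data.Map.Strict as Map
import Data.Unique (Unique, newUnique)

-- Simplified, hypothetical stand-ins for the types above.
data Slot a = Slot Unique (a -> IO ())
newtype Signal a = Signal (TVar (Map.Map Unique (Slot a)))

mkSignal :: IO (Signal a)
mkSignal = Signal <$> newTVarIO Map.empty

mkSlot :: (a -> IO ()) -> IO (Slot a)
mkSlot f = do
  u <- newUnique
  return (Slot u f)

-- Connecting the same slot twice keeps a single entry.
connect :: Signal a -> Slot a -> IO ()
connect (Signal v) slot@(Slot u _) =
  atomically (modifyTVar' v (Map.insert u slot))

-- Removal is a logarithmic-time delete instead of a linear filter.
disconnect :: Signal a -> Slot a -> IO ()
disconnect (Signal v) (Slot u _) =
  atomically (modifyTVar' v (Map.delete u))

slotCount :: Signal a -> IO Int
slotCount (Signal v) = Map.size <$> readTVarIO v
```

Note one behavioural difference from the list version: with a Map keyed by Unique, a slot can be connected to a given signal at most once.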
So, the solution here is to (a) recognise that you have something that's fundamentally based upon mutable state, and use a mutable variable to structure it; (b) realise that functions and IO actions are first-class just like everything else, so you don't have to do anything special to construct them at runtime :)
By the way, I suggest keeping the implementations of Signal and Slot abstract by not exporting their constructors from the module defining them; there are many ways to tackle this approach without changing the API, after all.
I want to process a stream of events received via MQTT. The library I'm using delivers the results through a callback. The processing I'm doing depends on the previous state, not only on the latest event. In the future, events might also be gathered from other sources.
At first I decided to collect the events into a list, which sounded like a good idea. I had a minor issue because IO prevents lazy evaluation and waiting for an infinite stream might take long, but I solved it by interleaving IO.
stream :: IO [Event] allows me to do nice things like foldl, foldM, map, mapM, etc. Unfortunately, with this approach I won't be able to combine two streams, because there is no locking feature any more.
I dug through many libraries and found, for example, STM with TQueue. Unfortunately, it is not exactly what I want.
I decided to create a custom type and make it Foldable so that I could fold it. I failed because of IO.
import Control.Concurrent.STM
newtype Stream a = Stream (STM a)
runStream
    :: ((a -> IO ()) -> IO i)
    -> IO (Stream a)
runStream block = do
    queue <- newTQueueIO
    block (atomically . writeTQueue queue)
    return $ Stream (readTQueue queue)
foldStream :: (a -> b -> IO b) -> b -> Stream a -> IO b
foldStream f s (Stream read) = do
    n <- atomically read
    m <- f n s
    foldStream f m (Stream read)
mapStream :: (a -> b) -> Stream a -> Stream b
mapStream f (Stream read) = Stream $ f <$> read
zipStream :: [Stream a] -> Stream a
zipStream = undefined
Which can be used like main = foldStream (\x _ -> print x) () =<< events
Is it possible to implement some of the base classes so that I can work with this stream as with a regular list?
The usual trick in these cases is to make the callback write to a queue, and then read from the other end of the queue.
Using a bounded, closeable queue from the stm-chans package, we can define this function:
import Control.Concurrent.STM
import Control.Concurrent.STM.TBMQueue
foldQueue :: TBMQueue a -> (x -> a -> IO x) -> IO x -> (x -> IO b) -> IO b
foldQueue queue step start done =
    let go state =
            do m <- atomically (readTBMQueue queue)
               case m of
                   Nothing -> done state
                   Just a  -> step state a >>= go
     in start >>= go
It takes the channel, a step function (similar to the one required by foldM), an action to obtain the initial state, and a "done" action that returns the final result, and then feeds data from the channel until it is closed. Notice that the fold state x is chosen by the caller of foldQueue.
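For readers without stm-chans at hand, the same "callback writes, consumer folds" shape can be sketched with only the stm package by sending Maybe values through a plain TQueue, where Nothing plays the role of closing the queue. Everything here is an assumption made for illustration: fakeSubscribe is a hypothetical stand-in for a callback-based library, and foldEvents is a simplified cousin of foldQueue (unbounded queue, no start or done actions).

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.STM (TQueue, atomically, newTQueueIO, readTQueue, writeTQueue)

-- Hypothetical callback-based producer: calls the handler once per event,
-- then signals end of input with Nothing.
fakeSubscribe :: (Maybe Int -> IO ()) -> IO ()
fakeSubscribe callback = do
  mapM_ (callback . Just) [1 .. 5]
  callback Nothing

-- Fold over queued events until the end-of-input marker arrives.
foldEvents :: TQueue (Maybe a) -> (x -> a -> IO x) -> x -> IO x
foldEvents queue step = go
  where
    go state = do
      m <- atomically (readTQueue queue)
      case m of
        Nothing -> return state
        Just a  -> step state a >>= go

-- The producer runs in its own thread; the fold runs in the caller's thread.
sumEvents :: IO Int
sumEvents = do
  queue <- newTQueueIO
  _ <- forkIO (fakeSubscribe (atomically . writeTQueue queue))
  foldEvents queue (\acc n -> return (acc + n)) 0
```

The bounded, closeable queue used above remains the better choice when the producer can outrun the consumer, since an unbounded TQueue will simply keep growing.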
If later we want to upgrade to the monadic folds from the foldl package—which have a very useful Applicative instance—we can do it like this:
import qualified Control.Foldl as L
foldQueue' :: TBMQueue a -> L.FoldM IO a b -> IO b
foldQueue' queue = L.impurely (foldQueue queue)
Here impurely comes from the foldl package: it unpacks a FoldM into the step, start, and done arguments that foldQueue expects.
Sometimes (like when parsing, grouping, or decoding) it's easier to use a pull-based consumer. We can do that with the streaming package:
import Streaming
import qualified Streaming.Prelude as S
foldQueue' :: TBMQueue a -> (Stream (Of a) IO () -> IO r) -> IO r
foldQueue' queue consume = consume (S.untilRight (do
    m <- atomically (readTBMQueue queue)
    return (case m of
        Nothing -> Right ()
        Just a  -> Left a)))
Given a function that consumes a stream, we feed to it a stream of values read from the queue.
Often, reading from the channel and writing to it must happen in different threads. We can use functions like concurrently from async to handle it cleanly.
I'm making several API calls that are encapsulated in a type alias:
type ConnectT a = EitherT String (RWST ConnectReader ConnectWriter ConnectState IO) a
Here's a simplified version of a function which connects to two separate APIs:
connectBoth :: ConnectT ()
connectBoth = do
    a <- connectAPI SomeAPI someFunction
    b <- connectAPI OtherAPI otherFunction
    connectAPI OtherAPI (b `sendTo` a)
The final call in connectBoth is very time sensitive (and the transactions are of a financial nature). I figure a and b could be evaluated in parallel, and with lazy IO I should be able to do this:
b <- a `par` connectAPI OtherAPI otherFunction
The documentation for par says that it "indicates that it may be beneficial to evaluate the first argument in parallel with the second."
Does this work with IO?
Can I get a stronger guarantee than "it may be beneficial"?
Or, if I want greater guarantees, will I need to use an MVar and liftIO . forkIO?
If I evaluate a first, I think I can use eitherT to check if a succeeded. But if I evaluate both at the same time I get confused. Here is the situation:
If only a failed, I will retry a, if that fails I will run a function that manually reverses b
If only b failed, I will retry b, write to the log in RWS and return left
if both fail write to the log in RWS and return left
if both succeed process c (which is not as time sensitive as a or b)
But if I evaluate both in parallel, then how can I identify which one failed? If I use eitherT immediately after a then a will evaluate first. If I use it after b then I won't be able to tell which one failed.
Is there a way I can evaluate the IO calls in parallel but respond differently depending on which one (if any) fails? Or am I left with a choice of parallelism vs failure mitigation?
The solution you are looking for will use forkIO and MVars.
par
par is for multiprocessor parallelism: it helps evaluate pure terms in parallel. It doesn't help with IO. Consider
do
    a <- (someProcess :: IO a)
    ...
By the time you reach ... everything from the IO action has happened (if we ignore evil lazy IO) to a point that a can be determined entirely by ordinary evaluation. This means that by the time you do b <- someOtherProcess, all of someProcess is already done. It's too late to do anything in parallel.
EitherT
You can explicitly examine the Either e a result of an EitherT e m a. runEitherT :: EitherT e m a -> m (Either e a) makes the success or failure explicit in the underlying monad. We can lift that right back into EitherT to make a computation that always succeeds (sometimes with an error) from one that sometimes fails.
import Control.Monad.Trans.Class
examine :: (MonadTrans t, Monad m) => EitherT e m a -> t m (Either e a)
examine = lift . runEitherT
forkIO
The simplest solution for doing two things in IO is forkIO. It starts another lightweight thread that you can forget about.
If you run a value with your transformer stack, there will be four pieces of data when you are done. The state ConnectState, the written ConnectWriter log, whether the computation was successful, and, depending on whether or not it was successful, either the value or the error.
EitherT String (RWST ConnectReader ConnectWriter ConnectState IO) a
^ ^ ^ ^ ^
If we write out the structure of this, it looks like
(RWST ConnectReader ConnectWriter ConnectState IO) (Either String a)
^ ^ ^ ^ ^
ConnectReader -> ConnectState -> IO (Either String a, ConnectState, ConnectWriter)
^ ^ ^ ^ ^
All four of those pieces of information end up in the result of the IO action. If you fork your stack, you need to decide what to do with all of them when you join the results back together. You have already decided that you want to explicitly handle the Either String a. The ConnectWriters can probably be combined together with <>. You will need to decide what to do with ConnectState.
We'll make a fork that returns all four of these pieces of data by shoving them into an MVar.
import Control.Concurrent
import Control.Concurrent.MVar
import Control.Monad.IO.Class
forkConnectT :: ConnectT a -> ConnectT (MVar (Either String a, ConnectState, ConnectWriter))
forkConnectT cta = do
    result <- liftIO newEmptyMVar
    r <- lift ask
    s <- lift get
    liftIO $ forkIO $ do
        state <- runRWST (runEitherT cta) r s
        putMVar result state
    return result
Later, when we want the result, we can try and see if it is ready. We'll handle the Either for success and failure explicitly, while handling the state and writer behind the scenes.
import Data.Traversable
tryJoinConnectT :: MVar (Either String a, ConnectState, ConnectWriter) -> ConnectT (Maybe (Either String a))
tryJoinConnectT result = liftIO (tryTakeMVar result) >>= traverse reintegrate
Behind the scenes we reintegrate the ConnectWriter by telling this ConnectT to write what was accumulated in the other thread. You will need to decide what to do to combine the two states.
reintegrate :: (a, ConnectState, ConnectWriter) -> ConnectT a
reintegrate (a, s, w) = do
    -- Whatever needs to be done with the state.
    -- stateHere <- lift get
    lift $ tell w
    return a
If we want to wait until the result is ready, we can block reading the MVar. This offers less opportunity for handling errors such as timeouts.
joinConnectT :: MVar (Either String a, ConnectState, ConnectWriter) -> ConnectT (Either String a)
joinConnectT result = liftIO (takeMVar result) >>= reintegrate
Example
Putting it all together, we can fork a task in parallel, do something in this thread explicitly examining the success or failure, join with the result from the other thread, and reason about what to do next with explicit Eithers representing success or failure from each process.
connectBoth :: ConnectT ()
connectBoth = do
    bVar <- forkConnectT $ connectAPI OtherAPI otherFunction
    a <- examine $ connectAPI SomeAPI someFunction
    b <- joinConnectT bVar
    ...
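Stripped of the transformer stack, the fork/join pattern and the four failure cases from the question can be sketched in plain IO. This is only an illustration under simplifying assumptions: runBoth and describe are hypothetical names, each action reports failure as a Left, and neither action throws an exception (if the forked one did, takeMVar would block forever).

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)

-- Run the second action in a forked thread while the first runs in the
-- current thread, then hand back both explicit results.
runBoth :: IO (Either String a) -> IO (Either String b)
        -> IO (Either String a, Either String b)
runBoth actA actB = do
  bVar <- newEmptyMVar
  _ <- forkIO (actB >>= putMVar bVar)
  a <- actA
  b <- takeMVar bVar
  return (a, b)

-- With both results in hand, each combination of success and failure can be
-- handled separately.
describe :: (Either String a, Either String b) -> String
describe (Left _,  Left _)  = "both failed"
describe (Left _,  Right _) = "only a failed"
describe (Right _, Left _)  = "only b failed"
describe (Right _, Right _) = "both succeeded"
```

In the real code, each branch of describe would be replaced by the retry, rollback, or logging logic listed in the question.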
Going farther
If you are paranoid, you will also want to handle exceptions (some of which can be handled by forkFinally) and asynchronous exceptions. You will need to decide whether to bundle these exceptions into your stack or treat IO like it can always throw exceptions.
Consider using async instead of forkIO and MVars.
monad-control (which you already have dependencies on via either) provides mechanisms for building up, one transformer at a time, the type that represents the state of a monad transformer stack. We wrote this by hand as (Either String a, ConnectState, ConnectWriter). If you are going to grow your transformer stack, you might want to get this from MonadTransControl instead. You can restore the state from the forked thread (see the MonadBaseControl section) in the parent to inspect it. You will still need to decide how to deal with the data from the two states.
Suppose I have a Data.Dynamic.Dynamic object which wraps an IO action (that is, something of type IO a for some perhaps-unknown a). I feel like I should be able to carry out this IO action and get its result, wrapped in a Dynamic (which will have type a). Is there a standard library function which does this? (Something like dynApply, but for IO action performance instead of function application.)
The implementation of the function would perhaps look something like
dynPerform :: Dynamic -> Maybe (IO Dynamic)
dynPerform (Dynamic typ act)
    = if typeRepTyCon typ /= ioTyCon then Nothing else Just $
        do result <- (unsafeCoerce act :: IO Any)
           return . Dynamic (head $ typeRepArgs typ) $ result
-- An applied action, so that its outermost type constructor is IO rather than (->)
exampleIOAction = putChar 'x'
typeOfIOAction = typeOf exampleIOAction
ioTyCon = typeRepTyCon typeOfIOAction
but obviously this uses several unsafe operations, so I'd rather pull it in from a library. (In fact, what I've written wouldn't work outside Data.Dynamic because of the opacity of the type Data.Dynamic.Dynamic.)
I don't believe you can safely do what you are trying to do. Let me suggest an alternative approach.
Perhaps phantom types can help you here. Suppose you are providing some sort of cron job service, where the user has you perform an action every x microseconds, and the user can query at any time to see the result of the most recent run of that action.
Suppose you yourself have access to the following primitives:
freshKey :: IO Key
save :: Key -> Dynamic -> IO ()
load :: Key -> IO (Maybe Dynamic)
You should schedule the jobs and make a plan to store the results while you still "know" in the type system what type the action is.
import Control.Concurrent (threadDelay)
import Control.Concurrent.Async (Async, async, cancel)
import Data.Dynamic (fromDynamic, toDyn)
import Data.Typeable (Typeable)
-- do not export the internals of PhantomKey
data PhantomKey a = PhantomKey {
    getKey :: Key,
    getThread :: Async ()
  }
-- This is how your user acquires phantom keys;
-- their phantom type is tied to the type of the input action
schedule :: Typeable a => Int -> IO a -> IO (PhantomKey a)
schedule microseconds m = do
    k <- freshKey
    let go = do
            threadDelay microseconds
            a <- m
            save k (toDyn a)
            go
    thread <- async go
    return $ PhantomKey k thread
unschedule :: PhantomKey a -> IO ()
unschedule pk = cancel (getThread pk)
-- This is how your user uses phantom keys;
-- notice the function result type is tied to the phantom key type
-- (fromDynamic needs the Typeable constraint)
peekLatest :: Typeable a => PhantomKey a -> IO (Maybe a)
peekLatest pk = load (getKey pk) >>= \md -> case md of
    Nothing -> return Nothing -- Nothing stored at this key (yet?)
    Just dyn -> case fromDynamic dyn of
        Nothing -> return Nothing -- mismatched data type stored at this key
                                  -- hitting this branch is probably a bug
        Just a -> return (Just a)
Now if I'm a user of your API, I can use it with my own data types that you know nothing about, as long as they're Typeable:
refreshFoo :: IO Foo
main = do
    fooKey <- schedule 1000000 refreshFoo
    -- fooKey :: PhantomKey Foo
    mfoo <- peekLatest fooKey
    -- mfoo :: Maybe Foo
So what have we accomplished?
Your library is taking in a user IO action, and performing it at arbitrary points in time
Your library is saving your user's data via Dynamic blobs
Your library is loading your user's data via Dynamic blobs
All this without your library knowing anything about your user's data types.
It seems to me that if you are putting something which you know is an IO action into a Dynamic blob, you have lost information in the type system about that thing in a context when you should have instead made use of said type information. TypeRep can get you type information at the value level, but (as far as I know) cannot bubble that information back up into the type level.
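The save/load half of this idea can be tried out without any scheduling. The sketch below is a simplified illustration, assuming the containers package: the Store type, saveTyped and loadTyped are hypothetical names, the key is a plain Int chosen by the caller, and this PhantomKey carries no thread handle. The phantom parameter is what lets the typed value be recovered from the Dynamic on the way out.

```haskell
import Data.Dynamic (Dynamic, fromDynamic, toDyn)
import Data.IORef (IORef, modifyIORef', newIORef, readIORef)
import qualified Data.Map.Strict as Map
import Data.Typeable (Typeable)

-- Hypothetical untyped store: the library only ever sees Dynamic blobs.
type Store = IORef (Map.Map Int Dynamic)

-- The type parameter is phantom: it records what was stored under the key.
newtype PhantomKey a = PhantomKey Int

newStore :: IO Store
newStore = newIORef Map.empty

-- Saving erases the type into a Dynamic but remembers it in the key.
saveTyped :: Typeable a => Store -> Int -> a -> IO (PhantomKey a)
saveTyped store k x = do
  modifyIORef' store (Map.insert k (toDyn x))
  return (PhantomKey k)

-- Loading uses the key's phantom type to pick the type to recover.
loadTyped :: Typeable a => Store -> PhantomKey a -> IO (Maybe a)
loadTyped store (PhantomKey k) = do
  m <- readIORef store
  return (Map.lookup k m >>= fromDynamic)
```

As in the answer's version, keeping the PhantomKey constructor unexported is what prevents a caller from forging a key at the wrong type.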
I'm working on a Haskell network application and I use the actor pattern to manage multithreading. One thing I came across is how to store, for example, a set of client sockets/handles, which of course must be accessible to all threads and can change when clients log on or off.
Since I'm coming from the imperative world, I thought about some kind of lock mechanism, but when I noticed how ugly that is, I thought about "pure" mutability (well, actually it's only kind of pure):
import Control.Concurrent
import Control.Monad
import Network
import System.IO
import Data.List
import Data.Maybe
import System.Environment
import Control.Exception
newStorage :: (Eq a, Show a) => IO (Chan (String, Maybe (Chan [a]), Maybe a))
newStorage = do
    q <- newChan
    forkIO $ storage [] q
    return q
newHandleStorage :: IO (Chan (String, Maybe (Chan [Handle]), Maybe Handle))
newHandleStorage = newStorage
storage :: (Eq a, Show a) => [a] -> Chan (String, Maybe (Chan [a]), Maybe a) -> IO ()
storage s q = do
    let loop = (`storage` q)
    (req, reply, d) <- readChan q
    print ("processing " ++ show(d))
    case req of
        "add" -> loop ((fromJust d) : s)
        "remove" -> loop (delete (fromJust d) s)
        "get" -> do
            writeChan (fromJust reply) s
            loop s
store s d = writeChan s ("add", Nothing, Just d)
unstore s d = writeChan s ("remove", Nothing, Just d)
request s = do
    chan <- newChan
    writeChan s ("get", Just chan, Nothing)
    readChan chan
The point is that a thread (actor) is managing a list of items and modifies the list according to incoming requests. Since threads are really cheap, I thought this could be a really nice functional alternative.
Of course this is just a prototype (a quick dirty proof of concept).
So my question is:
Is this a "good" way of managing shared mutable variables (in the actor world) ?
Is there already a library for this pattern ? (I already searched but I found nothing)
Regards,
Chris
Here is a quick and dirty example using stm and pipes-network. This will set up a simple server that allows clients to connect and increment or decrement a counter. It will display a very simple status bar showing the current tallies of all connected clients and will remove client tallies from the bar when they disconnect.
First I will begin with the server, and I've generously commented the code to explain how it works:
import Control.Concurrent.STM (STM, atomically)
import Control.Concurrent.STM.TVar
import qualified Data.HashMap.Strict as H
import Data.Foldable (forM_)
import Control.Concurrent (forkIO, threadDelay)
import Control.Monad (unless)
import Control.Monad.Trans.State.Strict
import qualified Data.ByteString.Char8 as B
import Control.Proxy
import Control.Proxy.TCP
import System.IO
main = do
    hSetBuffering stdout NoBuffering
    {- These are the internal data structures. They should be an implementation
       detail and you should never expose these references to the
       "business logic" part of the application. -}
    -- I use nRef to keep track of creating fresh Ints (which identify users)
    nRef <- newTVarIO 0 :: IO (TVar Int)
    {- hMap associates every user (i.e. Int) with a counter
       Notice how I've "striped" the hash map by storing STM references to the
       values instead of storing the values directly. This means that I only
       actually write the hashmap when adding or removing users, which reduces
       contention for the hash map.
       Since each user gets their own unique STM reference for their counter,
       modifying counters does not cause contention with other counters or
       contention with the hash map. -}
    hMap <- newTVarIO H.empty :: IO (TVar (H.HashMap Int (TVar Int)))
    {- The following code makes heavy use of Haskell's pure closures. Each
       'let' binding closes over its current environment, which is safe since
       Haskell is pure. -}
    let {- 'getCounters' is the only server-facing command in our STM API. The
           only permitted operation is retrieving the current set of user
           counters.
           'getCounters' closes over the 'hMap' reference currently in scope so
           that the server never needs to be aware about our internal
           implementation. -}
        getCounters :: STM [Int]
        getCounters = do
            refs <- fmap H.elems (readTVar hMap)
            mapM readTVar refs
        {- 'init' is the only client-facing command in our STM API. It
           initializes the client's entry in the hash map and returns two
           commands: the first command is what the client calls to 'increment'
           their counter and the second command is what the client calls to log
           off and 'delete' their entry.
           Notice that those two returned commands each close over the client's
           unique STM reference so the client never needs to be aware of how
           exactly 'init' is implemented under the hood. -}
        init :: STM (STM (), STM ())
        init = do
            n <- readTVar nRef
            writeTVar nRef $! n + 1
            ref <- newTVar 0
            modifyTVar' hMap (H.insert n ref)
            let incrementRef :: STM ()
                incrementRef = do
                    mRef <- fmap (H.lookup n) (readTVar hMap)
                    forM_ mRef $ \ref -> modifyTVar' ref (+ 1)
                deleteRef :: STM ()
                deleteRef = modifyTVar' hMap (H.delete n)
            return (incrementRef, deleteRef)
    {- Now for the actual program logic. Everything past this point only uses
       the approved STM API (i.e. 'getCounters' and 'init'). If I wanted I
       could factor the above approved STM API into a separate module to enforce
       the encapsulation boundary, but I am lazy. -}
    {- Fork a thread which polls the current state of the counters and displays
       it to the console. There is a way to implement this without polling but
       this gets the job done for now.
       Most of what it is doing is just some simple tricks to reuse the same
       console line instead of outputting a stream of lines. Otherwise it
       would be just:
       forkIO $ forever $ do
           ns <- atomically getCounters
           print ns
    -}
    forkIO $ (`evalStateT` 0) $ forever $ do
        del <- get
        lift $ do
            putStr (replicate del '\b')
            putStr (replicate del ' ' )
            putStr (replicate del '\b')
        ns <- lift $ atomically getCounters
        let str = show ns
        lift $ putStr str
        put $! length str
        lift $ threadDelay 10000
    {- Fork a thread for each incoming connection, which listens to the client's
       commands and translates them into 'STM' actions -}
    serve HostAny "8080" $ \(socket, _) -> do
        (increment, delete) <- atomically init
        {- Right now, just do the dumb thing and convert all keypresses into
           increment commands, with the exception of the 'q' key, which will
           quit -}
        let handler :: (Proxy p) => () -> Consumer p Char IO ()
            handler () = runIdentityP loop
              where
                loop = do
                    c <- request ()
                    unless (c == 'q') $ do
                        lift $ atomically increment
                        loop
        {- This uses my 'pipes' library. It basically is a high-level way to
           say:
           * Read binary packets from the socket no bigger than 4096 bytes
           * Get the first character from each packet and discard the rest
           * Handle the character using the above 'handler' function -}
        runProxy $ socketReadS 4096 socket >-> mapD B.head >-> handler
        {- The above pipeline finishes either when the socket closes or
           'handler' stops looping because it received a 'q'. Either case means
           that the client is done so we log them out using 'delete'. -}
        atomically delete
Next up is the client, which simply opens a connection and forwards all key presses as single packets:
import Control.Monad
import Control.Proxy
import Control.Proxy.Safe
import Control.Proxy.TCP.Safe
import Data.ByteString.Char8 (pack)
import System.IO
main = do
    hSetBuffering stdin NoBuffering
    hSetEcho stdin False
    {- Again, this uses my 'pipes' library. It basically says:
       * Read characters from the console using 'commands'
       * Pack them into a binary format
       * send them to a server running at 127.0.0.1:8080
       This finishes looping when the user types a 'q' or the connection is
       closed for whatever reason.
    -}
    runSafeIO $ runProxy $ runEitherK $
            try . commands
        >-> mapD (\c -> pack [c])
        >-> connectWriteD Nothing "127.0.0.1" "8080"
commands :: (Proxy p) => () -> Producer p Char IO ()
commands () = runIdentityP loop
  where
    loop = do
        c <- lift getChar
        respond c
        unless (c == 'q') loop
It's pretty simple: commands generates a stream of Chars, which then get converted to ByteStrings and then sent as packets to the server.
If you run the server and a few clients and have them each type in a few keys, your server display will output a list showing how many keys each client typed:
[1,6,4]
... and if some of the clients disconnect they will be removed from the list:
[1,4]
Note that the pipes component of these examples will simplify greatly in the upcoming pipes-4.0.0 release, but the current pipes ecosystem still gets the job done as is.
First, I'd definitely recommend using your own specific data type for representing commands. When using (String, Maybe (Chan [a]), Maybe a) a buggy client can crash your actor simply by sending an unknown command or by sending ("add", Nothing, Nothing), etc. I'd suggest something like
data Command a = Add a | Remove a | Get (Chan [a])
Then you can pattern match on commands in storage in a safe way.
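A minimal sketch of the question's storage actor rewritten around such a Command type could look like this. It uses only base and drops the debug printing and the Show constraint; the function names mirror the question's, but the details are an illustration rather than a tested replacement.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.Chan (Chan, newChan, readChan, writeChan)
import Data.List (delete)

data Command a = Add a | Remove a | Get (Chan [a])

-- The actor loop: every command carries exactly the data it needs, so there
-- is no fromJust and no unknown-command case left to crash on.
storage :: Eq a => [a] -> Chan (Command a) -> IO ()
storage s q = do
  cmd <- readChan q
  case cmd of
    Add x     -> storage (x : s) q
    Remove x  -> storage (delete x s) q
    Get reply -> writeChan reply s >> storage s q

newStorage :: Eq a => IO (Chan (Command a))
newStorage = do
  q <- newChan
  _ <- forkIO (storage [] q)
  return q

store, unstore :: Chan (Command a) -> a -> IO ()
store q x = writeChan q (Add x)
unstore q x = writeChan q (Remove x)

-- Ask the actor for its current contents and wait for the reply.
request :: Chan (Command a) -> IO [a]
request q = do
  reply <- newChan
  writeChan q (Get reply)
  readChan reply
```

Since commands from a single thread are processed in the order they were written, a request issued after a store from the same thread observes that store.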
Actors have their advantages, but also I feel that they have some drawbacks. For example, getting an answer from an actor requires sending it a command and then waiting for a reply. And the client can't be completely sure that it gets a reply and that the reply will be of some specific type - you can't say I want only answers of this type (and how many of them) for this particular command.
So as an example I'll give a simple, STM solution. It'd be better to use a hash table or a (balanced tree) set, but since Handle implements neither Ord nor Hashable, we can't use these data structures, so I'll keep using lists.
module ThreadSet (
TSet, add, remove, get
) where
import Control.Monad
import Control.Monad.STM
import Control.Concurrent.STM.TVar
import Data.List (delete)
newtype TSet a = TSet (TVar [a])
add :: (Eq a) => a -> TSet a -> STM ()
add x (TSet v) = readTVar v >>= writeTVar v . (x :)
remove :: (Eq a) => a -> TSet a -> STM ()
remove x (TSet v) = readTVar v >>= writeTVar v . delete x
get :: (Eq a) => TSet a -> STM [a]
get (TSet v) = readTVar v
This module implements an STM-based set of arbitrary elements. You can have multiple such sets and use them together in a single STM transaction that succeeds or fails as a whole. For example
-- | Ensures that there is exactly one element `x` in the set.
add1 :: (Eq a) => a -> TSet a -> STM ()
add1 x v = remove x v >> add x v
This would be difficult with actors: you'd have to add it as another command for the actor, because you can't compose it from existing actions and still have atomicity.
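The same composability extends across several sets. As a small sketch, assuming the stm package, here is a cut-down copy of the set (with hypothetical helpers newTSetIO, members and move) in which an element is moved from one set to another in a single transaction, so no other thread can observe it in both sets or in neither.

```haskell
import Control.Concurrent.STM (STM, TVar, atomically, modifyTVar', newTVarIO, readTVar)
import Data.List (delete)

-- Cut-down copy of the set, for illustration only.
newtype TSet a = TSet (TVar [a])

newTSetIO :: [a] -> IO (TSet a)
newTSetIO xs = TSet <$> newTVarIO xs

add :: a -> TSet a -> STM ()
add x (TSet v) = modifyTVar' v (x :)

remove :: Eq a => a -> TSet a -> STM ()
remove x (TSet v) = modifyTVar' v (delete x)

members :: TSet a -> STM [a]
members (TSet v) = readTVar v

-- Hypothetical composite operation built from the two primitives; it only
-- becomes a transaction when the caller wraps it in 'atomically'.
move :: Eq a => a -> TSet a -> TSet a -> STM ()
move x from to = remove x from >> add x to
```

A caller would run it as atomically (move x a b), possibly combined with further STM actions in the same transaction.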
Update: There is an interesting article explaining why Clojure's designers chose not to use actors. For example, with actors, even if you have many reads and very few writes to a mutable structure, they're all serialized, which can greatly impact performance.
Assume we have an IO action such as
lookupStuff :: InputType -> IO OutputType
which could be something simple such as DNS lookup, or some web-service call against a time-invariant data.
Let's assume that:
The operation never throws any exception and/or never diverges
If it wasn't for the IO monad, the function would be pure, i.e. the result is always the same for equal input parameters
The action is reentrant, i.e. it can be called from multiple threads at the same time safely.
The lookupStuff operation is quite (time-)expensive.
The problem I'm facing is how to properly (and w/o using any unsafe*IO* cheat) implement a reentrant cache, that can be called from multiple threads, and coalesces multiple queries for the same input-parameters into a single request.
I guess I'm after something similar to GHC's blackhole concept for pure computations, but in the IO "calculation" context.
What is the idiomatic Haskell/GHC solution for the stated problem?
Yeah, basically reimplement the logic. Although it seems similar to what GHC is already doing, that's GHC's choice. Haskell can be implemented on VMs that work very differently, so in that sense it isn't already done for you.
But yeah, just use an MVar (Map InputType OutputType) or even an IORef (Map InputType OutputType) (make sure to modify it with atomicModifyIORef), and store the cache in there. If this imperative solution seems wrong, that is because of the "if not for the IO, this function would be pure" constraint. If it were just an arbitrary IO action, then the idea that you would have to keep state in order to know what to execute or not seems perfectly natural. The problem is that Haskell does not have a type for "pure IO" (which, if it depends on a database, is merely behaving purely under certain conditions, which is not the same as being hereditarily pure).
import qualified Data.Map as Map
import Control.Concurrent.MVar
-- takes an IO function and returns a cached version
cache :: (Ord a) => (a -> IO b) -> IO (a -> IO b)
cache f = do
    r <- newMVar Map.empty
    return $ \x -> do
        cacheMap <- takeMVar r
        case Map.lookup x cacheMap of
            Just y -> do
                putMVar r cacheMap
                return y
            Nothing -> do
                y <- f x
                -- store the extended map back into the MVar
                putMVar r (Map.insert x y cacheMap)
                return y
Yeah it's ugly on the inside. But on the outside, look at that! It's just like the type of a pure memoization function, except for it has IO stained all over it.
Here's some code implementing more or less what I was after in my original question:
import Control.Concurrent
import Control.Exception
import Data.Either
import Data.Map (Map)
import qualified Data.Map as Map
import Prelude hiding (catch)
-- |Memoizing wrapper for 'IO' actions
memoizeIO :: Ord a => (a -> IO b) -> IO (a -> IO b)
memoizeIO action = do
    cache <- newMVar Map.empty
    return $ memolup cache action
  where
    -- Lookup helper
    memolup :: Ord a => MVar (Map a (Async b)) -> (a -> IO b) -> a -> IO b
    memolup cache action' args = wait' =<< modifyMVar cache lup
      where
        lup tab = case Map.lookup args tab of
            Just ares' ->
                return (tab, ares')
            Nothing -> do
                ares' <- async $ action' args
                return (Map.insert args ares' tab, ares')
The code above builds upon Simon Marlow's Async abstraction as described in Tutorial: Parallel and Concurrent Programming in Haskell:
-- |Opaque type representing asynchronous results.
data Async a = Async ThreadId (MVar (Either SomeException a))
-- |Construct 'Async' result. Can be waited on with 'wait'.
async :: IO a -> IO (Async a)
async io = do
    var <- newEmptyMVar
    tid <- forkIO ((do r <- io; putMVar var (Right r))
                   `catch` \e -> putMVar var (Left e))
    return $ Async tid var
-- |Extract value from asynchronous result. May block if result is not
-- available yet. Exceptions are returned as 'Left' values.
wait :: Async a -> IO (Either SomeException a)
wait (Async _ m) = readMVar m
-- |Version of 'wait' that raises exception.
wait' :: Async a -> IO a
wait' a = either throw return =<< wait a
-- |Cancels asynchronous computation if not yet completed (non-blocking).
cancel :: Async a -> IO ()
cancel (Async t _) = throwTo t ThreadKilled
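For comparison, the coalescing idea can also be sketched without forking a thread per lookup, by storing one MVar per key and letting the first caller fill it. This is a simplified illustration, not the code above: memoizeCoalesced is a hypothetical name, it assumes the containers package, and it leans on the question's assumption that the action never throws (if it did, later callers for that key would block forever).

```haskell
import Control.Concurrent.MVar
import Data.IORef
import qualified Data.Map.Strict as Map

-- One result cell per key. The table lock is held only while looking up or
-- inserting a cell, never while the expensive action runs.
memoizeCoalesced :: Ord a => (a -> IO b) -> IO (a -> IO b)
memoizeCoalesced action = do
  cache <- newMVar Map.empty
  return $ \x -> do
    (cell, isOwner) <- modifyMVar cache $ \tab ->
      case Map.lookup x tab of
        Just existing -> return (tab, (existing, False))
        Nothing -> do
          fresh <- newEmptyMVar
          return (Map.insert x fresh tab, (fresh, True))
    if isOwner
      then do
        -- The first caller for this key runs the action and publishes it.
        y <- action x
        putMVar cell y
        return y
      else
        -- Everyone else waits for the published result.
        readMVar cell
```

Unlike the simple MVar (Map a b) cache, lookups for different keys can proceed concurrently here, while concurrent lookups for the same key share a single call.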