Trying to figure out best practices to handle repeated actions in Haskell using Control.Concurrent.Timer. Can't find good examples.
Suppose I have repeated action that must be done in application. Say it's fetching data from some webpage.
import Control.Concurrent.Suspend.Lifted
import Control.Concurrent.Timer
main :: IO ()
main = do
url <- fetchUrl :: IO Url
doSomethingWithUrl url
and I would like url to be fetched repeatly inside main so the data is always fresh. How could it be done?
Your needs sound like they fit with an IORef or MVar. Typed and not tested code is:
import Data.IORef (readIORef, writeIORef, newIORef)
import Control.Concurrent (threadDelay,forkIO)
import Control.Monad (void)
main :: IO ()
main = do
ref <- newIORef mempty {- or default value -}
_ <- forkIO (updatePeriodically fetchUrl ref)
scotty $ do
liftIO (readIORef ref) >>= etcEtc
updatePeriodically :: (IO SomeType) -> IORef SomeType -> IO ()
updatePeriodically op ref =
let update = op >>= writeIORef ref
forever $ threadDelay periodInMicroSeconds >> void (forkIO update)
This isn't perfect, if one of the fetchUrl calls is badly delayed then its write could clobber a newer call that returned quickly. That's probably a minor concern but you could handle it with a counter and compare and swap operation.
Related
I'm working with dbus in haskell, and I'm having difficulties figuring out how to export dbus methods that perform stateful operations. Below is a fully fleshed out example to illustrate where I'm stuck.
Let's say you're writing a counter service with dbus. When the service starts, the counter is initially at 0. The service defines a dbus API that exposes a count method, which returns the current value of the counter, and an update method, which increments that counter, and returns the new value.
Here's a pseudocodey implementation of the behavior I just described, using a message-passing-style of communication:
-- | Updates the given integer.
update :: Int -> Int
update = (+1)
-- | main function with message-passing-style communication
mainLoop :: Int -> IO Int
mainLoop state = do
case receiveMessage of
"update" -> do -- increment / update counter
sendReply $ update state
mainLoop $ update state -- recurse
"count" -> do -- return counter value
sendReply state
mainLoop state
"stop" -> do -- stop the counting service
exitSuccess
main :: IO ()
main = do
mainLoop 0
However, dbus uses method-calls, not message passing. So, I need to be able to export a count and update method that behaves the same way as in my message-passing example.
The stub we'll work with is something like this:
-- | Updates the given integer.
update :: Int -> Int
update = (+1)
main :: IO ()
main = do
let initialState = 0
dbus <- connectSession
export dbus "/org/counter/CounterService"
[ autoMethod "org.counter.CounterService" "update" ({-- call update? --})
, autoMethod "org.counter.CounterService" "count" ({-- return state? --}) ]
And here lies my question: How should I encode the missing {-- call update? --} and {-- return state? --} functions?
I know I can use an MVar to create global mutable state, and then just make the functions read from that, but I want to avoid mutability as much as possible here. I think I can do this with the Reader/State monad somehow, maybe by sneaking a get/ask into the functions, but I don't know how to handle the types with respect to DBus.
Ultimately, the dbus package only allows you to export methods of type Method, which has a methodHandler field that returns the monadic value:
DBusR Reply === ReaderT Client IO Reply
and there's no room in there for you to squeeze in your own StateT monad. You could export a Property instead, but that doesn't help you, since the fields of that type also involve IO actions to get and set the property.
So, maintaining your state in IO, most likely as an MVar, is going to be pretty much unavoidable.
You could try to separate your pure-ish "core" from the IO shell. One way to do it (as per #HTNW's comment) is to write the core in State:
type Counter = Int
update :: State Counter ()
update = modify (+1)
count :: State Counter Int
count = get
and lift it to IO with something like:
import Data.Tuple (swap)
runStateIO :: State s a -> MVar s -> IO a
runStateIO act s = modifyMVar s (return . swap . runState act)
main = do
...
s <- newMVar 0
let run act = runStateIO act s
export dbus "/com/example/CounterService"
defaultInterface
{ interfaceName = "com.example.CounterService"
, interfaceMethods =
[ autoMethod "update" (run update)
, autoMethod "count" (run count) ]
}
(I think I'm using a newer version of dbus here than you, since the API is a little different -- I'm testing with dbus-1.2.16, FYI.)
One potential drawback is that this is going to lock the state MVar on every method call, even if the call doesn't need the state or needs only read-only access. DBus services are typically pretty low-traffic with method calls that are intended to complete quickly, so I don't think this is a problem in practice.
Anyway, a here's a full working program, which I tested with:
dbus-send --print-reply --session --dest=com.example /com/example/CounterService com.example.CounterService.update
dbus-send --print-reply --session --dest=com.example /com/example/CounterService com.example.CounterService.count
The program:
{-# LANGUAGE OverloadedStrings #-}
{-# OPTIONS_GHC -Wall #-}
import System.IO
import System.Exit
import Data.Int
import DBus.Client
import Data.Tuple
import Control.Concurrent
import Control.Monad.State
type Counter = Int32
update :: State Counter ()
update = modify (+1)
count :: State Counter Int32
count = get
runStateIO :: State s a -> MVar s -> IO a
runStateIO act s = modifyMVar s (return . swap . runState act)
main :: IO ()
main = do
dbus <- connectSession
requestResult <- requestName dbus "com.example" []
when (requestResult /= NamePrimaryOwner) $ do
hPutStrLn stderr "Name \"com.example\" not available"
exitFailure
s <- newMVar 0
let run act = runStateIO act s
export dbus "/com/example/CounterService"
defaultInterface
{ interfaceName = "com.example.CounterService"
, interfaceMethods =
[ autoMethod "update" (run update)
, autoMethod "count" (run count) ]
}
forever $ threadDelay 60000000
The following program creates two threads running concurrently, that each sleep for a random amount of time, before printing a line of text to stdout.
import Control.Concurrent
import Control.Monad
import System.Random
randomDelay t = randomRIO (0, t) >>= threadDelay
printer str = forkIO . forever $ do
randomDelay 1000000 -- μs
putStrLn str
main = do
printer "Hello"
printer "World"
return ()
The output generally looks something like
>> main
Hello
World
World
Hello
WoHrelld
o
World
Hello
*Interrupted
>>
How do you ensure that only one thread can write to stdout at a time? This seems like the kind of thing that STM should be good at, but all STM transactions must have the type STM a for some a, and an action that prints to the screen has type IO a, and there doesn't seem to be a way to embed IO into STM.
The way to handle output with STM is to have an output queue that is shared between all threads and which is processed by a single thread.
import Control.Concurrent
import Control.Concurrent.STM
import Control.Monad
import System.Random
randomDelay t = randomRIO (0, t) >>= threadDelay
printer queue str = forkIO . forever $ do
randomDelay 1000000 -- μs
atomically $ writeTChan queue str
prepareOutputQueue = do
queue <- newTChanIO
forkIO . forever $ atomically (readTChan queue) >>= putStrLn
return queue
main = do
queue <- prepareOutputQueue
printer queue "Hello"
printer queue "World"
return ()
Locking in the way you're describing isn't possible usingSTM. This is because STM is based on optimistic locking and so every transaction must be restartable at any point. If you embedded an IO operation into STM, it could be executed multiple times.
Probably the easiest solution for this problem is to use a MVar as a lock:
import Control.Concurrent
import Control.Concurrent.MVar
import Control.Monad
import System.Random
randomDelay t = randomRIO (0, t) >>= threadDelay
printer lock str = forkIO . forever $ do
randomDelay 1000000
withMVar lock (\_ -> putStrLn str)
main = do
lock <- newMVar ()
printer lock "Hello"
printer lock "World"
return ()
In this solution the lock is passed as an argument to printer.
Some people prefer to declare the lock as a top-level global variable, but currently this requires unsafePerformIO and relies on properties of GHC that AFAIK aren't part of the Haskell Language Report (in particular, it relies on the fact that a global variable with non-polymorphic type is evaluated at most once during the execution of a program).
A bit of research, based on Petr Pudlák's answer shows that there is a module Control.Concurrent.Lock in the concurrent-extra package that provides an abstraction around MVar ()-based locks.
The solution using that library is
import Control.Concurrent
import qualified Control.Concurrent.Lock as Lock
import Control.Monad
import System.Random
randomDelay t = randomRIO (0, t) >>= threadDelay
printer lock str = forkIO . forever $ do
randomDelay 1000
Lock.with lock (putStrLn str)
main = do
lock <- Lock.new
printer lock "Hello"
printer lock "World"
return ()
This is the example using global lock as mentioned by Petr.
import Control.Concurrent
import Control.Monad
import System.Random
import Control.Concurrent.MVar (newMVar, takeMVar, putMVar, MVar)
import System.IO.Unsafe (unsafePerformIO)
{-# NOINLINE lock #-}
lock :: MVar ()
lock = unsafePerformIO $ newMVar ()
printer x = forkIO . forever $ do
randomDelay 100000
() <- takeMVar lock
let atomicPutStrLn str = putStrLn str >> putMVar lock ()
atomicPutStrLn x
randomDelay t = randomRIO (0, t) >>= threadDelay
main = do
printer "Hello"
printer "World"
return ()
You can actually implement a lock using STM if you want, though an MVar will almost certainly perform better.
newtype Lock = Lock (TVar Status)
data Status = Locked | Unlocked
newLocked :: IO Lock
newLocked = Lock <$> newTVarIO Locked
newUnlocked :: IO Lock
newUnlocked = Lock <$> newTVarIO Unlocked
-- | Acquire a lock.
acquire :: Lock -> IO ()
acquire (Lock tv) = atomically $
readTVar tv >>= \case
Locked -> retry
Unlocked -> writeTVar tv Locked
-- | Try to acquire a lock. If the operation succeeds,
-- return `True`.
tryAcquire :: Lock -> IO Bool
tryAcquire (Lock tv) = atomically $
readTVar tv >>= \case
Locked -> pure False
Unlocked -> True <$ writeTVar tv Locked
-- | Release a lock. This version throws an exception
-- if the lock is unlocked.
release :: Lock -> IO ()
release (Lock tv) = atomically $
readTVar tv >>= \case
Unlocked -> throwSTM DoubleRelease
Locked -> writeTVar tv Unlocked
data DoubleRelease = DoubleRelease deriving Show
instance Exception DoubleRelease where
displayException ~DoubleRelease = "Attempted to release an unlocked lock."
-- | Release a lock. This version does nothing if
-- the lock is unlocked.
releaseIdempotent :: Lock -> IO ()
releaseIdempotent (Lock tv) = atomically $ writeTVar tv Unlocked
-- | Get the status of a lock.
isLocked :: Lock -> IO Status
isLocked (Lock tv) = readTVarIO tv
acquire/release pairs need careful masking and exception handling, much like primitive MVar operations. The documentation suggests, but does not actually state, that STM operations are interruptible when they retry; assuming this is true, the same approach used for withMVar will work here. Note: I've opened a GHC ticket to document retry interruptibility.
I've written a small server which accepts registrations as POST requests and persists them by appending them to a file. As soon as I put this server under load (I use Apache JMeter with 50 concurrent threads and a repeat count of 10, and the post consists of one field with ~7k of text data), I get lots of "resource busy, file is locked" errors:
02/Nov/2013:18:07:11 +0100 [Error#yesod-core] registrations.txt: openFile: resource busy (file is locked) #(yesod-core-1.2.4.2:Yesod.Core.Class.Yesod ./Yesod/Core/Class/Yesod.hs:485:5)
Here is a stripped-down version of the code:
{-# LANGUAGE QuasiQuotes, TemplateHaskell, MultiParamTypeClasses, OverloadedStrings, TypeFamilies #-}
import Yesod
import Text.Hamlet
import Control.Applicative ((<$>), (<*>))
import Control.Monad.IO.Class (liftIO)
import Data.Text (Text, pack, unpack)
import Data.String
import System.IO (withFile, IOMode(..), hPutStrLn)
data Server = Server
data Registration = Registration
{ text :: Text
}
deriving (Show, Read)
mkYesod "Server" [parseRoutes|
/reg RegR POST
|]
instance Yesod Server
instance RenderMessage Server FormMessage where
renderMessage _ _ = defaultFormMessage
postRegR :: Handler Html
postRegR = do
result <- runInputPost $ Registration
<$> ireq textField "text"
liftIO $ saveRegistration result
defaultLayout [whamlet|<p>#{show result}|]
saveRegistration :: Registration -> IO ()
saveRegistration r = withFile "registrations.txt" AppendMode (\h -> hPutStrLn h $ "+" ++ show r)
main :: IO ()
main = warp 8080 Server
I compiled the code on purpose without -threaded, and the OS shows only a single thread running. Nonetheless it looks to me like the requests are not completely serialised, and a new request is already handled before the old one has been written to disk.
Could you tell me how I can avoid the error message and ensure that all requests are handled successfully? Performance is not an issue yet.
It's perfectly OK to write to a Handle from several threads. In fact, Handles have MVars inside them to prevent weird concurrent behaviour. What you probably want is not to handle [sic] MVars by hand (which can lead to deadlock if, for instance, a handler throws an exception) but lift the withFile call outside the individual handler threads. The file stays open all the time - opening it on each request would be slow anyway.
I don't know much about Yesod, but I'd recommend something like this (probably doesn't compile):
data Server = Server { handle :: Handle }
postRegR :: Handler Html
postRegR = do
h <- handle `fmap` getYesod
result <- runInputPost $ Registration
<$> ireq textField "text"
liftIO $ saveRegistration h result
defaultLayout [whamlet|<p>#{show result}|]
saveRegistration :: Handle -> Registration -> IO ()
saveRegistration h r = hPutStrLn h $ "+" ++ show r
main :: IO ()
main = withFile "registrations.txt" AppendMode $ \h -> warp 8080 (Server h)
-- maybe there's a better way?
Aside: if you wanted to file to be written asynchronously you could write to a queue (if it were a log file or something), but in your use case you probably want to let the user know if their registration failed, so I'd recommend staying with this form.
Even without -threaded the Haskell runtime will have several "green threads" running cooperatively. You need to use Control.Concurrent to limit access to the file because you cannot have several threads writing to it at once.
The easiest way is to have an MVar () in your Server and have each request "take" the unit from the MVar before opening the file and then put it back after the file operation has been completed. You can use bracket to ensure that the lock is released even if writing the file fails. E.g. something like
import Control.Concurrent
import Control.Exception (bracket_)
type Lock = MVar ()
data Server = Server { fileLock :: Lock }
saveRegistration :: Registration -> Lock -> IO ()
saveRegistration r lock = bracket_ acquire release updateFile where
acquire = takeMVar lock
release = putMVar lock ()
updateFile =
withFile "registrations.txt" AppendMode (\h -> hPutStrLn h $ "+" ++ show r)
I'm working on a haskell network application and I use the actor pattern to manage multithreading. One thing I came across is how to store for example a set of client sockets/handles. Which of course must be accessible for all threads and can change when clients log on/off.
Since I'm coming from the imperative world I thought about some kind of lock-mechanism but when I noticed how ugly this is I thought about "pure" mutability, well actually it's kind of pure:
import Control.Concurrent
import Control.Monad
import Network
import System.IO
import Data.List
import Data.Maybe
import System.Environment
import Control.Exception
newStorage :: (Eq a, Show a) => IO (Chan (String, Maybe (Chan [a]), Maybe a))
newStorage = do
q <- newChan
forkIO $ storage [] q
return q
newHandleStorage :: IO (Chan (String, Maybe (Chan [Handle]), Maybe Handle))
newHandleStorage = newStorage
storage :: (Eq a, Show a) => [a] -> Chan (String, Maybe (Chan [a]), Maybe a) -> IO ()
storage s q = do
let loop = (`storage` q)
(req, reply, d) <- readChan q
print ("processing " ++ show(d))
case req of
"add" -> loop ((fromJust d) : s)
"remove" -> loop (delete (fromJust d) s)
"get" -> do
writeChan (fromJust reply) s
loop s
store s d = writeChan s ("add", Nothing, Just d)
unstore s d = writeChan s ("remove", Nothing, Just d)
request s = do
chan <- newChan
writeChan s ("get", Just chan, Nothing)
readChan chan
The point is that a thread (actor) is managing a list of items and modifies the list according to incoming requests. Since thread are really cheap I thought this could be a really nice functional alternative.
Of course this is just a prototype (a quick dirty proof of concept).
So my question is:
Is this a "good" way of managing shared mutable variables (in the actor world) ?
Is there already a library for this pattern ? (I already searched but I found nothing)
Regards,
Chris
Here is a quick and dirty example using stm and pipes-network. This will set up a simple server that allows clients to connect and increment or decrement a counter. It will display a very simple status bar showing the current tallies of all connected clients and will remove client tallies from the bar when they disconnect.
First I will begin with the server, and I've generously commented the code to explain how it works:
import Control.Concurrent.STM (STM, atomically)
import Control.Concurrent.STM.TVar
import qualified Data.HashMap.Strict as H
import Data.Foldable (forM_)
import Control.Concurrent (forkIO, threadDelay)
import Control.Monad (unless)
import Control.Monad.Trans.State.Strict
import qualified Data.ByteString.Char8 as B
import Control.Proxy
import Control.Proxy.TCP
import System.IO
main = do
hSetBuffering stdout NoBuffering
{- These are the internal data structures. They should be an implementation
detail and you should never expose these references to the
"business logic" part of the application. -}
-- I use nRef to keep track of creating fresh Ints (which identify users)
nRef <- newTVarIO 0 :: IO (TVar Int)
{- hMap associates every user (i.e. Int) with a counter
Notice how I've "striped" the hash map by storing STM references to the
values instead of storing the values directly. This means that I only
actually write the hashmap when adding or removing users, which reduces
contention for the hash map.
Since each user gets their own unique STM reference for their counter,
modifying counters does not cause contention with other counters or
contention with the hash map. -}
hMap <- newTVarIO H.empty :: IO (TVar (H.HashMap Int (TVar Int)))
{- The following code makes heavy use of Haskell's pure closures. Each
'let' binding closes over its current environment, which is safe since
Haskell is pure. -}
let {- 'getCounters' is the only server-facing command in our STM API. The
only permitted operation is retrieving the current set of user
counters.
'getCounters' closes over the 'hMap' reference currently in scope so
that the server never needs to be aware about our internal
implementation. -}
getCounters :: STM [Int]
getCounters = do
refs <- fmap H.elems (readTVar hMap)
mapM readTVar refs
{- 'init' is the only client-facing command in our STM API. It
initializes the client's entry in the hash map and returns two
commands: the first command is what the client calls to 'increment'
their counter and the second command is what the client calls to log
off and delete
'delete' command.
Notice that those two returned commands each close over the client's
unique STM reference so the client never needs to be aware of how
exactly 'init' is implemented under the hood. -}
init :: STM (STM (), STM ())
init = do
n <- readTVar nRef
writeTVar nRef $! n + 1
ref <- newTVar 0
modifyTVar' hMap (H.insert n ref)
let incrementRef :: STM ()
incrementRef = do
mRef <- fmap (H.lookup n) (readTVar hMap)
forM_ mRef $ \ref -> modifyTVar' ref (+ 1)
deleteRef :: STM ()
deleteRef = modifyTVar' hMap (H.delete n)
return (incrementRef, deleteRef)
{- Now for the actual program logic. Everything past this point only uses
the approved STM API (i.e. 'getCounters' and 'init'). If I wanted I
could factor the above approved STM API into a separate module to enforce
the encapsulation boundary, but I am lazy. -}
{- Fork a thread which polls the current state of the counters and displays
it to the console. There is a way to implement this without polling but
this gets the job done for now.
Most of what it is doing is just some simple tricks to reuse the same
console line instead of outputting a stream of lines. Otherwise it
would be just:
forkIO $ forever $ do
ns <- atomically getCounters
print ns
-}
forkIO $ (`evalStateT` 0) $ forever $ do
del <- get
lift $ do
putStr (replicate del '\b')
putStr (replicate del ' ' )
putStr (replicate del '\b')
ns <- lift $ atomically getCounters
let str = show ns
lift $ putStr str
put $! length str
lift $ threadDelay 10000
{- Fork a thread for each incoming connection, which listens to the client's
commands and translates them into 'STM' actions -}
serve HostAny "8080" $ \(socket, _) -> do
(increment, delete) <- atomically init
{- Right now, just do the dumb thing and convert all keypresses into
increment commands, with the exception of the 'q' key, which will
quit -}
let handler :: (Proxy p) => () -> Consumer p Char IO ()
handler () = runIdentityP loop
where
loop = do
c <- request ()
unless (c == 'q') $ do
lift $ atomically increment
loop
{- This uses my 'pipes' library. It basically is a high-level way to
say:
* Read binary packets from the socket no bigger than 4096 bytes
* Get the first character from each packet and discard the rest
* Handle the character using the above 'handler' function -}
runProxy $ socketReadS 4096 socket >-> mapD B.head >-> handler
{- The above pipeline finishes either when the socket closes or
'handler' stops looping because it received a 'q'. Either case means
that the client is done so we log them out using 'delete'. -}
atomically delete
Next up is the client, which simply opens a connections and forwards all key presses as single packets:
import Control.Monad
import Control.Proxy
import Control.Proxy.Safe
import Control.Proxy.TCP.Safe
import Data.ByteString.Char8 (pack)
import System.IO
main = do
hSetBuffering stdin NoBuffering
hSetEcho stdin False
{- Again, this uses my 'pipes' library. It basically says:
* Read characters from the console using 'commands'
* Pack them into a binary format
* send them to a server running at 127.0.0.1:8080
This finishes looping when the user types a 'q' or the connection is
closed for whatever reason.
-}
runSafeIO $ runProxy $ runEitherK $
try . commands
>-> mapD (\c -> pack [c])
>-> connectWriteD Nothing "127.0.0.1" "8080"
commands :: (Proxy p) => () -> Producer p Char IO ()
commands () = runIdentityP loop
where
loop = do
c <- lift getChar
respond c
unless (c == 'q') loop
It's pretty simple: commands generates a stream of Chars, which then get converted to ByteStrings and then sent as packets to the server.
If you run the server and a few clients and have them each type in a few keys, your server display will output a list showing how many keys each client typed:
[1,6,4]
... and if some of the clients disconnect they will be removed from the list:
[1,4]
Note that the pipes component of these examples will simplify greatly in the upcoming pipes-4.0.0 release, but the current pipes ecosystem still gets the job done as is.
First, I'd definitely recommend using your own specific data type for representing commands. When using (String, Maybe (Chan [a]), Maybe a) a buggy client can crash your actor simply by sending an unknown command or by sending ("add", Nothing, Nothing), etc. I'd suggest something like
data Command a = Add a | Remove a | Get (Chan [a])
Then you can pattern match on commands in storage in a save way.
Actors have their advantages, but also I feel that they have some drawbacks. For example, getting an answer from an actor requires sending it a command and then waiting for a reply. And the client can't be completely sure that it gets a reply and that the reply will be of some specific type - you can't say I want only answers of this type (and how many of them) for this particular command.
So as an example I'll give a simple, STM solution. It'd be better to use a hash table or a (balanced tree) set, but since Handle implements neither Ord nor Hashable, we can't use these data structures, so I'll keep using lists.
module ThreadSet (
TSet, add, remove, get
) where
import Control.Monad
import Control.Monad.STM
import Control.Concurrent.STM.TVar
import Data.List (delete)
newtype TSet a = TSet (TVar [a])
add :: (Eq a) => a -> TSet a -> STM ()
add x (TSet v) = readTVar v >>= writeTVar v . (x :)
remove :: (Eq a) => a -> TSet a -> STM ()
remove x (TSet v) = readTVar v >>= writeTVar v . delete x
get :: (Eq a) => TSet a -> STM [a]
get (TSet v) = readTVar v
This module implements a STM based set of arbitrary elements. You can have multiple such sets and use them together in a single STM transaction that succeeds or fails at once. For example
-- | Ensures that there is exactly one element `x` in the set.
add1 :: (Eq a) => a -> TSet a -> STM ()
add1 x v = remove x v >> add x v
This would be difficult with actors, you'd have to add it as another command for the actor, you can't compose it of existing actions and still have atomicity.
Update: There is an interesting article explaining why Clojure designers chose not to use actors. For example, using actors, even if you have many reads and only very little writes to a mutable structure, they're all serialized, which can greatly impact performance.
Assume we have an IO action such as
lookupStuff :: InputType -> IO OutputType
which could be something simple such as DNS lookup, or some web-service call against a time-invariant data.
Let's assume that:
The operation never throws any exception and/or never diverges
If it wasn't for the IO monad, the function would be pure, i.e. the result is always the same for equal input parameters
The action is reentrant, i.e. it can be called from multiple threads at the same time safely.
The lookupStuff operation is quite (time-)expensive.
The problem I'm facing is how to properly (and w/o using any unsafe*IO* cheat) implement a reentrant cache, that can be called from multiple threads, and coalesces multiple queries for the same input-parameters into a single request.
I guess I'm after something similiar as GHC's blackhole-concept for pure computations but in the IO "calculation" context.
What is the idiomatic Haskell/GHC solution for the stated problem?
Yeah, basically reimplement the logic. Although it seems similar to what GHC is already doing, that's GHC's choice. Haskell can be implemented on VMs that work very differently, so in that sense it isn't already done for you.
But yeah, just use an MVar (Map InputType OutputType) or even an IORef (Map InputType OutputType) (make sure to modify with atomicModifyIORef), and just store the cache in there. If this imperative solution seems wrong, it's the "if not for the IO, this function would be pure" constraint. If it were just an arbitrary IO action, then the idea that you would have to keep state in order to know what to execute or not seems perfectly natural. The problem is that Haskell does not have a type for "pure IO" (which, if it depends on a database, it is just behaving pure under certain conditions, which is not the same as being a hereditarily pure).
import qualified Data.Map as Map
import Control.Concurrent.MVar
-- takes an IO function and returns a cached version
cache :: (Ord a) => (a -> IO b) -> IO (a -> IO b)
cache f = do
r <- newMVar Map.empty
return $ \x -> do
cacheMap <- takeMVar r
case Map.lookup x cacheMap of
Just y -> do
putMVar r cacheMap
return y
Nothing -> do
y <- f x
putMVar (Map.insert x y cacheMap)
return y
Yeah it's ugly on the inside. But on the outside, look at that! It's just like the type of a pure memoization function, except for it has IO stained all over it.
Here's some code implementing more or less what I was after in my original question:
import Control.Concurrent
import Control.Exception
import Data.Either
import Data.Map (Map)
import qualified Data.Map as Map
import Prelude hiding (catch)
-- |Memoizing wrapper for 'IO' actions
memoizeIO :: Ord a => (a -> IO b) -> IO (a -> IO b)
memoizeIO action = do
cache <- newMVar Map.empty
return $ memolup cache action
where
-- Lookup helper
memolup :: Ord a => MVar (Map a (Async b)) -> (a -> IO b) -> a -> IO b
memolup cache action' args = wait' =<< modifyMVar cache lup
where
lup tab = case Map.lookup args tab of
Just ares' ->
return (tab, ares')
Nothing -> do
ares' <- async $ action' args
return (Map.insert args ares' tab, ares')
The code above builds upon Simon Marlow's Async abstraction as described in Tutorial: Parallel and Concurrent Programming in Haskell:
-- |Opaque type representing asynchronous results.
data Async a = Async ThreadId (MVar (Either SomeException a))
-- |Construct 'Async' result. Can be waited on with 'wait'.
async :: IO a -> IO (Async a)
async io = do
var <- newEmptyMVar
tid <- forkIO ((do r <- io; putMVar var (Right r))
`catch` \e -> putMVar var (Left e))
return $ Async tid var
-- |Extract value from asynchronous result. May block if result is not
-- available yet. Exceptions are returned as 'Left' values.
wait :: Async a -> IO (Either SomeException a)
wait (Async _ m) = readMVar m
-- |Version of 'wait' that raises exception.
wait' :: Async a -> IO a
wait' a = either throw return =<< wait a
-- |Cancels asynchronous computation if not yet completed (non-blocking).
cancel :: Async a -> IO ()
cancel (Async t _) = throwTo t ThreadKilled