Here's an example Haskell FRP program using the reactive-banana library. I'm only just starting to feel my way with Haskell, and especially haven't quite got my head around what FRP means. I'd really appreciate some critique of the code below
{-# LANGUAGE DeriveDataTypeable #-}
module Main where
{-
Example FRP/zeromq app.
The idea is that messages come into a zeromq socket in the form "id state". The state is of each id is tracked until it's complete.
-}
import Control.Monad
import Data.ByteString.Char8 as C (unpack)
import Data.Map as M
import Data.Maybe
import Reactive.Banana
import System.Environment (getArgs)
import System.ZMQ
data Msg = Msg {mid :: String, state :: String}
deriving (Show, Typeable)
type IdMap = Map String String
-- | Deserialize a string to a Maybe Msg
fromString :: String -> Maybe Msg
fromString s =
case words s of
(x:y:[]) -> Just $ Msg x y
_ -> Nothing
-- | Map a message to a partial operation on a map
-- If the 'state' of the message is "complete" the operation is a delete
-- otherwise it's an insert
toMap :: Msg -> IdMap -> IdMap
toMap msg = case msg of
Msg id_ "complete" -> delete id_
_ -> insert (mid msg) (state msg)
main :: IO ()
main = do
(socketHandle,runSocket) <- newAddHandler
args <- getArgs
let sockAddr = case args of
[s] -> s
_ -> "tcp://127.0.0.1:9999"
putStrLn ("Socket: " ++ sockAddr)
network <- compile $ do
recvd <- fromAddHandler socketHandle
let
-- Filter out the Nothings
justs = filterE isJust recvd
-- Accumulate the partially applied toMap operations
counter = accumE M.empty $ (toMap . fromJust <$> justs)
-- Print the contents
reactimate $ fmap print counter
actuate network
-- Get a socket and kick off the eventloop
withContext 1 $ \ctx ->
withSocket ctx Sub $ \sub -> do
connect sub sockAddr
subscribe sub ""
linkSocketHandler sub runSocket
-- | Recieve a message, deserialize it to a 'Msg' and call the action with the message
linkSocketHandler :: Socket a -> (Maybe Msg -> IO ()) -> IO ()
linkSocketHandler s runner = forever $ do
receive s [] >>= runner . fromString . C.unpack
There's a gist here: https://gist.github.com/1099712.
I'd particularly welcome any comments around whether this is a "good" use of accumE, (I'm unclear of this function will traverse the whole event stream each time although I'm guessing not).
Also I'd like to know how one would go about pulling in messages from multiple sockets - at the moment I have one event loop inside a forever. As a concrete example of this how would I add second socket (a REQ/REP pair in zeromq parlance) to query to the current state of the IdMap inside counter?
(Author of reactive-banana speaking.)
Overall, your code looks fine to me. I don't actually understand why you are using reactive-banana in the first place, but you'll have your reasons. That said, if you are looking for something like Node.js, remember that Haskell's leightweight threads make it unnecessary to use an event-based architecture.
Addendum: Basically, functional reactive programming is useful when you have a variety of different inputs, states and output that must work together with just the right timing (think GUIs, animations, audio). In contrast, it's overkill when you are dealing with many essentially independent events; these are best handled with ordinary functions and the occasional state.
Concerning the individual questions:
"I'd particularly welcome any comments around whether this is a "good" use of accumE, (I'm unclear of this function will traverse the whole event stream each time although I'm guessing not)."
Looks fine to me. As you guessed, the accumE function is indeed real-time; it will only store the current accumulated value.
Judging from your guess, you seem to be thinking that whenever a new event comes in, it will travel through the network like a firefly. While this does happen internally, it is not how you should think about functional reactive programming. Rather, the right picture is this: the result of fromAddHandler is the complete list of input events as they will happen. In other words, you should think that recvd contains the ordered list of each and every event from the future. (Of course, in the interest of your own sanity, you shouldn't try to look at them before their time has come. ;-)) The accumE function simply transforms one list into another by traversing it once.
I will need to make this way of thinking more clear in the documentation.
"Also I'd like to know how one would go about pulling in messages from multiple sockets - at the moment I have on event loop inside a forever. As a concrete example of this how would I add second socket (a REQ/REP pair in zeromq parlance) to query to the current state of the IdMap inside counter?"
If the receive function does not block, you can simply call it twice on different sockets
linkSocketHandler s1 s2 runner1 runner2 = forever $ do
receive s1 [] >>= runner1 . fromString . C.unpack
receive s2 [] >>= runner2 . fromString . C.unpack
If it does block, you will need to use threads, see also the section Handling Multiple TCP Streams in the book Real World Haskell. (Feel free to ask a new question on this, as it is outside the scope of this one.)
Related
Intention: Small application to learn Haskell: Downloads a wikipedia-article, then downloads all articles linked from it, then downloads all articles linked from them, and so on... until a specified recursion depth is reached. The result is saved to a file.
Approach: Use a StateT to keep track of the download queue, to download an article and to update the queue. I build a list IO [WArticle] recursively and then print it.
Problem: While profiling I find that total memory in use is proportional to number of articles downloaded.
Analysis: By literature I'm lead to believe this is a laziness and/or strictness issue. BangPatterns reduced the memory consumed but didn't solve proportionality. Furthermore, I know all articles are downloaded before the file output is started.
Possible solutions:
1) The function getNextNode :: StateT CrawlState IO WArticle (below) already has IO. One solution would be to just do the file writing in it and only return the state. It would mean the file is written to in very small chunks though. Doesn't feel very Haskell..
2) Have the function buildHelper :: CrawlState -> IO [WArticle] (below) return [IO WArticle]. Though I wouldn't know how to rewrite that code and have been advised against it in the comments.
Are any of these proposed solutions better than I think they are or are there better alternatives?
import GetArticle (WArticle, getArticle, wa_links, wiki2File) -- my own
type URL = Text
data CrawlState =
CrawlState ![URL] ![(URL, Int)]
-- [Completed] [(Queue, depth)]
-- Called by user
buildDB :: URL -> Int -> IO [WArticle]
buildDB startURL recursionDepth = buildHelper cs
where cs = CrawlState [] [(startURL, recursionDepth)]
-- Builds list recursively
buildHelper :: CrawlState -> IO [WArticle]
buildHelper !cs#(CrawlState _ queue) = {-# SCC "buildHelper" #-}
if null queue
then return []
else do
(!article, !cs') <- runStateT getNextNode cs
rest <- buildHelper cs'
return (article:rest)
-- State manipulation
getNextNode :: StateT CrawlState IO WArticle
getNextNode = {-# SCC "getNextNode" #-} do
CrawlState !parsed !queue#( (url, depth):queueTail ) <- get
article <- liftIO $ getArticle url
put $ CrawlState (url:parsed) (queueTail++ ( if depth > 1
then let !newUrls = wa_links article \\ parsed
!newUrls' = newUrls \\ map fst queue
in zip newUrls' (repeat (depth-1))
else []))
return article
startUrl = pack "https://en.wikipedia.org/wiki/Haskell_(programming_language)"
recursionDepth = 3
main :: IO ()
main = {-# SCC "DbMain" #-}
buildDB startUrl recursionDepth
>>= return . wiki2File
>>= writeFile "savedArticles.txt"
Full code at https://gitlab.com/mattias.br/sillyWikipediaSpider. Current version limited to only download the first eight links from each page to save time. Without changing it download 55 pages at ~600 MB heap usage.
Thanks for any help!
2) Is [IO WArticle] want I want in this case?
Not quite. The problem is that some of the IO WArticle actions depend on the result of a previous action: the links to future pages reside in previously obtained pages. [IO Warticle] can't provide that: it is pure in the sense that you can always find an action in the list without executing the previous actions.
What we need is a kind of "effectful list" that lets us extract articles one by one, progressively performing the neccessary effects, but not forcing us to completely generate the list in one go.
There are several libraries that provide these kinds of "effectful lists": streaming, pipes, conduit. They define monad transformers that extend a base monad with the ability to yield intermediate values before returning a final result. Usually the final result is of a type different from the values that are yielded; it might be simply unit ().
Note: The Functor, Applicative and Monad instances for these libraries differ from the corresponding instances for pure lists. The Functor instances map over the resulting final value, not over the intermediate values which are yielded. To map over the yielded values, they provide separate functions. And The Monad instances sequence effectful lists, instead of trying all combinations. To try all combinations, they provide separate functions.
Using the streaming library, we could modify buildHelper to something like this:
import Streaming
import qualified Streaming.Prelude as S
buildHelper :: CrawlState -> Stream (Of WArticle) IO ()
buildHelper !cs#(CrawlState _ queue) =
if null queue
then return []
else do (article, cs') <- liftIO (runStateT getNextNode cs)
S.yield article
buildHelper cs'
And then we could use functions like mapM_ (from Streaming.Prelude, not the one from Control.Monad!) to process the articles one by one, as they are generated.
Adding a further explaination and code building upon the answer of danidiaz. Here's the final code:
import Streaming
import qualified Streaming.Prelude as S
import System.IO (IOMode (WriteMode), hClose, openFile)
buildHelper :: CrawlState -> Stream (Of WArticle) IO ()
buildHelper cs#( CrawlState _ queue ) =
if null queue
then return ()
else do
(article, cs') <- liftIO (runStateT getNextNode cs)
S.yield article
buildHelper cs'
main :: IO ()
main = do outFileHandle <- openFile filename WriteMode
S.toHandle outFileHandle . S.show . buildHelper $
CrawlState [] [(startUrl, recursionDepth)]
hClose outFileHandle
outFileHandle is a usual file output handle.
S.toHandle takes a stream of String and writes them to the specified handle.
S.show maps show :: WArticle -> String over the stream.
An elegant solution that creates a lazy stream even though it is produced by a series of IO actions (namely downloading websites) and writes it to a file as results become available. On my machine it still uses a lot of memory (relative to the task) during execution but never exceeds 450 MB.
Since sodium has been deprecated by the author I'm trying to port my code to reactive-banana. However, there seem to be some incongruencies between the two that I'm having a hard time overcomming.
For example, in sodium it was easy to retrieve the current value of a behaviour:
retrieve :: Behaviour a -> IO a
retrieve b = sync $ sample b
I don't see how to do this in reactive-banana
(The reason I want this is because I'm trying to export the behaviour as a dbus property. Properties can be queried from other dbus clients)
Edit: Replaced the word "poll" as it was misleading
If you have a Behaviour modelling the value of your property, and you have an Event modelling the incoming requests for the property's value, then you can just use (<#) :: Behavior b -> Event a -> Event b1 to get a new event occurring at the times of your incoming requests with the value the property has at that time). Then you can transform that into the actual IO actions you need to take to reply to the request and use reactimate as usual.
1 https://hackage.haskell.org/package/reactive-banana-1.1.0.0/docs/Reactive-Banana-Combinators.html#v:-60--64-
For conceptual/architectural reasons, Reactive Banana has functions from Event to Behavior, but not vice versa, and it makes sense too, given th nature and meaning of FRP. I'm quite sure you can write a polling function, but instead you should consider changing the underlying code to expose events instead.
Is there a reason you can't change your Behavior into an Event? If not, that would be a good way to go about resolving your issue. (It might in theory even reveal a design shortcoming you have been overlooking so far.)
The answer seems to be "it's sort of possible".
sample corresponds to valueB, but there is no direct equivalent to sync.
However, it can be re-implemented with the help of execute:
module Sync where
import Control.Monad.Trans
import Data.IORef
import Reactive.Banana
import Reactive.Banana.Frameworks
data Network = Network { eventNetwork :: EventNetwork
, run :: MomentIO () -> IO ()
}
newNet :: IO Network
newNet = do
-- Create a new Event to handle MomentIO actions to be executed
(ah, call) <- newAddHandler
network <- compile $ do
globalExecuteEV <- fromAddHandler ah
-- Set it up so it executes MomentIO actions passed to it
_ <- execute globalExecuteEV
return ()
actuate network
return $ Network { eventNetwork = network
, run = call -- IO Action to fire the event
}
-- To run a MomentIO action within the context of the network, pass it to the
-- event.
sync :: Network -> MomentIO a -> IO a
sync Network{run = call} f = do
-- To retrieve the result of the action we set up an IORef
ref <- newIORef (error "Network hasn't written result to ref")
-- (`call' passes the do-block to the event)
call $ do
res <- f
-- Put the result into the IORef
liftIO $ writeIORef ref res
-- and read it back once the event has finished firing
readIORef ref
-- Example
main :: IO ()
main = do
net <- newNet -- Create an empty network
(bhv1, set1) <- sync net $ newBehavior (0 :: Integer)
(bhv2, set2) <- sync net $ newBehavior (0 :: Integer)
set1 3
set2 7
let sumB = (liftA2 (+) bhv1 bhv2)
print =<< sync net (valueB sumB)
set1 5
print =<< sync net (valueB sumB)
return ()
I've so far avoided ever needing unsafePerformIO, but this might have to change today.... I would like to see if the community agrees, or if someone has a better solution.
I have a library which needs to use some config data stored in a bunch of files. This data is guaranteed static (during the run), but needs to be in files that can (on very rare occasions) be edited by an end user who can not compile Haskell programs. (The details are uninportant, but think of "/etc/mime.types" as a pretty good approximation. It is a large almost static data file used throughout many programs).
If this weren't a library I would just use the IO monad.... But because it is a library which is called throughout my code, it literally forces a bubbling up of the IO monad through pretty much everything I have written in multiple modules! Although I need to do a one time read of the data files, this low level call is effetively pure, so this is a pretty unacceptable outcome.
FYI, I plan to also wrap the call in unsafeInterleaveIO, so that only files that are needed will be loaded. My code will look something like this....
dataDir="<path to files>"
datafiles::[FilePath]
datafiles =
unsafePerformIO $
unsafeInterleaveIO $
map (dataDir </>)
<$> filter (not . ("." `isPrefixOf`))
<$> getDirectoryContents dataDir
fileData::[String]
fileData = unsafePerformIO $ unsafeInterleaveIO $ sequence $ readFile <$> datafiles
Given that the data read is referentially transparent, I am pretty sure that unsafePerformIO is safe (this has been discussed in many place, such as "Use of unsafePerformIO appropriate?"). Still, though, if there is a better way, I would love to hear about it.
UPDATE-
In response to Anupam's comment....
There are two reasons why I can't break up the lib into IO and non IO parts.
First, the amount of data is large, and I don't want to read it all into memory at once. Remember that IO is always read strictly.... This is the reason that I need to put in the unsafeInterleaveIO call, to make it lazy. IMHO, once you use unsafeInterleaveIO, you might as well use unsafePerformIO, as the risk is already there.
Second, breaking out the IO specific parts just substitutes the bubbling up of the IO monad with the bubbling up of the IO read code, as well as the passing around of the data (I might actually choose to pass around the data using the state monad anyway, so it really isn't an improvement to substitute the IO monad for the state monad everywhere). This wouldn't be so bad if the low level function itself wasn't effectively pure (ie- think of my /etc/mime.types example above, and imagine a Haskell extensionToMimeType function, which is basically pure, but needs to get the database data from the file.... Suddenly everything from low to high in the stack needs to call or pass through a readMimeData::IO String. Why should each main even need to care about the library choice of a submodule many levels deep?).
I agree with Anupam Jain, you would be better off reading these data files at a somewhat higher level, in IO, and then passing the data in them through the rest of your program purely.
You could, for example, put the functions that need the results of fileData into Reader [String], so that they can just ask for the results as needed (or some Reader Config, where Config holds these strings and whatever else you need).
A sketch of what I'm suggesting follows:
type AppResult = String
fileData :: IO [String]
fileData = undefined -- read the files
myApp :: String -> Reader [String] AppResult
myApp s = do
files <- ask
return undefined -- do whatever with s and config
main = do
config <- fileData
return $ runReader (myApp "test") config
I gather that you don't want to read all the data at once, because that would be costly. And maybe you don't really know up-front what files you will need to load, so loading all of them at the start would be wasteful.
Here's an attempt at a solution. It requires you to work inside a free monad and relegate the side-effecting operations to an interpreter. Some preliminary imports:
{-# LANGUAGE OverloadedStrings #-}
module Main where
import qualified Data.ByteString as B
import Data.Monoid
import Data.List
import Data.Functor.Compose
import Control.Applicative
import Control.Monad
import Control.Monad.Free
import System.IO
We define a functor for the free monad. It will offer a value p do the interpreter and continue the computation after receiving a value b:
type LazyLoad p b = Compose ((,) p) ((->) b)
A convenience function to request the loading of a file:
lazyLoad :: FilePath -> Free (LazyLoad FilePath B.ByteString) B.ByteString
lazyLoad path = liftF $ Compose (path,id)
A dummy interpreter function that reads "file contents" from stdin:
interpret :: Free (LazyLoad FilePath B.ByteString) a -> IO a
interpret = iterM $ \(Compose (path,next)) -> do
putStrLn $ "Enter the contents for file " <> path <> ":"
B.hGetLine stdin >>= next
Some silly example functions:
someComp :: B.ByteString -> B.ByteString
someComp b = "[" <> b <> "]"
takesAwhile :: Int
takesAwhile = foldl' (+) 0 $ take 400000000 $ intersperse (negate 1) $ repeat 1
An example program:
main :: IO ()
main = do
r <- interpret $ do
r1 <- someComp <$> lazyLoad "file1"
r2 <- return takesAwhile
if (r2 == 1)
then return r1
else someComp <$> lazyLoad "file2"
putStrLn . show $ r
When executed, this program will request a line, spend some time computing takesAwhile and only then request another line.
If want to allow different kinds of "requests", this solution could be extended with something like Data types à la carte so that each function only needs to know about about the precise effects it requires.
If you are content with allowing only one type of request, you could also use Clients and Servers from Pipes.Core instead of the free monad.
Sometimes I want to run a maximum amount of IO actions in parallel at once for network-activity, etc. I whipped up a small concurrent thread function which works well with https://gist.github.com/810920, but this isn't really a pool as all IO actions must finish before others can start.
The type of what I'm looking for would be something like:
runPool :: Int -> [IO a] -> IO [a]
and should be able to operate on finite and infinite lists.
The pipes package looks like it would be able to achieve this quite well, but I feel there is probably a similar solution to the gist I have provided just using mvars, etc, from the haskell-platform.
Has anyone encountered an idiomatic solution without any heavy dependencies?
You need a thread pool, if you want something short, you could get inspiration from Control.ThreadPool (from the control-engine package which also provide more general functions), for instance threadPoolIO is just :
threadPoolIO :: Int -> (a -> IO b) -> IO (Chan a, Chan b)
threadPoolIO nr mutator = do
input <- newChan
output <- newChan
forM_ [1..nr] $
\_ -> forkIO (forever $ do
i <- readChan input
o <- mutator i
writeChan output o)
return (input, output)
It use two Chan for communication with the outside but that's usually what you want, it really help writing code that don't mess up.
If you absolutely want to wrap it up in a function of your type you can encapsulate the communication too :
runPool :: Int -> [IO a] -> IO [a]
runPool n as = do
(input, output) <- threadPoolIO n (id)
forM_ as $ writeChan input
sequence (repeat (length as) $ readChan output)
This won't keep the order of your actions, is that a problem (it's easy to correct by transmitting the index of the action or just using an array instead to store the responses) ?
Note : the n threads will stay alive forever with this simplistic version, adding a "killAll" returned action to threadPoolIO would resolve this problem handily if you intend to create and trash several of those pool in a long running application (if not, given the weight of threads in Haskell, it's probably not worth the bother).
Note that this function works on finite lists only, that's because IO is normally strict so you can't start to process elements of IO [a] before the whole list is produced, if you really want that you'll have either to use lazy IO with unsafeInterleaveIO (maybe not the best idea) or completely change your model and use something like conduits to stream your results.
I recently started reading about the Go programming language and I found the channel variables a very appealing concept. Is it possible to emulate the same concept in Haskell? Maybe to have a data type Channel a and a monad structure to enable mutable state and functions that work like the keyword go.
I'm not very good in concurrent programming and a simple channel passing mechanism like this in Haskell would really make my life easier.
EDIT
People asked me to clarify what kind of Go's patterns I was interested in translating to Haskell. So Go has channel variables that are first class and can be passed around and returned by functions. I can read and write to these channels, and so communicate easily between routines that can run concurrently. Go also has a go keyword, that according to the language spec initiates the execution of a function concurrently as an independent thread and continues to execute the code without waiting.
The exact pattern I'm interested in is something like this (Go's syntax is weird - variables are declared by varName varType instead of the usual inverted way - but I think it is readable):
func generateStep(ch chan int) {
//ch is a variable of type chan int, which is a channel that comunicate integers
for {
ch <- randomInteger() //just sends random integers in the channel
}
func filter(input, output chan int) {
state int
for {
step <- input //reads an int from the input channel
newstate := update(state, step) //update the variable with some update function
if criteria(newstate, state) {
state = newstate // if the newstate pass some criteria, accept the update
}
output <- state //pass it to the output channel
}
}
func main() {
intChan := make(chan int)
mcChan := make(chan int)
go generateStep(intChan) // execute the channels concurrently
go filter(intChan, mcChan)
for i:=0; i<numSteps; i++ {
x <- mcChan // get values from the filtered channel
accumulateStats(x) // calculate some statistics
}
printStatisticsAbout(x)
}
My primary interest is to do Monte Carlo simulations, in which I generate configurations sequentially by trying to modify the current state of the system and accepting the modification if it satisfies some criteria.
The fact the using those channel stuff I could write a very simple, readable and small Monte Carlo simulation that would run in parallel in my multicore processor really impressed me.
The problem is that Go have some limitations (specially, it lacks polymorphism in the way I'm accustomed to in Haskell), and besides that, I really like Haskell and don't wanna trade it away. So the question is if there's some way to use some mechanics that looks like the code above to do a concurrent simulation in Haskell easily.
EDIT(2, context):
I'm not learned in Computer Science, specially in concurrency. I'm just a guy who creates simple programs to solve simple problems in my daily research routine in a discipline not at all related to CS. I just find the way Haskell works interesting and like to use it to do my little chores.
I never heard about alone pi-calculus or CSP channels. Sorry if the question seems ill posed, it's probably my huge-ignorance-about-the-matter's fault.
You are right, I should be more specific about what pattern in Go I'd like to replicate in Haskell, and I'll try to edit the question to be more specific. But don't expect profound theoretical questions. The thing is just that, from the few stuff I read and coded, it seems Go have a neat way to do concurrency (and in my case this just means that my job of making all my cores humming with numerical calculations is easier), and if I could use a similar syntax in Haskell I'd be glad.
I think what you are looking for is Control.Concurrent.Chan from Base. I haven't found it to be any different from go's chans other then the obvious haskellifications. Channels aren't something that is special to go, have a look at the wiki page about it.
Channels are part of a more general concept called communicating sequential processes (CSP), and if you want to do programming in the style of CSP in Haskell you might want to take a look at the Communicating Haskell Processes (CHP) package.
CHP is only one way of doing concurrency in Haskell, take a look at the Haskellwiki concurrency page for more information. I think your use case might be best written using Data Parrallel Haskell, however that is currently a work in progress, so you might want to use something else for now.
Extending HaskellElephant's answer, Control.Concurrent.Chan is the way to go for channels and Control.Concurrent's forkIO can emulate the go keyword. To make the syntax a bit more similar to Go, this set of aliases can be used:
import Control.Concurrent (forkIO)
import Control.Concurrent.Chan (newChan, readChan, writeChan)
import Control.Concurrent.MVar (newMVar, swapMVar, readMVar)
data GoChan a = GoChan { chan :: Chan a, closed :: MVar Bool }
go :: IO () -> IO ThreadId
go = forkIO
make :: IO (GoChan a)
make = do
ch <- newChan
cl <- newMVar False
return $ GoChan ch cl
get :: GoChan a -> IO a
get ch = do
cl <- readMVar $ closed ch
if cl
then error "Can't read from closed channel!"
else readChan $ chan ch
(=->) :: a -> GoChan a -> IO ()
v =-> ch = do
cl <- readMVar $ closed ch
if cl
then error "Can't write to closed channel!"
else writeChan (chan ch) v
forRange :: GoChan a -> (a -> IO b) -> IO [b]
forRange ch func = fmap reverse $ range_ ch func []
where range_ ch func acc = do
cl <- readMVar $ closed ch
if cl
then return ()
else do
v <- get ch
func v
range_ ch func $ v : acc
close :: GoChan a -> IO ()
close ch = do
swapMVar (closed ch) True
return ()
This can be used like so:
import Control.Monad
generate :: GoChan Int -> IO ()
generate c = do
forM [1..100] (=-> c)
close c
process :: GoChan Int -> IO ()
process c = forRange c print
main :: IO ()
main = do
c <- make
go $ generate c
process c
(Warning: untested code)