Message passing concurrency in Haskell with multiple senders and receivers

Message passing concurrency in Haskell with multiple senders and receivers - haskell

I am trying to solve a problem involving multiple senders and receivers, and would like some feedback on whether my approach is on the right track.
Problem: We have N leaders and M followers, who must all be represented by individual threads. Everyone is a dancer, and has an associated "dance card" with the names of 8 different dances. each leader must ask a follower if they can dance a specific dance. Followers wait for invitations from leaders and accept only if they are not already dancing that dance and if they have not agreed to dance with this leader for 2 other dances. If the leaders hears back that their invite was accepted, they move on trying to secure a match for the next dance. Otherwise, they continue trying to find a match for the same dance. At the end, the leaders "dance card" is printed with each dance and the ID of the follower with whom they are dancing that dance.
Approach: I have created two functions: leader and follower. In main, I use forkIO to call leader n times and follower m times. However, I'm running into the issue of how I will keep state (specifically the dancecard). I was thinking of creating a type class "Dancer" and then two instances of it : Leader and Follower. Each leader and each follower would have a unique ID (anything from 1 to N or M). Each would also need an mvar to serve as its own personal mailbox . Leaders would somehow need to "get" a follower's mvar in order to put something in it so that that same follower can take it out and respond yes or no to the invite. Regarding the dancecard, I think it would it be best to incorporate the state monad. For example, when a leader invites a follower to a dance, a follower should be able to look at their dancecard and verify that they don't already have a partner for that dance.

Wow, you already have a typeclass, two instances, and a state monad, and you haven't even settled on the types of your MVars! Things are getting complicated.
I worry that you may be falling into the Haskell-as-Java trap, where you've come up with an object-oriented solution in your head, and you're now trying to translate that directly into Haskell, thinking about your dancers as stateful objects with shared methods wrapped up in a "class", etc., etc.
I would suggest a different approach. Dancers aren't "things"; they're tasks. Implement them as straightforward functions and use argument passing and recursion in place of "state", as is typical for idiomatic Haskell.
Spoilers follow, but here is a simple way to define a "follower" that has an id, responds to requests via a pair of request/response MVars, and maintains a dance card using a recursive core loop. Note that the Follower data type isn't supposed to be a "follower object" (e.g., it has no dance card); it's just a convenient way of documenting the return value from follower which serves as a "handle" for identifying and communicating with the follower task:
type LeaderId = Int
type FollowerId = Int
type Dance = Int
-- |A dance card for a follower with a list of dance/leader pairs.
data Card = Card { getCard :: [(Dance, LeaderId)] } deriving (Show)
emptyCard = Card []
-- |Follower handle giving its id and request/response MVars
data Follower =
Follower { followerId :: FollowerId
, request :: MVar (Dance, LeaderId)
, response :: MVar Bool
}
-- |Create a new follower task with given id.
follower :: FollowerId -> IO Follower
follower followerId_ = do
req <- newEmptyMVar
res <- newEmptyMVar
let loop (Card xs) = do
-- get next request
(dance, leaderId_) <- takeMVar req
case lookup dance xs of
-- if dance is free and we haven't danced too often w/ this leader
Nothing | length (filter ((==leaderId_) . snd) xs) < 2
-- then say yes and update dance card
-> do putMVar res True
loop (Card $ (dance, leaderId_) : xs)
-- otherwise, refuse
_ -> do putMVar res False
loop (Card xs)
forkIO $ loop emptyCard
return $ Follower followerId_ req res
You can create and test out a couple of followers by asking them to dance:
> f1 <- follower 1 -- follower #1
> f2 <- follower 2 -- follower #2
> putMVar (request f1) (1, 10) -- dance #1 w/ leader #10
> takeMVar (response f1)
True -- hooray!
> putMVar (request f1) (1, 14) -- dance #1 w/ leader #14
> takeMVar (response f1)
False -- wah! dance is taken
> putMVar (request f2) (1, 14) -- try different follower
> takeMVar (response f2)
True -- hooray!
>
Note that these particular followers can neither be queried for their dance cards nor told to exit their infinite loops. You don't need that for this application (we only need the dance cards from the leaders and don't care if there are a bunch of stuck lightweight threads when we get the answer), but you could always add a couple of MVars if you did.
Similarly, you should be able to implement a leader as a function with a simple recursive core loop. Note that, if the leader tries to just fill its dance card in order, it doesn't actually need to keep track of the dance card as it goes -- the final dance card (and "core loop") is just a mapM over trying to fill dance slots 1 to 8.
How do you provide the leaders with the ability to request dances from followers? Well, create the full set of followers first and pass a list of follower handles ([Follower]) as an argument to the leader creation function. How do you get dance cards back from the leaders? The leader function should return an MVar for the card, and the main function can mapM takeMVar leadersDanceCards to get the full list of dance cards.

Related

Inner join in persistent or should I use esqueleto?

I have this fragment describing Notification and Notified entities:
Notification
type NotiType
release ReleaseId
date UTCTime
Notified
aboutWhat NotificationId
unread Bool
user UserId
Now I want to write this:
-- | Mark specified notification as already seen by specific user. Note that
-- we use 'ReleaseId' to select notification, so this may result in several
-- notifications marked as “read” if they happen to be about the same
-- release. This is generally what you want.
markAsRead
:: ReleaseId -- ^ Release in question
-> UserId -- ^ User who has seen mentioned release
-> SqlPersistM ()
markAsRead release user = do
ns <- selectKeysList [ NotificationRelease ==. release ] []
updateWhere [ NotifiedAboutWhat <-. ns
, NotifiedUnread ==. True
, NotifiedUser ==. user ]
[ NotifiedUnread =. False ]
This works, but extracting list of notifications as list and then using it to select things in another table… well that's not exactly right. Obviously I need a join here and then I will be able to update everything efficiently.
How to do it in pure persistent? Is it possible and is it a good idea in this case to stay with persistent for this sort of task? Should I use esqueleto instead? It looks like I'll need to learn different DSL to work with it, so I'm not sure whether to switch or not.
How to write markAsRead properly with persistent (if possible)?

As Greg mentioned, Esqueleto is the way to go. You can try reading its main module documentation.
Currently Esqueleto doesn't support joins on UPDATEs. However, you can use subqueries to the same effect.
Untested code to get you started:
-- | Mark specified notification as already seen by specific user. Note that
-- we use 'ReleaseId' to select notification, so this may result in several
-- notifications marked as “read” if they happen to be about the same
-- release. This is generally what you want.
markAsRead
:: ReleaseId -- ^ Release in question
-> UserId -- ^ User who has seen mentioned release
-> SqlPersistM ()
markAsRead release user =
update $ \n -> do
set n [ NotifiedUnread =. val False ]
where_ $
n ^. NotifiedUnread ==. val True &&.
n ^. NotifiedUser ==. val user &&.
n ^. NotifiedAboutWhat `in_`
(subList_select $
from $ \t -> do
where_ $ t ^. NotificationRelease ==. val release
return $ t ^. NotificationId)

Yes Esqueleto if you want to do joins. Persistent works well with embedding data if your DB and your data modelling support that.

Abstracting over repeated application within Monad chains

The Haskell wikibook has an example that shows how to chain lookup commands when trying to find different pieces of connected information throughout a database, seen here:
getTaxOwed :: String -- their name
-> Maybe Double -- the amount of tax they owe
getTaxOwed name =
lookup name phonebook >>=
(\number -> lookup number governmentDatabase) >>=
(\registration -> lookup registration taxDatabase)
and rewritten in do notation:
getTaxOwed name = do
number <- lookup name phonebook
registration <- lookup number governmentDatabase
lookup registration taxDatabase
Now, anytime I see a function repeated more than once I immediately try to think of ways to abstract over its repeated application, but as I haven't used Monads much in practice yet, and as they seem to already be at a pretty high level of abstraction, I didn't know how to approach that in this case.
What are some ways, if any, a coder could abstract over the common pattern above, that is, a call to lookup in every line?
(an aside: is this an appropriate context for the phrase "abstract over"? I felt it captured my meaning, but I'm not sure, and I'd like to make sure I'm using terminology appropriately as a relatively new coder; I looked through other posts which clarified its use and meaning but I still can't figure it out for this particular example)

Big thanks to Carsten for the link to foldM! Credit to them for the insight of this answer.
So, if we use foldM, we can write a function that repeatedly performs a lookup chained through multiple directories that depend upon each previous result. If, thanks to the use of monads, at any point lookup cannot find the current key in a directory, it will terminate, and return Nothing:
lookupALot :: Eq a => a -> [(a,b)] -> Maybe b
lookupALot key directories = foldM lookup key directories
this has output of the form
foldM f k1 [d1, d2, ..., dm] -- k == key, d == directory
==
do
k2 <- f k1 d1
k3 <- f k2 d2
...
f km dm
which is exactly the same structure as
do
number <- lookup name phonebook
registration <- lookup number governmentDatabase
lookup registration taxDatabase
Hence, a more compact way of writing getTaxOwed would be:
getTaxOwed :: String -> Maybe Double
getTaxOwed name = foldM lookup name [phonebook, governmentDatabase, taxDatabase]
Which kinda blows me away! That line of code will find the phone-number associated with a person's name, then check the governmentDatabase with their number for their registration, and finally find their tax information from that registration. Note though, that this will only work for data in the form of [(a,b)], as indicated by the type of lookupALot.

can xmonad's logHook be run at set intervals rather than in (merely) response to layout events?

I'm using dynamicLogWithPP from XMonad.Hooks.DynamicLog together with dzen2 as a status bar under xmonad. One of the things I'd like to have displayed in the bar is the time remaining in the currently playing track in audacious (if any). Getting this information is easy:
audStatus :: Player -> X (Maybe String)
audStatus p = do
info <- liftIO $ tryS $ withPlayer p $ do
ispaused <- paused
md <- getMetadataString
timeleftmillis <- (-) <$> (getCurrentTrack >>= songFrames) <*> time
let artist = md ! "artist"
title = md ! "title"
timeleft = timeleftmillis `quot` 1000
(minutes, seconds) = timeleft `quotRem` 60
disp = artist ++ " - " ++ title ++ " (-"++(show minutes)++":"++(show seconds)++")" -- will be wrong if seconds < 10
audcolor False = dzenColor base0 base03
audcolor True = dzenColor base1 base02
return $ wrap "^ca(1, pms p)" "^ca()" (audcolor ispaused disp)
return $ either (const Nothing) Just info
So I can stick that in ppExtras and it works fine—except it only gets run when the logHook gets run, and that happens only when a suitable event comes down the pike. So the display is potentially static for a long time, until I (e.g.) switch workspaces.
It seems like some people just run two dzen bars, with one getting output piped in from a shell script. Is that the only way to have regular updates? Or can this be done from within xmonad (without getting too crazy/hacky)?
ETA: I tried this, which seems as if it should work better than it does:
create a TChan for updates from XMonad, and another for updates from a function polling Audacious;
set the ppOutput field in the PP structure from DynamicLog to write to the first TChan;
fork the audacious-polling function and have it write to the second TChan;
fork a function to read from both TChans (checking that they aren't empty, first), and combining the output.
Updates from XMonad are read from the channel and processed in a timely fashion, but updates from Audacious are hardly registered at all—every five or so seconds at best. It seems as if some approach along these lines ought to work, though.

I know this is an old question, but I came here looking for an answer to this a few days ago, and I thought I'd share the way I solved it. You actually can do it entirely from xmonad. It's a tiny bit hacky, but I think it's much nicer than any of the alternatives I've come across.
Basically, I used the XMonad.Util.Timer library, which will send an X event after a specified time period (in this case, one second). Then I just wrote an event hook for it, which starts the timer again, and then manually runs the log hook.
I also had to use the XMonad.Util.ExtensibleState library, because Timer uses an id variable to make sure it's responding to the right event, so I have to store that variable between events.
Here's my code:
{-# LANGUAGE DeriveDataTypeable #-}
import qualified XMonad.Util.ExtensibleState as XS
import XMonad.Util.Timer
...
-- wrapper for the Timer id, so it can be stored as custom mutable state
data TidState = TID TimerId deriving Typeable
instance ExtensionClass TidState where
initialValue = TID 0
...
-- put this in your startupHook
-- start the initial timer, store its id
clockStartupHook = startTimer 1 >>= XS.put . TID
-- put this in your handleEventHook
clockEventHook e = do -- e is the event we've hooked
(TID t) <- XS.get -- get the recent Timer id
handleTimer t e $ do -- run the following if e matches the id
startTimer 1 >>= XS.put . TID -- restart the timer, store the new id
ask >>= logHook.config -- get the loghook and run it
return Nothing -- return required type
return $ All True -- return required type
Pretty straightforward. I hope this is helpful to someone.

It cannot be done from within xmonad; xmonad's current threading model is a bit lacking (and so is dzen's). However, you can start a separate process that periodically polls your music player and then use one of the dzen multiplexers (e.g. dmplex) to combine the output from the two processes.
You may also want to look into xmobar and taffybar, which both have better threading stories than dzen does.
With regards to why your proposed TChan solution doesn't work properly, you might want to read the sections "Conventions", "Foreign Imports", and "The Non-Threaded Runtime" at my crash course on the FFI and gtk, keeping in mind that xmonad currently uses GHC's non-threaded runtime. The short answer is that xmonad's main loop makes an FFI call to Xlib that waits for an X event; this call blocks all other Haskell threads from running until it returns.

Updating elements of multiple collections with dynamic functions

Setup:
I have several collections of various data structures witch represent the state of simulated objects in a virtual system. I also have a number of functions that transform (that is create a new copy of the object based on the the original and 0 or more parameters) these objects.
The goal is to allow a user to select some object to apply transformations to (within the rules of the simulation), apply those the functions to those objects and update the collections by replacing the old objects with the new ones.
I would like to be able to build up a function of this type by combining smaller transformations into larger ones. Then evaluate this combined function.
Questions:
How to I structure my program to make this possible?
What kind of combinator do I use to build up a transaction like this?
Ideas:
Put all the collections into one enormous structure and pass this structure around.
Use a state monad to accomplish basically the same thing
Use IORef (or one of its more potent cousins like MVar) and build up an IO action
Use a Functional Reactive Programing Framework
1 and 2 seem like they carry a lot of baggage around especially if I envision eventually moving some of the collections into a database. (Darn IO Monad)
3 seems to work well but starts to look a lot like recreating OOP. I'm also not sure at what level to use the IORef. (e.g IORef (Collection Obj) or Collection (IORef Obj) or data Obj {field::IORef(Type)} )
4 feels the most functional in style, but it also seems to create a lot of code complexity without much payoff in terms of expressiveness.
Example
I have a web store front. I maintain a collections of products with (among other things) the quantity in stock and a price. I also have a collection of users who have credit with the store.
A user comes along ands selects 3 products to buy and goes to check out using store credit. I need to create a new products collection that has the amount in stock for the 3 products reduced, create a new user collection with the users account debited.
This means I get the following:
checkout :: Cart -> ProductsCol -> UserCol -> (ProductsCol, UserCol)
But then life gets more complicated and I need to deal with taxes:
checkout :: Cart -> ProductsCol -> UserCol -> TaxCol
-> (ProductsCol, UserCol, TaxCol)
And then I need to be sure to add the order to the shipping queue:
checkout :: Cart
-> ProductsCol
-> UserCol
-> TaxCol
-> ShipList
-> (ProductsCol, UserCol, TaxCol, ShipList)
And so forth...
What I would like to write is something like
checkout = updateStockAmount <*> applyUserCredit <*> payTaxes <*> shipProducts
applyUserCredit = debitUser <*> creditBalanceSheet
but the type-checker would have go apoplectic on me. How do I structure this store such that the checkout or applyUserCredit functions remains modular and abstract? I cannot be the only one to have this problem, right?

Okay, let's break this down.
You have "update" functions with types like A -> A for various specific types A, which may be derived from partial application, that specify a new value of some type in terms of a previous value. Each such type A should be specific to what that function does, and it should be easy to change those types as the program develops.
You also have some sort of shared state, which presumably contains all the information used by any of the aforementioned update functions. Further, it should be possible to change what the state contains, without significantly impacting anything other than the functions acting directly on it.
Additionally, you want to be able to abstractly combine update functions, without compromising the above.
We can deduce a few necessary features of a straightforward design:
An intermediate layer will be necessary, between the full shared state and the specifics needed by each function, allowing pieces of the state to be projected out and replaced independently of the rest.
The types of the update functions themselves are by definition incompatible with no real shared structure, so to compose them you'll need to first combine each with the intermediate layer portion. This will give you updates acting on the entire state, which can then be composed in the obvious way.
The only operations needed on the shared state as a whole are to interface with the intermediate layer, and whatever may be necessary to maintain the changes made.
This breakdown allows each entire layer to be modular to a large extent; in particular, type classes can be defined to describe the necessary functionality, allowing any relevant instance to be swapped in.
In particular, this essentially unifies your ideas 2 and 3. There's an inherent monadic context of some sort here, and the type class interface suggested would allow multiple approaches, such as:
Make the shared state a record type, store it in a State monad, and use lenses to provide the interface layer.
Make the shared state a record type containing something like an STRef for each piece, and combine field selectors with ST monad update actions to provide the interface layer.
Make the shared state a collection of TChans, with separate threads to read/write them as appropriate to communicate asynchronously with an external data store.
Or any number of other variations.

You can store your state in a record, and use lenses to update pieces of state. This lets you write the individual state updating components as simple, focused functions that may be composed to build more complex checkout functions.
{-# LANGUAGE TemplateHaskell #-}
import Data.Lens.Template
import Data.Lens.Common
import Data.List (foldl')
import Data.Map ((!), Map, adjust, fromList)
type User = String
type Item = String
type Money = Int -- money in pennies
type Prices = Map Item Money
type Cart = (User, [(Item,Int)])
type ProductsCol = Map Item Int
type UserCol = Map User Money
data StoreState = Store { _stock :: ProductsCol
, _users :: UserCol
, msrp :: Prices }
deriving Show
makeLens ''StoreState
updateProducts :: Cart -> ProductsCol -> ProductsCol
updateProducts (_,c) = flip (foldl' destock) c
where destock p' (item,count) = adjust (subtract count) item p'
updateUsers :: Cart -> Prices -> UserCol -> UserCol
updateUsers (name,c) p = adjust (subtract (sum prices)) name
where prices = map (\(itemName, itemCount) -> (p ! itemName) * itemCount) c
checkout :: Cart -> StoreState -> StoreState
checkout c s = (users ^%= updateUsers c (msrp s))
. (stock ^%= updateProducts c)
$ s
test = checkout cart store
where cart = ("Bob", [("Apples", 2), ("Bananas", 6)])
store = Store initialStock initialUsers prices
initialStock = fromList
[("Apples", 20), ("Bananas", 10), ("Lambdas", 1000)]
initialUsers = fromList [("Bob", 20000), ("Mary", 40000)]
prices = fromList [("Apples", 100), ("Bananas", 50), ("Lambdas", 0)]

How to add a finalizer on a TVar

Background
In response to a question, I built and uploaded a bounded-tchan (wouldn't have been right for me to upload jnb's version). If the name isn't enough, a bounded-tchan (BTChan) is an STM channel that has a maximum capacity (writes block if the channel is at capacity).
Recently, I've received a request to add a dup feature like in the regular TChan's. And thus begins the problem.
How the BTChan looks
A simplified (and actually non-functional) view of BTChan is below.
data BTChan a = BTChan
{ max :: Int
, count :: TVar Int
, channel :: TVar [(Int, a)]
, nrDups :: TVar Int
}
Every time you write to the channel you include the number of dups (nrDups) in the tuple - this is an 'individual element counter' which indicates how many readers have gotten this element.
Every reader will decrement the counter for the element it reads then move it's read-pointer to then next element in the list. If the reader decrements the counter to zero then the value of count is decremented to properly reflect available capacity on the channel.
To be clear on the desired semantics: A channel capacity indicates the maximum number of elements queued in the channel. Any given element is queued until a reader of each dup has received the element. No elements should remain queued for a GCed dup (this is the main problem).
For example, let there be three dups of a channel (c1, c2, c3) with capacity of 2, where 2 items were written into the channel then all items were read out of c1 and c2. The channel is still full (0 remaining capacity) because c3 hasn't consumed its copies. At any point in time if all references toc3 are dropped (so c3 is GCed) then the capacity should be freed (restored to 2 in this case).
Here's the issue: let's say I have the following code
c <- newBTChan 1
_ <- dupBTChan c -- This represents what would probably be a pathological bug or terminated reader
writeBTChan c "hello"
_ <- readBTChan c
Causing the BTChan to look like:
BTChan 1 (TVar 0) (TVar []) (TVar 1) --> -- newBTChan
BTChan 1 (TVar 0) (TVar []) (TVar 2) --> -- dupBTChan
BTChan 1 (TVar 1) (TVar [(2, "hello")]) (TVar 2) --> -- readBTChan c
BTChan 1 (TVar 1) (TVar [(1, "hello")]) (TVar 2) -- OH NO!
Notice at the end the read count for "hello" is still 1? That means the message is not considered gone (even though it will get GCed in the real implementation) and our count will never decrement. Because the channel is at capacity (1 element maximum) the writers will always block.
I want a finalizer created each time dupBTChan is called. When a dupped (or original) channel is collected all elements remaining to be read on that channel will get the per-element count decremented, also the nrDups variable will be decremented. As a result, future writes will have the correct count (a count that doesn't reserve space for variables not-read by GCed channels).
Solution 1 - Manual Resource Management (what I want to avoid)
JNB's bounded-tchan actually has manual resource management for this reason. See the cancelBTChan. I'm going for something harder for the user to get wrong (not that manual management isn't the right way to go in many cases).
Solution 2 - Use exceptions by blocking on TVars (GHC can't do this how I want)
EDIT this solution, and solution 3 which is just a spin-off, does not work! Due to bug 5055 (WONTFIX) the GHC compiler sends exceptions to both blocked threads, even though one is sufficient (which is theoretically determinable, but not practical with the GHC GC).
If all the ways to get a BTChan are IO, we can forkIO a thread that reads/retries on an extra (dummy) TVar field unique to the given BTChan. The new thread will catch an exception when all other references to the TVar are dropped, so it will know when to decrement the nrDups and individual element counters. This should work but forces all my users to use IO to get their BTChans:
data BTChan = BTChan { ... as before ..., dummyTV :: TVar () }
dupBTChan :: BTChan a -> IO (BTChan a)
dupBTChan c = do
... as before ...
d <- newTVarIO ()
let chan = BTChan ... d
forkIO $ watchChan chan
return chan
watchBTChan :: BTChan a -> IO ()
watchBTChan b = do
catch (atomically (readTVar (dummyTV b) >> retry)) $ \e -> do
case fromException e of
BlockedIndefinitelyOnSTM -> atomically $ do -- the BTChan must have gotten collected
ls <- readTVar (channel b)
writeTVar (channel b) (map (\(a,b) -> (a-1,b)) ls)
readTVar (nrDup b) >>= writeTVar (nrDup b) . (-1)
_ -> watchBTChan b
EDIT: Yes, this is a poor mans finalizer and I don't have any particular reason to avoid using addFinalizer. That would be the same solution, still forcing use of IO afaict.
Solution 3: A cleaner API than solution 2, but GHC still doesn't support it
Users start a manager thread by calling initBTChanCollector, which will monitor a set of these dummy TVars (from solution 2) and do the needed clean-up. Basically, it shoves the IO into another thread that knows what to do via a global (unsafePerformIOed) TVar. Things work basically like solution 2, but the creation of BTChan's can still be STM. Failure to run initBTChanCollector would result in an ever-growing list (space leak) of tasks as the process runs.
Solution 4: Never allow discarding BTChans
This is akin to ignoring the problem. If the user never drops a dupped BTChan then the issue disappears.
Solution 5
I see ezyang's answer (totally valid and appreciated), but really would like to keep the current API just with a 'dup' function.
** Solution 6**
Please tell me there's a better option.
EDIT:
I implemented solution 3 (totally untested alpha release) and handled the potential space leak by making the global itself a BTChan - that chan should probably have a capacity of 1 so forgetting to run init shows up really quick, but that's a minor change. This works in GHCi (7.0.3) but that seems to be incidental. GHC throws exceptions to both blocked threads (the valid one reading the BTChan and the watching thread) so my if you are blocked reading a BTChan when another thread discards it's reference then you die.

Here is another solution: require all accesses to the the bounded channel duplicate to be bracketed by a function that releases its resources on exit (by an exception or normally). You can use a monad with a rank-2 runner to prevent duplicated channels from leaking out. It's still manual, but the type system makes it a lot harder to do naughty things.
You really don't want to rely on true IO finalizers, because GHC gives no guarantees about when a finalizer may be run: for all you know it may wait until the end of the program before running the finalizer, which means you're deadlocked until then.

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string