Tying the Knot with a State monad - haskell

I'm working on a Haskell project that involves tying a big knot: I'm parsing a serialized representation of a graph, where each node is at some offset into the file, and may reference another node by its offset. So I need to build up a map from offsets to nodes while parsing, which I can feed back to myself in a do rec block.
I have this working, and kinda-sorta-reasonably abstracted into a StateT-esque monad transformer:
{-# LANGUAGE DoRec, GeneralizedNewtypeDeriving #-}
import qualified Control.Monad.State as S
data Knot s = Knot { past :: s, future :: s }
newtype RecStateT s m a = RecStateT (S.StateT (Knot s) m a) deriving
  ( Alternative
  , Applicative
  , Functor
  , Monad
  , MonadCont
  , MonadError e
  , MonadFix
  , MonadIO
  , MonadPlus
  , MonadReader r
  , MonadTrans
  , MonadWriter w )
runRecStateT :: RecStateT s m a -> Knot s -> m (a, Knot s)
runRecStateT (RecStateT st) = S.runStateT st
tie :: MonadFix m => RecStateT s m a -> s -> m (a, s)
tie m s = do
  rec (a, Knot s' _) <- runRecStateT m (Knot s s')
  return (a, s')
get :: Monad m => RecStateT s m (Knot s)
get = RecStateT S.get
put :: Monad m => s -> RecStateT s m ()
put s = RecStateT $ S.modify $ \ ~(Knot _ s') -> Knot s s'
The tie function is where the magic happens: the call to runRecStateT produces a value and a state, and I feed that state back in as its own future. Note that get allows you to read from both the past and future states, but put only allows you to modify the "present."
Question 1: Does this seem like a decent way to implement this knot-tying pattern in general? Or better still, has somebody implemented a general solution to this, that I overlooked when snooping through Hackage? I beat my head against the Cont monad for a while, since it seemed possibly more elegant (see similar post from Dan Burton), but I just couldn't work it out.
Totally subjective Question 2: I'm not totally thrilled with the way my calling code ends up looking:
do
  Knot past future <- get
  let {- ... -} = past
      {- ... -} = future
      node = {- ... -}
  put $ {- ... -}
  return node
Implementation details are omitted here, obviously; the important point is that I have to get the past and future states, pattern-match them inside a let binding (or explicitly make the previous pattern lazy) to extract whatever I care about, then build my node, update my state and finally return the node. Seems unnecessarily verbose, and I particularly dislike how easy it is to accidentally make the pattern that extracts the past and future states strict. So, can anybody think of a nicer interface?

I've been playing around with stuff, and I think I've come up with something... interesting. I call it the "Seer" monad, and it provides (aside from Monad operations) two primitive operations:
see :: Monoid s => Seer s s
send :: Monoid s => s -> Seer s ()
and a run operation:
runSeer :: Monoid s => Seer s a -> a
The way this monad works is that see allows a seer to see everything, and send allows a seer to "send" information to all other seers for them to see. Whenever any seer performs the see operation, they are able to see all of the information that has been sent, and all of the information that will be sent. In other words, within a given run, see will always produce the same result no matter where or when you call it. Another way of saying it is that see is how you get a working reference to the "tied" knot.
This is actually very similar to just using fix, except that all of the sub-parts are added incrementally and implicitly, rather than explicitly. Obviously, seers will not work correctly in the presence of a paradox, and sufficient laziness is required. For example, see >>= send may cause an explosion of information, trapping you in a time loop.
A dumb example:
import Control.Seer
import qualified Data.Map as M
import Data.Map (Map, (!))
bar :: Seer (Map Int Char) String
bar = do
  m <- see
  send (M.singleton 1 $ succ (m ! 2))
  send (M.singleton 2 'c')
  return [m ! 1, m ! 2]
As I said, I've just been toying around, so I have no idea if this is any better than what you've got, or if it's any good at all! But it's nifty, and relevant, and if your "knot" state is a Monoid, then it just might be useful to you. Fair warning: I built Seer by using a Tardis.
https://github.com/DanBurton/tardis/blob/master/Control/Seer.hs
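For reference, here is a minimal sketch of how a Seer like that can be built on top of the tardis package (my guess at the shape of it, using getPast/getFuture/modifyForwards/modifyBackwards from Control.Monad.Tardis; the linked file is the authoritative version and may differ in details such as the mappend order):
{-# LANGUAGE GeneralizedNewtypeDeriving #-}
import Control.Monad.Tardis
newtype Seer s a = Seer (Tardis s s a)
  deriving (Functor, Applicative, Monad)
-- Everything that has been sent so far plus everything that will be sent.
see :: Monoid s => Seer s s
see = Seer $ (<>) <$> getPast <*> getFuture
-- Make the value visible to every other seer, in both time directions.
send :: Monoid s => s -> Seer s ()
send s = Seer $ modifyForwards (<> s) >> modifyBackwards (s <>)
runSeer :: Monoid s => Seer s a -> a
runSeer (Seer t) = evalTardis t (mempty, mempty)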

I wrote up an article on this topic entitled Assembly: Circular Programming with Recursive do, where I describe two methods for building an assembler using knot tying. Like your problem, an assembler has to be able to resolve the addresses of labels that may occur later in the file.
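For a flavor of the technique, here is a hedged miniature in the same spirit (my own simplification, not the article's code): the label table produced by a single traversal is fed back into that very traversal, so a jump can target a label that only appears later.
import qualified Data.Map as M
data Instr = Label String | Jmp String | Nop
-- Produces (address, resolved jump target), with 0 as the dummy target for Nop.
assemble :: [Instr] -> [(Int, Int)]
assemble prog = code
  where
    (code, labels) = go 0 prog       -- 'labels' is consumed by 'resolve' below
    resolve l      = labels M.! l    -- may point "into the future"
    go _  []               = ([], M.empty)
    go pc (Label l : rest) = let (cs, ls) = go pc rest
                             in  (cs, M.insert l pc ls)
    go pc (Jmp l : rest)   = let (cs, ls) = go (pc + 1) rest
                             in  ((pc, resolve l) : cs, ls)
    go pc (Nop : rest)     = let (cs, ls) = go (pc + 1) rest
                             in  ((pc, 0) : cs, ls)
-- assemble [Jmp "end", Nop, Label "end", Nop] == [(0,2),(1,0),(2,0)]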

Regarding the implementation, I would make it a composition of a Reader monad (for the future) and a State monad (for past/present). The reason is that you set your future only once (in tie) and then don't change it.
{-# LANGUAGE DoRec, GeneralizedNewtypeDeriving #-}
import Control.Monad.State
import Control.Monad.Reader
import Control.Applicative
newtype RecStateT s m a = RecStateT (StateT s (ReaderT s m) a) deriving
  ( Alternative
  , Applicative
  , Functor
  , Monad
  , MonadPlus
  )
tie :: MonadFix m => RecStateT s m a -> s -> m (a, s)
tie (RecStateT m) s = do
  rec (a, s') <- flip runReaderT s' $ flip runStateT s m
  return (a, s')
getPast :: Monad m => RecStateT s m s
getPast = RecStateT get
getFuture :: Monad m => RecStateT s m s
getFuture = RecStateT ask
putPresent :: Monad m => s -> RecStateT s m ()
putPresent = RecStateT . put
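A hypothetical usage sketch (mine, not part of the answer), just to show the knot closing: each entry stores a pointer to the entry for the next key, looked up in the final map via getFuture; the lookups stay unevaluated until tie has closed the loop.
import Control.Monad (forM_)
import qualified Data.Map as M
import Data.Map (Map)
demo :: IO ((), Map Int String)
demo = tie go M.empty
  where
    go = forM_ [1 .. 3 :: Int] $ \k -> do
      future <- getFuture
      past   <- getPast
      -- stays a thunk until after the knot is tied
      let label = show k ++ maybe "" ("->" ++) (M.lookup (k + 1) future)
      putPresent (M.insert k label past)
-- ghci> snd <$> demo
-- fromList [(1,"1->2->3"),(2,"2->3"),(3,"3")]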
Regarding your second question, it'd help to know your dataflow (i.e. to have a minimal example of your code). It's not true that strict patterns always lead to loops. It's true that you need to be careful so as not to create a non-producing loop, but the exact restrictions depend on what and how you're building.

I had a similar problem recently, but I chose a different approach. A recursive data structure can be represented as the type-level fixed point of a data type functor. Loading the data can then be split into two parts:
1. Load the data into a structure that references other nodes only by some kind of identifier. In the example it's Loader Int (NodeF Int), which constructs a map of values of type NodeF Int Int.
2. Tie the knot by creating a recursive data structure, replacing the identifiers with actual data. In the example the resulting data structures have type Fix (NodeF Int), and they are later converted to Node Int for convenience.
It lacks proper error handling etc., but the idea should be clear from that.
-- Public Domain
import Control.Monad
import Data.Map (Map)
import qualified Data.Map as Map
import Data.Maybe (fromJust)
-- Fixed point operator on types and catamorphism/anamorphism methods
-- for constructing/deconstructing them:
newtype Fix f = Fix { unfix :: f (Fix f) }
catam :: Functor f => (f a -> a) -> (Fix f -> a)
catam f = f . fmap (catam f) . unfix
anam :: Functor f => (a -> f a) -> (a -> Fix f)
anam f = Fix . fmap (anam f) . f
anam' :: Functor f => (a -> f a) -> (f a -> Fix f)
anam' f = Fix . fmap (anam f)
-- The loader itself
-- A representation of a loader. Type parameter 'k' represents the keys by
-- which the nodes are represented. Type parameter 'v' represents a functor
-- data type representing the values.
data Loader k v = Loader (Map k (v k))
-- | Creates an empty loader.
empty :: Loader k v
empty = Loader $ Map.empty
-- | Adds a new node into a loader.
update :: (Ord k) => k -> v k -> Loader k v -> Loader k v
update k v = update' k (const v)
-- | Modifies a node in a loader.
update' :: (Ord k) => k -> (Maybe (v k) -> (v k)) -> Loader k v -> Loader k v
update' k f (Loader m) = Loader $ Map.insertWith (const (f . Just)) k (f Nothing) $ m
-- | Does the actual knot-tying. Creates a new data structure
-- where the references to nodes are replaced by the actual data.
tie :: (Ord k, Functor v) => Loader k v -> Map k (Fix v)
tie (Loader m) = Map.map (anam' $ \k -> fromJust (Map.lookup k m)) m
-- -----------------------------------------------------------------
-- Usage example:
data NodeF n t = NodeF n [t]
instance Functor (NodeF n) where
  fmap f (NodeF n xs) = NodeF n (map f xs)
-- A data structure isomorphic to Fix (NodeF n), but easier to work with.
data Node n = Node n [Node n]
  deriving Show
-- The isomorphism that does the conversion.
nodeunfix :: Fix (NodeF n) -> Node n
nodeunfix = catam (\(NodeF n ts) -> Node n ts)
main :: IO ()
main = do
  -- Each node description consists of an integer ID and a list of other nodes
  -- it references.
  let lss =
        [ (1, [4])
        , (2, [1])
        , (3, [2, 1])
        , (4, [3, 2, 1])
        , (5, [5])
        ]
  print lss
  -- Fill a new loader with the data:
  let
    loader = foldr f empty lss
    f (label, dependsOn) = update label (NodeF label dependsOn)
  -- Tie the knot:
  let tied' = tie loader
  -- And convert Fix (NodeF n) into Node n:
  let tied = Map.map nodeunfix tied'
  -- For each node print the label of the first node it references
  -- and the count of all referenced nodes.
  print $ Map.map (\(Node n ls@((Node n1 _) : _)) -> (n1, length ls)) tied

I'm kind of overwhelmed by the amount of Monad usage.
I might not understand the past/future things, but I guess you are just trying to express the lazy+fixpoint binding. (Correct me if I'm wrong.)
The RWS Monad usage with R=W is kind of funny, but you do not need the State and the loop when you can do the same with fmap. There is no point in using Monads if they do not make things easier. (Only very few Monads represent chronological order, anyway.)
My general solution to tying the knot:
1. I parse everything to a List of nodes,
2. convert that list to a Data.Vector for O(1) access to boxed (=lazy) values,
3. bind that result to a name using let or the fix or mfix function,
4. and access that named Vector inside the parser (see 1.).
Take the example solution in your blog, where you write something like this:
data Node = Node {
    value :: Int,
    next :: Node
  } deriving Show
…
tie = …
parse = …
data ParserState = …
…
example :: Node
example =
  let (_, _, m) = tie parse $ ParserState 0 [(0, 1), (1, 2), (2, 0)]
  in (m Map.! 0)
I would have written it this way:
{-# LANGUAGE ViewPatterns, NamedFieldPuns #-}
import Data.Vector as Vector
example :: Node
example =
  let node :: Int -> Node
      node = (Vector.!) $ Vector.fromList $
        [ Node{value,next}
        | (value,node->next) <- [(0, 1), (1, 2), (2, 0)]
        ]
  in (node 0)
or shorter:
{-# LANGUAGE ViewPatterns, NamedFieldPuns #-}
import Data.Vector as Vector
example :: Node
example = (\node -> (Vector.fromList [ Node{value,next}
                                     | (value,node->next) <- [(0, 1), (1, 2), (2, 0)]
                                     ] Vector.!)) `fix` 0
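For reference, a self-contained variant of the first version (my transcription: it repeats the Node type from the earlier snippet and adds a small print that walks the cycle, just to show the knot holds):
{-# LANGUAGE ViewPatterns, NamedFieldPuns #-}
import qualified Data.Vector as Vector
data Node = Node { value :: Int, next :: Node }
main :: IO ()
main = do
  let node :: Int -> Node
      node = (Vector.!) $ Vector.fromList
        [ Node{value, next}
        | (value, node -> next) <- [(0, 1), (1, 2), (2, 0)]
        ]
  -- follow the cycle a few steps: prints [0,1,2,0,1]
  print [ value n | n <- take 5 (iterate next (node 0)) ]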


Avoiding thunks in sparsely evaluated list generated by monadic unfold

I have a simulation library that uses the FFI wrapped in a monad M, carrying a context. All the foreign functions are pure, so I've decided to make the monad lazy, which is normally convenient for flow-control. I represent my simulation as a list of simulation-frames, that I can consume by either writing to a file, or by displaying the frame graphically.
simulation :: [(Frame -> M Frame)] -> Frame -> M [Frame]
simulation [] frame = return [frame]
simulation (step:steps) frame
  = step frame >>= fmap (frame:) . simulation steps
Each frame consists of a tuple of newtype-wrapped ForeignPtrs that I can lift to my Haskell representation with
lift :: Frame -> M HFrame
Since the time-steps in my simulation are quite short, I only want to look at every nth frame, for which I use
takeEvery n l = foldr cons nil l 0 where
  nil _ = []
  cons x rest 0 = x : rest n
  cons x rest n = rest (n-1)
So my code looks something like
main = consume
  $ takeEvery n
  $ runM
  $ simulation steps initialFrame >>= mapM lift
Now, the problem is that as I increase n, a thunk builds up. I've tried a couple of different ways to try to strictly evaluate each frame in simulation, but I have yet to figure out how to do so. ForeignPtr doesn't appear to have a NFData instance, so I can't use deepseq, but all my attempts with seq, including using seq on each element in the tuple, have been without noticeable effect.
EDIT:
Upon request, I have included more specifics, that I initially excluded since I think they are probably mostly noise for this question.
The monad
newtype FT c a = FT (Context -> a)
instance Functor (FT c) where
  fmap f (FT a) = FT (f . a)
instance Applicative (FT c) where
  pure a = FT (\_ -> a)
  (<*>) (FT a) (FT b) = FT (\c -> a c $ b c)
instance Monad (FT c) where
  return = pure
  (>>=) (FT a) f = FT (\c -> (\(FT b) -> b c) $ f $ a c)
runFTIn :: Context -> (forall c. FT c a) -> a
runFTIn context (FT a) = a context
runFTWith :: [ContextOption] -> (forall c. FT c a) -> a
runFTWith options a
  = unsafePerformIO
  $ getContext options >>= \c -> return $ runFTIn c a
runFT = runFTWith []
unsafeLiftFromIO :: (Context -> IO a) -> FT c a
unsafeLiftFromIO a = FT (\c -> unsafePerformIO $ a c)
All the foreign functions are lifted from IO with unsafeLiftFromIO
newtype Box c = Box (ForeignPtr RawBox)
newtype Coordinates c = Coordinates (ForeignPtr RawCoordinates)
type Frame c = (Box c, Coordinates c)
liftBox :: Box c -> FT c HBox
liftCoordinates :: Coordinates c -> FT c HCoordinates
liftFrame (box, coordinates) = do
  box' <- liftBox box
  coordinates' <- liftCoordinates coordinates
  return (box', coordinates')
The steps themselves are supposed to be arbitrary (Frame c -> FT c (Frame c)), so strictness should preferably be in the higher level code.
EDIT2:
I have now tried to use Streamly, however the problem persists, so I think the issue really is finding a way to strictly evaluate ForeignPtrs.
current implementations:
import Streamly
import qualified Streamly.Prelude as S
import qualified Streamly.Internal.Data.Stream.Serial as Serial
takeEvery n = Serial.unfoldrM ((fmap.fmap) (\(h, t) -> (h, S.drop (n-1) t)) . S.uncons)
(#) = flip ($)
simulation
  :: (IsStream t)
  => Frame c
  -> t (FT c) (Frame c -> FT c (Frame c))
  -> t (FT c) (Frame c)
simulation frame = S.scanlM' (#) frame
EDIT3:
To clarify the symptoms and how I have diagnosed the problem.
The library calls OpenCL functions running on a GPU. I am sure that the freeing of the pointers is handled correctly - the ForeignPtrs have the correct freeing functions, and memory use is independent of total number of steps as long as this number is larger than n. What I find is that memory use on the GPU is basically linearly correlated to n. The consumer I've been using for this testing is
import qualified Data.ByteString.Lazy as BL
import Data.Binary
import Data.Binary.Put
writeTrajectory fn = fmap (BL.writeFile fn . runPut) . S.foldr ((>>).putFrame) (pure ()) . serially
For my streamly implementation, and
writeTrajectory fn = BL.writeFile fn . runPut . mapM_ putFrame
For the original implementation. Both should consume the stream continuously. I've generated the steps for testing with replicate.
I am unsure of how to more precisely analyze the memory-use on the GPU. System memory use is not an issue here.
Update:
I am starting to think it's not a matter of strictness, but of GC problems. The run-time system does not know the size of the memory allocated on the GPU and so does not know to collect the pointers; this is less of an issue when there is stuff going on CPU-side as well, as that produces allocations too, activating the GC. This would explain the slightly non-deterministic memory usage, but linear correlation to n, that I've seen. How to solve this nicely is another issue, but I suspect it will require a substantial overhaul of my code.
I think the issue really is finding a way to strictly evaluate ForeignPtrs
If that is really the issue, one way to do that is to change the second clause of simulation:
{-# LANGUAGE BangPatterns #-}
simulation :: [(Frame -> M Frame)] -> Frame -> M [Frame]
simulation [] frame = return [frame]
simulation (step:steps) frame@(!_, !_) -- Evaluate both components of the pair
  = step frame >>= fmap (frame:) . simulation steps
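Separately, if the GC hypothesis in the question's update is right, one hedged experiment (my sketch, reusing Frame, Box, Coordinates and unsafeLiftFromIO from the question) is to finalize a frame's ForeignPtrs explicitly once it has been consumed or skipped, so the GPU allocations are released regardless of Haskell heap pressure; calling System.Mem.performGC every few frames is a blunter variant of the same idea.
import Foreign.ForeignPtr (finalizeForeignPtr)
-- Run the ForeignPtr finalizers now instead of waiting for the GC.
freeFrame :: Frame c -> FT c ()
freeFrame (Box b, Coordinates cs) = unsafeLiftFromIO $ \_ -> do
  finalizeForeignPtr b
  finalizeForeignPtr cs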

Creating a result piecewise from stateful computation, with good ergonomics

I'd like to write a function
step :: State S O
where O is a record type:
data O = MkO{ out1 :: Int, out2 :: Maybe Int, out3 :: Maybe Bool }
The catch is that I'd like to assemble my O output piecewise. What I mean by that, is that at various places along the definition of step, I learn then and there that e.g. out2 should be Just 3, but I don't know in a non-convoluted way what out1 and out3 should be. Also, there is a natural default value for out1 that can be computed from the end state; but there still needs to be the possibility to override it in step.
And, most importantly, I want to "librarize" this, so that users can provide their own S and O types, and I give them the rest.
My current approach is to wrap everything in a WriterT (HKD O Last) using Higgledy's automated way of creating a type HKD O Last which is isomorphic to
data OLast = MkOLast{ out1' :: Last Int, out2' :: Last (Maybe Int), out3' :: Last (Maybe Bool) }
This comes with the obvious Monoid instance, so I can, at least morally, do the following:
step = do
  MkOLast{..} <- execWriterT step'
  s <- get
  return O
    { out1 = fromMaybe (defaultOut1 s) $ getLast out1'
    , out2 = getLast out2'
    , out3 = fromMaybe False $ getLast out3'
    }
step' = do
  ...
  tell mempty{ out2' = pure $ Just 42 }
  ...
  tell mempty{ out1' = pure 3 }
This is code I could live with.
The problem is that I can only do this morally. In practice, what I have to write is quite convoluted code because Higgledy's HKD O Last exposes record fields as lenses, so the real code ends up looking more like the following:
step = do
  oLast <- execWriterT step'
  s <- get
  let def = defaultOut s
  return $ runIdentity . construct $ bzipWith (\i -> maybe i Identity . getLast) (deconstruct def) oLast
step' = do
  ...
  tell $ set (field @"out2") (pure $ Just 42) mempty
  ...
  tell $ set (field @"out3") (pure 3) mempty
The first wart in step we can hide away behind a function:
update :: (Generic a, Construct Identity a, FunctorB (HKD a), ProductBC (HKD a)) => a -> HKD a Last -> a
update initial edits = runIdentity . construct $ bzipWith (\i -> maybe i Identity . getLast) (deconstruct initial) edits
so we can "librarize" that as
runStep
  :: (Generic o, Construct Identity o, FunctorB (HKD o), ProductBC (HKD o))
  => (s -> o) -> WriterT (HKD o Last) (State s) () -> State s o
runStep mkDef step = do
  updates <- execWriterT step
  def <- gets mkDef
  return $ update def updates
But what worries me are the places where partial outputs are recorded. So far, the best I've been able to come up with is to use OverloadedLabels to provide #out2 as a possible syntax:
instance (HasField' field (HKD a f) (f b), Applicative f) => IsLabel field (b -> Endo (HKD a f)) where
  fromLabel x = Endo $ field' @field .~ pure x
output :: (Monoid (HKD o Last)) => Endo (HKD o Last) -> WriterT (HKD o Last) (State s) ()
output f = tell $ appEndo f mempty
this allows end-users to write step' as
step' = do
  ...
  output $ #out2 (Just 42)
  ...
  output $ #out3 3
but it's still a bit cumbersome; moreover, it uses quite a lot of heavy machinery behind the scenes. Especially given that my use case is such that all the library internals would need to be explained step-by-step.
So, what I am looking for are improvements in the following areas:
Simpler internal implementation
Nicer API for end-users
I'd be happy with a completely different approach from first principles as well, as long as it doesn't require the user to define their own OLast next to O...
The following is not a very satisfactory solution because it's still complex and the type errors are horrific, but it tries to achieve two things:
Any attempt to "complete" the construction of the record without having specified all mandatory fields results in a type error.
"there is a natural default value for out1 that can be computed from the end state; but there still needs to be the possibility to override it"
The solution does away with the State monad. Instead, there's an extensible record to which new fields are progressively added—therefore changing its type—until it is "complete".
We use the red-black-record, sop-core (these for HKD-like functionality) and transformers (for the Reader monad) packages.
Some necessary imports:
{-# LANGUAGE DeriveGeneric #-}
{-# LANGUAGE TypeApplications #-}
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE ScopedTypeVariables #-}
{-# LANGUAGE FlexibleContexts #-}
{-# LANGUAGE TypeFamilies #-}
{-# LANGUAGE AllowAmbiguousTypes #-}
{-# LANGUAGE PartialTypeSignatures #-}
{-# OPTIONS_GHC -Wno-partial-type-signatures #-}
import Data.RBR (Record,unit,FromRecord(fromRecord),ToRecord,RecordCode,
                 Productlike,fromNP,toNP,ProductlikeSubset,projectSubset,
                 FromList,
                 Insertable,Insert,insert) -- from "red-black-record"
import Data.SOP (I(I),unI,NP,All,Top) -- from "sop-core"
import Data.SOP.NP (sequence_NP)
import Data.Function (fix)
import Control.Monad.Trans.Reader (Reader,runReader,reader)
import qualified GHC.Generics
The datatype-generic machinery:
specify :: forall k v t r. Insertable k v t
        => v -> Record (Reader r) t -> Record (Reader r) (Insert k v t)
specify v = insert @k @v @t (reader (const v))
close :: forall r subset subsetflat whole . _ => Record (Reader r) whole -> r
close = fixRecord @r @subsetflat . projectSubset @subset @whole @subsetflat
  where
    fixRecord
      :: forall r flat. (FromRecord r, Productlike '[] (RecordCode r) flat, All Top flat)
      => Record (Reader r) (RecordCode r)
      -> r
    fixRecord = unI . fixHelper I
    fixHelper
      :: forall r flat f g. _
      => (NP f flat -> g (NP (Reader r) flat))
      -> Record f (RecordCode r)
      -> g r
    fixHelper adapt r = do
      let moveFunctionOutside np = runReader . sequence_NP $ np
          record2record np = fromRecord . fromNP <$> moveFunctionOutside np
      fix . record2record <$> adapt (toNP r)
specify adds a field to an extensible HKD-like record where each field is actually a function from the completed record to the type of the field in the completed record. It inserts the field as a constant function. It can also override existing default fields.
close takes an extensible record constructed with specify and "ties the knot", returning the completed non-HKD record.
Here's code that must be written for each concrete record:
data O = MkO { out1 :: Int, out2 :: Maybe Int, out3 :: Maybe Bool }
  deriving (GHC.Generics.Generic, Show)
instance FromRecord O
instance ToRecord O
type ODefaults = FromList '[ '("out1",Int) ]
odefaults :: Record (Reader O) ODefaults
odefaults =
  insert @"out1" (reader $ \r -> case out2 r of
    Just i -> succ i
    Nothing -> 0)
  $ unit
In odefaults we specify overrideable default values for some fields, which are calculated by inspecting the "completed" record (this works because we later tie the knot with close.)
Putting it all to work:
example1 :: O
example1 =
  close
  . specify @"out3" (Just False)
  . specify @"out2" (Just 0)
  $ odefaults
example2override :: O
example2override =
  close
  . specify @"out1" (12 :: Int)
  . specify @"out3" (Just False)
  . specify @"out2" (Just 0)
  $ odefaults
main :: IO ()
main =
  do print $ example1
     print $ example2override
-- result:
-- MkO {out1 = 1, out2 = Just 0, out3 = Just False}
-- MkO {out1 = 12, out2 = Just 0, out3 = Just False}
Here's what I am currently using for this: basically the same Barbies-based technique from my original question, but using barbies-th and lens to create properly named field lenses.
I am going to illustrate it with an example. Suppose I want to collect this result:
data CPUOut = CPUOut
{ inputNeeded :: Bool
, ...
}
Create Barbie for CPUOut using barbies-th, add _ prefix to field names, and use lens's makeLenses TH macro to generate field accessors:
declareBareB [d|
  data CPUOut = CPUOut
    { _inputNeeded :: Bool
    , ...
    } |]
makeLenses ''CPUOut
Write update so that it works on partial values that are wrapped in the Barbie newtype wrapper:
type Raw b = b Bare Identity
type Partial b = Barbie (b Covered) Last
update
  :: (BareB b, ApplicativeB (b Covered))
  => Raw b -> Partial b -> Raw b
update initials edits =
  bstrip $ bzipWith update1 (bcover initials) (getBarbie edits)
  where
    update1 :: Identity a -> Last a -> Identity a
    update1 initial edit = maybe initial Identity (getLast edit)
The role of the Barbie wrapper is that Barbie b f has a Monoid instance whenever all the fields of b f are monoids themselves. This is exactly the case for Partial CPUOut, so that is what we are going to be collecting in our WriterT:
type CPU = WriterT (Partial CPUOut) (State CPUState)
Write the generic output assignment combinator. This is what makes it nicer than the approach in the original question, because the Setters are properly named field accessor lenses, not overloaded labels:
(.:=)
  :: (Applicative f, MonadWriter (Barbie b f) m)
  => Setter' (b f) (f a) -> a -> m ()
fd .:= x = scribe (iso getBarbie Barbie . fd) (pure x)
Example use:
startInput :: CPU ()
startInput = do
  inputNeeded .:= True
  phase .= WaitInput

Haskell's Scrap Your Boilerplate (SYB) - applying transformation only once instead of everywhere

What's the best way to apply a transformation to a tree only once instead of everywhere using SYB? For instance, in the following simplified expression, there are several instances of Var "x", and I want to replace the first instance with Var "y" only.
data Exp = Var String
         | Val Int
         | Plus Exp Exp
         | ...
myExp = Val 5 `Plus` Var "x" `Plus` Val 5 `Plus` Var "x" ...
This can't be done using the everywhere combinator since it will try to transform all instances of Var "x" to Var "y".
EDIT (after posting): Looks like somewhere is what I am looking for.
Being an SYB beginner myself, my answer is more of a guess, but it seems to work.
Combinator somewhere recommended by Neil Brown probably doesn't do exactly what you want. It's defined as
-- | Apply a monadic transformation at least somewhere
somewhere :: MonadPlus m => GenericM m -> GenericM m
-- We try "f" in top-down manner, but descent into "x" when we fail
-- at the root of the term. The transformation fails if "f" fails
-- everywhere, say succeeds nowhere.
--
somewhere f x = f x `mplus` gmapMp (somewhere f) x
where
-- | Transformation of at least one immediate subterm does not fail
gmapMp :: forall m. MonadPlus m => (forall d. Data d => d -> m d) -> a -> m a
But we need to transform at most once. For this it seems that gmapMo will be better:
-- | Transformation of one immediate subterm with success
gmapMo :: forall m. MonadPlus m => (forall d. Data d => d -> m d) -> a -> m a
So I made my own combinator:
{-# LANGUAGE DeriveDataTypeable, RankNTypes #-}
import Control.Monad
import Data.Maybe (fromMaybe)
import Data.Data
import Data.Typeable (Typeable)
import Data.Generics.Schemes
import Data.Generics.Aliases
-- | Apply a monadic transformation once.
once :: MonadPlus m => GenericM m -> GenericM m
once f x = f x `mplus` gmapMo (once f) x
If the substitution fails, it returns mzero, otherwise it returns the substituted result. If you don't care if the substitution fails (no matches), you could use something like
once' :: (forall a. Data a => a -> Maybe a) -> (forall a. Data a => a -> a)
once' f x = fromMaybe x (once f x)
With these, we can do some replacements:
data Exp = Var String | Val Int | Plus Exp Exp
  deriving (Show, Typeable, Data)
myExp = Val 5 `Plus` Var "x" `Plus` Val 5 `Plus` Var "x"
replM :: (MonadPlus m) => Exp -> m Exp
replM (Var "x") = return $ Var "y"
replM t = mzero
main = do
  -- `somewhere` doesn't do what we want:
  print $ (somewhere (mkMp replM) myExp :: Maybe Exp)
  -- returns `Just ..` if the substitution succeeds once,
  -- Nothing otherwise.
  print $ (once (mkMp replM) myExp :: Maybe Exp)
  -- performs the substitution once, if possible.
  print $ (once' (mkMp replM) myExp :: Exp)
  -- Just for kicks, this returns all possible substitutions
  -- where one `Var "x"` is replaced by `Var "y"`.
  print $ (once (mkMp replM) myExp :: [Exp])
Yes, I think somewhere (mkMp mySpecificFunction) should do it, if you use a MonadPlus monad and make it succeed when it finds what you're looking for.
A flexible but hacky alternative is to use everywhereM with a State monad that can store a Boolean (or store Maybe MyFunc or whatever) and apply the transformation depending on the state being True or Just myFunc -- that way, when you are done (e.g. after applying the transformation once), you just alter the state to be False/Nothing.
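A hedged sketch of that State-based alternative (my code, reusing Exp, myExp and the SYB imports from the answer above; note that everywhereM is bottom-up, so it is the leftmost innermost occurrence that gets replaced):
import Control.Monad.State (State, evalState, get, put)
replaceFirst :: Exp -> Exp
replaceFirst e = evalState (everywhereM (mkM step) e) True
  where
    step :: Exp -> State Bool Exp
    step (Var "x") = do
      pending <- get
      if pending
        then put False >> return (Var "y")  -- fire once
        else return (Var "x")               -- afterwards, leave everything alone
    step t = return t
-- ghci> replaceFirst myExp
-- Plus (Plus (Plus (Val 5) (Var "y")) (Val 5)) (Var "x")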

Is there an elegant way to have functions return functions of the same type (in a tuple)

I'm using Haskell to implement a pattern involving functions that return a value, and themselves (or a function of the same type). Right now I've implemented this like so:
newtype R a = R (a , a -> R a)
-- some toy functions to demonstrate
alpha :: String -> R String
alpha str
  | str == reverse str = R (str , omega)
  | otherwise = R (reverse str , alpha)
omega :: String -> R String
omega (s:t:r)
  | s == t = R (s:t:r , alpha)
  | otherwise = R (s:s:t:r , omega)
The driving force for these types of functions is a function called cascade:
cascade :: (a -> R a) -> [a] -> [a]
cascade _ [] = []
cascade f (l:ls) = el : cascade g ls where
  R (el , g) = f l
Which takes a seed function and a list, and returns a list created by applying the seed function to the first element of the list, applying the function returned by that to the second element of the list, and so on and so forth.
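For instance, with the toy functions above:
-- cascade alpha ["level", "ab", "aabb"]
--   alpha "level"  -- a palindrome, so it yields "level" and hands over to omega
--   omega "ab"     -- heads differ, so it yields "aab" and stays with omega
--   omega "aabb"   -- heads match, so it yields "aabb" and hands back to alpha
-- == ["level", "aab", "aabb"]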
This works. However, in the process of using this for slightly more useful things, I noticed that a lot of the time my basic units were functions that only rarely returned a function other than themselves, and explicitly declaring a function to return itself was becoming somewhat tedious. I'd rather be able to use something like a Monad's return function; however, I have no idea what bind would do for functions of these types, especially since I never intended these to be linked with anything other than the function they return in the first place.
Trying to shoehorn this into a Monad started worrying me about whether or not what I was doing was useful, so, in short, what I want to know is:
1. Is what I'm doing a Bad Thing? if not,
2. Has what I'm doing been done before/am I reinventing the wheel here? if not,
3. Is there an elegant way to do this, or have I already reached this and am being greedy by wanting some kind of return analogue?
(Incidentally, besides 'functions that return themselves' or 'recursive data structure (of functions)', I'm not quite sure what this kind of pattern is called, which has made trying to do effective research on it difficult--if anyone could give me a name for this pattern (if it indeed has one), that alone would be very helpful)
As a high-level consideration, I'd say that your type represents a stateful stream transformer. What's a bit confusing here is that your type is defined as
newtype R a = R (a , a -> R a)
instead of
newtype R a = R (a -> (R a, a))
which would be a bit more natural in the streaming context because you typically don't "produce" something if you haven't received anything yet. Your functions would then have simpler types too:
alpha, omega :: R String
cascade :: R a -> [a] -> [a]
If we try to generalize this idea of a stream transformer, we soon realize that the case where we transform a list of as into a list of as is just a special case. With the proper infrastructure in place we could just as well produce a list of bs. So we try to generalize the type R:
newtype R a b = R (a -> (R a b, b))
I've seen this kind of structure being called a Circuit, which happens to be a full-blown arrow. Arrows are a generalization of the concept of functions and are an even more powerful construct than monads. I can't pretend to understand the category-theoretical background, but it's definitely interesting to play with them. For example, the trivial transformation is just Cat.id:
import Control.Category
import Control.Arrow
import Prelude hiding ((.), id)
import qualified Data.List as L
-- ... Definition of Circuit and instances
cascade :: Circuit a b -> [a] -> [b]
cascade cir = snd . L.mapAccumL unCircuit cir
--
ghci> cascade (Cat.id) [1,2,3,4]
[1,2,3,4]
We can also simulate state by parameterizing the circuit we return as the continuation:
countingCircuit :: (a -> b) -> Circuit a (Int, b)
countingCircuit f = cir 0
  where cir i = Circuit $ \x -> (cir (i+1), (i, f x))
--
ghci> cascade (countingCircuit (+5)) [10,3,2,11]
[(0,15),(1,8),(2,7),(3,16)]
And the fact that our circuit type is a category gives us a nice way to compose circuits:
ghci> cascade (countingCircuit (+5) . arr (*2)) [10,3,2,11]
[(0,25),(1,11),(2,9),(3,27)]
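For completeness, here is one plausible filling-in of the elided "Definition of Circuit and instances" (my guess, not the answer's exact code; the transcript's Cat.id additionally assumes Control.Category is imported under the alias Cat):
import Prelude hiding (id, (.))
import Control.Category
import Control.Arrow
newtype Circuit a b = Circuit { unCircuit :: a -> (Circuit a b, b) }
instance Category Circuit where
  id = Circuit $ \a -> (id, a)
  Circuit g . Circuit f = Circuit $ \a ->
    let (f', b) = f a
        (g', c) = g b
    in  (g' . f', c)
instance Arrow Circuit where
  arr f = Circuit $ \a -> (arr f, f a)
  first (Circuit f) = Circuit $ \(a, c) ->
    let (f', b) = f a
    in  (first f', (b, c))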
It looks like what you have is a simplified version of a stream. That is to
say, a representation of an infinite stream of values. I don't think you can
easily define this as a monad, because you use the same type for your seed as
for your elements, which makes defining fmap difficult (it seems that you
would need to invert the function provided to fmap so as to be able to
recover the seed). You can make this a monad by making the seed type
independent of the element type like so
{-# LANGUAGE ExistentialQuantification #-}
data Stream a = forall s. Stream a s (s -> Stream a)
This will allow you to define a Functor and Monad instance as follows
unfold :: (b -> (a, b)) -> b -> Stream a
unfold f b = Stream a b' (unfold f)
  where (a, b') = f b
shead :: Stream a -> a
shead (Stream a _ _) = a
stail :: Stream a -> Stream a
stail (Stream _ b f) = f b
diag :: Stream (Stream a) -> Stream a
diag = unfold f
  where f str = (shead $ shead str, stail $ fmap stail str)
sjoin :: Stream (Stream a) -> Stream a
sjoin = diag
instance Functor Stream where
  fmap f (Stream a b g) = Stream (f a) b (fmap f . g)
instance Monad Stream where
  return = unfold (\x -> (x, x))
  xs >>= f = diag $ fmap f xs
Note that this only obeys the Monad laws when viewed as a set, as it does not
preserve element ordering.
This explanation
of the stream monad uses infinite lists, which works just as well in Haskell
since they can be generated in a lazy fashion. If you check out the
documentation for the Stream type in the vector library, you will
find a more complicated version, so that it can be used in efficient stream fusion.
I don't have much to add, except to note that your cascade function can be written as a left fold (and hence also as a right fold, though I haven't done the transformation.)
cascade f = reverse . fst . foldl func ([], f)
  where
    func (rs,g) s = let R (r,h) = g s in (r:rs,h)

Combining multiple states in StateT

I am writing a program that runs as a daemon.
To create the daemon, the user supplies a set of implementations for each of the required classes (one of them is a database). All of these classes have functions with type signatures of the form StateT s IO a, but s is different for each class. Suppose each of the classes follows this pattern:
import Control.Monad (liftM)
import Control.Monad.State (StateT(..), get)
class Hammer h where
  driveNail :: StateT h IO ()
data ClawHammer = MkClawHammer Int -- the real implementation is more complex
instance Hammer ClawHammer where
  driveNail = return () -- the real implementation is more complex
-- Plus additional classes for wrenches, screwdrivers, etc.
Now I can define a record that represents the implementation chosen by
the user for each "slot".
data MultiTool h = MultiTool {
  hammer :: h
  -- Plus additional fields for wrenches, screwdrivers, etc.
  }
And the daemon does most of its work in the StateT (MultiTool h ...) IO ()
monad.
Now, since the multitool contains a hammer, I can use it in any situation
where a hammer is needed. In other words, the MultiTool type
can implement any of the classes it contains, if I write code like this:
stateMap :: Monad m => (s -> t) -> (t -> s) -> StateT s m a -> StateT t m a
stateMap f g (StateT h) = StateT $ liftM (fmap f) . h . g
withHammer :: StateT h IO () -> StateT (MultiTool h) IO ()
withHammer runProgram = do
  t <- get
  stateMap (\h -> t {hammer=h}) hammer runProgram
instance Hammer h => Hammer (MultiTool h) where
  driveNail = withHammer driveNail
But the implementations of withHammer, withWrench, withScrewdriver, etc.
are basically identical. It would be nice to be able to write something
like this...
--withMember accessor runProgram = do
-- u <- get
-- stateMap (\h -> u {accessor=h}) accessor runProgram
-- instance Hammer h => Hammer (MultiTool h) where
-- driveNail = withMember hammer driveNail
But of course that won't compile.
I suspect my solution is too object-oriented.
Is there a better way?
Monad transformers, maybe?
Thank you in advance for any suggestions.
If you want to go with a large global state like in your case, then what you want to use is lenses, as suggested by Ben. I too recommend Edward Kmett's lens library. However, there is another, perhaps nicer way.
Servers have the property that the program runs continuously and performs the same operation over a state space. The trouble starts when you want to modularize your server, in which case you want more than just some global state. You want modules to have their own state.
Let's think of a module as something that transforms a Request to a Response:
Module :: (Request -> m Response) -> Module m
Now if it has some state, then this state becomes noticeable in that the module might give a different answer the next time. There are a number of ways to do this, for example the following:
Module :: s -> ((Request, s) -> m (Response, s)) -> Module m
But a much nicer and equivalent way to express this is the following constructor (we will build a type around it soon):
Module :: (Request -> m (Response, Module m)) -> Module m
This module maps a request to a response, but along the way also returns a new version of itself. Let's go further and make requests and responses polymorphic:
Module :: (a -> m (b, Module m a b)) -> Module m a b
Now if the output type of a module matches another module's input type, then you can compose them like regular functions. This composition is associative and has a polymorphic identity. This sounds a lot like a category, and in fact it is! It is a category, an applicative functor and an arrow.
newtype Module m a b =
Module (a -> m (b, Module m a b))
instance (Monad m) => Applicative (Module m a)
instance (Monad m) => Arrow (Module m)
instance (Monad m) => Category (Module m)
instance (Monad m) => Functor (Module m a)
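For concreteness, a hedged sketch of how two of those instances could look (my guess, not the answer's code; the Maybe-returning variant below works out similarly):
import Prelude hiding (id, (.))
import Control.Category
newtype Module m a b = Module (a -> m (b, Module m a b))
instance Monad m => Category (Module m) where
  id = Module $ \a -> return (a, id)
  Module g . Module f = Module $ \a -> do
    (b, f') <- f a
    (c, g') <- g b
    return (c, g' . f')
instance Monad m => Functor (Module m a) where
  fmap h (Module f) = Module $ \a -> do
    (b, f') <- f a
    return (h b, fmap h f')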
We can now compose two modules that have their own individual local state without even knowing about it! But that's not sufficient. We want more. How about modules that can be switched among? Let's extend our little module system such that modules can actually choose not to give an answer:
newtype Module m a b =
Module (a -> m (Maybe b, Module m a b))
This allows another form of composition that is orthogonal to (.): Now our type is also a family of Alternative functors:
instance (Monad m) => Alternative (Module m a)
Now a module can choose whether to respond to a request, and if not, the next module will be tried. Simple. You have just reinvented the wire category. =)
Of course you don't need to reinvent this. The Netwire library implements this design pattern and comes with a large library of predefined "modules" (called wires). See the Control.Wire module for a tutorial.
Here's a concrete example of how to use lens like everybody else is talking about. In the following code example, Type1 is the local state (i.e. your hammer), and Type2 is the global state (i.e. your multitool). lens provides the zoom function which lets you run a localized state computation that zooms in on any field defined by a lens:
import Control.Lens
import Control.Monad.Trans.Class (lift)
import Control.Monad.Trans.State
data Type1 = Type1 {
  _field1 :: Int ,
  _field2 :: Double}
field1 :: SimpleLens Type1 Int
field1 = lens _field1 (\x a -> x { _field1 = a})
field2 :: SimpleLens Type1 Double
field2 = lens _field2 (\x a -> x { _field2 = a})
data Type2 = Type2 {
  _type1 :: Type1 ,
  _field3 :: String}
type1 :: SimpleLens Type2 Type1
type1 = lens _type1 (\x a -> x { _type1 = a})
field3 :: SimpleLens Type2 String
field3 = lens _field3 (\x a -> x { _field3 = a})
localCode :: StateT Type1 IO ()
localCode = do
  field1 += 3
  field2 .= 5.0
  lift $ putStrLn "Done!"
globalCode :: StateT Type2 IO ()
globalCode = do
  f1 <- zoom type1 $ do
    localCode
    use field1
  field3 %= (++ show f1)
  f3 <- use field3
  lift $ putStrLn f3
main = runStateT globalCode (Type2 (Type1 9 4.0) "Hello: ")
zoom is not limited to immediate sub-fields of a type. Since lenses are composable, you can zoom as deep as you want in a single operation just by doing something like:
zoom (field1a . field2c . field3b . field4j) $ do ...
This sounds very much like an application of lenses.
Lenses are a specification of a sub-field of some data. The idea is you have some value toolLens and functions view and set so that view toolLens :: MultiTool h -> h fetches the tool and set toolLens :: MultiTool h -> h -> MultiTool h replaces it with a new value. Then you can easily define your withMember as a function just accepting a lens.
Lens technology has advanced a great deal recently, and they are now incredibly capable. The most powerful library around at the time of writing is Edward Kmett's lens library, which is a bit much to swallow, but pretty simple once you find the features you want. You can also search for more questions about lenses here on SO, e.g. Functional lenses which links to lenses, fclabels, data-accessor - which library for structure access and mutation is better, or the lenses tag.
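As a hedged sketch of what that lens-accepting withMember could look like (my code, reusing stateMap and the MultiTool/Hammer declarations from the question together with lens's Lens', lens, view and set), the Hammer instance then no longer needs a hand-written withHammer:
import Control.Lens (Lens', lens, view, set)
withMember :: Monad m => Lens' t s -> StateT s m a -> StateT t m a
withMember l program = do
  t <- get
  stateMap (\s -> set l s t) (view l) program
hammerL :: Lens' (MultiTool h) h
hammerL = lens hammer (\tool h -> tool { hammer = h })
instance Hammer h => Hammer (MultiTool h) where
  driveNail = withMember hammerL driveNail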
I created a lensed extensible record library called data-diverse-lens which allows combining multiple ReaderT (or StateT) like this gist:
{-# LANGUAGE FlexibleContexts #-}
{-# LANGUAGE TypeApplications #-}
module Main where
import Control.Lens
import Control.Monad.Reader
import Control.Monad.State
import Data.Diverse.Lens
import Data.Semigroup
foo :: (MonadReader r m, HasItem' Int r, HasItem' String r) => m (Int, String)
foo = do
  i <- view (item' @Int) -- explicitly specify type
  s <- view item' -- type can also be inferred
  pure (i + 10, s <> "bar")
bar :: (MonadState s m, HasItem' Int s, HasItem' String s) => m ()
bar = do
  (item' @Int) %= (+10) -- explicitly specify type
  item' %= (<> "bar") -- type can also be inferred
  pure ()
main :: IO ()
main = do
  -- example of running ReaderT with multiple items
  (i, s) <- runReaderT foo ((2 :: Int) ./ "foo" ./ nil)
  putStrLn $ show i <> s -- prints out "12foobar"
  -- example of running StateT with multiple items
  is <- execStateT bar ((2 :: Int) ./ "foo" ./ nil)
  putStrLn $ show (view (item @Int) is) <> (view (item @String) is) -- prints out "12foobar"
Data.Has is a simpler library that does the same with tuples. Example from the library front page:
{-# LANGUAGE FlexibleContexts #-}
-- in some library code
...
logInAnyReaderHasLogger :: (Has Logger r, MonadReader r m) => LogString -> m ()
logInAnyReaderHasLogger s = asks getter >>= logWithLogger s
queryInAnyReaderHasSQL :: (Has SqlBackEnd r, MonadReader r m) => Query -> m a
queryInAnyReaderHasSQL q = asks getter >>= queryWithSQL q
...
-- now you want to use these effects together
...
logger <- initLogger ...
sql <- initSqlBackEnd ...
(`runReader` (logger, sql)) $ do
  ...
  logInAnyReaderHasLogger ...
  ...
  x <- queryInAnyReaderHasSQL ...
  ...
