Haskell IO Monad in ST Monad - haskell

I guess I have a simple problem, but I'm not able to solve it:
I want to create a vector, call FFI on this vector and return it.
import qualified Data.Vector.Storable as VS
import qualified Data.Vector.Storable.Mutable as VSM
withCarray :: Int -> (Ptr Float -> IO ()) -> (Vector Float)
withCarray n f = VS.create $ do
v <- VSM.new n
VSM.unsafeWith v f
return v
Now, VSM.unsafeWith returns an IO, and this is in the ST Monad. But if I use ioToST or liftIO I get the following problem:
Couldn't match type ‘s’ with ‘RealWorld’
‘s’ is a rigid type variable bound by
a type expected by the context:
forall s. ST s (VSM.MVector s Float)
at src/Interpolation.hs:(60,30)-(63,10)
Expected type: ST s (VSM.IOVector Float)
Actual type: ST s (VSM.MVector (PrimState (ST s)) Float)
Any idea, how I can convert the unsafeWith to the correct Monad? I saw that IO and ST are both Prim-Monads, so it should be possible to convert them, right?

For FFI applications, it's usually simplest not to bother with ST at all and do everything in IO. If you must have a pure interface (and your FFI function actually is pure in the way needed), you can call unsafePerformIO to give a pure interface.
However, I think I would shy away from writing withCarray as you have done here. Abstraction is nice, but this is too easy to accidentally misuse by passing a not-suitably-behaving callback. If you must have it, at the very least name it unsafeWithCarray and leave a Haddock explicitly stating under what circumstances it is safe.

Related

How to understand `MonadUnliftIO`'s requirement of "no stateful monads"?

I've looked over https://www.fpcomplete.com/blog/2017/06/tale-of-two-brackets, though skimming some parts, and I still don't quite understand the core issue "StateT is bad, IO is OK", other than vaguely getting the sense that Haskell allows one to write bad StateT monads (or in the ultimate example in the article, MonadBaseControl instead of StateT, I think).
In the haddocks, the following law must be satisfied:
askUnliftIO >>= (\u -> liftIO (unliftIO u m)) = m
So this appears to be saying that state is not mutated in the monad m when using askUnliftIO. But to my mind, in IO, the entire world can be the state. I could be reading and writing to a text file on disk, for instance.
To quote another article by Michael,
False purity We say WriterT and StateT are pure, and technically they
are. But let's be honest: if you have an application which is entirely
living within a StateT, you're not getting the benefits of restrained
mutation that you want from pure code. May as well call a spade a
spade, and accept that you have a mutable variable.
This makes me think this is indeed the case: with IO we are being honest, with StateT, we are not being honest about mutability ... but that seems another issue than what the law above is trying to show; after all, MonadUnliftIO is assuming IO. I'm having trouble understanding conceptually how IO is more restrictive than something else.
Update 1
After sleeping (some), I am still confused but am gradually getting less so as the day wears on. I worked out the law proof for IO. I realized the presence of id in the README. In particular,
instance MonadUnliftIO IO where
askUnliftIO = return (UnliftIO id)
So askUnliftIO would appear to return an IO (IO a) on an UnliftIO m.
Prelude> fooIO = print 5
Prelude> :t fooIO
fooIO :: IO ()
Prelude> let barIO :: IO(IO ()); barIO = return fooIO
Prelude> :t barIO
barIO :: IO (IO ())
Back to the law, it really appears to be saying that state is not mutated in the monad m when doing a round trip on the transformed monad (askUnliftIO), where the round trip is unLiftIO -> liftIO.
Resuming the example above, barIO :: IO (), so if we do barIO >>= (u -> liftIO (unliftIO u m)), then u :: IO () and unliftIO u == IO (), then liftIO (IO ()) == IO (). **So since everything has basically been applications of id under the hood, we can see that no state was changed, even though we are using IO. Crucially, I think, what is important is that the value in a is never run, nor is any other state modified, as a result of using askUnliftIO. If it did, then like in the case of randomIO :: IO a, we would not be able to get the same value had we not run askUnliftIO on it. (Verification attempt 1 below)
But, it still seems like we could do the same for other Monads, even if they do maintain state. But I also see how, for some monads, we may not be able to do so. Thinking of a contrived example: each time we access the value of type a contained in the stateful monad, some internal state is changed.
Verification attempt 1
> fooIO >> askUnliftIO
5
> fooIOunlift = fooIO >> askUnliftIO
> :t fooIOunlift
fooIOunlift :: IO (UnliftIO IO)
> fooIOunlift
5
Good so far, but confused about why the following occurs:
> fooIOunlift >>= (\u -> unliftIO u)
<interactive>:50:24: error:
* Couldn't match expected type `IO b'
with actual type `IO a0 -> IO a0'
* Probable cause: `unliftIO' is applied to too few arguments
In the expression: unliftIO u
In the second argument of `(>>=)', namely `(\ u -> unliftIO u)'
In the expression: fooIOunlift >>= (\ u -> unliftIO u)
* Relevant bindings include
it :: IO b (bound at <interactive>:50:1)
"StateT is bad, IO is OK"
That's not really the point of the article. The idea is that MonadBaseControl permits some confusing (and often undesirable) behaviors with stateful monad transformers in the presence of concurrency and exceptions.
finally :: StateT s IO a -> StateT s IO a -> StateT s IO a is a great example. If you use the "StateT is attaching a mutable variable of type s onto a monad m" metaphor, then you might expect that the finalizer action gets access to the most recent s value when an exception was thrown.
forkState :: StateT s IO a -> StateT s IO ThreadId is another one. You might expect that the state modifications from the input would be reflected in the original thread.
lol :: StateT Int IO [ThreadId]
lol = do
for [1..10] $ \i -> do
forkState $ modify (+i)
You might expect that lol could be rewritten (modulo performance) as modify (+ sum [1..10]). But that's not right. The implementation of forkState just passes the initial state to the forked thread, and then can never retrieve any state modifications. The easy/common understanding of StateT fails you here.
Instead, you have to adopt a more nuanced view of StateT s m a as "a transformer that provides a thread-local immutable variable of type s which is implicitly threaded through a computation, and it is possible to replace that local variable with a new value of the same type for future steps of the computation." (more or less a verbose english retelling of the s -> m (a, s)) With this understanding, the behavior of finally becomes a bit more clear: it's a local variable, so it does not survive exceptions. Likewise, forkState becomes more clear: it's a thread-local variable, so obviously a change to a different thread won't affect any others.
This is sometimes what you want. But it's usually not how people write code IRL and it often confuses people.
For a long time, the default choice in the ecosystem to do this "lowering" operation was MonadBaseControl, and this had a bunch of downsides: hella confusing types, difficult to implement instances, impossible to derive instances, sometimes confusing behavior. Not a great situation.
MonadUnliftIO restricts things to a simpler set of monad transformers, and is able to provide relatively simple types, derivable instances, and always predictable behavior. The cost is that ExceptT, StateT, etc transformers can't use it.
The underlying principle is: by restricting what is possible, we make it easier to understand what might happen. MonadBaseControl is extremely powerful and general, and quite difficult to use and confusing as a result. MonadUnliftIO is less powerful and general, but it's much easier to use.
So this appears to be saying that state is not mutated in the monad m when using askUnliftIO.
This isn't true - the law is stating that unliftIO shouldn't do anything with the monad transformer aside from lowering it into IO. Here's something that breaks that law:
newtype WithInt a = WithInt (ReaderT Int IO a)
deriving newtype (Functor, Applicative, Monad, MonadIO, MonadReader Int)
instance MonadUnliftIO WithInt where
askUnliftIO = pure (UnliftIO (\(WithInt readerAction) -> runReaderT 0 readerAction))
Let's verify that this breaks the law given: askUnliftIO >>= (\u -> liftIO (unliftIO u m)) = m.
test :: WithInt Int
test = do
int <- ask
print int
pure int
checkLaw :: WithInt ()
checkLaw = do
first <- test
second <- askUnliftIO >>= (\u -> liftIO (unliftIO u test))
when (first /= second) $
putStrLn "Law violation!!!"
The value returned by test and the askUnliftIO ... lowering/lifting are different, so the law is broken. Furthermore, the observed effects are different, which isn't great either.

Sequence partially-applied Monadic Actions

I'm aware this is probably obvious, but my hoogle-fu is failing me here. I have a list of actions of type:
import Data.Vector.Mutable (STVector)
[STVector s a -> ST s ()]
That is, a set of actions that take a starting MVector and mutate it in some way
I also have a starting vector
import Data.Vector (Vector, thaw, freeze)
v :: Vector a
After thawing v, how do I sequence these actions into a final result?
doIt :: forall s. Vector a -> [STVector s a -> ST s ()] -> Vector a
doIt v ops = runST $ do
v' <- thaw v
-- do all the things to v'
unfreeze v'
If context helps, I'm attempting Advent of Code's day 16 puzzle (part 2), so ops is a long list of mutations that I actually need to run through literally a billion times. I expected to be able to use replicateM_ to do this, but can't see how to provide a starting value. I similarly thought foldM_ would work, but I can't seem to get it to typecheck. Perhaps I'm doing something fundamentally wrong? I can't say I understand the ST monad backwards and forwards yet.
The operation you're looking for is traverse_. It visits each value in a data structure, applies a function with a monadic return type, and sequences their effects together while discarding their results. The "discarding their results" part is important. It means that the return type of traverse_ in this case will be ST s () instead of ST s [()]. The difference there is significant. It means that the operation won't build up a giant list of () values to eventually throw away.
The function you want to pass to traverse_ is the one that means "apply a function to v'". That could be written as \f -> f v', but there's a shorter version that's idiomatic and uses the $ operator. Note that you can rewrite the lambda above as \f -> f $ v', which can be rewritten as the section ($ v').
So you have traverse_ ($ v') ops, which is the correct operation, but it still won't type check. That's because of the type of runST, which requires that the type s in ST s be polymorphic. With your type signature as written, s is chosen by the caller of doIt. To make this work, you need to enable the RankNTypes extension by making sure {-# LANGUAGE RankNTypes #-} is at the top of the file, and then change the type to Vector a -> (forall s. [STVector s a -> ST s ()]) -> Vector a. This narrowing of the scope of the s type variable requires that s be polymorphic in the value passed to doIt. With that restriction in place, runST will type check successfully with the operation above.
For more information on why runST has such a funny type, see Lazy Functional State Threads, the paper which introduced the ST type and showed how it can be used to constrain mutability inside an externally-pure interface.

Unwrapping the STT monad in a transformer stack?

This question is apparently related to the problem discussed here and here. Unfortunately, my requirement is slightly different to those questions, and the answers given don't apply to me. I also don't really understand why runST fails to type check in these cases, which doesn't help.
My problem is this, I have one section of code that uses one monad stack, or rather a single monad:
import Control.Monad.Except
type KErr a = Except KindError a
Another section of code needs to integrate with this that wraps this within the STT monad:
type RunM s a = STT s (Except KindError) a
At the interface between these sections, I clearly need to wrap and unwrap the outer layer. I've got the following function to work in the KErr -> RunM direction:
kerrToRun :: KErr a -> RunM s a
kerrToRun e = either throwError return $ runExcept e
but for some reason I just can't get the converse to type check:
runToKErr :: RunM s a -> KErr a
runToKErr r = runST r
I'm working on the assumption that as the RunM's inner monad has the same structure as KErr, I can just return it once I've unwrapped the STT layer, but I don't seem to be able to get that far, as runST complains about its type arguments:
src/KindLang/Runtime/Eval.hs:18:21:
Couldn't match type ‘s’ with ‘s1’
‘s’ is a rigid type variable bound by
the type signature for runToKErr :: RunM s a -> KErr a
at src/KindLang/Runtime/Eval.hs:17:14
‘s1’ is a rigid type variable bound by
a type expected by the context:
STT s1 (ExceptT KindError Data.Functor.Identity.Identity) a
at src/KindLang/Runtime/Eval.hs:18:15
Expected type: STT
s1 (ExceptT KindError Data.Functor.Identity.Identity) a
Actual type: RunM s a
I've also tried:
runToKErr r = either throwError return $ runExcept $ runST r
in order to more strongly isolate runST from its expected return type, in case that was the cause of the issue, but the results are the same.
Where is this s1 type coming from, and how do I persuade ghc that it's the same type as s?
(The below talks about ST s a but applies just the same as STT s m a; I've just avoided the unnecessary complication of talking about the transformer version below)
The problem you are seeing is that runST has type (forall s. ST s a) -> a to isolate any potential (STRef-changing) effects of the computation from the outside, pure world. The whole point of the s phantom type that all ST computations, STRefs, etc. are tagged with, is to track which "ST-domain" they belong to; and the type of runST ensures that nothing can pass between domains.
You can write runToKErr by enforcing the same invariant:
{-# language Rank2Types #-}
runToKErr :: (forall s. RunM s a) -> KErr a
runToKErr = runST
(of course, you might realize further down the road that this restriction is too strong for the program you hope to write; at that point you'll need to lose hope, sorry, I mean you'll need to redesign your program.)
As for the error message, the reason you can't "convince the type checker that s1 and s are the same type" is because if I pass you an ST s a for a given choice of s and a, that is not the same as giving you something which allows you to choose your own s. GHC chose s1 (a Skolemized variable) as s and so tries to unify ST s a with ST s1 a

STRef and phantom types

Does s in STRef s a get instantiated with a concrete type? One could easily imagine some code where STRef is used in a context where the a takes on Int. But there doesn't seem to be anything for the type inference to give s a concrete type.
Imagine something in pseudo Java like MyList<S, A>. Even if S never appeared in the implementation of MyList instantiating a concrete type like MyList<S, Integer> where a concrete type is not used in place of S would not make sense. So how can STRef s a work?
tl;dr - in practice it seems it always gets initialised to RealWorld in the end
The source notes that s can be instantiated to RealWorld inside invocations of stToIO, but is otherwise uninstantiated:
-- The s parameter is either
-- an uninstantiated type variable (inside invocations of 'runST'), or
-- 'RealWorld' (inside invocations of 'Control.Monad.ST.stToIO').
Looking at the actual code for ST however it seems runST uses a specific value realWorld#:
newtype ST s a = ST (STRep s a)
type STRep s a = State# s -> (# State# s, a #)
runST :: (forall s. ST s a) -> a
runST st = runSTRep (case st of { ST st_rep -> st_rep })
runSTRep :: (forall s. STRep s a) -> a
runSTRep st_rep = case st_rep realWorld# of
(# _, r #) -> r
realWorld# is defined as a magic primitive inside the GHC source code:
realWorldName = mkWiredInIdName gHC_PRIM (fsLit "realWorld#")
realWorldPrimIdKey realWorldPrimId
realWorldPrimId :: Id -- :: State# RealWorld
realWorldPrimId = pcMiscPrelId realWorldName realWorldStatePrimTy
(noCafIdInfo `setUnfoldingInfo` evaldUnfolding
`setOneShotInfo` stateHackOneShot)
You can also confirm this in ghci:
Prelude> :set -XMagicHash
Prelude> :m +GHC.Prim
Prelude GHC.Prim> :t realWorld#
realWorld# :: State# RealWorld
From your question I can not see if you understand why the phantom s type is there at all. Even if you did not ask for this explicitly, let me elaborate on that.
The role of the phantom type
The main use of the phantom type is to constrain references (aka pointers) to stay "inside" the ST monad. Roughly, the dynamically allocated data must end its life when runST returns.
To see the issue, let's pretend that the type of runST were
runST :: ST s a -> a
Then, consider this:
data Dummy
let var :: STRef Dummy Int
var = runST (newSTRef 0)
change :: () -> ()
change = runST (modifySTRef var succ)
access :: () -> Int
result :: (Int, ())
result = (access() , change())
in result
(Above I added a few useless () arguments to make it similar to imperative code)
Now what should be the result of the code above? It could be either (0,()) or (1,()) depending on the evaluation order. This is a big no-no in the pure Haskell world.
The issue here is that var is a reference which "escaped" from its runST. When you escape the ST monad, you are no longer forced to use the monad operator >>= (or equivalently, the do notation to sequentialize the order of side effects. If references are still around, then we can still have side effects around when there should be none.
To avoid the issue, we restrict runST to work on ST s a where a does not depend on s. Why this? Because newSTRef returns a STRef s a, a reference to a tagged with the phantom type s, hence the return type depends on s and can not be extracted from the ST monad through runST.
Technically, this restriction is done by using a rank-2 type:
runST :: (forall s. ST s a) -> a
the "forall" here is used to implement the restriction. The type is saying: choose any a you wish, then provide a value of type ST s a for any s I wish, then I will return an a. Mind that s is chosen by runST, not by the caller, so it could be absolutely anything. So, the type system will accept an application runST action only if action :: forall s. ST s a where s is unconstrained, and a does not involve s (recall that the caller has to choose a before runST chooses s).
It is indeed a slightly hackish trick to implement the independence constraint, but it does work.
On the actual question
To connect this to your actual question: in the implementation of runST, s will be chosen to be any concrete type. Note that, even if s were simply chosen to be Int inside runST it would not matter much, because the type system has already constrained a to be independent from s, hence to be reference-free. As #Ganesh pointed out, RealWorld is the type used by GHC.
You also mentioned Java. One could attempt to play a similar trick in Java as follows: (warning, overly simplified code follows)
interface ST<S,A> { A call(); }
interface STAction<A> { <S> ST<S,A> call(S dummy); }
...
<A> A runST(STAction<A> action} {
RealWorld dummy = new RealWorld();
return action.call(dummy).call();
}
Above in STAction parameter A can not depend on S.

Signature of IO in Haskell (is this class or data?)

The question is not what IO does, but how is it defined, its signature. Specifically, is this data or class, is "a" its type parameter then? I didn't find it anywhere. Also, I don't understand the syntactic meaning of this:
f :: IO a
You asked whether IO a is a data type: it is. And you asked whether the a is its type parameter: it is. You said you couldn't find its definition. Let me show you how to find it:
localhost:~ gareth.rowlands$ ghci
GHCi, version 7.6.3: http://www.haskell.org/ghc/ :? for help
Prelude> :i IO
newtype IO a
= GHC.Types.IO (GHC.Prim.State# GHC.Prim.RealWorld
-> (# GHC.Prim.State# GHC.Prim.RealWorld, a #))
-- Defined in `GHC.Types'
instance Monad IO -- Defined in `GHC.Base'
instance Functor IO -- Defined in `GHC.Base'
Prelude>
In ghci, :i or :info tells you about a type. It shows the type declaration and where it's defined. You can see that IO is a Monad and a Functor too.
This technique is more useful on normal Haskell types - as others have noted, IO is magic in Haskell. In a typical Haskell type, the type signature is very revealing but the important thing to know about IO is not its type declaration, rather that IO actions actually perform IO. They do this in a pretty conventional way, typically by calling the underlying C or OS routine. For example, Haskell's putChar action might call C's putchar function.
IO is a polymorphic type (which happens to be an instance of Monad, irrelevant here).
Consider the humble list. If we were to write our own list of Ints, we might do this:
data IntList = Nil | Cons { listHead :: Int, listRest :: IntList }
If you then abstract over what element type it is, you get this:
data List a = Nil | Cons { listHead :: a, listRest :: List a }
As you can see, the return value of listRest is List a. List is a polymorphic type of kind * -> *, which is to say that it takes one type argument to create a concrete type.
In a similar way, IO is a polymorphic type with kind * -> *, which again means it takes one type argument. If you were to define it yourself, it might look like this:
data IO a = IO (RealWorld -> (a, RealWorld))
(definition courtesy of this answer)
The amount of magic in IO is grossly overestimated: it has some support from compiler and runtime system, but much less than newbies usually expect.
Here is the source file where it is defined:
http://www.haskell.org/ghc/docs/latest/html/libraries/ghc-prim-0.3.0.0/src/GHC-Types.html
newtype IO a
= IO (State# RealWorld -> (# State# RealWorld, a #))
It is just an optimized version of state monad. If we remove optimization annotations we will see:
data IO a = IO (Realworld -> (Realworld, a))
So basically IO a is a data structure storing a function that takes old real world and returns new real world with io operation performed and a.
Some compiler tricks are necessary mostly to remove Realworld dummy value efficiently.
IO type is an abstract newtype - constructors are not exported, so you cannot bypass library functions, work with it directly and perform nasty things: duplicate RealWorld, create RealWorld out of nothing or escape the monad (write a function of IO a -> a type).
Since IO can be applied to objects of any type a, as it is a polymorphic monad, a is not specified.
If you have some object with type a, then it can be 'wrappered' as an object of type IO a, which you can think of as being an action that gives an object of type a. For example, getChar is of type IO Char, and so when it is called, it has the side effect of (From the program's perspective) generating a character, which comes from stdin.
As another example, putChar has type Char -> IO (), meaning that it takes a char, and then performs some action that gives no output (in the context of the program, though it will print the char given to stdout).
Edit: More explanation of monads:
A monad can be thought of as a 'wrapper type' M, and has two associated functions:
return and >>=.
Given a type a, it is possible to create objects of type M a (IO a in the case of the IO monad), using the return function.
return, therefore, has type a -> M a. Moreover, return attempts not to change the element that it is passed -- if you call return x, you will get a wrappered version of x that contains all of the information of x (Theoretically, at least. This doesn't happen with, for example, the empty monad.)
For example, return "x" will yield an M Char. This is how getChar works -- it yields an IO Char using a return statement, which is then pulled out of its wrapper with <-.
>>=, read as 'bind', is more complicated. It has type M a -> (a -> M b) -> M b, and its role is to take a 'wrappered' object, and a function from the underlying type of that object to another 'wrappered' object, and apply that function to the underlying variable in the first input.
For example, (return 5) >>= (return . (+ 3)) will yield an M Int, which will be the same M Int that would be given by return 8. In this way, any function that can be applied outside of a monad can also be applied inside of it.
To do this, one could take an arbitrary function f :: a -> b, and give the new function g :: M a -> M b as follows:
g x = x >>= (return . f)
Now, for something to be a monad, these operations must also have certain relations -- their definitions as above aren't quite enough.
First: (return x) >>= f must be equivalent to f x. That is, it must be equivalent to perform an operation on x whether it is 'wrapped' in the monad or not.
Second: x >>= return must be equivalent to m. That is, if an object is unwrapped by bind, and then rewrapped by return, it must return to its same state, unchanged.
Third, and finally (x >>= f) >>= g must be equivalent to x >>= (\y -> (f y >>= g) ). That is, function binding is associative (sort of). More accurately, if two functions are bound successively, this must be equivalent to binding the combination thereof.
Now, while this is how monads work, it's not how it's most commonly used, because of the syntactic sugar of do and <-.
Essentially, do begins a long chain of binds, and each <- sort of creates a lambda function that gets bound.
For example,
a = do x <- something
y <- function x
return y
is equivalent to
a = something >>= (\x -> (function x) >>= (\y -> return y))
In both cases, something is bound to x, function x is bound to y, and then y is returned to a in the wrapper of the relevant monad.
Sorry for the wall of text, and I hope it explains something. If there's more you need cleared up about this, or something in this explanation is confusing, just ask.
This is a very good question, if you ask me. I remember being very confused about this too, maybe this will help...
'IO' is a type constructor, 'IO a' is a type, the 'a' (in 'IO a') is an type variable. The letter 'a' carries no significance, the letter 'b' or 't1' could have been used just as well.
If you look at the definition of the IO type constructor you will see that it is a newtype defined as: GHC.Types.IO (GHC.Prim.State# GHC.Prim.RealWorld -> (# GHC.Prim.State# GHC.Prim.RealWorld, a #))
'f :: IO a' is the type of a function called 'f' of apparently no arguments that returns a result of some unconstrained type in the IO monad. 'in the IO monad' means that f can do some IO (i.e. change the 'RealWorld', where 'change' means replace the provided RealWorld with a new one) while computing its result. The result of f is polymorphic (that's a type variable 'a' not a type constant like 'Int'). A polymorphic result means that in your program it's the caller that determines the type of the result, so used in one place f could return an Int, used in another place it could return a String. 'Unconstrained' means that there's no type class restricting what type can be returned and so any type can be returned.
Why is 'f' a function and not a constant since there are no parameters and Haskell is pure? Because the definition of IO means that 'f :: IO a' could have been written 'f :: GHC.Prim.State# GHC.Prim.RealWorld -> (# GHC.Prim.State# GHC.Prim.RealWorld, a #)' and so in fact has a parameter -- the 'state of the real world'.
In the data IO a a have mainly the same meaning as in Maybe a.
But we can't rid of a constructor, like:
fromIO :: IO a -> a
fromIO (IO a) = a
Fortunately we could use this data in Monads, like:
{-# LANGUAGE ScopedTypeVariables #-}
foo = do
(fromIO :: a) <- (dataIO :: IO a)
...

Resources