Keeping IO lazy under append

Keeping IO lazy under append - haskell

I may have been under the false impression that Haskell is lazier than it is, but I wonder if there's a way to get the best of both worlds...
Data.Monoid and Data.Semigroup define two variations of First. The monoidal version models the leftmost, non-empty value, whereas the semigroup version simply models the leftmost value.
This works fine for pure value values, but consider impure values:
x = putStrLn "x" >> return 42
y = putStrLn "y" >> return 1337
Both of these values have the type Num a => IO a. IO a is a Semigroup instance when a is:
instance Semigroup a => Semigroup (IO a)
-- Defined in `Data.Orphans'
This means that it's possible to combine two IO (First a) values:
Prelude Data.Semigroup Data.Orphans> fmap First x <> fmap First y
x
y
First {getFirst = 42}
As we can see, though, both x and y produce their respective side-effects, even though y is never required.
The same applies for Data.Monoid:
Prelude Data.Monoid> fmap (First . Just) x <> fmap (First . Just) y
x
y
First {getFirst = Just 42}
I think I understand why this happens, given that both the Semigroup and Monoid instances use liftA2, which seems to ultimately be based on IO bind, which is strict, as far as I understand it.
If I dispense with the First abstraction(s), however, I can get lazier evaluation:
first x _ = x
mfirst x y = do
x' <- x
case x' of
(Just _) -> return x'
Nothing -> y
Using both of those ignores y:
Prelude> first x y
x
42
Prelude> mfirst (fmap Just x) (fmap Just y)
x
Just 42
In both of these cases, y isn't printed.
My question is, then:
Can I get the best of both worlds? Is there a way that I can retain the Semigroup or Monoid abstraction, while still get lazy IO?
Is there, for example, some sort of LazyIO container that I can wrap First values in, so that I get the lazy IO I'd like to have?
The actual scenario I'm after is that I'd like to query a prioritised list of IO resources for data, and use the first one that gives me a useful response. I don't, however, want to perform redundant queries (for performance reasons).

The Alternative instance for the MaybeT monad transformer returns the first successful result, and does not execute the rest of the operations. In combination with the asum function, we can write something like:
import Data.Foldable (asum)
import Control.Applicative
import Control.Monad.Trans.Maybe
action :: Char -> IO Char
action c = putChar c *> return c
main :: IO ()
main = do
result <- runMaybeT $ asum $ [ empty
, MaybeT $ action 'x' *> return Nothing
, liftIO $ action 'v'
, liftIO $ action 'z'
]
print result
where the final action 'z' won't be executed.
We can also write a newtype wrapper with a Monoid instance which mimics the Alternative:
newtype FirstIO a = FirstIO (MaybeT IO a)
firstIO :: IO (Maybe a) -> FirstIO a
firstIO ioma = FirstIO (MaybeT ioma)
getFirstIO :: FirstIO a -> IO (Maybe a)
getFirstIO (FirstIO (MaybeT ioma)) = ioma
instance Monoid (FirstIO a) where
mempty = FirstIO empty
FirstIO m1 `mappend` FirstIO m2 = FirstIO $ m1 <|> m2
The relationship between Alternative and Monoid is explained in this other SO question.

Is there a way that I can retain the Semigroup or Monoid abstraction, while still get lazy IO?
Somewhat, but there are drawbacks. The udnerlying problem for our instances is that a generic instance for an Applicative will look like
instance Semigroup a => Semigroup (SomeApplicative a) where
x <> y = (<>) <$> x <*> y
We're here at the mercy of (<*>), and usually the second argument y will be at least in WHNF. For example in Maybe's implementation the first line will work fine and the second line will error:
liftA2 (<>) Just (First 10) <> Just (error "never shown")
liftA2 (<>) Just (First 10) <> error "fire!"
IO's (<*>) is implemented in terms of ap, so the second action will always be executed before <> is applied.
A First-like variant is possible with ExceptT or similar, essentially any data type that has a Left k >>= _ = Left k like case so that we can stop the computation at that point. Although ExceptT is meant for exceptions, it may work well for your use-case. Alternatively, one of the Alternative transformers (MaybeT, ExceptT) together with <|> instead of <> might suffice.
A almost completely lazy IO type is also possible, but must be handled with care:
import Control.Applicative (liftA2)
import System.IO.Unsafe (unsafeInterleaveIO)
newtype LazyIO a = LazyIO { runLazyIO :: IO a }
instance Functor LazyIO where
fmap f = LazyIO . fmap f . runLazyIO
instance Applicative LazyIO where
pure = LazyIO . pure
f <*> x = LazyIO $ do
f' <- unsafeInterleaveIO (runLazyIO f)
x' <- unsafeInterleaveIO (runLazyIO x)
return $ f' x'
instance Monad LazyIO where
return = pure
f >>= k = LazyIO $ runLazyIO f >>= runLazyIO . k
instance Semigroup a => Semigroup (LazyIO a) where
(<>) = liftA2 (<>)
instance Monoid a => Monoid (LazyIO a) where
mempty = pure mempty
mappend = liftA2 mappend
unsafeInterleaveIO will enable the behaviour you want (and is used in getContents and other lazy IO Prelude functions), but it must be used with care. The order of IO operations is completely off at that point. Only when we inspect the values we will trigger the original IO:
ghci> :module +Data.Monoid Control.Monad
ghci> let example = fmap (First . Just) . LazyIO . putStrLn $ "example"
ghci> runLazyIO $ fmap mconcat $ replicateM 100 example
First {getFirst = example
Just ()}
Note that we only got our example once in the output, but at a completely random place, since the putStrLn "example" and print result got interleaved, since
print (First x) = putStrLn (show (First x))
= putStrLn ("First {getFirst = " ++ show x ++ "}")
and show x will finally put the IO necessary to get x in action. The action will get called only once if we use the result multiple times:
ghci> :module +Data.Monoid Control.Monad
ghci> let example = fmap (First . Just) . LazyIO . putStrLn $ "example"
ghci> result <- runLazyIO $ fmap mconcat $ replicateM 100 example
ghci> result
First {getFirst = example
Just ()}
ghci> result
First {getFirst = Just ()}
You could write a finalizeLazyIO function that either evaluates or seq's x though:
finalizeLazyIO :: LazyIO a -> IO a
finalizeLazyIO k = do
x <- runLazyIO k
x `seq` return x
If you were to publish a module with this functions, I'd recommend to only export the type constructor LazyIO, liftIO :: IO a -> LazyIO a and finalizeLazyIO.

Related

Standard combinator to get first "non-empty" value from a set of monadic actions

I'm sure I am missing something very obvious here. Here's what I'm trying to achieve at a conceptual level:
action1 :: (MonadIO m) => m [a]
action1 = pure []
action2 :: (MonadIO m) => m [a]
action2 = pure [1, 2, 3]
action3 :: (MonadIO m) => m [a]
action3 = error "should not get evaluated"
someCombinator [action1, action2, action3] == m [1, 2, 3]
Does this hypothetical someCombinator exist? I have tried playing round with <|> and msum but couldn't get what I want.
I guess, this could be generalised in two ways:
-- Will return the first monadic value that is NOT an mempty
-- (should NOT blindly execute all monadic actions)
-- This is something like the msum function
someCombinator :: (Monoid a, Monad m, Traversable t, Eq a) => t m a -> m a
-- OR
-- this is something like the <|> operator
someCombinator :: (Monad m, Alternative f) => m f a -> m f a -> m f a

I'm not aware of a library that provides this, but it's not hard to implement:
someCombinator :: (Monoid a, Monad m, Foldable t, Eq a) => t (m a) -> m a
someCombinator = foldr f (pure mempty)
where
f item next = do
a <- item
if a == mempty then next else pure a
Note that you don't even need Traversable: Foldable is enough.

On an abstract level, the first non-empty value is a Monoid called First. It turns out, however, that if you just naively lift your IO values into First, you'll have a problem with action3, since the default monoidal append operation is strict under IO.
You can get lazy monoidal computation using the FirstIO type from this answer. It's not going to be better than Fyodor Soikin's answer, but it highlights (I hope) how you can compose behaviour from universal abstractions.
Apart from the above-mentioned FirstIO wrapper, you may find this function useful:
guarded :: Alternative f => (a -> Bool) -> a -> f a
guarded p x = if p x then pure x else empty
I basically just copied it from Protolude since I couldn't find one in base that has the desired functionality. You can use it to wrap your lists in Maybe so that they'll fit with FirstIO:
> guarded (not . null) [] :: Maybe [Int]
Nothing
> guarded (not . null) [1, 2, 3] :: Maybe [Int]
Just [1,2,3]
Do this for each action in your list of actions and wrap them in FirstIO.
> :t (firstIO . fmap (guarded (not . null))) <$> [action1, action2, action3]
(firstIO . fmap (guarded (not . null))) <$> [action1, action2, action3]
:: Num a => [FirstIO [a]]
In the above GHCi snippet, I'm only showing the type with :t. I can't show the value, since FirstIO has no Show instance. The point, however, is that you now have a list of FirstIO values from which mconcat will pick the first non-empty value:
> getFirstIO $ mconcat $ (firstIO . fmap (guarded (not . null))) <$> [action1, action2, action3]
Just [1,2,3]
If you want to unpack the Maybe, you can use fromMaybe from Data.Maybe:
answer :: IO [Integer]
answer =
fromMaybe [] <$>
(getFirstIO $ mconcat $ (firstIO . fmap (guarded (not . null))) <$> [action1, action2, action3])
This is clearly more complex than Fyodor Soikin's answer, but I'm fascinated by how Haskell enables you to assembling desired functionality by sort of 'clicking together' existing things, almost like Lego bricks.
So, to the question of does this combinator already exist? the answer is that it sort of does, but some assembly is required.

IO monad prevents short circuiting of embedded mapM?

Somewhat mystified by the following code. In non-toy version of the problem I'm trying to do a monadic computation in a monad Result, the values of which can only be constructed from within IO. Seems like the magic behind IO makes such computations strict, but I can't figure out how exactly that happens.
The code:
data Result a = Result a | Failure deriving (Show)
instance Functor Result where
fmap f (Result a) = Result (f a)
fmap f Failure = Failure
instance Applicative Result where
pure = return
(<*>) = ap
instance Monad Result where
return = Result
Result a >>= f = f a
Failure >>= _ = Failure
compute :: Int -> Result Int
compute 3 = Failure
compute x = traceShow x $ Result x
compute2 :: Monad m => Int -> m (Result Int)
compute2 3 = return Failure
compute2 x = traceShow x $ return $ Result x
compute3 :: Monad m => Int -> m (Result Int)
compute3 = return . compute
main :: IO ()
main = do
let results = mapM compute [1..5]
print $ results
results2 <- mapM compute2 [1..5]
print $ sequence results2
results3 <- mapM compute3 [1..5]
print $ sequence results3
let results2' = runIdentity $ mapM compute2 [1..5]
print $ sequence results2'
The output:
1
2
Failure
1
2
4
5
Failure
1
2
Failure
1
2
Failure

Nice test cases. Here's what's happening:
in mapM compute we see laziness at work, as usual. No surprise here.
in mapM compute2 we work inside the IO monad, whose mapM definition will demand the whole list: unlike Result which skips the tail of the list as soon as Failure is found, IO will always scan the whole list. Note the code:
compute2 x = traceShow x $ return $ Result x
So, the above wil print the debug message as soon as each element of the list of IO actions is accessed. All are, so we print everything.
in mapM compute3 we now use, roughly:
compute3 x = return $ traceShow x $ Result x
Now, since return in IO is lazy, it will not trigger the traceShow when returning the IO action. So, when mapM compute3 is run, no message is seen. Instead, we see messages only when sequence results3 is run, which forces the Result -- not all of them, but only as much as needed.
the final Identity example is also quite tricky. Note this:
> newtype Id1 a = Id1 a
> data Id2 a = Id2 a
> Id1 (trace "hey!" True) `seq` 42
hey!
42
> Id2 (trace "hey!" True) `seq` 42
42
when using a newtype, at runtime there is no boxing/unboxing (AKA lifting) involved, so forcing a Id1 x value causes x to be forced. With data types this does not happen: the value is wrapped in a box (e.g. Id2 undefined is not equivalent to undefined).
In your example, you add an Identity constructor, but that is from the newtype Identity!! So, when calling
return $ traceShow x $ Result x
the return here does not wrap anything, and the traceShow is immediately triggered as soon as mapM is run.

Your Result type appears to be virtually identical to Maybe, with
Result <-> Just
Failure <-> Nothing
For the sake of my poor brain, I'll stick to Maybe terminology in the rest of this answer.
chi explained why IO (Maybe a) does not short-circuit the way you expected. But there is a type you can use for this sort of thing! It's essentially the same type, in fact, but with a different Monad instance. You can find it in Control.Monad.Trans.Maybe. It looks something like this:
newtype MaybeT m a = MaybeT
{ runMaybeT :: m (Maybe a) }
As you can see, this is just a newtype wrapper around m (Maybe a). But its Monad instance is very different:
instance Monad m => Monad (MaybeT m) where
return a = MaybeT $ return (Just a)
m >>= f = MaybeT $ do
mres <- runMaybeT m
case mres of
Nothing -> return Nothing
Just a -> runMaybeT (f a)
That is, m >>= f runs the m computation in the underlying monad, getting Maybe something or other. If it gets Nothing, it just stops, returning Nothing. If it gets something, it passes that to f and runs the result. You can also turn any m action into a "successful" MaybeT m action using lift from Control.Monad.Trans.Class:
class MonadTrans t where
lift :: Monad m => m a -> t m a
instance MonadTrans MaybeT where
lift m = MaybeT $ Just <$> m
You can also use this class, defined somewhere like Control.Monad.IO.Class, which is often clearer and can be much more convenient:
class MonadIO m where
liftIO :: IO a -> m a
instance MonadIO IO where
liftIO m = m
instance MonadIO m => MonadIO (MaybeT m) where
liftIO m = lift (liftIO m)

Applicative that increments the environment in haskell

I'm wondering if there is an Applicative that can track how many applicative operations have occurred. I tried to implement it as follows:
import Control.Applicative
main :: IO ()
main = print $ run 1 $ (,,,) <$> FromInt id <*> FromInt id <*> FromInt id <*> FromInt id
data FromInt a = FromInt (Int -> a)
run :: Int -> FromInt a -> a
run i (FromInt f) = f i
instance Functor FromInt where
fmap g (FromInt f) = FromInt (g . f)
instance Applicative FromInt where
pure a = FromInt (const a)
FromInt f <*> FromInt g = FromInt (\i -> f i (g (i + 1)))
But, this of course does not work. If we call runhaskell on the file, we get this:
(1,2,2,2)
And what I want is this:
(1,2,3,4)
I've seen people accomplish this effect by pushing the requirement to increment into the actual data (this is how yesod-forms does its formlet-style implementation). This more-or-less uses a variation on State, and it allows people to break assumed invariants if they don't use particular helper functions (I think the yesod one is called mhelper). I want to know if the incrementing can be pulled into the applicative instance as I have tried to do. This would make violating this particular invariant impossible.

(,) a is an Applicative when a is a Monoid. We can compose (,) (Sum Int) with other applicative using Data.Functor.Compose, and get an applicative that lets us estimate the "cost" assigned to a computation before running it.
To count the steps, we need a lift function from the base applicative that always assigns a cost of 1:
module Main where
import Data.Monoid
import Control.Applicative
import Data.Functor.Compose
type CountedIO a = Compose ((,) (Sum Int)) IO a
-- lift from IO
step :: IO a -> CountedIO a
step cmd = Compose (Sum 1, cmd)
countSteps :: CountedIO a -> Int
countSteps = getSum . fst . getCompose
exec :: CountedIO a -> IO a
exec = snd . getCompose
program :: CountedIO ()
program = step (putStrLn "aaa") *> step (putStrLn "bbb") *> step (putStrLn "ccc")
main :: IO ()
main = do
putStrLn $ "Number of steps: " ++ show (countSteps program)
exec program
For greater safety, we could hide the composed aplicative behind a newtype, and not export the constructor, only the step function.
(Actions created with pure have cost 0 and do not count as a step.)

What is wrong with my Haskell definition of the bind operator in this example?

I'm following a monad transformers tutorial here.
At this point in the tutorial, it asks me to try to implement the Monad instance for the EitherIO data type, defined as:
data EitherIO e a = EitherIO {
runEitherIO :: IO (Either e a)
}
So I tried:
instance Functor (EitherIO e) where
fmap f = EitherIO . fmap (fmap f) . runEitherIO
instance Monad (EitherIO e) where
return = EitherIO . return . Right
x >>= f = join $ fmap f x
The tutorial's version was a bit different:
instance Monad (EitherIO e) where
return = pure -- the same as EitherIO . return . Right
x >>= f = EitherIO $ runEitherIO x >>= either (return . Left) (runEitherIO . f)
Now, the types all fit, so I thought I was good, and congratulated myself on finally finding a use for join.
As it turns out, further down the tutorial, I was asked to run runEitherIO getToken on the following:
liftEither x = EitherIO (return x)
liftIO x = EitherIO (fmap Right x)
getToken = do
liftIO (T.putStrLn "Enter email address:")
input <- liftIO T.getLine
liftEither (getDomain input)
Using my version of the >>= operator, GHCi would hang after I provided some input. Then even after I interrupt via ^C, GHCi would start acting strangely, hanging for a second or two while I'm typing. I'd have to kill GHCi to restart. When I replaced the >>= definition with the tutorial definition, everything worked.
My full file is here.
So:
What is wrong with my definition?
Why would GHCi typecheck and then behave like it did? Is this a GHCi bug?

My guess is that this is the culprit:
instance Monad (EitherIO e) where
return = EitherIO . return . Right
x >>= f = join $ fmap f x
However, join is defined as:
join x = x >>= id
However, in this way join and >>= get defined in a mutually recursive way which leads to non termination.
Note that this type checks, much as the following does:
f, g :: Int -> Int
f x = g x
g x = f x
Bottom line: you should provide a definition for >>= which does not involve join.

join in Control.Monad is defined as follows:
join :: (Monad m) => m (m a) -> m a
join x = x >>= id
You see, that join is defined in terms of >>=. So the short answer is, that your definition of >>= gets into an endless loop (i.e. does not terminate), because it uses join, which in turn uses >>= again.
If you think about it, you could use this definition of >>= for every Monad. So it cannot possibly work, because it makes no use of the internal structure of the type at all.
As for why this is not detected by ghc, this is an endless loop, but not a type error. The type system of Haskell is not powerful enough to detect such loops.

How to write without Do notation

I was playing around with composable failures and managed to write a function with the signature
getPerson :: IO (Maybe Person)
where a Person is:
data Person = Person String Int deriving Show
It works and I've written it in the do-notation as follows:
import Control.Applicative
getPerson = do
name <- getLine -- step 1
age <- getInt -- step 2
return $ Just Person <*> Just name <*> age
where
getInt :: IO (Maybe Int)
getInt = do
n <- fmap reads getLine :: IO [(Int,String)]
case n of
((x,""):[]) -> return (Just x)
_ -> return Nothing
I wrote this function with the intent of creating composable possible failures. Although I've little experience with monads other than Maybe and IO this seems like if I had a more complicated data type with many more fields, chaining computations wouldn't be complicated.
My question is how would I rewrite this without do-notation? Since I can't bind values to names like name or age I'm not really sure where to start.
The reason for asking is simply to improve my understanding of (>>=) and (<*>) and composing failures and successes (not to riddle my code with illegible one-liners).
Edit: I think I should clarify, "how should I rewrite getPerson without do-notation", I don't care about the getInt function half as much.

Do-notation desugars to (>>=) syntax in this manner:
getPerson = do
name <- getLine -- step 1
age <- getInt -- step 2
return $ Just Person <*> Just name <*> age
getPerson2 =
getLine >>=
( \name -> getInt >>=
( \age -> return $ Just Person <*> Just name <*> age ))
each line in do-notation, after the first, is translated into a lambda which is then bound to the previous line. It's a completely mechanical process to bind values to names. I don't see how using do-notation or not would affect composability at all; it's strictly a matter of syntax.
Your other function is similar:
getInt :: IO (Maybe Int)
getInt = do
n <- fmap reads getLine :: IO [(Int,String)]
case n of
((x,""):[]) -> return (Just x)
_ -> return Nothing
getInt2 :: IO (Maybe Int)
getInt2 =
(fmap reads getLine :: IO [(Int,String)]) >>=
\n -> case n of
((x,""):[]) -> return (Just x)
_ -> return Nothing
A few pointers for the direction you seem to be headed:
When using Control.Applicative, it's often useful to use <$> to lift pure functions into the monad. There's a good opportunity for this in the last line:
Just Person <*> Just name <*> age
becomes
Person <$> Just name <*> age
Also, you should look into monad transformers. The mtl package is most widespread because it comes with the Haskell Platform, but there are other options. Monad transformers allow you to create a new monad with combined behavior of the underlying monads. In this case, you're using functions with the type IO (Maybe a). The mtl (actually a base library, transformers) defines
newtype MaybeT m a = MaybeT { runMaybeT :: m (Maybe a) }
This is the same as the type you're using, with the m variable instantiated at IO. This means you can write:
getPerson3 :: MaybeT IO Person
getPerson3 = Person <$> lift getLine <*> getInt3
getInt3 :: MaybeT IO Int
getInt3 = MaybeT $ do
n <- fmap reads getLine :: IO [(Int,String)]
case n of
((x,""):[]) -> return (Just x)
_ -> return Nothing
getInt3 is exactly the same except for the MaybeT constructor. Basically, any time you have an m (Maybe a) you can wrap it in MaybeT to create a MaybeT m a. This gains simpler composability, as you can see by the new definition of getPerson3. That function doesn't worry about failure at all because it's all handled by the MaybeT plumbing. The one remaining piece is getLine, which is just an IO String. This is lifted into the MaybeT monad by the function lift.
Edit
newacct's comment suggests that I should provide a pattern matching example as well; it's basically the same with one important exception. Consider this example (the list monad is the monad we're interested in, Maybe is just there for pattern matching):
f :: Num b => [Maybe b] -> [b]
f x = do
Just n <- x
[n+1]
-- first attempt at desugaring f
g :: Num b => [Maybe b] -> [b]
g x = x >>= \(Just n) -> [n+1]
Here g does exactly the same thing as f, but what if the pattern match fails?
Prelude> f [Nothing]
[]
Prelude> g [Nothing]
*** Exception: <interactive>:1:17-34: Non-exhaustive patterns in lambda
What's going on? This particular case is the reason for one of the biggest warts (IMO) in Haskell, the Monad class's fail method. In do-notation, when a pattern match fails fail is called. An actual translation would be closer to:
g' :: Num b => [Maybe b] -> [b]
g' x = x >>= \x' -> case x' of
Just n -> [n+1]
_ -> fail "pattern match exception"
now we have
Prelude> g' [Nothing]
[]
fails usefulness depends on the monad. For lists, it's incredibly useful, basically making pattern matching work in list comprehensions. It's also very good in the Maybe monad, since a pattern match error would lead to a failed computation, which is exactly when Maybe should be Nothing. For IO, perhaps not so much, as it simply throws a user error exception via error.
That's the full story.

do-blocks of the form var <- e1; e2 desugar to expressions using >>= as follows e1 >>= \var -> e2. So your getPerson code becomes:
getPerson =
getLine >>= \name ->
getInt >>= \age ->
return $ Just Person <*> Just name <*> age
As you see this is not very different from the code using do.

Actually, according to this explaination, the exact translation of your code is
getPerson =
let f1 name =
let f2 age = return $ Just Person <*> Just name <*> age
f2 _ = fail "Invalid age"
in getInt >>= f2
f1 _ = fail "Invalid name"
in getLine >>= f1
getInt =
let f1 n = case n of
((x,""):[]) -> return (Just x)
_ -> return Nothing
f1 _ = fail "Invalid n"
in (fmap reads getLine :: IO [(Int,String)]) >>= f1
And the pattern match example
f x = do
Just n <- x
[n+1]
translated to
f x =
let f1 Just n = [n+1]
f1 _ = fail "Not Just n"
in x >>= f1
Obviously, this translated result is less readable than the lambda version, but it works with or without pattern matching.

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string

Keeping IO lazy under append - haskell

Related

Standard combinator to get first "non-empty" value from a set of monadic actions

IO monad prevents short circuiting of embedded mapM?

Applicative that increments the environment in haskell

What is wrong with my Haskell definition of the bind operator in this example?

How to write without Do notation

Categories

Resources