Related
I'd like to create a list data structure that can zipWith that has a better behavior with self reference. This is for an esoteric language that will rely on self reference and laziness to be Turing complete using only values (no user functions). I've already created it, called Atlas but it has many built ins, I'd like to reduce that and be able to compile/interpret in Haskell.
The issue is that zipWith checks if either list is empty and returns empty. But in the case that this answer depends on the result of zipWith then it will loop infinitely. Essentially I'd like it to detect this case and have faith that the list won't be empty. Here is an example using DList
import Data.DList
import Data.List (uncons)
zipDL :: (a->b->c) -> DList a -> DList b -> DList c
zipDL f a b = fromList $ zipL f (toList a) (toList b)
zipL :: (a->b->c) -> [a] -> [b] -> [c]
zipL _ [] _ = []
zipL _ _ [] = []
zipL f ~(a:as) ~(b:bs) = f a b : zipL f as bs
a = fromList [5,6,7]
main=print $ dh where
d = zipDL (+) a $ snoc (fromList dt) 0
~(Just (dh,dt)) = uncons $ toList d
This code would sum the list 5,6,7 except for the issue. It can be fixed by removing zipL _ _ [] = [] because then it assumes that the result won't be empty and then it in fact turns out not to be empty. But this is a bad solution because we can't always assume that it is the second list that could have the self reference.
Another way of explaining it is if we talk about the sizes of these list.
The size of zip a b = min (size a) (size b)
So in this example: size d = min (size a) (size d-1+1)
But there in lies the problem, if the size of d is 0, then the size of d = 0, but if size of d is 1 the size is 1, however once the size of d is said to be greater than size of a, then the size would be a, which is a contradiction. But any size 0-a works which means it is undefined.
Essentially I want to detect this case and make the size of d = a.
So far the only thing I have figured out is to make all lists lists of Maybe, and terminate lists with a Nothing value. Then in the application of the zipWith binary function return Nothing if either value is Nothing. You can then take out both of the [] checks in zip, because you can think of all lists as being infinite. Finally to make the summation example work, instead of doing a snoc, do a map, and replace any Nothing value with the snoc value. This works because when checking the second list for Nothing, it can lazily return true, since no value of the second list can be nothing.
Here is that code:
import Data.Maybe
data L a = L (Maybe a) (L a)
nil :: L a
nil = L Nothing nil
fromL :: [a] -> L a
fromL [] = nil
fromL (x:xs) = L (Just x) (fromL xs)
binOpMaybe :: (a->b->c) -> Maybe a -> Maybe b -> Maybe c
binOpMaybe f Nothing _ = Nothing
binOpMaybe f _ Nothing = Nothing
binOpMaybe f (Just a) (Just b) = Just (f a b)
zip2W :: (a->b->c) -> L a -> L b -> L c
zip2W f ~(L a as) ~(L b bs) = L (binOpMaybe f a b) (zip2W f as bs)
unconsL :: L a -> (Maybe a, Maybe (L a))
unconsL ~(L a as) = (a, Just as)
mapOr :: a -> L a -> L a
mapOr v ~(L a as) = L (Just $ fromMaybe v a) $ mapOr v as
main=print $ h
where
a = fromL [4,5,6]
b = zip2W (+) a (mapOr 0 (fromJust t))
(h,t) = unconsL $ b
The downside to this approach is it needs this other operator to map with Just . fromMaybe initialvalue. This is a less intuitive operator than ++. And without it the language could be built entirely on ++ uncons and (:[]) which would be pretty neat.
The other thing I've figured out is in the current ruby implementation to throw an error when a value depends on itself, and catch it in the empty list detection. But this is vary hacky and not entirely sound, although it does work for cases like this. I don't think this can work in Haskell since I don't think you can detect self dependence?
Sorry for the long description and the very odd use case. I've spent tons of time thinking about this, but haven't solved it yet and can't explain it any more succinctly! Not expecting an answer but figured it is worth a shot, thanks for considering.
EDIT:
After seeing it framed as a greatest fixed point question, it seems like a poor question because there is no efficient general solution to such a problem. For example, suppose the code was b = zipWith (+) a (if length b < 1 then [1] else []).
For my purposes it could still be nice to handle some cases correctly - the example provided does have a solution. So I could reframe the question as: when can we find the greatest fixed point efficiently and what is that fixed point? But I believe there is no simple answer to such a question, and so it would be a poor basis for a programming language to rely on ad hoc rules.
Sounds like you want a greatest fixed point. I'm not sure I've seen this done before, but maybe it's possible to make a sensible type class for types that support those.
class GF a where gfix :: (a -> a) -> a
instance GF a => GF [a] where
gfix f = case (f (repeat undefined), f []) of
(_:_, _) -> b:bs where
b = gfix (\a' -> head (f (a':bs)))
bs = gfix (\as' -> tail (f (b:as')))
([], []) -> []
_ -> error "no fixed point greater than bottom exists"
-- use the usual least fixed point. this ain't quite right, but
-- it works for this example, and maybe it's Good Enough
instance GF Int where gfix f = let x = f x in x
Try it out in ghci:
> gfix (\xs -> zipWith (+) [5,6,7] (tail xs ++ [0])) :: [Int]
[18,13,7]
This implementation isn't particularly efficient; e.g. replacing [5,6,7] with [1..n] results in a runtime that's quadratic in n. Perhaps with some cleverness that can be improved, but it's not immediately obvious to me how that would go.
I have an answer for this specific case, not general.
appendRepeat :: a -> [a] -> [a]
appendRepeat v a = h : appendRepeat v t
where
~(h,t) =
if null a
then (v,[])
else (head a,tail a)
a = [4,5,6]
main=print $ head b
where
b = zipWith (+) a $ appendRepeat 0 (tail b)
appendRepeat adds a an infinite list of a repeated value to the end of a list. But the key thing about it is it doesn't check if list is empty or not when deciding that it is returning a non empty list where the tail is a recursive call. This way laziness never ends up in an infinite loop checking the zipWith _ [] case.
So this code works, and for the purposes of the original question, it can be used to convert the language to just using 2 simple functions (++ and :[]). But the interpreter would need to do some static analysis for appending a repeated value and replace it to using this special appendRepeat function (which can easily be done in Atlas). It seems hacky to only make this one implementation switcharoo, but that is all that is needed.
I have started to solve the 99 problems in Haskell, and for the second question, it is given the following solution:
myButLast' :: [a] -> a
myButLast' = last . init
and if we give the empty list to this function, we get and error, however, I would like to print a specific error as
myButLast' [] = error "The list has to have at least 2 elements!"
myButLast' [x] = error "The list has to have at least 2 elements!"
but when I add these line to the code, I get
Equations for ‘myButLast'’ have different numbers of arguments
, so is there a way to use the composition type of defining my new function while also adding some specific behaviour ?
The best you can do is probably something like the following, where the error checking is moved into an auxiliary function (which I've named go for lack of a better name) defined in a where clause:
myButLast :: [a] -> a
myButLast = go (last . init)
where
go _ [] = bad
go _ [x] = bad
go f xs = f xs
bad = error "The list has to have at least 2 elements!"
It might help make it clearer what's going on if we define go separately with a type signature. Also, in case you find the underscores confusing, I've replaced them with f:
myButLast :: [a] -> a
myButLast = go (last . init)
go :: ([a] -> a) -> [a] -> a
go f [] = bad
go f [x] = bad
go f xs = f xs
bad = error "The list has to have at least 2 elements!"
Here, you can see that go is a function that takes two arguments, the first being itself a function of type [a] -> a and the second being a list of type [a].
The above definition of go pattern matches on the second argument (the list). If the list is empty or a singleton, then the result of go is just bad (the error message), regardless of the function f. Otherwise (if the list is at least two elements), the result of go f xs is simply to apply the first argument (that function f) to the list xs.
How does this work? Well, let's see what happens if we apply myButLast to a list. I've used the symbol "≡" here to show equivalence of Haskell expressions with comments explaining why they are equivalent:
myButLast [1,2,3]
-- by the definition of "myButLast"
≡ go (last . init) [1,2,3]
-- by the definition of "go", third pattern w/
-- f ≡ last . init
-- xs = [1,2,3]
≡ (last . init) [1,2,3] -- this is just f xs w/ the pattern substitutions
-- because of your original, correct answer
≡ 2
If we apply it to a "bad" list, the only difference is the pattern matched from the definition of go:
myButLast [1]
-- by the definition of "myButLast"
≡ go (last . init) [1]
-- by the definition of "go", second pattern w/
-- f ≡ last . init
-- x = 1
≡ bad
-- gives an error message by definition of "bad"
As an interesting aside, another way to look at go is that it's a function:
go :: ([a] -> a) -> ([a] -> a)
Because the function application arrow -> is right associative, this type signature is exactly the same as ([a] -> a) -> [a] -> a. The neat thing about this is that now it's clear that go takes a function of type [a] -> a (such as last . init) and returns another function of type [a] -> a (such as myButLast). That is, go is a transformer that adds additional behavior to an existing function to create a new function, which is exactly what you were asking for in your original question.
In fact, if you slightly generalize the type signature so that go can operate on a function taking a list, regardless of what it returns:
go :: ([a] -> b) -> [a] -> b
go _ [] = bad
go _ [x] = bad
go f xs = f xs
this still works. Then, you could use this same go on anything that needed a list of length two, whatever it returned. For example, if you had an original implementation to return the last two elements of a list:
lastTwoElements :: [a] -> [a]
lastTwoElements = (!! 2) . reverse . tails -- tails from Data.List
you could re-write it as:
lastTwoElements :: [a] -> [a]
lastTwoElements = go ((!! 2) . reverse . tails)
to add error handling for the empty and singleton list cases.
In this case, you'd probably want to rename go to usingTwoElements or withList2 or something...
Use an explicit argument in the solution:
myButLast' x = last (init x)
Now you can add your special cases just above that line.
The original solution used a pointfree style last . init to avoid mentioning the x argument. However, if you have to add further equations, you need to make the argument explicit.
Moving from
fun :: A -> B
fun = something
to
fun :: A -> B
fun a = something a
is called eta-expansion, and is a common transformation of Haskell code. The first style is usually called point-free (or, jokingly, point-less), while the second one is called pointful. Here "point" refers to the variable a.
Somewhat sidestepping the original question, but you may be interested in the safe package for tasks like this. In general, you should strive to use total functions that don't raise errors. In this case, that means using something like lastMay :: [a] -> Maybe a and initMay :: [a] -> Maybe a, which simply return Nothing if given an empty list. They can be composed using <=<, found in Control.Monad.
import Safe
myButLast :: [a] -> Maybe a
myButLast = lastMay <=< initMay
Then
> myButLast []
Nothing
> myButLast [1]
Nothing
> myButLast [1,2]
Just 1
If you really want an error message, Safe provides lastNote and initNote as well.
myButLast = let msg = "Need at least 2 elements" in (lastNote msg . initNote msg)
You could often simply compose an additional function that has the additional behaviour:
myButLast' :: [a] -> a
myButLast' = last . init . assertAtLeastTwo
where assertAtLeastTwo xs#(_:_:_) = xs
assertAtLeastTwo _ = error "The list has to have at least 2 elements!"
Here we've added a function that checks for the conditions we want to raise an error, and otherwise simply returns its input so that the other functions can act on it exactly as if assertAtLeastTwo wasn't there.
Another alternative that allows you to clearly highlight the error conditions is:
myButLast' :: [a] -> a
myButLast' [] = error "The list has to have at least 2 elements!"
myButLast' [x] = error "The list has to have at least 2 elements!"
myButLast' xs = go xs
where go = last . init
Where you do the error checking as you originally wrote, but have the main definition simply defer to an implementation function go, which can then be defined point-free using composition.
Or you can of course inline go from above, and have:
myButLast' xs = (last . init) xs
Sine a composition of functions is itself an expression, and can simply be used in a larger expression directly as the function. In fact a fairly common style is in fact to write code of the form "compose a bunch of functions then apply to this argument" this way, using the $ operator:
myButLast' xs = last . init $ xs
If you would use wrappers you can have the best of both worlds and a clear separation between the two. Robust error checking and reporting and a vanilla function to use however you wish, with or without the wrapper.
Interesting, the vanilla function reports 'last' cannot process an empty list given a one element list and 'init' cannot process an empty list when given an empty list.
mbl2 = last . init
mbl xs = if length xs < 2 then error errmsg else mbl2 xs
where errmsg = "Input list must contain at least two members."
I am trying to grok some of the fundamentals of the State Monad in Haskell, by constructing my own examples.
Consider a simple example where I want to count the number of even integers in an array of integers. Sure this can be done very easily using pure functions, but I wanted to try the round-about State monad route, where we keep a counter that keeps incrementing for every element that has been checked.
Here is a partial (but obviously wrong) attempt that I have managed to come up with thus far.
import Control.Monad.State
f' :: [Int] -> State Int [Int]
f' [] = state (\s -> ([],s))
f' (x:xs) = if x `mod` 2 == 0 then state (\s -> ((x:xs), s+1)) -- how can I fix this line?
else f' xs
This code compiles, but clearly does not give the right answer. How then can I fix this code, to do something similar to the following Python code
counter = 0 # incremented when we encounter an even integer.
for elt in integer_list:
if elt % 2 == 0 :
counter = counter + 1
The other answer starts from scratch to build up an implementation. I think it is also worth seeing a minimal change to your existing code to make it sensible. We will even keep your existing type -- though the other answer proposes that it be changed, I think it is acceptable (if not great).
In my opinion, the real problem is that you have recursed only in one branch of your if. What we really want is to recurse whether or not the current element is even. So:
f' (x:xs) = do
if x `mod` 2 == 0 then state (\s -> ((), s+1)) -- increase the count by one
else state (\s -> ((), s )) -- don't increase the count by one
rest <- f' xs -- compute the answer for the rest of the list
return (x:rest) -- reconstruct the answer for the whole list
We can check that it does the right thing in ghci:
> runState (f' [1..5]) 0
([1,2,3,4,5],2)
This is just about the smallest change you can make to get your implementation idea working.
From there, I would suggest a number of refactorings. First, your pervasive use of state is a code smell. I would write the various uses in this way instead:
f' [] = return []
f' (x:xs) = do
if x `mod` 2 == 0 then modify (+1) else return ()
rest <- f' xs
return (x:rest)
From here, I would use the even function in the conditional, and notice that the when function implements the "do some action or return ()" operation. So:
f' [] = return []
f' (x:xs) = do
when (even x) (modify (+1))
rest <- f' xs
return (x:rest)
Additionally, we actually have a combinator for running a monadic action on each element of a list; that is mapM. So we can turn the above explicit recursion into an implicit one in this way:
f' xs = mapM (\x -> when (even x) (modify (+1)) >> return x) xs
Finally, I think it a little bit odd that the function returns the list it consumed. Not unidiomatic, per se, as the previous objections have been, but maybe not what you want. If it turns out that you don't plan on using the resulting list in any followup computation, it will be more efficient to throw it away as you go; and the mapM_ combinator does this. So:
f' :: [Int] -> State Int ()
f' xs = mapM_ (\x -> when (even x) (modify (+1))) xs
At this point, I would consider f' to be quite a nice implementation of the idea you have proposed.
Let's get back to the drawing board. The actual function you want to use is something like
countEven :: [Int] -> Int
countEven xs = runStateMagic xs
where runStateMagic uses some State hidden in its depths. How could that function look like? Well, it has to use either execState or evalState. Since we're interested in the state only (aka, our current count of numbers), so let's replace runStateMagic with execState:
countEven :: [Int] -> Int
countEven xs = execState someState 0
Now, execState's type fixes our stateFunc to State Int a. The actual value type of the state is arbitrary, since we're not going to use it anyway. So what should someState do? It should probably work on the list, and use modify' (+1) if we have an even number. Let's write a helper for that:
increaseIfEven :: Int -> State Int ()
increaseIfEven n
| even n = modify' inc
| otherwise = return ()
This will now modify the state iff the number was even. All we have to do is to apply this to every element on the list. Therefore, for a list xs, we can simply do
mapM_ increaseIfEven xs
Remember, mapM_ :: (a -> m b) -> [a] -> m (). But in our case, that m is State Int, so it already contains our counter.
All in all, we end up with
countEven :: [Int] -> Int
countEven xs = execState (mapM_ increaseIfEven xs) 0
But keep in mind: the important part was to fix the type of the original function, f'.
This submission to Programming Praxis gives an O(n) function that "undoes" a preorder traversal of a binary search tree, converting a list back into a tree. Supplying the missing data declaration:
data Tree a = Leaf | Branch {value::a, left::Tree a, right:: Tree a}
deriving (Eq, Show)
fromPreOrder :: Ord a => [a] -> Tree a
fromPreOrder [] = Leaf
fromPreOrder (a:as) = Branch a l (fromPreOrder bs)
where
(l,bs) = lessThan a as
lessThan n [] = (Leaf,[])
lessThan n all#(a:as)
| a >= n = (Leaf,all)
| otherwise = (Branch a l r,cs)
where (l,bs) = lessThan a as
(r,cs) = lessThan n bs
It's obvious that one constructor is added to the tree in each recursive step, which is key to its efficiency.
The only "problem" is that the list is threaded through the computation manually, which is not a terribly Haskellian way to do it and makes it a little harder to see that it is actually consumed element by element in a single-threaded manner.
I attempted to correct this using a state monad (prettified on Codepad):
import Control.Monad.State
data Tree a = Leaf
| Branch {root::a, left::Tree a, right::Tree a}
deriving (Eq,Show)
peek = State peek' where
peek' [] = (Nothing,[])
peek' a#(x:_) = (Just x,a)
pop = State pop' where
pop' [] = error "Tried to read past the end of the list"
pop' (_:xs) = ((),xs)
prebuild'::Ord a => State [a] (Tree a)
prebuild' = do
next <- peek
case next of
Nothing -> return Leaf
Just x -> do
pop
leftpart <- lessThan x
rightpart <- prebuild'
return (Branch x leftpart rightpart)
lessThan n = do
next <- peek
case next of
Nothing -> return Leaf
Just x -> do
if x < n
then do
pop
leftpart <- lessThan x
rightpart <- lessThan n
return (Branch x leftpart rightpart)
else
return Leaf
prebuild::Ord a => [a] -> Tree a
prebuild = evalState prebuild'
Unfortunately, this just looks obscenely messy, and doesn't seem any easier to reason about.
One thought I haven't been able to get anywhere with yet (in part because I don't have a deep enough understanding of the underlying concepts, quite likely): could I use a left fold over the list that builds a continuation that ultimately produces the tree? Would that be possible? Also, would it be anything short of insane?
Another thought was to write this as a tree unfold, but I don't think it's possible to do that efficiently; the list will end up being traversed too many times and the program will be O(n^2).
Edit
Taking things from another direction, I have the nagging suspicion that it might be possible to come up with an algorithm that starts by splitting up the list into increasing segments and decreasing segments, but I haven't yet found something concrete to do with that idea.
I think the problem you're having with State is that your primitives (push, pop, peek) are not the right ones. I think a better one would be something like available_, which checks if the front of the stack matches a particular condition, and executes something different in each case:
available_ p f m = do
s <- get
case s of
x:xs | p x -> put xs >> f x
_ -> m
Actually, in our use case, we can specialize a bit: we will always want to return a Leaf when the head of our stack doesn't satisfy the condition, and we'll always want to recurse when it does.
available p m = available_ p
(\x -> liftM2 (Branch x) (lessThan' x) m)
(return Leaf)
(You could also just write available to begin with and skip available_ entirely. In my first iteration, that is what I did.) Now writing fromPreOrder and lessThan are a snap, and also I think give some insight into their behavior. I'll name them with primes so we can double-check they do the right thing with QuickCheck.
fromPreOrder' = available (const True) fromPreOrder'
lessThan' n = available (<n) (lessThan' n)
And in ghci:
> quickCheck (\xs -> fromPreOrder (xs :: [Int]) == evalState fromPreOrder' xs)
+++ OK, passed 100 tests.
While I can't answer the question about continuation passing, I believe that the State monad based implementation can be written much more clearly. First, we can use notational convenience such as those from Control.Applicative to make it easier to read. Second, we can upgrade the effect stack to include Maybe in order to capture the notion of failure (a) from taking the head of an empty list and (b) from the (a >= n) comparison as an effect.
import Control.Monad.State
import Control.Applicative
The final code uses the backtracking-state monad transformer stack. This means that we wrap State around Maybe instead of Maybe around State. In some sense we can think of this as meaning that failure is the "primary" effect. In practice it means that if the algorithm fails there's no way to continue using potentially bad state and so it must backtrack to the last known good state.
type Preord a b = StateT [a] Maybe b
Since we keep taking the head of a list and want to capture that failure correctly, we'll use a "safe head" function (which is the natural destructor of a list anyway, despite not being in the base Haskell libraries)
-- Safe list destructor
uncons :: [a] -> Maybe (a, [a])
uncons [] = Nothing
uncons (a:as) = Just (a, as)
If we look at it cleverly we'll notice that this is already exactly the form of our monadic computation (StateT [a] Maybe b is isomorphic to [a] -> Maybe (b, [a])). We'll give it a more evocative name when lifted into the Monad.
-- Try to get the head or fail
getHead :: Preord a a
getHead = StateT uncons
A common feature of this algorithm is stopping local failures by providing a default value. I'll capture this in the certain combinator
-- Provides a default value for a failing computation
certain :: b -> Preord a b -> Preord a b
certain def p = p <|> return def
And now we can write the final algorithm very cleanly in our Preord monad.
fromPreOrder :: Ord a => Preord a (Tree a)
fromPreOrder = certain Leaf $ do
a <- getHead
Branch a <$> lessThan a <*> fromPreOrder
lessThan :: Ord a => a -> Preord a (Tree a)
lessThan n = certain Leaf $ do
a <- getHead
guard (a < n)
Branch a <$> lessThan a <*> lessThan n
Note that Applicative style helps to indicate that we're building the components of the Branch constructor using further effectful (state consuming) computations. The guard short-circuits lessThan when the pivot is already the least element in the pre-order traversal. We also explicitly see how both fromPreOrder and lessThan default out to Leaf when they cannot compute a better result.
(Also note that fromPreOrder and lessThan are nearly identical now, a commonality Daniel Wagner exploited in his own answer when writing available.)
We finally would want to hide all the monadic noise since, to an outside user, this is just a pure algorithm.
rebuildTree :: [a] -> Tree a
rebuildTree = fromMaybe Leaf . runStateT fromPreOrder
For a complete picture, here's the implementation of the algorithm using only the State monad. Note all the extra noise for handling failure! We've absorbed the entire popElse function into the effects of the backtracking state monad. We also lift the if up into the failure effect. Without that effect stack, our combinators are terrifically specific to the application instead of decomplected and useful elsewhere.
-- Try to take the head of the state list and return the default
-- if that's not possible.
popElse :: b -> (a -> State [a] b) -> State [a] b
popElse def go = do
x <- get
case x of
[] -> return def
(a:as) -> put as >> go a
push :: a -> State [a] ()
push a = modify (a:)
fromPreOrder :: Ord a => State [a] (Tree a)
fromPreOrder = popElse Leaf $ \a -> Branch a <$> lessThan a <*> fromPreOrder
lessThan :: Ord a => a -> State [a] (Tree a)
lessThan n =
popElse Leaf $ \a ->
if a >= n
then push a >> return Leaf
else Branch a <$> lessThan a <*> lessThan n
As you've said, the state monad doesn't really improve the situation, and I don't think it can be expected to, as it's both much too general in that it allows arbitrary access to the state, and annoying in that it enforces unnecessary sequencing.
At first glance, this looks quite like a foldr : we do one thing for the empty case, and in the (:) case we take the head off and make a recursive call based on the tail. However, as the recursive call isn't just using the tail directly, it isn't quite a foldr.
We could express it as a paramorphism but I don't think that really adds anything to the readability.
What I did notice is that the complicated recursion on the tail is all based on lessThan, which led me to the following idea for breaking down the algorithm:
lessThans [] = []
lessThans (a:as) = (a, l) : lessThans bs
where (l, bs) = lessThan a as
fromPreOrder2 :: Ord a => [a] -> Tree a
fromPreOrder2 = foldr (\(a, l) r -> Branch a l r) Leaf . lessThans
I'm sure lessThans could have a better name but I'm not quite sure what!
The foldr can also be expressed as foldr (uncurry Branch) Leaf but I'm not sure if that's an improvement.
EDIT: also, lessThans is an unfoldr, leading to this version:
fromPreOrder3 :: Ord a => [a] -> Tree a
fromPreOrder3 = foldr (uncurry Branch) Leaf . unfoldr lessThanList
lessThanList [] = Nothing
lessThanList (a:as) = Just ((a, l), bs)
where (l, bs) = lessThan a as
I'm trying to understand how Haskell list comprehensions work "under the hood" in regards to pattern matching. The following ghci output illustrates my point:
Prelude> let myList = [Just 1, Just 2, Nothing, Just 3]
Prelude> let xs = [x | Just x <- myList]
Prelude> xs
[1,2,3]
Prelude>
As you can see, it is able to skip the "Nothing" and select only the "Just" values. I understand that List is a monad, defined as (source from Real World Haskell, ch. 14):
instance Monad [] where
return x = [x]
xs >>= f = concat (map f xs)
xs >> f = concat (map (\_ -> f) xs)
fail _ = []
Therefore, a list comprehension basically builds a singleton list for every element selected in the list comprehension and concatenates them. If a pattern match fails at some step, the result of the "fail" function is used instead. In other words, the "Just x" pattern doesn't match so [] is used as a placeholder until 'concat' is called. That explains why the "Nothing" appears to be skipped.
What I don't understand is, how does Haskell know to call the "fail" function? Is it "compiler magic", or functionality that you can write yourself in Haskell? Is it possible to write the following "select" function to work the same way as a list comprehension?
select :: (a -> b) -> [a] -> [b]
select (Just x -> x) myList -- how to prevent the lambda from raising an error?
[1,2,3]
While implemenatations of Haskell might not do it directly like this internally, it is helpful to think about it this way :)
[x | Just x <- myList]
... becomes:
do
Just x <- myList
return x
... which is:
myList >>= \(Just x) -> return x
As to your question:
What I don't understand is, how does Haskell know to call the "fail" function?
In do-notation, if a pattern binding fails (i.e. the Just x), then the fail method is called. For the above example, it would look something like this:
myList >>= \temp -> case temp of
(Just x) -> return x
_ -> fail "..."
So, every time you have a pattern-match in a monadic context that may fail, Haskell inserts a call to fail. Try it out with IO:
main = do
(1,x) <- return (0,2)
print x -- x would be 2, but the pattern match fails
The rule for desugaring a list comprehension requires an expression of the form [ e | p <- l ] (where e is an expression, p a pattern, and l a list expression) behave like
let ok p = [e]
ok _ = []
in concatMap ok l
Previous versions of Haskell had monad comprehensions, which were removed from the language because they were hard to read and redundant with the do-notation. (List comprehensions are redundant, too, but they aren't so hard to read.) I think desugaring [ e | p <- l ] as a monad (or, to be precise, as a monad with zero) would yield something like
let ok p = return e
ok _ = mzero
in l >>= ok
where mzero is from the MonadPlus class. This is very close to
do { p <- l; return e }
which desugars to
let ok p = return e
ok _ = fail "..."
in l >>= ok
When we take the List Monad, we have
return e = [e]
mzero = fail _ = []
(>>=) = flip concatMap
I.e., the 3 approaches (list comprehensions, monad comprehensions, do expressions) are equivalent for lists.
I don't think the list comprehension syntax has much to do with the fact that List ([]), or Maybe for that matter, happens to be an instance of the Monad type class.
List comprehensions are indeed compiler magic or syntax sugar, but that's possible because the compiler knows the structure of the [] data type.
Here's what the list comprehension is compiled to: (Well, I think, I didn't actually check it against the GHC)
xs = let f = \xs -> case xs of
Just x -> [x]
_ -> []
in concatMap f myList
As you can see, the compiler doesn't have to call the fail function, it can simply inline a empty list, because it knows what a list is.
Interestingly, this fact that the list comprehensions syntax 'skips' pattern match failures is used in some libraries to do generic programming. See the example in the Uniplate library.
Edit: Oh, and to answer your question, you can't call your select function with the lambda you gave it. It will indeed fail on a pattern match failure if you call it with an Nothing value.
You could pass it the f function from the code above, but than select would have the type:
select :: (a -> [b]) -> [a] -> [b]
which is perfectly fine, you can use the concatMap function internally :-)
Also, that new select now has the type of the monadic bind operator for lists (with its arguments flipped):
(>>=) :: [a] -> (a -> [b]) -> [b]
xs >>= f = concatMap f xs -- 'or as you said: concat (map f xs)