Haskell: Applying function to all combinations of arguments - haskell

I am posting below some code in Haskell. Please treat the code as an example, which I am going to use, to explain what I would like to know.
try :: [[Char]] -> [[Char]] -> [[Char]] -> [[Char]]
try (a:as) (b:bs) (c:cs) | ((checkIfCorrect a b c) == True) = a:b:[c]
| otherwise = try as bs cs
checkIfCorrect :: [Char] -> [Char] -> [Char] -> Bool
checkIfCorrect a b c = True
Eventually, checkIfCorrect returns True for only one combination of arguments. checkIfCorrect is really long function, so I decided to post substitute here. In the example above, function checkIfCorrect is applicated (by function try) to: first [Char] on the first list, first [Char] on the second list and first [Char] one the third list. If first guarded equation is not fulfilled, function checkIfCorrect is applied to: second [Char] on the first list ... and so on. What I would like to reach, is to applicate function checkIfCorrect (by function try) to all combinations of [Char]s from all lists
([[Char]] ). I mean the following (e.g.): third [Char] on the first list, eighth [Char] on the second list, eleventh [Char] on the third list and so on. Everyone with everyone. How could I easy reach that?

I just wanted to show you an alternative way of writing #WillemVanOnsem's code. I'm guessing that you're a Haskell beginner, so hopefully this answer will give you a tiny glimpse of a rich and beautiful idea which you'll soon be learning about in full as you progress with the language. So don't worry too much if you don't understand everything about this code right away; I'm just trying to give you a taste!
A list comprehension can always be reformulated using the list monad:
import Control.Monad (guard)
try as bs cs = head $ do
a <- as
b <- bs
c <- cs
guard $ checkIfCorrect a b c
return [a,b,c]
I'm using do notation as a special notation for nested loops: for each a in as, for each b in bs, and for each c in cs, we yield [a,b,c] if checkIfCorrect returns True. The translation from list comprehensions is simple: "enumeration" parts of the list comprehension turn into "binds" using <-, "filter"s turn into calls to guard, and "yield"s turn into returns.
In an imperative language like Python you might write it like this:
def try(as, bs, cs):
for a in as:
for b in bs:
for c in cs:
if checkIfCorrect(a, b, c):
yield [a,b,c]
Like politics under the Western neoliberal hegemony, the imperative code gradually marches rightward. "Staircase" code like this actually crops up quite frequently in imperative programming (think of "callback hell" in JS), so monads were invented to help counteract this tendency. They turned out to be so useful that a special syntax was invented for them, namely do-notation.

Yes, you can make it also look more elegant with list comprehension:
try :: [[Char]] -> [[Char]] -> [[Char]] -> [[Char]]
try as bs cs = head [ [a,b,c] | a <- as, b <- bs, c <- cs, checkIfCorrect a b c ]
-- \__ __/ \__________ ____________/ \__________ _______/
-- v v v
-- yield enumeration filter
The code works as follows: the right part of the list comprehension consists out of an "enumeration" part (denoted by the comment section). Since we write a <- as, b <- bs, c <- cs it means that a will take any value from as, and for every such a, b will take any value of bs, etc. So that means every possible combination will be emitted.
Next there is the "filter" phase: there is a predicate checkIfCorrect a b c that will be called and only if that predicate returns True, the result will be "yielded".
On the left side we see "yield". It describes what to add to the list (based on the enumeration) given the filter succeeds. If that happens we add [a,b,c] to that list. If there are multiple such configurations that succeed, we might end up with a list containing multiple solutions. Note however that list comprehension is done lazily: so as long as you do not ask for at least one such element, it will not generate the first element, nor the second, etc.
Now we also need head (in front of the list comprehension). head :: [a] -> a returns the first element of the list. So try will return the first element that satisfies the condition.

While both Willem Van Onsem's and The Orgazoid's answers are good (upvoted), you can also approach part of the problem in a more generalised way, and not only for lists.
For the following, you're going to need these imports:
import Control.Monad (MonadPlus, mfilter)
import Data.Maybe (fromMaybe, listToMaybe)
If I'm understanding the question correctly, you want to try all combinations of as, bs, and cs. You can typically achieve combination-like behaviour with the Applicative typeclass:
combinations = (,,) <$> as <*> bs <*> cs
(,,) is a function that creates triples (three-element tuples) from three individual values.
This works for lists, because lists are applicative:
*Prelude> (,,) <$> [1,2] <*> ["foo", "bar"] <*> [True, False]
[(1,"foo",True),(1,"foo",False),(1,"bar",True),(1,"bar",False),(2,"foo",True),(2,"foo",False),(2,"bar",True),(2,"bar",False)]
but it also works for e.g. Maybes:
*Prelude> (,,) <$> Just 1 <*> Just "foo" <*> Just False
Just (1,"foo",False)
With that, you can now define the core of your function:
try' :: MonadPlus m => ((a, a, a) -> Bool) -> m a -> m a -> m a -> m [a]
try' predicate as bs cs =
tripleToList <$> mfilter predicate combinations
where
combinations = (,,) <$> as <*> bs <*> cs
tripleToList (a, b, c) = [a, b, c]
You'll notice that this helper function is completely generic. It works for any MonadPlus instance of any contained element a.
Here are some examples:
*Answer> try' (const True) ["foo", "bar", "baz"] ["qux", "quux", "quuz", "corge"] ["grault", "garply"]
[["foo","qux","grault"],["foo","qux","garply"],["foo","quux","grault"],["foo","quux","garply"],["foo","quuz","grault"],["foo","quuz","garply"],["foo","corge","grault"],["foo","corge","garply"],["bar","qux","grault"],["bar","qux","garply"],["bar","quux","grault"],["bar","quux","garply"],["bar","quuz","grault"],["bar","quuz","garply"],["bar","corge","grault"],["bar","corge","garply"],["baz","qux","grault"],["baz","qux","garply"],["baz","quux","grault"],["baz","quux","garply"],["baz","quuz","grault"],["baz","quuz","garply"],["baz","corge","grault"],["baz","corge","garply"]]
*Answer> try' (const False) ["foo", "bar", "baz"] ["qux", "quux", "quuz", "corge"] ["grault", "garply"]
[]
*Answer> try' (const True) (Just "foo") (Just "bar") (Just "baz")
Just ["foo","bar","baz"]
*Answer> try' (const False) (Just "foo") (Just "bar") (Just "baz")
Nothing
You should notice that if predicate always returns False, you'll get nothing back. For lists, you get the empty list; for Maybe, you literally get Nothing.
So far it's all generic, but checkIfCorrect isn't. It also looks like you'd like to get only the first elements that match. You can achieve that by composing try' with checkIfCorrect:
try :: [String] -> [String] -> [String] -> [String]
try as bs cs = fromMaybe [] $ listToMaybe $ try' isCorrect as bs cs
where isCorrect (a, b, c) = checkIfCorrect a b c
Here, I've created a private isCorrect function in order to uncurry the checkIfCorrect function. I've then used a combination of listToMaybe and fromMaybe to return the first element of the resulting list. Other answers here use head, but that's going to throw an exception if the list is empty, so I used this combination instead, because it's safe.

Related

Iteratively printing every integer in a List

Say I have a List of integers l = [1,2]
Which I want to print to stdout.
Doing print l produces [1,2]
Say I want to print the list without the braces
map print l produces
No instance for (Show (IO ())) arising from a use of `print'
Possible fix: add an instance declaration for (Show (IO ()))
In a stmt of an interactive GHCi command: print it
`:t print
print :: Show a => a -> IO ()
So while I thought this would work I went ahead and tried:
map putStr $ map show l
Since I suspected a type mismatch from Integer to String was to blame. This produced the same error message as above.
I realize that I could do something like concatenating the list into a string, but I would like to avoid that if possible.
What's going on? How can I do this without constructing a string from the elements of the List?
The problem is that
map :: (a -> b) -> [a] -> [b]
So we end up with [IO ()]. This is a pure value, a list of IO actions. It won't actually print anything. Instead we want
mapM_ :: (a -> IO ()) -> [a] -> IO ()
The naming convention *M means that it operates over monads and *_ means we throw away the value. This is like map except it sequences each action with >> to return an IO action.
As an example mapM_ print [1..10] will print each element on a new line.
Suppose you're given a list xs :: [a] and function f :: Monad m => a -> m b. You want to apply the function f to each element of xs, yielding a list of actions, then sequence these actions. Here is how I would go about constructing a function, call it mapM, that does this. In the base case, xs = [] is the empty list, and we simply return []. In the recursive case, xs has the form x : xs. First, we want to apply f to x, giving the action f x :: m b. Next, we want recursively call mapM on xs. The result of performing the first step is a value, say y; the result of performing the second step is a list of values, say ys. So we collect y and ys into a list, then return them in the monad:
mapM :: Monad m => (a -> m b) -> [a] -> m [b]
mapM f [] = return []
mapM f (x : xs) = f x >>= \y -> mapM f ys >>= \ys -> return (y : ys)
Now we can map a function like print, which returns an action in the IO monad, over a list of values to print: mapM print [1..10] does precisely this for the list of integers from one through ten. There is a problem, however: we aren't particularly concerned about collecting the results of printing operations; we're primarily concerned about their side effects. Instead of returning y : ys, we simply return ().
mapM_ :: Monad m => (a -> m b) ->[a] -> m ()
mapM_ f [] = return ()
mapM_ f (x : xs) = f x >> mapM_ f xs
Note that mapM and mapM_ can be defined without explicit recursion using the sequence and sequence_ functions from the standard library, which do precisely what their names imply. If you look at the source code for mapM and mapM_ in Control.Monad, you will see them implemented that way.
Everything in Haskell is very strongly typed, including code to perform IO!
When you write print [1, 2], this is just a convenience wrapper for putStrLn (show [1, 2]), where show is a function that turns a (Show'able) object into a string. print itself doesn't do anything (in the side effect sense of do), but it outputs an IO() action, which is sort of like a mini unrun "program" (if you excuse the sloppy language), which isn't "run" at its creation time, but which can be passed around for later execution. You can verify the type in ghci
> :t print [1, 2]
print [1, 2]::IO()
This is just an object of type IO ().... You could throw this away right now and nothing would ever happen. More likely, if you use this object in main, the IO code will run, side effects and all.
When you map multiple putStrLn (or print) functions onto a list, you still get an object whose type you can view in ghci
> :t map print [1, 2]
map print [1, 2]::[IO()]
Like before, this is just an object that you can pass around, and by itself it will not do anything. But unlike before, the type is incorrect for usage in main, which expects an IO() object. In order to use it, you need to convert it to this type.
There are many ways to do this conversion.... One way that I like is the sequence function.
sequence $ map print [1, 2]
which takes a list of IO actions (ie- mini "programs" with side effects, if you will forgive the sloppy language), and sequences them together as on IO action. This code alone will now do what you want.
As jozefg pointed out, although sequence works, sequence_ is a better choice here....
Sequence not only concatinates the stuff in the IO action, but also puts the return values in a list.... Since print's return value is IO(), the new return value becomes a useless list of ()'s (in IO). :)
Using the lens library:
[1,2,3] ^! each . act print
You might write your own function, too:
Prelude> let l = [1,2]
Prelude> let f [] = return (); f (x:xs) = do print x; f xs
Prelude> f l
1
2

Cont Monad breaks laziness in Haskell

I was trying the Cont monad, and discovers the following problem.
First construct a infinite list and lift all the elements to a Cont monad
Use sequence operation to get a Cont monad on the infinite list.
When we try to run the monad, with head, for example, it falls into infinite loop
while trying to expand the continuation and the head is never called.
The code looks like this:
let inff = map (return :: a -> Cont r a) [0..]
let seqf = sequence inff
runCont seqf head
So is this a limitation of the Cont monad implementation in Haskell?
If so, how do we improve this?
The reason is that even though the value of the head element of sequence someList depends only on the first elemenent of someList, the effect of sequence someList can generally depend on all the effects of someList (and it does for most monads). Therefore, if we want to evaluate the head element, we still need to evaluate all the effects.
For example, if we have a list of Maybe values, the result of sequence someList is Just only if all the elements of someList are Just. So if we try to sequence an infinite list, we'd need to examine its infinite number of elements if they're all Just.
The same applies for Cont.
In the continuation monad, we can escape any time from the computation and return a result that is different from what has been computed so far.
Consider the following example:
test :: (Num a, Enum a) => a
test = flip runCont head $
callCC $ \esc -> do
sequence (map return [0..100] ++ [esc [-1]])
or directly using cont instead of callCC:
test' :: (Num a, Enum a) => a
test' = flip runCont head $
sequence (map return [0..100] ++ [cont (const (-1))])
The result of test is just -1. After processing the first 100 elements, the final element can decide to escape all of this and return -1 instead. So in order to see what is the head element of sequence someList in Cont, we again need to compute them all.
This is not a flaw with the Cont monad so much as sequence. You can get similar results for Either, for example:
import Control.Monad.Instances ()
xs :: [Either a Int]
xs = map Right [0..] -- Note: return = Right, for Either
ys :: Either a [Int]
ys = sequence xs
You can't retrieve any elements of ys until it computes the entire list, which will never happen.
Also, note that: sequence (map f xs) = mapM f xs, so we can simplify this example to:
>>> import Control.Monad.Instances
>>> mapM Right [0..]
<Hangs forever>
There are a few monads where mapM will work on an infinite list of values, specifically the lazy StateT monad and Identity, but they are the exception to the rule.
Generally, mapM/sequence/replicateM (without trailing underscores) are anti-patterns and the correct solution is to use pipes, which allows you to build effectful streams that don't try to compute all the results up front. The beginning of the pipes tutorial describes how to solve this in more detail, but the general rule of thumb is that any time you write something like:
example1 = mapM f xs
example2 = sequence xs
You can transform it into a lazy Producer by just transforming it to:
example1' = each xs >-> Pipes.Prelude.mapM f
example2' = each xs >-> Pipes.Prelude.sequence
Using the above example with Either, you would write:
>>> import Pipes
>>> let xs = each [0..] >-> mapM Right :: Producer Int (Either a) ()
Then you can lazily process the stream without generating all elements:
>>> Pipes.Prelude.any (> 10) xs
Right True

List processing in Haskell

I am teaching myself Haskell and have run into a problem and need help.
Background:
type AInfo = (Char, Int)
type AList = [AInfo] (let’s say [(‘a’, 2), (‘b’,5), (‘a’, 1), (‘w’, 21)]
type BInfo = Char
type BList = [BInfo] (let’s say [‘a’, ‘a’, ‘c’, ‘g’, ‘a’, ‘w’, ‘b’]
One quick edit: The above information is for illustrative purposes only. The actual elements of the lists are a bit more complex. Also, the lists are not static; they are dynamic (hence the uses of the IO monad) and I need to keep/pass/"return"/have access to and change the lists during the running of the program.
I am looking to do the following:
For all elements of AList check against all elements of BList and where the character of the AList element (pair) is equal to the character in the Blist add one to the Int value of the AList element (pair) and remove the character from BList.
So what this means is after the first element of AList is checked against all elements of BList the values of the lists should be:
AList [(‘a’, 5), (‘b’,5), (‘a’, 1), (‘w’, 21)]
BList [‘c’, ‘g’, ‘w’, ‘b’]
And in the end, the lists values should be:
AList [(‘a’, 5), (‘b’,6), (‘a’, 1), (‘w’, 22)]
BList [‘c’, ‘g’]
Of course, all of this is happening in an IO monad.
Things I have tried:
Using mapM and a recursive helper function. I have looked at both:
Every element of AList checked against every element of bList -- mapM (myHelpF1 alist) blist and
Every element of BList checked against every element of AList – mapM (myHelpF2 alist) blist
Passing both lists to a function and using a complicated
if/then/else & helper function calls (feels like I am forcing
Haskell to be iterative; Messy convoluted code, Does not feel
right.)
I have thought about using filter, the character value of AList
element and Blist to create a third list of Bool and the count the
number of True values. Update the Int value. Then use filter on
BList to remove the BList elements that …… (again Does not feel
right, not very Haskell-like.)
Things I think I know about the problem:
The solution may be exceeding trivial. So much so, the more experienced Haskellers will be muttering under their breath “what a noob” as they type their response.
Any pointers would be greatly appreciated. (mutter away….)
A few pointers:
Don't use [(Char, Int)] for "AList". The data structure you are looking for is a finite map: Map Char Int. Particularly look at member and insertWith. toList and fromList convert from the representation you currently have for AList, so even if you are stuck with that representation, you can convert to a Map for this algorithm and convert back at the end. (This will be more efficient than staying in a list because you are doing so many lookups, and the finite map API is easier to work with than lists)
I'd approach the problem as two phases: (1) partition out the elements of blist by whether they are in the map, (2) insertWith the elements which are already in the map. Then you can return the resulting map and the other partition.
I would also get rid of the meaningless assumptions such as that keys are Char -- you can just say they are any type k (for "key") that satisfies the necessary constraints (that you can put it in a Map, which requires that it is Orderable). You do this with lowercase type variables:
import qualified Data.Map as Map
sieveList :: (Ord k) => Map.Map k Int -> [k] -> (Map.Map k Int, [k])
Writing algorithms in greater generality helps catch bugs, because it makes sure that you don't use any assumptions you don't need.
Oh, also this program has no business being in the IO monad. This is pure code.
import Data.List
type AInfo = (Char, Int)
type AList = [AInfo]
type BInfo = Char
type BList = [BInfo]
process :: AList -> BList -> AList
process [] _ = []
process (a:as) b = if is_in a b then (fst a,snd a + 1):(process as (delete (fst a) b)) else a:process as b where
is_in f [] = False
is_in f (s:ss) = if fst f == s then True else is_in f ss
*Main> process [('a',5),('b',5),('a',1),('b',21)] ['c','b','g','w','b']
[('a',5),('b',6),('a',1),('b',22)]
*Main> process [('a',5),('b',5),('a',1),('w',21)] ['c','g','w','b']
[('a',5),('b',6),('a',1),('w',22)]
Probably an important disclaimer: I'm rusty at Haskell to the point of ineptness, but as a relaxing midnight exercise I wrote this thing. It should do what you want, although it doesn't return a BList. With a bit of modification, you can get it to return an (AList,BList) tuple, but methinks you'd be better off using an imperative language if that kind of manipulation is required.
Alternately, there's an elegant solution and I'm too ignorant of Haskell to know it.
While I am by no means a Haskell expert, I have a partial attempt that returns that result of an operation once. Maybe you can find out how to map it over the rest to get your solution. The addwhile is clever, since you only want to update the first occurrence of an element in lista, if it exists twice, it will just add 0 to it. Code critiques are more than welcome.
import Data.List
type AInfo = (Char, Int)
type AList = [AInfo]
type BInfo = Char
type BList = [BInfo]
lista = ([('a', 2), ('b',5), ('a', 1), ('w', 21)] :: AList)
listb = ['a','a','c','g','a','w','b']
--step one, get the head, and its occurrences
items list = (eleA, eleB) where
eleA = length $ filter (\x -> x == (head list)) list
eleB = head list
getRidOfIt list ele = (dropWhile (\x -> x == ele) list) --drop like its hot
--add to lista
addWhile :: [(Char, Int)] -> Char -> Int -> [(Char,Int)]
addWhile [] _ _ = []
addWhile ((x,y):xs) letter times = if x == letter then (x,y+times) : addWhile xs letter times
else (x,y) : addWhile xs letter 0
--first answer
firstAnswer = addWhile lista (snd $ items listb) (fst $ items listb)
--[('a',5),('b',5),('a',1),('w',21)]
The operation you describe is pure, as #luqui points out, so we just define it as a pure Haskell function. It can be used inside a monad (including IO) by means of fmap (or do).
import Data.List
combine alist blist = (reverse a, b4) where
First we sort and count the B list:
b = map (\g->(head g,length g)) . group . sort $ blist
We need the import for group and sort to be available. Next, we roll along the alist and do our thing:
(a,b2) = foldl g ([],b) alist
g (acc,b) e#(x,c) = case pick x b of
Nothing -> (e:acc,b)
Just (n,b2) -> ((x,c+n):acc,b2)
b3 = map fst b2
b4 = [ c | c <- blist, elem c b3 ]
Now pick, as used, must be
pick x [] = Nothing
pick x ((y,n):t)
| x==y = Just (n,t)
| otherwise = case pick x t of Nothing -> Nothing
Just (k,r) -> Just (k, (y,n):r)
Of course pick performs a linear search, so if performance (speed) becomes a problem, b should be changed to allow for binary search (tree etc, like Map). The calculation of b4 which is filter (`elem` b3) blist is another potential performance problem with its repeated checks for presence in b3. Again, checking for presence in trees is faster than in lists, in general.
Test run:
> combine [('a', 2), ('b',5), ('a', 1), ('w', 21)] "aacgawb"
([('a',5),('b',6),('a',1),('w',22)],"cg")
edit: you probably want it the other way around, rolling along the blist while updating the alist and producing (or not) the elements of blist in the result (b4 in my code). That way the algorithm will operate in a more local manner on long input streams (that assuming your blist is long, though you didn't say that). As written above, it will have a space problem, consuming the input stream blist several times over. I'll keep it as is as an illustration, a food for thought.
So if you decide to go the 2nd route, first convert your alist into a Map (beware the duplicates!). Then, scanning (with scanl) over blist, make use of updateLookupWithKey to update the counts map and at the same time decide for each member of blist, one by one, whether to output it or not. The type of the accumulator will thus have to be (Map a Int, Maybe a), with a your element type (blist :: [a]):
scanl :: (acc -> a -> acc) -> acc -> [a] -> [acc]
scanning = tail $ scanl g (Nothing, fromList $ reverse alist) blist
g (_,cmap) a = case updateLookupWithKey (\_ c->Just(c+1)) a cmap of
(Just _, m2) -> (Nothing, m2) -- seen before
_ -> (Just a, cmap) -- not present in counts
new_b_list = [ a | (Just a,_) <- scanning ]
last_counts = snd $ last scanning
You will have to combine the toList last_counts with the original alist if you have to preserve the old duplicates there (why would you?).

iterating through a list in haskell

I have a list of list of characters ::[[Char]].
I need to iterate both over the list of strings and also over each character in each string.
Say, my list is present in this variable.
let xs
Please suggest an easy way to iterate.
If you want to apply a function f to every element of a list like this:
[a, b, c, d] → [f a, f b, f c, f d]
then map f xs does the trick. map turns a function on elements to a function on lists. So, we can nest it to operate on lists of lists: if f transforms as into bs, map (map f) transforms [[a]]s into [[b]]s.
If you instead want to perform some IO action for every element of a list (which is more like traditional iteration), then you're probably looking for forM_:1
forM_ :: [a] -> (a -> IO b) -> IO ()
You give it a function, and it calls it with each element of the list in order. For instance, forM_ xs putStrLn is an IO action that will print out every string in xs on its own line. Here's an example of a more involved use of forM_:
main = do
...
forM_ xs $ \s -> do
putStrLn "Here's a string:"
forM_ s print
putStrLn "Now it's done."
If xs contains ["hello", "world"], then this will print out:
Here's a string:
'h'
'e'
'l'
'l'
'o'
Now it's done.
Here's a string:
'w'
'o'
'r'
'l'
'd'
Now it's done.
1 forM_ actually has a more general type, but the simpler version I've shown is more relevant here.
Just that:
[c | x <- xs, c <- x]
The "correct" way to iterate is actually fold. Anything you might ever want to do with a list can be done with a fold. Let's consider what you want to do. You're probably thinking of something like this:
for (row in xs):
for (c in row):
doSomething
The problem is, you're probably making use of mutable variables in doSomething. That's ok, we can deal with that. So suppose you have this.
def iter2d(xs):
outerVar = outerInit
for (row in xs):
innerVar = innerInit(row)
outerVar.adjust1(row)
for (c in row):
innerVar.adjust2(c)
outerVar.adjust3(c, innerVar)
return outerVar
Let's translate that to folds. And immutability.
iter2d :: [[Char]] -> Something
iter2d xs = foldl' outerStep outerInit xs
where outerInit = ... -- same as outerInit above
outerStep acc row = fst $ foldl' innerStep innerInit' row)
where innerInit' = ((adjust1 acc row), innerInit row)
innerInit row = ... -- same as innerInit above
innerStep (outAcc, inAcc) c = (outAcc', inAcc')
where inAcc' = adjust2 inAcc c
outAcc' = adjust3 outAcc c inAcc'
Notice with immutability, we are forced to indicate that outAc' depends on inAcc', rather than inAcc, meaning, the "state" of innerVar after it is updated.
Now you might say "wow that Haskell looks way ugly, why would I ever want to use Haskell". Yes, it does look ugly, but only because I tailored it to be a direct translation of imperative code. Once you get used to using folds instead of "iterating through a list", then you will find that folding is a very powerful technique that lets you do a lot of things in a more elegant way than for loops allow.
map (map f) l
where f :: Char -> Foo is a function to apply to each Char and l :: [[Char]]
returns l' :: [[Foo]]

Why don't Haskell list comprehensions cause an error when pattern match fails?

I'm trying to understand how Haskell list comprehensions work "under the hood" in regards to pattern matching. The following ghci output illustrates my point:
Prelude> let myList = [Just 1, Just 2, Nothing, Just 3]
Prelude> let xs = [x | Just x <- myList]
Prelude> xs
[1,2,3]
Prelude>
As you can see, it is able to skip the "Nothing" and select only the "Just" values. I understand that List is a monad, defined as (source from Real World Haskell, ch. 14):
instance Monad [] where
return x = [x]
xs >>= f = concat (map f xs)
xs >> f = concat (map (\_ -> f) xs)
fail _ = []
Therefore, a list comprehension basically builds a singleton list for every element selected in the list comprehension and concatenates them. If a pattern match fails at some step, the result of the "fail" function is used instead. In other words, the "Just x" pattern doesn't match so [] is used as a placeholder until 'concat' is called. That explains why the "Nothing" appears to be skipped.
What I don't understand is, how does Haskell know to call the "fail" function? Is it "compiler magic", or functionality that you can write yourself in Haskell? Is it possible to write the following "select" function to work the same way as a list comprehension?
select :: (a -> b) -> [a] -> [b]
select (Just x -> x) myList -- how to prevent the lambda from raising an error?
[1,2,3]
While implemenatations of Haskell might not do it directly like this internally, it is helpful to think about it this way :)
[x | Just x <- myList]
... becomes:
do
Just x <- myList
return x
... which is:
myList >>= \(Just x) -> return x
As to your question:
What I don't understand is, how does Haskell know to call the "fail" function?
In do-notation, if a pattern binding fails (i.e. the Just x), then the fail method is called. For the above example, it would look something like this:
myList >>= \temp -> case temp of
(Just x) -> return x
_ -> fail "..."
So, every time you have a pattern-match in a monadic context that may fail, Haskell inserts a call to fail. Try it out with IO:
main = do
(1,x) <- return (0,2)
print x -- x would be 2, but the pattern match fails
The rule for desugaring a list comprehension requires an expression of the form [ e | p <- l ] (where e is an expression, p a pattern, and l a list expression) behave like
let ok p = [e]
ok _ = []
in concatMap ok l
Previous versions of Haskell had monad comprehensions, which were removed from the language because they were hard to read and redundant with the do-notation. (List comprehensions are redundant, too, but they aren't so hard to read.) I think desugaring [ e | p <- l ] as a monad (or, to be precise, as a monad with zero) would yield something like
let ok p = return e
ok _ = mzero
in l >>= ok
where mzero is from the MonadPlus class. This is very close to
do { p <- l; return e }
which desugars to
let ok p = return e
ok _ = fail "..."
in l >>= ok
When we take the List Monad, we have
return e = [e]
mzero = fail _ = []
(>>=) = flip concatMap
I.e., the 3 approaches (list comprehensions, monad comprehensions, do expressions) are equivalent for lists.
I don't think the list comprehension syntax has much to do with the fact that List ([]), or Maybe for that matter, happens to be an instance of the Monad type class.
List comprehensions are indeed compiler magic or syntax sugar, but that's possible because the compiler knows the structure of the [] data type.
Here's what the list comprehension is compiled to: (Well, I think, I didn't actually check it against the GHC)
xs = let f = \xs -> case xs of
Just x -> [x]
_ -> []
in concatMap f myList
As you can see, the compiler doesn't have to call the fail function, it can simply inline a empty list, because it knows what a list is.
Interestingly, this fact that the list comprehensions syntax 'skips' pattern match failures is used in some libraries to do generic programming. See the example in the Uniplate library.
Edit: Oh, and to answer your question, you can't call your select function with the lambda you gave it. It will indeed fail on a pattern match failure if you call it with an Nothing value.
You could pass it the f function from the code above, but than select would have the type:
select :: (a -> [b]) -> [a] -> [b]
which is perfectly fine, you can use the concatMap function internally :-)
Also, that new select now has the type of the monadic bind operator for lists (with its arguments flipped):
(>>=) :: [a] -> (a -> [b]) -> [b]
xs >>= f = concatMap f xs -- 'or as you said: concat (map f xs)

Resources