Related
I'm following the NLPWP Computational Linguistics site and trying to create a Haskell procedure to find collocations (most common groupings of two words, like "United States" or "to find") in a list of words. I've got the following working code to find bigram frequency:
import Data.Map (Map)
import qualified Data.Map as Map
-- | Function for creating a list of bigrams
-- | e.g. [("Colorless", "green"), ("green", "ideas")]
bigram :: [a] -> [[a]]
bigram [] = []
bigram [_] = []
bigram xs = take 2 xs : bigram (tail xs)
-- | Helper for freqList and freqBigram
countElem base alow = case (Map.lookup alow base) of
Just v -> Map.insert alow (v + 1) base
Nothing -> Map.insert alow 1 base
-- | Maps each word to its frequency.
freqList alow = foldl countElem Map.empty alow
-- | Maps each bigram to its frequency.
freqBigram alow = foldl countElem Map.empty (bigram alow)
I'm trying to write a function that outputs a Map from each bigram to [freq of bigram]/[(freq word 1)*(freq word 2)]. Could you possibly provide advice on how to approach it?
None of the following code is working, but it gives a vague outline for what I was trying to do.
collocations alow =
| let f key = (Map.lookup key freqBi) / ((Map.lookup (first alow) freqs)*(Map.lookup (last alow) freqs))
in Map.mapWithKey f = freqBi
where freqs = (freqList alow)
where freqBi = (freqBigram alow)
I'm very new to Haskell, so let me know if you've got any idea how to fix the collocations procedure. Style tips are also welcome.
Most of your code looks sane, except for the final colloctions function.
I'm not sure why there's a stray pipe in there after the equals sign. You're not trying to write any kind of pattern guard, so I don't think that should be there.
Map.lookup returns a Maybe key, so trying to do division or multiplication isn't going to work. Maybe what you want is some kind of function that takes a key and a map, and returns the associated count or zero if the key doesn't exist?
Other than that, it looks like you're not too far off having this work.
As I read it, your confusion stems from mistaking types, more or less. General advice: Use type signatures on all your top level functions and make sure they are sensible and what you expect of the function (I often do this even before implementing the function).
Let's take a look at your
-- | Function for creating a list of bigrams
-- | e.g. [("Colorless", "green"), ("green", "ideas")]
bigram :: [a] -> [[a]]
If you're giving in a list of Strings, you'll be getting a list of lists of Strings, so your bigram is a list.
You could decide to be more explicit (only allow Strings instead of sometype a - for the beginning at least). So, actually we get a list of Words an make a list of Bigrams from it:
type Word = String
type Bigram = (Word, Word)
bigram :: [Word] -> [Bigram]
For the implementation you can try to use readily available functions from Data.List, for example zipWith and tail.
Now your freqList and freqBigram look like
freqList :: [Word] -> Map Word Int
freqBigram :: [Word] -> Map Bigram Int
With this error messages of the compiler will be clearer to you. To point at it: Take care what you're doing in the lookups for the word frequencies. You're searching for the frequency of word1 and word2, and the bigram is (word1,word2).
Now you should be able to figure the solution out on your own, I guess.
First of all I advise you to have a look at the function
insertWith :: Ord k => (a -> a -> a) -> k -> a -> Map k a -> Map k a
maybe you'll recognize the pattern if used
f freqs bg = insertWith (+) bg 1 freqs
Next as #MathematicalOrchid already pointed out your solution is not too far from being correct.
lookup :: Ord k => k -> Map k a -> Maybe a
You already took care of that in your countElems function.
what I'd like to note that there is this neat abstraction called Applicative, which works really well for problems like yours.
First of all you have to import Control.Applicative if you're using GHC prior to 7.10 for newer versions it is already at your fingertips.
So what does this abstraction provide, similar to Functor it gives you a way to handle "side effects" in your case the possibility of the failing lookup resulting in Nothing.
We have two operators provided by Applicative: pure and <*>, and in addition as every Applicative is required to be a Functor we also get fmap or <$> which are the latter is just an infix alias for convenience.
So how does this apply to your situation?
<*> :: Applicative f => f (a -> b) -> f a -> f b
<$> :: Functor f => a -> b -> f a -> f b
First of all you see that those two look darn similar but with <*> being slightly less familiar.
Now having a function
f :: Int -> Int
f x = x + 3
and
x1 :: Maybe Int
x1 = Just 4
x2 :: Maybe Int
x2 = Nothing
one couldn't simply just f y because that wouldn't typecheck - but and that is the first idea to keep in mind. Maybe is a Functor (it is also an Applicative - it is even more an M-thing, but let's not go there).
f <$> x1 = Just 7
f <$> x2 = Nothing
so you can imagine the f looking up the value and performing the calculation inside the Just and if there is no value - a.k.a. we have the Nothing situation, we'll do what every lazy student does - be lazy and do nothing ;-).
Now we get to the next part <*>
g1 :: Maybe (Int -> Int)
g1 = Just (x + 3)
g2 :: Maybe (Int -> Int)
g2 = Nothing
Still g1 x1 wouldn't work, but
g1 <*> x1 = Just 7
g1 <*> x2 = Nothing
g2 <*> x1 = Nothing -- remember g2 is Nothing
g2 <*> x2 = Nothing
NEAT! - but still how does this solve your problem?
The 'magic' is using both operators ... for multi-argument functions
h :: Int -> Int -> Int
h x y = x + y + 2
and partial function application, which just means put in one value get back a function that waits for the next value.
GHCi> :type h 1
h 1 :: Int -> Int
Now the strange thing happens we can use with a function like h.
GHCi> :type h1 <$> x1
h1 <$> x1 :: Maybe (Int -> Int)
well that's good because then we can use our <*> with it
y1 :: Maybe Int
y1 = Just 7
h1 <$> x1 <*> y1 = Just (4 + 7 + 2)
= Just 13
and this even works with an arbitrary number of arguments
k :: Int -> Int -> Int -> Int -> Int
k x y z w = ...
k <$> x1 <*> y1 <*> z1 <*> w1 = ...
So design a pure function that works with Int, Float, Double or whatever you like and then use the Functor/Applicative abstraction to make your lookup and frequency calculation work with each other.
I recently learned about Data.Function.fix, and now I want to apply it everywhere. For example, whenever I see a recursive function I want to "fix" it. So basically my question is where and when should I use it.
To make it more specific:
1) Suppose I have the following code for factorization of n:
f n = f' n primes
where
f' n (p:ps) = ...
-- if p^2<=n: returns (p,k):f' (n `div` p^k) ps for k = maximum power of p in n
-- if n<=1: returns []
-- otherwise: returns [(n,1)]
If I rewrite it in terms of fix, will I gain something? Lose something? Is it possible, that by rewriting an explicit recursion into fix-version I will resolve or vice versa create a stack overflow?
2) When dealing with lists, there are several solutions: recursion/fix, foldr/foldl/foldl', and probably something else. Is there any general guide/advice on when to use each? For example, would you rewrite the above code using foldr over the infinite list of primes?
There are, probably, other important questions not covered here. Any additional comments related to the usage of fix are welcome as well.
One thing that can be gained by writing in an explicitly fixed form is that the recursion is left "open".
factOpen :: (Integer -> Integer) -> Integer -> Integer
factOpen recur 0 = 1
factOpen recur n = n * recur (pred n)
We can use fix to get regular fact back
fact :: Integer -> Integer
fact = fix factOpen
This works because fix effectively passes a function itself as its first argument. By leaving the recursion open, however, we can modify which function gets "passed back". The best example of using this property is to use something like memoFix from the memoize package.
factM :: Integer -> Integer
factM = memoFix factOpen
And now factM has built-in memoization.
Effectively, we have that open-style recursion requires us impute the recursive bit as a first-order thing. Recursive bindings are one way that Haskell allows for recursion at the language level, but we can build other, more specialized forms.
I'd like to mention another usage of fix; suppose you have a simple language consisting of addition, negative, and integer literals. Perhaps you have written a parser which takes a String and outputs a Tree:
data Tree = Leaf String | Node String [Tree]
parse :: String -> Tree
-- parse "-(1+2)" == Node "Neg" [Node "Add" [Node "Lit" [Leaf "1"], Node "Lit" [Leaf "2"]]]
Now you would like to evaluate your tree to a single integer:
fromTree (Node "Lit" [Leaf n]) = case reads n of {[(x,"")] -> Just x; _ -> Nothing}
fromTree (Node "Neg" [e]) = liftM negate (fromTree e)
fromTree (Node "Add" [e1,e2]) = liftM2 (+) (fromTree e1) (fromTree e2)
Suppose someone else decides to extend the language; they want to add multiplication. They will have to have access to the original source code. They could try the following:
fromTree' (Node "Mul" [e1, e2]) = ...
fromTree' e = fromTree e
But then Mul can only appear once, at the top level of the expression, since the call to fromTree will not be aware of the Node "Mul" case. Tree "Neg" [Tree "Mul" a b] will not work, since the original fromTree has no pattern for "Mul". However, if the same function is written using fix:
fromTreeExt :: (Tree -> Maybe Int) -> (Tree -> Maybe Int)
fromTreeExt self (Node "Neg" [e]) = liftM negate (self e)
fromTreeExt .... -- other cases
fromTree = fix fromTreeExt
Then extending the language is possible:
fromTreeExt' self (Node "Mul" [e1, e2]) = ...
fromTreeExt' self e = fromTreeExt self e
fromTree' = fix fromTreeExt'
Now, the extended fromTree' will evaluate the tree properly, since self in fromTreeExt' refers to the entire function, including the "Mul" case.
This approach is used here (the above example is a closely adapted version of the usage in the paper).
Beware the difference between _Y f = f (_Y f) (recursion, value--copying) and fix f = x where x = f x (corecursion, reference--sharing).
Haskell's let and where bindings are recursive: same name on the LHS and RHS refer to the same entity. The reference is shared.
In the definition of _Y there's no sharing (unless a compiler performs an aggressive optimization of common subexpressions elimination). This means it describes recursion, where repetition is achieved by application of a copy of an original, like in a classic metaphor of a recursive function creating its own copies. Corecursion, on the other hand, relies on sharing, on referring to same entity.
An example, primes calculated by
2 : _Y ((3:) . gaps 5 . _U . map (\p-> [p*p, p*p+2*p..]))
-- gaps 5 == ([5,7..] \\)
-- _U == sort . concat
either reusing its own output (with fix, let g = ((3:)...) ; ps = g ps in 2 : ps) or creating separate primes supply for itself (with _Y, let g () = ((3:)...) (g ()) in 2 : g ()).
See also:
double stream feed to prevent unneeded memoization?
How to implement an efficient infinite generator of prime numbers in Python?
Or, with the usual example of factorial function,
gen rec n = n<2 -> 1 ; n * rec (n-1) -- "if" notation
facrec = _Y gen
facrec 4 = gen (_Y gen) 4
= let {rec=_Y gen} in (\n-> ...) 4
= let {rec=_Y gen} in (4<2 -> 1 ; 4*rec 3)
= 4*_Y gen 3
= 4*gen (_Y gen) 3
= 4*let {rec2=_Y gen} in (3<2 -> 1 ; 3*rec2 2)
= 4*3*_Y gen 2 -- (_Y gen) recalculated
.....
fac = fix gen
fac 4 = (let f = gen f in f) 4
= (let f = (let {rec=f} in (\n-> ...)) in f) 4
= let {rec=f} in (4<2 -> 1 ; 4*rec 3) -- f binding is created
= 4*f 3
= 4*let {rec=f} in (3<2 -> 1 ; 3*rec 2)
= 4*3*f 2 -- f binding is reused
.....
1) fix is just a function, it improves your code when you use some recursion. It makes your code prettier.For example usage visit: Haskell Wikibook - Fix and recursion.
2) You know what does foldr? Seems like foldr isn't useful in factorization (or i didn't understand what are you mean in that).
Here is a prime factorization without fix:
fact xs = map (\x->takeWhile (\y->y/=[]) x) . map (\x->factIt x) $ xs
where factIt n = map (\x->getFact x n []) [2..n]
getFact i n xs
| n `mod` i == 0 = getFact i (div n i) xs++[i]
| otherwise = xs
and with fix(this exactly works like the previous):
fact xs = map (\x->takeWhile (\y->y/=[]) x) . map (\x->getfact x) $ xs
where getfact n = map (\x->defact x n) [2..n]
defact i n =
fix (\rec j k xs->if(mod k j == 0)then (rec j (div k j) xs++[j]) else xs ) i n []
This isn't pretty because in this case fix isn't a good choice(but there is always somebody who can write it better).
I'm just wondering about a recursion function I'm laying out in Haskell. Is it generally better to use guards than patterns for recursion functions?
I'm just not sure on what the best layout is but I do know that patterns are better when defining functions such as this:
units :: Int -> String
units 0 = "zero"
units 1 = "one"
is much preferred to
units n
| n == 0 = "zero"
| n == 1 = "one"
I'm just not sure though when it comes to recursion as to whether this is the same or different.
Just not quite sure on terminology: I'm using something like this:
f y [] = []
f y (x:xs)
| y == 0 = ......
| otherwise = ......
or would this be better?
f y [] = []
f 0 (x:xs) =
f y (x:xs) =
My general rule of thumb would be this:
Use pattern matching when the guard would be a simple == check.
With recursion, you usually are checking for a base case. So if your base case is a simple == check, then use pattern matching.
So I'd generally do this:
map f [] = []
map f (x:xs) = f x : map f xs
Instead of this (null simply checks if a list is empty. It's basically == []):
map f xs | null xs = []
| otherwise = f (head xs) : map f (tail xs)
Pattern matching is meant to make your life easier, imho, so in the end you should do what makes sense to you. If you work with a group, then do what makes sense to the group.
[update]
For your particular case, I'd do something like this:
f _ [] = []
f 0 _ = ...
f y (x:xs) = ...
Pattern matches, like guards, fall from top to bottom, stopping at the first definition that matches the input. I used the underscore symbol to indicate that for the first pattern match, I didn't care what the y argument was, and for the second pattern match, I didn't care what the list argument was (although, if you do use the list in that computation, then you should not use the underscore). Since it's still fairly simple ==-like checks, I'd personally stick with pattern matching.
But I think it's a matter of personal preference; your code is perfectly readable and correct as it is. If I'm not mistaken, when the code is compiled, both guards and pattern matches get turned into case statements in the end.
A simple rule
If you are recursing on a data structure, use pattern matching
If your recursive condition is more complex, use guards.
Discussion
Fundamentally, it depends on the test you wish to do to guard the recursion. If it is a test on the structure of a data type, use pattern matching, as it will be more efficient than redundant testing for equality.
For your example, pattern matching on the integers is obviously cleaner and more efficient:
units 0 = "zero"
units 1 = "one"
The same goes for recursive calls on any data type, where you distinguish cases via the shape of the data.
Now, if you had more complicated logical conditions, then guards would make sense.
There aren't really hard and fast rules on this, which is why the answers you've gotten were a bit hazy. Some decisions are easy, like pattern matching on [] instead of guarding with f xs | null xs = ... or, heaven forbid, f xs | length xs == 0 = ... which is terrible in multiple ways. But when there's no compelling practical issue, just use whichever makes the code clearer.
As an example, consider these functions (that aren't really doing anything useful, just serving as illustrations):
f1 _ [] = []
f1 0 (x:xs) = [[x], xs]
f1 y (x:xs) = [x] : f1 (y - 1) xs
f2 _ [] = []
f2 y (x:xs) | y == 0 = calc 1 : f2 (- x) xs
| otherwise = calc (1 / y) : f2 (y * x) xs
where calc z = x * ...
In f1, the separate patterns emphasize that the recursion has two base cases. In f2, the guards emphasize that 0 is merely a special case for some calculations (most of which are done by calc, defined in a where clause shared by both branches of the guard) and doesn't change the structure of the computation.
#Dan is correct: it's basically a matter of personal preferences and doesn't affect the generated code. This module:
module Test where
units :: Int -> String
units 0 = "zero"
units 1 = "one"
unitGuarded :: Int -> String
unitGuarded n
| n == 0 = "zero"
| n == 1 = "one"
produced the following core:
Test.units =
\ (ds_dkU :: GHC.Types.Int) ->
case ds_dkU of _ { GHC.Types.I# ds1_dkV ->
case ds1_dkV of _ {
__DEFAULT -> Test.units3;
0 -> Test.unitGuarded2;
1 -> Test.unitGuarded1
}
}
Test.unitGuarded =
\ (n_abw :: GHC.Types.Int) ->
case n_abw of _ { GHC.Types.I# x_ald ->
case x_ald of _ {
__DEFAULT -> Test.unitGuarded3;
0 -> Test.unitGuarded2;
1 -> Test.unitGuarded1
}
}
Exactly the same, except for the different default case, which in both instances is a pattern match error. GHC even commoned-up the strings for the matched cases.
The answers so far do not mention the advantage of pattern matching which is the most important for me: ability to safely implement total functions.
When doing pattern matching you can safely access the internal structure of the object without the fear of this object being something else. In case you forget some of the patterns, the compiler can warn you (unfortunately this warning is off by default in GHC).
For example, when writing this:
map f xs | null xs = []
| otherwise = f (head xs) : map f (tail xs)
You are forced to use non-total functions head and tail, thus risking the life of your program. If you make a mistake in guard conditions, the compiler can't help you.
On the other hand, if you make an error with pattern matching, the compiler can give you an error or a warning depending on how bad your error was.
Some examples:
-- compiles, crashes in runtime
map f xs | not (null xs) = []
| otherwise = f (head xs) : map f (tail xs)
-- does not have any way to compile
map f (h:t) = []
map f [] = f h : map f t
-- does not give any warnings
map f xs = f (head xs) : map f (tail xs)
-- can give a warning of non-exhaustive pattern match
map f (h:t) = f h : map f t
I am new to Haskell and I am very confused by Where vs. Let. They both seem to provide a similar purpose. I have read a few comparisons between Where vs. Let but I am having trouble discerning when to use each. Could someone please provide some context or perhaps a few examples that demonstrate when to use one over the other?
Where vs. Let
A where clause can only be defined at the level of a function definition. Usually, that is identical to the scope of let definition. The only difference is when guards are being used. The scope of the where clause extends over all guards. In contrast, the scope of a let expression is only the current function clause and guard, if any.
Haskell Cheat Sheet
The Haskell Wiki is very detailed and provides various cases but it uses hypothetical examples. I find its explanations too brief for a beginner.
Advantages of Let:
f :: State s a
f = State $ \x -> y
where y = ... x ...
Control.Monad.State
will not work, because where refers to
the pattern matching f =, where no x
is in scope. In contrast, if you had
started with let, then you wouldn't
have trouble.
Haskell Wiki on Advantages of Let
f :: State s a
f = State $ \x ->
let y = ... x ...
in y
Advantages of Where:
f x
| cond1 x = a
| cond2 x = g a
| otherwise = f (h x a)
where
a = w x
f x
= let a = w x
in case () of
_ | cond1 x = a
| cond2 x = g a
| otherwise = f (h x a)
Declaration vs. Expression
The Haskell wiki mentions that the Where clause is declarative while the Let expression is expressive. Aside from style how do they perform differently?
Declaration style | Expression-style
--------------------------------------+---------------------------------------------
where clause | let expression
arguments LHS: f x = x*x | Lambda abstraction: f = \x -> x*x
Pattern matching: f [] = 0 | case expression: f xs = case xs of [] -> 0
Guards: f [x] | x>0 = 'a' | if expression: f [x] = if x>0 then 'a' else ...
In the first example why is the Let in scope but Where is not?
Is it possible to apply Where to the first example?
Can some apply this to real examples where the variables represent actual expressions?
Is there a general rule of thumb to follow when to use each?
Update
For those that come by this thread later on I found the best explanation to be found here: "A Gentle Introduction to Haskell".
Let Expressions.
Haskell's let expressions are useful
whenever a nested set of bindings is
required. As a simple example,
consider:
let y = a*b
f x = (x+y)/y
in f c + f d
The set of bindings created by a let
expression is mutually recursive, and
pattern bindings are treated as lazy
patterns (i.e. they carry an implicit
~). The only kind of declarations
permitted are type signatures,
function bindings, and pattern
bindings.
Where Clauses.
Sometimes it is convenient to scope
bindings over several guarded
equations, which requires a where
clause:
f x y | y>z = ...
| y==z = ...
| y<z = ...
where z = x*x
Note that this cannot be done with a let expression, which only scopes over the expression which it encloses. A where clause is only allowed at the top level of a set of equations or case expression. The same properties and constraints on bindings in let expressions apply to those in where clauses. These two forms of nested scope seem very similar, but remember that a let expression is an expression, whereas a where clause is not -- it is part of the syntax of function declarations and case expressions.
1: The problem in the example
f :: State s a
f = State $ \x -> y
where y = ... x ...
is the parameter x. Things in the where clause can refer only to the parameters of the function f (there are none) and things in outer scopes.
2: To use a where in the first example, you can introduce a second named function
that takes the x as a parameter, like this:
f = State f'
f' x = y
where y = ... x ...
or like this:
f = State f'
where
f' x = y
where y = ... x ...
3: Here is a complete example without the ...'s:
module StateExample where
data State a s = State (s -> (a, s))
f1 :: State Int (Int, Int)
f1 = State $ \state#(a, b) ->
let
hypot = a^2 + b^2
result = (hypot, state)
in result
f2 :: State Int (Int, Int)
f2 = State f
where
f state#(a, b) = result
where
hypot = a^2 + b^2
result = (hypot, state)
4: When to use let or where is a matter of taste. I use let to emphasize a computation (by moving it to the front) and where to emphasize the program flow (by moving the computation to the back).
While there is the technical difference with respect to guards that ephemient pointed out, there is also a conceptual difference in whether you want to put the main formula upfront with extra variables defined below (where) or whether you want to define everything upfront and put the formula below (let). Each style has a different emphasis and you see both used in math papers, textbooks, etc. Generally, variables that are sufficiently unintuitive that the formula doesn't make sense without them should be defined above; variables that are intuitive due to context or their names should be defined below. For example, in ephemient's hasVowel example, the meaning of vowels is obvious and so it need not be defined above its usage (disregarding the fact that let wouldn't work due to the guard).
Legal:
main = print (1 + (let i = 10 in 2 * i + 1))
Not legal:
main = print (1 + (2 * i + 1 where i = 10))
Legal:
hasVowel [] = False
hasVowel (x:xs)
| x `elem` vowels = True
| otherwise = False
where vowels = "AEIOUaeiou"
Not legal: (unlike ML)
let vowels = "AEIOUaeiou"
in hasVowel = ...
Sadly, most of the answers here are too technical for a beginner.
LHYFGG has a relevant chapter on it -which you should read if you haven't already, but in essence:
where is just a syntactic construct (not a sugar) that are useful only at function definitions.
let ... in is an expression itself, thus you can use them wherever you can put an expression. Being an expression itself, it cannot be used for binding things for guards.
Lastly, you can use let in list comprehensions too:
calcBmis :: (RealFloat a) => [(a, a)] -> [a]
calcBmis xs = [bmi | (w, h) <- xs, let bmi = w / h ^ 2, bmi >= 25.0]
-- w: width
-- h: height
We include a let inside a list comprehension much like we would a predicate, only it doesn't filter the list, it only binds to names. The names defined in a let inside a list comprehension are visible to the output function (the part before the |) and all predicates and sections that come after of the binding. So we could make our function return only the BMIs of people >= 25:
I found this example from LYHFGG helpful:
ghci> 4 * (let a = 9 in a + 1) + 2
42
let is an expression so you can put a let anywhere(!) where expressions can go.
In other words, in the example above it is not possible to use where to simply replace let (without perhaps using some more verbose case expression combined with where).
I'm trying to understand how Haskell list comprehensions work "under the hood" in regards to pattern matching. The following ghci output illustrates my point:
Prelude> let myList = [Just 1, Just 2, Nothing, Just 3]
Prelude> let xs = [x | Just x <- myList]
Prelude> xs
[1,2,3]
Prelude>
As you can see, it is able to skip the "Nothing" and select only the "Just" values. I understand that List is a monad, defined as (source from Real World Haskell, ch. 14):
instance Monad [] where
return x = [x]
xs >>= f = concat (map f xs)
xs >> f = concat (map (\_ -> f) xs)
fail _ = []
Therefore, a list comprehension basically builds a singleton list for every element selected in the list comprehension and concatenates them. If a pattern match fails at some step, the result of the "fail" function is used instead. In other words, the "Just x" pattern doesn't match so [] is used as a placeholder until 'concat' is called. That explains why the "Nothing" appears to be skipped.
What I don't understand is, how does Haskell know to call the "fail" function? Is it "compiler magic", or functionality that you can write yourself in Haskell? Is it possible to write the following "select" function to work the same way as a list comprehension?
select :: (a -> b) -> [a] -> [b]
select (Just x -> x) myList -- how to prevent the lambda from raising an error?
[1,2,3]
While implemenatations of Haskell might not do it directly like this internally, it is helpful to think about it this way :)
[x | Just x <- myList]
... becomes:
do
Just x <- myList
return x
... which is:
myList >>= \(Just x) -> return x
As to your question:
What I don't understand is, how does Haskell know to call the "fail" function?
In do-notation, if a pattern binding fails (i.e. the Just x), then the fail method is called. For the above example, it would look something like this:
myList >>= \temp -> case temp of
(Just x) -> return x
_ -> fail "..."
So, every time you have a pattern-match in a monadic context that may fail, Haskell inserts a call to fail. Try it out with IO:
main = do
(1,x) <- return (0,2)
print x -- x would be 2, but the pattern match fails
The rule for desugaring a list comprehension requires an expression of the form [ e | p <- l ] (where e is an expression, p a pattern, and l a list expression) behave like
let ok p = [e]
ok _ = []
in concatMap ok l
Previous versions of Haskell had monad comprehensions, which were removed from the language because they were hard to read and redundant with the do-notation. (List comprehensions are redundant, too, but they aren't so hard to read.) I think desugaring [ e | p <- l ] as a monad (or, to be precise, as a monad with zero) would yield something like
let ok p = return e
ok _ = mzero
in l >>= ok
where mzero is from the MonadPlus class. This is very close to
do { p <- l; return e }
which desugars to
let ok p = return e
ok _ = fail "..."
in l >>= ok
When we take the List Monad, we have
return e = [e]
mzero = fail _ = []
(>>=) = flip concatMap
I.e., the 3 approaches (list comprehensions, monad comprehensions, do expressions) are equivalent for lists.
I don't think the list comprehension syntax has much to do with the fact that List ([]), or Maybe for that matter, happens to be an instance of the Monad type class.
List comprehensions are indeed compiler magic or syntax sugar, but that's possible because the compiler knows the structure of the [] data type.
Here's what the list comprehension is compiled to: (Well, I think, I didn't actually check it against the GHC)
xs = let f = \xs -> case xs of
Just x -> [x]
_ -> []
in concatMap f myList
As you can see, the compiler doesn't have to call the fail function, it can simply inline a empty list, because it knows what a list is.
Interestingly, this fact that the list comprehensions syntax 'skips' pattern match failures is used in some libraries to do generic programming. See the example in the Uniplate library.
Edit: Oh, and to answer your question, you can't call your select function with the lambda you gave it. It will indeed fail on a pattern match failure if you call it with an Nothing value.
You could pass it the f function from the code above, but than select would have the type:
select :: (a -> [b]) -> [a] -> [b]
which is perfectly fine, you can use the concatMap function internally :-)
Also, that new select now has the type of the monadic bind operator for lists (with its arguments flipped):
(>>=) :: [a] -> (a -> [b]) -> [b]
xs >>= f = concatMap f xs -- 'or as you said: concat (map f xs)