To start off this whole thing I'm working with a pattern synonym defined as follows:
{-# Language PatternSynonyms #-}
pattern x := y <- x@y
This allows me to run multiple pattern matches across a parameter at once. A regular as-binding (@) does not allow the left-hand side to be a pattern, but this does.
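For instance, a sketch of what that buys you (firstTwo is just an illustrative name, assuming the := synonym above is in scope):
firstTwo ((x:_) := (_:y:_)) = Just (x, y)
firstTwo _                  = Nothing
Here firstTwo [1,2,3] gives Just (1,2), since both sub-patterns are matched against the same argument, while firstTwo [1] gives Nothing.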
With this I make the following toy function
{-# Language ViewPatterns #-}
f ((_:_) := (head -> y)) =
  [ y ]
f [] =
  []
It's not the best way to implement this I am sure, but it's a minimum working example for the behavior in question.
This has a function that takes a single parameter.
It matches the parameter against two patterns using the defined synonym.
The first pattern matches any non-empty list and makes no bindings.
The second runs the head function on the list and binds y to the result.
So the question is: can head cause an error, or will the other pattern prevent it?
>>> f []
[]
The other pattern prevents it! Alright, so if I do them in the other order, then it should break, right?
f' ((head -> y) := (_:_)) =
  [ y ]
f' [] =
  []
>>> f' []
[]
Nope! It still works. So now my question is: is the second pattern doing anything at all? Maybe view patterns have some sort of smart behavior where they call the function and fail the pattern if an error occurs ...
f'' (head -> y) =
  [ y ]
f'' [] =
  []
>>> f'' []
[*** Exception: Prelude.head: empty list
No ... it doesn't. This fails. Somehow (_:_) blocks the error no matter what side it's on. Maybe GHC prefers to match destructuring patterns before view patterns? To test this I can replace the pattern (_:_) with (reverse -> _:_). That way it has to run a function before it can get to the destructuring.
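Concretely, the test looks something like this (a reconstruction of what I tried; f''' is just a fresh name):
f''' ((head -> y) := (reverse -> (_:_))) =
  [ y ]
f''' [] =
  []
>>> f''' []
[]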
But having tested it, the new pattern doesn't change the behavior. This hypothesis can be ruled out.
So maybe it's laziness? y can't be evaluated if the list is empty, so it sits in a thunk and the error never occurs. That does seem to be at least part of it: if I replace (head -> y) with (undefined -> y) there is no change in behavior.
However if I replace it with (undefined -> "yo"):
f ((undefined -> "yo") := (x:_)) = [x]
f [] = []
>>> f []
*** Exception: Prelude.undefined
The undefined does get evaluated, which seems to indicate that the pattern is forcing evaluation in order to compare against "yo". And if I now switch the order:
f ((x:_) := (undefined -> "yo")) = [x]
f [] = []
>>> f []
[]
It isn't evaluated. It seems that now we are short-circuiting the pattern match.
So the laziness hypothesis seems to make sense? It's still very opaque to me, and I would love to have someone with more experience with the internals of GHC confirm it.
So my question now is: what is going on? Is it laziness? How does it work?
A big thanks to discord user lexi. They helped a lot in the diagnosis thus far.
You are indeed observing the effect of laziness.
Let's start with a much more basic example:
f :: () -> Int
f x = 42
Laziness makes f undefined return 42. This is because the variable pattern x does not require the argument to be evaluated, so undefined is never demanded.
By comparison, if we used
f :: () -> Int
f () = 42
then f undefined does crash, since the pattern () requires the argument to be evaluated until it exposes the () constructor (which, in this case, means fully evaluated).
Similarly,
f :: String -> Int
f x = 42
will cause f undefined to return 42, while
f :: String -> Int
f (x:xs) = 42
will cause f undefined to crash, after trying to evaluate undefined so as to expose the first list constructor (either : or []).
We also have that
f :: String -> Int
f "yo" = 42
f x = 0
makes f undefined crash: after all, the pattern "yo" means ('y':'o':[]), so it will force undefined, trying to match it against the first :. In more detail, all the following calls will crash:
f undefined
f (undefined:anything)
f ('y':undefined)
f ('y':undefined:anything)
f ('y':'o':undefined)
Here anything can be undefined or any other string/char as needed.
By comparison, all of the following calls will return 0 since the first pattern in the definition fails its match (without crashing!):
f []
f ('a':anything)
f ('y':'a':anything)
f ('y':'o':anything:anything)
Again, anything can be undefined or any other string/char as needed.
This is because the pattern matching of "yo" is done roughly like this:
evaluate the input value x until WHNF (expose its first constructor)
  if it is [], fail
  if it is y:ys, evaluate y until WHNF
    if y is a char other than 'y', fail
    if y is 'y', evaluate ys until WHNF
      if it is [], fail
      if it is z:zs, evaluate z until WHNF
        if z is a char other than 'o', fail
        if z is 'o', evaluate zs until WHNF
          if it is [], succeed!!
          if it is h:hs, fail
Note that at each "evaluate .. until WHNF" point we could crash (or get stuck in an infinite computation) because of bottoms.
Essentially, pattern matching proceeds left-to-right, evaluating the input only as much as needed and stopping as soon as the result (failure or success) is known. This does not necessarily force the full evaluation of the input. On failure, we do not even necessarily evaluate the input as deeply as the pattern, if we discover an early failure point. This is indeed what happens when you write:
It seems that now we are short-circuiting the pattern match.
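The same early stopping happens with any nested pattern; for instance (a small sketch, with deep as a made-up name):
deep :: (Int, Int) -> Int
deep (0, 1) = 42
deep _      = 0
Here deep (5, undefined) returns 0: the match already fails on the first component, so the undefined in the second component is never forced.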
Now, view patterns follow the same principle. A pattern undefined -> x will not evaluate undefined on the input since x does not need to know the result of undefined to succeed. Instead undefined -> x:xs, undefined -> [], and undefined -> "yo" do need to know the result, hence they will evaluate it as needed.
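For instance (a sketch, with k as a throwaway name):
k :: () -> String
k (undefined -> x) = "ok"
Here k () returns "ok", because x is never demanded and so undefined is never run; changing the pattern to (undefined -> "yo") makes the same call crash, since matching against "yo" forces the view function's result.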
About your examples:
f ((_:_) := (head -> y))
Here, head -> y always succeeds. On its own, it could bind y to a bottom value, but that's prevented by the leftmost _:_ pattern.
f' ((head -> y) := (_:_))
Here, head -> y always succeeds. On its own, it will bind y to a bottom value, and this actually happens if the input is [], but that does not force the input, so no crash is caused so far. After that, we try the second pattern, _:_, which fails. Result: failure, but no crash.
f'' (head -> y) = [ y ]
Again, head -> y always succeeds, and binds y to bottom (if the input is []). The pattern matching will succeed, and the result of f'' is [ head [] ]. We can take, e.g., the length of this list, but we cannot print its contents without crashing.
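For instance, in GHCi (a session sketch) the spine of that list can be consumed even though the element cannot:
>>> length (f'' [])
1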
f ((undefined -> "yo") := (x:_)) = [x]
f [] = []
undefined -> "yo" crashes, as explained above. The x:_ pattern is never tried.
f ((x:_) := (undefined -> "yo")) = [x]
Here we first match x:_ and only when that succeeds we try undefined -> "yo". Since we call f with [], the view pattern is not tried, so it does not crash. Calling f "a" would instead match x:_, try the view pattern and crash.
Related
In explaining foldr to Haskell newbies, the canonical definition is
foldr :: (a -> b -> b) -> b -> [a] -> b
foldr _ z [] = z
foldr f z (x:xs) = f x (foldr f z xs)
But in GHC.Base, foldr is defined as
foldr k z = go
  where
    go []     = z
    go (y:ys) = y `k` go ys
It seems this definition is an optimization for speed, but I don't see why using the helper function go would make it faster. The source comments (see here) mention inlining, but I also don't see how this definition would improve inlining.
I can add some important details about GHC's optimization system.
The naive definition of foldr passes around a function. There's an inherent overhead in calling a function - especially when the function isn't known at compile time. It'd be really nice to be able to inline the definition of the function if it's known at compile time.
There are tricks available to perform that inlining in GHC - and this is an example of them. First, foldr needs to be inlined (I'll get to why later). foldr's naive implementation is recursive, so cannot be inlined. So a worker/wrapper transformation is applied to the definition. The worker is recursive, but the wrapper is not. This allows foldr to be inlined, despite the recursion over the structure of the list.
When foldr is inlined, it creates a copy of all of its local bindings, too. It's more or less a direct textual inlining (modulo some renaming, and happening after the desugaring pass). This is where things get interesting. go is a local binding, and the optimizer gets to look inside it. It notices that it calls a function in the local scope, which it names k. GHC will often remove the k variable entirely, and will just replace it with the expression k reduces to. And then afterwards, if the function application is amenable to inlining, it can be inlined at this time - removing the overhead of calling a first-class function entirely.
Let's look at a simple, concrete example. This program will echo a line of input with all trailing 'x' characters removed:
dropR :: Char -> String -> String
dropR x r = if x == 'x' && null r then "" else x : r
main :: IO ()
main = do
  s <- getLine
  putStrLn $ foldr dropR "" s
First, the optimizer will inline foldr's definition and simplify, resulting in code that looks something like this:
main :: IO ()
main = do
  s <- getLine
  -- I'm changing the where clause to a let expression for the sake of readability
  putStrLn $ let { go [] = ""; go (x:xs) = dropR x (go xs) } in go s
And that's what the worker/wrapper transformation allows. I'm going to skip the remaining steps, but it should be obvious that GHC can now inline the definition of dropR, eliminating the function-call overhead. This is where the big performance win comes from.
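For the curious, the skipped step would leave something roughly like this (an illustration of the idea, not actual GHC output):
putStrLn $ let go []     = ""
               go (x:xs) = let r = go xs
                           in if x == 'x' && null r then "" else x : r
           in go s
At this point there is no first-class function left to call in the inner loop at all.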
GHC cannot inline recursive functions, so
foldr :: (a -> b -> b) -> b -> [a] -> b
foldr _ z [] = z
foldr f z (x:xs) = f x (foldr f z xs)
cannot be inlined. But
foldr k z = go
  where
    go []     = z
    go (y:ys) = y `k` go ys
is not a recursive function. It is a non-recursive function with a local recursive definition!
This means that, as #bheklilr writes, in map (foldr (+) 0) the foldr can be inlined and hence f and z replaced by (+) and 0 in the new go, and great things can happen, such as unboxing of the intermediate value.
As the comments say:
-- Inline only in the final stage, after the foldr/cons rule has had a chance
-- Also note that we inline it when it has *two* parameters, which are the
-- ones we are keen about specialising!
In particular, note the "we inline it when it has two parameters, which are the ones we are keen about specialising!"
What this is saying is that when foldr gets inlined, it's inlined only for the specific choice of f and z, not for the choice of the list getting folded. I'm no expert, but it seems this makes it possible to inline it in situations like
map (foldr (+) 0) some_list
so that the inlining happens at this point and not after map has been applied. This makes it optimizable in more situations and more easily. All the helper function does is hide the third (list) argument so {-# INLINE #-} can do its thing.
One tiny important detail not mentioned in other answers is that GHC, given a function definition like
f x y z w q = ...
cannot inline f until all of the arguments x, y, z, w, and q are applied. This means that it's often advantageous to use the worker/wrapper transformation to expose a minimal set of function arguments which must be applied before inlining can occur.
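As a hedged sketch of that idea (applyAll is a made-up name, written in the same wrapper/worker style as GHC's foldr):
-- Wrapper: its left-hand side mentions only f, so a call like applyAll (+1)
-- already counts as fully applied for inlining purposes.
applyAll :: (a -> b) -> [a] -> [b]
applyAll f = go
  where
    -- Worker: recursive, closes over f.
    go []     = []
    go (x:xs) = f x : go xs
{-# INLINE applyAll #-}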
I understand what lazy evaluation is, how it works, and the advantages it has, but could you explain to me what strict evaluation really is in Haskell? I can't seem to find much info about it, since lazy evaluation is the best known.
What are the benefits of each over the other? When is strict evaluation actually used?
Strictness happens in a few ways in Haskell,
First, a definition. A function f is strict if and only if, whenever evaluating its argument a doesn't terminate, evaluating f a doesn't terminate either. Nonstrict (sometimes called lazy) is just the opposite of this.
You can be strict in an argument, either using pattern matching
-- strict
foo True = 1
foo False = 1
-- vs. non-strict
foo _ = 1
With the second version we don't need to evaluate the argument, so we could pass something like foo (let x = x in x) and it'd still just return 1. With the first one, however, the function needs to see which value the input is so it can pick the appropriate branch, and thus it is strict.
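To make that concrete, here is a small sketch (loop, fooLazy and fooStrict are made-up names for the two versions above):
-- A value whose evaluation never terminates.
loop :: Bool
loop = let x = x in x
fooLazy :: Bool -> Int
fooLazy _ = 1          -- fooLazy loop returns 1: the argument is never demanded
fooStrict :: Bool -> Int
fooStrict True  = 1    -- fooStrict loop never returns: the match must
fooStrict False = 1    -- evaluate the argument to pick a branch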
If we can't pattern match for whatever reason, then we can use a magic function called seq :: a -> b -> b. seq basically stipulates that whenever it is evaluated, it will evaluate a to what's called weak head normal form (WHNF) and then return b.
You may wonder why it's worth it. Let's consider a case study, foldl vs foldl'. foldl is lazy in its accumulator, so it's implemented something like
foldl :: (a -> b -> a) -> a -> [b] -> a
foldl f accum []     = accum
foldl f accum (x:xs) = foldl f (f accum x) xs
Notice that since we're never strict in accum, we'll build up a huge chain of thunks: f (... (f (f accum x1) x2) ...) xn.
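Spelled out on a tiny input, the accumulator grows like this (an informal reduction sketch using the definition above):
foldl (+) 0 [1,2,3]
= foldl (+) (0 + 1) [2,3]
= foldl (+) ((0 + 1) + 2) [3]
= foldl (+) (((0 + 1) + 2) + 3) []
= ((0 + 1) + 2) + 3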
Not a happy prospect, since this leads to memory issues. Indeed:
*> foldl (+) 0 [1..500000000]
*** Exception: stack overflow
Now what'd be better is if we forced evaluation at each step, using seq
foldl' :: (a -> b -> a) -> a -> [b] -> a
foldl' f accum [] = accum
foldl' f accum (x:xs) = let accum' = f accum x
                        in accum' `seq` foldl' f accum' xs
Now we force the evaluation of the accumulator at each step, making it much faster. This lets foldl' run in constant space rather than overflowing the stack like foldl does.
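The same reduction with foldl' stays flat, because seq forces the sum before each recursive call:
foldl' (+) 0 [1,2,3]
= foldl' (+) 1 [2,3]
= foldl' (+) 3 [3]
= foldl' (+) 6 []
= 6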
Now, seq only evaluates its argument to weak head normal form; sometimes we want values to be evaluated fully, to normal form. For that we can use a library/type class:
import Control.DeepSeq -- a library on hackage
deepseq :: NFData a => a -> b -> b
This forces a to be fully evaluated so,
*> [1, 2, error "Explode"] `seq` 1
1
*> [1, 2, error "Explode"] `deepseq` 1
error: Explode
*> undefined `seq` 1
error: undefined
*> undefined `deepseq` 1
error: undefined
So this fully evaluates its argument. This is very useful for parallel programming, for example, where you want to fully evaluate something on one core before it's sent back to the main thread; otherwise you'd just create a thunk and all the actual computation would still be sequential.
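As a hedged sketch of that last point (handBack and the MVar plumbing are made up for illustration):
import Control.Concurrent.MVar (MVar, putMVar)
import Control.DeepSeq (deepseq)
-- Fully evaluate the result on this thread before handing it over,
-- so the receiving thread doesn't inherit an unevaluated thunk.
handBack :: MVar [Int] -> [Int] -> IO ()
handBack box result = result `deepseq` putMVar box result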
I've written a Haskell function which splits a list xs into (init xs, last xs) like so:
split xs = split' [] xs
  where
    split' acc (x:[]) = (reverse acc, x)
    split' acc (x:xs) = split' (x:acc) xs
Since an empty list cannot be split in this way, there is no match for the empty list. However, I did not want to simply error ... the function. Thus I defined the following:
split [] = ([], undefined)
Thanks to lazy evaluation I can thus define a safe init which simply returns the empty list for the empty list:
init' = fst . split
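In GHCi this behaves as hoped for the init part, while the second component is a delayed trap (a session sketch):
>>> init' ([] :: [Int])
[]
>>> snd (split ([] :: [Int]))
*** Exception: Prelude.undefined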
Is there some way I could detect the undefined if I tried to access it, such that
last' xs
  | isUndefined (snd xs) = ...
  | otherwise            = ...
I do know about Maybe and Either, and that those are a better choice for expressing what I want. However, I wondered whether there is a way to detect an actual value of undefined, i.e. something along the lines of catching errors, like catching exceptions.
undefined is no better than using error. In fact, undefined in Prelude is defined as
undefined = error "Prelude.undefined"
Now, a function that can't result in an error is called a "total function", i.e. it is valid for all input values.
The split function you've currently implemented has the signature
split :: [a] -> ([a], a)
This is a problem, since the type signature promises that the result always contains a list and an element, which is clearly impossible to provide for empty lists of generic type.
The canonical way in Haskell to address this is to change the type signature to signify that sometimes we don't have a valid value for the second item.
split :: [a] -> ([a], Maybe a)
Now you can write a proper implementation for the case where you get an empty list
split [] = ([], Nothing)
split xs = split' [] xs
  where
    split' acc (x:[]) = (reverse acc, Just x)
    split' acc (x:xs) = split' (x:acc) xs
Now you can detect the missing value case by pattern-matching
let (init', last') = split xs
in case last' of
     Nothing -> ... -- do something if we don't have a value
     Just x  -> ... -- do something with value x
Because bottom subsumes non-termination, the function isUndefined would have to solve the halting problem and thus cannot exist.
But note that even if it existed, you still could not tell if the undefined value in the 2nd element of your tuple was put there through your split function or if the last element of the list was already undefined.
The error function doesn't do anything until it is evaluated, so you can do something like:
split [] = ([], error "split: empty list")
last' = snd . split
From the Haskell 2010 Language Report > Introduction # Values and Types
Errors in Haskell are semantically equivalent to ⊥ (“bottom”). Technically, they are indistinguishable from nontermination, so the language includes no mechanism for detecting or acting upon errors.
To be clear, undefined is intended to be a way to insert ⊥ into your program, and given that (as shang noted) undefined is defined in terms of error, there is, therefore, "no mechanism for detecting or acting upon undefined".
Although, semantically speaking, Ingo's answer is correct, if you're using GHC there is a way, using a couple of "unsafe" functions. It's not quite perfect: if you pass it a computation of type IO a that throws an exception, it will report True. But it works. It's a bit of a cheat, though :).
import Control.Exception
import System.IO.Unsafe
import Unsafe.Coerce
isUndefined :: a -> Bool
isUndefined x = unsafePerformIO $
  catch ((unsafeCoerce x :: IO ()) >> return False)
        (const $ return True :: SomeException -> IO Bool)
I know this is horrible, but nonetheless it works. It won't detect non-termination though ;)
How do I write a function that takes a predicate f and a list xs and returns True if f x is True for some x ∈ xs?
For example:
ghci>exists (>2) [1,2,3]
True
This is the function I wrote:
exists :: (t->Bool)->[t]->Bool
exists f a []=error
exists f a (x:xs)
|if x∈f a =True
|otherwise= x:f a xs
I know this is not right, but I don't know why. Do I need to write the predicate function f first and then use it inside the function exists? Because I really don't know how to combine one element of the list xs with the function.
Your desired example usage is this
ghci>exists (>2) [1,2,3]
True
Stop. Hoogle time. ( <------ This should be the Haskell motto imho)
You want a function ("exists") that takes two parameters. The first is a unary function (a -> Bool) and the second is a list [a]. The desired result is a Bool
Hoogling that type signature, (a -> Bool) -> [a] -> Bool, the top hits are any, all, and find. As Andrew has noted, any is the one that behaves like the "exists" function.
As a side note, my first thought was to use find, which returns a Maybe a, and then pattern match. If it returns Nothing, then the result would be False, otherwise True.
As another side note, the actual implementation is simply any p = or . map p.
The third side note is probably the answer to your actual question. How is map defined? Hoogle is once again your friend. Search for the function's name and you can find a page that links to the source. I suggest you do this for map and or, but I will only show map here.
map _ [] = []
map f (x:xs) = f x : map f xs
That's the basic way to recurse over a list: recursiveCall f (x:xs) = f x : recursiveCall f xs. But if it can be written with map, filter, or foldl/foldr, then you should prefer those functions to writing the recursion yourself. (Stop. Hoogle time. Search for those function names and check out the source; it's pretty straightforward.)
If we take a look at your definition,
exists :: (t -> Bool) -> [t] -> Bool
exists f a []=error
exists f a (x:xs)
|if x∈f a =True
|otherwise= x:f a xs
We see that your type is
exists :: (t -> Bool) -> [t] -> Bool
So exists must take two parameters, one predicate function of type (t -> Bool) and one list of type [t]. It returns a Bool. This seems okay and matches our intended specification.
Let us look at the first line of your terms:
exists f a [] = error
This function suddenly takes three parameters. The f and the empty-list pattern [] look okay, but the a is not mentioned in the type specification. Hence, we prune it out:
exists f [] = error
Now, the error returned is not a Bool. But the spec says it must be. Let us suppose we are asking exists (<2) []. Then would a natural answer to the question be True or False? Or paraphrased, is there any element x in [] satisfying the predicate f x ?
On to the next line,
exists f a (x:xs)
|if x∈f a =True
|otherwise= x:f a xs
We learned that the a has to go by the type specification, so let us prune it. Since we have now grown a natural dislike for the a, why not prune it everywhere it occurs. Also, since the if will produce a syntax error, let's rid ourselves of that too:
exists f (x:xs)
  | x∈f       = True
  | otherwise = x:f xs
The x∈f does not make much sense, but f x does. The guard variant will be taken if f x returns True. Now, the True which is returned here sounds about right. It signifies that we have found an element in the list matching the predicate - and lo and behold, x might be it!
So we turn our attention to the final line. The otherwise means that the guard f x did not return True. As a consequence, the x is not satisfying the predicate, so we must search the rest of the list.
The right-hand side x : f xs is peculiar. The : means that we will try to return a list, but the return type of the function is something of type Bool. The type checker won't like us if we try this. Furthermore, we have no reason to look at x anymore since we just determined it does not satisfy the predicate.
The key thing you are missing is that we need recursion at this point. We need to search the tail xs of the list somehow - and recursion means to invoke the exists function on the tail.
Your general track is right, but ask again if something is unclear. One trick might be to go by the types for the recursion case: "What do I have to supply to exists for it to return a Bool value?".
I think the function you want already exists -- any:
Prelude> :t any
any :: (a -> Bool) -> [a] -> Bool
Prelude> any (<3) [1, 2, 3, 4]
True
Prelude> any (<3) [3, 4, 5, 6]
False
And then, in the spirit of your question -- not just getting a working function but working out how it's done -- we can look up the definition in the prelude:
any p xs = or (map p xs)
We map the function over the list to get a new [Bool] list, and then check with or to see if any of them are True, which, thanks to lazy evaluation, short-circuits as needed:
Prelude> any (<3) [1, 2..]
True
Actually your original version wasn't too far from working. To fix it, write:
exists :: (t -> Bool) -> [t] -> Bool
exists _ [] = False
exists f (x:xs)
  | f x       = True
  | otherwise = exists f xs
Instead of writing x∈f, just apply f to x, using f x as the guard condition. Your otherwise clause should also return a Bool: the result of exists on the rest of the list.
I'm trying to understand how Haskell list comprehensions work "under the hood" in regards to pattern matching. The following ghci output illustrates my point:
Prelude> let myList = [Just 1, Just 2, Nothing, Just 3]
Prelude> let xs = [x | Just x <- myList]
Prelude> xs
[1,2,3]
Prelude>
As you can see, it is able to skip the "Nothing" and select only the "Just" values. I understand that List is a monad, defined as (source from Real World Haskell, ch. 14):
instance Monad [] where
  return x = [x]
  xs >>= f = concat (map f xs)
  xs >> f  = concat (map (\_ -> f) xs)
  fail _   = []
Therefore, a list comprehension basically builds a singleton list for every element selected in the list comprehension and concatenates them. If a pattern match fails at some step, the result of the fail function is used instead. In other words, the "Just x" pattern doesn't match Nothing, so [] is used as a placeholder until concat is called. That explains why the "Nothing" appears to be skipped.
What I don't understand is, how does Haskell know to call the "fail" function? Is it "compiler magic", or functionality that you can write yourself in Haskell? Is it possible to write the following "select" function to work the same way as a list comprehension?
select :: (a -> b) -> [a] -> [b]
select (\(Just x) -> x) myList -- how to prevent the lambda from raising an error?
[1,2,3]
While implementations of Haskell might not do it directly like this internally, it is helpful to think about it this way :)
[x | Just x <- myList]
... becomes:
do
  Just x <- myList
  return x
... which is:
myList >>= \(Just x) -> return x
As to your question:
What I don't understand is, how does Haskell know to call the "fail" function?
In do-notation, if a pattern binding fails (e.g. the Just x here), then the fail method is called. For the above example, it would look something like this:
myList >>= \temp -> case temp of
  (Just x) -> return x
  _        -> fail "..."
So, every time you have a pattern-match in a monadic context that may fail, Haskell inserts a call to fail. Try it out with IO:
main = do
  (1,x) <- return (0,2)
  print x -- if the match succeeded, x would be 2, but (1,x) does not match (0,2)
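Running that aborts: the failed match calls IO's fail, which throws an IOError whose message looks something like
*** Exception: user error (Pattern match failure in do expression at ...)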
The rule for desugaring a list comprehension requires that an expression of the form [ e | p <- l ] (where e is an expression, p a pattern, and l a list expression) behave like
let ok p = [e]
    ok _ = []
in concatMap ok l
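Applying that rule to the comprehension from the question gives roughly:
let ok (Just x) = [x]
    ok _        = []
in concatMap ok myList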
Previous versions of Haskell had monad comprehensions, which were removed from the language because they were hard to read and redundant with the do-notation. (List comprehensions are redundant, too, but they aren't so hard to read.) I think desugaring [ e | p <- l ] as a monad (or, to be precise, as a monad with zero) would yield something like
let ok p = return e
    ok _ = mzero
in l >>= ok
where mzero is from the MonadPlus class. This is very close to
do { p <- l; return e }
which desugars to
let ok p = return e
    ok _ = fail "..."
in l >>= ok
When we take the List Monad, we have
return e = [e]
mzero    = []
fail _   = []
(>>=)    = flip concatMap
I.e., the 3 approaches (list comprehensions, monad comprehensions, do expressions) are equivalent for lists.
I don't think the list comprehension syntax has much to do with the fact that List ([]), or Maybe for that matter, happens to be an instance of the Monad type class.
List comprehensions are indeed compiler magic or syntax sugar, but that's possible because the compiler knows the structure of the [] data type.
Here's what the list comprehension is compiled to (well, I think; I didn't actually check it against GHC):
xs = let f = \m -> case m of
                     Just x -> [x]
                     _      -> []
     in concatMap f myList
As you can see, the compiler doesn't have to call the fail function; it can simply inline an empty list, because it knows what a list is.
Interestingly, this fact that the list comprehensions syntax 'skips' pattern match failures is used in some libraries to do generic programming. See the example in the Uniplate library.
Edit: Oh, and to answer your question, you can't call your select function with the lambda you gave it. It will indeed fail on a pattern match failure if you call it with a Nothing value.
You could pass it the f function from the code above, but then select would have the type:
select :: (a -> [b]) -> [a] -> [b]
which is perfectly fine, you can use the concatMap function internally :-)
Also, that new select now has the type of the monadic bind operator for lists (with its arguments flipped):
(>>=) :: [a] -> (a -> [b]) -> [b]
xs >>= f = concatMap f xs -- or, as you said, concat (map f xs)