Haskell, loading and is this merge function correct? - haskell

First off, I am using GHCi under Ubuntu 11.10 to run the Haskell code. Second, this is my first attempt at Haskell. Third, how might I load a file into GHCi, where does it need to be located, and what should its extension be? I know ":l "file.haskelxtnsn"" is how to load a file, but that's my best guess so far.
Seeing as I can do the above, how does this code look for merging two lists of possibly infinite size in ascending order? (I can't put this in the prelude> prompt because of indentation???) Given [1, 2, 3] and [4, 5, 6] I should get [1, 2, 3, 4, 5, 6], and I think the usage would be "take 10 (merge listx listy)".
let merge x y = (min (head x) (head y)) :
case (min (head x) (head y)) of
head x -> merge (drop 1 x) y
head y -> merge x (drop 1 y)
psuedo:
output the min of the heads of the lists
if the first lists head was output call merge with the rest of the first list and the second
else call merge with the first list and the rest of the second list

Usually the extension used is ".hs".
You can use :cd in ghci to change directory, you can also supply a path to the :load (:l for short) command.
Your logic is correct, although maybe I'd write it a bit differently (hopefully you know about the where clause and defining a function as a series of equations):
merge [] ys = ys
merge xs [] = xs
merge xs ys = min x y : if x < y then merge (tail xs) ys
                                 else merge xs (tail ys)
  where x = head xs
        y = head ys
In ghci you need a let in front of definitions, which is different from the let ... in ... expression. This is rather confusing so I suggest you just put your code in a file and load it in ghci.
Function application has higher precedence than the : operator, so some of your parentheses are not needed. We usually try to minimize the number of parentheses to make the code more concise, but don't be overzealous about it.
I don't really see the point of using a case expression here (other than causing an error). Try reading up on pattern matching for more detail: data constructors vs function applications, and why you can't use head x inside a pattern but you can use x:xs (although I didn't here). Calling head and min multiple times looks redundant, and you can also substitute drop 1 with tail.
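To tie these points together, here is a minimal sketch of a source file, assuming it is saved under the hypothetical name Merge.hs. It uses x:xs patterns instead of head and tail, and the comments show the GHCi commands mentioned above.

```haskell
-- Merge.hs (hypothetical file name)
-- In GHCi:  :cd /path/to/the/file   and then   :l Merge.hs
-- or give the path directly:        :l /path/to/the/file/Merge.hs

-- Merge two ascending lists into one ascending list.
merge :: Ord a => [a] -> [a] -> [a]
merge [] ys = ys
merge xs [] = xs
merge (x:xs) (y:ys)
  | x < y     = x : merge xs (y:ys)   -- the first list's head is smaller: emit it
  | otherwise = y : merge (x:xs) ys   -- otherwise emit the second list's head
```

Inside a file no leading let is needed, so the definition can be indented normally. After loading, merge [1, 2, 3] [4, 5, 6] evaluates to [1,2,3,4,5,6].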

If you want to type this into the GHCi prompt, you can do it like this:
> let merge x y = (min (head x) (head y)) : case (min (head x) (head y)) of {
head x -> merge (drop 1 x) y ; head y -> merge x (drop 1 y) }
i.e. using explicit braces in place of indentation (all the above meant to be entered in one unbroken line). When putting the code into a file to be loaded, the leading let shouldn't be used.
As to the code itself, it causes an error "Parse error in pattern". This is because head x is not a valid pattern.
You can find a merge code e.g. here:
merge (x:xs) (y:ys) | y < x     = y : merge (x:xs) ys
                    | otherwise = x : merge xs (y:ys)
merge xs [] = xs
merge [] ys = ys
This preserves duplicates.
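As a quick check of the usage the question asks about, the sketch below applies this merge to two infinite lists under take, and shows what preserving duplicates means. The names evensAndOdds and withDuplicates are made up for the illustration.

```haskell
merge :: Ord a => [a] -> [a] -> [a]
merge (x:xs) (y:ys)
  | y < x     = y : merge (x:xs) ys
  | otherwise = x : merge xs (y:ys)
merge xs [] = xs
merge [] ys = ys

-- Laziness lets take cut off the merge of two infinite lists.
evensAndOdds :: [Int]
evensAndOdds = take 10 (merge [1, 3 ..] [2, 4 ..])   -- [1,2,3,4,5,6,7,8,9,10]

-- An element present in both inputs appears twice in the output.
withDuplicates :: [Int]
withDuplicates = merge [1, 2, 3] [2, 3, 4]           -- [1,2,2,3,3,4]
```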

Related

Haskell Basics of Recursion

I'm not exactly sure if I'm even supposed to ask more general, nonspecific questions on this platform, but I'm new to writing Haskell and writing code in general and an in-depth explanation would really be appreciated. I'm very used to the typical method of using loop systems in other languages, but as Haskell's variables are immutable, I've found recursion really difficult to wrap my head around. A few examples from the Haskell Wikibook include:
length xs = go 0 xs
  where
    go acc [] = acc
    go acc (_:xs) = go (acc + 1) xs
zip [] _ = []
zip _ [] = []
zip (x:xs) (y:ys) = (x,y) : zip xs ys
[] !! _ = error "Index too large" -- An empty list has no elements.
(x:_) !! 0 = x
(x:xs) !! n = xs !! (n-1)
The first one is kind of self-explanatory, just writing a length function for strings from scratch. The second I guess kind of transposes lists together, and the third is like an index search that returns a char at a specified point.
Despite somewhat knowing what these pieces of code do, I'm having a lot of trouble wrapping my head around how they function. Any and all step-by-step analysis of how these things actually process would be GREATLY appreciated.
EDIT: Thank you all for the answers! I have yet to go through all of them thoroughly but after reading some this is exactly the kind of information I'm looking for. I don't have a lot of time to practice right now, finals soon and all, but during my break I decided to take another crack at recursion with this:
ood x
|rem x 2 == 1 = ood (x-1)
|x <= 0 = _
|otherwise = ood (x-2)
I wanted to attempt to make a small function that prints every odd number starting from x down to 1. Obviously it does not work; it simply only prints 1. I believe it does hit every odd number on the way down, it just does not display its answers intermittently. If any one of you could take my own attempt at code and show me how to create a successful recursive function it would really help me a lot!
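A minimal sketch of what this attempt seems to be aiming for, assuming the goal is to produce the odd numbers from x down to 1 as a list and to treat printing as a separate step. The names oddsDown and printOdds are made up here. The attempt above yields only one value because nothing is attached to the result of each recursive call; consing x onto the recursive result is what keeps the intermediate numbers.

```haskell
-- Odd numbers from x down to 1, as a list.
oddsDown :: Int -> [Int]
oddsDown x
  | x <= 0    = []                     -- base case: nothing left to produce
  | even x    = oddsDown (x - 1)       -- skip an even starting point
  | otherwise = x : oddsDown (x - 2)   -- keep x, then recurse on the next odd number

-- Printing is kept apart from the recursion.
printOdds :: Int -> IO ()
printOdds = mapM_ print . oddsDown
```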
Let's look at how one might construct two of these.
zip
We'll start with zip. The purpose of zip is to "zip" two lists into one. The name comes from the analogy of zipping two sides of a zipper together. Here's an example of how it functions:
zip [1,2,3] ["a", "b", "c"]
= [(1,"a"), (2,"b"), (3,"c")]
The type signature of zip (which is typically the first thing you'd write) is
zip :: [a] -> [b] -> [(a, b)]
That is, it takes a list of elements of type a and a list of elements of type b and produces a list of pairs with one component of each type.
To construct this function, let's go for standard Haskell pattern matching. We have four cases:
The first list is [] and the second list is [].
The first list is [] and the second list is a cons (constructed using :).
The first list is a cons and the second list is [].
The first list is a cons and the second list is also a cons.
Let's work out each of these.
zip [] [] = ?
If you zip together two empty lists, you have no elements to work with, so surely you get the empty list.
zip [] [] = []
In the next case, we have
zip [] (y : ys) = ?
We have an element, y, of type b, but no element of type a to pair it with. So we can only construct the empty list.
zip [] (y : ys) = []
The same happens in the other asymmetrical case:
zip (x : xs) [] = []
Now we get to the interesting case of two conses:
zip (x : xs) (y : ys) = ?
We have elements of the right types, so we can make a pair, (x, y), of type (a, b). That's the head of the result. What's the tail of the result? Well, that's the result of zipping the two tails together.
zip (x : xs) (y : ys) = (x, y) : zip xs ys
Putting all these together, we get
zip [] [] = []
zip [] (y : ys) = []
zip (x : xs) [] = []
zip (x : xs) (y : ys) = (x, y) : zip xs ys
But the implementation you gave only has three cases! How's that? Look at what the first two cases have in common: the first list is empty. You can see that whenever the first list is empty, the result is empty. So you can combine these cases:
zip [] _ = []
zip (x : xs) [] = []
zip (x : xs) (y : ys) = (x, y) : zip xs ys
Now look at what's now the second case. We already know that the first list is a cons (because otherwise we'd have taken the first case), and we don't need to know anything more about its composition, so we can replace it with a wildcard:
zip [] _ = []
zip _ [] = []
zip (x : xs) (y : ys) = (x, y) : zip xs ys
That produces the zip implementation you copied. Now it turns out that there's a different way to combine the patterns that I think explains itself a bit more clearly. Reorder the four patterns like this:
zip (x : xs) (y : ys) = (x, y) : zip xs ys
zip [] [] = []
zip [] (y : ys) = []
zip (x : xs) [] = []
Now you can see that the first pattern produces a cons and all the rest produce empty lists. So you can collapse all three of the rest, producing the nicely compact
zip (x : xs) (y : ys) = (x, y) : zip xs ys
zip _ _ = []
This explains what happens when both lists are conses, and what happens when that's not the case.
length
The naive way to implement length is very direct:
length :: [a] -> Int
length [] = 0
length (_ : xs) = 1 + length xs
This will give you correct answers, but it's inefficient. When evaluating the recursive call, the implementation needs to keep track of the fact that once it's done, it needs to add 1 to the result. In practice, it likely pushes the 1+ onto some sort of stack, makes the recursive call, pops the stack, and performs the addition. If the list has length n, the stack will reach size n. That's not great for efficiency. The solution, which the code you copied obscures somewhat, is to write a more general function instead.
-- | A number plus the length of a list
--
-- > lengthPlus n xs = n + length xs
lengthPlus :: Int -> [a] -> Int
-- n plus the length of an empty list
-- is n.
lengthPlus n [] = n
lengthPlus n (_ : xs) = ?
Well,
lengthPlus n (x : xs)
= -- the defining property of `lengthPlus`
n + length (x : xs)
= -- the naive definition of length
n + (1 + length xs)
= -- the associative law of addition
(n + 1) + length xs
= -- the defining property of lengthPlus, applied recursively
lengthPlus (n + 1) xs
So we get
lengthPlus n [] = n
lengthPlus n (_ : xs) = lengthPlus (n + 1) xs
Now the implementation can increment the counter argument on each recursive call instead of delaying them till afterwards. Well ... pretty much.
Thanks to Haskell's call-by-need semantics, this isn't guaranteed to run in constant memory. Suppose we call
lengthPlus 0 ["a","b"]
This reduces to the second case:
lengthPlus (0 + 1) ["b"]
But we haven't actually demanded the value of the sum. So the implementation could defer that addition work, creating a chain of deferrals that's just as bad as the stack seen earlier! In practice, the compiler is clever enough that it will work out how to do this right when optimizations are enabled. But if you don't want to rely on that, you can give it a hint:
lengthPlus n [] = n
lengthPlus n (_ : xs) = n `seq` lengthPlus (n + 1) xs
This tells the compiler that the integer argument actually has to be evaluated. As long as the compiler isn't being intentionally obtuse, it will be sure to evaluate it first, clearing up any deferred additions.
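A sketch of two other common ways to get the same strictness: strict application with ($!), and foldl' from Data.List. The names lengthPlusStrict and lengthFold are made up for the comparison.

```haskell
import Data.List (foldl')

-- ($!) evaluates the new counter before making the recursive call.
lengthPlusStrict :: Int -> [a] -> Int
lengthPlusStrict n []       = n
lengthPlusStrict n (_ : xs) = (lengthPlusStrict $! n + 1) xs

-- foldl' forces the accumulator at each step as well.
lengthFold :: [a] -> Int
lengthFold = foldl' (\n _ -> n + 1) 0
```

Both run through the list with a counter that is already evaluated at each step, so no chain of deferred additions builds up.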
I'm not sure exactly which part you're confused by. Perhaps you're just overthinking this? Let's walk through zip slowly.
For argument's sake, let's say we want to execute zip [1, 2, 3] ['A', 'B', 'C']. What do we do?
We have zip [1, 2, 3] ['A', 'B', 'C']. What now?
The first line ("equation") of the definition of zip says
zip [] _ = []
Is our first argument an empty list? No, it's [1, 2, 3]. OK, so skip this equation.
The second equation of zip says
zip _ [] = []
Is our second argument an empty list? No, it's ['A', 'B', 'C']. So ignore this equation too.
The last equation says
zip (x:xs) (y:ys) = (x, y) : zip xs ys
Is our first argument a non-empty list? Yes! It's [1, 2, 3]. So the first element becomes x, and the rest become xs: x = 1, xs = [2, 3].
Is our second argument a non-empty list? Again, yes: y = 'A', ys = ['B', 'C'].
OK, what do we do now? Well, what the right-hand side says. If I put in some extra brackets, the right-hand side basically says
(x, y) : (zip xs ys)
So we're constructing a new list, which starts with (x, y) (a 2-tuple) and continues with whatever zip xs ys is. So our output is (1, 'A') : ???.
What is the ??? part? Well, it's like we executed zip [2, 3] ['B', 'C']. Go back to the top, walk through again the same way as before. You'll find that this outputs (2, 'B') : ???.
Now we started with (1, 'A') : ???. If we replace that with the thing we just got, we now have (1, 'A') : (2, 'B') : ???.
Take this one step further and we have (1, 'A') : (2, 'B') : (3, 'C') : ???. Here the ??? part is now zip [] []. It should be clear that the first equation says this is [], so our final result is
(1, 'A') : (2, 'B') : (3, 'C') : []
which can also be written as
[(1, 'A'), (2, 'B'), (3, 'C')]
You probably already knew that was what the answer would eventually be. I hope now you can see how we get that answer.
If you understand what the three equations make zip do at each step, we can summarise the process like this:
zip [1, 2, 3] ['A', 'B', 'C']
(1, 'A') : (zip [2, 3] ['B', 'C'])
(1, 'A') : (2, 'B') : (zip [3] ['C'])
(1, 'A') : (2, 'B') : (3, 'C') : (zip [] [])
(1, 'A') : (2, 'B') : (3, 'C') : []
If you're still confused, try to put your finger on exactly what part confuses you. (Yeah, easier said than done...)
The key to recursion is to stop worrying about how your language provides support for recursion. You really only need to know three things, which I'll demonstrate using zip as the example.
How to solve the base case
The base case is zipping two lists when one is empty. In this case, we simply return an empty list.
zip _ [] = []
zip [] _ = []
How to break a problem into one (or more) simpler problem(s).
A non-empty list can be split into two parts, a head and a tail. The head is a single element; the tail is a (sub)list. To zip together two lists, we "zip" together the two heads using (,), and we zip together the two tails. Since the tails are both lists, we already have a way to zip them together: use zip!
(As a former professor of mine would say, "Trust your recursion".)
You might object that we can't call zip because we haven't finished defining it yet. But we aren't calling it yet; we are just saying that at some point in the future, when we call this function, the name zip will be bound to a function that zips two lists together, so we'll use that.
zip (x:xs) (y:ys) = let h = (x,y)
                        t = zip xs ys
                    in ...
How to put the pieces back together.
zip needs to return a list, and we have our head h and tail t of the new list. To put them together, just use (:):
zip (x:xs) (y:ys) = let h = (x,y)
                        t = zip xs ys
                    in h : t
Or more simply, zip (x:xs) (y:ys) = (x,y) : zip xs ys
When explaining recursion, it's usually simplest to start with the base case. However, the Haskell code is sometimes simpler if you can write the recursive case first, because it lets us simplify the base case.
zip (x:xs) (y:ys) = (x,y) : zip xs ys
zip _ _ = [] -- If the first pattern match failed, at least one input is empty
Taking a step further back, let's introduce the only recursive function you'll ever need:
fix :: (a -> a) -> a
fix f = f (fix f)
fix computes the fixed point of its argument.
A fixed point of a function is a value that the function maps to itself: applying the function to it gives the same value back. For instance, 1 is a fixed point of the square function square x = x**2, since square 1 == 1*1 == 1 (0 is another).
fix doesn't look terribly useful, though, since it looks like it just gets stuck in an infinite loop:
fix f = f (fix f) = f (f (fix f)) = f (f (f (fix f))) = ...
However, as we'll see, laziness lets us take advantage of this infinite stream of calls to f.
Ok, how do we actually make use of fix? Consider this nonrecursive version of zip:
zip' :: ([a] -> [b] -> [(a,b)]) -> [a] -> [b] -> [(a,b)]
zip' f (x:xs) (y:ys) = (x,y) : f xs ys
zip' _ _ _ = []
Given two nonempty lists, zip' zips them together by using the help function f that it receives to zip the tails of its inputs. If either input list is empty, it ignores f and returns an empty list. Basically, we've left the hard work to whoever calls zip'. We'll trust them to provide an appropriate f.
But how do we call zip'? What argument can we pass? This is where fix comes in. Look at the type of zip' again, but this time make the substitution t ~ [a] -> [b] -> [(a,b)]:
zip' :: ([a] -> [b] -> [(a,b)]) -> [a] -> [b] -> [(a,b)]
:: t -> t
Hey, that's the type fix expects! What's the type of fix zip'?
> :t fix zip'
fix zip' :: [a] -> [b] -> [(a, b)]
As expected. So what happens if we pass zip' its own fixed point? We should get back... the fixed point, that is, fix zip' and zip' (fix zip') should be the same function. We still don't really know what the fixed point of zip' is, but just for kicks, what happens if we try to call it?
> (fix zip') [1,2] ['a','b']
[(1,'a'),(2,'b')]
It sure looks like we just found a definition of zip! But how? Let's use equational reasoning to figure out what just happened.
(fix zip') [1,2] ['a','b']
== (zip' (fix zip')) [1,2] ['a','b'] -- def'n of fix
== (1,'a') : (fix zip') [2] ['b'] -- def'n of zip'
== (1,'a') : (zip' (fix zip')) [2] ['b'] -- def'n of fix, but in the other direction
== (1,'a') : ((2,'b') : (fix zip') [] []) -- def'n of zip'
== (1,'a') : ((2,'b') : zip' (fix zip') [] []) -- def'n of fix
== (1,'a') : ((2,'b') : []) -- def'n of zip'
Because Haskell is lazy, the last call to zip' doesn't need to evaluate fix zip', because its value is never used. So fix f doesn't need to terminate; it just needs to provide another call to f on demand.
And in the end, we see that our recursive function zip is simply the fixed point of the nonrecursive function zip':
fix f = f (fix f)
zip' f (x:xs) (y:ys) = (x,y) : f xs ys
zip' _ _ _ = []
zip = fix zip'
Let's briefly use fix to define length and (!!) as well.
length xs = fix go' 0 xs
  where go' _ acc [] = acc
        go' f acc (_:xs) = f (acc + 1) xs
xs !! n = fix (!!!) xs n
  where (!!!) _ [] _ = error "Too big"
        (!!!) _ (x:_) 0 = x
        (!!!) f (x:xs) n = f xs (n-1)
And in general, a recursive function is just the fixed point of a suitable nonrecursive function. Note that not all functions have a fixed point, though. Consider
incr x = x + 1
If you try to compute its fixed point, you get
fix incr = incr (fix incr)
         = incr (incr (fix incr))
         = ...
Since incr always needs its argument, the attempt to calculate its fixed point always diverges. It should be obvious that incr has no fixed point, because there is no number x for which x == x + 1.
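A runnable sketch of the construction above, assuming the fix exported by Data.Function in base, which has the same type as the definition given here. The names zipStep and myZip are made up to avoid clashing with the Prelude.

```haskell
import Data.Function (fix)

-- One non-recursive step of zip; self stands for "the rest of the recursion".
zipStep :: ([a] -> [b] -> [(a, b)]) -> [a] -> [b] -> [(a, b)]
zipStep self (x:xs) (y:ys) = (x, y) : self xs ys
zipStep _    _      _      = []

-- Tying the knot: myZip is the fixed point of zipStep.
myZip :: [a] -> [b] -> [(a, b)]
myZip = fix zipStep
```

Because zipStep only calls self when both lists are non-empty, the unfolding of fix stops as soon as one input runs out.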
Here’s a nice trick to show how to convert normal imperative loops into recursion. Here are the steps:
Make data immutable by not mutating objects (e.g. no x.y = z, only x = x { y = z })
Make variables “nearly immutable” by moving all variable-changes to just before control flow
Change into “goto form”
Work out the set of mutating variables
Add “variable changes” for mutating variables that don’t change at each goto
Replace labels with functions and goto with function (tail) calls
Here is a simple example after step 1 but before anything else (made up syntax)
let sumOfList f list =
  total = 0
  done = False
  while (not done) {
    case list of
      [] -> done = True
      (x : xs) ->
        list = xs
        total = total + (f x)
  }
  total
Well this doesn’t really do much other than change variables but there’s one thing we can do for step 2:
let sumOfList f list =
  total = 0
  done = False
  while (not done) {
    case list of
      [] -> done = True
      (x : xs) ->
        let y = f x in
        list = xs
        total = total + y
  }
  total
Step 3:
let sumOfList f list =
  total = 0
  done = False
  loop:
    if not done then goto body else goto finish
  body:
    case list of
      [] ->
        done = True
        goto loop
      (x : xs) ->
        let y = f x in
        list = xs
        total = total + y
        goto loop
  finish:
    total
Step 4: the mutating variables are done, list, and total
Step 5:
let sumOfList f list =
  done = False
  list = list
  total = 0
  goto loop
  loop:
    if not done then
      total = total
      done = done
      list = list
      goto body
    else
      total = total
      done = done
      list = list
      goto finish
  body:
    case list of
      [] ->
        done = True
        total = total
        list = list
        goto loop
      (x : xs) ->
        let y = f x in
        done = done
        total = total + y
        list = xs
        goto loop
  finish:
    total
Step 6:
let sumOfList f list = loop False list 0 where
  loop done list total =
    if not done
      then body done list total
      else finish done list total
  body done list total =
    case list of
      [] -> loop True list total
      (x : xs) -> let y = f x in loop done xs (total + y)
  finish done list total = total
We can now clean things up by removing some unused parameters:
let sumOfList f list = loop False list 0 where
  loop done list total =
    if not done
      then body done list total
      else finish total
  body done list total =
    case list of
      [] -> loop True list total
      (x : xs) -> let y = f x in loop done xs (total + y)
  finish total = total
And realising that in body done is always False and inlining loop and finish
let sumOfList f list = body list 0 where
  body list total =
    case list of
      [] -> total
      (x : xs) -> let y = f x in body xs (total + y)
And now we can pull the case into multiple function definitions:
let sumOfList f list = body list 0 where
  body [] total = total
  body (x : xs) total =
    let y = f x in body xs (total + y)
Now inline the definition of y and give body a better name:
let sumOfList f list = go list 0 where
  go [] total = total
  go (x : xs) total = go xs (total + f x)
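A runnable sketch of this final form as a top-level definition in a file (so without the leading let). The type signature is an assumption, since the steps above leave the element and result types open. Note that the recursive call continues with the tail xs and adds f x, matching the list = xs and total = total + (f x) updates of the original loop.

```haskell
-- Sum of f applied to every element, written as an accumulator loop.
sumOfList :: (a -> Int) -> [a] -> Int
sumOfList f list = go list 0
  where
    go []       total = total                  -- loop finished: return the accumulator
    go (x : xs) total = go xs (total + f x)    -- next iteration: rest of list, updated total
```

For example, sumOfList (* 2) [1, 2, 3] evaluates to 12.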
A loop is a function call is a loop. Reentering a loop body with updated loop parameters is the same as reentering a function body in a new recursive call with the updated function parameters. Or in other words, a function call is a goto, and the function name is the label to jump to:
loop_label:
  do stuff updating a, b, c,
  goto loop_label
is
loop a b c =
  let a2 = {- .... a ... b ... c ... -}
      b2 = {- .... a ... b ... c ... -}
      c2 = {- .... a ... b ... c ... -}
  in
    loop a2 b2 c2
You did say you're comfortable with loops.
Let's give the translations of your example functions in terms of the more primitive construct, case, as defined in the Report:
length xs = go 0 xs
  where
    go a b = case (a , b) of
      ( acc , [] ) -> acc
      ( acc , (_ : xs) ) -> go (acc + 1) xs
so it's the same old plain linear recursion.
Same goes to the other two definitions:
zip a b = case ( a , b ) of
  ( [] , _ ) -> []
  ( _ , [] ) -> []
  (x : xs , y : ys) -> (x,y) : zip xs ys
(the last one is left as an exercise).
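One possible solution to that exercise, as a sketch. The function is named index here (a made-up name) so that it does not clash with the Prelude's (!!).

```haskell
-- List indexing written with a single case on both arguments.
index :: [a] -> Int -> a
index a b = case (a, b) of
  ([]    , _) -> error "Index too large"   -- ran out of elements
  (x : _ , 0) -> x                         -- found the element
  (_ : xs, n) -> index xs (n - 1)          -- step down the list
```

The alternatives are tried top to bottom, exactly like the three equations of the original definition.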

Base case for not going out of the list?

I'm very new to Haskell and would like to know if there's a basic case for not running off the end of the list when going through it!
For example, in this code I'm trying to make a list where each number is compared with the number on its right: if it's bigger it stays in the list, otherwise we remove it. But it keeps giving me Prelude.head: empty list, since it's comparing to nothing at the end, I'm assuming. I've tried every base case I could think of... can anyone help me?
maiores:: [Int]->[Int]
maiores [] = []
maiores (x:xs) | x > (head xs) = [x] ++ maiores xs
               | otherwise = maiores xs
If your function is passed a list with one element, it will match (x:xs), with xs matching []. Then you end up with head [] and thus your error. To avoid this, add an additional base case maiores (x:[]) = ... between your two existing cases, and fill it in appropriately.
Also: you can write [x] ++ maiores xs as x : maiores xs, which is more natural because you deconstruct a : and then immediately reconstruct it with the modified value, as opposed to indirectly using ++.
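A sketch of the function with the extra base case filled in, matching on the right-hand neighbour directly instead of calling head. What to do with the last element is an assumption here: it has no neighbour to compare against, so this version keeps it; return [] in that clause instead if it should be dropped.

```haskell
maiores :: [Int] -> [Int]
maiores []  = []
maiores [x] = [x]                       -- assumption: a lone last element is kept
maiores (x : y : rest)
  | x > y     = x : maiores (y : rest)  -- x beats its right neighbour: keep it
  | otherwise = maiores (y : rest)      -- otherwise drop x and carry on from y
```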
Never use head or tail in your code, unless you can't avoid it. These are partial functions, which will crash when their input is empty.
Instead, prefer pattern matching: instead of
foo [] = 4
foo (x:xs) = x + head xs + foo (tail xs)
write
foo [] = 4
foo (x1:x2:xs) = x1 + x2 + foo xs
Now, if we turn on warnings with -Wall, GHC will warn that the match is not exhaustive: we forgot to handle the [_] case. So, we can fix the program accordingly
foo [] = 4
foo [x] = x
foo (x1:x2:xs) = x1 + x2 + foo xs
Just make pattern matching more specific. Since (:) is right associative:
maiores:: [Int]->[Int]
maiores [] = []
maiores (x : y : xs) | x > y = [x] ++ maiores (y:xs)
maiores (_ : xs) = maiores xs

Creating a function to mix elements from two lists

How would I go about writing a function to mix two lists as such:
mixLists :: [a] -> [a] -> [a]
mixLists [1,2,3] [4,6,8,2] = [1,4,2,6,3,8,2]
One simple option would be to write a simple recursive function to process the two lists into one. This function needs 3 possible cases
The first list is empty, so we just return the second straightaway as there's no further mixing to be done.
mixLists [] ys = ys
The second list could also be empty and as we might expect, in this case we just return the first list, whatever it may be
mixLists xs [] = xs
Now if we've made it past those two clauses, we know that neither xs nor ys are empty, so we only need to explain what to do if both are nonempty
mixLists (x : xs) (y : ys) = ?
Now we want to create a new list which starts with x followed by y because we're mixing together two lists, one of which starts with x and the other, y.
mixLists (x : xs) (y : ys) = x : y : ?
Now we have to figure out what the rest of this outputted list should be. Our specification presumably says it ought to contain xs and ys mixed and we can easily calculate that using a recursive call
mixLists (x : xs) (y : ys) = x : y : mixLists xs ys
If you can live with the limitation that the lists need to be the same length, you can solve this with a one liner....
mixLists = (concat .) . zipWith ((. return) . (:))
It might be an interesting exercise to figure out how this works.... Hint- the function in zipWith can also be written as \x y -> [x, y].
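For comparison, a sketch that puts the three clauses of the recursive version together, next to a zipWith variant that uses the lambda from the hint with explicit arguments. The name mixListsZip is made up to keep the two apart.

```haskell
-- Recursive version: handles lists of different lengths.
mixLists :: [a] -> [a] -> [a]
mixLists [] ys = ys
mixLists xs [] = xs
mixLists (x : xs) (y : ys) = x : y : mixLists xs ys

-- zipWith version: pairs elements up as two-element lists, then flattens.
-- It stops at the end of the shorter list.
mixListsZip :: [a] -> [a] -> [a]
mixListsZip xs ys = concat (zipWith (\x y -> [x, y]) xs ys)
```

On [1,2,3] and [4,6,8,2] the recursive version gives [1,4,2,6,3,8,2], while the zipWith version drops the unmatched trailing 2.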

Why does Sieve of Eratosthenes need extra helper function for merging infinite lists?

I'm working with the Sieve of Eratosthenes code from Literate Programming (http://en.literateprograms.org/Sieve_of_Eratosthenes_%28Haskell%29), modified slightly to include edge cases on merge and diff:
primesInit = [2,3,5,7,11,13]
primes = primesInit ++ [i | i <- diff [15,17..] nonprimes]
nonprimes = foldr1 f . map g $ tail primes
  where g p = [n * p | n <- [p,p+2..]]
        f (x:xt) ys = x : (merge xt ys)
merge :: (Ord a) => [a] -> [a] -> [a]
merge [] ys = ys
merge xs [] = xs
merge xs@(x:xt) ys@(y:yt)
  | x < y = x : merge xt ys
  | x == y = x : merge xt yt
  | x > y = y : merge xs yt
diff :: (Ord a) => [a] -> [a] -> [a]
diff [] ys = []
diff xs [] = xs
diff xs@(x:xt) ys@(y:yt)
  | x < y = x : diff xt ys
  | x == y = diff xt yt
  | x > y = diff xs yt
Both merge and diff on their own are lazy. So are nonprimes and primes. But if we change the definition of nonprimes to remove f, as in:
nonprimes = foldr1 merge . map g $ tail primes
  where g p = [n * p | n <- [p,p+2..]]
Now nonprimes isn't lazy. I've also recreated this with take 20 $ foldr1 merge [[i*n | n <- [3,7..]] | i <- [5,9..]] (GHCI runs out of memory and exits).
Based on http://www.haskell.org/haskellwiki/Performance/Laziness , one easy source of non-laziness is recursing before returning a data constructor. But merge doesn't have this problem; it returns a cons-cell that contains the recursive call as the second item. Nor should the use of foldr be a culprit here by itself (It's foldl that can't do infinite lists).
So, why does merge need to be separated from foldr1 by f, which essentially does the first call to merge manually? All f does is return a cons cell that contains the call to merge as the second item, right?
NOTE: Someone else on Stack Overflow was working with similar code and ran into the same problem I did, but they accepted an answer that looked to me like basically different code. I'm asking why, not how, as it seems that laziness is somewhat important in Haskell.
Let's compare those two functions again:
merge [] ys = ys
merge xs [] = xs
merge xs@(x:xt) ys@(y:yt)
  | x < y = x : merge xt ys
  | x == y = x : merge xt yt
  | x > y = y : merge xs yt
and
f (x:xt) ys = x : (merge xt ys)
Let's ignore the semantic differences between the two, though they are significant - f is a lot more restricted as far as when it's valid to call. Instead, let's look at only the strictness properties.
Pattern matches in multiple equations are checked top-down. Multiple pattern matches within a single equation are checked left-to-right. So the first thing merge does is force the constructor of its first argument, in order to determine if the first equation matches. If the first equation doesn't match, it forces the constructor of the second argument, in order to determine if the second equation matches. Only if neither equation matches does it move to the third case. The compiler is smart enough to know it's already forced both arguments at this point, so it doesn't do it again - but those pattern matches would require the arguments to be forced if it hadn't already been.
But the important thing here is that the process of figuring out which equation matches causes both arguments to be forced before any constructor is produced.
Now, contrast that with f. In the definition of f, the only pattern-matching is on the first argument. As such, f is somewhat less strict than merge. It produces a constructor before examining its second argument.
And it turns out that if you closely examine the behavior of foldr, it works on infinite lists precisely when the function passed to it doesn't (always) examine its second argument before producing a constructor.
The parenthetical "always" there is interesting. One of my favorite examples of using foldr and laziness together is:
dropRWhile :: (a -> Bool) -> [a] -> [a]
dropRWhile p = foldr (\x xs -> if p x && null xs then [] else x:xs) []
This is a maximally-lazy function that works like dropWhile, except from the back (right) of the list. If the current element doesn't match the predicate, it's returned immediately. If it does match the predicate, it looks ahead until it finds something that either doesn't match, or the end of the list. This will be productive on infinite lists, so long as it eventually finds an element that doesn't match the predicate. And that is the source of the "always" parenthetical up above - a function that usually doesn't examine its second argument before producing a constructor still allows foldr to usually work on infinite lists.
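A small usage sketch of dropRWhile on a finite and an infinite input, using isSpace from Data.Char as the predicate. The names trimmed and prefix are made up for the illustration.

```haskell
import Data.Char (isSpace)

dropRWhile :: (a -> Bool) -> [a] -> [a]
dropRWhile p = foldr (\x xs -> if p x && null xs then [] else x : xs) []

-- Finite input: trailing spaces are removed.
trimmed :: String
trimmed = dropRWhile isSpace "hi  "                    -- "hi"

-- Infinite input: still productive, because a non-space always follows.
prefix :: String
prefix = take 7 (dropRWhile isSpace (cycle "ab "))     -- "ab ab a"
```

On an input such as repeat ' ' the look-ahead never finds a non-matching element, so nothing would ever be produced.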
To determine the first element of its output, merge needs to evaluate both arguments enough to determine if they are empty lists or not. Without that information it can't be determined which case of the function definition applies.
In combination with foldr1 it becomes a problem that merge tries to evaluate its second argument. nonprimes is an expression of this form:
foldr1 merge [a,b,c,...]
To evaluate this, first foldr1 is expanded:
merge a (foldr1 merge [b,c,...])
To now evaluate merge, the cases of its function definition are checked. First a is evaluated, and it turns out to not be an empty list. So the first case of merge doesn't apply. Next, the second parameter of merge needs to be evaluated to see if it is an empty list and if the second case of the definition of merge applies. This second parameter is foldr1 merge [b,c,...].
But to evaluate this we are in the same situation as before with foldr1 merge [a,b,c,...], and in just the same way we end up with merge b (foldr1 merge [c,...]), where merge again needs to evaluate its second parameter to check if it's an empty list.
And so on. Each evaluation of merge requires another evaluation of merge first, which ends up in infinite recursion.
With f that problem is avoided, since it doesn't need to look at its second parameter for the top level evaluation. foldr1 f [a,b,c...] is f a (foldr1 f [b,c,...]) which evaluates to a non-empty list a0 : merge a' (foldr1 f [b,c,...]). So foldr1 f ... never is an empty list. This can be determined without any infinite recursion.
Now also the evaluation of merge a' (foldr1 f [b,c,...]) isn't a problem, since the second parameter evaluates to some b0 : ..., which is all merge needs to know to start producing a result.
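The difference can be tried out with a sketch based on the test expression from the question. The names multiples and lazyOk are made up, and the extra clause for an empty first argument of f is an addition that is not needed for these inputs.

```haskell
merge :: Ord a => [a] -> [a] -> [a]
merge [] ys = ys
merge xs [] = xs
merge xs@(x:xt) ys@(y:yt)
  | x < y     = x : merge xt ys
  | x == y    = x : merge xt yt
  | otherwise = y : merge xs yt

-- Produces its first element without looking at the second argument.
f :: Ord a => [a] -> [a] -> [a]
f (x:xt) ys = x : merge xt ys
f []     ys = ys   -- not needed for the lists below; added to keep f total

-- Infinitely many infinite ascending lists, as in the question's test.
multiples :: [[Int]]
multiples = [[i * n | n <- [3, 7 ..]] | i <- [5, 9 ..]]

lazyOk :: [Int]
lazyOk = take 5 (foldr1 f multiples)   -- [15,27,35,39,51]

-- By contrast, take 5 (foldr1 merge multiples) never produces an element.
```

Using f this way relies on the head of each list being smaller than everything in the lists after it, which holds here because both the heads and the lists themselves are ascending.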

Haskell: Double every 2nd element in list

I just started using Haskell and wanted to write a function that, given a list, returns a list in which every 2nd element has been doubled.
So far I've come up with this:
double_2nd :: [Int] -> [Int]
double_2nd [] = []
double_2nd (x:xs) = x : (2 * head xs) : double_2nd (tail xs)
Which works but I was wondering how you guys would write that function. Is there a more common/better way or does this look about right?
That's not bad, modulo the fixes suggested. Once you get more familiar with the base library you'll likely avoid explicit recursion in favor of some higher level functions, for example, you could create a list of functions where every other one is *2 and apply (zip) that list of functions to your list of numbers:
double = zipWith ($) (cycle [id,(*2)])
You can avoid "empty list" exceptions with some smart pattern matching.
double2nd (x:y:xs) = x : 2 * y : double2nd xs
double2nd a = a
this is simply syntax sugar for the following
double2nd xss = case xss of
  x:y:xs -> x : 2 * y : double2nd xs
  a -> a
The pattern matching is done in order, so xss will be matched against the pattern x:y:xs first. Then if that fails, the catch-all pattern a will succeed.
A little bit of necromancy, but I think that this method worked out very well for me and want to share:
double2nd n = zipWith (*) n (cycle [1,2])
zipWith takes a function and then applies that function across matching items in two lists (first item to first item, second item to second item, etc). The function is multiplication, and the zipped list is an endless cycle of 1s and 2s. zipWith (and all the zip variants) stops at the end of the shorter list.
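A small sketch putting the pattern-matching and zipWith approaches side by side, so they can be compared on an odd-length input. The name double2ndZip is made up to keep the two apart.

```haskell
-- Pattern-matching version: falls through to the catch-all for [] and [x].
double2nd :: [Int] -> [Int]
double2nd (x : y : xs) = x : 2 * y : double2nd xs
double2nd rest         = rest

-- zipWith version: multiply by a repeating 1, 2, 1, 2, ... pattern.
double2ndZip :: [Int] -> [Int]
double2ndZip xs = zipWith (*) xs (cycle [1, 2])
```

Both return [1] for the input [1], where the head/tail version from the question raises an exception.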
Try it on an odd-length list:
Prelude> double_2nd [1]
[1,*** Exception: Prelude.head: empty list
And you can see the problem with your code. The 'head' and 'tail' are never a good idea.
For odd-lists or double_2nd [x] you can always add
double_2nd (x:xs) | length xs == 0 = [x]
                  | otherwise = x : (2 * head xs) : double_2nd (tail xs)
Thanks.
Here's a foldr-based solution.
bar :: Num a => [a] -> [a]
bar xs = foldr (\ x r f g -> f x (r g f))
               (\ _ _ -> [])
               xs
               (:)
               ((:) . (*2))
Testing:
> bar [1..9]
[1,4,3,8,5,12,7,16,9]

Resources