Use of folding in defining functions - Haskell

I was introduced to the use of folds in defining functions. I have an idea of how they work, but I'm not sure why one should do it. To me, it feels like just simplifying the names of the data type and data values... It would be great if you could show me examples where it is significant to use a fold.
data List a = Empty | (:-:) a (List a)
-- The kind of the type and the types of its constructors:
-- List a :: *
-- Empty :: List a
-- (:-:) :: a -> List a -> List a
foldrList :: (a -> b -> b) -> b -> List a -> b
foldrList f e Empty = e
foldrList f e (x :-: xs) = f x (foldrList f e xs)
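To see it in use, here are two tiny functions defined with foldrList (the names sumList and lengthList are my own, for illustration):

sumList :: Num a => List a -> a
sumList = foldrList (+) 0

lengthList :: List a -> Int
lengthList = foldrList (\_ n -> n + 1) 0

-- e.g. sumList (1 :-: (2 :-: (3 :-: Empty))) == 6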

The idea of folding is a powerful one. The fold functions (foldr and foldl in the Haskell base library) belong to the family of higher-order functions (for those who don't know: functions which take other functions as parameters or return functions as their output).
This allows for greater code clarity, as the intention of the program is expressed more directly. A function written using folds strongly indicates an intention to iterate over the list and apply a function repeatedly to obtain an output. The standard recursive method is fine for simple programs, but as complexity increases it can become difficult to see quickly what is happening.
Greater code re-use can be achieved with folding, because the combining function is passed in as a parameter. If a program has some behaviour that is controlled by a Boolean or enumeration value, that behaviour can be abstracted into a separate function and passed as an argument to the fold. This achieves greater flexibility and simplicity (two simpler functions instead of one more complex one).
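As a sketch of that reuse point (the names here are my own, not from the question): instead of one function that branches on a flag inside the loop, the varying behaviour becomes the function argument of the fold:

-- One function switching on a Bool...
combineAll :: Bool -> [Int] -> Int
combineAll useSum = foldr op base
  where (op, base) = if useSum then ((+), 0) else ((*), 1)

-- ...versus two simpler folds sharing the same skeleton:
sumAll, productAll :: [Int] -> Int
sumAll     = foldr (+) 0
productAll = foldr (*) 1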
Higher-Order Functions are also essential for Monads.
Credit to the comments on this question as well, for being varied and informative.

Higher-order functions like foldr, foldl, map, zipWith, &c. capture common patterns of recursion so you can avoid writing manually recursive definitions. This makes your code higher-level and more readable: instead of having to step through the code and infer what a recursive function is doing, the programmer can reason about compositions of higher-level components.
For a somewhat extreme example, consider a manually recursive calculation of standard deviation:
standardDeviation numbers = step1 numbers
  where
    -- Calculate length and sum to obtain mean
    step1 = loop 0 0
      where
        loop count sum (x : xs) = loop (count + 1) (sum + x) xs
        loop count sum [] = step2 sum count numbers
    -- Calculate squared differences with mean
    step2 sum count = loop []
      where
        loop diffs (x : xs) = loop ((x - (sum / count)) ^ 2 : diffs) xs
        loop diffs [] = step3 count diffs
    -- Calculate final total and return square root
    step3 count = loop 0
      where
        loop total (x : xs) = loop (total + x) xs
        loop total [] = sqrt (total / count)
(To be fair, I went a little overboard by also inlining the summation, but this is roughly how it may typically be done in an imperative language—manually looping.)
Now consider a version using a composition of calls to standard functions, some of which are higher-order:
standardDeviation numbers        -- The standard deviation
  = sqrt                         -- is the square root
  . mean                         -- of the mean
  . map (^ 2)                    -- of the squares
  . map (subtract                -- of the differences
         (mean numbers))         --   with the mean
  $ numbers                      -- of the input numbers
  where                          -- where
    mean xs                      -- the mean
      = sum xs                   -- is the sum
      / fromIntegral (length xs) -- over the length.
This more declarative code is also, I hope, much more readable—and without the heavy commenting, could be written neatly in two lines. It’s also much more obviously correct than the low-level recursive version.
Furthermore, sum, map, and length can all be implemented in terms of folds, as well as many other standard functions like product, and, or, concat, and so on. Folding is an extremely common operation on not only lists, but all kinds of containers (see the Foldable typeclass), because it captures the pattern of computing something incrementally from all elements of a container.
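As a quick sketch of that claim, here are fold-based definitions of several of those functions (primed names so they don't clash with the Prelude):

sum'     = foldr (+) 0
product' = foldr (*) 1
length'  = foldr (\_ n -> 1 + n) 0
and'     = foldr (&&) True
concat'  = foldr (++) []
map' f   = foldr (\x acc -> f x : acc) []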
A final reason to use folds instead of manual recursion is performance: thanks to laziness and the optimisations GHC knows how to perform on fold-based functions, the compiler may fuse a series of folds (maps, &c.) into a single loop, traversing the list only once.

Haskell - why would I use infinite data structures?

In Haskell, it is possible to define infinite lists like so:
[1..]
I found many articles which describe how to implement infinite lists, and I understood how this works.
However, I can't think of any reason to use the concept of infinite data structures.
Can someone give me an example of a problem which can be solved more easily (or maybe only) with an infinite list in Haskell?
The basic advantage of lists in Haskell is that they’re a control structure that looks like a data structure. You can write code that operates incrementally on streams of data, but it looks like simple operations on lists. This is in contrast to other languages that require the use of an explicitly incremental structure, like iterators (Python’s itertools), coroutines (C# IEnumerable), or ranges (D).
For example, a sort function can be written in such a way that it sorts as few elements as possible before it starts to produce results. While sorting the entire list takes O(n log n) / linearithmic time in the length of the list, minimum xs = head (sort xs) only takes O(n) / linear time, because head will only examine the first constructor of the list, like x : _, and leave the tail as an unevaluated thunk that represents the remainder of the sorting operation.
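As a sketch of how such an incremental sort can look (a selection-style sort for illustration; the real Data.List.sort is a merge sort, but it is incremental in the same way):

import Data.List (delete)

-- The minimum is produced first; the rest of the sort stays an
-- unevaluated thunk until someone demands more of the list, so
-- 'head . lazySort' does only O(n) work.
lazySort :: Ord a => [a] -> [a]
lazySort [] = []
lazySort xs = m : lazySort (delete m xs)
  where m = minimum xs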
This means that performance is compositional: for example, if you have a long chain of operations on a stream of data, like sum . map (* 2) . filter (< 5), it looks like it would first filter all the elements, then map a function over them, then take the sum, producing a full intermediate list at every step. But what happens is that each element is only processed one at a time: given [1, 2, 6], this basically proceeds as follows, with all the steps happening incrementally:
total = 0
1 < 5 is true
1 * 2 == 2
total = 0 + 2 = 2
2 < 5 is true
2 * 2 == 4
total = 2 + 4 = 6
6 < 5 is false
result = 6
This is exactly how you would write a fast loop in an imperative language (pseudocode):
total = 0;
for x in xs {
  if (x < 5) {
    total = total + x * 2;
  }
}
This is what makes performance compositional: because of laziness, this code has constant memory usage during the processing of the list. And there is nothing special inside map or filter that makes this happen: they can be entirely independent.
For another example, the standard library function and computes the logical AND of a list, e.g. and [a, b, c] == a && b && c, and it's implemented simply as a fold: and = foldr (&&) True. The moment it reaches a False element in the input, it stops evaluation, simply because && is lazy in its right argument. Laziness gives you composition!
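For instance (a small illustrative check), this terminates even though the input list is infinite, because a False appears at the third element:

> and (map (< 3) [1 ..])
False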
For a great paper on all this, read the famous Why Functional Programming Matters by John Hughes, which goes over the advantages of lazy functional programming (in Miranda, a forebear of Haskell) far better than I could.
Annotating a list with its indices temporarily uses an infinite list of indices:
zip [0..] ['a','b','c','d'] = [(0,'a'), (1,'b'), (2,'c'), (3,'d')]
Memoizing functions while maintaining purity (in this case this transformation causes an exponential speed increase, because the memo table is used recursively):
fib = (memo !!)
  where
    memo = map fib' [0..] -- cache of *all* Fibonacci numbers (evaluated on demand)
    fib' 0 = 0
    fib' 1 = 1
    fib' n = fib (n-1) + fib (n-2)
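As a quick usage check (without the shared memo list, this doubly recursive definition would effectively never finish a call like this):

> fib 100
354224848179261915075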
Purely mocking programs with side-effects (free monads)
data IO a = Return a
          | GetChar (Char -> IO a)
          | PutChar Char (IO a)
Potentially non-terminating programs are represented with infinite IO structures; e.g. forever (putChar 'y') = PutChar 'y' (PutChar 'y' (PutChar 'y' ...))
Tries: if you define a type approximately like the following:
data Trie a = Trie a (Trie a) (Trie a)
it can represent an infinite collection of values of type a, indexed by the natural numbers. Note that there is no base case for the recursion, so every Trie is infinite. But the element at index n can be accessed in log(n) time. This means you can do something like this (using some functions from the inttrie library):
findIndices :: [Integer] -> Trie [Integer]
findIndices = foldr (\(i,x) -> modify x (i:)) (pure []) . zip [0..]
this builds an efficient "reverse lookup table" which, given any value in the list, can tell you at which indices it occurs, and it both caches results and streams information as soon as it is available:
-- N.B. findIndices [0, 0,1, 0,1,2, 0,1,2,3, 0,1,2,3,4...]
> table = findIndices (concat [ [0..n] | n <- [0..] ])
> table `apply` 0
[0,1,3,6,10,15,21,28,36,45,55,66,78,91,...
all from a one-line infinite fold.
I'm sure there are more examples; there are so many cool things you can do.

The length of a list without the "length" function in Haskell

I want to see how long a list is, but without using the function length. I wrote this program and it does not work. Maybe you can tell me why? Thanks!
let y = 0
main = do
list (x:xs) = list (xs)
y++
list :: [Integer] -> Integer
list [] = y
Your program looks quite "imperative": you define a variable y, and then somehow write a do, that calls (?) the list function (?) that automagically seems to "return y" and then you want to increment y.
That's not how Haskell (and most functional and declarative) languages work:
in a declarative language, you define a variable only once; after the value is set, there is usually no way to alter it,
in Haskell a do block is usually used for monads, whereas length is a pure function,
the let is a syntax construction to define a variable within the scope of an expression,
...
In order to program Haskell (or any functional language), you need to "think functional": think how you would solve the problem in a mathematical way using only functions.
In mathematics, you would say that the empty list [] clearly has length 0. Furthermore, in case the list is not empty, there is a first element (the "head") and remaining elements (the "tail"). In that case the result is one plus the length of the tail. We can capture that in a mathematical specification:
len([]) = 0
len(x:t) = 1 + len(t)
Now we can easily translate that specification into the following Haskell code:
ownLength :: [a] -> Int
ownLength [] = 0
ownLength (_:xs) = 1 + ownLength xs
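For example, ownLength [7, 8, 9] expands to 1 + (1 + (1 + 0)), which is 3.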
Now in Haskell, one usually also uses accumulators in order to perform tail recursion: you pass a parameter through the recursive calls and each time you update the variable. When you reach the end of your recursion, you return - sometimes after some post-processing - the accumulator.
In this case the accumulator would be the so far seen length, so you could write:
ownLength :: [a] -> Int
ownLength = ownLength' 0
  where ownLength' a [] = a
        ownLength' a (_:xs) = ownLength' (a+1) xs
It looks like you still think in an imperative way (not the functional way). For example:
you try to change the value of a "variable" (i.e. y++)
you try to use "global variable" (i.e. y) in the body of the list function
Here is the possible solution to your problem:
main = print $ my_length [1..10]
my_length :: [Integer] -> Integer
my_length [] = 0
my_length (_:xs) = 1 + my_length xs
You can also run this code here: http://ideone.com/mjUwL9.
Please also note that there is no need to require that your list consist of Integer values. In fact, you can create a much more "agnostic" version of your function by using the following declaration:
my_length :: [a] -> Integer
The implementation of this function doesn't rely on the type of the items in the list, so you can use it for a list of any type. In contrast, you couldn't be that liberal with, for example, a my_sum function (a potential function that calculates the sum of the elements of a given list). In that situation, you would have to require that the list consist of items of some numeric type.
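A sketch of such a my_sum, with the numeric constraint made explicit in the signature:

my_sum :: Num a => [a] -> a
my_sum [] = 0
my_sum (x:xs) = x + my_sum xs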
Finally, I'd like to suggest a fantastic book about Haskell programming: http://learnyouahaskell.com/chapters.
Other answers have already beautifully explained the proper functional approach. It may look like overkill, but here is another way of implementing the length function, using only available higher-order functions.
my_length :: [a] -> Integer
my_length = foldr (flip $ const . (+1)) 0
I found this solution in Learn You a Haskell.
length' xs = sum [1 | _ <- xs]
It replaces every element of the list with 1 and sums it up.
Probably the simplest way is to convert all elements to 1 and then to sum the new elements:
sum . map (const 1)
For added speed, use a strict left fold (foldl' comes from Data.List):
foldl' (+) 0 . map (const 1)

Forcing Strict Evaluation - What am I doing wrong?

I want an intermediate result computed before generating the new one to get the benefit of memoization.
import qualified Data.Map.Strict as M
import Data.List
parts' m = newmap
  where
    n = M.size m + 1
    lists = nub $ map sort $
      [n] : (concat $ map (\i -> map (i:) (M.findWithDefault [] (n-i) m)) [1..n])
    newmap = seq lists (M.insert n lists m)
But, then if I do
take 2000 (iterate parts' (M.fromList [(1,[[1]])]))
It still completes instantaneously.
(Can using an Array instead of a Map help?)
Short answer:
If you need to calculate the entire list/array/map/... at once, you can use deepseq as @JoshuaRahm suggests, or the ($!!) operator.
The answer below describes how you can enforce strictness, but only to the first level (it evaluates until it reaches a constructor whose fields may still contain unevaluated expression trees).
Furthermore the answer argues why laziness and memoization are not (necessarily) opposites of each other.
More advanced:
Haskell is a lazy language: it only calculates something if it is absolutely necessary. An expression like:
take 2000 (iterate parts' (M.fromList [(1,[[1]])]))
is not evaluated immediately: Haskell simply records that this has to be calculated later. If you later really need the first, the second, the i-th element, or the length of the list, it will evaluate it, and even then in a lazy fashion: if you need the first element, from the moment it has found the way to calculate that element, it will represent the list as:
element : take 1999 (<some-expression>)
You can however force Haskell to evaluate something strictly using the exclamation mark (the $! operator, or bang patterns); this is called strictness. For instance:
main = do
  return $! take 2000 (iterate parts' (M.fromList [(1,[[1]])]))
Or, for a function argument, you can use a bang pattern (this requires the BangPatterns extension):
f x !y !z = x+y+z
Here you force Haskell to evaluate y and z before "growing the expression tree" into
expression-for-x + expression-for-y + expression-for-z.
EDIT: if you use it in a let pattern, you can use the bang as well:
let !foo = take 2000 (iterate parts' (M.fromList [(1,[[1]])])) in ...
Note that this only collapses the structure to the first level: let !foo will more or less only evaluate the list up to its first constructor, (_:_).
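A small check of that caveat (assuming the BangPatterns extension is enabled in GHCi): forcing the list to its first constructor never touches the elements themselves:

> let !xs = [undefined, undefined] in length xs
2
> let !xs = [undefined] in head xs
*** Exception: Prelude.undefined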
Note: memoization and laziness are not necessarily opposites of each other. Consider the list:
numbers :: [Integer]
numbers = 0 : [i + sum (genericTake i numbers) | i <- [1..]] -- genericTake comes from Data.List
As you can see, calculating a number requires a growing amount of computational effort. Initially, numbers is represented as:
numbers ---> (0:[i+(sum (genericTake i numbers))|i<-[1..]])
If, however, I evaluate numbers!!1, it has to calculate the element at index 1, and returns 1; but the internal structure of numbers is updated in the process. Now it looks like:
numbers ---> (0:1:[i+(sum (genericTake i numbers))|i<-[2..]])
The computation numbers!!1 thus "helps" future computations, because you will never have to recalculate the second element of the list.
If you for instance calculate numbers!!4000, it will take a few seconds. Later if you calculate numbers!!4001, it will be calculated almost instantly. Simply because the work already done by numbers!!4000 is reused.
Arrays might be able to help, but you can also try taking advantage of the deepseq library. So you can write code like this:
-- deepseq comes from the Control.DeepSeq module
let x = take 2000 (iterate parts' (M.fromList [(1,[[1]])])) in do
  x `deepseq` print (x !! 5) -- takes a *really* long time
  print (x !! 1999)          -- finishes instantly
You are memoizing the partitions functions, but there are some drawbacks to your approach:
you are only memoizing up to a specific value which you have to specify beforehand
you need to call nub and sort
Here is an approach using Data.MemoCombinators:
import Data.MemoCombinators
parts = integral go
  where
    go k | k <= 0 = [] -- for safety
    go 1 = [[1]]
    go n = [[n]] ++ [ (a : p) | a <- [n-1,n-2..1], p <- parts (n-a), a >= head p ]
E.g.:
ghci> parts 4
[[4],[3,1],[2,2],[2,1,1],[1,1,1,1]]
This memoization is dynamic, so only the values you actually access will be memoized.
Note how it is constructed - parts = integral go, and go uses parts for any recursive calls. We use the integral combinator here because parts is a function of an Int.

Why doesn't `iterate` from the Prelude tie the knot?

Why isn't iterate defined like
iterate :: (a -> a) -> a -> [a]
iterate f x = xs where xs = x : map f xs
in the Prelude?
Tying the knot like that doesn't appear to increase sharing.
Contrast with:
cycle xs = let x = xs ++ x in x
Tying the knot here has the effect of creating a circular linked list in memory. x is its own tail. There's a real gain.
Your suggested implementation doesn't increase sharing over the naive implementation. And there's no way for it to do so in the first place - there's no shared structure in something like iterate (+1) 0 anyway.
There is no knot-tying going on in your version; it just keeps a pointer one notch back into the produced list, to find there the input value for the next iteration. This means that each list cell can't be GC'd until the next cell is produced.
By contrast, the Prelude's version uses iterate's call frame for that, and since that frame is needed only once, a good compiler can reuse it and mutate the value in it, for more optimized operation overall (and the list's cells are then independent of each other).
The Prelude definition, which I include below for clarity, has no overhead needed to call map.
iterate f x = x : iterate f (f x)
Just for fun, I wrote a quick little program to test your iterate against the Prelude's: reducing take 100000000 $ iterate (+1) 0 (a list of Ints) to normal form. I only ran 5 tests, but your version ran for 7.833 s (max 7.873, min 7.667) while the Prelude's took 7.519 s (max 7.591, min 7.477). I suspect the time difference is the overhead of calling map.
A second reason is simply readability.

Learning Haskell maps, folds, loops and recursion

I've only just dipped my toe into the world of Haskell as part of my journey of programming enlightenment (moving from procedural to OOP to concurrent and now to functional).
I've been trying an online Haskell Evaluator.
However I'm now stuck on a problem:
Create a simple function that gives the total sum of an array of numbers.
In a procedural language this is easy enough for me (using recursion) (C#):
private int sum(ArrayList x, int i)
{
    if (!(x.Count < i + 1)) {
        int t = (int)x[i];       // ArrayList stores object, so a cast is needed
        return sum(x, i + 1) + t;
    }
    return 0;                    // base case: we've run past the end of the list
}
All very fine; however, my failed attempt at Haskell was thus:
let sum x = x+sum in map sum [1..10]
this resulted in the following error (from the above-mentioned website):
Occurs check: cannot construct the infinite type: a = a -> t
Please bear in mind I've only used Haskell for the last 30 minutes!
I'm not looking simply for an answer but more for an explanation of it.

"I'm not looking simply for an answer but more for an explanation of it."
On the left-hand side of the = you use sum as a function applied to x. The compiler doesn't know the type of x, so it uses the type variable a to stand for "the type of x." At this point the compiler doesn't know the result type of sum either, so it picks another type variable, t, to stand for the result type. Now, from the left-hand side, the compiler thinks that the type of sum is a -> t (a function accepting a and returning t).
On the right-hand side of the = you add x and sum. In Haskell all kinds of numbers can be added, but you can add two numbers only if they have the same type. So here the compiler assumes that sum has the same type as x, namely type a.
But in Haskell an identifier has exactly one type (maybe a whangdilly complicated type, but one type nevertheless). This includes sum, whose type must be the same on both sides of the = sign, so the compiler tries to solve the equation
a = a -> t
There are no values for a and t that solve this equation. It simply can't be done. There is no a such that a is equal to a function that accepts itself as an argument. Thus ariseth the error message
cannot construct the infinite type: a = a -> t
Even with all the explanation, it's not such a great error message, is it?
Welcome to Haskell :-)
P.S. You might enjoy trying "Helium, for learning Haskell", which gives much nicer error messages for beginners.
'sum' takes a list of values and reduces it to a single value. You can either write it as an explicit loop (remember that Haskell has no loop keywords, but uses recursion). Note how the definition has two parts, based on the shape of the list:
mysum [] = 0
mysum (x:xs) = x + mysum xs
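For example, mysum [1, 2, 3] unfolds to 1 + (2 + (3 + 0)), which is 6.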
Or more efficiently, in a tail-recursive style:
mysum xs = go 0 xs
  where
    go n [] = n
    go n (x:xs) = go (n+x) xs
However, Haskell has a rich library of control structures that operate on lazy lists. In this case, reduction of a list to a single value can be done with a reduce function: a fold.
So mysum can be written as:
mysum xs = foldr (+) 0 xs
For example:
Prelude> foldr (+) 0 [1..10]
55
Your mistake was to use a map, which transforms a list, one element at a time, rather than a fold.
I'd suggest you start with an introduction to Haskell, perhaps "Programming in Haskell", to get a feel for the core concepts of functional programming. Other good introductory materials are described in this question.
You need to read a good tutorial, there are a number of big misunderstandings.
First I'm going to assume you mean lists and not arrays. Arrays exist in Haskell, but they aren't something you'd encounter at the beginner level. (Not to mention you're using [1..10] which is a list of the numbers 1 to 10).
The function you want is actually built in, and called sum, so we'll have to call ours something else, new_sum:
new_sum [] = 0
new_sum (h:t) = h + new_sum t
Let's look at the first part of this:
let sum x = x+sum
What would the type of sum be in this case? It takes a number and returns a function that takes a number, which returns a function that takes a number, and so on. If you had written
let sum x = (+ x)
you would have a function that takes a number and returns the section (+ x),
and
let sum = (+)
would be a function that takes two numbers and adds them.
So now let's look at the second part:
in map sum [1..10]
map takes a function of one argument and applies it to every element of the list. There is no room to wedge an accumulator in there, so let's look at other list functions, in particular foldl and foldr. Both of these take a function of two arguments, a beginning value, and a list. The difference between foldl and foldr is the side on which they start grouping: l being left, ((0 + 1) + 2) + 3 and so on, and r being right, 8 + (9 + (10 + 0)) and so on.
let sum = (+) in foldl sum 0 [1..10]
