Haskell - why would I use infinite data structures? - haskell

In Haskell, it is possible to define infinite lists like so:
[1.. ]
I found many articles that describe how to implement infinite lists, and I understand how this works.
However, I can't think of any reason to use infinite data structures.
Can someone give me an example of a problem, which can be solved easier (or maybe only) with an infinite list in Haskell?

The basic advantage of lists in Haskell is that they’re a control structure that looks like a data structure. You can write code that operates incrementally on streams of data, but it looks like simple operations on lists. This is in contrast to other languages that require the use of an explicitly incremental structure, like iterators (Python’s itertools), coroutines (C# IEnumerable), or ranges (D).
For example, a sort function can be written in such a way that it sorts as few elements as possible before it starts to produce results. While sorting the entire list takes O(n log n) / linearithmic time in the length of the list, minimum xs = head (sort xs) only takes O(n) / linear time, because head will only examine the first constructor of the list, like x : _, and leave the tail as an unevaluated thunk that represents the remainder of the sorting operation.
This means that performance is compositional: for example, if you have a long chain of operations on a stream of data, like sum . map (* 2) . filter (< 5), it looks like it would first filter all the elements, then map a function over them, then take the sum, producing a full intermediate list at every step. But what happens is that each element is only processed one at a time: given [1, 2, 6], this basically proceeds as follows, with all the steps happening incrementally:
total = 0
1 < 5 is true
1 * 2 == 2
total = 0 + 2 = 2
2 < 5 is true
2 * 2 == 4
total = 2 + 4 = 6
6 < 5 is false
result = 6
This is exactly how you would write a fast loop in an imperative language (pseudocode):
total = 0;
for x in xs {
  if (x < 5) {
    total = total + x * 2;
  }
}
In other words, because of laziness, this code has constant memory usage during the processing of the list. And there is nothing special inside map or filter that makes this happen: they can be entirely independent.
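As a minimal sketch of the pipeline discussed above (the names doubleSmall and doubleSmallPrefix are made up for illustration), the same stages can be written as an ordinary composition. The second definition is an assumption-laden variation: it swaps filter for takeWhile so that the composition also terminates on an infinite input.

```haskell
-- Each stage is an ordinary list function; laziness interleaves them,
-- so no stage has to finish before the next one starts.
doubleSmall :: [Integer] -> Integer
doubleSmall = sum . map (* 2) . filter (< 5)

-- The same style works on an infinite input, provided some stage
-- (here takeWhile) bounds how much of the input is ever demanded.
doubleSmallPrefix :: Integer
doubleSmallPrefix = sum . map (* 2) . takeWhile (< 5) $ [1 ..]
```

Here doubleSmall [1, 2, 6] evaluates to 6, matching the trace above, and doubleSmallPrefix evaluates to 20.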
For another example, the standard library function and computes the logical AND of a list, e.g. and [a, b, c] == a && b && c, and it’s implemented simply as a fold: and = foldr (&&) True. The moment it reaches a False element in the input, it stops evaluating, simply because && is lazy in its right argument. Laziness gives you composition!
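The short-circuiting can be observed directly on an infinite list. This is a small sketch using the fold quoted above under a hypothetical name (allTrue), so that it does not clash with the Prelude:

```haskell
-- The fold quoted above, under a hypothetical name.
allTrue :: [Bool] -> Bool
allTrue = foldr (&&) True

-- An infinite list that alternates True and False.
alternating :: [Bool]
alternating = cycle [True, False]

-- Terminates even though the input is infinite: (&&) never looks at
-- its right argument once the left one is False.
stopsEarly :: Bool
stopsEarly = allTrue alternating
```

Evaluating stopsEarly gives False after inspecting only two elements.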
For a great paper on all this, read the famous Why Functional Programming Matters by John Hughes, which goes over the advantages of lazy functional programming (in Miranda, a forebear of Haskell) far better than I could.

Annotating a list with its indices temporarily uses an infinite list of indices:
zip [0..] ['a','b','c','d'] = [(0,'a'), (1,'b'), (2,'c'), (3,'d')]
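The indexing idiom above can be wrapped in a small helper. This is only a sketch; indexed and positionsOf are illustrative names, not library functions:

```haskell
-- Pair every element with its position. zip stops at the shorter list,
-- so the infinite index list is only consumed as far as needed.
indexed :: [a] -> [(Int, a)]
indexed = zip [0 ..]

-- Example use: all positions at which a character occurs.
positionsOf :: Char -> String -> [Int]
positionsOf c s = [i | (i, x) <- indexed s, x == c]
```

For instance, positionsOf 'a' "banana" gives [1,3,5].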
Memoizing functions while maintaining purity (in this case this transformation causes an exponential speed increase, because the memo table is used recursively):
fib = (memo !!)
  where
    memo = map fib' [0..] -- cache of *all* fibonacci numbers (evaluated on demand)
    fib' 0 = 0
    fib' 1 = 1
    fib' n = fib (n-1) + fib (n-2)
Purely mocking programs with side-effects (free monads):
data IO a = Return a
          | GetChar (Char -> IO a)
          | PutChar Char (IO a)
Potentially non-terminating programs are represented with infinite IO structures; e.g. forever (putChar 'y') = PutChar 'y' (PutChar 'y' (PutChar 'y' ...))
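To make the mocking idea concrete, here is a rough sketch of a pure interpreter for such a structure. The type is renamed to MockIO (a hypothetical name) to avoid clashing with the Prelude's IO, and the choice to stop when the input runs out is an assumption of this sketch:

```haskell
-- Hypothetical name, to avoid clashing with the Prelude's IO.
data MockIO a
  = Return a
  | GetChar (Char -> MockIO a)
  | PutChar Char (MockIO a)

-- An infinite program: print 'y' forever.
yes :: MockIO a
yes = PutChar 'y' yes

-- Echo input characters until a newline is read.
echoLine :: MockIO ()
echoLine = GetChar $ \c ->
  if c == '\n' then Return () else PutChar c echoLine

-- Pure interpreter: the String argument plays the role of stdin and the
-- result is stdout, produced lazily. Running out of input ends the run.
runPure :: MockIO a -> String -> String
runPure (Return _) _ = []
runPure (PutChar c k) input = c : runPure k input
runPure (GetChar _) [] = []
runPure (GetChar k) (c : cs) = runPure (k c) cs
```

Because the output is built lazily, take 5 (runPure yes "") returns "yyyyy" even though the program never terminates, and runPure echoLine "hi\nrest" returns "hi".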
Tries: if you define a type approximately like the following:
data Trie a = Trie a (Trie a) (Trie a)
it can represent an infinite collection of values of type a, indexed by the naturals. Note that there is no base case for the recursion, so every Trie is infinite. But the element at index n can be accessed in log(n) time. This means you can do something like this (using some functions in the inttrie library):
findIndices :: [Integer] -> Trie [Integer]
findIndices = foldr (\(i,x) -> modify x (i:)) (pure []) . zip [0..]
This builds an efficient "reverse lookup table" which, given any value in the list, can tell you at which indices it occurs. It both caches results and streams information as soon as it's available:
-- N.B. findIndices [0, 0,1, 0,1,2, 0,1,2,3, 0,1,2,3,4...]
> table = findIndices (concat [ [0..n] | n <- [0..] ])
> table `apply` 0
[0,1,3,6,10,15,21,28,36,45,55,66,78,91,...
all from a one-line infinite fold.
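For readers who want to see the infinite Trie type in action without the library, here is a self-contained sketch. It is not the inttrie implementation; tabulate, index and the indexing scheme (root is 0, odd indices go left, even ones go right) are assumptions chosen for illustration:

```haskell
-- Every Trie is infinite: a value plus two infinite subtries.
data Trie a = Trie a (Trie a) (Trie a)

-- Build the trie that holds f n at index n.
-- Index 0 is the root; odd indices live in the left subtrie,
-- positive even indices in the right one.
tabulate :: (Integer -> a) -> Trie a
tabulate f =
  Trie
    (f 0)
    (tabulate (\n -> f (2 * n + 1)))
    (tabulate (\n -> f (2 * n + 2)))

-- Look up index n (assumed non-negative); the index roughly halves at
-- every step, so the walk takes a logarithmic number of steps.
index :: Trie a -> Integer -> a
index (Trie x l r) n
  | n == 0 = x
  | odd n = index l (n `div` 2)
  | otherwise = index r (n `div` 2 - 1)

-- Memoised Fibonacci: recursive calls go through the shared trie,
-- so each value is computed at most once.
fibTrie :: Trie Integer
fibTrie = tabulate fib'
  where
    fib' 0 = 0
    fib' 1 = 1
    fib' n = index fibTrie (n - 1) + index fibTrie (n - 2)

fib :: Integer -> Integer
fib = index fibTrie
```

With this definition fib 50 returns 12586269025 without the exponential blow-up of the naive recursion.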
I'm sure there are more examples; there are so many cool things you can do.

Related

Lazy Catalan Numbers in Haskell

How might I go about efficiently generating an infinite list of Catalan numbers? What I have now works reasonably quickly, but it seems to me that there should be a better way.
c 1 = [1]
c n = sum (zipWith (*) xs (reverse xs)) : xs
  where xs = c (n-1)
catalan = map (head . c) [1..]
I made an attempt at using fix instead, but the lambda isn't lazy enough for the computation to terminate:
catalan = fix (\xs -> xs ++ [zipWith (*) xs (reverse xs)])
I realize (++) isn't ideal.
Does such a better way exist? Can that function be made sufficiently lazy? There's an explicit formula for the nth, I know, but I'd rather avoid it.
The Catalan numbers can be defined inductively with:
C_0 = 1 and C_{n+1} = (4n + 2) × C_n / (n + 2).
So we can implement this as:
catalan :: Integral i => [i]
catalan = xs
  where xs = 1 : zipWith f [0..] xs
        f n cn = div ((4*n+2) * cn) (n+2)
For example:
Prelude> take 10 catalan
[1,1,2,5,14,42,132,429,1430,4862]
I'm guessing you're looking for a lazy, infinite, self-referential list of all the Catalan numbers using one of the basic recurrence relations. That's a common thing to do with the Fibonacci numbers after all. But it would help to specify the recurrence relation you mean, if you want answers to your specific question. I'm guessing this is the one you mean:
cat :: Integer -> Integer
cat 1 = 1
cat n = sum [ cat i * cat (n - i) | i <- [1 .. n - 1] ]
If so, the conversion to a self-referential form looks like this:
import Data.List (inits)
cats :: [Integer]
cats = 1 : [ sum (zipWith (*) pre (reverse pre)) | pre <- tail (inits cats) ]
This is quite a lot more complex than the Fibonacci examples, because the recurrence refers to all previous entries in the list, not just a fixed small number of the most recent. Using inits from Data.List is the easiest way to get the prefix at each position. I used tail there because its first result is the empty list, and that's not helpful here. The rest is a straightforward rewrite of the recurrence relation that I don't have much to say about. Except...
It's going to perform pretty badly. I mean, it's better than the exponential recursive calls of my cat function, but there's a lot of list manipulation going on that's allocating and then throwing away a lot of memory cells. That recurrence relation is not a very good fit for the recursive structure of the list data type. You can explore a lot of ways to make it more efficient, but they'll all be pretty bad in the end. For this particular case, going to a closed-form solution is the way to go if you want performance.
Apparently, what you wanted is
> cats = 1 : unfoldr (\ fx -> let x = sum $ zipWith (*) fx cats in Just (x, x:fx)) [1]
> take 10 cats
[1,1,2,5,14,42,132,429,1430,4862]
This avoids the repeated reversing of the prefixes (as in the linked answer), by unfolding with the state being a reversed prefix while consing onto the state as well as producing the next element.
The non-reversed prefix we don't have to maintain, as zipping the reversed prefix with the catalans list itself takes care of the catalans prefix's length.
You did say you wanted to avoid the direct formula.
The Catalan numbers are best understood by their generating function, which satisfies the relation
f(t) = 1 + t f(t)^2
This can be expressed in Haskell as
f :: [Int]
f = 1 : convolve f f
for a suitable definition of convolve. It is helpful to factor out convolve, for many other counting problems take this form. For example, a generalized Catalan number enumerates ternary trees, and its generating function satisfies the relation
g(t) = 1 + t g(t)^3
which can be expressed in Haskell as
g :: [Int]
g = 1 : convolve g (convolve g g)
convolve can be written using Haskell primitives as
convolve :: [Int] -> [Int] -> [Int]
convolve xs = map (sum . zipWith (*) xs) . tail . scanl (flip (:)) []
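Putting the pieces above together gives a small runnable module. This sketch changes two things on purpose: it uses Integer instead of Int so that later terms do not overflow, and it renames f and g to catalans and ternary.

```haskell
-- convolve as defined above, with Int widened to Integer.
-- Output element n is the sum over i of xs !! i * ys !! (n - i).
convolve :: [Integer] -> [Integer] -> [Integer]
convolve xs = map (sum . zipWith (*) xs) . tail . scanl (flip (:)) []

-- f(t) = 1 + t f(t)^2
catalans :: [Integer]
catalans = 1 : convolve catalans catalans

-- g(t) = 1 + t g(t)^3
ternary :: [Integer]
ternary = 1 : convolve ternary (convolve ternary ternary)
```

Here take 10 catalans gives [1,1,2,5,14,42,132,429,1430,4862], and take 5 ternary gives [1,1,3,12,55].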
For these two examples and many other special cases, there are formulas that are quicker to evaluate. convolve is however more general, and cognitively more efficient. In a typical scenario, one has understood a counting problem in terms of a polynomial relation on its generating function, and now wants to compute some numbers in order to look them up in The On-Line Encyclopedia of Integer Sequences. One wants to get in and out, indifferent to complexity. What language will be least fuss?
If one has seen the iconic Haskell definition for the Fibonacci numbers
fibs :: [Int]
fibs = 0 : 1 : zipWith (+) fibs (tail fibs)
then one imagines there must be a similar idiom for products of generating functions. That search is what brought me here.

Use of folding in defining functions

I was introduced to the use of fold in defining functions. I have an idea of how that works, but I'm not sure why one should do it. To me, it feels like just simplifying the names of the data type and data values ... It would be great if you could show me examples where using fold makes a significant difference.
data List a = Empty | (:-:) a (List a)
-- Kind of the type and types of its constructors
-- List a :: *
-- Empty :: List a
-- (:-:) :: a -> List a -> List a
foldrList :: (a -> b -> b) -> b -> List a -> b
foldrList f e Empty = e
foldrList f e (x:-:xs) = f x (foldrList f e xs)
The idea of folding is a powerful one. The fold functions (foldr and foldl in the Haskell base library) come from a family of functions called Higher-Order Functions (for those who don't know - these are functions which take functions as parameters or return functions as their output).
This allows for greater code clarity as the intention of the program is more clearly expressed. A function written using fold functions strongly indicates that there is an intention to iterate over the list and apply a function repeatedly to obtain an output. Using the standard recursive method is fine for simple programs but when complexity increases it can become difficult to understand quickly what is happening.
Greater code re-use can be achieved with folding due to the nature of passing in a function as the parameter. If a program has some behaviour that is affected by the passing of a Boolean or enumeration value then this behaviour can be abstracted away into a separate function. The separate function can then be used as an argument to fold. This achieves greater flexibility and simplicity (as there are 2 simpler functions versus 1 more complex function).
Higher-Order Functions are also essential for Monads.
Credit to the comments for this question as well for being varied and informative.
Higher-order functions like foldr, foldl, map, zipWith, &c. capture common patterns of recursion so you can avoid writing manually recursive definitions. This makes your code higher-level and more readable: instead of having to step through the code and infer what a recursive function is doing, the programmer can reason about compositions of higher-level components.
For a somewhat extreme example, consider a manually recursive calculation of standard deviation:
standardDeviation numbers = step1 numbers
  where
    -- Calculate length and sum to obtain mean
    step1 = loop 0 0
      where
        loop count sum (x : xs) = loop (count + 1) (sum + x) xs
        loop count sum [] = step2 sum count numbers
    -- Calculate squared differences with mean
    step2 sum count = loop []
      where
        loop diffs (x : xs) = loop ((x - (sum / count)) ^ 2 : diffs) xs
        loop diffs [] = step3 count diffs
    -- Calculate final total and return square root
    step3 count = loop 0
      where
        loop total (x : xs) = loop (total + x) xs
        loop total [] = sqrt (total / count)
(To be fair, I went a little overboard by also inlining the summation, but this is roughly how it may typically be done in an imperative language—manually looping.)
Now consider a version using a composition of calls to standard functions, some of which are higher-order:
standardDeviation numbers          -- The standard deviation
  = sqrt                           -- is the square root
  . mean                           -- of the mean
  . map (^ 2)                      -- of the squares
  . map (subtract                  -- of the differences
      (mean numbers))              -- with the mean
  $ numbers                        -- of the input numbers
  where                            -- where
    mean xs                        -- the mean
      = sum xs                     -- is the sum
      / fromIntegral (length xs)   -- over the length.
This more declarative code is also, I hope, much more readable—and without the heavy commenting, could be written neatly in two lines. It’s also much more obviously correct than the low-level recursive version.
Furthermore, sum, map, and length can all be implemented in terms of folds, as well as many other standard functions like product, and, or, concat, and so on. Folding is an extremely common operation on not only lists, but all kinds of containers (see the Foldable typeclass), because it captures the pattern of computing something incrementally from all elements of a container.
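As an illustration of that claim, here is one possible way to write a few of these functions with foldr. The F-suffixed names are made up to avoid clashing with the Prelude, and these are sketches rather than the definitions used in base:

```haskell
-- Sum: combine each element with the running result using (+).
sumF :: Num a => [a] -> a
sumF = foldr (+) 0

-- Length: ignore the element and count one per cons cell.
lengthF :: [a] -> Int
lengthF = foldr (\_ n -> n + 1) 0

-- Map: rebuild the list, applying f to each element on the way.
mapF :: (a -> b) -> [a] -> [b]
mapF f = foldr (\x acc -> f x : acc) []

-- Concat: append every inner list to the result for the rest.
concatF :: [[a]] -> [a]
concatF = foldr (++) []
```

In each case only the combining function and the starting value change; the recursion itself is written once, inside foldr.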
A final reason to use folds instead of manual recursion is performance: thanks to laziness and optimisations that GHC knows how to perform when you use fold-based functions, the compiler may fuse a series of folds (maps, &c.) together into a single loop at runtime.

how to do this in haskell ? [x^0,x^1,x^2,x^3 ...]

I want to have a list like this one:
[x^0, x^1, x^2, x^3 ...]
Is it possible to have such a list?
For example, with x = 2:
[1, 2, 4, 8, 16, 32 ...]
You can use iterate or unfoldr to double a number many times. This could be more efficient than computing x^n for each n.
Below, I use x=2, but you can use any x.
> take 10 $ iterate (*2) 1
[1,2,4,8,16,32,64,128,256,512]
> take 10 $ unfoldr (\x -> Just (x,2*x)) 1
[1,2,4,8,16,32,64,128,256,512]
Also beware that bounded integer types such as Int will overflow pretty fast in this way.
Yes, it is a pretty easy thing to do in Haskell.
You create an infinite stream of non-negative numbers and then map over them with the function n ↦ x^n:
f :: Num a => a -> [a]
f x = fmap (\n -> x^n) [0..]
> take 10 (f 2)
[1,2,4,8,16,32,64,128,256,512]
The list you are asking for is not recursive: no element depends on one or more previous elements or on an initial element, so each one can be computed directly from its index.
Such lists are used very often in Haskell, and there are two primary facilities for producing them. The first is map, which needs no filtering or recursion here.
f b n = map (b^) [0..n]
The second is the list comprehension
f b n = [b^x|x<-[0..n]]
In both it is simple to set the limit or number of elements in the result. These could both be made into infinite lists if desired by excluding the n in both the left and right side of the equations.
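Dropping the bound n as described gives the infinite versions. The sketch below also includes an iterate-based variant for comparison; the three function names are hypothetical, and Integer is chosen so that the powers do not overflow:

```haskell
-- map over an infinite list of exponents
powersMap :: Integer -> [Integer]
powersMap b = map (b ^) [0 ..]

-- the same list as a comprehension
powersComp :: Integer -> [Integer]
powersComp b = [b ^ x | x <- [0 ..]]

-- each element is the previous one times b, so no power is recomputed
powersIter :: Integer -> [Integer]
powersIter b = iterate (* b) 1
```

All three agree on any finite prefix, e.g. take 6 (powersMap 2) gives [1,2,4,8,16,32].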

Performance of predicate on the length of a list

I have read (and also reasoned) that calculating the length of a list is not good for performance in Haskell. However, I have long lists in my program, and I need to find out whether the length is greater than or less than some number X.
Is there already something built into Haskell for this kind of predicate, or do I have to resort to manual looping?
On vanilla lists, you can check this using drop:
cmpLen :: Int -> [a] -> Ordering
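-- Behaves like compare n (length xs), but inspects at most n elements of xs
cmpLen n xs
  | n < 0 = LT
  | n == 0 = if null xs then EQ else LT
  | otherwise = case drop (n-1) xs of
      [] -> GT
      [_] -> EQ
      _ -> LT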
However, this still takes time proportional to the value (not the size, as is typical of asymptotic analysis!) of n. If you intend to do this often, you can take a cue from Okasaki and build a new structure that caches the operation you want to be efficient. I have wanted this a few times before, and found the following sort of interface convenient in those cases:
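import Data.Monoid (Sum (..))
import Prelude hiding (length) -- this interface defines its own length
type LenList a = (Sum Word, [a])
singleton x = (1, [x])
cons x = (singleton x<>)
length = getSum . fst
elems = snd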
Note that, since LenList a is already a Monoid, you get some of the usual operations for free, e.g. there is an empty LenList a named mempty, and concatenation is given by (<>). Some operations (notably the ones that produce infinite lists) will not be implementable for this type. However, you pay an O(1) price on each construction operation to make asking for the length of one of these O(1), which can be a nice tradeoff in many situations.
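A small usage sketch of this idea follows. The helpers fromList, lenLength and longerThan are hypothetical additions (the length accessor is renamed here to avoid hiding the Prelude's length), and fromList pays the O(n) counting cost once, up front:

```haskell
import Data.Monoid (Sum (..))

-- A list paired with its cached length.
type LenList a = (Sum Word, [a])

-- Convert an ordinary finite list, counting its elements once.
fromList :: [a] -> LenList a
fromList = foldr (\x acc -> (Sum 1, [x]) <> acc) mempty

-- O(1): just read the cached count.
lenLength :: LenList a -> Word
lenLength = getSum . fst

-- The predicate from the question, answered without walking the list.
longerThan :: Word -> LenList a -> Bool
longerThan n xs = lenLength xs > n
```

Concatenation with (<>) adds the cached counts, so lenLength (fromList "hello" <> fromList "!!") is 7 without any further traversal.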

Forcing Strict Evaluation - What am I doing wrong?

I want an intermediate result computed before generating the new one to get the benefit of memoization.
import qualified Data.Map.Strict as M
import Data.List
parts' m = newmap
  where
    n = M.size m + 1
    lists = nub $ map sort $
      [n] : (concat $ map (\i -> map (i:) (M.findWithDefault [] (n-i) m)) [1..n])
    newmap = seq lists (M.insert n lists m)
But, then if I do
take 2000 (iterate parts' (M.fromList [(1,[[1]])]))
It still completes instantaneously.
(Can using an Array instead of a Map help?)
Short answer:
If you need to calculate the entire list/array/map/... at once, you can use deepseq as @JoshuaRahm suggests, or the ($!!) operator.
The answer below shows how you can enforce strictness, but only on level 1 (it evaluates until it reaches a data structure that may still contain remainders of expression trees).
Furthermore, the answer argues why laziness and memoization are not (necessarily) opposites of each other.
More advanced:
Haskell is a lazy language: it only calculates something when it is absolutely necessary. An expression like:
take 2000 (iterate parts' (M.fromList [(1,[[1]])]))
is not evaluated immediately: Haskell simply stores that this has to be calculated later. Later if you really need the first, second, i-th, or the length of the list, it will evaluate it, and even then in a lazy fashion: if you need the first element, from the moment it has found the way to calculate that element, it will represent it as:
element : take 1999 (<some-expression>)
You can, however, force Haskell to evaluate something strictly with the exclamation mark (!); this is called strictness. For instance:
main = do
  return $! take 2000 (iterate parts' (M.fromList [(1,[[1]])]))
Or, in case it is an argument, you can use a bang pattern (this needs the BangPatterns extension):
f x !y !z = x+y+z
Here you force Haskell to evaluate y and z before "increasing the expression tree" as:
expression-for-x+expression-for-y+expression-for-z.
EDIT: if you use it in a let pattern, you can use the bang as well:
let !foo = take 2000 (iterate parts' (M.fromList [(1,[[1]])])) in ...
Note that you only collapse the structure to the first level. Thus let !foo will more or less only evaluate up to (_:_).
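The "first level only" behaviour can be checked with plain seq, which evaluates to the same depth as a bang pattern. In this sketch the names are illustrative, and forceAll is a hand-rolled stand-in for deepseq specialised to lists of Int:

```haskell
-- seq stops at the outermost constructor, so the undefined elements
-- inside the list are never touched.
whnfOnly :: String
whnfOnly = xs `seq` "outer constructor evaluated"
  where
    xs = [undefined, undefined] :: [Int]

-- length walks the spine (every cons cell) but still not the elements.
spineOnly :: Int
spineOnly = length ([undefined, undefined] :: [Int])

-- Forcing each element as well; seq on an Int evaluates it fully.
forceAll :: [Int] -> ()
forceAll = foldr seq ()
```

Both whnfOnly and spineOnly return normally, whereas forceAll [undefined] would raise an exception when its result is demanded.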
Note that memoization and laziness are not necessarily opposites of each other. Consider the list:
numbers :: [Integer]
numbers = 0:[i+(sum (genericTake i numbers))|i<-[1..]]
As you can see, calculating a number requires a large amount of computational effort. Numbers is represented like:
numbers ---> (0:[i+(sum (genericTake i numbers))|i<-[1..]])
If, however, I evaluate numbers!!1, it has to calculate the second element and returns 1; but the internal structure of numbers is updated as well. Now it looks like:
numbers ---> (0:1:[i+(sum (genericTake i numbers))|i<-[2..]])
The computation numbers!!1 thus will "help" future computations, because you will never have to recalculate the second element in the list.
If you for instance calculate numbers!!4000, it will take a few seconds. Later if you calculate numbers!!4001, it will be calculated almost instantly. Simply because the work already done by numbers!!4000 is reused.
Arrays might be able to help, but you can also try taking advantage of the deepseq library (import Control.DeepSeq). So you can write code like this:
let x = take 2000 (iterate parts' (M.fromList [(1,[[1]])])) in do
  x `deepseq` print (x !! 5) -- takes a *really* long time
  print (x !! 1999) -- finishes instantly
You are memoizing the partitions function, but there are some drawbacks to your approach:
- you are only memoizing up to a specific value which you have to specify beforehand
- you need to call nub and sort
Here is an approach using Data.MemoCombinators (from the data-memocombinators package):
import Data.MemoCombinators
parts = integral go
  where
    go k | k <= 0 = [] -- for safety
    go 1 = [[1]]
    go n = [[n]] ++ [ (a : p) | a <- [n-1,n-2..1], p <- parts (n-a), a >= head p ]
E.g.:
ghci> parts 4
[[4],[3,1],[2,2],[2,1,1],[1,1,1,1]]
This memoization is dynamic, so only the values you actually access will be memoized.
Note how it is constructed - parts = integral go, and go uses parts for any recursive calls. We use the integral combinator here because parts is a function of an Int.
