Complexity of two cumulative sum (cumsum) functions in Haskell - haskell

Consider the following two cumulative sum (cumsum) functions:
cumsum :: Num a => [a] -> [a]
cumsum [] = []
cumsum [x] = [x]
cumsum (x:y:ys) = x : (cumsum $ (x+y) : ys)
and
cumsum' :: Num a => [a] -> [a]
cumsum' x = [sum $ take k x | k <- [1..length x]]
Of course, I prefer the definition of cumsum to that of cumsum' and I understand that the former has linear complexity.
But just why does cumsum' also have linear complexity? take itself has linear complexity in the length of its argument and k runs from 1 to length x. Therefore I'd have expected quadratic complexity for cumsum'.
Moreover, the constant of cumsum' is lower than that of cumsum. Is that due to the recursive list appending of the latter?
NOTE: welcoming any smart definition of a cumulative sum.
EDIT: I'm measuring execution times using (after enabling :set +s in GHCi):
last $ cumsum [1..n]

This is a measurement error caused by laziness.
Every value in Haskell is lazy: it isn't evaluated until necessary. This includes sub-structure of values - so for example when we see a pattern (x:xs) this only forces evaluation of the list far enough to identify that the list is non-empty, but it doesn't force the head x or the tail xs.
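As a minimal sketch of that point (the names isNonEmpty and demo are made up for illustration), the match on (_:_) below succeeds even though both the head and the tail are undefined, because only the outermost constructor is inspected:

```haskell
-- Matching on (_:_) forces only the outermost constructor of the list.
isNonEmpty :: [a] -> Bool
isNonEmpty []    = False
isNonEmpty (_:_) = True

-- Neither the head nor the tail is ever evaluated here, so the
-- 'undefined' placeholders are harmless.
demo :: Bool
demo = isNonEmpty (undefined : undefined)
```

Evaluating demo in GHCi should give True rather than an exception.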
The definition of last is something like:
last [x] = x
last (x:xs) = last xs
So when last is applied to the result of cumsum', it inspects the list comprehension recursively, but only enough to track down the last entry. It doesn't force any of the entries, but it does return the last one.
When this last entry is printed in ghci or whatever, then it is forced which takes linear time as expected. But the other entries are never calculated so we don't see the "expected" quadratic behaviour.
Using maximum instead of last does demonstrate that cumsum' is quadratic whereas cumsum is linear.
[Note: this explanation is somewhat hand-wavy: really evaluation is entirely driven by what's needed for the final result, so even last is only evaluated at all because its result is needed. Search for things like "Haskell evaluation order" and "Weak Head Normal Form" to get a more precise explanation.]
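Since the question welcomed other definitions, here is one possible sketch. The name cumsumScan is an assumption, not something from the question; scanl1 is the standard Prelude function. The quadratic cumsum' is repeated only so the two can be compared side by side:

```haskell
-- scanl1 carries the running total forward, so each output element
-- costs a single addition.
cumsumScan :: Num a => [a] -> [a]
cumsumScan = scanl1 (+)

-- The definition from the question, repeated for comparison.
cumsum' :: Num a => [a] -> [a]
cumsum' x = [sum $ take k x | k <- [1..length x]]
```

With :set +s enabled in GHCi, timing maximum $ cumsum' [1..n] against maximum $ cumsumScan [1..n] for growing n should expose the difference, because maximum has to force every entry of the result.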

Related

execution order influenced space complexity in haskell

I am reading about Haskell in the Wikibook:
https://en.wikibooks.org/wiki/Haskell/Graph_reduction
and in the article, there is an example that puzzles me:
Tricky space leak example:
(\xs -> head xs + last xs) [1..n]
(\xs -> last xs + head xs) [1..n]
The first version runs on O(1) space. The second in O(n).
How come?
I assume last is implemented as follows:
last [] = error ""
last [r] = r
last (_:t) = last t
So the tail recursion should take constant space. Why would the second one use linear space?
The simple answer has to be that when evaluating a + b, the expression a gets evaluated before b in practice.
Normally last xs only takes O(1) space - the elements of xs can be discarded as last traverses down the list. However, if xs is needed for another expression, the expansion of xs must be kept in memory for the other expression, and this is what is happening in the second case since it appears that xs is needed for head.
In the first case head xs is evaluated "first" which only evaluates the first element of xs. Then last xs is evaluated, traversing down the list. But since there isn't any further need for the elements of the list they may be discarded during the traversal.
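To experiment with this, the two expressions can be wrapped in functions. This is only a sketch: the names headThenLast and lastThenHead are invented here, and the actual space behaviour depends on the evaluation order GHC chooses for (+) and on the optimisation level, so it may not reproduce on every setup.

```haskell
-- head is demanded first, so the list cells can be collected
-- while last walks down the list.
headThenLast :: Int -> Int
headThenLast n = (\xs -> head xs + last xs) [1..n]

-- last is demanded first while head xs still refers to the start
-- of the list, so the cells may all stay alive during the traversal.
lastThenHead :: Int -> Int
lastThenHead n = (\xs -> last xs + head xs) [1..n]
```

Both functions return n + 1 for n >= 1. Compiling each into a small program and running it with +RTS -s is one way to compare the maximum residency.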

Is the computational complexity of this function O(2^n) or O(n)

I want to make a function that creates an infinite list that takes two numbers and an operator as input so it can generate arithmetic and geometric sequences.
infiniteList:: (Floating a)=>a->(a->a->a)->a->[a]
infiniteList start operation changeby =
  start:[(operation x changeby)| x<-(infiniteList start operation changeby)]
the code compiles and works properly: infiniteList 1 (*) 2 generates a list starting from 1 and subsequent numbers are double its predecessor.
Now I'm having trouble figuring out the computational complexity "to calculate the nth element of the list". Technically it is doing one operation to figure out each element of the list. However, if you were after the (2^k +1) term, I would have to wait for the computer to finish calculating 2^(k+1) elements first.
I hope I'm explaining this properly. Basically, I think the program produces the elements in batches of 2^k, where k is an integer, so you could potentially be waiting for (2^(k+1) - 2^k) time to calculate the (2^k + 1)th element. So what is the computational complexity "to calculate the nth element of the list"?
A key tool is the following rule:
When analyzing the performance (not the totality) of a binding, you are allowed to assume, when analyzing its right-hand-side, that the binding itself has been fully evaluated.
You are defining infiniteList, so you are allowed to assume that in the RHS, the infiniteList binding has been fully evaluated. That, unfortunately, isn't useful, because infiniteList is just a function, and fully evaluating it just gives you the function!
But you can use this reasoning tool to figure out a fix: you have to bind the right thing.
infiniteList :: a -> (a -> a -> a) -> a -> [a]
infiniteList start operation changeby =
  let result =
        start : [operation x changeby | x <- result]
  in result
Now you have a useful binding, result, which you can assume is fully evaluated! In the RHS, you now have, essentially,
start : map (\x -> operation x changeby) result
which is clearly O(n).
Indeed with the first definition,
> infiniteList 1 (*) 2 !! 10000
takes longer than I wish to wait, but with the modified definition, it takes a mere 0.04 seconds even in GHCi.
The run time depends a lot on how GHC decides to evaluate it.
To simplify things, consider this version of the function:
inf a f = a : [ f x | x <- inf a f ]
If GHC performed common sub-expression elimination on inf a f, it could decide to evaluate it as if it had been written:
inf a f = let r = a : [ f x | x <- r ]
          in r
and this runs in linear time.
I'm not sure where you are getting the "batches" idea from. Below is a transcript of the first few elements of the list. From that, I think you should be able to figure out the complexity.
What's the first element of the list? It is start, because infiniteList is defined as start:[something], and the first element of any list of that form is start.
What is the second element of the list? We certainly need to consult the [something] portion of the list above. The first element of that sublist is operation x changeby where x is the first element of infiniteList. We decided already that the first element is start, so the second element is operation start changeby, which is exactly what we wanted. What do we have to compute to get the second element? Just the first, plus the operation.
What is the third element of the list? It's the second element of [something], which is operation x changeby where x is the second element of infiniteList. Fortunately, we just calculated what that is...
What do we have to compute to get the third element? Just the first and second, plus the operation.
Although it doesn't directly answer the question, you should ask yourself what complexity you expect the function to have. How much work needs to be done to get the nth element? It's possible that your implementation in code is worse, but it might help you think about your code differently.
Just do some math. Assume that calculating the nth item requires T(n) calculations. As
[(operation x changeby)| x<-(infiniteList start operation changeby)]
suggests, we need to solve the subproblem T(n-1), and the list comprehension itself performs n-1 operations. Consing start onto the front (start:...) is cheap and counts as 1 calculation, so
T(n) = T(n-1) + (n - 1) + 1 = T(n-1) + n, which gives O(n^2).
Actually, you can "feel" the time complexity just by running some examples. Let f n = (infiniteList 0 (+) 1) !! n, then run f 10, f 100, f 1000, f 10000, you can see the difference.
As a rule of thumb, when n=1000 runs in no time, n=10000 takes a second or two, and n=100000 seems to run forever, the complexity is usually O(n^2).
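The recurrence can also be checked numerically. The sketch below is a hypothetical cost model rather than a measurement: steps and closedForm are names invented here, and the base case T(0) = 0 is an assumption.

```haskell
-- T(n) = T(n-1) + n with the assumed base case T(0) = 0.
-- Only meaningful for non-negative n.
steps :: Integer -> Integer
steps 0 = 0
steps n = steps (n - 1) + n

-- Closed form of the same sum: 1 + 2 + ... + n = n(n+1)/2.
closedForm :: Integer -> Integer
closedForm n = n * (n + 1) `div` 2
```

The two agree for every non-negative n, and the n^2 term in the closed form is what makes the growth quadratic.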
BTW, there is an O(n) approach:
infi :: a -> (a -> a -> a) -> a -> [a]
infi x f s = x : infi (f x s) f s
You can do some math and run some examples to feel the difference.
One strategy that sometimes helps with recursion is to expand it out a few times to get a better idea of what's going on. Let's try that:
infiniteList start operation changeby =
  start:[(operation x changeby) | x <-
    start:[(operation x changeby) | x <-
      start:[(operation x changeby) | x <-
        start:[(operation x changeby) | x <-
          start:[(operation x changeby) | x <- (infiniteList start operation changeby)]]]]]
We can see the first element in the list is going to be start, as expected. Then the second element will be start from the first recursive call passed through operation x changeby. What will the third item be? Well, it'll be the second item of the first recursive call, so it'll be start passed through two calls of operation x changeby. Now the pattern emerges! In general, the nth item of infiniteList will be start with operation x changeby called on it n-1 times. Because the nested recursive calls share no work, these applications add up. This is rather unfortunate because, as any student of computer science knows, 1 + 2 + ... + (n-1) = n(n-1)/2 = O(n^2).
There is, of course, a much more efficient way to write this function. Instead of applying operation x changeby to start n-1 times to get the nth item, why don't we just apply it once to the previous item? This will give us an O(n) solution. For example, we can use unfoldr from Data.List:
import Data.List (unfoldr)
infiniteList start operation changeby =
  unfoldr (\x -> Just (x, operation x changeby)) start
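The same idea of applying the operation once to the previous item can also be sketched with iterate from the Prelude. The name infiniteListIter is made up here so that it does not clash with the definitions above:

```haskell
-- Each element is obtained from its predecessor by one application
-- of the operation, so reaching the nth element takes O(n) applications.
infiniteListIter :: a -> (a -> a -> a) -> a -> [a]
infiniteListIter start operation changeby =
  iterate (\x -> operation x changeby) start
```

For example, take 5 (infiniteListIter 1 (*) 2) gives [1,2,4,8,16].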

Forcing Strict Evaluation - What am I doing wrong?

I want an intermediate result computed before generating the new one to get the benefit of memoization.
import qualified Data.Map.Strict as M
import Data.List
parts' m = newmap
  where
    n = M.size m + 1
    lists = nub $ map sort $
      [n] : (concat $ map (\i -> map (i:) (M.findWithDefault [] (n-i) m)) [1..n])
    newmap = seq lists (M.insert n lists m)
But, then if I do
take 2000 (iterate parts' (M.fromList [(1,[[1]])]))
It still completes instantaneously.
(Can using an Array instead of a Map help?)
Short answer:
If you need to calculate the entire list/array/map/... at once, you can use deepseq as @JoshuaRahm suggests, or the ($!!) operator.
The answer below shows how you can enforce strictness, but only at the first level (evaluation stops once it reaches a data structure that may still contain unevaluated expression trees).
Furthermore, the answer argues why laziness and memoization are not (necessarily) opposites of each other.
More advanced:
Haskell is a lazy language, which means it only calculates something if it is absolutely necessary. An expression like:
take 2000 (iterate parts' (M.fromList [(1,[[1]])]))
is not evaluated immediately: Haskell simply stores that this has to be calculated later. Later if you really need the first, second, i-th, or the length of the list, it will evaluate it, and even then in a lazy fashion: if you need the first element, from the moment it has found the way to calculate that element, it will represent it as:
element : take 1999 (<some-expression>)
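You can, however, force Haskell to evaluate something strictly with the exclamation mark (!), either through the ($!) operator or through bang patterns (which require the BangPatterns extension in Haskell 2010). This is called strictness. For instance:
main = do
  return $! take 2000 (iterate parts' (M.fromList [(1,[[1]])]))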
Or in case it is an argument, you can use it like:
f x !y !z = x+y+z
Here you force Haskell to evaluate y and z before "increasing the expression tree" as:
expression-for-x+expression-for-y+expression-for-z.
EDIT: if you use it in a let pattern, you can use the bang as well:
let !foo = take 2000 (iterate parts' (M.fromList [(1,[[1]])])) in ...
Note that you only collapse the structure to the first level. Thus let !foo will more or less only evaluate up to (_:_).
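The "first level only" behaviour can be seen in a small sketch that needs no extension. The names partial, whnfIsEnough, and headStrict are invented for illustration; seq and ($!) are the standard Prelude functions:

```haskell
-- A list whose first cons cell is defined but whose tail is not.
partial :: [Int]
partial = 1 : undefined

-- seq evaluates its first argument only up to the outermost (:)
-- constructor, so the undefined tail is never touched.
whnfIsEnough :: Bool
whnfIsEnough = partial `seq` True

-- ($!) is strict application built on seq; again only the outer
-- constructor of the argument is forced before head runs.
headStrict :: Int
headStrict = head $! partial
```

Both values evaluate without an exception, whereas anything that walks the whole list, such as length partial, would hit the undefined tail.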
Note that memoization and laziness are not necessarily opposites of each other. Consider the list:
numbers :: [Integer]
numbers = 0:[i+(sum (genericTake i numbers))|i<-[1..]]
As you can see, calculating a number requires a large amount of computational effort. Initially, numbers is represented as:
numbers ---> (0:[i+(sum (genericTake i numbers))|i<-[1..]])
If, however, I evaluate numbers!!1, it has to calculate the second element and returns 1; but the internal structure of numbers is evaluated as well. Now it looks like:
numbers ---> (0:1:[i+(sum (genericTake i numbers))|i<-[2..]])
The computation numbers!!1 thus will "help" future computations, because you will never have to recalculate the second element in the list.
If you for instance calculate numbers!!4000, it will take a few seconds. Later if you calculate numbers!!4001, it will be calculated almost instantly. Simply because the work already done by numbers!!4000 is reused.
Arrays might be able to help, but you can also try taking advantage of the deepseq library. So you can write code like this:
let x = take 2000 (iterate parts' (M.fromList [(1,[[1]])])) in do
  x `deepseq` print (x !! 5) -- takes a *really* long time
  print (x !! 1999) -- finishes instantly
You are memoizing the partition function, but there are some drawbacks to your approach:
you are only memoizing up to a specific value which you have to specify beforehand
you need to call nub and sort
Here is an approach using Data.MemoCombinators (from the data-memocombinators package):
import Data.MemoCombinators
parts = integral go
  where
    go k | k <= 0 = [] -- for safety
    go 1 = [[1]]
    go n = [[n]] ++ [ (a : p) | a <- [n-1,n-2..1], p <- parts (n-a), a >= head p ]
E.g.:
ghci> parts 4
[[4],[3,1],[2,2],[2,1,1],[1,1,1,1]]
This memoization is dynamic, so only the values you actually access will be memoized.
Note how it is constructed - parts = integral go, and go uses parts for any recursive calls. We use the integral combinator here because parts is a function of an Int.

H-99 Problems: #26 Can't Understand The Solution

I am currently working through H-99 Questions after reading Learn You a Haskell. So far I felt like I had a pretty good grasp of the concepts, and I didn't have too much trouble solving or understanding the previous problems. However, this one stumped me and I don't understand the solution.
The problem is:
Generate the combinations of K distinct objects chosen from the N elements of a list
In how many ways can a committee of 3 be chosen from a group of 12 people? We all know that there are C(12,3) = 220 possibilities (C(N,K) denotes the well-known binomial coefficients). For pure mathematicians, this result may be great. But we want to really generate all the possibilities in a list.
The solution provided:
import Data.List
combinations :: Int -> [a] -> [[a]]
combinations 0 _ = [ [] ]
combinations n xs = [ y:ys | y:xs' <- tails xs, ys <- combinations (n-1) xs']
The main point of confusion for me is the y variable. According to how tails works, it should be assigned the entire list at the beginning, and then that list would be prepended to ys after it is generated. However, when the function runs, it returns a list of lists no longer than the n value passed in. Could someone please help me understand exactly how this works?
Variable y is not bound to the whole xs list. For instance, assume xs=[1,2,3]. Then:
y:xs' is matched against [1,2,3] ==> y=1 , xs'=[2,3]
y:xs' is matched against [2,3] ==> y=2 , xs'=[3]
y:xs' is matched against [3] ==> y=3 , xs'=[]
y:xs' is matched against [] ==> pattern match failure
Note that y is an integer above, while xs' is a list of integers.
The Haskell code can be read as a non-deterministic algorithm, as follows. To generate a combination of n elements from xs, get any tail of xs (i.e., drop any number of elements from the beginning). If the tail is empty, ignore it. Otherwise, let the tail be y:xs', where y is the first element of the tail and xs' the remaining (possibly empty) part. Take y and add it to the combination we are generating (as the first element). Then recursively choose the other n-1 elements from the remaining part xs', and add those to the combination as well. When n drops to zero, we know there is only one combination, namely the empty combination [], so take that.
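The matching steps listed above can be reproduced directly. In this sketch, suffixes and headsAndRests are illustrative names that do not appear in the original solution, while combinations is the solution from the question, repeated so the block loads on its own:

```haskell
import Data.List (tails)

-- Every suffix of the list, including the empty one at the end.
suffixes :: [[Int]]
suffixes = tails [1, 2, 3]

-- In a list comprehension, a failed match on y:xs' simply skips
-- that suffix, so the empty suffix contributes nothing.
headsAndRests :: [(Int, [Int])]
headsAndRests = [ (y, xs') | y:xs' <- tails [1, 2, 3] ]

-- The solution from the question, unchanged.
combinations :: Int -> [a] -> [[a]]
combinations 0 _  = [ [] ]
combinations n xs = [ y:ys | y:xs' <- tails xs, ys <- combinations (n-1) xs' ]
```

Here suffixes is [[1,2,3],[2,3],[3],[]], headsAndRests is [(1,[2,3]),(2,[3]),(3,[])], and combinations 2 [1,2,3] is [[1,2],[1,3],[2,3]].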
y is not appended to ys. That would involve the (++) :: [a] -> [a] -> [a] operator.
For that matter the types would not match if you tried to append y and ys. y has type a, while ys has type [a].
Rather, y is consed to ys using (:) :: a -> [a] -> [a] (the cons operator).
The length of each returned list is equal to n because combinations recurses from n down to 0 and conses exactly one element onto the result at each level.

Haskell: Minimum sum of list

So, I'm new here, and I would like to ask 2 questions about some code:
Duplicate each element of a list as many times as its own value. For example, duplicate [1,2,3] should give [1,2,2,3,3,3]
duplicate1 xs = x*x ++ duplicate1 xs
What is wrong in here?
Take positive numbers from list and find the minimum positive subtraction. For example, [-2,-1,0,1,3] should give 1 because (1-0) is the lowest difference above 0.
For your first part, there are a few issues: you forgot the pattern in the first argument, you are trying to square the first element rather than replicate it, and there is no second case to end your recursion (it will crash). To help, here is a type signature:
replicate :: Int -> a -> [a]
For your second part, if it has been covered in your course, you could try a list comprehension to get all differences of the numbers, and then you can apply the minimum function. If you don't know list comprehensions, you can do something similar with concatMap.
Don't forget that you can check functions on http://www.haskell.org/hoogle/ (Hoogle) or similar search engines.
Tell me if you need a more thorough answer.
To your first question:
Use pattern matching. You can write something like duplicate (x:xs). This will deconstruct the first cell of the parameter list. If the list is empty, the next pattern is tried:
duplicate (x:xs) = ... -- list is not empty
duplicate [] = ... -- list is empty
The function replicate n x creates a list that contains n copies of x. For instance, replicate 3 'a' yields ['a','a','a'].
Use recursion. To understand, how recursion works, it is important to understand the concept of recursion first ;)
1)
dupe :: [Int] -> [Int]
dupe l = concat [replicate i i | i<-l]
There are a few problems with yours, one being that you are squaring each term, not creating a new list. In addition, your pattern matching is off and you would create an infinite recursion. Note how you recurse on the exact same list as was input. I think you mean something along the lines of duplicate1 (x:xs) = (replicate x x) ++ duplicate1 xs and that would be fine, so long as you write a proper base case as well.
2)
This is pretty straightforward from your problem description, but probably not too efficient. First filter out the negatives, then check all subtractions with non-negative results. The answer is the minimum of these:
p2 l = let l2 = filter (\x -> x >= 0) l
       in minimum [i-j | i<-l2, j<-l2, i >= j]
The problem here is that it allows a number to be checked against itself, which will always lead to an answer of zero. Any ideas? I'd like to leave it to you; the commenter has a point about spoon-feeding.
1) You can use the fact that list is a monad:
dup = (=<<) (\x -> replicate x x)
Or in do-notation:
dup xs = do x <- xs; replicate x x; return x
2) For getting only the non-negative numbers from a list, you can use filter:
filter (>= 0) [1,-1,0,-5,3]
-- [1,0,3]
To get all possible "pairings" you can use either monads or applicative functors:
import Control.Applicative
(,) <$> [1,2,3] <*> [1,2,3]
[(1,1),(1,2),(1,3),(2,1),(2,2),(2,3),(3,1),(3,2),(3,3)]
Of course instead of creating pairs you can generate directly differences when replacing (,) by (-). Now you need to filter again, discarding all zero or negative differences. Then you only need to find the minimum of the list, but I think you can guess the name of that function.
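Put together, the steps described above might look like the following sketch. The name minPositiveDiff is an assumption, and the function is partial: it assumes the input contains at least two distinct non-negative numbers, since minimum fails on an empty list.

```haskell
-- Keep the non-negative numbers, form all pairwise differences with
-- the applicative instance of lists, drop the zero and negative
-- differences, and take the smallest remaining one.
-- Assumes at least two distinct non-negative inputs.
minPositiveDiff :: [Int] -> Int
minPositiveDiff xs = minimum (filter (> 0) ((-) <$> ps <*> ps))
  where
    ps = filter (>= 0) xs
```

For the example from the question, minPositiveDiff [-2,-1,0,1,3] gives 1.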
Here, this should do the trick:
dup [] = []
dup (x:xs) = (replicate x x) ++ (dup xs)
We define dup recursively: for empty list it is just an empty list, for a non empty list, it is a list in which the first x elements are equal to x (the head of the initial list), and the rest is the list generated by recursively applying the dup function. It is easy to prove the correctness of this solution by induction (do it as an exercise).
Now, let's analyze your initial solution:
duplicate1 xs = x*x ++ duplicate1 xs
The first mistake: you did not define the list pattern properly. According to your definition, the function has just one argument - xs. To achieve the desired effect, you should use the correct pattern for matching the list's head and tail (x:xs, see my previous example). Read up on pattern matching.
But that's not all. Second mistake: x*x is actually x squared, not a list of two values. Which brings us to the third mistake: ++ expects both of its operands to be lists of values of the same type, whereas in your code you're trying to apply ++ to two values of types Int and [Int].
As for the second task, the solution has already been given.
HTH
