Haskell filter function laziness - haskell

Given:
take 5 (filter p xs)
Say filter p xs would return 1K matches: would Haskell compute only the first 5 matches, without producing a large intermediate result?

It will scan xs only as much as needed to produce 5 matches, evaluating p only on this prefix of xs.
To be more precise, it can actually perform less computation, depending on how the result is used. For instance,
main = do
  let p x = (x==3) || (x>=1000000)
      list1 = [0..1000000000]
      list2 = take 5 (filter p list1)
  print (head list2)
will only scan list1 until 3 is found, and no more, despite take asking for five elements. This is because head demands only the first of these five, so laziness causes only that one to be evaluated.
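A minimal sketch of the same effect, assuming a made-up list named poisoned whose tail raises an error if it is ever demanded. If take 5 (filter ...) scanned past the fifth match, the error would fire; it does not.

```haskell
-- Hypothetical example list: ten ordinary elements, then a tail that
-- raises an error as soon as anything tries to evaluate it.
poisoned :: [Int]
poisoned = [1 .. 10] ++ error "scanned past the tenth element"

-- filter finds its fifth even number at 10, and take 5 stops there,
-- so the erroring tail is never demanded.
firstFiveEvens :: [Int]
firstFiveEvens = take 5 (filter even poisoned)

-- head demands even less: only the first match (2) is computed.
firstEven :: Int
firstEven = head firstFiveEvens
```

In GHCi, firstFiveEvens evaluates to [2,4,6,8,10] and firstEven to 2, without triggering the error.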

"Would return 1K matches" under what circumstances?
Haskell doesn't work by first evaluating filter p xs (as you would in an ordinary call-by-value language like Java or Ruby). It works by evaluating take 5 first (in this case). take 5 will evaluate enough of filter p xs to end up with a result, and not evaluate the rest.

Yes: it stops after five matches and does not build a large intermediate list.
If it did, something like the following would never terminate:
take 5 (filter (> 10) [1..])
This feature is called Lazy evaluation.

Related

Why don't the `foldr`, `foldr1`, `scanr` and `scanr1` functions have a performance problem when they are applied to big lists?

I read the old Russian translation of the Learn You a Haskell for Great Good! book. The current English version (online) is newer, so I look at it from time to time as well.
The quote:
When you put together two lists (even if you append a singleton list
to a list, for instance: [1,2,3] ++ [4]), internally, Haskell has to
walk through the whole list on the left side of ++. That's not a
problem when dealing with lists that aren't too big. But putting
something at the end of a list that's fifty million entries long is
going to take a while. However, putting something at the beginning of
a list using the : operator (also called the cons operator) is
instantaneous.
I assumed that Haskell has to walk through the whole list to get the last item of the list for the foldr, foldr1, scanr and scanr1 functions. Also I assumed that Haskell will do the same for getting a previous element (and so on for each item).
But I see I was mistaken:
UPD
I tried this code and saw similar processing times for both cases:
data' = [1 .. 10000000]
sum'r = foldr1 (\x acc -> x + acc ) data'
sum'l = foldl1 (\acc x -> x + acc ) data'
Is every Haskell list bidirectional? I assume that, to get the last item of a list, Haskell first has to iterate over every item and remember the necessary items (the last one, for example) so that it can later get the previous item of a bidirectional list (for lazy computation). Am I right?
It's tricky since Haskell is lazy.
Evaluating head ([1..1000000]++[1..1000000]) will return immediately, with 1. The lists will never be fully created in memory: only the first element of the first list will be.
If you instead demand the full list [1..1000000]++[1..1000000] then ++ will indeed have to create a two-million long list.
foldr may or may not evaluate the full list. It depends on whether the function we use is lazy. For example, here's map f xs written using foldr:
foldr (\y ys -> f y : ys) [] xs
This is as efficient as map f xs: list cells are produced on demand, in a streaming fashion. If we need only the first ten elements of the resulting list, then we indeed create only the first ten cells -- foldr will not be applied to the rest of the list. If we need the full resulting list, then foldr will be run over the full list.
Also note that xs++ys can be defined similarly in terms of foldr:
foldr (:) ys xs
and has similar performance properties.
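As a small sketch of the two foldr definitions above (the names mapR and appendR are made up here), both can be run against inputs that would be impossible or wasteful to process in full:

```haskell
-- map written with foldr, as in the answer; mapR is a hypothetical name.
mapR :: (a -> b) -> [a] -> [b]
mapR f = foldr (\y ys -> f y : ys) []

-- (++) written with foldr; appendR is a hypothetical name.
appendR :: [a] -> [a] -> [a]
appendR xs ys = foldr (:) ys xs

-- Works on an infinite list because each cons cell is produced on demand.
firstTen :: [Int]
firstTen = take 10 (mapR (* 2) [1 ..])

-- Only the first cell of the left list is ever built.
headOfAppend :: Int
headOfAppend = head (appendR [1 .. 1000000] [1 .. 1000000])
```

firstTen is [2,4,6,8,10,12,14,16,18,20] and headOfAppend is 1; neither walks the rest of its input.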
By comparison, foldl instead always runs over the whole list.
In the example you mention we have longList ++ [something], appending to the end of the list. This only costs constant time if all we demand is the first element of the resulting list. But if we really need the last element we added, then appending will need to run over the whole list. This is why appending at the end is considered O(n) instead of O(1).
In the last update, the question speaks about computing the sum with foldr vs foldl, using the (+) operator. In such a case, since (+) is strict (it needs both arguments to compute its result), both folds will need to scan the whole list. The performance in such cases can be comparable. Indeed, they would compute, respectively
1 + (2 + (3 + (4 + ..... -- foldr
(...(((1 + 2) + 3) +4) + .... -- foldl
By comparison foldl' would be more memory efficient, since it starts reducing the above sum before building the above giant expression. That is, it would compute 1+2 first (3), then 3+3 (6), then 6 + 4 (10),... keeping in memory only the last result (a single integer) while the list is being scanned.
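A rough sketch of that comparison, assuming a shorter list than the one in the question so that it runs quickly (the names input, sumLazyRight and sumStrictLeft are made up):

```haskell
import Data.List (foldl')

-- Shorter than the 10000000-element list in the question, to keep this quick.
input :: [Integer]
input = [1 .. 1000000]

-- Builds the nested expression 1 + (2 + (3 + ...)) before any addition happens.
sumLazyRight :: Integer
sumLazyRight = foldr (+) 0 input

-- Forces the accumulator at every step, so only one running total is kept.
sumStrictLeft :: Integer
sumStrictLeft = foldl' (+) 0 input
```

Both give the same total, 500000500000; the difference is in how much memory is held while the list is being scanned.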
To the OP: the topic of laziness is not easy to grasp the first time. It is quite vast -- you just met a ton of different examples which have subtle but significant performance differences. It's hard to explain everything succinctly -- it's just too broad. I'd recommend focusing on small examples and digesting those first.

Forcing Strict Evaluation - What am I doing wrong?

I want an intermediate result computed before generating the new one to get the benefit of memoization.
import qualified Data.Map.Strict as M
import Data.List
parts' m = newmap
  where
    n = M.size m + 1
    lists = nub $ map sort $
      [n] : (concat $ map (\i -> map (i:) (M.findWithDefault [] (n-i) m)) [1..n])
    newmap = seq lists (M.insert n lists m)
But, then if I do
take 2000 (iterate parts' (M.fromList [(1,[[1]])]))
It still completes instantaneously.
(Can using an Array instead of a Map help?)
Short answer:
If you need to calculate the entire list/array/map/... at once, you can use deepseq as @JoshuaRahm suggests, or the ($!!) operator.
The answer below shows how you can enforce strictness, but only one level deep (evaluation stops at the first data constructor, whose fields may still contain unevaluated expression trees).
Furthermore the answer argues why laziness and memoization are not (necessarily) opposites of each other.
More advanced:
Haskell is a lazy language: it only calculates something when it is actually needed. An expression like:
take 2000 (iterate parts' (M.fromList [(1,[[1]])]))
is not evaluated immediately: Haskell simply stores that this has to be calculated later. Later if you really need the first, second, i-th, or the length of the list, it will evaluate it, and even then in a lazy fashion: if you need the first element, from the moment it has found the way to calculate that element, it will represent it as:
element : take 1999 (<some-expression>)
You can however force Haskell to evaluate something strictly with the exclamation mark (!), either in the strict application operator $! or in a bang pattern. For instance:
main = do
  return $! take 2000 (iterate parts' (M.fromList [(1,[[1]])]))
Or in case it is an argument, you can use a bang pattern (this needs the BangPatterns extension):
f x !y !z = x+y+z
Here you force Haskell to evaluate y and z before "increasing the expression tree" as:
expression-for-x+expression-for-y+expression-for-z.
EDIT: if you use it in a let pattern, you can use the bang as well:
let !foo = take 2000 (iterate parts' (M.fromList [(1,[[1]])])) in ...
Note that you only collapse the structure to the first level. Thus let !foo will more or less only evaluate up to (_:_).
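To see how shallow that forcing is, here is a small sketch using only seq and ($!) from the Prelude. The list holey is a made-up example whose second element is undefined:

```haskell
-- Hypothetical list with an element that must never be evaluated.
holey :: [Int]
holey = [1, undefined, 3]

-- seq forces only the first (:) cell; length then walks the spine
-- without looking at the elements, so undefined is never touched.
forcedToWhnf :: Int
forcedToWhnf = holey `seq` length holey

-- ($!) forces its argument the same way, up to the outermost constructor.
forcedWithBang :: Int
forcedWithBang = length $! holey
```

Both values are 3. A deep force, such as deepseq from the deepseq package, would evaluate the elements too and hit the undefined.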
Note that memoization and laziness are not necessarily opposites of each other. Consider the list:
numbers :: [Integer]
numbers = 0:[i+(sum (genericTake i numbers))|i<-[1..]]
As you can see, calculating a number requires a large amount of computational effort. Numbers is represented like:
numbers ---> (0:[i+(sum (genericTake i numbers))|i<-[1..]])
If, however, I evaluate numbers!!1, it has to calculate the second element and returns 1; the internal structure of numbers is updated as well. Now it looks like:
numbers ---> (0:1:[i+(sum (genericTake i numbers))|i<-[2..]])
The computation numbers!!1 thus will "help" future computations, because you will never have to recalculate the second element in the list.
If you for instance calculate numbers!!4000, it will take a few seconds. Later if you calculate numbers!!4001, it will be calculated almost instantly. Simply because the work already done by numbers!!4000 is reused.
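For reference, a runnable version of the list above; the only additions are the import that genericTake needs and a made-up name, firstFive, for inspecting a prefix:

```haskell
import Data.List (genericTake)

-- Each element is its index plus the sum of all earlier elements.
-- Earlier elements are shared, so they are computed at most once.
numbers :: [Integer]
numbers = 0 : [i + sum (genericTake i numbers) | i <- [1 ..]]

-- Hypothetical helper: forces only the first five cells.
firstFive :: [Integer]
firstFive = take 5 numbers
```

firstFive is [0,1,3,7,15]. After it has been evaluated, numbers !! 3 is a lookup in cells that already exist rather than a fresh computation.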
Arrays might be able to help, but you can also try taking advantage of the deepseq library. So you can write code like this:
let x = take 2000 (iterate parts' (M.fromList [(1,[[1]])])) in do
  x `deepseq` print (x !! 5) -- takes a *really* long time
  print (x !! 1999) -- finishes instantly
You are memoizing the partitions function, but there are some drawbacks to your approach:
you are only memoizing up to a specific value which you have to specify beforehand
you need to call nub and sort
Here is an approach using Data.MemoCombinators (from the data-memocombinators package):
import Data.MemoCombinators
parts :: Int -> [[Int]]
parts = integral go
  where
    go k | k <= 0 = [] -- for safety
    go 1 = [[1]]
    go n = [[n]] ++ [ (a : p) | a <- [n-1,n-2..1], p <- parts (n-a), a >= head p ]
E.g.:
ghci> parts 4
[[4],[3,1],[2,2],[2,1,1],[1,1,1,1]]
This memoization is dynamic, so only the values you actually access will be memoized.
Note how it is constructed - parts = integral go, and go uses parts for any recursive calls. We use the integral combinator here because parts is a function of an Int.

Haskell function example, why infinite list doesnt stop

Why doesn't map sqrt [1..] lead to infinite recursion?
How can I better understand Haskell?
sqrtSums :: Int
sqrtSums = length ( takeWhile (<1000) (scanl1 (+) (map sqrt[1..]))) + 1
Laziness turns lists into streams
Lists in Haskell behave as if they have a built-in iterator or stream interface, because the entire language uses lazy evaluation by default, which means only calculating results when they're needed by the calling function.
In your example,
sqrtSums = length ( takeWhile (<1000) (scanl1 (+) (map sqrt[1..]))) + 1
it's as if length keeps asking takeWhile for another element,
which asks scanl1 for another element,
which asks map for another element,
which asks [1..] for another element.
Once takeWhile gets something that's not <1000, it doesn't ask scanl1 for any more elements, so [1..] never gets fully evaluated.
Thunks
An unevaluated expression is called a thunk, and getting answers out of thunks is called reducing them. For example, the thunk [1..] first gets reduced to 1:[2..]. In a lot of programming languages, by writing the expression, you force the compiler/runtime to calculate it, but not in Haskell. I could write ignore x = 3 and do ignore (1/0) - I'd get 3 without causing an error, because 1/0 doesn't need to be calculated to produce the 3 - it just doesn't appear in the right hand side that I'm trying to produce.
Similarly, you don't need to produce any elements of your list beyond the 131st, because by then the sum has exceeded 1000 and takeWhile ends its output with []; at that point length returns 130 and sqrtSums produces 131.
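The chain of demands can be made visible by naming the intermediate stream. This is only a sketch: runningSums is a made-up name, and the Double annotation is an assumption about the intended numeric type.

```haskell
-- Infinite stream of partial sums: sqrt 1, sqrt 1 + sqrt 2, ...
-- Nothing is computed until a consumer asks for an element.
runningSums :: [Double]
runningSums = scanl1 (+) (map sqrt [1 ..])

-- takeWhile stops pulling from runningSums at the first sum >= 1000,
-- so only a finite prefix of [1 ..] is ever generated.
sqrtSums :: Int
sqrtSums = length (takeWhile (< 1000) runningSums) + 1
```

Evaluating sqrtSums gives 131, and take 3 runningSums shows the first few partial sums without touching the rest of the stream.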
Haskell evaluates expressions lazily. This means that evaluation only occurs when it is demanded. In this example takeWhile (< 1000) repeatedly demands answers from scanl1 (+) (map sqrt [1..]) but stops after one of them exceeds 1000. The moment this starts happening Haskell ceases to evaluate more of the (truly infinite) list.
We can see this in the small by cutting away some pieces from this example
>>> takeWhile (< 10) [1..]
[1,2,3,4,5,6,7,8,9]
Here we have an expression that represents an infinite list ([1..]) but takeWhile is ensuring that the total expression only demands some of those countless values. Without the takeWhile Haskell will try to print the entire infinite list
>>> [1..]
[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24Interrupted.
But again we notice that Haskell demands each element one-by-one only as it needs them in order to print. In a strict language we'd run out of RAM trying to represent the infinite list internally prior to printing the very first answer.

Haskell List Comprehension creating function

I am new to Haskell and am trying to learn the basics. I am having a hard time understanding how to manipulate the contents of a list.
Assume I have the following list and I would like to create a function to subtract 1 from every element in the list, where I can simply pass x to the function, how would this be done?
Prelude>let x = 1:2:3:4:5:[]
Something like:
Prelude>subtractOne(x)
(You can write 1:2:3:4:5:[] more simply as [1,2,3,4,5] or even [1..5].)
Comprehensions
You'd like to use list comprehensions, so here it is:
subtractOne xs = [ x-1 | x <- xs ]
Here I'm using xs to stand for the list I'm subtracting one from.
The first thing to notice is x <- xs which you can read as "x is taken from xs". This means we're going to take each of the numbers in xs in turn, and each time we'll call the number x.
x-1 is the value we're calculating and returning for each x.
For more examples, here's one that adds one to each element [x+1|x<-xs] or squares each element [x*x|x<-xs].
More than one list
Let's take list comprehension a little further, to write a function that finds the squares then the cubes of the numbers we give it, so
> squaresAndCubes [1..5]
[1,4,9,16,25,1,8,27,64,125]
We need
squaresAndCubes xs = [x^p | p <- [2,3], x <- xs]
This means we take the powers p to be 2 then 3, and for each power we take all the xs from xs, and calculate x to the power p (x^p).
What happens if we do that the other way around?
squaresAndCubesTogether xs = [x^p | x <- xs, p <- [2,3]]
We get
> squaresAndCubesTogether [1..5]
[1,1,4,8,9,27,16,64,25,125]
Which takes each x and then gives you the two powers of it straight after each other.
Conclusion - the order of the <- bits tells you the order of the output.
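One way to see why the order matters is to write the two comprehensions as nested loops with concatMap. This is just a sketch with made-up names (squaresThenCubes, powersTogether): the leftmost generator becomes the outer loop.

```haskell
-- p is the outer loop: all squares first, then all cubes.
squaresThenCubes :: [Int] -> [Int]
squaresThenCubes xs = concatMap (\p -> map (\x -> x ^ p) xs) [2, 3 :: Int]

-- x is the outer loop: each number's square and cube side by side.
powersTogether :: [Int] -> [Int]
powersTogether xs = concatMap (\x -> map (\p -> x ^ p) [2, 3 :: Int]) xs
```

On [1..5] these give [1,4,9,16,25,1,8,27,64,125] and [1,1,4,8,9,27,16,64,25,125], matching the two comprehensions above.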
Filtering
What if we wanted to only allow some answers?
Which numbers between 2 and 100 can be written as x^y?
> [x^y|x<-[2..100],y<-[2..100],x^y<100]
[4,8,16,32,64,9,27,81,16,64,25,36,49,64,81]
Here we allowed all x and all y as long as x^y<100.
Since we're doing exactly the same to each element, I'd write this in practice using map:
takeOne xs = map (subtract 1) xs
or shorter as
takeOne = map (subtract 1)
(I have to call it subtract 1 because - 1 would be parsed as negative 1.)
You can do this with the map function:
subtractOne = map (subtract 1)
The alternative solution with List Comprehensions is a little more verbose:
subtractOne xs = [ x - 1 | x <- xs ]
You may also want to add type annotations for clarity.
You can do this quite easily with the map function, but I suspect you want to roll something yourself as a learning exercise. One way to do this in Haskell is to use recursion. This means you need to break the function into two cases. The first case is usually the base case for the simplest kind of input. For a list, this is an empty list []. The result of subtracting one from all the elements of the empty list is clearly an empty list. In Haskell:
subtractOne [] = []
Now we need to consider the slightly more complex recursive case. For any list other than an empty list, we can look at the head and tail of the input list. We will subtract one from the head and then apply subtractOne to the rest of the list. Then we need to concatenate the results together to form a new list. In code, this looks like this:
subtractOne (x:xs) = (x - 1) : subtractOne xs
As I mentioned earlier, you can also do this with map. In fact, it is only one line and the preferred Haskellism. On the other hand, I think it is a very good idea to write your own functions which use explicit recursion in order to understand how it works. Eventually, you may even want to write your own map function for further practice.
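As a practice sketch of that last suggestion, here is a hand-written map (called myMap to avoid clashing with the Prelude) built with the same two-case recursion, and subtractOne defined through it:

```haskell
-- Hand-rolled version of map, for practice only.
myMap :: (a -> b) -> [a] -> [b]
myMap _ []       = []                 -- base case: nothing to transform
myMap f (x : xs) = f x : myMap f xs   -- transform the head, recurse on the tail

-- subtractOne expressed with the hand-rolled map.
subtractOne :: Num a => [a] -> [a]
subtractOne = myMap (subtract 1)
```

For example, subtractOne [1,2,3,4,5] gives [0,1,2,3,4].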
map (subtract 1) x will work.
subtractOne = map (subtract 1)
The map function allows you to apply a function to each element of a list.

Haskell: Minimum sum of list

So, I'm new here, and I would like to ask 2 questions about some code:
Duplicate each element in list by n times. For example, duplicate [1,2,3] should give [1,2,2,3,3,3]
duplicate1 xs = x*x ++ duplicate1 xs
What is wrong in here?
Take positive numbers from list and find the minimum positive subtraction. For example, [-2,-1,0,1,3] should give 1 because (1-0) is the lowest difference above 0.
For your first part, there are a few issues: you forgot the pattern in the first argument, you are trying to square the first element rather than replicate it, and there is no second case to end your recursion (it will crash). To help, here is a type signature:
replicate :: Int -> a -> [a]
For your second part, if it has been covered in your course, you could try a list comprehension to get all differences of the numbers, and then you can apply the minimum function. If you don't know list comprehensions, you can do something similar with concatMap.
Don't forget that you can check functions on http://www.haskell.org/hoogle/ (Hoogle) or similar search engines.
Tell me if you need a more thorough answer.
To your first question:
Use pattern matching. You can write something like duplicate (x:xs). This will deconstruct the first cell of the parameter list. If the list is empty, the next pattern is tried:
duplicate (x:xs) = ... -- list is not empty
duplicate [] = ... -- list is empty
The function replicate n x creates a list that contains n copies of x. For instance replicate 3 'a' yields ['a','a','a'].
Use recursion. To understand, how recursion works, it is important to understand the concept of recursion first ;)
1)
dupe :: [Int] -> [Int]
dupe l = concat [replicate i i | i<-l]
There are a few problems with yours, one being that you are squaring each term, not creating a new list. In addition, your pattern matching is off and you would create an infinite recursion. Note how you recurse on the exact same list as was input. I think you mean something along the lines of duplicate1 (x:xs) = (replicate x x) ++ duplicate1 xs and that would be fine, so long as you write a proper base case as well.
2)
This is pretty straightforward from your problem description, but probably not too efficient. First filter out the negatives, then check all subtractions with non-negative results. The answer is the minimum of these:
p2 l = let l2 = filter (\x -> x >= 0) l
       in minimum [i-j | i<-l2, j<-l2, i >= j]
The problem here is that it allows a number to be checked against itself, which will always lead to an answer of zero. Any ideas? I'd like to leave it to you; the commenter has a point about spoon-feeding.
1) You can use the fact that list is a monad:
dup = (=<<) (\x -> replicate x x)
Or in do-notation:
dup xs = do x <- xs; replicate x x; return x
2) For getting only the non-negative numbers from a list, you can use filter:
filter (>= 0) [1,-1,0,-5,3]
-- [1,0,3]
To get all possible "pairings" you can use either monads or applicative functors:
import Control.Applicative
(,) <$> [1,2,3] <*> [1,2,3]
[(1,1),(1,2),(1,3),(2,1),(2,2),(2,3),(3,1),(3,2),(3,3)]
Of course instead of creating pairs you can generate directly differences when replacing (,) by (-). Now you need to filter again, discarding all zero or negative differences. Then you only need to find the minimum of the list, but I think you can guess the name of that function.
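Putting those steps together, one possible sketch looks like this. The name minPositiveDiff is made up, and the function assumes the input has at least two distinct non-negative numbers; otherwise minimum fails on an empty list.

```haskell
-- Smallest strictly positive difference between two non-negative elements.
minPositiveDiff :: [Int] -> Int
minPositiveDiff xs = minimum (filter (> 0) diffs)
  where
    nonNeg = filter (>= 0) xs          -- keep 0 and the positive numbers
    diffs  = (-) <$> nonNeg <*> nonNeg -- every pairwise difference
```

With the list from the question, minPositiveDiff [-2,-1,0,1,3] gives 1.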
Here, this should do the trick:
dup [] = []
dup (x:xs) = (replicate x x) ++ (dup xs)
We define dup recursively: for empty list it is just an empty list, for a non empty list, it is a list in which the first x elements are equal to x (the head of the initial list), and the rest is the list generated by recursively applying the dup function. It is easy to prove the correctness of this solution by induction (do it as an exercise).
Now, let's analyze your initial solution:
duplicate1 xs = x*x ++ duplicate1 xs
The first mistake: you did not define the list pattern properly. According to your definition, the function has just one argument - xs. To achieve the desired effect, you should use the correct pattern for matching the list's head and tail (x:xs, see my previous example). Read up on pattern matching.
But that's not all. Second mistake: x*x is actually x squared, not a list of two values. Which brings us to the third mistake: ++ expects both of its operands to be lists of values of the same type. While in your code, you're trying to apply ++ to two values of types Int and [Int].
As for the second task, the solution has already been given.
HTH

Resources