ZipWith without evaluating elements more than once in Haskell

I have an infinite list, dog (not the actual name) whose elements are generated by a somewhat-slow function, so I'm trying to avoid having to generate the same element more than once. The problem is I want the following new list, cat:
let cat = zipWith (++) dog $ tail dog
Am I correct in believing that cat is created by evaluating each element of dog twice (from dog and tail dog) then concatenating the two elements? If so, is there a way that I can get Haskell to "realize" that the elements of dog are precisely the same as tail dog just shifted one to the left so that I can "pass the value" of the previous element of tail dog to the current element of dog? That is to say, since I know that the i-th element of dog is equal to the (i - 1) element of tail dog, I want my program to just re-use the (i - 1) element of tail dog instead of recalculating it.
I know lists like the canonical creation of the Fibonacci sequence, let fib = 0 : 1 : zipWith (+) fib (tail fib), only evaluate the elements once; but that is because the list is defined in terms of itself while cat is not.
I apologize if this is a dumb question, but my brain hasn't been firing on all cylinders lately. If knowing the specific list in question will be useful, then I'll be more than happy to provide it. Thanks.

Am I correct in believing that cat is created by evaluating each element of dog twice
No, each element of a list (or more generally: each variable and each element of an algebraic data type or record) will be evaluated at most once.
I know lists like the canonical creation of the Fibonacci sequence [...] only evaluate the elements once; but that is because the list is defined on itself while cat is not.
No, that's not why. It's because a lazy list is not the same thing as the concept of a generator you might know from other languages. A lazy list is an actual data structure that exists (partly) in memory once it's been (partly) evaluated.
That is, after you've used, say, the first ten elements of the list, those elements will actually exist in memory and any subsequent usage of those elements will simply read them from memory rather than calculating them again.
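To illustrate the point, here is a small sketch (the names and the stand-in generator are illustrative, not the asker's actual code): the sharing comes from dog being a single named value. If the list were instead rebuilt by calling a generating function twice, nothing would be shared and the work would be redone:
-- shared: both zipWith arguments walk the same in-memory list
catShared :: [String] -> [String]
catShared dog = zipWith (++) dog (tail dog)

-- not shared: the generator is called twice, so every element is built twice
catUnshared :: (Int -> [String]) -> [String]
catUnshared makeDog = zipWith (++) (makeDog 0) (tail (makeDog 0))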

@sepp2k is correct, but at the same time, sometimes it is useful to be able to verify these things directly.... (here it was obvious, but in slightly more complicated cases, it isn't as clear).
You can always watch the program in action by using the (unsafe) trace function, like this....
import Debug.Trace

main :: IO ()
main = do
    let dog = listFrom 0
        cat = zipWith (++) dog $ tail dog
    print $ take 10 cat

listFrom :: Int -> [String]
listFrom x = trace ("in listFrom: " ++ show x) $
             show x : listFrom (x+1)
This will show you how many times each element was calculated (followed by the output of the program)....
in listFrom: 0
in listFrom: 1
in listFrom: 2
in listFrom: 3
in listFrom: 4
in listFrom: 5
in listFrom: 6
in listFrom: 7
in listFrom: 8
in listFrom: 9
in listFrom: 10
["01","12","23","34","45","56","67","78","89","910"]
As expected, it creates each item only once.... More interestingly, you can see that (because of laziness), no item in the list is created if you don't use the created list.... For instance, change
print $ take 10 cat
to
putStrLn "Not using cat"
and nothing gets printed out
> runProgram
Not using cat
(Just remember, though, that trace is unsafe and should never be used in the final program; it is only intended for debugging.)

Related

length vs head vs last in Haskell

I know 2 of these statements are true, but I don't know which.
Let e be an expression of type [Int].
1. There exists e such that: evaluation of head e won't finish but last e will.
2. There exists e such that: evaluation of last e won't finish but head e will.
3. There exists e such that: evaluation of length e won't finish but last e will.
Seems clear to me that 2 is true, but I can't see how 1 or 3 can be true.
My thinking is that in order to calculate the length of a list you need to get to the last one, making 1 and 3 impossible
Since this is a test question, I won't answer it directly, but instead, here are some hints; it'd be better if you work this out yourself.
Since we're talking about computations that don't terminate, it might be useful to define one such computation. However, if this confuses you, you can safely ignore it and refer only to the examples that don't include it.
-- `never` never terminates when evaluated, and can be any type.
never :: a
never = never
Question 1
Consider the list [never, 1], or alternatively the list [last [1..], 1] as suggested by @chi.
Question 2
Consider the list [1..], or alternatively the list [1, never].
Question 3
Consider the definition of length:
length [] = 0
length (_:xs) = 1 + length xs
Under what conditions does length not terminate? How does this relate to last?

Build a specific list of strings from a string in Haskell

I'm starting to learn Haskell and I'm stuck on a problem.
I read from the standard input a string like "1234" or "azer"
and I want to make a list like ["123", "234", "341", "412"] or ["aze", "zer", "era", "raz"].
I probably need to use map but I don't know how to proceed.
Can someone help me with that? Thanks
Let's start with a list, [1..4]. Let's repeat it for eternity:
>>> cycle [1..4]
[1,2,3,4,1,2,3,4,1,2,3,4,...
Now let's take a slice of it starting at, say, the 2nd element:
>>> take 4 $ drop (2-1) $ cycle [1..4]
[2,3,4,1]
We can generalize this by naming a function:
slice n = take 4 $ drop n $ cycle [1..4]
To obtain all possible cyclic permutations, we only need to sample n from 1 to 4:
>>> map slice [1..4]
[[2,3,4,1],[3,4,1,2],[4,1,2,3],[1,2,3,4]]
Now, how can we make this work with an arbitrary string? Let's redefine slice to accept a string:
slice s n = take (length s) $ drop n $ cycle s
And so our cyclic permutations function can be defined as follows:
cyclicPerms s = map (slice s) [1..(length s)]
Testing:
>>> cyclicPerms "abcde"
["bcdea","cdeab","deabc","eabcd","abcde"]
I had originally posted an answer totally misunderstanding the specification. I, like a Haskell enumeration, read only the first two numbers so I thought it continued as such. Oops. In any event I just adapted a chunks function I wrote to produce the repetitions. When I get home, I think I have another that cycles lists. I'll post it as well if it's not the same. Who knows.
This function allows you to specify the chunk size as well as the list.
cychnks n ls = [take n . drop x $ ls2 | (x, _) <- zip [0..] ls]
  where ls2 = ls ++ ls
cychnks 5 "abcde"
["abcde","bcdea","cdeab","deabc","eabcd"]
cychnks 3 "abcde"
["abc","bcd","cde","dea","eab"]

Why does concatenation of lists take O(n)?

According to the theory of ADTs (Algebraic Data Types) the concatenation of two lists has to take O(n) where n is the length of the first list. You, basically, have to recursively iterate through the first list until you find the end.
From a different point of view, one can argue that the second list can simply be linked to the last element of the first. This would take constant time, if the end of the first list is known.
What am I missing here ?
Operationally, a Haskell list is typically represented by a pointer to the first cell of a singly linked list (roughly). In this way, tail just returns the pointer to the next cell (it does not have to copy anything), and consing x : in front of the list allocates a new cell, makes it point to the old list, and returns the new pointer. The list accessed by the old pointer is unchanged, so there's no need to copy it.
If you instead append a value with ++ [x], then you cannot modify the original linked list by changing its last pointer unless you know that the original list will never be accessed. More concretely, consider
x = [1..5]
n = length (x ++ [6]) + length x
If you modified x when doing x ++ [6], the value of n would turn out to be 12, which is wrong. The last x refers to the unchanged list, which has length 5, so the result n must be 11.
Practically, you can't expect the compiler to optimize this, even in those cases in which x is no longer used and it could, theoretically, be updated in place (a "linear" use). What happens is that the evaluation of x ++ [6] must be ready for the worst case in which x is reused afterwards, and so it must copy the whole list x.
As @Ben notes, saying "the list is copied" is imprecise. What actually happens is that the cells with the pointers are copied (the so-called "spine" of the list), but the elements are not. For instance,
x = [[1,2],[2,3]]
y = x ++ [[3,4]]
requires allocating the inner lists [1,2],[2,3],[3,4] only once. The lists of lists x and y will share pointers to those lists of integers, which do not have to be duplicated.
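You can see where the O(n) comes from in the definition of (++) itself; this is essentially the Prelude definition, with the Prelude's own version hidden so the snippet loads on its own. The spine of the first list is rebuilt cell by cell, while the second list is reused as-is:
import Prelude hiding ((++))

(++) :: [a] -> [a] -> [a]
[]     ++ ys = ys
(x:xs) ++ ys = x : (xs ++ ys)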
What you're asking for is related to a question I wrote for TCS Stackexchange some time back: the data structure that supports constant-time concatenation of functional lists is a difference list.
A way of handling such lists in a functional programming language was worked out by Yasuhiko Minamide in the 90s; I effectively rediscovered it a while back. However, the good run-time guarantees require language-level support that's not available in Haskell.
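A minimal sketch of the basic idea (not Minamide's full construction): represent a list as a function that prepends it to whatever comes after, so appending becomes function composition, which is O(1); the cost of the real (++) is deferred until toList.
type DList a = [a] -> [a]

fromList :: [a] -> DList a
fromList xs = (xs ++)

toList :: DList a -> [a]
toList d = d []

append :: DList a -> DList a -> DList a
append f g = f . g   -- constant-time append; the (++) work happens only in toList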
It's because of immutable state. A list is an object + a pointer, so if we imagined a list as a Tuple it might look like this:
let tupleList = ("a", ("b", ("c", [])))
Now let's get the first item in this "list" with a "head" function. This head function takes O(1) time because we can use fst:
> fst tupleList
If we want to swap out the first item in the list with a different one we could do this:
let tupleList2 = ("x",snd tupleList)
Which can also be done in O(1). Why? Because absolutely no other element in the list stores a reference to the first entry. Because of immutable state, we now have two lists, tupleList and tupleList2. When we made tupleList2 we didn't copy the whole list. Because the original pointers are immutable we can continue to reference them but use something else at the start of our list.
Now let's try to get the last element of our 3 item list:
> fst . snd . snd $ tupleList
That took three steps, which is equal to the length of our list, i.e. O(n).
But couldn't we store a pointer to the last element in the list and access that in O(1)? To do that we would need an array, not a list. An array allows O(1) lookup of any element because its elements sit in one contiguous block of memory, so any index can be reached directly.
(ASIDE: If you're unsure of why we would use a Linked List instead of an Array then you should do some more reading about data structures, algorithms on data structures and Big-O time complexity of various operations like get, poll, insert, delete, sort, etc).
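(For contrast, and purely as an aside: this is roughly what O(1) indexing looks like with an immutable array from the standard array package.)
import Data.Array

letters :: Array Int Char
letters = listArray (0, 4) "abcde"

-- (!) reaches any index in constant time
lastLetter :: Char
lastLetter = letters ! 4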
Now that we've established that, let's look at concatenation. Let's concat tupleList with a new list, ("e", ("f", [])). To do this we have to traverse the whole list just like getting the last element:
tupleList3 = (fst tupleList, (fst (snd tupleList), (fst (snd (snd tupleList)), ("e", ("f", [])))))
The above operation is actually worse than O(n) time, because for each element in the list we have to re-read the list up to that index. But if we ignore that for a moment and focus on the key aspect: in order to get to the last element in the list, we must traverse the entire structure.
You may be asking, why don't we just store in memory what the last list item is? That way appending to the end of the list would be done in O(1). But not so fast, we can't change the last list item without changing the entire list. Why?
Let's take a stab at how that might look:
data Queue a = Queue { last :: Queue a, head :: a, next :: Queue a } | Empty

appendEnd :: a -> Queue a -> Queue a
appendEnd a2 (Queue l h n) = ????
If I modify last, which is an immutable field, I won't actually be modifying the pointer for the last item in the queue; I will be creating a copy of the last item. Everything else that referenced the original item will continue referencing the original item.
So in order to update the last item in the queue, I have to update everything that has a reference to it, which can at best be done in O(n) time.
So in our traditional list, we have our final item:
List a []
But if we want to change it, we make a copy of it. Now the second last item has a reference to an old version. So we need to update that item.
List a (List a [])
But if we update the second last item we make a copy of it. Now the third last item has an old reference. So we need to update that. Repeat until we get to the head of the list. And we come full circle. Nothing keeps a reference to the head of the list so editing that takes O(1).
This is the reason that Haskell doesn't have Doubly Linked Lists. It's also why a "Queue" (or at least a FIFO queue) can't be implemented in a traditional way. Making a Queue in Haskell involves some serious re-thinking of traditional data structures.
If you become even more curious about how all of this works, consider getting the book Purely Functional Data Structures.
EDIT: If you've ever seen this: http://visualgo.net/list.html you might notice that in the visualization "Insert Tail" happens in O(1). But in order to do that we need to modify the final entry in the list to give it a new pointer. Updating a pointer mutates state which is not allowed in a purely functional language. Hopefully that was made clear with the rest of my post.
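As a taste of that re-thinking, here is a minimal sketch of the classic two-list functional queue, along the lines of what Purely Functional Data Structures covers (the names here are illustrative). Enqueuing conses onto a reversed back list, and the occasional reverse gives amortized O(1) operations without ever mutating a cell:
data FQueue a = FQueue [a] [a]   -- front of the queue, back of the queue (reversed)

emptyQ :: FQueue a
emptyQ = FQueue [] []

enqueue :: a -> FQueue a -> FQueue a
enqueue x (FQueue front back) = FQueue front (x : back)

dequeue :: FQueue a -> Maybe (a, FQueue a)
dequeue (FQueue [] [])       = Nothing
dequeue (FQueue [] back)     = dequeue (FQueue (reverse back) [])
dequeue (FQueue (x:xs) back) = Just (x, FQueue xs back)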
In order to concatenate two lists (call them xs and ys), we need to modify the final node in xs in order to link it to (i.e. point at) the first node of ys.
But Haskell lists are immutable, so we have to create a copy of xs first. This operation is O(n) (where n is the length of xs).
Example:
xs
|
v
1 -> 2 -> 3

1 -> 2 -> 3 -> 4 -> 5 -> 6 -> 7
^              ^
|              |
xs ++ ys       ys

How does this Haskell code work?

I'm a new student and I'm studying in Computer Sciences. We're tackling Haskell, and while I understand the idea of Haskell, I just can't seem to figure out how exactly the piece of code we're supposed to look at works:
module U1 where
double x = x + x
doubles (d:ds) = (double d):(doubles ds)
ds = doubles [1..]
I admit, it seems rather simple for someone who knows what's happening, but I can't wrap my head around it. If I write "take 5 ds", it obviously gives back [2,4,6,8,10]. What I don't get is why.
Here's my train of thought: I call ds, which then looks for doubles. Because I also submit the value [1..], doubles (d:ds) should mean that d = 1 and ds = [2..], correct? I then double the d, which returns 2 and puts it at the start of a list (array?). Then it calls upon itself, transferring ds = [2..] to d = 2 and ds = [3..], which then doubles d again and again calls upon itself, and so on and so forth until it can return 5 values, [2,4,6,8,10].
So first of all, is my understanding right? Do I have any grave mistakes in my train of thought?
Second of all, since it seems to save all doubled d into a list to call on later, what's the name of that list? Where exactly did I define it?
Thanks in advance, hope you can help out a student to understand this x)
I think you are right about the recursion/loop part, i.e. how doubles goes through each element of the infinite list.
Now regarding
it seems to save all doubled d into a list to call on later, what's the name of that list? Where exactly did I define it?
This relates to a feature called Lazy Evaluation in Haskell. The list isn't precomputed and stored anywhere. Instead, you can imagine that a list is like a function object in C++ that can generate elements when needed. (The usual language you may see is that expressions are evaluated on demand.) So when you do
take 5 [1..]
[1..] can be viewed as a function object that generates numbers when used with head, take etc. So,
take 5 [1..] == (1 : take 4 [2..])
Here [2..] is also a "function object" that gives you numbers. Similarly, you can have
take 5 [1..] == (1 : 2 : take 3 [3..]) == ... (1 : 2 : 3 : 4 : 5 : take 0 [6..])
Now, we don't need to care about [6..], because take 0 xs for any xs is []. Therefore, we can have
take 5 [1..] == (1 : 2 : 3 : 4 : 5 : [])
without needing to store any of the "infinite" lists like [2..]. They may be viewed as function objects/generators if you want to get an idea of how Lazy computation can actually happen.
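The unfolding above matches the way take itself is defined; this is essentially the textbook definition (GHC's actual implementation differs in details but behaves the same), written with the Prelude's take hidden so it loads on its own:
import Prelude hiding (take)

take :: Int -> [a] -> [a]
take n _ | n <= 0 = []
take _ []         = []
take n (x:xs)     = x : take (n - 1) xs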
Your train of thought looks correct. The only minor inaccuracy in it lies in describing the computation using expressions such as "it doubles 2 and then calls itself ...". In pure functional programming languages, such as Haskell, there actually is no fixed evaluation order. Specifically, in
double 1 : doubles [2..]
it is left unspecified whether doubling 1 happens before of after doubling the rest of the list. Theoretical results guarantee that order is indeed immaterial, in that -- roughly -- even if you evaluate your expression in a different order you will get the same result. I would recommend that you see this property at work using the Lambda Bubble Pop website: there you can pop bubbles in a different order to simulate any evaluation order. No matter what you do, you will get the same result.
Note that, because evaluation order does not matter, the Haskell compiler is free to choose any evaluation order it deems to be the most appropriate for your code. For instance, let ds be defined as in the final line in your code, and consider
take 5 (drop 5 ds)
this results in [12,14,16,18,20]. Note that the compiler has no need to double the first 5 numbers, since you are dropping them, so they can be dropped before they are completely computed (!!).
If you want to experiment, define yourself a function which is very expensive to compute (say, write fibonacci following the recursive definition).
fibonacci 0 = 0
fibonacci 1 = 1
fibonacci n = fibonacci (n-1) + fibonacci (n-2)
Then, define
const5 n = 5
and compute
fibonacci 100
and observe how long that actually takes (with this naive definition it will not finish in any reasonable time, so feel free to interrupt it, or try a smaller argument such as fibonacci 30 first). Then, evaluate
const5 (fibonacci 100)
and see that the result is immediately reached -- the argument was not even computed (!) since there was no need for it.

Generating All Possible Paths in Haskell

I am very bad at wording things, so please bear with me.
I am doing a problem that requires me to generate all possible numbers in the form of a list of lists, in Haskell.
For example if I have x = 3 and y = 2, I have to generate a list of lists like this:
[[1,1,1], [1,2,1], [2,1,1], [2,2,1], [1,1,2], [1,2,2], [2,1,2], [2,2,2]]
x and y are passed into the function and it has to work with any nonzero positive integers x and y.
I am completely lost and have no idea how to even begin.
For anyone kind enough to help me, please try to keep any math-heavy explanations as easy to understand as possible. I am really not good at math.
Assuming that this is homework, I'll give you part of the answer, and show you how I think through this sort of problem. It's helpful to experiment in GHCi, and build up the pieces we need. One thing we need is to be able to generate a list of numbers from 1 through y. Suppose y is 7. Then:
λ> [1..7]
[1,2,3,4,5,6,7]
But as you'll see in a moment, what we really need is not a simple list, but a list of lists that we can build on. Like this:
λ> map (:[]) [1..7]
[[1],[2],[3],[4],[5],[6],[7]]
This basically says to take each element of the list and cons it onto the empty list []. So now we can write a function to do this for us.
makeListOfLists y = map (:[]) [1..y]
Next, we need a way to prepend a new element to every element in a list of lists. Something like this:
λ> map (99:) [[1],[2],[3],[4],[5],[6],[7]]
[[99,1],[99,2],[99,3],[99,4],[99,5],[99,6],[99,7]]
(I used 99 here instead of, say, 1, so that you can easily see where the numbers come from.) So we could write a function to do that:
prepend x yss = map (x:) yss
Ultimately, we want to be able to take a list and a list of lists, and invoke prepend on every element in the list to every element in the list of lists. We can do that using the map function again. But as it turns out, it will be a little easier to do that if we switch the order of the arguments to prepend, like this:
prepend2 yss x = map (x:) yss
Then we can do something like this:
λ> map (prepend2 [[1],[2],[3],[4],[5],[6],[7]]) [97,98,99]
[[[97,1],[97,2],[97,3],[97,4],[97,5],[97,6],[97,7]],[[98,1],[98,2],[98,3],[98,4],[98,5],[98,6],[98,7]],[[99,1],[99,2],[99,3],[99,4],[99,5],[99,6],[99,7]]]
So now we can write that function:
supermap xs yss = map (prepend2 yss) xs
Using your example, if x=2 and y=3, then the answer we need is:
λ> let yss = makeListOfLists 3
λ> supermap [1..3] yss
[[[1,1],[1,2],[1,3]],[[2,1],[2,2],[2,3]],[[3,1],[3,2],[3,3]]]
(If that was all we needed, we could have done this more easily using a list comprehension. But since we need to be able to do this for an arbitrary x, a list comprehension won't work.)
Hopefully you can take it from here, and extend it to arbitrary x.
For a specific x, as already mentioned, a list comprehension would do the trick; assuming that x equals 3, one would write the following:
generate y = [[a,b,c] | a<-[1..y], b<-[1..y], c <-[1..y]]
But life gets much more complicated when x is not predetermined. I don't have much experience of programming in Haskell, I'm not acquainted with library functions and my approach is far from being the most efficient solution, so don't judge it too harshly.
My solution consists of two functions:
strip [] = []
strip (h:t) = h ++ strip t

populate y 2 = strip (map (\a -> map (:a:[]) [1..y]) [1..y])
populate y x = strip (map (\a -> map (:a) [1..y]) (populate y (x - 1)))
strip is defined for nested lists. By merging the list items it reduces the nesting by one level, so to speak. For example calling
strip [[1],[2],[3]]
generates the output:
[1,2,3]
populate is the tricky one.
On the last step of the recursion, when the second argument equals 2, the function combines each item of [1..y] with every element of the same list into a new list. For example
map (\a -> map (:a:[]) [1..2]) [1..2]
generates the output:
[[[1,1],[2,1]],[[1,2],[2,2]]]
and the strip function turns it into:
[[1,1],[2,1],[1,2],[2,2]]
As for the initial step of the recursion, when x is more than 2, populate does almost the same thing, except this time it combines the items of [1..y] with the lists generated by the recursive call. And finally:
populate 2 3
gives us the desired result:
[[1,1,1],[2,1,1],[1,2,1],[2,2,1],[1,1,2],[2,1,2],[1,2,2],[2,2,2]]
As I mentioned above, this approach is neither the most efficient nor the most readable one, but I think it solves the problem. In fact, theoretically the only way of solving this without heavy use of recursion would be building a string with a list comprehension in it and then compiling that string dynamically, which, in my short experience as a programmer, is never a good solution.
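(For completeness, there is also a library route that avoids hand-rolled recursion: replicateM from Control.Monad performs exactly this kind of repeated choice. The function name below is just for illustration, and the ordering of the results differs from the answers above.)
import Control.Monad (replicateM)

paths :: Int -> Int -> [[Int]]
paths x y = replicateM x [1..y]

-- paths 3 2 == [[1,1,1],[1,1,2],[1,2,1],[1,2,2],[2,1,1],[2,1,2],[2,2,1],[2,2,2]]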
