I am going through a Haskell tutorial about lists, and it claims:
Watch out when repeatedly using the ++ operator on long strings ... Haskell has to walk through the whole list on the left side of ++. ... However, putting something at the beginning of a list using the : operator (also called the cons operator) is instantaneous.
But, in my mind, things should be the other way around.
: has to go through all elements in the list because it needs to shift all the indices. ++, on the other hand, can just append a new element at the end of the list and be done with it, hence instantaneous.
Any help understanding this statement?
A list in Haskell is just a singly-linked list. A list of, say, Char, is either [], the empty list, or c : cs, where c is a Char and cs is a list of Char. To produce c : cs given c and cs, all the implementation needs to do is allocate a record with a tag indicating (:) and copies of the pointers c and cs. This is extremely cheap.
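To see why ++ is the expensive one, here is a sketch close to how the Prelude defines it (renamed `append` here to avoid clashing with the real operator):

```haskell
-- A sketch of (++), close to the Prelude definition. It must
-- rebuild a new (:) cell for every element of the left list
-- before it can reuse the right list, hence O(n) in the length
-- of the left argument.
append :: [a] -> [a] -> [a]
append []     ys = ys
append (x:xs) ys = x : append xs ys
```

By contrast, `c : cs` allocates exactly one cell and is done, no matter how long `cs` is.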
Haskell lists are constructed by a sequence of calls to cons, after desugaring syntax:
Prelude> (:) 1 $ (:) 2 $ (:) 3 []
[1,2,3]
Are lists lazy due to them being such a sequence of function calls?
If this is true, how can the runtime access the values while calling the chain of functions?
Is access by index also syntactic sugar?
How could we express it in another, less sugared way than this:
Prelude> (!!) lst 1
2
The underlying question here might be:
Are lists fundamental entities in Haskell, or they can be expressed as composition of more essential concepts?
Is it possible to represent lists in the simplest lambda calculus?
I am trying to implement a language where the list is defined in the standard library, not as a special entity hardwired directly into the parser/interpreter/runtime.
Lists in Haskell are special in syntax, but not fundamentally.
Fundamentally, a Haskell list is defined like this:
data [] a = [] | (:) a ([] a)
Just another data type with two constructors, nothing to see here, move along.
The above is kind of a pseudocode though, because you can't actually define something like that yourself: neither [] nor (:) is a valid constructor name. A special exception is made for the built-in lists.
But you could define the equivalent, something like:
data MyList a = Nil | Cons a (MyList a)
And this would work exactly the same with regards to memory management, laziness, and so on, but it won't have the nice square-bracket syntax (at least in Haskell 2010; in modern GHC you can get the special syntax for your own types too, thanks to overloaded lists).
As far as laziness goes, that's not special to lists, but is special to data constructors, or more precisely, pattern matching on data constructors. In Haskell, every computation is lazy. This means that whatever crazy chain of function calls you may construct, it's not evaluated right away. Doesn't matter if it's a list construction or some other function call. Nothing is evaluated right away.
But when is it evaluated then? The answer is in the spec: a value is evaluated when somebody tries to pattern match on it, and at that moment it's evaluated up to the data constructor being matched. So, for lists, this would be when you go case myList of { [] -> "foo"; x:xs -> "bar" } - that's when the call chain is evaluated up to the first data constructor, which is necessary in order to decide whether that constructor is [] or (:), which is necessary for evaluating the case expression.
Index access is also not special, it works on the exact same principle: the implementation of the (!!) operator (check out the source code) repeatedly (recursively) matches on the list until it discovers N (:) constructors in a row, at which point it stops matching and returns whatever was on the left of the last (:) constructor.
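A sketch of that logic (the real Prelude version also rejects negative indices; `index` is my name here, to avoid shadowing the real operator):

```haskell
-- Walks the list, shedding one (:) constructor per step, until
-- the counter hits zero. O(n) in the index.
index :: [a] -> Int -> a
index (x:_)  0 = x
index (_:xs) n = index xs (n - 1)
index []     _ = error "index too large"
```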
In the "simplest" lambda calculus, in the absence of data constructors or primitive types, I reckon your only choice is to Church-encode lists (e.g. as a fold, or directly as a catamorphism) or build them up from other structures (e.g. pairs), which are themselves Church-encoded. Lambda calculus, after all, only has functions and nothing else. Unless you mean something more specific.
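To make the Church-encoding idea concrete, here is a sketch in Haskell itself. The `newtype` wrapper and the names are mine, added only to satisfy the type checker; in untyped lambda calculus the wrapper disappears and only the functions remain:

```haskell
{-# LANGUAGE RankNTypes #-}

-- A Church-encoded list is "its own right fold": a value that,
-- given a cons-like function and a nil-like value, folds itself.
newtype Church a = Church (forall r. (a -> r -> r) -> r -> r)

cnil :: Church a
cnil = Church (\_cons z -> z)

ccons :: a -> Church a -> Church a
ccons x (Church xs) = Church (\cons z -> cons x (xs cons z))

-- Recover an ordinary list by folding with the real constructors.
toList :: Church a -> [a]
toList (Church xs) = xs (:) []
```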
I'm just starting out with Haskell, so I'm trying to wrap my head around the "Haskell way of thinking." Is there a reason to use pattern matching to solve Problem 1 here basically by unwrapping the whole list and calling the function recursively, instead of just retrieving the last element directly like myLast lst = lst !! ((length lst) - 1)? It seems almost brute-force, but I assume it's just my lack of familiarity here.
A few things I can think of:
(!!) and length are ultimately implemented using recursion over the structure of the list. That being so, it can be a worthwhile learning exercise to implement those basic functions using explicit recursion.
Keep in mind that, under the hood, the retrieval of the last element is not direct. Since we are dealing with linked lists, length has to go through all elements of the lists, and (!!) has to go through all elements up to the desired index. That being so, lst !! (length lst - 1) runs through the whole list twice, rather than once. (This is one of the reasons why, as a rule of thumb, length is better avoided unless you actually need to know the number of elements in and of itself, and not just as a proxy to something else.)
Pattern matching is a neat way of stating facts about the structure of data types. If, while consuming a list recursively, you match a [x] pattern (or, equivalently, x : [] -- an element consed to the empty list), you know that x is the last element. In a way, matching [x] involves one less level of indirection than accessing the list element at index length lst - 1, as it only deals with the structure of the list, without requiring an indexing scheme to be bolted on the top of it.
With all that said, there is something fundamentally right about your feeling that explicit recursion feels "almost brute-force". In time, you'll find out about folds, mapping functions, and other ways to capture and abstract common recursive patterns, making it possible to write in a more fluent manner.
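For concreteness, the pattern-matching version discussed above might look like this (a sketch; there are several reasonable spellings):

```haskell
-- [x] matches a list with exactly one element left: the last one.
myLast :: [a] -> a
myLast [x]    = x
myLast (_:xs) = myLast xs
myLast []     = error "myLast: empty list"
```

This makes a single pass, whereas `lst !! (length lst - 1)` makes two.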
From my understanding, lazy evaluation means that arguments are not evaluated before they are passed to a function, but only when their values are actually used.
But in a Haskell tutorial, I see this example:
xs = [1,2,3,4,5,6,7,8]
doubleMe(doubleMe(doubleMe(xs)))
The author said an imperative language would probably pass through the list once and make a copy and then return it. Then it would pass through the list another two times and return the result.
But in a lazy language, it would first compute
doubleMe(doubleMe(doubleMe(1)))
This will give back a doubleMe(1), which is 2. Then 4, and finally 8.
So it only does one pass through the list and only when you really need it.
This makes me confused. Why doesn't a lazy language take the list as a whole, instead of splitting it? I mean, we can ignore what the list or the expression is before we use it, but don't we need to evaluate the whole thing once we actually use it?
A list like [1,2,3,4,5,6,7,8] is just syntactic sugar for this: 1:2:3:4:5:6:7:8:[].
In this case, all the values in the list are numeric constants, but we could define another, smaller list like this:
1:1+1:[]
All Haskell lists are linked lists, which means that they have a head and a tail. In the above example, the head is 1, and the tail is 1+1:[].
If you only want the head of the list, there's no reason to evaluate the rest of the list:
(h:_) = 1:1+1:[]
Here, h refers to 1. There's no reason to evaluate the rest of the list (1+1:[]) if h is all you need.
That's how lists are lazily evaluated. 1+1 remains a thunk (an unevaluated expression) until the value is required.
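You can watch this happen: `undefined` throws an error the moment it is evaluated, yet the following runs fine, because taking the head never forces the tail (a small demo of my own, in the spirit of the answer above):

```haskell
-- The tail here is 'undefined', which would crash if forced.
-- Matching (h:_) only evaluates the outermost (:) constructor,
-- so the tail stays an untouched thunk.
headOnly :: Int
headOnly = let (h:_) = 1 : undefined in h
```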
I am wondering exactly how list-comprehensions are evaluated in Haskell. After reading this Removing syntactic sugar: List comprehension in Haskell and this: Haskell Lazy Evaluation and Reuse I still don't really understand if
[<function> x | x <- <someList>, <somePredicate> x]
is actually exactly equivalent (not just in outcome but in evaluation) to
map (<function>) . filter (<somePredicate>) $ <someList>
and if so, does this mean it can potentially reduce time complexity drastically to build up the desired list with only desired elements?
Also, how does this work in terms of infinite lists? To be specific: I assume something like:
[x|x <- [1..], x < 20]
will be evaluated in finite time, but how "obvious" does the fact that there are no more elements above some value which satisfy the predicate need to be, for the compiler to consider it? Would
[x|x <- [1..], (sum.map factorial $ digits x) == x]
work (see Project Euler problem 34, https://projecteuler.net/problem=34)? There is obviously an upper bound, because for an n-digit number the digit-factorial sum is at most n*9!, which falls below 10^(n-1) once n is large enough, but do I need to supply that bound or will the compiler find it?
There's nothing obvious about the fact that a particular infinite sequence has no more elements matching a predicate. When you pass a list to filter, it has no way of knowing any other properties of the elements than that an element can be passed to the predicate.
You can write your own version of Ord a => List a which can describe a sequence as ascending or not, and a version of filter that can use this information to stop looking at elements past a particular threshold. Unfortunately, list comprehensions won't use either of them.
Instead, I'd use a combination of takeWhile and a comprehension without a predicate / a plain map. Somewhere in the takeWhile arguments, you will supply the compiler the information about the expected upper bound; for a number of n decimal digits, it would be 10^n.
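As a sketch of that approach for the curious-number search: the bound of 10^5 below is my own choice to keep the demo fast; the full Euler 34 argument gives a bound around 7 * 9!, which is what a complete search would use.

```haskell
import Data.Char (digitToInt)

factorial :: Int -> Int
factorial n = product [1 .. n]   -- product [] == 1, so 0! works too

digits :: Int -> [Int]
digits = map digitToInt . show

-- takeWhile supplies the upper bound that filter alone cannot
-- infer from an infinite list. 10^5 is an assumed demo bound.
curious :: [Int]
curious = [ x
          | x <- takeWhile (< 10 ^ 5) [3 ..]
          , sum (map factorial (digits x)) == x
          ]
```

Because `takeWhile` produces a finite list, the comprehension terminates; with `filter` alone it would search forever.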
[<function> x | x <- <someList>, <somePredicate> x]
should always evaluate to the same result as
map (<function>) . filter (<somePredicate>) $ <someList>
However, there is no guarantee that this is how the compiler will actually do it. The section on list comprehensions in the Haskell Report only mentions what list comprehensions should do, not how they should work. So each compiler is free to do as its developers find best. Therefore, you should not assume anything about the performance of list comprehensions or that the compiler will do any optimizations.
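For reference, the Report's translation rules turn a comprehension into `concatMap` and conditionals; applied by hand to a small example (the translation here is my own paraphrase of those rules):

```haskell
-- Roughly how [x * 2 | x <- [1..10], even x] desugars:
-- a generator becomes concatMap, a guard becomes if/then/else.
desugared :: [Int]
desugared = concatMap (\x -> if even x then [x * 2] else []) [1 .. 10]
```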
If it's a linked list, why doesn't it support push_back?
If it's simply an array, why does access by subscript take linear time?
I'd appreciate your help.
Edit: We can prepend an element to a list like this: 1:[2,3]; that's push_front. But we can't write [2,3]:4; there is no push_back.
P.S. I actually borrowed push_front/push_back from C++'s STL.
Haskell lists are singly linked lists. Appending to the end of a list is supported, but the operation has to traverse the entire list:
λ> let x = [1,2,3]
λ> x ++ [4]
[1,2,3,4]
If by push_back you mean adding an element to the end, of course it "supports" it. To add an element x to a list list (and get a new list so constructed), use the expression list ++ [x].
Beware though, that this is an O(n) operation, n being the length of list.
That's because it is a singly linked list: reaching position n means following n links from the head. The same structure answers your other question about why subscript access takes linear time.
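If you want a named push_back, you can write one yourself; this sketch makes the O(n) cost visible, since every (:) cell of the original list gets rebuilt:

```haskell
-- snoc ("cons" backwards) appends a single element: O(n),
-- because it reconstructs the whole spine of the list.
snoc :: [a] -> a -> [a]
snoc []     y = [y]
snoc (x:xs) y = x : snoc xs y
```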
Since the question asks about the "underlying implementation":
The list type is (conceptually) defined like this:
data [a] = [] | a:[a]
You can't actually write this declaration, because lists have funky syntax. But you can make exactly the same kind of thing yourself:
data List a = Nil | a :* List a
infixr 5 :*
When you write something like [1,2,3], that's just special syntax for 1:2:3:[]. Since : associates to the right, this really means 1:(2:(3:[])). The functions on lists are all, fundamentally, based on pattern matching, which deals with just one constructor at a time. You may have to open up a lot of : constructors to get to the end of a list, and then build a bunch of : constructors to put together a new version with an extra element at the end. Any functions (like ++) that you may use will end up going through this sort of process internally.