Is the memory of compiled/eval’ed procedures garbage-collected in Chez Scheme? - garbage-collection

Multiple, perhaps most, language implementations that include a compiler at runtime neglect to garbage-collect discarded code (See, for example julia, where this leads to memory leaks in applications like genetic-programming)
My preliminary tests indicate that Chez Scheme does not leak memory here, but I would like to know with greater certainty, since I don't even know if f and g actually get compiled. (The old mantra: "Tests can only prove the presence of bugs, not their absence")
The test I tried: f and g call each other, and their definitions get replaced at runtime.
(define f)
(define g)
(define (make-f x)
(eval `(set! f (lambda (y)
(if (> y 100)
(+ (remainder ,x 3) (g y))
(+ y 1))))))
(define (make-g x)
(eval `(set! g (lambda (y)
(if (< y 10)
(+ (remainder ,x 5) (f y))
(div y 2))))))
(define (make-and-run-f n)
(begin
(make-f 1)
(make-g 1)
(let loop ((i 0) (acc 0))
(if (> i n)
acc
(begin
(make-f i)
(make-g i)
(loop (+ i 1) (+ acc (f 33))))))))
(time (make-and-run-f 1000000)) ; runs in 10 min and negligible memory

Given the importance of both procedures and garbage collection to Scheme, I would be surprised if Chez Scheme did not try to garbage collect any dynamically created objects. The R6RS Standard says [emphasis mine]:
All objects created in the course of a Scheme computation, including procedures and continuations, have unlimited extent. No Scheme object is ever destroyed. The reason that implementations of Scheme do not (usually!) run out of storage is that they are permitted to reclaim the storage occupied by an object if they can prove that the object cannot possibly matter to any future computation.
A procedure is an object, and any object may be garbage collected if the implementation can prove that the computation will not need it again. This is not a requirement, but that goes for any object, not just for procedures.
The Chez Scheme manual seems definitive, though (Chez Scheme Version 9 User's Guide, p. 82):
Since all Scheme objects, including code objects, can be relocated or even reclaimed by the garbage collector....
In the 1990s Kent Dybvig wrote a paper together with David Eby and Carl Bruggeman which may be of interest here, called Don’t Stop the BIBOP: Flexible and Efficient Storage Management for Dynamically Typed Languages, which describes the garbage collection strategy implemented in Chez Scheme. In the paper some time is spent discussing "code objects" and in particular how they are segregated and treated differently during the garbage collection process (since they may contain pointers to other objects).

Related

What's the difference between these functions implemented with currying and transducers?

Taking the three functions below, implemented in Haskell and Clojure respectively:
f :: [Int] -> Int
f = foldl1 (+) . map (*7) . filter even
(defn f [coll]
((comp
(partial reduce +)
(partial map #(* 7 %)
(partial filter even?)) coll))
(defn f [coll]
(transduce
(comp
(filter even?)
(map #(* 7 %)))
+ coll))
when they are applied to a list like [1, 2, 3, 4, 5] they all return 42. I know the machinery behind the first 2 is similar since map is lazy in Clojure, but the third one uses transducers. Could someone show the intermediate steps for the execution of these functions?
The intermediate steps between the second and third example are the same for this specific example. This is due to the fact that map and filter are implemented as lazy transformations of a sequence into a sequence, as you’re no-doubt already aware.
The transducer versions of map and filter are defined using the same essential functionality as the not-transducer versions, except that the way they “conj" (or not, in the case of filter) onto the result stream is defined elsewhere. Indeed, if u look at the source for map, there are explicit data-structure construction functions in use, whereas the transducer variant uses no such functions -- they are passed in via rf. Explicitly using cons in the non-transducer versions means they're always going to be dealing with sequences
IMO, the main benefit of using transducers is that you have the ability to define the process that you're doing away from the thing which will use your process. Therefore perhaps a more interesting rewrite of your third example may be:
(def process (comp (filter even)
(map #(* 7 %))))
(defn f [coll] (transduce process + collection))
Its an exercise to the application author to decide when this sort of abstraction is necessary, but it can definitely open an opportunity for reuse.
It may occur to you that you can simply rewrite
(defn process [coll]
((comp
(partial map #(* 7 %)
(partial filter even?)) coll))
(reduce + (process coll))
And get the same effect; this is true. When your input is always a sequence (or always the same kind of stream / you know what kind of stream it will be) there's arguably not a good reason to create a transducer. But the power of reuse can be demonstrated here (assume process is a transducer)
(chan 1 process) ;; an async channel which runs process on all inputs
(into [] process coll) ;; writing to a vector
(transduce + process coll) ;; your goal
The motivation behind transducers was essentially to stop having to write new collection functions for different collection types. Rich Hickey mentions his frustration writing functions like map< map> mapcat< mapcat>, and so on in the core async library -- what map and mapcat are, is already defined, but because they assume that they operate on sequences (that explicit cons I linked above), they cant be applied to asnychronous channels. But channels can supply their own rf in the transducer version to let them reuse these functions.

Lazy evaluation and nested thunks eating up memory

I'm working on a tiny lambda calculus engine which I want it to be lazy as Haskell. I'm trying to, at least for now, stick to Haskell's rules so that I don't have to rethink everything, but I don't want to do this blindly.
I understand Haskell will not evaluate a term until its value is needed. Here is my first doubt. I understand a value is "needed" when is an argument to a built in function (so in (func x), x is needed if func is a built in function and (func x) is needed) or because is a function to be called (so in (x y), x would be needed if (x y) is needed).
My second problem is, say I have this recursive function:
let st = \x -> (st x)
The way I've implemented it so far is, if I call this like (st "hi"), "hi" won't be evaluated but wrapped in a thunk which contains the term and its scope, that will be added as "x" to the scope of the body of st. Then, when evaluating st again, another thunk will be created around the x in (st x), which will contain the term x and its scope, which contains another definition of x, that is, "hi". This way, nested thunks will keep building up until I run out of memory.
I tested my code above in GHCI and memory was OK. Then I tested this:
let st = \x -> (st (id x))
and memory built up until the application crashed. So apparently GHCI (or Haskell?) only use thunks when the argument is a function call; in every other case it uses the term's value. Which is something I could easily implement.
Another option I thought of, is not allowing nested thunks by, right before evaluating a function call and creating thunks for the arguments, evaluating the whole current scope to make sure no new thunk will contain another thunk. I think this should still let me create infinite lists and get some (or all?) of the benefits of lazy evaluation, and would even prevent the application from crashing when the let st = \x.(st (id x)) function is called.
I'm sure there are plenty of ways to implement lazy evaluation, but is hard to figure out what the pros and cons of each way is. Is there some list containing the most common implementations of lazy evaluation together with their pros and cons? And also, how does Haskell do it?
This is far from a full answer, but maybe it'll be enough to make progress.
First, the function
let st = \x -> (st x)
binds st to a lambda. This may be a thunk pointing to a lambda, or it may not be (the Haskell Report only specifies non-strict evaluation. If the compiler can prove that evaluating a thunk early doesn't change the semantics of the program, it's free to do so; it's trivial to prove that a lambda in the source code can be evaluated to WHNF without changing the semantics).
Regardless, suppose you force evaluation of st "hi". After applying the lambda (beta reduction), the next step is st "hi". So this beta reduction loops endlessly, but it never creates new data. That is, there's no need to wrap anything in a thunk. So though it loops forever, this application doesn't allocate memory.
Compare this to
let st = \x -> (st (id x))
Here, if we beta reduce:
st "hi"
st (id "hi")
st (id (id "hi"))
st (id (id (id "hi")))
etc. Here, because the argument to st is never evaluated, it builds up an endless chain of thunks wrapping a new id application, consuming increasing memory.
I think the problem you're having with your implementation is that you're wrapping "hi" underneath the lambda. Instead, whatever produces "hi" should create the thunk which then gets passed around until it's evaluated. Then "hi" only gets wrapped once instead of at each step.
Edit: forgot to answer your first question, but I can't do better than #leftaroundabout's suggestion to read up on Weak Head Normal Form. There are other questions on SO too, e.g. Haskell: What is Weak Head Normal Form? and Weak head normal form and order of evaluation

Benefit of avoiding multiple list traversals

I've seen many examples in functional languages about processing a list and constructing a function to do something with its elements after receiving some additional value (usually not present at the time the function was generated), such as:
Calculating the difference between each element and the average
(the last 2 examples under "Lazy Evaluation")
Staging a list append in strict functional languages such as ML/OCaml, to avoid traversing the first list more than once
(the section titled "Staging")
Comparing a list to another with foldr (i.e. generating a function to compare another list to the first)
listEq a b = foldr comb null a b
where comb x frec [] = False
comb x frec (e:es) = x == e && frec es
cmp1To10 = listEq [1..10]
In all these examples, the authors generally remark the benefit of traversing the original list only once. But I can't keep myself from thinking "sure, instead of traversing a list of N elements, you are traversing a chain of N evaluations, so what?". I know there must be some benefit to it, could someone explain it please?
Edit: Thanks to both for the answers. Unfortunately, that's not what I wanted to know. I'll try to clarify my question, so it's not confused with the (more common) one about creating intermediate lists (which I already read about in various places). Also thanks for correcting my post formatting.
I'm interested in the cases where you construct a function to be applied to a list, where you don't yet have the necessary value to evaluate the result (be it a list or not). Then you can't avoid generating references to each list element (even if the list structure is not referenced anymore). And you have the same memory accesses as before, but you don't have to deconstruct the list (pattern matching).
For example, see the "staging" chapter in the mentioned ML book. I've tried it in ML and Racket, more specifically the staged version of "append" which traverses the first list and returns a function to insert the second list at the tail, without traversing the first list many times. Surprisingly for me, it was much faster even considering it still had to copy the list structure as the last pointer was different on each case.
The following is a variant of map which after applied to a list, it should be faster when changing the function. As Haskell is not strict, I would have to force the evaluation of listMap [1..100000] in cachedList (or maybe not, as after the first application it should still be in memory).
listMap = foldr comb (const [])
where comb x rest = \f -> f x : rest f
cachedList = listMap [1..100000]
doubles = cachedList (2*)
squares = cachedList (\x -> x*x)
-- print doubles and squares
-- ...
I know in Haskell it doesn't make a difference (please correct me if I'm wrong) using comb x rest f = ... vs comb x rest = \f -> ..., but I chose this version to emphasize the idea.
Update: after some simple tests, I couldn't find any difference in execution times in Haskell. The question then is only about strict languages such as Scheme (at least the Racket implementation, where I tested it) and ML.
Executing a few extra arithmetic instructions in your loop body is cheaper than executing a few extra memory fetches, basically.
Traversals mean doing lots of memory access, so the less you do, the better. Fusion of traversals reduces memory traffic, and increases the straight line compute load, so you get better performance.
Concretely, consider this program to compute some math on a list:
go :: [Int] -> [Int]
go = map (+2) . map (^3)
Clearly, we design it with two traversals of the list. Between the first and the second traversal, a result is stored in an intermediate data structure. However, it is a lazy structure, so only costs O(1) memory.
Now, the Haskell compiler immediately fuses the two loops into:
go = map ((+2) . (^3))
Why is that? After all, both are O(n) complexity, right?
The difference is in the constant factors.
Considering this abstraction: for each step of the first pipeline we do:
i <- read memory -- cost M
j = i ^ 3 -- cost A
write memory j -- cost M
k <- read memory -- cost M
l = k + 2 -- cost A
write memory l -- cost M
so we pay 4 memory accesses, and 2 arithmetic operations.
For the fused result we have:
i <- read memory -- cost M
j = (i ^ 3) + 2 -- cost 2A
write memory j -- cost M
where A and M are the constant factors for doing math on the ALU and memory access.
There are other constant factors as well (two loop branches) instead of one.
So unless memory access is free (it is not, by a long shot) then the second version is always faster.
Note that compilers that operate on immutable sequences can implement array fusion, the transformation that does this for you. GHC is such a compiler.
There is another very important reason. If you traverse a list only once, and you have no other reference to it, the GC can release the memory claimed by the list elements as you traverse them. Moreover, if the list is generated lazily, you always have only a constant memory consumption. For example
import Data.List
main = do
let xs = [1..10000000]
sum = foldl' (+) 0 xs
len = foldl' (\_ -> (+ 1)) 0 xs
print (sum / len)
computes sum, but needs to keep the reference to xs and the memory it occupies cannot be released, because it is needed to compute len later. (Or vice versa.) So the program consumes a considerable amount of memory, the larger xs the more memory it needs.
However, if we traverse the list only once, it is created lazily and the elements can be GC immediately, so no matter how big the list is, the program takes only O(1) memory.
{-# LANGUAGE BangPatterns #-}
import Data.List
main = do
let xs = [1..10000000]
(sum, len) = foldl' (\(!s,!l) x -> (s + x, l + 1)) (0, 0) xs
print (sum / len)
Sorry in advance for a chatty-style answer.
That's probably obvious, but if we're talking about the performance, you should always verify hypotheses by measuring.
A couple of years ago I was thinking about the operational semantics of GHC, the STG machine. And I asked myself the same question — surely the famous "one-traversal" algorithms are not that great? It only looks like one traversal on the surface, but under the hood you also have this chain-of-thunks structure which is usually quite similar to the original list.
I wrote a few versions (varying in strictness) of the famous RepMin problem — given a tree filled with numbers, generate the tree of the same shape, but replace every number with the minimum of all the numbers. If my memory is right (remember — always verify stuff yourself!), the naive two-traversal algorithm performed much faster than various clever one-traversal algorithms.
I also shared my observations with Simon Marlow (we were both at an FP summer school during that time), and he said that they use this approach in GHC. But not to improve performance, as you might have thought. Instead, he said, for a big AST (such as Haskell's one) writing down all the constructors takes much space (in terms of lines of code), and so they just reduce the amount of code by writing down just one (syntactic) traversal.
Personally I avoid this trick because if you make a mistake, you get a loop which is a very unpleasant thing to debug.
So the answer to your question is, partial compilation. Done ahead of time, it makes it so that there's no need to traverse the list to get to the individual elements - all the references are found in advance and stored inside the pre-compiled function.
As to your concern about the need for that function to be traversed too, it would be true in interpreted languages. But compilation eliminates this problem.
In the presence of laziness this coding trick may lead to the opposite results. Having full equations, e.g. Haskell GHC compiler is able to perform all kinds of optimizations, which essentially eliminate the lists completely and turn the code into an equivalent of loops. This happens when we compile the code with e.g. -O2 switch.
Writing out the partial equations may prevent this compiler optimization and force the actual creation of functions - with drastic slowdown of the resulting code. I tried your cachedList code and saw a 0.01s execution time turn into 0.20s (don't remember right now the exact test I did).

What are examples of Symbolic Programming?

I have to do a term project in my symbolic programming class. But I'm not really sure what a good/legitimate project would be. Can anyone give me examples of symbolic programming? Just any generic ideas because right now I'm leaning towards a turn based fight game (jrpg style basically), but I really don't want to do a game.
The book Paradigms of Artificial Intelligence Programming, Case Studies in Common Lisp by Peter Norvig is useful in this context.
The book describes in detail symbolic AI programming with Common Lisp.
The examples are in the domain of Computer Algebra, solving mathematical tasks, game playing, compiler implementation and more.
Soon there will be another fun book: The Land of Lisp by Conrad Barski.
Generally there are a lot of possible applications of Symbolic Programming:
natural language question answering
natural language story generation
planning in logistics
computer algebra
fault diagnosis of technical systems
description of catalogs of things and matching
game playing
scene understaning
configuration of technical things
A simple arithmetic interpreter in Scheme that takes a list of symbols as input:
(define (symbolic-arith lst)
(case (car lst)
((add) (+ (cadr lst) (caddr lst)))
((sub) (- (cadr lst) (caddr lst)))
((mult) (* (cadr lst) (caddr lst)))
((div) (/ (cadr lst) (caddr lst)))
(else (error "unknown operator"))))
Test run:
> (symbolic-arith '(add 2 43))
=> 45
> (symbolic-arith '(sub 10 43))
=> -33
> (symbolic-arith '(mu 50 43))
unknown operator
> (symbolic-arith '(mult 50 43))
=> 2150
That shows the basic idea behind meta-circular interpreters. Lisp in Small Pieces is the best place to learn about such interpreters.
Another simple example where lists of symbols are used to implement a key-value data store:
> (define person (list (cons 'name 'mat) (cons 'age 20)))
> person
=> ((name . mat) (age . 20))
> (assoc 'name person)
=> (name . mat)
> (assoc 'age person)
=> (age . 20)
If you are new to Lisp, Common Lisp: A Gentle Introduction to Symbolic Computation is a good place to start.
This isn't (just) a naked attempt to steal #Ken's rep. I don't recall what McCarthy's original motivation for creating Lisp might have been. But it is certainly suitable for computer algebra, including differentiation and integration.
The reason that I am posting, though, is that automatic differentiation is used to mean something other than differentiating symbolic expressions. It's used to mean writing a function which calculate the derivative of another function. For example, given a Fortran program which calculates f(x) an automatic differentiation tool would write a Fortran function which calculates f'(x). One technique, of course, is to try to transform the program into a symbolic expression, then use symbolic differentiation, then transform the resulting expression into a program again.
The first of these is a nice exercise in symbolic computation, though it is so well trodden that it might not be a good choice for a term paper. The second is an active research front and OP would have to be careful not to bite off more than (s)he can chew . However, even a limited implementation would be interesting and challenging.

Haskell function definition and caching arrays

I have a question about implementing caching (memoization) using arrays in Haskell. The following pattern works:
f = (fA !)
where fA = listArray...
But this does not (the speed of the program suggests that the array is getting recreated each call or something):
f n = (fA ! n)
where fA = listArray...
Defining fA outside of a where clause (in "global scope") also works with either pattern.
I was hoping that someone could point me towards a technical explanation of what the difference between the above two patterns is.
Note that I am using the latest GHC, and I'm not sure if this is just a compiler peculiarity or part of the language itself.
EDIT: ! is used for array access, so fA ! 5 means fA[5] in C++ syntax. My understanding of Haskell is that (fA !) n would be the same as (fA ! n)...also it would have been more conventional for me to have written "f n = fA ! n" (without the parentheses). Anyway, I get the same behaviour no matter how I parenthesize.
The difference in behavior is not specified by the Haskell standard. All it has to say is that the functions are the same (will result in the same output given the same input).
However in this case there is a simple way to predict time and memory performance that most compilers adhere to. Again I stress that this is not essential, only that most compilers do it.
First rewrite your two examples as pure lambda expressions, expanding the section:
f = let fA = listArray ... in \n -> fA ! n
f' = \n -> let fA = listArray ... in fA ! n
Compilers use let binding to indicate sharing. The guarantee is that in a given environment (set of local variables, lambda body, something like that), the right side of a let binding with no parameters will be evaluated at most once. The environment of fA in the former is the whole program since it is not under any lambda, but the environment of the latter is smaller since it is under a lambda.
What this means is that in the latter, fA may be evaluated once for each different n, whereas in the former this is forbidden.
We can see this pattern in effect even with multi argument functions:
g x y = (a ! y) where a = [ x ^ y' | y' <- [0..] ]
g' x = (\y -> a ! y) where a = [ x ^ y' | y' <- [0..] ]
Then in:
let k = g 2 in k 100 + k 100
We might compute 2^100 more than once, but in:
let k = g' 2 in k 100 + k 100
We will only compute it once.
If you are doing work with memoization, I recommend data-memocombinators on Hackage, which is a library of memo tables of different shapes, so you don't have to roll your own.
The best way to find what is going on is to tell the compiler to output its intermediate representation with -v4. The output is voluminous and a bit hard to read, but should allow you to find out exactly what the difference in the generated code is, and how the compiler arrived there.
You will probably notice that fA is being moved outside the function (to the "global scope") on your first example. On your second example, it probably is not (meaning it will be recreated on each call).
One possible reason for it not being moved outside the function would be because the compiler is thinking it depends on the value of n. On your working example, there is no n for fA to depend on.
But the reason I think the compiler is avoiding moving fA outside on your second example is because it is trying to avoid a space leak. Consider what would happen if fA, instead of your array, were an infinite list (on which you used the !! operator). Imagine you called it once with a large number (for instance f 10000), and later only called it with small numbers (f 2, f 3, f 12...). The 10000 elements from the earlier call are still on memory, wasting space. So, to avoid this, the compiler creates fA again every time you call your function.
The space leak avoidance probably does not happen on your first example because in that case f is in fact only called once, returning a closure (we are now at the frontier of the pure functional and the imperative worlds, so things get a bit more subtle). This closure replaces the original function, which will never be called again, so fA is only called once (and thus the optimizer feels free to move it outside the function). On your second example, f does not get replaced by a closure (since its value depends on the argument), and thus will get called again.
If you want to try to understand more of this (which will help reading the -v4 output), you could take a look at the Spineless Tagless G-Machine paper (citeseer link).
As to your final question, I think it is a compiler peculiarity (but I could be wrong). However, it would not surprise me if all compilers did the same thing, even if it were not part of the language.
Cool, thank you for your answers which helped a lot, and I will definitely check out data-memocombinators on Hackage. Coming from a C++-heavy background, I've been struggling with understanding exactly what Haskell will do (mainly in terms of complexity) with a given program, which tutorials don't seem to get in to.

Resources