Suspected SBCL Garbage Collection Bug for Hash Table

Suspected SBCL Garbage Collection Bug for Hash Table - garbage-collection

I'm using SBCL 1.4.5 on Ubuntu 18.04.
It seems that the garbage collector in SBCL is not properly freeing up memory bound to hash tables with symbol keys and values. Interestingly, when the keys and values are integers, garbage collection works fine.
For example, the following program works correctly:
(defun copy-hash-table (hash)
(let ((h (make-hash-table
:test (hash-table-test hash)
:rehash-size (hash-table-rehash-size hash)
:rehash-threshold (hash-table-rehash-threshold hash)
:size (hash-table-size hash))))
(loop for key being the hash-keys of hash
using (hash-value value)
do
(setf (gethash key h) value)
finally (return h))))
(defun make-integer-input ()
(loop
with hash1 = (make-hash-table) and hash2 = (make-hash-table)
for i from 0 to 500
do
(setf (gethash (random 100) hash1) (random 100))
(setf (gethash (random 100) hash2) (random 100))
finally
(return (list hash1 hash2))))
(defun do-integer-work (hash1 hash2)
(loop
for i being the hash-keys of hash1
for j being the hash-keys of hash2
do
(remhash i hash1)
(setf (gethash i hash1) (random 100))
(remhash j hash2)
(setf (gethash j hash2) (random 100)))
(values hash1 hash2))
(defun hash-worker (list-obj)
(loop
with next
for i from 1 to 50000
do
(multiple-value-bind (new-hash1 new-hash2)
(do-integer-work (copy-hash-table (first list-obj)) (copy-hash-table (second list-obj)))
(setq next (list new-hash1 new-hash2))
(if (> (random 100) 50)
(setq list-obj next)))))
I ran this program by calling (hash-worker (make-integer-input)). The top-level function, hash-worker, takes in a list of two hash tables, and does work over a copy of the hash tables in do-integer-work. Then, then the helper function outputs two modified hash tables which are saved in new-hash1 and new-hash2. Then the system randomly decides to keep the modified hash tables or not.
The do-integer-work helper function sequentially removes the keys of the hash tables and resubmits them with new random values.
When this program runs, I observed that the memory consumption was basically constant for the duration of the program. This is not the case when I ran a sister program over hash tables with symbol keys and values.
(defun copy-hash-table (hash)
(let ((h (make-hash-table
:test (hash-table-test hash)
:rehash-size (hash-table-rehash-size hash)
:rehash-threshold (hash-table-rehash-threshold hash)
:size (hash-table-size hash))))
(loop for key being the hash-keys of hash
using (hash-value value)
do
(setf (gethash key h) value)
finally (return h))))
(defun make-symbol-input ()
(loop
with hash1 = (make-hash-table) and hash2 = (make-hash-table)
for i from 0 to 500
do
(setf (gethash (gentemp) hash1) (gentemp))
(setf (gethash (gentemp) hash2) (gentemp))
finally
(return (list hash1 hash2))))
(defun do-symbol-work (hash1 hash2)
(loop
for i being the hash-keys of hash1
for j being the hash-keys of hash2
do
(remhash i hash1)
(setf (gethash i hash1) (gentemp))
(remhash j hash2)
(setf (gethash j hash2) (gentemp)))
(values hash1 hash2))
(defun hash-worker (list-obj)
(loop
with next
for i from 1 to 50000
do
(multiple-value-bind (new-hash1 new-hash2)
(do-symbol-work (copy-hash-table (first list-obj)) (copy-hash-table (second list-obj)))
(setq next (list new-hash1 new-hash2))
(if (> (random 100) 50)
(setq list-obj next)))))
I ran this program calling (hash-worker (make-symbol-input)). The difference in this program is that the top-level function calls do-symbol-work. As this program ran, I observed the memory usage of the system steadily increase until my machine ran out of memory.
Is this a known bug in SBCL, and if so, is there a workaround for this?

You're using gentemp, which interns the symbol it creates. Such symbols can't be GCd because they are referenced by their package. So you are generating a huge number of interned symbols and killing the system. Instead, use gensym. There may still be GC bugs in code like this, but this isn't one.

Related

Understanding variable scope in given code

I'm a beginner in Emacs lisp, so this is really a noob question. Let's say that I have to write a function that uses a loop to add 1 to each element of a numeric vector.
Here is the code I wrote (the comments indicate what I'm trying to do at each step):
(defun add-one (x)
"Use a loop to add 1 to each element of list X"
(let* ((res x) ; make a copy of X
(counter 0)) ; set counter to 0
(while (< counter (length x))
;; Replace each element of RES by adding 1:
(setcar (nthcdr counter res) (1+ (car (nthcdr counter x))))
(setq counter (1+ counter)))
;; Display result:
(message "%s" res)))
But my code seems to be destructive for x, since several calls to the function do not produce the same result:
;; Define a list:
(setq mylist '(1 2 3 4))
;; Several calls to the function:
(add-one mylist) ; -> (2 3 4 5)
(add-one mylist) ; -> (3 4 5 6)
Here is my question: I don't understand why my code is destructive (I expected the result to be (2 3 4 5) at each execution). I know that setcar is destructive, but it is applied to a copy of x, not to x itself. So why is the result changing?
Thanks!

For the sake of clarity:
Despite the fact you do not copy input list, your code is not at all lispy. Try this instead:
(mapcar '1+ '(1 2 3 4))

Let does not make a copy of anything, so this assigns the value referenced by the variable x to the variable res. Hence any changes to the list referenced by res also change the list referenced by x
(let* ((res x) ; make a copy of X

Clojure async/go how to park blocking code

I use some Java library that makes non-async get and post requests. I used to wrap such requests to futures and it solves for me the "waiting problem" (I mean waiting for the response)
(defn unchangeable-lib-request [n]
(Thread/sleep 1000)
n)
(defn process [n]
(let [res (atom [])]
(dotimes [i n]
(future (swap! res conj (unchangeable-lib-request i))))
(loop []
(if (> n (count #res))
(recur)
#res))))
(time (process 9))
;; "Elapsed time: 1000.639079 msecs"
;; => [8 7 5 6 4 3 2 1 0]
But I need to create hundreds of requests and this creates performance problems. I found out about core.async and go blocks. But if I will use go-blocks with this library, it will not solve the "waiting problem"
(defn unchangeable-lib-request [n]
(Thread/sleep 1000)
n)
(defn process [n]
(let [c (async/chan 10)]
(dotimes [i n]
(async/go
(async/>! c (unchangeable-lib-request i))))
(loop [result []]
(if (> n (count result))
(recur (conj result (async/<!! c)))
result))))
(time (process 9))
;; "Elapsed time: 2001.770183 msecs"
;; => [0 4 1 6 7 2 5 3 8]
Go blocks can handle just 8 requests simultaneously. Is there a possibility to write some async-wrapper that will park go-block and provide ability to make 100s of requests asynchronously without blocking each other?
(defn process [n]
(let [c (async/chan 10)]
(dotimes [i n]
(async/go
(async/>! c (magic-async-parking-wrapper
(unchangeable-lib-request i))))
(loop [result []]
(if (> n (count result))
(recur (conj result (async/<!! c)))
result))))
(time (process 9))
;; "Elapsed time: 1003.2563 msecs"
I know about async/thread but it seems that this is the same as (future ...).
Is it possible?

I'd suggest:
Use futures to create the threads, and have them put the results back onto a core async channel from outside of any go block using put!, something like: (future (put! chan (worker-function)))
Then use a go block to wait on that (single) channel, put in the results as you get them.

This is where you use clojure.core.async/pipeline-blocking
(require '[clojure.core.async :as a :refer [chan pipeline-blocking]])
(let [output-chan (chan 100)
input-chan (chan 1000)]
(pipeline-blocking 4 ; parallelism knob
output-chan
(map unchangeable-lib-request)
input-chan)
;; Consume results from output-chan, put operations on input-chan
[output-chan input-chan]
)
This spawns n (in this case 4) threads that are kept busy executing unchangeable-lib-request.
Use the buffer size of output-chan to finetune how much requests you want to happen in advance.
Use the buffer size of input-chan to finetune how many requests you want scheduled without backpropagation (a blocking input-chan).

Intersection-sets using iterator in Scheme

I am trying to create a function that takes two set objects and returns a new set object that is the intersection of those two objects while using an iterator.
Here are some functions that I used, basic-set1 and basic-set2 are set objects that are initially empty.
((basic-set1 'get-set))
>(d c b a)
((basic-set2 'get-set))
>(a b)
(define my-iterator3 basic-set1)
((my-iterator3 'next))
> d
((my-iterator3 'next))
> c
((my-iterator3 'hasnext))
> #t
My desired output
(intersection-sets basic-set1 basic-set2)
> (b a)
This is the code I have so far.
(define (intersect-sets set1 set2)
(define my-iterator3 ((set1 'get-iterator )))
(define result (basic-set))
(define (iter)
(let ((x ((my-iterator3 'next))))
(cond ((not ((my-iterator3 'hasnext))) result)
(((set2 'element?) x)
(begin ((result 'insert) x)
(iter)))
(else
(iter)))))
(iter))
Tested output:
(intersect-sets basic-set1 basic-set2)
>#<procedure:...Problem3.rkt:60:2
I'm kind of stumped. Any help would be appreciated.

As far as I can tell your code is correct. The first cond clause returns result, which is a procedure. If you want the set returned as a list try ((not ((my-iterator3 'hasnext))) ((result 'get-set))) as your first cond clause in iter

What is the core difference between strings and numbers in Common Lisp?

Being new with CL, I play a lot with simple algorithms. For instance, I tried to implement a function for removing all unique elements in a list.
(1 2 2 3 3 4 5 3) -> (2 2 3 3 3)
First attempt lead to this code:
(defun remove-unique (items)
(let ((duplicates (set-difference items (remove-duplicates items :test #'equal))))
(append duplicates (remove-duplicates duplicates :test #'equal))))
This works ok with strings but does always return NIL for numbers. Reading a bit more about set-difference I've learned that it isn't suppose to work with duplicate populated lists at all, it just works somehow in my case, so I abandoned the approach as unportable and moved along.
Another attempt is:
(defun remove-unique (items)
(loop for item in items
when (member item (cdr (member item items)))
collect item))
And this works ok with numbers, but returns NIL for strings.
Apparently there is a core difference between strings and numbers I don't understand. How come list processing functions such as member and set-difference work differently on them?

The equality comparison for numbers, characters and strings is indeed different. Equal, which you should be wary to use because it is more expensive, does structure equality (so it descends on some objects). eq does object equality. And eql does object equality for most cases except for numbers (where they check type and value) and characters (where they check 'value')
See the hyperspec entries for equal, eql and eq for more information.

Strings are more related to lists than numbers since both lists and strings are sequences.
"Hello" is a sequence (compund data type) starting with the primitive character value #\H and ending with #\o.
'(1 2 3) is a sequence (compond data type) starting with the primitive numeric value 1 and ending with 3.
Characters are similar to numbers in that they are primitive values. Primitive values can be compared using eql while sequences, that are not the same object, can be compared using equal
(setq list1 (list 1 2 3))
(setq list2 (list 1 2 3))
(eql list1 list2)
;==> NIL
(equal list1 list2)
;==> T
;; comparing first element of both lists using eql
(eql (car list1) (car list2))
;==> T
(setq string1 "Hello")
(setq string2 "Hello")
(eql string1 string2)
;==> NIL
(equal string1 string2)
;==> T
;; comparing first character of both strings using eql
(eql (elt string1 0) (elt string2 0))
;==> T
Most (if not all) functions in Common Lisp that compares something usually has an optional named argument :test where you can supply how elements compare. the default usually is eql. To make them behave corretly with sequences you need to supply #'equal as :test.

(defun remove-unique (items &key (test 'eql))
(loop
:with table := (make-hash-table :test test)
:for element :in items :do
(setf (gethash element table)
(1+ (gethash element table 0)))
:finally
(return
(loop
:for k :being :the :hash-keys :of table
:using (:hash-value v)
:when (> v 1) :nconc (make-list v :initial-element k)))))
(defun remove-unique (items &key (test 'eql))
(loop
:with table := (make-hash-table :test test)
:for element :in items :do
(setf (gethash element table)
(1+ (gethash element table 0)))
:finally
(return
(loop
:for element :in items
:unless (= 1 (gethash element table))
:collect element))))
I'd probably use the first variant because it makes less reads from hash-table, but you'd need to check that items in the list aren't modified later in place.
(remove-unique '("1" "2" "2" "3" "3" "4" "5" "3") :test #'equal)
gives:
("2" "2" "3" "3" "3")
but
(remove-unique '("1" "2" "2" "3" "3" "4" "5" "3"))
gives:
NIL

How do I concat/flatten byte arrays

I'm making a function that generates a .wav file. I have the header all set, but I'm running into trouble with the data itself. I have a function for creating a sine wave at 880Hz (at least I think that's what it does, but that's not my question)--the question is, how do I convert a collection of byte arrays to just one byte array with their contents? This was my best try:
(defn lil-endian-bytes [i]
(let [i (int i)]
(byte-array
(map #(.byteValue %)
[(bit-and i 0x000000ff)
(bit-shift-right (bit-and i 0x0000ff00) 8)
(bit-shift-right (bit-and i 0x00ff0000) 16)
(bit-shift-right (bit-and i 0xff000000) 24)]))))
(def leb lil-endian-bytes)
(let [wr (io/output-stream (str name ".wav") :append true)]
(.write wr
(byte-array (flatten (concat (map
(fn [sample] (leb (* 0xffff (math/sin (+ (* 2 3.14 880) sample)))))
(range (* duration s-rate)) )))))
but it doesn't do what I want it to do: concat all of the byte-arrays into one vector and then into a single byte array. And it makes sense to me why it can't: it can't concat/flatten a byte[] because it's not a vector; it's a byte[]. And it can't cast a byte[] into a byte. But what do I need to do to get this working?

You might be looking for something like:
(byte-array (mapcat seq my-sequence-of-byte-arrays))

Converting the byte arrays to be concatenated to sequences and back to byte array may be a little inefficient if you are dealing with large arrays. Here's how to concatenate byte arrays making java.nio.ByteBuffer do the heavy lifting:
(defn concat-byte-arrays [& byte-arrays]
(when (not-empty byte-arrays)
(let [total-size (reduce + (map count byte-arrays))
result (byte-array total-size)
bb (java.nio.ByteBuffer/wrap result)]
(doseq [ba byte-arrays]
(.put bb ba))
result)))

It will be as simple as:
Your byte arrays that needs to be combined in a single byte-array:
(def byte-arrays [(byte-array 10 (byte 1))
(byte-array 10 (byte 2))
(byte-array 10 (byte 3))])
Combine:
(byte-array (for [ar byte-arrays
i ar] i))

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string

Suspected SBCL Garbage Collection Bug for Hash Table - garbage-collection

You're using gentemp, which interns the symbol it creates. Such symbols can't be GCd because they are referenced by their package. So you are generating a huge number of interned symbols and killing the system. Instead, use gensym. There may still be GC bugs in code like this, but this isn't one.

Related

Understanding variable scope in given code

Clojure async/go how to park blocking code

Intersection-sets using iterator in Scheme

What is the core difference between strings and numbers in Common Lisp?

How do I concat/flatten byte arrays

Categories

Resources