What is the core difference between strings and numbers in Common Lisp? - string

Being new with CL, I play a lot with simple algorithms. For instance, I tried to implement a function for removing all unique elements in a list.
(1 2 2 3 3 4 5 3) -> (2 2 3 3 3)
First attempt lead to this code:
(defun remove-unique (items)
(let ((duplicates (set-difference items (remove-duplicates items :test #'equal))))
(append duplicates (remove-duplicates duplicates :test #'equal))))
This works ok with strings but does always return NIL for numbers. Reading a bit more about set-difference I've learned that it isn't suppose to work with duplicate populated lists at all, it just works somehow in my case, so I abandoned the approach as unportable and moved along.
Another attempt is:
(defun remove-unique (items)
(loop for item in items
when (member item (cdr (member item items)))
collect item))
And this works ok with numbers, but returns NIL for strings.
Apparently there is a core difference between strings and numbers I don't understand. How come list processing functions such as member and set-difference work differently on them?

The equality comparison for numbers, characters and strings is indeed different. Equal, which you should be wary to use because it is more expensive, does structure equality (so it descends on some objects). eq does object equality. And eql does object equality for most cases except for numbers (where they check type and value) and characters (where they check 'value')
See the hyperspec entries for equal, eql and eq for more information.

Strings are more related to lists than numbers since both lists and strings are sequences.
"Hello" is a sequence (compund data type) starting with the primitive character value #\H and ending with #\o.
'(1 2 3) is a sequence (compond data type) starting with the primitive numeric value 1 and ending with 3.
Characters are similar to numbers in that they are primitive values. Primitive values can be compared using eql while sequences, that are not the same object, can be compared using equal
(setq list1 (list 1 2 3))
(setq list2 (list 1 2 3))
(eql list1 list2)
;==> NIL
(equal list1 list2)
;==> T
;; comparing first element of both lists using eql
(eql (car list1) (car list2))
;==> T
(setq string1 "Hello")
(setq string2 "Hello")
(eql string1 string2)
;==> NIL
(equal string1 string2)
;==> T
;; comparing first character of both strings using eql
(eql (elt string1 0) (elt string2 0))
;==> T
Most (if not all) functions in Common Lisp that compares something usually has an optional named argument :test where you can supply how elements compare. the default usually is eql. To make them behave corretly with sequences you need to supply #'equal as :test.

(defun remove-unique (items &key (test 'eql))
(loop
:with table := (make-hash-table :test test)
:for element :in items :do
(setf (gethash element table)
(1+ (gethash element table 0)))
:finally
(return
(loop
:for k :being :the :hash-keys :of table
:using (:hash-value v)
:when (> v 1) :nconc (make-list v :initial-element k)))))
(defun remove-unique (items &key (test 'eql))
(loop
:with table := (make-hash-table :test test)
:for element :in items :do
(setf (gethash element table)
(1+ (gethash element table 0)))
:finally
(return
(loop
:for element :in items
:unless (= 1 (gethash element table))
:collect element))))
I'd probably use the first variant because it makes less reads from hash-table, but you'd need to check that items in the list aren't modified later in place.
(remove-unique '("1" "2" "2" "3" "3" "4" "5" "3") :test #'equal)
gives:
("2" "2" "3" "3" "3")
but
(remove-unique '("1" "2" "2" "3" "3" "4" "5" "3"))
gives:
NIL

Related

Suspected SBCL Garbage Collection Bug for Hash Table

I'm using SBCL 1.4.5 on Ubuntu 18.04.
It seems that the garbage collector in SBCL is not properly freeing up memory bound to hash tables with symbol keys and values. Interestingly, when the keys and values are integers, garbage collection works fine.
For example, the following program works correctly:
(defun copy-hash-table (hash)
(let ((h (make-hash-table
:test (hash-table-test hash)
:rehash-size (hash-table-rehash-size hash)
:rehash-threshold (hash-table-rehash-threshold hash)
:size (hash-table-size hash))))
(loop for key being the hash-keys of hash
using (hash-value value)
do
(setf (gethash key h) value)
finally (return h))))
(defun make-integer-input ()
(loop
with hash1 = (make-hash-table) and hash2 = (make-hash-table)
for i from 0 to 500
do
(setf (gethash (random 100) hash1) (random 100))
(setf (gethash (random 100) hash2) (random 100))
finally
(return (list hash1 hash2))))
(defun do-integer-work (hash1 hash2)
(loop
for i being the hash-keys of hash1
for j being the hash-keys of hash2
do
(remhash i hash1)
(setf (gethash i hash1) (random 100))
(remhash j hash2)
(setf (gethash j hash2) (random 100)))
(values hash1 hash2))
(defun hash-worker (list-obj)
(loop
with next
for i from 1 to 50000
do
(multiple-value-bind (new-hash1 new-hash2)
(do-integer-work (copy-hash-table (first list-obj)) (copy-hash-table (second list-obj)))
(setq next (list new-hash1 new-hash2))
(if (> (random 100) 50)
(setq list-obj next)))))
I ran this program by calling (hash-worker (make-integer-input)). The top-level function, hash-worker, takes in a list of two hash tables, and does work over a copy of the hash tables in do-integer-work. Then, then the helper function outputs two modified hash tables which are saved in new-hash1 and new-hash2. Then the system randomly decides to keep the modified hash tables or not.
The do-integer-work helper function sequentially removes the keys of the hash tables and resubmits them with new random values.
When this program runs, I observed that the memory consumption was basically constant for the duration of the program. This is not the case when I ran a sister program over hash tables with symbol keys and values.
(defun copy-hash-table (hash)
(let ((h (make-hash-table
:test (hash-table-test hash)
:rehash-size (hash-table-rehash-size hash)
:rehash-threshold (hash-table-rehash-threshold hash)
:size (hash-table-size hash))))
(loop for key being the hash-keys of hash
using (hash-value value)
do
(setf (gethash key h) value)
finally (return h))))
(defun make-symbol-input ()
(loop
with hash1 = (make-hash-table) and hash2 = (make-hash-table)
for i from 0 to 500
do
(setf (gethash (gentemp) hash1) (gentemp))
(setf (gethash (gentemp) hash2) (gentemp))
finally
(return (list hash1 hash2))))
(defun do-symbol-work (hash1 hash2)
(loop
for i being the hash-keys of hash1
for j being the hash-keys of hash2
do
(remhash i hash1)
(setf (gethash i hash1) (gentemp))
(remhash j hash2)
(setf (gethash j hash2) (gentemp)))
(values hash1 hash2))
(defun hash-worker (list-obj)
(loop
with next
for i from 1 to 50000
do
(multiple-value-bind (new-hash1 new-hash2)
(do-symbol-work (copy-hash-table (first list-obj)) (copy-hash-table (second list-obj)))
(setq next (list new-hash1 new-hash2))
(if (> (random 100) 50)
(setq list-obj next)))))
I ran this program calling (hash-worker (make-symbol-input)). The difference in this program is that the top-level function calls do-symbol-work. As this program ran, I observed the memory usage of the system steadily increase until my machine ran out of memory.
Is this a known bug in SBCL, and if so, is there a workaround for this?
You're using gentemp, which interns the symbol it creates. Such symbols can't be GCd because they are referenced by their package. So you are generating a huge number of interned symbols and killing the system. Instead, use gensym. There may still be GC bugs in code like this, but this isn't one.

Understanding variable scope in given code

I'm a beginner in Emacs lisp, so this is really a noob question. Let's say that I have to write a function that uses a loop to add 1 to each element of a numeric vector.
Here is the code I wrote (the comments indicate what I'm trying to do at each step):
(defun add-one (x)
"Use a loop to add 1 to each element of list X"
(let* ((res x) ; make a copy of X
(counter 0)) ; set counter to 0
(while (< counter (length x))
;; Replace each element of RES by adding 1:
(setcar (nthcdr counter res) (1+ (car (nthcdr counter x))))
(setq counter (1+ counter)))
;; Display result:
(message "%s" res)))
But my code seems to be destructive for x, since several calls to the function do not produce the same result:
;; Define a list:
(setq mylist '(1 2 3 4))
;; Several calls to the function:
(add-one mylist) ; -> (2 3 4 5)
(add-one mylist) ; -> (3 4 5 6)
Here is my question: I don't understand why my code is destructive (I expected the result to be (2 3 4 5) at each execution). I know that setcar is destructive, but it is applied to a copy of x, not to x itself. So why is the result changing?
Thanks!
For the sake of clarity:
Despite the fact you do not copy input list, your code is not at all lispy. Try this instead:
(mapcar '1+ '(1 2 3 4))
Let does not make a copy of anything, so this assigns the value referenced by the variable x to the variable res. Hence any changes to the list referenced by res also change the list referenced by x
(let* ((res x) ; make a copy of X

Intersection-sets using iterator in Scheme

I am trying to create a function that takes two set objects and returns a new set object that is the intersection of those two objects while using an iterator.
Here are some functions that I used, basic-set1 and basic-set2 are set objects that are initially empty.
((basic-set1 'get-set))
>(d c b a)
((basic-set2 'get-set))
>(a b)
(define my-iterator3 basic-set1)
((my-iterator3 'next))
> d
((my-iterator3 'next))
> c
((my-iterator3 'hasnext))
> #t
My desired output
(intersection-sets basic-set1 basic-set2)
> (b a)
This is the code I have so far.
(define (intersect-sets set1 set2)
(define my-iterator3 ((set1 'get-iterator )))
(define result (basic-set))
(define (iter)
(let ((x ((my-iterator3 'next))))
(cond ((not ((my-iterator3 'hasnext))) result)
(((set2 'element?) x)
(begin ((result 'insert) x)
(iter)))
(else
(iter)))))
(iter))
Tested output:
(intersect-sets basic-set1 basic-set2)
>#<procedure:...Problem3.rkt:60:2
I'm kind of stumped. Any help would be appreciated.
As far as I can tell your code is correct. The first cond clause returns result, which is a procedure. If you want the set returned as a list try ((not ((my-iterator3 'hasnext))) ((result 'get-set))) as your first cond clause in iter

Clojure: Transform string into key-value pairs in hash-map

I'll start with my test using the speclj framework.
(it "turns the string into a hash-map"
(should= {1 "1" 2 "2" 3 "3"}
(format-string "1=1 2=2 3=3")))
Then my code:
(:use [clojure.string :only (split)])
(defn format-string [string]
(split string #"\s+"))
Right now, the format-string function returns ["1=1" "2=2" "3=3"] and the test fails. As you can see in my test, I want it to return a hash-map with key-value pairs indicated by the = sign.
I've tried a few things and I've gotten close, but can't quite understand how to make this transformation.
EDIT
Figured out one solution, although the keys are strings instead of integers.
My code:
(defn format-board [route]
(let [[first second third] (split route #"\s+")]
(merge
(apply hash-map (split-at-equals first))
(apply hash-map (split-at-equals second))
(apply hash-map (split-at-equals third))
This returns {"1" "1" "2" "2" "3" "3"}.
You have split at the spaces, but then you need to split again at the = delimiter. You can use regular expressions to do the parsing. Once you have your pairs you can assoc into a hash-map. Here I've used reduce to effect the transformation.
user=> (reduce #(assoc % (read-string (nth %2 1)) (nth %2 2)) {}
#_> (re-seq #"([^=\s]+)=([^=\s]+)" "1=1 2=2 3=3") )
{3 "3", 2 "2", 1 "1"}
Note key order is not applicable to hash-maps
user=> (= {1 "1", 2 "2", 3 "3"} *1)
true
Here is the potential parallel version using clojure.core.reducers:
(require '[clojure.core.reducers :as r])
(require '[clojure.string :as s])
(def string-of-pairs "1=1 2=2 3=3 4=4")
; reducing fn to convert seq of (key, value) to hash-map
(defn rf ([] {}) ([acc [k v]] (assoc acc k v)))
; for large colls, fold will parallelize execution
(r/fold merge rf (r/map #(s/split % #"=") (s/split string-of-pairs #"\s+")))
For better understanding of reducers, watch this video where
Rich explains motivation behind reducers and demonstrates some usage.

Reading specific columns from excel and parsing the data

As records are immutable i am unable to read in the data and parse it without it making a new instance of itself. Moreover, how would i be able to read my excel files from a number of specific columns instead of reading from column 0 to EOF. Is there anyway where i could read in the data from say column 1, column 3, column 5. Supposedly, Column 1 would be parsed as a string, Column 3 as an integer and Column 5 as a long.
(defrecord Record [Name Age Index])
(defn read-csv [fname count]
(with-open [file (io/reader fname)]
(doall (take count (map (comp first csv/read-csv)
(line-seq file))))))
(def records (map #(apply ->Record %) (read-csv "C:/Users/user/Documents/URECA/hi/lib/test.csv" 1)))
This is what i have but it seems to read the columns incrementally
To keep quotes of text fields you can parse csv file by regexp:
(defn read-csv [fname count]
(with-open [file (io/reader fname)]
(doall (map #(str/split % #",") ; doesn't work with commas in text fields
(take count (line-seq file))))))
(defn make-record [idxs types row]
(apply ->Record
(map (fn [idx t]
(let [value (nth row idx)]
(case t
:string value
:int (Integer/parseInt value)
:long (Long/parseLong value))))
idxs types)))
(def records (map (partial make-record
[0 2 4]
[:string :int :long])
(read-csv "/home/mobyte/test.csv" 3)))
(pprint records)
-> ({:Name "\"s1\"", :Age 1, :Index 111}
{:Name "\"s2\"", :Age 2, :Index 112}
{:Name "\"s3\"", :Age 3, :Index 113})
(type (:Age (first records)))
->java.lang.Integer
(type (:Index (first records)))
-> java.lang.Long
(type (:Name (first records)))
-> java.lang.String

Resources