Cosine Similarity in linear time using Lisp - python-3.x

The cosine similarity of two lists can be calculated in linear time using a for-loop. I'm curious as to how one would achieve this using a Lisp-like language. Below is an example of my code in Python and Hy (Hylang).
Python:
def cos_sim(A,B):
    import math as _math
    n,da,db,d = 0,0,0,0
    for a,b in zip(A,B):
        n += a*b
        da += a*a
        db += b*b
    da = _math.sqrt(da)
    db = _math.sqrt(db)
    d = da*db
    return n / (d + 1e-32)
Hy (Lisp):
(import math)
(defn l2norm [a]
(math.sqrt (reduce + (map (fn [s](* s s)) a))))
(defn dot [a b]
(reduce + (map * a b)))
(defn cossim [a b]
(/ (dot a b) (* (l2norm a) (l2norm b))))

"I'm curious as to how one would achieve this using a Lisp-like language." This really depends on which Lisp you are using. In Scheme you might do something similar to the posted Hy solution:
(define (cos-sim-1 u v)
(/ (dot-prod u v)
(* (norm u) (norm v))))
(define (dot-prod u v)
(fold-left + 0 (map * u v)))
(define (norm u)
(sqrt (fold-left (lambda (acc x) (+ acc (* x x)))
0
u)))
This is linear in time complexity, but it could be improved by a constant factor by passing over the input only once. Scheme provides a named let construct that can be used to bind a name to a procedure; this is convenient here as it provides a simple mechanism for building the dot product and norms:
(define (cos-sim-2 u v)
(let iter ((u u)
(v v)
(dot-product 0)
(U^2 0)
(V^2 0))
(if (null? u)
(/ dot-product (sqrt (* U^2 V^2)))
(let ((x (car u))
(y (car v)))
(iter (cdr u)
(cdr v)
(+ dot-product (* x y))
(+ U^2 (* x x))
(+ V^2 (* y y)))))))
Both of these procedures assume that the input lists have the same length; it might be useful to add some validation code that checks this. Note that fold-left is standard in R6RS Scheme, while other standards rely on SRFIs for it. Some implementations use different names, but the functionality is commonly available (perhaps as foldl or reduce).
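The length check mentioned above could look roughly like this in Python, the language of the question. This is only a sketch: the name cos_sim_checked and the choice of ValueError are assumptions made for illustration, and it expects sequences that support len().

```python
import math

def cos_sim_checked(a, b):
    """Single-pass cosine similarity that rejects inputs of unequal length."""
    # Hypothetical validation step: fail early instead of silently truncating.
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    dot = aa = bb = 0
    for x, y in zip(a, b):
        dot += x * y   # running dot product
        aa += x * x    # running squared norm of a
        bb += y * y    # running squared norm of b
    return dot / math.sqrt(aa * bb)
```

Without the check, zip would stop at the shorter input and return a plausible but wrong value. A zero vector still raises ZeroDivisionError here.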
It is possible to solve the problem in Common Lisp using either of the basic methods shown above, though in Common Lisp you would use labels instead of named let. But it would be typical to see a Common Lisp solution using the loop macro. The Common Lisp standard does not guarantee tail call elimination (though some implementations do support that), so explicit loops are seen much more often than in Scheme. The loop macro is pretty powerful, and one way that you could solve this problem while passing over the input lists only once is this:
(defun cos-sim (u v)
(loop :for x :in u
:for y :in v
:sum (* x y) :into dot-product
:sum (* x x) :into u2
:sum (* y y) :into v2
:finally (return (/ dot-product (sqrt (* u2 v2))))))
Here are some sample interactions:
Scheme (Chez Scheme):
> (cos-sim-1 '(1 0 0) '(1 0 0))
1
> (cos-sim-1 '(1 0 0) '(-1 0 0))
-1
> (cos-sim-1 '(1 0 0) '(0 1 0))
0
> (cos-sim-1 '(1 1 0) '(0 1 0))
0.7071067811865475
> (cos-sim-2 '(1 0 0) '(1 0 0))
1
> (cos-sim-2 '(1 0 0) '(-1 0 0))
-1
> (cos-sim-2 '(1 0 0) '(0 1 0))
0
> (cos-sim-2 '(1 1 0) '(0 1 0))
0.7071067811865475
Common Lisp:
CL-USER> (cos-sim '(1 0 0) '(1 0 0))
1.0
CL-USER> (cos-sim '(1 0 0) '(-1 0 0))
-1.0
CL-USER> (cos-sim '(1 0 0) '(0 1 0))
0.0
CL-USER> (cos-sim '(1 1 0) '(0 1 0))
0.70710677

A simple option is to translate the Python version literally to Hy, like this:
(defn cos_sim [A B]
(import math :as _math)
(setv [n da db d] [0 0 0 0])
(for [[a b] (zip A B)]
(+= n (* a b))
(+= da (* a a))
(+= db (* b b)))
(setv
da (_math.sqrt da)
db (_math.sqrt db)
d (* da db))
(/ n (+ d 1e-32)))

I think your proposed solution is fairly 'lispy': build several short, easy-to-read functions that combine into your solution. E.g.:
(defun n (A B)
(sqrt (reduce #'+ (map 'list #'* A B))))
(defun da (A)
(sqrt (reduce #'+ (map 'list #'* A A))))
(defun db (B)
(sqrt (reduce #'+ (map 'list #'* B B))))
(defun cos-sim (A B)
(let ((n (n A B))
(da (da A))
(db (db B)))
(/ (* n n) (+ (* da db) 1e-32))))
But notice that n, da, and db look very similar. We can try to make them a single function or macro. In this case, a function with an optional second list parameter is easy enough. (Note that I defined n in a slightly odd way to emphasize the similarity; we would prefer not to take a square root and then square it again in the final calculation. That could be handled by checking whether the optional parameter was passed (B-p below); instead I chose to move the square root out of the combined function and into cos-sim.) Anyway, this gives us:
(defun d (A &optional (B A B-p))
(reduce #'+ (map 'list #'* A B)))
(defun cos-sim (A B)
(let ((n (d A B))
(da (sqrt (d A)))
(db (sqrt (d B))))
(/ n (+ (* da db) 1e-32))))
Alternatively, using loop is very Common Lisp-y, and is more directly similar to the Python:
(defun cos-sim (A B)
(loop for a in A
for b in B
sum (* a b) into n
sum (* a a) into da
sum (* b b) into db
finally (return (/ n (+ (sqrt (* da db)) 1e-32)))))

Here is a fairly natural (I think) approach in Racket. Essentially this is a process of folding a pair of sequences of numbers, so that's what we do. Note that this uses no explicit assignment, and also pulls the square root up a level (sqrt(a) * sqrt(b) = sqrt(a*b)), as taking roots is likely expensive (this probably does not matter in practice). It also doesn't do the weird adding of a tiny float, which I presume was an attempt to coerce a value which might not be a float to a float? If so, that's the wrong way to do that, and it's also not needed in a language like Racket (and most Lisps) which strives to do arithmetic correctly where possible.
(define (cos-sim a b)
;; a and b are sequences of numbers
(let-values ([(a^2-sum b^2-sum ab-sum)
(for/fold ([a^2-running 0]
[b^2-running 0]
[ab-running 0])
([ai a] [bi b])
(values (+ (* ai ai) a^2-running)
(+ (* bi bi) b^2-running)
(+ (* ai bi) ab-running)))])
(/ ab-sum (sqrt (* a^2-sum b^2-sum)))))
You can relatively easily turn this into typed Racket:
(define (cos-sim (a : (Sequenceof Number))
(b : (Sequenceof Number)))
: Number
(let-values ([(a^2-sum b^2-sum ab-sum)
(for/fold ([a^2-running : Number 0]
[b^2-running : Number 0]
[ab-running : Number 0])
([ai a] [bi b])
(values (+ (* ai ai) a^2-running)
(+ (* bi bi) b^2-running)
(+ (* ai bi) ab-running)))])
(/ ab-sum (sqrt (* a^2-sum b^2-sum)))))
This probably is no faster, but it is fussier.
This might be faster though:
(define (cos-sim/flonum (a : (Sequenceof Flonum))
(b : (Sequenceof Flonum)))
: Flonum
(let-values ([(a^2-sum b^2-sum ab-sum)
(for/fold ([a^2-running : Flonum 0.0]
[b^2-running : Flonum 0.0]
[ab-running : Flonum 0.0])
([ai a] [bi b])
(values (+ (* ai ai) a^2-running)
(+ (* bi bi) b^2-running)
(+ (* ai bi) ab-running)))])
(/ ab-sum (assert (sqrt (* a^2-sum b^2-sum)) flonum?))))
I have not checked that it is, however.
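For readers coming from the Python in the question, the same fold can be sketched with functools.reduce. The names cos_sim_fold and step are made up for this illustration; the accumulator is a tuple of the three running sums, mirroring the three for/fold accumulators above.

```python
import math
from functools import reduce

def cos_sim_fold(a, b):
    """Cosine similarity as a single fold over paired elements."""
    def step(acc, pair):
        aa, bb, ab = acc          # running sums so far
        x, y = pair               # current elements of a and b
        return (aa + x * x, bb + y * y, ab + x * y)

    # One pass over zip(a, b); no variable is reassigned inside the fold.
    aa, bb, ab = reduce(step, zip(a, b), (0, 0, 0))
    return ab / math.sqrt(aa * bb)
```

As in the Racket version, only one square root is taken. Unlike Racket, Python's math.sqrt always returns a float, so the result is a float even for integer inputs.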

Your Hy example is already linear time. None of its loops is nested, so the number of iterations never multiplies with the length of the input. It could be simplified to make this easier to see:
(import math)
(defn dot [a b]
(sum (map * a b)))
(defn l2norm [a]
(math.sqrt (dot a a)))
(defn cossim [a b]
(/ (dot a b) (* (l2norm a) (l2norm b))))
I think this version is clearer than the Python version, because it's closer to the math notation.
Let's also inline the l2norm to make the number of loops easier to see.
(defn cossim [a b]
(/ (dot a b)
(* (math.sqrt (dot a a))
(math.sqrt (dot b b)))))
Python's map() is lazy, so the sum() and map() together only loop once. You effectively have three loops, one for each dot, and none of them are nested. Your Python version had one loop, but it was doing more calculations each iteration. Theoretically, it doesn't matter if you calculate row-by-row or column-by-column: multiplication is commutative, either rows by columns or columns by rows are the same number of calculations.
However, in practice, Python does have significant overhead for function calls, so I would expect the Hy version using higher-order functions to be slower than the Python version that doesn't have any function calls in the loop body. This is a constant factor slowdown, so it's still linear time.
If you want fast loops for calculations in Python, put your data in a matrix and use Numpy.
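For comparison, the simplified Hy version above corresponds roughly to the following plain Python. This is a sketch, not a benchmark; it assumes a and b are lists (or other re-iterable sequences), because each of the three dot calls walks its inputs again.

```python
import math
from operator import mul

def dot(a, b):
    # map is lazy, so sum consumes the products in a single pass
    return sum(map(mul, a, b))

def cossim(a, b):
    # three separate, non-nested passes: dot(a, b), dot(a, a), dot(b, b)
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))
```

Each pass is linear in the input length, so the whole function is linear as well, with a larger constant than the single-loop version.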

Related

Racket - string->list returns strange results [duplicate]

I want to calculate the sum of digits of a number in Scheme. It should work like this:
>(sum-of-digits 123)
6
My idea is to transform the number 123 to string "123" and then transform it to a list '(1 2 3) and then use (apply + '(1 2 3)) to get 6.
but it's unfortunately not working like I imagined.
>(string->list(number->string 123))
'(#\1 #\2 #\3)
Apparently '(#\1 #\2 #\3) is not same as '(1 2 3)... because I'm using language racket under DrRacket, so I can not use the function like char->digit.
Can anyone help me fix this?
An alternative method would be to loop over the digits by using modulo. I'm not as used to Scheme syntax, but thanks to @bearzk translating my Lisp, here's a function that works for non-negative integers (and with a little work could encompass decimals and negative values):
(define (sum-of-digits x)
(if (= x 0) 0
(+ (modulo x 10)
(sum-of-digits (/ (- x (modulo x 10)) 10)))))
Something like this can do your digits thing arithmetically rather than string style:
(define (digits n)
(if (zero? n)
'()
(cons (remainder n 10) (digits (quotient n 10)))))
Anyway, idk if its what you're doing but this question makes me think Project Euler. And if so, you're going to appreciate both of these functions in future problems.
Above is the hard part, this is the rest:
(foldr + 0 (digits 12345))
OR
(apply + (digits 1234))
EDIT - I got rid of intLength above, but in case you still want it.
(define (intLength x)
(define (intLengthP x c)
(if (zero? x)
c
(intLengthP (quotient x 10) (+ c 1))
)
)
(intLengthP x 0))
Those #\1, #\2 things are characters. I hate to RTFM you, but the Racket docs are really good here. If you highlight string->list in DrRacket and hit F1, you should get a browser window with a bunch of useful information.
So as not to keep you in the dark; I think I'd probably use the "string" function as the missing step in your solution:
(map string (list #\a #\b))
... produces
(list "a" "b")
A better idea would be to actually find the digits and sum them. 34%10 gives 4 and 3%10 gives 3. Sum is 3+4.
Here's an algorithm in F# (I'm sorry, I don't know Scheme):
let rec sumOfDigits n =
    if n<10 then n
    else (n%10) + sumOfDigits (n/10)
This works, it builds on your initial string->list solution, just does a conversion on the list of characters
(apply + (map (lambda (d) (- (char->integer d) (char->integer #\0)))
(string->list (number->string 123))))
The conversion function could be factored out to make it a little more clear:
(define (digit->integer d)
(- (char->integer d) (char->integer #\0)))
(apply + (map digit->integer (string->list (number->string 123))))
(define (sum-of-digits num)
(if (< num 10)
num
(+ (remainder num 10) (sum-of-digits (/ (- num (remainder num 10)) 10)))))
This is a recursive process. It terminates when num < 10, where sum-of-digits returns the input num itself.
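The arithmetic approach used in several answers above can also be written as a loop. Here is a small Python sketch; the name sum_of_digits is reused from the question, and taking the absolute value for negative input is an assumption added for illustration.

```python
def sum_of_digits(n):
    """Sum the decimal digits of an integer using repeated division by 10."""
    n = abs(n)  # assumption: treat -123 like 123
    total = 0
    while n:
        n, digit = divmod(n, 10)  # peel off the last digit
        total += digit
    return total
```

For example, sum_of_digits(123) peels off 3, then 2, then 1, and returns 6.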

Coloring N segments using M colors with dynamic programming

Problem: I have N contiguous segments numbered from 1 to N and M colors also numbered from 1 to M.
Now, there are two numbers U and V defined as:
U = color(i) + color(j)
V = color(j) + color(k)
U, V are coprime.
where 1 <= i,j,k <= N and
j = i+1, k=j+1
Problem is to find the number of ways that all N segments can be colored such that the above property holds for all i,j,k.
Is there a dynamic programming solution to this problem? What is it?
I have a recursive but non-[dynamic programming] implementation of this that should help get you pointed in the right direction. It's implemented in Common Lisp since there's no language specified.
The way to extend it to be a dynamic programming solution would be to add a cache.
count-all-coprime-triple-colorings constructs all the colorings in memory and then checks each of them for satisfying the coprime triple condition.
count-all-coprime-triple-colorings-lazy tries to aggressively prune the colorings we even consider by ruling out colorings with a prefix that doesn't satisfy the coprime condition.
This approach could be improved by noting that only the last two elements of the prefix are relevant, so you could use that to populate the cache.
(defun coprime-p (a b)
"check whether a and b are coprime"
(= (gcd a b) 1))
(defun coprime-triple-p (a b c)
"check whether (a+b) and (b+c) are coprime"
(coprime-p (+ a b) (+ b c)))
(defun coprime-triple-sequence-p (seq)
"check whether seq is a sequence of coprime triples"
(cond
;; if the length is at most 2 then there are no
;; triples to check, so the condition holds trivially
((<= (length seq) 2) t)
(t (let
((a (nth 0 seq))
(b (nth 1 seq))
(c (nth 2 seq))
(tail (cdr seq)))
(if (coprime-triple-p a b c)
(coprime-triple-sequence-p tail)
nil)))))
(defun curry-cons (x)
"curried cons operator"
(lambda (list) (cons x list)))
(defun all-colorings (sections colors)
"generate all possible #colors-colorings of sections"
(assert (>= sections 0))
(assert (>= colors 1))
(cond
;; if there are no sections
;; then there are no colorings
((= sections 0) ())
;; when we have one section there is one coloring
;; for each color
((= sections 1) (loop for i from 1 upto colors collecting (list i)))
(t
;; wildly inefficient
(loop for i from 1 upto colors appending
(mapcar (curry-cons i) (all-colorings (1- sections) colors))))))
(defun count-all-coprime-triple-colorings (sections colors)
"count all the colorings that have coprime triples"
(loop for i in (all-colorings sections colors) counting (coprime-triple-sequence-p i)))
(defun coprime-triple-check-boundary (reversed-prefix suffix)
"prefix = [...a, b] ; suffix = [c,...] ; check
gcd(a+b, b+c) = 1"
;; if there aren't enough elements in reversed-prefix and suffix
;; then we admit the list
(if (and (nth 1 reversed-prefix) (nth 0 suffix))
(let
((b (nth 0 reversed-prefix)) (a (nth 1 reversed-prefix)) (c (nth 0 suffix)))
(coprime-triple-p a b c))
t))
(defun count-all-coprime-triple-colorings-lazy (sections colors reversed-prefix)
"count the number of sequences with coprime triples with a particular number
of sections and colors with a particular reversed-prefix."
(let
((sections-- (1- sections)))
(cond
((= sections 0) 1)
(t (loop for i from 1 upto colors summing
(if (coprime-triple-check-boundary reversed-prefix (list i))
(count-all-coprime-triple-colorings-lazy sections-- colors (cons i reversed-prefix))
0))))))
(defun summarize-coloring (i j)
"summarize the given coloring number"
(print (list "triples" i "colors" j
(count-all-coprime-triple-colorings-lazy i j nil))))
(loop for i from 1 upto 9 doing
(loop for j from 1 upto 9 doing (summarize-coloring i j)))
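The cache suggested above could be sketched as follows in Python. This is one possible reading of that suggestion, not the answer's own code: the name count_colorings is hypothetical, and the cache key is the number of sections left plus the last two colors, since only those affect which colors may follow.

```python
from functools import lru_cache
from math import gcd

def count_colorings(sections, colors):
    """Count colorings where gcd(c[i]+c[i+1], c[i+1]+c[i+2]) == 1 for all i."""

    @lru_cache(maxsize=None)
    def count(remaining, prev2, prev1):
        # prev2 and prev1 are the last two colors chosen (None if absent)
        if remaining == 0:
            return 1
        total = 0
        for c in range(1, colors + 1):
            unconstrained = prev2 is None or prev1 is None
            if unconstrained or gcd(prev2 + prev1, prev1 + c) == 1:
                total += count(remaining - 1, prev1, c)
        return total

    return count(sections, None, None)
```

With M colors there are at most about N * M^2 distinct cache keys, each doing O(M) work, so this runs in roughly O(N * M^3) gcd computations instead of enumerating all M^N colorings. Like the lazy Lisp version, it returns 1 for zero sections.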

A Replace Function in Lisp That Duplicates Mathematica Functionality

What is the easiest way to accomplish the following in a Mathematica clone or in any version of Lisp (any language is probably okay, actually, even Haskell)? It doesn't appear that any Lisps have a similar replace function.
Replace[{
f[{x, "[", y, "]"}],
f@f[{x, "[", y, y2, "]"}]
}
, f[{x_, "[", y__, "]"}] :> x[y],
Infinity]
and a return value of {x[y], f[x[y, y2]]}
It replaces all instances of f[{x_, "[", y__, "]"}] in args where x_ represents a single variable and y__ represents one or more variables.
In Lisp, the function and replacement would probably be the equivalent (forgive me, I am not the best with Lisp). I'm looking for a function of the form (replace list search replace).
(replace
'(
(f (x "[" y "]"))
(f (f '(x "[" y y2 "]")))
)
'(f (x_ "[" y__ "]"))
'(x y)
)
and get a return value of ((x y) (f (x y y2))).
Let's give it another try.
First, install quicklisp and use it to fetch, install and load optima and alexandria.
(ql:quickload :optima)
(ql:quickload :alexandria)
(use-package :alexandria)
The functions from alexandria referenced below are ensure-list and last-elt. If you don't have them installed, you can use the following definitions:
(defun ensure-list (list) (if (listp list) list (list list)))
(defun last-elt (list) (car (last list)))
We define rules as functions from one form to another.
Below, the function tries to destructure the input as (f (<X> "[" <ARGS> "]")), where <ARGS> is zero or more forms. If destructuring fails, we return NIL (we expect non-matching filters to return NIL hereafter).
(defun match-ugly-funcall (form)
(optima:match form
((list 'f (cons x args))
(unless (and (string= "[" (first args))
(string= "]" (last-elt args)))
(optima:fail))
`(,x ,@(cdr (butlast args))))))
(match-ugly-funcall '(f (g "[" 1 3 5 4 8 "]")))
; => (G 1 3 5 4 8)
Then, we mimic Mathematica's Replace with this function, which takes a form and a list of rules to be tried. It is possible to pass a single rule (thanks to ensure-list). If a list of list of rules is given, a list of matches should be returned (to be done).
(defun match-replace (form rules &optional (levelspec '(0)))
(setf rules (ensure-list rules))
(multiple-value-bind (match-levelspec-p recurse-levelspec-p)
(optima:ematch levelspec
((list n1 n2) (if (some #'minusp (list n1 n2))
(optima:fail)
(values (lambda (d) (<= n1 d n2))
(lambda (d) (< d n2)))))
((list n) (if (minusp n)
(optima:fail)
(values (lambda (d) (= d n))
(lambda (d) (< d n)))))
(:infinity (values (constantly t) (constantly t))))
(labels
((do-replace (form depth)
(let ((result
(and (funcall match-levelspec-p depth)
(some (lambda (r) (funcall r form)) rules))))
(cond
(result (values result t))
((and (listp form)
(funcall recurse-levelspec-p depth))
(incf depth)
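;; walk every element, including the last one;
;; return form unchanged if nothing matches
(do ((newlist '())
(tail form (cdr tail)))
((endp tail) (values form nil))
(multiple-value-bind (result matchedp) (do-replace (car tail) depth)
(if matchedp
(return (values (nconc (nreverse newlist)
(cons result (cdr tail))) t))
(push (car tail) newlist)))))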
(t (values form nil))))))
(do-replace form 0))))
And a test:
(match-replace '(a b (f (x "[" 1 2 3 "]")) c d)
#'match-ugly-funcall
:infinity)
; => (A B (X 1 2 3) C D)
; T
In order to replace all expressions instead of the first matching one, use this instead:
(defun match-replace-all (form rules &optional (levelspec '(0)))
(setf rules (ensure-list rules))
(multiple-value-bind (match-levelspec-p recurse-levelspec-p)
(optima:ematch levelspec
((list n1 n2) (if (some #'minusp (list n1 n2))
(optima:fail)
(values (lambda (d) (<= n1 d n2))
(lambda (d) (< d n2)))))
((list n) (if (minusp n)
(optima:fail)
(values (lambda (d) (= d n))
(lambda (d) (< d n)))))
(:infinity (values (constantly t) (constantly t))))
(labels
((do-replace (form depth)
(let ((result
(and (funcall match-levelspec-p depth)
(some (lambda (r) (funcall r form)) rules))))
(cond
(result result)
((and (listp form)
(funcall recurse-levelspec-p depth))
(incf depth)
(mapcar (lambda (e) (do-replace e depth)) form))
(t form)))))
(do-replace form 0))))
Oh boy, how Mathematica manages to obfuscate everything by applying its renowned NIH approach.
Basically, you're looking for a function to perform string replacement according to some pattern. In most languages, this is accomplished with regular expressions.
For instance, in Common Lisp using the cl-ppcre library it will look something like this:
(cl-ppcre:regex-replace-all
;; regular expression you match against with groups
"f\\[{(x[^ ]*), \"\\[\", ((y[^ ]* ?)+), \"\\]\"}\\]"
;; your string
"{f[{x, \"[\", y, \"]\"}], f@f[{x, \"[\", y, y2, \"]\"}]}"
;; substitution expression using groups 1 & 2
"\\1[\\2]")
Surely, you can write a specialized 20-line function for this problem of matching and substituting subtrees using subst and recursion, but if all that you want is cases similar to the presented one you can get away with a simple regex-based approach.
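As a rough idea of what such a recursive subtree rewrite involves, here is a Python sketch over nested lists. It hard-codes the single rule from the question rather than implementing general pattern matching, and the name rewrite and the list encoding of expressions are assumptions made for this illustration.

```python
def rewrite(form):
    """Rewrite every ['f', [x, '[', y..., ']']] subtree into [x, y...]."""
    if not isinstance(form, list):
        return form  # atoms are returned unchanged
    if (len(form) == 2 and form[0] == "f"
            and isinstance(form[1], list)
            and len(form[1]) >= 4          # x, "[", at least one y, "]"
            and form[1][1] == "["
            and form[1][-1] == "]"):
        inner = form[1]
        return [inner[0], *inner[2:-1]]    # drop the bracket strings
    # no match at this level: recurse into the children
    return [rewrite(item) for item in form]
```

Applied to [["f", ["x", "[", "y", "]"]], ["f", ["f", ["x", "[", "y", "y2", "]"]]]], this gives [["x", "y"], ["f", ["x", "y", "y2"]]], the shape asked for in the question. A rewritten subtree is not scanned again.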

Nested FLET block inside LET block (and vice-versa)

Is it considered idiomatic or non-idiomatic to have a LET block nested inside a FLET/LABELS block?
Again, I may be coming at this all wrong, but I'm trying to mimic the generic where block in Haskell (so I have a defun and I want to write code that uses certain temporary bindings for values and functions).
In case these are non-idiomatic (after all, I shouldn't expect to transfer over usage from one language to another), what's the right way to do this ?
E.g. something like (stupid example follows ...)
(defun f (x)
  (let* ((x 4)
         (y (1+ x)))
    (flet ((g (x) (+ 2 x)))
      (g y))))
You want to know if it's a difference of preference between:
(defun f (x)
(let* ((x 4) (y (1+ x)))
(flet ((g (x) (+ 2 x)))
(g y))))
and
(defun f (x)
(flet ((g (x) (+ 2 x)))
(let* ((x 4) (y (1+ x)))
(g y))))
?
It really doesn't matter which order you put flet/labels and let/let* in this case. Both produce the same result, and your CL implementation might well compile them to the same code anyway.
In a LISP-1 you would have put it in the same let and then the question would be if you should put the lambda first or last. Seems like taste to me.
The only case where there is a difference is when the local function uses a value computed in the let as a free variable. Like this:
(defun f (x)
(let ((y (1+ x)))
(flet ((g (x) (+ 2 x y))) ; y is free in g, bound by the enclosing let
(g x))))
(f 5) ; ==> 13
Switching order is now impossible without moving logic since the function uses a free variable. You could put the let inside the definition of g like this:
(defun f (x)
(flet ((g (z) ; renamed to not shadow original x
(let* ((y (1+ x)))
(+ 2 z y))))
(g x)))
But imagine you used it with mapcar, reduce or recursion. Then it would have done the calculation for every iteration instead of once before the call. These are the cases that really matter.
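The cost difference described above can be made visible with a small Python sketch. The names f_hoisted and f_inner and the log list are hypothetical; the log only records how often the intermediate value y is computed.

```python
def f_hoisted(x, items, log):
    # y is computed once, outside the local function (like let around flet)
    log.append("y computed")
    y = x + 1

    def g(z):
        return 2 + z + y  # y is a free variable of g

    return [g(z) for z in items]

def f_inner(x, items, log):
    def g(z):
        # y is recomputed on every call (like let inside the flet body)
        log.append("y computed")
        y = x + 1
        return 2 + z + y

    return [g(z) for z in items]
```

Both functions return the same list, but for three items the first logs one computation and the second logs three.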

Scheme equivalent to Haskell where clause

I am just learning scheme, but I would love to be able to repeat myself less.
Is there a way I can assign a name to a subexpression in the local scope?
As per the comment:
Haskell where clause
x = s * t
  where s = 10
        t = 20
x should be 200 in this case.
Let (or letrec for recursive bindings), e.g.:
(define (f g)
(let ((x 1) (y (* g 2)))
(+ x y)))
