I am looking for some advice or good practice about Threads in common lisp. Basically, I am trying to synchronise some threads with a global variable +clock+ (set as a thread too). I am a bit confused about different concepts as join-process, process-wait, make-mutex/make-lock, condition-variable, etc.
I am using ccl and sbcl, so I should probably use bordeaux-threads, but this is just a convenient way to manage both.
In short, my code works but when I add a thread, instead to share the cpu resources, this one increases beyond 150%.
;; for instance using CCL64 Version 1.12 DarwinX8664
;; --- THREADS-SET-1
(defvar +buffer+ nil)
(defvar +buffer-size+ 30)
(defparameter +compute+
(ccl:process-preset (ccl:make-process "+compute+")
#'(lambda ()
(loop do
(push (do-some-computation) +buffer+)
(sleep 0.1)))
'+compute+))
(defparameter +osc-send+
(ccl:process-preset (ccl:make-process "+osc-send+")
#'(lambda ()
(loop do
(when +buffer+
(OSCsend (car (last +buffer+)))
(setf +buffer+ (butlast +buffer+))
(if (> (length +buffer+) +buffer-size+)
(ccl:process-suspend +compute+)
(ccl:process-resume +compute+)))
(sleep (some-time))))
'+osc-send+))
;; commands:
(progn
(ccl:process-enable +compute+)
(sleep 1)
(ccl:process-enable +osc-send+))
(progn
(ccl:process-suspend +compute+)
(ccl:process-suspend +osc-send+))
(progn
(ccl:process-resume +compute+)
(ccl:process-resume +osc-send+))
(progn
(ccl:process-kill +compute+)
(ccl:process-kill +osc-send+))
;; when these threads as THREADS-SET-1 are 'playing' dx86cl64 takes more than 90% of cpu
;; and when I add some other threads as THREADS-SET-2, the cpu goes beyond 150%
;; needless to say that I did not try to add a third one before to solve this issue...
The final idea is to synchronise THREADS-SET-1 and THREADS-SET-2 with +clock+.
Maybe lparralel can be relevant in that context, if so thanks to illustrate how to use it in that case. Thanks in advance for any help or any reference book to learn about this topic.
Finally I solved the issue by making one single thread for each 'routine' plus one for the 'clock'. Then the cpu activity decreased until around 50% for 3 routines.
Anyway, I will be pleased if somebody could explain why in the first example the threads take all that cpu resources and not when I do one thread by routine. I suspect the use of process-suspend/resume instead of mutex... An affair to follow! :)
Related
In the Lparallel API, the recommended way to terminate all threaded tasks is to stop the kernel with (lparallel:end-kernel). But when a thread is blocking—eg, with (pop-queue queue1) waiting for an item to appear in the queue—it will still be active when the kernel is stopped. In this case (at least in SBCL) the kernel shutdown occasionally (but not every time) fails with:
debugger invoked on a SB-KERNEL:BOUNDING-INDICES-BAD-ERROR in thread
#<THREAD "lparallel" RUNNING {1002F04973}>:
The bounding indices 1 and NIL are bad for a sequence of length 0.
See also:
The ANSI Standard, Glossary entry for "bounding index designator"
The ANSI Standard, writeup for Issue SUBSEQ-OUT-OF-BOUNDS:IS-AN-ERROR
debugger invoked on a SB-SYS:INTERACTIVE-INTERRUPT in thread
#<THREAD "main thread" RUNNING {10012E0613}>:
Interactive interrupt at #x1001484328.
I’m assuming this has something to do with the blocking thread not terminating correctly. How should a blocking thread be properly terminated before shutting down the kernel? (The API says kill-tasks should only be used in exceptional circumstances, which I’m taking not to apply to this “normal” shutdown circumstance.)
The problem with killing a thread is that it might happen anywhere, when the thread could be in any unknown state.
The only way to safely terminate a thread it is to let it shutdown itself gracefully, meaning you expect that during normal operations, there is a way for the thread to know it should stop working. Then you can properly clean your resources, close databases, free foreign pointers, log all things, ...
The queues you are using have operations that can timeout, that is a simple yet safe way to ensure you can avoid blocking forever and exit properly. But that's not the only option (you can use them in addition to what is shown below).
Shared / global flag
When a timeout occurs, or when you receive a message, you check a global boolean variable (or one that is shared among all interested threads). That's also a simple way to exit, and it can be read by multiple threads. This is however a concurrent access, so you should use locks or atomic operations (http://www.sbcl.org/manual/#Atomic-Operations), for example use defglobal and a fixnum type with atomic-incf, etc.
Control messages
Send control data in the queues and use them to determine how to shutdown gracefully, and how to propagate the information down the pipes, or how to restart things. This is safe (just message-passing) and allows any kind of control you might want to implement in your thread.
(defpackage :so (:use :cl :bt :lparallel.queue))
(in-package :so)
Let's define two services.
The first one echoes back its input:
(defun echo (in out)
(lambda ()
(loop
for value = (pop-queue in)
do (push-queue value out)
until (eq value :stop))))
Notice how it is expected to finish properly when given a :stop input, and how it also propagates the :stop message to its output queue.
The second thread will perform a modular addition, and also sleeps a bit between requests:
(defun modulo-adder (x m in out)
(lambda ()
(loop
for value = (progn (sleep 0.02)
(pop-queue in))
do (push-queue (typecase value
(keyword value)
(number (mod (+ x value) m)))
out)
until (eq value :stop))))
Create queues:
(defparameter *q1* (make-queue))
(defparameter *q2* (make-queue))
Create threads:
(progn
(bt:make-thread (echo *q1* *q2*) :name "echo")
(bt:make-thread (modulo-adder 5 1024 *q2* *q1*) :name "adder"))
Both threads are connected to each others in a circular fashion, creating an infinite loop of additions. No value is currently exchanged between threads, and you can see them running for example with slime-list-threads or any other implementation-provided way; In any case (bt:all-threads) returns a list.
slime-list-threads
10 adder Running
11 echo Running
...
Add an item, now there is an infinite exchange of data between threads:
(push-queue 10 *q1*)
Wait, then stop them both:
(push-queue :stop *q1*)
Both threads stopped gracefully (they are no more visible in lists of threads).
We can inspect what remains in the queues (result vary from one test to another):
(list (try-pop-queue *q1*)
(try-pop-queue *q2*))
(99 NIL)
(list (try-pop-queue *q1*)
(try-pop-queue *q2*))
(:STOP NIL)
(list (try-pop-queue *q1*)
(try-pop-queue *q2*))
(NIL NIL)
Interrupting a thread
You create a service, controlled by messages or a global flag, but then you have a bug and the thread hangs. Instead of killing it and lose everything, you want at least to unwind the thread stack properly. This is a dangerous too, but you can use bt:interrupt to stop a thread anywhere it is running right now and execute a function.
(define-condition stop () ())
(defun signal-stop ()
(signal 'stop))
(defun endless ()
(let ((output *standard-output*))
(lambda ()
(print "START" output)
(unwind-protect (handler-case (loop)
(stop ()
(print "INTERRUPTED" output)))
(print "STOP" output)))))
Start it:
(bt:make-thread (endless) :name "loop")
This prints "START" and loops.
Then we interrupt it:
(bt:interrupt-thread (find "loop"
(bt:all-threads)
:test #'string=
:key #'bt:thread-name)
#'signal-stop)
The following is printed:
"INTERRUPTED"
"STOP"
Those messages would not be printed if the thread was killed, but note that you could still manage to have corrupted data given how random the interruption is. Also, it can unblock blocking calls like sleep or pop-queue.
This code is really pushing the limits of my understanding so bear with me.
Previously I implemented coroutines in Racket in the following code:
;; Coroutine definition
(define (make-generator procedure)
(define last-return values)
(define last-value #f)
(define status 'suspended)
(define (last-continuation _)
(let ([result (procedure yield)])
(last-return result)))
(define (yield value)
(call/cc (lambda (continuation)
(set! last-continuation continuation)
(set! last-value value)
(set! status 'suspended)
(last-return value))))
(lambda args
(call/cc (lambda (return)
(set! last-return return)
(cond ((null? args)
(let ()
(set! status 'dead)
(last-continuation last-value)))
((eq? (car args) 'coroutine?) 'coroutine)
((eq? (car args) 'status?) status)
((eq? (car args) 'dead?) (eq? status 'dead))
((eq? (car args) 'alive?) (not (eq? status 'dead)))
((eq? (car args) 'kill!) (set! status 'dead))
(#t (apply last-continuation args)))))))
;;Define a function that will return a suspended coroutine created from given args and body forms
(define-syntax (define-coroutine stx)
(syntax-case stx ()
((_ (name . args) . body )
#`(define (name . args)
(make-generator
(lambda (#,(datum->syntax stx 'yield))
. body))))))
What I want to do is implement an exception handler (with-handlers) that calls the (yield) function. The idea is a second thread can send a signal to the thread evaluating the coroutine forcing it to yield when its running for too long.
I've tried the following in the args lambda, which successfully returned early but later evaluations of the coroutine (my-coroutine 'dead?) returned that the coroutine was in the 'dead state:
(with-handlers
([exn:break?
(lambda (break)
(yield 'coroutine-timeout))])
(break-enabled #t) ;register for yield requests from coroutine manager thread
(last-continuation last-value))))
Alternatively, I've tried the following, but it didn't produce a procedure that can be applied to arguments:
(with-handlers
([exn:break?
(lambda (break)
(set! last-continuation (exn:break-continuation break))
(set! last-value 'coroutine-timeout)
(set! status 'suspended)
(last-return 'coroutine-timeout))])
(break-enabled #t) ;register for yield requests from coroutine manager thread
(last-continuation last-value))))
I'm trying to understand how continuations and exceptions interact/block each other. It seems like I may need to use Parameters somehow?
How can I successfully write a signal handler that will (yield) correctly so that I can resume the coroutine later?
Edit:
I am mixing metaphores here (cooperative and preemptive multithreading). However, my question seems possible to me (from a layman's perspective) as I can evaluate functions defined in my coroutine (including (yield)) from within the exception handler. I'm essentially trying to limit resource starvation in my worker threads, as well as mitigate a certain class of deadlock (where task 1 can only complete after task 2 has run, and there are no free threads for task 2 to run on).
I have written a (go) function for these coroutines that is modeled after go's goroutines. I assume they achieve their asynchronous behavior on single threads by having cooperative yield checks in the underlying code they control. Perhaps it runs in a VM as you suggested and there are checks, perhaps their operators have the checks. Whatever the case may be I'm trying to achieve similar behavior with a different strategy.
As far as "how continuations and exceptions interact/block each other," it's important to know that exceptions are implemented using delimited continuations. In particular, the exception system makes use of continuation barriers. Both of these are introduced in the Racket reference §1.1.12 Prompts, Delimited Continuations, and Barriers:
A continuation barrier is another kind of continuation frame that prohibits certain replacements of the current continuation with another. … A continuation barrier thus prevents “downward jumps” into a continuation that is protected by a barrier. Certain operations install barriers automatically; in particular, when an exception handler is called, a continuation barrier prohibits the continuation of the handler from capturing the continuation past the exception point.
You may also want to see the material on exceptions from later in the evaluation model section and from the control flow section, which cites an academic paper on the subject. The differences between call-with-exception-handler and with-handlers are also relevant to capturing continuations from within exception handlers.
Basically, though, the continuation barrier prevents using exception handlers for continuations that you abort and might later resume: you should use continuation barriers and prompts directly for that.
More broadly, I would suggest that you look at Racket's substantial existing support for concurrency. Even if you want to implement coroutines as an experiment, they would be useful for inspiration and examples of implementation techniques. Racket comes with derived constructs such as engines ("processes that can be preempted by a timer or other external trigger") and generators, in addition to the fundamental building-blocks, green threads and synchronizable events (which are based on Concurrent ML model).
The gist of your question:
How can I implement an exception handler for coroutines, such that a second thread can send
a signal to a thread evaluating a coroutine, forcing it to yield
when its running for too long.
And once more:
How can I successfully write a signal handler that will (yield)
correctly so that I can resume the coroutine later?
It seems to me that you are not cleanly separating cooperative and preemptive multitasking, since you seem to want to combine coroutines (cooperative) with time-outs (preemptive). (You also mention threads, but seem to conflate them with coroutines.)
With cooperative multitasking there is no way that you can force anyone else to stop running; hence the moniker "cooperative".
With preemptive multitasking you do not need to yield, because the scheduler will preempt you when your allocated time has run out. The scheduler is also responsible for saving your continuation, but it is not the (scheduler's) current continuation, since the scheduler is wholly separate from the user thread.
Perhaps the closest thing to what you are proposing is simulating preemptive multitasking via polling. Every (simulated) timestep (i.e. a VM instruction) the simulation needs to check whether any interrupts/signals have been received by a running thread and handle them.
Context
I need to do computations using multi-threading. I use SBCL and portability is not a concern. I am aware that bordeaux-threads and lparallel exist but I want to implement something at the relatively low level provided by the specific SBCL threading implementation. I need maximal speed, even at the expense of readability/programming effort.
Example of computation intensive operation
We can define a sufficiently computation-intensive function that will benefit from multi-threading.
(defun intensive-sqrt (x)
"Dummy calculation for intensive algorithm.
Approx 50 ms for 1e6 iterations."
(let ((y x))
(dotimes (it 1000000 t)
(if (> y 1.01d0)
(setf y (sqrt y))
(setf y (* y y y))))
y))
Mapping each computation to a thread and execute
Given a list of argument-lists llarg and a function fun, we want to compute nthreads results and return the list of results res-list. Here is what I came up with using the resources I found (see below).
(defmacro splice-arglist-help (fun arglist)
"Helper macro.
Splices a list 'arglist' (arg1 arg2 ...) into the function call of 'fun'
Returns (funcall fun arg1 arg2 ...)"
`(funcall ,fun ,#arglist))
(defun splice-arglist (fun arglist)
(eval `(splice-arglist-help ,fun ,arglist)))
(defun maplist-fun-multi (fun llarg nthreads)
"Maps 'fun' over list of argument lists 'llarg' using multithreading.
Breaks up llarg and feeds it to each thread.
Appends all the result lists at the end."
(let ((thread-list nil)
(res-list nil))
;; Create and run threads
(dotimes (it nthreads t)
(let ((larg-temp (elt llarg it)))
(setf thread-list (append thread-list
(list (sb-thread:make-thread
(lambda ()
(splice-arglist fun larg-temp))))))))
;; Join threads
;; Threads are joined in order, not optimal for speed.
;; Should be joined when finished ?
(dotimes (it (list-length thread-list) t)
(setf res-list (append res-list (list (sb-thread:join-thread (elt thread-list it))))))
res-list))
nthreads does not necessarily match the length of llarg, but I avoid the extra book-keeping just for the example simplicity's sake. I also omitted the various declare used for optimization.
We can test the multi-threading and compare timings using :
(defparameter *test-args-sqrt-long* nil)
(dotimes (it 10000 t)
(push (list (+ 3d0 it)) *test-args-sqrt-long*))
(time (intensive-sqrt 5d0))
(time (maplist-fun-multi #'intensive-sqrt *test-args-sqrt-long* 100))
The number of threads is quite high. I think the optimum would be to use as many threads as the CPU has, but I noticed the performance drop-off is barely noticeable in terms of time/operations. Doing more operations would involve breaking up the input lists into smaller pieces.
The above code outputs, on a 2 cores/4 threads machine :
Evaluation took:
0.029 seconds of real time
0.015625 seconds of total run time (0.015625 user, 0.000000 system)
55.17% CPU
71,972,879 processor cycles
22,151,168 bytes consed
Evaluation took:
1.415 seconds of real time
4.703125 seconds of total run time (4.437500 user, 0.265625 system)
[ Run times consist of 0.205 seconds GC time, and 4.499 seconds non-GC time. ]
332.37% CPU
3,530,632,834 processor cycles
2,215,345,584 bytes consed
What's bugging me
The example I've given works very well and is robust (ie results don't get mixed up between threads, and I experience no crash). The speed gain is also there and the computations do use several cores/threads on the machines I've tested this code on. But there are a few things that I'd like an opinion/help on :
The use of the argument list llarg and larg-temp. Is this really necessary ? Is there any way to avoid manipulating potentially huge lists ?
Threads are joined in the order in which they are stored in the thread-list. I imagine this would not be optimal if operations each took a different time to complete. Is there a way to join each thread when it is finished, instead of waiting ?
The answers should be in the resources I already found, but I find the more advanced stuff hard to grapple with.
Resources found so far
http://www.sbcl.org/manual/#Threading
http://cl-cookbook.sourceforge.net/process.html
https://lispcookbook.github.io/cl-cookbook/process.html
Stylistic issues
The splice-arglist helpers are not needed at all (so I'll also skip details in them). Use apply in your thread function instead:
(lambda ()
(apply fun larg-temp))
You don't need to (and should not) index into a list, because that is O(n) for each lookup—your loops are quadratic. Use dolist for simple side-effective loops, or loop when you have e. g. parallel iteration:
(loop :repeat nthreads
:for args :in llarg
:collect (sb-thread:make-thread (lambda () (apply fun args))))
For going over a list while creating a new list of the same length where each element is calculated from the corresponding element in the source list, use mapcar:
(mapcar #'sb-thread:join-thread threads)
Your function thus becomes:
(defun map-args-parallel (fun arglists nthreads)
(let ((threads (loop :repeat nthreads
:for args :in arglists
:collect (sb-thread:make-thread
(lambda ()
(apply fun args))))))
(mapcar #'sb-thread:join-thread threads)))
Performance
You are right that one usually creates only as many threads as ca. the number of cores available. If you test performance by always creating n threads, then joining them, then going to the next batch, you will indeed have not much difference in performance. That is because the inefficiency lies in creating the threads. A thread is about as resource intensive as a process.
What one usually does is to create a thread pool where the threads do not get joined, but instead reused. For that, you need some other mechanism to communicate arguments and results, e. g. channels (e. g. from chanl).
Note however that e. g. lparallel already provides a pmap function, and it does things right. The purpose of such wrapper libraries is not only to give the user (programmer) a nice interface, but also to think really hard about the problems and optimize sensibly. I am quite confident that pmap will be significantly faster than your attempt.
What is the idiomatic Clojure way to create a thread that loops in the background doing updates to some shared refs and to manage its lifetime? I find myself using future for this, but it feels like a little bit of a hack as I never return a meaningful value. E.g.:
(future (loop [] (do
(Thread/sleep 100)
(dosync (...))
(recur))))
Also, I need to be careful to future-cancel this when the background processing is no longer needed. Any tips on how to orchestrate that in a Clojure/Swing application would be nice. E.g. a dummy JComponent that is added to my UI that is responsible for killing the thread when the window is closed may be an idea.
You don't need a do in your loop; it's implied. Also, while there's nothing wrong with an unconditional loop-recur, you may as well use (while true ...).
future is a fine tool for this; don't let it bother you that you never get a value back. That should really bother you if you use an agent rather than a future, though - agents without values are madness.
However, who said you need to future-cancel? Just make one of the steps in your future be to check whether it's still needed. Then no other parts of your code need to keep track of futures and decide when to cancel them. So something like
(future (loop []
(Thread/sleep 100)
(when (dosync
(alter some-value some-function))
(recur)) ; quit if alter returns nil
))
would be a viable approach.
Using agents for background recurring tasks seems neater to me
(def my-ref (ref 0))
(def my-agent (agent nil))
(defn my-background-task [x]
(do
(send-off *agent* my-background-task)
(println (str "Before " #my-ref))
(dosync (alter my-ref inc))
(println "After " #my-ref)
(Thread/sleep 1000)))
Now all you have to do is to initiate the loop
(send-off my-agent my-background-task)
The my-backgound-task function is sending itself to the calling agent after its invocation is done.
This is the way how Rich Hickey performs recurring tasks in the ant colony example application: Clojure Concurrency
When I run Clojure code from the Swank repl in emacs, the main thread will print out messages using printf to the repl. But if I run agents or explicitly create other threads which also print, sometimes the output doesn't show up, and other times it shows up in the console window where I'm running Swank. I'd love to understand why.
Edit: Thanks to Daniel's answer below I now know that the other threads do not have out bound to the output of the REPL. This code works because you pass in the out from where you run from. However my new problem is that this code now blocks per thread so rather than running in parallel it runs each thread one at a time, so I need a more thread aware output method.
(defn sleeper-thread [out id t]
"Sleep for time T ms"
(binding [*out* out]
(printf "%d sleeping for time %d\n" id t)
(Thread/sleep t)
(printf "%d slept\n" id)))
(defn test-threads [n out]
(dotimes [x n]
(.start (Thread. (#(sleeper-thread %1 %2 %3) out x (+ 2000 (rand-int 5000)))))))
The reason is, that in other threads *out* is not bound to the REPL's stream. Try something like this:
(let [repl-out *out*]
(defn foo []
(binding [*out* repl-out]
...)))
Now, when running foo from another thread, *out* will be bound to whatever it was when you defined the function (i.e. the SLIME REPL), so printing will work as expected.
Or, for testing:
(defmacro future-output [& body]
`(let [out# *out*]
(future
(binding [*out* out#]
~#body))))
Note: This is untested, because I have no working Clojure/SLIME here atm, but that code worked a few months ago. There might be differences in newer Versions of Clojure (1.3 Alpha 2):
code path for using vars is now
much faster for the common case,
and you must explicitly ask for :dynamic bindability
If you are struggling with the same using cake, there should be a log file with the output in the .cake/cake.log file in your project root (where project.clj lives).