Understanding output in Clojure using swank/slime - multithreading

When I run Clojure code from the Swank repl in emacs, the main thread will print out messages using printf to the repl. But if I run agents or explicitly create other threads which also print, sometimes the output doesn't show up, and other times it shows up in the console window where I'm running Swank. I'd love to understand why.
Edit: Thanks to Daniel's answer below, I now know that the other threads do not have *out* bound to the REPL's output. The code below works because you pass in the *out* of the thread you run it from. However, my new problem is that this code now blocks per thread, so rather than running the threads in parallel it runs them one at a time. I need a more thread-aware output method.
(defn sleeper-thread
  "Sleep for time T ms"
  [out id t]
  (binding [*out* out]
    (printf "%d sleeping for time %d\n" id t)
    (Thread/sleep t)
    (printf "%d slept\n" id)))
(defn test-threads [n out]
  (dotimes [x n]
    (.start (Thread. (#(sleeper-thread %1 %2 %3) out x (+ 2000 (rand-int 5000)))))))
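The one-at-a-time behaviour appears to come from test-threads itself rather than from the output stream. The form (#(sleeper-thread %1 %2 %3) out x ...) calls the anonymous function immediately, so sleeper-thread, including its Thread/sleep, runs on the calling thread inside dotimes, and only its return value (nil) is handed to Thread.. A minimal sketch of the fix passes the function itself to Thread.. Returning the started threads and the explicit flush calls are additions of mine, not part of the original: printf does not flush the writer by itself, which can delay output.

```clojure
(defn sleeper-thread
  "Sleeps for t ms, reporting progress to the writer out."
  [out id t]
  (binding [*out* out]
    (printf "%d sleeping for time %d\n" id t)
    (flush)                      ; printf alone does not flush the writer
    (Thread/sleep t)
    (printf "%d slept\n" id)
    (flush)))

(defn test-threads
  "Starts n threads that sleep concurrently and returns the started
  threads so the caller can .join them."
  [n out]
  (doall
   (for [x (range n)]
     ;; The zero-argument fn itself is the Runnable, so its body
     ;; (including the sleep) runs on the new thread, not the caller.
     (doto (Thread. #(sleeper-thread out x (+ 2000 (rand-int 5000))))
       (.start)))))
```

Because test-threads now returns as soon as the threads are started, the total run time is roughly the longest single sleep rather than the sum of all of them.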

The reason is that in other threads, *out* is not bound to the REPL's stream. Try something like this:
(let [repl-out *out*]
  (defn foo []
    (binding [*out* repl-out]
      ...)))
Now, when running foo from another thread, *out* will be bound to whatever it was when you defined the function (i.e. the SLIME REPL), so printing will work as expected.
Or, for testing:
(defmacro future-output [& body]
  `(let [out# *out*]
     (future
       (binding [*out* out#]
         ~@body))))
Note: This is untested, because I have no working Clojure/SLIME here at the moment, but that code worked a few months ago. There might be differences in newer versions of Clojure; the changelog for 1.3 Alpha 2 says: "code path for using vars is now much faster for the common case, and you must explicitly ask for :dynamic bindability".
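Since Clojure 1.3, future, send and send-off convey the caller's dynamic bindings (binding conveyance), so on current versions a macro like future-output should mostly be unnecessary for futures. A raw java.lang.Thread does not convey bindings, though. bound-fn* captures the current bindings, *out* included, into a function that can be handed to Thread.. A minimal sketch; start-thread-with-bindings is a hypothetical helper name:

```clojure
(defn start-thread-with-bindings
  "Hypothetical helper: starts f on a new java.lang.Thread with the caller's
  dynamic bindings (including *out*) conveyed via bound-fn*. Returns the Thread."
  ^Thread [f]
  ;; bound-fn* snapshots the bindings of the calling thread right now;
  ;; the returned fn re-establishes them when it runs on the new thread.
  (doto (Thread. (bound-fn* f))
    (.start)))
```

Called from the SLIME REPL, (start-thread-with-bindings #(println "hi")) should print to the REPL buffer rather than to the console where Swank runs.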

If you are struggling with the same using cake, there should be a log file with the output in the .cake/cake.log file in your project root (where project.clj lives).

Related

Correct way to do multithreaded computations in SBCL

Context
I need to do computations using multi-threading. I use SBCL and portability is not a concern. I am aware that bordeaux-threads and lparallel exist but I want to implement something at the relatively low level provided by the specific SBCL threading implementation. I need maximal speed, even at the expense of readability/programming effort.
Example of a computation-intensive operation
We can define a sufficiently computation-intensive function that will benefit from multi-threading.
(defun intensive-sqrt (x)
  "Dummy calculation for intensive algorithm.
Approx 50 ms for 1e6 iterations."
  (let ((y x))
    (dotimes (it 1000000 t)
      (if (> y 1.01d0)
          (setf y (sqrt y))
          (setf y (* y y y))))
    y))
Mapping each computation to a thread and executing it
Given a list of argument-lists llarg and a function fun, we want to compute nthreads results and return the list of results res-list. Here is what I came up with using the resources I found (see below).
(defmacro splice-arglist-help (fun arglist)
  "Helper macro.
Splices a list 'arglist' (arg1 arg2 ...) into the function call of 'fun'
Returns (funcall fun arg1 arg2 ...)"
  `(funcall ,fun ,@arglist))
(defun splice-arglist (fun arglist)
  (eval `(splice-arglist-help ,fun ,arglist)))
(defun maplist-fun-multi (fun llarg nthreads)
  "Maps 'fun' over list of argument lists 'llarg' using multithreading.
Breaks up llarg and feeds it to each thread.
Appends all the result lists at the end."
  (let ((thread-list nil)
        (res-list nil))
    ;; Create and run threads
    (dotimes (it nthreads t)
      (let ((larg-temp (elt llarg it)))
        (setf thread-list (append thread-list
                                  (list (sb-thread:make-thread
                                         (lambda ()
                                           (splice-arglist fun larg-temp))))))))
    ;; Join threads
    ;; Threads are joined in order, not optimal for speed.
    ;; Should be joined when finished ?
    (dotimes (it (list-length thread-list) t)
      (setf res-list (append res-list (list (sb-thread:join-thread (elt thread-list it))))))
    res-list))
nthreads does not necessarily match the length of llarg, but I skipped the extra bookkeeping for the sake of simplicity. I also omitted the various declare forms used for optimization.
We can test the multi-threading and compare timings using:
(defparameter *test-args-sqrt-long* nil)
(dotimes (it 10000 t)
  (push (list (+ 3d0 it)) *test-args-sqrt-long*))
(time (intensive-sqrt 5d0))
(time (maplist-fun-multi #'intensive-sqrt *test-args-sqrt-long* 100))
The number of threads is quite high. I think the optimum would be to use as many threads as the CPU has, but I noticed the performance drop-off is barely noticeable in terms of time per operation. Doing more operations would involve breaking up the input lists into smaller pieces.
The above code outputs the following on a machine with 2 cores/4 threads:
Evaluation took:
0.029 seconds of real time
0.015625 seconds of total run time (0.015625 user, 0.000000 system)
55.17% CPU
71,972,879 processor cycles
22,151,168 bytes consed
Evaluation took:
1.415 seconds of real time
4.703125 seconds of total run time (4.437500 user, 0.265625 system)
[ Run times consist of 0.205 seconds GC time, and 4.499 seconds non-GC time. ]
332.37% CPU
3,530,632,834 processor cycles
2,215,345,584 bytes consed
What's bugging me
The example I've given works very well and is robust (i.e. results don't get mixed up between threads, and I experience no crashes). The speed gain is also there, and the computations do use several cores/threads on the machines I've tested this code on. But there are a few things that I'd like an opinion/help on:
The use of the argument list llarg and larg-temp. Is this really necessary? Is there any way to avoid manipulating potentially huge lists?
Threads are joined in the order in which they are stored in the thread-list. I imagine this would not be optimal if operations each took a different time to complete. Is there a way to join each thread when it is finished, instead of waiting ?
The answers should be in the resources I already found, but I find the more advanced stuff hard to grapple with.
Resources found so far
http://www.sbcl.org/manual/#Threading
http://cl-cookbook.sourceforge.net/process.html
https://lispcookbook.github.io/cl-cookbook/process.html
Stylistic issues
The splice-arglist helpers are not needed at all (so I'll skip the details in them). Use apply in your thread function instead:
(lambda ()
  (apply fun larg-temp))
You don't need to (and should not) index into a list, because that is O(n) for each lookup, which makes your loops quadratic. Use dolist for simple side-effecting loops, or loop when you have e.g. parallel iteration:
(loop :repeat nthreads
      :for args :in llarg
      ;; LOOP may reuse one binding for ARGS, so rebind it for the closure
      :collect (let ((args args))
                 (sb-thread:make-thread (lambda () (apply fun args)))))
For going over a list while creating a new list of the same length where each element is calculated from the corresponding element in the source list, use mapcar:
(mapcar #'sb-thread:join-thread threads)
Your function thus becomes:
(defun map-args-parallel (fun arglists nthreads)
  (let ((threads (loop :repeat nthreads
                       :for args :in arglists
                       ;; fresh binding per thread; LOOP may reuse ARGS
                       :collect (let ((args args))
                                  (sb-thread:make-thread
                                   (lambda ()
                                     (apply fun args)))))))
    (mapcar #'sb-thread:join-thread threads)))
Performance
You are right that one usually creates only roughly as many threads as there are cores available. If you test performance by always creating n threads, then joining them, then going to the next batch, you will indeed not see much difference in performance. That is because the inefficiency lies in creating the threads. A thread is about as resource-intensive as a process.
What one usually does is to create a thread pool where the threads do not get joined, but instead reused. For that, you need some other mechanism to communicate arguments and results, e.g. channels (e.g. from chanl).
Note, however, that e.g. lparallel already provides a pmap function, and it does things right. The purpose of such wrapper libraries is not only to give the user (programmer) a nice interface, but also to think really hard about the problems and optimize sensibly. I am quite confident that pmap will be significantly faster than your attempt.

How to execute some Clojure futures in a single thread?

I'd like to create some futures in Clojure and run them all on a specific thread, to make sure they run one-at-a-time. Is this possible?
It's not hard to wrap the Java libraries to do this, but before I do that I want to make sure I'm not missing a Clojure way of doing it. In Java I can do this by implementing FutureTask and submitting those tasks to a single-threaded executor.
Clojure's future macro calls the future-call function, which uses a dedicated executor service. This means that you have no control over it to enforce sequential execution.
On the other hand, you can use promise objects instead of futures, plus one future thread that delivers the results sequentially. The promise API is similar to what futures provide: promises support deref and realized? too.
The following code example has the subtasks executed sequentially on a new thread in the background while the immediately returned result of the function contains the promises to the computed values.
(defn start-async-calc []
  (let [f1 (promise)
        f2 (promise)
        f3 (promise)]
    (future
      (deliver f1 (task-1))
      (deliver f2 (task-2))
      (deliver f3 (task-3)))
    {:task1 f1
     :task2 f2
     :task3 f3}))
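If you do want future-like objects pinned to one thread, the Java route from the question takes only a few lines of interop. A minimal sketch, assuming a single shared executor; serial-executor, serial-future-call and serial-future are hypothetical names, not part of clojure.core. The ^Callable hint matters: a Clojure fn implements both Runnable and Callable, and without the hint the call can resolve to submit(Runnable), whose Future derefs to nil.

```clojure
(import '(java.util.concurrent Callable ExecutorService Executors))

;; Hypothetical name: one worker thread, so tasks run one at a time,
;; in the order they were submitted.
(def ^ExecutorService serial-executor (Executors/newSingleThreadExecutor))

(defn serial-future-call
  "Hypothetical helper: like future-call, but runs f on serial-executor.
  Returns a java.util.concurrent.Future, which deref accepts."
  [f]
  (let [^Callable task (bound-fn* f)]   ; convey dynamic bindings, as future does
    (.submit serial-executor task)))

(defmacro serial-future
  "Hypothetical macro: like future, but the body runs on serial-executor."
  [& body]
  `(serial-future-call (fn [] ~@body)))
```

Note that realized? will not work on the returned object (it is a plain java.util.concurrent.Future, so use .isDone), and the executor's worker is not a daemon thread, so call (.shutdown serial-executor) when you are done or it will keep the JVM alive.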
If you want to sequentialize the calls to future, you can do it manually like this:
(do @(future 1)
    @(future 2)
    @(future 3))
They could still run on different threads, but the next one won't be started until the previous one has finished. This is guaranteed by @ (the deref function): the thread in which you execute the do form blocks on the previous future until it completes, and only then spawns the next one.
You can prettify it with a macro like this:
(defmacro sequentialize [& futures]
  `(do ~@(map #(list `deref %) futures)))
user> (let [a (atom 1)]
        (sequentialize
         (future (swap! a #(* 10 %)))
         (future (swap! a #(+ 20 %)))
         (future (swap! a #(- %))))
        @a)
;;=> -30
This does exactly the same as the manual do. Notice that the mutations to the atom happen in order even if some threads run longer:
user> (let [a (atom 1)]
        (sequentialize
         (future (Thread/sleep 100)
                 (swap! a #(* 10 %)))
         (future (Thread/sleep 200)
                 (swap! a #(+ 20 %)))
         (future (swap! a #(- %))))
        @a)
;;=> -30
Manifold provides a way to create a future with a specific executor. It's not part of the core Clojure library, but it's still a high-quality library and probably the best option if you need more flexibility dealing with futures than the core library provides (without resorting to Java interop).
In addition to the promises mentioned, you can use a delay. Promises have the problem that you can accidentally fail to deliver them, creating a deadlock scenario that's not possible with futures and delays. The only difference between a future and a delay is the thread the work is executed on: with a future, the work is done in the background, and with a delay the work is done by the first thread that tries to deref it. So if futures are a better fit than promises, you could always do something like:
(def result-1 (delay (long-calculation-1)))
(def result-2 (delay (long-calculation-2)))
(def result-3 (delay (long-calculation-3)))
(defn run-calcs []
  @(future
     @result-1
     @result-2
     @result-3))
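A runnable variant of the delay approach, with a hypothetical long-calculation standing in for long-calculation-1/2/3. It also shows that each delay computes its value only once, so a second run-calcs call returns the cached results without re-running anything. Returning a vector instead of just the last value is my change, for easier checking.

```clojure
(def calls (atom []))  ; records which calculations actually ran, in order

;; Hypothetical stand-in for long-calculation-1/2/3.
(defn long-calculation [n]
  (Thread/sleep 100)
  (swap! calls conj n)
  (* n n))

(def result-1 (delay (long-calculation 1)))
(def result-2 (delay (long-calculation 2)))
(def result-3 (delay (long-calculation 3)))

(defn run-calcs
  "Forces the delays one after another on a single future thread
  and blocks the caller until all three are done."
  []
  @(future [@result-1 @result-2 @result-3]))
```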

How to use lightweight thread in Clojure?

I was trying to rewrite this Rust code in Clojure:
fn main() {
    let nums = [1, 2];
    let noms = ["Tim", "Eston", "Aaron", "Ben"];
    let mut odds = nums.iter().map(|&x| x * 2 - 1);
    for num in odds {
        spawn(proc() {
            println!("{:s} says hello from a lightweight thread!", noms[num]);
        });
    }
}
Here is Clojure code that does almost the same as the Rust code above:
(def noms ["Tim", "Eston", "Aaron", "Ben"])
(doseq [i (take-nth 2 (rest noms))]
  (println i "says hello from a lightweight thread!"))
Except that it does not use threads.
How do I write a "lightweight" thread (or something equivalent in Clojure terms)?
This code is almost a direct translation from the imperative programming style. What's the idiomatic way to write it?
Beware of the terminology: the Clojure example using futures doesn't create a lightweight thread but rather a native OS thread. Rust does the same by default, but it has a green thread runtime that provides the lightweight semantics. Reference: http://rustbyexample.com/tasks.html
Clojure doesn't support lightweight threads by default but you can create them via the library core.async. So the code would look something like this:
(require '[clojure.core.async :as async :refer [go]])
(doseq [i (take-nth 2 (rest noms))]
  (go (print (str i " says hello from a lightweight thread!\n"))))
The go macro above will create a proper lightweight thread *
* The statement wasn't clear as pointed out in the comments so I'll try and clarify: core.async is backed by a Java thread pool, however the go macro turns your code into a state machine that uses "parking" instead of "blocking". This just means you can have tens of thousands of go blocks backed by a limited number of real threads.
When using blocking IO, however, this benefit is lost, as explained in the post @hsestupin refers to below.
To understand how core.async manages to have lightweight threads on the JVM, I recommend starting here: https://www.youtube.com/watch?v=R3PZMIwXN_g - it's a great video into the internals of the go macro.
The macro itself is implemented here
You can use future to spawn threads. In this case you could do something like:
(doseq [i (take-nth 2 (rest noms))]
  (future (print (str i " says hello from a lightweight thread!\n"))))
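The doseq above discards the futures, so a script may exit before they print. A minimal sketch that mirrors the Rust code more closely (indices computed as x * 2 - 1 for x in [1 2]) and waits for every future. greet-concurrently is a hypothetical name, and these are still ordinary JVM threads from future's pool, not lightweight ones.

```clojure
(def noms ["Tim" "Eston" "Aaron" "Ben"])

(defn greet-concurrently
  "Starts one future per odd index, then waits for all of them
  so no output is lost if the program exits right afterwards."
  []
  (let [futs (doall
              (for [num (map #(- (* 2 %) 1) [1 2])]
                (future
                  (let [msg (str (noms num) " says hello from a future!")]
                    ;; a single print call per line, so each line goes out in one write
                    (print (str msg "\n"))
                    (flush)
                    msg))))]
    ;; deref blocks until each future finishes; results keep input order
    (mapv deref futs)))
```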

Clojure - Using agents slows down execution too much

I am writing a benchmark for a program in Clojure. I have n threads accessing a cache at the same time. Each thread will access the cache x times. Each request should be logged inside a file.
To this end I created an agent that holds the path to the file to be written to. When I want to write, I send-off a function that writes to the file and simply returns the path. This way my file writes are race-condition free.
When I execute my code without the agent, it finishes in a few milliseconds. When I use the agent and have each thread send-off to it on every request, my code runs horribly slowly. I'm talking minutes.
(declare load-cache-only-work)

(defn load-cache-only
  "Test requesting from the cache only."
  [usercount cache-size]
  ;; Path of the file to write the benchmark results to.
  (let [sink "benchmarks/results/load-cache-only.txt"
        data-agent (agent sink)
        ;; Data for our backing store generated at runtime.
        store-data (into {} (map vector
                                 (map (comp keyword str)
                                      (repeat "item")
                                      (range 1 cache-size))
                                 (range 1 cache-size)))
        cache (create-full-cache cache-size store-data)]
    (barrier/run-with-barrier
     (fn [] (load-cache-only-work cache store-data data-agent))
     usercount)))

(defn load-cache-only-work
  "For use with 'load-cache-only'. Requests each item in the cache once.
  We time how long it takes for each request to be handled."
  [cache store-data data-agent]
  (let [cache-size (count store-data)
        foreachitem (fn [cache-item]
                      (let [before (System/nanoTime)
                            result (cache/retrieve cache cache-item)
                            after (System/nanoTime)
                            ;; nanoTime is in ns, so divide by 1e6 for ms
                            diff_ms ((comp str float) (/ (- after before) 1e6))]
                        ;(send-off data-agent (fn [filepath]
                        ;  (file/insert-record filepath cache-size diff_ms)
                        ;  filepath))
                        ))]
    (doall (map foreachitem (keys store-data)))))
The (barrier/run-with-barrier) code simply spawns usercount number of threads and starts them at the same time (using an atom). The function I pass is the body of each thread.
The body will simply map over store-data, which is a key-value map (e.g., {:a 1 :b 2}). The size of this map in my code right now is 10. The number of users is 10 as well.
As you can see, the code for the agent send-off is commented out. This makes the code execute normally. However, when I enable the send-offs, even without writing to the file, the execution time is too slow.
Edit:
I made each thread print a dot before it sends off to the agent.
The dots appear just as fast as without the send-off. So there must be something blocking in the end.
Am I doing something wrong?
You need to call (shutdown-agents) when you're done sending stuff to your agent if you want the JVM to exit in reasonable time.
The underlying problem is that if you don't shutdown your agents, the threads backing its threadpool will never get shut down, and prevent the JVM from exiting. There's a timeout that will shutdown the pool if there's nothing else running, but it's fairly lengthy. Calling shutdown-agents as soon as you're done producing actions will resolve this problem.
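A small sketch of that shutdown pattern, assuming an in-memory vector in place of the results file; log-agent, record! and run-benchmark are hypothetical names. await blocks until the actions queued so far have run, and shutdown-agents, called once the program is completely done with agents and futures, lets the JVM exit promptly instead of waiting for the idle pool threads to time out.

```clojure
;; Hypothetical in-memory log standing in for the results file.
(def log-agent (agent []))

(defn record!
  "Queues an entry on log-agent; returns immediately."
  [entry]
  (send-off log-agent conj entry))

(defn run-benchmark
  "Queues n entries, then blocks until the agent has processed them."
  [n]
  (dotimes [i n]
    (record! i))
  (await log-agent)   ; waits for every action sent so far from this thread
  (count @log-agent))

;; At the very end of the program:
;; (shutdown-agents)
```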

Idiomatic Clojure way to spawn and manage background threads

What is the idiomatic Clojure way to create a thread that loops in the background doing updates to some shared refs and to manage its lifetime? I find myself using future for this, but it feels like a little bit of a hack as I never return a meaningful value. E.g.:
(future (loop [] (do
                   (Thread/sleep 100)
                   (dosync (...))
                   (recur))))
Also, I need to be careful to future-cancel this when the background processing is no longer needed. Any tips on how to orchestrate that in a Clojure/Swing application would be nice. E.g. a dummy JComponent that is added to my UI that is responsible for killing the thread when the window is closed may be an idea.
You don't need a do in your loop; it's implied. Also, while there's nothing wrong with an unconditional loop-recur, you may as well use (while true ...).
future is a fine tool for this; don't let it bother you that you never get a value back. That should really bother you if you use an agent rather than a future, though - agents without values are madness.
However, who said you need to future-cancel? Just make one of the steps in your future be to check whether it's still needed. Then no other parts of your code need to keep track of futures and decide when to cancel them. So something like
(future (loop []
          (Thread/sleep 100)
          (when (dosync
                 (alter some-value some-function))
            (recur)))) ; quit if alter returns nil
would be a viable approach.
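A variant of the same idea that keeps the stop decision outside the loop body: the loop checks an atom, and the caller gets a function that stops it and waits for the current iteration to finish. start-background-loop! is a hypothetical name. In a Swing app the returned function could, for example, be called from a WindowAdapter's windowClosing method; that wiring is an assumption and is not shown here.

```clojure
(defn start-background-loop!
  "Hypothetical helper: runs step-fn every period-ms on a future until the
  returned stop function is called. Returns the stop function."
  [step-fn period-ms]
  (let [running? (atom true)
        worker   (future
                   (while @running?
                     (step-fn)
                     (Thread/sleep period-ms)))]
    (fn stop! []
      (reset! running? false)
      @worker   ; block until the loop notices the flag and exits
      nil)))
```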
Using agents for background recurring tasks seems neater to me
(def my-ref (ref 0))
(def my-agent (agent nil))
(defn my-background-task [x]
  (send-off *agent* my-background-task)
  (println "Before" @my-ref)
  (dosync (alter my-ref inc))
  (println "After" @my-ref)
  (Thread/sleep 1000))
Now all you have to do is to initiate the loop
(send-off my-agent my-background-task)
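The my-background-task function sends itself to the agent that is running it (*agent*). Sends made inside an agent action are held until that action completes, so the next run starts only after the current one, including its one-second sleep, has finished.
This is how Rich Hickey runs recurring tasks in the ant colony example application: Clojure Concurrency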

Resources