How to use lightweight thread in Clojure? - multithreading

I was trying to rewrite this Rust code in Clojure:
fn main() {
let nums = [1, 2];
let noms = ["Tim", "Eston", "Aaron", "Ben"];
let mut odds = nums.iter().map(|&x| x * 2 - 1);
for num in odds {
spawn(proc() {
println!("{:s} says hello from a lightweight thread!", noms[num]);
});
}
}
Here is the Clojure code that did almost the same with the above mentioned Rust code
(def noms ["Tim", "Eston", "Aaron", "Ben"])
(doseq [i (take-nth 2 (rest noms))]
(println i "says hello from a lightweight thread!"))
Except it does not use thread.
How to write "lightweight" thread (or something equivalent in Clojure terms)?
This code is almost the direct translate from the imperial programming style. What's the idiomatic way to write?

Beware of the terminology: the clojure example using futures doesn't create a lightweight thread but rather a native OS thread. Rust does the same by default but it has a green thread runtime that provides the lightweight semantics. Reference: http://rustbyexample.com/tasks.html
Clojure doesn't support lightweight threads by default but you can create them via the library core.async. So the code would look something like this:
(require '[clojure.core.async :as async :refer :all])
(doseq [i (take-nth 2 (rest noms))]
(go (print (str i " says hello from a lightweight thread!\n"))))
The go macro above will create a proper lightweight thread *
* The statement wasn't clear as pointed out in the comments so I'll try and clarify: core.async is backed by a Java thread pool, however the go macro turns your code into a state machine that uses "parking" instead of "blocking". This just means you can have tens of thousands of go blocks backed by a limited number of real threads.
When using blocking IO however, this benefit is hindered, as explained by the post #hsestupin refers to below.
To understand how core.async manages to have lightweight threads on the JVM, I recommend starting here: https://www.youtube.com/watch?v=R3PZMIwXN_g - it's a great video into the internals of the go macro.
The macro itself is implemented here

You can use future to spawn threads. In this case you could do something like:
(doseq [i (take-nth 2 (rest noms))]
(future (print (str i " says hello from a lightweight thread!\n"))))

Related

How to execute some Clojure futures in a single thread?

I'd like to create some futures in Clojure and run them all on a specific thread, to make sure they run one-at-a-time. Is this possible?
It's not hard to wrap the Java libraries to do this, but before I do that I want to make sure I'm not missing a Clojure way of doing it. In Java I can do this by implementing FutureTask and submitting those tasks to a single-threaded executor.
Clojure's future macro calls future-call function which uses a dedicated executor service. This means that you have no control to enforce a sequential execution.
On the other hand you can use promise instead of future objects and one future thread to sequentially deliver the results. Promise's API is similar to what futures provide. They have deref and realized? too.
The following code example has the subtasks executed sequentially on a new thread in the background while the immediately returned result of the function contains the promises to the computed values.
(defn start-async-calc []
(let [f1 (promise)
f2 (promise)
f3 (promise)]
(future
(deliver f1 (task-1))
(deliver f2 (task-2))
(deliver f3 (task-3)))
{:task1 f1
:task2 f2
:task3 f3}))
if you want to sequentialize the calls to future you can use it manually like this:
(do #(future 1)
#(future 2)
#(future 3))
they would still possibly called in different threads, but the next one won't be called until the previous has finished. This is guaranteed by the # (or deref function). This means that the thread in which you execute do form would be blocked with prev promise before it completes, and then spawn next one.
you can prettify it with macro like this:
(defmacro sequentialize [& futures]
`(do ~#(map #(list `deref %) futures)))
user> (let [a (atom 1)]
(sequentialize
(future (swap! a #(* 10 %)))
(future (swap! a #(+ 20 %)))
(future (swap! a #(- %))))
#a)
;;=> -30
this does exactly the same as manual do. Notice that mutations to a atom are in-order even if some threads run longer:
user> (let [a (atom 1)]
(sequentialize
(future (Thread/sleep 100)
(swap! a #(* 10 %)))
(future (Thread/sleep 200)
(swap! a #(+ 20 %)))
(future (swap! a #(- %))))
#a)
;;=> -30
Manifold provides a way to create future with specific executor. It's not part of core Clojure lib, but it's still a high quality lib and probably a best option in case you need more flexibility dealing with futures than core lib provides (without resorting to Java interop).
In addition the promises mentioned, you can use a delay. Promises have the problem that you can accidentally not deliver them, and create a deadlock scenario that's not possible with futures and delays. The difference between a future and a delay is only the thread that the work is executed on. With a future, the work is done in the background, and with a delay the work is done by the first thread that tries to deref it. So if future's are a better fit than promises, you could always do something like:
(def result-1 (delay (long-calculation-1)))
(def result-2 (delay (long-calculation-2)))
(def result-3 (delay (long-calculation-3)))
(defn run-calcs []
#(future
#result-1
#result-2
#result-3))

Is clojure "multithread"?

My question may seem weird but I think I'm facing an issue with volatile objects.
I have written a library implemented like this (just a scheme, not real content):
(def var1 (volatile! nil))
(def var2 (volatile! nil))
(def do-things [a]
(vreset! var1 a)
(vswap! var2 (inc #var2))
{:a #var1 :b #var2})
So I have global var which are initialized by external values, others that are calculated and I return their content.
i used volatile to have better speed than with atoms and not to redefine everytime a new var for every calculation.
The problem is that this seems to fail in practice because I map do-things to a collection (in another program) with inner sub-calls to this function occasionaly, like (pseudo-code) :
(map
(fn [x]
(let [analysis (do-things x)]
(if blabla
(do-things (f x))
analysis)))) coll)
Will inner conditionnal call spawn another thread under the hood ? It seems yes because somethimes calls work, sometimes not.
is there any other way to do apart from defining volatile inside every do-things body ?
EDIT
Actually the error was another thing but the question is still here : is this an acceptable/safe way to do without any explicit call to multithreading capabilities ?
There are very few constructs in Clojure that create threads on your behalf - generally Clojure can and will run on one or more threads depending on how you structure your program. pmap is a good example that creates and manages a pool of threads to map in parallel. Another is clojure.core.reducers/fold, which uses a fork/join pool, but really that's about it. In all other cases it's up to you to create and manage threads.
Volatiles should only be used with great care and in circumstances where you control the scope of use such that you are guaranteed not to be competing with threads to read and write the same volatile. Volatiles guarantee that writes can be read on another thread, but they do nothing to guarantee atomicity. For that, you must use either atoms (for uncoordinated) or refs and the STM (for coordinated).

Achieving multiple locks in clojure

I'm new to Clojure and am writing a web application. It includes a function fn performed on user user-id which includes several steps of reading and writing to the database and file system. These steps cannot be performed simultaneously by multiple threads (will cause database and file system inconsistencies) and I don't believe they can be performed using a database transaction. However, they are specific to one user and thus can be performed simultaneously for different users.
Thus, if a http request is made to perform fn for a specific user-id I need to make sure that it is completed before any http requests can perform fn for this user-id
I've come up with a solution that seems to work in the REPL but have not tried it in the web server yet. However, being unexperienced with Clojure and threaded programming I'm not sure whether this is a good or safe way to solve the problem. The following code has been developed by trial-and-error and uses the locking function - which seems to go against the "no locks" philosophy of Clojure.
(ns locking.core)
;;; Check if var representing lock exists in namespace
;;; If not, create it. Creating a new var if one already
;;; exists seems to break the locking.
(defn create-lock-var
[var-name value]
(let [var-sym (symbol var-name)]
(do
(when (nil? (ns-resolve 'locking.core var-sym))
(intern 'locking.core var-sym value))
;; Return lock var
(ns-resolve 'locking.core var-sym))))
;;; Takes an id which represents the lock and the function
;;; which may only run in one thread at a time for a specific id
(defn lock-function
[lock-id transaction]
(let [lock (create-lock-var (str "lock-id-" lock-id) lock-id)]
(future
(locking lock
(transaction)))))
;;; A function to test the locking
(defn test-transaction
[transaction-count sleep]
(dotimes [x transaction-count]
(Thread/sleep sleep)
(println "performing operation" x)))
If I open three windows in REPL and execute these functions, it works
repl1 > (lock-function 1 #(test-transaction 10 1000)) ; executes immediately
repl2 > (lock-function 1 #(test-transaction 10 1000)) ; waits for repl1 to finish
repl2 > (lock-function 2 #(test-transaction 10 1000)) ; executes immediately because id=2
Is this reliable? Are there better ways to solve the problem?
UPDATE
As pointed out, the creation of the lock variable is not atomic. I've re-written the lock-function function and it seems to work (no need for create-lock-var)
(def locks (atom {}))
(defn lock-transaction
[lock-id transaction]
(let [lock-key (keyword (str "lock-id-" lock-id))]
(do
(compare-and-set! locks (dissoc #locks lock-key) (assoc #locks lock-key lock-id))
(future
(locking (lock-key #locks)
(transaction))))))
Note: Renamed the function to lock-transaction, seems more appropriate.
Don't use N vars in a namespace, use an atom wrapped around 1 hash-map mapping N symbols to N locks. This fixes your current race condition, avoids creating a bunch of silly vars, and is easier to write anyway.
Since you're making a web app, I have to warn you: even if you do manage to get in-process locking right (which is not easy in itself), it will be for nothing as soon as you deploy your web server on more than one machine (which is almost mandatory if you want your app to be highly-available).
So basically, if you want to use locking, you'd better use distributed locking. From this point on, this discussion is not Clojure-specific, since Clojure's concurrency tools won't be especially helpful here.
For distributed locking, you could use something like Zookeeper. If you don't want to set up a whole Zookeeper cluster just for this, maybe you can compromise by using a Redis database (the Carmine library gives you distributed locks out of the box), although last time I heard Redis locking is not 100% reliable.
Now, it seems to me locking is not especially a requirement, and is not the best approach, especially if you're striving for idiomatic Clojure. How about using a queue instead ? Some popular JVM message brokers (such as HornetQ and ActiveMQ) give you Message Grouping, which guarantees that messages of the same group-id will be processed (serially) by the same consumer. All you have to do is have some threads listen to the right queue and set the user-id as the group id for your messages.
HACK: If you don't want to set up a distributed message broker, maybe you could get around by enabling sticky sessions on you load balancer, and using such a message broker in-VM.
By the way, don't name your function fn :).

Idiomatic Clojure way to spawn and manage background threads

What is the idiomatic Clojure way to create a thread that loops in the background doing updates to some shared refs and to manage its lifetime? I find myself using future for this, but it feels like a little bit of a hack as I never return a meaningful value. E.g.:
(future (loop [] (do
(Thread/sleep 100)
(dosync (...))
(recur))))
Also, I need to be careful to future-cancel this when the background processing is no longer needed. Any tips on how to orchestrate that in a Clojure/Swing application would be nice. E.g. a dummy JComponent that is added to my UI that is responsible for killing the thread when the window is closed may be an idea.
You don't need a do in your loop; it's implied. Also, while there's nothing wrong with an unconditional loop-recur, you may as well use (while true ...).
future is a fine tool for this; don't let it bother you that you never get a value back. That should really bother you if you use an agent rather than a future, though - agents without values are madness.
However, who said you need to future-cancel? Just make one of the steps in your future be to check whether it's still needed. Then no other parts of your code need to keep track of futures and decide when to cancel them. So something like
(future (loop []
(Thread/sleep 100)
(when (dosync
(alter some-value some-function))
(recur)) ; quit if alter returns nil
))
would be a viable approach.
Using agents for background recurring tasks seems neater to me
(def my-ref (ref 0))
(def my-agent (agent nil))
(defn my-background-task [x]
(do
(send-off *agent* my-background-task)
(println (str "Before " #my-ref))
(dosync (alter my-ref inc))
(println "After " #my-ref)
(Thread/sleep 1000)))
Now all you have to do is to initiate the loop
(send-off my-agent my-background-task)
The my-backgound-task function is sending itself to the calling agent after its invocation is done.
This is the way how Rich Hickey performs recurring tasks in the ant colony example application: Clojure Concurrency

Understanding output in Clojure using swank/slime

When I run Clojure code from the Swank repl in emacs, the main thread will print out messages using printf to the repl. But if I run agents or explicitly create other threads which also print, sometimes the output doesn't show up, and other times it shows up in the console window where I'm running Swank. I'd love to understand why.
Edit: Thanks to Daniel's answer below I now know that the other threads do not have out bound to the output of the REPL. This code works because you pass in the out from where you run from. However my new problem is that this code now blocks per thread so rather than running in parallel it runs each thread one at a time, so I need a more thread aware output method.
(defn sleeper-thread [out id t]
"Sleep for time T ms"
(binding [*out* out]
(printf "%d sleeping for time %d\n" id t)
(Thread/sleep t)
(printf "%d slept\n" id)))
(defn test-threads [n out]
(dotimes [x n]
(.start (Thread. (#(sleeper-thread %1 %2 %3) out x (+ 2000 (rand-int 5000)))))))
The reason is, that in other threads *out* is not bound to the REPL's stream. Try something like this:
(let [repl-out *out*]
(defn foo []
(binding [*out* repl-out]
...)))
Now, when running foo from another thread, *out* will be bound to whatever it was when you defined the function (i.e. the SLIME REPL), so printing will work as expected.
Or, for testing:
(defmacro future-output [& body]
`(let [out# *out*]
(future
(binding [*out* out#]
~#body))))
Note: This is untested, because I have no working Clojure/SLIME here atm, but that code worked a few months ago. There might be differences in newer Versions of Clojure (1.3 Alpha 2):
code path for using vars is now
much faster for the common case,
and you must explicitly ask for :dynamic bindability
If you are struggling with the same using cake, there should be a log file with the output in the .cake/cake.log file in your project root (where project.clj lives).

Resources