Off-loading an HTTP Request w/o Leaking Threads - multithreading

I need to make an HTTP request that can fail quite often, and I'm not interested in the result, i.e. whether it worked or not. Also, I don't want to wait for it to return.
So I'd like to wrap that call in a separate thread and make sure that the thread won't stick around when something blocks.
My current approach is something like this:
(defn- call-and-forget [url]
  (let [timeout 250
        combined-timeout (* timeout 2.5)
        f (future
            (try
              (http/delete url
                           {:socket-timeout timeout
                            :conn-timeout timeout})
              (catch Throwable e
                (printf "Could not call %s: %s"
                        url (.getMessage e)))))]
    ;; the timeout arity of deref needs a default value
    (deref f combined-timeout ::timed-out)
    (when-not (future-done? f)
      (future-cancel f))))
I hereby put this code under the Apache 2.0 license
It uses clj-http to make the call and a Future to create another thread. I am aware that this uses a thread from the built-in pool, and of the discussion over in this thread. The amount of complexity added by using my own thread pool, thread factory, executor service, uncaught handler and so on is not really worth it.
Would you agree that the code above is a good, working solution, or do you see a better way?

Looks good. You could also do
(when (= :failed (deref f timeout-ms :failed))
  (future-cancel f))
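Putting the two ideas together, a self-contained sketch could look like the following. Names here are mine, not the poster's, and `do-request` is a placeholder for the real clj-http call:

```clojure
;; Sketch: fire off the request on a future, wait up to timeout-ms using
;; a sentinel default, and cancel the future if it is still running.
;; `do-request` stands in for the actual clj-http call.
(defn fire-and-forget [do-request timeout-ms]
  (let [f (future
            (try
              (do-request)
              (catch Throwable e ::error)))]
    (when (= ::timed-out (deref f timeout-ms ::timed-out))
      (future-cancel f))))
```

It returns true when the future was cancelled on timeout, and nil when the request finished (or errored) in time.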


How to yield a thread's current continuation from an exception handler

This code is really pushing the limits of my understanding so bear with me.
Previously I implemented coroutines in Racket in the following code:
;; Coroutine definition
(define (make-generator procedure)
  (define last-return values)
  (define last-value #f)
  (define status 'suspended)
  (define (last-continuation _)
    (let ([result (procedure yield)])
      (last-return result)))
  (define (yield value)
    (call/cc (lambda (continuation)
               (set! last-continuation continuation)
               (set! last-value value)
               (set! status 'suspended)
               (last-return value))))
  (lambda args
    (call/cc (lambda (return)
               (set! last-return return)
               (cond ((null? args)
                      (let ()
                        (set! status 'dead)
                        (last-continuation last-value)))
                     ((eq? (car args) 'coroutine?) 'coroutine)
                     ((eq? (car args) 'status?) status)
                     ((eq? (car args) 'dead?) (eq? status 'dead))
                     ((eq? (car args) 'alive?) (not (eq? status 'dead)))
                     ((eq? (car args) 'kill!) (set! status 'dead))
                     (#t (apply last-continuation args)))))))

;; Define a function that will return a suspended coroutine created from
;; the given args and body forms
(define-syntax (define-coroutine stx)
  (syntax-case stx ()
    ((_ (name . args) . body)
     #`(define (name . args)
         (make-generator
          (lambda (#,(datum->syntax stx 'yield))
            . body))))))
What I want to do is implement an exception handler (using with-handlers) that calls the (yield) function. The idea is that a second thread can send a signal to the thread evaluating the coroutine, forcing it to yield when it's been running for too long.
I've tried the following in the args lambda, which successfully returned early, but later evaluations of the coroutine (my-coroutine 'dead?) reported that the coroutine was in the 'dead state:
(with-handlers
    ([exn:break?
      (lambda (break)
        (yield 'coroutine-timeout))])
  (break-enabled #t) ; register for yield requests from coroutine manager thread
  (last-continuation last-value))))
Alternatively, I've tried the following, but it didn't produce a procedure that can be applied to arguments:
(with-handlers
    ([exn:break?
      (lambda (break)
        (set! last-continuation (exn:break-continuation break))
        (set! last-value 'coroutine-timeout)
        (set! status 'suspended)
        (last-return 'coroutine-timeout))])
  (break-enabled #t) ; register for yield requests from coroutine manager thread
  (last-continuation last-value))))
I'm trying to understand how continuations and exceptions interact/block each other. It seems like I may need to use Parameters somehow?
How can I successfully write a signal handler that will (yield) correctly so that I can resume the coroutine later?
Edit:
I am mixing metaphors here (cooperative and preemptive multithreading). However, my question seems possible to me (from a layman's perspective), as I can evaluate functions defined in my coroutine (including (yield)) from within the exception handler. I'm essentially trying to limit resource starvation in my worker threads, as well as mitigate a certain class of deadlock (where task 1 can only complete after task 2 has run, and there are no free threads for task 2 to run on).
I have written a (go) function for these coroutines that is modeled after Go's goroutines. I assume they achieve their asynchronous behavior on single threads by having cooperative yield checks in the underlying code they control. Perhaps it runs in a VM as you suggested and there are checks, perhaps their operators have the checks. Whatever the case may be, I'm trying to achieve similar behavior with a different strategy.
As far as "how continuations and exceptions interact/block each other," it's important to know that exceptions are implemented using delimited continuations. In particular, the exception system makes use of continuation barriers. Both of these are introduced in the Racket reference §1.1.12 Prompts, Delimited Continuations, and Barriers:
A continuation barrier is another kind of continuation frame that prohibits certain replacements of the current continuation with another. … A continuation barrier thus prevents “downward jumps” into a continuation that is protected by a barrier. Certain operations install barriers automatically; in particular, when an exception handler is called, a continuation barrier prohibits the continuation of the handler from capturing the continuation past the exception point.
You may also want to see the material on exceptions from later in the evaluation model section and from the control flow section, which cites an academic paper on the subject. The differences between call-with-exception-handler and with-handlers are also relevant to capturing continuations from within exception handlers.
Basically, though, the continuation barrier prevents using exception handlers for continuations that you abort and might later resume: you should use continuation barriers and prompts directly for that.
More broadly, I would suggest that you look at Racket's substantial existing support for concurrency. Even if you want to implement coroutines as an experiment, it would be useful for inspiration and examples of implementation techniques. Racket comes with derived constructs such as engines ("processes that can be preempted by a timer or other external trigger") and generators, in addition to the fundamental building blocks, green threads and synchronizable events (which are based on the Concurrent ML model).
The gist of your question:
How can I implement an exception handler for coroutines, such that a second thread can send
a signal to a thread evaluating a coroutine, forcing it to yield
when it's running for too long.
And once more:
How can I successfully write a signal handler that will (yield)
correctly so that I can resume the coroutine later?
It seems to me that you are not cleanly separating cooperative and preemptive multitasking, since you seem to want to combine coroutines (cooperative) with time-outs (preemptive). (You also mention threads, but seem to conflate them with coroutines.)
With cooperative multitasking there is no way that you can force anyone else to stop running; hence the moniker "cooperative".
With preemptive multitasking you do not need to yield, because the scheduler will preempt you when your allocated time has run out. The scheduler is also responsible for saving your continuation, but it is not the (scheduler's) current continuation, since the scheduler is wholly separate from the user thread.
Perhaps the closest thing to what you are proposing is simulating preemptive multitasking via polling. Every (simulated) timestep (i.e. a VM instruction) the simulation needs to check whether any interrupts/signals have been received by a running thread and handle them.
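That polling idea can be sketched in a few lines (in Clojure, to match the other threads on this page; the names are mine, purely illustrative):

```clojure
;; Simulated preemption via polling: the worker checks a stop predicate
;; at every "timestep" and stops itself when the predicate fires, instead
;; of being forcibly interrupted from outside.
(defn run-until-interrupted [stop? step]
  (loop [i 0]
    (if (stop?)
      {:stopped-at i}      ; report where we were "preempted"
      (do (step i)
          (recur (inc i))))))
```

Here `stop?` could read a flag set by a manager thread; the worker yields cooperatively, but from the outside it looks like a timeout-driven preemption.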

Clojure futures mysteriously dying

I have an application that spins up a number of futures to do prolonged work. It's intermittently failing and I'm trying to work out why.
The symptom is that the code simply ceases to execute, stopping in a random place. My future-creation code is something like this:
(def future-timeout
  ;; 1 hour
  3600000)

(def concurrency 200)

(defn do-parallel
  [f coll]
  (let [chunks (partition-all concurrency coll)]
    (doseq [chunk chunks]
      (let [futures (doall
                      (map #(future
                              (try
                                (f %)
                                (catch Exception e
                                  (log/error "Unhandled error in do-parallel:" (.getMessage e))
                                  :exception)))
                           chunk))
            results (doall (map #(deref % future-timeout :timeout) futures))
            all-ok (every? true? results)]
        (when all-ok
          (log/info "Chunk successful."))
        (when-not all-ok
          (log/error "Chunk unsuccessful.")
          (log/warn "Parallel execution results:" results))
        (swap! chunk-count inc)))
    (log/info "Finished batch")))
The concurrency variable controls the size of the batches, and therefore the number of concurrent executions attempted. f returns true on success; on a timeout or exception, the result is :timeout or :exception instead.
I'm doing this instead of pmap because I want to control the concurrency: f is a long-running (~10 minutes), network-intensive task, and pmap seems to be tuned toward larger numbers of smaller tasks.
Normally this works fine. But after a few hours it stops:
During the execution of f, the function stops running.
No exception is caught. No timeout occurs.
The loop in do-parallel stops and no more log entries appear.
Other threads, e.g. Kafka Client, keep running.
Any ideas of what might be causing this? Or steps to put in place to help diagnose?
You might want to try to install an uncaught exception handler, to see if a stray exception on the Executor itself is causing work to stop.
https://github.com/pyr/uncaught has a facility for this, but it's also straightforward to do from the code directly.
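A minimal sketch of doing it directly (names are mine; the atom is only there to make the effect observable, logging would do in a real app). One caveat: exceptions thrown inside a future are captured by the Future object and only re-surface on deref, so they never reach this handler; it helps for plain threads and executor worker threads that die:

```clojure
(import '(java.lang Thread$UncaughtExceptionHandler))

;; Records the last exception that killed a thread outside any try/catch.
(def last-uncaught (atom nil))

(defn install-uncaught-handler!
  "Install a JVM-wide default handler for uncaught exceptions."
  []
  (Thread/setDefaultUncaughtExceptionHandler
    (reify Thread$UncaughtExceptionHandler
      (uncaughtException [_ thread ex]
        (reset! last-uncaught [(.getName thread) (.getMessage ex)])))))
```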
Claypoole is useful for controlling parallelism.
(cp/pmap (count chunk) f chunk)
will create a temporary threadpool the same size as your chunk and execute all the functions in parallel.
This is just a suggestion for expressing parallelism, not an answer to your question which is about error handling; which I'm curious about also!
Maybe try catching Throwable instead of Exception? I've had issues before where errors slipped through catch Exception.
I think if there's an uncaught exception in the futures, the future catches it and dies without throwing it further out, so setting the default UncaughtExceptionHandler isn't going to help. Untested - but that's my gut feeling.
Do you at least get the "Chunk unsuccessful" message when it stops? Because if you don't, then that's really weird...
Looking at the implementation of future - it uses a cached thread pool underneath, which doesn't have a thread limit, so you're probably better off using an ExecutorService directly, or something like Claypoole, as the other suggestions indicate.
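As a sketch of the ExecutorService route (the function name and pool-size parameter are mine, not from the question), a fixed-size pool bounds concurrency instead of the unbounded cached pool backing future:

```clojure
(import '(java.util.concurrent Callable ExecutorService Executors Future))

(defn run-bounded
  "Run (f x) for each x in coll on a fixed pool of pool-size threads;
  block for and return the results in input order."
  [pool-size f coll]
  (let [^ExecutorService pool (Executors/newFixedThreadPool pool-size)
        ;; submit everything up front; the pool caps actual parallelism
        tasks (mapv (fn [x] (.submit pool ^Callable (fn [] (f x)))) coll)
        results (mapv (fn [^Future t] (.get t)) tasks)]
    (.shutdown pool)
    results))
```

Unlike future's pool, the fixed pool never grows, so 200 long-running tasks queue up behind pool-size worker threads instead of spawning 200 threads at once.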

Achieving multiple locks in clojure

I'm new to Clojure and am writing a web application. It includes a function, fn, performed on a user user-id, which comprises several steps of reading and writing to the database and file system. These steps cannot be performed simultaneously by multiple threads (that would cause database and file-system inconsistencies), and I don't believe they can be wrapped in a database transaction. However, they are specific to one user and thus can be performed simultaneously for different users.
Thus, if an HTTP request is made to perform fn for a specific user-id, I need to make sure that it completes before any other HTTP request can perform fn for that user-id.
I've come up with a solution that seems to work in the REPL but haven't tried it in the web server yet. However, being inexperienced with Clojure and threaded programming, I'm not sure whether this is a good or safe way to solve the problem. The following code has been developed by trial and error and uses the locking function - which seems to go against the "no locks" philosophy of Clojure.
(ns locking.core)

;;; Check if the var representing the lock exists in the namespace.
;;; If not, create it. Creating a new var if one already
;;; exists seems to break the locking.
(defn create-lock-var
  [var-name value]
  (let [var-sym (symbol var-name)]
    (when (nil? (ns-resolve 'locking.core var-sym))
      (intern 'locking.core var-sym value))
    ;; Return lock var
    (ns-resolve 'locking.core var-sym)))

;;; Takes an id which represents the lock and the function
;;; which may only run in one thread at a time for a specific id
(defn lock-function
  [lock-id transaction]
  (let [lock (create-lock-var (str "lock-id-" lock-id) lock-id)]
    (future
      (locking lock
        (transaction)))))

;;; A function to test the locking
(defn test-transaction
  [transaction-count sleep]
  (dotimes [x transaction-count]
    (Thread/sleep sleep)
    (println "performing operation" x)))
If I open three windows in the REPL and execute these functions, it works:

repl1 > (lock-function 1 #(test-transaction 10 1000)) ; executes immediately
repl2 > (lock-function 1 #(test-transaction 10 1000)) ; waits for repl1 to finish
repl3 > (lock-function 2 #(test-transaction 10 1000)) ; executes immediately because id=2
Is this reliable? Are there better ways to solve the problem?
UPDATE
As pointed out, the creation of the lock variable is not atomic. I've rewritten lock-function and it seems to work (no need for create-lock-var):

(def locks (atom {}))

(defn lock-transaction
  [lock-id transaction]
  (let [lock-key (keyword (str "lock-id-" lock-id))]
    ;; Atomically create a dedicated lock object for this key if none
    ;; exists yet. (Locking on a fresh Object rather than on the numeric
    ;; lock-id avoids accidentally sharing interned boxed numbers as
    ;; monitors; swap! also avoids a read-then-compare-and-set race.)
    (swap! locks update lock-key #(or % (Object.)))
    (future
      (locking (get @locks lock-key)
        (transaction)))))
Note: Renamed the function to lock-transaction, seems more appropriate.
Don't use N vars in a namespace, use an atom wrapped around 1 hash-map mapping N symbols to N locks. This fixes your current race condition, avoids creating a bunch of silly vars, and is easier to write anyway.
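A sketch of that suggestion (names are mine): one atom holding a map from id to a dedicated lock object, with swap! making lock creation atomic:

```clojure
;; One atom mapping user ids to lock objects.
(def user-locks (atom {}))

(defn lock-for
  "Return the lock object for id, creating it atomically if needed.
  Concurrent callers always end up with the identical object."
  [id]
  (get (swap! user-locks update id #(or % (Object.))) id))

(defn with-user-lock
  "Run thunk f while holding the per-id lock."
  [id f]
  (locking (lock-for id)
    (f)))
```

Two requests for the same id serialize on the same monitor; requests for different ids proceed in parallel.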
Since you're making a web app, I have to warn you: even if you do manage to get in-process locking right (which is not easy in itself), it will be for nothing as soon as you deploy your web server on more than one machine (which is almost mandatory if you want your app to be highly-available).
So basically, if you want to use locking, you'd better use distributed locking. From this point on, this discussion is not Clojure-specific, since Clojure's concurrency tools won't be especially helpful here.
For distributed locking, you could use something like Zookeeper. If you don't want to set up a whole Zookeeper cluster just for this, maybe you can compromise by using a Redis database (the Carmine library gives you distributed locks out of the box), although last time I heard Redis locking is not 100% reliable.
Now, it seems to me locking is not really a requirement, and is not the best approach, especially if you're striving for idiomatic Clojure. How about using a queue instead? Some popular JVM message brokers (such as HornetQ and ActiveMQ) give you Message Grouping, which guarantees that messages with the same group id will be processed (serially) by the same consumer. All you have to do is have some threads listen to the right queue and set the user-id as the group id for your messages.
HACK: If you don't want to set up a distributed message broker, maybe you can get around it by enabling sticky sessions on your load balancer and using such a message broker in-VM.
By the way, don't name your function fn :).

Clojure - Using agents slows down execution too much

I am writing a benchmark for a program in Clojure. I have n threads accessing a cache at the same time. Each thread will access the cache x times. Each request should be logged inside a file.
To this end I created an agent that holds the path to the file to be written to. When I want to write I send-off a function that writes to the file and simply returns the path. This way my file-writes are race-condition free.
When I execute my code without the agent it finishes in a few milliseconds. When I use the agent, and ask each thread to send-off to the agent each time, my code runs horribly slowly. I'm talking minutes.
(defn load-cache-only
  "Test requesting from the cache only."
  [usercount cache-size]
  (let [;; The file to write the benchmark results to.
        sink "benchmarks/results/load-cache-only.txt"
        data-agent (agent sink)
        ;; Data for our backing store, generated at runtime.
        store-data (into {} (map vector
                                 (map (comp keyword str)
                                      (repeat "item")
                                      (range 1 cache-size))
                                 (range 1 cache-size)))
        cache (create-full-cache cache-size store-data)]
    (barrier/run-with-barrier
      (fn [] (load-cache-only-work cache store-data data-agent))
      usercount)))

(defn load-cache-only-work
  "For use with 'load-cache-only'. Requests each item in the cache once.
  We time how long it takes for each request to be handled."
  [cache store-data data-agent]
  (let [cache-size (count store-data)
        foreachitem (fn [cache-item]
                      (let [before (System/nanoTime)
                            result (cache/retrieve cache cache-item)
                            after (System/nanoTime)
                            diff_ms ((comp str float) (/ (- after before) 1000))]
                        ;; (send-off data-agent (fn [filepath]
                        ;;   (file/insert-record filepath cache-size diff_ms)
                        ;;   filepath))
                        ))]
    (doall (map foreachitem (keys store-data)))))
The (barrier/run-with-barrier) code simply spawns usercount number of threads and starts them at the same time (using an atom). The function I pass is the body of each thread.
The body will simply map over a list named store-data, which is a key-value list (e.g., {:a 1 :b 2}). The length of this list in my code right now is 10. The number of users is 10 as well.
As you can see, the code for the agent send-off is commented out. This makes the code execute normally. However, when I enable the send-offs, even without writing to the file, the execution time is too slow.
Edit:
I made each thread print a dot just before it sends off to the agent.
The dots appear just as fast as without the send-off, so there must be something blocking at the end.
Am I doing something wrong?
You need to call (shutdown-agents) when you're done sending stuff to your agent if you want the JVM to exit in reasonable time.
The underlying problem is that if you don't shutdown your agents, the threads backing its threadpool will never get shut down, and prevent the JVM from exiting. There's a timeout that will shutdown the pool if there's nothing else running, but it's fairly lengthy. Calling shutdown-agents as soon as you're done producing actions will resolve this problem.
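A minimal sketch of the shape (names are mine): flush pending actions with await, read the result, then shut the agent pool down so the JVM can exit promptly:

```clojure
;; Sketch: an agent accumulating log entries. await blocks until all
;; actions sent so far have run; shutdown-agents then stops the
;; non-daemon pool threads that would otherwise delay JVM exit.
(defn log-and-exit []
  (let [log-agent (agent [])]
    (send-off log-agent conj :request-1)
    (send-off log-agent conj :request-2)
    (await log-agent)            ; wait for queued actions to complete
    (let [entries @log-agent]
      (shutdown-agents)          ; without this, exit is delayed ~1 minute
      entries)))
```

Note that after shutdown-agents, no further send / send-off is possible in that JVM, so call it only once everything is done.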

Future vs Thread: Which is better for working with channels in core.async?

When working with channels, is future recommended or is thread? Are there times when future makes more sense?
Rich Hickey's blog post on core.async recommends using thread rather than future:
While you can use these operations on threads created with e.g. future, there is also a macro, thread, analogous to go, that will launch a first-class thread and similarly return a channel, and should be preferred over future for channel work.
~ http://clojure.com/blog/2013/06/28/clojure-core-async-channels.html
However, a core.async example makes extensive use of future when working with channels:
(defn fake-search [kind]
  (fn [c query]
    (future
      (<!! (timeout (rand-int 100)))
      (>!! c [kind query]))))
~ https://github.com/clojure/core.async/blob/master/examples/ex-async.clj
Summary
In general, thread with its channel return will likely be more convenient for the parts of your application where channels are prominent. On the other hand, any subsystems in your application that interface with some channels at their boundaries but don't use core.async internally should feel free to launch threads in whichever way makes the most sense for them.
Differences between thread and future
As pointed out in the fragment of the core.async blog post you quote, thread returns a channel, just like go:
(let [c (thread :foo)]
  (<!! c))
;= :foo
The channel is backed by a buffer of size 1 and will be closed after the value returned by the body of the thread form is put on it. (Except if the returned value happens to be nil, in which case the channel will be closed without anything being put on it -- core.async channels do not accept nil.)
This makes thread fit in nicely with the rest of core.async. In particular, it means that go + the single-bang ops and thread + the double-bang ops really are used in the same way in terms of code structure, you can use the returned channel in alt! / alts! (and the double-bang equivalents) and so forth.
In contrast, the return value of future can be deref'd (with deref or @) to obtain the value returned by the future form's body (possibly nil). This makes future fit in very well with regular Clojure code not using channels.
There's another difference in the thread pool being used -- thread uses a core.async-specific thread pool, while future uses one of the Agent-backing pools.
Of course all the double-bang ops, as well as put! and take!, work just fine regardless of the way in which the thread they are called from was started.
It sounds like he is recommending using core.async's built-in thread macro rather than Java's Thread class.
http://clojure.github.io/core.async/#clojure.core.async/thread
Aside from which threadpool things are run in (as pointed out in another answer), the main difference between async/thread and future is this:
thread will return a channel which only lets you take! from it once before you just get nil, so it's good if you need channel semantics, but not ideal if you want to use the result over and over
in contrast, future returns a dereffable object which, once the thread is complete, will return the answer every time you deref it, making it convenient when you want the result more than once, but this comes at the cost of channel semantics
If you want to preserve channel semantics, you can use async/thread and place the result on (and return a) async/promise-chan, which, once there's a value, will always return that value on later take!s. It's slightly more work than just calling future, since you have to explicitly place the result on the promise-chan and return it instead of the thread channel, but buys you interoperability with the rest of the core.async infrastructure.
It almost makes one wonder if there shouldn't be a core.async/thread-promise and core.async/go-promise to make this more convenient...
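For what it's worth, a sketch of such a helper (assuming core.async is on the classpath; the name and shape are mine, not part of core.async):

```clojure
(require '[clojure.core.async :as async :refer [<!! promise-chan thread]])

;; Run f on a core.async thread, but deliver its result to a
;; promise-chan so every later take sees the same value, future-style.
;; Assumes f returns non-nil, since channels cannot carry nil.
(defn thread-promise [f]
  (let [p (promise-chan)]
    (thread
      (async/put! p (f)))   ; deliver once; all later takes see it
    p))
```

Usage: (let [p (thread-promise #(+ 1 2))] [(<!! p) (<!! p)]) takes the same result twice, while the channel still composes with alt! and friends.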
