Racket refresh-now, thread, and yield - multithreading

I've been writing some simple racket GUI programs to prepare for a class I'm teaching in the fall. I'm having some problems with animation. I'm using a basic canvas, and using the model for animation in which the entire canvas is refreshed each frame, by calling the paint procedure. An example program is below.
My problem is that I have to either run the animation as a separate thread, or call yield after each instance of refresh-now. Why is this? I expected the refresh-now to cause the image to refresh right away, without additional work on my part.
I have read the animation examples on the racket pages, and see that they usually draw directly to the canvas. I understand that since the canvas is double-buffered this works fine ... but for my application it's easier to just have the paint procedure carry the load, since I need a working paint procedure anyway in case of minimizing, etc. (Of course, the yield is not a huge burden, but it would be easier to teach if it were not needed.)
Thanks,
John
#lang racket
; Demonstrate simple animation in Racket
(require racket/gui)
(define min-x 0)
(define min-y 0)
(define max-x 200)
(define max-y 200)
; Three vertexes of the triangle, expressed relative to a starting x and y location.
(define triangle-vertexes [list
(list 10 0)
(list 0 20)
(list 20 20)])
(define triangle-x 20)
(define triangle-y 20)
; Move a triangle by a (delta-x, delta-y) pair
(define (move-triangle adjust)
(set! triangle-x (+ triangle-x (first adjust)))
(set! triangle-y (+ triangle-y (second adjust))))
; Adjust the location of a vertex by adding an (x,y) adjustment to it.
; Could also be defined using map.
(define (triangle-adjust adjust vertex)
(list (+ (first adjust) (first vertex))
(+ (second adjust) (second vertex))))
; Create the paint-callback function.
; It should:
; - draw a triangle at the current location
(define (draw-triangle dc)
(let ((vertex1 (triangle-adjust (list triangle-x triangle-y) (first triangle-vertexes)))
(vertex2 (triangle-adjust (list triangle-x triangle-y) (second triangle-vertexes)))
(vertex3 (triangle-adjust (list triangle-x triangle-y) (third triangle-vertexes))))
(send dc draw-line (first vertex1) (second vertex1) (first vertex2) (second vertex2))
(send dc draw-line (first vertex2) (second vertex2) (first vertex3) (second vertex3))
(send dc draw-line (first vertex3) (second vertex3) (first vertex1) (second vertex1))))
(define frame (new frame%
[label "Animation Example"]
[width 800]
[height 800]))
(define triangle-canvas (new canvas% [parent frame]
[paint-callback
(lambda (canvas dc)
(display "callback called")
(draw-triangle dc))]))
(send frame show #t)
; run a thunk (a procedure of zero arguments) n times
; only useful if thunk has side-effects
(define (loop n thunk)
(cond
((> n 0) (thunk)
(loop (- n 1) thunk))
(else false)))
; Animate the triangle. We have to either run this in a different thread from
; the event loop or yield each time we want something to be drawn.
(define (animate-triangle)
(loop 30
(lambda ()
(move-triangle (list 10 10))
(send triangle-canvas refresh-now)
; (send triangle-canvas flush)
(yield)
; (sleep 0.1)
)))

This isn't an answer to your question about refresh-now, but a better alternative to an explicit thread and loop is the timer% class:
;; This goes after (send frame show #t), replacing loop and animate-triangle
(define timer-counter 0)
(define timer
(new timer%
(interval 100) ;; update every 100 ms
(notify-callback
(lambda ()
(cond [(< timer-counter 30)
(set! timer-counter (add1 timer-counter))
(move-triangle (list 10 10))
(send triangle-canvas refresh)]
[else
(send timer stop)])))))
If you redefine your stopping condition based on the triangle's state, you can get rid of the auxiliary timer-counter; I put it in to mimic the behavior of your original code.
The timer is created in the same eventspace as the frame, and the eventspace has an event-handling thread, which is why you don't have to create your own thread explicitly.
How to Design Programs, 2nd ed has another approach to animation, where the canvas and updates are managed automatically. You just call big-bang with functions to (functionally) update the "state of the world" and render the "state of the world" as an image. Depending on exactly what you're teaching, it may or may not be useful to you.

Related

How to include field names when printing a Racket struct

Suppose I have this Racket code:
(struct pos (x y))
(displayln (pos 5 6))
This displays #<pos>. Is there a way to make it display the field names and values too?
With the #:transparent option, the values are displayed:
(struct pos (x y) #:transparent)
(displayln (pos 5 6))
This displays #(struct:pos 5 6), but I also want to display the field names (x and y). Is there a way to display both the field names and values? For example: #(struct:pos #:x 5 #:y 6).
I am looking for something similar to how Common Lisp structs are displayed. Common Lisp example:
(defstruct pos x y)
(format t "~A~%" (make-pos :x 5 :y 6))
This prints #S(POS :X 5 :Y 6).
If you don't want to use third-party libraries, take a look at the very last example of make-constructor-style-printer.
If you don't mind using third-party libraries, you can just use Rebellion's record.

Analogy of Python is_pressed in Common Lisp?

I'm making a simple LTK game in Common Lisp and would like to have the player character jump upon pressing space. However, the window is constantly updating, so I cannot wait for user input. I want to check each frame whether the space key is pressed. Despite much googling, I couldn't find the way to do this in Lisp. All I found are traditional I/O prompts that stop the flow of the program.
Thanks for any help.
Are you using a Tk canvas? Maybe you could bind event handlers to key presses?
(with-ltk ()
(let ((canvas (make-canvas nil)))
(grid canvas 1 1)
(bind canvas
"<Left>"
(lambda (e)
(message-box "Left" "Click" "okcancel" "info")))
(focus canvas)))
When you receive a key press, you can set a variable that you inspect when processing a frame.
This works fine for one button, but binding e.g. both the up arrow and the left arrow for movement only registers one at a time. Is there a way to fix that?
Here is a more detailed example. Tk can bind on KeyPress and KeyRelease events. You only need to manage the pressed/unpressed state of the key you want to observe, and set the right variables accordingly.
(with-ltk ()
(let ((canvas (make-canvas nil))
(x 50)
(y 50)
(speed 5)
(w 10)
(h 10)
(dx 0)
(dy 0))
(grid canvas 1 1)
(macrolet
((on (event expr)
(check-type event string)
(let ((evar (gensym)))
`(bind canvas ,event
(lambda (,evar)
(declare (ignore ,evar))
,expr)))))
(on "<KeyPress-Left>" (decf dx))
(on "<KeyPress-Right>" (incf dx))
(on "<KeyRelease-Left>" (incf dx))
(on "<KeyRelease-Right>" (decf dx))
(on "<KeyPress-Up>" (decf dy))
(on "<KeyPress-Down>" (incf dy))
(on "<KeyRelease-Up>" (incf dy))
(on "<KeyRelease-Down>" (decf dy)))
(focus canvas)
(let ((rectangle (create-rectangle canvas x y 10 10)))
(labels ((game-loop ()
(incf x (* dx speed))
(incf y (* dy speed))
(set-coords canvas rectangle (list x y (+ x w) (+ y h)))
(after 50 #'game-loop)))
(game-loop)))))
The first part of the above function creates a canvas and binds event handlers for KeyPress/KeyRelease events for Left, Right, Down and Up keys. This is a bit verbose, but is simple enough for an example. Alternatively, you could bind only on "<KeyPress>" and "<KeyRelease>" events (no additional key in the string), and use a table indexed by the keycode of your key event.
The second part is the game loop, where both deltas dx and dy are actually used to move the rectangle; at the end of the game loop, after ensures the game loop is run again after some time.
Event handlers do not directly manipulate canvas elements, they are only used to translate UI events to game logic changes. You have to be careful about how and when events happen. For example, the first version of the above used to do:
(on "<KeyPress-Left>" (setf dx -1))
(on "<KeyPress-Right>" (setf dx 1))
(on "<KeyRelease-Right>" (setf dx 0))
(on "<KeyRelease-Left>" (setf dx 0))
But that was wrong because the sequence press-left, press-right and release-right would make the rectangle stop.

Why does looping over a large amount of data in another thread cause an overactive GC, and prevent some data from being freed?

I'm writing code that takes some lazy results produced by pmap, and draws them onto a BufferedImage. For three days now I've been trying to figure out why the drawing suddenly starts freezing and eventually halts about a 1/3 of the way through.
I've finally narrowed it down to the fact that I'm looping over a large amount of data in another thread.
This is the best MCVE that I've come up with:
(ns mandelbrot-redo.irrelevant.write-image-mcve
(:import [java.awt.image BufferedImage]
(java.util.concurrent Executors Executor)))
(defn lazy-producer [width height]
(for [y (range height)
x (range width)]
[x y (+ x y)]))
; This works fine; finishing after about 5 seconds when width=5000
(defn sync-consumer [results width height]
(time
(doseq [[i result] (map vector (range) results)]
(when (zero? (rem i 1e6))
(println (str (double (/ i (* width height) 0.01)) "%")))
((fn boop [x] x) result)))) ; Data gets consumed here
; This gets to ~30%, then begins being interupted by 1-4 second lags
(defn async-consumer [results width height]
(doto
(Thread. ^Runnable
(fn []
(sync-consumer results width height)
(println "Done...")))
(.start)))
(defn -main []
(let [width 5000
height (int (* width 2/3))]
(-> (lazy-producer width height)
(async-consumer width height))))
When -main is run with sync-consumer, it finishes after a few seconds. With async-consumer however, it gets to about 25%, then begins slowing to a crawl to the point where the last printed percentage is 30%. If I leave it, I get an OOME.
If I use an explicit Thread., or use a local thread pool in async-consumer, it hangs and crashes. If I use future however, it finishes fine, just like sync-consumer.
The only hint I've gotten is that when I run this in VisualVM, I see that I have runaway allocation of Longs when using the async version:
The sync version shows a peak amount of Longs to be about 45mb at once in comparison.
The CPU usage is quite different too:
There's massive GC spikes, but it doesn't seem like the Longs are being disposed of.
I could use future for this, but I've been bitten by its exception swallowing behavior so many times, I'm hesitant.
Why is this happening? Why is running this in a new thread causing the GC to go crazy, while at the same time numbers aren't being freed?
Can anyone explain this behavior?
The sync version seems to be processing through the 16M+ results and will not hold onto the head of the results seq due to locals clearing. This means that as you go, values are created, processed, and GC'ed.
The async one closes over results in the fn and will hold the head, keeping all 16M+ values in memory, likely leading to GC thrashing?
I actually can't reproduce what you describe - both sync and async take about the same time for me as written above. (Clojure 1.9, Java 1.8).
I simplified you example, and get inconsistent results. I suspect that the manual Thread object is somehow being regarded (sometimes) as a daemon thread, so the JVM sometimes exits before it has completed:
(def N 5e3)
(def total-count (* N N))
(def report-fact (int (/ total-count 20)))
(defn lazy-producer []
(for [y (range N)
x (range N)]
[x y (+ x y)]))
(defn sync-consumer [results]
(println "sync-consumer: start")
(time
(doseq [[i result] (map vector (range) results)]
(when (zero? (rem i report-fact))
(println (str (Math/round (/ (* 100 i) total-count)) " %")))))
(println "sync-consumer: stop"))
(defn async-consumer [results]
; (spyx (count results))
(spyx (nth results 99))
(let [thread (Thread. (fn []
(println "thread start")
(sync-consumer results)
(println "thread done")
(flush)))]
; (.setDaemon thread false)
(.start thread)
(println "daemon? " (.isDaemon thread))
thread))
(dotest
(println "test - start")
(let [thread (async-consumer
(lazy-producer))]
(when true
(println "test - sleeping")
(Thread/sleep 5000))
; (.join thread)
)
(println "test - end"))
with results:
----------------------------------
Clojure 1.9.0 Java 10.0.1
----------------------------------
lein test tst.demo.core
test - start
(nth results 99) => [99 0 99]
daemon? false
test - sleeping
thread start
sync-consumer: start
0 %
5 %
10 %
15 %
20 %
25 %
30 %
35 %
40 %
45 %
50 %
55 %
test - end
Ran 2 tests containing 0 assertions.
0 failures, 0 errors.
60 %
lein test 54.58s user 1.37s system 372% cpu 15.028 total
If we uncomment the (.join thread) line, we get a complete run:
~/expr/demo > lein test
----------------------------------
Clojure 1.9.0 Java 10.0.1
----------------------------------
lein test tst.demo.core
test - start
(nth results 99) => [99 0 99]
daemon? false
test - sleeping
thread start
sync-consumer: start
0 %
5 %
10 %
15 %
20 %
25 %
30 %
35 %
40 %
45 %
50 %
55 %
60 %
65 %
70 %
75 %
80 %
85 %
90 %
95 %
"Elapsed time: 9388.313828 msecs"
sync-consumer: stop
thread done
test - end
Ran 2 tests containing 0 assertions.
0 failures, 0 errors.
lein test 72.52s user 1.69s system 374% cpu 19.823 total
It seems to exit early as if Clojure killed off the manual Thread object.
Perhaps you have found an (intermittent) bug.
Thanks to #amalloy and #Alex, I got it working.
I implemented the suggestions by #amalloy in the comments, and both variants work here and in my real case:
; Brittle since "once" is apparently more of an implementation detail of the language
(defn async-consumer [results width height]
(doto
(Thread. ^Runnable
(^:once fn* []
(sync-consumer results width height)
(println "Done...")))
(.start)))
; Arguably less brittle under the assumption that if they replace "once" with another mechanism,
; they'll update "delay".
(defn async-consumer [results width height]
(let [d (delay (sync-consumer results width height))]
(doto
(Thread. ^Runnable
(fn []
#d
(println "Done...")))
(.start))))
I also tried updating to 1.9.0. I thought that might fix it since #Alex says he's on 1.9.0 and can't reproduce this, and there's also this bug fix that seems related. Unfortunately, I didn't notice any difference.
It would be nice if there was an actual, solid mechanism for this. ^:once seems fine, but I don't want to use if just to have it potentially break later, and the use of delay seems like blatant abuse of the object just to make use of its inner (^:once fn* ...).
Oh well, at least it works now. Thanks guys.

Guile Scheme parallel forms speedup

I am experimenting with the parallel forms of Guile Scheme and I have the following code:
(use-modules (srfi srfi-1)
(ice-9 pretty-print)
(ice-9 receive))
(define (busy-work limit)
(if (> limit 0)
(begin (sqrt (+ (expt limit limit) 1))
(busy-work (- limit 1)))
'done))
(define (busy-work-2 lst)
(cond [(null? lst) 'done]
[else
(expt (car lst) (car lst))
(busy-work-2 (cdr lst))]))
(define (time thunk)
(define starting-time (current-time))
(define res (thunk))
(define ending-time (current-time))
(display "elapsed time: ")
(display (- ending-time starting-time))
(display "s")
(newline)
res)
(define (partition-4 numbers)
(define (loop numbers rc0 rc1 rc2 rc3)
(cond [(null? numbers) (list (reverse rc0)
(reverse rc1)
(reverse rc2)
(reverse rc3))]
[else
(let* ([number (car numbers)]
[residue (remainder number 4)])
(cond [(= residue 0) (loop (cdr numbers)
(cons number rc0)
rc1
rc2
rc3)]
[(= residue 1) (loop (cdr numbers)
rc0
(cons number rc1)
rc2
rc3)]
[(= residue 2) (loop (cdr numbers)
rc0
rc1
(cons number rc2)
rc3)]
[(= residue 3) (loop (cdr numbers)
rc0
rc1
rc2
(cons number rc3))]))]))
(loop numbers '() '() '() '()))
(or in my experimenting repository at https://github.com/ZelphirKaltstahl/guile-scheme-tutorials/blob/5321470f8f3cbbdb7f64d4ed60e4b1eaf8d8f444/parallellism/utils.scm)
The 2 procedures busy-work and busy-work-2 are pure number crunching, with lists of numbers, where no calculation depends on another, as far as I am aware. I know the time measurement might not be completely accurate.
However, consistently I do not get the expected speedup from using more threads (cores, as I can see in core usage in my CPU indicator).
Here are some examples, from which I would expect 2 threads to be twice as quickly done with the task as 1 core and 4 cores to be twice as fast as 2 cores. Well more or less at least, because I am splitting up the lists in a way, which should spread work more or less evenly.
Using 4 cores and parallel
(let ([residue-classes (partition-4 (iota 30000))])
(time
(lambda ()
(parallel (busy-work-2 (car residue-classes))
(busy-work-2 (cadr residue-classes))
(busy-work-2 (caddr residue-classes))
(busy-work-2 (cadddr residue-classes))))))
This finishes in approximately 10s on my machine. Sometimes 9s sometimes 10s.
Using par-map which uses 4 threads (cores)
(let ([residue-classes (partition-4 (iota 30000))])
(time
(lambda ()
(par-map busy-work-2
residue-classes))))
This finishes in approximately 10s on my machine. Sometimes 9s sometimes 10s. Just like with parallel.
Using n-par-map with 4 threads (on my machine)
(let ([residue-classes (partition-4 (iota 30000))])
(time
(lambda ()
(n-par-map (current-processor-count)
busy-work-2
residue-classes))))
Also 10s. Here the manual (https://www.gnu.org/software/guile/manual/html_node/Parallel-Forms.html) says:
Unlike those above, the functions described below take a number of threads as an argument. This makes them inherently non-portable since the specified number of threads may differ from the number of available CPU cores as returned by current-processor-count (see Processes). In addition, these functions create the specified number of threads when they are called and terminate them upon completion, which makes them quite expensive.
Therefore, they should be avoided.
While I find this explanation does not make 100% sense as it is (why would n-par-map not use the same pre-created threads as parallel, if there are sufficient of those? Like 4 as on my machine?), I do not see any great overhead and again I see it finishes in 10s approximately. My guess is, that the thread creation takes so little time, that it is simply not noticed compared to all the computation when number crunching.
Using n-par-map with 2 threads (cores)
(let ([residue-classes (partition-4 (iota 30000))])
(time
(lambda ()
(n-par-map 2
busy-work-2
residue-classes))))
Expectation: Might finish in 20s.
Result: This finishes in 12s.
Now of course I am thinking: "Well there must be some massive overhead in the runs with 4 cores!".
Question: But where does this overhead come from, when I am doing purely number crunching without inter-dependencies of any results? Does it use some shared memory so that memory access becomes a bottle neck?
You are probably using a machine with two physical cores which are hyperthreaded, so that 4 cpus are reported. What it shows is that this workload does not suit hyperthreads.
I get a similar result on a machine with two hyperthreaded physical cores. However with a machine with 4 physical cores I get 9 secs using all 4 cores, and 16 secs using only 2 cores, which is more along the lines of what you were expecting.

When to use non-blocking >! / threads and blocking >!! / goroutines with clojure core.async

I'm writing a an ETL process to read event level data from a product database, transform / aggregate it and write to to an analytics data warehouse. I'm using clojure's core.async library to separate these process into concurrently executing components. Here's what the main part of my code looks like right now
(ns data-staging.main
(:require [clojure.core.async :as async])
(:use [clojure.core.match :only (match)]
[data-staging.map-vecs]
[data-staging.tables])
(:gen-class))
(def submissions (make-table "Submission" "Valid"))
(def photos (make-table "Photo"))
(def videos (make-table "Video"))
(def votes (make-table "Votes"))
;; define channels used for sequential data processing
(def chan-in (async/chan 100))
(def chan-out (async/chan 100))
(defn write-thread [table]
"infinitely loops between reading subsequent 10000 rows from
table and ouputting a vector of the rows(maps)
into 'chan-in'"
(while true
(let [next-rows (get-rows table)]
(async/>!! chan-in next-rows)
(set-max table (:max-id (last next-rows))))))
(defn aggregator []
"takes output from 'chan-in' and aggregates it by coupon_id, date.
then adds / drops any fields that are needed / not needed and inputs
into 'chan-out'"
(while true
(->>
(async/<!! chan-in)
aggregate
(async/>!! chan-out))))
(defn read-thread []
"reads data from chan out and interts into Analytics DB"
(while true
(upsert (async/<!! chan-out))))
(defn -main []
(async/thread (write-thread submissions))
(async/thread (write-thread photos))
(async/thread (write-thread videos))
(async/thread-call aggregator)
(async/thread-call read-thread))
As you can see, I'm putting each os component on to its own thread and using the blocking >!! call on the channels. It feels like using the non-blocking >! calls along with go routines might be better for this use case, especially for the database reads which spend most of their time performing i/o and waiting for new rows in the product db. Is this the case, and if so, what would be the best way to implement it? I'm a little unclear on all the tradeoffs between the two methods and on exactly how to effectively use go routines. Also any other suggestions on how to improve the overall architecture would be much appreciated!
Personally, I think your use of threads here is probably the right call. The magic non-blocking nature of go-blocks comes from "parking," which is a special sort of pseudo-blocking that core.async's state machine uses — but since your database calls genuinely block instead of putting the state machine into a parked state, you'd just be blocking some thread from the core.async thread pool. It does depend on how long your synchronous calls take, so this is the sort of thing where benchmarks can be informative, but I strongly suspect threads are the right approach here.
The one exception is your aggregator function. It looks to me like it could just be folded into the definition of chan-out, as (def chan-out (map< aggregate chan-in)).
For a general overview of go-blocks versus threads, Martin Trojer wrote a good examination of the two approaches and which one is faster in which situation. The Cliff's Notes version is that go-blocks are good for adapting already-asynchronous libraries for use with core.async, while threads are good for making asynchronous processes out of synchronous parts. If your database had a callback-based API, for example, then go-blocks would be a definite win. But since it is synchronous, they are not a good fit.
i think that it would be a better approach to use "go" macros to have non-blocking-threads in this ETL case.
I've written a very simple code to achieve the synchronized sequence of processes implied in Extract Transform and Load tasks
Type on your repl the following code:
(require '[clojure.core.async :as async :refer [<! >! <!! timeout chan alt! go]])
(def output(chan))
(defn extract [origin]
(let [value-extracted (chan)
value-transformed (chan)
value-loaded (chan)]
(go
(<! (timeout (+ 100 (* 100 (rand-int 20))))) ; wait a little
(>! value-extracted (str origin " > extracted ")))
(go
(<! (timeout (+ 100 (* 100 (rand-int 20))))) ; wait a little
(>! value-transformed (str (<! value-extracted) " > transformed " )))
(go
(<! (timeout (+ 100 (* 100 (rand-int 20))))) ; wait a little
(>! value-loaded (str (<! value-transformed) " > loaded " )))
(go
(<! (timeout (+ 100 (* 100 (rand-int 20))))) ; wait a little
(>! output [origin (<! value-loaded)]))))
(go
(loop [origins-already-loaded []]
(let [[id message] (<! output)
origins-updated (conj origins-already-loaded id)]
(println message)
(println origins-updated)
(recur origins-updated)
)
))
Type on the repl:
(doseq [example (take 10 (range))] (extract example))
1 > extracted > transformed > loaded
[1]
7 > extracted > transformed > loaded
[1 7]
0 > extracted > transformed > loaded
[1 7 0]
8 > extracted > transformed > loaded
[1 7 0 8]
3 > extracted > transformed > loaded
[1 7 0 8 3]
6 > extracted > transformed > loaded
[1 7 0 8 3 6]
2 > extracted > transformed > loaded
[1 7 0 8 3 6 2]
5 > extracted > transformed > loaded
[1 7 0 8 3 6 2 5]
9 > extracted > transformed > loaded
[1 7 0 8 3 6 2 5 9]
4 > extracted > transformed > loaded
[1 7 0 8 3 6 2 5 9 4]
UPDATE:
the error fixed was to use <!! (timeout (+ 100 (* 100 (rand-int 20))))) inside the removed function "wait-a-while" that was blocking the others no blocking go processes

Resources