Racket: Storing bytes, then outputting them all at the end of the program

My program currently writes bytes using write-byte throughout the program.
When the program hits an error it stops there, but I've realized that this still leaves the previously written bytes in the output (everything written before encountering the error).
I was wondering if it is possible to hold on to all the bytes I want to output until the program ends successfully, so that if the program encounters an error before the end, it outputs nothing, and if no error is encountered, it outputs all the bytes I wanted to write.

You can wrap your program in with-output-to-bytes to produce a byte string value instead of writing directly to stdout:
(with-output-to-bytes
 (λ ()
   (write-bytes #"a")
   (write-bytes #"b")))
Internally, this is just a super simple wrapper around open-output-bytes and a parameterization of current-output-port, so if you want more fine-grained control, you can use those directly. For example, if you have a simple script and don’t want to wrap the whole program, you can mutate the current-output-port parameter globally:
(define stdout (current-output-port))
(define output (open-output-bytes))
(current-output-port output)
(void
 (begin
   (write-bytes #"a")
   (write-bytes #"b")))
(write-bytes (get-output-bytes output) stdout)
However, be careful: mutating current-output-port like that will affect everything that prints, including the output from expressions evaluated at a module level, which is why it is necessary to wrap the write-bytes invocations with void above.
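This composes directly with exception propagation to give the behaviour asked about in the question: if the wrapped thunk raises, with-output-to-bytes never returns, so nothing reaches stdout. A minimal sketch (run-buffered and the example thunks are illustrative names, not a library API):
(define (run-buffered thunk)
  ;; Buffer all output; if thunk raises, with-output-to-bytes never
  ;; returns, and the buffered bytes are simply discarded.
  (define bs (with-output-to-bytes thunk))
  ;; Only reached on success; void suppresses the byte-count result.
  (void (write-bytes bs)))
;; (run-buffered (λ () (write-byte 97) (write-byte 98))) ; prints "ab"
;; (run-buffered (λ () (write-byte 97) (error "boom")))  ; prints nothing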

One can add bytes to a list and print them together later:
(define lst '())
(set! lst (cons #"a" lst))
(set! lst (cons #"b" lst))
(println lst)
(for ([item (reverse lst)])
  (write-bytes item))
Output:
'(#"b" #"a")
ab
The list has to be reversed, since cons adds each item to the head of the list.

Related

Recursively reading a file in Racket

I am struggling to understand how to read a file line by line with Racket, while passing each line to a recursive function.
According to the manual, the idiomatic way of doing this is something like the following example:
(with-input-from-file "manylines.txt"
  (lambda ()
    (for ([l (in-lines)])
      (op l))))
What if my function op is a recursive function that needs to do some complicated operations depending on the line just read from file and also on the history of the recursion?
For example, I could have a function like this:
(define (op l s)
  ;; l is a string, s is a list
  (cond ((predicate? l)
         (op (next-line-from-file) (cons (function-yes l) s)))
        (else
         (op (next-line-from-file) (append (function-no l) s)))))
I am not sure how to use this function within the framework described by the manual.
Here next-line-from-file is a construct I made up to make it clear that I would like to keep reading the file.
I think I could do what I want by introducing side effects, for example:
(with-input-from-file "manylines.txt"
  (lambda ()
    (let ((s '()))
      (for ([l (in-lines)])
        (if (predicate? l)
            (let ((prefix (function-yes l)))
              (set-cdr! s s)
              (set-car! s prefix))
            (let ((prefix (function-no l)))
              (set-cdr! prefix s)
              (set-car! s prefix)))))))
I actually did not try to run this code, so I'm not sure it would work.
Anyway I would bet that this common task can be solved without introducing side effects, but how?
Two approaches that Racket supports rather well are to turn the port into something which is essentially a generator of lines, or into a stream. You can then pass these things around as arguments to whatever function you are using in order to successively read lines from the file.
The underlying thing in both of these is that ports are sequences, (in-lines p) returns another sequence which consists of the lines from p, and then you can turn these into generators or streams.
Here's a function which will cat a file (just read its lines in other words) using a generator:
(define (cat/generator f)
  (call-with-input-file f
    (λ (p)
      (let-values ([(more? next) (sequence-generate (in-lines p))])
        (let loop ([carry-on? (more?)])
          (when carry-on?
            (displayln (next))
            (loop (more?))))))))
Here call-with-input-file deals with opening the file and calling its second argument with a suitable port. in-lines makes a sequence of lines from the port, and sequence-generate then takes any sequence and returns two thunks: one tells you if the sequence is exhausted, and one returns the next thing in it if it isn't. The remainder of the function just uses these functions to print the lines of the file.
Here's an equivalent function which does it using a stream:
(define (cat/stream f)
  (call-with-input-file f
    (λ (p)
      (let loop ([s (sequence->stream (in-lines p))])
        (unless (stream-empty? s)
          (displayln (stream-first s))
          (loop (stream-rest s)))))))
Here the trick is that sequence->stream returns a stream corresponding to a sequence, and then stream-empty? will tell you if you're at the end of the stream, and if it's not empty, then stream-first returns the first element (conceptually the car) while stream-rest returns a stream of all the other elements.
The second one of these is nicer I think.
One nice thing is that lists are streams so you can write functions which use the stream-* functions, test them on lists, and then use them on any other kind of stream, which means any other kind of sequence, and the functions will never know.
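For example, the stream accessors work directly on an ordinary list (a quick REPL check; stream-first and friends come from racket/stream, which #lang racket already provides):
;; Lists count as streams, so the stream operations apply directly:
(stream-first '(1 2 3))               ; => 1
(stream->list (stream-rest '(1 2 3))) ; => '(2 3)
(stream-empty? '())                   ; => #t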
I recently implemented something similar, except in my case the predicate depended on the following line, not the preceding one. In any case, I found it simplest to discard in-lines and use read-line recursively. Since the predicate depended on unread input, I used peek-string to look ahead in the input stream.
If you really want to use in-lines, you might like to experiment with sequence-fold:
(sequence-fold your-procedure '() (in-lines))
Notice this uses an accumulator, which you could use to check the previous results from your procedure. However, if you're building a list, you generally want to build it backwards using cons, so the most recent element is at the head of the list and can be accessed in constant time. Once you're done, reverse the list.
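Alternatively, the op from the question fits naturally into a fold written with for/fold, which threads the accumulator through the loop without mutation. A sketch, assuming predicate?, function-yes, and function-no are defined as in the question (process-file is a made-up name):
;; The question's op recast as a fold over the file's lines; the
;; accumulator s plays the role of the "history of the recursion".
(define (process-file path)
  (with-input-from-file path
    (λ ()
      (for/fold ([s '()])
                ([l (in-lines)])
        (if (predicate? l)
            (cons (function-yes l) s)
            (append (function-no l) s))))))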

Cursive: Clojure's *out*, different Writers, flushing and ordering inconsistency when multithreaded: what is going on?

tl;dr Why does Clojure create a separate Writer for threads in a newFixedThreadPool? Why might it be flushed after the pool is terminated? Why can the behaviour only be reproduced in Cursive?
Suppose we have an application that does something in separate threads, and that something writes to stdout. Suppose that after we've done everything, we want to print a final message.
The first thing we'll run into is that Clojure's println, when supplied with multiple arguments, can produce interleaved output. This is covered here.
But there seems to be another problem. If we run something like this:
(defn main []
  (let [pool (make-pool num-threads)]
    (print-multithreaded pool "Hello, world!")
    (shutdown-pool pool))
  (safe-println "All done, have a nice day."))
We'll sometimes have
Hello, world!
All done, have a nice day.
and sometimes
All done, have a nice day.
Hello, world!
Maybe flush after each write?
(defn safe-println [& more]
  (.write *out* (str (clojure.string/join " " more) "\n"))
  (.flush *out*))
Doesn't work.
What works is resorting to explicit Java interop on top of System.out, like this:
(defn safe-println [& more]
  (let [writer (System/out)]
    (.println writer (str (clojure.string/join " " more)))
    (.flush writer)))
Making writer a (PrintWriter. System/out) or (OutputStreamWriter. System/out) also works.
Seems like we have different *out*s in our threads... Indeed,
(def out *out*)

(defn safe-println [& more]
  (.write out (str (clojure.string/join " " more) "\n"))
  (.flush out))
works.
So here's the question: why is this happening? With the Java pieces, it makes sense: System.out is static final, so only one instance exists for all the threads, everything talks to it, and everything adds to the same buffer. With printing to Clojure's *out*, the main thread and the pooled threads have their own *out*, with their own buffers (for the main thread it's a PrintWriter, for the pooled ones it's a shared OutputStreamWriter). I don't really get why it's like that in the first place, and I don't really get why it results in inconsistent ordering: we explicitly finish all our threads before calling the final print, which should induce an implicit flush. But even if we add an explicit flush, the result stays the same.
I might be missing some really obvious detail here, and I'd be glad if you helped me out. If you'd like to see the whole reproducible example, which I don't include here because of its length, here's a link to the gist: https://gist.github.com/trueneu/b8498aa259899a8fc979090fccf632de
EDIT: First version of gist actually works and you have to tinker with it to break it, so I edited it to demonstrate "incorrect" behaviour from the start.
Also, to remove any misunderstandings, here's a screenshot from Cursive: https://ibb.co/jHqSL0
EDIT2: This was pointed out in the original question, but I'll put some emphasis on it: understanding the point and mechanism of this behaviour is half of the question. A new *out* is not created for each thread, but a separate one does seem to be created for the thread pool. (For this output, reduce num-threads to 1 and add printing of (.toString *out*) to safe-println. Increasing num-threads doesn't produce new object addresses):
(main)
java.io.PrintWriter#1dcc77c6
All done, have a nice day.
=> nil
java.io.OutputStreamWriter#7104a76f
Hello, world!
EDIT3: Changed map to doseq after #glts's comment.
Also, when run from lein repl, it always produces correct output, which confuses me further. So, as David Arenas pointed out, the behaviour seems to depend on upstream output handling. However, the questions still stand.
EDIT4: David Arenas also checked this in CIDER and could not reproduce the behaviour. It seems to have something to do with Cursive's nREPL output handling implementation.
Clojure's *out* does not create an instance for each thread (it is also static final), but it does use an OutputStreamWriter, which makes no atomicity guarantees. You would need to synchronize threads on a buffer, since you are writing to a single stream.
If you run your code using nREPL you'll see that you get the "correct" behavior. This is because it rebinds *out* to its own writer, which uses a locking buffer.
nrepl's session-out:
(defn- session-out
  "Returns a PrintWriter suitable for binding as *out* or *err*. All of
  the content written to that PrintWriter will (when .flush-ed) be sent on the
  given transport in messages specifying the given session-id.
  `channel-type` should be :out or :err, as appropriate."
  [channel-type session-id transport]
  (let [buf (clojure.tools.nrepl.StdOutBuffer.)]
    (PrintWriter. (proxy [Writer] []
                    (close [] (.flush ^Writer this))
                    (write [& [x ^Integer off ^Integer len]]
                      (locking buf
                        (cond
                          (number? x) (.append buf (char x))
                          (not off) (.append buf x)
                          ; the CharSequence overload of append takes an *end* idx, not length!
                          (instance? CharSequence x) (.append buf ^CharSequence x (int off) (int (+ len off)))
                          :else (.append buf ^chars x off len))
                        (when (<= *out-limit* (.length buf))
                          (.flush ^Writer this))))
                    (flush []
                      (let [text (locking buf (let [text (str buf)]
                                                (.setLength buf 0)
                                                text))]
                        (when (pos? (count text))
                          (t/send (or (:transport *msg*) transport)
                                  (response-for *msg* :session session-id
                                                channel-type text))))))
                  true)))
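If you only need safe-println itself to behave, the same locking idea can be applied in miniature: capture a single writer once (as the (def out *out*) workaround above does) and serialize writes to it. A hedged sketch, not nREPL's actual mechanism:
;; One shared writer plus a lock, so concurrent writes cannot
;; interleave or end up in different per-pool buffers.
(def ^:private shared-out *out*)
(def ^:private out-lock (Object.))

(defn safe-println [& more]
  (locking out-lock
    (.write shared-out (str (clojure.string/join " " more) "\n"))
    (.flush shared-out)))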

unread-char behaviour deviating from spec?

On the Common Lisp HyperSpec page for unread-char - see here - it says both of the following things:
"unread-char is intended to be an efficient mechanism for allowing the Lisp reader and other parsers to perform one-character lookahead in input-stream."
"It is an error to invoke unread-char twice consecutively on the same stream without an intervening call to read-char (or some other input operation which implicitly reads characters) on that stream."
I'm investigating how to add support for multiple-character lookahead for CL streams for a parser I'm planning to write, and just to confirm the above, I ran the following code:
(defun unread-char-test (data)
  (with-input-from-string (stream data)
    (let ((stack nil))
      (loop for c = (read-char stream nil)
            while c
            do (push c stack))
      (loop for c = (pop stack)
            while c
            do (unread-char c stream)))
    (coerce (loop for c = (read-char stream nil)
                  while c
                  collect c)
            'string)))
(unread-char-test "hello")
==> "hello"
It doesn't throw an error (on SBCL or CCL; I haven't tested other implementations yet), but I don't see how there can possibly be any read operations (implicit or explicit) taking place on the stream between the consecutive calls to unread-char.
This behaviour is good news for multiple-character lookahead, as long as it is consistent, but why isn't an error being thrown?
In response to user jkiiski's comment I did some more digging. I defined a function similar to the above but that takes the stream as an argument (for easier reuse):
(defun unread-char-test (stream)
  (let ((stack nil))
    (loop for c = (read-char stream nil)
          while c
          do (push c stack))
    (loop for c = (pop stack)
          while c
          do (unread-char c stream)))
  (coerce (loop for c = (read-char stream nil)
                while c
                collect c)
          'string))
I then ran the following in a second REPL:
(defun create-server (port)
  (usocket:with-socket-listener (listener "127.0.0.1" port)
    (usocket:with-server-socket (connection (usocket:socket-accept listener))
      (let ((stream (usocket:socket-stream connection)))
        (print "hello" stream)))))
(create-server 4000)
And the following in the first REPL:
(defun create-client (port)
  (usocket:with-client-socket (connection stream "127.0.0.1" port)
    (unread-char-test stream)))
(create-client 4000)
And it did throw the error I expected:
Two UNREAD-CHARs without intervening READ-CHAR on #<BASIC-TCP-STREAM ISO-8859-1 (SOCKET/4) #x302001813E2D>
[Condition of type SIMPLE-ERROR]
This suggests that jkiiski's assumption is correct. The original behaviour was also observed when the input was read from a text file, like so:
(with-open-file (stream "test.txt" :direction :output)
  (princ "hello" stream))

(with-open-file (stream "test.txt")
  (unread-char-test stream))
==> "hello"
I imagine that, when dealing with local file I/O, the implementation reads large chunks of a file into memory, and then read-char reads from the buffer. If correct, this also supports the assumption that the error described in the specification is not thrown by typical implementations when unreading from a stream whose contents are in-memory.
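Given that the spec only guarantees a single character of pushback, a portable route to multiple-character lookahead is to keep your own pushback list in front of the stream instead of calling unread-char repeatedly. A sketch (the lookahead-stream structure and la-* names are made up for illustration):
;; Wrap a stream with an explicit pushback list so that any number of
;; characters can be "unread" portably.
(defstruct lookahead-stream
  (stream nil)
  (pushback '()))

(defun la-read-char (ls &optional (eof-error-p t) eof-value)
  (if (lookahead-stream-pushback ls)
      (pop (lookahead-stream-pushback ls))
      (read-char (lookahead-stream-stream ls) eof-error-p eof-value)))

(defun la-unread-char (char ls)
  (push char (lookahead-stream-pushback ls)))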

How to exhaust a channel's values and then return the result (ClojureScript)?

Suppose that channel chan has the values "1" and "2" on queue.
Goal: Make a function which takes chan and returns the vector [1 2]. Note that I am totally fine if this function has to block for some time before its value is returned.
Attempt:
(defn chan->vector
  [chan]
  (let [a (atom true) v []]
    (while (not-nil? @a)
      (go
        (reset! a (<! chan))
        (into v @a)
        (reset! a (<! chan))))
    v))
Result: My REPL freezes and eventually spits out a huge error. I have come to realize that this is because the (go ...) block is asynchronous and so returns immediately. Thus the atom in my (while ...) loop never gets a chance to be set to nil, and the loop can never terminate.
So how do I accomplish the desired result? In case it's relevant, I'm using ClojureScript and targeting nodejs.
You should use alts! from core.async for this task (https://clojure.github.io/core.async/#clojure.core.async/alts!):
(def x (chan 10))

(go (>! x 1)
    (>! x 2)
    (>! x 3))

(defn read-all [from-chan]
  (<!! (go-loop [res []]
         (let [[v _] (alts! [from-chan] :default :complete)]
           (if (= v :complete)
             res
             (recur (conj res v)))))))
(read-all x)
;; output: [1 2 3]
(read-all x)
;; output: []
(go (>! x 10)
(>! x 20)
(>! x 30)
(>! x 40))
(read-all x)
;; output: [10 20 30 40]
Inside the go-loop, (alts! [from-chan] :default :complete) tries to read a value from the channel; if no value is available at once, it yields the default value, telling you to break the loop and return the accumulated values.
Update: as the blocking read (<!!) is absent in ClojureScript, you can rewrite it in the following way:
(defn read-all [from-chan]
  (go-loop [res []]
    (let [[v _] (alts! [from-chan] :default :complete)]
      (if (= v :complete)
        res
        (recur (conj res v))))))
This returns a channel, from which you then read the single accumulated value:
(go (let [res (<! (read-all x))]
      (println res)
      ;; do something else
      ))
You can use clojure.core.async/reduce:
;; demo setup
(def ch (async/chan 2))
(async/>!! ch :foo)
(async/>!! ch :bar)
;; background thread to print reduction result
(async/thread
  (prn (async/<!! (async/reduce conj [] ch))))
;; closing the channel…
(async/close! ch)
;; …terminates the reduction and the result gets printed out:
;; [:foo :bar]
clojure.core.async/reduce returns a channel that will produce a value if and when the original channel closes. Internally it uses a go block and will release control in between taking elements from the original channel.
If you want to produce a value after a certain amount of time passes whether or not the original channel closes, you can either wrap the original channel in a pass-through channel that will close itself after a timeout passes or you can use a custom approach to the reduction step (perhaps the approach suggested by #leetwinski).
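A hedged sketch of that pass-through idea, reusing the async alias from the example above (take-until-timeout is a made-up name):
;; Forward values from ch until it closes or ms milliseconds elapse,
;; then close the output so a downstream reduce/into can terminate.
(defn take-until-timeout [ch ms]
  (let [out  (async/chan)
        stop (async/timeout ms)]
    (async/go-loop []
      (let [[v port] (async/alts! [ch stop])]
        (if (and (= port ch) (some? v))
          (do (async/>! out v) (recur))
          (async/close! out))))
    out))
With this, (async/reduce conj [] (take-until-timeout ch 1000)) produces a result even if ch never closes.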
Use into
Returns a channel containing the single (collection) result of the items taken from the channel conjoined to the supplied collection. ch must close before into produces a result.
Something like this should work (it should print the events from events-chan, given that events-chan closes when it is done publishing events):
(go
  (println (<! (into [] events-chan))))
The source channel needs to end (close), otherwise you can't put all events into a collection.
Edit:
I re-read your question, and it is not very clear what you want to accomplish. Whatever you want to do, chan->vector needs to return a channel so that whoever calls it can wait for the result. In fact, chan->vector is exactly into:
; chan->vector : Chan<Event> -> Chan<Vector[Event]>
(defn chan->vector [ch]
  (into [] ch))

(go
  (let [events (<! (chan->vector events-chan))]
    (println events) ; Do whatever with the events vector
    ))
As I mentioned above, if the events chan never closes, then you have to do more thinking about how to consume the events. There is no magic solution. Do you want to batch the events by time intervals? By number of events? By a combination of those?
In summary, as mentioned above, chan->vector is into.
While possible in Clojure and many other languages, what you want to do is not possible in ClojureScript.
You want a function that blocks while listening to a channel. However, ClojureScript's version of core.async doesn't include the blocking operators. Why? Because ClojureScript doesn't block.
I couldn't find a reliable source to back that last sentence. There seems to be a lot of confusion around this topic on the web. However, I'm pretty sure of what I'm saying because ClojureScript ultimately becomes JavaScript, and that's how JavaScript works.
Indeed, JavaScript never blocks, neither in the browser nor in Node.js. Why? As far as I understand, it uses a single thread, so if it were to block, the user would be unable to do anything in the browser.
So it's impossible to do what you want. This is by design, because it could have disastrous UX effects. ClojureScript channels are like JavaScript events; in the same way you don't want an event listener to block the user interface while waiting for an event to happen, you also shouldn't want a channel to block while waiting for new values.
Instead, try using a callback function that gets called whenever a new value is delivered.
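For instance, a small helper that invokes a callback for each value taken from the channel; a ClojureScript sketch, assuming the usual cljs.core.async requires (on-value is a made-up name):
;; Consume a channel with a callback instead of blocking; a nil take
;; means the channel has closed, which ends the loop.
(defn on-value [ch callback]
  (go-loop []
    (when-some [v (<! ch)]
      (callback v)
      (recur))))
;; usage: (on-value events-chan #(println "got" %))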

Read-only strings in Clisp

I'm noticing some inconsistency in the output of this code in Clisp:
(defvar str "Another")
(setf (char str 3) #\!)
When I run it from the repl, I get the desired result:
[1]> (defvar str "Another")
STR
[2]> (setf (char str 3) #\!)
#\!
[3]> str
"Ano!her"
[4]>
However, when I run it from a script, I get a warning about modifying a readonly string:
*** - Attempt to modify a read-only string: "Another"
I got that error when running this code:
(print (do ((str "foobar")
            (i 0 (+ i 1)))
           ((= i (length str)) str)
         (setf (char str i) #\!)))
What's the point of making the string read-only (I'm assuming this is the same as immutable) when the binding will disappear when the block ends?
And why the discrepancy between the two outputs?
Lastly, is there a way to turn it off? I don't find the warning particularly useful.
Solution
First of all, what you are seeing is an error, not a warning.
Second, you cannot turn it off, but you can avoid it by copying the immutable string:
(print (do ((str (copy-seq "foobar"))
            (i 0 (+ i 1)))
           ((= i (length str)) str)
         (setf (char str i) #\!)))
Motivation
Why some data is made immutable is a topic much discussed on the web.
The basic reasons are:
safety in a multithreaded environment and
better compilers
Justification
As per the manual:
An attempt to modify read-only data SIGNALs an ERROR. Program text and quoted constants loaded from files are considered read-only data. This check is only performed for strings, not for conses, other kinds of arrays, and user-defined data types.
This is explicitly permitted by the ANSI CL spec:
implementations are not required to detect attempts to modify immutable objects or cells; the consequences of attempting to make such modification are undefined
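In other words, only literals that are part of the program text are affected; strings constructed at run time remain mutable. A quick illustration:
;; Freshly constructed strings are always mutable; only literals loaded
;; as part of the program text may be read-only.
(let ((s (make-string 7 :initial-element #\a)))
  (setf (char s 3) #\!)
  s)
;; => "aaa!aaa"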
