In Tackling the Awkward Squad: monadic input/output, concurrency, exceptions, and foreign-language calls in Haskell, SPJ states:
For example, perhaps the functional program could be a function mapping an input character string to an output string:
main :: String -> String
Now a "wrapper" program, written in (gasp!) C, can get an input string from somewhere [...] apply the function to it, and store the result somewhere [...]
He then goes on to say that this locates the "sinfulness" in the wrapper, and that the trouble with this approach is that one sin leads to another (e.g. more than one input, deleting files, opening sockets etc).
This seems odd to me. I would have thought Haskell would be most powerful, and possibly even most useful, when approached exactly in this fashion. That is, the input is a character string located in a file, and output is a new character string in a new file. If the input string is some mathematical expression concatenated with data, and the output string is (non-Haskell) code, then you can Get Things Done. In other words, why not always treat Haskell programs as translators? (Or as a compiler, but as a translator you can blend genuine I/O into the final executable.)
Regardless of the wisdom of this as a general strategy (I appreciate that some things we might want to get done may not start out as mathematics), my real question is: if this is indeed the approach, can we avoid the IO a type? Do we need a wrapper in some other language? Does anyone actually do this?
The point is that String -> String is a rather poor model of what programs do in general.
What if you're writing an http server that accepts concurrent pipelined requests and responds to each pipeline concurrently while also interleaving writes from a response in a pipeline along with reads for the next request? This is the level of concurrency that http servers work at.
Maybe, just maybe, you can stuff that into a String -> String program. You could multiplex the pipelines into your single channel. But what about timeouts? Web servers time out connections that trickle in, to prevent slow loris attacks. How are you even going to account for that? Maybe your input string has a sequence of timestamps added at regular intervals, regardless of other inputs? Oh, but what about the variant where the recipient only reads from their receive buffer in trickles? How can you even tell that you are blocked waiting for the send buffer to drain?
If you pursue all the potential problems and stuff them into a String -> String program, you eventually end up with almost all the interesting parts of the server existing outside your haskell program. After all, something has to do the multiplexing, has to do the error detection and reporting, has to do timeouts. If you're writing an http server in Haskell, it would be nice if it was actually written in Haskell.
Of course, none of this means the IO type as it currently exists is the best possible answer. There are reasonable complaints that can be made about it. But it at least allows you to address all of those issues within Haskell.
That wrapper exists. It’s called Prelude.interact. I use the Data.ByteString versions of it often. The wrapper to a pure String -> String function works, because strings are lazily-evaluated singly-linked lists that can process each line of input as it’s read in, but singly-linked lists of UCS-4 characters are a very inefficient data structure.
You still need to use IO for the wrapper because the operations depend on the state of the universe and need to be sequenced with the outside world. In particular, if your program is interactive, you want it to respond to a new keyboard command immediately, and run all the OS system calls in sequence, not (say) process all the input and display all the output at once when it’s ready to quit the program.
A simple program to demonstrate this is:
module Main where
import Data.Char (toUpper)
main :: IO ()
main = interact (map toUpper)
Try running this interactively. Type control-D to quit on Linux or the MacOS console, and control-Z to quit on Windows.
As I mentioned before, though String is not an efficient data structure at all. For a more complicated example, here is the Main module of a program I wrote to normalize UTF-8 input to NFC form.
module Main ( lazyNormalize, main ) where
import Data.ByteString.Lazy as BL ( fromChunks, interact )
import Data.Text.Encoding as E (encodeUtf8)
import Data.Text.Lazy as TL (toChunks)
import Data.Text.Lazy.Encoding as LE (decodeUtf8)
import Data.Text.ICU ( NormalizationMode (NFC) )
import TextLazyICUNormalize (lazyNormalize)
main :: IO ()
main = BL.interact (
BL.fromChunks .
map E.encodeUtf8 .
TL.toChunks . -- Workaround for buffer not always flushing on newline.
lazyNormalize NFC .
LE.decodeUtf8 )
This is an Data.Bytestring.Lazy.interact wrapper around a Data.Text.Lazy.Text -> Data.Text.Lazy.Text function, lazyNormalize with the NormalizationMode constant NFC as its curried first argument. Everything else just converts from the lazy ByteString strings I use to do I/O to the lazy Text strings the ICU library understands, and back. You will probably see more programs written with the & operator than in this point-free style.
There are multiple questions here:
If the input string is some mathematical expression concatenated with data, and the output string is (non-Haskell) code, then you can Get Things Done. In other words, why not always treat Haskell programs as translators? ([... because] as a translator you can blend genuine I/O into the final executable.)
Regardless of the wisdom of this as a general strategy [...] if this is indeed the approach, can we avoid the IO a type?
Do we need a wrapper in some other language?
Does anyone actually do this?
Using your informal description, main would have a type signature resembling:
main :: (InputExpr, InputData) -> OutputCode
If we drop the InputData component:
main :: InputExpr -> OutputCode
then (as you noted) main really does look like a translator...but we already have programs for doing that - compilers!
While task-specific translators have their place, using them everywhere when we already have general-purpose ones seems somewhat redundant...
...so it's use as as a general strategy is speculative at best. This is probably why such an approach was never officially "built into" Haskell (instead being implemented using Haskell's I/O facilities e.g. interact for simple string-to-string interactions).
As for getting by without the monadic IO type - for a single-purpose language like Dhall, it could be possible...but can that technique also be used to build webservers or operating systems? (That is left as an exercise for intrepid readers :-)
In theory: no - the various Lisp Machines being the canonical example.
In practice: yes - as the variety of microprocessor and assembly languages grows ever larger, portably defining the essential runtime services (parallelism, GC, I/O, etc) in an existing programming language these days is a necessity.
Not exactly - perhaps the closest to what you're described is (again) the string-to-string interactivity supported in the venerable Lazy ML system from Chalmers or the original version of Miranda(R).
Related
Is there any difference in how Data.Text and Data.Vector.Unboxed Char work internally? Why would I choose one over the other?
I always thought it was cool that Haskell defines String as [Char]. Is there a reason that something analagous wasn't done for Text and Vector Char?
There certainly would be an advantage to making them the same.... Text-y and Vector-y tools could be written to be used in both camps. Imagine Ropes of Ints, or Regexes on strings of poker cards.
Of course, I understand that there were probably historical reasons and I understand that most current libraries use Data.Text, not Vector Char, so there are many practical reasons to favor one over the other. But I am more interested in learning about the abstract qualities, not the current state that we happen to be in.... If the whole thing were rewritten tomorrow, would it be better to unify the two?
Edit, with more info-
To put stuff into perspective-
According to this page, http://www.haskell.org/haskellwiki/GHC/Memory_Footprint, GHC uses 16 bytes for each Char in your program!
Data.Text is not O(1) index'able, it is O(n).
Ropes (binary trees wrapped around text) can also hold strings.... They have better complexity for index/insert/delete, although depending on the number of nodes and balance of the tree, index could be close to that of Text.
This is my takeaway from this-
Text and Vector Char are different internally....
Use String if you don't care about performance.
If performance is important, default to using Text.
If fast indexing of chars is necessary, and you don't mind a lot of memory overhead (up to 16x), use Vector Char.
If you want to insert/delete a lot of data, use Ropes.
It's a fairly bad idea to think of Text as being a list of characters. Text is designed to be thought of as an opaque, user-readable blob of Unicode text. Character boundaries might be defined based on encoding, locale, language, time of month, phase of the moon, coin flips performed by a blinded participant, and migratory patterns of Venezuela's national bird whatever it may be. The same story happens with sorting, up-casing, reversing, etc.
Which is a long way of saying that Text is an abstract type representing human language and goes far out of its way to not behave just the same way as its implementation, be it a ByteString, a Vector UTF16CodePoint, or something totally unique (which is the case).
To clarify this distinction take note that there's no guarantee that unpack . pack witnesses an isomorphism, that the preferred ways of converting from Text to ByteString are in Data.Text.Encoding and are partial, and that there's a whole sophisticated plug-in module text-icu littered with complex ways of handling human language strings.
You absolutely should use Text if you're dealing with a human language string. You should also be really careful to treat it with care since human language strings are not easily amenable to computer processing. If your string is better thought of as a machine string, you probably should use ByteString.
The pedagogical advantages of type String = [Char] are high, but the practical advantages are quite low.
To add to what J. Abrahamson said, it's also worth making the distinction between iterating over runes (roughly character by character, but really could be ideograms too) as opposed to unitary logical unicode code points. Sometimes you need to know if you're looking at a code point that has been "decorated" by a previous code point.
In the case of the latter, you then have to make the distinction between code points that stand alone (such as letters, ideograms) and those that modify the text that follows (right-to-left code point, diacritics, etc).
Well implemented unicode libraries will typically abstract these details away and let you process the text in a more or less character-by-character fashion but you have to drop certain assumptions that come from thinking in terms of ASCII.
A byte is not a character. A logical unit of text isn't necessarily a "character". Not every code point stands alone, some decorate/annotate the following code point or even the rest of the byte stream until invalidated (right-to-left).
Unicode is hard. There is no one true encoding that will eliminate the difficulty of encapsulating the variety inherent in human language. Data.Text does a respectable job of it though.
To summarize:
The methods of processing are:
byte-by-byte - totally invalid for unicode, only applicable to latin-1/ASCII
code point by code point - works for processing unicode, but is lower-level than people realize
logical rune-by-rune - what you actually want
The types are:
String (aka [Char]) - has a limited scope. Best used for teaching Haskell or for legacy use-cases.
Text - the preferred way to handle "human" text.
Bytestring - for byte streams, raw data, binary etc.
I'm considering converting a C# app to Haskell as my first "real" Haskell project. However I want to make sure it's a project that makes sense. The app collects data packets from ~15 serial streams that come at around 1 kHz, loads those values into the corresponding circular buffers on my "context" object, each with ~25000 elements, and then at 60 Hz sends those arrays out to OpenGL for waveform display. (Thus it has to be stored as an array, or at least converted to an array every 16 ms). There are also about 70 fields on my context object that I only maintain the current (latest) value, not the stream waveform.
There are several aspects of this project that map well to Haskell, but the thing I worry about is the performance. If for each new datapoint in any of the streams, I'm having to clone the entire context object with 70 fields and 15 25000-element arrays, obviously there's going to be performance issues.
Would I get around this by putting everything in the IO-monad? But then that seems to somewhat defeat the purpose of using Haskell, right? Also all my code in C# is event-driven; is there an idiom for that in Haskell? It seems like adding a listener creates a "side effect" and I'm not sure how exactly that would be done.
Look at this link, under the section "The ST monad":
http://book.realworldhaskell.org/read/advanced-library-design-building-a-bloom-filter.html
Back in the section called “Modifying array elements”, we mentioned
that modifying an immutable array is prohibitively expensive, as it
requires copying the entire array. Using a UArray does not change
this, so what can we do to reduce the cost to bearable levels?
In an imperative language, we would simply modify the elements of the
array in place; this will be our approach in Haskell, too.
Haskell provides a special monad, named ST, which lets us work
safely with mutable state. Compared to the State monad, it has some
powerful added capabilities.
We can thaw an immutable array to give a mutable array; modify the
mutable array in place; and freeze a new immutable array when we are
done.
...
The IO monad also provides these capabilities. The major difference between the two is that the ST monad is intentionally designed so that we can escape from it back into pure Haskell code.
So should be possible to modify in-place, and it won't defeat the purpose of using Haskell after all.
Yes, you would probably want to use the IO monad for mutable data. I don't believe the ST monad is a good fit for this problem space because the data updates are interleaved with actual IO actions (reading input streams). As you would need to perform the IO within ST by using unsafeIOToST, I find it preferable to just use IO directly. The other approach with ST is to continually thaw and freeze an array; this is messy because you need to guarantee that old copies of the data are never used.
Although evidence shows that a pure solution (in the form of Data.Sequence.Seq) is often faster than using mutable data, given your requirement that data be pushed out to OpenGL, you'll possible get better performance from working with the array directly. I would use the functions from Data.Vector.Storable.Mutable (from the vector package), as then you have access to the ForeignPtr for export.
You can look at arrows (Yampa) for one very common approach to event-driven code. Another area is Functional Reactivity (FRP). There are starting to be some reasonably mature libraries in this domain, such as Netwire or reactive-banana. I don't know if they'd provide adequate performance for your requirements though; I've mostly used them for gui-type programming.
I'm trying to do some research for a new project, and I need to create objects dynamically from random data.
For this to work, I need a language / compiler that doesn't have problems with weird uncompilable code lying around.
Basically, I need the random code to compile (or be interpreted) as much as possible - Meaning that the uncompilable parts will be ignored, and only the compilable parts will create the objects (which could be ran).
Object Oriented-ness is not a must, but is a very strong advantage.
I thought of ASM, but it's very messy, and I'd probably need a more readable code
Thanks!
It sounds like you're doing something very much like genetic programming; even if you aren't, GP has to solve some of the same problems—using randomness to generate valid programs. The approach to this that is typically used is to work with a syntax tree: rather than storing x + y * 3 - 2, you store something like the following:
Then, instead of randomly changing the syntax, one can randomly change nodes in the tree instead. And if x should randomly change to, say, +, you can statically know that this means you need to insert two children (or not, depending on how you define +).
A good choice for a language to work with for this would be any Lisp dialect. In a Lisp, the above program would be written (- (+ x (* y 3)) 2), which is just a linearization of the syntax tree using parentheses to show depth. And in fact, Lisps expose this feature: you can just as easily work with the object '(- (+ x (* y 3)) 2) (note the leading quote). This is a three-element list, whose first element is -, second element is another list, and third element is 2. And, though you might or might not want it for your particular application, there's an eval function, such that (eval '(- (+ x (* y 3)) 2)) will take in the given list, treat it as a Lisp syntax tree/program, and evaluate it. This is what makes Lisps so attractive for doing this sort of work; Lisp syntax is basically a reification of the syntax-tree, and if you operate at the syntax-tree level, you can work on code as though it was a value. Lisp won't help you read /dev/random as a program directly, but with a little interpretation layered on top, you should be able to get what you want.
I should also mention, though I don't know anything about it (not that I know much about ordinary genetic programming) the existence of linear genetic programming. This is sort of like the assembly model that you mentioned—a linear stream of very, very simple instructions. The advantage here would seem to be that if you are working with /dev/random or something like it, the amount of interpretation needed is very small; the disadvantage would be, as you mentioned, the low-level nature of the code.
I'm not sure if this is what you're looking for, but any programming language can be made to function this way. For any programming language P, define the language Palways as follows:
If p is a valid program in P, then p is a valid program in Palways whose meaning is the same as its meaning in P.
If p is not a valid program in P, then p is a valid program in Palways whose meaning is the same as a program that immediately terminates.
For example, I could make the language C++always so that this program:
#include <iostream>
using namespace std;
int main() {
cout << "Hello, world!" << endl;
}
would compile as "Hello, world!", while this program:
Hahaha! This isn't legal C++ code!
Would be a legal program that just does absolutely nothing.
To solve your original problem, just take any OOP language like Java, Smalltalk, etc. and construct the appropriate Javaalways, Smalltalkalways, etc. language from it. Again, I'm not sure if this is at all what you're looking for, but it could be done very easily.
Alternatively, consider finding a grammar for any OOP language and then using that grammar to produce random syntactically valid programs. You could then filter those programs down by using the Palways programming language for that language to eliminate syntactically but not semantically valid programs.
Divide the ASCII byte values into 9 classes (division modulo 9 would help). Then assign then to Brainfuck codewords (see http://en.wikipedia.org/wiki/Brainfuck). Then interpret as Brainfuck.
There you go, any sequence of ASCII characters is a program. Not that it's going to do anything sensible... This approach has a much better chance, compared to templatetypedef's answer, to get a nontrivial program from a random byte sequence.
Text Editors
You could try feeding random character strings to an editor like Emacs or VI. Many (most?) characters will perform an editing action but some will do nothing (other than beep, perhaps). You would have to ensure that the random code mutator never generates the character sequence that exits the editor. However, this experience would be much like programming a Turing machine -- the code is not too readable.
Mathematica
In Mathematica, undefined symbols and other expressions evaluate to themselves, without error. So, that language might be a viable choice if you can arrange for the random code mutator to always generate well-formed expressions. This would be readily achievable since the basic Mathematica syntax is trivial, making it easy to operate on syntactic units rather than at the character level. It would be even easier if the mutator were written in Mathematica itself since expression-munging is Mathematica's forte. You could define a mini-language of valid operations within a Mathematica package that does not import the system-defined symbols. This would allow you to generate well-formed expressions to your heart's content without fear of generating a dangerous expression, like DeleteFile[FileNames["*.*", "/", Infinity]].
I believe Common Lisp should suit your needs. I always have some code in my SLIME/Emacs session that wouldn't compile. You can always tweak things, redefine functions in run-time. It is actually very good for prototyping.
A few years ago it took me quite a while to learn. But nowadays we have quicklisp and everything is so much easier.
Here I describe my development environment:
Install lisp on my linux machine
PS: I want to give an example, where Common Lisp was useful for me:
Up to maybe 2004 I used to write small programs in C (the keep it simple Unix way).
The last 3 years I had to get lots of different hardware running. Motorized stages, scientific cameras, IO cards.
The cameras turned out to be quite annoying. Usually you have to cool them down to -50 degree celsius or so and (in some SDKs) they don't like it when you close them. But this
is exactly how my C development cycle worked: write (30s), compile (1s), run (0.1s), repeat.
Eventually I decided to just use Common Lisp. Often it is straight forward to define the foreign function interfaces to talk to the SDKs and I can do this without ever leaving the running Lisp image. I start the editor in the morning define the open-device function, to talk to the device and after 3 hours I have enough of the functions implemented to set gain, temperature, region of interest and obtain the video.
Then I can often put the SDK manual away and just use the camera.
I used the same interactive programming approach when I have to parse some webpage or some weird XML.
I am trying to figure out how to do the following, assume that your are working on a controller for a DC motor you want to keep it spinning at a certain speed set by the user,
(def set-point (ref {:sp 90}))
(while true
(let [curr (read-speed)]
(controller #set-point curr)))
Now that set-point can change any time via a web a application, I can't think of a way to do this without using ref, so my question is how functional languages deal with this sort of thing? (even though the example is in clojure I am interested in the general idea.)
This will not answer your question but I want to show how these things are done in Clojure. It might help someone reading this later so they don't think they have to read up on monads, reactive programming or other "complicated" subjects to use Clojure.
Clojure is not a purely functional language and in this case it might be a good idea to leave the pure functions aside for a moment and model the inherent state of the system with identities.
In Clojure, you would probably use one of the reference types. There are several to choose from and knowing which one to use might be difficult. The good news is they all support the unified update model so changing the reference type later should be pretty straight forward.
I've chosen an atom but depending on your requirements it might be more appropriate to use a ref or an agent.
The motor is an identity in your program. It is a "label" for some thing that has different values at different times and these values are related to each other (i.e., the speed of the motor). I have put a :validator on the atom to ensure that the speed never drops below zero.
(def motor (atom {:speed 0} :validator (comp not neg? :speed)))
(defn add-speed [n]
(swap! motor update-in [:speed] + n))
(defn set-speed [n]
(swap! motor update-in [:speed] (constantly n)))
> (add-speed 10)
> (add-speed -8)
> (add-speed -4) ;; This will not change the state of motor
;; since the speed would drop below zero and
;; the validator does not allow that!
> (:speed #motor)
2
> (set-speed 12)
> (:speed #motor)
12
If you want to change the semantics of the motor identity you have at least two other reference types to choose from.
If you want to change the speed of the motor asynchronously you would use an agent. Then you need to change swap! with send. This would be useful if, for example, the clients adjusting the motor speed are different from the clients using the motor speed, so that it's fine for the speed to be changed "eventually".
Another option is to use a ref which would be appropriate if the motor need to coordinate with other identities in your system. If you choose this reference type you change swap! with alter. In addition, all state changes are run in a transaction with dosync to ensure that all identities in the transaction are updated atomically.
Monads are not needed to model identities and state in Clojure!
For this answer, I'm going to interpret "a purely functional language" as meaning "an ML-style language that excludes side effects" which I will interpret in turn as meaning "Haskell" which I'll interpret as meaning "GHC". None of these are strictly true, but given that you're contrasting this with a Lisp derivative and that GHC is rather prominent, I'm guessing this will still get at the heart of your question.
As always, the answer in Haskell is a bit of sleight-of-hand where access to mutable data (or anything with side effects) is structured in such a way that the type system guarantees that it will "look" pure from the inside, while producing a final program that has side effects where expected. The usual business with monads is a large part of this, but the details don't really matter and mostly distract from the issue. In practice, it just means you have to be explicit about where side effects can occur and in what order, and you're not allowed to "cheat".
Mutability primitives are generally provided by the language runtime, and accessed through functions that produce values in some monad also provided by the runtime (often IO, sometimes more specialized ones). First, let's take a look at the Clojure example you provided: it uses ref, which is described in the documentation here:
While Vars ensure safe use of mutable storage locations via thread isolation, transactional references (Refs) ensure safe shared use of mutable storage locations via a software transactional memory (STM) system. Refs are bound to a single storage location for their lifetime, and only allow mutation of that location to occur within a transaction.
Amusingly, that whole paragraph translates pretty directly to GHC Haskell. I'm guessing that "Vars" are equivalent to Haskell's MVar, while "Refs" are almost certainly equivalent to TVar as found in the stm package.
So to translate the example to Haskell, we'll need a function that creates the TVar:
setPoint :: STM (TVar Int)
setPoint = newTVar 90
...and we can use it in code like this:
updateLoop :: IO ()
updateLoop = do tvSetPoint <- atomically setPoint
sequence_ . repeat $ update tvSetPoint
where update tv = do curSpeed <- readSpeed
curSet <- atomically $ readTVar tv
controller curSet curSpeed
In actual use my code would be far more terse than that, but I've left things more verbose here in hopes of being less cryptic.
I suppose one could object that this code isn't pure and is using mutable state, but... so what? At some point a program is going to run and we'd like it to do input and output. The important thing is that we retain all the benefits of code being pure, even when using it to write code with mutable state. For instance, I've implemented an infinite loop of side effects using the repeat function; but repeat is still pure and behaves reliably and nothing I can do with it will change that.
A technique to tackle problems that apparently scream for mutability (like GUI or web applications) in a functional way is Functional Reactive Programming.
The pattern you need for this is called Monads. If you really want to get into functional programming you should try to understand what monads are used for and what they can do. As a starting point I would suggest this link.
As a short informal explanation for monads:
Monads can be seen as data + context that is passed around in your program. This is the "space suit" often used in explanations. You pass data and context around together and insert any operation into this Monad. There is usually no way to get the data back once it is inserted into the context, you just can go the other way round inserting operations, so that they handle data combined with context. This way it almost seems as if you get the data out, but if you look closely you never do.
Depending on your application the context can be almost anything. A datastructure that combines multiple entities, exceptions, optionals, or the real world (i/o-monads). In the paper linked above the context will be execution states of an algorithm, so this is quite similar to the things you have in mind.
In Erlang you could use a process to hold the value. Something like this:
holdVar(SomeVar) ->
receive %% wait for message
{From, get} -> %% if you receive a get
From ! {value, SomeVar}, %% respond with SomeVar
holdVar(SomeVar); %% recursively call holdVar
%% to start listening again
{From, {set, SomeNewVar}} -> %% if you receive a set
From ! {ok}, %% respond with ok
holdVar(SomeNewVar); %% recursively call holdVar with
%% the SomeNewVar that you received
%% in the message
end.
This is really just a conceptual question for me at this point.
In Lisp, programs are data and data are programs. The REPL does exactly that - reads and then evaluates.
So how does one go about getting input from the user in a secure way? Obviously it's possible - I mean viaweb - now Yahoo!Stores is pretty secure, so how is it done?
The REPL stands for Read Eval Print Loop.
(loop (print (eval (read))))
Above is only conceptual, the real REPL code is much more complicated (with error handling, debugging, ...).
You can read all kinds of data in Lisp without evaluating it. Evaluation is a separate step - independent from reading data.
There are all kinds of IO functions in Lisp. The most complex of the provided functions is usually READ, which reads s-expressions. There is an option in Common Lisp which allows evaluation during READ, but that can and should be turned off when reading data.
So, data in Lisp is not necessarily a program and even if data is a program, then Lisp can read the program as data - without evaluation. A REPL should only be used by a developer and should not be exposed to arbitrary users. For getting data from users one uses the normal IO functions, including functions like READ, which can read S-expressions, but does not evaluate them.
Here are a few things one should NOT do:
use READ to read arbitrary data. READ for examples allows one to read really large data - there is no limit.
evaluate during READ ('read eval'). This should be turned off.
read symbols from I/O and call their symbol functions
read cyclical data structures with READ, when your functions expect plain lists. Walking down a cyclical list can keep your program busy for a while.
not handle syntax errors during reading from data.
You do it the way everyone else does it. You read a string of data from the stream, you parse it for your commands and parameters, you validate the commands and parameters, and you interpret the commands and parameters.
There's no magic here.
Simply put, what you DON'T do, is you don't expose your Lisp listener to an unvalidated, unsecure data source.
As was mentioned, the REPL is read - eval - print. #The Rook focused on eval (with reason), but do not discount READ. READ is a VERY powerful command in Common Lisp. The reader can evaluate code on its own, before you even GET to "eval".
Do NOT expose READ to anything you don't trust.
With enough work, you could make a custom package, limit scope of functions avaliable to that package, etc. etc. But, I think that's more work than simply writing a simple command parser myself and not worrying about some side effect that I missed.
Create your own readtable and fill with necessary hooks: SET-MACRO-CHARACTER, SET-DISPATCH-MACRO-CHARACTER et al.
Bind READTABLE to your own readtable.
Bind READ-EVAL to nil to prevent #. (may not be necessary if step 1 is done right)
READ
Probably something else.
Also there is a trick in interning symbols in temporary package while reading.
If data in not LL(1)-ish, simply write usual parser.
This is a killer question and I thought this same thing when I was reading about Lisp. Although I haven't done anything meaningful in LISP so my answer is very limited.
What I can tell you is that eval() is nasty. There is a saying that I like "If eval is the answer then you are asking the wrong question." --Unknown.
If the attacker can control data that is then evaluated then you have a very serious remote code execution vulnerability. This can be mitigated, and I'll show you an example with PHP, because that is what I know:
$id=addslashes($_GET['id']);
eval('$test="$id";');
If you weren't doing an add slashes then an attacker could get remote code execution by doing this:
http://localhost?evil_eval.php?id="; phpinfo();/*
But the add slashes will turn the " into a \", thus keeping the attacker from "breaking out" of the "data" and being able to execute code. Which is very similar to sql injection.
I found that question quit controversial. The eval wont eval your input unless you explicitly ask for it.
I mean your input will not be treat it as a LISP code but instead as a string.
Is not because that your language have powerfull concept like the eval that it is not "safe".
I think the confusion come from SQL where your actually treat an input as a [part of] SQL.
(query (concatenate 'string "SELECT * FROM foo WHERE id = " input-id))
Here input-id is being evaluate by the SQL engine.
This is because you have no nice way to write SQL, or whatever, but the point is that your input become part of what is being evaluate.
So eval don't bring you insecurity unless your are using it eyes closed.
EDIT Forgot to tell that this apply to any language.