List of Lua commands/statements to watch out? - security

I'm making a moddable game and thinking about using Lua as the language for my players to write their own scripts.
But like any programming language, Lua bound to have some "loopholes" for not-so-nice users to do bad things.
I'm new to Lua, so I don't really know what Lua "can" do.
I did a little reasearch online and found that Metatable and ob.exit could be used for doing bad things, is there any other things?
Could somebody please be so kind and give me a list of the things I should watch out and block it (maybe by replacing it with empty string)?
Much appreciated!

Lua's built-in math, string, and table libraries will always be safe. coroutine is also safe, and extremely useful to some advanced lua programmers.
There are some other, not-so-safe libraries lua loads in by default (which you can easily disable)
os lets you execute commands, and do other nasty things. However, os.time and os.date are useful functions, so keep those in.
io allows you to read- and edit- any file on the computer. Probably best to leave it out.
debug allows you to "reflect" on the program. This means that the program can edit certain parts about itself, and can be unwanted. It's a safe bet that user programs won't need this. Ever.
Instead of replacing something with an empty string, you can always replace it with setfenv (Lua 5.1), like so:
local badCode = readFile("./code.lua")
local Func = loadstring(badCode)
setfenv(Func, {
-- If we leave nothing here, then the script will not be able to access any global variable.
-- It's perfectly sandboxed. But let's give it some stuff:
print = print, pcall = pcall, pairs = pairs, ipairs = ipairs, error = error, string = string, table = table, math = math, coroutine = coroutine,
-- Now, let's give it some functions that *could* be useful, from an otherwise sandboxed library
os = {
time = os.time,
date = os.date,
clock = os.clock,
},
-- All of these are "kind of" useful to the program.
})
-- Now that Func is properly sandboxed, let's run it!
Func()
-- This is how you should treat user code.

Related

Does Lisp's treatment of code as data make it more vulnerable to security exploits than a language that doesn't treat code as data? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 1 year ago.
Improve this question
I know that this might be a stupid question but I was curious.
Since Lisp treats code and data the same, does this mean that it's easier to write a payload and pass it as "innocent" data that can be used to exploit programs? In comparison to languages that don't do so?
For e.g. In python you can do something like this.
malicious_str = "print('this is a malicious string')"
user_in = eval(malicious_str)
>>> this is a malicious string
P.S I have just started learning Lisp.
No, I don't think it does. In fact because of what is normally meant by 'code is data' in Lisp, it is potentially less vulnerable than some other languages.
[Note: this answer is really about Common Lisp: see the end for a note about that.]
There are two senses in which 'code can be data' in a language.
Turning objects into executable code: eval & friends
This is the first sense. What this means is that you can, say, take a string or some other object (not all types of object, obviously) and say 'turn this into something I can execute, and do that'.
Any language that can do this has either
to be extremely careful about doing this on unconstrained data;
or to be able to be certain that a given program does not actually do this.
Plenty of languages have equivalents of eval and its relations, so plenty of languages have this problem. You give an example of Python for instance, which is a good one, and there are probably other examples in Python (I've written programs even in Python 2 which supported dynamic loading of modules at runtime, which executes potentially arbitrary code, and I think this stuff is much better integrated in Python 3).
This is also not just a property of a language: it's a property of a system. C can't do this, right? Well, yes it can if you're on any kind of reasonable *nixy platform. Not only can you use an exec-family function, but you can probably dynamically load a shared library and execute code in it.
So one solution to this problem is to, somehow, be able to be certain that a given program doesn't do this. One thing that helps is if there are a finite, known number of ways of doing it. In Common Lisp I think those are probably
eval of course;
unconstrained read (because of *read-eval*);
load;
compile;
compile-file;
and probably some others that I have forgotten.
Well, can you detect calls to those, statically, in a program? Not really: consider this:
(funcall (symbol-function (find-symbol s)) ...)
and now you're in trouble unless you have very good control over what s is: it might be "EVAL" for instance.
So that's frightening, but I don't think it's more frightening than what Python can do, for instance (almost certainly you can poke around in the namespace to find eval or something?). And something like that in a program ought to be a really big hint that bad things might happen.
I think there are probably two approaches to this, neither of which CL adopts but which implementations could (and perhaps even programs written in CL could).
One would be to be able to run programs in such a way that the finite set of bad functions above simply are disallowed: they'd signal errors if you tried to call them. An implementation could clearly do that (see below).
The other would be to have something like Perl's 'tainting' where data which came from a user needs to be explicitly looked-at by the program somehow before it's used. That doesn't guarantee safety of course, but it does make it harder to make silly mistakes: if s above came from user input and was thus tainted you'd have to explicitly say 'it's OK to use it' and, well, then it's up to you.
So this is a problem, but I don't think it's worse than the problems that very many other languages (and language-families) have.
An example of an implementation that can address the first approach is LispWorks: if you're building an application with LW, you typically create the binary with a function called deliver, which has options which allow you to remove the definitions of functions from the resulting binary whether or not the delivery process would otherwise leave them there. So, for instance
(deliver 'foo "x" 5
:functions-to-remove '(eval load compile compile-file read))
would result in an executable x which, whatever else it did, couldn't call those functions, because they're not present, at all.
Other implementations probably have similar features: I just don't know them as well.
But there's another sense in which 'code is data' in Lisp.
Program source code is available as structured data
This is the sense that people probably really mean when they say 'code is data' in Lisp, even if they don't know that. It's worth looking at your Python example again:
>>> eval("exit('exploded')")
exploded
$
So what eval eats is a string: a completely unstructured vector of characters. If you want to know whether that string contains something nasty, well, you've got a lot of work ahead of you (disclaimer: see below).
Compare this with CL:
> (let ((trying-to-be-bad "(end-the-world :now t)"))
(eval trying-to-be-bad))
"(end-the-world :now t)"
OK, so that clearly didn't end the world. And it didn't end the world because eval evaluates a bit of Lisp source code, and the value of a string, as source code, is the string.
If I want to do something nasty I have to hand it an actual interesting structure:
> (let ((actually-bad '(eval (progn
(format *query-io* "? ")
(finish-output *query-io*)
(read *query-io*)))))
(eval actually-bad))
? (defun foo () (foo))
foo
Now that's potentially quite nasty in at least several ways. But wait: in order to do this nasty thing, I had to hand eval a chunk of source code represented as an s-expression. And the structure of that s-expression is completely open to inspection by me. I can write a program which inspects this s-expression in any arbitrary way I like, and decides whether or not it is acceptable to me. That's just hugely easier than 'given this string, interpret it as a piece of source text for the language and tell me if it is dangerous':
the process of turning the sequence of characters into an s-expression has happened already;
the structure of s-expressions is both simple and standard.
So in this sense of 'code is data', Lisp is potentially much safer than other languages which have versions of eval which eat strings, like Python, say, because code is structured, standard, simple data. Lisp has an answer to the terrible 'language in a string' problem.
I am fairly sure that Python does in fact have some approach to making the parse tree available in a standard way which can be inspected. But eval still happily eats strings.
As I said above, this answer is about Common Lisp. But there are many other Lisps of course, which will have varying versions of this problem. Racket for instance probably can really fairly tightly constrain things, using sandboxed execution and modules, although I haven't explored this.
Any language can be exploited if you are not careful.
A well-known attack against Lisp is via the #. reader macro:
(read-from-string "#.(start-the-war)")
will start the war if *read-eval* is non-nil - this is why one should always bind it when reading from an un-trusted stream.
However, this is not directly related to "code is data" doctrine...

Why do we need IO?

In Tackling the Awkward Squad: monadic input/output, concurrency, exceptions, and foreign-language calls in Haskell, SPJ states:
For example, perhaps the functional program could be a function mapping an input character string to an output string:
main :: String -> String
Now a "wrapper" program, written in (gasp!) C, can get an input string from somewhere [...] apply the function to it, and store the result somewhere [...]
He then goes on to say that this locates the "sinfulness" in the wrapper, and that the trouble with this approach is that one sin leads to another (e.g. more than one input, deleting files, opening sockets etc).
This seems odd to me. I would have thought Haskell would be most powerful, and possibly even most useful, when approached exactly in this fashion. That is, the input is a character string located in a file, and output is a new character string in a new file. If the input string is some mathematical expression concatenated with data, and the output string is (non-Haskell) code, then you can Get Things Done. In other words, why not always treat Haskell programs as translators? (Or as a compiler, but as a translator you can blend genuine I/O into the final executable.)
Regardless of the wisdom of this as a general strategy (I appreciate that some things we might want to get done may not start out as mathematics), my real question is: if this is indeed the approach, can we avoid the IO a type? Do we need a wrapper in some other language? Does anyone actually do this?
The point is that String -> String is a rather poor model of what programs do in general.
What if you're writing an http server that accepts concurrent pipelined requests and responds to each pipeline concurrently while also interleaving writes from a response in a pipeline along with reads for the next request? This is the level of concurrency that http servers work at.
Maybe, just maybe, you can stuff that into a String -> String program. You could multiplex the pipelines into your single channel. But what about timeouts? Web servers time out connections that trickle in, to prevent slow loris attacks. How are you even going to account for that? Maybe your input string has a sequence of timestamps added at regular intervals, regardless of other inputs? Oh, but what about the variant where the recipient only reads from their receive buffer in trickles? How can you even tell that you are blocked waiting for the send buffer to drain?
If you pursue all the potential problems and stuff them into a String -> String program, you eventually end up with almost all the interesting parts of the server existing outside your haskell program. After all, something has to do the multiplexing, has to do the error detection and reporting, has to do timeouts. If you're writing an http server in Haskell, it would be nice if it was actually written in Haskell.
Of course, none of this means the IO type as it currently exists is the best possible answer. There are reasonable complaints that can be made about it. But it at least allows you to address all of those issues within Haskell.
That wrapper exists. It’s called Prelude.interact. I use the Data.ByteString versions of it often. The wrapper to a pure String -> String function works, because strings are lazily-evaluated singly-linked lists that can process each line of input as it’s read in, but singly-linked lists of UCS-4 characters are a very inefficient data structure.
You still need to use IO for the wrapper because the operations depend on the state of the universe and need to be sequenced with the outside world. In particular, if your program is interactive, you want it to respond to a new keyboard command immediately, and run all the OS system calls in sequence, not (say) process all the input and display all the output at once when it’s ready to quit the program.
A simple program to demonstrate this is:
module Main where
import Data.Char (toUpper)
main :: IO ()
main = interact (map toUpper)
Try running this interactively. Type control-D to quit on Linux or the MacOS console, and control-Z to quit on Windows.
As I mentioned before, though String is not an efficient data structure at all. For a more complicated example, here is the Main module of a program I wrote to normalize UTF-8 input to NFC form.
module Main ( lazyNormalize, main ) where
import Data.ByteString.Lazy as BL ( fromChunks, interact )
import Data.Text.Encoding as E (encodeUtf8)
import Data.Text.Lazy as TL (toChunks)
import Data.Text.Lazy.Encoding as LE (decodeUtf8)
import Data.Text.ICU ( NormalizationMode (NFC) )
import TextLazyICUNormalize (lazyNormalize)
main :: IO ()
main = BL.interact (
BL.fromChunks .
map E.encodeUtf8 .
TL.toChunks . -- Workaround for buffer not always flushing on newline.
lazyNormalize NFC .
LE.decodeUtf8 )
This is an Data.Bytestring.Lazy.interact wrapper around a Data.Text.Lazy.Text -> Data.Text.Lazy.Text function, lazyNormalize with the NormalizationMode constant NFC as its curried first argument. Everything else just converts from the lazy ByteString strings I use to do I/O to the lazy Text strings the ICU library understands, and back. You will probably see more programs written with the & operator than in this point-free style.
There are multiple questions here:
If the input string is some mathematical expression concatenated with data, and the output string is (non-Haskell) code, then you can Get Things Done. In other words, why not always treat Haskell programs as translators? ([... because] as a translator you can blend genuine I/O into the final executable.)
Regardless of the wisdom of this as a general strategy [...] if this is indeed the approach, can we avoid the IO a type?
Do we need a wrapper in some other language?
Does anyone actually do this?
Using your informal description, main would have a type signature resembling:
main :: (InputExpr, InputData) -> OutputCode
If we drop the InputData component:
main :: InputExpr -> OutputCode
then (as you noted) main really does look like a translator...but we already have programs for doing that - compilers!
While task-specific translators have their place, using them everywhere when we already have general-purpose ones seems somewhat redundant...
...so it's use as as a general strategy is speculative at best. This is probably why such an approach was never officially "built into" Haskell (instead being implemented using Haskell's I/O facilities e.g. interact for simple string-to-string interactions).
As for getting by without the monadic IO type - for a single-purpose language like Dhall, it could be possible...but can that technique also be used to build webservers or operating systems? (That is left as an exercise for intrepid readers :-)
In theory: no - the various Lisp Machines being the canonical example.
In practice: yes - as the variety of microprocessor and assembly languages grows ever larger, portably defining the essential runtime services (parallelism, GC, I/O, etc) in an existing programming language these days is a necessity.
Not exactly - perhaps the closest to what you're described is (again) the string-to-string interactivity supported in the venerable Lazy ML system from Chalmers or the original version of Miranda(R).

Advantage to a certain string comparison order

Looking at some pieces of code around the internet, I've noticed some authors tend to write string comparisons like
if("String"==$variable)
in PHP, or
if("String".equals(variable))
Whereas my preference is:
if(variable.equals("String"))
I realize these are effectively equal: they compare two strings for equality. But I was curious if there was an advantage to one over the other in terms of performance or something else.
Thank you for the help!
One example to the approach of using an equality function or using if( constant == variable ) rather than if( variable == constant ) is that it prevents you from accidentally typoing and writing an assignment instead of a comparison, for instance:
if( s = "test" )
Will assign "test" to s, which will result in undesired behaviour which may potentially cause a hard-to-find bug. However:
if( "test" = s )
Will in most languages (that I'm aware of) result in some form of warning or compiler error, helping to avoid a bug later on.
With a simple int example, this prevents accidental writes of
if (a=5)
which would be a compile error if written as
if (5=a)
I sure don't know about all languages, but decent C compilers warn you about if (a=b). Perhaps whatever language your question is written in doesn't have such a feature, so to be able to generate an error in such a case, they have reverted the order of the comparison arguments.
Yoda conditions call these some.
The kind of syntaxis a language uses has nothing to do with efficiency. It is all about how the comparison algorithm works.
In the examples you mentioned, this:
if("String".equals(variable))
and this:
if(variable.equals("String"))
would be exactly the same, because the expression "String" will be treated as a String variable.
Languages that provide a comparison method for Strings, will use the fastest method so you shouldn't care about it, unless you want to implement the method yourself ;)

Programming language for self-modifying code?

I am recently thinking about writing self-modifying programs, I think it may be powerful and fun. So I am currently looking for a language that allows modifying a program's own code easily.
I read about C# (as a way around) and the ability to compile and execute code in runtime, but that is too painful.
I am also thinking about assembly. It is easier there to change running code but it is not very powerful (very raw).
Can you suggest a powerful language or feature that supports modifying code in runtime?
Example
This is what I mean by modifying code in runtime:
Start:
a=10,b=20,c=0;
label1: c=a+b;
....
label1= c=a*b;
goto label1;
and may be building a list of instructions:
code1.add(c=a+b);
code1.add(c=c*(c-1));
code1. execute();
Malbolge would be a good place to start. Every instruction is self-modifying, and it's a lot of fun(*) to play with.
(*) Disclaimer: May not actually be fun.
I highly recommend Lisp. Lisp data can be read and exec'd as code. Lisp code can be written out as data.
It is considered one of the canonical self-modifiable languages.
Example list(data):
'(+ 1 2 3)
or, calling the data as code
(eval '(+ 1 2 3))
runs the + function.
You can also go in and edit the members of the lists on the fly.
edit:
I wrote a program to dynamically generate a program and evaluate it on the fly, then report to me how it did compared to a baseline(div by 0 was the usual report, ha).
Every answer so far is about reflection/runtime compilation, but in the comments you mentioned you're interested in actual self-modifying code - code that modifies itself in-memory.
There is no way to do this in C#, Java, or even (portably) in C - that is, you cannot modify the loaded in-memory binary using these languages.
In general, the only way to do this is with assembly, and it's highly processor-dependent. In fact, it's highly operating-system dependent as well: to protect against polymorphic viruses, most modern operating systems (including Windows XP+, Linux, and BSD) enforce W^X, meaning you have to go through some trouble to write polymorphic executables in those operating systems, for the ones that allow it at all.
It may be possible in some interpreted languages to have the program modify its own source-code while it's running. Perl, Python (see here), and every implementation of Javascript I know of do not allow this, though.
Personally, I find it quite strange that you find assembly easier to handle than C#. I find it even stranger that you think that assembly isn't as powerful: you can't get any more powerful than raw machine language. Anyway, to each his/her own.
C# has great reflection services, but if you have an aversion to that.. If you're really comfortable with C or C++, you could always write a program that writes C/C++ and issues it to a compiler. This would only be viable if your solution doesn't require a quick self-rewriting turn-around time (on the order of tens of seconds or more).
Javascript and Python both support reflection as well. If you're thinking of learning a new, fun programming language that's powerful but not massively technically demanding, I'd suggest Python.
May I suggest Python, a nice very high-level dynamic language which has rich introspection included (and by e.g. usage of compile, eval or exec permits a form of self-modifying code). A very simple example based upon your question:
def label1(a,b,c):
c=a+b
return c
a,b,c=10,20,0
print label1(a,b,c) # prints 30
newdef= \
"""
def label1(a,b,c):
c=a*b
return c
"""
exec(newdef,globals(),globals())
print label1(a,b,c) # prints 200
Note that in the code sample above c is only altered in the function scope.
Common Lisp was designed with this sort of thing in mind. You could also try Smalltalk, where using reflection to modify running code is not unknown.
In both of these languages you are likely to be replacing an entire function or an entire method, not a single line of code. Smalltalk methods tend to be more fine-grained than Lisp functions, so that may be a good place to begin.
Many languages allow you to eval code at runtime.
Lisp
Perl
Python
PHP
Ruby
Groovy (via GroovyShell)
In high-level languages where you compile and execute code at run-time, it is not really self-modifying code, but dynamic class loading. Using inheritance principles, you can replace a class Factory and change application behavior at run-time.
Only in assembly language do you really have true self-modification, by writing directly to the code segment. But there is little practical usage for it. If you like a challenge, write a self-encrypting, maybe polymorphic virus. That would be fun.
I sometimes, although very rarely do self-modifying code in Ruby.
Sometimes you have a method where you don't really know whether the data you are using (e.g. some lazy cache) is properly initialized or not. So, you have to check at the beginning of your method whether the data is properly initialized and then maybe initialize it. But you really only have to do that initialization once, but you check for it every single time.
So, sometimes I write a method which does the initialization and then replaces itself with a version that doesn't include the initialization code.
class Cache
def [](key)
#backing_store ||= self.expensive_initialization
def [](key)
#backing_store[key]
end
#backing_store[key]
end
end
But honestly, I don't think that's worth it. In fact, I'm embarrassed to admit that I have never actually benchmarked to see whether that one conditional actually makes any difference. (On a modern Ruby implementation with an aggressively optimizing profile-feedback-driven JIT compiler probably not.)
Note that, depending on how you define "self-modifying code", this may or may not be what you want. You are replacing some part of the currently executing program, so …
EDIT: Now that I think about it, that optimization doesn't make much sense. The expensive initialization is only executed once anyway. The only thing that modification avoids, is the conditional. It would be better to take an example where the check itself is expensive, but I can't think of one.
However, I thought of a cool example of self-modifying code: the Maxine JVM. Maxine is a Research VM (it's technically not actually allowed to be called a "JVM" because its developers don't run the compatibility testsuites) written completely in Java. Now, there are plenty of JVMs written in itself, but Maxine is the only one I know of that also runs in itself. This is extremely powerful. For example, the JIT compiler can JIT compile itself to adapt it to the type of code that it is JIT compiling.
A very similar thing happens in the Klein VM which is a VM for the Self Programming Language.
In both cases, the VM can optimize and recompile itself at runtime.
I wrote Python class Code that enables you to add and delete new lines of code to the object, print the code and excecute it. Class Code shown at the end.
Example: if the x == 1, the code changes its value to x = 2 and then deletes the whole block with the conditional that checked for that condition.
#Initialize Variables
x = 1
#Create Code
code = Code()
code + 'global x, code' #Adds a new Code instance code[0] with this line of code => internally code.subcode[0]
code + "if x == 1:" #Adds a new Code instance code[1] with this line of code => internally code.subcode[1]
code[1] + "x = 2" #Adds a new Code instance 0 under code[1] with this line of code => internally code.subcode[1].subcode[0]
code[1] + "del code[1]" #Adds a new Code instance 0 under code[1] with this line of code => internally code.subcode[1].subcode[1]
After the code is created you can print it:
#Prints
print "Initial Code:"
print code
print "x = " + str(x)
Output:
Initial Code:
global x, code
if x == 1:
x = 2
del code[1]
x = 1
Execute the cade by calling the object: code()
print "Code after execution:"
code() #Executes code
print code
print "x = " + str(x)
Output 2:
Code after execution:
global x, code
x = 2
As you can see, the code changed the variable x to the value 2 and deleted the whole if block. This might be useful to avoid checking for conditions once they are met. In real-life, this case-scenario could be handled by a coroutine system, but this self modifying code experiment is just for fun.
class Code:
def __init__(self,line = '',indent = -1):
if indent < -1:
raise NameError('Invalid {} indent'.format(indent))
self.strindent = ''
for i in xrange(indent):
self.strindent = ' ' + self.strindent
self.strsubindent = ' ' + self.strindent
self.line = line
self.subcode = []
self.indent = indent
def __add__(self,other):
if other.__class__ is str:
other_code = Code(other,self.indent+1)
self.subcode.append(other_code)
return self
elif other.__class__ is Code:
self.subcode.append(other)
return self
def __sub__(self,other):
if other.__class__ is str:
for code in self.subcode:
if code.line == other:
self.subcode.remove(code)
return self
elif other.__class__ is Code:
self.subcode.remove(other)
def __repr__(self):
rep = self.strindent + self.line + '\n'
for code in self.subcode: rep += code.__repr__()
return rep
def __call__(self):
print 'executing code'
exec(self.__repr__())
return self.__repr__()
def __getitem__(self,key):
if key.__class__ is str:
for code in self.subcode:
if code.line is key:
return code
elif key.__class__ is int:
return self.subcode[key]
def __delitem__(self,key):
if key.__class__ is str:
for i in range(len(self.subcode)):
code = self.subcode[i]
if code.line is key:
del self.subcode[i]
elif key.__class__ is int:
del self.subcode[key]
You can do this in Maple (the computer algebra language). Unlike those many answers above which use compiled languages which only allow you to create and link in new code at run-time, here you can honest-to-goodness modify the code of a currently-running program. (Ruby and Lisp, as indicated by other answerers, also allow you to do this; probably Smalltalk too).
Actually, it used to be standard in Maple that most library functions were small stubs which would load their 'real' self from disk on first call, and then self-modify themselves to the loaded version. This is no longer the case as the library loading has been virtualized.
As others have indicated: you need an interpreted language with strong reflection and reification facilities to achieve this.
I have written an automated normalizer/simplifier for Maple code, which I proceeded to run on the whole library (including itself); and because I was not too careful in all of my code, the normalizer did modify itself. I also wrote a Partial Evaluator (recently accepted by SCP) called MapleMIX - available on sourceforge - but could not quite apply it fully to itself (that wasn't the design goal).
Have you looked at Java ? Java 6 has a compiler API, so you can write code and compile it within the Java VM.
In Lua, you can "hook" existing code, which allows you to attach arbitrary code to function calls. It goes something like this:
local oldMyFunction = myFunction
myFunction = function(arg)
if arg.blah then return oldMyFunction(arg) end
else
--do whatever
end
end
You can also simply plow over functions, which sort of gives you self modifying code.
Dlang's LLVM implementation contains the #dynamicCompile and #dynamicCompileConst function attributes, allowing you to compile according to the native host's the instruction set at compile-time, and change compile-time constants in runtime through recompilation.
https://forum.dlang.org/thread/bskpxhrqyfkvaqzoospx#forum.dlang.org

Lisp data security/validation

This is really just a conceptual question for me at this point.
In Lisp, programs are data and data are programs. The REPL does exactly that - reads and then evaluates.
So how does one go about getting input from the user in a secure way? Obviously it's possible - I mean viaweb - now Yahoo!Stores is pretty secure, so how is it done?
The REPL stands for Read Eval Print Loop.
(loop (print (eval (read))))
Above is only conceptual, the real REPL code is much more complicated (with error handling, debugging, ...).
You can read all kinds of data in Lisp without evaluating it. Evaluation is a separate step - independent from reading data.
There are all kinds of IO functions in Lisp. The most complex of the provided functions is usually READ, which reads s-expressions. There is an option in Common Lisp which allows evaluation during READ, but that can and should be turned off when reading data.
So, data in Lisp is not necessarily a program and even if data is a program, then Lisp can read the program as data - without evaluation. A REPL should only be used by a developer and should not be exposed to arbitrary users. For getting data from users one uses the normal IO functions, including functions like READ, which can read S-expressions, but does not evaluate them.
Here are a few things one should NOT do:
use READ to read arbitrary data. READ for examples allows one to read really large data - there is no limit.
evaluate during READ ('read eval'). This should be turned off.
read symbols from I/O and call their symbol functions
read cyclical data structures with READ, when your functions expect plain lists. Walking down a cyclical list can keep your program busy for a while.
not handle syntax errors during reading from data.
You do it the way everyone else does it. You read a string of data from the stream, you parse it for your commands and parameters, you validate the commands and parameters, and you interpret the commands and parameters.
There's no magic here.
Simply put, what you DON'T do, is you don't expose your Lisp listener to an unvalidated, unsecure data source.
As was mentioned, the REPL is read - eval - print. #The Rook focused on eval (with reason), but do not discount READ. READ is a VERY powerful command in Common Lisp. The reader can evaluate code on its own, before you even GET to "eval".
Do NOT expose READ to anything you don't trust.
With enough work, you could make a custom package, limit scope of functions avaliable to that package, etc. etc. But, I think that's more work than simply writing a simple command parser myself and not worrying about some side effect that I missed.
Create your own readtable and fill with necessary hooks: SET-MACRO-CHARACTER, SET-DISPATCH-MACRO-CHARACTER et al.
Bind READTABLE to your own readtable.
Bind READ-EVAL to nil to prevent #. (may not be necessary if step 1 is done right)
READ
Probably something else.
Also there is a trick in interning symbols in temporary package while reading.
If data in not LL(1)-ish, simply write usual parser.
This is a killer question and I thought this same thing when I was reading about Lisp. Although I haven't done anything meaningful in LISP so my answer is very limited.
What I can tell you is that eval() is nasty. There is a saying that I like "If eval is the answer then you are asking the wrong question." --Unknown.
If the attacker can control data that is then evaluated then you have a very serious remote code execution vulnerability. This can be mitigated, and I'll show you an example with PHP, because that is what I know:
$id=addslashes($_GET['id']);
eval('$test="$id";');
If you weren't doing an add slashes then an attacker could get remote code execution by doing this:
http://localhost?evil_eval.php?id="; phpinfo();/*
But the add slashes will turn the " into a \", thus keeping the attacker from "breaking out" of the "data" and being able to execute code. Which is very similar to sql injection.
I found that question quit controversial. The eval wont eval your input unless you explicitly ask for it.
I mean your input will not be treat it as a LISP code but instead as a string.
Is not because that your language have powerfull concept like the eval that it is not "safe".
I think the confusion come from SQL where your actually treat an input as a [part of] SQL.
(query (concatenate 'string "SELECT * FROM foo WHERE id = " input-id))
Here input-id is being evaluate by the SQL engine.
This is because you have no nice way to write SQL, or whatever, but the point is that your input become part of what is being evaluate.
So eval don't bring you insecurity unless your are using it eyes closed.
EDIT Forgot to tell that this apply to any language.

Resources