I'm speaking specifically of something like
the PLT Scheme make-evaluator.
It will run scheme code, but under certain conditions:
It only uses a definable amount of memory, and will quit execution if the script needs more
It behaves similarly with time
It restricts all IO except for what I specifically allow in the code
Is anyone familiar with anything else that can do this?
Lua lets you easily define sandboxes with as much or as little power you want.
PHP allows something similar with eval - though you would need to set some restrictive values with ini_set before calling it, and they would affect the current script as well.
The Java platform provides fine-grained access control and sandboxing.
This isn't exactly equivalent to make-evaluator but the API allows you to place constraints on arbitrary objects (through the GuardedObject class). You can also restrict permissions of classes loaded from a particular source.
It might be helpful to read the Java Platform Security Architecture spec
Please note that Java APIs can be accessed from most languages on the jvm.
Related
I want to allow user scripting/programming of server-side programs. While I am seeking advice on serverfault regarding how to mitigate security risks on the system administration side of things, I am also wondering what languages provide facilities to restrict them from executing dangerous code?
While I'll discuss Lua and Tcl, I am very interested in other options as well; I love to learn about new languages, and having a selection to choose from would be great from the standpoint of being able to pick one that would be ideal for users.
Lua:
Lua has a sandbox capability.
As an example of an issue that a developer may need to consider for Lua, there is this page regarding Lua bytecode as a vector of attack.
Tcl:
Tcl has the ability to create "safe interpreters":
https://www.tcl-lang.org/man/tcl8.6/TclCmd/interp.htm#M12
If the -safe switch is specified (or if the parent interpreter is a safe interpreter), the new child interpreter will be created as a safe interpreter with limited functionality; otherwise the child will include the full set of Tcl built-in commands and variables.
I haven't been able to find much information regarding issues a developer may need to consider for Tcl.
What are other languages with similar capabilities and do they have any proven known issues in this regard?
In Tcl with basic safe interpreters, all operations that touch the outside world at all (open, exec, source, load, socket, etc.) are profiled out of the safe interpreter — the commands that provide them are hidden (a special kind of naming that means that they cannot be accessed from inside the interpreter) — so it's trivially provable that by default there's only possible problems due to excessive memory or CPU use.
But what about how to actually let the safe interpreters do something useful?
Well, that's possible because every safe interpreter has a parent interpreter that is fully-enabled, and which can create inter-interpreter aliases: commands in the safe interpreter which invoke a defined stand-in in the controlling parent. It's good to think of those as being analogous to a system call (though much cheaper!) and they can provide exactly the operations that the application wishes to support. Of course, some substantial care is required if you take an argument to those commands that you intend to treat as a filename or network address, but you at least know that you only ever get poked at in ways that you expect. (The usual way to avoid problems with filenames is to only support abstract handles — simple names defined by the parent — that have no meaning other than “you can use this in these operations”. That's pretty much how Tcl's I/O channel handles work.)
There's also the full Safe Tcl built-in package, which provides a simulated full interpreter in a safe interpreter and which allows defined profiles of what can be accessed (e.g., just reading packages from a defined local package repository). I'm less certain that that's correct; it is quite complicated internally.
I am looking to create an AI environment where users can submit their own code for the AI and let them compete. The language could be anything, but something easy to learn like JavaScript or Python is preferred.
Basically I see three options with a couple of variants:
Make my own language, e.g. a JavaScript clone with only very basic features like variables, loops, conditionals, arrays, etc. This is a lot of work if I want to properly implement common language features.
1.1 Take an existing language and strip it to its core. Just remove lots of features from, say, Python until there is nothing left but the above (variables, conditionals, etc.). Still a lot of work, especially if I want to keep up to date with upstream (though I just could also just ignore upstream).
Use a language's built-in features to lock it down. I know from PHP that you can disable functions and searching around, similar solutions seem to exist for Python (with lots and lots of caveats). For this I'd need to have a good understanding of all the language's features and not miss anything.
2.1. Make a preprocessor that rejects code with dangerous stuff (preferably whitelist based). Similar to option 1, except that I only have to implement the parser and not implement all features: the preprocessor has to understand the language so that you can have variables named "eval" but not call the function named "eval". Still a lot of work, but more manageable than option 1.
2.2. Run the code in a very locked-down environment. Chroot, no unnecessary permissions... perhaps in a virtual machine or container. Something in that sense. I'd have to research how to achieve this and how to make it give me the results in a secure way, but that seems doable.
Manually read through all code. Doable on a small scale or with moderators, though still tedious and error-prone (I might miss stuff like if (user.id = 0)).
The way I imagine 2.2 to work is like this: run both AIs in a virtual machine (or something) and constrain it to communicate with the host machine only (no other Internet or LAN access). Both AIs run in a separate machine and communicate with each other (well, with the playing field, and thereby they see each other's positions) through an API running on the host.
Option 2.2 seems the most doable, but also relatively hacky... I let someone's code loose in a virtualized or locked down environment, hoping that that'll keep them in while giving them free game to DoS or break out of the environment. Then again, most other options are not much better.
TL;DR: in essence my question is: how do I let people give me 'logic' for an AI (which I think is most easily done using code) and then run that without compromising the functionality of the system? There must be at least 2 AIs working on the same playing field.
This is really just a plugin system, so researching how others implement plugins is a good starting point. In particular, I'd look at web browsers like Chrome and Safari and their plugin systems.
A common theme in modern plugins systems is process isolation. Ideally you should run the plugin in its own process space in a sandbox. In OS X look at XPC, which is designed explicitly for this problem. On Linux (or more portably), I would probably look at NaCl (Native Client). The JVM is also designed to provide sandboxing, and offers a rich selection of languages. (That said, I don't personally consider the JVM a very strong sandbox. It's had a history of security problems.)
In general, my preference on these kinds of projects is a language-agnostic API. I most often use REST APIs (or "REST-like"). This allows the plugin to be highly restricted, while not restricting the language choice. I like simple HTTP for communications whenever possible because it has rich support in numerous languages, so it puts little restriction on the plugin. In fact, given your description, you wouldn't even have to run the plugin on your hardware (and certainly not on the main server). Making the plugins remote clients removes many potential concerns.
But ultimately, I think something like your "2.2" is the right direction.
Common Lisp allows to execute/compile code at runtime. But I thought for some (scripting-like) purposes it would be good if one could disallow a user-script to call some functions (especially for application extensions). One could still ask the user if he will allow an extension to access files/... I'm thinking of something like the Android permission system for Common Lisp. Is this possible without rewriting the evaluation code?
The problem I see is, that in Common Lisp you would probably want a script to be able to use reader macros and normal macros and for the latter operators like intern, but those would allow you to get arbitrary symbols (by string manipulation & interning), so simply scanning the code before evaluation won't suffice to ensure that specific functions aren't called.
So, is there something like a lock for functions? I thought of using fmakunbound / makunbound (and keeping the values in a local variable), but would that be possible in a multi-threaded environment?
Thanks in advance.
This is not part of the Common Lisp specification and there is no Common Lisp implementation that is extended to make this kind of restriction easy.
It seems to me like it would be easier to use operating system restrictions (e.g. rlimit, capabilities, etc) to enforce what you want on the Common Lisp process.
This is not an unusual desire, i.e. to run untrusted 3rd party code in a sandbox.
You can hand craft a sandbox by creating a custom parser and interpreter for your scripting language. It is pedantic, but true, than any program with an API is providing such a service. API designers and implementors needs to worry about the vile users.
You can still call eval or the compiler to run your sandbox scripts. It just means you need to assure that your reader, parser and language decline to provide access to any risky functionality.
You can use a lisp package to create a good sandbox. You can still use s-expressions for your scripting language's syntax, but you must cripple the standard reader so the user can't escape package-sandbox. You can still use the evaluator and the compiler, but you need to be sure the package you have boxed the user into contains no functionality that he can use to do inappropriate things.
Successful sandbox design and construction is easier when you start with an empty sandbox and slowly add functionality. Common Lisp is a big language and that creates a huge surface for attacker to poke at. So if you create a sandbox out of a package it's best to start with an empty package and add functions one at a time. Thinking thru what risks they create. The same approach is good when creating your crippled reader. Don't start with the full reader and throw things away, start with a useless reader and add things. Sadly taking that advice creates a pretty significant cost to getting started. But, if you look around I suspect you can find an existing safe reader.
Xach's suggestion is another way to go and in many case more straight forward.
I'm looking for a way to run an arbitrary Haskell code safely (or refuse to run unsafe code).
Must have:
module/function whitelist
timeout on execution
memory usage restriction
Capabilities I would like to see:
ability to kill thread
compiling the modules to native code
caching of compiled code
running several interpreters concurrently
complex datatype for compiler errors (insted of simple message in String)
With that sort of functionality it would be possible to implement a browser plugin capable of running arbitrary Haskell code, which is the idea I have in mind.
EDIT: I've got two answers, both great. Thanks! The sad part is that there doesn't seem to be ready-to-go library, just a similar program. It's a useful resource though. Anyway I think I'll wait for 7.2.1 to be released and try to use SafeHaskell in my own program.
We've been doing this for about 8 years now in lambdabot, which supports:
a controlled namespace
OS-enforced timeouts
native code modules
caching
concurrent interactive top-levels
custom error message returns.
This series of rules is documented, see:
Safely running untrusted Haskell code
mueval, an alternative implementation based on ghc-api
The approach to safety taken in lambdabot inspired the Safe Haskell language extension work.
For approaches to dynamic extension of compiled Haskell applications, in Haskell, see the two papers:
Dynamic Extension of Typed Functional Languages, and
Dynamic applications from the ground up.
GHC 7.2.1 will likely have a new facility called SafeHaskell which covers some of what you want. SafeHaskell ensures type-safety (so things like unsafePerformIO are outlawed), and establishes a trust mechanism, so that a library with a safe API but implemented using unsafe features can be trusted. It is designed exactly for running untrusted code.
For the other practical aspects (timeouts and so on), lambdabot as Don says would be a great place to look.
I'm developer of Robocode engine. We would like to make Robocode
multilingual and Scala seems to be good match. We have Scala plugin prototype here.
The problem:
Because users are creative programmers, they may try to win battle
different ways. As well robots are downloaded from online database
where anyone could upload one. So gap in security may lead to security
hole into users computer. Robots written in Java are running in
restricted sandbox. Almost everything is prohibited [network, GUI,
disk (limited), threads (limited), classloaders and reflection]. The
sandbox is similar to browser applet. We use SecurityManager, custom
ClassLoader per robot, etc ...
There are two ways how to host Scala runtime in Robocode:
1) load it together with robot inside of sandbox. Pretty safe for us,
preferred solution. But it will damage Scala runtime abilities because runtime uses reflection. Maybe generates classes at runtime ? Use threads to do some internal cleanup ? Access to JVM/internals ? (I would not like to limit abilities of language)
2) use Scala runtime as trusted code, outside the box, security on
same level as JDK. Visibility to (malicious)
robot. Are the Scala runtime APIs safe ? Do methods they have security
guards ? Is there any safe mode ? Is there any singleton in Scala runtime,
which could be abused to communicate between robots ? Any concurency/threadpool/messaging which could simulate threads ? (Is there any security audit for Scala runtime?)
3) Something in between, some classes of runtime in and some out. Which classes/packages must be visible to robot/which are just private implementation ? (this seems to be future solution)
The question:
Is it possible to enumerate and isolate the parts of runtime which must run in
trusted scope from the rest ? Specific packages and classes ? Or better idea ?
I'm looking for specific answer, which will lead to secure solution. Random thoughts welcome, but not awarded. There is ongoing discussion at scala email group. No specific answer yet.
I think #1 is your best bet and even that is a moving target. As brought up on the mailing list, structural types use reflection. I don't think structural types are common in the standard library, but I don't think anyone keeps track of where they are.
There's also always the possibility that there are other features using reflection behind the scenes. For example, for a while in the 2.8 branch some array functionality was using reflection. I think that's been changed after benchmarking, but there's always the possibility that there's some problem where someone said "Aha! I will use reflection to solve this."
The Scala standard library is filled with singletons. Most of them are immutable, but I know that the Scheduler object in the actors library could be abused for communication because it is essentially a proxy for an actual scheduler so you can plug your own custom scheduler into it.
At this time I don't think Scala requires using a custom class loader and all of its classes are produced at compile time instead of runtime, but then again that's probably a moving target. Scala generates a lot of class files, and there is always talk of making it generate some of them at runtime when they are needed instead of at compile time.
So, in short, I do not think it's possible (within reasonable constraints on effort) to enumerate and isolate the pieces of Scala that can (and should) be trusted.
As you mentioned other J* language implementations which all may make use of reflections, it would be a ban for all those languages as long as reflection is not part of the game.
I guess that would be JVM's problem not to have a way to partition the scope of reflection API, such that you could sort of "sandbox" the part of code that could be reflected within.