I'm a developer of the Robocode engine. We would like to make Robocode
multilingual, and Scala seems to be a good match. We have a Scala plugin prototype here.
The problem:
Because users are creative programmers, they may try to win the battle
in different ways. Robots are also downloaded from an online database
where anyone can upload one. So a gap in security could open a hole
into the user's computer. Robots written in Java run in a
restricted sandbox. Almost everything is prohibited [network, GUI,
disk (limited), threads (limited), class loaders and reflection]. The
sandbox is similar to that of a browser applet. We use a SecurityManager, a custom
ClassLoader per robot, etc.
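Roughly, the existing sandbox boils down to something like the following sketch; this is a simplified illustration, not our actual code, and the class name and thread-group test are invented:

    // Deny sensitive permissions whenever the calling thread belongs to a robot.
    class RobotSecurityManager(robotGroup: ThreadGroup) extends SecurityManager {
      override def checkPermission(perm: java.security.Permission): Unit = {
        val callerIsRobot = Thread.currentThread().getThreadGroup == robotGroup
        if (callerIsRobot) perm match {
          case _: java.net.SocketPermission =>
            throw new SecurityException("robots may not use the network")
          case _: java.lang.reflect.ReflectPermission =>
            throw new SecurityException("robots may not use reflection")
          case p: RuntimePermission if p.getName.startsWith("createClassLoader") =>
            throw new SecurityException("robots may not create class loaders")
          case _ => () // disk, GUI and thread permissions are filtered similarly
        }
      }
    }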
There are a few ways to host the Scala runtime in Robocode:
1) Load it together with the robot inside the sandbox. Pretty safe for us,
and the preferred solution. But it will cripple the Scala runtime, because the runtime uses reflection. Maybe it also generates classes at runtime? Uses threads for internal cleanup? Accesses JVM internals? (I would not like to limit the abilities of the language.)
2) Use the Scala runtime as trusted code, outside the sandbox, with security on the
same level as the JDK, and visible to the (malicious)
robot. Are the Scala runtime APIs safe? Do their methods have security
guards? Is there a safe mode? Is there any singleton in the Scala runtime
that could be abused to communicate between robots? Any concurrency/thread pool/messaging facility that could simulate threads? (Has the Scala runtime had any security audit?)
3) Something in between: some classes of the runtime inside the sandbox and some outside. Which classes/packages must be visible to the robot, and which are just private implementation? (This seems to be the eventual solution.)
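To illustrate option 3, a per-robot class loader could split the runtime along a whitelist, something like the sketch below. The whitelisted prefixes are invented; working out what really belongs on that list is exactly the question:

    import java.net.{URL, URLClassLoader}

    // Whitelisted runtime classes come from the trusted parent loader;
    // everything else is loaded per robot, inside the sandbox.
    class RobotClassLoader(trusted: ClassLoader, robotJars: Array[URL])
        extends URLClassLoader(robotJars, null) {

      private val trustedPrefixes = Seq("scala.runtime.", "scala.Predef")

      override def loadClass(name: String, resolve: Boolean): Class[_] =
        if (name.startsWith("java.") || trustedPrefixes.exists(p => name.startsWith(p)))
          trusted.loadClass(name)          // single shared, trusted copy
        else
          super.loadClass(name, resolve)   // robot-private copy inside the sandbox
    }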
The question:
Is it possible to enumerate and isolate the parts of the runtime which must run in
the trusted scope from the rest? Specific packages and classes? Or is there a better idea?
I'm looking for a specific answer that will lead to a secure solution. Random thoughts are welcome, but won't be awarded. There is an ongoing discussion on the Scala mailing list, with no specific answer yet.
I think #1 is your best bet and even that is a moving target. As brought up on the mailing list, structural types use reflection. I don't think structural types are common in the standard library, but I don't think anyone keeps track of where they are.
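For example (my own illustration, not code from the standard library), any method that accepts a structural type compiles to a reflective call:

    import scala.language.reflectiveCalls

    object StructuralTypeExample {
      // "Anything with a close(): Unit method" is a structural type. Calls through
      // it are compiled to java.lang.reflect lookups, so even this innocent-looking
      // helper needs reflection permissions at runtime.
      def closeQuietly(resource: { def close(): Unit }): Unit =
        try resource.close()
        catch { case _: Exception => () }
    }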
There's also always the possibility that there are other features using reflection behind the scenes. For example, for a while in the 2.8 branch some array functionality was using reflection. I think that's been changed after benchmarking, but there's always the possibility that there's some problem where someone said "Aha! I will use reflection to solve this."
The Scala standard library is filled with singletons. Most of them are immutable, but I know that the Scheduler object in the actors library could be abused for communication because it is essentially a proxy for an actual scheduler so you can plug your own custom scheduler into it.
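More generally, any mutable singleton that both robots can reach from a shared, trusted runtime is a potential covert channel. A deliberately hypothetical sketch (this is not the actual Scheduler API):

    // SharedState stands in for any mutable runtime singleton.
    object SharedState {
      @volatile var note: Option[String] = None
    }

    // Robot A (malicious) writes...
    object RobotA { def act(): Unit = SharedState.note = Some("I'm at (120, 45)") }

    // ...and a colluding robot B reads.
    object RobotB { def act(): Unit = SharedState.note.foreach(intel => println(s"ally says: $intel")) }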
At this time I don't think Scala requires a custom class loader, and all of its classes are produced at compile time rather than at runtime, but then again that's probably a moving target too. Scala generates a lot of class files, and there is always talk of making it generate some of them at runtime, when they are needed, instead of at compile time.
So, in short, I do not think it's possible (within reasonable constraints on effort) to enumerate and isolate the pieces of Scala that can (and should) be trusted.
Since you mention other J* language implementations, which may all make use of reflection, banning reflection would effectively ban all of those languages as well.
I'd say the real problem is that the JVM has no way to partition the scope of the reflection API, so that you could "sandbox" the portion of code that may be reflected upon.
Related
I hope that this question is not too broad for Stack Overflow. If it is, I would be grateful for a suggestion of where this kind of question could be discussed.
Problem:
I'm thinking of creating a multi-threaded application in JRuby and am trying to foresee potential pitfalls. My current concept goes like this:
Use the ruby-concurrent library (https://github.com/ruby-concurrency/concurrent-ruby)
Communication between threads uses only Queues and/or Futures from this library
Now I'm wondering what else I would have to watch out for. While new code can of course use the classes defined in concurrent-ruby, each thread will also use existing Ruby code or gems for its internal work, and I don't know in what ways those could jeopardize the parallelism.
For instance, old JRuby versions (< 1.7) reportedly had the problem that the implementations of the native Hash and Array classes were themselves not thread-safe, with the effect that even a method using a Hash in a local variable could compromise another thread. I think this problem no longer exists in newer JRuby versions; at least I could not find anything about it in the current JRuby docs.
Then there is the risk from global variables ($name) and instance/class variables (@name, @@name). From my understanding, these are also possible error sources, and I would have to check the source code of each gem I plan to use to see whether it relies on such variables.
Is there anything else I would have to be aware of?
I am looking to create an AI environment where users can submit their own code for an AI and have the AIs compete against each other. The language could be anything, but something easy to learn like JavaScript or Python is preferred.
Basically I see three options with a couple of variants:
1. Make my own language, e.g. a JavaScript clone with only very basic features like variables, loops, conditionals, arrays, etc. This is a lot of work if I want to properly implement common language features.
1.1 Take an existing language and strip it down to its core. Just remove lots of features from, say, Python until there is nothing left but the above (variables, conditionals, etc.). Still a lot of work, especially if I want to keep up to date with upstream (though I could also just ignore upstream).
2. Use a language's built-in features to lock it down. I know from PHP that you can disable functions, and from searching around, similar solutions seem to exist for Python (with lots and lots of caveats). For this I'd need a good understanding of all the language's features and must not miss anything.
2.1. Make a preprocessor that rejects code with dangerous stuff (preferably whitelist-based). Similar to option 1, except that I only have to implement the parser rather than all the features: the preprocessor has to understand the language so that you can have variables named "eval" but not call the function named "eval". Still a lot of work, but more manageable than option 1. (See the sketch after this list.)
2.2. Run the code in a very locked-down environment. Chroot, no unnecessary permissions... perhaps in a virtual machine or container. Something in that sense. I'd have to research how to achieve this and how to make it give me the results in a secure way, but that seems doable.
3. Manually read through all code. Doable on a small scale or with moderators, though still tedious and error-prone (I might miss stuff like if (user.id = 0)).
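To make 2.1 concrete, here is roughly what I imagine such a whitelist check looking like, sketched in Scala purely for illustration (any language with an accessible parser works the same way). The whitelist and names are made up, it needs scala-compiler on the classpath, and a real checker would have to cover every call shape:

    import scala.reflect.runtime.currentMirror
    import scala.reflect.runtime.universe._
    import scala.tools.reflect.ToolBox

    object SubmissionChecker {
      private val toolBox      = currentMirror.mkToolBox()
      private val allowedCalls = Set("println", "min", "max", "abs")

      // Parse the submitted source into an AST (so a *variable* named "eval" is
      // fine) and reject any *call* whose name is not on the whitelist.
      def check(source: String): Either[String, Tree] = {
        val tree = toolBox.parse(source)
        val badCalls = tree.collect {
          case Apply(Ident(TermName(name)), _) if !allowedCalls(name)     => name
          case Apply(Select(_, TermName(name)), _) if !allowedCalls(name) => name
        }
        if (badCalls.isEmpty) Right(tree)
        else Left(s"disallowed calls: ${badCalls.distinct.mkString(", ")}")
      }
    }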
The way I imagine 2.2 working is like this: run both AIs in virtual machines (or something similar) constrained to communicate only with the host machine (no other Internet or LAN access). Each AI runs in its own machine, and they communicate with each other (well, with the playing field, and thereby see each other's positions) through an API running on the host.
Option 2.2 seems the most doable, but also relatively hacky... I let someone's code loose in a virtualized or locked-down environment, hoping that it will keep them in while giving them free rein to try to DoS or break out of the environment. Then again, most other options are not much better.
TL;DR: in essence my question is: how do I let people give me 'logic' for an AI (which I think is most easily done using code) and then run that without compromising the functionality of the system? There must be at least 2 AIs working on the same playing field.
This is really just a plugin system, so researching how others implement plugins is a good starting point. In particular, I'd look at web browsers like Chrome and Safari and their plugin systems.
A common theme in modern plugin systems is process isolation. Ideally you should run the plugin in its own process space in a sandbox. On OS X, look at XPC, which is designed explicitly for this problem. On Linux (or more portably), I would probably look at NaCl (Native Client). The JVM is also designed to provide sandboxing, and offers a rich selection of languages. (That said, I don't personally consider the JVM a very strong sandbox. It's had a history of security problems.)
In general, my preference on these kinds of projects is a language-agnostic API. I most often use REST APIs (or "REST-like"). This allows the plugin to be highly restricted, while not restricting the language choice. I like simple HTTP for communications whenever possible because it has rich support in numerous languages, so it puts little restriction on the plugin. In fact, given your description, you wouldn't even have to run the plugin on your hardware (and certainly not on the main server). Making the plugins remote clients removes many potential concerns.
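To make the API idea concrete, a minimal host could look something like this sketch in Scala, using the JDK's built-in HTTP server. The endpoints, port, and state format are invented; any HTTP stack in any language would do:

    import com.sun.net.httpserver.{HttpExchange, HttpServer}
    import java.net.InetSocketAddress

    object GameHost extends App {
      @volatile private var state = """{"positions":[]}"""     // placeholder game state

      private val server = HttpServer.create(new InetSocketAddress(8080), 0)

      server.createContext("/state", (ex: HttpExchange) => {   // AI reads the playing field
        val body = state.getBytes("UTF-8")
        ex.sendResponseHeaders(200, body.length)
        ex.getResponseBody.write(body)
        ex.close()
      })

      server.createContext("/move", (ex: HttpExchange) => {    // AI submits a move
        val move = new String(ex.getRequestBody.readAllBytes(), "UTF-8")
        // validate `move` and update `state` here
        ex.sendResponseHeaders(204, -1)                        // no response body
        ex.close()
      })

      server.start()
      println("game host listening on http://localhost:8080")
    }

The AIs themselves never run on the host: they just poll /state and POST to /move from wherever they live, which keeps the trust boundary at the HTTP layer.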
But ultimately, I think something like your "2.2" is the right direction.
I'm looking for a way to run arbitrary Haskell code safely (or to refuse to run unsafe code).
Must have:
module/function whitelist
timeout on execution
memory usage restriction
Capabilities I would like to see:
ability to kill thread
compiling the modules to native code
caching of compiled code
running several interpreters concurrently
a complex datatype for compiler errors (instead of a simple message in a String)
With that sort of functionality it would be possible to implement a browser plugin capable of running arbitrary Haskell code, which is the idea I have in mind.
EDIT: I've got two answers, both great. Thanks! The sad part is that there doesn't seem to be a ready-to-go library, just a similar program. It's a useful resource, though. Anyway, I think I'll wait for 7.2.1 to be released and try to use SafeHaskell in my own program.
We've been doing this for about 8 years now in lambdabot, which supports:
a controlled namespace
OS-enforced timeouts
native code modules
caching
concurrent interactive top-levels
custom error message returns.
This set of rules is documented; see:
Safely running untrusted Haskell code
mueval, an alternative implementation based on ghc-api
The approach to safety taken in lambdabot inspired the Safe Haskell language extension work.
For approaches to dynamic extension of compiled Haskell applications, in Haskell, see the two papers:
Dynamic Extension of Typed Functional Languages, and
Dynamic applications from the ground up.
GHC 7.2.1 will likely have a new facility called SafeHaskell which covers some of what you want. SafeHaskell ensures type-safety (so things like unsafePerformIO are outlawed), and establishes a trust mechanism, so that a library with a safe API but implemented using unsafe features can be trusted. It is designed exactly for running untrusted code.
For the other practical aspects (timeouts and so on), lambdabot as Don says would be a great place to look.
I would like to create a set of domain objects in multiple languages, so that I can target different platforms. I have been looking at external DSLs as a way to define a language for my domain, and then potentially writing adapters that generate code for the languages I'm interested in targeting. Is this the best way to solve this problem? Or is it just simpler to maintain multiple versions of the project?
I think that Apache Thrift delivers what you are asking for.
Sorry for the late answer, but since you mention C# as your main language, this practically fully supported, Visual Studio-based technology is exactly what you are looking for.
You have to understand what you want to abstract with your DSLs, but the multiple-platform support is trivial on top of that.
Disclaimer: This is our technology, but it's publicly open and it solves exactly the problem presented in the question.
http://abstraction.codeplex.com/
Note! Mind the very "alpha" stage of the current download; I suggest you skip the zipped download and grab the latest source. I will be publishing a better version in the relatively near future. Check out the "Context" implementation in the "Production/Dev/AbstractionTemplate" solution.
It is difficult to be helpful without understanding what you are planning to use your DSL for.
Is portability your main problem here?
To successfully target these different platforms, you will probably have to maintain platform-specific layers anyway (generated or not).
If you plan to write your whole application in your DSL and then use your own compiler to transform it into runnable code for each platform, that is most probably a bad idea: too complex and over-engineered.
However, if you have a well-defined chunk of platform-independent logic, then a DSL is a good choice. Just write an interpreter for it on each target platform (provided that performance is not critical, this is also simpler and easier than generating code).
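To make that concrete, here is a toy sketch (in Scala, purely for illustration, with an invented rule language) of a platform-independent chunk of logic plus one platform's interpreter:

    // The rules are a platform-independent data structure...
    sealed trait Rule
    final case class RequireField(name: String)          extends Rule
    final case class MaxLength(name: String, limit: Int) extends Rule
    final case class AllOf(rules: List[Rule])            extends Rule

    // ...and each platform ships its own small interpreter for them.
    object JvmInterpreter {
      def validate(rule: Rule, record: Map[String, String]): Boolean = rule match {
        case RequireField(n)   => record.contains(n)
        case MaxLength(n, lim) => record.get(n).forall(_.length <= lim)
        case AllOf(rs)         => rs.forall(validate(_, record))
      }
    }

The equivalent interpreter on another platform would be a few dozen lines in that platform's language, written against the same rule definitions.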
What is the best way to create multiple language versions of a domain?
This is (or was?) essentially the idea of Model-Driven Architecture (MDA). Quoting the Model-driven architecture article from Wikipedia:
The Model-Driven Architecture approach defines system functionality using a platform-independent model (PIM) using an appropriate domain-specific language (DSL).
Then, given a platform definition model (PDM) corresponding to CORBA, .NET, the Web, etc., the PIM is translated to one or more platform-specific models (PSMs) that computers can run. This requires mappings and transformations and should be modeled too.
The PSM may use different Domain Specific Languages (DSLs), or a General Purpose Language (GPL) like Java, C#, PHP, Python, etc. Automated tools generally perform this translation.
Depending on the complexity of your domain and the availability of a MDA Tool, this might be an option (with a lower implementation cost).
See also
MDA: Nice idea, shame about the ...
Language Workbenches and Model Driven Architecture
UML vs. Domain-Specific Languages
DSL in the context of UML and GPL
UML or DSL: Which Bear Is Best? (be sure to read this one)
I'm speaking specifically of something like
the PLT Scheme make-evaluator.
It will run Scheme code, but under certain conditions:
It only uses a definable amount of memory, and will quit execution if the script needs more
It behaves similarly with time
It restricts all IO except for what I specifically allow in the code
Is anyone familiar with anything else that can do this?
Lua lets you easily define sandboxes with as much or as little power as you want.
PHP allows something similar with eval - though you would need to set some restrictive values with ini_set before calling it, and they would affect the current script as well.
The Java platform provides fine-grained access control and sandboxing.
This isn't exactly equivalent to make-evaluator but the API allows you to place constraints on arbitrary objects (through the GuardedObject class). You can also restrict permissions of classes loaded from a particular source.
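A minimal sketch of how GuardedObject is used, written in Scala since any JVM language can call it; the wrapped value, file path, and object names are invented for illustration:

    import java.io.FilePermission
    import java.security.{Guard, GuardedObject}

    object GuardedObjectDemo extends App {
      val secretConfig = Map("apiKey" -> "...")
      // Any Permission can act as the Guard.
      val guard: Guard = new FilePermission("/etc/game/secret.conf", "read")
      val guarded      = new GuardedObject(secretConfig, guard)

      // getObject() calls guard.checkGuard(), which consults the installed
      // SecurityManager/policy and throws SecurityException if access is denied.
      val config = guarded.getObject.asInstanceOf[Map[String, String]]
      println(config.keys)
    }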
It might be helpful to read the Java Platform Security Architecture spec.
Please note that Java APIs can be accessed from most languages on the JVM.