What is the least obvious Go standard library package you can use to escape a sandbox?

I'm designing a project for a college-level computer security course, and I'm trying to include a vulnerability in which code that looks "clean", because a number of risky packages (unsafe, os, ioutil, etc.) are blacklisted, can still escape the sandbox. The question is this: can you think of a way to use other, non-obvious Go standard library packages to escape the sandbox? "Escape the sandbox" here means reading/writing files, making network connections, breaking memory safety (which would allow you to do any of the other things), etc.
Things I've tried so far that haven't worked:
Using the reflect package to do unsafe pointer conversions (the reflect package seems really safe against this sort of abuse)
Using the reflect package to get access to a reference held by a random stdlib package to some sensitive function like os.Open (I haven't found any that actually keep function pointers or anything like that)

Related

Languages with a NodeJS/CommonJS style module system

I really like the way NodeJS (and its browser-side counterparts) handle modules:
var $ = require('jquery');
var config = require('./config.json');
module.exports = function(){};
module.exports = {...}
I am actually rather disappointed by the ES2015 'import' spec, which is very similar to what the majority of languages already do.
Out of curiosity, I decided to look for other languages which implement or even support a similar export/import style, but to no avail.
Perhaps I'm missing something, or more likely, my Google-fu isn't up to scratch, but it would be really interesting to see which other languages work in a similar way.
Has anyone come across similar systems?
Or maybe someone can even provide reasons that it isn't used all that often.
It is nearly impossible to properly compare these features in the abstract; one can only compare their implementations in specific languages. My experience is mostly with Java and Node.js.
I observed these differences:
You can use require for more than just making other modules available to your module. For example, you can use it to parse a JSON file (see the sketch after this list).
You can use require anywhere in your code, while import is only available at the top of a file.
require actually executes the required module (if it was not yet executed), while import has a more declarative nature. This might not be true for all languages, but it is a tendency.
require can load private dependencies from subdirectories, while import often uses one global namespace for all the code. Again, this is not true in general, but merely a tendency.
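To make the first two differences concrete, here is a small Node.js sketch (the file names and port are made up for illustration):
// CommonJS: require() is an ordinary function call, so it can appear
// anywhere in the code and can load non-code assets such as JSON.
var config = require('./config.json');   // parsed into a plain object

function connect() {
  // loaded lazily, only when connect() is first called
  var net = require('net');
  return net.createConnection(config.port);
}

// ES2015 modules: imports are static declarations at the top of the file, e.g.
// import net from 'net';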
Responsibilities
As you can see, the require method has multiple responsibilities: declaring module dependencies and reading data. This is better separated with the import approach, since import is supposed to handle only module dependencies. I guess what you like about being able to use require for reading JSON is that it gives the programmer a really easy interface. I agree that such an easy JSON-reading interface is nice, but there is no need to mix it with the module dependency mechanism. There could simply be another method, for example readJson(). That would separate the concerns, so require would only be needed for declaring module dependencies.
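Such a helper is trivial to write; a minimal sketch (readJson is just the illustrative name used above, not a standard function):
// readJson.js - separates "read data" from "declare a module dependency"
var fs = require('fs');

module.exports = function readJson(path) {
  return JSON.parse(fs.readFileSync(path, 'utf8'));
};

// usage elsewhere:
// var readJson = require('./readJson');
// var config = readJson('./config.json');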
Location in the Code
Now that we use require only for module dependencies, it is bad practice to use it anywhere other than at the top of your module. Scattering it through your code just makes it hard to see what the module depends on. This is why the import statement can only be used at the top of a file.
I don't see how import creates a global variable. It merely creates a consistent identifier for each dependency, which is limited to the current file. As I said above, I recommend doing the same with require by using it only at the top of the file. It really helps the readability of the code.
How it works
Executing code when loading a module can also be a problem, especially in big programs. You might run into a loop where one module transitively requires itself. This can be really hard to resolve. To my knowledge, nodejs handles this situation like so: When A requires B and B requires A and you start by requiring A, then:
the module system remembers that it is currently loading A
it executes the code in A
it remembers that it is currently loading B
it executes the code in B
it tries to load A, but A is already loading
A has not yet finished loading
it returns the half-loaded A to B
B does not expect A to be half-loaded
This might be a problem. Now, one can argue that cyclic dependencies should really be avoided and I agree with this. However, cyclic dependencies should only be avoided between separate components of a program. Classes in a component often have cyclic dependencies. Now, the module system can be used for both abstraction layers: Classes and Components. This might be an issue.
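A minimal sketch of that half-loaded situation in Node.js (two made-up files):
// a.js
exports.loaded = false;
var b = require('./b');     // starts loading b before a has finished
exports.loaded = true;

// b.js
var a = require('./a');     // a is still mid-load, so this is its partial exports
console.log(a.loaded);      // prints: false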
Next, the require approach often leads to singleton modules which cannot be used multiple times in the same program, because they store global state. However, this is not really the fault of the system but of the programmer who uses the system in the wrong way. Still, my observation is that the require approach misleads new programmers in particular into doing this.
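A sketch of that accidental-singleton pattern (again, hypothetical files): because Node.js caches modules, every consumer shares the same module-level state.
// counter.js - module-level state makes this an implicit singleton
var count = 0;
module.exports = {
  increment: function () { return ++count; }
};

// every require('./counter') in the program returns the same cached object,
// so calls from completely unrelated code all bump the same counter:
var counter = require('./counter');
counter.increment();   // 1
counter.increment();   // 2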
Dependency Management
The dependency management that underlies the different approaches is indeed an interesting point. For example, Java still lacks a proper module system in its current version. Again, one is announced for the next version, but who knows whether that will ever happen. Currently, you can only get modules using OSGi, which is far from easy to use.
The dependency management underlying Node.js is very powerful. However, it is also not perfect. For example, non-private dependencies (dependencies that are exposed via the module's API) are always a problem. However, this is a common problem for dependency management, so it is not limited to Node.js.
Conclusion
I guess neither is that bad, since both are used successfully. However, in my opinion, import has some objective advantages over require, like the separation of responsibilities. It follows that import can be restricted to the top of the code, which means there is only one place to search for module dependencies. Also, import might be a better fit for compiled languages, since these do not need to execute code to load code.

Is "require()" safe in a sandbox?

I'm building a sandboxed duktape application. The sandboxing doc (https://github.com/svaarala/duktape/blob/master/doc/sandboxing.rst) advises removing the default require() implementation. I'm not clear why that is necessary. It seems that require() depends on modSearch() to determine what code to load and from where. If modSearch() doesn't permit loading data from anywhere that isn't permitted in the sandbox, is there anything else about the default require() implementation that is unsafe or gives cause for wariness?
The recommendation is there to emphasize that the default require() is not necessarily sandboxing safe (even if the current implementation was), so it's probably best to replace it when sandboxing against potentially untrusted code (at least if the code can be actively malicious rather than just accidentally broken).
That said, I don't know of any concrete issues right now.
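For context, the default require() delegates module lookup to Duktape.modSearch(), so the restrictive setup the question describes would look roughly like this (the whitelist and the loadFile() hook are assumptions supplied by the embedder, not duktape built-ins):
// whitelist-based module resolution: anything not listed simply fails
var ALLOWED = { 'math-helpers': 'modules/math-helpers.js' };

Duktape.modSearch = function (id, require, exports, module) {
    if (!Object.prototype.hasOwnProperty.call(ALLOWED, id)) {
        throw new Error('module not allowed: ' + id);
    }
    // loadFile() stands in for whatever trusted native hook the embedder exposes
    return loadFile(ALLOWED[id]);   // must return the module's source as a string
};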

Securely running user's code

I am looking to create an AI environment where users can submit their own code for the AI and let them compete. The language could be anything, but something easy to learn like JavaScript or Python is preferred.
Basically I see three options with a couple of variants:
1. Make my own language, e.g. a JavaScript clone with only very basic features like variables, loops, conditionals, arrays, etc. This is a lot of work if I want to properly implement common language features.
1.1. Take an existing language and strip it to its core. Just remove lots of features from, say, Python until there is nothing left but the above (variables, conditionals, etc.). Still a lot of work, especially if I want to keep up to date with upstream (though I could also just ignore upstream).
2. Use a language's built-in features to lock it down. I know from PHP that you can disable functions, and searching around, similar solutions seem to exist for Python (with lots and lots of caveats). For this I'd need to have a good understanding of all the language's features and not miss anything.
2.1. Make a preprocessor that rejects code with dangerous stuff (preferably whitelist based); see the sketch after this list. Similar to option 1, except that I only have to implement the parser, not all the features: the preprocessor has to understand the language well enough that you can have a variable named "eval" but not call the function named "eval". Still a lot of work, but more manageable than option 1.
2.2. Run the code in a very locked-down environment. Chroot, no unnecessary permissions... perhaps in a virtual machine or container. Something in that sense. I'd have to research how to achieve this and how to make it give me the results in a secure way, but that seems doable.
3. Manually read through all code. Doable on a small scale or with moderators, though still tedious and error-prone (I might miss stuff like if (user.id = 0)).
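For option 2.1, working on a real parse tree rather than on raw strings is what makes "a variable named eval" distinguishable from "a call to eval". A rough Node.js sketch using the acorn parser (the banned-call list is only a placeholder; a real checker would whitelist allowed constructs instead):
var acorn = require('acorn');            // npm install acorn

var BANNED_CALLS = ['eval', 'Function', 'require'];

// generic AST walker: visit every node in the tree
function walk(node, visit) {
  if (!node || typeof node.type !== 'string') return;
  visit(node);
  Object.keys(node).forEach(function (key) {
    var child = node[key];
    if (Array.isArray(child)) {
      child.forEach(function (c) { walk(c, visit); });
    } else if (child && typeof child === 'object') {
      walk(child, visit);
    }
  });
}

function rejectDangerousCode(source) {
  var ast = acorn.parse(source, { ecmaVersion: 2020 });
  walk(ast, function (node) {
    if (node.type === 'CallExpression' &&
        node.callee.type === 'Identifier' &&
        BANNED_CALLS.indexOf(node.callee.name) !== -1) {
      throw new Error('banned call: ' + node.callee.name);
    }
  });
}

rejectDangerousCode('var eval = 1;');   // fine: eval is only a variable name here
rejectDangerousCode('eval("x");');      // throws: eval is actually called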
The way I imagine 2.2 working is this: run both AIs in a virtual machine (or something like it) and constrain it to communicate with the host machine only (no other Internet or LAN access). Each AI runs in a separate machine, and they communicate with each other (well, with the playing field, and thereby they see each other's positions) through an API running on the host.
Option 2.2 seems the most doable, but also relatively hacky... I let someone's code loose in a virtualized or locked-down environment, hoping that it will keep them in while giving them free rein to DoS or break out of the environment. Then again, most of the other options are not much better.
TL;DR: in essence my question is: how do I let people give me 'logic' for an AI (which I think is most easily done using code) and then run that without compromising the functionality of the system? There must be at least 2 AIs working on the same playing field.
This is really just a plugin system, so researching how others implement plugins is a good starting point. In particular, I'd look at web browsers like Chrome and Safari and their plugin systems.
A common theme in modern plugin systems is process isolation. Ideally you should run the plugin in its own process space, in a sandbox. On OS X, look at XPC, which is designed explicitly for this problem. On Linux (or more portably), I would probably look at NaCl (Native Client). The JVM is also designed to provide sandboxing, and offers a rich selection of languages. (That said, I don't personally consider the JVM a very strong sandbox; it has had a history of security problems.)
In general, my preference on these kinds of projects is a language-agnostic API. I most often use REST APIs (or "REST-like"). This allows the plugin to be highly restricted, while not restricting the language choice. I like simple HTTP for communications whenever possible because it has rich support in numerous languages, so it puts little restriction on the plugin. In fact, given your description, you wouldn't even have to run the plugin on your hardware (and certainly not on the main server). Making the plugins remote clients removes many potential concerns.
But ultimately, I think something like your "2.2" is the right direction.
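To illustrate the language-agnostic API idea, here is a bare-bones sketch of a host-side "playing field" service using only Node's built-in http module (the endpoints and payload shapes are invented for the example, and a real system would need validation and authentication):
// host.js - each sandboxed AI only ever talks HTTP to this process
var http = require('http');

var positions = { player1: [0, 0], player2: [5, 5] };

http.createServer(function (req, res) {
  if (req.method === 'GET' && req.url === '/state') {
    // AIs observe the playing field (and thereby each other's positions)
    res.end(JSON.stringify(positions));
  } else if (req.method === 'POST' && req.url === '/move') {
    var body = '';
    req.on('data', function (chunk) { body += chunk; });
    req.on('end', function () {
      var move = JSON.parse(body);    // a real system must validate this input
      positions[move.player] = move.to;
      res.end('ok');
    });
  } else {
    res.writeHead(404);
    res.end();
  }
}).listen(8080);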

hackage.haskell.org documentation convention meaning

I am currently reading http://hackage.haskell.org/packages/archive/containers/latest/doc/html/Data-Set.html#t:Set
What do the following details convey (to an absolute Haskell beginner)?
Portability: portable (What other portability values are there?)
Stability: provisional (What other stability values are there?)
Maintainer: libraries@haskell.org
Safe Haskell: Safe (Is there something unsafe?)
The fields come from the package's .cabal file, which lists some metadata for the package. Many fields can have free-form values, so the developer decides for him/herself what to write in a field, and there are no fixed "rules" for what each field may contain.
Portability: Describes how portable the package is between Haskell compilers, and sometimes also between operating systems. The only values I've seen are "portable" and "unportable". An unportable package might, for example, depend on a Haskell language extension that only exists in the GHC compiler and doesn't work on any other Haskell compiler like UHC, or it might depend on some system library that only exists on UNIX and doesn't work on Windows.
Stability: Specifies how stable a library is, which includes its reliability (e.g. how often it crashes), but most importantly how often its API changes. I've seen the values "experimental", "provisional" and "stable", but there might be a list with more somewhere (things in the Cabal documentation are sometimes impossible to find). When a package is experimental, its interface probably changes between every release, because the developer hasn't decided yet how it should be implemented, or because the developer just implemented some theoretical functionality from a paper and doesn't intend to maintain the package; s/he just wanted to see if it was possible and is publishing the package as a demonstration. When a package is provisional, the general API is stable, so there might be updates that only fix internal bugs and don't add or remove any functions. Because it is provisional, however, the API might still change in the future when the developers decide to add new features or restructure the library. With a stable library, this basically never happens; the API will probably never change, and the implementation is "rock solid", or it is a reference implementation of an API, or something like that.
The maintainer for a package is the person or group of persons who are responsible for the package. The email specifies how these maintainers can be reached.
The "Safe Haskell" field refers to a GHC extension which you can read more about here. A module which is unsafe uses functions like unsafePerformIO which breaks some of the fundamental "rules" of Haskell like referential transparency. An unsafe module might also use unsafe language extensions. A safe module is a module that doesn't use any unsafe functions etc, and also doesn't import any other unsafe modules. A trusted module uses unsafe functions (Directly trustworthy, indirectly trusted), but the author has made sure that the public API of the module hides this fact safely, so that from the outside it seems like the module is safe for all intents and purposes. Those are the options for the "Safe Haskell" field.

Protecting shared library

Is there any way to protect a shared library (.so file) against reverse engineering?
Is there any free tool to do that?
The obvious first step is to strip the library of all symbols, except the ones required for the published API you provide.
The "standard" disassembly-prevention techniques (such as jumping into the middle of instruction, or encrypting some parts of code and decrypting them on-demand) apply to shared libraries.
Other techniques, e.g. detecting that you are running under a debugger and failing, do not really apply (unless you want to drive your end users completely insane).
Assuming you want your end-users to be able to debug the applications they are developing using your library, obfuscation is a mostly lost cause. Your efforts are really much better spent providing features and support.
Reverse engineering protection comes in many forms, here are just a few:
Detecting reversing environments, such as being run in a debugger or a virtual machine, and aborting. This prevents an analyst from figuring out what is going on. Usually used by malware. A common trick is to run undocumented instructions that behave differently in VMWare than on a real CPU.
Formatting the binary so that it is malformed, e.g. missing ELF sections. You're trying to prevent normal analysis tools from being able to open the file. On Linux, this means doing something that libbfd doesn't understand (but other libraries like capstone may still work).
Randomizing the binary's symbols and code blocks so that they don't look like what a compiler would produce. This makes decompiling (guessing at the original source code) more difficult. Some commercial programs (like games) are deployed with this kind of protection.
Polymorphic code that changes itself on the fly (e.g. decompresses into memory when loaded). The most sophisticated ones are designed for use by malware and are called packers (avoid these unless you want to get flagged by anti-malware tools). There are also benign, open-source ones like UPX http://upx.sourceforge.net/ (which provides a tool to undo the UPX'ing).
