Securely running users' code

I am looking to create an AI environment where users can submit their own code for the AI and let them compete. The language could be anything, but something easy to learn like JavaScript or Python is preferred.
Basically I see three options with a couple of variants:
1. Make my own language, e.g. a JavaScript clone with only very basic features like variables, loops, conditionals, arrays, etc. This is a lot of work if I want to properly implement common language features.
1.1. Take an existing language and strip it down to its core. Just remove lots of features from, say, Python until there is nothing left but the above (variables, conditionals, etc.). Still a lot of work, especially if I want to keep up to date with upstream (though I could also just ignore upstream).
2. Use a language's built-in features to lock it down. I know from PHP that you can disable functions, and from searching around, similar solutions seem to exist for Python (with lots and lots of caveats). For this I'd need a good understanding of all the language's features so as not to miss anything.
2.1. Make a preprocessor that rejects code with dangerous stuff (preferably whitelist based). Similar to option 1, except that I only have to implement the parser, not all the features: the preprocessor has to understand the language well enough that you can have a variable named "eval" but not call the function named "eval". Still a lot of work, but more manageable than option 1. (A rough sketch of such a checker follows below this question.)
2.2. Run the code in a very locked-down environment. Chroot, no unnecessary permissions... perhaps in a virtual machine or container. Something along those lines. I'd have to research how to achieve this and how to get the results back in a secure way, but it seems doable.
3. Manually read through all code. Doable on a small scale or with moderators, though still tedious and error-prone (I might miss stuff like if (user.id = 0)).
The way I imagine 2.2 working is this: run each AI in its own virtual machine (or something similar) and constrain it to communicate with the host machine only (no other Internet or LAN access). The AIs run in separate machines and communicate with each other (well, with the playing field, and thereby see each other's positions) through an API running on the host.
Option 2.2 seems the most doable, but also relatively hacky... I let someone's code loose in a virtualized or locked-down environment, hoping that it will keep them in while giving them free rein to try to DoS or break out of the environment. Then again, most other options are not much better.
TL;DR: in essence my question is: how do I let people give me 'logic' for an AI (which I think is most easily done using code) and then run that without compromising the functionality of the system? There must be at least 2 AIs working on the same playing field.
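For illustration, a rough sketch of what a 2.1-style whitelist checker might look like in Python, using the standard ast module. The allowed node and function lists here are purely illustrative; a real checker would need a far more carefully curated whitelist:

    # Sketch of option 2.1: reject any submission whose syntax tree
    # uses a node type or function name outside a whitelist.
    import ast

    ALLOWED_NODES = (
        ast.Module, ast.Expr, ast.Assign, ast.Name, ast.Load, ast.Store,
        ast.Constant, ast.BinOp, ast.Add, ast.Sub, ast.Mult, ast.Div,
        ast.Compare, ast.Lt, ast.Gt, ast.Eq, ast.If, ast.While, ast.For,
        ast.List, ast.Subscript, ast.Call,
    )
    ALLOWED_CALLS = {"len", "range", "abs"}   # note: no "eval" here

    def check(source):
        for node in ast.walk(ast.parse(source)):
            if not isinstance(node, ALLOWED_NODES):
                raise ValueError("disallowed syntax: " + type(node).__name__)
            # a variable may be *named* eval; *calling* eval is rejected
            if isinstance(node, ast.Call):
                if not (isinstance(node.func, ast.Name)
                        and node.func.id in ALLOWED_CALLS):
                    raise ValueError("disallowed call")

    check("total = 0\nfor x in range(10):\n    total = total + x")  # accepted
    try:
        check("eval('1+1')")
    except ValueError as err:
        print("rejected:", err)   # -> rejected: disallowed call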

This is really just a plugin system, so researching how others implement plugins is a good starting point. In particular, I'd look at web browsers like Chrome and Safari and their plugin systems.
A common theme in modern plugin systems is process isolation. Ideally you should run the plugin in its own process space, in a sandbox. On OS X, look at XPC, which is designed explicitly for this problem. On Linux (or more portably), I would probably look at NaCl (Native Client). The JVM is also designed to provide sandboxing, and offers a rich selection of languages. (That said, I don't personally consider the JVM a very strong sandbox; it has had a history of security problems.)
In general, my preference on these kinds of projects is a language-agnostic API. I most often use REST APIs (or "REST-like"). This allows the plugin to be highly restricted, while not restricting the language choice. I like simple HTTP for communications whenever possible because it has rich support in numerous languages, so it puts little restriction on the plugin. In fact, given your description, you wouldn't even have to run the plugin on your hardware (and certainly not on the main server). Making the plugins remote clients removes many potential concerns.
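To make the idea concrete, here is a minimal sketch of what such a language-agnostic plugin client could look like, written in Python but translatable to anything with an HTTP library. The /state and /move endpoints, the JSON fields and the base URL are all hypothetical:

    # Sketch of an AI plugin talking to a hypothetical playing-field
    # API over plain HTTP. Endpoints and fields are illustrative only.
    import json
    import time
    import urllib.request

    BASE = "http://localhost:8000"   # the host's playing-field API

    def get_state():
        with urllib.request.urlopen(BASE + "/state") as resp:
            return json.load(resp)

    def send_move(move):
        data = json.dumps({"move": move}).encode("utf-8")
        req = urllib.request.Request(
            BASE + "/move", data=data,
            headers={"Content-Type": "application/json"})
        urllib.request.urlopen(req).close()

    while True:
        state = get_state()
        if state.get("game_over"):
            break
        send_move("wait")   # the AI's actual logic would go here
        time.sleep(0.1)

Because the protocol is just HTTP and JSON, the same plugin could be rewritten in any language, or run on entirely different hardware.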
But ultimately, I think something like your "2.2" is the right direction.
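As a rough sketch of one building block of a "2.2"-style setup on Linux: run the submitted code in a child process with hard resource limits. This alone is NOT a complete sandbox; you would still combine it with VMs or containers, an unprivileged user, a chroot and no network access. The script name is a placeholder:

    # Run an untrusted AI script in a resource-constrained child process.
    import resource
    import subprocess

    def limit_resources():
        # Called in the child just before exec().
        resource.setrlimit(resource.RLIMIT_CPU, (5, 5))             # 5 s CPU
        resource.setrlimit(resource.RLIMIT_AS, (256 * 2**20,) * 2)  # 256 MB
        resource.setrlimit(resource.RLIMIT_NPROC, (16, 16))         # no fork bombs

    result = subprocess.run(
        ["python3", "untrusted_ai.py"],   # hypothetical submitted script
        preexec_fn=limit_resources,
        capture_output=True,
        timeout=10,                       # wall-clock limit as well
    )
    print(result.returncode, result.stdout[:1000])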

Related

How to add security to Spring boot jar file? [duplicate]

How can I package my Java application into an executable jar that cannot be decompiled (for example, by Jadclipse)?
You can't. If the JRE can run it, an application can decompile it.
The best you can hope for is to make it very hard to read (replace all symbols with combinations of 'l' and '1' and 'O' and '0', put in lots of useless code and so on). You'd be surprised how unreadable you can make code, even with a relatively dumb translation tool.
This is called obfuscation and, while not perfect, it's sometimes adequate.
Remember, you can't stop the determined hacker any more than the determined burglar. What you're trying to do is make things very hard for the casual attacker. When presented with the symbols O001l1ll10O, O001llll10O, OO01l1ll10O, O0Ol11ll10O and O001l1ll1OO, and code that doesn't seem to do anything useful, most people will just give up.
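As a toy illustration of that "relatively dumb translation tool", here is how one might generate such confusable identifiers; a real obfuscator of course rewrites the parsed program rather than doing textual replacement:

    # Generate visually confusable identifiers out of O/0/l/1.
    import itertools

    def confusable_names(length=10):
        for combo in itertools.product("O0l1", repeat=length):
            yield "O" + "".join(combo)   # leading letter keeps it a valid identifier

    names = confusable_names()
    mapping = {ident: next(names)
               for ident in ["playerScore", "updateBoard", "isGameOver"]}
    print(mapping)   # three near-identical 11-character names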
First, you can't avoid people reverse engineering your code. The JVM bytecode has to be plain to be executed, and there are several programs to reverse engineer it (the same applies to the .NET CLR). You can only make it progressively more difficult, raising the barrier (i.e. the cost) to see and understand your code.
The usual way is to obfuscate the source with some tool. Classes, methods and fields are renamed throughout the codebase, even with invalid identifiers if you choose to, making the code next to impossible to comprehend. I had good results with JODE in the past. After obfuscating, use a decompiler to see what your code looks like...
Next to obfuscation, you can encrypt your class files (all but a small starter class) with some method and use a custom class loader to decrypt them. Unfortunately, the class loader class can't be encrypted itself, so people might figure out the decryption algorithm by reading the decompiled code of your class loader. But the window to attack your code gets smaller. Again, this does not prevent people from seeing your code; it just makes it harder for the casual attacker.
You could also try to convert the Java application to a Windows EXE, which would hide the fact that it's Java at all (to some degree), or really compile it into machine code, depending on your need for JVM features. (I did not try this.)
GCJ is a free tool that can compile to either bytecode or native code. Keep in mind that this does sort of defeat the purpose of Java.
A little late, I know, but the answer is no.
Even if you write in C and compile to native code, there are disassemblers / debuggers which will allow people to step through your code. Granted - debugging optimized code without symbolic information is a pain - but it can be done; I've had to do it on occasion.
There are steps that you can take to make this harder - e.g. on Windows you can call the IsDebuggerPresent API in a loop to see if somebody is debugging your process; if so, and it is a release build, terminate the process. Of course a sufficiently determined attacker could intercept your call to IsDebuggerPresent and always return false.
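For what it's worth, the same check is reachable from higher-level languages too. A minimal, Windows-only sketch using Python's ctypes; the loop structure is illustrative:

    # Windows-only: poll the real kernel32 IsDebuggerPresent API.
    # As noted above, a determined attacker can simply patch this out.
    import ctypes
    import sys
    import time

    kernel32 = ctypes.windll.kernel32   # raises on non-Windows platforms

    while True:
        if kernel32.IsDebuggerPresent():
            sys.exit(1)   # bail out if a debugger is attached
        time.sleep(5)
        # ... normal application work would happen here ...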
A whole variety of techniques has cropped up between people who want to protect something and people who are out to crack it wide open; it is a veritable arms race! Once you go down this path, you will have to constantly keep updating and upgrading your defenses; there is no stopping.
This is not my practical solution, but here, I think, is a good collection of resources and tutorials for making it happen to the highest level of satisfaction.
A suggestion from this website (the Oracle community):
1. (Clean way) Obfuscate your code. There are many open source and free obfuscator tools; here is a simple list of them: [Open source obfuscators list]. These tools make your code unreadable (though you can still decompile it) by changing names. This is the most common way to protect your code.
2. (Not so clean way) If you have a specific target platform (like Windows), or you can have different versions for different platforms, you can write a sophisticated part of your algorithms in a low-level language like C (which is very hard to decompile and understand) and use it as a native library in your Java application. It is not clean, because many of us use Java for its cross-platform abilities, and this method erodes that ability.
And this one below is a step-by-step guide:
ProtectYourJavaCode
Enjoy!
Keep adding your solutions; we need more of this.

Can you disallow a Common Lisp script called from Common Lisp to call specific functions?

Common Lisp allows you to execute/compile code at runtime. But I thought that for some (scripting-like) purposes it would be good if one could disallow a user script from calling certain functions (especially for application extensions). One could still ask the user whether to allow an extension to access files/... I'm thinking of something like the Android permission system for Common Lisp. Is this possible without rewriting the evaluation code?
The problem I see is that in Common Lisp you would probably want a script to be able to use reader macros and normal macros, and for the latter, operators like intern; but those would allow you to get arbitrary symbols (by string manipulation and interning), so simply scanning the code before evaluation won't suffice to ensure that specific functions aren't called.
So, is there something like a lock for functions? I thought of using fmakunbound / makunbound (and keeping the values in a local variable), but would that be possible in a multi-threaded environment?
Thanks in advance.
This is not part of the Common Lisp specification and there is no Common Lisp implementation that is extended to make this kind of restriction easy.
It seems to me like it would be easier to use operating system restrictions (e.g. rlimit, capabilities, etc) to enforce what you want on the Common Lisp process.
This is not an unusual desire, i.e. to run untrusted 3rd party code in a sandbox.
You can hand-craft a sandbox by creating a custom parser and interpreter for your scripting language. It is pedantic, but true, that any program with an API is providing such a service. API designers and implementors need to worry about the vile users.
You can still call eval or the compiler to run your sandboxed scripts. It just means you need to ensure that your reader, parser and language decline to provide access to any risky functionality.
You can use a Lisp package to create a good sandbox. You can still use s-expressions for your scripting language's syntax, but you must cripple the standard reader so the user can't escape the sandbox package. You can still use the evaluator and the compiler, but you need to be sure the package you have boxed the user into contains no functionality he can use to do inappropriate things.
Successful sandbox design and construction is easier when you start with an empty sandbox and slowly add functionality. Common Lisp is a big language, and that creates a huge surface for an attacker to poke at. So if you create a sandbox out of a package, it's best to start with an empty package and add functions one at a time, thinking through what risks each one creates. The same approach is good when creating your crippled reader: don't start with the full reader and throw things away; start with a useless reader and add things. Sadly, taking that advice creates a pretty significant cost to getting started. But if you look around, I suspect you can find an existing safe reader.
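The start-empty-and-whitelist principle isn't Lisp-specific. As a loose illustration in Python rather than Lisp (and note that Python's eval is notoriously escapable, so this shows only the design approach, not a safe sandbox):

    # Start with an empty environment, then whitelist one name at a time.
    ALLOWED = {
        "__builtins__": {},   # start with nothing...
        "abs": abs,           # ...then add functions one by one,
        "min": min,           # thinking through the risk each one adds
        "max": max,
    }

    def run_script(source):
        return eval(source, ALLOWED, {})

    print(run_script("max(abs(-3), min(2, 5))"))   # -> 3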
Xach's suggestion is another way to go, and in many cases more straightforward.

Is it or should it be possible to modify the GUI of an application after it's compiled?

I'm a Linux user, and I have been very hesitant to use Glade to design GUIs, since the xml files it produces can easily be modified. I know it doesn't sound like a major issue, but what if it's a commercial app that you just don't want people changing?
I use Mac OS X every once in a while, and I figured out that it uses files called ".nib"s for GUIs. I think they're essentially the same type used in NeXTSTEP and OpenStep (there's even a Linux app which lets you edit these files). Anyway, these files are included in the application bundle and, according to some people, are completely editable. This person claims he even successfully edited Keynote's interface.
Now, why would that be possible? Is it completely okay for the end user to change the interface? Or is it better to have the GUI directly in the compiled application code, like traditional GTK apps?
OS X nib files are one option; the other option is to do things programmatically. On Android, XML files can define the GUI, or program code can do it. In Windows WPF, the UI is defined in XAML, an XML-based format. Firefox/Mozilla? XUL, another XML-based UI language.
Most modern GUI toolkits offer both of these options, or even only support defining UIs in files.
But even binaries are modifiable. With a good binary reverse engineering tool, it's wide open. The only way to be really certain is to do what Apple did with iOS and run signed code: the entire bundle is signed by a key and can't be run if modified.
For most people, this isn't a problem. Why do you care if the UI is modified? The underlying code isn't, so functionality can't be added or modified.
As a corollary (and a little off-topic) something that you might have a valid concern about is stuff a little more like this.
I don't really see a problem with it. If a user messes up his UI, then it's his problem. Think of it like moddable games. Users always loved them, and in the end, most games benefit from it. There is usually nothing secret about an application's user interface. If there is, you could always do some sort of encryption.
As others have said, you can also add checksums if you just want to disallow editing.
The XML specifies little more than what the interface looks like. Without the compiled-in event handling code, it's pretty much useless. My opinion is that customers change it at their own risk, and you might actually get some free useful improvements out of their hacks.
If you're really paranoid about people changing it, you could always add an MD5 digest verification step or something when you load the XML, or compile the XML string into a header file, but that defeats many of the benefits.
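A minimal sketch of that digest-verification step; the file name and expected hash below are placeholders, and in practice the expected value would be computed at build time and baked into the binary:

    # Refuse to load a UI definition whose digest doesn't match.
    import hashlib
    import sys

    EXPECTED_MD5 = "d41d8cd98f00b204e9800998ecf8427e"   # placeholder value

    with open("interface.glade", "rb") as f:
        actual = hashlib.md5(f.read()).hexdigest()

    if actual != EXPECTED_MD5:
        sys.exit("UI definition has been modified; refusing to load it.")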
The theming engine can make substantial-looking changes to your GUI, as can tools like Parasite. Updating the Glade layout — at their own risk — is much safer than either of those.
What's wrong with users customizing the UI anyway?

How to build a Linux system from kernel to UI layer

I have been looking into the MeeGo, Maemo, and Android architectures.
They all have a Linux kernel, build some libraries on it, then build middle-layer libraries (e.g. telephony, media, etc.).
Suppose I want to build my own system: the Linux kernel with some binaries like glibc and D-Bus, and a UI toolkit like GTK+ with its binaries.
I want to compile every project from source to customize my own Linux system for desktop, netbook and handheld devices. [starting with the netbook first :)]
How can I build my own customized system from kernel to UI?
I apologize in advance for a very long-winded answer to what you thought would be a very simple question. Unfortunately, piecing together an entire operating system from many different bits in a coherent and unified manner is not exactly a trivial task. I'm currently working on my own Xen-based distribution; I'll share my experience thus far (beyond Linux From Scratch):
1 - Decide on a scope and stick to it
If you have any hope of actually completing this project, you need to write an explanation of what your new OS will be and do, once it's completed, in a single paragraph. Print that out and tape it to your wall, directly in front of you. Read it, chant it, practice saying it backwards and do whatever else may help you keep it directly in front of any urge to succumb to feature creep.
2 - Decide on a package manager
This may be the single most important decision that you make. You need to decide how you will maintain your operating system with regard to updates and new releases, even if you are the only subscriber. Anyone, including you, who uses the new OS will surely find a need to install something that was not included in the base distribution. Even if you are pushing out an OS to power a kiosk, it's critical for all deployments to keep themselves up to date in a sane and consistent manner.
I ended up going with apt-rpm because it offered the flexibility of the popular .rpm package format while leveraging apt's known sanity when it comes to dependencies. You may prefer using yum, apt with .deb packages, slackware style .tgz packages or your own format.
Decide on this quickly, because it's going to dictate how you structure your build. Keep track of dependencies in each component so that it's easy to roll packages later.
3 - Re-read your scope then configure your kernel
Avoid the kitchen sink syndrome when making a kernel. Look at what you want to accomplish and then decide what the kernel has to support. You will probably want full gadget support, compatibility with file systems from other popular operating systems, security hooks appropriate for people who do a lot of browsing, etc. You don't need to support crazy RAID configurations, advanced netfilter targets and minixfs, but wifi better work. You don't need 10GBE or infiniband support. Go through the kernel configuration carefully. If you can't justify including a module by its potential use, don't check it.
Avoid pulling in out-of-tree patches unless you absolutely need them. From time to time, people come up with new scheduling algorithms, experimental file systems, etc. It is very, very difficult to maintain a kernel that pulls from anything but mainline.
There are exceptions, of course, such as when going out of tree is the only way to meet one of the goals stated in your scope. Just remain conscious of how much additional work you'll be making for yourself in the future.
4 - Re-read your scope then select your base userland
At the very minimum, you'll need a shell, the core utilities and an editor that works without a window manager. Paying attention to dependencies will tell you that you also need a C library and whatever else is needed to make the base commands work. As Eli answered, Linux From Scratch is a good resource to check. I also strongly suggest looking at the LSB (Linux Standard Base); this is a specification that lists common packages and components that are 'expected' to be included with any distribution. Don't follow the LSB as a standard; compare its suggestions against your scope. If the purpose of your OS does not necessitate inclusion of something, and nothing you install will depend on it, don't include it.
5 - Re-read your scope and decide on a window system
Again, referring to the everything-including-the-kitchen-sink syndrome, try to resist the urge to just slap a stock install of KDE or GNOME on top of your base OS and call it done. Another common pitfall is to install a full-blown version of either and work backwards by removing things that aren't needed. For the sake of sane dependencies, it's really better to work on this from the bottom up rather than the top down.
Decide quickly on the UI toolkit that your distribution is going to favor and get it (with supporting libraries) in place. Define UI consistency quickly and stick to it. Nothing is more annoying than having 10 windows open that behave completely differently as far as controls go. When I see this, I diagnose the OS with multiple personality disorder and want to medicate its developer. There was just an uproar regarding Ubuntu moving window controls around, and they were doing it consistently... the inconsistency was the behavior changing between versions. People get very upset if they can't immediately find a button or have to increase their mouse mileage.
6 - Re-read your scope and pick your applications
Avoid kitchen sink syndrome here as well. Choose your applications not only based on your scope and their popularity, but on how easy they will be for you to maintain. It's very likely that you will be applying your own patches to them (even simple ones, like messengers updating a blinking light on the toolbar).
It's important to keep every architecture that you want to support in mind as you select what to include. For instance, if Valgrind is your best friend, be aware that you won't be able to use it to debug issues on certain ARM platforms.
Pretend you are a company and you will be its employee. Does your company pass the Joel test? Consider a continuous integration system like Hudson as well. It will save you lots of hair-pulling as you progress.
As you begin unifying all of these components, you'll naturally be establishing your own SDK. Document it as you go, and avoid breaking it on a whim (refer to your scope, always). It's perfectly acceptable to just let Linux be Linux, which turns your SDK more into formal guidelines than anything else.
In my case, I'm rather fortunate to be working on something that is designed strictly as a server OS. I don't have to deal with desktop caveats and I don't envy anyone who does.
7 - Additional suggestions
These are in random order, but noting them might save you some time:
Maintain patch sets for every line of upstream code that you modify, in numbered sequence. An example might be 00-make-bash-clairvoyant.patch; this allows you to maintain patches instead of entire forked repositories of upstream code. You'll thank yourself for this later.
If a component has a testing suite, make sure you add tests for anything that you introduce. It's easy to just say "great, it works!" and leave it at that; keep in mind that you'll likely be adding even more later, which may break what you added previously.
Use whatever version control system is in use by the authors when pulling in upstream code. This makes merging of new code much, much simpler and shaves hours off of re-basing your patches.
Even if you think upstream authors won't be interested in your changes, at least alert them to the fact that they exist. Coordination is essential, even if you simply learn that a feature you just put in is already in planning and will be implemented differently in the future.
You may be convinced that you will be the only person to ever use your OS. Design it as though millions will use it; you never know. This kind of thinking helps avoid kludges.
Don't pull upstream alpha code, no matter what the temptation may be. Red Hat tried that, it did not work out well. Stick to stable releases unless you are pulling in bug fixes. Major bug fixes usually result in upstream releases, so make sure you watch and coordinate.
Remember that it's supposed to be fun.
Finally, realize that rolling an entire from-scratch distribution is exponentially more complex than forking an existing distribution and simply adding whatever you feel it lacks. You need to reward yourself often by booting your OS and actually using it productively. If you get too frustrated, consistently confused, or find yourself putting off work on it, consider making a lightweight fork of Debian or Ubuntu. You can then go back and duplicate it entirely from scratch. It's no different from prototyping an application in a simpler / rapid language first before writing it for real in something more difficult. If you want to go this route (first), gNewSense offers utilities to fork your own OS directly from Ubuntu. Note that, by default, their utilities will strip any non-free bits (including binary kernel blobs) from the resulting distro.
I strongly suggest going the completely-from-scratch route (first) because the experience you will gain is far greater than making yet another fork. However, it's also important that you actually complete your project. "Best" is subjective; do what works for you.
Good luck on your project, see you on distrowatch.
Check out Linux From Scratch:
Linux From Scratch (LFS) is a project that provides you with step-by-step instructions for building your own customized Linux system entirely from source.
Use Gentoo Linux. It is a compile-from-source distribution and very customizable. I like it a lot.

Using multiple languages in one project

From discussions I've had about language design, it seems like a lot of people make the argument that there is not and will never be "one true language". The alternative, according to these people, is to be familiar with several languages and to pick the right tool for the job. This makes perfect sense at the level of a whole project or a large subproject that only has to interact with the rest of the project through a very narrow, well-defined interface.
On the other hand, using lots of different languages seems like a very awkward thing to do when trying to solve lots of small subproblems elegantly. In other words, IMHO, general purpose languages that are decent at everything still matter. As a trivial example, let's say you need to do the following:
Read a bunch of data in some arbitrary format from a file. Check it for errors, etc. (Best done in something like Perl).
Load this data into matrices, do a bunch of hardcore matrix ops on it (Best done in something like Matlab).
Run a custom, computationally intensive routine on it that must be fast and space-efficient (Best done in C or C++).
This is a fairly simple project, other than writing the computationally intensive custom matrix processing routine, yet the only good answer about what language to use seems to be a general-purpose one that's decent at everything.
What am I missing here? How does one use multiple languages effectively to take advantage of each of their strengths?
I have worked on many projects that contain a fairly diverse mix of languages. Unless it's a .NET project, you usually use these different languages at different tiers or in different processes. Maybe your webapp is in PHP and your application server in Java. So you do not really "mix and match" at the method level.
In .NET, and for some of the Java VM languages, the rules change a bit, since you can mix much more freely. But the features of these languages are mostly defined by the class libraries, which are common. So the motivation for switching languages in .NET is usually driven by other factors, such as which language the developers know. F# actually provides quite a few language features that are specific to that language, so it seems to be a bit of an exception within .NET. Some of the Java VM languages also add methods to the standard Java libraries, adding features not available in Java.
You do actually get quite used to working with multiple languages as long as all of them have good IDE support. Without that I really think I would be lost.
Embedded languages like Lua make this pretty easy. Lua is a great dynamic language along the lines of Ruby or Python and allows you to quickly develop high quality code. It also has tight integration with C, which means you can take advantage of C/C++ libraries and optimize performance critical sections by writing them in C or C++.
In scientific computing it's also not uncommon to have a script that, for example, will do some data processing in Matlab, use Perl to reformat the output, and then pass that into another app written in Matlab, C or whatever. This tends to be more common when integrating apps that have been written by different people than when developing something from scratch, though.
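A sketch of what such a glue script might look like; every tool and file name here is hypothetical, with each stage being a separate program written in whatever language suited it best:

    # Glue script chaining three stages written in different languages.
    import subprocess

    # Stage 1: parse and validate the raw input (e.g. a Perl script).
    subprocess.run(["perl", "parse_input.pl", "raw.dat", "clean.csv"], check=True)

    # Stage 2: heavy matrix work (e.g. a batch-mode Matlab/Octave script).
    subprocess.run(["octave", "--no-gui", "transform.m"], check=True)

    # Stage 3: the performance-critical routine, compiled from C.
    subprocess.run(["./crunch", "matrices.bin", "result.bin"], check=True)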
You're saying that the choices are either a) write everything in one language and lose efficiency, or b) cobble together a bunch of different executables in order to use the best language for each section of the job.
Most working programmers pick option c: do the best they can with the languages their IT departments support in production. Generally programmers have some limited language choice within a particular framework such as the JVM or CLR - so it's neither the toolbox utopia nor the monolanguage ghetto.
Even LAMP and Rails support (demand, really) different languages at different levels - HTML, JavaScript, Ruby, C in the case of Rails. If you're writing software services (which is where most of the interesting work is happening these days) then you're hardly ever writing in just one language. But your choices aren't infinite, either.
Consider the most difficult part of your task. Is parsing the file the most complex part? A good language for reading files, and then manipulating them might be the most pertinent choice.
Are the math transformations the most difficult part? You might want to suck it up and use Matlab, feeding it with scripts or other home-grown tools.
Is performance the most important part? How much calculation will be done versus the transformations and parsing? If 90% of the work done will be on calculation, and the other parts are only transitory, then consider a C/C++ solution.
There's nothing inherently wrong with playing to multiple languages' strengths in a simple solution, especially if you don't mind gluing it together with scripts. However, each language and module you integrate adds dependencies, and the more dependencies you have, the more difficult it becomes to distribute and maintain your app. That's where the real trade-off lies: the fewer technologies, libraries, and bits of software you need to run the app, the simpler it becomes. The simpler the app, the more maintainable, distributable, debuggable, and usable it becomes.
If you don't mind the extra overhead of integrating the multiple pieces of technology, or that overhead will save you an equal or greater amount of development time, go right ahead.
I think it's fine. If you set it up as an n-tier setup where each layer accesses the one below it the same way (SOAP/XML/COM, etc.), I have found it to be invisible in the end.
For your example, I would choose one language for the front end that is easy, quick, flexible, and can make many, many types of calls to different interfaces such as COM, CORBA, XML, .NET, custom DLL libraries, FTP, and custom event gateways. This ensures that you can call, or interface with, any other language easily.
My tool of choice for this is ColdFusion, from working with ASP/PHP/Java and a bit of Ruby. It seems to have it all in one place and is very simple. For your specific case, ColdFusion will allow you to compile your own library tag that you can call in your web code. Inside the tag you can put all the C code you like and make it do whatever you want.
There are free and open source versions of ColdFusion available, and it is fantastic for making Java and .NET calls natively from one code base. Check it out; I really feel it might be the best fit, as it's made for these kinds of corporate solutions.
I know PHP can be pretty good for this too, depending on your situation, and I'm sure ASP.NET isn't too far behind.
My work in multiple languages has involved just a simple mix of C# and C++/CLI (that's like C++ with .NET, if you haven't heard of it). I do think it's worth saying that the way .NET and Visual Studio allow you to combine many different languages means that you won't have as high an overhead as you would normally expect when combining two languages; Visual Studio keeps it all together. Given that there are many .NET clones of various languages (F#, IronPython, IronRuby, JScript.NET, for example), Visual Studio is quite a good way to combine various languages.
Down to the technical level which sounds like what you're interested in, the way it's done in Visual Studio is you must have one project for each separate language, and each project gets compiled into an assembly - basically meaning it becomes a library for the other projects to use. You'll have one main project which drives everything and calls the other libraries. The requirement of each different language having to be a different project means you wouldn't change languages at the method level, but if you are doing a large enough application that you can logically separate it into multiple projects, it can actually be quite useful. In particular, the separation and abstraction can make things easier overall.
