Safe execution of untrusted Haskell code

Safe execution of untrusted Haskell code - haskell

I'm looking for a way to run an arbitrary Haskell code safely (or refuse to run unsafe code).
Must have:
module/function whitelist
timeout on execution
memory usage restriction
Capabilities I would like to see:
ability to kill thread
compiling the modules to native code
caching of compiled code
running several interpreters concurrently
complex datatype for compiler errors (insted of simple message in String)
With that sort of functionality it would be possible to implement a browser plugin capable of running arbitrary Haskell code, which is the idea I have in mind.
EDIT: I've got two answers, both great. Thanks! The sad part is that there doesn't seem to be ready-to-go library, just a similar program. It's a useful resource though. Anyway I think I'll wait for 7.2.1 to be released and try to use SafeHaskell in my own program.

We've been doing this for about 8 years now in lambdabot, which supports:
a controlled namespace
OS-enforced timeouts
native code modules
caching
concurrent interactive top-levels
custom error message returns.
This series of rules is documented, see:
Safely running untrusted Haskell code
mueval, an alternative implementation based on ghc-api
The approach to safety taken in lambdabot inspired the Safe Haskell language extension work.
For approaches to dynamic extension of compiled Haskell applications, in Haskell, see the two papers:
Dynamic Extension of Typed Functional Languages, and
Dynamic applications from the ground up.

GHC 7.2.1 will likely have a new facility called SafeHaskell which covers some of what you want. SafeHaskell ensures type-safety (so things like unsafePerformIO are outlawed), and establishes a trust mechanism, so that a library with a safe API but implemented using unsafe features can be trusted. It is designed exactly for running untrusted code.
For the other practical aspects (timeouts and so on), lambdabot as Don says would be a great place to look.

Related

Why are asynchronous runtimes like Tokio necessary?

My first experience doing a computer system project was building a server using vanilla Java and then a client on an Android phone. Since then, I've found that there are a lot of frameworks to help manage scalability and remove the need to write boilerplate code.
I'm trying to understand what services like Tokio and Rayon enable.
I came across this paragraph on the Tokio tutorial page and I'm having a hard time understanding it
When you write your application in an asynchronous manner, you enable it to scale much better by reducing the cost of doing many things at the same time. However, asynchronous Rust code does not run on its own, so you must choose a runtime to execute it.
I first thought a "runtime" might refer to where the binary can run, but it looks like Tokio just provides functions that are already available in the Rust standard library while Rayon implements functions that aren't in the standard library.
Are the standard implementations for asynchronous functions written poorly in the standard library or am I not understanding what service Tokio is providing?

Rust currently does not provide an async runtime in the standard library. For full details, see Asynchronous Programming in Rust, and particularly the chapter on "The Async Ecosystem."
Rust currently provides only the bare essentials for writing async code. Importantly, executors, tasks, reactors, combinators, and low-level I/O futures and traits are not yet provided in the standard library. In the meantime, community-provided async ecosystems fill in these gaps.
Rust has very strict backward compatibility requirements, and they haven't chosen to lock-in a specific runtime. There are reasons to pick one over another (features versus size for example), and making it part of the standard library would impose certain choices that aren't clearly the right ones for all projects. This may change in the future as the community projects better explore this space and help determine the best mix of choices without the strong backward compatibility promises.

Elixir and Haskell interoperability

As a platform for handling concurrent problems, Elixir/OTP seems to be the best suited solution.
When writing an application with a web interface, consider the case in which I want to reason about, and decouple, the application logic using another functional language - namely haskell (due to benefits like its advanced detection of errors at compile time, static typing, etc). I would then handle concurrency using GenServers, and attach a web interface using Phoenix.Channels.
Is this setup even possible using NIFs? Also, would true concurrency be maintained? I'm not sure that I'm following the correct line of reasoning here, but would a new haskell process be able to be spawned in line with GenServer demands, and would the two be able communicate efficiently?

This setup is certainly possible using NIFs and GHC's FFI with a small amount of boilerplate written in C. But NIFs are best used for short synchronous computations with no side effects and I get the feeling that that isn't what these operations are.
You'd probably be better off with C Nodes for the Haskell parts of the application. Most of the documentation you'll find for that will be for Erlang and not Elixir, but given Elixir's easy interop with Erlang, it should be pretty straight forward (someone's even written an example). Most of the hard work will be to do with writing a Haskell "C Node", for which a cursory glance at hackage and github turns up nothing.

How reasonable/possible/difficult is it to write a tsocks clone in Haskell

I'm a reasonably competent programmer who knows haskell, but who hasn't used it in any major projects. I know enough about c and systems and network programming that I believe I can pick apart tsocks from the source code.
I don't have any experience with the low-level systems interfaces haskell provides. I'm looking for any advice people can offer me on the topic, including, "Don't do it; you'll hate yourself for it," provided there is an explanation.

I really wouldn't do this, except as an experiment. I'm a Haskell guy, but not a deep systems guy, so there's a caveat there. But nonetheless, I see the following on the tsocks page:
tsocks is based on the 'shared library
interceptor' concept. Through use of
the LD_PRELOAD environment variable or
the /etc/ld.so.preload file tsocks is
automatically loaded into the process
space of every executed program. From
there it overrides the normal
connect() function by providing its
own. Thus when an application calls
connect() to establish a TCP
connection it instead passes control
to tsocks. tsocks determines if the
connection needs to be made via a
SOCKS server (by checking
/etc/tsocks.conf) and negotiates the
connection if so (through use of the
real connect() function )
It is possible to call Haskell from C, and vice-versa. And its relatively easy, in fact. For shared libraries, see this: http://www.haskell.org/ghc/docs/6.12.1/html/users_guide/using-shared-libs.html.
But when you invoke Haskell from C, you need to A) link in the runtime and B) invoke the runtime.
So that works when the C knows that its calling Haskell. But its relatively trickier when the C doesn't know that it's calling Haskell, and so you'd need to wrap the Haskell shared library with a C library that invoked and managed the runtime transparently to the program that is preloading the haskell-tsocks library to intercept its normal connect functions.
So I'm sure this can be done -- but it sounds rather painful and complicated, and somewhat expensive in terms of having to link the whole ghc runtime in for this one feature. And frankly, I imagine the code you'd be writing (I haven't inspected the tsocks code itself yet) would largely be FFI calls anyway.
So a Haskell implementation of some element of socks -- a proxy, a client, etc. sounds interesting and potentially useful. But the exact preload magic that tsocks does sounds like a perhaps poor fit.
Bear in mind that there are Haskell hackers that are much better at this stuff than me, more knowledgeable, and more experienced. So they might say otherwise.

(Posting as a separate answer, since this is advice unrelated to the FFI)
You probably know this stuff, but in case its useful for anyone...
Read up on the Network.Socket module
Search the Haskell Wiki for pages that might help you (like Applications and libraries/Network)
Check out System Programming in Haskell and other chapters from RWH
Ignore the people that say "Haskell is terrible for I/O" - protip: you can just scare them away by saying fancy words like "endofunctor"

This may not be exactly the answer you were looking for, but instead of re-writing it in Haskell, you could just use the Foreign Function Interface to wrap the already-existing C implementation in Haskell types.
Note, one of the few major changes in Haskell 2010 was to officially include the FFI as a language feature. Link: Haskell 2010 FFI

making standalone toplevels with OCaml and Haskell

In Common Lisp, programs are often produced as binaries with a translator bundled inside. StumpWM is a good example.
How would one do the same with Haskell and OCaml?
It is not necessary to provide a debugger as well, as Common Lisp does, the aim is to make extensions while not depending on the whole translator package ( xmonad which requires GHC ).
P.S. I know about ocamlmktop, and it works great, except I don't really get why it requires "pervasives.cmi" and doesn't bundle it with the binary. So, best thing I can do is mycustomtoplevel -I /path/to/dir/with/pervasives.cmi/. Any way to override it?

This isn't really possible for (GHC) Haskell - you would either need to to ship the application binary + GHC so you can extend via GHC-API, or embed an extension language. I don't think there are any "off-the-shelf" extension languages to embed in Haskell at the moment, though HsLua might be close. This is a bridge to the the standard (C source) Lua. There was a thread on Haskell-cafe last month about extension languages written in Haskell, I think the answer was 'there aren't any'.
http://www.haskell.org/pipermail/haskell-cafe/2010-November/085830.html

With GHC, there is GHC-API, which allows you to embed ghci-like interpreters in your program. It's a quite low-level and often changing library, since it simply provides access to GHC internas.
Then, there is Hint, a library which aims to encapsulate ghc-api behind a well designed and more stable interface.
Nevertheless, I've recently switched from using either of these packages to using an external ghci. The external ghci process is controlled via standard input/output pipes. This change made it easy to stay compatible with GHC 6.12.x and 7.0.x, while our ghc-api code broke with GHC 7.x and hint didn't work out of the box either. I don't know whether there is a new version of hint available, which works with GHC 7.

For Ocaml, have you tried using findlib? See the section Custom Toploops.

The right language for OpenGL UI prototyping. Ditching Python

So, I got this idea that I'd try to prototype an experimental user interface using OpenGL and some physics. I know little about either of the topics, but am pretty experienced with programming languages such as C++, Java and C#. After some initial research, I decided on using Python (with Eclipse/PyDev) and Qt, both new to me, and now have four different topics to learn more or less simultaneously.
I've gotten quite far with both OpenGL and Python, but while Python and its ecosystem initially seemed perfect for the task, I've now discovered some serious drawbacks. Bad API documentation and lacking code completion (due to dynamic typing), having to import every module I use in every other module gets tedious when having one class per module, having to select the correct module to run the program, and having to wait 30 seconds for the program to start and obscure the IDE before being notified of many obvious typos and other mistakes. It gets really annoying really fast. Quite frankly, i don't get what all the fuzz is about. Lambda functions, list comprehensions etc. are nice and all, but there's certainly more important things.
So, unless anyone can resolve at least some of these annoyances, Python is out. C++ is out as well, for obvious reasons, and C# is out, mainly for lack of portability. This leaves Java and JOGL as an attractive option, but I'm also curious about Ruby and Groovy. I'd like your opinion on these and others though, to keep my from making the same mistake again.
The requirements are:
Keeping the hell out of my way.
Good code completion. Complete method signatures, including data types and parameter names.
Good OpenGL support.
Qt support is preferable.
Object Oriented
Suitable for RAD, prototyping
Cross-platform
Preferably Open-Source, but at least free.

It seems you aren't mainly having a problem with Python itself, but instead with the IDE.
"Bad API documentation"
To what API? Python itself, Qt or some other library you are using?
"lacking code completion (due to dynamic typing)"
As long as you are not doing anything magic, I find that PyDev is pretty darn good at figuring these things out. If it gets lost, you can always typehint by doing:
assert isinstance(myObj, MyClass)
Then, PyDev will provide you with code completion even if myObj comes from a dynamic context.
"having to import every module I use in every other module gets tedious when having one class per module"
Install PyDev Extensions, it has auto-import on the fly. Or collect all your imports in a separate module and do:
from mymodulewithallimports import *
"having to select the correct module to run the program"
In Eclipse, you can set up a default startup file, or just check "use last run configuration". Then you never have to select it again.
"before being notified of many obvious typos and other mistakes"
Install PyDev Extensions, it has more advanced syntax checking and will happily notify you about unused imports/variables, uninitialized variables etc.

Looking just at your list I'd recommend C++; especially because Code Completion is so important to you.
About Python: Although I have few experience with OpenGL programming with Python (used C++ for that), the Python community offers a number of interesting modules for OpenGL development: pyopengl, pyglew, pygpu; just to name a few.
BTW, your import issue can be resolved easily by importing the modules in the __init__.py files of the directory the modules are contained in and then just importing the "parent" module. This is not recommended but nonetheless possible.

I don't understand why nobody has heard of the D programing language?
THIS IS THE PERFECT SOLUTION!!!!

The only real alternative if you desire all those things is to use Java, but honestly you're being a bit picky about features. Is code completion really that important a trait? Everything else you've listed is traditionally very well regarded with Python, so I don't see the issue.

The text editor (not even an IDE) which I use lets you import API function definitions. Code completion is not a language feature, especially with OpenGL. Just type gl[Ctrl+I] and you'd get the options.

I tried using Java3D and java once. I realized Java3D is a typical Java API... lots of objects to do simple things, and because it's Java, that translates to a lot of code. I then moved to Jython in Eclipse to which cleaned up the code, leaving me with only the complexity of Java3D.
So in the end, I went in the opposite direction. One advantage this has over pure python is I can use Java with all of Eclipse's benefits like autocomplete and move it over to python when parts get unwieldy in Java.

It seems like Pydev can offer code completion for you in Eclipse.

I started off doing OpenGL programming with GL4Java, which got migrated to JOGL and you should definately give it (JOGL) a try. Java offers most of the features you require (plus Eclipse gives you the code completion) and especially for JOGL there are a lot of tutorials out there to get you started.

Consider Boo -- it has many of Python's advantages while adopting features from elsewhere as well, and its compile-time type inference (when variables are neither explicitly given a specific type or explicitly duck typed) allows the kind of autocompletion support you're asking about.
The Tao.OpenGL library exposes OpenGL to .NET apps (such as those Boo compiles), with explicit support for Mono.
(Personally, I'm mostly a Python developer when not doing C or Java, but couldn't care less about autocompletion... but hey, it's your question; also, the one-class-per-module convention seems like a ridiculous amount of pain you're putting yourself through needlessly).

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string