Running (& compiling) untrusted user code - linux

I want to create a application that contains a feature that allows users to submit code and the server will compile and run it, similar to Ideone & Spoj. How do I do this securely in a scalable manner?
Partial Solutions I'm aware of:
IDEA 1 - 3rd Party Services
The Sphere Engine. However this costs a LOT of money!
I'm not aware of any open source application I can run on my server to achieve this, or a cheaper alternative. Please correct me if i'm wrong.
IDEA 2 - VM
This would be the next most sensible choice. However, I'm unsure how to implement it. For example let's say I created a VM and started to run the user's code. This would restrict damage on MY system, but not the damage on the VM, which other users would have to use. Does that mean I have to create a new VM each and every time I want to compile and run user's code (which clearly is not scalable - correct me if I'm wrong.

Having not set up a thing, I assumed that services like TravisCI (which compiles code and runs it under test cases you provide), have a base virtual machine image, which boots up and processes your code. The next user to come along gets a separate VM booted from the same base image, your changes aren't stored.
So inside the VM, the user code can do whatever. All of its effects, except stuff written to the console will be erased at the end of the time limit.

Related

Best way to run an untrusted executable file from a user on a server

I'm trying to create a server that compiles and runs code submitted by the user. How can I run the executable output (compiled from code, unreliable) on the server-side, without affecting my codebase (binary executable itself)?
Is docker somehow going to be useful here? If "yes", how?
FYI, I'm using a microservices architecture. And Go for the server-side development of this service (code-runner).
The best way to run an unreliable code is isolating it, we do it a lot when reverse engineering malware. Since you are using Docker there are some precautions you would need to take that are not needed when using virtual machines.
The safest way to do it would be having another Docker container only for compiling, executing and performing any other operation needed.
Let's take the example of an application which the user sends the code and we run it for him comparing it with the expected output, as in programming contests like Codeforces.
The main application receives the code
The main application creates a special Docker container
The main application sends the code to the special container with the file representing the expected output
The special container compiles, executes and print the code output to a file and uses diff output expected_output
The special container sends the diff result (basically a true or false) to the main application
This way all the stuff related to the code sent by the user is isolated so it is safer.
Some notes:
The container running the user code should be running it using a low privileges user for safety
If you need to get the output from the code back and not just a true or false from a diff then you need to be sure that his code is not generating a malware as output which you save in a file and put inside the main application server
Blocking libraries of threads, networking, filesystem access, etc, from the user code, helps to make it safer
[What is the] Best way to run an untrusted executable file from an user on a server?
Not at all.
Sandboxing is hard. Try using the Playground which does exactly what you need.
Or: Reimplement the playground according to your needs. https://blog.golang.org/playground Might help you get started.
Just remember to disallow dangerous packages like unsafe, runtime, os/exec and everything from net.

Use Google App Engine or Google Cloud Compute VM to Test Run My App?

I'm moving my Three.js app and its customized node.js environment, which I've been running on my local machine to Google Cloud. I want to test things out there, and hopefully soon get some early alpha testing going with other people.
I'm not sure which is the wiser way to go... to upload the repo I've been running locally as-is onto a VM which users would then access via the VM's external IP until I get a good name to call this app... or merge my local node.js environment with what's available via the Google App Engine and run it on GAE.
Issues I'm running into with the linux VM approach... I'm not sure how to do the equivalent on the VM of what I've been doing locally. In Windows Powershell I cd into the app directory and then enter node index.js. I'm assuming by this method of deployment that I can get the app running as soon as the browser hits the external IP. I should mention too that the app will allow users to save content as well as upload images, and eventually, 3D models as well as json datasets.
Issues I'm running into with the App Engine approach: it looks like I only have access to a linux-based command line, and have to install all the node.js modules manually. Meanwhile I have a bunch of files to upload, both the server-side node files and all the frontend stuff. I don't see where to upload those files, and ultimately what I'd like to do is have access to a visual, editable file-tree interface, as I have in Windows and FileZilla, so I can swap files in and out, etc. Alternatively I suppose I could import a repo from Github? Github would be fine as long as I can visually see what's happening. Is there a visual interface for file structure available in GAE somewhere? Am I missing something?
I went through the GAE "Hello World" tutorial and that worked fine, but was left scratching my head afterward regarding how to actually see and edit the guts of the tutorial app, or even where to look for the files.
So first off, I want to determine what's the better approach, and then if possible, determine how to make the experience of getting my app up there and running a more visual, user-friendly experience.
Thanks.
There are many things to consider when choosing how to run an app, but my instinct for your use case is to simply use a VM on GCE. The most compelling reason for this is that it's the most similar thing to what you have now. You can SSH into the machine and run nohup node index.js & (or node index.js inside tmux/screen if you prefer) and it will start the app and not stop it when you log out of SSH. You can use SCP / SFTP with whatever GUI client you want to upload files. You don't have to learn anything new! If you wanted to, you could even use a Windows VM (although I think you have to pay a little more than for a comparable Linux VM due to the licensing fees).
That said, the other way is arguably more "correct" by modern development standards, but it will involve a lot more learning that will prevent you from getting your app running somewhere other than your laptop in the short term:
First, you'll need to learn about Docker and stateless containers, which is basically what your app runs inside of on AppEngine.
Next, you'll need to learn how to hook up a separate stateful service (database, file server, ...) to your app's container so you can store your files, etc. in it, and then probably rewrite your app somewhat to use it to store stuff.
Next, you'll probably want some way to automatically deploy this from code instead of manually doing it, which gets you into build systems, package managers, artifact storage, continuous integration systems, and on and on and on.
This latter path is certainly what you should choose for a long-running production service if you work with a big team of developers -- but that doesn't mean that it's necessarily the right path for your project today. If you don't care about scaling up automatically, load balancing between nodes, redundant copies of your app running in different regions in case there's a natural disaster, etc., then go with the easy way for now, and you can learn new ways to improve the service when they're actually needed.

Extract SAS Enterprise Guide into Unix Server runnable batch?

We have built a project in Enterprise Guide for the purpose of creating a easy understandable and maintainable code. The project contain a set of process flows which run should be done in specific order. This project we need to run on a Linux Server machine, where the SAS Metadata Server is running.
Basic idea is to extract this project into SAS code, which we would be able to run from command line in Linux as a batch job.
Question 1:
Is there any other way to schedule a batch job in Linux-hosted SAS Server? I have read about VBS scripting for scheduling/running batch jobs, but in order this to be done on Linux Server, a installation of WINE is required, which on a production machine which already runs a number of other important applications, is almost completely out of question.
Is there a way to specify a complete project export into SAS code, provided that I give the specific order of running process flows? I have tried out ordered list, which is able to make you a list of tasks to run in order (although there is no way to choose a whole process flow as a single task), but unfortunately, this ordered list itself is later not possible to be exported as a SAS code.
Current solution we do is the following:
We export each single process flow of the SAS EG project into SAS code, and then create another SAS code with %include lines to run all the extracted codes in order that we want. This is of course a possible solution, but definitely not the most elegant one.
Question 2:
Since I don't know how exactly the code is being exported afterwards, are there any dangers I should bear in mind with the solution I chose.
Is there any other, more elegant way?
You have a couple of options from what I'm familiar with, plus I suspect if Dom happens by he'll know more. These answers are based on EG 6.1, which is the current version (ships with 9.4); it's possible some of these things may not be true in earlier versions.
First, if you're running Enterprise Guide from Windows, you can schedule the job locally (on any Windows machine with Enterprise Guide). You're not scheduling the server directly, you schedule Windows to launch an EG process that connects to the server and does its magic. That's how I largely interact with scheduling (because I have a fairly 'light' scheduling need).
Second, from the blog post "Four Ways to Schedule SAS Tasks", options 3 and 4 may be helpful for you. The SAS Platform Suite is designed in part for scheduling, and the options using SAS Management Console to schedule via operating system tools, are both very helpful.
Third, you may want to look into SAS Stored Processes, which should be schedulable. A process flow can be converted into a stored process.
For your specific questions:
Question 1: When you export a process flow or a project, at least in 6.1 you have the option to change the order in which the programs are exported. It's manual, so it's probably not perfect, but it does give you that option. (The code seems to be by default in creation order, which is sub-optimal.) The project export does group process flows together, but you don't have the option of manipulating the order of process flows - you have to move each program around, which would be tedious. It also of course gives you less flexibility if you need to multiply run programs.
Question 2: As Stig Eide points out in comments, make sure your System Option LRECL is > 256 (the default) or you run some risk of code being cut off. In 9.2+ this is modifiable; just place LRECL=32767in your config.sas file.

Sandboxing code on a Linux machine

I'm in the process of writing an application in C++ on Linux. The goal is to have it load dynamically linked libraries at run-time and to provide all the services that the libraries require. The main aim is to have it act as a black box where code loaded at run-time can not break out and damage the rest of the system.
I've never done anything like this before and am a little lost of the best method to take. If I load all the dynamically linked libraries under a special process and then use something like SELinux to limit the ability for the central daemon to do anything outside of its requirements would that seem like a reasonable solution?
The reason I ask is that I want to allow people to load code into this container application that then handles all the server side stuff for them, so things such as security, permissions, networking, logging etc are all provided with a simple, clean and cross platform API regardless of the version of UNIX that the container is running on.

Linux: What should I use to run terminal programs based on a calendar system?

Sorry about the really ambiguous question, I really have no idea how to word it though hopefully I can give you more detail here.
I am developing a project where a user can log into a website and book a server to run a game for a specific amount of time. When the time is up the server stops running and the players on the server are kicked off. The website part is not a problem, I am doing this in PHP and everything works. It has a calendar system to book a server and can generate config files based on what the user wants.
My question is what should I use to run the specific game server on the linux box with those config files at the correct time? I have got this working with bash scripts and cron, but it seems very un-elegant. It literally uses FTP to connect to the website so it can download all the necessary config files and put them in a folder for that game and time. I was wondering if there was a better way of doing this. Perhaps writing a program in C, but I am not sure how to go about doing this.
(I am not asking for someone to hold my hand and tell me "write this code here", just some ideas of a better way of approaching this problem)
Thanks so much guys!
Edit: The webserver is a totaly different machine. I would theoreticaly like to have more than one game server where each of them "connects" (at the moment FTP) to the webserver, gets a file saying what it has to do at a specific time and downloads any associated files then disconnects.
I think at is better suited for running one time jobs than cron.
For a better approach for the downloading files etc, you should give more details on your setup (like, the website and the game server, are they on the same machine? Or the same network? etc etc.
You need a distributed task scheduler. With that, you can:
Schedule command "X" to be run at a certain time.
Specify the machine (or ask it to pick a machine from a pool of available machines)
Webserver would send request to this scheduler via command line or via web service when user selects a game server and a time.
You can have a look at : http://www.acelet.com/super/SuperWatchdog/index.html
EDIT :
One more option :http://jobscheduler.sourceforge.net/

Resources