I'm currently offering an assembly compile service for some people. They can enter their assembly code in an online editor and compile it. When they compile it, the code is sent to my server with an AJAX request, gets compiled, and the output of the program is returned.
However, I'm wondering what I can do to prevent any serious damage to the server. I'm quite new to assembly myself, so what is possible when they run their code on my server? Can they delete or move files? Is there any way to prevent these security issues?
Thank you in advance!
Have a look at http://sourceforge.net/projects/libsandbox/. It is designed for doing exactly what you want on a Linux server:
This project provides APIs in C/C++/Python for testing and profiling simple (single-process) programs in a restricted environment, or sandbox. Runtime behaviours of binary executable programs can be captured and blocked according to configurable/programmable policies.
The sandbox libraries were originally designed and utilized as the core security module of a full-fledged online judge system for ACM/ICPC training. They have since then evolved into a general-purpose tool for binary program testing, profiling, and security restriction. The sandbox libraries are currently maintained by the OpenJudge Alliance (http://openjudge.net/) as a standalone, open-source project to facilitate various assignment grading solutions for IT/CS education.
If this is a tutorial service, so the clients just need to test miscellaneous assembly code and do not need to perform operations outside of their program (such as reading or modifying the file system), then another option is to permit only a selected subset of instructions. In particular, do not allow any instructions that can make system calls, and allow only limited control-transfer instructions (e.g., no returns, branches only to labels defined within the user’s code, and so on). You might also provide some limited ways to return output, such as a library call that prints whatever value is in a particular register. Do not allow data declarations in the text (code) section, since arbitrary machine code could be entered as numerical data definitions.
Although I wrote “another option,” this should be in addition to the others that other respondents have suggested, such as sandboxing.
This method is error prone and, if used, should be carefully and thoroughly designed. For example, some assemblers permit multiple instructions on one line. So merely ensuring that the text in the first instruction field of a line was acceptable would miss the remaining instructions on the line.
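As a very rough illustration of the whitelist idea (a sketch only, not a hardened validator), one pass can collect the labels defined in the submission and a second pass can reject any mnemonic outside an allowed set and any branch to a non-local target. The mnemonic list, comment character, and one-instruction-per-line assumption below are arbitrary choices for the example, not properties of any particular assembler:

```python
import re

# Illustrative whitelist: data movement and arithmetic only, no syscall/int,
# no ret, and only direct jumps. Operand checking (memory operands, absolute
# addresses, directives) is deliberately omitted for brevity.
ALLOWED_MNEMONICS = {"mov", "add", "sub", "imul", "cmp", "jmp", "je", "jne", "nop"}
LABEL_RE = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*:$")

def validate(source: str) -> list[str]:
    """Return a list of error messages; an empty list means the source passed."""
    errors = []
    labels = set()
    instructions = []
    for lineno, raw in enumerate(source.splitlines(), start=1):
        line = raw.split(";", 1)[0].strip()      # drop ';' comments (assumed syntax)
        if not line:
            continue
        if LABEL_RE.match(line):                 # a label alone on its line
            labels.add(line[:-1])
            continue
        instructions.append((lineno, line))
    for lineno, line in instructions:
        mnemonic, _, operands = line.partition(" ")
        if mnemonic.lower() not in ALLOWED_MNEMONICS:
            errors.append(f"line {lineno}: instruction '{mnemonic}' is not allowed")
        elif mnemonic.lower().startswith("j"):
            target = operands.strip()
            if target not in labels:             # branches only to local labels
                errors.append(f"line {lineno}: jump target '{target}' is not a local label")
    return errors
```

As noted above, this kind of filter is fragile: multiple instructions per line, assembler directives, macros, and data definitions in the code section all have to be handled, so it should complement sandboxing rather than replace it.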
Compiling and running someone else's arbitrary code on your server is exactly that, arbitrary code execution. Arbitrary code execution is the holy grail of every malicious hacker's quest. Someone could probably use this question to find your service and exploit it this second. Stop running the service immediately. If you wish to continue running this service, you should compile and run the program within a sandbox. However, until this is implemented, you should suspend the service.
You should run the code in a virtual machine sandbox, because if the code is malicious, the sandbox will prevent it from damaging your actual OS. Some virtualization options include VirtualBox and Xen. You could also perform some sort of signature detection on the code to search for known malicious functionality, though any form of signature detection can be beaten.
This is a link to VirtualBox's homepage: https://www.virtualbox.org/
This is a link to Xen: http://xen.org/
Related
I'm currently building a CodingGame-like platform for a school project, and I wondered how I could handle user script validation.
The goal is to have multiple exercises to solve with an expected response.
For example, there's an error to correct in the exercise, and when it's corrected the exercise should output "hello".
I don't want users to be able to simply write "console.log('hello')" in the IDE on the website, if possible :)
The most difficult part is that I don't know how to execute the actual script given (it is sent as text to the API).
I just want to start with Node.js available for the user, no PHP or C...
Saving the script as a local file wouldn't be a good option, I think; that's how we do it for now, but I don't know how coding-test platforms actually do it.
The API is hosted on AWS EC2 Amazon Linux instances.
Thank you for your help :)
This is rather an elaborate business requirement and not a simple question, but here are some points that could help you progress further in your project:
A program is only a program when it is understood by some other program as instructions. Otherwise there's no difference between program.js and program.txt.
Validating and running a program are two different things.
Running a program on your own system vs. allowing anyone to run the same program on your system are entirely different situations in every respect. (This is where the web and its virtues come into play.)
There will always be some risk when letting others provide instructions to your systems; no validation is 100% effective, and even if you check for all destructive commands, there will always be more of them. You'll have to keep this in mind while setting up a virtual machine for the sole purpose of public use (for example, assume that the machine can be lost at any moment, so the architecture should be able to handle such losses, either by spinning up a new machine or by keeping a set of replaceable machines).
Isolation & sandboxing of execution is the key in these kind of requirements. So you can have a look at tools like Docker, which could be a bit complex to understand initially, but if it suites your needs then simplicity is just one complexity away.
I'm looking for a way to output traces to a log file in my code, which runs on Linux.
I don't want to include the printing information in the binary, in every place I deploy it.
In Windows, I simply used WPP to trace without putting the actual trace strings in my binary.
How can this be achieved in Linux?
I'm not very familiar with Linux tools in this area, so maybe there is a better system. However, since nobody else has made any good suggestions, I'll make a suggestion. (Probably not a very good suggestion, but the best I can think of right now.)
In theory, you could continue to use WPP. WPP is simply a template system: it scans the configuration and input files to build data structures, then runs a template, filling in the data values it got from the scan, to produce the .tmh files. You could create a new set of templates that use Linux APIs instead of Windows APIs and record the message strings in a way that works with some other log-decoder system.
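For a concrete (and deliberately simplified) picture of what such a pre-processing step does, here is a toy sketch. It is not WPP itself and does not use WPP's file formats; the TRACE macro name, the JSON side-car map, and the command-line layout are all invented for illustration. It pulls format strings out of the source, replaces each trace call with a numeric ID, and writes a map that a decoder tool could use to re-expand logged IDs into readable messages:

```python
#!/usr/bin/env python3
"""Toy pre-processor: keep trace format strings out of the built binary."""
import json
import re
import sys

# Matches the opening of an assumed TRACE("format", args...) call.
TRACE_RE = re.compile(r'TRACE\(\s*"((?:[^"\\]|\\.)*)"')

def preprocess(src_path: str, out_path: str, map_path: str) -> None:
    mapping = {}

    def replace(match: re.Match) -> str:
        msg_id = len(mapping) + 1
        mapping[msg_id] = match.group(1)   # format string kept out-of-band
        return f"TRACE_ID({msg_id}"        # the binary only ever sees the ID

    with open(src_path) as f:
        rewritten = TRACE_RE.sub(replace, f.read())
    with open(out_path, "w") as f:
        f.write(rewritten)
    with open(map_path, "w") as f:
        json.dump(mapping, f, indent=2)    # the decoder re-expands IDs with this

if __name__ == "__main__":
    preprocess(sys.argv[1], sys.argv[2], sys.argv[3])
```

The deployed binary then contains only calls like TRACE_ID(1, x), while the human-readable strings live in the map file that stays with the build artifacts rather than on the target machine, which is the essence of what WPP achieves on Windows.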
I noticed this question only now and would like to add my two cents to the story, just in case. Personally, I truly appreciate Windows WPP tracing and consider it probably the best engineering solution for practical development troubleshooting among similar tools.
It so happens that I have extended WPP use to Unix-like platforms twice. We wanted to use the strong sides of the WPP concept and yet use it in multi-platform pieces of code. This was not a port but rather a wrapper around the specific WPP use we had configured on Windows. The first time, we had a web service perform the actual WPP pre-processing on Windows; it may sound a bit insane, but it worked fine and effectively within the local network. A wrapper script executed before each compilation sent a web request, got back the processed file, and post-processed the generated include file to make it suitable for Unix-like platforms. The second time, we implemented a simplified WPP pre-processor of our own (we found an additional use for it: we could, for example, generate the tracing statements differently for production and for unit testing). This was a harsh solution: you still need some actual tracing framework behind the wrapper on the non-Windows platform (well, the first time we implemented our own lower level).
I do not think the Linux world has a framework comparable to WPP. Once I even thought it could be a great idea to start an open-source project to port WPP. I am not sure it would be much requested, though. I said it is a great engineering solution, but who wants to do dirty engineering work? The open-source community prefers abstract, object-oriented, and generic solutions, streaming, and less reliance on dedicated tooling (WPP requires special management tools and OS support). Ease of code writing is today's preference.
Microsoft's own fault (or unwillingness) may also play a part in WPP's lack of popularity. They kept it as an internal framework that surfaced almost incidentally with the Windows DDK, because they had to offer some logging/tracing solution to driver developers. Few people even noticed that WPP is well suited to user-space code too. And the WPP pre-processor for C#, for example, has never been exposed to the public at all.
Nevertheless, I still think that porting WPP to Unix/Linux could be a challenging, interesting, and maybe even useful undertaking, if someone decides to lead it. :)
In some languages (Java, C# without unsafe code, ...) it is (or should be) impossible to corrupt memory - there is no manual memory management, etc. This allows them to restrict resources (access to files, access to the network, maximum memory usage, ...) for applications quite easily - e.g. Java applets (Java Web Start). It's sometimes called sandboxing.
My question is: is it possible with native programs (e.g., written in a memory-unsafe language like C or C++, and without having the source code)? I don't mean a simple, bypassable sandbox, or anti-virus software.
I think about two possibilities:
run the application as a different OS user and set restrictions for that user. Disadvantage: many users, one for every combination of parameters and access rights?
(somehow) limit the (OS API) functions that can be called
I don't know whether either of these possibilities allows (at least in theory) full protection, with no possibility of bypass.
Edit: I'm interested more in the theory - I don't care that e.g. some OS has some undocumented functions, or how to sandbox any particular application on a given OS. For example, I want to sandbox an application and allow only two functions: get a char from the console and put a char to the console. How is it possible to do that unbreakably, with no possibility of bypassing it?
Answers mentioned:
Google Native Client, uses subset of x86 - in development, together with (possible?) PNaCl - portable native client
full VM - obviously overkill, imagine tens of programs...
In other words, could native code (with unsafe memory access) be used within a restricted environment, e.g. in a web browser, with 100% (at least in theory) security?
Edit2: Google Native Client is exactly what I would like - any language, safe or unsafe, running at native speed, in a sandbox, even in a web browser. Everybody can use whatever language they want, on the web or on the desktop.
You might want to read about Google's Native Client which runs x86 code (and ARM code I believe now) in a sandbox.
You pretty much described AppArmor in your original question. There are quite a few good videos explaining it which I highly recommend watching.
Possible? Yes. Difficult? Also yes. OS-dependent? Very yes.
Most modern OSes support various levels of process isolation that can be used to achieve what you want. The simplest approach is to simply attach a debugger and break on all system calls, then filter these calls in the debugger. However, this is a large performance hit and is difficult to make safe in the presence of multiple threads. It is also difficult to implement safely on OSes where the low-level syscall interface is not documented, such as Mac OS or Windows.
The Chrome browser folks have done a lot of work in this field. They've posted design docs for Windows, Linux (in particular the SUID sandbox), and Mac OS X. Their approach is effective but not totally foolproof - there may still be some minor information leaks between the outer OS and the guest application. In addition, some of the OSes require specific modifications to the guest program to be able to communicate out of the sandbox.
If some modification to the hosted application is acceptable, Google's native client is worth a look. This restricts the compiler's code generation choices in such a way that the loader can prove that it doesn't do anything nasty. This obviously doesn't work on arbitrary executables, but it will get you the performance benefits of native code.
Finally, you can always simply run the program in question, plus an entire OS to itself, in an emulator. This approach is basically foolproof, but adds significant overhead.
Yes this is possible IF the hardware provides mechanisms to restrict memory accesses. Desktop processors usually are equipped with an MMU and access levels, so the OS can employ these to deny access to any memory address a thread should not have access to.
Virtual memory is implemented by the very same means: any access to memory currently swapped out to disk is trapped, the memory fetched from disk and then the thread is continued. Virtualization takes it a little farther, because it also traps accesses to hardware registers.
All the OS really needs to do is use those features properly, and it will be impossible for any code to break out of the sandbox. Of course, this is much easier said than done, mostly because the OS takes liberties in favor of performance, there are oversights in what certain OS calls can be used to do, and, last but not least, there are bugs in the implementation.
On the site Ideone, a user uploads code to be run on a remote server. This is similar to the function of an online judge.
The problem is that users might upload code that attempts to 'hack' the system. I understand that in C and C++ it's easy to disable a certain set of system calls (patch a few .dll's), but I'm not so sure about other languages.
How would you protect your system if you were to support higher level languages (Erlang, Haskell) on the online judge?
Use the Ideone API.
Run in a sandbox as a non-privileged user. That's not absolutely foolproof, but it makes the bar for doing lasting damage or serious compromise very high. It also does not depend on possible options or modifications to the language run-time in question. If you are dealing with a fully compiled language (that is, no run-time interpreter), you can do this as well.
For example, take Erlang. Set up a chroot jail that contains only what you need to run Erlang. Add a non-privileged user account and home directory. Bring in the code to be run, verify all file/directory permissions, change to the non-privileged UID and run the code.
You can find more detailed instructions on setting up chroot jails in the Wikipedia article on chroot. Procedures and requirements are slightly different for different OSes.
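A minimal sketch of that sequence, assuming a Linux host and a jail directory that has already been populated with the needed runtime; the paths, UID/GID, and program name are placeholders, and the parent process must start with root privileges for chroot to succeed:

```python
import os

JAIL = "/srv/judge/jail"            # pre-built chroot containing only what Erlang (or the compiled program) needs
RUN_UID, RUN_GID = 10001, 10001     # non-privileged account created just for submissions

def run_jailed(argv):
    """Fork, confine the child to the jail, drop privileges, and exec the submission."""
    pid = os.fork()
    if pid == 0:                    # child process
        os.chroot(JAIL)             # requires root; everything below sees JAIL as "/"
        os.chdir("/")
        os.setgroups([])            # drop supplementary groups
        os.setgid(RUN_GID)          # drop group first, then user
        os.setuid(RUN_UID)          # after this, root privileges are gone for good
        os.execv(argv[0], argv)     # path is resolved inside the jail; does not return on success
    _, status = os.waitpid(pid, 0)  # parent waits for the jailed child
    return status

# Example (hypothetical path inside the jail):
# status = run_jailed(["/home/sandbox/a.out"])
```

On top of this you would still add CPU, memory, and file-size limits (for example via resource.setrlimit before the exec) and a wall-clock timeout, since, as noted above, the jail only raises the bar rather than guaranteeing safety.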
I have read the Wikipedia article, but I am not really sure what it means, and how similar it is to version control.
It would be helpful if somebody could explain in very simple terms what sandboxing is.
A sandpit or sandbox is a low, wide container or shallow depression filled with sand in which children can play. Many homeowners with children build sandpits in their backyards because, unlike much playground equipment, they can be easily and cheaply constructed. A "sandpit" may also denote an open pit sand mine.
Well, a software sandbox is no different from a sandbox built for a child to play in. By providing a sandbox to a child, we simulate the environment of a real playground (in other words, an isolated environment), but with restrictions on what the child can do, because we don't want the child to get hurt or to cause trouble for others. :) Whatever the reason, we just want to put restrictions on what the child can do, for security reasons.
Now, coming to our software sandbox: we let any software (the child) execute (play), but with some restrictions on what it can do, so that we can feel safe and secure about what the executing software is able to do.
You've seen and used antivirus software, right? It is also a kind of sandbox: it puts restrictions on what any program can do. When malicious activity is detected, it stops the program and informs the user: "This application is trying to access such-and-such resources. Do you want to allow it?"
Download a program named Sandboxie and you can get hands-on experience with a sandbox. Using this program, you can run any other program in a controlled environment.
The red arrows indicate changes flowing from a running program into your computer. The box labeled Hard disk (no sandbox) shows changes by a program running normally. The box labeled Hard disk (with sandbox) shows changes by a program running under Sandboxie. The animation illustrates that Sandboxie is able to intercept the changes and isolate them within a sandbox, depicted as a yellow rectangle. It also illustrates that grouping the changes together makes it easy to delete all of them at once.
Now, from a programmer's point of view, a sandbox restricts the API that is available to the application. In the antivirus example, we are limiting system calls (the operating system API).
Another example would be online coding arenas like TopCoder. You submit code (a program), but it runs on the server. For the safety of the server, they must limit the program's level of API access. In other words, they need to create a sandbox and run your program inside it.
If you have a proper sandbox, you can even run a virus-infected file, stop all of the virus's malicious activity, and see for yourself what it is trying to do. In fact, this is the first step an antivirus researcher takes.
This definition of sandboxing basically means having test environments (developer integration, quality assurance, stage, etc). These test environments mimic production, but they do not share any of the production resources. They have completely separate servers, queues, databases, and other resources.
More commonly, I've seen sandboxing refer to something like a virtual machine -- isolating some running code on a machine so that it can't affect the base system.
For a concrete example: suppose you have an application that deals with money transfers. In the production environment, real money is exchanged. In the sandboxed environment, everything runs exactly the same, but the money is virtual. It's for testing purposes.
Paypal offers such a sandboxed environment, for example.
For the "sandbox" in software development, it means to develop without disturbing others in an isolated way.
It is not similiar to version control. But some version control (as branching) method can help making sandboxes.
More often we refer to the other sandbox.
In anyway, sandbox often mean an isolated environment. You can do anything you like in the sandbox, but its effect won't propagate outside the sandbox. For instance, in software development, that means you don't need to mess with stuff in /usr/lib to test your library, etc.
A sandbox is an isolated testing environment that enables users to run programs or execute files without affecting the application, system, or platform on which they run. Software developers use sandboxes to test new programming code. Cybersecurity professionals, in particular, use sandboxes to test potentially malicious software. Without sandboxing, an application or other system process could have unlimited access to all the user data and system resources on a network.
Sandboxes are also used to safely execute malicious code to avoid harming the device on which the code is running, the network, or other connected devices. Using a sandbox to detect malware offers an additional layer of protection against security threats, such as stealthy attacks and exploits that use zero-day vulnerabilities.