Memory safety and security: sandboxing arbitrary programs?

In some languages (Java, C# without unsafe code, ...) it is (or should be) impossible to corrupt memory: there is no manual memory management, etc. This makes it fairly easy to restrict the resources an application may use (file access, network access, maximum memory usage, ...), e.g. Java applets (Java Web Start). This is sometimes called sandboxing.
My question is: is this possible with native programs (e.g. written in a memory-unsafe language like C or C++, and without access to the source code)? I don't mean a simple, bypassable sandbox or anti-virus software.
I can think of two possibilities:
run the application as a different OS user and set restrictions for that user. Disadvantage: many users, one for every combination of parameters and access rights?
(somehow) limit the OS API functions that can be called
I don't know whether either possibility allows (at least in theory) full protection, with no possibility of bypass.
Edit: I'm more interested in the theory. I don't care that, e.g., some OS has undocumented functions, or how to sandbox an application on a particular OS. For example, I want to sandbox an application and allow only two functions: get a char from the console and put a char to the console. How can this be done unbreakably, with no possibility of bypass?
Answers mentioned:
Google Native Client, which uses a subset of x86 (in development), possibly together with PNaCl (Portable Native Client)
a full VM: obviously overkill, imagine tens of programs...
In other words, could native code (with unsafe memory access) be run within a restricted environment, e.g. in a web browser, with 100% security (at least in theory)?
Edit 2: Google Native Client is exactly what I would like: any language, safe or unsafe, running at native speed in a sandbox, even in a web browser. Everybody could use whatever language they want, on the web or on the desktop.
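The "allow only get char and put char" example from the question is easy to enforce when the guest is interpreted rather than run natively, because the interpreter mediates every operation; the guarantee then rests on the interpreter itself being bug-free. Below is a minimal sketch, assuming a Brainfuck-style toy language as the hypothetical guest (chosen because its only I/O primitives are exactly those two calls); `run_sandboxed` and `SandboxError` are illustrative names. It does not help with arbitrary native machine code, which is the gap Native Client's verifier is meant to fill.

```python
class SandboxError(Exception):
    """Raised when the guest program breaks a sandbox rule."""


def run_sandboxed(program, stdin="", tape_size=30_000, max_steps=1_000_000):
    """Interpret a Brainfuck-style guest program and return its output.

    The guest can only move a pointer over a bounded byte tape, do byte
    arithmetic, loop, and call two host functions: put char ('.') and
    get char (','). Any other character is ignored as a comment.
    """
    # Match brackets up front so malformed programs are rejected before running.
    jumps = {}
    open_stack = []
    for i, ch in enumerate(program):
        if ch == "[":
            open_stack.append(i)
        elif ch == "]":
            if not open_stack:
                raise SandboxError(f"unmatched ']' at position {i}")
            j = open_stack.pop()
            jumps[i], jumps[j] = j, i
    if open_stack:
        raise SandboxError(f"unmatched '[' at position {open_stack[-1]}")

    tape = bytearray(tape_size)  # the guest's entire memory
    ptr = pc = steps = 0
    chars_in = iter(stdin)
    out = []

    while pc < len(program):
        steps += 1
        if steps > max_steps:  # crude stand-in for a CPU time limit
            raise SandboxError("step limit exceeded")
        ch = program[pc]
        if ch == ">":
            ptr += 1
            if ptr >= tape_size:  # every memory access is bounds-checked
                raise SandboxError("pointer moved past end of tape")
        elif ch == "<":
            ptr -= 1
            if ptr < 0:
                raise SandboxError("pointer moved before start of tape")
        elif ch == "+":
            tape[ptr] = (tape[ptr] + 1) % 256
        elif ch == "-":
            tape[ptr] = (tape[ptr] - 1) % 256
        elif ch == ".":
            out.append(chr(tape[ptr]))  # host function 1: put char
        elif ch == ",":
            # host function 2: get char (0 once input is exhausted)
            tape[ptr] = ord(next(chars_in, "\0")) % 256
        elif ch == "[" and tape[ptr] == 0:
            pc = jumps[pc]  # skip the loop body
        elif ch == "]" and tape[ptr] != 0:
            pc = jumps[pc]  # jump back to the loop start
        pc += 1
    return "".join(out)
```

For example, `run_sandboxed("++++++++[>++++++++<-]>+.")` returns `"A"` (8 × 8 + 1 = 65), while `run_sandboxed("+[]")` raises `SandboxError` once the step limit is hit. Whatever the guest program does, it has no way to name any host operation other than the two it was given.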

You might want to read about Google's Native Client, which runs x86 code (and now, I believe, ARM code) in a sandbox.

You pretty much described AppArmor in your original question. There are quite a few good videos explaining it which I highly recommend watching.

Possible? Yes. Difficult? Also yes. OS-dependent? Very yes.
Most modern OSes support various levels of process isolation that can be used to achieve what you want. The simplest approach is to attach a debugger and break on all system calls, then filter those calls in the debugger. This, however, carries a large performance hit and is difficult to make safe in the presence of multiple threads. It is also difficult to implement safely on OSes where the low-level syscall interface is not documented, such as Mac OS or Windows.
The Chrome browser folks have done a lot of work in this field. They've posted design docs for Windows, Linux (in particular the SUID sandbox), and Mac OS X. Their approach is effective but not totally foolproof - there may still be some minor information leaks between the outer OS and the guest application. In addition, some of the OSes require specific modifications to the guest program to be able to communicate out of the sandbox.
If some modification to the hosted application is acceptable, Google's Native Client is worth a look. It restricts the compiler's code generation choices in such a way that the loader can prove the code doesn't do anything nasty. This obviously doesn't work on arbitrary executables, but it does give you the performance benefits of native code.
Finally, you can always simply run the program in question, plus an entire OS to itself, in an emulator. This approach is basically foolproof, but adds significant overhead.

Yes, this is possible IF the hardware provides mechanisms to restrict memory accesses. Desktop processors are usually equipped with an MMU and access levels, so the OS can use these to deny a thread access to any memory address it should not have access to.
Virtual memory is implemented by the very same means: any access to memory currently swapped out to disk is trapped, the memory is fetched from disk, and then the thread continues. Virtualization takes it a little further, because it also traps accesses to hardware registers.
All the OS really needs to do is use those features properly, and it will be impossible for any code to break out of the sandbox. Of course, this is much easier said than done, mostly because of liberties the OS takes in favor of performance, oversights in what certain OS calls can be used to do, and, last but not least, bugs in the implementation.

Related

When will a process need memory pages with both write and exec permissions at once?

I'm trying to understand how programs can be isolated and secured.
Are there any valid cases where a process should require PROT_WRITE | PROT_EXEC on a memory page? Can this be avoided?
This seems like the opposite of what the NX bit, W^X, or DEP were trying to achieve.
LibreOffice seems to use this, which causes a whole lot of trouble on hardened Linux.
https://github.com/nning/linux-pax-flags/pull/3

That situation is only required when you are writing what amounts to a loader -- something that will be bringing in additional code on demand using its own mechanisms -- or a JIT compiler, or one of the VERY few other legitimate situations in which an application should be allowed to modify its own code. Even there, what's often done is to control the duration of those permissions, having the page only be writable when it's being loaded then switching it to only being executable so it can't be stepped on thereafter.
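The "writable while loading, then executable only" pattern described above can be sketched in Python on Linux by calling libc's `mprotect` through `ctypes` (an assumption: this goes straight to the C function and is Unix-only). `load_code` is a hypothetical helper: it copies bytes into a read+write page, then switches the page to read+exec, so the page is never writable and executable at the same time. It only maps the bytes and does not run them. On kernels with PaX MPROTECT restrictions, even this writable-to-executable transition may be refused, which is why JIT-using programs can need exemptions there.

```python
import ctypes
import mmap
import os

PROT_READ = mmap.PROT_READ
PROT_WRITE = mmap.PROT_WRITE
PROT_EXEC = getattr(mmap, "PROT_EXEC", 4)  # 4 is the usual Linux/macOS value

libc = ctypes.CDLL(None, use_errno=True)  # symbols of the running process, incl. libc
libc.mprotect.argtypes = (ctypes.c_void_p, ctypes.c_size_t, ctypes.c_int)
libc.mprotect.restype = ctypes.c_int


def load_code(code):
    """Copy machine code into its own pages without ever mapping them W+X.

    Returns (region, address). The caller must not write to `region`
    afterwards: the pages are read+exec only, so a write would crash.
    """
    pages = max(1, -(-len(code) // mmap.PAGESIZE))  # round up to whole pages
    size = pages * mmap.PAGESIZE
    # Step 1: anonymous read+write mapping (no exec); copy the bytes in.
    region = mmap.mmap(-1, size, prot=PROT_READ | PROT_WRITE)
    region.write(code)
    # Anonymous mappings are page-aligned; get the address via a ctypes view.
    view = ctypes.c_char.from_buffer(region)
    addr = ctypes.addressof(view)
    del view  # release the buffer export so region.close() works later
    # Step 2: flip to read+exec. The page is never writable AND executable.
    if libc.mprotect(addr, size, PROT_READ | PROT_EXEC) != 0:
        err = ctypes.get_errno()
        region.close()
        raise OSError(err, os.strerror(err))
    return region, addr
```

A JIT would repeat step 1 and step 2 (or map a fresh region) each time it emits more code, rather than leaving a page PROT_WRITE | PROT_EXEC for its whole lifetime.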
I have no insight into why LibreOffice might think it needs this capability. You'd have to take that up with its developer community.

IO performance in Windows and Linux

We want to build a web service that returns images (like Google Maps tiles).
The source data is organized in the Esri compact cache format, so the core of our service is reading the tiles from the bundles.
I am not sure which platform to choose: Windows or Linux?
It is said that Linux has better IO read/write performance than Windows.
However, Java is our only choice if we choose Linux, so I want to know whether there is anything we should know to improve IO read performance on Linux.
PS:
On the Windows platform, we will build the service on .NET 4 using C# and deploy it on IIS.
On Linux, we will build the service using Java (maybe based on Spring MVC or some other MVC framework) and deploy it on Tomcat.
Update:
We may have the following source compact cache files in different folders:
L1
    RxxCxx.bundle
    RxxCxx.bundlx
L2
    RxxCxx.bundle
    RxxCxx.bundlx
A request from the client may look like this:
http://ourserver/maptile?row=123&col=234&level=1.png
For this request, we go into folder L1 since the level is 1, then read the RxxCxx.bundlx file first, because this file is the metadata that tells us the position (the offset and length in RxxCxx.bundle) of the data for the requested image (row=123&col=234). Then we read RxxCxx.bundle according to that offset and length. Finally, we return the data as an image by writing it to the response and setting the content type to "image/png" or similar.
That is the whole procedure for handling a request.
Are there any documents or existing demos that show how to handle this type of IO reading?
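The two-step lookup described above (read the index record, then read exactly one slice of the bundle) can be sketched as follows. This assumes a hypothetical, simplified layout, NOT the real Esri compact cache format: the `.bundlx` index is a flat array of little-endian records (8-byte offset, 4-byte length), one per tile in row-major order. Check Esri's documentation for the actual byte layout. `write_bundle` and `read_tile` are illustrative names; the same pattern maps to Java's `RandomAccessFile.seek`/`readFully` or `FileChannel.read(ByteBuffer, long)`, and to C#'s `FileStream.Seek`/`Read`.

```python
import struct
from pathlib import Path

# Hypothetical index record: 8-byte offset + 4-byte length, little-endian.
RECORD = struct.Struct("<QI")


def write_bundle(level_dir, name, tiles, rows, cols):
    """Build a .bundle/.bundlx pair in the simplified layout (for testing)."""
    records = [(0, 0)] * (rows * cols)  # (offset, length); length 0 = no tile
    with open(Path(level_dir) / f"{name}.bundle", "wb") as bundle:
        for (row, col), payload in tiles.items():
            records[row * cols + col] = (bundle.tell(), len(payload))
            bundle.write(payload)
    with open(Path(level_dir) / f"{name}.bundlx", "wb") as index:
        for offset, length in records:
            index.write(RECORD.pack(offset, length))


def read_tile(level_dir, name, row, col, cols):
    """Read the tile's record from the index, then that slice of the bundle."""
    with open(Path(level_dir) / f"{name}.bundlx", "rb") as index:
        index.seek((row * cols + col) * RECORD.size)
        raw = index.read(RECORD.size)
    if len(raw) != RECORD.size:
        raise KeyError(f"no index entry for row={row}, col={col}")
    offset, length = RECORD.unpack(raw)
    with open(Path(level_dir) / f"{name}.bundle", "rb") as bundle:
        bundle.seek(offset)
        data = bundle.read(length)
    if len(data) != length:
        raise OSError("bundle file is shorter than the index claims")
    return data
```

In a real server you would likely keep the files open (or cache each small index in memory) and use positional reads such as `os.pread` or Java's `FileChannel.read(buffer, position)`, so that concurrent requests can share one handle without racing on a shared file position. Each request then costs roughly two small reads, which the OS page cache will often serve from memory on either platform.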

The only situation where you have to have Windows servers in your environment is when you choose MS SQL Server DBMS (it is almost a Sybase but is much cheaper), in which case have a Windows box for the DB and a *nix server for the middle tier.
There are many situations where Windows can be used. Beginning with the declaration "have to have Windows" reveals an existing bias, and it is followed by many groundless statements. But at least you clearly recognized this as the case.
Java is the best technology for millisecond-grade middleware, mainly because of the amount of mature, standardized open source technology available. Everything from coding (Eclipse, NetBeans, IDEA) to manual (Ant, Maven) and automatic (TeamCity, Hudson/Jenkins) builds, testing, and static code analysis is there, is standardized, is open source, and is backed by a community of millions.
I feel it necessary to say that Visual Studio/C# (which the OP mentioned as an alternative) offers everything you mentioned above, with the exception of being open source. That said, the .NET Framework (or .NET Core) is now open source. Get information here. Based on your comment above, I gather you consider the only viable solutions to be those available through the open source community.
Quote I once heard that has a lot of truth: "It's only free if your time is worthless."
Also, counting the entirety of the open source community is a bogus argument. You'd have to take one development tool/API and compare the community support with another. For example, compare the community size/quality for Visual Studio with that of Eclipse. Or that of the .NET Framework vs. Java.
By the way, I've experienced no better IntelliSense implementation than in Visual Studio on Windows. When Eclipse's does work, you have to rely on the quality of the open source libraries you reference to get anything meaningful. I've found the .NET Framework requires fewer third-party libraries than Java to accomplish the same goal.
Linux is the best server-side platform for performance, stability, ease of maintenance, and quality of the development environment (an extremely powerful command-line-based IDE). You can expect multi-month uptime from a Linux server, but not from Windows.
We have many Windows servers running services processing "big data" that have a system up-time since 5/30/2014 (nearly a year) and several more running without interruption since 2013. The only times we experience up-time problems is when hardware is aged/failing or the application-layer software we wrote contains bugs.
Tomcat/Servlet (or Jetty/Servlet) is a classic industrial combination in many financial institutions where stability is the #1 priority.
IIS is also used: job posting for IIS developer at financial institution
And lastly, the IO performance concern: high-quality user-space non-blocking IO code will be bound by CPU and hardware bandwidth, so the OS will not be the determining factor. That said, I believe fancy things like interrupt affinity, thread pinning, informed real-time tuning, and kernel bypass are easier to do on Linux.
Most of these variables are defined by each OS. It sounds like you have a lot of experience with threads, but I would posit that the developer can optimize at the application layer just as easily in both environments. Changing thread priority, implementing a custom thread pool, configuring the BIOS, etc. are all available in the Windows world as well. The exception is customizing the kernel, which Unix/Linux allows, but then you have to support your own custom build of Unix/Linux.
I don't think commercial software should be vilified or avoided in favor of open source as a rule.

I understand this may sound like a groundless statement, but use *nix unless you have to use Windows. The only situation where you have to have Windows servers in your environment is when you choose MS SQL Server DBMS (it is almost a Sybase but is much cheaper), in which case have a Windows box for the DB and a *nix server for the middle tier.
Java is the best technology for millisecond-grade middleware, mainly because of the amount of mature, standardized open source technology available. Everything from coding (Eclipse, NetBeans, IDEA) to manual (Ant, Maven) and automatic (TeamCity, Hudson/Jenkins) builds, testing, and static code analysis is there, is standardized, is open source, and is backed by a community of millions.
Linux is the best server-side platform for performance, stability, ease of maintenance, and quality of the development environment (an extremely powerful command-line-based IDE). You can expect multi-month uptime from a Linux server, but not from Windows.
Tomcat/Servlet (or Jetty/Servlet) is a classic industrial combination in many financial institutions where stability is the #1 priority.
And lastly, the IO performance concern: high-quality user-space non-blocking IO code will be bound by CPU and hardware bandwidth, so the OS will not be the determining factor. That said, I believe fancy things like interrupt affinity, thread pinning, informed real-time tuning, and kernel bypass are easier to do on Linux.

Assembly security

I'm currently offering an assembly compile service for some people. They can enter their assembly code in an online editor and compile it. When they compile it, the code is sent to my server with an Ajax request, gets compiled, and the output of the program is returned.
However, I'm wondering what I can do to prevent serious damage to the server. I'm quite new to assembly myself, so what is possible when they run their code on my server? Can they delete or move files? Is there any way to prevent these security issues?
Thank you in advance!

Have a look at http://sourceforge.net/projects/libsandbox/. It is designed for doing exactly what you want on a linux server:
This project provides API's in C/C++/Python for testing and profiling simple (single process) programs in a restricted environment, or sandbox. Runtime behaviours of binary executable programs can be captured and blocked according to configurable / programmable policies.
The sandbox libraries were originally designed and utilized as the core security module of a full-fledged online judge system for ACM/ICPC training. They have since then evolved into a general-purpose tool for binary program testing, profiling, and security restriction. The sandbox libraries are currently maintained by the OpenJudge Alliance (http://openjudge.net/) as a standalone, open-source project to facilitate various assignment grading solutions for IT/CS education.
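libsandbox's own API is not shown here. As a much weaker, standard-library-only layer, the sketch below runs a compiled program under kernel-enforced resource limits using Python's `resource` and `subprocess` modules (an assumption: Linux or another Unix). `run_limited` is a hypothetical helper. It caps CPU time, address space, file size, open descriptors, and core dumps, but it does not filter system calls: the child can still read any file the user can read or open sockets, so it should be combined with a separate unprivileged user, a container or VM, or a syscall filter such as libsandbox or seccomp.

```python
import resource
import subprocess


def run_limited(argv, cpu_seconds=2, max_memory_bytes=1 << 30, timeout=10):
    """Run argv in a child process under kernel-enforced resource limits.

    This limits how much the child can use; it does not restrict which
    system calls it makes, so it is only one layer of a sandbox.
    """

    def apply_limits():
        # Runs in the child between fork() and exec(), so only the child is limited.
        resource.setrlimit(resource.RLIMIT_CPU, (cpu_seconds, cpu_seconds + 1))
        resource.setrlimit(resource.RLIMIT_AS, (max_memory_bytes, max_memory_bytes))
        resource.setrlimit(resource.RLIMIT_FSIZE, (0, 0))     # may not extend any file
        resource.setrlimit(resource.RLIMIT_NOFILE, (64, 64))  # few open descriptors
        resource.setrlimit(resource.RLIMIT_CORE, (0, 0))      # no core dumps

    return subprocess.run(
        argv,
        preexec_fn=apply_limits,
        stdin=subprocess.DEVNULL,
        capture_output=True,
        text=True,
        timeout=timeout,  # wall-clock limit enforced by the parent
    )
```

For example, a CPU-bound child such as `run_limited([sys.executable, "-c", "while True: pass"], cpu_seconds=1)` should come back with a negative `returncode`, because the kernel terminates it with SIGXCPU once its CPU budget is used up.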

If this is a tutorial service, so the clients just need to test miscellaneous assembly code and do not need to perform operations outside of their program (such as reading or modifying the file system), then another option is to permit only a selected subset of instructions. In particular, do not allow any instructions that can make system calls, and allow only limited control-transfer instructions (e.g., no returns, branches only to labels defined within the user’s code, and so on). You might also provide some limited ways to return output, such as a library call that prints whatever value is in a particular register. Do not allow data declarations in the text (code) section, since arbitrary machine code could be entered as numerical data definitions.
Although I wrote “another option,” this should be in addition to the others that other respondents have suggested, such as sandboxing.
This method is error prone and, if used, should be carefully and thoroughly designed. For example, some assemblers permit multiple instructions on one line. So merely ensuring that the text in the first instruction field of a line was acceptable would miss the remaining instructions on the line.
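The instruction-subset idea above, including the multiple-statements-per-line pitfall, can be sketched as a source filter. Assumptions, all hypothetical: GNU as x86 AT&T syntax assembled directly (no C preprocessor), where `#` starts a comment and `;` separates statements; `print_eax` is an invented output routine the service would provide; the mnemonic lists are illustrative, not a vetted policy. Note that the allowed data instructions can still address arbitrary memory, so this only complements a real sandbox.

```python
import re

DATA_OPS = {"mov", "add", "sub", "inc", "dec", "neg", "cmp",
            "and", "or", "xor", "imul", "push", "pop"}
JUMPS = {"jmp", "je", "jne", "jl", "jle", "jg", "jge"}
HELPERS = {"print_eax"}  # hypothetical output routine provided by the service
LABEL_RE = re.compile(r"([A-Za-z_][A-Za-z0-9_]*):")


def allowed_data_op(mnemonic):
    # Accept e.g. "mov" or "movl": a base mnemonic plus an optional size suffix.
    if mnemonic in DATA_OPS:
        return True
    return mnemonic[-1:] in ("b", "w", "l", "q") and mnemonic[:-1] in DATA_OPS


def check_program(source):
    """Return a list of policy violations; an empty list means the source passed."""
    labels, statements = set(), []
    for lineno, line in enumerate(source.splitlines(), 1):
        code = line.split("#", 1)[0]  # drop comments
        for stmt in code.split(";"):  # check every statement, not just the first
            stmt = stmt.strip()
            match = LABEL_RE.match(stmt)
            while match:  # peel off leading "name:" labels
                labels.add(match.group(1))
                stmt = stmt[match.end():].strip()
                match = LABEL_RE.match(stmt)
            if stmt:
                statements.append((lineno, stmt))

    violations = []
    for lineno, stmt in statements:
        parts = stmt.split(None, 1)
        mnemonic = parts[0].lower()
        operand = parts[1].strip() if len(parts) > 1 else ""
        if mnemonic.startswith("."):  # blocks .byte, .data, .section, .macro, ...
            violations.append(f"line {lineno}: directive {mnemonic} not allowed")
        elif allowed_data_op(mnemonic):
            continue
        elif mnemonic in JUMPS and operand in labels:
            continue  # direct jump to a label defined in the user's own code
        elif mnemonic == "call" and operand in HELPERS:
            continue
        else:  # syscall, int, ret, indirect jumps, unknown mnemonics, ...
            violations.append(f"line {lineno}: {stmt!r} not allowed")
    return violations
```

Rejecting every directive is what blocks data declarations in the text section, and splitting on `;` before checking is what catches a forbidden instruction hidden after an allowed one on the same line. A more robust variant would also disassemble the assembler's output and re-check the actual machine code.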

Compiling and running someone else's arbitrary code on your server is exactly that, arbitrary code execution. Arbitrary code execution is the holy grail of every malicious hacker's quest. Someone could probably use this question to find your service and exploit it this second. Stop running the service immediately. If you wish to continue running this service, you should compile and run the program within a sandbox. However, until this is implemented, you should suspend the service.
You should run the code in a virtual machine sandbox, because if the code is malicious, the sandbox will prevent it from damaging your actual OS. Examples of virtual machine software include VirtualBox and Xen. You could also perform some sort of signature detection on the code to search for known malicious functionality, though any form of signature detection can be beaten.
This is a link to VirtualBox's homepage: https://www.virtualbox.org/
This is a link to Xen: http://xen.org/

How are clientside security vulnerabilities generally discovered?

I mean in operating systems or their applications. The only way I can think of is to examine binaries for the use of dangerous functions like strcpy() and then try to exploit those. But with compiler improvements like Visual Studio's /GS switch, this possibility should mostly be a thing of the past. Or am I mistaken?
What other ways do people use to find vulnerabilities? Just load the target in a debugger, send unexpected input, and see what happens? That seems like a long and tedious process.
Could anyone recommend some good books or websites on this subject?
Thanks in advance.

There are two major issues involved with "Client Side Security".
The most commonly exploited client today is the browser, in the form of "drive-by downloads". Memory corruption vulnerabilities are most often to blame. ActiveX COM objects have been a common path on Windows systems, and AxMan is a good ActiveX fuzzer.
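The "send unexpected input and see what happens" approach from the question is essentially fuzzing, which tools like AxMan automate. Below is a minimal mutation-fuzzer sketch: `toy_parser` is a hypothetical target with a deliberately missing bounds check, and catching a Python exception stands in for detecting a crashed process, which is what a real harness would watch for.

```python
import random


def toy_parser(data):
    """Hypothetical target: copies a length-prefixed record into an 8-byte buffer.

    Bug: it never checks the length byte against the buffer or the input size.
    In C this would be a buffer overflow; in Python it raises IndexError.
    """
    if not data:
        return b""
    length = data[0]
    buf = bytearray(8)
    for i in range(length):
        buf[i] = data[1 + i]
    return bytes(buf[:length])


def mutate(seed, rng, max_changes=3):
    """Overwrite a few random bytes of a known-good input."""
    data = bytearray(seed)
    for _ in range(rng.randint(1, max_changes)):
        data[rng.randrange(len(data))] = rng.randrange(256)
    return bytes(data)


def fuzz(target, seed, iterations=1000, rng_seed=0):
    """Feed mutated inputs to target and collect the ones that make it fail."""
    rng = random.Random(rng_seed)  # fixed seed so findings are reproducible
    crashes = []
    for _ in range(iterations):
        candidate = mutate(seed, rng)
        try:
            target(candidate)
        except Exception as exc:
            crashes.append((candidate, type(exc).__name__))
    return crashes
```

Starting from the valid input `b"\x04abcd"`, the fuzzer quickly finds inputs whose length byte exceeds the data actually supplied. Real fuzzers add coverage feedback, smarter mutations, and crash triage, but the loop is the same idea.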
In terms of memory protection systems, /GS is a stack canary, and it isn't the be-all and end-all for stopping buffer overflows. It only aims to protect against stack-based overflows that attempt to overwrite the return address and control EIP. NX zones and canaries are good things, but ASLR can be a whole lot better at stopping memory corruption exploits, and not all ASLR implementations are equally secure. Even with all three of these systems, you're still going to get hacked. IE 8 running on Windows 7 had all of this, and it was one of the first to be hacked at Pwn2Own; here is how they did it. It involved chaining together a heap overflow and a dangling pointer vulnerability.
The other problem with "client side security" is CWE-602: Client-Side Enforcement of Server-Side Security. These flaws are created when the server side trusts the client with secret resources (like passwords) or trusts it to report sensitive information, such as the player's score in a Flash game.
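The player-score case can be made concrete with a sketch of two hypothetical server-side handlers (all names and limits are invented for illustration). The first shows the CWE-602 pattern of accepting the client's number as-is; the second derives the score on the server and rejects implausible inputs. This narrows what a cheater can claim but does not eliminate cheating: only state the server tracks or simulates itself is fully trustworthy.

```python
POINTS_PER_COIN = 10
MAX_COINS = {1: 20, 2: 35}  # per level, known only to the server


def submit_score_trusting(request):
    """CWE-602 pattern: the server accepts whatever score the client reports."""
    return request["score"]


def submit_score_checked(session, request):
    """Derive the score on the server from inputs it can sanity-check."""
    level = session["level"]  # recorded server-side when the level started
    coins = request.get("coins_collected")
    if type(coins) is not int or not 0 <= coins <= MAX_COINS[level]:
        raise ValueError("implausible submission")
    return coins * POINTS_PER_COIN  # the client's own "score" field is ignored
```

With a tampered request such as `{"score": 999999, "coins_collected": 5}`, the trusting handler records 999999 while the checked one records 50.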
The best way to look for client side issues is by looking at the traffic. Wireshark is the best for non-browser client/server protocols. However, TamperData is by far the best tool you can use for browser-based platforms such as Flash and JavaScript. Each case is going to be different: unlike buffer overflows, where it's easy to see the process crash, client side trust issues are all about context, and it takes a skilled human looking at the network traffic to figure out the problem.
Sometimes foolish programmers hardcode a password into their application. It's trivial to decompile the app to obtain the data. Flash decompiling is very clean, and you'll even get full variable names and code comments. Another option is to use a debugger like OllyDbg to try to find the data in memory. IDA Pro is the best decompiler for C/C++ applications.
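Before reaching for a decompiler, a hardcoded password stored as a plain string constant often shows up with a simple scan for printable text, which is what the Unix `strings` tool does. Below is a minimal sketch of that scan (`printable_strings` is an illustrative name). It only finds ASCII; Windows binaries often store UTF-16LE strings, which GNU `strings -e l` can find, and obfuscated or encrypted strings will not appear at all.

```python
import re


def printable_strings(data, min_len=4):
    """Return runs of printable ASCII at least min_len bytes long (like Unix `strings`)."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [m.group().decode("ascii") for m in pattern.finditer(data)]
```

Usage would be along the lines of reading a binary with `open(path, "rb").read()` and scanning the result for anything that looks like a credential, URL, or key.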

Writing Secure Code, 2nd edition, includes a bit about threat modeling and testing, and a lot more.

What is sandboxing?

I have read the Wikipedia article, but I am not really sure what it means, and how similar it is to version control.
It would be helpful if somebody could explain in very simple terms what sandboxing is.

A sandpit or sandbox is a low, wide container or shallow depression filled with sand in which children can play. Many homeowners with children build sandpits in their backyards because, unlike much playground equipment, they can be easily and cheaply constructed. A "sandpit" may also denote an open pit sand mine.
Well, a software sandbox is no different from a sandbox built for a child to play in. By providing a sandbox to a child, we simulate the environment of a real playground (in other words, an isolated environment), but with restrictions on what the child can do, because we don't want the child to get infected or to cause trouble for others. :) Whatever the reason, we just want to put restrictions on what the child can do, for security reasons.
Now, coming to our software sandbox: we let any software (the child) execute (play), but with some restrictions on what it can do. We can then feel safe and secure about what the executing software can do.
You've seen and used antivirus software, right? It is also a kind of sandbox. It puts restrictions on what any program can do. When malicious activity is detected, it stops the program and informs the user: "This application is trying to access so-and-so resources. Do you want to allow it?"
Download a program named Sandboxie and you can get hands-on experience with a sandbox. Using this program, you can run any program in a controlled environment.
The red arrows indicate changes flowing from a running program into your computer. The box labeled Hard disk (no sandbox) shows changes by a program running normally. The box labeled Hard disk (with sandbox) shows changes by a program running under Sandboxie. The animation illustrates that Sandboxie is able to intercept the changes and isolate them within a sandbox, depicted as a yellow rectangle. It also illustrates that grouping the changes together makes it easy to delete all of them at once.
Now, from a programmer's point of view, a sandbox restricts the API available to the application. In the antivirus example, we are limiting system calls (the operating system API).
Another example would be online coding arenas like TopCoder. You submit code (a program), but it runs on their server. For the safety of the server, they should limit the program's level of API access. In other words, they need to create a sandbox and run your program inside it.
If you have a proper sandbox, you can even run a virus-infected file, stop all of the virus's malicious activity, and see for yourself what it is trying to do. In fact, this is the first step for an antivirus researcher.

This definition of sandboxing basically means having test environments (developer integration, quality assurance, staging, etc.). These test environments mimic production, but they do not share any of the production resources. They have completely separate servers, queues, databases, and other resources.
More commonly, I've seen sandboxing refer to something like a virtual machine -- isolating some running code on a machine so that it can't affect the base system.

For a concrete example: suppose you have an application that deals with money transfers. In the production environment, real money is exchanged. In the sandboxed environment, everything runs exactly the same, but the money is virtual. It's for testing purposes.
PayPal offers such a sandboxed environment, for example.
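The idea that "everything runs exactly the same, but the money is virtual" usually comes down to the application talking to an interface whose backend differs per environment. A minimal sketch, with hypothetical names throughout: `SandboxGateway` moves only in-memory balances, and a production gateway (not shown) would expose the same `transfer` method but call the real payment provider, typically with separate sandbox and live credentials.

```python
class SandboxGateway:
    """Stand-in payment backend: same interface, but only moves virtual money."""

    def __init__(self, balances):
        self.balances = dict(balances)  # account -> cents
        self.transfers = []

    def transfer(self, source, dest, amount_cents):
        if amount_cents <= 0:
            raise ValueError("amount must be positive")
        if self.balances.get(source, 0) < amount_cents:
            raise ValueError("insufficient funds")
        self.balances[source] -= amount_cents
        self.balances[dest] = self.balances.get(dest, 0) + amount_cents
        self.transfers.append((source, dest, amount_cents))


def pay_invoice(gateway, customer, merchant, amount_cents):
    """Application code: identical whether `gateway` is real or sandboxed."""
    gateway.transfer(customer, merchant, amount_cents)
    return amount_cents
```

Because `pay_invoice` never knows which gateway it was given, the code path exercised in the sandbox is the same one that later runs against real money.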

In software development, a "sandbox" means developing in an isolated way, without disturbing others.
It is not similar to version control, but some version control methods (such as branching) can help create sandboxes.

More often we refer to the other sandbox.
In any case, a sandbox usually means an isolated environment. You can do anything you like in the sandbox, but its effects won't propagate outside it. For instance, in software development, that means you don't need to mess with things in /usr/lib to test your library, etc.
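As one concrete instance of that kind of development sandbox (an assumption: Python, whereas the /usr/lib example above is about native libraries), the standard library's `venv` module creates an isolated environment whose installed packages never touch the system site-packages. `make_isolated_env` is an illustrative helper.

```python
import os
import venv
from pathlib import Path


def make_isolated_env(target):
    """Create a Python virtual environment at `target`; return its interpreter path.

    Packages installed with that interpreter go under `target`, not into the
    system site-packages.
    """
    target = Path(target)
    venv.create(target, with_pip=False, symlinks=(os.name != "nt"))
    if os.name == "nt":
        return target / "Scripts" / "python.exe"
    return target / "bin" / "python"
```

Running the returned interpreter and checking `sys.prefix != sys.base_prefix` confirms it is inside the sandboxed environment; deleting the directory removes everything installed there.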

A sandbox is an isolated testing environment that enables users to run programs or execute files without affecting the application, system, or platform on which they run. Software developers use sandboxes to test new code. Cybersecurity professionals in particular use sandboxes to test potentially malicious software. Without sandboxing, an application or other system process could have unlimited access to all the user data and system resources on a network.
Sandboxes are also used to safely execute malicious code to avoid harming the device on which the code is running, the network, or other connected devices. Using a sandbox to detect malware offers an additional layer of protection against security threats, such as stealthy attacks and exploits that use zero-day vulnerabilities.
The main article is here.