Running a webserver on a virtual machine (VirtualBox) - pros/cons in terms of security - Linux

I want to sharpen my GNU/Linux skills and get a better understanding of how servers work. So I thought I'd set up an Apache webserver with FTP, SSH, SVN, etc. Since I use Adobe products every day in my line of work, installing a Linux distribution directly on my machine isn't an option. Yes, I could probably dual-boot Linux and Vista, but since I am a novice I don't want to risk anything happening to my machine.
So I thought I'd start by installing a distribution with a pretty steep learning curve and a lot of manual configuration, to get as familiar as possible with command-line operations and the like. The goal is to get it working and have a safe setup.
So, before I write a wall of text: I was curious what the pros and cons are, in terms of security, of a setup like this?
Thank you!

None; there is no difference between running the *nix system in a VM and on physical hardware once you give it access to resources.
In the case of the VM, if you don't want it to have access to the host's hard drive, then don't attach the physical hard drive. The same goes for the network and any other resource.
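For example (a minimal sketch, assuming a hypothetical VM named "websrv"), VirtualBox lets you detach resources from the command line:

```bash
# Remove the guest's network adapter entirely, so the VM cannot reach
# the host's LAN or the internet (run while the VM is powered off):
VBoxManage modifyvm "websrv" --nic1 none

# Or keep networking, but confine it to a host-only adapter:
VBoxManage modifyvm "websrv" --nic1 hostonly --hostonlyadapter1 vboxnet0
```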

I am running a bunch of virtual servers on my single server. I'm using OpenVZ but the basic pros and cons are the same.
Pros
I enjoy the fact that I get to experiment a lot. I can install stuff, screw things up royally, and then just wipe out the entire virtual server and start over. It beats re-installing the OS in real life. I can also easily compare and contrast competing products this way. And I'm able to monitor the running system and understand how it works in a more intimate way.
Cons
Resource consumption, which is the reason I chose OpenVZ - it consumes far less than VirtualBox does.
Security-wise, you need to take the same precautions as you would for a real system. The difference is that if your machine is compromised, you can just wipe it out easily.
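For instance (a sketch, assuming a hypothetical container ID 101 and a cached OS template), wiping and rebuilding an OpenVZ container takes only a few commands:

```bash
vzctl stop 101
vzctl destroy 101
vzctl create 101 --ostemplate centos-6-x86_64
vzctl start 101
```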

Related

What is a container? And gVisor?

I am trying to understand what containers are and what their purpose is.
I am a little bit confused. When I started reading about them, I saw that they rely on Linux namespaces (is that true?) - a way to isolate the processes within the container from the other processes on the machine - and I got the impression that their main purpose is security.
For instance, let's say that I own a server that runs multiple services, and I don't want a single hacked service to be able to compromise the whole system. So I put each service inside a container that makes the service unable to interfere with the other processes on the machine - say, to kill them or to play with their memory - and in that way eliminate the risk.
But later I saw other purposes, like being able to ship the app easily, or something like that. So what is their main purpose? I also read that if their main purpose is security, they have a problem: they run directly on the host kernel (again, is that true?), so an exploit like Dirty COW was, or would have been, able to get out of the container and corrupt the machine. That's how I ended up reading about gVisor, which, from what I understand, tries to secure containers, and in some cases succeeds. So what does gVisor do differently that lets it secure containers? Is gVisor a container itself, or just a runtime environment for containers?
Finally, I always see comparisons between containers and VMs, and I ask why - and when should I use each of them?
I don't know whether anything I wrote is correct, and I will be glad if you point out my mistakes and answer my questions. Yes, I know there are a lot of them, and I am sorry - but thanks!
The answer below is not guaranteed to be concise. Anyone is welcome to point out my mistakes.
It might be a little vague, because many of these concepts get mixed together nowadays.
1. LXC
When I first got to know these concepts, "container" still meant LXC, a long-existing technique in Linux. IMHO, a container is a regular process tree that does not simulate a kernel. What distinguishes a container from a normal process is the isolated view it gets via namespaces and cgroups, as if it were running on a fresh operating system. But in fact, containers still share the host kernel (you are right), so people do worry about security, especially when deploying in a public cloud (I don't see people using LXC directly on public clouds yet).
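You can see both properties on a modern Linux box with util-linux's unshare (a sketch; needs root):

```bash
# New PID and mount namespaces; --mount-proc remounts /proc so that
# ps only sees processes inside the namespace.
sudo unshare --pid --fork --mount-proc /bin/bash
ps aux      # just this shell and ps itself
uname -r    # identical to the host: the kernel is shared, not simulated
exit
```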
Despite the potential insecurity, the convenience and light weight of containers (fast boot, small memory footprint) seem to outweigh the drawbacks in most security-insensitive situations. Tools like Docker and Kubernetes make large-scale deployment and management more efficient.
2. Virtual Machine & Hardware-assisted virtualization
In contrast to containers, the concept of a Virtual Machine represents another category of isolated execution environment. Considering that most VMs leverage hardware-acceleration techniques like VT-x, I will assume you are talking about hardware-assisted virtualization. A virtual machine usually contains a full kernel inside it.
See this picture from Doug Chamberlain
The Intel VT-x technique provides two modes, root mode (privileged) and non-root mode (unprivileged). Each mode has its own rings 0-3 (i.e., non-root ring 3, non-root ring 0, root ring 3, root ring 0). The whole virtual machine runs in non-root mode, and the hypervisor (the VMM, e.g., KVM) runs in root mode.
In the classic QEMU+KVM setup, QEMU runs in root ring 3 and KVM runs in root ring 0.
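As a quick illustration (guest-disk.img is a hypothetical disk image), you can check for hardware support and then let QEMU use it:

```bash
# vmx = Intel VT-x, svm = AMD-V; no output means no hardware support
# (or it is disabled in the firmware):
grep -E -o 'vmx|svm' /proc/cpuinfo | sort -u

# With the kvm modules loaded, QEMU runs the guest in non-root mode:
qemu-system-x86_64 -enable-kvm -m 1024 -hda guest-disk.img
```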
The strong isolation and the existence of a guest kernel make virtual machines more secure and more compatible. But, of course, the price is performance and efficiency (slower boot, etc.).
3. Container-based Virtualization
People want the isolation of hardware-assisted virtualization but don't want to give up the convenience of containers, so hybrid solutions were an intuitive next step.
There are two typical solutions at present: Kata Containers and gVisor.
Kata Containers tries to slim down the whole virtual-machine stack to make it more lightweight. However, there is still a Linux kernel inside it, and it is still a virtual machine - just a more lightweight one.
gVisor claims to be a secure container runtime, but it still leverages hardware virtualization techniques (or ptrace, if you don't want virtualization). It has a component called Sentry, which runs both in non-root ring 0 and in root ring 3. Sentry does part of the guest kernel's job but is much smaller than Linux. If Sentry cannot handle a request itself, it proxies the request down to the host kernel.
The reason most people believe gVisor is somewhat more secure is that it achieves "defense in depth" - more layers of indirection lead people to believe it is more secure. This is usually true but, again, it is not a guarantee.
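To make this concrete (assuming gVisor's runsc has been installed and registered with Docker as a runtime named "runsc", per the gVisor setup docs):

```bash
# Run a container under gVisor instead of the default runc:
docker run --rm --runtime=runsc alpine uname -r
# Sentry, not the host kernel, answers the uname() syscall, so the
# version printed is whatever gVisor reports rather than the host's.
```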

How does RunKit make its virtual servers?

There are many websites providing cloud coding, such as Cloud9 and repl.it. They must use server virtualisation technologies. For example, Cloud9's workspaces are powered by Docker Ubuntu containers; every workspace is a fully self-contained VM (see details).
I would like to know if there are other technologies for making sandboxed environments. For example, RunKit seems to have a lightweight solution:
It runs a completely standard copy of Node.js on a virtual server created just for you. Every one of npm's 300,000+ packages are pre-installed, so try it out
Does anyone know how RunKit achieves this?
You can see more in "Tonic is now RunKit - A Part of Stripe!" (see discussion):
we attacked the problem of time traveling debugging not at the application level, but directly on the OS by using the bleeding edge virtualization tools of CRIU on top of Docker.
The details are in "Time Traveling in Node.js Notebooks"
we were able to take a different approach thanks to an ambitious open source project called CRIU (which stands for checkpoint and restore in user space).
The name says it all. CRIU aims to give you the same checkpointing capability for a process tree that virtual machines give you for an entire computer.
This is no small task: CRIU incorporates a lot of lessons learned from earlier attempts at similar functionality, and years of discussion and work with the Linux kernel team. The most common use case of CRIU is to allow migrating containers from one computer to another.
The next step was to get CRIU working well with Docker.
Part of that setup has been open-sourced, as mentioned in this HackerNews thread:
It uses Linux containers, currently powered by Docker.
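To get a feel for the CRIU-plus-Docker combination, Docker's experimental checkpoint commands (shown with a hypothetical container named "mycontainer") wrap CRIU directly:

```bash
# Freeze the container's entire process tree to disk...
docker checkpoint create mycontainer cp1
# ...and later resume it exactly where it left off.
docker start --checkpoint cp1 mycontainer
```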

Free Linux Cluster Build for Small-Scale Research

I need to build a small cluster for my research. It's pretty humble, and I'd like to build the cluster with just my other 3 laptops at home.
I'm writing in C++. My code in the MPI framework is ready; I can simulate it using Visual Studio 2010 and it works fine. Now I want to see the real thing.
I want to do it for free (I'm a student). I have Ubuntu installed, and I wonder:
whether I could build a cluster using Ubuntu. I couldn't find a clear answer to that on the net.
if not, whether there is a free Linux distro that I can use for building the cluster?
I also wonder whether I have to install Ubuntu, or the host machine's distro, on all the other laptops. Will another Linux distribution (like openSUSE) work with the one on the host machine, or do all of them have to be the same distro?
Thank you all.
In principle, any Linux distro will work in the cluster, and, also in principle, they can all be different distros. In practice, it'll be enormously easier with everything the same, and if you get a distribution which already has a lot of your tools set up for you, it'll go much more quickly.
To get started, something like the Bootable Cluster CD should be fairly easy - I've not used it myself yet, but I know some who have. It'll let you boot up a small cluster without overwriting anything on the host computer, which lets you get started very easily. Other distributions of software for clusters include Rocks and OSCAR. A technical discussion on building a cluster can be found here.
I also liked PelicanHPC when I used it a few years back. I was more successful getting it to work than with Rocks, but it is much less popular.
http://pareto.uab.es/mcreel/PelicanHPC/
Just getting a cluster up and running is actually not very difficult in the situation you're describing. Getting everything installed and configured just how you want it, though, can be quite challenging. Good luck!
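On the MPI side, a minimal Open MPI sketch (assuming Open MPI is installed on every laptop, passwordless SSH works between them, and the binary sits at the same path on each; addresses and file names are illustrative):

```bash
# hosts: one line per laptop
cat > hosts <<'EOF'
192.168.1.10 slots=2
192.168.1.11 slots=2
192.168.1.12 slots=2
EOF

mpic++ -o simulation simulation.cpp        # compile the existing MPI code
mpirun --hostfile hosts -np 6 ./simulation # spread 6 ranks across the laptops
```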

Use "apt" or compile from scratch for a web service?

For the first time, I am writing a web service that will call upon external programs to process requests in batch. The front-end will accept file uploads and then place them in a queue. The workers on the backend will take that file, run it through ffmpeg and the rest of my pipeline, and send an email when the process is complete.
I have my backend process working on my computer (Ubuntu 10.04). The question is: should I try to re-create that pipeline using binaries that I've compiled from scratch? Or is it okay to use apt when configuring in The Real World?
Not all hosting services use Ubuntu, and not all give me root access. (I haven't chosen a host yet.) However, they will let me upload binaries to execute, and many give me shell access with gcc.
Usually this would be a no-brainer and I'd compile it all from scratch. But doing so - not to mention trying to figure out how to create a platform-independent .tar.gz binary - would be quite a task which ultimately doesn't really help me ship my product.
Do you have any thoughts on the best way to set up my stack so that I'm not tied to a specific hosting provider? Should I try creating my own .deb, which contains Ubuntu's version of ffmpeg (and the other tools) with the configurations I need?
Short of a setup where I manage my own servers/VMs (which may very well be what I have to do), how might I accomplish this?
The question is: should I try to re-create that pipeline using binaries that I've compiled from scratch? Or is it okay to use apt when configuring in The Real World?
It's the reverse: it is not okay to deploy unpackaged software in The Real World, IMHO.
and not all give me root access
How would you deploy a .deb without root access - chroot jails?
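For what it's worth, a .deb can be unpacked without root - a sketch with illustrative paths:

```bash
# dpkg-deb extracts a package anywhere without touching dpkg's database:
dpkg-deb -x ffmpeg_*.deb "$HOME/local"
export PATH="$HOME/local/usr/bin:$PATH"
export LD_LIBRARY_PATH="$HOME/local/usr/lib:$LD_LIBRARY_PATH"
```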
But doing so - not to mention trying to figure out how to create a platform-independent .tar.gz binary - will be quite a task which ultimately doesn't really help me ship my product.
+1 - you answer your own question. Don't meddle unless you have to.
Do you have any thoughts on the best way to set up my stack so that I'm not tied to a specific hosting provider?
Only depend on well-packaged standard libs (such as ffmpeg); otherwise, include them in your own deployment. This problem hasn't been too hard to solve for tens of thousands of Linux applications over decades now, so it will probably be feasible for you too.
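If you do end up rolling your own package, a minimal sketch (the package name "mypipeline" is hypothetical):

```bash
mkdir -p mypipeline_1.0/DEBIAN mypipeline_1.0/usr/local/bin
cp ffmpeg mypipeline_1.0/usr/local/bin/    # your compiled binaries
cat > mypipeline_1.0/DEBIAN/control <<'EOF'
Package: mypipeline
Version: 1.0
Architecture: amd64
Maintainer: You <you@example.com>
Description: ffmpeg batch-encoding pipeline
EOF
dpkg-deb --build mypipeline_1.0            # produces mypipeline_1.0.deb
```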
Out of the box:
Look at RightScale and other cloud providers/agents that have specialized images/toolchains, especially for video encoding.
A 'regular' VPS provider (with Xen or Virtuozzo) will not normally be happy with these kinds of workloads, but EC2, Rackspace and their lot will be absolutely fine with them.
In general, I wouldn't expect a cloud infrastructure provider that doesn't grant root access to allow computationally intensive workloads. $0.02

Running external code in a restricted environment (linux)

For reasons beyond the scope of this post, I want to run external (user-submitted) code, similar to the Computer Language Benchmarks Game. Obviously this needs to be done in a restricted environment. Here are my restriction requirements:
Can only read/write to the current working directory (which will be a large tempdir)
No external access (internet, etc)
Anything else I probably don't care about (e.g., processor/memory usage, etc).
I myself have several restrictions. A solution which uses standard *nix functionality (specifically RHEL 5.x) would be preferred, since then I could use our cluster for the backend. It is also difficult to get software installed there, so something in the base distribution would be optimal.
Now, the questions:
Can this even be done with externally compiled binaries? It seems like it could be possible, but also like it could just be hopeless.
What if we force the code itself to be submitted, and we compile it ourselves? Does that make the problem easier or harder?
Should I just give up on home directory protection, and use a VM/rollback? What about blocking external communication (isn't the VM usually talked to over a bridged LAN connection?)
Something I missed?
Possibly useful ideas:
rssh. Doesn't help with compiled code, though.
Using a VM with rollback after the code finishes (can the network be configured so there is a local bridge but no WAN bridge? - see the sketch below). Doesn't work on the cluster.
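A sketch of that VM idea with VirtualBox (the VM name "sandbox" is hypothetical; host-only networking gives the host a private link to the guest with no route to the WAN):

```bash
VBoxManage modifyvm "sandbox" --nic1 hostonly --hostonlyadapter1 vboxnet0
VBoxManage snapshot "sandbox" take clean
VBoxManage startvm "sandbox" --type headless
# ... push the submitted code in, run it, collect results ...
VBoxManage controlvm "sandbox" poweroff
VBoxManage snapshot "sandbox" restore clean   # roll back every change
```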
I would examine and evaluate both a VM and a special SELinux context.
I don't think you'll be able to do what you need with simple file system protection, because you won't be able to prevent access to syscalls that allow access to the network, etc. You can probably use AppArmor to do what you need, though. AppArmor works in the kernel (as a Linux security module) and confines the foreign binary rather than virtualizing it.
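A sketch of what such an AppArmor confinement could look like (paths are hypothetical; note that RHEL ships SELinux rather than AppArmor, so on RHEL 5.x the equivalent would be a custom SELinux policy module):

```bash
# Write a profile confining /opt/sandbox/run-job to its working
# directory, with no network access, then load it:
sudo tee /etc/apparmor.d/opt.sandbox.run-job >/dev/null <<'EOF'
/opt/sandbox/run-job {
  #include <abstractions/base>
  deny network,
  /opt/sandbox/run-job mr,
  /opt/sandbox/work/** rw,
}
EOF
sudo apparmor_parser -r /etc/apparmor.d/opt.sandbox.run-job
```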
