Sandboxing thread

Sandboxing thread - sandbox

I want to create thread or process that would have its own virtual address space (It would probably would have to be separate process) without system libraries in the address space. My goal is to create an execution environment for foreign origin code.
I would like to create a thread with no system libraries, just few executable pages where user's code would be copied and the thread entry point would be placed and also few RW pages for stack and data exchange with main thread.
Is it possible to completely unmap all system libraries on windows (or possibly Linux) from virtual memory from application level?

Unmapping system libraries will not prevent the binary from performing system calls by itself. To catch all operations which you are trying to prevent, some form of binary translation is necessary. You might want to have a look at libdetox and fastBT (Google Tech Talk about fastBT)

Depending on what you want to achieve, it might be easier to run the foreign code within a User-Mode Linux, qemu, VMware or other virtualization solution (using a fresh copy of the virtual hard disk for each run, not providing any network interfaces, etc.).

Related

Linux's system calls for GUI? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 1 year ago.
Improve this question
I'm studying Operating Systems. I read Window have lots of system calls for manage windows and GUI components. I have read you can change the GUI manager of your Linux Operating System. Then does Linux have system calls for GUI managements? How GUI works in Linux?

I'll take x86 as an example as I am more aware of x86 stuff than ARM stuff. Also, I may get some information wrong as I've been doing some research on this question while answering. Feel free to correct me if I am wrong.
System booting
Some time ago, Linux used to boot with a legacy bootloader (GRUB legacy version). The GRUB bootloader would be started by the BIOS at 0x7c00 in RAM and then would read the kernel from the hard-disk. It would follow the multiboot specification. The multiboot specification mentions the state that the computer needs to be in before jumping to the kernel's entry point. The kernel would then launch a first process (init) that every process would be a child of.
Today, most Linux distributions boot with UEFI (with the option of legacy booting also available). A UEFI app is placed on the boot partition partionned as a GPT ESP (EFI System Partition). This EFI app is launched and then follows the Linux Boot Protocol to launch Linux. The init process was also replaced by systemd. Linux will thus launch systemd as the first process of the computer. Actually, as stated on the manpage for systemd:
systemd is usually not invoked directly by the user, but is
installed as the /sbin/init symlink and started during early
boot.
The process that will be started is thus /sbin/init but it is a symlink to systemd. The systemd process will then read several configuration files on the hard-disk called units. These units are often targets which specify several units to read. Targets are thus units which specify several units to read. At first systemd will read default.target which specifies several other units. Some of these other units will start some processes among which is the Display Manager (fancy terminology which means login prompt). Recently, Ubuntu starts the Gnome Display Manager (GDM) as the first displaying program (gdm.service unit). This program will start the X server before presenting the user login screen (https://en.wikipedia.org/wiki/X_display_manager).
When the display manager runs on the user's computer, it starts the X server before presenting the user the login screen, optionally repeating when the user logs out.
Once logged in, GDM will start several other binaries responsible to let you interact with the system (the actual desktop, a binary to gather input for this desktop, etc). All of these components depend on the X server to work properly.
The DRM
The X server is a user program which makes extensive use of the Direct Rendering Manager (DRM) of the Linux kernel. The DRM is a system call interface which is used to interact with graphics cards. When the DRM detects a graphics card, it exposes a file like /dev/dri/card0 which is a character device (http://manpages.ubuntu.com/manpages/bionic/man7/drm.7.html).
In earlier days, the kernel framework was solely used to provide raw hardware access to
priviledged user-space processes which implement all the hardware abstraction layers. But
more and more tasks were moved into the kernel. All these interfaces are based on ioctl(2)
commands on the DRM character device. The libdrm library provides wrappers for these
system-calls and many helpers to simplify the API.
When a GPU is detected, the DRM system loads a driver for the detected hardware type. Each
connected GPU is then presented to user-space via a character-device that is usually
available as /dev/dri/card0 and can be accessed with open(2) and close(2). However, it
still depends on the grapics driver which interfaces are available on these devices. If an
interface is not available, the syscalls will fail with EINVAL.
The ioctl call allows to have any number of operations on the /dev/dri/card0 file since it is a general call which includes a request argument which is simply an unsigned long. It also takes a variable amount of arguments (see https://man7.org/linux/man-pages/man2/ioctl.2.html).
The ioctl call thus allows hardware vendors (like NVIDIA, AMD, etc) to provide drivers for their cards with the general ioctl call used as a general interface between user mode and kernel mode.
OpenGL
There exists several 3D rendering APIs available (OpenGL, Direct3D). OpenGL is mostly a set of C headers and a convention. The convention says what a certain call should do. It is up to the hardware vendor to implement the convention for their own card. Mesa3D has been an attempt to create an open source implementation of OpenGL for certain graphics cards. It worked quite well for integrated Intel HD Graphics (since documentation is open) and AMD (since they cooperated and offered some insight into the workings of their cards), but not for NVIDIA (the Nouveau driver is mostly not working or slow).
When you program some OpenGL, you include the OpenGL headers and link with libraries provided by hardware vendors which provide the definitions of the functions in the headers. These definitions make use of the DRM and cooperate with the X server to show content on the screen.

I'm studying Operating Systems. I read Window have lots of system calls for manage windows and GUI components. I have read you can change the GUI manager of your Linux Operating System. Then does Linux have system calls for GUI managements? How GUI works in Linux?
System calls (provided by the kernel) are often buried (e.g. in some cases deliberately undocumented and proprietary) and should not be used. Almost everything you see are actually normal functions in dynamically linked libraries/shared libraries. This allows the kernel's system calls to be radically changed without breaking everything (because everything only depends on the dynamically linked libraries/shared libraries); and reduces the functionality needed in the kernel itself.
For an example; most of the "system calls for managing windows and GUI components" you think Windows has could (internally, inside the relevant DLL) just end up using a single "send_message()" system call (to tell a different process, the GUI, that you want to create a window or change its position or ...).
For Linux it's roughly similar. The kernel's system calls (which actually are documented, for no sane reason - it goes against the spirit of SYS-V specs and means badly written "linux executables" aren't compatible with other Unix clones like FreeBSD or Solaris or OSX) exist to use things like low level memory management and raw file IO and sockets; but (like Windows) the kernel's system calls are buried under layers of shared libraries, and those shared libraries (e.g. like Xlib, GLib, KWindowSystem, Qt, ...) just use "something" (file IO, pipes, sockets, ...) provided by kernel to talk to another process (display server, GUI, ..).

Linux and Windows fall under separate categories; Linux is just a kernel, i.e. the piece under the hood that gives us the basic functionality we expect to run programs, like threads, memory and process management, etc. Windows is a full operating system, including the user facing components and numerous system libraries. An apter comparison would be a specific Linux distro and Windows.
On that note, distros, as independent operating systems, obviously can have different implementations of any OS component. Some distros, like Arch, don't come with a GUI by default at all. That said, essentially the entire Linux ecosystem uses Xorg and/or Wayland; I would recommend looking into the implementation details of those two.

A Linux GUI has quite a few differences compared to Windows GUI. For example, the GUI is not considered to be a part of the operating system, but rather an external part of it; that means no syscalls (not embedded whatsoever in the OS). After all, like the previous answer says, Linux is a kernel, that means it's only something really basic (allows execution of programs, memory/threads management, processes management, but not really much more). Whatever comes next (GUI, for example) are added features using packages.
This allows, for example, installing a GUI on top of a minimal installation of any Linux distro (CentOS, for example), and that GUI can be the one you want (Gnome, KDE...).

Is it possible to circumvent OS security by not using the supplied System Calls?

I understand that an Operating System forces security policies on users when they use the system and filesystem via the System Calls supplied by stated OS.
Is it possible to circumvent this security by implementing your own hardware instructions instead of making use of the supplied System Call Interface of the OS? Even writing a single bit to a file where you normally have no access to would be enough.

First, for simplicity, I'm considering the OS and Kernel are the same thing.
A CPU can be in different modes when executing code.
Lets say a hypothetical CPU has just two modes of execution (Supervisor and User)
When in Supervisor mode, you are allowed to execute any instructions, and you have full access to the hardware resources.
When in User mode, there is subset of instructions you don't have access to, such has instructions to deal with hardware or change the CPU mode. Trying to execute one of those instructions will cause the OS to be notified your application is misbehaving, and it will be terminated. This notification is done through interrupts. Also, when in User mode, you will only have access to a portion of the memory, so your application can't even touch memory it is not supposed to.
Now, the trick for this to work is that while in Supervisor Mode, you can switch to User Mode, since it's a less privileged mode, but while in User Mode, you can't go back to Supervisor Mode, since the instructions for that are not permitted anymore.
The only way to go back to Supervisor mode is through system calls, or interrupts. That enables the OS to have full control of the hardware.
A possible example how everything fits together for this hypothetical CPU:
The CPU boots in Supervisor mode
Since the CPU starts in Supervisor Mode, the first thing to run has access to the full system. This is the OS.
The OS setups the hardware anyway it wants, memory protections, etc.
The OS launches any application you want after configuring permissions for that application. Launching the application switches to User Mode.
The application is running, and only has access to the resources the OS allowed when launching it. Any access to hardware resources need to go through System Calls.
I've only explained the flow for a single application.
As a bonus to help you understand how this fits together with several applications running, a simplified view of how preemptive multitasking works:
In a real-world situation. The OS will setup an hardware timer before launching any applications.
When this timer expires, it causes the CPU to interrupt whatever it was doing (e.g: Running an application), switch to Supervisor Mode and execute code at a predetermined location, which belongs to the OS and applications don't have access to.
Since we're back into Supervisor Mode and running OS code, the OS now picks the next application to run, setups any required permissions, switches to User Mode and resumes that application.
This timer interrupts are how you get the illusion of multitasking. The OS keeps changing between applications quickly.
The bottom line here is that unless there are bugs in the OS (or the hardware design), the only way an application can go from User Mode to Supervisor Mode is through the OS itself with a System Call.
This is the mechanism I use in my hobby project (a virtual computer) https://github.com/ruifig/G4DevKit.

HW devices are connected to CPU trough bus, and CPU does use to communicate with them in/out instructions to read/write values at I/O ports (not used with current HW too much, in early age of home computers this was the common way), or a part of device memory is "mapped" into CPU address space, and CPU controls the device by writing values at defined locations in that shared memory.
All of this should be not accessible at "user level" context, where common applications are executed by OS (so application trying to write to that shared device memory would crash on illegal memory access, actually that piece of memory is usually not even mapped into user space, ie. not existing from user application point of view). Direct in/out instructions are blocked too at CPU level.
The device is controlled by the driver code, which is either run is specially configured user-level context, which has the particular ports and memory mapped (micro-kernel model, where drivers are not part of kernel, like OS MINIX). This architecture is more robust (crash in driver can't take down kernel, kernel can isolate problematic driver and restart it, or just kill it completely), but the context switches between kernel and user level are a very costly operation, so the throughput of data is hurt a bit.
Or the device drivers code runs on kernel-level (monolithic kernel model like Linux), so any vulnerability in driver code can attack the kernel directly (still not trivial, but lot more easier than trying to get tunnel out of user context trough some kernel bug). But the overall performance of I/O is better (especially with devices like graphics cards or RAID disc clusters, where the data bandwidth goes into GiBs per second). For example this is the reason why early USB drivers are such huge security risk, as they tend to be bugged a lot, so a specially crafted USB device can execute some rogue code from device in kernel-level context.
So, as Hyd already answered, under ordinary circumstances, when everything works as it should, user-level application should be not able to emit single bit outside of it's user sandbox, and suspicious behaviour outside of system calls will be either ignored, or crash the app.
If you find a way to break this rule, it's security vulnerability and those get usually patched ASAP, when the OS vendor gets notified about it.
Although some of the current problems are difficult to patch. For example "row hammering" of current DRAM chips can't be fixed at SW (OS) or CPU (configuration/firmware flash) level at all! Most of the current PC HW is vulnerable to this kind of attack.
Or in mobile world the devices are using the radiochips which are based on legacy designs, with closed source firmware developed years ago, so if you have enough resources to pay for a research on these, it's very likely you would be able to seize any particular device by fake BTS station sending malicious radio signal to the target device.
Etc... it's constant war between vendors with security researchers to patch all vulnerabilities, and hackers to find ideally zero day exploit, or at least picking up users who don't patch their devices/SW fast enough with known bugs.

Not normally. If it is possible it is because of an operating system software error. If the software error is discovered it is fixed fast as it is considered to be a software vulnerability, which equals bad news.

"System" calls execute at a higher processor level than the application: generally kernel mode (but system systems have multiple system level modes).
What you see as a "system" call is actually just a wrapper that sets up registers then triggers a Change Mode Exception of some kind (the method is system specific). The system exception hander dispatches to the appropriate system server.
You cannot just write your own function and do bad things. True, sometimes people find bugs that allow circumventing the system protections. As a general principle, you cannot access devices unless you do it through the system services.

What Really Protects File Priveleges?

In Windows for instance, and all operating systems, file priveleges exist that "prevent" a file from being written to if that rule is set.
This is hard to describe but please listen. People coding in a C language obviously would use some form of framework to easily modify a file. Using the built-in .Net framework, Microsoft obviously would put prevention into their classes checking file permissions before writing to a file. Since file permissions are stored via software and not hardware, what really prevents a file from being tampered with?
Let's hop over to Assembly. Suppose I create an Assembly program that directly accesses hard drive data and changes the bytes of a file. How could file permissions possibly prevent me from doing this? I guess what I am trying to ask is how a file permission really stays secure if the compiled program does not check for file permissions before writing to a file?

Suppose I create an Assembly program that directly accesses hard drive data and changes the bytes of a file. How could file permissions possibly prevent me from doing this?
If you write in assembly, your assembly is still run in a CPU mode that prevents direct access to memory and devices.
CPU modes … place restrictions on the type and scope of operations that can be performed by certain processes being run by the CPU. This design allows the operating system to run with more privileges than application software.
Your code still needs to issue system calls to get the OS to interact with memory not owned by your process and devices.
a system call is how a program requests a service from an operating system's kernel. This may include hardware related services (e.g. accessing the hard disk), creating and executing new processes, and communicating with integral kernel services (like scheduling).
The OS maintains security by monopolizing the ability to switch CPU modes and by crafting system calls so that they are safe for user-land code to initiate.

Can containers be implemented purely in userspace?

There are a bunch of container mechanisms for Linux now: LXC, Docker, lmctfy, OpenVZ, Linux-VServer, etc. All of these either involve kernel patches or recently added Linux features like cgroups and seccomp.
I'm wondering if it would be possible to implement similar (OS-level) virtualization purely in userspace.
There's already a precedent for this - User Mode Linux. However, it also requires special kernel features to be reasonably fast and secure. Also, it is literally a Linux kernel running in userspace, which makes networking setup rather difficult.
I'm thinking more along the lines of a process that would act as an intermediary between spawned programs and the Linux kernel. You would start the process with the programs to spawn as arguments; it would track system calls they made, and block or redirect attempts to access the real root filesystem, real network devices, etc. without itself relying on special kernel features.
Is such a thing possible to implement securely, and in a way that could be invoked effectively by a limited user (i.e. not privileged like chroot)?
In summary: would a pure userspace implementation of something like LXC be possible? If yes, what would the penalties be for doing it in userspace? If no, why not?

Surprisingly it turns out the answer is "yes": this is what systrace and sysjail do.
http://sysjail.bsd.lv/
And they are also inherently insecure on modern operating systems.
http://www.watson.org/~robert/2007woot/
So if you want proper sandboxing, it has to be done in kernel space.

what tool for debugging a linux kernel?

I am new to linux kernel.
wandering how to browse the complete flow, right from the power up of CPU.
Basic idea on BIOS/ROM code.
can I have some tool to debug the complete kernel ?
or
raw code browsing is preferable ?

The following tools may help you to debug Linux kernel
Dynamic Probes is one of the popular debugging tool for Linux which developed by IBM. This tool allows the placement of a “probe” at almost any place in the system, in both user and kernel space. The probe consists of some code (written in a specialized, stack-oriented language) that is executed when control hits the given point. Resources regarding dprobes / kprobes listed below
http://www-01.ibm.com/support/knowledgecenter/linuxonibm/liaax/dprobesltt.pdf
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.107.6212&rep=rep1&type=pdf
https://www.redhat.com/magazine/005mar05/features/kprobes/
https://sourceware.org/systemtap/kprobes/
http://www.ibm.com/developerworks/library/l-kprobes/index.html
https://doc.opensuse.org/documentation/html/openSUSE_121/opensuse-tuning/cha.tuning.kprobes.html
Linux Trace Toolkit is a kernel patch and a set of related utilities that allow the tracing of events in the kernel. The trace includes timing information and can create a reasonably complete picture of what happened over a given period of time. Resources of LTT, LTT Viewer and LTT Next Generation
http://elinux.org/Linux_Trace_Toolkit
http://www.linuxjournal.com/article/3829
http://multivax.blogspot.com/2010/11/introduction-to-linux-tracing-toolkit.html
MEMWATCH is an open source memory error detection tool. It works by defining MEMWATCH in gcc statement and by adding a header file to our code. Through this we can track memory leaks and memory corruptions. Resources regarding MEMWATCH
http://www.linuxjournal.com/article/6059
ftrace is a good tracing framework for Linux kernel. ftrace traces internal operations of the kernel. This tool included in the Linux kernel in 2.6.27. With its various tracer plugins, ftrace can be targeted at different static tracepoints, such as scheduling events, interrupts, memory-mapped I/O, CPU power state transitions, and operations related to file systems and virtualization. Also, dynamic tracking of kernel function calls is available, optionally restrictable to a subset of functions by using globs, and with the possibility to generate call graphs and provide stack usage. You can find a good tutorial of ftrace at https://events.linuxfoundation.org/slides/2010/linuxcon_japan/linuxcon_jp2010_rostedt.pdf
ltrace is a debugging utility in Linux, used to display the calls a user space application makes to shared libraries. This tool can be used to trace any dynamic library function call. It intercepts and records the dynamic library calls which are called by the executed process and the signals which are received by that process. It can also intercept and print the system calls executed by the program.
http://www.ellexus.com/getting-started-with-ltrace-how-does-it-do-that/?doing_wp_cron=1425295977.1327838897705078125000
http://developerblog.redhat.com/2014/07/10/ltrace-for-rhel-6-and-7/
KDB is the in-kernel debugger of the Linux kernel. KDB follows simplistic shell-style interface. We can use it to inspect memory, registers, process lists, dmesg, and even set breakpoints to stop in a certain location. Through KDB we can set breakpoints and execute some basic kernel run control (Although KDB is not source level debugger). Several handy resources regarding KDB
http://www.drdobbs.com/open-source/linux-kernel-debugging/184406318
http://elinux.org/KDB
http://dev.man-online.org/man1/kdb/
https://www.kernel.org/pub/linux/kernel/people/jwessel/kdb/usingKDB.html
KGDB is intended to be used as a source level debugger for the Linux kernel. It is used along with gdb to debug a Linux kernel. Two machines are required for using kgdb. One of these machines is a development machine and the other is the target machine. The kernel to be debugged runs on the target machine. The expectation is that gdb can be used to "break in" to the kernel to inspect memory, variables and look through call stack information similar to the way an application developer would use gdb to debug an application. It is possible to place breakpoints in kernel code and perform some limited execution stepping. Several handy resources regarding KGDB
http://landley.net/kdocs/Documentation/DocBook/xhtml-nochunks/kgdb.html

First, see related question Linux kernel live debugging, how it's done and what tools are used?. Try to use KDB or Ftrace.

If your intention is understanding whole flow of Linux kernel, running Linux kernel on QEMU can be easy way to learn how Linux works. Esp. you can emulate many CPU types without real H/W. or how about user mode Linux?
This document can be helpful to debug kernel on QEMU.

Just adding, the Linux kernel is not very suitable for debugging. Linus Torvalds once stated that he's againts supportng kernel debugging in Linux because it leads to badly written code.
I used kdbg, however I didn't find it very useful, what I suggest is to debug the kernel the oldschool way, using printk.

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string