How do function detour packages circumvent security?

I am looking at some code that uses a function detour package called DetourXS. My application targets Microsoft Server operating systems. Microsoft Research also has a Detours package, and they have an article on how it works: they patch the machine code that is loaded in memory and insert an unconditional jump into the newly injected code.
If this code works by modifying the machine code at run time, it should face security restrictions from the operating system. This would be a serious security lapse in the OS, as I could modify any critical DLL like kernel32 to do anything I want. My understanding is that if a user process attempts to modify the code of a DLL already loaded into memory, it should be stopped by the OS. Is there a setting in Windows Server to enable/disable this check?
How do they overcome this?
Does anybody have experience using this kind of detour package in an application in an enterprise production environment?

The first important thing is that with detours you modify the instructions of your own process. Within your own process you can do whatever you want anyway, and you don't even have to detour anything: from the OS's point of view, userspace code in your DLL and userspace code in the system's kernel32.dll loaded into your process are exactly the same for security purposes. The reason is simple: changing some code of your own process in userspace is not the same as breaching the security of the OS. The OS secures itself by not giving your process too much power in the first place (unless you're running as an administrator). For security purposes it is more interesting how to modify the code of another process (to obtain its privileges or interesting data, like passwords), but that is a subject for a different discussion.
Secondly, detouring has been considered as a way to implement hot-patching, i.e. patching critical system services on the fly, without a reboot. I don't know whether it is currently used/supported by Microsoft (according to Google it is), but it is not by chance that the standard prolog is 5 bytes long (yes, exactly enough room for your jmp instruction).
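To make the mechanism concrete, here is a minimal sketch of the classic 5-byte patch on 32-bit x86, assuming target begins with a standard prolog and hook is your replacement function. A real detour library also builds a trampoline so the original can still be called; that part is omitted here.

    #include <windows.h>
    #include <string.h>

    /* Overwrite the first 5 bytes of `target` with `jmp hook` (x86 sketch).
       No trampoline is built, so the original becomes uncallable. */
    static BOOL install_jmp_detour(void *target, void *hook)
    {
        DWORD old;
        /* Loaded code pages are read/execute; make our (copy-on-write,
           process-private) copy writable first. */
        if (!VirtualProtect(target, 5, PAGE_EXECUTE_READWRITE, &old))
            return FALSE;

        /* E9 <rel32>: unconditional jump, relative to the next instruction. */
        unsigned char patch[5];
        INT32 rel = (INT32)((unsigned char *)hook - ((unsigned char *)target + 5));
        patch[0] = 0xE9;
        memcpy(patch + 1, &rel, sizeof rel);
        memcpy(target, patch, sizeof patch);

        VirtualProtect(target, 5, old, &old);
        FlushInstructionCache(GetCurrentProcess(), target, 5);
        return TRUE;
    }

Note that everything here happens through ordinary user-mode APIs acting on the process's own address space, which is exactly why the OS has no reason to object.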

Despite its name, kernel32.dll is not actually the kernel; it contains the entry points to Win32 API functions that implement Win32's interface to NT system calls.
Furthermore, Windows (like any modern OS) supports copy-on-write, so if you use detours to patch a shared DLL like kernel32.dll, the OS will first make a private copy for your process and patch that. The NT kernel itself is perfectly safe.

Related

How do you detect the unloading of a dynamic library in the current process?

On Windows, we can register a callback function through LdrDllNotification, so that when any DLL is about to be unloaded, we can get a chance to collect some useful information about that DLL, including its address, size, etc.
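For reference, here is roughly what the Windows side looks like; a minimal sketch using LdrRegisterDllNotification resolved from ntdll. The documented notification data is a union of load/unload variants with identical fields, flattened here for brevity, and error handling is omitted.

    #include <windows.h>
    #include <winternl.h>
    #include <stdio.h>

    #define LDR_DLL_NOTIFICATION_REASON_UNLOADED 2

    typedef struct _LDR_DLL_NOTIFICATION_DATA {
        ULONG Flags;
        const UNICODE_STRING *FullDllName;
        const UNICODE_STRING *BaseDllName;
        PVOID DllBase;
        ULONG SizeOfImage;
    } LDR_DLL_NOTIFICATION_DATA;

    typedef VOID (CALLBACK *PLDR_DLL_NOTIFICATION_FUNCTION)(
        ULONG Reason, const LDR_DLL_NOTIFICATION_DATA *Data, PVOID Context);

    typedef NTSTATUS (NTAPI *LdrRegisterDllNotification_t)(
        ULONG Flags, PLDR_DLL_NOTIFICATION_FUNCTION Fn, PVOID Ctx, PVOID *Cookie);

    static VOID CALLBACK OnDll(ULONG Reason,
                               const LDR_DLL_NOTIFICATION_DATA *Data, PVOID Ctx)
    {
        if (Reason == LDR_DLL_NOTIFICATION_REASON_UNLOADED)
            wprintf(L"unloading %.*ls: base=%p size=%lu\n",
                    (int)(Data->BaseDllName->Length / sizeof(WCHAR)),
                    Data->BaseDllName->Buffer, Data->DllBase, Data->SizeOfImage);
    }

    int main(void)
    {
        /* The function is only reachable via GetProcAddress on ntdll. */
        LdrRegisterDllNotification_t reg = (LdrRegisterDllNotification_t)
            GetProcAddress(GetModuleHandleW(L"ntdll.dll"),
                           "LdrRegisterDllNotification");
        PVOID cookie = NULL;
        if (reg && reg(0, OnDll, NULL, &cookie) == 0) {
            HMODULE h = LoadLibraryW(L"winhttp.dll"); /* any not-yet-loaded DLL */
            FreeLibrary(h);                           /* triggers the callback */
        }
        return 0;
    }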
I don’t know enough about UNIX-like platforms (including Linux, macOS, iOS, Android, etc.). How can I do the same on these platforms?
On Linux, libraries are loaded and unloaded by the dynamic loader. Loading is usually done automatically by the loader itself when needed, but can also be done manually using the library function dlopen(). Unloading is done manually through dlclose(), or automatically at program exit. This is not universally applicable to every Unix system, only to POSIX-compliant ones.
Unfortunately, since (unlike on Windows) the job of loading and unloading libraries is carried out by the dynamic loader, which is just a userspace library, the kernel does not know what is going on and does not provide any mechanism for detecting the loading or unloading of dynamic libraries. Even more unfortunately, the loader itself does not provide any such mechanism either. Indeed, there would probably be little to no use for such functionality, apart from usage metrics or similar things.
You normally don't even unload libraries at runtime on Linux, unless we are talking about a very complex piece of software that needs to limit its memory footprint while running. The loader loads the libraries when needed at startup (or when the first needed library function is called), and then they are left, like any other piece of memory, for the kernel to clean up when the program dies.
To my knowledge, the "best" you can do on Linux to detect the unloading of dynamic libraries is:
Create your own shared library implementing dlclose() and preload it when running executables (a minimal sketch follows this list). This is explained here and here.
Continuously watch /proc/self/maps, parsing the list of loaded libraries.
Run the program under a debugger like gdb, which can be scripted/automated using Python.
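Here is a minimal sketch of the first option, assuming glibc; the file name dlclose_spy.c and the log format are illustrative:

    /* dlclose_spy.c: LD_PRELOAD shim that logs every dlclose() call.
       Build: gcc -shared -fPIC -o dlclose_spy.so dlclose_spy.c -ldl
       Run:   LD_PRELOAD=./dlclose_spy.so ./some_program */
    #define _GNU_SOURCE
    #include <dlfcn.h>
    #include <link.h>
    #include <stdio.h>

    int dlclose(void *handle)
    {
        /* Find the real dlclose() that our definition shadows. */
        int (*real_dlclose)(void *) =
            (int (*)(void *))dlsym(RTLD_NEXT, "dlclose");

        /* Ask the loader which library is behind this handle. */
        struct link_map *lm = NULL;
        if (dlinfo(handle, RTLD_DI_LINKMAP, &lm) == 0 && lm)
            fprintf(stderr, "dlclose(%s), load address %p\n",
                    lm->l_name, (void *)lm->l_addr);

        return real_dlclose(handle);
    }

Keep in mind that dlclose() only decrements a reference count, so a logged call does not necessarily mean the library was actually unmapped; re-scanning /proc/self/maps afterwards settles that.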
As for other OSes: Android is Linux-based, but it has additional security features and apps are sandboxed; unless you root the device or use a debug shell, you can't just "run" other apps hooked with a custom dlclose() or even under a debugger. I can't say much about iOS, but I suspect that implementing such functionality is not even remotely an option, given the very limited abilities of its (also sandboxed) apps. AFAIK macOS also supports dlopen()/dlclose() for manually loading/unloading libraries, but its linker is not the same as the one commonly used on Linux (linked above), so I can't say much about its automatic loading/unloading behavior.

Golang, load Windows DLL in Linux

Our vendor provides a DLL which works on Windows. Is it possible to load a custom xxx.dll file and use its functions on Linux using Go?
Like this: https://github.com/golang/go/wiki/WindowsDLLs
The short answer is "no": when you "load" a dynamic-linked library, it is not merely read from the file but also linked into the address space of your running program, by special means provided by the OS (on Linux-based systems, at least on x86/amd64 platforms, this is handled by the dynamic loader in userspace; on Windows, it's an in-kernel facility, AFAIK).
In other words, loading a dynamic-linked library involves quite a lot of complexity happening behind your back.
Another complication is whether the DLL is "self-contained", in the sense that it only contains "pure" functions that perform computations on their input data to produce their output data, or whether it calls out to the operating system to perform activities such as file I/O.
The ways operating systems expose these kinds of activities to running processes are drastically different between Windows and Linux.
The last complication I can think of is the dependency of this library on others.
If the library's code is written in C or C++, it quite likely depends on the C runtime library of the compiler which built it (on Windows, that's typically the MSVCRxx.DLL thing). (A simple example is calling malloc() or printf() or something like that in the library's code.)
All this means that most DLLs written on Windows for Windows depend both on Windows itself and on the C or C++ standard library associated with the compiler used to build them.
That's not to mention that Windows DLLs use the PE (Portable Executable) format, while GNU/Linux-based systems natively use the ELF format for their shared object files.
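The format difference is easy to see for yourself: the first bytes of any module identify it. A small sketch (the file paths are illustrative):

    /* PE files begin with "MZ" (4D 5A); ELF files with 7F "ELF". */
    #include <stdio.h>

    static void show_magic(const char *path)
    {
        unsigned char magic[4] = {0};
        FILE *f = fopen(path, "rb");
        if (!f) { perror(path); return; }
        if (fread(magic, 1, sizeof magic, f) == sizeof magic)
            printf("%s: %02X %02X %02X %02X\n", path,
                   magic[0], magic[1], magic[2], magic[3]);
        fclose(f);
    }

    int main(void)
    {
        show_magic("xxx.dll");                          /* 4D 5A ...  (PE)  */
        show_magic("/lib/x86_64-linux-gnu/libc.so.6");  /* 7F 45 4C 46 (ELF) */
        return 0;
    }

The Linux loader will refuse the PE file at the very first step, before any of the deeper incompatibilities above even come into play.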
I'm afraid not, because Windows DLLs make different kernel calls than Linux shared objects; "DLL" and "shared object" are just the two platforms' fancy names for dynamically loaded library objects.
But if you need to run a Windows-native app on Linux, I would recommend giving Wine a try, though I doubt it will work properly; a second option worth trying is DOSBox, though I doubt that will work either.
There is a little hope if those DLLs are written for the .NET Framework: you could wrap them in some nice C# code and use Mono on the Linux side. I'm not sure whether that lets you import those DLLs into Go, but I don't think so either.
In the end, with that amount of trickery needed to get everything working, you will run into some performance issues, just saying.

What Really Protects File Privileges?

In Windows, for instance, and in all operating systems, file privileges exist that "prevent" a file from being written to if that rule is set.
This is hard to describe, but please bear with me. People coding in a C-family language would obviously use some framework to modify a file easily. In the built-in .NET Framework, Microsoft obviously puts checks into their classes, verifying file permissions before writing to a file. Since file permissions are enforced by software rather than hardware, what really prevents a file from being tampered with?
Let's hop over to assembly. Suppose I create an assembly program that directly accesses hard drive data and changes the bytes of a file. How could file permissions possibly prevent me from doing this? I guess what I am really asking is how a file permission stays secure if the compiled program does not check for file permissions before writing to the file?
Suppose I create an assembly program that directly accesses hard drive data and changes the bytes of a file. How could file permissions possibly prevent me from doing this?
If you write in assembly, your assembly is still run in a CPU mode that prevents direct access to memory and devices.
CPU modes … place restrictions on the type and scope of operations that can be performed by certain processes being run by the CPU. This design allows the operating system to run with more privileges than application software.
Your code still needs to issue system calls to get the OS to interact with devices and with memory not owned by your process.
a system call is how a program requests a service from an operating system's kernel. This may include hardware related services (e.g. accessing the hard disk), creating and executing new processes, and communicating with integral kernel services (like scheduling).
The OS maintains security by monopolizing the ability to switch CPU modes and by crafting system calls so that they are safe for user-land code to initiate.
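A quick way to see this, assuming a Linux box (the same idea holds on Windows with \\.\PhysicalDrive0, and /dev/sda is the assumed device name here): "direct" disk access is itself a system call, and the kernel checks permissions at that boundary, no matter what language the caller is written in.

    /* Attempting "direct" disk access still goes through the open()
       system call, where the kernel checks permissions. */
    #include <stdio.h>
    #include <string.h>
    #include <errno.h>
    #include <fcntl.h>

    int main(void)
    {
        int fd = open("/dev/sda", O_RDWR);  /* the raw disk device */
        if (fd < 0)
            printf("open failed: %s\n", strerror(errno)); /* EACCES as non-root */
        else
            printf("opened raw disk; we must be root\n");
        return 0;
    }

Hand-written assembly would have to issue the same open syscall and would get the same EACCES back; the check lives in the kernel, not in the calling code.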

How to simulate ThreadX application on Windows OS

I have an application using ThreadX 5.1 as the kernel.
The Image is flashed on to a hardware running an ARM 9 processor.
I'm trying to build a Simulator for the application that can be run on Windows (say XP, 32-bit).
Is there any way I can make it run on Windows, without modifying the entire source code to start calling win32 system calls?
You can build a simulator for the application that runs on Windows with "ThreadX for Win32".
The "ThreadX for Win32" specification is here:
http://rtos.com/products/threadx/Win32
Yes, you can, if you are willing to put in the work.
First observe that each ThreadX system call has an equivalent POSIX call, except for events.
So your ThreadX program can run as a single process using POSIX threads, mutexes, etc.
Events can be handled by an external library (there are a few out there).
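As an example, a minimal sketch of what such a mapping layer might look like for mutexes; real ThreadX signatures take extra parameters (name, priority inheritance, wait option), which are dropped here for brevity:

    /* Hypothetical ThreadX-to-POSIX shim for mutexes (simplified
       signatures; TX_SUCCESS/TX_MUTEX_ERROR are the usual status codes). */
    #include <pthread.h>

    #define TX_SUCCESS     0x00u
    #define TX_MUTEX_ERROR 0x1Cu

    typedef struct { pthread_mutex_t m; } TX_MUTEX;

    unsigned int tx_mutex_create(TX_MUTEX *mp)
    {
        return pthread_mutex_init(&mp->m, NULL) == 0 ? TX_SUCCESS : TX_MUTEX_ERROR;
    }

    unsigned int tx_mutex_get(TX_MUTEX *mp) /* real call also takes wait_option */
    {
        return pthread_mutex_lock(&mp->m) == 0 ? TX_SUCCESS : TX_MUTEX_ERROR;
    }

    unsigned int tx_mutex_put(TX_MUTEX *mp)
    {
        return pthread_mutex_unlock(&mp->m) == 0 ? TX_SUCCESS : TX_MUTEX_ERROR;
    }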
If you find this difficult on Windows, the simplest thing to do is to set up a Linux VM. I use an Ubuntu VM running on VirtualBox; it is very easy to set up. All you will need is the CDT version of Eclipse.
Next you need to stub out all of your low-level system calls.
This is also easier than you might think. For example, if you have a SPI driver to read and write flash, you can replace the flash with a big array (as sketched below), which is quite easy to work with at this level.
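A minimal sketch of such a stub; the spi_flash_* names are illustrative, not a real driver API:

    /* Host-side stand-in for a SPI flash driver: back the "flash" with a
       plain array so the application logic runs unmodified in the simulator. */
    #include <stdint.h>
    #include <string.h>

    #define FLASH_SIZE (1024u * 1024u)
    static uint8_t fake_flash[FLASH_SIZE];

    int spi_flash_read(uint32_t addr, void *buf, uint32_t len)
    {
        if (addr > FLASH_SIZE || len > FLASH_SIZE - addr) return -1;
        memcpy(buf, fake_flash + addr, len);
        return 0;
    }

    int spi_flash_write(uint32_t addr, const void *buf, uint32_t len)
    {
        if (addr > FLASH_SIZE || len > FLASH_SIZE - addr) return -1;
        memcpy(fake_flash + addr, buf, len);
        return 0;
    }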
Having said all this, you may get more mileage if your ThreadX application is modular. Then you can test each module on its own and you don't need to mess with threads, etc.
As a first approximation, this may give you what you need without going the distance of porting the whole thing to run under POSIX.
I have done this successfully in the past, developing a full set of unit tests for a module that allowed me to develop and test it (on my Mac) before going to the target. Development is much faster and more reliable this way.
Another option you may want to consider is to find a QEMU project that supports your microprocessor. With some work you can develop a complete simulator for your platform and then run the real firmware under the emulator.
Good luck.

Why do I need to re-compile the VMware kernel module after a Linux kernel upgrade?

After a Linux kernel upgrade, my VMware Server cannot start until I use vmware-config.pl to do some re-configuration work (including building some kernel modules).
If I update my Windows VMware host with the latest Windows Service Pack, I usually don't need to do anything to run VMware.
Why does VMware work differently between Linux and Windows? Does this re-compile step bring any benefits on the Linux platform over Windows?
Go read The Linux Kernel Driver Interface.
This is being written to try to explain why Linux does not have a binary kernel interface, nor does it have a stable kernel interface. Please realize that this article describes the _in kernel_ interfaces, not the kernel to userspace interfaces. The kernel to userspace interface is the one that application programs use, the syscall interface. That interface is _very_ stable over time, and will not break. I have old programs that were built on a pre 0.9something kernel that still work just fine on the latest 2.6 kernel release. This interface is the one that users and application programmers can count on being stable.
It reflects the view of a large portion of Linux kernel developers:
the freedom to change in-kernel implementation details and APIs at any time allows them to develop much faster and better.
Without the promise of keeping in-kernel interfaces identical from release to release, there is no way for a binary kernel module like VMWare's to work reliably on multiple kernels.
As an example, if some structures change on a new kernel release (for better performance or more features or whatever other reason), a binary VMWare module may cause catastrophic damage using the old structure layout. Compiling the module again from source will capture the new structure layout, and thus stand a better chance of working -- though still not 100%, in case fields have been removed or renamed or given different purposes.
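A toy illustration of the structure-layout problem, with hypothetical structs standing in for real kernel types:

    /* A binary module bakes in field offsets at compile time. If the next
       kernel inserts a field, every later offset shifts and the old module
       reads garbage; recompiling against the new headers fixes the offsets. */
    #include <stdio.h>
    #include <stddef.h>

    /* Layout as of kernel N (hypothetical). */
    struct task_v1 { int pid; long start_time; };

    /* Kernel N+1 inserts a field before start_time. */
    struct task_v2 { int pid; long nvcsw; long start_time; };

    int main(void)
    {
        printf("start_time offset: v1=%zu v2=%zu\n",
               offsetof(struct task_v1, start_time),
               offsetof(struct task_v2, start_time)); /* e.g. 8 vs 16 on LP64 */
        return 0;
    }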
If a function changes its argument list, or is renamed or otherwise made no longer available, even recompiling from the same source code will not work: the module has to be adapted to the new kernel. This is considered acceptable because everybody (should) have the source and (can find somebody who) is able to modify it to fit. "Push work to the end-nodes" is a common idea in both networking and free software: since the resources at the fringes (the developers outside the Linux kernel) are larger than the limited resources of the backbone (the Linux kernel developers), the trade-off of making the former do more of the work is accepted.
On the other hand, Microsoft has made the decision that they must preserve binary driver compatibility as much as possible; they have no choice, as they are playing in a proprietary world. In a way, this makes it much easier for outside developers, who no longer face a moving target, and for end-users, who never have to change anything. On the downside, it forces Microsoft to maintain backwards compatibility, which is (at best) time-consuming for Microsoft's developers and (at worst) inefficient, causes bugs, and prevents forward progress.
Linux does not have a stable kernel ABI: things like the internal layout of data structures change from version to version. VMware needs to be rebuilt to use the ABI of the new kernel.
On the other hand, Windows has a very stable kernel ABI that does not change from service pack to service pack.
To add to bdonlan's answer, ABI compatibility is a mixed bag. On one hand, it allows you to distribute binary modules and drivers which will work with newer versions of the kernel. On the other hand, it forces kernel programmers to add a lot of glue code to retain backwards compatibility. Because Linux is open source, and because kernel developers debate whether distributing binary modules should even be allowed, the ability to distribute them isn't considered that important. On the upside, Linux kernel developers don't have to worry about ABI compatibility when altering data structures to improve the kernel. In the long run, this results in cleaner kernel code.
It's a consequence of Linux and Windows being developed in different cultural environments, with different expectations: http://www.joelonsoftware.com/articles/Biculturalism.html. In short: Windows is designed to be suitable for users, whereas Linux evolves to be suitable for open-source developers.
