Suppressing system calls when using gcc/g++ - linux

I have a portal in my university LAN where people can upload code to programming puzzles in C/C++. I would like to make the portal secure so that people cannot make system calls via their submitted code. There might be several workarounds but I'd like to know if I could do it simply by setting some clever gcc flags. libc by default seems to include <unistd.h>, which appears to be the basic file where system calls are declared. Is there a way I could tell gcc/g++ to 'ignore' this file at compile time so that none of the functions declared in unistd.h can be accessed?

Some particular reason why chroot("/var/jail/empty"); setuid(65534); isn't good enough (assuming 65534 has sensible limits)?

Restricting access to the header file won't prevent you from accessing libc functions: they're still available if you link against libc - you just won't have the prototypes (and macros) to hand; but you can replicate them yourself.
And not linking against libc won't help either: system calls could be made directly via inline assembler (or even tricks involving jumping into data).
I don't think this is a good approach in general. Running the uploaded code in a completely self-contained virtual sandbox (via QEMU or something like that, perhaps) would probably be a better way to go.

-D can overwrite individual function names. For example:
gcc file.c -Dchown -Dchdir
Or you can set the include guard yourself:
gcc file.c -D_UNISTD_H
However their effects can be easily reverted with #undefs by intelligent submitters :)

Why does uClibc UCLIBC_BUILD_NOEXECSTACK not actually use the linker flag -Wl,-z,noexecstack

One modern Linux security hardening tactic is to compile and link code with the option -Wl,-z,noexecstack, which marks the shared library or binary as not needing an executable stack. This condition can be checked using readelf or other means.
I have been working with uClibc and noticed that it produces objects (.so files) that do not have this flag set. Yet uClibc has a configuration option UCLIBC_BUILD_NOEXECSTACK which according to the help means:
Mark all assembler files as noexecstack, which will mark uClibc
as not requiring an executable stack. (This doesn't prevent other
files you link against from claiming to need an executable stack, it
just won't cause uClibc to request it unnecessarily.)
This is a security thing to make buffer overflows harder to exploit.
...etc...
On some digging into the Makefiles this is correct - the flag is only applied to the assembler.
Because the flag is only passed to the assembler does this mean that the uClibc devs have missed an important hardening flag? There are other options, for example UCLIBC_BUILD_RELRO which do result in the equivalent flag being added to the linker (as -Wl,-z,relro)
However a casual observer could easily misread this and assume, as I originally did, that UCLIBC_BUILD_NOEXECSTACK is actually marking the .so file when it is in fact not. OpenWRT for example ensures that that flag is set when it builds uClibc.
Why would uClibc not do things the 'usual' way? What am I missing here? Are the libraries (e.g. librt.so, libpthread.so, etc) actually not NX?
EDIT
I was able to play with the Makefiles and get the noexecstack bit by using the -Wl,-z,noexecstack argument. So why would they not use that as well?
OK, after a mailing list conversation and further research, it turns out that:
the GNU linker sets the executable-stack state of a shared library or executable based on the 'lowest common denominator', i.e. if any linked or referenced part requests an exec stack then the whole object is marked that way
the 'correct' way to resolve this problem is actually to find and fix the assembly / object files that request an exec stack when they don't need to.
Using the linker to 'fix' things is a workaround if you can't otherwise fix the root cause.
So for uClibc the solution is to submit a bug so that the underlying objects get fixed. Otherwise anything linked with static libraries won't get a non-exec stack.
For my own question: if building a custom firmware that does not use any static libraries, it is possibly sufficient to use the linker flag.
References:
Ubuntu Security Team - Executable Stacks

How to inspect Haskell bytecode

I am trying to figure out a bug (a serious performance downgrade). Unfortunately, I wasn't able to figure out why by going back many different versions of my code.
I am suspecting it could be some modifications to libraries that I've updated, not to mention in the meanwhile I've updated to GHC 7.6 from 7.4 (and if anybody knows if some laziness behavior has changed I would greatly appreciate it!).
I have an older executable of this code that does not have this bug and thus I wonder if there are any tools to tell me the library versions I was linking to from before? Like if it can figure out the symbols, etc.
GHC creates executables, which are notoriously hard to understand... On my Linux box I can view the assembly code by typing in
objdump -d <executable filename>
but I get back over 100K lines of code from just a simple "Hello, World!" program written in Haskell.
If you happen to have the GHC .hi files, you can get some information about the executable by typing in
ghc --show-iface <hi filename>
This won't give you the assembly code, but you can get some extra information that may prove useful.
As I mentioned in the comment above, on Linux you can use "ldd" to see what C-system libraries you used in the compile, but that is also probably less than useful.
You can try to use a disassembler, but those are generally written to disassemble to C, not anything higher level and certainly not Haskell. That being said, GHC compiles to C as an intermediary (at least it used to; has that changed?), so you might be able to learn something.
Personally, I often find viewing system calls in action much more interesting than viewing pure assembly. On my Linux box, I can view all system calls by running the program under strace (use Wireshark for the network traffic equivalent):
strace <program executable>
This also will generate a lot of data, so it might only be useful if you know of some specific place where direct real world communication (i.e., changes to a file on the hard disk drive) goes wrong.
In all honesty, you are probably better off just debugging the problem from source, although, depending on the actual problem, some of these techniques may help you pinpoint something.
Most of these tools have Mac and Windows equivalents.
Since much has changed in the last 9 years, and apparently this is still the first result a search engine gives on this question (like for me, again), an updated answer is in order:
First of all, yes, while Haskell does not specify a bytecode format, bytecode is also just a kind of machine code, for a virtual machine. So for the rest of the answer I will treat them as the same thing. The “Core” as well as the LLVM intermediate language, or even WASM, could be considered equivalent too.
Secondly, if your old binary is statically linked, then of course, no matter the format your program is in, no symbols will be available to check out. Because that is what linking does. Even with bytecode, and even with just classic static #include in simple languages. So your old binary will be no good, no matter what. And given the optimisations compilers do, a classic decompiler will very likely never be able to figure out what optimised bits used to be partially what libraries. Especially with stream fusion and such “magic”.
Third, you can do the things you asked with a modern Haskell program. But you need to have your binaries compiled with -dynamic and -rdynamic, so that not only the C-calling-convention libraries (e.g. .so) and the Haskell libraries, but also the runtime itself is dynamically loaded. That way you end up with a very small binary, consisting of only your actual code, dynamic linking instructions, and the exact data about what libraries and runtime were used to build it. And since the runtime is compiler-dependent, you will know the compiler too. So it would give you everything you need, but only if you compiled it right. (I recommend using such dynamic linking by default in any case as it saves memory.)
The last factor that one might forget, is that even the exact same compiler version might behave vastly differently, depending on what IT was compiled with. (E.g. if somebody put a backdoor in the very first version of GHC, and all GHCs after that were compiled with that first GHC, and nobody ever checked, then that backdoor could still be in the code today, with no traces in any source or libraries whatsoever. … Or for a less extreme case, that version of GHC your old binary was built with might have been compiled with different architecture options, leading to it putting more optimised instructions into the binaries it compiles for unless told to cross-compile.)
Finally, of course, you can profile even compiled binaries, by profiling their system calls. This will give you clues about which part of the code acted differently and how. (E.g. if you notice that your new binary floods the system with some slow system calls where the old one just used a single fast one. A classic OpenGL example would be using fast display lists versus slow direct calls to draw triangles. Or using a different sorting algorithm, or having switched to a different kind of data structure that fits your work load badly and thrashes a lot of memory.)

Loading Linux libraries at runtime

I think a major design flaw in Linux is the shared object hell when it comes to distributing programs in binary instead of source code form.
Here is my specific problem: I want to publish a Linux program in ELF binary form that should run on as many distributions as possible so my mandatory dependencies are as low as it gets: The only libraries required under any circumstances are libpthread, libX11, librt and libm (and glibc of course). I'm linking dynamically against these libraries when I build my program using gcc.
Optionally, however, my program should also support ALSA (sound interface), the Xcursor, Xfixes, and Xxf86vm extensions as well as GTK. But these should only be used if they are available on the user's system, otherwise my program should still run but with limited functionality. For example, if GTK isn't there, my program will fall back to terminal mode. Because my program should still be able to run without ALSA, Xcursor, Xfixes, etc. I cannot link dynamically against these libraries because then the program won't start at all if one of the libraries isn't there.
So I need to manually check if the libraries are present and then open them one by one using dlopen() and import the necessary function symbols using dlsym(). This, however, leads to all kinds of problems:
1) Library naming conventions:
Shared objects often aren't simply called "libXcursor.so" but have some kind of version extension like "libXcursor.so.1" or even really funny things like "libXcursor.so.0.2000". These extensions seem to differ from system to system. So which one should I choose when calling dlopen()? Using a hardcoded name here seems like a very bad idea because the names differ from system to system. So the only workaround that comes to my mind is to scan the whole library path and look for filenames starting with a "libXcursor.so" prefix and then do some custom version matching. But how do I know that they are really compatible?
2) Library search paths: Where should I look for the *.so files after all? This is also different from system to system. There are some default paths like /usr/lib and /lib but *.so files could also be in lots of other paths. So I'd have to open /etc/ld.so.conf and parse this to find out all library search paths. That's not a trivial thing to do because /etc/ld.so.conf files can also use some kind of include directive which means that I have to parse even more .conf files, do some checks against possible infinite loops caused by circular include directives etc. Is there really no easier way to find out the search paths for *.so?
So, my actual question is this: Isn't there a more convenient, less hackish way of achieving what I want to do? Is it really so complicated to create a Linux program that has some optional dependencies like ALSA, GTK, libXcursor... but should also work without it! Is there some kind of standard for doing what I want to do? Or am I doomed to do it the hackish way?
Thanks for your comments/solutions!
I think a major design flaw in Linux is the shared object hell when it comes to distributing programs in binary instead of source code form.
This isn't a design flaw as far as creators of the system are concerned; it's an advantage -- it encourages you to distribute programs in source form. Oh, you wanted to sell your software? Sorry, that's not the use case Linux is optimized for.
Library naming conventions: Shared objects often aren't simply called "libXcursor.so" but have some kind of version extension like "libXcursor.so.1" or even really funny things like "libXcursor.so.0.2000".
Yes, this is called external library versioning. Read about it here. As should be clear from that description, if you compiled your binaries using headers on a system that would normally give you libXcursor.so.1 as a runtime reference, then the only shared library you are compatible with is libXcursor.so.1, and trying to dlopen libXcursor.so.0.2000 will lead to unpredictable crashes.
Any system that provides libXcursor.so but not libXcursor.so.1 is either a broken installation, or is also incompatible with your binaries.
Library search paths: Where should I look for the *.so files after all?
You shouldn't be trying to dlopen any of these libraries using their full path. Just call dlopen("libXcursor.so.1", RTLD_LAZY | RTLD_GLOBAL); (dlopen requires one of RTLD_LAZY or RTLD_NOW in its flags), and the runtime loader will search for the library in system-appropriate locations.

Finding the shared library name to use with dlopen

In my open-source project Artha I use libnotify for showing passive desktop notifications to the user.
Instead of statically linking libnotify, Artha looks up the shared object (.so) file at runtime via dlopen; if it is available on the target machine, Artha exposes the notification feature in its GUI. On app start, a call to dlopen with the filename parameter libnotify.so.1 is made, and if it returns a non-null pointer, the feature is exposed.
A recurring problem with this model is that every time the version number of the library is bumped, Artha's code needs to be updated; currently libnotify.so.4 is the latest to entail such an occurrence.
Is there a Linux system call (irrespective of the distro the app is running on) which can tell me if a particular library's shared object is available at runtime? I know that there is the brute-force option of enumerating the library versions from 1 to, say, 10, but I find that solution ugly and inelegant.
Also, if this can be addressed via autoconf, then that solution is welcome too, i.e. at build time, based on the target machine, the generated configure.h should have the right .so name that can be passed to dlopen.
P.S.: I think good distros follow the style of creating links to libnotify.so.x so that a programmer can just do dlopen("libnotify.so", RTLD_LAZY) and the right version-numbered .so is loaded; unfortunately not all distros follow this, including Ubuntu.
The answer is: you don't.
dlopen() is not designed to deal with things like that, and trying to load whichever soversion you find on the system just because it happens to have the symbols you need is not a good way to do it.
Different sonames have different ABIs, and different ABIs mean that you may be calling the exact same symbol name while it expects a different set (or different size) of parameters, which will cause crashes or misbehaviour that are extremely difficult to debug.
You should have a read on how shared object versions work and what an ABI is.
The libfoo.so link is there for the link editor (ld) and is usually installed with the -devel packages for that reason; it might also very well not be a link but rather a text file with a linker script, often times on purpose to avoid exactly what you're trying to do.

How to make a fix in one of the shared libraries (.so) in the project on linux?

I want to make a quick fix to one of the project's .so libraries. Is it safe to just recompile the .so and replace the original? Or do I have to rebuild and reinstall the whole project? Or does it depend?
It depends. The shared library needs to be binary-compatible with your executable.
For example,
If you changed the behaviour of one of the library's internal functions, you probably don't need to recompile.
If you changed the size of a struct (e.g. by adding a member) that's known by the application, you will need to recompile; otherwise the application will think the struct is smaller than it is, and the program may crash when the library tries to read the extra, uninitialized member that the application never wrote to.
If you change the type or the position of arguments of any function visible from the application, you do need to recompile, because the library will try to read more arguments off the stack than the application has put on it (this is the case with C; in C++ argument types are part of the function signature, so the app will refuse to run rather than crash).
The rule of thumb (for production releases) is that, if you are not consciously aware that you are maintaining binary compatibility, or not sure what binary compatibility is, you should recompile.
That's certainly the intent of using dynamic libraries: if something in the library needs updating, then you just update the library, and programs that use it don't need to be changed. If the signature of the function you're changing doesn't change, and it accomplishes the same thing, then this will in general be fine.
There are of course always edge cases where a program depends on some undocumented side-effect of a function, and then changing that function's implementation might change the side-effect and break the program; but c'est la vie.
If you have not changed the ABI of the shared library, you can just rebuild and replace the library.
It depends, yes.
However, assuming you have the exact same source and compiler that built everything else, changing only something in a .cpp file is fine.
Other changes, e.g. changing an interface (between the shared lib and the rest of the system) in a header file, are not fine.
If you don't change your library binary interface, it's ok to recompile and redeploy only the shared library.
Good references:
How To Write Shared Libraries
The Little Manual of API Design
