Limiting syscall access for a Linux application - linux

Assume a Linux binary foobar which has two different modes of operation:
Mode A: A well-behaved mode in which syscalls a, b and c are used.
Mode B: A things-gone-wrong mode in which syscalls a, b, c and d are used.
Syscalls a, b and c are harmless, whereas syscall d is potentially dangerous and could cause instability to the machine.
Assume further that which of the two modes the application runs is random: the application runs in mode A with probability 95 % and in mode B with probability 5 %. The application comes without source code so it cannot be modified, only run as-is.
I want to make sure that the application cannot execute syscall d. When executing syscall d the result should be either a NOOP or an immediate termination of the application.
How do I achieve that in a Linux environment?

Is the application linked statically?
If not, you may override some symbols, for example, let's redefine socket:
#include <unistd.h>

/* Replacement for the C library's socket(): always fail. */
int socket(int domain, int type, int protocol)
{
    write(1, "Error\n", 6);
    return -1;
}
Then build a shared library:
gcc -fPIC -shared test.c -o libtest.so
Let's run:
nc -l -p 6000
Ok.
And now:
$ LD_PRELOAD=./libtest.so nc -l -p 6000
Error
Can't get socket
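What happens when you run with the variable LD_PRELOAD=./libtest.so? The dynamic linker loads libtest.so first, so the symbols it defines take precedence over those with the same name in the C library. Note that this only intercepts calls that go through the library wrapper; a binary that issues the syscall directly is not affected.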

It seems that systrace does exactly what you need. From the Wikipedia page:
An application is allowed to make only those system calls specified as permitted in the policy. If the application attempts to execute a system call that is not explicitly permitted an alarm gets raised.

This is one possible application of sandboxing (specifically, Rule-based Execution). One popular implementation is SELinux.
You will have to write the policy that corresponds to what you want to allow the process to do.

That's exactly what seccomp-bpf is for. See an example of how to restrict access to syscalls.
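A minimal sketch of the seccomp-bpf approach, assuming a Linux kernel with seccomp filter support and the standard kernel headers. The function names deny_syscall and run_without_syscall are made up for this illustration. The filter makes one syscall fail with EPERM (the "NOOP" variant from the question) and allows everything else. As a simplification it does not check the architecture field, which a real policy should do.

```c
#include <errno.h>
#include <stddef.h>
#include <linux/filter.h>
#include <linux/seccomp.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Install a seccomp-bpf filter that makes syscall number `syscall_nr` fail
 * with EPERM and allows every other syscall.
 * Returns 0 on success, -1 on error.
 * Simplification: seccomp_data.arch is not checked. */
int deny_syscall(unsigned int syscall_nr)
{
    struct sock_filter filter[] = {
        /* Load the syscall number into the accumulator. */
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
        /* Equal: fall through to the deny rule. Not equal: skip it. */
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, syscall_nr, 0, 1),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ERRNO | (EPERM & SECCOMP_RET_DATA)),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
    };
    struct sock_fprog prog = {
        .len = (unsigned short)(sizeof(filter) / sizeof(filter[0])),
        .filter = filter,
    };

    /* Required so that an unprivileged process may install a filter. */
    if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0)
        return -1;
    return prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog) == 0 ? 0 : -1;
}

/* Hypothetical launcher helper: install the filter, then exec the
 * unmodified binary. The filter stays in place across execve(). */
int run_without_syscall(unsigned int syscall_nr, char *const argv[])
{
    if (deny_syscall(syscall_nr) != 0)
        return -1;
    execvp(argv[0], argv);
    return -1; /* only reached if exec failed */
}
```

A small launcher would call run_without_syscall(SYS_d, argv + 1) from its main(), where SYS_d stands for whatever syscall d is. For the "immediate termination" variant, the SECCOMP_RET_ERRNO return value can be swapped for SECCOMP_RET_KILL.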

Related

Read from standard input with all MPI processes

So far I've been using OPEN(fid, FILE='IN', ...) and it seems that all MPI processes read the same file IN without interfering with each other.
Furthermore, in order to allow the input file to be chosen among several, I simply made the IN file a symbolic link pointing to the desired input. This means that when I want to change the input file I have to run ln -sf desired-input IN before running the program (mpirun -n $np ./program).
I'd really like to be able to run the program as mpirun -n $np ./program < input-file. To do so I removed the OPEN statement, and the corresponding CLOSE statement, and changed all READ(fid,*) statements to READ(INPUT_UNIT,*) (I'm using the ISO_FORTRAN_ENV module).
But, after all edits, I've realized that only one process (always 0, I noticed) reads from it, since all others reach EOF immediately. Here is an MWE, using OpenMPI 2.0.1.
! cat main.f90
program main
use, intrinsic :: iso_fortran_env
use mpi
implicit none
integer :: myid, x, ierr, stat
x = 12
call mpi_init(ierr)
call mpi_comm_rank(mpi_comm_world, myid, ierr)
read(input_unit,*, iostat=stat) x
if (is_iostat_end(stat)) write(output_unit,*) myid, "I'm out"
if (.not. is_iostat_end(stat)) write(output_unit,*) myid, "I'm in", myid, x
call mpi_finalize(ierr)
end program main
that can be compiled with mpifort -o main main.f90, run with mpirun -np 4 ./main, and which results in this output
1 I'm out
2 I'm out
3 I'm out
17 this is my input from keyboard
0 I'm in 0 17
I know that MPI has proper routines to perform parallel I/O, but I've found nothing about reading from standard input.
You are seeing the expected behaviour with OpenMPI. By default, mpirun directs UNIX standard input to /dev/null on all processes except the MPI_COMM_WORLD rank 0 process. The MPI_COMM_WORLD rank 0 process inherits standard input from mpirun.
The option --stdin can be used to direct standard input to another process, but not to direct to all.
One could also note that the behaviour of redirection of standard input isn't consistent across MPI implementations (the notion isn't specified by the MPI standard). For example, Intel MPI's mpirun has the -s option: mpirun -np 4 -s all ./main does allow all processes access to mpirun's standard input. There's also no guarantee that processes without that redirection will fail, rather than wait, to read.
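To see why the other ranks report EOF immediately, here is a small sketch in plain C (no MPI involved) that mimics what a process sees when its standard input is /dev/null. The helper name read_int_from is made up for this illustration.

```c
#include <stdio.h>

/* Try to read one integer from the file at `path`.
 * Returns 1 if a value was read, 0 on a parse failure, EOF at end of
 * input, and -2 if the file could not be opened.
 * `*value` is left untouched unless a value was read. */
int read_int_from(const char *path, int *value)
{
    FILE *in = fopen(path, "r");
    if (in == NULL)
        return -2;
    int rc = fscanf(in, "%d", value);
    fclose(in);
    return rc;
}
```

Calling read_int_from("/dev/null", &x) with x preset to 12 returns EOF and leaves x at 12, which matches the "I'm out" lines in the question. A common way around this, independent of the MPI implementation, is to let rank 0 do the reading and distribute the values with a collective such as MPI_Bcast.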

How can I compile C code to get a bare-metal skeleton of a minimal RISC-V assembly program?

I have the following simple C code:
void main(){
int A = 333;
int B=244;
int sum;
sum = A + B;
}
I compile this with
$ riscv64-unknown-elf-gcc code.c -o code.o
and view the assembly code with
$ riscv64-unknown-elf-objdump -d code.o
But when I explore the assembly code, I see that a lot of code is generated, which I assume is for Proxy Kernel support (I am a newbie to RISC-V). However, I do not want this code to include support for the Proxy Kernel, because the idea is to implement only this simple C code within an FPGA.
I read that RISC-V provides three types of compilation: bare-metal mode, newlib proxy kernel and RISC-V Linux. According to previous research, the kind of compilation that I should do is bare-metal mode. This is because I want minimal assembly without support for an operating system or proxy kernel. Assembly functions such as system calls are not required.
However, I have not yet been able to find out how to compile C code to get a skeleton of a minimal RISC-V assembly program. How can I compile the C code above in bare-metal mode, or otherwise get a skeleton of minimal RISC-V assembly code?
Warning: this answer is somewhat out-of-date as of the latest RISC-V Privileged Spec v1.9, which includes the removal of the tohost Control/Status Register (CSR), which was a part of the non-standard Host-Target Interface (HTIF) which has since been removed. The current (as of 2016 Sep) riscv-tests instead perform a memory-mapped store to a tohost memory location, which in a tethered environment is monitored by the front-end server.
If you really and truly need/want to run RISC-V code bare-metal, then here are the instructions to do so. You lose a bunch of useful stuff, like printf or FP-trap software emulation, which the riscv-pk (proxy kernel) provides.
First things first - Spike boots up at 0x200. As Spike is the golden ISA simulator model, your core should also boot up at 0x200.
(cough, as of 2015 Jul 13, the "master" branch of riscv-tools (https://github.com/riscv/riscv-tools) is using an older pre-v1.7 Privileged ISA, and thus starts at 0x2000. This post will assume you are using v1.7+, which may require using the "new_privileged_isa" branch of riscv-tools).
So when you disassemble your bare-metal program, it better start at 0x200!!! If you want to run it on top of the proxy-kernel, it better start at 0x10000 (and if Linux, it’s something even larger…).
Now, if you want to run bare metal, you’re forcing yourself to write up the processor boot code. Yuck. But let’s punt on that and pretend that’s not necessary.
(You can also look into riscv-tests/env/p, for the “virtual machine” description for a physically addressed machine. You’ll find the linker script you need and some macros.h to describe some initial setup code. Or better yet, in riscv-tests/benchmarks/common/crt.S).
Anyways, armed with the above (confusing) knowledge, let’s throw that all away and start from scratch ourselves...
hello.s:
.align 6
.globl _start
_start:
# screw boot code, we're going minimalist
# mtohost is the CSR in machine mode
csrw mtohost, 1;
1:
j 1b
and link.ld:
OUTPUT_ARCH( "riscv" )
ENTRY( _start )
SECTIONS
{
/* text: test code section */
. = 0x200;
.text :
{
*(.text)
}
/* data: Initialized data segment */
.data :
{
*(.data)
}
/* End of uninitialized data segment */
_end = .;
}
Now to compile this…
riscv64-unknown-elf-gcc -nostdlib -nostartfiles -Tlink.ld -o hello hello.s
This compiles to (riscv64-unknown-elf-objdump -d hello):
hello: file format elf64-littleriscv
Disassembly of section .text:
0000000000000200 <_start>:
200: 7810d073 csrwi tohost,1
204: 0000006f j 204 <_start+0x4>
And to run it:
spike hello
It’s a thing of beauty.
The link script places our code at 0x200. Spike will start at 0x200, and then write a #1 to the control/status register “tohost”, which tells Spike “stop running”. And then we spin on an address (1: j 1b) until the front-end server has gotten the message and kills us.
It may be possible to ditch the linker script if you can figure out how to tell the compiler to move <_start> to 0x200 on its own.
For other examples, you can peruse the following repositories:
The riscv-tests repository holds the RISC-V ISA tests that are very minimal (https://github.com/riscv/riscv-tests).
This Makefile has the compiler options:
https://github.com/riscv/riscv-tests/blob/master/isa/Makefile
And many of the “virtual machine” description macros and linker scripts can be found in riscv-tests/env (https://github.com/riscv/riscv-test-env).
You can take a look at the “simplest” test at (riscv-tests/isa/rv64ui-p-simple.dump).
And you can check out riscv-tests/benchmarks/common for start-up and support code for running bare-metal.
The "extra" code is put there by gcc and is the sort of stuff required for any program. The proxy kernel is designed to be the bare minimum amount of support required to run such things. Once your processor is working, I would recommend running things on top of pk rather than bare-metal.
In the meantime, if you want to look at simple assembly, I would recommend skipping the linking phase with '-c':
riscv64-unknown-elf-gcc code.c -c -o code.o
riscv64-unknown-elf-objdump -d code.o
For examples of running code without pk or linux, I would look at riscv-tests.
I'm surprised no one mentioned gcc -S which skips assembly and linking altogether and outputs assembly code, albeit with a bunch of boilerplate, but it may be convenient just to poke around.
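When poking around with -c or -S, it may help to give the compiler a reason to keep the arithmetic. In the question's main() the sum is never used, so an optimizing build is free to drop it entirely. A small sketch, with the function name add_values made up for this illustration:

```c
/* Same arithmetic as the question's main(), but the result is returned,
 * so the compiler has to keep the computation (or at least its result). */
int add_values(void)
{
    int a = 333;
    int b = 244;
    int sum = a + b;
    return sum;
}
```

Compiled with -S (or with -c followed by objdump -d), this yields only the code for add_values, with none of the start-up code that linking pulls in. With optimization enabled, the body typically reduces to loading the constant 577 and returning.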

Function graph (timestamped entry and exit) for both user, library and kernel space in Linux?

I'm writing this more-less in frustration - but who knows, maybe there's a way for this too...
I would like to analyze what happens with a function from ALSA, say snd_pcm_readi; for that purpose, let's say I have prepared a small testprogram.c, where I have this:
void doCapture() {
ret = snd_pcm_readi(handle, buffer, period_size);
}
The problem with this function is that it eventually (should) hook into snd_pcm_readi in the shared system library /usr/lib/libasound.so; from there, I believe via ioctl, it would somehow communicate to snd_pcm_read in the kernel module /lib/modules/$(uname -r)/kernel/sound/core/snd-pcm.ko -- and that should ultimately talk to whatever .ko kernel module which is a driver for a particular soundcard.
Now, with the organization like above, I can do something like:
valgrind --tool=callgrind --toggle-collect=doCapture ./testprogram
... and then kcachegrind callgrind.out.12406 does indeed reveal a relationship between snd_pcm_readi, libasound.so and an ioctl (I cannot get the same information to show with callgrind_annotate) - so that somewhat covers userspace; but that is as far as it goes. Furthermore, it produces a call graph, that is to say general caller/callee relationships between functions (possibly by a count of samples/ticks each function has spent working as scheduled).
However, what I would like to get instead, is something like the output of the Linux ftrace tracer called function_graph, which provides a timestamped entry and exit of traced kernel functions... example from ftrace: add documentation for function graph tracer [LWN.net]:
$ cat /sys/kernel/debug/tracing/trace
# tracer: function_graph
#
# TIME CPU DURATION FUNCTION CALLS
# | | | | | | | |
2105.963678 | 0) | mutex_unlock() {
2105.963682 | 0) 5.715 us | __mutex_unlock_slowpath();
2105.963693 | 0) + 14.700 us | }
2105.963698 | 0) | dnotify_parent() {
(NB: newer ftrace documentation seems to not show a timestamp at first for the function_graph, only duration - but I think it's still possible to modify that)
With ftrace, one can filter so one can only trace functions in a given kernel module - so in my case, I could add the functions of snd-pcm.ko and whatever .ko module is the soundcard driver, and I'd have whatever I find interesting in kernel-space covered. But then, I lose the link to the user-space program (unless I explicitly printf to /sys/kernel/debug/tracing/trace_marker, or do a trace_printk from user-space .c files)
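The trace_marker idea mentioned above can be sketched in C as follows. The helper name trace_mark is made up for this illustration, and it assumes the marker file already exists and is writable by the calling user, which on most systems means running as root.

```c
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Write `msg` to the marker file at `path`.
 * Returns 0 on success, -1 if the file cannot be opened or written. */
int trace_mark(const char *path, const char *msg)
{
    int fd = open(path, O_WRONLY);
    if (fd < 0)
        return -1;
    size_t len = strlen(msg);
    ssize_t written = write(fd, msg, len);
    close(fd);
    return (written == (ssize_t)len) ? 0 : -1;
}
```

Calling trace_mark("/sys/kernel/debug/tracing/trace_marker", "doCapture: enter\n") right before snd_pcm_readi, and a matching "exit" message right after it, would place user-space markers in the same timestamped trace as the kernel functions.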
Ultimately, what I'd like, is to have the possibility to specify an executable, possibly also library files and kernel modules - and obtain a timestamped function graph (with indented/nested entry and exit per function) like ftrace provides. Are there any alternatives for something like this? (Note I can live without the function exits - but I'd really like to have timestamped function entries)
As a PS: it seems I actually found something that fits the description, which is the fulltrace application/script:
fulltrace [andreoli#Github]
fulltrace traces the execution of an ELF program, providing as output a full trace of its userspace, library and kernel function calls. ...
(prerequisites) the following kernel configuration options and their dependencies must be set as enabled (=y): FTRACE, TRACING_SUPPORT, UPROBES, UPROBE_EVENT, FUNCTION_GRAPH_TRACER.
Sounds perfect - but the problem is, I'm on Ubuntu 11.04, and while this 2.6.38 kernel luckily has CONFIG_FTRACE=y enabled -- its /boot/config-`uname -r` doesn't even mention UPROBES :/ And since I'd like to avoid doing kernel hacking, unfortunately I cannot use this script...
(Btw, if UPROBES were available, (as far as I understand) one sets a trace probe on a symbol address (as obtained from say objdump -d), and output goes again to /sys/kernel/debug/tracing/trace - so some custom solution would have been possible using UPROBES, even without the fulltrace script)
So, to narrow down my question a bit - is there a solution, that would allow simultaneous user-space (incl. shared libraries) and kernel-space "function graph" tracing, but where UPROBES are not available in the kernel?

Disable randomization of memory addresses

I'm trying to debug a binary that uses a lot of pointers. Sometimes for seeing output quickly to figure out errors, I print out the address of objects and their corresponding values, however, the object addresses are randomized and this defeats the purpose of this quick check up.
Is there a way to disable this temporarily/permanently so that I get the same values every time I run the program?
Oops. OS is Linux fsttcs1 2.6.32-28-generic #55-Ubuntu SMP Mon Jan 10 23:42:43 UTC 2011 x86_64 GNU/Linux
On Ubuntu, it can be disabled system-wide (as root) with...
echo 0 > /proc/sys/kernel/randomize_va_space
On Windows, this post might be of some help...
http://blog.didierstevens.com/2007/11/20/quickpost-another-funny-vista-trick-with-aslr/
To temporarily disable ASLR for a particular program you can always issue the following (no need for sudo)
setarch `uname -m` -R ./yourProgram
You can also do this programmatically from C source before a UNIX exec.
If you take a look at the sources for setarch (here's one source):
http://code.metager.de/source/xref/linux/utils/util-linux/sys-utils/setarch.c
You can see it boils down to a system call (syscall) or a function call (depending on what your system defines). From setarch.c:
#ifndef HAVE_PERSONALITY
# include <syscall.h>
# define personality(pers) ((long)syscall(SYS_personality, pers))
#endif
On my CentOS 6 64-bit system, it looks like it uses a function (which probably calls the self-same syscall above). Take a look at this snippet from the include file in /usr/include/sys/personality.h (as referenced as <sys/personality.h> in the setarch source code):
/* Set different ABIs (personalities). */
extern int personality (unsigned long int __persona) __THROW;
What it boils down to, is that you can, from C code, call and set the personality to use ADDR_NO_RANDOMIZE and then exec (just like setarch does).
#include <sys/personality.h>
#include <unistd.h>
#ifndef HAVE_PERSONALITY
# include <syscall.h>
# define personality(pers) ((long)syscall(SYS_personality, pers))
#endif
...
void mycode(char **argv)
{
    // If requested, turn off the address rand feature right before execing
    if (MyGlobalVar_Turn_Address_Randomization_Off) {
        personality(ADDR_NO_RANDOMIZE);
    }
    execvp(argv[0], argv); // ... from setarch
}
It's pretty obvious you can't turn address randomization off in the process you are in (grin: unless maybe dynamic loading), so this only affects forks and execs later. I believe the Address Randomization flags are inherited by child sub-processes?
Anyway, that's how you can programmatically turn off the address randomization in C source code. This may be your only solution if you don't want to force a user to intervene manually and start up with setarch or one of the other solutions listed earlier.
Before you complain about security issues in turning this off, some shared memory libraries/tools (such as PickingTools shared memory and some IBM databases) need to be able to turn off randomization of memory addresses.
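On the question of whether the flag is inherited by children, one way to check on a given system is to set it, fork, and ask the child. This is only a sketch: the function name child_inherits_no_randomize is made up, it covers fork() only (the exec case is what setarch itself relies on), and it assumes personality(0xffffffff) queries the current persona without changing it, as described in the personality(2) man page.

```c
#include <sys/personality.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns 1 if a forked child sees ADDR_NO_RANDOMIZE, 0 if it does not,
 * and -1 if personality() or fork() failed (for example in a sandbox). */
int child_inherits_no_randomize(void)
{
    int old = personality(0xffffffff);      /* query without changing */
    if (old == -1)
        return -1;
    if (personality((unsigned long)old | ADDR_NO_RANDOMIZE) == -1)
        return -1;

    int result = -1;
    pid_t pid = fork();
    if (pid == 0) {
        /* Child: report whether the flag is set in its own persona. */
        int persona = personality(0xffffffff);
        _exit((persona != -1 && (persona & ADDR_NO_RANDOMIZE)) ? 0 : 1);
    }
    if (pid > 0) {
        int status;
        if (waitpid(pid, &status, 0) == pid && WIFEXITED(status))
            result = (WEXITSTATUS(status) == 0) ? 1 : 0;
    }
    personality((unsigned long)old);        /* restore the parent's persona */
    return result;
}
```

A return value of -1 means the check could not be carried out, not that the flag was lost; some container runtimes refuse to set this personality flag.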

Can a program assign the memory directly?

Is there any really low-level programming language that can access a variable's memory directly? For example, if my program has a variable i, can anyone access the memory to change my program's variable i to another value?
As an example of how to change the variable in a program from “the outside”, consider the use of a debugger. Example program:
$ cat print_i.c
#include <stdio.h>
#include <unistd.h>
int main (void) {
int i = 42;
for (;;) { (void) printf("i = %d\n", i); (void) sleep(3); }
return 0;
}
$ gcc -g -o print_i print_i.c
$ ./print_i
i = 42
i = 42
i = 42
…
(The program prints the value of i every 3 seconds.)
In another terminal, find the process id of the running program and attach the gdb debugger to it:
$ ps | grep print_i
1779 p1 S+ 0:00.01 ./print_i
$ gdb print_i 1779
…
(gdb) bt
#0 0x90040df8 in mach_wait_until ()
#1 0x90040bc4 in nanosleep ()
#2 0x900409f0 in sleep ()
#3 0x00002b8c in main () at print_i.c:6
(gdb) up 3
#3 0x00002b8c in main () at print_i.c:6
6 for (;;) { (void) printf("i = %d\n", i); (void) sleep(3); }
(gdb) set variable i = 666
(gdb) continue
Now the output of the program changes:
…
i = 42
i = 42
i = 666
So, yes, it's possible to change the variable of a program from the “outside” if you have access to its memory. There are plenty of caveats here, e.g. one needs to locate where and how the variable is stored. Here it was easy because I compiled the program with debugging symbols. For an arbitrary program in an arbitrary language it's much more difficult, but still theoretically possible. Of course, if I weren't the owner of the running process, then a well-behaved operating system would not let me access its memory (without “hacking”), but that's a whole other question.
Sure, unless of course the operating system protects that memory on your behalf. Machine language (the lowest level programming language) always "accesses memory directly", and it's pretty easy to achieve in C (by casting some kind of integer to pointer, for example). Point is, unless this code's running in your process (or the kernel), whatever language it's written in, the OS would normally be protecting your process from such interference (by mapping the memory in various ways for different processes, for example).
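The integer-to-pointer cast mentioned above can be sketched like this, inside a single process. The names poke_int and demo_poke are made up for this illustration.

```c
#include <stdint.h>

/* Write `value` to the int stored at the numeric address `addr`.
 * Only valid if `addr` really is the address of a live int in this process. */
void poke_int(uintptr_t addr, int value)
{
    int *p = (int *)addr;   /* integer-to-pointer cast */
    *p = value;
}

/* Demonstration: change a local variable through its numeric address. */
int demo_poke(void)
{
    int i = 42;
    poke_int((uintptr_t)&i, 666);
    return i;               /* now 666 */
}
```

The same cast applied to an address that is not mapped in the calling process would fault, which is the OS protection described above at work.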
If another process has sufficient permissions, then it can change your process's memory. On Linux, it's as simple as reading and writing the pseudo-file /proc/{pid}/mem. This is how many exploits work, though they do rely on some vulnerability that allows them to run with very high privileges (root on Unix).
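A rough sketch of the /proc/{pid}/mem mechanism, using /proc/self/mem so that the example stays self-contained and needs no extra privileges. The function name poke_via_proc is made up, and the code assumes a 64-bit Linux system where addresses fit in off_t. For another process's pid, the open would only succeed with sufficient permissions.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/* Overwrite the int at `target` by writing through /proc/self/mem,
 * using the variable's address as the file offset.
 * Returns 0 on success, -1 on failure. */
int poke_via_proc(volatile int *target, int value)
{
    int fd = open("/proc/self/mem", O_RDWR);
    if (fd < 0)
        return -1;
    ssize_t n = pwrite(fd, &value, sizeof value, (off_t)(uintptr_t)target);
    close(fd);
    return (n == (ssize_t)sizeof value) ? 0 : -1;
}
```

Given volatile int v = 42, a successful poke_via_proc(&v, 666) leaves v at 666 even though no ordinary assignment to v took place.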
Short answer: yes. Long answer: it depends on a whole lot of factors including your hardware (memory management?), your OS (protected virtual address spaces? features to circumvent these protections?) and the detailed knowledge your opponent may or may not have of both your language's architecture and your application structure.
It depends. In general, one of the functions of an operating system is memory protection -- that means keeping programs out of each other's memory. If I write a program that tries to access memory that doesn't belong to it, the OS should crash me, since I'm committing something called a segmentation fault.
But there are situations where I can get around that. For example, if I have root privileges on the system, I may be able to access your memory. Or worse -- I can run your program inside a virtual machine, then sit outside that VM and do whatever I want to its memory.
So in general, you should assume that a malicious person can reach in and fiddle with your program's memory if they try hard enough.
