Can a program access memory directly? - security

Is there any really low-level programming language that can access memory directly? For example, if my program has a variable i, can anyone else access that memory and change my program's variable i to another value?

As an example of how to change the variable in a program from “the outside”, consider the use of a debugger. Example program:
$ cat print_i.c
#include <stdio.h>
#include <unistd.h>
int main (void) {
    int i = 42;
    for (;;) { (void) printf("i = %d\n", i); (void) sleep(3); }
    return 0;
}
$ gcc -g -o print_i print_i.c
$ ./print_i
i = 42
i = 42
i = 42
…
(The program prints the value of i every 3 seconds.)
In another terminal, find the process id of the running program and attach the gdb debugger to it:
$ ps | grep print_i
1779 p1 S+ 0:00.01 ./print_i
$ gdb print_i 1779
…
(gdb) bt
#0 0x90040df8 in mach_wait_until ()
#1 0x90040bc4 in nanosleep ()
#2 0x900409f0 in sleep ()
#3 0x00002b8c in main () at print_i.c:6
(gdb) up 3
#3 0x00002b8c in main () at print_i.c:6
6 for (;;) { (void) printf("i = %d\n", i); (void) sleep(3); }
(gdb) set variable i = 666
(gdb) continue
Now the output of the program changes:
…
i = 42
i = 42
i = 666
So, yes, it's possible to change a program's variable from the “outside” if you have access to its memory. There are plenty of caveats here, e.g. one needs to locate where and how the variable is stored. Here it was easy because I compiled the program with debugging symbols. For an arbitrary program in an arbitrary language it's much more difficult, but still theoretically possible. Of course, if I weren't the owner of the running process, a well-behaved operating system would not let me access its memory (without “hacking”), but that's a whole other question.

Sure, unless of course the operating system protects that memory on your behalf. Machine language (the lowest-level programming language) always accesses memory directly, and it's pretty easy to do in C as well (by casting an integer to a pointer, for example). The point is that unless the code is running in your process (or in the kernel), whatever language it's written in, the OS will normally protect your process from such interference (by mapping memory differently for different processes, for example).
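To make the C part concrete, a minimal sketch that only touches its own process's memory: it takes a variable's address as a plain number, casts it back to a pointer, and writes through it.

#include <stdio.h>
#include <stdint.h>

int main(void) {
    int i = 42;
    uintptr_t addr = (uintptr_t)&i;   /* the variable's address as a plain number */
    int *p = (int *)addr;             /* turn that number back into a pointer */
    *p = 666;                         /* write through it: direct memory access */
    printf("i = %d\n", i);            /* prints i = 666 */
    return 0;
}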

If another process has sufficient permissions, then it can change your process's memory. On Linux, it's as simple as reading and writing the pseudo-file /proc/{pid}/mem. This is how many exploits work, though they do rely on some vulnerability that allows them to run with very high privileges (root on Unix).
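As a rough sketch of that in C (assuming you already know the target pid and the variable's address, e.g. from debug symbols as in the gdb example above, and that the kernel's ptrace permission checks allow you to touch that process at all; poke_int() is just an illustrative helper name):

#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/types.h>

/* Overwrite one int in another process's memory via /proc/<pid>/mem. */
static int poke_int(pid_t pid, off_t addr, int value)
{
    char path[64];
    snprintf(path, sizeof path, "/proc/%d/mem", (int)pid);

    int fd = open(path, O_WRONLY);
    if (fd < 0)
        return -1;                    /* no permission, or no such process */

    ssize_t n = pwrite(fd, &value, sizeof value, addr);  /* write at the variable's address */
    close(fd);
    return (n == (ssize_t)sizeof value) ? 0 : -1;
}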

Short answer: yes. Long answer: it depends on a whole lot of factors, including your hardware (is there memory management?), your OS (protected virtual address spaces? features to circumvent those protections?), and the detailed knowledge your opponent may or may not have of both your language's implementation and your application's structure.

It depends. In general, one of the jobs of an operating system is memory protection -- keeping programs out of each other's memory. If I write a program that tries to access memory that belongs to your program, the OS should kill mine, since I'm committing what's called a segmentation fault.
But there are situations where I can get around that. For example, if I have root privileges on the system, I may be able to access your memory. Or worse -- I can run your program inside a virtual machine, then sit outside that VM and do whatever I want to its memory.
So in general, you should assume that a malicious person can reach in and fiddle with your program's memory if they try hard enough.
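A tiny example of the segmentation-fault case described above (the address is arbitrary; any address this process has no mapping for behaves the same way):

#include <stdio.h>

int main(void) {
    int *p = (int *)0x12345678;   /* an address this process (almost certainly) doesn't own */
    *p = 1;                       /* the OS kills the process with SIGSEGV here */
    printf("never reached\n");
    return 0;
}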

Related

Are function locations altered when running a program through GDB?

I'm trying to run through a buffer overflow exercise; here is the code:
#include <stdio.h>

int badfunction() {
    char buffer[8];
    gets(buffer);
    puts(buffer);
}

int cantrun() {
    printf("This function cant run because it is never called");
}

int main() {
    badfunction();
}
This is a simple piece of code. The objective is to overflow the buffer in badfunction() and overwrite the return address so that it points to the memory address of the function cantrun().
Step 1: Find the offset of the return address (in this case it's 12 bytes: 8 for the buffer and 4 for the saved base pointer).
Step 2: Find the memory location of cantrun(); gdb says it's 0x0804849a.
When I run printf "%012x\x9a\x84\x04\x08" | ./vuln, I get the error "illegal instruction". This suggests to me that I have correctly overwritten the EIP, but that the memory location of cantrun() is incorrect.
I am using Kali Linux, Kernel 3.14, I have ASLR turned off and I am using execstack to allow an executable stack. Am I doing something wrong?
UPDATE:
As a shot in the dark I tried to find the correct instruction by moving the address around, and 0x0804849b does the trick. Why is this different from what GDB shows? When running GDB, 0x0804849a is the location of the prologue instruction push ebp and 0x0804849b is the prologue instruction mov ebp,esp.
gdb doesn't do anything to change the locations of functions in the programs it executes. ASLR may matter, but by default gdb turns this off to enable simpler debugging.
It's hard to say why you are seeing the results you are. What does disassembling the function in gdb show?

In general, on uClinux, is ioctl faster than writing to the /sys filesystem?

I have an embedded system I'm working with, and it currently uses sysfs to control certain features.
However, there is a function that we would like to speed up, if possible.
I discovered that this subsystem also supports an ioctl interface, but before rewriting the code I decided to check which is the faster interface in general (on uClinux): sysfs or ioctl.
Does anybody understand both implementations well enough to give me a rough idea of the difference in overhead for each? I'm looking for generic info, such as "ioctl is faster because you've removed the file layer from the function calls". Or "they are roughly the same because sysfs has a very simple interface".
Update 10/24/2013:
The specific case I'm currently doing is as follows:
int fd = open("/sys/power/state",O_WRONLY);
write( fd, "standby", 7 );
close( fd );
In kernel/power/main.c, the code that handles this write looks like:
static ssize_t state_store(struct kobject *kobj, struct kobj_attribute *attr,
                           const char *buf, size_t n)
{
#ifdef CONFIG_SUSPEND
    suspend_state_t state = PM_SUSPEND_STANDBY;
    const char * const *s;
#endif
    char *p;
    int len;
    int error = -EINVAL;

    p = memchr(buf, '\n', n);
    len = p ? p - buf : n;

    /* First, check if we are requested to hibernate */
    if (len == 7 && !strncmp(buf, "standby", len)) {
        error = enter_standby();
        goto Exit;
    ((( snip )))
Can this be sped up by moving to a custom ioctl() where the code to handle the ioctl call looks something like:
case SNAPSHOT_STANDBY:
    if (!data->frozen) {
        error = -EPERM;
        break;
    }
    error = enter_standby();
    break;
(so the ioctl() calls the same low-level function that the sysfs function did).
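For reference, the userspace side of such a custom ioctl would presumably look something like this; the device path and the SNAPSHOT_STANDBY request number are placeholders taken from the question, not a real interface:

#include <fcntl.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Hypothetical: ask the (assumed) snapshot device to enter standby. */
static int standby_via_ioctl(const char *devpath, unsigned long snapshot_standby)
{
    int fd = open(devpath, O_RDWR);
    if (fd < 0)
        return -1;
    int rc = ioctl(fd, snapshot_standby);   /* single syscall, no string parsing in the kernel */
    close(fd);
    return rc;
}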
If by sysfs you mean the sysfs() library call, notice this in man 2 sysfs:
NOTES
    This System-V derived system call is obsolete; don't use it. On systems with /proc, the same information can be obtained via /proc/filesystems; use that interface instead.
I can't recall noticing stuff that had an ioctl() and a sysfs interface, but probably they exist. I'd use the proc or sys handle anyway, since that tends to be less cryptic and more flexible.
If by sysfs you mean accessing files in /sys, that's the preferred method.
I'm looking for generic info, such as "ioctl is faster because you've removed the file layer from the function calls".
Accessing procfs or sysfs files does not entail an I/O bottleneck, because they are not real files -- they are kernel interfaces. So no, accessing this stuff through "the file layer" does not affect performance. This is a not uncommon misconception in Linux systems programming, I think. Programmers can be squeamish about system calls that aren't, well, system calls, and paranoid that opening a file will somehow be slower. Of course, file I/O in the ABI is just system calls anyway. What makes a normal (disk) file read slow is not the open, read, and write calls; it's the hardware bottleneck.
I always use the low-level descriptor-based functions (open(), read()) instead of high-level streams when doing this, because at some point some experience led me to believe they were more reliable for this specifically (reading from /proc). I can't say whether that's definitively true.
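A minimal sketch of that descriptor-based style -- one open(), one read() -- for a small /proc or /sys attribute (read_attr() is just an illustrative helper):

#include <fcntl.h>
#include <unistd.h>

/* Read a small text attribute (e.g. something under /sys or /proc) into buf. */
static ssize_t read_attr(const char *path, char *buf, size_t len)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    ssize_t n = read(fd, buf, len - 1);   /* attributes are tiny; one read is enough */
    close(fd);
    if (n >= 0)
        buf[n] = '\0';
    return n;
}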
So, since the question was interesting, I built a couple of modules, one for ioctl and one for sysfs; the ioctl implements only a 4-byte copy_from_user and nothing more, and the sysfs write handler does nothing at all.
Then I ran a couple of userspace tests, each up to 1 million iterations; here are the results:
time ./sysfs /sys/kernel/kobject_example/bar
real 0m0.427s
user 0m0.056s
sys 0m0.368s
time ./ioctl /run/temp
real 0m0.236s
user 0m0.060s
sys 0m0.172s
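The test programs aren't shown; a write loop along these lines would reproduce the sysfs side (the attribute path is the one timed above; the 4-byte payload and the iteration count are assumptions):

#include <fcntl.h>
#include <unistd.h>

int main(void)
{
    int fd = open("/sys/kernel/kobject_example/bar", O_WRONLY);
    if (fd < 0)
        return 1;
    for (int i = 0; i < 1000000; i++) {
        int v = i;
        if (pwrite(fd, &v, sizeof v, 0) < 0)   /* 4-byte write, offset 0 each time */
            return 1;
    }
    close(fd);
    return 0;
}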
edit
I agree with @goldilocks' answer: the hardware is the real bottleneck. In a Linux environment with a well-written driver, choosing ioctl or sysfs doesn't make a big difference, but if you are using uClinux, even a few CPU cycles can matter on your hardware.
The test I did is for Linux, not uClinux, and it was never meant to be an absolute reference for profiling the two interfaces. My point is that you can write a book about how fast one or the other is, but only testing will tell you; it took me a few minutes to set the thing up.

Is there a way to check whether the processor cache has been flushed recently?

On i386 Linux. Preferably in C (or the C/POSIX standard libraries, or via /proc) if possible. If not, is there any piece of assembly or third-party library that can do this?
Edit: I'm trying to develop a test of whether a kernel module clears a single cache line or the whole processor cache (with wbinvd()). The program runs as root, but I'd prefer to stay in user space if possible.
Cache coherent systems do their utmost to hide such things from you. I think you will have to observe it indirectly, either by using performance counting registers to detect cache misses or by carefully measuring the time to read a memory location with a high resolution timer.
This program works on my x86_64 box to demonstrate the effects of clflush. It times how long it takes to read a global variable using rdtsc. Because rdtsc is a single instruction tied directly to the CPU clock, using it directly is ideal for this.
Here is the output:
took 81 ticks
took 81 ticks
flush: took 387 ticks
took 72 ticks
You see 3 trials: The first ensures i is in the cache (which it is, because it was just zeroed as part of BSS), the second is a read of i that should be in the cache. Then clflush kicks i out of the cache (along with its neighbors) and shows that re-reading it takes significantly longer. A final read verifies it is back in the cache. The results are very reproducible and the difference is substantial enough to easily see the cache misses. If you cared to calibrate the overhead of rdtsc() you could make the difference even more pronounced.
If you can't read the memory address you want to test (although even mmap of /dev/mem should work for these purposes) you may be able to infer what you want if you know the cacheline size and associativity of the cache. Then you can use accessible memory locations to probe the activity in the set you're interested in.
Source code:
#include <stdio.h>
#include <stdint.h>

/* static inline so the functions also link when the compiler chooses not to inline them */
static inline void
clflush(volatile void *p)
{
    asm volatile ("clflush (%0)" :: "r"(p));
}

static inline uint64_t
rdtsc(void)
{
    unsigned long a, d;
    asm volatile ("rdtsc" : "=a" (a), "=d" (d));
    return a | ((uint64_t)d << 32);
}

volatile int i;

static inline void
test(void)
{
    uint64_t start, end;
    volatile int j;

    start = rdtsc();
    j = i;                  /* the timed read; j is volatile so it cannot be optimized away */
    end = rdtsc();
    printf("took %lu ticks\n", end - start);
}

int
main(int ac, char **av)
{
    test();
    test();
    printf("flush: ");
    clflush(&i);            /* evict i (and its neighbors) from the cache */
    test();
    test();
    return 0;
}
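If you want to calibrate the rdtsc() overhead mentioned above, one rough approach (reusing the rdtsc() helper from the listing) is to time an empty back-to-back pair and subtract that baseline from each measurement:

/* Estimate how many ticks the measurement itself costs. */
static uint64_t rdtsc_overhead(void)
{
    uint64_t best = (uint64_t)-1;
    for (int k = 0; k < 1000; k++) {
        uint64_t start = rdtsc();
        uint64_t end = rdtsc();          /* nothing measured in between */
        if (end - start < best)
            best = end - start;
    }
    return best;                         /* subtract this from each "took N ticks" figure */
}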
I don't know of any generic command to get the cache state, but there are ways:
1. I guess this is the easiest: if you have your kernel module, just disassemble it and look for cache invalidation/flushing instructions (at the moment just three come to my mind: WBINVD, CLFLUSH, INVD).
2. You said it is for i386, but I guess you don't mean an actual 80386. The problem is that there are many different models with different extensions and features. E.g. the newest Intel series includes performance/profiling registers for the cache system, which you can use to evaluate cache misses/hits/number of transfers and similar.
3. Similar to 2, and very dependent on the system you have: when you have a multiprocessor configuration, you could watch the first CPU's cache coherence protocol (MESI) traffic from the second.
You mentioned WBINVD -- AFAIK that will always flush the complete cache, i.e. all cache lines.
It may not be an answer to your specific question, but have you tried using a cache profiler such as Cachegrind? It can only be used to profile userspace code, but you might be able to use it nonetheless, by e.g. moving the code of your function to userspace if it does not depend on any kernel-specific interfaces.
It might actually be more effective than trying to ask the processor for information that may or may not exist and that will probably be affected by your merely asking about it -- yes, Heisenberg was way ahead of his time :-)

SetPriorityClass equivalent on Linux

I have a daemon-like application that does some disk-intensive processing at initialization. To avoid slowing down other tasks I do something like this on Windows:
SetPriorityClass(GetCurrentProcess(), PROCESS_MODE_BACKGROUND_BEGIN);
// initialization tasks
SetPriorityClass(GetCurrentProcess(), PROCESS_MODE_BACKGROUND_END);
// daemon is ready and running at normal priority
AFAIK, on Unices I can call nice or setpriority and lower the process priority, but I can't raise it back to what it was at process creation (i.e. there's no equivalent to the second SetPriorityClass invocation) unless I have superuser privileges. Is there by any chance another way of doing it that I'm missing? (I know I could create an initialization thread that runs at low priority and have the main thread wait for it to complete, but I'd rather avoid that.)
edit: Bonus points for the equivalents of SetThreadPriority(GetCurrentThread(), THREAD_MODE_BACKGROUND_BEGIN); and SetThreadPriority(GetCurrentThread(), THREAD_MODE_BACKGROUND_END);
You've said that your processing is disk intensive, so solutions using nice won't work. nice handles the priority of CPU access, not I/O access. PROCESS_MODE_BACKGROUND_BEGIN lowers I/O priority as well as CPU priority, and requires kernel features that don't exist in XP and older.
Controlling I/O priority is not portable across Unices, but there is a solution on modern Linux kernels. You'll need CAP_SYS_ADMIN to lower I/O priority to the idle class (IOPRIO_CLASS_IDLE), but it is possible to lower and raise priority within the best-effort class without it.
The key function call is ioprio_set, which you'll have to call via a syscall wrapper:
#include <sys/syscall.h>
#include <unistd.h>

static int ioprio_set(int which, int who, int ioprio)
{
    return syscall(SYS_ioprio_set, which, who, ioprio);
}
For full example source, see here.
Depending on permissions, your entry to background mode is either IOPRIO_PRIO_VALUE(IOPRIO_CLASS_IDLE, 0) or IOPRIO_PRIO_VALUE(IOPRIO_CLASS_BE, 7). The sequence should then be:
/* constants from the kernel's ioprio.h */
#define IOPRIO_CLASS_SHIFT 13
#define IOPRIO_WHO_PROCESS 1
#define IOPRIO_CLASS_BE    2
#define IOPRIO_PRIO_VALUE(class, data) (((class) << IOPRIO_CLASS_SHIFT) | (data))
ioprio_set(IOPRIO_WHO_PROCESS, 0, IOPRIO_PRIO_VALUE(IOPRIO_CLASS_BE, 7));
// Do work
ioprio_set(IOPRIO_WHO_PROCESS, 0, IOPRIO_PRIO_VALUE(IOPRIO_CLASS_BE, 4));
Note that you may not have permission to return to your original I/O priority, so you'll need to return to another best-effort value.
Actually, if you have a reasonably recent Linux kernel there might be a solution. Here's what TLPI says:
In Linux kernels before 2.6.12, an unprivileged process may use setpriority() only to (irreversibly) lower its own or another process's nice value.
Since kernel 2.6.12, Linux provides the RLIMIT_NICE resource limit, which permits unprivileged processes to increase nice values. An unprivileged process can raise its own nice value to the maximum specified by the formula 20 - rlim_cur, where rlim_cur is the current RLIMIT_NICE soft resource limit.
So basically you have to:
Use ulimit -e to set RLIMIT_NICE
Use setpriority as usual
Here is an example
Edit /etc/security/limits.conf. Add
cnicutar - nice -10
Verify using ulimit
cnicutar@aiur:~$ ulimit -e
30
We like that limit so we don't change it.
nice ls
cnicutar@aiur:~$ nice -n -10 ls tmp
cnicutar@aiur:~$
cnicutar@aiur:~$ nice -n -11 ls tmp
nice: cannot set niceness: Permission denied
setpriority example
#include <stdio.h>
#include <sys/resource.h>
#include <unistd.h>

int main()
{
    int rc;

    printf("We are being nice!\n");

    /* set our nice to 10 */
    rc = setpriority(PRIO_PROCESS, 0, 10);
    if (0 != rc) {
        perror("setpriority");
    }

    sleep(1);
    printf("Stop being nice\n");

    /* set our nice to -10 */
    rc = setpriority(PRIO_PROCESS, 0, -10);
    if (0 != rc) {
        perror("setpriority");
    }

    return 0;
}
Test program
cnicutar@aiur:~$ ./nnice
We are being nice!
Stop being nice
cnicutar@aiur:~$
The only drawback to this is that it's not portable to other Unixes (or is it Unices?).
To work around lowering the priority and then bringing it back, you can:
fork()
CHILD: lower its priority
PARENT: wait for the child (keeping original parent's priority)
CHILD: do the job (in lower priority)
PARENT: continue with original priority after child is finished.
This should be a UNIX-portable solution; a minimal sketch follows.
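A minimal sketch of that fork()-based workaround, assuming the disk-intensive initialization can run in a child process (do_initialization() is a placeholder):

#include <sys/types.h>
#include <sys/resource.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return 1;                            /* fork failed */
    if (pid == 0) {                          /* CHILD */
        setpriority(PRIO_PROCESS, 0, 19);    /* drop CPU priority; no privilege needed */
        /* do_initialization(); */           /* the disk-intensive work goes here */
        _exit(0);
    }
    waitpid(pid, NULL, 0);                   /* PARENT: wait, keeping its original priority */
    /* daemon continues here at its original priority */
    return 0;
}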

Disable randomization of memory addresses

I'm trying to debug a binary that uses a lot of pointers. Sometimes, to see output quickly and figure out errors, I print out the addresses of objects and their corresponding values; however, the object addresses are randomized, which defeats the purpose of this quick check.
Is there a way to disable this temporarily/permanently so that I get the same values every time I run the program?
Oops. OS is Linux fsttcs1 2.6.32-28-generic #55-Ubuntu SMP Mon Jan 10 23:42:43 UTC 2011 x86_64 GNU/Linux
On Ubuntu, it can be disabled with...
echo 0 > /proc/sys/kernel/randomize_va_space
On Windows, this post might be of some help...
http://blog.didierstevens.com/2007/11/20/quickpost-another-funny-vista-trick-with-aslr/
To temporarily disable ASLR for a particular program you can always issue the following (no need for sudo)
setarch `uname -m` -R ./yourProgram
You can also do this programmatically from C source before a UNIX exec.
If you take a look at the sources for setarch (here's one source):
http://code.metager.de/source/xref/linux/utils/util-linux/sys-utils/setarch.c
You can see it boils down to a system call (syscall) or a function call (depending on what your system defines). From setarch.c:
#ifndef HAVE_PERSONALITY
# include <syscall.h>
# define personality(pers) ((long)syscall(SYS_personality, pers))
#endif
On my CentOS 6 64-bit system, it looks like it uses a function (which probably calls the self-same syscall above). Take a look at this snippet from the include file /usr/include/sys/personality.h (referenced as <sys/personality.h> in the setarch source code):
/* Set different ABIs (personalities). */
extern int personality (unsigned long int __persona) __THROW;
What it boils down to is that you can, from C code, call and set the personality to use ADDR_NO_RANDOMIZE and then exec (just like setarch does).
#include <sys/personality.h>

#ifndef HAVE_PERSONALITY
# include <syscall.h>
# define personality(pers) ((long)syscall(SYS_personality, pers))
#endif

...

void mycode(char **argv)    /* argv passed in so we can re-exec ourselves */
{
    // If requested, turn off the address randomization feature right before execing
    if (MyGlobalVar_Turn_Address_Randomization_Off) {
        personality(ADDR_NO_RANDOMIZE);
    }
    execvp(argv[0], argv); // ... from set-arch.
}
It's pretty obvious you can't turn address randomization off in the process you are in (grin: unless maybe dynamic loading), so this only affects forks and execs later. I believe the Address Randomization flags are inherited by child sub-processes?
Anyway, that's how you can programmatically turn off address randomization in C source code. This may be your only solution if you don't want to force a user to intervene manually and start up with setarch or one of the other solutions listed earlier.
Before you complain about security issues in turning this off, some shared memory libraries/tools (such as PickingTools shared memory and some IBM databases) need to be able to turn off randomization of memory addresses.
