I am trying to get the offset of sys_call_table on Linux x86_64.
First, I read the address of the system_call entry point from MSR_LSTAR, and that part is correct:
static unsigned long read_msr(unsigned int msr)
{
    unsigned low, high;
    asm volatile("rdmsr" : "=a" (low), "=d" (high) : "c" (msr));
    return ((low) | ((u64)(high) << 32));
}
Then I scan forward from it to find the opcode of the call instruction, and that also works:
#define CALL_OP 0xFF
#define CALL_MODRM 0x14
static unsigned long find_syscall_table(unsigned char *ptr)
{
    //correct
    for (; (*ptr != CALL_OP) || (*(ptr+1) != CALL_MODRM); ptr++);
    //not correct
    ptr += *(unsigned int*)(ptr + 3);
    pr_info("%lx", (unsigned long)ptr);
    return ptr;
}
But I fail to get the address after the call opcode. The first byte at ptr is the opcode, then the ModRM byte, then the SIB byte, and then a 32-bit displacement. So I add 3 to ptr, dereference it as an integer, and add that value to ptr, because it is %RIP and the address is RIP-relative. But the resulting value is wrong: it doesn't match the value I see in gdb. Where am I going wrong?
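It's not 0x7e9fed00 but rather -0x7e9fed00, a negative displacement.
That is the sign-magnitude form of the two's complement negative number 0x81601300,
which a little-endian processor stores as "00 13 60 81".
Also note that ModRM 0x14 selects SIB addressing. When the SIB byte has no base register, as in call *sys_call_table(,%rax,8), the displacement is an absolute address sign-extended to 64 bits (0xffffffff81601300 here). It is not an offset from %rip, so it should not be added to ptr.
I have no idea whether you will find sys_call_table at the resulting address, however. As an alternative, some people seem to find it by searching memory for known pointers to functions that should be listed in it.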
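Here is a minimal userspace sketch of that decoding. It assumes the call site is encoded as FF 14 C5 followed by the little-endian disp32. The C5 SIB byte, meaning (,%rax,8) with no base register, is an assumption about this kernel's entry code. The helper name table_addr_from_call is hypothetical. The sketch assembles the four bytes explicitly so the result does not depend on host byte order, then sign-extends the value instead of adding it to the instruction pointer.

```c
#include <stdint.h>

/*
 * Hypothetical helper: return the table address encoded in a
 * "call *disp32(,%rax,8)" instruction, assumed to be FF 14 C5 <disp32>.
 * With a SIB byte and no base register the disp32 is absolute, so it is
 * sign-extended to 64 bits rather than added to the instruction pointer.
 */
static uint64_t table_addr_from_call(const unsigned char *insn)
{
    /* Assemble the little-endian disp32 that follows FF 14 C5. */
    uint32_t raw = (uint32_t)insn[3]
                 | (uint32_t)insn[4] << 8
                 | (uint32_t)insn[5] << 16
                 | (uint32_t)insn[6] << 24;

    /* Reinterpret as signed 32-bit, then sign-extend to 64 bits. */
    int32_t disp = (int32_t)raw;
    return (uint64_t)(int64_t)disp;
}
```

Given the bytes ff 14 c5 00 13 60 81, the function yields 0xffffffff81601300.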
I am trying to insert probes with kprobes at different instructions in a function of a kernel module.
But register_kprobe returns -EINVAL (-22) for the instruction addresses 0xffffffffa33c1085 and 0xffffffffa33c109b in the assembly below (it succeeds for all other instruction addresses).
Instructions giving errors:
0xffffffffa33c1085 <test_increment+5>: mov 0x21bd(%rip),%eax # 0xffffffffa33c3248
0xffffffffa33c109b <test_increment+27>: mov %esi,0x21a7(%rip) # 0xffffffffa33c3248
I observed that both of these instructions use the rip register. I tried functions in other modules and saw the same error for instructions that use rip.
Why is register_kprobe failing? Does it have any constraints involving rip? Any help is appreciated.
The system has kernel 3.10.0-514 on x86_64 installed.
Code that registers the kprobe:
kp = kzalloc(sizeof(struct kprobe), GFP_KERNEL);
kp->post_handler = exit_func;
kp->pre_handler = entry_func;
kp->addr = sym_addr;
atomic_set(&pcount, 0);
ret = register_kprobe(kp);
if ( ret != 0 ) {
    printk(KERN_INFO "register_kprobe returned %d for %s\n", ret, str);
    kfree(kp);
    kp=NULL;
    return ret;
}
Probed function:
int race=0;
void test_increment()
{
    race++;
    printk(KERN_INFO "VALUE=%d\n",race);
    return;
}
Assembly code:
crash> dis -l test_increment
0xffffffffa33c1080 <test_increment>: nopl 0x0(%rax,%rax,1) [FTRACE NOP]
0xffffffffa33c1085 <test_increment+5>: mov 0x21bd(%rip),%eax # 0xffffffffa33c3248
0xffffffffa33c108b <test_increment+11>: push %rbp
0xffffffffa33c108c <test_increment+12>: mov $0xffffffffa33c2024,%rdi
0xffffffffa33c1093 <test_increment+19>: mov %rsp,%rbp
0xffffffffa33c1096 <test_increment+22>: lea 0x1(%rax),%esi
0xffffffffa33c1099 <test_increment+25>: xor %eax,%eax
0xffffffffa33c109b <test_increment+27>: mov %esi,0x21a7(%rip) # 0xffffffffa33c3248
0xffffffffa33c10a1 <test_increment+33>: callq 0xffffffff81659552 <printk>
0xffffffffa33c10a6 <test_increment+38>: pop %rbp
0xffffffffa33c10a7 <test_increment+39>: retq
It turns out that register_kprobe does have limitations with instructions involving RIP-relative addressing on x86_64.
Here is the snippet of __copy_instruction that causes the error (call chain: register_kprobe -> prepare_kprobe -> arch_prepare_kprobe -> arch_copy_kprobe -> __copy_instruction):
#ifdef CONFIG_X86_64
if (insn_rip_relative(&insn)) {
s64 newdisp;
u8 *disp;
kernel_insn_init(&insn, dest);
insn_get_displacement(&insn);
/*
* The copied instruction uses the %rip-relative addressing
* mode. Adjust the displacement for the difference between
* the original location of this instruction and the location
* of the copy that will actually be run. The tricky bit here
* is making sure that the sign extension happens correctly in
* this calculation, since we need a signed 32-bit result to
* be sign-extended to 64 bits when it's added to the %rip
* value and yield the same 64-bit result that the sign-
* extension of the original signed 32-bit displacement would
* have given.
*/
newdisp = (u8 *) src + (s64) insn.displacement.value - (u8 *) dest;
if ((s64) (s32) newdisp != newdisp) {
pr_err("Kprobes error: new displacement does not fit into s32 (%llx)\n", newdisp);
pr_err("\tSrc: %p, Dest: %p, old disp: %x\n", src, dest, insn.displacement.value);
return 0;
}
disp = (u8 *) dest + insn_offset_displacement(&insn);
*(s32 *) disp = (s32) newdisp;
}
#endif
http://elixir.free-electrons.com/linux/v3.10/ident/__copy_instruction
A new displacement is calculated relative to the new instruction address (where the original instruction is copied). If that value doesn't fit in 32 bits, __copy_instruction returns 0, which results in the -EINVAL error. Hence the failure.
As a workaround, you can probe a neighboring instruction instead. Use a post_handler on the previous instruction or a pre_handler on the next one, depending on what you need (this works for me).
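The failing check can be reproduced outside the kernel. This is a minimal userspace sketch of the same arithmetic, not the kernel's code. It treats addresses as plain 64-bit integers, and the helper name adjust_rip_disp is hypothetical. Because the original instruction and its copy have the same length, their next-instruction addresses differ by exactly src - dest, so only the start addresses matter.

```c
#include <stdint.h>

/*
 * Hypothetical helper mirroring the range check in __copy_instruction:
 * an instruction at src with RIP-relative displacement old_disp is copied
 * to dest. Returns 1 and stores the adjusted displacement if it still fits
 * in a signed 32-bit field, or 0 if it does not (the -EINVAL case).
 */
static int adjust_rip_disp(uint64_t src, uint64_t dest, int32_t old_disp,
                           int32_t *new_disp)
{
    /* The copy must reach the same target, so shift by the move distance. */
    int64_t newdisp = (int64_t)(src - dest) + old_disp;

    /* Same test as the kernel: does truncating to s32 round-trip? */
    if ((int64_t)(int32_t)newdisp != newdisp)
        return 0;

    *new_disp = (int32_t)newdisp;
    return 1;
}
```

Take src = 0xffffffffa33c1085 and the displacement 0x21bd from the first failing mov. A copy placed 4 KiB later needs displacement 0x11bd and fits. A copy at an arbitrary far-away address such as 0xffffc90000000000 (chosen only for illustration) does not fit, and that is the case where register_kprobe gives up.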
I am trying to write my own read system call wrapper for 64-bit Linux, but the assembler keeps telling me I have a bad operand type. Is the compiler trying to address buf indirectly? I have a feeling I messed up my input constraints. I just need the address of buf in %2.
Error:
test.c: Assembler messages:
test.c:28: Error: operand type mismatch for `movq'
static int myread(int fd, char *buf, int size) {
    register int bytes;
    asm(
        "movq $0, %%rax\n"
        "movq %1, %%rdi\n"
        "movq %2, %%rsi\n"
        "movq %3, %%rdx\n"
        "syscall\n"
        "movq %%rax, %0"
        : "=r" (bytes)
        : "m" (fd), "m" (buf), "m" (size)
        : "%rax", "%rdi", "%rsi", "%rdx"
    );
    return bytes;
}
As Mystical said, the mismatch error comes from using movq (which operates on 64-bit values) on 32-bit integers (fd and size).
But beyond that, this code is really inefficient and subtly (but dangerously) flawed. Maybe something more like this:
static int myread(int fd, char *buf, int size) {
    register int bytes;
    asm(
        "syscall"
        : "=a" (bytes)
        : "D" (fd), "S" (buf), "d" (size), "0" (0)
        : "rcx", "r11", "memory", "cc"
    );
    return bytes;
}
To understand this, check out GCC's machine constraints for the i386 (x86) family.
Note that syscalls clobber the rcx and r11 registers. Failing to tell the compiler that you are changing these values can lead to very strange problems, and those problems won't show up at the syscall but a hundred lines downstream.
I'm also going to make a pitch for NOT using inline asm. I'm not sure why you don't just use the libc system call wrappers such as read(), but you are setting yourself up for grief.
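If you do stay with inline asm, here is a self-contained sketch of the answer's approach, with a few assumptions beyond it. The name raw_read is hypothetical. fd is widened to 64 bits explicitly, and the return type is long so that a raw -errno from the kernel survives intact. Non-x86_64 builds fall back to libc read(), which returns -1 and sets errno instead. The sketch relies on the x86_64 Linux convention that read is syscall number 0.

```c
#include <unistd.h>

/* Hypothetical wrapper: read(2) issued directly via the syscall instruction. */
static long raw_read(int fd, void *buf, unsigned long count)
{
#if defined(__x86_64__) && defined(__linux__)
    long ret;
    __asm__ volatile (
        "syscall"
        : "=a" (ret)            /* rax: result, or -errno on failure */
        : "0" (0L),             /* rax in: syscall number 0 = read */
          "D" ((long)fd),       /* rdi: file descriptor */
          "S" (buf),            /* rsi: destination buffer */
          "d" (count)           /* rdx: byte count */
        : "rcx", "r11",         /* syscall overwrites rcx and r11 */
          "memory", "cc");      /* kernel writes into buf */
    return ret;
#else
    /* Fallback for other targets: libc returns -1 and sets errno. */
    return read(fd, buf, count);
#endif
}
```

For example, after "hi" is written into a pipe, raw_read on the pipe's read end returns 2.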
In do_IRQ you can find the following code:
#ifdef CONFIG_DEBUG_STACKOVERFLOW
    /* Debugging check for stack overflow: is there less than 1KB free? */
    {
        long esp;
        __asm__ __volatile__("andl %%esp,%0" :
                     "=r" (esp) : "0" (THREAD_SIZE - 1));
        if (unlikely(esp < (sizeof(struct thread_info) + STACK_WARN))) {
            printk("do_IRQ: stack overflow: %ld\n",
                esp - sizeof(struct thread_info));
            dump_stack();
        }
    }
#endif
I don't understand the meaning of this inline assembly:
__asm__ __volatile__("andl %%esp,%0" :
"=r" (esp) : "0" (THREAD_SIZE - 1));
What does THREAD_SIZE - 1 mean?
I remember that the expression in parentheses should be a C variable, like esp in the output part. But in the input part it looks like an integer rather than a C symbol. Can someone help?
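The "0" constraint means: put this input in the same location as operand 0, i.e. the same register as the output (see http://www.ibiblio.org/gferg/ldp/GCC-Inline-Assembly-HOWTO.html#ss6.1, section 6.1.3, Matching (Digit) constraints).
Basically, this snippet loads THREAD_SIZE - 1 into a register as the input, ANDs %esp into that same register, and outputs the result from it. That register is then stored in the C variable esp. THREAD_SIZE - 1 is just a C expression: input operands can be any expression, not only variables. Because the kernel stack is THREAD_SIZE-aligned, esp & (THREAD_SIZE - 1) is the offset of the stack pointer from the bottom of the stack area, where struct thread_info sits. The check therefore warns when fewer than STACK_WARN bytes of free stack remain above thread_info.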
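The sketch below shows the same pattern in userspace, so the matching constraint can be seen in isolation. DEMO_THREAD_SIZE is a hypothetical stand-in for THREAD_SIZE. A plain function argument replaces %esp so the result can be checked. On non-x86 builds the sketch falls back to the equivalent C expression.

```c
#include <stdint.h>

/* Hypothetical stand-in for the kernel's THREAD_SIZE (8 KiB here). */
#define DEMO_THREAD_SIZE 8192u

/* Returns sp & (DEMO_THREAD_SIZE - 1): the offset of sp inside its
   DEMO_THREAD_SIZE-aligned block, computed the way do_IRQ does it. */
static uint32_t offset_in_stack(uint32_t sp)
{
    uint32_t result;
#if defined(__i386__) || defined(__x86_64__)
    /* "0" puts the mask in the same register as output operand 0,
       and andl then ANDs sp into that register in place. */
    __asm__ ("andl %1, %0"
             : "=r" (result)
             : "r" (sp), "0" (DEMO_THREAD_SIZE - 1)
             : "cc");
#else
    result = sp & (DEMO_THREAD_SIZE - 1);
#endif
    return result;
}
```

For instance, offset_in_stack(0x12345678) returns 0x1678, the position of that address inside its 8 KiB-aligned block.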
My main aim is to get the addresses of the last 16 branches recorded in the LBR registers when a program crashes. So far I have tried two approaches:
1) msr-tools
This lets me read MSR values from the command line. I invoke it from the C program itself and try to read the values, but the register values seem unrelated to the addresses in the program. Most probably the registers are being polluted by other branches in system code. I tried turning off recording of branches in ring 0 and of far jumps, but that doesn't help; I still get unrelated values.
2) Accessing them through a kernel module
OK, I wrote a very simple module (I've never done this before) to access the MSRs directly and possibly avoid the register pollution.
Here's what I have:
#define LBR 0x1d9 //IA32_DEBUGCTL MSR
//I first set this to some non 0 value using wrmsr (msr-tools)
static void __init do_rdmsr(unsigned msr, unsigned unused2)
{
    uint64_t msr_value;
    __asm__ __volatile__ (" rdmsr"
        : "=A" (msr_value)
        : "c" (msr)
    );
    printk(KERN_EMERG "%lu \n",msr_value);
}
static int hello_init(void)
{
    printk(KERN_EMERG "Value is ");
    do_rdmsr (LBR,0);
    return 0;
}
static void hello_exit(void)
{
    printk(KERN_EMERG "End\n");
}
module_init(hello_init);
module_exit(hello_exit);
But the problem is that every time I use dmesg to read the output I get just
Value is 0
(I have tried other registers too; the value is always 0.)
Is there something that I am forgetting here?
Any help? Thanks
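Use the following. On x86_64 the "=A" constraint does not mean the EDX:EAX pair for a 64-bit value: a 64-bit operand is placed in just one of rax or rdx, so you only get one half of the result. Read the two halves separately with "=a" and "=d" and combine them: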
unsigned long long x86_get_msr(int msr)
{
    unsigned long msrl = 0, msrh = 0;
    /* NOTE: rdmsr always returns its value in the EDX:EAX pair */
    asm volatile ("rdmsr" : "=a"(msrl), "=d"(msrh) : "c"(msr));
    return ((unsigned long long)msrh << 32) | msrl;
}
You can use Ilya Matveychikov's answer, or you can use the helpers the kernel already provides:
#include <asm/msr.h>
int err;
unsigned int msr, cpu;
unsigned long long val;
/* rdmsr without exception handling (rdmsrl is a macro that stores into val) */
rdmsrl(msr, val);
/* rdmsr with exception handling */
err = rdmsrl_safe(msr, &val);
/* rdmsr on a given CPU (instead of the current one) */
err = rdmsrl_safe_on_cpu(cpu, msr, &val);
And there are many more functions, such as:
int msr_set_bit(u32 msr, u8 bit)
int msr_clear_bit(u32 msr, u8 bit)
void rdmsr_on_cpus(const struct cpumask *mask, u32 msr_no, struct msr *msrs)
int rdmsr_safe_regs_on_cpu(unsigned int cpu, u32 regs[8])
Have a look at /lib/modules/$(uname -r)/build/arch/x86/include/asm/msr.h
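For the user-space route from the question (approach 1), msr-tools reads MSRs through the msr driver's /dev/cpu/N/msr files. The file offset selects the MSR, and each read returns 8 bytes. Below is a minimal sketch of the same access, assuming the msr module is loaded and the program runs as root. The helper name read_msr_dev is hypothetical. It reads the MSR on whichever CPU you name, which is not necessarily the CPU the crashing program ran on, so it does not by itself address the LBR pollution problem.

```c
#define _XOPEN_SOURCE 700 /* for pread() */
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical helper: read one MSR on a given CPU through the msr driver.
 * Requires root and the msr module (modprobe msr).
 * Returns 0 on success, -1 on failure.
 */
static int read_msr_dev(unsigned int cpu, uint32_t msr, uint64_t *value)
{
    char path[64];
    int fd;
    ssize_t n;

    snprintf(path, sizeof path, "/dev/cpu/%u/msr", cpu);
    fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    /* The file offset is the MSR number; the 8 bytes are EDX:EAX combined. */
    n = pread(fd, value, sizeof *value, (off_t)msr);
    close(fd);
    return n == (ssize_t)sizeof *value ? 0 : -1;
}
```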
We can initialize a character pointer in C like this:
char *c="test";
Here c points to the first character ('t').
But when I run code like the following, it gives a segmentation fault:
#include<stdio.h>
#include<stdlib.h>
main()
{
    int *i=0;
    printf("%d",*i);
}
But when I run this instead:
#include<stdio.h>
#include<stdlib.h>
main()
{
    int *i;
    i=(int *)malloc(2);
    *i=0;
    printf("%d",*i);
}
It worked (it printed 0).
When I used malloc(0), it also worked (it printed 0).
Please tell me what is happening.
Your first example seg faults because you are trying to dereference a null pointer, which you created with the line:
int *i=0;
You can't dereference a pointer that doesn't point to anything and expect good things to happen. =)
The second code segment works because you have actually assigned memory to your pointer using malloc, which you may dereference. However, an int is typically 4 bytes and you've only allocated 2. The statement *i=0 writes 4 bytes, 2 of them past the end of your allocation, and printf then reads those 4 bytes back. That is a buffer overflow and undefined behavior. It only appears to work because malloc usually hands out a somewhat larger block than requested, and it could just as easily corrupt neighboring heap data or print something other than 0. The same applies to malloc(0), which returns either NULL or a pointer that must not be dereferenced at all. You should malloc the size of memory needed for the type you are trying to use/point at.
(i.e. int *i = (int *) malloc(sizeof(int)); )
Once the pointer points at memory of the correct size, you can set the value like this:
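#include <stdlib.h>
#include <stdio.h>
int main (int argc, char *argv[])
{
    int *i = (int *)malloc(sizeof(int)); // allocate exactly one int
    if (i == NULL)                       // malloc can fail
        return 1;
    *i = 25;
    printf("i = %d\n",*i);
    *i = 12;
    printf("i = %d\n",*i);
    free(i);                             // release the allocation when done
    return 0;
}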
Edit based on comment:
A pointer points to memory, not to values. When you write char *ptr="test"; you are not assigning the value "test". You are assigning the address where the compiler placed the string literal "test", which usually lives in a read-only data section of your process. If you tried to modify that string, your program would likely seg-fault. What you need to realize about a char * is that it points at a single character: the first one in the string. When you dereference a char *, you get that one character and nothing more. C uses null-terminated strings. Notice that you do not dereference ptr when calling printf with %s: you pass the pointer itself, which points at just the first character. How it is displayed depends on the format passed to printf. With %c you pass *ptr, and printf prints the single character ptr points at; with %p it prints the address ptr holds. To get the entire string you pass %s, which makes printf start at the pointer you passed in and read each successive byte until it reaches the terminating null byte. Below is some code that demonstrates these.
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
int main (int argc, char *argv[])
{
    // Points at a string literal, typically stored in read-only memory
    char *ptr = "test";
    printf("ptr points at = %p\n", (void *)ptr); // Prints the address ptr points to
    printf("ptr dereferenced = %c\n", *ptr);     // Prints the char at address ptr
    printf("ptr value = %s\n", ptr);             // Prints the string of chars pointed to by ptr
    // Uncomment this to see bad behavior!
    // ptr[1] = 'E'; // SEG FAULT -> Attempting to modify read-only memory
    printf("--------------------\n");
    // Use memory you have allocated explicitly and can modify
    ptr = malloc(10);
    if (ptr == NULL)                             // malloc can fail
        return 1;
    strncpy(ptr, "foo", 10);
    printf("ptr now points at = %p\n", (void *)ptr); // Prints the address ptr points to
    printf("ptr dereferenced = %c\n", *ptr);         // Prints the char at address ptr
    printf("ptr value = %s\n", ptr);                 // Prints the string of chars pointed to by ptr
    ptr[1] = 'F'; // Change the second char in the string to F
    printf("ptr value (mod) = %s\n", ptr);
    free(ptr);    // Release the heap buffer
    return 0;
}