Xbox 360, PPC function Hook crashes when i call a function in the hook. PowerPC - hook

i am facing a problem that I could not resolve so i have to turn to the community to help me out. The problem is related to PPC function hooking.
The area where i am hooking is this.
.text:8220D810 mflr r12
.text:8220D814 bl __savegprlr_20
.text:8220D818 stfd f31, var_70(r1)
.text:8220D81C stwu r1, -0x100(r1)
.text:8220D820 lis r11, off_82A9CCC0#ha // => This is where i am hooking the function
.text:8220D824 lis r22, dword_82BBAE68#ha // These 4 instructions are overwritt
.text:8220D828 lis r10, 8 # 0x87700 //Patched
.text:8220D82C mr r26, r3 //Patched
.text:8220D830 li r20, 0
.text:8220D834 lwz r9, off_82A9CCC0#l(r11)
.text:8220D838 ori r23, r10, 0x7700 # 0x87700
.text:8220D83C lwz r11, dword_82BBAE68#l(r22)
.text:8220D840 cmplwi cr6, r11, 0
.text:8220D844 stw r9, 0x100+var_7C(r1)
.text:8220D848 bne cr6, loc_8220D854
.text:8220D84C mr r30, r20
.text:8220D850 b loc_8220D85C
Here it jumps to my code cave that is mentioned below. The patched instructions are correctly written in the PredictPlayerHook function and is not the problem.
The problem here is if i call a function in the hook e.g here i call "GetCurrentCmdNumber(0);" it causes the game to crash. Now without calling any functions the game doesn't crash and the code cave works without any issues. but if I try to call any function within the code cave(PredictPlayerHook) it just crashes. I cant debug it so i dont know where it crashes.
void __declspec(naked) PredictPlayerHook(){
DWORD R11,Return,cmdNumber;
__asm lis r11, 0x82AA //patched instructions
__asm lis r22, 0x82BC //patched instructions
__asm lis r10, 0x8 //patched instructions
__asm mr r26, r3 //patched instructions
__asm mflr r0 ; //mflr grabs the link register, and stores it into the first operand. r0 is now the link register
__asm stw r0, -0x14(r1) ; //Save the link register inside the stack frame
__asm stwu r1, -0x90(r1) ;// This is pushing the stack (hence push)
// cmdNumber = GetCurrentCmdNumber(0);
__asm addi r1, r1, 0x90 ;//popping the stack frame
__asm lwz r0,-0x14(r1) ; //Reading the link register from the sack
__asm mtlr r0
__asm stw r11,R11
//Return = 0x82200230;
__asm lis r11,0x8220 //Return Address is correct. The difference is in IDA segment, it is +0xD600 ahead of the original address.
__asm ori r11,r11,0x0230
__asm mtctr r11
__asm lwz r11,R11
__asm bctr
}
Here is the function itself and its correct. I can use it in a hook placed else where in the game so it has no issues.
typedef int (__cdecl* CL_GetCurrentCmdNumber)(int localClientNum);
CL_GetCurrentCmdNumber GetCurrentCmdNumber = (CL_GetCurrentCmdNumber)0x82261F90;

Related

What's the meaning of mov 0x8(%r14,%r15,8),%rax

In here what's the meaning of 0x8(%r14,%r15,8), I know 0x8(%r14,%r15,8) is SRC, but I don't understand why use two register %r14 and %r15 in here, and I don't understand how to cal the src address.
Thanks so much for any input.
Information pulled from http://flint.cs.yale.edu/cs421/papers/x86-asm/asm.html
AT&T Addressing:
Memory Address Reference: Address_or_Offset(%base_or_offset, %Index_Register, Scale)
Final Address Calculation: Address_or_Offset + %base_or_offset + [Scale * %Index_Reg]
Example:
mov (%esi,%ebx,4), %edx /* Move the 4 bytes of data at address ESI+4*EBX into EDX. */

register_kprobe returns EINVAL (-22) error for instructions involving rip

I am trying to insert probes at different instructions with kprobes in function of kernel module.
But register_kprobe is returning EINVAL(-22) error for 0xffffffffa33c1085 instruction addresses and 0xffffffffa33c109b from below assembly code (it passes for all other instruction addresses).
Instructions giving errors:
0xffffffffa33c1085 <test_increment+5>: mov 0x21bd(%rip),%eax # 0xffffffffa33c3248
0xffffffffa33c109b <test_increment+27>: mov %esi,0x21a7(%rip) # 0xffffffffa33c3248
Observed that both these instructions use rip register. Tried with functions of other modules, observed same error with instructions which use rip register.
Why is register_kprobe failing ? does it have any constraints involving rip ? Any help is appreciated.
System has kernel 3.10.0-514 on x86_64 installed.
kprobe function:
kp = kzalloc(sizeof(struct kprobe), GFP_KERNEL);
kp->post_handler = exit_func;
kp->pre_handler = entry_func;
kp->addr = sym_addr;
atomic_set(&pcount, 0);
ret = register_kprobe(kp);
if ( ret != 0 ) {
printk(KERN_INFO "register_kprobe returned %d for %s\n", ret, str);
kfree(kp);
kp=NULL;
return ret;
}
probed function:
int race=0;
void test_increment()
{
race++;
printk(KERN_INFO "VALUE=%d\n",race);
return;
}
assembly code:
crash> dis -l test_increment
0xffffffffa33c1080 <test_increment>: nopl 0x0(%rax,%rax,1) [FTRACE NOP]
0xffffffffa33c1085 <test_increment+5>: mov 0x21bd(%rip),%eax # 0xffffffffa33c3248
0xffffffffa33c108b <test_increment+11>: push %rbp
0xffffffffa33c108c <test_increment+12>: mov $0xffffffffa33c2024,%rdi
0xffffffffa33c1093 <test_increment+19>: mov %rsp,%rbp
0xffffffffa33c1096 <test_increment+22>: lea 0x1(%rax),%esi
0xffffffffa33c1099 <test_increment+25>: xor %eax,%eax
0xffffffffa33c109b <test_increment+27>: mov %esi,0x21a7(%rip) # 0xffffffffa33c3248
0xffffffffa33c10a1 <test_increment+33>: callq 0xffffffff81659552 <printk>
0xffffffffa33c10a6 <test_increment+38>: pop %rbp
0xffffffffa33c10a7 <test_increment+39>: retq
Thanks
Turns out, register_kprobe does have limitations with instructions invoving rip relative addressing for x86_64.
Here is snippet of __copy_instruction function code causing error (register_kprobe -> prepare_kprobe -> arch_prepare_kprobe -> arch_copy_kprobe -> __copy_instruction )
#ifdef CONFIG_X86_64
if (insn_rip_relative(&insn)) {
s64 newdisp;
u8 *disp;
kernel_insn_init(&insn, dest);
insn_get_displacement(&insn);
/*
* The copied instruction uses the %rip-relative addressing
* mode. Adjust the displacement for the difference between
* the original location of this instruction and the location
* of the copy that will actually be run. The tricky bit here
* is making sure that the sign extension happens correctly in
* this calculation, since we need a signed 32-bit result to
* be sign-extended to 64 bits when it's added to the %rip
* value and yield the same 64-bit result that the sign-
* extension of the original signed 32-bit displacement would
* have given.
*/
newdisp = (u8 *) src + (s64) insn.displacement.value - (u8 *) dest;
if ((s64) (s32) newdisp != newdisp) {
pr_err("Kprobes error: new displacement does not fit into s32 (%llx)\n", newdisp);
pr_err("\tSrc: %p, Dest: %p, old disp: %x\n", src, dest, insn.displacement.value);
return 0;
}
disp = (u8 *) dest + insn_offset_displacement(&insn);
*(s32 *) disp = (s32) newdisp;
}
#endif
http://elixir.free-electrons.com/linux/v3.10/ident/__copy_instruction
A new displacement value is calculated based new instruction address (where orig insn is copied). If that value doesn't fit in 32 bit, it returns 0 which results in EINVAL error. Hence the failure.
As a workaround, we can set kprobe handler post previous instruction or pre next instruction based on need (works for me).

Compare user-inputted string/character to another string/character

So I'm a bit of a beginner to ARM Assembly (assembly in general, too). Right now I'm writing a program and one of the biggest parts of it is that the user will need to type in a letter, and then I will compare that letter to some other pre-inputted letter to see if the user typed the same thing.
For instance, in my code I have
.balign 4 /* Forces the next data declaration to be on a 4 byte segment */
dime: .asciz "D\n"
at the top of the file and
addr_dime : .word dime
at the bottom of the file.
Also, based on what I've been reading online I put
.balign 4
inputChoice: .asciz "%d"
at the top of the file, and put
inputVal : .word 0
at the bottom of the file.
Near the middle of the file (just trust me that there is something wrong with this standalone code, and the rest of the file doesn't matter in this context) I have this block of code:
ldr r3, addr_dime
ldr r2, addr_inputChoice
cmp r2, r3 /*See if the user entered D*/
addeq r5, r5, #10 /*add 10 to the total if so*/
Which I THINK should load "D" into r3, load whatever String or character the user inputted into r2, and then add 10 to r5 if they are the same.
For some reason this doesn't work, and the r5, r5, #10 code only works if addne comes before it.
addr_dime : .word dime is poitlessly over-complicated. The address is already a link-time constant. Storing the address in memory (at another location which has its own address) doesn't help you at all, it just adds another layer of indirection. (Which is actually the source of your problem.)
Anyway, cmp doesn't dereference its register operands, so you're comparing pointers. If you single-step with a debugger, you'll see that the values in registers are pointers.
To load the single byte at dime, zero-extended into r3, do
ldrb r3, dime
Using ldr to do a 32-bit load would also get the \n byte, and a 32-bit comparison would have to match that too for eq to be true.
But this can only work if dime is close enough for a PC-relative addressing mode to fit; like most RISC machines, ARM can't use arbitrary absolute addresses because the instruction-width is fixed.
For the constant, the easiest way to avoid that is not to store it in memory in the first place. Use .equ dime, 'D' to define a numeric constant, then you can use
cmp r2, dime # compare with immediate operand
Or ldr r3, =dime to ask the assembler to get the constant into a register for you. You can do this with addresses, so you could do
ldr r2, =inputVal # r2 = &inputVal
ldrb r2, [r2] # load first byte of inputVal
This is the generic way to handle loading from static data that might be too far away for a PC-relative addressing mode.
You could avoid that by using a stack address (sub sp, #16 / mov r5, sp or something). Then you already have the address in a register.
This is exactly what a C compiler does:
char dime[4] = "D\n";
char input[4] = "xyz";
int foo(int start) {
if (dime[0] == input[0])
start += 10;
return start;
}
From ARM32 gcc6.3 on the Godbolt compiler explorer:
foo:
ldr r3, .L4 # load a pointer to the data section at dime / input
ldrb r2, [r3]
ldrb r3, [r3, #4]
cmp r2, r3
addeq r0, r0, #10
bx lr
.L4:
# gcc greated this "literal pool" next to the code
# holding a pointer it can use to access the data section,
# wherever the linker ends up putting it.
.word .LANCHOR0
.section .data
.p2align 2
### These are in a different section, near each other.
### On Godbolt, click the .text button to see full assembler directives.
.LANCHOR0: # actually defined with a .set directive, but same difference.
dime:
.ascii "D\012\000"
input:
.ascii "xyz\000"
Try changing the C to compare with a literal character instead of a global the compiler can't optimize into a constant, and see what you get.

VC++ SSE code generation - is this a compiler bug?

A very particular code sequence in VC++ generated the following instruction (for Win32):
unpcklpd xmm0,xmmword ptr [ebp-40h]
2 questions arise:
(1) As far as I understand the intel manual, unpcklpd accepts as 2nd argument a 128-aligned memory address. If the address is relative to a stack frame alignment cannot be forced. Is this really a compiler bug?
(2) Exceptions are thrown from at the execution of this instruction only when run from the debugger, and even then not always. Even attaching to the process and executing this code does not throw. How can this be??
The particular exception thrown is access violation at 0xFFFFFFFF, but AFAIK that's just a code for misalignment.
[Edit:]
Here's some source that demonstrates the bad code generation - but typically doesn't cause a crash. (that's mostly what I'm wondering about)
[Edit 2:]
The code sample now reproduces the actual crash. This one also crashes outside the debugger - I suspect the difference occurs because the debugger launches the program at different typical base addresses.
// mock.cpp
#include <stdio.h>
struct mockVect2d
{
double x, y;
mockVect2d() {}
mockVect2d(double a, double b) : x(a), y(b) {}
mockVect2d operator + (const mockVect2d& u) {
return mockVect2d(x + u.x, y + u.y);
}
};
struct MockPoly
{
MockPoly() {}
mockVect2d* m_Vrts;
double m_Area;
int m_Convex;
bool m_ParClear;
void ClearPar() { m_Area = -1.; m_Convex = 0; m_ParClear = true; }
MockPoly(int len) { m_Vrts = new mockVect2d[len]; }
mockVect2d& Vrt(int i) {
if (!m_ParClear) ClearPar();
return m_Vrts[i];
}
const mockVect2d& GetCenter() { return m_Vrts[0]; }
};
struct MockItem
{
MockItem() : Contour(1) {}
MockPoly Contour;
};
struct Mock
{
Mock() {}
MockItem m_item;
virtual int GetCount() { return 2; }
virtual mockVect2d GetCenter() { return mockVect2d(1.0, 2.0); }
virtual MockItem GetItem(int i) { return m_item; }
};
void testInner(int a)
{
int c = 8;
printf("%d", c);
Mock* pMock = new Mock;
int Flag = true;
int nlr = pMock->GetCount();
if (nlr == 0)
return;
int flr = 1;
if (flr == nlr)
return;
if (Flag)
{
if (flr < nlr && flr>0) {
int c = 8;
printf("%d", c);
MockPoly pol(2);
mockVect2d ctr = pMock->GetItem(0).Contour.GetCenter();
// The mess happens here:
// ; 74 : pol.Vrt(1) = ctr + mockVect2d(0., 1.0);
//
// call ? Vrt#MockPoly##QAEAAUmockVect2d##H#Z; MockPoly::Vrt
// movdqa xmm0, XMMWORD PTR $T4[ebp]
// unpcklpd xmm0, QWORD PTR tv190[ebp] **** crash!
// movdqu XMMWORD PTR[eax], xmm0
pol.Vrt(0) = ctr + mockVect2d(1.0, 0.);
pol.Vrt(1) = ctr + mockVect2d(0., 1.0);
}
}
}
void main()
{
testInner(2);
return;
}
If you prefer, download a ready vcxproj with all the switches set from here. This includes the complete ASM too.
Update: this is now a confirmed VC++ compiler bug, hopefully to be resolved in VS2015 RTM.
Edit: The connect report, like many others, is now garbage. However the compiler bug seems to be resolved in VS2017 - not in 2015 update 3.
Since no one else has stepped up, I'm going to take a shot.
1) If the address is relative to a stack frame alignment cannot be forced. Is this really a compiler bug?
I'm not sure it is true that you cannot force alignment for stack variables. Consider this code:
struct foo
{
char a;
int b;
unsigned long long c;
};
int wmain(int argc, wchar_t* argv[])
{
foo moo;
moo.a = 1;
moo.b = 2;
moo.c = 3;
}
Looking at the startup code for main, we see:
00E31AB0 push ebp
00E31AB1 mov ebp,esp
00E31AB3 sub esp,0DCh
00E31AB9 push ebx
00E31ABA push esi
00E31ABB push edi
00E31ABC lea edi,[ebp-0DCh]
00E31AC2 mov ecx,37h
00E31AC7 mov eax,0CCCCCCCCh
00E31ACC rep stos dword ptr es:[edi]
00E31ACE mov eax,dword ptr [___security_cookie (0E440CCh)]
00E31AD3 xor eax,ebp
00E31AD5 mov dword ptr [ebp-4],eax
Adding __declspec(align(16)) to moo gives
01291AB0 push ebx
01291AB1 mov ebx,esp
01291AB3 sub esp,8
01291AB6 and esp,0FFFFFFF0h <------------------------
01291AB9 add esp,4
01291ABC push ebp
01291ABD mov ebp,dword ptr [ebx+4]
01291AC0 mov dword ptr [esp+4],ebp
01291AC4 mov ebp,esp
01291AC6 sub esp,0E8h
01291ACC push esi
01291ACD push edi
01291ACE lea edi,[ebp-0E8h]
01291AD4 mov ecx,3Ah
01291AD9 mov eax,0CCCCCCCCh
01291ADE rep stos dword ptr es:[edi]
01291AE0 mov eax,dword ptr [___security_cookie (12A40CCh)]
01291AE5 xor eax,ebp
01291AE7 mov dword ptr [ebp-4],eax
Apparently the compiler (VS2010 compiled debug for Win32), recognizing that we will need specific alignments for the code, takes steps to ensure it can provide that.
2) Exceptions are thrown from at the execution of this instruction only when run from the debugger, and even then not always. Even attaching to the process and executing this code does not throw. How can this be??
So, a couple of thoughts:
"and even then not always" - Not standing over your shoulder when you run this, I can't say for certain. However it seems plausible that just by random chance, stacks could get created with the alignment you need. By default, x86 uses 4byte stack alignment. If you need 16 byte alignment, you've got a 1 in 4 shot.
As for the rest (from https://msdn.microsoft.com/en-us/library/aa290049%28v=vs.71%29.aspx#ia64alignment_topic4):
On the x86 architecture, the operating system does not make the alignment fault visible to the application. ...you will also suffer performance degradation on the alignment fault, but it will be significantly less severe than on the Itanium, because the hardware will make the multiple accesses of memory to retrieve the unaligned data.
TLDR: Using __declspec(align(16)) should give you the alignment you want, even for stack variables. For unaligned accesses, the OS will catch the exception and handle it for you (at a cost of performance).
Edit1: Responding to the first 2 comments below:
Based on MS's docs, you are correct about the alignment of stack parameters, but they propose a solution as well:
You cannot specify alignment for function parameters. When data that
has an alignment attribute is passed by value on the stack, its
alignment is controlled by the calling convention. If data alignment
is important in the called function, copy the parameter into correctly
aligned memory before use.
Neither your sample on Microsoft connect nor the code about produce the same code for me (I'm only on vs2010), so I can't test this. But given this code from your sample:
struct mockVect2d
{
double x, y;
mockVect2d(double a, double b) : x(a), y(b) {}
It would seem that aligning either mockVect2d or the 2 doubles might help.

ARM assembly addition

I'm trying to get modulo of addition of two numbers with ARM 32-bit processor. Well, I'm trying to make it with three unsigned long 128 bit numbers but I cant succeed. Can anyone give me an idea or basic example of it?
mov r1, #11
mov r2, #13
mov r3, #15
add r1, r1,r2
subge r1, r1, r3
ldr lr, address_of_return2
ldr lr, [lr]
bx lr
You need cmp r1,r3 between add and subge. First add, than test if greater than modulo, finally substract if greater or equal (assuming both input numbers are less than modulo).
PS: Or cmp r3,r1.... not sure by the order right now.

Resources