x64 pop instructions (opcode + rd) - 64-bit

Here are the pop instructions that use the shortcut opcodes on page 1159 of the Intel x64 manual:
58+ rw POP r16 Pop top of stack into r16; increment stack pointer.
58+ rd POP r64 Pop top of stack into r64; increment stack pointer.
Do these instructions use REX.R or REX.B to encode registers r8-r15, or is the register number just added to the opcode? Also, does the 64-bit version use REX.W? I've just never run into these register shortcut instructions before.

Instructions that encode a register operand as part of the opcode use the REX.B field to reach registers r8 through r15; the low three bits of the register number are still added to the opcode.
64-bit pushes and pops do not need REX.W: they are 64-bit by default, and there is no way to make them 32-bit. They can be made 16-bit by using the 66h prefix.
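As a rough illustration of the answer above, here is a minimal C sketch that builds the byte sequence for POP with a register in the opcode. The helper name encode_pop and its signature are made up for this example; the encoding it produces (optional 66h, optional REX with only the B bit set, then 58h plus the low three register bits) follows the rules described in the answer.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, for illustration only.
 * Encode POP for register number reg (0 = rax/ax ... 15 = r15/r15w).
 * If is16 is nonzero, emit the 16-bit form (operand-size prefix 66h).
 * Writes at most 3 bytes to out and returns the number of bytes written. */
size_t encode_pop(unsigned reg, int is16, uint8_t out[3])
{
    size_t n = 0;
    if (is16)
        out[n++] = 0x66;                    /* operand-size override, placed before REX */
    if (reg >= 8)
        out[n++] = 0x41;                    /* REX prefix with only the B bit set */
    out[n++] = (uint8_t)(0x58 + (reg & 7)); /* low 3 bits are added to the opcode */
    return n;
}
```

With this sketch, pop rcx comes out as 59, pop r9 as 41 59, and pop r9w as 66 41 59. No REX.W appears anywhere, since the 64-bit operand size is the default.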

Related

Understanding ASM. Why does this work in Windows?

A couple of friends and I are puzzling over a very strange issue. We encountered a crash in our application inside a small assembler portion (used to speed up the process). The error was caused by modifying the stack pointer and not resetting it at the end. The code looked like this:
push ebp
mov ebp, esp
; do stuff here including sub and add on esp
pop ebp
Written correctly, it should be:
push ebp
mov ebp, esp
; do stuff here including sub and add on esp
mov esp,ebp
pop ebp
What puzzles us is: why does this work on Windows? We found the error when we ported the application to Linux, where we encountered the crash. On neither Windows nor Android (using the NDK) did we encounter any issues, and we would never have found this error there. Is there any stack pointer recovery? Is there a protection against misusing the stack pointer?
The ebp/esp usage is called a stack frame. Its purpose is to allocate variables on the stack and afterwards have a quick way to restore the stack before the ret instruction. All recent x86 CPUs can compress these instructions by using the enter / leave instructions instead.
esp is the actual stack pointer used by the CPU when doing push/pop/call/ret.
ebp is a user-manipulated base pointer; more or less all compilers use it as the base for addressing local storage.
If the mov esp, ebp instruction is missing, the stack will misbehave if esp != ebp when the CPU reaches pop ebp, but only then.
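The last point can be made concrete with a toy model. The following C sketch is an assumption-laden simulation, not real machine behaviour: the stack is an array of 32-bit slots, esp is a slot index rather than a byte address, and the names Cpu and run_frame are invented for the example. It shows that pop ebp only returns the caller's ebp if esp equals ebp at that point.

```c
#include <stdint.h>

/* Toy model of a 32-bit stack: esp indexes into an array of 32-bit slots. */
typedef struct {
    uint32_t mem[16];
    uint32_t esp;   /* index of the top slot; the stack grows downwards */
    uint32_t ebp;
} Cpu;

static void push(Cpu *c, uint32_t v) { c->mem[--c->esp] = v; }
static uint32_t pop(Cpu *c)          { return c->mem[c->esp++]; }

/* Runs the prologue, reserves `locals` slots (at most 15) and runs the
 * epilogue. restore_esp selects whether "mov esp, ebp" is executed before
 * "pop ebp". Returns the value that ends up in ebp. */
uint32_t run_frame(uint32_t caller_ebp, uint32_t locals, int restore_esp)
{
    Cpu c = { {0}, 16, caller_ebp };
    push(&c, c.ebp);                    /* push ebp */
    c.ebp = c.esp;                      /* mov ebp, esp */
    c.esp -= locals;                    /* sub esp, N (never undone by an add) */
    for (uint32_t i = 0; i < locals; i++)
        c.mem[c.esp + i] = 0xDEADBEEF;  /* locals hold unrelated data */
    if (restore_esp)
        c.esp = c.ebp;                  /* mov esp, ebp */
    c.ebp = pop(&c);                    /* pop ebp */
    return c.ebp;
}
```

In this model, run_frame(0x1234, 2, 0) hands back a local's contents instead of the saved ebp, while run_frame(0x1234, 0, 0) happens to work because esp was never moved. That matches the "but only then" caveat above.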
It seems the compiler takes care of your stack on Windows.
The only way I can imagine is:
Microsoft Visual C takes special care of functions that are __stdcall. Since the number of parameters is known at compile time, the compiler encodes the parameter byte count in the symbol name itself.
The __stdcall convention is mainly used by the Windows API, and it's a bit more compact than __cdecl. The main difference is that any given function has a hard-coded set of parameters, which cannot vary from call to call as it can in C (no "variadic functions").
see:
http://unixwiz.net/techtips/win32-callconv-asm.html
and:
https://en.wikipedia.org/wiki/X86_calling_conventions

Switch from 32bit mode to 64 bit (long mode) on 64bit linux

My program runs in 32-bit mode on an x86_64 CPU (64-bit OS, Ubuntu 8.04). Is it possible to switch to 64-bit mode (long mode) in user mode temporarily? If so, how?
Background story: I'm writing a library that is linked with a 32-bit program, so it must be in 32-bit mode at start. However, I'd like to use the faster x86_64 instructions for better performance. So I want to switch to 64-bit mode, do some pure computation (no OS interaction; no need for 64-bit addressing) and come back to 32-bit mode before returning to the caller.
I found there are some related but different questions. For example,
run 32 bit code in 64 bit program
run 64 bit code in 32 bit OS
My question is "run 64 bit code in 32 bit program, 64 bit OS"
Contrary to the other answers, I assert that in principle the short answer is YES. This is likely not supported officially in any way, but it appears to work. At the end of this answer I present a demo.
On Linux x86_64, a 32-bit process (and an X32 one too, according to the GDB sources) gets a CS register equal to 0x23, a selector for the 32-bit ring 3 code segment defined in the GDT (its base is 0). 64-bit processes get another selector, 0x33, which selects the long mode (i.e. 64-bit) ring 3 code segment (the bases for ES, CS, SS and DS are unconditionally treated as zero in 64-bit mode). Thus, if we do a far jump, far call or something similar with a target segment selector of 0x33, we load the corresponding descriptor into the shadow part of CS and end up in a 64-bit segment.
The demo at the bottom of this answer uses a jmp far instruction to jump to 64-bit code. Note that I've chosen a special constant to load into rax, so that when decoded as 32-bit code that instruction looks like
dec eax
mov eax, 0xfafafafa
ud2
cli ; these two are unnecessary, but leaving them here for fun :)
hlt
This must fail if we execute it with a 32-bit descriptor in the shadow part of CS (it will raise SIGILL on the ud2 instruction).
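To see why the constant decodes that way, here is a small C sketch that lays out the bytes of mov rax, imm64 (REX.W prefix 48h, opcode B8h, then the immediate in little-endian order). The function name encode_mov_rax_imm64 is invented for this illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative helper: encode "mov rax, imm64" into out[10].
 * Returns the number of bytes written (always 10). */
size_t encode_mov_rax_imm64(uint64_t imm, uint8_t out[10])
{
    out[0] = 0x48;  /* REX.W in 64-bit mode; decodes as "dec eax" in 32-bit mode */
    out[1] = 0xB8;  /* mov rax, imm64; "mov eax, imm32" in 32-bit mode */
    for (int i = 0; i < 8; i++)
        out[2 + i] = (uint8_t)(imm >> (8 * i));  /* little-endian immediate */
    return 10;
}
```

For 0xf4fa0b0ffafafafa the bytes are 48 B8 FA FA FA FA 0F 0B FA F4. A 32-bit decoder consumes 48 as dec eax, B8 FA FA FA FA as mov eax, 0xfafafafa, and then sees 0F 0B (ud2), FA (cli) and F4 (hlt).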
Now here's the demo (compile it with fasm).
format ELF executable
segment readable executable
SYS_EXIT_32BIT=1
SYS_EXIT_64BIT=60
SYS_WRITE=4
STDERR=2
entry $
    mov ax,cs
    cmp ax,0x23 ; 32 bit process on 64 bit kernel has this selector in CS
    jne kernelIs32Bit
    jmp 0x33:start64 ; switch to 64-bit segment
start64:
use64
    mov rax, qword 0xf4fa0b0ffafafafa ; would crash inside this if executed as 32 bit code
    xor rdi,rdi
    mov eax, SYS_EXIT_64BIT
    syscall
    ud2
use32
kernelIs32Bit:
    mov edx, msgLen
    mov ecx, msg
    mov ebx, STDERR
    mov eax, SYS_WRITE
    int 0x80
    dec ebx
    mov eax, SYS_EXIT_32BIT
    int 0x80
msg:
    db "Kernel appears to be 32 bit, can't jump to long mode segment",10
msgLen = $-msg
The answer is NO. Just because you are running 64-bit code (presumably 64-bit data types, e.g. variables) does not mean you are running in 64-bit mode on a 32-bit box. Compilers have workarounds to provide 64-bit data types on 32-bit machines. For example, gcc has unsigned long long and uint64_t, which are 8-byte data types on both x86 and x86_64 machines. Data types are portable between x86 and x86_64 for that reason. That does not mean you get a 64-bit address space on a 32-bit box. It means the compiler can handle 64-bit data types. You will run into instances where you cannot run some 64-bit code on 32-bit boxes. In that case, you will need preprocessor directives to compile the correct 64-bit code on x86_64 and the correct 32-bit code on x86. A simple example is where different data types are explicitly required. In that case you can provide a preprocessor check to determine whether the code is being compiled for a 64-bit or a 32-bit target with:
#if defined(__LP64__) || defined(_LP64)
# define BUILD_64 1
#endif
You can then provide conditionals to compile the correct code with the following:
#ifdef BUILD_64
printf (" x : %ld, hex: %lx,\nfmtbinstr_64 (d, 4, \"-\"): %s\n",
d, d, fmtbinstr_64 (d, 4, "-"));
#else
printf (" x : %lld, hex: %llx,\nfmtbinstr_64 (d, 4, \"-\"): %s\n",
d, d, fmtbinstr_64 (d, 4, "-"));
#endif
Hopefully this provides a starting point for you to work with. If you have more specific questions, please post more details.
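For the printf case specifically, a different technique from the #ifdef shown above is the set of format macros in <inttypes.h>, which expand to the right length modifier for each target. The sketch below assumes the value is a uint64_t; the function name format_u64 and the shortened format string are made up for the example, and the fmtbinstr_64 helper from the answer is left out because its definition is not shown.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Illustrative only: formats a 64-bit value as decimal and hex without any
 * #ifdef. PRIu64 and PRIx64 expand to the correct conversion for the target.
 * Returns what snprintf returns (the length of the formatted text). */
int format_u64(char *buf, size_t size, uint64_t d)
{
    return snprintf(buf, size, "x : %" PRIu64 ", hex: %" PRIx64, d, d);
}
```

The same source then compiles unchanged for both x86 and x86_64 targets.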

Linux 32 bit disassembly has call instructions to next byte

I'm creating a driver for 32 and 64 bit Linux OS. One of the requirements is that all of the code needs to be self contained with no call outs. On 64-bit I've no issues, but on 32-bit GCC seems to add a call instruction to the next byte. After searching a bit I found this link:
http://forum.soft32.com/linux/Strange-problem-disassembling-shared-lib-ftopict439936.html
Is there a way to disable this on 32-bit Linux?
Example:
32 bit disassembly:
<testfunc>:
0: push %ebp
1: mov %esp, %ebp
3: call 4 <test_func+0x4>
<...some operation on ebx as mentioned in the link above>
64 bit disassembly:
<testfunc>:
0: push %rbp
1: mov %rsp, %rbp
3: <...no call here>
There is no call in "testfunc" at all. So why is the 32-bit compiler adding these "call" instructions? Any help is appreciated.
What you're seeing in the 32-bit disassembly may be a way to make the code position-independent. Remember that call pushes the return address onto the stack, and that address equals eip plus a constant. In 64-bit mode there is rip-relative addressing; in 32-bit mode there isn't. So this call may simulate instruction-pointer-relative addressing.
This call instruction to the next byte comes from function profiling for the "gprof" tool. I was able to get rid of these "call" instructions by removing the "-pg" option from the compilation.
Since it was a driver, this was being picked up from Linux kernel config - CONFIG_FUNCTION_TRACER.
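Why the disassembler prints a target one byte past the call can be worked out with a little arithmetic. The sketch below rests on an assumption the thread does not state: that the listing comes from an object file that has not been linked yet, where the rel32 field of a call to an external symbol (such as the profiling hook) still holds the placeholder -4 until relocation. The function name call_target is invented for the example.

```c
#include <stdint.h>

/* Target of a near "call rel32" (opcode E8) located at address `at`.
 * The displacement is relative to the end of the 5-byte instruction. */
uint32_t call_target(uint32_t at, int32_t rel32)
{
    return at + 5u + (uint32_t)rel32;
}
```

Under that assumption, the call at offset 3 with a placeholder displacement of -4 resolves to 3 + 5 - 4 = 4, which is how a line like "call 4" can appear in the 32-bit listing even though the real destination is only filled in at link or load time.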

64-bit windows VMware detection

I am trying to develop an application which detects whether a program is running inside a virtual machine.
For 32-bit Windows, there are already methods explained in the following link:
http://www.codeproject.com/Articles/9823/Detect-if-your-program-is-running-inside-a-Virtual
I am trying to adapt the code for Virtual PC and VMware detection to a 64-bit Windows operating system. For VMware, the code detects the VM successfully in a Windows XP 64-bit guest. But the program crashes when I run it on a native system (Windows 7 64-bit OS).
I put the code in an .asm file and defined a custom build step with ml64.exe. The asm code for 64-bit Windows is:
IsInsideVM proc
    push rdx
    push rcx
    push rbx
    mov rax, 'VMXh'
    mov rbx, 0 ; any value but not the MAGIC VALUE
    mov rcx, 10 ; get VMWare version
    mov rdx, 'VX' ; port number
    in rax, dx ; read port
    ; on return EAX returns the VERSION
    cmp rbx, 'VMXh' ; is it a reply from VMWare?
    setz al ; set return value
    movzx rax,al
    pop rbx
    pop rcx
    pop rdx
    ret
IsInsideVM endp
I call this part in a cpp file like:
__try
{
returnValue = IsInsideVM();
}
__except(1)
{
returnValue = false;
}
Thanks in advance.
The old Red Pill from Joanna Rutkowska may work. Quoting a backup copy of the invisiblethings.org blog:
Swallowing the Red Pill is more or less equivalent to the following code (returns non zero when in Matrix):
int swallow_redpill () {
unsigned char m[2+4], rpill[] = "\x0f\x01\x0d\x00\x00\x00\x00\xc3";
*((unsigned*)&rpill[3]) = (unsigned)m;
((void(*)())&rpill)();
return (m[5]>0xd0) ? 1 : 0;
}
The heart of this code is actually the SIDT instruction (encoded as 0F010D[addr]), which stores the contents of the interrupt descriptor table register (IDTR) in the destination operand, which is actually a memory location. What is special and interesting about SIDT instruction is that, it can be executed in non privileged mode (ring3) but it returns the contents of the sensitive register, used internally by operating system.
Because there is only one IDTR register, but there are at least two OS running concurrently (i.e. the host and the guest OS), VMM needs to relocate the guest's IDTR in a safe place, so that it will not conflict with a host's one. Unfortunately, VMM cannot know if (and when) the process running in guest OS executes SIDT instruction, since it is not privileged (and it doesn't generate exception). Thus the process gets the relocated address of IDT table. It was observed that on VMWare, the relocated address of IDT is at address 0xffXXXXXX, whereas on Virtual PC it is 0xe8XXXXXX. This was tested on VMWare Workstation 4 and Virtual PC 2004, both running on Windows XP host OS.
Note: I haven't tested it myself, but note that it uses an unprivileged approach. If it does not work at first on x64, some tweaking may help.
Also, I just found a question whose content may help you: Detecting VMM on linux
My guess is that your function corrupts registers.
Running on real hardware (non-VM) should probably trigger an exception at "in rax, dx". If this happens, control is passed to your exception handler, which sets the result but does not restore the registers. This behaviour will be completely unexpected by the caller. For example, the caller can save something in the EBX/RBX register and then call your asm code; your asm code executes "mov RBX, 0", raises the exception, the handler sets the result and returns, and then the caller suddenly finds that its saved data isn't in EBX/RBX anymore! If there was a pointer stored in EBX/RBX, you're going to crash hard. Anything can happen.
Admittedly, your asm code saves and restores registers, but this happens only when no exception is raised, i.e. when your code is running in a VM. Then your code follows its normal execution path, no exceptions are raised, and the registers are restored normally. But if there is an exception, your POPs will be skipped, because execution is passed to the exception handler.
The correct code should probably do the PUSH/POPs outside of the try/except block, not inside.
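The decision step of the quoted swallow_redpill can be looked at separately from the SIDT call itself. The C sketch below is illustrative only: the function names idt_base and looks_like_vm are invented, and the sample buffers in the usage note are hypothetical. It interprets the 6-byte image that SIDT stores in 32-bit mode (2-byte limit followed by a 4-byte base) and applies the same "top byte above 0xd0" test.

```c
#include <stdint.h>

/* m is the 6-byte image SIDT stores in 32-bit mode:
 * bytes 0-1 = IDT limit, bytes 2-5 = IDT base (little-endian). */
uint32_t idt_base(const uint8_t m[6])
{
    return (uint32_t)m[2] | (uint32_t)m[3] << 8 |
           (uint32_t)m[4] << 16 | (uint32_t)m[5] << 24;
}

/* Same test as the quoted code: the top byte of the base is above 0xd0. */
int looks_like_vm(const uint8_t m[6])
{
    return m[5] > 0xd0;
}
```

A base of the form 0xffXXXXXX or 0xe8XXXXXX passes the test, while a hypothetical base such as 0x8003f400 does not. In 64-bit mode SIDT stores a 10-byte image (2-byte limit plus 8-byte base), so both the buffer size and the byte being inspected would have to change; that is one of the tweaks an x64 port would need.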

Help with %es register for x86 to x86_64 assembly code port

Ok, I get this compile error:
Error: suffix or operands invalid for `push'
when I use this line:
pushw %es;
I know it is either the %es or the w, as I have been successfully porting other push and pop commands to the 64-bit assembler.
%es is an existing register according to some documentation I have found, and I don't think it is referenced differently.
So what could be my problem? I am extremely rusty on my asm and I think it could be the w.
Thanks for any help.
As Zimbaboa already explained, there is no segmentation in 64-bit mode.
Moreover, if you look at Intel's manuals, Instruction Set Reference, M-Z, you will see that push ES is an invalid instruction altogether in 64-bit mode (page 423):
Opcode  Instruction  Op/En  64-Bit Mode  Compat/Leg Mode  Description
...
0E      PUSH CS      NP     Invalid      Valid            Push CS.
16      PUSH SS      NP     Invalid      Valid            Push SS.
1E      PUSH DS      NP     Invalid      Valid            Push DS.
06      PUSH ES      NP     Invalid      Valid            Push ES.
0F A0   PUSH FS      NP     Valid        Valid            Push FS.
0F A8   PUSH GS      NP     Valid        Valid            Push GS.
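When porting many such lines, the table can be turned into a small lookup. The C sketch below simply restates the rows quoted above; the function name push_seg_valid_in_64bit is invented for the example.

```c
#include <string.h>

/* Returns 1 if PUSH of the named segment register is valid in 64-bit mode,
 * 0 if it is invalid, and -1 for an unknown name. Data taken from the
 * table quoted in the answer. */
int push_seg_valid_in_64bit(const char *seg)
{
    static const struct { const char *name; int valid64; } table[] = {
        {"cs", 0}, {"ss", 0}, {"ds", 0}, {"es", 0}, {"fs", 1}, {"gs", 1},
    };
    for (size_t i = 0; i < sizeof table / sizeof table[0]; i++)
        if (strcmp(seg, table[i].name) == 0)
            return table[i].valid64;
    return -1;
}
```

So a push of %es has no valid 64-bit encoding whatever suffix is used, while pushes of %fs and %gs remain available.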
Is this the Pentium instruction set? If so, then yes, I think ES (capitalized) is a 16-bit segment register. The instruction is just "push %ES" according to this site: http://faydoc.tripod.com/cpu/index.htm.
Wish I could help more, but I only code in MIPS assembly.
You are using the instruction PUSHW, which pushes a word onto the stack. On 64-bit machines the word size is 64, and you are trying to push the 16-bit ES register using the wrong instruction.
Try just using push, but take care that your pop matches.
Edit1: I checked the processor documentation; segmentation is disabled in the 64-bit mode of x86_64.
Check section 4 of the above document:
In 64-bit mode, segmentation is disabled, creating a flat 64-bit virtual-address space. As will be seen, certain functions of some segment registers, particularly the system-segment registers, continue to be used in 64-bit mode.
And again in section 4.5.3:
DS, ES, and SS Registers in 64-Bit Mode. In 64-bit mode, the contents of the ES, DS, and SS segment registers are ignored. All fields (base, limit, and attribute) in the hidden portion of the segment registers are ignored.
So in your code you can safely ignore any references to these registers.

Resources