The following code snippet is taken from Linux v2.6.11. Something similar is present in v3.8 as well.
mrs r13, cpsr
bic r13, r13, #MODE_MASK
orr r13, r13, #MODE_SVC
msr spsr_cxsf, r13 @ switch to SVC_32 mode
and lr, lr, #15
ldr lr, [pc, lr, lsl #2]
movs pc, lr @ Changes mode and branches
Check out the following link for the actual file:
http://lxr.linux.no/linux+v2.6.11/arch/arm/kernel/entry-armv.S
I think writing to the mode bits of the CPSR changes the current ARM mode. But how does writing to the SPSR (instead of the CPSR) result in a switch to SVC_32 mode?
Or is something happening in the last instruction, "movs pc, lr"? Could someone help me understand this?
A mov or sub instruction with the 'S' suffix and the program counter as its destination register means an exception return.
It copies the contents of the current mode's SPSR to the CPSR and moves the value of the source register into the program counter (in this case, the link register).
In your example, this sets the mode to SVC and branches to the address in lr in a single instruction.
There's more information on this in the ARM reference manual.
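To make the two steps concrete, here is a minimal C model of the quoted sequence. It is a sketch under simplifying assumptions: the struct and function names are made up for illustration, there is a single SPSR and no register banking, and the table lookup that loads lr is reduced to a plain assignment.

```c
#include <stdint.h>

#define MODE_MASK 0x1fu  /* CPSR[4:0] hold the processor mode */
#define MODE_IRQ  0x12u
#define MODE_SVC  0x13u

/* Hypothetical, heavily simplified CPU state: one SPSR, no banking. */
struct cpu {
    uint32_t cpsr;  /* current program status register */
    uint32_t spsr;  /* saved program status register of the current mode */
    uint32_t lr;
    uint32_t pc;
};

/* msr spsr_cxsf, rN: only the saved copy changes, the mode does not. */
static void write_spsr(struct cpu *c, uint32_t value)
{
    c->spsr = value;
}

/* movs pc, lr: exception return, CPSR <- SPSR and PC <- LR together. */
static void movs_pc_lr(struct cpu *c)
{
    c->cpsr = c->spsr;
    c->pc = c->lr;
}

/* Replays the quoted sequence; handler stands for the address loaded into lr. */
static void switch_to_svc(struct cpu *c, uint32_t handler)
{
    uint32_t r13 = c->cpsr;   /* mrs r13, cpsr */
    r13 &= ~MODE_MASK;        /* bic r13, r13, #MODE_MASK */
    r13 |= MODE_SVC;          /* orr r13, r13, #MODE_SVC */
    write_spsr(c, r13);       /* msr spsr_cxsf, r13 */
    c->lr = handler;          /* ldr lr, [pc, lr, lsl #2] (simplified) */
    movs_pc_lr(c);            /* movs pc, lr */
}
```

In this model, calling write_spsr alone leaves the mode field of cpsr untouched; the mode only becomes SVC once movs_pc_lr runs.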
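I am answering the SPSR vs. CPSR question here.
The CPSR is the processor's single current status register and is readable in every mode. The SPSR is banked: each exception mode (FIQ, IRQ, Supervisor, Abort, Undefined) has its own, while User and System modes have none. On exception entry the CPU copies the CPSR into the SPSR of the mode being entered, and an exception return copies that SPSR back into the CPSR. Writing the SPSR therefore changes nothing immediately; it only decides what the CPSR will become on the next exception return. In privileged modes the CPSR mode bits can also be written directly, whereas in User mode writes to the CPSR control bits are ignored.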
CPSR - Current Program Status Register
SPSR - Saved Program Status Register
I'm reading linux/arch/arm/boot/compressed/head.S
and came across the Angel boot code. It's the first time I've seen this term.
#ifndef CONFIG_CPU_V7M
/*
* Booting from Angel - need to enter SVC mode and disable
* FIQs/IRQs (numeric definitions from angel arm.h source).
* We only do this if we were in user mode on entry.
*/
mrs r2, cpsr @ get current mode
tst r2, #3 @ not user?
bne not_angel
mov r0, #0x17 @ angel_SWIreason_EnterSVC
ARM( swi 0x123456 ) @ angel_SWI_ARM
THUMB( svc 0xab ) @ angel_SWI_THUMB
not_angel:
safe_svcmode_maskall r0
msr spsr_cxsf, r9 @ Save the CPU boot mode in
@ SPSR
#endif
So I googled it and read the Linux documentation in linux/Documentation/arm/Booting.
I couldn't find a clear definition of Angel booting on any website, and the Linux documentation only mentions Angel as follows:
For CPUs which do not include the ARM virtualization extensions, the
CPU must be in SVC mode. (A special exception exists for Angel)
So I'd like a clear definition of Angel boot.
Thank you for your answer
According to the ARM Information Center, "Angel is a debug monitor that allows rapid development and debugging of applications running on ARM-based hardware."
http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0066d/Babdcdih.html
It seems you can debug your software through Angel with a debugger such as gdb once your board is set up with Angel.
It offers a feature called "semihosting", which bridges input/output between the board and the host. Requests are made through an SWI, which is why the code in the question issues swi 0x123456.
http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0058d/CIHDICHH.html
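The "tst r2, #3" check in the quoted head.S works because User mode is the only mode whose encoding in CPSR[4:0] has both low bits clear. A small C sketch of that test follows; the function name is made up for illustration, and the mode values are the standard ARM encodings.

```c
#include <stdbool.h>
#include <stdint.h>

/* ARM mode encodings held in CPSR[4:0]. */
enum {
    MODE_USR = 0x10, MODE_FIQ = 0x11, MODE_IRQ = 0x12, MODE_SVC = 0x13,
    MODE_ABT = 0x17, MODE_UND = 0x1b, MODE_SYS = 0x1f
};

/*
 * Mirrors "tst r2, #3; bne not_angel": the Angel path is taken only
 * when the low two mode bits are both zero, which means User mode.
 */
static bool entered_in_user_mode(uint32_t cpsr)
{
    return (cpsr & 3u) == 0;
}
```

Only when this returns true does the boot code ask Angel, via the SWI, to switch to SVC mode.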
I'm trying to run a binary program that uses CMPXCHG16B instruction at one place, unfortunately my Athlon 64 X2 3800+ doesn't support it. Which is great, because I see it as a programming challenge. The instruction doesn't seem to be that hard to implement with a cave jump, so that's what I did, but something didn't work, program just froze in a loop. Maybe someone can tell me if I implemented my CMPXCHG16B wrong?
Firstly the actual piece of machine code that I'm trying to emulate is this:
f0 49 0f c7 08 lock cmpxchg16b OWORD PTR [r8]
Excerpt from Intel manual describing CMPXCHG16B:
Compare RDX:RAX with m128. If equal, set ZF and load RCX:RBX into m128.
Else, clear ZF and load m128 into RDX:RAX.
First I replace all 5 bytes of the instruction with a jump to code cave with my emulation procedure, luckily the jump takes up exactly 5 bytes! The jump is actually a call instruction e8, but could be a jmp e9, both work.
e8 96 fb ff ff call 0xfffffb96(-1130)
This is a relative jump with a 32-bit signed offset encoded in two's complement, the offset points to a code cave relative to address of next instruction.
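As a check on the encoding, the following C sketch decodes the rel32 operand and computes the branch target. The function names and the instruction address used in any call are assumptions for illustration; for the bytes 96 fb ff ff the decoded offset is -1130.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Decode the little-endian rel32 operand that follows an E8/E9 opcode. */
static int32_t decode_rel32(const uint8_t *insn)
{
    assert(insn != NULL && (insn[0] == 0xe8 || insn[0] == 0xe9));

    uint32_t raw = (uint32_t)insn[1]
                 | (uint32_t)insn[2] << 8
                 | (uint32_t)insn[3] << 16
                 | (uint32_t)insn[4] << 24;

    int32_t rel;
    memcpy(&rel, &raw, sizeof rel);  /* reinterpret as two's complement */
    return rel;
}

/* Target = address of the next instruction (insn_addr + 5) + rel32. */
static uint64_t branch_target(uint64_t insn_addr, const uint8_t *insn)
{
    return insn_addr + 5 + (uint64_t)(int64_t)decode_rel32(insn);
}
```

Note that with call (e8) the return address is pushed, so the cave can end in ret; with jmp (e9) the cave would have to jump back explicitly.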
Next the emulation code I'm jumping to:
PUSH R10
PUSH R11
MOV r10, QWORD PTR [r8]
MOV r11, QWORD PTR [r8+8]
TEST R10, RAX
JNE ELSE
TEST R11, RDX
JNE ELSE
MOV QWORD PTR [r8], RBX
MOV QWORD PTR [r8+8], RCX
JMP END
ELSE:
MOV RAX, r10
MOV RDX, r11
END:
POP R11
POP R10
RET
Personally, I'm happy with it, and I think it matches the functional specification given in the manual. It restores the stack and the two registers r10 and r11 to their original state and then resumes execution. Alas, it does not work! That is, the code runs, but the program acts as if it's waiting for a tip and burning electricity. This suggests my emulation was not perfect and I inadvertently broke its loop. Do you see anything wrong with it?
I notice that this is the atomic variant, owing to the lock prefix. I'm hoping I did something wrong other than ignoring contention. Or is there a way to emulate atomicity too?
It's not possible to emulate lock cmpxchg16b. It's sort of possible if all accesses to the target address are synchronised with a separate lock, but that includes all other instructions, including non-atomic stores to either half of the object, and atomic read-modify-writes (like xchg, lock cmpxchg, lock add, lock xadd) with one half (or other part) of the 16 byte object.
You can emulate cmpxchg16b (without lock) like you've done here, with the bugfixes from @Fifoernik's answer. That's an interesting learning exercise, but not very useful in practice, because real code that uses cmpxchg16b always uses it with a lock prefix.
A non-atomic replacement will work most of the time, because it's rare for a cache-line invalidate from another core to arrive in the small time window between two nearby instructions. This doesn't mean it's safe, it just means it's really hard to debug when it does occasionally fail. If you just want to get a game working for your own use, and can accept occasional lockups / errors, this might be useful. For anything where correctness is important, you're out of luck.
What about MFENCE? Seems to be what I need.
MFENCE before, after, or between the loads and stores won't prevent another thread from seeing a half-written value ("tearing"), or from modifying the data after your code has made the decision that the compare succeeded, but before it does the store. It might narrow the window of vulnerability, but it can't close it, because MFENCE only prevents reordering of the global visibility of our own stores and loads. It can't stop a store from another core from becoming visible to us after our loads but before our stores. That requires an atomic read-modify-write bus cycle, which is what locked instructions are for.
Doing two 8-byte atomic compare-exchanges would solve the window-of-vulnerability problem, but only for each half separately, leaving the "tearing" problem.
Atomic 16B loads/stores solve the tearing problem but not the atomicity problem between loads and stores. They are possible with SSE on some hardware, but are not guaranteed to be atomic by the x86 ISA the way 8B naturally-aligned loads and stores are.
Xen's lock cmpxchg16b emulation:
The Xen virtual machine has an x86 emulator, I guess for the case where a VM starts on one machine and migrates to less-capable hardware. It emulates lock cmpxchg16b by taking a global lock, because there's no other way. If there was a way to emulate it "properly", I'm sure Xen would do that.
As discussed in this mailing list thread, Xen's solution still doesn't work when the emulated version on one core is accessing the same memory as the non-emulated instruction on another core. (The native version doesn't respect the global lock).
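A global-lock emulation of this kind can be sketched in C as below. This is not Xen's code; the type, lock, and function names are made up for illustration, and a C11 atomic_flag spinlock stands in for whatever lock a real emulator would use. It carries exactly the limitation described above: it is only atomic with respect to other code that takes the same lock.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical 16-byte operand, low half first as in memory. */
struct m128 { uint64_t lo, hi; };

/* One global spinlock; the name is invented for this sketch. */
static atomic_flag emu_lock = ATOMIC_FLAG_INIT;

/*
 * Emulated cmpxchg16b: expected plays RDX:RAX, desired plays RCX:RBX.
 * Returns the resulting ZF. Atomic ONLY against code that also takes
 * emu_lock; plain stores or native locked instructions bypass it.
 */
static bool emu_cmpxchg16b(struct m128 *dest, struct m128 *expected,
                           struct m128 desired)
{
    bool zf;

    while (atomic_flag_test_and_set_explicit(&emu_lock, memory_order_acquire))
        ;  /* spin until the lock is free */

    if (dest->lo == expected->lo && dest->hi == expected->hi) {
        *dest = desired;     /* ZF = 1, DEST <- RCX:RBX */
        zf = true;
    } else {
        *expected = *dest;   /* ZF = 0, RDX:RAX <- DEST */
        zf = false;
    }

    atomic_flag_clear_explicit(&emu_lock, memory_order_release);
    return zf;
}
```

Every other access to the same 16 bytes, including plain loads and stores, would have to go through emu_lock as well for this to be safe, which is generally not achievable when patching an existing binary.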
See also this patch on the Xen mailing list that changes the lock cmpxchg8b emulation to support both lock cmpxchg8b and lock cmpxchg16b.
I also found that KVM's x86 emulator doesn't support cmpxchg16b either, according to the search results for emulate cmpxchg16b.
I think all this is good evidence that my analysis is correct, and that it's not possible to emulate it safely.
I see these things wrong with your code to emulate the cmpxchg16b instruction:
You need to use cmp instead of test to get a correct comparison.
You need to save and restore all flags except ZF. The manual says:
The CF, PF, AF, SF, and OF flags are unaffected.
The manual contains the following:
IF (64-Bit Mode and OperandSize = 64)
THEN
TEMP128 ← DEST
IF (RDX:RAX = TEMP128)
THEN
ZF ← 1;
DEST ← RCX:RBX;
ELSE
ZF ← 0;
RDX:RAX ← TEMP128;
DEST ← TEMP128;
FI;
FI
So to really write code that "matches the functional specification given in manual", a write to the m128 is required in both cases. Although this particular write is part of the locked version lock cmpxchg16b, it of course does nothing for the atomicity of the emulation! A straightforward emulation of lock cmpxchg16b is thus not possible. See @PeterCordes' answer.
The manual explains the unconditional write as follows: "This instruction can be used with a LOCK prefix to allow the instruction to be executed atomically. To simplify the interface to the processor's bus, the destination operand receives a write cycle without regard to the result of the comparison." The ELSE branch of the emulation therefore becomes:
ELSE:
MOV RAX, r10
MOV RDX, r11
MOV QWORD PTR [r8], r10
MOV QWORD PTR [r8+8], r11
END:
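To see why test cannot stand in for cmp, it helps to model what each instruction does to ZF. The C sketch below uses made-up function names; it only models the zero flag, not the other flags.

```c
#include <stdbool.h>
#include <stdint.h>

/* ZF after "test a, b": set when (a AND b) == 0. */
static bool zf_after_test(uint64_t a, uint64_t b)
{
    return (a & b) == 0;
}

/* ZF after "cmp a, b": set when a - b == 0, that is, when a == b. */
static bool zf_after_cmp(uint64_t a, uint64_t b)
{
    return a == b;
}
```

Since jne branches when ZF is clear, the test version jumps to ELSE whenever the two values share any set bit, which includes every pair of equal non-zero values. An exchange that should succeed then always reports failure, which would be consistent with the caller spinning forever in its retry loop.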
The relevant assembly code is in boot/setup.s of Linux 0.11; I've pasted it below:
mov ax,#0x0001 ! protected mode (PE) bit
lmsw ax ! This is it!
jmpi 0,8 ! jmp offset 0 of segment 8 (cs)
The first two lines set the PE bit in the CR0 control register.
So my problem is this: while lmsw ax is executing, ip already points to the next instruction, jmpi 0,8. More precisely, cs:ip points to the memory location of jmpi 0,8. But once lmsw ax has executed, protected mode is enabled. The value in cs now represents a segment selector, but no matching GDT descriptor has been prepared for it; the GDT only contains two valid entries, at indexes 1 and 2. So I think cs:ip no longer refers to jmpi 0,8 and instead points to an invalid memory address, which would mean the jmpi 0,8 instruction that loads the right values into cs and eip can never be reached. I know I must be wrong, because Linux 0.11 has been proven in practice over a long time. Please help me find the mistake in my reasoning. Thanks very much.
The CPU doesn't look up selectors in the GDT (or LDT) every time a segment register is used. It only reads the descriptor table in memory when the segment register is loaded. It then stores the information in the segment descriptor cache. The same thing happens in real mode: when a segment register is loaded with a value, that value is used to create an entry in the descriptor cache. Then whenever that segment is used, in both real and protected mode, the processor uses the values stored in the cache.
When you switch from real mode to protected mode, none of the segment registers change and none of the entries in the descriptor cache change. The cache entry for the CS register remains the same as it was before, and so the CPU executes the following instruction as expected. It's not until the following far jump instruction is executed that the value of the CS register changes, which then replaces the old real mode descriptor entry with a new protected mode entry.
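The behaviour described above can be illustrated with a toy C model of the descriptor cache. Everything here is a simplification for illustration: the struct and function names are invented, only the cached base is modelled (no limit or attributes), the GDT is reduced to an array of base addresses, and the real-mode CS value of 0x9020 in the usage note is an assumption about where setup.s runs.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model: visible selector plus the hidden cached base. */
struct seg_reg { uint16_t selector; uint32_t base; };

struct cpu {
    bool pe;            /* CR0.PE */
    struct seg_reg cs;
    uint32_t ip;
};

/* Real-mode load of CS: the cached base becomes value * 16. */
static void load_cs_real(struct cpu *c, uint16_t value)
{
    c->cs.selector = value;
    c->cs.base = (uint32_t)value << 4;
}

/* lmsw with PE set: only CR0 changes, the cached CS entry is untouched. */
static void set_pe(struct cpu *c)
{
    c->pe = true;
}

/* Protected-mode far jump: only now is the descriptor read from the GDT. */
static void far_jump_protected(struct cpu *c, uint16_t selector,
                               uint32_t offset, const uint32_t *gdt_bases)
{
    c->cs.selector = selector;
    c->cs.base = gdt_bases[selector >> 3];  /* index = selector / 8 */
    c->ip = offset;
}

/* Every instruction fetch uses the cached base, never the GDT itself. */
static uint32_t fetch_address(const struct cpu *c)
{
    return c->cs.base + c->ip;
}
```

With cs loaded as 0x9020 and ip at 0x100, fetch_address gives 0x90300 both before and after set_pe; it only changes once far_jump_protected reloads CS with selector 8.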
I have a pretty weird thing I need to do: Access some "secure" instructions for things that don't really need to be done in a secure context. In short: I need to get in to Secure Mode, but not because I want Hardware TPM-ish functionality or anything. I just need access to certain instructions that I wouldn't otherwise have.
We're doing this on Gumstix Overo FireSTORM COMs. It is my understanding these boot securely, but then somewhere (MLO? u-boot?) they switch to non-secure mode, but I could be wrong. The point is that we're certainly doing this from nonsecure (but privileged, see below) mode.
(I authored this question, about direct access to the GHB/BTB of the A8 branch predictor, if you're curious about what I'm looking to do: Write directly to the global history buffer (GHB) or BTB in the branch predictor of a ARM Cortex A8?)
Now, all of this will be done from u-boot (we've got Overo FireSTORM COMs), so luckily I have "privileged" execution. No worries there. And I've looked at other StackOverflow questions, but there doesn't seem to be anything on how, exactly, to get to secure mode. All I really wanna do is access some CP15 registers, and then go back to non-secure mode (and potentially repeat the process).
I've looked into the SMC instruction, but I can't find any documentation on how to appropriately trap the call/where the call goes to/how to set that up, etc.
Is that information anywhere?
To recap, here's what I want to do:
FROM PRIVILEGED EXECUTION:
Do stuff
Tweak GHB // requires secure execution
Do more stuff
Tweak GHB
Do more stuff
...
...
...
Do stuff
Any help would CERTAINLY be appreciated!
Thanks to @artlessnoise, I found this file in the u-boot source: /u-boot/arch/arm/cpu/armv7/nonsec_virt.S.
It contains the following code:
/*
* secure monitor handler
* U-boot calls this "software interrupt" in start.S
* This is executed on a "smc" instruction, we use a "smc #0" to switch
* to non-secure state.
* We use only r0 and r1 here, due to constraints in the caller.
*/
.align 5
_secure_monitor:
mrc p15, 0, r1, c1, c1, 0 @ read SCR
bic r1, r1, #0x4e @ clear IRQ, FIQ, EA, nET bits
orr r1, r1, #0x31 @ enable NS, AW, FW bits
#ifdef CONFIG_ARMV7_VIRT
mrc p15, 0, r0, c0, c1, 1 @ read ID_PFR1
and r0, r0, #CPUID_ARM_VIRT_MASK @ mask virtualization bits
cmp r0, #(1 << CPUID_ARM_VIRT_SHIFT)
orreq r1, r1, #0x100 @ allow HVC instruction
#endif
mcr p15, 0, r1, c1, c1, 0 @ write SCR (with NS bit set)
#ifdef CONFIG_ARMV7_VIRT
mrceq p15, 0, r0, c12, c0, 1 @ get MVBAR value
mcreq p15, 4, r0, c12, c0, 0 @ write HVBAR
#endif
movs pc, lr @ return to non-secure SVC
Presumably if I modified the mask for the mcr p15 instruction, I could simply "turn off" the move to nonsecure mode. This will probably kill u-boot, however.
So the question is, then: How do I set the appropriate vector so when I make the SMC call, I jump back into secure mode, and am able to do my GHB/BTB tinkering?
Any other help is appreciated!
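For reference, the masks used on the SCR in the handler above decompose into individual bits as in the following C sketch. The macro and function names are made up for illustration; the bit positions are those of the ARMv7 Secure Configuration Register, and has_virt stands for the ID_PFR1 check that is only compiled in under CONFIG_ARMV7_VIRT.

```c
#include <stdbool.h>
#include <stdint.h>

/* ARMv7 SCR bit positions. */
#define SCR_NS   (1u << 0)  /* non-secure state */
#define SCR_IRQ  (1u << 1)  /* route IRQ to monitor mode */
#define SCR_FIQ  (1u << 2)  /* route FIQ to monitor mode */
#define SCR_EA   (1u << 3)  /* route external aborts to monitor mode */
#define SCR_FW   (1u << 4)  /* CPSR.F writable from non-secure state */
#define SCR_AW   (1u << 5)  /* CPSR.A writable from non-secure state */
#define SCR_NET  (1u << 6)  /* not early termination */
#define SCR_HCE  (1u << 8)  /* HVC instruction enable */

/* Computes the value the handler writes back to SCR. */
static uint32_t monitor_scr_value(uint32_t scr, bool has_virt)
{
    scr &= ~(SCR_IRQ | SCR_FIQ | SCR_EA | SCR_NET);  /* bic r1, r1, #0x4e */
    scr |= SCR_NS | SCR_FW | SCR_AW;                 /* orr r1, r1, #0x31 */
    if (has_virt)
        scr |= SCR_HCE;                              /* orreq r1, r1, #0x100 */
    return scr;
}
```

Leaving SCR_NS out of the orr mask is what "turning off" the move to non-secure state would amount to in this model; whether the hardware allows secure state at all is a separate question.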
The DM3730 on the Gumstix is a GP (general purpose) device, which means it has TrustZone disabled. There's no way you can get into secure mode on it.
See https://stackoverflow.com/a/8028948/6839
Ok, I get this compile error:
Error: suffix or operands invalid for `push'
when I use this line:
pushw %es;
I know it is either the %es or the w, as I have been successfully porting other push and pop instructions to 64-bit assembler.
%es is an existing register according to some documentation I have found, and I don't think it is referenced any differently.
So what could be my problem? I am extremely rusty on my asm, and I think it could be the w.
Thanks for any help.
As Zimbaboa already explained, there is no segmentation in 64-bit mode.
Moreover, if you look at Intel's manuals, Instruction Set Reference, M-Z, you will see that push ES is an invalid instruction altogether in 64-bit mode (page 423):
Opcode  Instruction  Op/En  64-Bit Mode  Compat/Leg Mode  Description
...
0E      PUSH CS      NP     Invalid      Valid            Push CS.
16      PUSH SS      NP     Invalid      Valid            Push SS.
1E      PUSH DS      NP     Invalid      Valid            Push DS.
06      PUSH ES      NP     Invalid      Valid            Push ES.
0F A0   PUSH FS      NP     Valid        Valid            Push FS.
0F A8   PUSH GS      NP     Valid        Valid            Push GS.
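The "64-Bit Mode" column of the table can be captured in a small lookup, for example when scripting a port of 32-bit code. This C sketch only restates the table; the struct, array, and function names are made up for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* One row per segment register, taken from the table above. */
struct push_seg { const char *reg; bool valid_in_64bit; };

static const struct push_seg PUSH_SEG[] = {
    {"cs", false}, {"ss", false}, {"ds", false}, {"es", false},
    {"fs", true},  {"gs", true},
};

/* Returns true if "push <reg>" is encodable in 64-bit mode. */
static bool push_seg_valid_64(const char *reg)
{
    for (size_t i = 0; i < sizeof PUSH_SEG / sizeof PUSH_SEG[0]; i++)
        if (strcmp(PUSH_SEG[i].reg, reg) == 0)
            return PUSH_SEG[i].valid_in_64bit;
    return false;  /* not a segment register */
}
```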
Is this the Pentium instruction set? If so, then yes, I think ES (capitalized) is a 16-bit segment register. The instruction is just "push %ES" according to this site: http://faydoc.tripod.com/cpu/index.htm.
Wish I could help more, but I only code in MIPS assembly.
You are using PUSHW, which pushes a 16-bit word onto the stack. In 64-bit mode the default stack operand size is 64 bits, and the assembler is rejecting this push of the 16-bit ES register.
Try just using push, but take care that your pop is also matching.
Edit1: I checked the processor documentation; segmentation is disabled in the 64-bit mode of x86_64.
See section 4 of the above document.
64-bit mode, segmentation is disabled, creating a flat 64-bit virtual-address space. As will be seen, certain functions of some segment registers, particularly the system-segment registers, continue to be used in 64-bit mode.
Again in section 4.5.3
DS, ES, and SS Registers in 64-Bit Mode. In 64-bit mode, the contents of the ES, DS, and SS segment registers are ignored. All fields (base, limit, and attribute) in the hidden portion of the segment registers are ignored.
So in your code you can safely drop any references to these registers.