Ternary in if statement - ternary-operator

I have looked at this question: To ternary or not to ternary? However, it didn't really provide an answer to this situation.
I've been using ternaries in if statements because, to my mind, they express the logic more directly and require fewer operations to evaluate. An example of such a statement is if (a == 'foo' ? b != 'poo' : true), which of course can be replaced by if ((a == 'foo' && b != 'poo') || a != 'foo'). Comparing the number of operations, the ternary needs between 1 and 2 and the non-ternary between 2 and 3 (this is a trivial example; I could just as well create one where the difference is greater than 1). The ternary is also cleaner than the non-ternary form, at least for me. However, my co-workers have been asking why I have adopted such a convention.
Are there any caveats to using ternaries within conditionals (apart from readability; I still consider a ? b ? c ? d : e : f : g an inhumane thing to parse)?

Using ternaries in an if statement is a bad idea, if for no other reason than it's unexpected. It's also logically more difficult to follow - I mentally converted your ternary to a conditional, as it makes more sense thinking about it that way. If I think of the conditions, I'm going to think in terms of "and" and "or", not "if ... then ... else," and I think most devs will agree with me on that.
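To make the "and/or" reading concrete, here is a minimal sketch in C. It assumes a and b are strings compared with strcmp (the question does not say which language or types it uses), and the function names are made up for illustration. It checks that the ternary, the question's and/or rewrite, and the shorter form a != "foo" || b != "poo" agree on every combination:

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* The condition as written in the question, using a ternary. */
static bool cond_ternary(const char *a, const char *b)
{
    return strcmp(a, "foo") == 0 ? strcmp(b, "poo") != 0 : true;
}

/* The and/or rewrite given in the question. */
static bool cond_logical(const char *a, const char *b)
{
    return (strcmp(a, "foo") == 0 && strcmp(b, "poo") != 0)
        || strcmp(a, "foo") != 0;
}

/* Simplified form: "a is foo" implies "b is not poo". */
static bool cond_simplified(const char *a, const char *b)
{
    return strcmp(a, "foo") != 0 || strcmp(b, "poo") != 0;
}

/* Compare all three forms over every matching/non-matching combination. */
static void check_all_agree(void)
{
    static const char *const values_a[] = { "foo", "bar" };
    static const char *const values_b[] = { "poo", "baz" };
    for (int i = 0; i < 2; i++) {
        for (int j = 0; j < 2; j++) {
            bool t = cond_ternary(values_a[i], values_b[j]);
            assert(t == cond_logical(values_a[i], values_b[j]));
            assert(t == cond_simplified(values_a[i], values_b[j]));
        }
    }
}
```

Calling check_all_agree() passes silently, so the choice between the forms is about readability rather than behaviour.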
Finally, look at the disassembly. Using clang on OSX, they are identical except the ternary has one more jmp. Your "fewer operations" argument is wrong.
Ternary
_ternary: ## @ternary
.cfi_startproc
## BB#0:
pushq %rbp
Ltmp2:
.cfi_def_cfa_offset 16
Ltmp3:
.cfi_offset %rbp, -16
movq %rsp, %rbp
Ltmp4:
.cfi_def_cfa_register %rbp
subq $16, %rsp
movl %edi, -4(%rbp)
movl %esi, -8(%rbp)
movl %edx, -12(%rbp)
cmpl $102, -4(%rbp)
jne LBB0_2
## BB#1:
cmpl $111, -8(%rbp)
je LBB0_3
jmp LBB0_4
LBB0_2:
cmpl $111, -12(%rbp)
jne LBB0_4
LBB0_3:
leaq L_.str(%rip), %rdi
callq _puts
movl %eax, -16(%rbp) ## 4-byte Spill
LBB0_4:
movl $0, %eax
addq $16, %rsp
popq %rbp
ret
.cfi_endproc
Logical
_logical: ## @logical
.cfi_startproc
## BB#0:
pushq %rbp
Ltmp7:
.cfi_def_cfa_offset 16
Ltmp8:
.cfi_offset %rbp, -16
movq %rsp, %rbp
Ltmp9:
.cfi_def_cfa_register %rbp
subq $16, %rsp
movl %edi, -4(%rbp)
movl %esi, -8(%rbp)
movl %edx, -12(%rbp)
cmpl $102, -4(%rbp)
jne LBB1_2
## BB#1:
cmpl $111, -8(%rbp)
je LBB1_3
LBB1_2:
cmpl $111, -12(%rbp)
jne LBB1_4
LBB1_3:
leaq L_.str1(%rip), %rdi
callq _puts
movl %eax, -16(%rbp) ## 4-byte Spill
LBB1_4:
movl $0, %eax
addq $16, %rsp
popq %rbp
ret
.cfi_endproc
Diff, excluding naming (< = ternary)
20,22c20,21
< jmp LBB0_4
---

Related

unable to print characters in assembly [duplicate]

This question already has an answer here:
Using interrupt 0x80 on 64-bit Linux [duplicate]
I am trying to print the character h in assembly, but it is not outputting anything. I cannot see why this is not working.
I suspect it is because I am using %rbp instead of %eax, but I am reasonably new to assembly and do not know whether writing to the %rbp register instead of %eax makes a difference.
.section .text
.global _start
_start:
mov %eax, %edi
call main
movl $1, %eax
int $0x80
main:
pushq %rbp
movq %rsp, %rbp
movl $4, %eax
movl $1, %ebx
push $0x068
movl $5, %edx
movq %rbp, %rsp
syscall
popq %rbp
ret
The code is assembled and linked with
> as $(BIN_DIR)/assembly.asm -o $(BIN_DIR)/a.o
> ld $(BIN_DIR)/a.o -o $(BIN_DIR)/a
I looked up the structure in, e.g., the Free Pascal sources, which illustrate how parameters are passed to syscall and how success is determined.
movq sysnr, %rax { Syscall number -> rax. }
// for calls that have fewer parameters, just skip the relevant lines that load them
movq param1, %rdi { shift arg1 - arg5. }
movq param2, %rsi
movq param3, %rdx
movq param4, %r10
movq param5, %r8
movq param6, %r9
syscall { Do the system call. }
cmpq $-4095, %rax { Check %rax for error. }
jnae .LSyscOK { Skip the error handler if there is no error. }
negq %rax
movq %rax,%rdi
call seterrno // call some function to set errno threadvar
movq $-1,%rax
.LSyscOK: // end of procedure
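The error check at the end of that listing can be sketched in C. This is only an illustration of the comparison the assembly performs, not code from Free Pascal or from any libc; the function names are hypothetical. A raw return value counts as an error when, read as an unsigned number, it is at least (unsigned)-4095, that is, when it lies in the range -4095 to -1:

```c
#include <assert.h>
#include <stdbool.h>

/* Mirrors "cmpq $-4095, %rax" followed by "jnae .LSyscOK":
   the jump to the OK label is taken when rax is below -4095 (unsigned),
   so an error is any value in the range [-4095, -1]. */
static bool raw_syscall_failed(long raw)
{
    return (unsigned long)raw >= (unsigned long)-4095L;
}

/* Mirrors "negq %rax": the errno value is the negated raw result.
   Only meaningful for values that raw_syscall_failed() accepts. */
static int errno_from_raw(long raw)
{
    assert(raw_syscall_failed(raw));
    return (int)-raw;
}
```

For example, a raw result of -2 is reported as a failure with errno value 2, while 0, 5 or -4096 are treated as ordinary results.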

Assembly x86 64 Linux AT&T: print routine segmentation error

I am new to assembly and am aware that my code may not be efficient or could be better. The comments may be slightly out of date due to constant changes. The goal is to print each character of the string individually and, when the routine comes across a format specifier such as %s, to print a string from one of the parameters in its place.
So for example:
String: Hello, %s
Parameter (RSI): Foo
Output: Hello, Foo
The code does what it is supposed to do, but it gives a segmentation fault at the end.
.bss
char: .byte 0
.text
.data
text1: .asciz "%s!\n"
text2: .asciz "My name is %s. I think I’ll get a %u for my exam. What does %r do? And %%?\n"
word1: .asciz "Piet"
.global main
main:
pushq %rbp # push the base pointer (and align the stack)
movq %rsp, %rbp # copy stack pointer value to base pointer
movq $text1, %rdi
movq $word1, %rsi
movq $word1, %rdx
movq $word1, %rcx
movq $word1, %r8
movq $word1, %r9
call myPrint
end:
movq %rbp, %rsp # clear local variables from stack
popq %rbp # restore base pointer location
movq $60, %rax
movq $0, %rdi
syscall
myPrint:
pushq %rbp
movq %rsp, %rbp
pushq %rsi
pushq %rdx
pushq %rcx
pushq %r8
pushq %r9
movq %rdi, %r12
regPush:
movq $0, %rbx
#rbx: counter
printLooper:
movb (%r12), %r14b #Get a byte of r12 to r14
cmpb $0, %r14b #Check if r14 is a null byte
je endPrint #If it is a null byte then go to 'endPrint'
cmpb $37, %r14b
je formatter
incq %r12 #Increment r12 to the next byte
skip:
mov $char, %r15 #Move char address to r15
mov %r14b, (%r15) #Move r14 byte into the value of r15
mov $char, %rcx #Move char address into rcx
movq $1, %r13 #For the number of byte
printer:
movq $0, %rsi #Clearing rsi
mov %rcx, %rsi #Move the address to rsi
movq $1, %rax #Sys write
movq $1, %rdi #Output
movq %r13, %rdx #Number of byte to rdx
syscall
jmp printLooper
formatter:
incq %r12 #Moving to char after "%"
movb (%r12), %r14b #Moving the char byte into r14
cmpb $115, %r14b #Compare 's' with r14
je formatString #If it is equal to 's' then jump to 'formatString'
movb -1(%r12), %r14b #Put back the previous char into r14
jmp skip
####String Formatter Start ##################################################
formatString:
addq $1, %rbx
movq $8, %rax
mulq %rbx
subq %rax, %rbp
movq (%rbp), %r15
pushq %r15 ### into the stack
movq $0, %r13 ### Byte counter
formatStringLoop:
movb (%r15), %r14b #Move char into r14
cmpb $0, %r14b #Compare r14 with null byte
je formatStringEnd #If it is equal, go to 'formatStringEnd'
incq %r15 #Increment to next char
addq $1, %r13 #Add 1 to the byte counter
jmp formatStringLoop#Loop again
formatStringEnd:
popq %rcx #Pop the address into rcx
incq %r12 #Moving r12 to next char
jmp printer
#######String Formatter End #############################################
endPrint:
movq %rbp, %rsp
popq %rbp
ret
In formatString you modify %rbp with subq %rax, %rbp, forgetting that you will restore %rsp from it. So when you mov %rbp, %rsp just before the function returns, you end up with %rsp pointing somewhere else, and so you get the wrong return address.
I guess you are subtracting some offset from %rbp to get some space on the stack. This seems unsafe because you've pushed lots of other stuff there. It is safe to use up to 128 bytes below the stack pointer as this is the red zone, but it would be more natural to use an offset from %rsp instead. Using SIB addressing you can access data at constant or variable offsets to %rsp without actually changing its value.
How I found this with gdb: by setting breakpoints at myPrint and endPrint, I found that %rsp was different at the ret than it was on entry. Its value could only have come from %rbp, so I did watch $rbp to have the debugger break when %rbp changed, and it pointed straight to the offending instruction in formatString. (Which I could also have found by searching the source code for %rbp.)
Also, your .text at the top of the file is misplaced, so all your code gets placed in the .data section. This actually works but it surely is not what you intended.
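To illustrate the idea of indexing from a fixed base instead of moving the base itself, here is a rough C analogue of the routine. It is a sketch under my own assumptions, not a translation of your code: my_format, its parameters and the output buffer are hypothetical, and it writes into memory instead of calling write. The argument block is read as args[next], and the base pointer args is never modified, just as (%rsp) or (%rbp) plus an index would leave the register alone:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Expands each "%s" in fmt with the next entry of args and stores the
   result in out (always NUL-terminated, truncated if out is too small).
   Returns the number of bytes written, excluding the terminator. */
static size_t my_format(char *out, size_t out_size,
                        const char *fmt, const char *const *args, size_t nargs)
{
    size_t len = 0;   /* bytes written so far */
    size_t next = 0;  /* index of the next unused argument (the asm's %rbx) */

    assert(out_size > 0);
    while (*fmt != '\0') {
        if (fmt[0] == '%' && fmt[1] == 's' && next < nargs) {
            const char *s = args[next++];     /* indexed access, base untouched */
            size_t n = strlen(s);
            size_t room = out_size - 1 - len;
            if (n > room)
                n = room;                     /* truncate to the space left */
            memcpy(out + len, s, n);
            len += n;
            fmt += 2;                         /* skip both '%' and 's' */
        } else {
            if (len + 1 < out_size)
                out[len++] = *fmt;            /* ordinary character, copied as is */
            fmt++;
        }
    }
    out[len] = '\0';
    return len;
}
```

With the format "%s!\n" and the single argument "Piet", the buffer ends up holding "Piet!\n". Unknown specifiers, and any "%s" left over once the arguments run out, are copied through unchanged.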

How do canary words allow gcc to detect buffer overflows?

I tested strcpy() with a source string larger than the destination:
#include <stdlib.h>
#include <string.h>

int main() {
char *ptr = malloc(12);
strcpy(ptr,"hello world!"); /* writes 13 bytes (12 characters plus '\0') into 12 */
return 0;
}
Compiling with the flag -fstack-protector and using the -S option I got:
.file "malloc.c"
.text
.globl main
.type main, @function
main:
.LFB2:
.cfi_startproc
pushq %rbp
.cfi_def_cfa_offset 16
.cfi_offset 6, -16
movq %rsp, %rbp
.cfi_def_cfa_register 6
subq $32, %rsp
movl %edi, -20(%rbp)
movq %rsi, -32(%rbp)
movq %fs:40, %rax
movq %rax, -8(%rbp)
xorl %eax, %eax
movq $0, -16(%rbp)
movl $12, %edi
call malloc
movq %rax, -16(%rbp)
movq -16(%rbp), %rax
movabsq $8022916924116329800, %rdx
movq %rdx, (%rax)
movl $560229490, 8(%rax)
movb $0, 12(%rax)
movl $0, %eax
movq -8(%rbp), %rcx
xorq %fs:40, %rcx
je .L3
call __stack_chk_fail
.L3:
leave
.cfi_def_cfa 7, 8
ret
.cfi_endproc
.LFE2:
.size main, .-main
Could someone explain to me how this works? And why isn't the "canary word" also overwritten by the \0 of the hello world! string?
"Could someone explain to me how this works?"
The canary word is read from %fs:40 and stored at the top of the frame here:
movq %fs:40, %rax
movq %rax, -8(%rbp)
It sits below the return address, so if your code happens to overflow a stack buffer (which will be located below -8(%rbp)), the overflow first overwrites the -8(%rbp) location. The code GCC emits detects this just before the ret here:
movq -8(%rbp), %rcx
xorq %fs:40, %rcx # Checks that %fs:40 == -8(%rbp)
je .L3 # Ok, return
call __stack_chk_fail # Die
as the overwritten contents of -8(%rbp) will most likely differ from the proper value (installed from %fs:40).
"And why isn't the canary word also overwritten by the \0 of the hello world! string?"
Your code has a heap overflow, not a stack buffer overflow, so SSP can't help: the buffer returned by malloc is nowhere near the canary stored in main's stack frame.
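The mechanism can be imitated in plain C. The sketch below is only a model: struct fake_frame, GUARD_VALUE and the three functions are made up for illustration, and the real canary lives in the actual stack frame and comes from %fs:40. The model places a guard word above a 12-byte buffer and shows that an unchecked write which runs past the buffer changes the guard, which the exit check then notices:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a stack frame: a buffer with a guard word
   stored above it, the way -fstack-protector stores a copy of %fs:40. */
struct fake_frame {
    char     buf[12];
    uint64_t canary;
};

#define GUARD_VALUE UINT64_C(0x00a1b2c3d4e5f607) /* arbitrary, illustration only */

/* Like: movq %fs:40, %rax ; movq %rax, -8(%rbp) */
static void frame_enter(struct fake_frame *f)
{
    memset(f->buf, 0, sizeof f->buf);
    f->canary = GUARD_VALUE;
}

/* Writes n bytes starting at buf without checking n against sizeof buf.
   n is capped at the struct size so the demo stays inside one object. */
static void unchecked_write(struct fake_frame *f, size_t n)
{
    assert(n <= sizeof *f);
    memset((unsigned char *)f, 'A', n);
}

/* Like: movq -8(%rbp), %rcx ; xorq %fs:40, %rcx ; je .L3 */
static bool frame_intact(const struct fake_frame *f)
{
    return f->canary == GUARD_VALUE;
}
```

Writing exactly sizeof buf bytes leaves frame_intact() true, while writing sizeof(struct fake_frame) bytes makes it false. A block returned by malloc has no such guard word next to it that main checks, which is why the program in the question exits without calling __stack_chk_fail.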

Swift string via string literal vs initializer

In other languages such as Java, there is a difference under the hood between a string obtained via a string literal and one obtained via an initializer. In Swift, are they equivalent under the hood?
e.g.
var string:String = ""
var string:String = String()
Refer to this SO post for info on differences between literal and object in Java.
The declarations are equivalent according to the Apple docs:
Initializing an Empty String
To create an empty String value as the starting point for building a longer string, either assign an empty string literal to a variable, or initialize a new String instance with initializer syntax:
var emptyString = "" // empty string literal
var anotherEmptyString = String() // initializer syntax
// these two strings are both empty, and are equivalent to each other
Reference: https://developer.apple.com/library/prerelease/ios/documentation/Swift/Conceptual/Swift_Programming_Language/StringsAndCharacters.html
If we look at the assembly, we can see that both declarations are stored with identical instructions.
string.swift:
let str = String()
let str2 = ""
Compiled assembly (swiftc -emit-assembly string.swift):
.section __TEXT,__text,regular,pure_instructions
.macosx_version_min 14, 3
.globl _main
.align 4, 0x90
_main:
.cfi_startproc
pushq %rbp
Ltmp0:
.cfi_def_cfa_offset 16
Ltmp1:
.cfi_offset %rbp, -16
movq %rsp, %rbp
Ltmp2:
.cfi_def_cfa_register %rbp
subq $16, %rsp
movq _globalinit_33_1BDF70FFC18749BAB495A73B459ED2F0_token4@GOTPCREL(%rip), %rax
movq _globalinit_33_1BDF70FFC18749BAB495A73B459ED2F0_func4@GOTPCREL(%rip), %rcx
xorl %edx, %edx
movl %edi, -4(%rbp)
movq %rax, %rdi
movq %rsi, -16(%rbp)
movq %rcx, %rsi
callq _swift_once
movq _globalinit_33_1BDF70FFC18749BAB495A73B459ED2F0_token5@GOTPCREL(%rip), %rdi
movq _globalinit_33_1BDF70FFC18749BAB495A73B459ED2F0_func5@GOTPCREL(%rip), %rax
xorl %r8d, %r8d
movl %r8d, %edx
movq __TZvOSs7Process5_argcVSs5Int32@GOTPCREL(%rip), %rcx
movl -4(%rbp), %r8d
movl %r8d, (%rcx)
movq %rax, %rsi
callq _swift_once
movq __TZvOSs7Process11_unsafeArgvGVSs20UnsafeMutablePointerGS0_VSs4Int8__@GOTPCREL(%rip), %rax
movq -16(%rbp), %rcx
movq %rcx, (%rax)
callq __TFSSCfMSSFT_SS
leaq L___unnamed_1(%rip), %rdi
xorl %r8d, %r8d
movl %r8d, %esi
movl $1, %r8d
movq %rax, __Tv6string3strSS(%rip)
movq %rdx, __Tv6string3strSS+8(%rip)
movq %rcx, __Tv6string3strSS+16(%rip)
movl %r8d, %edx
callq __TFSSCfMSSFT21_builtinStringLiteralBp8byteSizeBw7isASCIIBi1__SS
xorl %r8d, %r8d
movq %rax, __Tv6string4str2SS(%rip)
movq %rdx, __Tv6string4str2SS+8(%rip)
movq %rcx, __Tv6string4str2SS+16(%rip)
movl %r8d, %eax
addq $16, %rsp
popq %rbp
retq
.cfi_endproc
.globl __Tv6string3strSS
.zerofill __DATA,__common,__Tv6string3strSS,24,3
.globl __Tv6string4str2SS
.zerofill __DATA,__common,__Tv6string4str2SS,24,3
.section __TEXT,__cstring,cstring_literals
L___unnamed_1:
.space 1
.no_dead_strip __Tv6string3strSS
.no_dead_strip __Tv6string4str2SS
.linker_option "-lswiftCore"
.section __DATA,__objc_imageinfo,regular,no_dead_strip
L_OBJC_IMAGE_INFO:
.long 0
.long 512
.subsections_via_symbols
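Notice that str and str2 are stored with identical instructions: each initializer returns the string in %rax, %rdx and %rcx, and those three registers are written to the variable's 24-byte storage. The calls themselves differ (String() versus the built-in string literal initializer), but both produce an empty string: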
xorl %r8d, %r8d
movl %r8d, %esi
movl $1, %r8d
movq %rax, __Tv6string3strSS(%rip)
movq %rdx, __Tv6string3strSS+8(%rip)
movq %rcx, __Tv6string3strSS+16(%rip)
movl %r8d, %edx
# ...
xorl %r8d, %r8d
movq %rax, __Tv6string4str2SS(%rip)
movq %rdx, __Tv6string4str2SS+8(%rip)
movq %rcx, __Tv6string4str2SS+16(%rip)
movl %r8d, %eax
You can learn more about string literals by reviewing Apple's documentation.

Assembly - Why strtol clobbers %rcx register?

Context :
Linux 64.
GCC 4.8.2 (with -O3 -march=native)
The x86_64 abi under my left hand, opened at page 21.
The C code :
int main (int argc, char ** argv) {
printf("%d %s\n", atoi(argv[2]),argv[1] );
}
The assembly code :
(notice that the compiler replaced atoi with strtol by itself)
...
movl $10, %edx
movq 16(%rsi), %rdi
movq 8(%rsi), %rbx
xorl %esi, %esi
call strtol
movl $.LC0, %edi
movq %rbx, %rdx
movl %eax, %esi
xorl %eax, %eax
call printf
xorl %eax, %eax
popq %rbx
...
The question :
%rcx should be reserved for the 4th input integer argument.
strtol has 3 input args (respectively registers %rdi, %rsi, %rdx) and one return, %eax.
Why then is %rcx clobbered ?
This code does not work:
...
movl $10, %edx
movq 16(%rsi), %rdi
movq 8(%rsi), %rcx <-- look, I replaced %rbx with %rcx
xorl %esi, %esi
call strtol
movl $.LC0, %edi
movq %rcx, %rdx <-- look, I replaced %rbx with %rcx
movl %eax, %esi
xorl %eax, %eax
call printf
xorl %eax, %eax
popq %rbx
...
Thanks
In every calling convention I know, some registers may be modified by the called function (caller-saved) and some must not be (callee-saved).
In 32-bit programs ecx may be modified while ebx must not be modified or, to be more exact, must be restored before returning. For 64-bit programs the rule is the same: under the System V x86-64 ABI, rbx, rbp and r12 to r15 must be preserved by the callee, while rax, rcx, rdx, rsi, rdi and r8 to r11 may be clobbered, whether or not they carry arguments.
Indeed, most functions modify most of the registers they are allowed to. This is why there is a "popq %rbx" at the end of the code you posted: main itself uses rbx and must restore it before returning. rcx may be modified, and strtol obviously does that!
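As a quick reference, the rule can be written down as a small lookup in C. The function name is hypothetical and the list reflects the System V x86-64 ABI as I understand it (other ABIs, such as the Windows x64 one, use a different set), so treat it as a sketch:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Returns true if the 64-bit register must hold the same value after a
   call as before it (System V x86-64 ABI). Everything else, including
   rcx, may be overwritten by the callee. */
static bool is_callee_saved(const char *reg)
{
    static const char *const preserved[] = {
        "rbx", "rbp", "rsp", "r12", "r13", "r14", "r15"
    };

    assert(reg != NULL);
    for (size_t i = 0; i < sizeof preserved / sizeof preserved[0]; i++) {
        if (strcmp(reg, preserved[i]) == 0)
            return true;
    }
    return false;
}
```

So is_callee_saved("rbx") is true, which is why the compiler parked argv[1] there across the call, and is_callee_saved("rcx") is false, which is why your edited version loses the pointer.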

Resources