Segmentation fault when trying to adapt "Smashing The Stack For Fun And Profit" - linux

I am following the classic paper "Smashing The Stack For Fun And Profit" alongside "Smashing the Stack in 2011". Despite all the Q&As about these papers, I cannot find an answer to my problem.
I am trying to run a simple exit(0), but with a call and jmp similar to shellcodeasm.c in "Smashing The Stack For Fun And Profit", so that I can follow the paper to the end (I managed to get it to work when I removed the call and jmp). Clearly my shellcodeasm.c below doesn't open a shell, but I am keeping the names from "Smashing The Stack For Fun And Profit" so my process is easier to follow.
shellcodeasm.c
void main() {
__asm__("jmp 0xd \n \
popl %esi \n \
movl $0x1,%eax \n \
movl $0x0, %ebx \n \
int $0x80 \n \
call -0x12 \n \
.string \"/bin/sh\" ");
}
By running gcc -o shellcodeasm -g -ggdb shellcodeasm.c and using gdb to get the hex from main+3 to the end of main (as in the paper), I can generate my testsc.c:
testsc.c
char shellcode[] =
"\xe9\x29\x7c\xfb\xf7\x5e\xb8\x01\x00\x00\x00\xbb\x00"
"\x00\x00\x00\xcd\x80\xe8\xf8\x7b\xfb\xf7\x2f\x62\x69"
"\x6e\x2f\x73\x68\x00\x5d\xc3";
void main() {
int *ret;
ret = (int *)&ret + 2;
(*ret) = (int)shellcode;
}
I can then compile and run it using the techniques in "Smashing the Stack in 2011"
gcc -o testsc testsc.c -fno-stack-protector
execstack -s testsc
./testsc
But unfortunately I get a segmentation fault. (Since there is no buffer overflow here, I guess -fno-stack-protector is not necessary, but it doesn't work when I remove it either.)
Does anyone know what I am not understanding/missing?
The output of uname -a is Linux core 3.2.0-4-686-pae #1 SMP Debian 3.2.73-2+deb7u3 i686 GNU/Linux and the output of gcc -v is gcc version 4.7.2 (Debian 4.7.2-5). I hope I have given all the relevant info.

Related

Unclear output by riscv objdump -d

Now I am trying to understand the RISC-V ISA, but one point about the machine code and assembly is unclear to me.
I have written this C code:
int main() {
return 42;
}
Then I produced the .s file with this command:
$ /opt/riscv/bin/riscv64-unknown-linux-gnu-gcc -S 42.c
The output was:
.file "42.c"
.option nopic
.text
.align 1
.globl main
.type main, #function
main:
addi sp,sp,-16
sd s0,8(sp)
addi s0,sp,16
li a5,42
mv a0,a5
ld s0,8(sp)
addi sp,sp,16
jr ra
.size main, .-main
.ident "GCC: (g5964b5cd727) 11.1.0"
.section .note.GNU-stack,"",#progbits
Next, I ran the following command to produce an ELF:
$ /opt/riscv/bin/riscv64-unknown-linux-gnu-gcc -nostdlib -o 42 42.s
A binary file is produced. I tried to read it with objdump like this:
$ /opt/riscv/bin/riscv64-unknown-linux-gnu-objdump -d 42
The output was:
42: file format elf64-littleriscv
Disassembly of section .text:
00000000000100b0 <main>:
100b0: 1141 addi sp,sp,-16
100b2: e422 sd s0,8(sp)
100b4: 0800 addi s0,sp,16
100b6: 02a00793 li a5,42
100ba: 853e mv a0,a5
100bc: 6422 ld s0,8(sp)
100be: 0141 addi sp,sp,16
100c0: 8082 ret
What I don't understand is the meaning of the machine code in objdump output.
For example, according to this page (which is not an official spec), the first instruction, addi, is encoded as .....0010011. However, the dumped hex is 1141. 1141 can only represent 2 bytes, but the instruction should be 32 bits (4 bytes).
I guess I am missing some points, but how should I read the output of objdump for riscv?
You can tell objdump to show compressed (16-bit) instructions explicitly by using -M no-aliases, like this:
riscv64-unknown-elf-objdump -d -M no-aliases
In that case, instructions starting with c. are compressed ones.
Unfortunately, that also disables some other aliases, which makes the asm less pleasant to read if you're used to them. Alternatively, you can simply look at the number of bytes (2 vs. 4) in the hexdump to see whether an instruction is compressed.

Overwrite the return address of main in C

I'm trying to execute a buffer overflow attack on a program written in C. I'm using GNU/Linux (Ubuntu 16.04 LTS).
This is the source code:
#include<stdio.h>
void CALLME(){
puts("successful!");
}
int main(void){
char s[16];
scanf("%s",s);
}
What I want to do is overwrite the return address of main so that, after main returns, the function CALLME is executed.
I compile the program with
gcc -m32 -fno-stack-protector -o prog prog.c
Using the command
nm prog | grep CALLME
I got the address of CALLME: 0804845b
Disassembling main in gdb, I found that the return address is located at 8(%ebp) and the string s is at -0x18(%ebp). So the difference is 0x8 + 0x18 = 32.
I tried to exploit it:
perl -e 'print "a" x 32 . "\x5b\x84\x04\x08"' | ./prog
It didn't work:
Segmentation fault (core dumped)
Why? Is main special? In other functions I wrote with a similar vulnerability, it works.
NOTE: I'm not considering ASLR; some people said it only applies when I compile with gcc -pie ... and other things.

Where is the %fs segment set up for static ELF images?

I'm trying to figure out how the %fs register is initialized when creating an ELF image by hand.
The simple snippet I'd like to run is:
.text
nop
movq %fs:0x28, %rax;
1: jmp 1b
This should read offset 0x28 in the %fs segment, which is normally where the stack canary is stored. Because I create the ELF image by hand, my code never sets up the %fs segment, so this fails, as expected(?).
Here is how I create the ELF image. The code disassembles to:
0000000000000000 <.text>:
0: 90 nop
1: 64 48 8b 04 25 28 00 mov %fs:0x28,%rax
8: 00 00
a: eb fe jmp 0xa
I create the .text segment via
echo 9064488b042528000000ebfe | xxd -r -p > r2.bin
Then I convert to elf:
ld -b binary -r -o raw.elf r2.bin
objcopy --rename-section .data=.text --set-section-flags .data=alloc,code,load raw.elf
At that point raw.elf contains my instructions. I then link with:
ld -T raw.ld -o out.elf -M --verbose
where raw.ld is:
OUTPUT_FORMAT("elf64-x86-64", "elf64-x86-64", "elf64-x86-64")
OUTPUT_ARCH(i386:x86-64)
ENTRY(_entry)
PHDRS {
phdr4000000 PT_LOAD;
}
SECTIONS
{
_entry = 0x4000000;
.text 0x4000000 : { raw.elf (.text) } :phdr4000000
}
I can now start out.elf with gdb:
gdb --args out.elf
and set a breakpoint at 0x4000000:
(gdb)break *0x4000000
(gdb)run
The first nop can be stepped via stepi, however the stack canary read mov %fs:0x28,%rax segfaults.
I suppose that is expected, since the OS is probably not setting up %fs.
For a simple program m.c (int main() { return 0; }) compiled with gcc --static m.c -o m, I can read from %fs. Adding:
long can()
{
long v = 0;
__asm__("movq %%fs:0x28, %0;"
: "=r"(v));
return v;
}
lets me read from %fs, even though I doubt that %fs:0x28 is set up, because ld.so is not run (it is a static image).
Question:
Can anyone point out where %fs is set up in the C runtime for static images?
You need to call arch_prctl with an ARCH_SET_FS argument before you can use the %fs segment prefix. You will have to allocate the backing store somewhere (brk, mmap, or an otherwise unused part of the stack).
glibc does this in __libc_setup_tls in csu/libc-tls.c for statically linked binaries, hidden behind the TLS_INIT_TP macro.

How to compile an STM32F103 program on Ubuntu?

I have some experience programming STM32 ARM Cortex-M3 microcontrollers on Windows using Keil. I now want to move to a Linux environment and use open source tools to program STM32 Cortex-M3 devices.
I've researched a bit and found that I can use OpenOCD or Texane's ST Link to flash the chip. I also found out that I'll need a cross compiler to compile the code, namely the gcc-arm-none-eabi toolchain.
Which basic source and header files are needed? Which core and system files are required to make a simple blink program?
I don't intend to use the HAL libraries for now. I'm using an STM32F103ZET6 MCU (a very generic board). I went to http://regalis.com.pl/en/arm-cortex-stm32-gnulinux/ , but couldn't pinpoint exactly which files I need.
If there is any tutorial to start stm32 programming on linux environment, please let me know.
Any help is appreciated. Thanks!
Here is a very simple example that is fairly portable across the STM32 family. It doesn't do anything useful; you have to fill in the blanks to blink an LED or something (read the schematic and the manuals, enable the clocks to the GPIO, follow the instructions to make the pin a push/pull output, and so on, then set or clear the bit, etc.).
I have my reasons for doing it this way, others have theirs, and we all have years or decades of experience behind those opinions. But at the end of the day they are opinions, and many different solutions will work.
On recent Ubuntu releases you can simply do this to get a toolchain:
apt-get install gcc-arm-linux-gnueabi binutils-arm-linux-gnueabi
Or you can go here and get a prebuilt toolchain for your operating system:
https://launchpad.net/gcc-arm-embedded
flash.s
.cpu cortex-m0
.thumb
.thumb_func
.global _start
_start:
stacktop: .word 0x20001000
.word reset
.word hang
.word hang
.word hang
.word hang
.word hang
.word hang
.word hang
.word hang
.word hang
.word hang
.word hang
.word hang
.word hang
.word hang
.thumb_func
reset:
bl notmain
b hang
.thumb_func
hang: b .
.align
.thumb_func
.globl PUT16
PUT16:
strh r1,[r0]
bx lr
.thumb_func
.globl PUT32
PUT32:
str r1,[r0]
bx lr
.thumb_func
.globl GET32
GET32:
ldr r0,[r0]
bx lr
.thumb_func
.globl dummy
dummy:
bx lr
.end
flash.ld
MEMORY
{
rom : ORIGIN = 0x08000000, LENGTH = 0x1000
ram : ORIGIN = 0x20000000, LENGTH = 0x1000
}
SECTIONS
{
.text : { *(.text*) } > rom
.rodata : { *(.rodata*) } > rom
.bss : { *(.bss*) } > ram
}
sram.s
.cpu cortex-m0
.thumb
.thumb_func
.global _start
_start:
ldr r0,stacktop
mov sp,r0
bl notmain
b hang
.thumb_func
hang: b .
.align
stacktop: .word 0x20001000
.thumb_func
.globl PUT16
PUT16:
strh r1,[r0]
bx lr
.thumb_func
.globl PUT32
PUT32:
str r1,[r0]
bx lr
.thumb_func
.globl GET32
GET32:
ldr r0,[r0]
bx lr
.thumb_func
.globl dummy
dummy:
bx lr
.end
sram.ld
MEMORY
{
ram : ORIGIN = 0x20000000, LENGTH = 0x1000
}
SECTIONS
{
.text : { *(.text*) } > ram
.rodata : { *(.rodata*) } > ram
.data : { *(.data*) } > ram
.bss : { *(.bss*) } > ram
}
notmain.c
void PUT32 ( unsigned int, unsigned int );
unsigned int GET32 ( unsigned int );
void dummy ( unsigned int );
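#define STK_CSR 0xE000E010 /* SysTick control and status register */
#define STK_RVR 0xE000E014 /* SysTick reload value register */
#define STK_CVR 0xE000E018 /* SysTick current value register */
#define STK_MASK 0x00FFFFFF /* SysTick is a 24-bit counter */
int delay ( unsigned int n )
{
unsigned int ra;
while(n--)
{
while(1)
{
ra=GET32(STK_CSR);
if(ra&(1<<16)) break; /* COUNTFLAG: counter reached 0 since last read */
}
}
return(0);
}
int notmain ( void )
{
unsigned int rx;
PUT32(STK_CSR,4); /* CLKSOURCE = processor clock, counter disabled */
PUT32(STK_RVR,1000000-1); /* wrap every 1,000,000 clock cycles */
PUT32(STK_CVR,0x00000000); /* clear the current value (and COUNTFLAG) */
PUT32(STK_CSR,5); /* enable the counter, still on the processor clock */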
for(rx=0;;rx++)
{
dummy(rx);
delay(50);
dummy(rx);
delay(50);
}
return(0);
}
Makefile
#ARMGNU ?= arm-none-eabi
ARMGNU ?= arm-linux-gnueabi
AOPS = --warn --fatal-warnings -mcpu=cortex-m0
COPS = -Wall -Werror -O2 -nostdlib -nostartfiles -ffreestanding -mcpu=cortex-m0
all : notmain.gcc.thumb.flash.bin notmain.gcc.thumb.sram.bin
clean:
	rm -f *.bin
	rm -f *.o
	rm -f *.elf
	rm -f *.list
	rm -f *.bc
	rm -f *.opt.s
	rm -f *.norm.s
	rm -f *.hex
#---------------------------------
flash.o : flash.s
	$(ARMGNU)-as $(AOPS) flash.s -o flash.o
sram.o : sram.s
	$(ARMGNU)-as $(AOPS) sram.s -o sram.o
notmain.gcc.thumb.o : notmain.c
	$(ARMGNU)-gcc $(COPS) -mthumb -c notmain.c -o notmain.gcc.thumb.o
notmain.gcc.thumb.flash.bin : flash.ld flash.o notmain.gcc.thumb.o
	$(ARMGNU)-ld -o notmain.gcc.thumb.flash.elf -T flash.ld flash.o notmain.gcc.thumb.o
	$(ARMGNU)-objdump -D notmain.gcc.thumb.flash.elf > notmain.gcc.thumb.flash.list
	$(ARMGNU)-objcopy notmain.gcc.thumb.flash.elf notmain.gcc.thumb.flash.bin -O binary
notmain.gcc.thumb.sram.bin : sram.ld sram.o notmain.gcc.thumb.o
	$(ARMGNU)-ld -o notmain.gcc.thumb.sram.elf -T sram.ld sram.o notmain.gcc.thumb.o
	$(ARMGNU)-objdump -D notmain.gcc.thumb.sram.elf > notmain.gcc.thumb.sram.list
	$(ARMGNU)-objcopy notmain.gcc.thumb.sram.elf notmain.gcc.thumb.sram.hex -O ihex
	$(ARMGNU)-objcopy notmain.gcc.thumb.sram.elf notmain.gcc.thumb.sram.bin -O binary
You can also use this approach if you prefer. I have my reasons not to; TL;DW.
void dummy ( unsigned int );
#define STK_MASK 0x00FFFFFF
#define STK_CSR (*((volatile unsigned int *)0xE000E010))
#define STK_RVR (*((volatile unsigned int *)0xE000E014))
#define STK_CVR (*((volatile unsigned int *)0xE000E018))
int delay ( unsigned int n )
{
unsigned int ra;
while(n--)
{
while(1)
{
ra=STK_CSR;
if(ra&(1<<16)) break;
}
}
return(0);
}
int notmain ( void )
{
unsigned int rx;
STK_CSR=4;
STK_RVR=1000000-1;
STK_CVR=0x00000000;
STK_CSR=5;
for(rx=0;;rx++)
{
dummy(rx);
delay(50);
dummy(rx);
delay(50);
}
return(0);
}
Read the ARM docs (ST publishes a derivative of them to some extent, but not every vendor does, and you should still go to ARM) plus the ST docs.
There is a UART-based bootloader built in (it might be USB, etc.) that is pretty easy to interface with. Let's see... my host code to download programs is a few hundred lines and probably took an evening or an afternoon to write. YMMV. If you don't already have one, you can get one of the Discovery or Nucleo boards; I recommend those anyway. You can use the debug end of the board to program other STM32 parts, or even some non-ST ARM chips (not all; it depends on what OpenOCD supports, etc.), and these boards cost about 30% less than the dedicated ST-Link USB dongles, plus you don't need an extension USB cable, etc. YMMV. You can certainly use an ST-Link with OpenOCD or Texane's stlink, as you already mentioned.
Because of the way the Cortex-M boots, I have provided two examples: one for burning to flash, and the other for downloading to RAM via OpenOCD and running it from there. You could arguably use the flash one for that too, but you have to tweak the start address when you run it. I prefer this method. YMMV.
With this approach you are portable and completely unencumbered by HAL limitations or requirements, build environments, etc. But I recommend you try the various methods: bare metal like this, the HAL style of bare metal with one or more ST solutions, and the CMSIS approach. Every year or so, try again and see if the one you picked is still the one you like.
This example demonstrates that it does not take a whole lot. I picked the Cortex-M0 simply to avoid the ARMv7-M Thumb-2 extensions; Thumb without those extensions is the most portable ARM instruction set. So again, the code does mostly nothing, but it does that nothing on any STM32 Cortex-M with a SysTick timer.
EDIT
This along with whatever you need to feed the linker would be the minimal non-C code.
.global _start
_start:
.word 0x20001000
.word reset
.word hang
.word hang
.word hang
.word hang
.word hang
.word hang
.word hang
.word hang
.word hang
.word hang
.word hang
.word hang
.word hang
.word hang
This is abbreviated: depending on the chip vendor and core, there can be dozens to hundreds of vectors, one for every little interrupt of every little thing. The labels reset and hang in this case would be the names of C functions to handle those vectors (the documentation for the chip and core determines which vector handles what). The first vector is always the initialization value of the stack pointer. The second is always reset, and the next few are common; after that come generic interrupt pins on the core that the chip vendor wires up, so you have to look at the chip vendor's documentation.
The core design is such that registers are preserved for you on exception entry, so you don't need even a little bit of assembly for the handlers. Going without any bootstrap, though, you must assume that .bss is not zeroed and .data is not initialized, and you can't return from the reset function; in a real implementation you wouldn't, but in demonstration tests you might (blink an LED 10 times and then the program is finished).
Your toolchain may have some other way to do this. Since all toolchains should have an assembler, and assemblers can generate tables of words, that option always exists; it doesn't really make sense to create yet another tool and language for this, but some folks feel the need. Your toolchain may not require the entry point to be named _start, and/or it may require a different entry point name.
Even if you use Keil, you should also try the GNU tools: they are easy (easier) to get, and there is significantly more support and experience with them in the world than with Keil. They may not produce as "good" code as Keil, performance-wise or otherwise, but you should always keep them in your back pocket, as you will always be able to find help with the GNU tools.
http://gnuarmeclipse.github.io/
There you'll find everything, including an IDE (Eclipse), toolchain, debugger, headers.
Look at this package. It is an IDE + toolchain + debugger, and it is available for Linux platforms. You can study it and get ideas for doing what you want. I expect most Linux programs have a command-line interface.
In addition, I suggest trying the LL API if it is already available for your MCU.

Building 16 bit os - character array not working

I am building a 16-bit operating system, but character arrays do not seem to work.
Here is my example kernel code:
asm(".code16gcc\n");
void putchar(char);
int main()
{
char *str = "hello";
putchar('A');
if(str[0]== 'h')
putchar('h');
return 0;
}
void putchar(char val)
{
asm("movb %0, %%al\n"
"movb $0x0E, %%ah\n"
"int $0x10\n"
:
:"m"(val)
) ;
}
It prints:
A
That means the putchar function is working properly, but
if(str[0]== 'h')
putchar('h');
is not working.
I am compiling it by:
gcc -fno-toplevel-reorder -nostdinc -fno-builtin -I./include -c -o ./bin/kernel.o ./source/kernel.c
ld -Ttext=0x9000 -o ./bin/kernel.bin ./bin/kernel.o -e 0x0
What should I do?
Your data segment is probably not loaded into the target. What are you doing after the link with your brand new kernel.bin file, which is in fact an ELF file?
