How to assemble a mips assembly file

I have a binary file in MIPS format. I was able to disassemble it and make the changes I wanted to the resulting MIPS assembly file. Now I would like to assemble it back into a bin file. I am using Cygwin and am trying to do so with the ar utility.
This is the original object dump:
$ objdump -b binary -h test.bin
test.bin: file format binary
Sections:
Idx Name Size VMA LMA File off Algn
0 .data 00200004 00000000 00000000 00000000 2**0
CONTENTS, ALLOC, LOAD, DATA
I also have the assembly file (test.asm), which contains the MIPS instructions from the test.bin file.
I then tried to assemble it with:
ar -q test2.bin test.asm --target=elf32-big
and
ar -cr test2.bin test.asm --target=elf32-big
But in both cases I only get a bin file with the contents of the assembly file. Can anyone help with what I am missing to assemble this back into an elf32-big binary?
Thanks in advance.

To do this, you'll need a MIPS assembler. ar is an archiver: it only bundles files into an archive, which is why test2.bin just contains the text of your assembly file. If you have a full gcc MIPS cross-compiler toolchain, the assembler should be named something like mips-as or as.
Actually, it might be easier to build it with mips-gcc, which will invoke the assembler and linker for you.
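As a rough sketch of that workflow, the Python helper below builds the two command lines involved: the assembler to produce an object file, then objcopy -O binary (the same option used further down this page) to get a flat bin file back. The toolchain prefix mips-linux-gnu- and the helper names are assumptions for illustration; use whatever prefix your cross toolchain actually installs. Also note that a raw objdump listing usually needs its address and hex columns stripped before an assembler will accept it.

```python
import shutil
import subprocess
from pathlib import Path


def build_commands(source, output, prefix="mips-linux-gnu-"):
    """Return the assembler and objcopy command lines as argument lists.

    The prefix is an assumption: cross toolchains name their tools
    differently (for example mips-as, mips-elf-as, mips-linux-gnu-as).
    """
    obj = Path(output).with_suffix(".o")  # intermediate object file
    return [
        [prefix + "as", "-o", str(obj), str(source)],
        [prefix + "objcopy", "-O", "binary", str(obj), str(output)],
    ]


def assemble(source, output, prefix="mips-linux-gnu-"):
    """Run both steps, failing early if a tool is not on PATH."""
    for cmd in build_commands(source, output, prefix):
        if shutil.which(cmd[0]) is None:
            raise FileNotFoundError(f"{cmd[0]} not found; adjust the prefix")
        subprocess.run(cmd, check=True)
```

Calling assemble("test.asm", "test2.bin") would then write test2.o and test2.bin next to the source, provided the tools exist under that prefix.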

Related

Getting undefined reference to "_printf" error for assembly code despite using gcc linker

I am trying to follow the exercise in the book PC Assembly by Paul Carter. http://pacman128.github.io/pcasm/
I'm trying to run the program from 1.4 page 23 on Ubuntu 18. The files are all available on the github site above.
Since the original code is 32-bit, I assemble using
nasm -f elf32
for first.asm and asm_io.asm to get the object files. I also compile driver.c
I use the linker from gcc and run
gcc -m32 -o first first.o asm_io.o driver.o
but it keeps giving me a bunch of errors like
undefined reference to '_scanf'
undefined reference to '_printf'
(Note that _printf appears instead of printf because some conversion is done in the file asm_io.asm to maintain compatibility between Windows and Linux.)
I don't know why these errors are appearing. I also tried running the linker directly
ld -m elf_i386 -e main -o first -first.o driver.o asm_io.o -I /lib/i386-linux-gnu/ld-linux.so.2
and many variations, since it seems that it's not linking with the C libraries.
Any help? I've been stuck on this for a while and couldn't find a solution in similar questions.

Linux doesn't prepend _ to names when mapping from C to asm symbol names in ELF object files (see footnote 1).
So call printf, not _printf, because there is no _printf in libc.
Whatever "compatibility" code did that is doing it wrong. Only 32-bit Windows and OS X use _printf; Linux uses printf.
So either you've misconfigured something or defined the wrong setting, or it requires updating / porting to Linux.
Footnote 1: In ancient history (like over 20 years ago), Linux with the a.out file format did use leading underscores on symbol names.
Update: the library uses the NASM preprocessor to %define _scanf scanf and so on, but it requires you to manually define ELF_TYPE by assembling with nasm -d ELF_TYPE.
They could have detected ELF32 or ELF64 output formats on their own, because NASM pre-defines __OUTPUT_FORMAT__. Someone should submit a pull-request to make this detection automatic with code something like this:
%ifidn __OUTPUT_FORMAT__, elf32
%define ELF_TYPE 32
%elifidn __OUTPUT_FORMAT__, elf64
%define ELF_TYPE 64
%endif
%ifdef ELF_TYPE
...
%endif

How to turn hex code into x86 instructions

I'm trying to make a script or program that will take given bytes (given in hexadecimal) and convert them into x86 instructions (for example, c3 -> retq).
I've tried doing it by calling gcc -c on an assembly file just containing
retq
retq
and then using a script to insert bytes where it says "c3 c3", then using objdump -d to see what it says now. But it seems that it messes up the format of the file unless I only pass an instruction of the same size as the original instruction bytes.
I'm running it on a Raspberry Pi with Raspbian (a Linux-based operating system) over SSH, in a Bash terminal. I'm using Bash shell scripts and Python, as well as the tools listed here, and gdb.

Disassemble flat binary file: objdump -D -b binary -m i386 foo.bin. Or create an object file using .byte directives from assembly source, e.g. put .byte 0xc3 into foo.s then gcc -c foo.s then objdump -d foo.o
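Since you mention Python, here is a minimal sketch of the first suggestion: turn the hex text into a flat binary file and hand it to objdump with the same options as above. The function names are made up for illustration, and it assumes an objdump on your PATH that supports the requested machine (i386 as in the command above; i386:x86-64 if you want 64-bit decoding).

```python
import os
import subprocess
import tempfile


def hex_to_bytes(hex_text: str) -> bytes:
    """Convert text such as 'c3 c3' or '0xc3, 0x90' into raw bytes."""
    cleaned = hex_text.replace("0x", "").replace(",", " ")
    return bytes.fromhex(cleaned)  # fromhex skips whitespace between bytes


def disassemble(hex_text: str, machine: str = "i386") -> str:
    """Write the bytes to a temporary flat binary and run objdump on it."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "foo.bin")
        with open(path, "wb") as f:
            f.write(hex_to_bytes(hex_text))
        cmd = ["objdump", "-D", "-b", "binary", "-m", machine, path]
        result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout
```

Because the file is rebuilt from scratch each time, instruction length no longer matters, which avoids the problem of patching bytes into an existing object file.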

Is it possible to assemble and run raw CPU instructions using `as`?

There are a couple of related questions here.
Consider a program consisting only of the following two instructions
movq 1, %rax
cpuid
If I throw this into a file called Foo.asm, and run as Foo.asm, where as is the portable GNU assembler, I will get a file called a.out, of size 665 bytes on my system.
If I then chmod 700 a.out and try ./a.out, I will get an error saying cannot execute binary file.
1. Why is the file so large, if I am merely trying to translate two asm instructions into binary?
2. Why can the binary not be executed? I am providing valid instructions, so I would expect the CPU to be able to execute them.
3. How can I get exactly the binary opcodes for the asm instructions in my input file, instead of a bunch of extra stuff?
4. Once I have the answer to 3, how can I get my processor to execute them? (Assuming that I am not running privileged instructions.)

Why is the file so large, if I am merely trying to translate two asm instructions into binary?
Because the assembler creates a relocatable object file, which includes additional information such as section headers and symbol tables.
Why can the binary not be executed?
Because it is a relocatable object file, not a loadable executable. You need to link it to make it executable, so that it can be loaded by the operating system:
$ ld -o Foo a.out
You also need to give the linker a hint about where your program starts, by specifying the _start symbol.
Even then, the Foo executable is larger than you might expect, since it still contains additional information (e.g. the ELF header) required by the operating system to actually launch the program.
Also, if you launch the executable now, it will result in a segmentation fault, because movq 1, %rax loads the contents of address 1, which is not mapped into your address space, into rax. Even if you fix this, execution will run past the end of your code into undefined bytes, so you need to exit the program gracefully through a syscall.
A minimal running example (assuming the x86_64 architecture) would look like
.globl _start
_start:
movq $1, %rax
cpuid
mov $60, %rax # System-call "sys_exit"
mov $0, %rdi # exit code 0
syscall
How can I get exactly the binary opcodes for the asm instructions in my input file, instead of a bunch of extra stuff?
You can use objcopy to generate a raw binary image from an object file:
$ objcopy -O binary a.out Foo.bin
Then, Foo.bin will only contain the instruction opcodes.
nasm has a -f bin option, which creates a binary-only representation of your assembly code. I used this to implement a bare boot loader for VirtualBox (warning: undocumented, prototype only!) to directly launch binary code inside a VirtualBox image without an operating system.
Once I have the answer to 3, how can I get my processor to execute them?
You will not be able to directly execute the raw binary file under Linux. You will need to write your own loader for that or not use an operating system at all. For an example, see my bare boot loader link above - this writes the opcodes into the boot loader of a VirtualBox disc image, so that the instructions are getting executed when launching the VirtualBox machine.

The old MS-DOS COM file format does not include a header. It really only contains the binary executable code. The code size cannot, however, exceed 64 KiB. I don't know whether Linux can execute these.

You can write the opcodes into a file using a hex editor. Then you just need to wrap it in an ELF header so that Linux knows how to execute it.
Here's an example:
hexedit myfile.bin
Now just write your opcodes into the file using the hex editor.
After that you need to add the ELF header. You could do this by hand and write the ELF header into your .bin file, but that is a bit tricky. The easiest method is to use a few commands (in this example, for 64-bit).
objcopy --input-target=binary --output-target=elf64-x86-64 myfile.bin myfile.o
ld -o myfile myfile.o -T binary.ld
You will also need a linker script. In this example I called it binary.ld.
These are the contents of binary.ld:
ENTRY(_start);
SECTIONS
{
_start = 0x0;
}
Now you can execute your program: ./myfile
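For the "by hand" route mentioned above, here is a minimal Python sketch that prepends an ELF64 file header and a single PT_LOAD program header to raw x86-64 opcodes. The load address 0x400000 and the function names are my own choices for illustration, not part of the objcopy/ld method, and the opcodes you pass in should end with an exit syscall, otherwise execution runs off the end of the code.

```python
import os
import struct

BASE_ADDR = 0x400000  # assumed load address (an arbitrary page-aligned choice)
EHDR_SIZE = 64        # size of the ELF64 file header
PHDR_SIZE = 56        # size of one ELF64 program header


def wrap_in_elf64(code: bytes, base: int = BASE_ADDR) -> bytes:
    """Prepend a minimal ELF64 header and one PT_LOAD segment to raw code."""
    entry = base + EHDR_SIZE + PHDR_SIZE  # code sits right after both headers
    total = EHDR_SIZE + PHDR_SIZE + len(code)

    # Magic, then 64-bit class, little-endian, version 1, System V ABI, padding.
    e_ident = b"\x7fELF" + bytes([2, 1, 1, 0]) + bytes(8)
    ehdr = struct.pack(
        "<16sHHIQQQIHHHHHH",
        e_ident,
        2,          # e_type: ET_EXEC
        0x3E,       # e_machine: EM_X86_64
        1,          # e_version
        entry,      # e_entry: first byte of the code
        EHDR_SIZE,  # e_phoff: program header follows the file header
        0,          # e_shoff: no section headers
        0,          # e_flags
        EHDR_SIZE,  # e_ehsize
        PHDR_SIZE,  # e_phentsize
        1,          # e_phnum
        0, 0, 0,    # e_shentsize, e_shnum, e_shstrndx
    )
    phdr = struct.pack(
        "<IIQQQQQQ",
        1,       # p_type: PT_LOAD
        5,       # p_flags: PF_R | PF_X
        0,       # p_offset: map the file from its first byte
        base,    # p_vaddr
        base,    # p_paddr
        total,   # p_filesz
        total,   # p_memsz
        0x1000,  # p_align
    )
    return ehdr + phdr + code


def write_executable(path: str, code: bytes) -> None:
    """Write the wrapped image and mark it executable."""
    with open(path, "wb") as f:
        f.write(wrap_in_elf64(code))
    os.chmod(path, 0o755)
```

For example, write_executable("myfile", bytes.fromhex("b83c00000031ff0f05")) wraps mov $60, %eax; xor %edi, %edi; syscall, i.e. an exit(0) call on x86-64 Linux.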

Perhaps there's something like exe2bin utility for the GNU tool set. I've used various versions of exe2bin with Microsoft tools, and the ARM toolkit has the ability to produce binaries, but I don't recall if it was directly from the linked output or something like exe2bin.

starting point of ELF executable file?

I compiled the following C program on Lubuntu 12.10 with Anjuta:
int main()
{
return 0;
}
The output file is named foobar.
Then I opened a terminal and ran:
ndisasm foobar -b 32 1>asm.txt
(This disassembles foobar as 32-bit code and saves the result to asm.txt.)
I opened asm.txt.
There are many 0x0000 words and a lot of code I can't make sense of.
The instructions jg 0x47 (0x7F45) at 0x00000000 and dec esp (0x4C) at 0x00000002
appear to be the ELF file format signature
(because the hex bytes 0x454C46 are 'ELF' in ASCII).
Linux presumably loads this file into memory and doesn't jump to 0x00000000, because there is no executable code there.
I have two questions:
How do I find the address where execution starts?
Which code is OK to ignore? (Probably the many 0x0000 words, but what else?)

Even for the simplest program like yours, gcc links in some libraries and some object files (notably crt0.o, named crt1.o on current Linux systems, which contains _start, the ELF entry point, and calls your main). Your binary is also probably dynamically linked to libc.so.6 and so needs the dynamic linker (use ldd foobar to find out). Use gcc -v to understand what gcc is doing. objdump also has a lot of interesting flags and options.
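To answer the "where does it start" part concretely: the entry address is stored in the ELF header itself (the e_entry field), and readelf -h foobar prints it as "Entry point address". As a small illustration of where that field lives, here is a Python sketch that reads it from the first bytes of the file; the function name is made up for this example.

```python
import struct


def read_entry_point(header: bytes) -> int:
    """Return e_entry from the first bytes of an ELF file (at least 32 bytes)."""
    if header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    ei_class = header[4]  # 1 = 32-bit, 2 = 64-bit
    ei_data = header[5]   # 1 = little-endian, 2 = big-endian
    endian = "<" if ei_data == 1 else ">"
    # e_entry follows e_ident (16 bytes), e_type, e_machine and e_version,
    # so it starts at offset 24 in both the 32-bit and 64-bit layouts.
    if ei_class == 1:
        (entry,) = struct.unpack_from(endian + "I", header, 24)
    elif ei_class == 2:
        (entry,) = struct.unpack_from(endian + "Q", header, 24)
    else:
        raise ValueError("unknown ELF class")
    return entry
```

Usage would be something like read_entry_point(open("foobar", "rb").read(64)). The result is a virtual address, not a file offset, which is one reason ndisasm on the whole file is misleading: it decodes the headers and tables as if they were code.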
You may also want to read the Assembly Howto, the X86 calling conventions, this question, the X86-64 ABI, these notes on X86-64 programming, etc

How to convert PE(Portable Executable) format to ELF in linux

What's the best tool for converting PE binaries to ELF binaries?
Following is a brief motivation for this question:
Suppose I have a simple C program.
I compiled it using gcc for Linux (this gives an ELF binary), and using 'i586-mingw32msvc-gcc' for Windows (this gives a PE binary).
I want to analyze these two binaries for similarities, using Bitblaze's static analysis tool, vine (http://bitblaze.cs.berkeley.edu/vine.html).
Now vine doesn't have good support for PE binaries, so I wanted to convert PE->ELF, and then carry on with my comparison/analysis.
Since all the analysis has to run on Linux, I would prefer a utility/tool that runs on Linux.
Thanks

It is possible to rebuild an EXE as an ELF binary, but the resulting binary will segfault very soon after loading, due to the missing operating system.
Here's one method of doing it.
Summary
1. Dump the section headers of the EXE file.
2. Extract the raw section data from the EXE.
3. Encapsulate the raw section data in GNU linker script snippets.
4. Write a linker script to build an ELF binary, including those scripts from the previous step.
5. Run ld with the linker script to produce the ELF file.
6. Run the new program, and watch it segfault as it's not running on Windows (and it tries to call functions in the Import Address Table, which doesn't exist).
Detailed Example
Dump the section headers of the EXE file. I'm using objdump from the mingw cross compiler package to do this.
$ i686-pc-mingw32-objdump -h trek.exe
trek.exe: file format pei-i386
Sections:
Idx Name Size VMA LMA File off Algn
0 AUTO 00172600 00401000 00401000 00000400 2**2
CONTENTS, ALLOC, LOAD, READONLY, CODE
1 .idata 00001400 00574000 00574000 00172a00 2**2
CONTENTS, ALLOC, LOAD, DATA
2 DGROUP 0002b600 00576000 00576000 00173e00 2**2
CONTENTS, ALLOC, LOAD, DATA
3 .bss 000e7800 005a2000 005a2000 00000000 2**2
ALLOC
4 .reloc 00013000 0068a000 0068a000 0019f400 2**2
CONTENTS, ALLOC, LOAD, READONLY, DATA
5 .rsrc 00000a00 0069d000 0069d000 001b2400 2**2
CONTENTS, ALLOC, LOAD, READONLY, DATA
Use dd (or a hex editor) to extract the raw section data from the EXE. Here, I'm just going to copy the code and data sections (named AUTO and DGROUP in this example). You may want to copy additional sections though.
$ dd bs=512 skip=2 count=2963 if=trek.exe of=code.bin
$ dd bs=512 skip=2975 count=347 if=trek.exe of=data.bin
Note that I've converted the file offsets and section sizes from hex to decimal and divided them by the block size to get skip and count; I'm using a block size of 512 bytes in dd to speed up the process (example: 0x0400 = 1024 bytes = 2 blocks of 512 bytes).
Encapsulate the raw section data in GNU ld linker script snippets (using the BYTE directive). This will be used to populate the sections.
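The skip and count arithmetic is easy to get wrong by hand, so here is a small Python sketch that derives them from the File off and Size columns of the objdump output above. The helper name is made up for illustration.

```python
BLOCK_SIZE = 512  # same block size as the dd commands above


def dd_args(file_offset: int, size: int, block_size: int = BLOCK_SIZE):
    """Return (skip, count) in blocks for a section at file_offset with size bytes."""
    if file_offset % block_size or size % block_size:
        raise ValueError("offset and size must be multiples of the block size")
    return file_offset // block_size, size // block_size


# Values taken from the section headers of trek.exe shown earlier.
print(dd_args(0x00000400, 0x00172600))  # AUTO   -> (2, 2963)
print(dd_args(0x00173E00, 0x0002B600))  # DGROUP -> (2975, 347)
```

If a section is not aligned to the block size, the helper raises instead of silently truncating; in that case fall back to bs=1 or a hex editor.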
cat code.bin | hexdump -v -e '"BYTE(0x" 1/1 "%02X" ")\n"' >code.ld
cat data.bin | hexdump -v -e '"BYTE(0x" 1/1 "%02X" ")\n"' >data.ld
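If you would rather not fight with hexdump format strings, the same BYTE(...) lines can be produced with a few lines of Python. This is only an equivalent sketch of the two commands above; the function names are made up for illustration.

```python
def to_ld_bytes(data: bytes) -> str:
    """Render each byte as one 'BYTE(0xNN)' linker script line."""
    return "".join(f"BYTE(0x{b:02X})\n" for b in data)


def convert(bin_path: str, ld_path: str) -> None:
    """Read a raw section dump and write the matching linker script snippet."""
    with open(bin_path, "rb") as src:
        data = src.read()
    with open(ld_path, "w") as dst:
        dst.write(to_ld_bytes(data))
```

Calling convert("code.bin", "code.ld") and convert("data.bin", "data.ld") should give the same snippets as the hexdump commands.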
Write a linker script to build an ELF binary, including those scripts from the previous step. Note I've also set aside space for the uninitialized data (.bss) section.
start = 0x516DE8;
ENTRY(start)
OUTPUT_FORMAT("elf32-i386")
SECTIONS {
.text 0x401000 :
{
INCLUDE "code.ld";
}
.data 0x576000 :
{
INCLUDE "data.ld";
}
.bss 0x5A2000 :
{
. = . + 0x0E7800;
}
}
Run the linker script with GNU ld to produce the ELF file. Note I have to use an emulation mode elf_i386 since I'm using 64-bit Linux, otherwise a 64-bit ELF would be produced.
$ ld -o elf_trek -m elf_i386 elf_trek.ld
ld: warning: elf_trek.ld contains output sections; did you forget -T?
$ file elf_trek
elf_trek: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV),
statically linked, not stripped
Run the new program, and watch it segfault as it's not running on Windows.
$ gdb elf_trek
(gdb) run
Starting program: /home/quasar/src/games/botf/elf_trek
Program received signal SIGSEGV, Segmentation fault.
0x0051d8e6 in ?? ()
(gdb) bt
#0 0x0051d8e6 in ?? ()
#1 0x00000000 in ?? ()
(gdb) x/i $eip
=> 0x51d8e6: sub (%edx),%eax
(gdb) quit
IDA Pro output for that location:
0051D8DB ; size_t stackavail(void)
0051D8DB proc stackavail near
0051D8DB push edx
0051D8DC call [ds:off_5A0588]
0051D8E2 mov edx, eax
0051D8E4 mov eax, esp
0051D8E6 sub eax, [edx]
0051D8E8 pop edx
0051D8E9 retn
0051D8E9 endp stackavail
For porting binaries to Linux, this is kind of pointless, given the Wine project.
For situations like the OP's, it may be appropriate.

I've found a simpler way to do this. Use the strip command.
Example
strip -O elf32-i386 -o myprogram.elf myprogram.exe
The -O elf32-i386 option makes it write the output file in that format.
To see the supported formats, run
strip --info
I am using the strip command from mxe, which on my system is actually named /opt/mxe/usr/bin/i686-w64-mingw32.static-strip.

I don't know whether this totally fits your needs, but is it an option for you to cross-compile with your MinGW version of gcc?
That is to say: does it suit your needs to have i586-mingw32msvc-gcc compile directly to ELF format binaries (instead of the PEs you're currently getting)? A description of how to do things in the other direction can be found here. I imagine it will be a little hacky, but entirely possible, to make this work for you in the other direction (I must admit I haven't tried it).
