Using ndisasm on files of different architectures - linux

I would like to use ndisasm on a huge number of files of different architectures (x86 or x64). I do not know if -b16 would give me correct output for all the files or if I have to specify the correct option for each file, like -b32 or -b64. What I am running right now from the command line:
for file in *; do ndisasm -b16 -o7c00h -a -s7c3eh "$file" > "/my-path/$file"; done

I'd recommend not using ndisasm unless you really do have flat binaries. It treats the whole file, including metadata, as instructions.
x86 machine code is variable-length, and needs to be decoded from the correct starting address to be "in sync". e.g. if the last couple bytes of metadata decode as the start of a long instruction, that's how ndisasm will decode them. This will consume the first few bytes of what was supposed to be the first instruction(s) of machine code in the object or executable file. After that, the current position may be in the middle of another instruction.
Decoding will often get back into sync fairly quickly and line up with how the instructions will actually execute, but if you're going to run a big batch disassembly you might as well use tools that will do it correctly.
Both of the following disassemblers understand object-file formats and select a mode based on the file type (e.g. x86-64 mode for x86-64 ELF or PE-COFF objects / executables).
objdump -drwC -Mintel (from GNU binutils) makes pretty nice output, but it uses GNU .intel_syntax noprefix which is MASM-like. (See the intel-syntax tag wiki for more about MASM-style vs. NASM-style).
Agner Fog's objconv disassembler is quite good, and can disassemble into NASM / YASM syntax, or MASM, or AT&T. Example of using it. The output has all extra info as comments, so you can feed it to an assembler and get a binary similar to what you started with, including different sections.
(But special encodings aren't preserved, e.g. the .plt normally uses push imm32 for padding even with small immediates, but you will get the push imm8 form when NASM assembles push 0x1, because objconv doesn't disassemble it to push strict dword 0x1.) Still, it's very good most of the time, and even puts labels on branch targets so you can easily find the tops of loops.
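For reference, typical invocations look something like this (a.out and the output file names are just placeholders):
objdump -drwC -Mintel ./a.out > a.objdump.asm     # GNU binutils, MASM-like Intel syntax
objconv -fnasm ./a.out a.objconv.asm              # NASM-syntax disassembly; -fmasm and -fgasm also exist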
If some but not all of your binaries are flat, maybe use file to find the ones that aren't and feed them to objconv. For the flat binaries, you'll probably have to try disassembling multiple ways and use human judgement to decide whether the code looks "sane" or not.
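As a rough sketch of that routing (the paths and 16-bit options are carried over from the question's command; adjust the file(1) patterns to whatever actually shows up in your collection):
for f in *; do
  case "$(file -b "$f")" in
    ELF\ *|PE32*) objconv -fnasm "$f" "/my-path/$f.asm" ;;                     # real object/executable format
    *)            ndisasm -b16 -o7c00h -a -s7c3eh "$f" > "/my-path/$f.asm" ;;  # assume a flat 16-bit blob
  esac
done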
One major sign of 32-bit code being disassembled in 16-bit mode is when the end of a 32-bit immediate or addressing-mode displacement gets decoded as the start of a new instruction. Often this is an add instruction (opcode 00).
For 64-bit vs. 32-bit code, one big difference is REX prefixes vs. single-byte dec / inc instructions. If you see weird dec / inc instructions in 32-bit disassembly, it's probably actually 64-bit machine code. If you see weird REX prefixes (especially when the disassembler says rex add eax, ecx or something to show you there's a useless REX prefix), it was probably a separate inc or dec instruction in 32-bit machine code.
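A quick way to see that effect for yourself (the four bytes below are a hand-picked example):
printf '\x48\x01\xc8\xc3' > blob.bin
ndisasm -b64 blob.bin     # add rax,rcx / ret
ndisasm -b32 blob.bin     # dec eax / add eax,ecx / ret   (the REX.W prefix turned into dec)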

Related

Why does the .bss segment have no executable attribute?

I have an ELF 32-bit executable file named orw from pwnable.tw: https://pwnable.tw/challenge/. On my Ubuntu 18.04 system, the .bss segment can be executed.
But on my Ubuntu 20 system and in IDA Pro, the .bss segment has no executable attribute. Why?
Why does the .bss segment have no executable attribute?
In a normal executable .bss should not have execute permissions, so it's the Ubuntu 18.04 result that is strange, not the other way around.
The following are all relevant here:
output from readelf -Wl orw
kernel versions
output from cat /proc/cpuinfo
emulator details (if you are using some kind of emulator).
I suspect that you are using an emulator, and it's set up to emulate a pre-NX-bit processor (where any readable page was also executable, since there was no NX bit).
Alternatively, the executable lacks PT_GNU_STACK segment, in which case this answer is likely the correct one -- kernel defaults have changed for such binaries.
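A quick way to check both of those things on the binary itself (assuming the file is the orw executable from the question):
readelf -Wl ./orw | grep -E 'GNU_STACK|LOAD'
# no GNU_STACK line       -> the kernel's old read-implies-exec behaviour may kick in,
#                            and its default for such binaries has changed across versions
# RWE flags on a LOAD line -> that mapping (including .bss) is executable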
.bss is a segment for uninitialized global variables, so it's not normally executable (it doesn't need to be). If you want it executable (because you are compiling machine code that you want to be able to test), you will probably need to select a special segment, or create two overlapping segments (one executable, the other read/write) so you can write the code and also execute it. This may already be provided for in the standard script you use to link executables (under a different name, of course), or, if it isn't, you can supply a linker script that creates such segments. Read the linker documentation (in full, sorry) to learn how the linker deals with this (and other) idiosyncrasies of your processor architecture.
I don't know what architecture you are using, but, for example, Intel processors have an execute permission bit on segments, alongside the read and write bits. That means an execute-only segment permits instruction fetches but not data reads into a register. If you want to read the text segment as data, you also need to give the text segment read permission so you can see the code you are executing.

RIP stuck at inc instruction in self-modifying shellcode [duplicate]

Is it possible to allocate memory in other sections of a NASM program, besides .data and .bss?
Say I want to write to a location in the .text section and get a segmentation fault.
I'm interested in ways to avoid this and access the memory legally. I'm running Ubuntu Linux.
If you want to allocate memory at runtime, reserve some space on the stack with sub rsp, 4096 or something. Or run an mmap system call or call malloc from libc, if you linked against libc.
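For the mmap route, a minimal raw-syscall sketch looks something like this (x86-64 Linux; the constants are the usual ones, but double-check them for your target):
cat > mmap-demo.asm <<'EOF'
global _start
section .text
_start:
    mov  eax, 9              ; __NR_mmap
    xor  edi, edi            ; addr = NULL, let the kernel choose
    mov  esi, 4096           ; length = one page
    mov  edx, 7              ; PROT_READ|PROT_WRITE|PROT_EXEC
    mov  r10d, 0x22          ; MAP_PRIVATE|MAP_ANONYMOUS
    mov  r8, -1              ; fd = -1 for an anonymous mapping
    xor  r9d, r9d            ; offset = 0
    syscall                  ; rax = pointer to the new page (or -errno)

    mov  eax, 60             ; exit(0)
    xor  edi, edi
    syscall
EOF
nasm -f elf64 mmap-demo.asm && ld -o mmap-demo mmap-demo.o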
If you want to test shellcode / self-modifying code,
or have some other reason to want a writeable .text:
Link with ld --omagic or gcc -Wl,--omagic. From the ld(1) man page:
-N
--omagic
Set the text and data sections to be readable and writable. Also, do not page-align the data segment, and disable linking against shared libraries. If the output format supports Unix style magic numbers, mark the output as "OMAGIC".
See also How can I make GCC compile the .text section as writable in an ELF binary?
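A sketch of that route (file names are placeholders for your own NASM source with a _start; newer binutils may warn about the resulting RWX segment):
nasm -f elf64 -o smc.o smc.asm
ld --omagic -o smc smc.o          # or link through gcc with -Wl,--omagic as described above
readelf -Wl smc | grep LOAD       # the flags column should now include W alongside R and E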
You could probably also use a linker script. It may also be possible to use NASM's section attributes to declare a custom section with read, write, and exec permissions.
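A minimal sketch of the NASM section-attribute idea (the custom section name is arbitrary; newer linkers may warn about the RWX segment):
cat > rwx-demo.asm <<'EOF'
section .textrw progbits alloc exec write align=16
global _start
_start:
    mov  byte [_start], 0x90     ; writing into our own code section is now allowed
    mov  eax, 60                 ; exit(0)
    xor  edi, edi
    syscall
EOF
nasm -f elf64 rwx-demo.asm
ld -o rwx-demo rwx-demo.o
readelf -WS rwx-demo | grep textrw    # expect WAX in the section flags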
There's normally (outside of shellcode testing) no reason to do any of this, just put your static storage in .data or .bss, and your static read-only data in .rodata like a normal person.
Putting read/write data near code is actively bad for performance: possible pipeline nukes from the hardware that detects self-modifying-code, and it at least pollutes the iTLB with data and the dTLB with code, if you have a page that includes some of both instead of being full of one or the other.

Why can't non-PIC code be fully ASLRed using run-time fixups?

I understand that PIC code makes ASLR more efficient and easier, since the code can be placed anywhere in memory with no changes. But if I understand the Wikipedia article on relocation correctly, the dynamic linker can apply "fixups" at runtime so a symbol can be located even though the code is not position-independent. Yet according to many answers I have seen here, non-PIC code cannot have its sections ASLRed, except for the stack (so the program entry point can't be randomized). If that is correct, what are runtime fixups used for, and why can't we just fix up all locations in the code at runtime before the program starts, so the entry point can be randomized?
TL:DR: Not all uses of absolute addresses will have relocation info in a non-PIE executable (ELF type EXEC, not DYN). Therefore the kernel's program loader can't find them all to apply fixups.
Thus there's no way to retroactively enable ASLR for executables built as non-PIE. There's no way for a traditional executable to flag itself as having relocation metadata for every use of an absolute address, and no point in adding such a feature either since if you want text ASLR you'd just build a PIE.
Because ELF-type EXEC Linux executables are guaranteed to be loaded / mapped at the fixed base address chosen by the linker at link time, it would be a waste of space in the executable to make symbol-table entries for internal symbols. So toolchains didn't do that, and there's no reason to start. That's simply how traditional ELF executables were designed; Linux switched from a.out to ELF back in the mid 90s before stack ASLR was a thing, so it wasn't on people's radar.
e.g. the absolute address of static char buf[100] is probably embedded somewhere in the machine code that uses it (if we're talking about 32-bit code, or 64-bit code that puts the address in a register), but there's no way to know where or how many times.
Also, for x86-64 specifically, the default code model for non-PIE executables guarantees that static addresses (text / data / bss) will all be in the low 2GiB of virtual address space, so 32-bit absolute signed or unsigned addresses can work, and rel32 displacements can reach anything from anything. That's why non-PIE compiler output uses mov $symbol, %edi (5 bytes) to put an address in a register, instead of lea symbol(%rip), %rdi (7 bytes). https://godbolt.org/z/89PeK1
So even if you did know where every absolute address was, you could only ASLR it in the low 2GiB, limiting the number of bits of entropy you could introduce. (I think Windows has a mode for this: LargeAddressAware = no. But Linux doesn't. 32-bit absolute addresses no longer allowed in x86-64 Linux? Again, PIE is a better way to allow text ASLR, so people (distros) should just compile for that if they want its benefits.)
Unlike Windows, Linux doesn't spend huge effort on things that can be handled better and more efficiently by recompiling binaries from source.
That being said, GNU/Linux does support fixup relocations for 64-bit absolute addresses even in PIC / PIE ELF shared objects. That's why beginner code like NASM mov rdi, BUFFER can work even in a shared library: use objdump -drwC -Mintel to see the relocation info on that use of the symbol in a mov reg, imm64 instruction. An lea rdi, [rel BUFFER] wouldn't need any relocation entry if BUFFER wasn't a global symbol. (Equivalent of C static.)
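You can watch that happen with a tiny object (file names are hypothetical; ld may warn about, or with some linkers refuse, the text relocation it has to create for the mov, in which case try -z notext):
cat > reloc-demo.asm <<'EOF'
section .bss
BUFFER: resb 64                 ; local symbol, like C "static"

section .text
global get_buffer
get_buffer:
    mov  rdi, BUFFER            ; mov r64, imm64: absolute address, needs a runtime fixup
    lea  rsi, [rel BUFFER]      ; RIP-relative: no dynamic relocation needed
    ret
EOF
nasm -f elf64 reloc-demo.asm
ld -shared -o reloc-demo.so reloc-demo.o
readelf -r reloc-demo.so        # expect an R_X86_64_RELATIVE entry for the mov, nothing for the lea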
You might be wondering why metadata is essential:
There's no reliable way to search text/data for possible absolute addresses; false positives would be possible. e.g. /usr/bin/ld probably contains 0x401000 as the default start address for an x86-64 executable. You don't want ASLR of ld's code+data to also change its defaults. Or that integer value could have come up in any number of ways in many programs, e.g. as a bitmap. And of course x86-64 machine code is variable length so there's no reliable way to even distinguish opcodes from immediate operands in the most general case.
And also potentially false negatives. Not super likely that an x86 program would construct an absolute address in a register with multiple instructions, but it's certainly possible. However in non-x86 code, that would be common.
RISC machines with fixed-length instructions can't put a 32-bit address into a 32-bit instruction; there'd be no room left for anything else. So to load from static storage, the absolute addresses would have to be split across multiple instructions, like MIPS lui $t0, %hi(0x612300) / lw $t1, %lo(0x612300)($t0) to load from a static variable at absolute address 0x612300. (There would normally be a symbol name in the asm source, but it wouldn't appear in the final linked binary unless it was .globl, so I used numbers as a reminder.) Instructions like that don't have to come in pairs; the same high-half of the address could be reused by other accesses into the same array or struct in later instructions.
Let's first have a look at Windows before turning to Linux:
Windows' .EXE files (programs) typically have a so-called "base relocation table" and they have an "image base".
The "image base" is the "desired" start address of the program; if Windows loads the program to that address, no relocation needs to be done.
The "base relocation table" contains a list of all values in a program which represent addresses. If the program is loaded to a different address than the "image base", Windows must add the difference to all values listed in that table.
If the .EXE file does not contain a "base relocation table" (as far as I know some 32-bit GCC versions generate such files), it is not possible to load the file to another address.
This is because the following C code statements will result in exactly the same machine code (binary code) if the variable someVariable is located at the address 12340000, and it is not possible to distinguish between them:
long myVariable = 12340000;
And:
int * myVariable = &someVariable;
In the first case, the value 12340000 must not be changed in any situation; in the second case, the address (which is 12340000) must be changed to the real address if the program is loaded to another address.
If the "base relocation table" is missing, there is no information if the value 12340000 is an integer value (which must not be changed) or an address (which must be changed).
So the program must be loaded to some fixed address.
I'm not sure about the latest 32-bit Linux releases, but at least in older 32-bit Linux versions there was nothing like a "base relocation table" and programs did not use PIC. This means that these programs had to be loaded to their "favorite" address.
I don't know about 64-bit Linux programs, but if a program is compiled the same way as the (older) 32-bit programs, they also must be loaded to a certain address and ASLR is not possible.

How to inform GCC to not use a particular register

Assume I have a very large codebase and intend to leave the rdx register totally unused during execution, i.e., while generating the assembly code, all I want is to inform my compiler (GCC) that it should not use rdx at all.
NOTE: register rdx is just an example. I am OK with any available Intel x86 register.
I am even happy to update the source code of the compiler and use my custom GCC. But which changes to the source code are needed?
You tell GCC not to allocate a register via the -ffixed-reg option (gcc docs).
-ffixed-reg
Treat the register named reg as a fixed register; generated code should never refer to it (except perhaps as a stack pointer, frame pointer or in some other fixed role).
reg must be the name of a register. The register names accepted are machine-specific and are defined in the REGISTER_NAMES macro in the machine description macro file.
For example, gcc -ffixed-r13 will make gcc leave it alone entirely. Using registers that are part of the calling convention, or required for certain instructions, may be problematic.
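A quick way to convince yourself it worked (the file name is hypothetical, and the rdx spelling is assumed to be accepted the same way r13 is):
cat > no-rdx.c <<'EOF'
int sum(const int *a, int n) {
    int s = 0;
    for (int i = 0; i < n; i++) s += a[i];
    return s;
}
EOF
gcc -O2 -S -ffixed-rdx no-rdx.c -o no-rdx.s
grep -En '%(r|e)?dx' no-rdx.s || echo "no rdx use in the generated code"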
You can pin a global variable to a register.
For an ARM CPU you can do it this way:
register volatile type *global_ptr asm ("r8")
This declaration uses the general-purpose register r8 to hold the value of the global_ptr pointer.
See the source in U-Boot for a real-life example:
http://git.denx.de/?p=u-boot.git;a=blob;f=arch/arm/include/asm/global_data.h;h=4e3ea55e290a19c766017b59241615f7723531d5;hb=HEAD#l83
File arch/arm/include/asm/global_data.h (line ~83).
#define DECLARE_GLOBAL_DATA_PTR register volatile gd_t *gd asm ("r8")
I don't know whether there is a simple mechanism to tell gcc that at run time; I would assume that you must recompile. From what I read, I understand that there are description files for the different CPUs, e.g. this file, but what exactly needs to be changed in order to prevent gcc from using the register, and what potential side effects such a change could have, are beyond me.
I would ask on the gcc mailing list for assistance. Chances are that the modification is not so difficult per se, except that building gcc isn't trivial in my experience. In your case, if I analyze the situation correctly, a caveat applies: you are essentially cross-compiling, i.e. building for a different architecture. In particular, I understand that you would have to rebuild your system libraries and the other libraries your program uses, because their code would normally use that register. If you intend to link dynamically, you would probably also have to build your own ld.so (the dynamic loader), because starting a dynamically linked executable actually starts that loader, which would use that register. (Therefore maybe linking statically is better.)
Consider the divq instruction - the dividend is represented by [rdx][rax], and, assuming the divisor (D) satisfies rdx < D, the quotient is stored in %rax and the remainder in %rdx. There are no alternative registers that can be used here.
The same applies to the mul/mulq instructions, where the product is stored in [rdx][rax] - even the recent mulx instruction, while more flexible, still uses %rdx as a source register. (If memory serves.)
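For instance, 64-bit division is hard-wired to that register pair (a toy example; the exit status carries the remainder):
cat > divdemo.asm <<'EOF'
global _start
section .text
_start:
    mov  rax, 1000000007     ; low 64 bits of the dividend
    xor  edx, edx            ; high 64 bits go in rdx, zeroed here
    mov  rcx, 10
    div  rcx                 ; quotient -> rax, remainder -> rdx, no other choice
    mov  edi, edx            ; exit status = remainder (7)
    mov  eax, 60             ; exit
    syscall
EOF
nasm -f elf64 divdemo.asm && ld -o divdemo divdemo.o && ./divdemo; echo $?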
More importantly, %rdx is used to pass parameters in the x86-64 ELF ABI. You could never call C library functions (or any other ELF library for that matter) - even kernel syscalls use %rdx to pass parameters - though the register use is not the same.
I'm not clear on your motivation - but the fact is, you won't be able to do anything practical on any x86[-64] platform (let alone an ELF/Linux platform) - at least in user-space.

How to add/remove x86 instructions in Linux executables without spoiling the alignment

I'm new to binary and assembly, and I'm curious about how to directly edit binary executables. I tried to remove an instruction from a binary file (going by the disassembly provided by objdump), but after doing that the "executable" no longer seems to be in an executable format (segmentation fault when running; gdb cannot recognize it). I heard that this is due to an instruction alignment issue. (Is it?)
So, is it possible to add/remove single x86 instructions directly in Linux executables? If so, how? Thanks in advance.
If you remove a chunk of a binary file without adjusting the file headers accordingly, it will become invalid.
Fortunately, you can replace instructions with NOP without actually removing them. File size remains the same, and if there is no checksum or signature (or if it's not actually checked), there is nothing more to do.
There is no universal way to insert instructions, but generally you overwrite the original code with a JMP to another location, where you reproduce what the original code did, do your own things as you wanted, then JMP back. Finding room for your new code might be impossible without changing the size of the binary, so I would instead patch the code after the executable is loaded (perhaps using a special LD_PRELOADed library).
Yes. Just replace it with a NOP instruction (0x90) - or with multiple NOPs if the instruction spans multiple bytes. This is an old trick.
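On Linux that can be done in place with dd (the offset and length below are hypothetical; objdump -dF / --file-offsets helps map the instruction's address to a position in the file):
off=$((0x1234)); len=3                              # start and byte count of the instruction to kill
printf '\x90%.0s' $(seq "$len") | dd of=./prog bs=1 seek="$off" conv=notrunc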

Resources