How do I do bcalls in hex? - z80

So I have a TI-84 Plus C Silver Edition. I just started working on writing assembly programs on it using the opcodes. I found a good reference chart here, but was wondering how to do bcalls, specifically how to print a character to the screen. It seems as if the hex code for the call is 3 bytes long, but call takes in 2 bytes. So how do I call it?
Also, does anyone know the memory location programs are loaded into when they are run for my calculator? I haven't been able to find it yet.

Based on the definitions here: http://wikiti.brandonw.net/index.php?title=84PCSE:OS:Include_File , a "bcall" is a RST 28 instruction followed by the specific number of the bcall. So to print a character you would do (given that PutC is 44FB):
rst 28h
dw 44FBh
Presumably the character to print is in A register.

TI uses rst 28h for their bcall which translates to hexadecimal as EF. Bcalls are 2 bytes, but keep in mind that the Z80 and eZ80 are little-endian processors. So as mentioned earlier, _PutC is 44FB, so you would have to use the FB first, then the 44, making bcall(_PutC) equivalent to EFFB44.
I think the calc you are using has an eZ80. While the eZ80 is backwards compatible with the Z80 instruction set, the table to which you linked is not complete for the eZ80. If you are looking to get really wild, you can use the docs provided by Zilog here though I must warn you that if you are not fairly comfortable with Z80 Assembly, the reading material will be too dense.

Related

How to deal with a bad char in a shellcode buffer overflow?

So I got recently interested in buffer overflow and many tutorials and recourses online have this CTF like attack where you need to read the content of a flag file (using cat for example).
So I started looking online for assembly examples of how to do this and I came accross sites like this or shell-storm where there are plenty of examples on how to do this.
So I generated my exploit and got this machine code (it basically executes a shell doing cat flag):
shellcode = b'\x31\xc0\x50\x68\x2f\x63\x61\x74\x68\x2f\x62\x69\x6e\x89\xe3\x50\x68\x66\x6c\x61\x67\x89\xe1\x50\x51\x53\x89\xe1\x31\xc0\x83\xc0\x0b\xcd\x80'
The problem is that, thanks to stepping in with GDB to debug the problem, I noticed that my buffer doesn't get copied starting with \x0b towards the end of the shell code. I know the problem is there because if I change it to say \x3b then it works (with the rest of my exploits not copied here) even if it obviously crashes when it reaches the wrong value there but at least the whole buffer gets copied. Now doing some research it seems like \x0b is a "bad char" which can cause issues and should be avoided. Having said this I don't understand how:
All those online and even university tutorials use that shell code
for this exact task.
How to potentially fix this. Is it even possible without completely
change the assembly code?
I will add that I am on Ubuntu and trying to make this work on 64 bits.
One thing that's special about byte 0x0b is it's ASCII Vertical Tab, which is considered a whitespace character.
So I'm going to make a wild guess that the code you're exploiting looks something like
// Dangerous code, DO NOT USE
char buf[TOO_SMALL];
scanf("%s", buf);
since scanf("%s") is a commonly (mis)used input mechanism that stops when it hits whitespace. If so, then if your shellcode contains 0x0b or any other whitespace character, it will get truncated.
To your first question, as to "why do other tutorials use shellcode like this", they may be thinking instead of exploiting code like
// Dangerous code, DO NOT USE
char buf[TOO_SMALL];
gets(buf);
where gets() will not stop reading at 0x0b but only at newline 0x0a. Or maybe they are thinking of a buffer filled by strcpy() which will only stop at 0x00, or maybe a buffer filled by read() with a user-controlled size which will read the full amount of data no matter what bytes it contains. So the question of which characters are "bad" depends on what the vulnerable code actually does.
As to how to handle it, well, you need to modify your shellcode to use only instructions that don't contain any whitespace bytes. This sort of thing is more an art than a science; you have to know your instruction set well, and be creative in thinking about alternative instruction sequences to achieve the desired result. Sometimes you may be able to do it with minor tweaks; other times a wholesale rewrite may be needed. It really varies.
In this case, luckily the 0x0b is the only whitespace character in the whole code, and it appears in the instruction
83C00B add eax, 0x0b
Since eax was previously zeroed, the goal is to load it with the value 0xb which is the system call number of execve. When the "bad byte" appears as part of immediate data, it is usually not too hard to find another way to get that data to where it needs to go. (Life is harder when the bad byte is part of the opcode itself.) In this case, a simple solution is to take advantage of two's complement, and write instead
83E8F5 sub eax, -0x0b
The single byte -0x0b = 0xf5 gets sign-extended to 32 bits and used as the value to subtract, which leaves 0x0b in eax as desired. Of course there are lots of other ways, some of which may have smaller code size; I'll leave this to your ingenuity.
To find out the "bad char" for the shellcode is an important step to exploit an overflow vulneribility.
first, you have to figure out how many bits the target can be overflow (this field is also for the shellcode). if this zone is big enough and you can use all the "char"(google bad char from \x01 to \xff. \x00 is bad char) to act as shellcode send to target.
Then you can get find the Register to see what the char left.(if the zone is not big enough for all the chars you can send just some chars one time and repeat)
you can follow this https://netsec.ws/?p=180.

RISC-V user level reference or reference implementation

Summary: What is the definitive reference or reference implementation for the RISC-V user-level ISA?
Context: The RISC-V website has "The RISC-V Instruction Set Manual" which explains the user-level instructions very well, but does not give an exact specification for them. I am trying to build a user-level ISA simulator now and intend to write an FPGA implementation later, so the exact behavior is important to me.
A reference implementation would be sufficient, but should preferably be as simple as possible -- i.e. I would try to understand a pipelined implementation only as a last resort. What is important is to have an understanding of the specified ISA and not of a single CPU implementation or compiler implementation.
One example to show my problem is the AUIPC instruction: The prose explanation says that "AUIPC forms a 32-bit offset from the 20-bit U-immediate, filling in the lowest 12 bits with zeros, adds this offset to the pc, then places the result in register rd." I wanted to know whether this refers to the old or new PC, i.e. the position of the AUIPC instruction or the next instruction. I looked at the "RISCV Angel" implementation, but that seems to mask out the lower bits of the (old) PC -- not just of the immediate -- which I could not find any reason for in the spec, not even in the change history of the spec (since Angel is a bit older). Instead of an answer, I now have two questions about AUIPC. Many other instructions pose similar problems to me.
AFAICT the RISC-V Instruction Set Manual you cite is the closest thing there is to a definitive reference. If there are things that are unclear or incorrect in there then you could open issues at the Github site where that document is maintained: https://github.com/riscv/riscv-isa-manual
As far as AIUPC is concerned, the answer is implied, but not stated explicitly, by this sentence at the bottom of page 9 in the current manual:
There is one additional user-visible register: the program counter pc holds the address of the current instruction.
Based on that statement I would expect that the pc value that is seen and manipulated by the AIUPC instruction is the address of the AIUPC instruction itself.
This interpretation is supported by the discussion of the JALR instruction:
The indirect jump instruction JALR (jump and link register) uses the I-type encoding. The target address is obtained by adding the 12-bit signed I-immediate to the register rs1, then setting the least-signicant bit of the result to zero. The address of the instruction following the jump (pc+4) is written to register rd.
Given that the address of the following instruction is expressed as pc+4, it seems clear that the pc value visible during the execution of JALR is the address of the JALR instruction itself.
The latest draft of the manual (at https://github.com/riscv/riscv-isa-manual/releases/download/draft-20190321-ba17106/riscv-spec.pdf) makes the situation slightly clearer. In place of this in the current manual:
AUIPC appends 12 low-order zero bits to the 20-bit U-immediate, sign-extends the result to 64 bits, then adds it to the pc and places the result in register rd.
the latest draft says:
AUIPC forms a 32-bit offset from the 20-bit U-immediate, filling in the lowest 12 bits with zeros, adds this offset to the pc of the AUIPC instruction, then places the result in register rd.

How to tell compiler to pad a specific amount of bytes on every C function?

I'm trying to practice some live instrumentation and I saw there was a linker option -call-nop=prefix-nop, but it has some restriction as it only works with GOT function (I don't know how to force compiler to generate GOT function, and not sure if it's good idea for performance reason.) Also, -call-nop=* cannot pad more than 1 byte.
Ideally, I'd like to see a compiler option to pad any specific amount of bytes, and compiler will still perform all the normal function alignment.
Once I have this pad area, I can at run time to reuse these padding area to store some values or redirect the control flow.
P.S. I believe Linux kernel use similar trick to dynamically enable some software tracepoint.
-pg is intended for profile-guided optimization. The correct option for this is -fpatchable-function-entry
-fpatchable-function-entry=N[,M]
Generate N NOPs right at the beginning of each function, with the function entry point before the Mth NOP. If M is omitted, it defaults to 0 so the function entry points to the address just at the first NOP. The NOP instructions reserve extra space which can be used to patch in any desired instrumentation at run time, provided that the code segment is writable. The amount of space is controllable indirectly via the number of NOPs; the NOP instruction used corresponds to the instruction emitted by the internal GCC back-end interface gen_nop. This behavior is target-specific and may also depend on the architecture variant and/or other compilation options.
It'll insert N single-byte 0x90 NOPs and doesn't make use of multi-byte NOPs thus performance isn't as good as it should, but you probably don't care about that in this case so the option should work fine
I achieved this goal by implement my own mcount function in an assembly file and compile the code with -pg.

instruction set emulator guide

I am interested in writing emulators like for gameboy and other handheld consoles, but I read the first step is to emulate the instruction set. I found a link here that said for beginners to emulate the Commodore 64 8-bit microprocessor, the thing is I don't know a thing about emulating instruction sets. I know mips instruction set, so I think I can manage understanding other instruction sets, but the problem is what is it meant by emulating them?
NOTE: If someone can provide me with a step-by-step guide to instruction set emulation for beginners, I would really appreciate it.
NOTE #2: I am planning to write in C.
NOTE #3: This is my first attempt at learning the whole emulation thing.
Thanks
EDIT: I found this site that is a detailed step-by-step guide to writing an emulator which seems promising. I'll start reading it, and hope it helps other people who are looking into writing emulators too.
Emulator 101
An instruction set emulator is a software program that reads binary data from a software device and carries out the instructions that data contains as if it were a physical microprocessor accessing physical data.
The Commodore 64 used a 6502 Microprocessor. I wrote an emulator for this processor once. The first thing you need to do is read the datasheets on the processor and learn about its behavior. What sort of opcodes does it have, what about memory addressing, method of IO. What are its registers? How does it start executing? These are all questions you need to be able to answer before you can write an emulator.
Here is a general overview of how it would look like in C (Not 100% accurate):
uint8_t RAM[65536]; //Declare a memory buffer for emulated RAM (64k)
uint16_t A; //Declare Accumulator
uint16_t X; //Declare X register
uint16_t Y; //Declare Y register
uint16_t PC = 0; //Declare Program counter, start executing at address 0
uint16_t FLAGS = 0 //Start with all flags cleared;
//Return 1 if the carry flag is set 0 otherwise, in this example, the 3rd bit is
//the carry flag (not true for actual 6502)
#define CARRY_FLAG(flags) ((0x4 & flags) >> 2)
#define ADC 0x69
#define LDA 0xA9
while (executing) {
switch(RAM[PC]) { //Grab the opcode at the program counter
case ADC: //Add with carry
A = X + RAM[PC+1] + CARRY_FLAG(FLAGS);
UpdateFlags(A);
PC += ADC_SIZE;
break;
case LDA: //Load accumulator
A = RAM[PC+1];
UpdateFlags(X);
PC += MOV_SIZE;
break;
default:
//Invalid opcode!
}
}
According to this reference ADC actually has 8 opcodes in the 6502 processor, which means you will have 8 different ADC in your switch statement, each one for different opcodes and memory addressing schemes. You will have to deal with endianess and byte order, and of course pointers. I would get a solid understanding of pointer and type casting in C if you dont already have one. To manipulate the flags register you have to have a solid understanding of bitwise operations in C. If you are clever you can make use of C macros and even function pointers to save yourself some work, as the CARRY_FLAG example above.
Every time you execute an instruction, you must advance the program counter by the size of that instruction, which is different for each opcode. Some opcodes dont take any arguments and so their size is just 1 byte, while others take 16-bit integers as in my MOV example above. All this should be pretty well documented.
Branch instructions (JMP, JE, JNE etc) are simple: If some flag is set in the flags register then load the PC to the address specified. This is how "decisions" are made in a microprocessor and emulating them is simply a matter of changing the PC, just as the real microprocessor would do.
The hardest part about writing an instruction set emulator is debugging. How do you know if everything is working like it should? There are plenty of resources for helping you. People have written test codes that will help you debug every instruction. You can execute them one instruction at a time and compare the reference output. If something is different, you know you have a bug somewhere and can fix it.
This should be enough to get you started. The important thing is that you have A) A good solid understanding of the instruction set you want to emulate and B) a solid understanding of low level data manipulation in C, including type casting, pointers, bitwise operations, byte order, etc.

Doing file operations with 64-bit addresses in C + MinGW32

I'm trying to read in a 24 GB XML file in C, but it won't work. I'm printing out the current position using ftell() as I read it in, but once it gets to a big enough number, it goes back to a small number and starts over, never even getting 20% through the file. I assume this is a problem with the range of the variable that's used to store the position (long), which can go up to about 4,000,000,000 according to http://msdn.microsoft.com/en-us/library/s3f49ktz(VS.80).aspx, while my file is 25,000,000,000 bytes in size. A long long should work, but how would I change what my compiler(Cygwin/mingw32) uses or get it to have fopen64?
The ftell() function typically returns an unsigned long, which only goes up to 232 bytes (4 GB) on 32-bit systems. So you can't get the file offset for a 24 GB file to fit into a 32-bit long.
You may have the ftell64() function available, or the standard fgetpos() function may return a larger offset to you.
You might try using the OS provided file functions CreateFile and ReadFile. According to the File Pointers topic, the position is stored as a 64bit value.
Unless you can use a 64-bit method as suggested by Loadmaster, I think you will have to break the file up.
This resource seems to suggest it is possible using _telli64(). I can't test this though, as I don't use mingw.
I don't know of any way to do this in one file, a bit of a hack but if splitting the file up properly isn't a real option, you could write a few functions that temp split the file, one that uses ftell() to move through the file and swaps ftell() to a new file when its reaching the split point, then another that stitches the files back together before exiting. An absolutely botched up approach, but if no better solution comes to light it could be a way to get the job done.
I found the answer. Instead of using fopen, fseek, fread, fwrite... I'm using _open, lseeki64, read, write. And I am able to write and seek in > 4GB files.
Edit: It seems the latter functions are about 6x slower than the former ones. I'll give the bounty anyone who can explain that.
Edit: Oh, I learned here that read() and friends are unbuffered. What is the difference between read() and fread()?
Even if the ftell() in the Microsoft C library returns a 32-bit value and thus obviously will return bogus values once you reach 2 GB, just reading the file should still work fine. Or do you need to seek around in the file, too? For that you need _ftelli64() and _fseeki64().
Note that unlike some Unix systems, you don't need any special flag when opening the file to indicate that it is in some "64-bit mode". The underlying Win32 API handles large files just fine.

Resources