Why does a fully static Rust ELF binary have a Global Offset Table (GOT) section? - rust

This code, when compiled for the x86_64-unknown-linux-musl target, produces a .got section:
fn main() {
    println!("Hello, world!");
}
$ cargo build --release --target x86_64-unknown-linux-musl
$ readelf -S hello
There are 30 section headers, starting at offset 0x26dc08:
Section Headers:
[Nr] Name Type Address Offset
Size EntSize Flags Link Info Align
...
[12] .got PROGBITS 0000000000637b58 00037b58
00000000000004a8 0000000000000008 WA 0 0 8
...
According to this answer for analogous C code, the .got section is an artifact that can be safely removed. However, it segfaults for me:
$ objcopy -R.got hello hello_no_got
$ ./hello_no_got
[1] 3131 segmentation fault (core dumped) ./hello_no_got
Looking at the disassembly, I see that the GOT basically holds static function addresses:
$ objdump -d hello -M intel
...
0000000000400340 <_ZN5hello4main17h5d434a6e08b2e3b8E>:
...
40037c: ff 15 26 7a 23 00 call QWORD PTR [rip+0x237a26] # 637da8 <_GLOBAL_OFFSET_TABLE_+0x250>
...
$ objdump -s -j .got hello | grep 637da8
637da8 50434000 00000000 b0854000 00000000 PC#.......#.....
$ objdump -d hello -M intel | grep 404350
0000000000404350 <_ZN3std2io5stdio6_print17h522bda9f206d7fddE>:
404350: 41 57 push r15
The number 404350 comes from 50434000 00000000, which is the little-endian encoding of 0x0000000000404350 (this was not obvious; I had to run the binary under GDB to figure this out!)
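The byte swap can also be checked without a debugger. A minimal sketch, using the first eight bytes that objdump printed for the GOT slot at 0x637da8 (the variable names are illustrative only):

```python
# The first 8 bytes objdump -s printed for the GOT slot at 0x637da8.
slot = bytes.fromhex("50434000" "00000000")

# x86-64 stores integers least-significant byte first.
address = int.from_bytes(slot, byteorder="little")

print(hex(address))
```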
It is perplexing that the GOT holds static function addresses, since Wikipedia says that
[GOT] is used by executed programs to find during runtime addresses of global variables, unknown in compile time. The global offset table is updated in process bootstrap by the dynamic linker.
Why is the GOT present? From the disassembly, it looks like the compiler knows all the needed addresses. As far as I know, there is no bootstrap done by the dynamic linker: neither an INTERP nor a DYNAMIC program header is present in my binary.
Why does the GOT store function pointers? Wikipedia says the GOT is only for global variables, and function pointers should be contained in the PLT.

TL;DR summary: the GOT is really a rudimentary build artifact, which I was able to get rid of via simple machine code manipulations.
Breakdown
If we look at
$ objdump -dj .text hello
and search for GLOBAL, we see only four distinct types of references to the GOT (constants differ):
40037c: ff 15 26 7a 23 00 call QWORD PTR [rip+0x237a26] # 637da8 <_GLOBAL_OFFSET_TABLE_+0x250>
425903: ff 25 5f 26 21 00 jmp QWORD PTR [rip+0x21265f] # 637f68 <_GLOBAL_OFFSET_TABLE_+0x410>
41d8b5: 48 3b 1d b4 a5 21 00 cmp rbx,QWORD PTR [rip+0x21a5b4] # 637e70 <_GLOBAL_OFFSET_TABLE_+0x318>
40b259: 48 83 3d 7f cb 22 00 cmp QWORD PTR [rip+0x22cb7f],0x0 # 637de0 <_GLOBAL_OFFSET_TABLE_+0x288>
40b260: 00
All of these instructions only read from the GOT, which means that the GOT is not modified at runtime. This in turn means that we can statically resolve the addresses that the GOT refers to! Let's consider the reference types one by one:
call QWORD PTR [rip+0x237a26] simply says "go to address [rip+0x237a26], take 8 bytes from there, interpret them as a function address and call the function". We can simply replace this instruction with a direct call:
40037c: e8 cf 3f 00 00 call 404350 <_ZN3std2io5stdio6_print17h522bda9f206d7fddE>
400381: 90 nop
Notice the nop at the end: we need to replace all 6 bytes of machine code that constitute the original instruction, but the instruction we replace it with is only 5 bytes, so we need to pad it. Fundamentally, as we are patching a compiled binary, we can replace an instruction with another one only if the new one is not longer.
jmp QWORD PTR [rip+0x21265f] is the same as the previous one, but instead of calling an address it jumps to it. This turns into:
425903: e9 b8 f7 ff ff jmp 4250c0 <_ZN68_$LT$core..fmt..builders..PadAdapter$u20$as$u20$core..fmt..Write$GT$9write_str17hc384e51187942069E>
425908: 90 nop
cmp rbx,QWORD PTR [rip+0x21a5b4] - this takes 8 bytes from [rip+0x21a5b4] and compares them to the contents of the rbx register. This one is tricky, since cmp cannot compare register contents to a 64-bit immediate value. We could use another register for that, but we don't know which of the registers are used around this instruction. A careful solution would be something like
push rax
mov rax,0x0000006363c0
cmp rbx,rax
pop rax
But that would be way beyond our limit of 7 bytes. The real solution stems from an observation that the GOT contains only addresses; our address space is (roughly) contained in range [0x400000; 0x650000], which can be seen in the program headers:
$ readelf -l hello
...
Program Headers:
Type Offset VirtAddr PhysAddr
FileSiz MemSiz Flags Align
LOAD 0x0000000000000000 0x0000000000400000 0x0000000000400000
0x0000000000035b50 0x0000000000035b50 R E 0x200000
LOAD 0x0000000000036380 0x0000000000636380 0x0000000000636380
0x0000000000001dd0 0x0000000000003918 RW 0x200000
...
It follows that we can (mostly) get away with only comparing 4 bytes of a GOT entry instead of 8. So the substitution is:
41d8b5: 81 fb c0 63 63 00 cmp ebx,0x6363c0
41d8bb: 90 nop
The last one consists of two lines of objdump output, since 8 bytes do not fit in one line:
40b259: 48 83 3d 7f cb 22 00 cmp QWORD PTR [rip+0x22cb7f],0x0 # 637de0 <_GLOBAL_OFFSET_TABLE_+0x288>
40b260: 00
It just compares 8 bytes of the GOT to a constant (in this case, 0x0). In fact, we can do the comparison statically; if the operands compare equal, we replace the comparison with
40b259: 48 39 c0 cmp rax,rax
40b25c: 90 nop
40b25d: 90 nop
40b25e: 90 nop
40b25f: 90 nop
40b260: 90 nop
Obviously, a register is always equal to itself. A lot of padding needed here!
If the left operand is greater than the right one, we replace the comparison with
40b259: 48 83 fc 00 cmp rsp,0x0
40b25d: 90 nop
40b25e: 90 nop
40b25f: 90 nop
40b260: 90 nop
In practice, rsp is always greater than zero.
If the left operand is smaller than the right one, things get a bit more complicated, but since we have a whole lot of bytes (8!) we can manage:
40b259: 50 push rax
40b25a: 31 c0 xor eax,eax
40b25c: 83 f8 01 cmp eax,0x1
40b25f: 58 pop rax
40b260: 90 nop
Notice that the second and the third instructions use eax instead of rax, since cmp and xor involving eax take one less byte than with rax.
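The arithmetic behind the first three substitutions can be sketched in a few lines of Python. This is only an illustration of the encodings discussed above, not the script used below; the helper names are made up for this example, and the addresses are the ones from the objdump listing.

```python
import struct


def rip_relative_target(insn_addr, insn_len, disp32):
    """Address read by a [rip+disp32] operand; RIP points at the next instruction."""
    return insn_addr + insn_len + disp32


def direct_branch(opcode, insn_addr, target, old_len=6):
    """Encode call/jmp rel32 (opcode 0xE8 or 0xE9), NOP-padded to old_len bytes."""
    rel32 = target - (insn_addr + 5)  # relative to the end of the 5-byte instruction
    # struct.pack raises struct.error if the distance does not fit in 32 bits.
    return bytes([opcode]) + struct.pack("<i", rel32) + b"\x90" * (old_len - 5)


def cmp_ebx_imm32(value, old_len=7):
    """Encode cmp ebx, imm32 (81 /7 with ModRM 0xFB), NOP-padded to old_len bytes."""
    return b"\x81\xfb" + struct.pack("<I", value) + b"\x90" * (old_len - 6)


# GOT slot read by the 6-byte call at 0x40037c (ff 15 26 7a 23 00).
print(hex(rip_relative_target(0x40037C, 6, 0x237A26)))
# Replacement bytes for the call, the jmp and the register comparison.
print(direct_branch(0xE8, 0x40037C, 0x404350).hex(" "))
print(direct_branch(0xE9, 0x425903, 0x4250C0).hex(" "))
print(cmp_ebx_imm32(0x6363C0).hex(" "))
```

The printed bytes should match the patched instructions shown in the listings above, including the trailing nop.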
Testing
I have written a Python script to do all these substitutions automatically (it's a bit hacky and relies on parsing of objdump output though):
#!/usr/bin/env python3
import re
import sys
import argparse
import subprocess


def read_u64(binary):
    # Little-endian 64-bit integer from the first 8 bytes
    return sum(binary[i] * 256 ** i for i in range(8))


def distance_u32(start, end):
    # Signed 32-bit distance, returned in its unsigned (two's complement) form
    assert abs(end - start) < 2 ** 31
    diff = end - start
    if diff < 0:
        return 2 ** 32 + diff
    else:
        return diff


def to_u32(x):
    # Little-endian encoding of an unsigned 32-bit integer
    assert 0 <= x < 2 ** 32
    return bytes((x // (256 ** i)) % 256 for i in range(4))


class GotInstruction:
    def __init__(self, lines, symbol_address, symbol_offset):
        self.address = int(lines[0].split(":")[0].strip(), 16)
        self.offset = symbol_offset + (self.address - symbol_address)
        self.got_offset = int(lines[0].split("(File Offset: ")[1].strip().strip(")"), 16)
        self.got_offset = self.got_offset % 0x200000  # No idea why the offset is actually wrong
        self.bytes = []
        for line in lines:
            self.bytes += [int(x, 16) for x in line.split("\t")[1].split()]


class TextDump:
    symbol_regex = re.compile(r"^([0-9,a-f]{16}) <(.*)> \(File Offset: 0x([0-9,a-f]*)\):")

    def __init__(self, binary_path):
        self.got_instructions = []
        objdump_output = subprocess.check_output(["objdump", "-Fdj", ".text", "-M", "intel",
                                                  binary_path])
        lines = objdump_output.decode("utf-8").split("\n")
        current_symbol_address = 0
        current_symbol_offset = 0
        for line_group in self.group_lines(lines):
            match = self.symbol_regex.match(line_group[0])
            if match is not None:
                current_symbol_address = int(match.group(1), 16)
                current_symbol_offset = int(match.group(3), 16)
            elif "_GLOBAL_OFFSET_TABLE_" in line_group[0]:
                instruction = GotInstruction(line_group, current_symbol_address,
                                             current_symbol_offset)
                self.got_instructions.append(instruction)

    @staticmethod
    def group_lines(lines):
        if not lines:
            return
        line_group = [lines[0]]
        for line in lines[1:]:
            if line.count("\t") == 1:  # this line continues the previous one
                line_group.append(line)
            else:
                yield line_group
                line_group = [line]
        yield line_group

    def __iter__(self):
        return iter(self.got_instructions)


def read_binary_file(path):
    try:
        with open(path, "rb") as f:
            return f.read()
    except (IOError, OSError) as exc:
        print(f"Failed to open {path}: {exc.strerror}")
        sys.exit(1)


def write_binary_file(path, content):
    try:
        with open(path, "wb") as f:
            f.write(content)
    except (IOError, OSError) as exc:
        print(f"Failed to write {path}: {exc.strerror}")
        sys.exit(1)


def patch_got_reference(instruction, binary_content):
    got_data = read_u64(binary_content[instruction.got_offset:])
    code = instruction.bytes
    if code[0] == 0xff:
        assert len(code) == 6
        relative_address = distance_u32(instruction.address, got_data)
        if code[1] == 0x15:  # call QWORD PTR [rip+...]
            patch = b"\xe8" + to_u32(relative_address - 5) + b"\x90"
        elif code[1] == 0x25:  # jmp QWORD PTR [rip+...]
            patch = b"\xe9" + to_u32(relative_address - 5) + b"\x90"
        else:
            raise ValueError(f"unknown machine code: {code}")
    elif code[:3] == [0x48, 0x83, 0x3d]:  # cmp QWORD PTR [rip+...],<BYTE>
        assert len(code) == 8
        if got_data == code[7]:
            patch = b"\x48\x39\xc0" + b"\x90" * 5  # cmp rax,rax
        elif got_data > code[7]:
            patch = b"\x48\x83\xfc\x00" + b"\x90" * 4  # cmp rsp,0x0
        else:
            patch = b"\x50\x31\xc0\x83\xf8\x01\x58\x90"  # push rax
                                                         # xor eax,eax
                                                         # cmp eax,0x1
                                                         # pop rax
                                                         # nop
    elif code[:3] == [0x48, 0x3b, 0x1d]:  # cmp rbx,QWORD PTR [rip+...]
        assert len(code) == 7
        patch = b"\x81\xfb" + to_u32(got_data) + b"\x90"  # cmp ebx,<DWORD>
    else:
        raise ValueError(f"unknown machine code: {code}")
    return dict(offset=instruction.offset, data=patch)


def make_got_patches(binary_path, binary_content):
    patches = []
    text_dump = TextDump(binary_path)
    for instruction in text_dump.got_instructions:
        patches.append(patch_got_reference(instruction, binary_content))
    return patches


def apply_patches(binary_content, patches):
    for patch in patches:
        offset = patch["offset"]
        data = patch["data"]
        binary_content = binary_content[:offset] + data + binary_content[offset + len(data):]
    return binary_content


def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("binary_path", help="Path to ELF binary")
    parser.add_argument("-o", "--output", help="Output file path", required=True)
    args = parser.parse_args()
    binary_content = read_binary_file(args.binary_path)
    patches = make_got_patches(args.binary_path, binary_content)
    patched_content = apply_patches(binary_content, patches)
    write_binary_file(args.output, patched_content)


if __name__ == "__main__":
    main()
Now we can get rid of the GOT for real:
$ cargo build --release --target x86_64-unknown-linux-musl
$ ./resolve_got.py target/x86_64-unknown-linux-musl/release/hello -o hello_no_got
$ objcopy -R.got hello_no_got
$ readelf -e hello_no_got | grep .got
$ ./hello_no_got
Hello, world!
I have also tested it on my ~3k LOC app, and it seems to work alright.
P.S. I am not an expert in assembly, so some of the above might be inaccurate.

Related

python3 minimalmodbus query PZEM-014 -> PZEM-016 AC and DC multimeters

[SOLVED] Could someone help on this? I use minimalmodbus on python 3.11.1 to query a USB<->RS485 transceiver (see picture below) that is connected to a PZEM-016 (for measuring AC power consumption: voltage, current, power, energy, frequency...).
When used with the Simply Modbus Master 8.1.2, I can read the data without any problem.
With Python and minimalmodbus, I can only get up to 7 registers and of course the data is wrong (I can't pass the 7th register).
Here is the code
import minimalmodbus
#######################################
# CONSTANTS declaration
#######################################
# Communication constants
PORT_NAME ='COM3'
BAUD_RATE = 9600
BYTE_SIZE = 8
STOP_BITS = 1
TIMEOUT_VAL = 0.7
SLAVE_ADDR = 1
READ_MODE_NB = 4 # minimalmodbus defaults to 3; the PZEM requires 4, otherwise it returns an error
WRITE_FCT_NB = 16 #HEX: 0x10
DEBUG_VAL = True
ECHO_VAL = False # Changed back from True to False after solution found
# Register constants
BASE_REG = 0
VOLT_REG = 0 #0x0000
CURR_LOW_REG = 1 #0x0001
CURR_HIGH_REG = 2 #0x0002
POWER_LOW_REG = 3 #0x0003
POWER_HIGH_REG = 4 #0x0004
ENGY_LOW_REG = 5 #0x0005
ENGY_HIGH_REG = 6 #0x0006
FREQCY_REG = 7 #0x0007
POWER_FACTOR_REG = 8 #0x0008
ALARM_REG = 9 #0x0009
#######################################
# Variables initialization
#######################################
instrument = ''
response = ''
voltage = ''
current_high = ''
current_low = ''
power_high = ''
power_low = ''
energy_high = ''
energy_low = ''
frequency = ''
power_factor = ''
alarm_status = ''
#######################################
# Set up the instrument
#######################################
instrument = minimalmodbus.Instrument(PORT_NAME, SLAVE_ADDR)
#######################################
# Explicit instrument settings
#######################################
instrument.serial.baudrate = BAUD_RATE # baud rate
instrument.serial.bytesize = BYTE_SIZE # data bits
instrument.serial.parity = minimalmodbus.serial.PARITY_NONE # parity
instrument.serial.stopbits = STOP_BITS # stop bit
instrument.serial.timeout = TIMEOUT_VAL # seconds
instrument.mode = minimalmodbus.MODE_RTU # communication protocol
# instrument.address = SLAVE_ADDR # slave address when only one slave in the bus
instrument.debug = DEBUG_VAL
instrument.handle_local_echo = ECHO_VAL
instrument.close_port_after_each_call = True
instrument.clear_buffers_before_each_transaction = True
#######################################
# Typical request format of the master
# Slave address: 0x01
# Command type: 0x04
# First register address high = left bytes: 0x00
# First register address low = right bytes: 0x00 (=> 0x0000 = 0)
# Number of registers high = left bytes: 0x00
# Number of registers low = right bytes: 0x0A (=> 0x000A = 10)
# CRC check, first byte sent = CRC low byte: 0x70
# CRC check, second byte sent = CRC high byte: 0x0D (=> CRC value 0x0D70, sent as 70 0D)
# 01 04 00 00 00 0A 70 0D
# Typical reply format of the slave
# Slave address: 0x01
# Command type: 0x04
# Number of bytes: 0x14 (= 20 bytes = 2 bytes per requested register)
# Register 1: Voltage 16 bits
# Register 1 data high byte = left byte: 0x09
# Register 1 data low byte = right byte: 0x2E (=> 0x092E = 2350 = 235.0V)
# Register 2 + 3: Current 32 bits (register 2 = low word, register 3 = high word)
# Register 2 data high byte = left byte: 0x01
# Register 2 data low byte = right byte: 0x78
# Register 3 data high byte = left byte: 0x00
# Register 3 data low byte = right byte: 0x00 (=> 0x00000178 = 376 = 0.376A)
# Register 4 + 5: Power 32 bits (register 4 = low word, register 5 = high word)
# Register 4 data high byte = left byte: 0x01
# Register 4 data low byte = right byte: 0xE7
# Register 5 data high byte = left byte: 0x00
# Register 5 data low byte = right byte: 0x00 (=> 0x000001E7 = 487 = 48.7W)
# Register 6 + 7: Energy 32 bits (register 6 = low word, register 7 = high word)
# Register 6 data high byte = left byte: 0x02
# Register 6 data low byte = right byte: 0xEA
# Register 7 data high byte = left byte: 0x00
# Register 7 data low byte = right byte: 0x00 (=> 0x000002EA = 746 = 746Wh)
# Register 8: Frequency 16 bits
# Register 8 data high byte = left byte: 0x01
# Register 8 data low byte = right byte: 0xF4 (=> 0x01F4 = 500 = 50.0Hz)
# Register 9: Power factor 16 bits
# Register 9 data high byte = left byte: 0x00
# Register 9 data low byte = right byte: 0x37 (=> 0x0037 = 55 = 0.55)
# Register 10: Alarm status 16 bits
# Register 10 data high byte = left byte: 0x00
# Register 10 data low byte = right byte: 0x00 (=> 0x0000 = 0 = No alarm)
# CRC check, first byte sent = CRC low byte: 0x39 (= 57)
# CRC check, second byte sent = CRC high byte: 0x51 (= 81) (sent as 39 51)
# LED lights turned on
# 01 04 14 09 2E 01 78 00 00 01 E7 00 00 02 EA 00 00 01 F4 00 37 00 00 39 51
# LED lights turned off
# 01 04 14 09 34 00 00 00 00 00 05 00 00 02 E4 00 00 01 F3 00 64 00 00 E5 FD
#######################################
print('instrument: ', instrument)
#######################################
# Read and print registers' data
#######################################
response = instrument.read_registers(VOLT_REG, 10, READ_MODE_NB)
print('Response raw: ', response)
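For reference, the reply layout described in the comments can be decoded by hand. The following is a minimal sketch that only uses the "LED lights turned on" frame quoted above and the scale factors given in those comments; the `combine` helper and the `readings` dictionary are names made up for this illustration, and the CRC is not checked here.

```python
import struct

# Reply frame quoted in the comments above ("LED lights turned on").
frame = bytes.fromhex(
    "01 04 14 09 2E 01 78 00 00 01 E7 00 00 02 EA 00 00 01 F4 00 37 00 00 39 51"
)

slave, function, byte_count = frame[0], frame[1], frame[2]
# Ten big-endian 16-bit registers follow the 3-byte header; the last 2 bytes are the CRC.
regs = struct.unpack(">10H", frame[3:3 + byte_count])


def combine(low_word, high_word):
    """Join two 16-bit registers into one 32-bit value (low word comes first)."""
    return (high_word << 16) | low_word


readings = {
    "voltage_V": regs[0] / 10,
    "current_A": combine(regs[1], regs[2]) / 1000,
    "power_W": combine(regs[3], regs[4]) / 10,
    "energy_Wh": combine(regs[5], regs[6]),
    "frequency_Hz": regs[7] / 10,
    "power_factor": regs[8] / 100,
    "alarm": regs[9],
}
print(readings)
```

The list returned by read_registers() should correspond to `regs` here, so the same scaling can be applied to it.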
With the wrong read mode value, I get an error message with the read_registers() command
>>> ReadSerial.py
< MinimalModbus debug mode. Will write to instrument (expecting 25 bytes back): 01 03 00 00 00 0A C5 CD (8 bytes)
MinimalModbus debug mode. Clearing serial buffers for port COM3
MinimalModbus debug mode. No sleep required before write. Time since previous read: 167521078.00 ms, minimum silent period: 4.01 ms.
MinimalModbus debug mode. Closing port COM3
MinimalModbus debug mode. Response from instrument: 01 83 02 C0 F1 (5 bytes), roundtrip time: 0.8 ms. Timeout for reading: 700.0 ms.
< Traceback (most recent call last):
File "...\ReadSerial.py", line 127, in <module>
response = instrument.read_registers(VOLT_REG, 10, READ_MODE_NB)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "...\minimalmodbus.py", line 904, in read_registers
returnvalue = self._generic_command(
^^^^^^^^^^^^^^^^^^^^^^
File "...\minimalmodbus.py", line 1245, in _generic_command
payload_from_slave = self._perform_command(functioncode, payload_to_slave)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "...\minimalmodbus.py", line 1329, in _perform_command
payload_from_slave = _extract_payload(
^^^^^^^^^^^^^^^^^
File "...\minimalmodbus.py", line 1880, in _extract_payload
_check_response_slaveerrorcode(response)
File "...\minimalmodbus.py", line 3538, in _check_response_slaveerrorcode
raise error
minimalmodbus.IllegalRequestError: Slave reported illegal data address
I matched everything I could with the Simply Modbus Master example, except for the read function code, whose value should be 4 (the minimalmodbus default is 3).
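The frames in the debug output can be reproduced and interpreted by hand. Below is a small sketch, independent of minimalmodbus, that builds a read request with the standard Modbus RTU CRC-16 and decodes an exception reply such as 01 83 02 C0 F1; the function names and the short exception table are assumptions made for this example.

```python
def modbus_crc16(data: bytes) -> bytes:
    """CRC-16/MODBUS (init 0xFFFF, reflected polynomial 0xA001), low byte first."""
    crc = 0xFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ 0xA001 if crc & 1 else crc >> 1
    return bytes([crc & 0xFF, crc >> 8])


def read_request(slave, function, start, count):
    """Build a read request frame: address, function, start register, count, CRC."""
    body = bytes([slave, function]) + start.to_bytes(2, "big") + count.to_bytes(2, "big")
    return body + modbus_crc16(body)


# Only the first few standard exception codes are listed here.
EXCEPTIONS = {1: "illegal function", 2: "illegal data address", 3: "illegal data value"}


def describe_exception(frame):
    """Return (original function code, reason) for an exception reply, else None."""
    function, code = frame[1], frame[2]
    if not function & 0x80:  # exception replies set the top bit of the function code
        return None
    return function & 0x7F, EXCEPTIONS.get(code, "unknown")


print(read_request(1, 3, 0, 10).hex(" "))  # request sent with function code 3
print(read_request(1, 4, 0, 10).hex(" "))  # request sent with function code 4
print(describe_exception(bytes.fromhex("01 83 02 C0 F1")))
```

The reply 01 83 02 is therefore function code 3 with the error bit set and exception code 2, which is what minimalmodbus reports as "illegal data address".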
Update from above
I finally found the error in my code, which was very stupid, but errors are often stupid like a missing ";".
I still have to figure out why the read_register() command doesn't work (without the "s" for reading one register at a time). I can't pass the 7th register without errors and the returned data is wrong.
I leave the question and the answer for the community, in case someone uses the same PZEM-014 -> PZEM-017 devices and is looking for a Python solution to retrieve the data.

How does a BL instruction that jumps to an invalid instruction still manage to work correctly

I'm practicing reverse engineering on an il2cpp Unity project.
Things I have done:
get the apk
use Apktool to extract the files
open libunity.so with Ghidra (IDA works too)
And I found a weird block of instructions like:
004ac818 f4 0f 1e f8 str x20,[sp, #local_20]!
004ac81c f3 7b 01 a9 stp x19,x30,[sp, #local_10]
004ac820 e1 03 1f 2a mov w1,wzr
004ac824 77 b5 00 94 bl FUN_004d9e00
I followed bl FUN_004d9e00 and found:
FUN_004d9e00
004d9e00 6e ?? 6Eh n
004d9e01 97 ?? 97h
004d9e02 85 ?? 85h
004d9e03 60 ?? 60h `
004d9e04 6d ?? 6Dh m
But here is the thing: the bytes at FUN_004d9e00 do not form a valid instruction. How can libunity.so still work properly?
Perhaps there is a relocation symbol for address 0x004ac824? In that case the linker would modify the instruction when libunity.so is loaded, and it would end up calling a different address (maybe in a different shared library).
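As a sanity check, the target Ghidra shows is exactly what the bytes stored in the file encode. A minimal sketch of decoding an AArch64 BL instruction (the `decode_bl` helper is made up for this example); it only describes the file contents and says nothing about what a relocation might write there at load time:

```python
def decode_bl(word_bytes, pc):
    """Return the branch target of an AArch64 BL instruction located at pc."""
    insn = int.from_bytes(word_bytes, "little")  # instructions are stored little-endian
    if insn >> 26 != 0b100101:  # top 6 bits identify BL
        raise ValueError("not a BL instruction")
    imm26 = insn & 0x03FFFFFF
    if imm26 & (1 << 25):  # sign-extend the 26-bit word offset
        imm26 -= 1 << 26
    return pc + imm26 * 4  # offset is counted in 4-byte instructions


# Bytes 77 b5 00 94 at address 0x004ac824, as shown in the listing.
target = decode_bl(bytes.fromhex("77b50094"), 0x004AC824)
print(hex(target))
```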

Find frame base and variable locations using DWARF version 4

I'm following Eli Bendersky's blog on parsing the DWARF debug information. He shows an example of parsing the binary with DWARF version 2 in his blog. The frame base of a function (further used for retrieving local variables) can be retrieved from the location list:
<1><71>: Abbrev Number: 5 (DW_TAG_subprogram)
<72> DW_AT_external : 1
<73> DW_AT_name : (...): do_stuff
<77> DW_AT_decl_file : 1
<78> DW_AT_decl_line : 4
<79> DW_AT_prototyped : 1
<7a> DW_AT_low_pc : 0x8048604
<7e> DW_AT_high_pc : 0x804863e
<82> DW_AT_frame_base : 0x0 (location list)
<86> DW_AT_sibling : <0xb3>
...
$ objdump --dwarf=loc tracedprog2
Contents of the .debug_loc section:
Offset Begin End Expression
00000000 08048604 08048605 (DW_OP_breg4: 4 )
00000000 08048605 08048607 (DW_OP_breg4: 8 )
00000000 08048607 0804863e (DW_OP_breg5: 8 )
However, I find in DWARF version 4 there is no such .debug_loc section. Here is the function info on my machine:
<1><300>: Abbrev Number: 17 (DW_TAG_subprogram)
<301> DW_AT_external : 1
<301> DW_AT_name : (indirect string, offset: 0x1e0): do_stuff
<305> DW_AT_decl_file : 1
<306> DW_AT_decl_line : 3
<307> DW_AT_decl_column : 6
<308> DW_AT_prototyped : 1
<308> DW_AT_low_pc : 0x1149
<310> DW_AT_high_pc : 0x47
<318> DW_AT_frame_base : 1 byte block: 9c (DW_OP_call_frame_cfa)
<31a> DW_AT_GNU_all_tail_call_sites: 1
Line <318> indicates the frame base is 1 byte block: 9c (DW_OP_call_frame_cfa). Any idea how to find the frame base for the DWARF v4 binaries?
Update based on @Employed Russian's answer:
The frame_base of a subprogram seems to point to the Canonical Frame Address (CFA), which is the value RSP had just before the call instruction.
<2><329>: Abbrev Number: 19 (DW_TAG_variable)
<32a> DW_AT_name : (indirect string, offset: 0x7d): my_local
<32e> DW_AT_decl_file : 1
<32f> DW_AT_decl_line : 5
<330> DW_AT_decl_column : 9
<331> DW_AT_type : <0x65>
<335> DW_AT_location : 2 byte block: 91 6c (DW_OP_fbreg: -20)
So a local variable (my_local in the above example) can be located by the CFA using this calculation: &my_local = CFA - 20 = (current RBP + 16) - 20 = current RBP - 4.
Verify it by checking the assembly:
void do_stuff(int my_arg)
{
1149: f3 0f 1e fa endbr64
114d: 55 push %rbp
114e: 48 89 e5 mov %rsp,%rbp
1151: 48 83 ec 20 sub $0x20,%rsp
1155: 89 7d ec mov %edi,-0x14(%rbp)
int my_local = my_arg + 2;
1158: 8b 45 ec mov -0x14(%rbp),%eax
115b: 83 c0 02 add $0x2,%eax
115e: 89 45 fc mov %eax,-0x4(%rbp)
my_local is at -0x4(%rbp).
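The same calculation can be sketched in Python: decode the 2-byte location block (DW_OP_fbreg is opcode 0x91, followed by a signed LEB128 offset) and apply it to the CFA. The RBP value below is a hypothetical placeholder, and CFA = RBP + 16 assumes the standard push %rbp / mov %rsp,%rbp prologue shown above.

```python
DW_OP_FBREG = 0x91


def decode_sleb128(data):
    """Decode one signed LEB128 value from the start of data."""
    result = 0
    shift = 0
    for byte in data:
        result |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:  # last byte of the value
            if byte & 0x40:  # sign bit set: extend the sign
                result -= 1 << shift
            return result
    raise ValueError("truncated SLEB128 value")


block = bytes.fromhex("916c")  # DW_AT_location of my_local
if block[0] != DW_OP_FBREG:
    raise ValueError("expected DW_OP_fbreg")
offset = decode_sleb128(block[1:])

rbp = 0x7FFFFFFFE000  # hypothetical current frame pointer
cfa = rbp + 16        # return address and saved %rbp sit between %rbp and the CFA
address = cfa + offset
print(offset, hex(address), address - rbp)
```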
This isn't about DWARFv2 vs. DWARFv4 -- with either version the compiler may choose to use or not use location lists. Your compiler chose not to.
Any idea how to find the frame base for the DWARF v4 binaries?
It tells you right there: use the CFA pseudo-register, also known as "canonical frame address".
That "imaginary" register has the same value that %rsp had just before the current function was called. That is, the current function's return address is always stored at CFA-8, and %rsp == CFA-8 on entry into the function.
If the function uses a frame pointer, then the previous value of %rbp is usually stored at CFA-16.
More info here.

Convert binary <something> of hex bytes to list of decimal values

I have the following binary (something):
test = b'40000000111E0C09'
Every two digits is a hexadecimal number I want out, so the following is clearer than the above:
test = b'40 00 00 00 11 1E 0C 09'
0x40 = 64 in decimal
0x00 = 0 in decimal
0x11 = 17 in decimal
0x1E = 30 in decimal
You get the idea.
How can I use struct.unpack(fmt, binary) to get the values out? I ask about struct.unpack() because it gets more complicated... I have a little-endian 4-byte integer in there... The last four bytes were:
b'11 1E 0C 09'
What is the above in decimal, assuming it's little-endian?
Thanks a lot! This is actually from a CAN bus, which I'm accessing as a serial port (frustrating stuff..)
Assuming you have string b'40000000111E0C09', you can use codecs.decode() with hex parameter to decode it to bytes form:
import struct
from codecs import decode
test = b'40000000111E0C09'
test_decoded = decode(test, 'hex') # from hex string to bytes
for i in test_decoded:
print('{:#04x} {}'.format(i, i))
Prints:
0x40 64
0x00 0
0x00 0
0x00 0
0x11 17
0x1e 30
0x0c 12
0x09 9
To get the last four bytes as a little-endian UINT32, you can then do the following (see the struct docs):
print( struct.unpack('<I', test_decoded[-4:]) )
Prints:
(151789073,)
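If everything should come out of a single struct.unpack() call, one possible format string is '<4BI': four unsigned bytes followed by one little-endian 32-bit unsigned integer. A minimal sketch, assuming the layout really is four single-byte fields followed by the 4-byte integer:

```python
import struct

test = b'40000000111E0C09'

# bytes.fromhex() expects str, so decode the ASCII hex digits first.
raw = bytes.fromhex(test.decode('ascii'))

# '<' = little-endian without padding, '4B' = four unsigned bytes, 'I' = uint32.
*first_four, last_value = struct.unpack('<4BI', raw)

print(first_four, last_value)
```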

Gameboy Emulator pop off empty stack

I'm working on a Gameboy Emulator, and I've reached a point in the ROM where I get opcode 0xD1 (pop DE off stack) but the stack is empty (no values have been pushed onto it). All unknown opcodes return an error, and all other instructions seem to be working fine.
Is it an error in my programming, the ROM, or is this just a quick way for the program to set DE to 0x0000?
Even if no value has been PUSHed to the stack, POP will retrieve the value stored at the address in SP to the specified register pair, and SP will be incremented by 2.
In your example, if SP has been initialized to, say, 0xD000, and WRAM has been zeroed beforehand, POP DE would effectively load 0 into DE and increment the stack pointer by 2.
21 00 C0 ld hl,C000 ;Start of WRAM
01 FF 1F ld bc,1FFF ;Length of WRAM
AF xor a ;a = 0
22 ldi (hl),a ;Blanks WRAM
0B dec bc
78 ld a,b
B1 or c
20 F9 jr nz,0158 ;Loops until WRAM is cleared
21 00 D0 ld hl,D000
F9 ld sp,hl ;SP = 0xD000
D1 pop de ;de = 0x0000, SP = 0xD002
Also, please note that the CALL instruction pushes the return address to the stack, and decrements SP by 2. In the same way, RET retrieves the address from the stack, and increases SP by 2.
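In emulator terms, POP needs no notion of an "empty" stack: it is just two memory reads and an addition. A minimal sketch in Python, assuming a flat 64 KiB memory array and a hypothetical `Cpu` class (a real emulator would route reads and writes through its memory map):

```python
class Cpu:
    """Tiny illustrative model: flat 64 KiB memory, SP and the DE register pair."""

    def __init__(self):
        self.memory = bytearray(0x10000)  # all zeros, like the cleared WRAM above
        self.sp = 0xFFFE
        self.d = 0
        self.e = 0

    def push16(self, value):
        # SP is decremented by 2; the low byte ends up at the lower address.
        self.sp = (self.sp - 2) & 0xFFFF
        self.memory[self.sp] = value & 0xFF
        self.memory[(self.sp + 1) & 0xFFFF] = value >> 8

    def pop16(self):
        # Read whatever is stored at SP; there is no check for an empty stack.
        low = self.memory[self.sp]
        high = self.memory[(self.sp + 1) & 0xFFFF]
        self.sp = (self.sp + 2) & 0xFFFF
        return (high << 8) | low

    def pop_de(self):  # opcode 0xD1
        value = self.pop16()
        self.d, self.e = value >> 8, value & 0xFF


cpu = Cpu()
cpu.sp = 0xD000  # ld sp,hl with hl = 0xD000
cpu.pop_de()     # nothing was pushed, yet this is perfectly valid
print(hex(cpu.d << 8 | cpu.e), hex(cpu.sp))
```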
