I'm working with Python 3 and can't find an answer to my little problem.
I need to send a byte greater than 0x7F over the serial port from my Raspberry Pi.
Example:
import serial
ser = serial.Serial("/dev/ttyAMA0")
a = 0x7F
ser.write(bytes(chr(a), 'UTF-8'))
This works fine: the receiver gets 0x7F.
But if a equals 0x80:
a = 0x80
ser.write(bytes(chr(a), 'UTF-8'))
the receiver gets two bytes: 0xC2 0x80
If I change the encoding to UTF-16, the receiver reads
0xFF 0xFE 0x80 0x00
The receiver should get only 0x80!
What's wrong? Thanks for your answers.
The UTF-8 specification says that characters encoded as a single byte/octet start with a 0 bit. Because 0x80 is "10000000" in binary, it has to be re-encoded as the two-byte sequence "11000010 10000000", i.e. 0xC2 0x80. 0x7F is "01111111", so a reader knows it is only 1 byte/octet long.
UTF-16 says that all words are represented as 2 bytes/octets (or more) and starts with a Byte Order Mark, which essentially tells the reader which one is the most significant octet (i.e. the endianness).
Check the UTF-8 specification for the full details, but essentially you are moving from the end of the 1-byte range to the start of the 2-byte range.
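You can see the boundary directly in an interactive Python 3 session (a quick check, matching the bytes your receiver saw):
chr(0x7F).encode('UTF-8')   # b'\x7f'             - still one byte
chr(0x80).encode('UTF-8')   # b'\xc2\x80'         - two bytes now
chr(0x80).encode('UTF-16')  # b'\xff\xfe\x80\x00' - BOM (0xFF 0xFE) plus a 2-byte code unit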
I don't understand why you want to send your own custom 1-byte words, but what you are really looking for is an SBCS (Single Byte Character Set) that has a character for each byte you want to send. UTF-8 and UTF-16 are MBCS (Multi-Byte Character Sets), which means that encoding a character may give you more than a single byte.
Before the UTF encodings came along, everything was SBCS, which meant that any code page you selected was coded using 8 bits. The problem arose when 256 characters were not enough, and code pages like IBM273 (IBM EBCDIC Germany) and ISO-8859-1 (ANSI Latin 1; Western European) had to be created to interpret what "0x2C" meant. Both the sender and receiver needed to set their code page identifier to the same value, or they wouldn't understand each other. There is further confusion because these SBCS code pages don't always use the full 256 characters, so "0x7F" may not even exist / have a meaning.
What you could do is encode to something like code page 737 (IBM 00737): send the "Α" (Greek Alpha) character and it should be encoded as 0x80.
In fact you don't need an encoding at all: in Python 3, pyserial's write() accepts a bytes object directly (see the sketch after the example below). Alternatively, ISO-8859-1 (Latin-1) maps the code points 0-255 straight onto the byte values 0-255, so the following sends exactly 0x80:
a = 0x80
ser.write(bytes(chr(a), 'ISO-8859-1'))
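And the sketch mentioned above, skipping text encoding entirely (pyserial's write() takes bytes in Python 3):
import serial

ser = serial.Serial("/dev/ttyAMA0")
a = 0x80
ser.write(bytes([a]))  # one byte with value 0x80, no text encoding involved
# equivalently: ser.write(b'\x80')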
Related
How do I get "\x90" to be read as the byte value corresponding to the x86 NOP instruction when it is supplied as a field within the standard argument list on Linux? I have a buffer being stuffed all the way to 10 and then overwritten into the next 8 bytes with the new return address, at least that is what I would like. Because the byte sequence being supplied is read as literal characters rather than as a byte sequence, I do not know how to fix this. What next?
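For illustration only, a minimal Python 3 sketch of emitting the payload as raw bytes instead of the literal four-character text "\x90" (the sled length follows the question; the return address here is hypothetical):
import struct
import sys

nop_sled = b'\x90' * 10                       # raw NOP bytes, not literal backslash-x-9-0 characters
ret_addr = struct.pack('<Q', 0x7fffffffe0c0)  # hypothetical address, packed little-endian as 8 bytes
sys.stdout.buffer.write(nop_sled + ret_addr)  # write raw bytes, bypassing any text encoding
The output can then be captured and passed as an argument, e.g. via command substitution in the shell; keep in mind that shells drop embedded NUL bytes, which is a separate problem.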
I am parsing a JPG in Java byte by byte. I am then writing the same image back out byte by byte, and I have come across an oddity. I have tried looking at the spec but I see no reference to it.
At the end of the SOS section there are three bytes that most sources say to 'skip'. But if I write 0x00 0x00 0x00 then Java(FX) complains about an invalid value. If I write 0x00 0x3f 0x00 then there is no complaint. (The three-byte sequence is what was produced by GIMP in the original file.)
I came across an indirect reference to this in the GoLang repo:
// - the bytes "\x00\x3f\x00". Section B.2.3 of the spec says that for
// sequential DCTs, those bytes (8-bit Ss, 8-bit Se, 4-bit Ah, 4-bit Al)
// should be 0x00, 0x3f, 0x00<<4 | 0x00.
My question is should I just write 0x3f at this position, or does the value depend upon something else?
In a sequential JPEG scan this value has no meaning. The standard says to set it to 63, but that tells the decoder nothing: in a sequential scan you have to process all 64 DCT coefficients anyway.
In a progressive scan this value means A LOT.
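To make the packing concrete, a small Python sketch of those three bytes (values per section B.2.3, as quoted above):
Ss, Se, Ah, Al = 0x00, 0x3F, 0x0, 0x0       # sequential DCT: spectral selection 0..63, no bit approximation
sos_tail = bytes([Ss, Se, (Ah << 4) | Al])  # the three bytes 0x00 0x3F 0x00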
I am new to Python 3. I am sending bytes across the wire.
When I send s.send(b'\x8f\x35\x4a\x5f') and look at the stack trace, I only see 5f4a358f.
However, if I create a variable:
test = ['\x8f\x35\x4a\x5f']
print(str(''.join(test).encode()))
I receive b'\xc2\x8f5J_'
As you can see, there is an extra byte \xc2.
My question is two-fold:
1) Why, when using str.encode() on a string whose contents already look "encoded", is an extra byte \xc2 added, whereas the literal byte string b'\x8f\x35\x4a\x5f' gets no extra encoding?
2) If I am passing bytes into a variable which is used as a buffer to send data across a socket, how does one create and send a set of literal bytes (e.g. a b'...' literal) programmatically such that there is no added \xc2 byte when it is sent across the wire?
Thank you all for your time! I really appreciate the help.
Because it's not encoded; it's text consisting of U+008F U+0035 U+004A U+005F. Then, when you encode it (as UTF-8 by default), the extra byte is added. Either use bytes in the first place, or encode as Latin-1. But use bytes.
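A sketch of the difference (assuming s is the connected socket from the question):
payload = b'\x8f\x35\x4a\x5f'  # bytes: exactly these four byte values
text = '\x8f\x35\x4a\x5f'      # str: the code points U+008F U+0035 U+004A U+005F
text.encode('utf-8')           # b'\xc2\x8f5J_' - U+008F needs two bytes in UTF-8
text.encode('latin-1')         # b'\x8f5J_'     - 1:1 mapping, same bytes as payload
s.send(payload)                # nothing is re-encoded; 0x8F 0x35 0x4A 0x5F go out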
I think I'm getting the ModR/M byte down but I'm still confused by the effective memory address/scaled indexing byte. I'm looking at these sites: http://www.sandpile.org/x86/opc_rm.htm, http://wiki.osdev.org/X86-64_Instruction_Encoding. Can someone encode an example where the destination address is in a register and the SIB byte is used? Say, for example, adding an 8-bit register to an address in an 8-bit register with the SIB used?
Also, when I use the ModR/M byte of 0x05, is that (*) relative to the current instruction pointer? Is it 32 or 64 bits when in 64-bit mode?
Is the SIB always used with a source or destination address?
A memory address is never in an 8-bit register, but here's an example of using SIB:
add byte [rax + rdx], 1
This is an instance of add rm8, imm8, encoded as 80 /0 ib. The /0 indicates that the reg field in the ModR/M byte is zero. We must use a SIB byte here, but we don't need an immediate offset, so we can use 00b for the mod and 100b for the rm, to form 04h for the ModR/M byte (44h and 84h also work, but waste space encoding a zero offset). Looking in the SIB table now, both registers have "scale 1", so the base and index are mostly interchangeable (rsp cannot be an index, but we're not using it here). So the SIB byte can be 10h or 02h.
Just putting the bytes in a row now:
80 04 10 01
; or
80 04 02 01
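To make the bit packing explicit, a short Python sketch that rebuilds the first of those encodings:
# add byte [rax + rdx], 1  ->  80 /0 ib
mod, reg, rm = 0b00, 0b000, 0b100             # rm = 100b means "SIB byte follows"
modrm = (mod << 6) | (reg << 3) | rm          # 0x04
scale, index, base = 0b00, 0b010, 0b000       # scale 1, index = rdx, base = rax
sib = (scale << 6) | (index << 3) | base      # 0x10 (swap index/base for 0x02)
print(bytes([0x80, modrm, sib, 0x01]).hex())  # "80041001"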
Also when I use the ModR/M byte of 0x05 is that (*) relative to the current instruction pointer? Is it 32 or 64 bits when in 64 bit mode?
Yes. You saw the note, I'm sure. So it can be either, depending on whether you used an address-size override or not. In every reasonable case it will be rip + sdword. Using the other form gives you a truncated result; I can't immediately imagine any circumstances under which that makes sense (for general lea math, sure, but not for pointers). Probably (though this is speculation) that possibility only exists to make the address-size override work reasonably uniformly.
Is the SIB always used with a source or destination address?
Depends on what you mean. Certainly, if you have a SIB, it will encode a source or a destination (because what else is there?); you might argue that the SIB that can appear in nop rm encodes nothing, because nop has neither sources nor destinations. If you mean "which one does it encode", it can be either. Looking over all instructions, it most often appears in a source operand, but obviously there are many cases where it encodes the destination; see the example above. If you mean "is it always used", well, no; see the table you were looking at.
I am trying to wrap my head around how the receiver identifies the endianness of the sender. I know the initial byte is usually the architecture/type of the sender. For example, 0x00 is i386, etc. However, how does the first byte help at all if the receiver has no idea how to interpret it?
Endianness refers to the ordering of bytes within larger numbers, not the order of bits inside a byte. A single byte is always endian-safe, and networks transfer byte streams transparently (that is, bytes are received in the same order in which they were sent).
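The ordering only matters once you pack multiple bytes into one number, which is easy to see in Python:
import struct
struct.pack('>I', 0x01020304)  # b'\x01\x02\x03\x04' - big-endian ("network order")
struct.pack('<I', 0x01020304)  # b'\x04\x03\x02\x01' - little-endian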