Python 3: How to get a raw bytes string without encoding?

I want to get a string of the original bytes (assembly code) without encoding it to another encoding. Since the content of the bytes is shellcode, I do not need to encode it and want to write it directly as raw bytes.
Put simply, I want to convert "b'\xb7\x00\x00\x00'" to "\xb7\x00\x00\x00", i.e. get the string representation of the raw bytes.
For example:
>>> byte_code = b'\xb7\x00\x00\x00\x05\x00\x00\x00\x95\x00\x00\x00\x00\x00\x00\x00'
>>> uc_str = str(byte_code)[2:-1]
>>> print(byte_code, uc_str)
b'\xb7\x00\x00\x00\x05\x00\x00\x00\x95\x00\x00\x00\x00\x00\x00\x00' \xb7\x00\x00\x00\x05\x00\x00\x00\x95\x00\x00\x00\x00\x00\x00\x00
Currently I have only two ugly methods:
>>> uc_str = str(byte_code)[2:-1]
>>> uc_str = "".join('\\x{:02x}'.format(c) for c in byte_code)
Raw bytes usage:
>>> my_template = 'const char byte_code[] = "TPL";'
>>> uc_str = str(byte_code)[2:-1]
>>> my_code = my_template.replace("TPL", uc_str)
# then write my_code to xx.h
Is there any pythonic way to do this?

Your first method is broken, because any bytes that can be represented as printable ASCII will be shown as ASCII characters instead of \x escapes, for example:
>>> str(b'\x00\x20\x41\x42\x43\x20\x00')[2:-1]
'\\x00 ABC \\x00'
The second method is actually okay. Since this feature appears to be missing from the stdlib, I've published all-escapes, which provides it as a codec:
pip install all-escapes
Example usage:
>>> b"\xb7\x00\x00\x00".decode("all-escapes")
'\\xb7\\x00\\x00\\x00'
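Applied to the header use case from the question, the second method can be wrapped up roughly as below. This is a minimal sketch: `to_c_escape` and `make_header` are hypothetical helper names, and it assumes the output is meant to be a C string literal, so it uses double quotes (in C, single quotes denote a character constant).

```python
def to_c_escape(data: bytes) -> str:
    """Return a C string-literal body with every byte written as \\xNN."""
    return "".join("\\x{:02x}".format(b) for b in data)


def make_header(data: bytes, name: str = "byte_code") -> str:
    """Build one C declaration line holding the escaped bytes (hypothetical helper)."""
    # Double quotes make this a string literal rather than a char constant.
    return 'const char {}[] = "{}";\n'.format(name, to_c_escape(data))


byte_code = b"\xb7\x00\x00\x00\x05\x00\x00\x00\x95\x00\x00\x00\x00\x00\x00\x00"
header = make_header(byte_code)
print(header, end="")

# Writing it out, as in the question:
# with open("xx.h", "w") as f:
#     f.write(header)
```

Escaping every byte, printable or not, also sidesteps a C pitfall: a `\x` escape consumes all following hex digits, so `"\x00A"` is read as a single escape rather than NUL followed by `A`. Keep in mind that `sizeof(byte_code)` includes the terminating NUL the compiler appends.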

I came across this while trying to do something similar with some SNMP code:
byte_code = b'\xb7\x00\x00\x00\x05\x00\x00\x00\x95\x00\x00\x00\x00\x00\x00\x00'
text = byte_code.decode('raw_unicode_escape')
writer_func(text)  # writer_func: placeholder for whatever consumes the text (e.g. the SNMP set call)
It worked for sending an SNMP hex string as an OctetString when there was no helper support for hex.
See also the standard encodings and bytes.decode documentation,
and, for anyone working with SNMP, the SNMP Set Types.
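It may help to see what `raw_unicode_escape` actually produces here. A quick check, assuming (as in this example) the input contains no backslash bytes, so no `\uXXXX` sequences get interpreted, shows one code point per byte rather than backslash-escape text:

```python
byte_code = b"\xb7\x00\x00\x00\x05\x00\x00\x00\x95\x00\x00\x00\x00\x00\x00\x00"

text = byte_code.decode("raw_unicode_escape")

# One code point per byte (U+00B7, U+0000, ...), not the characters "\", "x", "b", "7".
print(len(text), [hex(ord(ch)) for ch in text[:4]])

# With no "\u" sequences in the input this matches a plain latin-1 decode,
# and latin-1 maps the text back to the original bytes unchanged.
assert text == byte_code.decode("latin-1")
assert text.encode("latin-1") == byte_code
```

So this approach suits an API that wants a str holding one character per byte; it does not produce the `\xb7\x00...` text the question asks for.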

The basic conversion from bytes to str is:
>>> b"abc".decode()
'abc'
or:
>>> sb = b"abc"
>>> s = sb.decode()
>>> s
'abc'
The inverse is:
>>> "abc".encode()
b'abc'
or:
>>> s = "abc"
>>> sb = s.encode()
>>> sb
b'abc'
In your case, 0xb7 on its own is not valid UTF-8, so you should use the errors argument (note that "replace" substitutes U+FFFD, so the original byte value is lost):
>>> b"\xb7".decode(errors="replace")
'�'
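For comparison, here is a sketch of two standard codec error handlers on the same byte. `"backslashreplace"` for decoding needs Python 3.5 or later; it keeps the byte value visible as `\xb7`, but only for bytes that fail to decode.

```python
data = b"\xb7"

replaced = data.decode("utf-8", errors="replace")
escaped = data.decode("utf-8", errors="backslashreplace")

print(repr(replaced))  # U+FFFD; the original byte value is gone
print(repr(escaped))   # four characters: backslash, "x", "b", "7"

# backslashreplace only escapes bytes that fail to decode;
# valid bytes such as NUL or "A" come through as ordinary characters.
mixed = b"\xb7\x00A".decode("utf-8", errors="backslashreplace")
print(repr(mixed))
```

Because of that last point, neither handler alone gives "every byte as \xNN" for shellcode; for that you still need an explicit per-byte format.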

Related

Is there any way to get the direct hexadecimal value in bytes instead of getting string value?

In Python 3.5 I need to convert a string to an IPFIX-supported field value for a UDP packet. When I send the string bytes in a UDP packet, I am unable to recover the string data again; Wireshark reports "Malformed data".
I found that IPFIX supports only ASCII for strings. So I converted the ASCII value to hex and then converted that into bytes. But when converting hex("4B") to bytes, I do not get my hex value in bytes; instead I get the string in bytes ("K").
I have tried the following in the Python console. I need the exact byte I entered, but instead of b'\x4B' I get b'K'. I am using Python 3.5.
>>> b'\x4B'
b'K'
Code: "K".encode("ascii")
Actual output: b'K'
Expected output: b'\x4B'
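Note that b'\x4B' and b'K' are two spellings of the same one-byte value (0x4B == 75); the REPL simply displays printable bytes as ASCII characters. A quick check:

```python
a = b"\x4B"
from_text = "K".encode("ascii")

print(a == from_text, len(a), a[0])  # True 1 75  (indexing bytes gives an int)
print(repr(a))                       # b'K'  (the repr shows printable bytes as ASCII)
```

So the byte sent on the wire is the same either way; the hex conversions below only change how it is displayed as text.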
There are multiple ways to do this:
1. The hex method (python 3.5 and up)
>>> 'K'.encode('ascii').hex()
'4b' # type str
2. Using binascii
>>> import binascii
>>> binascii.hexlify('K'.encode('ascii'))
b'4b' # type bytes
3. Using str.format
>>> ''.join('{:02x}'.format(x) for x in 'K'.encode('ascii'))
'4b' # type str
4. Using format
>>> ''.join(format(x, '02x') for x in 'K'.encode('ascii'))
'4b' # type str
Note: the methods using format are not very efficient, since they loop over the bytes in Python code.
If you really need the \x prefix, you will have to use format, e.g.:
>>> print(''.join('\\x{:02x}'.format(x) for x in 'K'.encode('ascii')))
\x4b
>>> print(''.join('\\x{:02x}'.format(x) for x in 'KK'.encode('ascii')))
\x4b\x4b
If you want uppercase, use X instead of x, e.g.:
>>> ''.join('{:02X}'.format(x) for x in 'K'.encode('ascii'))
'4B'
>>> ''.join(format(x, '02X') for x in 'K'.encode('ascii'))
'4B'
Uppercase and with \x:
>>> print(''.join('\\x{:02X}'.format(x) for x in 'Hello'.encode('ascii')))
\x48\x65\x6C\x6C\x6F
If you want bytes instead of str, just encode the output to ASCII again:
>>> print(''.join('\\x{:02X}'.format(x) for x in 'Hello'.encode('ascii')).encode('ascii'))
b'\\x48\\x65\\x6C\\x6C\\x6F'
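The format-based variants above can be folded into one small helper. This is a sketch: `to_hex_escapes` is a hypothetical name, and it sticks to `str.format` rather than f-strings so it also runs on Python 3.5.

```python
def to_hex_escapes(data, upper=False):
    """Return data as a str of \\xNN escapes, one per byte (hypothetical helper)."""
    spec = "\\x{:02X}" if upper else "\\x{:02x}"
    return "".join(spec.format(b) for b in data)


text = to_hex_escapes("Hello".encode("ascii"), upper=True)
print(text)                  # \x48\x65\x6C\x6C\x6F
print(text.encode("ascii"))  # b'\\x48\\x65\\x6C\\x6C\\x6F'
```

Stripping the `\x` prefixes gives the same digits as `bytes.hex()`. On Python 3.8+, `bytes.hex()` also accepts a separator, but only a single character, so it cannot emit the two-character `\x` prefix by itself.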

unpack syntax in python 3

I am trying to convert hex numbers into decimals using unpack.
When I use:
from struct import *
unpack("<H",b"\xe2\x07")
The output is (2018,), which is what I want.
The thing is, I have my hex data in a list as strings, in the form of:
asd = ['e2','07']
My question is: is there a simple way of using unpack without the backslashes and the x? Something like:
unpack("<H","e207")
I know this doesn't work, but I hope you get the idea.
For clarification, I know I could get the data in the form of b'\x11' in the list, but then it's interpreted as ASCII, which I don't want; that's why I have it in the format I showed.
You have hex-encoded data in a text object. So, to go back to raw bytes, you can decode the text string. Please note that this is not the usual convention in Python 3.x (generally, text strings are already decoded).
>>> import codecs
>>> codecs.decode('e207', 'hex')
b'\xe2\x07'
A convenience function for the same thing:
>>> bytes.fromhex('e207')
b'\xe2\x07'
Now you can struct.unpack those bytes. Putting it all together:
>>> import codecs, struct
>>> asd = ['e2','07']
>>> text = ''.join(asd)
>>> encoded = codecs.decode(text, 'hex')
>>> struct.unpack("<H", encoded)
(2018,)
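The same thing with `bytes.fromhex`, plus `int.from_bytes` (available since Python 3.2) as an alternative that needs no struct format string. This is a sketch using the list from the question:

```python
import struct

asd = ["e2", "07"]

raw = bytes.fromhex("".join(asd))    # b'\xe2\x07'
(value,) = struct.unpack("<H", raw)  # "<H": little-endian unsigned short

# int.from_bytes performs the same little-endian conversion.
same = int.from_bytes(raw, "little")

print(value, same)  # 2018 2018
```

`struct.unpack` is the better fit when several fields are packed together; `int.from_bytes` is simpler for a single integer of any width.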

Why are all my strings printed as string literals? [duplicate]

How do I print a bytes string without the b' prefix in Python 3?
>>> print(b'hello')
b'hello'
Use decode:
>>> print(b'hello'.decode())
hello
If the bytes already use an appropriate character encoding (the one your terminal expects), you could write them directly:
import sys
sys.stdout.buffer.write(data)
or
import os, sys
nwritten = os.write(sys.stdout.fileno(), data)  # NOTE: it may write fewer than len(data) bytes
If the data is in a UTF-8-compatible format, you can convert the bytes to a string:
>>> print(str(b"hello", "utf-8"))
hello
Optionally, convert to hex first if the data is not UTF-8-compatible (e.g. arbitrary binary data):
>>> from binascii import hexlify
>>> print(hexlify(b"\x13\x37"))
b'1337'
>>> print(str(hexlify(b"\x13\x37"), "utf-8"))
1337
>>> from codecs import encode # alternative
>>> print(str(encode(b"\x13\x37", "hex"), "utf-8"))
1337
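On Python 3.5 and later, `bytes.hex()` returns a str directly, which saves the extra decode step; `bytes.fromhex()` reverses it. A short sketch:

```python
from binascii import hexlify

data = b"\x13\x37"

print(data.hex())                     # 1337
print(hexlify(data).decode("ascii"))  # 1337

# bytes.fromhex() turns the hex text back into the original bytes.
assert bytes.fromhex(data.hex()) == data
```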
According to the source for bytes.__repr__, the b'' is baked into the method.
One workaround is to manually slice off the b'' from the resulting repr():
>>> x = b'\x01\x02\x03\x04'
>>> print(repr(x))
b'\x01\x02\x03\x04'
>>> print(repr(x)[2:-1])
\x01\x02\x03\x04
To show or print:
<byte_object>.decode("utf-8")
To encode or save:
<str_object>.encode('utf-8')
I am a little late, but for Python 3.9.1 this worked for me and removed the b prefix:
print(outputCode.decode())
It's so simple...
(With this approach you can encode bytes held in dictionaries and lists, then serialize them with json.dump / json.dumps.)
You just need to use base64:
import base64
data = b"Hello world!" # Bytes
data = base64.b64encode(data).decode() # Returns a base64 string, which can be decoded without error.
print(data)
Some bytes cannot be decoded by default (pictures are an example), so base64 encodes those bytes into bytes that can be decoded to a string. To retrieve the original bytes, just use:
data = base64.b64decode(data.encode())
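A sketch of the same round trip with data that really is not valid UTF-8, including the JSON step mentioned above. The PNG signature bytes are just a convenient example of binary data, and the "image" key is an arbitrary name:

```python
import base64
import json

# First 8 bytes of the PNG file signature: binary data that is not valid UTF-8.
data = b"\x89PNG\r\n\x1a\n"

text = base64.b64encode(data).decode("ascii")  # plain ASCII str
payload = json.dumps({"image": text})          # now JSON-serializable

# Reverse: parse the JSON, then base64-decode (b64decode also accepts an ASCII str).
restored = base64.b64decode(json.loads(payload)["image"])
assert restored == data
print(payload)
```

The trade-off is size: base64 output is about a third larger than the input, so it is best kept for text-only channels such as JSON.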
Use decode() instead of encode() for converting bytes to a string.
>>> import curses
>>> print(curses.version.decode())
2.2

Why is passing bytes to class str constructor special?

Official Python 3 docs say this about passing bytes to the single-argument constructor of class str:
Passing a bytes object to str() without the encoding or errors
arguments falls under the first case of returning the informal string
representation (see also the -b command-line option to Python).
Ref: https://docs.python.org/3/library/stdtypes.html#str
informal string representation -> Huh?
Using the Python console (REPL), I see the following weirdness:
>>> ''
''
>>> b''
b''
>>> str()
''
>>> str('')
''
>>> str(b'')
"b''" # What the heck is this?
>>> str(b'abc')
"b'abc'"
>>> "x" + str(b'')
"xb''" # Woah.
(The question title can be improved -- I'm struggling to find a better one. Please help to clarify.)
The concept behind str seems to be that it returns a "nicely printable" string, usually in a human understandable form. The documentation actually uses the phrase "nicely printable":
If neither encoding nor errors is given, str(object) returns
object.__str__(), which is the “informal” or nicely printable string
representation of object. For string objects, this is the string
itself. If object does not have a __str__() method, then str() falls
back to returning repr(object).
With that in mind, note that str of a tuple or list produces string versions such as:
>>> str( (1, 2) )
'(1, 2)'
>>> str( [1, 3, 5] )
'[1, 3, 5]'
Python considers the above to be the "nicely printable" form for these objects. With that as background, the following seems a bit more reasonable:
>>> str(b'abc')
"b'abc'"
With no encoding provided, the bytes b'abc' are just bytes, not characters. Thus, str falls back to the repr, the "nicely printable" form, and the six-character string b'abc' is nicely printable.
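A small sketch of the two cases the docs distinguish: without encoding or errors you get the repr, and with either argument str() decodes the bytes (encoding defaults to UTF-8). Running Python with `-b` makes the first form emit a BytesWarning.

```python
data = b"abc"

print(str(data))                   # b'abc'  (same as repr(data))
print(str(data, "ascii"))          # abc     (like data.decode("ascii"))
print(str(data, errors="strict"))  # abc     (errors alone also triggers decoding)
```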

Convert hexadecimal to normal string

I'm using Python 3.3.2 and I want to convert hex to a string.
This is my code:
from struct import pack
junk = "\x41" * 50 # A
eip = pack("<L", 0x0015FCC4)
buffer = junk + eip
I've tried using
>>> binascii.unhexlify("4142")
b'AB'
... but I want the output "AB", not "b'AB'". What can I do?
Edit:
buffer = junk + binascii.unhexlify(eip).decode('ascii')
binascii.Error: Non-hexadecimal digit found
The problem is I can't concatenate junk + eip.
Thank you.
The b denotes a bytes object, i.e. a string of bytes. If you want to convert that into a str, use the decode method.
>>> import binascii
>>> type(binascii.unhexlify(b"4142"))
<class 'bytes'>
>>> binascii.unhexlify(b"4142").decode('ascii')
'AB'
This results in a str, which is a string of Unicode characters.
Edit:
If you want to work purely with binary data, don't decode; stick with the bytes type. In your edited example:
>>> from struct import pack
>>> #- junk = "\x41" * 50 # A
>>> junk = b"\x41" * 50 # A
>>> eip = pack("<L", 0x0015FCC4)
>>> buffer = junk + eip
>>> buffer
b'AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA\xc4\xfc\x15\x00'
Note the b in b"\x41", which denotes a binary string, i.e. the standard string type in Python 2: literally a string of bytes rather than a string of Unicode characters, and the two are completely different things.
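A runnable sketch of that bytes-only version, which also shows why the earlier attempt failed: `binascii.unhexlify(eip)` raised "Non-hexadecimal digit found" because `eip` is already raw bytes, not hex text. The latin-1 step at the end is an assumption about what to do if some API insists on a str; it maps each byte to one code point and back.

```python
from struct import pack

junk = b"A" * 50                # 50 x 0x41, as bytes rather than str
eip = pack("<L", 0x0015FCC4)    # 4 bytes, little-endian unsigned long

buffer = junk + eip             # bytes + bytes works; str + bytes raises TypeError
print(len(buffer), buffer[-4:])

# Only if a str is unavoidable: latin-1 round-trips every byte value unchanged.
as_text = buffer.decode("latin-1")
assert as_text.encode("latin-1") == buffer
```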
That's just a literal representation. Don't worry about the b, as it's not actually part of the string itself.
See What does the 'b' character do in front of a string literal?
