Using Protocol Buffers to Serialize Bytes in Python 3

I am trying to serialize a bytes object, which is an initialization vector (IV) for my program's encryption, but Google Protocol Buffers seems to accept only strings. The error appears to start when I cast the bytes to a string. Am I using the correct method to do this? Thank you for any help or guidance!
Alternatively, can I make the initialization vector a str object for AES-CBC mode encryption?
Code
Cast the bytes to a string
string_iv = str(bytes_iv, 'utf-8')
Store the string in the message and serialize it using SerializeToString():
IV.string_iv = string_iv
serialized_iv = IV.SerializeToString()
Use ParseFromString() to recover the string:
IV.ParseFromString(serialized_iv)
And finally, UTF-8 encode the string back to bytes:
bytes_iv = bytes(IV.string_iv, encoding='utf-8')
Error
string_iv = str(bytes_iv, 'utf-8')
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x9b in position 3: invalid start byte

If you must convert an arbitrary bytes object to str, these are your options:
Simply call str() on the object. It will turn it into its repr form, i.e. something that could be parsed as a bytes literal, e.g. "b'abc\x00\xffabc'".
Decode with "latin1". This always works, even though it technically makes no sense if the data isn't text encoded with Latin-1.
Use base64 or base85 encoding (the standard library's base64 module covers both).
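A minimal sketch of the three options above, each round-tripped back to the original bytes. It assumes a random 16-byte IV; os.urandom stands in for however the program actually generates its IV. Any of the resulting str values could be stored in a protobuf string field and converted back after ParseFromString(). Protobuf also has a bytes scalar type that avoids the conversion altogether, but whether that fits depends on the .proto definition, which the question doesn't show.

```python
import ast
import base64
import os

# Stand-in IV: 16 random bytes (the AES block size); may contain bytes like 0x9b.
bytes_iv = os.urandom(16)

# Option 1: str() gives the repr text; ast.literal_eval parses it back to bytes.
repr_iv = str(bytes_iv)
assert ast.literal_eval(repr_iv) == bytes_iv

# Option 2: Latin-1 maps every byte 0-255 to the code point with the same value,
# so decoding never raises and encoding restores the exact bytes.
latin1_iv = bytes_iv.decode('latin1')
assert latin1_iv.encode('latin1') == bytes_iv

# Option 3: Base64 / Base85 produce pure-ASCII text.
b64_iv = base64.b64encode(bytes_iv).decode('ascii')
assert base64.b64decode(b64_iv) == bytes_iv
b85_iv = base64.b85encode(bytes_iv).decode('ascii')
assert base64.b85decode(b85_iv) == bytes_iv
```

Base64 is usually the most portable choice when the text has to leave Python, because the result is plain ASCII that any other language can decode.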

Related

Python3 decoding binary string with hex numbers higher than \x7f

I am trying to port some bmv2 Thrift Python 2 code to Python 3 and have the following problem:
Python 2:
import struct
def demo(byte_list):
    f = 'B' * len(byte_list)
    r = struct.pack(f, *byte_list)
    return r
demo([255, 255])
"\xff\xff"
Ported to Python 3, it returns a bytes object, b"\xff\xff", because struct.pack returns bytes rather than str in Python 3.
If I try to decode it with r.decode(), an exception is raised, because 0xff is never a valid byte in UTF-8, the default codec:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte
The easiest solution would be to build the string myself. I tried a hand-made string like "\x01" and it works, but "\xff" does not work with Thrift, I think because "\xff" is "ÿ" in Unicode and the Thrift server expects the raw byte \xff.
I tried different encodings and raw strings.
TL;DR: Is there any way in Python 3 to decode a bytes object containing \xff, or in general any byte above \x7f (ord 127), so that b"\xff" => "\xff"? Or can I use the old Python 2 struct module?
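The Latin-1 decoding mentioned in the first answer above does exactly the b"\xff" => "\xff" mapping asked for here. A minimal sketch applied to the demo function; whether the Thrift library then accepts this str is an assumption not verified here.

```python
import struct

def demo(byte_list):
    # In Python 3, struct.pack returns bytes, not str.
    f = 'B' * len(byte_list)
    return struct.pack(f, *byte_list)

packed = demo([255, 255])

# Latin-1 maps each byte 0x00-0xff to the code point with the same value,
# so this never raises and yields the Python 2-style text "\xff\xff".
as_text = packed.decode('latin-1')

# Encoding with Latin-1 gives back the original bytes unchanged.
round_trip = as_text.encode('latin-1')
```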

How do I decode a dictionary of bytes to utf-8?

I'm trying to figure out how to convert the values of a dictionary from bytes to strings as the backend only supports primitive types.
oledata = {
'macros': macros,
'data': analysis
}
s = str(oledata)
save_data_to_s3(json.dumps(s), ['olevba3'])
As you can see, the values of this dict are bytes. This code runs without errors on my test sample, but the output has the b' prefix in front of the values (data), which will break the database. Dicts also have no decode() method, which is why I used str(), but that must be doing something wrong, since the values still come out with the b' prefix. This leads to my general question: how do you decode the values of a dictionary from bytes to UTF-8 strings?
my_str = b"Hello" # b means it's a bytes literal
new_str = my_str.decode('utf-8') # Decode using the utf-8 encoding
print(new_str)
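The answer above decodes a single bytes value. A minimal sketch applying the same decode() to every value of the dict, assuming each value is either bytes holding UTF-8 text or already a str. The sample values are made-up stand-ins for macros and analysis, and save_data_to_s3 from the question is left out.

```python
import json

# Hypothetical stand-in values; in the question these come from olevba3.
oledata = {
    'macros': b'Sub AutoOpen()',
    'data': b'analysis output',
}

# Decode each bytes value; leave values that are already str untouched.
decoded = {
    key: value.decode('utf-8') if isinstance(value, bytes) else value
    for key, value in oledata.items()
}

# Serialize the dict itself rather than str(oledata), so no b'' prefixes appear.
payload = json.dumps(decoded)
```

If some values might not be valid UTF-8, value.decode('utf-8', errors='replace') avoids a UnicodeDecodeError at the cost of replacing the bad bytes with U+FFFD.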

How to Turn string into bytes?

I'm using Python 3 and have a string that is displayed as bytes:
strategyName=\xe7\x99\xbe\xe5\xba\xa6
I need to turn it into readable Chinese characters with decode():
orig=b'strategyName=\xe7\x99\xbe\xe5\xba\xa6'
result=orig.decode('UTF-8')
print(result)
which prints the following, and it is what I want:
strategyName=百度
But if I store it in a str instead, it behaves differently:
str0='strategyName=\xe7\x99\xbe\xe5\xba\xa6'
result_byte=str0.encode('UTF-8')
result_str=result_byte.decode('UTF-8')
print(result_str)
strategyName=ç¾åº¦
Please help me understand why this is happening and how I can fix it.
Thanks a lot
Your problem is using a str literal when you're trying to store the UTF-8 encoded bytes of your string. You should just use the bytes literal, but if that str form is necessary, the correct approach is to encode in latin-1 (which is a 1-1 converter for all ordinals below 256 to the matching byte value) to get the bytes with utf-8 encoded data, then decode as utf-8:
str0 = 'strategyName=\xe7\x99\xbe\xe5\xba\xa6'
result_byte = str0.encode('latin-1') # Only changed line
result_str = result_byte.decode('UTF-8')
print(result_str)
Of course, the other approach could be to just type the Unicode escapes you wanted in the first place instead of byte level escapes that correspond to a UTF-8 encoding:
result_str = 'strategyName=\u767e\u5ea6'
No rigmarole needed.
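To see why the original attempt printed mojibake, here is a minimal sketch comparing the UTF-8 and Latin-1 encodings of str0 from the question.

```python
str0 = 'strategyName=\xe7\x99\xbe\xe5\xba\xa6'

# UTF-8 encodes each code point in U+0080..U+00FF as TWO bytes,
# so the intended UTF-8 byte sequence is not recovered...
utf8_bytes = str0.encode('utf-8')
print(utf8_bytes[13:15])                    # b'\xc3\xa7' (U+00E7), not b'\xe7'
# ...and decoding them again simply gives back str0, the same mojibake.
print(utf8_bytes.decode('utf-8') == str0)   # True

# Latin-1 maps each code point 0..255 to the single byte with that value,
# which recovers the UTF-8 bytes the escapes were meant to represent.
latin1_bytes = str0.encode('latin-1')
print(latin1_bytes.decode('utf-8'))         # strategyName=百度
```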

bytes() initializer adding an additional byte?

I initialize a UTF-8 encoded bytes object in Python 3:
bytes('\xc2', encoding="utf-8", errors="strict")
but on writing it out I get two bytes!
>>> s = bytes('\xc2', encoding="utf-8", errors="strict")
>>> s
b'\xc3\x82'
Where is this additional byte coming from? Why should I not be able to encode any hex value up to 254 (I can understand that 255 is potentially reserved to extend to utf-16)?
The Unicode code point "\xc2" (which can also be written as "Â") is two bytes long when encoded with UTF-8, because only code points below U+0080 fit in a single UTF-8 byte. If you were expecting it to be the single byte b'\xc2', you probably want to use a different encoding, such as "latin-1":
>>> s = bytes("\xc2", encoding="latin-1", errors="strict")
>>> s
b'\xc2'
If you are really creating "\xc2" directly with a literal, though, there's no need to mess around with the bytes constructor to turn it into a bytes instance. Just use the b prefix on the literal to create the bytes directly:
s = b"\xc2"
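A minimal sketch of where the extra byte comes from: every code point from U+0080 upward needs at least two bytes in UTF-8, while Latin-1 uses exactly one byte for each code point up to U+00FF (and cannot encode anything above that). The helper name is hypothetical.

```python
def utf8_and_latin1_lengths(code_point):
    """Return (UTF-8 length, Latin-1 length) in bytes for one code point."""
    ch = chr(code_point)
    return len(ch.encode('utf-8')), len(ch.encode('latin-1'))

for cp in (0x41, 0x7F, 0x80, 0xC2, 0xFF):
    print(f"U+{cp:04X}", utf8_and_latin1_lengths(cp))
# U+0041 (1, 1)
# U+007F (1, 1)
# U+0080 (2, 1)
# U+00C2 (2, 1)
# U+00FF (2, 1)

print('\xc2'.encode('utf-8'))   # b'\xc3\x82', the two bytes from the question
```

Note that U+00FF is not reserved; it is simply "ÿ", which, like "Â", takes two bytes in UTF-8 and one in Latin-1.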

Converting a string to and from Base 64 [duplicate]

This question already has answers here: Why do I need 'b' to encode a string with Base64?
I am trying to write two programs: one that converts a string to Base64, and another that takes a Base64-encoded string and converts it back to a string.
So far I can't get past the Base64 encoding part, as I keep getting the error:
TypeError: expected bytes, not str
My code so far looks like this:
def convertToBase64(stringToBeEncoded):
    import base64
    EncodedString= base64.b64encode(stringToBeEncoded)
    return(EncodedString)
A string is already 'decoded', so the str class has no decode() method. Thus:
AttributeError: type object 'str' has no attribute 'decode'
If you want to decode a bytes object and turn it into a string, call:
the_thing.decode(encoding)
If you want to encode a string (turn it into bytes), call:
the_string.encode(encoding)
As for the Base64 part:
Using 'base64' as the encoding value above yields the error:
LookupError: unknown encoding: base64
Open a console and type in the following:
import base64
help(base64)
You will see that base64 has two very handy functions, namely b64decode and b64encode. b64decode returns bytes, and b64encode requires a bytes-like object (and also returns bytes).
To convert a string into its Base64 representation, you first need to convert it to bytes. I like UTF-8, but use whatever encoding you need...
import base64
def stringToBase64(s):
    return base64.b64encode(s.encode('utf-8'))
def base64ToString(b):
    return base64.b64decode(b).decode('utf-8')
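The encoding helper above returns bytes (e.g. b'aGVsbG8='). If a str is wanted on both sides, a minimal variant, using hypothetical snake_case names, decodes the ASCII Base64 output as well.

```python
import base64

def string_to_base64(s):
    # text -> UTF-8 bytes -> Base64 bytes -> ASCII str
    return base64.b64encode(s.encode('utf-8')).decode('ascii')

def base64_to_string(b64):
    # b64decode accepts an ASCII str or bytes and returns the raw bytes
    return base64.b64decode(b64).decode('utf-8')

encoded = string_to_base64('hello')   # 'aGVsbG8='
decoded = base64_to_string(encoded)   # 'hello'
```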

Resources