I'm not understanding something about hashlib. I'm not sure why I can decode a regular byte object, but can't decode a hash that's returned as a byte object. I keep getting this error:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xad in position 1: invalid start byte
Here's my test code that's producing this error. The error is on line 8 (h2 = h.decode('utf-8'))
import hashlib
pw = 'wh#teV)r'
salt = 'b7u2qw^T&^#U#Lta)hvx7ivRoxr^tDyua'
pwd = pw + salt
h = hashlib.sha512(pwd.encode('utf-8')).digest()
print(h)
h2 = h.decode('utf-8')
print(h2)
If I don't hash it, it works perfectly fine...
>>> pw = 'wh#teV)r'
>>> salt = 'b7u2qw^T&^#U#Lta)hvx7ivRoxr^tDyua'
>>> pwd = pw + salt
>>> h = pwd.encode('utf-8')
>>> print(h)
b'wh#teV)rb7u2qw^T&^#U#Lta)hvx7ivRoxr^tDyua'
>>> h2 = h.decode('utf-8')
>>> print(h2)
wh#teV)rb7u2qw^T&^#U#Lta)hvx7ivRoxr^tDyua
So I'm guessing I'm not understanding something about the hash, but I have no clue what I'm missing.
In the second example you're just encoding to UTF-8 and then decoding the result straight back.
In the first example, on the other hand, you're encoding to UTF-8, messing about with the bytes, and then trying to decode it as though it's still UTF-8. Whether the resulting bytes are still valid as UTF-8 is purely down to chance (and even if it is still valid UTF-8, the Unicode string it represents will bear no relation to the original string).
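If the goal is a printable form of the hash, one common approach is to render the digest as hexadecimal or Base64 text instead of decoding it as UTF-8. The following is a minimal sketch using the values from the question; the variable names hex_text and b64_text are illustrative and not from the original post.

```python
import base64
import hashlib

pw = 'wh#teV)r'
salt = 'b7u2qw^T&^#U#Lta)hvx7ivRoxr^tDyua'
digest = hashlib.sha512((pw + salt).encode('utf-8')).digest()

# Option 1: hexadecimal text (128 characters for SHA-512).
# This is the same value hashlib's hexdigest() would return.
hex_text = digest.hex()

# Option 2: Base64 text (88 characters for SHA-512).
b64_text = base64.b64encode(digest).decode('ascii')

print(hex_text)
print(b64_text)

# Both text forms convert back to the original 64 raw bytes.
assert bytes.fromhex(hex_text) == digest
assert base64.b64decode(b64_text) == digest
```

Either string can be stored or printed safely, and unlike a forced UTF-8 decode, both are reversible.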
Related
I am trying to serialize a bytes object - which is an initialization vector for my program's encryption. But, the Google Protocol Buffer only accepts strings. It seems like the error starts with casting bytes to string. Am I using the correct method to do this? Thank you for any help or guidance!
Or also, can I make the Initialization Vector a string object for AES-CBC mode encryption?
Code
Cast the bytes to a string
string_iv = str(bytes_iv, 'utf-8')
Serialize the string using SerializeToString():
serialized_iv = IV.SerializeToString()
Use ParseFromString() to recover the string:
IV.ParseFromString( serialized_iv )
And finally, UTF-8 encode the string back to bytes:
bytes_iv = bytes(IV.string_iv, encoding= 'utf-8')
Error
string_iv = str(bytes_iv, 'utf-8')
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x9b in position 3: invalid start byte
If you must cast an arbitrary bytes object to str, these are your options:
Simply call str() on the object. This turns it into its repr form, i.e. something that could be parsed as a bytes literal, e.g. "b'abc\x00\xffabc'".
Decode with "latin1". This always works, even though it technically makes no sense if the data isn't text encoded with Latin-1.
Use base64 or base85 encoding (the standard library has a base64 module which covers both).
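The three options above can be sketched roughly as follows. The sample value of bytes_iv is made up for illustration (it places byte 0x9b at position 3 to mirror the error in the question); a real IV would come from a random source.

```python
import base64

# Hypothetical sample IV; byte 0x9b at position 3 is not a valid UTF-8 start byte.
bytes_iv = b'\x01\x02\x03\x9b\xff\x00abc'

# 1. str() gives the repr form, which suits display more than transport.
as_repr = str(bytes_iv)

# 2. Latin-1 maps every byte 0-255 to one code point, so decoding never fails
#    and encoding with the same codec restores the original bytes.
as_latin1 = bytes_iv.decode('latin1')
assert as_latin1.encode('latin1') == bytes_iv

# 3. Base64 produces plain ASCII text that round-trips cleanly.
as_b64 = base64.b64encode(bytes_iv).decode('ascii')
assert base64.b64decode(as_b64) == bytes_iv

print(as_repr)
print(as_b64)
```

Of the three, Base64 is usually the easiest to pass through a field that only accepts text, since the result contains only ASCII letters, digits, and a few symbols.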
How do I print a bytes string without the b' prefix in Python 3?
>>> print(b'hello')
b'hello'
Use decode:
>>> print(b'hello'.decode())
hello
If the bytes already use an appropriate character encoding, you can print them directly:
sys.stdout.buffer.write(data)
or
nwritten = os.write(sys.stdout.fileno(), data) # NOTE: it may write less than len(data) bytes
If the data is in a UTF-8 compatible format, you can convert the bytes to a string.
>>> print(str(b"hello", "utf-8"))
hello
Optionally, convert to hex first if the data is not UTF-8 compatible (e.g. data is raw bytes).
>>> from binascii import hexlify
>>> print(hexlify(b"\x13\x37"))
b'1337'
>>> print(str(hexlify(b"\x13\x37"), "utf-8"))
1337
>>> from codecs import encode # alternative
>>> print(str(encode(b"\x13\x37", "hex"), "utf-8"))
1337
According to the source for bytes.__repr__, the b'' is baked into the method.
One workaround is to manually slice off the b'' from the resulting repr():
>>> x = b'\x01\x02\x03\x04'
>>> print(repr(x))
b'\x01\x02\x03\x04'
>>> print(repr(x)[2:-1])
\x01\x02\x03\x04
To show or print:
<byte_object>.decode("utf-8")
To encode or save:
<str_object>.encode('utf-8')
I am a little late, but for Python 3.9.1 this worked for me and removed the b' prefix:
print(outputCode.decode())
It's so simple...
(With that, you can encode the dictionary and list bytes, then you can stringify it using json.dump / json.dumps)
You just need to use base64
import base64
data = b"Hello world!" # Bytes
data = base64.b64encode(data).decode() # Returns a base64 string, which can be decoded without error.
print(data)
Some bytes cannot be decoded as text by default (image data is one example). Base64 encodes those bytes into ASCII bytes that can always be decoded to a string. To retrieve the original bytes, just use
data = base64.b64decode(data.encode())
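Since JSON cannot hold bytes directly, the same Base64 round trip is one way to carry binary values inside a dictionary that is then passed to json.dumps. A minimal sketch, where the record layout and the payload bytes are invented for illustration:

```python
import base64
import json

# Hypothetical binary payload; these bytes are not valid UTF-8.
raw = bytes(range(250, 256))

# Store the bytes as Base64 text so json.dumps can serialize the record.
record = {"name": "example", "payload": base64.b64encode(raw).decode("ascii")}
text = json.dumps(record)
print(text)

# On the receiving side, reverse the two steps.
restored = json.loads(text)
assert base64.b64decode(restored["payload"]) == raw
```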
Use decode() instead of encode() for converting bytes to a string.
>>> import curses
>>> print(curses.version.decode())
2.2
I am trying to make a web service call in Python 3. A subset of the request includes a base64 encoded string, which is coming from a list of Python dictionaries.
So I dump the list and encode the string:
j = json.dumps(dataDictList, indent=4, default = myconverter)
encodedData = base64.b64encode(j.encode('ASCII'))
Then, when I build my request, I add in that string. Because it comes back in bytes I need to change it to string:
...
\"data\": \"''' + str(encodedData) + '''\"
...
The response I'm getting from the web service is that my request is malformed. When I print out str(encodedData) I get:
b'WwogICAgewogICAgICAgICJEQVlfREFURSI6ICIyMDEyLTAzLTMxIDAwOjAwOjAwIiwKICAgICAgICAiQ0FMTF9DVFJfSUQiOiA1LAogICAgICAgICJUT1RfRE9MTEFSX1NBTEVTIjogMTk5MS4wLAogICAgICAgICJUT1RfVU5JVF9TQUxFUyI6IDQ0LjAsCiAgICAgICAgIlRPVF9DT1NUIjogMTYxOC4xMDM3MDAwMDAwMDA2LAogICAgICAgICJHUk9TU19ET0xMQVJfU0FMRVMiOiAxOTkxLjAKICAgIH0KXQ=='
If I copy this into a base64 decoder, I get gibberish until I remove the b' at the beginning as well as the last single quote. I think those are causing my request to fail. According to this note, though, I would think that the b' is ignored: What does the 'b' character do in front of a string literal?
I'll appreciate any advice.
Thank you.
Passing a bytes object into str causes it to be formatted for display. It doesn't convert the bytes into a string (you need to know the encoding for that to work):
In [1]: x = b'hello'
In [2]: str(x)
Out[2]: "b'hello'"
Note that str(x) actually starts with b' and ends with '. If you want to decode the bytes into a string, use bytes.decode:
In [5]: x = base64.b64encode(b'hello')
In [6]: x
Out[6]: b'aGVsbG8='
In [7]: x.decode('ascii')
Out[7]: 'aGVsbG8='
You can safely decode the base64 bytes as ASCII. Also, your JSON should be encoded as UTF-8, not ASCII. The following changes should work:
j = json.dumps(dataDictList, indent=4, default=myconverter)
encodedData = base64.b64encode(j.encode('utf-8')).decode('ascii')
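Beyond the decode fix, it may also help to let json.dumps build the request body rather than concatenating quoted strings by hand. The sketch below is only an illustration: the sample list stands in for dataDictList, and the single "data" key is an assumption, since the rest of the request is elided in the question.

```python
import base64
import json

# Hypothetical stand-in for the asker's dataDictList.
data_dict_list = [{"DAY_DATE": "2012-03-31 00:00:00", "CALL_CTR_ID": 5}]

# JSON text -> UTF-8 bytes -> Base64 bytes -> ASCII str.
encoded_data = base64.b64encode(
    json.dumps(data_dict_list, indent=4).encode("utf-8")
).decode("ascii")

# json.dumps handles quoting and escaping of the outer request.
request_body = json.dumps({"data": encoded_data})
print(request_body)

# The receiver can reverse every step and recover the original list.
payload = json.loads(request_body)["data"]
assert json.loads(base64.b64decode(payload).decode("utf-8")) == data_dict_list
```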
I am experimenting with encrypting and decrypting using PyCrypto.AES. When I try to decrypt, it gives me TypeError: 'str' does not support the buffer interface
I found some solutions where I have to encode or use string, but I couldn't figure how to use it.
AESModule.py
from Crypto.Cipher import AES
#base64 is used for encoding. dont confuse encoding with encryption#
#encryption is used for disguising data
#encoding is used for putting data in a specific format
import base64
# os is for urandom, which is an accepted producer of randomness that
# is suitable for cryptology.
import os
def encryption(privateInfo,secret,BLOCK_SIZE):
    #32 bytes = 256 bits
    #16 = 128 bits
    # the block size for cipher obj, can be 16 24 or 32. 16 matches 128 bit.
    # the character used for padding
    # used to ensure that your value is always a multiple of BLOCK_SIZE
    PADDING = '{'
    # function to pad the functions. Lambda
    # is used for abstraction of functions.
    # basically, its a function, and you define it, followed by the param
    # followed by a colon,
    # ex = lambda x: x+5
    pad = lambda s: s + (BLOCK_SIZE - len(s) % BLOCK_SIZE) * PADDING
    # encrypt with AES, encode with base64
    EncodeAES = lambda c, s: base64.b64encode(c.encrypt(pad(s)))
    # generate a randomized secret key with urandom
    #secret = os.urandom(BLOCK_SIZE)
    print('Encryption key:',secret)
    # creates the cipher obj using the key
    cipher = AES.new(secret)
    # encodes you private info!
    encoded = EncodeAES(cipher, privateInfo)
    print('Encrypted string:', encoded)
    return(encoded)
def decryption(encryptedString,secret):
    PADDING = '{'
    DecodeAES = lambda c, e: c.decrypt(base64.b64decode(e)).rstrip(PADDING)
    #Key is FROM the printout of 'secret' in encryption
    #below is the encryption.
    encryption = encryptedString
    cipher = AES.new(secret)
    decoded = DecodeAES(cipher, encryption)
    print(decoded)
test.py
import AESModule
import base64
import os
BLOCK_SIZE = 16
key = os.urandom(BLOCK_SIZE)
c = AESModule.encryption('password',key,BLOCK_SIZE)
AESModule.decryption(c,key)
Strings (str) are text. Encryption does not deal in text, it deals in bytes (bytes).
In practice, insert .encode and .decode calls where necessary to convert between the two. I recommend UTF-8 encoding.
In your case, since you are already encoding and decoding the ciphertext as base-64, which is another bytes/text conversion, you only need to encode and decode your plaintext. Encode your string with .encode("utf-8") when passing it into the encryption function, and decode the final result with .decode("utf-8") when getting it out of the decryption function.
If you're reading examples or tutorials, make sure they are for Python 3. In Python 2, str was a byte string and it was commonplace to use it for both text and bytes, which was very confusing. In Python 3 they fixed it.
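To make the text/bytes boundaries concrete, here is a rough sketch of where the conversions would go. It deliberately leaves out the real cipher: placeholder_cipher is a hypothetical stand-in that performs no encryption at all, used only so the example runs with the standard library. In the real module, cipher.encrypt and cipher.decrypt would take its place. Note that the padding character also has to be bytes once the plaintext is bytes.

```python
import base64

BLOCK_SIZE = 16
PADDING = b'{'  # bytes, not str, because it is appended to bytes

def pad(data: bytes) -> bytes:
    # Same padding rule as the question: fill up to a multiple of BLOCK_SIZE.
    return data + (BLOCK_SIZE - len(data) % BLOCK_SIZE) * PADDING

def placeholder_cipher(data: bytes) -> bytes:
    # NOT encryption. Hypothetical stand-in for cipher.encrypt / cipher.decrypt.
    return data

plaintext = 'password'                                      # str
padded = pad(plaintext.encode('utf-8'))                     # str -> bytes
encoded = base64.b64encode(placeholder_cipher(padded))      # bytes

decrypted = placeholder_cipher(base64.b64decode(encoded))   # bytes
recovered = decrypted.rstrip(PADDING).decode('utf-8')       # bytes -> str
print(recovered)
```

The only str values are at the two ends (plaintext and recovered); everything in between stays bytes.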
I have a base64 encoded string S="aGVsbG8=". I want to decode the string into ASCII, UTF-8, UTF-16, UTF-32, CP-1256, ISO-8859-1, ISO-8859-2, ISO-8859-6, ISO-8859-15 and Windows-1252. How can I decode the string into the mentioned formats? For UTF-16 I tried the following code, but it gave the error "'bytes' object has no attribute 'deocde'".
base64.b64decode(encodedBase64String).deocde('utf-8')
Please read the doc or docstring for the 3.x base64 module. The module works with bytes, not text. So your base64 encoded 'string' would be a byte string B = b"aGVsbG8=". The result of base64.decodebytes(B) is bytes: binary data with whatever encoding it has (text or image or ...). In this case, it is b'hello', which can be viewed as ASCII-encoded text. To change to other encodings, first decode to Unicode text and then encode to bytes in whatever other encoding you want. Most of the encodings you list above will produce the same bytes.
>>> B=b"aGVsbG8="
>>> b = base64.decodebytes(B)
>>> b
b'hello'
>>> t = b.decode()
>>> t
'hello'
>>> t.encode('utf-8')
b'hello'
>>> t.encode('utf-16')
b'\xff\xfeh\x00e\x00l\x00l\x00o\x00'
>>> t.encode('utf-32')
b'\xff\xfe\x00\x00h\x00\x00\x00e\x00\x00\x00l\x00\x00\x00l\x00\x00\x00o\x00\x00\x00'
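The same decode-then-encode step can be looped over all of the encodings named in the question. This is a sketch that assumes the codec names below, which are the spellings Python's codec registry accepts for those encodings; the names text and results are illustrative.

```python
import base64

# Base64 bytes -> raw bytes -> text.
text = base64.b64decode(b"aGVsbG8=").decode("ascii")

encodings = ["ascii", "utf-8", "utf-16", "utf-32", "cp1256",
             "iso-8859-1", "iso-8859-2", "iso-8859-6", "iso-8859-15", "cp1252"]

# Encode the same text once per target encoding.
results = {name: text.encode(name) for name in encodings}
for name, encoded in results.items():
    print(f"{name:12} {encoded!r}")
```

For plain ASCII text such as this, only the UTF-16 and UTF-32 results differ from the rest, because they use more than one byte per character and prepend a byte order mark.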