Google Protocol Buffers (protobuf) in Python3 - trouble with ParseFromString (encoding?)

I've got Google Protocol buffers 80% working in Python3. My .proto file works, I'm encoding data, life is almost good.
The problem is that I can't ParseFromString the result of SerializeToString.
When I print SerializeToString it looks like what I'd expect, a fairly compact binary representation (preceded by b').
My guess is that this comes down to a difference in the way Python 2 and Python 3 handle strings. The output of SerializeToString is bytes, not a string.
Printed output of SerializeToString (Python type is <class 'bytes'>):
b'\x10\xd7\xeb\x8e\xcd\x04\x1a\x0cnamegoeshere2#\x08\x80\xf8\xde\xc3\x9f\xb0\x81\x89\x14\x11\x00\x00\x00\x00\x00\x80d\xc0\x19\x00\x00\x00\x00\x00\xc0m#!\x00\x00\x00\x00\x00\x80R\xc0)\x00\x00\x00\x00\x00x\xb7\xc01\x00\x00\x00\x00\x00\x8c\x95#9\x00\x00\x00\x00\x00\x16\xb2#'
result of ParseFromString(message):
None
No error is provided...
So my best guess is that all I need to do is .decode() the bytes object that was generated; the problem is that I have no clue what the encoding is. I've tried UTF-8, UTF-16, Latin-1, and a few others without success. My Google-Fu is strong but I haven't found anything on this.
Any help would be appreciated.

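ParseFromString is an instance method: it does not return the parsed message, but rather fills in self with the parsed content. The bytes from SerializeToString can be passed in directly, with no decoding step. Use it like: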
message = MyMessageType()
message.ParseFromString(data)
print(message.some_field)

Related

base64.encodestring returns error Unicode objects must be encoded before hashing

I am using the following code, which works fine on Python 2.7. On Python 3.7 it raises the error 'Unicode objects must be encoded before hashing'. Can someone please tell me the equivalent for Python 3.7?
base64.encodestring(hashlib.sha256(any_string).digest()).strip()
A lot of downstream code depends on this so I cannot change this algo. I want the same output in python3.7.
Any pointers would be appreciated.
base64.encodestring(hashlib.sha256(any_string.encode('UTF-8')).digest()).strip()
In Python 3+, unicode objects (strings) and bytes are handled differently than in Python 2. The sha256 function requires bytes rather than unicode, which is why the error appears. Adding .encode('UTF-8') to the string gives the sha256 function the input it needs. I have tested this in both Python 2.7 and 3.7; both work correctly and give the same output.
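As a sketch of the same fix for newer interpreters: base64.encodestring was a deprecated alias that was removed in Python 3.9, and base64.encodebytes is its replacement. The helper name sha256_base64 below is made up for illustration and does not come from the question.

```python
import base64
import hashlib


def sha256_base64(text):
    """Return the base64-encoded SHA-256 digest of a str, as bytes.

    sha256_base64 is a hypothetical helper name used only for this sketch.
    """
    # hashlib works on bytes, so the str has to be encoded first.
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    # encodebytes appends a trailing newline, hence the strip().
    return base64.encodebytes(digest).strip()


encoded = sha256_base64("abc")
print(type(encoded))            # the result is bytes, not str
print(encoded.decode("ascii"))  # decode if downstream code expects text
```

Note that the result is a bytes object in Python 3; if downstream code compares it with str values, it needs the final .decode("ascii") step.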

bytes().decode provides neither Error nor Output

print(bytes(97).decode('utf8'))
This line is part of my project and it has been giving me a headache for a while now. If I run it, the output is just an empty space, with no error or anything else. The 97 came from encoding 'a' with utf8. I want to work with the encoded numbers, so I changed them to integers, but I can't get them to decode once I'm done working with them.
If you run bytes(97), you get a bytes object made of 97 zero bytes: b'\x00\x00\x00\x00\...
If you run bytes([97]) instead, you get b'a', which I assume is what you expected.
EDIT:
I just found this question which explains the why:
Converting int to bytes in Python 3
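A short sketch of the difference, using a smaller count so the output stays readable. The variable names are illustrative only.

```python
# An int argument means "this many zero bytes".
zeros = bytes(3)
print(zeros)  # b'\x00\x00\x00'

# An iterable of ints means "these byte values".
single = bytes([97])
print(single)                 # b'a'
print(single.decode("utf8"))  # a

# Round trip: text -> list of ints -> text.
codes = list("a".encode("utf8"))
restored = bytes(codes).decode("utf8")
print(codes, restored)  # [97] a

# The original line decodes 97 NUL characters, which print as nothing visible.
invisible = bytes(97).decode("utf8")
print(len(invisible))  # 97
```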

Node.js wrong UTF8 string representation, even though byte-codes seem correct

Having searched around for a while now, I believe my problem may not be directly related to what others had. I am using unicode chars in forms (using angularjs for client-side) and noticed that the UTF8 strings didn't display on the server logs properly. Thus I decided to base64.encode all strings on the client side before submitting to the server (nodejs/express4). The JSON data arrives properly to the server, but when I try to convert it from base64 to UTF8 using a buffer I'm getting different symbols. I tested the strings on http://www.base64decode.org/ and they decode fine. Can anyone suggest what I might be doing wrong?
Example char: σ, base64="z4M=".
On the server this line decodes all JSON values to UTF8:
Object.keys(req.body).forEach(function(key) { req.body[key] = new Buffer(req.body[key], 'base64').toString('utf8'); });
And the "σ" char becomes "Ο" on the server. Anyone can assist?
"Thus I decided to base64.encode all strings on the client side before submitting to the server (nodejs/express4)."
No need to, really. Probably the thing you were doing wrong with utf-8 json is also wrong now.
Try to debug that.
"noticed that the UTF8 strings didn't display on the server logs properly."
What do they display?
And on what OS are you?
Did you look at the logs with a hex viewer?
To me this looks like a typical "I have a problem X, got halfway through my solution, but I am stuck on a sub-problem Y". Go back to X and attack it the right way (no base64).
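The byte-level claim in the question can be checked outside Node. The Python sketch below decodes the same base64 value and then decodes the resulting bytes two ways. Reading the bytes as ISO-8859-7 (a single-byte Greek code page) is only a guess at how a log viewer or console might produce "Ο"; nothing in the question confirms which code page is involved.

```python
import base64

raw = base64.b64decode("z4M=")
print(raw)  # b'\xcf\x83'

# Decoded as UTF-8, the two bytes form the single character sigma.
as_utf8 = raw.decode("utf-8")
print(as_utf8)

# Assumption: a viewer that reads the same bytes as ISO-8859-7 shows
# capital omicron for 0xCF, followed by an invisible control character.
as_greek = raw.decode("iso8859_7")
print(repr(as_greek))
```

If this is what is happening, the decoded string on the server is fine and the problem is in how the output is displayed, which fits the advice to inspect the logs with a hex viewer.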

Writing text files with node webkit

I'm trying to do something rather simple: write a text file with data entered in a text input field to a file...
var data = document.getElementById("fileContent").value;
fs.writeFileSync("test.txt", data);
For instance if I type in,
Write this to file 123 123
I end up with this in the file...
Write this to
If I hard code a string into the application, it writes correctly.
fs.writeFileSync("test.txt", "this is a hard coded string");
I tried using writeFileSync with and without the encoding parameter set. I've tried createWriteStream with and without the encoding parameter set. I've tried fileOpen, fs.writeSync, and fs.close. I even tried converting the data to a Buffer object and writing that. In every case, I got the exact same results.
The encoding is also strange. Notepad++ indicates that the encoding is "UCS2-LE w/o BOM". I'd expect it to be UTF-8, as I've been setting the encoding parameter to that.
Any thoughts?
It's a bug in Node-Webkit v0.9.*
It's OK if you use Node-Webkit v0.8.* or a lower version.
After some more research and determining it was something with encoding, I stumbled on this post. Apparently, utf8 doesn't work...
https://groups.google.com/forum/#!msg/node-webkit/3M-0v92o9Zs/eSYnSZ8dUK0J
I changed the encoding to "utf16le", and this appears to write the text correctly for hard-coded text and text from a text box.

How to convert between bytes and strings in Python 3?

This is a Python 101 type question, but it had me baffled for a while when I tried to use a package that seemed to convert my string input into bytes.
As you will see below I found the answer for myself, but I felt it was worth recording here because of the time it took me to unearth what was going on. It seems to be generic to Python 3, so I have not referred to the original package I was playing with; it does not seem to be an error (just that the particular package had a .tostring() method that was clearly not producing what I understood as a string...)
My test program goes like this:
import mangler # spoof package
stringThing = """
<Doc>
<Greeting>Hello World</Greeting>
<Greeting>你好</Greeting>
</Doc>
"""
# print out the input
print('This is the string input:')
print(stringThing)
# now make the string into bytes
bytesThing = mangler.tostring(stringThing) # pseudo-code again
# now print it out
print('\nThis is the bytes output:')
print(bytesThing)
The output from this code gives this:
This is the string input:
<Doc>
<Greeting>Hello World</Greeting>
<Greeting>你好</Greeting>
</Doc>
This is the bytes output:
b'\n<Doc>\n <Greeting>Hello World</Greeting>\n <Greeting>\xe4\xbd\xa0\xe5\xa5\xbd</Greeting>\n</Doc>\n'
So, there is a need to be able to convert between bytes and strings, to avoid ending up with non-ascii characters being turned into gobbledegook.
The 'mangler' in the above code sample was doing the equivalent of this:
bytesThing = stringThing.encode(encoding='UTF-8')
There are other ways to write this (notably using bytes(stringThing, encoding='UTF-8')), but the above syntax makes it obvious what is going on, and also what to do to recover the string:
newStringThing = bytesThing.decode(encoding='UTF-8')
When we do this, the original string is recovered.
Note, using str(bytesThing) just transcribes all the gobbledegook without converting it back into Unicode, unless you specifically request UTF-8, viz., str(bytesThing, encoding='UTF-8'). No error is reported if the encoding is not specified.
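A minimal sketch of the round trip and of the str() pitfall described above, using one line of the sample document. The variable names are illustrative.

```python
text = "<Greeting>你好</Greeting>"

# str -> bytes
data = text.encode("utf-8")
print(data)

# bytes -> str recovers the original text.
round_trip = data.decode("utf-8")
print(round_trip == text)  # True

# str() without an encoding returns the repr of the bytes object,
# including the leading b' and the backslash escapes.
as_repr = str(data)
print(as_repr)

# str() with an encoding behaves like decode().
as_text = str(data, encoding="utf-8")
print(as_text == text)  # True
```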
In Python 3, the bytes() constructor accepts the same encoding argument as encode().
str1 = b'hello world'
str2 = bytes("hello world", encoding="UTF-8")
print(str1 == str2) # Returns True
I didn't read anything about this in the docs, but perhaps I wasn't looking in the right place. This way you can explicitly turn strings into byte streams, which I find more readable than using encode and decode, and you don't have to prefix the quotes with b.
"This is a Python 101 type question"
It's a simple question but one where the answer is not so simple.
In python3, a "bytes" object represents a sequence of bytes, a "string" object represents a sequence of unicode code points.
To convert from "bytes" to "string" and from "string" back to "bytes" you use the bytes.decode and str.encode methods. These take two parameters: an encoding and an error handling policy.
Sadly there are an awful lot of cases where sequences of bytes are used to represent text, but it is not necessarily well-defined what encoding is being used. Take for example filenames on Unix-like systems: as far as the kernel is concerned they are a sequence of bytes with a handful of special values. On most modern distros most filenames will be UTF-8, but there is no guarantee that all filenames will be.
If you want to write robust software then you need to think carefully about those parameters. You need to think carefully about what encoding the bytes are supposed to be in and how you will handle the case where they turn out not to be a valid sequence of bytes for the encoding you thought they should be in. Python defaults to UTF-8 and to raising an error on any byte sequence that is not valid UTF-8.
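To make the error handling policy concrete, here is a small sketch that decodes one invalid byte sequence under several standard policies. The sample bytes are made up for illustration; surrogateescape is included because it is the handler Python itself uses for filenames that are not valid in the filesystem encoding.

```python
# 'café' encoded as Latin-1; a lone \xe9 is not valid UTF-8.
bad = b"caf\xe9"

# Default policy ("strict"): raise an error.
try:
    bad.decode("utf-8")
    strict_failed = False
except UnicodeDecodeError as exc:
    strict_failed = True
    print("strict:", exc)

# "ignore" silently drops the invalid byte.
ignored = bad.decode("utf-8", errors="ignore")

# "replace" substitutes U+FFFD, the replacement character.
replaced = bad.decode("utf-8", errors="replace")

# "surrogateescape" smuggles the byte through as a lone surrogate,
# so the original bytes can be recovered when encoding again.
escaped = bad.decode("utf-8", errors="surrogateescape")
restored = escaped.encode("utf-8", errors="surrogateescape")

print(repr(ignored), repr(replaced), repr(escaped), restored)
```

Which policy is right depends on the data: "ignore" and "replace" lose information, while "surrogateescape" keeps it at the cost of producing strings that cannot be encoded normally.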
print(bytesThing)
A bytes object has no text form of its own, so Python uses "repr" as the fallback conversion to string. repr attempts to produce Python code that will recreate the object. In the case of a bytes object this means, among other things, escaping bytes outside the printable ASCII range.
Try this:
StringVariable = ByteVariable.decode('UTF-8', 'ignore')
To test the type:
print(type(StringVariable))
Here StringVariable is a str and ByteVariable is a bytes object. The names are placeholders, not the variables from the question. Note that 'ignore' silently drops any bytes that are not valid UTF-8.
