Protobuf RuntimeWarning: Unexpected end-group tag: Not all data was converted (python-3.x)

I have a UTF-8 encoded file from which I need to extract a hexadecimal dump that holds a protobuf message, and then feed it to protobuf. The .proto file works as expected and life is almost perfect.
import binascii
message_content = message_content.replace(" ","")
message_content = binascii.unhexlify(message_content)
This converts the hex string to raw bytes, which I then feed to protobuf:
msg.ParseFromString(message_content)
which results in the error:
RuntimeWarning: Unexpected end-group tag: Not all data was converted
msg.ParseFromString(message_content)
I can't tell whether I am extracting the hex part incorrectly or whether the data is corrupted.
message_content looks like this:
b"87\x00\x00C\x17\x11\x10j\x17\x11\x10\x0c\x00\xc2\x00\x08\xec\xad\xe8\xe0\xf9\x04\x10\x01\x1a\x1f\x08\xea\xae\x18\x12\x14\x01\x00\x0f\x00\x02\x02|\xf0%\x00\x01&\x00\x01'\x00\x01*\x00\x01*\x01\x00\x1a\x00\x1a \x08\xea\xae\x14\x12\x14\x01\x00\x0f\x00\x02\x02|\xf0%\x00\x01&\x00\x01'\x00\x01(\x00\x01*\x02\x00\x00\x1a\x00\x1a#\x08\xea.\x12\x14\x01\x00\x0f\x00\x02\x02|\xf0%\x00\x01&\x00\x011\x00\x012\x00\x01*\x06\x00\x00\x00\x00\x00\x00\x1a\x00\x1a \x08\xea\xae\x14\x12\x14\x01\x00\x0f\x00\x02\x02|\xf0%\x00\x01&\x00\x01'\x00\x01(\x00\x02*\x02\x00\x00\x1a\x00\x1a\x1d\x08\xea\xae\x0c\x12\x11\x01\x00\x0f\x00\x02\x02|\xf0%\x00\x01&\x00\x011\x00\x01*\x02\x00\x00\x1a\x00"
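To narrow down whether the extraction or the data is at fault, it can help to walk the protobuf wire format by hand. Every field starts with a varint key: the key shifted right by 3 is the field number, and the low 3 bits are the wire type. Wire type 4 is the end-group tag the warning refers to, so a stray byte such as 0x0c reads as "field 1, end group". The sketch below uses made-up helper names (read_varint, dump_tags) and only walks top-level fields; it is a debugging aid, not a replacement for ParseFromString.

```python
# Wire types from the protobuf encoding; 3 and 4 are the (deprecated) group markers
WIRE_TYPES = {0: "varint", 1: "64-bit", 2: "length-delimited",
              3: "start group", 4: "end group", 5: "32-bit"}


def read_varint(data, pos):
    """Decode a base-128 varint starting at pos; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        if pos >= len(data):
            raise ValueError("truncated varint at offset %d" % pos)
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def dump_tags(data):
    """List (offset, field_number, wire_type) for top-level fields.

    Stops at the first tag that cannot start a normal top-level field:
    field number 0, a group tag, or an undefined wire type (6 or 7).
    """
    pos = 0
    tags = []
    while pos < len(data):
        start = pos
        key, pos = read_varint(data, pos)
        field, wire = key >> 3, key & 0x7
        tags.append((start, field, WIRE_TYPES.get(wire, "invalid")))
        if field == 0 or wire not in (0, 1, 2, 5):
            break
        if wire == 0:
            _, pos = read_varint(data, pos)
        elif wire == 1:
            pos += 8
        elif wire == 5:
            pos += 4
        else:  # length-delimited: skip over the payload
            length, pos = read_varint(data, pos)
            pos += length
    return tags


print(dump_tags(b"\x08\x96\x01\x0c"))
```

Calling dump_tags(message_content) and checking the offset of the last entry shows where the bytes stop looking like a well-formed message. For the bytes shown above, the walk already stops at offset 2 with field number 0, which no valid message uses; that hints (without proving it) that the first bytes of the dump may not belong to the protobuf message.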

I had a similar problem. In the end, I found that the data provided by the upstream source was faulty. Check whether there is a problem with your base64 data source as well.

I ran into the same error.
The faulty code was:
rsp_serialize = base64.b64decode(str(rsp))
item_info = StrucItem()
item_info.ParseFromString(rsp_serialize)
The correct code is:
rsp_serialize = base64.b64decode(str(rsp, encoding="utf-8"))
item_info = StrucItem()
item_info.ParseFromString(rsp_serialize)
So you should check whether your upstream data is OK.
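To see why str(rsp) breaks the decoding: when rsp is a bytes object, str(rsp) returns its repr, "b'...'", not the text inside it. base64.b64decode (in its default non-validating mode) discards the quote characters, but b is a valid base64 letter, so it is treated as data. A minimal sketch, assuming rsp holds the base64 text as bytes; the payload is an arbitrary example:

```python
import base64
import binascii

payload = b"\x08\x96\x01"           # example protobuf bytes: field 1, varint 150
rsp = base64.b64encode(payload)     # upstream hands over the base64 text as bytes

wrong = str(rsp)                    # repr of the bytes object, leading b and quotes included
right = str(rsp, encoding="utf-8")  # the actual base64 text

# b64decode drops the quote characters, but "b" is a valid base64 letter,
# so it is decoded as data: depending on the length this either raises
# binascii.Error or silently returns different bytes.
try:
    bad = base64.b64decode(wrong)
except binascii.Error as exc:
    bad = exc

good = base64.b64decode(right)
direct = base64.b64decode(rsp)      # b64decode also accepts bytes, so str() is not needed

print(repr(wrong), repr(right), bad, good == payload, direct == payload)
```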

Related

How to send a pickled object across a server with encoding? Python 3

I want to send a pickled, encoded version of an Account object to my server, then decode it on the server side and reconstruct the object with its data. However, I am unsure how to convert it back from a string to bytes (?) and then to an object.
On the client's end, this is essentially what happens:
import pickle
command = 'insert account'
tempAccount = Account('Isabel', 'password')
pickledAcc = pickle.dumps(tempAccount)
clientCommand = f"{command},{pickledAcc}".encode('utf-8')
client.send(clientCommand)
However, on the server's side, the pickledAcc part arrives as an empty string.
I have simplified my code a bit, but I think the essentials are there; if you need more, I can provide it. I should also mention that I use proper length framing, i.e. I send a message beforehand telling the server how long this message will be. All of my server infrastructure works properly :)
Basically, I just need to know whether it is possible to encode the pickled Account object and send it, or whether this will never work.
The problem with the f-string line is that it inserts the __repr__ of pickledAcc instead of the actual bytes, so it will not give you the result you want.
For example:
import pickle
command = "test"
pickledAcc = pickle.dumps("test_data")
clientCommand = f"{command},{pickledAcc}".encode('utf-8')
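clientCommand now holds the following (this is the output of pickle protocol 3, the default before Python 3.8; newer versions produce different pickle bytes):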
b"test,b'\\x80\\x03X\\t\\x00\\x00\\x00test_dataq\\x00.'"
As you can see, what was encoded to UTF-8 is the textual representation of the bytes object ("b'\x80..."), not the bytes themselves.
To solve this, I suggest converting the command to bytes and building clientCommand as a bytes object instead.
Hope that helps.
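A minimal sketch of that bytes-only approach, assuming the command itself never contains a comma. The Account dataclass below is a hypothetical stand-in for the question's class. Because pickled data can itself contain the byte b",", the server splits only on the first comma:

```python
import pickle
from dataclasses import dataclass


@dataclass
class Account:  # hypothetical stand-in for the question's Account class
    name: str
    password: str


# Client side: build the message from bytes, never through an f-string
command = "insert account"
pickled_acc = pickle.dumps(Account("Isabel", "password"))
client_command = command.encode("utf-8") + b"," + pickled_acc

# Server side: split on the first comma only, since the pickle may contain b","
raw_command, raw_acc = client_command.split(b",", 1)
received = pickle.loads(raw_acc)
print(raw_command.decode("utf-8"), received)
```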
Alternatively, base64-encode the pickled bytes so they can travel inside the UTF-8 text message.
Client side:
import base64
##--snip--##
pickledAcc = base64.b64encode(pickledAcc).decode()
clientCommand = f"{command},{pickledAcc}".encode('utf-8')
client.send(clientCommand)
Server Side:
import base64
##--snip--##
pickledAcc = base64.b64decode(pickledAcc)
pickledAcc = pickle.loads(pickledAcc)
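With base64, the pickled bytes become plain ASCII, and the base64 alphabet (A-Z, a-z, 0-9, +, / and =) contains no comma, so the server can split the decoded message on the first comma safely. A minimal end-to-end sketch, using a dict as a stand-in for the Account object and assuming the command itself contains no comma:

```python
import base64
import pickle

# Client side
command = "insert account"
pickled_acc = pickle.dumps({"name": "Isabel", "password": "password"})
encoded_acc = base64.b64encode(pickled_acc).decode("ascii")  # plain ASCII text
client_command = f"{command},{encoded_acc}".encode("utf-8")

# Server side: base64 text never contains ",", so splitting is safe
text = client_command.decode("utf-8")
received_command, received_acc = text.split(",", 1)
account = pickle.loads(base64.b64decode(received_acc))
print(received_command, account)
```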

Groovy encodeBase64() returning unexpected result for PNG image file

I am trying to convert a PNG image file to Base64 encoding in Groovy.
Here is my code:
ImageFile = new File("D:/DATA/CustomScript/Logo.png").text;
String encoded = ImageFile.getBytes().encodeBase64().toString();
I get the following result:
iVBORw0KGgoAAAANSUhEUgAAAIQAAABPCAIAAAClCfqHAAAABGdBTUEAALE/C/xhBQAAAAlwSFlzAAAOwwAADsMBx2+oZAAAAQ1JREFUeF7t1KGRgwAURdFVyHQbSwOkKlrIoECDSwusoYgDcz97396Z/3eGUQxIMSDFgBQDUgxIMSDFgBQDUgxIMSDFgBQDUgxIMSDFgBQDUgxIMSDFgBQDUgxIMSDFgBQDUgxIMSDFgBQDUgxIMSDFgBQDUgxIMSDFgBQDUgxIMSDFgBQDUgxIMSDFgBQDUgzIE2IcxzHP87qu176tJ8T4/X7Lsuz7fu3b6k1BigEpBqQYP2JAigEpBqQYP2JAigEpBqQYP2JAigEpBqQYP2JAigEpBqQYP2JAigEpBqQYP2JAigEpBqQYP2JAnhNj27ZxHN/v9/f7vU5385wYn8/n9XoNwzBN03W6l/P8BwSpsfw4c1/6AAAAAElFTkSuQmCC
The same image when passed through https://www.base64encode.org/ gives this result:
iVBORw0KGgoAAAANSUhEUgAAAIQAAABPCAIAAAClCfqHAAAABGdBTUEAALGPC/xhBQAAAAlwSFlzAAAOwwAADsMBx2+oZAAAAQ1JREFUeF7t1KGRgwAURdFVyHQbSwOkKlrIoECDSwusoYgDc497396Z/3eGUQxIMSDFgBQDUgxIMSDFgBQDUgxIMSDFgBQDUgxIMSDFgBQDUgxIMSDFgBQDUgxIMSDFgBQDUgxIMSDFgBQDUgxIMSDFgBQDUgxIMSDFgBQDUgxIMSDFgBQDUgxIMSDFgBQDUgzIE2IcxzHP87qu176tJ8T4/X7Lsuz7fu3b6k1BigEpBqQYkGJAigEpBqQYkGJAigEpBqQYkGJAigEpBqQYkGJAigEpBqQYkGJAigEpBqQYkGJAigEpBqQYkGJAnhNj27ZxHN/v9/f7vU5385wYn8/n9XoNwzBN03W6l/P8BwSpsfw4c1/6AAAAAElFTkSuQmCC
I have tried to highlight some of the differences; clearly, the two encoded strings are different.
The problem is that I have to pass this image's Base64 encoding to another system, which accepts the one from https://www.base64encode.org/ but rejects the one generated by Groovy.
Any ideas what I am doing wrong here?
You are hitting an encoding problem here. Binary data is not character data; character data is affected by encodings. Instead of .text, use the bytes of the file, e.g.:
def f = "/tmp/screenshot-000.png" as File
assert f.bytes.encodeBase64().toString()==("/tmp/encoded_20190208131326.txt" as File).text
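The effect can be reproduced outside Groovy. This Python sketch (Python purely for illustration) takes the 8-byte PNG signature plus one hypothetical extra byte, and compares the Base64 of the raw bytes with the Base64 of the same bytes after a round trip through text. cp1252 is an assumption standing in for whatever platform charset .text used: in cp1252, 0x89 maps to a character and survives, while 0x8F has no character and comes back as "?" (0x3F).

```python
import base64

# PNG signature (the first 8 bytes of every PNG file) plus one hypothetical byte
data = b"\x89PNG\r\n\x1a\n\x8f"

# Correct: Base64 of the raw bytes
correct = base64.b64encode(data).decode("ascii")

# Wrong: read the bytes as text first (like File.text), then turn the text back
# into bytes. Bytes without a cp1252 character become U+FFFD, then "?".
as_text = data.decode("cp1252", errors="replace")
round_tripped = as_text.encode("cp1252", errors="replace")
mangled = base64.b64encode(round_tripped).decode("ascii")

print(correct)
print(mangled)
```

Only the last characters differ, much like the two strings in the question, because only the damaged byte changes.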
The answer from user cfrick was extremely helpful, but unfortunately it didn't solve my problem. I believe the reason was that I was on an older version of Groovy.
This code eventually solved my problem:
import java.nio.file.Files
import java.util.Base64
File file = new File(imagePath);
// readAllBytes returns the raw bytes (no charset decoding) and, unlike a single
// InputStream.read() call, always reads the whole file and closes it
byte[] imageData = Files.readAllBytes(file.toPath());
String base64Image = Base64.getEncoder().encodeToString(imageData);

Python 3.4: str: AttributeError: 'str' object has no attribute 'decode'

I have this code, part of a function that replaces badly encoded foreign characters in a string:
odbc_str = "String from an old database with weird mixed encodings"
s = str(bytes(odbc_str.strip(), 'cp1252'))
s = s.replace('\\x82', 'é')
s = s.replace('\\x8a', 'è')
(...)
print(s)
# b"String from an old database with weird mixed encodings"
Here I need a "real" string, not bytes. But when I try to decode it, I get an exception:
odbc_str = "String from an old database with weird mixed encodings"
s = str(bytes(odbc_str.strip(), 'cp1252'))
s = s.replace('\\x82', 'é')
s = s.replace('\\x8a', 'è')
(...)
print(s.decode("utf-8"))
# AttributeError: 'str' object has no attribute 'decode'
Why is s bytes here?
Why can't I decode it to a real string?
What is the clean way to do it? (Currently I return s[2:][:-1], which works but is very ugly, and I would like to understand this behavior.)
Thanks in advance!
EDIT:
pypyodbc in Python 3 uses Unicode by default, which confused me. On connect, you can tell it to use ANSI:
con_odbc = pypyodbc.connect("DSN=GP", False, False, 0, False)
Then I can decode the returned data using cp850, which is the original code page of the database:
str(odbc_str, "cp850", "replace")
There is no longer any need to manually replace each special character.
Thank you very much, pepr!
The printed b"String from an old database with weird mixed encodings" is not a bytes value being displayed; it is the actual content of the string. Because you did not pass the encoding argument to str(), this part of the documentation applies (see https://docs.python.org/3.4/library/stdtypes.html#str):
If neither encoding nor errors is given, str(object) returns object.__str__(), which is the “informal” or nicely printable string representation of object. For string objects, this is the string itself. If object does not have a __str__() method, then str() falls back to returning repr(object).
This is what happened in your case: the b" are actually two characters that are part of the string content. You can also try:
s1 = 'String from an old database with weird mixed encodings'
print(type(s1), repr(s1))
by = bytes(s1, 'cp1252')
print(type(by), repr(by))
s2 = str(by)
print(type(s2), repr(s2))
and it prints:
<class 'str'> 'String from an old database with weird mixed encodings'
<class 'bytes'> b'String from an old database with weird mixed encodings'
<class 'str'> "b'String from an old database with weird mixed encodings'"
This is the reason why s[2:][:-1] works for you.
If you think more about it, in my opinion you have two options. Either you get bytes or a bytearray from the database (if possible) and fix the bytes (see bytes.translate https://docs.python.org/3.4/library/stdtypes.html?highlight=translate#bytes.translate), or you successfully get the string (being lucky that no exception was raised when constructing it) and replace the wrong characters with the correct ones (see also str.translate() https://docs.python.org/3.4/library/stdtypes.html?highlight=translate#str.translate).
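Both options can be sketched briefly. The byte values 0x82 and 0x8A come from the question's own replace calls (they are "é" and "è" in cp850); the mis-decoded characters U+201A and U+0160 are what those bytes become when read as cp1252. The sample text is hypothetical:

```python
# Option 1: fix the raw bytes before decoding. bytes.maketrans needs
# equal-length arguments, so each wrong byte maps to exactly one new byte.
raw = b"caf\x82 cr\x8ame"                              # hypothetical cp850 bytes
to_latin1 = bytes.maketrans(b"\x82\x8a", b"\xe9\xe8")  # cp850 é/è -> latin-1 é/è
from_bytes = raw.translate(to_latin1).decode("latin-1")

# Option 2: fix an already-decoded string. str.maketrans accepts a dict.
misread = "caf\u201a cr\u0160me"                       # the same bytes read as cp1252
fixes = str.maketrans({"\u201a": "é", "\u0160": "è"})
from_string = misread.translate(fixes)

print(from_bytes, from_string)
```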
Possibly, the ODBC driver internally used the wrong encoding. (That is, the content of the database may be correct, but it was misinterpreted by ODBC, and you are not able to tell ODBC which encoding is correct.) In that case, you want to encode the string back to bytes using that wrong encoding, and then decode the bytes using the right encoding.
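A minimal sketch of that repair, assuming (hypothetically) that the data is stored as cp850 and the driver decoded it as cp1252:

```python
raw = b"caf\x82 cr\x8ame"   # hypothetical cp850 bytes: 0x82 = é, 0x8A = è

# What a driver that wrongly assumes cp1252 would hand back
misread = raw.decode("cp1252")

# Undo it: encode with the wrong code page to recover the original bytes,
# then decode those bytes with the right code page
fixed = misread.encode("cp1252").decode("cp850")
print(misread, fixed)
```

This only works when every byte was mapped to some character by the wrong code page; cp1252, for instance, has no character for 0x81, 0x8D, 0x8F, 0x90 and 0x9D, and such bytes cannot be recovered from the string.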

Node Buffers, from utf8 to binary

I'm receiving data as UTF-8 from a source, and this data was originally binary (it was a Buffer). I have to convert this data back to a Buffer, and I'm having a hard time figuring out how to do this.
Here's a small sample that shows my problem:
const hexString = 'e61b08020304e61c09020304e61d0a020304e61e65';
// Buffer.from replaces the deprecated new Buffer() constructor
const buffer1 = Buffer.from(hexString, 'hex');
const str = buffer1.toString('utf8');
const buffer2 = Buffer.from(str, 'utf8');
console.log('original content:', hexString);
console.log('buffer1 contains:', buffer1.toString('hex'));
console.log('buffer2 contains:', buffer2.toString('hex'));
prints
original content: e61b08020304e61c09020304e61d0a020304e61e65
buffer1 contains: e61b08020304e61c09020304e61d0a020304e61e65
buffer2 contains: efbfbd1b08020304efbfbd1c09020304efbfbd1d0a020304efbfbd1e65
Here, I would like buffer2 to be the exact same thing as buffer1.
How can I convert a UTF-8 string back to its original binary Buffer?
You cannot expect binary data converted to UTF-8 and back again to be the same as the original binary data, because of the way UTF-8 works (especially since invalid UTF-8 sequences are replaced with \ufffd).
You have to use another format that correctly preserves the data. This could be 'hex', 'base64', 'binary', or some other binary-safe format provided by a third-party module. Ideally, you should keep it as a Buffer if you can.
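The same loss can be reproduced in Python, whose bytes.decode(..., errors="replace") substitutes U+FFFD for invalid sequences much like Node's UTF-8 decoder does. Latin-1 plays the role of Node's 'binary' (alias 'latin1') encoding: it maps each of the 256 byte values to one code point, so it round-trips. A minimal sketch using the question's data:

```python
import base64

hex_string = "e61b08020304e61c09020304e61d0a020304e61e65"
buffer1 = bytes.fromhex(hex_string)

# Lossy: each lone 0xe6 is invalid UTF-8 and is replaced by U+FFFD (ef bf bd)
text = buffer1.decode("utf-8", errors="replace")
buffer2 = text.encode("utf-8")
print(buffer2.hex())

# Lossless alternatives: latin-1 text, base64 text, or hex text
via_latin1 = buffer1.decode("latin-1").encode("latin-1")
via_base64 = base64.b64decode(base64.b64encode(buffer1))
via_hex = bytes.fromhex(buffer1.hex())
print(via_latin1 == buffer1, via_base64 == buffer1, via_hex == buffer1)
```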
The accepted answer is misleading. Your main problem is that you're dealing with invalid UTF-8. If the data were valid, the conversion would not cause issues.
Specifically, take the first two bytes: e61b.
In binary, that's 11100110 00011011, which is invalid. Take a look at the byte-layout table on the UTF-8 Wikipedia page.
It says that if a byte starts with 1110, it must be followed by two continuation bytes, each starting with 10. That is not the case here.
Whenever the decoder hits an invalid byte sequence, it replaces it with �, the Unicode replacement character. Its code point is U+FFFD, and the UTF-8 encoding of that code point is efbfbd. Notice that this shows up in your output several times.
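The bit-level check described above can be written out directly. This Python sketch inspects the first bytes of the question's data; Python's UnicodeDecodeError also reports the offset of the first invalid byte:

```python
data = bytes.fromhex("e61b08020304")

first_bits = format(data[0], "08b")    # bit pattern of 0xe6
second_bits = format(data[1], "08b")   # bit pattern of 0x1b

# A lead byte of the form 1110xxxx announces a 3-byte sequence...
is_three_byte_lead = data[0] >> 4 == 0b1110
# ...so the next byte must be a continuation byte of the form 10xxxxxx
is_continuation = data[1] >> 6 == 0b10

try:
    data.decode("utf-8")
    error_offset = None
except UnicodeDecodeError as err:
    error_offset = err.start           # offset of the first invalid byte

replacement_hex = "\ufffd".encode("utf-8").hex()

print(first_bits, second_bits, is_three_byte_lead, is_continuation)
print(error_offset, replacement_hex)
```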

Convert Binary data from file to readable string

I have binary data stored in a file. I am doing this:
byte[] fileBytes = File.ReadAllBytes(@"c:\carlist.dat");
string ascii = Encoding.ASCII.GetString(fileBytes);
This gives me the following result, with lots of invalid characters. What am I doing wrong?
?D{F ?x#??4????? NBR-OF-CARSNUMBER-OF-CARS!"#??? NBR-OF-CARS$%??1y0#123?G??#$ NBR-OF-CARS%45??1y#  NUMBER-OF-CARSd?
Hmm... it seems the file was saved from a byte buffer in which some numeric data was written after NBR-OF-CARS. If you have access to the code that saves the file, check whether numbers are written there, and if so, whether the code converts them to strings before writing them into the binary stream.
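To illustrate the point, this Python sketch assumes a hypothetical record layout: an ASCII label followed by a little-endian 32-bit integer (the real layout of carlist.dat is unknown). Decoding the whole record as ASCII turns the integer's bytes into junk, while reading each field by its type recovers the value. In C#, BinaryReader.ReadInt32 reads a little-endian 32-bit integer in the same way.

```python
import struct

# Hypothetical record: an ASCII label followed by a little-endian int32
record = b"NBR-OF-CARS" + struct.pack("<i", 1000)

# Decoding everything as ASCII: the integer's bytes turn into junk characters
as_ascii = record.decode("ascii", errors="replace")
print(repr(as_ascii))

# Reading the fields according to the layout recovers the number
label = record[:11].decode("ascii")
(count,) = struct.unpack_from("<i", record, 11)
print(label, count)
```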
