Decoding Base 64 In Groovy Returns Garbled Characters - groovy

I'm using an API which returns a Base64 encoded file that I want to parse and harvest data from. I'm having trouble decoding the Base64, as it comes back with garbled characters. The code I have is below.
Base64 decoder = new Base64()
def jsonSlurper = new JsonSlurper()
def json = jsonSlurper.parseText(Requests.getInventory(app).toString())
String stockB64 = json.getAt("stock")
byte[] decoded = decoder.decode(stockB64)
println(new String(decoded, "US-ASCII"))
I've also tried println(new String(decoded, "UTF-8")) and this returns the same garbled output. I've pasted in an example snippet of the output for reference.
� ���v���
��W`�C�:�f��y�z��A��%J,S���}qF88D q )��'�C�c�X��������+n!��`nn���.��:�g����[��)��f^���c�VK��X�W_����������?4��L���D�������i�9|�X��������\���L�V���gY-K�^����
��b�����~s��;����g���\�ie�Ki}_������
What am I doing wrong here?

You don't need the Base64 class, wherever you took it from. You can simply call stockB64.decodeBase64() to get the decoded byte array. Are you sure that what you have there is actually encoded text? Base64 encoding usually means the payload is binary, such as an image. If it were text, it could simply have been put into the JSON as a string. Try saving the resulting byte array to a file and then work out the file type from its content.
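The suggestion to inspect the decoded bytes can be sketched as follows. This is a minimal illustration in Python rather than Groovy. The `sniff_type` helper and its short signature table are hypothetical and cover only a few well-known magic numbers, and the gzip sample merely stands in for the API response, whose real format is not known from the question.

```python
import base64
import gzip

# A few well-known file signatures ("magic numbers"); this table is
# illustrative and far from complete.
SIGNATURES = {
    b"\x89PNG\r\n\x1a\n": "png",
    b"\x1f\x8b": "gzip",
    b"PK\x03\x04": "zip",
    b"%PDF-": "pdf",
}

def sniff_type(b64_text):
    """Decode Base64 and guess the payload type from its leading bytes."""
    raw = base64.b64decode(b64_text)
    for magic, name in SIGNATURES.items():
        if raw.startswith(magic):
            return name
    return "unknown"

# A gzip-compressed payload standing in for the "stock" field.
sample = base64.b64encode(gzip.compress(b"sku,qty\nA1,3\n")).decode("ascii")
print(sniff_type(sample))
```

If the type turns out to be a compressed or binary format, the bytes need to be decompressed or parsed accordingly before any charset is applied.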

Related

Parsing a non-Unicode string with Flask-RESTful

I have a webhook developed with Flask-RESTful which gets several parameters with POST.
One of the parameters is a non-Unicode string, encoded in cp1251.
I can't find a way to correctly parse this argument using reqparse.
Here is the fragment of my code:
parser = reqparse.RequestParser()
parser.add_argument('text')
msg = parser.parse_args()
Then, I write msg to a text file, and it looks like this:
{"text": "\ufffd\ufffd\ufffd\ufffd\ufffd\ufffd\ufffd !\n\n\ufffd\ufffd\ufffd\ufffd\ufffd\n\n-- \n\ufffd \ufffd\ufffd\ufffd\ufffd\ufffd\ufffd\ufffd\ufffd\ufffd."}
As you can see, Flask somehow replaces all Cyrillic characters with \ufffd. At the same time, non-Cyrillic characters, like ! or \n are processed correctly.
Is there any way to tell RequestParser which encoding the string uses?
Here is my code for writing the text to disk:
f = open('log_msg.txt', 'w+')
f.write(json.dumps(msg))
f.close()
I tried f = open('log_msg.txt', 'w+', encoding='cp1251') with the same result.
Then, I tried
f = open('log_msg_ascii.txt', 'w+')
f.write(ascii(json.dumps(msg)))
Also, no difference.
So, I'm pretty sure that RequestParser() tries to be too smart and can't understand the non-Unicode input.
Thanks!
Okay, I finally found a workaround. Thanks to @lenz for helping me with this issue. It seems that reqparse wrongly assumes that every string parameter arrives as UTF-8. So when it sees a non-Unicode input field (among other Unicode fields!), it tries to decode it as UTF-8 and fails. As a result, all of its non-ASCII characters become U+FFFD (the replacement character).
So, to access that non-Unicode field, I did the following trick.
First, I load raw data using get_data(), decode it using cp1251 and parse with a simple regexp.
raw_data = request.get_data()
contents = raw_data.decode('windows-1251')
match = re.search(r'(?P<delim>--\w+\r?\n)Content-Disposition: form-data; name=\"text\"\r?\n(.*?)(?P=delim)', contents, re.MULTILINE | re.DOTALL)
text = match.group(2)
Not the most beautiful solution, but it works.
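The effect described here can be reproduced without Flask. The following is a minimal sketch in plain Python, assuming a sample Cyrillic string of my own choosing (the original message text is not known): decoding cp1251 bytes as UTF-8 produces replacement characters, while decoding with the sender's codec recovers the text.

```python
original = "Привет!"

# Bytes as a cp1251 client would send them.
wire_bytes = original.encode("cp1251")

# Decoding with the wrong codec: the Cyrillic bytes do not form valid
# UTF-8 here, so they become U+FFFD, while the ASCII "!" survives.
garbled = wire_bytes.decode("utf-8", errors="replace")

# Decoding with the codec the sender actually used recovers the text.
recovered = wire_bytes.decode("cp1251")

print(ascii(garbled))
print(recovered)
```

Once the text has been replaced with U+FFFD, the original bytes are gone, which is why changing the encoding used when writing the log file made no difference.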

base64.encodebytes fails to insert newline chars

I must be missing something obvious. The function below successfully generates a base64 encoded string from an image file, but according to the docs, I expected it to have newlines every 76 characters.
def generateBase64(filein):
    import base64
    with open(filein, 'rb') as f:
        return base64.encodebytes(f.read())
Calling it on an image file (.png) like this: print(generateBase64(imgpath)) just prints one long string. What am I doing wrong?
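One likely explanation, sketched below under the assumption that Python 3 is in use: `base64.encodebytes` returns a `bytes` object, and `print()` shows its repr on a single line, with each newline displayed as an escaped `\n` inside `b'...'`. The newlines are present in the data itself. The `data` value is a stand-in for the image bytes.

```python
import base64

data = bytes(range(256)) * 2  # stand-in for the image bytes

encoded = base64.encodebytes(data)

# The line breaks are present in the bytes object...
print(encoded.count(b"\n"))

# ...but print() on bytes shows the repr, where each one appears as a
# literal backslash-n inside b'...'. Decoding to str prints real lines.
text = encoded.decode("ascii")
print(text)
```

Decoding the result to `str` before printing (or writing the bytes straight to a file) makes the 76-character lines visible.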

Groovy encodeBase64() returning unexpected result for PNG image file

I am trying to convert a PNG image file to Base64 encoding in Groovy.
Here is my code:
ImageFile = new File("D:/DATA/CustomScript/Logo.png").text;
String encoded = ImageFile.getBytes().encodeBase64().toString();
I get the following as result:
iVBORw0KGgoAAAANSUhEUgAAAIQAAABPCAIAAAClCfqHAAAABGdBTUEAALE/C/xhBQAAAAlwSFlzAAAOwwAADsMBx2+oZAAAAQ1JREFUeF7t1KGRgwAURdFVyHQbSwOkKlrIoECDSwusoYgDcz97396Z/3eGUQxIMSDFgBQDUgxIMSDFgBQDUgxIMSDFgBQDUgxIMSDFgBQDUgxIMSDFgBQDUgxIMSDFgBQDUgxIMSDFgBQDUgxIMSDFgBQDUgxIMSDFgBQDUgxIMSDFgBQDUgxIMSDFgBQDUgzIE2IcxzHP87qu176tJ8T4/X7Lsuz7fu3b6k1BigEpBqQYP2JAigEpBqQYP2JAigEpBqQYP2JAigEpBqQYP2JAigEpBqQYP2JAigEpBqQYP2JAigEpBqQYP2JAnhNj27ZxHN/v9/f7vU5385wYn8/n9XoNwzBN03W6l/P8BwSpsfw4c1/6AAAAAElFTkSuQmCC
The same image when passed through https://www.base64encode.org/ gives this result:
iVBORw0KGgoAAAANSUhEUgAAAIQAAABPCAIAAAClCfqHAAAABGdBTUEAALGPC/xhBQAAAAlwSFlzAAAOwwAADsMBx2+oZAAAAQ1JREFUeF7t1KGRgwAURdFVyHQbSwOkKlrIoECDSwusoYgDc497396Z/3eGUQxIMSDFgBQDUgxIMSDFgBQDUgxIMSDFgBQDUgxIMSDFgBQDUgxIMSDFgBQDUgxIMSDFgBQDUgxIMSDFgBQDUgxIMSDFgBQDUgxIMSDFgBQDUgxIMSDFgBQDUgxIMSDFgBQDUgzIE2IcxzHP87qu176tJ8T4/X7Lsuz7fu3b6k1BigEpBqQYkGJAigEpBqQYkGJAigEpBqQYkGJAigEpBqQYkGJAigEpBqQYkGJAigEpBqQYkGJAigEpBqQYkGJAnhNj27ZxHN/v9/f7vU5385wYn8/n9XoNwzBN03W6l/P8BwSpsfw4c1/6AAAAAElFTkSuQmCC
I have tried to highlight some of the differences. It is clear that both encoded strings are different.
The problem is that I have to pass this image's Base64 encoding to another system, which accepts the one from https://www.base64encode.org/ but rejects the one generated by Groovy.
Any ideas what I am doing wrong here?
You are hitting an encoding problem here. Binary data is not character data; character data is affected by encodings. Instead of text, use the bytes of the file. E.g.
def f = "/tmp/screenshot-000.png" as File
assert f.bytes.encodeBase64().toString()==("/tmp/encoded_20190208131326.txt" as File).text
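To illustrate why reading the file as text changes the result, here is a small Python sketch of the same round trip. It uses the three bytes at the first position where the two outputs in the question differ (`ALGP` versus `ALE/`). It assumes the platform charset was windows-1252, in which byte 0x8f has no character; the question does not state which charset was actually in effect.

```python
import base64

# Three bytes taken from the position where the two outputs differ:
# "ALGP" in the accepted encoding decodes to 00 b1 8f.
original = base64.b64decode("ALGP")

# Simulate reading the file as text and converting back to bytes.
# Assumption: the charset was windows-1252, where 0x8f is unmapped.
as_text = original.decode("cp1252", errors="replace")
round_tripped = as_text.encode("cp1252", errors="replace")

print(original.hex(), round_tripped.hex())
print(base64.b64encode(round_tripped).decode("ascii"))
```

Under that assumption the unmapped byte comes back as `?` (0x3f), and the Base64 group becomes `ALE/`, matching the Groovy output in the question.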
Answer from user cfrick was extremely helpful. Unfortunately, it didn't solve my problem. I believe the reason was that I was on an older version of Groovy.
This code eventually solved my problem:
String base64Image = "";
File file = new File(imagePath);
FileInputStream imageInFile = new FileInputStream(file);
byte[] imageData = new byte[file.size()];
imageInFile.read(imageData);
base64Image = Base64.getEncoder().encodeToString(imageData);

Node Buffers, from utf8 to binary

I'm receiving data as utf8 from a source, and this data was originally in binary form (it was a Buffer). I have to convert this data back to a Buffer. I'm having a hard time figuring out how to do this.
Here's a small sample that shows my problem:
var hexString = 'e61b08020304e61c09020304e61d0a020304e61e65';
var buffer1 = new Buffer(hexString, 'hex');
var str = buffer1.toString('utf8');
var buffer2 = new Buffer(str, 'utf8');
console.log('original content:', hexString);
console.log('buffer1 contains:', buffer1.toString('hex'));
console.log('buffer2 contains:', buffer2.toString('hex'));
prints
original content: e61b08020304e61c09020304e61d0a020304e61e65
buffer1 contains: e61b08020304e61c09020304e61d0a020304e61e65
buffer2 contains: efbfbd1b08020304efbfbd1c09020304efbfbd1d0a020304efbfbd1e65
Here, I would like buffer2 to be the exact same thing as buffer1.
How can I convert a utf8 string back to its original binary Buffer?
You cannot expect binary data converted to utf8 and back again to be the same as the original binary data because of the way utf8 works (especially when invalid utf8 characters are replaced with \ufffd).
You have to use another format that correctly preserves the data. This could be 'hex', 'base64', 'binary', or some other binary-safe format provided by a third-party module. Obviously you should probably keep it as a Buffer if you can.
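The difference between a lossy and a binary-safe representation can be shown with the bytes from the question. This sketch uses Python rather than Node; `latin-1` plays the role that Node's 'binary' encoding plays, as both map each byte to one character.

```python
import base64

hex_string = "e61b08020304e61c09020304e61d0a020304e61e65"
original = bytes.fromhex(hex_string)

# Lossy: invalid sequences are replaced with U+FFFD on decode, so the
# re-encoded bytes differ from the original.
via_utf8 = original.decode("utf-8", errors="replace").encode("utf-8")

# Binary-safe text representations round-trip exactly.
via_hex = bytes.fromhex(original.hex())
via_base64 = base64.b64decode(base64.b64encode(original))
via_latin1 = original.decode("latin-1").encode("latin-1")

print(via_utf8.hex())
print(via_hex == via_base64 == via_latin1 == original)
```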
The accepted answer is misleading. Your main problem is that you're dealing with invalid UTF-8. If the data were valid, the conversion would not cause issues.
Specifically, take the first two bytes: e61b.
In binary, that's: 11100110, 00011011. This is invalid. Take a look at this diagram from the UTF-8 Wikipedia page.
This says that if a byte starts with 1110, it must be followed by two bytes that each start with 10. This is not the case here.
Whenever JavaScript hits an invalid sequence, it replaces it with �, the Unicode replacement character. The code point for that is U+FFFD, and the UTF-8 encoding of that code point is efbfbd. Notice that this shows up in your output a few times.
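The bit-pattern check can be written out as a short Python sketch. The two helper functions are hypothetical names for illustration and are deliberately simplified: they look only at the leading bits and ignore further UTF-8 rules such as overlong encodings.

```python
def continuation_bytes_expected(lead):
    """Return how many 10xxxxxx bytes must follow this lead byte."""
    if lead >> 7 == 0b0:
        return 0          # 0xxxxxxx: single-byte (ASCII)
    if lead >> 5 == 0b110:
        return 1          # 110xxxxx
    if lead >> 4 == 0b1110:
        return 2          # 1110xxxx
    if lead >> 3 == 0b11110:
        return 3          # 11110xxx
    raise ValueError("not a valid lead byte")

def is_continuation(byte):
    """True if the byte has the form 10xxxxxx."""
    return byte >> 6 == 0b10

first, second = 0xE6, 0x1B
print(format(first, "08b"), format(second, "08b"))
print(continuation_bytes_expected(first), is_continuation(second))
print("\ufffd".encode("utf-8").hex())
```

For e6 the function expects two continuation bytes, but 1b is not one, so the sequence is rejected at the first byte.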

Convert Binary data from file to readable string

I have binary data stored in a file. I am doing this:
byte[] fileBytes = File.ReadAllBytes(@"c:\carlist.dat");
string ascii = Encoding.ASCII.GetString(fileBytes);
This gives me the following result, with a lot of invalid characters. What am I doing wrong?
?D{F ?x#??4????? NBR-OF-CARSNUMBER-OF-CARS!"#??? NBR-OF-CARS$%??1y0#123?G??#$ NBR-OF-CARS%45??1y#  NUMBER-OF-CARSd?
Hmm, it looks like the file was saved from a byte buffer in which some numeric data was written after NBR-OF-CARS. If you have access to the code that saves the file, check whether numbers are written there and, if so, whether the code converts them to strings before writing the values into the binary stream.
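To show what raw numeric data looks like when decoded as text, here is a Python sketch with a made-up record layout. The layout (an 11-byte ASCII label followed by a little-endian 32-bit integer) and the value 1234 are assumptions for illustration only; the real format of carlist.dat is not known from the question.

```python
import struct

# Hypothetical record: an 11-byte ASCII label followed by a
# little-endian 32-bit integer written in raw binary form.
label = b"NBR-OF-CARS"
record = label + struct.pack("<i", 1234)

# Decoding the whole record as text turns the integer bytes into
# control characters or replacement marks.
as_text = record.decode("ascii", errors="replace")

# Parsing by layout recovers both fields.
name, count = struct.unpack("<11si", record)
print(repr(as_text))
print(name.decode("ascii"), count)
```

Reading such a file correctly requires knowing its layout; no text encoding will make binary-encoded numbers readable.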
