I'm trying to decode a base64-encoded string. Decoding gives a junk result, but when I remove the leading 77u_ the decoded result is fine. Is there another way to decode it?
The following is my encoded string:
77u_RGF0ZSxUeXBlLENvbmZpcm1hdGlvbiBDb2RlLFN0YXJ0IERhdGUsTmlnaHRzLEd1ZXN0LExpc3RpbmcsRGV0YWlscyxSZWZlcmVuY2UsQ3VycmVuY3ksQW1vdW50LFBhaWQgT3V0LEhvc3QgRmVlLENsZWFuaW5nIEZlZSxHcm9zcyBFYXJuaW5ncyxPY2N1cGFuY3kgVGF4ZXMNCjEwLzMwLzIwMTgsUmVzZXJ2YXRpb24sSE05U1BUOUNYQywxMC8yOS8yMDE4LDMsQWFyb24gSmFibG9uc2tpLEVsZWdhbnQgUHJpdmF0ZSBSb29tIGluIGEgTW9kZXJuIEx1eHVyeSBCdW5nYWxvdywsLFVTRCwxNDUuNTAsLDQuNTAsMzAuMDAsMTUwLjAwLDIxLjANCjEwLzI2LzIwMTgsUmVzZXJ2YXRpb24sSE0zM0gyWEU0SiwxMC8yNS8yMDE4LDMsWWIgQmFiaWUsRWxlZ2FudCBQcml2YXRlIFJvb20gaW4gYSBNb2Rlcm4gTHV4dXJ5IEJ1bmdhbG93LCwsVVNELDE3NC42MCwsNS40MCwzMC4wMCwxODAuMDAsMjUuMg0KMTAvMjIvMjAxOCxSZXNlcnZhdGlvbixITVdSRk1KNU1LLDEwLzIxLzIwMTgsNCxHYWJyaWVsbGEgRGFsdG9uLEVsZWdhbnQgUHJpdmF0ZSBSb29tIGluIGEgTW9kZXJuIEx1eHVyeSBCdW5nYWxvdywsLFVTRCwxNjQuOTAsLDUuMTAsMzAuMDAsMTcwLjAwLDIzLjgNCjEwLzE4LzIwMTgsUmVzZXJ2YXRpb24sSE1NUFk4UDJRRiwxMC8xNy8yMDE4LDQsUMOpdGVyIFZhcmdhLEVsZWdhbnQgUHJpdmF0ZSBSb29tIGluIGEgTW9kZXJuIEx1eHVyeSBCdW5nYWxvdywsLFVTRCwyMjMuMTAsLDYuOTAsMzAuMDAsMjMwLjAwLDMyLjINCjEwLzE2LzIwMTgsUmVzZXJ2YXRpb24sSE1KODVDTTNCWiwxMC8xNS8yMDE4LDIsR3JlZ29yIFNwcmljayxFbGVnYW50IFByaXZhdGUgUm9vbSBpbiBhIE1vZGVybiBMdXh1cnkgQnVuZ2Fsb3csLCxVU0QsODcuMzAsLDIuNzAsMzAuMDAsOTAuMDAsMTIuNg0KMTAvMTMvMjAxOCxSZXNlcnZhdGlvbixITVFOU1BRQUpYLDEwLzEyLzIwMTgsMyxBbWJlciBEdWNrc3dvcnRoLEVsZWdhbnQgUHJpdmF0ZSBSb29tIGluIGEgTW9kZXJuIEx1eHVyeSBCdW5nYWxvdywsLFVTRCwxNDUuNTAsLDQuNTAsMzAuMDAsMTUwLjAwLDIxLjANCjEwLzEyLzIwMTgsUmVzZXJ2YXRpb24sSE1UQlFCREFFMiwxMC8xMS8yMDE4LDIsR29yZG9uIEJsZWVjaG1vcmUsRWxlZ2FudCBQcml2YXRlIFJvb20gaW4gYSBNb2Rlcm4gTHV4dXJ5IEJ1bmdhbG93LCwsVVNELDU4LjIwLCwwLjk4LDAuMDAsNTkuMTgsOC40DQoxMC8xMi8yMDE4LFJlc2VydmF0aW9uLEhNWEZEVzlNRkosMTAvMTEvMjAxOCwxLEhhbXphIE5zb3VyLEVsZWdhbnQgUHJpdmF0ZSBSb29tIGluIGEgTW9kZXJuIEx1eHVyeSBCdW5nYWxvdywsLFVTRCw1My4zNSwsMS42NSwzMC4wMCw1NS4wMCw3LjcNCjEwLzEwLzIwMTgsUmVzZXJ2YXRpb24sSE1XWDVQQ0hBUSwxMC8wOS8yMDE4LDIsTGVhaCBXLEVsZWdhbnQgUHJpdmF0ZSBSb29tIGluIGEgTW9kZXJuIEx1eHVyeSBCdW5nYWxvdywsLFVTRCw4Ny4zMCwsMi43MCwzMC4wMCw5MC4wMCwxMi42DQoxMC8wNC8yMDE4LFJlc2VydmF0aW9u
LEhNQzNGTk04RFcsMTAvMDMvMjAxOCw2LFNhbmRybyBDemVrYWksRWxlZ2FudCBQcml2YXRlIFJvb20gaW4gYSBNb2Rlcm4gTHV4dXJ5IEJ1bmdhbG93LCwsVVNELDM3OC4zMCwsMTEuNzAsMzAuMDAsMzkwLjAwLDU0LjYNCjEwLzAxLzIwMTgsUmVzZXJ2YXRpb24sSE1UUzk0Uk1FMywwOS8zMC8yMDE4LDMsQnJhbmRlbiBIaWNrcyxFbGVnYW50IFByaXZhdGUgUm9vbSBpbiBhIE1vZGVybiBMdXh1cnkgQnVuZ2Fsb3csLCxVU0QsMTE2LjQwLCwzLjYwLDMwLjAwLDEyMC4wMCwxNi44DQo
It's the _ that messes up the result. Everything else is perfectly fine base64. The base64 alphabet doesn't contain _; it is the substitution character for / in base64url encoding.
When you replace _ with /, the decoding works fine.
When I tested it on https://www.base64decode.org/ and chose ASCII as the source character set, I got ï»¿ in front of the text, which is the byte order mark (BOM) for UTF-8. When I changed to UTF-8, there was nothing visible in front of the text.
A short test in Node.js also confirms that '77u/' is indeed the base64 code of the BOM:
var messageB64 = '77u/';
var buf = Buffer.from(messageB64, 'base64');
console.log(buf) // output: <Buffer ef bb bf>
Conclusion:
your data is base64url encoded
you should convert it back to standard base64 before you decode it
the extra characters are a harmless byte order mark, which is invisible if you use UTF-8 encoding.
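The steps in the conclusion can be sketched in Node.js roughly as follows. This is only a minimal sketch: decodeBase64Url is a hypothetical helper name, and it assumes the decoded payload is UTF-8 text. It maps the base64url characters back to the standard alphabet, restores any missing padding, and drops a leading BOM.

```javascript
// Hypothetical helper (not from the original answer): base64url -> UTF-8 text.
function decodeBase64Url(input) {
  // Map the base64url alphabet back to standard base64.
  let b64 = input.replace(/-/g, '+').replace(/_/g, '/');
  // base64url data often omits padding; restore it to a multiple of 4.
  while (b64.length % 4 !== 0) b64 += '=';
  let text = Buffer.from(b64, 'base64').toString('utf8');
  // The UTF-8 BOM (ef bb bf) decodes to U+FEFF; drop it if present.
  if (text.charCodeAt(0) === 0xfeff) text = text.slice(1);
  return text;
}

// First 16 characters of the string from the question.
console.log(decodeBase64Url('77u_RGF0ZSxUeXBl')); // Date,Type
```

Newer Node.js versions also accept 'base64url' as a Buffer encoding, so the manual replacement is mostly useful where that is not available.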
I have base64-encoded text. When I decode it, the result contains some unknown characters. Interestingly, the decoded text does contain the data I need, so I just need to remove the unknown characters and split the data.
Do you have any suggestions for stripping these unknown characters?
E.g base64 data:

It is base64-encoded protobuf data. You can decode it at https://protogen.marcgravell.com
I created a .docx file.
Now, I do this:
const fs = require('fs')

async function main() {
  // read the file to a buffer
  const data = await fs.promises.readFile('<pathToMy.docx>')
  // Converts the buffer to a string using 'utf8' but we could use any encoding
  const stringContent = data.toString()
  // Converts the string back to a buffer using the same encoding
  const newData = Buffer.from(stringContent)
  // We expect the values to be equal...
  console.log(data.equals(newData)) // -> false
}

main()
I don't understand at which step of the process the bytes are being changed.
I've already spent so much time trying to figure this out, without any result. If someone can help me understand what I'm missing, that would be really awesome!
A .docx file is not a UTF-8 string (it's a binary ZIP file), so when you read it into a Buffer object and then call .toString() on it, you're assuming the buffer is already encoded as UTF-8 and that you now want to move it into a JavaScript string. That's not what you have. Your binary data will likely contain byte sequences that are invalid in UTF-8, and those will be discarded or coerced into valid UTF-8, causing an irreversible change.
What Buffer.toString() does is take a Buffer that is ALREADY encoded in UTF-8 and put it into a JavaScript string. See this comment in the docs:
If encoding is 'utf8' and a byte sequence in the input is not valid UTF-8, then each invalid byte is replaced with the replacement character U+FFFD.
So, the code you show in your question wrongly assumes that Buffer.toString() takes binary data and reversibly encodes it as a UTF-8 string. That is not what it does, and that's why it doesn't do what you are expecting.
Your question doesn't describe what you're actually trying to accomplish. If you want to do something useful with the .docx file, you probably need to parse it from its binary ZIP file form into the actual components of the file in their appropriate format.
Now that you've explained you're trying to store it in localStorage, you need to encode the binary data into a string format. One popular option is Base64; it isn't very efficient size-wise, but it is better than many others. See "Binary Data in JSON String. Something better than Base64" for prior discussion on this topic. Ignore the notes about compression in that other answer, because your data is already ZIP compressed.
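To illustrate the difference, here is a small sketch that round-trips a few bytes through both encodings. The byte values are made up for the demonstration (a ZIP-like header followed by bytes that are not valid UTF-8); they are not taken from a real .docx file.

```javascript
// Illustrative bytes only: 'PK\x03\x04' followed by bytes that are invalid in UTF-8.
const original = Buffer.from([0x50, 0x4b, 0x03, 0x04, 0xff, 0xfe, 0x80]);

// Round trip through a UTF-8 string: invalid bytes become U+FFFD, so data is lost.
const viaUtf8 = Buffer.from(original.toString('utf8'), 'utf8');

// Round trip through a Base64 string: every byte survives.
const viaBase64 = Buffer.from(original.toString('base64'), 'base64');

console.log(original.equals(viaUtf8));   // false
console.log(original.equals(viaBase64)); // true
```

The Base64 string is what you would store in localStorage, and Buffer.from(stored, 'base64') gives the original bytes back.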
I have two encoded strings that use the same encoding method, but I don't know which one it is.
I have tried base64 decoding, but it didn't work.
This is the first encoded string: 3qpY0Vw86MZykGfqc7jnVg==
This is the second encoded string: nB6dtl3iA5IE1Z+g9SpBrw==
They use the same encoding method.
I want to know what type of encoding was used for these strings and how to decode them.
Those base64 payloads may contain something other than a string, such as raw binary data, an image, or an encrypted payload, that can't be displayed as text.
base64 is not exclusively used to encode text.
For example, to save the decoded data to a file:
$ printf %s 'iVBORw0KGgoAAAANSUhEUgAAAGAAAABgCAYAAADimHc4AAAAAXNSR0IArs4c6QAAAARnQU1BAACxjwv8YQUAAAAJcEhZcwAADsQAAA7EAZUrDhsAAAAZdEVYdFNvZnR3YXJlAEFkb2JlIEltYWdlUmVhZHlxyWU8AAAMfUlEQVR4Xu1bC3BU1Rn+7mN3E8IjEiAY3iQxIA8bKiJSRUofjLZKx0cf6mgfap3O6NjptNNqW6vjYDt2+pip09ZXqXa01amIoqJTpVYIijYIpFEgBAMkhEB4JCG7e/fe2+8/ezdsHhYq7t7d5H6ZO3tz7tm7e//vnO9/nLOaSyCAb9C91wA+ISDAZwQE+IyAAJ8REOAzAgJ8RkCAzwgI8BkBAT4jIMBnBAT4jIAAnzHoCbCbt8La+CicQ41eS25hUBOQ+M9LiK++A7G1K5DY9Be4iZh3JXcwKMvRbvQYrA0PI1H7NziH9/IpNWjDShD+/A8Rmne11ys3MOgIcKMdiL34M9hbVnPERwEjDE3a7Tj0sZWIXH4fjCnzk51zAINOgrRwEQ28AG7HgaTxOfplBsi507aDM+OPcI7s83r7j8HnA3Qd5uxLEVpyG3C83WsUDkiCbiJR/wqsd56gg8gNfzAonbAWHobQwm9Ar1oKN9bptbJdN9RrYsNDdNAvq3O/MWijIH30FIQ//V1ow8dS/y2vleAscOPHEXtlBZyWOq/RPwzqMNSctlBFPoh2IhVrKCmiP3AP7Ub8jQfhdB9W7X4hbwlwY13ovKuCDrXZaxkY5uzLEVr8HfqDE4ZWJIQLkXj7cSS2PEdyvAs+IG8JiD5yFWB1o+ueGUg0bvRa+0OL0B8w9jemMTKi9KSsrWl89IKRsJ75Hux9tarND+QlAdFHvwp7z2YgVEiNH4PuB5Yh9tyPvKv9oZ85i1ER/QEN7jqJEyTQKbuhAsQevxHOsRbVlm3kHQGx1XfAaniDkQ6NTykRI+pFJYjXPILuP12LD1MTc8ZnEF58K+DYtL/jtZIEMwL3SBPiL9ytZC3byCsCXCsKu+55ZrZu0tAp8aac6EzA7J3r0P2H5XAO70m290HovGsQqr5S5QDpBQCtcBSsTY/Bqn3Ka8ke8ooAjXJRcNNqGONncSQnkkZMI0EzC+A0vY3ow1dT199NtqeDEhS64FswJp8L2PHeJIwoRfzp22DvHeB9GUTeSZBeMgUFtzwPo/wiGp9yQknpgUgSSZIZEF15HRLbXmCfE3IjEH9gSpJWNEaRmCJQOeXCYvqDG+B0HlJt2UDeESDQjBAKr38M5tzlHPWM6dPLCkIC/YPbfQTRp29F/B+/7FeGDs1laDr/WjoGvpckpiD3co42I7bq+0lysgDjLsI7zzuYZy8Tq8E58D5cifOZ5aoYn9B4Lka0KUk4th/6xHMYkg5X1wR62Wy4bbvgHNyp/u95H52y/cGb0EaeCWPSPNWWSQyKcrS9/TXE1vw0uerFUazkxIOEnRqfUK9cgvDS22GUzfGu8H0tdYhztMtrOnlK1hJxFIrUTfyEassU8lKC+sI4awkKvvJ7JlsLaVWrl1+QmeCSELt+LXOFO2EzhE2NOEP8wYLroQ8brcLTHn8gRTuSEX3i2xkPTXNWgmRhxd61AW7nQcrB+F6jeiBoI8bCrLhI9Xdb36MtnaQh5Zr4BYNEtDfCad5GR10IfWy5ahMSnI42uPu3KeJSn6P6MzmTtYPQ7EtVWyaQswQ4O99A7NkfwN6xLlnH4eDURpb2GHUgiMYbVUtUycGlHMnSpDjs5EWSIElXRyucDzbxAyxo489WUZMx4Rw4+7bCaf+A/aRrmj9o2sQ8oTgZumYAOUmACIG1cSXs3Rvhdh3kTFhPA21WRoVNTR89lUQMPCNEcozKxWpNwKVBZRSDJPQYledCkLO3lpHSUbVMqY8cB23sdDiNNSp6IlXCgkr3NImgeM9Q9RXq/R83cpIAp3EjrLf+TDlpUyNUDOgeb4e
zu4YJ1ha4jFzEiPoIGo5y0hdiakOinlETKCF74EpmzJnTQwLPXRLp7K9jJNRA41dyFsxRI93e8TqlyEr6BMqguehmhJfcBn0484YMIOcIkFFnbXgI9vuv8tslIxN1iAFJhhs9qiTEYcbqNJOMeDeNM7ZXiJmC6LxeOgPoJHltDDfVvTyNlxlEP+G0baf+10E7YyrMmZ8F4p2KBG3YGYgsuxPhi2l8Sl+mkHszQJwvdd/ZU8tRyGQobeQKeoiId3EE16t+9l7KE0e5VjxB1XXSoY86E/rkavbvJGEMN3s5Z5IgURKlSpGpGQgt/Dpn3gFEaHiz+sqevplCzuUBIg3ukb3U/zeR2Px32I3rSYSjFlAGioRSMbtkvzqdqc5QNDTnMujjKr0eSTjU9kTNI2qXnHucOi/SlgbX6mYOEUFk+S9gTF/EUT/eu5JZ5GwiJl/KbW9S0pF49xkeq9jAGSFrAEJE2qyQ+F09BnMAVaYeM50+YB7M+V9TYWYKUpJIbHkW1qu/ZsSzWznq1H2k0qoNH4eC6x6lP5ir2rKBvMiE3S5qeGs9HfNjSNS9REMzMhEiVCQ0ABESYsqeoNFToJedQym5Amb5omQXHvaO12CtvQ+2aL+EppxhIjXhLz+A0MzPqX7ZQl4QkIJIh9OxH4m3n4S1/iFlaESK+uu0ECGmlsSKGo9hjOMnViO06KaeXXFSP4o992NKXA0Q60Tkqt8mC3TpMysLyCsCUhC5kPjeqn0a1j9/pyIXqfX/LyJ4UTloY0wFzPOvhzn7C2q2xFZeA33qAoQZbqKPX8gG8pKAFNT6Lh2wbLSKvXI/szePiFT2mwb1mEIEIWVnffxMhM6/gQ73U0BRCQd+dkd+CjlFgHyRj2wGOlirbg3ia1dQXnao0S6G7gv1uLJII+Eoc4iCm1cli3g+IWcIkMgk+tqvYEyeD7PiQuhTFsCcfr539dQhRbjEllWIS6Sz59/QikYrR9sXkkkblUsQuXwF9DMmea3ZR04QIGWG2Mv3IfHmSsbi1GFZpVLxfQxayTTok86FWbUUJrVaK5nivevksJkxq4IecwrJbCXOT80xjQlf5JoHYTJn8BM5QYBzYCdia34Cu+FfnpHSIDovezup9bLHH8yAzfLFnCHzSUy12gk9UBkiHc6+dxF76V44u9artQEh15i1DJFL74ZePNHr5Q9yggC74XXEnrqd2Wp7Mmw8CVySIbsa1OKLxVlSMJzh5XnQz1qiEjCj/IIB7yNJXfzFe2BtW4OCL/0coYXf9K74h5wgIPHOk+h+4ibKxGh+Iy+5YlSixOKk0QkDTYePoAjhITOFfzrDTamI6pPPVStm+rgqFenIw6oqK6VOKxiRvIWP8J8AO8HsltHLut+ojFdpv1pW9ErCAiHBI+aUSOEjqRCV91b7fxJRytQo6BPnkIiZMD95tfInuYCMEOAwtY9Fo/0MZRhM98P9Q8MUpCBmN70DZ+9mVasX34DoUWo/yeA1kRsVQqYI4etABbqBIOUGyZxl90TBJXcjtPR27wo5sm3E45w9aRCzhEIhdWQSGSGgo6MDr69b5xmIxpIRyfaysjJUz/v/tnq4h5tgt76fJKalLrnCxfjdtY6TFGbE0WOqpHwqpKg1X2a7keX303lf4rUCzS0tqNu6VZGQSsgSiQSqZsxAVVWV+j9TyAgBXV1d2LB+vXoYpbveR5SWlmLO3NOrNMoscVq3JxdS2hoYar6V/BlSrEutEUhdR0oVso6QIkRe1fegHOljKhH+4r0wp52QoNbWVrxXXw/Lsk4QwPPyykpUVFSo/zOFvCNgIMhWROfgLrgHG5RsOYcaODNIhPxIjzNEESS5BR20cfYyhBl+GiVTvXcHBGQEzqHdsPfXq58iqe0oR1vgHm1WRbjwhbf0KrwNaQLUipasRkkf7+FPGyI9IkF6iKc8p+N2ju6He2A7NFmiLO2t60OaAOdQExK1f1VbRJTRMgW5N6Mpc9Yl/YpvQ5oAu/EtxJ6
8MfljO+PkWfDpQCKmyGUrEL74Vq8lCT8JGDheyyZkZMrabGS4qulk9AgXqaXKXIL/BAxx5AABlCcpG0iWKiWIDB7qcySTziH4T4Ds2xzByGTUhIwfWvEkJXW5BP/DUNnJ3N7EgcnR6TnAjIGfoY0qU1sZ0xEkYj5jaEdBQxwBAT4jqwToH/KjCr9hmCd+oJdtZNUHRCIRFBcXqwWbXIF8v2g0is7OTvW9su0DskKAIPUxGfi400bqOwqyTUDWNCH1YClSculIIf08W8iqKPd98Fw7/EBGCBCREamRQ3Q1L48sSWXGfMDGmhrfRtXHAdkpMb28HOU8MomMEBDg1JGbgfkQQkCAzwgI8BkBAT4jIMBnBAT4jIAAnxEQ4DMCAnxGQIDPCAjwGQEBPiMgwGcEBPiMgACfERDgMwICfEZAgM8ICPAZAQG+Avgv4WYH0htgNTQAAAAASUVORK5CYII=' | base64 -d > stackoverflow.png
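For the two strings in the question, one way to check whether they hold binary data is to decode them and look at the raw bytes instead of printing them as text. A minimal Node.js sketch (inspectBase64 is a hypothetical helper name):

```javascript
// Hypothetical helper: decode base64 and report the raw bytes instead of text.
function inspectBase64(encoded) {
  const bytes = Buffer.from(encoded, 'base64');
  return { length: bytes.length, hex: bytes.toString('hex') };
}

const samples = ['3qpY0Vw86MZykGfqc7jnVg==', 'nB6dtl3iA5IE1Z+g9SpBrw=='];
for (const sample of samples) {
  const info = inspectBase64(sample);
  console.log(info.length, info.hex);
}
```

Both strings decode to 16 bytes. That length would be consistent with something like a hash digest or a single block of encrypted output, but the length alone cannot tell you which, so that part is only a guess.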
According to the MIME base64 encoding specified in RFC 2045, the base64-encoded data must be split into lines of at most 76 characters.
When decoding, all characters not belonging to the base64 alphabet must be ignored.
How do we determine the end of MIME base64-encoded data?
When you've found the start of a base64 encoded object, it should always be possible to find the end without decoding it. Examples:
You might have an email message whose top-level encoding is base64. In that case, the end of the base64 stuff is the end of the body. The end of the body is recognized not by any internal structure, but by the lone . at the end of the SMTP DATA.
If you're reading an email message from an mbox file instead of receiving it via SMTP, the mbox format is responsible for telling you where the end of the message is.
If you have a multipart email body with one part base64, you can scan for the multipart boundary first to find the end of the body part, then pass the whole body part to the base64 decoder.
Similarly, if you have an RFC 2047-encoded header with base64, you can find the terminating ?= first, then pass the encoded portion to the base64 decoder.
Because the terminators are already identified before base64 decoding begins, the decoder never sees the terminator, so the rule about "characters not belonging to the base64 alphabet" is not relevant.
The two steps of finding the end of the base64 data and decoding it can be combined into a single loop over the input, for efficiency. But conceptually they are separate.
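As a rough illustration of the two conceptual steps for the multipart case, here is a simplified Node.js sketch. It assumes the part headers have already been stripped, decodeMimeBase64Part is a hypothetical name, and the 'frontier' boundary is just an example value. A real MIME parser has to handle much more than this.

```javascript
// Hypothetical sketch: 'body' is the content of one body part, headers already removed.
function decodeMimeBase64Part(body, boundary) {
  // Step 1: find the end of the part by locating the multipart boundary.
  const end = body.indexOf(`\r\n--${boundary}`);
  const encoded = end === -1 ? body : body.slice(0, end);
  // Step 2: ignore everything outside the base64 alphabet (line breaks etc.), then decode.
  const cleaned = encoded.replace(/[^A-Za-z0-9+/=]/g, '');
  return Buffer.from(cleaned, 'base64');
}

const part = 'aGVsbG8g\r\nd29ybGQ=\r\n--frontier--\r\n';
console.log(decodeMimeBase64Part(part, 'frontier').toString('utf8')); // hello world
```

Because the boundary is located before decoding starts, the decoder only ever sees the encoded lines and the line breaks between them.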
I copied and pasted base64-encoded text from the web into this site: http://www.motobit.com/util/base64-decoder-encoder.asp ,
but it gives a strange result that looks garbled.
How can I decode the text correctly? Thank you.
nUł"˘ÂUÓ"…F˙i+Şž›‘Ş˘éăŁqëKĄ®qâ %˘ dŞn‘ĄŐňŢ^{łcąWq\)ńÂÔ€
Here is the base64 encoded text
blWzIh+iwgYbVdODDyKFRv9pFCuqnpuRqqLp46Nx60ulrnHioCWiIARk-qm6RpdUB8t5ee7NjG7lXcVwp8cLUg==