How can I check whether a base64 string is a file (and what type) or not? - base64

I took the Spentalkux challenge on https://2020.ractf.co.uk/.
This is the first time I have done a CTF challenge, so I went through a solution at https://github.com/W3rni0/RACTF_2020/blob/master/readme.md#spentalkux
At one step I received this base64 string:
JA2HGSKBJI4DSZ2WGRAS6KZRLJKVEYKFJFAWSOCTNNTFCKZRF5HTGZRXJV2EKQTGJVTXUOLSIMXWI2KYNVEUCNLIKN5HK3RTJBHGIQTCM5RHIVSQGJ3C6MRLJRXXOTJYGM3XORSIJN4FUYTNIU4XAULGONGE6YLJJRAUYODLOZEWWNCNIJWWCMJXOVTEQULCJFFEGWDPK5HFUWSLI5IFOQRVKFWGU5SYJF2VQT3NNUYFGZ2MNF4EU5ZYJBJEGOCUMJWXUN3YGVSUS43QPFYGCWSIKNLWE2RYMNAWQZDKNRUTEV2VNNJDC43WGJSFU3LXLBUFU3CENZEWGQ3MGBDXS4SGLA3GMS3LIJCUEVCCONYSWOLVLEZEKY3VM4ZFEZRQPB2GCSTMJZSFSSTVPBVFAOLLMNSDCTCPK4XWMUKYORRDC43EGNTFGVCHLBDFI6BTKVVGMR2GPA3HKSSHNJSUSQKBIE
I don't know how to check whether it is a file, but the solver said that it is gzip-compressed data.
Can you help me, please?
detail here
Link to file: https://github.com/W3rni0/RACTF_2020/blob/master/assets/files/Spentalkux.gz

Many filetypes have a header (the first few bytes of the file) with some fixed information by which a file can be identified as a gz, png, pdf, etc.
So every base64 encoded gz file would also start with a certain sequence of base64 characters, by which it can be recognized.
A gzip file always starts with the two-byte sequence 0x1f 0x8b, which in base64 encoding is H4 plus a third character in the range s to v. In practice the third byte is 0x08 (the deflate compression method), so the prefix is usually H4sI.
The reason is that every base64 character represents 6 bits of the original bytes, so the two bytes 0x1f 0x8b (16 bits) are encoded as two base64 characters (12 bits) plus the first 4 bits of the third character.
Based on that, I would say that what you show there is not a base64-encoded gzip file.
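To make the bit arithmetic concrete, here is a minimal Node.js sketch that derives the fixed part of a base64 prefix from a sequence of magic bytes. The helper names stablePrefix and nextCharCandidates are made up for this illustration and are not part of any library.

```javascript
// Sketch: which base64 characters are fully determined by a magic-byte prefix?
const ALPHABET = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/';

// Characters that are completely fixed by the given bytes.
function stablePrefix(magicBytes) {
  const encoded = Buffer.from(magicBytes).toString('base64');
  // Each base64 character carries 6 bits, so only floor(bits / 6) characters are fixed.
  const fixedChars = Math.floor((magicBytes.length * 8) / 6);
  return encoded.slice(0, fixedChars);
}

// Characters that may follow the fixed prefix when some bits are left over.
function nextCharCandidates(magicBytes) {
  const totalBits = magicBytes.length * 8;
  const leftoverBits = totalBits % 6;
  if (leftoverBits === 0) return [];
  const encoded = Buffer.from(magicBytes).toString('base64');
  // The encoder pads the missing low bits with zeros, which gives the first candidate.
  const first = ALPHABET.indexOf(encoded[Math.floor(totalBits / 6)]);
  const count = 1 << (6 - leftoverBits); // number of values the free bits can take
  return ALPHABET.slice(first, first + count).split('');
}

console.log(stablePrefix([0x1f, 0x8b]), nextCharCandidates([0x1f, 0x8b]).join('')); // H4 stuv
```

For the three bytes 0x1f 0x8b 0x08 there are no leftover bits, so the whole prefix H4sI is fixed.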
Other examples are:
png
starts with: 0x89 0x50 0x4e 0x47 0x0d 0x0a 0x1a 0x0a
base64 encoded: iVBORw0KGg...
jpg
starts with: 0xFF 0xD8 0xFF, typically followed by 0xE0 (JFIF) or 0xE1 (Exif)
base64 encoded: /9j/4...
gif
starts with: GIF
base64 encoded: R0lG
tif
a) little endian:
starts with: 0x49 0x49 0x2A 0x00
base64 encoded: SUkqA
b) big endian:
starts with: 0x4D 0x4D 0x00 0x2A
base64 encoded: TU0AK
flv
starts with FLV
base64 encoded: RkxW
wav/avi/webp and others
Several audio, video, and image formats are based on RIFF (Resource Interchange File Format).
The common part is that all of these files start with RIFF
base64 encoded: UklGR
After the RIFF header (the 4 bytes RIFF plus a 4-byte size field), the specific format is identified by the 4 bytes starting at the 9th byte.
In the following _ is used as a placeholder for any character.
wav
starts with: RIFF____WAVE
base64 encoded: UklGR______XQVZF
webp
starts with: RIFF____WEBP
base64 encoded: UklGR______XRUJQ
avi
starts with: RIFF____AVI followed by a space (the format code is the 4 bytes "AVI ")
base64 encoded: UklGR______BVkkg
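Instead of memorizing the base64 prefixes, you can also decode the first few characters and compare the resulting bytes with the signatures. The following is only a sketch for Node.js: the table covers just the formats listed above, and detectType is a hypothetical helper name.

```javascript
// Sketch: identify a file type from the first decoded bytes of a base64 string.
const SIGNATURES = [
  { type: 'gzip', bytes: [0x1f, 0x8b] },
  { type: 'png', bytes: [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a] },
  { type: 'jpg', bytes: [0xff, 0xd8, 0xff] },
  { type: 'gif', bytes: [0x47, 0x49, 0x46] },       // "GIF"
  { type: 'tif', bytes: [0x49, 0x49, 0x2a, 0x00] }, // little endian
  { type: 'tif', bytes: [0x4d, 0x4d, 0x00, 0x2a] }, // big endian
  { type: 'flv', bytes: [0x46, 0x4c, 0x56] },       // "FLV"
];

// RIFF containers carry their format code in bytes 8..11.
const RIFF_TYPES = { 'WAVE': 'wav', 'WEBP': 'webp', 'AVI ': 'avi' };

function detectType(base64Text) {
  // 16 base64 characters decode to 12 bytes, enough for every signature here.
  const head = Buffer.from(base64Text.slice(0, 16), 'base64');
  if (head.length >= 12 && head.toString('latin1', 0, 4) === 'RIFF') {
    return RIFF_TYPES[head.toString('latin1', 8, 12)] || 'riff (unknown subtype)';
  }
  const hit = SIGNATURES.find(sig => sig.bytes.every((b, i) => head[i] === b));
  return hit ? hit.type : null; // null: no known signature matched
}
```

A null result only means that none of these signatures matched. The data may still be a file of another type, or text.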
Regarding the specific example in the question:
In the updated question there's a hint in the attached picture that
the string must first be base32 decoded and then base64 decoded.
When we feed an online base32 decoder with the string given in the question (JA2HGSKBJI4DSZ2WGRAS...), we get:
H4sIAJ89gV4A/+1ZURaEIAi8SkfQ+1/O3f7MtEBfMgz9rC/diXmIA5hSzun3HNdBbgbtVP2v/2+LowM837wFHKxZbmE9pQfsLOaiLAL8kvIk4MBma17ufHQbIJCXoWNZZKGPWB5QljvXIuXOmm0SgLixJw8HRC8Tbmz7x5eIspypaZHSWbj8cAhdjli2WUkR1sv2dZmwXhZlDnIcCl0GyrFX6fKkBEBTBsq+9uY2Ecug2Rf0xtaJlNdYJuxjP9kcd1LOW/fQXtb1sd3fSTGXFTx3UjfGFx6uJGjeIAAA
It starts with H4s, so according to what I wrote about how to recognize file types in base64 encoding, it's a base64 encoded gzip file.
This can be saved in a text file and then uploaded on base64decode.org where it will be converted into a gzip file. When you download and open that gzip file it contains a file with text like this:
00110000 00110000 00110001 00110001 00110000 00110001 00110000 00110000 00100000 00110000 00110000 00110001 00110001 00110000 00110001 00110000 00110001 00100000 ...
Conclusion for this case: The original string/file is a gzip file that was first base64 encoded and the base64 encoded part was again encoded with base32.
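If you prefer to do these steps locally instead of using online tools, a rough Node.js sketch could look like the following. Node.js has no built-in base32 decoder, so a small one is written out here. The function names base32Decode and unwrap are hypothetical, and the sketch assumes the standard RFC 4648 base32 alphabet.

```javascript
const zlib = require('zlib');

// Minimal RFC 4648 base32 decoder; padding and whitespace are ignored.
function base32Decode(text) {
  const alphabet = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ234567';
  const bytes = [];
  let bits = 0;  // number of bits currently buffered
  let value = 0; // the buffered bits
  for (const ch of text.replace(/[=\s]/g, '').toUpperCase()) {
    const index = alphabet.indexOf(ch);
    if (index === -1) throw new Error('Not a base32 character: ' + ch);
    value = (value << 5) | index;
    bits += 5;
    if (bits >= 8) {
      bits -= 8;
      bytes.push((value >>> bits) & 0xff);
      value &= (1 << bits) - 1; // keep only the bits not yet emitted
    }
  }
  return Buffer.from(bytes);
}

// base32 text -> base64 text -> gzip bytes -> original content
function unwrap(base32Text) {
  const base64Text = base32Decode(base32Text).toString('ascii');
  const gzipBytes = Buffer.from(base64Text, 'base64');
  return zlib.gunzipSync(gzipBytes);
}

// The first 8 characters of the string in the question already reveal the gzip prefix:
console.log(base32Decode('JA2HGSKB').toString('ascii')); // H4sIA
```

Calling unwrap on the full string from the question should then give you the content of the gzip file as a Buffer.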

Related

Base64 Decoding results contain unknown characters

I have base64-encoded text, but when I decode it, the result contains some unknown characters. The interesting part is that the decoded text contains the data I need, so I just need to remove the unknown characters and split the data.
Do you have any suggestions for stripping these unknown characters?
E.g base64 data:

It is base64-encoded protobuf data. You can decode it with https://protogen.marcgravell.com

issue decoding base64 text: prefix "77u_" causes junk output

I'm currently trying to decode my base64-encoded string. The issue is that decoding gives me junk, but when I remove the 77u_ prefix, the rest decodes fine. Is there any other way to decode it?
Following is my encoded string:
77u_RGF0ZSxUeXBlLENvbmZpcm1hdGlvbiBDb2RlLFN0YXJ0IERhdGUsTmlnaHRzLEd1ZXN0LExpc3RpbmcsRGV0YWlscyxSZWZlcmVuY2UsQ3VycmVuY3ksQW1vdW50LFBhaWQgT3V0LEhvc3QgRmVlLENsZWFuaW5nIEZlZSxHcm9zcyBFYXJuaW5ncyxPY2N1cGFuY3kgVGF4ZXMNCjEwLzMwLzIwMTgsUmVzZXJ2YXRpb24sSE05U1BUOUNYQywxMC8yOS8yMDE4LDMsQWFyb24gSmFibG9uc2tpLEVsZWdhbnQgUHJpdmF0ZSBSb29tIGluIGEgTW9kZXJuIEx1eHVyeSBCdW5nYWxvdywsLFVTRCwxNDUuNTAsLDQuNTAsMzAuMDAsMTUwLjAwLDIxLjANCjEwLzI2LzIwMTgsUmVzZXJ2YXRpb24sSE0zM0gyWEU0SiwxMC8yNS8yMDE4LDMsWWIgQmFiaWUsRWxlZ2FudCBQcml2YXRlIFJvb20gaW4gYSBNb2Rlcm4gTHV4dXJ5IEJ1bmdhbG93LCwsVVNELDE3NC42MCwsNS40MCwzMC4wMCwxODAuMDAsMjUuMg0KMTAvMjIvMjAxOCxSZXNlcnZhdGlvbixITVdSRk1KNU1LLDEwLzIxLzIwMTgsNCxHYWJyaWVsbGEgRGFsdG9uLEVsZWdhbnQgUHJpdmF0ZSBSb29tIGluIGEgTW9kZXJuIEx1eHVyeSBCdW5nYWxvdywsLFVTRCwxNjQuOTAsLDUuMTAsMzAuMDAsMTcwLjAwLDIzLjgNCjEwLzE4LzIwMTgsUmVzZXJ2YXRpb24sSE1NUFk4UDJRRiwxMC8xNy8yMDE4LDQsUMOpdGVyIFZhcmdhLEVsZWdhbnQgUHJpdmF0ZSBSb29tIGluIGEgTW9kZXJuIEx1eHVyeSBCdW5nYWxvdywsLFVTRCwyMjMuMTAsLDYuOTAsMzAuMDAsMjMwLjAwLDMyLjINCjEwLzE2LzIwMTgsUmVzZXJ2YXRpb24sSE1KODVDTTNCWiwxMC8xNS8yMDE4LDIsR3JlZ29yIFNwcmljayxFbGVnYW50IFByaXZhdGUgUm9vbSBpbiBhIE1vZGVybiBMdXh1cnkgQnVuZ2Fsb3csLCxVU0QsODcuMzAsLDIuNzAsMzAuMDAsOTAuMDAsMTIuNg0KMTAvMTMvMjAxOCxSZXNlcnZhdGlvbixITVFOU1BRQUpYLDEwLzEyLzIwMTgsMyxBbWJlciBEdWNrc3dvcnRoLEVsZWdhbnQgUHJpdmF0ZSBSb29tIGluIGEgTW9kZXJuIEx1eHVyeSBCdW5nYWxvdywsLFVTRCwxNDUuNTAsLDQuNTAsMzAuMDAsMTUwLjAwLDIxLjANCjEwLzEyLzIwMTgsUmVzZXJ2YXRpb24sSE1UQlFCREFFMiwxMC8xMS8yMDE4LDIsR29yZG9uIEJsZWVjaG1vcmUsRWxlZ2FudCBQcml2YXRlIFJvb20gaW4gYSBNb2Rlcm4gTHV4dXJ5IEJ1bmdhbG93LCwsVVNELDU4LjIwLCwwLjk4LDAuMDAsNTkuMTgsOC40DQoxMC8xMi8yMDE4LFJlc2VydmF0aW9uLEhNWEZEVzlNRkosMTAvMTEvMjAxOCwxLEhhbXphIE5zb3VyLEVsZWdhbnQgUHJpdmF0ZSBSb29tIGluIGEgTW9kZXJuIEx1eHVyeSBCdW5nYWxvdywsLFVTRCw1My4zNSwsMS42NSwzMC4wMCw1NS4wMCw3LjcNCjEwLzEwLzIwMTgsUmVzZXJ2YXRpb24sSE1XWDVQQ0hBUSwxMC8wOS8yMDE4LDIsTGVhaCBXLEVsZWdhbnQgUHJpdmF0ZSBSb29tIGluIGEgTW9kZXJuIEx1eHVyeSBCdW5nYWxvdywsLFVTRCw4Ny4zMCwsMi43MCwzMC4wMCw5MC4wMCwxMi42DQoxMC8wNC8yMDE4LFJlc2VydmF0aW9u
LEhNQzNGTk04RFcsMTAvMDMvMjAxOCw2LFNhbmRybyBDemVrYWksRWxlZ2FudCBQcml2YXRlIFJvb20gaW4gYSBNb2Rlcm4gTHV4dXJ5IEJ1bmdhbG93LCwsVVNELDM3OC4zMCwsMTEuNzAsMzAuMDAsMzkwLjAwLDU0LjYNCjEwLzAxLzIwMTgsUmVzZXJ2YXRpb24sSE1UUzk0Uk1FMywwOS8zMC8yMDE4LDMsQnJhbmRlbiBIaWNrcyxFbGVnYW50IFByaXZhdGUgUm9vbSBpbiBhIE1vZGVybiBMdXh1cnkgQnVuZ2Fsb3csLCxVU0QsMTE2LjQwLCwzLjYwLDMwLjAwLDEyMC4wMCwxNi44DQo
It's the _ that messes up the result. Everything else is perfectly fine base64. The base64 code table doesn't contain a _; it is the substitution character for / in base64url encoding.
When you replace _ with / the decoding works fine.
When I tested it on https://www.base64decode.org/ and chose ASCII as the source character set, I got ï»¿ in front of the text, which is the UTF-8 byte order mark (BOM) displayed as three single-byte characters. When I changed to UTF-8, there was nothing visible in front of the text.
A short test in Node.js also shows that '77u/' is indeed the base64 encoding of the BOM:
const messageB64 = '77u/';
const buf = Buffer.from(messageB64, 'base64');
console.log(buf); // output: <Buffer ef bb bf>
Conclusion:
your data is base64url encoded
you should convert it back to standard base64 before you decode it
the extra characters are a harmless byte order mark, which is invisible if you use UTF-8 encoding.
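The conversion described in the conclusion can be sketched in Node.js as follows. This is an illustration only: decodeBase64Url is a hypothetical helper name, and dropping the BOM is optional because it is harmless.

```javascript
// Sketch: convert base64url text to standard base64, decode it, and drop a leading UTF-8 BOM.
function decodeBase64Url(text) {
  // Remove line breaks, then map the two base64url characters back to base64.
  let base64 = text.replace(/\s/g, '').replace(/-/g, '+').replace(/_/g, '/');
  // base64url often omits padding; restore it so strict decoders accept the input.
  while (base64.length % 4 !== 0) base64 += '=';
  let bytes = Buffer.from(base64, 'base64');
  // EF BB BF is the UTF-8 byte order mark.
  if (bytes.length >= 3 && bytes[0] === 0xef && bytes[1] === 0xbb && bytes[2] === 0xbf) {
    bytes = bytes.subarray(3);
  }
  return bytes.toString('utf8');
}

// First 16 characters of the string from the question:
console.log(decodeBase64Url('77u_RGF0ZSxUeXBl')); // Date,Type
```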

I have encoded string but i don't know what type is it

I have two encoded strings that use the same encoding method, but I don't know what type it is.
I have tried base64 decoding, but it didn't work.
This is the first encoded string I have: 3qpY0Vw86MZykGfqc7jnVg==
This is the second encoded string I have: nB6dtl3iA5IE1Z+g9SpBrw==
They use the same encoding method.
I want to know what type of encoding is used in these strings, and how to decode them.
These base64 payloads may contain something other than a string, such as raw binary data, an image, or an encrypted payload, that can't be displayed as text.
base64 is not used exclusively to encode text.
For example, to decode a payload and save it to a file:
$ printf %s 'iVBORw0KGgoAAAANSUhEUgAAAGAAAABgCAYAAADimHc4AAAAAXNSR0IArs4c6QAAAARnQU1BAACxjwv8YQUAAAAJcEhZcwAADsQAAA7EAZUrDhsAAAAZdEVYdFNvZnR3YXJlAEFkb2JlIEltYWdlUmVhZHlxyWU8AAAMfUlEQVR4Xu1bC3BU1Rn+7mN3E8IjEiAY3iQxIA8bKiJSRUofjLZKx0cf6mgfap3O6NjptNNqW6vjYDt2+pip09ZXqXa01amIoqJTpVYIijYIpFEgBAMkhEB4JCG7e/fe2+8/ezdsHhYq7t7d5H6ZO3tz7tm7e//vnO9/nLOaSyCAb9C91wA+ISDAZwQE+IyAAJ8REOAzAgJ8RkCAzwgI8BkBAT4jIMBnBAT4jIAAnzHoCbCbt8La+CicQ41eS25hUBOQ+M9LiK++A7G1K5DY9Be4iZh3JXcwKMvRbvQYrA0PI1H7NziH9/IpNWjDShD+/A8Rmne11ys3MOgIcKMdiL34M9hbVnPERwEjDE3a7Tj0sZWIXH4fjCnzk51zAINOgrRwEQ28AG7HgaTxOfplBsi507aDM+OPcI7s83r7j8HnA3Qd5uxLEVpyG3C83WsUDkiCbiJR/wqsd56gg8gNfzAonbAWHobQwm9Ar1oKN9bptbJdN9RrYsNDdNAvq3O/MWijIH30FIQ//V1ow8dS/y2vleAscOPHEXtlBZyWOq/RPwzqMNSctlBFPoh2IhVrKCmiP3AP7Ub8jQfhdB9W7X4hbwlwY13ovKuCDrXZaxkY5uzLEVr8HfqDE4ZWJIQLkXj7cSS2PEdyvAs+IG8JiD5yFWB1o+ueGUg0bvRa+0OL0B8w9jemMTKi9KSsrWl89IKRsJ75Hux9tarND+QlAdFHvwp7z2YgVEiNH4PuB5Yh9tyPvKv9oZ85i1ER/QEN7jqJEyTQKbuhAsQevxHOsRbVlm3kHQGx1XfAaniDkQ6NTykRI+pFJYjXPILuP12LD1MTc8ZnEF58K+DYtL/jtZIEMwL3SBPiL9ytZC3byCsCXCsKu+55ZrZu0tAp8aac6EzA7J3r0P2H5XAO70m290HovGsQqr5S5QDpBQCtcBSsTY/Bqn3Ka8ke8ooAjXJRcNNqGONncSQnkkZMI0EzC+A0vY3ow1dT199NtqeDEhS64FswJp8L2PHeJIwoRfzp22DvHeB9GUTeSZBeMgUFtzwPo/wiGp9yQknpgUgSSZIZEF15HRLbXmCfE3IjEH9gSpJWNEaRmCJQOeXCYvqDG+B0HlJt2UDeESDQjBAKr38M5tzlHPWM6dPLCkIC/YPbfQTRp29F/B+/7FeGDs1laDr/WjoGvpckpiD3co42I7bq+0lysgDjLsI7zzuYZy8Tq8E58D5cifOZ5aoYn9B4Lka0KUk4th/6xHMYkg5X1wR62Wy4bbvgHNyp/u95H52y/cGb0EaeCWPSPNWWSQyKcrS9/TXE1vw0uerFUazkxIOEnRqfUK9cgvDS22GUzfGu8H0tdYhztMtrOnlK1hJxFIrUTfyEassU8lKC+sI4awkKvvJ7JlsLaVWrl1+QmeCSELt+LXOFO2EzhE2NOEP8wYLroQ8brcLTHn8gRTuSEX3i2xkPTXNWgmRhxd61AW7nQcrB+F6jeiBoI8bCrLhI9Xdb36MtnaQh5Zr4BYNEtDfCad5GR10IfWy5ahMSnI42uPu3KeJSn6P6MzmTtYPQ7EtVWyaQswQ4O99A7NkfwN6xLlnH4eDURpb2GHUgiMYbVUtUycGlHMnSpDjs5EWSIElXRyucDzbxAyxo489WUZMx4Rw4+7bCaf+A/aRrmj9o2sQ8oTgZumYAOUmACIG1cSXs3Rvhdh3kTFhPA21WRoVNTR89lUQMPCNEcozKxWpNwKVBZRSDJPQYledCkLO3lpHSUbVMqY8cB23sdDiNNSp6IlXCgkr3NImgeM9Q9RXq/R83cpIAp3EjrLf+TDlpUyNUDOgeb4e
zu4YJ1ha4jFzEiPoIGo5y0hdiakOinlETKCF74EpmzJnTQwLPXRLp7K9jJNRA41dyFsxRI93e8TqlyEr6BMqguehmhJfcBn0484YMIOcIkFFnbXgI9vuv8tslIxN1iAFJhhs9qiTEYcbqNJOMeDeNM7ZXiJmC6LxeOgPoJHltDDfVvTyNlxlEP+G0baf+10E7YyrMmZ8F4p2KBG3YGYgsuxPhi2l8Sl+mkHszQJwvdd/ZU8tRyGQobeQKeoiId3EE16t+9l7KE0e5VjxB1XXSoY86E/rkavbvJGEMN3s5Z5IgURKlSpGpGQgt/Dpn3gFEaHiz+sqevplCzuUBIg3ukb3U/zeR2Px32I3rSYSjFlAGioRSMbtkvzqdqc5QNDTnMujjKr0eSTjU9kTNI2qXnHucOi/SlgbX6mYOEUFk+S9gTF/EUT/eu5JZ5GwiJl/KbW9S0pF49xkeq9jAGSFrAEJE2qyQ+F09BnMAVaYeM50+YB7M+V9TYWYKUpJIbHkW1qu/ZsSzWznq1H2k0qoNH4eC6x6lP5ir2rKBvMiE3S5qeGs9HfNjSNS9REMzMhEiVCQ0ABESYsqeoNFToJedQym5Amb5omQXHvaO12CtvQ+2aL+EppxhIjXhLz+A0MzPqX7ZQl4QkIJIh9OxH4m3n4S1/iFlaESK+uu0ECGmlsSKGo9hjOMnViO06KaeXXFSP4o992NKXA0Q60Tkqt8mC3TpMysLyCsCUhC5kPjeqn0a1j9/pyIXqfX/LyJ4UTloY0wFzPOvhzn7C2q2xFZeA33qAoQZbqKPX8gG8pKAFNT6Lh2wbLSKvXI/szePiFT2mwb1mEIEIWVnffxMhM6/gQ73U0BRCQd+dkd+CjlFgHyRj2wGOlirbg3ia1dQXnao0S6G7gv1uLJII+Eoc4iCm1cli3g+IWcIkMgk+tqvYEyeD7PiQuhTFsCcfr539dQhRbjEllWIS6Sz59/QikYrR9sXkkkblUsQuXwF9DMmea3ZR04QIGWG2Mv3IfHmSsbi1GFZpVLxfQxayTTok86FWbUUJrVaK5nivevksJkxq4IecwrJbCXOT80xjQlf5JoHYTJn8BM5QYBzYCdia34Cu+FfnpHSIDovezup9bLHH8yAzfLFnCHzSUy12gk9UBkiHc6+dxF76V44u9artQEh15i1DJFL74ZePNHr5Q9yggC74XXEnrqd2Wp7Mmw8CVySIbsa1OKLxVlSMJzh5XnQz1qiEjCj/IIB7yNJXfzFe2BtW4OCL/0coYXf9K74h5wgIPHOk+h+4ibKxGh+Iy+5YlSixOKk0QkDTYePoAjhITOFfzrDTamI6pPPVStm+rgqFenIw6oqK6VOKxiRvIWP8J8AO8HsltHLut+ojFdpv1pW9ErCAiHBI+aUSOEjqRCV91b7fxJRytQo6BPnkIiZMD95tfInuYCMEOAwtY9Fo/0MZRhM98P9Q8MUpCBmN70DZ+9mVasX34DoUWo/yeA1kRsVQqYI4etABbqBIOUGyZxl90TBJXcjtPR27wo5sm3E45w9aRCzhEIhdWQSGSGgo6MDr69b5xmIxpIRyfaysjJUz/v/tnq4h5tgt76fJKalLrnCxfjdtY6TFGbE0WOqpHwqpKg1X2a7keX303lf4rUCzS0tqNu6VZGQSsgSiQSqZsxAVVWV+j9TyAgBXV1d2LB+vXoYpbveR5SWlmLO3NOrNMoscVq3JxdS2hoYar6V/BlSrEutEUhdR0oVso6QIkRe1fegHOljKhH+4r0wp52QoNbWVrxXXw/Lsk4QwPPyykpUVFSo/zOFvCNgIMhWROfgLrgHG5RsOYcaODNIhPxIjzNEESS5BR20cfYyhBl+GiVTvXcHBGQEzqHdsPfXq58iqe0oR1vgHm1WRbjwhbf0KrwNaQLUipasRkkf7+FPGyI9IkF6iKc8p+N2ju6He2A7NFmiLO2t60OaAOdQExK1f1VbRJTRMgW5N6Mpc9Yl/YpvQ5oAu/EtxJ6
8MfljO+PkWfDpQCKmyGUrEL74Vq8lCT8JGDheyyZkZMrabGS4qulk9AgXqaXKXIL/BAxx5AABlCcpG0iWKiWIDB7qcySTziH4T4Ds2xzByGTUhIwfWvEkJXW5BP/DUNnJ3N7EgcnR6TnAjIGfoY0qU1sZ0xEkYj5jaEdBQxwBAT4jqwToH/KjCr9hmCd+oJdtZNUHRCIRFBcXqwWbXIF8v2g0is7OTvW9su0DskKAIPUxGfi400bqOwqyTUDWNCH1YClSculIIf08W8iqKPd98Fw7/EBGCBCREamRQ3Q1L48sSWXGfMDGmhrfRtXHAdkpMb28HOU8MomMEBDg1JGbgfkQQkCAzwgI8BkBAT4jIMBnBAT4jIAAnxEQ4DMCAnxGQIDPCAjwGQEBPiMgwGcEBPiMgACfERDgMwICfEZAgM8ICPAZAQG+Avgv4WYH0htgNTQAAAAASUVORK5CYII=' | base64 -d > stackoverflow.png

MIME base64 encoding ambiguity in rfc2045

According to the MIME base64 encoding specified in RFC 2045, base64-encoded data must be split into lines of at most 76 characters.
When decoding, all characters not belonging to the base64 alphabet must be ignored.
How do we determine the end of MIME base64-encoded data?
When you've found the start of a base64 encoded object, it should always be possible to find the end without decoding it. Examples:
You might have an email message whose top-level encoding is base64. In that case, the end of the base64 stuff is the end of the body. The end of the body is recognized not by any internal structure, but by the lone . at the end of the SMTP DATA.
If you're reading an email message from an mbox file instead of receiving it via SMTP, the mbox format is responsible for telling you where the end of the message is.
If you have a multipart email body with one part base64, you can scan for the multipart boundary first to find the end of the body part, then pass the whole body part to the base64 decoder.
Similarly, if you have an RFC 2047-encoded header with base64, you can find the terminating ?= first, then pass the encoded portion to the base64 decoder.
Because the terminators are already identified before base64 decoding begins, the decoder never sees the terminator, so the rule about ignoring "characters not belonging to the base64 alphabet" is not relevant to finding the end.
The two steps of finding the end of the base64 data and decoding it can be combined into a single loop over the input, for efficiency. But conceptually they are separate.
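As an illustration of keeping the two steps separate, here is a simplified Node.js sketch for the multipart case. It assumes a single-level multipart message whose boundary string is already known from the Content-Type header, and it does not verify that the delimiter starts at the beginning of a line, which a real parser should do. decodeFirstBase64Part is a hypothetical helper name.

```javascript
// Sketch: locate a base64 body part via its multipart boundary, then decode it.
function decodeFirstBase64Part(message, boundary) {
  const delimiter = '--' + boundary;
  // Step 1: find the part limits without looking at the base64 content.
  const parts = message.split(delimiter).slice(1); // drop the preamble
  for (const part of parts) {
    if (part.startsWith('--')) break; // closing delimiter "--boundary--"
    const headerEnd = part.search(/\r?\n\r?\n/); // blank line between headers and body
    if (headerEnd === -1) continue;
    const headers = part.slice(0, headerEnd);
    if (!/content-transfer-encoding:\s*base64/i.test(headers)) continue;
    const body = part.slice(headerEnd);
    // Step 2: ignore everything outside the base64 alphabet (line breaks etc.), then decode.
    return Buffer.from(body.replace(/[^A-Za-z0-9+\/=]/g, ''), 'base64');
  }
  return null; // no base64 part found
}
```

The decoder in step 2 only ever sees the text between two delimiters, so it never has to decide where the data ends.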

Base64 decode gives garbled result

I copied some base64-encoded text from the web and pasted it into http://www.motobit.com/util/base64-decoder-encoder.asp
but it gives a strange result that looks garbled.
How can I decode the text correctly? Thank you.
nUł"˘ÂUÓ"…F˙i+Şž›‘Ş˘éăŁqëKĄ®qâ %˘ dŞn‘ĄŐňŢ^{łcąWq\)ńÂÔ€
Here is the base64 encoded text
blWzIh+iwgYbVdODDyKFRv9pFCuqnpuRqqLp46Nx60ulrnHioCWiIARk-qm6RpdUB8t5ee7NjG7lXcVwp8cLUg==
