Decode a string with an unknown encoding received from a web browser

Inside a web application I am processing requests to a URL like
http://example.com/<website-base-url>
I am logging the raw GET parameter of the request in a UTF-8 database column and in the filesystem. For a few Chinese domains I get requests with a website-base-url parameter like
%C3%83%C2%A3%C3%82%C2%A5%C3%83%C2%A2%C3%82%C2%A4%C3%83%C2%A2%C3%82%C2%A7%C3%83%C2%A3%C3%82%C2%A5%C3%83%C2%A2%C3%82%C2%A4%C3%83%C2%A2%C3%82%C2%B4%C3%83%C2%A3%C3%82%C2%A8%C3%83%C2%A2%C3%82%C2%B4%C3%83%C2%A2%C3%82%C2%B4.cn
Decoding with urldecode returns
ã¥â¤â§ã¥â¤â´ã¨â´â´.cn
This does not seem to be the domain name the user wants to request.
I have tried urlencoding, base64, utf8 and combinations without success.
Any suggestions on how to decode the given parameter to UTF-8?

URL percent-encoding simply encodes the raw bytes. It does not give you any hint about the character encoding of the text those bytes represent. If you do not know that encoding, all you can do is guess.
php > $d = urldecode('%C3%83%C2%A3%C3%82%C2%A5%C3%83%C2%A2%C3%82%C2%A4%C3%83%C2%A2%C3%82%C2%A7%C3%83%C2%A3%C3%82%C2%A5%C3%83%C2%A2%C3%82%C2%A4%C3%83%C2%A2%C3%82%C2%B4%C3%83%C2%A3%C3%82%C2%A8%C3%83%C2%A2%C3%82%C2%B4%C3%83%C2%A2%C3%82%C2%B4.cn');
php > echo $d;
ã¥â¤â§ã¥â¤â´ã¨â´â´.cn
php > echo iconv('BIG5', 'UTF-8', $d);
php > echo iconv('Shift-JIS', 'UTF-8', $d);
テδ」テつ・テδ「テつ、テδ「テつァテδ」テつ・テδ「テつ、テδ「テつエテδ」テつィテδ「テつエテδ「テつエ.cn
php > echo iconv('GB18030', 'UTF-8', $d);
脙拢脗楼脙垄脗陇脙垄脗搂脙拢脗楼脙垄脗陇脙垄脗麓脙拢脗篓脙垄脗麓脙垄脗麓.cn
GB18030 would seem to be the best candidate, but even that decoded string looks a bit too repetitive to be really useful Chinese.
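Another guess that can be tested mechanically is that the text started out as UTF-8 and was misread as Latin-1 (and re-encoded) several times on its way to the server. The Python sketch below peels off such layers for as long as they decode cleanly. The upper-casing step in the middle is purely a hypothesis (that some component lower-cased the text between two rounds of mis-decoding), and all function names are made up for this illustration.

```python
from urllib.parse import unquote_to_bytes

PARAM = (
    "%C3%83%C2%A3%C3%82%C2%A5%C3%83%C2%A2%C3%82%C2%A4%C3%83%C2%A2%C3%82%C2%A7"
    "%C3%83%C2%A3%C3%82%C2%A5%C3%83%C2%A2%C3%82%C2%A4%C3%83%C2%A2%C3%82%C2%B4"
    "%C3%83%C2%A3%C3%82%C2%A8%C3%83%C2%A2%C3%82%C2%B4%C3%83%C2%A2%C3%82%C2%B4.cn"
)


def peel_utf8_read_as_latin1(text: str) -> str:
    """Undo 'UTF-8 bytes misread as Latin-1' layers until one no longer fits."""
    while True:
        try:
            peeled = text.encode("latin-1").decode("utf-8")
        except UnicodeError:   # covers both the encode and the decode failure
            return text
        if peeled == text:     # pure ASCII: nothing left to undo
            return text
        text = peeled


def upper_non_ascii(text: str) -> str:
    """ASSUMPTION: something lower-cased the mojibake; reverse that guess."""
    return "".join(ch.upper() if ord(ch) > 127 else ch for ch in text)


def guess_original(param: str) -> str:
    text = unquote_to_bytes(param).decode("utf-8")   # percent-decoding gives bytes
    text = peel_utf8_read_as_latin1(text)            # stops at the lower-cased layer
    return peel_utf8_read_as_latin1(upper_non_ascii(text))


print(guess_original(PARAM))
```

Under those assumptions the parameter comes out as 大头贴.cn, which at least looks like a domain name; without knowing what the client actually did, it remains a guess.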


Linux command file -i returns charset=unknown-8bit for a windows-1252 encoded file

I am using Node.js and iconv-lite to create an XML file as an HTTP response with charset windows-1252, but the file -i command cannot identify it as windows-1252.
Server side:
r.header('Content-Disposition', 'attachment; filename=teste.xml');
r.header('Content-Type', 'text/xml; charset=iso8859-1');
r.write(ICONVLITE.encode(`<?xml version="1.0" encoding="windows-1252"?><x>€Àáção</x>`, "win1252")); // euro symbol and Portuguese accented vowels
r.end();
The browser downloads the file and then I check it in Ubuntu 20.04 LTS:
file -i teste.xml
/tmp/teste.xml: text/xml; charset=unknown-8bit
When I use gedit to open it, the accented vowels appear fine but the euro symbol does not (all characters from 128 to 159 get messed up).
I checked in a Windows 10 VM and there all goes well. In web browsers on both Windows and Linux, it also shows fine.
So, is it a problem with the file command? How can I check the actual charset of a file in Linux?
Thank you
EDIT
The resulting file can be downloaded here
2nd EDIT
I found one error! The code line:
r.header('Content-Type', 'text/xml; charset=iso8859-1');
must be:
r.header('Content-Type', 'text/xml; charset=Windows-1252');
It's important to understand what a character encoding is and isn't.
A text file is actually just a stream of bits; or, since we've mostly agreed that there are 8 bits in a byte, a stream of bytes. A character encoding is a lookup table (and sometimes a more complicated algorithm) for deciding what characters to show to a human for that stream of bytes.
For instance, the character "€" encoded in Windows-1252 is the string of bits 10000000. That same string of bits will mean other things in other encodings - most encodings assign some meaning to all 256 possible bytes.
If a piece of software knows that the file is supposed to be read as Windows-1252, it can look up a mapping for that encoding and show you a "€". This is how browsers are displaying the right thing: you've told them in the Content-Type header to use the Windows-1252 lookup table.
Once you save the file to disk, that "Windows-1252" label from the Content-Type header isn't stored anywhere. So any program looking at that file can see that it contains the string of bits 10000000 but it doesn't know what mapping table to look that up in. Nothing you do in the HTTP headers is going to change that - none of those are going to affect how it's saved on disk.
In this particular case the "file" command could look at the "encoding" marker inside the XML document, and find the "windows-1252" there. My guess is that it simply doesn't have that functionality. So instead it uses its general logic for guessing an encoding: it's probably something ASCII-compatible, because it starts with the bytes that spell <?xml in ASCII; but it's not ASCII itself, because it has bytes outside the range 00000000 to 01111111; anything beyond that is hard to guess, so output "unknown-8bit".
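To make the lookup-table idea concrete, here is a small Python sketch. It decodes the single byte 10000000 with two different tables and then does what the file command apparently does not: read the encoding name out of the XML declaration. The helper name sniff_xml_encoding is invented for this example, and the regular expression only handles a declaration at the very start of ASCII-compatible data.

```python
import re

# The same bytes the server in the question sends (encoded as Windows-1252).
document = '<?xml version="1.0" encoding="windows-1252"?><x>€Àáção</x>'.encode("cp1252")


def sniff_xml_encoding(data: bytes):
    """Return the encoding named in a leading XML declaration, or None."""
    match = re.match(rb'<\?xml[^>]*encoding=["\']([^"\']+)["\']', data)
    return match.group(1).decode("ascii") if match else None


euro_byte = b"\x80"                       # the bits 10000000
as_cp1252 = euro_byte.decode("cp1252")    # the euro sign
as_cp437 = euro_byte.decode("cp437")      # same byte, different table, different character

declared = sniff_xml_encoding(document)   # the label stored inside the file itself
text = document.decode(declared)          # decoding with that label restores the text

print(as_cp1252, as_cp437, declared, text)
```

The byte never changes; only the table used to interpret it does, which is why a tool without the label can do no better than "unknown-8bit".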

Why does this base64 string decode online and not locally

I have a base64 encoded JWT that decodes on jwt.io and on base64decode.org but does not decode using the base64 command on my OSX nor my Linux machine (running echo {the string} | base64 --decode). I have no idea why; here is the string:
eyJhbGciOiJSUzI1NiIsInR5cCI6IkpXVCJ9.eyJhY2NvdW50IjoiNTgzZDQzZTItY2Q2ZC00YzBjLTlkNTItMmE4ZmNlY2Y2ZTVmIiwiYWNjb3VudFJvbGVzIjpbIk9PU19QRVJTT05fUkVNQVJLU19FRElUT1IiLCJBV1MtQVJUQVBQLUFETUlOIiwiT09TX1BFUlNPTl9ET0NVTUVOVFNfUkVBREVSX05PTl9NRURJQ0FMIiwiQVdTLVZFU1NFTExPQURJTkctQURNSU4iLCJPT1NfUEVSU09OX1JFTUFSS1NfUkVBREVSX1NBTEFSWSIsIlZQTlVTRVI5IiwiT09TX0NIRUNLTElTVF9BRE1JTiIsIkFXUy1QUk9ELUFETUlOIiwiQ09OQ09VUlNFLkNJIiwiT09TX1BPUlRfQ09NTUVOVFNfRURJVE9SIiwiT09TX1BFUlNPTl9FRElUT1JfU0VSVklDRSIsIk9PU19WRVNTRUxfUkVTUE9OU0lCSUxJVElFU19FRElUT1IiLCJNQU5BR0VEIERBVEFCQVNFUyIsIlZQTlVTRVIxMCIsIk9PU19QT1JUX01ZX0NPTU1FTlRTX0RFTEVUT1IiLCJPT1NfV09SS0ZMT1dfVVNFUiIsIk9PU19QRVJTT05fRURJVE9SIiwiQVdTLVNBTkRCT1gtQURNSU4iLCJPT1NfU1VQRVJfVVNFUiIsIlZQTlVTRVI1IiwiT09TX1ZFU1NFTF9JVElORVJBUllfUkVBREVSIiwiT09TX1BFUlNPTl9SRUFERVJfQkFDS0dST1VORCIsIk9PU19QRVJTT05fTUVESUNBTF9FRElUT1IiLCJPT1NfUEVSU09OX0RFTEVUT1IiLCJPT1NfSU5TUEVDVElPTl9TQ0hFRFVMRV9ERUxFVE9SIiwiT09TX1BFUlNPTl9SRUFERVJfU0VSVklDRSIsIk9PU19QRVJTT05fRE9DVU1FTlRTX0VESVRPUiIsIk9PU19QRVJTT05fRE9DVU1FTlRTX0RFTEVUT1IiLCJWUE5VU0VSMiIsIk9PU19QT1JUX0NPTU1FTlRTX0RFTEVUT1IiLCJPT1NfUEVSU09OX1JFTUFSS1NfUkVBREVSX1BFUlNPTkFMIiwiT09TX1BPUlRfTVlfQ09NTUVOVFNfRURJVE9SIiwiT09TX1RFTVBMQVRFX0xJQlJBUllfRURJVE9SIiwiQVdTLVRFU1QtQURNSU5AOTBQT0UuSU8iLCJPT1NfUEVSU09OX0RPQ1VNRU5UU19ET1dOTE9BREVSX01FRElDQUwiLCJJTlNQRUNUSU9OX1RFTVBMQVRFIiwiVlBOVVNFUjgiLCJPT1NfUEVSU09OX1JBTktfQVZBSUxBQklMSVRZX0VESVRPUiIsIlZQTlVTRVI0IiwiVlBOVVNFUjEiLCJPT1NfUEVSU09OX01FRElDQUxfREVMRVRPUiIsIlZQTlVTRVIzIiwiQVdTLUlBTS1BRE1JTiIsIkFXUy1BUlRBUFAtUFJPRC1BRE1JTiIsIk9PU19GTEVFVF9QRVJGT1JNQU5DRSIsIk9PU19JTlNQRUNUSU9OX1NDSEVEVUxFX1JFQURFUiIsIk9PU19QRVJTT05fRE9DVU1FTlRTX0RPV05MT0FERVJfTk9OX01FRElDQUwiLCJPT1NfUEVSU09OX0RPQ1VNRU5UU19SRUFERVJfTUVESUNBTCIsIk9PU19QT1JUX0lORk9fUkVBREVSIiwiVlBOVVNFUjYiLCJPT1NfUEVSU09OX1JFTUFSS1NfUkVBREVSX01FRElDQUwiLCJBV1MtREVWLUFETUlOIiwiQVdTLVRPT0xTLUFETUlOIiwiQVdTLUJJTExJTkctQURNSU4iLCJJTlNQRUNUSU9OX1RFTVBMQVRFX0lURU0iLCJPT1NfUEVSU09OX1JFTUFSS1NfREVMRVRPUiIsIk9PU19QRVJTT05fUkVBREV
SX1NVTU1BUlkiLCJPT1NfUEVSU09OX1JFQURFUl9QSFlTSUNBTCIsIk9PU19XT1JLRkxPV19FRElUT1IiLCJPT1NfVEVNUExBVEVfTElCUkFSWV9SRUFERVIiLCJBUFBST0FDSF9JTkZPX0VESVRPUiIsIk9PU19URU1QTEFURV9MSUJSQVJZX0RFTEVUT1IiLCJBV1MtU1RHLUFETUlOIiwiT09TX1BFUlNPTl9SRU1BUktTX1JFQURFUl9UUkFWRUwiLCJPT1NfTUFOTklOR19BR0VOVF9WQVJOQSIsIk9PU19QRVJTT05fUkVBREVSX0NPTlRBQ1QiLCJPT1NfUEVSU09OX1JFQURFUl9ORVhUT0ZLSU4iLCJPT1NfSU5TUEVDVElPTl9TQ0hFRFVMRV9FRElUT1IiLCJFVkVSWU9ORSIsIk9PU19QRVJTT05fUkVBREVSX0ZBTUlMWSIsIk9PU19QRVJTT05fUkVNQVJLU19SRUFERVJfRElTQ0lQTElOQVJZIl0sImFwcGxpY2F0aW9uIjoiT09TIiwiY29udGV4dCI6Ik1BTk5JTkdfT0ZGSUNFOjEyOWQxNjAzLWFhZTktNDQ1NC04OWRjLWI4NzFhMGNmNGJhMCIsImV4cCI6MTU0MTc3Njg3OCwiaWF0IjoxNTQxNzc2Mjc4LCJpc3MiOiIiLCJwZXJtaXNzaW9ucyI6WyJJTlNQRUNUSU9OX1NDSEVEVUxFOkRMUlVDIiwiUEVSU09OX1JFTUFSS1NfU0FMQVJZOkNMUlVEIiwiUEVSU09OX0RPQ1VNRU5UU19DRVJUSUZJQ0FURVM6TFJDTlVEIiwiUEVSU09OOkxSQ1VEIiwiUEVSU09OX05FWFRPRktJTjpDTFJVRCIsIlBFUlNPTl9SRU1BUktTX01FRElDQUw6Q0xSVUQiLCJQRVJTT05fRE9DVU1FTlRTX0xJQ0VOU0U6TFJDTlVEIiwiUEVSU09OX0ZBTUlMWTpDTFJVRCIsIlBPUlRfSU5GT19BUFBST0FDSDpMUlVDIiwiSU5TUEVDVElPTl9URU1QTEFURTpMUlVDRCIsIklOU1BFQ1RJT05fVEVNUExBVEVfSVRFTTpMUlVDRCIsIlBMQVRGT1JNX1JFTUFSS1M6TFJVQ0QiLCJQTEFURk9STV9XT1JLRkxPV19ERUZJTklUSU9OOkNVIiwiVkVTU0VMX0lUSU5FUkFSWTpMUiIsIlBPUlRfQ09NTUVOVFM6TFJVQ0QiLCJQRVJTT05fUkFOS19BVkFJTEFCSUxJVFk6TFJDVUQiLCJQRVJTT05fQkFDS0dST1VORDpDTFJVRCIsIkZMRUVUX1BFUkZPUk1BTkNFOkxSIiwiUEVSU09OX1NFUlZJQ0U6TFJVQ0QiLCJQRVJTT05fQ09OVEFDVDpDTFJVRCIsIlBMQVRGT1JNX0RPQ1VNRU5UUzpMUlVDTkQiLCJQRVJTT05fUEhZU0lDQUw6Q0xSVUQiLCJQRVJTT05fUkVNQVJLU19QRVJTT05BTDpDTFJVRCIsIlBMQVRGT1JNX1dPUktGTE9XX0lOU1RBTkNFOkNVIiwiUEVSU09OX0RPQ1VNRU5UU19NRURJQ0FMOkxSVUNORCIsIlBFUlNPTl9SRU1BUktTX1RSQVZFTDpMUiIsIlBFUlNPTl9SRU1BUktTX0RJU0NJUExJTkFSWTpDTFJVRCIsIlBPUlRfTVlfQ09NTUVOVFM6TFJVQ0QiLCJQT1JUX0lORk86TFIiLCJWRVNTRUxfUkVTUE9OU0lCSUxJVElFUzpMUlVDIl0sInVzZXJJZCI6ImYxYzU3ZTRiLWEwZmEtNGJhZS04NWM4LTMzMzE1YWE4ZjJlZCJ9.NJ9jflW7Hk_RLL8X7olwvxeSl6bdeO783_iOkLegBcocZCP-0szh3kZ45gG3WAGN6O2KERuyg7lVGz1b5PT7CpaOyBGTGMYZDN6H6mqqs4EUidS10OV719
1rYxLy0xh5axdw4O9Oci3fUL8U8ueO0AUNVomb_Tdw-3Ncgi7e35wmxv-PZiPe6fs_1aldy3BCCs4nZxV6cu8DeOmBRYuSC8qzYwmwZvWAlj_JAOOOT_1xvG8zQ6YUGPrNreADKZhVnbksjfGBee3fE3GjcFK8Sd1dbwap2hcw0n-FX-2PPK2IqLLeEdrOK1KcaxwHSfCupPGvXQa9ga03GMRhyd95Q9uZ6nSt2UijMEglCCQzSMKMQykriBsAD3_gQ375UnrXwzFdo6qfBWrNiBOuQiHN2E5XqMpRk1GYBH1K1cf8wp1ninGX4kJP6jbIGkAwNlsE0sLeXifUV0SLBHwEEFreq6BjLyXrWnmhXUPMoQOa-07zwx5iypwSSSr_o6x2xsEjKtJWA6kau_uArEfS7fYpTqKrOpZhPMZyrk5LojLgAnsPEgoDmm0PxMDDPaDSExXGwwDDumOw5UDrgoSp6Holt0XJ9sV1N2Jy7uoKCT6268ovIHOwjsrAKugNJAfGRYuxAzT9Dr5R3wxxZxw571yLAxitA_5fZeejgqZqrZ1MaII
I think the issue is that you're trying to decode the full JWT in a single hit. The JWT specification says that a JWT is three base64-encoded pieces separated by . characters.
If you split the string on the two dots, then each of the three pieces does decode using standard command-line tools. The earlier poster is correct that underscores and hyphens are not valid in pure base64 encoding, but there are variants that do allow these characters, so provided you follow a consistent strategy, those characters are fine.
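As a hedged illustration of the split-then-decode approach, the following Python sketch separates a token on its dots and decodes each piece with the URL-safe base64 alphabet, restoring the padding that JWTs omit. The function names are made up for this example, the token is the short sample used further down in this thread, and nothing here verifies the signature.

```python
import base64
import json


def b64url_decode(segment: str) -> bytes:
    """Decode one base64url segment, re-adding the '=' padding JWTs strip."""
    padding = "=" * (-len(segment) % 4)
    return base64.urlsafe_b64decode(segment + padding)


def split_jwt(token: str):
    """Return (header dict, payload dict, raw signature bytes); no verification."""
    header_b64, payload_b64, signature_b64 = token.strip().split(".")
    header = json.loads(b64url_decode(header_b64))
    payload = json.loads(b64url_decode(payload_b64))
    return header, payload, b64url_decode(signature_b64)


token = (
    "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9"
    ".eyJzdWIiOiIxMjM0NTY3ODkwIiwibmFtZSI6IkpvaG5Eb2UiLCJpYXQiOjE1MTYyMzkwMjJ9"
    ".RW7PfdmXPU5baGCV2WIXosKpuahEMUUY4nJvEh7gLDo"
)
header, payload, signature = split_jwt(token)
print(header, payload, len(signature))
```

The header and payload come back as JSON objects, while the signature is just bytes and is not meant to be read as text.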
What you have is not properly encoded base64. You have periods, underscores and hyphens in there.
I was able to clean up your string
eyJhbGciOiJSUzI1NiIsInR5cCI6IkpXVCJ9eyJhY2NvdW50IjoiNTgzZDQzZTItY2Q2ZC00YzBjLTlkNTItMmE4ZmNlY2Y2ZTVmIiwiYWNjb3VudFJvbGVzIjpbIk9PU19QRVJTT05fUkVNQVJLU19FRElUT1IiLCJBV1MtQVJUQVBQLUFETUlOIiwiT09TX1BFUlNPTl9ET0NVTUVOVFNfUkVBREVSX05PTl9NRURJQ0FMIiwiQVdTLVZFU1NFTExPQURJTkctQURNSU4iLCJPT1NfUEVSU09OX1JFTUFSS1NfUkVBREVSX1NBTEFSWSIsIlZQTlVTRVI5IiwiT09TX0NIRUNLTElTVF9BRE1JTiIsIkFXUy1QUk9ELUFETUlOIiwiQ09OQ09VUlNFLkNJIiwiT09TX1BPUlRfQ09NTUVOVFNfRURJVE9SIiwiT09TX1BFUlNPTl9FRElUT1JfU0VSVklDRSIsIk9PU19WRVNTRUxfUkVTUE9OU0lCSUxJVElFU19FRElUT1IiLCJNQU5BR0VEIERBVEFCQVNFUyIsIlZQTlVTRVIxMCIsIk9PU19QT1JUX01ZX0NPTU1FTlRTX0RFTEVUT1IiLCJPT1NfV09SS0ZMT1dfVVNFUiIsIk9PU19QRVJTT05fRURJVE9SIiwiQVdTLVNBTkRCT1gtQURNSU4iLCJPT1NfU1VQRVJfVVNFUiIsIlZQTlVTRVI1IiwiT09TX1ZFU1NFTF9JVElORVJBUllfUkVBREVSIiwiT09TX1BFUlNPTl9SRUFERVJfQkFDS0dST1VORCIsIk9PU19QRVJTT05fTUVESUNBTF9FRElUT1IiLCJPT1NfUEVSU09OX0RFTEVUT1IiLCJPT1NfSU5TUEVDVElPTl9TQ0hFRFVMRV9ERUxFVE9SIiwiT09TX1BFUlNPTl9SRUFERVJfU0VSVklDRSIsIk9PU19QRVJTT05fRE9DVU1FTlRTX0VESVRPUiIsIk9PU19QRVJTT05fRE9DVU1FTlRTX0RFTEVUT1IiLCJWUE5VU0VSMiIsIk9PU19QT1JUX0NPTU1FTlRTX0RFTEVUT1IiLCJPT1NfUEVSU09OX1JFTUFSS1NfUkVBREVSX1BFUlNPTkFMIiwiT09TX1BPUlRfTVlfQ09NTUVOVFNfRURJVE9SIiwiT09TX1RFTVBMQVRFX0xJQlJBUllfRURJVE9SIiwiQVdTLVRFU1QtQURNSU5AOTBQT0UuSU8iLCJPT1NfUEVSU09OX0RPQ1VNRU5UU19ET1dOTE9BREVSX01FRElDQUwiLCJJTlNQRUNUSU9OX1RFTVBMQVRFIiwiVlBOVVNFUjgiLCJPT1NfUEVSU09OX1JBTktfQVZBSUxBQklMSVRZX0VESVRPUiIsIlZQTlVTRVI0IiwiVlBOVVNFUjEiLCJPT1NfUEVSU09OX01FRElDQUxfREVMRVRPUiIsIlZQTlVTRVIzIiwiQVdTLUlBTS1BRE1JTiIsIkFXUy1BUlRBUFAtUFJPRC1BRE1JTiIsIk9PU19GTEVFVF9QRVJGT1JNQU5DRSIsIk9PU19JTlNQRUNUSU9OX1NDSEVEVUxFX1JFQURFUiIsIk9PU19QRVJTT05fRE9DVU1FTlRTX0RPV05MT0FERVJfTk9OX01FRElDQUwiLCJPT1NfUEVSU09OX0RPQ1VNRU5UU19SRUFERVJfTUVESUNBTCIsIk9PU19QT1JUX0lORk9fUkVBREVSIiwiVlBOVVNFUjYiLCJPT1NfUEVSU09OX1JFTUFSS1NfUkVBREVSX01FRElDQUwiLCJBV1MtREVWLUFETUlOIiwiQVdTLVRPT0xTLUFETUlOIiwiQVdTLUJJTExJTkctQURNSU4iLCJJTlNQRUNUSU9OX1RFTVBMQVRFX0lURU0iLCJPT1NfUEVSU09OX1JFTUFSS1NfREVMRVRPUiIsIk9PU19QRVJTT05fUkVBREVS
X1NVTU1BUlkiLCJPT1NfUEVSU09OX1JFQURFUl9QSFlTSUNBTCIsIk9PU19XT1JLRkxPV19FRElUT1IiLCJPT1NfVEVNUExBVEVfTElCUkFSWV9SRUFERVIiLCJBUFBST0FDSF9JTkZPX0VESVRPUiIsIk9PU19URU1QTEFURV9MSUJSQVJZX0RFTEVUT1IiLCJBV1MtU1RHLUFETUlOIiwiT09TX1BFUlNPTl9SRU1BUktTX1JFQURFUl9UUkFWRUwiLCJPT1NfTUFOTklOR19BR0VOVF9WQVJOQSIsIk9PU19QRVJTT05fUkVBREVSX0NPTlRBQ1QiLCJPT1NfUEVSU09OX1JFQURFUl9ORVhUT0ZLSU4iLCJPT1NfSU5TUEVDVElPTl9TQ0hFRFVMRV9FRElUT1IiLCJFVkVSWU9ORSIsIk9PU19QRVJTT05fUkVBREVSX0ZBTUlMWSIsIk9PU19QRVJTT05fUkVNQVJLU19SRUFERVJfRElTQ0lQTElOQVJZIl0sImFwcGxpY2F0aW9uIjoiT09TIiwiY29udGV4dCI6Ik1BTk5JTkdfT0ZGSUNFOjEyOWQxNjAzLWFhZTktNDQ1NC04OWRjLWI4NzFhMGNmNGJhMCIsImV4cCI6MTU0MTc3Njg3OCwiaWF0IjoxNTQxNzc2Mjc4LCJpc3MiOiIiLCJwZXJtaXNzaW9ucyI6WyJJTlNQRUNUSU9OX1NDSEVEVUxFOkRMUlVDIiwiUEVSU09OX1JFTUFSS1NfU0FMQVJZOkNMUlVEIiwiUEVSU09OX0RPQ1VNRU5UU19DRVJUSUZJQ0FURVM6TFJDTlVEIiwiUEVSU09OOkxSQ1VEIiwiUEVSU09OX05FWFRPRktJTjpDTFJVRCIsIlBFUlNPTl9SRU1BUktTX01FRElDQUw6Q0xSVUQiLCJQRVJTT05fRE9DVU1FTlRTX0xJQ0VOU0U6TFJDTlVEIiwiUEVSU09OX0ZBTUlMWTpDTFJVRCIsIlBPUlRfSU5GT19BUFBST0FDSDpMUlVDIiwiSU5TUEVDVElPTl9URU1QTEFURTpMUlVDRCIsIklOU1BFQ1RJT05fVEVNUExBVEVfSVRFTTpMUlVDRCIsIlBMQVRGT1JNX1JFTUFSS1M6TFJVQ0QiLCJQTEFURk9STV9XT1JLRkxPV19ERUZJTklUSU9OOkNVIiwiVkVTU0VMX0lUSU5FUkFSWTpMUiIsIlBPUlRfQ09NTUVOVFM6TFJVQ0QiLCJQRVJTT05fUkFOS19BVkFJTEFCSUxJVFk6TFJDVUQiLCJQRVJTT05fQkFDS0dST1VORDpDTFJVRCIsIkZMRUVUX1BFUkZPUk1BTkNFOkxSIiwiUEVSU09OX1NFUlZJQ0U6TFJVQ0QiLCJQRVJTT05fQ09OVEFDVDpDTFJVRCIsIlBMQVRGT1JNX0RPQ1VNRU5UUzpMUlVDTkQiLCJQRVJTT05fUEhZU0lDQUw6Q0xSVUQiLCJQRVJTT05fUkVNQVJLU19QRVJTT05BTDpDTFJVRCIsIlBMQVRGT1JNX1dPUktGTE9XX0lOU1RBTkNFOkNVIiwiUEVSU09OX0RPQ1VNRU5UU19NRURJQ0FMOkxSVUNORCIsIlBFUlNPTl9SRU1BUktTX1RSQVZFTDpMUiIsIlBFUlNPTl9SRU1BUktTX0RJU0NJUExJTkFSWTpDTFJVRCIsIlBPUlRfTVlfQ09NTUVOVFM6TFJVQ0QiLCJQT1JUX0lORk86TFIiLCJWRVNTRUxfUkVTUE9OU0lCSUxJVElFUzpMUlVDIl0sInVzZXJJZCI6ImYxYzU3ZTRiLWEwZmEtNGJhZS04NWM4LTMzMzE1YWE4ZjJlZCJ9NJ9jflW7HkRLL8X7olwvxeSl6bdeO783iOkLegBcocZCP0szh3kZ45gG3WAGN6O2KERuyg7lVGz1b5PT7CpaOyBGTGMYZDN6H6mqqs4EUidS10OV7191rYxL
y0xh5axdw4O9Oci3fUL8U8ueO0AUNVombTdw3Ncgi7e35wmxvPZiPe6fs1aldy3BCCs4nZxV6cu8DeOmBRYuSC8qzYwmwZvWAljJAOOOT1xvG8zQ6YUGPrNreADKZhVnbksjfGBee3fE3GjcFK8Sd1dbwap2hcw0nFX2PPK2IqLLeEdrOK1KcaxwHSfCupPGvXQa9ga03GMRhyd95Q9uZ6nSt2UijMEglCCQzSMKMQykriBsAD3gQ375UnrXwzFdo6qfBWrNiBOuQiHN2E5XqMpRk1GYBH1K1cf8wp1ninGX4kJP6jbIGkAwNlsE0sLeXifUV0SLBHwEEFreq6BjLyXrWnmhXUPMoQOa07zwx5iypwSSSro6x2xsEjKtJWA6kauuArEfS7fYpTqKrOpZhPMZyrk5LojLgAnsPEgoDmm0PxMDDPaDSExXGwwDDumOw5UDrgoSp6Holt0XJ9sV1N2Jy7uoKCT6268ovIHOwjsrAKugNJAfGRYuxAzT9Dr5R3wxxZxw571yLAxitA5fZeejgqZqrZ1MaII
That decodes fine in https://www.developertoolkits.com/base64/decoder
@Gary Lloyd is right: because the JWT format is <header>.<payload>.<signature> and . is not part of the base64 alphabet, the base64 command errors out with invalid input.
What's more, the <header>, <payload>, and <signature> are each encoded as base64url, which is a slight modification of base64.
Another note: the decoded <signature> is raw binary rather than text, so even where it decodes, the output will not be readable.
All that said, try separating the <header> and <payload> and they should decode just fine.
Here's a simple bash function that you can add to your bashrc file if you really want to use the CLI.
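#!/usr/bin/env bash
# Print a JWT's header and payload (read from stdin) as a JSON array.
b64url() {  # map base64url to the standard alphabet, re-pad, decode
  local s; s=$(tr '_-' '/+' <<< "$1")
  while (( ${#s} % 4 )); do s+='='; done
  base64 -d <<< "$s"
}
base64jwt() {
  local parts; IFS=. read -r -a parts   # bash arrays are zero-indexed
  echo "[$(b64url "${parts[0]}"), $(b64url "${parts[1]}")]"
}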
Note that I put the content in an array format [<header_content>, <payload_content>] so that you can easily pipe it to a tool like jq for prettier JSON output.
e.g.
echo "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiIxMjM0NTY3ODkwIiwibmFtZSI6IkpvaG5Eb2UiLCJpYXQiOjE1MTYyMzkwMjJ9.RW7PfdmXPU5baGCV2WIXosKpuahEMUUY4nJvEh7gLDo" \
| base64jwt \
| jq

Read the mail attachment from Linux command line

Is it possible to read the emails based on the Subject line and then get the base64 attachment or directly get the attachment ?
Server : Linux System
Your question seems to presuppose that there is a single attachment and that it can be reliably extracted. In the general case, an email message can have a basically unlimited number of attachments, and the encoding could be one out of several.
But if we assume that you are dealing with a single sender which consistently uses a static message template where the first base64 attachment is always going to be the one you want, something like the following might work:
case $(formail -zcxSubject: <"$message") in
  "Hello, here is your report for "*)
    awk 'BEGIN { h=1 }
      h { if ($0 ~ /^$/) h=0 ; next }                # skip message headers
      /^Content-Disposition: attachment/ { a=1 }    # found the attachment part
      a && /^$/ { p=1; a=0; next }                  # blank line ends the MIME part headers
      p && /^$/ { exit }                            # next blank line ends the base64 data
      p' "$message" |
    base64 -d ;;
esac
This will extract the Subject: header and compare it to a glob pattern. I expect this is what you mean by "based on subject" -- if we find a matching subject header, examine this message, otherwise discard.
The crude Awk script attempts to isolate the base64 data and pass it to base64 -d for extraction. This contains a number of pesky and somewhat crude assumptions about the message format, and probably requires significant additional tweaking. Briefly, we skip the headers, then look for MIME headers identifying an attachment, and print that, skipping everything else in the message. If this header is missing, or identifies the wrong MIME part, you will get no results, or (worse) incorrect results. Also, the /^Content-Disposition:/ regex could theoretically match on a line which is not a MIME header, though this seems highly unlikely (but might actually happen if you are looking e.g. at a bounce message).
A more robust approach would involve a MIME extraction tool or perhaps a custom script to actually parse the MIME structure and extract the part you want. Without details about what exactly you need, I'm not able to provide that. (This would also allow you to use the sender's specified filename; the above script simply prints the decoded payload to standard output.)
Note also that formail has no idea about RFC2047 encoding, so if the subject is not plain ASCII, you have to specify the encoded form in the script.
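As a rough illustration of the more robust approach, here is a minimal Python sketch that uses the standard library's email package to parse the MIME structure instead of scanning for blank lines. The function name extract_attachments and the subject prefix are assumptions made for this example, and it assumes the whole message is available as bytes (for instance, read from a Maildir file).

```python
from email import message_from_bytes, policy


def extract_attachments(raw_message: bytes, subject_prefix: str):
    """Return (filename, decoded_bytes) pairs if the subject matches, else []."""
    # policy.default yields EmailMessage objects and decodes RFC 2047 subjects.
    msg = message_from_bytes(raw_message, policy=policy.default)
    subject = str(msg["Subject"] or "")
    if not subject.startswith(subject_prefix):
        return []
    # get_payload(decode=True) undoes base64 or quoted-printable for each part.
    return [
        (part.get_filename(), part.get_payload(decode=True))
        for part in msg.iter_attachments()
    ]
```

Calling extract_attachments(open(path, "rb").read(), "Hello, here is your report for ") would then return every attachment of a matching message, not just the first. Writing the payloads to disk under the sender's filename would additionally need that name to be sanitised, which is left out here.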

Using a nodejs server to load files with Unicode names

I want to load audio files with names like здраво.mp3, using a NodeJS server. (That's "zdravo" or "hello" in Serbian, if you were wondering).
However, the NodeJS server receives a request for %D0%B7%D0%B4%D1%80%D0%B0%D0%B2%D0%BE.mp3 instead, which results in the file not being found.
If I drag the file into a browser window from my desktop, the browser is happy to load it as file:///path/здраво.mp3, so the issue is not with the way the browser is treating the Unicode string.
The HTML page containing the link to the file has this meta tag in the head section...
<meta charset="utf-8" />
... and it is quite happy to display the text "Здраво" on the page, so the Unicode strings are properly formed within the browser.
I am guessing that the browser is converting the name to ISO-8859-1 before sending the request, and that the NodeJS server somehow needs to convert it back to Unicode before looking for it in the file system.
My question is: is there already a module that I can use to do this conversion, and are there examples of how to use it?
SOLUTION: Following the reply from Edwin Dalorzo, here is the one-line fix that I made to my handleRequest() function:
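function handleRequest(request, response) {
  // the fix: undo the percent-encoding (throws URIError on malformed escapes)
  request.url = decodeURIComponent(request.url)
  var pathname = url.parse(request.url).pathname
  // ... rest of the handler unchanged
}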
It is not clear how you are receiving the encoded string, but you can certainly decode it by simply doing:
decodeURIComponent("%D0%B7%D0%B4%D1%80%D0%B0%D0%B2%D0%BE")
And this will give you back your string "здраво"
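For readers who want to see what the escaped name actually contains, this short sketch (in Python rather than JavaScript; the helper name and the example URL are hypothetical) shows that the %XX sequences are simply the UTF-8 bytes of the file name, not ISO-8859-1. Decoding the path before the file-system lookup therefore recovers the original name.

```python
from urllib.parse import quote, unquote, urlsplit


def url_to_decoded_path(request_url: str) -> str:
    """Split off any query string, then percent-decode the path as UTF-8."""
    path = urlsplit(request_url).path
    return unquote(path)          # unquote() assumes UTF-8 by default


name = "здраво.mp3"
escaped = quote(name)             # what the browser puts on the wire
utf8_bytes = "здраво".encode("utf-8")

print(escaped)
print(utf8_bytes.hex(" ").upper())
print(url_to_decoded_path("/audio/" + escaped + "?v=1"))
```

Parsing the URL first and decoding only the path keeps an escaped ? or # inside a file name from being mistaken for a query or fragment delimiter.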

What is the encoding of an .eml file from IIS's SMTP server?

I need to write a program that reads the .eml files from IIS's mail drop box, but I can't find a definitive source that tells me the encoding of the .eml files. Is there a specification somewhere that tells me the encoding of the files, or do I just have to guess/assume one?
You need to read the Content-Transfer-Encoding header. This value will tell you how the email is encoded. The most common are 7-Bit (no encoding), Quoted-Printable (where you see a lot of =HEX pairs), and base64 (which is base 64 encoding).
Based upon that header value, you decode the following body part using the specified routine.
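A minimal Python sketch of that dispatch, assuming the header value and the raw body bytes have already been separated (the function name decode_body is made up for this example). It only undoes the transfer encoding; turning the resulting bytes into text additionally needs the charset parameter of the Content-Type header.

```python
import base64
import quopri


def decode_body(body: bytes, transfer_encoding: str) -> bytes:
    """Undo a MIME Content-Transfer-Encoding and return the raw bytes."""
    encoding = transfer_encoding.strip().lower()
    if encoding == "base64":
        return base64.b64decode(body)
    if encoding == "quoted-printable":
        return quopri.decodestring(body)
    if encoding in ("", "7bit", "8bit", "binary"):
        return body                       # identity encodings: nothing to undo
    raise ValueError("unsupported Content-Transfer-Encoding: " + transfer_encoding)


print(decode_body(b"Gr=C3=BC=C3=9Fe", "Quoted-Printable").decode("utf-8"))
print(decode_body(b"SGVsbG8=", "base64").decode("ascii"))
```

For real messages with several MIME parts, a full parser such as Python's email package applies the same logic per part.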
I found my answer at en.wikipedia.org/wiki/MIME: "The basic Internet e-mail transmission protocol, SMTP, supports only 7-bit ASCII characters... "
It may be too late to answer, but the .eml file format is nothing more than a plain-text MIME (RFC 822) file format for storing emails.
