I know about Data URIs in which base64 encoded data can be used inline such as images. Today I received an email actually an spam one in which there was an animated (gif) icon in its subject:
Here is the icon alone:
So the only thing did cross my mind was all about Data URIs and if Gmail allows some sort of emoticons to be inserted in subject. I saw the full detailed version of email and pointed to subject line at the below picture:
So GIF comes from =?UTF-8?B?876Urg==?= encoded string which is similar to Data URI scheme however I couldn't get the icon out of it. Here is element HTML source:
Long story short, there are lots of emoticons from https://mail.google.com/mail/e/XXX where XXX are hexadecimal numbers. They are documented nowhere or I couldn't find it. If that's about Data URI, so how is it possible to include them in Gmail's email subject? (I forwarded that email to a yahoo email account, seeing [?] instead of icon) and if it's not, then how that encoded string is parsed?
#Short description:
They are referred to internally as goomoji, and they appear to be a non-standard UTF-8 extension. When Gmail encounters one of these characters, it is replaced by the corresponding icon. I wasn't able to find any documentation on them, but I was able to reverse engineer the format.
#What are these icons?
Those icons are actually the icons that appear under the "Insert emoticons" panel.
While I don't see the 52E icon in the list, there are several others that follow the same convention.
B0C
4F4
Note that there are also some icons whose names are prefixed, such as gtalk.03C . I was not able to determine if or how these icons can be used in this manner.
#What is this Data URI thing?
It's not actually a Data URI, though it does share some similarities. It's actually a special syntax for encoding non-ASCII characters in email subjects, defined in RFC 2047. Basically, it works like this.
=?charset?encoding?data?=
So, in our example string, we have the following data.
=?UTF-8?B?876Urg==?=
charset = UTF-8
encoding = B (means base64)
data = 876Urg==
#So, how does it work?
We know that somehow, 876Urg== means the icon 52E, but how?
If we base64 decode 876Urg==, we get 0xf3be94ae. This looks like the following in binary:
11110011 10111110 10010100 10101110
These bits are consistent with a 4-byte UTF-8 encoded character.
11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
So the relevant bits are the following.:
011 111110 010100 101110
Or when aligned:
00001111 11100101 00101110
In hexadecimal, these bytes are the following:
FE52E
As you can see, except for the FE prefix which is presumably to distinguished the goomoji icons from other UTF-8 characters, it matches the 52E in the icon URL. Some testing proves that this holds true for other icons.
#Sounds like a lot of work, is there a converter?:
This can of course be scripted. I created the following Python code for my testing. These functions can convert the base64 encoded string to and from the short hex string found in the URL. Note, this code is written for Python 3, and is not Python 2 compatible.
###Conversion functions:
import base64
def goomoji_decode(code):
#Base64 decode.
binary = base64.b64decode(code)
#UTF-8 decode.
decoded = binary.decode('utf8')
#Get the UTF-8 value.
value = ord(decoded)
#Hex encode, trim the 'FE' prefix, and uppercase.
return format(value, 'x')[2:].upper()
def goomoji_encode(code):
#Add the 'FE' prefix and decode.
value = int('FE' + code, 16)
#Convert to UTF-8 character.
encoded = chr(value)
#Encode UTF-8 to binary.
binary = bytearray(encoded, 'utf8')
#Base64 encode return end return a UTF-8 string.
return base64.b64encode(binary).decode('utf-8')
###Examples:
print(goomoji_decode('876Urg=='))
print(goomoji_encode('52E'))
###Output:
52E
876Urg==
And, of course, finding an icon's URL simply requires creating a new draft in Gmail, inserting the icon you want, and using your browser's DOM inspector.
If you use the correct hex code point (e.g. fe4f4 for 'pile of poo') and If it is correctly encoded within the subject line header, let it be base64 (see #AlexanderOMara) or quoted-printable (=?utf-8?Q?=F3=BE=93=B4?=), then Gmail will automatically parse and replace it with the corresponding emoji.
Here's a Gmail emoji list for copying and pasting into subject lines - or email bodies. Animated emojis, which will grab even more attention in the inbox, are placed on a yellow background:
Many thanks to Alexander O'Mara for such a well-researched answer about the goomoji-tagged HTML images!
I just wanted to add three things:
There are still many many emoji (and other Unicode sequences generating pictures) that spammers and other erstwhile marketers are starting to use in email subject lines and that gmail does not convert to HTML images. In some browsers these show up bold and colored, which is almost as bad as animation. Browsers could also choose to animate these, but I don't know if any do. These Unicode sequences get displayed by the browser as Unicode text, so the exact appearance (color or not, animated or not, ...) depends on what text rendering system the browser is using. The appearance of a given Unicode emoji also depends on any Unicode variation selectors and emoji modifiers that appear near it in the Unicode code point sequence. Unlike the image-based emoji spam, these sequences can be copied-and-pasted out of the browser and into other apps as Unicode text.
I hope the many marketers reading this StackOverflow question will just say no. It is a horrible idea to include these sequences in your email subject lines and it will immediately tarnish you and your brand as lowlife spammers. It is not worth the "attention" your email will get.
Of course the first question coming to everyone's mind is: "how do I get rid of these things?" Fortunately there is this open-source Greasemonkey/Tampermonkey/Violentmonkey userscript:
Gmail Subject Line Emoji Roach Motel
This userscript eliminates both HTML-image (thanks to awesome work of Alexander O'Mara) and pure-Unicode types.
For the latter type, the userscript includes a regular expression designed to capture the Unicode sequences likely to be abused by marketers. The regex looks like this in ES6 Javascript (the userscript translates this to widely-supported pre-ES6 regex using the amazing ES6 Regex Transpiler):
var re = /(\p{Emoji_Modifier_Base}\p{Emoji_Modifier}?|\p{Emoji_Presentation}|\p{Emoji}\uFE0F|[\u{2100}-\u{2BFF}\u{E000}-\u{F8FF}\u{1D000}-\u{1F5FF}\u{1F650}-\u{1FA6F}\u{F0000}-\u{FFFFF}\u{100000}-\u{10FFFF}])\s*/gu
// which includes the Unicode Emoji pattern from
// https://github.com/tc39/proposal-regexp-unicode-property-escapes
// plus also these blocks frequently used for spammy emojis
// (see https://en.wikipedia.org/wiki/Unicode_block ):
// U+2100..U+2BFF Arrows, Dingbats, Box Drawing, ...
// U+E000..U+F8FF Private Use Area (gmail generates them for some emoji)
// U+1D000..U+1F5FF Musical Symbols, Playing Cards (sigh), Pictographs, ...
// U+1F650..U+1FA6F Ornamental Dingbats, Transport and Map symbols, ...
// U+F0000..U+FFFFF Supplementary Private Use Area-A
// U+100000..U+10FFFF Supplementary Private Use Area-B
// plus any space AFTER the discovered emoji spam
Related
I'm doing some reverse engineering and came across this string in one field of a ProtoBuf-encoded HTTP request. The ProtoBuf binary structure is already decoded; this is the contents of one of its fields.
Does anybody recognize this encoding? It's not base 64 and doesn't appear to be escaped Unicode characters since there are regular non-escaped characters interspersed throughout.
\002\000\000\000P\030\326--\037\352Hx\232\244\322.\224\'\246\004P\3314\372g\274\366\362\337\277\226b\236\nr8\351 ]u\362\214\374\330O5\246Y+\276\005\212\234\017\216\333\312*\313g\357\267t\227\034\244_}\205jiO\261\271\304\013\224\373.zZ\224\230\260\004\2411\000\323\362\345K\300h\307\\\220\335\304\022\357\2230\355\375\032\210\330\2711\374\272\336\277kC]\334?\226\370w\262\023)4 D\273\344H\212\000\347}u\336lOp\237\3666\337j\002*s\033|\010\000\000%\3157/w\327\364\252)\235\245wQ\325+W\026*\215\357E\005\271w\002\246\216\325\002&e\217T!\242\376\307\321\267\016_\017Q\265p\007\035\367\324\216H\314\222\3244\004\353b\017\325\025N\017\205dk\257\237g\"\367\245\324*\204^\010\233\244\002\266\007\231\226w\006\2056\313\265_\236Y\270\nP\216\nq\373\330#\345,\271\241\177\331\271\023K\227\013\317d\335mg\255\266\232pp2d\253\332A:Gs!0>O\226\315_\264G\234\326\240\213\261\253\017\352\214\365\007{ \022\365r<\306\354\355]\320\010\2511\225\215M\276\366P\264\003\315F\314\301\244\350\034\316P\375\317v(\360\244\347\371<$$9\360\267\340H\372\362\271\307\357\215J\3433\215\331=\tqQ1\354\213\333R\331--\213Tc\352a\337\236\346[#-\266]\354\202\335\307\333\330\213\351b\254kt\304\210\276\300\013\322\306\242\000\037\177\354#[U|\302j\r{\243\247\257\000XY\344\020x*\310\363\242^\315\271\371\335\210\030\310\255S\240)\234C\'\200+\313\246oM\304\271^6s\345IG3.\306Xhf\235\244\004[\314Tb\373\360\023\3565\265\254\351\236\227b\337\264\207\302\nq\t\\\236\253e\271I\244u\004\204\220\336\367\333\337# \005\361IPe\270v\016\010RPd\306r\254\2651J\250P\320\312l\177g\036\021\276fC\0136\363\372\265\003v\243y\215w\266q\364a\025\210\264\251\227\333\235r\316\275\237r\256o\231\263\3358?\240\001\306: \211\223\375-\247=\361\207\022\321Gb\326\230x\342\203\014^\371\243\037N\000\376+\370\302\351\227F\025\025\017C\225x]\201c\370\373{\230\2656\222\334,\266\016Q\320\005-Y\203z\200\207\205\2667\264\320\027\250\3007,\303\204\006\357\036\254)\271\271!;\233\300P\250\220\306V/_EP\352\272-\242\276q\252C\337BV\022\357\2467y\025\377A\017u\\\335m\352\037\215\026f\354\3100\312\032\235i\333\312|h\255\266\376\234\345\\\361lC\n\022\341K\022cnU\217\'\222Hl\312\006;0\003V\006\255\256\016\262Z\220zo\002\004\316\370\317\371\220O^q\247\313g\301\376\354W\346\001F\262\233\354\024\004kzk\032\313\0132u\346R\013z:TQ\007\347\273\343\022&X\357\334\305\307;\221W\301\236\360Ap\311\t\024W\004i\221\301a\356}\036\362\002J\267R\335\371(\357\025<\322H\232\334a\375\215eSl\324\214P\367\377T\236\346\346\026\367h\214\275;\013\205\n\302%\\\017\227a\373\376\347\222\\\014cT\340\'\361\024t\t:\203c\314\361W\252\336+\376e\353\336\237\272\2745\315\354\356\272\037Z\246Z\277;\344j\271\022\273\274\025\367\037\257\372p\204\224\314\244\026&o\365\220\235`\365c\377\306\304)&f]q\241\252|d\270H\010?\300$\275\200^!\r\272_\237V\241=\245\020#\314\362\032\031\312t\037\0344\254\264\213Y\315:\215\271\222\277\332\007\220\t\357N]\361O\\\257\352<F\001{\214\317\226\314\'&\232\026\314\350\020\200\316\370\216\231\325\2574\373R\231\316\251\257\260z!\033\203\357\364\310\021\0029\000)\034\010\276Tr\336y0\376\232h~\332y-\354\327w\220\254\321\022\210\266\345\245\325gy\210\357\356\215P$\270\372\3169\365\022\357\225A\324\352\313\340\3445\247\267\352{\037\266\244\205\262\023\t\\\224\020\236\307C\241\371\214\345\216^\271\320\345?\0052\341TD\235j\370\306\236\274\254J\213 \377\212K\032\265\251\367,Q\331\0067ZE\235\253\256\311\022\320\232\205p\262\370\032h\255\304\304D\366\340\276\006\200\307S\230\340?\212jj\261\377r\337\223 \305\217\310\344Xi()*\225z[Y\313_t}\331\240\000>\024:\3242\322\030\352ZWB\247`\320\340\243\204\224\312&\274\321qi\375\231\374\201\235\234{\344\367\002lO\350\363X\361\rh)\231\337r\361\306w\360B\271\013\233IoG\245~:X5%h\222\247J\\w\373\266\374\340\314\313\226\224\204:\250\363\243\265H|\003Y\263\023sZ7#\351)V]{\3065E\210t\207\353^\205q\211\003Yj\373\227Qb4v\2213TO\"S\301^\272\035\t\212|eJ\332t\243\177\274\016ni^ 8\273\317p(N\263j\375\254k\253h\206%ta*LM\270v\2473\220\263\366\211\302=Q\217~\0029\246\236\374\350\247%\221\001`B\337\321N\216wR\235\336\244.K;Y\330\033\372i<\3156{z\310\255\031\021wr{{\331F01\227\010\346B#\341\276\'\246\372S\250\356\222\370,\334h\217\025\334S\016\005\007/,\024\355\024V\246\007\036;\030\337\002c\254\304[\253\tN\331X|$[%*\242\353\254\227>\031\304\203\275\277``c\240\344\277\213\377\204\223\202\026#\367\271\302k\027\262\020H\024p\010\203\264iM\233F7\333\354\352\303\223\217Hi\'\375\010\302\035\013\273F(\032\272\377\252:8\213\304\036\264y\t\265\025\300\317\324za4\010I$Eu\310,\006y,^\3531\027g\343o\314j\270\3152gif\271(\037g\031\375\325\341\320\320\317HJ+\374>%\320\234V\317\332\232x\034x\233R9\245\346_r\307{\030y\234z\331zV\031\264\035\324\003\260AQ\024\217\230\213w\021\3205g\273\275nn\357\275\217?Kd\031\353yF\'\234\201\335+\177\350\001\340D\324\"\340\335\254\304\360=\301\'$\274\235e\032$N\345+\244WKC\204\342\024\307\3103\2722\024\216\002\221UbTn\233\244\261\347\303\340A\312l\317\263Gm\352\000v\245X\334\"\263\315z\374N\244\365\013\375\260\220\251\203\036gD\364p3i#n\016\031[,\336\300\0000\352\001NK()\214\023\222w\014B\242\220\206\034\333\256\265\331-\220\361F\203s\014S\236p\265\236\343g\020HR\235\325W\360\030(\374\341\000\261\315s\315vv\017]s\311o\033c\206\303\245\347\372C\345\207\244\207AL+\306c\026\001\307\3409\331\205\340\371\365\006\263\352kF\010\035K\354\225\035\341\014\360*\232\035\251\t\344\205\374\235\374\352\n}\262+\252\321\377\010G\215\263GA\230\364Z\037\323\351\220\226\272\002\207\254\241\263X\t_ N\307\326\350\246hI\223\223J&\373-\344\243\316\300m.FHmNdS?\tCf\001\252\307\346\205H\026\375)#\006\261g\036\307\252\205\000\027}\212_\021 )4\207#\213n\254H\205\036\325q\217\025\305\036\010J\017\320\257\203\226\025X\313\032,\003\341\003\023cw\375r\337\223\233>+\335\223\206\203}\035!\3100\242Tv\350\255\276\343&\220\213\361\354Ij\035\312_\273\233\333\327;\022\016\315a5\373\217t\324ZJ\202\304(~,B(\215\005E\341\375\036\260C\213\364\240\020\373\340\275\310\2048*\326\"^$\366\367\252#\201\355\000\273\010#`J\230\363\320\363L9\261\216\353((#;3\366oKR\021\nL\244a\244\376\032\304\376\001|\317c\222%c=\\\225\340I\225\301\277G\227\242\366\025\323y0\273\241\217E[\032=\253e\001\270q\005\241\374\276\267$\277Lj\3528\257z\247\242+}\304\254(\013\336g\230\237\270\212I#\245\247\271)\026i\346\366\342\021\005\373i\341`A\020|\367\337\312$`\241\322\007YaQ#\216cy&\371\206\223\264+g\0213b\315\217\371\364\013x\327\2478\0013\352\372\375E\233\352\200\213<\021puH\347x;\354\036\024\\\253_\340\200xH\353\350b\364\207\276*\323J\341\200\r\276]e\217\307\305\275\350\004V\300\272\271\010\345KM\330\2716$\030\225\223\322\347\325\260\331Ok\0340Y\241\276\353\223\276\253>\256\022\257CE\320\007D\236\201\026\214\177\036\277\347\031\001\254\240L\203\n\332\252c\211Y\031\310\212\r+J\274E0
I wanted to decode my PUBG name. I come to interact with this site: http://ddecode.com/hexdecoder/
It decodes as I want, but now I want to know what technique they use, so I can use it in my project.
Input :
PSYCH%C3%98%E4%B9%82JOKER
Decoded String:
PSYCHØ乂JOKER
Here Is The result Url: http://ddecode.com/hexdecoder/?results=48d3b517a922349a1838240623f6e7c3
You should take a look at Percent encoding, this is a way to encode stuff to be valid written in URLs. The characters after the % symbol are just the hexadecimal UTF-8 values to encode the special characters Ø乂.
0xC3 0x98 corresponds to Ø and 0xE4 0xB9 0x82 to 乂 in UTF-8.
By the way, since you added the encryption badge and wrote the word in your question. In this situation, we cannot speak of decryption; you might want to take a look at the difference between all that terminology (encoding and encryption, for example).
When i try to get subject and content email, i alwasy got this content has been encode, example:
subject: Thư cảm ơn
I got this subject has been encode: Th=C6=B0_c=E1=BA=A3m_=C6=A1n?=
so how to decode this content.
That string looks like it's supposed to be encoded in the format mandated by RFC 2047, except that it is corrupt: RFC 2047 encoded strings always begin with =? end end with ?=.
Assuming you can get a non-corrupt encoded string, you should expect to see not only the Subject but also the display names in To, From, and other headers to be encoded in this way. You might want to look into using a library that does complete MIME parsing for you. Appropriate available libraries will depend on what language you're using.
Hi Im fairly new to base64 code and could do with some info/help etc. I have recently used it while converting my photos into code as part of an experiment regarding my sculptures, I converted a photo into 152 pages of base 64 using an online converter, just to get started. I was wondering how I can reverse this, and also if I edited out some code would the image change or would it not convert at all!? Like I said I am interested in using this to help my experiments. thanks for any info .
Base64 decode functions will decode it. If you remove some of the base64 code it may not translate back into a valid image or translate into valid base64 code at all depending on how much you remove. Depending on how the image was originally compressed, it MAY give you something, but the results will be incredibly unpredictable... So you'll never know what the result of the change will be on the final product. Part of the base64 standard involves a padded width (which is why some strings end in up to 2 = signs).
I am looking for a url encoding method that is most efficient in terms of space. Raw binary (base2) could be represented in base16 which is smaller and is url safe, but base64 is even more efficient. However, the usual base64 encoding isn't url safe....
So what is the smallest encoding method that is also safe for URLS?
This is what the Base64 URL encoding variant is for.
It uses the same standard Base64 Alphabet except that + is changed to - and / is changed to _.
Most modern Base64 implementations will support this alternate encoding. If yours doesn't, it's usually just a matter of doing a search/replace on the Base64 input prior to decoding, or on the output prior to sending it to a browser.
You can use a 62 character representation instead of the usual base 64. This will give you URLs like the youtube ones:
http://www.youtube.com/watch?v=0JD55e5h5JM
You can use the PHP functions provided in this page if you need to map strings to a database numerical ID:
http://bsd-noobz.com/blog/how-to-create-url-shortening-service-using-simple-php
Or this one if you need to directly convert a numerical ID to a short URL string:
http://kevin.vanzonneveld.net/techblog/article/create_short_ids_with_php_like_youtube_or_tinyurl/
"base66" (theoretical, according to spec)
As far as I can tell, the optimal encoding for URLs is a "base66" encoding into the following alphabet:
ABCDEFGHIJKLMNOPQRSTUVWXYZ
abcdefghijklmnopqrstuvwxyz
0123456789-_.~
These are all the "Unreserved characters" according the URI specification RFC 3986 (section 2.3), so they will appear as-is in the URL. Using this "base66" encoding could give a URL like:
https://example.org/articles/.3Ja~jkWe
The question is then if you want . and ~ in your URLs?
On some older servers (ancient by now, I guess) ~joe would mean the "www directory" of the user joe on this server. And thus a user might be confused as to what the ~ character is doing in the middle of your URL.
This is common for academic websites, especially CS professors (e.g. Donald Knuth's website https://www-cs-faculty.stanford.edu/~knuth/)
"base80" (in practice, but not battle-tested)
However, in my own testing the following 14 other symbols also do not get
percent-encoded (in Chrome 95 and Firefox 93):
!$'()*+,:;=#[]
(see also this StackOverflow answer)
leaving a "base80" URL encoding possible. Some of these (notably + and =) would not work in the query string part of the URL, only in the path part. All in all, this ends up giving you beautiful, hyper-compressed URLs like:
https://example.org/articles/1OWG,HmpkySCbBy#RG6_,
https://example.org/articles/21Cq-b6Ud)txMEW$,hc4K
https://example.org/articles/:3Tx**U9X'd;tl~rR]q+
There's a plethora of reasons why you might not want all of those symbols in your URLs. One example is that StackOverflow's own "linkifier" won't include that ending comma in the link it generates (I've manually made it a part of the link here).
Also the percent encoding seems to be quite finicky. In some cases Firefox would initially percent-encode ' and ~] but on later requests would not.