I have been trying to activate a service through SMS, but at one point '<' and '3' come together and turn into the heart emoji, ruining the whole SMS. Is there a way to fix this?
If you can, set the encoding to GSM-7. Emoji are sometimes shown when the encoding is UCS-2 (UTF-16).
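As a rough illustration of the GSM-7 suggestion, the sketch below checks whether a message uses only characters from the GSM 03.38 default alphabet and its extension table, which is what a gateway would need in order to send it as GSM-7 rather than UCS-2. The function name `fits_gsm7` is made up for this example, and whether a given handset still draws "<3" as a heart is up to that handset, so treat this as a pre-send check only.

```python
# GSM 03.38 default alphabet (128 characters) and its extension table.
# These tables are typed out by hand for illustration.
GSM7_BASIC = (
    "@£$¥èéùìòÇ\nØø\rÅåΔ_ΦΓΛΩΠΨΣΘΞ\x1bÆæßÉ !\"#¤%&'()*+,-./0123456789:;<=>?"
    "¡ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÑÜ§¿abcdefghijklmnopqrstuvwxyzäöñüà"
)
GSM7_EXTENDED = "^{}\\[~]|€\f"


def fits_gsm7(message: str) -> bool:
    """Return True if every character can be sent with GSM-7 encoding."""
    allowed = set(GSM7_BASIC + GSM7_EXTENDED)
    return all(ch in allowed for ch in message)


# The literal characters '<' and '3' are both in the GSM-7 alphabet,
# while an actual heart character would force the message to UCS-2.
print(fits_gsm7("Reply <3 to activate"))
print(fits_gsm7("Reply \u2764 to activate"))
```

If the check fails, the message contains at least one character that forces UCS-2, so that character would need to be replaced before asking the gateway for GSM-7.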
I have a PDF document with the following sample text (screenshot) -
But when I copy and paste it into either Word or other text editors, all I see is weird characters:
I am not sure why it gives me weird square boxes instead of pasting the clear, human-readable letters shown in the screenshot. Can someone help me get rid of this issue? Or at least, what should I do to identify the root cause?
================== Workaround found ==================
I tried converting the document's corrupted Unicode to a standard ANSI or Unicode format, but most of the online services couldn't recognize these garbage characters.
This issue could be resolved with some programming, but I didn't want to invest time in a programming approach and preferred an on-the-fly one.
Finally, as suggested by the user 'mkl', converting this document with an OCR service such as "Sejda" or "Adobe OCR" resolved my issue.
I stumbled upon an interesting character while skyping. I was initially interested to send an empty message but, since Skype has some validation rules, it doesn't allow empty messages.
But, with a few ALT+NUMPAD ASCII symbols I saw that for alt+777 it returns ○ on most editors and ̉ on skype, which delivers an empty-like message and this is where curiosity got to me. I started to abuse this symbol and noticed that it can overlap other characters. So if I write the ASCII symbol on a word, the result would be a ̉mủt̉ả̉̉̉t̉ẻd̉W̉ỏ̉r̉̉d̉̉.
You can see for yourself that "̉mủt̉ả̉̉̉t̉ẻd̉W̉ỏ̉r̉̉d̉̉".length == 24 while "mutatedWord".length == 11.
And the funny thing is that, up until now, I could only generate that weird apostrophe on Skype; anywhere else it results in ○.
Can someone explain this behavior to me?
You're writing Unicode codepoint 777 (usually notated in hex, as U+0309). This codepoint is one of Unicode's combining characters, namely "combining hook above". You've already discovered what they do: they add a diacritic to the preceding character.
If there's no base letter before it, there's no consistent behavior. If Skype renders it differently from the other apps you've tried, it is probably using a different font engine.
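For anyone who wants to check this themselves, here is a small Python sketch using the standard `unicodedata` module. It looks up decimal 777 (hex 0x309), shows that it is a combining mark, and shows why the string length grows: the mark is a separate codepoint until the text is normalized. The variable names are just for illustration.

```python
import unicodedata

mark = chr(777)  # decimal 777 is hex 0x309, i.e. U+0309

name = unicodedata.name(mark)
# A non-zero combining class means the character attaches to the
# base character that comes before it.
combining_class = unicodedata.combining(mark)

decomposed = "a" + mark                               # two codepoints
composed = unicodedata.normalize("NFC", decomposed)   # one precomposed letter

print(name, combining_class)
print(len(decomposed), len(composed))
```

NFC normalization only shortens the string where Unicode has a precomposed letter for that combination, so stacked or unusual marks keep counting toward the length.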
Why does the web version of Gmail line-wrap its mail content without marking the break with a =, which makes email processing very difficult?
See the original mail content sent by gmail:
and this mail sent by Mac OS X Mail:
Edited:
As Brandon Invergo said, they are using different encoding methods. I am sorry that I said GMail is not decent.
Edited 2:
Their original content is:
They are wrapped in Gmail, I guess it is according to word-wrap algorithm.
So, there are two separate issues here, and GMail is doing one of them "a different way" and one of them "the wrong way."
First is the issue of encoding. You're correct; GMail is using the UTF-8 character set for plain text mails by default, while Mac OS X Mail is using Quoted Printable, which is a MIME content transfer encoding.
The second issue is word wrapping. RFC 2822 specifies that lines should be 78 characters or fewer (not including the CR+LF). Google solves this problem by (rather aggressively) introducing hard word wrapping, which looks ugly when displayed on smaller screens, etc. Most other mail clients use the features of quoted printable to introduce soft line breaks to comply with this recommendation. That allows mail clients to tell the difference between a "hard" (i.e. user-intended) and a "soft" (i.e. introduced by the client) line break.
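To make the soft line break idea concrete, here is a minimal Python sketch using the standard `quopri` module. The sample paragraph is invented; the point is only that a long line with no hard breaks comes out wrapped with a trailing `=` on each wrapped line, and that decoding removes those breaks again.

```python
import quopri

# One long paragraph with no hard line breaks in it (made-up sample text).
paragraph = (
    "This is one long paragraph that the user typed without pressing Enter, "
    "so it contains no hard line breaks at all."
).encode("ascii")

encoded = quopri.encodestring(paragraph)
decoded = quopri.decodestring(encoded)

# Each wrapped line ends with "=" (a soft break) and stays short enough
# to respect the line-length recommendation.
for line in encoded.split(b"\n"):
    print(len(line), line)

# Decoding drops the soft breaks and restores the original single line.
print(decoded == paragraph)
```

A hard break typed by the user would survive decoding as a real newline, which is how a receiving client can tell the two kinds of break apart.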
There is no reason GMail couldn't use this Quoted Printable convention instead of UTF-8, or use Format=Flowed (RFC 2646, FAQ) to achieve the same results. These have both been around a while, and it's a little silly that GMail is forcing word wrap on plain-text users, in my opinion.
A good primer on this whole situation is here.
Quoted printable
MIME content transfer encoding
RFC 2822
CR+LF
RFC 2646
Format=flowed FAQ
GMail is sending the text using UTF-8 character encoding, as indicated in the content type. The Mac email client is sending using Quoted-printable encoding. Both are used to send characters that are outside the ASCII range. GMail is sending 8-bit clean messages while Mail is sending 7-bit messages. The 7-bit messages are safer on transports that are not 8-bit clean, although each encoded byte takes three characters, and I would hesitate to say that a mail client that does not use them is somehow not "decent."
I'm trying to create a webpage in Chinese and I realized that while the text looks fine when I run it on browsers, once I change the Character Encoding, the text becomes gibberish. Here's what's happening:
I create my html file in Emacs, encoded in UTF-8.
I upload it to the server, and view it on my browsers (FF, IE, Chrome, Opera) - no problem.
I try to view the page in other encodings via FF > View > Character Encoding > All those different Chinese encoding systems, e.g. Chinese Simplified (HZ)
Apart from UTF-8, on every other encoding the text becomes gibberish.
I'm assuming this isn't a problem - i.e. browsers are smart enough to know which encoding the page is in, and parse the content accurately. What I'm wondering is why I can't read the Chinese text anymore once I change encoding - is it because I don't have Chinese fonts installed on my OS? Should I stick to UTF-8 if my audience are Chinese or should I choose among one of their many encoding systems?
Thanks in advance for your help/opinions.
UTF-8 isn't a 'catch-all' encoding. It's designed to contain international language character symbols for ease of use, but it is still an encoding, just like the other encodings you've selected. You would have to re-save (convert) the text in each encoding to make it appear correctly when viewed with that encoding.
The viewer's encoding MUST match that of the file being read. Viewing UTF-8 as something else makes about as much sense as renaming .txt to .exe and trying to run it.
You should specify the correct encoding in the HTML. The option you're using in the web browser exists only for those rare occasions when a web developer got it wrong and declared a different encoding from the one actually used, OR mixed up 2 different encodings on one page.
Of course changing the encoding in your browser will "break" the text! The browser is taking the stream of UTF-8 bytes and trying to force another encoding on the raw data. Needless to say, the result ain't pretty. Changing the encoding in the browser is NOT the equivalent of converting.
As you surmised correctly, modern browsers usually guess correctly, but not always. As Agent_L said, make sure to declare the encoding in the headers.
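The difference between "viewing as" and "converting to" another encoding can be sketched in a few lines of Python. The two-character sample text is arbitrary, and `gbk` and `gb18030` are used only as examples of Chinese encodings.

```python
text = "中文"

# What the server sends: the UTF-8 bytes of the page.
utf8_bytes = text.encode("utf-8")

# Switching the browser's encoding is like decoding the SAME bytes with a
# different codec. errors="replace" is used because some byte sequences
# may not be valid in the other encoding at all.
misread = utf8_bytes.decode("gbk", errors="replace")

# Converting produces DIFFERENT bytes that represent the same characters.
converted_bytes = text.encode("gb18030")

print(misread == text)                             # the misread text is gibberish
print(converted_bytes == utf8_bytes)               # the byte sequences differ
print(converted_bytes.decode("gb18030") == text)   # matching decoder works
```

So the gibberish has nothing to do with missing fonts: the bytes are fine, they are just being interpreted under the wrong rules.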
The device gets a UTF-16 encoded XML response, which I then parse.
On some devices it doesn't work. All of these devices support UTF-16.
What might the problem be?
It's tricky without seeing code, but if I remember correctly, some Nokia devices were choosy about whether you used a hyphen or not in the encoding name. If "UTF16" doesn't work, try "UTF-16" instead, or vice versa.
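The "try the other spelling" idea can be written as a small fallback loop. The sketch below is in Python purely for illustration: Python's own codec registry happens to accept both spellings, so the helper `first_supported` (a made-up name) only shows the pattern you would port to the device's runtime, where one spelling may be rejected.

```python
import codecs


def first_supported(names):
    """Return the first encoding label the runtime recognizes, or None."""
    for name in names:
        try:
            codecs.lookup(name)
        except LookupError:
            continue  # this spelling is not known here, try the next one
        return name
    return None


# Hypothetical UTF-16 XML response, encoded with a byte order mark.
payload = "<reply>ok</reply>".encode("utf-16")

label = first_supported(["UTF16", "UTF-16"])
xml_text = payload.decode(label)
print(label, xml_text)
```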