When I try to paste a Unicode character into the Text Entity as text, it comes up with the error:
:text(warning): No definition in for character U+21d5
I have also tried to use the Unicode string itself inside the text string.
Anyone know how to do it?
Related
I pasted some text from a text editor (Atom) into IPython and it was rendered as I saw it in the editor, but some special characters appeared, too. These are light-blue caret capital-I's (^I). They seem to represent indentation. Indeed, when I look through the string with index slices, they show up as tab characters (\t).
What is this symbol's name? I tried to find it using unicodedata.name('^I'), but it raised a ValueError: no such name error.
If anyone knows where I can find a table of characters by their string representation, that would save me a lot of time. The unicode.org source cited in the SO post above does not allow that. Something like this, but with ^I.
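As a rough sketch of what is going on here: ^I is caret notation for the control character Ctrl-I, which is the tab (U+0009). Caret notation is obtained by flipping bit 0x40 of the control code. The helper name caret_notation below is made up for illustration. Control characters have no formal Unicode name, which is why unicodedata.name fails for them unless a default is passed.

```python
import unicodedata

def caret_notation(ch: str) -> str:
    """Return the caret notation of an ASCII control character, e.g. '\t' -> '^I'.

    Non-control characters are returned unchanged.
    """
    code = ord(ch)
    if code < 0x20 or code == 0x7F:
        # Flipping bit 0x40 maps 0x09 (tab) to 0x49 ('I'), 0x7F (DEL) to '?', etc.
        return "^" + chr(code ^ 0x40)
    return ch

print(caret_notation("\t"))                   # ^I
print(unicodedata.category("\t"))             # Cc (control character)
print(unicodedata.name("\t", "<no name>"))    # <no name>
```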
I have the following text in a JSON file:
"\u00d7\u0090\u00d7\u0097\u00d7\u0095\u00d7\u0096\u00d7\u00aa \u00d7\u00a4\u00d7\u0095\u00d7\u009c\u00d7\u0092"
which represents the text "אחוזת פולג" in Hebrew.
No matter which encoding/decoding I use, I can't seem to get it right with Python 3.
If, for example, I try:
text = "\u00d7\u0090\u00d7\u0097\u00d7\u0095\u00d7\u0096\u00d7\u00aa \u00d7\u00a4\u00d7\u0095\u00d7\u009c\u00d7\u0092".encode('unicode-escape')
print(text)
I get that text is:
b'\\xd7\\x90\\xd7\\x97\\xd7\\x95\\xd7\\x96\\xd7\\xaa \\xd7\\xa4\\xd7\\x95\\xd7\\x9c\\xd7\\x92'
which, as a bytes object, is almost the correct text. If I were able to remove one backslash from each pair and turn
b'\\xd7\\x90\\xd7\\x97\\xd7\\x95\\xd7\\x96\\xd7\\xaa \\xd7\\xa4\\xd7\\x95\\xd7\\x9c\\xd7\\x92'
into
text = b'\xd7\x90\xd7\x97\xd7\x95\xd7\x96\xd7\xaa \xd7\xa4\xd7\x95\xd7\x9c\xd7\x92'
(note how I changed each double backslash to a single backslash), then
text.decode('utf-8')
would yield the correct text in Hebrew.
But I am struggling to do so, and I couldn't manage to write a piece of code that does this for me (rather than manually, as I just showed).
Any help is much appreciated.
This string does not "represent" Hebrew text (at least not as Unicode code points, UTF-16, UTF-8, or in any well-known way at all). Instead, it represents a sequence of UTF-16 code units, and this sequence consists mostly of multiplication signs (U+00D7), a currency sign (U+00A4), and some weird control characters.
It looks like the original character data has been encoded and decoded several times with some strange combination of encodings.
Assuming that this is what literally is saved in your JSON file:
"\u00d7\u0090\u00d7\u0097\u00d7\u0095\u00d7\u0096\u00d7\u00aa \u00d7\u00a4\u00d7\u0095\u00d7\u009c\u00d7\u0092"
you can recover the Hebrew text as follows:
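(jsonInput
  .encode('latin-1')             # literal text -> bytes, one byte per character
  .decode('raw_unicode_escape')  # interpret the literal \uXXXX escape sequences
  .encode('latin-1')             # code points U+0000..U+00FF -> the same byte values
  .decode('utf-8')               # those bytes are the actual UTF-8 data
)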
For the above example, it gives:
'אחוזת פולג'
If you are using a JSON deserializer to read in the data, then you should of course omit the .encode('latin-1').decode('raw_unicode_escape') steps, because the deserializer already interprets the escape sequences for you. That is, after the text element is loaded by the JSON deserializer, it should be sufficient to encode it as latin-1 and then decode it as utf-8. This works because latin-1 (ISO-8859-1) is an 8-bit character encoding that corresponds exactly to the first 256 code points of Unicode, whereas your strangely broken text stores each byte of the UTF-8 encoding as an ASCII escape of a UTF-16 code unit.
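A minimal sketch of the deserializer route, assuming Python's standard json module and the sample value from the question (the variable names raw_json, broken, and fixed are made up for illustration):

```python
import json

# The JSON document exactly as stored on disk: the backslashes are literal,
# hence the raw string. (Sample value taken from the question.)
raw_json = r'"\u00d7\u0090\u00d7\u0097\u00d7\u0095\u00d7\u0096\u00d7\u00aa \u00d7\u00a4\u00d7\u0095\u00d7\u009c\u00d7\u0092"'

# json.loads already interprets the \uXXXX escapes, yielding the garbled string.
broken = json.loads(raw_json)

# latin-1 maps code points U+0000..U+00FF to identical byte values, which
# recovers the original UTF-8 bytes; decoding those gives the Hebrew text.
fixed = broken.encode("latin-1").decode("utf-8")
print(fixed)
```

Only the last two steps of the chain remain here, since the JSON parser has already done the work of the raw_unicode_escape step.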
I'm not sure what you can do if your JSON contains both the broken escape sequences and valid text at the same time; the latin-1 step might not work properly any more. Please don't apply this transformation to your whole JSON file unless the JSON itself contains only ASCII. Otherwise it would only make everything worse.
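One possible way to limit the damage, offered only as a heuristic sketch and not as a guaranteed fix: apply the repair per string value and keep the original whenever the round trip fails. The function name repair_mojibake is hypothetical. It does not help when a single string mixes broken and valid text, and it can misfire on valid text that happens to form valid UTF-8 when viewed as latin-1 bytes.

```python
def repair_mojibake(value: str) -> str:
    """Try to undo a 'UTF-8 bytes read as latin-1' mix-up for one string.

    Returns the input unchanged when the round trip is not possible,
    which is the case for most text that was valid to begin with.
    """
    try:
        return value.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # UnicodeEncodeError: the string has characters above U+00FF (e.g. real Hebrew).
        # UnicodeDecodeError: the bytes are not valid UTF-8 (e.g. a genuine 'é').
        return value

print(repair_mojibake("\u00d7\u0090"))  # broken two-character form of U+05D0
print(repair_mojibake("caf\u00e9"))     # valid text, left unchanged
```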
I would like to convert the escaped special characters in the "text" variable back to normal characters in VBA!
Dim text As String
text = "Cs\u00fct\u00f6rt\u00f6k"
text = Encoding.utf8.GetString(Encoding.ASCII.GetBytes(text))
MsgBox text
'Csütörtök would be the correct result
But with the above code, Excel 2013 gives me an error about the "Encoding" method; it can't parse it.
It should work just like this online converter when you put in the text value:
http://www.rapidmonkey.com/unicodeconverter/reverse.jsp
Is there any good solution for this problem? A one-line code maybe?
Thanks in advance!
The Encoding class does not decode escape sequences; you have to do that by parsing the string yourself. For that matter, VB strings use UTF-16, so you do not need the Encoding class at all. Simply replace characters 2-7 of the original string (counting from 0, "\u00fc") with a single &H00FC character, replace characters 9-14 ("\u00f6") with a single &H00F6 character, and so on, and then you are done. Each \uXXXX sequence represents a single UTF-16 code unit, which for these characters is a whole Unicode code point.
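The manual parsing described above can be sketched as follows. This is written in Python rather than VBA, purely to illustrate the logic; the helper name decode_u_escapes is made up. It assumes every escape is exactly four hex digits and does not combine surrogate pairs, so characters outside the Basic Multilingual Plane would need extra handling.

```python
import re

# Matches a literal backslash, 'u', and exactly four hex digits.
_U_ESCAPE = re.compile(r"\\u([0-9a-fA-F]{4})")

def decode_u_escapes(text: str) -> str:
    """Replace each literal \\uXXXX sequence with the character it denotes."""
    return _U_ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), text)

# Raw string: the backslashes are literal, as in the VBA variable.
print(decode_u_escapes(r"Cs\u00fct\u00f6rt\u00f6k"))  # Csütörtök
```

The same loop translates directly to VBA: find each "\u", read the next four characters as a hexadecimal number, and substitute the character with that code (for example via ChrW).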
I have a string in a C# application that needs to be underlined. This needs to be done in Unicode, as the string is exported and displayed in a Word file. To do this I preceded every character with the underline character \u0332, which works, but it does not completely underline the 'm' character, as seen in this screenshot:
I have tried putting \u0332 several times before and after the m, but the output is always the same.
Is there any way to get it to completely underline the character?
EDIT: I just tried using the continuous underline unicode symbol \u2381 but that does not render at all.
U+0332 is a Unicode combining character, so it goes after the character that it modifies. But this only specifies that the character should be underlined. The specific graphical representation depends on the application and its rendering engine; it's not fully supported everywhere. Try pasting the text i̲m̲p̲o̲r̲t̲a̲n̲t̲ into the application and see if it works as intended. If not, then there is nothing you can do, except use another representation such as *important* or IMPORTANT, or export in a supported rich text format (RTF, docx, etc.).
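To make the ordering concrete, here is a small sketch (in Python for brevity; the helper name underline is made up) that appends U+0332 after each character instead of placing it before. How the result is drawn still depends entirely on the rendering application.

```python
COMBINING_LOW_LINE = "\u0332"  # U+0332 COMBINING LOW LINE

def underline(text: str) -> str:
    """Append the combining low line after every character of text."""
    return "".join(ch + COMBINING_LOW_LINE for ch in text)

print(underline("important"))
```

The same idea in C# is a loop that appends each character followed by '\u0332' to a StringBuilder.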
I have the following string, read from an XML attribute:
"OnTrak 4-3/4”, 6-3/4”, 8-1/4” / MPR"
In my C# application it shows up nicely formatted like this
"OnTrak 4-3/4”, 6-3/4”, 8-1/4” / MPR"
This is the form I see in the debugger, a combobox, or on this forum (if I don't indent to specify code).
What I want to do is specify the same string as a C# variable and have it show up nicely formatted when the application runs. Unfortunately, all I get is the string as I literally typed it.
I have tried to play around with converting the encoding from ASCII to UTF8 with no luck. How can I get this special character properly formatted, and where can I find a list of these symbols?
Those are called XML entities. Use HttpUtility.HtmlDecode to decode them back to plain text, as you would like. Credit goes to the question "C#, function to replace all html special characters with normal text characters" for how to convert entities in C#.
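For illustration, the equivalent decoding step in Python uses html.unescape from the standard library. The undecoded form of the attribute is not shown in the question, so the &#8221; references in the sample below are an assumption.

```python
import html

# Assumed raw form of the attribute value; &#8221; is the numeric
# character reference for U+201D (right double quotation mark).
raw = "OnTrak 4-3/4&#8221;, 6-3/4&#8221;, 8-1/4&#8221; / MPR"

decoded = html.unescape(raw)
print(decoded)
```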
Note that converting from ASCII to UTF-8 (or another Unicode encoding) means changing the character encoding, and is only needed when the string contains characters the current encoding cannot represent. For instance, if your strings contained Chinese characters, you couldn't use ASCII. In this simple case you shouldn't need to convert anything, because C# strings are Unicode (UTF-16) by default and XML character references refer to Unicode code points.