AS3 - "\u2605" NOT the same as "\\u"+"2605"?

Trying to make a text field where people write the Unicode escape without the backslash. I want to add the backslash after they've typed it. So the user types u2605 and the code converts it to "\u2605"; I then convert this to a Unicode character and insert it into the TextFlow.
My code:
This works:
span.text = publicFunctions.htmlUnescape(he.encode("\u2605"))
This doesn't work:
span.text = publicFunctions.htmlUnescape(he.encode("\\u"+"2605"))
How do I make a string that acts as a Unicode escape sequence?
I've tried all sorts of things: escape(unescape()), converting to a number, "\\u", "\u" ... nothing helps.
trace("\u2605" == "\u"+"2605") will return false. So will
trace("\u2605" == "\\u"+"2605")

"\u2605" is a string with a single character, the character with the code point 2605, while "\\u" + "2605" is a string with 6 characters (the backslash, the u and the four digit number).
If you want to construct a Unicode character from just the four digits, you should be able to use String.fromCharCode. The catch is that the escape sequence uses a hexadecimal number, while the method takes a decimal number. So if the user enters a hexadecimal string, you will have to convert that first:
trace(String.fromCharCode(parseInt('2605', 16)) == '\u2605'); // true

That's an interesting issue! I don't think you can concatenate a string literal and achieve what you're trying to do. The relevant character escaping happens when the string literal is originally formed, which means that you need the whole sequence together in the first place.
But you should be able to take the user-supplied number and dynamically generate a Unicode string with String.fromCharCode(...).
http://help.adobe.com/en_US/FlashPlatform/reference/actionscript/3/String.html#fromCharCode()
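To put the pieces together for the flow described in the question (the user types u2605 and the character gets inserted), here is a minimal sketch of the same conversion in Python, since the logic is language-independent; in AS3 it is the String.fromCharCode(parseInt(...)) call shown above. The helper name is made up for illustration:

def escape_to_char(user_input):
    # user_input looks like "u2605": drop the leading "u",
    # parse the remaining digits as hexadecimal, then look up
    # the character for that code point.
    code_point = int(user_input.lstrip("u"), 16)  # 0x2605 == 9733
    return chr(code_point)  # chr is Python's String.fromCharCode

print(escape_to_char("u2605"))              # ★
print(escape_to_char("u2605") == "\u2605")  # True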


Why is the string in Unicode form not equal to its Unicode code point value?

We can get the Unicode code point value of the string 你:
u'你'.encode('unicode-escape')
b'\\u4f60'
Why is the string in Unicode form not equal to its Unicode code point value?
u'你' == u'\x4f\x60'
False
u'你' == u'\\u4f60'
False
It is, but your comparison strings are not the right ones to compare against. The first one is two separate single-byte characters (\x4f and \x60), and the second one has the backslash escaped, meaning that it is the literal 6 characters \u4f60.
u'你' == u"\u4f60"
True
The encoded byte string is displayed with two backslashes because its repr escapes the single literal backslash; it is not equivalent even if turned back into a string, unless you decode it with unicode-escape as well.
Side note: the u prefix is the default in Python 3.
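A short sketch of the round trip, for illustration:

s = u'你'

escaped = s.encode('unicode-escape')
print(escaped)  # b'\\u4f60' -- 6 bytes, only one real backslash
print(escaped.decode('unicode-escape') == s)  # True: round trip works

# The escape sequence only becomes a character inside a string literal:
print(s == u'\u4f60')   # True  (the parser processes the escape)
print(s == u'\\u4f60')  # False (literal backslash, u, 4, f, 6, 0)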

Python: why does the length of the list change after turning it into a string?

I have a bunch of users in a list called UserList.
And I do not want the output to have the square brackets, so I run this line:
UserList = [1,2,3,4...]
UserListNoBrackets = str(UserList).strip('[]')
But if I run:
len(UserList) # prints 22 (which is correct).
However:
len(UserListNoBrackets) # prints 170 (whaaat?!)
Anyway, the output is actually correct (I'm pretty sure). Just wondering why that happens.
Here:
UserListNoBrackets = str(UserList).strip('[]')
UserListNoBrackets is a string. A string is a sequence of characters, and len(str) returns the number of characters in the string. A comma is a character, a whitespace is a character, and the string representation of an integer has as many characters as there are digits in the integer. So obviously, the length of your UserListNoBrackets string is much greater than the length of your UserList list.
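A worked example with a shorter list makes the counting concrete (the original list of 22 multi-digit numbers behaves the same way):

user_list = [1, 2, 3, 4]
no_brackets = str(user_list).strip('[]')

print(no_brackets)       # 1, 2, 3, 4
print(len(user_list))    # 4 -- four elements
print(len(no_brackets))  # 10 -- four digits + three commas + three spaces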
You probably need str.join
Ex:
user_list = [1,2,3,4...]
print(",".join(map(str, user_list)))
Note:
Using map to convert all int elements in the list to strings.

How to convert a string like "\u****" to text?

I want to convert a string like "\u****" to text (Unicode) in Haskell.
I have a Java properties file, and it has the following content:
i18n.test.key=\u0050\u0069\u006e\u0067\u0020\uc190\uc2e4\ub960\u0020\ud50c\ub7ec\uadf8\uc778
I want to convert it to Text (Unicode) in Haskell.
I think I can do it like this:
Convert "\u****" to word8 array
Convert word8 array to ByteString
Use Text.Encoding.decodeUtf8 convert ByteString to text
But step 1 is little complicated for me.
How to do it in Haskell?
A simple solution may look like this:
import qualified Data.ByteString as BS
import qualified Data.Text.Encoding as T

decodeJava = T.decodeUtf16BE . BS.concat . gobble

-- Turn each \uXXXX escape into a two-byte (big-endian) chunk.
gobble [] = []
gobble ('\\':'u':a:b:c:d:rest) =
  let sym = convert16 [a, b] [c, d]
  in sym : gobble rest
gobble _ = error "decoding error"

convert16 hi lo = BS.pack [read ("0x" ++ hi), read ("0x" ++ lo)]
Notes:
Your string is UTF-16-encoded, therefore you need decodeUtf16BE.
Decoding will fail if there are other characters in the string. This code will work with your example only if you remove the trailing i.
Constructing the words by appending 0x and, in particular, using read is very slow, but will do the trick for small data.
If you replace \u with \x then this is a valid Haskell string literal.
my_string = "\x0050\x0069\x006e..."
You can then convert to Text if you want, or leave it as String, or whatever.
Watch out, Java normally uses UTF-16 to encode its strings, so interpreting the bytes as UTF-8 will probably not work.
If the codes in your file are UTF-16, you need to do the following:
find the numeric value of each quadruple (a UTF-16 code unit)
check if it is a high surrogate. If so, the following code unit will be a low surrogate, and the pair of surrogates maps to a single Unicode code point.
make a String from your list of code points with map toEnum
The following is a quote from the Java doc http://docs.oracle.com/javase/7/docs/api/ :
The char data type (and therefore the value that a Character object encapsulates) are based on the original Unicode specification, which defined characters as fixed-width 16-bit entities. The Unicode Standard has since been changed to allow for characters whose representation requires more than 16 bits. The range of legal code points is now U+0000 to U+10FFFF, known as Unicode scalar value. (Refer to the definition of the U+n notation in the Unicode Standard.)
The set of characters from U+0000 to U+FFFF is sometimes referred to as the Basic Multilingual Plane (BMP). Characters whose code points are greater than U+FFFF are called supplementary characters. The Java platform uses the UTF-16 representation in char arrays and in the String and StringBuffer classes. In this representation, supplementary characters are represented as a pair of char values, the first from the high-surrogates range, (\uD800-\uDBFF), the second from the low-surrogates range (\uDC00-\uDFFF).
Java has methods to combine a high surrogate character and a low surrogate character to get the Unicode point. You may want to check the source of the java.lang.Character class to find out how exactly they do this, but I guess it is some simple bit-operation.
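The bit-operation is indeed simple. A sketch in Python for illustration (this is the standard UTF-16 formula; Java exposes it as Character.toCodePoint):

def combine_surrogates(high, low):
    # high is in the range 0xD800-0xDBFF, low in 0xDC00-0xDFFF.
    # Each surrogate contributes 10 bits of the code point,
    # offset by 0x10000 (the start of the supplementary planes).
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)

# U+1F600 is encoded in UTF-16 as the surrogate pair D83D DE00:
assert combine_surrogates(0xD83D, 0xDE00) == 0x1F600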
Another possibility would be to check for a Haskell library that does UTF-16 decoding.

In Swift how to obtain the "invisible" escape characters in a string variable into another variable

In Swift I can create a String variable such as this:
let s = "Hello\nMy name is Jack!"
And if I use s, the output will be:
Hello
My name is Jack!
(because the \n is a linefeed)
But what if I want to programmatically obtain the raw characters in the s variable? As in if I want to actually do something like:
let sRaw = s.raw
I made the .raw up, but something like this. So that the literal value of sRaw would be:
Hello\nMy name is Jack!
and it would literally print the string, complete with literal "\n"
Thank you!
The newline is the "raw character" contained in the string.
How exactly you formed the string (in this case from a string literal with an escape sequence in source code) is not retained (it is only available in the source code, but not preserved in the resulting program). It would look exactly the same if you read it from a file, a database, the concatenation of multiple literals, a multi-line literal, a numeric escape sequence, etc.
If you want to print the newline as \n you have to convert it back (by doing text replacement), but again, you don't know if the string was really created from such a literal.
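A minimal sketch of that text replacement, written in Python for illustration (in Swift the same idea can be expressed with replacingOccurrences(of:with:)):

s = "Hello\nMy name is Jack!"

# Escape the backslash first, then the common control characters,
# so already-escaped sequences are not double-processed.
escaped = s.replace("\\", "\\\\").replace("\n", "\\n").replace("\t", "\\t")
print(escaped)  # Hello\nMy name is Jack!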
You can do this with escaped characters such as \n:
let secondaryString = "really"
let s = "Hello\nMy name is \(secondaryString) Jack!"
let find = Character("\n")
let r = String(s.characters.split(find).joinWithSeparator(["\\","n"]))
print(r) // -> "Hello\nMy name is really Jack!"
However, once the string s is generated the \(secondaryString) has already been interpolated to "really" and there is no trace of it other than the replaced word. I suppose if you already know the interpolated string you could search for it and replace it with "\\(secondaryString)" to get the result you want. Otherwise it's gone.

How to match a part of string before a character into one variable and all after it into another

I have a problem with splitting a string into two parts on a special character.
For example:
12345#data
or
1234567#data
The first part has 5-7 characters, separated by "#" from the second part, which holds other data (characters, numbers, it doesn't matter what).
I need to store two parts on each side of # in two variables:
x = 12345
y = data
without "#" character.
I was looking for some Lua string function like splitOn("#"), or a substring-up-to-character function, but I haven't found one.
Use string.match and captures.
Try this:
s = "12345#data"
a,b = s:match("(.+)#(.+)")
print(a, b) -- prints: 12345   data
See this documentation:
First of all, although Lua does not have a split function in its standard library, it does have string.gmatch, which can be used instead of a split function in many cases. Unlike a split function, string.gmatch takes a pattern to match the non-delimiter text, instead of the delimiters themselves.
It is easily achievable with the help of a negated character class with string.gmatch:
local example = "12345#data"
for i in string.gmatch(example, "[^#]+") do
    print(i)
end
The [^#]+ pattern matches one or more characters other than #, so it effectively "splits" the string on the single delimiter character #.
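For comparison, the same before/after split sketched in Python, which has str.partition built in (illustrative only; the Lua answers above are what the question asks for):

s = "12345#data"
x, _, y = s.partition("#")  # split at the first "#"
print(x)  # 12345
print(y)  # data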
