XQuery: how to represent a string as a number?

Assume we have a string, let $string := 'red color', which usually consists of two words. Is there a way to represent this string as a number or a sequence of numbers? I'm using XQuery and MarkLogic.

I found a way to represent a string as a sequence of numbers, which is fine in my case.
The function fn:string-to-codepoints("pla pla pla") returns the sequence of Unicode code points that make up the string.
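For readers who want to check the mapping outside XQuery, the same idea can be sketched in Python. This is only an illustrative analogue, not MarkLogic code: the helper names string_to_codepoints and codepoints_to_string are made up here, and the built-ins ord and chr stand in for fn:string-to-codepoints and fn:codepoints-to-string.

```python
def string_to_codepoints(text):
    # One integer per character: the Unicode code point of that character.
    return [ord(ch) for ch in text]


def codepoints_to_string(points):
    # Inverse mapping, from code points back to characters.
    return "".join(chr(p) for p in points)


codes = string_to_codepoints("red color")
print(codes)                        # [114, 101, 100, 32, 99, 111, 108, 111, 114]
print(codepoints_to_string(codes))  # red color
```

The sequence is lossless, so the original string can always be rebuilt from it.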

Related

Python: why does the length change after turning a list of ints into a string?

I have a bunch of users in a list called UserList.
And I do not want the output to have the square brackets, so I run this line:
UserList = [1,2,3,4...]
UserListNoBrackets = str(UserList).strip('[]')
But if I run:
len(UserList) # prints 22 (which is correct).
However:
len(UserListNoBrackets) #prints 170 (whaaat?!)
Anyway, the output is actually correct (I'm pretty sure). Just wondering why that happens.
Here:
UserListNoBrackets = str(UserList).strip('[]')
UserListNoBrackets is a string. A string is a sequence of characters, and len(str) returns the number of characters in the string. A comma is a character, a white space is a character, and the string representation of an integer has as many characters as there are digits in the integer. So the length of your UserListNoBrackets string is much greater than the length of your UserList list.
You probably need str.join
Ex:
user_list = [1,2,3,4...]
print(",".join(map(str, user_list)))
Note:
map(str, user_list) converts every int element of the list to a string, which is needed because str.join only accepts strings.
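A small sketch of the difference, using a made-up three-element list (the sample values are an assumption, not the asker's data):

```python
user_list = [10, 200, 3000]  # hypothetical sample values

# str() of a list yields one string such as "[10, 200, 3000]".
as_text = str(user_list).strip("[]")

# join builds the output from the elements and gives control over the separator.
joined = ",".join(map(str, user_list))

print(len(user_list))  # 3 elements in the list
print(len(as_text))    # 13 characters: digits, commas and spaces
print(joined)          # 10,200,3000
```

len counts elements for the list and characters for the string, which is why the two numbers differ.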

How to convert a string like "\u****" to text?

I want to convert a string like "\u****" to text (Unicode) in Haskell.
I have a Java properties file, and it has the following content:
i18n.test.key=\u0050\u0069\u006e\u0067\u0020\uc190\uc2e4\ub960\u0020\ud50c\ub7ec\uadf8\uc778
I wanna convert it to text (Unicode) in Haskell.
I think I can do it like this:
1. Convert "\u****" to a Word8 array
2. Convert the Word8 array to a ByteString
3. Use Text.Encoding.decodeUtf8 to convert the ByteString to text
But step 1 is a little complicated for me.
How to do it in Haskell?
A simple solution may look like this:
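import qualified Data.ByteString as BS
import qualified Data.Text as T
import qualified Data.Text.Encoding as T
-- Decode a string that consists only of \uXXXX escapes.
decodeJava :: String -> T.Text
decodeJava = T.decodeUtf16BE . BS.concat . gobble
-- Turn each \uXXXX escape into its two big-endian bytes.
gobble :: String -> [BS.ByteString]
gobble [] = []
gobble ('\\':'u':a:b:c:d:rest) = convert16 [a,b] [c,d] : gobble rest
gobble _ = error "decoding error"
convert16 hi lo = BS.pack [read $ "0x"++hi, read $ "0x"++lo]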
Notes:
Your string is UTF-16 encoded, therefore you need decodeUtf16BE.
Decoding will fail if there are other characters in the string. This code will work with your example only if you remove the trailing i.
Constructing the words by appending 0x and, in particular, using read is very slow, but will do the trick for small data.
If you replace \u with \x then this is a valid Haskell string literal.
my_string = "\x0050\x0069\x006e..."
You can then convert to Text if you want, or leave it as String, or whatever.
Watch out, Java normally uses UTF-16 to encode its strings, so interpreting the bytes as UTF-8 will probably not work.
If the codes in your file are UTF-16, you need to do the following:
find the numeric value (UTF-16 code unit) for each quadruple of hex digits
check if this is a high surrogate character. If so, the following character will be a low surrogate character. The pair of surrogate characters can be mapped to a single Unicode code point.
make a String from your list of Unicode numbers with map toEnum
The following is a quote from the Java doc http://docs.oracle.com/javase/7/docs/api/ :
The char data type (and therefore the value that a Character object encapsulates) are based on the original Unicode specification, which defined characters as fixed-width 16-bit entities. The Unicode Standard has since been changed to allow for characters whose representation requires more than 16 bits. The range of legal code points is now U+0000 to U+10FFFF, known as Unicode scalar value. (Refer to the definition of the U+n notation in the Unicode Standard.)
The set of characters from U+0000 to U+FFFF is sometimes referred to as the Basic Multilingual Plane (BMP). Characters whose code points are greater than U+FFFF are called supplementary characters. The Java platform uses the UTF-16 representation in char arrays and in the String and StringBuffer classes. In this representation, supplementary characters are represented as a pair of char values, the first from the high-surrogates range, (\uD800-\uDBFF), the second from the low-surrogates range (\uDC00-\uDFFF).
Java has methods to combine a high surrogate character and a low surrogate character to get the Unicode code point. You may want to check the source of the java.lang.Character class to find out exactly how they do this, but I guess it is a simple bit operation.
Another possibility would be to check for a Haskell library that does UTF-16 decoding.
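The three steps above (read each quadruple, pair up surrogates, build the string) can be sketched as follows. The sketch is in Python rather than Haskell, purely to illustrate the bit arithmetic; the names parse_units, combine_surrogates and decode_java_escapes are hypothetical, and characters that are not part of an escape are assumed to stand for themselves.

```python
import re

ESCAPE = re.compile(r"\\u([0-9a-fA-F]{4})")


def parse_units(text):
    # Step 1: every \uXXXX quadruple becomes its numeric value;
    # any other character is kept as its own code point.
    units, i = [], 0
    while i < len(text):
        match = ESCAPE.match(text, i)
        if match:
            units.append(int(match.group(1), 16))
            i = match.end()
        else:
            units.append(ord(text[i]))
            i += 1
    return units


def combine_surrogates(units):
    # Step 2: a high surrogate followed by a low surrogate encodes one
    # supplementary code point; everything else passes through unchanged.
    points, i = [], 0
    while i < len(units):
        hi = units[i]
        if (0xD800 <= hi <= 0xDBFF and i + 1 < len(units)
                and 0xDC00 <= units[i + 1] <= 0xDFFF):
            lo = units[i + 1]
            points.append(0x10000 + ((hi - 0xD800) << 10) + (lo - 0xDC00))
            i += 2
        else:
            points.append(hi)
            i += 1
    return points


def decode_java_escapes(text):
    # Step 3: build the final string from the code points.
    return "".join(chr(p) for p in combine_surrogates(parse_units(text)))


print(decode_java_escapes(r"\u0050\u0069\u006e\u0067"))  # Ping
```

Unpaired surrogates are passed through unchanged here; a stricter decoder might reject them instead.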

MAXIMA string to integer conversion

I am reading a list with 2 integers from a file using readline(s) which returns a string. How does one convert the string of 2 integers back to 2 integers? Thanks.
I assume you have a string like "123 456". If so you can do this: map(parse_string, split(s)) where s is your string.
parse_string parses one Maxima expression, e.g. one integer. To get both integers, split the string, which makes a list, and then parse each element of the list.
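The same split-then-parse pattern, sketched in Python for comparison only (this is not Maxima code, and the input string is the one assumed in the answer):

```python
s = "123 456"  # assumed input, as in the answer above

# Split on whitespace into tokens, then parse each token as an integer.
numbers = [int(token) for token in s.split()]

print(numbers)  # [123, 456]
```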

Algorithm for finding string permutations where each position varies

Let's say I have a three-character string "ABC". I want to generate all variants of that string where any letter can be replaced with its lower-case equivalent. For example, "aBC", "abC", "abc", "AbC", "Abc", etc. In other words, given a regexp like [Aa][Bb][Cc], generate every string that can be matched by it.
The problem can be trivially reduced to generating all binary sequences of length n. This has been previously addressed, for example in Fastest way to generate all binary strings of size n into a boolean array? and all permutations of a binary sequence x bits long.
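A minimal sketch of that reduction, assuming every character of the input is a letter with distinct upper and lower case (the function name case_variants is made up for illustration). Each integer from 0 to 2^n - 1 is read as a binary sequence, and bit i decides whether position i is lower-cased.

```python
def case_variants(s):
    n = len(s)
    variants = []
    for mask in range(1 << n):  # every binary sequence of length n
        chars = [
            ch.lower() if (mask >> i) & 1 else ch.upper()
            for i, ch in enumerate(s)
        ]
        variants.append("".join(chars))
    return variants


print(case_variants("ABC"))
```

For "ABC" this yields 2^3 = 8 strings. Characters without a case distinction, such as digits, would produce duplicates under this simple scheme.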

AS3 - "\u2605" NOT the same as "\\u"+"2605"?

I'm trying to make a text field where people type the Unicode escape without the backslash. I want to add the backslash after they have typed it. So the user types u2605 and the code converts it to "\u2605"; I then convert this to a Unicode character and insert it in the textflow.
My code:
this works:
span.text = publicFunctions.htmlUnescape(he.encode("\u2605"))
this doesn't work:
span.text = publicFunctions.htmlUnescape(he.encode("\\u"+"2605"))
how to make a string that acts as a unicode string?
Tried all sorts of things: escape(unescape()), converting to a number, "\u", "\\u" ... nothing helps.
trace("\u2605" == "\\u"+"2605") ... will return false. So will
trace("\u2605" == "\u"+"2605")
"\u2605" is a string with a single character, the character with the code point U+2605, while "\\u" + "2605" is a string with 6 characters (the backslash, the u and the four digits).
If you want to construct a Unicode character from just the four digits, you should be able to use String.fromCharCode. The thing is just that the escape sequence is written in hexadecimal, while the method takes a numeric character code. So if the user enters a hexadecimal string, you will have to parse it first:
trace(String.fromCharCode(parseInt('2605', 16)) == '\u2605');
That's an interesting issue! I don't think you can concatenate a string literal and achieve what you're trying to do. The relevant character escaping happens when the string literal is originally formed, which means that you need the whole sequence together in the first place.
But you should be able to take the user-supplied number and dynamically generate a Unicode string with String.fromCharCode(...).
http://help.adobe.com/en_US/FlashPlatform/reference/actionscript/3/String.html#fromCharCode()
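The same distinction can be checked in Python, shown here only as an analogue of the ActionScript behaviour: the escape is resolved when the literal is parsed, so concatenation at run time never produces the character, while parsing the hex digits and building the character from the number does. The helper name char_from_hex is hypothetical.

```python
def char_from_hex(digits):
    # Parse the hexadecimal digits, then build the character from the number.
    return chr(int(digits, 16))


user_input = "u2605"                  # what the user types, without the backslash
star = char_from_hex(user_input[1:])  # drop the leading "u", keep "2605"

print(star == "\u2605")     # True
print(len("\\u" + "2605"))  # 6
print(len("\u2605"))        # 1
```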
