Converting Unicode in Swift - string

I currently have a string as follows which I received through an API call:
\n\nIt\U2019s a great place to discover Berlin and a comfortable place
to come home to.
And I want to convert it into something like this which is more readable:
It's a great place to discover Berlin and a comfortable place to come
home to.
I've taken a look at this post, but that's manually writing down every conversion, and there may be more of these unicode scalar characters introduced.
As I understand it, \u{2019} is a Unicode scalar escape, but the string I receive uses the format \U2019, which confuses me. Are there any built-in methods to do this conversion?

This answer suggests using the NSString method stringByFoldingWithOptions.
The Swift String class has a concept called a "view" which lets you operate on the string under different encodings. It's pretty neat, and there are some views that might help you.
If you're dealing with strings in Swift, read this excellent post by Mike Ash. He discusses in great detail what a string really is and has some helpful hints for Swift 2.

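Assuming you are already splitting the string and can get the offending format separately:
func convertFormat(stringOrig: String) -> Character? {
    // Take the digits after the "U", e.g. "2019" from "\U2019".
    // They are hexadecimal, so parse them with radix 16.
    guard let digits = stringOrig.split(separator: "U").last,
          let scalarValue = UInt32(digits, radix: 16),
          let scalar = UnicodeScalar(scalarValue) else {
        return nil
    }
    return Character(scalar)
}
This converts the String "\U2019" to the Character represented by "\u{2019}" and returns nil for malformed input. The digits must be parsed as hexadecimal; parsing "2019" as a decimal number would yield U+07E3 instead.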

Related

What problems exist with using CAtlString in a Visual Studio project?

I'm working with a large codebase that is using a mix of ANSI characters and Unicode characters. Currently, there are like 13 home-rolled string classes in the project along with many home-rolled string manipulation functions (not sure why someone needed to write convert_to_upper that uses toupper).
I would like to standardize our string handling and from my past work with MFC, I really think using CAtlString is the best solution.
Ignoring the "I hate Microsoft" reasons, why would you suggest not using it and what alternative would you suggest?
One of the reasons I like CAtlString is the easy handling of string conversion.
CAtlString myString;
myString = "Hello from ANSI";
myString = L"Hello from Unicode";
CAtlStringA ansiString("ANSI");
CAtlStringW unicodeString(ansiString); //Automatic translation to wide string
This kind of flexibility means we don't have to worry about what the string is. You just use the object.
Thanks
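For comparison, the narrow-to-wide step that CAtlStringW performs implicitly can be approximated in standard C++. This is only a minimal sketch: widen is a hypothetical helper, not part of ATL, and it converts through the current C locale with std::mbstowcs, whereas ATL goes through Windows code pages, so the two can disagree on non-ASCII input.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>
#include <vector>

// Hypothetical helper (not part of ATL): widen a narrow string using the
// current C locale. Returns an empty string if the input is not a valid
// multibyte sequence in that locale.
std::wstring widen(const std::string& narrow) {
    // A multibyte string never produces more wide characters than it has
    // bytes, so size() + 1 leaves room for the terminating null.
    std::vector<wchar_t> buffer(narrow.size() + 1);
    std::size_t written =
        std::mbstowcs(buffer.data(), narrow.c_str(), buffer.size());
    if (written == static_cast<std::size_t>(-1)) {
        return std::wstring();
    }
    return std::wstring(buffer.data(), written);
}
```

With such a helper, std::wstring unicodeString = widen("ANSI"); plays the role of the CAtlStringW constructor above, except that the conversion is explicit at the call site.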

How to reverse strings that contain surrogate pairs in Dart?

I was playing with algorithms using Dart and as I actually followed TDD, I realized that my code has some limitations.
I was trying to reverse strings as part of an interview problem, but I couldn't get the surrogate pairs correctly reversed.
const simple = 'abc';
const emoji = '🍎🍏🍛';
const surrogate = '👮🏽‍♂️👩🏿‍💻';
String rev(String s) {
  return String.fromCharCodes(s.runes.toList().reversed);
}
void main() {
  print(simple);
  print(rev(simple));
  print(emoji);
  print(rev(emoji));
  print(surrogate);
  print(rev(surrogate));
}
The output:
abc
cba
🍎🍏🍛
🍛🍏🍎
👮🏽‍♂️👩🏿‍💻
💻‍🏿👩️♂‍🏽👮
You can see that the simple emojis are correctly reversed as I'm using the runes instead of just simply executing s.split('').toList().reversed.join(''); but the surrogate pairs are reversed incorrectly.
How can I reverse a string that might contain surrogate pairs using the Dart programming language?
When reversing strings, you must operate on graphemes, not characters nor code units. Use grapheme_splitter.
Dart 2.7 introduced a new package that supports grapheme cluster-aware operations. The package is called characters. characters is a package for characters represented as Unicode extended grapheme clusters.
Dart's standard String class uses the UTF-16 encoding. This is a common choice in programming languages, especially those that offer support for running both natively on devices, and on the web.
UTF-16 strings usually work well, and the encoding is transparent to the developer. However, when manipulating strings, and especially when manipulating strings entered by users, you may experience a difference between what the user perceives as a character, and what is encoded as a code unit in UTF-16.
Source: "Announcing Dart 2.7: A safer, more expressive Dart" by Michael Thomsen, section "Safe substring handling"
The package will also help to reverse your strings with emojis the way a native programmer would expect.
Using simple Strings, you find issues:
String hi = 'Hi 🇩🇰';
print('String.length: ${hi.length}');
// Prints 7; would expect 4
With characters
String hi = 'Hi 🇩🇰';
print(hi.characters.length);
// Prints 4
print(hi.characters.last);
// Prints 🇩🇰
It's worth taking a look at the source code of the characters package. It's far from simple, but it looks easier to digest and better documented than grapheme_splitter. The characters package is also maintained by the Dart team.

Enhancing String Literals Delimiters to Support Raw Text Swift

I recently found these code snippets in the Swift 5 book.
print(#"Write an interpolated string in Swift using \(multiplier)."#)
// Prints "Write an interpolated string in Swift using \(multiplier)."
print(#"6 times 7 is \#(6 * 7)."#)
// Prints "6 times 7 is 42."
I learnt it was an accepted proposal in Swift 5 for enhancing string literals delimiters to support raw text, with so many examples given.
My question is when and how this is used in practice, because from the examples given above, I could still clearly achieve what I want even without the # signs!
To give just one example where it is very useful: writing a regex. Previously it was a nightmare, as you had to escape all special characters. E.g.
let regex1 = "\\\\[A-Z]+[A-Za-z]+\\.[a-z]+"
Can now be replaced with
let regex2 = #"\\[A-Z]+[A-Za-z]+\.[a-z]+"#
Much easier to write. Now when you find a regex online, you can just copy and paste it in without having to spend ages escaping special characters.
Edit:
You can read more here:
https://www.hackingwithswift.com/articles/162/how-to-use-raw-strings-in-swift

Divide the string with a dot in Groovy

How can I divide a string with dots as delimiters in Groovy?
If I have a string like "22112018", how do I convert it to "22.11.2018"?
EDIT:
I wasn't really sure how to formulate the question. I wanted to 'split' the string but split() method doesn't do what I need (doesn't mean the same).
This answer in the comments (by @ernest_k) was good enough for what I needed:
text = "22112018"
"${text[0..1]}.${text[2..3]}.${text[4..7]}"
However, it was not an "answer" in the SO way, so I'm accepting the answer by @tim_yates (also works and is probably a more precise and robust solution).
I assume this is a date...
You could do:
Date.parse('ddMMyyyy', '22112018').format('dd.MM.yyyy')
instead of just grabbing characters

Use STL string with unicode

I am coding a plugin for Autodesk 3ds Max, and they recommend using the _T(x) macro for every string literal so that it works with Unicode as well. I am using the STL string class a lot in this code. So do I have to rewrite string("foo") as string(_T("foo"))? The STL string class doesn't have a constructor for wchars, so that doesn't make sense, does it?
Thx
Look at the definition of the _T macro: in "Unicode" builds it prepends L to the literal, and in "non-Unicode" builds it leaves the literal unchanged. If you want to keep using the string class and follow the recommendation for your plugin, your best bet is to use something like tstring, which would follow the same rules.
But the truth is - all this "T" business made a lot of sense 10 years ago - all modern Windows versions are Unicode-only and you can just use wstring.
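A minimal sketch of the tstring idea, under stated assumptions: tchar_t, TXT and greeting are hypothetical names that stand in for the TCHAR and _T conventions so the example compiles without <tchar.h>, and UNICODE is assumed to be the build switch.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-ins for TCHAR and _T, chosen by the build setting.
#ifdef UNICODE
typedef wchar_t tchar_t;
#define TXT(x) L##x   // "foo" becomes L"foo" in Unicode builds
#else
typedef char tchar_t;
#define TXT(x) x      // "foo" stays "foo" in non-Unicode builds
#endif

// One string type that follows the build setting: it is std::wstring in
// Unicode builds and std::string otherwise.
typedef std::basic_string<tchar_t> tstring;

// The same source line compiles in both build flavours.
inline tstring greeting() {
    return tstring(TXT("foo"));
}
```

Code written against tstring and the literal macro does not need to change when the build setting flips, which is the property the plugin recommendation is after.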
You could create your own string class, say xstring, use _T for constants, and internally switch to string or wstring depending on whether the build is Unicode. Either that, or instantiate xstring<yourchartype>.
