Replace some characters in a string with the next unicode character


I have an input text as follows:
inputtext = "This is a test";
I need to replace some of the characters (based on certain criteria) with the next Unicode character:
let i = 0;
for c in inputtext.chars() {
    if (somecondition) {
        // Replace char here
        inputtext.replace_range(i..i+1, newchar);
        // println!("{}", c);
    }
}
What is the best way to do this?

You can't easily update a string in-place because a Rust string is not just an array of characters, it's an array of bytes (in UTF-8 encoding), and different characters may use different numbers of bytes. For example, the character ߿ (U+07FF "Nko Taman Sign") uses two bytes, whereas the next Unicode character ࠀ (U+0800 "Samaritan Letter Alaf") uses three.
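The size difference between those two neighbouring characters can be checked directly. The following is a minimal sketch; the helper name `utf8_len` and the sample string are illustrative assumptions, not part of the original answer.

```rust
/// Number of bytes a character occupies when encoded as UTF-8.
/// (`utf8_len` is a hypothetical helper name used only for this sketch.)
fn utf8_len(ch: char) -> usize {
    ch.len_utf8()
}

fn main() {
    let before = '\u{07FF}'; // NKO TAMAN SIGN
    let after = '\u{0800}'; // SAMARITAN LETTER ALAF
    println!("U+07FF: {} bytes, U+0800: {} bytes", utf8_len(before), utf8_len(after));

    // Swapping one for the other changes the byte length of the string,
    // which is why a fixed `i..i+1` byte range cannot work in general.
    let mut s = String::from("a\u{07FF}b");
    let start = 1; // byte offset of U+07FF in this sample string
    s.replace_range(start..start + before.len_utf8(), "\u{0800}");
    println!("byte length after replacement: {}", s.len());
}
```

Every byte offset after the replaced character shifts by one here, so any index computed before the replacement is stale afterwards.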
It's therefore simplest to turn the string into an iterator of characters (using .chars()), manipulate that iterator as appropriate, and then construct a new string using .collect().
For example:
let old = "abcdef";
let new = old.chars()
    // note: there's an edge case if ch == char::MAX which we must decide
    // how to handle. in this case I chose to not change the
    // character, but this may be different from what you need.
    // (the same fallback applies at U+D7FF, just before the surrogate range)
    .map(|ch| {
        if somecondition {
            char::from_u32(ch as u32 + 1).unwrap_or(ch)
        } else {
            ch
        }
    })
    .collect::<String>();
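To make the snippet above runnable, the condition has to be something concrete. The sketch below assumes, purely for illustration, that the criterion is "the character is a lowercase ASCII vowel"; the function name `bump_chars` and the predicate are hypothetical stand-ins for `somecondition`.

```rust
/// Returns a copy of `input` in which every character accepted by
/// `should_bump` is replaced by the next Unicode scalar value.
/// (`bump_chars` and `should_bump` are names invented for this sketch.)
fn bump_chars(input: &str, should_bump: impl Fn(char) -> bool) -> String {
    input
        .chars()
        .map(|ch| {
            if should_bump(ch) {
                // `from_u32` returns None past char::MAX and inside the
                // surrogate range (U+D800..=U+DFFF); in both cases the
                // original character is kept unchanged.
                char::from_u32(ch as u32 + 1).unwrap_or(ch)
            } else {
                ch
            }
        })
        .collect()
}

fn main() {
    // Assumed criterion: lowercase ASCII vowels.
    let is_vowel = |c: char| "aeiou".contains(c);
    println!("{}", bump_chars("This is a test", is_vowel));
}
```

Passing the criterion as a closure keeps the traversal separate from the decision, so the same function works for any rule that looks at one character at a time.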

Related

Split emoji string in Dart

I want to split a string of emojis into individual emojis. How can I do this in Dart?
void main() {
print('GoodJob'.split("")); // output: [G, o, o, d, J, o, b]
print('🤭🎱🏓'.split("")); // output: [�, �, �, �, �, �] but expected: ['🤭','🎱','🏓']
}

The TextField docs recommend using the characters package to work with emoji in Dart.
The docs say:
It's important to always use characters when dealing with user input text that may contain complex characters. This will ensure that extended grapheme clusters and surrogate pairs are treated as single characters, as they appear to the user.
For example, when finding the length of some user input, use string.characters.length. Do NOT use string.length or even string.runes.length. For the complex character "👨‍👩‍👦", this appears to the user as a single character, and string.characters.length intuitively returns 1. On the other hand, string.length returns 8, and string.runes.length returns 5!
import 'package:characters/characters.dart';
void main() {
print('🤭🎱🏓'.characters.split("".characters));
}
outputs
(🤭, 🎱, 🏓)
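The counts quoted from the docs (8 UTF-16 code units and 5 code points for the family emoji) can be cross-checked outside Dart. The sketch below uses Rust rather than Dart, and the helper names `utf16_len` and `code_point_len` are assumptions made for illustration. Rust's standard library has no grapheme-cluster segmentation, so only the two lower-level counts are shown.

```rust
/// Number of UTF-16 code units, comparable to Dart's `string.length`.
/// (Helper names here are hypothetical, chosen for this sketch.)
fn utf16_len(s: &str) -> usize {
    s.encode_utf16().count()
}

/// Number of Unicode code points, comparable to Dart's `string.runes.length`.
fn code_point_len(s: &str) -> usize {
    s.chars().count()
}

fn main() {
    // MAN, ZERO WIDTH JOINER, WOMAN, ZERO WIDTH JOINER, BOY:
    // rendered as a single user-perceived character.
    let family = "\u{1F468}\u{200D}\u{1F469}\u{200D}\u{1F466}";
    println!("family: {} units, {} code points", utf16_len(family), code_point_len(family));

    // The three emojis from the question have no joiners, so splitting
    // by code point happens to give one item per emoji.
    let emojis = "🤭🎱🏓";
    println!("emojis: {} units, {} code points", utf16_len(emojis), code_point_len(emojis));
}
```

The six code units of the second string correspond to the six broken items that `split("")` printed in the question.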

You can match all the emojis using regex, and then add them to a list:
List<String> splitEmoji(String text) {
final List<String> out = [];
final pattern = RegExp(
r'(\u00a9|\u00ae|[\u2000-\u3300]|\ud83c[\ud000-\udfff]|\ud83d[\ud000-\udfff]|\ud83e[\ud000-\udfff])');
final matches = pattern.allMatches(text);
for (final match in matches) {
out.add(match.group(0)!);
}
return out;
}
Regex credit
Usage:
print(splitEmoji('🤭🎱🏓')); // Output: [🤭, 🎱, 🏓]

You can use the runes property of String.
void main() {
final String emojis = '🤭🎱🏓';
final Runes codePoints = emojis.runes;
for (final codePoint in codePoints) {
print(String.fromCharCode(codePoint));
}
}

Is there a better way of achieving horizontal scrolling text effect imitating a limited character display?

I'm trying to imitate a 16-character display on the command line which loops over a long string infinitely similar to a stock exchange ticker.
Right now, I'm first printing the first 16 byte slice of the ASCII string and moving over 1 byte at a time:
package main
import (
"fmt"
"time"
)
const (
chars = 16
text = "There are many variations of passages of Lorem Ipsum available!!!"
)
func main() {
fmt.Print("\033[2J") // clear screen
buf := []byte(text)
i := chars
for {
fmt.Print("\033[H") // move cursor back to first position
fmt.Printf(string(buf[i-chars : i]))
i++
if i == len(buf)+1 {
i = chars
}
time.Sleep(time.Second / 4)
// visualization of what's happening:
// fmt.Printf("\t\t Character:%d of Length:%d | Slice: %d:%d \n", i, len(buf), i-chars, i)
}
}
When I reach the end of the text, I reset the counter inside the loop and start printing again from the first slice. Instead of doing this, I want a "roll over" effect where the head of the slice seamlessly connects to the tail of the slice.
The problem is that I cannot start with an empty buffer and append the head to the tail within the loop, because the buffer would grow endlessly.
So instead, I decided to append the first 16 bytes of the string to its tail before the loop and immediately shrink the slice by 16 bytes. Since those 16 bytes still exist in the backing array, I can expand and shrink the slice from within the loop:
func main() {
fmt.Print("\033[2J") // clear screen
buf := []byte(text)
buf = append(buf, buf[:chars]...)
buf = buf[:len(buf)-chars]
var expanded bool
i := chars
for {
fmt.Print("\033[H") // move cursor back to first position
fmt.Printf(string(buf[i-chars : i]))
i++
if i+1 == len(buf)-chars && !expanded {
buf = buf[:len(buf)+chars]
expanded = true
}
if i+1 == len(buf) {
i = chars
buf = buf[:len(buf)-chars]
expanded = false
}
time.Sleep(time.Second / 2)
// visualization of what's happening:
//fmt.Printf("\t\t Character:%d of Length:%d | Slice: %d:%d | Cap: %d\n", i, len(buf), i-chars, i, cap(buf))
}
}
This gets me to where I want, but I'm rather new to Go so I want to know if there's a better way to achieve the same result?

First, I would not change the buffer. Appending the first 16 chars to its end is a good way to get the "rolling over" effect, but it's much easier and cheaper to just reset the position to 0 when you reach the end of the text.
Next, you don't need to operate on a byte slice; just operate on a string. Strings can be indexed and sliced like slices, and slicing a string doesn't have to make a copy: it returns a new string (header) that shares the backing array of the string data. Keep in mind that indexing and slicing strings use byte indices (not rune indices). That is fine for ASCII text (its characters map to bytes one-to-one in UTF-8) but would not work with multi-byte characters. Your example text is fine.
Also, don't use fmt.Printf() to print plain text: it treats its first argument as a format string. Use fmt.Print() instead.
All in all, your solution can be reduced to the following, which avoids converting a byte slice to a string on every iteration and is cleaner and simpler:
func main() {
fmt.Print("\033[2J") // clear screen
s := text + text[:chars]
for i := 0; ; i = (i + 1) % len(text) {
fmt.Print("\033[H") // move cursor back to first position
fmt.Print(s[i : i+chars])
time.Sleep(time.Second / 2)
}
}
Also note that when the position reaches len(text), it is reset to 0, so the last window before the reset starts with the last char of text and uses only chars-1 characters from the beginning. It is therefore enough to append chars-1 characters instead of chars:
s := text + text[:chars-1]
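The padding argument can be checked by listing every window of one full cycle. This is a minimal sketch in Rust rather than Go; the function name `frames` is hypothetical, and it assumes ASCII text with `1 <= width <= text.len()`, matching the example in the question.

```rust
/// Returns every `width`-byte window of one full cycle of a wrapping ticker.
/// Assumes ASCII `text` and `1 <= width <= text.len()`.
/// (`frames` is a name invented for this sketch.)
fn frames(text: &str, width: usize) -> Vec<String> {
    // Pad with the first `width - 1` bytes so the window that starts at
    // the last byte of `text` can still be taken as one contiguous slice.
    let padded = format!("{}{}", text, &text[..width - 1]);
    (0..text.len())
        .map(|i| padded[i..i + width].to_string())
        .collect()
}

fn main() {
    // Print one cycle; a real ticker would loop forever and sleep
    // between frames, as in the Go code above.
    for frame in frames("Lorem Ipsum! ", 5) {
        println!("[{}]", frame);
    }
}
```

The largest slice end is `text.len() - 1 + width`, which is exactly the padded length, so `width - 1` extra bytes are sufficient.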

TextField.text is assigned with a multi-line string, but they're not equal right after the assignment

I have an Array of strings:
private var phrase:Array = ["You will be given a series of questions like this:\n2 + 2 =\n(click or press ENTER to continue)","You can use the Keyboard or Mouse\nto deliver the answer\n\"ENTER\" locks it in.\n(click or press ENTER to continue)","\nClick Here\n to start."];
I have a conditional later in the script to see if the phrase[0] is equal to the instructText.text, so I put a "test" directly after the assignment as below:
instructText.text = phrase[0];
if (instructText.text == phrase[0]) {
trace("phrase zero");
}
else {
trace("nottttttttt");
}
//OUTPUT: nottttttttt
I've tried various combinations of phrase[0] as String and String(phrase[0]), but haven't had any luck.
What am I missing?

It turns out that the text property of the TextField class converts "Line Feed" characters ("\n", ASCII code 10 decimal, 0x0A) to "Carriage Return" characters (ASCII code 13 decimal, 0x0D).
So you need an LF to CR conversion (or vice versa) to compare what is stored in the property against what you have in the array element on equal terms:
function replaceLFwithCR(s:String):String {
    return s.replace(/\n/g, String.fromCharCode(13));
}
if (instructText.text == replaceLFwithCR(phrase[0])) {
    trace("They are equal :)");
}
else {
    trace("They are NOT equal :(");
}
// Output: They are equal :)
P.S. To get the code of a character, you may utilize the charCodeAt() method of the String class:
trace("\n".charCodeAt(0)); // 10
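A variation on the same idea is to normalize both sides to a single line-ending convention before comparing, so the comparison does not depend on which one the text field happens to store. The sketch below is in Rust rather than ActionScript, and the names `normalize_newlines` and `same_text` are assumptions made for illustration.

```rust
/// Converts CRLF and lone CR line endings to LF.
/// (`normalize_newlines` is a hypothetical helper for this sketch.)
fn normalize_newlines(s: &str) -> String {
    // Replace CRLF first so a pair does not turn into two line feeds.
    s.replace("\r\n", "\n").replace('\r', "\n")
}

/// Compares two strings while ignoring differences in line-ending style.
fn same_text(a: &str, b: &str) -> bool {
    normalize_newlines(a) == normalize_newlines(b)
}

fn main() {
    let phrase = "2 + 2 =\n(click or press ENTER to continue)";
    // What a field that rewrites LF as CR would hand back.
    let stored = "2 + 2 =\r(click or press ENTER to continue)";
    println!("raw equal: {}", phrase == stored);
    println!("normalized equal: {}", same_text(phrase, stored));
}
```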

D: how to remove last char in string?

I need to remove the last char in a string; in my case it's a comma (","):
foreach(line; fcontent.splitLines)
{
string row = line.split.map!(a=>format("'%s', ", a)).join;
writeln(row.chop.chop);
}
I have found only one way: calling chop twice, first to remove \r\n and then to remove the last char.
Is there a better way?

import std.array;
if (!row.empty)
row.popBack();

As usual with string processing, it depends on how much you care about Unicode.
If you only work with ASCII, it is very simple:
import std.encoding;
// no "nice" ASCII literals, D really encourages Unicode
auto str1 = cast(AsciiString) "abcde";
str1 = str1[0 .. $-1]; // get slice of everything but last byte
auto str2 = cast(AsciiString) "abcde\n\r";
str2 = str2[0 .. $-3]; // same principle
If "last char" actually means a Unicode code point (http://unicode.org/glossary/#code_point), it gets a bit more complicated. The easy way is to rely on D's automatic decoding and range algorithms:
import std.range, std.stdio;
auto range = "кириллица".retro.drop(1).retro();
writeln(range);
Here retro (http://dlang.org/phobos/std_range.html#.retro) is a lazy reverse iteration function. It takes any range (a Unicode string is a valid range) and returns a wrapper that iterates it backwards.
drop (http://dlang.org/phobos/std_range.html#.drop), called with 1, simply pops a single range element and discards it. Calling retro again restores the normal iteration order, but now with the last element dropped.
The reason this differs from the ASCII version is the nature of Unicode encodings (specifically UTF-8, which D defaults to): they do not allow random access to an arbitrary code point. You actually need to decode code points one by one to reach a desired index. Fortunately, D takes care of the decoding for you, hiding it behind a convenient range interface.
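The difference between dropping the last byte and dropping the last code point can be shown with the same Cyrillic word. This is a minimal sketch in Rust rather than D; the function name `drop_last_char` is an assumption made for illustration.

```rust
/// Returns `s` without its last Unicode code point (or `s` itself if empty).
/// (`drop_last_char` is a hypothetical helper for this sketch.)
fn drop_last_char(s: &str) -> &str {
    // `char_indices` decodes UTF-8; walking it from the back yields the
    // byte offset at which the final code point starts.
    match s.char_indices().next_back() {
        Some((start, _)) => &s[..start],
        None => s,
    }
}

fn main() {
    let word = "кириллица";
    // Each Cyrillic letter takes two bytes in UTF-8, so byte length and
    // code point count differ.
    println!("{} bytes, {} code points", word.len(), word.chars().count());
    println!("{}", drop_last_char(word));

    // For ASCII input the result matches a plain byte slice.
    println!("{}", drop_last_char("'a', 'b',"));
}
```

Because UTF-8 can be decoded backwards, only the final code point has to be examined here, not the whole string.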
For those who want even more Unicode correctness, it should be possible to operate on graphemes (http://unicode.org/glossary/#grapheme):
import std.range, std.uni, std.stdio;
auto range = "abcde".byGrapheme.retro.drop(1).retro();
writeln(range);
Sadly, it looks like this specific pattern is not currently supported because of a bug in Phobos. I have created an issue about it: https://issues.dlang.org/show_bug.cgi?id=14394

NOTE: Updated my answer to be a bit cleaner and removed the lambda function in 'map!' as it was a little ugly.
import std.algorithm, std.stdio;
import std.string;
void main(){
string fcontent = "I am a test\nFile\nwith some,\nCommas here and\nthere,\n";
auto data = fcontent
.splitLines
.map!(a => a.replaceLast(","))
.join("\n");
writefln("%s", data);
}
auto replaceLast(string line, string toReplace){
auto o = line.lastIndexOf(toReplace);
return o >= 0 ? line[0..o] : line;
}

module main;
import std.stdio : writeln;
import std.string : lineSplitter, join;
import std.algorithm : map, splitter, each;
enum fcontent = "some text\r\nnext line\r\n";
void main()
{
fcontent.lineSplitter.map!(a=>a.splitter(' ')
.map!(b=>"'" ~ b ~ "'")
.join(", "))
.each!writeln;
}

Take a look, I use this extension method to replace any last character or sub-string, for example:
string testStr = "Happy holiday!";
Console.Write(testStr.ReplaceVeryLast("holiday!", "Easter!"));
public static class StringExtensions
{
public static string ReplaceVeryLast(this string sStr, string sSearch, string sReplace = "")
{
int pos = 0;
sStr = sStr.Trim();
do
{
pos = sStr.LastIndexOf(sSearch, StringComparison.CurrentCultureIgnoreCase);
if (pos >= 0 && pos + sSearch.Length == sStr.Length)
sStr = sStr.Substring(0, pos) + sReplace;
} while (pos == (sStr.Length - sSearch.Length + 1));
return sStr;
}
}

Defining a custom PURE Swift Character Set

So, using Foundation you can use NSCharacterSet to define character sets and test character membership in Strings. I would like to do so without Cocoa classes, but in a purely Swift manner.
Ideally, code could be used like so:
struct ReservedCharacters: CharacterSet {
characters "!", "#", "$", "&", ... etc.
func isMember(character: Character) -> Bool
func encodeCharacter(parameters) { accepts a closure }
func decodeCharacter(parameters) { accepts a closure }
}
This is probably a very loaded question. But I'd like to see what you Swifters think.

You can already test for membership in a character set by initializing a String and using the contains global function:
let vowels = "aeiou"
let isVowel = contains(vowels, "i") // isVowel == true
As far as your encode and decode functions go, are you just trying to get the 8-bit or 16-bit encodings for the Character? If so, convert it to a String and access its utf8 or utf16 properties:
let char = Character("c")
let a = Array(String(char).utf8)
println(a) // This prints [99]
Decode would take a little more work, but I know there's a function for it...
Edit: This will replace a character from a characterSet with '%' followed by the character's hex value:
let encode: String -> String = { s in
reduce(String(s).unicodeScalars, "") { x, y in
switch contains(charSet, Character(y)) {
case true:
return x + "%" + String(y.value, radix: 16)
default:
return x + String(y)
}
}
}
let badURL = "http://why won't this work.com"
let encoded = encode(badURL)
println(encoded) // prints "http://why%20won%27t%20this%20work.com"
Decoding, again, is a bit more challenging, but I'm sure it can be done...
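For the decoding direction that the answer leaves open, one possible shape is sketched below, in Rust rather than Swift. The function names `encode` and `decode` and the reserved set are assumptions for illustration. Unlike the Swift snippet, this version escapes each UTF-8 byte of a reserved character as `%XX`, and it only round-trips if `%` itself is in the reserved set whenever the input may contain it.

```rust
/// Percent-encodes every character of `input` that appears in `reserved`,
/// writing each of its UTF-8 bytes as %XX (uppercase hex).
fn encode(input: &str, reserved: &str) -> String {
    let mut out = String::with_capacity(input.len());
    let mut buf = [0u8; 4];
    for ch in input.chars() {
        if reserved.contains(ch) {
            for byte in ch.encode_utf8(&mut buf).bytes() {
                out.push_str(&format!("%{:02X}", byte));
            }
        } else {
            out.push(ch);
        }
    }
    out
}

/// Reverses `encode`. Returns None for a truncated or non-hex escape,
/// or if the decoded bytes are not valid UTF-8.
fn decode(input: &str) -> Option<String> {
    let bytes = input.as_bytes();
    let mut out = Vec::with_capacity(bytes.len());
    let mut i = 0;
    while i < bytes.len() {
        if bytes[i] == b'%' {
            // `get` yields None when fewer than two bytes follow or the
            // range does not fall on character boundaries.
            let hex = input.get(i + 1..i + 3)?;
            if !hex.bytes().all(|b| b.is_ascii_hexdigit()) {
                return None;
            }
            out.push(u8::from_str_radix(hex, 16).ok()?);
            i += 3;
        } else {
            out.push(bytes[i]);
            i += 1;
        }
    }
    String::from_utf8(out).ok()
}

fn main() {
    let bad_url = "http://why won't this work.com";
    let encoded = encode(bad_url, " '"); // assumed reserved set: space and apostrophe
    println!("{}", encoded);
    println!("{:?}", decode(&encoded));
}
```

Collecting the decoded bytes first and validating them as UTF-8 at the end lets multi-byte characters that were split across several escapes be reassembled.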
