How can I modify individual characters in a String using Swift?


var str: String = "sometext"
for i in str.characters.indices
{
str[i] = "c"
}
print(str)
I'm getting the following error:
error: cannot assign through subscript: subscript is get-only

You are getting this error because the subscript of a Swift String is get-only, as the error message says.
This is different from an Array.
Array:
array[0] ✅
array[0] = 0 ✅
String:
str[0] ❌
str[0] = "0" ❌
str[str.startIndex.advancedBy(0)] ✅
Use replaceRange for accomplishing your task.
Example:
var value = "green red blue"
value.replaceRange(value.startIndex.advancedBy(
6)..<value.startIndex.advancedBy(6 + 3),
with: "yellow")
print(value)
Result:
green yellow blue
Also have a look at this superb blog article by Ole Begemann, who explains in great detail how Swift Strings work. It also answers why you can't use integer subscripts on Swift Strings.
Because of the way Swift strings are stored, the String type does not support random access to its Characters via an integer index — there is no direct equivalent to NSStringʼs characterAtIndex: method. Conceptually, a String can be seen as a doubly linked list of characters rather than an array.
Article Link

In some cases it may be preferable to convert the String to an Array, mutate, then convert back to a String, e.g.:
var chars = Array("sometext".characters)
for i in 0..<chars.count {
chars[i] = "c"
}
let string = String(chars)
Advantages include:
clarity
better performance on large strings: O(1) time for making each replacement in Array vs O(N) time for making each replacement in String.
Disadvantages include:
higher memory consumption: O(N) for Array vs O(1) for String.
Pick your poison :)

Related

Remove part of string (regular expressions)

I am a beginner in programming. I have a string for example "test:1" and "test:2". And I want to remove ":1" and ":2" (including :). How can I do it using regular expression?

Hi Andrew, it's pretty easy. Think of a string as an array of chars (letters), because that is essentially what it is. If the part of the string you want to delete is always at the end of the string and always the same length, it goes like this:
var exampleString = 'test:1';
exampleString = exampleString.substring(0, exampleString.length() - 2); // strings are immutable, so build a shorter copy
That's it, you just deleted the last two values (letters) of the string (char array).
If you can't be sure that the part to delete is always at the end, or how many chars it contains, you'd have to use Szymon's version.

There are at least a few ways to do it with Groovy. If you want to stick to regular expression, you can apply expression ^([^:]+) (which means all characters from the beginning of the string until reaching :) to a StringGroovyMethods.find(regexp) method, e.g.
def str = "test:1".find(/^([^:]+)/)
assert str == 'test'
Alternatively you can use good old String.split(String delimiter) method:
def str = "test:1".split(':')[0]
assert str == 'test'

What is the difference between strings and characters in Matlab?

What is the difference between string and character class in MATLAB?
a = 'AX'; % This is a character.
b = string(a) % This is a string.

The documentation suggests:
There are two ways to represent text in MATLAB®. You can store text in character arrays. A typical use is to store short pieces of text as character vectors. And starting in Release 2016b, you can also store multiple pieces of text in string arrays. String arrays provide a set of functions for working with text as data.
This is how the two representations differ:
Element access. To represent char vectors of different length, one had to use cell arrays, e.g. ch = {'a', 'ab', 'abc'}. With strings, they can be created in actual arrays: str = [string('a'), string('ab'), string('abc')].
However, to index characters in a string array directly, the curly bracket notation has to be used:
str{3}(2) % == 'b'
Memory use. Chars use exactly two bytes per character. Strings have overhead:
a = 'abc'
b = string('abc')
whos a b
returns
  Name      Size      Bytes  Class     Attributes
  a         1x3           6  char
  b         1x1         132  string

The best place to start for understanding the difference is the documentation. The key difference, as stated there:
A character array is a sequence of characters, just as a numeric array is a sequence of numbers. A typical use is to store short pieces of text as character vectors, such as c = 'Hello World';.
A string array is a container for pieces of text. String arrays provide a set of functions for working with text as data. To convert text to string arrays, use the string function.
Here are a few more key points about their differences:
They are different classes (i.e. types): char versus string. As such they will have different sets of methods defined for each. Think about what sort of operations you want to do on your text, then choose the one that best supports those.
Since a string is a container class, be mindful of how its size differs from an equivalent character array representation. Using your example:
>> a = 'AX'; % This is a character.
>> b = string(a) % This is a string.
>> whos
  Name      Size      Bytes  Class     Attributes
  a         1x2           4  char
  b         1x1         134  string
Notice that the string container lists its size as 1x1 (and takes up more bytes in memory) while the character array is, as its name implies, a 1x2 array of characters.
They can't always be used interchangeably, and you may need to convert between the two for certain operations. For example, string objects can't be used as dynamic field names for structure indexing:
>> s = struct('a', 1);
>> name = string('a');
>> s.(name)
Argument to dynamic structure reference must evaluate to a valid field name.
>> s.(char(name))
ans =
1

Strings do have a bit of overhead, but their size still grows by about 2 bytes per character. The size of the variable increases in steps, after every 8 characters. The red line is y = 2x + 127.
The figure is created using:
v=[];N=100;
for ct = 1:N
s=char(randi([0 255],[1,ct]));
s=string(s);
a=whos('s');v(ct)=a.bytes;
end
figure(1);clf
plot(v)
xlabel('# characters')
ylabel('# bytes')
p=polyfit(1:N,v,1);
hold on
plot([0,N],[127,2*N+127],'r')
hold off

One important practical thing to note is that strings and chars behave differently when interacting with square brackets. This can be especially confusing when coming from Python. Consider the following example:
>>['asdf' '123']
ans =
'asdf123'
>> ["asdf" "123"]
ans =
1×2 string array
"asdf" "123"

[] operator for strings, link with slices for vectors

Why do you have to walk over the string to find the nᵗʰ letter of a string when you do s[n] where s is a string. (According to https://doc.rust-lang.org/book/strings.html)
From what I understood, a string is an array of chars, and a char is an array of 4 bytes or a 4-byte number. So would getting the nth letter be similar to doing v[4*n..4*n+4], where v is a vector?
What is the cost of v[i..j] ?
I would assume that the cost of v[i..j] is j-i and so that the cost of s[n] should be 4.

Note: The second edition of The Rust Programming Language has an improved and smooth explanation to Strings in Rust, which you might wish to read as well. The answer below, although still accurate, quotes from the first edition of the book.
I will try to clarify these misconceptions about strings in Rust by quoting from the book (https://doc.rust-lang.org/book/strings.html).
A ‘string’ is a sequence of Unicode scalar values encoded as a stream of UTF-8 bytes. All strings are guaranteed to be a valid encoding of UTF-8 sequences.
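To make the quoted definition concrete, here is a minimal sketch of how many bytes individual characters take once they are encoded as UTF-8. The helper name `utf8_widths` and the sample text are illustrative assumptions, not taken from the book. It suggests why the `4*n` arithmetic from the question does not apply: a `char` value is 4 bytes wide in memory, but inside a string each one occupies between 1 and 4 bytes.

```rust
use std::mem::size_of;

/// Hypothetical helper: number of bytes each `char` of `s` occupies in UTF-8.
fn utf8_widths(s: &str) -> Vec<usize> {
    s.chars().map(|c| c.len_utf8()).collect()
}

fn main() {
    // A standalone `char` value always occupies 4 bytes in memory...
    assert_eq!(size_of::<char>(), 4);

    // ...but inside a `str` each one is stored as 1 to 4 UTF-8 bytes.
    let s = "a£忠🙃";
    assert_eq!(utf8_widths(s), vec![1, 2, 3, 4]);

    // So the byte length is not 4 * (number of chars).
    assert_eq!(s.chars().count(), 4);
    assert_eq!(s.len(), 10);
    println!("{:?}", utf8_widths(s));
}
```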
With this in mind, plus the fact that code points encoded in UTF-8 are variably sized (1 to 4 bytes depending on the character), all strings in Rust, whether they are &str or String, are not arrays of characters, and cannot be treated as such. The reason is explained further in the section on slicing:
Because strings are valid UTF-8, they do not support indexing:
let s = "hello";
println!("The first letter of s is {}", s[0]); // ERROR!!!
Usually, access to a vector with [] is very fast. But, because each character in a UTF-8 encoded string can be multiple bytes, you have to walk over the string to find the nᵗʰ letter of a string. This is a significantly more expensive operation, and we don’t want to be misleading.
Unlike what was mentioned in the question, one cannot do s[n], because although in theory this would allow us to fetch the nth byte in constant time, that byte is not guaranteed to make any sense on its own.
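As a rough illustration of the walk described in the quote, the sketch below looks up the nth `char` together with the byte offset where it starts. The function name `nth_char_with_offset` is made up for this example; it only relies on the standard `char_indices` iterator, which has to scan from the start of the string, so the lookup is linear in `n` rather than constant.

```rust
/// Hypothetical helper: finds the n-th `char` (0-based) of `s` and the byte
/// offset where it starts. It scans from the beginning, so it takes O(n) time.
fn nth_char_with_offset(s: &str, n: usize) -> Option<(usize, char)> {
    s.char_indices().nth(n)
}

fn main() {
    let s = "£10 🙃!";
    assert_eq!(nth_char_with_offset(s, 0), Some((0, '£')));
    // '£' takes two bytes, so the second char starts at byte offset 2.
    assert_eq!(nth_char_with_offset(s, 1), Some((2, '1')));
    assert_eq!(nth_char_with_offset(s, 5), Some((9, '!')));
    assert_eq!(nth_char_with_offset(s, 6), None);

    // The byte at index 1 is only the second half of '£' (0xC2 0xA3),
    // which is why a single byte may be meaningless on its own.
    assert_eq!(s.as_bytes()[1], 0xA3);
}
```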
What is the cost of v[i..j] ?
The cost of slicing is actually constant, because it is done at byte-level:
You can get a slice of a string with slicing syntax:
let dog = "hachiko";
let hachi = &dog[0..5];
But note that these are byte offsets, not character offsets. So this will fail at runtime:
let dog = "忠犬ハチ公";
let hachi = &dog[0..2];
with this error:
thread '<main>' panicked at 'index 0 and/or 2 in `忠犬ハチ公` do not lie on character boundary'
Basically, slicing is acceptable and will yield a new view of that string, so no copies are made. However, it should only be used when you are completely sure that the offsets are right in terms of character boundaries.
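When the offsets are not known to be valid, one option is to check them rather than risk a panic. The sketch below uses the standard `str::get` and `str::is_char_boundary` methods; the wrapper name `checked_slice` is an assumption made for this example.

```rust
/// Hypothetical helper: returns `s[start..end]` (byte offsets), or `None` if
/// the range is out of bounds or cuts through a multi-byte character.
fn checked_slice(s: &str, start: usize, end: usize) -> Option<&str> {
    s.get(start..end)
}

fn main() {
    let dog = "忠犬ハチ公"; // each of these characters takes 3 bytes in UTF-8
    assert_eq!(dog.len(), 15);

    // Offset 2 falls in the middle of '忠'; offset 3 is where '犬' starts.
    assert!(!dog.is_char_boundary(2));
    assert!(dog.is_char_boundary(3));

    // No panic: the invalid range simply yields `None`.
    assert_eq!(checked_slice(dog, 0, 2), None);
    assert_eq!(checked_slice(dog, 0, 6), Some("忠犬"));
    assert_eq!(checked_slice("hachiko", 0, 5), Some("hachi"));
}
```

Like `&dog[0..5]`, a successful `get` returns a view into the original string, so no copy is made.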
In order to iterate over each character of a string, you may instead call chars():
let c = s.chars().nth(n);
Even with that in mind, note that handling Unicode characters one scalar value at a time might not be exactly what you want if you need to deal with character modifiers (which are scalar values by themselves but should not be treated individually either). Quoting now from the str API:
fn chars(&self) -> Chars
Returns an iterator over the chars of a string slice.
As a string slice consists of valid UTF-8, we can iterate through a string slice by char. This method returns such an iterator.
It's important to remember that char represents a Unicode Scalar Value, and may not match your idea of what a 'character' is. Iteration over grapheme clusters may be what you actually want.
Remember, chars may not match your human intuition about characters:
let y = "y̆";
let mut chars = y.chars();
assert_eq!(Some('y'), chars.next()); // not 'y̆'
assert_eq!(Some('\u{0306}'), chars.next());
assert_eq!(None, chars.next());
The unicode_segmentation crate provides a means to define grapheme cluster boundaries:
extern crate unicode_segmentation;
use unicode_segmentation::UnicodeSegmentation;
let s = "a̐éö̲\r\n";
let g = UnicodeSegmentation::graphemes(s, true).collect::<Vec<&str>>();
let b: &[_] = &["a̐", "é", "ö̲", "\r\n"];
assert_eq!(g, b);

If you do want to treat the string as an array of codepoints (which isn't strictly the same as characters; there are combining marks, emoji with separate skin-tone modifiers, etc.), you can collect it into a Vec:
fn main() {
let s = "£10 🙃!";
for (i,c) in s.char_indices() {
println!("{} {}", i, c);
}
let v: Vec<char> = s.chars().collect();
println!("v[5] = {}", v[5]);
}
With bonus demonstration of some varying character widths, this outputs:
0 £
2 1
3 0
4
5 🙃
9 !
v[5] = !
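Building on the same idea, a `Vec<char>` can also be used to change a single code point and then rebuild the string. This is only a sketch: the function name `replace_char_at` is hypothetical, the index counts `char`s rather than bytes, and the caveat about combining marks and modifiers still applies.

```rust
/// Hypothetical helper: replaces the `char` at position `index` (counted in
/// chars, not bytes). Returns `None` if `index` is out of range.
fn replace_char_at(s: &str, index: usize, replacement: char) -> Option<String> {
    // Collecting costs O(n) time and 4 bytes per char, but gives O(1) indexing.
    let mut chars: Vec<char> = s.chars().collect();
    let slot = chars.get_mut(index)?;
    *slot = replacement;
    // Re-encode the code points as a UTF-8 `String`.
    Some(chars.into_iter().collect())
}

fn main() {
    // Index 4 is the emoji, even though it starts at byte offset 5.
    assert_eq!(replace_char_at("£10 🙃!", 4, '?'), Some("£10 ?!".to_string()));
    assert_eq!(replace_char_at("abc", 3, 'x'), None);
}
```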

Swift: Test boundary of String.Index for substring function

Wow. Swift makes it really fiddly to copy a substring from a simple String.
Most programming languages allow characters to be simply indexed by their integer position in the string, making targeting a character or a range a matter of simple maths. Because Swift allows a wide range of characters to be used with various bit depths, a precise (memory?) index for each character has to be found first, based on its position from the start or end of the string. These positions can then be passed into a method of the String class that returns the substring in the range. I've written a function to do the work:
//arguments: The parent string, number of chars from 1st char in it and total char length of substring
func subStr(str: String, c1: Int, c2: Int) -> String {
//get string indexes for range of substring based on args
let ind1 = str.startIndex.advancedBy(c1)
let ind2 = str.startIndex.advancedBy(c1+c2)
//calls substring function with a range object passed as an argument set to the index values
let sub = str.substringWithRange(Range<String.Index>(start: ind1, end: ind2))
//substring returned
return sub
}
The problem is that because the substringWithRange function only works with Range objects, it's not obvious how to check whether the substring is out of bounds of the string. For example, calling this function would produce an error:
subStr ("Stack Overflow", c1: 6, c2: 12)
The c1 argument is OK but the length of the (c2) substring exceeds the upper boundary of the parent string causing a fatal error.
How do I refine this function to make it handle bad arguments or otherwise refine this clunky process?
Many thanks for reading.

You can use
let ind1 = str.startIndex.advancedBy(c1, limit: str.endIndex)
let ind2 = str.startIndex.advancedBy(c1+c2, limit: str.endIndex)
to advance the start index by the given amounts, but not beyond the end index of the string. With that modification, your function gives
subStr ("Stack Overflow", c1: 6, c2: 12) // "Overflow"
subStr ("Stack Overflow", c1: 12, c2: 20) // ""

I quickly made a nice substring function for Strings without Foundation dependency:
extension String {
func substring(from: Int, length: Int) -> String {
return String(dropFirst(from).prefix(length))
}
}
If the length is bigger than possible, it just gives all the characters to the end. It can't crash as long as neither argument is negative (it makes sense to crash if any argument is negative, since that would be a major flaw in your code).
Example usage:
"Stack Overflow".substring(from: 6, length: 12)
gives
"Overflow"

Converting Character and CodePoint in Swift

Can I convert directly between a Swift Character and its Unicode numeric value? That is:
var i:Int = ... // A plain integer index.
var myCodeUnit:UInt16 = myString.utf16[i]
// Would like to say myChar = myCodeUnit as Character, or equivalent.
or...
var j:String.Index = ... // NOT an integer!
var myChar:Character = myString[j]
// Would like to say myCodeUnit = myChar as UInt16
I can say:
myCodeUnit = String(myChar).utf16[0]
but this means creating a new String for each character. And I am doing this thousands of times (parsing text) so that is a lot of new Strings that are immediately being discarded.

The type Character represents a "Unicode grapheme cluster", which can be multiple Unicode codepoints. If you want one Unicode codepoint, you should use the type UnicodeScalar instead.

As per the Swift book:
String to Code Unit
To get codeunit/ordinals for each character of the String, you can do the following:
var yourSwiftString = "甲乙丙丁"
for scalar in yourSwiftString.unicodeScalars {
print("\(scalar.value) ")
}
Code Unit to String
Because Swift currently does not have a way to convert ordinals/code units back to UTF, the best way I found is to still use NSString, i.e. if you have int ordinals (32-bit, but representing the 21-bit code points) you can use the following to convert to Unicode:
var i = 22247
var unicode_str = NSString(bytes: &i, length: 4, encoding: NSUTF32LittleEndianStringEncoding)
Obviously, if you want to convert an array of ints, you'll need to pack them into an array first.

I spoke to an Apple engineer who is working on Unicode, and he says they have not completed the implementation of Unicode characters in strings. Are you looking at getting a code unit or a full character? Because the only proper way to get at a full Unicode character is by using a for-each loop on a string, i.e.
for c in "hello" {
// c is a unicode character of type Character
}
But, this is not implemented as of yet.
