Edit string in place with a function - rust

I am trying to edit a string in place by passing it to mutate(), see below.
Simplified example:
fn mutate(string: &mut &str) -> &str {
string[0] = 'a'; // mutate string
string
}
fn do_something(string: &str) {
println!("{}", string);
}
fn main() {
let string = "Hello, world!";
loop {
string = mutate(&mut string);
do_something(string);
}
}
But I get the following compilation error:
main.rs:1:33: 1:37 error: missing lifetime specifier [E0106]
main.rs:1 fn mutate(string: &mut &str) -> &str {
^~~~
main.rs:1:33: 1:37 help: this function's return type contains a borrowed value, but the signature does not say which one of `string`'s 2 elided lifetimes it is borrowed from
main.rs:1 fn mutate(string: &mut &str) -> &str {
^~~~
Why do I get this error and how can I achieve what I want?

You can't change a string slice at all. &mut &str is not an appropriate type anyway, because it literally is a mutable pointer to an immutable slice. And all string slices are immutable.
In Rust strings are valid UTF-8 sequences, and UTF-8 is a variable-width encoding. Consequently, in general changing a character may change the length of the string in bytes. This can't be done with slices (because they always have fixed length) and it may cause reallocation for owned strings. Moreover, in 99% of cases changing a character inside a string is not what you really want.
In order to do what you want with unicode code points you need to do something like this:
fn replace_char_at(s: &str, idx: uint, c: char) -> String {
let mut r = String::with_capacity(s.len());
for (i, d) in s.char_indices() {
r.push(if i == idx { c } else { d });
}
r
}
However, this has O(n) efficiency because it has to iterate through the original slice, and it also won't work correctly with complex characters - it may replace a letter but leave an accent or vice versa.
More correct way for text processing is to iterate through grapheme clusters, it will take diacritics and other similar things correctly (mostly):
fn replace_grapheme_at(s: &str, idx: uint, c: &str) -> String {
let mut r = String::with_capacity(s.len());
for (i, g) in s.grapheme_indices(true) {
r.push_str(if i == idx { c } else { g });
}
r
}
There is also some support for pure ASCII strings in std::ascii module, but it is likely to be reformed soon. Anyway, that's how it could be used:
fn replace_ascii_char_at(s: String, idx: uint, c: char) -> String {
let mut ascii_s = s.into_ascii();
ascii_s[idx] = c.to_ascii();
String::from_utf8(ascii_s.into_bytes()).unwrap()
}
It will panic if either s contains non-ASCII characters or c is not an ASCII character.

Related

Temporarily cache owned value between iterator adapters

I'd like to know if there's a way to cache an owned value between iterator adapters, so that adapters later in the chain can reference it.
(Or if there's another way to allow later adapters to reference an owned value that lives inside the iterator chain.)
To illustrate what I mean, let's look at this (contrived) example:
I have a function that returns a String, which is called in an Iterator map() adapter, yielding an iterator over Strings. I'd like to get an iterator over the chars() in those Strings, but the chars() method requires a string slice, meaning a reference.
Is this possible to do, without first collecting the Strings?
Here's a minimal example that of course fails:
fn greet(c: &str) -> String {
"Hello, ".to_owned() + c
}
fn main() {
let names = ["Martin", "Helena", "Ingrid", "Joseph"];
let iterator = names.into_iter().map(greet);
let fails = iterator.flat_map(<str>::chars);
}
Playground
Using a closure instead of <str>::chars - |s| s.chars() - does of course not work either. It makes the types match, but breaks lifetimes.
Edit (2022-10-03): In response to the comments, here's some pseudocode of what I have in mind, but with incorrect lifetimes:
struct IteratorCache<'a, T, I>{
item : Option<T>,
inner : I,
_p : core::marker::PhantomData<&'a T>
}
impl<'a, T, I> Iterator for IteratorCache<'a, T,I>
where I: Iterator<Item=T>
{
type Item=&'a T;
fn next(&mut self) -> Option<&'a T> {
self.item = self.inner.next();
if let Some(x) = &self.item {
Some(&x)
} else {
None
}
}
}
The idea would be that the reference could stay valid until the next call to next(). However I don't know if this can be expressed with the function signature of the Iterator trait. (Or if this can be expressed at all.)
I don't think something like this exists yet, and collecting into a Vec<char> creates some overhead, but you can write such an iterator yourself with a little bit of trickery:
struct OwnedCharsIter {
s: String,
index: usize,
}
impl OwnedCharsIter {
pub fn new(s: String) -> Self {
Self { s, index: 0 }
}
}
impl Iterator for OwnedCharsIter {
type Item = char;
fn next(&mut self) -> Option<Self::Item> {
// Slice of leftover characters
let slice = &self.s[self.index..];
// Iterator over leftover characters
let mut chars = slice.chars();
// Query the next char
let next_char = chars.next()?;
// Compute the new index by looking at how many bytes are left
// after querying the next char
self.index = self.s.len() - chars.as_str().len();
// Return next char
Some(next_char)
}
}
fn greet(c: &str) -> String {
"Hello, ".to_owned() + c
}
fn main() {
let names = ["Martin", "Helena", "Ingrid", "Joseph"];
let iterator = names.into_iter().map(greet);
let chars_iter = iterator.flat_map(OwnedCharsIter::new);
println!("{:?}", chars_iter.collect::<String>())
}
"Hello, MartinHello, HelenaHello, IngridHello, Joseph"

How do I change characters at a specific index within a string in rust?

I am trying to change a single character at a specific index in a string, but I do not know how to in rust. For example, how would I change the 4th character in "hello world" to 'x', so that it would be "helxo world"?
The easiest way is to use the replace_range() method like this:
let mut hello = String::from("hello world");
hello.replace_range(3..4,"x");
println!("hello: {}", hello);
Output: hello: helxo world (Playground)
Please note that this will panic if the range to be replaced does not start and end on UTF-8 codepoint boundaries. E.g. this will panic:
let mut hello2 = String::from("hell😀 world");
hello2.replace_range(4..5,"x"); // panics because 😀 needs more than one byte in UTF-8
If you want to replace the nth UTF-8 code point, you have to do something like this:
pub fn main() {
let mut hello = String::from("hell😀 world");
hello.replace_range(
hello
.char_indices()
.nth(4)
.map(|(pos, ch)| (pos..pos + ch.len_utf8()))
.unwrap(),
"x",
);
println!("hello: {}", hello);
}
(Playground)
The standard way of representing a string in Rust is as a contiguous range of bytes encoded as a UTF-8 string. UTF-8 codepoints can be from one to 4 bytes long, so generally you can't simply replace one UTF-8 codepoint with another because the length might change. You also can't do simple pointer arithmetic to index into a Rust String to the nth character, because again codepoint encodings can be from 1 to 4 bytes long.
So one safe but slow way to do it would be like this, iterating through the characters of the source string, replacing the one you want, then creating a new string:
fn replace_nth_char(s: &str, idx: usize, newchar: char) -> String {
s.chars().enumerate().map(|(i,c)| if i == idx { newchar } else { c }).collect()
}
But we can do it in O(1) if we manually make sure the old and new character are single-byte ascii.
fn replace_nth_char_safe(s: &str, idx: usize, newchar: char) -> String {
s.chars().enumerate().map(|(i,c)| if i == idx { newchar } else { c }).collect()
}
fn replace_nth_char_ascii(s: &mut str, idx: usize, newchar: char) {
let s_bytes: &mut [u8] = unsafe { s.as_bytes_mut() };
assert!(idx < s_bytes.len());
assert!(s_bytes[idx].is_ascii());
assert!(newchar.is_ascii());
// we've made sure this is safe.
s_bytes[idx] = newchar as u8;
}
fn main() {
let s = replace_nth_char_safe("Hello, world!", 3, 'x');
assert_eq!(s, "Helxo, world!");
let mut s = String::from("Hello, world!");
replace_nth_char_ascii(&mut s, 3, 'x');
assert_eq!(s, "Helxo, world!");
}
Keep in mind that idx parameter in replace_nth_char_ascii is not a character index, but instead a byte index. If there are any multibyte characters earlier in the string, then the byte index and the character index will not correspond.

How to "crop" characters off the beginning of a string in Rust?

I want a function that can take two arguments (string, number of letters to crop off front) and return the same string except with the letters before character x gone.
If I write
let mut example = "stringofletters";
CropLetters(example, 3);
println!("{}", example);
then the output should be:
ingofletters
Is there any way I can do this?
In many uses it would make sense to simply return a slice of the input, avoiding any copy. Converting #Shepmaster's solution to use immutable slices:
fn crop_letters(s: &str, pos: usize) -> &str {
match s.char_indices().skip(pos).next() {
Some((pos, _)) => &s[pos..],
None => "",
}
}
fn main() {
let example = "stringofletters"; // works with a String if you take a reference
let cropped = crop_letters(example, 3);
println!("{}", cropped);
}
Advantages over the mutating version are:
No copy is needed. You can call cropped.to_string() if you want a newly allocated result; but you don't have to.
It works with static string slices as well as mutable String etc.
The disadvantage is that if you really do have a mutable string you want to modify, it would be slightly less efficient as you'd need to allocate a new String.
Issues with your original code:
Functions use snake_case, types and traits use CamelCase.
"foo" is a string literal of type &str. These may not be changed. You will need something that has been heap-allocated, such as a String.
The call crop_letters(stringofletters, 3) would transfer ownership of stringofletters to the method, which means you wouldn't be able to use the variable anymore. You must pass in a mutable reference (&mut).
Rust strings are not ASCII, they are UTF-8. You need to figure out how many bytes each character requires. char_indices is a good tool here.
You need to handle the case of when the string is shorter than 3 characters.
Once you have the byte position of the new beginning of the string, you can use drain to move a chunk of bytes out of the string. We just drop these bytes and let the String move over the remaining bytes.
fn crop_letters(s: &mut String, pos: usize) {
match s.char_indices().nth(pos) {
Some((pos, _)) => {
s.drain(..pos);
}
None => {
s.clear();
}
}
}
fn main() {
let mut example = String::from("stringofletters");
crop_letters(&mut example, 3);
assert_eq!("ingofletters", example);
}
See Chris Emerson's answer if you don't actually need to modify the original String.
I found this answer which I don't consider really idiomatic:
fn crop_with_allocation(string: &str, len: usize) -> String {
string.chars().skip(len).collect()
}
fn crop_without_allocation(string: &str, len: usize) -> &str {
// optional length check
if string.len() < len {
return &"";
}
&string[len..]
}
fn main() {
let example = "stringofletters"; // works with a String if you take a reference
let cropped = crop_with_allocation(example, 3);
println!("{}", cropped);
let cropped = crop_without_allocation(example, 3);
println!("{}", cropped);
}
my version
fn crop_str(s: &str, n: usize) -> &str {
let mut it = s.chars();
for _ in 0..n {
it.next();
}
it.as_str()
}
#[test]
fn test_crop_str() {
assert_eq!(crop_str("123", 1), "23");
assert_eq!(crop_str("ЖФ1", 1), "Ф1");
assert_eq!(crop_str("ЖФ1", 2), "1");
}

error: `line` does not live long enough but it's ok in playground

I can't figure it out why my local var line does not live long enough. You can see bellow my code. It work on the Rust's playground.
I may have an idea of the issue: I use a structure (load is a function of this structure). As I want to store the result of the line in a member of my struct, it could be the issue. But I don't see what should I do to resolve this problem.
pub struct Config<'a> {
file: &'a str,
params: HashMap<&'a str, &'a str>
}
impl<'a> Config<'a> {
pub fn new(file: &str) -> Config {
Config { file: file, params: HashMap::new() }
}
pub fn load(&mut self) -> () {
let f = match fs::File::open(self.file) {
Ok(e) => e,
Err(e) => {
println!("Failed to load {}, {}", self.file, e);
return;
}
};
let mut reader = io::BufReader::new(f);
let mut buffer = String::new();
loop {
let result = reader.read_line(&mut buffer);
if result.is_ok() && result.ok().unwrap() > 0 {
let line: Vec<String> = buffer.split("=").map(String::from).collect();
let key = line[0].trim();
let value = line[1].trim();
self.params.insert(key, value);
}
buffer.clear();
}
}
...
}
And I get this error:
src/conf.rs:33:27: 33:31 error: `line` does not live long enough
src/conf.rs:33 let key = line[0].trim();
^~~~
src/conf.rs:16:34: 41:6 note: reference must be valid for the lifetime 'a as defined on the block at 16:33...
src/conf.rs:16 pub fn load(&mut self) -> () {
src/conf.rs:17 let f = match fs::File::open(self.file) {
src/conf.rs:18 Ok(e) => e,
src/conf.rs:19 Err(e) => {
src/conf.rs:20 println!("Failed to load {}, {}", self.file, e);
src/conf.rs:21 return;
...
src/conf.rs:31:87: 37:14 note: ...but borrowed value is only valid for the block suffix following statement 0 at 31:86
src/conf.rs:31 let line: Vec<String> = buffer.split("=").map(String::from).collect();
src/conf.rs:32
src/conf.rs:33 let key = line[0].trim();
src/conf.rs:34 let value = line[1].trim();
src/conf.rs:35
src/conf.rs:36 self.params.insert(key, value);
...
There are three steps in realizing why this does not work.
let line: Vec<String> = buffer.split("=").map(String::from).collect();
let key = line[0].trim();
let value = line[1].trim();
self.params.insert(key, value);
line is a Vec of Strings, meaning the vector owns the strings its containing. An effect of this is that when the vector is freed from memory, the elements, the strings, are also freed.
If we look at string::trim here, we see that it takes and returns a &str. In other words, the function does not allocate anything, or transfer ownership - the string it returns is simply a slice of the original string. So if we were to free the original string, the trimmed string would not have valid data.
The signature of HashMap::insert is fn insert(&mut self, k: K, v: V) -> Option<V>. The function moves both the key and the value, because these needs to be valid for as long as they may be in the hashmap. We would like to give the hashmap the two strings. However, both key and value are just references to strings which is owned by the vector - we are just borrowing them - so we can't give them away.
The solution is simple: copy the strings after they have been split.
let line: Vec<String> = buffer.split("=").map(String::from).collect();
let key = line[0].trim().to_string();
let value = line[1].trim().to_string();
self.params.insert(key, value);
This will allocate two new strings, and copy the trimmed slices into the new strings.
We could have moved the string out of the vector(ie. with Vec::remove), if we didn't trim the strings afterwards; I was unable to find a easy way of trimming a string without allocating a new one.
In addition, as malbarbo mentions, we can avoid the extra allocation that is done with map(String::from), and the creation of the vector with collect(), by simply omitting them.
In this case you have to use String instead of &str. See this to understand the difference.
You can also eliminate the creation of the intermediate vector and use the iterator return by split direct
pub struct Config<'a> {
file: &'a str,
params: HashMap<String, String>
}
...
let mut line = buffer.split("=");
let key = line.next().unwrap().trim().to_string();
let value = line.next().unwrap().trim().to_string();

Iterate over a string, n elements at a time

I'm trying to iterate over a string, but iterating in slices of length n instead of iterator over every character. The following code accomplishes this manually, but is there a more functional way to do this?
fn main() {
let string = "AAABBBCCC";
let offset = 3;
for (i, _) in string.chars().enumerate() {
if i % offset == 0 {
println!("{}", &string[i..(i+offset)]);
}
}
}
I would use a combination of Peekable and Take:
fn main() {
let string = "AAABBBCCC";
let mut z = string.chars().peekable();
while z.peek().is_some() {
let chunk: String = z.by_ref().take(3).collect();
println!("{}", chunk);
}
}
In other cases, Itertools::chunks might do the trick:
extern crate itertools;
use itertools::Itertools;
fn main() {
let string = "AAABBBCCC";
for chunk in &string.chars().chunks(3) {
for c in chunk {
print!("{}", c);
}
println!();
}
}
Standard warning about splitting strings
Be aware of issues with bytes / characters / code points / graphemes whenever you start splitting strings. With anything more complicated than ASCII characters, one character is not one byte and string slicing operates on bytes! There is also the concept of Unicode code points, but multiple Unicode characters may combine to form what a human thinks of as a single character. This stuff is non-trivial.
If you actually just have ASCII data, it may be worth it to store it as such, perhaps in a Vec<u8>. At the very least, I'd create a newtype that wraps a &str and only exposes ASCII-safe method and validates that it is ASCII when created.
chunks() is not available for &str because it is not really well-defined on strings - do you want chunks with length in bytes, or characters, or grapheme clusters? If you know in advance that your string is in ASCII you can use the following code:
use std::str;
fn main() {
let string = "AAABBBCCC";
for chunk in str_chunks(string, 3) {
println!("{}", chunk);
}
}
fn str_chunks<'a>(s: &'a str, n: usize) -> Box<Iterator<Item=&'a str>+'a> {
Box::new(s.as_bytes().chunks(n).map(|c| str::from_utf8(c).unwrap()))
}
However, it will break immediately if your strings have non-ASCII characters inside them. I'm pretty sure that it is possible to implement an iterator which splits a string into chunks of code points or grapheme clusters - it is just there is no such thing in the standard library now.
You can always implement your own iterator. Of course that still requires quite some code, but it's not at the location where you are working with the string. Therefor your loop stays readable.
#![feature(collections)]
struct StringChunks<'a> {
s: &'a str,
step: usize,
n: usize,
}
impl<'a> StringChunks<'a> {
fn new(s: &'a str, step: usize) -> StringChunks<'a> {
StringChunks {
s: s,
step: step,
n: s.chars().count(),
}
}
}
impl<'a> Iterator for StringChunks<'a> {
type Item = &'a str;
fn next(&mut self) -> Option<&'a str> {
if self.step > self.n {
return None;
}
let ret = self.s.slice_chars(0, self.step);
self.s = self.s.slice_chars(self.step, self.n);
self.n -= self.step;
Some(ret)
}
}
fn main() {
let string = "AAABBBCCC";
for s in StringChunks::new(string, 3) {
println!("{}", s);
}
}
Note that this splits after n unicode chars. So graphemes or similar might end up split up.

Resources