Rust String concatenation [duplicate] - string

This question already has answers here:
How do I concatenate strings?
(9 answers)
Closed 7 years ago.
I started programming with Rust this week and I am having a lot of problems understanding how Strings work.
Right now, I am trying to do a simple program that prints a list of players appending their order(for learning purposes only).
let res : String = pl.name.chars().enumerate().fold(String::new(),|res,(i,ch)| -> String {
res+=format!("{} {}\n",i.to_string(),ch.to_string());
});
println!("{}", res);
This is my idea, I know I could just use a for loop but the objective is to understand the different Iterator functions.
So, my problem is that the String concatenation does not work.
Compiling prueba2 v0.1.0 (file:///home/pancho111203/projects/prueba2)
src/main.rs:27:13: 27:16 error: binary assignment operation `+=` cannot be applied to types `collections::string::String` and `collections::string::String` [E0368]
src/main.rs:27 res+=format!("{} {}\n",i.to_string(),ch.to_string());
^~~
error: aborting due to previous error
Could not compile `prueba2`.
I tried using &str but it is not possible to create them from i and ch values.

First, in Rust x += y is not overloadable, so += operator won't work for anything except basic numeric types. However, even if it worked for strings, it would be equivalent to x = x + y, like in the following:
res = res + format!("{} {}\n",i.to_string(),ch.to_string())
Even if this were allowed by the type system (it is not because String + String "overload" is not defined in Rust), this is still not how fold() operates. You want this:
res + &format!("{} {}\n", i, ch)
or, as a compilable example,
fn main(){
let x = "hello";
let res : String = x.chars().enumerate().fold(String::new(), |res, (i, ch)| {
res + &format!("{} {}\n", i, ch)
});
println!("{}", res);
}
When you perform a fold, you don't reassign the accumulator variable, you need to return the new value for it to be used on the next iteration, and this is exactly what res + format!(...) do.
Note that I've removed to_string() invocations because they are completely unnecessary - in fact, x.to_string() is equivalent to format!("{}", x), so you only perform unnecessary allocations here.
Additionally, I'm taking format!() result by reference: &format!(...). This is necessary because + "overload" for strings is defined for String + &str pair of types, so you need to convert from String (the result of format!()) to &str, and this can be done simply by using & here (because of deref coercion).
In fact, the following would be more efficient:
use std::fmt::Write;
fn main(){
let x = "hello";
let res: String = x.chars().enumerate().fold(String::new(), |mut res, (i, ch)| {
write!(&mut res, "{} {}\n", i, ch).unwrap();
res
});
println!("{}", res);
}
which could be written more idiomatically as
use std::fmt::Write;
fn main(){
let x = "hello";
let mut res = String::new();
for (i, ch) in x.chars().enumerate() {
write!(&mut res, "{} {}\n", i, ch).unwrap();
}
println!("{}", res);
}
(try it on playpen)
This way no extra allocations (i.e. new strings from format!()) are created. We just fill the string with the new data, very similar, for example, to how StringBuilder in Java works. use std::fmt::Write here is needed to allow calling write!() on &mut String.
I would also suggest reading the chapter on strings in the official Rust book (and the book as a whole if you're new to Rust). It explains what String and &str are, how they are different and how to work with them efficiently.

Related

Join iterator of &str [duplicate]

This question already has answers here:
What's an idiomatic way to print an iterator separated by spaces in Rust?
(4 answers)
Closed 3 years ago.
How do I convert an Iterator<&str> to a String, interspersed with a constant string such as "\n"?
For instance, given:
let xs = vec!["first", "second", "third"];
let it = xs.iter();
One may produce a string s by collecting into a Vec<&str> and joining the result:
let s = it
.map(|&x| x)
.collect::<Vec<&str>>()
.join("\n");
However, this unnecessarily allocates memory for a Vec<&str>.
Is there a more direct method?
You could use the itertools crate for that. I use the intersperse helper in the example, it is pretty much the join equivalent for iterators.
cloned() is needed to convert &&str items to &str items, it is not doing any allocations. It can be eventually replaced by copied() when rust#1.36 gets a stable release.
use itertools::Itertools; // 0.8.0
fn main() {
let words = ["alpha", "beta", "gamma"];
let merged: String = words.iter().cloned().intersperse(", ").collect();
assert_eq!(merged, "alpha, beta, gamma");
}
Playground
You can do it by using fold function of the iterator easily:
let s = it.fold(String::new(), |a, b| a + b + "\n");
The Full Code will be like following:
fn main() {
let xs = vec!["first", "second", "third"];
let it = xs.into_iter();
// let s = it.collect::<Vec<&str>>().join("\n");
let s = it.fold(String::new(), |a, b| a + b + "\n");
let s = s.trim_end();
println!("{:?}", s);
}
Playground
EDIT: After the comment of Sebastian Redl I have checked the performance cost of the fold usage and created a benchmark test on playground.
You can see that fold usage takes significantly more time for the many iterative approaches.
Did not check the allocated memory usage though.
there's relevant example in rust documentation: here.
let words = ["alpha", "beta", "gamma"];
// chars() returns an iterator
let merged: String = words.iter()
.flat_map(|s| s.chars())
.collect();
assert_eq!(merged, "alphabetagamma");
You can also use Extend trait:
fn f<'a, I: Iterator<Item=&'a str>>(data: I) -> String {
let mut ret = String::new();
ret.extend(data);
ret
}

How to find the starting offset of a string slice of another string? [duplicate]

This question already has answers here:
How to get the byte offset between `&str`
(2 answers)
Closed 3 years ago.
Given a string and a slice referring to some substring, is it possible to find the starting and ending index of the slice?
I have a ParseString function which takes in a reference to a string, and tries to parse it according to some grammar:
ParseString(inp_string: &str) -> Result<(), &str>
If the parsing is fine, the result is just Ok(()), but if there's some error, it usually is in some substring, and the error instance is Err(e), where e is a slice of that substring.
When given the substring where the error occurs, I want to say something like "Error from characters x to y", where x and y are the starting and ending indices of the erroneous substring.
I don't want to encode the position of the errors directly in Err, because I'm nesting these invocations, and the offsets in the nested slice might not correspond to the some slice in the top level string.
As long as all of your string slices borrow from the same string buffer, you can calculate offsets with simple pointer arithmetic. You need the following methods:
str::as_ptr(): Returns the pointer to the start of the string slice
A way to get the difference between two pointers. Right now, the easiest way is to just cast both pointers to usize (which is always a no-op) and then subtract those. On 1.47.0+, there is a method offset_from() which is slightly nicer.
Here is working code (Playground):
fn get_range(whole_buffer: &str, part: &str) -> (usize, usize) {
let start = part.as_ptr() as usize - whole_buffer.as_ptr() as usize;
let end = start + part.len();
(start, end)
}
fn main() {
let input = "Everyone ♥ Ümläuts!";
let part1 = &input[1..7];
println!("'{}' has offset {:?}", part1, get_range(input, part1));
let part2 = &input[7..16];
println!("'{}' has offset {:?}", part2, get_range(input, part2));
}
Rust actually used to have an unstable method for doing exactly this, but it was removed due to being obsolete, which was a bit odd considering the replacement didn't remotely have the same functionality.
That said, the implementation isn't that big, so you can just add the following to your code somewhere:
pub trait SubsliceOffset {
/**
Returns the byte offset of an inner slice relative to an enclosing outer slice.
Examples
```ignore
let string = "a\nb\nc";
let lines: Vec<&str> = string.lines().collect();
assert!(string.subslice_offset_stable(lines[0]) == Some(0)); // &"a"
assert!(string.subslice_offset_stable(lines[1]) == Some(2)); // &"b"
assert!(string.subslice_offset_stable(lines[2]) == Some(4)); // &"c"
assert!(string.subslice_offset_stable("other!") == None);
```
*/
fn subslice_offset_stable(&self, inner: &Self) -> Option<usize>;
}
impl SubsliceOffset for str {
fn subslice_offset_stable(&self, inner: &str) -> Option<usize> {
let self_beg = self.as_ptr() as usize;
let inner = inner.as_ptr() as usize;
if inner < self_beg || inner > self_beg.wrapping_add(self.len()) {
None
} else {
Some(inner.wrapping_sub(self_beg))
}
}
}
You can remove the _stable suffix if you don't need to support old versions of Rust; it's just there to avoid a name conflict with the now-removed subslice_offset method.

How to test if a string contains each character in a pattern in order?

I'm trying to port this Python function that returns true if each character in the pattern appears in the test string in order.
def substr_match(pattern, document):
p_idx, d_idx, p_len, d_len = 0, 0, len(pattern), len(document)
while (p_idx != p_len) and (d_idx != d_len):
if pattern[p_idx].lower() == document[d_idx].lower():
p_idx += 1
d_idx += 1
return p_len != 0 and d_len != 0 and p_idx == p_len
This is what I have at the moment.
fn substr_match(pattern: &str, document: &str) -> bool {
let mut pattern_idx = 0;
let mut document_idx = 0;
let pattern_len = pattern.len();
let document_len = document.len();
while (pattern_idx != pattern_len) && (document_idx != document_len) {
let pat: Vec<_> = pattern.chars().nth(pattern_idx).unwrap().to_lowercase().collect();
let doc: Vec<_> = document.chars().nth(document_idx).unwrap().to_lowercase().collect();
if pat == doc {
pattern_idx += 1;
}
document_idx += 1;
}
return pattern_len != 0 && document_len != 0 && pattern_idx == pattern_len;
}
I tried s.chars().nth(n) since Rust doesn't seem to allow string indexing, but I feel there is a more idiomatic way of doing it. What would be the preferred way of writing this in Rust?
Here is mine:
fn substr_match(pattern: &str, document: &str) -> bool {
let pattern_chars = pattern.chars().flat_map(char::to_lowercase);
let mut doc_chars = document.chars().flat_map(char::to_lowercase);
'outer: for p in pattern_chars {
for d in &mut doc_chars {
if d == p {
continue 'outer;
}
}
return false;
}
true
}
The other answers mimic the behavior of the Python function you started with, but it may be worth trying to make it better. I thought of two test cases where the original function may have surprising behavior:
>>> substr_match("ñ", "in São Paulo")
True
>>> substr_match("🇺🇸", "🇺🇦🇸🇰")
True
Hmm.
(The first example may depend on your input method; try copying and pasting. Also, if you can't see them, the special characters in the second example are flag emoji for the United States, Ukraine, and Slovakia.)
Without getting into why these tests fail or all the other things that could potentially be undesired, if you want to correctly handle Unicode text, you need to, at minimum, operate on graphemes instead of code points (this question describes the difference). Rust doesn't provide this feature in the standard library, so you need the unicode-segmentation crate, which provides a graphemes method on str.
extern crate unicode_segmentation;
use unicode_segmentation::UnicodeSegmentation;
fn substr_match(pattern: &str, document: &str) -> bool {
let mut haystack = document.graphemes(true);
pattern.len() > 0 && pattern.graphemes(true).all(|needle| {
haystack
.find(|grapheme| {
grapheme
.chars()
.flat_map(char::to_lowercase)
.eq(needle.chars().flat_map(char::to_lowercase))
})
.is_some()
})
}
Playground, test cases provided.
This algorithm takes advantage of several convenience methods on Iterator. all iterates over the pattern. find short-circuits, so whenever it finds the next needle in haystack, the next call to haystack.find will start at the following element.
(I thought this approach was somewhat clever, but honestly, a nested for loop is probably easier to read, so you might prefer that.)
The last "tricky" bit is case-insensitive string comparison, which is inherently language-dependent, but if you're willing to accept only unconditional mappings (those that apply in any language), char::to_lowercase does the trick. Rather than collect the result into a String, though, you can use Iterator::eq to compare the sequences of (lowercased) characters.
One other thing you may want to consider is Unicode normalization -- this question is a good place for the broad strokes. Fortunately, Rust has a unicode-normalization crate, too! And it looks quite easy to use. (You wouldn't necessarily want to use it in this function, though; instead, you might normalize all text on input so that you're dealing with the same normalization form everywhere in your program.)
str::chars() returns an iterator. Iterators return elements from a sequence one at a time. Specifically, str::chars() returns characters from a string one at a time. It's much more efficient to use a single iterator to iterate over a string than to create a new iterator each time you want to look up a character, because s.chars().nth(n) needs to perform a linear scan in order to find the nth character in the UTF-8 encoded string.
fn substr_match(pattern: &str, document: &str) -> bool {
let mut pattern_iter = pattern.chars();
let mut pattern_ch_lower: String = match pattern_iter.next() {
Some(ch) => ch,
None => return false,
}.to_lowercase().collect();
for document_ch in document.chars() {
let document_ch_lower: String = document_ch.to_lowercase().collect();
if pattern_ch_lower == document_ch_lower {
pattern_ch_lower = match pattern_iter.next() {
Some(ch) => ch,
None => return true,
}.to_lowercase().collect();
}
}
return false;
}
Here, I'm demonstrating two ways of using iterators:
To iterate over the pattern, I'm using the next method manually. next returns an Option: Some(value) if the iterator hasn't finished, or None if it has.
To iterate over the document, I'm using a for loop. The for loop does the work of calling next and unwrapping the result until next returns None.
One thing to notice is that I'm using a return expression inside a match expression (twice). Since a return expression doesn't produce a value, the compiler knows that its type doesn't matter. In this case, on the Some arm, the result is a char, so the whole match evaluates to a char.
We could also do this with two nested for loops:
fn substr_match(pattern: &str, document: &str) -> bool {
if pattern.len() == 0 {
return false;
}
let mut document_iter = document.chars();
for pattern_ch in pattern.chars() {
let pattern_ch_lower: String = pattern_ch.to_lowercase().collect();
for document_ch in &mut document_iter {
let document_ch_lower: String = document_ch.to_lowercase().collect();
if pattern_ch_lower == document_ch_lower {
break;
}
}
return false;
}
return true;
}
There are two things to notice here:
We need to handle the case where the pattern is empty without using the iterator.
In the inner loop, we don't want to restart from the start of the document when we move to the next pattern character, so we need to reuse the same iterator over the document. When we write for x in iter, the for loop takes ownership of iter; to avoid that, we must write &mut iter instead. Mutable references to iterators are iterators themselves, thanks to the blanket implementation impl<'a, I> Iterator for &'a mut I where I: Iterator + ?Sized in the standard library.

What is the "standard" way to concatenate strings?

While I understand basically what str and std::string::String are and how they relate to each other, I find it a bit cumbersome to compose strings out of various parts without spending too much time and thought on it. So as usual I suspect I did not see the proper way to do it yet, which makes it intuitive and a breeze.
let mut s = std::string::String::with_capacity(200);
let precTimeToJSON = | pt : prectime::PrecTime, isLast : bool | {
s.push_str(
"{ \"sec\": "
+ &(pt.sec.to_string())
+ " \"usec\": "
+ &(pt.usec.to_string())
+ if isLast {"}"} else {"},"})
};
The code above is honored by the compiler with error messages like:
src\main.rs:25:20: 25:33 error: binary operation + cannot be applied to type &'static str [E0369]
And even after half an hours worth of fiddling and randomly adding &, I could not make this compilable. So, here my questions:
What do I have to write to achieve the obvious?
What is the "standard" way to do this in Rust?
The Rust compiler is right (of course): there's no + operator for string literals.
I believe the format!() macro is the idiomatic way to do what you're trying to do. It uses the std::fmt syntax, which essentially consists of a formatting string and the arguments to format (a la C's printf). For your example, it would look something like this:
let mut s: String = String::new();
let precTimeToJSON = | pt : prectime::PrecTime, isLast : bool | {
s = format!("{{ \"sec\": {} \"usec\": {} }}{}",
pt.sec,
pt.usec,
if isLast { "" } else { "," }
)
};
Because it's a macro, you can intermix types in the argument list freely, so long as the type implements the std::fmt::Display trait (which is true for all built-in types). Also, you must escape literal { and } as {{ and }}, respectively. Last, note that the format string must be a string literal, because the macro parses it and the expanded code looks nothing like the original format! expression.
Here's a playground link to the above example.
Two more points for you. First, if you're reading and writing JSON, have a look at a library such as serde. It's much less painful!
Second, if you just want to concatenate &'static str strings (that is, string literals), you can do that with zero run-time cost with the concat!() macro. It won't help you in your case above, but it might with other similar ones.
Itertools::format can help you write this as a single expression if you really want to.
let times: Vec<PrecTime>; // iterable of PrecTime
let s = format!("{}", times.iter().format(",", |pt, f|
f(&format_args!(r#"{{ "sec": {}, "usec": {} }}"#, pt.sec, pt.usec))
));
format() uses a separator, so just specify "," there (or "" if you need no separator). It's a bit involved so that the formatting can be completely lazy and composable. You receive a callback f that you call back with a &Display value (anything that can be Display formatted).
Here we demonstrate this great trick of using &format_args!() to construct a displayable value. This is something that comes in handy if you use the debug builder API as well.
Finally, use a raw string so that we don't need to escape the inner " in the format: r#"{{ "sec": {} "usec": {} }}"#. Raw strings are delimited by r#" and "# (free choice of number of #).
Itertools::format() uses no intermediate allocations, it is all directly passed on to the underlying formatter object.
You can also do this madness:
fn main() {
let mut s = std::string::String::with_capacity(200);
// Have to put this in a block so precTimeToJSON is dropped, see https://doc.rust-lang.org/book/closures.html
{
// I have no idea why this has to be mut...
let mut precTimeToJSON = |sec: u64, usec: u64, isLast: bool| {
s.push_str(&( // Coerce String to str. See https://doc.rust-lang.org/book/deref-coercions.html
"{ \"sec\": ".to_string() // String
+ &sec.to_string() // + &str (& coerces a String to a &str).
+ " \"usec\": " // + &str
+ &usec.to_string() // + &str
+ if isLast {"}"} else {"},"} // + &str
));
};
precTimeToJSON(30, 20, false);
}
println!("{}", &s);
}
Basically the operator String + &str -> String is defined, so you can do String + &str + &str + &str + &str. That gives you a String which you have to coerce back to a &str using &. I think this way is probably quite inefficient though as it will (possibly) allocate loads of Strings.

How do I get the first character out of a string?

I want to get the first character of a std::str. The method char_at() is currently unstable, as is String::slice_chars.
I have come up with the following, but it seems excessive to get a single character and not use the rest of the vector:
let text = "hello world!";
let char_vec: Vec<char> = text.chars().collect();
let ch = char_vec[0];
UTF-8 does not define what "character" is so it depends on what you want. In this case, chars are Unicode scalar values, and so the first char of a &str is going to be between one and four bytes.
If you want just the first char, then don't collect into a Vec<char>, just use the iterator:
let text = "hello world!";
let ch = text.chars().next().unwrap();
Alternatively, you can use the iterator's nth method:
let ch = text.chars().nth(0).unwrap();
Bear in mind that elements preceding the index passed to nth will be consumed from the iterator.
I wrote a function that returns the head of a &str and the rest:
fn car_cdr(s: &str) -> (&str, &str) {
for i in 1..5 {
let r = s.get(0..i);
match r {
Some(x) => return (x, &s[i..]),
None => (),
}
}
(&s[0..0], s)
}
Use it like this:
let (first_char, remainder) = car_cdr("test");
println!("first char: {}\nremainder: {}", first_char, remainder);
The output looks like:
first char: t
remainder: est
It works fine with chars that are more than 1 byte.
Get the first single character out of a string w/o using the rest of that string:
let text = "hello world!";
let ch = text.chars().take(1).last().unwrap();
It would be nice to have something similar to Haskell's head function and tail function for such cases.
I wrote this function to act like head and tail together (doesn't match exact implementation)
pub fn head_tail<T: Iterator, O: FromIterator<<T>::Item>>(iter: &mut T) -> (Option<<T>::Item>, O) {
(iter.next(), iter.collect::<O>())
}
Usage:
// works with Vec<i32>
let mut val = vec![1, 2, 3].into_iter();
println!("{:?}", head_tail::<_, Vec<i32>>(&mut val));
// works with chars in two ways
let mut val = "thanks! bedroom builds YT".chars();
println!("{:?}", head_tail::<_, String>(&mut val));
// calling the function with Vec<char>
let mut val = "thanks! bedroom builds YT".chars();
println!("{:?}", head_tail::<_, Vec<char>>(&mut val));
NOTE: The head_tail function doesn't panic! if the iterator is empty. If this matched Haskell's head/tail output, this would have thrown an exception if the iterator was empty. It might also be good to use iterable trait to be more compatible to other types.
If you only want to test for it, you can use starts_with():
"rust".starts_with('r')
"rust".starts_with(|c| c == 'r')
I think it is pretty straight forward
let text = "hello world!";
let c: char = text.chars().next().unwrap();
next() takes the next item from the iterator
To “unwrap” something in Rust is to say, “Give me the result of the computation, and if there was an error, panic and stop the program.”
The accepted answer is a bit ugly!
let text = "hello world!";
let ch = &text[0..1]; // this returns "h"

Resources