Efficient ways to build new Strings in Rust

I have just recently started learning Rust and have been messing around with some code. I wanted to create a simple function that removes vowels from a String and returns a new String. The code below functions, but I was concerned if this truly was a valid, typical approach in this language or if I'm missing something...
// remove vowels by building a String using .contains() on a vowel array
fn remove_vowels(s: String) -> String {
    let mut no_vowels: String = String::new();
    for c in s.chars() {
        if !['a', 'e', 'i', 'o', 'u'].contains(&c) {
            no_vowels += &c.to_string();
        }
    }
    return no_vowels;
}
First, using to_string() to construct a new String and then using & to borrow just seemed off. Is there a simpler way to append characters to a String, or is this the only way to go? Or should I rewrite this entirely and iterate through the inputted String using a loop by its length, not by a character array?
Also, I have been informed that it's quite popular in Rust to not use the return statement but to instead let the last expression return the value from the function. Is my return statement required here, or is there a cleaner way to return that value that follows convention?

If you consume the original String as your example does, you can remove the vowels in-place using retain(), which will avoid allocating a new string:
fn remove_vowels(mut s: String) -> String {
    s.retain(|c| !['a', 'e', 'i', 'o', 'u'].contains(&c));
    s
}
See it working on the playground. Side note: you may want to consider uppercase vowels as well.
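Following up on the side note, here is a minimal sketch of the same retain() approach that also drops uppercase vowels. The name remove_vowels_any_case is only illustrative, and the sketch assumes that "vowel" means the ASCII letters a, e, i, o, u in either case.

```rust
/// Removes ASCII vowels of either case in place, reusing the input's buffer.
/// Assumption: accented vowels such as 'é' are deliberately left alone.
fn remove_vowels_any_case(mut s: String) -> String {
    // Lowercase each char first so one pattern covers both cases.
    s.retain(|c| !matches!(c.to_ascii_lowercase(), 'a' | 'e' | 'i' | 'o' | 'u'));
    s
}
```

For example, remove_vowels_any_case(String::from("Hello World")) yields "Hll Wrld".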

You can use collect on an iterator of characters to create a String. You can filter out the characters you don't want using filter.
// remove vowels by building a String using .contains() on a vowel array
fn remove_vowels(s: &str) -> String {
    s.chars()
        .filter(|c| !['a', 'e', 'i', 'o', 'u'].contains(c))
        .collect()
}
playground
If this is in a performance-critical region, then since you know the characters you're removing are single bytes in UTF-8, it is safe to remove them directly from the bytes: a byte in the ASCII range never appears inside a multi-byte UTF-8 sequence, so the result is still valid UTF-8. That means you can write something like
fn remove_vowels(s: &str) -> String {
    String::from_utf8(
        s.bytes()
            .filter(|c| ![b'a', b'e', b'i', b'o', b'u'].contains(c))
            .collect()
    ).unwrap()
}
which may be more efficient. playground
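To answer the original question about appending characters directly: String::push takes a char, so no intermediate to_string() is needed. The sketch below keeps the question's explicit loop; the name remove_vowels_push is hypothetical, and reserving s.len() bytes up front is an optional choice made here, not something the answers above require.

```rust
/// Builds the result with String::push instead of `+= &c.to_string()`.
fn remove_vowels_push(s: &str) -> String {
    // The output can never be longer than the input, so one allocation suffices.
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        if !matches!(c, 'a' | 'e' | 'i' | 'o' | 'u') {
            out.push(c);
        }
    }
    // The last expression is the return value; no `return` keyword needed.
    out
}
```

Ending the function with the bare expression out is the conventional way to return it.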

Related

Palindrome code in rust programming language

I am trying to write a palindrome program in Rust.
Even when the input is a palindrome word, my attempt is not showing a palindrome:
use std::io;
fn main() {
    println!("enter a word to know if palindrome or not");
    let mut inp = String::new();
    io::stdin().read_line(&mut inp).expect("needed a string");
    let arr: Vec<_> = inp.chars().collect();
    let mut new_st = String::new();
    for i in 0..arr.len() {
        new_st.push(arr[arr.len() - 1 - i]);
    }
    if inp.eq(&new_st[1..]) {
        println!("Palindrome");
    } else {
        println!("not a palindrome..");
    }
    println!("{}", &new_st[1..]);
}
Output:
enter a word to know if palindrome or not
amma
not a palindrome..
amma

The problem is that read_line() keeps the trailing \n at the end of inp. You should remove it, or better yet use the .trim() method to strip any leading and trailing whitespace, including the newline.
inp = inp.trim().to_string();
As a further improvement, you can use Rust's iterators to reverse the String rather than doing it manually:
let rev = inp.chars().rev().collect::<String>();
This iterates over the characters of inp in reverse order and collects them into a String that is stored in the rev variable.
Also, you should use == rather than the .eq() method; it's much clearer.
See this playground link for complete code
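Putting those suggestions together, a minimal sketch of the check as a standalone function might look like the following. The name is_palindrome is an assumption for illustration; the input is expected to be the raw line as returned by read_line().

```rust
/// Returns true if `input`, ignoring surrounding whitespace such as the
/// newline left by read_line(), reads the same forwards and backwards.
fn is_palindrome(input: &str) -> bool {
    let word = input.trim();
    // Reverse by Unicode scalar values, not by bytes.
    let reversed: String = word.chars().rev().collect();
    word == reversed
}
```

Because the trimming happens inside the function, the caller can pass the line straight from standard input.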

Use char as &str in HashMap

I want to create a HashMap which maps words (a Vec of &str) and the letters of those words to each other. For example, vec!["ab", "bc", "abc"] will be converted to the following HashMap
{
    // Letters are keys, words which contain the keys are values
    "a" => ["ab", "abc"],
    "b" => ["ab", "bc", "abc"],
    "c" => ["bc", "abc"],
    // Words are keys, letters which are in the words are values
    "ab" => ["a", "b"],
    "bc" => ["b", "c"],
    "abc" => ["a", "b", "c"],
}
I tried this code [playground]:
let words = vec!["ab", "bc", "abc"];
let mut map: HashMap<&str, Vec<&str>> = HashMap::new();
for word in words.iter() {
    for letter in word.chars() {
        map.entry(letter).or_default().push(word);
        map.entry(word).or_default().push(letter);
    }
}
but there is a problem: letter is of type char but I need a &str because map accepts only &strs as keys. I also tried to convert letter to a &str:
for word in words.iter() {
    for letter in word.chars() {
        let letter = letter.to_string();
        // no changes
but this code creates a new String which has a shorter lifetime than the map. In other words, letter is dropped at the end of the nested for loop's body while the map still refers to it, so I get a compiler error.
How can I use a char in HashMap which accepts only &strs as keys?

I would separate the map into two:
One is char -> Vec<&str>. It maps letters to a list of words.
The second one would be &str -> Vec<char>, but I do not know if you really need it: Why not just iterate over the chars of a &str directly?
Storing a Vec<char> essentially just doubles the amount of memory you use, unless the Vec<char> is e.g. sorted in a particular order (which may or may not be necessary).
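A minimal sketch of the first of those two maps, keyed by char, could look like this. The function name letters_to_words is hypothetical; the second direction is left out on the reasoning above that the letters of a word can be obtained by iterating over it.

```rust
use std::collections::HashMap;

/// Maps each letter to the words that contain it, in input order.
fn letters_to_words<'a>(words: &[&'a str]) -> HashMap<char, Vec<&'a str>> {
    let mut map: HashMap<char, Vec<&'a str>> = HashMap::new();
    for &word in words {
        for letter in word.chars() {
            // char is an owned key, so no borrowed one-letter string is needed.
            map.entry(letter).or_default().push(word);
        }
    }
    map
}
```

Note that a word with a repeated letter is pushed once per occurrence in this sketch; deduplicate if that matters.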
If you really want to keep them in one map, I think it is easier to have a HashMap<String, Vec<String>>.
The problem seems to be that chars gives you one char after another, but what you actually want is a &str after another, where each &str actually encompasses a single character. I did not find anything like that in the docs for &str, but maybe there is something somewhere that iterates like this.
You could work around it using matches:
let words = vec!["ab", "bc", "abc"];
let mut map: HashMap<&str, Vec<&str>> = HashMap::new();
for word in words.iter() {
    for letter in word.matches(|_| true) { // iterates over &str's that encompass one single character
        map.entry(letter).or_default().push(word);
        map.entry(word).or_default().push(letter);
    }
}
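Another way to get one-character &str slices, offered here only as a sketch, is to combine char_indices() with len_utf8() and slice the original word. The function name build_map is an assumption; the slices borrow from the input words, so they live as long as the map needs them.

```rust
use std::collections::HashMap;

/// Builds the combined letter/word map using slices of the original words.
fn build_map<'a>(words: &[&'a str]) -> HashMap<&'a str, Vec<&'a str>> {
    let mut map: HashMap<&'a str, Vec<&'a str>> = HashMap::new();
    for &word in words {
        for (start, ch) in word.char_indices() {
            // A sub-slice covering exactly one character of `word`.
            let letter = &word[start..start + ch.len_utf8()];
            map.entry(letter).or_default().push(word);
            map.entry(word).or_default().push(letter);
        }
    }
    map
}
```

Since every key and value points into the input strings, no String is allocated for the letters.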

Get a random character from a string and append to another string

I'm trying to write the Rust equivalent of the following C++ code:
result += consonants[rand() % consonants.length()];
It is meant to take a random character from the string consonants and append it to the string result.
I seem to have found a working Rust equivalent, but it's... monstrous, to say the least. What would be a more idiomatic equivalent?
format!("{}{}", result, consonants.chars().nth(rand::thread_rng().gen_range(1, consonants.chars().count())).unwrap().to_string());

A few things:
You don't need to use format!() here. There is String::push() which appends a single char.
There is also the rand::sample() function which can randomly choose multiple elements from an iterator. This looks like the perfect fit!
So let's see how this fits together! I created three different versions for different use cases.
1. Unicode string (the general case)
let consonants = "bcdfghjklmnpqrstvwxyz";
let mut result = String::new();
// rand::sample picks one element from the iterator and returns a vector;
// [0] gets the first element from the returned vector.
result.push(rand::sample(&mut rand::thread_rng(), consonants.chars(), 1)[0]);
(Playground)
We sample only one element from the iterator and immediately push it to the string. Still not as short as with C's rand(), but please note that rand() is considered harmful for any kind of serious use! Using C++'s <random> header is a lot better, but will require a little bit more code, too. Additionally, your C++ version can't handle multi-byte characters (e.g. UTF-8 encoding), while the Rust version has full UTF-8 support.
2. ASCII string
However, if you only want to have a string with English consonants, then UTF-8 is not needed and we can make use of O(1) indexing, by using a byte slice:
use rand::{thread_rng, Rng};
let consonants = b"bcdfghjklmnpqrstvwxyz";
let mut result = String::new();
// .cloned() converts Option<&u8> into Option<u8>;
// .unwrap() is fine because we know `consonants` is not empty;
// .into() converts the `u8` into a `char`.
result.push(thread_rng().choose(consonants).cloned().unwrap().into());
(Playground)
3. Collection of characters with Unicode support
As mentioned in the comments, you probably just want a collection of characters ("consonants"). This means we don't have to use a string and can use an array of chars instead. So here is one last version which does have UTF-8 support and avoids O(n) indexing:
use rand::{thread_rng, Rng};
// If you need to avoid the heap allocation here, you can create a static
// array like this: let consonants = ['b', 'c', 'd', ...];
let consonants: Vec<_> = "bcdfghjklmnpqrstvwxyz".chars().collect();
let mut result = String::new();
result.push(*thread_rng().choose(&consonants).unwrap());
(Playground)
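The rand crate's API differs between releases, so the following sketch isolates only the String::push part and uses the standard library alone. The function push_char_at and its index parameter are hypothetical: the caller is assumed to supply an index drawn from whatever random source is available.

```rust
/// Appends the character at position `index` of `source` to `result`.
/// Returns false (and leaves `result` untouched) if `index` is out of range.
/// Assumption: `index` was chosen randomly by the caller.
fn push_char_at(result: &mut String, source: &str, index: usize) -> bool {
    // chars().nth() walks the string, so this is O(n) but Unicode-safe.
    match source.chars().nth(index) {
        Some(c) => {
            result.push(c);
            true
        }
        None => false,
    }
}
```

Valid indices start at 0, so a random index should be drawn from 0 up to, but not including, the character count.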

Why is capitalizing the first letter of a string so convoluted in Rust?

I'd like to capitalize the first letter of a &str. It's a simple problem and I hope for a simple solution. Intuition tells me to do something like this:
let mut s = "foobar";
s[0] = s[0].to_uppercase();
But &strs can't be indexed like this. The only way I've been able to do it seems overly convoluted. I convert the &str to an iterator, convert the iterator to a vector, upper case the first item in the vector, which creates an iterator, which I index into, creating an Option, which I unwrap to give me the upper-cased first letter. Then I convert the vector into an iterator, which I convert into a String, which I convert to a &str.
let s1 = "foobar";
let mut v: Vec<char> = s1.chars().collect();
v[0] = v[0].to_uppercase().nth(0).unwrap();
let s2: String = v.into_iter().collect();
let s3 = &s2;
Is there an easier way than this, and if so, what? If not, why is Rust designed this way?
Similar question

Why is it so convoluted?
Let's break it down, line-by-line
let s1 = "foobar";
We've created a literal string that is encoded in UTF-8. UTF-8 allows us to encode the 1,114,112 code points of Unicode in a manner that's pretty compact if you come from a region of the world that types in mostly characters found in ASCII, a standard created in 1963. UTF-8 is a variable length encoding, which means that a single code point might take from 1 to 4 bytes. The shorter encodings are reserved for ASCII, but many Kanji take 3 bytes in UTF-8.
let mut v: Vec<char> = s1.chars().collect();
This creates a vector of characters. A character is a 32-bit number that directly maps to a code point. If we started with ASCII-only text, we've quadrupled our memory requirements. If we had a bunch of characters from the astral plane, then maybe we haven't used that much more.
v[0] = v[0].to_uppercase().nth(0).unwrap();
This grabs the first code point and requests that it be converted to an uppercase variant. Unfortunately for those of us who grew up speaking English, there's not always a simple one-to-one mapping of a "small letter" to a "big letter". Side note: we call them upper- and lower-case because one box of letters was above the other box of letters back in the day.
This code cannot actually panic for lack of an uppercase variant: when a code point has none, to_uppercase yields the code point unchanged, so nth(0) always returns a value. (The indexing v[0] does panic on an empty string.) It can semantically fail, however, when a code point has an uppercase variant that has multiple characters, such as the German ß, because only the first one is kept. Note that ß may never actually be capitalized in The Real World; this is just the example I can always remember and search for. As of 2017-06-29, in fact, the official rules of German spelling have been updated so that both "ẞ" and "SS" are valid capitalizations!
let s2: String = v.into_iter().collect();
Here we convert the characters back into UTF-8 and require a new allocation to store them in, as the original variable was stored in constant memory so as to not take up memory at run time.
let s3 = &s2;
And now we take a reference to that String.
It's a simple problem
Unfortunately, this is not true. Perhaps we should endeavor to convert the world to Esperanto?
I presume char::to_uppercase already properly handles Unicode.
Yes, I certainly hope so. Unfortunately, Unicode isn't enough in all cases.
Thanks to huon for pointing out the Turkish I, where both the upper (İ) and lower case (i) versions have a dot. That is, there is no one proper capitalization of the letter i; it depends on the locale of the source text as well.
why the need for all data type conversions?
Because the data types you are working with are important when you are worried about correctness and performance. A char is 32 bits and a string is UTF-8 encoded. They are different things.
indexing could return a multi-byte, Unicode character
There may be some mismatched terminology here. A char is a multi-byte Unicode character.
Slicing a string is possible if you go byte-by-byte, but the standard library will panic if you are not on a character boundary.
One of the reasons that indexing a string to get a character was never implemented is because so many people misuse strings as arrays of ASCII characters. Indexing a string to set a character could never be efficient - you'd have to be able to replace 1-4 bytes with a value that is also 1-4 bytes, causing the rest of the string to bounce around quite a lot.
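To illustrate the character-boundary point, here is a small sketch that takes the first character as a string slice without risking a panic. The helper name first_char_str is made up for this example; it relies on str::get, which returns None instead of panicking when a range does not fall on character boundaries.

```rust
/// Returns the first character of `s` as a one-character string slice,
/// or None for an empty string. Never panics.
fn first_char_str(s: &str) -> Option<&str> {
    let first = s.chars().next()?;
    // len_utf8() is the number of bytes this char occupies (1 to 4),
    // so the range below always ends on a character boundary.
    s.get(..first.len_utf8())
}
```

By contrast, "étude".get(0..1) is None because byte 1 falls in the middle of the two-byte é, and the equivalent index expression &"étude"[0..1] would panic.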
to_uppercase could return an upper case character
As mentioned above, ß is a single character that, when capitalized, becomes two characters.
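A short sketch makes the one-to-many mapping visible. The helper uppercase_of is hypothetical; it simply collects everything that char::to_uppercase yields.

```rust
/// Collects every code point produced by uppercasing `c` into a String.
fn uppercase_of(c: char) -> String {
    // to_uppercase() returns an iterator because the result may be
    // more than one char; a char with no mapping is yielded unchanged.
    c.to_uppercase().collect()
}
```

With this helper, 'ß' maps to the two-character string "SS", 'a' maps to "A", and a caseless character such as '1' comes back unchanged.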
Solutions
See also trentcl's answer which only uppercases ASCII characters.
Original
If I had to write the code, it'd look like:
fn some_kind_of_uppercase_first_letter(s: &str) -> String {
    let mut c = s.chars();
    match c.next() {
        None => String::new(),
        Some(f) => f.to_uppercase().chain(c).collect(),
    }
}
fn main() {
    println!("{}", some_kind_of_uppercase_first_letter("joe"));
    println!("{}", some_kind_of_uppercase_first_letter("jill"));
    println!("{}", some_kind_of_uppercase_first_letter("von Hagen"));
    println!("{}", some_kind_of_uppercase_first_letter("ß"));
}
But I'd probably search for uppercase or unicode on crates.io and let someone smarter than me handle it.
Improved
Speaking of "someone smarter than me", Veedrac points out that it's probably more efficient to convert the iterator back into a slice after the first capital codepoints are accessed. This allows for a memcpy of the rest of the bytes.
fn some_kind_of_uppercase_first_letter(s: &str) -> String {
    let mut c = s.chars();
    match c.next() {
        None => String::new(),
        Some(f) => f.to_uppercase().collect::<String>() + c.as_str(),
    }
}

Is there an easier way than this, and if so, what? If not, why is Rust designed this way?
Well, yes and no. Your code is not correct: as the other answer pointed out, it keeps only the first character when the uppercase form has several, as with ß, and it also panics on an empty string. So doing this with Rust's standard library is even harder than you initially thought.
However, Rust is designed to encourage code reuse and make bringing in libraries easy. So the idiomatic way to capitalize a string is actually quite palatable:
extern crate inflector;
use inflector::Inflector;
let capitalized = "some string".to_title_case();

It's not especially convoluted if you are able to limit your input to ASCII-only strings.
Since Rust 1.23, str has a make_ascii_uppercase method (in older Rust versions, it was available through the AsciiExt trait). This means you can uppercase ASCII-only string slices with relative ease:
fn make_ascii_titlecase(s: &mut str) {
    if let Some(r) = s.get_mut(0..1) {
        r.make_ascii_uppercase();
    }
}
This will turn "taylor" into "Taylor", but it won't turn "édouard" into "Édouard". (playground)
Use with caution.
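Since the function needs a &mut str, a caller usually starts from an owned String. The sketch below repeats make_ascii_titlecase from above and adds a hypothetical wrapper, titlecased, to show one way of calling it.

```rust
fn make_ascii_titlecase(s: &mut str) {
    // get_mut returns None for an empty string or when byte 1 is not a
    // character boundary, so non-ASCII first letters are left untouched.
    if let Some(r) = s.get_mut(0..1) {
        r.make_ascii_uppercase();
    }
}

/// Hypothetical helper: takes ownership, capitalizes in place, hands the String back.
fn titlecased(mut owned: String) -> String {
    // &mut String coerces to &mut str at the call site.
    make_ascii_titlecase(&mut owned);
    owned
}
```

No new allocation happens here; the String's existing buffer is modified in place.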

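I did it this way (this version assumes a non-empty string whose first character is ASCII; otherwise the slicing panics):
fn str_cap(s: &str) -> String {
    format!("{}{}", s[..1].to_uppercase(), &s[1..])
}
If it is not an ASCII string (this version still panics on an empty string because of the unwrap):
fn str_cap(s: &str) -> String {
    format!("{}{}", s.chars().next().unwrap().to_uppercase(),
        s.chars().skip(1).collect::<String>())
}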

The OP's approach taken further:
replace the first character with its uppercase representation
let mut s = "foobar".to_string();
let r = s.remove(0).to_uppercase().to_string() + &s;
or
let r = format!("{}{s}", s.remove(0).to_uppercase());
println!("{r}");
works with Unicode characters as well eg. "😎foobar"
If the first character is guaranteed to be ASCII, it can be changed to a capital letter in place:
let mut s = "foobar".to_string();
if !s.is_empty() {
    s[0..1].make_ascii_uppercase(); // Foobar
}
This panics if the first character is not ASCII!

Since the method to_uppercase() returns a new String, you should be able to just add the remainder of the string like so.
This was tested in Rust version 1.57+ but is likely to work in any version that supports slicing. Note that it panics on an empty string or when the first character is longer than one byte.
fn uppercase_first_letter(s: &str) -> String {
    s[0..1].to_uppercase() + &s[1..]
}

Here's a version that is a bit slower than @Shepmaster's improved version, but also more idiomatic:
fn capitalize_first(s: &str) -> String {
    let mut chars = s.chars();
    chars
        .next()
        .map(|first_letter| first_letter.to_uppercase())
        .into_iter()
        .flatten()
        .chain(chars)
        .collect()
}

This is how I solved this problem. Notice that I had to check whether self is ASCII before transforming to uppercase.
trait TitleCase {
    fn title(&self) -> String;
}
impl TitleCase for &str {
    fn title(&self) -> String {
        if !self.is_ascii() || self.is_empty() {
            return String::from(*self);
        }
        let (head, tail) = self.split_at(1);
        head.to_uppercase() + tail
    }
}
pub fn main() {
    println!("{}", "bruno".title());
    println!("{}", "b".title());
    println!("{}", "🦀".title());
    println!("{}", "ß".title());
    println!("{}", "".title());
    println!("{}", "བོད་སྐད་ལ".title());
}
Output
Bruno
B
🦀
ß
བོད་སྐད་ལ

Inspired by the get_mut examples, I wrote something like this:
fn make_capital(in_str: &str) -> String {
    let mut v = String::from(in_str);
    // get_mut returns None if 0..1 is not a valid character range
    if let Some(first) = v.get_mut(0..1) {
        first.make_ascii_uppercase();
    }
    v
}

How do I convert from a char array [char; N] to a string slice &str?

Given a fixed-length char array such as:
let s: [char; 5] = ['h', 'e', 'l', 'l', 'o'];
How do I obtain a &str?

You can't without some allocation, which means you will end up with a String.
let s2: String = s.iter().collect();
The problem is that strings in Rust are not collections of chars, they are UTF-8, which is an encoding without a fixed size per character.
For example, the array in this case would take 5 × 32 bits for a total of 20 bytes. The data of the string would take 5 bytes total (although there are also 3 pointer-sized values, so the overall String takes more memory in this case).
We start with the array and call []::iter, which yields values of type &char. We then use Iterator::collect to convert the Iterator<Item = &char> into a String. This uses the iterator's size_hint to pre-allocate space in the String, reducing the need for extra allocations.
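The size comparison above can be checked with a short sketch. The helper names chars_to_string and sizes are illustrative only; sizes reports the bytes occupied by the char slice next to the bytes of UTF-8 text in the resulting String (not counting the String's own pointer, length and capacity).

```rust
use std::mem::size_of_val;

/// Collects a slice of chars into a freshly allocated UTF-8 String.
fn chars_to_string(chars: &[char]) -> String {
    chars.iter().collect()
}

/// Returns (bytes used by the char slice, bytes of UTF-8 text in the String).
fn sizes(chars: &[char]) -> (usize, usize) {
    (size_of_val(chars), chars_to_string(chars).len())
}
```

For ['h', 'e', 'l', 'l', 'o'], sizes returns (20, 5), matching the figures given above. A &str can then be borrowed from the String with as_str() or &s2.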

Another quick one-liner I didn't see above:
let whatever_char_array = ['h', 'e', 'l', 'l', 'o'];
let string_from_char_array = String::from_iter(whatever_char_array);
Note:
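This feature (iterating over an array by value) requires Rust 1.53 or later, where arrays gained a by-value IntoIterator implementation. On editions before 2021 you also need use std::iter::FromIterator; for String::from_iter to resolve.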

I will give you a very simple functional solution but it's not the best one. You can learn some basics:
let s: [char; 5] = ['h', 'e', 'l', 'l', 'o'];
let mut str = String::from("");
for x in &s {
    str.push(*x);
}
println!("{}", str);
Before the variable names you can put an underscore if you want to keep the signature, but in this simple example it is not necessary. The program starts by creating an empty mutable String so you can add elements (chars) to the String. Then we make a for loop over the s array by taking its reference. We add each element to the initial string. At the end you can return your string or just print it.
