String recursion in Rust - string

In Rust, I am trying to obtain all possible combinations of a-z characters up to a fixed length with no repeating letters.
For example, for a limited set of a-f and a length of 3 I should get:
abc
abd
abe
abf
acb
acd
ace
acf
adb
... etc
I've been struggling to do this through recursion and have been banging my head on ownership and borrows. The only way I've managed to do it is as follows, but this is cloning strings all over the place and is very inefficient. There are probably standard permutation/combination functions for this in the standard library, I don't know, but I'm interested in understanding how this can be done manually.
fn main() {
run(&String::new());
}
fn run(target: &String) {
for a in 97..123 { // ASCII a..z
if !target.contains(char::from(a)) {
let next = target.clone() + char::from(a).to_string().as_str(); // Working but terrible
if next.len() == 3 { // Required string size
println!("{}", next);
} else {
run(&next);
}
}
}
}

First off, a couple of remarks:
&String is kind of an anti-pattern that is rarely seen. It serves no purpose; all the functionality that String has over str requires mutability. So it should either be &mut String or &str.
97..123 is uncommon ... use 'a'..='z'.
Now to the actual problem:
As long as you pass a non-mutable string into the recursion, you won't get around cloning the data. I'd make the string mutable, then you can simply append and remove single characters from it.
Like this:
fn main() {
run(&mut String::new());
}
fn run(target: &mut String) {
for a in 'a'..='z' {
if !target.contains(a) {
target.push(a);
if target.len() == 3 {
// Required string size
println!("{}", target);
} else {
run(target);
}
target.pop();
}
}
}

Just for comprehensiveness, here is an alternate way to do it, when not using recursion. It does use .permutations() from crate Itertools.
Also, It's probably cleaner to return the String object from the function directly, instead of passing a mutable reference by argument.
use std::ops::RangeInclusive;
use itertools::Itertools;
fn main() {
println!("result: {}",combine('a'..='d', 3));
println!("result: {}",combine('a'..='g', 4));
println!("result: {}",combine('a'..='c', 3));
println!("result: {}",combine('a'..='c', 4)); // assertion fail
}
fn combine(range: RangeInclusive<char>, depth: usize) -> String
{
assert!( *range.end() as usize - *range.start() as usize + 1 >= depth);
let perms = range.permutations(depth);
let mut result = String::new();
perms.for_each(|mut item| {
item.push(' ');
result += &item.into_iter().collect::<String>();
});
result.pop(); // pop last superfluous space char
result
}

Related

How to pass &mut str and change the original mut str without a return?

I'm learning Rust from the Book and I was tackling the exercises at the end of chapter 8, but I'm hitting a wall with the one about converting words into Pig Latin. I wanted to see specifically if I could pass a &mut String to a function that takes a &mut str (to also accept slices) and modify the referenced string inside it so the changes are reflected back outside without the need of a return, like in C with a char **.
I'm not quite sure if I'm just messing up the syntax or if it's more complicated than it sounds due to Rust's strict rules, which I have yet to fully grasp. For the lifetime errors inside to_pig_latin() I remember reading something that explained how to properly handle the situation but right now I can't find it, so if you could also point it out for me it would be very appreciated.
Also what do you think of the way I handled the chars and indexing inside strings?
use std::io::{self, Write};
fn main() {
let v = vec![
String::from("kaka"),
String::from("Apple"),
String::from("everett"),
String::from("Robin"),
];
for s in &v {
// cannot borrow `s` as mutable, as it is not declared as mutable
// cannot borrow data in a `&` reference as mutable
to_pig_latin(&mut s);
}
for (i, s) in v.iter().enumerate() {
print!("{}", s);
if i < v.len() - 1 {
print!(", ");
}
}
io::stdout().flush().unwrap();
}
fn to_pig_latin(mut s: &mut str) {
let first = s.chars().nth(0).unwrap();
let mut pig;
if "aeiouAEIOU".contains(first) {
pig = format!("{}-{}", s, "hay");
s = &mut pig[..]; // `pig` does not live long enough
} else {
let mut word = String::new();
for (i, c) in s.char_indices() {
if i != 0 {
word.push(c);
}
}
pig = format!("{}-{}{}", word, first.to_lowercase(), "ay");
s = &mut pig[..]; // `pig` does not live long enough
}
}
Edit: here's the fixed code with the suggestions from below.
fn main() {
// added mut
let mut v = vec![
String::from("kaka"),
String::from("Apple"),
String::from("everett"),
String::from("Robin"),
];
// added mut
for mut s in &mut v {
to_pig_latin(&mut s);
}
for (i, s) in v.iter().enumerate() {
print!("{}", s);
if i < v.len() - 1 {
print!(", ");
}
}
println!();
}
// converted into &mut String
fn to_pig_latin(s: &mut String) {
let first = s.chars().nth(0).unwrap();
if "aeiouAEIOU".contains(first) {
s.push_str("-hay");
} else {
// added code to make the new first letter uppercase
let second = s.chars().nth(1).unwrap();
*s = format!(
"{}{}-{}ay",
second.to_uppercase(),
// the slice starts at the third char of the string, as if &s[2..]
&s[first.len_utf8() * 2..],
first.to_lowercase()
);
}
}
I'm not quite sure if I'm just messing up the syntax or if it's more complicated than it sounds due to Rust's strict rules, which I have yet to fully grasp. For the lifetime errors inside to_pig_latin() I remember reading something that explained how to properly handle the situation but right now I can't find it, so if you could also point it out for me it would be very appreciated.
What you're trying to do can't work: with a mutable reference you can update the referee in-place, but this is extremely limited here:
a &mut str can't change length or anything of that matter
a &mut str is still just a reference, the memory has to live somewhere, here you're creating new Strings inside your function then trying to use these as the new backing buffers for the reference, which as the compiler tells you doesn't work: the String will be deallocated at the end of the function
What you could do is take an &mut String, that lets you modify the owned string itself in-place, which is much more flexible. And, in fact, corresponds exactly to your request: an &mut str corresponds to a char*, it's a pointer to a place in memory.
A String is also a pointer, so an &mut String is a double-pointer to a zone in memory.
So something like this:
fn to_pig_latin(s: &mut String) {
let first = s.chars().nth(0).unwrap();
if "aeiouAEIOU".contains(first) {
*s = format!("{}-{}", s, "hay");
} else {
let mut word = String::new();
for (i, c) in s.char_indices() {
if i != 0 {
word.push(c);
}
}
*s = format!("{}-{}{}", word, first.to_lowercase(), "ay");
}
}
You can also likely avoid some of the complete string allocations by using somewhat finer methods e.g.
fn to_pig_latin(s: &mut String) {
let first = s.chars().nth(0).unwrap();
if "aeiouAEIOU".contains(first) {
s.push_str("-hay")
} else {
s.replace_range(first.len_utf8().., "");
write!(s, "-{}ay", first.to_lowercase()).unwrap();
}
}
although the replace_range + write! is not very readable and not super likely to be much of a gain, so that might as well be a format!, something along the lines of:
fn to_pig_latin(s: &mut String) {
let first = s.chars().nth(0).unwrap();
if "aeiouAEIOU".contains(first) {
s.push_str("-hay")
} else {
*s = format!("{}-{}ay", &s[first.len_utf8()..], first.to_lowercase());
}
}

How do I convert reverse domain notation to PascalCase?

I want to convert "foo.bar.baz" to "FooBarBaz". My input will always be only ASCII. I tried:
let result = "foo.bar.baz"
.to_string()
.split(".")
.map(|x| x[0..1].to_string().to_uppercase() + &x[1..])
.fold("".to_string(), |acc, x| acc + &x);
println!("{}", result);
but that feels inefficient.
Your solution is a good start. You could probably make it work without heap allocations in the "functional" style; I prefer putting complex logic into normal for loops though.
Also I don't like assuming input is in ASCII without actually checking - this should work with any string.
You probably could also use String::with_capacity in your code to avoid reallocations in standard cases.
Playground
fn dotted_to_pascal_case(s: &str) -> String {
let mut result = String::with_capacity(s.len());
for part in s.split('.') {
let mut cs = part.chars();
if let Some(c) = cs.next() {
result.extend(c.to_uppercase());
}
result.push_str(cs.as_str());
}
result
}
fn main() {
println!("{}", dotted_to_pascal_case("foo.bar.baz"));
}
Stefan's answer is correct, but I decided to get rid of that first String allocation and go full-functional, without loops:
fn dotted_to_pascal_case(s: &str) -> String {
s.split('.')
.map(|piece| piece.chars())
.flat_map(|mut chars| {
chars
.next()
.expect("empty section between dots!")
.to_uppercase()
.chain(chars)
})
.collect()
}
fn main() {
println!("{}", dotted_to_pascal_case("foo.bar.baz"));
}

How to "crop" characters off the beginning of a string in Rust?

I want a function that can take two arguments (string, number of letters to crop off front) and return the same string except with the letters before character x gone.
If I write
let mut example = "stringofletters";
CropLetters(example, 3);
println!("{}", example);
then the output should be:
ingofletters
Is there any way I can do this?
In many uses it would make sense to simply return a slice of the input, avoiding any copy. Converting #Shepmaster's solution to use immutable slices:
fn crop_letters(s: &str, pos: usize) -> &str {
match s.char_indices().skip(pos).next() {
Some((pos, _)) => &s[pos..],
None => "",
}
}
fn main() {
let example = "stringofletters"; // works with a String if you take a reference
let cropped = crop_letters(example, 3);
println!("{}", cropped);
}
Advantages over the mutating version are:
No copy is needed. You can call cropped.to_string() if you want a newly allocated result; but you don't have to.
It works with static string slices as well as mutable String etc.
The disadvantage is that if you really do have a mutable string you want to modify, it would be slightly less efficient as you'd need to allocate a new String.
Issues with your original code:
Functions use snake_case, types and traits use CamelCase.
"foo" is a string literal of type &str. These may not be changed. You will need something that has been heap-allocated, such as a String.
The call crop_letters(stringofletters, 3) would transfer ownership of stringofletters to the method, which means you wouldn't be able to use the variable anymore. You must pass in a mutable reference (&mut).
Rust strings are not ASCII, they are UTF-8. You need to figure out how many bytes each character requires. char_indices is a good tool here.
You need to handle the case of when the string is shorter than 3 characters.
Once you have the byte position of the new beginning of the string, you can use drain to move a chunk of bytes out of the string. We just drop these bytes and let the String move over the remaining bytes.
fn crop_letters(s: &mut String, pos: usize) {
match s.char_indices().nth(pos) {
Some((pos, _)) => {
s.drain(..pos);
}
None => {
s.clear();
}
}
}
fn main() {
let mut example = String::from("stringofletters");
crop_letters(&mut example, 3);
assert_eq!("ingofletters", example);
}
See Chris Emerson's answer if you don't actually need to modify the original String.
I found this answer which I don't consider really idiomatic:
fn crop_with_allocation(string: &str, len: usize) -> String {
string.chars().skip(len).collect()
}
fn crop_without_allocation(string: &str, len: usize) -> &str {
// optional length check
if string.len() < len {
return &"";
}
&string[len..]
}
fn main() {
let example = "stringofletters"; // works with a String if you take a reference
let cropped = crop_with_allocation(example, 3);
println!("{}", cropped);
let cropped = crop_without_allocation(example, 3);
println!("{}", cropped);
}
my version
fn crop_str(s: &str, n: usize) -> &str {
let mut it = s.chars();
for _ in 0..n {
it.next();
}
it.as_str()
}
#[test]
fn test_crop_str() {
assert_eq!(crop_str("123", 1), "23");
assert_eq!(crop_str("ЖФ1", 1), "Ф1");
assert_eq!(crop_str("ЖФ1", 2), "1");
}

Using str and String interchangably

Suppose I'm trying to do a fancy zero-copy parser in Rust using &str, but sometimes I need to modify the text (e.g. to implement variable substitution). I really want to do something like this:
fn main() {
let mut v: Vec<&str> = "Hello there $world!".split_whitespace().collect();
for t in v.iter_mut() {
if (t.contains("$world")) {
*t = &t.replace("$world", "Earth");
}
}
println!("{:?}", &v);
}
But of course the String returned by t.replace() doesn't live long enough. Is there a nice way around this? Perhaps there is a type which means "ideally a &str but if necessary a String"? Or maybe there is a way to use lifetime annotations to tell the compiler that the returned String should be kept alive until the end of main() (or have the same lifetime as v)?
Rust has exactly what you want in form of a Cow (Clone On Write) type.
use std::borrow::Cow;
fn main() {
let mut v: Vec<_> = "Hello there $world!".split_whitespace()
.map(|s| Cow::Borrowed(s))
.collect();
for t in v.iter_mut() {
if t.contains("$world") {
*t.to_mut() = t.replace("$world", "Earth");
}
}
println!("{:?}", &v);
}
as #sellibitze correctly notes, the to_mut() creates a new String which causes a heap allocation to store the previous borrowed value. If you are sure you only have borrowed strings, then you can use
*t = Cow::Owned(t.replace("$world", "Earth"));
In case the Vec contains Cow::Owned elements, this would still throw away the allocation. You can prevent that using the following very fragile and unsafe code (It does direct byte-based manipulation of UTF-8 strings and relies of the fact that the replacement happens to be exactly the same number of bytes.) inside your for loop.
let mut last_pos = 0; // so we don't start at the beginning every time
while let Some(pos) = t[last_pos..].find("$world") {
let p = pos + last_pos; // find always starts at last_pos
last_pos = pos + 5;
unsafe {
let s = t.to_mut().as_mut_vec(); // operating on Vec is easier
s.remove(p); // remove $ sign
for (c, sc) in "Earth".bytes().zip(&mut s[p..]) {
*sc = c;
}
}
}
Note that this is tailored exactly to the "$world" -> "Earth" mapping. Any other mappings require careful consideration inside the unsafe code.
std::borrow::Cow, specifically used as Cow<'a, str>, where 'a is the lifetime of the string being parsed.
use std::borrow::Cow;
fn main() {
let mut v: Vec<Cow<'static, str>> = vec![];
v.push("oh hai".into());
v.push(format!("there, {}.", "Mark").into());
println!("{:?}", v);
}
Produces:
["oh hai", "there, Mark."]

Iterate over a string, n elements at a time

I'm trying to iterate over a string, but iterating in slices of length n instead of iterator over every character. The following code accomplishes this manually, but is there a more functional way to do this?
fn main() {
let string = "AAABBBCCC";
let offset = 3;
for (i, _) in string.chars().enumerate() {
if i % offset == 0 {
println!("{}", &string[i..(i+offset)]);
}
}
}
I would use a combination of Peekable and Take:
fn main() {
let string = "AAABBBCCC";
let mut z = string.chars().peekable();
while z.peek().is_some() {
let chunk: String = z.by_ref().take(3).collect();
println!("{}", chunk);
}
}
In other cases, Itertools::chunks might do the trick:
extern crate itertools;
use itertools::Itertools;
fn main() {
let string = "AAABBBCCC";
for chunk in &string.chars().chunks(3) {
for c in chunk {
print!("{}", c);
}
println!();
}
}
Standard warning about splitting strings
Be aware of issues with bytes / characters / code points / graphemes whenever you start splitting strings. With anything more complicated than ASCII characters, one character is not one byte and string slicing operates on bytes! There is also the concept of Unicode code points, but multiple Unicode characters may combine to form what a human thinks of as a single character. This stuff is non-trivial.
If you actually just have ASCII data, it may be worth it to store it as such, perhaps in a Vec<u8>. At the very least, I'd create a newtype that wraps a &str and only exposes ASCII-safe method and validates that it is ASCII when created.
chunks() is not available for &str because it is not really well-defined on strings - do you want chunks with length in bytes, or characters, or grapheme clusters? If you know in advance that your string is in ASCII you can use the following code:
use std::str;
fn main() {
let string = "AAABBBCCC";
for chunk in str_chunks(string, 3) {
println!("{}", chunk);
}
}
fn str_chunks<'a>(s: &'a str, n: usize) -> Box<Iterator<Item=&'a str>+'a> {
Box::new(s.as_bytes().chunks(n).map(|c| str::from_utf8(c).unwrap()))
}
However, it will break immediately if your strings have non-ASCII characters inside them. I'm pretty sure that it is possible to implement an iterator which splits a string into chunks of code points or grapheme clusters - it is just there is no such thing in the standard library now.
You can always implement your own iterator. Of course that still requires quite some code, but it's not at the location where you are working with the string. Therefor your loop stays readable.
#![feature(collections)]
struct StringChunks<'a> {
s: &'a str,
step: usize,
n: usize,
}
impl<'a> StringChunks<'a> {
fn new(s: &'a str, step: usize) -> StringChunks<'a> {
StringChunks {
s: s,
step: step,
n: s.chars().count(),
}
}
}
impl<'a> Iterator for StringChunks<'a> {
type Item = &'a str;
fn next(&mut self) -> Option<&'a str> {
if self.step > self.n {
return None;
}
let ret = self.s.slice_chars(0, self.step);
self.s = self.s.slice_chars(self.step, self.n);
self.n -= self.step;
Some(ret)
}
}
fn main() {
let string = "AAABBBCCC";
for s in StringChunks::new(string, 3) {
println!("{}", s);
}
}
Note that this splits after n unicode chars. So graphemes or similar might end up split up.

Resources