Reversing a string in Rust

What is wrong with this:
fn main() {
let word: &str = "lowks";
assert_eq!(word.chars().rev(), "skwol");
}
I get an error like this:
error[E0369]: binary operation `==` cannot be applied to type `std::iter::Rev<std::str::Chars<'_>>`
--> src/main.rs:4:5
|
4 | assert_eq!(word.chars().rev(), "skwol");
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
= note: an implementation of `std::cmp::PartialEq` might be missing for `std::iter::Rev<std::str::Chars<'_>>`
= note: this error originates in a macro outside of the current crate
What is the correct way to do this?

Since, as @DK. suggested, .graphemes() isn't available on &str in stable, you might as well just do what @huon suggested in the comments:
fn main() {
let foo = "palimpsest";
println!("{}", foo.chars().rev().collect::<String>());
}

The first, and most fundamental, problem is that this isn't how you reverse a Unicode string. You are reversing the order of the code points, where you want to reverse the order of graphemes. There may be other issues with this that I'm not aware of. Text is hard.
The second issue is pointed out by the compiler: you are trying to compare a string literal to a char iterator. chars and rev don't produce new strings, they produce lazy sequences, as with iterators in general. The following works:
/*!
Add the following to your `Cargo.toml`:
```cargo
[dependencies]
unicode-segmentation = "0.1.2"
```
*/
extern crate unicode_segmentation;
use unicode_segmentation::UnicodeSegmentation;
fn main() {
let word: &str = "loẅks";
let drow: String = word
// Split the string into an Iterator of &strs, where each element is an
// extended grapheme cluster.
.graphemes(true)
// Reverse the order of the grapheme iterator.
.rev()
// Collect all the grapheme clusters into a new owned String.
.collect();
assert_eq!(drow, "skẅol");
// Print it out to be sure.
println!("drow = `{}`", drow);
}
Note that graphemes used to be in the standard library as an unstable method, so the above will break with sufficiently old versions of Rust. In that case, you need to use UnicodeSegmentation::graphemes(s, true) instead.
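To see why reversing code points differs from reversing what a reader perceives as characters, here is a minimal std-only sketch. The helper name reverse_code_points is made up for illustration, and the input spells the "ẅ" as a plain w followed by the combining diaeresis U+0308 (an assumption about how the text is encoded; a precomposed "ẅ" is a single code point and would survive the reversal).

```rust
/// Reverses a string code point by code point (hypothetical helper, std only).
fn reverse_code_points(s: &str) -> String {
    s.chars().rev().collect()
}

fn main() {
    // "lo", then 'w' followed by U+0308 (combining diaeresis), then "ks".
    let word = "low\u{308}ks";
    let reversed = reverse_code_points(word);
    // After reversal the combining mark follows the 'k', so it renders on
    // the 'k' instead of on the 'w'.
    assert_eq!(reversed, "sk\u{308}wol");
    println!("{} -> {}", word, reversed);
}
```

Reversing grapheme clusters, as in the answer above, keeps the w and its combining mark together.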

If you are dealing only with ASCII characters, you can do the reversal in place with the reverse method for slices.
It does something like this:
fn main() {
    let mut slice = *b"lowks";
    let len = slice.len();
    // Swap mirrored pairs; the middle byte of an odd-length slice stays put.
    // Iterating to len / 2 also handles even lengths and the empty slice.
    for i in 0..len / 2 {
        slice.swap(i, len - 1 - i);
    }
    assert_eq!(std::str::from_utf8(&slice).unwrap(), "skwol");
}
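As a sketch of how the in-place approach could be wrapped safely, the function below refuses non-ASCII input before touching the bytes. The name reverse_ascii and the choice to return an Option are assumptions made for illustration.

```rust
/// Reverses an ASCII-only string by reversing its bytes in place.
/// Returns None for non-ASCII input, where reversing bytes would
/// scramble multi-byte UTF-8 sequences.
fn reverse_ascii(s: String) -> Option<String> {
    if !s.is_ascii() {
        return None;
    }
    let mut bytes = s.into_bytes();
    bytes.reverse();
    // Reversed ASCII bytes are still valid UTF-8, so this conversion succeeds.
    String::from_utf8(bytes).ok()
}

fn main() {
    assert_eq!(reverse_ascii(String::from("lowks")).as_deref(), Some("skwol"));
    assert_eq!(reverse_ascii(String::from("loẅks")), None);
}
```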

Related

How to generate a random String of alphanumeric chars?

The first part of the question is probably pretty common and there are enough code samples that explain how to generate a random string of alphanumerics. The piece of code I use is from here.
use rand::{thread_rng, Rng};
use rand::distributions::Alphanumeric;
fn main() {
let rand_string: String = thread_rng()
.sample_iter(&Alphanumeric)
.take(30)
.collect();
println!("{}", rand_string);
}
However, this piece of code does not compile (note: I'm on nightly):
error[E0277]: a value of type `String` cannot be built from an iterator over elements of type `u8`
--> src/main.rs:8:10
|
8 | .collect();
| ^^^^^^^ value of type `String` cannot be built from `std::iter::Iterator<Item=u8>`
|
= help: the trait `FromIterator<u8>` is not implemented for `String`
Ok, the elements that are generated are of type u8. So I guess this is an array or vector of u8:
use rand::{thread_rng, Rng};
use rand::distributions::Alphanumeric;
fn main() {
let r = thread_rng()
.sample_iter(&Alphanumeric)
.take(30)
.collect::<Vec<_>>();
let s = String::from_utf8_lossy(&r);
println!("{}", s);
}
And this compiles and works!
2dCsTqoNUR1f0EzRV60IiuHlaM4TfK
All good, except that I would like to ask if someone could explain what exactly happens regarding the types and how this can be optimised.
Questions
.sample_iter(&Alphanumeric) produces u8 and not chars?
How can I avoid the second variable s and directly interpret an u8 as a utf-8 character? I guess the representation in memory would not change at all?
The length of these strings should always be 30. How can I optimise the heap allocation of a Vec away? Also they could actually be char[] instead of Strings.
.sample_iter(&Alphanumeric) produces u8 and not chars?
Yes, this was changed in rand v0.8. You can see in the docs for 0.7.3:
impl Distribution<char> for Alphanumeric
But then in the docs for 0.8.0:
impl Distribution<u8> for Alphanumeric
How can I avoid the second variable s and directly interpret an u8 as a utf-8 character? I guess the representation in memory would not change at all?
There are a couple of ways to do this, the most obvious being to just cast every u8 to a char:
let s: String = thread_rng()
.sample_iter(&Alphanumeric)
.take(30)
.map(|x| x as char)
.collect();
Or, using the From<u8> instance of char:
let s: String = thread_rng()
.sample_iter(&Alphanumeric)
.take(30)
.map(char::from)
.collect();
Of course here, since you know every u8 must be valid UTF-8, you can use String::from_utf8_unchecked, which is faster than from_utf8_lossy (although probably around the same speed as the as char method):
let s = unsafe {
String::from_utf8_unchecked(
thread_rng()
.sample_iter(&Alphanumeric)
.take(30)
.collect::<Vec<_>>(),
)
};
If, for some reason, the unsafe bothers you and you want to stay safe, then you can use the slower String::from_utf8 and unwrap the Result, so that you get a panic instead of UB (even though the code should never panic or trigger UB):
let s = String::from_utf8(
thread_rng()
.sample_iter(&Alphanumeric)
.take(30)
.collect::<Vec<_>>(),
).unwrap();
The length of these strings should always be 30. How can I optimise the heap allocation of a Vec away? Also they could actually be char[] instead of Strings.
First of all, trust me, you don't want arrays of chars. They are not fun to work with. If you want a stack string, have a u8 array then use a function like std::str::from_utf8 or the faster std::str::from_utf8_unchecked (again only usable since you know valid utf8 will be generated.)
As to optimizing the heap allocation away, refer to this answer. Basically, it's not possible without a bit of hackiness/ugliness (such as making your own function that collects an iterator into an array of 30 elements).
Once const generics are finally stabilized, there'll be a much prettier solution.
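Const generics have been stable since Rust 1.51, so a helper that fills a fixed-size array from an iterator can be sketched without a heap allocation. The name collect_array is hypothetical, and a fixed byte string stands in for the random Alphanumeric sampler so that the example needs no external crate.

```rust
/// Fills a stack-allocated [u8; N] from an iterator (hypothetical helper).
/// Returns None if the iterator yields fewer than N items.
fn collect_array<const N: usize>(mut iter: impl Iterator<Item = u8>) -> Option<[u8; N]> {
    let mut buf = [0u8; N];
    for slot in buf.iter_mut() {
        *slot = iter.next()?;
    }
    Some(buf)
}

fn main() {
    // Stand-in source of ASCII bytes; with rand 0.8 this could instead be
    // thread_rng().sample_iter(&Alphanumeric), which also yields u8.
    let source = b"abcdefghijklmnopqrstuvwxyz0123456789".iter().copied();
    let bytes: [u8; 30] = collect_array(source).expect("source has at least 30 bytes");
    // The safe conversion validates the bytes and borrows them as a &str.
    let s = std::str::from_utf8(&bytes).expect("ASCII is valid UTF-8");
    assert_eq!(s.len(), 30);
    println!("{}", s);
}
```

The array lives on the stack and the &str only borrows it, so no String or Vec is created.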
The first example in the docs for rand::distributions::Alphanumeric shows that if you want to convert the u8s into chars you should map them using the char::from function:
use rand::{thread_rng, Rng};
use rand::distributions::Alphanumeric;
fn main() {
let rand_string: String = thread_rng()
.sample_iter(&Alphanumeric)
.map(char::from) // map added here
.take(30)
.collect();
println!("{}", rand_string);
}

How to get the last character of a &str?

In Python, this would be final_char = mystring[-1]. How can I do the same in Rust?
I have tried
mystring[mystring.len() - 1]
but I get the error the type 'str' cannot be indexed by 'usize'
That is how you get the last char (which may not be what you think of as a "character"):
mystring.chars().last().unwrap();
Use unwrap only if you are sure that there is at least one char in your string.
Warning: About the general case (doing the same thing as mystring[-n] in Python): UTF-8 strings are not meant to be used through indexing, because indexing is not an O(1) operation (a string in Rust is not an array of characters). Please read this for more information.
However, if you want to index from the end like in Python, you must do this in Rust:
mystring.chars().rev().nth(n - 1) // Python: mystring[-n]
and check if there is such a character.
If you miss the simplicity of Python syntax, you can write your own extension:
trait StrExt {
fn from_end(&self, n: usize) -> char;
}
impl<'a> StrExt for &'a str {
fn from_end(&self, n: usize) -> char {
self.chars().rev().nth(n).expect("Index out of range in 'from_end'")
}
}
fn main() {
println!("{}", "foobar".from_end(2)) // prints 'b'
}
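If a panic on a short string is not acceptable, a non-panicking variant can return an Option instead. This is only a sketch: the function name char_from_end is made up, and it counts from 1 like Python's mystring[-n], unlike the zero-based from_end above.

```rust
/// Returns the n-th char counted from the end, 1-based like Python's s[-n].
/// Returns None when n is 0 or larger than the number of chars.
fn char_from_end(s: &str, n: usize) -> Option<char> {
    // checked_sub turns n == 0 into None instead of underflowing.
    s.chars().rev().nth(n.checked_sub(1)?)
}

fn main() {
    assert_eq!(char_from_end("foobar", 1), Some('r'));
    assert_eq!(char_from_end("foobar", 3), Some('b'));
    assert_eq!(char_from_end("foobar", 7), None);
}
```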
One option is to use slices. Here's an example:
let len = my_str.len();
let final_str = &my_str[len-1..];
This returns a string slice from position len-1 through the end of the string. That is to say, the last byte of your string. If your string consists of only ASCII values, then you'll get the final character of your string.
The reason why this only works with ASCII values is because they only ever require one byte of storage. Anything else, and Rust is likely to panic at runtime. This is what happens when you try to slice out one byte from a 2-byte character.
For a more detailed explanation, please see the strings section of the Rust book.
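The slicing idea can be kept for non-ASCII input by asking the string where its last char starts, rather than assuming it is one byte long. A minimal sketch, with last_char_slice as a made-up name:

```rust
/// Returns the last char of `s` as a string slice, or None if `s` is empty.
fn last_char_slice(s: &str) -> Option<&str> {
    // char_indices yields (byte offset, char); next_back takes the final pair,
    // so the offset is always a valid char boundary.
    s.char_indices().next_back().map(|(start, _)| &s[start..])
}

fn main() {
    assert_eq!(last_char_slice("hello"), Some("o"));
    assert_eq!(last_char_slice("世界"), Some("界"));
    assert_eq!(last_char_slice(""), None);
}
```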
As @Boiethios mentioned:
let last_ch = mystring.chars().last().unwrap();
Or
let last_ch = mystring.chars().rev().nth(0).unwrap();
I would rather have (how hard is that!?)
let last_ch = mystring.chars(-1); // Not implemented as of rustc 1.56.1

Why does a truncated string Rust print as an empty pair of parenthesis?

I have
use std::io;
fn main() {
println!("CHAR COUNT");
let mut guess = String::new();
io::stdin().read_line(&mut guess).expect(
"Failed to read line",
);
let string_length = guess.len() - 2;
let correct_string_length = guess.truncate(string_length);
println!("Your text: {}", guess);
println!("Your texts wrong length is: {}", string_length);
println!("Your texts correct length: {}", correct_string_length);
}
The last line gives me
error[E0277]: the trait bound `(): std::fmt::Display` is not satisfied
--> src/main.rs:15:47
|
15 | println!("Your texts correct length: {}", correct_string_length);
| ^^^^^^^^^^^^^^^^^^^^^ `()` cannot be formatted with the default formatter; try using `:?` instead if you are using a format string
|
= help: the trait `std::fmt::Display` is not implemented for `()`
= note: required by `std::fmt::Display::fmt`
What am I doing wrong? If I use {:?} then I get () instead of a formatted string.
When in doubt, go to the docs - here's the function signature of String::truncate:
fn truncate(&mut self, new_len: usize)
Note two things:
It takes self as &mut.
It has no return value.
From that, the problem becomes pretty clear - truncate does not return a new truncated string, it truncates the existing string in place.
This might seem a little unintuitive at first, but Rust APIs tend not to allocate new objects in memory unless you specifically ask them to. If you're never going to use guess again, then it'd be inefficient to create a whole new String. If you wanted to make a truncated copy, then you'd need to be explicit (note that the copy must be declared mut, since truncate takes &mut self):
let mut truncated = guess.clone();
truncated.truncate(string_length);
Or if you just wanted to reference part of the existing string, you could do what Ryan's answer suggests.
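The difference between truncating in place and truncating a copy can be sketched as follows. The helper truncated_copy and the sample input are assumptions for illustration; the input imitates a line read on a system that ends lines with "\r\n".

```rust
/// Returns a truncated copy and leaves the original untouched
/// (hypothetical helper). Like String::truncate, this panics if
/// new_len does not fall on a char boundary.
fn truncated_copy(s: &str, new_len: usize) -> String {
    let mut copy = s.to_owned();
    copy.truncate(new_len);
    copy
}

fn main() {
    let mut guess = String::from("hello\r\n");

    // Explicit copy: allocates a new String, `guess` is unchanged.
    let copy = truncated_copy(&guess, 5);
    assert_eq!(copy, "hello");
    assert_eq!(guess, "hello\r\n");

    // In place: truncate returns the unit value (), which is why it
    // cannot be printed with {}.
    let unit: () = guess.truncate(5);
    assert_eq!(unit, ());
    assert_eq!(guess, "hello");
}
```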
Just to complement the other answers here:
Attempting to truncate or slice a string in Rust at an index that is not on a character boundary will cause a runtime panic.
So while this works now:
let correct_string_length = &guess[..string_length];
If the string contains wider (multi-byte) characters, the same code may panic at runtime. This is especially relevant when you're truncating user input, which could be anything. For example:
fn main() {
let s = "Hello, 世界";
println!("{}", &s[..8]); // <--- panic
}
You can use the str::is_char_boundary(usize) method to make sure you're not about to break up a wide character accidentally:
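fn print_safely(s: &str, mut idx: usize) {
    // Clamp to the string length, which is always a char boundary.
    idx = idx.min(s.len());
    // Move forward until idx no longer falls inside a multi-byte character.
    while !s.is_char_boundary(idx) {
        idx += 1;
    }
    println!("{}", &s[..idx]);
}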
User input could be anything so this is just something to consider.
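print_safely moves idx forward to the next boundary, so the printed prefix can be longer than requested. When the byte count is an upper limit, rounding down is the safer direction. The sketch below assumes that goal; safe_prefix is a made-up name.

```rust
/// Returns the longest prefix of `s` that is at most `max_bytes` long
/// and ends on a char boundary (hypothetical helper).
fn safe_prefix(s: &str, max_bytes: usize) -> &str {
    let mut idx = max_bytes.min(s.len());
    // Index 0 is always a boundary, so this loop terminates.
    while !s.is_char_boundary(idx) {
        idx -= 1;
    }
    &s[..idx]
}

fn main() {
    let s = "Hello, 世界";
    // Byte 8 falls inside '世' (bytes 7..10), so the prefix stops at byte 7.
    assert_eq!(safe_prefix(s, 8), "Hello, ");
    assert_eq!(safe_prefix(s, 10), "Hello, 世");
    assert_eq!(safe_prefix(s, 100), s);
}
```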
Playground link: http://play.integer32.com/?gist=632ff6c81c56f9ba52e0837ff25939bc&version=stable
truncate operates in place, which is why it returns (). Looks like you’re just looking for a regular non-mutating substring:
let correct_string_length = &guess[..string_length];

Searching for a matching subslice? [duplicate]

I have a &[u8] slice over a binary buffer. I need to parse it, but a lot of the methods that I would like to use (such as str::find) don't seem to be available on slices.
I've seen that I can convert both my buffer slice and my pattern to str by using from_utf8_unchecked(), but that seems a little dangerous (and also really hacky).
How can I find a subsequence in this slice? I actually need the index of the pattern, not just a slice view of the parts, so I don't think split will work.
Here's a simple implementation based on the windows iterator.
fn find_subsequence(haystack: &[u8], needle: &[u8]) -> Option<usize> {
haystack.windows(needle.len()).position(|window| window == needle)
}
fn main() {
assert_eq!(find_subsequence(b"qwertyuiop", b"tyu"), Some(4));
assert_eq!(find_subsequence(b"qwertyuiop", b"asd"), None);
}
The find_subsequence function can also be made generic:
fn find_subsequence<T>(haystack: &[T], needle: &[T]) -> Option<usize>
where for<'a> &'a [T]: PartialEq
{
haystack.windows(needle.len()).position(|window| window == needle)
}
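One edge case worth guarding: slice::windows panics when the window size is 0, so an empty needle would crash the versions above. A sketch with an explicit check follows; treating an empty needle as a match at offset 0 is an assumption, chosen to mirror what str::find does with an empty pattern.

```rust
/// Finds the first index at which `needle` occurs in `haystack`.
fn find_subsequence<T: PartialEq>(haystack: &[T], needle: &[T]) -> Option<usize> {
    // windows(0) would panic, so handle the empty needle up front.
    // Assumption: an empty needle matches at offset 0.
    if needle.is_empty() {
        return Some(0);
    }
    haystack
        .windows(needle.len())
        .position(|window| window == needle)
}

fn main() {
    assert_eq!(find_subsequence(b"qwertyuiop", b"tyu"), Some(4));
    assert_eq!(find_subsequence(b"qwertyuiop", b"asd"), None);
    assert_eq!(find_subsequence(b"qwertyuiop", b""), Some(0));
    // A needle longer than the haystack produces no windows at all.
    assert_eq!(find_subsequence(b"ab", b"abc"), None);
}
```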
I don't think the standard library contains a function for this. Some libcs have memmem, but at the moment the libc crate does not wrap this. You can use the twoway crate however. rust-bio implements some pattern matching algorithms, too. All of those should be faster than using haystack.windows(..).position(..)
I found the memmem crate useful for this task:
use memmem::{Searcher, TwoWaySearcher};
let search = TwoWaySearcher::new("dog".as_bytes());
assert_eq!(
search.search_in("The quick brown fox jumped over the lazy dog.".as_bytes()),
Some(41)
);
How about Regex on bytes? That looks very powerful. See this Rust playground demo.
extern crate regex;
use regex::bytes::Regex;
fn main() {
//see https://doc.rust-lang.org/regex/regex/bytes/
let re = Regex::new(r"say [^,]*").unwrap();
let text = b"say foo, say bar, say baz";
    // Collect the starting byte offset of each match.
    // The unwrap is OK here since group 0 (the whole match) always exists.
    let starts: Vec<usize> =
        re.captures_iter(text)
            .map(|c| c.get(0).unwrap().start())
            .collect();
    assert_eq!(starts, vec![0, 9, 18]);
}

Repeat string with integer multiplication

Is there an easy way to do the following (from Python) in Rust?
>>> print ("Repeat" * 4)
RepeatRepeatRepeatRepeat
I'm starting to learn the language, and it seems String doesn't override Mul, and I can't find any discussion anywhere on a compact way of doing this (other than a map or loop).
Rust 1.16+
str::repeat is now available:
fn main() {
let repeated = "Repeat".repeat(4);
println!("{}", repeated);
}
Rust 1.0+
You can use iter::repeat:
use std::iter;
fn main() {
let repeated: String = iter::repeat("Repeat").take(4).collect();
println!("{}", repeated);
}
This also has the benefit of being more generic — it creates an infinitely repeating iterator of any type that is cloneable.
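As a small illustration of that genericity, the same iter::repeat pattern works for a single char or for any other Clone value. The helper names repeat_char and repeat_value are hypothetical.

```rust
use std::iter;

/// Builds a String of `n` copies of one char (hypothetical helper).
fn repeat_char(c: char, n: usize) -> String {
    iter::repeat(c).take(n).collect()
}

/// Builds a Vec of `n` clones of any Clone value (hypothetical helper).
fn repeat_value<T: Clone>(value: T, n: usize) -> Vec<T> {
    iter::repeat(value).take(n).collect()
}

fn main() {
    assert_eq!(repeat_char('-', 5), "-----");
    assert_eq!(repeat_value(7, 3), vec![7, 7, 7]);
    assert_eq!(repeat_value("ab", 2).concat(), "abab");
}
```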
This one doesn't use Iterator::map but Iterator::fold instead:
fn main() {
println!("{:?}", (1..5).fold(String::new(), |b, _| b + "Repeat"));
}

Resources