How to generate a random String of alphanumeric chars?

The first part of the question is probably pretty common and there are enough code samples that explain how to generate a random string of alphanumerics. The piece of code I use is from here.
use rand::{thread_rng, Rng};
use rand::distributions::Alphanumeric;
fn main() {
let rand_string: String = thread_rng()
.sample_iter(&Alphanumeric)
.take(30)
.collect();
println!("{}", rand_string);
}
However, this piece of code does not compile (note: I'm on nightly):
error[E0277]: a value of type `String` cannot be built from an iterator over elements of type `u8`
--> src/main.rs:8:10
|
8 | .collect();
| ^^^^^^^ value of type `String` cannot be built from `std::iter::Iterator<Item=u8>`
|
= help: the trait `FromIterator<u8>` is not implemented for `String`
Ok, the elements that are generated are of type u8. So I guess this is an array or vector of u8:
use rand::{thread_rng, Rng};
use rand::distributions::Alphanumeric;
fn main() {
let r = thread_rng()
.sample_iter(&Alphanumeric)
.take(30)
.collect::<Vec<_>>();
let s = String::from_utf8_lossy(&r);
println!("{}", s);
}
And this compiles and works!
2dCsTqoNUR1f0EzRV60IiuHlaM4TfK
All good, except that I would like to ask if someone could explain what exactly happens regarding the types and how this can be optimised.
Questions
.sample_iter(&Alphanumeric) produces u8 and not chars?
How can I avoid the second variable s and directly interpret a u8 as a UTF-8 character? I guess the representation in memory would not change at all?
The length of these strings should always be 30. How can I optimise the heap allocation of a Vec away? Also they could actually be char[] instead of Strings.

.sample_iter(&Alphanumeric) produces u8 and not chars?
Yes, this was changed in rand v0.8. You can see in the docs for 0.7.3:
impl Distribution<char> for Alphanumeric
But then in the docs for 0.8.0:
impl Distribution<u8> for Alphanumeric
How can I avoid the second variable s and directly interpret a u8 as a UTF-8 character? I guess the representation in memory would not change at all?
There are a couple of ways to do this, the most obvious being to just cast every u8 to a char:
let s: String = thread_rng()
.sample_iter(&Alphanumeric)
.take(30)
.map(|x| x as char)
.collect();
Or, using the From<u8> instance of char:
let s: String = thread_rng()
.sample_iter(&Alphanumeric)
.take(30)
.map(char::from)
.collect();
Of course here, since you know every u8 must be valid UTF-8, you can use String::from_utf8_unchecked, which is faster than from_utf8_lossy (although probably around the same speed as the as char method):
let s = unsafe {
String::from_utf8_unchecked(
thread_rng()
.sample_iter(&Alphanumeric)
.take(30)
.collect::<Vec<_>>(),
)
};
If, for some reason, the unsafe bothers you and you want to stay safe, then you can use the slower String::from_utf8 and unwrap the Result, so you get a panic instead of UB (even though this code should never trigger either):
let s = String::from_utf8(
thread_rng()
.sample_iter(&Alphanumeric)
.take(30)
.collect::<Vec<_>>(),
).unwrap();
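To make the difference between these conversions concrete, here is a minimal std-only sketch. The function names (via_cast, via_char_from, via_from_utf8) are made up for illustration. The cast and char::from treat each byte as a code point on its own, while String::from_utf8 decodes the buffer as UTF-8. For alphanumeric bytes the results are identical; they only diverge for bytes above 0x7F.

```rust
/// Converts each byte with an `as` cast (byte value becomes the code point).
fn via_cast(bytes: &[u8]) -> String {
    bytes.iter().map(|&b| b as char).collect()
}

/// Same per-byte conversion, going through `char::from`.
fn via_char_from(bytes: &[u8]) -> String {
    bytes.iter().copied().map(char::from).collect()
}

/// Decodes the whole buffer as UTF-8 and takes ownership of the Vec.
fn via_from_utf8(bytes: Vec<u8>) -> Result<String, std::string::FromUtf8Error> {
    String::from_utf8(bytes)
}
```

Calling all three on b"2dCsTq" gives the same string. Calling them on the two bytes 0xC3 0xA9 does not: the per-byte versions produce two chars (U+00C3 and U+00A9), while from_utf8 produces the single char U+00E9.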
The length of these strings should always be 30. How can I optimise the heap allocation of a Vec away? Also they could actually be char[] instead of Strings.
First of all, trust me, you don't want arrays of chars. They are not fun to work with. If you want a stack string, have a u8 array then use a function like std::str::from_utf8 or the faster std::str::from_utf8_unchecked (again only usable since you know valid utf8 will be generated.)
As to optimizing the heap allocation away, refer to this answer. Basically, it's not possible without a bit of hackiness/ugliness (such as making your own function that collects an iterator into an array of 30 elements).
Once const generics are finally stabilized, there'll be a much prettier solution.
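Const generics have since been stabilized (Rust 1.51), so the "collect into an array" helper can be written generically. The following is only a sketch: collect_array and as_str are hypothetical helper names, and the byte iterator is left as a parameter so the example does not depend on rand.

```rust
/// Fills a fixed-size array from an iterator of bytes without touching the heap.
/// Returns None if the iterator runs out before the array is full.
fn collect_array<const N: usize>(mut bytes: impl Iterator<Item = u8>) -> Option<[u8; N]> {
    let mut out = [0u8; N];
    for slot in out.iter_mut() {
        *slot = bytes.next()?;
    }
    Some(out)
}

/// Borrows the stack buffer as a &str, checking that it is valid UTF-8.
fn as_str<const N: usize>(buf: &[u8; N]) -> Option<&str> {
    std::str::from_utf8(buf).ok()
}
```

With rand 0.8, the iterator passed in could be thread_rng().sample_iter(&Alphanumeric), with the target type written as [u8; 30].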

The first example in the docs for rand::distributions::Alphanumeric shows that if you want to convert the u8s into chars you should map them using the char::from function:
use rand::{thread_rng, Rng};
use rand::distributions::Alphanumeric;
fn main() {
let rand_string: String = thread_rng()
.sample_iter(&Alphanumeric)
.map(char::from) // map added here
.take(30)
.collect();
println!("{}", rand_string);
}

Related

How to rotate a vector without standard library?

I'm getting into Rust and Arduino at the same time.
I was programming my LCD display to show a long string by rotating it through the top row of characters. That is, every second I shift all characters by one position and show the new String.
This was fairly complex in the Arduino language, especially because I had to know the size of the String at compile time (given my limited knowledge).
Since I'd like to use Rust in the long term, I was curious to see if that could be done more easily in a modern language. Not so much.
This is the code I came up with, after hours of experimentation:
#![no_std]
extern crate alloc;
use alloc::{vec::Vec};
fn main() {
}
fn rotate_by<T: Copy>(rotate: Vec<T>, by: isize) -> Vec<T> {
let real_by = modulo(by, rotate.len() as isize) as usize;
Vec::from_iter(rotate[real_by..].iter().chain(rotate[..real_by].iter()).cloned())
}
fn modulo(a: isize, b: isize) -> isize {
a - b * (a as f64 /b as f64).floor() as isize
}
mod tests {
use super::*;
#[test]
fn test_rotate_five() {
let chars: Vec<_> = "I am the string and you should rotate me! ".chars().collect();
let res_chars: Vec<_> = "the string and you should rotate me! I am ".chars().collect();
assert_eq!(rotate_by(chars, 5), res_chars);
}
}
My questions are:
Could you provide an optimized version of this function? I'm aware that there already is Vec::rotate but it uses unsafe code and can panic, which I would like to avoid (by returning a Result).
Explain whether or not it is possible to achieve this in-place without unsafe code (I failed).
Is Vec<_> the most efficient data structure to work with? I tried hard to use [char], which I thought would be more efficient, but then I have to know the size at compile time, which hardly works. I thought Rust arrays would be similar to Java arrays, which can be sized at runtime yet are also fixed size once created, but they seem to have a lot more constraints.
Oh and also what happens if I index into a vector at an invalid index? Will it panic? Can I do this better? Without "manually" checking the validity of the slice indices?
I realize that's a lot of questions, but I'm struggling and this is bugging me a lot, so if somebody could set me straight it would be much appreciated!
You can use slice::rotate_left and slice::rotate_right:
#![no_std]
extern crate alloc;
use alloc::vec::Vec;
fn rotate_by<T>(data: &mut [T], by: isize) {
if by > 0 {
data.rotate_left(by.unsigned_abs());
} else {
data.rotate_right(by.unsigned_abs());
}
}
I made it rotate in-place because that is more efficient. If you don't want to do it in-place you still have the option of cloning the vector first, so this is more flexible than if the function creates a new vector, as you have done, because you aren't able to opt out of that when you call it.
Notice that rotate_by takes a mutable slice, but you can still pass a mutable reference to a vector, because of deref coercion.
#[test]
fn test_rotate_five() {
let mut chars: Vec<_> = "I am the string and you should rotate me! ".chars().collect();
let res_chars: Vec<_> = "the string and you should rotate me! I am ".chars().collect();
rotate_by(&mut chars, 5);
assert_eq!(chars, res_chars);
}
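One thing to keep in mind: slice::rotate_left and slice::rotate_right panic when the shift is larger than the slice length. Since the question asks for something that cannot panic, here is a sketch of a wrapping variant. The name rotate_by_wrapping is made up, and it assumes that reducing the shift modulo the length is the behavior you want.

```rust
/// Rotates `data` left by `by` positions (right for negative `by`),
/// wrapping the shift around the slice length so it never exceeds it.
fn rotate_by_wrapping<T>(data: &mut [T], by: isize) {
    let len = data.len();
    if len == 0 {
        return; // nothing to rotate, and avoids a division by zero below
    }
    // rem_euclid yields a value in 0..len even when `by` is negative,
    // so a right rotation by k becomes a left rotation by len - k.
    let shift = by.rem_euclid(len as isize) as usize;
    data.rotate_left(shift);
}
```

For a five-element slice, a shift of 7 behaves like a shift of 2, and a shift of -2 undoes a shift of 2.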
There are some edge cases with moving chars around like this because some valid UTF-8 will contain grapheme clusters that are made up of multiple codepoints (chars in Rust). This will result in strange effects when a grapheme cluster is split between the start and end of the string. For example, if the 'é' in "abcdéfghijk" is written as 'e' followed by the combining acute accent U+0301, rotating by 5 will result in "\u{301}fghijkabcde", with the acute accent stranded at the start, away from the 'e' at the end.
If your strings are ASCII then you don't have to worry about that, but then you can also just treat them as byte strings anyway:
#[test]
fn test_rotate_five_ascii() {
let mut chars = b"I am the string and you should rotate me! ".clone();
let res_chars = b"the string and you should rotate me! I am ";
rotate_by(&mut chars, 5);
assert_eq!(chars, &res_chars[..]);
}
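The grapheme edge case mentioned above is easy to reproduce with only the standard library. This sketch uses a hypothetical helper, rotate_chars, that rotates by chars (codepoints) and returns a new String.

```rust
/// Rotates a string left by `by` chars (codepoints, not grapheme clusters).
fn rotate_chars(s: &str, by: usize) -> String {
    let mut chars: Vec<char> = s.chars().collect();
    if !chars.is_empty() {
        // Reduce the shift so rotate_left cannot panic.
        let shift = by % chars.len();
        chars.rotate_left(shift);
    }
    chars.into_iter().collect()
}
```

Rotating "abcde\u{301}fghijk" by 5 yields "\u{301}fghijkabcde": the combining accent ends up at the start and the 'e' it belonged to at the end.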

Unknown size at compile time when trying to print string contents in Rust

I have a couple of pieces of code, one errors out and the other doesn't, and I don't understand why.
The one that errors out when compiling:
fn main() {
let s1 = String::from("hello");
println!("{}", *s1);
}
This throws: doesn't have a size known at compile-time, on the line println!("{}", *s1);
The one that works:
fn main() {
let s1 = String::from("hello");
print_string(&s1);
}
fn print_string(s1: &String) {
println!("{}", *s1);
}
Why is this happening? Aren't both correct ways to access the string contents and printing them?
In the first snippet you’re dereferencing a String. This yields a str, which is a dynamically sized type (sometimes called an unsized type in older texts). DSTs are somewhat difficult to use directly.
In the second snippet you’re dereferencing a &String, which yields a regular String, which is a normal sized type.
In both cases the dereference is completely useless, why are you even using one?
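A small sketch of the two cases, using format! instead of println! so the result can be inspected (show_owned and show_borrowed are made-up names). If you do dereference an owned String, re-borrowing the result as &*s gives a &str, which is sized and formats fine.

```rust
/// `s` is an owned String: `*s` has type `str` (unsized),
/// so it is re-borrowed as `&str` before being formatted.
fn show_owned(s: String) -> String {
    format!("{}", &*s)
}

/// `s` is a `&String`: `*s` has type `String` (sized).
/// Passing `s` directly works just as well, so the deref is unnecessary.
fn show_borrowed(s: &String) -> String {
    format!("{} {}", *s, s)
}
```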

Cast vector of i8 to vector of u8 in Rust? [duplicate]

This question already has answers here:
How do I convert a Vec<T> to a Vec<U> without copying the vector?
(2 answers)
Closed 3 years ago.
Is there a better way to cast Vec<i8> to Vec<u8> in Rust than these two?
1. creating a copy by mapping and casting every entry
2. using std::mem::transmute
(1) is slow, and (2) is a case of "transmute should be the absolute last resort" according to the docs.
A bit of background maybe: I'm getting a Vec<i8> from the unsafe gl::GetShaderInfoLog() call and want to create a string from this vector of chars by using String::from_utf8().
The other answers provide excellent solutions for the underlying problem of creating a string from Vec<i8>. To answer the question as posed, creating a Vec<u8> from data in a Vec<i8> can be done without copying or transmuting the vector. As pointed out by @trentcl, transmuting the vector directly constitutes undefined behavior because Vec is allowed to have different layout for different types.
The correct (though still requiring the use of unsafe) way to transfer a vector's data without copying it is:
obtain the *mut i8 pointer to the data in the vector, along with its length and capacity
leak the original vector to prevent it from freeing the data
use Vec::from_raw_parts to build a new vector, giving it the pointer cast to *mut u8 - this is the unsafe part, because we are vouching that the pointer contains valid and initialized data, and that it is not in use by other objects, and so on.
This is not UB because the new Vec is given the pointer of the correct type from the start. Code:
fn vec_i8_into_u8(v: Vec<i8>) -> Vec<u8> {
// ideally we'd use Vec::into_raw_parts, but it's unstable,
// so we have to do it manually:
// first, make sure v's destructor doesn't free the data
// it thinks it owns when it goes out of scope
let mut v = std::mem::ManuallyDrop::new(v);
// then, pick apart the existing Vec
let p = v.as_mut_ptr();
let len = v.len();
let cap = v.capacity();
// finally, adopt the data into a new Vec
unsafe { Vec::from_raw_parts(p as *mut u8, len, cap) }
}
fn main() {
let v = vec![-1i8, 2, 3];
assert!(vec_i8_into_u8(v) == vec![255u8, 2, 3]);
}
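For comparison, the fully safe version (option 1 from the question) is a one-liner. This is only a sketch with a made-up function name. Whether the standard library reuses the original allocation here is an implementation detail, so treat it as potentially copying.

```rust
/// Safe conversion: reinterprets each i8 as a u8 with an `as` cast
/// (so -1 becomes 255) and collects the results into a new Vec.
fn vec_i8_to_u8_copy(v: Vec<i8>) -> Vec<u8> {
    v.into_iter().map(|x| x as u8).collect()
}
```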
transmute on a Vec is always, 100% wrong, causing undefined behavior, because the layout of Vec is not specified. However, as the page you linked also mentions, you can use raw pointers and Vec::from_raw_parts to perform this correctly. user4815162342's answer shows how.
(std::mem::transmute is the only item in the Rust standard library whose documentation consists mostly of suggestions for how not to use it. Take that how you will.)
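However, in this case, from_raw_parts is also unnecessary. The best way to deal with C strings in Rust is with the wrappers in std::ffi, CStr and CString. There may be better ways to work this in to your real code, but here's one way you could use CStr to borrow the log buffer as a &str. Because CStr::from_bytes_with_nul takes a &[u8], this sketch allocates the buffer as a Vec<u8> and casts only the raw pointer to *mut GLchar for the call:
use std::ffi::CStr;

const BUF_SIZE: usize = 1000;
let mut info_log: Vec<u8> = vec![0; BUF_SIZE];
// GL reports the length without the trailing nul byte.
let mut len: gl::types::GLsizei = 0;
unsafe {
    gl::GetShaderInfoLog(
        shader,
        BUF_SIZE as gl::types::GLsizei,
        &mut len,
        info_log.as_mut_ptr() as *mut gl::types::GLchar,
    );
}
// Include the nul terminator in the slice handed to CStr.
let log = CStr::from_bytes_with_nul(&info_log[..len as usize + 1])
    .expect("Slice must be nul terminated and contain no nul bytes")
    .to_str()
    .expect("Slice must be valid UTF-8 text");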
Notice there is no unsafe code except to call the FFI function; you could also use with_capacity + set_len (as in wasmup's answer) to skip initializing the Vec to 1000 zeros, and use from_bytes_with_nul_unchecked to skip checking the validity of the returned string.
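If you don't track the length at all, recent stable Rust also has CStr::from_bytes_until_nul, which stops at the first nul byte in a buffer. A minimal sketch, assuming the log has already been written into a zero-initialized byte buffer (log_from_buffer is a hypothetical helper, not part of any GL binding):

```rust
use std::ffi::CStr;

/// Borrows the text up to the first nul byte as a &str.
/// Returns None if there is no nul byte or the text is not valid UTF-8.
fn log_from_buffer(buf: &[u8]) -> Option<&str> {
    CStr::from_bytes_until_nul(buf).ok()?.to_str().ok()
}
```

This stays in safe code and borrows from the buffer instead of copying it.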
See this:
fn get_compilation_log(&self) -> String {
let mut len = 0;
unsafe { gl::GetShaderiv(self.id, gl::INFO_LOG_LENGTH, &mut len) };
assert!(len > 0);
let mut buf = Vec::with_capacity(len as usize);
let buf_ptr = buf.as_mut_ptr() as *mut gl::types::GLchar;
unsafe {
gl::GetShaderInfoLog(self.id, len, std::ptr::null_mut(), buf_ptr);
buf.set_len(len as usize);
};
match String::from_utf8(buf) {
Ok(log) => log,
Err(vec) => panic!("Could not convert compilation log from buffer: {}", vec),
}
}
See ffi:
// CStr::from_ptr is unsafe: strz_ptr must point to a valid nul-terminated string.
let s = unsafe { CStr::from_ptr(strz_ptr) }.to_str().unwrap();

Reversing a string in Rust

What is wrong with this:
fn main() {
let word: &str = "lowks";
assert_eq!(word.chars().rev(), "skwol");
}
I get an error like this:
error[E0369]: binary operation `==` cannot be applied to type `std::iter::Rev<std::str::Chars<'_>>`
--> src/main.rs:4:5
|
4 | assert_eq!(word.chars().rev(), "skwol");
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
= note: an implementation of `std::cmp::PartialEq` might be missing for `std::iter::Rev<std::str::Chars<'_>>`
= note: this error originates in a macro outside of the current crate
What is the correct way to do this?
Since, as @DK. suggested, .graphemes() isn't available on &str in stable, you might as well just do what @huon suggested in the comments:
fn main() {
let foo = "palimpsest";
println!("{}", foo.chars().rev().collect::<String>());
}
The first, and most fundamental, problem is that this isn't how you reverse a Unicode string. You are reversing the order of the code points, where you want to reverse the order of graphemes. There may be other issues with this that I'm not aware of. Text is hard.
The second issue is pointed out by the compiler: you are trying to compare a string literal to a char iterator. chars and rev don't produce new strings, they produce lazy sequences, as with iterators in general. The following works:
/*!
Add the following to your `Cargo.toml`:
```cargo
[dependencies]
unicode-segmentation = "0.1.2"
```
*/
extern crate unicode_segmentation;
use unicode_segmentation::UnicodeSegmentation;
fn main() {
let word: &str = "loẅks";
let drow: String = word
// Split the string into an Iterator of &strs, where each element is an
// extended grapheme cluster.
.graphemes(true)
// Reverse the order of the grapheme iterator.
.rev()
// Collect all the chars into a new owned String.
.collect();
assert_eq!(drow, "skẅol");
// Print it out to be sure.
println!("drow = `{}`", drow);
}
Note that graphemes used to be in the standard library as an unstable method, so the above will break with sufficiently old versions of Rust. In that case, you need to use UnicodeSegmentation::graphemes(s, true) instead.
If you are just dealing with ASCII characters, you can make the reversal in place with the reverse function for slices.
It does something like this:
fn main() {
let mut slice = *b"lowks";
let len = slice.len();
// Swap pairs from both ends; len / 2 also handles even and empty lengths.
for i in 0..len / 2 {
slice.swap(i, len - 1 - i);
}
assert_eq!(std::str::from_utf8(&slice).unwrap(), "skwol");
}
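For an owned String, the same idea can be written without a manual loop and without unsafe, by going through the byte vector. This is a sketch under the ASCII-only assumption; reverse_ascii is a made-up name, and it refuses non-ASCII input instead of producing invalid UTF-8.

```rust
/// Reverses an ASCII-only String, reusing its buffer.
/// Returns None for input that contains non-ASCII characters.
fn reverse_ascii(s: String) -> Option<String> {
    if !s.is_ascii() {
        return None;
    }
    let mut bytes = s.into_bytes();
    bytes.reverse(); // in-place reversal of the byte buffer
    // Reversed ASCII is still valid UTF-8, so this check always passes here.
    String::from_utf8(bytes).ok()
}
```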

How to shuffle a str in place

I want to shuffle a String in place in Rust, but I seem to miss something. The fix is probably trivial...
use std::rand::{Rng, thread_rng};
fn main() {
// I want to shuffle this string...
let mut value: String = "SomeValue".to_string();
let mut bytes = value.as_bytes();
let mut slice: &mut [u8] = bytes.as_mut_slice();
thread_rng().shuffle(slice);
println!("{}", value);
}
The error I get is
<anon>:8:36: 8:41 error: cannot borrow immutable dereference of `&`-pointer `*bytes` as mutable
<anon>:8 let mut slice: &mut [u8] = bytes.as_mut_slice();
^~~~~
I read about String::as_mut_vec() but it's unsafe so I'd rather not use it.
There's no very good way to do this, partly due to the nature of the UTF-8 encoding of strings, and partly due to the inherent properties of Unicode and text.
There are at least three layers of things that could be shuffled in a UTF-8 string:
the raw bytes
the encoded codepoints
the graphemes
Shuffling raw bytes is likely to give an invalid UTF-8 string as output unless the string is entirely ASCII. Non-ASCII characters are encoded as special sequences of multiple bytes, and shuffling these will almost certainly not get them in the right order at the end. Hence shuffling bytes is often not good.
Shuffling codepoints (char in Rust) makes a little bit more sense, but there is still the concept of "special sequences", where so-called combining characters can be layered on to a single letter adding diacritics etc (e.g. letters like ä can be written as a plus U+0308, the codepoint representing the diaeresis). Hence shuffling characters won't give an invalid UTF-8 string, but it may break up these codepoint sequences and give nonsense output.
This brings me to graphemes: the sequences of codepoints that make up a single visible character (like ä is still a single grapheme when written as one or as two codepoints). This will give the most reliably sensible answer.
Then, once you've decided which of these you want to shuffle, you can pick a shuffling strategy:
if the string is guaranteed to be purely ASCII, shuffling the bytes with .shuffle is sensible (with the ASCII assumption, this is equivalent to the others)
otherwise, there's no standard way to operate in-place, one would get the elements as an iterator (.chars() for codepoints or .graphemes(true) for graphemes), place them into a vector with .collect::<Vec<_>>(), shuffle the vector, and then collect everything back into a new String with e.g. .iter().map(|x| *x).collect::<String>().
The difficulty of handling codepoints and graphemes is because UTF-8 does not encode them as fixed width, so there's no way to take a random codepoint/grapheme out and insert it somewhere else, or otherwise swap two elements efficiently, without just decoding everything into an external Vec.
Not being in-place is unfortunate, but strings are hard.
(If your strings are guaranteed to be ASCII, then using a type like the Ascii provided by ascii would be a good way to keep things straight, at the type-level.)
As an example of the difference of the three things, take a look at:
fn main() {
let s = "U͍̤͕̜̲̼̜n̹͉̭͜ͅi̷̪c̠͍̖̻o̸̯̖de̮̻͍̤";
println!("bytes: {}", s.bytes().count());
println!("chars: {}", s.chars().count());
println!("graphemes: {}", s.graphemes(true).count());
}
It prints:
bytes: 57
chars: 32
graphemes: 7
(Generate your own; it demonstrates putting multiple combining characters on to a single letter.)
Putting together the suggestion above:
use std::rand::{Rng, thread_rng};
fn str_shuffled(s: &str) -> String {
let mut graphemes = s.graphemes(true).collect::<Vec<&str>>();
let mut gslice = graphemes.as_mut_slice();
let mut rng = thread_rng();
rng.shuffle(gslice);
gslice.iter().map(|x| *x).collect::<String>()
}
fn main() {
println!("{}", str_shuffled("Hello, World!"));
println!("{}", str_shuffled("selam dünya"));
println!("{}", str_shuffled("你好世界"));
println!("{}", str_shuffled("γειά σου κόσμος"));
println!("{}", str_shuffled("Здравствулте мир"));
}
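The code above targets a very old Rust, where rand and graphemes were still part of std. As a rough std-only sketch of the same collect, shuffle, collect strategy on current Rust, the following shuffles by chars (not graphemes, which need the unicode-segmentation crate). The XorShift64 generator is a tiny stand-in written for this example, not a replacement for the rand crate, and it is not suitable for anything security-related.

```rust
/// Minimal xorshift pseudo-random generator (illustration only).
struct XorShift64(u64);

impl XorShift64 {
    fn new(seed: u64) -> Self {
        // Xorshift must not start from zero, so substitute a fixed constant.
        XorShift64(if seed == 0 { 0x9E37_79B9_7F4A_7C15 } else { seed })
    }

    fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }
}

/// Decodes the string into chars, runs a Fisher-Yates shuffle,
/// and collects the chars back into a new String.
fn shuffle_chars(s: &str, seed: u64) -> String {
    let mut chars: Vec<char> = s.chars().collect();
    let mut rng = XorShift64::new(seed);
    for i in (1..chars.len()).rev() {
        // Pick j in 0..=i (the modulo introduces a slight bias, fine for a demo).
        let j = (rng.next_u64() % (i as u64 + 1)) as usize;
        chars.swap(i, j);
    }
    chars.into_iter().collect()
}
```

The output is always valid UTF-8 and contains the same chars as the input, but combining sequences may still be pulled apart, as described above.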
I am also a beginner with Rust, but what about:
fn main() {
// I want to shuffle this string...
let value = "SomeValue".to_string();
let mut bytes = value.into_bytes();
bytes[0] = bytes[1]; // Shuffle takes place.. sorry but std::rand::thread_rng is not available in the Rust installed on my current machine.
match String::from_utf8(bytes) { // Should not copy the contents according to documentation.
Ok(s) => println!("{}", s),
_ => println!("Error occurred!")
}
}
Also keep in mind that Rust default string encoding is UTF-8 when fiddling around with sequences of bytes. ;)
This was a great suggestion, led me to the following solution, thanks!
use std::rand::{Rng, thread_rng};
fn main() {
// I want to shuffle this string...
let value: String = "SomeValue".to_string();
let mut bytes = value.into_bytes();
thread_rng().shuffle(&mut *bytes.as_mut_slice());
match String::from_utf8(bytes) { // Should not copy the contents according to documentation.
Ok(s) => println!("{}", s),
_ => println!("Error occurred!")
}
}
rustc 0.13.0-nightly (ad9e75938 2015-01-05 00:26:28 +0000)
