How to change a String into a Vec and can also modify Vec's value in Rust? - rust

I want to change a String into a vector of bytes and also modify its value, I have looked up and find How do I convert a string into a vector of bytes in rust?
but this can only get a reference and I cannot modify the vector. I want a to be 0, b to be 1 and so on, so after changing it into bytes I also need to subtract 97. Here is my attempt:
fn main() {
let s: String = "abcdefg".to_string();
let mut vec = (s.as_bytes()).clone();
println!("{:?}", vec);
for i in 0..vec.len() {
vec[i] -= 97;
}
println!("{:?}", vec);
}
but the compiler says
error[E0594]: cannot assign to `vec[_]`, which is behind a `&` reference
Can anyone help me to fix this?

You could get a Vec<u8> out of the String with the into_bytes method. An even better way, though, may be to iterate over the String's bytes with the bytes method, do the maths on the fly, and then collect the result:
fn main() {
let s = "abcdefg";
let vec: Vec<u8> = s.bytes().map(|b| b - b'a').collect();
println!("{:?}", vec); // [0, 1, 2, 3, 4, 5, 6]
}
But as #SvenMarnach correctly points out, this won't re-use s's buffer but allocate a new one. So, unless you need s again, the into_bytes method will be more efficient.

Strings in Rust are encoded in UTF-8. The (safe) interface of the String type enforces that the underlying buffer always is valid UTF-8, so it can't allow direct arbitrary byte modifications. However, you can convert a String into a Vec<u8> using the into_bytes() mehod. You can then modify the vector, and potentially convert it back to a string using String::from_utf8() if desired. The last step will verify that the buffer still is vaid UTF-8, and will fail if it isn't.
Instead of modifying the bytes of the string, you could also consider modifying the characters, which are potentially encoded by multiple bytes in the UTF-8 encoding. You can iterate over the characters of the string using the chars() method, convert each character to whatever you want, and then collect into a new string, or alternatively into a vector of integers, depending on your needs.

To understand what's going on, check the type of the vec variable. If you don't have an IDE/editor that can display the type to you, you can do this:
let mut vec: () = (s.as_bytes()).clone();
The resulting error message is explanative:
3 | let mut vec: () = (s.as_bytes()).clone();
| -- ^^^^^^^^^^^^^^^^^^^^^^ expected `()`, found `&[u8]`
So, what's happening is that the .clone() simply cloned the reference returned by as_bytes(), and didn't create a Vec<u8> from the &[u8]. In general, you can use .to_owned() in this kind of case, but in this specific case, using .into_bytes() on the String is best.

Related

How to reassign mutable String? [duplicate]

Why does Rust have String and str? What are the differences between String and str? When does one use String instead of str and vice versa? Is one of them getting deprecated?
String is the dynamic heap string type, like Vec: use it when you need to own or modify your string data.
str is an immutable1 sequence of UTF-8 bytes of dynamic length somewhere in memory. Since the size is unknown, one can only handle it behind a pointer. This means that str most commonly2 appears as &str: a reference to some UTF-8 data, normally called a "string slice" or just a "slice". A slice is just a view onto some data, and that data can be anywhere, e.g.
In static storage: a string literal "foo" is a &'static str. The data is hardcoded into the executable and loaded into memory when the program runs.
Inside a heap allocated String: String dereferences to a &str view of the String's data.
On the stack: e.g. the following creates a stack-allocated byte array, and then gets a view of that data as a &str:
use std::str;
let x: &[u8] = &[b'a', b'b', b'c'];
let stack_str: &str = str::from_utf8(x).unwrap();
In summary, use String if you need owned string data (like passing strings to other threads, or building them at runtime), and use &str if you only need a view of a string.
This is identical to the relationship between a vector Vec<T> and a slice &[T], and is similar to the relationship between by-value T and by-reference &T for general types.
1 A str is fixed-length; you cannot write bytes beyond the end, or leave trailing invalid bytes. Since UTF-8 is a variable-width encoding, this effectively forces all strs to be immutable in many cases. In general, mutation requires writing more or fewer bytes than there were before (e.g. replacing an a (1 byte) with an ä (2+ bytes) would require making more room in the str). There are specific methods that can modify a &mut str in place, mostly those that handle only ASCII characters, like make_ascii_uppercase.
2 Dynamically sized types allow things like Rc<str> for a sequence of reference counted UTF-8 bytes since Rust 1.2. Rust 1.21 allows easily creating these types.
I have a C++ background and I found it very useful to think about String and &str in C++ terms:
A Rust String is like a std::string; it owns the memory and does the dirty job of managing memory.
A Rust &str is like a char* (but a little more sophisticated); it points us to the beginning of a chunk in the same way you can get a pointer to the contents of std::string.
Are either of them going to disappear? I do not think so. They serve two purposes:
String keeps the buffer and is very practical to use. &str is lightweight and should be used to "look" into strings. You can search, split, parse, and even replace chunks without needing to allocate new memory.
&str can look inside of a String as it can point to some string literal. The following code needs to copy the literal string into the String managed memory:
let a: String = "hello rust".into();
The following code lets you use the literal itself without a copy (read-only though):
let a: &str = "hello rust";
It is str that is analogous to String, not the slice of it.
An str is a string literal, basically a pre-allocated text:
"Hello World"
This text has to be stored somewhere, so it is stored in the data section of the executable file along with the program’s machine code, as sequence of bytes ([u8]). Because text can be of any length, they are dynamically-sized:
┌─────┬─────┬─────┬─────┬─────┬─────┬─────┬─────┬─────┬─────┬─────┐
│ H │ e │ l │ l │ o │ │ W │ o │ r │ l │ d │
└─────┴─────┴─────┴─────┴─────┴─────┴─────┴─────┴─────┴─────┴─────┘
┌─────┬─────┬─────┬─────┬─────┬─────┬─────┬─────┬─────┬─────┬─────┐
│ 72 │ 101 │ 108 │ 108 │ 111 │ 32 │ 87 │ 111 │ 114 │ 108 │ 100 │
└─────┴─────┴─────┴─────┴─────┴─────┴─────┴─────┴─────┴─────┴─────┘
We need a way to access a stored text and that is where the slice comes in.
A slice,[T], is a view into a block of memory. Whether mutable or not, a slice always borrows and that is why it is always behind a pointer, &.
Lets explain the meaning of being dynamically sized. Some programming languages, like C, appends a zero byte (\0) at the end of its strings and keeps a record of the starting address. To determine a string's length, program has to walk through the raw bytes from starting position until finding this zero byte. So, length of a text can be of any size hence it is dynamically sized.
However Rust takes a different approach: It uses a slice. A slice stores the address where a str starts and how many byte it takes. It is better than appending zero byte because calculation is done in advance during compilation. Since text can be of any size, from type system perspective it is still dynamically sized.
So, "Hello World" expression returns a fat pointer, containing both the address of the actual data and its length. This pointer will be our handle to the actual data and it will also be stored in our program. Now data is behind a pointer and the compiler knows its size at compile time.
Since text is stored in the source code, it will be valid for the entire lifetime of the running program, hence will have the static lifetime.
So, return value of "Hello Word" expression should reflect these two characteristics, and it does:
let s: &'static str = "Hello World";
You may ask why its type is written as str but not as [u8], it is because data is always guaranteed to be a valid UTF-8 sequence. Not all UTF-8 characters are single byte, some take 4 bytes. So [u8] would be inaccurate.
If you disassemble a compiled Rust program and inspect the executable file, you will see multiple strs are stored adjacent to each other in the data section without any indication where one starts and the other ends.
Compiler takes it even further. If identical static text is used at multiple locations in your program, Rust compiler will optimize your program and create a single binary block in the executable's data section and each slice in your code point to this binary block.
For example, compiler creates a single continuous binary with the content of "Hello World" for the following code even though we use three different literals with "Hello World":
let x: &'static str = "Hello World";
let y: &'static str = "Hello World";
let z: &'static str = "Hello World";
String, on the other hand, is a specialized type that stores its value as vector of u8. Here is how String type is defined in the source code:
pub struct String {
vec: Vec<u8>,
}
Being vector means it is heap allocated and resizable like any other vector value.
Being specialized means it does not permit arbitrary access and enforces certain checks that data is always valid UTF-8. Other than that, it is just a vector.
So a String is a resizable buffer holding UTF-8 text. This buffer is allocated on the heap, so it can grow as needed or requested. We can fill this buffer anyway we see fit. We can change its content.
If you look carefully vec field is kept private to enforce validity. Since it is private, we can not create a String instance directly. The reason why it is kept private because not all stream of bytes produce valid utf-8 characters and direct interaction with the underlying bytes may corrupt the string. We create u8 bytes through methods and methods runs certain checks. We can say that being private and having controlled interaction via methods provides certain guarantees.
There are several methods defined on String type to create String instance, new is one of them:
pub const fn new() -> String {
String { vec: Vec::new() }
}
We can use it to create a valid String.
let s = String::new();
println("{}", s);
Unfortunately it does not accept input parameter. So result will be valid but an empty string but it will grow like any other vector when capacity is not enough to hold the assigned value. But application performance will take a hit, as growing requires re-allocation.
We can fill the underlying vector with initial values from different sources:
From a string literal
let a = "Hello World";
let s = String::from(a);
Please note that an str is still created and its content is copied to the heap allocated vector via String.from. If we check the executable binary we will see raw bytes in data section with the content "Hello World". This is very important detail some people miss.
From raw parts
let ptr = s.as_mut_ptr();
let len = s.len();
let capacity = s.capacity();
let s = String::from_raw_parts(ptr, len, capacity);
From a character
let ch = 'c';
let s = ch.to_string();
From vector of bytes
let hello_world = vec![72, 101, 108, 108, 111, 32, 87, 111, 114, 108, 100];
// We know it is valid sequence, so we can use unwrap
let hello_world = String::from_utf8(hello_world).unwrap();
println!("{}", hello_world); // Hello World
Here we have another important detail. A vector might have any value, there is no guarantee its content will be a valid UTF-8, so Rust forces us to take this into consideration by returning a Result<String, FromUtf8Error> rather than a String.
From input buffer
use std::io::{self, Read};
fn main() -> io::Result<()> {
let mut buffer = String::new();
let stdin = io::stdin();
let mut handle = stdin.lock();
handle.read_to_string(&mut buffer)?;
Ok(())
}
Or from any other type that implements ToString trait
Since String is a vector under the hood, it will exhibit some vector characteristics:
a pointer: The pointer points to an internal buffer that stores the data.
length: The length is the number of bytes currently stored in the buffer.
capacity: The capacity is the size of the buffer in bytes. So, the length will always be less than or equal to the capacity.
And it delegates some properties and methods to vectors:
pub fn capacity(&self) -> usize {
self.vec.capacity()
}
Most of the examples uses String::from, so people get confused thinking why create String from another string.
It is a long read, hope it helps.
They are actually completely different. First off, a str is nothing but a type level thing; it can only be reasoned about at the type level because it's a so-called dynamically-sized type (DST). The size the str takes up cannot be known at compile time and depends on runtime information — it cannot be stored in a variable because the compiler needs to know at compile time what the size of each variable is. A str is conceptually just a row of u8 bytes with the guarantee that it forms valid UTF-8. How large is the row? No one knows until runtime hence it can't be stored in a variable.
The interesting thing is that a &str or any other pointer to a str like Box<str> does exist at runtime. This is a so-called "fat pointer"; it's a pointer with extra information (in this case the size of the thing it's pointing at) so it's twice as large. In fact, a &str is quite close to a String (but not to a &String). A &str is two words; one pointer to a the first byte of a str and another number that describes how many bytes long the the str is.
Contrary to what is said, a str does not need to be immutable. If you can get a &mut str as an exclusive pointer to the str, you can mutate it and all the safe functions that mutate it guarantee that the UTF-8 constraint is upheld because if that is violated then we have undefined behaviour as the library assumes this constraint is true and does not check for it.
So what is a String? That's three words; two are the same as for &str but it adds a third word which is the capacity of the str buffer on the heap, always on the heap (a str is not necessarily on the heap) it manages before it's filled and has to re-allocate. the String basically owns a str as they say; it controls it and can resize it and reallocate it when it sees fit. So a String is as said closer to a &str than to a str.
Another thing is a Box<str>; this also owns a str and its runtime representation is the same as a &str but it also owns the str unlike the &str but it cannot resize it because it does not know its capacity so basically a Box<str> can be seen as a fixed-length String that cannot be resized (you can always convert it into a String if you want to resize it).
A very similar relationship exists between [T] and Vec<T> except there is no UTF-8 constraint and it can hold any type whose size is not dynamic.
The use of str on the type level is mostly to create generic abstractions with &str; it exists on the type level to be able to conveniently write traits. In theory str as a type thing didn't need to exist and only &str but that would mean a lot of extra code would have to be written that can now be generic.
&str is super useful to be able to to have multiple different substrings of a String without having to copy; as said a String owns the str on the heap it manages and if you could only create a substring of a String with a new String it would have to be copied because everything in Rust can only have one single owner to deal with memory safety. So for instance you can slice a string:
let string: String = "a string".to_string();
let substring1: &str = &string[1..3];
let substring2: &str = &string[2..4];
We have two different substring strs of the same string. string is the one that owns the actual full str buffer on the heap and the &str substrings are just fat pointers to that buffer on the heap.
str, only used as &str, is a string slice, a reference to a UTF-8 byte array.
String is what used to be ~str, a growable, owned UTF-8 byte array.
Rust &str and String
String:
Rust owned String type, the string itself lives on the heap and therefore is mutable and can alter its size and contents.
Because String is owned when the variables which owns the string goes out of scope the memory on the heap will be freed.
Variables of type String are fat pointers (pointer + associated metadata)
The fat pointer is 3 * 8 bytes (wordsize) long consists of the following 3 elements:
Pointer to actual data on the heap, it points to the first character
Length of the string (# of characters)
Capacity of the string on the heap
&str:
Rust non owned String type and is immutable by default. The string itself lives somewhere else in memory usually on the heap or 'static memory.
Because String is non owned when &str variables goes out of scope the memory of the string will not be freed.
Variables of type &str are fat pointers (pointer + associated metadata)
The fat pointer is 2 * 8 bytes (wordsize) long consists of the following 2 elements:
Pointer to actual data on the heap, it points to the first character
Length of the string (# of characters)
Example:
use std::mem;
fn main() {
// on 64 bit architecture:
println!("{}", mem::size_of::<&str>()); // 16
println!("{}", mem::size_of::<String>()); // 24
let string1: &'static str = "abc";
// string will point to `static memory which lives through the whole program
let ptr = string1.as_ptr();
let len = string1.len();
println!("{}, {}", unsafe { *ptr as char }, len); // a, 3
// len is 3 characters long so 3
// pointer to the first character points to letter a
{
let mut string2: String = "def".to_string();
let ptr = string2.as_ptr();
let len = string2.len();
let capacity = string2.capacity();
println!("{}, {}, {}", unsafe { *ptr as char }, len, capacity); // d, 3, 3
// pointer to the first character points to letter d
// len is 3 characters long so 3
// string has now 3 bytes of space on the heap
string2.push_str("ghijk"); // we can mutate String type, capacity and length will aslo change
println!("{}, {}", string2, string2.capacity()); // defghijk, 8
} // memory of string2 on the heap will be freed here because owner goes out of scope
}
std::String is simply a vector of u8. You can find its definition in source code . It's heap-allocated and growable.
#[derive(PartialOrd, Eq, Ord)]
#[stable(feature = "rust1", since = "1.0.0")]
pub struct String {
vec: Vec<u8>,
}
str is a primitive type, also called string slice. A string slice has fixed size. A literal string like let test = "hello world" has &'static str type. test is a reference to this statically allocated string.
&str cannot be modified, for example,
let mut word = "hello world";
word[0] = 's';
word.push('\n');
str does have mutable slice &mut str, for example:
pub fn split_at_mut(&mut self, mid: usize) -> (&mut str, &mut str)
let mut s = "Per Martin-Löf".to_string();
{
let (first, last) = s.split_at_mut(3);
first.make_ascii_uppercase();
assert_eq!("PER", first);
assert_eq!(" Martin-Löf", last);
}
assert_eq!("PER Martin-Löf", s);
But a small change to UTF-8 can change its byte length, and a slice cannot reallocate its referent.
In easy words, String is datatype stored on heap (just like Vec), and you have access to that location.
&str is a slice type. That means it is just reference to an already present String somewhere in the heap.
&str doesn't do any allocation at runtime. So, for memory reasons, you can use &str over String. But, keep in mind that when using &str you might have to deal with explicit lifetimes.
For C# and Java people:
Rust' String === StringBuilder
Rust's &str === (immutable) string
I like to think of a &str as a view on a string, like an interned string in Java / C# where you can't change it, only create a new one.
Some Usages
example_1.rs
fn main(){
let hello = String::("hello");
let any_char = hello[0];//error
}
example_2.rs
fn main(){
let hello = String::("hello");
for c in hello.chars() {
println!("{}",c);
}
}
example_3.rs
fn main(){
let hello = String::("String are cool");
let any_char = &hello[5..6]; // = let any_char: &str = &hello[5..6];
println!("{:?}",any_char);
}
Shadowing
fn main() {
let s: &str = "hello"; // &str
let s: String = s.to_uppercase(); // String
println!("{}", s) // HELLO
}
function
fn say_hello(to_whom: &str) { //type coercion
println!("Hey {}!", to_whom)
}
fn main(){
let string_slice: &'static str = "you";
let string: String = string_slice.into(); // &str => String
say_hello(string_slice);
say_hello(&string);// &String
}
Concat
// String is at heap, and can be increase or decrease in its size
// The size of &str is fixed.
fn main(){
let a = "Foo";
let b = "Bar";
let c = a + b; //error
// let c = a.to_string + b;
}
Note that String and &str are different types and for 99% of the time, you only should care about &str.
In Rust, str is a primitive type that represents a sequence of Unicode scalar values, also known as a string slice. This means that it is a read-only view into a string, and it does not own the memory that it points to. On the other hand, String is a growable, mutable, owned string type. This means that when you create a String, it will allocate memory on the heap to store the contents of the string, and it will deallocate this memory when the String goes out of scope. Because String is growable and mutable, you can change the contents of a String after you have created it.
In general, str is used when you want to refer to a string slice that is stored in another data structure, such as a String. String is used when you want to create and own a string value.
String is an Object.
&str is a pointer at a part of object.
In these 3 different types
let noodles = "noodles".to_string();
let oodles = &noodles[1..];
let poodles = "ಠ_ಠ"; // this is string literal
A String has a resizable buffer holding UTF-8 text. The buffer is allocated on the heap, so it can resize its buffer as needed or
requested. In the example, "noodles" is a String that owns an
eight-byte buffer, of which seven are in use. You can think of a
String as a Vec that is guaranteed to hold well-formed UTF-8; in
fact, this is how String is implemented.
A &str is a reference to a run of UTF-8 text owned by someone else: it “borrows” the text. In the example, oodles is a &str
referring to the last six bytes of the text belonging to "noodles", so
it represents the text “oodles.” Like other slice references, a &str
is a fat pointer, containing both the address of the actual data and
its length. You can think of a &str as being nothing more than a
&[u8] that is guaranteed to hold well-formed UTF-8.
A string literal is a &str that refers to preallocated text, typically stored in read-only memory along with the program’s machine
code. In the preceding example, poodles is a string literal, pointing
to seven bytes that are created when the program begins execution and
that last until it exits.
This is how they are stored in memory
Reference:Programming Rust,by Jim Blandy, Jason Orendorff, Leonora F . S. Tindall
Here is a quick and easy explanation.
String - A growable, ownable heap-allocated data structure. It can be coerced to a &str.
str - is (now, as Rust evolves) mutable, fixed-length string that lives on the heap or in the binary. You can only interact with str as a borrowed type via a string slice view, such as &str.
Usage considerations:
Prefer String if you want to own or mutate a string - such as passing the string to another thread, etc.
Prefer &str if you want to have a read-only view of a string.

Why use immutable Vector or String in Rust

I'm learning Rust and learned, that for making an expandable string or array, the String and Vec structs are used. And to modify a String or Vec, the corresponding variable needs to be annotated with mut.
let mut myString = String::from("Hello");
let mut myArray = vec![1, 2, 3];
My question is, why would you use or declare an immutable String or Vec like
let myString = String::from("Hello");
let myArray = vec![1, 2, 3];
instead of a true array or str literal like
let myString = "Hello";
let myArray = [1, 2, 3];
What would be the point of that, does it have any benefits? And what may be other use cases for immutable String's and Vec's?
Edit:
Either I am completely missing something obvious or my question isn't fully understood. I get why you want to use a mutable String or Vec over the str literal or an array, since the latter are immutable. But why would one use an immutable String or Vec over the str literal or the array (which are also immutable)?
You might then do something with that string, for example use it to populate a struct:
struct MyStruct {
field: String,
}
...
let s = String::from("hello");
let mut mystruct = MyStruct { field: s };
You could also, for example, return it from a function.
One use case could be the following:
fn main() {
let text: [u8; 6] = [0x48, 0x45, 0x4c, 0x4c, 0x4f, 0x00];
let converted = unsafe { CStr::from_ptr(text.as_ptr() as *const c_char) }.to_string_lossy().to_string();
println!("{}", converted);
}
This is a bit constructed, but imagine you want to convert a null terminated C string (which might come from the network) from a raw pointer to some Rust string, but you don't know whether it has invalid UTF-8 in it. to_string_lossy() returns a Cow which either points to the original bytes (in case everything is valid UTF-8), but if that's not the case, it will basically copy the string and do a new allocation, replace the invalid characters with the UTF-8 replacement character and then point to that.
This is of course quite nice, because (presumably) most of the time you get away without copying the original C string, but in some cases, it might not be. But if you don't care about that and don't want to work with a Cow, it might make sense to convert it to a String, which you don't need to be mutable afterwards. But it's not possible to have a &str in case the original text contains invalid UTF-8.

How to generate a random String of alphanumeric chars?

The first part of the question is probably pretty common and there are enough code samples that explain how to generate a random string of alphanumerics. The piece of code I use is from here.
use rand::{thread_rng, Rng};
use rand::distributions::Alphanumeric;
fn main() {
let rand_string: String = thread_rng()
.sample_iter(&Alphanumeric)
.take(30)
.collect();
println!("{}", rand_string);
}
This piece of code does however not compile, (note: I'm on nightly):
error[E0277]: a value of type `String` cannot be built from an iterator over elements of type `u8`
--> src/main.rs:8:10
|
8 | .collect();
| ^^^^^^^ value of type `String` cannot be built from `std::iter::Iterator<Item=u8>`
|
= help: the trait `FromIterator<u8>` is not implemented for `String`
Ok, the elements that are generated are of type u8. So I guess this is an array or vector of u8:
use rand::{thread_rng, Rng};
use rand::distributions::Alphanumeric;
fn main() {
let r = thread_rng()
.sample_iter(&Alphanumeric)
.take(30)
.collect::<Vec<_>>();
let s = String::from_utf8_lossy(&r);
println!("{}", s);
}
And this compiles and works!
2dCsTqoNUR1f0EzRV60IiuHlaM4TfK
All good, except that I would like to ask if someone could explain what exactly happens regarding the types and how this can be optimised.
Questions
.sample_iter(&Alphanumeric) produces u8 and not chars?
How can I avoid the second variable s and directly interpret an u8 as a utf-8 character? I guess the representation in memory would not change at all?
The length of these strings should always be 30. How can I optimise the heap allocation of a Vec away? Also they could actually be char[] instead of Strings.
.sample_iter(&Alphanumeric) produces u8 and not chars?
Yes, this was changed in rand v0.8. You can see in the docs for 0.7.3:
impl Distribution<char> for Alphanumeric
But then in the docs for 0.8.0:
impl Distribution<u8> for Alphanumeric
How can I avoid the second variable s and directly interpret an u8 as a utf-8 character? I guess the representation in memory would not change at all?
There are a couple of ways to do this, the most obvious being to just cast every u8 to a char:
let s: String = thread_rng()
.sample_iter(&Alphanumeric)
.take(30)
.map(|x| x as char)
.collect();
Or, using the From<u8> instance of char:
let s: String = thread_rng()
.sample_iter(&Alphanumeric)
.take(30)
.map(char::from)
.collect();
Of course here, since you know every u8 must be valid UTF-8, you can use String::from_utf8_unchecked, which is faster than from_utf8_lossy (although probably around the same speed as the as char method):
let s = unsafe {
String::from_utf8_unchecked(
thread_rng()
.sample_iter(&Alphanumeric)
.take(30)
.collect::<Vec<_>>(),
)
};
If, for some reason, the unsafe bothers you and you want to stay safe, then you can use the slower String::from_utf8 and unwrap the Result so you get a panic instead of UB (even though the code should never panic or UB):
let s = String::from_utf8(
thread_rng()
.sample_iter(&Alphanumeric)
.take(30)
.collect::<Vec<_>>(),
).unwrap();
The length of these strings should always be 30. How can I optimise the heap allocation of a Vec away? Also they could actually be char[] instead of Strings.
First of all, trust me, you don't want arrays of chars. They are not fun to work with. If you want a stack string, have a u8 array then use a function like std::str::from_utf8 or the faster std::str::from_utf8_unchecked (again only usable since you know valid utf8 will be generated.)
As to optimizing the heap allocation away, refer to this answer. Basically, it's not possible with a bit of hackiness/ugliness (such as making your own function that collects an iterator into an array of 30 elements).
Once const generics are finally stabilized, there'll be a much prettier solution.
The first example in the docs for rand::distributions::Alphanumeric shows that if you want to convert the u8s into chars you should map them using the char::from function:
use rand::{thread_rng, Rng};
use rand::distributions::Alphanumeric;
fn main() {
let rand_string: String = thread_rng()
.sample_iter(&Alphanumeric)
.map(char::from) // map added here
.take(30)
.collect();
println!("{}", rand_string);
}
playground

How to prepend a slice to a Vec [duplicate]

This question already has answers here:
Efficiently insert or replace multiple elements in the middle or at the beginning of a Vec?
(3 answers)
Closed 5 years ago.
I was expecting a Vec::insert_slice(index, slice) method — a solution for strings (String::insert_str()) does exist.
I know about Vec::insert(), but that inserts only one element at a time, not a slice. Alternatively, when the prepended slice is a Vec one can append to it instead, but this does not generalize. The idiomatic solution probably uses Vec::splice(), but using iterators as in the example makes me scratch my head.
Secondly, the whole concept of prepending has seemingly been exorcised from the docs. There isn't a single mention. I would appreciate comments as to why. Note that relatively obscure methods like Vec::swap_remove() do exist.
My typical use case consists of indexed byte strings.
String::insert_str makes use of the fact that a string is essentially a Vec<u8>. It reallocates the underlying buffer, moves all the initial bytes to the end, then adds the new bytes to the beginning.
This is not generally safe and can not be directly added to Vec because during the copy the Vec is no longer in a valid state — there are "holes" in the data.
This doesn't matter for String because the data is u8 and u8 doesn't implement Drop. There's no such guarantee for an arbitrary T in a Vec, but if you are very careful to track your state and clean up properly, you can do the same thing — this is what splice does!
the whole concept of prepending has seemingly been exorcised
I'd suppose this is because prepending to a Vec is a poor idea from a performance standpoint. If you need to do it, the naïve case is straight-forward:
fn prepend<T>(v: Vec<T>, s: &[T]) -> Vec<T>
where
T: Clone,
{
let mut tmp: Vec<_> = s.to_owned();
tmp.extend(v);
tmp
}
This has a bit higher memory usage as we need to have enough space for two copies of v.
The splice method accepts an iterator of new values and a range of values to replace. In this case, we don't want to replace anything, so we give an empty range of the index we want to insert at. We also need to convert the slice into an iterator of the appropriate type:
let s = &[1, 2, 3];
let mut v = vec![4, 5];
v.splice(0..0, s.iter().cloned());
splice's implementation is non-trivial, but it efficiently does the tracking we need. After removing a chunk of values, it then reuses that chunk of memory for the new values. It also moves the tail of the vector around (maybe a few times, depending on the input iterator). The Drop implementation of Slice ensures that things will always be in a valid state.
I'm more surprised that VecDeque doesn't support it, as it's designed to be more efficient about modifying both the head and tail of the data.
Taking into consideration what Shepmaster said, you could implement a function prepending a slice with Copyable elements to a Vec just like String::insert_str() does in the following way:
use std::ptr;
unsafe fn prepend_slice<T: Copy>(vec: &mut Vec<T>, slice: &[T]) {
let len = vec.len();
let amt = slice.len();
vec.reserve(amt);
ptr::copy(vec.as_ptr(),
vec.as_mut_ptr().offset((amt) as isize),
len);
ptr::copy(slice.as_ptr(),
vec.as_mut_ptr(),
amt);
vec.set_len(len + amt);
}
fn main() {
let mut v = vec![4, 5, 6];
unsafe { prepend_slice(&mut v, &[1, 2, 3]) }
assert_eq!(&v, &[1, 2, 3, 4, 5, 6]);
}

How to shuffle a str in place

I want to shuffle a String in place in Rust, but I seem to miss something. The fix is probably trivial...
use std::rand::{Rng, thread_rng};
fn main() {
// I want to shuffle this string...
let mut value: String = "SomeValue".to_string();
let mut bytes = value.as_bytes();
let mut slice: &mut [u8] = bytes.as_mut_slice();
thread_rng().shuffle(slice);
println!("{}", value);
}
The error I get is
<anon>:8:36: 8:41 error: cannot borrow immutable dereference of `&`-pointer `*bytes` as mutable
<anon>:8 let mut slice: &mut [u8] = bytes.as_mut_slice();
^~~~~
I read about String::as_mut_vec() but it's unsafe so I'd rather not use it.
There's no very good way to do this, partly due to the nature of the UTF-8 encoding of strings, and partly due to the inherent properties of Unicode and text.
There's at least three layers of things that could be shuffled in a UTF-8 string:
the raw bytes
the encoded codepoints
the graphemes
Shuffling raw bytes is likely to give an invalid UTF-8 string as output unless the string is entirely ASCII. Non-ASCII characters are encoded as special sequences of multiple bytes, and shuffling these will almostly certainly not get them in the right order at the end. Hence shuffling bytes is often not good.
Shuffling codepoints (char in Rust) makes a little bit more sense, but there is still the concept of "special sequences", where so-called combining characters can be layered on to a single letter adding diacritics etc (e.g. letters like ä can be written as a plus U+0308, the codepoint representing the diaeresis). Hence shuffling characters won't give an invalid UTF-8 string, but it may break up these codepoint sequences and give nonsense output.
This brings me to graphemes: the sequences of codepoints that make up a single visible character (like ä is still a single grapheme when written as one or as two codepoints). This will give the most reliably sensible answer.
Then, once you've decided which you want to shuffle the shuffling strategy can be made:
if the string is guaranteed to be purely ASCII, shuffling the bytes with .shuffle is sensible (with the ASCII assumption, this is equivalent to the others)
otherwise, there's no standard way to operate in-place, one would get the elements as an iterator (.chars() for codepoints or .graphemes(true) for graphemes), place them into a vector with .collect::<Vec<_>>(), shuffle the vector, and then collect everything back into a new String with e.g. .iter().map(|x| *x).collect::<String>().
The difficulty of handling codepoints and graphemes is because UTF-8 does not encode them as fixed width, so there's no way to take a random codepoint/grapheme out and insert it somewhere else, or otherwise swap two elements efficiently... Without just decoding everything into an external Vec.
Not being in-place is unfortunate, but strings are hard.
(If your strings are guaranteed to be ASCII, then using a type like the Ascii provided by ascii would be a good way to keep things straight, at the type-level.)
As an example of the difference of the three things, take a look at:
fn main() {
let s = "U͍̤͕̜̲̼̜n̹͉̭͜ͅi̷̪c̠͍̖̻o̸̯̖de̮̻͍̤";
println!("bytes: {}", s.bytes().count());
println!("chars: {}", s.chars().count());
println!("graphemes: {}", s.graphemes(true).count());
}
It prints:
bytes: 57
chars: 32
graphemes: 7
(Generate your own, it demonstrates putting multiple combining character on to a single letter.)
Putting together the suggestion above:
use std::rand::{Rng, thread_rng};
fn str_shuffled(s: &str) -> String {
let mut graphemes = s.graphemes(true).collect::<Vec<&str>>();
let mut gslice = graphemes.as_mut_slice();
let mut rng = thread_rng();
rng.shuffle(gslice);
gslice.iter().map(|x| *x).collect::<String>()
}
fn main() {
println!("{}", str_shuffled("Hello, World!"));
println!("{}", str_shuffled("selam dünya"));
println!("{}", str_shuffled("你好世界"));
println!("{}", str_shuffled("γειά σου κόσμος"));
println!("{}", str_shuffled("Здравствулте мир"));
}
I am also a beginner with Rust, but what about:
fn main() {
// I want to shuffle this string...
let value = "SomeValue".to_string();
let mut bytes = value.into_bytes();
bytes[0] = bytes[1]; // Shuffle takes place.. sorry but std::rand::thread_rng is not available in the Rust installed on my current machine.
match String::from_utf8(bytes) { // Should not copy the contents according to documentation.
Ok(s) => println!("{}", s),
_ => println!("Error occurred!")
}
}
Also keep in mind that Rust default string encoding is UTF-8 when fiddling around with sequences of bytes. ;)
This was a great suggestion, lead me to the following solution, thanks!
use std::rand::{Rng, thread_rng};
fn main() {
// I want to shuffle this string...
let value: String = "SomeValue".to_string();
let mut bytes = value.into_bytes();
thread_rng().shuffle(&mut *bytes.as_mut_slice());
match String::from_utf8(bytes) { // Should not copy the contents according to documentation.
Ok(s) => println!("{}", s),
_ => println!("Error occurred!")
}
}
rustc 0.13.0-nightly (ad9e75938 2015-01-05 00:26:28 +0000)

Resources