How to share parts of a string with Rc?

How to share parts of a string with Rc? - string

I want to create some references to a str with Rc, without cloning str:
fn main() {
let s = Rc::<str>::from("foo");
let t = Rc::clone(&s); // Creating a new pointer to the same address is easy
let u = Rc::clone(&s[1..2]); // But how can I create a new pointer to a part of `s`?
let w = Rc::<str>::from(&s[0..2]); // This seems to clone str
assert_ne!(&w as *const _, &s as *const _);
}
playground
How can I do this?

While it's possible in principle, the standard library's Rc does not support the case you're trying to create: a counted reference to a part of reference-counted memory.
However, we can get the effect for strings using a fairly straightforward wrapper around Rc which remembers the substring range:
use std::ops::{Deref, Range};
use std::rc::Rc;
#[derive(Clone, Debug, Eq, Hash, PartialEq)]
pub struct RcSubstr {
string: Rc<str>,
span: Range<usize>,
}
impl RcSubstr {
fn new(string: Rc<str>) -> Self {
let span = 0..string.len();
Self { string, span }
}
fn substr(&self, span: Range<usize>) -> Self {
// A full implementation would also have bounds checks to ensure
// the requested range is not larger than the current substring
Self {
string: Rc::clone(&self.string),
span: (self.span.start + span.start)..(self.span.start + span.end)
}
}
}
impl Deref for RcSubstr {
type Target = str;
fn deref(&self) -> &str {
&self.string[self.span.clone()]
}
}
fn main() {
let s = RcSubstr::new(Rc::<str>::from("foo"));
let u = s.substr(1..2);
// We need to deref to print the string rather than the wrapper struct.
// A full implementation would `impl Debug` and `impl Display` to produce
// the expected substring.
println!("{}", &*u);
}
There are a lot of conveniences missing here, such as suitable implementations of Display, Debug, AsRef, Borrow, From, and Into — I've provided only enough code to illustrate how it can work. Once supplemented with the appropriate trait implementations, this should be just as usable as Rc<str> (with the one edge case that it can't be passed to a library type that wants to store Rc<str> in particular).
The crate arcstr claims to offer a finished version of this basic idea, but I haven't used or studied it and so can't guarantee its quality.
The crate owning_ref provides a way to hold references to parts of an Rc or other smart pointer, but there are concerns about its soundness and I don't fully understand which circumstances that applies to (issue search which currently has 3 open issues).

Related

Why are string constant pointers different across crates in Rust?

While working with a HashMap that uses &'static str as the key type, I created a newtype to hash by the pointer rather than by the string contents to reduce overhead.
pub struct StaticStr(&'static str);
impl Hash for StaticStr {
fn hash<H: Hasher>(&self, state: &mut H) {
self.0.as_ptr().hash(state)
}
}
impl PartialEq for StaticStr {
fn eq(&self, other: &Self) -> bool {
self.0.as_ptr() == other.0.as_ptr()
}
}
impl Eq for StaticStr {}
It turns out that this does not work consistently, as in the following example.
pub type MyMap = HashMap<StaticStr, u8>;
pub const A: &str = "A";
pub fn make_map() -> MyMap {
let mut map = MyMap::new();
map.insert(StaticStr(A), 1);
map
}
pub fn get_value(control: &MyMap) -> Option<u8> {
control.get(&StaticStr(A)).cloned()
}
#[cfg(test)]
mod tests {
use super::*;
#[test]
pub fn map_made_in_lib() {
let map = make_map();
assert_eq!(get_value(&map), Some(1));
}
#[test]
pub fn map_made_in_test() {
// Same as make_map()
let mut map = MyMap::new();
map.insert(StaticStr(A), 1);
// This check fails
assert_eq!(get_value(&map), Some(1));
}
}
Notice that in the first test, the string constant A is only used directly in the lib crate. In the second test, A is used directly in both the lib crate and the test crate. I discovered that although both tests use the same string constant, the pointers are different depending on which crate refers to the string constant by name. This is demonstrated in the minimal reproduction I created. I would have expected that the string literal be included only once for the crate that defines it, or at least that the linker would be smart enough to deduplicate the string literals. Is there a reason for this behavior?

Instead of a const try a static?
A constant item is an optionally named constant value which is not
associated with a specific memory location in the program. Constants
are essentially inlined wherever they are used, meaning that they are
copied directly into the relevant context when used. This includes
usage of constants from external crates, and non-Copy types.
References to the same constant are not necessarily guaranteed to
refer to the same memory address. -- The Rust Reference
A static item is similar to a constant, except that it represents a
precise memory location in the program. All references to the static
refer to the same memory location. Static items have the static
lifetime, which outlives all other lifetimes in a Rust program. Static
items do not call drop at the end of the program. -- The Rust Reference

Is there a way to have a struct contain a reference that might no longer be valid?

Like some kind of Box that holds the reference to the value or something? I'd have to check whether the value is still alive or not before reading it, like when a Option is pattern matched.
A mock example:
struct Whatever {
thing: AliveOrNot<i32>,
}
fn main() {
let mut w = Whatever { thing: Holder::Empty };
w.thing = AliveOrNot::new(100);
match w.thing {
Empty => println!("doesn't exist"),
Full(value) => println!("{}", value),
}
}
The exact case is that I'm using a sdl2 Font and I want to store instances of that struct in another struct. I don't want to do something like this because the Parent would have to live exactly as long as the Font:
struct Font<'a, 'b> {
aa: &'a i32,
bb: &'b i32,
}
struct Parent<'a, 'b, 'c> {
f: &'c Font<'a, 'b>
}
What I want is for the Parent to work regardless of whether that field is still alive or not.

You may be interested in std::rc::Weak or std::sync::Weak:
use std::rc::{Rc, Weak};
struct Whatever {
thing: Weak<i32>,
}
impl Whatever {
fn do_it(&self) {
match self.thing.upgrade() {
Some(value) => println!("{}", value),
None => println!("doesn't exist"),
}
}
}
fn its_dead_jim() -> Whatever {
let owner = Rc::new(42);
let thing = Rc::downgrade(&owner);
let whatever = Whatever { thing };
whatever.do_it();
whatever
}
fn main() {
let whatever = its_dead_jim();
whatever.do_it();
}
42
doesn't exist
There is no way to do this in safe Rust using non-'static references. A huge point of Rust's references is that it's impossible to refer to invalid values.
You could leak the memory, creating a &'static i32, but that's not sustainable if you need to do this multiple times.
You could use unsafe code and deal with raw pointers that have no notion of lifetimes. You then assume the responsibility of ensuring you don't introduce memory unsafety.
See also:
Need holistic explanation about Rust's cell and reference counted types
Situations where Cell or RefCell is the best choice
Is there a shared pointer with a single strong owner and multiple weak references?
Why can't I store a value and a reference to that value in the same struct?
How to convert a String into a &'static str
Is there any way to return a reference to a variable created in a function?
How to solve this rust lifetime bound issue of SDL2?
Cannot infer an appropriate lifetime for autoref due to conflicting requirements

Why doesn't Weak::new() work when Rc::downgrade() does?

I'm creating a function which returns a Weak reference to a trait object. In situations where the object cannot be found (it's a lookup function), I want to return an empty Weak reference using Weak::new():
use std::rc::{self, Rc, Weak};
use std::cell::RefCell;
pub trait Part {}
pub struct Blah {}
impl Part for Blah {}
fn main() {
let blah = Blah {};
lookup(Rc::new(RefCell::new(blah)));
}
fn lookup(part: Rc<RefCell<Part>>) -> Weak<RefCell<Part>> {
if true {
Rc::downgrade(&part)
} else {
Weak::new()
}
}
This has the following error during compilation:
error[E0277]: the trait bound `Part + 'static: std::marker::Sized` is not satisfied in `std::cell::RefCell<Part + 'static>`
--> <anon>:19:9
|
19 | Weak::new()
| ^^^^^^^^^ within `std::cell::RefCell<Part + 'static>`, the trait `std::marker::Sized` is not implemented for `Part + 'static`
|
= note: `Part + 'static` does not have a constant size known at compile-time
= note: required because it appears within the type `std::cell::RefCell<Part + 'static>`
= note: required by `<std::rc::Weak<T>>::new`
Why is it that I can successfully create a Weak<RefCell<Part>> from Rc::downgrade() but cannot use the same type to create a new Weak reference with Weak::new()?
Is there a way for me to annotate Weak::new() to help the compiler or will I have to wrap this in an Option to let the user know the part wasn't found?
Working minimal example

The type inferred for Weak::new() is Weak<RefCell<Part>>, and the Part part can not be created because it's a trait!
That's what the Sized error is all about. The trait is not a concrete structure, it has no size known at compile time, so the compiler wouldn't know how much space to allocate.
Why is it that I can successfully create a Weak<RefCell<Part>> from Rc::downgrade()
It is because Rc<RefCell<Part>> points to a structure that is already allocated. Compiler can reference it with a trait pointer even though it doesn't know whether it's a Blah or some other implementation of the Part trait.
Is there a way for me to annotate Weak::new() to help the compiler
You can indeed annotate Weak::new(), pointing the compiler to the implementation of Part that you want instantiated, like this:
use std::rc::{Rc, Weak};
use std::cell::RefCell;
pub trait Part {}
pub struct Blah {}
impl Part for Blah {}
fn main() {
let blah = Blah {};
lookup(Rc::new(RefCell::new(blah)));
}
fn lookup(part: Rc<RefCell<Part>>) -> Weak<RefCell<Part>> {
if true {
Rc::downgrade(&part)
} else {
Weak::<RefCell<Blah>>::new()
}
}

TL;DR: Fat pointers are hard.
And therefore you need to specify the concrete type explicitly before coercion takes place:
Weak::<RefCell<Blah>>::new()
Note: if Blah takes a lot of memory, create a Zero-Sized Type Fool, implement Part for it (all functions unimplemented!()), then use Weak::<RefCell<Fool>>::new() to avoid allocating memory uselessly.
I believe that the underlying issue is simply one of implementation issue.
It does not seem unfixable, but may require quite some work to cover all corner cases.
First, let's expose the issue.
The implementation of Weak::new:
impl<T> Weak<T> {
pub fn new() -> Weak<T> {
unsafe {
Weak {
ptr: Shared::new(Box::into_raw(box RcBox {
strong: Cell::new(0),
weak: Cell::new(1),
value: uninitialized(),
})),
}
}
}
}
For homogeneity, all Shared elements are wrapping a RcBox, which contains two Cell (the counters) and the actual value.
The mere fact of building a RcBox<T> requires that the size of T be known, which is why unlike most Weak methods, T is NOT marked as : ?Sized in this impl.
Now, since the memory is left uninitialized, it is clear that it will never be used, so actually any size would have been fine.
This is supported by the fact that RcBox can actually carry unsized data, which is necessary to go from RcBox<Struct> to RcBox<Trait>, and therefore the strong and weak fields are always laid out first (only the last field can be unsized).
Thus, we would like
Allocate a RcBox<()>, which would save memory AND not require that T be Sized,
Then transmuted to RcBox<T>, whatever T is.
Alright, let's do it!
Our desired implementation will look something like this:
impl<T: ?Sized> Weak<T> {
pub fn new() -> Weak<T> {
unsafe {
Weak {
ptr: Shared::new(transmute(Box::into_raw(box RcBox {
strong: Cell::new(0),
weak: Cell::new(1),
value: (),
}))),
}
}
}
}
which utterly fails to compile.
Why? Because *mut RcBox<()> is a thin pointer, whereas *mut RcBox<T> is either a thin pointer OR a fat pointer (see raw memory representation) depending on whether T is Sized or !Sized.
Now, trait pointers can be handled (warning: contains a simplified and totally unsafe implementation of Rc) with the following implementation of Weak::new:
impl<T: ?Sized> Weak<T> {
pub fn new() -> Weak<T> {
unsafe {
let boxed = Box::into_raw(box RcBox {
strong: Cell::new(0),
weak: Cell::new(1),
value: (),
});
let ptr = if size_of::<*mut ()>() == size_of::<*mut T>() {
let ptr: *mut RcBox<T> = transmute_copy(&boxed);
ptr
} else {
let ptr: *mut RcBox<T> = transmute_copy(&TraitObject {
data: boxed as *mut (),
vtable: null_mut(),
});
ptr
};
Weak { ptr: Shared::new(ptr) }
}
}
}
However this implementation only accounts for trait pointers, and there are other kinds of fat pointers for which it would... probably completely break down.

Why is it useful to use PhantomData to inform the compiler that a struct owns a generic if I already implement Drop?

In the Rustonomicon's guide to PhantomData, there is a part about what happens if a Vec-like struct has *const T field, but no PhantomData<T>:
The drop checker will generously determine that Vec<T> does not own any values of type T. This will in turn make it conclude that it doesn't need to worry about Vec dropping any T's in its destructor for determining drop check soundness. This will in turn allow people to create unsoundness using Vec's destructor.
What does it mean? If I implement Drop for a struct and manually destroy all Ts in it, why should I care if compiler knows that my struct owns some Ts?

The PhantomData<T> within Vec<T> (held indirectly via a Unique<T> within RawVec<T>) communicates to the compiler that the vector may own instances of T, and therefore the vector may run destructors for T when the vector is dropped.
Deep dive: We have a combination of factors here:
We have a Vec<T> which has an impl Drop (i.e. a destructor implementation).
Under the rules of RFC 1238, this would usually imply a relationship between instances of Vec<T> and any lifetimes that occur within T, by requiring that all lifetimes within T strictly outlive the vector.
However, the destructor for Vec<T> specifically opts out of this semantics for just that destructor (of Vec<T> itself) via the use of special unstable attributes (see RFC 1238 and RFC 1327). This allows for a vector to hold references that have the same lifetime of the vector itself. This is considered sound; after all, the vector itself will not dereference data pointed to by such references (all its doing is dropping values and deallocating the backing array), as long as an important caveat holds.
The important caveat: While the vector itself will not dereference pointers within its contained values while destructing itself, it will drop the values held by the vector. If those values of type T themselves have destructors, those destructors for T get run. And if those destructors access the data held within their references, then we would have a problem if we allowed dangling pointers within those references.
So, diving in even more deeply: the way that we confirm dropck validity for a given structure S, we first double check if S itself has an impl Drop for S (and if so, we enforce rules on S with respect to its type parameters). But even after that step, we then recursively descend into the structure of S itself, and double check for each of its fields that everything is kosher according to dropck. (Note that we do this even if a type parameter of S is tagged with #[may_dangle].)
In this specific case, we have a Vec<T> which (indirectly via RawVec<T>/Unique<T>) owns a collection of values of type T, represented in a raw pointer *const T. However, the compiler attaches no ownership semantics to *const T; that field alone in a structure S implies no relationship between S and T, and thus enforces no constraint in terms of the relationship of lifetimes within the types S and T (at least from the viewpoint of dropck).
Therefore, if the Vec<T> had solely a *const T, the recursive descent into the structure of the vector would fail to capture the ownership relation between the vector and the instances of T contained within the vector. That, combined with the #[may_dangle] attribute on T, would cause the compiler to accept unsound code (namely cases where destructors for T end up trying to access data that has already been deallocated).
BUT: Vec<T> does not solely contain a *const T. There is also a PhantomData<T>, and that conveys to the compiler "hey, even though you can assume (due to the #[may_dangle] T) that the destructor for Vec won't access data of T when the vector is dropped, it is still possible that some destructor of T itself will access data of T as the vector is dropped."
The end effect: Given Vec<T>, if T doesn't have a destructor, then the compiler provides you with more flexibility (namely, it allows a vector to hold data with references to data that lives for the same amount of time as the vector itself, even though such data may be torn down before the vector is). But if T does have a destructor (and that destructor is not otherwise communicating to the compiler that it won't access any referenced data), then the compiler is more strict, requiring any referenced data to strictly outlive the vector (thus ensuring that when the destructor for T runs, all the referenced data will still be valid).
If one wants to try to understand this via concrete exploration, you can try comparing how the compiler differs in its treatment of little container types that vary in their use of #[may_dangle] and PhantomData.
Here is some sample code I have whipped up to illustrate this:
// Illustration of a case where PhantomData is providing necessary ownership
// info to rustc.
//
// MyBox2<T> uses just a `*const T` to hold the `T` it owns.
// MyBox3<T> has both a `*const T` AND a PhantomData<T>; the latter communicates
// its ownership relationship with `T`.
//
// Skim down to `fn f2()` to see the relevant case,
// and compare it to `fn f3()`. When you run the program,
// the output will include:
//
// drop PrintOnDrop(mb2b, PrintOnDrop("v2b", 13, INVALID), Valid)
//
// (However, in the absence of #[may_dangle], the compiler will constrain
// things in a manner that may indeed imply that PhantomData is unnecessary;
// pnkfelix is not 100% sure of this claim yet, though.)
#![feature(alloc, dropck_eyepatch, generic_param_attrs, heap_api)]
extern crate alloc;
use alloc::heap;
use std::fmt;
use std::marker::PhantomData;
use std::mem;
use std::ptr;
#[derive(Copy, Clone, Debug)]
enum State { INVALID, Valid }
#[derive(Debug)]
struct PrintOnDrop<T: fmt::Debug>(&'static str, T, State);
impl<T: fmt::Debug> PrintOnDrop<T> {
fn new(name: &'static str, t: T) -> Self {
PrintOnDrop(name, t, State::Valid)
}
}
impl<T: fmt::Debug> Drop for PrintOnDrop<T> {
fn drop(&mut self) {
println!("drop PrintOnDrop({}, {:?}, {:?})",
self.0,
self.1,
self.2);
self.2 = State::INVALID;
}
}
struct MyBox1<T> {
v: Box<T>,
}
impl<T> MyBox1<T> {
fn new(t: T) -> Self {
MyBox1 { v: Box::new(t) }
}
}
struct MyBox2<T> {
v: *const T,
}
impl<T> MyBox2<T> {
fn new(t: T) -> Self {
unsafe {
let p = heap::allocate(mem::size_of::<T>(), mem::align_of::<T>());
let p = p as *mut T;
ptr::write(p, t);
MyBox2 { v: p }
}
}
}
unsafe impl<#[may_dangle] T> Drop for MyBox2<T> {
fn drop(&mut self) {
unsafe {
// We want this to be *legal*. This destructor is not
// allowed to call methods on `T` (since it may be in
// an invalid state), but it should be allowed to drop
// instances of `T` as it deconstructs itself.
//
// (Note however that the compiler has no knowledge
// that `MyBox2<T>` owns an instance of `T`.)
ptr::read(self.v);
heap::deallocate(self.v as *mut u8,
mem::size_of::<T>(),
mem::align_of::<T>());
}
}
}
struct MyBox3<T> {
v: *const T,
_pd: PhantomData<T>,
}
impl<T> MyBox3<T> {
fn new(t: T) -> Self {
unsafe {
let p = heap::allocate(mem::size_of::<T>(), mem::align_of::<T>());
let p = p as *mut T;
ptr::write(p, t);
MyBox3 { v: p, _pd: Default::default() }
}
}
}
unsafe impl<#[may_dangle] T> Drop for MyBox3<T> {
fn drop(&mut self) {
unsafe {
ptr::read(self.v);
heap::deallocate(self.v as *mut u8,
mem::size_of::<T>(),
mem::align_of::<T>());
}
}
}
fn f1() {
// `let (v, _mb1);` and `let (_mb1, v)` won't compile due to dropck
let v1; let _mb1;
v1 = PrintOnDrop::new("v1", 13);
_mb1 = MyBox1::new(PrintOnDrop::new("mb1", &v1));
}
fn f2() {
{
let (v2a, _mb2a); // Sound, but not distinguished from below by rustc!
v2a = PrintOnDrop::new("v2a", 13);
_mb2a = MyBox2::new(PrintOnDrop::new("mb2a", &v2a));
}
{
let (_mb2b, v2b); // Unsound!
v2b = PrintOnDrop::new("v2b", 13);
_mb2b = MyBox2::new(PrintOnDrop::new("mb2b", &v2b));
// namely, v2b dropped before _mb2b, but latter contains
// value that attempts to access v2b when being dropped.
}
}
fn f3() {
let v3; let _mb3; // `let (v, mb3);` won't compile due to dropck
v3 = PrintOnDrop::new("v3", 13);
_mb3 = MyBox3::new(PrintOnDrop::new("mb3", &v3));
}
fn main() {
f1(); f2(); f3();
}

Caveat emptor — I'm not that strong in the extremely deep theory that truly answers your question. I'm just a layperson who has used Rust a bit and has read the related RFCs. Always refer back to those original sources for a less-diluted version of the truth.
RFC 769 introduced the actual The Drop-Check Rule:
Let v be some value (either temporary or named) and 'a be some
lifetime (scope); if the type of v owns data of type D, where (1.)
D has a lifetime- or type-parametric Drop implementation, and (2.)
the structure of D can reach a reference of type &'a _, and (3.)
either:
(A.) the Drop impl for D instantiates D at 'a
directly, i.e. D<'a>, or,
(B.) the Drop impl for D has some type parameter with a
trait bound T where T is a trait that has at least
one method,
then 'a must strictly outlive the scope of v.
It then goes further to define some of those terms, including what it means for one type to own another. This goes further to mention PhantomData specifically:
Therefore, as an additional special case to the criteria above for when the type E owns data of type D, we include:
If E is PhantomData<T>, then recurse on T.
A key problem occurs when two variables are defined at the same time:
struct Noisy<'a>(&'a str);
impl<'a> Drop for Noisy<'a> {
fn drop(&mut self) { println!("Dropping {}", self.0 )}
}
fn main() -> () {
let (mut v, s) = (Vec::new(), "hi".to_string());
let noisy = Noisy(&s);
v.push(noisy);
}
As I understand it, without The Drop-Check Rule and indicating that Vec owns Noisy, code like this might compile. When the Vec is dropped, the drop implementation could access an invalid reference; introducing unsafety.
Returning to your points:
If I implement Drop for a struct and manually destroy all Ts in it, why should I care if compiler knows that my struct owns some Ts?
The compiler must know that you own the value because you can/will call drop. Since the implementation of drop is arbitrary, if you are going to call it, the compiler must forbid you from accepting values that would cause unsafe behavior during drop.
Always remember that any arbitrary T can be a value, a reference, a value containing a reference, etc. When trying to puzzle out these types of things, it's important to try to use the most complicated variant for any thought experiments.
All of that should provide enough pieces to connect-the-dots; for full understanding, reading the RFC a few times is probably better than relying on my flawed interpretation.
Then it gets more complicated. RFC 1238 further modifies The Drop-Check Rule, removing this specific reasoning. It does say:
parametricity is a necessary but not sufficient condition to justify the inferences that dropck makes
Continuing to use PhantomData seems the safest thing to do, but it may not be required. An anonymous Twitter benefactor pointed out this code:
use std::marker::PhantomData;
#[derive(Debug)] struct MyGeneric<T> { x: Option<T> }
#[derive(Debug)] struct MyDropper<T> { x: Option<T> }
#[derive(Debug)] struct MyHiddenDropper<T> { x: *const T }
#[derive(Debug)] struct MyHonestHiddenDropper<T> { x: *const T, boo: PhantomData<T> }
impl<T> Drop for MyDropper<T> { fn drop(&mut self) { } }
impl<T> Drop for MyHiddenDropper<T> { fn drop(&mut self) { } }
impl<T> Drop for MyHonestHiddenDropper<T> { fn drop(&mut self) { } }
fn main() {
// Does Compile! (magic annotation on destructor)
{
let (a, mut b) = (0, vec![]);
b.push(&a);
}
// Does Compile! (no destructor)
{
let (a, mut b) = (0, MyGeneric { x: None });
b.x = Some(&a);
}
// Doesn't Compile! (has destructor, no attribute)
{
let (a, mut b) = (0, MyDropper { x: None });
b.x = Some(&a);
}
{
let (a, mut b) = (0, MyHiddenDropper { x: 0 as *const _ });
b.x = &&a;
}
{
let (a, mut b) = (0, MyHonestHiddenDropper { x: 0 as *const _, boo: PhantomData });
b.x = &&a;
}
}
This suggests that the changes in RFC 1238 made the compiler more conservative, such that simply having a lifetime or type parameter is enough to prevent it from compiling.
You can also note that Vec doesn't have this problem because it uses the unsafe_destructor_blind_to_params attribute described in the the RFC.

Why is it discouraged to accept a reference &String, &Vec, or &Box as a function argument?

I wrote some Rust code that takes a &String as an argument:
fn awesome_greeting(name: &String) {
println!("Wow, you are awesome, {}!", name);
}
I've also written code that takes in a reference to a Vec or Box:
fn total_price(prices: &Vec<i32>) -> i32 {
prices.iter().sum()
}
fn is_even(value: &Box<i32>) -> bool {
**value % 2 == 0
}
However, I received some feedback that doing it like this isn't a good idea. Why not?

TL;DR: One can instead use &str, &[T] or &T to allow for more generic code.
One of the main reasons to use a String or a Vec is because they allow increasing or decreasing the capacity. However, when you accept an immutable reference, you cannot use any of those interesting methods on the Vec or String.
Accepting a &String, &Vec or &Box also requires the argument to be allocated on the heap before you can call the function. Accepting a &str allows a string literal (saved in the program data) and accepting a &[T] or &T allows a stack-allocated array or variable. Unnecessary allocation is a performance loss. This is usually exposed right away when you try to call these methods in a test or a main method:
awesome_greeting(&String::from("Anna"));
total_price(&vec![42, 13, 1337])
is_even(&Box::new(42))
Another performance consideration is that &String, &Vec and &Box introduce an unnecessary layer of indirection as you have to dereference the &String to get a String and then perform a second dereference to end up at &str.
Instead, you should accept a string slice (&str), a slice (&[T]), or just a reference (&T). A &String, &Vec<T> or &Box<T> will be automatically coerced (via deref coercion) to a &str, &[T] or &T, respectively.
fn awesome_greeting(name: &str) {
println!("Wow, you are awesome, {}!", name);
}
fn total_price(prices: &[i32]) -> i32 {
prices.iter().sum()
}
fn is_even(value: &i32) -> bool {
*value % 2 == 0
}
Now you can call these methods with a broader set of types. For example, awesome_greeting can be called with a string literal ("Anna") or an allocated String. total_price can be called with a reference to an array (&[1, 2, 3]) or an allocated Vec.
If you'd like to add or remove items from the String or Vec<T>, you can take a mutable reference (&mut String or &mut Vec<T>):
fn add_greeting_target(greeting: &mut String) {
greeting.push_str("world!");
}
fn add_candy_prices(prices: &mut Vec<i32>) {
prices.push(5);
prices.push(25);
}
Specifically for slices, you can also accept a &mut [T] or &mut str. This allows you to mutate a specific value inside the slice, but you cannot change the number of items inside the slice (which means it's very restricted for strings):
fn reset_first_price(prices: &mut [i32]) {
prices[0] = 0;
}
fn lowercase_first_ascii_character(s: &mut str) {
if let Some(f) = s.get_mut(0..1) {
f.make_ascii_lowercase();
}
}

In addition to Shepmaster's answer, another reason to accept a &str (and similarly &[T] etc) is because of all of the other types besides String and &str that also satisfy Deref<Target = str>. One of the most notable examples is Cow<str>, which lets you be very flexible about whether you are dealing with owned or borrowed data.
If you have:
fn awesome_greeting(name: &String) {
println!("Wow, you are awesome, {}!", name);
}
But you need to call it with a Cow<str>, you'll have to do this:
let c: Cow<str> = Cow::from("hello");
// Allocate an owned String from a str reference and then makes a reference to it anyway!
awesome_greeting(&c.to_string());
When you change the argument type to &str, you can use Cow seamlessly, without any unnecessary allocation, just like with String:
let c: Cow<str> = Cow::from("hello");
// Just pass the same reference along
awesome_greeting(&c);
let c: Cow<str> = Cow::from(String::from("hello"));
// Pass a reference to the owned string that you already have
awesome_greeting(&c);
Accepting &str makes calling your function more uniform and convenient, and the "easiest" way is now also the most efficient. These examples will also work with Cow<[T]> etc.

The recommendation is using &str over &String because &str also satisfies &String which could be used for both owned strings and the string slices but not the other way around:
use std::borrow::Cow;
fn greeting_one(name: &String) {
println!("Wow, you are awesome, {}!", name);
}
fn greeting_two(name: &str) {
println!("Wow, you are awesome, {}!", name);
}
fn main() {
let s1 = "John Doe".to_string();
let s2 = "Jenny Doe";
let s3 = Cow::Borrowed("Sally Doe");
let s4 = Cow::Owned("Sally Doe".to_string());
greeting_one(&s1);
// greeting_one(&s2); // Does not compile
// greeting_one(&s3); // Does not compile
greeting_one(&s4);
greeting_two(&s1);
greeting_two(s2);
greeting_two(&s3);
greeting_two(&s4);
}
Using vectors to manipulate text is never a good idea and does not even deserve discussion because you will loose all the sanity checks and performance optimizations. String type uses vector internally anyway. Remember, Rust uses UTF-8 for strings for storage efficiency. If you use vector, you have to repeat all the hard work. Other than that, borrowing vectors or boxed values should be OK.

Because those types can be coerced, so if we use those types functions will accept less types:
1- a reference to String can be coerced to a str slice. For example create a function:
fn count_wovels(words:&String)->usize{
let wovels_count=words.chars().into_iter().filter(|x|(*x=='a') | (*x=='e')| (*x=='i')| (*x=='o')|(*x=='u')).count();
wovels_count
}
if you pass &str, it will not be accepted:
let name="yilmaz".to_string();
println!("{}",count_wovels(&name));
// this is not allowed because argument should be &String but we are passing str
// println!("{}",wovels("yilmaz"))
But if that function accepts &str instead
// words:&str
fn count_wovels(words:&str)->usize{ ... }
we can pass both types to the function
let name="yilmaz".to_string();
println!("{}",count_wovels(&name));
println!("{}",wovels("yilmaz"))
With this, our function can accept more types
2- Similary, a reference to Box &Box[T], will be coerced to the reference to the value inside the Box Box[&T]. for example
fn length(name:&Box<&str>){
println!("lenght {}",name.len())
}
this accepts only &Box<&str> type
let boxed_str=Box::new("Hello");
length(&boxed_str);
// expected reference `&Box<&str>` found reference `&'static str`
// length("hello")
If we pass &str as type, we can pass both types
3- Similar relation exists between ref to a Vec and ref to an array
fn square(nums:&Vec<i32>){
for num in nums{
println!("square of {} is {}",num,num*num)
}
}
fn main(){
let nums=vec![1,2,3,4,5];
let nums_array=[1,2,3,4,5];
// only &Vec<i32> is accepted
square(&nums);
// mismatched types: mismatched types expected reference `&Vec<i32>` found reference `&[{integer}; 5]`
//square(&nums_array)
}
this will work for both types
fn square(nums:&[i32]){..}

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string

How to share parts of a string with Rc? - string

Related

Why are string constant pointers different across crates in Rust?

Is there a way to have a struct contain a reference that might no longer be valid?

Why doesn't Weak::new() work when Rc::downgrade() does?

Why is it useful to use PhantomData to inform the compiler that a struct owns a generic if I already implement Drop?

Why is it discouraged to accept a reference &String, &Vec, or &Box as a function argument?

Categories

Resources