Can I get a String directly from split? [duplicate] - string

Is there any way to avoid calling .to_string() when I need a string? For example:
fn func1(aaa: String) -> ....
And instead of
func1("fdsfdsfd".to_string())
can I do something like this:
func1(s"fdsfdsfd")

TL;DR:
As of Rust 1.9, str::to_string, str::to_owned, String::from, str::into all have the same performance characteristics. Use whichever you prefer.
The most obvious and idiomatic way to convert a string slice (&str) to an owned string (String) is to use ToString::to_string. This works for any type that implements Display. This includes string slices, but also integers, IP addresses, errors, and so on.
Before Rust 1.9, the str implementation of to_string leveraged the formatting infrastructure. While it worked, it was overkill and not the most performant path.
A lighter solution was to use ToOwned::to_owned, which is implemented for types that have a "borrowed" and an "owned" pair. It is implemented in an efficient manner.
Another lightweight solution is to use Into::into which leverages From::from. This is also implemented efficiently.
For your specific case, the best thing to do is to accept a &str, as thirtythreeforty answered. Then you need to do zero allocations, which is the best outcome.
In general, I will probably use into if I need to make an allocated string — it's only 4 letters long ^_^. When answering questions on Stack Overflow, I'll use to_owned as it's much more obvious what is happening.
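As a minimal sketch of the four conversions named above, the following puts them side by side. The helper name all_conversions is hypothetical and exists only for this illustration; it says nothing about relative performance, it only shows that each call produces an equal String.

```rust
// Four ways to turn a string slice into an owned String.
// `all_conversions` is a hypothetical helper used only for illustration.
fn all_conversions(s: &str) -> [String; 4] {
    let a: String = s.to_string();   // via ToString (backed by Display)
    let b: String = s.to_owned();    // via ToOwned
    let c: String = String::from(s); // via From<&str>
    let d: String = s.into();        // via Into<String>; needs the type annotation
    [a, b, c, d]
}

fn main() {
    let results = all_conversions("fdsfdsfd");
    // Every conversion yields an equal String.
    assert!(results.iter().all(|r| r == "fdsfdsfd"));
    println!("{:?}", results);
}
```

Note that into() only works where the target type can be inferred, such as a typed binding, a function argument, or a struct field.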

No, the str::to_string() method is the canonical way of creating a String from an &'static str (a string literal). I even like it for the reason you dislike it: it's a little verbose. Because it involves a heap allocation, you should think twice before invoking it in cases such as these. Also note that since Rust gained impl specialization, str::to_string is no slower than str::to_owned or its ilk.
However, what you really want here is a func1 that can easily be passed any string, be it a &str or a String. Because a String will Deref to a &str, you can have func1 accept an &str, thereby avoiding the String allocation altogether. See this example (playground):
fn func1(s: &str) {
    println!("{}", s);
}
fn main() {
    let allocated_string: String = "owned string".to_string();
    func1("static string");
    func1(&allocated_string);
}

dtolnay:
I now strongly prefer to_owned() for string literals over either of to_string() or into().
What is the difference between String and &str? An unsatisfactory answer is “one is a string and the other is not a string” because obviously both are strings. Taking something that is a string and converting it to a string using to_string() seems like it misses the point of why we are doing this in the first place, and more importantly misses the opportunity to document this to our readers.
The difference between String and &str is that one is owned and one is not owned. Using to_owned() fully captures the reason that a conversion is required at a particular spot in our code.
struct Wrapper {
    s: String,
}
fn main() {
    // I have a string and I need a string. Why am I doing this again?
    let _a = Wrapper { s: "s".to_string() };
    // I have a borrowed string but I need it to be owned.
    let _b = Wrapper { s: "s".to_owned() };
}
vitalyd:
Not if you mentally read to_string as to_String

Related

Why would you convert from a string in Rust to a String?

I'm new to Rust and come from a Python background. I've just started learning Rust, so I have 2 questions to ask. 1) Why are we defining "String" again (name: String) when already I have mentioned the datatype to be as "String" in my "struct Person". 2) What exactly is the use of from. Could someone please explain this to me in simple English?
fn main() {
    struct Person {
        name: String,
    }
    // instantiate Person struct
    let person = Person {
        name: String::from("Steve Austin"), //Why are we defining "string" again and "from"
    };
    // access value of name field in Person struct
    println!("Person name = {}", person.name);
}
Why are we defining "String" again (name: String) when already I have mentioned the datatype to be as "String" in my "struct Person". 2) What exactly is the use of from.
You're not defining anything: String::from is an associated function (think classmethod) that converts its argument to a String.
It's a bit like calling str() in Python.
The oddity here is that Rust has multiple string-adjacent types, and String is only one of them: an owned, heap-allocated, mutable, string buffer. Python doesn't really have an equivalent, StringIO is probably the closest relative (strarray would be if it existed, but it does not).
Meanwhile string literals are not owned heap-allocated strings, instead they are references to data stored in the rodata (or text) segment of the binary.
Python has nothing which really compares, because its strings are more or less always heap-allocated and created at runtime, they're just immutable. The closest general equivalent to &str would be a string version of memoryview, but it still lacks the lexical constraints, and the idea of a 'static lifetime.
For more, see
https://doc.rust-lang.org/rust-by-example/std/str.html
https://stackoverflow.com/a/24159933/8182118
https://doc.rust-lang.org/std/primitive.str.html
https://doc.rust-lang.org/std/string/struct.String.html
In Rust there are two kinds of strings: str and String. The former is a really lean construct and is passed around as a reference, like &str. These cannot be modified.1 Copying a &str copies only the reference, never the text, so every copy will always refer to the same value.
The reason they exist is because they are the "minimum viable string", they are the cheapest possible representation of textual data. This efficiency does have trade-offs.
A String can be modified if it's mut, and can also be copied, and later altered again. This makes them more suitable for properties that can and will change, or need to be computed at runtime.
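To make the contrast concrete, here is a small sketch. The helper greet is hypothetical and only illustrates that a String can be grown and cloned, while the &str it starts from is a read-only view.

```rust
// `greet` is a hypothetical helper used only for this illustration.
fn greet(name: &str) -> String {
    let mut greeting = String::from("Hello, "); // owned, growable buffer
    greeting.push_str(name);                    // mutation requires `mut`
    greeting
}

fn main() {
    let literal: &str = "Steve";    // borrowed view of data stored in the binary
    let first = greet(literal);
    let mut second = first.clone(); // independent copy of the heap data
    second.push('!');               // changing the copy leaves `first` untouched
    println!("{} / {}", first, second);
}
```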
Learning the difference here can be a bit bewildering if you're used to Python where strings are strings, but once you get a handle on it, you'll realize what's going on here.
"x" is a &str value, while String::from("x") is a String converted from that value. You can also do "x".into() if the type is well understood by the compiler, such as for a function argument or struct property.
Strings are also such a common thing that there's to_string() and to_owned(), both of which effectively do the same thing here on &str values.
If you want the best of both of these features, you can use Cow<str> which can encapsulate either an &str value, or a String, and you can convert from the "borrowed" value (&str) to an "owned" value (String) via the into_owned() function.
These are more exotic, though, so I'd recommend only using them when you know what you're doing and need the performance gains they can offer.
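A minimal sketch of how Cow<str> might be used, assuming a hypothetical helper named underscore that only allocates when it has to change its input:

```rust
use std::borrow::Cow;

// Hypothetical helper: replaces spaces with underscores, allocating a new
// String only when the input actually contains a space.
fn underscore(input: &str) -> Cow<'_, str> {
    if input.contains(' ') {
        Cow::Owned(input.replace(' ', "_"))
    } else {
        Cow::Borrowed(input)
    }
}

fn main() {
    let untouched = underscore("steve");
    let changed = underscore("steve austin");
    assert!(matches!(untouched, Cow::Borrowed(_)));
    assert!(matches!(changed, Cow::Owned(_)));
    // into_owned() yields a String in both cases, cloning only if borrowed.
    let owned: String = untouched.into_owned();
    println!("{} {}", owned, changed);
}
```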
--
1 Treat these like const char* in C++, versus std::string. The former is compiled into non-modifiable data in the executable, while the other uses a buffer that can be dynamically allocated.
In your example String::from() is not needed at all, it will work just fine without:
struct Person<'a> {
    name: &'a str
}
fn main() {
    let person = Person {
        name: "Steve Austin"
    };
    println!("Person name = {}", person.name);
}
But it really depends on if your program would ever need to change Person.name and if you think that the struct should hold the data itself ("own" the data).
A string literal is a fixed str: it is stored in your executable and can be referenced in your program through a & reference, as a &str. It cannot be modified.
In your case, "Steve Austin" can be used in your program as a &str.
In your example the name property is of type String. The contents of a String can be modified by your program because it lives in its own place in memory. This is where ownership comes in.
Rust will make sure that the String which is held by the struct lives long enough, by implicitly deriving both the lifetime of the struct and that of the String. Notice that to Rust, those are two different things. In your example, the struct really holds the data itself and so there is no question about what the lengths of the lifetimes should be.
In my example, Rust needs to be told the separate lifetimes for both the struct and the &str. Rust needs the &str to live (be present in memory) at least as long as the struct, because of the memory safety Rust offers via its borrow checker. This way the struct can not have a &str pointing to invalid memory (something that is not under control).
At first you might think that String in Rust is a pointer. But that is not the case. To Rust, String represents the data itself. When you want to reference that data, with a promise not to make modifications to it, you use a & (pointer). If you do want to make changes, you need to tell Rust your intention, and use a &mut (pointer to mutable data).
The difference between str and String in Rust gives you precise control over your program and its performance.
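The roles of & and &mut described above can be sketched as follows. The helpers name_length and add_title are hypothetical names chosen for this illustration.

```rust
struct Person {
    name: String, // the struct owns its text
}

// Hypothetical helper: read-only access through a shared reference.
fn name_length(name: &str) -> usize {
    name.len()
}

// Hypothetical helper: modification requires a mutable reference.
fn add_title(name: &mut String) {
    name.insert_str(0, "Mr. ");
}

fn main() {
    let mut person = Person { name: String::from("Steve Austin") };
    let before = name_length(&person.name); // &String coerces to &str
    add_title(&mut person.name);
    println!("{} ({} -> {} bytes)", person.name, before, name_length(&person.name));
}
```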

What benefits are there with making println a macro?

In this code, there is a ! after the println:
fn main() {
    println!("Hello, world!");
}
In most languages I have seen, the print operation is a function. Why is it a macro in Rust?
By being a macro (it expands to the compiler built-in format_args!), println!() gains the ability to:
Automatically reference its arguments. For example this is valid:
let x = "x".to_string();
println!("{}", x);
println!("{}", x); // Works even though you might expect `x` to have been moved on the previous line.
Accept an arbitrary number of arguments.
Validate, at compile time, that the format string placeholders and arguments match up. This is a common source of bugs with C's printf().
None of those are possible with plain functions or methods.
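A short sketch of those three abilities together. It uses format!, which goes through the same format_args! machinery as println!, so the result can be inspected; the helper describe is a hypothetical name.

```rust
// Hypothetical helper: format! shares its formatting machinery with println!.
fn describe(label: &str, count: u32) -> String {
    // Display ({}) and Debug ({:?}) placeholders mixed in one format string.
    format!("{}: {:?} item(s)", label, count)
}

fn main() {
    let x = "x".to_string();
    println!("{}", x);
    // `x` was only borrowed above, so it can be used again,
    // here twice in one call with a different number of arguments.
    println!("{} and {:?}", x, x);
    let moved = x; // ownership is still ours to give away
    println!("{}", describe(&moved, 2));
    // println!("{} {}", moved); // would be rejected at compile time: missing argument
}
```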
See also:
Does println! borrow or own the variable?
How can I create a function with a variable number of arguments?
Is it possible to write something as complex as `print!` in a pure Rust macro?
What is the difference between macros and functions in Rust?
Well, let's pretend we made those functions for a moment.
fn println<T: Debug>(format: &str, args: &[T]) {}
We would take in a format string and the arguments to format into it. So if we did
println("hello {:?} is your value", &[3]);
The code for println would search for and replace the {:?} with the debug representation for 3.
That's con 1 of doing these as functions: that string replacement needs to be done at runtime. If you have a macro you could imagine it essentially being the same as
print("hello ");
print("3");
println(" is your value");
But when it's a function there needs to be runtime scanning and splitting of the string.
In general Rust likes to avoid unneeded performance hits so this is a bummer.
Next is that T in the function version.
fn println<T: Debug>(format: &str, args: &[T]) {}
What this signature I made up says is that it expects a slice of things that implement Debug. But it also means that it expects all elements in the slice to be the same type, so this
println("Hello {:?}, {:?}", &[99, "red balloons"]);
wouldn't work because i32 (the default type of the literal 99) and &'static str aren't the same T and therefore could be different sizes on the stack. To get that to work you'd need to do something like boxing each element and doing dynamic dispatch.
println("Hello {:?}, {:?}", &[Box::new(99), Box::new("red balloons")]);
This way you could have every element be Box<dyn Debug>, but you now have even more unneeded performance hits and the usage is starting to look kinda gnarly.
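For illustration, here is a runnable sketch of what such a function-based version could look like. The name format_dyn is made up, it returns the text instead of printing it, and it only understands the {:?} placeholder. It shows the costs discussed above: a heap allocation per argument, dynamic dispatch, scanning the format string at runtime, and mismatches that surface only when the code runs.

```rust
use std::fmt::Debug;

// Hypothetical function-based formatter. It returns the text instead of
// printing so the result can be inspected.
fn format_dyn(format: &str, args: &[Box<dyn Debug>]) -> String {
    let mut out = String::new();
    let mut remaining = args.iter();
    // Runtime scanning: split the format string on every "{:?}" placeholder.
    let mut pieces = format.split("{:?}");
    if let Some(first) = pieces.next() {
        out.push_str(first);
    }
    for piece in pieces {
        match remaining.next() {
            // Dynamic dispatch through the `dyn Debug` vtable.
            Some(arg) => out.push_str(&format!("{:?}", arg)),
            // A missing argument is only discovered at runtime.
            None => out.push_str("<missing>"),
        }
        out.push_str(piece);
    }
    out
}

fn main() {
    // Each argument needs its own heap allocation.
    let args: Vec<Box<dyn Debug>> = vec![Box::new(99), Box::new("red balloons")];
    println!("{}", format_dyn("Hello {:?}, {:?}", &args));
}
```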
Then there is the requirement that they want to support printing both Debug and Display implementations.
println!("{}, {:?}", 10, 15);
and at this point there isn't a way to express this as a normal Rust function.
There are more motivating reasons, I'm sure, but this is just a sampling.
For (fun?) let's compare this to what happens in Java in similar circumstances.
In Java everything is, or can be, heap allocated. Everything also "inherits" a toString method from Object, meaning you can get a string representation for anything in your program using dynamic dispatch.
So when you use String.format, you get something similar to what is above for println.
public static String format(String format, Object... args) {
    return new Formatter().format(format, args).toString();
}
Object... is just special syntax for accepting an array as a second argument at runtime that the Java compiler will let you write without the array explicitly there with {}s.
The big difference is that, unlike Rust where different types have different sizes, things in Java are always* behind pointers. Therefore you don't need to know T ahead of time to make the bytecode/machine code to do this.
String.format("Hello %s, %s", 99, "red balloons");
which is doing much the same mechanically as this (ignoring JIT)
println("Hello {:?}, {:?}", &[Box::new(99), Box::new("red balloons")]);
So Rust's problem is: how do you provide ergonomics at least as good as the Java version (which is what many are used to) without incurring unneeded heap allocations or dynamic dispatch? Macros give a mechanism for that solution.
(Java can also solve things like the Debug/Display issue since you can check at runtime for implemented interfaces, but that's not core to the reasoning here)
Add on the fact that using a macro instead of a function that takes a string and array means you can provide compile-time errors for mismatched or missing arguments, and it's a pretty solid design choice.

Is this mem::transmute::<&str, &'static str>(k) safe?

I'm looking at this code which is a very simple library with just one file and mostly tests, so it's short. There's a struct that I'm trying to understand:
pub struct ChallengeFields(HashMap<UniCase<CowStr>, (String, Quote)>);
struct CowStr(Cow<'static, str>);
There's a line where it does
pub fn get(&self, k: &str) -> Option<&String> {
    self.0
        .get(&UniCase(CowStr(Cow::Borrowed(unsafe {
            mem::transmute::<&str, &'static str>(k)
        }))))
        .map(|&(ref s, _)| s)
}
I'm annoyed by this unsafe operation. I think CowStr is a Cow with 'static lifetime otherwise it'd be hard or impossible to store strs inside the map. Because of that, when I try to get something inside this map, the str in question has to have 'static lifetime. This is the reason for the transmute, right? If so, why simply not use String, so we can get rid of lifetimes and thus transmute? I don't like unsafe, and reading about transmute it looks very unsafe.
Also, I don't see why Cow is needed at all.
I think CowStr is a Cow with 'static lifetime otherwise it'd be hard or impossible to store strs inside the map.
Well yes and no, you can store &'static str inside a hashmap with no issue, the problem is that you can't store both &'static str and String.
Am I right? If so, why simply not use String, so we can get rid of lifetimes and thus transmute?
I assume that is an optimisation: with String you'd have to create an allocation every time you want to insert a challenge in the map, but if the overwhelming majority of challenge names would be Digest and Basic then that's a waste of time (and memory but mostly time), but at the same time you'd have to support String for custom auth schemes.
Now maybe in the grand scheme of things this is not an optimisation which actually matters and it'd be better off not doing that, I couldn't tell you.
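To illustrate why a Cow<'static, str> key is attractive, here is a minimal sketch with a plain HashMap. The function build_schemes and the numeric values are hypothetical, and the sketch deliberately leaves out the UniCase wrapper from the original code, so it does not show how to remove the transmute from that library.

```rust
use std::borrow::Cow;
use std::collections::HashMap;

// Hypothetical map from scheme name to a numeric id. Well-known names are
// stored as borrowed literals; a custom name is stored as an owned String.
fn build_schemes(custom: String) -> HashMap<Cow<'static, str>, u32> {
    let mut map = HashMap::new();
    map.insert(Cow::Borrowed("Digest"), 1); // no allocation for the key
    map.insert(Cow::Borrowed("Basic"), 2);
    map.insert(Cow::Owned(custom), 3);      // name only known at runtime
    map
}

fn main() {
    let map = build_schemes(String::from("X-Custom"));
    // Cow<str> implements Borrow<str>, so a &str of any lifetime can be
    // used for the lookup in this simplified setting.
    let key = String::from("Basic");
    println!("{:?}", map.get(key.as_str()));
}
```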
I don't like unsafe, and reading about transmute it looks very unsafe.
It's a debatable thing to do, but in this case it's "safe", in the sense that the reference is valid for the entirety of the HashMap::get call and we know that that call doesn't keep the reference alive (this relies on an implementation detail, which is a bit risky, but the odds that that would change are basically nil as it wouldn't make much sense).
Extending lifetimes is not in-and-of-itself UB (the mem::transmute documentation literally provides an example doing that), but requires care as you must avoid it causing UB (the most likely cause being a dangling reference).

Why does the &str primitive exist?

If String is actually
pub struct String {
    vec: Vec<u8>,
}
Then why is there a special syntax (&str) for a slice of a Vec<u8>? In Chapter 3 of "Programming Rust" by Jim Blandy & Jason Orendorff it says,
&str is very much like &[T]: a fat pointer to some data. String is analogous to Vec<T>
Following that statement there is a chart which shows all the ways they're similar, but there isn't any mention of a single method that they're different. Is a &str just a &[T]?
Likewise in the answer to, What are the differences between Rust's String and str? it says
This is identical to the relationship between a vector Vec<T> and a slice &[T], and is similar to the relationship between by-value T and by-reference &T for general types.
That question focuses on the difference between String and &str. Knowing that a String really is a vector of u8, I'm more interested in &str, which I can't even find the source to. Why does this primitive even exist when we have a primitive (implemented as a fat pointer) for regular vector slices?
It exists for the same reason that String exists, and we don't just pass around Vec<u8> for every string.
A String is an owned, growable container of data that is guaranteed to be UTF-8.
&str is a borrowed, fixed-length container of data that is guaranteed to be UTF-8.
A Vec<u8> is an owned, growable container of u8.
&[u8] is a borrowed, fixed-length container of u8.
This is effectively the reason that types exist, period — to provide abstraction and guarantees (a.k.a. restrictions) on a looser blob of bits.
If we had access to the string as &mut [u8], then we could trivially ruin the UTF-8 guarantee, which is why all such methods are marked as unsafe. Even with an immutable &[u8], we wouldn't be able to make assumptions (a.k.a. optimizations) about the data and would have to write much more defensive code everywhere.
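The guarantee can be seen at the boundary between the two types: going from bytes to &str requires a validity check. A small sketch, with checked_len as a hypothetical helper name:

```rust
// Hypothetical helper: counts characters, but only if the bytes are valid UTF-8.
fn checked_len(bytes: &[u8]) -> Option<usize> {
    // from_utf8 validates the bytes before handing out a &str.
    std::str::from_utf8(bytes).ok().map(|s| s.chars().count())
}

fn main() {
    let valid = "héllo".as_bytes();   // 'é' takes two bytes in UTF-8
    let invalid = [0x68, 0xFF, 0x6C]; // 0xFF never appears in valid UTF-8
    println!("{:?}", checked_len(valid));    // Some(5): 6 bytes, 5 characters
    println!("{:?}", checked_len(&invalid)); // None: rejected by validation
}
```

Once a value has the type &str, code that receives it can skip this check entirely.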
but there isn't any mention of a single method that they're different
Looking at the documentation for str and slice quickly shows a number of methods that exist on one that don't exist on the other, so I don't understand your statement. split_last is the first one that caught my eye, for example.
&str is not necessarily a view to a String, it can be a view to anything that is a valid UTF-8 string.
For example, the crate arraystring allows creating a string on the stack that can be viewed as a &str.
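The same point can be sketched with the standard library alone, without assuming anything about the arraystring API. The helper first_word is hypothetical; it accepts a &str no matter where the bytes live.

```rust
// Hypothetical helper: works on any &str, regardless of where the bytes live.
fn first_word(text: &str) -> &str {
    text.split(' ').next().unwrap_or("")
}

fn main() {
    // Bytes copied into an array on the stack, then viewed as a &str.
    let stack_bytes: [u8; 11] = *b"hello world";
    let from_stack = std::str::from_utf8(&stack_bytes).expect("valid UTF-8");
    // A heap-allocated String, viewed as a &str.
    let owned = String::from("heap string");
    // A literal stored in the binary.
    let literal = "static literal";
    println!(
        "{} {} {}",
        first_word(from_stack),
        first_word(&owned),
        first_word(literal)
    );
}
```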

