I'm learning Rust, and the chapter on structs gives an example of a struct without a ; at the end. It compiles, but I have no idea why this is allowed.
fn main() {
    struct User {
        username: String,
        email: String,
        sign_in_count: u64,
        active: bool,
    }
}
... same question goes for functions, actually.
As Shepmaster said in the comment, the "reason" is simply that Rust's grammar is defined that way. Here I will explain the rules behind it.
Basically, you can omit the ; after a construct that ends with }. This answers your question.
However, there are a number of exceptions to the rule:
When {} appears indirectly
The rule above doesn't apply when {} appears indirectly, like
use std::io::{self, Read, Write}; // Here }; appears
or
let x = if cond {
    1
} else {
    2
}; // Here }; appears
In these cases, the {} isn't a direct part of the use or let itself, so you need the ;.
Items
Items are things that you can also place outside of functions, that is, any of extern crate, use, mod, struct, enum, union, type, trait, impl, fn, static, const, extern, and macros.
You can place items either outside of functions or inside a function. However, there is a difference between the two:
If it appears outside of functions, you must not add an unnecessary ; (a stray ; at module level is a syntax error).
If it appears inside a function, you may also place a ; after it. This is basically because a lone ; is itself an empty statement.
Example:
struct A {} // You can't place ; here
fn main() {
    struct B {} // You can omit ; here
    struct C {}; // You can also place ; here
}
The last expression
You have to omit ; if:
it is the last statement in the block,
it is an expression (items and let aren't expressions), and
you want to return the value of the expression.
Example:
fn f() -> i32 {
    let x = 1;
    x + x // You want to return x + x, so you can't place `;` here
}
Block expressions
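if, if let, match, loop, while, while let, for, unsafe, and bare {} end with }, so you can omit ; after them. However, placing a ; after one does have an effect: without the ;, a block expression used as a statement must have type (); with the ;, its value is simply discarded.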
Example:
fn f(x: i32) -> i32 {
    if x < 10 {
        10
    } else {
        20
    }; // If you remove ; here, then you will see a compile error.
    42
}
In most cases, you don't want that ; there; instead, you may want to put the ; inside the blocks, so that each block evaluates to ():
fn f(x: i32) -> i32 {
    if x < 10 {
        10;
    } else {
        20;
    }
    42
}
Statement macros
In statement positions, you can write three different kinds of macros:
some_macro!()/some_macro![]: this isn't actually a statement macro; it is a plain expression macro, so it can't expand to items or let.
some_macro!{}: this expands to zero or more statements.
some_macro!();/some_macro![];/some_macro!{};: this also expands to zero or more statements; however, there is a very minor difference: ; is added to the last expanded statement.
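A small sketch of the brace and semicolon forms, using a hypothetical `define_counter!` macro (along with the made-up names `Clicks`, `Views`, and `total`). The macro expands to an item (a struct definition), so in statement position it has to be invoked either with `{}` or with `()` followed by `;`:

```rust
// Hypothetical macro that expands to an item: a struct with one field.
macro_rules! define_counter {
    ($name:ident) => {
        struct $name {
            count: u32,
        }
    };
}

fn total() -> u32 {
    // `{}` form: expands to statements, so no trailing `;` is needed.
    define_counter! { Clicks }
    // `()` form followed by `;`: also expands to statements.
    define_counter!(Views);
    let c = Clicks { count: 1 };
    let v = Views { count: 2 };
    c.count + v.count
}
```

Calling `total()` returns 3. Both structs are ordinary items inside the function body, just as if they had been written out by hand.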
Related
Let us say I have a function like the following:
fn log(msg: &str) {
    //fancy_output
    println!("{}", msg)
}
Now, if I want to log a variable using the function, I must do it like so:
let x = 5;
log(&format!("{:?}", x)); // Assume some complex data type which implements Debug
Clearly this is a lot of boilerplate. I could remove the & by making the argument a String, but that does not solve my bigger problem: using format!() everywhere.
How can I write a function/macro such that I can do the following or similar:
let x = 5;
log("{:?}", x) // Assume some complex data type which implements Debug
I know a good place to start would be the format! source code, but it is quite hard for a beginner like me to understand, let alone to see how I might implement something similar here.
Do I use some fancy macro or is there a simpler way?
format! is a macro, not a function, which is why it is able to work with a variable number of arguments. You can do the same with a declarative macro like this:
macro_rules! log {
    ($($args: tt)*) => {
        println!($($args)*);
    }
}
The $($args: tt)* means that the macro accepts zero or more (*) of any kind of token (tt). Then it just passes these on to the println macro.
Which you can call like this:
fn main() {
    let x = 5;
    log!("{:?}", x);
}
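If you want to keep routing messages through the original `log` function rather than calling `println!` directly, one possible sketch is to have the macro build the `String` with `format!` and pass it along. This assumes the function keeps the `fn log(msg: &str)` signature from the question; its `//fancy_output` body is elided there, so the sketch just prints. A macro and a function can share the name `log` because macros live in their own namespace:

```rust
// The function from the question; the `//fancy_output` part is elided
// there, so this sketch only prints the message.
fn log(msg: &str) {
    println!("{}", msg);
}

// Accepts the same arguments as `format!`, formats them into a String,
// and hands a reference to that String to the `log` function.
macro_rules! log {
    ($($args:tt)*) => {
        log(&format!($($args)*))
    };
}
```

Call it as `log!("{:?}", x);`. This allocates a `String` on every call; if that matters, `log` could take a `std::fmt::Arguments` instead and the macro could use `format_args!`.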
I'm trying to port this Python function that returns true if each character in the pattern appears in the test string in order.
def substr_match(pattern, document):
    p_idx, d_idx, p_len, d_len = 0, 0, len(pattern), len(document)
    while (p_idx != p_len) and (d_idx != d_len):
        if pattern[p_idx].lower() == document[d_idx].lower():
            p_idx += 1
        d_idx += 1
    return p_len != 0 and d_len != 0 and p_idx == p_len
This is what I have at the moment.
fn substr_match(pattern: &str, document: &str) -> bool {
    let mut pattern_idx = 0;
    let mut document_idx = 0;
    let pattern_len = pattern.len();
    let document_len = document.len();
    while (pattern_idx != pattern_len) && (document_idx != document_len) {
        let pat: Vec<_> = pattern.chars().nth(pattern_idx).unwrap().to_lowercase().collect();
        let doc: Vec<_> = document.chars().nth(document_idx).unwrap().to_lowercase().collect();
        if pat == doc {
            pattern_idx += 1;
        }
        document_idx += 1;
    }
    return pattern_len != 0 && document_len != 0 && pattern_idx == pattern_len;
}
I tried s.chars().nth(n) since Rust doesn't seem to allow string indexing, but I feel there is a more idiomatic way of doing it. What would be the preferred way of writing this in Rust?
Here is mine:
fn substr_match(pattern: &str, document: &str) -> bool {
    let pattern_chars = pattern.chars().flat_map(char::to_lowercase);
    let mut doc_chars = document.chars().flat_map(char::to_lowercase);
    'outer: for p in pattern_chars {
        for d in &mut doc_chars {
            if d == p {
                continue 'outer;
            }
        }
        return false;
    }
    true
}
The other answers mimic the behavior of the Python function you started with, but it may be worth trying to make it better. I thought of two test cases where the original function may have surprising behavior:
>>> substr_match("ñ", "in São Paulo")
True
>>> substr_match("🇺🇸", "🇺🇦🇸🇰")
True
Hmm.
(The first example may depend on your input method; try copying and pasting. Also, if you can't see them, the special characters in the second example are flag emoji for the United States, Ukraine, and Slovakia.)
Without getting into why these tests fail or all the other things that could potentially be undesired, if you want to correctly handle Unicode text, you need to, at minimum, operate on graphemes instead of code points (this question describes the difference). Rust doesn't provide this feature in the standard library, so you need the unicode-segmentation crate, which provides a graphemes method on str.
extern crate unicode_segmentation;
use unicode_segmentation::UnicodeSegmentation;
fn substr_match(pattern: &str, document: &str) -> bool {
    let mut haystack = document.graphemes(true);
    pattern.len() > 0 && pattern.graphemes(true).all(|needle| {
        haystack
            .find(|grapheme| {
                grapheme
                    .chars()
                    .flat_map(char::to_lowercase)
                    .eq(needle.chars().flat_map(char::to_lowercase))
            })
            .is_some()
    })
}
Playground, test cases provided.
This algorithm takes advantage of several convenience methods on Iterator. all iterates over the pattern. find short-circuits, so whenever it finds the next needle in haystack, the next call to haystack.find will start at the following element.
(I thought this approach was somewhat clever, but honestly, a nested for loop is probably easier to read, so you might prefer that.)
The last "tricky" bit is case-insensitive string comparison, which is inherently language-dependent, but if you're willing to accept only unconditional mappings (those that apply in any language), char::to_lowercase does the trick. Rather than collect the result into a String, though, you can use Iterator::eq to compare the sequences of (lowercased) characters.
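The `Iterator::eq` trick on its own looks like this (`eq_ignore_case` is a hypothetical helper name). Note that `char::to_lowercase` can yield more than one `char`: for example, 'İ' lowercases to "i\u{307}". That is why `flat_map` is used rather than `map`:

```rust
/// Compares two strings case-insensitively using only the unconditional
/// lowercase mappings, without collecting either side into a new String.
fn eq_ignore_case(a: &str, b: &str) -> bool {
    a.chars()
        .flat_map(char::to_lowercase)
        .eq(b.chars().flat_map(char::to_lowercase))
}
```

For example, `eq_ignore_case("Hello", "hELLO")` is true, and `eq_ignore_case("İ", "i\u{307}")` is also true, because both sides lowercase to the same two-character sequence.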
One other thing you may want to consider is Unicode normalization -- this question is a good place for the broad strokes. Fortunately, Rust has a unicode-normalization crate, too! And it looks quite easy to use. (You wouldn't necessarily want to use it in this function, though; instead, you might normalize all text on input so that you're dealing with the same normalization form everywhere in your program.)
str::chars() returns an iterator. Iterators return elements from a sequence one at a time. Specifically, str::chars() returns characters from a string one at a time. It's much more efficient to use a single iterator to iterate over a string than to create a new iterator each time you want to look up a character, because s.chars().nth(n) needs to perform a linear scan in order to find the nth character in the UTF-8 encoded string.
fn substr_match(pattern: &str, document: &str) -> bool {
    let mut pattern_iter = pattern.chars();
    let mut pattern_ch_lower: String = match pattern_iter.next() {
        Some(ch) => ch,
        None => return false,
    }.to_lowercase().collect();
    for document_ch in document.chars() {
        let document_ch_lower: String = document_ch.to_lowercase().collect();
        if pattern_ch_lower == document_ch_lower {
            pattern_ch_lower = match pattern_iter.next() {
                Some(ch) => ch,
                None => return true,
            }.to_lowercase().collect();
        }
    }
    return false;
}
Here, I'm demonstrating two ways of using iterators:
To iterate over the pattern, I'm using the next method manually. next returns an Option: Some(value) if the iterator hasn't finished, or None if it has.
To iterate over the document, I'm using a for loop. The for loop does the work of calling next and unwrapping the result until next returns None.
One thing to notice is that I'm using a return expression inside a match expression (twice). A return expression never produces a value: its type is ! (the "never" type), which the compiler allows to coerce to any other type. In this case, the Some arm produces a char, so the whole match evaluates to a char.
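The same idea in isolation, as a minimal sketch (`first_lowercase` is a hypothetical name): the `None` arm's `return` has type `!`, so it fits in a `match` whose other arm is a `char`:

```rust
/// Returns the lowercase form of the first character of `s`,
/// or an empty String if `s` is empty.
fn first_lowercase(s: &str) -> String {
    // `Some` arm: a `char`. `None` arm: `return ...`, whose type `!`
    // coerces to `char`, so the whole `match` has type `char`.
    let ch: char = match s.chars().next() {
        Some(ch) => ch,
        None => return String::new(),
    };
    ch.to_lowercase().collect()
}
```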
We could also do this with two nested for loops:
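fn substr_match(pattern: &str, document: &str) -> bool {
    if pattern.is_empty() {
        return false;
    }
    let mut document_iter = document.chars();
    // Label the outer loop so a match can skip straight to the next pattern character.
    'outer: for pattern_ch in pattern.chars() {
        let pattern_ch_lower: String = pattern_ch.to_lowercase().collect();
        for document_ch in &mut document_iter {
            let document_ch_lower: String = document_ch.to_lowercase().collect();
            if pattern_ch_lower == document_ch_lower {
                // Found it: move on to the next pattern character.
                continue 'outer;
            }
        }
        // The document ran out before this pattern character was found.
        return false;
    }
    return true;
}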
There are two things to notice here:
We need to handle the case where the pattern is empty without using the iterator.
In the inner loop, we don't want to restart from the start of the document when we move to the next pattern character, so we need to reuse the same iterator over the document. When we write for x in iter, the for loop takes ownership of iter; to avoid that, we must write &mut iter instead. Mutable references to iterators are iterators themselves, thanks to the blanket implementation impl<'a, I> Iterator for &'a mut I where I: Iterator + ?Sized in the standard library.
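The same resumable behaviour works with any iterator adapter, not just `for` loops. In this hypothetical `split_at_first_zero` helper, `(&mut iter)` is handed to `take_while`, and afterwards `iter` picks up where it left off. `iter.by_ref()` is the standard library's named shorthand for `&mut iter`:

```rust
/// Splits `v` at the first zero into (elements before it, elements after it).
/// The zero itself is consumed by `take_while` and appears in neither half.
fn split_at_first_zero(v: &[i32]) -> (Vec<i32>, Vec<i32>) {
    let mut iter = v.iter().copied();
    // `&mut iter` is itself an iterator, so `take_while` borrows `iter`
    // instead of taking ownership of it.
    let before: Vec<i32> = (&mut iter).take_while(|&x| x != 0).collect();
    // `iter` resumes right after the zero.
    let after: Vec<i32> = iter.collect();
    (before, after)
}
```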
I have the following:
enum SomeType {
    VariantA(String),
    VariantB(String, i32),
}
fn transform(x: SomeType) -> SomeType {
    // very complicated transformation, reusing parts of x in order to produce result:
    match x {
        SomeType::VariantA(s) => SomeType::VariantB(s, 0),
        SomeType::VariantB(s, i) => SomeType::VariantB(s, 2 * i),
    }
}
fn main() {
    let mut data = vec![
        SomeType::VariantA("hello".to_string()),
        SomeType::VariantA("bye".to_string()),
        SomeType::VariantB("asdf".to_string(), 34),
    ];
}
I would now like to call transform on each element of data and store the resulting value back in data. I could do something like data.into_iter().map(transform).collect(), but this will allocate a new Vec. Is there a way to do this in-place, reusing the allocated memory of data? There once was Vec::map_in_place in Rust, but it was removed some time ago.
As a work-around, I've added a Dummy variant to SomeType and then do the following:
for x in &mut data {
    let original = ::std::mem::replace(x, SomeType::Dummy);
    *x = transform(original);
}
This does not feel right, and I have to deal with SomeType::Dummy everywhere else in the code, although it should never be visible outside of this loop. Is there a better way of doing this?
Your first problem is not map, it's transform.
transform takes ownership of its argument, while the Vec owns its elements. One of them has to give, and poking a hole in the Vec would be a bad idea: what if transform panics?
The best fix, thus, is to change the signature of transform to:
fn transform(x: &mut SomeType) { ... }
then you can just do:
for x in &mut data { transform(x) }
Other solutions will be clunky, as they will need to deal with the fact that transform might panic.
No, it is not possible in general, because the size of each element might change as the mapping is performed (e.g. fn transform(u8) -> u32).
Even when the sizes are the same, it's non-trivial.
In this case, you don't need to create a Dummy variant because creating an empty String is cheap; only 3 pointer-sized values and no heap allocation:
impl SomeType {
    fn transform(&mut self) {
        use SomeType::*;
        let old = std::mem::replace(self, VariantA(String::new()));
        // Note this line for the detailed explanation
        *self = match old {
            VariantA(s) => VariantB(s, 0),
            VariantB(s, i) => VariantB(s, 2 * i),
        };
    }
}
for x in &mut data {
    x.transform();
}
An alternate implementation that just replaces the String:
impl SomeType {
    fn transform(&mut self) {
        use SomeType::*;
        *self = match self {
            VariantA(s) => {
                let s = std::mem::replace(s, String::new());
                VariantB(s, 0)
            }
            VariantB(s, i) => {
                let s = std::mem::replace(s, String::new());
                VariantB(s, 2 * *i)
            }
        };
    }
}
In general, yes, you have to create some dummy value to do this generically and with safe code. Often, you can wrap your whole element in Option and call Option::take to achieve the same effect.
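A sketch of the `Option` approach applied to the question's `SomeType` and `transform` (copied from the question; `transform_all` is a hypothetical name, and `Debug`/`PartialEq` are derived only so the result can be checked). `None` plays the role of the `Dummy` variant, but the placeholder lives in the container's element type rather than in `SomeType` itself:

```rust
#[derive(Debug, PartialEq)]
enum SomeType {
    VariantA(String),
    VariantB(String, i32),
}

fn transform(x: SomeType) -> SomeType {
    match x {
        SomeType::VariantA(s) => SomeType::VariantB(s, 0),
        SomeType::VariantB(s, i) => SomeType::VariantB(s, 2 * i),
    }
}

fn transform_all(data: &mut [Option<SomeType>]) {
    for slot in data.iter_mut() {
        // `take` moves the value out and leaves `None` in the slot.
        if let Some(old) = slot.take() {
            *slot = Some(transform(old));
        }
    }
}
```

If `transform` panics, the slot it was working on is left as `None`, a known-good value; the price is that all other code reading `data` now has to handle `Option`.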
See also:
Change enum variant while moving the field to the new variant
Why is it so complicated?
See this proposed and now-closed RFC for lots of related discussion. My understanding of that RFC (and the complexities behind it) is that there's a time period during which your value would be in an invalid, moved-out state, which is not safe. If a panic were to happen at that exact moment, then dropping your value during unwinding might trigger undefined behavior, a bad thing.
If your code were to panic at the commented line, the value of self is still a concrete, known value. If it were some unknown value instead, dropping that string would try to free whatever garbage happened to be stored there, and we are back in C. This is the purpose of the Dummy value: to always have a known-good value stored.
You even hinted at this (emphasis mine):
I have to deal with SomeType::Dummy everywhere else in the code, although it should never be visible outside of this loop
That "should" is the problem. During a panic, that dummy value is visible.
See also:
How can I swap in a new value for a field in a mutable reference to a structure?
Temporarily move out of borrowed content
How do I move out of a struct field that is an Option?
The now-removed implementation of Vec::map_in_place spans almost 175 lines of code, most of them dealing with unsafe code and reasoning about why it is actually safe! Some crates have re-implemented this concept and attempted to make it safe; you can see an example in Sebastian Redl's answer.
You can write a map_in_place in terms of the take_mut or replace_with crates:
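fn map_in_place<T, F>(v: &mut [T], f: F)
where
    F: Fn(T) -> T,
{
    for e in v {
        // Pass `&f`: a reference to an `Fn` closure is itself callable,
        // so `f` is not moved away on the first iteration.
        take_mut::take(e, &f);
    }
}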
However, if this panics in the supplied function, the program aborts completely; you cannot recover from the panic.
Alternatively, you could supply a placeholder element that sits in the empty spot while the inner function executes:
use std::mem;
fn map_in_place_with_placeholder<T, F>(v: &mut [T], f: F, mut placeholder: T)
where
    F: Fn(T) -> T,
{
    for e in v {
        let mut tmp = mem::replace(e, placeholder);
        tmp = f(tmp);
        placeholder = mem::replace(e, tmp);
    }
}
If this panics, the placeholder you supplied will sit in the panicked slot.
Finally, you could produce the placeholder on-demand; basically replace take_mut::take with take_mut::take_or_recover in the first version.
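When `T` implements `Default` and its default value is cheap to create (as with `String::new()` above), the standard library's `std::mem::take` performs the replace-with-default step for you, with no extra crate. A std-only sketch (`map_in_place_default` is a hypothetical name); if `f` panics, the slot it was working on is left holding `T::default()`:

```rust
use std::mem;

fn map_in_place_default<T, F>(v: &mut [T], mut f: F)
where
    T: Default,
    F: FnMut(T) -> T,
{
    for e in v.iter_mut() {
        // Move the element out, leaving `T::default()` in the slot meanwhile.
        let old = mem::take(e);
        *e = f(old);
    }
}
```

Unlike the placeholder version, this creates a fresh default value for every element, so it only makes sense when that is cheap.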
I'm trying to compute the 10,001st prime in Rust (Project Euler 7), and as a part of this, my method to check whether or not an integer is prime references a vector:
fn main() {
    let mut count: u32 = 1;
    let mut num: u64 = 1;
    let mut primes: Vec<u64> = Vec::new();
    primes.push(2);

    while count < 10001 {
        num += 2;
        if vectorIsPrime(num, primes) {
            count += 1;
            primes.push(num);
        }
    }
}
fn vectorIsPrime(num: u64, p: Vec<u64>) -> bool {
    for i in p {
        if num > i && num % i != 0 {
            return false;
        }
    }
    true
}
When I try to reference the vector, I get the following error:
error[E0382]: use of moved value: `primes`
--> src/main.rs:9:31
|
9 | if vectorIsPrime(num, primes) {
| ^^^^^^ value moved here, in previous iteration of loop
|
= note: move occurs because `primes` has type `std::vec::Vec<u64>`, which does not implement the `Copy` trait
What do I have to do to primes in order to be able to access it within the vectorIsPrime function?
With the current definition of your function vectorIsPrime(), the function specifies that it requires ownership of the parameter because you pass it by value.
When a function requires a parameter by value, the compiler will check if the value can be copied by checking if it implements the trait Copy.
If it does, the value is copied (with a memcpy) and given to the function, and you can still continue to use your original value.
If it doesn't, then the value is moved into the function, and the caller cannot use it afterwards.
That is the meaning of the error message you have.
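A minimal sketch of the two cases, with hypothetical function names: `u64` implements `Copy`, so passing it by value leaves the original usable, while `Vec<u64>` does not, so passing it by value moves it unless you `clone()` it first:

```rust
fn double(n: u64) -> u64 {
    n * 2
}

fn count(v: Vec<u64>) -> usize {
    v.len()
}

fn copy_vs_move() -> (u64, usize) {
    let n: u64 = 21;
    let doubled = double(n); // `n` is copied, so it is still usable below
    let sum = n + doubled;

    let v: Vec<u64> = vec![1, 2, 3];
    let first = count(v.clone()); // pass a duplicate; `v` stays usable
    let second = count(v); // `v` is moved here and can't be used afterwards
    (sum, first + second)
}
```

Adding any use of `v` after the last `count(v)` call reproduces the "use of moved value" error from the question.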
However, most functions do not require ownership of the parameters: they can work on "borrowed references", which means they do not actually own the value (and cannot for example put it in a container or destroy it).
fn main() {
    let mut count: u32 = 1;
    let mut num: u64 = 1;
    let mut primes: Vec<u64> = Vec::new();
    primes.push(2);
    while count < 10001 {
        num += 2;
        if vector_is_prime(num, &primes) {
            count += 1;
            primes.push(num);
        }
    }
}
fn vector_is_prime(num: u64, p: &[u64]) -> bool {
    for &i in p {
        // Not prime if some smaller prime divides it evenly
        // (the original test, `num % i != 0`, was inverted).
        if num > i && num % i == 0 {
            return false;
        }
    }
    true
}
The function vector_is_prime() now specifies that it only needs a slice, i.e. a borrowed pointer to an array (including its size) that you can obtain from a vector using the borrow operator &.
For more information about ownership, I invite you to read the part of the book dealing with ownership.
Rust is, as I would say, a “value-oriented” language. This means that if you define primes like this
let primes: Vec<u64> = …
it is not a reference to a vector. It is practically a variable that stores a value of type Vec<u64> just like any u64 variable stores a u64 value. This means that if you pass it to a function defined like this
fn vec_is_prime(num: u64, vec: Vec<u64>) -> bool { … }
the function will get its own u64 value and its own Vec<u64> value.
The difference between u64 and Vec<u64>, however, is that a u64 value can easily be copied to another place, while a Vec<u64> value can only be moved to another place cheaply. If you want to give the vec_is_prime function its own Vec<u64> value while keeping one for yourself in main, you have to duplicate it somehow. That's what clone() is for. You have to be explicit here because this operation is not cheap. That's one nice thing about Rust: it's not hard to spot expensive operations. So, you could call the function like this:
if vec_is_prime(num, primes.clone()) { …
but that's not really what you want, actually. The function does not need its own Vec<u64> value. It just needs to borrow it for a short while. Borrowing is much more efficient and is applicable in this case:
fn vec_is_prime(num: u64, vec: &Vec<u64>) -> bool { …
Invoking it now requires the “borrowing operator”:
if vec_is_prime(num, &primes) { …
Much better. But we can still improve it. If a function wants to borrow a Vec<T> just for the purpose of reading it, it's better to take a &[T] instead:
fn vec_is_prime(num: u64, vec: &[u64]) -> bool { …
It's just more general. Now you can lend a certain portion of a Vec to the function, or something else entirely (not necessarily a Vec, as long as it stores its values contiguously in memory, like a static lookup table). What's also nice is that, thanks to deref coercion, you don't need to change anything at the call site: you can still call this function with &primes as the argument.
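A short sketch of that generality; `sum_slice`, `SMALL_PRIMES`, and `slice_sources` are hypothetical names, and summing stands in for whatever read-only work `vec_is_prime` does. The same `&[u64]` parameter accepts a whole `Vec`, part of a `Vec`, and a static array:

```rust
// Reads any contiguous run of u64 values through a shared borrow.
fn sum_slice(vec: &[u64]) -> u64 {
    vec.iter().sum()
}

// A hypothetical static lookup table; arrays coerce to slices too.
static SMALL_PRIMES: [u64; 4] = [2, 3, 5, 7];

fn slice_sources() -> (u64, u64, u64) {
    let primes: Vec<u64> = vec![2, 3, 5, 7, 11];
    (
        sum_slice(&primes),       // &Vec<u64> coerces to &[u64]
        sum_slice(&primes[..2]),  // only the first two elements
        sum_slice(&SMALL_PRIMES), // not a Vec at all
    )
}
```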
For String and &str the situation is the same. String is for storing string values in the sense that a variable of this type owns that value. &str is for borrowing them.
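The string version of the same pattern, as a small sketch with hypothetical names: a function that only reads text takes `&str`, and it can then be called with a borrowed `String` or with a string literal:

```rust
// Only reads the text, so it borrows a &str rather than taking a String.
fn shout(s: &str) -> String {
    s.to_uppercase()
}

fn string_demo() -> (String, String) {
    let owned: String = String::from("hello"); // owns its heap buffer
    let a = shout(&owned); // &String coerces to &str
    let b = shout("world"); // a string literal is already a &str
    (a, b)
}
```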
You move the value of primes into the function vectorIsPrime (by the way, Rust uses snake_case by convention). You have other options, but the best one is to borrow the vector instead of moving it:
fn vector_is_prime(num: u64, p: &Vec<u64>) -> bool { … }
and then pass a reference to it:
vector_is_prime(num, &primes)
Sometimes I have to act on information that is expressed in a long sequence, like:
f1(f2(f3).f4(x,f5(y,z))).f6().f7()
(not necessarily that, just any long sequence you don't want to repeat). And I may need to reference that multiple times, with other code in-between. Like this:
fn myfunc(v: &T) -> X {
    match v.func(func(v.func().func())).func() {
        ...
    }
    .. other stuff ..
    match v.func(func(v.func().func())).func() {
        ...
    }
}
The value is not movable, so I cannot assign it to a variable and then reference the variable twice like in other languages, so essentially I find myself writing the same sequence of function calls multiple times. I tried something like this
let x = &( ... )
and then using this
*x
but this didn't work. I suppose I could use a macro, but then it would be recomputed each time (... which isn't too bad, as most of the function calls are just sugar for the compiler and type system), but that's the best solution I have found so far. Is there another way?
If the value is not Copy, then you either need to clone it or pass it by reference. E.g. suppose it computes a value of type T. I suppose the problem you're currently running into is
fn foo(x: T) { ... }
fn bar(x: T) { ... }
let your_thing = f1(f2(f3).f4(x,f5(y,z))).f6().f7();
foo(your_thing);
bar(your_thing); // error: use of moved value
The correct fix is changing the foo lines to
fn foo(x: &T) { ... }
foo(&your_thing);
or, the foo call to foo(your_thing.clone()) (if T is Clone). You can decide which one is appropriate by thinking about what sort of ownership foo needs of the T: if it needs full ownership (e.g. passing it to different tasks), you should take it by value foo(x: T); on the other hand, if it only needs a view on to the data (i.e. no ownership), then take a reference foo(x: &T).
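A concrete, runnable version of the pattern above, assuming the long call chain produces a `String` (the names `report_len`, `report_upper`, and `use_twice` are hypothetical). The value is computed once, and each consumer only borrows it:

```rust
// Both helpers only need to read the value, so they take a reference.
fn report_len(x: &str) -> usize {
    x.len()
}

fn report_upper(x: &str) -> String {
    x.to_uppercase()
}

fn use_twice() -> (usize, String) {
    // Stand-in for the long `f1(f2(f3).f4(x, f5(y, z))).f6().f7()` chain:
    // it is evaluated exactly once.
    let your_thing: String = ["ab", "cd"].concat();
    let n = report_len(&your_thing); // borrow #1
    let up = report_upper(&your_thing); // borrow #2; `your_thing` is still owned here
    (n, up)
}
```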
See also "Moves vs Copy in Rust" for some background on moving and copying. It includes an explanation of why the &(...) + *x solution doesn't work: you can't move out from behind a reference (although in this case it would never work anyway, because moving out twice is illegal).
The same reasoning applies for pattern matching: if you only need a reference, you can take a reference into the value of interest via ref. E.g. imagine you're computing an Option<T>.
let x = v.func(func(v.func().func())).func();
match x {
    Some(ref y) => { /* y is a &T */ ... }
    None => { ... }
}
// the last `match` can move `x`
match x {
    Some(y) => { /* y is a T */ ... }
    None => { ... }
}
If the first match does need ownership of some parts of x, you can either clone x itself, or only those parts you need after matching with ref.
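Since Rust 1.26 (default binding modes, often called "match ergonomics"), matching on `&x` gives the same result as writing `ref` explicitly. A sketch with `Option<String>` standing in for `Option<T>` (`describe` is a hypothetical name):

```rust
fn describe(x: Option<String>) -> (usize, String) {
    // Matching on `&x` binds `y` as `&String`, just like `Some(ref y)`,
    // so `x` is not moved.
    let len = match &x {
        Some(y) => y.len(),
        None => 0,
    };
    // The last `match` may move out of `x`.
    let owned = match x {
        Some(y) => y, // `y` is a `String` here
        None => String::new(),
    };
    (len, owned)
}
```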