I'm doing LeetCode questions to get better at solving problems and expressing those solutions in Rust, and I've come across a case where it feels like the most natural way of expressing my answer includes unsafe code. Here's what I wrote:
const _0: u8 = '0' as u8;
const _1: u8 = '1' as u8;

pub fn add_binary(a: String, b: String) -> String {
    let a = a.as_bytes();
    let b = b.as_bytes();
    let len = a.len().max(b.len());
    let mut result = vec![_0; len];
    let mut carry = 0;
    for i in 1..=len {
        if i <= a.len() && a[a.len() - i] == _1 {
            carry += 1
        }
        if i <= b.len() && b[b.len() - i] == _1 {
            carry += 1
        }
        if carry & 1 == 1 {
            result[len - i] = _1;
        }
        carry >>= 1;
    }
    if carry == 1 {
        result.insert(0, _1);
    }
    unsafe { String::from_utf8_unchecked(result) }
}
The only usage of unsafe is an unchecked conversion of a Vec<u8> to a String, and there is no possibility of undefined behaviour here because the Vec only ever contains a sequence of two valid ASCII characters. I know I could easily do a checked conversion and unwrap it, but that feels a bit silly given how certain I am that the check can never fail. In idiomatic Rust, is this bad practice? Should unsafe be unconditionally avoided unless the needed performance can't be achieved without it, or are there exceptions (possibly like this) where it's okay? At the risk of making the question too opinionated, what are the rules that determine when it is and isn't okay to use unsafe?
You should avoid unsafe except in two situations:
You are doing something that is impossible to do in safe code, e.g. FFI calls. This is the main reason unsafe exists at all.
You have proven with benchmarks that unsafe provides a significant speed-up and that this code is a bottleneck.
Your argument
I know I could easily do a checked cast and unwrap it, but that feels a bit silly because of how certain I am that the check can never fail.
is valid for the current version of your code, but you would need to keep this unsafe in mind during all further development.
Unsafe greatly increases the cognitive complexity of code: you cannot change any part of your function without keeping the unsafe block in mind, for example.
I doubt that UTF-8 validation adds more overhead than the possible reallocation in result.insert(0, _1); in your code.
Other nitpicks:
You should add a comment to the unsafe block that explains why it is safe. It makes the code easier to read for other people (or for a future you, after a year of not touching it).
You could define your constants as const _0: u8 = b'0';
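Putting those nitpicks together, a minimal sketch of the same function with b'...' constants and a SAFETY comment; the checked alternative the question mentions is shown commented out:

const _0: u8 = b'0';
const _1: u8 = b'1';

pub fn add_binary(a: String, b: String) -> String {
    let a = a.as_bytes();
    let b = b.as_bytes();
    let len = a.len().max(b.len());
    let mut result = vec![_0; len];
    let mut carry = 0;
    for i in 1..=len {
        if i <= a.len() && a[a.len() - i] == _1 {
            carry += 1;
        }
        if i <= b.len() && b[b.len() - i] == _1 {
            carry += 1;
        }
        if carry & 1 == 1 {
            result[len - i] = _1;
        }
        carry >>= 1;
    }
    if carry == 1 {
        result.insert(0, _1);
    }
    // SAFETY: `result` only ever contains the ASCII bytes b'0' and b'1',
    // so it is always valid UTF-8.
    unsafe { String::from_utf8_unchecked(result) }
    // Or, the fully checked alternative, which costs one validation pass:
    // String::from_utf8(result).expect("result contains only ASCII '0' and '1'")
}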
In Rust, the order of functions does not matter, unlike C or C++ where functions must be declared before use (so main usually comes last). But it is not the same for closures, and I wonder why.
For example, this code compiles:
fn main() {
    println!("{}", add_up(1));
}

fn add_up(x: i32) -> i32 {
    x + y
}

const y: i32 = 1;
and this code does not compile:
fn main() {
    let add_up = |x: i32| x + y;
    let y = 1;
    println!("{}", add_up(1));
}
I know that in the first example y is a global constant, so it is not a fair comparison, but it is the example I can think of at the moment. The main point is: in general, order does not matter in Rust (where main is located, where add_up is located), but it is not the same for closures. Why?
The difference between a local variable (which can be captured by a closure) and a constant or static is that the latter have no "scope": they are available before the program starts and last for as long as the program runs. A variable, on the other hand, has a clear scope, a place where it is introduced and a place where it is destroyed, and an attempt to use it outside that region cannot work.
For example, if capturing variables that are not yet created were allowed, it might result in nonsensical code like this:
let c = || x;
println!("{}", c()); // what does this print?

let x = some_function_that_takes_user_input();
let x = x + 1;
{
    let x = 100;
}
A static or a const, on the other hand, doesn't have this issue, because it isn't created at any particular point in time; it simply exists for as long as the program runs.
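To tie this back to the question's example, a small sketch: a closure can use a const no matter where the const appears in the source, while a captured local variable has to be declared before the closure that captures it.

const Y: i32 = 1;

fn main() {
    // A const has no point of creation, so the closure can use it
    // regardless of where it is declared in the source.
    let add_const = |x: i32| x + Y;
    println!("{}", add_const(1)); // 2

    // A local variable must already be in scope when the closure that
    // captures it is created; declaring `y` first makes the question's
    // second example compile.
    let y = 1;
    let add_up = |x: i32| x + y;
    println!("{}", add_up(1)); // 2
}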
The process gets killed by the OOM killer; it seems memory isn't being released somewhere. Where does the problem arise: the static, the spawned task, or mem::swap? And how should I solve it?
#[macro_use]
extern crate lazy_static;

use std::mem;
use std::sync::Arc;

use parking_lot::Mutex;

lazy_static! {
    pub static ref Buffer: Arc<Mutex<Vec<i64>>> = Arc::new(Mutex::new(Vec::with_capacity(100)));
}

pub async fn consume() {
    let mut lk = Buffer.lock();
    lk.clear();
    drop(lk);
}

#[tokio::main]
async fn main() {
    let mut vec: Vec<i64> = Vec::with_capacity(100);
    loop {
        vec.push(1);
        if vec.len() == 100 {
            let mut lk = Buffer.lock();
            mem::swap(&mut vec, &mut *lk);
            drop(lk);
            tokio::spawn(async move {
                consume().await
            });
        }
    }
}
I think this is what you expect this code to do (please correct me if this is wrong):
vec is filled up to 100 elements and put into buffer.
while it is being filled up again, the background task frees the buffer.
rinse and repeat.
But consider what happens when your main loop manages to acquire the lock twice in a row, before the spawned task has run: the task didn't have time to clear the buffer, so after the mem::swap call, vec still has 100 elements.
From this point on, vec.len() == 100 will never be true and your code simply appends to the vector in a loop.
Note that this only needs to happen once and then your code never recovers.
The fix is, instead of only trying to hand off the vector when it has exactly 100 elements, to do so whenever it has at least 100 elements:
if vec.len() >= 100 {
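For reference, a sketch of the question's main function with only that condition changed (Buffer and consume stay exactly as in the question):

#[tokio::main]
async fn main() {
    let mut vec: Vec<i64> = Vec::with_capacity(100);
    loop {
        vec.push(1);
        // Swap whenever the local buffer has reached (or overshot) the
        // threshold, so one missed clear cannot wedge the loop forever.
        if vec.len() >= 100 {
            let mut lk = Buffer.lock();
            mem::swap(&mut vec, &mut *lk);
            drop(lk);
            tokio::spawn(async move { consume().await });
        }
    }
}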
I have a function that is supposed to search for primes within a given range. (The algorithm is not important; please ignore the fact that it is very inefficient.)
use std::thread;
use std::sync::Mutex;
use std::convert::TryInto;

/// Takes the search range as (start, end) and outputs a vector with the primes found within
/// that range.
pub fn run(range: (u32, u32)) -> Vec<u32> {
    let mut found_primes: Mutex<Vec<u32>> = Mutex::new(Vec::new());
    let num_threads: usize = 8;
    let num_threads_32: u32 = 8;
    let join_handles: Vec<thread::JoinHandle<()>> = Vec::with_capacity(num_threads);

    // ERROR: `found_primes` does not live long enough
    let vec_ref = &found_primes;

    for t in 0..num_threads_32 {
        thread::spawn(move || {
            let mut n = range.0 + t;
            'n_loop: while n < range.1 {
                for divisor in 2..n {
                    if n % divisor == 0 {
                        n += num_threads_32;
                        continue 'n_loop;
                    }
                }
                // This is the part where I try to add a number to the vector
                vec_ref.lock().expect("Mutex was poisoned!").push(n);
                n += num_threads_32;
            }
            println!("Thread {} is done.", t);
        });
    }

    for handle in join_handles {
        handle.join();
    }

    // ERROR: cannot move out of dereference of `std::sync::MutexGuard<'_, std::vec::Vec<u32>>`
    *found_primes.lock().expect("Mutex was poisoned!")
}
I managed to get it working with std::sync::mpsc, but I'm pretty sure it can be done just with mutexes. However, the borrow checker doesn't like it.
The errors are in comments. I (think I) understand the first error: the compiler can't prove that &found_primes won't be used within a thread after found_primes is dropped (when the function returns), even though I .join() all the threads before that. I'm guessing I'll need unsafe code to make it work. I don't understand the second error, though.
Can someone explain the errors and tell me how to do this with only Mutexes?
The last error is complaining about trying to move the contents out of the mutex. The .lock() returns a MutexGuard that only yields a reference to the contents. You can't move from it or it would leave the mutex in an invalid state. You can get an owned value by cloning, but that shouldn't be necessary if the mutex is going away anyway.
You can use .into_inner() to consume the mutex and return what was in it.
found_primes.into_inner().expect("Mutex was poisoned!")
See this fix and the other linked fix on the playground.
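The playground links aren't reproduced here, but as a rough sketch (not necessarily the linked fix), the whole function can be written with only a Mutex by using scoped threads (std::thread::scope, stable since Rust 1.63), which guarantee the threads are joined before the borrow of found_primes ends, and by using into_inner() at the end. The algorithm is kept as in the question; only the threading and the final extraction change:

use std::sync::Mutex;
use std::thread;

pub fn run(range: (u32, u32)) -> Vec<u32> {
    let found_primes: Mutex<Vec<u32>> = Mutex::new(Vec::new());
    let num_threads: u32 = 8;

    // The scope joins every spawned thread before it returns, so borrowing
    // `found_primes` from the stack is provably fine.
    thread::scope(|s| {
        for t in 0..num_threads {
            let vec_ref = &found_primes;
            s.spawn(move || {
                let mut n = range.0 + t;
                'n_loop: while n < range.1 {
                    for divisor in 2..n {
                        if n % divisor == 0 {
                            n += num_threads;
                            continue 'n_loop;
                        }
                    }
                    vec_ref.lock().expect("Mutex was poisoned!").push(n);
                    n += num_threads;
                }
                println!("Thread {} is done.", t);
            });
        }
    });

    // Consume the mutex and take ownership of the Vec inside it.
    found_primes.into_inner().expect("Mutex was poisoned!")
}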
I'm new to Rust; this is the first Rust code I've written. I'm trying to do some benchmarks for an app we will build, comparing against Go, but my Rust POC is ridiculously slow, and I'm sure it's because I don't fully understand the language yet. This runs in seconds in Go, but has been running for many minutes in Rust:
use serde_json::{Result, Value};
use std::fs::File;
use std::io::BufReader;

fn rule(data: Value) {
    for _i in 0..1000000000 {
        let ru = "589ea4b8-99d1-8d05-9358-4c172c10685b";
        let s = 0 as usize;
        let tl = data["tl"].as_array().unwrap().len();
        for i in s..tl {
            if data["tl"][i]["t"] == "my_value" && data["tl"][i]["refu"] == ru {
                //println!(" t {} matched with reference/ru {}\n", data["tl"][i]["t"], data["tl"][i]["refu"]);
                let el = data["el"].as_array().unwrap().len();
                for j in s..el {
                    if data["el"][j]["is_inpatient"] == true && data["el"][j]["eu"] == data["tl"][i]["eu"] {
                        //println!(" e {} matched.\n", data["el"][j]["eu"]);
                    }
                }
            }
        }
    }
}

fn start() -> Result<()> {
    let file = File::open("../../data.json").expect("File should open read only");
    let reader = BufReader::new(file);
    let v: Value = serde_json::from_reader(reader).expect("JSON was not well-formatted");
    //println!("Running rule");
    rule(v);
    Ok(())
}

fn main() {
    let _r = start();
}
1) I know this is super ugly. It's just a speed POC so if it wins I plan to figure out the language in more detail later.
2) The big one: What am I doing wrong here that's causing Rust to perform so slowly?
Build with the --release flag. Without it you get an unoptimized build with debug checks, which can literally be 100 times slower. Adding that flag is usually all you need to do to beat Go on execution speed, because Go doesn't have a heavy optimizer like Rust does.
Rust doesn't do anything clever with caching of [] access, so every time you repeat data["tl"] it searches data for "tl" again. It would be good to cache that lookup in a variable.
Loops of the form for i in 0..len { arr[i] } are the slowest kind of loop in Rust. It's faster to use iterators: for item in arr { item }. That's because arr[i] does an extra bounds check that iterators don't need. In your case that's probably a tiny issue; it matters more for heavy numeric code. A sketch combining the last two points follows below.
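As an illustration rather than a measured claim, here is a sketch of the rule function with the data["tl"] / data["el"] lookups hoisted out of the loops and the index loops replaced by iterators, keeping the same comparisons as the question:

use serde_json::Value;

fn rule(data: Value) {
    let ru = "589ea4b8-99d1-8d05-9358-4c172c10685b";
    // Look the arrays up once instead of re-searching `data` on every iteration.
    let tl = data["tl"].as_array().expect("`tl` should be an array");
    let el = data["el"].as_array().expect("`el` should be an array");

    for _i in 0..1000000000 {
        for t_item in tl {
            if t_item["t"] == "my_value" && t_item["refu"] == ru {
                for e_item in el {
                    if e_item["is_inpatient"] == true && e_item["eu"] == t_item["eu"] {
                        // matched
                    }
                }
            }
        }
    }
}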
Since unwrap crashes in the error case, using it may be considered dangerous.
What if I am a hundred percent sure that it will not crash, as in the following scenarios:
if option.is_some() {
    let value = option.unwrap();
}

if result.is_ok() {
    let result_value = result.unwrap();
}
Since we already checked the Result and Option, there will be no crash from the unwrap() usage. However, we could have used match or if let instead, and in my opinion match or if let is more elegant. Is using unwrap after such a check considered bad practice?
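For reference, the if let form mentioned above would look roughly like this (using the same option and result values):

// Check and extract in one step; no unwrap needed.
if let Some(value) = option {
    // use `value`
}

if let Ok(result_value) = result {
    // use `result_value`
}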
Let's focus on Result; I'll go back to Option at the end.
The purpose of Result is to signal a result which may succeed or fail with an error. As such, any use of it should fall into this category. Let's ignore the cases where a crate returns Result for impossible-to-fail operations.
By doing what you are doing (checking if result.is_ok(), then extracting the value), you're effectively doing the same thing twice: you inspect the content of the Result once, and then unwrap checks it again before extracting the value.
This could indeed have been done with a match or map, and both would have been more idiomatic than an if. Consider this case for a moment:
You have an object implementing the following trait:
use std::io::{Error, ErrorKind};

trait Worker {
    fn hours_left(&self) -> Result<u8, Error>;
    fn allocate_hours(&mut self, hours: u8) -> Result<u8, Error>;
}
We're going to assume hours_left() does exactly what it says on the tin. We'll also assume we have a mutable borrow of Worker. Let's implement allocate_hours().
In order to do so, we'll obviously need to check whether our worker has extra hours left over to allocate. You could write it in a similar fashion to yours:
fn allocate_hours(&mut self, hours: u8) -> Result<u8, Error> {
    let hours_left = self.hours_left();
    if hours_left.is_ok() {
        let remaining_hours = hours_left.unwrap();
        if remaining_hours < hours {
            return Err(Error::new(ErrorKind::NotFound, "Not enough hours left"));
        }
        // Do the actual operation and return
    } else {
        return hours_left;
    }
}
However, this implementation is both clunky and inefficient. We can simplify this by avoiding unwrap and if statements altogether.
fn allocate_hours(&mut self, hours: u8) -> Result<u8, Error> {
    self.hours_left()
        .and_then(|hours_left| {
            // We are certain that our worker is actually there to receive hours
            // but we are not sure if he has enough hours. Check.
            match hours_left {
                x if x >= hours => Ok(x),
                _ => Err(Error::new(ErrorKind::NotFound, "Not enough hours")),
            }
        })
        .map(|hours_left| {
            // At this point we are sure the worker has enough hours.
            // Do the operations
        })
}
We've killed multiple birds with one stone here. We've made our code more readable, easier to follow and we've removed a whole bunch of repeated operations. This is also beginning to look like Rust and less like PHP ;-)
Option is similar and supports the same operations. If you want to process the content of either Option or Result and branch accordingly, and you're using unwrap, there are so many pitfalls you'll inevitably fall into when you forget you unwrapped something.
There are genuine cases where your program should bail out. For those, consider expect(&str) as opposed to unwrap().
In many, many cases you can avoid unwrap and friends by more elegant means. However, I think there are situations where unwrap is the correct solution.
For example, many methods on Iterator return an Option. Suppose you have a slice that is known to be nonempty by invariants and you want to obtain the maximum; you could do the following:
assert!(!slice.is_empty()); // known to be nonempty by invariants
do_stuff_with_maximum(slice.iter().max().unwrap());
There are probably several opinions regarding this, but I would argue that using unwrap in the above scenario is perfectly fine, given the preceding assert!.
My guideline is: if all the parameters I am dealing with come from my own code, not from interfacing with third-party code, possibly with assert!ed invariants, I am fine with unwrap. As soon as I am the slightest bit unsure, I resort to if, match, map and the others.
Note that there is also expect, which is basically an "unwrap with a comment printed in the error case". However, I have not found it particularly ergonomic, and I find the backtraces a bit hard to read when unwrap fails. Thus, I currently use a macro verify! whose sole argument is an Option or Result and which checks that the value is unwrappable. It is implemented like this:
pub trait TVerifiableByVerifyMacro {
    fn is_verify_true(&self) -> bool;
}

impl<T> TVerifiableByVerifyMacro for Option<T> {
    fn is_verify_true(&self) -> bool {
        self.is_some()
    }
}

impl<TOk, TErr> TVerifiableByVerifyMacro for Result<TOk, TErr> {
    fn is_verify_true(&self) -> bool {
        self.is_ok()
    }
}

macro_rules! verify {($e: expr) => {{
    let e = $e;
    assert!(e.is_verify_true(), "verify!({}): {:?}", stringify!($e), e);
    e
}}}
Using this macro, the aforementioned example could be written as:
assert!(!slice.is_empty()); // known to be nonempty by invariants
do_stuff_with_maximum(verify!(slice.iter().max()).unwrap());
If I can't unwrap the value, I get an error message mentioning slice.iter().max(), so I can quickly search my codebase for the place where the error occurred. (Which is, in my experience, faster than looking through the backtrace for the origin of the error.)