Does partial application in Rust have overhead?


I like using partial application because it lets me (among other things) split a complicated function call into parts, which is more readable.
An example of partial application:
fn add(x: i32, y: i32) -> i32 {
    x + y
}
fn main() {
    let add7 = |x| add(7, x);
    println!("{}", add7(35));
}
Is there overhead to this practice?
Here is the kind of thing I like to do (from real code):
fn foo(n: u32, mut things: Vec<Things>) {
    let create_new_multiplier = |thing| ThingMultiplier::new(thing, n); // ThingMultiplier is an Iterator
    let new_things = things.clone().into_iter().flat_map(create_new_multiplier);
    things.extend(new_things);
}
This is purely stylistic: I don't like nesting things too deeply.

There should not be a performance difference between defining the closure before it's used versus defining and using it directly. There is a type system difference: the compiler doesn't fully know how to infer types in a closure that isn't immediately called.
In code:
let create_new_multiplier = |thing| ThingMultiplier::new(thing, n);
things.clone().into_iter().flat_map(create_new_multiplier)
will be the exact same as
things.clone().into_iter().flat_map(|thing| {
    ThingMultiplier::new(thing, n)
})
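To check the equivalence, here is a self-contained version of both forms. Thing and ThingMultiplier are hypothetical stand-ins (the question does not show them); this ThingMultiplier simply yields n clones of its thing. Only the placement of the closure differs between the two functions.

```rust
// Hypothetical stand-ins for the question's types, which are not shown.
#[derive(Clone, Debug, PartialEq)]
struct Thing(u32);

// An iterator that yields `remaining` clones of `thing`.
struct ThingMultiplier {
    thing: Thing,
    remaining: u32,
}

impl ThingMultiplier {
    fn new(thing: Thing, n: u32) -> Self {
        ThingMultiplier { thing, remaining: n }
    }
}

impl Iterator for ThingMultiplier {
    type Item = Thing;
    fn next(&mut self) -> Option<Thing> {
        if self.remaining == 0 {
            None
        } else {
            self.remaining -= 1;
            Some(self.thing.clone())
        }
    }
}

// Closure bound to a name before use.
fn named(n: u32, mut things: Vec<Thing>) -> Vec<Thing> {
    let create_new_multiplier = |thing| ThingMultiplier::new(thing, n);
    let new_things = things.clone().into_iter().flat_map(create_new_multiplier);
    things.extend(new_things);
    things
}

// Closure written inline at the call site.
fn inline(n: u32, mut things: Vec<Thing>) -> Vec<Thing> {
    let new_things = things
        .clone()
        .into_iter()
        .flat_map(|thing| ThingMultiplier::new(thing, n));
    things.extend(new_things);
    things
}

fn main() {
    let input = vec![Thing(1), Thing(2)];
    assert_eq!(named(2, input.clone()), inline(2, input.clone()));
    println!("{:?}", named(2, input));
}
```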
In general, there should not be a performance cost for using closures. This is what Rust means by "zero cost abstraction": the programmer could not have written it better themselves.
The compiler converts a closure into an anonymous struct that holds the captured variables and implements the appropriate Fn* traits. At that point, all the normal compiler optimizations kick in. Because of techniques like monomorphization, it may even be faster than alternatives such as function pointers. It also means closures are ordinary code as far as performance goes, so use normal profiling to find out whether they are a bottleneck.
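To make the "anonymous struct" point concrete, here is roughly what a partially applied add looks like when written by hand. AddN and its call method are illustrative names: the real closure type is unnamed and implements the Fn* traits, which cannot be implemented manually on stable Rust, so an ordinary method stands in for Fn::call. apply_twice is a hypothetical generic function showing where monomorphization applies.

```rust
fn add(x: i32, y: i32) -> i32 {
    x + y
}

// Hand-written stand-in for the anonymous type behind `move |y| add(n, y)`:
// the captured variable becomes a field.
struct AddN {
    n: i32,
}

impl AddN {
    // Plays the role of `Fn::call`, which cannot be implemented by hand on stable.
    fn call(&self, y: i32) -> i32 {
        add(self.n, y)
    }
}

// Generic over the closure type: each closure type passed in gets its own
// monomorphized copy of this function, so the call can be inlined.
fn apply_twice<F: Fn(i32) -> i32>(f: F, x: i32) -> i32 {
    f(f(x))
}

fn main() {
    let n = 7;
    let add_n = move |y| add(n, y); // partial application of `add`
    let manual = AddN { n };

    assert_eq!(add_n(35), 42);
    assert_eq!(manual.call(35), 42);
    println!("{}", apply_twice(add_n, 28)); // 42
}
```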

In your particular example, yes, the overhead can be optimized away: extend can be inlined as a loop containing another loop for the flat_map, which in turn just places the ThingMultiplier instances into the same stack slots that hold n and thing.
But you're barking up the wrong efficiency tree here. Instead of wondering whether the allocation of a small struct holding two fields gets optimized away, you should be asking how efficient that clone is, since it copies the whole Vec, especially for large inputs.

Related

Stable alternative to collect_into - or - how do I collect a sized queue?

I have a pipeline that manipulates an iterator over a very large data set, and at the end I want to keep only the top N values.
I wrote a wrapper around a Vec: a struct that holds the Vec and its maximum size, and implements insertion so that the data in the Vec is always ordered and values that are too small are ignored (I could also have used a BTreeSet if N is large enough).
Anyway, I thought I'd use it as follows:
let mut q = SizedQueue(5);
<my iterator pipeline>.collect_into(&mut q);
but I was disappointed to discover that collect_into is unstable and might be dropped as unnecessary, the stated reason being that the same thing can be done differently.
My question is: how could it be done differently (other than implementing a trait for Iterator with this functionality myself)?

collect_into() is just a convenient shortcut for calling Extend::extend():
let mut q = SizedQueue(5);
q.extend(<my iterator pipeline>);
Of course, you need to implement Extend for your type. A simple implementation may look like:
impl<T: PartialOrd> Extend<T> for SizedQueue<T> {
    fn extend<I: IntoIterator<Item = T>>(&mut self, iter: I) {
        for item in iter {
            self.push(item);
        }
    }
}
But if this is only needed at the one site where you call extend(), you may as well inline it: loop over the iterator and call push() directly.
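A self-contained sketch of the whole thing follows. The SizedQueue fields, new() and push() are assumptions, since the question does not show them: push keeps the items in descending order using a binary search (slice::partition_point) and drops anything that would land past max. The Extend impl is the one from the answer.

```rust
/// Hypothetical version of the question's wrapper: keeps at most `max`
/// items, sorted in descending order, ignoring values that are too small.
struct SizedQueue<T> {
    items: Vec<T>,
    max: usize,
}

impl<T: PartialOrd> SizedQueue<T> {
    fn new(max: usize) -> Self {
        SizedQueue { items: Vec::with_capacity(max), max }
    }

    fn push(&mut self, item: T) {
        // Position of the first element smaller than `item` (list is descending).
        let pos = self.items.partition_point(|x| *x >= item);
        if pos >= self.max {
            return; // the queue is full and `item` is too small to keep
        }
        self.items.insert(pos, item);
        self.items.truncate(self.max);
    }
}

// The `Extend` impl from the answer.
impl<T: PartialOrd> Extend<T> for SizedQueue<T> {
    fn extend<I: IntoIterator<Item = T>>(&mut self, iter: I) {
        for item in iter {
            self.push(item);
        }
    }
}

fn main() {
    let mut q = SizedQueue::new(3);
    // Stand-in for "<my iterator pipeline>": yields 7, 3, 10, 6, 2, 9, 5, 1, 8, 4.
    q.extend((1..=10).map(|x| x * 7 % 11));
    println!("{:?}", q.items); // [10, 9, 8]
}
```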

How to dynamically signal to Rust compiler that a given variable is non-zero?

I'd like to try to eliminate bounds checking on code generated by Rust. I have variables that are rarely zero and my code paths ensure they do not run into trouble. But because they can be, I cannot use NonZeroU64. When I am sure they are non-zero, how can I signal this to the compiler?
For example, if I have the following function, I know it will be non-zero. Can I tell the compiler this or do I have to have the unnecessary check?
pub fn f(n: u64) -> u32 {
    n.trailing_zeros()
}
I can wrap the number in NonZeroU64 when I am sure, but then I've already incurred the check, which defeats the purpose ...

Redundant checks within a single function body can usually be optimized out. So you just need to convert the number to NonZeroU64 before calling trailing_zeros(), and rely on the compiler to optimize away the redundant check.
use std::num::NonZeroU64;
pub fn g(n: NonZeroU64) -> u32 {
    n.trailing_zeros()
}
pub fn other_fun(n: u64) -> u32 {
    if n != 0 {
        println!("Do something with non-zero!");
        let n = NonZeroU64::new(n).unwrap();
        g(n)
    } else {
        42
    }
}
In the above code, the if n != 0 check guarantees that n is non-zero within the block, and the compiler is smart enough to remove the unwrap call, making NonZeroU64::new(n).unwrap() a zero-cost operation. You can check the generated assembly to verify that.
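The same idea can also be written without unwrap() at all: matching on NonZeroU64::new(n) performs the zero check once and produces a NonZeroU64 only in the non-zero branch. This sketch reuses g and other_fun from above, with the println! dropped; it relies on NonZeroU64::trailing_zeros, which is available in Rust 1.53 and later.

```rust
use std::num::NonZeroU64;

fn g(n: NonZeroU64) -> u32 {
    n.trailing_zeros()
}

// The zero test and the conversion are a single operation here, so there is
// no `unwrap()` left for the optimizer to remove.
pub fn other_fun(n: u64) -> u32 {
    match NonZeroU64::new(n) {
        Some(nz) => g(nz),
        None => 42,
    }
}

fn main() {
    println!("{}", other_fun(8)); // 3
    println!("{}", other_fun(0)); // 42
}
```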

On nightly Rust there is also core::intrinsics::assume, documented as follows:
Informs the optimizer that a condition is always true. If the condition is false, the behavior is undefined.
No code is generated for this intrinsic, but the optimizer will try to preserve it (and its condition) between passes, which may interfere with optimization of surrounding code and reduce performance. It should not be used if the invariant can be discovered by the optimizer on its own, or if it does not enable any significant optimizations.
This intrinsic does not have a stable counterpart.
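Newer Rust releases (1.81 and later) do provide a stable counterpart, std::hint::assert_unchecked. On any stable version, std::hint::unreachable_unchecked can express the same promise, as in this sketch (trailing_zeros_nonzero is a hypothetical name). Like assume, breaking the promise is undefined behavior, so the function itself has to be unsafe and its callers take on the obligation.

```rust
use std::hint::unreachable_unchecked;

/// Returns the number of trailing zeros of `n`.
///
/// # Safety
/// The caller must guarantee `n != 0`; passing zero is undefined behavior.
pub unsafe fn trailing_zeros_nonzero(n: u64) -> u32 {
    if n == 0 {
        // SAFETY: the caller promises `n != 0`, so this branch never runs.
        // Telling the optimizer so lets it drop any zero handling below.
        unsafe { unreachable_unchecked() }
    }
    n.trailing_zeros()
}

fn main() {
    // SAFETY: 40 is non-zero.
    let tz = unsafe { trailing_zeros_nonzero(40) };
    println!("{}", tz); // 40 = 0b101000, so 3
}
```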

How can I mutate a shared variable from multiple threads, disregarding data races?

How can I mutate the variable i inside the closure? Race conditions are considered to be acceptable.
use rayon::prelude::*;
fn main() {
    let mut i = 0;
    let mut closure = |_| {
        i = i + 1;
    };
    (0..100).into_par_iter().for_each(closure);
}
This code fails with:
error[E0525]: expected a closure that implements the `Fn` trait, but this closure only implements `FnMut`
  --> src\main.rs:6:23
   |
6  |     let mut closure = |_| {
   |                       ^^^ this closure implements `FnMut`, not `Fn`
7  |         i = i + 1;
   |         - closure is `FnMut` because it mutates the variable `i` here
...
10 |     (0..100).into_par_iter().for_each(closure);
   |                                       ------- the requirement to implement `Fn` derives from here

There is a difference between a race condition and a data race.
A race condition is any situation when the outcome of two or more events depends on which one happens first, and nothing enforces a relative ordering between them. This can be fine, and as long as all possible orderings are acceptable, you may accept that your code has a race in it.
A data race is a specific kind of race condition where the events are unsynchronized accesses to the same memory and at least one of them is a mutation. Data races are undefined behavior. You cannot "accept" a data race because its existence invalidates the entire program; a program with an unavoidable data race in it does not have any defined behavior at all, so it does nothing useful.
Here's a version of your code that has a race condition, but not a data race:
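use rayon::prelude::*;
use std::sync::atomic::{AtomicI32, Ordering};

fn main() {
    let i = AtomicI32::new(0);
    let closure = |_| {
        // Separate load and store: increments can be lost (a race condition),
        // but every access is atomic, so there is no data race.
        i.store(i.load(Ordering::Relaxed) + 1, Ordering::Relaxed);
    };
    (0..100).into_par_iter().for_each(closure);
}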
Because the loads and stores are not ordered with respect to the concurrently executing threads, there is no guarantee that the final value of i will be exactly 100. It could be 99, or 72, or 41, or even 1. This code has indeterminate, but defined behavior because although you don't know the exact order of events or the final outcome, you can still reason about its behavior. In this case, you can prove that the final value of i must be at least 1 and no greater than 100.
Note that in order to write this racy code, I still had to use AtomicI32 and atomic load and store. Not caring about the order of events in different threads doesn't free you from having to think about synchronizing memory access.
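For experimenting without rayon, the same race condition can be reproduced with only the standard library. This sketch uses std::thread::scope (stable since Rust 1.63) in place of into_par_iter, and racy_count is a hypothetical helper name. The result is never below 1 or above threads * per_thread, but the exact value can differ from run to run.

```rust
use std::sync::atomic::{AtomicI32, Ordering};
use std::thread;

/// Spawns `threads` scoped threads that each perform `per_thread` racy
/// increments: a separate atomic load and store, so updates can be lost.
fn racy_count(threads: usize, per_thread: usize) -> i32 {
    let i = AtomicI32::new(0);
    thread::scope(|s| {
        for _ in 0..threads {
            s.spawn(|| {
                for _ in 0..per_thread {
                    // Race condition (lost updates are possible), but no data race.
                    i.store(i.load(Ordering::Relaxed) + 1, Ordering::Relaxed);
                }
            });
        }
    });
    i.into_inner()
}

fn main() {
    // Well-defined, but not necessarily 100.
    println!("{}", racy_count(4, 25));
}
```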
If your original code compiled, it would have a data race.¹ This means there are no guarantees about its behavior at all. So, assuming you actually accept data races, here's a version of your code that is consistent with what a compiler is allowed to do with it:
fn main() {}
Oh, right, undefined behavior must never occur. So this hypothetical compiler just deleted all your code because it is never allowed to run in the first place.
It's actually even worse than that. Suppose you had written something like this:
fn main() {
    let mut i = 0;
    let mut closure = |_| {
        i = i + 1;
    };
    (0..100).into_par_iter().for_each(closure);
    if i < 100 || i >= 100 {
        println!("this should always print");
    } else {
        println!("this should never print");
    }
}
What should this code print? If there are no data races, this code must emit the following:
this should always print
But if we allow data races, it might also print this:
this should never print
Or it could even print this:
this should never print
this should always print
If you think there is no way it could do the last thing, you are wrong. Undefined behavior in a program cannot be accepted, because it invalidates analysis even of correct code that has nothing obvious to do with the original error.
How likely is any of this to happen if you just use unsafe and ignore the possibility of a data race? Probably not very likely, to be honest. If you use unsafe to bypass the checks and look at the generated assembly, it's likely to even be correct. But the only way to be sure is to write assembly directly and code to the machine model. If you want to use Rust, you have to code to Rust's model, even if that means you lose a little performance.
How much performance? Probably not much, if anything. Atomic operations are very efficient and on many architectures, including the one you're probably using right now to read this, they actually are exactly as fast as non-atomic operations in cases like this. If you really want to know how much potential performance you lose, write both versions and benchmark them, or simply compare the assembly code with and without atomic operations.
¹ Technically, we can't say that a data race must occur, because it depends on whether any threads actually access i at the same time or not. If for_each decided for some reason to run all the closures on the same OS thread, for example, this code would not have a data race. But the fact that it may have a data race still poisons our analysis because we can't be sure it doesn't.

You cannot do exactly that: you need to ensure that some safe synchronization happens underneath, for example by using an Arc plus atomic operations.
There are examples in the documentation:
use std::sync::Arc;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::thread;
let val = Arc::new(AtomicUsize::new(5));
for _ in 0..10 {
    let val = Arc::clone(&val);
    thread::spawn(move || {
        let v = val.fetch_add(1, Ordering::SeqCst);
        println!("{:?}", v);
    });
}
(As Adien4 points out, the Arc and the move are not needed in the second example: Rayon only requires the closure to be Send + Sync.)
That leads to your example, which could be adapted as:
use std::sync::atomic::{AtomicUsize, Ordering};
use rayon::prelude::*;
fn main() {
    let i = AtomicUsize::new(0);
    let closure = |_| {
        // fetch_add is a single atomic read-modify-write, so no increments are lost.
        i.fetch_add(1, Ordering::SeqCst);
    };
    (0..100).into_par_iter().for_each(closure);
    println!("{}", i.load(Ordering::SeqCst)); // 100
}

This is not possible as written, because it would require unsynchronized parallel access to i, which is a data race. You can use a Mutex to allow access from multiple threads.
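A minimal sketch of the Mutex approach, using std::thread::scope instead of rayon so that it only needs the standard library (count_with_mutex is a hypothetical name). Because the lock is held across the whole read-modify-write, no increment is lost and the result is always threads * per_thread.

```rust
use std::sync::Mutex;
use std::thread;

fn count_with_mutex(threads: usize, per_thread: usize) -> i32 {
    let i = Mutex::new(0);
    thread::scope(|s| {
        for _ in 0..threads {
            s.spawn(|| {
                for _ in 0..per_thread {
                    // The guard serializes each read-modify-write of `i`.
                    *i.lock().unwrap() += 1;
                }
            });
        }
    });
    i.into_inner().unwrap()
}

fn main() {
    println!("{}", count_with_mutex(4, 25)); // 100
}
```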

The accepted answer explains the situation thoroughly: you definitely don't want data races in your code, because they are undefined behavior, and distinct from the more general "race conditions". Nor do you need data races to update shared data; there are better, more efficient ways to do that. But to satisfy curiosity, this answer attempts to answer the question as literally asked: if you were reckless enough to intentionally ignore data races and incur undefined behavior at your own peril, could you do it in unsafe Rust?
You indeed can. Code and discussion in this answer is provided for educational purposes, such as to check what kind of code the compiler generates. If code that intentionally incurs UB offends you, please stop reading here. You've been warned. :)
The obvious way to convince Rust to allow this data race is to create a raw pointer to mut i, send the pointer to the closure, and dereference it to mutate i. This dereference is unsafe because it leaves it to the programmer to ensure that no mutable references exist simultaneously, and that writes to the underlying data are synchronized with other accesses to it. While we can easily ensure the former by just not creating a reference, we obviously won't ensure the latter:
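use rayon::prelude::*;
// Must wrap raw pointer in type that implements Sync.
struct Wrap(*mut i32);
unsafe impl Sync for Wrap {}
impl Wrap {
    // Going through a method makes the closure capture the whole `Wrap`
    // (which is Sync) instead of only its `*mut i32` field (which is not),
    // as Rust 2021's disjoint closure captures would otherwise do.
    fn get(&self) -> *mut i32 {
        self.0
    }
}
// Contains undefined behavior - don't use this!
fn main() {
    let mut i = 0;
    let i_ptr = Wrap(&mut i as *mut i32);
    let closure = |_| {
        unsafe { *i_ptr.get() = *i_ptr.get() + 1 }; // XXX: UB!
    };
    (0..100).into_par_iter().for_each(closure);
    println!("{}", i);
}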
Note that raw pointers implement neither Sync nor Send, so they require a wrapper to be used across threads. The wrapper unsafely implements Sync, but this unsafe is not itself UB: accessing the pointer is safe, and there would be no UB if we, say, only printed it, or even dereferenced it for reading (as long as no one else writes to i). Writing through the dereferenced pointer is where we create UB, and that itself requires unsafe.
While this is the kind of code the OP might have been after (it even prints 100 when run), it is still undefined behavior and could break on different hardware or with a different compiler version. Even a slight change to the code, such as using let i_ref = unsafe { &mut *i_ptr.0 } to create a mutable reference and updating it with *i_ref += 1, will change its behavior.
In the context of C++11 Hans Boehm wrote an entire article on the danger of so-called "benign" data races, and why they cannot be allowed in the C++ memory model (which Rust shares).

A cell with interior mutability allowing arbitrary mutation actions

Standard Cell struct provides interior mutability but allows only a few mutation methods such as set(), swap() and replace(). All of these methods change the whole content of the Cell.
However, sometimes more specific manipulations are needed, for example, to change only a part of data contained in the Cell.
So I tried to implement some kind of universal Cell, allowing arbitrary data manipulation.
The manipulation is represented by a user-defined closure that accepts a single argument, a &mut reference to the interior data of the Cell, so the user can decide what to do with the Cell interior. The code below demonstrates the idea:
use std::cell::UnsafeCell;
struct MtCell<Data>{
    dcell: UnsafeCell<Data>,
}
impl<Data> MtCell<Data>{
    fn new(d: Data) -> MtCell<Data> {
        return MtCell{dcell: UnsafeCell::new(d)};
    }
    fn exec<F, RetType>(&self, func: F) -> RetType where
        RetType: Copy,
        F: Fn(&mut Data) -> RetType
    {
        let p = self.dcell.get();
        let pd: &mut Data;
        unsafe{ pd = &mut *p; }
        return func(pd);
    }
}
// test:
type MyCell = MtCell<usize>;
fn main(){
    let c: MyCell = MyCell::new(5);
    println!("initial state: {}", c.exec(|pd| {return *pd;}));
    println!("state changed to {}", c.exec(|pd| {
        *pd += 10; // modify the interior "in place"
        return *pd;
    }));
}
However, I have some concerns regarding the code.
Is it safe, i.e., can some safe but malicious closure break Rust mutability/borrowing/lifetime rules by using this "universal" cell?
I consider it safe since lifetime of the interior reference parameter prohibits its exposition beyond the closure call time. But I still have doubts (I'm new to Rust).
Maybe I'm re-inventing the wheel and there exist some templates or techniques solving the problem?
Note: I posted the question here (not on Code Review) as it seems more related to the language than to the code itself (which just represents a concept).
[EDIT] I want a zero-cost abstraction without the possibility of runtime failures, so RefCell is not a perfect solution.

This is a very common pitfall for Rust beginners.
Is it safe, i.e can some safe but malicious closure break Rust mutability/borrowing/lifetime rules by using this "universal" cell? I consider it safe since lifetime of the interior reference parameter prohibits its exposition beyond the closure call time. But I still have doubts (I'm new to Rust).
In a word, no.
fn main() {
    let mt_cell = MtCell::new(123i8);
    mt_cell.exec(|ref1: &mut i8| {
        mt_cell.exec(|ref2: &mut i8| {
            println!("Double mutable ref!: {:?} {:?}", ref1, ref2);
        })
    })
}
You're absolutely right that the reference cannot be used outside of the closure, but inside the closure, all bets are off! In fact, pretty much any operation (read or write) on the cell within the closure is undefined behavior (UB), and may cause corruption/crashes anywhere in your program.
Maybe I'm re-inventing the wheel and there exist some templates or techniques solving the problem?
Using Cell is often not the best technique, but it's impossible to know what the best solution is without knowing more about the problem.
If you insist on Cell, there are safe ways to do this. The unstable (i.e. nightly-only) Cell::update() method is literally implemented with the following code (when T: Copy):
pub fn update<F>(&self, f: F) -> T
where
    F: FnOnce(T) -> T,
{
    let old = self.get();
    let new = f(old);
    self.set(new);
    new
}
Or you could use Cell::get_mut(), but I guess that defeats the whole purpose of Cell.
However, usually the best way to change only part of a Cell is by breaking it up into separate Cells. For example, instead of Cell<(i8, i8, i8)>, use (Cell<i8>, Cell<i8>, Cell<i8>).
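A small sketch of both suggestions on stable Rust. Rgb is a hypothetical struct split into one Cell per field, and update_cell is a hypothetical free function mirroring the Cell::update code quoted above for T: Copy. Because the closure receives a copy of the value rather than a &mut into the cell, it cannot produce the double mutable reference shown earlier.

```rust
use std::cell::Cell;

// One Cell per field instead of a single Cell<(i8, i8, i8)>.
struct Rgb {
    r: Cell<i8>,
    g: Cell<i8>,
    b: Cell<i8>,
}

// Stable equivalent of the `update` pattern for `T: Copy`:
// read a copy, transform it, write it back.
fn update_cell<T: Copy>(cell: &Cell<T>, f: impl FnOnce(T) -> T) -> T {
    let new = f(cell.get());
    cell.set(new);
    new
}

fn main() {
    let color = Rgb { r: Cell::new(1), g: Cell::new(2), b: Cell::new(3) };
    let shared: &Rgb = &color; // only a shared reference is needed
    update_cell(&shared.g, |g| g + 10);
    println!("{} {} {}", shared.r.get(), shared.g.get(), shared.b.get()); // 1 12 3
}
```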
Still, IMO, Cell is rarely the best solution. Interior mutability is a common design in C and many other languages, but it is somewhat more rare in Rust, at least via shared references and Cell, for a number of reasons (e.g. it's not Sync, and in general people don't expect interior mutability without &mut). Ask yourself why you are using Cell and if it is really impossible to reorganize your code to use normal &mut references.
IMO the bottom line is actually about safety: if no matter what you do, the compiler complains and it seems that you need to use unsafe, then I guarantee you that 99% of the time either:
There's a safe (but possibly complex/unintuitive) way to do it, or
It's actually undefined behavior (like in this case).
EDIT: Frxstrem's answer also has better info about when to use Cell/RefCell.

Your code is not safe, since you can call c.exec inside c.exec to get two mutable references to the cell contents, as demonstrated by this snippet containing only safe code:
use std::cell::RefCell;
fn main() {
    let c: MyCell = MyCell::new(5);
    c.exec(|n| {
        // need `RefCell` to access mutable reference from within `Fn` closure
        let n = RefCell::new(n);
        c.exec(|m| {
            let n = &mut *n.borrow_mut();
            // now `n` and `m` are mutable references to the same data, despite using
            // no unsafe code. this is BAD!
        })
    })
}
In fact, this is exactly the reason why we have both Cell and RefCell:
Cell only allows you to get and set a value and does not allow you to get a mutable reference from an immutable one (thus avoiding the above issue), but it does not have any runtime cost.
RefCell allows you to get a mutable reference from an immutable one, but needs to perform checks at runtime to ensure that this is safe.
As far as I know, there's not really any safe way around this, so you need to make a choice in your code between no runtime cost but less flexibility, and more flexibility but with a small runtime cost.
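If the small runtime cost is acceptable, a RefCell-backed version of the question's cell is one possible compromise. SafeCell is a hypothetical name; it uses RefCell::try_borrow_mut so that a nested exec on the same cell returns an error instead of handing out a second &mut, and the closure can be FnOnce.

```rust
use std::cell::{BorrowMutError, RefCell};

/// Hypothetical safe variant of the question's MtCell, backed by RefCell.
struct SafeCell<T> {
    inner: RefCell<T>,
}

impl<T> SafeCell<T> {
    fn new(value: T) -> Self {
        SafeCell { inner: RefCell::new(value) }
    }

    /// Runs `f` on the contents. A nested call on the same cell gets an
    /// `Err` instead of a second `&mut T`.
    fn exec<R>(&self, f: impl FnOnce(&mut T) -> R) -> Result<R, BorrowMutError> {
        let mut guard = self.inner.try_borrow_mut()?;
        Ok(f(&mut *guard))
    }
}

fn main() {
    let c = SafeCell::new(5usize);
    let changed = c.exec(|n| {
        *n += 10; // modify the interior "in place"
        *n
    });
    println!("state changed to {}", changed.unwrap()); // 15

    // The double-borrow attempt is caught at runtime instead of being UB.
    let nested_failed = c.exec(|_outer| c.exec(|_inner| ()).is_err());
    println!("nested exec rejected: {}", nested_failed.unwrap()); // true
}
```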

When is it necessary to circumvent Rust's borrow checker?

I'm implementing Conway's game of life to teach myself Rust. The idea is to implement a single-threaded version first, optimize it as much as possible, then do the same for a multi-threaded version.
I wanted to implement an alternative data layout which I thought might be more cache-friendly. The idea is to store the status of two cells for each point on a board next to each other in memory in a vector, one cell for reading the current generation's status from and one for writing the next generation's status to, alternating the access pattern for each generation's computation (which can be determined at compile time).
The basic data structures are as follows:
#[repr(u8)]
pub enum CellStatus {
    DEAD,
    ALIVE,
}
/** 2 bytes */
pub struct CellRW(CellStatus, CellStatus);
pub struct TupleBoard {
    width: usize,
    height: usize,
    cells: Vec<CellRW>,
}
/** used to keep track of current pos with iterator e.g. */
pub struct BoardPos {
    x_pos: usize,
    y_pos: usize,
    offset: usize,
}
pub struct BoardEvo {
    board: TupleBoard,
}
The function that is causing me troubles:
impl BoardEvo {
    fn evolve_step<T: RWSelector>(&mut self) {
        for (pos, cell) in self.board.iter_mut() {
            //pos: BoardPos, cell: &mut CellRW
            let read: &CellStatus = T::read(cell); //chooses the right tuple half for the current evolution step
            let write: &mut CellStatus = T::write(cell);
            let alive_count = pos.neighbours::<T>(&self.board).iter() //<- can't borrow self.board again!
                .filter(|&&status| status == CellStatus::ALIVE)
                .count();
            *write = CellStatus::evolve(*read, alive_count);
        }
    }
}
impl BoardPos {
    /* ... */
    pub fn neighbours<T: RWSelector>(&self, board: &TupleBoard) -> [CellStatus; 8] {
        /* ... */
    }
}
The trait RWSelector has static functions for reading from and writing to a cell tuple (CellRW). It is implemented for two zero-sized types L and R and is mainly a way to avoid having to write different methods for the different access patterns.
The iter_mut() method returns a BoardIter struct which is a wrapper around a mutable slice iterator for the cells vector and thus has &mut CellRW as Item type. It is also aware of the current BoardPos (x and y coordinates, offset).
I thought I'd iterate over all cell tuples, keep track of the coordinates, count the number of alive neighbours (I need to know coordinates/offsets for this) for each (read) cell, compute the cell status for the next generation and write it to the other half of the tuple.
Of course, in the end, the compiler showed me the fatal flaw in my design, as I borrow self.board mutably in the iter_mut() method and then try to borrow it again immutably to get all the neighbours of the read cell.
I have not been able to come up with a good solution for this problem so far. I did manage to get it working by making all references immutable and then using an UnsafeCell to turn the immutable reference to the write cell into a mutable one.
I then write to the nominally immutable reference to the writing part of the tuple through the UnsafeCell.
However, that doesn't strike me as a sound design and I suspect I might run into issues with this when attempting to parallelize things.
Is there a way to implement the data layout I proposed in safe/idiomatic Rust, or is this a case where you actually have to use tricks to circumvent Rust's aliasing/borrow restrictions?
Also, as a broader question, is there a recognizable pattern for problems which require you to circumvent Rust's borrow restrictions?

When is it necessary to circumvent Rust's borrow checker?
It is needed when:
the borrow checker is not advanced enough to see that your usage is safe
you do not wish to (or cannot) write the code in a different pattern
As a concrete case, the compiler cannot tell that this is safe:
let mut array = [1, 2];
let a = &mut array[0];
let b = &mut array[1];
The compiler doesn't know what the implementation of IndexMut for a slice does at this point of compilation (this is a deliberate design choice). For all it knows, arrays always return the exact same reference, regardless of the index argument. We can tell that this code is safe, but the compiler disallows it.
You can rewrite this in a way that is obviously safe to the compiler:
let mut array = [1, 2];
let (a, b) = array.split_at_mut(1);
let a = &mut a[0];
let b = &mut b[0];
How is this done? split_at_mut performs a runtime check to ensure that it actually is safe:
fn split_at_mut(&mut self, mid: usize) -> (&mut [T], &mut [T]) {
    let len = self.len();
    let ptr = self.as_mut_ptr();
    unsafe {
        assert!(mid <= len);
        (from_raw_parts_mut(ptr, mid),
         from_raw_parts_mut(ptr.offset(mid as isize), len - mid))
    }
}
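The same split_at_mut trick can be wrapped in a reusable safe helper that returns mutable references to any two distinct elements of a slice. two_mut is a hypothetical name for this sketch, not a standard library function; all the unsafe code stays inside split_at_mut.

```rust
/// Returns mutable references to two different elements of `slice`.
/// Panics if `i == j` or if either index is out of bounds.
fn two_mut<T>(slice: &mut [T], i: usize, j: usize) -> (&mut T, &mut T) {
    assert!(i != j, "indices must be different");
    if i < j {
        // Everything before `j` is in `left`; `j` itself is `right[0]`.
        let (left, right) = slice.split_at_mut(j);
        (&mut left[i], &mut right[0])
    } else {
        let (left, right) = slice.split_at_mut(i);
        (&mut right[0], &mut left[j])
    }
}

fn main() {
    let mut array = [1, 2, 3];
    let (a, b) = two_mut(&mut array, 0, 2);
    std::mem::swap(a, b);
    println!("{:?}", array); // [3, 2, 1]
}
```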
For an example where the borrow checker is not yet as advanced as it can be, see What are non-lexical lifetimes?.
I borrow self.board mutably in the iter_mut() method and then try to borrow it again immutably to get all the neighbours of the read cell.
If you know that the references don't overlap, then you can choose to use unsafe code to express it. However, this means you are also choosing to take on the responsibility of upholding all of Rust's invariants and avoiding undefined behavior.
The good news is that this heavy burden is what every C and C++ programmer has to (or at least should) have on their shoulders for every single line of code they write. At least in Rust, you can let the compiler deal with 99% of the cases.
In many cases, there are tools like Cell and RefCell that allow interior mutation. In other cases, you can rewrite your algorithm to take advantage of a value being a Copy type. In other cases you can use an index into a slice for a shorter period. In still other cases you can use a multi-phase algorithm.
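As a rough sketch of the index-based and Copy-based approaches applied to the question, the following simplified board reads the current generation through &self and writes each result through a short-lived mutable borrow. The types here are hypothetical simplifications of the question's: there is no RWSelector, the step always reads .0 and writes .1, and cells outside the board count as dead.

```rust
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
enum CellStatus {
    Dead,
    Alive,
}

/// Simplified board: `.0` holds the current generation, `.1` the next one.
struct Board {
    width: usize,
    height: usize,
    cells: Vec<(CellStatus, CellStatus)>,
}

impl Board {
    /// Counts live neighbours of (x, y) using only a shared borrow.
    fn alive_neighbours(&self, x: usize, y: usize) -> usize {
        let mut count = 0;
        for dy in [-1isize, 0, 1] {
            for dx in [-1isize, 0, 1] {
                if dx == 0 && dy == 0 {
                    continue;
                }
                let (nx, ny) = (x as isize + dx, y as isize + dy);
                if nx < 0 || ny < 0 || nx >= self.width as isize || ny >= self.height as isize {
                    continue; // outside the board counts as dead
                }
                let idx = ny as usize * self.width + nx as usize;
                if self.cells[idx].0 == CellStatus::Alive {
                    count += 1;
                }
            }
        }
        count
    }

    fn evolve_step(&mut self) {
        for y in 0..self.height {
            for x in 0..self.width {
                let idx = y * self.width + x;
                // Shared borrows end once these Copy values are computed.
                let current = self.cells[idx].0;
                let alive = self.alive_neighbours(x, y);
                let next = match (current, alive) {
                    (CellStatus::Alive, 2) | (_, 3) => CellStatus::Alive,
                    _ => CellStatus::Dead,
                };
                // The mutable borrow lasts only for this assignment.
                self.cells[idx].1 = next;
            }
        }
    }
}

fn main() {
    let (d, a) = (CellStatus::Dead, CellStatus::Alive);
    // 3x3 "blinker": a horizontal line of three live cells.
    let start = [d, d, d, a, a, a, d, d, d];
    let mut board = Board {
        width: 3,
        height: 3,
        cells: start.iter().map(|&s| (s, d)).collect(),
    };
    board.evolve_step();
    let next: Vec<CellStatus> = board.cells.iter().map(|c| c.1).collect();
    println!("{:?}", next); // the blinker turns vertical
}
```

Alternating between .0 and .1 per generation, which is the role RWSelector plays in the question, would only change which field is read and which is written.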
If you do need to resort to unsafe code, then try your best to hide it in a small area and expose safe interfaces.
Above all, many common problems have been asked about (many times) before:
How to iterate over mutable elements inside another mutable iteration over the same elements?
Mutating an item inside of nested loops
How can a nested loop with mutations on a HashMap be achieved in Rust?
What's the Rust way to modify a structure within nested loops?
Nesting an iterator's loops
