I have a struct, Node, one of whose fields (node_type) is an enum LayerType (either LayerType::ReLU, LayerType::Sigmoid, LayerType::Tanh, or LayerType::Linear). Each Node has a method non_linear_function(&self, x: f64) -> f64 which should select the correct function to apply based on the LayerType variant stored in node_type. The code I currently have is:
pub fn non_linear_function(&self, x: f64) -> f64 {
match self.node_type {
LayerType::ReLU => todo!(),
LayerType::Sigmoid => todo!(),
LayerType::Tanh => todo!(),
LayerType::Linear => todo!(),
}
}
However, I am not sure whether or not this is efficient. Will it perform the check every time the function is called? If so, is there a way to prevent this happening?
All of the relevant code can be found below.
struct Layer {
layer_type: LayerType,
layer_width: usize,
node_vec: Vec<Node>
}
#[derive(Debug)]
struct Node {
prev_layer_size: usize,
node_type: LayerType,
weights: Vec<f64>,
bias: f64
}
impl Node {
pub fn new(prev_layer_size: usize, layer_pointer: &Layer) -> Node {
// Instantiates a new node based on the size of the previous layer
let mut rng = rand::thread_rng();
return Node {
prev_layer_size,
node_type: (*layer_pointer).layer_type,
weights: (0..prev_layer_size).map(|_| rng.gen()).collect(),
bias: rng.gen()
};
}
pub fn eval(&self, prev_activations: &Vec<f64>) -> f64 {
// Evaluate the activation of the node
return self.non_linear_function(
self.weights.iter().zip(prev_activations.iter()).map(
|(&x, &y)| x*y
).sum::<f64>() + self.bias
)
}
pub fn update(&mut self, target: f64) -> Vec<f64> {
// Changes weights and biases based on target,
// which indicates how much a given node should change and
// returns a vector for how it thinks the nodes in the layer
// behind it should change
todo!()
}
// The code in question is here
pub fn non_linear_function(&self, x: f64) -> f64 {
match self.node_type {
LayerType::ReLU => todo!(),
LayerType::Sigmoid => todo!(),
LayerType::Tanh => todo!(),
LayerType::Linear => todo!(),
}
}
}
#[derive(Debug, Clone, Copy)]
enum LayerType {
ReLU,
Sigmoid,
Tanh,
Linear
}
Is there an idiomatic way to implement this without match? Or is the match already optimised away by the compiler?
Additionally, I can see that it is also an option to implement several different versions of Node, one for each LayerType; however, this does not sound like a good idea.
I also understand that this is probably a minuscule time save, but I am writing this code to learn how to write fast Rust code.
Will it perform the check every time the function is called?
Yes — with this code, the check happens at runtime, every time the function is called.
If so, is there a way to prevent this happening?
Instead of using an enum for the LayerType, define a trait that specifies the method non_linear_function and implement it for each activation function you want to support; this moves the choice to compile time.
pub trait ActivationFunction {
fn non_linear_function(&self, x: f64) -> f64;
}
struct ReLU;
impl ActivationFunction for ReLU {
fn non_linear_function(&self, x: f64) -> f64 {
f64::max(0.0, x)
}
}
// ...
self.node_type.non_linear_function(x)
// ...
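One way to guarantee the dispatch is resolved at compile time is to make the activation a generic parameter on Node rather than a stored field. The sketch below is illustrative, not from the answer above: the trait uses an associated function (no &self needed), and the PhantomData field and simplified Node are my additions. Each concrete Node<A> is monomorphized, so there is no runtime branch:

```rust
pub trait ActivationFunction {
    fn non_linear_function(x: f64) -> f64;
}

struct ReLU;
impl ActivationFunction for ReLU {
    fn non_linear_function(x: f64) -> f64 {
        f64::max(0.0, x)
    }
}

struct Sigmoid;
impl ActivationFunction for Sigmoid {
    fn non_linear_function(x: f64) -> f64 {
        1.0 / (1.0 + (-x).exp())
    }
}

// Node is generic over its activation: the call in eval() is resolved
// statically for each Node<A>, with no branch on an enum variant.
struct Node<A: ActivationFunction> {
    bias: f64,
    _activation: std::marker::PhantomData<A>,
}

impl<A: ActivationFunction> Node<A> {
    fn eval(&self, weighted_sum: f64) -> f64 {
        A::non_linear_function(weighted_sum + self.bias)
    }
}

fn main() {
    let relu_node = Node::<ReLU> { bias: 0.0, _activation: std::marker::PhantomData };
    assert_eq!(relu_node.eval(-3.0), 0.0);
    assert_eq!(relu_node.eval(2.0), 2.0);
}
```

The trade-off is that layers of different activation types now have different concrete types, which may complicate storing them in one collection.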
We can see in your code that node_type in Node is probably redundant with layer_type in Layer: the non-linear function should then be chosen at the Layer level.
If your concern is efficiency, I suggest you step away from this OO-like design.
Instead of asking each node of the layer "how would you do that?" and obtaining the same behaviour for each one, you could put all the logic in the layer and consider the nodes as simple passive structs (and even go further by packing properties as suggested in data-oriented design).
The key is that the details of the computation are known once and for all for the entire layer, so the compiler can generate linear code with no branching/indirection.
In the following code, I tried to stay as close as possible to the example you provided.
The eval() function in Node does not apply the non-linear function.
Instead, I added an eval() function in Layer, which chooses once and for all the non-linear function to apply to each node (I used dummy formulas).
The Node struct does not need to hold the LayerType since it is already held in the Layer.
Nor does it need to keep prev_layer_size, since that is redundant with weights.len().
Note, however, that the inner product of weights and prev_activations will likely take much more time than the match that chooses the non-linear function applied right after; the efficiency gain from how we specify this function may not be substantial in the end...
struct Layer {
layer_type: LayerType,
layer_width: usize,
node_vec: Vec<Node>,
}
impl Layer {
pub fn eval(
&self,
prev_activations: &Vec<f64>,
) -> Vec<f64> {
let eval_nodes =
self.node_vec.iter().map(|n| n.eval(prev_activations));
match self.layer_type {
LayerType::ReLU => {
let fnct = |x| 2.0 * x; // dummy formula
eval_nodes.map(fnct).collect()
}
LayerType::Sigmoid => {
let fnct = |x| 2.0 * x; // dummy formula
eval_nodes.map(fnct).collect()
}
LayerType::Tanh => {
let fnct = |x| 2.0 * x; // dummy formula
eval_nodes.map(fnct).collect()
}
LayerType::Linear => {
let fnct = |x| 2.0 * x; // dummy formula
eval_nodes.map(fnct).collect()
}
}
}
}
#[derive(Debug)]
struct Node {
weights: Vec<f64>,
bias: f64,
}
impl Node {
pub fn new(
prev_layer_size: usize,
_layer_pointer: &Layer,
) -> Node {
// Instantiates a new node based on the size of the previous layer
let mut rng = rand::thread_rng();
Node {
weights: (0..prev_layer_size).map(|_| rng.gen()).collect(),
bias: rng.gen(),
}
}
pub fn eval(
&self,
prev_activations: &Vec<f64>,
) -> f64 {
// Evaluate the activation of the node
// NOTE: the non linear function is not used here
self.weights
.iter()
.zip(prev_activations.iter())
.map(|(&x, &y)| x * y)
.sum::<f64>()
+ self.bias
}
pub fn update(
&mut self,
target: f64,
) -> Vec<f64> {
// Changes weights and biases based on target,
// which indicates how much a given node should change and
// returns a vector for how it thinks the nodes in the layer
// behind it should change
todo!()
}
}
#[derive(Debug, Clone, Copy)]
enum LayerType {
ReLU,
Sigmoid,
Tanh,
Linear,
}
I am working on implementing a sieve of Atkin as my first decently sized program in Rust. The algorithm takes a number and returns a vector of all primes below that number. There are two different vectors I need to use in this function:
A BitVec: 1 for prime, 0 for not prime (flipped back and forth as part of the algorithm).
A vector containing all known primes.
The size of the BitVec is known as soon as the function is called. While the final size of the vector of known primes is not known, there are relatively accurate upper bounds for the number of primes in a range. Using these, I can set the capacity of the vector to an upper bound and shrink_to_fit it before returning. The upshot is that neither collection should ever need its capacity increased while the algorithm is running; if that happens, something has gone horribly wrong.
Therefore, I would like my function to panic if the capacity of either the vector or the bitvec is changed during the running of the function. Is this possible and if so how would I be best off implementing it?
Thanks,
You can assert that the Vec's capacity() and len() differ before each push:
assert_ne!(v.capacity(), v.len());
v.push(value);
If you want it done automatically you'd have to wrap your vec in a newtype:
struct FixedSizeVec<T>(Vec<T>);
impl<T> FixedSizeVec<T> {
pub fn push(&mut self, value: T) {
assert_ne!(self.0.len(), self.0.capacity());
self.0.push(value)
}
}
To save on forwarding unchanged methods you can impl Deref(Mut) for your newtype.
use std::ops::{Deref, DerefMut};
impl<T> Deref for FixedSizeVec<T> {
type Target = Vec<T>;
fn deref(&self) -> &Vec<T> {
&self.0
}
}
impl<T> DerefMut for FixedSizeVec<T> {
fn deref_mut(&mut self) -> &mut Vec<T> {
&mut self.0
}
}
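For illustration, here is a minimal, self-contained sketch of the newtype in action. The `with_capacity` constructor is an assumed helper, not part of the answer above:

```rust
struct FixedSizeVec<T>(Vec<T>);

impl<T> FixedSizeVec<T> {
    // Assumed helper: pre-allocate the fixed capacity up front.
    pub fn with_capacity(capacity: usize) -> Self {
        FixedSizeVec(Vec::with_capacity(capacity))
    }

    // Panics instead of reallocating once the capacity is exhausted.
    pub fn push(&mut self, value: T) {
        assert_ne!(self.0.len(), self.0.capacity());
        self.0.push(value);
    }
}

fn main() {
    let mut v = FixedSizeVec::with_capacity(2);
    v.push(1);
    v.push(2);
    assert_eq!(v.0.len(), 2);
    // v.push(3); // would panic: len == capacity
}
```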
An alternative to the newtype pattern is to create a new trait with a method that performs the check, and implement it for the vector like so:
trait PushCheck<T> {
fn push_with_check(&mut self, value: T);
}
impl<T> PushCheck<T> for Vec<T> {
fn push_with_check(&mut self, value: T) {
let prev_capacity = self.capacity();
self.push(value);
assert!(prev_capacity == self.capacity());
}
}
fn main() {
let mut v = Vec::new();
v.reserve(4);
dbg!(v.capacity());
v.push_with_check(1);
v.push_with_check(1);
v.push_with_check(1);
v.push_with_check(1);
// This push will panic
v.push_with_check(1);
}
The upside is that you aren't creating a new type, but the obvious downside is you need to remember to use the newly defined method.
I'm trying quite complex stuff with Rust where I need the following attributes, and am fighting the compiler.
Object which itself lives from start to finish of application, however, where internal maps/vectors could be modified during application lifetime
Multiple references to object that can read internal maps/vectors of an object
All single threaded
Multiple nested iterators which are mapped/modified in a lazy manner to perform fast and complex calculations (see example below)
A small example, which already causes problems:
use std::cell::RefCell;
use std::rc::{Rc, Weak};
pub struct Holder {
array_ref: Weak<RefCell<Vec<isize>>>,
}
impl Holder {
pub fn new(array_ref: Weak<RefCell<Vec<isize>>>) -> Self {
Self { array_ref }
}
fn get_iterator(&self) -> impl Iterator<Item = f64> + '_ {
self.array_ref
.upgrade()
.unwrap()
.borrow()
.iter()
.map(|value| *value as f64 * 2.0)
}
}
get_iterator is just one of the implementations of a trait, but even this example already does not work.
The reason for Weak/Rc is to make sure that multiple places point to the object (from point (1)) while another place can modify its internals (the Vec<isize>).
What is the best way to approach this situation, given that end goal is performance critical?
EDIT:
Person suggested using https://doc.rust-lang.org/std/cell/struct.Ref.html#method.map
But unfortunately I still can't get it to work — I'm not sure whether I should also change the return type, or maybe the closure is wrong here:
fn get_iterator(&self) -> impl Iterator<Item=f64> + '_ {
let x = self.array_ref.upgrade().unwrap().borrow();
let map1 = Ref::map(x, |x| &x.iter());
let map2 = Ref::map(map1, |iter| &iter.map(|y| *y as f64 * 2.0));
map2
}
IDEA says it has the wrong return type:
the trait `Iterator` is not implemented for `Ref<'_, Map<std::slice::Iter<'_, isize>, [closure#src/bin/main.rs:30:46: 30:65]>>`
This won't work because self.array_ref.upgrade() creates a local temporary Rc value, but the Ref only borrows from it. Obviously, you can't return a value that borrows from a local.
To make this work you need a second structure to own the Rc; it can implement Iterator in this case since the produced items aren't references:
pub struct HolderIterator(Rc<RefCell<Vec<isize>>>, usize);
impl Iterator for HolderIterator {
type Item = f64;
fn next(&mut self) -> Option<f64> {
let r = self.0.borrow().get(self.1)
.map(|&y| y as f64 * 2.0);
if r.is_some() {
self.1 += 1;
}
r
}
}
// ...
impl Holder {
// ...
fn get_iterator(&self) -> Option<impl Iterator<Item = f64>> {
self.array_ref.upgrade().map(|rc| HolderIterator(rc, 0))
}
}
Alternatively, if you want the iterator itself to hold only a weak reference to the contained value, it can store a Weak and upgrade on each next() call. There are performance implications, but this also lets get_iterator() return an iterator directly instead of an Option, written so that a failed upgrade simply ends the sequence:
pub struct HolderIterator(Weak<RefCell<Vec<isize>>>, usize);
impl Iterator for HolderIterator {
type Item = f64;
fn next(&mut self) -> Option<f64> {
let r = self.0.upgrade()?
.borrow()
.get(self.1)
.map(|&y| y as f64 * 2.0);
if r.is_some() {
self.1 += 1;
}
r
}
}
// ...
impl Holder {
// ...
fn get_iterator(&self) -> impl Iterator<Item = f64> {
HolderIterator(Weak::clone(&self.array_ref), 0)
}
}
This will make it so that you always get an iterator, but it's empty if the Weak is dead. The Weak can also die during iteration, at which point the sequence will abruptly end.
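A self-contained sketch of the weak-reference variant in use — the sample data and `main` are mine, and the iterator definition is repeated so the example runs on its own:

```rust
use std::cell::RefCell;
use std::rc::{Rc, Weak};

// Same iterator as above: holds a Weak and upgrades on every next().
struct HolderIterator(Weak<RefCell<Vec<isize>>>, usize);

impl Iterator for HolderIterator {
    type Item = f64;
    fn next(&mut self) -> Option<f64> {
        let r = self.0.upgrade()?
            .borrow()
            .get(self.1)
            .map(|&y| y as f64 * 2.0);
        if r.is_some() {
            self.1 += 1;
        }
        r
    }
}

fn main() {
    let data = Rc::new(RefCell::new(vec![1isize, 2, 3]));
    let mut it = HolderIterator(Rc::downgrade(&data), 0);
    assert_eq!(it.next(), Some(2.0));
    drop(data); // the last strong reference dies mid-iteration
    assert_eq!(it.next(), None); // upgrade fails, sequence ends
}
```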
I'm writing a function to work around the float precision problem in Rust (among other languages). I need to choose .floor() or .ceil() within the function based on a passed parameter. What's the best way to approach this problem? If something other than a function serves the purpose better, by all means! Thank you for your help!
fn main() {
to_round(7.987, "floor");
}
fn to_round(n: f64, floor_or_ceil: &str) -> f64 {
fn test(diff_n: f64) -> f64 {
if floor_or_ceil == "floor" {
diff_n.floor()
} else {
diff_n.ceil()
}
}
test(n)
}
If you replace your inner fn with a closure, the code you have will compile. Closures can refer to variables from the enclosing scope, whereas fns cannot.
fn main() {
to_round(7.987, "floor");
}
fn to_round(n: f64, floor_or_ceil: &str) -> f64 {
let test = |diff_n: f64| -> f64 {
if floor_or_ceil == "floor" {
diff_n.floor()
} else {
diff_n.ceil()
}
};
test(n)
}
However, closures come with some caveats: in particular, they can't be generic, and they can't be used where function pointers are expected (unless they don't mention ("close over") any variables from the enclosing scope and merely use closure syntax instead of fn syntax).
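To illustrate that caveat, here is a small sketch (the `apply` helper is hypothetical): a closure that captures nothing coerces to a `fn` pointer, while a capturing one does not:

```rust
// Hypothetical helper that only accepts a plain function pointer.
fn apply(f: fn(f64) -> f64, x: f64) -> f64 {
    f(x)
}

fn main() {
    // Captures nothing: coerces to a function pointer.
    assert_eq!(apply(|x: f64| x.floor(), 7.9), 7.0);

    // Captures `offset`, so it cannot coerce; this would not compile:
    // let offset = 1.0;
    // apply(|x: f64| x.floor() + offset, 7.9);
}
```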
The most general-purpose clean solution to this class of problem is to explicitly pass the needed parameters, but you can minimize the clutter of it by such means as using a struct (especially if there is more than one value needed) and making the functions methods of that struct:
fn main() {
let c = MathContext { rounding_mode: "floor" };
c.to_round(7.987);
}
struct MathContext {
rounding_mode: &'static str, // Side note: this should really be an enum, not a string
// And you can add more fields here for any other parameters needed.
}
impl MathContext {
fn to_round(&self, n: f64) -> f64 {
self.test(n)
}
fn test(&self, diff_n: f64) -> f64 {
if self.rounding_mode == "floor" {
diff_n.floor()
} else {
diff_n.ceil()
}
}
}
Whether exactly this set of methods fits depends on what you're actually doing; if test is very specific to to_round then it doesn't make sense to pull it out this way. But it seems likely that this pattern might be useful elsewhere in your code, at least, if you're doing things like picking which way to round numbers.
First, you should stop using &str values as makeshift enums. Just declare an enum:
#[derive(Clone, Copy)]
enum Round {
Floor,
Ceil,
}
use Round::{Floor, Ceil};
fn round(n: f64, setting: Round) -> f64 {
match setting {
Floor => n.floor(),
Ceil => n.ceil(),
}
}
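Called on the original example's input, the enum version behaves like this (a self-contained sketch repeating the definitions above):

```rust
#[derive(Clone, Copy)]
enum Round {
    Floor,
    Ceil,
}
use Round::{Floor, Ceil};

fn round(n: f64, setting: Round) -> f64 {
    match setting {
        Floor => n.floor(),
        Ceil => n.ceil(),
    }
}

fn main() {
    // 7.987 was the input used in the question's example.
    assert_eq!(round(7.987, Floor), 7.0);
    assert_eq!(round(7.987, Ceil), 8.0);
}
```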
The second variant uses closures.
fn round_func(setting: Round) -> impl Fn(f64) -> f64 {
move |n| match setting {
Floor => n.floor(),
Ceil => n.ceil(),
}
}
This takes your setting as input and returns a closure that will calculate either the floor or the ceiling for you. For example, you could do something like:
let closure = round_func(setting);
println!("{}", closure(5.5));
println!("{}", closure(5.5));
In this particular case, we can actually do even better by directly returning a function pointer. The code looks like this:
fn round_func_pointer(setting: Round) -> fn(f64) -> f64 {
match setting {
Floor => f64::floor,
Ceil => f64::ceil,
}
}
The code could also be written
fn round_func_pointer(setting: Round) -> fn(f64) -> f64 {
match setting {
Floor => |n| n.floor(),
Ceil => |n| n.ceil(),
}
}
This works because the "closure" |n| n.floor() doesn't capture anything at all (no move keyword necessary) and is therefore coercible to a function pointer.
Class variable w/ precision and rounding mode
Rust doesn't have syntax for declaring a field in a struct as static like C++, Java, and other languages. However, it is still possible to declare a static variable that for all intents and purposes is a "class variable".
In the implementation below, the scope of the static variable CONFIG is restricted to the file it's declared in; so, in that sense it isn't a "global" variable. The struct declared within the same file has exclusive access to it, and all instances of that struct access the same variable.
To ensure that the "class variable" doesn't get clobbered by simultaneous writes from separate threads, RwLock is used to synchronize it. This incurs a small cost in performance over implementations that don't synchronize access in single-threaded environments.
The lazy_static macro sets up the "class variable" to delay initialization until the variable is actually needed at runtime.
use std::sync::RwLock;
#[macro_use]
extern crate lazy_static;
lazy_static! {
// The tuple holds (rounding-mode, precision).
static ref CONFIG: RwLock<(RoundMode, u32)> = {
RwLock::new((RoundMode::Auto, 3))
};
}
#[derive(Clone, Copy)]
pub enum RoundMode { Ceil, Floor, Auto }
The struct implementation below provides a way for client code to set the rounding mode (Auto, Ceil, and Floor) and precision (number of decimal positions to round to) for all instances of HasValue.
pub struct HasValue { value: f64 }
impl HasValue {
pub fn new(value: f64) -> Self
{
HasValue { value }
}
pub fn config() -> (RoundMode, u32)
{
*CONFIG.read().unwrap()
}
pub fn set_config(mode: RoundMode, precision: u32)
{
*CONFIG.write().unwrap() = (mode, precision);
}
pub fn round(&self) -> f64
{
use RoundMode::*;
let (mode, prec) = HasValue::config();
let base = 10_f64.powf(prec as f64);
match mode {
Auto => (self.value * base).round() / base,
Ceil => (self.value * base).ceil() / base,
Floor => (self.value * base).floor() / base,
}
}
}
Creating, configuring, printing...
fn main()
{
let v = HasValue::new(7.98555);
let r = v.round();
println!("r={:?}", r);
HasValue::set_config(RoundMode::Floor, 4);
let r = v.round();
println!("r={:?}", r);
}
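As an aside: on Rust 1.80 or later, the same "class variable" can be expressed with std::sync::LazyLock from the standard library, dropping the lazy_static dependency. A sketch under that assumption:

```rust
use std::sync::{LazyLock, RwLock};

#[derive(Clone, Copy, Debug, PartialEq)]
pub enum RoundMode { Ceil, Floor, Auto }

// Initialized lazily on first access, just like the lazy_static! version.
static CONFIG: LazyLock<RwLock<(RoundMode, u32)>> =
    LazyLock::new(|| RwLock::new((RoundMode::Auto, 3)));

fn main() {
    assert_eq!(*CONFIG.read().unwrap(), (RoundMode::Auto, 3));
    *CONFIG.write().unwrap() = (RoundMode::Floor, 4);
    assert_eq!(CONFIG.read().unwrap().1, 4);
}
```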
I'm creating some user-input components in Rust. I'm trying to modularize these components using the trait CustomInput. This trait specifies some basic behaviour for all UI input components, like being able to set and get the component's value. All my UI components take a String input and parse that string to some typed value, maybe an f64, i32, or Path. Because different UI components can return different types, I created the enum CustomOutputType so that get_value can have a single return type for all components.
As users can input any random string, each UI component needs a means of ensuring that the input string can be successfully converted to the right type for the respective input component, i.e. MyFloatInput needs to return an f64 value, MyIntInput needs to return an i32, and so on. I see that std::str::FromStr is implemented for most basic types (bool, f64, usize), but I don't think this kind of parsing is robust enough for UI inputs.
I'd like some more advanced functionality than std::str::FromStr provides. I'd like to be able to:
Evaluate expressions, i.e. an input of 500/25 returns 20
Floor/ceiling for min/max values, i.e. if max is 1000 and the input is 2000, the value is 1000
What would be an idiomatic way of implementing such parsing functionality for my UI components?
Is using the CustomOutputType enum an efficient strategy for handling the different types the UI components could return?
Would a parsing crate like Nom be advisable in this situation? I think this might be overkill for what I'm doing, but I'm not sure.
I've taken a stab at creating such a system:
use std::str::FromStr;
use snafu::Snafu;
#[derive(Debug, Snafu)]
pub enum Error {
    #[snafu(display("Incorrect Type: {:?}", msg))]
    IncorrectInputType { msg: String },
}
type Result<T, E = Error> = std::result::Result<T, E>;
#[derive(Clone, Debug)]
pub enum CustomOutputType {
    CFloat(f64),
    CInt(i32),
}
pub trait CustomInput {
    fn get_value(&self) -> CustomOutputType;
    fn set_value(&mut self, val: String) -> Result<()>;
}
pub struct MyFloatInput {
    val: f64,
}
pub struct MyIntInput {
    val: i32,
}
impl CustomInput for MyFloatInput {
    fn get_value(&self) -> CustomOutputType {
        CustomOutputType::CFloat(self.val)
    }
    fn set_value(&mut self, val: String) -> Result<()> {
        match f64::from_str(&val) {
            Ok(inp) => Ok(self.val = inp),
            Err(_) => IncorrectInputType { msg: "setting float input failed".to_string() }.fail()?,
        }
    }
}
impl CustomInput for MyIntInput {
    fn get_value(&self) -> CustomOutputType {
        CustomOutputType::CInt(self.val)
    }
    fn set_value(&mut self, val: String) -> Result<()> {
        match i32::from_str(&val) {
            Ok(inp) => Ok(self.val = inp),
            Err(_) => IncorrectInputType { msg: "setting int input failed".to_string() }.fail()?,
        }
    }
}
fn main() {
    let possible_inputs = vec![
        "100".to_string(),
        "100.0".to_string(),
        "100/2".to_string(),
        ".1".to_string(),
        "teststr".to_string(),
    ];
    let mut int_input = MyIntInput { val: 0 };
    let mut float_input = MyFloatInput { val: 0.0 };
    for x in &possible_inputs {
        int_input.set_value(x.to_string()).unwrap();
        float_input.set_value(x.to_string()).unwrap();
    }
}
Is there an equivalent of alloca to create variable length arrays in Rust?
I'm looking for the equivalent of the following C99 code:
void go(int n) {
int array[n];
// ...
}
It is not possible directly: there is no syntax in the language supporting it.
That being said, this particular feature of C99 is debatable; it has certain advantages (cache locality and bypassing malloc) but also disadvantages (it's easy to blow up the stack, it stumps a number of optimizations, it may turn static offsets into dynamic offsets, ...).
For now, I would advise you to use Vec instead. If you have performance issues, then you may look into the so-called "Small Vector Optimization". I have regularly seen the following pattern in C code where performance is required:
SomeType array[64] = {0};
SomeType *pointer, *dynamic_pointer = NULL;
if (n <= 64) {
pointer = array;
} else {
pointer = dynamic_pointer = malloc(sizeof(SomeType) * n);
}
// ...
if (dynamic_pointer) { free(dynamic_pointer); }
Now, this is something that Rust supports easily (and better, in a way):
enum SmallVector<T, const N: usize> {
    Inline(usize, [T; N]),
    Dynamic(Vec<T>),
}
You can see a simplistic example implementation below.
What matters, however, is that you now have a type which:
uses the stack when less than N elements are required
moves off to the heap otherwise, to avoid blowing up the stack
Of course, it also always reserves enough space for N elements on the stack even if you only use 2; however, in exchange there is no call to alloca, so you avoid the issue of having dynamic offsets to your variables.
And contrary to C, you still benefit from lifetime tracking so that you cannot accidentally return a reference to your stack-allocated array outside the function.
Note: prior to Rust 1.51, you wouldn't be able to parameterize by N, as const generics were not yet stable.
I will show off the most "obvious" methods:
enum SmallVector<T, const N: usize> {
Inline(usize, [T; N]),
Dynamic(Vec<T>),
}
impl<T: Copy + Clone, const N: usize> SmallVector<T, N> {
fn new(v: T, n: usize) -> Self {
if n <= N {
Self::Inline(n, [v; N])
} else {
Self::Dynamic(vec![v; n])
}
}
}
impl<T, const N: usize> SmallVector<T, N> {
fn as_slice(&self) -> &[T] {
match self {
Self::Inline(n, array) => &array[0..*n],
Self::Dynamic(vec) => vec,
}
}
fn as_mut_slice(&mut self) -> &mut [T] {
match self {
Self::Inline(n, array) => &mut array[0..*n],
Self::Dynamic(vec) => vec,
}
}
}
use std::ops::{Deref, DerefMut};
impl<T, const N: usize> Deref for SmallVector<T, N> {
type Target = [T];
fn deref(&self) -> &Self::Target {
self.as_slice()
}
}
impl<T, const N: usize> DerefMut for SmallVector<T, N> {
fn deref_mut(&mut self) -> &mut Self::Target {
self.as_mut_slice()
}
}
Usage:
fn main() {
// N cannot be inferred from the arguments, so it must be annotated:
let mut v = SmallVector::<_, 8>::new(1u32, 4);
v[2] = 3;
println!("{}: {}", v.len(), v[2])
}
Which prints 4: 3 as expected.
No.
Doing that in Rust would entail the ability to store Dynamically Sized Types (DSTs) like [i32] on the stack, which the language doesn't support.
A deeper reason is that LLVM, to my knowledge, doesn't really support this. I'm given to believe that you can do it, but it significantly interferes with optimisations. As such, I'm not aware of any near-term plans to allow it.