How to properly initialize a struct in Rust, with good enough encapsulations? - rust

How to properly initialize a struct in Rust, with good enough encapsulations?
Or more naively:
how to leverage object/instance methods in the initialization/constructing process of structs?
For example, like the init block in Kotlin:
private class BinaryIndexedTree(nums: IntArray) {
private val nNums = nums.size
private val fenwick = IntArray(nNums + 1) { 0 }
// where to put this block in Rust?
init {
for (idx in nums.indices) {
update(idx, nums[idx])
}
}
fun update(index: Int, value: Int) {
var idx = index + 1
while (idx <= nNums) {
fenwick[idx] += value
idx += (idx and -idx)
}
}
fun query(index: Int): Int {
var sum = 0
var idx = index + 1
while (idx > 0) {
sum += fenwick[idx]
idx -= (idx and -idx)
}
return sum
}
}
According to Rust Design Patterns, Rust has no constructors as other languages do; the convention is to use an associated function.
Correspondingly, in Rust:
struct BinaryIndexedTree{
len_ns: isize,
fenwick: Vec<i32>,
}
impl BinaryIndexedTree{
pub fn new(nums: &Vec<i32>) -> Self{
let len_ns: usize = nums.len();
let fenwick: Vec<i32> = vec![0; len_ns + 1];
for (idx, num) in nums.iter().enumerate(){
// how to leverage `update()` for initialization
// update(idx as isize, num);
// or even earlier: where/how to put the initialization logic?
}
Self{
len_ns: len_ns as isize,
fenwick,
}
}
pub fn update(&mut self, index: isize, value: i32){
let mut idx = index + 1;
while idx <= self.len_ns{
self.fenwick[idx as usize] += value;
idx += (idx & -idx);
}
}
pub fn query(&self, index: isize) -> i32{
let mut sum: i32 = 0;
let mut idx = index + 1;
while idx > 0{
sum += self.fenwick[idx as usize];
idx -= (idx & -idx);
}
sum
}
}
Is there any way to properly leverage the update method?
As a rule of thumb, how should I handle initialization work that has to happen after (all the fields of) the struct have been created?
The builder pattern is one way to go, but it introduces much more code just for initialization.

Yes: you can construct the struct, then call methods on it before returning it. There is nothing special about the name new, or about constructing the struct at the very end of the function.
pub fn new(nums: &Vec<i32>) -> Self {
let len_ns: usize = nums.len();
let fenwick: Vec<i32> = vec![0; len_ns + 1];
// Construct an incomplete version of the struct.
let mut new_self = Self {
len_ns: len_ns as isize,
fenwick,
};
// Do stuff with the struct
// iter() yields &i32, so destructure with &num to pass an i32 to update()
for (idx, &num) in nums.iter().enumerate(){
new_self.update(idx as isize, num);
}
// Return it
new_self
}
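For reference, here is the answer's approach assembled into one complete piece. It is a sketch. It takes `&[i32]` instead of `&Vec<i32>`, since a slice accepts both vectors and arrays, and otherwise keeps the question's `isize` indexing. The point is that `new` first builds an all-zero instance and then routes every element through the ordinary `update` method. The insertion logic therefore lives in one place only.

```rust
/// Fenwick tree (binary indexed tree) over i32 values.
pub struct BinaryIndexedTree {
    len_ns: isize,
    fenwick: Vec<i32>,
}

impl BinaryIndexedTree {
    /// Builds an all-zero tree, then feeds each element through `update`.
    pub fn new(nums: &[i32]) -> Self {
        let mut tree = Self {
            len_ns: nums.len() as isize,
            fenwick: vec![0; nums.len() + 1],
        };
        for (idx, &num) in nums.iter().enumerate() {
            tree.update(idx as isize, num);
        }
        tree
    }

    /// Adds `value` to the element at `index` (0-based).
    pub fn update(&mut self, index: isize, value: i32) {
        let mut idx = index + 1;
        while idx <= self.len_ns {
            self.fenwick[idx as usize] += value;
            idx += idx & -idx; // move to the next node that covers `index`
        }
    }

    /// Returns the prefix sum nums[0] + ... + nums[index].
    pub fn query(&self, index: isize) -> i32 {
        let mut sum = 0;
        let mut idx = index + 1;
        while idx > 0 {
            sum += self.fenwick[idx as usize];
            idx -= idx & -idx; // drop the lowest set bit
        }
        sum
    }
}
```

For example, `BinaryIndexedTree::new(&[1, 2, 3, 4, 5]).query(2)` is the prefix sum 1 + 2 + 3.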

Related

Returning struct with vector

I just began learning Rust and doing some exercises.
Here I'm trying to return the next permutation. But at the end of the next() method it seems to return the wrong vector in the struct.
pub struct Permutation {
p : Vec<u8>,
}
impl Permutation{
pub fn new(length: u8) -> Permutation {
let mut p :Vec<u8> = Vec::new();
for i in 1..length+1 {
p.push(i as u8);
}
Permutation { p }
}
pub fn create(this: Vec<u8>) -> Permutation {
Permutation { p:this }
}
pub fn next(&mut self) -> Option<Permutation> {
let mut pivot :usize = self.p.len() + 1;
for i in (1..self.p.len()).rev() {
if self.p[i-1] < self.p[i] {
pivot = i-1;
break;
}
}
if pivot == self.p.len() + 1 {
return None;
}
let mut swap :usize = pivot + 1;
for i in pivot+1..self.p.len() {
if self.p[i] > self.p[pivot] && self.p[i] < self.p[swap] {
swap = i;
}
}
let temp = self.p[swap];
self.p[swap] = self.p[pivot];
self.p[pivot] = temp;
pivot += 1;
let mut new_perm :Vec<u8> = Vec::new();
for i in 0..pivot {
new_perm.push(self.p[i]);
}
for i in (pivot..self.p.len()).rev() {
new_perm.push(self.p[i]);
}
// Debug
// for e in &new_perm {
// println!("{}", e);
//}
return Some(Permutation{ p: new_perm })
}
}
If I uncomment the println I can see that the new_perm vector is correct, but I seem to be getting the self.p vector returned.
What am I doing wrong here?
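The calling code isn't shown, so the cause can only be guessed. `next` mutates `self.p` (the swap) but returns the fully updated permutation in a separate struct. Anything that keeps reading the original `Permutation` therefore sees the half-updated `self.p` instead of `new_perm`.

One way to avoid having two vectors at all is to advance `self.p` in place. The sketch below does that. It is not the poster's code, and the method names `advance` and `as_slice` are hypothetical.

```rust
pub struct Permutation {
    p: Vec<u8>,
}

impl Permutation {
    pub fn new(length: u8) -> Permutation {
        // 1..=length avoids the overflow of `length + 1` when length == 255
        Permutation { p: (1..=length).collect() }
    }

    /// Rearranges `self.p` into the next lexicographic permutation.
    /// Returns false, leaving `self.p` untouched, if it is already the last one.
    pub fn advance(&mut self) -> bool {
        let p = &mut self.p;
        // Rightmost position i with p[i - 1] < p[i]; the pivot is i - 1.
        let pivot = match (1..p.len()).rev().find(|&i| p[i - 1] < p[i]) {
            Some(i) => i - 1,
            None => return false,
        };
        // The suffix after the pivot is non-increasing, so the rightmost element
        // greater than p[pivot] is the smallest such element. p[pivot + 1] qualifies,
        // so the search cannot fail.
        let swap = (pivot + 1..p.len()).rev().find(|&i| p[i] > p[pivot]).unwrap();
        p.swap(pivot, swap);
        // Reversing the suffix turns it into its smallest arrangement.
        p[pivot + 1..].reverse();
        true
    }

    pub fn as_slice(&self) -> &[u8] {
        &self.p
    }
}
```

A caller would then loop with `while perm.advance() { ... }` and read `perm.as_slice()` on each iteration.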

Dealing with so-called global variables in Rust

We all know that using global variables can lead to subtle bugs. I need to migrate Python programs to Rust, keeping the algorithm intact as far as possible. Once I have demonstrated Python-Rust equivalence there will be opportunities to debug and change the logic to fit Rust better. Here is a simple Python program using global variables, followed by my unsuccessful Rust version.
# global variable
a = 15
# function to perform addition
def add():
global a
a += 100
# function to perform subtraction
def subtract():
global a
a -= 100
# Using a global through functions
print("Initial value of a = ", a)
add()
print("a after addition = ", a)
subtract()
print("a after subtraction = ", a)
Here is a Rust program that runs, but I cannot get the closures to update the so-called global variable.
fn fmain() {
// global variable
let mut a = 15;
// perform addition
let add = || {
let mut _name = a;
// name += 100; // the program won't compile if this is uncommented
};
call_once(add);
// perform subtraction
let subtract = || {
let mut _name = a;
// name -= 100; // the program won't compile if this is uncommented
};
call_once(subtract);
// Using a global through functions
println!("Initial value of a = {}", a);
add();
println!("a after addition = {}", a);
subtract();
println!("a after subtraction = {}", a);
}
fn main() {
fmain();
}
fn call_once<F>(f: F)
where
F: FnOnce(),
{
f();
}
My request: Re-create the Python logic in Rust.
Your Rust code is not using global variables; the a variable is stack-allocated. While Rust doesn't particularly endorse global variables, you can certainly use them. Translated to Rust that uses actual globals, your program would look like this:
use lazy_static::lazy_static;
use parking_lot::Mutex; // or std::sync::Mutex, adding .unwrap() after each .lock()
// global variable
lazy_static! {
static ref A: Mutex<u32> = Mutex::new(15);
}
// function to perform addition
fn add() {
*A.lock() += 100;
}
// function to perform subtraction
fn subtract() {
*A.lock() -= 100;
}
fn main() {
// Using a global through functions
println!("Initial value of a = {}", A.lock());
add();
println!("a after addition = {}", A.lock());
subtract();
println!("a after subtraction = {}", A.lock());
}
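On Rust 1.63 and later, `Mutex::new` is a `const fn`. The same global can therefore be written with the standard library alone, without `lazy_static` or `parking_lot`. Below is a minimal sketch. The `value` helper is our own addition for reading the global.

```rust
use std::sync::Mutex;

// Global variable: no lazy initialization is needed because Mutex::new is const.
static A: Mutex<i32> = Mutex::new(15);

// function to perform addition
fn add() {
    *A.lock().unwrap() += 100;
}

// function to perform subtraction
fn subtract() {
    *A.lock().unwrap() -= 100;
}

// read the current value of the global
fn value() -> i32 {
    *A.lock().unwrap()
}
```

The answer's main works with this version unchanged, apart from adding .unwrap() after each .lock().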
If you prefer to use closures, you can do that too, but you'll need to use interior mutability to allow multiple closures to capture the same environment. For example, you could use a Cell:
use std::cell::Cell;
fn main() {
let a = Cell::new(15);
let add = || {
a.set(a.get() + 100);
};
let subtract = || {
a.set(a.get() - 100);
};
// Using a global through functions
println!("Initial value of a = {}", a.get());
add();
println!("a after addition = {}", a.get());
subtract();
println!("a after subtraction = {}", a.get());
}
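This shows why the Cell is needed. A closure that modifies `a` holds a mutable borrow of it for as long as the closure is in use. Two such closures therefore cannot coexist, and `a` cannot be read while either is alive. Plain `FnMut` closures do work if each one is used and finished before the next is created, as in this sketch. The function name `python_equivalent` is hypothetical.

```rust
/// Runs the Python sequence with ordinary FnMut closures and no Cell.
/// Returns (initial, after addition, after subtraction).
fn python_equivalent() -> (i32, i32, i32) {
    let mut a = 15;
    let initial = a;
    {
        let mut add = || a += 100; // mutably borrows `a`
        add();
    } // `add` goes out of scope, so `a` can be read again
    let after_add = a;
    {
        let mut subtract = || a -= 100;
        subtract();
    }
    let after_subtract = a;
    (initial, after_add, after_subtract)
}
```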
Dependency-free examples using an enum, a function, and a trait. (Edit: code improved as suggested in a comment, and a match arm corrected.)
use std::sync::{Arc, Mutex, Once};
static START: Once = Once::new();
static mut ARCMUT: Vec<Arc<Mutex<i32>>> = Vec::new();
// as enum
enum Operation {
Add,
Subtract,
}
impl Operation {
// static change
fn result(self) -> i32 {
let arc_clone = unsafe { ARCMUT[0].clone() };
let mut unlock = arc_clone.lock().unwrap();
match self {
Operation::Add => *unlock += 100,
Operation::Subtract => *unlock -= 100,
}
*unlock
}
// dynamic change
fn amount(self, amount: i32) -> i32 {
let arc_clone = unsafe { ARCMUT[0].clone() };
let mut unlock = arc_clone.lock().unwrap();
match self {
Operation::Add => *unlock += amount,
Operation::Subtract => *unlock -= amount,
}
*unlock
}
}
// as a function
fn add() -> i32 {
let arc_clone = unsafe { ARCMUT[0].clone() };
let mut unlock = arc_clone.lock().unwrap();
*unlock += 100;
*unlock
}
// as trait
trait OperationTrait {
fn add(self) -> Self;
fn subtract(self) -> Self;
fn return_value(self) ->i32;
}
impl OperationTrait for i32 {
fn add(self) -> Self {
let arc_clone = unsafe{ARCMUT[0].clone()};
let mut unlock = arc_clone.lock().unwrap();
*unlock += self;
self
}
fn subtract(self) -> Self {
let arc_clone = unsafe{ARCMUT[0].clone()};
let mut unlock = arc_clone.lock().unwrap();
*unlock -= self;
self
}
fn return_value(self) -> i32 {
let arc_clone = unsafe{ARCMUT[0].clone()};
let unlock = arc_clone.lock().unwrap();
*unlock
}
}
// fn main
fn main() {
START.call_once(|| unsafe {
ARCMUT = vec![Arc::new(Mutex::new(15))];
});
let test = Operation::Add.result();
println!("{:?}", test);
let test = Operation::Subtract.amount(100);
println!("{:?}", test);
let test = add();
println!("{:?}", test);
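// the literal needs a concrete type (4000.add() fails with E0689);
// this adds 4000 to the global but returns 4000 itself
let test = 4000_i32.add();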
println!("{:?}", test);
}

How to write a macro that splits a byte into a tuple of bits of user-specified count?

I would like to have a macro that splits one byte into a tuple of 2-8 u8 parts using the bitreader crate.
I managed to achieve that with the following code:
use bitreader::BitReader;
trait Tupleprepend<T> {
type ResultType;
fn prepend(self, t: T) -> Self::ResultType;
}
macro_rules! impl_tuple_prepend {
( () ) => {};
( ( $t0:ident $(, $types:ident)* ) ) => {
impl<$t0, $($types,)* T> Tupleprepend<T> for ($t0, $($types,)*) {
type ResultType = (T, $t0, $($types,)*);
fn prepend(self, t: T) -> Self::ResultType {
let ($t0, $($types,)*) = self;
(t, $t0, $($types,)*)
}
}
impl_tuple_prepend! { ($($types),*) }
};
}
impl_tuple_prepend! {
(_1, _2, _3, _4, _5, _6, _7, _8)
}
macro_rules! split_byte (
($reader:ident, $bytes:expr, $count:expr) => {{
($reader.read_u8($count).unwrap(),)
}};
($reader:ident, $bytes:expr, $count:expr, $($next_counts:expr),+) => {{
let head = split_byte!($reader, $bytes, $count);
let tail = split_byte!($reader, $bytes, $($next_counts),+);
tail.prepend(head.0)
}};
($bytes:expr $(, $count:expr)* ) => {{
let mut reader = BitReader::new($bytes);
split_byte!(reader, $bytes $(, $count)+)
}};
);
Now I can use this code as I would like to:
let buf: &[u8] = &[0x72];
let (bit1, bit2, bits3to8) = split_byte!(&buf, 1, 1, 6);
Is there a way to avoid using the Tupleprepend trait and create only one tuple instead of eight in the worst case?
Because the number of bit widths directly corresponds to the number of returned values, I'd solve the problem using generics and arrays instead. The macro only exists to save typing the [], which I don't think is really worth it.
fn split_byte<A>(b: u8, bit_widths: A) -> A
where
A: Default + std::ops::IndexMut<usize, Output = u8>,
for<'a> &'a A: IntoIterator<Item = &'a u8>,
{
let mut result = A::default();
let mut start = 0;
for (idx, &width) in bit_widths.into_iter().enumerate() {
let shifted = b >> (8 - width - start);
let mask = (0..width).fold(0, |a, _| (a << 1) | 1);
result[idx] = shifted & mask;
start += width;
}
result
}
macro_rules! split_byte {
($b:expr, $($w:expr),+) => (split_byte($b, [$($w),+]));
}
fn main() {
let [bit1, bit2, bits3_to_8] = split_byte!(0b1010_1010, 1, 1, 6);
assert_eq!(bit1, 0b1);
assert_eq!(bit2, 0b0);
assert_eq!(bits3_to_8, 0b10_1010);
}
See also:
How does for<> syntax differ from a regular lifetime bound?
How to write a trait bound for adding two references of a generic type?
How do I write the lifetimes for references in a type constraint when one of them is a local reference?
Since Rust 1.51, the subset of const generics that used to be the nightly-only min_const_generics feature is stable, so this works without a feature flag:
fn split_byte<const N: usize>(b: u8, bit_widths: [u8; N]) -> [u8; N] {
let mut result = [0; N];
let mut start = 0;
for (idx, &width) in bit_widths.iter().enumerate() {
let shifted = b >> (8 - width - start);
let mask = (0..width).fold(0, |a, _| (a << 1) | 1);
result[idx] = shifted & mask;
start += width;
}
result
}
macro_rules! split_byte {
($b:expr, $($w:expr),+) => (split_byte($b, [$($w),+]));
}
fn main() {
let [bit1, bit2, bits3_to_8] = split_byte!(0b1010_1010, 1, 1, 6);
assert_eq!(bit1, 0b1);
assert_eq!(bit2, 0b0);
assert_eq!(bits3_to_8, 0b10_1010);
}
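The sketch below is a variant of the const-generic code above, not the answer's own. It adds input validation. It rejects widths that sum to more than 8, which would otherwise overflow in `8 - width - start` (a panic in debug builds). It also requires each width to be at least 1, and it builds the mask with `u8::MAX >> (8 - width)` instead of a fold.

```rust
/// Splits `b` into N fields, most significant bits first.
/// Each width must be at least 1, and the widths must sum to at most 8.
fn split_byte<const N: usize>(b: u8, bit_widths: [u8; N]) -> [u8; N] {
    let total: u32 = bit_widths.iter().map(|&w| u32::from(w)).sum();
    assert!(total <= 8, "bit widths sum to {}, more than 8", total);
    let mut result = [0u8; N];
    let mut start = 0u8;
    for (slot, &width) in result.iter_mut().zip(bit_widths.iter()) {
        assert!(width >= 1, "each bit width must be at least 1");
        // Lowest `width` bits set; width is in 1..=8, so the shift is below 8.
        let mask = u8::MAX >> (8 - width);
        *slot = (b >> (8 - width - start)) & mask;
        start += width;
    }
    result
}
```

Destructuring works as before, e.g. `let [high, low] = split_byte(0xF0, [4, 4]);`.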
See also:
Is it possible to control the size of an array using the type parameter of a generic?

Does Rust have an equivalent to C++'s decltype() to get the type of an expression?

My code looks like:
macro_rules! mask {
($bitmap: tt, [..$count: tt], for type = $ty: ty) => {
{
let bit_count = std::mem::size_of::<$ty>() * 8;
let dec_bit_count = bit_count - 1;
$bitmap & [(1 << ($count & dec_bit_count)) - 1, <$ty>::MAX][((($count & !dec_bit_count)) != 0) as usize]
}
};
}
fn main() {
let bitmap: u8 = 0b_1111_1111;
let masked_bitmap = mask!(bitmap, [..5], for type = u8);
println!("{:#010b}", masked_bitmap);
}
The code above masks the bitmap: in this example, masking 0b_1111_1111 with [..5] yields 0b_0001_1111.
I want my macro to be like this:
macro_rules! mask {
($bitmap: tt, [..$count: tt]) => {
{
let bit_count = std::mem::size_of::<decltype($bitmap)>() * 8;
let dec_bit_count = bit_count - 1;
$bitmap & [(1 << ($count & dec_bit_count)) - 1, <decltype($bitmap)>::MAX][((($count & !dec_bit_count)) != 0) as usize]
}
};
}
But I have to pass type to the macro to get this done. Is there something like decltype() from C++ that I could use?
No, Rust does not have the ability to get the type of an arbitrary expression. typeof is a reserved keyword to potentially allow it in the future:
fn main() {
let a: i32 = 42;
let b: typeof(a) = a;
}
error[E0516]: `typeof` is a reserved keyword but unimplemented
--> src/main.rs:3:12
|
3 | let b: typeof(a) = a;
| ^^^^^^^^^ reserved keyword
There are RFCs suggesting that it be added.
See also:
How do I match the type of an expression in a Rust macro?
Is it possible to access the type of a struct member for function signatures or declarations?
`.type` for getting concrete type of a binding (issue #2704)
For your specific case, I would use traits instead:
use std::ops::RangeTo;
trait Mask {
fn mask(self, range: RangeTo<usize>) -> Self;
}
impl Mask for u8 {
#[inline]
fn mask(self, range: RangeTo<usize>) -> Self {
// Feel free to make this your more complicated bitwise logic
let mut m = 0;
for _ in 0..range.end {
m <<= 1;
m |= 1;
}
self & m
}
}
fn main() {
let bitmap: u8 = 0b_1111_1111;
let masked_bitmap = bitmap.mask(..5);
println!("{:#010b}", masked_bitmap);
}
You could, however, use a macro to implement the trait for several types at once:
macro_rules! impl_mask {
($($typ:ty),*) => {
$(
impl Mask for $typ {
#[inline]
fn mask(self, range: RangeTo<usize>) -> Self {
let mut m = 0;
for _ in 0..range.end {
m <<= 1;
m |= 1;
}
self & m
}
}
)*
};
}
impl_mask!(u8, u16, u32, u64, u128);
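Within the macro itself, you can often avoid naming the type by deriving everything from the value. `std::mem::size_of_val` takes the size from a reference, and `!(x ^ x)` produces an all-ones value of whatever type `x` has. This sketch assumes `$bitmap` is a primitive integer. It is not a general decltype replacement.

```rust
/// Keeps the lowest `count` bits of `bitmap` without ever naming its type.
macro_rules! mask {
    ($bitmap:expr, [..$count:expr]) => {{
        let bitmap = $bitmap;
        let count: usize = $count;
        // size_of_val reads the size from the value itself.
        let bit_count = std::mem::size_of_val(&bitmap) * 8;
        if count >= bit_count {
            bitmap
        } else {
            // `bitmap ^ bitmap` is a zero of the same type, so `all_ones`
            // has that type too; no annotation is needed.
            let all_ones = !(bitmap ^ bitmap);
            bitmap & !(all_ones << count)
        }
    }};
}
```

With this macro, `mask!(bitmap, [..5])` works for the question's u8 bitmap and equally for u16, u32, or signed integers.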

Efficient truncating string copy `str` to `[u8]` (utf8 aware strlcpy)?

While Rust provides str.as_bytes, I'm looking to copy a string into a fixed-size buffer such that only complete Unicode scalar values are copied. If the string doesn't fit, it is truncated, and a null terminator is written at the end either way. In C terms, I'd call this a UTF-8 aware strlcpy (that is, it copies into a fixed-size buffer and ensures the result is null-terminated).
This is a function I came up with, but I expect there are better ways to do this in Rust:
// returns the number of bytes written, including the null terminator
pub fn strlcpy_utf8(utf8_dst: &mut [u8], str_src: &str) -> usize {
let utf8_dst_len = utf8_dst.len();
if utf8_dst_len == 0 {
return 0;
}
let mut index: usize = 0;
if utf8_dst_len > 1 {
let mut utf8_buf: [u8; 4] = [0; 4];
for c in str_src.chars() {
let len_utf8 = c.len_utf8();
let index_next = index + len_utf8;
c.encode_utf8(&mut utf8_buf);
if index_next >= utf8_dst_len {
break;
}
utf8_dst[index..index_next].clone_from_slice(&utf8_buf[0..len_utf8]);
index = index_next;
}
}
utf8_dst[index] = 0;
return index + 1;
}
Note: I realize this isn't ideal, since multiple Unicode scalar values may make up a single glyph; however, the result can at least be decoded back into a str.
Rust's str has a handy method char_indices for when you need to know the actual character boundaries. This would immediately simplify your function somewhat:
pub fn strlcpy_utf8(utf8_dst: &mut [u8], str_src: &str) -> usize {
let utf8_dst_len = utf8_dst.len();
if utf8_dst_len == 0 {
return 0;
}
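let mut last_index = 0;
// candidate end positions: the start of each char, plus the end of the string
// (without the end position, the last character would never be copied)
let boundaries = str_src.char_indices().map(|(idx, _)| idx).chain(std::iter::once(str_src.len()));
for idx in boundaries {
// keep room for the null terminator at utf8_dst[idx]
if idx + 1 > utf8_dst_len {
break;
}
last_index = idx;
}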
utf8_dst[0..last_index].copy_from_slice(&str_src.as_bytes()[0..last_index]);
utf8_dst[last_index] = 0;
return last_index + 1;
}
However, you don't actually need to iterate through every character (other than when copying), because finding a boundary in UTF-8 is easy: Rust has str::is_char_boundary(). This lets you look backwards from the end instead:
use std::cmp::min;
pub fn strlcpy_utf8(utf8_dst: &mut [u8], str_src: &str) -> usize {
let utf8_dst_len = utf8_dst.len();
if utf8_dst_len == 0 {
return 0;
}
let mut last_index = min(utf8_dst_len-1, str_src.len());
while last_index > 0 && !str_src.is_char_boundary(last_index) {
last_index -= 1;
}
utf8_dst[0..last_index].copy_from_slice(&str_src.as_bytes()[0..last_index]);
utf8_dst[last_index] = 0;
return last_index + 1;
}
Based on Chris Emerson's answer and @Matthieu-m's suggestion to remove a redundant check.
// returns the number of bytes written, including the null terminator
pub fn strlcpy_utf8(utf8_dst: &mut [u8], str_src: &str) -> usize {
let utf8_dst_len = utf8_dst.len();
if utf8_dst_len == 0 {
return 0;
}
// truncate if 'str_src' is too long
let mut last_index = str_src.len();
if last_index >= utf8_dst_len {
last_index = utf8_dst_len - 1;
// no need to check last_index > 0 here,
// is_char_boundary covers that case
while !str_src.is_char_boundary(last_index) {
last_index -= 1;
}
}
utf8_dst[0..last_index].clone_from_slice(&str_src.as_bytes()[0..last_index]);
utf8_dst[last_index] = 0;
return last_index + 1;
}
@ChrisEmerson: I'm posting this since it's the code I'm going to use for my project; feel free to update your answer with these changes if you like, and I'll remove this answer.
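The same look-backwards idea can also be written as an iterator over candidate lengths. This is a compact sketch, not code from either answer, and the function name `strlcpy_utf8_compact` is hypothetical. It relies on `is_char_boundary(0)` and `is_char_boundary(src.len())` always being true.

```rust
/// Copies as many whole characters of `src` as fit into `dst`, leaving room for a
/// trailing NUL. Returns the bytes written including the NUL, or 0 if `dst` is empty.
pub fn strlcpy_utf8_compact(dst: &mut [u8], src: &str) -> usize {
    if dst.is_empty() {
        return 0; // no room even for the terminator
    }
    let max_len = dst.len() - 1;
    // Walk back from the longest allowed length to the nearest char boundary.
    // Index 0 is always a boundary, so unwrap_or(0) is only a formality.
    let end = (0..=max_len.min(src.len()))
        .rev()
        .find(|&i| src.is_char_boundary(i))
        .unwrap_or(0);
    dst[..end].copy_from_slice(&src.as_bytes()[..end]);
    dst[end] = 0;
    end + 1
}
```

For example, copying "aé" ("é" is two bytes) into a 3-byte buffer leaves room for only two text bytes. Since "é" would be split, only "a" and the terminator are written.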
