generics "specialization" for equal types - rust

I am trying to convert Rc<Vec<F>> into Rc<Vec<T>> where T and F are numeric types like u8, f32, f64, etc. As the vectors may be quite large, I would like to avoid copying them if F and T are the same type. I have not managed to find out how to do that. I would like something like the following, which does not compile because the type comparison T == F is invalid:
fn convert_vec<F: num::NumCast + Copy, T: num::NumCast + Copy>(data: &[F], undef: T) -> Vec<T> {
data.iter()
.map(|v| match T::from(*v) {
Some(x) => x,
None => undef,
})
.collect()
}
fn convert_rc_vec<F: num::NumCast + Copy, T: num::NumCast + Copy>(
data: &Rc<Vec<F>>,
undef: T,
) -> anyhow::Result<Rc<Vec<T>>> {
if (T == F) { // invalid
Ok(data.clone()) // invalid
} else {
Ok(Rc::new(convert_vec(data, undef)))
}
}
The vector that I need to convert from is the response from a server which first sends the data type (something like "u8", "f32", "f64", ...) and then the actual data. At present, I store the vector with these data in enum like
pub enum Values {
UInt8(Rc<Vec<u8>>),
Float32(Rc<Vec<f32>>),
Float64(Rc<Vec<f64>>),
// ...
}
At compile time, I do not know in which format the server will send the data, i.e. I do not know F in advance. I do know T in every case I use it, but T might be a different type depending on the use case.
Using specialized functions like convert_rc_vec_to_f32 it is easy to handle the case where clone() is best. But that requires a separate function for each T with almost identical text. I am trying to find a more elegant solution than writing a macro or more or less repeating the code 9 times.

You should not try to prevent your function from being monomorphized with T and F being the same type, or to change its behavior in that case. Instead, simply do not call it in that case. This is possible because, if T and F are the same type, you know it at compile time, so you can remove the function call altogether.
It seems that you are actually storing all these vectors in an enum, which means you only know the actual type at run time. But my suggestion still applies. For example, if you want a Vec of f32, you can do something like:
match data {
Float32(v) => v,
Float64(v) => convert_rc_vec(v),
UInt8(v) => convert_rc_vec(v),
...
}
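As a sketch of that idea, assuming a trimmed-down copy of the Values enum from the question and a hypothetical to_f32 method (not part of the original code), the matching variant hands back a cheap Rc clone and the other variants build a new vector. Plain casts stand in for the NumCast-based conversion with an undef fallback:

```rust
use std::rc::Rc;

pub enum Values {
    UInt8(Rc<Vec<u8>>),
    Float32(Rc<Vec<f32>>),
    Float64(Rc<Vec<f64>>),
}

impl Values {
    /// Hypothetical accessor: returns the data as `f32`, sharing the
    /// existing allocation when it is already stored as `f32`.
    pub fn to_f32(&self) -> Rc<Vec<f32>> {
        match self {
            // Same type: only the reference count is bumped, no copy.
            Values::Float32(v) => Rc::clone(v),
            // Different type: a new vector has to be built.
            Values::UInt8(v) => Rc::new(v.iter().map(|&x| f32::from(x)).collect()),
            // Lossy cast, used here only to keep the sketch short.
            Values::Float64(v) => Rc::new(v.iter().map(|&x| x as f32).collect()),
        }
    }
}
```

One such method per target type is still needed, but each is a single match rather than a separate generic function per source type.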

If T and F both have a 'static lifetime, then you can use TypeId to compare the two types "at runtime":
if TypeId::of::<T>() == TypeId::of::<F>() {
Ok(data.clone()) // invalid
} else {
/* ... */
}
However, since this comparison happens "at runtime", the type system still doesn't know that T == F inside of this branch. You can use unsafe code to force this "conversion":
if TypeId::of::<T>() == TypeId::of::<F>() {
Ok(unsafe {
// SAFETY: this is sound because `T == F`, so we're
// just helping the compiler along here, with no actual
// type conversions
Rc::<Vec<T>>::from_raw(
Rc::<Vec<F>>::into_raw(data.clone()) as *const _
)
})
} else {
/* ... */
}
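The same runtime check can also be written without unsafe by going through std::any::Any. This is a minimal sketch, not the code from the question: it assumes both types are 'static, and it takes a hypothetical convert closure in place of the num::NumCast conversion so that it only depends on the standard library:

```rust
use std::any::Any;
use std::rc::Rc;

/// Hypothetical helper: shares the allocation when `F` and `T` are the
/// same type, otherwise builds a new vector with `convert`.
fn convert_rc_vec<F, T>(data: &Rc<Vec<F>>, convert: impl Fn(F) -> T) -> Rc<Vec<T>>
where
    F: Copy + 'static,
    T: 'static,
{
    // `downcast_ref` only succeeds when `Rc<Vec<F>>` and `Rc<Vec<T>>`
    // are the same type, so no pointer cast is needed.
    if let Some(same) = (data as &dyn Any).downcast_ref::<Rc<Vec<T>>>() {
        return Rc::clone(same);
    }
    Rc::new(data.iter().map(|&v| convert(v)).collect())
}
```

For example, convert_rc_vec(&bytes, |v| v) with an Rc<Vec<u8>> on both sides returns a clone of the same Rc, while convert_rc_vec(&bytes, |v| f32::from(v)) allocates a new vector.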

Related

Metaprogramming name to function and type lookup in Rust?

I am working on a system which produces and consumes large numbers of "events", they are a name with some small payload of data, and an attached function which is used as a kind of fold-left over the data, something like a reducer.
I receive from the upstream something like {t: 'fieldUpdated', p: {new: 'new field value'}}, and must in my program associate the fieldUpdated "callback" function with the incoming event and apply it. There is a confirmation command I must echo back (which follows a programmatic naming convention), and each type is custom.
I tried using simple macros to do codegen for the structs, callbacks, and with the paste::paste! macro crate, and with the stringify macro I made quite good progress.
Regrettably, however, I did not find a good way to metaprogram these into a list or map using macros. Extending an enum through macros doesn't seem to be possible, and solutions such as the use of ctors seem extremely hacky.
My ideal case is something like this:
type evPayload = {
new: String
}
let evHandler = fn(evPayload: )-> Result<(), Error> { Ok(()) }
// ...
let data = r#"{"t": "fieldUpdated", "p": {"new": "new field value"}}"#;
let v: Value = serde_json::from_str(data)?;
Given only knowledge of data, how can I use macros (the boilerplate is actually 2-3 types, 3 functions, and some factory and helper functions) in a way that lets me do a name-to-function lookup?
It seems like Serde's adjacently or internally tagged representations would get me there, if I could modify an enum in a macro: https://serde.rs/enum-representations.html#internally-tagged
It almost feels like I need a macro which can either maintain an enum, or I can "cheat" and use module scoped ctors to do a quasi-static initialization of the names and types into a map.
My program would have on the order of 40-100 of these, with anything from 3-10 in a module. I don't think ctors are necessarily a problem here, but the fact that they're a little grey area handshake, and that ctors might preclude one day being able to cross-compile to wasm put me off a little.
I actually had need of something similar today; the enum macro part specifically. But beware of my method: here be dragons!
Someone more experienced than me — and less mad — should probably vet this. Please do not assume my SAFETY comments to be correct.
Also, if you don't have variants that collide with Rust keywords, you might want to tear out the '_' prefix hack entirely. I used a static mut byte array for that purpose, as manipulating strings was an order of magnitude slower, but that was benchmarked in a simplified function. There are likely better ways of doing this.
Finally, I am using it where failing to parse must cause a panic, so error handling is somewhat limited.
With that being said, here's my current solution:
use std::fmt::Display;
use std::str::{from_utf8_unchecked, FromStr};
/// NOTE: It is **imperative** that the length of this array is longer than the longest variant name + 1
static mut CHECK_BUFF: [u8; 32] = [b'_'; 32];
macro_rules! str_enums {
($enum:ident: $($variant:ident),* $(,)?) => {
#[allow(non_camel_case_types)]
#[derive(Debug, Default, Hash, Clone, PartialEq, Eq, PartialOrd, Ord)]
enum $enum {
#[default]
UNINIT,
$($variant),*,
UNKNOWN
}
impl FromStr for $enum {
type Err = String;
fn from_str(s: &str) -> Result<Self, Self::Err> {
unsafe {
// `len` must be computed and checked before it is used to slice the buffer
let len = s.len() + 1;
assert!(CHECK_BUFF.len() >= len);
// SAFETY: Currently only single threaded
CHECK_BUFF[1..len].copy_from_slice(s.as_bytes());
// SAFETY: Safe as long as CHECK_BUFF.len() >= s.len() + 1
match from_utf8_unchecked(&CHECK_BUFF[..len]) {
$(stringify!($variant) => Ok(Self::$variant),)*
_ => Err(format!(
"{} variant not accounted for: {s} ({},)",
stringify!($enum),
from_utf8_unchecked(&CHECK_BUFF[..len])
))
}
}
}
}
impl From<&$enum> for &'static str {
fn from(variant: &$enum) -> Self {
unsafe {
match variant {
// SAFETY: The first byte is always '_', and stripping it off should be safe.
$($enum::$variant => from_utf8_unchecked(&stringify!($variant).as_bytes()[1..]),)*
$enum::UNINIT => {
eprintln!("uninitialized {}!", stringify!($enum));
""
}
$enum::UNKNOWN => {
eprintln!("unknown {}!", stringify!($enum));
""
}
}
}
}
}
impl Display for $enum {
fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
write!(f, "{}", Into::<&str>::into(self))
}
}
};
}
And then I call it like so:
str_enums!(
AttributeKind:
_alias,
_allowduplicate,
_altlen,
_api,
...
_enum,
_type,
_struct,
);
str_enums!(
MarkupKind:
_alias,
_apientry,
_command,
_commands,
...
);
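If the variant names can be spelled separately from the strings they parse from, the static mut buffer and the '_' prefix are not needed. A minimal sketch, assuming a hypothetical str_enum! macro that takes Variant => "text" pairs (this is a different interface from the macro above, not a drop-in replacement):

```rust
use std::str::FromStr;

macro_rules! str_enum {
    ($name:ident { $($variant:ident => $text:literal),* $(,)? }) => {
        #[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
        enum $name {
            $($variant),*
        }

        impl FromStr for $name {
            type Err = String;

            fn from_str(s: &str) -> Result<Self, Self::Err> {
                // Matching on string literals needs no buffer and no unsafe.
                match s {
                    $($text => Ok(Self::$variant),)*
                    other => Err(format!("unknown {}: {}", stringify!($name), other)),
                }
            }
        }

        impl $name {
            /// Returns the text this variant was declared with.
            fn as_str(&self) -> &'static str {
                match self {
                    $(Self::$variant => $text,)*
                }
            }
        }
    };
}

// Keywords such as `enum` and `type` are fine on the string side.
str_enum!(AttributeKind {
    Alias => "alias",
    Enum => "enum",
    Type => "type",
});
```

The cost is that each name is written twice, once as an identifier and once as a string.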

Best way in Rust to count leaves in a binary search tree?

I'm developing a basic implementation of a binary search tree in Rust. I was creating a method for counting leaves, but ran into some very strange looking code to get it to work. I wanted to clarify if the way I did it is:
Considered appropriate by Rust standards/convention
Efficient
I'm using an enum that differentiates between a node or nothing being present:
pub enum BST<T: Ord> {
Node {
value: T, // template with type T
left: Box<BST<T>>,
right: Box<BST<T>>,
},
Empty,
}
Now, count_leaves(&self) is first checking if the provided type is either a Node or Empty. If it's Empty, I can just return 0, but if it's a valid Node then I need to check if the left and right children are Empty. If so, then I can return a 1 because I'm at a leaf.
pub fn count_leaves(&self) -> u32 {
match self {
BST::Node {
value: _,
ref left,
ref right,
} => {
match (&**left, &**right) {
(BST::Empty, BST::Empty) => 1,
_ => {
left.count_leaves() + right.count_leaves()
}
}
},
BST::Empty => 0
}
}
So, to check if both left and right are BST::Empty, I wanted to use a tuple! But in doing so, Rust tries to move both left and right into the tuple. Since my type BST<T> does not implement the Copy trait, this is not possible. Also, since left and right are both boxes and borrowed, something simply like this is not possible:
match (left, right) {
BST::Empty => {},
_ => {}
}
In order to use this tuple, it looks like I need to first dereference the borrowed box using *, then dereference that box again into its type using a second *, and then finally borrow using & to avoid a move. This gives the weird looking (&**left, &**right).
From my testing this works, but I thought it looked really strange. Should I rewrite this in a more readable way (if there is one)?
I've considered using Option<> instead of the enum with the Node and Empty, but I wasn't sure if that would lead to anything more readable or more efficient.
Thanks!
EDIT:
Just wanted to clarify that when I say leaves I mean a node in the tree with no children, not a non-empty node.
You're just overthinking it. You already have a base case for when a node is empty so you don't need both matches. When possible you want to ignore the boxes in favor of implicitly using Deref to perform operations on them.
pub fn count_leaves(&self) -> u32 {
match self {
BST::Node { left, right, .. } => 1 + left.count_leaves() + right.count_leaves(),
BST::Empty => 0,
}
}
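Note that the one-arm version above adds 1 for every node, so it returns the number of nodes. Under the definition in the question's EDIT (a leaf is a node with no children), a sketch that still relies on auto-deref rather than &** could use a match guard. The is_empty and leaf helpers are hypothetical additions:

```rust
pub enum BST<T: Ord> {
    Node {
        value: T,
        left: Box<BST<T>>,
        right: Box<BST<T>>,
    },
    Empty,
}

impl<T: Ord> BST<T> {
    /// Hypothetical helper: builds a node with no children.
    pub fn leaf(value: T) -> Self {
        BST::Node {
            value,
            left: Box::new(BST::Empty),
            right: Box::new(BST::Empty),
        }
    }

    fn is_empty(&self) -> bool {
        matches!(self, BST::Empty)
    }

    pub fn count_leaves(&self) -> u32 {
        match self {
            BST::Empty => 0,
            // Method calls auto-deref through the `Box`, so no `&**` is needed.
            BST::Node { left, right, .. } if left.is_empty() && right.is_empty() => 1,
            BST::Node { left, right, .. } => left.count_leaves() + right.count_leaves(),
        }
    }
}
```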
By manually checking if both sides are empty before calling count_leaves on both sides, you might actually be decreasing performance. A recursive function call (or any function call, really) can be very cheap since your code is already at the processor. However, it takes a (very tiny) amount of time for a processor to read a value through a pointer, so ideally you only need to do it once per value. That said, the compiler is made of eldritch sorcery, so it will probably figure out the best way to optimize your code either way. Another option that may help is to add an #[inline] hint to the function to ask the compiler to unroll the recursive call one or more times if it thinks that would help performance.
You may find it helpful to change the structure of your BST. Because your tree is an enum, it needs to be matched every time you perform any operation on it.
use std::cmp::Ordering;
pub struct BST<T> {
left: Option<Box<BST<T>>>,
right: Option<Box<BST<T>>>,
data: T,
}
impl<T> BST<T> {
pub fn new_root(data: T) -> Self {
BST {
left: None,
right: None,
data,
}
}
pub fn count_leaves(&self) -> u64 {
let left_leaves = self.left.as_ref().map_or(0, |x| x.count_leaves());
let right_leaves = self.right.as_ref().map_or(0, |x| x.count_leaves());
left_leaves + right_leaves + 1
}
}
impl<T: Ord> BST<T> {
pub fn insert(&mut self, data: T) {
let side = match self.data.cmp(&data) {
Ordering::Less | Ordering::Equal => &mut self.left,
Ordering::Greater => &mut self.right,
};
if let Some(node) = side {
node.insert(data);
} else {
*side = Some(Box::new(Self::new_root(data)));
}
}
}
Now this works well, but it also introduces a new problem that I'm guessing you were attempting to avoid with your solution. You can't create an empty BST<T>. This may make initializing your program difficult. We can fix this by using a small wrapper struct (Ex: pub struct BinarySearchTree<T>(Option<BST<T>>)). This is also what std::collections::LinkedList does. You may also be surprised to learn that this cuts our memory footprint in half compared to the original post. This is caused by Empty requiring just as much space as Node. So this means we need to allocate the entire next layer of the tree even though we don't use it.
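A minimal sketch of such a wrapper follows. The names Node, BinarySearchTree and len are hypothetical, and sending smaller values left is an assumed ordering rather than the one used above:

```rust
use std::cmp::Ordering;

struct Node<T> {
    left: Option<Box<Node<T>>>,
    right: Option<Box<Node<T>>>,
    data: T,
}

impl<T: Ord> Node<T> {
    fn new(data: T) -> Self {
        Node { left: None, right: None, data }
    }

    fn insert(&mut self, data: T) {
        // Assumed convention: smaller values go left, everything else right.
        let side = match data.cmp(&self.data) {
            Ordering::Less => &mut self.left,
            Ordering::Equal | Ordering::Greater => &mut self.right,
        };
        match side.as_mut() {
            Some(node) => node.insert(data),
            None => *side = Some(Box::new(Node::new(data))),
        }
    }

    fn len(&self) -> usize {
        1 + self.left.as_ref().map_or(0, |n| n.len())
            + self.right.as_ref().map_or(0, |n| n.len())
    }
}

/// Wrapper that can represent a tree with no nodes at all.
pub struct BinarySearchTree<T>(Option<Node<T>>);

impl<T: Ord> BinarySearchTree<T> {
    pub fn new() -> Self {
        BinarySearchTree(None)
    }

    pub fn insert(&mut self, data: T) {
        match self.0.as_mut() {
            Some(root) => root.insert(data),
            None => self.0 = Some(Node::new(data)),
        }
    }

    /// Number of stored values; 0 for an empty tree.
    pub fn len(&self) -> usize {
        self.0.as_ref().map_or(0, |root| root.len())
    }
}
```

Only the wrapper is public, so callers never have to handle the "no root yet" case themselves.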

Refer to generic type of struct in macro

I need to use an attribute of the generic type of a struct in a macro.
A slightly contrived but minimal example would be if I wanted to implement a method for a generic struct, that returned the minimum value of its generic type.
struct Barn<T> {
hay: T
}
macro_rules! impl_min_hay{
($barntype:ident) => {
impl $barntype {
fn min_hay(&self) -> ????T {
????T::MIN
}
}
}
}
type SmallBarn = Barn<i8>;
type BigBarn = Barn<i64>;
impl_min_hay!(SmallBarn);
impl_min_hay!(BigBarn);
fn main() {
let barn = SmallBarn { hay: 5 };
println!("{}", barn.min_hay());
}
How would I resolve from SmallBarn to get the generic type and thus its MIN attribute?
The actual problem I am trying to solve is a change to this macro. The macro is applied to, among others, BooleanChunked, which is defined as:
pub type BooleanChunked = ChunkedArray<BooleanType>
And I need to use an attribute of BooleanType.
The only general solution I can think of is to define a trait that allows you to get at the type parameter (the syntax for this is <Type as Trait>::AssociatedType):
trait HasHayType {
type HayType;
}
impl<T> HasHayType for Barn<T> {
type HayType = T;
}
macro_rules! impl_min_hay{
($barntype:ident) => {
impl $barntype {
fn min_hay(&self) -> <$barntype as HasHayType>::HayType {
<$barntype as HasHayType>::HayType::MIN
}
}
}
}
Here's the complete program on play.rust-lang.org.
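Put together with the types from the question, a self-contained sketch of this approach could look as follows. The extra angle brackets around the associated type are only there to make the path unambiguous:

```rust
struct Barn<T> {
    hay: T,
}

// Exposes the type parameter of `Barn<T>` as an associated type.
trait HasHayType {
    type HayType;
}

impl<T> HasHayType for Barn<T> {
    type HayType = T;
}

macro_rules! impl_min_hay {
    ($barntype:ident) => {
        impl $barntype {
            fn min_hay(&self) -> <$barntype as HasHayType>::HayType {
                // Resolves to e.g. `i8::MIN` once the alias is expanded.
                <<$barntype as HasHayType>::HayType>::MIN
            }
        }
    };
}

type SmallBarn = Barn<i8>;
type BigBarn = Barn<i64>;

impl_min_hay!(SmallBarn);
impl_min_hay!(BigBarn);
```

With this in place, SmallBarn { hay: 5 }.min_hay() evaluates to i8::MIN.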
That said, once you have a trait, you don't really need the macro – you can just implement min_hay on the trait (this example makes use of the widely used num-traits crate, because this approach needs a trait for "things that have minimum values"):
use num_traits::Bounded;
trait HasHayType {
type HayType: Bounded;
fn min_hay(&self) -> Self::HayType;
}
impl<T: Bounded> HasHayType for Barn<T> {
type HayType = T;
fn min_hay(&self) -> T {
T::min_value()
}
}
And here's what that looks like as a complete program.
(And of course, once you've done that too, you don't really need the separate trait either: you can inline the definition of HasHayType into Barn, using a where clause if you want to be able to handle Barns with non-numerical hay types in addition to the ones where you'd use the macro. Presumably, though, the actual situation you have is more complex than the cut-down example you used for the question, so I gave the more complex versions in case the simplified versions wouldn't work.)
As a side note, min_hay doesn't actually need the &self parameter here; you could remove it, in order to be able to learn the minimum amount of hay without needing a barn to put it in.

Is there a shorter way than a match or if let to get data through many nested levels without increasing the total amount of code?

I work with a bunch of structs / enums nested in each other. I need to get ty.node<TyKind::Path>.1.segments.last().identifier and ty.node<TyKind::Path>.1.segments.last().parameters<AngleBracketed::AngleBracketed>.types.
Is there a simpler way to get these two values than my implementation of f? My ideal syntax would be:
ty.node<TyKind::Path>?.1.segments.last().identifier
// and
ty.node<TyKind::Path>?.1.segments.last().parameters<AngleBracketed::AngleBracketed>?.types
If that's impossible, maybe there is a way to reduce the number of if let? I want to solve only this particular case, so simplification should be possible compared to f. If an analog of Option::map / Option::unwrap_or_else were introduced, then the sum of its code + the code in f should be less than my original f.
#[derive(Clone)]
struct Ty {
node: TyKind,
}
#[derive(Clone)]
enum TyKind {
Path(Option<i32>, Path),
}
#[derive(Clone)]
struct Path {
segments: Vec<PathSegment>,
}
#[derive(Clone)]
struct PathSegment {
identifier: String,
parameters: Option<Box<PathParameters>>,
}
#[derive(Clone)]
enum PathParameters {
AngleBracketed(AngleBracketedParameterData),
}
#[derive(Clone)]
struct AngleBracketedParameterData {
types: Vec<Box<Ty>>,
}
/// If ty.node == Path -> return last path segment + types
fn f(ty: &Ty) -> Option<(String, Vec<Box<Ty>>)> {
match ty.node {
TyKind::Path(_, ref path) => if let Some(seg) = path.segments.iter().last() {
let ident = seg.identifier.clone();
println!("next_ty: seg.id {:?}", seg.identifier);
match seg.parameters.as_ref() {
Some(params) => match **params {
PathParameters::AngleBracketed(ref params) => {
Some((ident, params.types.clone()))
}
_ => Some((ident, vec![])),
},
None => Some((ident, vec![])),
}
} else {
None
},
_ => None,
}
}
To simplify the question, I have removed unrelated enum variants and struct fields.
No.
The closest you can get, using nightly features and helper code, is probably this
fn f(ty: &Ty) -> MyOption<(String, Vec<Box<Ty>>)> {
let last = ty.node.path()?.segments.my_last()?;
Just((
last.identifier.clone(),
last.ab_parameters()
.map(|v| v.types.clone())
.unwrap_or_else(|| vec![]),
))
}
Playground
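On stable Rust, a similar shape is possible with plain Option and the ? operator, at the cost of two small accessor methods. This is a sketch under assumptions: path and angle_bracketed are hypothetical helper names, and the types are the reduced ones from the question (with Debug and PartialEq derived for convenience):

```rust
#[derive(Clone, Debug, PartialEq)]
struct Ty {
    node: TyKind,
}

#[derive(Clone, Debug, PartialEq)]
enum TyKind {
    Path(Option<i32>, Path),
}

#[derive(Clone, Debug, PartialEq)]
struct Path {
    segments: Vec<PathSegment>,
}

#[derive(Clone, Debug, PartialEq)]
struct PathSegment {
    identifier: String,
    parameters: Option<Box<PathParameters>>,
}

#[derive(Clone, Debug, PartialEq)]
enum PathParameters {
    AngleBracketed(AngleBracketedParameterData),
}

#[derive(Clone, Debug, PartialEq)]
struct AngleBracketedParameterData {
    types: Vec<Box<Ty>>,
}

impl TyKind {
    /// Hypothetical accessor; with more variants the others would return `None`.
    fn path(&self) -> Option<&Path> {
        match self {
            TyKind::Path(_, path) => Some(path),
        }
    }
}

impl PathSegment {
    /// Hypothetical accessor for the angle-bracketed parameters, if any.
    fn angle_bracketed(&self) -> Option<&AngleBracketedParameterData> {
        match self.parameters.as_deref()? {
            PathParameters::AngleBracketed(data) => Some(data),
        }
    }
}

fn f(ty: &Ty) -> Option<(String, Vec<Box<Ty>>)> {
    let last = ty.node.path()?.segments.last()?;
    let types = last
        .angle_bracketed()
        .map(|data| data.types.clone())
        .unwrap_or_default();
    Some((last.identifier.clone(), types))
}
```

The accessors add code of their own, so whether this is shorter overall depends on how often they are reused.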
I guess what you want is called lenses. I'm not sure about Rust, but here is a description for Haskell: https://en.m.wikibooks.org/wiki/Haskell/Lenses_and_functional_references
It might be possible to implement that in Rust, if somebody hasn't done so already.

How can I reuse a box that I have moved the value out of?

I have some non-copyable type and a function that consumes and (maybe) produces it:
type Foo = Vec<u8>;
fn quux(_: Foo) -> Option<Foo> {
Some(Vec::new())
}
Now consider a type that is somehow conceptually very similar to Box:
struct NotBox<T> {
contents: T
}
We can write a function that temporarily moves out contents of the NotBox and puts something back in before returning it:
fn bar(mut notbox: NotBox<Foo>) -> Option<NotBox<Foo>> {
let foo = notbox.contents; // now `notbox` is "empty"
match quux(foo) {
Some(new_foo) => {
notbox.contents = new_foo; // we put something back in
Some(notbox)
}
None => None
}
}
I want to write an analogous function that works with Boxes but the compiler does not like it:
fn baz(mut abox: Box<Foo>) -> Option<Box<Foo>> {
let foo = *abox; // now `abox` is "empty"
match quux(foo) {
Some(new_foo) => {
*abox = new_foo; // error: use of moved value: `abox`
Some(abox)
}
None => None
}
}
I could return Some(Box::new(new_foo)) instead but that performs unnecessary allocation - I already have some memory at my disposal! Is it possible to avoid that?
I would also like to get rid of the match statements but again the compiler is not happy with it (even for the NotBox version):
fn bar(mut notbox: NotBox<Foo>) -> Option<NotBox<Foo>> {
let foo = notbox.contents;
quux(foo).map(|new_foo| {
notbox.contents = new_foo; // error: capture of partially moved value: `notbox`
notbox
})
}
Is it possible to work around that?
So, moving out of a Box is a special case... now what?
The std::mem module presents a number of safe functions to move values around, without poking holes (!) into the memory safety of Rust. Of interest here are swap and replace:
pub fn replace<T>(dest: &mut T, src: T) -> T
Which we can use like so:
fn baz(mut abox: Box<Foo>) -> Option<Box<Foo>> {
let foo = std::mem::replace(&mut *abox, Foo::default());
match quux(foo) {
Some(new_foo) => {
*abox = new_foo;
Some(abox)
}
None => None
}
}
It also helps in the map case, because it does not borrow the Box:
fn baz(mut abox: Box<Foo>) -> Option<Box<Foo>> {
let foo = std::mem::replace(&mut *abox, Foo::default());
quux(foo).map(|new_foo| { *abox = new_foo; abox })
}
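Because Foo here is Vec<u8>, which implements Default, std::mem::take can stand in for replace with Foo::default(). A small sketch using the definitions from the question; the Box that comes back is the one that went in, so no new heap allocation is made for it:

```rust
type Foo = Vec<u8>;

fn quux(_: Foo) -> Option<Foo> {
    Some(Vec::new())
}

fn baz(mut abox: Box<Foo>) -> Option<Box<Foo>> {
    // Leaves `Foo::default()` (an empty Vec) behind in the box.
    let foo = std::mem::take(&mut *abox);
    quux(foo).map(|new_foo| {
        // The box is still fully initialized, so it can be written to and returned.
        *abox = new_foo;
        abox
    })
}
```

If quux returns None, the box and the placeholder value inside it are simply dropped.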
Moving out of boxes is special-cased in the compiler. You can move something out of them, but you can't move something back in, because the act of moving out also deallocates. You can do something silly with std::ptr::write, std::ptr::read, and std::ptr::replace, but it's hard to get it right, because something valid should be inside a Box when it is dropped. I would recommend just accepting the allocation, or switching to a Box<Option<Foo>> instead.
"We can write a function that temporarily moves out contents of the NotBox and puts something back in before returning it"
That's because you can partially move out from the struct that you take by value. It behaves as if all fields were separate variables. That is not possible though if the struct implements Drop, because drop needs the whole struct to be valid, always (in case of panic).
As for a workaround, you haven't provided enough information. In particular, why does baz need to take a Box as an argument, and why can't quux? Which functions are yours and which are part of an API you can't change? What is the real type of Foo? Is it big?
The best workaround would be not to use Box at all.
