Attempt to implement sscanf in Rust, failing when passing &str as argument - rust

Problem:
Im new to Rust, and im trying to implement a macro which simulates sscanf from C.
So far it works with any numeric types, but not with strings, as i am already trying to parse a string.
macro_rules! splitter {
( $string:expr, $sep:expr) => {
let mut iter:Vec<&str> = $string.split($sep).collect();
iter
}
}
macro_rules! scan_to_types {
($buffer:expr,$sep:expr,[$($y:ty),+],$($x:expr),+) => {
let res = splitter!($buffer,$sep);
let mut i = 0;
$(
$x = res[i].parse::<$y>().unwrap_or_default();
i+=1;
)*
};
}
fn main() {
let mut a :u8; let mut b :i32; let mut c :i16; let mut d :f32;
let buffer = "00:98;76,39.6";
let sep = [':',';',','];
scan_to_types!(buffer,sep,[u8,i32,i16,f32],a,b,c,d); // this will work
println!("{} {} {} {}",a,b,c,d);
}
This obviously wont work, because at compile time, it will try to parse a string slice to str:
let a :u8; let b :i32; let c :i16; let d :f32; let e :&str;
let buffer = "02:98;abc,39.6";
let sep = [':',';',','];
scan_to_types!(buffer,sep,[u8,i32,&str,f32],a,b,e,d);
println!("{} {} {} {}",a,b,e,d);
$x = res[i].parse::<$y>().unwrap_or_default();
| ^^^^^ the trait `FromStr` is not implemented for `&str`
What i have tried
I have tried to compare types using TypeId, and a if else condition inside of the macro to skip the parsing, but the same situation happens, because it wont expand to a valid code:
macro_rules! scan_to_types {
($buffer:expr,$sep:expr,[$($y:ty),+],$($x:expr),+) => {
let res = splitter!($buffer,$sep);
let mut i = 0;
$(
if TypeId::of::<$y>() == TypeId::of::<&str>(){
$x = res[i];
}else{
$x = res[i].parse::<$y>().unwrap_or_default();
}
i+=1;
)*
};
}
Is there a way to set conditions or skip a repetition inside of a macro ? Or instead, is there a better aproach to build sscanf using macros ? I have already made functions which parse those strings, but i couldnt pass types as arguments, or make them generic.

Note before the answer: you probably don't want to emulate sscanf() in Rust. There are many very capable parsers in Rust, so you should probably use one of them.
Simple answer: the simplest way to address your problem is to replace the use of &str with String, which makes your macro compile and run. If your code is not performance-critical, that is probably all you need. If you care about performance and about avoiding allocation, read on.
A downside of String is that under the hood it copies the string data from the string you're scanning into a freshly allocated owned string. Your original approach of using an &str should have allowed for your &str to directly point into the data that was scanned, without any copying. Ideally we'd like to write something like this:
trait MyFromStr {
fn my_from_str(s: &str) -> Self;
}
// when called on a type that impls `FromStr`, use `parse()`
impl<T: FromStr + Default> MyFromStr for T {
fn my_from_str(s: &str) -> T {
s.parse().unwrap_or_default()
}
}
// when called on &str, just return it without copying
impl MyFromStr for &str {
fn my_from_str(s: &str) -> &str {
s
}
}
Unfortunately that doesn't compile, complaining of a "conflicting implementation of trait MyFromStr for &str", even though there is no conflict between the two implementations, as &str doesn't implement FromStr. But the way Rust currently works, a blanket implementation of a trait precludes manual implementations of the same trait, even on types not covered by the blanket impl.
In the future this will be resolved by specialization. Specialization is not yet part of stable Rust, and might not come to stable Rust for years, so we have to think of another solution. In case of macro usage, we can just let the compiler "specialize" for us by creating two traits with the same name. (This is similar to the autoref-based specialization invented by David Tolnay, but even simpler because it doesn't require autoref resolution to work, as we have the types provided explicitly.)
We create separate traits for parsed and unparsed values, and implement them as needed:
trait ParseFromStr {
fn my_from_str(s: &str) -> Self;
}
impl<T: FromStr + Default> ParseFromStr for T {
fn my_from_str(s: &str) -> T {
s.parse().unwrap_or_default()
}
}
pub trait StrFromStr {
fn my_from_str(s: &str) -> &str;
}
impl StrFromStr for &str {
fn my_from_str(s: &str) -> &str {
s
}
}
Then in the macro we just call <$y>::my_from_str() and let the compiler generate the correct code. Since macros are untyped, this works because we never need to provide a single "trait bound" that would disambiguate which my_from_str() we want. (Such a trait bound would require specialization.)
macro_rules! scan_to_types {
($buffer:expr,$sep:expr,[$($y:ty),+],$($x:expr),+) => {
#[allow(unused_assignments)]
{
let res = splitter!($buffer,$sep);
let mut i = 0;
$(
$x = <$y>::my_from_str(&res[i]);
i+=1;
)*
}
};
}
Complete example in the playground.

Related

Taking ownership of a &String without copying

Context
Link: https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=9a9ffa99023735f4fbedec09e1c7ac55
Here's a contrived repro of what I'm running into
fn main() {
let mut s = String::from("Hello World");
example(&mut s);
}
fn example(s: &mut str) -> Option<String> {
other_func(Some(s.to_owned()))
// other random mutable stuff happens
}
fn other_func(s: Option<String>) {
match s {
Some(ref s) => other_func2(*s),
None => panic!()
}
}
fn other_func2(s: String) {
println!("{}", &s)
}
and the error
Compiling playground v0.0.1 (/playground)
error[E0507]: cannot move out of `*s` which is behind a shared reference
--> src/main.rs:12:36
|
12 | Some(ref s) => other_func2(*s),
| ^^ move occurs because `*s` has type `String`, which does not implement the `Copy` trait
Question
In the following code, why can't I deference the &String without having to do some sort of clone/copy? i.e. this doesn't work
fn other_func(s: Option<String>) {
match s {
Some(ref s) => other_func2(*s),
None => panic!()
}
}
but it works if I replace *s with s.to_owned()/s.to_string()/s.clone()
As an aside, I understand this can probably be solved by refactoring to use &str, but I'm specifically interested in turning &String -> String
Why would the compiler allow you to?
s is &String. And you cannot get a String from a &String without cloning. That's obvious.
And the fact that it was created from an owned String? The compiler doesn't care, and it is right. This is not different from the following code:
let s: String = ...;
let r: &String = ...;
let s2: String = *r; // Error
Which is in turn not different from the following code, for instance, as far as the compiler is concerned:
let r: &String = ...;
let s: String = *s;
And we no longer have an owned string at the beginning. In general, the compiler doesn't track data flow. And rightfully so - when it type-checks the move it doesn't even can confirm that this reference isn't aliased. Or that the owned value is not used anymore. References are just references, they give you no right to drop the value.
Changing that will not be feasible in the general case (for example, the compiler will have to track data flow across function calls), and will require some form of manual annotation to say "this value is mine". And you already have such annotation - use an owned value, String, instead of &String: this is exactly what it's about.

Writing to a file or String in Rust

TL;DR: I want to implement trait std::io::Write that outputs to a memory buffer, ideally String, for unit-testing purposes.
I must be missing something simple.
Similar to another question, Writing to a file or stdout in Rust, I am working on a code that can work with any std::io::Write implementation.
It operates on structure defined like this:
pub struct MyStructure {
writer: Box<dyn Write>,
}
Now, it's easy to create instance writing to either a file or stdout:
impl MyStructure {
pub fn use_stdout() -> Self {
let writer = Box::new(std::io::stdout());
MyStructure { writer }
}
pub fn use_file<P: AsRef<Path>>(path: P) -> Result<Self> {
let writer = Box::new(File::create(path)?);
Ok(MyStructure { writer })
}
pub fn printit(&mut self) -> Result<()> {
self.writer.write(b"hello")?;
Ok(())
}
}
But for unit testing, I also need to have a way to run the business logic (here represented by method printit()) and trap its output, so that its content can be checked in the test.
I cannot figure out how to implement this. This playground code shows how I would like to use it, but it does not compile because it breaks borrowing rules.
// invalid code - does not compile!
fn main() {
let mut buf = Vec::new(); // This buffer should receive output
let mut x2 = MyStructure { writer: Box::new(buf) };
x2.printit().unwrap();
// now, get the collected output
let output = std::str::from_utf8(buf.as_slice()).unwrap().to_string();
// here I want to analyze the output, for instance in unit-test asserts
println!("Output to string was {}", output);
}
Any idea how to write the code correctly? I.e., how to implement a writer on top of a memory structure (String, Vec, ...) that can be accessed afterwards?
Something like this does work:
let mut buf = Vec::new();
{
// Use the buffer by a mutable reference
//
// Also, we're doing it inside another scope
// to help the borrow checker
let mut x2 = MyStructure { writer: Box::new(&mut buf) };
x2.printit().unwrap();
}
let output = std::str::from_utf8(buf.as_slice()).unwrap().to_string();
println!("Output to string was {}", output);
However, in order for this to work, you need to modify your type and add a lifetime parameter:
pub struct MyStructure<'a> {
writer: Box<dyn Write + 'a>,
}
Note that in your case (where you omit the + 'a part) the compiler assumes that you use 'static as the lifetime of the trait object:
// Same as your original variant
pub struct MyStructure {
writer: Box<dyn Write + 'static>
}
This limits the set of types which could be used here, in particular, you cannot use any kinds of borrowed references. Therefore, for maximum genericity we have to be explicit here and define a lifetime parameter.
Also note that depending on your use case, you can use generics instead of trait objects:
pub struct MyStructure<W: Write> {
writer: W
}
In this case the types are fully visible at any point of your program, and therefore no additional lifetime annotation is needed.

Rust: polymorphic calls for structs in a vector

I'm a complete newbie in Rust and I'm trying to get some understanding of the basics of the language.
Consider the following trait
trait Function {
fn value(&self, arg: &[f64]) -> f64;
}
and two structs implementing it:
struct Add {}
struct Multiply {}
impl Function for Add {
fn value(&self, arg: &[f64]) -> f64 {
arg[0] + arg[1]
}
}
impl Function for Multiply {
fn value(&self, arg: &[f64]) -> f64 {
arg[0] * arg[1]
}
}
In my main() function I want to group two instances of Add and Multiply in a vector, and then call the value method. The following works:
fn main() {
let x = vec![1.0, 2.0];
let funcs: Vec<&dyn Function> = vec![&Add {}, &Multiply {}];
for f in funcs {
println!("{}", f.value(&x));
}
}
And so does:
fn main() {
let x = vec![1.0, 2.0];
let funcs: Vec<Box<dyn Function>> = vec![Box::new(Add {}), Box::new(Multiply {})];
for f in funcs {
println!("{}", f.value(&x));
}
}
Is there any better / less verbose way? Can I work around wrapping the instances in a Box? What is the takeaway with trait objects in this case?
Is there any better / less verbose way?
There isn't really a way to make this less verbose. Since you are using trait objects, you need to tell the compiler that the vectors's items are dyn Function and not the concrete type. The compiler can't just infer that you meant dyn Function trait objects because there could have been other traits that Add and Multiply both implement.
You can't abstract out the calls to Box::new either. For that to work, you would have to somehow map over a heterogeneous collection, which isn't possible in Rust. However, if you are writing this a lot, you might consider adding helper constructor functions for each concrete impl:
impl Add {
fn new() -> Add {
Add {}
}
fn new_boxed() -> Box<Add> {
Box::new(Add::new())
}
}
It's idiomatic to include a new constructor wherever possible, but it's also common to include alternative convenience constructors.
This makes the construction of the vector a bit less noisy:
let funcs: Vec<Box<dyn Function>> = vec!(Add::new_boxed(), Multiply::new_boxed()));
What is the takeaway with trait objects in this case?
There is always a small performance hit with using dynamic dispatch. If all of your objects are the same type, they can be densely packed in memory, which can be much faster for iteration. In general, I wouldn't worry too much about this unless you are creating a library crate, or if you really want to squeeze out the last nanosecond of performance.

Rc<Trait> to Option<T>?

I'm trying to implement a method that looks like:
fn concretify<T: Any>(rc: Rc<Any>) -> Option<T> {
Rc::try_unwrap(rc).ok().and_then(|trait_object| {
let b: Box<Any> = unimplemented!();
b.downcast().ok().map(|b| *b)
})
}
However, try_unwrap doesn't work on trait objects (which makes sense, as they're unsized). My next thought was to try to find some function that unwraps Rc<Any> into Box<Any> directly. The closest thing I could find would be
if Rc::strong_count(&rc) == 1 {
Some(unsafe {
Box::from_raw(Rc::into_raw(rc))
})
} else {
None
}
However, Rc::into_raw() appears to require that the type contained in the Rc to be Sized, and I'd ideally not like to have to use unsafe blocks.
Is there any way to implement this?
Playground Link, I'm looking for an implementation of rc_to_box here.
Unfortunately, it appears that the API of Rc is lacking the necessary method to be able to get ownership of the wrapped type when it is !Sized.
The only method which may return the interior item of a Rc is Rc::try_unwrap, however it returns Result<T, Rc<T>> which requires that T be Sized.
In order to do what you wish, you would need to have a method with a signature: Rc<T> -> Result<Box<T>, Rc<T>>, which would allow T to be !Sized, and from there you could extract Box<Any> and perform the downcast call.
However, this method is impossible due to how Rc is implemented. Here is a stripped down version of Rc:
struct RcBox<T: ?Sized> {
strong: Cell<usize>,
weak: Cell<usize>,
value: T,
}
pub struct Rc<T: ?Sized> {
ptr: *mut RcBox<T>,
_marker: PhantomData<T>,
}
Therefore, the only Box you can get out of Rc<T> is Box<RcBox<T>>.
Note that the design is severely constrained here:
single-allocation mandates that all 3 elements be in a single struct
T: ?Sized mandates that T be the last field
so there is little room for improvement in general.
However, in your specific case, it is definitely possible to improve on the generic situation. It does, of course, require unsafe code. And while it works fairly well with Rc, implementing it with Arc would be complicated by the potential data-races.
Oh... and the code is provided as is, no warranty implied ;)
use std::any::Any;
use std::{cell, mem, ptr};
use std::rc::Rc;
struct RcBox<T: ?Sized> {
strong: cell::Cell<usize>,
_weak: cell::Cell<usize>,
value: T,
}
fn concretify<T: Any>(rc: Rc<Any>) -> Option<T> {
// Will be responsible for freeing the memory if there is no other weak
// pointer by the end of this function.
let _guard = Rc::downgrade(&rc);
unsafe {
let killer: &RcBox<Any> = {
let killer: *const RcBox<Any> = mem::transmute(rc);
&*killer
};
if killer.strong.get() != 1 { return None; }
// Do not forget to decrement the count if we do take ownership,
// as otherwise memory will not get released.
let result = killer.value.downcast_ref().map(|r| {
killer.strong.set(0);
ptr::read(r as *const T)
});
// Do not forget to destroy the content of the box if we did not
// take ownership
if result.is_none() {
let _: Rc<Any> = mem::transmute(killer as *const RcBox<Any>);
}
result
}
}
fn main() {
let x: Rc<Any> = Rc::new(1);
println!("{:?}", concretify::<i32>(x));
}
I don't think it's possible to implement your concretify function if you're expecting it to move the original value back out of the Rc; see this question for why.
If you're willing to return a clone, it's straightforward:
fn concretify<T: Any+Clone>(rc: Rc<Any>) -> Option<T> {
rc.downcast_ref().map(Clone::clone)
}
Here's a test:
#[derive(Debug,Clone)]
struct Foo(u32);
#[derive(Debug,Clone)]
struct Bar(i32);
fn main() {
let rc_foo: Rc<Any> = Rc::new(Foo(42));
let rc_bar: Rc<Any> = Rc::new(Bar(7));
let foo: Option<Foo> = concretify(rc_foo);
println!("Got back: {:?}", foo);
let bar: Option<Foo> = concretify(rc_bar);
println!("Got back: {:?}", bar);
}
This outputs:
Got back: Some(Foo(42))
Got back: None
Playground
If you want something more "movey", and creating your values is cheap, you could also make a dummy, use downcast_mut() instead of downcast_ref(), and then std::mem::swap with the dummy.

How to take ownership of Any:downcast_ref from trait object?

I've met a conflict with Rust's ownership rules and a trait object downcast. This is a sample:
use std::any::Any;
trait Node{
fn gen(&self) -> Box<Node>;
}
struct TextNode;
impl Node for TextNode{
fn gen(&self) -> Box<Node>{
Box::new(TextNode)
}
}
fn main(){
let mut v: Vec<TextNode> = Vec::new();
let node = TextNode.gen();
let foo = &node as &Any;
match foo.downcast_ref::<TextNode>(){
Some(n) => {
v.push(*n);
},
None => ()
};
}
The TextNode::gen method has to return Box<Node> instead of Box<TextNode>, so I have to downcast it to Box<TextNode>.
Any::downcast_ref's return value is Option<&T>, so I can't take ownership of the downcast result and push it to v.
====edit=====
As I am not good at English, my question is vague.
I am implementing (copying may be more precise) the template parser in Go standard library.
What I really need is a vector, Vec<Box<Node>> or Vec<Box<Any>>, which can contain TextNode, NumberNode, ActionNode, any type of node that implements the trait Node can be pushed into it.
Every node type needs to implement the copy method, return Box<Any>, and then downcasting to the concrete type is OK. But to copy Vec<Box<Any>>, as you don't know the concrete type of every element, you have to check one by one, that is really inefficient.
If the copy method returns Box<Node>, then copying Vec<Box<Node>> is simple. But it seems that there is no way to get the concrete type from trait object.
If you control trait Node you can have it return a Box<Any> and use the Box::downcast method
It would look like this:
use std::any::Any;
trait Node {
fn gen(&self) -> Box<Any>; // downcast works on Box<Any>
}
struct TextNode;
impl Node for TextNode {
fn gen(&self) -> Box<Any> {
Box::new(TextNode)
}
}
fn main() {
let mut v: Vec<TextNode> = Vec::new();
let node = TextNode.gen();
if let Ok(n) = node.downcast::<TextNode>() {
v.push(*n);
}
}
Generally speaking, you should not jump to using Any. I know it looks familiar when coming from a language with subtype polymorphism and want to recreate a hierarchy of types with some root type (like in this case: you're trying to recreate the TextNode is a Node relationship and create a Vec of Nodes). I did it too and so did many others: I bet the number of SO questions on Any outnumbers the times Any is actually used on crates.io.
While Any does have its uses, in Rust it has alternatives.
In case you have not looked at them, I wanted to make sure you considered doing this with:
enums
Given different Node types you can express the "a Node is any of these types" relationship with an enum:
struct TextNode;
struct XmlNode;
struct HtmlNode;
enum Node {
Text(TextNode),
Xml(XmlNode),
Html(HtmlNode),
}
With that you can put them all in one Vec and do different things depending on the variant, without downcasting:
let v: Vec<Node> = vec![
Node::Text(TextNode),
Node::Xml(XmlNode),
Node::Html(HtmlNode)];
for n in &v {
match n {
&Node::Text(_) => println!("TextNode"),
&Node::Xml(_) => println!("XmlNode"),
&Node::Html(_) => println!("HtmlNode"),
}
}
playground
adding a variant means potentially changing your code in many places: the enum itself and all the functions that do something with the enum (to add the logic for the new variant). But then again, with Any it's mostly the same, all those functions might need to add the downcast to the new variant.
Trait objects (not Any)
You can try putting the actions you'd want to perform on the various types of nodes in the trait, so you don't need to downcast, but just call methods on the trait object.
This is essentially what you were doing, except putting the method on the Node trait instead of downcasting.
playground
The (more) ideomatic way for the problem:
use std::any::Any;
pub trait Nodeable {
fn as_any(&self) -> &dyn Any;
}
#[derive(Clone, Debug)]
struct TextNode {}
impl Nodeable for TextNode {
fn as_any(&self) -> &dyn Any {
self
}
}
fn main() {
let mut v: Vec<Box<dyn Nodeable>> = Vec::new();
let node = TextNode {}; // or impl TextNode::new
v.push(Box::new(node));
// the downcast back to TextNode could be solved like this:
if let Some(b) = v.pop() { // only if we have a node…
let n = (*b).as_any().downcast_ref::<TextNode>().unwrap(); // this is secure *)
println!("{:?}", n);
};
}
*) This is secure: only Nodeables are allowd to be downcasted to types that had Nodeable implemented.

Resources