Matching multiple possible types? - rust

I'm very, very new to Rust and struggling with it because of my strong weakly typed programming background.
The code below should write data received from Python via PyO3 into an XLSX worksheet. I just don't know how to handle the last match, because "value" is of type PyAny (that is, its extract method can return multiple types such as String, f32, etc., and I want a specific behavior depending on the extracted type).
Maybe I could just chain matches for each potential extracted type (if first outputs Err, try the next), but I suspect there could be a better way. Maybe I'm just approaching the problem with a wrong design. Any insights will be welcome.
pub trait WriteValue {
fn write_value(&self, worksheet: &mut Worksheet, row: u32, col: u16, format: Option<&Format>) -> Result<(), XlsxError>;
}
impl WriteValue for String {
fn write_value(&self, worksheet: &mut Worksheet, row: u32, col: u16, format: Option<&Format>) -> Result<(), XlsxError> {
worksheet.write_string(row, col, &self, format)
}
}
impl WriteValue for f32 {
fn write_value(&self, worksheet: &mut Worksheet, row: u32, col: u16, format: Option<&Format>) -> Result<(), XlsxError> {
worksheet.write_number(row, col, f64::from(*self), format)
}
}
fn _write(path: &str, data: HashMap<u32, &PyList>, _highlight: Option<&PyDict>) -> Result<(), XlsxError> {
let workbook = Workbook::new(path);
let mut worksheet = workbook.add_worksheet(None)?;
let format_bold = workbook.add_format().set_bold();
for (row_index, values) in data {
let mut col_idx: u16 = 0;
for value in values {
col_idx += 1;
let row_format= match &row_index {
0 => Some(&format_bold),
_ => None
};
match value.extract::<String>() {
Ok(x) => x.write_value(&mut worksheet, row_index.clone(), &col_idx -1, row_format)?,
Err(_) => { }
}
}
}
workbook.close()
}

This is mostly a pyo3 API issue, and I don't think pyo3 has built-in "multiextract" though I'm not ultra familiar with it, so it may.
However, first, since you don't care about the Err case, you could simplify your code by chaining if let statements. They're syntactic sugar, but for conditions with only one or two interesting patterns they're really convenient, e.g.
if let Ok(x) = value.extract::<String>() {
    x.write_value(...)
} else if let Ok(x) = value.extract::<f32>() {
    // handle this case and possibly add a bunch more
} else {
    // handle no case matching (optional if should be ignored)
}
Second, it looks like pyo3 lets you derive FromPyObject for enums. Since WriteValue is apparently an internal trait, it would make sense to derive the corresponding enum:
#[derive(FromPyObject)]
enum Writables {
    #[pyo3(transparent, annotation = "str")]
    String(String),
    #[pyo3(transparent, annotation = "float")]
    Float(f32),
    // put the other cases here
}
Then you can extract to that and match all the variants at once (and handle the "unsupported types" separately).
In fact, at this point the trait is probably unnecessary unless it's used for other stuff; you could just have your write_value method on the enum directly.
Side note: extracting a Python float (which is a double) to an f32 and then immediately widening it to an f64 in order to write it out seems... odd. Why not extract an f64 in the first place?
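To illustrate the "method on the enum" idea without depending on PyO3 or the XLSX crate, here is a minimal pure-Rust sketch. The `Cell` type and the `Vec<Cell>` sheet are hypothetical stand-ins for the real worksheet calls, and `Writable` mirrors the `Writables` enum above minus the derive; treat it as an outline of the shape, not as the actual API.

```rust
// Hypothetical stand-in for a worksheet cell; real code would call
// worksheet.write_string / worksheet.write_number instead.
#[derive(Debug, PartialEq)]
enum Cell {
    Text(String),
    Number(f64),
}

// Mirrors the `Writables` enum above, minus the PyO3 derive.
enum Writable {
    String(String),
    Float(f64),
}

impl Writable {
    // One match covers every supported type, so no trait is needed.
    fn write_value(&self, sheet: &mut Vec<Cell>) {
        match self {
            Writable::String(s) => sheet.push(Cell::Text(s.clone())),
            Writable::Float(x) => sheet.push(Cell::Number(*x)),
        }
    }
}

fn main() {
    let mut sheet = Vec::new();
    for value in vec![Writable::String("total".to_owned()), Writable::Float(1.5)] {
        value.write_value(&mut sheet);
    }
    println!("{:?}", sheet);
}
```

Adding a new supported type then means adding one variant and one match arm, and the compiler flags any match that forgets it.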

PyAny can be downcast to other Python types. I am not proficient with PyO3, but the only approach I see here is to try downcasting to each type you support and otherwise return an error:
fn _write(path: &str, data: HashMap<u32, &PyList>, _highlight: Option<&PyDict>) -> Result<(), XlsxError> {
let workbook = Workbook::new(path);
let mut worksheet = workbook.add_worksheet(None)?;
let format_bold = workbook.add_format().set_bold();
for (row_index, values) in data {
let mut col_idx: u16 = 0;
for value in values {
col_idx += 1;
let row_format= match &row_index {
0 => Some(&format_bold),
_ => None
};
if let Ok(string) = value.downcast::<PyString>() {
// handle pystring object (converted to a Rust String so the WriteValue impl applies)
string.to_string().write_value(&mut worksheet, row_index.clone(), &col_idx -1, row_format)?;
...
} else if let Ok(int) = value.downcast::<PyInt>() {
// handle pyint object
...
} else {
// error, or not supported
}
}
}
workbook.close()
}

Related

Derive macro generation

I'm making my own Serializable trait, in the context of a client / server system.
My idea was that the messages sent by the system are variants of an enum defined by the user of this system, so it can be customized as needed.
To ease implementing the trait on the enum, I would like to use the #[derive(Serializable)] method, as implementing it is always the same thing.
Here is the trait :
pub trait NetworkSerializable {
fn id(&self) -> usize;
fn size(&self) -> usize;
fn serialize(self) -> Vec<u8>;
fn deserialize(id: usize, data: Vec<u8>) -> Self;
}
Now, I've tried to look at the book (this one too) and this example to try to wrap my head around derive macros, but I'm really struggling to understand them and how to implement them. I've read about token streams and abstract trees, and I think I understand the basics.
Let's take the example of the id() method: it should give a unique id for each variant of the enum, so that message headers can tell which message is incoming.
Let's say I have this enum as a message system:
enum NetworkMessages {
ErrorMessage,
SpawnPlayer(usize, bool, Transform), // player id, is_mine, position
MovePlayer(usize, Transform), // player id, new_position
DestroyPlayer(usize) // player_id
}
Then, the id() function should look like this :
fn id(&self) -> usize {
    match self {
        NetworkMessages::ErrorMessage => 0,
        NetworkMessages::SpawnPlayer(..) => 1,
        NetworkMessages::MovePlayer(..) => 2,
        NetworkMessages::DestroyPlayer(..) => 3,
    }
}
Here is my attempt at writing this using a derive macro:
#[proc_macro_derive(NetworkSerializable)]
pub fn network_serializable_derive(input: TokenStream) -> TokenStream {
// Construct a representation of Rust code as a syntax tree
// that we can manipulate
let ast = syn::parse(input).unwrap();
// Build the trait implementation
impl_network_serializable_macro(&ast)
}
fn impl_network_serializable_macro(ast: &syn::DeriveInput) -> TokenStream {
// get enum name
let ref name = ast.ident;
let ref data = ast.data;
let (id_func, size_func, serialize_func, deserialize_func) = match data {
// Only if data is an enum, we do parsing
Data::Enum(data_enum) => {
// Iterate over enum variants
let mut id_func_internal = TokenStream2::new();
let mut variant_id: usize = 0;
for variant in &data_enum.variants {
// add the branch for the variant
id_func_internal.extend(quote_spanned!{
variant.span() => &variant_id,
});
variant_id += 1;
}
(id_func_internal, (), (), ())
}
_ => {(TokenStream2::new(), (), (), ())},
};
let expanded = quote! {
impl NetworkSerializable for #name {
// variant_checker_functions gets replaced by all the functions
// that were constructed above
fn size(&self) -> usize {
match &self {
#id_func
}
}
/*
#size_func
#serialize_func
#deserialize_func
*/
}
};
expanded.into()
}
This generates quite a lot of errors, the first being "proc macro NetworkSerializable not expanded: no proc macro dylib present". So I'm guessing there is a lot of misunderstanding on my part here.

How do I read CSV data without knowing the structure at compile time?

I'm pretty new to Rust and trying to implement some kind of database. Users should create tables by giving a table name, a vector of column names and a vector of column types (realized over an enum). Filling tables should be done by specifying csv files. However, this requires the structure of the table rows to be specified at compile time, like shown in the basic example:
#[derive(Debug, Deserialize, Eq, PartialEq)]
struct Row {
key: u32,
name: String,
comment: String
}
use std::error::Error;
use csv::ReaderBuilder;
use serde::Deserialize;
use std::fs;
fn read_from_file(path: &str) -> Result<(), Box<dyn Error>> {
let data = fs::read_to_string(path).expect("Unable to read file");
let mut rdr = ReaderBuilder::new()
.has_headers(false)
.delimiter(b'|')
.from_reader(data.as_bytes());
let mut iter = rdr.deserialize();
if let Some(result) = iter.next() {
let record:Row = result?;
println!("{:?}", record);
Ok(())
} else {
Err(From::from("expected at least one record but got none"))
}
}
Is there a possibility to use the generic table information instead of the "Row"-struct to cast the results from the deserialization? Is it possible to simply allocate memory according to the combined sizes of the column types and parse the records in? I would do something like this in C...
"Is there a possibility to use the generic table information instead of the "Row"-struct to cast the results from the deserialization?"
All generics are replaced with concrete types at compile time. If you do not know until runtime which types you will need, generics are not the right tool.
"Is it possible to simply allocate memory according to the combined sizes of the column types and parse the records in? I would do something like this in C..."
I suggest using Box<dyn Any> instead, which can hold a value of any ('static) type while still letting you check at runtime which type it is.
The maintenance cost of this approach is pretty high: you have to handle each possible value type everywhere you use a cell's value. On the other hand, you do not need to re-parse the value each time; you only make some type checks at runtime.
I have used std::any::TypeId to identify the type, but it cannot be used in match expressions. You could consider using a custom enum as the type identifier instead.
use std::any::{Any, TypeId};
use std::io::Read;
use csv::Reader;
#[derive(Default)]
struct Table {
name: String,
headers: Vec<(String, TypeId)>,
data: Vec<Vec<Box<dyn Any>>>,
}
impl Table {
fn add_header(&mut self, header: String, _type: TypeId) {
self.headers.push((header, _type));
}
fn populate_data<R: Read>(
&mut self,
rdr: &mut Reader<R>,
) -> Result<(), Box<dyn std::error::Error>> {
for record in rdr.records() {
let record = record?;
let mut row: Vec<Box<dyn Any>> = vec![];
for (&(_, type_id), value) in self.headers.iter().zip(record.iter()) {
if type_id == TypeId::of::<u32>() {
row.push(Box::new(value.parse::<u32>()?));
} else if type_id == TypeId::of::<String>() {
row.push(Box::new(value.to_owned()));
}
}
self.data.push(row);
}
Ok(())
}
}
impl std::fmt::Display for Table {
fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
writeln!(f, "Table: {}", self.name)?;
for (name, _) in self.headers.iter() {
write!(f, "{}, ", name)?;
}
writeln!(f)?;
for row in self.data.iter() {
for cell in row.iter() {
if let Some(&value) = cell.downcast_ref::<u32>() {
write!(f, "{}, ", value)?;
} else if let Some(value) = cell.downcast_ref::<String>() {
write!(f, "{}, ", value)?;
}
}
writeln!(f)?;
}
Ok(())
}
}
fn main() {
let mut table: Table = Default::default();
table.name = "Foo".to_owned();
table.add_header("key".to_owned(), TypeId::of::<u32>());
table.add_header("name".to_owned(), TypeId::of::<String>());
table.add_header("comment".to_owned(), TypeId::of::<String>());
let data = "\
key,name,comment
1,foo,foo comment
2,bar,bar comment
";
let mut rdr = Reader::from_reader(data.as_bytes());
table.populate_data(&mut rdr).unwrap();
print!("{}", table);
}
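The custom-enum idea mentioned above can be sketched as follows. This is an illustration under assumptions rather than a drop-in replacement: `ColumnType`, `Value`, `parse_cell` and `parse_row` are hypothetical names, and a naive `split(',')` stands in for the csv crate so the block is self-contained (it does not handle quoted fields).

```rust
// Column types chosen at runtime; unlike TypeId, an enum can be matched on.
#[derive(Debug, Clone, Copy)]
enum ColumnType {
    U32,
    Text,
}

// A parsed cell; takes the place of Box<dyn Any> in this sketch.
#[derive(Debug, PartialEq)]
enum Value {
    U32(u32),
    Text(String),
}

// Parse one raw field according to its column type.
fn parse_cell(ty: ColumnType, raw: &str) -> Result<Value, std::num::ParseIntError> {
    match ty {
        ColumnType::U32 => raw.parse::<u32>().map(Value::U32),
        ColumnType::Text => Ok(Value::Text(raw.to_owned())),
    }
}

// Parse a whole line; stops at the first field that fails to parse.
fn parse_row(schema: &[ColumnType], line: &str) -> Result<Vec<Value>, std::num::ParseIntError> {
    schema
        .iter()
        .zip(line.split(','))
        .map(|(&ty, raw)| parse_cell(ty, raw))
        .collect()
}

fn main() {
    let schema = [ColumnType::U32, ColumnType::Text, ColumnType::Text];
    let row = parse_row(&schema, "1,foo,foo comment");
    println!("{:?}", row);
}
```

Compared with Box<dyn Any>, every place that consumes a `Value` is an exhaustive match, so adding a column type produces compile errors where handling is missing instead of silently skipped cells.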

How to convert a vector of enums into a vector of inner values of a specific variant of that enum

The following code example is the best that I have come up with so far:
enum Variant {
VariantA(u64),
VariantB(f64),
}
fn main() {
let my_vec = vec![Variant::VariantA(1),
Variant::VariantB(-2.0),
Variant::VariantA(4),
Variant::VariantA(3),
Variant::VariantA(2),
Variant::VariantB(1.0)];
let my_u64_vec = my_vec
.into_iter()
.filter_map(|el| match el {
Variant::VariantA(inner) => Some(inner),
_ => None,
})
.collect::<Vec<u64>>();
println!("my_u64_vec = {:?}", my_u64_vec);
}
I would like to know if there is a less verbose way of obtaining the vector of inner values (i.e., Vec<u64> in the example). It feels like I might be able to use something like try_from or try_into to make this less verbose, but I cannot quite get there.
Enums are not "special" and don't have much if any implicitly associated magic, so by default yes you need a full match -- or at least an if let e.g.
if let Variant::VariantA(inner) = el { Some(inner) } else { None }
However nothing prevents you from implementing whatever utility methods you're thinking of on your enum e.g. get_a which would return an Option<A> (similar to Result::ok and Result::err), or indeed to implement TryFrom on it:
use std::convert::{TryFrom, TryInto};
enum Variant {
VariantA(u64),
VariantB(f64),
}
impl TryFrom<Variant> for u64 {
type Error = ();
fn try_from(value: Variant) -> Result<Self, Self::Error> {
if let Variant::VariantA(v) = value { Ok(v) } else { Err(()) }
}
}
fn main() {
let my_vec = vec![Variant::VariantA(1),
Variant::VariantB(-2.0),
Variant::VariantA(4),
Variant::VariantA(3),
Variant::VariantA(2),
Variant::VariantB(1.0)];
let my_u64_vec = my_vec
.into_iter()
.filter_map(|el| el.try_into().ok())
.collect::<Vec<u64>>();
println!("my_u64_vec = {:?}", my_u64_vec);
}
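The other option mentioned above, a small utility method on the enum, might look like the sketch below. The names `get_a` and `collect_a` are only illustrative; the method plays the same role for `Variant` that `Result::ok` plays for `Result`.

```rust
#[derive(Debug)]
enum Variant {
    VariantA(u64),
    VariantB(f64),
}

impl Variant {
    // Some(inner) for VariantA, None for every other variant.
    fn get_a(self) -> Option<u64> {
        match self {
            Variant::VariantA(inner) => Some(inner),
            _ => None,
        }
    }
}

// Keep only the VariantA payloads, in their original order.
fn collect_a(items: Vec<Variant>) -> Vec<u64> {
    items.into_iter().filter_map(Variant::get_a).collect()
}

fn main() {
    let my_vec = vec![
        Variant::VariantA(1),
        Variant::VariantB(-2.0),
        Variant::VariantA(4),
    ];
    println!("my_u64_vec = {:?}", collect_a(my_vec));
}
```

Because the method is a plain function path, it can be passed to filter_map directly without a closure.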

How to reduce boilerplate nested Result in Rust

I have code using a nested Result like this:
fn ip4(s: &str) -> Result<(u8, u8, u8, u8), num::ParseIntError> {
    let t: Vec<_> = s.split('.').collect();
    match t[0].parse::<u8>() {
        Ok(a1) => {
            match t[1].parse::<u8>() {
                Ok(a2) => {
                    match t[2].parse::<u8>() {
                        Ok(a3) => {
                            match t[3].parse::<u8>() {
                                Ok(a4) => {
                                    Ok((a1, a2, a3, a4))
                                }
                                Err(er) => Err(er)
                            }
                        },
                        Err(er) => Err(er)
                    }
                }
                Err(er) => Err(er)
            }
        }
        Err(er) => Err(er),
    }
}
Is there any function or composing way to reduce this? Something like Haskell or Scala programmers do:
fn ip4(s: &str) -> Result<(u8, u8, u8, u8), num::ParseIntError> {
let t: Vec<_> = s.split('.').collect();
Result
.lift((,,,))
.ap(() -> t[0].parse::<u8>())
.ap(() -> t[1].parse::<u8>())
.ap(() -> t[2].parse::<u8>())
.ap(() -> t[3].parse::<u8>()) // maybe more concise in Haskell or Scala but I think it's enough :)
}
The answer to your direct question is the question mark operator (?), which would allow you to replace your whole match block with
Ok((
    t[0].parse::<u8>()?,
    t[1].parse::<u8>()?,
    t[2].parse::<u8>()?,
    t[3].parse::<u8>()?,
))
where essentially ? will return the error immediately if one is encountered.
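For reference, a complete function built on ? could look like the sketch below. One deliberate difference from the snippet above, which is my own assumption about what is wanted: it pulls octets from the split iterator instead of indexing into a Vec, so an input with too few parts yields a parse error rather than a panic. Extra parts beyond the fourth are ignored.

```rust
use std::num::ParseIntError;

fn ip4(s: &str) -> Result<(u8, u8, u8, u8), ParseIntError> {
    let mut parts = s.split('.');
    // A missing octet becomes "", which fails to parse instead of panicking.
    let mut next = || parts.next().unwrap_or("").parse::<u8>();
    // Each `?` returns early with the first error encountered.
    Ok((next()?, next()?, next()?, next()?))
}

fn main() {
    println!("{:?}", ip4("10.0.5.234"));
    println!("{:?}", ip4("10.0.x.234"));
}
```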
That said, Rust already provides APIs for parsing IP addresses. Even if you wanted to maintain your tuple approach (though why would you), you could implement your function as
fn ip4(s: &str) -> Result<(u8, u8, u8, u8), net::AddrParseError> {
    let addr: net::Ipv4Addr = s.parse()?;
    let octets = addr.octets();
    Ok((octets[0], octets[1], octets[2], octets[3]))
}
or just pass around the Ipv4Addr value directly.
Though I do not see anything wrong with @loganfsmyth's answer, I want to add another solution.
Your problem is a simple and general one that comes up in every programming language, and it becomes easy to solve with some practice in structuring solutions. A divide-and-conquer, recursive technique is commonly used for such problems. To start, imagine a simpler task: parsing a single octet from a string. This is a plain parse, which you already know. Then extend it to the larger problem: parsing all octets, which just repeats the smaller problem we solved earlier. This leads to an iterative/recursive process: do something until a condition holds. With this in mind, I have rewritten your function as a simple iterative process using tail recursion. Rust does not guarantee tail-call elimination, but the recursion here is at most four calls deep, so it cannot overflow the stack:
use std::num;
#[derive(Debug, Copy, Clone)]
struct IpAddressOctets(pub u8, pub u8, pub u8, pub u8);
type Result = std::result::Result<IpAddressOctets, num::ParseIntError>;
fn ipv4(s: &str) -> Result {
let octets_str_array: Vec<_> = s.split('.').collect();
// If it does not contain 4 octets then there is a error.
if octets_str_array.len() != 4 {
return Ok(IpAddressOctets(0, 0, 0, 0)) // or other error type
}
let octets = Vec::new();
fn iter_parse(octets_str_array: Vec<&str>, mut octets: Vec<u8>) -> Result {
if octets.len() == 4 {
return Ok(IpAddressOctets(octets[0], octets[1], octets[2], octets[3]))
}
let index = octets.len();
octets.push(octets_str_array[index].parse::<u8>()?);
iter_parse(octets_str_array, octets)
}
iter_parse(octets_str_array, octets)
}
fn main() {
println!("IP address octets parsed: {:#?}", ipv4("10.0.5.234"));
}
Keep in mind that the Rust language is a bit more functional than you might think.
Also, I would recommend you to read this book which greatly explains the solution.
You can use early returns to prevent the nesting (but not the repetition).
Note the body of the Err arms of the matches:
fn ip4(s: &str) -> Result<(u8, u8, u8, u8), num::ParseIntError> {
    let t: Vec<_> = s.split('.').collect();
    let a1 = match t[0].parse::<u8>() {
        Ok(x) => x,
        Err(er) => return Err(er),
    };
    let a2 = match t[1].parse::<u8>() {
        Ok(x) => x,
        Err(er) => return Err(er),
    };
    let a3 = match t[2].parse::<u8>() {
        Ok(x) => x,
        Err(er) => return Err(er),
    };
    let a4 = match t[3].parse::<u8>() {
        Ok(x) => x,
        Err(er) => return Err(er),
    };
    // The success value must be wrapped in Ok to match the return type.
    Ok((a1, a2, a3, a4))
}
But, as the others have said, Rust already has a built-in way to parse IP addresses.

How can I set a struct field value by string name?

Out of habit from interpreted programming languages, I want to rewrite many values based on their key. I assumed that I would store all the information in the struct prepared for this project. So I started iterating:
struct Container {
x: String,
y: String,
z: String
}
impl Container {
// (...)
fn load_data(&self, data: &HashMap<String, String>) {
let valid_keys = vec_of_strings![ // It's simple vector with Strings
"x", "y", "z"
] ;
for key_name in &valid_keys {
if data.contains_key(key_name) {
self[key_name] = Some(data.get(key_name);
// It's invalid of course but
// I do not know how to write it correctly.
// For example, in PHP I would write it like this:
// $this[$key_name] = $data[$key_name];
}
}
}
// (...)
}
Maybe macros? I tried to use them. key_name is always interpreted as it is, I cannot get value of key_name instead.
How can I do this without repeating the code for each value?
With macros, I always advocate starting from the direct code, then seeing what duplication there is. In this case, we'd start with
fn load_data(&mut self, data: &HashMap<String, String>) {
    if let Some(v) = data.get("x") {
        self.x = v.clone();
    }
    if let Some(v) = data.get("y") {
        self.y = v.clone();
    }
    if let Some(v) = data.get("z") {
        self.z = v.clone();
    }
}
Note the number of differences:
The method must take &mut self.
It's inefficient to check if a value is there and then get it separately.
We need to clone the value because we only have a reference.
We cannot store an Option in a String.
Once you have your code working, you can see how to abstract things. Always start by trying to use "lighter" abstractions (functions, traits, etc.). Only after exhausting those would I start bringing in macros. Let's start by using stringify!:
if let Some(v) = data.get(stringify!(x)) {
    self.x = v.clone();
}
Then you can extract out a macro:
macro_rules! thing {
    ($this: ident, $data: ident, $($name: ident),+) => {
        $(
            if let Some(v) = $data.get(stringify!($name)) {
                $this.$name = v.clone();
            }
        )+
    };
}
impl Container {
    fn load_data(&mut self, data: &HashMap<String, String>) {
        thing!(self, data, x, y, z);
    }
}
fn main() {
let mut c = Container::default();
let d: HashMap<_, _> = vec![("x".into(), "alpha".into())].into_iter().collect();
c.load_data(&d);
println!("{:?}", c);
}
Full disclosure: I don't think this is a good idea.
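In the spirit of trying "lighter" abstractions before macros, one possible alternative is a plain method that maps a key to the matching field. This is only a sketch: `field_mut` is a hypothetical helper name, and the `Debug` and `Default` derives are assumed (the macro example's main needs them as well).

```rust
use std::collections::HashMap;

#[derive(Debug, Default)]
struct Container {
    x: String,
    y: String,
    z: String,
}

impl Container {
    // Hypothetical helper: map a key to the matching field, if any.
    fn field_mut(&mut self, key: &str) -> Option<&mut String> {
        match key {
            "x" => Some(&mut self.x),
            "y" => Some(&mut self.y),
            "z" => Some(&mut self.z),
            _ => None,
        }
    }

    // Copy every recognised key from `data`; unknown keys are ignored.
    fn load_data(&mut self, data: &HashMap<String, String>) {
        for (key, value) in data {
            if let Some(field) = self.field_mut(key) {
                *field = value.clone();
            }
        }
    }
}

fn main() {
    let mut c = Container::default();
    let d: HashMap<String, String> =
        vec![("x".to_owned(), "alpha".to_owned())].into_iter().collect();
    c.load_data(&d);
    println!("{:?}", c);
}
```

The field names are still listed once by hand, but the mapping lives in a single place and needs no macro.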

Resources