I inherited a Rust application and I wish to make a small modification to it. Presently, it retrieves records from Cassandra in the following way using Futures:
#[derive(Debug, Serialize, Deserialize, Clone)]
pub struct Item {
    pub a: String,
    pub b: f64,
}
#[derive(Debug)]
pub enum DataSetError {
    CassandraError(cassandra_cpp::Error),
}
pub type Result<T> = std::result::Result<T, DataSetError>;
pub fn select_cass_items(
    session: &Session,
    a: String,
) -> impl Future<Output = Result<Vec<Item>>> + Unpin {
    let table = envmnt::get_or("TABLE", "ab_table");
    let mut statement = stmt!(&("SELECT a, b FROM ".to_owned() + &table + " WHERE a = ?"));
    statement.bind(0, a).unwrap();
    session.execute(&statement).map(|result| {
        result
            .map(|rows| {
                rows.iter()
                    .map(|row| Item {
                        a: row.get_by_name("a").unwrap(),
                        b: row.get_by_name("b").unwrap(),
                    })
                    .collect()
            })
            .map_err(|e| {
                warn!("[select_cass_items] {:?}", e);
                DataSetError::CassandraError(e)
            })
    })
}
I want to add the option of doing the same thing but from Parquet files. I have written a simple non-Future function (below) that performs the same reading/filtering operation as the Cassandra function. I've verified that it works as intended.
pub fn read_parquet_file(a: String) -> Vec<Item> {
    let reader = SerializedFileReader::try_from("/path/to/file.parquet".to_string()).unwrap();
    let iter = reader.get_row_iter(None).unwrap();
    iter.filter_map(|row| {
        if row.get_string(0).unwrap() == &a {
            Some(Item {
                a: row.get_string(0).unwrap().to_string(),
                b: row.get_double(1).unwrap(),
            })
        } else {
            None
        }
    })
    .collect::<Vec<_>>()
}
The question is: how do I convert the non-Future Parquet function into a drop-in replacement for the Future-returning Cassandra function? I see that the cassandra_cpp crate supports Futures, but the parquet crate does not. Surely there must be a way to do this? However, I'm a Rust newbie, and I can't find any examples close enough to what I want to be able to mogrify my work into what I need. I've tried various things, but they've all been dead ends and aren't worth sharing.
Thank you!
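A minimal sketch of one way to do this, assuming the rest of the program only needs something that implements `Future<Output = Result<Vec<Item>>> + Unpin`: compute the rows synchronously and wrap the result in an already-completed future with `std::future::ready` (the `futures` crate's `futures::future::ready` behaves the same way). `read_items_blocking` and the `ParquetError` variant are hypothetical stand-ins so the example compiles without the `parquet` crate; in the real code they would be `read_parquet_file` and a new `DataSetError` variant. One caveat: the read runs when `select_parquet_items` is called and blocks the calling thread; if the application runs on Tokio, `tokio::task::spawn_blocking` is the usual way to move such work off the async executor.

```rust
use std::future::{ready, Future};

#[derive(Debug, Clone, PartialEq)]
pub struct Item {
    pub a: String,
    pub b: f64,
}

// Placeholder error type; the real DataSetError would gain a Parquet variant.
#[derive(Debug)]
pub enum DataSetError {
    ParquetError(String),
}

pub type Result<T> = std::result::Result<T, DataSetError>;

// Hypothetical stand-in for `read_parquet_file`: a synchronous, blocking lookup.
// The real version would open the file with `SerializedFileReader`.
pub fn read_items_blocking(a: &str) -> Result<Vec<Item>> {
    let rows = [("x", 1.0), ("y", 2.0)];
    Ok(rows
        .iter()
        .filter(|(key, _)| *key == a)
        .map(|(key, b)| Item { a: key.to_string(), b: *b })
        .collect())
}

// Same return type as `select_cass_items`, so callers that `.await` or `.map`
// the returned future should not need to change.
pub fn select_parquet_items(a: String) -> impl Future<Output = Result<Vec<Item>>> + Unpin {
    // `ready` wraps an already-computed value in a future that resolves on the
    // first poll; `std::future::Ready<T>` is always `Unpin`.
    ready(read_items_blocking(&a))
}
```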
I need a way to put different objects that all implement a certain trait, Integrate (with the method integrate()), into one enum. This enum shall implement a method that calls its variant's integrate() method in a certain way, e.g. many times.
I tried to make a very simple example, but it is still not as short as I would like it to be.
Some more explanation: I want to write a solver that integrates certain differential equations, i.e. calculates how a physical system behaves over a certain time span. For each time step the method integrate() is called. But when I execute the program, I want to be able to choose at runtime which physical system is used. My idea was to have an enum that contains the different physical systems, e.g. OscillatorA and OscillatorB (in reality this could be a double pendulum or a vibrating string; it doesn't matter).
pub trait Integrate {
    fn integrate(&mut self);
}
#[derive(Debug)] // needed because `Oscillator` below derives Debug
pub struct OscillatorA {
    z: u32,
}
impl Integrate for OscillatorA {
    fn integrate(&mut self) {
        self.z += 1; // something happens here
    }
}
#[derive(Debug)]
pub struct OscillatorB {
    x: u32,
    y: u32,
}
impl Integrate for OscillatorB {
    fn integrate(&mut self) {
        self.x += 1; // something different happens here
        self.y += 2;
    }
}
#[derive(Debug)]
pub enum Oscillator {
    A(OscillatorA),
    B(OscillatorB),
    // ... many other physical systems come here
}
impl Oscillator {
    pub fn new(num: &u64) -> Self {
        match num {
            0 => Self::A(OscillatorA { z: 1 }),
            1.. => Self::B(OscillatorB { x: 1, y: 2 }),
        }
    }
}
impl Integrate for Oscillator {
    fn integrate(&mut self) {
        // this looks like it should be redundant:
        match self {
            Self::A(osc) => osc.integrate(),
            Self::B(osc) => osc.integrate(),
        }
    }
}
pub fn integrate_n_times(object: &mut impl Integrate, n: u64) {
    for _ in 0..n {
        object.integrate();
    }
}
fn main() {
    let which = 0; // can be set via command-line arguments
    let mut s = Oscillator::new(&which);
    integrate_n_times(&mut s, 10);
    // ..
}
The function integrate_n_times(object, n) calls the integrate() method required by the Integrate trait n times. But it somehow doesn't feel right, because it evaluates a match statement in every iteration. I guess compiler optimizations might avoid this, but it still "feels" wrong, because that is certainly how it reads.
Is there a better design pattern I am missing? Should I require the method "integrate_n_times" through the trait as well? (But then I would rely on it being written correctly in every Oscillator struct).
I somehow need to have one "main struct" that contains all the different physical systems and can call them depending on the arguments I pass to the program.
I would probably use dynamic dispatch here. While it's generally slower than static dispatch, I would imagine it's faster than a massive match statement. Plus, I think it's easier to work with, as long as we don't try to recover the original type with Any and downcasting.
impl Oscillator {
    pub fn new(num: &u64) -> Box<dyn Integrate> {
        match num {
            0 => Box::new(OscillatorA { z: 1 }),
            1.. => Box::new(OscillatorB { x: 1, y: 2 }),
        }
    }
}
pub fn integrate_n_times(object: &mut Box<dyn Integrate>, n: u64) {
    for _ in 0..n {
        object.integrate();
    }
}
fn main() {
    let which = 0; // can be set via command-line arguments
    let mut my_oscillator: Box<dyn Integrate> = Oscillator::new(&which);
    integrate_n_times(&mut my_oscillator, 10);
}
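An alternative sketch that keeps the enum and addresses the per-step match directly: give the trait a default integrate_n_times method, so no oscillator struct has to reimplement it (or get it right), and override it only on the enum so that the variant is matched once and the concrete type's loop then runs without a branch per step. Whether the optimizer already hoists the match out of the original loop is not guaranteed; this version simply makes it explicit in the code.

```rust
pub trait Integrate {
    fn integrate(&mut self);

    // Default method: every implementor gets this loop for free.
    fn integrate_n_times(&mut self, n: u64) {
        for _ in 0..n {
            self.integrate();
        }
    }
}

#[derive(Debug)]
pub struct OscillatorA {
    z: u32,
}
impl Integrate for OscillatorA {
    fn integrate(&mut self) {
        self.z += 1;
    }
}

#[derive(Debug)]
pub struct OscillatorB {
    x: u32,
    y: u32,
}
impl Integrate for OscillatorB {
    fn integrate(&mut self) {
        self.x += 1;
        self.y += 2;
    }
}

#[derive(Debug)]
pub enum Oscillator {
    A(OscillatorA),
    B(OscillatorB),
}

impl Integrate for Oscillator {
    fn integrate(&mut self) {
        match self {
            Self::A(osc) => osc.integrate(),
            Self::B(osc) => osc.integrate(),
        }
    }

    // Override: match once, then run the concrete type's default loop,
    // which is monomorphized for OscillatorA / OscillatorB.
    fn integrate_n_times(&mut self, n: u64) {
        match self {
            Self::A(osc) => osc.integrate_n_times(n),
            Self::B(osc) => osc.integrate_n_times(n),
        }
    }
}
```

Adding a new system still means adding one arm to each of the two matches; crates such as enum_dispatch exist to generate that forwarding boilerplate.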
I am appending a stream of events to a YAML log file, where each event is represented by an individual document, like this:
---
type: event
id: 1
---
type: trigger
id: 2
At some point later I want to iterate over these events, parsing each one via serde_yaml. As far as I can tell, though, serde_yaml doesn't support parsing multiple documents from a single reader: none of the available methods mention it, and trying to parse multiple documents at once results in a MoreThanOneDocument error.
use std::io::{self, BufRead};
use serde_yaml;
use serde::{self, Deserialize};
#[derive(Deserialize, Debug)]
#[serde(tag = "type", rename_all = "snake_case")]
pub enum Message {
    Event { id: i32 },
    Trigger { id: i32 },
}
fn main() -> io::Result<()> {
    let yaml = "---\ntype: event\nid: 1\n---\n\ntype: trigger\nid: 2";
    let v: Message = serde_yaml::from_reader(yaml.as_bytes()).unwrap();
    println!("{:?}", v);
    Ok(())
}
I'm totally new to Rust, so maybe I completely missed the point of serde and just did not understand how to do it.
How would you parse such YAML, please?
I cooked up something that looks like a working solution, but I'll post it among the answers instead, because I don't want to bias other answers too much towards my solution. I kindly encourage you to have a look at it as well, though; any feedback is welcome.
The documentation of serde_yaml::Deserializer shows an example very similar to yours. It would work like this:
use serde::Deserialize;
#[derive(Deserialize, Debug)]
#[serde(tag = "type", rename_all = "snake_case")]
pub enum Message {
    Event { id: i32 },
    Trigger { id: i32 },
}
fn main() {
    let yaml = "---\ntype: event\nid: 1\n---\ntype: trigger\nid: 2\n";
    for document in serde_yaml::Deserializer::from_str(yaml) {
        let v = Message::deserialize(document).unwrap();
        println!("{:?}", v);
    }
}
I really hope to find a native solution by using serde and serde_yaml only, but until then the way I got it working is as follows.
trait BufReaderYamlExt {
    fn read_next_yaml(&mut self) -> io::Result<Option<String>>;
}
impl<T: io::Read> BufReaderYamlExt for io::BufReader<T> {
    fn read_next_yaml(&mut self) -> io::Result<Option<String>> {
        // Document separator; `read_line` needs `std::io::BufRead` in scope.
        const SEP: &str = "\n---\n";
        let mut doc = String::with_capacity(200);
        while self.read_line(&mut doc)? > 0 {
            if doc.len() > SEP.len() && doc.ends_with(SEP) {
                doc.truncate(doc.len() - SEP.len());
                break;
            }
        }
        if !doc.is_empty() {
            doc.shrink_to_fit();
            Ok(Some(doc))
        } else {
            Ok(None)
        }
    }
}
The trait extends the BufReader with an extra method that returns an optional owned String (or None at the end of the stream) containing just the portion with a single YAML document.
By iterating over it, one can then apply serde_yaml::from_str() to parse each document into a Message.
fn main() -> io::Result<()> {
    let yaml = "---\ntype: event\nid: 1\n\n---\n\ntype: trigger\nid: 2\n";
    let mut r = io::BufReader::new(yaml.as_bytes());
    while let Some(next) = r.read_next_yaml()? {
        let d: Message = serde_yaml::from_str(&next).unwrap();
        println!("parsed: {:?}", d);
    }
    Ok(())
}
I've made the full source available on the Rust Playground as well.
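If documents do have to be split by hand (the serde_yaml::Deserializer::from_str iterator from the other answer is preferable when available), another option is to treat any line that is exactly `---` as a separator, rather than matching the `\n---\n` byte sequence. A std-only sketch; `split_yaml_documents` is a hypothetical name, and it assumes that no document body contains a bare `---` line (for example inside a block scalar), that nothing follows `---` on the separator line, and that `...` end markers are not used. Unlike the streaming read_next_yaml, it collects all documents eagerly.

```rust
use std::io::{self, BufRead};

/// Splits a multi-document YAML stream into one String per document,
/// using lines that are exactly `---` (ignoring trailing whitespace) as separators.
pub fn split_yaml_documents<R: BufRead>(reader: R) -> io::Result<Vec<String>> {
    let mut docs = Vec::new();
    let mut current = String::new();
    for line in reader.lines() {
        let line = line?;
        if line.trim_end() == "---" {
            // Separator: keep the collected document unless it is only whitespace.
            if !current.trim().is_empty() {
                docs.push(std::mem::take(&mut current));
            }
            current.clear();
        } else {
            current.push_str(&line);
            current.push('\n');
        }
    }
    // The last document has no trailing separator.
    if !current.trim().is_empty() {
        docs.push(current);
    }
    Ok(docs)
}
```

Each returned string could then be parsed with serde_yaml::from_str::<Message>(&doc).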
I'm pretty new to Rust and am trying to implement some kind of database. Users should create tables by giving a table name, a vector of column names and a vector of column types (realized as an enum). Tables should be filled by specifying CSV files. However, this requires the structure of the table rows to be specified at compile time, as shown in this basic example:
use std::error::Error;
use std::fs;
use csv::ReaderBuilder;
use serde::Deserialize;
#[derive(Debug, Deserialize, Eq, PartialEq)]
struct Row {
    key: u32,
    name: String,
    comment: String,
}
fn read_from_file(path: &str) -> Result<(), Box<dyn Error>> {
    let data = fs::read_to_string(path).expect("Unable to read file");
    let mut rdr = ReaderBuilder::new()
        .has_headers(false)
        .delimiter(b'|')
        .from_reader(data.as_bytes());
    let mut iter = rdr.deserialize();
    if let Some(result) = iter.next() {
        let record: Row = result?;
        println!("{:?}", record);
        Ok(())
    } else {
        Err(From::from("expected at least one record but got none"))
    }
}
Is there a possibility to use the generic table information instead of the "Row"-struct to cast the results from the deserialization? Is it possible to simply allocate memory according to the combined sizes of the column types and parse the records in? I would do something like this in C...
Is there a possibility to use the generic table information instead of the "Row"-struct to cast the results from the deserialization?
All generics are replaced with concrete types at compile time. If you only know at runtime which types you need, generics are not what you need.
Is it possible to simply allocate memory according to the combined sizes of the column types and parse the records in? I would do something like this in C...
I suggest using Box<dyn Any> instead, which can store a value of any type while still letting you check which type it is.
The maintenance cost of this approach is pretty high: you have to handle each possible value type everywhere you use a cell's value. On the other hand, you do not need to parse the value each time, only perform some type checks at runtime.
I have used std::any::TypeId to identify types, but it cannot be used as a pattern in match expressions. Consider using a custom enum as the type identifier instead.
use std::any::{Any, TypeId};
use std::io::Read;
use csv::Reader;
#[derive(Default)]
struct Table {
    name: String,
    headers: Vec<(String, TypeId)>,
    data: Vec<Vec<Box<dyn Any>>>,
}
impl Table {
    fn add_header(&mut self, header: String, _type: TypeId) {
        self.headers.push((header, _type));
    }
    fn populate_data<R: Read>(
        &mut self,
        rdr: &mut Reader<R>,
    ) -> Result<(), Box<dyn std::error::Error>> {
        for record in rdr.records() {
            let record = record?;
            let mut row: Vec<Box<dyn Any>> = vec![];
            for (&(_, type_id), value) in self.headers.iter().zip(record.iter()) {
                if type_id == TypeId::of::<u32>() {
                    row.push(Box::new(value.parse::<u32>()?));
                } else if type_id == TypeId::of::<String>() {
                    row.push(Box::new(value.to_owned()));
                }
            }
            self.data.push(row);
        }
        Ok(())
    }
}
impl std::fmt::Display for Table {
    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
        writeln!(f, "Table: {}", self.name)?;
        for (name, _) in self.headers.iter() {
            write!(f, "{}, ", name)?;
        }
        writeln!(f)?;
        for row in self.data.iter() {
            for cell in row.iter() {
                if let Some(&value) = cell.downcast_ref::<u32>() {
                    write!(f, "{}, ", value)?;
                } else if let Some(value) = cell.downcast_ref::<String>() {
                    write!(f, "{}, ", value)?;
                }
            }
            writeln!(f)?;
        }
        Ok(())
    }
}
fn main() {
    let mut table: Table = Default::default();
    table.name = "Foo".to_owned();
    table.add_header("key".to_owned(), TypeId::of::<u32>());
    table.add_header("name".to_owned(), TypeId::of::<String>());
    table.add_header("comment".to_owned(), TypeId::of::<String>());
    let data = "\
key,name,comment
1,foo,foo comment
2,bar,bar comment
";
    let mut rdr = Reader::from_reader(data.as_bytes());
    table.populate_data(&mut rdr).unwrap();
    print!("{}", table);
}
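A sketch of the custom-enum alternative mentioned above: a ColumnType enum replaces TypeId (so it can be matched on) and a Value enum replaces Box<dyn Any> (so no downcasting is needed). The names ColumnType, Value and parse_row are hypothetical; to stay dependency-free, the sketch splits each line on the delimiter by hand and assumes no quoted or escaped fields. With the csv crate, the fields would come from each StringRecord via record.iter() instead.

```rust
use std::fmt;

// Runtime description of a column's type; unlike TypeId, usable in `match`.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum ColumnType {
    U32,
    Text,
}

// One cell; the variant records which type the value has.
#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    U32(u32),
    Text(String),
}

impl fmt::Display for Value {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Value::U32(n) => write!(f, "{}", n),
            Value::Text(s) => write!(f, "{}", s),
        }
    }
}

// Parses one delimited record according to column types known only at runtime.
pub fn parse_row(line: &str, delimiter: char, columns: &[ColumnType]) -> Result<Vec<Value>, String> {
    let fields: Vec<&str> = line.split(delimiter).collect();
    if fields.len() != columns.len() {
        return Err(format!("expected {} fields, got {}", columns.len(), fields.len()));
    }
    columns
        .iter()
        .zip(fields)
        .map(|(ty, field)| match ty {
            ColumnType::U32 => field
                .parse::<u32>()
                .map(Value::U32)
                .map_err(|e| format!("bad u32 {:?}: {}", field, e)),
            ColumnType::Text => Ok(Value::Text(field.to_string())),
        })
        .collect()
}
```

For the question's Row layout, a table would store [ColumnType::U32, ColumnType::Text, ColumnType::Text] and call parse_row(line, '|', &columns) for each line; code that consumes cells then matches on Value instead of trying each downcast.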
I'm trying to create a method that can return a reference to Data that is either in a constant global array or inside an Option in an item. The lifetimes are certainly different, but it's safe to assume that the lifetime of the data is at least as long as the lifetime of the item. While doing this, I expected the compiler to warn if I did anything wrong, but it's instead generating wrong instructions and the program is crashing with SIGILL.
Concretely speaking, I have the following code failing in Rust 1.27.2:
#[derive(Debug)]
pub enum Type {
    TYPE1,
    TYPE2,
}
#[derive(Debug)]
pub struct Data {
    pub ctype: Type,
    pub int: i32,
}
#[derive(Debug)]
pub struct Entity {
    pub idata: usize,
    pub modifier: Option<Data>,
}
impl Entity {
    pub fn data(&self) -> &Data {
        if self.modifier.is_none() {
            &DATA[self.idata]
        } else {
            self.modifier.as_ref().unwrap()
        }
    }
}
pub const DATA: [Data; 1] = [Data {
    ctype: Type::TYPE2,
    int: 1,
}];
fn main() {
    let mut itemvec = vec![Entity {
        idata: 0,
        modifier: None,
    }];
    eprintln!("vec[0]: {:p} = {:?}", &itemvec[0], itemvec[0]);
    eprintln!("removed item 0");
    let item = itemvec.remove(0);
    eprintln!("item: {:p} = {:?}", &item, item);
    eprintln!("modifier: {:p} = {:?}", &item.modifier, item.modifier);
    eprintln!("DATA: {:p} = {:?}", &DATA[0], DATA[0]);
    let itemdata = item.data();
    eprintln!("itemdata: {:p} = {:?}", itemdata, itemdata);
}
I can't understand what I'm doing wrong. Why isn't the compiler generating a warning? Is it the removal of the (non-Copy) item from the vector? Is it the ambiguous lifetimes?
How to return a reference to a global vector or an internal Option?
By using Option::unwrap_or_else:
impl Entity {
    pub fn data(&self) -> &Data {
        self.modifier.as_ref().unwrap_or_else(|| &DATA[self.idata])
    }
}
but it's instead generating wrong instructions and the program is crashing with SIGILL
The code in your question does not have this behavior on macOS with Rust 1.27.2 or 1.28.0. On Ubuntu I see an issue when running the program in Valgrind, but the problem goes away in Rust 1.28.0.
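A variant of the fix above that also declares the table as a static instead of a const, offered as a hedged suggestion rather than a diagnosis of the SIGILL: a static is a single item with a fixed address for the whole program, so &DATA[i] is a genuine &'static Data, whereas a const is conceptually copied into every place it is used.

```rust
#[derive(Debug)]
pub enum Type {
    TYPE1,
    TYPE2,
}

#[derive(Debug)]
pub struct Data {
    pub ctype: Type,
    pub int: i32,
}

#[derive(Debug)]
pub struct Entity {
    pub idata: usize,
    pub modifier: Option<Data>,
}

// `static` rather than `const`: one instance at a fixed address, so borrowing
// an element yields a `&'static Data` that outlives any `Entity`.
pub static DATA: [Data; 1] = [Data { ctype: Type::TYPE2, int: 1 }];

impl Entity {
    pub fn data(&self) -> &Data {
        // Prefer the per-entity modifier; otherwise fall back to the global table.
        self.modifier.as_ref().unwrap_or_else(|| &DATA[self.idata])
    }
}
```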
See also:
Why should I prefer `Option::ok_or_else` instead of `Option::ok_or`?
What is this unwrap thing: sometimes it's unwrap sometimes it's unwrap_or
Say we want to switch objects' implementations at runtime; we'd do something like this:
pub trait Methods {
    fn func(&self);
}
pub struct Methods_0;
impl Methods for Methods_0 {
    fn func(&self) {
        println!("foo");
    }
}
pub struct Methods_1;
impl Methods for Methods_1 {
    fn func(&self) {
        println!("bar");
    }
}
pub struct Object<'a> {
    methods: &'a dyn Methods, // trait objects need `dyn` in current editions
}
fn main() {
    let methods: [&dyn Methods; 2] = [&Methods_0, &Methods_1];
    let mut obj = Object { methods: methods[0] };
    obj.methods.func();
    obj.methods = methods[1];
    obj.methods.func();
}
Now, what if there are hundreds of such implementations? E.g. imagine implementations of cards for a collectible card game where every card does something completely different and is hard to generalize, or implementations of opcodes for a huge state machine. Sure, you could argue that a different design pattern should be used, but that's not the point of this question...
I wonder if there is any way for these impl structs to somehow "register" themselves, so that they can be looked up later by a factory method. I would be happy to end up with a magical macro or even a plugin to accomplish that.
Say, in D you can use templates to register the implementations -- and if you can't for some reason, you can always inspect modules at compile-time and generate new code via mixins; there are also user-defined attributes that can help in this. In Python, you would normally use a metaclass so that every time a new child class is created, a ref to it is stored in the metaclass's registry which allows you to look up implementations by name or parameter; this can also be done via decorators if implementations are simple functions.
Ideally, in the example above you would be able to create Object as
Object::new(0)
where the value 0 is only known at runtime, and it would magically return an Object { methods: &Methods_0 }. The body of new() would not have the implementations hard-coded, as in "methods: [&Methods; 2] = [&Methods_0, &Methods_1]"; instead, they would somehow be inferred automatically.
So, this is probably extremely buggy, but it works as a proof of concept.
It is possible to use Cargo's code generation support to make the introspection at compile-time, by parsing (not exactly parsing in this case, but you get the idea) the present implementations, and generating the boilerplate necessary to make Object::new() work.
The code is pretty convoluted and has no error handling whatsoever, but works.
Tested on rustc 1.0.0-dev (2c0535421 2015-02-05 15:22:48 +0000)
src/main.rs:
pub mod implementations;
mod generated_glue {
    include!(concat!(env!("OUT_DIR"), "/generated_glue.rs"));
}
use generated_glue::Object;
pub trait Methods {
    fn func(&self);
}
pub struct Methods_2;
impl Methods for Methods_2 {
    fn func(&self) {
        println!("baz");
    }
}
fn main() {
    Object::new(2).func();
}
src/implementations.rs:
use super::Methods;
pub struct Methods_0;
impl Methods for Methods_0 {
    fn func(&self) {
        println!("foo");
    }
}
pub struct Methods_1;
impl Methods for Methods_1 {
    fn func(&self) {
        println!("bar");
    }
}
build.rs:
#![feature(core, unicode, path, io, env)]
use std::env;
use std::old_io::{fs, File, BufferedReader};
use std::collections::HashMap;
fn main() {
    let target_dir = Path::new(env::var_string("OUT_DIR").unwrap());
    let mut target_file = File::create(&target_dir.join("generated_glue.rs")).unwrap();
    let source_code_path = Path::new(file!()).join_many(&["..", "src/"]);
    let source_files = fs::readdir(&source_code_path).unwrap().into_iter()
        .filter(|path| {
            match path.str_components().last() {
                Some(Some(filename)) => filename.split('.').last() == Some("rs"),
                _ => false
            }
        });
    let mut implementations = HashMap::new();
    for source_file_path in source_files {
        let relative_path = source_file_path.path_relative_from(&source_code_path).unwrap();
        let source_file_name = relative_path.as_str().unwrap();
        implementations.insert(source_file_name.to_string(), vec![]);
        let mut file_implementations = &mut implementations[*source_file_name];
        let mut source_file = BufferedReader::new(File::open(&source_file_path).unwrap());
        for line in source_file.lines() {
            let line_str = match line {
                Ok(line_str) => line_str,
                Err(_) => break,
            };
            if line_str.starts_with("impl Methods for Methods_") {
                const PREFIX_LEN: usize = 25;
                let number_len = line_str[PREFIX_LEN..].chars().take_while(|chr| {
                    chr.is_digit(10)
                }).count();
                let number: i32 = line_str[PREFIX_LEN..(PREFIX_LEN + number_len)].parse().unwrap();
                file_implementations.push(number);
            }
        }
    }
    writeln!(&mut target_file, "use super::Methods;").unwrap();
    for (source_file_name, impls) in &implementations {
        let module_name = match source_file_name.split('.').next() {
            Some("main") => "super",
            Some(name) => name,
            None => panic!(),
        };
        for impl_number in impls {
            writeln!(&mut target_file, "use {}::Methods_{};", module_name, impl_number).unwrap();
        }
    }
    let all_impls = implementations.values().flat_map(|impls| impls.iter());
    writeln!(&mut target_file, "
pub struct Object;
impl Object {{
pub fn new(impl_number: i32) -> Box<Methods + 'static> {{
match impl_number {{
").unwrap();
    for impl_number in all_impls {
        writeln!(&mut target_file,
            " {} => Box::new(Methods_{}),", impl_number, impl_number).unwrap();
    }
    writeln!(&mut target_file, "
_ => panic!(\"Unknown impl number: {{}}\", impl_number),
}}
}}
}}").unwrap();
}
The generated code:
use super::Methods;
use super::Methods_2;
use implementations::Methods_0;
use implementations::Methods_1;
pub struct Object;
impl Object {
    pub fn new(impl_number: i32) -> Box<Methods + 'static> {
        match impl_number {
            2 => Box::new(Methods_2),
            0 => Box::new(Methods_0),
            1 => Box::new(Methods_1),
            _ => panic!("Unknown impl number: {}", impl_number),
        }
    }
}
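The build script above relies on pre-1.0 APIs (std::old_io, #![feature(...)]) that no longer exist on stable Rust. Below is a sketch of the same scan-and-generate step in current Rust, written as pure functions over file contents so it can be tested without Cargo. find_impl_numbers and generate_glue are hypothetical names; a real build.rs would read each file with std::fs::read_to_string and write the returned string to a file under the directory named by the OUT_DIR environment variable. Like the original, this is a line-based text scan rather than a real parser, and the generated code uses Box<dyn Methods>, since bare trait objects are rejected in the 2021 edition. (Crates such as inventory now offer this kind of self-registration without a build script.)

```rust
use std::fmt::Write;

const PREFIX: &str = "impl Methods for Methods_";

/// Returns the N of every line starting (after indentation) with
/// `impl Methods for Methods_<N>`; a text scan, not a Rust parser.
pub fn find_impl_numbers(source: &str) -> Vec<u32> {
    source
        .lines()
        .filter_map(|line| line.trim_start().strip_prefix(PREFIX))
        .filter_map(|rest| {
            let digits: String = rest.chars().take_while(|c| c.is_ascii_digit()).collect();
            digits.parse().ok()
        })
        .collect()
}

/// Builds the glue module from (module path, source text) pairs, e.g.
/// ("implementations", contents of src/implementations.rs).
pub fn generate_glue(files: &[(&str, &str)]) -> String {
    let mut out = String::from("use super::Methods;\n");
    let mut arms = String::new();
    for (module, source) in files {
        for n in find_impl_numbers(source) {
            writeln!(out, "use {}::Methods_{};", module, n).unwrap();
            writeln!(arms, "            {} => Box::new(Methods_{}),", n, n).unwrap();
        }
    }
    out.push_str("\npub struct Object;\n\nimpl Object {\n");
    out.push_str("    pub fn new(impl_number: i32) -> Box<dyn Methods> {\n");
    out.push_str("        match impl_number {\n");
    out.push_str(&arms);
    out.push_str("            _ => panic!(\"Unknown impl number: {}\", impl_number),\n");
    out.push_str("        }\n    }\n}\n");
    out
}
```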