Grouping Rust stream items - rust

Streams are often described as 'async iterators'. I am more used to C++ iterators, which encourage you to keep the intermediate iterator states around for reuse in algorithms. Rust doesn't seem to encourage this, so it can be hard to see how to express more complex iterator patterns neatly. How would you idiomatically express the code below?
In the example, the code attempts to group items that share a timestamp: one stream provides the timestamp of each group, and the other provides the data to match against it.
It is reasonably neat but doesn't work, because take_while consumes the first value that it chooses NOT to take, so that value is then missing from the next group.
Is there a nice way to do this without having to loop and continually 'peek' at the next item in the stream to know when to stop?
use async_stream::stream;
use futures::{Stream, StreamExt, pin_mut};
use futures::future::{ready};
use futures::stream::iter;
#[derive(Debug)]
struct DataItem {
    timestamp: usize,
    source_id: i64,
    value: f32
}
fn group_stream_items<DataStream: Stream<Item=DataItem>, MarkerStream: Stream<Item=usize>>(data: DataStream, markers: MarkerStream) -> impl Stream<Item=Vec<DataItem>> {
    stream! {
        pin_mut!(data);
        pin_mut!(markers);
        while let Some(marker) = markers.next().await {
            let items_at_marked_time = data.as_mut()
                .skip_while(|item| ready(item.timestamp < marker))
                .take_while(|item| ready(item.timestamp == marker))
                .collect::<Vec<_>>().await;
            if items_at_marked_time.len() > 0 {
                yield items_at_marked_time;
            }
        }
    }
}
#[tokio::main]
async fn main() {
    let data = [
        DataItem {timestamp: 100, source_id: 1, value: 1_f32}, // Group: 0
        DataItem {timestamp: 100, source_id: 2, value: 2_f32}, // Group: 0
        DataItem {timestamp: 200, source_id: 1, value: 3_f32}, // Group: 1
        DataItem {timestamp: 200, source_id: 2, value: 4_f32}, // Group: 1
        DataItem {timestamp: 300, source_id: 1, value: 5_f32}, // Group: 2
        DataItem {timestamp: 400, source_id: 2, value: 6_f32}, // Group: 3
    ];
    let markers = [100, 200, 300, 400];
    let groups = group_stream_items(iter(data), iter(markers))
        .collect::<Vec<_>>().await;
    println!("Found groups: {:#?}", groups);
}
Cargo.toml
[package]
name = "rust-stream-example"
version = "0.1.0"
edition = "2021"
# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html
[dependencies]
tokio = { version = "1", features = ["full"] }
async-stream = "0.3"
futures = "0.3"
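For comparison, here is a minimal synchronous sketch of the same grouping using only the standard library. It is an analogue of the stream code, not a drop-in replacement: it assumes both inputs are sorted ascending, and it relies on std's Peekable::next_if, which only consumes an item when the predicate accepts it, so the first non-matching item stays available for the next group. The function name group_items is made up for this sketch. Recent futures 0.3 releases appear to offer a comparable next_if on peekable streams, but check the version you are using.

```rust
use std::iter::Peekable;

#[derive(Debug, Clone, PartialEq)]
struct DataItem {
    timestamp: usize,
    source_id: i64,
    value: f32,
}

/// Groups `data` by the timestamps in `markers`.
/// Assumes both iterators are sorted in ascending order.
fn group_items<D, M>(data: D, markers: M) -> Vec<Vec<DataItem>>
where
    D: Iterator<Item = DataItem>,
    M: Iterator<Item = usize>,
{
    let mut data: Peekable<D> = data.peekable();
    let mut groups = Vec::new();
    for marker in markers {
        // Discard items older than the marker. `next_if` leaves the first
        // item that fails the predicate in the iterator.
        while data.next_if(|item| item.timestamp < marker).is_some() {}
        // Collect the items at the marker; the first later item is not consumed.
        let mut group = Vec::new();
        while let Some(item) = data.next_if(|item| item.timestamp == marker) {
            group.push(item);
        }
        if !group.is_empty() {
            groups.push(group);
        }
    }
    groups
}
```

The peeking still happens, but it is hidden inside next_if, so the loop body never has to juggle a "held back" item by hand.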

Related

Function to move around keys in serde_yaml Value type in Rust

I have a yaml file that looks like this:
columns:
  - name: col1
    type: date
    description: a date field
    semantics:
      type: dimension
  - name: col2
    type: integer
    description: some int field
semantics:
  identifiers:
    - name: device
  dimensions:
I'd like to move any semantics references under the columns to the dimensions key in the bottom block like so:
columns:
  - name: col1
    type: date
    description: a date field
  - name: col2
    type: integer
    description: some int field
semantics:
  identifiers:
    - name: device
  dimensions:
    - col1
This is what I've got so far:
use serde_yaml::Mapping;
use std::collections::HashMap;
pub fn swap_semantics(m: &Mapping) {
    if let Some(col) = m.get("columns") {
        let out: Vec<_> = col
            .as_sequence()
            .unwrap()
            .iter()
            .map(|c| {
                if let Some(val) = c.as_mapping() {
                    if val.contains_key("semantics") {
                        val.get("name").take()
                    } else {
                        None
                    }
                } else {
                    None
                }
            })
            .filter(|c| c.is_some())
            .flatten()
            .map(|f| f.as_str().unwrap())
            .collect();
        //.for_each(|f| println!("{:?}", f));
        let mut dims = HashMap::new();
        dims.insert("dimensions", out);
        println!("{:?}", dims);
    }
}
It works, but it's ugly and seemingly excessively verbose.
Is there a better way to work with Value types? These YAML entries can carry other keys whose values may be lists or other types, which makes it difficult to simply deserialize into a HashMap. The other option is to define several structs with lots of optional fields, which also seems like high friction.
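One way to cut the verbosity is to collapse the map, filter(is_some) and flatten steps into a single filter_map. The sketch below shows the shape of that chain with only the standard library; Column is a hypothetical stand-in for one entry of the columns sequence (roughly the "struct with optional fields" option), not a serde_yaml type. With Value, the closure could presumably use the same as_mapping, contains_key and get calls from the code above.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for one entry of the `columns` sequence.
/// `semantics` is `Some` when the column carries a `semantics` key.
struct Column {
    name: String,
    semantics: Option<BTreeMap<String, String>>,
}

/// Returns the names of all columns that have a `semantics` entry.
fn dimension_names(columns: &[Column]) -> Vec<&str> {
    columns
        .iter()
        // filter_map keeps only the `Some` results, so no separate
        // filter(is_some) + flatten step is needed.
        .filter_map(|c| c.semantics.as_ref().map(|_| c.name.as_str()))
        .collect()
}
```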

Get struct from inside tuple variant [duplicate]

I wish enums in Rust could be used like Haskell's product types. I want to:
access a field's value directly
assign a field's value directly, or make a clone with a changed value.
"Directly" means without lengthy pattern-matching code, just plain access such as let a_size = a.size.
In Haskell:
data TypeAB = A {size::Int, name::String} | B {size::Int, switch::Bool} deriving Show
main = do
  let a = A 1 "abc"
  let b = B 1 True
  print (size a) -- could access a field's value directly
  print (name a) -- could access a field's value directly
  print (switch b) -- could access a field's value directly
  let aa = a{size=2} -- could make a clone directly with the changing value
  print aa
I tried two styles of Rust enum definition like
Style A:
#[derive(Debug)]
enum EntryType {
    A(TypeA),
    B(TypeB),
}
#[derive(Debug)]
struct TypeA {
    size: u32,
    name: String,
}
#[derive(Debug)]
struct TypeB {
    size: u32,
    switch: bool,
}
fn main() {
    let mut ta = TypeA {
        size: 3,
        name: "TAB".to_string(),
    };
    println!("{:?}", &ta);
    ta.size = 2;
    ta.name = "TCD".to_string();
    println!("{:?}", &ta);
    let mut ea = EntryType::A(TypeA {
        size: 1,
        name: "abc".to_string(),
    });
    let mut eb = EntryType::B(TypeB {
        size: 1,
        switch: true,
    });
    let vec_ab = vec![&ea, &eb];
    println!("{:?}", &ea);
    println!("{:?}", &eb);
    println!("{:?}", &vec_ab);
    // Want to do like `ta.size = 2` for ea
    // Want to do like `ta.name = "bcd".to_string()` for ea
    // Want to do like `tb.switch = false` for eb
    // ????
    println!("{:?}", &ea);
    println!("{:?}", &eb);
    println!("{:?}", &vec_ab);
}
Style B:
#[derive(Debug)]
enum TypeCD {
    TypeC { size: u32, name: String },
    TypeD { size: u32, switch: bool },
}
fn main() {
    // NOTE: Rust requires representative struct name before each constructor
    // TODO: Check constructor name can be duplicated
    let mut c = TypeCD::TypeC {
        size: 1,
        name: "abc".to_string(),
    };
    let mut d = TypeCD::TypeD {
        size: 1,
        switch: true,
    };
    let vec_cd = vec![&c, &d];
    println!("{:?}", &c);
    println!("{:?}", &d);
    println!("{:?}", &vec_cd);
    // Can't access a field's value like
    // let c_size = c.size
    let c_size = c.size; // [ERROR]: No field `size` on `TypeCD`
    let c_name = c.name; // [ERROR]: No field `name` on `TypeCD`
    let d_switch = d.switch; // [ERROR]: No field `switch` on `TypeCD`
    // Can't change a field's value like
    // c.size = 2;
    // c.name = "cde".to_string();
    // d.switch = false;
    println!("{:?}", &c);
    println!("{:?}", &d);
    println!("{:?}", &vec_cd);
}
I couldn't access/assign values directly in any style. Do I have to implement functions or a trait just to access a field's value? Is there some way of deriving things to help this situation?
What about style C:
#[derive(Debug)]
enum Color {
    Green { name: String },
    Blue { switch: bool },
}
#[derive(Debug)]
struct Something {
    size: u32,
    color: Color,
}
fn main() {
    let c = Something {
        size: 1,
        color: Color::Green {
            name: "green".to_string(),
        },
    };
    let d = Something {
        size: 2,
        color: Color::Blue { switch: true },
    };
    let vec_cd = vec![&c, &d];
    println!("{:?}", &c);
    println!("{:?}", &d);
    println!("{:?}", &vec_cd);
    let _ = c.size;
}
If all variants have something in common, why separate them?
Of course, I need to access the non-common fields too.
This would imply that Rust should define what to do when the actual variant at runtime doesn't contain the field you requested. So I don't think Rust will ever add this.
You could do it yourself. It will require some lines of code, but that matches the behavior of your Haskell code. However, I don't think this is the best thing to do. Haskell is Haskell; I think you should write Rust rather than try to write Haskell in Rust. That's a general rule: some features of Rust come directly from Haskell, but what you want here is very odd for Rust code, in my opinion.
#[derive(Debug)]
enum Something {
    A { size: u32, name: String },
    B { size: u32, switch: bool },
}
impl Something {
    fn size(&self) -> u32 {
        match self {
            Something::A { size, .. } => *size,
            Something::B { size, .. } => *size,
        }
    }
    fn name(&self) -> &String {
        match self {
            Something::A { name, .. } => name,
            Something::B { .. } => panic!("Something::B doesn't have name field"),
        }
    }
    fn switch(&self) -> bool {
        match self {
            Something::A { .. } => panic!("Something::A doesn't have switch field"),
            Something::B { switch, .. } => *switch,
        }
    }
    fn new_size(&self, size: u32) -> Something {
        match self {
            Something::A { name, .. } => Something::A {
                size,
                name: name.clone(),
            },
            Something::B { switch, .. } => Something::B {
                size,
                switch: *switch,
            },
        }
    }
    // etc...
}
fn main() {
    let a = Something::A {
        size: 1,
        name: "Rust is not haskell".to_string(),
    };
    println!("{:?}", a.size());
    println!("{:?}", a.name());
    let b = Something::B {
        size: 1,
        switch: true,
    };
    println!("{:?}", b.switch());
    let aa = a.new_size(2);
    println!("{:?}", aa);
}
I think there is currently no built-in way of accessing size directly on the enum type. Until then, enum_dispatch or a macro-based solution may help you.
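The accessors above cover reading and cloning; assigning in place can be sketched the same way with methods that return mutable references. This is only one possible shape, not an established pattern from the answers: the method names size_mut and switch_mut are made up here, and the variant-specific accessors return Option rather than panicking.

```rust
#[derive(Debug, Clone, PartialEq)]
enum Something {
    A { size: u32, name: String },
    B { size: u32, switch: bool },
}

impl Something {
    /// Mutable access to the field that every variant shares.
    fn size_mut(&mut self) -> &mut u32 {
        match self {
            Something::A { size, .. } | Something::B { size, .. } => size,
        }
    }

    /// Returns `None` instead of panicking when the variant has no `name`.
    fn name(&self) -> Option<&str> {
        match self {
            Something::A { name, .. } => Some(name.as_str()),
            Something::B { .. } => None,
        }
    }

    /// Mutable access to `switch`, if this variant has one.
    fn switch_mut(&mut self) -> Option<&mut bool> {
        match self {
            Something::B { switch, .. } => Some(switch),
            Something::A { .. } => None,
        }
    }
}
```

With these, *a.size_mut() = 2; plays the role of ta.size = 2, and if let Some(s) = b.switch_mut() { *s = false; } handles a field that only some variants have.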

Efficiently get all items of a vector with a given id

Say I have a vector of items where each item has an id, as in the example below. I can of course get all the items with a given id using something like large_vector.iter().filter(|item| item.id == given_id). For better performance, however, I can do some preprocessing: sort the vector by item id and store the bounds for each id, as shown below. This way I can quickly access a slice of the vector for any given id. I end up doing this a lot, but feel like I am reinventing the wheel and needlessly opening myself up to bugs. Is there a better way to do this directly, preferably using the standard library, or otherwise some other library?
use std::{collections::HashMap, ops::Range};
#[derive(Debug)]
struct Item {
    id: String,
    val: f64,
}
impl Item {
    fn new(id: &str, val: f64) -> Item {
        Item { id: id.into(), val }
    }
}
fn main() {
    let mut large_vector = vec![
        Item::new("C", 2.21),
        Item::new("A", 34.2),
        Item::new("B", 23.54),
        Item::new("C", 34.34),
        Item::new("C", 45.21),
        Item::new("B", 21.34),
    ];
    // first sort by id
    large_vector.sort_by(|item1, item2| item1.id.cmp(&item2.id));
    dbg!(&large_vector);
    // now create a HashMap storing bounds for each id
    let mut lookup = HashMap::new();
    let mut start: usize = 0;
    let mut end: usize = 0;
    if let Some(first_item) = large_vector.get(0) {
        let mut current_id = first_item.id.clone();
        // insert bound if entered new id section or is last item
        for item in &large_vector {
            if current_id != item.id {
                lookup.insert(current_id.clone(), Range { start, end });
                current_id = item.id.clone();
                start = end;
            }
            end += 1;
        }
        lookup.insert(current_id.clone(), Range { start, end });
    }
    // test by getting the items for a given id
    dbg!(&lookup);
    let range = lookup.get("C").unwrap();
    dbg!(range);
    let items = large_vector[range.start..range.end]
        .iter()
        .collect::<Vec<_>>();
    dbg!(items);
}
[src/main.rs:26] &large_vector = [
    Item {
        id: "A",
        val: 34.2,
    },
    Item {
        id: "B",
        val: 23.54,
    },
    Item {
        id: "B",
        val: 21.34,
    },
    Item {
        id: "C",
        val: 2.21,
    },
    Item {
        id: "C",
        val: 34.34,
    },
    Item {
        id: "C",
        val: 45.21,
    },
]
[src/main.rs:47] &lookup = {
    "A": 0..1,
    "B": 1..3,
    "C": 3..6,
}
[src/main.rs:49] range = 3..6
[src/main.rs:53] items = [
    Item {
        id: "C",
        val: 2.21,
    },
    Item {
        id: "C",
        val: 34.34,
    },
    Item {
        id: "C",
        val: 45.21,
    },
]
Assuming that your items have to be in a vector, and you can only sort them, I can think of two possibilities:
The solution you proposed. It should be the fastest one for lookup, but has the drawback that the lookup tables get completely invalidated every time you insert/remove an item.
Keep the vector sorted and perform an O(log n) divide-and-conquer (binary) search to find the range for a given id. If you are interested in what I mean by that, I can provide you with some code.
But in general, I think a vector is simply the wrong data structure. I'd try to change that first.
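The second possibility was never spelled out in the thread, so the following is just one reading of it: keep the vector sorted by id and find both ends of the matching run with the standard library's slice::partition_point, which does a binary search. The function name items_with_id is made up for this sketch, and it assumes the slice is already sorted by id.

```rust
#[derive(Debug, PartialEq)]
struct Item {
    id: String,
    val: f64,
}

/// Returns the sub-slice of `items` whose id equals `id`.
/// Assumes `items` is already sorted by `id`.
fn items_with_id<'a>(items: &'a [Item], id: &str) -> &'a [Item] {
    // Index of the first item whose id is not less than `id`.
    let start = items.partition_point(|item| item.id.as_str() < id);
    // Index of the first item whose id is greater than `id`.
    let end = items.partition_point(|item| item.id.as_str() <= id);
    &items[start..end]
}
```

No lookup table has to be rebuilt after an insert or removal, as long as the vector stays sorted; the trade-off is two binary searches per query instead of one hash lookup.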

How to make serde deserialize BigInt as u64?

I'm using toml and num-bigint with the serde feature to deserialize the following data:
[[trades]]
action = "Buy"
date_time = 2019-04-15T15:36:00+01:00
fee = [1, [44000000]]
id = "#1"
price = [-1, [20154500000]]
quantity = [1, [200000000]]
But I'm getting this error:
Error: Error { inner: ErrorInner { kind: Custom, line: Some(7), col: 14, at: Some(156), message: "invalid value: integer `20154500000`, expected u32", key: ["trades", "price"] } }
Of course, if I make the price value smaller than u32::MAX, the program runs fine. But I want to use this high value, because I'm scaling numbers by 1e8 to avoid dealing with floating-point arithmetic.
Is it possible to make serde deserialize BigInts to u64 instead?
use num_bigint::BigInt;
use serde_derive::Deserialize;
use toml::from_str;
use toml::value::Datetime;
#[derive(Debug, Deserialize)]
pub struct Trade {
    pub action: String,
    pub date_time: Datetime,
    pub fee: BigInt,
    pub id: Option<String>,
    pub price: BigInt,
    pub quantity: BigInt,
}
#[derive(Debug, Deserialize)]
pub struct TradeTable {
    pub trades: Vec<Trade>,
}
fn main() {
    let trades_string: String = String::from("[[trades]]\naction = \"Buy\"\ndate_time = 2019-04-15T15:36:00+01:00\nexchange = \"Degiro\"\nfee = [1, [44000000]]\nid = \"#1\"\nprice = [-1, [20154500000]]\nquantity = [1, [200000000]]");
    let trade_table: TradeTable = from_str(&trades_string).unwrap();
    let trades: Vec<Trade> = trade_table.trades;
}
Also, here's a link to a Rust Playground. Note that you will need to copy the code to your local machine, because you need the serde feature from num-bigint:
Cargo.toml
[dependencies.num-bigint]
version = "0.2.6"
features = ["serde"]
How did you create this file -- did you make it by serializing a BigInt, or did you write it by hand?
I wrote the data by hand.
Your data is invalid; the following works:
use num_bigint::BigInt;
use std::str::FromStr;
#[derive(Debug, serde::Serialize, serde::Deserialize, PartialEq)]
pub struct Trade {
    pub price: BigInt,
}
fn main() {
    let price = BigInt::from_str("-201545000000000").unwrap();
    let trade = Trade { price };
    let j = serde_json::to_string(&trade).unwrap();
    println!("{}", j);
    let a: Trade = serde_json::from_str(&j).unwrap();
    assert_eq!(trade, a);
    let j = toml::to_string(&trade).unwrap();
    println!("{}", j);
    let b: Trade = toml::from_str(&j).unwrap();
    assert_eq!(trade, b);
}
You are not supposed to create it by hand.
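To illustrate why the hand-written value is rejected: the "expected u32" error suggests that the serialized form stores the magnitude as a list of u32 digits, so 20154500000 cannot appear as a single digit. Assuming the digits are base 2^32 and stored least significant first (an assumption about the format, not something confirmed in this thread), the std-only sketch below shows how such a magnitude would be split. split_into_u32_digits is a hypothetical helper written for this illustration, not part of num-bigint.

```rust
/// Splits a magnitude into base-2^32 digits, least significant first.
/// Hypothetical helper for illustration; the digit order is an assumption.
fn split_into_u32_digits(mut n: u64) -> Vec<u32> {
    let mut digits = Vec::new();
    while n > 0 {
        // Keep the low 32 bits as one digit, then shift them away.
        digits.push((n & 0xFFFF_FFFF) as u32);
        n >>= 32;
    }
    digits
}
```

Under that assumption, 44000000 stays a single digit, while 20154500000 splits into the two digits 2974630816 and 4. Letting the library serialize the value, as in the answer above, avoids having to work this out by hand.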

How to correctly iterate all records of a multi-level depth structure in Rust?

I would like to know how to correctly iterate, in Rust, over all the records contained in a data structure arranged like this:
struct Node {
    id: i64,
    nodes: Vec<Node>
}
The records in this structure are nested several levels deep, something like:
{id: 1, nodes: [
    {id: 2, nodes: [
        {id: 3, nodes: []},
        {id: 4, nodes: []},
        {id: 5, nodes: [
            {id: 6, nodes: []},
            {id: 7, nodes: [
                {id: 8, nodes: []},
                {id: 9, nodes: []}
            ]}
        ]}
    ]}
]};
I created a simple recursive function to handle the problem and everything is fine now. I do not know what my mistake was yesterday when I created this topic. The real problem is a little different from what I asked about, but the essence is the same:
use std::vec::Vec;
struct Node {
    id: i64,
    nodes: Vec<Node>,
    focused: bool,
}
struct Controller {
    focused: i32,
}
impl Controller {
    fn get_focused(&mut self) -> i32 {
        let nodes: Node = ....; // code skipped. represented with JSON object above, but with 'focused' member
        // iterate over the root's children; each call searches one subtree
        for node in nodes.nodes.iter() {
            self.focused = self.node_iterator(node);
        }
        self.focused
    }
    // takes the node by reference, since iter() yields &Node
    fn node_iterator(&self, node: &Node) -> i32 {
        let mut focused: i32 = 0;
        if node.nodes.len() > 0 {
            for n in node.nodes.iter() {
                if n.nodes.len() > 0 {
                    focused = self.node_iterator(n);
                    if focused > 0 {
                        return focused;
                    }
                } else {
                    if n.focused == true {
                        focused = n.id as i32;
                        return focused;
                    }
                }
            }
        }
        return 0;
    }
}
fn main() {
    let mut controller = Controller { focused: 0 };
    controller.get_focused();
    println!("{}", controller.focused);
}
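For reference, the same depth-first idea can be sketched more compactly. This is a variation rather than the poster's exact logic: it checks every node, not only leaves, and returns Option<i64> instead of using 0 to mean "not found". The function name find_focused is made up for this sketch.

```rust
struct Node {
    id: i64,
    nodes: Vec<Node>,
    focused: bool,
}

/// Depth-first search for the first focused node; returns its id.
fn find_focused(node: &Node) -> Option<i64> {
    if node.focused {
        return Some(node.id);
    }
    // find_map stops at the first child subtree that yields a result.
    node.nodes.iter().find_map(find_focused)
}
```

Because the recursion borrows each node with &Node, the tree is never moved or cloned while it is being searched.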
