How to use wirefilter over an infinite stream of data - rust

I am writing a program to use wirefilter in order to filter data from an infinite stream.
But it seems that I cannot use a compiled ast in a loop because of lifetimes and when I try to compile, this is the output:
error: borrowed data cannot be stored outside of its closure
--> src/main.rs:34:33
|
31 | let filter = ast.compile();
| ------ ...so that variable is valid at time of its declaration
32 |
33 | for my_struct in data.filter(|my_struct| {
| ----------- borrowed data cannot outlive this closure
34 | let execution_context = my_struct.execution_context();
| ^^^^^^^^^ ----------------- cannot infer an appropriate lifetime...
| |
| cannot be stored outside of its closure
error: aborting due to previous error
error: Could not compile `wirefilter_playground`.
To learn more, run the command again with --verbose.
main.rs
use wirefilter::{ExecutionContext, Scheme};
lazy_static::lazy_static! {
static ref SCHEME: Scheme = Scheme! {
port: Int
};
}
#[derive(Debug)]
struct MyStruct {
port: i32,
}
impl MyStruct {
fn scheme() -> &'static Scheme {
&SCHEME
}
fn execution_context(&self) -> ExecutionContext {
let mut ctx = ExecutionContext::new(Self::scheme());
ctx.set_field_value("port", self.port).unwrap();
ctx
}
}
fn main() -> Result<(), failure::Error> {
let data = expensive_data_iterator();
let scheme = MyStruct::scheme();
let ast = scheme.parse("port in {2 5}")?;
let filter = ast.compile();
for my_struct in data.filter(|my_struct| {
let execution_context = my_struct.execution_context();
filter.execute(&execution_context).unwrap()
}).take(10) {
println!("{:?}", my_struct);
}
Ok(())
}
fn expensive_data_iterator() -> impl Iterator<Item=MyStruct> {
(0..).map(|port| MyStruct { port })
}
Cargo.toml
[package]
name = "wirefilter_playground"
version = "0.1.0"
edition = "2018"
[dependencies]
wirefilter-engine = "0.6.1"
failure = "0.1.5"
lazy_static = "1.3.0"
is it possible to make it work? I would like to yield only the filtered data for the final user otherwise the amount of data would be huge in memory.
Thank you in advance!

It looks like the problem is with the lifetime elision in return structs. In particular this code:
fn execution_context(&self) -> ExecutionContext {
//...
}
is equivalent to this one:
fn execution_context<'s>(&'s self) -> ExecutionContext<'s> {
//...
}
Which becomes obvious once you realize that ExecutionContext has an associated lifetime.
The lifetime of ExecutionContext does not have to match that of the MyStruct so you probably want to write:
fn execution_context<'e>(&self) -> ExecutionContext<'e> {
//...
}
or maybe:
fn execution_context<'s, 'e>(&'s self) -> ExecutionContext<'e>
where 'e: 's {
//...
}
depending on whether your context will eventually refer to any content of MyStruct.

Related

Blanket implementation for `der::asn1::SetOfVec` not recognized by compiler?

I want to DER encode a nested data structure using the der crate of the RustCrypto project. I have a Vec of SetOfVecs. There should be a blanket implementation of der::Encode for SetOfVec, but the compiler does not recognize it:
// Add this to Cargo.toml:
// ...
// [dependencies]
// der = { version = "0.6", features = ["alloc", "oid"]}
// main.rs:
use der::{DecodeValue, Encode, Header, Reader, Sequence};
use der::asn1::{ObjectIdentifier, SetOfVec};
struct NestedType(Vec<SetOfVec<ObjectIdentifier>>);
impl<'a> DecodeValue<'a> for NestedType {
fn decode_value<R: Reader<'a>>(reader: &mut R, header: Header) -> der::Result<Self> {
Ok(reader.decode()?)
}
}
impl<'a> Sequence<'a> for NestedType {
fn fields<F, T>(&self, field_encoder: F) -> der::Result<T>
where
F: FnOnce(&[&dyn Encode]) -> der::Result<T>,
{
// This works
// (the following 3 lines are just for comparison with the failing case)
let encoder1 = SetOfVec::<ObjectIdentifier>::new();
let encoder2 = SetOfVec::<ObjectIdentifier>::new();
field_encoder(&[&encoder1, &encoder2])?;
// This doesn't work
let mut refs: Vec<&SetOfVec<ObjectIdentifier>> = Vec::new();
for rdn in self.0.iter() {
refs.push(rdn);
}
field_encoder(refs.as_slice())
}
}
fn main() {}
The compiler error:
error[E0308]: mismatched types
--> src\main.rs:33:23
|
33 | field_encoder(refs.as_slice())
| ^^^^^^^^^^^^^^^ expected trait object `dyn Encode`, found struct `SetOfVec`
|
= note: expected reference `&[&dyn Encode]`
found reference `&[&SetOfVec<der::asn1::ObjectIdentifier>]`
As the Encode trait is implemented for SetOfVec, there should be no error. What's the problem here?
Looks like the compiler needs more info. The following code creates a Vec<&dyn Encode> with the references pointing to the SetOfVecs in the original Vec. This should also solve the memory allocation issue, that #Chayim Friedman mentioned (please correct me, if I'm wrong):
impl<'a> Sequence<'a> for NestedType {
fn fields<F, T>(&self, field_encoder: F) -> der::Result<T>
where
F: FnOnce(&[&dyn Encode]) -> der::Result<T>,
{
let mut refs: Vec<&dyn Encode> = Vec::new();
for rdn in self.0.iter() {
refs.push(rdn);
}
field_encoder(refs.as_slice())
}
}

Borrowing errors whilst mutating in for loop

I am having issues with the borrow checker and temporary values in rust.
I'm hoping to find a solution to this specific problem as well as better learn how to handle this kind of situation in the future. I originally started off with a for_each but ran into issues terminating early.
I considered moving the logic of check_foo into update_foo, however this wouldn't work well for my real world solution; the MRE focuses on the compilation issues vs what I'm trying to achieve in the whole program.
Edit: Is there a way for me to achieve this in a purely functional approach?
I want to iterate over a range of numbers, updating a Vec<Foo> and potentially returning early with a value. Below is a minimal reproducible example of my code with the same errors:
I tried tried implementing as:
fn run<'a>(mut foos: Vec<Foo>) -> Vec<&'a u32> {
let mut bar: Vec<&u32> = vec![];
for num in 0..10 {
for foo in &mut foos {
update_foo(foo);
let checked_foo = check_foo(&foo);
if checked_foo.is_empty() {
bar = checked_foo;
break;
}
}
}
bar
}
/* `fn update_foo` and `fn check_foo` definitions same as below */
but this resulted in:
21 | for foo in &mut foos {
| ^^^^^^^^^ `foos` was mutably borrowed here in the previous iteration of the loop
To overcome this I added the use of Rc and RefCell to allow me to iterate over a reference whilst still being able to mutate:
#[derive(Clone, Debug, PartialEq)]
pub struct Foo {
updated: bool,
}
fn run<'a>(foos: Vec<Rc<RefCell<Foo>>>) -> Vec<&'a u32> {
let mut bar: Vec<&u32> = vec![];
for num in 0..10 {
for foo in &foos {
update_foo(&mut foo.borrow_mut());
let checked_foo = check_foo(&foo.borrow());
if checked_foo.is_empty() {
bar = checked_foo;
break;
}
}
}
bar
}
fn update_foo(foo: &mut Foo) {
foo.updated = true
}
fn check_foo(foo: &Foo) -> Vec<&u32> {
if foo.updated {
vec![&0, &1, &2]
} else {
vec![]
}
}
which results in:
error[E0515]: cannot return value referencing temporary value
--> src/main.rs:33:5
|
26 | let checked_foo = check_foo(&foo.borrow());
| ------------ temporary value created here
...
33 | bar
| ^^^ returns a value referencing data owned by the current function
error[E0515]: cannot return value referencing function parameter `foos`
--> src/main.rs:33:5
|
23 | for foo in &foos {
| ----- `foos` is borrowed here
...
33 | bar
| ^^^ returns a value referencing data owned by the current function
For more information about this error, try `rustc --explain E0515`.
I'm not entirely sure what you plan to do with this, but it seems to me like a few of the references you're using should be owned. Here's what I came up with.
#[derive(Clone, Debug, PartialEq)]
pub struct Foo {
updated: bool,
}
fn run(foos: &mut Vec<Foo>) -> Vec<u32> {
let mut bar: Vec<u32> = vec![];
for num in 0..10 {
for foo in foos.iter_mut() {
update_foo(foo);
let checked_foo = check_foo(&foo);
if checked_foo.is_empty() {
bar = checked_foo;
break;
}
}
}
bar
}
fn update_foo(foo: &mut Foo) {
foo.updated = true
}
fn check_foo(foo: &Foo) -> Vec<u32> {
if foo.updated {
vec![0, 1, 2]
} else {
vec![]
}
}
References should be used when you expect some other struct to own the objects you are referring to, but here you're constructing new vectors with new data, so you should keep the elements owned.

Can't get async closure to work with Warp::Filter

I am trying to get an async closure working in the and_then filter from Warp.
This is the smallest example I could come up with where I am reasonably sure I didn't leave any important details out:
use std::{convert::Infallible, sync::Arc, thread, time};
use tokio::sync::RwLock;
use warp::Filter;
fn main() {
let man = Manifest::new();
let check = warp::path("updates").and_then(|| async move { GetAvailableBinaries(&man).await });
}
async fn GetAvailableBinaries(man: &Manifest) -> Result<impl warp::Reply, Infallible> {
Ok(warp::reply::json(&man.GetAvailableBinaries().await))
}
pub struct Manifest {
binaries: Arc<RwLock<Vec<i32>>>,
}
impl Manifest {
pub fn new() -> Manifest {
let bins = Arc::new(RwLock::new(Vec::new()));
thread::spawn(move || async move {
loop {
thread::sleep(time::Duration::from_millis(10000));
}
});
Manifest { binaries: bins }
}
pub async fn GetAvailableBinaries(&self) -> Vec<i32> {
self.binaries.read().await.to_vec()
}
}
I am using:
[dependencies]
tokio = { version = "0.2", features = ["full"] }
warp = { version = "0.2", features = ["tls"] }
The error is:
error[E0525]: expected a closure that implements the `Fn` trait, but this closure only implements `FnOnce`
--> src/main.rs:9:48
|
9 | let check = warp::path("updates").and_then(|| async move { GetAvailableBinaries(&man).await });
| -------- ^^^^^^^^^^^^^ ------------------------------------ closure is `FnOnce` because it moves the variable `man` out of its environment
| | |
| | this closure implements `FnOnce`, not `Fn`
| the requirement to implement `Fn` derives from here
After making Manifest implement Clone, you can fix the error by balancing when the manifest object is cloned:
fn main() {
let man = Manifest::new();
let check = warp::path("updates").and_then(move || {
let man = man.clone();
async move { get_available_binaries(&man).await }
});
warp::serve(check);
}
This moves man into the closure passed to and_then, then provides a clone of man to the async block each time the closure is executed. The async block then owns that data and can take a reference to it without worrying about executing the future after the data has been deallocated.
I'm not sure this is what you're going for, but this solution builds for me:
use std::{convert::Infallible, sync::Arc, thread, time};
use tokio::sync::RwLock;
use warp::Filter;
fn main() {
let man = Manifest::new();
let check = warp::path("updates").and_then(|| async { GetAvailableBinaries(&man).await });
}
async fn GetAvailableBinaries(man: &Manifest) -> Result<impl warp::Reply, Infallible> {
Ok(warp::reply::json(&man.GetAvailableBinaries().await))
}
#[derive(Clone)]
pub struct Manifest {
binaries: Arc<RwLock<Vec<i32>>>,
}
impl Manifest {
pub fn new() -> Manifest {
let bins = Arc::new(RwLock::new(Vec::new()));
thread::spawn(move || async {
loop {
thread::sleep(time::Duration::from_millis(10000));
//mutate bins here
}
});
Manifest { binaries: bins }
}
pub async fn GetAvailableBinaries(&self) -> Vec<i32> {
self.binaries.read().await.to_vec()
}
}
The move here is the reason the compiler gave a warning regarding the signature: let check = warp::path("updates").and_then(|| async move { GetAvailableBinaries(&man).await });. This means that everything referenced in this closure will be moved into the context of the closure. In this case, the compiler can't guarantee the closure to be Fn but only FnOnce meaning that the closure can only be guaranteed to execute once.

rust E0597: borrowed value does not live lnog enough

I am trying to rewrite an algorithm from javascript to rust. In the following code, I get borrowed value does not live long enough error at line number 17.
[dependencies]
scraper = "0.11.0"
use std::fs;
fn get_html(fname: &str) -> String {
fs::read_to_string(fname).expect("Something went wrong reading the file")
}
pub mod diff_html {
use scraper::{element_ref::ElementRef, Html};
pub struct DiffNode<'a> {
node_ref: ElementRef<'a>,
}
impl<'a> DiffNode<'a> {
fn from_html(html: &str) -> Self {
let doc = Self::get_doc(&html);
let root_element = doc.root_element().to_owned();
let diffn = Self {
node_ref: root_element,
};
diffn
}
fn get_doc(html: &str) -> Html {
Html::parse_document(html).to_owned()
}
}
pub fn diff<'a>(html1: &str, _html2: &str) -> DiffNode<'a> {
let diff1 = DiffNode::from_html(&html1);
diff1
}
}
fn main() {
//read strins
let filename1: &str = "test/test1.html";
let filename2: &str = "test/test2.html";
let html1: &str = &get_html(filename1);
let html2: &str = &get_html(filename2);
let diff1 = diff_html::diff(html1, html2);
//write html
//fs::write("test_outs/testx.html", html1).expect("unable to write file");
//written output file.
}
warning: unused variable: `diff1`
--> src\main.rs:43:9
|
43 | let diff1 = diff_html::diff(html1, html2);
| ^^^^^ help: if this is intentional, prefix it with an underscore: `_diff1`
|
= note: `#[warn(unused_variables)]` on by default
error[E0597]: `doc` does not live long enough
--> src\main.rs:17:32
|
14 | impl<'a> DiffNode<'a> {
| -- lifetime `'a` defined here
...
17 | let root_element = doc.root_element().to_owned();
| ^^^--------------------------
| |
| borrowed value does not live long enough
| assignment requires that `doc` is borrowed for `'a`
...
22 | }
| - `doc` dropped here while still borrowed
I want a detailed explanation/solution if possible.
root_element which is actually an ElementRef has reference to objects inside doc, not the actual owned object. The object doc here is created in from_html function and therefore owned by the function. Because doc is not returned, it is dropped / deleted from memory at the end of from_html function block.
ElementRef needs doc, the thing it is referencing to, to be alive when it is returned from the memory.
pub mod diff_html {
use scraper::{element_ref::ElementRef, Html};
pub struct DiffNode<'a> {
node_ref: ElementRef<'a>,
}
impl<'a> DiffNode<'a> {
fn from_html(html: &'a scraper::html::Html) -> Self {
Self {
node_ref: html.root_element(),
}
}
}
pub fn diff<'a>(html1_string: &str, _html2_string: &str) {
let html1 = Html::parse_document(&html1_string);
let diff1 = DiffNode::from_html(&html1);
// do things here
// at the end of the function, diff1 and html1 is dropped together
// this way the compiler doesn't yell at you
}
}
More or less you need to do something like this with diff function to let the HTML and ElementRef's lifetime to be the same.
This behavior is actually Rust's feature to guard values in memory so that it doesn't leak or reference not referencing the wrong memory address.
Also if you want to feel like operating detachable objects and play with reference (like java, javascript, golang) I suggest reading this https://doc.rust-lang.org/book/ch15-05-interior-mutability.html

Cursor of HashMap records with RwLockGuard in Rust

I am new to Rust, and I am trying to implement a simple, thread-safe memory key-value store, using a HashMap protected within a RwLock. My code looks like this:
use std::sync::{ Arc, RwLock, RwLockReadGuard };
use std::collections::HashMap;
use std::collections::hash_map::Iter;
type SimpleCollection = HashMap<String, String>;
struct Store(Arc<RwLock<SimpleCollection>>);
impl Store {
fn new() -> Store { return Store(Arc::new(RwLock::new(SimpleCollection::new()))) }
fn get(&self, key: &str) -> Option<String> {
let map = self.0.read().unwrap();
return map.get(&key.to_string()).map(|s| s.clone());
}
fn set(&self, key: &str, value: &str) {
let mut map = self.0.write().unwrap();
map.insert(key.to_string(), value.to_string());
}
}
So far, this code works OK. The problem is that I am trying to implement a scan() function, which returns a Cursor object that can be used to iterate over all the records. I want the Cursor object to hold a RwLockGuard, which is not released until the cursor itself is released (basically I don't want to allow modifications while a Cursor is alive).
I tried this:
use ...
type SimpleCollection = HashMap<String, String>;
struct Store(Arc<RwLock<SimpleCollection>>);
impl Store {
...
fn scan(&self) -> Cursor {
let guard = self.0.read().unwrap();
let iter = guard.iter();
return Cursor { guard, iter };
}
}
struct Cursor<'l> {
guard: RwLockReadGuard<'l, SimpleCollection>,
iter: Iter<'l, String, String>
}
impl<'l> Cursor<'l> {
fn next(&mut self) -> Option<(String, String)> {
return self.iter.next().map(|r| (r.0.clone(), r.1.clone()));
}
}
But that did not work, as I got this compilation error:
error[E0597]: `guard` does not live long enough
--> src/main.rs:24:20
|
24 | let iter = guard.iter();
| ^^^^^ borrowed value does not live long enough
25 | return Cursor { guard, iter };
26 | }
| - borrowed value only lives until here
|
note: borrowed value must be valid for the anonymous lifetime #1 defined on the method body at 22:5...
--> src/main.rs:22:5
|
22 | / fn scan(&self) -> Cursor {
23 | | let guard = self.0.read().unwrap();
24 | | let iter = guard.iter();
25 | | return Cursor { guard, iter };
26 | | }
| |_____^
Any ideas?
As mentioned in the comments, the problem is that structs generally can't be self-referential in Rust. The Cursor struct you are trying to construct contains both the MutexGuard and the iterator borrowing the MutexGuard, which is not possible (for good reasons – see the linked question).
The easiest fix in this case is to introduce a separate struct storing the MutexGuard, e.g.
struct StoreLock<'a> {
guard: RwLockReadGuard<'a, SimpleCollection>,
}
On the Store, we can then introduce a method returning a StoreLock
fn lock(&self) -> StoreLock {
StoreLock { guard: self.0.read().unwrap() }
}
and the StoreLock can expose the actual scan() method (and possibly others requiring a persistent lock):
impl<'a> StoreLock<'a> {
fn scan(&self) -> Cursor {
Cursor { iter: self.guard.iter() }
}
}
The Cursor struct itself only contains the iterator:
struct Cursor<'a> {
iter: Iter<'a, String, String>,
}
Client code first needs to obtain the lock, then get the cursor:
let lock = s.lock();
let cursor = lock.scan();
This ensures that the lock lives long enough to finish scanning.
Full code on the playground

Resources