I'm wondering if there is a good way to write a function that can read any type of struct from a file. I was able to write a file with the function below, which accepts any struct that implements Serialize. I'm trying to do something similar for reading, using generics and structs that implement Deserialize. However, I keep running into issues with the generics and lifetimes. Is there a way to read a file into any type of struct?
extern crate bincode;
extern crate serde;
#[macro_use]
extern crate serde_derive;
use serde::{Serialize, Deserialize};
use std::fs::File;
use std::io::{Read, Write};
use std::time::Instant;
fn main() {
let filename = String::from("./prices.ruststruct");
{
let now = Instant::now();
let (open_prices, close_prices) = gen_random_prices();
let test_prices = Prices { open_price: open_prices, close_price: close_prices };
write_struct(&filename, &test_prices);
println!("{}", now.elapsed().as_micros());
}
{
let test_prices = read_struct::<Prices>(&filename);
let now = Instant::now();
let total_prices: f64 = test_prices.open_price.iter().sum();
println!("{}", now.elapsed().as_micros());
}
}
#[derive(Deserialize, Serialize)]
struct Prices {
open_price: Vec<f64>,
close_price: Vec<f64>,
}
fn write_struct(filename: &str, data: &impl Serialize) {
let filename = format!("{}.ruststruct", filename);
let bytes: Vec<u8> = bincode::serialize(&data).unwrap();
let mut file = File::create(filename).unwrap();
file.write_all(&bytes).unwrap();
}
fn read_struct<'a, T: Deserialize<'a>>(filename: &str) -> T {
let filename = format!("{}.ruststruct", filename);
let mut file = File::open(filename).unwrap();
let mut buffer = Vec::<u8>::new();
file.read_to_end(&mut buffer).unwrap();
let decoded: T = bincode::deserialize(&buffer[..]).unwrap();
decoded
}
fn read_struct<'a, T: Deserialize<'a>>(filename: &str) -> T {
Deserialize<'a> is only a suitable bound when you are planning to let the deserialized structures borrow from the input data. But this function cannot allow that, because it discards buffer when it returns.
For a structure like your Prices, T: DeserializeOwned will work. This guarantees that the structure won't borrow from the input data, so it's okay to drop the data.
If you want to allow borrowing then you must put reading the file into a buffer, and deserializing from the buffer, in separate functions so that the caller can keep the buffer alive as long as it wants to use the deserialized structure.
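The difference between the two bounds can be sketched without serde. Everything named below (`Decode<'de>`, `OwnedText`, `BorrowedText`, `read_owned`, `read_bytes`, `parse`) is a hypothetical stand-in and not part of serde or bincode. The only fact borrowed from serde is that `DeserializeOwned` is defined as `for<'de> Deserialize<'de>`, which is the shape of the bound on `read_owned`.

```rust
use std::fs;
use std::io;
use std::path::Path;

// Hypothetical stand-in for serde's `Deserialize<'de>`.
trait Decode<'de>: Sized {
    fn decode(input: &'de [u8]) -> Self;
}

// Owns its data, so it can be decoded from input of any lifetime.
#[derive(Debug, PartialEq)]
struct OwnedText {
    text: String,
}

impl<'de> Decode<'de> for OwnedText {
    fn decode(input: &'de [u8]) -> Self {
        OwnedText { text: String::from_utf8_lossy(input).into_owned() }
    }
}

// Borrows from the input, so it cannot outlive the buffer.
#[derive(Debug, PartialEq)]
struct BorrowedText<'a> {
    text: &'a str,
}

impl<'de> Decode<'de> for BorrowedText<'de> {
    fn decode(input: &'de [u8]) -> Self {
        BorrowedText { text: std::str::from_utf8(input).unwrap_or("") }
    }
}

// Same shape as `T: DeserializeOwned`: the buffer may be dropped on return.
fn read_owned<T>(path: &Path) -> io::Result<T>
where
    T: for<'de> Decode<'de>,
{
    let buffer = fs::read(path)?;
    Ok(T::decode(&buffer))
}

// For borrowing types, split the work: the caller owns the buffer...
fn read_bytes(path: &Path) -> io::Result<Vec<u8>> {
    fs::read(path)
}

// ...and the decoded value borrows from it for `'de`.
fn parse<'de, T: Decode<'de>>(buffer: &'de [u8]) -> T {
    T::decode(buffer)
}

fn main() -> io::Result<()> {
    let path = std::env::temp_dir().join("decode_sketch.bin");
    fs::write(&path, b"alice")?;

    let owned: OwnedText = read_owned(&path)?;
    let buffer = read_bytes(&path)?; // must stay alive while `borrowed` is used
    let borrowed: BorrowedText = parse(&buffer);

    println!("{:?} {:?}", owned, borrowed);
    Ok(())
}
```

Calling `read_owned::<BorrowedText>` should be rejected by the compiler, because `BorrowedText<'a>` only implements `Decode<'a>` for its own lifetime rather than for every lifetime.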
I've generated an Index struct out of flatbuffers IDL schema and the building code was generated by flatc compiler as follows:
#[inline]
pub fn get_root_as_index<'a>(buf: &'a [u8]) -> Index<'a> {
flatbuffers::get_root::<Index<'a>>(buf)
}
This means the created Index lives only as long as 'a, and I have to care about who owns buf. I'm trying to pass ownership of buf along with the Index, and I don't understand how to do this idiomatically in Rust:
pub struct Arena<'a> {
filter_factory: Box<dyn TFilterFactory<'a> + 'a>,
matcher: Box<dyn TMatcher<'a> + 'a>,
}
pub struct Context<'a> {
buffer: Vec<u8>,
arena: Arena<'a>
}
impl<'a> Arena<'a> {
pub fn new(file_path: &'a str) -> Context {
let mut file = File::open(file_path).unwrap();
let mut buffer = Vec::new();
file.read_to_end(&mut buffer).expect("File reading failed");
let index = get_root_as_index(&buffer[..]);
Context{
buffer,
arena: Arena {
filter_factory: Box::new(FilterFactory::new()),
matcher: Box::new(Matcher::new(
Box::new(FlatBuffersIndex::new(index)),
Box::new(FiltersLazyDataRegistry::new())
))
}
}
}
}
How can I pass ownership of Index together with buf so that I don't have to care about buf (assuming the owner of Index also owns buf)?
I've tried making the buffer a member of the Arena struct and borrowing it when creating an Index instance, but then I can't move the new struct afterwards.
PS. Ideally I'd love to have an Index that owns its buffer (so that I only have to care about one owner), but that's not how the code is generated.
I have a global static Vec that represents a state. There seems to be no solution other than global state: I am developing a library that can be used by a threaded program to make network connections, and I don't want the program to manage any of the internal data (such as currently open sockets).
Example of what I have in mind (does not compile):
lazy_static! {
static ref SOCKETS: Vec<Connection> = Vec::new();
}
#[no_mangle]
pub extern fn ffi_connect(address: *const u8, length: usize) {
let address_str = unsafe { from_raw_parts(address, length) };
let conn = internal_connect(address_str);
// Now I need to lock all the Vec to mutate it
SOCKETS.push(conn);
}
#[no_mangle]
pub extern fn ffi_recv(index: usize, msg: *mut c_void, size: usize) -> usize {
let buf = unsafe { from_raw_parts(msg as *const u8, size) };
// Now I need to lock ONLY the specific "index" item to mutate it
let conn = SOCKETS.get_mut(index);
conn.read(buf)
}
#[no_mangle]
pub extern fn ffi_send(index: usize, msg: *mut c_void, size: usize) -> usize {
let buf = unsafe { from_raw_parts(msg as *const u8, size) };
// Now I need to lock ONLY the specific "index" item to mutate it
let conn = SOCKETS.get_mut(index);
conn.write(buf)
}
The question is how should I implement SOCKETS in order to be able to call ffi_recv and ffi_send from two threads?
I'm thinking that I have to have a RwLock outside the Vec, in order to be able to lock during ffi_connect (I don't care about blocking at that point) but get multiple immutable references during ffi_recv and ffi_send. Then, somehow I need to get the interior mutability of the object that the Vec is pointing to.
I DON'T want to be able to ffi_recv and ffi_send at the same time on the same object (this MUST throw an error)
I almost had the answer inside my question...
I just had to use RwLock<Vec<RwLock<Connection>>>. To mutate the Vec itself, the outer lock is write locked. To mutate an item of the Vec, the outer lock is only read locked, which RwLock allows multiple threads to do at once. The inner RwLock can then be either read or write locked.
ffi_connect becomes:
#[no_mangle]
pub extern fn ffi_connect(address: *const u8, length: usize) {
let address_str = unsafe { from_raw_parts(address, length) };
let conn = internal_connect(address_str);
let mut socket_lock = SOCKETS.write().unwrap();
// Nobody can read or write SOCKETS right now
socket_lock.push(RwLock::new(conn));
}
And ffi_recv becomes:
#[no_mangle]
pub extern fn ffi_recv(index: usize, msg: *mut c_void, size: usize) -> usize {
let buf = unsafe { from_raw_parts(msg as *const u8, size) };
// Now I need to lock ONLY the specific "index" item to mutate it
let socket_lock = SOCKETS.read().unwrap();
// SOCKETS can only be "read" locked right now
let mut conn = socket_lock.get(index).unwrap().write().unwrap();
// Nobody else can read or write to this exact object
// SOCKETS remains readable though!
conn.read(buf)
}
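The nested-lock layout can be tried out with the standard library alone. This is a minimal sketch under stated assumptions: `Connection`, `Registry`, `ConnError` and `lock_exclusive` are hypothetical names, and the real connection type is replaced by a byte log. It uses `try_write` on the inner lock so that a second caller on the same connection gets an error instead of blocking, which is one way to meet the "this MUST throw an error" requirement from the question.

```rust
use std::sync::{RwLock, RwLockWriteGuard, TryLockError};

// Hypothetical stand-in for the real connection type.
#[derive(Debug, Default)]
struct Connection {
    sent: Vec<u8>,
}

#[derive(Debug, PartialEq)]
enum ConnError {
    NoSuchIndex,
    Busy,
    Poisoned,
}

// Outer lock guards the Vec itself, inner locks guard each connection.
#[derive(Default)]
struct Registry {
    sockets: RwLock<Vec<RwLock<Connection>>>,
}

// Take exclusive access to one connection, failing instead of blocking.
fn lock_exclusive(slot: &RwLock<Connection>) -> Result<RwLockWriteGuard<'_, Connection>, ConnError> {
    match slot.try_write() {
        Ok(guard) => Ok(guard),
        Err(TryLockError::WouldBlock) => Err(ConnError::Busy),
        Err(TryLockError::Poisoned(_)) => Err(ConnError::Poisoned),
    }
}

impl Registry {
    // Mutating the Vec needs the outer write lock.
    fn connect(&self) -> Result<usize, ConnError> {
        let mut sockets = self.sockets.write().map_err(|_| ConnError::Poisoned)?;
        sockets.push(RwLock::new(Connection::default()));
        Ok(sockets.len() - 1)
    }

    // Mutating one item needs only the outer read lock plus the inner lock.
    fn send(&self, index: usize, msg: &[u8]) -> Result<usize, ConnError> {
        let sockets = self.sockets.read().map_err(|_| ConnError::Poisoned)?;
        let slot = sockets.get(index).ok_or(ConnError::NoSuchIndex)?;
        let mut conn = lock_exclusive(slot)?;
        conn.sent.extend_from_slice(msg);
        Ok(msg.len())
    }
}

fn main() {
    let registry = Registry::default();
    let index = registry.connect().unwrap();
    println!("{:?}", registry.send(index, b"ping"));

    // While one guard is held, a second exclusive attempt is refused.
    let slot = RwLock::new(Connection::default());
    let _held = lock_exclusive(&slot).unwrap();
    println!("{:?}", lock_exclusive(&slot).map(|_| ()));
}
```

An out-of-range index is reported as `ConnError::NoSuchIndex` rather than panicking, which matters in `extern` functions where unwinding across the FFI boundary should be avoided.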
#![feature(rustc_private)]
#![feature(box_syntax)]
extern crate rustc;
extern crate rustc_driver;
use rustc::hir::intravisit as hir_visit;
use rustc::hir;
use rustc_driver::driver::{CompileController, CompileState};
pub struct SomeVisitor<'a, 'tcx: 'a> {
pub map: &'a hir::map::Map<'tcx>,
}
impl<'v, 'tcx: 'v> rustc::hir::intravisit::Visitor<'tcx> for SomeVisitor<'v, 'tcx> {
fn nested_visit_map<'this>(&'this mut self) -> hir_visit::NestedVisitorMap<'this, 'tcx> {
hir_visit::NestedVisitorMap::All(self.map)
}
}
fn hir(s: &mut CompileState) {
let krate = s.hir_crate.unwrap();
let map = s.hir_map.unwrap();
let mut visitor = SomeVisitor { map };
hir_visit::walk_crate(&mut visitor, krate);
}
fn main() {
{
let mut controller = CompileController::basic();
controller.after_hir_lowering.callback = box hir;
}
}
I understand why I am getting the lifetime error and it is very easy to solve it by adding explicit lifetimes for the function hir.
pub fn walk_crate<'v, V: hir_visit::Visitor<'v>>(visitor: &mut V, krate: &'v Crate) {}
Because of this definition, the reference to the crate needs to live for 'tcx.
fn hir<'v, 'tcx>(s: &'tcx mut CompileState<'v, 'tcx>) {
let krate = s.hir_crate.unwrap();
let map = s.hir_map.unwrap();
let mut visitor = SomeVisitor { map };
hir_visit::walk_crate(&mut visitor, krate);
}
But then the function hir becomes incompatible with the callback.
I assume that I may need to use HRTB here?
Update:
My current workaround is to use transmute. Surely there must be a better way?
hir_visit::walk_crate(&mut visitor, visitor.map.krate());
The solution was to realize that map also contains a krate as a reference, but with the correct lifetime. This means that I don't have to introduce explicit lifetimes.
I'm using futures, tokio, hyper, and serde_json to request and deserialize some data that I need to hold until my next request. My initial thought was to make a struct containing the hyper::Chunk and the deserialized data that borrows from the Chunk, but couldn't get the lifetimes right. I tried using the rental crate, but I can't get this to work either. Perhaps I'm using the 'buffer lifetime before declaring the buffer Vec, but maybe I've messed something else up:
#[rental]
pub struct ChunkJson<T: serde::de::Deserialize<'buffer>> {
buffer: Vec<u8>,
json: T
}
Is there some way to make the lifetimes right or should I just use DeserializeOwned and give up on zero-copy?
For more context, the following code works (periodically deserializing JSON from two URLs, retaining the results so we can do something with them both). I'd like to change my X and Y types to use Cow<'a, str> for their fields, changing from DeserializeOwned to Deserialize<'a>. For this to work, I need to store the slice that has been deserialized for each, but I don't know how to do this. I'm looking for examples that use Serde's zero-copy deserialization and retain the result, or some idea for restructuring my code that would work.
#[macro_use]
extern crate serde_derive;
extern crate serde;
extern crate serde_json;
extern crate futures;
extern crate tokio_core;
extern crate tokio_periodic;
extern crate hyper;
use std::collections::HashMap;
use std::error::Error;
use futures::future;
use futures::Future;
use futures::stream::Stream;
use hyper::Client;
fn stream_json<'a, T: serde::de::DeserializeOwned + Send + 'a>
(handle: &tokio_core::reactor::Handle,
url: String,
period: u64)
-> Box<Stream<Item = T, Error = Box<Error>> + 'a> {
let client = Client::new(handle);
let timer = tokio_periodic::PeriodicTimer::new(handle).unwrap();
timer
.reset(::std::time::Duration::new(period, 0))
.unwrap();
Box::new(futures::Stream::zip(timer.from_err::<Box<Error>>(), futures::stream::unfold( (), move |_| {
let uri = url.parse::<hyper::Uri>().unwrap();
let get = client.get(uri).from_err::<Box<Error>>().and_then(|res| {
res.body().concat().from_err::<Box<Error>>().and_then(|chunks| {
let p: Result<T, Box<Error>> = serde_json::from_slice::<T>(chunks.as_ref()).map_err(|e| Box::new(e) as Box<Error>);
match p {
Ok(json) => future::ok((json, ())),
Err(err) => future::err(err)
}
})
});
Some(get)
})).map(|x| { x.1 }))
}
#[derive(Serialize, Deserialize, Debug)]
pub struct X {
foo: String,
}
#[derive(Serialize, Deserialize, Debug)]
pub struct Y {
bar: String,
}
fn main() {
let mut core = tokio_core::reactor::Core::new().unwrap();
let handle = core.handle();
let x_stream = stream_json::<HashMap<String, X>>(&handle, "http://localhost/X".to_string(), 2);
let y_stream = stream_json::<HashMap<String, Y>>(&handle, "http://localhost/Y".to_string(), 5);
let mut xy_stream = x_stream.merge(y_stream);
let mut last_x = HashMap::new();
let mut last_y = HashMap::new();
loop {
match core.run(futures::Stream::into_future(xy_stream)) {
Ok((Some(item), stream)) => {
match item {
futures::stream::MergedItem::First(x) => last_x = x,
futures::stream::MergedItem::Second(y) => last_y = y,
futures::stream::MergedItem::Both(x, y) => {
last_x = x;
last_y = y;
}
}
println!("\nx = {:?}", &last_x);
println!("y = {:?}", &last_y);
// Do more stuff with &last_x and &last_y
xy_stream = stream;
}
Ok((None, stream)) => xy_stream = stream,
Err(_) => {
panic!("error");
}
}
}
}
When trying to solve a complicated programming problem, it's very useful to remove as much as you can. Take your code and remove what you can until the problem goes away. Tweak your code a bit and keep removing until you can't any more. Then, turn the problem around and build from the smallest piece and work back to the error. Doing both of these will show you where the problem lies.
First, let's make sure we deserialize correctly:
extern crate serde;
extern crate serde_json;
#[macro_use]
extern crate serde_derive;
use std::borrow::Cow;
#[derive(Debug, Deserialize)]
pub struct Example<'a> {
#[serde(borrow)]
name: Cow<'a, str>,
key: bool,
}
impl<'a> Example<'a> {
fn info(&self) {
println!("{:?}", self);
match self.name {
Cow::Borrowed(_) => println!("Is borrowed"),
Cow::Owned(_) => println!("Is owned"),
}
}
}
fn main() {
let data: Vec<_> = br#"{"key": true, "name": "alice"}"#.to_vec();
let decoded: Example = serde_json::from_slice(&data).expect("Couldn't deserialize");
decoded.info();
}
At first I forgot to add the #[serde(borrow)] attribute here, so I'm glad I did this test!
Next, we can introduce the rental crate:
#[macro_use]
extern crate rental;
rental! {
mod holding {
use super::*;
#[rental]
pub struct VecHolder {
data: Vec<u8>,
parsed: Example<'data>,
}
}
}
fn main() {
let data: Vec<_> = br#"{"key": true, "name": "alice"}"#.to_vec();
let holder = holding::VecHolder::try_new(data, |data| {
serde_json::from_slice(data)
});
let holder = match holder {
Ok(holder) => holder,
Err(_) => panic!("Unable to construct rental"),
};
holder.rent(|example| example.info());
// Make sure we can move the data and it's still valid
let holder2 = { holder };
holder2.rent(|example| example.info());
}
Next we try to create a rental of Chunk:
#[rental]
pub struct ChunkHolder {
data: Chunk,
parsed: Example<'data>,
}
Unfortunately, this fails:
--> src/main.rs:29:1
|
29 | rental! {
| ^
|
= help: message: Field `data` must have an angle-bracketed type parameter or be `String`.
Oops! Checking the docs for rental, we can add #[target_ty_hack="[u8]"] to the data field. This leads to:
error[E0277]: the trait bound `hyper::Chunk: rental::__rental_prelude::StableDeref` is not satisfied
--> src/main.rs:29:1
|
29 | rental! {
| ^ the trait `rental::__rental_prelude::StableDeref` is not implemented for `hyper::Chunk`
|
= note: required by `rental::__rental_prelude::static_assert_stable_deref`
That's annoying; since we can't implement that trait for Chunk, we just need to box Chunk, proving that it has a stable address:
#[rental]
pub struct ChunkHolder {
data: Box<Chunk>,
parsed: Example<'data>,
}
I also looked to see if there is a way to get a Vec<u8> back out of Chunk, but it doesn't appear to exist. That would have been another solution with less allocation and indirection.
At this point, "all" that's left is to integrate this back into the futures code. It's a lot of work for anyone but you to recreate that, but I don't foresee any obvious problems in doing so.
I need a completely in-memory object that I can give to BufReader and BufWriter. Something like Python's StringIO. I want to write to and read from such an object using methods ordinarily used with Files.
Is there a way to do this using the standard library?
In fact there is a way: Cursor<T>!
(Please also read Shepmaster's answer on why it's often even easier.)
In the documentation you can see that there are the following impls:
impl<T> Seek for Cursor<T> where T: AsRef<[u8]>
impl<T> Read for Cursor<T> where T: AsRef<[u8]>
impl Write for Cursor<Vec<u8>>
impl<T> AsRef<[T]> for Vec<T>
From this you can see that you can use the type Cursor<Vec<u8>> just as an ordinary file, because Read, Write and Seek are implemented for that type!
A small example:
use std::io::{Cursor, Read, Seek, SeekFrom, Write};
fn main() {
// Create fake "file"
let mut c = Cursor::new(Vec::new());
// Write into the "file" and seek to the beginning
c.write_all(&[1, 2, 3, 4, 5]).unwrap();
c.seek(SeekFrom::Start(0)).unwrap();
// Read the "file's" contents into a vector
let mut out = Vec::new();
c.read_to_end(&mut out).unwrap();
println!("{:?}", out);
}
For a more useful example, check the documentation linked above.
You don't need a Cursor most of the time.
object that I can give to BufReader and BufWriter
BufReader requires a value that implements Read:
impl<R: Read> BufReader<R> {
pub fn new(inner: R) -> BufReader<R>
}
BufWriter requires a value that implements Write:
impl<W: Write> BufWriter<W> {
pub fn new(inner: W) -> BufWriter<W> {}
}
If you view the implementors of Read you will find impl<'a> Read for &'a [u8].
If you view the implementors of Write, you will find impl Write for Vec<u8>.
use std::io::{Read, Write};
fn main() {
// Create fake "file"
let mut file = Vec::new();
// Write into the "file"
file.write_all(&[1, 2, 3, 4, 5]).unwrap();
// Read the "file's" contents into a new vector
let mut out = Vec::new();
let mut c = file.as_slice();
c.read_to_end(&mut out).unwrap();
println!("{:?}", out);
}
Writing to a Vec will always append to the end. We also take a slice to the Vec that we can update. Each read of c will advance the slice further and further until it is empty.
The main differences from Cursor:
Cannot seek the data, so you cannot easily re-read data
Cannot write to anywhere but the end
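Both differences can be checked with a few lines. This is a small sketch using only `std::io`; the function names are made up for illustration.

```rust
use std::io::{Cursor, Read, Seek, SeekFrom, Write};

// A Cursor can seek back and overwrite bytes in the middle.
fn overwrite_with_cursor() -> Vec<u8> {
    let mut c = Cursor::new(vec![1u8, 2, 3, 4, 5]);
    c.seek(SeekFrom::Start(1)).unwrap();
    c.write_all(&[9, 9]).unwrap();
    c.into_inner()
}

// Writing to a Vec directly always appends.
fn append_with_vec() -> Vec<u8> {
    let mut v = vec![1u8, 2, 3, 4, 5];
    v.write_all(&[9, 9]).unwrap();
    v
}

// A Cursor can be rewound and read again.
fn read_twice_with_cursor() -> (Vec<u8>, Vec<u8>) {
    let mut c = Cursor::new(vec![1u8, 2, 3]);
    let mut first = Vec::new();
    c.read_to_end(&mut first).unwrap();
    c.seek(SeekFrom::Start(0)).unwrap();
    let mut second = Vec::new();
    c.read_to_end(&mut second).unwrap();
    (first, second)
}

// A slice reader shrinks as it is read, so a second read sees nothing.
fn read_twice_with_slice() -> (Vec<u8>, Vec<u8>) {
    let data = vec![1u8, 2, 3];
    let mut s = data.as_slice();
    let mut first = Vec::new();
    s.read_to_end(&mut first).unwrap();
    let mut second = Vec::new();
    s.read_to_end(&mut second).unwrap();
    (first, second)
}

fn main() {
    println!("{:?}", overwrite_with_cursor());
    println!("{:?}", append_with_vec());
    println!("{:?}", read_twice_with_cursor());
    println!("{:?}", read_twice_with_slice());
}
```

To re-read data through a slice, take a fresh slice of the original Vec with `as_slice()` again.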
If you want to use BufReader with an in-memory String, you can use the as_bytes() method:
use std::io::BufRead;
use std::io::BufReader;
use std::io::Read;
fn read_buff<R: Read>(mut buffer: BufReader<R>) {
let mut data = String::new();
let _ = buffer.read_line(&mut data);
println!("read_buff got {}", data);
}
fn main() {
read_buff(BufReader::new("Potato!".as_bytes()));
}
This prints read_buff got Potato!. There is no need to use a cursor for this case.
To use an in-memory String with BufWriter, you can use the as_mut_vec method. Unfortunately it is unsafe and I have not found any other way. I don't like the Cursor approach since it consumes the vector and I have not found a way yet to use the Cursor together with BufWriter.
use std::io::BufWriter;
use std::io::Write;
pub fn write_something<W: Write>(mut buf: BufWriter<W>) {
// write_all retries partial writes; the Result must not be silently dropped
buf.write_all("potato".as_bytes()).unwrap();
}
#[cfg(test)]
mod tests {
use super::*;
use std::io::{BufWriter};
#[test]
fn testing_bufwriter_and_string() {
let mut s = String::new();
write_something(unsafe { BufWriter::new(s.as_mut_vec()) });
assert_eq!("potato", &s);
}
}
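One possible way to avoid the unsafe call, sketched here as an alternative rather than as the original author's approach: let the BufWriter borrow a `Vec<u8>` (the standard library implements Write for `&mut W` where `W: Write`), then convert the bytes with `String::from_utf8`, which validates them. The helper `write_to_string` is a hypothetical name, and this variant of `write_something` returns an `io::Result` instead of ignoring errors.

```rust
use std::io::{self, BufWriter, Write};

// Writes through the BufWriter and flushes explicitly so errors surface.
fn write_something<W: Write>(mut buf: BufWriter<W>) -> io::Result<()> {
    buf.write_all(b"potato")?;
    buf.flush()
}

// Collects the output in a Vec<u8>, then checks it is valid UTF-8.
fn write_to_string() -> String {
    let mut bytes: Vec<u8> = Vec::new();
    // `&mut Vec<u8>` implements Write, so the Vec is only borrowed here.
    write_something(BufWriter::new(&mut bytes)).expect("writing to a Vec should not fail");
    String::from_utf8(bytes).expect("written bytes should be valid UTF-8")
}

fn main() {
    let s = write_to_string();
    println!("{}", s);
}
```

The cost compared with `as_mut_vec` is one UTF-8 validation pass over the written bytes; no extra copy is made, since `String::from_utf8` takes ownership of the Vec.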