How to conditionally create a serializer and sequence? (Option<SerializeSeq>) - rust

I'm trying to process data from a channel, so the whole structure can't be serialized at once. In fact it won't all fit in memory. The trouble I'm running into is that I can't create an Option<SerializeSeq> because that object depends on Serializer (which doesn't live long enough). I'd like to initialize them both together, or not at all:
use serde::ser::{SerializeSeq, Serializer};
fn process_stream(output: bool) -> serde_json::Result<()> {
let rows = /* whatever iterator */ "serde".chars();
let mut seq = if output {
let out = std::io::stdout();
let mut ser = serde_json::Serializer::new(out);
let seq = ser.serialize_seq(None)?;
Some(Ok(seq))
} else {
None
}.transpose()?;
for row in rows {
//process_data(row);
if let Some(seq) = &mut seq {
seq.serialize_element(&row)?;
}
}
if let Some(seq) = seq {
seq.end()?;
}
Ok(())
}
(Original code snippet from here.)
The problem is: ser does not live long enough. But I don't want to initialize ser in the outer scope because it may not be enabled (and its writer would create or truncate a file that should not be created). I tried returning ser and seq as a tuple. I tried putting them together in a helper struct, but I couldn't figure out all the template parameters and lifetimes.
How can a serde serializer and its sequence be initialized based on a condition and stored in an Option?

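Make sure that ser doesn't get dropped prematurely by declaring it outside the if body. A variable that is declared but not yet assigned constructs nothing, so the serializer (and its writer) is still only created when output is true: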
let mut ser;
let mut seq = if output {
let out = std::io::stdout();
ser = serde_json::Serializer::new(out);
let seq = ser.serialize_seq(None)?;
Some(Ok(seq))
} else {
None
}.transpose()?;
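The same deferred-initialization pattern can be tried without serde. The following is a minimal std-only sketch: Ser and Seq are hypothetical stand-ins for serde_json::Serializer and its sequence handle (they are not serde types), and Seq mutably borrows Ser just as the real sequence borrows the real serializer.

```rust
// Hypothetical stand-in for serde_json::Serializer: owns the output.
struct Ser {
    out: Vec<String>,
}

// Hypothetical stand-in for the SerializeSeq handle: mutably borrows the serializer.
struct Seq<'a> {
    ser: &'a mut Ser,
}

impl Ser {
    fn new() -> Self {
        Ser { out: Vec::new() }
    }

    fn serialize_seq(&mut self) -> Seq<'_> {
        Seq { ser: self }
    }
}

impl Seq<'_> {
    fn serialize_element(&mut self, row: char) {
        self.ser.out.push(row.to_string());
    }

    // Consumes the handle and renders everything written so far.
    fn end(self) -> String {
        format!("[{}]", self.ser.out.join(","))
    }
}

fn process_stream(output: bool) -> Option<String> {
    let rows = "serde".chars();
    // Declared in the outer scope, but only assigned (and constructed) when `output` is true.
    let mut ser;
    let mut seq = if output {
        ser = Ser::new();
        Some(ser.serialize_seq())
    } else {
        None
    };
    for row in rows {
        if let Some(seq) = &mut seq {
            seq.serialize_element(row);
        }
    }
    seq.map(|seq| seq.end())
}
```

Calling process_stream(false) never constructs a Ser at all, which is the property the question asks for.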

Related

How do I conditionally import modules and add instances of struct to vec!, only when the module (and struct) exists?

I can't figure out how to do import- and instancing-lines such that they tolerate non-existing files/modules and structs.
I tried making a macro that expands into such lines based on what files it finds in the directory, using a crate I found that looked promising - include_optional - which allows checking for the existence of files at compile time (since it's a macro).
However, I can't figure out how to use it properly inside a macro, nor did I manage to use it without a macro by following the example at the bottom of the conditional compilation chapter of the docs.
if cfg!(unix) { "unix" } else if cfg!(windows) { "windows" } else { "unknown" } (from the docs)
vs
if include_optional::include_bytes_optional!("day1.rs").is_some() { Some(&day01::Day01 {}) } else { None } // assume day1.rs and thus Day01 are non-existent (my attempt at doing same thing)
My if-statement compiles both branches, including the unreachable code (causing a compilation error), even though according to the docs this supposedly doesn't happen for cfg! ("conditional compilation").
Essentially, what I want is something of this form:
// Macro to generate code based on how many files/structs have been created
// There are anywhere between 1-25 days
get_days_created!();
/* // Let's assume 11 have been created so far, then the macro should evaluate to this code:
* mod day1;
* use day1 as day01;
* // ...
* mod day11;
* use day11 as day11;
*
* // ...
* fn main() -> Result<(), i32> {
* let days : Vec<&dyn Day> = vec![
* &day01::Day01 {},
* // ...
* &day11::Day11 {},
* ];
* // ...
* }
*/
The solution is to create a proc_macro (procedural macro). These are similar to regular macros, except that you write them as an ordinary function: it receives the macro's input as a TokenStream and returns the TokenStream that the invocation should expand to. Because the function body is ordinary Rust code that runs at compile time, it can do things such as inspect the file system.
To create a proc_macro, the first and most important thing to know is that you can't define one just anywhere. You need to create a separate library crate and set proc-macro = true in the [lib] section of its Cargo.toml. Then you can declare the macros in its lib.rs. An example Cargo.toml would look something like this:
[package]
name = "some_proc_macro_lib"
version = "0.1.0"
edition = "2021"
[lib]
proc-macro = true
[dependencies]
glob = "0.3.0"
regex = "1.7.0"
Then you can create your macros in this library as regular functions, with the #[proc_macro] attribute/annotation. Here's an example lib.rs with as few dependencies as possible. For my exact question, the input TokenStream is irrelevant and can be ignored, and instead you want to generate and return a new one:
use proc_macro::TokenStream;
use glob::glob;
use regex::Regex;
#[proc_macro]
pub fn import_days(_: TokenStream) -> TokenStream {
let mut stream = TokenStream::new();
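// Anchor on the "day" prefix: with r".+(\d+)" the greedy ".+" swallows all but the last digit (day11 -> 1).
let re = Regex::new(r"^day(\d+)$").unwrap();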
for entry in glob("./src/day*.rs").expect("Failed to read pattern") {
if let Ok(path) = entry {
let prefix = path.file_stem().unwrap().to_str().unwrap();
let caps = re.captures(prefix);
if let Some(caps) = caps {
let n: u32 = caps.get(1).unwrap().as_str().parse().unwrap();
let day = &format!("{}", prefix);
let day_padded = &format!("day{:0>2}", n);
stream.extend(format!("mod {};", day).parse::<TokenStream>().unwrap());
if n < 10 {
stream.extend(format!("use {} as {};", day, day_padded).parse::<TokenStream>().unwrap());
}
}
}
}
return proc_macro::TokenStream::from(stream);
}
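The name handling inside import_days can be checked in isolation, without any macro machinery. Below is a minimal std-only sketch; day_number and padded_names are hypothetical helper names that do not appear in the macro above, and the sketch assumes file stems of the form day followed by digits.

```rust
/// Hypothetical helper: extracts the day number from a file stem such as "day11".
/// Returns None when the stem does not have the form "day<digits>".
fn day_number(stem: &str) -> Option<u32> {
    stem.strip_prefix("day")?.parse().ok()
}

/// Hypothetical helper: returns the zero-padded module alias and struct name for a day.
fn padded_names(n: u32) -> (String, String) {
    (format!("day{:0>2}", n), format!("Day{:0>2}", n))
}
```

Stripping the known prefix and parsing the remainder keeps every digit of the number, which is the part that is easy to get wrong with a greedy regex.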
The question could be considered answered with this already, but the answer can and should be further expanded on in my opinion. And as such I will do so.
Some additional explanations and suggestions, beyond the scope of the question
There are however quite a few other crates beside proc_macro that can aid you with both parsing the input stream, and building the output one. Of note are the dependencies syn and quote, and to aid them both there's the crate proc_macro2.
The syn crate
With syn you get helpful types, methods and macros for parsing the input TokenStream. Essentially, if a struct Foo implements syn::parse::Parse, then let foo = syn::parse_macro_input!(input as Foo) parses the input into that struct, with syn::parse::ParseStream doing the heavy lifting. An example would be something like this:
use proc_macro2::Ident;
use syn::parse::{Parse, ParseStream};
#[derive(Debug, Default)]
struct Foo {
    idents: Vec<Ident>,
}
impl Parse for Foo {
    fn parse(input: ParseStream) -> syn::Result<Self> {
        let mut foo = Foo::default();
        while !input.is_empty() {
            let fn_ident = input.parse::<Ident>()?;
            foo.idents.push(fn_ident);
            // Optional comma: Ok vs Err doesn't matter. Just consume it if it exists and ignore failures.
            input.parse::<syn::token::Comma>().ok();
        }
        Ok(foo)
    }
}
Note that the syn::Result return-type allows for nice propagation of parsing-errors when using the sugary ? syntax: input.parse::<SomeType>()?
The quote crate
With quote you get a helpful macro for generating a tokenstream more akin to how macro_rules does it. As an argument you write essentially regular code, and tell it to use the value of variables by prefixing with #.
Do note that you can't just pass it variables containing strings and expect them to expand into identifiers: a string expands to its literal value "foo" (quotes included), i.e. mod "day1"; instead of mod day1;. You need to turn the string into either:
a proc_macro2::Ident
syn::Ident::new(foo_str, proc_macro2::Span::call_site())
or a proc_macro2::TokenStream
foo_str.parse::<TokenStream>().unwrap()
The latter also lets you convert longer strings that contain more than a single Ident, and it handles things such as literals, making it possible to skip the quote! macro entirely and just use this token stream directly (as seen in import_days).
Here's an example that creates a struct with dynamic name, and implements a specific trait for it:
use proc_macro2::TokenStream;
use quote::quote;
// ...
let mut stream = TokenStream::new();
stream.extend(quote!{
#[derive(Debug)]
pub struct #day_padded_upper {}
impl Day for #day_padded_upper {
#trait_parts
}
});
return proc_macro::TokenStream::from(stream);
Finally, on how to implement my question
This 'chapter' is a bit redundant, as I essentially answered the question with the first two code snippets (the .toml and fn import_days), and the rest could have been considered an exercise for the reader. However, while the question is really about reading the filesystem at compile time in a macro to 'dynamically' change its expansion (sort of), I wrote it in a more general form, asking how to achieve a specific result (as old me didn't know macros could do that). So for completeness I'll include this 'chapter' nevertheless.
There is also the fact that the last macro in this 'chapter' - impl_day (which wasn't mentioned at all in the question) - serves as a good example of how to achieve two adjacent but important and relevant tasks:
Retrieving and using the call site's filename.
Parsing the input TokenStream using the syn dependency as shown above.
In other words: knowing all the above, this is how you can create macros for importing all targeted files, for instantiating structs for all targeted files, and for declaring and defining the struct based on the current file's name.
Importing all targeted files:
See import_days above at the start.
Instantiating Vec with structs from all targeted files:
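// Assumes `use proc_macro2::TokenStream;` and `use quote::quote;` in addition to glob and Regex,
// so the bare TokenStream below is the proc_macro2 one.
#[proc_macro]
pub fn instantiate_days(_: proc_macro::TokenStream) -> proc_macro::TokenStream {
// Anchor on the "day" prefix so that all digits are captured (day11 -> 11, not 1).
let re = Regex::new(r"^day(\d+)$").unwrap();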
let mut stream = TokenStream::new();
let mut block = TokenStream::new();
for entry in glob("./src/day*.rs").expect("Failed to read pattern") {
match entry {
Ok(path) => {
let prefix = path.file_stem().unwrap().to_str().unwrap();
let caps = re.captures(prefix);
if let Some(caps) = caps {
let n: u32 = caps.get(1).unwrap().as_str().parse().unwrap();
let day_padded = &format!("day{:0>2}", n);
let day_padded_upper = &format!("Day{:0>2}", n);
let instance = &format!("&{}::{} {{}}", day_padded, day_padded_upper).parse::<TokenStream>().unwrap();
block.extend(quote!{
v.push( #instance );
});
}
},
Err(e) => println!("{:?}", e),
}
}
stream.extend(quote!{
{
let mut v: Vec<&dyn Day> = Vec::new();
#block
v
}
});
return proc_macro::TokenStream::from(stream);
}
Declaring and defining struct for current file invoking this macro:
#[derive(Debug, Default)]
struct DayParser {
parts: Vec<Ident>,
}
impl Parse for DayParser {
fn parse(input: ParseStream) -> syn::Result<Self> {
let mut day_parser = DayParser::default();
while !input.is_empty() {
let fn_ident = input.parse::<Ident>()?;
// Optional, Ok vs Err doesn't matter. Just consume if it exists.
input.parse::<syn::token::Comma>().ok();
day_parser.parts.push(fn_ident);
}
return Ok(day_parser);
}
}
#[proc_macro]
pub fn impl_day(input: proc_macro::TokenStream) -> proc_macro::TokenStream {
let mut stream = TokenStream::new();
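// Note: Span::source_file() is an unstable proc_macro API (nightly only at the time of writing).
let span = Span::call_site();
let binding = span.source_file().path();
let file = binding.to_str().unwrap();
let re = Regex::new(r".*day(\d+)\.rs$").unwrap();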
let caps = re.captures(file);
if let Some(caps) = caps {
let n: u32 = caps.get(1).unwrap().as_str().parse().unwrap();
let day_padded_upper = format!("Day{:0>2}", n).parse::<TokenStream>().unwrap();
let day_parser = syn::parse_macro_input!(input as DayParser);
let mut trait_parts = TokenStream::new();
for (k, fn_ident) in day_parser.parts.into_iter().enumerate() {
let k = k+1;
let trait_part_ident = format!("part_{}", k).parse::<TokenStream>().unwrap();
// let trait_part_ident = proc_macro::Ident::new(format!("part_{}", k).as_str(), span);
trait_parts.extend(quote!{
fn #trait_part_ident(&self, input: &str) -> Result<String, ()> {
return Ok(format!("Part {}: {:?}", #k, #fn_ident(input)));
}
});
}
stream.extend(quote!{
#[derive(Debug)]
pub struct #day_padded_upper {}
impl Day for #day_padded_upper {
#trait_parts
}
});
} else {
// Malformed file name: emit a compile error at the call site instead of silently generating nothing.
let msg = format!("Tried to implement Day for a file with malformed name: file = \"{}\" , re = \"{:?}\"", file, re);
return syn::Error::new(proc_macro2::Span::call_site(), msg).to_compile_error().into();
}
return proc_macro::TokenStream::from(stream);
}

Borrowed value does not live long enough when comparing values

I am making a simple Rust program that requests an API and detects updates, but I am getting a really strange error when comparing the new data with the old data to detect whether something has changed.
use std::io::prelude::*;
use std::net::TcpStream;
fn main() {
let mut old = "";
while true {
let mut stream = TcpStream::connect("ip:port").unwrap();
let _ = stream.write(b"GET /stats.json HTTP/1.0\r\nHost: example.com\r\n\r\n");
let mut res: String = "".to_string();
let mut buf = [0; 512];
let data: Vec<&str>;
while stream.read(&mut buf).unwrap() > 0 {
res.push_str(&String::from_utf8_lossy(&buf[..]));
for elem in buf.iter_mut() { *elem = 0;}
}
data = res.split("\r\n\r\n").collect();
if data[1] != old {
old = data[1];
println!("new");
}
}
}
The problem with your code is that old is declared outside the while true loop, so it outlives everything created inside the loop body.
In this line:
data = res.split("\r\n\r\n").collect();
you are setting data that references (or borrows) res. Then here:
old = data[1];
you effectively store in old a slice that borrows from res, but res gets dropped at the end of each iteration. It wouldn't make sense to reference a dropped value, hence the error "Borrowed value does not live long enough".
One way to avoid this error is to make old a String, for example by assigning data[1].to_string(). That way, old will own its data instead of depending on the data in res.
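To make the suggestion concrete, here is a minimal std-only sketch of the owned-String approach. The network request is replaced by a slice of canned responses, and count_updates is a hypothetical name chosen for this illustration; it is not part of the question's code.

```rust
/// Counts how often the response body changes between consecutive responses.
/// `responses` stands in for what the TcpStream would deliver on each iteration.
fn count_updates(responses: &[&str]) -> usize {
    // `old` owns its data, so it may outlive the per-iteration buffer below.
    let mut old = String::new();
    let mut updates = 0;
    for raw in responses {
        // Per-iteration buffer, dropped at the end of each iteration like `res` in the question.
        let res: String = raw.to_string();
        // The body is everything after the first blank line; empty if there is none.
        let body = res.split("\r\n\r\n").nth(1).unwrap_or("");
        if body != old {
            // Copy the slice into an owned String instead of keeping a borrow of `res`.
            old = body.to_string();
            updates += 1;
        }
    }
    updates
}
```

The comparison borrows from res only for the duration of the iteration; what survives into the next iteration is the copy stored in old.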

Process value in Hashmap based on another value in same Hashmap

I have a HashMap acting as a lookup table in my code, mapping IDs <-> Data.
I need to lookup some data (let's call it Data A) based on my ID, then read the contents. Based on entries in the content, I would then need to lookup another value in the same lookup table, read those data, and do some calculations, updating my original data A.
Here is a minimal working example:
use std::collections::HashMap;
struct MyData {
id: i32,
result: i32,
complex_data: Vec<i32>
}
impl MyData {
fn new(id: i32) -> Self {
MyData {
id,
result: 0,
complex_data: Vec::new()
}
}
}
fn main() {
let mut lookup_table = HashMap::new();
// init data
lookup_table.insert(1, MyData::new(1));
lookup_table.insert(2, MyData::new(2));
lookup_table.insert(3, MyData::new(3));
lookup_table.insert(4, MyData::new(4));
// process data based on an ID. In this example, hard coded as "1"
if let Some(data) = lookup_table.get_mut(&1) {
// process each entry
for c in data.complex_data.iter() {
// lookup some more values based the entry
if let Some(lookup_data) = lookup_table.get(c) {
//^^^^^^^^^^^^^^^^^^^ - cannot borrow `lookup_table` as immutable
// do some calculation and store result
data.result = lookup_data.result + 42; // random calculation as an example
}
}
}
println!("Hello, world!");
}
The error occurs because it seems I'm borrowing lookup_table twice. From what I understand, the compiler is worried that my second lookup also looks up the ID = 1, which would mean I have a mutable reference to DataID = 1 and an immutable reference to DataID = 1 at the same time.
I am fine with this, however, since my second read is immutable, and also this whole thing is single-threaded, so I'm not worried about any race conditions.
How can I restructure my code to make the Rust compiler happy whilst achieving my functionality?
I think you can work around the issue by doing all the reads with an immutable borrow at the first part of the if statement, saving the calculation results into a temporary vector, and doing all the writes with a mutable borrow at the second part. See the code below.
// process data based on an ID. In this example, hard coded as "1"
if let Some(data) = lookup_table.get(&1) {
let mut results = Vec::new();
// process each entry
for c in data.complex_data.iter() {
// lookup some more values based the entry
if let Some(lookup_data) = lookup_table.get(c) {
// do some calculation and store result
results.push(lookup_data.result);
}
}
let data = lookup_table.get_mut(&1).unwrap();
for v in results {
data.result = v + 42;
}
}
The latter assignment to data shadows the previous one and ends the lifetime of the immutable borrow.
You can use the interior mutability pattern on the result field. This lets you take an immutable borrow &MyData in the outer loop and still mutate its result field in the inner loop. The borrow checker doesn't complain because the borrow rules for the RefCell are checked at runtime instead of at compile time.
And at runtime, you never hold more than one mutable reference at the same time.
use std::{cell::RefCell, collections::HashMap};
struct MyData {
id: i32,
result: RefCell<i32>,
complex_data: Vec<i32>,
}
impl MyData {
fn new(id: i32) -> Self {
MyData {
id,
result: RefCell::new(0),
complex_data: vec![1, 2, 3, 4],
}
}
fn set_result(&self, result: i32) {
*self.result.borrow_mut() = result;
}
fn get_result(&self) -> i32 {
// Read the value without resetting it (RefCell::take would leave 0 behind).
*self.result.borrow()
}
}
fn main() {
let mut lookup_table = HashMap::new();
// init data
lookup_table.insert(1, MyData::new(1));
lookup_table.insert(2, MyData::new(2));
lookup_table.insert(3, MyData::new(3));
lookup_table.insert(4, MyData::new(4));
// process data based on an ID. In this example, hard coded as "1"
if let Some(data) = lookup_table.get(&1) {
// process each entry
for c in data.complex_data.iter() {
// lookup some more values based the entry
if let Some(lookup_data) = lookup_table.get(c) {
// do some calculation and store result (random calculation as an example)
data.set_result(lookup_data.get_result() + 42);
}
}
}
}
If you don't want to pay the runtime cost, you can use the interior mutability pattern with Cell instead of RefCell.
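A minimal sketch of the Cell variant follows. It assumes a trimmed-down MyData (the id field is omitted) and a hypothetical process function that wraps the lookup loop from the question; Cell works here because i32 is Copy, so values are copied in and out and no runtime borrow tracking is needed.

```rust
use std::cell::Cell;
use std::collections::HashMap;

struct MyData {
    // Cell allows mutation through a shared reference for Copy types.
    result: Cell<i32>,
    complex_data: Vec<i32>,
}

/// Hypothetical wrapper around the lookup loop from the question.
fn process(table: &HashMap<i32, MyData>, id: i32) {
    if let Some(data) = table.get(&id) {
        for c in &data.complex_data {
            if let Some(lookup_data) = table.get(c) {
                // Both lookups are shared borrows; only the Cell contents change.
                data.result.set(lookup_data.result.get() + 42);
            }
        }
    }
}
```

Note that the table itself only needs to be borrowed immutably, even though a result inside it is updated.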
Interior mutability is one option, as presented in a previous answer. Collecting the values is another option, as presented in another previous answer. Depending on the nature of your calculation, you might not need any allocation at all: just accumulate the intermediate result in a local variable declared outside the borrow, and assign it at the end.
For example, this compiles:
fn main() {
let mut lookup_table = HashMap::from([
(1, MyData::new(1)),
(2, MyData::new(2)),
(3, MyData::new(3)),
(4, MyData::new(4)),
]);
let data_key = 1;
let mut to_store = None;
if let Some(data) = lookup_table.get(&data_key) {
let mut result = data.result;
for subkey in &data.complex_data {
if let Some(sub_data) = lookup_table.get(subkey) {
result += sub_data.result + 42;
}
}
to_store = Some(result);
}
if let Some(to_store) = to_store {
lookup_table.get_mut(&data_key).unwrap().result = to_store;
}
println!("Hello, world!");
}

Why does incrementing an integer in a thread not change the value outside of the thread?

I have this code:
fn main() {
let mut total: i64 = 0;
let pool = ThreadPool::with_name("search worker".to_owned(), num_cpus::get());
let items = getItems(); // returns a vector of sorts
let client = blocking::Client::new();
for item in items {
let t_client = client.clone(); // must clone here for some reason
let mut took: i64 = 0;
pool.execute(move || {
took = worker(t_client);
total += took;
});
}
println!("{}", total);
}
fn worker(c: blocking::Client) -> i64 {
// do some stuff and return a value
// for sake of an example, return 25
25
}
I had to use the move clause in the call to execute.
The problem is that the value of the total variable remains zero. It looks like it's being duplicated inside that loop and the local variable does not get modified at all.
I do get a warning on took = worker(t_client); and on total += took; saying that the values assigned to those variables are never read.
I ended up "solving" this by making that variable static and using unsafe, but that's not a solution. Also, eventually I need to get more results from that worker function, and if I pass a reference to a variable, it tells me that I can't borrow more than once, which I don't really understand.
total is of type i64, which implements the Copy trait, so the move closure captures its own copy and increments that copy rather than the variable in main. If you want to share some mutable value T across multiple threads you need to wrap it in an Arc<Mutex<T>> (Arc docs & Mutex docs), or in this particular case, since you're using an i64, you can use an AtomicI64 instead.
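For the case where more than a single number has to come back from the workers, the Arc<Mutex<T>> option might look like the sketch below. It uses std::thread::spawn instead of the thread pool from the question, and collect_results plus the i * 25 payload are made up for this illustration.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Spawns `n` threads that each push one value into a shared Vec.
fn collect_results(n: i64) -> Vec<i64> {
    let results = Arc::new(Mutex::new(Vec::<i64>::new()));
    let handles: Vec<_> = (0..n)
        .map(|i| {
            // Each thread gets its own handle to the same Mutex.
            let results = Arc::clone(&results);
            thread::spawn(move || {
                results.lock().unwrap().push(i * 25);
            })
        })
        .collect();
    // Wait for every worker before reading the shared state.
    for handle in handles {
        handle.join().unwrap();
    }
    let mut out: Vec<i64> = results.lock().unwrap().clone();
    // Threads finish in arbitrary order, so sort for a deterministic result.
    out.sort();
    out
}
```

Cloning the Arc is cheap and satisfies the move closure, which is also why the "can't borrow more than once" error goes away: no thread borrows from main's stack.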
Based on @pretzelhammer's answer, I wrote a small snippet showing how I got it to work.
let total = Arc::new(AtomicI64::new(0));
...
for ... {
let total_c = total.clone();
pool.execute(move || {
total_c.fetch_add(es_helper::run_search(t_client, parsed), Ordering::Relaxed);
});
}
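The snippet above leaves out the surrounding setup. A self-contained version of the same idea, assuming plain std::thread in place of the thread pool and a stand-in worker that returns 25 as in the question, could look like this. Note that the threads are joined before the total is read; otherwise the value could be read while workers are still running.

```rust
use std::sync::atomic::{AtomicI64, Ordering};
use std::sync::Arc;
use std::thread;

// Stand-in for the real worker from the question.
fn worker() -> i64 {
    25
}

/// Runs `n` workers on separate threads and returns the sum of their results.
fn run(n: usize) -> i64 {
    let total = Arc::new(AtomicI64::new(0));
    let handles: Vec<_> = (0..n)
        .map(|_| {
            let total_c = Arc::clone(&total);
            thread::spawn(move || {
                total_c.fetch_add(worker(), Ordering::Relaxed);
            })
        })
        .collect();
    // Joining guarantees all additions have happened before the load below.
    for handle in handles {
        handle.join().unwrap();
    }
    total.load(Ordering::Relaxed)
}
```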

How can I convert a String into a Vector in Rust?

I was wondering how to convert a styled string into a vector. Say I had a String with the value:
"[x, y]"
How could I turn it into a vector that has x as the first element and y as the second element?
Thanks!
Sure, but the elements can't be references to variables named x and y. As mentioned by @prog-fh, that isn't possible in Rust: once compiled, variable names may not be stored, and the compiler may even have removed some variables during optimization.
You can however do something similar to Python's ast.literal_eval using serde with Rusty Object Notation (RON, a serialization format made to resemble Rust data structures). It isn't perfect, but it is an option. It does however require that you know what types you are trying to parse.
use ron::from_str;
let input = "[37.6, 24.3, 89.023]";
let parsed: Vec<f32> = from_str(input).unwrap();
On the other hand, if @mcarton is correct and you want something like vec!["x", "y"], you could manually parse it like so:
fn parse(input: &str) -> Option<Vec<String>> {
    let mut part = String::new();
    let mut collected = Vec::new();
    let mut char_iter = input.chars();
    if char_iter.next() != Some('[') {
        return None;
    }
    loop {
        match char_iter.next()? {
            ']' => {
                // Only push the final element if there is one, so "[]" yields an empty Vec.
                if !part.is_empty() {
                    collected.push(part);
                }
                return Some(collected);
            }
            ',' | ' ' => {
                if !part.is_empty() {
                    collected.push(part);
                    part = String::new();
                }
            }
            x => part.push(x),
        }
    }
}
println!("{:?}", parse("[a, b, foo]"));
Or you could also use a regex to break it up instead, but you can look into how that works yourself.
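If neither a hand-written loop nor a regex appeals, the standard string methods may already be enough. The following is a minimal sketch under the assumption that elements never contain commas or brackets themselves; parse_list is a hypothetical name, not something from the answer above.

```rust
/// Parses "[x, y]" into ["x", "y"]; returns None if the brackets are missing.
fn parse_list(input: &str) -> Option<Vec<String>> {
    // Strip the surrounding brackets, bailing out if either one is absent.
    let inner = input.trim().strip_prefix('[')?.strip_suffix(']')?;
    Some(
        inner
            .split(',')
            .map(str::trim)
            // Skip empty pieces so that "[]" gives an empty Vec.
            .filter(|s| !s.is_empty())
            .map(String::from)
            .collect(),
    )
}
```

Unlike the character loop, this version keeps spaces inside an element, so "[foo bar, baz]" would yield "foo bar" and "baz".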
