Out of habit from interpreted programming languages, I want to rewrite many values based on their key. I assumed that I would store all the information in the struct prepared for this project. So I started iterating:
struct Container {
    x: String,
    y: String,
    z: String
}
impl Container {
    // (...)
    fn load_data(&self, data: &HashMap<String, String>) {
        let valid_keys = vec_of_strings![ // It's a simple vector of Strings
            "x", "y", "z"
        ];
        for key_name in &valid_keys {
            if data.contains_key(key_name) {
                self[key_name] = Some(data.get(key_name));
                // This is invalid, of course, but
                // I do not know how to write it correctly.
                // For example, in PHP I would write it like this:
                // $this[$key_name] = $data[$key_name];
            }
        }
    }
    // (...)
}
Maybe macros? I tried to use them, but key_name is always interpreted literally; I cannot get the value of key_name instead.
How can I do this without repeating the code for each value?
With macros, I always advocate starting from the direct code, then seeing what duplication there is. In this case, we'd start with
fn load_data(&mut self, data: &HashMap<String, String>) {
    if let Some(v) = data.get("x") {
        self.x = v.clone();
    }
    if let Some(v) = data.get("y") {
        self.y = v.clone();
    }
    if let Some(v) = data.get("z") {
        self.z = v.clone();
    }
}
Note the number of differences:
The method must take &mut self.
It's inefficient to check if a value is there and then get it separately.
We need to clone the value because we only have a reference.
We cannot store an Option in a String.
Once you have your code working, you can see how to abstract things. Always start by trying to use "lighter" abstractions (functions, traits, etc.). Only after exhausting those would I start bringing in macros. Let's start by using stringify!:
if let Some(v) = data.get(stringify!(x)) {
    self.x = v.clone();
}
Then you can extract out a macro:
use std::collections::HashMap;
// Debug and Default are needed by main below.
#[derive(Debug, Default)]
struct Container {
    x: String,
    y: String,
    z: String,
}
macro_rules! thing {
    ($this: ident, $data: ident, $($name: ident),+) => {
        $(
            // stringify! turns the field name into the lookup key.
            if let Some(v) = $data.get(stringify!($name)) {
                $this.$name = v.clone();
            }
        )+
    };
}
impl Container {
    fn load_data(&mut self, data: &HashMap<String, String>) {
        thing!(self, data, x, y, z);
    }
}
fn main() {
    let mut c = Container::default();
    let d: HashMap<_, _> = vec![("x".into(), "alpha".into())].into_iter().collect();
    c.load_data(&d);
    println!("{:?}", c);
}
Full disclosure: I don't think this is a good idea.
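One of the "lighter" abstractions mentioned above might already be enough here: a plain array that pairs each key with a mutable reference to its field, with no macro involved. This is only a sketch. The Container fields mirror the question, while the derive attributes and the demo in main are assumptions added so the snippet runs on its own.

```rust
use std::collections::HashMap;

// `Debug` and `Default` are assumed here so that `main` can build and print a value.
#[derive(Debug, Default)]
struct Container {
    x: String,
    y: String,
    z: String,
}

impl Container {
    fn load_data(&mut self, data: &HashMap<String, String>) {
        // Pair every key with the field it should overwrite.
        // Borrowing distinct fields mutably at the same time is allowed.
        let fields = [
            ("x", &mut self.x),
            ("y", &mut self.y),
            ("z", &mut self.z),
        ];
        for (key, field) in fields {
            // Keys missing from `data` leave the field untouched.
            if let Some(value) = data.get(key) {
                *field = value.clone();
            }
        }
    }
}

fn main() {
    let mut c = Container::default();
    let mut d = HashMap::new();
    d.insert("x".to_string(), "alpha".to_string());
    c.load_data(&d);
    println!("{:?}", c);
}
```

The field list still has to be written out once, but it stays ordinary code that the compiler checks and that tools can navigate.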
Is it possible to write a short, neat loop that keeps calling a function as long as the result is Ok(x), and acts on x?
E.g. something like:
use text_io::try_read; // Cargo.toml += text_io = "0.1"
fn main() {
while let Ok(t): Result<i64, _> = try_read!() {
println!("{}", t);
}
}
fails to compile.
If I try to provide type information, it fails; if I don't provide it, then it is obviously ambiguous how to resolve try_read!.
Here is a working, but IMHO much longer, snippet:
use text_io::try_read; // Cargo.toml += text_io = "0.1"
fn main() {
    loop {
        // No `mut` needed: the binding is never reassigned.
        let tok: Result<i64, _> = try_read!();
        match tok {
            Ok(t) => println!("{}", t),
            Err(_) => break,
        }
    }
}
You can qualify Ok as Result::Ok and then use the "turbofish" operator to provide the concrete type:
fn main() {
    while let Result::<i64, _>::Ok(t) = try_read!() {
        println!("{}", t);
    }
}
(while let Ok::<i64, _>(t) also works, but is perhaps a bit more cryptic.)
Another option is to request the type inside the loop - rustc is smart enough to infer the type for try_read!() from that:
fn main() {
    while let Ok(t) = try_read!() {
        let t: i64 = t;
        println!("{}", t);
    }
}
The latter variant is particularly useful in for loops where the pattern match is partly hidden, so there is no place to ascribe the type to.
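To illustrate the for-loop case, here is a small sketch that uses only the standard library. It swaps try_read!() for str::parse, which is likewise generic over its result type; the function name sum_until_error and the sample input are made up for the example.

```rust
// Sums the leading tokens that parse as integers and stops at the first failure.
fn sum_until_error(tokens: &[&str]) -> i64 {
    let mut total = 0;
    // `parse` is generic over its output, so nothing in the `for` header fixes the type.
    for parsed in tokens.iter().map(|s| s.parse()) {
        // The annotation in the loop body is what lets rustc pick `i64`.
        let parsed: Result<i64, _> = parsed;
        match parsed {
            Ok(n) => total += n,
            Err(_) => break,
        }
    }
    total
}

fn main() {
    // "x" is not a number, so only 1 and 2 are summed.
    println!("{}", sum_until_error(&["1", "2", "x", "4"])); // prints 3
}
```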
The following code example is the best that I have come up with so far:
enum Variant {
VariantA(u64),
VariantB(f64),
}
fn main() {
let my_vec = vec![Variant::VariantA(1),
Variant::VariantB(-2.0),
Variant::VariantA(4),
Variant::VariantA(3),
Variant::VariantA(2),
Variant::VariantB(1.0)];
let my_u64_vec = my_vec
.into_iter()
.filter_map(|el| match el {
Variant::VariantA(inner) => Some(inner),
_ => None,
})
.collect::<Vec<u64>>();
println!("my_u64_vec = {:?}", my_u64_vec);
}
I would like to know if there is a less verbose way of obtaining the vector of inner values (i.e., Vec<u64> in the example). It feels like I might be able to use something like try_from or try_into to make this less verbose, but I cannot quite get there.
Enums are not "special" and carry little, if any, implicit magic, so by default, yes, you need a full match, or at least an if let, e.g.
if let Variant::VariantA(inner) = el { Some(inner) } else { None }
However, nothing prevents you from implementing whatever utility methods you have in mind on your enum, e.g. a get_a that returns an Option<A> (similar to Result::ok and Result::err), or indeed from implementing TryFrom<Variant> for the inner type:
use std::convert::{TryFrom, TryInto};
enum Variant {
    VariantA(u64),
    VariantB(f64),
}
impl TryFrom<Variant> for u64 {
    type Error = ();
    fn try_from(value: Variant) -> Result<Self, Self::Error> {
        if let Variant::VariantA(v) = value { Ok(v) } else { Err(()) }
    }
}
fn main() {
    let my_vec = vec![Variant::VariantA(1),
                      Variant::VariantB(-2.0),
                      Variant::VariantA(4),
                      Variant::VariantA(3),
                      Variant::VariantA(2),
                      Variant::VariantB(1.0)];
    let my_u64_vec = my_vec
        .into_iter()
        .filter_map(|el| el.try_into().ok())
        .collect::<Vec<u64>>();
    println!("my_u64_vec = {:?}", my_u64_vec);
}
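The utility-method route mentioned above could look roughly like this. The name into_a is an assumption (the text only suggests something like get_a); it takes self by value so it can be passed straight to filter_map.

```rust
#[derive(Debug)]
enum Variant {
    VariantA(u64),
    VariantB(f64),
}

impl Variant {
    // Hypothetical helper in the spirit of `Result::ok`:
    // yields the payload of `VariantA` and `None` for anything else.
    fn into_a(self) -> Option<u64> {
        match self {
            Variant::VariantA(inner) => Some(inner),
            _ => None,
        }
    }
}

fn main() {
    let my_vec = vec![
        Variant::VariantA(1),
        Variant::VariantB(-2.0),
        Variant::VariantA(4),
        Variant::VariantA(3),
        Variant::VariantA(2),
        Variant::VariantB(1.0),
    ];
    let my_u64_vec: Vec<u64> = my_vec.into_iter().filter_map(Variant::into_a).collect();
    println!("my_u64_vec = {:?}", my_u64_vec); // my_u64_vec = [1, 4, 3, 2]
}
```

The match still exists, but it is written once on the enum instead of at every call site.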
I'm trying to build a basic web crawler in Rust, which I'm trying to port to html5ever. As of right now, I have a function with a struct inside that is supposed to return a Vec<String>. It gets this Vec from the struct in the return statement. Why does it always return an empty vector? (Does it have anything to do with the lifetime parameters?)
fn find_urls_in_html<'a>(
original_url: &Url,
raw_html: String,
fetched_cache: &Vec<String>,
) -> Vec<String> {
#[derive(Clone)]
struct Sink<'a> {
original_url: &'a Url,
returned_vec: Vec<String>,
fetched_cache: &'a Vec<String>,
}
impl<'a> TokenSink for Sink<'a> {
type Handle = ();
fn process_token(&mut self, token: Token, _line_number: u64) -> TokenSinkResult<()> {
trace!("token {:?}", token);
match token {
TagToken(tag) => {
if tag.kind == StartTag && tag.attrs.len() != 0 {
let _attribute_name = get_attribute_for_elem(&tag.name);
if _attribute_name == None {
return TokenSinkResult::Continue;
}
let attribute_name = _attribute_name.unwrap();
for attribute in &tag.attrs {
if &attribute.name.local != attribute_name {
continue;
}
trace!("element {:?} found", tag);
add_urls_to_vec(
repair_suggested_url(
self.original_url,
(&attribute.name.local, &attribute.value),
),
&mut self.returned_vec,
&self.fetched_cache,
);
}
}
}
ParseError(error) => {
warn!("error parsing html for {}: {:?}", self.original_url, error);
}
_ => {}
}
return TokenSinkResult::Continue;
}
}
let html = Sink {
original_url: original_url,
returned_vec: Vec::new(),
fetched_cache: fetched_cache,
};
let mut byte_tendril = ByteTendril::new();
{
let tendril_push_result = byte_tendril.try_push_bytes(&raw_html.into_bytes());
if tendril_push_result.is_err() {
warn!("error pushing bytes to tendril: {:?}", tendril_push_result);
return Vec::new();
}
}
let mut queue = BufferQueue::new();
queue.push_back(byte_tendril.try_reinterpret().unwrap());
let mut tok = Tokenizer::new(html.clone(), std::default::Default::default()); // default default! default?
let feed = tok.feed(&mut queue);
return html.returned_vec;
}
The output ends with no warning (and a panic, caused by another function due to this being empty). Can anyone help me figure out what's going on?
Thanks in advance.
When I initialize the Tokenizer, I use:
let mut tok = Tokenizer::new(html.clone(), std::default::Default::default());
The problem is that I'm telling the Tokenizer to use html.clone() instead of html. As a result, the Tokenizer fills the returned_vec of the clone, while html.returned_vec, the one the function returns, stays empty. Giving the Tokenizer access to the original sink instead of a copy, for example through a variable holding a mutable reference, fixes this problem.
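The effect can be reproduced without html5ever. In the sketch below, Sink is cut down to the one field that matters and Consumer is a hypothetical stand-in for the Tokenizer; neither reflects html5ever's actual API. Reading the result back from the consumer is just one possible way to avoid the clone.

```rust
// Stand-in for the question's `Sink`; only the relevant field is kept.
#[derive(Clone, Default)]
struct Sink {
    returned_vec: Vec<String>,
}

// Hypothetical consumer that owns its sink, playing the role of the Tokenizer.
struct Consumer {
    sink: Sink,
}

impl Consumer {
    fn new(sink: Sink) -> Self {
        Consumer { sink }
    }

    fn feed(&mut self, url: &str) {
        self.sink.returned_vec.push(url.to_string());
    }
}

// Mirrors the bug: the consumer fills a clone, so the original stays empty.
fn collect_with_clone(urls: &[&str]) -> Vec<String> {
    let sink = Sink::default();
    let mut consumer = Consumer::new(sink.clone());
    for url in urls {
        consumer.feed(url);
    }
    sink.returned_vec
}

// One possible fix: hand over the sink itself and read the result back from the consumer.
fn collect_without_clone(urls: &[&str]) -> Vec<String> {
    let mut consumer = Consumer::new(Sink::default());
    for url in urls {
        consumer.feed(url);
    }
    consumer.sink.returned_vec
}

fn main() {
    let urls = ["https://a.example", "https://b.example"];
    println!("with clone: {}", collect_with_clone(&urls).len()); // with clone: 0
    println!("without clone: {}", collect_without_clone(&urls).len()); // without clone: 2
}
```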
The following code works, but it doesn't look nice as the definition of is_empty is too far away from the usage.
fn remove(&mut self, index: I, primary_key: &Rc<K>) {
let is_empty;
{
let ks = self.data.get_mut(&index).unwrap();
ks.remove(primary_key);
is_empty = ks.is_empty();
}
// I have to wrap `ks` in an inner scope so that we can borrow `data` mutably.
if is_empty {
self.data.remove(&index);
}
}
Is there a way to drop the variables used in the condition before entering the if branch, e.g.
if {ks.is_empty()} {
self.data.remove(&index);
}
Whenever you have a double look-up of a key, you need to think Entry API.
With the entry API, you get a handle to a key-value pair and can:
read the key,
read/modify the value,
remove the entry entirely (getting the key and value back).
It's extremely powerful.
In this case:
use std::collections::HashMap;
use std::collections::hash_map::Entry;
fn remove(hm: &mut HashMap<i32, String>, index: i32) {
    if let Entry::Occupied(o) = hm.entry(index) {
        if o.get().is_empty() {
            o.remove_entry();
        }
    }
}
fn main() {
    let mut hm = HashMap::new();
    hm.insert(1, String::from(""));
    remove(&mut hm, 1);
    println!("{:?}", hm);
}
I did this in the end:
// Assumes `use std::collections::hash_map::Entry::{Occupied, Vacant};`
match self.data.entry(index) {
    Occupied(mut occupied) => {
        let is_empty = {
            let ks = occupied.get_mut();
            ks.remove(primary_key);
            ks.is_empty()
        };
        if is_empty {
            occupied.remove();
        }
    },
    Vacant(_) => unreachable!()
}
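For reference, here is a self-contained sketch of the same approach that can be run as-is. The concrete types are assumptions: u32 stands in for the index type I and a HashSet<String> for the set of primary keys (the question uses Rc<K>). Unlike the snippet above, a missing index is silently ignored rather than treated as unreachable.

```rust
use std::collections::hash_map::Entry;
use std::collections::{HashMap, HashSet};

// Removes `primary_key` from the set stored under `index`
// and drops the whole entry once that set is empty.
fn remove(data: &mut HashMap<u32, HashSet<String>>, index: u32, primary_key: &str) {
    if let Entry::Occupied(mut occupied) = data.entry(index) {
        occupied.get_mut().remove(primary_key);
        if occupied.get().is_empty() {
            // Consumes the entry, so the key is hashed only once overall.
            occupied.remove();
        }
    }
}

fn main() {
    let mut data = HashMap::new();
    data.insert(1, HashSet::from(["a".to_string(), "b".to_string()]));
    remove(&mut data, 1, "a");
    println!("{:?}", data); // {1: {"b"}}
    remove(&mut data, 1, "b");
    println!("{:?}", data); // {}
}
```

With current borrow-checker rules the inner scope around get_mut is no longer needed, so the emptiness check can sit right next to the removal.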
I'm rather new to Rust and have put together a little experiment that blows my understanding of lifetime annotations entirely out of the water. This is compiled with rust-0.13.0-nightly and there's a playpen version of the code here.
The meat of the program is the function 'recognize', which, together with the function 'lex', is responsible for allocating String instances. I'm sure the code is a bit goofy, so in addition to getting the lifetimes right enough for this to compile, I would also happily accept some guidance on making it idiomatic.
#[deriving(Show)]
enum Token<'a> {
Field(&'a std::string::String),
}
#[deriving(Show)]
struct LexerState<'a> {
character: int,
field: int,
tokens: Vec<Token<'a>>,
str_buf: &'a std::string::String,
}
// The goal with recognize is to:
//
// * gather all A .. z into a temporary string buffer str_buf
// * on ',', move buffer into a Field token
// * store the completely extracted field in LexerState's tokens attribute
//
// I think I'm not understanding how to specify the lifetimes and mutability
// correctly.
fn recognize<'a, 'r>(c: char, ctx: &'r mut LexerState<'a>) -> &'r mut LexerState<'a> {
match c {
'A' ... 'z' => {
ctx.str_buf.push(c);
},
',' => {
ctx.tokens.push(Field(ctx.str_buf));
ctx.field += 1;
ctx.str_buf = &std::string::String::new();
},
_ => ()
};
ctx.character += 1;
ctx
}
fn lex<'a, I, E>(it: &mut I)
-> LexerState<'a> where I: Iterator<Result<char, E>> {
let mut ctx = LexerState { character: 0, field: 0,
tokens: Vec::new(), str_buf: &std::string::String::new() };
for val in *it {
let c:char = val.ok().expect("wtf");
recognize(c, &mut ctx);
}
ctx
}
fn main() {
let tokens = lex(&mut std::io::stdio::stdin().chars());
println!("{}", tokens)
}
In this case, you're constructing new strings rather than borrowing existing strings, so you'd use an owned string directly:
use std::mem;
#[deriving(Show)]
enum Token {
Field(String),
}
#[deriving(Show)]
struct LexerState {
character: int,
field: int,
tokens: Vec<Token>,
str_buf: String,
}
// The goal with recognize is to:
//
// * gather all A .. z into a temporary string buffer str_buf
// * on ',', move buffer into a Field token
// * store the completely extracted field in LexerState's tokens attribute
//
// I think I'm not understanding how to specify the lifetimes and mutability
// correctly.
fn recognize<'r>(c: char, ctx: &'r mut LexerState) -> &'r mut LexerState {
match c {
'A' ...'z' => { ctx.str_buf.push(c); }
',' => {
ctx.tokens.push(Field(mem::replace(&mut ctx.str_buf,
String::new())));
ctx.field += 1;
}
_ => (),
};
ctx.character += 1;
ctx
}
fn lex<I, E>(it: &mut I) -> LexerState where I: Iterator<Result<char, E>> {
let mut ctx =
LexerState{
character: 0,
field: 0,
tokens: Vec::new(),
str_buf: String::new(),
};
for val in *it {
let c: char = val.ok().expect("wtf");
recognize(c, &mut ctx);
}
ctx
}
fn main() {
let tokens = lex(&mut std::io::stdio::stdin().chars());
println!("{}" , tokens)
}
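The code above targets a pre-1.0 nightly and no longer compiles. As a rough sketch of the same idea in current Rust: #[derive(Debug)] replaces #[deriving(Show)], usize replaces int, 'A'..='z' replaces 'A' ... 'z', and std::mem::take does the job of mem::replace with String::new(). Reading from a string literal instead of stdin, and dropping the returned reference from recognize, are simplifications made for the example.

```rust
use std::mem;

#[derive(Debug, PartialEq)]
enum Token {
    Field(String),
}

#[derive(Debug, Default)]
struct LexerState {
    character: usize,
    field: usize,
    tokens: Vec<Token>,
    str_buf: String,
}

fn recognize(c: char, ctx: &mut LexerState) {
    match c {
        // Same range as the original; it also covers the few symbols between 'Z' and 'a'.
        'A'..='z' => ctx.str_buf.push(c),
        ',' => {
            // `mem::take` moves the buffer out and leaves an empty String behind.
            ctx.tokens.push(Token::Field(mem::take(&mut ctx.str_buf)));
            ctx.field += 1;
        }
        _ => {}
    }
    ctx.character += 1;
}

fn lex(input: impl Iterator<Item = char>) -> LexerState {
    let mut ctx = LexerState::default();
    for c in input {
        recognize(c, &mut ctx);
    }
    ctx
}

fn main() {
    let state = lex("ab,cd,".chars());
    println!("{:?}", state);
}
```

Because every token owns its String, no lifetime parameters are needed anywhere.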