I'd like to read some json into a static HashMap, and am using lazy_static and serde, but I can't figure out how (if at all) I can fix this serde lifetime issue:
#[macro_use]
extern crate lazy_static;
use std::fs::File;
use std::io::BufReader;
use std::collections::HashMap;
lazy_static! {
static ref KEYWORDS: HashMap<&'static str, i32> = {
let file = File::open("words.json").unwrap();
let reader = BufReader::new(file);
serde_json::from_reader(reader).unwrap()
};
}
playground link
error: implementation of serde::de::Deserialize is not general enough
note: HashMap<&str, i32> must implement serde::de::Deserialize<'0>, for any lifetime '0
note: but HashMap<&str, i32> actually implements serde::de::Deserialize<'1>, for some specific lifetime '1
words.json is a simple json map: {"aaargh": 1}.
I'm open to another, non-lazy_static approach if need be.
When using serde_json::from_str to deserialize from &str ⟶ HashMap<&str, i32>, the input JSON string needs to outlive the string slices in the output. This is the role of the 'a lifetime in the signature: https://docs.rs/serde_json/1.0.40/serde_json/fn.from_str.html
That means if the output needs to contain string slices with 'static lifetime, the input JSON data must also have 'static lifetime. We know how to do that -- lazy_static!
use lazy_static::lazy_static;
use std::collections::HashMap;
lazy_static! {
static ref KEYWORDS: HashMap<&'static str, i32> = {
lazy_static! {
static ref WORDS_JSON: String = {
std::fs::read_to_string("words.json").unwrap()
};
}
serde_json::from_str(&WORDS_JSON).unwrap()
};
}
The error message is telling you that you can't deserialize to a &'static str. As the deserializer goes along creating the entries, the &str keys could only have as long a lifetime as a borrow of the buffer the deserializer is reading the file into. But &'static str must point to a str which lives forever.
I see two solutions here: The easy way and the hard way.
The easy way: Just change &'static str in the type to String and it compiles. This way the HashMap owns the keys; serde already knows how to deserialize owned strings.
static ref KEYWORDS: HashMap<String, i32> = { // ...
The hard way: Technically you can still get your HashMap<&'static str, i32> by leaking the backing buffers of the Strings. Normally "leaking" is bad, but since this is a lazy static, it really makes no difference as those buffers would never be freed anyways. Getting a &'static str by leaking a String looks like this:
fn leak_string(from: String) -> &'static str {
Box::leak(from.into_boxed_str())
}
The problem is that serde doesn't do this automatically. One way to accomplish this would be to deserialize to the HashMap<String, i32> first and then convert it to HashMap<&'static string, i32> by taking each of the entries and inserting them into a new HashMap after running the keys through leak_string. This is inefficient as there was no need to collect into a HashMap in the first place. A better solution would involve writing a custom deserializer that did leak_string "on the fly". Since the easy way is so much easier, and there's some stumbling blocks for this hard way, I don't think it's useful to provide a full code sample here.
The only real advantage of "the hard way" vs "the easy way" is that "the hard way" requires one pointer's worth less memory for each key in the HashMap (&str is pointer+len; String is pointer+len+capacity). It's also nice in that it doesn't change your type signature, but there's very little you can do with a &'static str that you can't do with a String.
The culprit here is how serde_json::from_reader is defined. From its documentation:
pub fn from_reader<R, T>(rdr: R) -> Result<T> where
R: Read,
T: DeserializeOwned,
So, the result must be owned data, not borrowed. Even the &'static won't do. You have to use String here:
lazy_static! {
static ref KEYWORDS: HashMap<String, i32> = {
let file = File::open("words.json").unwrap();
let reader = BufReader::new(file);
serde_json::from_reader(reader).unwrap()
};
}
Related
I want to create some references to a str with Rc, without cloning str:
fn main() {
let s = Rc::<str>::from("foo");
let t = Rc::clone(&s); // Creating a new pointer to the same address is easy
let u = Rc::clone(&s[1..2]); // But how can I create a new pointer to a part of `s`?
let w = Rc::<str>::from(&s[0..2]); // This seems to clone str
assert_ne!(&w as *const _, &s as *const _);
}
playground
How can I do this?
While it's possible in principle, the standard library's Rc does not support the case you're trying to create: a counted reference to a part of reference-counted memory.
However, we can get the effect for strings using a fairly straightforward wrapper around Rc which remembers the substring range:
use std::ops::{Deref, Range};
use std::rc::Rc;
#[derive(Clone, Debug, Eq, Hash, PartialEq)]
pub struct RcSubstr {
string: Rc<str>,
span: Range<usize>,
}
impl RcSubstr {
fn new(string: Rc<str>) -> Self {
let span = 0..string.len();
Self { string, span }
}
fn substr(&self, span: Range<usize>) -> Self {
// A full implementation would also have bounds checks to ensure
// the requested range is not larger than the current substring
Self {
string: Rc::clone(&self.string),
span: (self.span.start + span.start)..(self.span.start + span.end)
}
}
}
impl Deref for RcSubstr {
type Target = str;
fn deref(&self) -> &str {
&self.string[self.span.clone()]
}
}
fn main() {
let s = RcSubstr::new(Rc::<str>::from("foo"));
let u = s.substr(1..2);
// We need to deref to print the string rather than the wrapper struct.
// A full implementation would `impl Debug` and `impl Display` to produce
// the expected substring.
println!("{}", &*u);
}
There are a lot of conveniences missing here, such as suitable implementations of Display, Debug, AsRef, Borrow, From, and Into — I've provided only enough code to illustrate how it can work. Once supplemented with the appropriate trait implementations, this should be just as usable as Rc<str> (with the one edge case that it can't be passed to a library type that wants to store Rc<str> in particular).
The crate arcstr claims to offer a finished version of this basic idea, but I haven't used or studied it and so can't guarantee its quality.
The crate owning_ref provides a way to hold references to parts of an Rc or other smart pointer, but there are concerns about its soundness and I don't fully understand which circumstances that applies to (issue search which currently has 3 open issues).
This question already has an answer here:
Is it more conventional to pass-by-value or pass-by-reference when the method needs ownership of the value?
(1 answer)
Closed 2 years ago.
In general it's suggested to accept &str instead of String in Rust.
Let's assume i have a couple of functions and a String instance:
use std::collections::HashMap;
fn do_something_str(string: &str) {
let mut map = HashMap::new();
map.insert(string.to_owned() /* copying (expensive)? */, "value");
}
fn do_something_string(string: String) {
let mut map = HashMap::new();
map.insert(string /* moving (cheap)? */, "value");
}
fn main() {
let string = String::from("123");
do_something_str(&string);
do_something_string(string);
}
Playground
Does copying happen in do_something_str() meaning it will be slower/higher temporary memory consumption?
PS. i know i don't have to call .to_owned() explicitly and the following will also work:
fn do_something_str(string: &str) {
let mut map = HashMap::new();
map.insert(string /* copying (expensive)? */, "value");
}
But since a hashmap owns keys i believe it will clone it implicitly. Please correct me, if i'm wrong.
In general it's suggested to accept &str instead of String in Rust.
Not quite. The general wisdom is to accept &str instead of &String. That is, when you already intend to operate on a reference, the more general type is considered to be better.
If you need an owned String, it is in fact wasteful to pass a reference only to immediately clone it. Accepting a String in the first place leaves the choice to the caller: if they need to keep a copy of the string for themselves, they can clone(). Else, the caller just moves the String into the HashMap and no cloning is involved.
But since a hashmap owns keys i believe it will clone it implicitly. Please correct me, if i'm wrong.
HashMap does not clone on insertion. That would require a Clone trait bound on HashMap.
HashMap<&str, T> would still "own" the key, but the key in this case is &str, not its owned equivalent (String). Naturally, this would prevent the HashMap from outliving the string that the key references. The following example fails the borrow check:
use std::collections::HashMap;
fn main() {
let mut map: HashMap<&str, ()> = HashMap::new();
{
let key = String::from("foo");
map.insert(&key, ()); // error[E0597]: `key` does not live long enough
}
println!("{:?}", map.get("foo"));
}
Calling .to_owned() on a &str will indeed create an owned copy of the string and is associated with the corresponding performance/memory penaltly.
As you correctly noted, you can also have a HashMap<&str, V>, i.e. a HashMap whose keys are of type &str. This does not require cloning of the string.
This however means that the lifetime of the HashMap is limited by the lifetime of the &str.
I come from a Java/C#/JavaScript background and I am trying to implement a Dictionary that would assign each passed string an id that never changes. The dictionary should be able to return a string by the specified id. This allows to store some data that has a lot of repetitive strings far more efficiently in the file system because only the ids of strings would be stored instead of entire strings.
I thought that a struct with a HashMap and a Vec would do but it turned out to be more complicated than that.
I started with the usage of &str as a key for HashMap and an item of Vec like in the following sample. The value of HashMap serves as an index into Vec.
pub struct Dictionary<'a> {
values_map: HashMap<&'a str, u32>,
keys_map: Vec<&'a str>
}
impl<'a> Dictionary<'a> {
pub fn put_and_get_key(&mut self, value: &'a str) -> u32 {
match self.values_map.get_mut(value) {
None => {
let id_usize = self.keys_map.len();
let id = id_usize as u32;
self.keys_map.push(value);
self.values_map.insert(value, id);
id
},
Some(&mut id) => id
}
}
}
This works just fine until it turns out that the strs need to be stored somewhere, preferably in this same struct as well. I tried to store a Box<str> in the Vec and &'a str in the HashMap.
pub struct Dictionary<'a> {
values_map: HashMap<&'a str, u32>,
keys_map: Vec<Box<str>>
}
The borrow checker did not allow this of course because it would have allowed a dangling pointer in the HashMap when an item is removed from the Vec (or in fact sometimes when another item is added to the Vec but this is an off-topic here).
I understood that I either need to write unsafe code or use some form of shared ownership, the simplest kind of which seems to be an Rc. The usage of Rc<Box<str>> looks like introducing double indirection but there seems to be no simple way to construct an Rc<str> at the moment.
pub struct Dictionary {
values_map: HashMap<Rc<Box<str>>, u32>,
keys_map: Vec<Rc<Box<str>>>
}
impl Dictionary {
pub fn put_and_get_key(&mut self, value: &str) -> u32 {
match self.values_map.get_mut(value) {
None => {
let id_usize = self.keys_map.len();
let id = id_usize as u32;
let value_to_store = Rc::new(value.to_owned().into_boxed_str());
self.keys_map.push(value_to_store);
self.values_map.insert(value_to_store, id);
id
},
Some(&mut id) => id
}
}
}
Everything seems fine with regard to ownership semantics, but the code above does not compile because the HashMap now expects an Rc, not an &str:
error[E0277]: the trait bound `std::rc::Rc<Box<str>>: std::borrow::Borrow<str>` is not satisfied
--> src/file_structure/sample_dictionary.rs:14:31
|
14 | match self.values_map.get_mut(value) {
| ^^^^^^^ the trait `std::borrow::Borrow<str>` is not implemented for `std::rc::Rc<Box<str>>`
|
= help: the following implementations were found:
= help: <std::rc::Rc<T> as std::borrow::Borrow<T>>
Questions:
Is there a way to construct an Rc<str>?
Which other structures, methods or approaches could help to resolve this problem. Essentially, I need a way to efficiently store two maps string-by-id and id-by-string and be able to retrieve an id by &str, i.e. without any excessive allocations.
Is there a way to construct an Rc<str>?
Annoyingly, not that I know of. Rc::new requires a Sized argument, and I am not sure whether it is an actual limitation, or just something which was forgotten.
Which other structures, methods or approaches could help to resolve this problem?
If you look at the signature of get you'll notice:
fn get<Q: ?Sized>(&self, k: &Q) -> Option<&V>
where K: Borrow<Q>, Q: Hash + Eq
As a result, you could search by &str if K implements Borrow<str>.
String implements Borrow<str>, so the simplest solution is to simply use String as a key. Sure it means you'll actually have two String instead of one... but it's simple. Certainly, a String is simpler to use than a Box<str> (although it uses 8 more bytes).
If you want to shave off this cost, you can use a custom structure instead:
#[derive(Clone, Debug)]
struct RcStr(Rc<String>);
And then implement Borrow<str> for it. You'll then have 2 allocations per key (1 for Rc and 1 for String). Depending on the size of your String, it might consume less or more memory.
If you wish to got further (why not?), here are some ideas:
implement your own reference-counted string, in a single heap-allocation,
use a single arena for the slice inserted in the Dictionary,
...
I wrote some Rust code that takes a &String as an argument:
fn awesome_greeting(name: &String) {
println!("Wow, you are awesome, {}!", name);
}
I've also written code that takes in a reference to a Vec or Box:
fn total_price(prices: &Vec<i32>) -> i32 {
prices.iter().sum()
}
fn is_even(value: &Box<i32>) -> bool {
**value % 2 == 0
}
However, I received some feedback that doing it like this isn't a good idea. Why not?
TL;DR: One can instead use &str, &[T] or &T to allow for more generic code.
One of the main reasons to use a String or a Vec is because they allow increasing or decreasing the capacity. However, when you accept an immutable reference, you cannot use any of those interesting methods on the Vec or String.
Accepting a &String, &Vec or &Box also requires the argument to be allocated on the heap before you can call the function. Accepting a &str allows a string literal (saved in the program data) and accepting a &[T] or &T allows a stack-allocated array or variable. Unnecessary allocation is a performance loss. This is usually exposed right away when you try to call these methods in a test or a main method:
awesome_greeting(&String::from("Anna"));
total_price(&vec![42, 13, 1337])
is_even(&Box::new(42))
Another performance consideration is that &String, &Vec and &Box introduce an unnecessary layer of indirection as you have to dereference the &String to get a String and then perform a second dereference to end up at &str.
Instead, you should accept a string slice (&str), a slice (&[T]), or just a reference (&T). A &String, &Vec<T> or &Box<T> will be automatically coerced (via deref coercion) to a &str, &[T] or &T, respectively.
fn awesome_greeting(name: &str) {
println!("Wow, you are awesome, {}!", name);
}
fn total_price(prices: &[i32]) -> i32 {
prices.iter().sum()
}
fn is_even(value: &i32) -> bool {
*value % 2 == 0
}
Now you can call these methods with a broader set of types. For example, awesome_greeting can be called with a string literal ("Anna") or an allocated String. total_price can be called with a reference to an array (&[1, 2, 3]) or an allocated Vec.
If you'd like to add or remove items from the String or Vec<T>, you can take a mutable reference (&mut String or &mut Vec<T>):
fn add_greeting_target(greeting: &mut String) {
greeting.push_str("world!");
}
fn add_candy_prices(prices: &mut Vec<i32>) {
prices.push(5);
prices.push(25);
}
Specifically for slices, you can also accept a &mut [T] or &mut str. This allows you to mutate a specific value inside the slice, but you cannot change the number of items inside the slice (which means it's very restricted for strings):
fn reset_first_price(prices: &mut [i32]) {
prices[0] = 0;
}
fn lowercase_first_ascii_character(s: &mut str) {
if let Some(f) = s.get_mut(0..1) {
f.make_ascii_lowercase();
}
}
In addition to Shepmaster's answer, another reason to accept a &str (and similarly &[T] etc) is because of all of the other types besides String and &str that also satisfy Deref<Target = str>. One of the most notable examples is Cow<str>, which lets you be very flexible about whether you are dealing with owned or borrowed data.
If you have:
fn awesome_greeting(name: &String) {
println!("Wow, you are awesome, {}!", name);
}
But you need to call it with a Cow<str>, you'll have to do this:
let c: Cow<str> = Cow::from("hello");
// Allocate an owned String from a str reference and then makes a reference to it anyway!
awesome_greeting(&c.to_string());
When you change the argument type to &str, you can use Cow seamlessly, without any unnecessary allocation, just like with String:
let c: Cow<str> = Cow::from("hello");
// Just pass the same reference along
awesome_greeting(&c);
let c: Cow<str> = Cow::from(String::from("hello"));
// Pass a reference to the owned string that you already have
awesome_greeting(&c);
Accepting &str makes calling your function more uniform and convenient, and the "easiest" way is now also the most efficient. These examples will also work with Cow<[T]> etc.
The recommendation is using &str over &String because &str also satisfies &String which could be used for both owned strings and the string slices but not the other way around:
use std::borrow::Cow;
fn greeting_one(name: &String) {
println!("Wow, you are awesome, {}!", name);
}
fn greeting_two(name: &str) {
println!("Wow, you are awesome, {}!", name);
}
fn main() {
let s1 = "John Doe".to_string();
let s2 = "Jenny Doe";
let s3 = Cow::Borrowed("Sally Doe");
let s4 = Cow::Owned("Sally Doe".to_string());
greeting_one(&s1);
// greeting_one(&s2); // Does not compile
// greeting_one(&s3); // Does not compile
greeting_one(&s4);
greeting_two(&s1);
greeting_two(s2);
greeting_two(&s3);
greeting_two(&s4);
}
Using vectors to manipulate text is never a good idea and does not even deserve discussion because you will loose all the sanity checks and performance optimizations. String type uses vector internally anyway. Remember, Rust uses UTF-8 for strings for storage efficiency. If you use vector, you have to repeat all the hard work. Other than that, borrowing vectors or boxed values should be OK.
Because those types can be coerced, so if we use those types functions will accept less types:
1- a reference to String can be coerced to a str slice. For example create a function:
fn count_wovels(words:&String)->usize{
let wovels_count=words.chars().into_iter().filter(|x|(*x=='a') | (*x=='e')| (*x=='i')| (*x=='o')|(*x=='u')).count();
wovels_count
}
if you pass &str, it will not be accepted:
let name="yilmaz".to_string();
println!("{}",count_wovels(&name));
// this is not allowed because argument should be &String but we are passing str
// println!("{}",wovels("yilmaz"))
But if that function accepts &str instead
// words:&str
fn count_wovels(words:&str)->usize{ ... }
we can pass both types to the function
let name="yilmaz".to_string();
println!("{}",count_wovels(&name));
println!("{}",wovels("yilmaz"))
With this, our function can accept more types
2- Similary, a reference to Box &Box[T], will be coerced to the reference to the value inside the Box Box[&T]. for example
fn length(name:&Box<&str>){
println!("lenght {}",name.len())
}
this accepts only &Box<&str> type
let boxed_str=Box::new("Hello");
length(&boxed_str);
// expected reference `&Box<&str>` found reference `&'static str`
// length("hello")
If we pass &str as type, we can pass both types
3- Similar relation exists between ref to a Vec and ref to an array
fn square(nums:&Vec<i32>){
for num in nums{
println!("square of {} is {}",num,num*num)
}
}
fn main(){
let nums=vec![1,2,3,4,5];
let nums_array=[1,2,3,4,5];
// only &Vec<i32> is accepted
square(&nums);
// mismatched types: mismatched types expected reference `&Vec<i32>` found reference `&[{integer}; 5]`
//square(&nums_array)
}
this will work for both types
fn square(nums:&[i32]){..}
How would I idiomatically create a string to string hashmap in rust. The following works, but is it the right way to do it? is there a different kind of string I should be using?
use std::collections::hashmap::HashMap;
//use std::str;
fn main() {
let mut mymap = HashMap::new();
mymap.insert("foo".to_string(), "bar".to_string());
println!("{0}", mymap["foo".to_string()]);
}
Assuming you would like the flexibility of String, HashMap<String, String> is correct. The other choice is &str, but that imposes significant restrictions on how the HashMap can be used/where it can be passed around; but if it it works, changing one or both parameter to &str will be more efficient. This choice should be dictated by what sort of ownership semantics you need, and how dynamic the strings are, see this answer and the strings guide for more.
BTW, searching a HashMap<String, ...> with a String can be expensive: if you don't already have one, it requires allocating a new String. We have a work around in the form of find_equiv, which allows you to pass a string literal (and, more generally, any &str) without allocating a new String:
use std::collections::HashMap;
fn main() {
let mut mymap = HashMap::new();
mymap.insert("foo".to_string(), "bar".to_string());
println!("{}", mymap.find_equiv(&"foo"));
println!("{}", mymap.find_equiv(&"not there"));
}
playpen (note I've left the Option in the return value, one could call .unwrap() or handle a missing key properly).
Another slightly different option (more general in some circumstances, less in others), is the std::string::as_string function, which allows viewing the data in &str as if it were a &String, without allocating (as the name suggests). It returns an object that can be dereferenced to a String, e.g.
use std::collections::HashMap;
use std::string;
fn main() {
let mut mymap = HashMap::new();
mymap.insert("foo".to_string(), "bar".to_string());
println!("{}", mymap[*string::as_string("foo")]);
}
playpen
(There is a similar std::vec::as_vec.)
Writing this answer for future readers. huon's answer is correct at the time but *_equiv methods were purged some time ago.
The HashMap documentation provides an example on using String-String hashmaps, where &str can be used.
The following code will work just fine. No new String allocation necessary:
use std::collections::HashMap;
fn main() {
let mut mymap = HashMap::new();
mymap.insert("foo".to_string(), "bar".to_string());
println!("{0}", mymap["foo"]);
println!("{0}", mymap.get("foo").unwrap());
}