This question already has answers here:
Initialize a large, fixed-size array with non-Copy types
(8 answers)
Closed last month.
I'm trying to implement a simple hash table with external chaining in Rust, but I'm having trouble with the table declaration. I declared the table as follows:
static mut _HASH_TABLE: [Option<Box<MemoryMap>>; (std::u16::MAX -1) as usize] = [None; (std::u16::MAX -1) as usize];
where MemoryMap is a dynamically allocated linked list:
pub struct MemoryMap {
    entry: SimpleEntry,
    next: Option<Box<MemoryMap>>,
}
impl MemoryMap {
    pub fn new(init_key: String, init_value: Box<dyn Storable>) -> MemoryMap {
        MemoryMap {
            entry: SimpleEntry::new(init_key, init_value),
            next: None,
        }
    }
    //...
}
pub struct SimpleEntry {
    key: String,
    value: Box<dyn Storable>,
}
impl SimpleEntry {
    pub fn new(key: String, value: Box<dyn Storable>) -> SimpleEntry {
        SimpleEntry { key, value }
    }
    //...
}
That said, I get this error
5 | static mut _HASH_TABLE: [Option<Box<MemoryMap>>; (std::u16::MAX -1) as usize] = [None; (std::u16::MAX -1) as usize];
| ^^^^ the trait `Copy` is not implemented for `Box<MemoryMap>`
|
= note: required for `Option<Box<MemoryMap>>` to implement `Copy`
= note: the `Copy` trait is required because this value will be copied for each element of the array
and I don't understand why the elements of the array have to be copied, since I was expecting the array to take ownership of the Box<MemoryMap>.
When you write
[None; (std::u16::MAX -1) as usize]
the compiler needs to take the one value you gave it and duplicate it std::u16::MAX - 1 times in order to fill the array. To do that, it uses Copy:
A repeat expression [x; N], which produces an array with N copies of x. The type of x must be Copy.
That cannot work here: even if the None value is trivial, as far as Rust is concerned Option<T> implements Copy iff T implements Copy, and Box<_> does not implement Copy. So you need an alternative initialisation method, like the one in the question linked by @cafce25, or once_cell / lazy_static to get a global that is (lazily) initialised at runtime, which would then likely be something like a Vec.
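Two workarounds, sketched under stated assumptions. Current Rust also accepts a path to a constant item as the operand of [x; N], even when its type is not Copy, because the constant is instantiated for every element. The sketch below uses that with a Mutex in place of static mut (a swapped-in choice, since Mutex::new is a const fn), and shows a runtime Vec filled with Vec::resize_with as the "something like a Vec" option, using std instead of once_cell / lazy_static. MemoryMap is simplified to String fields, and EMPTY_BUCKET, HASH_TABLE and make_vec_table are hypothetical names.

```rust
use std::sync::Mutex;

// Simplified stand-in for the question's MemoryMap (assumption: String key and
// value instead of SimpleEntry / Box<dyn Storable>, to keep this self-contained).
#[allow(dead_code)]
struct MemoryMap {
    key: String,
    value: String,
    next: Option<Box<MemoryMap>>,
}

// Same size as in the question.
const TABLE_SIZE: usize = (u16::MAX - 1) as usize;

// Option<Box<MemoryMap>> is not Copy, but a constant item is accepted as the
// operand of [x; N]: the constant is instantiated once per element.
const EMPTY_BUCKET: Option<Box<MemoryMap>> = None;

// A Mutex instead of `static mut` (assumption: safe shared mutation is wanted).
// Mutex::new is a const fn, so no lazy initialisation is needed here.
static HASH_TABLE: Mutex<[Option<Box<MemoryMap>>; TABLE_SIZE]> =
    Mutex::new([EMPTY_BUCKET; TABLE_SIZE]);

// Runtime alternative: resize_with calls the closure for each new slot,
// so the element type needs neither Copy nor Clone.
fn make_vec_table(size: usize) -> Vec<Option<Box<MemoryMap>>> {
    let mut table = Vec::new();
    table.resize_with(size, || None);
    table
}

fn main() {
    let mut table = HASH_TABLE.lock().unwrap();
    table[42] = Some(Box::new(MemoryMap {
        key: "k".to_string(),
        value: "v".to_string(),
        next: None,
    }));
    assert!(table[42].is_some() && table[0].is_none());
    drop(table); // release the lock

    let vec_table = make_vec_table(TABLE_SIZE);
    assert_eq!(vec_table.len(), TABLE_SIZE);
}
```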
Though frankly I don't get why you're even creating a global mutable thing, let alone one initialised with a fixed 65k entries (of 64-bit pointers).
This question already has answers here:
Why can't I store a value and a reference to that value in the same struct?
(4 answers)
Closed 8 months ago.
The following is a snippet of more complicated code. The idea is to load an SQL table and build a hashmap with one of the table struct's fields as the key and the struct itself as the value (implementation details are not important, since the code works fine if I clone the String; however, the Strings in the DB can be arbitrarily long and cloning can be expensive).
The following code fails with
error[E0382]: use of partially moved value: `foo`
--> src/main.rs:24:35
|
24 | foo_hashmap.insert(foo.a, foo);
| ----- ^^^ value used here after partial move
| |
| value partially moved here
|
= note: partial move occurs because `foo.a` has type `String`, which does not implement the `Copy` trait
For more information about this error, try `rustc --explain E0382`.
use std::collections::HashMap;
struct Foo {
    a: String,
    b: String,
}
fn main() {
    let foo_1 = Foo {
        a: "bar".to_string(),
        b: "bar".to_string(),
    };
    let foo_2 = Foo {
        a: "bar".to_string(),
        b: "bar".to_string(),
    };
    let foo_vec = vec![foo_1, foo_2];
    let mut foo_hashmap = HashMap::new();
    foo_vec.into_iter().for_each(|foo| {
        foo_hashmap.insert(foo.a, foo); // foo.a.clone() will make this compile
    });
}
The struct Foo cannot implement Copy since its fields are Strings. I tried wrapping foo.a with Rc::new(RefCell::new()), but then ran into RefCell<String> not implementing Hash, so I'm currently unsure whether to use something else for the struct fields (would Cow work?) or to handle this logic within the for_each loop.
There are at least two problems here. First, the resulting HashMap<K, V> would be a self-referential struct, as K would borrow from V; there are many questions and answers on SO about the pitfalls of this. Second, even if you could construct such a HashMap, you could easily break the guarantees provided by HashMap, which lets you modify V while assuming that K always stays constant: there is no way to get a &mut K from a HashMap, but you can get a &mut V. If K were actually a &V, one could easily modify K through V (by mutating Foo.a) and break the map.
One possibility is to change Foo.a from a String to an Rc<str>, which you can clone with minimal runtime cost in order to put the value into both K and V. As Rc<str> implements Borrow<str>, you can still look up values in the map by means of &str. This still has the (theoretical) downside that you can break the map by getting a &mut Foo from the map and std::mem::swap-ing its a field, which makes it impossible to look up the correct value from its key; but you'd have to do that deliberately.
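A minimal sketch of the Rc<str> option, adapted from the question's code (build_map is a hypothetical helper): the key and foo.a become two handles to the same allocation, and lookups still take a plain &str.

```rust
use std::collections::HashMap;
use std::rc::Rc;

// Foo.a is now an Rc<str>: cloning it only bumps a reference count,
// the string bytes themselves are not copied.
#[allow(dead_code)]
#[derive(Debug)]
struct Foo {
    a: Rc<str>,
    b: String,
}

fn build_map(foos: Vec<Foo>) -> HashMap<Rc<str>, Foo> {
    let mut map = HashMap::new();
    for foo in foos {
        // Cheap clone: the key and foo.a share one allocation.
        map.insert(Rc::clone(&foo.a), foo);
    }
    map
}

fn main() {
    let foos = vec![
        Foo { a: Rc::from("foo"), b: "bar".to_string() },
        Foo { a: Rc::from("foobar"), b: "barfoo".to_string() },
    ];
    let map = build_map(foos);
    // Rc<str>: Borrow<str>, so a &str works as the lookup key.
    println!("{:?}", map.get("foo").unwrap());
    // The key and the value's field point to the same string data.
    let (key, value) = map.get_key_value("foo").unwrap();
    assert!(Rc::ptr_eq(key, &value.a));
}
```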
Another option is to actually use a HashSet instead of a HashMap, and use a newtype for Foo which behaves like a Foo.a. You'd have to implement PartialEq, Eq, Hash (and Borrow<str> for good measure) like this:
use std::collections::HashSet;
#[derive(Debug)]
struct Foo {
    a: String,
    b: String,
}
/// A newtype for `Foo` which behaves like a `str`
#[derive(Debug)]
struct FooEntry(Foo);
/// `FooEntry` compares to other `FooEntry` only via `.a`
impl PartialEq<FooEntry> for FooEntry {
    fn eq(&self, other: &FooEntry) -> bool {
        self.0.a == other.0.a
    }
}
impl Eq for FooEntry {}
/// It also hashes the same way as a `Foo.a`
impl std::hash::Hash for FooEntry {
    fn hash<H>(&self, hasher: &mut H)
    where
        H: std::hash::Hasher,
    {
        self.0.a.hash(hasher);
    }
}
/// Due to the above, we can implement `Borrow`, so now we can look up
/// a `FooEntry` in the Set using &str
impl std::borrow::Borrow<str> for FooEntry {
    fn borrow(&self) -> &str {
        &self.0.a
    }
}
fn main() {
    let foo_1 = Foo {
        a: "foo".to_string(),
        b: "bar".to_string(),
    };
    let foo_2 = Foo {
        a: "foobar".to_string(),
        b: "barfoo".to_string(),
    };
    let foo_vec = vec![foo_1, foo_2];
    let mut foo_hashmap = HashSet::new();
    foo_vec.into_iter().for_each(|foo| {
        foo_hashmap.insert(FooEntry(foo));
    });
    // Look up `Foo` using &str as keys...
    println!("{:?}", foo_hashmap.get("foo").unwrap().0);
    println!("{:?}", foo_hashmap.get("foobar").unwrap().0);
}
Notice that HashSet provides no way to get a &mut FooEntry, for the reasons described above. If you need to mutate entries, you'd have to use RefCell (and read what the HashSet documentation has to say about this).
The third option is to simply clone() foo.a as you described. Given the above, this is probably the simplest solution. If using an Rc<str> doesn't bother you for other reasons, this would be my choice.
Sidenote: If you don't need to modify a and/or b, a Box<str> instead of String is smaller by one machine word.
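A quick check of that size difference, assuming a typical current target where String is laid out as pointer, capacity and length (the standard library does not formally promise this layout). FrozenFoo is a hypothetical immutable variant of Foo.

```rust
use std::mem::size_of;

// Hypothetical variant of Foo whose fields are never modified.
struct FrozenFoo {
    a: Box<str>,
    b: Box<str>,
}

fn main() {
    let word = size_of::<usize>();
    // Box<str> is a fat pointer: data pointer + length.
    assert_eq!(size_of::<Box<str>>(), 2 * word);
    // String additionally stores its capacity.
    assert_eq!(size_of::<String>(), 3 * word);

    // into_boxed_str discards spare capacity (it may reallocate to do so).
    let a: Box<str> = String::from("foo").into_boxed_str();
    let frozen = FrozenFoo { a, b: "bar".into() };
    println!("{} {}", frozen.a, frozen.b);
}
```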
I am new to Rust. When I read chapter 15 of The Rust Programming Language, I couldn't work out why one should use Boxes in recursive data structures instead of regular references. Section 15.1 of the book explains that indirection is required to avoid infinitely sized structures, but it does not explain why to use Box.
#[derive(Debug)]
enum FunctionalList<'a> {
    Cons(u32, &'a FunctionalList<'a>),
    Nil,
}
use FunctionalList::{Cons, Nil};
fn main() {
    let list = Cons(1, &Cons(2, &Cons(3, &Nil)));
    println!("{:?}", list);
}
The code above compiles and produces the desired output. It seems that using FunctionalList to store a small amount of data on the stack works perfectly well. Can this code cause any trouble?
It is true that the FunctionalList works in this simple case. However, we will run into some difficulties if we try to use this structure in other ways. For instance, suppose we tried to construct a FunctionalList and then return it from a function:
#[derive(Debug)]
enum FunctionalList<'a> {
    Cons(u32, &'a FunctionalList<'a>),
    Nil,
}
use FunctionalList::{Cons, Nil};
fn make_list(x: u32) -> FunctionalList {
    return Cons(x, &Cons(x + 1, &Cons(x + 2, &Nil)));
}
fn main() {
    let list = make_list(1);
    println!("{:?}", list);
}
This results in the following compile error:
error[E0106]: missing lifetime specifier
--> src/main.rs:9:25
|
9 | fn make_list(x: u32) -> FunctionalList {
| ^^^^^^^^^^^^^^ help: consider giving it an explicit bounded or 'static lifetime: `FunctionalList + 'static`
If we follow the hint and add a 'static lifetime, then we instead get this error:
error[E0515]: cannot return value referencing temporary value
--> src/main.rs:10:12
|
10 | return Cons(x, &Cons(x + 1, &Cons(x + 2, &Nil)));
| ^^^^^^^^^^^^^^^^^^^^^^-----------------^^
| | |
| | temporary value created here
| returns a value referencing data owned by the current function
The issue is that the inner FunctionalList values here are owned by implicit temporary variables whose scope ends at the end of the make_list function. These values would thus be dropped at the end of the function, leaving dangling references to them, which Rust disallows, hence the borrow checker rejects this code.
In contrast, if FunctionalList had been defined to Box its FunctionalList component, then ownership would have been moved from the temporary value into the containing FunctionalList, and we would have been able to return it without any problem.
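For comparison, a sketch of the Box-based definition from section 15.1 with the same make_list: it compiles because each Box owns the rest of the list, so no lifetime parameter is needed and nothing is left borrowed from the function's temporaries.

```rust
#[derive(Debug)]
enum FunctionalList {
    // The Box owns the tail of the list.
    Cons(u32, Box<FunctionalList>),
    Nil,
}

use FunctionalList::{Cons, Nil};

fn make_list(x: u32) -> FunctionalList {
    // Each inner list is moved into its Box, and each Box into its parent,
    // so the returned value owns the whole chain.
    Cons(x, Box::new(Cons(x + 1, Box::new(Cons(x + 2, Box::new(Nil))))))
}

fn main() {
    let list = make_list(1);
    println!("{:?}", list); // prints Cons(1, Cons(2, Cons(3, Nil)))
}
```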
With your original FunctionalList, the thing we have to think about is that every value in Rust has to have an owner somewhere; so if, as in this case, the FunctionalList is not the owner of its inner FunctionalLists, then that ownership has to reside somewhere else. In your example, that owner was an implicit temporary variable, but in more complex situations we could use a different kind of external owner. Here's an example of using a TypedArena (from the typed-arena crate) to own the data, so that we can still implement a variation of the make_list function:
use typed_arena::Arena;
#[derive(Debug)]
enum FunctionalList<'a> {
    Cons(u32, &'a FunctionalList<'a>),
    Nil,
}
use FunctionalList::{Cons, Nil};
fn make_list<'a>(x: u32, arena: &'a Arena<FunctionalList<'a>>) -> &mut FunctionalList<'a> {
    let l0 = arena.alloc(Nil);
    let l1 = arena.alloc(Cons(x + 2, l0));
    let l2 = arena.alloc(Cons(x + 1, l1));
    let l3 = arena.alloc(Cons(x, l2));
    return l3;
}
fn main() {
    let arena = Arena::new();
    let list = make_list(1, &arena);
    println!("{:?}", list);
}
In this case, we adapted the return type of make_list to return only a mutable reference to a FunctionalList, instead of returning an owned FunctionalList, since now the ownership resides in the arena.
Given the following code:
trait Function {
    fn filter(&self);
}
#[derive(Debug, Copy, Clone)]
struct Kidney {}
impl Function for Kidney {
    fn filter(&self) {
        println!("filtered");
    }
}
fn main() {
    let k = Kidney {};
    let f: &dyn Function = &k;
    //let k1 = (*f); //--> This gives a "size not satisfied" error
    (*f).filter(); //--> Works; what exactly happens here?
}
I am not sure why it compiles. I was expecting the last statement to fail. I guess I have overlooked some fundamentals while learning Rust, as I don't understand why dereferencing a trait object (which lives behind a pointer) should compile.
Is this issue similar to the following case?
let v = vec![1, 2, 3, 4];
//let s: &[i32] = *v;
println!("{}", (*v)[0]);
*v gives a slice, but a slice is unsized, so again it is not clear to me how this compiles. If I uncomment the second statement I get
| let s:&[i32]= *v;
| ^^
| |
| expected &[i32], found slice
| help: consider borrowing here: `&*v`
|
= note: expected type `&[i32]`
found type `[{integer}]`
Does expected type &[i32] mean "expected a reference to a slice"?
Dereferencing a trait object is no problem. In fact, it must be dereferenced at some point, otherwise it would be quite useless.
let k1 = (*f); fails not because of dereferencing but because you try to put the raw trait object on the stack (this is where local variables live). Values on the stack must have a size known at compile time, which is not the case for trait objects because any type could implement the trait.
Here is an example where structs of different sizes implement the trait:
trait Function {
    fn filter(&self);
}
#[derive(Debug, Copy, Clone)]
struct Kidney {}
impl Function for Kidney {
    fn filter(&self) {
        println!("filtered");
    }
}
#[derive(Debug, Copy, Clone)]
struct Liver {
    size: f32
}
impl Function for Liver {
    fn filter(&self) {
        println!("filtered too!");
    }
}
fn main() {
    let k = Kidney {};
    let l = Liver { size: 1.0 };
    let f: &dyn Function;
    if true {
        f = &k;
    } else {
        f = &l;
    }
    // Now what is the size of *f - Kidney (0 bytes) or Liver (4 bytes)?
}
(*f).filter(); works because the temporarily dereferenced object is not put on the stack. In fact, this is the same as f.filter(). Rust automatically applies as many dereferences as required to get to an actual object. This is documented in the book.
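A sketch reusing the Kidney and Liver types (written with &dyn Function, the current spelling of &Function). std::mem::size_of_val only takes a reference, so it can report the size of *f at runtime, from the trait object's metadata, without ever moving the unsized value onto the stack. The dynamic_size helper is hypothetical.

```rust
use std::mem::size_of_val;

trait Function {
    fn filter(&self);
}

struct Kidney {}
impl Function for Kidney {
    fn filter(&self) {
        println!("filtered");
    }
}

struct Liver {
    size: f32,
}
impl Function for Liver {
    fn filter(&self) {
        println!("filtered too! (size {})", self.size);
    }
}

// Works through the reference: the size comes from the vtable at runtime.
fn dynamic_size(f: &dyn Function) -> usize {
    size_of_val(f)
}

fn main() {
    let k = Kidney {};
    let l = Liver { size: 1.0 };
    let organs: [&dyn Function; 2] = [&k, &l];
    for f in organs {
        (*f).filter(); // explicit deref, then auto-ref for &self
        f.filter(); // auto-deref: the same call
        println!("size of *f = {}", dynamic_size(f)); // 0, then 4
    }
}
```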
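What happens in the second case is that Vec implements Deref to slices, so it gets all methods implemented for slices for free. *v gives you the dereferenced slice itself (of type [i32]), which you then try to assign to a variable of type &[i32], i.e. a reference to a slice. That is an obvious type error, which is why the compiler suggests borrowing again with &*v.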
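A short sketch of the fix the compiler hints at: *v is the unsized [i32] itself, so you borrow it again to get a &[i32]. The slices helper is hypothetical and only shows three equivalent spellings.

```rust
// Three equivalent ways to get a &[i32] from a Vec<i32>.
fn slices(v: &Vec<i32>) -> [&[i32]; 3] {
    let a: &[i32] = &**v; // &Vec<i32> -> Vec<i32> -> [i32] -> &[i32]
    let b: &[i32] = &v[..]; // full-range indexing
    let c: &[i32] = v; // deref coercion from &Vec<i32>
    [a, b, c]
}

fn main() {
    let v = vec![1, 2, 3, 4];
    // `let s: &[i32] = *v;` fails: *v has type [i32], not &[i32].
    let s: &[i32] = &*v;
    // (*v)[0] only copies one i32 out of the slice; the slice itself never moves.
    println!("{} {}", (*v)[0], s.len());
    for slice in slices(&v) {
        assert_eq!(slice, s);
    }
}
```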
Judging by the MIR produced by the first piece of code, (*f).filter() is equivalent to f.filter(); it appears that the compiler is aware that since filter is a method on &self, dereferencing it doesn't serve any purpose and is omitted altogether.
The second case, however, is different, because dereferencing the slice introduces bounds-checking code. In my opinion the compiler should also be able to tell that this operation (dereferencing) doesn't introduce any meaningful changes (and/or that there won't be an out-of-bounds error) and treat it as regular slice indexing, but there might be some reason behind this.
I come from a Java/C#/JavaScript background and I am trying to implement a Dictionary that assigns each passed string an id that never changes. The dictionary should be able to return a string by the specified id. This allows storing data with a lot of repeated strings far more efficiently in the file system, because only the ids of the strings would be stored instead of the entire strings.
I thought that a struct with a HashMap and a Vec would do but it turned out to be more complicated than that.
I started by using &str both as the HashMap key and as the Vec item, as in the following sample. The value in the HashMap serves as an index into the Vec.
use std::collections::HashMap;
pub struct Dictionary<'a> {
    values_map: HashMap<&'a str, u32>,
    keys_map: Vec<&'a str>
}
impl<'a> Dictionary<'a> {
    pub fn put_and_get_key(&mut self, value: &'a str) -> u32 {
        match self.values_map.get_mut(value) {
            None => {
                let id_usize = self.keys_map.len();
                let id = id_usize as u32;
                self.keys_map.push(value);
                self.values_map.insert(value, id);
                id
            },
            Some(&mut id) => id
        }
    }
}
This works just fine until it turns out that the strs need to be stored somewhere, preferably in this same struct as well. I tried to store a Box<str> in the Vec and &'a str in the HashMap.
pub struct Dictionary<'a> {
    values_map: HashMap<&'a str, u32>,
    keys_map: Vec<Box<str>>
}
The borrow checker did not allow this, of course, because it would have allowed a dangling pointer in the HashMap when an item is removed from the Vec (or in fact sometimes when another item is added to the Vec, but that is off-topic here).
I understood that I either need to write unsafe code or use some form of shared ownership, the simplest kind of which seems to be Rc. Using Rc<Box<str>> looks like it introduces double indirection, but there seems to be no simple way to construct an Rc<str> at the moment.
use std::collections::HashMap;
use std::rc::Rc;
pub struct Dictionary {
    values_map: HashMap<Rc<Box<str>>, u32>,
    keys_map: Vec<Rc<Box<str>>>
}
impl Dictionary {
    pub fn put_and_get_key(&mut self, value: &str) -> u32 {
        match self.values_map.get_mut(value) {
            None => {
                let id_usize = self.keys_map.len();
                let id = id_usize as u32;
                let value_to_store = Rc::new(value.to_owned().into_boxed_str());
                // Clone the Rc (not the string) so both containers share it
                self.keys_map.push(Rc::clone(&value_to_store));
                self.values_map.insert(value_to_store, id);
                id
            },
            Some(&mut id) => id
        }
    }
}
Everything seems fine with regard to ownership semantics, but the code above does not compile because the HashMap now expects an Rc, not an &str:
error[E0277]: the trait bound `std::rc::Rc<Box<str>>: std::borrow::Borrow<str>` is not satisfied
--> src/file_structure/sample_dictionary.rs:14:31
|
14 | match self.values_map.get_mut(value) {
| ^^^^^^^ the trait `std::borrow::Borrow<str>` is not implemented for `std::rc::Rc<Box<str>>`
|
= help: the following implementations were found:
= help: <std::rc::Rc<T> as std::borrow::Borrow<T>>
Questions:
Is there a way to construct an Rc<str>?
Which other structures, methods or approaches could help to resolve this problem? Essentially, I need a way to efficiently store two maps, string-by-id and id-by-string, and be able to retrieve an id by &str, i.e. without any excessive allocations.
Is there a way to construct an Rc<str>?
Not with Rc::new: it requires a Sized argument, so it cannot take a str directly. However, Rc<str> implements From<&str> and From<String> (since Rust 1.21), so Rc::from(value) produces an Rc<str> without the extra Box.
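A minimal sketch of the question's Dictionary built on Rc<str>, assuming Rust 1.21 or later (where Rc<str> implements From<&str>); get_value is a hypothetical accessor for the id-to-string direction. Each distinct string is stored once, in a single Rc allocation shared by the map and the vector, and lookups by &str allocate nothing because Rc<str> implements Borrow<str>.

```rust
use std::collections::HashMap;
use std::rc::Rc;

#[derive(Default)]
pub struct Dictionary {
    // Both containers hold handles to the same Rc<str> allocation.
    values_map: HashMap<Rc<str>, u32>,
    keys_map: Vec<Rc<str>>,
}

impl Dictionary {
    pub fn put_and_get_key(&mut self, value: &str) -> u32 {
        // Lookup by &str: no allocation needed.
        if let Some(&id) = self.values_map.get(value) {
            return id;
        }
        let id = self.keys_map.len() as u32;
        // Copy the bytes once into a new Rc<str>, then share it.
        let stored: Rc<str> = Rc::from(value);
        self.keys_map.push(Rc::clone(&stored));
        self.values_map.insert(stored, id);
        id
    }

    pub fn get_value(&self, id: u32) -> Option<&str> {
        self.keys_map.get(id as usize).map(|s| &**s)
    }
}

fn main() {
    let mut dict = Dictionary::default();
    let a = dict.put_and_get_key("alpha");
    let b = dict.put_and_get_key("beta");
    // An existing string keeps its original id.
    assert_eq!(dict.put_and_get_key("alpha"), a);
    assert_eq!(dict.get_value(b), Some("beta"));
}
```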
Which other structures, methods or approaches could help to resolve this problem?
If you look at the signature of get you'll notice:
fn get<Q: ?Sized>(&self, k: &Q) -> Option<&V>
where K: Borrow<Q>, Q: Hash + Eq
As a result, you could search by &str if K implements Borrow<str>.
String implements Borrow<str>, so the simplest solution is to use String as the key. Sure, it means you'll actually have two Strings instead of one... but it's simple. A String is certainly simpler to use than a Box<str> (although it uses one more machine word, i.e. 8 more bytes on a 64-bit platform).
If you want to shave off this cost, you can use a custom structure instead:
#[derive(Clone, Debug)]
struct RcStr(Rc<String>);
And then implement Borrow<str> for it. You'll then have 2 allocations per key (1 for Rc and 1 for String). Depending on the size of your String, it might consume less or more memory.
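A sketch of that RcStr newtype with its Borrow<str> impl. Borrow requires that Hash and Eq give the same results on RcStr as on the borrowed str; deriving them works here because the derive forwards to Rc<String>, which hashes and compares by string content just like str. The ids and strings containers in the example are hypothetical.

```rust
use std::borrow::Borrow;
use std::collections::HashMap;
use std::rc::Rc;

// Two allocations per key: the Rc box (holding the String) and the String's buffer.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
struct RcStr(Rc<String>);

// Lets HashMap<RcStr, _>::get accept a plain &str.
impl Borrow<str> for RcStr {
    fn borrow(&self) -> &str {
        self.0.as_str()
    }
}

fn main() {
    let mut ids: HashMap<RcStr, u32> = HashMap::new();
    let mut strings: Vec<RcStr> = Vec::new();

    let key = RcStr(Rc::new(String::from("hello")));
    strings.push(key.clone()); // clones the Rc, not the String
    ids.insert(key, 0);

    assert_eq!(ids.get("hello"), Some(&0));
    assert_eq!(Rc::strong_count(&strings[0].0), 2);
}
```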
If you wish to go further (why not?), here are some ideas:
implement your own reference-counted string, in a single heap allocation,
use a single arena for the slices inserted in the Dictionary,
...
I have a struct representing a grid of data, and accessors for the rows and columns. I'm trying to add accessors for the rows and columns which return iterators instead of Vec.
use std::slice::Iter;
#[derive(Debug)]
pub struct Grid<Item: Copy> {
    raw: Vec<Vec<Item>>
}
impl<Item: Copy> Grid<Item> {
    pub fn new(data: Vec<Vec<Item>>) -> Grid<Item> {
        Grid { raw: data }
    }
    pub fn width(&self) -> usize {
        self.rows()[0].len()
    }
    pub fn height(&self) -> usize {
        self.rows().len()
    }
    pub fn rows(&self) -> Vec<Vec<Item>> {
        self.raw.to_owned()
    }
    pub fn cols(&self) -> Vec<Vec<Item>> {
        let mut cols = Vec::new();
        // One column per index along a row
        for i in 0..self.width() {
            let col = self.rows().iter()
                .map(|row| row[i])
                .collect::<Vec<Item>>();
            cols.push(col);
        }
        cols
    }
    pub fn rows_iter(&self) -> Iter<Vec<Item>> {
        // LIFETIME ERROR HERE
        self.rows().iter()
    }
    pub fn cols_iter(&self) -> Iter<Vec<Item>> {
        // LIFETIME ERROR HERE
        self.cols().iter()
    }
}
Both functions rows_iter and cols_iter have the same problem: error: borrowed value does not live long enough. I've tried a lot of things, but pared it back to the simplest thing to post here.
You can use the method into_iter, which returns std::vec::IntoIter. The function iter only borrows the data source being iterated over, whereas into_iter takes ownership of it. Thus the vector will live as long as the iterator itself.
pub fn cols_iter(&self) -> std::vec::IntoIter<Vec<Item>> {
    self.cols().into_iter()
}
However, I think that the design of your Grid type could be improved a lot. Always cloning a vector is not a good thing (to name one issue).
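One possible direction, sketched under the assumption that all rows have the same length: rows_iter borrows self.raw directly, and cols_iter builds each column lazily with iterator adapters, so both results borrow from &self rather than from a temporary Vec and nothing is cloned. This is an illustration, not code from either answer.

```rust
use std::slice::Iter;

#[derive(Debug)]
pub struct Grid<Item: Copy> {
    raw: Vec<Vec<Item>>,
}

impl<Item: Copy> Grid<Item> {
    pub fn new(data: Vec<Vec<Item>>) -> Grid<Item> {
        Grid { raw: data }
    }

    pub fn width(&self) -> usize {
        self.raw.first().map_or(0, |row| row.len())
    }

    pub fn height(&self) -> usize {
        self.raw.len()
    }

    // Borrows self.raw: the iterator lives as long as &self, not a temporary.
    pub fn rows_iter(&self) -> Iter<'_, Vec<Item>> {
        self.raw.iter()
    }

    // Each column is an iterator that reads row[i] from every row on demand.
    pub fn cols_iter<'a>(&'a self) -> impl Iterator<Item = impl Iterator<Item = Item> + 'a> + 'a {
        (0..self.width()).map(move |i| self.raw.iter().map(move |row| row[i]))
    }
}

fn main() {
    let grid = Grid::new(vec![vec![1, 2, 3], vec![4, 5, 6]]);
    for row in grid.rows_iter() {
        println!("row: {:?}", row);
    }
    for col in grid.cols_iter() {
        println!("col: {:?}", col.collect::<Vec<_>>());
    }
}
```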
Iterators only contain borrowed references to the original data structure; they don't take ownership of it. Therefore, a vector must live longer than an iterator on that vector.
rows and cols allocate and return a new Vec. rows_iter and cols_iter are trying to return an iterator on a temporary Vec. This Vec will be deallocated before rows_iter or cols_iter return. That means that an iterator on that Vec must be deallocated before the function returns. However, you're trying to return the iterator from the function, which would make the iterator live longer than the end of the function.
There is simply no way to make rows_iter and cols_iter compile as is. I believe these methods are simply unnecessary, since you already provide the public rows and cols methods.