The recommended way to create a regular boxed slice (i.e. Box<[T]>) seems to be to first create a Vec<T> and then call .into_boxed_slice(). However, nothing similar seems to work if I want the slice to be wrapped in UnsafeCell.
A solution with unsafe code is fine, but I'd really like to avoid having to manually manage the memory.
Apart from converting a Vec, the usual non-unsafe way to create a Box<[T]> directly is via Box::from, given a &[T] as the parameter. This is because [T] is ?Sized and can't be passed by value as a parameter. This in turn requires each T to be copied (or cloned) from behind the reference into the new Box. But UnsafeCell implements neither Copy nor Clone, regardless of whether T does. Discussion about making UnsafeCell Copy has been going on for years without a final conclusion, due to safety concerns.
If you really, really want a Box<UnsafeCell<[T]>>, there are only two ways:
Because Box implements CoerceUnsized, and UnsafeCell<[T; N]> can be unsized to UnsafeCell<[T]> (just as [T; N] unsizes to [T]), you can create a Box<UnsafeCell<[T; N]>> and coerce it to a Box<UnsafeCell<[T]>>. This limits you to initializing from fixed-size arrays.
Unsize coercion:
fn main() {
use std::cell::UnsafeCell;
let x: [u8;3] = [1,2,3];
let c: Box<UnsafeCell<[_]>> = Box::new(UnsafeCell::new(x));
}
Because UnsafeCell is #[repr(transparent)], you can create a Box<[T]> and unsafely transmute it into a Box<UnsafeCell<[T]>>, as UnsafeCell<[T]> is guaranteed to have the same memory layout as [T], given that [T] doesn't use niche values (even if T does).
Transmute:
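use std::cell::UnsafeCell;
use std::mem;
// enclose the transmute in a function accepting and returning proper type-pairs
fn into_boxed_unsafecell<T>(inp: Box<[T]>) -> Box<UnsafeCell<[T]>> {
// SAFETY: UnsafeCell<[T]> is #[repr(transparent)] over [T], so both
// boxes describe identically laid-out data with the same length.
unsafe {
mem::transmute(inp)
}
}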
fn main() {
let x = vec![1,2,3];
let b = x.into_boxed_slice();
let c: Box<UnsafeCell<[_]>> = into_boxed_unsafecell(b);
}
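Both approaches can be wrapped in small helpers. The sketch below is an illustration rather than part of the original answer, and the function names are hypothetical. from_array uses the unsize coercion. from_boxed_slice reuses the Box<[T]> allocation through a raw-pointer cast with Box::into_raw/Box::from_raw instead of mem::transmute. It relies only on UnsafeCell<[T]> having the same layout and slice-length metadata as [T].

```rust
use std::cell::UnsafeCell;

/// Unsize coercion: Box<UnsafeCell<[T; N]>> -> Box<UnsafeCell<[T]>>, no unsafe needed.
fn from_array<T, const N: usize>(arr: [T; N]) -> Box<UnsafeCell<[T]>> {
    Box::new(UnsafeCell::new(arr))
}

/// Raw-pointer cast: reuses the Box<[T]> allocation without mem::transmute.
fn from_boxed_slice<T>(b: Box<[T]>) -> Box<UnsafeCell<[T]>> {
    // The pointer cast keeps the slice-length metadata of the fat pointer.
    let raw = Box::into_raw(b) as *mut UnsafeCell<[T]>;
    // SAFETY: UnsafeCell<[T]> is #[repr(transparent)] over [T], and `raw`
    // came from Box::into_raw, so ownership is handed back exactly once.
    unsafe { Box::from_raw(raw) }
}

fn main() {
    let a = from_array([1u8, 2, 3]);
    let mut b = from_boxed_slice(vec![10u32, 20, 30].into_boxed_slice());

    // Shared access: writing through `get()` requires unsafe and is only sound
    // while no other reference to the contents is alive.
    unsafe {
        (*b.get())[1] = 21;
        println!("{:?}", &*a.get());
    }

    // Exclusive access through `get_mut()` is safe.
    b.get_mut()[2] = 31;
    println!("{:?}", b.get_mut());
}
```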
Having said all this: I strongly suspect you are suffering from the XY problem. A Box<UnsafeCell<[T]>> is a very strange type (especially compared to UnsafeCell<Box<[T]>>). You may want to give details on what you are trying to accomplish with such a type.
Just swap the nesting to UnsafeCell<Box<[T]>>:
use std::cell::UnsafeCell;
fn main() {
let mut res: UnsafeCell<Box<[u32]>> = UnsafeCell::new(vec![1, 2, 3, 4, 5].into_boxed_slice());
unsafe {
println!("{}", (*res.get())[1]);
res.get_mut()[1] = 10;
println!("{}", (*res.get())[1]);
}
}
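In the snippet above, only the reads through res.get() actually need unsafe. UnsafeCell::get_mut and UnsafeCell::into_inner are safe methods because they require exclusive access to the cell. Here is a minimal sketch of the same data handled entirely through those methods:

```rust
use std::cell::UnsafeCell;

fn main() {
    let mut res: UnsafeCell<Box<[u32]>> = UnsafeCell::new(vec![1, 2, 3, 4, 5].into_boxed_slice());

    // `&mut self` proves exclusivity, so no unsafe block is needed here.
    res.get_mut()[1] = 10;

    // Consuming the cell also hands back the Box safely.
    let boxed: Box<[u32]> = res.into_inner();
    assert_eq!(&boxed[..], &[1, 10, 3, 4, 5]);
    println!("{:?}", boxed);
}
```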
Given the code snippet below:
use std::{io::BufWriter, pin::Pin};
pub struct SelfReferential {
pub writer: BufWriter<&'static mut [u8]>, // borrowed from buffer
pub buffer: Pin<Box<[u8]>>,
}
#[cfg(test)]
mod tests {
use std::io::Write;
use super::*;
fn init() -> SelfReferential {
let mut buffer = Pin::new(vec![0; 12].into_boxed_slice());
let writer = unsafe { buffer.as_mut().get_unchecked_mut() };
let writer = unsafe { (writer as *mut [u8]).as_mut().unwrap() };
let writer = BufWriter::new(writer);
SelfReferential { writer, buffer }
}
#[test]
fn move_works() {
let mut sr = init();
sr.writer.write(b"hello ").unwrap();
sr.writer.flush().unwrap();
let mut slice = &mut sr.buffer[6..];
slice.write(b"world!").unwrap();
assert_eq!(&sr.buffer[..], b"hello world!".as_ref());
let mut sr_moved = sr;
sr_moved.writer.write(b"W").unwrap();
sr_moved.writer.flush().unwrap();
assert_eq!(&sr_moved.buffer[..], b"hello World!".as_ref());
}
}
The first question: is it OK to assign a 'static lifetime to the mutable slice reference in BufWriter? Technically speaking, it's bound to the lifetime of the struct instance itself, and AFAIK there's no safe way to invalidate it.
The second question: besides the fact that the unsafe instantiation of this type in the test example creates two mutable references into the underlying buffer, are there any other potential dangers associated with such an "unidiomatic" (for lack of a better word) type?
is it OK to assign 'static lifetime to mutable slice reference in BufWriter?
Sort of, but there's a bigger problem. The lifetime itself is no worse than any other choice, because no lifetime you could use here is really accurate. But it is not safe to expose that reference, because then it can be moved out of the struct and outlive the buffer:
let w: BufWriter<&'static mut [u8]> = {
let sr = init();
sr.writer
};
// `sr.buffer` has now been dropped, so `w` has a dangling reference
is there any other potential dangers associated with such an "unidiomatic" (for the lack of better word) type?
Yes, it's undefined behavior. Box isn't just managing an allocation; it also (currently) signals a claim of unique, non-aliasing access to the contents. You violate that non-aliasing by creating the writer and then moving the buffer: even though the heap memory is not actually touched, the move of buffer counts as invalidating all references into it.
This is an area of Rust semantics which is not yet fully nailed down, but as far as the current compiler is concerned, this is UB. You can see this if you run your test code under the Miri interpreter.
The good news is, what you're trying to do is a very common desire and people have worked on the problem. I personally recommend using ouroboros — with the help of a macro, it allows you to create the struct you want without writing any new unsafe code. There will be some restrictions on how you use the writer, but nothing you can't tidy out of the way by writing an impl io::Write for SelfReferential. Another, newer library in this space is yoke; I haven't tried it.
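If a crate is not an option, one std-only alternative is to let the writer own the buffer instead of borrowing from it. This is a different technique from the self-referential struct, not what the answer above proposes. std::io::Cursor<Box<[u8]>> implements Write, so a BufWriter<Cursor<Box<[u8]>>> holds no borrow that a move could invalidate. The type and method names below are hypothetical, and this is only a sketch.

```rust
use std::io::{BufWriter, Cursor, Write};

/// Hypothetical owning replacement for `SelfReferential`: the writer owns
/// the buffer, so nothing can dangle and moving the struct is fine.
pub struct Owned {
    writer: BufWriter<Cursor<Box<[u8]>>>,
}

impl Owned {
    pub fn new(len: usize) -> Self {
        let buffer = vec![0u8; len].into_boxed_slice();
        Owned { writer: BufWriter::new(Cursor::new(buffer)) }
    }

    /// Read access to the underlying buffer (only flushed bytes are visible).
    pub fn buffer(&self) -> &[u8] {
        self.writer.get_ref().get_ref()
    }
}

impl Write for Owned {
    fn write(&mut self, buf: &[u8]) -> std::io::Result<usize> {
        self.writer.write(buf)
    }

    fn flush(&mut self) -> std::io::Result<()> {
        self.writer.flush()
    }
}

fn main() -> std::io::Result<()> {
    let mut sr = Owned::new(12);
    sr.write_all(b"hello ")?;
    sr.flush()?;
    let mut moved = sr; // moving is fine: nothing borrows from the buffer
    moved.write_all(b"world!")?;
    moved.flush()?;
    assert_eq!(moved.buffer(), b"hello world!");
    Ok(())
}
```

The trade-off is that direct writes into the buffer (like `&mut sr.buffer[6..]` in the test) have to go through the cursor as well, rather than through a second mutable reference.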
I am working on some software where I am managing a buffer of floats in a Vec<T> where T is either an f32 or f64. I sometimes need to interpret this buffer, or sections of it, as a mathematical vector. To this end, I am taking advantage of MatrixSlice and friends in nalgebra.
I can create a DVectorSliceMut, for example, the following way
fn as_vector<'a>(slice: &'a mut [f64]) -> DVectorSliceMut<'a, f64> {
DVectorSliceMut::from(slice)
}
However, sometimes I need to later extract the original slice from the DVectorSliceMut with the original lifetime 'a. Is there a way to do this?
The StorageMut trait has an as_mut_slice method, but the lifetime of the returned slice is tied to the reference to the Storage implementor, not to the original slice. I am okay with a solution that consumes the DVectorSliceMut if necessary.
Update: Methods into_slice and into_slice_mut have been respectively added to the SliceStorage and SliceStorageMut traits as of nalgebra v0.28.0.
Given the current API of nalgebra (v0.27.1) there isn't much that you can do, except:
live with the shorter lifetime of StorageMut::as_mut_slice
make a feature request for such a function at nalgebra (which it seems you have already done)
employ your own unsafe code to turn StorageMut::ptr_mut into a &'a mut
You could go with the third option until nalgebra gets updated, and implement something like this in your own code:
use nalgebra::base::dimension::Dim;
use nalgebra::base::storage::Storage;
use nalgebra::base::storage::StorageMut;
use nalgebra::DVectorSliceMut;
fn into_slice<'a>(vec: DVectorSliceMut<'a, f64>) -> &'a mut [f64] {
let mut inner = vec.data;
// from nalgebra
// https://docs.rs/nalgebra/0.27.1/src/nalgebra/base/matrix_slice.rs.html#190
let (nrows, ncols) = inner.shape();
if nrows.value() != 0 && ncols.value() != 0 {
let sz = inner.linear_index(nrows.value() - 1, ncols.value() - 1);
unsafe { core::slice::from_raw_parts_mut(inner.ptr_mut(), sz + 1) }
} else {
unsafe { core::slice::from_raw_parts_mut(inner.ptr_mut(), 0) }
}
}
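The length passed to from_raw_parts_mut is the linear index of the last element plus one. Assume, as nalgebra's Storage::linear_index appears to, that element (i, j) is stored at offset i * row_stride + j * col_stride. The computation then reduces to plain arithmetic. This std-only helper (a hypothetical name) mirrors the branch in into_slice so that the edge cases can be checked in isolation:

```rust
/// Number of elements spanned by a strided view, i.e. the length the
/// reconstructed slice must have. Mirrors the branch in `into_slice` above.
fn span_len(nrows: usize, ncols: usize, row_stride: usize, col_stride: usize) -> usize {
    if nrows != 0 && ncols != 0 {
        // Linear index of the last element (nrows - 1, ncols - 1), plus one.
        (nrows - 1) * row_stride + (ncols - 1) * col_stride + 1
    } else {
        // An empty view spans nothing; avoids underflow in `nrows - 1`.
        0
    }
}

fn main() {
    // Contiguous column vector of 5 elements.
    println!("{}", span_len(5, 1, 1, 5));
    // Every other element of a buffer: 3 rows with row stride 2 span 5 slots.
    println!("{}", span_len(3, 1, 2, 6));
}
```

With a row stride greater than one, the recovered slice also covers the skipped elements in between, not just the entries of the vector.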
Methods into_slice and into_slice_mut which return the original slice have been respectively added to the SliceStorage and SliceStorageMut traits as of nalgebra v0.28.0.
As an educational exercise, I'm looking at porting cvs-fast-export to Rust.
Its basic mode of operation is to parse a number of CVS master files into an intermediate form, and then to analyse the intermediate form with the goal of transforming it into a git fast-export stream.
One of the things done during parsing is converting common parts of the intermediate form into a canonical representation. A motivating example is commit authors. A CVS repository may have hundreds of thousands of individual file commits, but probably fewer than a thousand authors. So an interning table is used during parsing: you pass in the author as you parse it from the file, and the table gives you a pointer to a canonical version, creating a new one if it hasn't seen it before. (I've also heard this called atomizing.) This pointer then gets stored on the intermediate object.
My first attempt to do something similar in Rust used a HashSet as the interning table. Note that this uses CVS version numbers rather than authors; a version number is just a sequence of numbers such as 1.2.3.4, represented as a Vec.
use std::collections::HashSet;
use std::hash::Hash;
#[derive(PartialEq, Eq, Debug, Hash, Clone)]
struct CvsNumber(Vec<u16>);
fn intern<T:Eq + Hash + Clone>(set: &mut HashSet<T>, item: T) -> &T {
let dupe = item.clone();
if !set.contains(&item) {
set.insert(item);
}
set.get(&dupe).unwrap()
}
fn main() {
let mut set: HashSet<CvsNumber> = HashSet::new();
let c1 = CvsNumber(vec![1, 2]);
let c2 = intern(&mut set, c1);
let c3 = CvsNumber(vec![1, 2]);
let c4 = intern(&mut set, c3);
}
This fails with error[E0499]: cannot borrow 'set' as mutable more than once at a time. This is fair enough, HashSet doesn't guarantee references to its keys will be valid if you add more items after you have obtained a reference. The C version is careful to guarantee this. To get this guarantee, I think the HashSet should be over Box<T>. However I can't explain the lifetimes for this to the borrow checker.
The ownership model I am going for here is that the interning table owns the canonical versions of the data and hands out references. The references should be valid as long as the interning table exists. We should be able to add new things to the interning table without invalidating the old references. I think the root of my problem is that I'm confused about how to write the interface for this contract in a way that is consistent with the Rust ownership model.
Solutions I see with my limited Rust knowledge are:
Do two passes, build a HashSet on the first pass, then freeze it and use references on the second pass. This means additional temporary storage (sometimes substantial).
Unsafe
Does anyone have a better idea?
I somewhat disagree with @Shepmaster on the use of unsafe here.
While right now it does not cause issues, should someone decide in the future to change the use of HashSet to include some pruning (for example, to only ever keep a hundred authors in there), then unsafe will bite you sternly.
In the absence of a strong performance reason, I would simply use a Rc<XXX>. You can alias it easily enough: type InternedXXX = Rc<XXX>;.
use std::collections::HashSet;
use std::hash::Hash;
use std::rc::Rc;
#[derive(PartialEq, Eq, Debug, Hash, Clone)]
struct CvsNumber(Rc<Vec<u16>>);
fn intern<T:Eq + Hash + Clone>(set: &mut HashSet<T>, item: T) -> T {
if !set.contains(&item) {
let dupe = item.clone();
set.insert(dupe);
item
} else {
set.get(&item).unwrap().clone()
}
}
fn main() {
let mut set: HashSet<CvsNumber> = HashSet::new();
let c1 = CvsNumber(Rc::new(vec![1, 2]));
let c2 = intern(&mut set, c1);
let c3 = CvsNumber(Rc::new(vec![1, 2]));
let c4 = intern(&mut set, c3);
}
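To confirm that interning really shares one allocation, you can use Rc::ptr_eq and Rc::strong_count. This sketch restates intern with an early return. The behavior is the same as in the version above.

```rust
use std::collections::HashSet;
use std::hash::Hash;
use std::rc::Rc;

#[derive(PartialEq, Eq, Debug, Hash, Clone)]
struct CvsNumber(Rc<Vec<u16>>);

fn intern<T: Eq + Hash + Clone>(set: &mut HashSet<T>, item: T) -> T {
    // Already interned: hand out a clone of the canonical value (an Rc bump).
    if let Some(existing) = set.get(&item) {
        return existing.clone();
    }
    set.insert(item.clone());
    item
}

fn main() {
    let mut set = HashSet::new();
    let c2 = intern(&mut set, CvsNumber(Rc::new(vec![1, 2])));
    let c4 = intern(&mut set, CvsNumber(Rc::new(vec![1, 2])));
    // Both handles point at the same heap allocation.
    assert!(Rc::ptr_eq(&c2.0, &c4.0));
    // Owners: the set, c2 and c4.
    assert_eq!(Rc::strong_count(&c2.0), 3);
}
```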
Your analysis is correct. The ultimate issue is that when modifying the HashSet, the compiler cannot guarantee that the mutations will not affect the existing allocations. Indeed, in general they might affect them, unless you add another layer of indirection, as you have identified.
This is a prime example of a place that unsafe is useful. You, the programmer, can assert that the code will only ever be used in a particular way, and that particular way will allow the variable to be stable through any mutations. You can use the type system and module visibility to help enforce these conditions.
Note that String already introduces a heap allocation. So long as you don't change the String once it's allocated, you don't need an extra Box.
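The point about String can be checked directly. When the set grows, it moves the String values (pointer, length and capacity) into new slots, but the bytes they point to stay in their own heap allocation. This is a small demonstration, not part of the original answer:

```rust
use std::collections::HashSet;

fn main() {
    let s = String::from("hello");
    let heap_ptr = s.as_ptr();

    let mut set = HashSet::new();
    set.insert(s); // moves the String header (ptr, len, capacity), not its bytes

    // Force the table to grow and rehash, relocating the String headers.
    for i in 0..10_000 {
        set.insert(i.to_string());
    }

    let stored = set.get("hello").expect("still present");
    assert_eq!(stored.as_ptr(), heap_ptr);
    println!("heap data did not move");
}
```

This only holds as long as an interned String is never modified or dropped, which is exactly the condition the unsafe code below has to uphold.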
Something like this seems like an OK start:
use std::{cell::RefCell, collections::HashSet, mem};
struct EasyInterner(RefCell<HashSet<String>>);
impl EasyInterner {
fn new() -> Self {
EasyInterner(RefCell::new(HashSet::new()))
}
fn intern<'a>(&'a self, s: &str) -> &'a str {
let mut set = self.0.borrow_mut();
if !set.contains(s) {
set.insert(s.into());
}
let interned = set.get(s).expect("Impossible missing string");
// TODO: Document the pre- and post-conditions that the code must
// uphold to make this unsafe code valid instead of copying this
// from Stack Overflow without reading it
unsafe { mem::transmute(interned.as_str()) }
}
}
fn main() {
let i = EasyInterner::new();
let a = i.intern("hello");
let b = i.intern("world");
let c = i.intern("hello");
// Still strings
assert_eq!(a, "hello");
assert_eq!(a, c);
assert_eq!(b, "world");
// But with the same address
assert_eq!(a.as_ptr(), c.as_ptr());
assert!(a.as_ptr() != b.as_ptr());
// This shouldn't compile; a cannot outlive the interner
// let x = {
// let i = EasyInterner::new();
// let a = i.intern("hello");
// a
// };
let the_pointer;
let i = {
let i = EasyInterner::new();
{
// Introduce a scope to constrain the borrow of `i` for `s`
let s = i.intern("inner");
the_pointer = s.as_ptr();
}
i // moving i to a new location
// All outstanding borrows are invalidated
};
// but the data is still allocated
let s = i.intern("inner");
assert_eq!(the_pointer, s.as_ptr());
}
However, it may be much more expedient to use a crate like:
string_cache, which has the collective brainpower of the Servo project behind it.
typed-arena
generational-arena
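Another option in the spirit of the arena crates is to hand out small Copy indices instead of references. The table owns the strings, and a Symbol remains meaningful as long as the table exists. This is a std-only sketch of a swapped-in technique, not something the answers above show. Symbol and Interner are hypothetical names, and each string is stored twice for simplicity.

```rust
use std::collections::HashMap;

/// Hypothetical handle to an interned string; cheap to copy and compare.
#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
struct Symbol(u32);

#[derive(Default)]
struct Interner {
    map: HashMap<String, Symbol>,
    strings: Vec<String>,
}

impl Interner {
    fn intern(&mut self, s: &str) -> Symbol {
        if let Some(&sym) = self.map.get(s) {
            return sym;
        }
        // New string: its symbol is its index in `strings`.
        let sym = Symbol(self.strings.len() as u32);
        self.strings.push(s.to_owned());
        self.map.insert(s.to_owned(), sym);
        sym
    }

    fn resolve(&self, sym: Symbol) -> &str {
        &self.strings[sym.0 as usize]
    }
}

fn main() {
    let mut interner = Interner::default();
    let a = interner.intern("alice");
    let b = interner.intern("bob");
    let c = interner.intern("alice");
    // Interning can continue while earlier symbols stay valid.
    assert_eq!(a, c);
    assert_ne!(a, b);
    println!("{} {}", interner.resolve(a), interner.resolve(b));
}
```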
I've been trying to make a websocket client, but one that has tons of options! I thought of using a builder style since the configuration can be stored in a nice way:
let client = Client::new()
.options(5)
.stuff(true)
// now users can store the config before calling build
.build();
I am having trouble creating a function that takes in a list of strings. Of course I have a few options:
fn strings(self, list: &[&str]) -> Self;
fn strings(self, list: Vec<String>) -> Self;
fn strings(self, list: &[&String]) -> Self;
// etc...
I want to be generous in what I accept: &String, &str, and ideally the keys of a HashMap (since this might be used with a large routing table). So I thought I would accept an iterator over items that implement Borrow<str>, like so:
fn strings<'p, P, Sp>(self, list: P) -> Self
where P: Iterator<Item = &'p Sp>,
Sp: Borrow<str> + 'p;
A full example is available here.
This was great until I needed to add another optional list of strings (extensions) to the builder.
This meant that if I created a builder without specifying both lists of strings, the compiler would complain that it couldn't infer the type of the Builder, which makes sense. The problem is that both fields are optional, so the user may never mention the type of a list they haven't set.
Does anyone have any ideas on how to specify an iterator over traits? Then I wouldn't have to specify the type fully at compile time. Or maybe just a better way to do this entirely?
A pragmatic solution is to discard the concrete types and introduce some indirection. We can Box a trait object and store that as a known type:
use std::borrow::Borrow;
struct Builder {
strings: Option<Box<dyn Iterator<Item = Box<dyn Borrow<str>>>>>,
}
impl Builder {
fn new() -> Self {
Builder { strings: None }
}
fn strings<I>(mut self, iter: I) -> Self
where I: IntoIterator + 'static,
I::Item: Borrow<str> + 'static,
{
let i = iter.into_iter().map(|x| Box::new(x) as Box<dyn Borrow<str>>);
self.strings = Some(Box::new(i));
self
}
fn build(self) -> String {
match self.strings {
Some(iter) => {
let mut s = String::new();
for i in iter {
s.push_str((*i).borrow());
}
s
},
None => "No strings here!".to_string(),
}
}
}
fn main() {
let s =
Builder::new()
.strings(vec!["a", "b"])
.build();
println!("{}", s);
}
Here we convert the input iterator into a boxed iterator of boxed things that implement Borrow<str>. We have to do some gyrations to convert the specific concrete type we have into a conceptually higher-level type that is still concrete.
The remainder of this answer doesn't directly answer your question about an iterator of traits, but it provides an alternate solution that I would use.
You have to pick between something that might be a bit more optimal but has a worse user experience, and something that might be a bit suboptimal but has a nicer user experience.
You are currently storing the iterator in the builder struct:
struct Builder<I>
where I: Iterator
{
things: Option<I>,
}
This requires that the concrete type of I be known in order to instantiate a Builder. Specifically, the size of that type needs to be known in order to allocate enough space. There's no way around this; if you want to store a generic type, you need to know what type it is.
For the same reasons, you cannot have this standalone statement:
let foo = None;
How much space needs to be allocated for foo? You cannot know until you know what type the Some might hold.
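A quick way to see that the answer depends on the payload is to compare the sizes of two Option types. The exact numbers are a compiler implementation detail, so this sketch only asserts the ordering:

```rust
use std::mem::size_of;

fn main() {
    // The size of an Option depends entirely on what the Some might hold.
    let small = size_of::<Option<u8>>();
    let large = size_of::<Option<[u8; 64]>>();
    println!("Option<u8>: {small} bytes, Option<[u8; 64]>: {large} bytes");
    assert!(small < large);

    // With a type annotation the compiler knows how much space to reserve.
    let foo: Option<[u8; 64]> = None;
    assert!(foo.is_none());
}
```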
The way I would go would be to not add type parameters for the struct, but have them on the function. This means that the struct has to have a fixed type to store the values. In your example, a String is a good fit:
struct Builder {
strings: Vec<String>,
}
impl Builder {
fn strings<I>(mut self, iter: I) -> Self
where I: IntoIterator,
I::Item: Into<String>,
{
self.strings.extend(iter.into_iter().map(Into::into));
self
}
}
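Here is a sketch of how this plays out with the two optional lists from the question. The extensions method and the tuple-returning build are hypothetical stand-ins. &str, &String and HashMap keys are all accepted, because each item type implements Into<String>. Leaving a list unset causes no inference error, because the struct itself is not generic.

```rust
use std::collections::HashMap;

#[derive(Default)]
struct Builder {
    strings: Vec<String>,
    extensions: Vec<String>, // hypothetical second optional list
}

impl Builder {
    fn new() -> Self {
        Builder::default()
    }

    fn strings<I>(mut self, iter: I) -> Self
    where
        I: IntoIterator,
        I::Item: Into<String>,
    {
        self.strings.extend(iter.into_iter().map(Into::into));
        self
    }

    fn extensions<I>(mut self, iter: I) -> Self
    where
        I: IntoIterator,
        I::Item: Into<String>,
    {
        self.extensions.extend(iter.into_iter().map(Into::into));
        self
    }

    /// Hypothetical `build`: just hands the collected lists back.
    fn build(self) -> (Vec<String>, Vec<String>) {
        (self.strings, self.extensions)
    }
}

fn main() {
    let owned = String::from("owned");
    let mut routes: HashMap<String, u32> = HashMap::new();
    routes.insert(String::from("/route"), 1);

    // `extensions` is never called, and there is no inference problem.
    let (strings, extensions) = Builder::new()
        .strings(vec!["literal"]) // &str
        .strings(vec![&owned]) // &String
        .strings(routes.keys()) // &String from a HashMap
        .build();

    assert_eq!(strings, vec!["literal", "owned", "/route"]);
    assert!(extensions.is_empty());
}
```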
A Vec has very compact storage (it only takes 3 machine-sized values) and doesn't allocate any heap memory when it is empty. For that reason, I wouldn't wrap it in an Option unless you need to distinguish zero items from the absence of a provided value.
If you are just appending each value to one big string, you might as well do that in the strings method. That depends on your application.
You mention that you might be providing a large amount of data, but I'm not sure that holding the iterator until the build call will really help. You are going to pay the cost earlier or later.
If you are going to reuse the builder, then it depends on what is expensive. If iterating is expensive, then doing it once and reusing that for each build call will be more efficient. If holding onto the memory is expensive, then you don't want to have multiple builders or built items around concurrently. Since the builder will transfer ownership of the memory to the new item, there shouldn't be any waste here.
Is there type erasure of generics in Rust (like in Java) or not? I am unable to find a definitive answer.
When you use a generic function or a generic type, the compiler generates a separate instance for each distinct set of type parameters (I believe lifetime parameters are ignored, as they have no influence on the generated code). This process is called monomorphization. For instance, Vec<i32> and Vec<String> are different types, and therefore Vec<i32>::len() and Vec<String>::len() are different functions. This is necessary because i32 and String have different sizes and memory layouts, so the code that handles the elements of a Vec<i32> and of a Vec<String> must be different machine code! Therefore, no, there is no type erasure.
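Here is a small way to see monomorphization at work. A generic function that returns size_of::<T>() becomes a different constant-returning function in each instantiation. The helper name is hypothetical.

```rust
use std::mem::size_of;

/// Generic function: the compiler emits one copy per element type T it is used with.
fn element_size<T>(_items: &[T]) -> usize {
    // In each instantiation this is a constant for that particular T.
    size_of::<T>()
}

fn main() {
    let ints: Vec<i32> = vec![1, 2, 3];
    let strings: Vec<String> = vec![String::from("a")];
    // Two different instantiations: element_size::<i32> and element_size::<String>.
    println!("{}", element_size(&ints)); // 4
    println!("{}", element_size(&strings)); // size of a String header
}
```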
If we use Any::type_id(), as in the following example:
use std::any::Any;
fn main() {
let v1: Vec<i32> = Vec::new();
let v2: Vec<String> = Vec::new();
let a1 = &v1 as &dyn Any;
let a2 = &v2 as &dyn Any;
println!("{:?}", a1.type_id());
println!("{:?}", a2.type_id());
}
we obtain different type IDs for two instances of Vec. This supports the fact that Vec<i32> and Vec<String> are distinct types.
However, reflection capabilities in Rust are limited; Any is pretty much all we've got for now. You cannot obtain more information about the type of a runtime value, such as its name or its members. In order to work with Any, you must downcast it (using Any::downcast_ref() or Any::downcast_mut()) to a type that is known at compile time.
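Downcasting in practice means trying candidate types one at a time. Each downcast_ref call compares TypeIds and returns None on a mismatch. A minimal sketch, with a hypothetical helper name:

```rust
use std::any::Any;

/// Hypothetical helper: recovers a concrete type from `&dyn Any` by trying candidates.
fn describe(value: &dyn Any) -> String {
    if let Some(v) = value.downcast_ref::<Vec<i32>>() {
        format!("Vec<i32> with {} elements", v.len())
    } else if let Some(s) = value.downcast_ref::<String>() {
        format!("String {:?}", s)
    } else {
        // No way to ask the value what it is beyond comparing TypeIds.
        String::from("unknown type")
    }
}

fn main() {
    let v1: Vec<i32> = vec![1, 2, 3];
    let v2: Vec<String> = Vec::new();
    println!("{}", describe(&v1));
    println!("{}", describe(&v2)); // Vec<String> is not Vec<i32>
}
```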
Rust does have type erasure in the form of virtual method dispatch via dyn Trait, which allows you to have a Vec where the elements have different concrete types:
fn main() {
let list: Vec<Box<dyn ToString>> = vec![Box::new(1), Box::new("hello")];
for item in list {
println!("{}", item.to_string());
}
}
Note that the compiler requires you to box the elements manually, since it must know the size of every value at compile time. A Box<dyn ToString> has the same size no matter what it points to, since it is just a pointer to the heap plus a pointer to the vtable. You can also use &-references:
fn main() {
let list: Vec<&dyn ToString> = vec![&1, &"hello"];
for item in list {
println!("{}", item.to_string());
}
}
However, note that if you use &-references you may run into lifetime issues.
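Both forms work in a Vec because every trait-object pointer has the same size: a data pointer plus a pointer to the vtable used for dynamic dispatch. Here is a quick check, assuming a current rustc (pointer layout is an implementation detail, not a language guarantee):

```rust
use std::mem::size_of;

fn main() {
    let word = size_of::<usize>();
    // A trait-object pointer is a data pointer plus a vtable pointer,
    // whatever concrete type it refers to.
    assert_eq!(size_of::<Box<dyn ToString>>(), 2 * word);
    assert_eq!(size_of::<&dyn ToString>(), 2 * word);
    // A pointer to a sized type is a single word.
    assert_eq!(size_of::<Box<i32>>(), word);
    println!("Box<dyn ToString> is {} bytes", size_of::<Box<dyn ToString>>());
}
```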