It seems that when I create a nested vec like the one below, the memory is not laid out so that the second inner vec is contiguous with the first. Could you suggest a way to make them contiguous?
let v = vec![vec![1,2,3], vec![4,5,6]];
println!("{:p}", &v[0][0]);
println!("{:p}", &v[0][1]);
println!("{:p}", &v[0][2]);
println!("{:p}", &v[1][0]);
println!("{:p}", &v[1][1]);
println!("{:p}", &v[1][2]);
output:
0x7f6bbc000d20
0x7f6bbc000d24
0x7f6bbc000d28
0x7f6bbc000d40
0x7f6bbc000d44
0x7f6bbc000d48
You cannot have nested yet contiguous Vecs. Multiple vectors means multiple discontinuous allocations.
You'll have to flatten your data into one vector, indexing it by row * num_columns + column or similar if you want it contiguous. There are plenty of crates implementing contiguous matrices like that.
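As a rough sketch of what that flattened layout looks like (the Matrix type and its field names below are made up for illustration; crates like ndarray provide full-featured versions):

// Minimal sketch of a flat, row-major "matrix": one contiguous Vec
// indexed by row * num_columns + column. Names are illustrative only.
struct Matrix {
    data: Vec<i32>,
    num_columns: usize,
}

impl Matrix {
    fn get(&self, row: usize, column: usize) -> i32 {
        self.data[row * self.num_columns + column]
    }
}

fn main() {
    let m = Matrix { data: vec![1, 2, 3, 4, 5, 6], num_columns: 3 };
    assert_eq!(m.get(1, 0), 4); // same element as v[1][0] above
    println!("{:p} {:p}", &m.data[2], &m.data[3]); // now 4 bytes apart
}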
I don't know if I phrased the title correctly, but here's the issue:
let mut rows: Vec<Box<String>> = vec![];
let row1 = &mut **rows.get(0).unwrap();
I want to store mutable references to multiple strings which are stored in boxes in the vector. This should be perfectly safe since I'm not referencing anything in the vector, just getting a box from the vector, dereferencing it and changing the memory it points to. If the vector gets too large and needs to reallocate its data my strings stay intact. But Rust's compiler won't let me do that; I get the error cannot borrow data in a '&' reference as mutable
How do I design around this?
I could make rows mutable and use get_mut, but then I wouldn't be able to, for example, have mutable references to two rows at the same time:
let mut rows: Vec<Box<String>> = vec![];
let row1 = &mut **rows.get_mut(0).unwrap();
let row2 = &mut **rows.get_mut(1).unwrap();
*row1 = String::from("aaa");
*row2 = String::from("bbb");
This gives cannot borrow 'rows' as mutable more than once at a time.
Another solution would be to get each row only when I need to use it and then get it again if I need to use it again, but I don't think that's a very performant idea since I'd have to loop through the array to find the row I want every time I need to change something in it (I wouldn't be getting the array element by index).
EDIT: I am trying to design around storing mutable references to the strings, since Rust's compiler won't let me do that. Either I'm missing something that would let me have multiple mutable references to the strings, or I need to find another way to accomplish that. I need to have mutable references to what the boxes contain; in my program they're not strings, they're structs with methods that mutate them, but I used strings here for the sake of simplicity. Here's a bit of code to clarify it
struct Table
{
    cols: Vec<Box<Col>>
}
//...
let mut table = Table::new();
let mut id_col = table.new_col("ID".to_owned());
let mut status_col = table.new_col("Status".to_owned());
let mut title_col = table.new_col("Title".to_owned());
let mut deadline_col = table.new_col("Deadline".to_owned());
let mut tags_col = table.new_col("Tags".to_owned());
let mut repeat_col = table.new_col("Repeat".to_owned());
(I used rows in my first example to make it easier to understand)
I will iterate over some data and append stuff to these columns, so I don't want to search for them by name in the vector on each iteration; I want to have "cached" references to them (which are all of these variables). The problem is that the compiler won't let me do that because I can't borrow table as mutable more than once. So what I mean by "designing around" is changing my way of thinking and restructuring my code so I don't have this problem.
I want to store mutable references to multiple strings which are stored in boxes in the vector. This should be perfectly safe since I'm not referencing anything in the vector, just getting a box from the vector, dereferencing it and changing the memory it points to. If the vector gets too large and needs to reallocate its data my strings stay intact.
The compiler has literally no idea about this, and it's not a certainty either: in your scheme nothing prevents getting a second mutable handle on the same box or string and breaking that.
RefCell exists for this sort of situation ("interior mutability" is the concept you want to look for), and it implies a runtime performance hit as it needs to keep track of extant borrows (it's basically a single-threaded RwLock).
I could make rows mutable and use get_mut, but then I wouldn't be able to, for example, have mutable references to two rows at the same time
That's what split_at_mut (and friends) is for.
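For example, a minimal sketch of holding mutable references to two elements at once via split_at_mut:

fn main() {
    let mut rows: Vec<Box<String>> = vec![
        Box::new(String::from("one")),
        Box::new(String::from("two")),
    ];

    // split_at_mut yields two disjoint mutable sub-slices, so the borrow
    // checker can see that row1 and row2 never alias.
    let (left, right) = rows.split_at_mut(1);
    let row1 = &mut *left[0];
    let row2 = &mut *right[0];

    *row1 = String::from("aaa");
    *row2 = String::from("bbb");

    assert_eq!(*rows[0], "aaa");
    assert_eq!(*rows[1], "bbb");
}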
(I used rows in my first example to make it easier to understand) I will iterate over some data and append stuff to these columns, so I don't want to search for them by name in the vector on each iteration; I want to have "cached" references to them (which are all of these variables). The problem is that the compiler won't let me do that because I can't borrow table as mutable more than once. So what I mean by "designing around" is changing my way of thinking and restructuring my code so I don't have this problem.
Yeah no, that's not just having mutable handles on separate parts of a collection, it's having mutable handles while modifying the parent. You probably need Rc (instead of Box) and a RefCell (for interior mutability).
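A rough sketch of that approach, with Col stubbed out for illustration (your real Col and new_col will differ):

use std::cell::RefCell;
use std::rc::Rc;

// Illustrative only: Col is a stand-in, and new_col is assumed to hand
// back a clone of the Rc that the table also keeps.
struct Col {
    name: String,
    cells: Vec<String>,
}

struct Table {
    cols: Vec<Rc<RefCell<Col>>>,
}

impl Table {
    fn new() -> Table {
        Table { cols: Vec::new() }
    }

    fn new_col(&mut self, name: String) -> Rc<RefCell<Col>> {
        let col = Rc::new(RefCell::new(Col { name, cells: Vec::new() }));
        self.cols.push(Rc::clone(&col));
        col // the caller keeps a "cached" handle; the table keeps another
    }
}

fn main() {
    let mut table = Table::new();
    let id_col = table.new_col("ID".to_owned());
    let status_col = table.new_col("Status".to_owned());

    // Both handles stay usable even while `table` is mutated further;
    // borrow_mut() enforces exclusive access at runtime rather than compile time.
    id_col.borrow_mut().cells.push("1".to_owned());
    status_col.borrow_mut().cells.push("open".to_owned());

    assert_eq!(table.cols[0].borrow().cells[0], "1");
    assert_eq!(table.cols[1].borrow().name, "Status");
}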
Or do a sequence of split_first_mut calls, or use an unrolled iterator:
let mut t = table.cols.iter_mut();
let id_col = t.next().unwrap();
let status_col = t.next().unwrap();
let title_col = t.next().unwrap();
let deadline_col = t.next().unwrap();
let tags_col = t.next().unwrap();
let repeat_col = t.next().unwrap();
with the issue that you will not be able to modify table until all of these references are dead. And that it's pretty ugly.
I have complex number data filled into a Vec<f64> by an external C library (which I'd prefer not to change) in the form [i_0_real, i_0_imag, i_1_real, i_1_imag, ...], and it appears that this Vec<f64> has the same memory layout that a Vec<num_complex::Complex<f64>> of half the length would have, given that num_complex::Complex<f64> is memory-layout compatible with [f64; 2] as documented here. I'd like to use it as such without needing a re-allocation of a potentially large buffer.
I'm assuming that it's valid to use from_raw_parts() in std::vec::Vec to fake a new Vec that takes ownership of the old Vec's memory (by forgetting the old Vec) and use size / 2 and capacity / 2, but that requires unsafe code. Is there a "safe" way to do this kind of data re-interpretation?
The Vec is allocated in Rust as a Vec<f64> and is populated by a C function using .as_mut_ptr() that fills in the Vec<f64>.
My current compiling unsafe implementation:
extern crate num_complex;
pub fn convert_to_complex_unsafe(mut buffer: Vec<f64>) -> Vec<num_complex::Complex<f64>> {
    let new_vec = unsafe {
        Vec::from_raw_parts(
            buffer.as_mut_ptr() as *mut num_complex::Complex<f64>,
            buffer.len() / 2,
            buffer.capacity() / 2,
        )
    };
    std::mem::forget(buffer);
    return new_vec;
}
fn main() {
    println!(
        "Converted vector: {:?}",
        convert_to_complex_unsafe(vec![3.0, 4.0, 5.0, 6.0])
    );
}
Is there a "safe" way to do this kind of data re-interpretation?
No. At the very least, this is because the information you need to know is not expressed in the Rust type system but is expressed via prose (a.k.a. the docs):
Complex<T> is memory layout compatible with an array [T; 2].
— Complex docs
If a Vec has allocated memory, then [...] its pointer points to len initialized, contiguous elements in order (what you would see if you coerced it to a slice),
— Vec docs
Arrays coerce to slices ([T])
— Array docs
Since a Complex is memory-compatible with an array, an array's data is memory-compatible with a slice, and a Vec's data is memory-compatible with a slice, this transformation should be safe, even though the compiler cannot tell this.
This information should be attached (via a comment) to your unsafe block.
I would make some small tweaks to your function:
Having two Vecs at the same time pointing to the same data makes me very nervous. This can be trivially avoided by introducing some variables and forgetting one before creating the other.
Remove the return keyword to be more idiomatic
Add some asserts that the starting length of the data is a multiple of two.
As rodrigo points out, the capacity could easily be an odd number. To attempt to avoid this, we call shrink_to_fit. This has the downside that the Vec may need to reallocate and copy the memory, depending on the implementation.
Expand the unsafe block to cover all of the related code that is required to ensure that the safety invariants are upheld.
pub fn convert_to_complex(mut buffer: Vec<f64>) -> Vec<num_complex::Complex<f64>> {
    // This is where I'd put the rationale for why this `unsafe` block
    // upholds the guarantees that I must ensure. Too bad I
    // copy-and-pasted from Stack Overflow without reading this comment!
    unsafe {
        buffer.shrink_to_fit();

        let ptr = buffer.as_mut_ptr() as *mut num_complex::Complex<f64>;
        let len = buffer.len();
        let cap = buffer.capacity();

        assert!(len % 2 == 0);
        assert!(cap % 2 == 0);

        std::mem::forget(buffer);

        Vec::from_raw_parts(ptr, len / 2, cap / 2)
    }
}
To avoid all the worrying about the capacity, you could just convert a slice instead of the Vec. This also doesn't have any extra memory allocation. It's simpler because we can "lose" any odd trailing value; the original Vec still owns it.
pub fn convert_to_complex(buffer: &[f64]) -> &[num_complex::Complex<f64>] {
    // This is where I'd put the rationale for why this `unsafe` block
    // upholds the guarantees that I must ensure. Too bad I
    // copy-and-pasted from Stack Overflow without reading this comment!
    unsafe {
        let ptr = buffer.as_ptr() as *const num_complex::Complex<f64>;
        let len = buffer.len();

        assert!(len % 2 == 0);

        std::slice::from_raw_parts(ptr, len / 2)
    }
}
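A possible usage sketch of the slice version (assuming the function above is in scope; the printout relies on Complex implementing Debug, which num_complex provides):

fn main() {
    let buffer: Vec<f64> = vec![3.0, 4.0, 5.0, 6.0];

    // The returned slice borrows `buffer`: no copy or allocation happens,
    // and `buffer` must stay alive for as long as `complexes` is used.
    let complexes = convert_to_complex(&buffer);

    assert_eq!(complexes.len(), 2);
    println!("Converted: {:?}", complexes);
}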
I was expecting a Vec::insert_slice(index, slice) method — a solution for strings (String::insert_str()) does exist.
I know about Vec::insert(), but that inserts only one element at a time, not a slice. Alternatively, when the prepended slice is a Vec one can append to it instead, but this does not generalize. The idiomatic solution probably uses Vec::splice(), but using iterators as in the example makes me scratch my head.
Secondly, the whole concept of prepending has seemingly been exorcised from the docs. There isn't a single mention. I would appreciate comments as to why. Note that relatively obscure methods like Vec::swap_remove() do exist.
My typical use case consists of indexed byte strings.
String::insert_str makes use of the fact that a string is essentially a Vec<u8>. It reserves space in the underlying buffer (reallocating if needed), shifts the existing bytes towards the end, then copies the new bytes into the gap at the beginning.
This is not generally safe and cannot be directly added to Vec, because during the copy the Vec is no longer in a valid state: there are "holes" in the data.
This doesn't matter for String because the data is u8 and u8 doesn't implement Drop. There's no such guarantee for an arbitrary T in a Vec, but if you are very careful to track your state and clean up properly, you can do the same thing — this is what splice does!
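For reference, a tiny sketch of String::insert_str itself in action:

fn main() {
    let mut s = String::from("world");

    // insert_str shifts the existing bytes to make room, then copies the
    // new bytes into the gap at the start of the same buffer.
    s.insert_str(0, "hello, ");

    assert_eq!(s, "hello, world");
}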
the whole concept of prepending has seemingly been exorcised
I'd suppose this is because prepending to a Vec is a poor idea from a performance standpoint. If you need to do it, the naïve case is straightforward:
fn prepend<T>(v: Vec<T>, s: &[T]) -> Vec<T>
where
    T: Clone,
{
    let mut tmp: Vec<_> = s.to_owned();
    tmp.extend(v);
    tmp
}
This has a bit higher memory usage as we need to have enough space for two copies of v.
The splice method accepts an iterator of new values and a range of values to replace. In this case, we don't want to replace anything, so we give an empty range of the index we want to insert at. We also need to convert the slice into an iterator of the appropriate type:
let s = &[1, 2, 3];
let mut v = vec![4, 5];
v.splice(0..0, s.iter().cloned());
splice's implementation is non-trivial, but it efficiently does the tracking we need. After removing a chunk of values, it then reuses that chunk of memory for the new values. It also moves the tail of the vector around (maybe a few times, depending on the input iterator). The Drop implementation of Splice ensures that things will always be in a valid state.
I'm more surprised that VecDeque doesn't support it, as it's designed to be more efficient about modifying both the head and tail of the data.
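If a VecDeque suits your data, one hedged workaround sketch is to push the new elements onto the front in reverse order:

use std::collections::VecDeque;

fn main() {
    let mut v: VecDeque<i32> = VecDeque::from(vec![4, 5]);

    // There is no slice-prepend method on VecDeque, but pushing the new
    // elements to the front in reverse order gives the same result without
    // shifting the existing tail.
    for &x in [1, 2, 3].iter().rev() {
        v.push_front(x);
    }

    assert_eq!(v, VecDeque::from(vec![1, 2, 3, 4, 5]));
}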
Taking into consideration what Shepmaster said, you could implement a function that prepends a slice of Copy elements to a Vec, just like String::insert_str() does, in the following way:
use std::ptr;
unsafe fn prepend_slice<T: Copy>(vec: &mut Vec<T>, slice: &[T]) {
    let len = vec.len();
    let amt = slice.len();
    vec.reserve(amt);

    ptr::copy(vec.as_ptr(),
              vec.as_mut_ptr().offset(amt as isize),
              len);
    ptr::copy(slice.as_ptr(),
              vec.as_mut_ptr(),
              amt);
    vec.set_len(len + amt);
}

fn main() {
    let mut v = vec![4, 5, 6];
    unsafe { prepend_slice(&mut v, &[1, 2, 3]) }
    assert_eq!(&v, &[1, 2, 3, 4, 5, 6]);
}
I want to sort HashMap data by value in Rust (e.g., when counting character frequency in a string).
The Python equivalent of what I’m trying to do is:
count = {}
for c in text:
    count[c] = count.get(c, 0) + 1
sorted_data = sorted(count.items(), key=lambda item: -item[1])
print('Most frequent character in text:', sorted_data[0][0])
My corresponding Rust code looks like this:
// Count the frequency of each letter
let mut count: HashMap<char, u32> = HashMap::new();
for c in text.to_lowercase().chars() {
    *count.entry(c).or_insert(0) += 1;
}
// Get a sorted (by field 0 ("count") in reversed order) list of the
// most frequently used characters:
let mut count_vec: Vec<(&char, &u32)> = count.iter().collect();
count_vec.sort_by(|a, b| b.1.cmp(a.1));
println!("Most frequent character in text: {}", count_vec[0].0);
Is this idiomatic Rust? Can I construct the count_vec in a way so that it consumes the HashMap's data and owns it (e.g., using map())? Would this be more idiomatic?
Is this idiomatic Rust?
There's nothing particularly unidiomatic, except possibly for the unnecessary full type constraint on count_vec; you could just use
let mut count_vec: Vec<_> = count.iter().collect();
It's not difficult from context to work out what the full type of count_vec is. You could also omit the type constraint for count entirely, but then you'd have to play shenanigans with your integer literals to have the correct value type inferred. That is to say, an explicit annotation is eminently reasonable in this case.
The other borderline change you could make if you feel like it would be to use |a, b| a.1.cmp(b.1).reverse() for the sort closure. The Ordering::reverse method just reverses the result so that less-than becomes greater-than, and vice versa. This makes it slightly more obvious that you meant what you wrote, as opposed to accidentally transposing two letters.
Can I construct the count_vec in a way so that it would consume the HashMaps data and owns it?
Not in any meaningful way. Just because HashMap is using memory doesn't mean that memory is in any way compatible with Vec. You could use count.into_iter() to consume the HashMap and move the elements out (as opposed to iterating over pointers), but since both char and u32 are trivially copyable, this doesn't really gain you anything.
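If you do want owned pairs anyway, a minimal sketch of the consuming variant might look like this:

use std::collections::HashMap;

fn main() {
    let text = "hello";

    let mut count: HashMap<char, u32> = HashMap::new();
    for c in text.chars() {
        *count.entry(c).or_insert(0) += 1;
    }

    // into_iter() consumes the map and yields (char, u32) pairs by value,
    // so count_vec owns its data; `count` cannot be used afterwards.
    let mut count_vec: Vec<(char, u32)> = count.into_iter().collect();
    count_vec.sort_by(|a, b| b.1.cmp(&a.1));

    println!("Most frequent character in text: {}", count_vec[0].0);
}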
This could be another way to address the matter without the need for an intermediate vector.
// Count the frequency of each letter
let mut count: HashMap<char, u32> = HashMap::new();
for c in text.to_lowercase().chars() {
    *count.entry(c).or_insert(0) += 1;
}
let top_char = count.iter().max_by(|a, b| a.1.cmp(&b.1)).unwrap();
println!("Most frequent character in text: {}", top_char.0);
Use BTreeMap for sorted data
BTreeMap keeps its entries sorted by key, so swapping the places of your keys and values and collecting them into a BTreeMap
let count_b: BTreeMap<&u32,&char> = count.iter().map(|(k,v)| (v,k)).collect();
should give you a sorted map according to character frequency.
Characters that share the same frequency will overwrite each other, though, since the count is now the key. But if you only want the most frequent character, it does not matter.
You can get the result using
println!("Most frequent character in text: {}", count_b.last_key_value().unwrap().1);
I have a vector data with size unknown at compile time. I want to create a new vector of the exact that size. These variants don't work:
let size = data.len();
let mut try1: Vec<u32> = vec![0 .. size]; //ah, you need compile-time constant
let mut try2: Vec<u32> = Vec::new(size); //ah, there is no constructors with arguments
I'm a bit frustrated - there isn't any information in the Rust API docs, the book, the reference, or rustbyexample.com about how to do such a simple, basic task with a vector.
This solution works, but I don't think it is a good way to do it; it feels strange to generate the elements one by one when I don't need any particular values for them:
let mut temp: Vec<u32> = range(0u32, data.len() as u32).collect();
The recommended way of doing this is in fact to form an iterator and collect it to a vector. What you want is not precisely clear, however; if you want [0, 1, 2, …, size - 1], you would create a range and collect it to a vector:
let x = (0..size).collect::<Vec<_>>();
(range(0, size) is better written (0..size) now; the range function will be disappearing from the prelude soon.)
If you wish a vector of zeroes, you would instead write it thus:
let x = std::iter::repeat(0).take(size).collect::<Vec<_>>();
If you merely want to preallocate the appropriate amount of space but not push values onto the vector, Vec::with_capacity(capacity) is what you want.
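For example, a small sketch of the preallocation behaviour:

fn main() {
    let size = 10;

    // with_capacity allocates room for `size` elements up front,
    // but the vector itself starts out empty.
    let mut v: Vec<u32> = Vec::with_capacity(size);
    assert_eq!(v.len(), 0);
    assert!(v.capacity() >= size);

    // Pushes won't reallocate until the preallocated capacity is exceeded.
    v.push(42);
    assert_eq!(v.len(), 1);
}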
You should also consider whether you need it to be a vector or whether you can work directly with the iterator.
You can use Vec::with_capacity() constructor followed by an unsafe set_len() call:
let n = 128;
let mut v: Vec<u32> = Vec::with_capacity(n);
unsafe { v.set_len(n); }
v[12] = 64; // won't panic
This way the vector will "extend" over the uninitialized memory. If you're going to use it as a buffer it is a valid approach, as long as the type of elements is Copy (primitives are ok, but it will break horribly if the type has a destructor).