I want to detect and warn about collisions in the logs when collecting a IntoIterator into a HashMap. The current Rust behavior of collecting into a HashMap is to silently overwrite the earlier values with the latest one.
fn main() {
let a = vec![(0, 1), (0, 2)];
let b: std::collections::HashMap<_, _> = a.into_iter().collect();
println!("{}", b[&0]);
}
Output:
2
(Playground)
A possible workaround is collecting into a Vec then manually writing the conversion code, but that will introduce extra allocation overhead and unreadable code. Not consuming the original collection and comparing the len()s is less noisy, but still hogs 1x more memory (?) and can't detect where exactly the collision is happening.
Is there a more elegent way of handling HashMap collisions?
Depending on what you want to do in the case of a collision, you might be able to use fold (or try_fold) and the entry API to implement your custom functionality:
use std::collections::HashMap;
fn main() {
let a = vec![(0, 1), (0, 3), (0, 2)];
let b: std::collections::HashMap<_, _> = a.into_iter().fold(HashMap::new(), |mut map, (k,v)| {
map.entry(k)
.and_modify(|_| println!("Collision with {}, {}!", k, v))
.or_insert(v);
map
});
println!("{}", b[&0]);
}
(Playground)
Related
I have a Vec with iterators over slices. Some of them are in normal order, some are reversed.
let mut output = Vec::new();
let input = b"1234567890";
let cuts = [(0,1), (1,3), (3, 5)];
for (start, end) in cuts{
output.push(input[start..end].iter());
output.push(input[start..end].iter().rev());
}
But I can't compile it, because of
expected struct `std::slice::Iter`, found struct `Rev`
I understand the error, but I wounder, if I can convert both iterators into some common type (without 'collecting' them).
UPD: I even tried to use .iter().rev().rev(), but it's a different type from iter().rev()...
An iterator and a reverse iterator are completely different types and cannot be stored in the same vector directly. They most likely aren't even the same size.
However, they both implement Iterator, of course. So you can store them via indirection, either as trait object references (&dyn Iterator) or as boxed trait objects (Box<dyn Iterator>). Which one depends on your usecase; the first one is with less overhead, but borrowed; the second one has a minimal overhead, but owns the objects.
In your case, as you don't keep the iterator objects around and instead want to store them in the list directly, the proper solution would be to use Box<dyn Iterator>, like this:
fn main() {
let mut output: Vec<Box<dyn Iterator<Item = &u8>>> = Vec::new();
let input = b"1234567890";
let cuts = [(0, 1), (1, 3), (3, 5)];
for (start, end) in cuts {
output.push(Box::new(input[start..end].iter()));
output.push(Box::new(input[start..end].iter().rev()));
}
for iter in output {
println!("{:?}", iter.collect::<Vec<_>>());
}
}
[49]
[49]
[50, 51]
[51, 50]
[52, 53]
[53, 52]
Minor nitpick:
Iterators over &u8 are discouraged, because &u8 are actually larger than u8. As u8 are Copy, adding copied() to the iterator reduces the item size without any cost; most likely even with a performance benefit.
Reason is that returning a &u8 is slower than a u8 (because &u8 is either 4 or 8 byte, while u8 is a single byte). Further, accessing a &u8 has one indirection, while accessing a u8 is very fast.
So I'd rewrite your code like this:
fn main() {
let mut output: Vec<Box<dyn Iterator<Item = u8>>> = Vec::new();
let input = b"1234567890";
let cuts = [(0, 1), (1, 3), (3, 5)];
for (start, end) in cuts {
output.push(Box::new(input[start..end].iter().copied()));
output.push(Box::new(input[start..end].iter().copied().rev()));
}
for iter in output {
println!("{:?}", iter.collect::<Vec<_>>());
}
}
[49]
[49]
[50, 51]
[51, 50]
[52, 53]
[53, 52]
Of course this is only true for &u8 and not for &mut u8; if you want to mutate the original items, copying them is counterproductive.
To convert iterators into a common type you can use dynamic dispatch by storing trait objects into your output vector:
let mut output: Vec<Box<dyn Iterator<Item = _>>> = Vec::new();
let input = b"1234567890";
let cuts = [(0, 1), (1, 3), (3, 5)];
for (start, end) in cuts {
output.push(Box::new(input[start..end].iter()));
output.push(Box::new(input[start..end].iter().rev()));
}
Playground
This question already has answers here:
How do I convert a Vec<T> to a Vec<U> without copying the vector?
(2 answers)
Closed 3 years ago.
Is there a better way to cast Vec<i8> to Vec<u8> in Rust except for these two?
creating a copy by mapping and casting every entry
using std::transmute
The (1) is slow, the (2) is "transmute should be the absolute last resort" according to the docs.
A bit of background maybe: I'm getting a Vec<i8> from the unsafe gl::GetShaderInfoLog() call and want to create a string from this vector of chars by using String::from_utf8().
The other answers provide excellent solutions for the underlying problem of creating a string from Vec<i8>. To answer the question as posed, creating a Vec<u8> from data in a Vec<i8> can be done without copying or transmuting the vector. As pointed out by #trentcl, transmuting the vector directly constitutes undefined behavior because Vec is allowed to have different layout for different types.
The correct (though still requiring the use of unsafe) way to transfer a vector's data without copying it is:
obtain the *mut i8 pointer to the data in the vector, along with its length and capacity
leak the original vector to prevent it from freeing the data
use Vec::from_raw_parts to build a new vector, giving it the pointer cast to *mut u8 - this is the unsafe part, because we are vouching that the pointer contains valid and initialized data, and that it is not in use by other objects, and so on.
This is not UB because the new Vec is given the pointer of the correct type from the start. Code (playground):
fn vec_i8_into_u8(v: Vec<i8>) -> Vec<u8> {
// ideally we'd use Vec::into_raw_parts, but it's unstable,
// so we have to do it manually:
// first, make sure v's destructor doesn't free the data
// it thinks it owns when it goes out of scope
let mut v = std::mem::ManuallyDrop::new(v);
// then, pick apart the existing Vec
let p = v.as_mut_ptr();
let len = v.len();
let cap = v.capacity();
// finally, adopt the data into a new Vec
unsafe { Vec::from_raw_parts(p as *mut u8, len, cap) }
}
fn main() {
let v = vec![-1i8, 2, 3];
assert!(vec_i8_into_u8(v) == vec![255u8, 2, 3]);
}
transmute on a Vec is always, 100% wrong, causing undefined behavior, because the layout of Vec is not specified. However, as the page you linked also mentions, you can use raw pointers and Vec::from_raw_parts to perform this correctly. user4815162342's answer shows how.
(std::mem::transmute is the only item in the Rust standard library whose documentation consists mostly of suggestions for how not to use it. Take that how you will.)
However, in this case, from_raw_parts is also unnecessary. The best way to deal with C strings in Rust is with the wrappers in std::ffi, CStr and CString. There may be better ways to work this in to your real code, but here's one way you could use CStr to borrow a Vec<c_char> as a &str:
const BUF_SIZE: usize = 1000;
let mut info_log: Vec<c_char> = vec![0; BUF_SIZE];
let mut len: usize;
unsafe {
gl::GetShaderInfoLog(shader, BUF_SIZE, &mut len, info_log.as_mut_ptr());
}
let log = Cstr::from_bytes_with_nul(info_log[..len + 1])
.expect("Slice must be nul terminated and contain no nul bytes")
.to_str()
.expect("Slice must be valid UTF-8 text");
Notice there is no unsafe code except to call the FFI function; you could also use with_capacity + set_len (as in wasmup's answer) to skip initializing the Vec to 1000 zeros, and use from_bytes_with_nul_unchecked to skip checking the validity of the returned string.
See this:
fn get_compilation_log(&self) -> String {
let mut len = 0;
unsafe { gl::GetShaderiv(self.id, gl::INFO_LOG_LENGTH, &mut len) };
assert!(len > 0);
let mut buf = Vec::with_capacity(len as usize);
let buf_ptr = buf.as_mut_ptr() as *mut gl::types::GLchar;
unsafe {
gl::GetShaderInfoLog(self.id, len, std::ptr::null_mut(), buf_ptr);
buf.set_len(len as usize);
};
match String::from_utf8(buf) {
Ok(log) => log,
Err(vec) => panic!("Could not convert compilation log from buffer: {}", vec),
}
}
See ffi:
let s = CStr::from_ptr(strz_ptr).to_str().unwrap();
Doc
I have complex number data filled into a Vec<f64> by an external C library (prefer not to change) in the form [i_0_real, i_0_imag, i_1_real, i_1_imag, ...] and it appears that this Vec<f64> has the same memory layout as a Vec<num_complex::Complex<f64>> of half the length would be, given that num_complex::Complex<f64>'s data structure is memory-layout compatible with [f64; 2] as documented here. I'd like to use it as such without needing a re-allocation of a potentially large buffer.
I'm assuming that it's valid to use from_raw_parts() in std::vec::Vec to fake a new Vec that takes ownership of the old Vec's memory (by forgetting the old Vec) and use size / 2 and capacity / 2, but that requires unsafe code. Is there a "safe" way to do this kind of data re-interpretation?
The Vec is allocated in Rust as a Vec<f64> and is populated by a C function using .as_mut_ptr() that fills in the Vec<f64>.
My current compiling unsafe implementation:
extern crate num_complex;
pub fn convert_to_complex_unsafe(mut buffer: Vec<f64>) -> Vec<num_complex::Complex<f64>> {
let new_vec = unsafe {
Vec::from_raw_parts(
buffer.as_mut_ptr() as *mut num_complex::Complex<f64>,
buffer.len() / 2,
buffer.capacity() / 2,
)
};
std::mem::forget(buffer);
return new_vec;
}
fn main() {
println!(
"Converted vector: {:?}",
convert_to_complex_unsafe(vec![3.0, 4.0, 5.0, 6.0])
);
}
Is there a "safe" way to do this kind of data re-interpretation?
No. At the very least, this is because the information you need to know is not expressed in the Rust type system but is expressed via prose (a.k.a. the docs):
Complex<T> is memory layout compatible with an array [T; 2].
— Complex docs
If a Vec has allocated memory, then [...] its pointer points to len initialized, contiguous elements in order (what you would see if you coerced it to a slice),
— Vec docs
Arrays coerce to slices ([T])
— Array docs
Since a Complex is memory-compatible with an array, an array's data is memory-compatible with a slice, and a Vec's data is memory-compatible with a slice, this transformation should be safe, even though the compiler cannot tell this.
This information should be attached (via a comment) to your unsafe block.
I would make some small tweaks to your function:
Having two Vecs at the same time pointing to the same data makes me very nervous. This can be trivially avoided by introducing some variables and forgetting one before creating the other.
Remove the return keyword to be more idiomatic
Add some asserts that the starting length of the data is a multiple of two.
As rodrigo points out, the capacity could easily be an odd number. To attempt to avoid this, we call shrink_to_fit. This has the downside that the Vec may need to reallocate and copy the memory, depending on the implementation.
Expand the unsafe block to cover all of the related code that is required to ensure that the safety invariants are upheld.
pub fn convert_to_complex(mut buffer: Vec<f64>) -> Vec<num_complex::Complex<f64>> {
// This is where I'd put the rationale for why this `unsafe` block
// upholds the guarantees that I must ensure. Too bad I
// copy-and-pasted from Stack Overflow without reading this comment!
unsafe {
buffer.shrink_to_fit();
let ptr = buffer.as_mut_ptr() as *mut num_complex::Complex<f64>;
let len = buffer.len();
let cap = buffer.capacity();
assert!(len % 2 == 0);
assert!(cap % 2 == 0);
std::mem::forget(buffer);
Vec::from_raw_parts(ptr, len / 2, cap / 2)
}
}
To avoid all the worrying about the capacity, you could just convert a slice into the Vec. This also doesn't have any extra memory allocation. It's simpler because we can "lose" any odd trailing values because the Vec still maintains them.
pub fn convert_to_complex(buffer: &[f64]) -> &[num_complex::Complex<f64>] {
// This is where I'd put the rationale for why this `unsafe` block
// upholds the guarantees that I must ensure. Too bad I
// copy-and-pasted from Stack Overflow without reading this comment!
unsafe {
let ptr = buffer.as_ptr() as *mut num_complex::Complex<f64>;
let len = buffer.len();
assert!(len % 2 == 0);
std::slice::from_raw_parts(ptr, len / 2)
}
}
In this code:
fn unpack_u32(data: &[u8]) -> u32 {
assert_eq!(data.len(), 4);
let res = data[0] as u32 |
(data[1] as u32) << 8 |
(data[2] as u32) << 16 |
(data[3] as u32) << 24;
res
}
fn main() {
let v = vec![0_u8, 1_u8, 2_u8, 3_u8, 4_u8, 5_u8, 6_u8, 7_u8, 8_u8];
println!("res: {:X}", unpack_u32(&v[1..5]));
}
the function unpack_u32 accepts only slices of length 4. Is there any way to replace the runtime check assert_eq with a compile time check?
Yes, kind of. The first step is easy: change the argument type from &[u8] to [u8; 4]:
fn unpack_u32(data: [u8; 4]) -> u32 { ... }
But transforming a slice (like &v[1..5]) into an object of type [u8; 4] is hard. You can of course create such an array simply by specifying all elements, like so:
unpack_u32([v[1], v[2], v[3], v[4]]);
But this is rather ugly to type and doesn't scale well with array size. So the question is "How to get a slice as an array in Rust?". I used a slightly modified version of Matthieu M.'s answer to said question (playground):
fn unpack_u32(data: [u8; 4]) -> u32 {
// as before without assert
}
use std::convert::AsMut;
fn clone_into_array<A, T>(slice: &[T]) -> A
where A: Default + AsMut<[T]>,
T: Clone
{
assert_eq!(slice.len(), std::mem::size_of::<A>()/std::mem::size_of::<T>());
let mut a = Default::default();
<A as AsMut<[T]>>::as_mut(&mut a).clone_from_slice(slice);
a
}
fn main() {
let v = vec![0_u8, 1, 2, 3, 4, 5, 6, 7, 8];
println!("res: {:X}", unpack_u32(clone_into_array(&v[1..5])));
}
As you can see, there is still an assert and thus the possibility of runtime failure. The Rust compiler isn't able to know that v[1..5] is 4 elements long, because 1..5 is just syntactic sugar for Range which is just a type the compiler knows nothing special about.
I think the answer is no as it is; a slice doesn't have a size (or minimum size) as part of the type, so there's nothing for the compiler to check; and similarly a vector is dynamically sized so there's no way to check at compile time that you can take a slice of the right size.
The only way I can see for the information to be even in principle available at compile time is if the function is applied to a compile-time known array. I think you'd still need to implement a procedural macro to do the check (so nightly Rust only, and it's not easy to do).
If the problem is efficiency rather than compile-time checking, you may be able to adjust your code so that, for example, you do one check for n*4 elements being available before n calls to your function; you could use the unsafe get_unchecked to avoid later redundant bounds checks. Obviously you'd need to be careful to avoid mistakes in the implementation.
I had a similar problem, creating a fixed byte-array on stack corresponding to const length of other byte-array (which may change during development time)
A combination of compiler plugin and macro was the solution:
https://github.com/frehberg/rust-sizedbytes
I'm trying to initialize a boxed slice of None values, such that the underlying type T does not need to implement Clone or Copy. Here a few ideal solutions:
fn by_vec<T>() -> Box<[Option<T>]> {
vec![None; 5].into_boxed_slice()
}
fn by_arr<T>() -> Box<[Option<T>]> {
Box::new([None; 5])
}
Unfortunately, the by_vec implementation requires T: Clone and the by_arr implemenation requires T: Copy. I've experimented with a few more approaches:
fn by_vec2<T>() -> Box<[Option<T>]> {
let v = &mut Vec::with_capacity(5);
for i in 0..v.len() {
v[i] = None;
}
v.into_boxed_slice() // Doesn't work: cannot move out of borrowed content
}
fn by_iter<T>() -> Box<[Option<T>]> {
(0..5).map(|_| None).collect::<Vec<Option<T>>>().into_boxed_slice()
}
by_vec2 doesn't get past the compiler (I'm not sure I understand why), but by_iter does. I'm concerned about the performance of collect -- will it need to resize the vector it is collecting into as it iterates, or can it allocate the correct sized vector to begin with?
Maybe I'm going about this all wrong -- I'm very new to Rust, so any tips would be appreciated!
Let's start with by_vec2. You are taking a &mut reference to a Vec. You shouldn't do that, work directly with the Vec and make the v binding mutable.
Then you are iterating over the length of a Vec with a capacity of 5 and a length of 0. That means your loop never gets executed. What you wanted was to iterate over 0..v.cap().
Since your v is still of length 0, accessing v[i] in the loop will panic at runtime. What you actually want is v.push(None). This would normally cause reallocations, but in your case you already allocated with Vec::with_capacity, so pushing 5 times will not allocate.
This time around we did not take a reference to the Vec so into_boxed_slice will actually work.
fn by_vec2<T>() -> Box<[Option<T>]> {
let mut v = Vec::with_capacity(5);
for _ in 0..v.capacity() {
v.push(None);
}
v.into_boxed_slice()
}
Your by_iter function actually only allocates once. The Range iterator created by 0..5 knows that is exactly 5 elements long. So collect will in fact check that length and allocate only once.