How to implement Ruby's unpack in Rust?

I'm struggling to figure out how to implement the following unpack('IIII') Ruby statement in Rust.
require 'digest'
md5_digest_unpacked = Digest::MD5.digest(someString + "\x00").unpack('IIII')
I have gotten as far as generating the md5 portion with the following. The digests are the same between Ruby and Rust.
let digest = md5::compute(format!("{}{}", &someString, "\x00"));
However, I'm not sure how to implement unpack('IIII').

As far as I know Rust does not have a drop-in replacement for unpack, but there are two ways to get equivalent behavior here.
The Safe Way
use std::mem;
use std::convert::TryInto;
let mut dest = [0u32; 4];
let mut iter = digest.0.chunks(mem::size_of::<u32>())
.map(|chunk| u32::from_ne_bytes(chunk.try_into().unwrap()));
dest.fill_with(|| iter.next().unwrap());
let [a, b, c, d] = dest;
The upside is that it's safe. The downside is that it needs a couple of unwraps, but they cannot fail given that digest.0 is a [u8; 16], and they should be optimized out.
The Unsafe Way
Since you're converting to the native endian, you can just transmute the digest:
let [a, b, c, d] = unsafe { std::mem::transmute::<_, [u32; 4]>(digest.0) };
This transmute is sound because [u32; 4] and [u8; 16] have the same size and are both plain old data (POD). You can find safe wrappers for these kinds of conversions in the bytemuck crate if you're fine with adding another dependency.
Edit: with -C opt-level=3, both methods have the same generated assembly.
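As a rough sketch of the safe approach that does not depend on the md5 crate, the conversion can be wrapped in a small helper that takes the 16 digest bytes directly. The name unpack_iiii is hypothetical, and the code assumes Ruby's 'I' directive is a 32-bit native-endian unsigned integer, which holds on common platforms.

```rust
/// Hypothetical helper: splits 16 bytes into four native-endian u32 values,
/// mirroring Ruby's `unpack('IIII')` where `I` is a 32-bit unsigned int.
fn unpack_iiii(bytes: [u8; 16]) -> [u32; 4] {
    let mut out = [0u32; 4];
    // chunks_exact(4) yields exactly four 4-byte chunks for a 16-byte input.
    for (slot, chunk) in out.iter_mut().zip(bytes.chunks_exact(4)) {
        let mut word = [0u8; 4];
        word.copy_from_slice(chunk);
        *slot = u32::from_ne_bytes(word);
    }
    out
}
```

With the md5 crate from the question, this would presumably be called as unpack_iiii(digest.0).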

Related

A nicer way to create an ad hoc iterator than "vec![a,b,c].into_iter()"?

The expression vec![a, b, c].into_iter() seems unnecessarily long. Is there something better?
The doc for itertools kmerge gives this example:
use itertools::Itertools;
let a = (0..6).step(3);
let b = (1..6).step(3);
let c = (2..6).step(3);
let it = vec![a, b, c].into_iter().kmerge();
itertools::assert_equal(it, vec![0, 1, 2, 3, 4, 5]);
When you just want to create an ad hoc iterator of values, is there anything better?
Maybe something like iter![a,b,c] (if iter is taken, then some other word).
If this is possible, I've got to believe that someone's already done it, but I can't find anything.

An array works just as well as a Vec and avoids the heap allocation.
[a, b, c].into_iter()
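Perhaps the documentation predates Rust 1.53, when by-value IntoIterator for arrays was stabilized. Note that in editions before 2021 the method-call form [a, b, c].into_iter() still iterates by reference; IntoIterator::into_iter([a, b, c]) yields values in every edition.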
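A minimal std-only sketch of using an array as an ad hoc iterator of inputs follows. It does not use itertools; merged_sorted is a hypothetical function that simply flattens and sorts, standing in for kmerge. The fully qualified IntoIterator::into_iter call is used so that the array is consumed by value regardless of edition.

```rust
/// Hypothetical example: combine three inputs through an array instead of a Vec.
/// Flatten-then-sort is a stand-in here, not itertools' kmerge.
fn merged_sorted(a: Vec<u32>, b: Vec<u32>, c: Vec<u32>) -> Vec<u32> {
    // The array lives on the stack; no extra heap allocation for the container.
    let mut all: Vec<u32> = IntoIterator::into_iter([a, b, c]).flatten().collect();
    all.sort_unstable();
    all
}
```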

Thanks for all the help! Putting it all together, we can make [a, b, c].kmerge() work:
use itertools::Itertools;
let a = (0..6).step_by(3);
let b = (1..6).step_by(3);
let c = (2..6).step_by(3);
let it = [a, b, c].kmerge();
itertools::assert_equal(it, [0, 1, 2, 3, 4, 5]);
The extra step is to extend all IntoIterator's:
use itertools::kmerge;
use itertools::KMerge;
impl<I: IntoIterator> IntoItertoolsPlus for I {}
pub trait IntoItertoolsPlus: IntoIterator {
fn kmerge(self) -> KMerge<<Self::Item as IntoIterator>::IntoIter>
where
Self: IntoIterator + Sized,
Self::Item: IntoIterator,
<<Self as IntoIterator>::Item as IntoIterator>::Item: PartialOrd,
{
kmerge(self)
}
}
I'm going to use this method for two of my own functions. They both accept a variable number of same-type inputs.

Is there a way to create a mutable reference for a type that doesn't match underlying storage?

Consider a struct that is implemented as a [u8; 2]. Is it possible to construct a &mut u16 mutable reference to the whole struct? Is there a safe way to do it?
As an alternative way of phrasing this, is there a way to implement:
fn ref_all(&mut [u8; 2]) -> &mut u16
Is there a way to do this in general for custom types as well?

There is no perfectly safe method to do this, but there is align_to_mut (and its immutable counterpart align_to) defined for slices, which works for all types and is a safer alternative to the big hammer of mem::transmute:
fn ref_all(x: &mut [u8; 2]) -> &mut u16 {
let (prefix, chunks, suffix) = unsafe {x.align_to_mut::<u16>()};
// you don't need these asserts but know that chunks might not always have an element
assert!(prefix.is_empty());
assert!(suffix.is_empty());
assert_eq!(chunks.len(), 1);
&mut chunks[0]
}
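For u16 this can work, but there are two caveats: the result depends on the platform's endianness, and a [u8; 2] is only guaranteed 1-byte alignment, so the asserts above will fire if the array happens not to be 2-byte aligned. For other types it would be very risky to do something like this.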
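If a real &mut u16 into the storage is not strictly required, a different technique avoids unsafe entirely: copy the bytes out into a u16, let the caller mutate it, and write the bytes back. This is only a sketch of that alternative, not the align_to_mut approach; with_u16 is a hypothetical name, and it sidesteps the alignment question because it never reinterprets the array in place.

```rust
/// Hypothetical helper: gives a closure temporary `&mut u16` access to a
/// value decoded from `bytes`, then stores the (possibly changed) value back.
fn with_u16<R>(bytes: &mut [u8; 2], f: impl FnOnce(&mut u16) -> R) -> R {
    // Decode using native endianness, matching what an in-place cast would see.
    let mut value = u16::from_ne_bytes(*bytes);
    let result = f(&mut value);
    // Encode the updated value back into the original storage.
    *bytes = value.to_ne_bytes();
    result
}
```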

Ergonomics issues with fixed size byte arrays in Rust

Rust sadly cannot produce a fixed size array [u8; 16] with a fixed size slicing operator s[0..16]. It'll throw errors like "expected array of 16 elements, found slice".
I've some KDFs that output several keys in wrapper structs like
pub struct LeafKey([u8; 16]);
pub struct MessageKey([u8; 32]);
fn kdfLeaf(...) -> (MessageKey,LeafKey) {
// let mut r: [u8; 32+16];
let mut r: (MessageKey, LeafKey);
debug_assert_eq!(mem::size_of_val(&r), 384/8);
let mut sha = Sha3::sha3_384();
sha.input(...);
// sha.result(r);
sha.result(
unsafe { mem::transmute::<&mut (MessageKey, LeafKey),&mut [u8;32+16]>(&r) }
);
sha.reset();
// (MessageKey(r[0..31]), LeafKey(r[32..47]))
r
}
Is there a safer way to do this? We know mem::transmute will refuse to compile if the types do not have the same size, but that only checks that pointers have the same size here, so I added that debug_assert.
In fact, I'm not terribly worried about extra copies though since I'm running SHA3 here, but afaik rust offers no ergonomic way to copy amongst byte arrays.
Can I avoid writing (MessageKey, LeafKey) three times here? Is there a type alias for the return type of the current function? Is it safe to use _ in the mem::transmute given that I want the code to refuse to compile if the sizes do not match? Yes, I know I could make a type alias, but that seems silly.
As an aside, there is a longer discussion of s[0..16] not having type [u8; 16] here

There's the copy_from_slice method.
fn main() {
use std::default::Default;
// Using 16+8 because Default isn't implemented
// for [u8; 32+16] due to type explosion unfortunateness
let b: [u8; 24] = Default::default();
let mut c: [u8; 16] = Default::default();
let mut d: [u8; 8] = Default::default();
c.copy_from_slice(&b[..16]);
d.copy_from_slice(&b[16..16+8]);
}
Note that copy_from_slice panics at runtime if the two slices are not the same length, so make sure you test this thoroughly, or use the lengths of the other arrays as a guard.
Unfortunately, at the time of writing, c.copy_from_slice(&b[..c.len()]) did not compile because the borrow checker considered c to be borrowed both immutably and mutably at the same time.
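For the KDF case in the question, one possible safe sketch splits the 48-byte output with split_at and converts each half with try_into. The function name split_keys is hypothetical, and the struct fields are made public here only so the example is easy to inspect; the hashing step itself is left out.

```rust
use std::convert::TryInto;

pub struct MessageKey(pub [u8; 32]);
pub struct LeafKey(pub [u8; 16]);

/// Hypothetical helper: splits a 48-byte hash output into the two key types.
fn split_keys(r: &[u8; 48]) -> (MessageKey, LeafKey) {
    let (head, tail) = r.split_at(32);
    // These conversions cannot fail: 32 + 16 == 48, so the halves have the exact lengths.
    let message: [u8; 32] = head.try_into().expect("head is exactly 32 bytes");
    let leaf: [u8; 16] = tail.try_into().expect("tail is exactly 16 bytes");
    (MessageKey(message), LeafKey(leaf))
}
```

This costs one extra copy of 48 bytes, which the question already describes as acceptable next to a SHA3 computation.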

I marked the accepted answer as best since it's safe, and led me to the clone_into_array answer here, but..
Another idea that improves safety is to make a version of mem::transmute for references that checks the sizes of the referenced types, as opposed to just the pointers. It might look like:
#[inline]
unsafe fn transmute_ptr_mut<A,B>(v: &mut A) -> &mut B {
debug_assert_eq!(core::mem::size_of::<A>(), core::mem::size_of::<B>());
core::mem::transmute::<&mut A,&mut B>(v)
}
I have raised an issue on the arrayref crate to discuss this, as arrayref might be a reasonable crate for it to live in.
Update: We've a new "best answer" by the arrayref crate developer:
let (a,b) = array_refs![&r,32,16];
(MessageKey(*a), LeafKey(*b))

Take slice of certain length known at compile time

In this code:
fn unpack_u32(data: &[u8]) -> u32 {
assert_eq!(data.len(), 4);
let res = data[0] as u32 |
(data[1] as u32) << 8 |
(data[2] as u32) << 16 |
(data[3] as u32) << 24;
res
}
fn main() {
let v = vec![0_u8, 1_u8, 2_u8, 3_u8, 4_u8, 5_u8, 6_u8, 7_u8, 8_u8];
println!("res: {:X}", unpack_u32(&v[1..5]));
}
the function unpack_u32 accepts only slices of length 4. Is there any way to replace the runtime check assert_eq with a compile time check?

Yes, kind of. The first step is easy: change the argument type from &[u8] to [u8; 4]:
fn unpack_u32(data: [u8; 4]) -> u32 { ... }
But transforming a slice (like &v[1..5]) into an object of type [u8; 4] is hard. You can of course create such an array simply by specifying all elements, like so:
unpack_u32([v[1], v[2], v[3], v[4]]);
But this is rather ugly to type and doesn't scale well with array size. So the question is "How to get a slice as an array in Rust?". I used a slightly modified version of Matthieu M.'s answer to said question (playground):
fn unpack_u32(data: [u8; 4]) -> u32 {
// as before without assert
}
use std::convert::AsMut;
fn clone_into_array<A, T>(slice: &[T]) -> A
where A: Default + AsMut<[T]>,
T: Clone
{
assert_eq!(slice.len(), std::mem::size_of::<A>()/std::mem::size_of::<T>());
let mut a = Default::default();
<A as AsMut<[T]>>::as_mut(&mut a).clone_from_slice(slice);
a
}
fn main() {
let v = vec![0_u8, 1, 2, 3, 4, 5, 6, 7, 8];
println!("res: {:X}", unpack_u32(clone_into_array(&v[1..5])));
}
As you can see, there is still an assert and thus the possibility of runtime failure. The Rust compiler isn't able to know that v[1..5] is 4 elements long, because 1..5 is just syntactic sugar for Range which is just a type the compiler knows nothing special about.
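On newer Rust versions the slice-to-array step can also be done with the standard TryFrom implementation for arrays, which reports a length mismatch as an error instead of asserting. The sketch below assumes a hypothetical wrapper unpack_u32_at and keeps the little-endian byte order of the original unpack_u32; the check still happens at runtime, it is just surfaced as an Option.

```rust
use std::convert::TryFrom;

/// Same result as the shift-based version in the question (little-endian).
fn unpack_u32(data: [u8; 4]) -> u32 {
    u32::from_le_bytes(data)
}

/// Hypothetical wrapper: reads four bytes starting at `start`,
/// returning None instead of panicking when the slice is too short.
fn unpack_u32_at(bytes: &[u8], start: usize) -> Option<u32> {
    let window = bytes.get(start..start.checked_add(4)?)?;
    let array = <[u8; 4]>::try_from(window).ok()?;
    Some(unpack_u32(array))
}
```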

I think the answer is no as it is; a slice doesn't have a size (or minimum size) as part of the type, so there's nothing for the compiler to check; and similarly a vector is dynamically sized so there's no way to check at compile time that you can take a slice of the right size.
The only way I can see for the information to be even in principle available at compile time is if the function is applied to a compile-time known array. I think you'd still need to implement a procedural macro to do the check (so nightly Rust only, and it's not easy to do).
If the problem is efficiency rather than compile-time checking, you may be able to adjust your code so that, for example, you do one check for n*4 elements being available before n calls to your function; you could use the unsafe get_unchecked to avoid later redundant bounds checks. Obviously you'd need to be careful to avoid mistakes in the implementation.
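The "check once, then unpack n values" idea can be sketched without unsafe by using chunks_exact instead of get_unchecked; this is a swapped-in technique, not what the paragraph above literally proposes. The function name unpack_all_u32 is hypothetical and the byte order is assumed to be little-endian, as in the question.

```rust
/// Hypothetical helper: validates the total length once, then decodes
/// every 4-byte group as a little-endian u32.
fn unpack_all_u32(data: &[u8]) -> Option<Vec<u32>> {
    if data.len() % 4 != 0 {
        return None;
    }
    Some(
        data.chunks_exact(4)
            // Each chunk is guaranteed to hold exactly four bytes.
            .map(|c| u32::from_le_bytes([c[0], c[1], c[2], c[3]]))
            .collect(),
    )
}
```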

I had a similar problem: creating a fixed-size byte array on the stack whose length matches the const length of another byte array (which may change during development).
A combination of a compiler plugin and a macro was the solution:
https://github.com/frehberg/rust-sizedbytes

Using pointer casting to change the “type” of data in memory [duplicate]

I am reading raw data from a file and I want to convert it to an integer:
fn main() {
let buf: &[u8] = &[0, 0, 0, 1];
let num = slice_to_i8(buf);
println!("1 == {}", num);
}
pub fn slice_to_i8(buf: &[u8]) -> i32 {
unimplemented!("what should I do here?")
}
I would do a cast in C, but what do I do in Rust?

I'd suggest using the byteorder crate (which also works in a no-std environment):
use byteorder::{BigEndian, ReadBytesExt}; // 1.2.7
fn main() {
let mut buf: &[u8] = &[0, 0, 0, 1];
let num = buf.read_u32::<BigEndian>().unwrap();
assert_eq!(1, num);
}
This handles oddly-sized slices and automatically advances the buffer so you can read multiple values.
As of Rust 1.32, you can also use the from_le_bytes / from_be_bytes / from_ne_bytes inherent methods on integers:
fn main() {
let buf = [0, 0, 0, 1];
let num = u32::from_be_bytes(buf);
assert_eq!(1, num);
}
These methods only handle fixed-length arrays to avoid dealing with the error when not enough data is present. If you have a slice, you will need to convert it into an array.
See also:
How to get a slice as an array in Rust?
How to convert a slice into an array reference?
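For readers who want the "advance the buffer" behavior without an external crate, a small std-only sketch is possible. The helper read_u32_be is hypothetical and is not part of byteorder; it returns None when fewer than four bytes remain rather than an io::Error.

```rust
/// Hypothetical helper: reads a big-endian u32 from the front of `buf`
/// and advances `buf` past the consumed bytes.
fn read_u32_be(buf: &mut &[u8]) -> Option<u32> {
    let data: &[u8] = *buf;
    if data.len() < 4 {
        return None;
    }
    let (head, rest) = data.split_at(4);
    let value = u32::from_be_bytes([head[0], head[1], head[2], head[3]]);
    // Move the caller's view forward so the next read continues after this value.
    *buf = rest;
    Some(value)
}
```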

I'd like to add this answer to contribute the following additional details:
A working code snippet which converts slice to integer (two ways to do it).
A working solution in no_std environment.
To keep everything in one place for the people getting here from the search engine.
Without external crates, the following methods are suitable to convert from slices to integer even for no_std build starting from Rust 1.32:
Method 1 (try_into + from_be_bytes)
use core::convert::TryInto;
let src = [1, 2, 3, 4, 5, 6, 7];
// 0x03040506
u32::from_be_bytes(src[2..6].try_into().unwrap());
use core::convert::TryInto; is for a no_std build. With the standard library, use std::convert::TryInto; instead.
(Endianness has already been covered, but to keep it in one place: choose from_le_bytes, from_be_bytes, or from_ne_bytes depending on how the integer is represented in memory.)
Method 2 (clone_from_slice + from_be_bytes)
let src = [1, 2, 3, 4, 5, 6, 7];
let mut dst = [0u8; 4];
dst.clone_from_slice(&src[2..6]);
// 0x03040506
u32::from_be_bytes(dst);
Result
In both cases integer will be equal to 0x03040506.

This custom serialize_deserialize_u8_i32 library will safely convert back and forth between u8 array and i32 array i.e. the serialise function will take all of your u8 values and pack them into i32 values and the deserialise function will take this library’s custom i32 values and convert them back to the original u8 values that you started with.
This was built for a specific purpose, however it may come in handy for other uses; depending on whether you want/need a fast/custom converter like this.
https://github.com/second-state/serialize_deserialize_u8_i32

Here’s my implementation (for a different use case) that discards any additional bytes beyond 8 (and therefore doesn’t need to panic if not exact):
pub fn u64_from_slice(slice: &[u8]) -> u64 {
u64::from_ne_bytes(slice.split_at(8).0.try_into().unwrap())
}
The split_at() method returns a tuple of two slices: one from index 0 up to the specified index and the other from the specified index to the end. Using .0 to access the first member of the tuple returned by .split_at(8) ensures that only the first 8 bytes are passed to u64::from_ne_bytes(), discarding the leftovers. The code then calls try_into on that .0 tuple member and unwraps the result, which cannot fail because split_at already panics if the slice is shorter than 8 bytes.
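If a panic on short input is not wanted, a variant can return an Option instead. This is a sketch under that assumption; u64_from_prefix is a hypothetical name, and it uses get(..8) in place of split_at(8).

```rust
use std::convert::TryInto;

/// Hypothetical non-panicking variant: decodes the first 8 bytes as a
/// native-endian u64, or returns None if the slice is shorter than 8 bytes.
pub fn u64_from_prefix(slice: &[u8]) -> Option<u64> {
    // get(..8) yields None for short input instead of panicking.
    let head: [u8; 8] = slice.get(..8)?.try_into().ok()?;
    Some(u64::from_ne_bytes(head))
}
```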
