Initialize rest of array with a default value - rust

Is there a way in Rust to initialize the first n elements of an array manually, and specify a default value to be used for the rest?
Specifically, when initializing structs, we can specify some fields, and use .. to initialize the remaining fields from another struct, e.g.:
let foo = Foo {
    x: 1,
    y: 2,
    ..Default::default()
};
Is there a similar mechanism for initializing part of an array manually? e.g.
let arr: [i32; 5] = [1, 2, ..3];
to get [1, 2, 3, 3, 3]?

Edit: I realized this can be done on stable. For the original answer, see below.
I had to juggle with the compiler so it will be able to infer the type of the array, but it works:
// A workaround for the equivalent method on `MaybeUninit` being unstable.
// Copy-paste from https://doc.rust-lang.org/stable/src/core/mem/maybe_uninit.rs.html#943-953.
pub unsafe fn maybe_uninit_array_assume_init<T, const N: usize>(
    array: [core::mem::MaybeUninit<T>; N],
) -> [T; N] {
    // SAFETY:
    // * The caller guarantees that all elements of the array are initialized
    // * `MaybeUninit<T>` and T are guaranteed to have the same layout
    // * `MaybeUninit` does not drop, so there are no double-frees
    // And thus the conversion is safe
    (&array as *const _ as *const [T; N]).read()
}
macro_rules! array_with_default {
    (#count) => { 0usize };
    (#count $e:expr, $($rest:tt)*) => { 1usize + array_with_default!(#count $($rest)*) };
    [$($e:expr),* ; $default:expr; $default_size:expr] => {{
        // There is no hygiene for items, so we use unique names here.
        #[allow(non_upper_case_globals)]
        const __array_with_default_EXPRS_LEN: usize = array_with_default!(#count $($e,)*);
        #[allow(non_upper_case_globals)]
        const __array_with_default_DEFAULT_SIZE: usize = $default_size;
        let mut result = unsafe { ::core::mem::MaybeUninit::<
            [::core::mem::MaybeUninit<_>; {
                __array_with_default_EXPRS_LEN + __array_with_default_DEFAULT_SIZE
            }],
        >::uninit().assume_init() };
        let mut dest = result.as_mut_ptr();
        $(
            let expr = $e;
            unsafe {
                ::core::ptr::write((*dest).as_mut_ptr(), expr);
                dest = dest.add(1);
            }
        )*
        for default_value in [$default; __array_with_default_DEFAULT_SIZE] {
            unsafe {
                ::core::ptr::write((*dest).as_mut_ptr(), default_value);
                dest = dest.add(1);
            }
        }
        unsafe { maybe_uninit_array_assume_init(result) }
    }};
}
Based on the example from @Denys, here is a macro that works on nightly. Note that I had problems matching the .. syntax (I'm not sure it's impossible; I just didn't put much time into it):
#![feature(generic_const_exprs)]
#![allow(incomplete_features)]
use std::mem::MaybeUninit;
pub fn concat_arrays<T, const N: usize, const M: usize>(a: [T; N], b: [T; M]) -> [T; N + M] {
    unsafe {
        let mut result = MaybeUninit::<[T; N + M]>::uninit();
        let dest = result.as_mut_ptr().cast::<[T; N]>();
        dest.write(a);
        let dest = dest.add(1).cast::<[T; M]>();
        dest.write(b);
        result.assume_init()
    }
}
macro_rules! array_with_default {
    [$($e:expr),* ; $default:expr; $default_size:expr] => {
        concat_arrays([$($e),*], [$default; $default_size])
    };
}
fn main() {
    dbg!(array_with_default![1, 2; 3; 7]);
}

As another option, you can build a default-filled array and just modify the positions you require at runtime:
#![feature(explicit_generic_args_with_impl_trait)]
fn array_with_default_and_positions<T: Copy, const SIZE: usize>(
    default: T,
    init_values: impl IntoIterator<Item = (usize, T)>,
) -> [T; SIZE] {
    let mut res = [default; SIZE];
    for (i, e) in init_values.into_iter() {
        res[i] = e;
    }
    res
}
Notice the use of #![feature(explicit_generic_args_with_impl_trait)], which requires nightly. It can be avoided by taking a slice instead, since T and usize are both Copy:
fn array_with_default_and_positions_v2<T: Copy, const SIZE: usize>(
    default: T,
    init_values: &[(usize, T)],
) -> [T; SIZE] {
    let mut res = [default; SIZE];
    for &(i, e) in init_values.into_iter() {
        res[i] = e;
    }
    res
}
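For completeness, here is a minimal safe sketch that needs neither a macro nor unsafe, assuming a toolchain where std::array::from_fn is available (Rust 1.63 or later, as far as I know). The name array_with_head is hypothetical and not taken from the answers above; it fills the first positions from a slice and pads the rest with a clone of the default.

```rust
use std::array;

/// Builds `[T; N]` from the leading values in `head`, padding with `default`.
/// `array_with_head` is a hypothetical helper name used only for this sketch.
/// Values in `head` beyond index N - 1 are ignored.
fn array_with_head<T: Clone, const N: usize>(head: &[T], default: T) -> [T; N] {
    // `from_fn` calls the closure once per index, in order 0..N.
    array::from_fn(|i| head.get(i).cloned().unwrap_or_else(|| default.clone()))
}
```

With this, `let arr: [i32; 5] = array_with_head(&[1, 2], 3);` yields `[1, 2, 3, 3, 3]`. The trade-off compared with the macro is that T must be Clone and the head values are cloned rather than moved.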

Conditionally sort a Vec in Rust

Let's say I want to sort a Vec of non-Clone items - but only maybe (this is a boiled down example of an issue in my code).
My attempt would be something like:
fn maybe_sort<T>(x: Vec<T>) -> Vec<T>
where
    T: std::cmp::Ord,
{
    // First, I need a copy of the vector - but only the vector, not the items inside
    let mut copied = x.iter().collect::<Vec<_>>();
    copied.sort();
    // In my actual code the line below depends on the sorted vec
    if rand::random() {
        return copied.into_iter().map(|x| *x).collect::<Vec<_>>();
    } else {
        return x;
    }
}
Alas, the borrow checker isn't happy. I have a shared reference to each item in the Vec, and although I never return two references to the same item, Rust can't tell.
Is there a way to do this without unsafe? (And if not, what's the cleanest way to do it with unsafe?)
You can .enumerate() the values to keep their original index. You can sort this based on its value T and decide whether to return the sorted version, or reverse the sort by sorting by original index.
fn maybe_sort<T: Ord>(x: Vec<T>) -> Vec<T> {
    let mut items: Vec<_> = x.into_iter().enumerate().collect();
    items.sort_by(|(_, a), (_, b)| a.cmp(b));
    if rand::random() {
        // return items in current order
    }
    else {
        // undo the sort
        items.sort_by_key(|(index, _)| *index);
    }
    items.into_iter().map(|(_, value)| value).collect()
}
If T implements Default, you can do it with a single sort and without unsafe like this:
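fn maybe_sort<T: Ord + Default>(mut x: Vec<T>) -> Vec<T> {
    let mut idx = (0..x.len()).collect::<Vec<_>>();
    idx.sort_by_key(|&i| &x[i]);
    if rand::random() {
        return x;
    } else {
        // idx[pos] is the source index of the element that belongs at `pos`,
        // so invert it: dest[src] is where the element at `src` must go.
        let mut dest = vec![0; x.len()];
        for (pos, &src) in idx.iter().enumerate() {
            dest[src] = pos;
        }
        let mut r = Vec::new();
        r.resize_with(x.len(), Default::default);
        for (i, v) in dest.into_iter().zip(x.drain(..)) {
            r[i] = v;
        }
        return r;
    }
}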
If T does not implement Default, the same thing can be done with MaybeUninit:
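use std::mem::{ManuallyDrop, MaybeUninit};
fn maybe_sort<T: Ord>(mut x: Vec<T>) -> Vec<T> {
    let mut idx = (0..x.len()).collect::<Vec<_>>();
    idx.sort_by_key(|&i| &x[i]);
    if rand::random() {
        return x;
    } else {
        // Invert the permutation: dest[src] is the sorted position of x[src].
        let mut dest = vec![0; x.len()];
        for (pos, &src) in idx.iter().enumerate() {
            dest[src] = pos;
        }
        let mut r: Vec<MaybeUninit<T>> = Vec::new();
        r.resize_with(x.len(), MaybeUninit::uninit);
        for (i, v) in dest.into_iter().zip(x.drain(..)) {
            r[i] = MaybeUninit::new(v);
        }
        // SAFETY: `dest` is a permutation of 0..len, so every slot has been
        // initialized, and `MaybeUninit<T>` has the same layout as `T`.
        // The layout of `Vec` itself is not guaranteed, so rebuild it from
        // its raw parts instead of transmuting the whole `Vec`.
        let mut r = ManuallyDrop::new(r);
        return unsafe { Vec::from_raw_parts(r.as_mut_ptr().cast::<T>(), r.len(), r.capacity()) };
    }
}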
Finally, here's a safe solution which doesn't require T to implement Default, but allocates an extra buffer (there is theoretically a way to reorder the indices in place, but I'll leave it as an exercise to the reader ☺):
fn maybe_sort<T: Ord>(mut x: Vec<T>) -> Vec<T> {
    let mut idx = (0..x.len()).collect::<Vec<_>>();
    idx.sort_by_key(|&i| &x[i]);
    if rand::random() {
        let mut rev = vec![0; x.len()];
        for (i, &j) in idx.iter().enumerate() {
            rev[j] = i;
        }
        for i in 0..x.len() {
            while rev[i] != i {
                let j = rev[i];
                x.swap(j, i);
                rev.swap(j, i);
            }
        }
    }
    x
}
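Another safe variant, sketched here as an alternative rather than as any of the answers' own methods: sort the indices as above, then wrap each element in an Option so it can be moved out of its slot in sorted order. It needs neither Default nor unsafe, at the cost of one extra Vec<Option<T>>. The keep_sorted flag and the name maybe_sort_with are hypothetical; the flag stands in for the rand::random() decision so the behaviour is deterministic.

```rust
/// Sorts by moving each element out of an `Option` slot in sorted index order.
/// `keep_sorted` is a hypothetical stand-in for the `rand::random()` decision.
fn maybe_sort_with<T: Ord>(x: Vec<T>, keep_sorted: bool) -> Vec<T> {
    // idx[pos] is the index in `x` of the element that belongs at `pos`.
    let mut idx: Vec<usize> = (0..x.len()).collect();
    idx.sort_by_key(|&i| &x[i]);
    if !keep_sorted {
        return x;
    }
    // Wrap every element so it can be taken out of its slot exactly once.
    let mut slots: Vec<Option<T>> = x.into_iter().map(Some).collect();
    idx.into_iter()
        .map(|i| slots[i].take().expect("each index occurs exactly once"))
        .collect()
}
```

Calling `maybe_sort_with(vec![3, 1, 2], true)` gives `[1, 2, 3]`, while passing `false` returns the input unchanged.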

How to write a macro that splits a byte into a tuple of bits of user-specified count?

I would like to have macro splitting one byte into tuple with 2-8 u8 parts using bitreader crate.
I managed to achieve that by following code:
use bitreader::BitReader;
trait Tupleprepend<T> {
    type ResultType;
    fn prepend(self, t: T) -> Self::ResultType;
}
macro_rules! impl_tuple_prepend {
    ( () ) => {};
    ( ( $t0:ident $(, $types:ident)* ) ) => {
        impl<$t0, $($types,)* T> Tupleprepend<T> for ($t0, $($types,)*) {
            type ResultType = (T, $t0, $($types,)*);
            fn prepend(self, t: T) -> Self::ResultType {
                let ($t0, $($types,)*) = self;
                (t, $t0, $($types,)*)
            }
        }
        impl_tuple_prepend! { ($($types),*) }
    };
}
impl_tuple_prepend! {
    (_1, _2, _3, _4, _5, _6, _7, _8)
}
macro_rules! split_byte (
    ($reader:ident, $bytes:expr, $count:expr) => {{
        ($reader.read_u8($count).unwrap(),)
    }};
    ($reader:ident, $bytes:expr, $count:expr, $($next_counts:expr),+) => {{
        let head = split_byte!($reader, $bytes, $count);
        let tail = split_byte!($reader, $bytes, $($next_counts),+);
        tail.prepend(head.0)
    }};
    ($bytes:expr $(, $count:expr)* ) => {{
        let mut reader = BitReader::new($bytes);
        split_byte!(reader, $bytes $(, $count)+)
    }};
);
Now I can use this code as I would like to:
let buf: &[u8] = &[0x72];
let (bit1, bit2, bits3to8) = split_byte!(&buf, 1, 1, 6);
Is there a way to avoid using Tupleprepend trait and create only 1 tuple instead of 8 in the worst scenario?
Because the number of bit widths directly corresponds to the number of returned values, I'd solve the problem using generics and arrays instead. The macro only exists to remove the typing of the [], which I don't really think is worth it.
fn split_byte<A>(b: u8, bit_widths: A) -> A
where
    A: Default + std::ops::IndexMut<usize, Output = u8>,
    for<'a> &'a A: IntoIterator<Item = &'a u8>,
{
    let mut result = A::default();
    let mut start = 0;
    for (idx, &width) in bit_widths.into_iter().enumerate() {
        let shifted = b >> (8 - width - start);
        let mask = (0..width).fold(0, |a, _| (a << 1) | 1);
        result[idx] = shifted & mask;
        start += width;
    }
    result
}
macro_rules! split_byte {
    ($b:expr, $($w:expr),+) => (split_byte($b, [$($w),+]));
}
fn main() {
    let [bit1, bit2, bits3_to_8] = split_byte!(0b1010_1010, 1, 1, 6);
    assert_eq!(bit1, 0b1);
    assert_eq!(bit2, 0b0);
    assert_eq!(bits3_to_8, 0b10_1010);
}
See also:
How does for<> syntax differ from a regular lifetime bound?
How to write a trait bound for adding two references of a generic type?
How do I write the lifetimes for references in a type constraint when one of them is a local reference?
If it's ok to target nightly Rust, I'd use the unstable min_const_generics feature:
#![feature(min_const_generics)]
fn split_byte<const N: usize>(b: u8, bit_widths: [u8; N]) -> [u8; N] {
    let mut result = [0; N];
    let mut start = 0;
    for (idx, &width) in bit_widths.iter().enumerate() {
        let shifted = b >> (8 - width - start);
        let mask = (0..width).fold(0, |a, _| (a << 1) | 1);
        result[idx] = shifted & mask;
        start += width;
    }
    result
}
macro_rules! split_byte {
    ($b:expr, $($w:expr),+) => (split_byte($b, [$($w),+]));
}
fn main() {
    let [bit1, bit2, bits3_to_8] = split_byte!(0b1010_1010, 1, 1, 6);
    assert_eq!(bit1, 0b1);
    assert_eq!(bit2, 0b0);
    assert_eq!(bits3_to_8, 0b10_1010);
}
See also:
Is it possible to control the size of an array using the type parameter of a generic?
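As far as I know, const generics of this form have been usable on stable since Rust 1.51, so the feature attribute should no longer be needed there. A minimal stable sketch of the same function follows. It assumes each width is between 1 and 8 and that the widths sum to at most 8 (a width of 0 as the first entry would shift a u8 by 8 bits, which panics in debug builds). Computing the mask with a shift in u16 is a substitution of mine for the fold used above.

```rust
/// Splits `b` into consecutive bit fields, most significant bits first.
/// Assumes every width is in 1..=8 and that the widths sum to at most 8.
fn split_byte<const N: usize>(b: u8, bit_widths: [u8; N]) -> [u8; N] {
    let mut result = [0u8; N];
    let mut start = 0u8;
    for (slot, &width) in result.iter_mut().zip(bit_widths.iter()) {
        // Move the field down so that its lowest bit is bit 0.
        let shifted = b >> (8 - width - start);
        // Build the mask in u16 so that a width of 8 does not overflow.
        let mask = ((1u16 << width) - 1) as u8;
        *slot = shifted & mask;
        start += width;
    }
    result
}
```

For example, `split_byte(0b1010_1010, [1, 1, 6])` returns `[0b1, 0b0, 0b10_1010]`, and the result can be destructured with `let [bit1, bit2, bits3_to_8] = ...` as in the answer above.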

Union-Find implementation does not update parent tags

I'm trying to create some sets of Strings and then merge some of these sets so that they have the same tag (of type usize). Once I initialize the map, I start adding strings:
self.clusters.make_set("a");
self.clusters.make_set("b");
When I call self.clusters.find("a") and self.clusters.find("b"), different values are returned, which is fine because I haven't merged the sets yet. Then I call the following method to merge two sets
let _ = self.clusters.union("a", "b");
If I call self.clusters.find("a") and self.clusters.find("b") now, I get the same value. However, when I call the finalize() method and try to iterate through the map, the original tags are returned, as if I never merged the sets.
self.clusters.finalize();
for (address, tag) in &self.clusters.map {
    self.clusterizer_writer.write_all(format!("{};{}\n", address,
        self.clusters.parent[*tag]).as_bytes()).unwrap();
}
// to output all keys with the same tag as a list.
let a: Vec<(usize, Vec<String>)> = {
    let mut x = HashMap::new();
    for (k, v) in self.clusters.map.clone() {
        x.entry(v).or_insert_with(Vec::new).push(k)
    }
    x.into_iter().collect()
};
I can't figure out why this is the case, but I'm relatively new to Rust; maybe it's an issue with pointers?
Instead of "a" and "b", I'm actually using something like utils::arr_to_hex(&input.outpoint.txid) of type String.
This is the Rust implementation of the Union-Find algorithm that I am using:
/// Tarjan's Union-Find data structure.
#[derive(RustcDecodable, RustcEncodable)]
pub struct DisjointSet<T: Clone + Hash + Eq> {
    set_size: usize,
    parent: Vec<usize>,
    rank: Vec<usize>,
    map: HashMap<T, usize>, // Each T entry is mapped onto a usize tag.
}
impl<T> DisjointSet<T>
where
    T: Clone + Hash + Eq,
{
    pub fn new() -> Self {
        const CAPACITY: usize = 1000000;
        DisjointSet {
            set_size: 0,
            parent: Vec::with_capacity(CAPACITY),
            rank: Vec::with_capacity(CAPACITY),
            map: HashMap::with_capacity(CAPACITY),
        }
    }
    pub fn make_set(&mut self, x: T) {
        if self.map.contains_key(&x) {
            return;
        }
        let len = &mut self.set_size;
        self.map.insert(x, *len);
        self.parent.push(*len);
        self.rank.push(0);
        *len += 1;
    }
    /// Returns Some(num), num is the tag of subset in which x is.
    /// If x is not in the data structure, it returns None.
    pub fn find(&mut self, x: T) -> Option<usize> {
        let pos: usize;
        match self.map.get(&x) {
            Some(p) => {
                pos = *p;
            }
            None => return None,
        }
        let ret = DisjointSet::<T>::find_internal(&mut self.parent, pos);
        Some(ret)
    }
    /// Implements path compression.
    fn find_internal(p: &mut Vec<usize>, n: usize) -> usize {
        if p[n] != n {
            let parent = p[n];
            p[n] = DisjointSet::<T>::find_internal(p, parent);
            p[n]
        } else {
            n
        }
    }
    /// Union the subsets to which x and y belong.
    /// If it returns Ok<u32>, it is the tag for unified subset.
    /// If it returns Err(), at least one of x and y is not in the disjoint-set.
    pub fn union(&mut self, x: T, y: T) -> Result<usize, ()> {
        let x_root;
        let y_root;
        let x_rank;
        let y_rank;
        match self.find(x) {
            Some(x_r) => {
                x_root = x_r;
                x_rank = self.rank[x_root];
            }
            None => {
                return Err(());
            }
        }
        match self.find(y) {
            Some(y_r) => {
                y_root = y_r;
                y_rank = self.rank[y_root];
            }
            None => {
                return Err(());
            }
        }
        // Implements union-by-rank optimization.
        if x_root == y_root {
            return Ok(x_root);
        }
        if x_rank > y_rank {
            self.parent[y_root] = x_root;
            return Ok(x_root);
        } else {
            self.parent[x_root] = y_root;
            if x_rank == y_rank {
                self.rank[y_root] += 1;
            }
            return Ok(y_root);
        }
    }
    /// Forces all laziness, updating every tag.
    pub fn finalize(&mut self) {
        for i in 0..self.set_size {
            DisjointSet::<T>::find_internal(&mut self.parent, i);
        }
    }
}
I think you're just not extracting the information out of your DisjointSet struct correctly.
I got sniped by this and implemented union find. First, with a basic usize implementation:
pub struct UnionFinderImpl {
    parent: Vec<usize>,
}
Then with a wrapper for more generic types:
pub struct UnionFinder<T: Hash> {
    rev: Vec<Rc<T>>,
    fwd: HashMap<Rc<T>, usize>,
    uf: UnionFinderImpl,
}
Both structs implement a groups() method that returns a Vec<Vec<>> of groups. Clone isn't required because I used Rc.
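To illustrate what "extracting the information correctly" might look like: in the question's grouping snippet the keys are bucketed by the raw tag stored in map, and that tag is assigned once in make_set and never changes. Only the parent vector is updated by union and finalize, so the grouping probably has to go through the root of each tag. The sketch below is a simplified stand-in, not the playground code from this answer; UnionFind, groups, and the absence of union-by-rank are all assumptions made for brevity.

```rust
use std::collections::HashMap;

/// Minimal union-find over usize tags (no rank, for brevity).
struct UnionFind {
    parent: Vec<usize>,
}

impl UnionFind {
    fn new(n: usize) -> Self {
        UnionFind { parent: (0..n).collect() }
    }

    /// Returns the root of `n`, compressing the path along the way.
    fn find(&mut self, n: usize) -> usize {
        let p = self.parent[n];
        if p == n {
            return n;
        }
        let root = self.find(p);
        self.parent[n] = root;
        root
    }

    fn union(&mut self, a: usize, b: usize) {
        let (ra, rb) = (self.find(a), self.find(b));
        if ra != rb {
            self.parent[ra] = rb;
        }
    }
}

/// Groups keys by the root of their tag, not by the tag stored in `map`.
/// Groups and their members are sorted to make the output deterministic.
fn groups(map: &HashMap<String, usize>, uf: &mut UnionFind) -> Vec<Vec<String>> {
    let mut by_root: HashMap<usize, Vec<String>> = HashMap::new();
    for (key, &tag) in map {
        by_root.entry(uf.find(tag)).or_default().push(key.clone());
    }
    let mut out: Vec<Vec<String>> = by_root.into_values().collect();
    for group in &mut out {
        group.sort();
    }
    out.sort();
    out
}
```

With keys "a", "b", "c" mapped to tags 0, 1, 2 and a single `union(0, 1)`, `groups` returns `[["a", "b"], ["c"]]`. Bucketing by the raw tag instead would produce three singleton groups, which matches the symptom described in the question.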

Iterating over the contents of an Option, or over a specific value

Let's say that we have the following C-code (assume that srclen == dstlen and the length is divisible by 64).
void stream(uint8_t *dst, uint8_t *src, size_t dstlen) {
    int i;
    uint8_t block[64];
    while (dstlen > 64) {
        some_function_that_initializes_block(block);
        for (i=0; i<64; i++) {
            dst[i] = ((src != NULL)?src[i]:0) ^ block[i];
        }
        dst += 64;
        dstlen -= 64;
        if (src != NULL) { src += 64; }
    }
}
That is, the function takes a source and a destination and XORs the source with some value that the function computes. When src is a NULL pointer, dst is just the computed value.
In Rust this is quite simple to do when src cannot be null; we can do something like:
fn stream(dst: &mut [u8], src: &[u8]) {
    let mut block = [0u8, ..64];
    for (dstchunk, srcchunk) in dst.chunks_mut(64).zip(src.chunks(64)) {
        some_function_that_initializes_block(block);
        for (d, (&s, &b)) in dstchunk.iter_mut().zip(srcchunk.iter().zip(block.iter())) {
            *d = s ^ b;
        }
    }
}
However let us assume that we want to be able to mimic the original C-function. Then we would like to do something like:
fn stream(dst: &mut[u8], osrc: Option<&[u8]>) {
    let srciter = match osrc {
        None => repeat(0),
        Some(src) => src.iter()
    };
    // the rest of the code as above
}
Alas, this won't work since repeat(0) and src.iter() have different types. However it doesn't seem possible to solve this by using a trait object since we get a compiler error saying cannot convert to a trait object because trait 'core::iter::Iterator' is not object safe. (also there is no function in the standard library that chunks an iterator).
Is there any nice way to solve this, or should I just duplicate the code in each arm of the match statement?
Instead of repeating the code in each arm, you can call a generic inner function:
fn stream(dst: &mut[u8], osrc: Option<&[u8]>) {
    fn inner<T>(dst: &mut[u8], srciter: T) where T: Iterator<u8> {
        let mut block = [0u8, ..64];
        //...
    }
    match osrc {
        None => inner(dst, repeat(0)),
        Some(src) => inner(dst, src.iter().map(|a| *a))
    }
}
Note the additional map to make both iterators compatible (Iterator<u8>).
As you mentioned, Iterator doesn't have a built-in way to do chunking. Let's incorporate Vladimir's solution and use an iterator over chunks:
fn stream(dst: &mut[u8], osrc: Option<&[u8]>) {
    const CHUNK_SIZE: uint = 64;
    fn inner<'a, T>(dst: &mut[u8], srciter: T) where T: Iterator<&'a [u8]> {
        let mut block = [0u8, ..CHUNK_SIZE];
        for (dstchunk, srcchunk) in dst.chunks_mut(CHUNK_SIZE).zip(srciter) {
            some_function_that_initializes_block(block);
            for (d, (&s, &b)) in dstchunk.iter_mut().zip(srcchunk.iter().zip(block.iter())) {
                *d = s ^ b;
            }
        }
    }
    static ZEROES: &'static [u8] = &[0u8, ..CHUNK_SIZE];
    match osrc {
        None => inner(dst, repeat(ZEROES)),
        Some(src) => inner(dst, src.chunks(CHUNK_SIZE))
    }
}
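The snippets in this thread use pre-1.0 syntax (for example [0u8, ..64], uint, and Iterator<u8>). As a rough sketch of how the same inner-function idea might look in current Rust: the iterator bound becomes Iterator<Item = &[u8]>, the array literal becomes [0u8; CHUNK_SIZE], and the block is passed by mutable reference. The function init_block is a hypothetical placeholder for some_function_that_initializes_block, which the question does not show.

```rust
use std::iter;

const CHUNK_SIZE: usize = 64;

/// Hypothetical stand-in for `some_function_that_initializes_block`:
/// fills the block with 0, 1, 2, ... purely for illustration.
fn init_block(block: &mut [u8; CHUNK_SIZE]) {
    for (i, b) in block.iter_mut().enumerate() {
        *b = i as u8;
    }
}

fn stream(dst: &mut [u8], osrc: Option<&[u8]>) {
    // Generic over the chunk source, so both match arms share one body.
    fn inner<'a>(dst: &mut [u8], src_chunks: impl Iterator<Item = &'a [u8]>) {
        let mut block = [0u8; CHUNK_SIZE];
        for (dst_chunk, src_chunk) in dst.chunks_mut(CHUNK_SIZE).zip(src_chunks) {
            init_block(&mut block);
            for (d, (&s, &b)) in dst_chunk.iter_mut().zip(src_chunk.iter().zip(block.iter())) {
                *d = s ^ b;
            }
        }
    }

    static ZEROES: [u8; CHUNK_SIZE] = [0; CHUNK_SIZE];
    match osrc {
        // An endless supply of all-zero chunks replaces the NULL source.
        None => inner(dst, iter::repeat(&ZEROES[..])),
        Some(src) => inner(dst, src.chunks(CHUNK_SIZE)),
    }
}
```

Because zip stops at the shorter side, a source shorter than dst leaves the trailing part of dst untouched; whether that is acceptable depends on the srclen == dstlen assumption stated in the question.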
Unfortunately, it is impossible to use different iterators directly or with trait objects (which have recently been changed to disallow instantiation of trait objects with inappropriate methods i.e. ones which use Self type in their signature). There is a workaround for your particular case, however. Just use enums:
fn stream(dst: &mut [u8], src: Option<&[u8]>) {
    static EMPTY: &'static [u8] = &[0u8, ..64];
    enum DifferentIterators<'a> {
        FromSlice(std::slice::Chunks<'a, u8>),
        FromRepeat(std::iter::Repeat<&'a [u8]>)
    }
    impl<'a> Iterator<&'a [u8]> for DifferentIterators<'a> {
        #[inline]
        fn next(&mut self) -> Option<&'a [u8]> {
            match *self {
                FromSlice(ref mut i) => i.next(),
                FromRepeat(ref mut i) => i.next()
            }
        }
    }
    let srciter = match src {
        None => FromRepeat(repeat(EMPTY)),
        Some(src) => FromSlice(src.chunks(64))
    };
    let mut block = [0u8, ..64];
    for (dstchunk, srcchunk) in dst.chunks_mut(64).zip(srciter) {
        some_function_that_initializes_block(block);
        for (d, (&s, &b)) in dstchunk.iter_mut().zip(srcchunk.iter().zip(block.iter())) {
            *d = s ^ b;
        }
    }
}
This is a lot of code, unfortunately, but in return it is safer and less error-prone than the C version. It is also possible to optimize it so that repeat() is not required at all:
fn stream(dst: &mut [u8], src: Option<&[u8]>) {
    static EMPTY: &'static [u8] = &[0u8, ..64];
    enum DifferentIterators<'a> {
        FromSlice(std::slice::Chunks<'a, u8>),
        AlwaysZeros
    }
    impl<'a> Iterator<&'a [u8]> for DifferentIterators<'a> {
        #[inline]
        fn next(&mut self) -> Option<&'a [u8]> {
            match *self {
                FromSlice(ref mut i) => i.next(),
                // Hand out the same all-zero chunk forever.
                AlwaysZeros => Some(EMPTY),
            }
        }
    }
    let srciter = match src {
        None => AlwaysZeros,
        Some(src) => FromSlice(src.chunks(64))
    };
    let mut block = [0u8, ..64];
    for (dstchunk, srcchunk) in dst.chunks_mut(64).zip(srciter) {
        some_function_that_initializes_block(block);
        for (d, (&s, &b)) in dstchunk.iter_mut().zip(srcchunk.iter().zip(block.iter())) {
            *d = s ^ b;
        }
    }
}
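The remark about trait objects reflects the language at the time these answers were written. In current Rust, Iterator can be used as a trait object, so one way to give both arms a single type is to box them. The following is only a sketch of that alternative, not a replacement for the enum above (which avoids the allocation and the dynamic dispatch); source_chunks is a hypothetical helper name.

```rust
use std::iter;

const CHUNK_SIZE: usize = 64;
static ZEROES: [u8; CHUNK_SIZE] = [0; CHUNK_SIZE];

/// Returns the source chunks, or an endless stream of zero chunks for `None`.
/// `source_chunks` is a hypothetical helper used only in this sketch.
fn source_chunks<'a>(osrc: Option<&'a [u8]>) -> Box<dyn Iterator<Item = &'a [u8]> + 'a> {
    match osrc {
        None => Box::new(iter::repeat(&ZEROES[..])),
        Some(src) => Box::new(src.chunks(CHUNK_SIZE)),
    }
}
```

The returned iterator can be zipped with `dst.chunks_mut(CHUNK_SIZE)` exactly like srciter in the loops above.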

Correctly setting lifetimes and mutability expectations in Rust

I'm rather new to Rust and have put together a little experiment that blows my understanding of annotations entirely out of the water. This is compiled with rust-0.13.0-nightly and there's a playpen version of the code here.
The meat of the program is the function 'recognize', which is co-responsible for allocating String instances along with the function 'lex'. I'm sure the code is a bit goofy so, in addition to getting the lifetimes right enough to get this compiling I would also happily accept some guidance on making this idiomatic.
#[deriving(Show)]
enum Token<'a> {
    Field(&'a std::string::String),
}
#[deriving(Show)]
struct LexerState<'a> {
    character: int,
    field: int,
    tokens: Vec<Token<'a>>,
    str_buf: &'a std::string::String,
}
// The goal with recognize is to:
//
// * gather all A .. z into a temporary string buffer str_buf
// * on ',', move buffer into a Field token
// * store the completely extracted field in LexerState's tokens attribute
//
// I think I'm not understanding how to specify the lifetimes and mutability
// correctly.
fn recognize<'a, 'r>(c: char, ctx: &'r mut LexerState<'a>) -> &'r mut LexerState<'a> {
    match c {
        'A' ... 'z' => {
            ctx.str_buf.push(c);
        },
        ',' => {
            ctx.tokens.push(Field(ctx.str_buf));
            ctx.field += 1;
            ctx.str_buf = &std::string::String::new();
        },
        _ => ()
    };
    ctx.character += 1;
    ctx
}
fn lex<'a, I, E>(it: &mut I)
    -> LexerState<'a> where I: Iterator<Result<char, E>> {
    let mut ctx = LexerState { character: 0, field: 0,
        tokens: Vec::new(), str_buf: &std::string::String::new() };
    for val in *it {
        let c:char = val.ok().expect("wtf");
        recognize(c, &mut ctx);
    }
    ctx
}
fn main() {
    let tokens = lex(&mut std::io::stdio::stdin().chars());
    println!("{}", tokens)
}
In this case, you're constructing new strings rather than borrowing existing strings, so you'd use an owned string directly:
use std::mem;
#[deriving(Show)]
enum Token {
    Field(String),
}
#[deriving(Show)]
struct LexerState {
    character: int,
    field: int,
    tokens: Vec<Token>,
    str_buf: String,
}
// The goal with recognize is to:
//
// * gather all A .. z into a temporary string buffer str_buf
// * on ',', move buffer into a Field token
// * store the completely extracted field in LexerState's tokens attribute
//
// I think I'm not understanding how to specify the lifetimes and mutability
// correctly.
fn recognize<'a, 'r>(c: char, ctx: &'r mut LexerState) -> &'r mut LexerState {
    match c {
        'A' ...'z' => { ctx.str_buf.push(c); }
        ',' => {
            ctx.tokens.push(Field(mem::replace(&mut ctx.str_buf,
                                               String::new())));
            ctx.field += 1;
        }
        _ => (),
    };
    ctx.character += 1;
    ctx
}
fn lex<I, E>(it: &mut I) -> LexerState where I: Iterator<Result<char, E>> {
    let mut ctx =
        LexerState{
            character: 0,
            field: 0,
            tokens: Vec::new(),
            str_buf: String::new(),
        };
    for val in *it {
        let c: char = val.ok().expect("wtf");
        recognize(c, &mut ctx);
    }
    ctx
}
fn main() {
    let tokens = lex(&mut std::io::stdio::stdin().chars());
    println!("{}", tokens)
}
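The code above targets rust-0.13.0-nightly, so deriving(Show), int, and the 'A' ... 'z' range pattern will not compile today. As a hedged sketch of the same owned-String design in current Rust: derive(Debug) replaces deriving(Show), usize replaces int, the range is written 'A'..='z', and mem::take plays the role of mem::replace(&mut buf, String::new()). Taking a plain iterator of char instead of reading stdin is a simplification of mine to keep the example self-contained.

```rust
use std::mem;

#[derive(Debug, PartialEq)]
enum Token {
    Field(String),
}

#[derive(Debug, Default)]
struct LexerState {
    character: usize,
    field: usize,
    tokens: Vec<Token>,
    str_buf: String,
}

fn recognize(c: char, ctx: &mut LexerState) {
    match c {
        // Same range as the original; note that it also covers the ASCII
        // punctuation between 'Z' and 'a' ([ \ ] ^ _ `).
        'A'..='z' => ctx.str_buf.push(c),
        ',' => {
            // Move the buffer into the token and leave an empty String behind.
            ctx.tokens.push(Token::Field(mem::take(&mut ctx.str_buf)));
            ctx.field += 1;
        }
        _ => (),
    }
    ctx.character += 1;
}

fn lex(input: impl Iterator<Item = char>) -> LexerState {
    let mut ctx = LexerState::default();
    for c in input {
        recognize(c, &mut ctx);
    }
    ctx
}
```

For instance, `lex("ab,cd,".chars())` produces the tokens Field("ab") and Field("cd"), with field equal to 2 and character equal to 6.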
