Iterating over the contents of an Option, or over a specific value - rust

Let's say that we have the following C-code (assume that srclen == dstlen and the length is divisible by 64).
void stream(uint8_t *dst, uint8_t *src, size_t dstlen) {
int i;
uint8_t block[64];
while (dstlen > 64) {
some_function_that_initializes_block(block);
for (i=0; i<64; i++) {
dst[i] = ((src != NULL)?src[i]:0) ^ block[i];
}
dst += 64;
dstlen -= 64;
if (src != NULL) { src += 64; }
}
}
That is a function that takes a source and a destination and xors source with some value that
the function computes. When source is set to a NULL-pointer dst is just the computed value.
In rust it is quite simple to do this when src cannot be null, we can do something like:
fn stream(dst: &mut [u8], src: &[u8]) {
let mut block = [0u8, ..64];
for (dstchunk, srcchunk) in dst.chunks_mut(64).zip(src.chunks(64)) {
some_function_that_initializes_block(block);
for (d, (&s, &b)) in dstchunk.iter_mut().zip(srcchunk.iter().zip(block.iter())) {
*d = s ^ b;
}
}
}
However let us assume that we want to be able to mimic the original C-function. Then we would like to do something like:
fn stream(dst: &mut[u8], osrc: Option<&[u8]>) {
let srciter = match osrc {
None => repeat(0),
Some(src) => src.iter()
};
// the rest of the code as above
}
Alas, this won't work since repeat(0) and src.iter() have different types. However it doesn't seem possible to solve this by using a trait object since we get a compiler error saying cannot convert to a trait object because trait 'core::iter::Iterator' is not object safe. (also there is no function in the standard library that chunks an iterator).
Is there any nice way to solve this, or should I just duplicate the code in each arm of the match statement?

Instead of repeating the code in each arm, you can call a generic inner function:
fn stream(dst: &mut[u8], osrc: Option<&[u8]>) {
fn inner<T>(dst: &mut[u8], srciter: T) where T: Iterator<u8> {
let mut block = [0u8, ..64];
//...
}
match osrc {
None => inner(dst, repeat(0)),
Some(src) => inner(dst, src.iter().map(|a| *a))
}
}
Note the additional map to make both iterators compatible (Iterator<u8>).
As you mentioned, Iterator doesn't have a built-in way to do chunking. Let's incorporate Vladimir's solution and use an iterator over chunks:
fn stream(dst: &mut[u8], osrc: Option<&[u8]>) {
const CHUNK_SIZE: uint = 64;
fn inner<'a, T>(dst: &mut[u8], srciter: T) where T: Iterator<&'a [u8]> {
let mut block = [0u8, ..CHUNK_SIZE];
for (dstchunk, srcchunk) in dst.chunks_mut(CHUNK_SIZE).zip(srciter) {
some_function_that_initializes_block(block);
for (d, (&s, &b)) in dstchunk.iter_mut().zip(srcchunk.iter().zip(block.iter())) {
*d = s ^ b;
}
}
}
static ZEROES: &'static [u8] = &[0u8, ..CHUNK_SIZE];
match osrc {
None => inner(dst, repeat(ZEROES)),
Some(src) => inner(dst, src.chunks(CHUNK_SIZE))
}
}

Unfortunately, it is impossible to use different iterators directly or with trait objects (which have recently been changed to disallow instantiation of trait objects with inappropriate methods i.e. ones which use Self type in their signature). There is a workaround for your particular case, however. Just use enums:
fn stream(dst: &mut [u8], src: Option<&[u8]>) {
static EMPTY: &'static [u8] = &[0u8, ..64]; // '
enum DifferentIterators<'a> { // '
FromSlice(std::slice::Chunks<'a, u8>), // '
FromRepeat(std::iter::Repeat<&'a [u8]>) // '
}
impl<'a> Iterator<&'a [u8]> for DifferentIterators<'a> { // '
#[inline]
fn next(&mut self) -> Option<&'a [u8]> { // '
match *self {
FromSlice(ref mut i) => i.next(),
FromRepeat(ref mut i) => i.next()
}
}
}
let srciter = match src {
None => FromRepeat(repeat(EMPTY)),
Some(src) => FromSlice(src.chunks(64))
};
let mut block = [0u8, ..64];
for (dstchunk, srcchunk) in dst.chunks_mut(64).zip(srciter) {
some_function_that_initializes_block(block);
for (d, (&s, &b)) in dstchunk.iter_mut().zip(srcchunk.iter().zip(block.iter())) {
*d = s ^ b;
}
}
}
This is a lot of code, unfortunately, but in return it is more safe and less error-prone than the C version. It is also possible to optimize it in order not to require repeat() at all:
fn stream(dst: &mut [u8], src: Option<&[u8]>) {
static EMPTY: &'static [u8] = &[0u8, ..64]; // '
enum DifferentIterators<'a> { // '
FromSlice(std::slice::Chunks<'a, u8>), // '
AlwaysZeros
}
impl<'a> Iterator<&'a [u8]> for DifferentIterators<'a> { // '
#[inline]
fn next(&mut self) -> Option<&'a [u8]> { // '
match *self {
FromSlice(ref mut i) => i.next(),
AlwaysZeros => Some(STATIC),
}
}
}
let srciter = match src {
None => AlwaysZeros,
Some(src) => FromSlice(src.chunks(64))
};
let mut block = [0u8, ..64];
for (dstchunk, srcchunk) in dst.chunks_mut(64).zip(srciter) {
some_function_that_initializes_block(block);
for (d, (&s, &b)) in dstchunk.iter_mut().zip(srcchunk.iter().zip(block.iter())) {
*d = s ^ b;
}
}
}

Related

Crossbeam Receiver to BufRead?

I receive a long string (several gigabytes) as chunks of [u8]s in a Crossbeam channel. I want to break it down to lines. How do I turn these chunks into a BufRead?
fn foo(recv: crossbeam_channel::Receiver<Vec<u8>>) {
let mut buf_read: dyn std::io::BufRead = WHAT_COMES_HERE(recv); // <----
for line in buf_read.lines() {
// ...
}
}
I make these chunks on another thread since they are CPU-intensive to make. I could use something else than Vec<u8> if it makes more sense.
I don't think there is anything builtin, but it shouldn't be too difficult to write yourself. For example something like this:
use crossbeam_channel; // 0.5.4
use std::cmp::min;
use std::io::BufRead;
struct CrossbeamReader {
recv: crossbeam_channel::Receiver<Vec<u8>>,
offset: usize,
buffer: Vec<u8>,
}
impl CrossbeamReader {
fn new (recv: crossbeam_channel::Receiver<Vec<u8>>) -> Self
{
CrossbeamReader { recv, offset: 0, buffer: vec![], }
}
}
impl std::io::Read for CrossbeamReader {
fn read (&mut self, buf: &mut [u8]) -> std::io::Result<usize>
{
while self.offset >= self.buffer.len() {
self.buffer = match self.recv.recv() {
Ok (v) => v,
Err (_) => return Ok (0), // TODO: error handling
};
self.offset = 0;
}
let size = min (buf.len(), self.buffer.len() - self.offset);
buf[..size].copy_from_slice (&self.buffer[self.offset .. self.offset + size]);
self.offset += size;
Ok (size)
}
}
pub fn foo(recv: crossbeam_channel::Receiver<Vec<u8>>) {
let buf_read = std::io::BufReader::new (CrossbeamReader::new (recv));
for _line in buf_read.lines() {
// ...
}
}
Playground

Is there a way in Rust to overload method for a specific type?

The following is only an example. If there's a native solution for this exact problem with reading bytes - cool, but my goal is to learn how to do it by myself, for any other purpose as well.
I'd like to do something like this: (pseudo-code below)
let mut reader = Reader::new(bytesArr);
let int32: i32 = reader.read(); // separate implementation to read 4 bits and convert into int32
let int64: i64 = reader.read(); // separate implementation to read 8 bits and convert into int64
I imagine it looking like this: (pseudo-code again)
impl Reader {
read<T>(&mut self) -> T {
// if T is i32 ... else if ...
}
}
or like this:
impl Reader {
read(&mut self) -> i32 {
// ...
}
read(&mut self) -> i64 {
// ...
}
}
But haven't found anything relatable yet.
(I actually have, for the first case (if T is i32 ...), but it looked really unreadable and inconvenient)
You could do this by having a Readable trait which you implement on i32 and i64, which does the operation. Then on Reader you could have a generic function which takes any type that is Readable and return it, for example:
struct Reader {
n: u8,
}
trait Readable {
fn read_from_reader(reader: &mut Reader) -> Self;
}
impl Readable for i32 {
fn read_from_reader(reader: &mut Reader) -> i32 {
reader.n += 1;
reader.n as i32
}
}
impl Readable for i64 {
fn read_from_reader(reader: &mut Reader) -> i64 {
reader.n += 1;
reader.n as i64
}
}
impl Reader {
fn read<T: Readable>(&mut self) -> T {
T::read_from_reader(self)
}
}
fn main() {
let mut r = Reader { n: 0 };
let int32: i32 = r.read();
let int64: i64 = r.read();
println!("{} {}", int32, int64);
}
You can try it on the playground
After some trials and searches, I found that implementing them in current Rust seems a bit difficult, but not impossible.
Here is the code, I'll explain it afterwards:
#![feature(generic_const_exprs)]
use std::{
mem::{self, MaybeUninit},
ptr,
};
static DATA: [u8; 8] = [
u8::MAX,
u8::MAX,
u8::MAX,
u8::MAX,
u8::MAX,
u8::MAX,
u8::MAX,
u8::MAX,
];
struct Reader;
impl Reader {
fn read<T: Copy + Sized>(&self) -> T
where
[(); mem::size_of::<T>()]: ,
{
let mut buf = [unsafe { MaybeUninit::uninit().assume_init() }; mem::size_of::<T>()];
unsafe {
ptr::copy_nonoverlapping(DATA.as_ptr(), buf.as_mut_ptr(), buf.len());
mem::transmute_copy(&buf)
}
}
}
fn main() {
let reader = Reader;
let v_u8: u8 = reader.read();
dbg!(v_u8);
let v_u16: u16 = reader.read();
dbg!(v_u16);
let v_u32: u32 = reader.read();
dbg!(v_u32);
let v_u64: u64 = reader.read();
dbg!(v_u64);
}
Suppose the global static variable DATA is the target data you want to read.
In current Rust, we cannot directly use the size of a generic parameter as the length of an array. This does not work:
fn example<T: Copy + Sized>() {
let mut _buf = [0_u8; mem::size_of::<T>()];
}
The compiler gives a weird error:
error: unconstrained generic constant
--> src\main.rs:34:31
|
34 | let mut _buf = [0_u8; mem::size_of::<T>()];
| ^^^^^^^^^^^^^^^^^^^
|
= help: try adding a `where` bound using this expression: `where [(); mem::size_of::<T>()]:`
There is an issue that is tracking it, if you want to go deeper into this error you can take a look.
We just follow the compiler's suggestion to add a where bound. This requires feature generic_const_exprs to be enabled.
Next, unsafe { MaybeUninit::uninit().assume_init() } is optional, which drops the overhead of initializing this array, since we will eventually overwrite it completely. You can replace it with 0_u8 if you don't like it.
Finally, copy the data you need and transmute this array to your generic type, return.
I think you will see the output you expect:
[src\main.rs:38] v_u8 = 255
[src\main.rs:41] v_u16 = 65535
[src\main.rs:44] v_u32 = 4294967295
[src\main.rs:47] v_u64 = 18446744073709551615

Union-Find implementation does not update parent tags

I'm trying to create some sets of Strings and then merge some of these sets so that they have the same tag (of type usize). Once I initialize the map, I start adding strings:
self.clusters.make_set("a");
self.clusters.make_set("b");
When I call self.clusters.find("a") and self.clusters.find("b"), different values are returned, which is fine because I haven't merged the sets yet. Then I call the following method to merge two sets
let _ = self.clusters.union("a", "b");
If I call self.clusters.find("a") and self.clusters.find("b") now, I get the same value. However, when I call the finalize() method and try to iterate through the map, the original tags are returned, as if I never merged the sets.
self.clusters.finalize();
for (address, tag) in &self.clusters.map {
self.clusterizer_writer.write_all(format!("{};{}\n", address,
self.clusters.parent[*tag]).as_bytes()).unwrap();
}
// to output all keys with the same tag as a list.
let a: Vec<(usize, Vec<String>)> = {
let mut x = HashMap::new();
for (k, v) in self.clusters.map.clone() {
x.entry(v).or_insert_with(Vec::new).push(k)
}
x.into_iter().collect()
};
I can't figure out why this is the case, but I'm relatively new to Rust; maybe its an issue with pointers?
Instead of "a" and "b", I'm actually using something like utils::arr_to_hex(&input.outpoint.txid) of type String.
This is the Rust implementation of the Union-Find algorithm that I am using:
/// Tarjan's Union-Find data structure.
#[derive(RustcDecodable, RustcEncodable)]
pub struct DisjointSet<T: Clone + Hash + Eq> {
set_size: usize,
parent: Vec<usize>,
rank: Vec<usize>,
map: HashMap<T, usize>, // Each T entry is mapped onto a usize tag.
}
impl<T> DisjointSet<T>
where
T: Clone + Hash + Eq,
{
pub fn new() -> Self {
const CAPACITY: usize = 1000000;
DisjointSet {
set_size: 0,
parent: Vec::with_capacity(CAPACITY),
rank: Vec::with_capacity(CAPACITY),
map: HashMap::with_capacity(CAPACITY),
}
}
pub fn make_set(&mut self, x: T) {
if self.map.contains_key(&x) {
return;
}
let len = &mut self.set_size;
self.map.insert(x, *len);
self.parent.push(*len);
self.rank.push(0);
*len += 1;
}
/// Returns Some(num), num is the tag of subset in which x is.
/// If x is not in the data structure, it returns None.
pub fn find(&mut self, x: T) -> Option<usize> {
let pos: usize;
match self.map.get(&x) {
Some(p) => {
pos = *p;
}
None => return None,
}
let ret = DisjointSet::<T>::find_internal(&mut self.parent, pos);
Some(ret)
}
/// Implements path compression.
fn find_internal(p: &mut Vec<usize>, n: usize) -> usize {
if p[n] != n {
let parent = p[n];
p[n] = DisjointSet::<T>::find_internal(p, parent);
p[n]
} else {
n
}
}
/// Union the subsets to which x and y belong.
/// If it returns Ok<u32>, it is the tag for unified subset.
/// If it returns Err(), at least one of x and y is not in the disjoint-set.
pub fn union(&mut self, x: T, y: T) -> Result<usize, ()> {
let x_root;
let y_root;
let x_rank;
let y_rank;
match self.find(x) {
Some(x_r) => {
x_root = x_r;
x_rank = self.rank[x_root];
}
None => {
return Err(());
}
}
match self.find(y) {
Some(y_r) => {
y_root = y_r;
y_rank = self.rank[y_root];
}
None => {
return Err(());
}
}
// Implements union-by-rank optimization.
if x_root == y_root {
return Ok(x_root);
}
if x_rank > y_rank {
self.parent[y_root] = x_root;
return Ok(x_root);
} else {
self.parent[x_root] = y_root;
if x_rank == y_rank {
self.rank[y_root] += 1;
}
return Ok(y_root);
}
}
/// Forces all laziness, updating every tag.
pub fn finalize(&mut self) {
for i in 0..self.set_size {
DisjointSet::<T>::find_internal(&mut self.parent, i);
}
}
}
I think you're just not extracting the information out of your DisjointSet struct correctly.
I got sniped by this and implemented union find. First, with a basic usize implemention:
pub struct UnionFinderImpl {
parent: Vec<usize>,
}
Then with a wrapper for more generic types:
pub struct UnionFinder<T: Hash> {
rev: Vec<Rc<T>>,
fwd: HashMap<Rc<T>, usize>,
uf: UnionFinderImpl,
}
Both structs implement a groups() method that returns a Vec<Vec<>> of groups. Clone isn't required because I used Rc.
Playground

Polymorphism in Rust and trait references (trait objects?)

I'm writing a process memory scanner with a console prompt interface in Rust.
I need scanner types such as a winapi scanner or a ring0 driver scanner so I'm trying to implement polymorphism.
I have the following construction at this moment:
pub trait Scanner {
fn attach(&mut self, pid: u32) -> bool;
fn detach(&mut self);
}
pub struct WinapiScanner {
pid: u32,
hprocess: HANDLE,
addresses: Vec<usize>
}
impl WinapiScanner {
pub fn new() -> WinapiScanner {
WinapiScanner {
pid: 0,
hprocess: 0 as HANDLE,
addresses: Vec::<usize>::new()
}
}
}
impl Scanner for WinapiScanner {
fn attach(&mut self, pid: u32) -> bool {
let handle = unsafe { OpenProcess(PROCESS_ALL_ACCESS, FALSE, pid) };
if handle == 0 as HANDLE {
self.pid = pid;
self.hprocess = handle;
true
} else {
false
}
}
fn detach(&mut self) {
unsafe { CloseHandle(self.hprocess) };
self.pid = 0;
self.hprocess = 0 as HANDLE;
self.addresses.clear();
}
}
In future, I'll have some more scanner types besides WinapiScanner, so, if I understand correctly, I should use a trait reference (&Scanner) to implement polymorphism. I'm trying to create Scanner object like this (note the comments):
enum ScannerType {
Winapi
}
pub fn start() {
let mut scanner: Option<&mut Scanner> = None;
let mut scanner_type = ScannerType::Winapi;
loop {
let line = prompt();
let tokens: Vec<&str> = line.split_whitespace().collect();
match tokens[0] {
// commands
"scanner" => {
if tokens.len() != 2 {
println!("\"scanner\" command takes 1 argument")
} else {
match tokens[1] {
"list" => {
println!("Available scanners: winapi");
},
"winapi" => {
scanner_type = ScannerType::Winapi;
println!("Scanner type set to: winapi");
},
x => {
println!("Unknown scanner type: {}", x);
}
}
}
},
"attach" => {
if tokens.len() > 1 {
match tokens[1].parse::<u32>() {
Ok(pid) => {
scanner = match scanner_type {
// ----------------------
// Problem goes here.
// Object, created by WinapiScanner::new() constructor
// doesn't live long enough to borrow it here
ScannerType::Winapi => Some(&mut WinapiScanner::new())
// ----------------------
}
}
Err(_) => {
println!("Wrong pid");
}
}
}
},
x => println!("Unknown command: {}", x)
}
}
}
fn prompt() -> String {
use std::io::Write;
use std::io::BufRead;
let stdout = io::stdout();
let mut lock = stdout.lock();
let _ = lock.write(">> ".as_bytes());
let _ = lock.flush();
let stdin = io::stdin();
let mut lock = stdin.lock();
let mut buf = String::new();
let _ = lock.read_line(&mut buf);
String::from(buf.trim())
}
It's not a full program; I've pasted important parts only.
What am I doing wrong and how do I implement what I want in Rust?
Trait objects must be used behind a pointer. But references are not the only kind of pointers; Box is also a pointer!
let mut scanner: Option<Box<Scanner>> = None;
scanner = match scanner_type {
ScannerType::Winapi => Some(Box::new(WinapiScanner::new()))
}

Correctly setting lifetimes and mutability expectations in Rust

I'm rather new to Rust and have put together a little experiment that blows my understanding of annotations entirely out of the water. This is compiled with rust-0.13.0-nightly and there's a playpen version of the code here.
The meat of the program is the function 'recognize', which is co-responsible for allocating String instances along with the function 'lex'. I'm sure the code is a bit goofy so, in addition to getting the lifetimes right enough to get this compiling I would also happily accept some guidance on making this idiomatic.
#[deriving(Show)]
enum Token<'a> {
Field(&'a std::string::String),
}
#[deriving(Show)]
struct LexerState<'a> {
character: int,
field: int,
tokens: Vec<Token<'a>>,
str_buf: &'a std::string::String,
}
// The goal with recognize is to:
//
// * gather all A .. z into a temporary string buffer str_buf
// * on ',', move buffer into a Field token
// * store the completely extracted field in LexerState's tokens attribute
//
// I think I'm not understanding how to specify the lifetimes and mutability
// correctly.
fn recognize<'a, 'r>(c: char, ctx: &'r mut LexerState<'a>) -> &'r mut LexerState<'a> {
match c {
'A' ... 'z' => {
ctx.str_buf.push(c);
},
',' => {
ctx.tokens.push(Field(ctx.str_buf));
ctx.field += 1;
ctx.str_buf = &std::string::String::new();
},
_ => ()
};
ctx.character += 1;
ctx
}
fn lex<'a, I, E>(it: &mut I)
-> LexerState<'a> where I: Iterator<Result<char, E>> {
let mut ctx = LexerState { character: 0, field: 0,
tokens: Vec::new(), str_buf: &std::string::String::new() };
for val in *it {
let c:char = val.ok().expect("wtf");
recognize(c, &mut ctx);
}
ctx
}
fn main() {
let tokens = lex(&mut std::io::stdio::stdin().chars());
println!("{}", tokens)
}
In this case, you're constructing new strings rather than borrowing existing strings, so you'd use an owned string directly:
use std::mem;
#[deriving(Show)]
enum Token {
Field(String),
}
#[deriving(Show)]
struct LexerState {
character: int,
field: int,
tokens: Vec<Token>,
str_buf: String,
}
// The goal with recognize is to:
//
// * gather all A .. z into a temporary string buffer str_buf
// * on ',', move buffer into a Field token
// * store the completely extracted field in LexerState's tokens attribute
//
// I think I'm not understanding how to specify the lifetimes and mutability
// correctly.
fn recognize<'a, 'r>(c: char, ctx: &'r mut LexerState) -> &'r mut LexerState {
match c {
'A' ...'z' => { ctx.str_buf.push(c); }
',' => {
ctx.tokens.push(Field(mem::replace(&mut ctx.str_buf,
String::new())));
ctx.field += 1;
}
_ => (),
};
ctx.character += 1;
ctx
}
fn lex<I, E>(it: &mut I) -> LexerState where I: Iterator<Result<char, E>> {
let mut ctx =
LexerState{
character: 0,
field: 0,
tokens: Vec::new(),
str_buf: String::new(),
};
for val in *it {
let c: char = val.ok().expect("wtf");
recognize(c, &mut ctx);
}
ctx
}
fn main() {
let tokens = lex(&mut std::io::stdio::stdin().chars());
println!("{}" , tokens)
}

Resources