Multi-dimensional vector borrowing

I'm trying to implement a coding exercise, but I've run into a wall regarding multi-dimensional vectors and borrowing.
The code is accessible in this playground, but I'll add here a snippet for reference:
type Matrix = Rc<RefCell<Vec<Vec<String>>>>;
/// sequence -> target string
/// dictionary -> array of 'words' that can be used to construct the 'sequence'
/// returns -> 2d array of all the possible combinations to create the 'sequence' from the 'dictionary'
pub fn all_construct<'a>(sequence: &'a str, dictionary: &'a [&str]) -> Matrix {
let memo: Rc<RefCell<HashMap<&str, Matrix>>> = Rc::new(RefCell::new(HashMap::new()));
all_construct_memo(sequence, dictionary, Rc::clone(&memo))
}
fn all_construct_memo<'a>(
sequence: &'a str,
dictionary: &'a [&str],
memo: Rc<RefCell<HashMap<&'a str, Matrix>>>,
) -> Matrix {
if memo.borrow().contains_key(sequence) {
return Rc::clone(&memo.borrow()[sequence]);
}
if sequence.is_empty() {
return Rc::new(RefCell::new(Vec::new()));
}
let ways = Rc::new(RefCell::new(Vec::new()));
for word in dictionary {
if let Some(new_sequence) = sequence.strip_prefix(word) {
let inner_ways = all_construct_memo(new_sequence, dictionary, Rc::clone(&memo));
for mut entry in inner_ways.borrow_mut().into_iter() { // error here
entry.push(word.to_string());
ways.borrow_mut().push(entry);
}
}
}
memo.borrow_mut().insert(sequence, Rc::clone(&ways));
Rc::clone(&ways)
}
The code doesn't compile.
Questions:
1. This feels overly complicated. Is there a simpler way to do it?
1.1. For the Matrix type, I tried getting by with just Vec<Vec<String>>, but that didn't get me very far. What's the proper way to encode a 2D vector that allows for mutability and sharing, without using extra crates?
1.2. Is there a better way to pass the memo object?
2. I don't really understand the compiler error here. Can you help me with that?
error[E0507]: cannot move out of dereference of `RefMut<'_, Vec<Vec<String>>>`
--> src/lib.rs:31:30
|
31 | for mut entry in inner_ways.borrow_mut().into_iter() { // error here
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ move occurs because value has type `Vec<Vec<String>>`, which does not implement the `Copy` trait
For more information about this error, try `rustc --explain E0507`.
Thank you!

2d vecs work fine, and for jagged arrays like yours, your implementation is correct. Your issues stem from a needless use of Rc and RefCell. Because of the way you're calling things, a single, mutable reference will work.
Consider the following, modified, example:
use std::collections::HashMap;
type Vec2<T> = Vec<Vec<T>>;
fn all_constructs<'a>(sequence: &'a str, segments: &[&'a str]) -> Vec2<&'a str> {
let mut cache = HashMap::new();
all_constructs_memo(sequence, segments, &mut cache)
}
fn all_constructs_memo<'a>(
sequence: &'a str,
segments: &[&'a str],
cache: &mut HashMap<&'a str, Vec2<&'a str>>
) -> Vec2<&'a str> {
// If we have the answer cached, return the cache
if let Some(constructs) = cache.get(sequence) {
return constructs.to_vec();
}
// We don't have it cached, so figure it out
let mut constructs = Vec::new();
for segment in segments {
if *segment == sequence {
constructs.push(vec![*segment]);
} else if let Some(sub_sequence) = sequence.strip_suffix(segment) {
let mut sub_constructs = all_constructs_memo(sub_sequence, segments, cache);
sub_constructs.iter_mut().for_each(|c| c.push(segment));
constructs.append(&mut sub_constructs);
}
}
cache.insert(sequence, constructs.clone());
return constructs;
}
It's identical, except for 4 differences:
1.) I removed all Rc and RefCell. There is a single HashMap reference.
2.) Instead of having all_constructs_memo("", ...) -> Vec::new(), I just added a branch in the iterator if *segment == sequence to test for single-segment matches that way.
3.) I wrote Vec2 instead of Matrix
4.) strip_suffix instead of strip_prefix, just because adding to the end of vecs is a little more efficient than adding to the front.
Here's a playground link with tests against a non-memoized reference implementation
https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=1b488aafda6466629c17c8a7de8f3e42
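As for the E0507 error itself: into_iter() consumes the Vec by value, but borrow_mut() only hands out a RefMut guard, and an owned value cannot be moved out from behind a guard. The following is a minimal sketch of two ways around that while keeping the original Matrix type. The helper names rows_with_word and take_rows are hypothetical and appear in neither the question nor the answer.

```rust
use std::cell::RefCell;
use std::rc::Rc;

type Matrix = Rc<RefCell<Vec<Vec<String>>>>;

/// Hypothetical helper: clones every row out of the shared matrix and
/// appends `word` to each clone. The shared matrix itself is left untouched.
fn rows_with_word(inner: &Matrix, word: &str) -> Vec<Vec<String>> {
    inner
        .borrow() // shared borrow: we only read the rows
        .iter()
        .cloned() // clone each row instead of moving it out of the guard
        .map(|mut row| {
            row.push(word.to_string());
            row
        })
        .collect()
}

/// Hypothetical helper: moves all rows out of the shared matrix,
/// leaving an empty Vec behind in the RefCell.
fn take_rows(inner: &Matrix) -> Vec<Vec<String>> {
    std::mem::take(&mut *inner.borrow_mut())
}
```

Cloning is the safer choice when the same matrix is also stored in the memo, because take_rows would empty the cached entry for every other holder of the Rc.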

Related

Can I create a custom iterator that iterates over one sequence then another (chain doesn't work)

I have a struct Folder. I have a method called contents. I want that method to return an object that supports IntoIterator so that the caller can just go
for x in folder.contents(){
...
}
The Item type is (since this is what the hashmap iterator returns - see a little lower)
(&OsString, &FileOrFolder)
where FileOrFolder is an enum
enum FileOrFolder{
File(File),
Folder(Folder)
}
The iterator itself needs to first enumerate a HashMap<OsString, FileOrFolder> owned by the folder and then enumerate a Vec<File>. The Vec of files is created on the fly by the contents fn or by the IntoIterator call, whatever works. I tried simply using chain but quickly realized that wasn't going to work. So my rough sketch of what I am trying to do is this:
// the iterator
pub struct FFIter {
files: Vec<FileOrFolder>,
files_iter:Box<dyn Iterator<Item=FileOrFolder>>,
dirs: Box<dyn Iterator<Item = (&OsString, &FileOrFolder)>>,
dirs_done:bool
}
// the thing returned by the contents fn
struct FolderContents{
folder:&Folder
}
// make it iterable
impl IntoIterator for FolderContents {
type Item =(&OsString, &FileOrFolder);
type IntoIter = FFIter;
fn into_iter(self) -> Self::IntoIter {
let files = self.folder.make_the_files()
FFIter {
files: files, // to keep files 'alive'
files_iter: files.iter(),
dirs: Box::new(self.hashmap.iter()),
dirs_done:false
}
}
}
impl Iterator for FFIter {
type Item = (&OsString, &FileOrFolder);
fn next(&mut self) -> Option<(&OsString, &FileOrFolder)> {
None // return empty, lets just get the skeleton built
}
}
impl Folder{
pub fn contents(&self) -> FolderContents{
FolderContents{folder:&self}
}
}
I know this is full of errors, but I need to know if this is doable at all. As you can see I am not even trying to write the code that returns anything. I am just trying to get the basic outline to compile.
I started arm wrestling with the lifetime system and got to the point where I had this
error[E0658]: generic associated types are unstable
--> src\state\files\file_or_folder.rs:46:5
|
46 | type Item<'a> =(&'a OsString, &'a FileOrFolder);
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
= note: see issue #44265 <https://github.com/rust-lang/rust/issues/44265> for more information
Which kinda sucked as that is what the compiler said I should do.
I am happy to keep ploughing away at this following the suggestions from the compiler / reading / ... But in the past I have posted a question along these lines and been told - 'of course it can't be done'. So should I be able to make this work?
The Folder type is not Copy and expensive to clone. The File type is simple (string and i64), Copy and Clone
I know I could simply make the caller call two different iterations and merge them, but I am trying to write a transparent replacement module to drop into a large existing codebase.
If somebody says that chain() should work that's great, I will have another go at that.
EDIT: Jmp said chain should work.
Here's what I tried:
pub fn contents(&self) -> Box<dyn Iterator<Item = (&OsString, &FileOrFolder)> + '_> {
let mut files = vec![];
if self.load_done {
for entry in WalkDir::new(&self.full_path)
.max_depth(1)
.skip_hidden(false)
.follow_links(false)
.into_iter()
{
let ent = entry.unwrap();
if ent.file_type().is_file() {
if let Some(name) = ent.path().file_name() {
files.push((
name.to_os_string(),
FileOrFolder::File(File {
name: name.to_os_string(),
size: ent.metadata().unwrap().len() as u128,
}),
));
}
}
}
};
Box::new(
self.contents
.iter()
.map(|(k, v)| (k, v))
.chain(files.iter().map(|x| (&x.0, &x.1))),
)
}
but the compiler complains, correctly, that 'files' gets destroyed at the end of the call. What I need is for the vec to be held by the iterator and then dropped at the end of the iteration. Folder itself cannot hold the files. The whole point here is to populate files on the fly; it's too expensive, memory-wise, to hold them.

You claim that files is populated on the fly, but that's precisely what your code is not doing: your code precomputes files before attempting to return it. The solution is to really compute files on the fly, something like this:
pub fn contents(&self) -> Box<dyn Iterator<Item = (&OsString, &FileOrFolder)> + '_> {
let files = WalkDir::new(&self.full_path)
.max_depth(1)
.skip_hidden(false)
.follow_links(false)
.into_iter()
.filter_map (|entry| {
let ent = entry.unwrap();
if ent.file_type().is_file() {
if let Some(name) = ent.path().file_name() {
Some((
name.to_os_string(),
FileOrFolder::File(File {
name: name.to_os_string(),
size: ent.metadata().unwrap().len() as u128,
}),
))
} else { None }
} else { None }
});
self.contents
.iter()
.chain (files)
}
Since you haven't given us an MRE, I haven't tested the above, but I think it will fail because self.contents.iter() returns references, whereas files returns owned values. Fixing this requires changing the prototype of the function to return some form of owned values since files cannot be made to return references. I see two ways to do this:
Easiest is to make FileOrFolder clonable and get rid of the references in the prototype:
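pub fn contents(&self) -> Box<dyn Iterator<Item = (OsString, FileOrFolder)> + '_> {
let files = ...;
Box::new(
self.contents
.iter()
// .cloned() only applies to iterators over &T; a HashMap yields (&K, &V) tuples, so clone each part
.map(|(k, v)| (k.clone(), v.clone()))
.chain(files),
)
}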
Or you can make a wrapper type similar to Cow that can hold either a reference or an owned value:
enum OwnedOrRef<'a, T> {
Owned (T),
Ref (&'a T),
}
pub fn contents(&self) -> Box<dyn Iterator<Item = (OwnedOrRef<'_, OsString>, OwnedOrRef<'_, FileOrFolder>)> + '_> {
let files = ...;
Box::new(
self.contents
.iter()
.map(|(k, v)| (OwnedOrRef::Ref(k), OwnedOrRef::Ref(v)))
.chain(files.map(|(k, v)| (OwnedOrRef::Owned(k), OwnedOrRef::Owned(v)))),
)
}
You can even use Cow if FileOrFolder can implement ToOwned.
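The Cow variant can be sketched with the standard library alone. This is an illustration under stated assumptions, not the asker's real types: Folder here is a hypothetical stand-in with a stored map of String names to u64 sizes, and a small numeric range plays the role of the lazy directory walk.

```rust
use std::borrow::Cow;
use std::collections::HashMap;

/// Hypothetical stand-in for the real `Folder`; `stored` plays the role of
/// the `contents` map in the question.
struct Folder {
    stored: HashMap<String, u64>,
}

impl Folder {
    /// Yields borrowed entries from the map first, then owned entries that
    /// are computed one at a time while the caller iterates.
    fn contents(&self) -> Box<dyn Iterator<Item = (Cow<'_, str>, Cow<'_, u64>)> + '_> {
        // Stand-in for the directory walk: nothing is collected up front.
        let computed = (0..2u64).map(|i| {
            let name: Cow<'_, str> = Cow::Owned(format!("file{}", i));
            let size: Cow<'_, u64> = Cow::Owned(i);
            (name, size)
        });
        Box::new(
            self.stored
                .iter()
                .map(|(k, v)| (Cow::Borrowed(k.as_str()), Cow::Borrowed(v)))
                .chain(computed),
        )
    }
}
```

Both halves of the chain now have the same Item type, which is what chain requires. Callers that only read can deref the Cow values, and callers that need ownership can call into_owned() on them.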

Iterating over named regex groups in Rust

I wish to extract all named groups from a match into a HashMap and I'm running into a "does not live long enough" error while trying to compile this code:
extern crate regex;
use std::collections::HashMap;
use regex::Regex;
pub struct Route {
regex: Regex,
}
pub struct Router<'a> {
pub namespace_seperator: &'a str,
routes: Vec<Route>,
}
impl<'a> Router<'a> {
// ...
pub fn path_to_params(&self, path: &'a str) -> Option<HashMap<&str, &str>> {
for route in &self.routes {
if route.regex.is_match(path) {
let mut hash = HashMap::new();
for cap in route.regex.captures_iter(path) {
for (name, value) in cap.iter_named() {
hash.insert(name, value.unwrap());
}
}
return Some(hash);
}
}
None
}
}
fn main() {}
Here's the error output:
error: `cap` does not live long enough
--> src/main.rs:23:42
|>
23 |> for (name, value) in cap.iter_named() {
|> ^^^
note: reference must be valid for the anonymous lifetime #1 defined on the block at 18:79...
--> src/main.rs:18:80
|>
18 |> pub fn path_to_params(&self, path: &'a str) -> Option<HashMap<&str, &str>> {
|> ^
note: ...but borrowed value is only valid for the for at 22:16
--> src/main.rs:22:17
|>
22 |> for cap in route.regex.captures_iter(path) {
|> ^
Obviously I still have a thing or two to learn about Rust lifetimes.

Let's follow the lifetime lines:
route.regex.captures_iter(path) creates a FindCapture<'r, 't> where the lifetime 'r is that of route.regex and the lifetime 't is that of path
this iterator yields a Captures<'t>, only linked to the lifetime of path
whose method iter_named(&'t self) yields a SubCapture<'t> itself linked to the lifetime of path and the lifetime of the cap
this iterator yields a (&'t str, Option<&'t str>) so that both keys and values of the HashMap are linked to the lifetime of path and the lifetime of the cap
Therefore, it is unfortunately impossible to have the HashMap outlive the cap variable as this variable is used by the code as a "marker" to keep the buffers containing the groups alive.
I am afraid that the only solution without significant re-structuring is to return a HashMap<String, String>, as unsatisfying as it is. It also occurs to me that a single capture group may match multiple times, not sure if you want to bother with this.

Matthieu M. already explained the lifetime situation well. The good news is that the regex crate recognized the problem and there's a fix in the pipeline for 1.0.
As stated in the commit message:
It was always possible to work around this by using indices.
It is also possible to work around this by using Regex::capture_names, although it's a bit more nested this way:
pub fn path_to_params(&self, path: &'a str) -> Option<HashMap<&str, &str>> {
for route in &self.routes {
if let Some(captures) = route.regex.captures(path) {
let mut hash = HashMap::new();
for name in route.regex.capture_names() {
if let Some(name) = name {
if let Some(value) = captures.name(name) {
hash.insert(name, value);
}
}
}
return Some(hash);
}
}
None
}
Note that I also removed the outer is_match — it's inefficient to run the regex once and then again.

Why does this variable definition imply static lifetime?

I'm trying to execute a function on chunks of a vector and then send the result back using the message passing library.
However, I get a strange error about the lifetime of the vector that isn't even participating in the thread operations:
src/lib.rs:153:27: 154:25 error: borrowed value does not live long enough
src/lib.rs:153 let extended_segments = (segment_size..max_val)
src/lib.rs:154 .collect::<Vec<_>>()
note: reference must be valid for the static lifetime...
src/lib.rs:153:3: 155:27 note: but borrowed value is only valid for the statement at 153:2
src/lib.rs:153 let extended_segments = (segment_size..max_val)
help: consider using a `let` binding to increase its lifetime
I tried moving around the iterator and adding lifetimes to different places, but I couldn't get the checker to pass and still stay on type.
The offending code is below, based on the concurrency chapter in the Rust book. (Complete code is at github.)
use std::sync::mpsc;
use std::thread;
fn sieve_segment(a: &[usize], b: &[usize]) -> Vec<usize> {
vec![]
}
fn eratosthenes_sieve(val: usize) -> Vec<usize> {
vec![]
}
pub fn segmented_sieve_parallel(max_val: usize, mut segment_size: usize) -> Vec<usize> {
if max_val <= ((2 as i64).pow(16) as usize) {
// early return if the highest value is small enough (empirical)
return eratosthenes_sieve(max_val);
}
if segment_size > ((max_val as f64).sqrt() as usize) {
segment_size = (max_val as f64).sqrt() as usize;
println!("Segment size is larger than √{}. Reducing to {} to keep resource use down.",
max_val,
segment_size);
}
let small_primes = eratosthenes_sieve((max_val as f64).sqrt() as usize);
let mut big_primes = small_primes.clone();
let (tx, rx): (mpsc::Sender<Vec<usize>>, mpsc::Receiver<Vec<usize>>) = mpsc::channel();
let extended_segments = (segment_size..max_val)
.collect::<Vec<_>>()
.chunks(segment_size);
for this_segment in extended_segments.clone() {
let small_primes = small_primes.clone();
let tx = tx.clone();
thread::spawn(move || {
let sieved_segment = sieve_segment(&small_primes, this_segment);
tx.send(sieved_segment).unwrap();
});
}
for _ in 1..extended_segments.count() {
big_primes.extend(&rx.recv().unwrap());
}
big_primes
}
fn main() {}
How do I understand and avoid this error? I'm not sure how to make the lifetime of the thread closure static as in this question and still have the function be reusable (i.e., not main()). I'm not sure how to "consume all things that come into [the closure]" as mentioned in this question. And I'm not sure where to insert .map(|s| s.into()) to ensure that all references become moves, nor am I sure I want to.

When trying to reproduce a problem, I'd encourage you to create an MCVE by removing all irrelevant code. In this case, something like this seems to produce the same error:
fn segmented_sieve_parallel(max_val: usize, segment_size: usize) {
let foo = (segment_size..max_val)
.collect::<Vec<_>>()
.chunks(segment_size);
}
fn main() {}
Let's break that down:
Create an iterator between numbers.
Collect all of them into a Vec<usize>.
Return an iterator that contains references to the vector.
Since the vector isn't bound to any variable, it's dropped at the end of the statement. This would leave the iterator pointing to an invalid region of memory, so that's disallowed.
Check out the definition of slice::chunks:
fn chunks(&self, size: usize) -> Chunks<T>
pub struct Chunks<'a, T> where T: 'a {
// some fields omitted
}
The lifetime marker 'a lets you know that the iterator contains a reference to something. Lifetime elision has removed the 'a from the function, which looks like this, expanded:
fn chunks<'a>(&'a self, size: usize) -> Chunks<'a, T>
Check out this line of the error message:
help: consider using a let binding to increase its lifetime
You can follow that as such:
fn segmented_sieve_parallel(max_val: usize, segment_size: usize) {
let foo = (segment_size..max_val)
.collect::<Vec<_>>();
let bar = foo.chunks(segment_size);
}
fn main() {}
Although I'd write it as
fn segmented_sieve_parallel(max_val: usize, segment_size: usize) {
let foo: Vec<_> = (segment_size..max_val).collect();
let bar = foo.chunks(segment_size);
}
fn main() {}
Re-inserting this code back into your original problem won't solve the problem, but it will be much easier to understand. That's because you are attempting to pass a reference to thread::spawn, which may outlive the current thread. Thus, everything passed to thread::spawn must have the 'static lifetime. There are tons of questions that detail why that must be prevented and a litany of solutions, including scoped threads and cloning the vector.
Cloning the vector is the easiest, but potentially inefficient:
for this_segment in extended_segments.clone() {
let this_segment = this_segment.to_vec();
// ...
}
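The scoped-thread route mentioned above avoids the clone, because a scope guarantees that every spawned thread is joined before the borrowed data can go away. The following is a minimal sketch using std::thread::scope, which is available in the standard library since Rust 1.63. The function sum_chunks is hypothetical; summing each chunk stands in for sieve_segment.

```rust
use std::thread;

/// Hypothetical example: processes each chunk of `data` on its own thread
/// and returns one result per chunk, in chunk order.
/// `chunk_size` must be non-zero, as required by `chunks`.
fn sum_chunks(data: &[usize], chunk_size: usize) -> Vec<usize> {
    thread::scope(|s| {
        // Spawn all workers first; each one borrows its chunk directly,
        // with no 'static requirement and no copy of the data.
        let handles: Vec<_> = data
            .chunks(chunk_size)
            .map(|chunk| s.spawn(move || chunk.iter().sum::<usize>()))
            .collect();
        // Join in spawn order so the results line up with the chunks.
        handles
            .into_iter()
            .map(|h| h.join().unwrap())
            .collect::<Vec<usize>>()
    })
}
```

Joining the handles replaces the channel from the question. A channel would also work inside the scope if results should be consumed as they arrive rather than in chunk order.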

Borrowed value doesn't live long enough, trying to expose iterators instead of concrete Vec representations of the data

I have a struct representing a grid of data, and accessors for the rows and columns. I'm trying to add accessors for the rows and columns which return iterators instead of Vec.
use std::slice::Iter;
#[derive(Debug)]
pub struct Grid<Item : Copy> {
raw : Vec<Vec<Item>>
}
impl <Item : Copy> Grid <Item>
{
pub fn new( data: Vec<Vec<Item>> ) -> Grid<Item> {
Grid{ raw : data }
}
pub fn width( &self ) -> usize {
self.rows()[0].len()
}
pub fn height( &self ) -> usize {
self.rows().len()
}
pub fn rows( &self ) -> Vec<Vec<Item>> {
self.raw.to_owned()
}
pub fn cols( &self ) -> Vec<Vec<Item>> {
let mut cols = Vec::new();
for i in 0..self.height() {
let col = self.rows().iter()
.map( |row| row[i] )
.collect::<Vec<Item>>();
cols.push(col);
}
cols
}
pub fn rows_iter( &self ) -> Iter<Vec<Item>> {
// LIFETIME ERROR HERE
self.rows().iter()
}
pub fn cols_iter( &self ) -> Iter<Vec<Item>> {
// LIFETIME ERROR HERE
self.cols().iter()
}
}
Both functions rows_iter and cols_iter have the same problem: error: borrowed value does not live long enough. I've tried a lot of things, but pared it back to the simplest thing to post here.

You can use the into_iter method, which returns std::vec::IntoIter. The iter method only borrows the data source it iterates over, whereas into_iter takes ownership of it, so the vector lives for as long as the iterator does.
pub fn cols_iter( &self ) -> std::vec::IntoIter<Vec<Item>> {
self.cols().into_iter()
}
However, I think that the design of your Grid type could be improved a lot. Always cloning a vector is not a good thing (to name one issue).

Iterators only contain borrowed references to the original data structure; they don't take ownership of it. Therefore, a vector must live longer than an iterator on that vector.
rows and cols allocate and return a new Vec. rows_iter and cols_iter are trying to return an iterator on a temporary Vec. This Vec will be deallocated before rows_iter or cols_iter return. That means that an iterator on that Vec must be deallocated before the function returns. However, you're trying to return the iterator from the function, which would make the iterator live longer than the end of the function.
There is simply no way to make rows_iter and cols_iter compile as is. I believe these methods are simply unnecessary, since you already provide the public rows and cols methods.
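If the signatures are allowed to change, one possible redesign is to borrow from the stored field rather than from a temporary copy. The sketch below is an assumption about what the asker might accept, not a fix of the methods as written: rows_iter borrows self.raw directly, and cols_iter builds each column lazily and yields owned Vecs.

```rust
pub struct Grid<Item: Copy> {
    raw: Vec<Vec<Item>>,
}

impl<Item: Copy> Grid<Item> {
    pub fn new(data: Vec<Vec<Item>>) -> Grid<Item> {
        Grid { raw: data }
    }

    /// Number of columns, taken from the first row (0 for an empty grid).
    pub fn width(&self) -> usize {
        self.raw.first().map_or(0, |row| row.len())
    }

    /// Borrows the stored rows; the iterator is tied to `&self`, not to a temporary.
    pub fn rows_iter(&self) -> std::slice::Iter<'_, Vec<Item>> {
        self.raw.iter()
    }

    /// Builds one column per step; each yielded Vec is owned by the caller.
    /// Assumes every row has at least `width()` elements.
    pub fn cols_iter(&self) -> impl Iterator<Item = Vec<Item>> + '_ {
        (0..self.width())
            .map(move |i| self.raw.iter().map(|row| row[i]).collect::<Vec<Item>>())
    }
}
```

Neither method clones the whole grid, and both iterators stay valid for as long as the grid is borrowed.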

How to achieve equivalent of take_while on a slice?

Rust slices do not currently support some iterator methods, e.g. take_while. What is the best way to implement take_while for slices?
const STRHELLO:&'static[u8] = b"HHHello";
fn main() {
let subslice:&[u8] = STRHELLO.iter().take_while(|c|(**c=='H' as u8)).collect();
println!("Expecting: {}, Got {}",STRHELLO.slice_to(3),subslice);
assert!(subslice==STRHELLO.slice_to(3));
}
results in the error:
<anon>:6:74: 6:83 error: the trait `core::iter::FromIterator<&u8>` is not implemented for the type `&[u8]`
This code in the playpen:
http://is.gd/1xkcUa

First of all, the issue you have is that collect is about creating a new collection, while a slice is about referencing a contiguous range of items in an existing array (be it dynamically allocated or not).
I am afraid that due to the nature of traits, the fact that the original container (STRHELLO) was a contiguous range has been lost, and cannot be reconstructed after the fact. I am also afraid that any use of "generic" iterators simply cannot lead to the desired output; the type system would have to somehow carry the fact that:
the original container was a contiguous range
the chain of operations performed so far conserve this property
This may be doable or not, but I do not see it done now, and I am unsure in what way it could be elegantly implemented.
On the other hand, you can go about it in the do-it-yourself way:
fn take_while<'a>(initial: &'a [u8], predicate: |&u8| -> bool) -> &'a [u8] { // '
let mut i = 0u;
for c in initial.iter() {
if predicate(c) { i += 1; } else { break; }
}
initial.slice_to(i)
}
And then:
fn main() {
let subslice: &[u8] = take_while(STRHELLO, |c|(*c==b'H'));
println!("Expecting: {}, Got {}",STRHELLO.slice_to(3), subslice);
assert!(subslice == STRHELLO.slice_to(3));
}
Note: 'H' as u8 can be rewritten as b'H' as shown here, which is symmetric with the strings.
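The code above uses pre-1.0 syntax (0u, slice_to, and the |&u8| -> bool closure type), which current compilers reject. The same do-it-yourself idea can be sketched in current Rust as follows. The name take_while_slice is hypothetical, and the function is made generic over the element type as an extra assumption.

```rust
/// Hypothetical helper: returns the longest prefix of `initial` whose
/// elements all satisfy `predicate`, as a sub-slice of the input (no copy).
fn take_while_slice<'a, T>(initial: &'a [T], predicate: impl Fn(&T) -> bool) -> &'a [T] {
    // Index of the first element that fails the predicate,
    // or the full length if every element passes.
    let end = initial
        .iter()
        .position(|x| !predicate(x))
        .unwrap_or(initial.len());
    &initial[..end]
}
```

Because the result is sliced out of the input, it keeps the input's lifetime, which is exactly what collect() could not provide.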

It is possible via some heavy gymnastics to implement this functionality using the stock iterators:
use std::raw::Slice;
use std::mem::transmute;
/// Splice together two slices of the same type that are contiguous in memory.
/// Panics if the slices aren't contiguous with "a" coming first.
/// i.e. slice b must follow slice a immediately in memory.
fn splice<'a>(a:&'a[u8], b:&'a[u8]) -> &'a[u8] {
unsafe {
let aa:Slice<u8> = transmute(a);
let bb:Slice<u8> = transmute(b);
let pa = aa.data as *const u8;
let pb = bb.data as *const u8;
let off = aa.len as int; // Risks overflow into negative!!!
assert!(pa.offset(off) == pb, "Slices were not contiguous!");
let cc = Slice{data:aa.data,len:aa.len+bb.len};
transmute(cc)
}
}
/// Wrapper around splice that lets you use None as a base case for fold
/// Will panic if the slices cannot be spliced! See splice.
fn splice_for_fold<'a>(oa:Option<&'a[u8]>, b:&'a[u8]) -> Option<&'a[u8]> {
match oa {
Some(a) => Some(splice(a,b)),
None => Some(b),
}
}
/// Implementation using pure iterators
fn take_while<'a>(initial: &'a [u8],
predicate: |&u8| -> bool) -> Option<&'a [u8]> {
initial
.chunks(1)
.take_while(|x|(predicate(&x[0])))
.fold(None, splice_for_fold)
}
usage:
const STRHELLO:&'static[u8] = b"HHHello";
let subslice: &[u8] = super::take_while(STRHELLO, |c|(*c==b'H')).unwrap();
println!("Expecting: {}, Got {}",STRHELLO.slice_to(3), subslice);
assert!(subslice == STRHELLO.slice_to(3));
Matthieu's implementation is way cleaner if you just need take_while. I am posting this anyway since it may be a path towards solving the more general problem of using iterator functions on slices cleanly.
