Re-use already advanced iterator for different function - rust

While iterating over lines in a file I need to first do "task_A" and then "task_B". The first few lines contain data that I need to put into a data structure (task_A), and the lines after that describe how the data inside that data structure is manipulated (task_B). Right now I use a for-loop with enumerate and an if-else statement that switches depending on the line number:
let file = File::open("./example.txt").unwrap();
let reader = BufReader::new(file);
for (i, lines) in reader.lines().map(|l| l.unwrap()).enumerate() {
if i < n {
do_task_a(&lines);
} else {
do_task_b(&lines);
}
}
There is also the take_while()-method for iterators. But this only solves one part. Ideally I would pass the iterator for n steps to one function and after that to another function. I want to have a solution that only needs to iterate over the file one time.
(For anyone wondering: I want a more elegant solution for day 5 of Advent of Code 2022.) Is there a way to do that? To "re-use" the iterator when it is already advanced n steps?

Looping or using an iterator adapter will consume an iterator. But if I is an iterator then so is &mut I!
You can use that implementation to partially iterate through the iterator with one adapter and then continue with another. The first use consumes only the mutable reference, but not the iterator itself. For example using take:
let mut it = reader.lines().map(|l| l.unwrap());
for lines in (&mut it).take(n) {
do_task_a(&lines);
}
for lines in it {
do_task_b(&lines);
}
But I think your original code is still completely fine.
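As a self-contained sketch of the same idea, the snippet below reads from an in-memory Cursor instead of a file and simply collects the lines of each phase. The function name split_tasks and the sample input are made up for illustration; by_ref() is the standard-library spelling of (&mut it).

```rust
use std::io::{BufRead, Cursor};

// Hypothetical helper: the first `n` lines go to "task A",
// everything after them goes to "task B".
fn split_tasks<R: BufRead>(reader: R, n: usize) -> (Vec<String>, Vec<String>) {
    let mut it = reader.lines().map(|l| l.unwrap());
    // by_ref() borrows the iterator mutably, so take(n) only consumes the borrow.
    let task_a: Vec<String> = it.by_ref().take(n).collect();
    // `it` is still usable here and continues right after the first n lines.
    let task_b: Vec<String> = it.collect();
    (task_a, task_b)
}

fn main() {
    let input = Cursor::new("stack 1\nstack 2\nmove 1\nmove 2\n");
    let (task_a, task_b) = split_tasks(input, 2);
    println!("task A lines: {task_a:?}");
    println!("task B lines: {task_b:?}");
}
```

Note that take(n) pulls exactly n items, so nothing is lost between the two phases. With take_while the first line that fails the predicate is consumed as well, which is worth keeping in mind if the split point is detected by content rather than by count.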

Related

How to iterate over HashMap starting from given key?

Given a HashMap of n elements how does one start iteration from n-x element.
The order of elements does not matter, the only problem I need to solve is to start iteration from given key.
Example:
let mut map: HashMap<&str, i32> = HashMap::new();
map.insert("one", 1);
map.insert("two", 2);
map.insert("three", 3);
map.insert("four", 4);
[...]
for (k, v) in map {
//how to start iteration from third item and not the first one
}
Tried to google it but no examples found so far.
"Tried to google it but no examples found so far."
That's because as Chayim Friedman notes it doesn't really make sense, a hashmap has an essentially random internal order, which means it has an arbitrary iteration order. Iterating from or between keys (/ entries) thus doesn't make much sense.
So it sounds a lot like an XY problem, what is the reason why you're trying to iterate "starting from a given key"?
Though if you really want that, you can just use the skip_while adapter, and skip while you have not found the key you're looking for.
Alternatively, since your post is ambiguous (you talk about both key and position) you can use the skip adapter to skip over a fixed number of items.
Technically neither will start iterating from that entry, they'll both start iterating from 0 but will only yield items following the specified break point. The standard library's hashmap has no support for range iteration (because that doesn't really make any sense on a hashmap), and its iterators are not random access either (for similar reasons).
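A minimal sketch of both adapters, assuming the map from the question; the helper names entries_from_key and entries_after are invented for this example. Because the iteration order is arbitrary, which entries follow the key (or the first few positions) can differ between runs.

```rust
use std::collections::HashMap;

// Hypothetical helper: yields the entry for `start` and whatever happens
// to come after it in the map's arbitrary iteration order.
fn entries_from_key<'a>(map: &HashMap<&'a str, i32>, start: &str) -> Vec<(&'a str, i32)> {
    map.iter()
        .skip_while(|(k, _)| **k != start)
        .map(|(k, v)| (*k, *v))
        .collect()
}

// Hypothetical helper: skips a fixed number of entries instead.
fn entries_after<'a>(map: &HashMap<&'a str, i32>, count: usize) -> Vec<(&'a str, i32)> {
    map.iter().skip(count).map(|(k, v)| (*k, *v)).collect()
}

fn main() {
    let mut map: HashMap<&str, i32> = HashMap::new();
    map.insert("one", 1);
    map.insert("two", 2);
    map.insert("three", 3);
    map.insert("four", 4);

    // The first element is always ("three", 3); the rest depends on the order.
    println!("{:?}", entries_from_key(&map, "three"));
    // Always two entries, but which two is unspecified.
    println!("{:?}", entries_after(&map, 2));
}
```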
You may want to use a BTreeMap, which has sorted keys and a range function which iterates over a range of keys.
use std::collections::BTreeMap;
fn main() {
let mut map = BTreeMap::new();
map.insert(1, "one");
map.insert(2, "two");
map.insert(3, "three");
for (&key, &value) in map.range(2..) {
println!("{key}: {value}");
}
}
// 2: two
// 3: three

What is the proper way of modifying a value of an entry in a HashMap?

I am a beginner in Rust, I haven't finished the "Book" yet, but one thing made me ask this question.
Considering this code:
use std::collections::HashMap;
fn main() {
let mut entries = HashMap::new();
entries.insert("First".to_string(), 10);
entries.entry("Second".to_string()).or_insert(20);
assert_eq!(10, *entries.get("First").unwrap());
entries.entry(String::from("First")).and_modify(|value| { *value = 20});
assert_eq!(20, *entries.get("First").unwrap());
entries.insert("First".to_string(), 30);
assert_eq!(30, *entries.get("First").unwrap());
}
I have used two ways of modifying an entry:
entries.entry(String::from("First")).and_modify(|value| { *value = 20});
entries.insert("First".to_string(), 30);
The insert way looks clunky, and I wouldn't personally use it to modify a value in an entry, but... it works. Nevertheless, is there a reason not to use it other than semantics? As I said, I'd rather use the entry construct than just brute-forcing an update using insert with an existing key. Is there something a newbie Rustacean like me could not possibly know?
insert() is a bit more idiomatic when you are replacing an entire value, particularly when you don't know (or care) if the value was present to begin with.
get_mut() is more idiomatic when you want to do something to a value that requires mutability, such as replacing only one field of a struct or invoking a method that requires a mutable reference. If you know the key is present you can use .unwrap(), otherwise you can use one of the other Option utilities or match.
entry(...).and_modify(...) by itself is rarely idiomatic; it's more useful when chaining other methods of Entry together, such as where you want to modify a value if it exists, otherwise add a different value. You might see this pattern when working with maps where the values are totals:
entries.entry(key)
.and_modify(|v| *v += 1)
.or_insert(1);
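To make the get_mut() and entry() cases concrete, here is a small sketch. The Player struct and the functions add_score and count_words are hypothetical names used only for illustration.

```rust
use std::collections::HashMap;

// Hypothetical value type with more than one field.
#[derive(Debug, PartialEq)]
struct Player {
    name: String,
    score: u32,
}

// get_mut(): change one field in place; returns false if the key is absent.
fn add_score(players: &mut HashMap<u32, Player>, id: u32, points: u32) -> bool {
    match players.get_mut(&id) {
        Some(player) => {
            player.score += points;
            true
        }
        None => false,
    }
}

// entry(): bump an existing total, or start a new one at 1.
fn count_words(text: &str) -> HashMap<&str, u32> {
    let mut counts = HashMap::new();
    for word in text.split_whitespace() {
        counts.entry(word).and_modify(|c| *c += 1).or_insert(1);
    }
    counts
}

fn main() {
    let mut players = HashMap::new();
    // insert(): set the whole value, whether or not the key existed before.
    players.insert(1, Player { name: "Ann".to_string(), score: 0 });
    add_score(&mut players, 1, 10);
    println!("{:?}", players.get(&1));
    println!("{:?}", count_words("a b a"));
}
```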

Writing expression in polars-lazy in rust

I need to write my own expression in polars_lazy. Based on my understanding of the source code, I need to write a function that returns Expr::Function. The problem is that in order to construct an object of this type, an object of type FunctionOptions must be provided. The caveat is that this struct is public but its members are pub(crate), and thus outside of the crate one cannot construct such an object.
Are there ways around this?
I don't think you're meant to directly construct Exprs. Instead, you can use functions like polars_lazy::dsl::col() and polars_lazy::dsl::lit() to create expressions, then use methods on Expr to build up the expression. Several of those methods, such as map() and apply(), will give you an Expr::Function.
Personally I think the Rust API for polars is not well documented enough to really use yet. Although the other answer and comments mention apply and map, they don't mention how or the trade-offs. I hope this answer prompts others to correct me with the "right" way to do things.
So first, here's how to use apply on a lazy dataframe (lazy dataframes don't take apply directly as a method the way eager ones do), mutating the column in place:
// not sure how you'd find this type easily from apply documentation
let o = GetOutput::from_type(DataType::UInt32);
// this mutates two in place
let lf = lf.with_column(col("two").apply(str_to_len, o));
And here's how to use it while not mutating the source column and adding a new output column instead:
let o = GetOutput::from_type(DataType::UInt32);
// this adds new column len, two is unchanged
let lf = lf.with_column(col("two").alias("len").apply(str_to_len, o));
With the str_to_len looking like:
fn str_to_len(str_val: Series) -> Result<Series> {
let x = str_val
.utf8()
.unwrap()
.into_iter()
// your actual custom function would be in this map
.map(|opt_name: Option<&str>| opt_name.map(|name: &str| name.len() as u32))
.collect::<UInt32Chunked>();
Ok(x.into_series())
}
Note that it takes Series rather than &Series and wraps in Result.
With a regular (non-lazy) dataframe, apply still mutates but doesn't require with_column:
df.apply("two", str_to_len).expect("applied");
Whereas eager/non-lazy's with_column doesn't require apply:
// the fn we use to make the column names it too
df.with_column(str_to_len(df.column("two").expect("has two"))).expect("with_column");
And str_to_len has slightly different signature:
fn str_to_len(str_val: &Series) -> Series {
let mut x = str_val
.utf8()
.unwrap()
.into_iter()
.map(|opt_name: Option<&str>| opt_name.map(|name: &str| name.len() as u32))
.collect::<UInt32Chunked>();
// NB. this is naming the chunked array, before we even get to a series
x.rename("len");
x.into_series()
}
I know there are reasons to have lazy and eager operate differently, but I wish the Rust documentation made this easier to figure out.

Unable to join threads from JoinHandles stored in a Vector - Rust

I am writing a program which scrapes data from a list of websites and stores it into a struct called Listing which is then collected into a final struct called Listings.
use std::{ thread,
sync::{ Arc, Mutex }
};
fn main() {
// ... some declarations
let sites_count = site_list.len(); // site_list is a vector containing the list of websites
// The variable to be updated by the thread instances ( `Listing` is a struct holding the information )
let listings: Arc<Mutex<Vec<Vec<types::Listing<String>>>>> = Arc::new(Mutex::new(Vec::new()));
// A vector containing all the JoinHandles for the spawned threads
let mut fetch_handle: Vec<thread::JoinHandle<()>> = Vec::new();
// Spawn a thread for each concurrent website
for i in 0..sites_count {
let slist = Arc::clone(&site_list);
let listng = Arc::clone(&listings);
fetch_handle.push(
thread::spawn(move || {
println!("⌛ Spawned Thread: {}",i);
let site_profile = read_profile(&slist[i]);
let results = function1(function(2)); // A long list of functions from a submodule that make the http request and parse the data into `Listing`
listng.lock().unwrap().push(results);
}));
}
for thread in fetch_handle.iter_mut() {
thread.join().unwrap();
}
// This is the one line version of the above for loop - yields the same error.
// fetch_handle.iter().map(|thread| thread.join().unwrap());
// The final println to just test feed the target struct `Listings` with the values
println!("{}",types::Listings{ date_time: format!("{}", chrono::offset::Local::now()),
category: category.to_string(),
query: (&search_query).to_string(),
listings: listings.lock().unwrap() // It prevents me from owning this variable
}.to_json());
}
To which I stumble upon the error
error[E0507]: cannot move out of `*thread` which is behind a mutable reference
--> src/main.rs:112:9
|
112 | thread.join().unwrap();
| ^^^^^^ move occurs because `*thread` has type `JoinHandle<()>`, which does not implement the `Copy` trait
It prevents me from owning the variable after the thread.join() for loop.
When I tried assigning to check the output type
let all_listings = listings.lock().unwrap();
all_listings reports a type of MutexGuard (which is also true inside the thread for loop, but there it allows me to call vector methods on it) and wouldn't allow me to own the data.
I changed the data type in the Listings struct to hold a reference instead of owning it. But it seems the operations I perform on the struct in .to_json() require me to own its value.
The type declaration for listings inside the Listings struct is Vec<Vec<Listing<T>>>.
This code however works just fine when I move the .join().unwrap() to the end of the thread::spawn() block or apply it to the handle inside the for loop (whilst disabling the external .join()). But that makes all the threads execute in a chain, which is not desirable, since the main intention of using threads was to execute the same functions with different data values simultaneously.
I am quite new to Rust in general (it has been 3 weeks since I started using it) and it's my first time ever implementing multithreading. I have only ever written single-threaded programs in Java and Python before this, so if possible be a little noob friendly. However any help is appreciated :) .
I figured out what needed to happen. First, for this kind of thing, I agree that into_iter does what you want, but IMO it obscures why. The why is that when you iterate by reference you only borrow each value and don't own it, and ownership is necessary for the join() method on the JoinHandle<()> struct. You'll note its signature takes self and not &mut self or anything like that. So it needs the real object there.
To do that, you need to get your object out of the Vec<thread::JoinHandle<()>> that it's inside. As stated, into_iter does this, because it "destroys" the existing Vec and takes it over, so it fully owns the contents, and the iteration returns the "actual" objects to be joined without a copy. But you can also own the contents one at a time with remove as demonstrated below:
while fetch_handle.len() > 0 {
let cur_thread = fetch_handle.remove(0); // moves it into cur_thread
cur_thread.join().unwrap();
}
This is instead of your for loop above. The complete example in the playground is linked if you want to try that.
I hope this is clearer on how to work with things that can't be copied, but methods need to fully own them, and the issues in getting them out of collections. Imagine if you needed to end just one of those threads, and you knew which one to end, but didn't want to end them all? Vec<_>::remove would work, but into_iter would not.
Thank you for asking a question which made me think, and prompted me to go look up the answer (and try it) myself. I'm still learning Rust as well, so this helped a lot.
Edit:
Another way to do it with pop() and while let:
while let Some(cur_thread) = fetch_handle.pop() {
cur_thread.join().unwrap();
}
This goes through the handles from the end (pop takes from the end, not the front), and unlike remove(0) it doesn't have to shift the remaining vector contents each time.
Okay, so the problem, as pointed out by @PiRocks, seems to be in the for loop that joins the threads.
for thread in fetch_handle.iter_mut() {
thread.join().unwrap();
}
The problem is the iter_mut(). Using into_iter() instead
for thread in fetch_handle.into_iter() {
thread.join().unwrap();
}
yields no errors and the program runs across the threads simultaneously as required.
The explanation for this, as given by @Kevin Anderson, is:
Using into_iter() causes JoinHandle<()> to move into the for loop.
Also, looking into the docs (std::iter), I found that iter() and iter_mut() iterate over references to the elements, whereas into_iter() iterates over self directly (owning it).
So iter_mut() was iterating over &mut thread::JoinHandle<()> instead of thread::JoinHandle<()>.
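Putting the answers above together, here is a small runnable sketch of the spawn-then-join pattern. The function run_workers and the doubling "work" are placeholders invented for this example and stand in for the scraping code from the question.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

// Hypothetical stand-in for the scraper: each thread doubles one input
// and pushes the result into a shared vector.
fn run_workers(inputs: Vec<u32>) -> Vec<u32> {
    let inputs = Arc::new(inputs);
    let results: Arc<Mutex<Vec<u32>>> = Arc::new(Mutex::new(Vec::new()));
    let mut handles: Vec<thread::JoinHandle<()>> = Vec::new();

    // Spawn everything first so the threads run concurrently.
    for i in 0..inputs.len() {
        let inputs = Arc::clone(&inputs);
        let results = Arc::clone(&results);
        handles.push(thread::spawn(move || {
            let value = inputs[i] * 2;
            results.lock().unwrap().push(value);
        }));
    }

    // `for handle in handles` uses into_iter(), so each JoinHandle is moved
    // out of the Vec and join(self) can take ownership of it.
    for handle in handles {
        handle.join().unwrap();
    }

    // Threads finish in no particular order, so sort a copy of the results.
    let mut out = results.lock().unwrap().clone();
    out.sort();
    out
}

fn main() {
    println!("{:?}", run_workers(vec![3, 1, 2]));
}
```

Cloning out of the MutexGuard is just one simple way to end up with an owned Vec after all threads have been joined; it is used here to keep the sketch short.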

How to get a slice from an Iterator?

I started to use clippy as a linter. Sometimes, it shows this warning:
writing `&Vec<_>` instead of `&[_]` involves one more reference and cannot be
used with non-Vec-based slices. Consider changing the type to `&[...]`,
#[warn(ptr_arg)] on by default
I changed the parameter to a slice but this adds boilerplate on the call side. For instance, the code was:
let names = args.arguments.iter().map(|arg| {
arg.name.clone()
}).collect();
function(&names);
but now it is:
let names = args.arguments.iter().map(|arg| {
arg.name.clone()
}).collect::<Vec<_>>();
function(&names);
otherwise, I get the following error:
error: the trait `core::marker::Sized` is not implemented for the type
`[collections::string::String]` [E0277]
So I wonder if there is a way to convert an Iterator to a slice or avoid having to specify the collected type in this specific case.
"So I wonder if there is a way to convert an Iterator to a slice"
There is not.
An iterator only provides one element at a time, whereas a slice is about getting several elements at a time. This is why you first need to collect all the elements yielded by the Iterator into a contiguous array (Vec) before being able to use a slice.
The first obvious answer is not to worry about the slight overhead, though personally I would prefer placing the type hint next to the variable (I find it more readable):
let names: Vec<_> = args.arguments.iter().map(|arg| {
arg.name.clone()
}).collect();
function(&names);
Another option would be for function to take an Iterator instead (and an iterator of references, at that):
let names = args.arguments.iter().map(|arg| &arg.name);
function(names);
After all, iterators are more general, and you can always "realize" the slice inside the function if you need to.
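A minimal sketch of that second option, assuming a hypothetical Argument struct with a name field and a made-up function describe in place of the question's function:

```rust
// Hypothetical stand-in for the items in args.arguments.
struct Argument {
    name: String,
}

// Accepts any iterator of &String, so the caller does not have to collect.
fn describe<'a>(names: impl Iterator<Item = &'a String>) -> String {
    // "Realize" the items into a Vec only here, where a slice is needed.
    let collected: Vec<&str> = names.map(|s| s.as_str()).collect();
    collected.join(", ")
}

fn main() {
    let arguments = vec![
        Argument { name: "alpha".to_string() },
        Argument { name: "beta".to_string() },
    ];
    // No clone() and no type annotation on the call side.
    println!("{}", describe(arguments.iter().map(|arg| &arg.name)));
}
```

The trade-off is that the function signature becomes generic, but callers holding a Vec<String> can still pass names.iter() directly.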
"So I wonder if there is a way to convert an Iterator to a slice"
There is. (in applicable cases)
Got here searching "rust iter to slice", for my use-case, there was a solution:
fn main() {
// example struct
#[derive(Debug)]
struct A(u8);
let list = vec![A(5), A(6), A(7)];
// list_ref passed into a function somewhere ...
let list_ref: &[A] = &list;
let mut iter = list_ref.iter();
// consume some ...
let _a5: Option<&A> = iter.next();
// now want to eg. return a slice of the rest
let slice: &[A] = iter.as_slice();
println!("{:?}", slice); // [A(6), A(7)]
}
That said, .as_slice is defined on an iter of an existing slice, so the previous answerer was correct in that if you've got, eg. a map iter, you would need to collect it first (so there is something to slice from).
docs: https://doc.rust-lang.org/std/slice/struct.Iter.html#method.as_slice
