Collect items from an iterator at a specific index - rust

I was wondering if it is possible to use .collect() on an iterator to grab items at a specific index. For example if I start with a string, I would normally do:
let line = "Some line of text for example";
let l = line.split(" ");
let lvec: Vec<&str> = l.collect();
let text = &lvec[3];
But what would be nice is something like:
let text: &str = l.collect(index=(3));

No, it's not; however you can easily filter before you collect, which in practice achieves the same effect.
If you wish to filter by index, you need to add the index in and then strip it afterwards:
enumerate (to add the index to the element)
filter based on this index
map to strip the index from the element
Or in code:
fn main() {
    let line = "Some line of text for example";
    let l = line.split(" ")
        .enumerate()
        .filter(|&(i, _)| i == 3)
        .map(|(_, e)| e);
    let lvec: Vec<&str> = l.collect();
    let text = &lvec[0];
    println!("{}", text);
}
If you only wish to get a single index (and thus element), then using nth is much easier. It returns an Option<&str> here, which you need to take care of:
fn main() {
    let line = "Some line of text for example";
    let text = line.split(" ").nth(3).unwrap();
    println!("{}", text);
}
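Since nth returns None when the iterator runs out before the requested index, here is a sketch of handling that case without unwrap() (word_at is a hypothetical helper name):

```rust
/// Returns the word at position `index` (0-based), or None if there are fewer words.
fn word_at(line: &str, index: usize) -> Option<&str> {
    line.split(' ').nth(index)
}

fn main() {
    let line = "Some line of text for example";
    // Handle both cases explicitly instead of calling unwrap().
    match word_at(line, 3) {
        Some(word) => println!("word 3: {}", word),
        None => println!("the line has fewer than 4 words"),
    }
    // Or fall back to a default value.
    println!("{}", word_at(line, 42).unwrap_or("<missing>"));
}
```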
If you can have an arbitrary predicate but only want the first element that matches, then collecting into a Vec is inefficient: it consumes the whole iterator (no laziness) and may allocate a lot of memory that is never needed.
You are thus better off simply asking for the first element using the next method of the iterator, which returns an Option<&str> here:
fn main() {
    let line = "Some line of text for example";
    let text = line.split(" ")
        .enumerate()
        .filter(|&(i, _)| i % 7 == 3)
        .map(|(_, e)| e)
        .next()
        .unwrap();
    println!("{}", text);
}
If you want to select part of the result, by index, you may also use skip and take before collecting, but I guess you have enough alternatives presented here already.
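A minimal sketch of the skip/take idea, assuming you want a contiguous range of words (words_in_range is a hypothetical helper):

```rust
/// Collects the words at positions start..start + count (0-based).
fn words_in_range(line: &str, start: usize, count: usize) -> Vec<&str> {
    line.split(' ')
        .skip(start) // drop the first `start` words
        .take(count) // keep at most `count` words after that
        .collect()
}

fn main() {
    let line = "Some line of text for example";
    println!("{:?}", words_in_range(line, 1, 3)); // ["line", "of", "text"]
}
```

If the range runs past the end of the line, take simply yields fewer words rather than panicking.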

There is an nth function on Iterator that does this:
let text = line.split(" ").nth(3).unwrap();

No; you can use skip and next, though:
let line = "Some line of text for example";
let l = line.split(" ");
let text = l.skip(3).next();
Note that this results in text being an Option<&str>, as there's no guarantee that the sequence actually has at least four elements.
Addendum: using nth is definitely shorter, though I prefer to be explicit about the fact that accessing the nth element of an iterator necessarily consumes all the elements before it.

For anyone who may be interested, you can do loads of cool things with iterators (thanks Matthieu M). For example, to get multiple 'words' from a string according to their index, you can use filter along with logical or (||) to test for multiple indexes!
let line = "FCC2CCMACXX:4:1105:10758:14389# 81 chrM 1 32 10S90M = 16151 16062";
let words: Vec<&str> = line.split(" ")
    .enumerate()
    .filter(|&(i, _)| i == 1 || i == 3 || i == 6)
    .map(|(_, e)| e)
    .collect();
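If the wanted indexes are only known at run time, one option (a sketch; words_at and wanted are illustrative names) is to keep them in a slice and test membership with contains instead of chaining ||:

```rust
/// Keeps the words whose 0-based position appears in `wanted`.
fn words_at<'a>(line: &'a str, wanted: &[usize]) -> Vec<&'a str> {
    line.split(' ')
        .enumerate()
        .filter(|(i, _)| wanted.contains(i)) // i is a &usize here
        .map(|(_, word)| word)
        .collect()
}

fn main() {
    let line = "FCC2CCMACXX:4:1105:10758:14389# 81 chrM 1 32 10S90M = 16151 16062";
    println!("{:?}", words_at(line, &[1, 3, 6])); // ["81", "1", "="]
}
```

The result follows the order of the words in the line, not the order of wanted; contains is a linear scan, which is fine for a handful of indexes.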

Related

Access value after it has been borrowed

I have the following function. It is given a file. It should return a random line from the file as a string.
fn get_word(word_list: File) -> String {
    let reader = BufReader::new(word_list);
    let lines = reader.lines();
    let word_count = lines.count();
    let y: usize = thread_rng().gen_range(0, word_count - 1);
    let element = lines.nth(y);
    match element {
        Some(x) => println!("Result: {}", x.unwrap()),
        None => println!("Error with nth"),
    }
    let word = String::new(""); // Once the error is gone, I would create the string.
    return word;
}
But I keep getting this error:
93 | let lines = reader.lines();
| ----- move occurs because `lines` has type `std::io::Lines<BufReader<File>>`, which does not implement the `Copy` trait
94 | let word_count = lines.count();
| ------- `lines` moved due to this method call
...
99 | let element = lines.nth(y);
| ^^^^^^^^^^^^ value borrowed here after move
|
I am new to Rust and have been learning by trial and error. I don't know how to access the data after I have called the count function. If there is another method to accomplish what I want, I would gladly welcome it.
The .count() method consumes the iterator. From the documentation:
Consumes the iterator, counting the number of iterations and returning it.
This method will call next repeatedly until None is encountered, returning the number of times it saw Some. Note that next has to be called at least once even if the iterator does not have any elements.
In other words, it reads the file content and discards it. If you want to get the Nth line, then you have to re-read the file using another iterator instance.
If your file is small, you can save the read lines in a vector:
let lines = reader.lines().collect::<Result<Vec<String>, _>>().unwrap(); // lines() yields io::Result<String> items
Then the length of the vector is the number of lines and you can avoid re-reading the file, but if it's a large file you may end up crashing with an "out of memory" error. In that case you should re-read the file content, or use a better strategy such as building an index of the byte offsets where each line starts, so you can jump straight to the line you want without re-reading a lot of data.
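A sketch of the re-read approach, assuming the reader supports seeking (BufReader<File> does): count the lines through by_ref() so the reader is not moved, rewind with seek, then walk to the wanted line. count_then_get is a hypothetical name, and io::Cursor stands in for a file so the example runs without one.

```rust
use std::io::{self, BufRead, Cursor, Read, Seek, SeekFrom};

/// Counts the lines, rewinds to the start, then returns the line at `index`.
fn count_then_get<R: BufRead + Seek>(
    mut reader: R,
    index: usize,
) -> io::Result<(usize, Option<String>)> {
    // First pass: by_ref() lends the reader to lines(), so it is not moved.
    let count = reader.by_ref().lines().count();
    // Rewind so the second pass sees the same data again.
    reader.seek(SeekFrom::Start(0))?;
    // Second pass: nth() walks to the wanted line; transpose() turns
    // Option<io::Result<String>> into io::Result<Option<String>>.
    let line = reader.lines().nth(index).transpose()?;
    Ok((count, line))
}

fn main() -> io::Result<()> {
    // Cursor stands in for BufReader<File>; both implement BufRead + Seek.
    let (count, line) = count_then_get(Cursor::new("alpha\nbeta\ngamma\n"), 1)?;
    println!("{} lines, line 1 = {:?}", count, line);
    Ok(())
}
```

This reads the data twice but never holds more than one line in memory.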
The value returned by lines is an iterator, which reads the file sequentially. To count the number of lines, the iterator is consumed: self is taken by value; ownership is transferred into the count() function. So you can't rewind and then request the nth line.
The easiest solution is to read all the lines into a vector:
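// lines() yields io::Result<String> items, so collect into a Result and unwrap it
let lines = reader.lines().collect::<Result<Vec<String>, _>>().unwrap();
let word_count = lines.len();
// the upper bound is exclusive, so the last line can be picked too (rand 0.8 range syntax)
let y: usize = thread_rng().gen_range(0..word_count);
let word = lines[y].clone();
return word;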
Notice the clone call: you can't simply write return lines[y]; because indexing does not let you move the String out of the vector (the compiler reports "cannot move out of index"), and returning &lines[y] would not work either, because the vector is destroyed as soon as the function returns. By returning a clone of the string, this is avoided.
(to_owned or even to_string would also work. You can also avoid a copy by using swap_remove; I'm not sure there is a more elegant way to move one element from a vector and discard the rest.)
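A minimal sketch of the swap_remove variant (take_line is a hypothetical helper):

```rust
/// Moves the line at index `y` out of `lines` without cloning it.
/// swap_remove is O(1): it moves the last element into slot `y`,
/// which does not matter here because the vector is dropped afterwards.
fn take_line(mut lines: Vec<String>, y: usize) -> String {
    lines.swap_remove(y) // panics if y >= lines.len()
}

fn main() {
    let lines = vec!["one".to_string(), "two".to_string(), "three".to_string()];
    println!("{}", take_line(lines, 1)); // two
}
```

lines.into_iter().nth(y) is another way to move a single element out; it returns an Option instead of panicking on a bad index.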
Note that counting the lines and then selecting one of them requires you to either rewind the iterator and go through it twice (once to count and once to select), or to store everything in memory first (e.g. with .collect::<Vec<_>>()). Selecting a random line can however be done in a single pass by randomly choosing, on each line, whether to keep the currently selected line or replace it with the line just read:
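use rand::{thread_rng, Rng};
use std::fs::File;
use std::io::{BufRead, BufReader};

fn get_word(word_list: File) -> String {
    let reader = BufReader::new(word_list);
    let mut lines = reader.lines();
    // Start with the first line (panics on an empty file).
    let mut selected = lines.next().unwrap();
    // Number of lines seen so far.
    let mut count = 1;
    for l in lines {
        count += 1;
        // Replace the current pick with probability 1/count.
        if thread_rng().gen_range(0..count) == 0 {
            selected = l;
        }
    }
    match selected {
        Ok(x) => x,
        Err(_) => {
            eprintln!("Error get_word");
            String::new()
        }
    }
}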
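The same single-pass selection (reservoir sampling with a reservoir of one) can be written generically. In this sketch the random source is passed in as a closure, rand_below, a hypothetical parameter that must return a uniform integer in 0..n; this keeps the selection logic independent of the rand crate.

```rust
/// Picks one item uniformly at random in a single pass.
/// `rand_below(n)` must return a uniformly random integer in 0..n.
fn pick_one<I, F>(items: I, mut rand_below: F) -> Option<I::Item>
where
    I: IntoIterator,
    F: FnMut(usize) -> usize,
{
    let mut selected = None;
    for (i, item) in items.into_iter().enumerate() {
        // The (i + 1)-th item replaces the current pick with probability 1/(i + 1);
        // the first item (i = 0) is therefore always picked initially.
        if rand_below(i + 1) == 0 {
            selected = Some(item);
        }
    }
    selected
}

fn main() {
    // Deterministic stand-in for a random source: always answering 0
    // makes every item replace the previous pick.
    let last = pick_one(vec!["a", "b", "c"], |_| 0);
    println!("{:?}", last); // Some("c")
}
```

With rand 0.8 you could, for example, pass |n| rng.gen_range(0..n) for some rng = thread_rng(); that usage is an assumption about your rand version.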
Or of course the simplest way is to just use choose:
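use rand::seq::IteratorRandom;
use rand::thread_rng;
use std::fs::File;
use std::io::{BufRead, BufReader};

fn get_word(word_list: File) -> String {
    let reader = BufReader::new(word_list);
    // lines() is a method call, and choose() takes the RNG by mutable reference.
    match reader.lines().choose(&mut thread_rng()) {
        Some(Ok(x)) => x,
        _ => {
            eprintln!("Error get_word");
            String::new()
        }
    }
}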
To solve this problem I used the suggested approach of collecting into a Vec with .collect(), but in my case the whole solution needed a little more work.
First: .lines() returns an Iterator over items of type Result<std::string::String, std::io::Error>.
Second: to access a value in this vector I have to borrow it with &.
Here is the working function:
fn get_word(word_list: File) -> String {
    let reader = BufReader::new(word_list);
    // Each element is an io::Result<String>.
    let lines = reader.lines().collect::<Vec<_>>();
    let word_count = lines.len();
    // The upper bound is exclusive, so every line can be picked (rand 0.8 range syntax).
    let y: usize = thread_rng().gen_range(0..word_count);
    match &lines[y] {
        Ok(x) => x.to_string(),
        Err(_) => {
            eprintln!("Error get_word");
            String::new()
        }
    }
}
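Instead of printing and returning an empty String, errors can also be propagated. A sketch, assuming you are free to change the return type (read_lines and line_at are hypothetical helpers): collecting io::Result<String> items into io::Result<Vec<String>> stops at the first I/O error.

```rust
use std::io::{self, BufRead, Cursor};

/// Reads every line, stopping at the first I/O error.
fn read_lines<R: BufRead>(reader: R) -> io::Result<Vec<String>> {
    // A list of results becomes a result holding a list.
    reader.lines().collect()
}

/// Moves the line at `y` out of the vector; None if `y` is out of range.
fn line_at(lines: Vec<String>, y: usize) -> Option<String> {
    lines.into_iter().nth(y)
}

fn main() -> io::Result<()> {
    // Cursor stands in for BufReader<File>; `y` would come from the RNG.
    let lines = read_lines(Cursor::new("red\ngreen\nblue\n"))?;
    let y = 2;
    println!("{:?}", line_at(lines, y)); // Some("blue")
    Ok(())
}
```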

How to remove characters from specific index in String?

I have an application where I am receiving a string with some repetitive characters. I am receiving input as a String. How to remove the characters from specific index?
main.rs
fn main() {
    let s: String = "{\"name\":\"xx/yyyy/machine/zzz/test_int4\",\"status\":\"online\",\"timestamp\":\"2021-06-11 18:20:42.231770800 UTC\",\"value\":7}8668982856274}".to_string();
    println!("{}", s);
}
how can I get result
"{\"name\":\"xx/yyyy/machine/zzz/test_int4\",\"status\":\"online\",\"timestamp\":\"2021-06-11 18:20:42.231770800 UTC\",\"value\":7}"
instead of
"{\"name\":\"xx/yyyy/machine/zzz/test_int4\",\"status\":\"online\",\"timestamp\":\"2021-06-11 18:20:42.231770800 UTC\",\"value\":7}}8668982856274}"
String indexing works only with bytes, thus you need to find an index for the appropriate byte slice like this:
let s = "{\"name\":\"xx/yyyy/machine/zzz/test_int4\",\"status\":\"online\",\"timestamp\":\"2021-06-11 18:20:42.231770800 UTC\",\"value\":7}8668982856274}";
// Byte index just past the first '}' (or the whole string if there is none).
let closing_bracket_idx = s
    .as_bytes()
    .iter()
    .position(|&x| x == b'}')
    .map(|i| i + 1)
    .unwrap_or_else(|| s.len());
let v: serde_json::Value = serde_json::from_str(&s[..closing_bracket_idx]).unwrap();
println!("{:?}", v);
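If the goal is simply to remove characters from a String in place, String::truncate and String::replace_range do that by byte index. A sketch with two hypothetical helpers; like the answer above, cut_after_first_brace assumes the first } ends the object:

```rust
/// Removes everything after the first '}' (keeps the brace itself).
fn cut_after_first_brace(s: &mut String) {
    if let Some(idx) = s.find('}') {
        // find returns a byte index; '}' is one byte, so idx + 1 is a char boundary.
        s.truncate(idx + 1);
    }
}

/// Removes the byte range start..end; panics if either end is not a char boundary.
fn remove_range(s: &mut String, start: usize, end: usize) {
    s.replace_range(start..end, "");
}

fn main() {
    let mut s = String::from("{\"value\":7}8668982856274}");
    cut_after_first_brace(&mut s);
    println!("{}", s); // {"value":7}
    let mut t = String::from("hello, world");
    remove_range(&mut t, 5, 7); // removes ", "
    println!("{}", t); // helloworld
}
```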
However, keep in mind that this approach doesn't work in general for more complex cases, for example a } inside a JSON string value, nested objects, or a type other than an object at the top level (e.g. [1, {"a": 3}, 4]). A neater way is to use the parser's own ability to ignore trailing data, for example with serde_json:
let v = serde_json::Deserializer::from_str(s)
    .into_iter::<serde_json::Value>()
    .next()
    .expect("empty input")
    .expect("invalid json value");
println!("{:?}", v);

How to find a string of multiple occurrences in a vector?

I have a vector of strings, and I want to find a string that has the number of occurrences more than one. I've tried this but didn't work.
let strings = vec!["Rust", "Rest", "Rust"]; // I want to find "Rust" in this case
let val = strings
.into_iter()
.find(|x| o.into_iter().filter(|y| x == y).count() >= 2)
// sorry o ^ here is supposed to be strings
.unwrap();
There are two issues in your code:
o doesn't exist. I assume you meant to use strings instead.
into_iter takes ownership of the value, so once you have called into_iter on strings (or o), you can't call it again. You should use plain iter instead.
Here's a fixed version:
let strings = vec!["Rust", "Rest", "Rust"]; // I want to find "Rust" in this case
let val = strings
    .iter()
    .find(|x| strings.iter().filter(|y| x == y).count() >= 2)
    .unwrap();
Note however that this is pretty slow. Depending on your requirements, there are more efficient alternatives:
Sort the strings array first. Then you only need to look at the next item to see if it is duplicated instead of needing to go through the whole array over and over. Advantage: no extra memory used. Drawback: you lose the original order.
Use an auxiliary variable to store the values you've already seen and/or count the number of occurrences of each string. This may be a HashSet, BTreeSet, HashMap or BTreeMap. See Netwave's answer. Advantage: doesn't change the input array. Drawback: uses memory to keep track of the duplicates.
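Sketches of both alternatives (the function names are illustrative). The sorting version sorts a copy so the caller's order survives; sorting strings in place with sort_unstable() would avoid that copy if the order does not matter.

```rust
use std::collections::HashSet;

/// Sorting approach: after sorting, equal strings are adjacent.
/// Works on a copy, so the caller's order is kept.
fn first_duplicate_sorted<'a>(strings: &[&'a str]) -> Option<&'a str> {
    let mut sorted = strings.to_vec();
    sorted.sort_unstable();
    sorted.windows(2).find(|w| w[0] == w[1]).map(|w| w[0])
}

/// HashSet approach: insert() returns false if the value was already present,
/// so this finds the first repeat in input order in a single pass.
fn first_duplicate_seen<'a>(strings: &[&'a str]) -> Option<&'a str> {
    let mut seen = HashSet::new();
    strings.iter().copied().find(|s| !seen.insert(*s))
}

fn main() {
    let strings = vec!["Rust", "Rest", "Rust"];
    println!("{:?}", first_duplicate_sorted(&strings)); // Some("Rust")
    println!("{:?}", first_duplicate_seen(&strings)); // Some("Rust")
}
```

The two can disagree when several strings repeat: the sorted version returns the smallest duplicate, the HashSet version the first repeat in input order.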
You can count the occurrences in O(n) expected time with a hash table (a tree map such as BTreeMap takes O(n log n)), like:
use std::collections::HashMap;

fn main() {
    let strings = vec!["Rust", "Rest", "Rust"];
    // Maps each string to the number of times it occurs.
    let mut sorted_data: HashMap<&str, u32> = HashMap::new();
    for &item in &strings {
        // entry() inserts 0 the first time a string is seen, then we add 1.
        *sorted_data.entry(item).or_insert(0) += 1;
    }
    println!("{:?}", sorted_data);
}
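Then just use the entry with the biggest count (the value, not the key), for example with reduce (the stabilized name of the former unstable fold_first):
let result = sorted_data.iter().reduce(|(k1, v1), (k2, v2)| if v2 > v1 { (k2, v2) } else { (k1, v1) }).unwrap();
or, more simply, with max_by_key:
let result = sorted_data.iter().max_by_key(|&(_, count)| count).unwrap();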
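Since the question asks for strings occurring more than once rather than only the most frequent one, the counts can also be filtered directly. A sketch (repeated is a hypothetical helper) using the entry API:

```rust
use std::collections::HashMap;

/// Returns every string that occurs at least twice, in sorted order.
fn repeated<'a>(strings: &[&'a str]) -> Vec<&'a str> {
    let mut counts: HashMap<&'a str, u32> = HashMap::new();
    for &s in strings {
        *counts.entry(s).or_insert(0) += 1;
    }
    let mut result: Vec<&'a str> = counts
        .into_iter()
        .filter(|&(_, n)| n >= 2)
        .map(|(s, _)| s)
        .collect();
    // HashMap iteration order is unspecified, so sort for a stable result.
    result.sort_unstable();
    result
}

fn main() {
    let strings = vec!["Rust", "Rest", "Rust"];
    println!("{:?}", repeated(&strings)); // ["Rust"]
}
```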

How do I get the first character out of a string?

I want to get the first character of a std::str. The method char_at() is currently unstable, as is String::slice_chars.
I have come up with the following, but it seems excessive to get a single character and not use the rest of the vector:
let text = "hello world!";
let char_vec: Vec<char> = text.chars().collect();
let ch = char_vec[0];
UTF-8 does not define what "character" is so it depends on what you want. In this case, chars are Unicode scalar values, and so the first char of a &str is going to be between one and four bytes.
If you want just the first char, then don't collect into a Vec<char>, just use the iterator:
let text = "hello world!";
let ch = text.chars().next().unwrap();
Alternatively, you can use the iterator's nth method:
let ch = text.chars().nth(0).unwrap();
Bear in mind that elements preceding the index passed to nth will be consumed from the iterator.
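A small sketch of that consumption (nth_then_next is a hypothetical helper): after nth(n), the iterator resumes at index n + 1.

```rust
/// Calls nth(n) and then next() on the same Chars iterator.
fn nth_then_next(text: &str, n: usize) -> (Option<char>, Option<char>) {
    let mut chars = text.chars();
    let nth = chars.nth(n); // consumes characters 0..=n
    let after = chars.next(); // resumes at character n + 1
    (nth, after)
}

fn main() {
    println!("{:?}", nth_then_next("hello world!", 2)); // (Some('l'), Some('l'))
}
```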
I wrote a function that returns the head of a &str and the rest:
fn car_cdr(s: &str) -> (&str, &str) {
    for i in 1..5 {
        let r = s.get(0..i);
        match r {
            Some(x) => return (x, &s[i..]),
            None => (),
        }
    }
    (&s[0..0], s)
}
Use it like this:
let (first_char, remainder) = car_cdr("test");
println!("first char: {}\nremainder: {}", first_char, remainder);
The output looks like:
first char: t
remainder: est
It works fine with chars that are more than 1 byte.
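A shorter variant of the same idea, sketched here as a hypothetical first_and_rest helper, lets Chars do the UTF-8 work: after next(), Chars::as_str returns the part of the string not yet consumed. Unlike car_cdr it returns the head as a char and gives None for an empty string.

```rust
/// Returns the first char and the rest of the string, or None if `s` is empty.
fn first_and_rest(s: &str) -> Option<(char, &str)> {
    let mut chars = s.chars();
    let first = chars.next()?; // decodes one UTF-8 character (1 to 4 bytes)
    Some((first, chars.as_str())) // the not-yet-consumed remainder
}

fn main() {
    println!("{:?}", first_and_rest("test")); // Some(('t', "est"))
    println!("{:?}", first_and_rest("été")); // Some(('é', "té"))
}
```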
Get the first single character out of a string without using the rest of that string:
let text = "hello world!";
let ch = text.chars().take(1).last().unwrap();
It would be nice to have something similar to Haskell's head function and tail function for such cases.
I wrote this function to act like head and tail together (doesn't match exact implementation)
pub fn head_tail<T: Iterator, O: FromIterator<<T>::Item>>(iter: &mut T) -> (Option<<T>::Item>, O) {
    (iter.next(), iter.collect::<O>())
}
Usage:
// works with Vec<i32>
let mut val = vec![1, 2, 3].into_iter();
println!("{:?}", head_tail::<_, Vec<i32>>(&mut val));
// works with chars in two ways
let mut val = "thanks! bedroom builds YT".chars();
println!("{:?}", head_tail::<_, String>(&mut val));
// calling the function with Vec<char>
let mut val = "thanks! bedroom builds YT".chars();
println!("{:?}", head_tail::<_, Vec<char>>(&mut val));
NOTE: The head_tail function doesn't panic if the iterator is empty; it returns None for the head and an empty collection for the tail. If it matched Haskell's head/tail, it would fail on an empty iterator (Haskell throws an exception). It might also be good to accept IntoIterator instead of Iterator, to be compatible with more types.
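Following the IntoIterator suggestion, a sketch of such a variant (named head_tail_into here to keep it apart from head_tail above) could look like this:

```rust
use std::iter::FromIterator; // already in the prelude on edition 2021

/// Splits anything iterable into its first item and the collected rest.
/// Returns (None, empty collection) instead of panicking on empty input.
fn head_tail_into<I, O>(items: I) -> (Option<I::Item>, O)
where
    I: IntoIterator,
    O: FromIterator<I::Item>,
{
    let mut iter = items.into_iter();
    let head = iter.next();
    (head, iter.collect())
}

fn main() {
    let (head, tail): (Option<i32>, Vec<i32>) = head_tail_into(vec![1, 2, 3]);
    println!("{:?} {:?}", head, tail); // Some(1) [2, 3]
    let (c, rest): (Option<char>, String) = head_tail_into("hello".chars());
    println!("{:?} {}", c, rest); // Some('h') ello
}
```

Taking the input by value avoids the &mut parameter, so callers no longer need a mutable binding.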
If you only want to test for it, you can use starts_with():
"rust".starts_with('r')
"rust".starts_with(|c| c == 'r')
I think it is pretty straightforward:
let text = "hello world!";
let c: char = text.chars().next().unwrap();
next() takes the next item from the iterator
To “unwrap” something in Rust is to say, “Give me the result of the computation, and if there was an error, panic and stop the program.”
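The accepted answer is a bit ugly!
let text = "hello world!";
let ch = &text[0..1]; // this returns "h" as a &str, not a char
Note that this slices bytes, so it panics if the first character is longer than one byte (for example "é").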
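Because &text[0..1] indexes bytes, a version that also copes with multi-byte first characters can slice up to char::len_utf8 of the first char. A sketch (first_char_str is a hypothetical helper); str::get is also shown, which returns None rather than panicking when the range splits a character:

```rust
/// Returns the first character as a &str ("" for an empty string),
/// slicing up to that character's UTF-8 length instead of a fixed byte count.
fn first_char_str(text: &str) -> &str {
    let end = text.chars().next().map_or(0, char::len_utf8);
    &text[..end]
}

fn main() {
    println!("{}", first_char_str("hello world!")); // h
    println!("{}", first_char_str("été")); // é
    // get() returns None when the range does not end on a char boundary.
    println!("{:?}", "été".get(0..1)); // None
}
```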

How do I split a string in Rust?

From the documentation, it's not clear. In Java you could use the split method like so:
"some string 123 ffd".split("123");
Use split()
let split = "some string 123 ffd".split("123");
This gives an iterator, which you can either loop over or collect() into a vector; both consume it, so pick one:
for s in split {
    println!("{}", s)
}
let vec = split.collect::<Vec<&str>>();
// OR
let vec: Vec<&str> = split.collect();
There are four simple ways:
By separator:
s.split("separator") | s.split('/') | s.split(char::is_numeric)
By whitespace:
s.split_whitespace()
By newlines:
s.lines()
By regex: (using regex crate)
Regex::new(r"\s").unwrap().split("one two three")
The result of each kind is an iterator:
let text = "foo\r\nbar\n\nbaz\n";
let mut lines = text.lines();
assert_eq!(Some("foo"), lines.next());
assert_eq!(Some("bar"), lines.next());
assert_eq!(Some(""), lines.next());
assert_eq!(Some("baz"), lines.next());
assert_eq!(None, lines.next());
There is a split method on str, which a String can also use through deref:
fn split<'a, P>(&'a self, pat: P) -> Split<'a, P> where P: Pattern<'a>
Split by char:
let v: Vec<&str> = "Mary had a little lamb".split(' ').collect();
assert_eq!(v, ["Mary", "had", "a", "little", "lamb"]);
Split by string:
let v: Vec<&str> = "lion::tiger::leopard".split("::").collect();
assert_eq!(v, ["lion", "tiger", "leopard"]);
Split by closure:
let v: Vec<&str> = "abc1def2ghi".split(|c: char| c.is_numeric()).collect();
assert_eq!(v, ["abc", "def", "ghi"]);
split returns an Iterator, which you can convert into a Vec using collect: split_line.collect::<Vec<_>>(). Going through an iterator instead of returning a Vec directly has several advantages:
split is lazy. This means that it won't really split the line until you need it. That way it won't waste time splitting the whole string if you only need the first few values: split_line.take(2).collect::<Vec<_>>(), or even if you need only the first value that can be converted to an integer: split_line.filter_map(|x| x.parse::<i32>().ok()).next(). If the line starts with "1" followed by "23.0", this last example stops as soon as it finds "1" and never attempts to process "23.0".
split makes no assumption on the way you want to store the result. You can use a Vec, but you can also use anything that implements FromIterator<&str>, for example a LinkedList or a VecDeque, or any custom type that implements FromIterator<&str>.
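A sketch that makes the laziness visible (first_int and to_deque are hypothetical helpers): inspect counts how many pieces split actually produced before filter_map(...).next() stopped, and to_deque collects into a VecDeque instead of a Vec.

```rust
use std::collections::VecDeque;

/// Returns the first piece that parses as an i32, plus how many pieces
/// split(' ') actually produced before the search stopped.
fn first_int(line: &str) -> (Option<i32>, usize) {
    let mut examined = 0;
    let value = line
        .split(' ')
        .inspect(|_| examined += 1) // runs once per piece that is produced
        .filter_map(|x| x.parse::<i32>().ok())
        .next();
    (value, examined)
}

/// Collects the pieces into a VecDeque instead of a Vec.
fn to_deque(line: &str) -> VecDeque<&str> {
    line.split(' ').collect()
}

fn main() {
    println!("{:?}", first_int("1 23.0 foo 4")); // (Some(1), 1)
    println!("{:?}", to_deque("a b c"));
}
```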
There's also split_whitespace()
fn main() {
    let words: Vec<&str> = " foo bar\t\nbaz ".split_whitespace().collect();
    println!("{:?}", words);
    // ["foo", "bar", "baz"]
}
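One difference worth knowing, sketched below with a hypothetical compare_splits helper: split(' ') yields an empty piece for every leading, trailing, or repeated space, while split_whitespace skips them and also splits on tabs and newlines.

```rust
/// Returns the pieces from split(' ') and from split_whitespace() side by side.
fn compare_splits(s: &str) -> (Vec<&str>, Vec<&str>) {
    // split(' ') keeps an empty piece around every extra space.
    let by_space: Vec<&str> = s.split(' ').collect();
    // split_whitespace drops empty pieces and treats any Unicode whitespace as a separator.
    let by_whitespace: Vec<&str> = s.split_whitespace().collect();
    (by_space, by_whitespace)
}

fn main() {
    let (a, b) = compare_splits(" foo  bar ");
    println!("{:?}", a); // ["", "foo", "", "bar", ""]
    println!("{:?}", b); // ["foo", "bar"]
}
```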
The OP's question was how to split with a multi-character string, and here is a way to get the two parts, part1 and part2, as Strings instead of in a vector.
Here it is split with the non-ASCII character string "☄☃🤔" in place of "123":
let s = "☄☃🤔"; // also works with non-ASCII characters
let mut part1 = "some string ☄☃🤔 ffd".to_string();
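let _t;
let part2;
if let Some(idx) = part1.find(s) {
    // Everything after the separator goes to part2...
    part2 = part1.split_off(idx + s.len());
    // ...and the separator itself is cut off part1 and discarded.
    _t = part1.split_off(idx);
} else {
    part2 = "".to_string();
}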
This gives: part1 = "some string " and part2 = " ffd".
If "☄☃🤔" is not found, part1 contains the untouched original String and part2 is empty.
Here is a nice example from Rosetta Code (the task "Split a character string based on change of character") of a short solution using split_off:
fn main() {
    let mut part1 = "gHHH5YY++///\\".to_string();
    if let Some(mut last) = part1.chars().next() {
        let mut pos = 0;
        while let Some(c) = part1.chars().find(|&c| {
            if c != last {
                true
            } else {
                pos += c.len_utf8();
                false
            }
        }) {
            let part2 = part1.split_off(pos);
            print!("{}, ", part1);
            part1 = part2;
            last = c;
            pos = 0;
        }
    }
    println!("{}", part1);
}
The task it solves:
Split a (character) string into comma (plus a blank) delimited strings based on a change of character (left to right).
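For comparison, a sketch of the same task that borrows slices instead of allocating new Strings with split_off (runs is a hypothetical helper): char_indices yields byte offsets, so each run can be sliced straight out of the input.

```rust
/// Splits `s` into runs of identical consecutive characters, as borrowed slices.
fn runs(s: &str) -> Vec<&str> {
    let mut result = Vec::new();
    let mut start = 0; // byte index where the current run began
    let mut prev: Option<char> = None;
    for (i, c) in s.char_indices() {
        if let Some(p) = prev {
            if p != c {
                // The character changed: close the run that ends at byte i.
                result.push(&s[start..i]);
                start = i;
            }
        }
        prev = Some(c);
    }
    if !s.is_empty() {
        result.push(&s[start..]); // the final run
    }
    result
}

fn main() {
    println!("{}", runs("gHHH5YY++///\\").join(", ")); // g, HHH, 5, YY, ++, ///, \
}
```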
If you are looking for the Python-flavoured split where you tuple-unpack the two ends of the split string, you can do
if let Some((a, b)) = line.split_once(' ') {
    // ...
}
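split_once (stable since Rust 1.52) also accepts a multi-character &str separator, so the earlier "☄☃🤔" example can be written without split_off. A sketch returning owned Strings (split_in_two is a hypothetical helper); rsplit_once splits at the last occurrence instead:

```rust
/// Splits `s` at the first occurrence of `sep` into two owned Strings,
/// or returns None if `sep` does not occur.
fn split_in_two(s: &str, sep: &str) -> Option<(String, String)> {
    s.split_once(sep)
        .map(|(before, after)| (before.to_string(), after.to_string()))
}

fn main() {
    let parts = split_in_two("some string ☄☃🤔 ffd", "☄☃🤔");
    println!("{:?}", parts); // Some(("some string ", " ffd"))
    // rsplit_once splits at the last occurrence instead.
    println!("{:?}", "lion::tiger::leopard".rsplit_once("::")); // Some(("lion::tiger", "leopard"))
}
```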
