Detect duplicated elements of a string slice happening in order

I need to detect and list the characters of a string slice that repeat consecutively N or more times. I have already written a solution in Rust that doesn't use higher-order functions (below), but I wonder whether it can be simplified by chaining iterator methods.
The idea:
let v = "1122253225";
let n = 2;
Output:
There are 2 repetition of '1'
There are 3 repetition of '2'
There are 2 repetition of '2'
The indexes where the repetitions happen are not important. Repetitions must be consecutive (i.e. a run of three '2's that is separated by other values from a run of two '2's counts as a separate output line).
My non-iterator solution:
let mut cur_ch = '\0';
let mut repeat = 0;
for ch in v.chars() {
    if ch == cur_ch {
        repeat += 1;
    } else {
        if repeat >= n {
            println!("There are {} repetition of '{}'", repeat, cur_ch);
        }
        cur_ch = ch;
        repeat = 1;
    }
}
// Report the final run, which the loop itself never closes.
if repeat >= n {
    println!("There are {} repetition of '{}'", repeat, cur_ch);
}
It works, but is there a better way to do so with chaining iter methods?

Here is a solution that uses scan and filter_map:
fn main() {
    let s = "112225322555";
    let n = 2;
    let i = s
        .chars()
        .map(|v| Some(v))
        .chain(std::iter::once(None))
        .scan((0, None), |(count, ch), v| match ch {
            Some(c) if *c == v => {
                *count += 1;
                Some((None, *count))
            }
            _ => Some((ch.replace(v), std::mem::replace(count, 1))),
        })
        .filter_map(|(ch, count)| match ch {
            Some(Some(ch)) if count >= n => Some((ch, count)),
            _ => None,
        });
    for (ch, num) in i {
        println!("There are {} repetitions of {}", num, ch);
    }
}
The first step is to use scan to count the number of adjacent characters. The first argument to scan is a state variable, which gets passed to each call of the closure that you pass as the second argument. In this case the state variable is a tuple containing the number of times the current character has been seen and the current character itself.
Note:
We need to extend the iteration one beyond the end of the string we are analyzing (otherwise we would miss the case where the end of the string contained a run of characters meeting the criteria). We do this by mapping the iteration into Option<char> and then chaining on a single None. This is better than special-casing a character such as \0, because it means we can even count \0 characters.
For the same reason, we use Option<char> as the current character within the state tuple.
The return value of scan is an iterator over (Option<Option<char>>, i32). The first value in the tuple will be None for each repeated character in the iterator, whereas at each boundary where the character changes it will be Some(Some(char)).
We use replace to return the current character and count while, at the same time, setting the state tuple to new values.
The last step is to use filter_map to:
remove the (None, i32) variants, which indicate no change in the incoming character;
filter out the cases where the count does not reach the limit n.
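To make the intermediate values concrete, here is a small sketch that collects what the scan step yields before filter_map is applied. The helper name scan_runs is made up for illustration, and the state types are spelled out explicitly; the closure body is the same as in the answer above.

```rust
use std::iter;

// Hypothetical helper: returns the raw items produced by the `scan` step.
fn scan_runs(s: &str) -> Vec<(Option<Option<char>>, i32)> {
    s.chars()
        .map(Some)
        .chain(iter::once(None))
        .scan((0_i32, None::<Option<char>>), |(count, ch), v| match ch {
            // Same character as before: bump the count and emit a
            // "no change" marker (None in the first slot).
            Some(c) if *c == v => {
                *count += 1;
                Some((None, *count))
            }
            // Character changed: emit the previous character with its count,
            // then reset the state for the new character.
            _ => Some((ch.replace(v), std::mem::replace(count, 1))),
        })
        .collect()
}

fn main() {
    for item in scan_runs("aab") {
        println!("{:?}", item);
    }
}
```

Only the items whose first slot is Some(Some(_)) describe a finished run, which is why the filter_map step discards everything else.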

Here's one attempt at using filter_map():
fn foo(v: &str, n: usize) -> impl Iterator<Item = (usize, char)> + '_ {
    let mut cur_ch = '\0';
    let mut repeat = 0;
    v.chars().chain(std::iter::once('\0')).filter_map(move |ch| {
        if ch == cur_ch {
            repeat += 1;
            return None;
        }
        let val = if repeat >= n {
            Some((repeat, cur_ch))
        } else {
            None
        };
        cur_ch = ch;
        repeat = 1;
        val
    })
}
fn main() {
    for (repeat, ch) in foo("1122253225", 2) {
        println!("There are {} repetition of '{}'", repeat, ch);
    }
}
And then you can generalize this to something like this:
fn foo<'i, I, T>(v: I, n: usize) -> impl Iterator<Item = (usize, T)> + 'i
where
    I: Iterator<Item = T> + 'i,
    T: Clone + Default + PartialEq + 'i,
{
    let mut cur = T::default();
    let mut repeat = 0;
    v.chain(std::iter::once(T::default()))
        .filter_map(move |i| {
            if i == cur {
                repeat += 1;
                return None;
            }
            let val = if repeat >= n {
                Some((repeat, cur.clone()))
            } else {
                None
            };
            cur = i;
            repeat = 1;
            val
        })
}
This would be higher-order, but I'm not sure it's actually much simpler than just using a for loop!
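For comparison, here is a rough sketch of another iterator-chaining variant that avoids the sentinel value altogether by folding the characters into a list of runs and filtering afterwards. The helper name runs is hypothetical, and unlike the lazy versions above this one allocates a Vec for all runs up front.

```rust
// Hypothetical helper: run-length encode a string into (char, count) pairs.
fn runs(s: &str) -> Vec<(char, usize)> {
    s.chars().fold(Vec::<(char, usize)>::new(), |mut acc, ch| {
        match acc.last_mut() {
            // Same character as the last run: extend that run.
            Some((c, n)) if *c == ch => *n += 1,
            // Otherwise start a new run of length 1.
            _ => acc.push((ch, 1)),
        }
        acc
    })
}

fn main() {
    let n = 2;
    for (ch, count) in runs("1122253225")
        .into_iter()
        .filter(|&(_, count)| count >= n)
    {
        println!("There are {} repetition of '{}'", count, ch);
    }
}
```

Because no sentinel is chained on, a trailing run is handled the same way as any other, and the input may contain '\0'.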

Related

Remove value if part of vector, and if so accumulate it to another variable

I currently do it this way:
// v is a vector with thousands of sorted unsigned int values.
let mut total = 0;
// [...]
// some loop
    let a = 5;
    if v.iter().any(|&x| x == a as u16) {
        total += a;
        v.retain(|&x| x != a as u16);
    }
// end loop
But this is quite inefficient, since I iterate over v twice (although perhaps the compiler would catch this and optimize it). Is there a more elegant way to do it in Rust?
NB: The vector is sorted and contains no duplicate values, if that helps.

If I understand your request correctly, here is a solution:
You say your vector is sorted, so you can use binary_search() to find the value.
Once you have its index, you can use remove().
fn foo(data: &mut Vec<u16>) -> u64 {
    let mut total: u64 = 0;
    let mut a = 0;
    while !data.is_empty() {
        if let Ok(i) = data.binary_search(&a) {
            total += data.remove(i) as u64;
        }
        a += 1;
    }
    total
}
fn main() {
    let mut data = vec![1, 3, 8, 9, 46];
    assert_eq!(foo(&mut data), 67);
}
This keeps the vector sorted while removing; note that this is a dummy example. If you don't care about the order you can use swap_remove(), but that rules out binary_search(), which needs sorted data.
It's hard to say which would be better.
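Applied to the single-value case from the question, the same idea could look roughly like this. The helper name take_if_present is made up for illustration, and it assumes the vector really is sorted, as stated in the question.

```rust
// Hypothetical helper: removes `a` from the sorted vector and returns it,
// or returns None if `a` is not present.
fn take_if_present(v: &mut Vec<u16>, a: u16) -> Option<u16> {
    // binary_search only gives meaningful results on sorted data.
    let idx = v.binary_search(&a).ok()?;
    // remove shifts the tail to the left, so the vector stays sorted.
    Some(v.remove(idx))
}

fn main() {
    let mut v: Vec<u16> = vec![1, 3, 8, 9, 46];
    let mut total: u64 = 0;
    for &a in &[5u16, 8, 46] {
        if let Some(x) = take_if_present(&mut v, a) {
            total += u64::from(x);
        }
    }
    println!("total = {}, remaining = {:?}", total, v);
}
```

The lookup is logarithmic, but remove still has to shift the elements after the index, so each removal remains linear in the worst case.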

How do I get the index from the beginning in a reversed iterator of chars of a string?

This code:
let s = String::from("hi");
for (idx, ch) in s.chars().rev().enumerate() {
    println!("{} {}", idx, ch);
}
prints
0 i
1 h
but I want to know the real index, so that it would print:
1 i
0 h
What's the best way to do that? Currently I only think of first getting .count() and subtracting each idx from it, but maybe there's a better method that I overlooked.

This is complicated, as they say. If your string is ASCII-only, you can do the obvious thing: enumerate the String's byte iterator and then reverse it:
fn main() {
    let s = String::from("hi");
    for (idx, ch) in s.bytes().enumerate().rev() {
        println!("{} {}", idx, ch as char);
    }
}
This doesn't work for Unicode strings in general because of what a char in Rust stands for:
The char type represents a single character. More specifically, since 'character' isn't a well-defined concept in Unicode, char is a 'Unicode scalar value', which is similar to, but not the same as, a 'Unicode code point'.
This can be illustrated by the following:
fn main() {
    let s = String::from("y̆");
    println!("{}", s.len());
    for (idx, ch) in s.bytes().enumerate() {
        println!("{} {}", idx, ch);
    }
    for (idx, ch) in s.chars().enumerate() {
        println!("{} {}", idx, ch);
    }
}
This odd-looking string has a length of 3, as in 3 u8s, but it contains 2 chars. So ExactSizeIterator can't be trivially implemented for std::str::Chars, but it can be, and is, implemented for std::str::Bytes. This is significant because, to reverse a given iterator, it has to be a DoubleEndedIterator:
fn rev(self) -> Rev<Self>
where
    Self: DoubleEndedIterator,
But DoubleEndedIterator is only available for the Enumerate iterator if the underlying iterator is also an ExactSizeIterator:
impl<I> DoubleEndedIterator for Enumerate<I>
where
    I: ExactSizeIterator + DoubleEndedIterator,
In conclusion, you can only do s.bytes().enumerate().rev(), but not s.chars().enumerate().rev(). If you absolutely have to index the enumerated char iterator of a String that way, you are on your own.
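If char indices (rather than byte offsets) are really what is needed, one possible workaround is the count-and-subtract approach mentioned in the question. The sketch below is only one way to do it; the helper name rev_char_indices is invented here, and it walks the string once extra to count the chars.

```rust
// Hypothetical helper: pairs each char with its char index counted from the
// start of the string, yielding the pairs in reverse order.
fn rev_char_indices(s: &str) -> Vec<(usize, char)> {
    // Counting the chars walks the whole string once.
    let len = s.chars().count();
    s.chars()
        .rev()
        .enumerate()
        // `i` counts from the back, so flip it around.
        .map(|(i, ch)| (len - 1 - i, ch))
        .collect()
}

fn main() {
    for (idx, ch) in rev_char_indices("hi") {
        println!("{} {}", idx, ch);
    }
    // If byte offsets are acceptable, char_indices can be reversed directly.
    for (byte_idx, ch) in "abc 好 def".char_indices().rev() {
        println!("{} {}", byte_idx, ch);
    }
}
```

Note that the second loop reports byte offsets into the string, which differ from char indices as soon as the string contains non-ASCII characters.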

Is it possible to concatenate iterators?

// An iterator whose next() always yields "don't satisfy condition 1".
let vec = iter::repeat("don't satisfy condition 1")
    .take_while(|_| {
        satisfycondition1.satisfy() // true if condition 1 is satisfied, else false
    })
    .collect();
This code creates a vector of n elements with n equal to the number of times condition 1 is not respected.
I would like now to create a vector of n + m elements with n equal to the number of times that condition 1 is not respected and m the number of times that condition 2 is not respected.
The code should look something like this:
let vec = iter::repeat("don't satisfy condition 1")
    .take_while(|_| {
        satisfycondition1.satisfy()
    })
    .union(
        iter::repeat("has satisfied condition 1 but not 2 yet")
            .take_while(|_| {
                satisfycondition2.satisfy()
            })
    )
    .collect();
I know I could create two vectors and then concatenate them but it's less efficient.
You can use this code to understand what repeat does:
use std::iter;
fn main() {
    let mut c = 0;
    let z: Vec<_> = iter::repeat("don't satisfy condition 1")
        .take_while(|_| {
            c = c + 1;
            let rep = if c < 5 { true } else { false };
            rep
        })
        .collect();
    println!("------{:?}", z);
}

It seems like Iterator::chain is what you're looking for.
use std::iter;
fn main() {
    let mut c = 0;
    let mut d = 5;
    let z: Vec<_> = iter::repeat("don't satisfy condition 1")
        .take_while(|_| {
            c = c + 1;
            let rep = if c < 5 { true } else { false };
            rep
            // this block can be simplified to
            // c += 1;
            // c < 5
            // Clippy warns about this
        })
        .chain(
            iter::repeat("satisfy condition 1 but not 2").take_while(|_| {
                d -= 1;
                d > 2
            }),
        )
        .collect();
    println!("------{:?}", z);
}
I can't comment on the semantics of your code, though. If you're trying to see which elements of an iterator "satisfy condition 1 but not 2", this isn't how you would do it. You would use Iterator::filter twice (once with condition 1 and once with the negation of condition 2) to achieve that.
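A rough sketch of that two-filter approach, assuming that is what is wanted. The predicates cond1 and cond2 and the helper cond1_not_cond2 are placeholders invented for this example; they stand in for whatever the real conditions are.

```rust
// Placeholder predicates, invented for this example only.
fn cond1(x: &u32) -> bool {
    x % 2 == 0 // "condition 1": the value is even
}

fn cond2(x: &u32) -> bool {
    x % 3 == 0 // "condition 2": the value is divisible by 3
}

// Keeps the elements that satisfy condition 1 but not condition 2.
fn cond1_not_cond2(items: impl Iterator<Item = u32>) -> Vec<u32> {
    items.filter(cond1).filter(|x| !cond2(x)).collect()
}

fn main() {
    println!("{:?}", cond1_not_cond2(1..=12));
}
```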

How to convert a Rust char to an integer so that '1' becomes 1?

I am trying to find the sum of the digits of a given number. For example, 134 will give 8.
My plan is to convert the number into a string using .to_string() and then use .chars() to iterate over the digits as characters. Then I want to convert every char in the iteration into an integer and add it to a variable. I want to get the final value of this variable.
I tried using the code below to convert a char into an integer:
fn main() {
    let x = "123";
    for y in x.chars() {
        let z = y.parse::<i32>().unwrap();
        println!("{}", z + 1);
    }
}
But it results in this error:
error[E0599]: no method named `parse` found for type `char` in the current scope
 --> src/main.rs:4:19
  |
4 |         let z = y.parse::<i32>().unwrap();
  |                   ^^^^^
This code does exactly what I want to do, but first I have to convert each char into a string and then into an integer to then increment sum by z.
fn main() {
    let mut sum = 0;
    let x = 123;
    let x = x.to_string();
    for y in x.chars() {
        // converting `y` to string and then to integer
        let z = (y.to_string()).parse::<i32>().unwrap();
        // incrementing `sum` by `z`
        sum += z;
    }
    println!("{}", sum);
}

The method you need is char::to_digit. It converts a char to the number it represents in the given radix.
You can also use Iterator::sum to conveniently calculate the sum of a sequence:
fn main() {
    const RADIX: u32 = 10;
    let x = "134";
    println!("{}", x.chars().map(|c| c.to_digit(RADIX).unwrap()).sum::<u32>());
}

my_char as u32 - '0' as u32
Now, there's a lot more to unpack about this answer.
It works because the ASCII (and thus UTF-8) encodings have the Arabic numerals 0-9 ordered in ascending order. You can get the scalar values and subtract them.
However, what should it do for values outside this range? What happens if you provide 'p'? It returns 64. What about '.'? This will panic in a debug build, because the unsigned subtraction underflows. And '♥' will return 9781.
Strings are not just bags of bytes. They are UTF-8 encoded and you cannot just ignore that fact. Every char can hold any Unicode scalar value.
That's why strings are the wrong abstraction for the problem.
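If the input does have to come from a string, one way to make those out-of-range cases explicit is char::to_digit, which returns None for anything that is not a digit in the given radix. The sketch below is only an illustration; the helper name digit_sum is made up, and it treats any non-digit character as an error for the whole input.

```rust
// Hypothetical helper: sums the decimal digits in `s`, or returns None if any
// character is not a decimal digit.
fn digit_sum(s: &str) -> Option<u32> {
    // Summing an iterator of Option<u32> into Option<u32> stops at the
    // first None and returns None for the whole sum.
    s.chars().map(|c| c.to_digit(10)).sum()
}

fn main() {
    println!("{:?}", digit_sum("134"));
    println!("{:?}", digit_sum("1p4"));
    println!("{:?}", digit_sum("♥"));
}
```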
Allocating a string just to sum digits also seems inefficient. Rosetta Code has an example of using an iterator which only does numeric operations:
struct DigitIter(usize, usize);
impl Iterator for DigitIter {
    type Item = usize;
    fn next(&mut self) -> Option<Self::Item> {
        if self.0 == 0 {
            None
        } else {
            let ret = self.0 % self.1;
            self.0 /= self.1;
            Some(ret)
        }
    }
}
fn main() {
    println!("{}", DigitIter(1234, 10).sum::<usize>());
}

If c is your character you can just write:
c as i32 - 0x30;
Test with:
let c:char = '2';
let n:i32 = c as i32 - 0x30;
println!("{}", n);
output:
2
NB: 0x30 is '0' in ASCII table, easy enough to remember!

Another way is to iterate over the characters of your string and convert and add them using fold.
fn sum_of_string(s: &str) -> u32 {
    s.chars().fold(0, |acc, c| c.to_digit(10).unwrap_or(0) + acc)
}
fn main() {
    let x = "123";
    println!("{}", sum_of_string(x));
}

Find next char boundary index in string after char

Given the string s, and the index i which is where the 好 character starts:
let s = "abc 好 def";
let i = 4;
What's the best way to get the index after that character, so that I can slice the string and get abc 好? In code:
let end = find_end(s, i);
assert_eq!("abc 好", &s[0..end]);
(Note, + 1 doesn't work because that assumes that the character is only 1 byte long.)
I currently have the following:
fn find_end(s: &str, i: usize) -> usize {
    i + s[i..].chars().next().unwrap().len_utf8()
}
But I'm wondering if I'm missing something and there's a better way?

You could use char_indices to get the next index rather than using len_utf8 on the character, though that has a special case for the last character.
I would use the handy str::is_char_boundary() method. Here's an implementation using that:
fn find_end(s: &str, i: usize) -> usize {
    assert!(i < s.len());
    let mut end = i + 1;
    while !s.is_char_boundary(end) {
        end += 1;
    }
    end
}
Normally I would make such a function return Option<usize> in case it's called with an index at the end of s, but for now I've just asserted.
In many cases, instead of explicitly calling find_end it may make sense to iterate using char_indices, which gives you each index along with the characters; though it's slightly annoying if you want to know the end of the current character.
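One way to take the annoyance out of that is to compute the end of each character alongside its start. This is only a sketch; the helper name char_spans is invented for illustration.

```rust
// Hypothetical helper: yields (start, end, char) for every char in `s`,
// where start and end are byte offsets and end is exclusive.
fn char_spans(s: &str) -> impl Iterator<Item = (usize, usize, char)> + '_ {
    s.char_indices()
        .map(|(start, ch)| (start, start + ch.len_utf8(), ch))
}

fn main() {
    let s = "abc 好 def";
    for (start, end, ch) in char_spans(s) {
        // Slicing at `end` is always valid because it is a char boundary.
        println!("{}..{} {:?} -> {:?}", start, end, ch, &s[..end]);
    }
}
```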

To serve as a complement to @ChrisEmerson's answer, this is how one could implement a find_end that searches for the end of a character's first occurrence.
fn find_end<'s>(s: &'s str, p: char) -> Option<usize> {
    let mut indices = s.char_indices();
    let mut found = false;
    for (_, v) in &mut indices {
        if v == p {
            found = true;
            break;
        }
    }
    if found {
        Some(indices.next()
                    .map_or_else(|| s.len(), |(i, _)| i))
    } else {
        None
    }
}
Although it avoids the byte boundary loop, it is still not very elegant. Ideally, an iterator method for traversing until a predicate is met would simplify this.
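Iterator::find is one method that traverses until a predicate is met. Combined with len_utf8 from the question, a shorter variant might look like the following sketch; it is an alternative formulation, not the code from the answer above.

```rust
// Sketch of a variant: byte index just past the first occurrence of `p`,
// or None if `p` does not occur in `s`.
fn find_end(s: &str, p: char) -> Option<usize> {
    s.char_indices()
        // Stop at the first (index, char) pair whose char matches.
        .find(|&(_, c)| c == p)
        // The end of that char is its start plus its UTF-8 length.
        .map(|(i, c)| i + c.len_utf8())
}

fn main() {
    let s = "abc 好 def";
    if let Some(end) = find_end(s, '好') {
        println!("{}", &s[..end]);
    }
}
```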
