How can I write a Rust function to find different characters between two strings?

The order of the characters is not important, but the count is: aaabaaa equals 6a + b, and the function works like subtraction in math. For example:
fn diff(a: String, b: String) -> String {}
diff("aabbac", "accba") => "ab"
---------------------------------
"aabbac" = (3a+2b+c)
"accba" = (2a+b+2c)
(3a+2b+c) - (2a+b+2c) = a+b // -c is ignored

The usual technique is to create a function that counts the number of occurrences of each char, like collections.Counter in Python, and to compare these numbers for strings a and b.
The Rust standard library documentation contains a snippet that does the job. This is an adaptation that accepts any iterator:
use std::collections::HashMap;
use std::hash::Hash;
use std::iter::Iterator;
fn counter<T, I>(it: I) -> HashMap<T, usize>
where
T: Eq + Hash,
I: Iterator<Item = T>,
{
let mut count_by_element = HashMap::new();
for e in it {
*count_by_element.entry(e).or_insert(0) += 1;
}
count_by_element
}
Now that we know how to build a map char -> count, we just have to compare the counts of the string a and b:
use std::iter;
fn diff(a: &str, b: &str) -> String {
let mut v: Vec<char> = vec![];
let counter_a = counter(a.chars());
let counter_b = counter(b.chars());
for (c, n_a) in &counter_a {
let n_b = counter_b.get(c).unwrap_or(&0); // how many `c` in `b`?
if n_a > n_b {
v.extend(iter::repeat(c).take(n_a - n_b)); // add `n_a - n_b` `c`s
}
}
v.into_iter().collect::<String>() // build the String
}
If you want a "one shot" function, you can forget the counter function and use a more direct approach:
fn diff_one_shot(a: &str, b: &str) -> String {
// Signed counts: a char that is more frequent in `b` goes negative instead of underflowing.
let mut counter: HashMap<char, i32> = HashMap::new();
for c in a.chars() {
*counter.entry(c).or_insert(0) += 1; // one more
}
for c in b.chars() {
*counter.entry(c).or_insert(0) -= 1; // one less
}
counter
.iter()
.filter(|&(_c, &n)| n > 0) // only if more `c` in `a` than in `b`
.flat_map(|(&c, &n)| iter::repeat(c).take(n as usize)) // `n` times `c`
.collect::<String>()
}
Examples:
fn main() {
println!("{:?}", counter("aaabbc".chars()));
// e.g. {'b': 2, 'c': 1, 'a': 3} (HashMap iteration order is unspecified)
println!("{}", diff("aaabbc", "ab"));
// e.g. aabc (the order of the characters may vary)
println!("{}", diff_one_shot("aaabbc", "ab"));
// e.g. aacb
}
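Because HashMap iteration order is unspecified, the two functions above can return the same characters in different orders. If a stable result matters, one option is to count into a BTreeMap so the output comes out sorted by character. This is only a minimal sketch under that assumption; diff_sorted is a hypothetical name, not something from the answer above.

```rust
use std::collections::BTreeMap;

/// Multiset difference `a - b`, with the remaining characters in sorted order.
/// `diff_sorted` is a hypothetical helper name used for illustration only.
fn diff_sorted(a: &str, b: &str) -> String {
    // Signed counts, so characters that only occur in `b` simply go negative.
    let mut counts: BTreeMap<char, i64> = BTreeMap::new();
    for c in a.chars() {
        *counts.entry(c).or_insert(0) += 1;
    }
    for c in b.chars() {
        *counts.entry(c).or_insert(0) -= 1;
    }

    let mut out = String::new();
    // BTreeMap iterates in key order, which makes the result deterministic.
    for (c, n) in counts {
        for _ in 0..n.max(0) {
            out.push(c);
        }
    }
    out
}
```

With the input from the question, diff_sorted("aabbac", "accba") gives "ab": the surplus c in the second string is ignored.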

Related

How can I iterate over a sequence multiple times within a function?

I have a function that I would like to take an argument that can be looped over. However I would like to loop over it twice. I tried using the Iterator trait however I can only iterate over it once because it consumes the struct when iterating.
How should I make it so my function can loop twice? I know I could use values: Vec<usize> however I would like to make it generic over any object that is iterable.
Here's an example of what I would like to do: (Please ignore what the loops are actually doing. In my real code I can't condense the two loops into one.)
fn perform<'a, I>(values: I) -> usize
where
I: Iterator<Item = &'a usize>,
{
// Loop one: This works.
let sum = values.sum::<usize>();
// Loop two: This doesn't work due to `error[E0382]: use of moved value:
// `values``.
let max = values.max().unwrap();
sum * max
}
fn main() {
let v: Vec<usize> = vec![1, 2, 3, 4];
let result = perform(v.iter());
print!("Result: {}", result);
}

You can't iterate over the same iterator twice: methods such as sum and max take the iterator by value, and iterators are not guaranteed to be restartable or randomly accessible. For example, std::iter::from_fn produces an iterator that is most definitely not randomly accessible.
As @mousetail already mentioned, one way to get around this problem is to require an iterator that implements Clone:
fn perform<'a, I>(values: I) -> usize
where
I: Iterator<Item = &'a usize> + Clone,
{
// Loop one: consumes a clone, so `values` is still available afterwards.
let sum = values.clone().sum::<usize>();
// Loop two: consumes the original iterator.
let max = values.max().unwrap();
sum * max
}
fn main() {
let v: Vec<usize> = vec![1, 2, 3, 4];
let result = perform(v.iter());
println!("Result: {}", result);
}
Result: 40
Although in your specific example, I'd compute both sum and max in the same iteration:
fn perform<'a, I>(values: I) -> usize
where
I: Iterator<Item = &'a usize>,
{
let (sum, max) = values.fold((0, usize::MIN), |(sum, max), &el| {
(sum + el, usize::max(max, el))
});
sum * max
}
fn main() {
let v: Vec<usize> = vec![1, 2, 3, 4];
let result = perform(v.iter());
println!("Result: {}", result);
}
Result: 40
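If callers usually hold a collection rather than an iterator, another option is to accept anything that is IntoIterator + Clone. A shared reference such as &v satisfies both bounds, and cloning a reference is free. A minimal sketch: perform_twice is a hypothetical name, and returning an Option instead of unwrapping is an assumption of this sketch, not part of the question.

```rust
/// Iterates twice over anything that can be cheaply cloned into an iterator.
/// `perform_twice` is a hypothetical name; `None` signals an empty input.
fn perform_twice<'a, I>(values: I) -> Option<usize>
where
    I: IntoIterator<Item = &'a usize> + Clone,
{
    // First pass: iterate over a clone (for `&Vec<_>` this only copies a reference).
    let sum: usize = values.clone().into_iter().sum();
    // Second pass: consume the original value.
    let max = values.into_iter().max()?;
    Some(sum * *max)
}
```

Both perform_twice(&v) and perform_twice(v.iter()) work with this signature, since slice iterators are Clone as well.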

How can I convert from Vec<char> to u32 in Rust without going through String?

My Rust code runs in an environment where I have no access to std::string or anything else in std::* (but I do have access to core::str). How can I convert a Vec<char> to u32 without going through String, such as:
let num_in_chars: Vec<char> = vec!['1', '2'];
// some process here
// let num = ...
// This is how I could do it if I have access to `String`
// let num = num_in_chars.iter().collect::<String>().parse::<u32>().unwrap();
assert_eq!(12, num);
Thanks

Convert each char to a digit (the map step), then fold over the digits: multiply the running result by 10 and add the new digit:
/// Returns `None` in case of invalid digit.
pub fn vec_to_int(digits: impl IntoIterator<Item = char>) -> Option<u32> {
const RADIX: u32 = 10;
digits
.into_iter()
.map(|c| c.to_digit(RADIX))
.try_fold(0, |ans, i| i.map(|i| ans * RADIX + i))
}
#[test]
fn it_works() {
let nums = vec!['1', '2'];
let num = vec_to_int(nums);
assert_eq!(Some(12), num);
}
#[test]
fn invalid_digit() {
let nums = vec!['1', 'a'];
let num = vec_to_int(nums);
assert_eq!(None, num);
}
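One thing to keep in mind: ans * RADIX + i overflows for inputs longer than a u32 can hold, which panics in debug builds and wraps in release builds. A minimal sketch of a variant that returns None on overflow as well, using only core methods; digits_to_u32 is a hypothetical name.

```rust
/// Returns `None` on an invalid digit or if the value does not fit in a `u32`.
/// `digits_to_u32` is a hypothetical name used for illustration only.
pub fn digits_to_u32(digits: impl IntoIterator<Item = char>) -> Option<u32> {
    const RADIX: u32 = 10;
    digits.into_iter().try_fold(0u32, |acc, c| {
        // `to_digit` yields `None` for anything that is not a base-10 digit.
        let d = c.to_digit(RADIX)?;
        // The checked operations yield `None` instead of overflowing.
        acc.checked_mul(RADIX)?.checked_add(d)
    })
}
```

Note that, like the version above, an empty input yields Some(0); add an explicit check if that should be an error.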

Is there a method like JavaScript's substr in Rust?

I looked at the Rust docs for String but I can't find a way to extract a substring.
Is there a method like JavaScript's substr in Rust? If not, how would you implement it?
str.substr(start[, length])
The closest is probably slice_unchecked but it uses byte offsets instead of character indexes and is marked unsafe.

For characters, you can use s.chars().skip(pos).take(len):
fn main() {
let s = "Hello, world!";
let ss: String = s.chars().skip(7).take(5).collect();
println!("{}", ss);
}
Beware of the definition of Unicode characters though: chars() iterates over Unicode scalar values, not user-perceived characters (grapheme clusters).
For bytes, you can use the slice syntax:
fn main() {
let s = b"Hello, world!";
let ss = &s[7..12];
println!("{:?}", ss);
}

You can use the as_str method on the Chars iterator to get back a &str slice after you have advanced the iterator. So to skip the first start chars, you can call
let s = "Some text to slice into";
let mut iter = s.chars();
if start > 0 {
iter.nth(start - 1); // eat up `start` values (`nth(n)` consumes n + 1 items)
}
let slice = iter.as_str(); // get back a slice of the rest of the iterator
Now if you also want to limit the length, you first need to find the byte position of the character at index length:
let end_pos = slice.char_indices().nth(length).map(|(n, _)| n).unwrap_or(0);
let substr = &slice[..end_pos];
This might feel a little roundabout, but Rust is not hiding anything from you that might take up CPU cycles. That said, I wonder why there's no crate yet that offers a substr method.
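The two steps above can be folded into one helper. A minimal sketch, assuming the hypothetical name substr_chars and assuming that an out-of-range start or length should clamp to the end of the string instead of producing an empty slice:

```rust
/// Char-indexed substring that borrows from `s` and never panics.
/// `substr_chars` is a hypothetical name; out-of-range values clamp to the end.
fn substr_chars(s: &str, start: usize, length: usize) -> &str {
    // Byte offset of the char at index `start`, or the end of the string.
    let begin = s.char_indices().nth(start).map_or(s.len(), |(i, _)| i);
    let rest = &s[begin..];
    // Byte offset (within `rest`) of the char at index `length`, or the end.
    let end = rest.char_indices().nth(length).map_or(rest.len(), |(i, _)| i);
    &rest[..end]
}
```

Both offsets come from char_indices, so they always fall on char boundaries and the slicing cannot panic.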

This code performs both substring-ing and string-slicing, without panicking or allocating:
use std::ops::{Bound, RangeBounds};
trait StringUtils {
fn substring(&self, start: usize, len: usize) -> &str;
fn slice(&self, range: impl RangeBounds<usize>) -> &str;
}
impl StringUtils for str {
fn substring(&self, start: usize, len: usize) -> &str {
let mut char_pos = 0;
let mut byte_start = 0;
let mut it = self.chars();
loop {
if char_pos == start { break; }
if let Some(c) = it.next() {
char_pos += 1;
byte_start += c.len_utf8();
}
else { break; }
}
char_pos = 0;
let mut byte_end = byte_start;
loop {
if char_pos == len { break; }
if let Some(c) = it.next() {
char_pos += 1;
byte_end += c.len_utf8();
}
else { break; }
}
&self[byte_start..byte_end]
}
fn slice(&self, range: impl RangeBounds<usize>) -> &str {
let start = match range.start_bound() {
Bound::Included(bound) | Bound::Excluded(bound) => *bound,
Bound::Unbounded => 0,
};
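let end = match range.end_bound() {
Bound::Included(bound) => bound.saturating_add(1),
Bound::Excluded(bound) => *bound,
Bound::Unbounded => self.len(),
};
// saturating_sub avoids an underflow panic on reversed ranges such as 5..3
self.substring(start, end.saturating_sub(start))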
}
}
fn main() {
let s = "abcdèfghij";
// All three statements should print:
// "abcdè, abcdèfghij, dèfgh, dèfghij."
println!("{}, {}, {}, {}.",
s.substring(0, 5),
s.substring(0, 50),
s.substring(3, 5),
s.substring(3, 50));
println!("{}, {}, {}, {}.",
s.slice(..5),
s.slice(..50),
s.slice(3..8),
s.slice(3..));
println!("{}, {}, {}, {}.",
s.slice(..=4),
s.slice(..=49),
s.slice(3..=7),
s.slice(3..));
}

For my_string.substring(start, len)-like syntax, you can write a custom trait:
trait StringUtils {
fn substring(&self, start: usize, len: usize) -> Self;
}
impl StringUtils for String {
fn substring(&self, start: usize, len: usize) -> Self {
self.chars().skip(start).take(len).collect()
}
}
// Usage:
fn main() {
let phrase: String = "this is a string".to_string();
println!("{}", phrase.substring(5, 8)); // prints "is a str"
}

The solution given by oli_obk does not handle the last index of the string slice. This can be fixed with .chain(once(s.len())).
The function substr below implements a substring slice with error handling. If an invalid index is passed, the valid part of the string slice is returned in the Err variant. All corner cases should be handled correctly.
fn substr(s: &str, begin: usize, length: Option<usize>) -> Result<&str, &str> {
use std::iter::once;
let mut itr = s.char_indices().map(|(n, _)| n).chain(once(s.len()));
let beg = itr.nth(begin);
if beg.is_none() {
return Err("");
} else if length == Some(0) {
return Ok("");
}
let end = length.map_or(Some(s.len()), |l| itr.nth(l-1));
if let Some(end) = end {
return Ok(&s[beg.unwrap()..end]);
} else {
return Err(&s[beg.unwrap()..s.len()]);
}
}
let s = "abc🙂";
assert_eq!(Ok("bc"), substr(s, 1, Some(2)));
assert_eq!(Ok("c🙂"), substr(s, 2, Some(2)));
assert_eq!(Ok("c🙂"), substr(s, 2, None));
assert_eq!(Err("c🙂"), substr(s, 2, Some(99)));
assert_eq!(Ok(""), substr(s, 2, Some(0)));
assert_eq!(Err(""), substr(s, 5, Some(4)));
Note that this does not handle Unicode grapheme clusters. For example, "y̆es" contains 4 Unicode chars but 3 grapheme clusters. The unicode-segmentation crate solves this problem: grapheme clusters are handled correctly if the part
let mut itr = s.char_indices()...
is replaced with
use unicode_segmentation::UnicodeSegmentation;
let mut itr = s.grapheme_indices(true)...
Then the following also works:
assert_eq!(Ok("y̆"), substr("y̆es", 0, Some(1)));

Knowing the various range syntaxes for slicing may be useful to some readers. All indices below are byte offsets; slicing panics if an index is out of bounds or does not fall on a char boundary.
Reference to a part of a string:
&s[6..11]
If you start at index 0, you can omit the start value:
&s[0..1] is equivalent to &s[..1]
Likewise, if your substring runs to the last byte of the string, you can omit the end value:
&s[3..s.len()] is equivalent to &s[3..]
This also applies when the slice encompasses the entire string:
&s[..]
You can also use the inclusive range operator to include the last value:
&s[..=1]
Link to docs: https://doc.rust-lang.org/book/ch04-03-slices.html
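Since the range syntax works on byte offsets, indexing with an offset inside a multi-byte character panics. When the offsets come from untrusted input, str::get takes the same ranges and returns an Option instead. A minimal sketch; safe_slice is a hypothetical wrapper name.

```rust
/// Byte-offset slice that returns `None` instead of panicking.
/// `safe_slice` is a hypothetical name; it only forwards to `str::get`.
fn safe_slice(s: &str, start: usize, end: usize) -> Option<&str> {
    // `get` checks both the bounds and the char boundaries.
    s.get(start..end)
}
```

For "héllo", where é occupies bytes 1 and 2, safe_slice(s, 1, 3) returns Some("é") while safe_slice(s, 1, 2) returns None.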

I would suggest you use the crate substring. (And look at its source code if you want to learn how to do this properly.)

I couldn't find the exact substr implementation that I'm familiar with from other programming languages such as JavaScript and Dart.
Here is a possible implementation of a substr method for &str (for a String, call it on s.as_str()).
Let's define a trait, which makes it possible to add methods to built-in types (like extensions in Dart).
trait Substr {
fn substr(&self, start: usize, end: usize) -> String;
}
Then implement this trait for &str
impl<'a> Substr for &'a str {
fn substr(&self, start: usize, end: usize) -> String {
if start >= end {
return String::new();
}
self.chars().skip(start).take(end - start).collect()
}
}
Try:
fn main() {
let string = "Hello, world!";
let substring = string.substr(0, 4);
println!("{}", substring); // Hell
}

You can also use .to_string()[ <range> ], which slices a copy of the string (so it allocates).
This example takes an immutable slice of a copy of the original string, then mutates the original to demonstrate that the slice is unaffected.
let mut s: String = "Hello, world!".to_string();
let substring: &str = &s.to_string()[..6];
s.replace_range(..6, "Goodbye,");
println!("{} {} universe!", s, substring);
// Goodbye, world! Hello, universe!

I'm not very experienced in Rust but I gave it a try. If someone could correct my answer please don't hesitate.
fn substring(string:String, start:u32, end:u32) -> String {
let mut substr = String::new();
let mut i = start;
while i < end + 1 {
substr.push_str(&*(string.chars().nth(i as usize).unwrap().to_string()));
i += 1;
}
return substr;
}

How to implement trim for Vec<u8>?

Rust provides a trim method for strings: str.trim() removing leading and trailing whitespace. I want to have a method that does the same for bytestrings. It should take a Vec<u8> and remove leading and trailing whitespace (space, 0x20 and htab, 0x09).
Writing a trim_left() is easy: you can just use an iterator with skip_while():
fn main() {
let a: &[u8] = b" fo o ";
let b: Vec<u8> = a.iter().map(|x| x.clone()).skip_while(|x| x == &0x20 || x == &0x09).collect();
println!("{:?}", b);
}
But to trim on the right, I would need to look ahead to check that no other letter follows once whitespace has been found.

Here's an implementation that returns a slice, rather than a new Vec<u8>, as str::trim() does. It's also implemented on [u8], since that's more general than Vec<u8> (you can obtain a slice from a vector cheaply, but creating a vector from a slice is more costly, since it involves a heap allocation and a copy).
trait SliceExt {
fn trim(&self) -> &Self;
}
impl SliceExt for [u8] {
fn trim(&self) -> &[u8] {
fn is_whitespace(c: &u8) -> bool {
*c == b'\t' || *c == b' '
}
fn is_not_whitespace(c: &u8) -> bool {
!is_whitespace(c)
}
if let Some(first) = self.iter().position(is_not_whitespace) {
if let Some(last) = self.iter().rposition(is_not_whitespace) {
&self[first..last + 1]
} else {
unreachable!();
}
} else {
&[]
}
}
}
fn main() {
let a = b" fo o ";
let b = a.trim();
println!("{:?}", b);
}
If you really need a Vec<u8> after the trim(), you can just call into() on the slice to turn it into a Vec<u8>.
fn main() {
let a = b" fo o ";
let b: Vec<u8> = a.trim().into();
println!("{:?}", b);
}

This is a much simpler version than the other answers.
pub fn trim_ascii_whitespace(x: &[u8]) -> &[u8] {
let from = match x.iter().position(|x| !x.is_ascii_whitespace()) {
Some(i) => i,
None => return &x[0..0],
};
let to = x.iter().rposition(|x| !x.is_ascii_whitespace()).unwrap();
&x[from..=to]
}
Weird that this isn't in the standard library. I would have thought it was a common task.
Anyway here it is as a complete file/trait (with tests!) that you can copy/paste.
/// Trait to allow trimming ascii whitespace from a &[u8].
pub trait TrimAsciiWhitespace {
/// Trim ascii whitespace (based on `is_ascii_whitespace()`) from the
/// start and end of a slice.
fn trim_ascii_whitespace(&self) -> &[u8];
}
// Implemented on `[u8]` so that byte-string literals, arrays and `Vec<u8>`
// can all call it through unsizing or auto-deref.
impl TrimAsciiWhitespace for [u8] {
fn trim_ascii_whitespace(&self) -> &[u8] {
let from = match self.iter().position(|x| !x.is_ascii_whitespace()) {
Some(i) => i,
None => return &self[0..0],
};
let to = self.iter().rposition(|x| !x.is_ascii_whitespace()).unwrap();
&self[from..=to]
}
}
#[cfg(test)]
mod test {
use super::TrimAsciiWhitespace;
#[test]
fn basic_trimming() {
assert_eq!(b" A ".trim_ascii_whitespace(), b"A");
assert_eq!(b" AB ".trim_ascii_whitespace(), b"AB");
assert_eq!(b"A ".trim_ascii_whitespace(), b"A");
assert_eq!(b"AB ".trim_ascii_whitespace(), b"AB");
assert_eq!(b" A".trim_ascii_whitespace(), b"A");
assert_eq!(b" AB".trim_ascii_whitespace(), b"AB");
assert_eq!(b" A B ".trim_ascii_whitespace(), b"A B");
assert_eq!(b"A B ".trim_ascii_whitespace(), b"A B");
assert_eq!(b" A B".trim_ascii_whitespace(), b"A B");
assert_eq!(b" ".trim_ascii_whitespace(), b"");
assert_eq!(b" ".trim_ascii_whitespace(), b"");
}
}

All we have to do is find the index of the first non-whitespace character, one time counting forward from the start, and another time counting backwards from the end.
fn is_not_whitespace(e: &u8) -> bool {
*e != 0x20 && *e != 0x09
}
fn main() {
let a: &[u8] = b" fo o ";
// find the index of first non-whitespace char
let begin = a.iter()
.position(is_not_whitespace);
// find the index of the last non-whitespace char
let end = a.iter()
.rev()
.position(is_not_whitespace)
.map(|j| a.len() - j);
// build it
let vec = begin.and_then(|i| end.map(|j| a[i..j].iter().collect()))
.unwrap_or(Vec::new());
println!("{:?}", vec);
}
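For the exact rule in the question (only space, 0x20, and tab, 0x09), slice patterns give a compact alternative that also returns a borrowed slice. A minimal sketch, with trim_space_tab as a hypothetical name. Recent toolchains also provide <[u8]>::trim_ascii (stabilized in Rust 1.80, to my knowledge), but that strips all ASCII whitespace rather than only space and tab.

```rust
/// Strips leading and trailing spaces and tabs without allocating.
/// `trim_space_tab` is a hypothetical name used for illustration only.
fn trim_space_tab(mut bytes: &[u8]) -> &[u8] {
    // Drop one byte from the front while it is a space or a tab.
    while let [b' ' | b'\t', rest @ ..] = bytes {
        bytes = rest;
    }
    // Drop one byte from the back while it is a space or a tab.
    while let [rest @ .., b' ' | b'\t'] = bytes {
        bytes = rest;
    }
    bytes
}
```

If a Vec<u8> is really needed afterwards, trim_space_tab(&v).to_vec() makes the copy explicit.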

String join on strings in Vec in reverse order without a `collect`

I'm trying to join strings in a vector into a single string, in reverse from their order in the vector. The following works:
let v = vec!["a".to_string(), "b".to_string(), "c".to_string()];
v.iter().rev().map(|s| s.clone()).collect::<Vec<String>>().connect(".")
However, this ends up creating a temporary vector that I don't actually need. Is it possible to do this without a collect? I see that connect is a StrVector method. Is there nothing for raw iterators?

I believe this is the shortest you can get:
fn main() {
let v = vec!["a".to_string(), "b".to_string(), "c".to_string()];
let mut r = v.iter()
.rev()
.fold(String::new(), |r, c| r + c.as_str() + ".");
r.pop();
println!("{}", r);
}
The addition operation on String takes its left operand by value and appends the second operand in place, which is very nice: no new String is allocated at each step (the buffer only grows when it runs out of capacity). You don't even need to clone() the contained strings.
I think, however, that the lack of concat()/connect() methods on iterators is a serious drawback. It bit me a lot too.
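If the trailing separator and the pop() bother you, a plain loop over the reversed iterator does the same job with only the standard library. A minimal sketch; join_rev is a hypothetical name, and the AsRef<str> bound is an assumption so that both String and &str elements work.

```rust
/// Joins the items in reverse order with `sep`, without an intermediate Vec.
/// `join_rev` is a hypothetical name used for illustration only.
fn join_rev<S: AsRef<str>>(items: &[S], sep: &str) -> String {
    let mut out = String::new();
    for (i, s) in items.iter().rev().enumerate() {
        // Write the separator before every element except the first one.
        if i > 0 {
            out.push_str(sep);
        }
        out.push_str(s.as_ref());
    }
    out
}
```

For the vector from the question, join_rev(&v, ".") gives "c.b.a", and an empty slice gives an empty string.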

I don't know if they've heard our Stack Overflow prayers or what, but the itertools crate happens to have just the method you need - join.
With it, your example might be laid out as follows:
use itertools::Itertools;
let v = ["a", "b", "c"];
let connected = v.iter().rev().join(".");

Here's an iterator extension trait that I whipped up, just for you!
pub trait InterleaveExt: Iterator + Sized {
fn interleave(self, value: Self::Item) -> Interleave<Self> {
Interleave {
iter: self.peekable(),
value: value,
me_next: false,
}
}
}
impl<I: Iterator> InterleaveExt for I {}
pub struct Interleave<I>
where
I: Iterator,
{
iter: std::iter::Peekable<I>,
value: I::Item,
me_next: bool,
}
impl<I> Iterator for Interleave<I>
where
I: Iterator,
I::Item: Clone,
{
type Item = I::Item;
#[inline]
fn next(&mut self) -> Option<Self::Item> {
// Don't return a value if there's no next item
if let None = self.iter.peek() {
return None;
}
let next = if self.me_next {
Some(self.value.clone())
} else {
self.iter.next()
};
self.me_next = !self.me_next;
next
}
}
It can be called like so:
fn main() {
let a = &["a", "b", "c"];
let s: String = a.iter().cloned().rev().interleave(".").collect();
println!("{}", s);
let v = vec!["a".to_string(), "b".to_string(), "c".to_string()];
let s: String = v.iter().map(|s| s.as_str()).rev().interleave(".").collect();
println!("{}", s);
}
I've since learned that this iterator adapter already exists in itertools under the name intersperse; go use that instead!
Cheating answer
You never said you needed the original vector after this, so we can reverse it in place and just use join...
let mut v = vec!["a".to_string(), "b".to_string(), "c".to_string()];
v.reverse();
println!("{}", v.join("."))
