Difference between split(' ') and split_whitespace() - rust

The following code prints out:
1
a
2
a
I don't get this. why does this happen?
fn main() {
let s = "a ";
let sv1:Vec<&str> = s.split_whitespace().collect();
println!("{}", sv1.len());
for x in sv1.iter() {
println!("{}", x);
}
let sv2:Vec<&str> = s.split(' ').collect();
println!("{}", sv2.len());
for x in sv2.iter() {
println!("{}", x);
}
}

As per the method docs, split_whitespace returns std::str::SplitWhitespace, which is An iterator over the non-whitespace substrings of a string, separated by any amount of whitespace. which means it can split on multiple whitespaces, and doesn't include empty string in the result.
While for split method,
Contiguous separators are separated by the empty string. as well as Separators at the start or end of a string are neighbored by empty strings.
So in your example, split_whitespace gives ["a"] but split gives ["a", ""].

Related

Expected &str found char with rust?

I am getting this error
expected &str, found char
For this code
// Expected output
// -------
// h exists
// c exists
fn main() {
let list = ["c","h","p","u"];
let s = "Hot and Cold".to_string();
let mut v: Vec<String> = Vec::new();
for i in s.split(" ") {
let c = i.chars().nth(0).unwrap().to_lowercase().nth(0).unwrap();
println!("{}", c);
if list.contains(&c) {
println!("{} exists", c);
}
}
}
How do I solve this?
Change list from an array of &strs to an array of chars:
let list = ['c', 'h', 'p', 'u'];
Double-quotes "" create string literals, while single-quotes '' create character literals. See Literal Expressions in the Rust reference.
I'm assuming you want a list to be a list of chars not a list of strs, in that case try changing
let list = ["c","h","p","u"];
to
let list = ['c','h','p','u'];
and it should work
Rust playground

How to remove first and last character of a string in Rust?

I'm wondering how I can remove the first and last character of a string in Rust.
Example:
Input:
"Hello World"
Output:
"ello Worl"
You can use the .chars() iterator and ignore the first and last characters:
fn rem_first_and_last(value: &str) -> &str {
let mut chars = value.chars();
chars.next();
chars.next_back();
chars.as_str()
}
It returns an empty string for zero- or one-sized strings and it will handle multi-byte unicode characters correctly. See it working on the playground.
…or the trivial solution
also works with Unicode characters:
let mut s = "Hello World".to_string();
s.pop(); // remove last
if s.len() > 0 {
s.remove(0); // remove first
}
I did this by using string slices.
fn main() {
let string: &str = "Hello World";
let first_last_off: &str = &string[1..string.len() - 1];
println!("{}", first_last_off);
}
I took all the characters in the string until the end of the string - 1.
You can also use split_at().
let msg = "Hello, world!";
let msg = msg.split_at(msg.len() - 1);
let msg = msg.0.split_at(1);
println!("{}", msg.1);
ello, world
split_at() returns the following: (&str, &str).

Get the first letters of words in a sentence

How I could get the first letters from a sentence; example: "Rust is a fast reliable programming language" should return the output riafrpl.
fn main() {
let string: &'static str = "Rust is a fast reliable programming language";
println!("First letters: {}", string);
}
let initials: String = string
.split(" ") // create an iterator, yielding words
.flat_map(|s| s.chars().nth(0)) // get the first char of each word
.collect(); // collect the result into a String
This is a perfect task for Rust's iterators; here's how I would do it:
fn main() {
let string: &'static str = "Rust is a fast reliable programming language";
let first_letters = string
.split_whitespace() // split string into words
.map(|word| word // map every word with the following:
.chars() // split it into separate characters
.next() // pick the first character
.unwrap() // take the character out of the Option wrap
)
.collect::<String>(); // collect the characters into a string
println!("First letters: {}", first_letters); // First letters: Riafrpl
}

How do I get the first character out of a string?

I want to get the first character of a std::str. The method char_at() is currently unstable, as is String::slice_chars.
I have come up with the following, but it seems excessive to get a single character and not use the rest of the vector:
let text = "hello world!";
let char_vec: Vec<char> = text.chars().collect();
let ch = char_vec[0];
UTF-8 does not define what "character" is so it depends on what you want. In this case, chars are Unicode scalar values, and so the first char of a &str is going to be between one and four bytes.
If you want just the first char, then don't collect into a Vec<char>, just use the iterator:
let text = "hello world!";
let ch = text.chars().next().unwrap();
Alternatively, you can use the iterator's nth method:
let ch = text.chars().nth(0).unwrap();
Bear in mind that elements preceding the index passed to nth will be consumed from the iterator.
I wrote a function that returns the head of a &str and the rest:
fn car_cdr(s: &str) -> (&str, &str) {
for i in 1..5 {
let r = s.get(0..i);
match r {
Some(x) => return (x, &s[i..]),
None => (),
}
}
(&s[0..0], s)
}
Use it like this:
let (first_char, remainder) = car_cdr("test");
println!("first char: {}\nremainder: {}", first_char, remainder);
The output looks like:
first char: t
remainder: est
It works fine with chars that are more than 1 byte.
Get the first single character out of a string w/o using the rest of that string:
let text = "hello world!";
let ch = text.chars().take(1).last().unwrap();
It would be nice to have something similar to Haskell's head function and tail function for such cases.
I wrote this function to act like head and tail together (doesn't match exact implementation)
pub fn head_tail<T: Iterator, O: FromIterator<<T>::Item>>(iter: &mut T) -> (Option<<T>::Item>, O) {
(iter.next(), iter.collect::<O>())
}
Usage:
// works with Vec<i32>
let mut val = vec![1, 2, 3].into_iter();
println!("{:?}", head_tail::<_, Vec<i32>>(&mut val));
// works with chars in two ways
let mut val = "thanks! bedroom builds YT".chars();
println!("{:?}", head_tail::<_, String>(&mut val));
// calling the function with Vec<char>
let mut val = "thanks! bedroom builds YT".chars();
println!("{:?}", head_tail::<_, Vec<char>>(&mut val));
NOTE: The head_tail function doesn't panic! if the iterator is empty. If this matched Haskell's head/tail output, this would have thrown an exception if the iterator was empty. It might also be good to use iterable trait to be more compatible to other types.
If you only want to test for it, you can use starts_with():
"rust".starts_with('r')
"rust".starts_with(|c| c == 'r')
I think it is pretty straight forward
let text = "hello world!";
let c: char = text.chars().next().unwrap();
next() takes the next item from the iterator
To “unwrap” something in Rust is to say, “Give me the result of the computation, and if there was an error, panic and stop the program.”
The accepted answer is a bit ugly!
let text = "hello world!";
let ch = &text[0..1]; // this returns "h"

How do I split a string in Rust?

From the documentation, it's not clear. In Java you could use the split method like so:
"some string 123 ffd".split("123");
Use split()
let mut split = "some string 123 ffd".split("123");
This gives an iterator, which you can loop over, or collect() into a vector.
for s in split {
println!("{}", s)
}
let vec = split.collect::<Vec<&str>>();
// OR
let vec: Vec<&str> = split.collect();
There are three simple ways:
By separator:
s.split("separator") | s.split('/') | s.split(char::is_numeric)
By whitespace:
s.split_whitespace()
By newlines:
s.lines()
By regex: (using regex crate)
Regex::new(r"\s").unwrap().split("one two three")
The result of each kind is an iterator:
let text = "foo\r\nbar\n\nbaz\n";
let mut lines = text.lines();
assert_eq!(Some("foo"), lines.next());
assert_eq!(Some("bar"), lines.next());
assert_eq!(Some(""), lines.next());
assert_eq!(Some("baz"), lines.next());
assert_eq!(None, lines.next());
There is a special method split for struct String:
fn split<'a, P>(&'a self, pat: P) -> Split<'a, P> where P: Pattern<'a>
Split by char:
let v: Vec<&str> = "Mary had a little lamb".split(' ').collect();
assert_eq!(v, ["Mary", "had", "a", "little", "lamb"]);
Split by string:
let v: Vec<&str> = "lion::tiger::leopard".split("::").collect();
assert_eq!(v, ["lion", "tiger", "leopard"]);
Split by closure:
let v: Vec<&str> = "abc1def2ghi".split(|c: char| c.is_numeric()).collect();
assert_eq!(v, ["abc", "def", "ghi"]);
split returns an Iterator, which you can convert into a Vec using collect: split_line.collect::<Vec<_>>(). Going through an iterator instead of returning a Vec directly has several advantages:
split is lazy. This means that it won't really split the line until you need it. That way it won't waste time splitting the whole string if you only need the first few values: split_line.take(2).collect::<Vec<_>>(), or even if you need only the first value that can be converted to an integer: split_line.filter_map(|x| x.parse::<i32>().ok()).next(). This last example won't waste time attempting to process the "23.0" but will stop processing immediately once it finds the "1".
split makes no assumption on the way you want to store the result. You can use a Vec, but you can also use anything that implements FromIterator<&str>, for example a LinkedList or a VecDeque, or any custom type that implements FromIterator<&str>.
There's also split_whitespace()
fn main() {
let words: Vec<&str> = " foo bar\t\nbaz ".split_whitespace().collect();
println!("{:?}", words);
// ["foo", "bar", "baz"]
}
The OP's question was how to split with a multi-character string and here is a way to get the results of part1 and part2 as Strings instead in a vector.
Here splitted with the non-ASCII character string "☄☃🤔" in place of "123":
let s = "☄☃🤔"; // also works with non-ASCII characters
let mut part1 = "some string ☄☃🤔 ffd".to_string();
let _t;
let part2;
if let Some(idx) = part1.find(s) {
part2 = part1.split_off(idx + s.len());
_t = part1.split_off(idx);
}
else {
part2 = "".to_string();
}
gets: part1 = "some string "
         part2 = " ffd"
If "☄☃🤔" not is found part1 contains the untouched original String and part2 is empty.
Here is a nice example in Rosetta Code -
Split a character string based on change of character - of how you can turn a short solution using split_off:
fn main() {
let mut part1 = "gHHH5YY++///\\".to_string();
if let Some(mut last) = part1.chars().next() {
let mut pos = 0;
while let Some(c) = part1.chars().find(|&c| {if c != last {true} else {pos += c.len_utf8(); false}}) {
let part2 = part1.split_off(pos);
print!("{}, ", part1);
part1 = part2;
last = c;
pos = 0;
}
}
println!("{}", part1);
}
into that
Task
Split a (character) string into comma (plus a blank) delimited strings based on a change of character (left to right).
If you are looking for the Python-flavoured split where you tuple-unpack the two ends of the split string, you can do
if let Some((a, b)) = line.split_once(' ') {
// ...
}

Resources