How do I split a string in Rust?

From the documentation, it's not clear. In Java you could use the split method like so:
"some string 123 ffd".split("123");

Use split()
let split = "some string 123 ffd".split("123");
This gives an iterator, which you can loop over, or collect() into a vector. Either use consumes the iterator, so clone it (or call split() again) if you need more than one:
for s in split.clone() {
    println!("{}", s);
}
let vec = split.clone().collect::<Vec<&str>>();
// OR
let vec: Vec<&str> = split.collect();

There are four simple ways:
By separator:
s.split("separator") | s.split('/') | s.split(char::is_numeric)
By whitespace:
s.split_whitespace()
By newlines:
s.lines()
By regex: (using regex crate)
Regex::new(r"\s").unwrap().split("one two three")
The result of each kind is an iterator:
let text = "foo\r\nbar\n\nbaz\n";
let mut lines = text.lines();
assert_eq!(Some("foo"), lines.next());
assert_eq!(Some("bar"), lines.next());
assert_eq!(Some(""), lines.next());
assert_eq!(Some("baz"), lines.next());
assert_eq!(None, lines.next());
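To make the difference between these methods concrete, here is a minimal sketch using only the standard library. The function names are made up for illustration; the regex variant is left out because it needs an external crate.

```rust
/// Splitting on a single space keeps empty pieces between repeated spaces.
fn split_on_space(s: &str) -> Vec<&str> {
    s.split(' ').collect()
}

/// split_whitespace treats any run of whitespace as one separator
/// and never yields empty pieces.
fn split_on_whitespace(s: &str) -> Vec<&str> {
    s.split_whitespace().collect()
}

/// lines() accepts both "\n" and "\r\n" endings; a final newline
/// does not produce a trailing empty line.
fn split_on_newlines(s: &str) -> Vec<&str> {
    s.lines().collect()
}
```

For example, split_on_space("a  b") keeps the empty piece between the two spaces, while split_on_whitespace on the same input does not.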

The split method is defined on str, so it is available on String as well:
fn split<'a, P>(&'a self, pat: P) -> Split<'a, P> where P: Pattern<'a>
Split by char:
let v: Vec<&str> = "Mary had a little lamb".split(' ').collect();
assert_eq!(v, ["Mary", "had", "a", "little", "lamb"]);
Split by string:
let v: Vec<&str> = "lion::tiger::leopard".split("::").collect();
assert_eq!(v, ["lion", "tiger", "leopard"]);
Split by closure:
let v: Vec<&str> = "abc1def2ghi".split(|c: char| c.is_numeric()).collect();
assert_eq!(v, ["abc", "def", "ghi"]);

split returns an Iterator, which you can convert into a Vec using collect: split_line.collect::<Vec<_>>(). Going through an iterator instead of returning a Vec directly has several advantages:
split is lazy. This means that it won't really split the line until you need it. That way it won't waste time splitting the whole string if you only need the first few values: split_line.take(2).collect::<Vec<_>>(), or even if you need only the first value that can be converted to an integer: split_line.filter_map(|x| x.parse::<i32>().ok()).next(). This last example won't waste time attempting to process the "23.0" but will stop processing immediately once it finds the "1".
split makes no assumption on the way you want to store the result. You can use a Vec, but you can also use anything that implements FromIterator<&str>, for example a LinkedList or a VecDeque, or any custom type that implements FromIterator<&str>.
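Both points can be sketched in a few lines. The input strings and the function names first_int and to_deque are assumptions chosen for illustration; they are not from the answer above.

```rust
use std::collections::VecDeque;

/// Returns the first space-separated token that parses as an i32.
/// Because split is lazy, tokens after the first match are never examined.
fn first_int(line: &str) -> Option<i32> {
    line.split(' ')
        .filter_map(|tok| tok.parse::<i32>().ok())
        .next()
}

/// Collects the pieces into a VecDeque instead of a Vec; any type that
/// implements FromIterator<&str> would work the same way.
fn to_deque(line: &str) -> VecDeque<&str> {
    line.split(' ').collect()
}
```

With the input "x 1 23.0", first_int returns Some(1) and never looks at the "23.0" token.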

There's also split_whitespace()
fn main() {
let words: Vec<&str> = " foo bar\t\nbaz ".split_whitespace().collect();
println!("{:?}", words);
// ["foo", "bar", "baz"]
}

The OP's question was how to split on a multi-character string. Here is a way to get the results as two Strings, part1 and part2, instead of in a vector.
Here the string is split on the non-ASCII string "☄☃🤔" in place of "123":
let s = "☄☃🤔"; // also works with non-ASCII characters
let mut part1 = "some string ☄☃🤔 ffd".to_string();
let _t;
let part2;
if let Some(idx) = part1.find(s) {
part2 = part1.split_off(idx + s.len());
_t = part1.split_off(idx);
}
else {
part2 = "".to_string();
}
This gives:
part1 = "some string "
part2 = " ffd"
If "☄☃🤔" is not found, part1 contains the untouched original String and part2 is empty.
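If mutating the original String in place is not required, the same result can probably be reached more directly with str::split_once, which accepts a multi-character string pattern. This is only a sketch; split_first is a hypothetical helper name.

```rust
/// Splits `input` at the first occurrence of `sep` and returns owned Strings.
/// If `sep` is absent, the whole input comes back as the first part and the
/// second part is empty, matching the behaviour described above.
fn split_first(input: &str, sep: &str) -> (String, String) {
    match input.split_once(sep) {
        Some((before, after)) => (before.to_string(), after.to_string()),
        None => (input.to_string(), String::new()),
    }
}
```

Unlike the split_off version, this allocates two new Strings and leaves the input untouched.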
Rosetta Code has a nice example, "Split a character string based on change of character", that shows a short solution using split_off. The task:
Split a (character) string into comma (plus a blank) delimited strings based on a change of character (left to right).
The solution:
fn main() {
    let mut part1 = "gHHH5YY++///\\".to_string();
    if let Some(mut last) = part1.chars().next() {
        let mut pos = 0;
        while let Some(c) = part1.chars().find(|&c| {if c != last {true} else {pos += c.len_utf8(); false}}) {
            let part2 = part1.split_off(pos);
            print!("{}, ", part1);
            part1 = part2;
            last = c;
            pos = 0;
        }
    }
    println!("{}", part1);
}

If you are looking for the Python-flavoured split where you tuple-unpack the two ends of the split string, you can do
if let Some((a, b)) = line.split_once(' ') {
// ...
}
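As a small illustration of split_once, here is a sketch that parses a "key=value" line. The function name parse_pair and the key=value format are assumptions made for the example.

```rust
/// Splits a "key=value" line at the first '=' and trims both halves.
/// Returns None when the line contains no '='.
fn parse_pair(line: &str) -> Option<(&str, &str)> {
    let (key, value) = line.split_once('=')?;
    Some((key.trim(), value.trim()))
}
```

Only the first separator is used, so parse_pair("a=b=c") yields ("a", "b=c"). To split at the last separator instead, rsplit_once works the same way.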

Related

How to remove characters from specific index in String?

I have an application where I am receiving a string with some repetitive characters. I am receiving input as a String. How to remove the characters from specific index?
main.rs
fn main() {
let s:String = "{\"name\":\"xx/yyyy/machine/zzz/test_int4\",\"status\":\"online\",\"timestamp\":\"2021-06-11 18:20:42.231770800 UTC\",\"value\":7}8668982856274}".to_string();
println!("{}", s);
}
how can I get result
"{\"name\":\"xx/yyyy/machine/zzz/test_int4\",\"status\":\"online\",\"timestamp\":\"2021-06-11 18:20:42.231770800 UTC\",\"value\":7}"
instead of
"{\"name\":\"xx/yyyy/machine/zzz/test_int4\",\"status\":\"online\",\"timestamp\":\"2021-06-11 18:20:42.231770800 UTC\",\"value\":7}}8668982856274}"
String indexing works only with bytes, thus you need to find an index for the appropriate byte slice like this:
let s = "{\"name\":\"xx/yyyy/machine/zzz/test_int4\",\"status\":\"online\",\"timestamp\":\"2021-06-11 18:20:42.231770800 UTC\",\"value\":7}8668982856274}";
let closing_bracket_idx = s
.as_bytes()
.iter()
.position(|&x| x == b'}')
.map(|i| i + 1)
.unwrap_or_else(|| s.len());
let v: serde_json::Value = serde_json::from_str(&s[..closing_bracket_idx]).unwrap();
println!("{:?}", v);
However, keep in mind that this approach doesn't work in general for more complex cases: for example, a } inside a JSON string value, nested objects, or a type other than an object at the topmost level (e.g. [1, {2: 3}, 4]). A neater way is to use the parser's ability to ignore trailing data; for example, with serde_json:
let v = serde_json::Deserializer::from_str(s)
.into_iter::<serde_json::Value>()
.next()
.expect("empty input")
.expect("invalid json value");
println!("{:?}", v);
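If the goal is simply to cut the owned String at a known position rather than to parse it, String::truncate does that in place. The sketch below assumes, as in the question, that the first '}' marks the end of the payload; the helper name is hypothetical.

```rust
/// Removes everything after the first '}' from `s`, in place.
/// Returns true if a '}' was found. This assumes the first '}' really is
/// the end of the payload, which does not hold for nested JSON objects.
fn truncate_after_first_brace(s: &mut String) -> bool {
    match s.find('}') {
        Some(idx) => {
            // '}' is one byte long, so idx + 1 is a valid char boundary.
            s.truncate(idx + 1);
            true
        }
        None => false,
    }
}
```

truncate panics if the new length is not on a char boundary, which is why the index is derived from find rather than hard-coded.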

Expected &str found char with rust?

I am getting this error
expected &str, found char
For this code
// Expected output
// -------
// h exists
// c exists
fn main() {
let list = ["c","h","p","u"];
let s = "Hot and Cold".to_string();
let mut v: Vec<String> = Vec::new();
for i in s.split(" ") {
let c = i.chars().nth(0).unwrap().to_lowercase().nth(0).unwrap();
println!("{}", c);
if list.contains(&c) {
println!("{} exists", c);
}
}
}
How do I solve this?
Change list from an array of &strs to an array of chars:
let list = ['c', 'h', 'p', 'u'];
Double-quotes "" create string literals, while single-quotes '' create character literals. See Literal Expressions in the Rust reference.
I'm assuming you want list to be a list of chars, not a list of strs. In that case, try changing
let list = ["c","h","p","u"];
to
let list = ['c','h','p','u'];
and it should work
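Putting the fix together, one possible shape for the whole check is sketched below. The function initials_in_list is a made-up name, and it returns the matches instead of printing them so the result is easy to inspect.

```rust
/// Returns the lowercased first character of each space-separated word,
/// keeping only the characters that appear in `list`.
fn initials_in_list(s: &str, list: &[char]) -> Vec<char> {
    s.split(' ')
        // Empty words (e.g. from double spaces) have no first char and are skipped.
        .filter_map(|word| word.chars().next())
        // to_lowercase yields an iterator; take its first char.
        .filter_map(|c| c.to_lowercase().next())
        // `c` is already a &char here, which is what contains expects.
        .filter(|c| list.contains(c))
        .collect()
}
```

Because both the list and the candidates are chars, contains type-checks without any conversion.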

How to remove first and last character of a string in Rust?

I'm wondering how I can remove the first and last character of a string in Rust.
Example:
Input:
"Hello World"
Output:
"ello Worl"
You can use the .chars() iterator and ignore the first and last characters:
fn rem_first_and_last(value: &str) -> &str {
let mut chars = value.chars();
chars.next();
chars.next_back();
chars.as_str()
}
It returns an empty string for strings of zero or one character, and it handles multi-byte Unicode characters correctly.
…or the trivial solution, which also works with Unicode characters:
let mut s = "Hello World".to_string();
s.pop(); // remove last
if s.len() > 0 {
s.remove(0); // remove first
}
I did this by using string slices.
fn main() {
let string: &str = "Hello World";
let first_last_off: &str = &string[1..string.len() - 1];
println!("{}", first_last_off);
}
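This takes the slice from byte index 1 up to, but not including, the last byte. Because string slices are indexed by bytes, it panics if the string is shorter than two bytes or if the first or last character is more than one byte long.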
You can also use split_at().
let msg = "Hello, world!";
let msg = msg.split_at(msg.len() - 1);
let msg = msg.0.split_at(1);
println!("{}", msg.1);
This prints ello, world.
split_at() returns a (&str, &str) tuple. Its argument is a byte index, so it panics if the index is not on a character boundary.
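The byte-index approaches above can panic on short or non-ASCII input. A variant that works on characters and reports "too short" explicitly might look like the following sketch; trim_ends is a hypothetical name.

```rust
/// Removes the first and last character of `s`.
/// Returns None when the string has fewer than two characters.
fn trim_ends(s: &str) -> Option<&str> {
    let mut chars = s.chars();
    chars.next()?; // drop the first char, or bail out on an empty string
    chars.next_back()?; // drop the last char, or bail out if only one existed
    Some(chars.as_str())
}
```

Returning an Option lets the caller tell apart a two-character input (which gives an empty result) from an input that was too short to trim.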

Convert int to a vector of strings

I am trying to convert long numbers to a string vector. For example, 17562 would become ["1", "7", "5", "6", "2"]. I have seen a lot of examples of converting ints to strings, but no ints to string vectors. I want to iterate over each digit individually.
Here is what I have so far, but it isn't working.
fn main() {
let x = 42;
let values: Vec<&str> = x.to_string().split(|c: char| c.is_alphabetic()).collect();
println!("{:?}", values);
}
This gives me the compiler error:
<anon>:3:29: 3:42 error: borrowed value does not live long enough
<anon>:3 let values: Vec<&str> = x.to_string().split(|c: char| c.is_alphabetic()).collect();
<anon>:3:88: 6:2 note: reference must be valid for the block suffix following statement 1 at 3:87...
<anon>:3 let values: Vec<&str> = x.to_string().split(|c: char| c.is_alphabetic()).collect();
<anon>:4 println!("{:?}", values);
<anon>:5
<anon>:6 }
<anon>:3:5: 3:88 note: ...but borrowed value is only valid for the statement at 3:4
<anon>:3 let values: Vec<&str> = x.to_string().split(|c: char| c.is_alphabetic()).collect();
<anon>:3:5: 3:88 help: consider using a `let` binding to increase its lifetime
<anon>:3 let values: Vec<&str> = x.to_string().split(|c: char| c.is_alphabetic()).collect();
The equivalent of what I am trying to do in python would be x = 42; x = list(str(x)); print(x)
Ok, the first problem is that you don't store the result of x.to_string() anywhere. As such, it will cease to exist at the end of the expression, meaning that values will be trying to reference a value that no longer exists. Hence the error. The simplest solution is to just store the temporary string somewhere so that it continues to exist:
fn main() {
let x = 42;
let x_str = x.to_string();
let values: Vec<&str> = x_str.split(|c: char| c.is_alphabetic()).collect();
println!("{:?}", values);
}
Second problem: this outputs ["42"] because you told it to split on letters. You probably meant to use is_numeric:
fn main() {
let x = 42;
let x_str = x.to_string();
let values: Vec<&str> = x_str.split(|c: char| c.is_numeric()).collect();
println!("{:?}", values);
}
Third problem: this outputs ["", "", ""], because those are the three strings between numeric characters. Split's argument is the separator. Thus, the third problem is that you're using entirely the wrong method to begin with.
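The separator behaviour is easy to check in isolation. In this sketch the helper split_on_digits is a made-up name, and it uses is_ascii_digit rather than is_numeric to keep the example narrow.

```rust
/// Splits on ASCII digits. The digits are the separators, so they are
/// discarded and only the text between them is returned.
fn split_on_digits(s: &str) -> Vec<&str> {
    s.split(|c: char| c.is_ascii_digit()).collect()
}
```

For "42" there is nothing before, between, or after the two digits, which is where the three empty strings come from.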
The closest direct equivalent to the Python code you listed would be:
fn main() {
let x = 42;
let values: Vec<String> = x.to_string().chars().map(|c| c.to_string()).collect();
println!("{:?}", values);
}
At last, it outputs: ["4", "2"].
But, this is horribly inefficient: this takes the integer, allocates an intermediate buffer, prints the integer to it, turns it into a string. It takes each code point in that string, allocates an intermediate buffer, prints the code point to it, turns it into a string. Then it collects all these strings into a Vec, possibly reallocating more than once.
It works, but is a bit wasteful. If you don't care about waste, you can stop reading now.
You can make things a bit less wasteful by collecting code points instead of strings:
fn main() {
let x = 42;
let values: Vec<char> = x.to_string().chars().collect();
println!("{:?}", values);
}
This outputs: ['4', '2']. Note the different quotes because we're using char instead of String.
We can remove the intermediate allocations from Vec resizing by pre-allocating its storage, which gives us this version:
fn main() {
let x = 42u32; // no negatives!
let values = {
if x == 0 {
vec!['0']
} else {
// pre-allocate Vec so there's no resizing
let digits = 1 + (x as f64).log10() as u32;
let mut cs = Vec::with_capacity(digits as usize);
let mut div = 10u32.pow(digits - 1);
while div > 0 {
cs.push((b'0' + ((x / div) % 10) as u8) as char);
div /= 10;
}
cs
}
};
println!("{:?}", values);
}
Unless you're doing this in a loop, I'd just stick to the correct, wasteful version.
If you are looking for a performant version, I'd just use this
fn digits(mut val: u64) -> Vec<u8> {
// An unsigned 64-bit number can have 20 digits
let mut result = Vec::with_capacity(20);
loop {
let digit = val % 10;
val = val / 10;
result.push(digit as u8);
if val == 0 { break }
}
result.reverse();
result
}
fn main() {
println!("{:?}", digits(0));
println!("{:?}", digits(1));
println!("{:?}", digits(9));
println!("{:?}", digits(10));
println!("{:?}", digits(11));
println!("{:?}", digits(1234567890));
println!("{:?}", digits(0xFFFFFFFFFFFFFFFF));
}
This may over allocate by a few bytes, but 20 bytes total is small unless you are doing this a whole bunch. It also leaves each value as a number, which you can convert to a string as needed.
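Converting such numeric digits to strings afterwards could look like this sketch. The helper digits_to_strings is hypothetical and assumes each input value is in the range 0 to 9.

```rust
/// Turns numeric digits (each assumed to be in 0..=9) into
/// one-character Strings.
fn digits_to_strings(digits: &[u8]) -> Vec<String> {
    digits.iter().map(|d| d.to_string()).collect()
}
```

This allocates one String per digit, so it is best done only at the point where strings are actually needed.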
What about:
let ss = value.to_string()
.chars()
.map(|c| c.to_string())
.collect::<Vec<_>>();
Not the greatest perf but reads well.

How do I get the first character out of a string?

I want to get the first character of a std::str. The method char_at() is currently unstable, as is String::slice_chars.
I have come up with the following, but it seems excessive to get a single character and not use the rest of the vector:
let text = "hello world!";
let char_vec: Vec<char> = text.chars().collect();
let ch = char_vec[0];
UTF-8 does not define what "character" is so it depends on what you want. In this case, chars are Unicode scalar values, and so the first char of a &str is going to be between one and four bytes.
If you want just the first char, then don't collect into a Vec<char>, just use the iterator:
let text = "hello world!";
let ch = text.chars().next().unwrap();
Alternatively, you can use the iterator's nth method:
let ch = text.chars().nth(0).unwrap();
Bear in mind that elements preceding the index passed to nth will be consumed from the iterator.
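Both snippets above call unwrap, which panics on an empty string. When the input may be empty, handling the Option is safer; a minimal sketch, with first_char_or as a made-up name:

```rust
/// Returns the first char of `text`, or `fallback` if the string is empty.
fn first_char_or(text: &str, fallback: char) -> char {
    text.chars().next().unwrap_or(fallback)
}
```

Because chars yields Unicode scalar values, this also returns the whole first character when it is more than one byte long.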
I wrote a function that returns the head of a &str and the rest:
fn car_cdr(s: &str) -> (&str, &str) {
for i in 1..5 {
let r = s.get(0..i);
match r {
Some(x) => return (x, &s[i..]),
None => (),
}
}
(&s[0..0], s)
}
Use it like this:
let (first_char, remainder) = car_cdr("test");
println!("first char: {}\nremainder: {}", first_char, remainder);
The output looks like:
first char: t
remainder: est
It works fine with chars that are more than 1 byte.
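The same head-and-rest split can probably be written without the trial loop by asking the first char for its UTF-8 length. This is an alternative sketch, not the answer's method; split_first_char is a hypothetical name, and it returns None for an empty string instead of an empty head.

```rust
/// Splits `s` into its first character (as a &str) and the remainder.
/// Returns None for an empty string.
fn split_first_char(s: &str) -> Option<(&str, &str)> {
    let first = s.chars().next()?;
    // len_utf8 is the byte length of the first char, so this index is
    // guaranteed to be on a char boundary.
    Some(s.split_at(first.len_utf8()))
}
```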
Get the first single character out of a string w/o using the rest of that string:
let text = "hello world!";
let ch = text.chars().take(1).last().unwrap();
It would be nice to have something similar to Haskell's head function and tail function for such cases.
I wrote this function to act like head and tail together (doesn't match exact implementation)
pub fn head_tail<T: Iterator, O: FromIterator<<T>::Item>>(iter: &mut T) -> (Option<<T>::Item>, O) {
(iter.next(), iter.collect::<O>())
}
Usage:
// works with Vec<i32>
let mut val = vec![1, 2, 3].into_iter();
println!("{:?}", head_tail::<_, Vec<i32>>(&mut val));
// works with chars in two ways
let mut val = "thanks! bedroom builds YT".chars();
println!("{:?}", head_tail::<_, String>(&mut val));
// calling the function with Vec<char>
let mut val = "thanks! bedroom builds YT".chars();
println!("{:?}", head_tail::<_, Vec<char>>(&mut val));
NOTE: The head_tail function doesn't panic if the iterator is empty; the head is simply None. If this matched Haskell's head/tail behaviour, it would have thrown an exception on an empty iterator. It might also be good to accept IntoIterator to be more compatible with other types.
If you only want to test for it, you can use starts_with():
"rust".starts_with('r')
"rust".starts_with(|c| c == 'r')
I think it is pretty straightforward:
let text = "hello world!";
let c: char = text.chars().next().unwrap();
next() takes the next item from the iterator
To “unwrap” something in Rust is to say, “Give me the result of the computation, and if there was an error, panic and stop the program.”
The accepted answer is a bit ugly!
let text = "hello world!";
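let ch = &text[0..1]; // this returns "h" as a &str, not a char; it panics if the string is empty or the first character is longer than one byte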

Resources