What's the proper way to check if a string is empty or blank for a) &str b) String? I used to do it with "aaa".len() == 0, but my gut tells me there should be another way.
Both &str and String have a method called is_empty:
Documentation for &str::is_empty
Documentation for String::is_empty
This is how they are used:
assert_eq!("".is_empty(), true); // a)
assert_eq!(String::new().is_empty(), true); // b)
Empty or whitespace only string can be checked with:
s.trim().is_empty()
where trim() returns a slice with whitespace characters removed from beginning and end of the string (https://doc.rust-lang.org/stable/std/primitive.str.html#method.trim).
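As a minimal sketch of the empty and blank checks side by side (the helper name `is_blank` is made up for illustration and is not part of the standard library):

```rust
/// Hypothetical helper: true if `s` is empty or contains only whitespace.
fn is_blank(s: &str) -> bool {
    // `trim` returns a sub-slice of `s`, so nothing is allocated here.
    s.trim().is_empty()
}

fn main() {
    // a) &str
    assert!("".is_empty());
    assert!(!" ".is_empty()); // a single space is not empty...
    assert!(is_blank(" ")); // ...but it is blank

    // b) String: `&String` derefs to `&str`, so the same helper works.
    let owned = String::from(" \t\n");
    assert!(!owned.is_empty());
    assert!(is_blank(&owned));
    assert!(!is_blank(" x "));
}
```

Taking `&str` as the parameter type lets one function serve both cases, since a `&String` coerces to `&str` at the call site.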
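Others have responded that is_empty can be used to know if a string is empty, but assuming by "is blank" you mean "is composed only of whitespace", you want a whitespace check instead. In pre-1.0 Rust this was UnicodeStrSlice.is_whitespace(), which was true for both empty strings and strings composed solely of characters with the White_Space Unicode property set. That trait no longer exists; on current Rust the equivalent is s.chars().all(char::is_whitespace), or s.trim().is_empty().
Only string slices implemented UnicodeStrSlice, so you had to call .as_slice() when starting from a String. Today a String derefs to &str, so the same expression works on both.
tl;dr: s.chars().all(char::is_whitespace) for both s: &str and s: String (historically s.is_whitespace() and s.as_slice().is_whitespace())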
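Found in the docs (from a pre-1.0 version of Rust; Collection and uint no longer exist, but String::len and String::is_empty do):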
impl Collection for String
fn len(&self) -> uint
fn is_empty(&self) -> bool
Say I have a struct Foo that owns a string:
struct Foo {
    owned_string: String,
}
I want to implement some methods on this struct that return substrings from the owned String. For efficiency reasons, I don't want to allocate any new memory for this, I just want the return values to point to the original String.
Let's say I know the substring I want, it's characters 10 through 15.
I can't just slice it like self.owned_string[10..16], since that would give me bytes, not characters.
I can take the characters and collect them into a new String object, like self.owned_string.chars().skip(9).take(6).collect::<String>(), but that creates a new String object. String objects own their strings (AFAIK), so presumably new memory was allocated for this, which is not what I want.
How do I create string slices that reference a substring of a String object, but using character positions? (Without allocating any new memory)
You can use char_indices() then slice the string according to the positions the iterator gives you:
let mut iter = s.char_indices();
let (start, _) = iter.nth(10).unwrap();
let (end, _) = iter.nth(5).unwrap();
let slice = &s[start..end];
However, note that as mentioned in the documentation of chars():
It's important to remember that char represents a Unicode Scalar Value, and might not match your idea of what a 'character' is. Iteration over grapheme clusters may be what you actually want. This functionality is not provided by Rust's standard library, check crates.io instead.
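The same idea can be wrapped in a helper that returns None instead of panicking when the string is too short. This is only a sketch: `char_range` is a hypothetical name, and it counts Unicode scalar values (char), not grapheme clusters.

```rust
/// Hypothetical helper: the substring covering characters `start..end`
/// (zero-based, `end` exclusive), or `None` if the range is out of bounds.
fn char_range(s: &str, start: usize, end: usize) -> Option<&str> {
    if start > end {
        return None;
    }
    // Byte offsets of every char, plus `s.len()` as the final boundary.
    let mut boundaries = s
        .char_indices()
        .map(|(i, _)| i)
        .chain(std::iter::once(s.len()));
    let start_byte = boundaries.nth(start)?;
    let end_byte = if end == start {
        start_byte
    } else {
        // `nth(start)` already consumed `start + 1` boundaries.
        boundaries.nth(end - start - 1)?
    };
    // Both offsets are char boundaries, so this slice cannot panic.
    Some(&s[start_byte..end_byte])
}

fn main() {
    let s = "héllo wörld";
    assert_eq!(char_range(s, 1, 5), Some("éllo"));
    assert_eq!(char_range(s, 6, 11), Some("wörld"));
    assert_eq!(char_range(s, 6, 12), None); // only 11 chars
}
```

The returned `&str` borrows from the input, so no new memory is allocated; the cost is a linear scan up to the end of the range.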
@ChayimFriedman's answer is of course correct, I just wanted to contribute a more telling example:
fn print_string(s: &str) {
    println!("String: {}", s);
}
fn main() {
    let s: String = "🤣😄😁😆😅".to_string();
    let mut iter = s.char_indices();
    // Retrieve the position of the char at pos 1
    let (start, _) = iter.nth(1).unwrap();
    // Now the next char will be at position `2`. Which would be
    // equivalent of querying `.next()` or `.nth(0)`.
    // So if we query for `nth(2)` we query 3 characters; meaning
    // the position of character 4.
    let (end, _) = iter.nth(2).unwrap();
    // Gives you a &str, which is exactly what you want.
    // A reference to a substring, zero allocations, zero overhead.
    let substring = &s[start..end];
    print_string(&s);
    print_string(substring);
}
String: 🤣😄😁😆😅
String: 😄😁😆
I've done it with smileys because smileys are definitely multi-byte unicode characters.
As @ChayimFriedman already noted, the reason we have to iterate through char_indices is that Rust strings are UTF-8 encoded, so characters are variably sized. Each one takes anywhere from 1 to 4 bytes, so the only way to find the character boundaries is to actually read the string up to the character we want.
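To make the variable width concrete, here is a small sketch using char::len_utf8 from the standard library; the sample characters are arbitrary picks, one for each possible encoded length.

```rust
fn main() {
    // UTF-8 encodes a char in 1 to 4 bytes, depending on its code point.
    assert_eq!('a'.len_utf8(), 1);
    assert_eq!('ß'.len_utf8(), 2);
    assert_eq!('€'.len_utf8(), 3);
    assert_eq!('🤣'.len_utf8(), 4);

    // `str::len` counts bytes, while `chars().count()` counts chars.
    let s = "a🤣";
    assert_eq!(s.len(), 5);
    assert_eq!(s.chars().count(), 2);

    // `char_indices` reports the byte offset where each char starts.
    let offsets: Vec<usize> = s.char_indices().map(|(i, _)| i).collect();
    assert_eq!(offsets, vec![0, 1]);
}
```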
I'm trying to trim and lowercase my String.
Currently I have
use dialoguer::Input;
let input: String = Input::new()
.with_prompt("Guess a 5 letter word")
.interact_text()
.unwrap();
let guess: &str = input.as_str(); // trim and lowercase
I'm trying to transform String into a trimmed and lowercased &str but some functions are only on &str and others only on String so I'm having trouble coming up with an elegant solution.
TL;DR: The end goal is for guess to be a trimmed and lowercased &str.
Rust stdlib is not about elegance, but about correctness and efficiency.
In your particular case, trim() is defined as str::trim(&self) -> &str because it always returns a substring of the original string, so it does not need to copy or allocate a new string, just compute the begin and end, and do the slice.
But to_lowercase() is defined as str::to_lowercase(&self) -> String because it changes each of its characters to the lowercase equivalent, so it must allocate and fill a new String.
You may think that if you own the string you can mutate it to lowercase in-place. But that will not work in general because there is not a 1-to-1 map between lowercase and uppercase letters. Think of, for example, ß <-> SS in German.
Naturally, you may know that your string only has ASCII characters... if so you can also use str::make_ascii_lowercase(&mut self) that does the change in-place, but only for ASCII characters that do have the 1-to-1 map.
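Both points can be checked with a few assertions. This is just an illustrative sketch; the sample strings are made up.

```rust
fn main() {
    // One char can map to several: 'ß' uppercases to "SS", so a general
    // case conversion cannot simply overwrite the string in place.
    let upper = "ß".to_uppercase();
    assert_eq!(upper, "SS");
    assert_eq!("ß".chars().count(), 1);
    assert_eq!(upper.chars().count(), 2);

    // The ASCII-only variant works in place and leaves other chars alone.
    let mut s = String::from("  ÄBC Def ");
    s.make_ascii_lowercase();
    assert_eq!(s, "  Äbc def ");

    // `trim` then borrows from the (already lowercased) string.
    assert_eq!(s.trim(), "Äbc def");
}
```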
So, summing up, the more ergonomic code would be, to trim input and copy to an owned lowercase:
let guess : String = input.trim().to_lowercase();
Or if you absolutely want to avoid allocating an extra string, but you are positive that only ASCII characters matter:
let mut input = input; //you could also add the mut above
input.make_ascii_lowercase();
let guess: &str = input.trim();
Try this:
let s = " aBcD ";
let s2 = s.trim().to_lowercase();
println!("[{s}], [{s2}]");
The above will work if s is &str (as in my example) or String and it will print:
[ aBcD ], [abcd]
So the last line in your code (if you insist on having guess as &str) should become:
let guess: &str = &input.trim().to_lowercase();
Otherwise if you write just:
let guess = input.trim().to_lowercase();
then guess will be of type String, as that's what to_lowercase() returns.
In Python, this would be final_char = mystring[-1]. How can I do the same in Rust?
I have tried
mystring[mystring.len() - 1]
but I get the error the type 'str' cannot be indexed by 'usize'
That is how you get the last char (which may not be what you think of as a "character"):
mystring.chars().last().unwrap();
Use unwrap only if you are sure that there is at least one char in your string.
Warning: About the general case (doing the same thing as mystring[-n] in Python): UTF-8 strings are not meant to be used through indexing, because indexing by character is not an O(1) operation (a string in Rust is not an array of characters). Please read this for more information.
However, if you want to index from the end like in Python, you must do this in Rust:
mystring.chars().rev().nth(n - 1) // Python: mystring[-n]
and check if there is such a character.
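A possible way to do that check is to return an Option and let the caller decide. This is a sketch; `char_from_end` is a hypothetical name, and it follows Python's convention that n starts at 1.

```rust
/// Hypothetical helper mirroring Python's `s[-n]` (so `n` starts at 1).
/// Returns `None` if `n` is 0 or the string has fewer than `n` chars.
fn char_from_end(s: &str, n: usize) -> Option<char> {
    // `checked_sub` avoids an underflow panic when `n == 0`.
    let skip = n.checked_sub(1)?;
    s.chars().rev().nth(skip)
}

fn main() {
    assert_eq!(char_from_end("foobar", 1), Some('r')); // Python: "foobar"[-1]
    assert_eq!(char_from_end("foobar", 3), Some('b')); // Python: "foobar"[-3]
    assert_eq!(char_from_end("foobar", 7), None); // too short
    assert_eq!(char_from_end("foobar", 0), None); // no such index
}
```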
If you miss the simplicity of Python syntax, you can write your own extension:
trait StrExt {
    fn from_end(&self, n: usize) -> char;
}
impl<'a> StrExt for &'a str {
    fn from_end(&self, n: usize) -> char {
        self.chars().rev().nth(n).expect("Index out of range in 'from_end'")
    }
}
fn main() {
    println!("{}", "foobar".from_end(2)); // prints 'b'
}
One option is to use slices. Here's an example:
let len = my_str.len();
let final_str = &my_str[len-1..];
This returns a string slice from position len-1 through the end of the string. That is to say, the last byte of your string. If your string consists of only ASCII values, then you'll get the final character of your string.
This only works with ASCII values because each of them takes exactly one byte of storage. For any other string, the index len-1 may fall in the middle of a multi-byte character, and Rust panics at runtime when a slice boundary is not on a character boundary. This is what happens when you try to slice out one byte from a 2-byte character.
For a more detailed explanation, please see the strings section of the Rust book.
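If the input may contain non-ASCII text, a slice of the last character can still be obtained without panicking. The sketch below assumes a hypothetical helper name, `last_char_str`, and also shows str::get, which returns None rather than panicking on a bad boundary.

```rust
/// Hypothetical helper: the last char of `s` as a `&str`, without allocating.
fn last_char_str(s: &str) -> Option<&str> {
    // `char_indices` is double-ended, so this only decodes the final char.
    s.char_indices().next_back().map(|(i, _)| &s[i..])
}

fn main() {
    assert_eq!(last_char_str("foobar"), Some("r"));
    assert_eq!(last_char_str("café"), Some("é"));
    assert_eq!(last_char_str(""), None);

    // Byte-based slicing with `get` yields None instead of panicking
    // when the index is not on a char boundary.
    let s = "café"; // 'é' takes 2 bytes, so `len - 1` is inside it
    assert_eq!(s.get(s.len() - 1..), None);
    assert_eq!("foobar".get(5..), Some("r"));
}
```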
As @Boiethios mentioned
let last_ch = mystring.chars().last().unwrap();
Or
let last_ch = codes.chars().rev().nth(0).unwrap();
I would rather have (how hard is that!?)
let last_ch = codes.chars(-1); // Not implemented as of rustc 1.56.1
Is there a function to do something like this:
fn string_to_unicode_char(s: &str) -> Option<char> {
// ...
}
fn main() {
let s = r"\u{00AA}"; // note the raw string literal!
string_to_unicode_char(s).unwrap();
}
Note that r"\u{00AA}" uses a raw string, i.e. it isn't a Unicode escape sequence but 8 separate symbols: \ u { 0 0 A A }.
I need to interpret/convert/parse this string and return a char if all is good, None otherwise. I don't have any experience with Unicode, so any ideas are welcome.
I believe the function you are looking for is char::from_u32:
fn string_to_unicode_char(s: &str) -> Option<char> {
// Do something more appropriate to find the actual number
let number = &s[3..7];
u32::from_str_radix(number, 16)
.ok()
.and_then(std::char::from_u32)
}
fn main() {
let s = r"\u{00AA}"; // note the raw string literal!
let ch = string_to_unicode_char(s);
assert_eq!(ch, Some('\u{00AA}'));
}
I indeed completely misunderstood your question; my old answer can be seen in the edit logs
Is there a builtin function to parse a string containing a Rust unicode escape into the corresponding unicode character?
AFAIK, no, there is not a builtin function to do that.
The answer to "how to do it yourself" is a bit broad, as there are many ways to do it (and it's not clear whether you also want to parse standard escapes, such as "\n").
Use a regex
Do simple, naive manual parsing
Embed it into a bigger lexer (the function in the Rust compiler parsing such unicode escapes)
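As a rough sketch of the "simple, naive manual parsing" option, the function below accepts only the `\u{...}` form with 1 to 6 hex digits. It is an assumption-laden simplification: it does not handle other escapes such as "\n", and it rejects the underscores that Rust source code tolerates inside the braces.

```rust
/// Sketch: parse text of the form `\u{XXXX}` (1 to 6 hex digits) into a char.
/// Simplification: underscores inside the braces are rejected.
fn string_to_unicode_char(s: &str) -> Option<char> {
    // Peel off the literal `\u{` prefix and the `}` suffix.
    let hex = s.strip_prefix("\\u{")?.strip_suffix('}')?;
    // Check the digits ourselves: `from_str_radix` would accept a leading '+'.
    let valid = (1..=6).contains(&hex.len()) && hex.chars().all(|c| c.is_ascii_hexdigit());
    if !valid {
        return None;
    }
    let code = u32::from_str_radix(hex, 16).ok()?;
    // `from_u32` rejects surrogates and values above U+10FFFF.
    char::from_u32(code)
}

fn main() {
    assert_eq!(string_to_unicode_char(r"\u{00AA}"), Some('\u{00AA}'));
    assert_eq!(string_to_unicode_char(r"\u{1F923}"), Some('🤣'));
    assert_eq!(string_to_unicode_char(r"\u{D800}"), None); // surrogate
    assert_eq!(string_to_unicode_char(r"\u{}"), None); // no digits
    assert_eq!(string_to_unicode_char("00AA"), None); // not an escape
}
```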
I have a String (in particular a SHA1 hex digest) that I would like to split into two substrings: the first two characters and the rest of the string. Is there a clean way to do this in Rust?
If you know that your string only contains ASCII characters (as in case with sha digests), you can use slices directly:
let s = "13e3f28a65a42bf6258cbd1d883d1ce3dac8f085";
let first = &s[..2]; // "13"
let rest = &s[2..]; // "e3f28a65a42bf6258cbd1d883d1ce3dac8f085"
It won't work correctly if your string contains non-ASCII characters because slicing uses byte offsets, and if any index used in slicing points into the middle of a code point representation, your program will panic.
There's a split_at method since Rust 1.4, use it like this:
let s = "13e3f28a65a42bf6258cbd1d883d1ce3dac8f085";
let (first, last) = s.split_at(2);
assert_eq!("13", first);
assert_eq!("e3f28a65a42bf6258cbd1d883d1ce3dac8f085", last);
Note that the index is a byte position and must lie on a character boundary. In this case this works because you know that your input string is ASCII.
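When the input is not guaranteed to be ASCII, the boundary can be checked first with str::is_char_boundary. The wrapper below is a sketch, and `try_split_at` is a hypothetical name rather than a standard library method.

```rust
/// Hypothetical helper: like `split_at`, but `None` instead of a panic
/// when `byte_index` is out of range or not on a char boundary.
fn try_split_at(s: &str, byte_index: usize) -> Option<(&str, &str)> {
    if s.is_char_boundary(byte_index) {
        Some(s.split_at(byte_index))
    } else {
        None
    }
}

fn main() {
    let digest = "13e3f28a65a42bf6258cbd1d883d1ce3dac8f085";
    assert_eq!(
        try_split_at(digest, 2),
        Some(("13", "e3f28a65a42bf6258cbd1d883d1ce3dac8f085"))
    );
    // 'é' occupies bytes 1 and 2, so index 2 is inside it.
    assert_eq!(try_split_at("héllo", 2), None);
    // Past the end of the string.
    assert_eq!(try_split_at("abc", 4), None);
}
```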
If you are expecting two Strings instead of slices, you can use the chars() method and some Iterator methods to obtain them.
let text = "abcdefg".to_string();
let start: String = text.chars().take(2).collect();
let end: String = text.chars().skip(2).collect();
If you don't want to do heap allocations, you can use slices instead:
let start: &str = text.slice_chars(0, 2);
let end: &str = text.slice_chars(2, text.char_len());
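Note that the slices version required unstable Rust (nightly builds, not the beta) when this was written. slice_chars was never stabilized and has since been removed from the standard library, as has char_len, so on current Rust you need to compute the byte offset yourself, for example with char_indices().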
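On stable Rust, one allocation-free way to split by character count is to look up the byte offset with char_indices() and then call split_at. This is a sketch with a hypothetical helper name, `split_at_char`.

```rust
/// Hypothetical helper: split after the first `n` chars, without allocating.
/// If `s` has fewer than `n` chars, the second half is empty.
fn split_at_char(s: &str, n: usize) -> (&str, &str) {
    let byte_index = s
        .char_indices()
        .nth(n) // the char that starts the second half
        .map(|(i, _)| i)
        .unwrap_or(s.len());
    // `byte_index` is always a char boundary, so this cannot panic.
    s.split_at(byte_index)
}

fn main() {
    assert_eq!(split_at_char("abcdefg", 2), ("ab", "cdefg"));
    assert_eq!(split_at_char("héllo", 2), ("hé", "llo"));
    assert_eq!(split_at_char("ab", 5), ("ab", ""));
}
```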
Here is a way to efficiently split a String into two Strings, in case you have this owned string data case. The allocation of the input string is retained in the first piece by just using truncation.
/// Split a **String** at a particular index
///
/// **Panic** if **byte_index** is not a character boundary
fn split_string(mut s: String, byte_index: usize) -> (String, String) {
    let tail = s[byte_index..].into();
    s.truncate(byte_index);
    (s, tail)
}
Note: The .into() method is from the generic conversion trait Into and in this case it converts &str into String.
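The standard library also offers String::split_off, which does much the same thing: it keeps the head in the original String and returns the tail as a new one. The sketch below repeats split_string from above so that the two can be compared side by side.

```rust
/// Copied from the answer above: split an owned String at a byte index.
fn split_string(mut s: String, byte_index: usize) -> (String, String) {
    let tail = s[byte_index..].into();
    s.truncate(byte_index);
    (s, tail)
}

fn main() {
    let digest = "13e3f28a65a42bf6258cbd1d883d1ce3dac8f085";

    let (first, rest) = split_string(digest.to_string(), 2);
    assert_eq!(first, "13");
    assert_eq!(rest, "e3f28a65a42bf6258cbd1d883d1ce3dac8f085");

    // `split_off` mutates `head` in place and returns the tail.
    // Like `split_string`, it panics if the index is not a char boundary.
    let mut head = digest.to_string();
    let tail = head.split_off(2);
    assert_eq!((head, tail), (first, rest));
}
```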