While I understand basically what str and std::string::String are and how they relate to each other, I find it a bit cumbersome to compose strings out of various parts without spending too much time and thought on it. So as usual I suspect I did not see the proper way to do it yet, which makes it intuitive and a breeze.
let mut s = std::string::String::with_capacity(200);
let precTimeToJSON = | pt : prectime::PrecTime, isLast : bool | {
s.push_str(
"{ \"sec\": "
+ &(pt.sec.to_string())
+ " \"usec\": "
+ &(pt.usec.to_string())
+ if isLast {"}"} else {"},"})
};
The code above is honored by the compiler with error messages like:
src\main.rs:25:20: 25:33 error: binary operation + cannot be applied to type &'static str [E0369]
And even after half an hour's worth of fiddling and randomly adding &, I could not make this compile. So, here are my questions:
What do I have to write to achieve the obvious?
What is the "standard" way to do this in Rust?
The Rust compiler is right (of course): there's no + operator for string literals.
I believe the format!() macro is the idiomatic way to do what you're trying to do. It uses the std::fmt syntax, which essentially consists of a formatting string and the arguments to format (a la C's printf). For your example, it would look something like this:
let mut s: String = String::new();
let precTimeToJSON = | pt : prectime::PrecTime, isLast : bool | {
s = format!("{{ \"sec\": {} \"usec\": {} }}{}",
pt.sec,
pt.usec,
if isLast { "" } else { "," }
)
};
Because it's a macro, you can intermix types in the argument list freely, so long as each type implements the std::fmt::Display trait (which is true for the primitive numeric types, bool, char, and the string types, among others). Also, you must escape literal { and } as {{ and }}, respectively. Last, note that the format string must be a string literal, because the macro parses it and the expanded code looks nothing like the original format! expression.
Here's a playground link to the above example.
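As a minimal, self-contained sketch of the format! approach: the PrecTime struct and the prec_time_to_json helper below are assumptions made for illustration (the asker's prectime module is not shown), and a comma is added between the two fields so that the output is valid JSON.

```rust
// Hypothetical stand-in for the asker's prectime::PrecTime type.
struct PrecTime {
    sec: u64,
    usec: u64,
}

// Formats one entry; `{{` and `}}` in the format string produce literal braces.
fn prec_time_to_json(pt: &PrecTime, is_last: bool) -> String {
    format!(
        "{{ \"sec\": {}, \"usec\": {} }}{}",
        pt.sec,
        pt.usec,
        if is_last { "" } else { "," }
    )
}

fn main() {
    let pt = PrecTime { sec: 30, usec: 20 };
    println!("{}", prec_time_to_json(&pt, false));
    println!("{}", prec_time_to_json(&pt, true));
}
```

Returning a String from a plain function, rather than mutating a captured buffer inside a closure, also sidesteps the borrow questions that come up with the closure version.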
Two more points for you. First, if you're reading and writing JSON, have a look at a library such as serde. It's much less painful!
Second, if you just want to concatenate &'static str strings (that is, string literals), you can do that with zero run-time cost with the concat!() macro. It won't help you in your case above, but it might with other similar ones.
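A short sketch of concat!, in case it helps. The constant names are made up for this example. The macro only accepts literals, which is why it cannot help with run-time values such as pt.sec.

```rust
// concat! joins literals at compile time into a single &'static str.
const GREETING: &str = concat!("Hello", ", ", "world", "!");

// Non-string literals are stringified as well.
const VERSION: &str = concat!("v", 1, ".", 2);

fn main() {
    println!("{}", GREETING);
    println!("{}", VERSION);
}
```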
Itertools::format can help you write this as a single expression if you really want to.
let times: Vec<PrecTime>; // iterable of PrecTime
let s = format!("{}", times.iter().format(",", |pt, f|
f(&format_args!(r#"{{ "sec": {}, "usec": {} }}"#, pt.sec, pt.usec))
));
format() uses a separator, so just specify "," there (or "" if you need no separator). It's a bit involved so that the formatting can be completely lazy and composable. You receive a callback f that you call back with a &Display value (anything that can be Display formatted).
Here we demonstrate this great trick of using &format_args!() to construct a displayable value. This is something that comes in handy if you use the debug builder API as well.
Finally, use a raw string so that we don't need to escape the inner " in the format: r#"{{ "sec": {}, "usec": {} }}"#. Raw strings are delimited by r#" and "# (with a free choice of the number of #, as long as both sides match).
Itertools::format() uses no intermediate allocations, it is all directly passed on to the underlying formatter object.
You can also do this madness:
fn main() {
let mut s = std::string::String::with_capacity(200);
// Have to put this in a block so precTimeToJSON is dropped, see https://doc.rust-lang.org/book/closures.html
{
// This has to be mut because the closure mutates the captured s, which makes it an FnMut.
let mut precTimeToJSON = |sec: u64, usec: u64, isLast: bool| {
s.push_str(&( // Coerce String to str. See https://doc.rust-lang.org/book/deref-coercions.html
"{ \"sec\": ".to_string() // String
+ &sec.to_string() // + &str (& coerces a String to a &str).
+ " \"usec\": " // + &str
+ &usec.to_string() // + &str
+ if isLast {"}"} else {"},"} // + &str
));
};
precTimeToJSON(30, 20, false);
}
println!("{}", &s);
}
Basically the operator String + &str -> String is defined, so you can do String + &str + &str + &str + &str. That gives you a String which you have to coerce back to a &str using &. I think this way is probably quite inefficient though as it will (possibly) allocate loads of Strings.
Related
I'd like to define a function that can return a number whose type is specified when the function is called. The function takes a buffer (Vec<u8>) and returns a numeric value, e.g.
let byte = buf_to_num::<u8>(&buf);
let integer = buf_to_num::<u32>(&buf);
The buffer contains an ASCII string that represents a number, e.g. b"827", where each byte is the ASCII code of a digit.
This is my non-working code:
extern crate num;
use num::Integer;
use std::ops::{MulAssign, AddAssign};
fn buf_to_num<T: Integer + MulAssign + AddAssign>(buf: &Vec::<u8>) -> T {
let mut result : T;
for byte in buf {
result *= 10;
result += (byte - b'0');
}
result
}
I get mismatched type errors for both the addition and the multiplication lines (expected type T, found u32). So I guess my problem is how to tell the type system that T can be expressed in terms of a literal 10 or in terms of the result of (byte - b'0')?
Welcome to the joys of having to specify every single operation you're using as a generic. It's a pain, but it is worth it.
You have two problems:
result *= 10; without a corresponding From<_> definition. This is because, when you specify "10", there is no way for the compiler to know what "10" as a T means - it knows primitive types, and any conversion you defined by implementing From<_> traits
You're mixing up two operations - coercion from a vector of characters to an integer, and your operation.
We need to make two assumptions for this:
We will require From<u32> so we can cap our numbers to u32
We will also clarify your logic and convert each u8 to char so we can use to_digit() to convert that to u32, before making use of From<u32> to get a T.
use std::ops::{MulAssign, AddAssign};
fn parse_to_i<T: From<u32> + MulAssign + AddAssign>(buf: &[u8]) -> T {
let mut buffer:T = (0 as u32).into();
for o in buf {
buffer *= 10.into();
buffer += (*o as char).to_digit(10).unwrap_or(0).into();
}
buffer
}
You can convince yourself of its behavior on the playground
The multiplication is resolved by converting the constant with .into(), which relies on our requirement of From<u32> for T and lets the Rust compiler know we're not doing silly stuff.
The final change is to give the accumulator (result in your code, buffer here) an initial value of 0.
Let me know if this makes sense to you (or if it doesn't), and I'll be glad to elaborate further if there is a problem :-)
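For illustration, here is a hedged variation on the function above together with a few calls. The name parse_digits and the Option return type are assumptions of this sketch, not part of the answer: it returns None on a non-digit byte instead of treating it as 0, and it spells the conversions as T::from(...) to make the From<u32> bound visible. It does not guard against overflow.

```rust
use std::ops::{AddAssign, MulAssign};

// Hypothetical variant of the answer's function: returns None on a
// non-digit byte instead of silently treating it as 0.
fn parse_digits<T>(buf: &[u8]) -> Option<T>
where
    T: From<u32> + MulAssign + AddAssign,
{
    let mut result = T::from(0u32);
    for &byte in buf {
        // to_digit(10) yields None for anything that is not '0'..='9'.
        let digit = (byte as char).to_digit(10)?;
        result *= T::from(10u32);
        result += T::from(digit);
    }
    Some(result)
}

fn main() {
    let as_u32: Option<u32> = parse_digits(b"827");
    let as_u64 = parse_digits::<u64>(b"827");
    println!("{:?} {:?}", as_u32, as_u64);
}
```

One consequence of the From<u32> bound is that only types that can represent every u32 qualify (u32, u64, i64, f64 and so on); u8 and u16 do not implement From<u32>, so they cannot be used as T here.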
In Python, this would be final_char = mystring[-1]. How can I do the same in Rust?
I have tried
mystring[mystring.len() - 1]
but I get the error the type 'str' cannot be indexed by 'usize'
That is how you get the last char (which may not be what you think of as a "character"):
mystring.chars().last().unwrap();
Use unwrap only if you are sure that there is at least one char in your string.
Warning: About the general case (doing the same thing as mystring[-n] in Python): UTF-8 strings are not to be used through indexing, because indexing is not an O(1) operation (a string in Rust is not an array of characters). Please read this for more information.
However, if you want to index from the end like in Python, you must do this in Rust:
mystring.chars().rev().nth(n - 1) // Python: mystring[-n]
and check if there is such a character.
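A minimal sketch of that check, assuming a helper named char_from_end (the name is made up here). It counts from 1 like Python's negative indices and returns an Option instead of panicking.

```rust
// Hypothetical helper mirroring Python's s[-n] (n starts at 1).
// Returns None when n is 0 or larger than the number of chars.
fn char_from_end(s: &str, n: usize) -> Option<char> {
    let idx = n.checked_sub(1)?;
    s.chars().rev().nth(idx)
}

fn main() {
    println!("{:?}", char_from_end("foobar", 1));
    println!("{:?}", char_from_end("foobar", 7));
}
```

This is still a linear scan from the end of the string, so it is fine for occasional lookups but not a substitute for random access.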
If you miss the simplicity of Python syntax, you can write your own extension:
trait StrExt {
fn from_end(&self, n: usize) -> char;
}
impl<'a> StrExt for &'a str {
fn from_end(&self, n: usize) -> char {
self.chars().rev().nth(n).expect("Index out of range in 'from_end'")
}
}
fn main() {
println!("{}", "foobar".from_end(2)) // prints 'b'
}
One option is to use slices. Here's an example:
let len = my_str.len();
let final_str = &my_str[len-1..];
This returns a string slice from position len-1 through the end of the string. That is to say, the last byte of your string. If your string consists of only ASCII values, then you'll get the final character of your string.
The reason why this only works with ASCII values is because they only ever require one byte of storage. Anything else, and Rust is likely to panic at runtime. This is what happens when you try to slice out one byte from a 2-byte character.
For a more detailed explanation, please see the strings section of the Rust book.
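If you want a slice of the last character regardless of how many bytes it occupies, one possible sketch is to ask char_indices for the byte offset where the last character starts. The helper name last_char_slice is an assumption for this example.

```rust
// Hypothetical helper: returns the last character as a string slice,
// whatever its encoded length, or None for an empty string.
fn last_char_slice(s: &str) -> Option<&str> {
    // next_back() walks from the end, so only the last char is decoded.
    let (start, _) = s.char_indices().next_back()?;
    Some(&s[start..])
}

fn main() {
    println!("{:?}", last_char_slice("foobar"));
    println!("{:?}", last_char_slice("caf\u{e9}"));
    println!("{:?}", last_char_slice(""));
}
```

Because the offset comes from char_indices, the slice always starts on a character boundary and cannot panic.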
As @Boiethios mentioned
let last_ch = mystring.chars().last().unwrap();
Or
let last_ch = mystring.chars().rev().nth(0).unwrap();
I would rather have (how hard is that!?)
let last_ch = mystring.chars(-1); // Not implemented as of rustc 1.56.1
I'd like to capitalize the first letter of a &str. It's a simple problem and I hope for a simple solution. Intuition tells me to do something like this:
let mut s = "foobar";
s[0] = s[0].to_uppercase();
But &strs can't be indexed like this. The only way I've been able to do it seems overly convoluted. I convert the &str to an iterator, convert the iterator to a vector, upper case the first item in the vector, which creates an iterator, which I index into, creating an Option, which I unwrap to give me the upper-cased first letter. Then I convert the vector into an iterator, which I convert into a String, which I convert to a &str.
let s1 = "foobar";
let mut v: Vec<char> = s1.chars().collect();
v[0] = v[0].to_uppercase().nth(0).unwrap();
let s2: String = v.into_iter().collect();
let s3 = &s2;
Is there an easier way than this, and if so, what? If not, why is Rust designed this way?
Similar question
Why is it so convoluted?
Let's break it down, line-by-line
let s1 = "foobar";
We've created a literal string that is encoded in UTF-8. UTF-8 allows us to encode the 1,114,112 code points of Unicode in a manner that's pretty compact if you come from a region of the world that types in mostly characters found in ASCII, a standard created in 1963. UTF-8 is a variable length encoding, which means that a single code point might take from 1 to 4 bytes. The shorter encodings are reserved for ASCII, but many Kanji take 3 bytes in UTF-8.
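To make the variable-length point concrete, here is a small sketch using char::len_utf8; the sample characters are arbitrary choices for this example.

```rust
fn main() {
    // One code point each, but a different number of UTF-8 bytes.
    for c in ['a', 'ß', '漢', '😎'] {
        println!("{} -> {} byte(s)", c, c.len_utf8());
    }
    // str::len counts bytes, not characters.
    println!("{}", "ß漢".len());
}
```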
let mut v: Vec<char> = s1.chars().collect();
This creates a vector of characters. A character is a 32-bit number that directly maps to a code point. If we started with ASCII-only text, we've quadrupled our memory requirements. If we had a bunch of characters from the astral plane, then maybe we haven't used that much more.
v[0] = v[0].to_uppercase().nth(0).unwrap();
This grabs the first code point and requests that it be converted to an uppercase variant. Unfortunately for those of us who grew up speaking English, there's not always a simple one-to-one mapping of a "small letter" to a "big letter". Side note: we call them upper- and lower-case because one box of letters was above the other box of letters back in the day.
This code will panic if the string is empty, because v[0] is then out of bounds. The unwrap itself is safe: char::to_uppercase yields the character unchanged when a code point has no uppercase variant, so the iterator is never empty. The code can still semantically fail when a code point has an uppercase variant that has multiple characters, such as the German ß, because nth(0) keeps only the first of them. Note that ß may never actually be capitalized in The Real World; this is just the example I can always remember and search for. As of 2017-06-29, in fact, the official rules of German spelling have been updated so that both "ẞ" and "SS" are valid capitalizations!
let s2: String = v.into_iter().collect();
Here we convert the characters back into UTF-8 and require a new allocation to store them in, as the original variable was stored in constant memory so as to not take up memory at run time.
let s3 = &s2;
And now we take a reference to that String.
It's a simple problem
Unfortunately, this is not true. Perhaps we should endeavor to convert the world to Esperanto?
I presume char::to_uppercase already properly handles Unicode.
Yes, I certainly hope so. Unfortunately, Unicode isn't enough in all cases.
Thanks to huon for pointing out the Turkish I, where both the upper (İ) and lower case (i) versions have a dot. That is, there is no one proper capitalization of the letter i; it depends on the locale of the source text as well.
why the need for all data type conversions?
Because the data types you are working with are important when you are worried about correctness and performance. A char is 32 bits and a string is UTF-8 encoded. They are different things.
indexing could return a multi-byte, Unicode character
There may be some mismatched terminology here. A char is a multi-byte Unicode character.
Slicing a string is possible if you go byte-by-byte, but the standard library will panic if you are not on a character boundary.
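A small sketch of how to avoid that panic by testing the boundary first. The helper prefix_bytes is hypothetical; the standard str::get(..n) returns the same Option.

```rust
// Hypothetical helper: slices the first `n` bytes only when `n` is
// a valid char boundary, instead of panicking.
fn prefix_bytes(s: &str, n: usize) -> Option<&str> {
    if s.is_char_boundary(n) {
        Some(&s[..n])
    } else {
        None
    }
}

fn main() {
    // 'ß' occupies bytes 0 and 1, so byte offset 1 is inside it.
    println!("{:?}", prefix_bytes("ßeta", 1));
    println!("{:?}", prefix_bytes("ßeta", 2));
}
```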
One of the reasons that indexing a string to get a character was never implemented is because so many people misuse strings as arrays of ASCII characters. Indexing a string to set a character could never be efficient - you'd have to be able to replace 1-4 bytes with a value that is also 1-4 bytes, causing the rest of the string to bounce around quite a lot.
to_uppercase could return an upper case character
As mentioned above, ß is a single character that, when capitalized, becomes two characters.
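This can be checked directly; the sketch below only uses char::to_uppercase from the standard library.

```rust
fn main() {
    // to_uppercase returns an iterator because one char may map to several.
    let upper: String = 'ß'.to_uppercase().collect();
    println!("{}", upper);

    // A char with no uppercase mapping is yielded unchanged.
    let same: String = '1'.to_uppercase().collect();
    println!("{}", same);
}
```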
Solutions
See also trentcl's answer which only uppercases ASCII characters.
Original
If I had to write the code, it'd look like:
fn some_kind_of_uppercase_first_letter(s: &str) -> String {
let mut c = s.chars();
match c.next() {
None => String::new(),
Some(f) => f.to_uppercase().chain(c).collect(),
}
}
fn main() {
println!("{}", some_kind_of_uppercase_first_letter("joe"));
println!("{}", some_kind_of_uppercase_first_letter("jill"));
println!("{}", some_kind_of_uppercase_first_letter("von Hagen"));
println!("{}", some_kind_of_uppercase_first_letter("ß"));
}
But I'd probably search for uppercase or unicode on crates.io and let someone smarter than me handle it.
Improved
Speaking of "someone smarter than me", Veedrac points out that it's probably more efficient to convert the iterator back into a slice after the first capital codepoints are accessed. This allows for a memcpy of the rest of the bytes.
fn some_kind_of_uppercase_first_letter(s: &str) -> String {
let mut c = s.chars();
match c.next() {
None => String::new(),
Some(f) => f.to_uppercase().collect::<String>() + c.as_str(),
}
}
Is there an easier way than this, and if so, what? If not, why is Rust designed this way?
Well, yes and no. Your code is, as the other answer pointed out, not correct, and will panic if you give it something like བོད་སྐད་ལ་. So doing this with Rust's standard library is even harder than you initially thought.
However, Rust is designed to encourage code reuse and make bringing in libraries easy. So the idiomatic way to capitalize a string is actually quite palatable:
extern crate inflector;
use inflector::Inflector;
let capitalized = "some string".to_title_case();
It's not especially convoluted if you are able to limit your input to ASCII-only strings.
Since Rust 1.23, str has a make_ascii_uppercase method (in older Rust versions, it was available through the AsciiExt trait). This means you can uppercase ASCII-only string slices with relative ease:
fn make_ascii_titlecase(s: &mut str) {
if let Some(r) = s.get_mut(0..1) {
r.make_ascii_uppercase();
}
}
This will turn "taylor" into "Taylor", but it won't turn "édouard" into "Édouard". (playground)
Use with caution.
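A usage sketch, assuming you start from a &str and want an owned result. The wrapper ascii_titlecased is a name invented for this example; make_ascii_titlecase is repeated so the block compiles on its own.

```rust
fn make_ascii_titlecase(s: &mut str) {
    // get_mut returns None if the string is empty or byte 1 is not a
    // char boundary, so a multi-byte first character is left untouched.
    if let Some(r) = s.get_mut(0..1) {
        r.make_ascii_uppercase();
    }
}

// Hypothetical wrapper for callers that only have a &str.
fn ascii_titlecased(s: &str) -> String {
    let mut owned = s.to_owned();
    make_ascii_titlecase(&mut owned); // &mut String derefs to &mut str
    owned
}

fn main() {
    println!("{}", ascii_titlecased("taylor"));
    println!("{}", ascii_titlecased("\u{e9}douard"));
}
```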
I did it this way:
fn str_cap(s: &str) -> String {
format!("{}{}", (&s[..1].to_string()).to_uppercase(), &s[1..])
}
If it is not an ASCII string:
fn str_cap(s: &str) -> String {
format!("{}{}", s.chars().next().unwrap().to_uppercase(),
s.chars().skip(1).collect::<String>())
}
The OP's approach taken further:
replace the first character with its uppercase representation
let mut s = "foobar".to_string();
let r = s.remove(0).to_uppercase().to_string() + &s;
or
let r = format!("{}{s}", s.remove(0).to_uppercase());
println!("{r}");
This works with Unicode characters as well, e.g. "😎foobar".
If the first character is guaranteed to be ASCII, it can be changed to a capital letter in place:
let mut s = "foobar".to_string();
if !s.is_empty() {
s[0..1].make_ascii_uppercase(); // Foobar
}
Panics with a non ASCII character in first position!
Since the method to_uppercase() returns a new String, you should be able to just add the remainder of the string like so.
This was tested in Rust version 1.57+ but is likely to work in any version that supports slicing. Note that it panics on an empty string or when the first character is not a single byte.
fn uppercase_first_letter(s: &str) -> String {
s[0..1].to_uppercase() + &s[1..]
}
Here's a version that is a bit slower than @Shepmaster's improved version, but also more idiomatic:
fn capitalize_first(s: &str) -> String {
let mut chars = s.chars();
chars
.next()
.map(|first_letter| first_letter.to_uppercase())
.into_iter()
.flatten()
.chain(chars)
.collect()
}
This is how I solved this problem, notice I had to check if self is not ascii before transforming to uppercase.
trait TitleCase {
fn title(&self) -> String;
}
impl TitleCase for &str {
fn title(&self) -> String {
if !self.is_ascii() || self.is_empty() {
return String::from(*self);
}
let (head, tail) = self.split_at(1);
head.to_uppercase() + tail
}
}
pub fn main() {
println!("{}", "bruno".title());
println!("{}", "b".title());
println!("{}", "🦀".title());
println!("{}", "ß".title());
println!("{}", "".title());
println!("{}", "བོད་སྐད་ལ".title());
}
Output
Bruno
B
🦀
ß
བོད་སྐད་ལ
Inspired by the get_mut examples, I coded something like this:
fn make_capital(in_str : &str) -> String {
let mut v = String::from(in_str);
v.get_mut(0..1).map(|s| { s.make_ascii_uppercase(); &*s });
v
}
I started programming with Rust this week and I am having a lot of problems understanding how Strings work.
Right now, I am trying to write a simple program that prints a list of players, appending their order (for learning purposes only).
let res : String = pl.name.chars().enumerate().fold(String::new(),|res,(i,ch)| -> String {
res+=format!("{} {}\n",i.to_string(),ch.to_string());
});
println!("{}", res);
This is my idea, I know I could just use a for loop but the objective is to understand the different Iterator functions.
So, my problem is that the String concatenation does not work.
Compiling prueba2 v0.1.0 (file:///home/pancho111203/projects/prueba2)
src/main.rs:27:13: 27:16 error: binary assignment operation `+=` cannot be applied to types `collections::string::String` and `collections::string::String` [E0368]
src/main.rs:27 res+=format!("{} {}\n",i.to_string(),ch.to_string());
^~~
error: aborting due to previous error
Could not compile `prueba2`.
I tried using &str but it is not possible to create them from i and ch values.
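First, when this was written, x += y was not overloadable in Rust, so the += operator did not work for anything except basic numeric types. (Since Rust 1.8 the AddAssign trait makes it overloadable, and String now implements AddAssign<&str>.) However, even if it worked for strings here, it would be equivalent to x = x + y, like in the following: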
res = res + format!("{} {}\n",i.to_string(),ch.to_string())
Even if this were allowed by the type system (it is not because String + String "overload" is not defined in Rust), this is still not how fold() operates. You want this:
res + &format!("{} {}\n", i, ch)
or, as a compilable example,
fn main(){
let x = "hello";
let res : String = x.chars().enumerate().fold(String::new(), |res, (i, ch)| {
res + &format!("{} {}\n", i, ch)
});
println!("{}", res);
}
When you perform a fold, you don't reassign the accumulator variable; you need to return the new value for it to be used on the next iteration, and this is exactly what res + &format!(...) does.
Note that I've removed to_string() invocations because they are completely unnecessary - in fact, x.to_string() is equivalent to format!("{}", x), so you only perform unnecessary allocations here.
Additionally, I'm taking format!() result by reference: &format!(...). This is necessary because + "overload" for strings is defined for String + &str pair of types, so you need to convert from String (the result of format!()) to &str, and this can be done simply by using & here (because of deref coercion).
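A small sketch of the String + &str rule in isolation; the variable names are arbitrary. The last two statements rely on AddAssign<&str> for String, which exists in current Rust but not in the compiler version shown in the question.

```rust
fn main() {
    let greeting = String::from("hello");
    let name = String::from("world");

    // String + &str: `&name` is a &String, which deref-coerces to &str.
    // `greeting` is moved into the result and can no longer be used.
    let mut line = greeting + ", " + &name;

    // += follows the same rule: the right-hand side must be a &str.
    line += "!";
    line += &format!(" ({} chars)", name.len());
    println!("{}", line);
}
```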
In fact, the following would be more efficient:
use std::fmt::Write;
fn main(){
let x = "hello";
let res: String = x.chars().enumerate().fold(String::new(), |mut res, (i, ch)| {
write!(&mut res, "{} {}\n", i, ch).unwrap();
res
});
println!("{}", res);
}
which could be written more idiomatically as
use std::fmt::Write;
fn main(){
let x = "hello";
let mut res = String::new();
for (i, ch) in x.chars().enumerate() {
write!(&mut res, "{} {}\n", i, ch).unwrap();
}
println!("{}", res);
}
(try it on playpen)
This way no extra allocations (i.e. new strings from format!()) are created. We just fill the string with the new data, very similar, for example, to how StringBuilder in Java works. use std::fmt::Write here is needed to allow calling write!() on &mut String.
I would also suggest reading the chapter on strings in the official Rust book (and the book as a whole if you're new to Rust). It explains what String and &str are, how they are different and how to work with them efficiently.
Note: this question contains deprecated pre-1.0 code! The answer is correct, though.
To convert a str to an int in Rust, I can do this:
let my_int = from_str::<int>(my_str);
The only way I know how to convert a String to an int is to get a slice of it and then use from_str on it like so:
let my_int = from_str::<int>(my_string.as_slice());
Is there a way to directly convert a String to an int?
You can directly convert to an int using the str::parse::<T>() method, which returns a Result containing the int.
let my_string = "27".to_string(); // `parse()` works with `&str` and `String`!
let my_int = my_string.parse::<i32>().unwrap();
You can either specify the type to parse to with the turbofish operator (::<>) as shown above or via explicit type annotation:
let my_int: i32 = my_string.parse().unwrap();
Since parse() returns a Result, it will either be an Err if the string couldn't be parsed as the type specified (for example, the string "peter" can't be parsed as i32), or an Ok with the value in it.
let my_u8: u8 = "42".parse().unwrap();
let my_u32: u32 = "42".parse().unwrap();
// or, to be safe, match the `Err`
match "foobar".parse::<i32>() {
Ok(n) => do_something_with(n),
Err(e) => weep_and_moan(),
}
str::parse::<u32> returns a Result<u32, core::num::ParseIntError> and Result::unwrap "Unwraps a result, yielding the content of an Ok [or] panics if the value is an Err, with a panic message provided by the Err's value."
str::parse is a generic function, hence the type in angle brackets.
If you get your string from stdin().read_line, you have to trim it first.
let my_num: i32 = my_num.trim().parse()
.expect("please give me correct string number!");
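To see why the trim is needed, here is a sketch that uses a string literal with a trailing newline in place of real stdin input. The helper parse_line is a hypothetical name.

```rust
use std::num::ParseIntError;

// Hypothetical helper: parses one line of user input as an i32.
fn parse_line(line: &str) -> Result<i32, ParseIntError> {
    line.trim().parse::<i32>()
}

fn main() {
    // read_line keeps the trailing newline, so the raw text fails to parse.
    let raw = "42\n";
    println!("{:?}", raw.parse::<i32>().is_err());
    println!("{:?}", parse_line(raw));
}
```

Returning the Result instead of calling expect leaves the decision about how to report bad input to the caller.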
With a recent nightly, you can do this:
let my_int = from_str::<int>(&*my_string);
What's happening here is that String can now be dereferenced into a str. However, the function wants an &str, so we have to borrow again. For reference, I believe this particular pattern (&*) is called "cross-borrowing".
You can use the FromStr trait's from_str method, which is implemented for i32 (the trait must be in scope):
use std::str::FromStr;
let my_num = i32::from_str("9").unwrap_or(0);
Yes, you can use the parse method on a String to directly convert it to an integer like so:
let my_string = "42".to_string();
let my_int = my_string.parse::<i32>().unwrap();
The parse method returns a Result object, so you will need to handle the case where the string cannot be parsed into an integer. You can use unwrap as shown above to get the value if the parse was successful, or it will panic if the parse failed.
Or you can use the match expression to handle the success and failure cases separately like so:
let my_string = "42".to_string();
let my_int = match my_string.parse::<i32>() {
Ok(n) => n,
Err(_) => {
println!("Failed to parse integer");
0
},
};
FYI, the parse method is available for any type that implements the FromStr trait, which includes all of the integer types (e.g. i32, i64, etc.) as well as many other types such as f32 and bool.
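As a sketch of that generality, the helper below (parse_or_default, a name made up for this example) works for any type that implements both FromStr and Default, falling back to the default value when parsing fails.

```rust
use std::str::FromStr;

// Hypothetical helper: parse-or-default for any FromStr + Default type.
fn parse_or_default<T: FromStr + Default>(s: &str) -> T {
    s.trim().parse().unwrap_or_default()
}

fn main() {
    let f: f32 = parse_or_default("3.5");
    let b: bool = parse_or_default("true");
    let n: i64 = parse_or_default("not a number");
    println!("{} {} {}", f, b, n);
}
```

Silently falling back to a default hides bad input, so this is only appropriate where a default is genuinely acceptable; otherwise match on the Result as shown above.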
Well, no. Why should there be? Just discard the string if you don't need it anymore.
&str is more useful than String when you need to only read a string, because it is only a view into the original piece of data, not its owner. You can pass it around more easily than String, and it is copyable, so it is not consumed by the invoked methods. In this regard it is more general: if you have a String, you can pass it to where an &str is expected, but if you have &str, you can only pass it to functions expecting String if you make a new allocation.
You can find more on the differences between these two and when to use them in the official strings guide.
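A brief sketch of the "pass a String where &str is expected" point; the function shout is invented for this example.

```rust
// Takes a borrowed view; works for both literals and owned Strings.
fn shout(s: &str) -> String {
    s.to_uppercase()
}

fn main() {
    let owned = String::from("hello");
    println!("{}", shout(&owned)); // &String coerces to &str
    println!("{}", shout("world"));
    println!("{}", owned); // still usable: it was only borrowed
}
```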
So basically you want to convert a String into an integer, right?
Here is what I mostly use; it is also mentioned in the official documentation.
fn main() {
let char = "23";
let char : i32 = char.trim().parse().unwrap();
println!("{}", char + 1);
}
This works for both String and &str
Hope this will help too.