I have the following code to create a string of length n+1, where n is passed as a command line argument:
let args: Vec<String> = env::args().collect();
let min_no_of_bytes = &args[2].parse::<u64>();
let no_of_times_to_repeat = min_no_of_bytes as usize;
let mut inputtext='0'.to_string().repeat(no_of_times_to_repeat);
But I get the following error during compile:
error[E0606]: casting `&Result<u64, ParseIntError>` as `usize` is invalid
--> src/main.rs:33:17
|
33 | let temp = min_no_of_bytes as usize;
| ^^^^^^^^^^^^^^^^^^^^^^^^
|
= help: cast through a raw pointer first
Please advise. Thanks.
There are several problems that are unhandled in the original code.
args is not guaranteed to have a second argument, therefore &args[2] could panic
the string at &args[2] is not guaranteed to be a parseable number and therefore parse returns a Result, which may carry an error.
Direct indexing is generally discouraged in Rust when the index may be out of bounds. It is of course possible, but it panics if the element does not exist. So if you want to access an element that may be missing, you would usually use iterators or the .get() function, which returns an Option.
Handling all the errors
First, I'd recommend handling all the errors before we make it a bit more pretty. The rust compiler should force you to do that if you avoid direct array accesses and unwrap:
use std::env;
fn main() {
let args: Vec<String> = env::args().collect();
let arg = match args.get(2) {
Some(val) => val,
None => {
println!("Not enough arguments provided!");
return;
}
};
let min_no_of_bytes = match arg.parse::<u64>() {
Ok(val) => val,
Err(e) => {
println!("Unable to parse number from argument: {}", e);
return;
}
};
let no_of_times_to_repeat = min_no_of_bytes as usize;
let inputtext = '0'.to_string().repeat(no_of_times_to_repeat);
println!("{}", inputtext);
}
Running this program with: cargo run -- bla 3 prints 000.
Setting up proper error handling
Notice that we now have println!(...error) and return right next to each other. Instead, we could directly return an error from main to combine those two actions. Rust will then automatically print the error.
In order to accept any error type, this can be done with the type Result<(), Box<dyn Error>> or wrapping types like anyhow.
I'll provide the anyhow solution here, because I think it's in general better to use a wrapper library instead of Box<dyn Error> directly.
use anyhow::{anyhow, Result};
use std::env;
fn main() -> Result<()> {
let args: Vec<String> = env::args().collect();
let arg = args
.get(2)
.ok_or(anyhow!("Not enough arguments provided!"))?;
let min_no_of_bytes = arg.parse::<u64>()?;
let no_of_times_to_repeat = min_no_of_bytes as usize;
let inputtext = '0'.to_string().repeat(no_of_times_to_repeat);
println!("{}", inputtext);
Ok(())
}
Note the ? operator here, which is a quick way of saying "if this is an Err value, return it from the function, otherwise unwrap the value". Basically the safe, error-propagating version of .unwrap().
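For comparison, here is a minimal sketch of the Box<dyn Error> variant mentioned above, using only the standard library. The helper name repeat_count is hypothetical and not part of the original code; it also parses straight into usize, which is an assumption made here to avoid the later `as usize` cast.

```rust
use std::env;
use std::error::Error;

/// Parses the repeat count from the argument at index 2.
/// `repeat_count` is a hypothetical helper name used for illustration.
fn repeat_count(args: &[String]) -> Result<usize, Box<dyn Error>> {
    // A &str error message converts into Box<dyn Error> through `?`.
    let arg = args.get(2).ok_or("Not enough arguments provided!")?;
    // ParseIntError implements Error, so `?` boxes it as well.
    let n = arg.parse::<usize>()?;
    Ok(n)
}

fn main() -> Result<(), Box<dyn Error>> {
    let args: Vec<String> = env::args().collect();
    let n = repeat_count(&args)?;
    println!("{}", "0".repeat(n));
    Ok(())
}
```

Any error returned from main is printed using its Debug representation and the process exits with a non-zero status.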
Using a proper command line parsing library
I won't go into further detail here, but if your projects get a little bigger than 10 lines of code, you will quickly discover that writing argument parsing manually is pretty tedious.
There are several libraries that make this process very easy. The one I'd recommend at the time of writing this is clap:
use clap::Parser;
/// A program that does things
#[derive(Parser, Debug)]
#[clap(version, about)]
struct Args {
/// Minimum number of bytes
#[clap(short)]
min_no_of_bytes: u64,
}
fn main() {
let args = Args::parse();
let no_of_times_to_repeat = args.min_no_of_bytes as usize;
let inputtext = '0'.to_string().repeat(no_of_times_to_repeat);
println!("{}", inputtext);
}
This requires clap = { version = "3.1", features = ["derive"] } in your Cargo.toml.
Running it with cargo run -- -m 3 gives you 000.
The cool side effect is that you get a help message for free:
cargo run -- -h
my-program 0.1.0
A program that does things
USAGE:
my-program.exe -m <MIN_NO_OF_BYTES>
OPTIONS:
-h, --help Print help information
-m <MIN_NO_OF_BYTES> Minimum number of bytes
-V, --version Print version information
The following worked:
let args: Vec<String> = env::args().collect();
let min_no_of_bytes = args[2].parse::<usize>().unwrap();
let mut inputtext = '0'.to_string().repeat(min_no_of_bytes);
Related
In the following code I would like to get the 2nd string in a vector args and then parse it into an i32. This code will not compile, however, because I cannot call parse() on the Option value returned by nth().
use std::env;
fn main() {
let args: Vec<String> = env::args().collect();
let a = args.iter().nth(1).parse::<i32>();
}
I know I could just use expect() to unwrap the value before trying to parse it, but I do not want my code to panic. I want a to be a Result value that is an Err if either nth() or parse() fails, and otherwise is an Ok(i32). Is there a way to accomplish this in Rust? Thanks.
It is quite easy if you look in the documentation for either Option or Result. The function you are thinking of is likely and_then, which takes a closure that can change the Ok type and value if one is present, but otherwise leaves the error unchanged. What you do need to do, though, is decide on a common error type to propagate. Since the Option<&String> needs to be turned into an error on a None value, we have to choose a type to use.
Here I provide a brief example with a custom error type. I decided to use .get instead of .iter().nth(1) since it does the same thing and we might as well take advantage of the Vec since you have gone to the work of creating it.
use std::env;
use std::num::ParseIntError;
enum ArgParseError {
NotFound(usize),
InvalidArg(ParseIntError),
}
let args: Vec<String> = env::args().collect();
let a: Result<i32, ArgParseError> = args
.get(1) // Option<&String>
.ok_or_else(|| ArgParseError::NotFound(1)) // Result<&String, ArgParseError>
.and_then(|x: &String| {
x.parse::<i32>() // Result<i32, ParseIntError>
.map_err(|e| ArgParseError::InvalidArg(e)) // Result<i32, ArgParseError>
});
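If the parsing lives in its own function, the same chain can also be written with the ? operator. The following is a minimal sketch, assuming a From<ParseIntError> implementation is added to the custom error type; the helper name parse_arg is hypothetical.

```rust
use std::num::ParseIntError;

#[derive(Debug)]
enum ArgParseError {
    NotFound(usize),
    InvalidArg(ParseIntError),
}

// Lets `?` convert a ParseIntError into ArgParseError automatically.
impl From<ParseIntError> for ArgParseError {
    fn from(e: ParseIntError) -> Self {
        ArgParseError::InvalidArg(e)
    }
}

/// `parse_arg` is a hypothetical helper name used for illustration.
fn parse_arg(args: &[String], index: usize) -> Result<i32, ArgParseError> {
    // None becomes Err(NotFound), which `?` returns early.
    let raw = args.get(index).ok_or(ArgParseError::NotFound(index))?;
    // A parse failure is converted through the From impl above.
    Ok(raw.parse::<i32>()?)
}

fn main() {
    let args: Vec<String> = std::env::args().collect();
    match parse_arg(&args, 1) {
        Ok(n) => println!("parsed {}", n),
        Err(e) => println!("error: {:?}", e),
    }
}
```

The trade-off is a little more boilerplate for the From impl in exchange for a body that reads top to bottom without nested closures.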
You could try the following.
use std::{env, num::ParseIntError};
enum Error {
ParseIntError(ParseIntError),
Empty,
}
fn main() {
let args: Vec<String> = env::args().collect();
let a: Option<Result<i32, ParseIntError>> = args.iter().nth(1).map(|s| s.parse::<i32>());
let a: Result<i32, Error> = match a {
Some(Ok(a)) => Ok(a),
Some(Err(e)) => Err(Error::ParseIntError(e)),
None => Err(Error::Empty),
};
}
The following code reads space-delimited records from stdin, and writes comma-delimited records to stdout. Even with optimized builds it's rather slow (about twice as slow as using, say, awk).
use std::io::BufRead;
fn main() {
let stdin = std::io::stdin();
for line in stdin.lock().lines().map(|x| x.unwrap()) {
let fields: Vec<_> = line.split(' ').collect();
println!("{}", fields.join(","));
}
}
One obvious improvement would be to use itertools to join without allocating a vector (the collect call causes an allocation). However, I tried a different approach:
fn main() {
let stdin = std::io::stdin();
let mut cache = Vec::<&str>::new();
for line in stdin.lock().lines().map(|x| x.unwrap()) {
cache.extend(line.split(' '));
println!("{}", cache.join(","));
cache.clear();
}
}
This version tries to reuse the same vector over and over. Unfortunately, the compiler complains:
error: `line` does not live long enough
--> src/main.rs:7:22
|
7 | cache.extend(line.split(' '));
| ^^^^
|
note: reference must be valid for the block suffix following statement 1 at 5:39...
--> src/main.rs:5:40
|
5 | let mut cache = Vec::<&str>::new();
| ^
note: ...but borrowed value is only valid for the for at 6:4
--> src/main.rs:6:5
|
6 | for line in stdin.lock().lines().map(|x| x.unwrap()) {
| ^
error: aborting due to previous error
Which of course makes sense: the line variable is only alive in the body of the for loop, whereas cache keeps a pointer into it across iterations. But that error still looks spurious to me: since the cache is cleared after each iteration, no reference to line can be kept, right?
How can I tell the borrow checker about this?
The only way to do this is to use transmute to change the Vec<&'a str> into a Vec<&'b str>. transmute is unsafe and Rust will not raise an error if you forget the call to clear here. You might want to extend the unsafe block up to after the call to clear to make it clear (no pun intended) where the code returns to "safe land".
use std::io::BufRead;
use std::mem;
fn main() {
let stdin = std::io::stdin();
let mut cache = Vec::<&str>::new();
for line in stdin.lock().lines().map(|x| x.unwrap()) {
let cache: &mut Vec<&str> = unsafe { mem::transmute(&mut cache) };
cache.extend(line.split(' '));
println!("{}", cache.join(","));
cache.clear();
}
}
In this case Rust doesn't know what you're trying to do. Unfortunately, .clear() does not affect how .extend() is checked.
The cache is a "vector of strings that live as long as the main function", but in extend() calls you're appending "strings that live only as long as one loop iteration", so that's a type mismatch. The call to .clear() doesn't change the types.
Usually such limited-time uses are expressed by making a long-lived opaque object that enables access to its memory by borrowing a temporary object with the right lifetime, like RefCell.borrow() gives a temporary Ref object. Implementation of that would be a bit involved and would require unsafe methods for recycling Vec's internal memory.
In this case an alternative solution could be to avoid these allocations altogether (.join() allocates too) and stream the output using the Peekable iterator wrapper:
for line in stdin.lock().lines().map(|x| x.unwrap()) {
let mut fields = line.split(' ').peekable();
while let Some(field) = fields.next() {
print!("{}", field);
if fields.peek().is_some() {
print!(",");
}
}
print!("\n");
}
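The streaming idea can be taken a step further by reusing one String for every input line and writing through a buffered, locked stdout. This is only a sketch under those assumptions: the helper name convert is hypothetical, and whether it is actually faster for your input should be measured rather than assumed.

```rust
use std::io::{self, BufRead, BufWriter, Write};

/// Converts space-delimited lines from `input` to comma-delimited lines on `out`.
/// `convert` is a hypothetical helper name, not from the original answers.
fn convert<R: BufRead, W: Write>(mut input: R, mut out: W) -> io::Result<()> {
    let mut line = String::new(); // one buffer, reused for every line
    while input.read_line(&mut line)? != 0 {
        // read_line keeps the line terminator, so strip it before splitting.
        let trimmed = line.trim_end_matches(|c: char| c == '\n' || c == '\r');
        let mut fields = trimmed.split(' ');
        if let Some(first) = fields.next() {
            out.write_all(first.as_bytes())?;
        }
        for field in fields {
            out.write_all(b",")?;
            out.write_all(field.as_bytes())?;
        }
        out.write_all(b"\n")?;
        line.clear(); // keeps the capacity, drops the contents
    }
    Ok(())
}

fn main() -> io::Result<()> {
    let stdin = io::stdin();
    let stdout = io::stdout();
    let mut out = BufWriter::new(stdout.lock());
    convert(stdin.lock(), &mut out)?;
    out.flush()
}
```

Because convert is generic over BufRead and Write, it can be exercised with an in-memory byte slice and a Vec<u8> instead of the real standard streams.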
BTW: Francis' answer with transmute is good too. You can use unsafe to say you know what you're doing and override the lifetime check.
Itertools has .format() for the purpose of lazy formatting, which skips allocating a string too.
use std::io::BufRead;
use itertools::Itertools;
fn main() {
let stdin = std::io::stdin();
for line in stdin.lock().lines().map(|x| x.unwrap()) {
println!("{}", line.split(' ').format(","));
}
}
As a digression, something like this wraps the transmute from another answer here in a minimal "safe abstraction": the vector is always cleared before its lifetime is changed.
fn repurpose<'a, T: ?Sized>(mut v: Vec<&T>) -> Vec<&'a T> {
v.clear();
unsafe {
std::mem::transmute(v)
}
}
Another approach is to refrain from storing references altogether, and to store indices instead. This trick can also be useful in other data structure contexts, so this might be a nice opportunity to try it out.
use std::io::BufRead;
fn main() {
let stdin = std::io::stdin();
let mut cache = Vec::new();
for line in stdin.lock().lines().map(|x| x.unwrap()) {
cache.push(0);
cache.extend(line.match_indices(' ').map(|x| x.0 + 1));
// cache now contains the indices where new words start
// do something with this information
for i in 0..(cache.len() - 1) {
print!("{},", &line[cache[i]..(cache[i + 1] - 1)]);
}
println!("{}", &line[*cache.last().unwrap()..]);
cache.clear();
}
}
Though you made the remark yourself in the question, I feel the need to point out that there are more elegant methods to do this using iterators, that might avoid the allocation of a vector altogether.
The approach above was inspired by a similar question here, and becomes more useful if you need to do something more complicated than printing.
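As a rough sketch of how the index idea extends beyond printing, the cache can hold a byte range per field instead of only the start positions. The helper name field_ranges is hypothetical, and the sketch assumes fields are separated by single ASCII spaces.

```rust
use std::ops::Range;

/// Fills `cache` with the byte range of every space-separated field in `line`.
/// `field_ranges` is a hypothetical helper name used for illustration.
fn field_ranges(line: &str, cache: &mut Vec<Range<usize>>) {
    cache.clear();
    let mut start = 0;
    for (pos, _) in line.match_indices(' ') {
        cache.push(start..pos);
        start = pos + 1; // a space is one byte, so the next field starts here
    }
    cache.push(start..line.len());
}

fn main() {
    // The cache stores no references, so it can outlive every line.
    let mut cache = Vec::new();
    for line in ["a b c", "second line"] {
        field_ranges(line, &mut cache);
        for (i, range) in cache.iter().enumerate() {
            if i > 0 {
                print!(",");
            }
            print!("{}", &line[range.clone()]);
        }
        println!();
    }
}
```

Since the ranges are plain integers, the borrow checker has nothing to object to, and any field can be looked up again later by slicing the current line.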
Elaborating on Francis's answer about using transmute(), this could be safely abstracted, I think, with this simple function:
pub fn zombie_vec<'a, 'b, T: ?Sized>(mut data: Vec<&'a T>) -> Vec<&'b T> {
data.clear();
unsafe {
std::mem::transmute(data)
}
}
Using this, the original code would be:
fn main() {
let stdin = std::io::stdin();
let mut cache0 = Vec::<&str>::new();
for line in stdin.lock().lines().map(|x| x.unwrap()) {
let mut cache = cache0; // into the loop
cache.extend(line.split(' '));
println!("{}", cache.join(","));
cache0 = zombie_vec(cache); // out of the loop
}
}
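You need to move the outer vector into every loop iteration and move it back out before the iteration ends, while safely erasing the local lifetime.
Another suggestion is to use .drain(..) instead of .clear(), where .. is a "full range". It returns an iterator, so drained elements can be processed in a loop. It is also available for other collections (String, HashMap, etc.). Note that the example below compiles because its lines are string literals with a 'static lifetime; drain does not change the vector's element type, so on its own it does not resolve the lifetime error for lines read from stdin.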
fn main() {
let mut cache = Vec::<&str>::new();
for line in ["first line allocates for", "second"].iter() {
println!("Size and capacity: {}/{}", cache.len(), cache.capacity());
cache.extend(line.split(' '));
println!(" {}", cache.join(","));
cache.drain(..);
}
}
I'm trying to first set a String to be some default, but then update that String if a command line argument has been given...
This is my starting point (which doesn't compile):
use std::env;
fn main() {
let mut config_file = "C:\\temp\\rust\\config.txt".to_string();
let args: Vec<String> = env::args().collect();
if args.len() > 1 {
config_file = args[1];
}
println!("Config file path: {}", config_file);
}
So, (I think) env::args() is giving me an owned vector of owned strings... How do I either:
Copy a string in the vector
Get a reference to a string in the vector
Note:
$ rustc --version
rustc 1.8.0 (db2939409 2016-04-11)
In Rust, to create a copy of an element, it should implement the Clone trait, and thus have a .clone() method.
String implements Clone, thus:
config_file = args[1].clone();
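If a reference is enough (the second option in the question), a borrowed &str avoids the copy entirely. A minimal sketch, assuming the default path is kept in a constant; the names DEFAULT_CONFIG and config_path are hypothetical.

```rust
use std::env;

// Hypothetical constant holding the default path from the question.
const DEFAULT_CONFIG: &str = "C:\\temp\\rust\\config.txt";

/// Returns a borrowed path: the argument at index 1 if present, else the default.
/// `config_path` is a hypothetical helper name used for illustration.
fn config_path(args: &[String]) -> &str {
    args.get(1).map(String::as_str).unwrap_or(DEFAULT_CONFIG)
}

fn main() {
    let args: Vec<String> = env::args().collect();
    let config_file = config_path(&args);
    println!("Config file path: {}", config_file);
}
```

The returned &str borrows from args, so args has to stay alive for as long as config_file is used.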
Your method, however, has many unnecessary memory allocations, and we can do better: there is no need to create a Vec. args() yields an iterator, so let's use that directly and cherry-pick the interesting value.
With this in mind:
fn main() {
let mut config_file = "C:\\temp\\rust\\config.txt".to_string();
if let Some(v) = env::args().nth(1) {
config_file = v;
}
println!("Config file path: {}", config_file);
}
At the behest of Shepmaster: it's show time!
The following is an equivalent program, without mutability or escape characters, and with as few allocations as possible:
fn main() {
let config_file = env::args()
.nth(1)
.unwrap_or_else(|| r#"C:\temp\rust\config.txt"#.to_string());
println!("Config file path: {}", config_file);
}
It uses unwrap_or_else on the Option returned by nth(1) to get either the content of the Option or, if none, generate a value using the passed lambda.
It also showcases raw string literals, a great feature to use when you have to embed backslashes in a string.
I'm working with rust-fuse, which takes mount options as a &[&std::ffi::os_str::OsStr]. It appears that I should be splitting my incoming comma-separated options string, which I'm doing like so:
mod fuse {
use std::ffi::OsStr;
pub fn mount(options: &[&OsStr]) {}
}
fn example(optstr: &str) {
let mut options: &[&str] = &[];
if optstr != "" {
options = optstr.split(",").collect::<Vec<_>>().as_slice();
}
fuse::mount(options)
}
Which gives the following error:
error[E0308]: mismatched types
--> src/main.rs:12:17
|
12 | fuse::mount(options)
| ^^^^^^^ expected struct `std::ffi::OsStr`, found str
|
= note: expected type `&[&std::ffi::OsStr]`
found type `&[&str]`
I was under the impression that all &strs were also OsStrs, but I'm new to Rust, so I guess that's wrong.
Use OsStr::new:
use std::ffi::OsStr;
fn main() {
let a_string: &str = "Hello world";
let an_os_str: &OsStr = OsStr::new(a_string);
println!("{:?}", an_os_str);
}
Note that the explicit type specification is not necessary; I'm just including it for educational purposes.
In your specific case:
let options: Vec<_> = optstr.split(",").map(OsStr::new).collect();
fuse::mount(&options)
It's actually rather rare to need to do this explicitly, however. Most of the time, functions accept a type that implements AsRef<OsStr>. This would allow you to pass more types without having to think about it. You may want to consider asking the maintainer or submitting a patch to the library to make it more generic.
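To illustrate what such a generic signature could look like, here is a small sketch. The function mount_generic is hypothetical and is not rust-fuse's API; it only collects the options so the conversion can be observed.

```rust
use std::ffi::{OsStr, OsString};

/// Hypothetical generic variant of a `mount`-style function. The real
/// rust-fuse signature is the `&[&OsStr]` one shown in the question.
fn mount_generic<S: AsRef<OsStr>>(options: &[S]) -> Vec<OsString> {
    // as_ref() yields an &OsStr for every supported input type.
    options.iter().map(|o| o.as_ref().to_os_string()).collect()
}

fn main() {
    // &str and String both implement AsRef<OsStr>, so both calls compile.
    let from_strs = mount_generic(&["ro", "noatime"]);
    let from_strings = mount_generic(&[String::from("ro"), String::from("noatime")]);
    assert_eq!(from_strs, from_strings);
    println!("{:?}", from_strs);
}
```

With a bound like this, the caller could pass the result of optstr.split(",") collected into a Vec<&str> without mapping through OsStr::new first.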
I'm trying to port my library clog to the latest Rust version.
Rust changed a lot in the previous month and so I'm scratching my head over this code asking myself if there's really no way anymore to write this in a completely chained way?
fn get_last_commit () -> String {
let output = Command::new("git")
.arg("rev-parse")
.arg("HEAD")
.output()
.ok().expect("error invoking git rev-parse");
let encoded = String::from_utf8(output.stdout).ok().expect("error parsing output of git rev-parse");
encoded
}
In an older version of Rust the code could be written like this:
pub fn get_last_commit () -> String {
Command::new("git")
.arg("rev-parse")
.arg("HEAD")
.spawn()
.ok().expect("failed to invoke rev-parse")
.stdout.as_mut().unwrap().read_to_string()
.ok().expect("failed to get last commit")
}
It seems there is no read_to_string() method anymore that doesn't take a buffer which makes it hard to implement a chaining API unless I'm missing something.
UPDATE
Ok, I figured I can use map to get it chaining.
fn get_last_commit () -> String {
Command::new("git")
.arg("rev-parse")
.arg("HEAD")
.output()
.map(|output| {
String::from_utf8(output.stdout).ok().expect("error reading into string")
})
.ok().expect("error invoking git rev-parse")
}
Actually, I wonder if I could use and_then, but it seems the error types don't line up correctly ;)
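One way to make the error types line up for and_then is to convert both errors into a common type first. The following is a sketch under that assumption, using Box<dyn Error> as the common type; the helper name stdout_of is hypothetical, and it does not check the exit status of the command.

```rust
use std::error::Error;
use std::process::Command;

/// Runs `cmd` and returns its stdout as a String.
/// `stdout_of` is a hypothetical helper name used for illustration.
fn stdout_of(cmd: &mut Command) -> Result<String, Box<dyn Error>> {
    cmd.output()
        // io::Error -> Box<dyn Error>, so both steps share one error type
        .map_err(|e| Box::new(e) as Box<dyn Error>)
        .and_then(|output| {
            // FromUtf8Error -> Box<dyn Error>
            String::from_utf8(output.stdout).map_err(|e| Box::new(e) as Box<dyn Error>)
        })
}

fn main() {
    match stdout_of(Command::new("git").arg("rev-parse").arg("HEAD")) {
        Ok(commit) => println!("{}", commit.trim_end()),
        Err(e) => println!("could not read last commit: {}", e),
    }
}
```

Returning a Result instead of calling expect leaves the decision about panicking to the caller.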
As others have said, this was changed to allow reusing buffers/avoiding allocations.
Another alternative is to use read_to_string and manually provide the buffer:
pub fn get_last_commit () -> String {
let mut string = String::new();
Command::new("git")
.arg("rev-parse")
.arg("HEAD")
.spawn()
.ok().expect("failed to invoke rev-parse")
.stdout.as_mut().unwrap()
.read_to_string(&mut string)
.ok().expect("failed to get last commit");
string
}
This API was changed so that you didn't have to re-allocate a new String each time. However, as you've noticed, there's some convenience loss if you don't care about allocation. It might be a good idea to suggest re-adding this back in, like what happened with Vec::from_elem. Maybe open a small RFC?
While it may make sense to try to add this back to the standard library, here's a version of read_to_string that allocates on its own that you can use today:
use std::io::{self, Cursor, Read};
trait MyRead: Read {
fn read_full_string(&mut self) -> io::Result<String> {
let mut s = String::new();
let r = self.read_to_string(&mut s);
r.map(|_| s)
}
}
impl<T> MyRead for T where T: Read {}
fn main() {
let bytes = b"hello";
let mut input = Cursor::new(bytes);
let s = input.read_full_string();
println!("{}", s.unwrap());
}
This should allow you to use the chaining style you had before.
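On newer toolchains (Rust 1.65 and later) the standard library provides a free function, std::io::read_to_string, that allocates the String itself. A minimal sketch of using it; the wrapper name read_all is hypothetical and only there to make the call easy to exercise.

```rust
use std::io::{self, Cursor, Read};

/// Thin wrapper around the standard library call.
/// `read_all` is a hypothetical helper name used for illustration.
fn read_all<R: Read>(reader: R) -> io::Result<String> {
    // Reads until EOF and returns a freshly allocated String.
    io::read_to_string(reader)
}

fn main() -> io::Result<()> {
    let s = read_all(Cursor::new(b"hello"))?;
    println!("{}", s);
    Ok(())
}
```

Because the function takes the reader by value and returns the String, it fits at the end of a method chain without a separate buffer variable.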