Concisely initializing a vector of Strings - rust

I'm trying to create a vector of Strings to test arg parsing (since this is what std::env::args() returns) but struggling with how to do this concisely.
What I want:
let test_args = vec!["-w", "60", "arg"]; // should be a vector of Strings
let expected_results = my_arg_parser(test_args);
This obviously doesn't work because the vector's contents are all &strs.
Using String::from works, but it doesn't scale well and is ugly :)
let args = vec![String::from("-w"), String::from("60"), String::from("args")];
I could map over the references and return string objects, but this seems very verbose:
let args = vec!["-w", "60", "args"].iter().map(|x| x.to_string()).collect::<Vec<String>>();
Should I just create a helper function to do the conversion, or is there an easier way?

You can use the to_string() method directly on the literals:
let test_args = vec!["-w".to_string(), "60".to_string(), "arg".to_string()];
Otherwise a macro to do this would be as simple as:
macro_rules! vec_of_strings {
($($x:expr),*) => (vec![$($x.to_string()),*]);
}
See play.rust.org example
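For completeness, here is a minimal self-contained sketch of the macro in use. The optional trailing-comma pattern and the my_arg_parser body are assumptions added for illustration; the question does not show the real parser.

```rust
// Helper macro from the answer above, with an optional trailing comma
// (the `$(,)?` part is an addition, not in the original answer).
macro_rules! vec_of_strings {
    ($($x:expr),* $(,)?) => (vec![$($x.to_string()),*]);
}

// Hypothetical stand-in for the parser under test; it only counts the arguments.
fn my_arg_parser(args: Vec<String>) -> usize {
    args.len()
}

fn main() {
    // Each literal is converted with to_string(), so this is a Vec<String>.
    let test_args = vec_of_strings!["-w", "60", "arg"];
    println!("{}", my_arg_parser(test_args));
}
```

Because the macro calls to_string() on every element, it should also accept anything else that implements ToString, such as integers.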

JDemler already provided a nice answer. I have two additional things to say:
First, you can also use into() instead of to_string() for all elements but the first. This is slightly shorter and also equivalent to to_string()/String::from(). Looks like this:
vec!["a".to_string(), "b".into(), "c".into()];
Second, you might want to redesign your arg parsing. I will assume here that you won't mutate the Strings you get from env::args(). I imagine your current function to look like:
fn parse_args(args: &[String]) -> SomeResult { ... }
But you can make that function more generic by not accepting Strings but AsRef<str>. It would look like this:
fn parse_args<T: AsRef<str>>(args: &[T]) -> SomeResult { ... }
In the documentation you can see that String as well as str itself implement that trait. Therefore you can pass a &[String] and a &[&str] into your function. Awesome, eh?
In similar fashion, if you want to accept anything that can be converted into an owned String, you can accept <T: Into<String>> and if you want to return either a String or an &str, you can use Cow. You can read more about that here and here.
Apart from all that: there are plenty of good CLI-Arg parsers out there (clap-rs, docopt-rs, ...), so you might not need to write your own.
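To make the AsRef<str> suggestion concrete, here is a rough sketch. The parsing logic (looking for the value after "-w") and the Option<u32> return type are made up for the example and stand in for SomeResult.

```rust
// Hypothetical parser: returns the number following "-w", if there is one.
// T: AsRef<str> lets callers pass &[&str] as well as &[String].
fn parse_args<T: AsRef<str>>(args: &[T]) -> Option<u32> {
    let mut iter = args.iter().map(|a| a.as_ref());
    while let Some(arg) = iter.next() {
        if arg == "-w" {
            // Take the next argument and try to parse it as a number.
            return iter.next().and_then(|v| v.parse().ok());
        }
    }
    None
}

fn main() {
    // In tests: plain string literals, no conversion needed.
    let from_literals = parse_args(&["-w", "60", "arg"]);
    // In production: owned Strings, as collected from std::env::args().
    let owned: Vec<String> = vec!["-w".to_string(), "60".to_string()];
    let from_owned = parse_args(&owned);
    println!("{:?} {:?}", from_literals, from_owned);
}
```

With this signature the test from the question needs no String conversion at all.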

I agree that Lukas Kalbertodt's answer is the best — use generics to accept anything that can look like a slice of strings.
However, you can clean up the map version a little bit:
There's no need to allocate a vector for the initial set of strings.
There's no need to use the complete type (Vec<String>); you could specify just the collection (Vec<_>). If you pass the result to a function that only accepts a Vec<String>, then you don't need any explicit types at all; it can be completely inferred.
You can use a slightly shorter s.into() in the map.
fn do_stuff_with_args(args: Vec<String>) { println!("{}", args.len()) }
fn main() {
let args = ["-w", "60", "args"].iter().map(|&s| s.into()).collect();
do_stuff_with_args(args);
}

Related

Rust Manipulating Strings in Functions

I'm new to Rust, and I want to process strings in a function in Rust and then return a struct that contains the results of that processing to use in more functions. This is very simplified and a bit messier because of all my attempts to get this working, but:
struct Strucc<'a> {
string: &'a str,
booool: bool
}
fn do_stuff2<'a>(input: &'a str) -> Result<Strucc, &str> {
let to_split = input.to_lowercase();
let splitter = to_split.split("/");
let mut array: Vec<&str> = Vec::new();
for split in splitter {
array.push(split);
}
let var = array[0];
println!("{}", var);
let result = Strucc{
string: array[0],
booool: false
};
Ok(result)
}
The issue is that to convert the &str to lowercase, I have to create a new String that's owned by the function.
As I understand it, the reason this won't compile is because when I split the new String I created, all the &strs I get from it are substrings of the String, which are all still owned by the function, and so when the value is returned and that String goes out of scope, the value in the struct I returned gets erased.
I tried to fix this with lifetimes (as you can see in the function definition), but from what I can tell I'd have to give the String a lifetime which I can't do as far as I'm aware because it isn't borrowed. Either that or I need to make the struct own that String (which I also don't understand how to do, nor does it seem reasonable as I'd have to make the struct mutable).
Also as a sidenote: Previously I have tried to just use a String in the struct but I want to define constants which won't work with that, and I still don't think it would solve the issue. I've also tried to use .clone() in various places just in case but had no luck (though I know why this shouldn't work anyway).
I have been looking for some solution for this for hours and it feels like such a small step so I feel I may be asking the wrong questions or have missed something simple but please explain it like I'm five because I'm very confused.
I think you misunderstand what &str actually is. &str is just a pointer to the string data plus a length. The point of &str is to be an immutable reference to a specific string, which enables all sorts of nice optimizations. When you attempt to turn the &str lowercase, Rust needs somewhere to put the data, and the only place to put it would be a String, because Strings own their data. Take a look at this post for more information.
Your goal is unachievable without Strucc containing a String, since .to_lowercase() has to create new data, and you have to allocate the resulting data somewhere in order to own a reference to it. The best place to put the resulting data would be the returned struct, i.e. Strucc, and therefore Strucc must contain a String.
Also as a sidenote: Previously I have tried to just use a String in the struct but I want to define constants which won't work with that, and I still don't think it would solve the issue.
You can use "x".to_owned() to create a String from a literal.
If you're trying to create a global constant, look at once_cell's lazy global initialization.
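A minimal sketch of what the owning version could look like, assuming the struct no longer needs a lifetime parameter. The error message and the simplified body are mine, not from the question.

```rust
// Owned variant of the struct: it stores a String, so the lowercased data
// lives exactly as long as the struct does.
#[derive(Debug, PartialEq)]
struct Strucc {
    string: String,
    booool: bool,
}

fn do_stuff2(input: &str) -> Result<Strucc, &'static str> {
    // to_lowercase() allocates a new String owned by this function.
    let lowered = input.to_lowercase();
    // split() always yields at least one item; the error branch only mirrors
    // the Result in the original signature ("empty input" is a placeholder).
    let first = lowered.split('/').next().ok_or("empty input")?;
    Ok(Strucc {
        // Copy the borrowed piece into a String that the struct owns.
        string: first.to_owned(),
        booool: false,
    })
}

fn main() {
    let result = do_stuff2("Foo/Bar").unwrap();
    println!("{} {}", result.string, result.booool);
}
```

Note that the struct does not have to be mutable for this: the String is moved into it once at construction time.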

Writing expression in polars-lazy in rust

I need to write my own expression in polars_lazy. Based on my understanding of the source code, I need to write a function that returns Expr::Function. The problem is that in order to construct an object of this type, an object of type FunctionOptions must be provided. The caveat is that this struct is public but its members are pub(crate), and thus one cannot construct such an object outside of the crate.
Are there ways around this?
I don't think you're meant to directly construct Exprs. Instead, you can use functions like polars_lazy::dsl::col() and polars_lazy::dsl::lit() to create expressions, then use methods on Expr to build up the expression. Several of those methods, such as map() and apply(), will give you an Expr::Function.
Personally I think the Rust API for polars is not well documented enough to really use yet. Although the other answer and comments mention apply and map, they don't mention how or the trade-offs. I hope this answer prompts others to correct me with the "right" way to do things.
So first, here's how to use apply on a lazy dataframe. Lazy dataframes don't take apply directly as a method the way eager ones do, so it goes through with_column. This version overwrites the column in place:
// not sure how you'd find this type easily from apply documentation
let o = GetOutput::from_type(DataType::UInt32);
// this mutates two in place
let lf = lf.with_column(col("two").apply(str_to_len, o));
And here's how to use it while not mutating the source column and adding a new output column instead:
let o = GetOutput::from_type(DataType::UInt32);
// this adds new column len, two is unchanged
let lf = lf.with_column(col("two").alias("len").apply(str_to_len, o));
With the str_to_len looking like:
fn str_to_len(str_val: Series) -> Result<Series> {
let x = str_val
.utf8()
.unwrap()
.into_iter()
// your actual custom function would be in this map
.map(|opt_name: Option<&str>| opt_name.map(|name: &str| name.len() as u32))
.collect::<UInt32Chunked>();
Ok(x.into_series())
}
Note that it takes Series rather than &Series and wraps in Result.
With a regular (non-lazy) dataframe, apply still mutates but doesn't require with_column:
df.apply("two", str_to_len).expect("applied");
Whereas eager/non-lazy's with_column doesn't require apply:
// the fn we use to make the column names it too
df.with_column(str_to_len(df.column("two").expect("has two"))).expect("with_column");
And str_to_len has slightly different signature:
fn str_to_len(str_val: &Series) -> Series {
let mut x = str_val
.utf8()
.unwrap()
.into_iter()
.map(|opt_name: Option<&str>| opt_name.map(|name: &str| name.len() as u32))
.collect::<UInt32Chunked>();
// NB. this is naming the chunked array, before we even get to a series
x.rename("len");
x.into_series()
}
I know there are reasons to have lazy and eager operate differently, but I wish the Rust documentation made this easier to figure out.

Ergonomically passing a slice of trait objects

I am converting a variety of types to String when they are passed to a function. I'm not concerned about performance as much as ergonomics, so I want the conversion to be implicit. The original, less generic implementation of the function simply used &[impl Into<String>], but I think that it should be possible to pass a variety of types at once without manually converting each to a string.
The key is that ideally, all of the following cases should be valid calls to my function:
// String literals
perform_tasks(&["Hello", "world"]);
// Owned strings
perform_tasks(&[String::from("foo"), String::from("bar")]);
// Non-string types
perform_tasks(&[1,2,3]);
// A mix of any of them
perform_tasks(&["All", 3, String::from("types!")]);
Some various signatures I've attempted to use:
fn perform_tasks(items: &[impl Into<String>])
The original version fails twice; it can't handle numeric types without manual conversion, and it requires all of the arguments to be the same type.
fn perform_tasks(items: &[impl ToString])
This is slightly closer, but it still requires all of the arguments to be of one type.
fn perform_tasks(items: &[&dyn ToString])
Doing it this way is almost enough, but it won't compile unless I manually add a borrow on each argument.
And that's where we are. I suspect that either Borrow or AsRef will be involved in a solution, but I haven't found a way to get them to handle this situation. For convenience, here is a playground link to the final signature in use (without the needed references for it to compile), alongside the various tests.
The following way works for the first three cases if I understand your intention correctly.
pub fn perform_tasks<I, A>(values: I) -> Vec<String>
where
A: ToString,
I: IntoIterator<Item = A>,
{
values.into_iter().map(|s| s.to_string()).collect()
}
As the other comments pointed out, Rust does not support an array of mixed types. However, you can do one extra step to convert them into a &[&dyn fmt::Display] and then call the same function perform_tasks to get their strings.
let slice: &[&dyn std::fmt::Display] = &[&"All", &3, &String::from("types!")];
perform_tasks(slice);
Here is the playground.
If I understand your intention correctly, what you want is something like this:
fn main() {
let a = 1;
myfn(a);
}
fn myfn(i: &dyn SomeTrait) {
//do something
}
So you want the object to be borrowed implicitly when it is passed as a function argument. However, Rust won't let you borrow implicitly here: borrowing is an important safety mechanism in Rust, and the explicit & helps other programmers quickly identify which values are references and which are not. Rust therefore requires the & to avoid confusion.
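For reference, this is roughly what the &[&dyn ToString] signature from the question looks like once the explicit borrows are added. The Vec<String> return type is an assumption so that the result can be printed.

```rust
// Each element is a borrowed trait object, so a single slice can mix types.
fn perform_tasks(items: &[&dyn ToString]) -> Vec<String> {
    items.iter().map(|item| item.to_string()).collect()
}

fn main() {
    // The & on every element is the manual borrow the question wanted to avoid.
    let out = perform_tasks(&[&"All", &3, &String::from("types!")]);
    println!("{:?}", out);
}
```

The call site is noisier than the ideal in the question, but it is the only one of the attempted signatures that accepts a mix of types in one slice.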

Vec<PathBuf> to &[&Path] without allocations?

I have a function that returns Vec<PathBuf> and a function that accepts &[&Path], basically like this:
use std::path::{Path, PathBuf};
fn f(paths: &[&Path]) {
}
fn main() {
let a: Vec<PathBuf> = vec![PathBuf::from("/tmp/a.txt"), PathBuf::from("/tmp/b.txt")];
f(&a[..]);
}
Is it possible to convert Vec<PathBuf> to &[&Path] without memory allocations?
If not, how should I change f's signature to accept slices of both Path and PathBuf?
Is it possible to convert Vec<PathBuf> to &[&Path] without memory allocations?
No, as answered by How do I write a function that takes both owned and non-owned string collections?; a PathBuf and a Path have different memory layouts (the answer uses String and str; the concepts are the same).
how should I change f's signature to accept slices of both Path and PathBuf?
Again as suggested in How do I write a function that takes both owned and non-owned string collections?, use AsRef:
use std::path::{Path, PathBuf};
fn f<P>(paths: &[P])
where P: AsRef<Path>
{}
fn main() {
let a = vec![PathBuf::from("/tmp/a.txt")];
let b = vec![Path::new("/tmp/b.txt")];
f(&a);
f(&b);
}
This requires no additional heap allocation.
To pass around a slice, the original data has to be held somewhere. A &[&Path] needs to point into something like a Vec<&Path>, but you don't have one of those; you have a Vec<PathBuf>.
To get this to work with your existing signatures, you can make a temporary Vec<&Path> and then take a slice of it.
fn f(paths: &[&Path]) {
}
fn main() {
let a: Vec<PathBuf> = vec![PathBuf::from("/tmp/a.txt"), PathBuf::from("/tmp/b.txt")];
let paths: Vec<&Path> = a.iter().map(PathBuf::as_path).collect();
f(&paths[..]);
}
Even though this creates a new Vec, it only holds one pointer and length per path (in a small heap allocation of its own) - it doesn't have to actually copy any of the path data.
It's not possible to just cast without any allocations, because their layout in memory is different.
Vec<PathBuf> stores the data inline, and [&Path] stores pointers to the data (it's roughly similar to Vec<&PathBuf>).
You need to create a new vector to hold the pointers. If the size is known at compile time, you could use a stack-allocated array for it. Otherwise map+collect is needed.
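A small sketch of both options, assuming exactly two paths for the fixed-size case. The usize return value on f is added only so the calls produce visible output; the original f returns nothing.

```rust
use std::path::{Path, PathBuf};

// Same parameter type as in the question; returns the count for illustration.
fn f(paths: &[&Path]) -> usize {
    paths.len()
}

fn main() {
    let a: Vec<PathBuf> = vec![PathBuf::from("/tmp/a.txt"), PathBuf::from("/tmp/b.txt")];

    // Length known at compile time: a fixed-size array on the stack holds the borrows.
    let fixed: [&Path; 2] = [a[0].as_path(), a[1].as_path()];
    println!("{}", f(&fixed));

    // Length only known at run time: collect the borrows into a temporary Vec.
    let dynamic: Vec<&Path> = a.iter().map(PathBuf::as_path).collect();
    println!("{}", f(&dynamic));
}
```

In both cases only references are stored; the path data stays in the original Vec<PathBuf>.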

How to write a fn that processes input and returns an iterator instead of the full result?

Forgive me if this is a dumb question, but I'm new to Rust, and having a hard time writing this toy program to test my understanding.
I want a function that given a string, returns the first word in each line, as an iterator (because the input could be huge, I don't want to buffer the result as an array). Here's the program I wrote which collects the result as an array first:
fn get_first_words(input: ~str) -> ~[&str] {
return input.lines_any().filter_map(|x| x.split_str(" ").nth(0)).collect();
}
fn main() {
let s = ~"Hello World\nFoo Bar";
let words = get_first_words(s);
for word in words.iter() {
println!("{}", word);
}
}
Result (as expected):
Hello
Foo
How do I modify this to return an Iterator instead? I'm apparently not allowed to make Iterator<&str> the return type. If I try @Iterator<&str>, rustc says
error: The managed box syntax is being replaced by the `std::gc::Gc` and `std::rc::Rc` types. Equivalent functionality to managed trait objects will be implemented but is currently missing.
I can't figure out for the life of me how to make that work.
Similarly, trying to return ~Iterator<&str> makes rustc complain that the actual type is std::iter::FilterMap<....blah...>.
In C# this is really easy, as you simply return the result of the equivalent map call as an IEnumerable<string>. Then the caller doesn't have to know what the actual type is that's returned, it only uses methods available in the IEnumerable interface.
Is there nothing like returning an interface in Rust??
(I'm using Rust 0.10)
I believe that the equivalent of the C# example would be returning ~Iterator<&str>. This can be done, but must be written explicitly: rather than returning x, return ~x as ~Iterator<&'a str>. (By the way, your function is going to have to take &'a str rather than ~str—if you don’t know why, ask and I’ll explain.)
This is not, however, idiomatic Rust because it is needlessly inefficient. The idiomatic Rust is to list the return type explicitly. You can specify it in one place like this if you like:
use std::iter::{FilterMap, Map};
use std::str::CharSplits;
type Foo<'a> = FilterMap<'a, &'a str, &'a str,
Map<'a, &'a str, &'a str,
CharSplits<'a, char>>>;
And then list Foo as the return type.
Yes, this is cumbersome. At present, there is no such thing as inferring a return type in any way. This has, however, been discussed and I believe it likely that it will come eventually in some syntax similar to fn foo<'a>(&'a str) -> Iterator<&'a str>. For now, though, there is no fancy sugar.
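For readers on current Rust rather than 0.10: this did eventually land as impl Trait in return position (stable since Rust 1.26). A rough modern equivalent of the function from the question, using today's lines() and split() in place of lines_any() and split_str(), might look like this:

```rust
// `impl Iterator` hides the concrete FilterMap<...> type from the caller,
// much like returning IEnumerable<string> does in C#.
fn get_first_words<'a>(input: &'a str) -> impl Iterator<Item = &'a str> {
    input.lines().filter_map(|line| line.split(' ').next())
}

fn main() {
    let s = "Hello World\nFoo Bar";
    // Words are produced lazily; nothing is collected into a buffer.
    for word in get_first_words(s) {
        println!("{}", word);
    }
}
```

Unlike a boxed trait object, this is still statically dispatched, so it avoids the inefficiency mentioned above.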

Resources