I was wondering how to convert a styled string into a vector. Say I had a String with the value:
"[x, y]"
How could I turn it into a vector that has x as the first object and y as the second object?
Thanks!
Sure, but the elements can't be references to variables. As mentioned by @prog-fh, that isn't possible in Rust: once the program is compiled, variable names are generally not stored, and the compiler may even have removed some variables during optimization.
You can, however, do something similar to Python's ast.literal_eval using serde with Rusty Object Notation (RON, a serialization format made to resemble Rust data structures). It isn't perfect, but it is an option. It does, however, require that you know which types you are trying to parse.
use ron::from_str;
let input = "[37.6, 24.3, 89.023]";
let parsed: Vec<f32> = from_str(input).unwrap();
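If pulling in a crate is not an option, the same idea can be sketched with the standard library alone. This is only an illustration under simple assumptions: parse_list is a hypothetical helper (not part of ron or serde), the elements are separated by commas, and none of them contains a comma or bracket itself.

```rust
use std::str::FromStr;

/// Hypothetical helper: parses text such as "[1.5, 2, 3]" into a Vec<T>.
/// Returns None if the brackets are missing or any element fails to parse.
fn parse_list<T: FromStr>(input: &str) -> Option<Vec<T>> {
    // Both brackets are required; `?` bails out with None otherwise.
    let inner = input.trim().strip_prefix('[')?.strip_suffix(']')?;
    if inner.trim().is_empty() {
        return Some(Vec::new()); // "[]" is an empty list
    }
    inner
        .split(',')
        .map(|item| item.trim().parse().ok())
        .collect() // the first None makes the whole result None
}
```

As with the RON version, the caller still has to name the target type, for example parse_list::<f32>(input).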
On the other hand, if @mcarton is correct and you want something like vec!["x", "y"], you could manually parse it like so:
fn parse(input: &str) -> Option<Vec<String>> {
    let mut part = String::new();
    let mut collected = Vec::new();
    let mut char_iter = input.chars();
    // The input must start with an opening bracket.
    if char_iter.next() != Some('[') {
        return None;
    }
    loop {
        // `?` returns None if the input ends before the closing bracket.
        match char_iter.next()? {
            ']' => {
                // Skip the final push for inputs such as "[]" or "[a, ]".
                if !part.is_empty() {
                    collected.push(part);
                }
                return Some(collected);
            }
            ',' | ' ' => {
                if !part.is_empty() {
                    collected.push(part);
                    part = String::new();
                }
            }
            x => part.push(x),
        }
    }
}
println!("{:?}", parse("[a, b, foo]"));
Or you could also use a regex to break it up instead, but you can look into how that works yourself.
I can't figure out how to do import- and instancing-lines such that they tolerate non-existing files/modules and structs.
I tried making a macro that expands into such lines based on which files it finds in the directory, using a promising crate I found, include_optional, which can check for the existence of files at compile time (since it's a macro).
However, I can't figure out how to use it properly in a macro, nor did I manage to use it without a macro by following the example at the bottom of the docs' conditional compilation chapter.
if cfg!(unix) { "unix" } else if cfg!(windows) { "windows" } else { "unknown" } (from the docs)
vs
if include_optional::include_bytes_optional!("day1.rs").is_some() { Some(&day01::Day01 {}) } else { None } // assume day1.rs and thus Day01 are non-existent (my attempt at doing same thing)
My if-statement compiles both branches, including the unreachable code (causing a compilation error), even though according to the docs it supposedly doesn't for cfg! ("conditional compilation").
Essentially, what I want is something of this form:
// Macro to generate code based on how many files/structs has been created
// There are anywhere between 1-25 days
get_days_created!();
/* // Let's assume 11 have been created so far, then the macro should evaluate to this code:
* mod day1;
* use day1 as day01;
* // ...
* mod day11;
* use day11 as day11;
*
* // ...
* fn main() -> Result<(), i32> {
* let days : Vec<&dyn Day> = vec![
* &day01::Day01 {},
* // ...
* &day11::Day11 {},
* ];
* // ...
* }
*/
The solution is to create a proc_macro. These function similarly to regular macros, except that you write an ordinary Rust function that runs at compile time: it is given a TokenStream (the tokens passed to the macro) and returns a TokenStream (the tokens the macro should expand to).
To create a proc_macro, the first and most important thing to know is that you can't declare one just anywhere. Instead, you need to create a new library crate and set proc-macro = true in its Cargo.toml file. Then you can declare the macros in its lib.rs. An example TOML would look something like this:
[package]
name = "some_proc_macro_lib"
version = "0.1.0"
edition = "2021"
[lib]
proc-macro = true
[dependencies]
glob = "0.3.0"
regex = "1.7.0"
Then you can create your macros in this library as regular functions, with the #[proc_macro] attribute/annotation. Here's an example lib.rs with as few dependencies as possible. For my exact question, the input TokenStream is irrelevant and can be ignored, and instead you want to generate and return a new one:
use glob::glob;
use proc_macro::TokenStream;
use regex::Regex;
#[proc_macro]
pub fn import_days(_: TokenStream) -> TokenStream {
    let mut stream = TokenStream::new();
    // Match the non-digit prefix explicitly: a greedy `.+(\d+)` would capture
    // only the last digit, so `day11` would be read as day 1.
    let re = Regex::new(r"^\D+(\d+)$").unwrap();
    for entry in glob("./src/day*.rs").expect("Failed to read pattern") {
        if let Ok(path) = entry {
            let prefix = path.file_stem().unwrap().to_str().unwrap();
            if let Some(caps) = re.captures(prefix) {
                let n: u32 = caps.get(1).unwrap().as_str().parse().unwrap();
                let day = prefix;
                let day_padded = format!("day{:0>2}", n);
                stream.extend(format!("mod {};", day).parse::<TokenStream>().unwrap());
                // Only alias when the module name is not already zero-padded;
                // `use day01 as day01;` would define the name twice.
                if day != day_padded {
                    stream.extend(format!("use {} as {};", day, day_padded).parse::<TokenStream>().unwrap());
                }
            }
        }
    }
    stream
}
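The file-name handling in import_days can be tried out without setting up a proc-macro crate. The following is only a sketch under stated assumptions: day_names is a hypothetical helper (it is not part of the macro above), it uses plain string methods instead of a regex, and it assumes file stems of the form day<number>.

```rust
/// Hypothetical helper: splits a file stem such as "day7" into
/// (module name, zero-padded alias). Returns None for any other shape.
fn day_names(stem: &str) -> Option<(String, String)> {
    // Everything after the "day" prefix must be ASCII digits.
    let digits = stem.strip_prefix("day")?;
    if digits.is_empty() || !digits.bytes().all(|b| b.is_ascii_digit()) {
        return None;
    }
    let n: u32 = digits.parse().ok()?;
    // `{:0>2}` right-aligns the number and pads it with zeros to width 2.
    Some((stem.to_string(), format!("day{:0>2}", n)))
}
```

With this, a stem of day1 maps to the alias day01, while day11 keeps its name, which is the mapping the generated mod and use lines rely on.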
The question could be considered answered with this already, but the answer can and should be further expanded on in my opinion. And as such I will do so.
Some additional explanations and suggestions, beyond the scope of the question
There are however quite a few other crates beside proc_macro that can aid you with both parsing the input stream, and building the output one. Of note are the dependencies syn and quote, and to aid them both there's the crate proc_macro2.
The syn crate
With syn you get helpful types, methods and macros for parsing the input TokenStream. Essentially, with a struct Foo implementing syn::parse::Parse and the macro let foo = syn::parse_macro_input!(input as Foo) you can much more easily parse it into a custom struct thanks to syn::parse::ParseStream. An example would be something like this:
use proc_macro2::Ident;
use syn;
use syn::parse::{Parse, ParseStream};
#[derive(Debug, Default)]
struct Foo {
idents: Vec<Ident>,
}
impl syn::parse::Parse for Foo {
fn parse(input: syn::parse::ParseStream) -> syn::Result<Self> {
let mut foo= Foo::default();
while !input.is_empty() {
let fn_ident = input.parse::<Ident>()?;
foo.idents.push(fn_ident);
// Optional comma: Ok vs Err doesn't matter. Just consume if it exists and ignore failures.
input.parse::<syn::token::Comma>().ok();
}
return Ok(foo);
}
}
Note that the syn::Result return-type allows for nice propagation of parsing-errors when using the sugary ? syntax: input.parse::<SomeType>()?
The quote crate
With quote you get a helpful macro for generating a tokenstream more akin to how macro_rules does it. As an argument you write essentially regular code, and tell it to use the value of variables by prefixing with #.
Do note that you can't just pass it variables containing strings and expect them to expand into identifiers, as strings expand to the literal "foo" (quotes included), i.e. mod "day1"; instead of mod day1;. You need to turn them into either:
a proc_macro2::Ident
syn::Ident::new(foo_str, proc_macro2::Span::call_site())
or a proc_macro2::TokenStream
foo_str.parse::<TokenStream>().unwrap()
The latter also allows converting longer strings containing more than a single Ident, and handles things such as literals, making it possible to skip the quote! macro entirely and just use this TokenStream directly (as seen in import_days).
Here's an example that creates a struct with dynamic name, and implements a specific trait for it:
use proc_macro2::TokenStream;
use quote::quote;
// ...
let mut stream = TokenStream::new();
stream.extend(quote!{
#[derive(Debug)]
pub struct #day_padded_upper {}
impl Day for #day_padded_upper {
#trait_parts
}
});
return proc_macro::TokenStream::from(stream);
Finally, on how to implement my question
This 'chapter' is a bit redundant, as I essentially answered the question with the first two code snippets (the .toml and fn import_days), and the rest could have been considered an exercise for the reader. However, while the question is about reading the filesystem at compile time in a macro to 'dynamically' change its expansion (sort of), I wrote it in a more general form, asking how to achieve a specific result (as old me didn't know macros could do that). So for completeness I'll include this 'chapter' nevertheless.
There is also the fact that the last macro in this 'chapter' - impl_day (which wasn't mentioned at all in the question) - serves as a good example of how to achieve two adjacent but important and relevant tasks.
Retrieving and using call-site's filename.
Parsing the input TokenStream using the syn dependency as shown above.
In other words: knowing all the above, this is how you can create macros for importing all targeted files, instantiating structs for all targeted files, as well as to declare + define the struct from current file's name.
Importing all targeted files:
See import_days above at the start.
Instantiating Vec with structs from all targeted files:
#[proc_macro]
pub fn instantiate_days(_: proc_macro::TokenStream) -> proc_macro::TokenStream {
let re = Regex::new(r"^\D+(\d+)$").unwrap(); // a greedy `.+(\d+)` would capture only the last digit (1 for day11)
let mut stream = TokenStream::new();
let mut block = TokenStream::new();
for entry in glob("./src/day*.rs").expect("Failed to read pattern") {
match entry {
Ok(path) => {
let prefix = path.file_stem().unwrap().to_str().unwrap();
let caps = re.captures(prefix);
if let Some(caps) = caps {
let n: u32 = caps.get(1).unwrap().as_str().parse().unwrap();
let day_padded = &format!("day{:0>2}", n);
let day_padded_upper = &format!("Day{:0>2}", n);
let instance = &format!("&{}::{} {{}}", day_padded, day_padded_upper).parse::<TokenStream>().unwrap();
block.extend(quote!{
v.push( #instance );
});
}
},
Err(e) => println!("{:?}", e),
}
}
stream.extend(quote!{
{
let mut v: Vec<&dyn Day> = Vec::new();
#block
v
}
});
return proc_macro::TokenStream::from(stream);
}
Declaring and defining struct for current file invoking this macro:
#[derive(Debug, Default)]
struct DayParser {
parts: Vec<Ident>,
}
impl Parse for DayParser {
fn parse(input: ParseStream) -> syn::Result<Self> {
let mut day_parser = DayParser::default();
while !input.is_empty() {
let fn_ident = input.parse::<Ident>()?;
// Optional, Ok vs Err doesn't matter. Just consume if it exists.
input.parse::<syn::token::Comma>().ok();
day_parser.parts.push(fn_ident);
}
return Ok(day_parser);
}
}
#[proc_macro]
pub fn impl_day(input: proc_macro::TokenStream) -> proc_macro::TokenStream {
let mut stream = TokenStream::new();
let span = Span::call_site();
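// Note: Span::source_file() is not part of the stable API (assumption: this needs a nightly toolchain).
let binding = span.source_file().path();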
let file = binding.to_str().unwrap();
let re = Regex::new(r".*day(\d+)\.rs$").unwrap(); // escape the dot so it only matches a literal "."
let caps = re.captures(file);
if let Some(caps) = caps {
let n: u32 = caps.get(1).unwrap().as_str().parse().unwrap();
let day_padded_upper = format!("Day{:0>2}", n).parse::<TokenStream>().unwrap();
let day_parser = syn::parse_macro_input!(input as DayParser);
let mut trait_parts = TokenStream::new();
for (k, fn_ident) in day_parser.parts.into_iter().enumerate() {
let k = k+1;
let trait_part_ident = format!("part_{}", k).parse::<TokenStream>().unwrap();
// let trait_part_ident = proc_macro::Ident::new(format!("part_{}", k).as_str(), span);
trait_parts.extend(quote!{
fn #trait_part_ident(&self, input: &str) -> Result<String, ()> {
return Ok(format!("Part {}: {:?}", #k, #fn_ident(input)));
}
});
}
stream.extend(quote!{
#[derive(Debug)]
pub struct #day_padded_upper {}
impl Day for #day_padded_upper {
#trait_parts
}
});
} else {
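// Report the problem as a compile error at the call site instead of only printing it.
let msg = format!("Tried to implement Day for a file with malformed name: file = \"{}\" , re = \"{:?}\"", file, re);
// quote! interpolates a String as a string literal, so this expands to compile_error!("...");
stream.extend(quote!{ compile_error!(#msg); });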
}
return proc_macro::TokenStream::from(stream);
}
The idea here is simple but I have tried three different ways with different errors each time: read in a string as an argument, but if the string is invalid or the string isn't provided, use a default.
Can this be done using Result to detect a valid string or a panic?
The basic structure I expect:
use std::env;
use std::io;
fn main() {
let args: Vec<String> = args().collect();
let word: Result<String, Error> = &args[1].expect("Valid string");
let word: String = match word {
Ok(word) = word,
Err(_) = "World",
}
println!("Hello, {}", word);
}
So, there are a lot of issues in your code.
First and foremost, in a match statement, you do not use =, you use =>.
Additionally, your match is used as an expression: its result is bound to a variable with let. A let statement must end with a semicolon, so a semicolon is needed after the closing brace of the match.
So your match statement would become:
let word: String = match word {
Ok(word) => word,
Err(_) => ...,
};
Next, when you do use std::env, you do not import all of the functions from it into your namespace. All you do is create an alias, so that the compiler turns env::<something> into std::env::<something> automatically.
Therefore, this needs to be changed:
let args: Vec<String> = env::args().collect();
The same problem exists in your next line. What is Error? Well, what you actually mean is io::Error, which is also not in scope for the reason stated above. You might now be wondering why Result does not need to be imported. That is because the standard library has a prelude, a small set of types, traits and functions that is automatically imported into every module. Result is part of it; io::Error is not.
let word: Result<String, io::Error> = ...;
The next part is wrong twice (or even thrice).
First of all, the operation [x] does not return a Result, it returns the value and panics if it is out-of-bounds.
Now, even if it were a Result, this line would still be wrong. Why? Because you call expect(...) on it. That turns a Result into its Ok value (or panics with the given message on Err), so what you get back is no longer a Result.
Now, what you are looking for is the .get(index) operation. It tries to get a value and if it fails, it returns None, so it returns an option. What is an option? It is like a result, but there is no error value. It must be noted that get() returns the option filled with a reference to the string.
The line should look something like this:
let word: Option<&String> = args.get(1);
Now you have two options to handle default values, but before we come to that, I need to tell you why your error value is wrong.
In Rust, there are two kinds of Strings.
There is &str, which you can create like this:
let a: &str = "Hello, World!";
These are immutable, borrowed string slices. A literal like this one is stored in the program binary, so you cannot build one with arbitrary contents at run time without an owner (such as a String) to borrow it from.
On the other hand, we have mutable and heap-allocated Strings.
let mut a: String = String::new();
a.push_str("Hello, World!");
// Or...
let b: String = String::from("Hello, World");
You store your arguments as a String, but in your match statement, you try to return a &str.
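To make the relationship between the two types concrete, here is a small sketch. The function names greet and view are made up for this illustration and do not appear in your program.

```rust
/// Accepts any borrowed string slice; a &String coerces to &str automatically.
fn greet(name: &str) -> String {
    // format! allocates a new, owned String on the heap.
    format!("Hello, {}", name)
}

/// Borrows a &str view of an owned String without copying its contents.
fn view(owned: &String) -> &str {
    owned.as_str()
}
```

Going from String to &str is a cheap borrow, while going from &str to String (for example with to_string() or String::from) allocates.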
So, there are two ways to handle your error:
let word: Option<&String> = args.get(1);
let word: String = match word {
Some(word) => word.to_string(),
None => String::from("World"),
};
If you do not want to allocate that second string, you can also use
let word: Option<&String> = args.get(1);
let word: &str = match word {
Some(word) => word.as_str(),
None => "World",
};
The second option is unwrap_or:
let args: Vec<String> = env::args().collect();
let default = &String::from("World");
let word: &String = args.get(1).unwrap_or(default);
println!("Hello, {}", word);
This requires binding the default value to a variable, which is slightly awkward, but it does the same thing as the match statement above in fewer lines.
This works too:
let word: &str = args.get(1).unwrap_or(default);
So this is my favourite version of your program above:
use std::env;
fn main() {
let args: Vec<String> = env::args().collect();
let default = &String::from("World");
let word: &str = args.get(1).unwrap_or(default);
println!("Hello, {}", word);
}
But this one works too:
use std::env;
fn main() {
let args: Vec<String> = env::args().collect();
let word: Option<&String> = args.get(1); // index 0 is the program name
let word: &str = match word {
Some(word) => word.as_str(),
None => "World",
};
println!("Hello, {}", word);
}
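If you do not need the whole argument vector, a shorter variant is possible. This is a sketch rather than a drop-in replacement: pick_word is a hypothetical helper, written against any iterator of Strings so that it can be exercised without real command-line arguments.

```rust
/// Hypothetical helper: returns the first real argument (index 1) or "World".
/// `args` mirrors what std::env::args() yields: the program name comes first.
fn pick_word<I: Iterator<Item = String>>(mut args: I) -> String {
    // nth(1) skips the program name and yields None if nothing follows it.
    args.nth(1).unwrap_or_else(|| String::from("World"))
}
```

In main this would be called as println!("Hello, {}", pick_word(std::env::args()));. The default String is only allocated when no argument was given.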
use serde_json::json; // 1.0.66
use std::str;
fn main() {
let input = "{\"a\": \"b\\u001fc\"}";
let bytes = input.as_bytes();
let json: serde_json::Value = serde_json::from_slice(bytes).unwrap();
for (_k, v) in json.as_object().unwrap() {
let vec = serde_json::to_vec(v).unwrap();
let utf8_str = str::from_utf8(&vec).unwrap();
println!("value: {}", v);
println!("utf8_str: {}", utf8_str);
println!("bytes: {:?}", vec);
}
}
How can the value of object key "a" be transformed into the following string?
b\u{1f}c
I've tried with serde_json and str::from_utf8, but I always get "b\u001fc" as the result. The escaped character sequence is not interpreted correctly. How can this be solved?
The problem is this line:
let vec = serde_json::to_vec(v).unwrap();
From the serde_json docs on to_vec():
Serialize the given data structure as a JSON byte vector.
You are deserializing from JSON, getting the values of the object, serializing them back to JSON and printing that. You don't want to serialize back to JSON, you want to print the "raw" string, so something like this does what you want:
fn main() {
let input = "{\"a\": \"b\\u001fc\"}";
let bytes = input.as_bytes();
let json: serde_json::Value = serde_json::from_slice(bytes).unwrap();
for (_k, v) in json.as_object().unwrap() {
let string = v.as_str().unwrap();
println!("value: {:?}", string);
}
}
I think things are closer to working than you think. Your problem is not that the escape sequence isn't being interpreted correctly, but rather that serde_json::to_vec(v) essentially re-encodes v (which is serde_json::value::Value::String) into a vector of JSON-encoded bytes. This means that it picks up the surrounding quote characters (byte 34) and turns the escape sequence into a literal ['\\', 'u', ...] — because that's how it would look in JSON.
If you want to get the string value out, you can do this:
for (_k, v) in json.as_object().unwrap() {
if let serde_json::value::Value::String(s) = v {
println!("{:?}", s);
}
}
This prints "b\u{1f}c", the Rust string you want.
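If the surrounding quotes added by {:?} are unwanted, the standard library's str::escape_debug produces the same escapes without them. A minimal sketch, where rust_escaped is a name invented for this example:

```rust
/// Hypothetical helper: renders a string with Rust-style escapes
/// but without the surrounding quotes that {:?} adds.
fn rust_escaped(s: &str) -> String {
    // escape_debug yields the escaped characters one by one.
    s.escape_debug().to_string()
}
```

The decoded value itself is only three characters long (b, the control character U+001F, and c); the escaping happens purely when it is formatted for display.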
I have the following:
A Vec<&str>.
A &str that may contain $0, $1, etc. referencing the elements in the vector.
I want to get a version of my &str where all occurrences of $i are replaced by the ith element of the vector. So if I have vec!["foo", "bar"] and $0$1, the result would be foobar.
My first naive approach was to iterate over i = 1..N and do a search and replace for every index. However, this is a quite ugly and inefficient solution. Also, it gives undesired outputs if any of the values in the vector contains the $ character.
Is there a better way to do this in Rust?
This solution is inspired (including copied test cases) by Shepmaster's, but simplifies things by using the replace_all method.
use regex::{Regex, Captures};
fn template_replace(template: &str, values: &[&str]) -> String {
let regex = Regex::new(r#"\$(\d+)"#).unwrap();
regex.replace_all(template, |captures: &Captures| {
values
.get(index(captures))
.unwrap_or(&"")
}).to_string()
}
fn index(captures: &Captures) -> usize {
captures.get(1)
.unwrap()
.as_str()
.parse()
.unwrap()
}
fn main() {
assert_eq!("ab", template_replace("$0$1", &["a", "b"]));
assert_eq!("$1b", template_replace("$0$1", &["$1", "b"]));
assert_eq!("moo", template_replace("moo", &[]));
assert_eq!("abc", template_replace("a$0b$0c", &[""]));
assert_eq!("abcde", template_replace("a$0c$1e", &["b", "d"]));
println!("It works!");
}
I would use a regex
use regex::Regex; // 1.1.0
fn example(s: &str, vals: &[&str]) -> String {
let r = Regex::new(r#"\$(\d+)"#).unwrap();
let mut start = 0;
let mut new = String::new();
for caps in r.captures_iter(s) {
let m = caps.get(0).expect("Regex group 0 missing");
let d = caps.get(1).expect("Regex group 1 missing");
let d: usize = d.as_str().parse().expect("Could not parse index");
// Copy non-placeholder
new.push_str(&s[start..m.start()]);
// Copy placeholder
new.push_str(&vals[d]);
start = m.end()
}
// Copy non-placeholder
new.push_str(&s[start..]);
new
}
fn main() {
assert_eq!("ab", example("$0$1", &["a", "b"]));
assert_eq!("$1b", example("$0$1", &["$1", "b"]));
assert_eq!("moo", example("moo", &[]));
assert_eq!("abc", example("a$0b$0c", &[""]));
}
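For completeness, the same single-pass substitution can also be sketched without the regex crate. This is an illustration only: fill_template is a hypothetical name, and it assumes that an out-of-range placeholder should be replaced by nothing and that a $ not followed by digits is kept as-is.

```rust
/// Hypothetical regex-free variant: replaces $0, $1, ... with values[i].
fn fill_template(template: &str, values: &[&str]) -> String {
    let mut out = String::with_capacity(template.len());
    let mut rest = template;
    while let Some(pos) = rest.find('$') {
        // Copy the text before the placeholder unchanged.
        out.push_str(&rest[..pos]);
        let after = &rest[pos + 1..];
        // Count the ASCII digits that follow the '$'.
        let digits = after.bytes().take_while(u8::is_ascii_digit).count();
        if digits == 0 {
            out.push('$'); // a lone '$' is not a placeholder
        } else if let Some(value) = after[..digits]
            .parse::<usize>()
            .ok()
            .and_then(|i| values.get(i))
        {
            out.push_str(value);
        }
        // Inserted values are never rescanned, so a '$' inside them is safe.
        rest = &after[digits..];
    }
    out.push_str(rest);
    out
}
```

Because the output is built in one pass over the template, values that contain $ themselves do not trigger further replacements.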
See also:
Split a string keeping the separators
Suppose I'm trying to do a fancy zero-copy parser in Rust using &str, but sometimes I need to modify the text (e.g. to implement variable substitution). I really want to do something like this:
fn main() {
let mut v: Vec<&str> = "Hello there $world!".split_whitespace().collect();
for t in v.iter_mut() {
if (t.contains("$world")) {
*t = &t.replace("$world", "Earth");
}
}
println!("{:?}", &v);
}
But of course the String returned by t.replace() doesn't live long enough. Is there a nice way around this? Perhaps there is a type which means "ideally a &str but if necessary a String"? Or maybe there is a way to use lifetime annotations to tell the compiler that the returned String should be kept alive until the end of main() (or have the same lifetime as v)?
Rust has exactly what you want in the form of the Cow (clone-on-write) type.
use std::borrow::Cow;
fn main() {
let mut v: Vec<_> = "Hello there $world!".split_whitespace()
.map(|s| Cow::Borrowed(s))
.collect();
for t in v.iter_mut() {
if t.contains("$world") {
*t.to_mut() = t.replace("$world", "Earth");
}
}
println!("{:?}", &v);
}
As @sellibitze correctly notes, to_mut() creates a new String, which causes a heap allocation to store the previously borrowed value. If you are sure you only have borrowed strings, then you can use
*t = Cow::Owned(t.replace("$world", "Earth"));
In case the Vec contains Cow::Owned elements, this would still throw away the allocation. You can prevent that using the following very fragile and unsafe code inside your for loop. (It does direct byte-based manipulation of UTF-8 strings and relies on the fact that the replacement happens to be exactly the same number of bytes.)
let mut last_pos = 0; // so we don't start at the beginning every time
while let Some(pos) = t[last_pos..].find("$world") {
let p = pos + last_pos; // find always starts at last_pos
last_pos = p + 5; // continue after the inserted "Earth"; p is the absolute offset, pos is only relative
unsafe {
let s = t.to_mut().as_mut_vec(); // operating on Vec is easier
s.remove(p); // remove $ sign
for (c, sc) in "Earth".bytes().zip(&mut s[p..]) {
*sc = c;
}
}
}
Note that this is tailored exactly to the "$world" -> "Earth" mapping. Any other mappings require careful consideration inside the unsafe code.
Use std::borrow::Cow, specifically as Cow<'a, str>, where 'a is the lifetime of the string being parsed.
use std::borrow::Cow;
fn main() {
let mut v: Vec<Cow<'static, str>> = vec![];
v.push("oh hai".into());
v.push(format!("there, {}.", "Mark").into());
println!("{:?}", v);
}
Produces:
["oh hai", "there, Mark."]
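Applied to the tokenizing example from the question, the idea could look like the sketch below. The helper names substitute and substitute_all are invented for this illustration; the point is that only tokens that actually change are allocated, while all others stay borrowed from the input.

```rust
use std::borrow::Cow;

/// Hypothetical helper: replaces "$world" in one token,
/// allocating only when a replacement actually happens.
fn substitute(token: &str) -> Cow<'_, str> {
    if token.contains("$world") {
        Cow::Owned(token.replace("$world", "Earth"))
    } else {
        Cow::Borrowed(token) // zero-copy: still points into the input
    }
}

/// Splits the text on whitespace and substitutes each token.
fn substitute_all(text: &str) -> Vec<Cow<'_, str>> {
    text.split_whitespace().map(substitute).collect()
}
```

Since Cow<str> dereferences to str, the resulting elements can be used wherever a &str is expected.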