Read excel file with calamine and serde - Cannot find column with alias - rust

I am trying to read an xlsx file in Rust with calamine and serde to eventually replace some Python code I have :)
The Excel file has a header row containing the aliases:
Name    Value
name    1
...     ...
My code:
use std::result::Result;

use calamine::{open_workbook_auto, Error, Reader, RangeDeserializerBuilder};
use serde::Deserialize;

#[derive(Deserialize, Debug)]
struct XlsxRow {
    #[serde(alias = "Name")]
    name: String,
    #[serde(alias = "Value")]
    value: u32,
}

pub(crate) fn read_file(path: std::path::PathBuf) -> Result<(), Error> {
    let mut excel = open_workbook_auto(path).expect("Cannot open file");
    let sheet_names = excel.sheet_names().to_vec();
    let sheet_name = sheet_names.first().unwrap();
    let range = excel
        .worksheet_range(sheet_name)
        .ok_or(calamine::Error::Msg("Cannot find sheet"))??;
    let iter_result = RangeDeserializerBuilder::with_headers(&["name", "age", "email"])
        .from_range::<_, XlsxRow>(&range)?;
    for my_struct in iter_result {
        println!("{:?}", my_struct);
    }
    Ok(())
}
But it doesn't work:
thread 'main' panicked at 'testfile failed: De(HeaderNotFound("name"))'
Based on that answer (which works), I think my code is okay. It also works if I use name and value as the headers in the Excel file.
Is this maybe a bug in how calamine implements serde deserialization, or where am I wrong?

Related

(De)serialize RFC-3339 timestamp with serde to time-rs OffsetDateTime

My goal is to (de)serialize objects with RFC-3339 timestamps from JSON to Rust structs (and vice versa) using serde and time-rs.
I would expect this ...
use serde::Deserialize;
use time::OffsetDateTime;

#[derive(Deserialize)]
pub struct DtoTest {
    pub timestamp: OffsetDateTime,
}

fn main() {
    let deserialization_result =
        serde_json::from_str::<DtoTest>("{\"timestamp\": \"2022-07-08T09:10:11Z\"}");
    let dto = deserialization_result.expect("This should not panic");
    println!("{}", dto.timestamp);
}
... to create the struct and display the timestamp as the output, but I get ...
thread 'main' panicked at 'This should not panic: Error("invalid type: string \"2022-07-08T09:10:11Z\", expected an `OffsetDateTime`", line: 1, column: 36)', src/main.rs:12:38
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
My dependencies look like this:
[dependencies]
serde = { version = "1.0.138", features = ["derive"] }
serde_json = "1.0.82"
time = { version = "0.3.11", features = ["serde"] }
According to the documentation of the time-rs crate, this seems to be possible but I must be missing something.
The default serialization format for time is some internal format. If you want other formats, you should enable the serde-well-known feature and use the serde module to choose the format you want:
#[derive(Deserialize)]
pub struct DtoTest {
    #[serde(with = "time::serde::rfc3339")]
    pub timestamp: OffsetDateTime,
}
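Putting that together, a minimal self-contained sketch of the fix (assuming the serde-well-known feature is enabled for time in Cargo.toml):

// Minimal sketch; assumes Cargo.toml contains:
//   time = { version = "0.3", features = ["serde-well-known"] }
//   serde_json = "1"
use serde::Deserialize;
use time::OffsetDateTime;

#[derive(Deserialize)]
pub struct DtoTest {
    // time::serde::rfc3339 tells serde to parse the string as an RFC-3339 timestamp
    #[serde(with = "time::serde::rfc3339")]
    pub timestamp: OffsetDateTime,
}

fn main() {
    let dto = serde_json::from_str::<DtoTest>("{\"timestamp\": \"2022-07-08T09:10:11Z\"}")
        .expect("This should not panic");
    println!("{}", dto.timestamp);
}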
The solution below is based on the serde_with crate. As per its documentation, it aims to be more flexible and composable.
use serde::Deserialize;
use serde_with::serde_as;
use time::format_description::well_known::Rfc3339;
use time::OffsetDateTime;

#[serde_as]
#[derive(Deserialize)]
pub struct DtoTest {
    #[serde_as(as = "Rfc3339")]
    pub timestamp: OffsetDateTime,
}
And the Cargo.toml file should have:
[dependencies]
serde_with = { version = "2", features = ["time_0_3"] }
The serde_with documentation lists all available De/Serialize transformations.

Passing a data structure as an args to a proc macro attribute

I'm trying to create a macro to use on a struct that takes another struct as an argument and then concatenates the two structs' fields. This is the code of the macro:
use quote::quote;
use syn::{parse_macro_input, ItemStruct};

#[proc_macro_attribute]
pub fn cat(args: proc_macro::TokenStream, input: proc_macro::TokenStream) -> proc_macro::TokenStream {
    let mut base_struct = parse_macro_input!(input as ItemStruct);
    let mut new_fields_struct = parse_macro_input!(args as ItemStruct);
    if let syn::Fields::Named(ref mut base_fields) = base_struct.fields {
        if let syn::Fields::Named(ref mut new_fields) = new_fields_struct.fields {
            for field in new_fields.named.iter() {
                base_fields.named.push(field.clone());
            }
        }
    }
    return quote! {
        #base_struct
    }
    .into();
}
Here's a basic example of how I want it to be used:
struct new_fields {
    field: usize,
}

#[cat(new_fields)]
struct base_struct {
    base_field: usize,
}
However, it will just convert the word 'new_fields' into a TokenStream and then fail to compile because it isn't a struct. Is there a way to 'pass' a struct to the macro as an argument without writing the code inside the macro usage?
Is there a way to 'pass' a struct to the macro as an argument without writing the code inside the macro usage?
No. Macros operate on tokens they are given and nothing else.
It's hard to give advice without knowing more detail, but the closest to what you've provided would be to have your macro generate the other struct, so that it knows the definition. You could perhaps pass it like this:
#[cat(
    struct new_fields {
        field: usize,
    }
)]
struct base_struct {
    base_field: usize,
}

How to pretty print Syn AST?

I'm trying to use syn to create an AST from a Rust file and then use quote to write it to another. However, when I write it, it puts extra spaces between everything.
Note that the example below is just to demonstrate the minimum reproducible problem I'm having. I realize that if I just wanted to copy the code over I could copy the file but it doesn't fit my case and I need to use an AST.
pub fn build_file() {
    let current_dir = std::env::current_dir().expect("Unable to get current directory");
    let rust_file = std::fs::read_to_string(current_dir.join("src").join("lib.rs"))
        .expect("Unable to read rust file");
    let ast = syn::parse_file(&rust_file).expect("Unable to create AST from rust file");
    std::fs::write("src/utils.rs", quote::quote!(#ast).to_string())
        .expect("Unable to write rust file");
}
The file that it creates an AST of is this:
#[macro_use]
extern crate foo;

mod test;

fn init(handle: foo::InitHandle) {
    handle.add_class::<Test::test>();
}
What it outputs is this:
# [macro_use] extern crate foo ; mod test ; fn init (handle : foo :: InitHandle) { handle . add_class :: < Test :: test > () ; }
I've even tried running it through rustfmt after writing it to the file like so:
utils::write_file("src/utils.rs", quote::quote!(#ast).to_string());
match std::process::Command::new("cargo").arg("fmt").output() {
    Ok(_v) => (),
    Err(e) => std::process::exit(1),
}
But it doesn't seem to make any difference.
The quote crate is not really concerned with pretty printing the generated code. You can run it through rustfmt, you just have to execute rustfmt src/utils.rs or cargo fmt -- src/utils.rs.
use std::fs;
use std::io;
use std::path::Path;
use std::process::Command;

fn write_and_fmt<P: AsRef<Path>, S: ToString>(path: P, code: S) -> io::Result<()> {
    fs::write(&path, code.to_string())?;
    Command::new("rustfmt")
        .arg(path.as_ref())
        .spawn()?
        .wait()?;
    Ok(())
}
Now you can just execute:
write_and_fmt("src/utils.rs", quote::quote!(#ast)).expect("unable to save or format");
See also "Any interest in a pretty-printing crate for Syn?" on the Rust forum.
As Martin mentioned in his answer, prettyplease can be used to format code fragments, which can be quite useful when testing proc macros, where the standard to_string() on proc_macro2::TokenStream is rather hard to read.
Here is a code sample to pretty print a proc_macro2::TokenStream parsable as a syn::Item:
fn pretty_print_item(item: proc_macro2::TokenStream) -> String {
    let item = syn::parse2(item).unwrap();
    let file = syn::File {
        attrs: vec![],
        items: vec![item],
        shebang: None,
    };
    prettyplease::unparse(&file)
}
I used this in my tests to help me understand where the generated code is wrong:
assert_eq!(
    expected.to_string(),
    generate_event().to_string(),
    "\n\nActual:\n {}",
    pretty_print_item(generate_event())
);
Please see the new prettyplease crate. Advantages:
It can be used directly as a library.
It can handle code fragments while rustfmt only handles full files.
It is fast because it uses a simpler algorithm.
Similar to other answers, I also use prettyplease.
I use this little trick to pretty-print a proc_macro2::TokenStream (e.g. what you get from calling quote::quote!):
fn pretty_print(ts: &proc_macro2::TokenStream) -> String {
    let file = syn::parse_file(&ts.to_string()).unwrap();
    prettyplease::unparse(&file)
}
Basically, I convert the token stream to an unformatted String, then parse that String into a syn::File, and then pass that to the prettyplease package.
Usage:
#[test]
fn it_works() {
    let tokens = quote::quote! {
        struct Foo {
            bar: String,
            baz: u64,
        }
    };
    let formatted = pretty_print(&tokens);
    let expected = "struct Foo {\n    bar: String,\n    baz: u64,\n}\n";
    assert_eq!(formatted, expected);
}

How to deserialize a TOML table containing an array of tables

Take the following TOML data:
[[items]]
foo = 10
bar = 100
[[items]]
foo = 12
bar = 144
And the following Rust code:
use serde_derive::Deserialize;
use toml::from_str;
use toml::value::Table;

#[derive(Deserialize)]
struct Item {
    foo: String,
    bar: String,
}

fn main() {
    let items_string: &str = "[[items]]\nfoo = 10\nbar = 100\n\n[[items]]\nfoo = 12\nbar = 144\n";
    let items_table: Table = from_str(items_string).unwrap();
    let items: Vec<Item> = items_table["items"].as_array().unwrap().to_vec();
    // Uncomment this line to print the table
    // println!("{:?}", items_table);
}
As you can see, the program does not compile, giving this error:
expected struct Item, found enum toml::value::Value
I understand its meaning, but I don't know how I could solve this and achieve what I wanted to do in the first place: cast a child array of a parent table into an array of structs and NOT into an array of tables.
You can parse into the pre-defined TOML types such as Table, but these types don't know about types outside of the pre-defined ones. Those types are mostly used when the actual type of the data is unknown, or unimportant.
In your case that means that the TOML Table type doesn't know about your Item type and cannot be made to know about it.
However you can easily parse into a different type:
use serde_derive::Deserialize;
use std::collections::HashMap;
use toml::from_str;

#[derive(Deserialize, Debug)]
struct Item {
    foo: u64,
    bar: u64,
}

fn main() {
    let items_string: &str = "[[items]]\nfoo = 10\nbar = 100\n\n[[items]]\nfoo = 12\nbar = 144\n";
    let items_table: HashMap<String, Vec<Item>> = from_str(items_string).unwrap();
    let items: &[Item] = &items_table["items"];
    println!("{:?}", items_table);
    println!("{:?}", items);
}
(Permalink to the playground)
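If the extra HashMap lookup is undesirable, an alternative sketch (not from the original answer) is a small wrapper struct mirroring the top-level items key:

// Alternative sketch: a dedicated wrapper struct instead of a HashMap,
// assuming the same serde_derive and toml dependencies as above.
use serde_derive::Deserialize;
use toml::from_str;

#[derive(Deserialize, Debug)]
struct Item {
    foo: u64,
    bar: u64,
}

#[derive(Deserialize, Debug)]
struct Document {
    // Matches the [[items]] array-of-tables key in the TOML input.
    items: Vec<Item>,
}

fn main() {
    let items_string = "[[items]]\nfoo = 10\nbar = 100\n\n[[items]]\nfoo = 12\nbar = 144\n";
    let document: Document = from_str(items_string).unwrap();
    println!("{:?}", document.items);
}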

How can I create hygienic identifiers in code generated by procedural macros?

When writing a declarative (macro_rules!) macro, we automatically get macro hygiene. In this example, I declare a variable named f in the macro and pass in an identifier f which becomes a local variable:
macro_rules! decl_example {
    ($tname:ident, $mname:ident, ($($fstr:tt),*)) => {
        impl std::fmt::Display for $tname {
            fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
                let Self { $mname } = self;
                write!(f, $($fstr),*)
            }
        }
    }
}
struct Foo {
    f: String,
}

decl_example!(Foo, f, ("I am a Foo: {}", f));

fn main() {
    let f = Foo {
        f: "with a member named `f`".into(),
    };
    println!("{}", f);
}
This code compiles, but if you look at the partially-expanded code, you can see that there's an apparent conflict:
impl std::fmt::Display for Foo {
    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
        let Self { f } = self;
        write!(f, "I am a Foo: {}", f)
    }
}
I am writing the equivalent of this declarative macro as a procedural macro, but do not know how to avoid potential name conflicts between the user-provided identifiers and identifiers created by my macro. As far as I can see, the generated code has no notion of hygiene and is just a string:
src/main.rs
use my_derive::MyDerive;

#[derive(MyDerive)]
#[my_derive(f)]
struct Foo {
    f: String,
}

fn main() {
    let f = Foo {
        f: "with a member named `f`".into(),
    };
    println!("{}", f);
}
Cargo.toml
[package]
name = "example"
version = "0.1.0"
edition = "2018"
[dependencies]
my_derive = { path = "my_derive" }
my_derive/src/lib.rs
extern crate proc_macro;

use proc_macro::TokenStream;
use quote::quote;
use syn::{parse_macro_input, DeriveInput, Meta, NestedMeta};

#[proc_macro_derive(MyDerive, attributes(my_derive))]
pub fn my_macro(input: TokenStream) -> TokenStream {
    let input = parse_macro_input!(input as DeriveInput);
    let name = input.ident;

    let attr = input
        .attrs
        .into_iter()
        .filter(|a| a.path.is_ident("my_derive"))
        .next()
        .expect("No name passed");
    let meta = attr.parse_meta().expect("Unknown attribute format");
    let meta = match meta {
        Meta::List(ml) => ml,
        _ => panic!("Invalid attribute format"),
    };
    let meta = meta.nested.first().expect("Must have one path");
    let meta = match meta {
        NestedMeta::Meta(Meta::Path(p)) => p,
        _ => panic!("Invalid nested attribute format"),
    };
    let field_name = meta.get_ident().expect("Not an ident");

    let expanded = quote! {
        impl std::fmt::Display for #name {
            fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
                let Self { #field_name } = self;
                write!(f, "I am a Foo: {}", #field_name)
            }
        }
    };

    TokenStream::from(expanded)
}
my_derive/Cargo.toml
[package]
name = "my_derive"
version = "0.1.0"
edition = "2018"
[lib]
proc-macro = true
[dependencies]
syn = "1.0.13"
quote = "1.0.2"
proc-macro2 = "1.0.7"
With Rust 1.40, this produces the compiler error:
error[E0599]: no method named `write_fmt` found for type `&std::string::String` in the current scope
--> src/main.rs:3:10
|
3 | #[derive(MyDerive)]
| ^^^^^^^^ method not found in `&std::string::String`
|
= help: items from traits can only be used if the trait is in scope
= note: this error originates in a macro outside of the current crate (in Nightly builds, run with -Z external-macro-backtrace for more info)
help: the following trait is implemented but not in scope; perhaps add a `use` for it:
|
1 | use std::fmt::Write;
|
What techniques exist to namespace my identifiers from identifiers outside of my control?
Summary: you can't yet use hygienic identifiers with proc macros on stable Rust. Your best bet is to use a particularly ugly name such as __your_crate_your_name.
You are creating identifiers (in particular, f) by using quote!. This is certainly convenient, but it's just a helper around the actual proc macro API the compiler offers. So let's take a look at that API to see how we can create identifiers! In the end we need a TokenStream, as that's what our proc macro returns. How can we construct such a token stream?
We can parse it from a string, e.g. "let f = 3;".parse::<TokenStream>(). But this was basically an early solution and is discouraged now. In any case, all identifiers created this way behave in a non-hygienic manner, so this won't solve your problem.
The second way (which quote! uses under the hood) is to create a TokenStream manually by creating a bunch of TokenTrees. One kind of TokenTree is an Ident (identifier). We can create an Ident via new:
fn new(string: &str, span: Span) -> Ident
The string parameter is self explanatory, but the span parameter is the interesting part! A Span stores the location of something in the source code and is usually used for error reporting (in order for rustc to point to the misspelled variable name, for example). But in the Rust compiler, spans carry more than location information: the kind of hygiene! We can see two constructor functions for Span:
fn call_site() -> Span: creates a span with call site hygiene. This is what you call "unhygienic" and is equivalent to "copy and pasting". If two identifiers have the same string, they will collide or shadow each other.
fn def_site() -> Span: this is what you are after. Technically called definition site hygiene, this is what you call "hygienic". The identifiers you define and the ones of your user live in different universes and won't ever collide. As you can see in the docs, this method is still unstable and thus only usable on a nightly compiler. Bummer!
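For illustration, a minimal sketch of what that would look like on a nightly compiler (unstable; assumes #![feature(proc_macro_def_site)] in the proc-macro crate, and the helper name here is hypothetical):

use proc_macro::{Ident, Span, TokenTree};

// An identifier created with definition-site hygiene lives in the macro's own
// "universe" and cannot collide with identifiers written by the macro's caller.
fn hygienic_ident(name: &str) -> TokenTree {
    TokenTree::Ident(Ident::new(name, Span::def_site()))
}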
There are no really great workarounds. The obvious one is to use a really ugly name like __your_crate_some_variable. To make it a bit easier for you, you can create that identifier once and use it within quote! (slightly better solution here):
let ugly_name = quote! { __your_crate_some_variable };
quote! {
    let #ugly_name = 3;
    println!("{}", #ugly_name);
}
Sometimes you can even search through all identifiers of the user that could collide with yours and then simply algorithmically choose an identifier that does not collide. This is actually what we did for auto_impl, with a super ugly name as fallback. This was mainly to keep super ugly names out of the generated documentation.
Apart from that, I'm afraid you cannot really do anything.
You can, thanks to a UUID:
// Ident and Span from proc-macro2, so the result can be interpolated with quote!
use proc_macro2::{Ident, Span};

fn generate_unique_ident(prefix: &str) -> Ident {
    let uuid = uuid::Uuid::new_v4();
    let ident = format!("{}_{}", prefix, uuid).replace('-', "_");
    Ident::new(&ident, Span::call_site())
}
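A hypothetical usage sketch inside the macro body (assuming the uuid, proc-macro2, and quote crates are dependencies; the prefix name is made up for illustration):

// `tmp` is interpolated like any other identifier; the UUID suffix makes a
// collision with user-written names practically impossible.
let tmp = generate_unique_ident("__my_crate_tmp");
let expanded = quote::quote! {
    let #tmp = 3;
    println!("{}", #tmp);
};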
