How to pass parameters loaded from a configuration file to a procedural macro function?

Here is the problem I am trying to solve. I have multiple procedural macro functions that generate tables of pre-computed values. Currently these macro functions take parameters in the form of literal integers. I would like to pass those parameters in from a configuration file instead. I could rewrite the functions to load the parameters from within the macros themselves; however, I want to keep the configuration in a top-level crate, like in this example:
top-level-crate/
  config/
    params.yaml
  macro1-crate/
  macro2-crate/
Since the input to a macro function is syntax tokens, not run-time values, I am not able to load a file from the top-level crate and pass the params in:
use macro1_crate::gen_table1;
use macro2_crate::gen_table2;

const TABLE1: [f32; 100] = gen_table1!(500, 123, 499);
const TABLE2: [f32; 100] = gen_table2!(1, 3);

fn main() {
    // use TABLE1 and TABLE2 to do further computation.
}
I would like to be able to pass params to gen_table1 and gen_table2 from a configuration file like this:
use macro1_crate::gen_table1;
use macro2_crate::gen_table2;

// Load values PARAM1, PARAM2, PARAM3, PARAM4, PARAM5

const TABLE1: [f32; 100] = gen_table1!(PARAM1, PARAM2, PARAM3);
const TABLE2: [f32; 100] = gen_table2!(PARAM4, PARAM5);

fn main() {
    // use TABLE1 and TABLE2 to do further computation.
}
The obvious problem is that PARAM1, PARAM2, PARAM3, PARAM4, PARAM5 are runtime values, while proc macros rely on build-time information to generate the tables.
One option I am considering is to create yet another proc macro specifically to load the configuration into some sort of data structure built out of quote! tokens, and then feed that into the other macros. However, this feels hackish: the configuration file needs to be loaded several times, and the params data structure needs to be tightly coupled across the macros. The code might look like this:
use macro1_crate::gen_table1;
use macro2_crate::gen_table2;

const TABLE1: [f32; 100] = gen_table1!(myparams!());
const TABLE2: [f32; 100] = gen_table2!(myparams!());

fn main() {
    // use TABLE1 and TABLE2 to do further computation.
}
Any improvements or further suggestions?

gen_table1!(myparams!()); won't work: macros are not expanded from the inside out the way nested function calls are evaluated. Your gen_table1 macro will receive the literal token stream myparams ! () and won't be able to evaluate that macro, so it never has access to the "return value" of myparams.
Right now, I only see one real way to do what you want: load the parameters from the file in gen_table1 and gen_table2, and just pass the filename of the file containing the parameters. For example:
const TABLE1: [f32; 100] = gen_table1!("../config/params.yaml");
const TABLE2: [f32; 100] = gen_table2!("../config/params.yaml");
Of course, this could lead to duplicate code in these two macros. But that should be solvable with the usual tools: extract that parameter loading into a function (in case both macros live in the same crate) or into an additional utility crate (in case the two macros live in different crates).
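To make the suggestion concrete, here is a minimal sketch of what the file-loading variant of gen_table1 could look like inside macro1-crate (a crate with proc-macro = true in its Cargo.toml). Everything specific here is an assumption for illustration: the flat key: value layout of params.yaml, the hypothetical parameter names param1..param3, the placeholder table computation, and the serde_yaml dependency.

use proc_macro::TokenStream;
use quote::quote;
use syn::{parse_macro_input, LitStr};

#[proc_macro]
pub fn gen_table1(input: TokenStream) -> TokenStream {
    // The macro receives the config file path as a string literal.
    let path_lit = parse_macro_input!(input as LitStr);
    // Resolve the path relative to the crate that invokes the macro.
    let base = std::env::var("CARGO_MANIFEST_DIR").expect("CARGO_MANIFEST_DIR not set");
    let path = std::path::Path::new(&base).join(path_lit.value());
    let text = std::fs::read_to_string(&path)
        .unwrap_or_else(|e| panic!("cannot read {}: {}", path.display(), e));
    // Assumed flat `key: value` YAML layout with integer values.
    let params: std::collections::HashMap<String, i64> =
        serde_yaml::from_str(&text).expect("invalid YAML in params file");
    let (p1, p2, p3) = (params["param1"], params["param2"], params["param3"]);
    // Placeholder pre-computation; substitute the real table math here.
    let values = (0..100).map(|i| (i as f32 * p1 as f32 + p2 as f32) / p3 as f32);
    quote!([ #( #values ),* ]).into()
}

The reading-and-deserializing part in the middle is exactly the piece you would extract into the shared function or utility crate. One caveat: Cargo does not track the YAML file as a dependency of the macro's caller, so editing params.yaml alone may not trigger a recompilation.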
You also keep mentioning the term "runtime values". I think you mean "a const value, not a literal" and that you are referring to something like this:
const PARAM1: u32 = load_param!();
const TABLE1: [f32; 100] = gen_table1!(PARAM1); // <-- this does not work as expected!
Because here, again, your macro receives the literal token stream PARAM1 and not the value of said parameter.
So yes, I think that's what you mean by "runtime value". Granted, I don't have a better term for this right now, but "runtime value" is misleading/wrong because the value is available at compile time. If you were talking about an actual runtime value, i.e. a value that is only knowable at runtime, after compilation is already done, then it would be impossible to do what you want, because proc macros run once at compile time and never at runtime.

Related

Best approach to saving big 2d vec to vec of structs

I'm not sure if this is the right place to ask this kind of question, but I feel like the way I'm doing things now is the 'dumb way' and there's room for improvement in my code.
I'm trying to build a stock data website as my side project, and I'm using Rust for the backend. One microservice I'm writing is responsible for scraping data from the web and then saving it in a database. The result of the web scraping is a 2D vector where each row corresponds to one attribute of a struct I'll later construct. I save the rows to variables, then use the izip! macro from itertools to iterate over all those attributes and create the struct:
izip!(
    publication_dates,
    quarter_dates,
    income_revenue,
    ...
)
.for_each(
    |(
        publication_date,
        quarter_date,
        income_revenue,
        ...
    )| {
        Financials {
            ticker: self.ticker.to_owned(),
            publication_date,
            quarter_date,
            ...
        },
    })
My issue is that one data table can have more than 40 attributes, so saving data from just one page can take over 250 lines of code; I'd have a total of 2000 lines just to store the web-scraped data, most of it repetitive (parsing rows to the correct data types). I'm pretty sure that's not the correct approach, since any change I'd like to make would have to be done in many places.
One of my ideas to make it better was to create an enum with the desired types, then create a vector of those enums like vec![DataType::QuarterDate, DataType::Int32, DataType::Int32, ...], iterate over both the rows and the new vector, and use a match statement to apply the appropriate function for data processing. That would shorten the row-allocation part a bit, but probably not by much.
Do you have any advice? Any hint would be a great help; I just need a direction that I can later explore by myself :-)
If you only want to reduce the code duplication, I would recommend using a macro for that. A simple example is this (playground):
macro_rules! create_financials {
    ($rows:ident, $($fun:ident > $column:ident),+) => {{
        $(
            let $column = $rows
                .next()
                .ok_or("None")?
                .into_iter()
                .flat_map($fun);
        )+

        itertools::izip!($($column,)+)
            .map(|($($column,)+)| {
                Financials {
                    $($column,)+
                }
            })
            .collect::<Vec<_>>()
    }}
}
Note that I removed the .collect::<Vec<_>>() calls when building the individual columns; collecting there is not needed and allocates additional memory.
I also replaced the for_each with map so that the macro returns a Vec, which can be used outside of the macro.
The macro can be used simply like this:
let financials: Vec<Financials> = create_financials!(
    rows,
    quarter_string_date_to_naive_date > quarter_date,
    publish_date_string_to_naive_date > publication_date,
    income_revenue > income_revenue
);
To remove the code duplication in parsing to the different data types, check whether the data types implement FromStr, From, or TryFrom. Otherwise, you can define your own trait that does the conversion and implement it for each data type.
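For the custom-trait route, a minimal sketch could look like this; the trait name FromCell, the date format, and the chrono dependency are assumptions for illustration, not part of the original code:

use chrono::NaiveDate;

// Hypothetical conversion trait, implemented once per column type.
trait FromCell: Sized {
    fn from_cell(raw: &str) -> Result<Self, String>;
}

impl FromCell for i32 {
    fn from_cell(raw: &str) -> Result<Self, String> {
        raw.trim().parse().map_err(|e| format!("bad i32 {:?}: {}", raw, e))
    }
}

impl FromCell for NaiveDate {
    fn from_cell(raw: &str) -> Result<Self, String> {
        NaiveDate::parse_from_str(raw.trim(), "%Y-%m-%d")
            .map_err(|e| format!("bad date {:?}: {}", raw, e))
    }
}

// One generic helper then replaces a hand-written parsing function per column.
fn parse_column<T: FromCell>(row: &[String]) -> Result<Vec<T>, String> {
    row.iter().map(|s| T::from_cell(s)).collect()
}

Each conversion then lives in exactly one place, so a change to how a data type is parsed only has to be made once.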

Why would you use the spread operator to spread a variable onto itself?

In the Google Getting Started with Node.js tutorial, they perform the following operation
data = {...data};
in the code for sending data to Firestore.
You can see it on their GitHub, line 63.
As far as I can tell, this doesn't do anything.
Is there a good reason for doing this?
Is it potentially future-proofing, so that if you added your own data you'd be less likely to do something like data = {data, moreData}?
@Manu's answer details what the line of code is doing, but not why it's there.
I don't know exactly why the Google code example uses this approach, but I would guess at the following reason (and would do the same myself in this situation):
Because objects in JavaScript are passed by reference, it becomes necessary to rebuild the data object from its constituent parts to avoid the original object being further modified by the ref.set(data) call on line 64 of the example code:
await ref.set(data);
For example, in MongoDB, when you pass an object into a write or update method, Mongo will actually modify the object to add extra properties, such as the datetime it was inserted into a collection or its ID within the collection. I don't know for sure whether Firestore does the same, but if it doesn't now, it's possible that it may in future. If it does, and the calling code goes on to further manipulate the data object it originally passed in, that object would now have extra properties that may cause unexpected problems. Therefore, it's prudent to rebuild the data object from the original object's properties to avoid contaminating the original object elsewhere in the code.
I hope that makes sense - the more I think about it, the more I'm convinced that this must be the reason and it's actually a great learning point.
I include the full original function from Google's code here in case others come across this in future, since the code is subject to change (copied from https://github.com/GoogleCloudPlatform/nodejs-getting-started/blob/master/bookshelf/books/firestore.js at the time of writing this answer):
// Creates a new book or updates an existing book with new data.
async function update(id, data) {
  let ref;
  if (id === null) {
    ref = db.collection(collection).doc();
  } else {
    ref = db.collection(collection).doc(id);
  }
  data.id = ref.id;
  data = {...data};
  await ref.set(data);
  return data;
}
It's making a shallow copy of data; let's say you have a third-party function that mutates the input:
const foo = input => {
  input['changed'] = true;
};
And you need to call it, but don't want to get your object modified, so instead of:
data = {life: 42}
foo(data)
// > data
// { life: 42, changed: true }
You may use the Spread Syntax:
data = {life: 42}
foo({...data})
// > data
// { life: 42 }
Not sure if this is the particular case with Firestore, but the thing is: by spreading an object you get a shallow copy of that object.
Related: Object copy using Spread operator actually shallow or deep?

Creating Node.js enum in code to match list of values in database

I have a list of valid values that I am storing in a data store. This list is about 20 items long now and will likely grow to around 100, maybe more.
I feel there are a variety of reasons it makes sense to store this in a data store rather than just storing in code. I want to be able to maintain the list and its metadata and make it accessible to other services, so it seems like a micro-service data store.
But in code, we want to make sure only values from the list are passed, and they can typically be hardcoded. So we would like to create an enum that can be used in code to ensure that valid values are passed.
I have created a simple Node.js script that can generate a JS file with the enum right from the data store. This could be regenerated any time the list changes, or maybe on a schedule. But sharing the enum file with the Node.js applications that use it would not be trivial.
Has anyone done anything like this? Any reason why this would be a bad approach? Any feedback is welcome.
Piggy-backing off of this answer, which describes a way of creating an "enum" in JavaScript: you can grab the list of constants from your server (via an HTTP call) and then generate the enum in code, without the need for creating and loading a JavaScript source file.
Given that you have loaded your enumConstants from the back-end (here I hard-coded them):
const enumConstants = [
  'FIRST',
  'SECOND',
  'THIRD'
];

const temp = {};
for (const constant of enumConstants) {
  temp[constant] = constant;
}
const PlaceEnum = Object.freeze(temp);
console.log(PlaceEnum.FIRST);

// Or, in one line
const PlaceEnum2 = Object.freeze(enumConstants.reduce((o, c) => { o[c] = c; return o; }, {}));
console.log(PlaceEnum2.FIRST);
It is not ideal for code analysis or when using a smart editor, because the object is not explicitly defined and the editor will complain, but it will work.
Another approach is just to use an array and look for its members.
const members = ['first', 'second', 'third'...]
// then test for the members
members.indexOf('first') // 0
members.indexOf('third') // 2
members.indexOf('zero') // -1
members.indexOf('your_variable_to_test') // does it exist in the "enum"?
Any value that is >=0 will be a member of the list. -1 will not be a member. This doesn't "lock" the object like freeze (above) but I find it suffices for most of my similar scenarios.

How to get protobuf.js to output enum strings instead of integers

I'm using the latest protobuf.js with Node.js 4.4.5.
I currently struggle to get protobuf.js to output the string definitions of enums instead of integers. I tried several suggestions, but none of them worked:
https://github.com/dcodeIO/ProtoBuf.js/issues/97
https://github.com/dcodeIO/protobuf.js/issues/349
I guess it's because of API changes in protobuf.js for the first one. For the second one, I can use the suggested solution partially, but if the message is nested within other messages, the builder seems to fall back to using the integer values, although the string values have been explicitly set.
Ideally, I'd like to overwrite the function which is used for producing the enum values, but I have a hard time finding the correct one with the debugger. Or is there a better way to achieve this for deeply nested objects?
The generated JS code from protoc has a map in one direction only, e.g.
proto.foo.Bar.Myenum = {
  HEY: 0,
  HO: 1
};
Rationale for this is here, but you have to do the reverse lookup in your own JS code. There are lots of easy solutions for this. I used the one at https://stackoverflow.com/a/59360329/449347, i.e.
Generic reverse mapper function ...
export function getKey(map, val) {
  return Object.keys(map).find(key => map[key] === val);
}
Unit test ...
import { Bar } from "js/proto/bar_pb";
expect(getKey(proto.foo.Bar.Myenum, 0)).toEqual("HEY");
expect(getKey(proto.foo.Bar.Myenum, 1)).toEqual("HO");
expect(getKey(proto.foo.Bar.Myenum, 99)).toBeUndefined();

How to convert Rep[T] to T in slick 3.0?

I am using code generated by the Slick code generator.
My table has more than 22 columns, hence it uses an HList.
The generator produces one type and one function:

type AccountRow
def AccountRow(uuid: java.util.UUID, providerid: String, email: Option[String], ...): AccountRow
How do I write compiled insert code from generated code?
I tried this:
val insertAccountQueryCompiled = {
  def q(uuid: Rep[UUID], providerId: Rep[String], email: Rep[Option[String]], ...) =
    Account += AccountRow(uuid, providerId, email, ...)
  Compiled(q _)
}
I need to convert Rep[T] to T for AccountRow function to work. How do I do that?
Thank you
TL;DR: Not possible.
Explanation
There are two levels of abstraction in Slick: Querys and DBIOActions.
When you're dealing with Querys, you work with your schema definitions, rows, and Reps; it's very constrained, because it's the closest level of abstraction to the actual DB you're using. A Rep refers to a hypothetical value in the database, not to a value in your program.
Then you have DBIOActions, which are the next level: not just the definition of a query, but the execution of it. You usually get a DBIOAction when extracting information out of a query, for example with the result method or (ta-da!) when inserting rows.
Inserts and updates are not queries, so what you're trying to do is not possible. You're mixing DBIOAction stuff (the += method) with Query stuff (the Rep types). The only way to get the value behind a Rep into a DBIOAction is to execute a Query, obtain a DBIOAction, and then compose both actions using flatMap or a for comprehension (which is the same thing).
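As an illustration of that composition (with a minimal stand-in schema; your real generated code uses an HList-based AccountRow with many more columns), fetching a row with a Query and then using the resulting plain value in an insert could look like this:

import scala.concurrent.ExecutionContext.Implicits.global
import slick.jdbc.PostgresProfile.api._
import java.util.UUID

object Example {
  // Hypothetical three-column stand-in for the generated schema.
  case class AccountRow(uuid: UUID, providerid: String, email: Option[String])

  class Account(tag: Tag) extends Table[AccountRow](tag, "account") {
    def uuid = column[UUID]("uuid", O.PrimaryKey)
    def providerid = column[String]("providerid")
    def email = column[Option[String]]("email")
    def * = (uuid, providerid, email) <> (AccountRow.tupled, AccountRow.unapply)
  }
  val accounts = TableQuery[Account]

  // Run the lookup query, then use the fetched row (a plain value, not a Rep)
  // inside the insert, composing both steps into one DBIOAction.
  def duplicateAccount(id: UUID): DBIO[Int] =
    accounts.filter(_.uuid === id).result.headOption.flatMap {
      case Some(row) => accounts += row
      case None      => DBIO.successful(0)
    }
}

Note that row only becomes an ordinary AccountRow inside the flatMap, i.e. after the lookup has been turned into an executed action; there is no way to pull it out at the Query/Rep level, which is why the Compiled(q _) approach with Rep parameters cannot work.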
