converting strings to formula objects in Julia - metaprogramming

I have a DataFrame in Julia with fewer than 10 columns. I want to generate a list of all possible formulas that could be fed into a linear model (e.g., [Y ~ X1 + X2 + X3, Y ~ X1 + X2, ...]). I can do this easily with combinations() and string versions of the column names. However, when I try to convert the strings into Formula objects, it does not work. Judging from the DataFrames.jl documentation, Formulas can only be constructed from "expressions", and I can indeed make a list of individual column names as expressions. Is there a way to programmatically join several expressions with the "+" operator so that the resulting composite expression can be passed as the right-hand side of the Formula constructor? My instinct is to look for a function that converts an arbitrary string into the equivalent expression, but I am not sure that is the right approach.

The function parse (Meta.parse since Julia 0.7) takes a string, parses it, and returns an expression. I see nothing wrong with using it for what you describe.
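A minimal Base-only sketch of the two routes the question mentions, assuming Julia 1.x (where parse lives in Meta): parsing a joined string, and building the same `+` call directly with Expr. The variable names are illustrative, and handing the result to a formula constructor is not shown.

```julia
# Route 1: join the column names into one string and let Julia's parser build the tree.
cols = ["X1", "X2", "X3"]
parsed = Meta.parse("Y ~ " * join(cols, " + "))

# Route 2: build the same tree directly from symbols, with no string handling.
# The parser flattens chained `+` into a single n-ary call, so one
# Expr(:call, :+, ...) holding every symbol has the same shape.
built = Expr(:call, :~, :Y, Expr(:call, :+, Symbol.(cols)...))

println(parsed == built)  # true
```

Building the Expr directly avoids string escaping issues with unusual column names.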

Here is some working code, since I had been struggling to get a similar problem to work. Note that this is Julia 1.3.1, so parse is now Meta.parse, and instead of combinations I used IterTools.subsets. The code does not actually need any parsing: it builds the formula terms directly with Term and FormulaTerm.
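```julia
using RDatasets, DataFrames, IterTools, GLM

# "Solar.R" is not a valid Julia identifier, so rename the column first
airquality = rename(dataset("datasets", "airquality"), "Solar.R" => "Solar_R")

# propertynames returns Symbols; names() returns Strings in newer DataFrames versions,
# which would make setdiff(..., [:Temp]) silently keep the response column
predictors = setdiff(propertynames(airquality), [:Temp])

for combination in subsets(predictors)
    # subsets also yields the empty set, which has no predictors to fit
    if length(combination) > 0
        formula = FormulaTerm(Term(:Temp), Tuple(Term.(combination)))
        @show lm(formula, airquality)
    end
end
```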
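If adding IterTools is undesirable, the subsets can also be enumerated with a bit mask using only Base. This sketch builds one `response ~ predictors` expression per non-empty subset; the function name formula_exprs is hypothetical. With n predictors it yields 2^n - 1 formulas, which stays at a few hundred at most for a table with fewer than 10 columns.

```julia
# Build one `response ~ a + b + ...` expression per non-empty subset of predictors.
# Bit i-1 of `mask` decides whether predictors[i] is included.
function formula_exprs(response::Symbol, predictors::Vector{Symbol})
    n = length(predictors)
    exprs = Expr[]
    for mask in 1:(2^n - 1)
        chosen = [predictors[i] for i in 1:n if ((mask >> (i - 1)) & 1) == 1]
        # A single predictor is just a symbol; several are joined by one n-ary `+` call
        rhs = length(chosen) == 1 ? chosen[1] : Expr(:call, :+, chosen...)
        push!(exprs, Expr(:call, :~, response, rhs))
    end
    return exprs
end

for ex in formula_exprs(:Temp, [:Ozone, :Wind, :Month])
    println(ex)
end
```

These are plain expressions; the Term/FormulaTerm construction in the answer above avoids any eval step when fitting models.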

Related

Excel's LAMBDA with a "kind of" composite function

Ever since I learnt that Excel is now Turing-complete, I understood that I can now "program" Excel using exclusively formulas, therefore excluding any use of VBA whatsoever.
I do not know if my conclusion is right or wrong. In reality, I do not mind.
However, to my satisfaction, I have been able to "program" the two most basic control-flow structures inside formulas: 1- branching (the IF function holds no secrets in Excel) and 2- loops (FOR, WHILE and UNTIL loops).
Let me explain my findings in a little more detail. (Remark: because I am using a Spanish version of Excel 365, the argument separator in formulas is the semicolon (";") instead of the comma (",").)
A- Accumulator in a FOR loop
B- Factorial (using product)
C- WHILE loop
D- UNTIL loop
E- The notion of INTERNAL/EXTERNAL SCOPE
And now, the time of my question has arrived:
I want to use a formula that is really an array of formulas
I want an accumulator for the first number in the "tuple" and a factorial for the second number, all in a single Excel formula. I think I am not very far from succeeding.
The REDUCE function accepts a LET function that contains 2 LAMBDAs instead of a single LAMBDA function. Up to this point, everything works. However, the LET function seems to return only a single function instead of a tuple of functions.
I can return (in the picture) function "x" or function "y" but not the tuple (x,y).
I have tried to use HSTACK(x,y), but it does not seem to work.
I am aware that this is a complex question, but I've done my best to make myself understood.
Can anybody give me any clues as to how I could solve my problem?
Very nice question.
I noticed that in your attempts you have given REDUCE() a single constant value as the 1st parameter. Funnily enough, the documentation nowhere states that you can't give values in array format. Hence you could use the 1st parameter to pass all the constants as an array (in your case, a horizontal one), and while you loop through the array in the 2nd parameter, you can apply the different types of logic using CHOOSE():
=REDUCE({0,1},SEQUENCE(5),LAMBDA(a,b,CHOOSE({1,2},a+b,a*b)))
This way you have a single REDUCE() function whose internal steps update the constants given in the 1st parameter as an array. You can now stack multiple functions horizontally and pass in an array of constants, for example:
=REDUCE({0,1,100},SEQUENCE(5),LAMBDA(a,b,CHOOSE({1,2,3},a+b,a*b,a/b)))
I suppose you'd have to use {0\1} and {1\2}, as I would in my Dutch version of Excel.
Given your accumulator:
Formula in A1:
=REDUCE(F1:G1,SEQUENCE(F3),LAMBDA(a,b,CHOOSE({1,2},a+b,a*b)))
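Outside Excel, the same pattern is a fold whose accumulator is a tuple: one slot is summed and the other multiplied, in a single pass. This Julia sketch is only an analogy for REDUCE({0,1},SEQUENCE(5),LAMBDA(a,b,CHOOSE({1,2},a+b,a*b))), not Excel code; the function name sum_and_product is hypothetical.

```julia
# Accumulator (sum, product) starts at (0, 1), like the {0,1} constant given to REDUCE.
sum_and_product((s, p), b) = (s + b, p * b)   # slot 1: a+b, slot 2: a*b
result = foldl(sum_and_product, 1:5; init = (0, 1))
println(result)  # (15, 120)

# A third slot is just another tuple element, like {0,1,100} with a/b in the formula.
result3 = foldl(((s, p, q), b) -> (s + b, p * b, q / b), 1:5; init = (0, 1, 100))
```

Each slot is updated by its own rule, which is exactly what CHOOSE({1,2},...) selects element by element.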

Multiple arrays within multiple arrays

I have multiple lookup values that I am trying to match across multiple arrays. I would like to match one of those lookup values against several arrays within the same MATCH, but I keep getting "#VALUE!" or "#N/A".
The formula I am currently trying, simplified for readability, is:
=INDEX($I$2:$I$10,MATCH(A2&B2&C2,$D$2:$D$10&OR($E$2:$E$10,$F$2:$F$10)&$G$2:$G$10,0))
In this case, I am trying to match B2 either in $E$2:$E$10 or $F$2:$F$10. What am I doing wrong?
Thanks in advance!
First: you are misusing the OR function. OR expects Boolean values as arguments, and it never returns an array, not even an array of Booleans; it returns a single TRUE or FALSE.
Second: even if OR worked the way you seem to expect, MATCH needs a one-dimensional lookup_array, either a row vector or a column vector. It cannot work with two-dimensional matrices like {$D$2&$E$2&$G$2 , $D$2&$F$2&$G$2 ; $D$3&$E$3&$G$3 , $D$3&$F$3&$G$3 ; ...}
So the simplest solution for your example is one INDEX/MATCH combination for each possible lookup_array, falling back from the first to the second with IFERROR:
{=IFERROR(INDEX($I$2:$I$10,MATCH(A2&B2&C2,$D$2:$D$10&$E$2:$E$10&$G$2:$G$10,0)),INDEX($I$2:$I$10,MATCH(A2&B2&C2,$D$2:$D$10&$F$2:$F$10&$G$2:$G$10,0)))}
Or, if you really need it to work the way you expected your formula to, you cannot use MATCH and must calculate the row number another way, for example:
{=INDEX($I$2:$I$10,MIN(IF(A2&B2&C2=$D$2:$D$10&T(OFFSET($E$2,ROW($2:$10)-2,{0,1}))&$G$2:$G$10,ROW($2:$10)-1,ROWS($D$2:$D$10)+1)))}
T is used on the assumption that the values in A2:G10 are text. If they are numeric, N must be used instead.
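The row-by-row logic of the second formula (match column D, then B in either E or F, then G, and take the first matching row) can be written out explicitly. This Julia sketch is an analogy, with made-up sample vectors colD, colE, colF, colG and colI standing in for D2:D10 and so on, and a hypothetical lookup function. It shows the per-row OR that Excel's OR function does not perform.

```julia
# Hypothetical sample data standing in for columns D, E, F, G and the result column I.
colD = ["a", "a", "b"]
colE = ["x", "y", "x"]
colF = ["y", "z", "q"]
colG = ["k", "k", "k"]
colI = [10, 20, 30]

# Value of colI in the first row where D == a, G == c and B matches E OR F.
# `||` is evaluated per row here, unlike Excel's OR over whole arrays.
function lookup(a, b, c)
    row = findfirst(r -> colD[r] == a && (colE[r] == b || colF[r] == b) && colG[r] == c,
                    eachindex(colD))
    return row === nothing ? nothing : colI[row]   # nothing plays the role of #N/A
end

println(lookup("a", "y", "k"))  # 10 (row 1 matches through column F)
```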

How to evaluate typed-out formulas in Excel

If I type the formula 1/4*pi()*($A$1)^2 as a string in a cell and assuming I have a value in $A$1, I use the following VBA function in a third cell to evaluate the formula:
```vba
Public Function E(ByVal TextFormula As String) As Variant
    E = Evaluate(TextFormula)
End Function
```
Is there a way to use math characters like •, √, ¼, π, ², etc. so that my typed-out formula looks more readable? I'd even like '[' and ']' translated as '(' and ')'. I can just iterate through an array of replacements using the REPLACE() function for the simple characters, but what about extended characters like π?
For the really sharp macro'ers...
What about showing intermediate steps (iterations)? For example, (2*3) + (2.5*4) would evaluate to 6 + 10 in the first iteration and then to 16 in the next. Aside: I would want the iterations to stop just before each set of additions/subtractions, because I sometimes like to see the relative magnitudes of the individually evaluated terms to tell which part of my formula controls the result.
And for the mega-genius ones...
What about mixed units? For example, typing out 560{lbs}/[1.23{m}*3.4{'}] and getting my result in ###{psf}. I thought the unit could be delimited by an underscore, such as 34_kN, but I think start and end delimiters are required for compound units like 34{kN/m^2}. There would need to be a way to force the output into a desired unit (i.e., mm instead of in), perhaps by setting up the desired units ahead of time in the sheet so that it would at least try to convert to one of them. I think at this stage you will be charging me for the code ;)
I like using Excel for my engineering calculations because I only use simpler formulas (no calculus!) and I don't want to constantly switch between Excel and Mathcad apps but use only one.
Shawn
Those are tall orders. The following sub might give you an idea for your first question:
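```vba
Sub test()
    Dim R As Range
    Set R = Range("A1")
    R.Value = "A = pr2"
    ' In the Symbol font, "p" is displayed as the Greek letter pi
    R.Characters(5, 1).Font.Name = "Symbol"
    ' Raise the "2" to a superscript so it reads as "squared"
    R.Characters(7, 1).Font.Superscript = True
End Sub
```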
Run it and then look at the contents of A1.
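For the replacement-table approach described in the question, extended characters are no harder than plain ones: each is just a string to substitute before the text reaches Evaluate. This Julia sketch shows such a table with a hypothetical to_excel function and an illustrative, not exhaustive, mapping; √ is left out because its argument would also need to be wrapped in parentheses.

```julia
# Illustrative mapping from "pretty" characters to text that Excel's Evaluate understands.
const PRETTY_TO_EXCEL = [
    "•" => "*",
    "¼" => "(1/4)",
    "π" => "PI()",
    "²" => "^2",
    "[" => "(",
    "]" => ")",
]

# Apply the replacements one pair at a time (single-pair replace works on any Julia 1.x).
to_excel(s::AbstractString) = foldl((acc, p) -> replace(acc, p), PRETTY_TO_EXCEL; init = s)

println(to_excel("¼•π•[A1]²"))  # (1/4)*PI()*(A1)^2
```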
As far as your second question goes - sure you can do it, but you would need to write a full-fledged expression parser. Writing one from scratch is fairly involved (at least a couple hundred lines of code) and is probably best done by using classes to create a custom tree data type then writing a recursive descent parser to parse strings into expression trees. Doable, though I have neither the time nor the inclination to do so.
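Once a string is parsed into an expression tree, the top-level terms of a sum are separate children that can be evaluated on their own before the final addition, which is what the intermediate-steps request needs. This Julia sketch leans on Julia's own parser rather than a hand-written recursive descent parser, handles only a top-level `+`, and uses eval, so it is suitable only for trusted input; the name term_values is hypothetical.

```julia
# Evaluate each top-level term of a sum separately, so their relative
# magnitudes can be compared before they are added together.
# Uses eval, so only pass trusted strings.
function term_values(src::AbstractString)
    ex = Meta.parse(src)
    if ex isa Expr && ex.head == :call && ex.args[1] == :+
        return [eval(arg) for arg in ex.args[2:end]]
    end
    return [eval(ex)]   # not a sum: evaluate the whole expression
end

terms = term_values("(2*3) + (2.5*4)")
println(terms)        # first "iteration": the individual terms 6 and 10.0
println(sum(terms))   # second "iteration": 16.0
```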
I'm not quite sure what you are driving at with your last question, though my gut reaction is that it is easier than your second question since no real parsing is required and it is easy enough to create a dictionary of conversion factors.
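A dictionary of conversion factors could look like this Julia sketch: every unit maps to its size in one base unit (metres here), so any pair of units converts through that base. The unit keys and convert_length are hypothetical; compound units such as kN/m^2 would still need the parsing discussed above.

```julia
# Size of each length unit in metres (the inch and foot are exact by definition).
const METRES_PER = Dict("m" => 1.0, "mm" => 0.001, "in" => 0.0254, "ft" => 0.3048)

# Convert through the base unit: value * (from, in metres) / (to, in metres).
convert_length(value, from, to) = value * METRES_PER[from] / METRES_PER[to]

println(convert_length(1.0, "ft", "in"))   # ≈ 12.0
println(convert_length(25.4, "mm", "in"))  # ≈ 1.0
```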

How to compare 2 arrays with unequal numbers of elements in Excel

In an Excel array formula, I would like to test each element of one array against each element of a second array, when the 2 arrays do NOT have the same number of elements. Simplified right down, this scenario could be represented by:
=SUMPRODUCT({1,2,3,4,5}={1,2})
NB: in my real-world scenario, these arrays are calculated in various prior steps.
Using the above example, I would want a result of {TRUE,TRUE,FALSE,FALSE,FALSE}. What I get is {TRUE,TRUE,#N/A,#N/A,#N/A}.
It's clear that, when there's more than 1 value being tested for, Excel wants equal numbers of elements in the 2 arrays; when there isn't, the #N/A error fills in the blanks.
I've considered writing a UDF to achieve what I want, and I'm pretty sure my coding skills are up to creating something like:
=ArrayCompare({1,2,3,4,5},"=",{1,2})
But I'd much rather do this using native functionality if it's not too cumbersome...
So, simple question: can an array formula be constructed to do what I'm after?
Thanks peeps!
Using the MATCH function is probably the best way, but if you actually want to compare every element of one array with every element of another in a direct comparison, then one array should be a "column" and the other a "row", e.g.
=SUMPRODUCT(({1,2,3,4,5}={1;4})+0)
Note the semi-colon separator in the second array
If you can't actually change the column/row designation then TRANSPOSE can be used, i.e.
=SUMPRODUCT(({1,2,3,4,5}=TRANSPOSE({1,4}))+0)
You may not get the required results if the arrays contain duplicates because then you will get some double-counting, e.g. with this formula
=SUMPRODUCT(({1,1,1,1,1}={1;1})+0)
the result is 10 because there are 5x2 comparisons and they are all TRUE
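The row-versus-column trick is the same broadcasting rule Julia uses, so it can be checked outside Excel. In this Julia analogy, [1 2 3 4 5] is a 1×5 row and [1, 4] a column, so .== produces a 2×5 grid of comparisons, like {1,2,3,4,5}={1;4}; permutedims plays the role of TRANSPOSE.

```julia
row = [1 2 3 4 5]        # 1×5, like {1,2,3,4,5}
col = [1, 4]             # 2-element column, like {1;4}

grid = row .== col       # 2×5: every element of row compared with every element of col
println(sum(grid))       # 2, same as the SUMPRODUCT result

# TRANSPOSE corresponds to permutedims: a 5×1 column against a 1×2 row gives 5×2
println(sum(vec(row) .== permutedims(col)))  # 2

# Duplicates are double-counted: 5 × 2 comparisons, all true
println(sum([1 1 1 1 1] .== [1, 1]))  # 10
```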
Maybe:
{=IF(ISERROR(MATCH({1,2,3,4,5},{1,2},0)),FALSE,TRUE)}
If the second array matches the start of the first array (same order, starting at position 1), you can use this array formula for equivalence testing:
=IFERROR(IF({1,2,3,4,5}={1,2},TRUE),FALSE)
For non equivalence just swap the FALSE and TRUE
=IFERROR(IF({1,2,3,4,5}={1,2},FALSE),TRUE)
You can then use the result as an array inside other formulas.
However, if the arrays are not in order, as in this example:
{1,2,3,4,5},{1,4,5}
Then you have to use MATCH, but all you need to do is wrap the MATCH in ISNUMBER:
Equivalence test:
=ISNUMBER(MATCH({1,2,3,4,5},{1,4,5},0))
Non Equivalence test:
=NOT(ISNUMBER(MATCH({1,2,3,4,5},{1,4,5},0)))
Remember that array formulas are entered with Ctrl+Shift+Enter (except in dynamic-array versions of Excel such as Excel 365, where Enter is enough).
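ISNUMBER(MATCH(...)) is a per-element membership test. The Julia analogue broadcasts `in` over the first array, with Ref keeping the second array from being broadcast itself; for the arrays above it gives true, false, false, true, true.

```julia
a = [1, 2, 3, 4, 5]
b = [1, 4, 5]

present = in.(a, Ref(b))   # like =ISNUMBER(MATCH(a, b, 0))
absent  = .!present        # like =NOT(ISNUMBER(MATCH(a, b, 0)))

println(count(present))    # 3 elements of a occur in b
```

Because each element is looked up independently, order and length differences between the two arrays do not matter.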

INDIRECT() returns #VALUE! unexpectedly

Background: I'm using Excel functions to parse a lot of data out, essentially creating a flexible pivot table. It sorts a lot of race timing data by car, etc. In this portion of the sheet, I'm searching for the minimum segment times for each car. The rest of the sheet avoids macros and VBA so I'd like to avoid that here.
Issue: My formula worked when there are no zeros, but sometimes there are zeros that I need to exclude. My array formula is pretty complicated, but the change I made that broke it is this:
OLD (working):
{=min(if(car_number = indirect("number_vector"), indirect("data_vector")))}
NEW (non-working):
{=min(if(and(car_number = indirect("number_vector"),not(0=indirect("data_vector"))), indirect("data_vector")))}
I am using INDIRECT() with this exact argument several times in the formula. However, in this particular instance (inside the NOT()), it returns #VALUE! instead of {data1;...;datan}. Please see the screencaps below.
Before evaluation:
After evaluation:
I suspect your AND function is the problem: AND returns a single result, not the array of results you need. Try nested IFs instead, like this:
=min(if(car_number = indirect("number_vector"),IF(indirect(data_vector)<>0, indirect(data_vector))))
Note that I also used <> rather than NOT.
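The difference between AND and nested IFs can be seen with plain arrays. In this Julia analogy, with made-up numbers and data vectors standing in for the two named ranges, all collapses the conditions to one Boolean the way AND does, while .& keeps one result per row, which is what the filter needs.

```julia
numbers = [7, 7, 3, 7]             # hypothetical car numbers (number_vector)
data    = [0.0, 61.2, 58.9, 60.4]  # hypothetical segment times (data_vector)
car     = 7

# AND-like: a single true/false for the whole array, useless as a row filter
println(all((numbers .== car) .& (data .!= 0)))  # false

# Element-wise: one Bool per row, so zero times are excluded row by row
keep = (numbers .== car) .& (data .!= 0)
println(minimum(data[keep]))  # 60.4
```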
Are data_vector and number_vector the same size and shape (both vertical)?
Why are there quotes around one but not the other?
