I am using pyparsing to parse a nested expression which is formed by delimited lists but which includes some basic arithmetic (just multiplication, for instance).
A sample expression could look like this:
(A, B, 2 * C, 3 * ( D, E, 2 * F, 3 *(G, H)), I )
The output should unfold the arithmetic:
( A, B, C, C, D, E, F, F, G, H, G, H, G, H, D, E, F, F, G, H, G, H, G, H, D, E, F, F, G, H, G, H, G, H, I )
Could somebody give me a hint on how to approach the problem?
I started as follows: since the only operation is multiplication, I decided to use the '*' character as a delimiter in a somewhat weird list:
import pyparsing as pp
oddDelim = pp.Or([',', '*'])
weirdList = pp.Optional(',').suppress() + \
    pp.delimitedList(pp.Or([pp.Word(pp.alphas), pp.pyparsing_common.number]), delim=oddDelim, combine=False) + \
    pp.Optional('*').suppress()
nestedTest = pp.nestedExpr(content=weirdList)
Using this nestedTest expression I get a reasonable result:
[['A', 'B', 2, 'C', 3, ['D', 'E', 2, 'F', 3, ['G', 'H']], 'I']]
but I don't know how to process the tokens in order to properly unfold the arithmetic.
Instead of iterating over the tokens sequentially in a FOR loop, I would ideally like to start unfolding the arithmetic from the deepest level of nesting and progressively work my way out. But I don't know how...
Is nestedExpr the way to go? Or should I change the approach and use Forward or maybe infixNotation? I am very new to pyparsing, so I would be very grateful for any hints/ideas on this.
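For the token layout shown above, a recursive function can do the "innermost first" unfolding: it expands a nested list before applying the multiplier in front of it. This is a minimal sketch, assuming every integer is a multiplier for the single item that follows it; `unfold` is a hypothetical helper name, and a real pyparsing result would first need converting to plain lists with `.asList()`.

```python
def unfold(items):
    """Expand a token list like ['A', 2, 'C', 3, ['G', 'H']] into plain names.

    Assumption: an int is always a multiplier for the one item after it
    (a name or a nested list), as in the nestedExpr output above.
    """
    result = []
    i = 0
    while i < len(items):
        count = 1
        if isinstance(items[i], int):
            count = items[i]  # multiplier applies to the next item
            i += 1
        item = items[i]
        # Recurse first, so the innermost lists are unfolded before their parents
        expanded = unfold(item) if isinstance(item, list) else [item]
        result.extend(expanded * count)
        i += 1
    return result


parsed = [['A', 'B', 2, 'C', 3, ['D', 'E', 2, 'F', 3, ['G', 'H']], 'I']]
print(", ".join(unfold(parsed[0])))
```

The recursion gives the bottom-up order for free: a nested list is fully expanded by the inner call before the outer call repeats it.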
Thanks very much in advance for your help!
Cheers,
Pau
If you want to use Forward() to roll your own recursive grammar, it is best to start
with writing a BNF for your grammar. This will help you think straight about the
problem space first, and then worry about the coding later.
Here is a rough BNF for what you've posted:
list_expr ::= '(' list_item [',' list_item]* ')'
list_item ::= term | mult_term | list_expr
mult_term ::= integer '*' list_item
term ::= A-Z
That is, each list enclosed in parentheses has a comma-delimited list of
items, and each item can be a single-character term, a multiplication expression
of an integer, a '*' and an item, or a nested list in another set of parentheses.
To translate this to pyparsing, work bottom-up to define each expression. For
instance, define a term using the new Char class (which matches a single character
from a string of allowed characters):
term = pp.Char("ABCDEFGHIJKLMNOPQRSTUVWXYZ")
You'll need to use Forward for list_item, since it refers to expressions that
aren't defined yet; Forward() gives you a placeholder. Then, when you have
term, mult_term, and list_expr defined, use '<<=' to "insert" the definition
into the existing placeholder, like this:
list_item <<= term | mult_term | list_expr
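Putting those pieces together, the full grammar might look like the sketch below. It assumes single uppercase letters as terms and adds one thing the BNF does not mention: a parse action on mult_term that repeats the already-expanded item. Because pyparsing runs the nested list_item's actions first, the multiplications are unfolded from the innermost level outward while parsing.

```python
import pyparsing as pp

# Suppressed punctuation: guides the parse but is dropped from the results
LPAR, RPAR, COMMA, STAR = map(pp.Suppress, "(),*")

# term ::= A-Z  (a single uppercase letter)
term = pp.Char("ABCDEFGHIJKLMNOPQRSTUVWXYZ")
integer = pp.Word(pp.nums).setParseAction(lambda t: int(t[0]))

list_item = pp.Forward()
# mult_term ::= integer '*' list_item
# The action replaces [n, item...] with the item tokens repeated n times;
# the nested list_item has already been expanded when this runs.
mult_term = (integer + STAR + list_item).setParseAction(lambda t: t[1:] * t[0])
# list_expr ::= '(' list_item [',' list_item]* ')'
list_expr = LPAR + list_item + pp.ZeroOrMore(COMMA + list_item) + RPAR
list_item <<= term | mult_term | list_expr

result = list_expr.parseString("(A, B, 2 * C, 3 * ( D, E, 2 * F, 3 *(G, H)), I )")
print(", ".join(result))
```

No Group is used anywhere, so each list_expr contributes a flat run of letters and the final result is the single flat sequence the question asks for.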
Since you asked about infixNotation, I'll talk about that approach also.
When using infixNotation, look at your input and identify what constitutes
grouping, operators, and operands.
The grouping here is easy: it is done using ()'s,
which is pretty standard, and infixNotation treats them as such by default.
Next identify what the lowest-level
operands are. You have two types of operands: integers and single alpha
characters.
The two operators are '*' for multiplication, and ',' for addition.
Since you only asked for suggestions, I'll stop there and let you tackle/struggle
with the next steps on your own.
First of all, this is not homework. I'm studying computer science at home, to learn a little more on my own.
I'm doing an exercise. It says:
Construct a predicate called replaceAtomsString/4 so that, given
a string s as the first parameter, a number N as the second parameter,
and a pair of atoms [g, h] (a list) as the third parameter, it unifies the
fourth parameter with the result of replacing the Nth occurrence of g
in s by h. Example:
replaceAtomsString(sAbbbsAbbasA, 2, [sA, cc], X) should result in
X = sAbbbccbbasA
So my first approach was to turn the string into a list, just like Prolog does with every string. In the end, I built this code:
substitute(X, S, T, Y) :-
    append(S, Xt, X), % i.e. S is the first part of X, the rest is Xt
    !,
    substitute(Xt, S, T, Yt),
    append(T, Yt, Y).
substitute([Xh|Xt], S, T, [Xh|Yt]) :-
    substitute(Xt, S, T, Yt).
But it returns false on every attempt.
Any ideas?
Since your code needs substantial work, here is how to perform the task using the available library predicates.
sub_atom/5 is a rather powerful predicate for handling atoms. Coupled with call_nth/2, the solution is straightforward and more general than what would result from coding the loop around N yourself.
replaceAtomsString(S,N,[G,H],X) :-
    call_nth(sub_atom(S,Before,_,After,G),N),
    sub_atom(S,0,Before,_,Left),
    sub_atom(S,_,After,0,Right),
    atomic_list_concat([Left,H,Right],X).
Example running your query, but leaving N to be computed:
?- replaceAtomsString(sAbbbsAbbasA, N, [sA, cc], X).
N = 1,
X = ccbbbsAbbasA ;
N = 2,
X = sAbbbccbbasA ;
N = 3,
X = sAbbbsAbbacc ;
false.
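For comparison, the same Left + H + Right idea can be sketched in Python. This is an illustrative analogue, not part of the Prolog answer: `replace_nth` is a hypothetical name, and unlike the Prolog predicate (which simply fails) it raises ValueError when there is no Nth occurrence. Like sub_atom/5, it treats every start position as a candidate, so overlapping occurrences are counted.

```python
def replace_nth(s: str, n: int, g: str, h: str) -> str:
    """Replace the n-th (1-based) occurrence of g in s by h.

    The search resumes one character after the previous match, so
    overlapping occurrences count, as with sub_atom/5.
    """
    if n < 1 or not g:
        raise ValueError("n must be >= 1 and g must be non-empty")
    before = -1
    for _ in range(n):
        before = s.find(g, before + 1)
        if before == -1:
            raise ValueError(f"{g!r} occurs fewer than {n} times in {s!r}")
    left = s[:before]            # like sub_atom(S, 0, Before, _, Left)
    right = s[before + len(g):]  # like sub_atom(S, _, After, 0, Right)
    return left + h + right      # like atomic_list_concat([Left,H,Right], X)


print(replace_nth("sAbbbsAbbasA", 2, "sA", "cc"))  # sAbbbccbbasA
```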
I'm sure this is obvious, but I'm a little unclear about it. Suppose I wanted to make a function that does something like f(x) = 3x+1. Knowing the rule for forks, I expect to see something like this: [: 1&+ 3&* which is not that beautiful to me, but I guess is nicer looking than (1&+) @: (3&*) with the extra parentheses. Instead, if I query with 13 : I get this:
13 : '1+3*y'
1 + 3 * ]
Way more beautiful, but I don't understand how it is possible. ] is the identity function, * and + are doing their usual thing, but how are the literals working here? Why is J not attempting to "call" 1 and 3 with arguments as if they are functions? I notice that this continues to do the right thing if I replace any of the constants with [ or ], so I think it is interpreting this as a train of some kind, but I'm not sure.
When J was first described, forks were all verbs (V V V), but then it was decided to let nouns be in the left tine position and return their face value. So (N V V) is seen as a fork as well. In some older code you can see the left tine of the fork show up as a 'verbified' noun such as 1: or 'a'"_ which act as verbs that return their face value when given any argument.
The (N V V) configuration is described in the dictionary as "The train N g h (a noun followed by two verbs) is equivalent to N"_ g h ." http://www.jsoftware.com/help/dictionary/dictf.htm
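To make the rule concrete, here is a small Python model of monadic forks. It is only an illustration of the semantics, not how J is implemented; `fork` and `identity` are hypothetical helpers. A fork (f g h) applied to y computes (f y) g (h y), and a noun in the left tine just contributes its own value. The train 1 + 3 * ] is grouped right to left as 1 + (3 * ]).

```python
import operator
from typing import Any, Callable


def identity(y: Any) -> Any:
    """Stand-in for J's ] used monadically: returns its argument."""
    return y


def fork(f: Any, g: Callable[[Any, Any], Any], h: Callable[[Any], Any]) -> Callable[[Any], Any]:
    """Monadic fork: (f g h) y is (f y) g (h y).

    If f is a noun (here: anything not callable), it contributes its own
    value, which models the N g h rule: (N g h) y is N g (h y).
    """
    def train(y: Any) -> Any:
        left = f(y) if callable(f) else f
        return g(left, h(y))
    return train


# 1 + 3 * ]  is read right to left as the fork  1 + (3 * ])
f = fork(1, operator.add, fork(3, operator.mul, identity))
print(f(2))  # 7
```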
So I'm playing with SymPy in an effort to build a generic solver/generator of physics problems. One component is a function that takes kwargs and, depending on what it receives, rearranges the equation and substitutes values into it. Thanks to SO, I managed to find the things I need for that.
However, when I tried putting sympy.solve in a for loop to generate all those expressions, I ran into... something.
import sympy
R, U, I, eq = sympy.symbols('R U I eq')
eq = R - U/I
for x in 'RUI':
    print(x)
    print(sympy.solve(eq, x))
The output?
R
[U/I]
U
[I*R]
I
[]
However, whenever I do sympy.solve(eq, I) it works and returns [U/R].
Now, I'm guessing the issue is with SymPy using I for the imaginary unit, and with variable shadowing in blocks, but even when I move the symbol declaration (and the equation as well) inside the for loop, I still get the same problem.
I'm not sure I'll need this badly in the end, but this is interesting to say the least.
It's more like an undocumented feature than a bug. The loop for x in 'RUI' is equivalent to for x in ['R', 'U', 'I'], meaning that x runs over one-character strings, not sympy symbols. Insert print(type(x)) in the loop to see this. And note that sympy.solve(eq, 'I') returns [].
The loop for x in [R, U, I] solves correctly for each variable. This is the right way to write this loop.
The surprising thing is that you get anything at all when passing a string as the second argument of solve. Sympy documentation does not list strings among acceptable arguments. Apparently, it tries to coerce the string to a sympy object and does not always guess your meaning correctly: it works with sympy.solve(eq, 'R') but not with sympy.solve(eq, 'I').
The issue is that some sympy functions "accidentally" work with strings as input because they call sympify on their input. But sympify('I') gives the imaginary unit (sqrt(-1)), not Symbol('I').
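A quick way to see this, assuming a reasonably recent SymPy, is to compare what sympify makes of the string "I" with an explicitly created symbol, and then to loop over the Symbol objects instead of over a string.

```python
import sympy

# What solve() ends up with when it is given the string "I"
print(sympy.sympify("I") == sympy.I)            # True: the imaginary unit
print(sympy.sympify("I") == sympy.Symbol("I"))  # False: not your symbol

R, U, I = sympy.symbols("R U I")
eq = R - U / I

# Iterate over the Symbol objects, not over the characters of the string 'RUI'
for x in (R, U, I):
    print(x, sympy.solve(eq, x))
```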
You should always define your symbols explicitly like
R, U, I = sympy.symbols("R U I")
and use those instead of strings.
See https://github.com/sympy/sympy/wiki/Idioms-and-Antipatterns#strings-as-input for more information on why you should avoid using strings with SymPy.
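If the variable names arrive as strings anyway (for example as keyword-argument names, as in the question's kwargs-driven function), one option is to look them up in a dictionary of Symbol objects you created yourself, so that sympify never has to guess. This is a sketch; `SYMBOLS` and `solve_for` are hypothetical names.

```python
import sympy

R, U, I = sympy.symbols("R U I")
eq = R - U / I

# Hypothetical lookup table: maps the names you accept to your own Symbols
SYMBOLS = {sym.name: sym for sym in (R, U, I)}


def solve_for(name, **known):
    """Solve eq for the variable called `name`, then substitute known values.

    Strings are only used as dictionary keys; sympify never sees them.
    """
    target = SYMBOLS[name]
    values = {SYMBOLS[key]: val for key, val in known.items()}
    return [solution.subs(values) for solution in sympy.solve(eq, target)]


print(solve_for("I", U=12, R=4))  # [3]
```

An unknown name raises KeyError instead of silently turning into something like the imaginary unit.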
I have a Maxima program that does some algebra and then writes some things down to an external file. How do I include some calculated values and even small expressions in the name of the file?
A minimal working example (MWE) would be the following:
N:3;
f: erf(x);
tay: taylor(f,x,0,N);
with_stdout("taylor.txt", fortran(tay));
But this example names the file taylor.txt. I wanted something that named the file taylor_N3_f_erf.txt or something like that. I have tried several syntaxes but nothing worked.
Also, I know Maxima is programmed in Lisp, and I learned the syntax for concatenating strings in Lisp, but I haven't figured out how to use that in Maxima.
Thank you very much.
Here's what I came up with. It took some playing around with argument quoting and evaluation in functions but I think it works now.
(%i2) bar (name_base, name_extension, ['vars]) := sconcat (name_base, foo(vars), ".", name_extension) $
(%i3) foo(l) := apply (sconcat, join (makelist ("_", 2 * length (l)), join (l, map (string, map (ev, l))))) $
(%i4) [a, b, c] : [123, '(x + 1), '(y/2)];
                   y
(%o4) [123, x + 1, -]
                   2
(%i5) bar ("foobar", "txt", a, b, c);
(%o5) foobar_a_123_b_x+1_c_y/2.txt
(%i6) myname : bar ("baz", "quux", a, b);
(%o6) baz_a_123_b_x+1.quux
(%i7) with_stdout (myname, print ("HELLO WORLD"));
(%o7) HELLO WORLD
(%i8) printfile ("baz_a_123_b_x+1.quux");
HELLO WORLD
(%o8) baz_a_123_b_x+1.quux
Note that sconcat concatenates strings and string produces a string representation of an expression.
Division expressions could cause trouble, since / is the directory separator in file names, so you may have to substitute something for those characters or for any other disallowed characters. See ssubst.
Note that with_stdout evaluates its first argument, so if you have a variable e.g. myname then the value of myname is the name of the output file.
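The sanitizing step suggested above does not depend on the language. Here it is sketched in Python purely for illustration (in Maxima itself, ssubst would do the substitution). `safe_filename` is a hypothetical helper, and the allow-list of characters is an assumption; adjust it to what your file system accepts.

```python
import re


def safe_filename(base: str, ext: str, **values: object) -> str:
    """Build a name like base_a_123_b_x+1.ext from keyword arguments.

    Characters outside a conservative allow-list (letters, digits, '_', '+',
    '-', '.') become '_', so a '/' from a division cannot act as a path separator.
    """
    parts = [base] + [f"{key}_{value}" for key, value in values.items()]
    stem = re.sub(r"[^A-Za-z0-9_+\-.]", "_", "_".join(parts))
    return f"{stem}.{ext}"


print(safe_filename("foobar", "txt", a=123, b="x+1", c="y/2"))
# foobar_a_123_b_x+1_c_y_2.txt
```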
I'm trying to parse a mathematical expression using pyparsing. I know I could just copy the example calculator from the pyparsing site, but I want to understand it so I can add to it later. I'm here because I tried to understand the example and couldn't, so I tried my best and got to this:
import pyparsing as pp
number = pp.pyparsing_common.integer  # integers, converted to int
symbol = (
    pp.Literal("^") |
    pp.Literal("*") |
    pp.Literal("/") |
    pp.Literal("+") |
    pp.Literal("-")
)
operation = pp.Forward()
atom = pp.Group(
    pp.Literal("(").suppress() + operation + pp.Literal(")").suppress()
) | number
operation <<= (pp.Group(number + symbol + number + pp.ZeroOrMore(symbol + atom)) | atom)
expression = pp.OneOrMore(operation)
print(expression.parseString("9-1+27+(3-5)+9"))
That prints:
[[9, '-', 1, '+', 27, '+', [[3, '-', 5]], '+', 9]]
It works, kinda. I want precedence, with everything sorted into Groups, but after trying a lot, I couldn't find a way to do it. More or less like this:
[[[[9, '-', 1], '+', 27], '+', [3, '-', 5]], '+', 9]
I want to keep it AST-like, because I would like to generate code from it.
I did see operatorPrecedence (a class similar to Forward?), but I don't think I understand how it works either.
EDIT:
I tried operatorPrecedence in more depth and got this:
expression = pp.operatorPrecedence(number, [
    (pp.Literal("^"), 1, pp.opAssoc.RIGHT),
    (pp.Literal("*"), 2, pp.opAssoc.LEFT),
    (pp.Literal("/"), 2, pp.opAssoc.LEFT),
    (pp.Literal("+"), 2, pp.opAssoc.LEFT),
    (pp.Literal("-"), 2, pp.opAssoc.LEFT)
])
This doesn't handle parentheses (I don't know if I will have to postprocess the results), and I need to handle them.
The actual name for this parsing problem is "infix notation" (and in recent versions of pyparsing, I am renaming operatorPrecedence to infixNotation). To see the typical implementation of infix notation parsing, look at the fourFn.py example on the pyparsing wiki. There you will see an implementation of this simplified BNF for 4-function arithmetic, with precedence of operations:
operand :: integer or real number
factor :: operand | '(' expr ')'
term :: factor ( ('*' | '/') factor )*
expr :: term ( ('+' | '-') term )*
So an expression is one or more terms separated by addition or subtraction operations.
A term is one or more factors separated by multiplication or division operations.
A factor is either a lowest-level operand (in this case, just integers or reals), OR an expr enclosed in ()'s.
Note that this is a recursive parser, since factor is used indirectly in the definition of expr, but expr is also used to define factor.
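In pyparsing, this looks roughly like this (integer and real can be any numeric expressions; here they are taken from pyparsing_common):
from pyparsing import Forward, Group, Suppress, ZeroOrMore, oneOf, pyparsing_common
integer = pyparsing_common.integer
real = pyparsing_common.real
LPAR,RPAR = map(Suppress, '()')
expr = Forward()
operand = real | integer
factor = operand | Group(LPAR + expr + RPAR)
term = factor + ZeroOrMore( oneOf('* /') + factor )
expr <<= term + ZeroOrMore( oneOf('+ -') + term )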
Now using expr, you can parse any of these:
3
3+2
3+2*4
(3+2)*4
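To get the nested, left-associative grouping the question asks for with a hand-written grammar like this one, one option is a parse action that folds each flat run such as [9, '-', 1, '+', 27] into nested triples. This is a sketch: `fold_left` is a hypothetical helper, only integers are supported, and the parentheses are deliberately left ungrouped because the inner expr already returns a single nested token.

```python
import pyparsing as pp


def fold_left(tokens):
    """Rebuild a flat [x, op, y, op, z] as [[x, op, y], op, z] (left-associative)."""
    items = tokens.asList()
    tree = items[0]
    for i in range(1, len(items), 2):
        tree = [tree, items[i], items[i + 1]]
    # Return a one-element list so the whole subtree becomes a single token
    return [tree]


LPAR, RPAR = map(pp.Suppress, "()")
expr = pp.Forward()
operand = pp.pyparsing_common.integer
# No Group around the parentheses: the inner expr already yields one nested token
factor = operand | (LPAR + expr + RPAR)
term = (factor + pp.ZeroOrMore(pp.oneOf("* /") + factor)).setParseAction(fold_left)
expr <<= (term + pp.ZeroOrMore(pp.oneOf("+ -") + term)).setParseAction(fold_left)

print(expr.parseString("9-1+27+(3-5)+9")[0])
# [[[[9, '-', 1], '+', 27], '+', [3, '-', 5]], '+', 9]
```

Precedence still comes from the term/expr split; the parse action only changes how each level is nested.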
The infixNotation pyparsing helper method takes care of all the recursive definitions and groupings, and lets you define this as:
from pyparsing import infixNotation, oneOf, opAssoc
expr = infixNotation(operand,
                     [
                         (oneOf('* /'), 2, opAssoc.LEFT),
                         (oneOf('+ -'), 2, opAssoc.LEFT),
                     ])
But this obscures all the underlying theory, so if you are trying to understand how this is implemented, look at the raw solution in fourFn.py.
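infixNotation groups by precedence level but keeps each level flat, e.g. [9, '-', 1, '+', 27, ...]. Each operator tuple also accepts an optional fourth element, a parse action, which can rebuild every level as nested left-associative triples. A sketch, assuming integer operands; `fold_level` is a hypothetical name.

```python
import pyparsing as pp


def fold_level(tokens):
    """Parse action for a LEFT-associative infixNotation level.

    The action receives one group such as [9, '-', 1, '+', 27];
    rebuild it as nested triples [[9, '-', 1], '+', 27].
    """
    items = tokens[0]
    tree = items[0]
    for i in range(1, len(items), 2):
        tree = [tree, items[i], items[i + 1]]
    return [tree]


operand = pp.pyparsing_common.integer
expr = pp.infixNotation(
    operand,
    [
        (pp.oneOf("* /"), 2, pp.opAssoc.LEFT, fold_level),
        (pp.oneOf("+ -"), 2, pp.opAssoc.LEFT, fold_level),
    ],
)

print(expr.parseString("9-1+27+(3-5)+9")[0])
# [[[[9, '-', 1], '+', 27], '+', [3, '-', 5]], '+', 9]
```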
[EDIT - 18 Dec 2022] For those looking for a pre-defined solution, I've packaged infixNotation up into its own pip-installable package called plusminus. plusminus defines a BaseArithmeticParser class for creating a ready-to-run parser and evaluator that supports these operators:
**    ÷     >=    ∈    in     ?:
*     +     ==    ∉    not    |absolute-value|
//    -     !=    ∩    and
/     <     ≠     ∪    ∧
mod   >     ≤     &    or
×     <=    ≥     |    ∨
And these functions:
abs     ceil    max
round   floor   str
trunc   min     bool
The BaseArithmeticParser class allows you to define additional operators and functions for your own domain-specific expressions, and the examples show how to define parsers with custom functions and operators for dice rolling and retail price discounts, among others.