Recursive strategies with additional parameters in Hypothesis - python-hypothesis

Using recursive, I can generate simple ASTs, e.g.
from hypothesis import *
from hypothesis.strategies import *

def trees():
    base = integers(min_value=1, max_value=10).map(lambda n: 'x' + str(n))

    @composite
    def extend(draw, children):
        op = draw(sampled_from(['+', '-', '*', '/']))
        return (op, draw(children), draw(children))

    return recursive(base, extend)
Now I want to change it so I can generate boolean operations in addition to the arithmetical ones. My initial idea is to add a parameter to trees:
def trees(tpe):
    base = integers(min_value=1, max_value=10).map(lambda n: 'x' + str(n) + ': ' + tpe)

    @composite
    def extend(draw, children):
        if tpe == 'bool':
            op = draw(sampled_from(['&&', '||']))
            return (op, draw(children), draw(children))
        elif tpe == 'num':
            op = draw(sampled_from(['+', '-', '*', '/']))
            return (op, draw(children), draw(children))

    return recursive(base, extend)
Ok so far. But how do I mix them? That is, I also want comparison operators and the ternary operator, which would require "calling children with a different parameter", so to speak.
The trees need to be well-typed: if the operation is '||' or '&&', both arguments need to be boolean, arguments to '+' or '<' need to be numbers, etc. If I only had two types, I could just use filter (given a type_of function):
if op in ('&&', '||'):
    bool_trees = children.filter(lambda x: type_of(x) == 'bool')
    return (op, draw(bool_trees), draw(bool_trees))
but in the real case it wouldn't be acceptable.
Does recursive support this? Or is there another way? Obviously, I can directly define trees recursively, but that runs into the standard problems.

You can simply describe trees where the operator is drawn from either set of operations - in this case trivially by sampling from ['&&', '||', '+', '-', '*', '/'].
def trees():
    return recursive(
        integers(min_value=1, max_value=10).map('x{}'.format),
        lambda node: tuples(sampled_from('&& || + - * /'.split()), node, node)
    )
But of course that won't be well-typed (except perhaps by rare coincidence). I think the best option for well-typed ASTs is:
For each type, define a strategy for trees which evaluate to that type. The base case is simply (a strategy for) a value of that type.
The extension is to pre-calculate the possible combinations of types and operations that would generate a value of this type, using mutual recursion via st.deferred. That would look something like...
bool_strat = deferred(
    lambda: one_of(
        booleans(),
        tuples(sampled_from(["and", "or"]), bool_strat, bool_strat),
        tuples(sampled_from(["==", "!=", "<", ...]), integer_strat, integer_strat),
    )
)
integer_strat = deferred(
    lambda: one_of(
        integers(),
        tuples(sampled_from("+ - * /".split()), integer_strat, integer_strat),
    )
)
any_type_ast = bool_strat | integer_strat
And it will work as if by magic :D
(on the other hand, this is a fair bit more complex - if your workaround is working for you, don't feel obliged to do this instead!)
If you're seeing problematic blowups in size - which should be very rare, as the engine has had a lot of work since that article was written - there's honestly not much to do about it. Threading a depth limit through the whole thing and decrementing it each step does work as a last resort, but it's not nice to work with.
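To illustrate the depth-limit idea from the last sentence, here is a plain-Python sketch (not the Hypothesis API; `gen_tree` and the 0.3 stop probability are made up for illustration):

```python
import random

# Illustrative only: "thread a depth limit through the whole thing and
# decrement it each step". The stop probability is arbitrary.
def gen_tree(depth, rng=random):
    if depth == 0 or rng.random() < 0.3:
        return 'x' + str(rng.randint(1, 10))  # leaf, mirroring the base strategy
    op = rng.choice(['+', '-', '*', '/'])
    return (op, gen_tree(depth - 1, rng), gen_tree(depth - 1, rng))
```

The limit guarantees termination, but as noted above it pollutes every signature with an extra parameter.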

The solution I used for now is to adapt the generated trees so that, e.g., if a num tree is generated where the operation needs a bool, I also draw a comparison operator op and a constant const and return (op, tree, const):
def make_bool(tree, draw):
    if type_of(tree) == 'bool':
        return tree
    elif type_of(tree) == 'num':
        op = draw(sampled_from(comparison_ops))
        const = draw(integers())
        side = draw(booleans())
        return (op, tree, const) if side else (op, const, tree)

# in def extend:
if tpe == 'bool':
    op = draw(sampled_from(bool_ops + comparison_ops))
    if op in bool_ops:
        return (op, make_bool(draw(children), draw), make_bool(draw(children), draw))
    else:
        return (op, make_num(draw(children), draw), make_num(draw(children), draw))
Unfortunately, it's specific to ASTs and will mean specific kinds of trees are generated more often. So I'd still be happy to see better alternatives.

Related

How to recursively simplify a mathematical expression with AST in python3?

I have this mathematical expression:
tree = ast.parse('1 + 2 + 3 + x')
which corresponds to this abstract syntax tree:
Module(body=[Expr(value=BinOp(left=BinOp(left=BinOp(left=Num(n=1), op=Add(), right=Num(n=2)), op=Add(), right=Num(n=3)), op=Add(), right=Name(id='x', ctx=Load())))])
and I would like to simplify it - that is, get this:
Module(body=[Expr(value=BinOp(left=Num(n=6), op=Add(), right=Name(id='x', ctx=Load())))])
According to the documentation, I should use the NodeTransformer class. A suggestion in the docs says the following:
Keep in mind that if the node you’re operating on has child nodes you
must either transform the child nodes yourself or call the
generic_visit() method for the node first.
I tried implementing my own transformer:
class Evaluator(ast.NodeTransformer):
    def visit_BinOp(self, node):
        print('Evaluating ', ast.dump(node))
        for child in ast.iter_child_nodes(node):
            self.visit(child)
        if type(node.left) == ast.Num and type(node.right) == ast.Num:
            print(ast.literal_eval(node))
            return ast.copy_location(ast.Subscript(value=ast.literal_eval(node)), node)
        else:
            return node
What it should do in this specific case is simplify 1 + 2 into 3, and then 3 + 3 into 6.
It does simplify the binary operations I want to simplify, but it doesn't update the original Syntax Tree. I tried different approaches but I still don't get how I can recursively simplify all binary operations (in a depth-first manner). Could anyone point me in the right direction?
Thank you.
There are three possible return values for the visit_* methods:
None which means the node will be deleted,
node (the node itself) which means no change will be applied,
A new node, which will replace the old one.
So when you want to replace the BinOp with a Num you need to return a new Num node. The evaluation of the expression cannot be done via ast.literal_eval as this function only evaluates literals (not arbitrary expressions). Instead you can use eval for example.
So you could use the following node transformer class:
import ast

class Evaluator(ast.NodeTransformer):
    ops = {
        ast.Add: '+',
        ast.Sub: '-',
        ast.Mult: '*',
        ast.Div: '/',
        # define more here
    }

    def visit_BinOp(self, node):
        self.generic_visit(node)
        if isinstance(node.left, ast.Num) and isinstance(node.right, ast.Num):
            # On Python <= 3.6 you can use ast.literal_eval.
            # value = ast.literal_eval(node)
            value = eval(f'{node.left.n} {self.ops[type(node.op)]} {node.right.n}')
            return ast.Num(n=value)
        return node

tree = ast.parse('1 + 2 + 3 + x')
tree = ast.fix_missing_locations(Evaluator().visit(tree))
print(ast.dump(tree))
tree = ast.parse('1 + 2 + 3 + x')
tree = ast.fix_missing_locations(Evaluator().visit(tree))
print(ast.dump(tree))
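If calling eval on constructed strings feels risky, the same transformer can map AST operator types directly to functions from the operator module instead. A sketch under that assumption (ast.Num is kept to match the answer; newer Pythons spell it ast.Constant):

```python
import ast
import operator

class Folder(ast.NodeTransformer):
    # map AST operator node types to plain functions instead of eval'ing strings
    ops = {ast.Add: operator.add, ast.Sub: operator.sub,
           ast.Mult: operator.mul, ast.Div: operator.truediv}

    def visit_BinOp(self, node):
        self.generic_visit(node)  # fold children first (depth-first)
        if isinstance(node.left, ast.Num) and isinstance(node.right, ast.Num):
            value = self.ops[type(node.op)](node.left.n, node.right.n)
            return ast.copy_location(ast.Num(n=value), node)
        return node

tree = ast.fix_missing_locations(Folder().visit(ast.parse('1 + 2 + 3 + x')))
print(ast.dump(tree))
```

This avoids the string round-trip entirely, at the cost of listing each supported operator explicitly.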

Representing a constant symbol in Sympy such that it is not a free_symbol

Application
I want to create a python function (e.g., Laplacian(expr)). The Laplacian operator is defined as taking the sum of the second partial derivatives of expr with respect to each variable (e.g., Laplacian(f(x,y,z)) is diff(f,x,x) + diff(f,y,y) + diff(f,z,z). In the expression, there may be arbitrary constants c,k, etc that are not variables as far as the expression is concerned. Just as you cannot take the derivative diff(f,126), taking the derivative of the expression with respect to c is not defined.
Need
I need to be able to extract the non-constant free symbols from an expression.
Problem
Though I can construct c = Symbol('c', constant=True, number=True) in Sympy, c.is_constant() evaluates to False. Similarly, g(c).is_constant() evaluates to False. For my application, the symbol c should behave exactly like E, for which E.is_constant() == True and g(E).is_constant() == True, as it is a number.
Caveats
I cannot register c as a singleton, as it is only defined with respect to this particular proof or expression.
I cannot construct it in the same way values like E are constructed, as there is no specific numeric value for it to be assigned to.
I cannot simply add a constants keyword to Laplacian, as I do not know all such constants that may appear (just as it would not make sense to add constants=[1,2,3,4,...] to solve()).
I cannot simply add a variables keyword to Laplacian, as I do not know the variables that appear in the expression.
The desired usage is as follows:
>>> C = ... # somehow create the constant
>>> symbols_that_arent_constant_numbers(g(C))
set()
>>> symbols_that_arent_constant_numbers(g(C, x))
{x}
>>> g(C).is_constant()
True
stretch goals: It would be awesome to have an arbitrary constant symbol that absorbs other constant terms in the same way that constantsimp operates. Consider introducing an integration constant c into an expression, and then multiplying that expression by I. As far as we are concerned algebraically, c*I = c without losing any generality.
Note
Per Oscar Benjamin's comments on the question, current best practice when constructing a sympy-style method (like Laplacian) is to pass a constants or variables keyword into the method. Bear that in mind when applying the following solution. Furthermore, free_symbols has many applications within Sympy, so using another class that has established semantics may have unexpected side-effects.
(I am not accepting my own solution in the event that a better one comes along, as Mr. Benjamin has pointed out there are many open related issues.)
Solution
Sympy provides a mechanism to create such a constant: sympy.physics.units.quantities.Quantity. Its behavior is equivalent to Symbol and singleton constants, but most notably it does not appear as a free symbol. This can help prevent code from interpreting it as a variable that may be differentiated, etc.
from sympy import Function, Symbol, solve
from sympy.physics.units.quantities import Quantity

x = Symbol('x')
g = Function('g')
C = Quantity('C')

print("C constant? : ", C.is_constant())
print("C free symbols : ", C.free_symbols)
print("x constant? : ", x.is_constant())
print("g(C) constant? : ", g(C).is_constant())
print("g(x) constant? : ", g(x).is_constant())
print("g(C,x) constant : ", g(C,x).is_constant())
print("g(C) free symbols : ", g(C).free_symbols)
print("g(C,x) free symbols: ", g(C,x).free_symbols)

assert C.is_constant()
assert C.free_symbols == set()
assert g(C).is_constant()
assert g(C, x).is_constant() == g(x).is_constant()  # consistent interface
assert g(C).free_symbols == set()
assert g(C, x).free_symbols == {x}
assert [5/C] == solve(C*x - 5, x)
The above snippet produces the following output when tested in sympy==1.5.1:
C constant? : True
C free symbols : set()
x constant? : False
g(C) constant? : True
g(x) constant? : None
g(C,x) constant : None
g(C) free symbols : set()
g(C,x) free symbols: {x}
Note that while g(C).is_constant()==True, we see that g(x).is_constant() == None, as well as g(C,x).is_constant() == None. Consequently, I only assert that those two applications have a consistent interface.
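To close the loop on the original application: with a Quantity-based constant, a Laplacian helper can simply differentiate over free_symbols and the constant is skipped automatically. This is a hypothetical sketch (`laplacian` is not SymPy API):

```python
from sympy import diff, symbols
from sympy.physics.units.quantities import Quantity

x, y = symbols('x y')
C = Quantity('C')  # a constant that never appears in free_symbols

def laplacian(expr):
    # sum of second partials over every non-constant free symbol
    return sum(diff(expr, s, 2) for s in expr.free_symbols)

print(laplacian(C * x**2 + y**2))
```

The result is 2*C + 2: C survives differentiation as a coefficient but is never differentiated against.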

Basic first order logic inference fails for symmetric binary predicate

Super basic question. I am trying to express a symmetric relationship between two binary predicates (parent and child). But, with the following statement, my resolution prover allows me to prove anything. The converted CNF form makes sense to me as does the proof by resolution, but this should be an obvious case for false. What am I missing?
forall x,y (is-parent-of(x,y) <-> is-child-of(y,x))
I am using the nltk python library and the ResolutionProver prover. Here is the nltk code:
from nltk.sem import Expression as exp
from nltk.inference import ResolutionProver as prover
s = exp.fromstring('all x.(all y.(parentof(y, x) <-> childof(x, y)))')
q = exp.fromstring('foo(Bar)')
print prover().prove(q, [s], verbose=True)
output:
[1] {-foo(Bar)} A
[2] {-parentof(z9,z10), childof(z10,z9)} A
[3] {parentof(z11,z12), -childof(z12,z11)} A
[4] {} (2, 3)
True
Here is a quick fix for the ResolutionProver.
The issue that causes the prover to be unsound is that it does not implement the resolution rule correctly when there is more than one complementary literal. E.g. given the clauses {A B C} and {-A -B D} binary resolution would produce the clauses {A -A C D} and {B -B C D}. Both would be discarded as tautologies. The current NLTK implementation instead would produce {C D}.
This was probably introduced because clauses are represented in NLTK as lists, so identical literals may occur more than once within a clause. This rule does correctly produce an empty clause when applied to the clauses {A A} and {-A -A}, but in general it is not correct.
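The correct rule can be checked with a tiny stand-alone sketch (propositional only; literals are plain strings with '-' marking negation, and nothing here is NLTK API):

```python
def resolvents(c1, c2):
    # all binary resolvents: resolve on ONE complementary pair at a time
    out = []
    for lit in c1:
        comp = lit[1:] if lit.startswith('-') else '-' + lit
        if comp in c2:
            rest = [l for l in c1 if l != lit] + [l for l in c2 if l != comp]
            out.append(sorted(set(rest)))
    return out

print(resolvents(['A', 'B', 'C'], ['-A', '-B', 'D']))
# [['-B', 'B', 'C', 'D'], ['-A', 'A', 'C', 'D']]
```

Both resolvents are tautologies, exactly as described above, so neither {C D} nor anything else unsound is derived.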
It seems that if we keep clauses free from repetitions of identical literals we can regain soundness with a few changes.
First, define a function that removes identical literals. Here is a naive implementation of such a function:
import nltk.inference.resolution as res
def _simplify(clause):
"""
Remove duplicate literals from a clause
"""
duplicates=[]
for i,c in enumerate(clause):
if i in duplicates:
continue
for j,d in enumerate(clause[i+1:],start=i+1):
if j in duplicates:
continue
if c == d:
duplicates.append(j)
result=[]
for i,c in enumerate(clause):
if not i in duplicates:
result.append(clause[i])
return res.Clause(result)
Now we can plug this function into some of the functions of the nltk.inference.resolution module.
def _iterate_first_fix(first, second, bindings, used, skipped, finalize_method, debug):
    """
    This method facilitates movement through the terms of 'self'
    """
    debug.line('unify(%s,%s) %s' % (first, second, bindings))
    if not len(first) or not len(second):  # if no more recursions can be performed
        return finalize_method(first, second, bindings, used, skipped, debug)
    else:
        # explore this 'self' atom
        result = res._iterate_second(first, second, bindings, used, skipped, finalize_method, debug+1)
        # skip this possible 'self' atom
        newskipped = (skipped[0] + [first[0]], skipped[1])
        result += res._iterate_first(first[1:], second, bindings, used, newskipped, finalize_method, debug+1)
        try:
            newbindings, newused, unused = res._unify_terms(first[0], second[0], bindings, used)
            # Unification found, so progress with this line of unification.
            # Put skipped and unused terms back into play for later unification.
            newfirst = first[1:] + skipped[0] + unused[0]
            newsecond = second[1:] + skipped[1] + unused[1]
            result += _simplify(finalize_method(newfirst, newsecond, newbindings, newused, ([], []), debug))
        except res.BindingException:
            pass
        return result

res._iterate_first = _iterate_first_fix
Similarly update res._iterate_second
def _iterate_second_fix(first, second, bindings, used, skipped, finalize_method, debug):
    """
    This method facilitates movement through the terms of 'other'
    """
    debug.line('unify(%s,%s) %s' % (first, second, bindings))
    if not len(first) or not len(second):  # if no more recursions can be performed
        return finalize_method(first, second, bindings, used, skipped, debug)
    else:
        # skip this possible pairing and move to the next
        newskipped = (skipped[0], skipped[1] + [second[0]])
        result = res._iterate_second(first, second[1:], bindings, used, newskipped, finalize_method, debug+1)
        try:
            newbindings, newused, unused = res._unify_terms(first[0], second[0], bindings, used)
            # Unification found, so progress with this line of unification.
            # Put skipped and unused terms back into play for later unification.
            newfirst = first[1:] + skipped[0] + unused[0]
            newsecond = second[1:] + skipped[1] + unused[1]
            result += _simplify(finalize_method(newfirst, newsecond, newbindings, newused, ([], []), debug))
        except res.BindingException:
            # the atoms could not be unified
            pass
        return result

res._iterate_second = _iterate_second_fix
Finally, plug our function into clausify() to ensure the inputs are repetition-free.
def clausify_simplify(expression):
    """
    Skolemize, clausify, and standardize the variables apart.
    """
    clause_list = []
    for clause in res._clausify(res.skolemize(expression)):
        for free in clause.free():
            if res.is_indvar(free.name):
                newvar = res.VariableExpression(res.unique_variable())
                clause = clause.replace(free, newvar)
        clause_list.append(_simplify(clause))
    return clause_list

res.clausify = clausify_simplify
After applying these changes the prover should run the standard tests and also deal correctly with the parentof/childof relationships.
print res.ResolutionProver().prove(q, [s], verbose=True)
output:
[1] {-foo(Bar)} A
[2] {-parentof(z144,z143), childof(z143,z144)} A
[3] {parentof(z146,z145), -childof(z145,z146)} A
[4] {childof(z145,z146), -childof(z145,z146)} (2, 3) Tautology
[5] {-parentof(z146,z145), parentof(z146,z145)} (2, 3) Tautology
[6] {childof(z145,z146), -childof(z145,z146)} (2, 3) Tautology
False
Update: Achieving correctness is not the end of the story. A more efficient solution would be to replace the container used to store literals in the Clause class with the one based on built-in Python hash-based sets, however that seems to require a more thorough rework of the prover implementation and introducing some performance testing infrastructure as well.
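The set-based representation suggested in that update can be sketched in a few lines (illustrative only, not integrated with NLTK's Clause class):

```python
def is_tautology(clause):
    # a clause is a tautology if it contains a literal and its negation
    return any('-' + lit in clause for lit in clause if not lit.startswith('-'))

clause = frozenset(['A', '-A', 'A'])  # duplicates collapse automatically
assert len(clause) == 2
assert is_tautology(clause)
```

With frozensets, the `_simplify` pass above becomes unnecessary and tautology checks are O(n) membership tests.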

Python: NameError: name "string" is not defined, not via input()

The function below checks whether the checksum of the first 9 digits of the string n matches the 10th character (a digit, or X standing for 10).
def isISBN(n):
    checkSum = 0
    for i in range(9):
        checkSum = checkSum + (eval(n[i])*(i+1))
    if checkSum%11 == eval(n[9]) or (checkSum%11 == 10 and n[9] == 'X'): return True
    else: return False
When I run the function for n='020103803X' I get an error:
NameError: name 'X' is not defined
I've searched for this problem and found that most people's issues were with input() or raw_input(), but as I am not using input(), I'm confused as to why I can't test if a character is a specific string. This is my first post as Python beginner, please tell if I'm breaking rules or what extra info I should include.
The problem is with your use of eval: eval('X') is the same as doing X (without the quotes). Python sees that as a variable reference, and you have no variable named X.
There is no reason to use eval here. What are you hoping to accomplish? Perhaps you should check whether the character is a digit before converting it:
if (n[9].isdigit() and checkSum%11 == int(n[9])) or (checkSum%11 == 10 and n[9] == 'X'): return True
You're trying to get a response from
eval('X')
This is illegal, as you have no symbol 'X' defined.
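The failure is easy to reproduce in isolation (just a demonstration of the name lookup, separate from the checksum logic):

```python
# eval('X') compiles and runs the expression `X`, which is a variable lookup
try:
    eval('X')
except NameError as e:
    print(e)  # name 'X' is not defined
```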
If you switch the order of your if check, you can pass legal ISBNs. However, it still fails on invalid codes with an X at the end.
def isISBN(n):
    checkSum = 0
    for i in range(9):
        checkSum = checkSum + (eval(n[i])*(i+1))
    if (checkSum%11 == 10 and n[9] == 'X') or \
       checkSum%11 == eval(n[9]):
        return True
    else:
        return False
Note also that you can short-cut that return logic by simply returning the expression value:
return (checkSum%11 == 10 and n[9] == 'X') or \
checkSum%11 == eval(n[9])
eval is not the proper tool here, nor is the way you use it correct. For example, see the Wikipedia article on eval, which shows its uses. You probably want to use a try/except pair:
try:
    int(n[i])
except ValueError:
    print "this character is not a digit"
A call to eval is sometimes used by inexperienced programmers for all
sorts of things. In most cases, there are alternatives which are more
flexible and do not require the speed penalty of parsing code.
For instance, eval is sometimes used for a simple mail merge facility,
as in this PHP example:
$name = 'John Doe';
$greeting = 'Hello';
$template = '"$greeting,
$name! How can I help you today?"';
print eval("return $template;");
Although this works, it can cause some security problems (see §
Security risks), and will be much slower than other possible
solutions. A faster and more secure solution would be changing the
last line to echo $template; and removing the single quotes from the
previous line, or using printf.
eval is also sometimes used in applications needing to evaluate math
expressions, such as spreadsheets. This is much easier than writing an
expression parser, but finding or writing one would often be a wiser
choice. Besides the fixable security risks, using the language's
evaluation features would most likely be slower, and wouldn't be as
customizable.
Perhaps the best use of eval is in bootstrapping a new language (as
with Lisp), and in tutoring programs for languages which allow users
to run their own programs in a controlled environment.
For the purpose of expression evaluation, the major advantage of eval
over expression parsers is that, in most programming environments
where eval is supported, the expression may be arbitrarily complex,
and may include calls to functions written by the user that could not
have possibly been known in advance by the parser's creator. This
capability allows you to effectively augment the eval() engine with a
library of functions that you can enhance as needed, without having to
continually maintain an expression parser. If, however, you do not
need this ultimate level of flexibility, expression parsers are far
more efficient and lightweight.
Thanks everyone. I don't know how I didn't think of using int(). The reason I used eval() was because the past few programs I wrote required something like
x = eval(input("Input your equation: "))
Anyways the function works now.
def isISBN(n):
    checkSum = 0
    for i in range(9):
        checkSum = checkSum + (int(n[i])*(i+1))
    if n[9] == 'X':
        if checkSum%11 == 10: return True
        else: return False
    elif checkSum%11 == int(n[9]): return True
    else: return False
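For reference, the accepted fix can be condensed into a single weighted sum (a sketch; `is_isbn10` is a made-up name, and it assumes n is a well-formed 10-character string):

```python
def is_isbn10(n):
    # weighted sum of the first nine digits, modulo 11
    check = sum(int(d) * i for i, d in enumerate(n[:9], start=1)) % 11
    return check == (10 if n[9] == 'X' else int(n[9]))

print(is_isbn10('020103803X'))  # True
```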

Use literal operators (eg "and", "or") in Groovy expressions?

My current work project allows user-provided expressions to be evaluated in specific contexts, as a way for them to extend and influence the workflow. These expressions are the usual logical ones. To make it a bit more palatable for non-programmers, I'd like to give them the option of using literal operators (e.g. and, or, not instead of &&, ||, !).
A simple search & replace is not sufficient, as the data might contain those words within quotes, and building a parser, while doable, may not be the most elegant and efficient solution.
To make the question clear: is there a way in Groovy to allow the users to write
x > 10 and y = 20 or not z
but have Groovy evaluate it as if it were:
x > 10 && y == 20 || !z
Thank you.
Recent versions of Groovy support Command chains, so it's indeed possible to write this:
compute x > 10 and y == 20 or not(z)
The word "compute" here is arbitrary, but it cannot be omitted, because it's the first "verb" in the command chain. Everything that follows alternates between verb and noun:
compute x > 10 and y == 20 or not(z)
───┬─── ──┬─── ─┬─ ───┬─── ┬─ ──┬───
  verb   noun  verb  noun verb noun
A command chain is compiled like this:
verb(noun).verb(noun).verb(noun)...
so the example above is compiled to:
compute(x > 10).and(y == 20).or(not(z))
There are many ways to implement this. Here is just a quick & dirty proof of concept, that doesn't implement operator precedence, among other things:
class Compute {
    private value
    Compute(boolean v) { value = v }
    def or (boolean w) { value = value || w; this }
    def and(boolean w) { value = value && w; this }
    String toString() { value }
}

def compute(v) { new Compute(v) }
def not(boolean v) { !v }
You can use command chains by themselves (as top-level statements) or to the right-hand side of an assignment operator (local variable or property assignment), but not inside other expressions.
If you can swap operators like > and = for the facelets-like gt and eq, respectively, I think your case may be doable, though it will require a lot of effort:
x gt 10 and y eq 20 or not z
resolves to:
x(gt).10(and).y(eq).20(or).not(z)
And this will be hell to parse.
The way @Brian Henry suggested is the easiest way, though not user-friendly, since it needs the parens and dots.
Well, considering we can swap the operators, you could try to intercept Integer.call to start expressions. Having the missing properties in a script resolve to operations can solve your new-keywords problem. Then you can build expressions, save them to a list, and execute them at the end of the script. It's not finished, but I came up with this:
// the operators that can be used in the script
enum Operation { eq, and, gt, not }

// every unresolved variable here will try to be resolved as an Operation
def propertyMissing(String property) { Operation.find { it.name() == property } }

// a class to contain what should be executed in the end of the script
@groovy.transform.ToString
class Instruction { def left; Operation operation; def right }

// a class to handle the next allowed tokens
class Expression {
    Closure handler
    Instruction instruction
    def methodMissing(String method, args) {
        println "method=$method, args=$args"
        handler method, args
    }
}

// a list to contain the instructions that will need to be parsed
def instructions = []

// the start of the whole mess: an integer will get this called
Integer.metaClass {
    call = { Operation op ->
        instruction = new Instruction(operation: op, left: delegate)
        instructions << instruction
        new Expression(
            instruction: instruction,
            handler: { String method, args ->
                instruction.right = method.toInteger()
                println instructions
                this
            })
    }
}
x = 12
y = 19
z = false
x gt 10 and y eq 20 or not z
This will give an exception, due to the not() part not being implemented, but it can build two Instruction objects before failing:
[Instruction(12, gt, 10), Instruction(19, eq, 20)]
Not sure if it is worth it.
The GDK tacks on and() and or() methods to Boolean. If you supplied a method like
Boolean not(Boolean b) {return !b}
you could write something like
(x > 10).and(y == 20).or(not(4 == 1))
I'm not sure that's particularly easy to write, though.
