Given a Hypothesis Strategy, can I get the minimal example? - python-hypothesis

I'd like to get the minimum for a complicated class, for which I have already written a strategy.
Is it possible to ask hypothesis to simply give me the minimum example for a given strategy?
Context here is writing a strategy and a default minimum (for use in @hypothesis.example) - surely the information for the latter is already contained in the former?
import dataclasses
import hypothesis
from hypothesis import strategies

@dataclasses.dataclass
class Foo:
    bar: int
    # Foo has many more attributes which are strategised over...

    @classmethod
    def strategy(cls):
        return strategies.builds(cls, bar=strategies.integers())

    @classmethod
    def minimal(cls):
        # hypothetical API I'm hoping exists
        return hypothesis.minimal(cls.strategy())

Found the answer.
Use hypothesis.find:
"""Returns the minimal example from the given strategy specifier that
matches the predicate function condition.
So we want the following code for the above example:
minimal = hypothesis.find(
specifier=Foo.strategy(),
condition=lambda _: True,
)

You're correct that hypothesis.find(s, lambda _: True) is the way to do this.
Context here is writing a strategy and a default minimum (for use in @hypothesis.example()) - surely the information for the latter is already contained in the former?
It's not just contained in it, but generating examples will almost always start by generating the minimal example* - because if that fails, we can stop immediately! So you might not need to do this at all ;-)
(the rare exceptions are strategies with complicated filters, where the attempted-minimal example is rejected by a filter - and we just move on to reasonably-minimal examples rather than shrinking to find the minimal one)
I'd also note that st.builds() has native support for dataclasses, so you can omit the strategies (or pass hypothesis.infer) for any argument where you don't have more specific constraints than "any instance of the type". For example, there's no need to pass bar=st.integers() above!
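For reference, a minimal sketch combining both points: builds() infers a strategy for bar from the dataclass annotation, and find() with an always-true predicate returns the shrunken minimum (the printed value is what I'd expect, not a guarantee):

import dataclasses
from hypothesis import find, strategies as st

@dataclasses.dataclass
class Foo:
    bar: int

# builds() picks up bar: int from the annotation, so no explicit strategy is needed
minimal = find(st.builds(Foo), lambda _: True)
print(minimal)  # typically Foo(bar=0)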

Related

In Python, what are the differences between a function defined outside a class and a method defined inside it using staticmethod?

I have been working on a very dense set of calculations, all in support of a specific problem I have.
But the nature of the problem is no different than this. Suppose I develop a class called 'Matrix' that has the machinery to implement matrices. Instantiation would presumably take a list of lists, which would be the matrix entries.
Now I want to provide a multiply method. I have two choices. First, I could define a method like so:
class Matrix:
    def __init__(self, entries):
        # do the obvious here
        return

    def determinant(self):
        # again, do the obvious here
        return result_of_calcs

    def multiply(self, b):
        # again, do the obvious here
        return
If I do this, the call signature for two matrix objects, a and b, is
a.multiply(b)...
The other choice is a @staticmethod. Then, the definition looks like:
@staticmethod
def multiply(a, b):
    # do the obvious thing.
    return
Now the call signature is:
z = Matrix.multiply(a, b)
I am unclear when one is better than the other. The free-standing function is not truly part of the class definition, but who cares? It gets the job done, and because Python allows reaching into an object from outside, it seems able to do everything. In practice the class and the function will end up in the same module, so they are at least linked there.
On the other hand, my understanding of the @staticmethod approach is that the function is now part of the class definition (it defines one of the methods), but the method gets no self passed in. In a way this is nice because the call signature looks much better:
z = Matrix.multiply(a, b)
and the function can still access the methods and attributes of both instances through its arguments.
Is this the right way to view it? Are there strong reasons to do one or the other? In what ways are they not equivalent?
I have done quite a bit of Python programming since asking this question.
Suppose we have a file named matrix.py, and it has a bunch of code for manipulating matrices. We want to provide a matrix multiply method.
The two approaches are:
1. define a free-standing function with the signature multiply(x, y)
2. make it a method of all matrices: x.multiply(y)
Matrix multiply is what I will call a dyadic function. In other words, it always takes two arguments.
The temptation is to use option 2, so that a matrix object "carries with it everywhere" the ability to be multiplied. However, the only thing it makes sense to multiply it with is another matrix object. In such cases there are two equally good ways to do that, viz:
z=x.multiply(y)
or
z=y.multiply(x)
However, a better way is to define a free-standing function in the file itself:
multiply(x,y)
multiply(), as such, is a function any code using the 'library' expects to have available. It need not be attached to each matrix, and since the user will be doing an import, they will get the multiply function anyway. This is better code.
What I was wrongly conflating was two things, which led me to attach the method to every object instance:
1. functions which are needed inside the file and should also be exposed outside it; and
2. functions which are needed only inside the file.
multiply() is an example of type 1. Any matrix 'library' ought to define matrix multiplication.
What I was worried about was needing to expose all the 'internal' functions. For example, suppose we want to make add(), multiply() and invert() externally available, but did not want to expose - while still needing internally - determinant().
One way to 'protect' users is to make determinant a function (a def statement) inside the class declaration for matrices. Then it is protected from exposure. However, nothing stops a user of the code from reaching in if they know the internals, by using the method matrix.determinant().
In the end it comes down to convention, largely. It makes more sense to expose a matrix multiply function which takes two matrices and is called like multiply(x, y). As for the determinant function, instead of 'wrapping it' in the class, it makes more sense to define it as _determinant(x) at the same level as the class definition for matrices (see the sketch below).
You can never truly protect internal functions by how you declare them, it seems. The best you can do is warn users: the leading-underscore convention signals 'this is not expected to be called outside the code in this file'.
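A sketch of the layout described above, as a hypothetical matrix.py; the Matrix class here is a minimal stand-in:

class Matrix:
    def __init__(self, entries):
        self.entries = entries  # list of row lists

def _determinant(m):
    # module-private helper by convention; needed inside, not exposed
    ...

def multiply(x, y):
    # public, dyadic: takes two Matrix instances, returns a new Matrix
    rows = [
        [sum(a * b for a, b in zip(row, col)) for col in zip(*y.entries)]
        for row in x.entries
    ]
    return Matrix(rows)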

Is using eval() in this case a good idea in python? [duplicate]

I use the following (Python 2) class to easily store data about my songs.
class Song:
    """The class to store the details of each song"""
    attsToStore = ('Name', 'Artist', 'Album', 'Genre', 'Location')

    def __init__(self):
        for att in self.attsToStore:
            exec 'self.%s=None' % (att.lower()) in locals()

    def setDetail(self, key, val):
        if key in self.attsToStore:
            exec 'self.%s=val' % (key.lower()) in locals()
I feel that this is just much more extensible than writing out an if/else block. However, I have heard that eval is unsafe. Is it? What is the risk? How can I solve the underlying problem in my class (setting attributes of self dynamically) without incurring that risk?
Yes, using eval is a bad practice. Just to name a few reasons:
- there is almost always a better way to do it
- it is very dangerous and insecure
- it makes debugging difficult
- it is slow
In your case you can use setattr instead:
class Song:
    """The class to store the details of each song"""
    attsToStore = ('Name', 'Artist', 'Album', 'Genre', 'Location')

    def __init__(self):
        for att in self.attsToStore:
            setattr(self, att.lower(), None)

    def setDetail(self, key, val):
        if key in self.attsToStore:
            setattr(self, key.lower(), val)
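For instance (hypothetical values):

s = Song()
s.setDetail('Name', 'History')
print(s.name)    # 'History'
print(s.artist)  # None, set in __init__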
There are some cases where you have to use eval or exec, but they are rare. Using eval in your case is certainly a bad practice. I'm emphasizing "bad practice" because eval and exec are frequently used in the wrong place.
Replying to the comments:
It looks like some disagree that eval is 'very dangerous and insecure' in the OP case. That might be true for this specific case but not in general. The question was general and the reasons I listed are true for the general case as well.
Using eval is weak, not a clearly bad practice.
It violates the "Fundamental Principle of Software". Your source is not the sum total of what's executable. In addition to your source, there are the arguments to eval, which must be clearly understood. For this reason, it's the tool of last resort.
It's usually a sign of thoughtless design. There's rarely a good reason for dynamic source code, built on-the-fly. Almost anything can be done with delegation and other OO design techniques.
It leads to relatively slow on-the-fly compilation of small pieces of code, an overhead which can be avoided by using better design patterns.
As a footnote, in the hands of deranged sociopaths, it may not work out well. However, when confronted with deranged sociopathic users or administrators, it's best to not give them interpreted Python in the first place. In the hands of the truly evil, Python can be a liability; eval doesn't increase the risk at all.
Yes, it is:
Hack using Python:
>>> eval(input())
"__import__('os').listdir('.')"
... directory listing follows ...
The below code will list all tasks running on a Windows machine:
>>> eval(input())
"__import__('subprocess').Popen(['tasklist'],stdout=__import__('subprocess').PIPE).communicate()[0]"
In Linux:
>>> eval(input())
"__import__('subprocess').Popen(['ps', 'aux'],stdout=__import__('subprocess').PIPE).communicate()[0]"
(These transcripts are from Python 2, where input() itself evaluates what is typed; in Python 3 you would type the payload without the surrounding quotes.)
In this case, yes. Instead of
exec 'self.Foo=val'
you should use the builtin function setattr:
setattr(self, 'Foo', val)
Other users pointed out how your code can be changed as to not depend on eval; I'll offer a legitimate use-case for using eval, one that is found even in CPython: testing.
Here's one example I found in test_unary.py, where a test checks whether (+|-|~)b'a' raises a TypeError:
def test_bad_types(self):
    for op in '+', '-', '~':
        self.assertRaises(TypeError, eval, op + "b'a'")
        self.assertRaises(TypeError, eval, op + "'a'")
The usage is clearly not bad practice here; you define the input and merely observe behavior. eval is handy for testing.
Take a look at this search for eval, performed on the CPython git repository; testing with eval is heavily used.
It's worth noting that for the specific problem in question, there are several alternatives to using eval:
The simplest, as noted, is using setattr:
def __init__(self):
    for name in self.attsToStore:
        setattr(self, name.lower(), None)
A less obvious approach is updating the object's __dict__ directly. If all you want to do is initialize the attributes to None, then this is less straightforward than the above. But consider this:
def __init__(self, **kwargs):
    for name in self.attsToStore:
        self.__dict__[name.lower()] = kwargs.get(name.lower(), None)
This allows you to pass keyword arguments to the constructor, e.g.:
s = Song(name='History', artist='The Verve')
It also allows you to make your use of locals() more explicit, e.g.:
s = Song(**locals())
...and, if you really want to assign None to the attributes whose names are found in locals():
s = Song(**dict([(k, None) for k in locals().keys()]))
Another approach to providing an object with default values for a list of attributes is to define the class's __getattr__ method:
def __getattr__(self, name):
    if name in (att.lower() for att in self.attsToStore):
        return None
    raise AttributeError(name)
This method gets called only when the named attribute isn't found in the normal way. This approach is somewhat less straightforward than simply setting the attributes in the constructor or updating the __dict__, but it has the merit of not actually creating the attribute until it is assigned, which can substantially reduce the instances' memory usage.
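A quick check of this approach, assuming a Song class whose __init__ sets nothing and which defines the __getattr__ above:

s = Song()
print(s.name)                # None, served by __getattr__
print('name' in s.__dict__)  # False: no per-instance attribute was created
s.name = 'History'
print('name' in s.__dict__)  # True: assignment creates it as usual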
The point of all this: There are lots of reasons, in general, to avoid eval - the security problem of executing code that you don't control, the practical problem of code you can't debug, etc. But an even more important reason is that generally, you don't need to use it. Python exposes so much of its internal mechanisms to the programmer that you rarely really need to write code that writes code.
When eval() is used to process user-provided input, you enable the user to drop to a REPL by providing something like this:
"__import__('code').InteractiveConsole(locals=globals()).interact()"
You may get away with it, but normally you don't want vectors for arbitrary code execution in your applications.
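When all you need from untrusted input is a Python literal (numbers, strings, lists, dicts, ...), ast.literal_eval from the standard library is the usual safe alternative; a minimal sketch:

import ast

print(ast.literal_eval("[1, 2, 3]"))  # [1, 2, 3]
ast.literal_eval("__import__('os')")  # raises ValueError: malformed node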
In addition to @Nadia Alramli's answer: since I am new to Python and was eager to check how using eval affects timings, I tried a small program, and below are the observations:
# difference when printing an int via eval() vs directly:
# about 0.53 s per 100000 eval() calls
from datetime import datetime

def strOfNos():
    s = []
    for x in range(100000):
        s.append(str(x))
    return s

print(datetime.now())
for x in strOfNos():
    print(x)  # print(eval(x))
print(datetime.now())

# when using eval(int)
# 2018-10-29 12:36:08.206022
# 2018-10-29 12:36:10.407911
# diff = 2.201889 s
# when using int only
# 2018-10-29 12:37:50.022753
# 2018-10-29 12:37:51.090045
# diff = 1.67292 s
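Note that this measurement includes print() I/O. A sketch using timeit that isolates the eval() overhead itself (timings will vary by machine):

import timeit

stmt = "int('123')"
direct = timeit.timeit(stmt, number=100000)
evaled = timeit.timeit("eval(stmt)", globals={"stmt": stmt}, number=100000)
print(direct, evaled)  # eval is typically several times slower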

How to define a strategy in hypothesis to generate a pair of similar recursive objects

I am new to hypothesis and I am looking for a way to generate a pair of similar recursive objects.
My strategy for a single object is similar to this example in the hypothesis documentation.
I want to test a function which takes a pair of recursive objects A and B and the side effect of this function should be that A==B.
My first approach would be to write a test which gets two independent objects, like:
@given(my_objects(), my_objects())
def test_is_equal(a, b):
    my_function(a, b)
    assert a == b
But the downside is that hypothesis does not know that there is a dependency between these two objects, so they can be completely different. That is a valid test and I want to test that too.
But I also want to test complex recursive objects which are only slightly different.
And maybe hypothesis is able to shrink a pair of very different objects where the test fails into a pair of only slightly different objects that fails in the same way.
This one is tricky - to be honest I'd start by writing the same test you already have above, and just turn up the max_examples setting a long way. Then I'd probably write some traditional unit tests, because getting specific data distributions out of Hypothesis is explicitly unsupported (i.e. we try to break everything that assumes a particular distribution, using some combination of heuristics and a bit of feedback).
How would I actually generate similar recursive structures though? I'd use a @composite strategy to build them at the same time, and for each element or subtree I'd draw a boolean and, if True, draw a different element or subtree to use in the second object (see the sketch below). Note that this will give you a strategy for a tuple of two objects and you'll need to unpack it inside the test; that's unavoidable if you want them to be related.
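A minimal sketch of that @composite idea, assuming a simple recursive structure (nested lists of integers); all names here are illustrative:

from hypothesis import given, strategies as st

trees = st.recursive(st.integers(), st.lists, max_leaves=10)

@st.composite
def similar_pairs(draw):
    def mutate(node):
        # for each subtree, flip a coin: keep it or replace it
        if draw(st.booleans()):
            return draw(trees)
        if isinstance(node, list):
            return [mutate(child) for child in node]
        return node

    first = draw(trees)
    return first, mutate(first)

@given(similar_pairs())
def test_is_equal(pair):
    a, b = pair
    my_function(a, b)
    assert a == b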
Seriously, do try just turning up max_examples on the naive approach first though; running Hypothesis for around an hour is amazingly effective, and I would even expect it to shrink the output fairly well.

Type hinting a non-generator (non-consumable) iterable

I'd like to type hint a function like this:
from typing import Iterable

def func(thing: Iterable[str]) -> None:
    for i in range(10):
        for x in thing:
            do_thing(x)
PyCharm will (correctly) let me get away with passing in a generator to this function, but I want to type hint it in a way that it won't allow me to, while still accepting other iterables.
Using Sequence[str] isn't an option: iterables like KeysView aren't sequences, but I would still like to accept them.
Someone mentioned using a Union of Sequence and KeysView, which would work, but I was wondering if there is a more elegant and universal solution.
Of course, I could just convert thing to a list no matter what, but I'd rather just have this function type hinted correctly.
Using Python 3.7
Unfortunately I think this is just not possible with Python's type system.
Straight from Guido (source):
Our type system doesn't allow you to express that -- it's like asking for any Animal except Cat. Adding an "except" clause like that to the type system would be a very difficult operation.
Since there's no solution, I'm going to close this as "won't fix".
His suggestion:
Yeah, the practical solution is Sequence|Set|Mapping. The needed negation is years off.
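A sketch of that practical workaround: accept the common re-iterable ABCs explicitly instead of Iterable, so one-shot generators are rejected by the type checker (the alias name here is mine):

from typing import AbstractSet, Mapping, Sequence, Union

Reiterable = Union[Sequence[str], AbstractSet[str], Mapping[str, object]]

def func(thing: Reiterable) -> None:
    for i in range(10):
        for x in thing:
            print(x)

func(['a', 'b'])       # OK: list is a Sequence
func({'a': 1}.keys())  # OK: KeysView is an AbstractSet
func(x for x in 'ab')  # flagged by the checker: generators match none of the three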

What is the 'better' approach to configuring a tkinter widget?

I've been working with tkinter for a while now.
There are two ways to configure a widget, or at least I only know two:
1: frame.config(bg='#123456')
2: frame["bg"] = '#123456'
I use the latter more often; the former only seems useful to me when several options need to be set at the same time.
Recently I was wondering if one of them is 'better', e.g. faster, or has any other advantage.
I don't think it's a crucially important question, but maybe someone knows.
Studying the tkinter code base, we find the following:
class Frame(Widget):
    # other code here

class Widget(BaseWidget, Pack, Place, Grid):
    pass

class BaseWidget(Misc):
    # other code here

class Misc:
    # various code

    def __setitem__(self, key, value):
        self.configure({key: value})
Therefore, the two methods are actually equivalent. The line
frame['bg'] = '#123456'
is interpreted as frame.__setitem__('bg', '#123456'), which, after passing through the inheritance chain, lands on the internal class Misc, which simply forwards it to the configure method. As far as your question about efficiency is concerned, the first method is probably slightly faster because it skips the extra __setitem__ indirection, but the difference is far too small to be concerned about.
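A quick sketch showing both forms side by side, plus the multi-option case where config() shines:

import tkinter as tk

root = tk.Tk()
frame = tk.Frame(root)

frame.config(bg='#123456')  # direct configure call
frame['bg'] = '#123456'     # routed through Misc.__setitem__
frame.config(bg='#123456', width=200, height=100)  # several options at once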
