I'd like to get the minimum for a complicated class, for which I have already written a strategy.
Is it possible to ask hypothesis to simply give me the minimum example for a given strategy?
Context here is writing a strategy and a default minimum (for use in @hypothesis.example) - surely the information for the latter is already contained in the former?
import dataclasses
import hypothesis
from hypothesis import strategies

@dataclasses.dataclass
class Foo:
    bar: int
    # Foo has many more attributes which are strategised over...

    @classmethod
    def strategy(cls):
        return strategies.builds(cls, bar=strategies.integers())

    @classmethod
    def minimal(cls):
        # Wished-for API: is there something like this?
        return hypothesis.minimal(cls.strategy())
Found the answer.
Use hypothesis.find:
"""Returns the minimal example from the given strategy specifier that
matches the predicate function condition."""
So we want the following code for the above example:
minimal = hypothesis.find(
    specifier=Foo.strategy(),
    condition=lambda _: True,
)
You're correct that hypothesis.find(s, lambda _: True) is the way to do this.
Context here is writing a strategy and a default minimum (for use in @hypothesis.example()) - surely the information for the latter is already contained in the former?
It's not just contained in it, but generating examples will almost always start by generating the minimal example* - because if that fails, we can stop immediately! So you might not need to do this at all ;-)
(the rare exceptions are strategies with complicated filters, where the attempted-minimal example is rejected by a filter - and we just move on to reasonably-minimal examples rather than shrinking to find the minimal one)
I'd also note that st.builds() has native support for dataclasses, so you can omit the strategies (or pass hypothesis.infer) for any argument where you don't have more specific constraints than "any instance of the type". For example, there's no need to pass bar=st.integers() above!
I use the following class to easily store data of my songs.
class Song:
    """The class to store the details of each song"""
    attsToStore=('Name', 'Artist', 'Album', 'Genre', 'Location')
    def __init__(self):
        for att in self.attsToStore:
            exec 'self.%s=None'%(att.lower()) in locals()
    def setDetail(self, key, val):
        if key in self.attsToStore:
            exec 'self.%s=val'%(key.lower()) in locals()
I feel that this is just much more extensible than writing out an if/else block. However, I have heard that eval is unsafe. Is it? What is the risk? How can I solve the underlying problem in my class (setting attributes of self dynamically) without incurring that risk?
Yes, using eval is a bad practice. Just to name a few reasons:
There is almost always a better way to do it
Very dangerous and insecure
Makes debugging difficult
Slow
In your case you can use setattr instead:
class Song:
    """The class to store the details of each song"""
    attsToStore=('Name', 'Artist', 'Album', 'Genre', 'Location')
    def __init__(self):
        for att in self.attsToStore:
            setattr(self, att.lower(), None)
    def setDetail(self, key, val):
        if key in self.attsToStore:
            setattr(self, key.lower(), val)
There are some cases where you have to use eval or exec, but they are rare. Using eval in your case is certainly a bad practice. I'm emphasizing "bad practice" because eval and exec are frequently used in the wrong place.
Replying to the comments:
It looks like some disagree that eval is 'very dangerous and insecure' in the OP case. That might be true for this specific case but not in general. The question was general and the reasons I listed are true for the general case as well.
Using eval is weak, not a clearly bad practice.
It violates the "Fundamental Principle of Software". Your source is not the sum total of what's executable. In addition to your source, there are the arguments to eval, which must be clearly understood. For this reason, it's the tool of last resort.
It's usually a sign of thoughtless design. There's rarely a good reason for dynamic source code, built on-the-fly. Almost anything can be done with delegation and other OO design techniques.
It leads to relatively slow on-the-fly compilation of small pieces of code. An overhead which can be avoided by using better design patterns.
As a footnote, in the hands of deranged sociopaths, it may not work out well. However, when confronted with deranged sociopathic users or administrators, it's best to not give them interpreted Python in the first place. In the hands of the truly evil, Python can be a liability; eval doesn't increase the risk at all.
Yes, it is:
Hack using Python:
>>> eval(input())
"__import__('os').listdir('.')"
...........
........... #dir listing
...........
The below code will list all tasks running on a Windows machine.
>>> eval(input())
"__import__('subprocess').Popen(['tasklist'],stdout=__import__('subprocess').PIPE).communicate()[0]"
In Linux:
>>> eval(input())
"__import__('subprocess').Popen(['ps', 'aux'],stdout=__import__('subprocess').PIPE).communicate()[0]"
In this case, yes. Instead of
exec 'self.Foo=val'
you should use the builtin function setattr:
setattr(self, 'Foo', val)
Other users pointed out how your code can be changed as to not depend on eval; I'll offer a legitimate use-case for using eval, one that is found even in CPython: testing.
Here's one example I found in test_unary.py, which tests whether (+|-|~)b'a' raises a TypeError:
def test_bad_types(self):
    for op in '+', '-', '~':
        self.assertRaises(TypeError, eval, op + "b'a'")
        self.assertRaises(TypeError, eval, op + "'a'")
The usage is clearly not bad practice here; you define the input and merely observe behavior. eval is handy for testing.
Take a look at this search for eval, performed on the CPython git repository; testing with eval is heavily used.
It's worth noting that for the specific problem in question, there are several alternatives to using eval:
The simplest, as noted, is using setattr:
def __init__(self):
    for name in self.attsToStore:
        setattr(self, name, None)
A less obvious approach is updating the object's __dict__ object directly. If all you want to do is initialize the attributes to None, then this is less straightforward than the above. But consider this:
def __init__(self, **kwargs):
    for name in self.attsToStore:
        self.__dict__[name] = kwargs.get(name, None)
This allows you to pass keyword arguments to the constructor, e.g.:
s = Song(name='History', artist='The Verve')
It also allows you to make your use of locals() more explicit, e.g.:
s = Song(**locals())
...and, if you really want to assign None to the attributes whose names are found in locals():
s = Song(**dict([(k, None) for k in locals().keys()]))
Another approach to providing an object with default values for a list of attributes is to define the class's __getattr__ method:
def __getattr__(self, name):
    if name in self.attsToStore:
        return None
    # __getattr__ should signal a missing attribute with AttributeError
    raise AttributeError(name)
This method gets called when the named attribute isn't found in the normal way. This approach is somewhat less straightforward than simply setting the attributes in the constructor or updating the __dict__, but it has the merit of not actually creating an attribute until a value is assigned to it, which can substantially reduce the class's memory usage.
The point of all this: There are lots of reasons, in general, to avoid eval - the security problem of executing code that you don't control, the practical problem of code you can't debug, etc. But an even more important reason is that generally, you don't need to use it. Python exposes so much of its internal mechanisms to the programmer that you rarely really need to write code that writes code.
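To tie the alternatives above together, here is a minimal sketch of the Song class built on setattr and keyword arguments. The snake_case names (atts_to_store, set_detail) and the choice to raise KeyError for unknown keys are my own assumptions for illustration; the original class silently ignores unknown keys.

```python
class Song:
    """Stores song details as lowercase attributes, without exec or eval."""

    # Hypothetical rename of attsToStore; the values match the question.
    atts_to_store = ('Name', 'Artist', 'Album', 'Genre', 'Location')

    def __init__(self, **kwargs):
        # Create every known attribute, using a keyword argument if one was given.
        for att in self.atts_to_store:
            setattr(self, att.lower(), kwargs.get(att.lower()))

    def set_detail(self, key, val):
        # Assumption: reject unknown keys loudly instead of ignoring them.
        if key not in self.atts_to_store:
            raise KeyError(key)
        setattr(self, key.lower(), val)


song = Song(name='History', artist='The Verve')
song.set_detail('Genre', 'Rock')
```

Because the attribute names come from a fixed tuple, no string is ever compiled or executed, so there is nothing for a caller to inject.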
When eval() is used to process user-provided input, you enable the user to Drop-to-REPL providing something like this:
"__import__('code').InteractiveConsole(locals=globals()).interact()"
You may get away with it, but normally you don't want vectors for arbitrary code execution in your applications.
In addition to @Nadia Alramli's answer: since I am new to Python and was eager to check how using eval affects the timings, I tried a small program, and below are the observations:
# Difference between print(eval(x)) and print(x) when printing ints = 1.134597 s per 100000 eval() calls
from datetime import datetime
def strOfNos():
    s = []
    for x in range(100000):
        s.append(str(x))
    return s
strOfNos()
print(datetime.now())
for x in strOfNos():
    print(x)  # swap in print(eval(x)) for the eval run
print(datetime.now())
# when using print(eval(x))
# 2018-10-29 12:36:08.206022
# 2018-10-29 12:36:10.407911
# diff = 2.201889 s
# when using print(x) only
# 2018-10-29 12:37:50.022753
# 2018-10-29 12:37:51.090045
# diff = 1.067292 s
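Wall-clock timestamps around print calls mostly measure terminal output, so a tighter comparison may be useful. The following is only a sketch using the standard timeit module; it compares eval(s) against int(s), which is a substitution on my part rather than the measurement made above, and the helper names are hypothetical. No timings are quoted because they vary by machine.

```python
import timeit

# Same kind of input as above: decimal strings without leading zeros.
numbers = [str(x) for x in range(1000)]


def with_eval():
    # Compiles and evaluates each string as a Python expression.
    return [eval(s) for s in numbers]


def with_int():
    # Parses each string directly as an integer.
    return [int(s) for s in numbers]


eval_seconds = timeit.timeit(with_eval, number=10)
int_seconds = timeit.timeit(with_int, number=10)
print(f"eval: {eval_seconds:.4f}s  int: {int_seconds:.4f}s")
```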
I am going to be creating a script that parses through an XML (very large, .5gb+), and am trying to think of how to efficiently do it.
Normally, I would do this in AutoIt, as that's my 'normal' language to use for things, but I think it's more appropriate to do it in Python (plus I'd like to learn more python).
Normally how I'd do this, is create a constant with all the 'columns' I'd need from the XML, use that to match and parse it through into an array (actually 2 arrays, cause of subrecords), then pass sets of the array(s) to the system of record as JSON objects/strings.
In Python, I'm not sure that's the best route. I was thinking about making a class of the object, then creating instances for each record/row of the XML that I'd convert to JSON and then submit. If I feel ambitious, I'd even work on getting it to be multithreaded. My best option would be to pull out a record, then submit it in the background while I work on the next record, up to say, 5 to 10 records, but perhaps that's not good.
My question is, does it seem like I'm using a class just to use a class, or does it seem like a good reason to do it? I admit my thinking is colored by the fact that I haven't used classes much (at all) before, and am using it because it's neat and new.
Is there actually a totally better way that I'm overlooking because I'm blinded by new/shiny concepts, or because of my lack of knowledge of the language (the latter seems likely to me)?
I'm hoping for answers that will guide me in a general direction - this is part of my learning the language and doing the research myself really helps me understand what I'm doing and why. Unfortunately, I think I need a guide here on this point.
This debate is largely situational in nature, and will depend on what you intend to do within your program. The main thing I would consider is: do I need to encapsulate properties (data) and functionality (methods/functions) into a single grouping?
Some additional things that come to mind, in terms of pros vs. cons of using a class (object) in this context:
Reasons to use a class:
If potential future maintainability would warrant 'swapping' in a new class into an existing structure within the program.
If there are attributes that would hold true for all instances of the class.
If it makes logical sense to have a group of functions separated out from the rest of your program.
More concise options for ensuring immutability
Providing a type for the underlying fields meshes well with the rest of your program.
Reasons not to use a class:
The code can be maintained purely through the addition of new functions.
You aren't performing functional tasks on the fields stored (e.g. storing create_date, but needing only to work with age - this can lend itself better to an object that doesn't expose create_date, but rather just a function get_age).
You have severe performance optimization standards to meet and can't justify calls to functions to ensure encapsulation, any additional memory overhead, etc...
Generally, Python lends itself to using classes since it is an object-oriented language. However, compared to more heavily OOP languages like C++ and Java, you can "get away" with a lot more in Python without using classes. If you want to explore using a class, I certainly think it would be a good exercise in use of the language.
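As a rough illustration for the XML-to-JSON task in the question, plain dictionaries and a generator may be enough before reaching for a class. This is only a sketch: the record tag, the field names in WANTED_FIELDS and the sample document are all hypothetical, and it assumes each record is a flat element whose children hold the wanted values.

```python
import io
import json
import xml.etree.ElementTree as ET

WANTED_FIELDS = ("id", "name")  # hypothetical "columns" to pull from each record


def iter_records(source, tag="record"):
    """Yield one dict per <record> element without loading the whole file."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == tag:
            yield {field: elem.findtext(field) for field in WANTED_FIELDS}
            elem.clear()  # drop the element's children once it has been processed


# Tiny in-memory stand-in for the real multi-hundred-megabyte file.
sample = io.BytesIO(
    b"<records>"
    b"<record><id>1</id><name>a</name></record>"
    b"<record><id>2</id><name>b</name></record>"
    b"</records>"
)
payloads = [json.dumps(record) for record in iter_records(sample)]
```

If the records later need validation or derived fields, the dict built inside iter_records is the natural place to swap in a class.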
Edit:
Based on a follow-up comment, I wanted to provide an example of using named arguments to instantiate a class with optional fields. By default, Python binds arguments to parameters by position. As an example:
from datetime import date, datetime
def get_info(name, birthday, favorite_color):
    # Assumes birthday is an 'MM-DD-YYYY' string; age comes back as a timedelta
    age = date.today() - datetime.strptime(birthday, '%m-%d-%Y').date()
    return [name, age, favorite_color]
In this example, Python assigns the arguments to parameters in the order they appear in the call:
get_info('James', '03-05-1998', 'blue')
However, Python also allows named (keyword) arguments, which state explicitly which parameter each value is bound to:
get_info(name='James', birthday='03-05-1998', favorite_color='blue')
While at first glance this syntax appears more verbose, it allows great flexibility: the ordering of named arguments doesn't matter, and you can give parameters default values that are used when the caller omits them:
def get_info(name, birthday, favorite_color=None):
    age = date.today() - datetime.strptime(birthday, '%m-%d-%Y').date()
    return [name, age, favorite_color]
get_info(name='James', birthday='03-05-1998')
Below I've provided a more in-depth working example of how named arguments could help the situation you've outlined in your comment (many fields, not all of them required). Play around with constructing this object in various ways to see how the parameters without defaults are required, while those with defaults are optional and fall back to the values specified in the __init__() signature:
class Car(object):
    """ Initializes a new Car object. Requires a color, make, model, horsepower, price, and condition.
    Optional parameters include: wheel_size, moon_roof, premium_sound, interior_color, and interior_material."""
    def __init__(self, color, make, model, horsepower, price, condition, wheel_size=16, moon_roof=None, premium_sound=None, interior_color='black', interior_material='cloth'):
        self.color = color
        self.make = make
        self.model = model
        self.horsepower = horsepower
        self.price = price
        self.condition = condition
        self.wheel_size = wheel_size
        self.moon_roof = moon_roof
        self.premium_sound = premium_sound
        self.interior_color = interior_color
        self.interior_material = interior_material
    # Prints attributes of the Car class and their associated values in no specific order.
    def print_car(self):
        fields = []
        for key, value in self.__dict__.items():  # items() replaces Python 2's iteritems()
            fields.append(key + ': ')
            fields.append(str(value))
            fields.append('\n')
        print(''.join(fields))
# Executes the main program body
def main():
    stock_car = Car('Red', 'Honda', 'NSX', 290, 89000.00, 'New')
    stock_car.print_car()
    custom_car = Car('Black', 'Mitsubishi', 'Lancer Evolution', 280, 45000.00, 'New', 17, "Tinted Moonroof", "Bose", "Black/Red", "Suede/Leather")
    custom_car.print_car()
# Calls main() as the entry point for this program.
if __name__ == '__main__':
    main()
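The calls in main() pass every optional value by position, so here is a short sketch of what keyword arguments add. It uses a trimmed-down Car with fewer fields purely for brevity; the reduced field list is an assumption for illustration, not a replacement for the class above.

```python
class Car:
    """Trimmed-down Car with three required and two optional fields."""

    def __init__(self, color, make, model, wheel_size=16, interior_color='black'):
        self.color = color
        self.make = make
        self.model = model
        self.wheel_size = wheel_size
        self.interior_color = interior_color


# Required fields only: the optional ones fall back to their defaults.
stock = Car('Red', 'Honda', 'NSX')

# Keyword arguments can appear in any order and can skip optional fields.
custom = Car(model='Lancer Evolution', make='Mitsubishi', color='Black',
             interior_color='Black/Red')

# Omitting a required field raises TypeError.
try:
    Car('Red', 'Honda')
except TypeError as exc:
    missing_error = str(exc)
```

Here custom keeps the default wheel_size even though a later optional field was overridden, which is not possible with purely positional calls.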
I've started learning the typing system in Python and came across an issue in defining function arguments that are picklable. Not everything in Python can be pickled; can I define a type annotation that says "only accept objects that are picklable"?
At first it sounds like something that should be possible, similar to Java's Serializable, but there is no Picklable interface in Python, and thinking about the issue a little more, it occurs to me that pickling is an inherently runtime task. "What can be pickled" lists a number of things that can be pickled, and it's not difficult to imagine a container of lambda functions which would not be picklable, but I can't think of a way of determining that beforehand (without touching the container definition).
The only way I've come up with is to define something like a typing.Union[Callable, Iterable, ...] of all the things listed in What can be pickled but that does not seem like a good solution.
This issue on GitHub partially answers the question. The issue is specifically about JSON, not pickle, but the first answer from Guido should still apply to pickle:
I tried to do that but a recursive type alias doesn't work in mypy right now, and I'm not sure how to make it work. In the mean time I use JsonDict = Dict[str, Any] (which is not very useful but at least clarifies that the keys are strings), and Any for places where a more general JSON type is expected.
https://github.com/python/typing/issues/182
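Since picklability is only really known at runtime, one fallback is to keep the annotation loose and check when the value arrives. The sketch below is an assumption-laden illustration, not an established pattern from the typing module: is_picklable and send are hypothetical names, and the check costs a full pickle.dumps call.

```python
import pickle
from typing import Any


def is_picklable(obj: Any) -> bool:
    """Return True if pickle.dumps(obj) succeeds, False otherwise."""
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, AttributeError, TypeError):
        return False
    return True


def send(payload: Any) -> bytes:
    # The annotation stays Any; the real check happens at runtime.
    if not is_picklable(payload):
        raise TypeError(f"{type(payload).__name__} object is not picklable")
    return pickle.dumps(payload)
```

A container of lambdas, as mentioned in the question, passes any static annotation for a list but is rejected here, because the failure only appears when pickle walks the contents.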
I'm researching and experimenting more with Groovy and I'm trying to wrap my mind around the pros and cons of implementing things in Groovy that I can't/don't do in Java. Dynamic programming is still just a concept to me since I've been deeply steeped in static and strongly typed languages.
Groovy gives me the ability to duck-type, but I can't really see the value. How is duck-typing more productive than static typing? What kind of things can I do in my code practice to help me grasp the benefits of it?
I ask this question with Groovy in mind but I understand it isn't necessarily a Groovy question so I welcome answers from every code camp.
A lot of the comments for duck typing don't really substantiate the claims. Not "having to worry" about a type is not sustainable for maintenance or making an application extendable. I've really had a good opportunity to see Grails in action over my last contract and it's quite funny to watch really. Everyone is happy about the gains in being able to "create-app" and get going - sadly it all catches up to you on the back end.
Groovy seems the same way to me. Sure you can write very succinct code and definitely there is some nice sugar in how we get to work with properties, collections, etc... But the cost of not knowing what the heck is being passed back and forth just gets worse and worse. At some point you're scratching your head wondering why the project has become 80% testing and 20% work. The lesson here is that "smaller" does not make for "more readable" code. Sorry folks, it's simple logic - the more you have to know intuitively, the more complex the process of understanding that code becomes. It's why GUIs have backed off becoming overly iconic over the years - sure looks pretty but WTH is going on is not always obvious.
People on that project seemed to have troubles "nailing down" the lessons learned, but when you have methods returning either a single element of type T, an array of T, an ErrorResult or a null ... it becomes rather apparent.
One thing working with Groovy has done for me however - awesome billable hours woot!
Duck typing cripples most modern IDEs' static checking, which can point out errors as you type. Some consider this an advantage. I want the IDE/compiler to tell me I've made a stupid programmer trick as soon as possible.
My most recent favorite argument against duck typing comes from a Grails project DTO:
class SimpleResults {
    def results
    def total
    def categories
}
where results turns out to be something like Map<String, List<ComplexType>>, which can be discovered only by following a trail of method calls in different classes until you find where it was created. For the terminally curious, total is the sum of the sizes of the List<ComplexType>s and categories is the size of the Map.
It may have been clear to the original developer, but the poor maintenance guy (ME) lost a lot of hair tracking this one down.
It's a little bit difficult to see the value of duck typing until you've used it for a little while. Once you get used to it, you'll realize how much of a load off your mind it is to not have to deal with interfaces or having to worry about exactly what type something is.
Next, which is better: EMACS or vi? This is one of the running religious wars.
Think of it this way: any program that is correct will still be correct if the language is statically typed. What static typing does is give the compiler enough information to detect type mismatches at compile time instead of run time. This can be an annoyance if you're doing incremental sorts of programming, although (I maintain) if you're thinking clearly about your program it doesn't much matter. On the other hand, if you're building a really big program, like an operating system or a telephone switch, with dozens or hundreds or thousands of people working on it, or with really high reliability requirements, then having the compiler detect a large class of problems for you, without needing a test case to exercise just the right code path, is a real advantage.
It's not as if dynamic typing is a new and different thing: C, for example, is effectively dynamically typed, since I can always cast a foo* to a bar*. It just means it's then my responsibility as a C programmer never to use code that is appropriate on a bar* when the address is really pointing to a foo*. But as a result of the issues with large programs, C grew tools like lint(1), strengthened its type system with typedef and eventually developed a strongly typed variant in C++. (And, of course, C++ in turn developed ways around the strong typing, with all the varieties of casts and generics/templates and with RTTI.)
One other thing, though: don't confuse "agile programming" with "dynamic languages". Agile programming is about the way people work together in a project: can the project adapt to changing requirements to meet the customers' needs while maintaining a humane environment for the programmers? It can be done with dynamically typed languages, and often is, because they can be more productive (e.g., Ruby, Smalltalk), but it can be done, and has been done successfully, in C and even assembler. In fact, Rally Development even uses agile methods (SCRUM in particular) to do marketing and documentation.
There is nothing wrong with static typing if you are using Haskell, which has an incredible static type system. However, if you are using languages like Java and C++ that have terribly crippling type systems, duck typing is definitely an improvement.
Imagine trying to use something so simple as "map" in Java (and no, I don't mean the data structure). Even generics are rather poorly supported.
With TDD + 100% code coverage + IDE tools to constantly run my tests, I no longer feel a need for static typing. With no strong types, my unit testing has become so easy (simply use Maps for creating mock objects). Especially when you are using generics, you can see the difference:
//Static typing
Map<String,List<Class1<Class2>>> someMap = [:] as HashMap<String,List<Class1<Class2>>>
vs
//Dynamic typing
def someMap = [:]
IMHO, the advantage of duck typing becomes magnified when you adhere to some conventions, such as naming your variables and methods in a consistent way. Taking the example from Ken G, I think it would read best as:
class SimpleResults {
    def mapOfListResults
    def total
    def categories
}
Let's say you define a contract on some operation named 'calculateRating(A,B)' where A and B adhere to another contract. In pseudocode, it would read:
Long calculateRating(A someObj, B otherObj) {
    // some fake algorithm here:
    if (someObj.doStuff('foo') > otherObj.doStuff('bar')) return someObj.calcRating();
    else return otherObj.calcRating();
}
If you want to implement this in Java, both A and B must implement some kind of interface that reads something like this:
public interface MyService {
    public int doStuff(String input);
}
Besides, if you want to generalize your contract for calculating ratings (let's say you have another algorithm for rating calculations), you also have to create an interface:
public long calculateRating(MyService A, MyService B);
With duck typing, you can ditch your interfaces and just rely on the fact that, at runtime, both A and B will respond correctly to your doStuff() calls. There is no need for a specific contract definition. This can work for you but it can also work against you.
The downside is that you have to be extra careful to guarantee that your code does not break when someone else changes it (i.e., the other person must be aware of the implicit contract on the method name and arguments).
Note that this is especially aggravating in Java, where the syntax is not as terse as it could be (compared to Scala, for example). A counter-example of this is the Lift framework, where they say that the SLOC count of the framework is similar to Rails, but the test code has fewer lines because they don't need to implement type checks within the tests.
Here's one scenario where duck typing saves work.
Here's a very trivial class
class BookFinder {
    def searchEngine
    def findBookByTitle(String title) {
        return searchEngine.find( [ "Title" : title ] )
    }
}
Now for the unit test:
void bookFinderTest() {
    // with Expando we can 'fake' any object at runtime.
    // alternatively you could write a MockSearchEngine class.
    def mockSearchEngine = new Expando()
    mockSearchEngine.find = {
        return new Book("Heart of Darkness","Joseph Conrad")
    }
    def bf = new BookFinder()
    bf.searchEngine = mockSearchEngine
    def book = bf.findBookByTitle("Heart of Darkness")
    assert(book.author == "Joseph Conrad")
}
We were able to substitute an Expando for the SearchEngine, because of the absence of static type checking. With static type checking we would have had to ensure that SearchEngine was an interface, or at least an abstract class, and create a full mock implementation of it. That's labour intensive, or you can use a sophisticated single-purpose mocking framework. But duck typing is general-purpose, and has helped us.
Because of duck typing, our unit test can provide any old object in place of the dependency, just as long as it implements the methods that get called.
To emphasise - you can do this in a statically typed language, with careful use of interfaces and class hierarchies. But with duck typing you can do it with less thinking and fewer keystrokes.
That's an advantage of duck typing. It doesn't mean that dynamic typing is the right paradigm to use in all situations. In my Groovy projects, I like to switch back to Java in circumstances where I feel that compiler warnings about types are going to help me.
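For readers more at home in Python, the same test idea can be sketched there as a rough analogue of the Groovy above. This is an illustration only: SimpleNamespace stands in for Expando, and the snake_case names are my own.

```python
from types import SimpleNamespace


class BookFinder:
    def __init__(self, search_engine):
        # Any object with a find(query) method will do; no interface is declared.
        self.search_engine = search_engine

    def find_book_by_title(self, title):
        return self.search_engine.find({"Title": title})


# A throwaway stand-in for the real search engine, built at runtime.
fake_engine = SimpleNamespace(
    find=lambda query: SimpleNamespace(title=query["Title"], author="Joseph Conrad")
)

book = BookFinder(fake_engine).find_book_by_title("Heart of Darkness")
assert book.author == "Joseph Conrad"
```

The fake only has to provide the one method that BookFinder actually calls, which is the whole point of the duck-typed test.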
To me, they aren't horribly different if you see dynamically typed languages as simply a form of static typing where everything inherits from a sufficiently abstract base class.
Problems arise when, as many have pointed out, you start getting strange with this. Someone pointed out a function that returns a single object, a collection, or a null. Have the function return a specific type, not multiple. Use multiple functions for single vs collection.
What it boils down to is that anyone can write bad code. Static typing is a great safety device, but sometimes the helmet gets in the way when you want to feel the wind in your hair.
It's not that duck typing is more productive than static typing as much as it is simply different. With static typing you always have to worry that your data is the correct type and in Java it shows up through casting to the right type. With duck typing the type doesn't matter as long as it has the right method, so it really just eliminates a lot of the hassle of casting and conversions between types.