Using a class vs just functions [closed] - python-3.x

I am going to be creating a script that parses through a very large XML file (0.5 GB+), and I am trying to think of how to do it efficiently.
Normally, I would do this in AutoIt, as that's my 'normal' language for this sort of thing, but I think it's more appropriate to do it in Python (plus I'd like to learn more Python).
Normally how I'd do this is: create a constant with all the 'columns' I need from the XML, use that to match and parse the data into an array (actually two arrays, because of subrecords), then pass sets of the array(s) to the system of record as JSON objects/strings.
In Python, I'm not sure that's the best route. I was thinking about making a class for the object, then creating an instance for each record/row of the XML that I'd convert to JSON and submit. If I feel ambitious, I'd even work on making it multithreaded: pull out a record, then submit it in the background while I work on the next one, up to say 5 to 10 records at a time, but perhaps that's not a good idea.
My question is: does it seem like I'm using a class just to use a class, or is there a good reason to do it here? I admit my thinking is colored by the fact that I haven't used classes much (at all) before, and I may be reaching for one just because it's neat and new.
Is there actually a better way that I'm overlooking because I'm blinded by new/shiny concepts, or by lack of knowledge of the language (this is probably likely)?
I'm hoping for answers that will guide me in a general direction - this is part of learning the language, and doing the research myself really helps me understand what I'm doing and why. Unfortunately, I think I need a guide here on this point.

This debate is largely situational in nature, and will depend on what you intend to do within your program. The main thing I would consider is: do I need to encapsulate properties (data) and functionality (methods/functions) into a single grouping?
Some additional things that come to mind, in terms of pros vs. cons of using a class (object) in this context:
Reasons to use a class:
If future maintainability might warrant 'swapping' a new class into an existing structure within the program.
If there are attributes that would hold true for all instances of the class.
If it makes logical sense to have a group of functions separated out from the rest of your program.
More concise options for ensuring immutability.
Providing a type for the underlying fields meshes well with the rest of your program.
Reasons not to use a class:
The code can be maintained purely through the addition of new functions.
You aren't performing functional tasks on the fields stored (e.g. storing create_date, but needing only to work with age - this can lend itself better to an object that doesn't expose create_date, but rather just a function get_age).
You have severe performance optimization standards to meet and can't justify calls to functions to ensure encapsulation, any additional memory overhead, etc...
Generally, Python lends itself to using classes since it is an object-oriented language. However, compared to more heavily OOP languages like C++ and Java, you can "get away" with a lot more in Python without using classes. If you want to explore using a class, I certainly think it would be a good exercise in the use of the language.
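As a rough sketch of what the class-based route could look like for your XML-to-JSON case (the element and field names here are invented purely for illustration), something along these lines, using the standard library's ElementTree and json modules:
import json
import xml.etree.ElementTree as ET

class Record:
    """One row pulled from the XML; the field names are placeholders."""
    def __init__(self, element):
        # Assumes child elements named 'id' and 'name'; adjust to your schema.
        self.id = element.findtext('id')
        self.name = element.findtext('name')

    def to_json(self):
        return json.dumps({'id': self.id, 'name': self.name})

# iterparse keeps memory use low on multi-gigabyte files.
for event, elem in ET.iterparse('data.xml', events=('end',)):
    if elem.tag == 'record':  # hypothetical record element name
        payload = Record(elem).to_json()
        # submit(payload) to the system of record would go here
        elem.clear()  # free the parsed element so memory doesn't grow
Whether the Record class earns its keep over a plain function returning a dict mostly comes down to whether you attach behavior (validation, submission, etc.) to it.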
Edit:
Based on a follow-up comment, I wanted to provide an example of using named arguments to instantiate a class with optional fields. The general idea is that Python assigns positional arguments to parameters based on the order in which they appear. As an example:
from datetime import datetime

def get_info(name, birthday, favorite_color):
    # birthday is expected as 'MM-DD-YYYY'; age here is just a rough year difference
    age = datetime.now().year - datetime.strptime(birthday, '%m-%d-%Y').year
    return [name, age, favorite_color]
In this example, Python interprets the input arguments based on the order they appear when the method is called:
get_info('James', '03-05-1998', 'blue')
However, Python also allows for named arguments, which specify the parameter-internal field assignment explicitly:
get_info(name='James', birthday='03-05-1998', favorite_color='blue')
While at first glance this syntax appears more verbose, it actually allows for great flexibility: the ordering of named arguments doesn't matter, and you can give parameters default values in the signature so that callers can omit them:
def get_info(name, birthday, favorite_color=None):
    age = datetime.now().year - datetime.strptime(birthday, '%m-%d-%Y').year
    return [name, age, favorite_color]
get_info(name='James', birthday='03-05-1998')
Below I've provided a more in-depth working example of how named arguments could help the situation you've outlined in your comment (many fields, not all of them required). Play around with constructing this object in various ways to see how the non-named parameters are required, but the named parameters are optional and default to the values specified in the __init__() method:
class Car(object):
    """Initializes a new Car object. Requires a color, make, model, horsepower, price, and condition.
    Optional parameters include: wheel_size, moon_roof, premium_sound, interior_color, and interior_material."""
    def __init__(self, color, make, model, horsepower, price, condition, wheel_size=16, moon_roof=None, premium_sound=None, interior_color='black', interior_material='cloth'):
        self.color = color
        self.make = make
        self.model = model
        self.horsepower = horsepower
        self.price = price
        self.condition = condition
        self.wheel_size = wheel_size
        self.moon_roof = moon_roof
        self.premium_sound = premium_sound
        self.interior_color = interior_color
        self.interior_material = interior_material

    # Prints attributes of the Car instance and their associated values in no specific order.
    def print_car(self):
        fields = []
        for key, value in self.__dict__.items():
            fields.append(key + ': ')
            fields.append(str(value))
            fields.append('\n')
        print(''.join(fields))

# Executes the main program body
def main():
    stock_car = Car('Red', 'Honda', 'NSX', 290, 89000.00, 'New')
    stock_car.print_car()
    custom_car = Car('Black', 'Mitsubishi', 'Lancer Evolution', 280, 45000.00, 'New', 17, "Tinted Moonroof", "Bose", "Black/Red", "Suede/Leather")
    custom_car.print_car()

# Calls main() as the entry point for this program.
if __name__ == '__main__':
    main()

Related

Is using eval() in this case a good idea in python? [duplicate]

I use the following class to easily store data about my songs.
class Song:
    """The class to store the details of each song"""
    attsToStore = ('Name', 'Artist', 'Album', 'Genre', 'Location')

    def __init__(self):
        for att in self.attsToStore:
            exec("self.%s = None" % att.lower())

    def setDetail(self, key, val):
        if key in self.attsToStore:
            exec("self.%s = val" % key.lower())
I feel that this is just much more extensible than writing out an if/else block. However, I have heard that eval is unsafe. Is it? What is the risk? How can I solve the underlying problem in my class (setting attributes of self dynamically) without incurring that risk?
Yes, using eval is a bad practice. Just to name a few reasons:
There is almost always a better way to do it
Very dangerous and insecure
Makes debugging difficult
Slow
In your case you can use setattr instead:
class Song:
    """The class to store the details of each song"""
    attsToStore = ('Name', 'Artist', 'Album', 'Genre', 'Location')

    def __init__(self):
        for att in self.attsToStore:
            setattr(self, att.lower(), None)

    def setDetail(self, key, val):
        if key in self.attsToStore:
            setattr(self, key.lower(), val)
There are some cases where you have to use eval or exec, but they are rare. Using eval in your case is a bad practice for sure. I'm emphasizing bad practice because eval and exec are frequently used in the wrong place.
Replying to the comments:
It looks like some disagree that eval is 'very dangerous and insecure' in the OP case. That might be true for this specific case but not in general. The question was general and the reasons I listed are true for the general case as well.
Using eval is weak, not a clearly bad practice.
It violates the "Fundamental Principle of Software". Your source is not the sum total of what's executable. In addition to your source, there are the arguments to eval, which must be clearly understood. For this reason, it's the tool of last resort.
It's usually a sign of thoughtless design. There's rarely a good reason for dynamic source code, built on-the-fly. Almost anything can be done with delegation and other OO design techniques.
It leads to relatively slow on-the-fly compilation of small pieces of code. An overhead which can be avoided by using better design patterns.
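To illustrate the delegation point with a rough sketch: a plain mapping of names to callables covers many of the situations where people reach for dynamically built source.
# A rough sketch of delegation instead of eval/exec: map names to callables.
def add(a, b):
    return a + b

def multiply(a, b):
    return a * b

operations = {'add': add, 'multiply': multiply}

op_name = 'add'  # this could come from user input; unknown names simply raise KeyError
print(operations[op_name](2, 3))  # 5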
As a footnote, in the hands of deranged sociopaths, it may not work out well. However, when confronted with deranged sociopathic users or administrators, it's best not to give them interpreted Python in the first place. In the hands of the truly evil, Python can be a liability; eval doesn't increase the risk at all.
Yes, it is:
Hack using Python:
>>> eval(input())
__import__('os').listdir('.')
...........
........... #dir listing
...........
The code below will list all tasks running on a Windows machine.
>>> eval(input())
__import__('subprocess').Popen(['tasklist'], stdout=__import__('subprocess').PIPE).communicate()[0]
In Linux:
>>> eval(input())
__import__('subprocess').Popen(['ps', 'aux'], stdout=__import__('subprocess').PIPE).communicate()[0]
In this case, yes. Instead of
exec("self.Foo = val")
you should use the built-in function setattr:
setattr(self, 'Foo', val)
Other users pointed out how your code can be changed so as not to depend on eval; I'll offer a legitimate use-case for eval, one that is found even in CPython: testing.
Here's one example I found in test_unary.py, where a test checks that (+|-|~)b'a' raises a TypeError:
def test_bad_types(self):
    for op in '+', '-', '~':
        self.assertRaises(TypeError, eval, op + "b'a'")
        self.assertRaises(TypeError, eval, op + "'a'")
The usage is clearly not bad practice here; you define the input and merely observe behavior. eval is handy for testing.
Take a look at this search for eval, performed on the CPython git repository; testing with eval is heavily used.
It's worth noting that for the specific problem in question, there are several alternatives to using eval:
The simplest, as noted, is using setattr:
def __init__(self):
    for name in self.attsToStore:
        setattr(self, name, None)
A less obvious approach is updating the object's __dict__ directly. If all you want to do is initialize the attributes to None, then this is less straightforward than the above. But consider this:
def __init__(self, **kwargs):
    for name in self.attsToStore:
        self.__dict__[name] = kwargs.get(name, None)
This allows you to pass keyword arguments to the constructor, e.g.:
s = Song(name='History', artist='The Verve')
It also allows you to make your use of locals() more explicit, e.g.:
s = Song(**locals())
...and, if you really want to assign None to the attributes whose names are found in locals():
s = Song(**dict([(k, None) for k in locals().keys()]))
Another approach to providing an object with default values for a list of attributes is to define the class's __getattr__ method:
def __getattr__(self, name):
    if name in self.attsToStore:
        return None
    # __getattr__ should raise AttributeError for names it doesn't handle
    raise AttributeError(name)
This method gets called when the named attribute isn't found in the normal way. This approach is somewhat less straightforward than simply setting the attributes in the constructor or updating the __dict__, but it has the merit of not actually creating the attribute unless it is explicitly set, which can substantially reduce the class's memory usage.
The point of all this: There are lots of reasons, in general, to avoid eval - the security problem of executing code that you don't control, the practical problem of code you can't debug, etc. But an even more important reason is that generally, you don't need to use it. Python exposes so much of its internal mechanisms to the programmer that you rarely really need to write code that writes code.
When eval() is used to process user-provided input, you enable the user to drop to a REPL by providing something like this:
"__import__('code').InteractiveConsole(locals=globals()).interact()"
You may get away with it, but normally you don't want vectors for arbitrary code execution in your applications.
In addition to Nadia Alramli's answer, since I am new to Python and was eager to check how using eval affects the timings, I tried a small program and below are the observations:
# Difference while using print() with eval() and without eval() to print an int = 0.528969 s per 100000 eval() calls
from datetime import datetime

def strOfNos():
    s = []
    for x in range(100000):
        s.append(str(x))
    return s

nums = strOfNos()
print(datetime.now())
for x in nums:
    print(x)  # print(eval(x))
print(datetime.now())
# when using eval(int)
# 2018-10-29 12:36:08.206022
# 2018-10-29 12:36:10.407911
# diff = 2.201889 s
# when using int only
# 2018-10-29 12:37:50.022753
# 2018-10-29 12:37:51.090045
# diff = 1.67292 s
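For what it's worth, the standard library's timeit module gives a more controlled way to measure the same overhead; a minimal sketch:
import timeit

# Compare turning the string '123' into an int directly vs. going through eval().
print(timeit.timeit("int('123')", number=100000))
print(timeit.timeit("eval('123')", number=100000))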

How to pass arguments to QTableWidget table cell signals in PyQt5 (PySide2)? [duplicate]

According to the API, the PyQt5 or PySide2 cell-oriented signals of a QTableWidget are supposed to receive two integer parameters for row and column respectively. For example:
def cellClicked (row, column)
Now when I try to call them like that:
table = QTableWidget(5, 5)

def slotCellClick1():
    print('something')

table.cellClicked(0, 0).connect(slotCellClick1)
I get: TypeError: native Qt signal is not callable.
The solution that compiles, and the one described in examples so far, is of this form:
table.cellClicked.connect(slotCellClick1)
which works for cell click, in general.
Am I getting the concept wrong, or is there still a way to address a specific cell's signals with these API functions? Otherwise, what would be the workaround to trigger click signals for a specific cell?
That's not how signals and slots work.
Toolkits, and APIs in general, use callbacks to notify the programmer when something happens, by calling a function to react to it; this approach usually provides an interface that can pass some arguments along with the notification.
Suppose you have a module that at a certain point can change "something" in it; you want to be notified whenever that change happens and eventually do something with it:
# pseudo code
from some_api import some_object

def some_function(argument):
    print("Something changed to {}!".format(argument))

some_object.set_something_changed_callback(some_function)

>>> some_object.change_something(True)
Something changed to True!
As you can see, the something_changed_callback is not about the possible value of "something", as the callback will be called anyway; if you want to react to a specific value of "something", you'll have to check that within the callback function.
While for simpler APIs it's usually fine to have a set_*_changed_callback() for each possible case, in complex toolkits like Qt that would be unnecessary (adding thousands of functions, one for each signal) and confusing.
Qt (like other toolkits such as Gtk) uses a similar callback technique but with a unified interface to connect all signals to their "callbacks"; the concept doesn't change that much, at least from the coding perspective, but it makes things easier.
Originally, the syntax was like this:
QObject.connect(some_object, SIGNAL("something_changed(bool)"), some_function)
but since some years it's been simplified to the "new style" connection:
some_object.something_changed.connect(some_function)
which is almost the same as the above:
some_object.set_something_changed_callback(some_function)
So, long story short, you can't connect to a specific signal "result", you'll have to check it by yourself.
I can understand your point of view: «I'm interested in calling my slot only when the value is x/y/z». It would make sense, but that kind of interface could be problematic from the API implementation point of view.
Most importantly, a lot of signals emit objects that are class instances (QModelIndex, QStandardItem, etc.) that are created at runtime, or even have parents that don't exist yet when you have to connect them, or are mutable objects (one might want to check if a list or dictionary is equal to the one emitted, or if it is the same object).
Also, some signals have multiple arguments, and one could be interested in checking only some or one of them, but that kind of checking would be almost impossible to implement with a simple function argument without any possibility of error or exception. Let's say you want to connect to cellClicked whenever the column is 1, no matter what row; you'd probably think that a good way would be to use cellClicked(None, 1), cellClicked(False, 1) or cellClicked(-1, 1), but some signals actually emit None, False or -1, so there wouldn't be a simple standardized way to say "ignore that argument" (if not by using a custom type).
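As a minimal sketch of the usual workaround (PyQt5 assumed here): connect to the plain cellClicked signal and filter for the specific cell inside the slot.
import sys
from PyQt5.QtWidgets import QApplication, QTableWidget

app = QApplication(sys.argv)
table = QTableWidget(5, 5)

def on_cell_clicked(row, column):
    # React only to the cell you care about; ignore every other click.
    if row == 0 and column == 0:
        print('top-left cell clicked')

table.cellClicked.connect(on_cell_clicked)
table.show()
sys.exit(app.exec_())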
After searching, I found an answer that addresses my question for the specific case of cellDoubleClicked: https://stackoverflow.com/a/46738897/3597222

What is the 'better' approach to configuring a tkinter widget?

I've been working with tkinter for a while now.
There are two ways to configure a widget, or at least I only know of two:
1: frame.config(bg='#123456')
2: frame["bg"] = '#123456'
I use the latter more often; the first seems useful to me mainly when several options need to be set at the same time.
Recently I was wondering if one of them is 'better' - for example faster - or has any other advantage.
I don't think it's a crucially important question, but maybe someone knows.
Studying the tkinter code base, we find the following:
class Frame(Widget):
    # Other code here

class Widget(BaseWidget, Pack, Place, Grid):
    pass

class BaseWidget(Misc):
    # other code here

class Misc:
    # various code
    def __setitem__(self, key, value):
        self.configure({key: value})
Therefore, the two methods are actually equivalent. The line
frame['bg'] = '#123456'
is interpreted as frame.__setitem__('bg', '#123456'), which, after passing through the inheritance chain, lands on the internal class Misc, which simply passes it to the configure method. As far as your question about efficiency is concerned, the first method is probably slightly faster because it goes through one less call, but the speed difference is far too small to be worth worrying about.
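A quick way to convince yourself of the equivalence (a small sketch; it needs a display to run):
import tkinter as tk

root = tk.Tk()
frame = tk.Frame(root, width=200, height=100)
frame.pack()

frame.config(bg='#123456')   # method 1: calls configure() directly
frame['bg'] = '#123456'      # method 2: Misc.__setitem__ forwards to configure()

print(frame.cget('bg'))      # '#123456' either way
root.mainloop()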

What's the difference between T and def in groovy?

I was working with some SQL earlier that got me wondering what the difference was between these two typings.
In my example, I have two GroovyRowResults - pastData and currentData. Now, I need to compare two points from these result sets. These values should both be of indefinite type. So, when defining them, what's the difference between
def pastResult = pastData[commonKey]
def currentResult = currentData[commonKey]
if (pastResult == currentResult) {
    doSomething()
}
and
T pastResult = pastData[commonKey]
T currentResult = currentData[commonKey]
if (pastResult == currentResult) {
    doSomething()
}
I'm assuming T has been declared in your method/class earlier. In that case, it's a generic and the T refers to the same type of object consistently, whereas def is basically just an alias for Object.
While T doesn't guarantee the two objects are the exact same class (they may just implement the same interface, or one may be a subclass), it does create more of a contract for the objects you are dealing with. If you pass the same types of objects into the method, there will be no difference, but if you pass different or unexpected types, it becomes more useful.
In other words, in Groovy it's done for readability and consistency, and using generics is much better than using dynamic typing.
I don't think the second example will work unless there is some type named T in scope. Check this link:
http://groovy-lang.org/semantics.html#_variable_definition

What programming languages will let me manipulate the sequence of instructions in a method?

I have an upcoming project in which a core requirement will be to mutate the way a method works at runtime. Note that I'm not talking about a higher level OO concept like "shadow one method with another", although the practical effect would be similar.
The key properties I'm after are:
I must be able to modify the method in such a way that I can add new expressions, remove existing expressions, or modify any of the expressions that take place in it.
After modifying the method, subsequent calls to that method would invoke the new sequence of operations. (Or, if the language binds methods rather than evaluating every single time, provide me a way to unbind/rebind the new method.)
Ideally, I would like to manipulate the atomic units of the language (e.g., "invoke method foo on object bar") and not the assembly directly (e.g. "pop these three parameters onto the stack"). In other words, I'd like to be able to have high confidence that the operations I construct are semantically meaningful in the language. But I'll take what I can get.
If you're not sure if a candidate language meets these criteria, here's a simple litmus test:
Can you write another method called clean which:
accepts a method m as input
returns another method m2 that performs the same operations as m
such that m2 is identical to m, but doesn't contain any calls to the print-to-standard-out method in your language (puts, System.Console.WriteLn, println, etc.)?
I'd like to do some preliminary research now and figure out what the strongest candidates are. Having a large, active community is as important to me as the practicality of implementing what I want to do. I am aware that there may be some uncharted territory here, since manipulating bytecode directly is not typically an operation that needs to be exposed.
What are the choices available to me? If possible, can you provide a toy example in one or more of the languages that you recommend, or point me to a recent example?
Update: The reason I'm after this is that I'd like to write a program which is capable of modifying itself at runtime in response to new information. This modification goes beyond mere parameters or configurable data, but full-fledged, evolved changes in behavior. (No, I'm not writing a virus. ;) )
Well, you could always use .NET and the Expression libraries to build up expressions. That, I think, is really your best bet, as you can build up representations of commands in memory, and there is good library support for manipulating and traversing them.
Well, those languages with really strong macro support (in particular Lisps) could qualify.
But are you sure you actually need to go this deep? I don't know what you're trying to do, but I suppose you could emulate it without getting too far into metaprogramming. Say, instead of using a method and manipulating it, use a collection of functions (with some way of sharing state, e.g. an object holding state passed to each), as in the sketch below.
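A rough Python sketch of that idea (the names here are invented): keep the behavior as a list of callables that can be added, removed, or replaced at runtime.
class Pipeline:
    """Holds shared state plus an editable list of steps that act on it."""
    def __init__(self):
        self.state = {}
        self.steps = []  # each step is a callable taking the shared state

    def run(self):
        for step in self.steps:
            step(self.state)

def greet(state):
    print('hello', state.get('name'))

pipeline = Pipeline()
pipeline.state['name'] = 'world'
pipeline.steps.append(greet)
pipeline.run()  # hello world

# "Modify the method" at runtime by editing the step list:
pipeline.steps.remove(greet)
pipeline.steps.append(lambda state: print('goodbye', state.get('name')))
pipeline.run()  # goodbye world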
I would say Groovy can do this.
For example
class Foo {
    void bar() {
        println "foobar"
    }
}

Foo.metaClass.bar = { ->
    println "barfoo"
}
Or for a specific instance of Foo, without affecting other instances:
fooInstance.metaClass.bar = { ->
    println "instance barfoo"
}
Using this approach I can modify, remove, or add expressions in the method, and subsequent calls will use the new version. You can do quite a lot with the Groovy metaClass.
In Java, many professional frameworks do this using the open-source ASM framework. There is a list of well-known Java apps and libraries that include ASM. A few years ago BCEL was also very widely used.
There are languages/environments that allow real runtime modification - for example, Common Lisp, Smalltalk, Forth. Use one of them if you really know what you're doing. Otherwise, you can simply employ an interpreter pattern for the evolving part of your code; that is possible (and trivial) with any OO or functional language.
