Related
I know that python has a len() function that is used to determine the size of a string, but I was wondering why it's not a method of the string object?
Strings do have a length method: __len__()
The protocol in Python is to implement this method on objects which have a length and use the built-in len() function, which calls it for you, similar to the way you would implement __iter__() and use the built-in iter() function (or have the method called behind the scenes for you) on objects which are iterable.
See Emulating container types for more information.
Here's a good read on the subject of protocols in Python: Python and the Principle of Least Astonishment
Jim's answer to this question may help; I copy it here. Quoting Guido van Rossum:
First of all, I chose len(x) over x.len() for HCI reasons (def __len__() came much later). There are two intertwined reasons actually, both HCI:
(a) For some operations, prefix notation just reads better than postfix — prefix (and infix!) operations have a long tradition in mathematics which likes notations where the visuals help the mathematician thinking about a problem. Compare the easy with which we rewrite a formula like x*(a+b) into x*a + x*b to the clumsiness of doing the same thing using a raw OO notation.
(b) When I read code that says len(x) I know that it is asking for the length of something. This tells me two things: the result is an integer, and the argument is some kind of container. To the contrary, when I read x.len(), I have to already know that x is some kind of container implementing an interface or inheriting from a class that has a standard len(). Witness the confusion we occasionally have when a class that is not implementing a mapping has a get() or keys() method, or something that isn’t a file has a write() method.
Saying the same thing in another way, I see ‘len‘ as a built-in operation. I’d hate to lose that. /…/
Python is a pragmatic programming language, and the reasons for len() being a function and not a method of str, list, dict etc. are pragmatic.
The len() built-in function deals directly with built-in types: the CPython implementation of len() actually returns the value of the ob_size field in the PyVarObject C struct that represents any variable-sized built-in object in memory. This is much faster than calling a method -- no attribute lookup needs to happen. Getting the number of items in a collection is a common operation and must work efficiently for such basic and diverse types as str, list, array.array etc.
However, to promote consistency, when applying len(o) to a user-defined type, Python calls o.__len__() as a fallback. __len__, __abs__ and all the other special methods documented in the Python Data Model make it easy to create objects that behave like the built-ins, enabling the expressive and highly consistent APIs we call "Pythonic".
By implementing special methods your objects can support iteration, overload infix operators, manage contexts in with blocks etc. You can think of the Data Model as a way of using the Python language itself as a framework where the objects you create can be integrated seamlessly.
A second reason, supported by quotes from Guido van Rossum like this one, is that it is easier to read and write len(s) than s.len().
The notation len(s) is consistent with unary operators with prefix notation, like abs(n). len() is used way more often than abs(), and it deserves to be as easy to write.
There may also be a historical reason: in the ABC language which preceded Python (and was very influential in its design), there was a unary operator written as #s which meant len(s).
There is a len method:
>>> a = 'a string of some length'
>>> a.__len__()
23
>>> a.__len__
<method-wrapper '__len__' of str object at 0x02005650>
met% python -c 'import this' | grep 'only one'
There should be one-- and preferably only one --obvious way to do it.
There are some great answers here, and so before I give my own I'd like to highlight a few of the gems (no ruby pun intended) I've read here.
Python is not a pure OOP language -- it's a general purpose, multi-paradigm language that allows the programmer to use the paradigm they are most comfortable with and/or the paradigm that is best suited for their solution.
Python has first-class functions, so len is actually an object. Ruby, on the other hand, doesn't have first class functions. So the len function object has it's own methods that you can inspect by running dir(len).
If you don't like the way this works in your own code, it's trivial for you to re-implement the containers using your preferred method (see example below).
>>> class List(list):
... def len(self):
... return len(self)
...
>>> class Dict(dict):
... def len(self):
... return len(self)
...
>>> class Tuple(tuple):
... def len(self):
... return len(self)
...
>>> class Set(set):
... def len(self):
... return len(self)
...
>>> my_list = List([1,2,3,4,5,6,7,8,9,'A','B','C','D','E','F'])
>>> my_dict = Dict({'key': 'value', 'site': 'stackoverflow'})
>>> my_set = Set({1,2,3,4,5,6,7,8,9,'A','B','C','D','E','F'})
>>> my_tuple = Tuple((1,2,3,4,5,6,7,8,9,'A','B','C','D','E','F'))
>>> my_containers = Tuple((my_list, my_dict, my_set, my_tuple))
>>>
>>> for container in my_containers:
... print container.len()
...
15
2
15
15
Something missing from the rest of the answers here: the len function checks that the __len__ method returns a non-negative int. The fact that len is a function means that classes cannot override this behaviour to avoid the check. As such, len(obj) gives a level of safety that obj.len() cannot.
Example:
>>> class A:
... def __len__(self):
... return 'foo'
...
>>> len(A())
Traceback (most recent call last):
File "<pyshell#8>", line 1, in <module>
len(A())
TypeError: 'str' object cannot be interpreted as an integer
>>> class B:
... def __len__(self):
... return -1
...
>>> len(B())
Traceback (most recent call last):
File "<pyshell#13>", line 1, in <module>
len(B())
ValueError: __len__() should return >= 0
Of course, it is possible to "override" the len function by reassigning it as a global variable, but code which does this is much more obviously suspicious than code which overrides a method in a class.
I'm trying to get numpy to use my own implementation of an RNG in for consistency reasons. My understanding, based on the little documentation I could find, from the numpy docs here and here is that I need to provide a custom BitGenerator class that implements the random_raw method and then initialise using np.random.Generator, so I tried this:
import numpy as np
class TestBitGenerator(np.random.BitGenerator):
def __init__(self):
super().__init__(0)
self.counter = 0
def random_raw(self, size=None):
self.counter += 1
if size is None:
return self.counter
return np.full(n, self.counter)
mc_adapter = TestBitGenerator()
npgen = np.random.Generator(mc_adapter)
print(npgen.random())
which results in a segfault:
$ python bitgen.py
Segmentation fault (core dumped)
I assume I'm missing something (from TestBitGenerator?) here, can anyone point me in the right direction?
I tried not subclassing np.random.BitGenerator, and got a different error:
object has no attribute 'capsule'
I using numpy 1.19.2 and python 3.8.2
Actually you can do it in pure python if you wrap it using the library RandomGen (this was the incubator for the current np.random.Generator). So there is a UserBitGenerator that allow you to use only python: https://bashtage.github.io/randomgen/bit_generators/userbitgenerator.html
Sad that this did not make it in numpy (or if it is I did not find it yet...).
If you are using pybind11, it's not too hard, although there's no good documentation explaining any of this, so it took me a while to figure it out. Here is what I came up with, so hopefully this will save a few people some consternation.
Define your custom bit generator in C++. You'll need to define three functions: next_unint32, next_uint64, and next_double, each returning the next random value of the given type. (Typically one of these will be primary, and you'll define the other two based on it.) They all need to take one argument, given as void*, which you'll want to recast to whatever you actually have for a state object. Say a pointer to an instance of some class CustomRNG or whatever.
Write a function that takes a pointer to your state object and a py::capsule (I use namespace py = pybind11, so spell that out if you don't do this.) This capsule wraps a bitgen_t struct, and writes the appropriate values to its elements. Something like the following:
#include <numpy/random/bitgen.h> // For bitgen_t struct
// For reference:
/*
typedef struct bitgen {
void *state;
uint64_t (*next_uint64)(void *st);
uint32_t (*next_uint32)(void *st);
double (*next_double)(void *st);
uint64_t (*next_raw)(void *st);
} bitgen_t;
*/
void SetupBitGen(CustomRNG* rng, py::capsule capsule)
{
bitgen_t* bg(capsule);
bg->state = rng;
bg->next_uint64 = next_uint64;
bg->next_uint32 = next_uint32;
bg->next_double = next_double;
bg->next_raw = next_uint64;
};
Have pybind11 wrap this for you so you can call it from python. I'm also assuming that your CustomRNG type is already wrapped and accessible in python.
my_module.def("setup_bitgen", &SetupBitGen)
Or you could make this function a method of your CustomRNG class:
py::class_<CustomRNG>
[...]
.def("setup_bitgen", &SetupBitGen);
In Python make a class derived from numpy.random.BitGenerator, which takes an instance of your CustomRNG as an argument. For the __init__, first call super().__init__() to make the things numpy expects to be there. Then call lib.setup_bitgen(self.rng, self.capsule) (or self.rng.setup_bitgen(self.capsule) if you went the method route) to update the elements of the capsule.
That's it. Now you can make your own BitGenerator object using this class, and pass that as an argument of numpy.random.Generator.
This is the method I used in GalSim, so you can see a worked example of it here.
I did some digging here and Generator's constructor looks like this:
def __init__(self, bit_generator):
self._bit_generator = bit_generator
capsule = bit_generator.capsule
cdef const char *name = "BitGenerator"
if not PyCapsule_IsValid(capsule, name):
raise ValueError("Invalid bit generator. The bit generator must "
"be instantiated.")
self._bitgen = (<bitgen_t *> PyCapsule_GetPointer(capsule, name))[0]
self.lock = bit_generator.lock
also here explains the requirements for the bitgen_t struct:
struct bitgen:
void *state
npy_uint64 (*next_uint64)(void *st) nogil
uint32_t (*next_uint32)(void *st) nogil
double (*next_double)(void *st) nogil
npy_uint64 (*next_raw)(void *st) nogil
ctypedef bitgen bitgen_t
so from what I can make out, it looks like any implementation of BitGenerator must be done in cython, any attempt to implement one in pure python (and probably pybind11 too) just won't work(?)
As I'm not familiar with cython and if/how it could coexist with pybind11, for now I'm just going to ensure each of my (parallel) processes explicitly calls np.random.Generator with a numpy RNG seeded deterministically so that it's independent from my own RNG steam.
How can I pass an integer by reference in Python?
I want to modify the value of a variable that I am passing to the function. I have read that everything in Python is pass by value, but there has to be an easy trick. For example, in Java you could pass the reference types of Integer, Long, etc.
How can I pass an integer into a function by reference?
What are the best practices?
It doesn't quite work that way in Python. Python passes references to objects. Inside your function you have an object -- You're free to mutate that object (if possible). However, integers are immutable. One workaround is to pass the integer in a container which can be mutated:
def change(x):
x[0] = 3
x = [1]
change(x)
print x
This is ugly/clumsy at best, but you're not going to do any better in Python. The reason is because in Python, assignment (=) takes whatever object is the result of the right hand side and binds it to whatever is on the left hand side *(or passes it to the appropriate function).
Understanding this, we can see why there is no way to change the value of an immutable object inside a function -- you can't change any of its attributes because it's immutable, and you can't just assign the "variable" a new value because then you're actually creating a new object (which is distinct from the old one) and giving it the name that the old object had in the local namespace.
Usually the workaround is to simply return the object that you want:
def multiply_by_2(x):
return 2*x
x = 1
x = multiply_by_2(x)
*In the first example case above, 3 actually gets passed to x.__setitem__.
Most cases where you would need to pass by reference are where you need to return more than one value back to the caller. A "best practice" is to use multiple return values, which is much easier to do in Python than in languages like Java.
Here's a simple example:
def RectToPolar(x, y):
r = (x ** 2 + y ** 2) ** 0.5
theta = math.atan2(y, x)
return r, theta # return 2 things at once
r, theta = RectToPolar(3, 4) # assign 2 things at once
Not exactly passing a value directly, but using it as if it was passed.
x = 7
def my_method():
nonlocal x
x += 1
my_method()
print(x) # 8
Caveats:
nonlocal was introduced in python 3
If the enclosing scope is the global one, use global instead of nonlocal.
Maybe it's not pythonic way, but you can do this
import ctypes
def incr(a):
a += 1
x = ctypes.c_int(1) # create c-var
incr(ctypes.ctypes.byref(x)) # passing by ref
Really, the best practice is to step back and ask whether you really need to do this. Why do you want to modify the value of a variable that you're passing in to the function?
If you need to do it for a quick hack, the quickest way is to pass a list holding the integer, and stick a [0] around every use of it, as mgilson's answer demonstrates.
If you need to do it for something more significant, write a class that has an int as an attribute, so you can just set it. Of course this forces you to come up with a good name for the class, and for the attribute—if you can't think of anything, go back and read the sentence again a few times, and then use the list.
More generally, if you're trying to port some Java idiom directly to Python, you're doing it wrong. Even when there is something directly corresponding (as with static/#staticmethod), you still don't want to use it in most Python programs just because you'd use it in Java.
Maybe slightly more self-documenting than the list-of-length-1 trick is the old empty type trick:
def inc_i(v):
v.i += 1
x = type('', (), {})()
x.i = 7
inc_i(x)
print(x.i)
A numpy single-element array is mutable and yet for most purposes, it can be evaluated as if it was a numerical python variable. Therefore, it's a more convenient by-reference number container than a single-element list.
import numpy as np
def triple_var_by_ref(x):
x[0]=x[0]*3
a=np.array([2])
triple_var_by_ref(a)
print(a+1)
output:
7
The correct answer, is to use a class and put the value inside the class, this lets you pass by reference exactly as you desire.
class Thing:
def __init__(self,a):
self.a = a
def dosomething(ref)
ref.a += 1
t = Thing(3)
dosomething(t)
print("T is now",t.a)
In Python, every value is a reference (a pointer to an object), just like non-primitives in Java. Also, like Java, Python only has pass by value. So, semantically, they are pretty much the same.
Since you mention Java in your question, I would like to see how you achieve what you want in Java. If you can show it in Java, I can show you how to do it exactly equivalently in Python.
class PassByReference:
def Change(self, var):
self.a = var
print(self.a)
s=PassByReference()
s.Change(5)
class Obj:
def __init__(self,a):
self.value = a
def sum(self, a):
self.value += a
a = Obj(1)
b = a
a.sum(1)
print(a.value, b.value)// 2 2
In Python, everything is passed by value, but if you want to modify some state, you can change the value of an integer inside a list or object that's passed to a method.
integers are immutable in python and once they are created we cannot change their value by using assignment operator to a variable we are making it to point to some other address not the previous address.
In python a function can return multiple values we can make use of it:
def swap(a,b):
return b,a
a,b=22,55
a,b=swap(a,b)
print(a,b)
To change the reference a variable is pointing to we can wrap immutable data types(int, long, float, complex, str, bytes, truple, frozenset) inside of mutable data types (bytearray, list, set, dict).
#var is an instance of dictionary type
def change(var,key,new_value):
var[key]=new_value
var =dict()
var['a']=33
change(var,'a',2625)
print(var['a'])
Suppose I have the following C++ function:
int summap(const map<int,int>& m) {
...
}
I try to call it from Python using cppyy, by sending a dict:
import cppyy
cppyy.include("functions.hpp")
print(cppyy.gbl.summap({55:1,66:2,77:3}))
I get an error:
TypeError: int ::summap(const map<int,int>& v) =>
TypeError: could not convert argument 1
How can I call this function?
There is no relation between Python's dict and C++'s std::map (the two have completely different internal structure), so this requires a conversion, there is currently no automatic one in cppyy, so do something like this:
cppm = cppyy.gbl.std.map[int, int]()
for key, value in {55:1,66:2,77:3}.items():
cppm[key] = value
then pass cppm to summap.
Automatic support for python list/tuple -> std::vector is available, but there, too, it's no smarter than copying (likewise, b/c the internal structure is completely different), so any automatic std::map <-> python dict conversion would internally still have to do a copy like the above.
I just started working with PyCharm Community Edition 2016.3.2 today. Every time I assign a value from my function at_square, it warns me that 'Function at_square doesn't return anything,' but it definitely does in every instance unless an error is raised during execution, and every use of the function is behaving as expected. I want to know why PyCharm thinks it doesn't and if there's anything I can do to correct it. (I know there is an option to suppress the warning for that particular function, but it does so by inserting a commented line in my code above the function, and I find it just as annoying to have to remember to take that out at the end of the project.)
This is the function in question:
def at_square(self, square):
""" Return the value at the given square """
if type(square) == str:
file, rank = Board.tup_from_an(square)
elif type(square) == tuple:
file, rank = square
else:
raise ValueError("Expected tuple or AN str, got " + str(type(square)))
if not 0 <= file <= 7:
raise ValueError("File out of range: " + str(file))
if not 0 <= rank <= 7:
raise ValueError("Rank out of range: " + str(rank))
return self.board[file][rank]
If it matters, this is more precisely a method of an object. I stuck with the term 'function' because that is the language PyCharm is using.
My only thought is that my use of error raising might be confusing PyCharm, but that seems too simple. (Please feel free to critique my error raising, as I'm not sure this is the idiomatic way to do it.)
Update: Humorously, if I remove the return line altogether, the warning goes away and returns immediately when I put it back. It also goes away if I replace self.board[file][rank] with a constant value like 8. Changing file or rank to constant values does not remove the warning, so I gather that PyCharm is somehow confused about the nature of self.board, which is a list of 8 other lists.
Update: Per the suggestion of #StephenRauch, I created a minimal example that reflects everything relevant to data assignment done by at_square:
class Obj:
def __init__(self):
self.nested_list = [[0],[1]]
#staticmethod
def tup_method(data):
return tuple(data)
def method(self,data):
x,y = Obj.tup_method(data)
return self.nested_list[x][y]
def other_method(self,data):
value = self.method(data)
print(value)
x = Obj()
x.other_method([1,2])
PyCharm doesn't give any warnings for this. In at_square, I've tried commenting out every single line down to the two following:
def at_square(self, square):
file, rank = Board.tup_from_an(square)
return self.board[file][rank]
PyCharm gives the same warning. If I leave only the return line, then and only then does the warning disappear. PyCharm appears to be confused by the simultaneous assignment of file and rank via tup_from_an. Here is the code for that method:
#staticmethod
def tup_from_an(an):
""" Convert a square in algebraic notation into a coordinate tuple """
if an[0] in Board.a_file_dict:
file = Board.a_file_dict[an[0]]
else:
raise ValueError("Invalid an syntax (file out of range a-h): " + str(an))
if not an[1].isnumeric():
raise ValueError("Invalid an syntax (rank out of range 1-8): " + str(an))
elif int(an[1]) - 1 in Board.n_file_dict:
rank = int(an[1]) - 1
else:
raise ValueError("Invalid an syntax (rank out of range 1-8): " + str(an))
return file, rank
Update: In its constructor, the class Board (which is the parent class for all these methods) saves a reference to the instance in a static variable instance. self.at_square(square) gives the warning, while Board.instance.at_square(square) does not. I'm still going to use the former where appropriate, but that could shed some light on what PyCharm is thinking.
PyCharm assumes a missing return value if the return value statically evaluates to None. This can happen if initialising values using None, and changing their type later on.
class Foo:
def __init__(self):
self.qux = [None] # infers type for Foo().qux as List[None]
def bar(self):
return self.qux[0] # infers return type as None
At this point, Foo.bar is statically inferred as (self: Foo) -> None. Dynamically changing the type of qux via side-effects does not update this:
foo = Foo()
foo.qux = [2] # *dynamic* type of foo.bar() is now ``(self: Foo) -> int``
foo_bar = foo.bar() # Function 'bar' still has same *static* type
The problem is that you are overwriting a statically inferred class attribute by means of a dynamically assigned instance attribute. That is simply not feasible for static analysis to catch in general.
You can fix this with an explicit type hint.
import typing
class Foo:
def __init__(self):
self.qux = [None] # type: typing.List[int]
def bar(self):
return self.qux[0] # infers return type as int
Since Python 3.5, you can also use inline type hints. These are especially useful for return types.
import typing
class Foo:
def __init__(self):
# initial type hint to enable inference
self.qux: typing.List[int] = [None]
# explicit return type hint to override inference
def bar(self) -> int:
return self.qux[0] # infers return type as int
Note that it is still a good idea to rely on inference where it works! Annotating only self.qux makes it easier to change the type later on. Annotating bar is mostly useful for documentation and to override incorrect inference.
If you need to support pre-3.5, you can also use stub files. Say your class is in foomodule.py, create a file called foomodule.pyi. Inside, just add the annotated fields and function signatures; you can (and should) leave out the bodies.
import typing
class Foo:
# type hint for fields
qux: typing.List[int]
# explicit return type hint to override inference
def bar(self) -> int:
...
Type hinting as of Python 3.6
The style in the example below is now recommended:
from typing import typing
class Board:
def __init__(self):
self.board: List[List[int]] = []
Quick Documentation
Pycharm's 'Quick Documentation' show if you got the typing right. Place the cursor in the middle of the object of interest and hit Ctrl+Q. I suspect the types from tup_from_an(an) is not going to be as desired. You could, try and type hint all the args and internal objects, but it may be better value to type hint just the function return types. Type hinting means I don't need to trawl external documentation, so I focus effort on objects that'll be used by external users and try not to do too much internal stuff. Here's both arg and return type hinting:
#staticmethod
def tup_from_an(an: List[int]) -> (int, int):
...
Clear Cache
Pycharm can lock onto out dated definitions. Doesn't hurt to go help>find action...>clear caches
No bodies perfect
Python is constantly improving (Type hinting was updated in 3.7) Pycharm is also constantly improving. The price for the fast pace of development on these relatively immature advanced features means checking or submitting to their issue tracker may be the next call.