numpy implementing a custom RNG - python-3.x

I'm trying to get numpy to use my own implementation of an RNG for consistency reasons. My understanding, based on the little documentation I could find (the numpy docs here and here), is that I need to provide a custom BitGenerator class that implements the random_raw method and then initialise a np.random.Generator with it, so I tried this:
import numpy as np

class TestBitGenerator(np.random.BitGenerator):
    def __init__(self):
        super().__init__(0)
        self.counter = 0

    def random_raw(self, size=None):
        self.counter += 1
        if size is None:
            return self.counter
        return np.full(size, self.counter)

mc_adapter = TestBitGenerator()
npgen = np.random.Generator(mc_adapter)
print(npgen.random())
which results in a segfault:
$ python bitgen.py
Segmentation fault (core dumped)
I assume I'm missing something (from TestBitGenerator?) here; can anyone point me in the right direction?
I tried not subclassing np.random.BitGenerator, and got a different error:
object has no attribute 'capsule'
I'm using numpy 1.19.2 and Python 3.8.2.

Actually, you can do it in pure Python if you wrap it using the RandomGen library (this was the incubator for the current np.random.Generator). It provides a UserBitGenerator that allows you to use only Python: https://bashtage.github.io/randomgen/bit_generators/userbitgenerator.html
It's a shame this didn't make it into numpy (or if it did, I haven't found it yet...).
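For illustration, a minimal sketch of that approach (my example, not from the linked docs), assuming randomgen is installed, that UserBitGenerator accepts a zero-argument callable returning the next raw 64-bit value as its first argument, and that the resulting bit generator is accepted by numpy's Generator via its capsule, as the randomgen docs describe:
import numpy as np
from randomgen import UserBitGenerator

class CounterRNG:
    def __init__(self):
        self.counter = 0

    def next_raw(self):
        # Called once per raw 64-bit draw.
        self.counter += 1
        return self.counter

rng = CounterRNG()
bit_gen = UserBitGenerator(rng.next_raw, bits=64)
gen = np.random.Generator(bit_gen)
print(gen.random())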

If you are using pybind11, it's not too hard, although there's no good documentation explaining any of this, so it took me a while to figure it out. Here is what I came up with, so hopefully this will save a few people some consternation.
Define your custom bit generator in C++. You'll need to define three functions: next_uint32, next_uint64, and next_double, each returning the next random value of the given type. (Typically one of these will be primary, and you'll define the other two based on it.) They all need to take one argument, given as void*, which you'll want to recast to whatever you actually have for a state object: say, a pointer to an instance of some class CustomRNG or whatever. A sketch follows below.
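For concreteness, here is a minimal sketch of those three functions, assuming a hypothetical CustomRNG class whose next64() method returns the next 64 random bits:
#include <cstdint>

// Hypothetical state object; substitute whatever your library provides.
class CustomRNG {
public:
    uint64_t next64();  // returns the next 64 random bits
};

uint64_t next_uint64(void* st) {
    return static_cast<CustomRNG*>(st)->next64();
}

uint32_t next_uint32(void* st) {
    // Derive 32 bits from the primary 64-bit source.
    return static_cast<uint32_t>(next_uint64(st) >> 32);
}

double next_double(void* st) {
    // Map the top 53 bits to a double in [0, 1).
    return (next_uint64(st) >> 11) * (1.0 / 9007199254740992.0);
}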
Write a function that takes a pointer to your state object and a py::capsule (I use namespace py = pybind11, so spell that out if you don't do this). The capsule wraps a bitgen_t struct, and the function writes the appropriate values to the struct's elements. Something like the following:
#include <numpy/random/bitgen.h>  // For the bitgen_t struct

// For reference:
/*
typedef struct bitgen {
    void *state;
    uint64_t (*next_uint64)(void *st);
    uint32_t (*next_uint32)(void *st);
    double (*next_double)(void *st);
    uint64_t (*next_raw)(void *st);
} bitgen_t;
*/

void SetupBitGen(CustomRNG* rng, py::capsule capsule)
{
    bitgen_t* bg(capsule);
    bg->state = rng;
    bg->next_uint64 = next_uint64;
    bg->next_uint32 = next_uint32;
    bg->next_double = next_double;
    bg->next_raw = next_uint64;
}
Have pybind11 wrap this for you so you can call it from python. I'm also assuming that your CustomRNG type is already wrapped and accessible in python.
my_module.def("setup_bitgen", &SetupBitGen);
Or you could make this function a method of your CustomRNG class:
py::class_<CustomRNG>
    [...]
    .def("setup_bitgen", &SetupBitGen);
In Python, make a class derived from numpy.random.BitGenerator which takes an instance of your CustomRNG as an argument. In its __init__, first call super().__init__() to create the things numpy expects to be there (including the capsule). Then call lib.setup_bitgen(self.rng, self.capsule) (or self.rng.setup_bitgen(self.capsule) if you went the method route) to update the elements of the capsule.
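Putting it together, a minimal sketch, assuming the pybind11 module is importable as lib (a hypothetical name) and exposes CustomRNG and setup_bitgen as above:
import numpy as np
import lib  # hypothetical pybind11 module name

class CustomBitGenerator(np.random.BitGenerator):
    def __init__(self, rng, seed=0):
        super().__init__(seed)  # creates self.capsule, self.lock, etc.
        self.rng = rng          # keep a reference so the state stays alive
        lib.setup_bitgen(self.rng, self.capsule)

gen = np.random.Generator(CustomBitGenerator(lib.CustomRNG()))
print(gen.random())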
That's it. Now you can make your own BitGenerator object using this class, and pass that as an argument of numpy.random.Generator.
This is the method I used in GalSim, so you can see a worked example of it here.

I did some digging here and Generator's constructor looks like this:
def __init__(self, bit_generator):
    self._bit_generator = bit_generator
    capsule = bit_generator.capsule
    cdef const char *name = "BitGenerator"
    if not PyCapsule_IsValid(capsule, name):
        raise ValueError("Invalid bit generator. The bit generator must "
                         "be instantiated.")
    self._bitgen = (<bitgen_t *> PyCapsule_GetPointer(capsule, name))[0]
    self.lock = bit_generator.lock
also here explains the requirements for the bitgen_t struct:
struct bitgen:
    void *state
    npy_uint64 (*next_uint64)(void *st) nogil
    uint32_t (*next_uint32)(void *st) nogil
    double (*next_double)(void *st) nogil
    npy_uint64 (*next_raw)(void *st) nogil

ctypedef bitgen bitgen_t
so from what I can make out, it looks like any implementation of BitGenerator must be done in cython; any attempt to implement one in pure python (and probably pybind11 too) just won't work(?)
As I'm not familiar with cython or with whether/how it could coexist with pybind11, for now I'm just going to ensure each of my (parallel) processes explicitly calls np.random.Generator with a numpy RNG seeded deterministically, so that it's independent from my own RNG stream.
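For what it's worth, a minimal sketch of that workaround (my own illustration), deriving an independent, reproducible numpy generator for each process from a hypothetical process index:
import numpy as np

def make_generator(process_index, base_seed=12345):
    # SeedSequence produces reproducible, well-separated streams.
    seq = np.random.SeedSequence([base_seed, process_index])
    return np.random.Generator(np.random.PCG64(seq))

gen = make_generator(0)
print(gen.random())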

Related

Python enum.Enum: Create variables to which I can assign enum.Enum members

Creating enumerations in Python 3.4+ is pretty easy:
from enum import Enum

class MyEnum(Enum):
    A = 10
    B = 20
This gets me the type MyEnum.
With this i can assign a variable:
x = MyEnum.A
So far so good.
However, things start to get complicated if I want to use enum.Enum members as arguments to functions or class methods and want to ensure that class attributes only hold enum.Enum members, not other values.
How can I do this? My idea is something like the following, which I consider more of a workaround than a solution:
class EnContainer:
    def __init__(self, val: type(MyEnum.A) = MyEnum.A):
        assert isinstance(val, type(MyEnum.A))
        self._value = val
Do you have any suggestions or do you see any problems with my approach? I have to consider about 10 different enumerations and would like to come to a consistent approach for initialization, setters and getters.
Instead of type(MyEnum.A), just use MyEnum:
def __init__(self, val: MyEnum = MyEnum.A):
    assert isinstance(val, MyEnum)
Never use assert for error checking; asserts are for program validation. In other words: who is calling EnContainer? If only your own code is calling it with already-validated data, then assert is fine; but if code outside your control is calling it, then you should be using proper error checking:
def __init__(self, val: MyEnum = MyEnum.A):
    if not isinstance(val, MyEnum):
        raise ValueError(
            "EnContainer called with %s.%r (should be a 'MyEnum')"
            % (type(val), val)
        )
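Since the question also asks for a consistent approach to initialization, setters, and getters across many enumerations, here is a minimal sketch of one common pattern (my addition, reusing the same MyEnum): route every assignment through a validating property setter.
from enum import Enum

class MyEnum(Enum):
    A = 10
    B = 20

class EnContainer:
    def __init__(self, val: MyEnum = MyEnum.A):
        self.value = val  # goes through the property setter below

    @property
    def value(self) -> MyEnum:
        return self._value

    @value.setter
    def value(self, val: MyEnum) -> None:
        if not isinstance(val, MyEnum):
            raise TypeError(f"expected a MyEnum, got {type(val).__name__}")
        self._value = val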

Is there a way of changing the actual value of an int without creating a new instance? [duplicate]

How can I pass an integer by reference in Python?
I want to modify the value of a variable that I am passing to the function. I have read that everything in Python is pass by value, but there has to be an easy trick. For example, in Java you could pass the reference types of Integer, Long, etc.
How can I pass an integer into a function by reference?
What are the best practices?
It doesn't quite work that way in Python. Python passes references to objects. Inside your function you have an object -- You're free to mutate that object (if possible). However, integers are immutable. One workaround is to pass the integer in a container which can be mutated:
def change(x):
    x[0] = 3

x = [1]
change(x)
print(x)  # [3]
This is ugly/clumsy at best, but you're not going to do any better in Python. The reason is because in Python, assignment (=) takes whatever object is the result of the right hand side and binds it to whatever is on the left hand side *(or passes it to the appropriate function).
Understanding this, we can see why there is no way to change the value of an immutable object inside a function -- you can't change any of its attributes because it's immutable, and you can't just assign the "variable" a new value because then you're actually creating a new object (which is distinct from the old one) and giving it the name that the old object had in the local namespace.
Usually the workaround is to simply return the object that you want:
def multiply_by_2(x):
    return 2 * x

x = 1
x = multiply_by_2(x)
*In the first example case above, 3 actually gets passed to x.__setitem__.
Most cases where you would need to pass by reference are where you need to return more than one value back to the caller. A "best practice" is to use multiple return values, which is much easier to do in Python than in languages like Java.
Here's a simple example:
import math

def RectToPolar(x, y):
    r = (x ** 2 + y ** 2) ** 0.5
    theta = math.atan2(y, x)
    return r, theta  # return 2 things at once

r, theta = RectToPolar(3, 4)  # assign 2 things at once
Not exactly passing a value directly, but using it as if it was passed.
def outer():
    x = 7

    def my_method():
        nonlocal x
        x += 1

    my_method()
    print(x)  # 8

outer()
Caveats:
nonlocal was introduced in Python 3, and the variable must live in an enclosing function scope, as above.
If the enclosing scope is the global one, use global instead of nonlocal (see the sketch below).
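For completeness, a minimal sketch of the global variant mentioned in the caveat:
x = 7

def my_method():
    global x
    x += 1

my_method()
print(x)  # 8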
Maybe it's not the pythonic way, but you can do this with ctypes:
import ctypes

def incr(a):
    a.value += 1  # mutate the C int in place

x = ctypes.c_int(1)  # create a mutable C-style int
incr(x)
print(x.value)  # 2
Really, the best practice is to step back and ask whether you really need to do this. Why do you want to modify the value of a variable that you're passing in to the function?
If you need to do it for a quick hack, the quickest way is to pass a list holding the integer, and stick a [0] around every use of it, as mgilson's answer demonstrates.
If you need to do it for something more significant, write a class that has an int as an attribute, so you can just set it. Of course this forces you to come up with a good name for the class and for the attribute; if you can't think of anything, go back and read the sentence again a few times, and then use the list.
More generally, if you're trying to port some Java idiom directly to Python, you're doing it wrong. Even when there is something directly corresponding (as with static/@staticmethod), you still don't want to use it in most Python programs just because you'd use it in Java.
Maybe slightly more self-documenting than the list-of-length-1 trick is the old empty type trick:
def inc_i(v):
    v.i += 1

x = type('', (), {})()  # an instance of a freshly created empty class
x.i = 7
inc_i(x)
print(x.i)  # 8
A numpy single-element array is mutable and yet for most purposes, it can be evaluated as if it was a numerical python variable. Therefore, it's a more convenient by-reference number container than a single-element list.
import numpy as np

def triple_var_by_ref(x):
    x[0] = x[0] * 3

a = np.array([2])
triple_var_by_ref(a)
print(a + 1)
output:
[7]
The correct answer is to use a class and put the value inside the class; this lets you pass by reference exactly as you desire.
class Thing:
    def __init__(self, a):
        self.a = a

def dosomething(ref):
    ref.a += 1

t = Thing(3)
dosomething(t)
print("T is now", t.a)  # T is now 4
In Python, every value is a reference (a pointer to an object), just like non-primitives in Java. Also, like Java, Python only has pass by value. So, semantically, they are pretty much the same.
Since you mention Java in your question, I would like to see how you achieve what you want in Java. If you can show it in Java, I can show you how to do it exactly equivalently in Python.
class PassByReference:
    def Change(self, var):
        self.a = var
        print(self.a)

s = PassByReference()
s.Change(5)
class Obj:
    def __init__(self, a):
        self.value = a

    def sum(self, a):
        self.value += a

a = Obj(1)
b = a
a.sum(1)
print(a.value, b.value)  # 2 2
In Python, everything is passed by value, but if you want to modify some state, you can change the value of an integer inside a list or object that's passed to a method.
Integers are immutable in Python; once they are created we cannot change their value. Using the assignment operator on a variable just makes it point to some other object, it does not modify the object at the previous address.
In Python a function can return multiple values, and we can make use of that:
def swap(a, b):
    return b, a

a, b = 22, 55
a, b = swap(a, b)
print(a, b)  # 55 22
To change the object a variable is pointing to, we can wrap immutable data types (int, long, float, complex, str, bytes, tuple, frozenset) inside mutable data types (bytearray, list, set, dict):
# var is an instance of dictionary type
def change(var, key, new_value):
    var[key] = new_value

var = dict()
var['a'] = 33
change(var, 'a', 2625)
print(var['a'])  # 2625

Runtime compiling a function with arguments in python

I am trying to use compile to generate, at runtime, a Python function accepting arguments, as follows.
import types
import ast

code = compile("def add(a, b): return a + b", '<string>', 'exec')
fn = types.FunctionType(code, {}, name="add")
print(fn(4, 2))
But it fails with
TypeError: <module>() takes 0 positional arguments but 2 were given
Is there anyway to compile a function accepting arguments using this way or is there any other way to do that?
Compile returns the code object to create a module. In Python 3.6, if you were to disassemble your code object:
>>> import dis
>>> dis.dis(fn)
0 LOAD_CONST 0 (<code object add at ...., file "<string>" ...>)
2 LOAD_CONST 1 ('add')
4 MAKE_FUNCTION 0
6 STORE_NAME 0 (add)
8 LOAD_CONST 2 (None)
10 RETURN_VALUE
That literally translates to: make a function; name it 'add'; return None.
This means that calling your function runs the creation of the module, rather than returning a module or the inner function. So essentially, what you're actually doing is equivalent to the following:
def f():
    def add(a, b):
        return a + b

print(f(4, 2))  # TypeError: f() takes 0 positional arguments but 2 were given
As for how to work around this, the answer depends on what you want to do. For instance, if you want to compile a function using compile, the simple answer is that you won't be able to without doing something similar to the following.
# 'code' is the result of the call to compile.
# In this case we know the function's code object is the first constant
# (from dis), so we go and extract its value.
f_code = code.co_consts[0]
add = types.FunctionType(f_code, {}, "add")

>>> add(4, 2)
6
Alternatively, since defining a function in Python requires running Python code (there is no static compilation by default other than compiling to bytecode), you can exec the module code with custom globals and locals dictionaries, and then extract the values from those:
glob, loc = {}, {}
exec(code, glob, loc)

>>> loc['add'](4, 2)
6
But the real answer is that if you want to do this, the simplest way is generally to generate Abstract Syntax Trees using the ast module, compile that into module code, and evaluate or execute the module; a minimal sketch follows.
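Here is a minimal sketch of that AST route (my illustration): parse the source into a tree, compile the tree to module code, and exec it into a namespace to fish out the function.
import ast

tree = ast.parse("def add(a, b): return a + b")
code = compile(tree, '<string>', 'exec')
ns = {}
exec(code, ns)
print(ns['add'](4, 2))  # 6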
If you want to do bytecode transformation, I'd suggest looking at the codetransformer package on PyPi.
TL;DR using compile will only ever return code for a module, and most serious code generation is done either with ASTs or by manipulating byte codes.
is there any other way to do that?
For what it's worth: I recently created a @compile_fun goodie that considerably eases the process of applying compile on a function. It relies on compile, so it is nothing different from what was explained in the above answers, but it provides an easier way to do it. Your example becomes:
from makefun import compile_fun  # assuming this comes from the makefun library

@compile_fun
def add(a, b):
    return a + b

assert add(1, 2) == 3
You can see that you now can't debug into add with your IDE. Note that this does not improve runtime performance, nor protects your code from reverse-engineering, but it might be convenient if you do not want your users to see the internals of your function when they debug. Note that the obvious drawback is that they will not be able to help you debug your lib, so use with care!
See the makefun documentation for details.
I think this accomplishes what you want in a better way
import types

text = "lambda a, b: a + b"
code = compile(text, '<string>', 'eval')
body = types.FunctionType(code, {})
fn = body()
print(fn(4, 2))
The function being anonymous resolves the implicit namespace issues.
And returning it as a value by using the mode 'eval' is cleaner than lifting it out of the code object's contents, since it does not rely upon the specific habits of the compiler.
More usefully, as you seem to have noticed but not gotten around to using yet (since you import ast), the text passed to compile can actually be an ast object, so you can apply AST transformations on it:
import types
import ast
from somewhere import TransformTree  # your own AST transformer

text = "lambda a, b: a + b"
tree = ast.parse(text, mode='eval')
tree = TransformTree().visit(tree)
code = compile(tree, '<string>', 'eval')
body = types.FunctionType(code, {})
fn = body()
print(fn(4, 2))

How to access object struct fields of subclasses from a Python C extension?

I'm writing a Python C extension that wraps an external C library. In the original library there are structs (of type T for the sake of the discussion), so my extension class looks like this:
typedef struct {
    PyObject_HEAD
    T *cdata;
} TWrapperBase;
I also need to look up pointers in Python from time to time, so I exposed a read-only field _cdata that holds the cdata pointer as an unsigned long long (yes, I know it's not very portable, but that's out of scope here).
Then, I want to be able to add some more methods in Python, but I can't just append them to a class declared in C, so I subclass it and add my new methods:
class TWrapper(TWrapperBase):
    ...
Now, in my C extension code I need a way of accessing the cdata field so I can pass it to library functions. I know that self won't be an instance of TWrapperBase, but rather of TWrapper (the Python subclass). What is the proper way to do this?
static PyObject * doStuff(PyObject *self)
{
    T *cdata_ptr;

    // How to get a pointer to cdata?
    //
    // This looks very unsafe to me, do I have any guarantee of
    // the subclass memory layout?
    // 1. cdata_ptr = ((TWrapperBase*)self)->cdata
    //
    // This is probably safe, but it seems to be a bit of a hassle
    // to query it with a string key
    // 2. cdata_ptr = PyLong_AsVoidPtr(PyObject_GetAttrString(self, "_cdata"))

    do_important_library_stuff(cdata_ptr);
    Py_INCREF(self);
    return self;
}
Thanks!
// This looks very unsafe to me, do I have any guarantee of
// the subclass memory layout?
// 1. cdata_ptr = ((TWrapperBase*)self)->cdata
Yeah, that works. You can look at all the implementations of Python's built-in types and see that they do pretty much the same thing, usually without checking whether they're operating on a subclass instance.
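For illustration, a minimal sketch of option 1 with an optional guard added (my addition, assuming TWrapperBase_Type is the PyTypeObject you defined for the base class):
static PyObject * doStuff(PyObject *self)
{
    /* Optional: reject objects that are not TWrapperBase or a subclass.
       Subclasses extend the base struct, so the cast below is safe for them. */
    if (!PyObject_TypeCheck(self, &TWrapperBase_Type)) {
        PyErr_SetString(PyExc_TypeError, "expected a TWrapperBase instance");
        return NULL;
    }

    T *cdata_ptr = ((TWrapperBase *)self)->cdata;
    do_important_library_stuff(cdata_ptr);

    Py_INCREF(self);
    return self;
}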

PyQt_PyObject equivalent when using new-style signals/slots?

So I have a need to pass around a numpy array in my PyQt Application. I first tried using the new-style signals/slots, defining my signal with:
newChunkToProcess = pyqtSignal(np.array()), however this gives the error:
TypeError: Required argument 'object' (pos 1) not found
I have worked out how to do this with the old-style signals and slots using
self.emit(SIGNAL("newChunkToProcess(PyQt_PyObject)"), np.array([5,1,2])) - (yes, that's just testing data :), but I was wondering, is it possible to do this using the new-style system?
The type you're looking for is np.ndarray
You can tell this from the following code:
>>> arr = np.array([]) # create an array instance
>>> type(arr) # ask 'what type is this object?'
<type 'numpy.ndarray'>
So your signal should look more like:
newChunkToProcess = pyqtSignal(np.ndarray)
(Notice I'm passing the type np.ndarray, rather than an array instance as you tried).
If you don't want to worry about the type of the argument, you could instead use:
newChunkToProcess = pyqtSignal(object)
This should let you send any data type at all through the signal.
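For example, a minimal sketch (my illustration, assuming PyQt5) of declaring and emitting such a signal from a QObject subclass:
import numpy as np
from PyQt5.QtCore import QObject, pyqtSignal

class Worker(QObject):
    # New-style signal carrying a numpy array.
    newChunkToProcess = pyqtSignal(np.ndarray)

    def produce(self):
        self.newChunkToProcess.emit(np.array([5, 1, 2]))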
Also: numpy and Qt do not share any major functionality that I know of. In fact, the two are quite complementary and make a very powerful combination.
You are doing it wrong. You have to pass the data object's type: int, str, ..., in your case np.ndarray.
Like I am doing:
images = pyqtSignal(int, str)
failed = pyqtSignal(str, str)
finished = pyqtSignal(int)
