How to use ray parallelism within a class in python? - python-3.x

I want to use the ray task method rather than the ray actor method to parallelise a method within a class. The reason being the latter seems to need to change how a class is instantiated (as shown here). A toy code example is below, as well as the error
import numpy as np
import ray
class MyClass(object):
def __init__(self):
ray.init(num_cpus=4)
#ray.remote
def func(self, x, y):
return x * y
def my_func(self):
a = [1, 2, 3]
b = np.random.normal(0, 1, 10000)
result = []
# we wish to parallelise over the array `a`
for sub_array in np.array_split(a, 3):
result.append(self.func.remote(sub_array, b))
return result
mc = MyClass()
mc.my_func()
>>> TypeError: missing a required argument: 'y'
The error arises because ray does not seem to be "aware" of the class, and so it expects an argument self.
The code works fine if we do not use classes:
#ray.remote
def func(x, y):
return x * y
def my_func():
a = [1, 2, 3, 4]
b = np.random.normal(0, 1, 10000)
result = []
# we wish to parallelise over the list `a`
# split `a` and send each chunk to a different processor
for sub_array in np.array_split(a, 4):
result.append(func.remote(sub_array, b))
return result
res = my_func()
ray.get(res)
>>> [array([-0.41929678, -0.83227786, -2.69814232, ..., -0.67379119,
-0.79057845, -0.06862196]),
array([-0.83859356, -1.66455572, -5.39628463, ..., -1.34758239,
-1.5811569 , -0.13724391]),
array([-1.25789034, -2.49683358, -8.09442695, ..., -2.02137358,
-2.37173535, -0.20586587]),
array([ -1.67718712, -3.32911144, -10.79256927, ..., -2.69516478,
-3.1623138 , -0.27448782])]```
We see the output is a list of 4 arrays, as expected. How can I get MyClass to work with parallelism using ray?

a few tips:
It's generally recommended that you only use the ray.remote decorator on functions or classes in python (not bound methods).
You should be very very careful about calling ray.init inside the constructor of a function, since ray.init is not idempotent (which means your program will fail if you instantiate multiple instances of MyClass). Instead, you should make sure ray.init is only run once in your program.
I think there's 2 ways of achieving the results you're going for with Ray here.
You could move func out of the class, so it becomes a function instead of a bound method. Note that in this approach MyClass will be serialized, which means that changes that func makes to MyClass will not be reflected anywhere outside the function. In your simplified example, this doesn't appear to be a problem. This approach makes it easiest to achieve the most parallelism.
#ray.remote
def func(obj, x, y):
return x * y
class MyClass(object):
def my_func(self):
...
# we wish to parallelise over the array `a`
for sub_array in np.array_split(a, 3):
result.append(func.remote(self, sub_array, b))
return result
The other approach you could consider is to use async actors. In this approach, the ray actor will handle concurrency via asyncio, but this comes with the limitations of asyncio.
#ray.remote(num_cpus=4)
class MyClass(object):
async def func(self, x, y):
return x * y
def my_func(self):
a = [1, 2, 3]
b = np.random.normal(0, 1, 10000)
result = []
# we wish to parallelise over the array `a`
for sub_array in np.array_split(a, 3):
result.append(self.func.remote(sub_array, b))
return result

Please see this code:
#ray.remote
class Prime:
# Constructor
def __init__(self,number) :
self.num = number
def SumPrime(self,num) :
tot = 0
for i in range(3,num):
c = 0
for j in range(2, int(i**0.5)+1):
if i%j == 0:
c = c + 1
if c == 0:
tot = tot + i
return tot
num = 1000000
start = time.time()
# make an object of Check class
prime = Prime.remote(num)
print("duration =", time.time() - start, "\nsum_prime = ", ray.get(prime.SumPrime.remote(num)))

Related

How to evaluate multiple functions on one generator using asyncio instead of threading?

The Goal
This effort is towards creating an efficient solution to the following problem.
source = lambda: range(1 << 24) # for example
functions = (min, max, sum) # for example
data = tuple(source()) # from some generator
results = tuple(f(data) for f in functions)
This works. The source() function generates however many values it may. They're put into a tuple called data. Then a series of functions are called with that tuple to give the results. These functions iterate over one given parameterized iterator once and then give their result. This is fine for small datasets. However, if source() generates many, many values, they must all be stored. This can hog memory.
Possible solution
Something like...
from typing import Callable, Iterable, Tuple, TypeVar
TI = TypeVar('TI')
TO = TypeVar('TO')
def magic_function(data: Iterable[TI], fxns: Iterable[Callable[[Iterable[TI]], TO]]) -> Tuple[TO, ...]:
stored = tuple(data) # memory hog, prohibitively
return tuple(f(stored) for f in fxns)
source = lambda: range(1 << 24) # for example
functions = (min, max, sum) # for example
results = magic_function(source(), functions)
This is what I've been trying to do. This magic_function() would give the data iterator to some kind of internal asynchronous server. The fxns would then be given asynchronous clients -- which would appear to be normal iterators. The fxns can process these clients as iterators unmodified. The fxns cannot be modified. It is possible to do this with the threading module. The overhead would be horrendous, though.
Extra clarity
This should be true.
source = lambda: range(1 << 24) # for example
functions = (min, max, sum) # for example
if first_method:
data = tuple(source()) # from some generator
results = tuple(f(data) for f in functions)
else:
results = magic_function(source(), functions)
Whether first_method is True or False, for source()'s same output and the same functions, the results should always match (for single-pass iterator-consuming functions). The first computes and stores the entire dataset. This can be absently wasteful and slow. The magic method ought to save memory with minimal overhead costs (both time and memory).
Threading implementation
This is a working implementation using the threading module. It's visibly slow...
#!/usr/bin/python3
from collections import namedtuple
from random import randint
from statistics import geometric_mean, harmonic_mean, mean, median, median_high, median_low, mode
from threading import Event, Lock, Thread
from typing import *
''' https://pastebin.com/u4mTHfgc '''
int_iterable = Iterable[int]
_T = TypeVar('_T1', int, float)
_FXN_T = Callable[[int_iterable], _T]
class Server:
_it: int_iterable
slots: int
edit_slots: Lock
element: _T
available: Event
zero_slots: Event
end: bool
def __init__(self, it: int_iterable):
self._it = it
self.slots = 0
self.edit_slots = Lock()
self.available = Event()
self.zero_slots = Event()
self.end = False
def server(self, queue_length: int):
available = self.available
zero_slots = self.zero_slots
for v in self._it:
self.slots = queue_length
self.element = v
zero_slots.clear()
available.set()
zero_slots.wait()
self.slots = queue_length
self.end = True
zero_slots.clear()
available.set()
zero_slots.wait()
def client(self) -> int_iterable:
available = self.available
zero_slots = self.zero_slots
edit_slots = self.edit_slots
while True:
available.wait()
end = self.end
if not end:
yield self.element
with edit_slots:
self.slots -= 1
if self.slots == 0:
available.clear()
zero_slots.set()
zero_slots.wait()
if end:
break
class Slot:
thread: Thread
fxn: _FXN_T
server: Server
qid: int
result: Union[Optional[_T], Exception, Tuple[Exception, Exception]]
def __init__(self, fxn: _FXN_T, server: Server, qid: int):
self.thread = Thread(target = self.run, name = f'BG {id(self)} thread {qid}')
self.fxn = fxn
self.server = server
self.qid = qid
self.result = None
def run(self):
client = self.server.client()
try:
self.result = self.fxn(client)
except Exception as e:
self.result = e
try:
for _ in client: # one thread breaking won't break it all.
pass
except Exception as f:
self.result = e, f
class BranchedGenerator:
_server: Server
_queue: List[Slot]
def __init__(self, it: int_iterable):
self._server = Server(it)
self._queue = []
def new(self, fxn: _FXN_T) -> int:
qid = len(self._queue)
self._queue.append(Slot(fxn, self._server, qid))
return qid
def finalize(self):
queue = self._queue
for t in queue:
t.thread.start()
self._server.server(len(queue))
for t in queue:
t.thread.join()
def get(self, qid: int) -> _T:
return self._queue[qid].result
#classmethod
def make(cls, it: int_iterable, fxns: Iterable[_FXN_T]) -> Tuple[_T, ...]:
tmp = cls(it)
qid_range = max(map(tmp.new, fxns))
tmp.finalize()
return tuple((tmp.get(qid)) for qid in range(qid_range + 1))
seq_stats = namedtuple('seq_stats', ('tuple', 'mean', 'harmonic_mean', 'geometric_mean', 'median', 'median_high', 'median_low', 'mode'))
def bundle_bg(xs: int_iterable) -> seq_stats:
tmp = BranchedGenerator(xs)
# noinspection PyTypeChecker
ys = seq_stats(
tmp.new(tuple),
tmp.new(mean),
tmp.new(harmonic_mean),
tmp.new(geometric_mean),
tmp.new(median),
tmp.new(median_high),
tmp.new(median_low),
tmp.new(mode)
)
tmp.finalize()
return seq_stats(
tmp.get(ys.tuple),
tmp.get(ys.mean),
tmp.get(ys.harmonic_mean),
tmp.get(ys.geometric_mean),
tmp.get(ys.median),
tmp.get(ys.median_high),
tmp.get(ys.median_low),
tmp.get(ys.mode)
)
def bundle(xs: int_iterable) -> seq_stats:
return seq_stats(
tuple(xs),
mean(xs),
harmonic_mean(xs),
geometric_mean(xs),
median(xs),
median_high(xs),
median_low(xs),
mode(xs)
)
def display(v: seq_stats):
print(f'Statistics of {v.tuple}:\n'
f'\tMean: {v.mean}\n'
f'\tHarmonic Mean: {v.harmonic_mean}\n'
f'\tGeometric Mean: {v.geometric_mean}\n'
f'\tMedian: {v.median}\n'
f'\tMedian High: {v.median_high}\n'
f'\tMedian Low: {v.median_low}\n'
f'\tMode: {v.mode};')
def new(length: int, inclusive_maximum: int) -> int_iterable:
return (randint(1, inclusive_maximum) for _ in range(length))
def test1() -> int:
sample = new(10, 1 << 65)
struct1 = bundle_bg(sample)
display(struct1)
struct2 = bundle(struct1.tuple)
display(struct2)
matches = seq_stats(*(a == b for (a, b) in zip(struct1, struct2)))
display(matches)
return sum(((1 >> i) * (not e)) for (i, e) in enumerate(matches))
def test2():
sample = new(1000, 1 << 5)
struct1 = seq_stats(*BranchedGenerator.make(
sample,
(tuple, mean, harmonic_mean, geometric_mean, median, median_high, median_low, mode)
))
display(struct1)
struct2 = bundle(struct1.tuple)
display(struct2)
matches = seq_stats(*(a == b for (a, b) in zip(struct1, struct2)))
display(matches)
return sum(((1 >> i) * (not e)) for (i, e) in enumerate(matches))
def test3():
pass
if __name__ == '__main__':
exit((test2()))
The Branching Generator Module (V3) [using threading] - Pastebin.com link has the updated code. From Start to output, a half second elapses. That's just for eight functions! Both test1() and test2() have this speed issue.
Attempts
I have tried to implement magic_function() using the asyncio module.
#!/usr/bin/python3
from asyncio import Task, create_task, run, wait
from collections import deque, namedtuple
from random import randint
from statistics import geometric_mean, harmonic_mean, mean, median, median_high, median_low, mode
from typing import *
''' https://pastebin.com/ELzEaSK8 '''
int_iterable = Iterable[int]
_T = TypeVar('_T1', int, float)
ENGINE_T = AsyncGenerator[Tuple[_T, bool], int]
async def injector(engine: ENGINE_T, qid: int) -> AsyncIterator[int]:
while True:
try:
x, try_again = await engine.asend(qid)
except StopAsyncIteration:
break
if try_again:
continue
yield x
WRAPPER_FXN_T = Callable[[int_iterable], _T]
def wrapper(fxn: WRAPPER_FXN_T, engine: ENGINE_T, qid: int):
async def i():
# TypeError: 'async_generator' object is not iterable
return fxn(iter(x async for x in injector(engine, qid)))
return i
class BranchedGenerator:
_it: int_iterable
_engine: ENGINE_T
_queue: Union[tuple, deque]
def __init__(self, it: int_iterable):
self._it = it
self._engine = self._make_engine()
# noinspection PyTypeChecker
wait(self._engine)
self._queue = deque()
async def _make_engine(self) -> ENGINE_T: # it's like a server
lq = len(self._queue)
result = try_again = 0, True
for value in self._it:
waiting = set(range(lq))
while True:
qid = (yield result)
if len(waiting) == 0:
result = try_again
break
if qid in waiting:
waiting.remove(qid)
result = value, False
else:
result = try_again
def new(self, fxn: WRAPPER_FXN_T) -> int:
qid = len(self._queue)
self._queue.append(wrapper(fxn, self._engine, qid)())
return qid
def finalize(self):
self._queue = tuple(self._queue)
def get(self, qid: int) -> Task:
return create_task(self._queue[qid])
#classmethod
#(lambda f: (lambda it, fxns: run(f(it, fxns))))
def make(cls, it: int_iterable, fxns: Iterable[Callable[[int_iterable], _T]]) -> Tuple[_T, ...]:
tmp = cls(it)
qid_range = max(map(tmp.new, fxns))
tmp.finalize()
return tuple((await tmp.get(qid)) for qid in range(qid_range + 1))
seq_stats = namedtuple('seq_stats', ('tuple', 'mean', 'harmonic_mean', 'geometric_mean', 'median', 'median_high', 'median_low', 'mode'))
#(lambda f: (lambda xs: run(f(xs))))
async def bundle_bg(xs: int_iterable) -> seq_stats:
tmp = BranchedGenerator(xs)
# noinspection PyTypeChecker
ys = seq_stats(
tmp.new(tuple),
tmp.new(mean),
tmp.new(harmonic_mean),
tmp.new(geometric_mean),
tmp.new(median),
tmp.new(median_high),
tmp.new(median_low),
tmp.new(mode)
)
tmp.finalize()
return seq_stats(
await tmp.get(ys.tuple),
await tmp.get(ys.mean),
await tmp.get(ys.harmonic_mean),
await tmp.get(ys.geometric_mean),
await tmp.get(ys.median),
await tmp.get(ys.median_high),
await tmp.get(ys.median_low),
await tmp.get(ys.mode)
)
def bundle(xs: int_iterable) -> seq_stats:
return seq_stats(
tuple(xs),
mean(xs),
harmonic_mean(xs),
geometric_mean(xs),
median(xs),
median_high(xs),
median_low(xs),
mode(xs)
)
def display(v: seq_stats):
print(f'Statistics of {v.tuple}:\n'
f'\tMean: {v.mean}\n'
f'\tHarmonic Mean: {v.harmonic_mean}\n'
f'\tGeometric Mean: {v.geometric_mean}\n'
f'\tMedian: {v.median}\n'
f'\tMedian High: {v.median_high}\n'
f'\tMedian Low: {v.median_low}\n'
f'\tMode: {v.mode};')
def new(length: int, inclusive_maximum: int) -> int_iterable:
return (randint(1, inclusive_maximum) for _ in range(length))
def test1() -> int:
sample = new(10, 1 << 65)
struct1 = bundle_bg(sample)
display(struct1)
struct2 = bundle(struct1.tuple)
display(struct2)
matches = seq_stats(*(a == b for (a, b) in zip(struct1, struct2)))
display(matches)
return sum(((1 >> i) * (not e)) for (i, e) in enumerate(matches))
async def test2():
sample = new(1000, 1 << 5)
# noinspection PyTypeChecker
struct1 = seq_stats(*await BranchedGenerator.make(
sample,
(tuple, mean, harmonic_mean, geometric_mean, median, median_high, median_low, mode)
))
display(struct1)
struct2 = bundle(struct1.tuple)
display(struct2)
matches = seq_stats(*(a == b for (a, b) in zip(struct1, struct2)))
display(matches)
return sum(((1 >> i) * (not e)) for (i, e) in enumerate(matches))
async def test3():
pass
if __name__ == '__main__':
exit((test1()))
The Branching Generator Module (V2) - Pastebin.com link has the most up-to-date version. I will not be updating the embedded code! If changes are made, the pastebin copy will have them.
Tests
The test1() makes sure that bundle_bg() does what bundle() does. They should do the exact same thing.
The test2() sees if BranchedGenarator.make() behaves like bundle_bg() and (transitively) like bundle(). The BranchedGenarator.make() is supposed to be most like magic_function().
test3() has no purpose yet.
Status
The first test fails. The second test has a similar error calling BranchedGenerator.make().
[redacted]/b_gen.py:45: RuntimeWarning: coroutine 'wait' was never awaited
wait(self._engine)
RuntimeWarning: Enable tracemalloc to get the object allocation traceback
Traceback (most recent call last):
File "[redacted]/b_gen.py", line 173, in <module>
exit((test1()))
File "[redacted]/b_gen.py", line 144, in test1
struct1 = bundle_bg(sample)
File "[redacted]/b_gen.py", line 87, in <lambda>
#(lambda f: (lambda xs: run(f(xs))))
File "/usr/lib64/python3.9/asyncio/runners.py", line 44, in run
return loop.run_until_complete(main)
File "/usr/lib64/python3.9/asyncio/base_events.py", line 642, in run_until_complete
return future.result()
File "[redacted]/b_gen.py", line 103, in bundle_bg
await tmp.get(ys.tuple),
File "[redacted]/b_gen.py", line 31, in i
return fxn(iter(x async for x in injector(engine, qid)))
TypeError: 'async_generator' object is not iterable
sys:1: RuntimeWarning: coroutine 'wrapper.<locals>.i' was never awaited
In all honesty, I'm new to asyncio. I don't know how to fix this.
The Question
Can somebody help me fix this?! Please? This one with asyncio should do exactly what the one with threading does -- just without the overhead.
Another pathway
Before this, I attempted a simpler implementation.
#!/usr/bin/python3
from random import randrange
from statistics import mean as st_mean, median as st_median, mode as st_mode
from typing import Any, Callable, Iterable, Tuple, TypeVar
''' https://pastebin.com/xhfT1njJ '''
class BranchedGenerator:
_n: Iterable[int]
_stop_value: Any
def __init__(self, n: Iterable[int], stop: Any):
self._n = n
self._stop_value = stop
#property
def new(self):
return
def wrapper1(f):
new = (yield)
# SyntaxError: 'yield' inside generator expression
yield f((y for _ in new if (y := (yield)) or True))
return
_T1 = TypeVar('_T1')
_T2 = TypeVar('_T2')
def wrapper2(ns: Iterable[_T1], fs: Iterable[Callable[[Iterable[_T1]], _T2]]) -> Tuple[_T2, ...]:
def has_new():
while new:
yield True
while True:
yield False
new = True
xwf = tuple(map(wrapper1, fs))
for x in xwf:
next(x)
x.send(has_new)
next(x)
for n in ns:
for x in xwf:
x.send(n)
new = False
return tuple(map(next, xwf))
def source(n: int) -> Iterable[int]:
return (randrange(-9, 9000) for _ in range(n))
normal = (tuple, st_mean, st_median, st_mode)
def test0():
sample = tuple(source(25))
s_tuple, s_mean, s_median, s_mode = wrapper2(sample, normal)
b_tuple, b_mean, b_median, b_mode = (f(s_tuple) for f in normal)
assert all((
s_tuple == b_tuple,
s_mean == b_mean,
s_median == b_median,
s_mode == b_mode
))
def test1():
sample = source(25)
s_tuple, s_mean, s_median, s_mode = wrapper2(sample, normal)
b_tuple, b_mean, b_median, b_mode = (f(s_tuple) for f in normal)
print(
'Test1:'
'\nTuple', s_tuple, '\n', b_tuple, '\n==?', v0 := s_tuple == b_tuple,
'\nMean', s_mean, '\n', b_mean, '\n==?', v1 := s_mean == b_mean,
'\nMedian', s_median, '\n', b_median, '\n==?', v2 := s_median == b_median,
'\nMode', s_mode, '\n', b_mode, '\n==?', v3 := s_mode == b_mode,
'\nPasses', ''.join('01'[v * 1] for v in (v0, v1, v2, v3)), 'All?', all((v0, v1, v2, v3))
)
if __name__ == '__main__':
test0()
test1()
The Branching Generator Module (V1) - Pastebin.com link has the update policy.
Tests
Test 0 tells whether wrapper2() does what is supposed to do. That is to call all functions and return the results. No memory is saved, like first_method == True.
Test 1 is simply like first_method == False. The sample is not a tuple.
Problem
Ouch! I can code, I assure you.
File "[redacted]/branched_generator.py", line 25
yield f((y for _ in new if (y := (yield)) or True))
^
SyntaxError: 'yield' inside generator expression
I freely admit it: this version is dafter. The wrapper2() is obviously most like magic_function().
Question
As this is the simpler implementation, can this wrapper2() be salvaged? If not, don't sweat it.
If it's just the materialization of the data you are worried about, you could do
from itertools import tee
from statistics import geometric_mean, harmonic_mean, mean, median, median_high, median_low, mode
from random import randint
def magic_function(data, fxns):
return tuple(f(d) for f, d in zip(fxns, tee(data, len(fxns))))
def new(length: int, inclusive_maximum: int) -> Iterable[int]:
return (randint(1, inclusive_maximum) for _ in range(length))
sample = new(1000, 1 << 5)
functions = (tuple, mean, harmonic_mean, geometric_mean, median, median_high, median_low, mode)
magic_function(sample, functions)
NB tee is not thread-safe though
PS: You are right, this consumes the generator and makes n copies of all data in it.
I don't think we can salvage the async and await version in your question. The arbitrary functions in fxns will have to consume the iterators asynchronously; they have to release control flow after (roughly) each item they pop off and process. But async and await are cooperative, we can't force any given function f to await in its loop (that's why we get the TypeError). But your solution using threading does work, because at some points in their loops, threads are put to sleep pre-emptively by the VM, and in that way give opportunity for the other functions to run.
Keep in mind, there's a difference between simultaneous and concurrent. When I said a sequential roundrobin of the functions would be enough, I meant it in this way, let one of them consume an item, and then let the next one consume one. There is no need for the functions to run simultaneous. In fact, your working threading example, doesn't run anything simultaneously (on the CPython VM. IronPython and Jython may run multiple threading.Threads simultaneously, but on CPython there's only 1 running at a time)

Product feature optimization with constraints

I have trained a Lightgbm model on learning to rank dataset. The model predicts relevance score of a sample. So higher the prediction the better it is. Now that the model has learned I would like to find the best values of some features that gives me the highest prediction score.
So, lets say I have features u,v,w,x,y,z and the features I would like to optimize over are x,y,z.
maximize f(u,v,w,x,y,z) w.r.t features x,y,z where f is a lightgbm model
subject to constraints :
y = Ax + b
z = 4 if y < thresh_a else 4-0.5 if y >= thresh_b else 4-0.3
thresh_m < x <= thresh_n
The numbers are randomly made up but constraints are linear.
Objective function with respect to x looks like the following :
So the function is very spiky, non-smooth. I also don't have the gradient information as f is a lightgbm model.
Using Nathan's answer I wrote down the following class :
class ProductOptimization:
def __init__(self, estimator, features_to_change, row_fixed_values,
bnds=None):
self.estimator = estimator
self.features_to_change = features_to_change
self.row_fixed_values = row_fixed_values
self.bounds = bnds
def get_sample(self, x):
new_values = {k:v for k,v in zip(self.features_to_change, x)}
return self.row_fixed_values.replace({k:{self.row_fixed_values[k].iloc[0]:v}
for k,v in new_values.items()})
def _call_model(self, x):
pred = self.estimator.predict(self.get_sample(x))
return pred[0]
def constraint1(self, vector):
x = vector[0]
y = vector[2]
return # some float value
def constraint2(self, vector):
x = vector[0]
y = vector[3]
return #some float value
def optimize_slsqp(self, initial_values):
con1 = {'type': 'eq', 'fun': self.constraint1}
con2 = {'type': 'eq', 'fun': self.constraint2}
cons = ([con1,con2])
result = minimize(fun=self._call_model,
x0=np.array(initial_values),
method='SLSQP',
bounds=self.bounds,
constraints=cons)
return result
The results that I get are always around the initial guess. And I think its because of non-smoothness of the function and absence of any gradient information which is important for the SLSQP optimizer. Any advices how should I deal with this kind of problem ?
It's been a good minute since I last wrote some serious code, so I appologize if it's not entirely clear what everything does, please feel free to ask for more explanations
The imports:
from sklearn.ensemble import GradientBoostingRegressor
import numpy as np
from scipy.optimize import minimize
from copy import copy
First I define a new class that allows me to easily redefine values. This class has 5 inputs:
value: this is the 'base' value. In your equation y=Ax + b it's the b part
minimum: this is the minimum value this type will evaluate as
maximum: this is the maximum value this type will evaluate as
multipliers: the first tricky one. It's a list of other InputType objects. The first is the input type and the second the multiplier. In your example y=Ax +b you would have [[x, A]], if the equation was y=Ax + Bz + Cd it would be [[x, A], [z, B], [d, C]]
relations: the most tricky one. It's also a list of other InputType objects, it has four items: the first is the input type, the second defines if it's an upper boundary you use min, if it's a lower boundary you use max. The third item in the list is the value of the boundary, and the fourth the output value connected to it
Watch out if you define your input values too strangely I'm sure there's weird behaviour.
class InputType:
def __init__(self, value=0, minimum=-1e99, maximum=1e99, multipliers=[], relations=[]):
"""
:param float value: base value
:param float minimum: value can never be lower than x
:param float maximum: value can never be higher than y
:param multipliers: [[InputType, multiplier], [InputType, multiplier]]
:param relations: [[InputType, min, threshold, output_value], [InputType, max, threshold, output_value]]
"""
self.val = value
self.min = minimum
self.max = maximum
self.multipliers = multipliers
self.relations = relations
def reset_val(self, value):
self.val = value
def evaluate(self):
"""
- relations to other variables are done first if there are none then the rest is evaluated
- at most self.max
- at least self.min
- self.val + i_x * w_x
i_x is input i, w_x is multiplier (weight) of i
"""
for term, min_max, value, output_value in self.relations:
# check for each term if it falls outside of the expected terms
if min_max(term.evaluate(), value) != term.evaluate():
return self.return_value(output_value)
output_value = self.val + sum([i[0].evaluate() * i[1] for i in self.multipliers])
return self.return_value(output_value)
def return_value(self, output_value):
return min(self.max, max(self.min, output_value))
Using this, you can fix the input types sent from the optimizer, as shown in _call_model:
class Example:
def __init__(self, lst_args):
self.lst_args = lst_args
self.X = np.random.random((10000, len(lst_args)))
self.y = self.get_y()
self.clf = GradientBoostingRegressor()
self.fit()
def get_y(self):
# sum of squares, is minimum at x = [0, 0, 0, 0, 0 ... ]
return np.array([[self._func(i)] for i in self.X])
def _func(self, i):
return sum(i * i)
def fit(self):
self.clf.fit(self.X, self.y)
def optimize(self):
x0 = [0.5 for i in self.lst_args]
initial_simplex = self._get_simplex(x0, 0.1)
result = minimize(fun=self._call_model,
x0=np.array(x0),
method='Nelder-Mead',
options={'xatol': 0.1,
'initial_simplex': np.array(initial_simplex)})
return result
def _get_simplex(self, x0, step):
simplex = []
for i in range(len(x0)):
point = copy(x0)
point[i] -= step
simplex.append(point)
point2 = copy(x0)
point2[-1] += step
simplex.append(point2)
return simplex
def _call_model(self, x):
print(x, type(x))
for i, value in enumerate(x):
self.lst_args[i].reset_val(value)
input_x = np.array([i.evaluate() for i in self.lst_args])
prediction = self.clf.predict([input_x])
return prediction[0]
I can define your problem as shown below (be sure to define the inputs in the same order as the final list, otherwise not all the values will get updated correctly in the optimizer!):
A = 5
b = 2
thresh_a = 5
thresh_b = 10
thresh_c = 10.1
thresh_m = 4
thresh_n = 6
u = InputType()
v = InputType()
w = InputType()
x = InputType(minimum=thresh_m, maximum=thresh_n)
y = InputType(value = b, multipliers=([[x, A]]))
z = InputType(relations=[[y, max, thresh_a, 4], [y, min, thresh_b, 3.5], [y, max, thresh_c, 3.7]])
example = Example([u, v, w, x, y, z])
Calling the results:
result = example.optimize()
for i, value in enumerate(result.x):
example.lst_args[i].reset_val(value)
print(f"final values are at: {[i.evaluate() for i in example.lst_args]}: {result.fun)}")

How to define a callable derivative function

For a project I am working on, I am creating a class of polynomials that I can operate on. The polynomial class can do addition, subtraction, multiplication, synthetic division, and more. It also represents it properly.
For the project, we are required to do create a class for Newton's Method. I was able to create a callable function class for f, such that
>f=polynomial(2,3,4)
>f
2+3x+4x^2
>f(3)
47
I have a derivative function polynomial.derivative(f) outputs 3+8x.
I want to define a function labeled Df so that in my Newtons Method code, I can say, Df(x). It would work so that if x=2:
>Df(2)
19
The derivative of a polynomial is still a polynomial. Thus, instead of returning the string 3+8x, your polynomial.derivative function should return a new polynomial.
class polynomial:
def __init__(c, b, a):
self.coefs = [c, b, a]
[...]
def derivative(self):
return polynomial(*[i*c for i,c in enumerate(self.coefs) if i > 0], 0)
Hence you can use it as follow:
> f = polynomial(2, 3, 4)
> Df = f.derivative()
> f
2+3x+4x^2
> Df
3+8x+0x^2
> f(3)
47
> Df(2)
19
Edit
Of course, it is enumerate and not enumerates. As well, the __init__ misses the self argument. I code this directly on SO without any syntax check.
Of course you can write this in a .py file. Here is a complete working example:
class Polynomial:
def __init__(self, c, b, a):
self.coefs = [c, b, a]
self._derivative = None
#property
def derivative(self):
if self._derivative is None:
self._derivative = Polynomial(*[i*c for i,c in enumerate(self.coefs) if i > 0], 0)
return self._derivative
def __str__(self):
return "+".join([
str(c) + ("x" if i > 0 else "") + (f"^{i}" if i > 1 else "")
for i, c in enumerate(self.coefs)
if c != 0
])
def __call__(self, x):
return sum([c * (x**i) for i, c in enumerate(self.coefs)])
if __name__ == '__main__':
f = Polynomial(2, 3, 4)
print(f"f: y={f}")
print(f"f(3) = {f(3)}")
print(f"f': y={f.derivative}")
print(f"f'(2) = {f.derivative(2)}")
f: y=2+3x+4x^2
f(3) = 47
f': y=3+8x
f'(2) = 19
You can rename the property with the name you prefer: derivative, Df, prime, etc.

optimization doesn't respect constraint

I have an optimization problem and I'm solving it with scipy and the minimization module. I uses SLSQP as method, because it is the only one, which fits to my problem. The function to optimize is a cost function with 'x' as a list of percentages. I have some constraints which has to be respected:
At first, the sum of the percentages should be 1 (PercentSum(x)) This constrain is added as 'eg' (equal) as you can see in the code.
The second constraint is about a physical value which must be less then 'proberty1Max '. This constrain is added as 'ineq' (inequal). So if 'proberty1 < proberty1Max ' the function should be bigger than 0. Otherwise the function should be 0. The functions is differentiable.
Below you can see a model of my try. The problem is the 'constrain' function. I get solutions, where the sum of 'prop' is bigger than 'probertyMax'.
import numpy as np
from scipy.optimize import minimize
class objects:
def __init__(self, percentOfInput, min, max, cost, proberty1, proberty2):
self.percentOfInput = percentOfInput
self.min = min
self.max = max
self.cost = cost
self.proberty1 = proberty1
self.proberty2 = proberty2
class data:
def __init__(self):
self.objectList = list()
self.objectList.append(objects(10, 0, 20, 200, 2, 7))
self.objectList.append(objects(20, 5, 30, 230, 4, 2))
self.objectList.append(objects(30, 10, 40, 270, 5, 9))
self.objectList.append(objects(15, 0, 30, 120, 2, 2))
self.objectList.append(objects(25, 10, 40, 160, 3, 5))
self.proberty1Max = 1
self.proberty2Max = 6
D = data()
def optiFunction(x):
for index, obj in enumerate(D.objectList):
obj.percentOfInput = x[1]
costSum = 0
for obj in D.objectList:
costSum += obj.cost * obj.percentOfInput
return costSum
def PercentSum(x):
y = np.sum(x) -100
return y
def constraint(x, val):
for index, obj in enumerate(D.objectList):
obj.percentOfInput = x[1]
prop = 0
if val == 1:
for obj in D.objectList:
prop += obj.proberty1 * obj.percentOfInput
return D.proberty1Max -prop
else:
for obj in D.objectList:
prop += obj.proberty2 * obj.percentOfInput
return D.proberty2Max -prop
def checkConstrainOK(cons, x):
for con in cons:
y = con['fun'](x)
if con['type'] == 'eq' and y != 0:
print("eq constrain not respected y= ", y)
return False
elif con['type'] == 'ineq' and y <0:
print("ineq constrain not respected y= ", y)
return False
return True
initialGuess = []
b = []
for obj in D.objectList:
initialGuess.append(obj.percentOfInput)
b.append((obj.min, obj.max))
bnds = tuple(b)
cons = list()
cons.append({'type': 'eq', 'fun': PercentSum})
cons.append({'type': 'ineq', 'fun': lambda x, val=1 :constraint(x, val) })
cons.append({'type': 'ineq', 'fun': lambda x, val=2 :constraint(x, val) })
solution = minimize(optiFunction,initialGuess,method='SLSQP',\
bounds=bnds,constraints=cons,options={'eps':0.001,'disp':True})
print('status ' + str(solution.status))
print('message ' + str(solution.message))
checkConstrainOK(cons, solution.x)
There is no way to find a solution, but the output is this:
Positive directional derivative for linesearch (Exit mode 8)
Current function value: 4900.000012746761
Iterations: 7
Function evaluations: 21
Gradient evaluations: 3
status 8
message Positive directional derivative for linesearch
Where is my fault? In this case it ends with mode 8, because the example is very small. With bigger data the algorithm ends with mode 0. But I think it should ends with a hint that an constraint couldn't be hold.
It doesn't make a difference, if proberty1Max is set to 4 or to 1. But in the case it is 1, there could not be a valid solution.
PS: I changed a lot in this question... Now the code is executable.
EDIT:
1.Okay, I learned, an inequal constrain is accepted if the output is positiv (>0). In the past I think <0 would also be accepted. Because of this the constrain function is now a little bit shorter.
What about the constrain. In my real solution I add some constrains using a loop. In this case it is nice to feed a function with an index of the loop and in the function this index is used to choose an element of an array. In my example here, the "val" decides if the constrain is for proberty1 oder property2. What the constrain mean is, how much of a property is in the hole mix. So I'm calculating the property multiplied with the percentOfInput. "prop" is the sum of this over all objects.
I think there might be a connection to the issue tux007 mentioned in the comments. link to the issue
I think the optimizer doesn't work correct, if the initial guess is not a valid solution.
Linear programming is not good for overdetermined equations. My problem doesn't have a unique solution, its an approximation.
As mentioned in the comment I think this is the problem:
Misleading output from....
If you have a look at the latest changes, the constrain is not satisfied, but the algorithm says: "Positive directional derivative for linesearch"

Python: recall cached function result dependent on new function parameter

I am fairly new to the concepts of caching & memoization. I've read some other discussions & resources here, here, and here, but haven't been able to follow them all that well.
Say that I have two member functions within a class. (Simplified example below.) Pretend that the first function total is computationally expensive. The second function subtotal is computationally simple, except that it uses the return from the first function, and so also becomes computationally expensive because of this, in that it currently needs to re-call total to get its returned result.
I want to cache the results of the first function and use this as the input to the second, if the input y to subtotal shares the input x to a recent call of total. That is:
If calling subtotal() where y is equal to the value of x in a
previous call of total, then use that cached result instead of
re-calling total.
Otherwise, just call total() using x = y.
Example:
class MyObject(object):
def __init__(self, a, b):
self.a, self.b = a, b
def total(self, x):
return (self.a + self.b) * x # some time-expensive calculation
def subtotal(self, y, z):
return self.total(x=y) + z # Don't want to have to re-run total() here
# IF y == x from a recent call of total(),
# otherwise, call total().
With Python3.2 or newer, you could use functools.lru_cache.
If you were to decorate the total with functools.lru_cache directly, then the lru_cache would cache the return values of total based on the value of both arguments, self and x. Since lru_cache's internal dict stores a reference to self, applying #lru_cache directly to a class method creates a circular reference to self which makes instances of the class un-dereferencable (hence a memory leak).
Here is a workaround which allows you to use lru_cache with class methods -- it caches results based on all arguments other than the first one, self, and uses a weakref to avoid the circular reference problem:
import functools
import weakref
def memoized_method(*lru_args, **lru_kwargs):
"""
https://stackoverflow.com/a/33672499/190597 (orly)
"""
def decorator(func):
#functools.wraps(func)
def wrapped_func(self, *args, **kwargs):
# We're storing the wrapped method inside the instance. If we had
# a strong reference to self the instance would never die.
self_weak = weakref.ref(self)
#functools.wraps(func)
#functools.lru_cache(*lru_args, **lru_kwargs)
def cached_method(*args, **kwargs):
return func(self_weak(), *args, **kwargs)
setattr(self, func.__name__, cached_method)
return cached_method(*args, **kwargs)
return wrapped_func
return decorator
class MyObject(object):
def __init__(self, a, b):
self.a, self.b = a, b
#memoized_method()
def total(self, x):
print('Calling total (x={})'.format(x))
return (self.a + self.b) * x
def subtotal(self, y, z):
return self.total(x=y) + z
mobj = MyObject(1,2)
mobj.subtotal(10, 20)
mobj.subtotal(10, 30)
prints
Calling total (x=10)
only once.
Alternatively, this is how you could roll your own cache using a dict:
class MyObject(object):
def __init__(self, a, b):
self.a, self.b = a, b
self._total = dict()
def total(self, x):
print('Calling total (x={})'.format(x))
self._total[x] = t = (self.a + self.b) * x
return t
def subtotal(self, y, z):
t = self._total[y] if y in self._total else self.total(y)
return t + z
mobj = MyObject(1,2)
mobj.subtotal(10, 20)
mobj.subtotal(10, 30)
One advantage of lru_cache over this dict-based cache is that the lru_cache
is thread-safe. The lru_cache also has a maxsize parameter which can help
protect against memory usage growing without bound (for example, due to a
long-running process calling total many times with different values of x).
Thank you all for the responses, it was helpful just to read them and see what's going on under the hood. As #Tadhg McDonald-Jensen said, it seems like I didn't need anything more here than #functools.lru_cache. (I'm in Python 3.5.) Regarding #unutbu's comment, I'm not getting an error from decorating total() with #lru_cache. Let me correct my own example, I'll keep this up here for other beginners:
from functools import lru_cache
from datetime import datetime as dt
class MyObject(object):
def __init__(self, a, b):
self.a, self.b = a, b
#lru_cache(maxsize=None)
def total(self, x):
lst = []
for i in range(int(1e7)):
val = self.a + self.b + x # time-expensive loop
lst.append(val)
return np.array(lst)
def subtotal(self, y, z):
return self.total(x=y) + z # if y==x from a previous call of
# total(), used cached result.
myobj = MyObject(1, 2)
# Call total() with x=20
a = dt.now()
myobj.total(x=20)
b = dt.now()
c = (b - a).total_seconds()
# Call subtotal() with y=21
a2 = dt.now()
myobj.subtotal(y=21, z=1)
b2 = dt.now()
c2 = (b2 - a2).total_seconds()
# Call subtotal() with y=20 - should take substantially less time
# with x=20 used in previous call of total().
a3 = dt.now()
myobj.subtotal(y=20, z=1)
b3 = dt.now()
c3 = (b3 - a3).total_seconds()
print('c: {}, c2: {}, c3: {}'.format(c, c2, c3))
c: 2.469753, c2: 2.355764, c3: 0.016998
In this case I would do something simple, maybe is not the most elegant way, but works for the problem:
class MyObject(object):
param_values = {}
def __init__(self, a, b):
self.a, self.b = a, b
def total(self, x):
if x not in MyObject.param_values:
MyObject.param_values[x] = (self.a + self.b) * x
print(str(x) + " was never called before")
return MyObject.param_values[x]
def subtotal(self, y, z):
if y in MyObject.param_values:
return MyObject.param_values[y] + z
else:
return self.total(y) + z

Resources