Hyperopt list of values per hyperparameter - python-3.x

I'm trying to use Hyperopt on a regression model such that one of its hyperparameters is defined per variable and needs to be passed as a list. For example, if I have a regression with 3 independent variables (excluding constant), I would pass hyperparameter = [x, y, z] (where x, y, z are floats).
The values of this hyperparameter have the same bounds regardless of which variable they are applied to. If this hyperparameter were applied uniformly to all variables, I could simply use hp.uniform('hyperparameter', a, b). What I want the search space to be instead is a Cartesian product of hp.uniform('hyperparameter', a, b) of length n, where n is the number of variables in the regression (basically, itertools.product(hp.uniform('hyperparameter', a, b), repeat=n)).
I'd like to know whether this is possible within Hyperopt. If not, any suggestions for an optimizer where this is possible are welcome.

As noted in my comment, I am not 100% sure what you are looking for, but here is an example of using hyperopt to optimize a combination of 3 variables:
import random

# define an objective function
def objective(args):
    v1 = args['v1']
    v2 = args['v2']
    v3 = args['v3']
    result = random.uniform(v2, v3) / v1
    return result

# define a search space
from hyperopt import hp
space = {
    'v1': hp.uniform('v1', 0.5, 1.5),
    'v2': hp.uniform('v2', 0.5, 1.5),
    'v3': hp.uniform('v3', 0.5, 1.5),
}

# minimize the objective over the space
from hyperopt import fmin, tpe, space_eval
best = fmin(objective, space, algo=tpe.suggest, max_evals=100)
print(best)
All three variables share the same search space here (as I understand it, that matches your problem definition). Hyperopt minimizes the objective function, so running this will end up with v2 and v3 near their minimum values and v1 near its maximum value, since that is what generally minimizes the result of the objective.
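The space_eval import above goes otherwise unused; as a small follow-up, it maps the raw best dict returned by fmin (which holds index positions for hp.choice parameters) back to actual parameter values:

print(space_eval(space, best))  # same values as best here, since the space only uses hp.uniform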

You could use this function to create the space:
def get_spaces(a, b, num_spaces=9):
    return_set = {}
    for set_num in range(num_spaces):
        name = str(set_num)
        return_set = {
            **return_set,
            **{name: hp.uniform(name, a, b)}
        }
    return return_set
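A minimal usage sketch (my own example, assuming three regression coefficients and a dummy loss): the dict returned by get_spaces can be passed straight to fmin, and the objective can rebuild the list of per-variable values from the numeric key names:

from hyperopt import fmin, tpe

def objective(params):
    # rebuild the per-variable list in a fixed order ('0', '1', '2', ...)
    coefs = [params[str(i)] for i in range(len(params))]
    return sum(c ** 2 for c in coefs)  # placeholder loss; use the real regression error here

space = get_spaces(a=0.0, b=1.0, num_spaces=3)
best = fmin(objective, space, algo=tpe.suggest, max_evals=50)
print(best)  # {'0': ..., '1': ..., '2': ...}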

I would first define my pre-combinatorial space as a dict. The keys are names; the values are tuples holding the hp function and its arguments.
from hyperopt import hp
space = {'foo': (hp.choice, (False, True)), 'bar': (hp.quniform, 1, 10, 1)}
Next, produce the required combinatorial variants using loops or itertools. Each name is kept unique using a suffix or prefix.
types = (1, 2)
space = {f'{name}_{type_}': args for type_ in types for name, args in space.items()}
>>> space
{'foo_1': (<function hyperopt.pyll_utils.hp_choice(label, options)>,
(False, True)),
'bar_1': (<function hyperopt.pyll_utils.hp_quniform(label, *args, **kwargs)>,
1, 10, 1),
'foo_2': (<function hyperopt.pyll_utils.hp_choice(label, options)>,
(False, True)),
'bar_2': (<function hyperopt.pyll_utils.hp_quniform(label, *args, **kwargs)>,
1, 10, 1)}
Finally, initialize and create the actual hyperopt space:
space = {name: fn(name, *args) for name, (fn, *args) in space.items()}
values = tuple(space.values())
>>> space
{'foo_1': <hyperopt.pyll.base.Apply at 0x7f291f45d4b0>,
'bar_1': <hyperopt.pyll.base.Apply at 0x7f291f45d150>,
'foo_2': <hyperopt.pyll.base.Apply at 0x7f291f45d420>,
'bar_2': <hyperopt.pyll.base.Apply at 0x7f291f45d660>}
This was done with hyperopt 0.2.7. As a disclaimer, I strongly advise against using hyperopt, because in my experience it has significantly worse performance than other optimizers.
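As a hedged usage sketch (with a toy objective of my own), the finished space dict can be passed to fmin like any other hyperopt space; the objective receives a dict with the same keys:

from hyperopt import fmin, tpe

def objective(params):
    # params has keys 'foo_1', 'bar_1', 'foo_2', 'bar_2'; toy loss for illustration only
    return sum(v for k, v in params.items() if k.startswith('bar'))

best = fmin(objective, space, algo=tpe.suggest, max_evals=20)
print(best)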

I implemented this solution with Optuna. The advantage of Optuna is that it creates a search space for all individual values, but optimizes these values in a more intelligent way, using just one hyperparameter-optimization run. For example, I optimized a neural network over batch size, learning rate and dropout rate:
The search space is much larger than the set of values actually evaluated. This saves a lot of time compared to a grid search.
The pseudo-code of the implementation is:
def objective(trial):  # trial is the Optuna object that suggests the next hyperparameter values
    a = trial.suggest_float("a", 0, 1)  # uniform distribution over [0, 1]
    b = trial.suggest_float("b", 0, 1)
    return (a * b) - b
# This is the function that Optuna tries to optimize/minimize
For more detailed source code, see the Optuna documentation. It saved me a lot of time and gave really good results.
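Applied to the original question (one coefficient per regression variable, all sharing the same bounds a and b), a minimal Optuna sketch might look like the following; the loss function and the bounds are placeholders of my own:

import optuna

n_vars = 3          # number of independent variables (assumption)
a, b = 0.0, 1.0     # shared bounds for every coefficient (placeholders)

def objective(trial):
    # one suggestion per variable, all drawn from the same range
    coefs = [trial.suggest_float(f"coef_{i}", a, b) for i in range(n_vars)]
    # placeholder loss; replace with the real regression error given `coefs`
    return sum(c ** 2 for c in coefs)

study = optuna.create_study(direction="minimize")
study.optimize(objective, n_trials=100)
print(study.best_params)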

Related

How to get a 2D output from linear layer in pytorch?

I would like to project a tensor into a space with an additional dimension.
I tried
torch.nn.Linear(
    in_features=num_inputs,
    out_features=(num_inputs, num_additional),
)
But this results in an error
A workaround would be to
torch.nn.Linear(
    in_features=num_inputs,
    out_features=num_inputs*num_additional,
)
and then change the view of the output
output.view(batch_size, num_inputs, num_additional)
But I imagine this workaround will get tricky to read, especially when a projection into more than one additional dimension is desired.
Is there a more direct way to code this operation?
Perhaps the source code for Linear could be changed
https://pytorch.org/docs/stable/_modules/torch/nn/modules/linear.html#Linear
to accept more dimensions for the weight and bias initialization, and F.linear seems like it would need to be replaced with a different function.
IMO the workaround you provided is already clear enough. However, if you want to express this as a single operation, you can always write your own module by subclassing torch.nn.Linear:
import numpy as np
import torch

class MultiDimLinear(torch.nn.Linear):
    def __init__(self, in_features, out_shape, **kwargs):
        self.out_shape = out_shape
        out_features = np.prod(out_shape)
        super().__init__(in_features, out_features, **kwargs)

    def forward(self, x):
        out = super().forward(x)
        return out.reshape((len(x), *self.out_shape))

if __name__ == '__main__':
    tmp = torch.empty((32, 10))
    linear = MultiDimLinear(in_features=10, out_shape=(10, 10))
    out = linear(tmp)
    print(out.shape)  # (32, 10, 10)
Another way would be to use torch.einsum
https://pytorch.org/docs/stable/generated/torch.einsum.html
torch.einsum can prevent summation across dimensions in tensor to tensor multiplication operations. This can allow separate multiplication operations to happen in parallel. [ I do not know if this would necessarily result in GPU efficiency; if the operations are still occurring in the same kernel. In fact, it may be slower https://github.com/pytorch/pytorch/issues/32591 ]
How this would work is to directly initialize the weight and bias tensors (look at source code for the torch linear layer for that code)
Say that the input (X) has dimensions (a, b), where a is the batch size.
Say that you want to pass this input through a series of classifiers, represented in a single weight tensor (W) with dimensions (c, d, e), where c is the number of classifiers, and e is the number of classes for the classifier
import torch
x = torch.arange(2*4).view(2, 4)
w = torch.arange(5*4*6).view(5, 4, 6)
torch.einsum('ab, cbe -> ace', x, w)
In the last line, a and b are the dimensions of the input, as mentioned above. The slightly tricky part is that c, b, and e are the dimensions of the classifiers' weight tensor; I didn't use d, I used b instead, because the vector multiplication happens along that dimension for both the input tensor and the weight tensor. That's why the left side of the einsum equation is ab, cbe. The right side of the einsum equation is simply the dimensions to keep, i.e. the ones excluded from summation.
The final dimensions we want are (a, c, e): a is the batch size, c is the number of classifiers, and e is the number of classes for each classifier. We do not want to sum over those dimensions, so to preserve their separation, the right side of the equation is ace.
For those unfamiliar with einsum, this will be harder to read than the workaround I created (though I highly recommend learning it, because it gets very easy and intuitive very fast even though it's a bit tricky at first https://www.youtube.com/watch?v=pkVwUVEHmfI ).
However, for parallelizing certain operations (especially on GPU), it seems that einsum is the only way to do it. For example, say that in my previous example I didn't want to use a classification head yet; I just wanted to project to multiple dimensions.
import torch
x = torch.arange(2*4).view(2, 4)
w = torch.arange(5*4*4).view(5, 4, 4)
y = torch.einsum('ab, cbe -> ace', x, w)
And say I do a few other operations to y, perhaps some non linear operations, activations, etc.
z = f(y)
z will still have the dimensions (2, 5, 4): batch size two, 5 hidden states per batch, and each of those hidden states has dimension 4.
And then I want to apply a classifier to each separate tensor.
w2 = torch.arange(4*2).view(4, 2)
final = torch.einsum('fgh, hj -> fgj', z, w2)
Quick refresher: 2 is the batch size, 5 is the number of classifiers, and 2 is the number of outputs for each classifier.
The output dimensions, f, g, j (2, 5, 2) will not be summed across, and thus will be preserved in the output.
As noted in the GitHub link above, this may be slower than just using regular linear layers; the gains, if any, come from a very large number of parallel operations.
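Wrapping the einsum approach in a small module is one way to make it reusable. The following is a sketch under the assumptions of the example above (c parallel classifiers, each mapping b input features to e outputs); the class name and initialization scheme are my own:

import torch

class ParallelLinear(torch.nn.Module):
    """Applies num_heads independent linear maps (in_features -> out_features) to the same batch."""
    def __init__(self, num_heads, in_features, out_features):
        super().__init__()
        self.weight = torch.nn.Parameter(torch.randn(num_heads, in_features, out_features) * 0.02)
        self.bias = torch.nn.Parameter(torch.zeros(num_heads, out_features))

    def forward(self, x):  # x: (batch, in_features)
        # (a, b) x (c, b, e) -> (a, c, e): one matmul per head, no summation over c
        return torch.einsum('ab,cbe->ace', x, self.weight) + self.bias

x = torch.randn(2, 4)
heads = ParallelLinear(num_heads=5, in_features=4, out_features=6)
print(heads(x).shape)  # torch.Size([2, 5, 6])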

Python multiprocessing same function with different arguments

I have a program that is designed to calculate an order parameter from a coarse-grained molecular system. In the system I have different beads, which represent different parts of a molecule. Each of these beads has xyz-coordinates that represent its place in the system. The program works, but it is very slow, since I have to calculate the number of beads of type i around beads of type j within a certain cutoff distance.
Function to calculate Euclidean distance between bead a and b:
import numpy as np

def distance_ab(a, b):
    n_beads = 0
    for i in range(len(a)):
        for j in range(len(b)):
            # Euclidean distance
            dist = np.sqrt(np.sum((a[i] - b[j]) ** 2, axis=0))
            if dist <= 1.0 and dist > 0.0:  # cut-off distance
                n_beads += 1
    return n_beads
So I decided to speed up the distance calculation between different beads by using the Python multiprocessing library. But for some reason I cannot get multiprocessing to work for repeating the same distance-calculation function with different parameters (the xyz-data of the beads). Multiprocessing returns a list of numbers, when the idea is to return only one number (the number of beads within the cut-off distance). What am I doing wrong, and could someone help me understand where the problem is?
The part where I am trying to use multiprocessing:
with multiprocessing.Pool(os.cpu_count()) as pool:
    # go through a certain number of molecular simulation frames (e.g. 100 frames)
    for i in range(frames):
        # Calculate Euclidean distances between different types of beads
        # for each frame
        a_b = pool.starmap(calculate_distances, zip(bead_a_array, bead_b_array))
        a_c = pool.starmap(calculate_distances, zip(bead_a_array, bead_c_array))
When you zip together your bead arrays, you are creating an iterable of tuples that overall has the same length as the shorter of the two arrays.
>>> A=[1,2,3]
>>> B=[4,5,6,7]
>>> res=zip(A,B)
>>> list(res)
[(1, 4), (2, 5), (3, 6)]
Looking at the documentation for starmap:
Like map() except that the elements of the iterable are expected to be iterables that are unpacked as arguments.
Hence an iterable of [(1,2), (3, 4)] results in [func(1,2), func(3,4)].
So your starmaps are actually just passing a pair of elements (one from each array) to your function and returning the result for each of these pairs. If you want to use multiprocessing to determine, say, the number of b and c beads within the cutoff distance of a at the same time, you would need to do something like this:
import itertools as it

all_arr = [bead_a_array, bead_b_array, bead_c_array]
with multiprocessing.Pool(os.cpu_count()) as pool:
    a_counts = pool.starmap(distance_ab, it.combinations(all_arr, 2))
Here, instead of passing individual elements of each array to the function, it now passes whole arrays into your function, and it will compute the counts of b and c within the threshold of a (and of c within the threshold of b) simultaneously. The it.combinations(all_arr, 2) call selects unique pairs of arrays to pass to your function.
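Since it.combinations over the three arrays yields the pairs (a, b), (a, c) and (b, c) in that order, the returned counts can be unpacked directly (a small follow-up sketch; the variable names are my own):

count_ab, count_ac, count_bc = a_counts  # one bead count per unique pair of arrays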

multithreaded iteration over numpy array indices

I have a piece of code which iterates over a three-dimensional array and writes into each cell a value based on the indices and the current value itself:
import numpy as np

nx = ny = nz = 100
array = np.zeros((nx, ny, nz))

def fun(val, k):
    # Do something with the indices
    return val + (k[0] * k[1] * k[2])

with np.nditer(array, flags=['multi_index'], op_flags=['readwrite']) as it:
    for x in it:
        x[...] = fun(x, it.multi_index)
Note, that fun might do something more sophisticated, which takes most of the total runtime, and that the input arrays might have different lengths per axis.
However, this code could run in multiple threads, as fun can be assumed to be threadsafe (Only the value and index of the current cell are required). But finding a method to iterate over all cells and have the current index available seems to be hard.
A possible solution might be https://stackoverflow.com/a/58012407/446140, where the array is split by the x-axis into chunks and passed to a Pool.
However, the solution is not universally applicable and I wonder if there is a more general solution for this problem (which could also work with nD arrays)?
The first issue is to split up the 3D array into equally sized chunks. np.array_split can be used, but the offset of each of the splits has to be stored to get the correct indices again.
An interesting question, with a few possible solutions. As you indicated, it is possible to use np.array_split, but since we are only interested in the indices, we can also use np.unravel_index, which would mean that we only have to loop over all the indices (the size) of the array to get the index.
Now there are two great ideas for multiprocessing:
Create a (thread safe) shared memory of the array and splitting the indices across the different processes.
Only update the array in a main thread, but provide a copy of the required data to the processes and let them return the value that has to be updated.
Both solutions will work for any np.ndarray, but have different advantages. Creating a shared memory doesn't create copies, but can incur a large insertion penalty if a process has to wait on other processes (when the computational time is small compared to the write time).
There are probably many more solutions, but I will work out the first solution, where a Shared Memory object is created and a range of indices is provided to every process.
Required imports:
import itertools
import numpy as np
import multiprocessing as mp
from multiprocessing import shared_memory
Shared Numpy arrays
The main problem with applying multiprocessing on np.ndarray's is that memory sharing between processes can be difficult. For this the following class can be used:
class SharedNumpy:
    __slots__ = ('arr', 'shm', 'name', 'shared',)

    def __init__(self, arr: np.ndarray = None):
        if arr is not None:
            self.shm = shared_memory.SharedMemory(create=True, size=arr.nbytes)
            self.arr = np.ndarray(arr.shape, dtype=arr.dtype, buffer=self.shm.buf)
            self.name = self.shm.name
            np.copyto(self.arr, arr)

    def __getattr__(self, item):
        if hasattr(self.arr, item):
            return getattr(self.arr, item)
        raise AttributeError(f"{self.__class__.__name__}, doesn't have attribute {item!r}")

    def __str__(self):
        return str(self.arr)

    @classmethod
    def from_name(cls, name, shape, dtype):
        memory = cls(arr=None)
        memory.shm = shared_memory.SharedMemory(name)
        memory.arr = np.ndarray(shape, dtype=dtype, buffer=memory.shm.buf)
        memory.name = name
        return memory

    @property
    def dtype(self):
        return self.arr.dtype

    @property
    def shape(self):
        return self.arr.shape
This makes it possible to create a shared memory object in the main process and then use SharedNumpy.from_name to get it in other processes.
Simple test
A quick (non threaded) test would be:
def simple_test():
    data = np.array(np.zeros((5,) * 2))
    mem_primary = SharedNumpy(arr=data)
    mem_second = SharedNumpy.from_name(name=mem_primary.name, shape=data.shape, dtype=data.dtype)

    assert mem_primary.name == mem_second.name, "Different memory names"
    assert np.array_equal(mem_primary.arr, mem_second.arr), "Different array values."

    mem_primary.arr[2] = 5
    assert np.array_equal(mem_primary.arr, mem_second.arr), "Different array values."
    print("Completed 3/3 tests...")
A threaded test will follow later!
Distribution
The next part is focused on providing the processes with the necessary data. In this case we will provide every process with a range of indices that it has to calculate and all the data that is required to load the shared memory.
The inputs of this function are dim, the number of numpy axes, and size, the number of elements per axis.
def distributed(size, dim):
    memory = SharedNumpy(arr=np.zeros((size,) * dim))
    split_size = np.int64(np.ceil(memory.arr.size / mp.cpu_count()))

    settings = dict(
        memory=itertools.repeat(memory.name),
        shape=itertools.repeat(memory.arr.shape),
        dtype=itertools.repeat(memory.arr.dtype),
        start=np.arange(mp.cpu_count()),
        num=itertools.repeat(split_size)
    )

    with mp.Pool(mp.cpu_count()) as pool:
        pool.starmap(fun, zip(*settings.values()))

    print(f"\n\nDone {dim}D, size: {size}, elements: {size ** dim}")
    return memory
Notes:
By using starmap instead of map, it is possible to provide multiple input arguments (a list of arguments for every process).
(also see docs starmap)
itertools.repeat is used to add constants to the starmap
(also see: zip() in python, how to use static values)
By using np.unravel_index, we only need a start index and the chunk size per process (see the short example after these notes).
The start and num tell the chunks of indices that have to be converted per process, by applying range(start * num, (start + 1) * num).
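As a quick illustration of the indexing trick (the values below are only an example): np.unravel_index converts a flat position into the per-axis indices of a given shape, so each process only needs its own range of flat positions:

import numpy as np

shape = (3, 4, 5)
print(np.unravel_index([7], shape))   # (array([0]), array([1]), array([2])) -> element [0, 1, 2]
print(np.unravel_index([59], shape))  # (array([2]), array([3]), array([4])) -> last element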
Testing
For the testing I am using different input sizes and dimensions. Since the data grows as size ^ dimensions, I limited the test to a size of 128 and 3 dimensions (that is 2,097,152 points, which already starts taking quite a bit of time).
Code
fun
def fun(name, shape, dtype, start, num):
    memory = SharedNumpy.from_name(name, shape=shape, dtype=dtype)
    for idx in range(start * num, min((start + 1) * num, memory.arr.size)):
        # Do something with the indices
        indices = np.unravel_index([idx], shape)
        memory.arr[indices] += np.prod(indices)
    memory.shm.close()  # Closes the shared memory for this process.
Running the example
if __name__ == '__main__':
for size in [5, 10, 15]:
for dim in [1, 2, 3]:
memory = distributed(size, dim)
print(memory)
memory.shm.unlink()
For the OP's code, I used it with one small addition: I allow the array to have different sizes and dimensions. In any case I use:
def sequential(size, dim):
    array = np.zeros((size,) * dim)
    ...
Looking at the output arrays of both versions shows that they produce the same results.
Plots
The code for the graphs has been taken from the reply at:
https://codereview.stackexchange.com/questions/165245/plot-timings-for-a-range-of-inputs
With the minor alteration that labels was changed to codes in
empty_multi_index = pd.MultiIndex(levels=[[], []], codes=[[], []], names=['func', 'result'])
Where the 1d, 2d and 3d reference the dimensions and the input is the size.
Sequential (OP code): [timing plot]
Distributed (this code): [timing plot]
Results
This method works on an arbitrarily sized numpy array and is able to perform an operation on the indices of the array. It provides full access to the whole numpy array, so it can also be used to perform different kinds of statistical analysis that do not change the array.
From the timings it can be seen that for small data shapes the distributed version has little to no advantage, because of the extra overhead of creating the processes. However, for larger amounts of data it starts to become more effective.
I only timed it on short delays in the computational time (simple fun), but on more complex calculations, it should outperform the sequential version much sooner.
Extra
If you are only interested in operations that are performed over or along axis, these numpy functions might help to vectorize your solutions instead of using multiprocessing:
np.apply_over_axes
np.apply_along_axis
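For completeness, a tiny illustrative example of the vectorized route (it covers cases where the operation only depends on values along one axis, not on the cell's own index):

import numpy as np

arr = np.arange(2 * 3 * 4).reshape(2, 3, 4)
# subtract each 1-D slice's mean along the last axis
centered = np.apply_along_axis(lambda v: v - v.mean(), -1, arr)
print(centered.shape)  # (2, 3, 4)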

How do I call a list of numpy functions without a for loop?

I'm doing data analysis that involves minimizing the least-square-error between a set of points and a corresponding set of orthogonal functions. In other words, I'm taking a set of y-values and a set of functions, and trying to zero in on the x-value that gets all of the functions closest to their corresponding y-value. Everything is being done in a 'data_set' class. The functions that I'm comparing to are all stored in one list, and I'm using a class method to calculate the total lsq-error for all of them:
self.fits = [np.poly1d(np.polyfit(self.x_data, self.y_data[n], 10)) for n in range(self.num_points)]

def error(self, x, y_set):
    arr = [(y_set[n] - self.fits[n](x))**2 for n in range(self.num_points)]
    return np.sum(arr)
This was fine when I had significantly more time than data, but now I'm taking thousands of x-values, each with a thousand y-values, and that for loop is unacceptably slow. I've been trying to use np.vectorize:
# global scope
def func(f, x):
    return f(x)

vfunc = np.vectorize(func, excluded=['x'])
…
…
# within data_set class
def error(self, x, y_set):
    arr = (y_set - vfunc(self.fits, x))**2
    return np.sum(arr)
func(self.fits[n], x) works fine as long as n is valid, and as far as I can tell from the docs, vfunc(self.fits, x) should be equivalent to
[self.fits[n](x) for n in range(self.num_points)]
but instead it throws:
ValueError: cannot copy sequence with size 10 to array axis with dimension 11
10 is the degree of the polynomial fit, and 11 is (by definition) the number of terms in it, but I have no idea why they're showing up here. If I change the fit order, the error message reflects the change. It seems like np.vectorize is taking each element of self.fits as a list rather than a np.poly1d function.
Anyway, if someone could either help me understand np.vectorize better, or suggest another way to eliminate that loop, that would be swell.
As the functions in question all have a very similar structure we can "manually" vectorize once we've extracted the poly coefficients. In fact, the function is then a quite simple one-liner, eval_many below:
import numpy as np

def poly_vec(list_of_polys):
    O = max(p.order for p in list_of_polys) + 1
    C = np.zeros((len(list_of_polys), O))
    for p, c in zip(list_of_polys, C):
        c[len(c) - p.order - 1:] = p.coeffs
    return C

def eval_many(x, C):
    return C @ np.vander(x, C.shape[1]).T
# make example
list_of_polys = [np.poly1d(v) for v in np.random.random((1000,11))]
x = np.random.random((2000,))
# put all coeffs in one master matrix
C = poly_vec(list_of_polys)
# test
assert np.allclose(eval_many(x,C), [p(x) for p in list_of_polys])
from timeit import timeit
print('vectorized', timeit(lambda: eval_many(x,C), number=100)*10)
print('loopy ', timeit(lambda: [p(x) for p in list_of_polys], number=10)*100)
Sample run:
vectorized 6.817315469961613
loopy 56.35076989419758

How to make sense of the output of DecisionTreeClassifier in scikit-learn?

I'm learning ML and using scikit-learn to do a basic decision tree classification.
The values of the features are categorical, so I used DictVectorizer to convert the original feature values. Here's my code:
training_set  # list of dicts representing the training set
labels        # corresponding labels of the training set

vec = DictVectorizer()
vectorized = vec.fit_transform(training_set)

clf = tree.DecisionTreeClassifier()
clf.fit(vectorized.toarray(), labels)

with open("output.dot", "w") as output_file:
    tree.export_graphviz(clf, out_file=output_file)
But I don't understand the output graph. It contains a tree with each node marked X[1] <= 0.5000 or something like that. What I expected was nodes marked with FEATURE_1 == VALUE_1, i.e. the un-vectorized information showing on the tree.
Is it possible?
UPDATE:
For example, FEATURE_1 has three possible values A, B, C, which are in turn vectorized into 0,0, 0,1, 1,0 respectively. What I want on the graph is FEATURE_1 == A instead of X[1] <= 0.5.
You can pass the feature names to the tree exporter method:
with open("output.dot", "w") as output_file:
tree.export_graphviz(clf, feature_names=vec.get_feature_names(),
out_file=output_file)
The classifier itself is unaware of the "meaning" of the data; it just deals with continuous numerical values, hence the need to use a vectorizer to one-hot-encode the categorical variables as binary variables that can safely be treated as continuous variables in the range [0, 1], with all the actual values being either 0 or 1 and nothing in between.
To understand how the DictVectorizer does the one-hot-encoding, have a look at the example snippet in the documentation.
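As a small illustration with made-up data, DictVectorizer turns each categorical value into its own 0/1 column, and get_feature_names shows which column is which (newer scikit-learn versions use get_feature_names_out instead):

from sklearn.feature_extraction import DictVectorizer

vec = DictVectorizer()
X = vec.fit_transform([{'FEATURE_1': 'A'}, {'FEATURE_1': 'B'}, {'FEATURE_1': 'C'}])
print(vec.get_feature_names())  # ['FEATURE_1=A', 'FEATURE_1=B', 'FEATURE_1=C']
print(X.toarray())
# [[1. 0. 0.]
#  [0. 1. 0.]
#  [0. 0. 1.]]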
X[1] <= 0.5000 means X[1] = 0 if you have binary variables. If the inequality holds, the left branch is chosen; otherwise, the right branch. You can certainly parse the dot file and overwrite it (it's merely a text file, and this is easy to do with regular expressions), but the way it is constructed initially is fixed like this, because by default each node of the tree is an inequality.
When the values are continuous, the learner sorts them and examines the intermediate values to find the split threshold with the best Gini score.
This is reasonable since in the continuous domain, the chance of finding a test instance with a value exactly, let's say, 3.1415 is zero. In such cases the classifier shouldn't know what to do.
I don't know about scikit-learn, but in WEKA, for instance, one can specify whether the values are continuous or discrete.
When you call export_graphviz, specify feature_names, which in this case are the column names of the independent-variables DataFrame.
This will give you the column names in your output file, as below.
model = clf.fit(X, y)
dot_data = tree.export_graphviz(model, out_file=None, feature_names=X.columns.values.tolist(), class_names = None, filled=True, rounded=True, special_characters=True)
with open("output.dot", "w") as output_file:
output_file.write(dot_data)
