multithreaded iteration over numpy array indices - python-3.x

I have a piece of code which iterates over a three-dimensional array and writes into each cell a value based on the indices and the current value itself:
import numpy as np
nx = ny = nz = 100
array = np.zeros((nx, ny, nz))
def fun(val, k):
# Do something with the indices
return val + (k[0] * k[1] * k[2])
with np.nditer(array, flags=['multi_index'], op_flags=['readwrite']) as it:
for x in it:
x[...] = fun(x, it.multi_index)
Note, that fun might do something more sophisticated, which takes most of the total runtime, and that the input arrays might have different lengths per axis.
However, this code could run in multiple threads, as fun can be assumed to be threadsafe (Only the value and index of the current cell are required). But finding a method to iterate over all cells and have the current index available seems to be hard.
A possible solution might be https://stackoverflow.com/a/58012407/446140, where the array is split by the x-axis into chunks and passed to a Pool.
However, the solution is not universally applicable and I wonder if there is a more general solution for this problem (which could also work with nD arrays)?
The first issue is to split up the 3D array into equally sized chunks. np.array_split can be used, but the offset of each of the splits has to be stored to get the correct indices again.

An interesting question, with a few possible solutions. As you indicated, it is possible to use np.array_split, but since we are only interested in the indices, we can also use np.unravel_index, which would mean that we only have to loop over all the indices (the size) of the array to get the index.
Now there are two great ideas for multiprocessing:
Create a (thread safe) shared memory of the array and splitting the indices across the different processes.
Only update the array in a main thread, but provide a copy of the required data to the processes and let them return the value that has to be updated.
Both solutions will work for any np.ndarray, but have different advantages. Creating a shared memory doesn't create copies, but can have a large insertion penalty if it has to wait on other processes (the computational time, is small compared to the write time.)
There are probably many more solutions, but I will work out the first solution, where a Shared Memory object is created and a range of indices is provided to every process.
Required imports:
import itertools
import numpy as np
import multiprocessing as mp
from multiprocessing import shared_memory
Shared Numpy arrays
The main problem with applying multiprocessing on np.ndarray's is that memory sharing between processes can be difficult. For this the following class can be used:
class SharedNumpy:
__slots__ = ('arr', 'shm', 'name', 'shared',)
def __init__(self, arr: np.ndarray = None):
if arr is not None:
self.shm = shared_memory.SharedMemory(create=True, size=arr.nbytes)
self.arr = np.ndarray(arr.shape, dtype=arr.dtype, buffer=self.shm.buf)
self.name = self.shm.name
np.copyto(self.arr, arr)
def __getattr__(self, item):
if hasattr(self.arr, item):
return getattr(self.arr, item)
raise AttributeError(f"{self.__class__.__name__}, doesn't have attribute {item!r}")
def __str__(self):
return str(self.arr)
#classmethod
def from_name(cls, name, shape, dtype):
memory = cls(arr=None)
memory.shm = shared_memory.SharedMemory(name)
memory.arr = np.ndarray(shape, dtype=dtype, buffer=memory.shm.buf)
memory.name = name
return memory
#property
def dtype(self):
return self.arr.dtype
#property
def shape(self):
return self.arr.shape
This makes it possible to create a shared memory object in the main process and then use SharedNumpy.from_name to get it in other processes.
Simple test
A quick (non threaded) test would be:
def simple_test():
data = np.array(np.zeros((5,) * 2))
mem_primary = SharedNumpy(arr=data)
mem_second = SharedNumpy.from_name(name=mem_primary.name, shape=data.shape, dtype=data.dtype)
assert mem_primary.name == mem_second.name, "Different memory names"
assert np.array_equal(mem_primary.arr, mem_second.arr), "Different array values."
mem_primary.arr[2] = 5
assert np.array_equal(mem_primary.arr, mem_second.arr), "Different array values."
print("Completed 3/3 tests...")
A threaded test will follow later!
Distribution
The next part is focused on providing the processes with the necessary data. In this case we will provide every process with a range of indices that it has to calculate and all the data that is required to load the shared memory.
The input of this function is a dim the number of numpy axis, and the size, which are the number of elements per axis.
def distributed(size, dim):
memory = SharedNumpy(arr=np.zeros((size,) * dim))
split_size = np.int64(np.ceil(memory.arr.size / mp.cpu_count()))
settings = dict(
memory=itertools.repeat(memory.name),
shape=itertools.repeat(memory.arr.shape),
dtype=itertools.repeat(memory.arr.dtype),
start=np.arange(mp.cpu_count()),
num=itertools.repeat(split_size)
)
with mp.Pool(mp.cpu_count()) as pool:
pool.starmap(fun, zip(*settings.values()))
print(f"\n\nDone {dim}D, size: {size}, elements: {size ** dim}")
return memory
Notes:
By using starmap instead of map, it is possible to provide multiple input arguments (a list of arguments for every process).
(also see docs starmap)
itertools.repeat is used to add constants to the starmap
(also see: zip() in python, how to use static values)
By using np.unravel_index, we only need to have a start index and the chunk size per process.
The start and num tell the chunks of indices that have to be converted per process, by applying range(start * num, (start + 1) * num).
Testing
For the testing I am using different input sizes and dimensions. Since the data increases with the formula sizes ^ dimensions, I limited the test to a size of 128 and 3 dimensions (that is 2,097,152 points, and already start taking quit a bit of time.)
Code
fun
def fun(name, shape, dtype, start, num):
memory = SharedNumpy.from_name(name, shape=shape, dtype=dtype)
for idx in range(start * num, min((start + 1) * num, memory.arr.size)):
# Do something with the indices
indices = np.unravel_index([idx], shape)
memory.arr[indices] += np.product(indices)
memory.shm.close() # Closes the shared memory for this process.
Running the example
if __name__ == '__main__':
for size in [5, 10, 15]:
for dim in [1, 2, 3]:
memory = distributed(size, dim)
print(memory)
memory.shm.unlink()
For the OP's code, I used his code with a small addition that I allow the array to have different sizes and dimensions, in any case I use:
def sequential(size, dim):
array = np.zeros((size,) * dim)
...
And looking at the output array of both codes, will result in the same outcomes.
Plots
The code for the graphs have been taken from the reply in:
https://codereview.stackexchange.com/questions/165245/plot-timings-for-a-range-of-inputs
With the minor alteration that labels was changed to codes in
empty_multi_index = pd.MultiIndex(levels=[[], []], codes=[[], []], names=['func', 'result'])
Where the 1d, 2d and 3d reference the dimensions and the input is the size.
Sequentially (OP code):
Distributed (this code):
Results
This method works on an arbitrary sized numpy array, and is able to perform an operation on the indices of the array. It provides you with full access of the whole numpy array, so it can also be used to perform different kind of statistical analysis, which do not change the array.
From the timings it can be seen that for small data shapes the distributed version has no to little advantages, because of the extra complexity of creating the processes. However for larger amount of data it starts to become more effective.
I only timed it on short delays in the computational time (simple fun), but on more complex calculations, it should outperform the sequential version much sooner.
Extra
If you are only interested in operations that are performed over or along axis, these numpy functions might help to vectorize your solutions instead of using multiprocessing:
np.apply_over_axes
np.apply_along_axis

Related

Python multiprocessing same function with different arguments

I have a program that is designed to calculate order parameter from coarse-grained molecular system. In the system the I have different beads, which represents different parts of molecule. Each of these beads have a xyz-coordinates that represent their place in the system. The program works, but it is very slow since I have to calculate the number of beads type i around beads type j within a certain cutoff distance.
Function to calculate Euclidean distance between bead a and b:
def distance_ab(a, b):
n_beads = 0
for i in range(len(a)):
for j in range(len(b)):
# Euclidean distance
dist = np.sqrt(np.sum((a[i] - b[j])** 2, axis=0))
if dist <= 1.0 and dist > 0.0: # cut-off distance
n_beads += 1
return n_beads
So I decided to fasten the process of calculating the distance between different beads by using python multiprocessing library. But for some reason I can not get the multiprocessing to work for repeating the same distance calculation function with different parameters (xyz-data of beads). Multiprocessing returns a list of some numbers, when the idea is to return only one number (the number of beads in certain cut-off distance). What I do wrong and could someone help me to understand where the problem is?
The part where I am trying to use multiprocessing:
with multiprocessing.Pool(os.cpu_count()) as pool:
# go through certain number of molecular simulation frames (e.g. 100 frames)
for i in range(frames)):
# Calculating euclidean distances between different types of beads
# for each frame
a_b = pool.starmap(calculate_distances, zip(bead_a_array, bead_b_array))
a_c = pool.starmap(calculate_distances, zip(bead_a_array, bead_c_array))
When you zip together your bead arrays, you are creating an iterable of tuples that overall has the same length as the shorter of the two arrays.
>>> A=[1,2,3]
>>> B=[4,5,6,7]
>>> res=zip(A,B)
>>> list(res)
[(1, 4), (2, 5), (3, 6)]
Looking at the documentation for starmap:
Like map() except that the elements of the iterable are expected to be iterables that are unpacked as arguments.
Hence an iterable of [(1,2), (3, 4)] results in [func(1,2), func(3,4)].
So your starmaps are actually just passing a pair of elements (one from each array) to your function and returning the result for each of these pairs. If you want to use multiprocessing to determine, say, the number of b and c beads within the cutoff distance of a at the same time, you would need to do something like this:
import itertools as it
all_arr=[bead_a_array,bead_b_array,bead_c_array]
with multiprocessing.Pool(os.cpu_count()) as pool:
a_counts=pool.starmap(distance_ab, it.combinations(all_arr,2))
Here, instead of passing individual elements of each array to the function, it now passes each array into your function and it will compute the counts of b and c within the threshold of a (and c with the threshold of b) simultaneously. The it.combinations(all_arr,2) selects unique pairs of arrays to pass to your function.

Two level multiprocessing in Python

Let's consider a function whose elaboration time depends on one of its parameter, size, and depending on its value the running time can take from hours to (few) days. In a script I launch multiple instances of this function in parallel to use all the cores and save time.
Let's say that the total elaboration time is limited by the longest running time of function instance with biggest size, and that I can also parallelize some parts of the function: this would probably not be beneficial if all the function instances are running, but it could be beneficial when only one remains (or if I have tons of cores). How would you do this in Python, that is, organize a two-level multiprocessing (one at the script level, the other inside the function)? Or would you just parallelize the function and launch multiple scripts with different configurations?
I provide a MWE, but I understand the answer on the (absolute) fastest execution is probably problem dependent. Here the function consists of nested loops, but you can adapt it as long as it verifies the preceding assumptions, or change the parameters. I am not interested in this particular MWE, but on how to set an inner parallelization.
import numpy as np
import time
import timeit
import multiprocessing as mp
import copy
def function_wrapper(matrix):
"""Puts to zero the multiples of 3."""
side = matrix.shape[0]
# consider to parallelize the following
# NOTE that I am not interested in parallelizing this particular example
# (you can change this part as long as by parallelizing it you better use the cores)
# with list comprehensions or built-in functions, but on how to setup a second level of multiprocessing
for x in range(side):
for y in range(side):
for z in range(side):
matrix[x, y, z] = timeit.timeit("numpy.linalg.eig(numpy.random.randint(0, 10, (L, L)))", setup='import numpy; L='+str(matrix[x, y, z]), number=10)
return matrix
num_cores = mp.cpu_count()
if __name__ == "__main__":
matrices_number = 20 # depending on the values of matrices_number and max_side the
max_side = 10 # parallelization setup is more or less important
matrices = [np.random.randint(1, 101, (side, side, side))
for side in np.random.randint(2, max_side, matrices_number)]
# sorts matrices by decreasing shape
args_generator = sorted([(m,) for m in matrices], key=lambda x: x[0].shape[0], reverse=True)
iterations = 10 # iterations to reduce caching effects
start = time.time()
for k in range(iterations):
results = [function_wrapper(*args) for args in copy.deepcopy(args_generator)]
elapsed = time.time() - start
print(f"loop: elapsed={elapsed} sec.")
start = time.time()
for k in range(iterations):
with mp.Pool(num_cores) as pool:
results = pool.starmap_async(function_wrapper, copy.deepcopy(args_generator)).get()
elapsed = time.time() - start
print(f"mp.Pool: elapsed={elapsed} sec.")
You're not taking advantage of the built-in functions in the numpy library. You are iterating over the array instead of broadcasting your logic across the whole matrix at once. When you take advantage of the built-in numpy functions, you take advantage of the underlying code written in C. The wrapper function should be the following.
def broadcaster(matrix):
return np.where(matrix % 3 != 0, matrix, 0)
start = time.time()
for k in range(iterations):
results = [broadcaster(*args) for args in copy.deepcopy(args_generator)]
elapsed = time.time() - start
print(f"Broadcast: elapsed={elapsed} sec.")
When I add that snippet to your code I get the following.
loop: elapsed=15.675757884979248 sec.
mp.Pool: elapsed=14.439897060394287 sec.
Broadcast: elapsed=0.6325647830963135 sec.
As you can see, in terms of performance, it is not even close.

Hyperopt list of values per hyperparameter

I'm trying to use Hyperopt on a regression model such that one of its hyperparameters is defined per variable and needs to be passed as a list. For example, if I have a regression with 3 independent variables (excluding constant), I would pass hyperparameter = [x, y, z] (where x, y, z are floats).
The values of this hyperparameter have the same bounds regardless of which variable they are applied to. If this hyperparameter was applied to all variables, I could simply use hp.uniform('hyperparameter', a, b). What I want the search space to be instead is a cartesian product of hp.uniform('hyperparameter', a, b) of length n, where n is the number of variables in a regression (so, basically, itertools.product(hp.uniform('hyperparameter', a, b), repeat = n))
I'd like to know whether this is possible within Hyperopt. If not, any suggestions for an optimizer where this is possible are welcome.
As noted in my comment, I am not 100% sure what you are looking for, but here is an example of using hyperopt to optimize 3 variables combination:
import random
# define an objective function
def objective(args):
v1 = args['v1']
v2 = args['v2']
v3 = args['v3']
result = random.uniform(v2,v3)/v1
return result
# define a search space
from hyperopt import hp
space = {
'v1': hp.uniform('v1', 0.5,1.5),
'v2': hp.uniform('v2', 0.5,1.5),
'v3': hp.uniform('v3', 0.5,1.5),
}
# minimize the objective over the space
from hyperopt import fmin, tpe, space_eval
best = fmin(objective, space, algo=tpe.suggest, max_evals=100)
print(best)
they all have the same search space in this case (as I understand this was your problem definition). Hyperopt aims to minimize the objective function, so running this will end up with v2 and v3 near the minimum value, and v1 near the maximum value. Since this most generally minimizes the result of the objective function.
You could use this function to create the space:
def get_spaces(a, b, num_spaces=9):
return_set = {}
for set_num in range(9):
name = str(set_num)
return_set = {
**return_set,
**{name: hp.uniform(name, a, b)}
}
return return_set
I would first define my pre-combinatorial space as a dict. The keys are names. The values are a tuple.
from hyperopt import hp
space = {'foo': (hp.choice, (False, True)), 'bar': (hp.quniform, 1, 10, 1)}
Next, produce the required combinatorial variants using loops or itertools. Each name is kept unique using a suffix or prefix.
types = (1, 2)
space = {f'{name}_{type_}': args for type_ in types for name, args in space.items()}
>>> space
{'foo_1': (<function hyperopt.pyll_utils.hp_choice(label, options)>,
(False, True)),
'bar_1': (<function hyperopt.pyll_utils.hp_quniform(label, *args, **kwargs)>,
1, 10, 1),
'foo_2': (<function hyperopt.pyll_utils.hp_choice(label, options)>,
(False, True)),
'bar_2': (<function hyperopt.pyll_utils.hp_quniform(label, *args, **kwargs)>,
1, 10, 1)}
Finally, initialize and create the actual hyperopt space:
space = {name: fn(name, *args) for name, (fn, *args) in space.items()}
values = tuple(space.values())
>>> space
{'foo_1': <hyperopt.pyll.base.Apply at 0x7f291f45d4b0>,
'bar_1': <hyperopt.pyll.base.Apply at 0x7f291f45d150>,
'foo_2': <hyperopt.pyll.base.Apply at 0x7f291f45d420>,
'bar_2': <hyperopt.pyll.base.Apply at 0x7f291f45d660>}
This was done with hyperopt 0.2.7. As a disclaimer, I strongly advise against using hyperopt because in my experience it has significantly poor performance relative to other optimizers.
Hi so I implemented this solution with optuna. The advantage of optuna is that it will create a hyperspace for all individual values, but optimizes this values in a more intelligent way and just uses one hyperparameter optimization. For example I optimized a neural network with the Batch-SIze, Learning-rate and Dropout-Rate:
The search space is much larger than the actual values being used. This safes a lot of time instead of an grid search.
The Pseudo-Code of the implementation is:
def function(trial): #trials is the parameter of optuna, which selects the next hyperparameter
distribution = [0 , 1]
a = trials.uniform("a": distribution) #this is a uniform distribution
b = trials.uniform("a": distribution)
return (a*b)-b
#This above is the function which optuna tries to optimze/minimze
For more detailed source-Code visit Optuna. It saved a lot of time for me and it was a really good result.

How do I call a list of numpy functions without a for loop?

I'm doing data analysis that involves minimizing the least-square-error between a set of points and a corresponding set of orthogonal functions. In other words, I'm taking a set of y-values and a set of functions, and trying to zero in on the x-value that gets all of the functions closest to their corresponding y-value. Everything is being done in a 'data_set' class. The functions that I'm comparing to are all stored in one list, and I'm using a class method to calculate the total lsq-error for all of them:
self.fits = [np.poly1d(np.polyfit(self.x_data, self.y_data[n],10)) for n in range(self.num_points)]
def error(self, x, y_set):
arr = [(y_set[n] - self.fits[n](x))**2 for n in range(self.num_points)]
return np.sum(arr)
This was fine when I had significantly more time than data, but now I'm taking thousands of x-values, each with a thousand y-values, and that for loop is unacceptably slow. I've been trying to use np.vectorize:
#global scope
def func(f,x):
return f(x)
vfunc = np.vectorize(func, excluded=['x'])
…
…
#within data_set class
def error(self, x, y_set):
arr = (y_set - vfunc(self.fits, x))**2
return np.sum(arr)
func(self.fits[n], x) works fine as long as n is valid, and as far as I can tell from the docs, vfunc(self.fits, x) should be equivalent to
[self.fits[n](x) for n in range(self.num_points)]
but instead it throws:
ValueError: cannot copy sequence with size 10 to array axis with dimension 11
10 is the degree of the polynomial fit, and 11 is (by definition) the number of terms in it, but I have no idea why they're showing up here. If I change the fit order, the error message reflects the change. It seems like np.vectorize is taking each element of self.fits as a list rather than a np.poly1d function.
Anyway, if someone could either help me understand np.vectorize better, or suggest another way to eliminate that loop, that would be swell.
As the functions in question all have a very similar structure we can "manually" vectorize once we've extracted the poly coefficients. In fact, the function is then a quite simple one-liner, eval_many below:
import numpy as np
def poly_vec(list_of_polys):
O = max(p.order for p in list_of_polys)+1
C = np.zeros((len(list_of_polys), O))
for p, c in zip(list_of_polys, C):
c[len(c)-p.order-1:] = p.coeffs
return C
def eval_many(x,C):
return C#np.vander(x,11).T
# make example
list_of_polys = [np.poly1d(v) for v in np.random.random((1000,11))]
x = np.random.random((2000,))
# put all coeffs in one master matrix
C = poly_vec(list_of_polys)
# test
assert np.allclose(eval_many(x,C), [p(x) for p in list_of_polys])
from timeit import timeit
print('vectorized', timeit(lambda: eval_many(x,C), number=100)*10)
print('loopy ', timeit(lambda: [p(x) for p in list_of_polys], number=10)*100)
Sample run:
vectorized 6.817315469961613
loopy 56.35076989419758

How to fix 'TypeError: Object arrays are not currently supported' error in numpy python 3 (matrix multiplication)

I'm trying to make my own neural network "library" (if you can call it that) for myself to use, since I am hobby-learning about them.
I wrote this code that makes a propagatable neural network by feeding it a structure of the desired network, and it worked pretty well.
But then when I tried giving the model a different amount of nodes, the code BUGGED
I've already tried to edit the amount of nodes in each layer and see where that takes me, and I've found out that I only get this error when the first and the second layer have the same amount of nodes in them, but the output layer has a different amount. I've also tried to do the matrix multiplication of the structure that outputs the bug on paper, and it gave me an actual result (which I've double-checked for legitness a lot of times). So now I know that it has something to do with the practical and not theoretical.
There's clearly something wrong with the matrix multiplication, I think.
The script's functions
I had to include these functions in the question, so you can have a better inside on how this code works.
is_iterable()
This function returns a boolean value that describes if the input is iterable
def is_iterable(x):
try:
x[0]
return True
except:
return False
blueprint()
This function returns a copy of the input array but changes the elements that aren't iterable to 0's
def blueprint(x):
return [blueprint(e) if is_iterable(e) else 0 for e in x]
build()
This function takes a model of your desired neural network structure as input, and outputs suited randomized biases and weights seperated in two different arrays
The 'randomize()' function returns a copy of the input array but changes the elements that aren't iterable to random floats between -1's and 1's.
The 'build-weights()' function returns randomized weights based on a model of a neural network.
def build(x):
def randomize(x):
return np.array([randomize(n) if type(n) is list else random.uniform(-1, 1) for n in x])
def build_weighs(x):
y = []
for i, l in enumerate(x):
if i == len(x) - 1:
break
y.append([randomize(x[i + 1]) for n in l])
return np.array(y)
return (randomize(x), build_weighs(x))
apply_funcs()
This function applies a list of functions to another list of functions and then returns them. If the function list contains a 0, an element from the other list positioned in the same place will not be applied to any function.
def apply_funcs(x, f):
y = x
i = 0
for xj, fj in zip(x, f):
if fj == 0:
y[i] = xj
else:
y[i] = fj(xj)
i += 1
return y
nn()
This is the class for making a neural network.
You can see that it has a function named, 'prop' for the forward propagation of the network.
class nn:
def __init__(self, structure, a_funcs=None):
self.structure = structure
self.b = np.array(structure[0])
self.w = np.array(structure[1])
if a_funcs == None:
a_funcs = blueprint(self.b)
self.a_funcs = np.array(a_funcs).
def prop(self, x):
y = np.array(x)
if y.shape != self.b[0].shape:
raise ValueError("The input needs to be intact with the Input Nodes\nInput: {} != Input Nodes: {}".format(blueprint(y), blueprint(self.b[0])))
wi = 0
# A loop through the layers of the neural network
for i in range(len(self.b)):
# This if statement is here so that the weights get applied in the right order
if i != 0:
y = np.matmul(y, self.w[wi])
wi += 1
# Applying the biases of layer i to the current information
y = np.add(y, self.b[i])
# Applying the activation functions to the current information
y = apply_funcs(y, self.a_funcs[i])
return y
Defining a neural network structure and propagating it
n is containing the structure which is a 3 layer network containing respectively 2 nodes, 2 nodes and 3 nodes.
n = [[0] * 2, [0] * 2, [0] * 3]
bot = nn(build(n))
print(bot.prop([1] * 2))
When I do this I expect the code to output an array of three semi-random numbers like this:
[-0.55889818 0.62762604 0.59222784]
but instead I get an error from numpy saying this:
File "C:\Users\Black\git\Changbot\oper.py.py", line 78, in prop
y = np.matmul(y, self.w[wi])
TypeError: Object arrays are not currently supported
And the weirdest thing about this is that (as I said earlier) I only get this error when the first and the second layer have the same amount of nodes in them, but the output layer has a different amount. All the other times I get the expected output...
I have now again checked the values that are causing this error and I don't see any objects other than a list. It's the same when it's not bugging...
So I added this try-except statement:
try:
y = np.matmul(np.array(y), self.w[wi])
except TypeError:
print("y:{}\nself.w[wi]:{}".format(y, self.w[wi]))
It then outputs this:
y:[1.6888437]
self.w[wi]:[array([-0.19013173])]
Which should have the ability to be multiplied with each other
I have even tried copy pasting the values into an interpreter and multiplying them there, and it works there...
NOTE: THIS IS A VERY BAD TEST AS THE COPY PASTE ARRAYS DOESN'T HAVE THE SAME DTYPES AS THE ACTUAL ARRAYS
np.matmul([1.6888437], [np.array([-0.19013173])])
Output for the above:
[-0.32110277]
After looking at the answers
Okay. I have now found out that the object dtype arrays lies in the structure of the neural network by doing this at the end of the script:
print("STRUCTURE:{}".format(n))
It then outputs this:
STRUCTURE:(array([array([0.6888437]), array([ 0.51590881, -0.15885684]),
array([-0.4821665 , 0.02254944, -0.19013173])], dtype=object), array([list([array([ 0.56759718, -0.39337455])]),
list([array([-0.04680609, 0.16676408, 0.81622577]), array([ 0.00937371, -0.43632431, 0.51160841])])],
dtype=object))
Solving the bug
I can understand from one of the answer to this post that np.array() tries to create as high a dimensional array as it can, and failing that falls back on object dtype (or for some combinations of inputs raises an error).
The object dtype gets created in the build() function so I tried to remove all np.array() functions in that. Actually i removed all of such from the whole script. And guess what? It worked! Thanks a 1000 times to you contributers!
Btw Happy New Year
Regarding your copy-paste testing:
In [55]: np.matmul([1.6888437], [np.array([-0.19013173])])
Out[55]: array([-0.32110277])
But this isn't what your code is using. Instead we have to make arrays that match in dtype.
In [59]: x = np.array([1.6888437]); y = np.array([np.array([-0.19013173]),None])[:1]
In [60]: x
Out[60]: array([1.6888437])
In [61]: y
Out[61]: array([array([-0.19013173])], dtype=object)
I used the None funny business to force it to create an object dtype containing an array, which will print as [array([-0.19013173])].
Now I get your error:
In [62]: np.matmul(x,y)
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-62-b6212b061655> in <module>()
----> 1 np.matmul(x,y)
TypeError: Object arrays are not currently supported
Even if did work as with dot
In [66]: np.dot(x,y)
Out[66]: array([-0.32110277])
the calculations with object dtype arrays are slower.
I won't try to figure out why you have an object dtype array at this point. But I think you should avoid those in code where speed matters.
If you construct an array from arrays or lists that differ in size, the result is likely to be object dtype with a lower number of dimensions. np.array tries to create as high a dimensional array as it can, and failing that falls back on object dtype (or for some combinations of inputs raises an error).

Resources