Reshaping with asterisk operator `*` in PyTorch - pytorch

While reading this annotated implementation of Diffusion Probabilistic models in PyTorch, I got stuck trying to understand this function:
def extract(a, t, x_shape):
    batch_size = t.shape[0]
    out = a.gather(-1, t.cpu())
    return out.reshape(batch_size, *((1,) * (len(x_shape) - 1))).to(t.device)
What is not clear is the final return statement: what does the *((1,) * ...) mean inside the reshape call? Is that asterisk the unpacking operator? And if so, how is it used here?

(1,) * (len(x_shape) - 1)
creates a tuple of length len(x_shape) - 1 filled with 1s
*(...)
means to spread the tuple into arguments
So it ends up being (say len(x_shape) == 5)
return out.reshape(batch_size, 1, 1, 1, 1).to(t.device)
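The two steps above can be checked in plain Python, without PyTorch. This is only a small sketch: broadcast_shape and show_args are hypothetical helper names introduced for illustration and are not part of the original extract function.

```python
def broadcast_shape(batch_size, x_shape):
    """Build the argument list that extract() passes to reshape (hypothetical helper)."""
    ones = (1,) * (len(x_shape) - 1)   # tuple repetition: one 1 per non-batch dimension
    return (batch_size, *ones)         # * unpacks the tuple into separate items


def show_args(*dims):
    """Return the positional arguments exactly as the callee receives them."""
    return dims


x_shape = (8, 3, 32, 32)               # e.g. a batch of 8 RGB images of size 32x32
print((1,) * (len(x_shape) - 1))       # (1, 1, 1)
print(broadcast_shape(8, x_shape))     # (8, 1, 1, 1)
print(show_args(8, *((1,) * 3)))       # (8, 1, 1, 1), same as show_args(8, 1, 1, 1)
```

The resulting shape has one leading batch dimension and only singleton dimensions after it, so the reshaped tensor broadcasts against a tensor of shape x_shape.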

Related

How to get the set difference of two 2d numpy arrays, or equivalent of np.setdiff1d in a 2d array?

In "Get intersecting rows across two 2D numpy arrays", the intersecting rows are obtained with the function np.intersect1d. I changed the function to use np.setdiff1d to get the set difference, but it doesn't work properly. The following is the code:
import numpy as np
def set_diff2d(A, B):
    nrows, ncols = A.shape
    dtype={'names':['f{}'.format(i) for i in range(ncols)],
           'formats':ncols * [A.dtype]}
    C = np.setdiff1d(A.view(dtype), B.view(dtype))
    return C.view(A.dtype).reshape(-1, ncols)
The following data is used for checking the issue:
min_dis=400
Xt = np.arange(50, 3950, min_dis)
Yt = np.arange(50, 3950, min_dis)
Xt, Yt = np.meshgrid(Xt, Yt)
Xt[::2] += min_dis // 2  # integer division, so the in-place add keeps Xt's integer dtype
# This is the super set
turbs_possible_locs = np.vstack([Xt.flatten(), Yt.flatten()]).T
# This is the subset
subset = turbs_possible_locs[np.random.choice(turbs_possible_locs.shape[0],50, replace=False)]
diffs = set_diff2d(turbs_possible_locs, subset)
diffs is supposed to have a shape of 50x2, but it does not.
Ok, so to fix your issue try the following tweak:
def set_diff2d(A, B):
    nrows, ncols = A.shape
    dtype={'names':['f{}'.format(i) for i in range(ncols)], 'formats':ncols * [A.dtype]}
    C = np.setdiff1d(A.copy().view(dtype), B.copy().view(dtype))
    return C
The problem was that A, after .view(...) was applied, was broken in half: it ended up with 2 tuple columns instead of 1, unlike B. Applying the dtype is meant to collapse the 2 columns of each row into a single tuple, which is why you could do the intersection in 1d in the first place.
Quoting the documentation:
"a.view(some_dtype) or a.view(dtype=some_dtype) constructs a view of the array’s memory with a different data-type. This can cause a reinterpretation of the bytes of memory."
Source: https://numpy.org/doc/stable/reference/generated/numpy.ndarray.view.html
I think the "reinterpretation" is exactly what happened, so for the sake of simplicity I would just .copy() the array.
NB however I wouldn't square it - it's always A which gets 'broken' - whether it's an assignment, or inline B is always fine...
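A likely reason that only A is affected is memory layout: np.vstack(...).T returns an array that is not C-contiguous, while the fancy-indexed subset is a fresh C-contiguous array. The sketch below is an assumption about the cause, not the answerer's exact method. It forces both inputs into C order with np.ascontiguousarray before viewing each row as one structured element, and it converts the result back to a plain 2-D array.

```python
import numpy as np


def set_diff2d(A, B):
    """Rows of A that do not appear in B (sketch; assumes 2-D inputs with equal column counts)."""
    A = np.ascontiguousarray(A)                   # C order, so each row is one block of memory
    B = np.ascontiguousarray(B, dtype=A.dtype)    # same dtype, so the row views are comparable
    ncols = A.shape[1]
    row_dtype = {'names': ['f{}'.format(i) for i in range(ncols)],
                 'formats': ncols * [A.dtype]}
    # Each row becomes a single structured element, so the 1-D set operation applies.
    C = np.setdiff1d(A.view(row_dtype), B.view(row_dtype))
    return C.view(A.dtype).reshape(-1, ncols)


A = np.vstack([[1, 3, 5], [2, 4, 6]]).T   # rows [1, 2], [3, 4], [5, 6]; not C-contiguous
B = A[[1]]                                # the single row [3, 4]
print(set_diff2d(A, B))                   # rows [1, 2] and [5, 6]
```

Like np.setdiff1d itself, the result is sorted and contains each remaining row only once.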

How is KL-divergence in pytorch code related to the formula?

In the VAE tutorial, the KL divergence of two normal distributions is defined by:
In many implementations, such as here, here and here, it is implemented as:
KL_loss = -0.5 * torch.sum(1 + logv - mean.pow(2) - logv.exp())
or
def latent_loss(z_mean, z_stddev):
    mean_sq = z_mean * z_mean
    stddev_sq = z_stddev * z_stddev
    return 0.5 * torch.mean(mean_sq + stddev_sq - torch.log(stddev_sq) - 1)
How are they related? Why is there no "tr" or ".transpose()" in the code?
The expressions in the code you posted assume X is an uncorrelated multi-variate Gaussian random variable. This is apparent by the lack of cross terms in the determinant of the covariance matrix. Therefore the mean vector and covariance matrix take the forms
Using this we can quickly derive the following equivalent representations for the components of the original expression
Substituting these back into the original expression gives
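The equations of this answer did not survive in the text, so the derivation below is a reconstruction under a stated assumption: the "original expression" is taken to be the standard closed form for the KL divergence between a k-dimensional Gaussian and the standard normal prior, which may differ in notation from the tutorial's formula.

```latex
% Assumption: diagonal (uncorrelated) covariance, prior N(0, I)
\mu = (\mu_1, \dots, \mu_k)^{\top}, \qquad
\Sigma = \operatorname{diag}(\sigma_1^2, \dots, \sigma_k^2)

% Standard closed form
D_{\mathrm{KL}}\bigl(\mathcal{N}(\mu, \Sigma) \,\|\, \mathcal{N}(0, I)\bigr)
  = \tfrac{1}{2}\Bigl[\operatorname{tr}(\Sigma) + \mu^{\top}\mu - k - \log\det\Sigma\Bigr]

% Components for a diagonal covariance
\operatorname{tr}(\Sigma) = \sum_{i=1}^{k} \sigma_i^2, \qquad
\mu^{\top}\mu = \sum_{i=1}^{k} \mu_i^2, \qquad
\log\det\Sigma = \sum_{i=1}^{k} \log\sigma_i^2

% Substituting back
D_{\mathrm{KL}}
  = \tfrac{1}{2}\sum_{i=1}^{k}\bigl(\sigma_i^2 + \mu_i^2 - 1 - \log\sigma_i^2\bigr)
  = -\tfrac{1}{2}\sum_{i=1}^{k}\bigl(1 + \log\sigma_i^2 - \mu_i^2 - \sigma_i^2\bigr)
```

With logv = log σ², the last line is the first code snippet term by term: logv.exp() is σ² and mean.pow(2) is μ². The trace and the transpose disappear because, for a diagonal covariance, they reduce to element-wise sums. The second snippet computes the same quantity from the standard deviation, but uses torch.mean instead of torch.sum, so it differs by a constant scale factor.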

How to fix 'TypeError: Object arrays are not currently supported' error in numpy python 3 (matrix multiplication)

I'm trying to make my own neural network "library" (if you can call it that) for myself to use, since I am hobby-learning about them.
I wrote this code that makes a propagatable neural network by feeding it a structure of the desired network, and it worked pretty well.
But then when I tried giving the model a different amount of nodes, the code BUGGED
I've already tried to edit the amount of nodes in each layer and see where that takes me, and I've found out that I only get this error when the first and the second layer have the same amount of nodes in them, but the output layer has a different amount. I've also tried to do the matrix multiplication of the structure that outputs the bug on paper, and it gave me an actual result (which I've double-checked for legitness a lot of times). So now I know that it has something to do with the practical and not theoretical.
There's clearly something wrong with the matrix multiplication, I think.
The script's functions
I had to include these functions in the question, so you can get a better insight into how this code works.
is_iterable()
This function returns a boolean value that describes if the input is iterable
def is_iterable(x):
    try:
        x[0]
        return True
    except:
        return False
blueprint()
This function returns a copy of the input array but changes the elements that aren't iterable to 0's
def blueprint(x):
    return [blueprint(e) if is_iterable(e) else 0 for e in x]
build()
This function takes a model of your desired neural network structure as input, and outputs suitable randomized biases and weights separated into two different arrays.
The 'randomize()' function returns a copy of the input array but changes the elements that aren't iterable to random floats between -1 and 1.
The 'build_weighs()' function returns randomized weights based on a model of a neural network.
def build(x):
    def randomize(x):
        return np.array([randomize(n) if type(n) is list else random.uniform(-1, 1) for n in x])
    def build_weighs(x):
        y = []
        for i, l in enumerate(x):
            if i == len(x) - 1:
                break
            y.append([randomize(x[i + 1]) for n in l])
        return np.array(y)
    return (randomize(x), build_weighs(x))
apply_funcs()
This function applies a list of functions to another list, element by element, and then returns the result. If the function list contains a 0, the element at the same position in the other list is left unchanged.
def apply_funcs(x, f):
    y = x
    i = 0
    for xj, fj in zip(x, f):
        if fj == 0:
            y[i] = xj
        else:
            y[i] = fj(xj)
        i += 1
    return y
nn()
This is the class for making a neural network.
You can see that it has a function named, 'prop' for the forward propagation of the network.
class nn:
    def __init__(self, structure, a_funcs=None):
        self.structure = structure
        self.b = np.array(structure[0])
        self.w = np.array(structure[1])
        if a_funcs is None:
            a_funcs = blueprint(self.b)
        self.a_funcs = np.array(a_funcs)
    def prop(self, x):
        y = np.array(x)
        if y.shape != self.b[0].shape:
            raise ValueError("The input needs to be intact with the Input Nodes\nInput: {} != Input Nodes: {}".format(blueprint(y), blueprint(self.b[0])))
        wi = 0
        # A loop through the layers of the neural network
        for i in range(len(self.b)):
            # This if statement is here so that the weights get applied in the right order
            if i != 0:
                y = np.matmul(y, self.w[wi])
                wi += 1
            # Applying the biases of layer i to the current information
            y = np.add(y, self.b[i])
            # Applying the activation functions to the current information
            y = apply_funcs(y, self.a_funcs[i])
        return y
Defining a neural network structure and propagating it
n contains the structure, which is a 3-layer network with 2 nodes, 2 nodes and 3 nodes respectively.
n = [[0] * 2, [0] * 2, [0] * 3]
bot = nn(build(n))
print(bot.prop([1] * 2))
When I do this I expect the code to output an array of three semi-random numbers like this:
[-0.55889818 0.62762604 0.59222784]
but instead I get an error from numpy saying this:
File "C:\Users\Black\git\Changbot\oper.py.py", line 78, in prop
y = np.matmul(y, self.w[wi])
TypeError: Object arrays are not currently supported
And the weirdest thing about this is that (as I said earlier) I only get this error when the first and the second layer have the same amount of nodes in them, but the output layer has a different amount. All the other times I get the expected output...
I have now again checked the values that are causing this error and I don't see any objects other than a list. It's the same when it's not bugging...
So I added this try-except statement:
try:
    y = np.matmul(np.array(y), self.w[wi])
except TypeError:
    print("y:{}\nself.w[wi]:{}".format(y, self.w[wi]))
It then outputs this:
y:[1.6888437]
self.w[wi]:[array([-0.19013173])]
Which should have the ability to be multiplied with each other
I have even tried copy pasting the values into an interpreter and multiplying them there, and it works there...
NOTE: THIS IS A VERY BAD TEST AS THE COPY PASTE ARRAYS DOESN'T HAVE THE SAME DTYPES AS THE ACTUAL ARRAYS
np.matmul([1.6888437], [np.array([-0.19013173])])
Output for the above:
[-0.32110277]
After looking at the answers
Okay. I have now found out that the object dtype arrays lie in the structure of the neural network, by doing this at the end of the script:
print("STRUCTURE:{}".format(n))
It then outputs this:
STRUCTURE:(array([array([0.6888437]), array([ 0.51590881, -0.15885684]),
array([-0.4821665 , 0.02254944, -0.19013173])], dtype=object), array([list([array([ 0.56759718, -0.39337455])]),
list([array([-0.04680609, 0.16676408, 0.81622577]), array([ 0.00937371, -0.43632431, 0.51160841])])],
dtype=object))
Solving the bug
I understand from one of the answers to this post that np.array() tries to create as high a dimensional array as it can, and failing that falls back on object dtype (or for some combinations of inputs raises an error).
The object dtype gets created in the build() function, so I tried to remove all np.array() calls in it. Actually I removed all of them from the whole script. And guess what? It worked! Thanks a 1000 times to you contributors!
Btw Happy New Year
Regarding your copy-paste testing:
In [55]: np.matmul([1.6888437], [np.array([-0.19013173])])
Out[55]: array([-0.32110277])
But this isn't what your code is using. Instead we have to make arrays that match in dtype.
In [59]: x = np.array([1.6888437]); y = np.array([np.array([-0.19013173]),None])[:1]
In [60]: x
Out[60]: array([1.6888437])
In [61]: y
Out[61]: array([array([-0.19013173])], dtype=object)
I used the None funny business to force it to create an object dtype containing an array, which will print as [array([-0.19013173])].
Now I get your error:
In [62]: np.matmul(x,y)
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-62-b6212b061655> in <module>()
----> 1 np.matmul(x,y)
TypeError: Object arrays are not currently supported
Even if it did work, as it does with dot
In [66]: np.dot(x,y)
Out[66]: array([-0.32110277])
the calculations with object dtype arrays are slower.
I won't try to figure out why you have an object dtype array at this point. But I think you should avoid those in code where speed matters.
If you construct an array from arrays or lists that differ in size, the result is likely to be object dtype with a lower number of dimensions. np.array tries to create as high a dimensional array as it can, and failing that falls back on object dtype (or for some combinations of inputs raises an error).
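One way to avoid the object dtype altogether is to keep each layer's parameters in its own regular float array and to store those arrays in a plain Python list, instead of wrapping layers of different sizes in one np.array. The sketch below is an illustration only: layer_sizes, biases, weights and prop are names chosen here, and the activation functions from the question are left out.

```python
import numpy as np

rng = np.random.default_rng(0)
layer_sizes = [2, 2, 3]   # same structure as in the question

# One 1-D float array of biases per layer, kept in a Python list.
biases = [rng.uniform(-1, 1, size=n) for n in layer_sizes]
# One 2-D float array of weights per pair of consecutive layers.
weights = [rng.uniform(-1, 1, size=(n_in, n_out))
           for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]


def prop(x):
    """Forward pass: input biases first, then matmul plus bias for each later layer."""
    y = np.asarray(x, dtype=float) + biases[0]
    for w, b in zip(weights, biases[1:]):
        y = y @ w + b
    return y


out = prop([1, 1])
print(out.shape, out.dtype)          # (3,) float64
print([w.dtype for w in weights])    # every weight matrix is float64, none is object
```

Because every array passed to the matrix product is a regular float array, the multiplication never sees an object dtype, whatever the combination of layer sizes.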

normalize vector with respect to the infinity norm python 3

This is the code I'm trying to write. I'm new to coding, so I'm sure I'm way off; any help would be great. Thank you in advance.
Write a function normalize(vector) which takes in a vector and returns the normalized vector with respect to the infinity norm. i.e. (1/infNorm(vector)) * vector.
def normalize(vector):
    infNorm(vector) = abs(vector[0])
    for i in vector:
        if abs(i) > norm:
            infNorm(vector) = abs(i)
    finalvector = (1/infNorm(vector)) * vector
    return finalvector
vector = [2, 5, 7]
print(normalize(vector))
You are confusing function call parameters using () with sequence indices []. By sequence, I mean a Python sequence, which includes things like tuples and lists. Here, you're using a list as a vector. (You could also use tuples, but only if you don't plan to modify them. So we'll stick with lists, for generality and simplicity.)
Also, you need two loops: one to find the norm, and one to apply it.
def infnorm(vector):
    norm = 0
    for i in range(len(vector)):
        if abs(vector[i]) > norm:
            norm = abs(vector[i])  # keep the magnitude, not the signed value
    return norm
def normalize(vector):
    norm = infnorm(vector)
    return [v/norm for v in vector]
vector = [2, 5, 7]
print(normalize(vector))
Results:
[0.2857142857142857, 0.7142857142857143, 1.0]
Note that I didn't take the absolute value of each element before normalizing it. I'm no vector wizard, so that might be wrong, but I'm guessing that the normalized vector can have negative values.
The last tricky bit, the return value for normalize(vector), is called a "list comprehension". It's a nifty python trick to build a list using a formula. They look odd at first, but with a little practice it gets easy and they're quite precise and clear. Check it out.
If you are going to use a for loop to find the maximum value of an array in python, I'd suggest splitting the normalize function in two functions, one to get the infinity norm and another one to calculate the vector, as such:
def infNorm(vector):
    norm = vector[0]
    for element in vector:
        if norm < abs(element):
            norm = abs(element)
    return norm
def normalize(vector):
    norm = infNorm(vector)
    new_vector = []
    for element in vector:
        new_vector.append((1.0/norm)*element)
    return new_vector
Otherwise, you could use the max() built-in function from python, with such function, the code would look like this:
def normalize(vector):
    norm = abs(max(vector, key=abs))
    new_vector = []
    for element in vector:
        new_vector.append((1.0/norm)*element)
    return new_vector
By the way, when you have a symbol followed by parentheses, you are trying to invoke a function. So, when you do infNorm(vector) = abs(vector[0]), you are trying to assign a value to a function call, which will result in a syntax error. The correct way would be just infNorm = abs(vector[0]).
The infinity norm is the largest of the absolute values of the elements (the sum of the absolute values is the 1-norm). For instance, here is what sagemath offers for one vector, for the infinity norm, the 2-norm and the 1-norm.
In general, to normalise a vector according to a norm you divide each of its elements by its length in that norm.
Then this can be expressed in Python in this way:
>>> vec = [-2, 5, 3]
>>> inf_norm = max(abs(v) for v in vec)
>>> inf_norm
5
>>> normalised_vec = [v/inf_norm for v in vec]
>>> normalised_vec
[-0.4, 1.0, 0.6]
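Putting the pieces of this thread together: the infinity norm of a vector is its largest absolute entry, and normalizing divides every entry by that value. The following is a minimal sketch using plain lists; the zero-vector check is an addition made here and is not part of the original exercise.

```python
def inf_norm(vector):
    """Infinity norm: the largest absolute value among the entries."""
    return max(abs(v) for v in vector)


def normalize(vector):
    """Scale the vector so that its largest absolute entry becomes 1."""
    norm = inf_norm(vector)
    if norm == 0:
        # Added safeguard (assumption): the zero vector has no direction to preserve.
        raise ValueError("cannot normalize the zero vector")
    return [v / norm for v in vector]


print(normalize([2, 5, 7]))    # [0.2857142857142857, 0.7142857142857143, 1.0]
print(normalize([-2, 5, 3]))   # [-0.4, 1.0, 0.6]
```

Signs are preserved, so a normalized vector can contain negative entries; only the largest magnitude is guaranteed to be exactly 1.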

matrices are not aligned Error: Python SciPy fmin_bfgs

Problem Synopsis:
When attempting to use the scipy.optimize.fmin_bfgs minimization (optimization) function, the function throws a
derphi0 = np.dot(gfk, pk)
ValueError: matrices are not aligned
error. According to my error checking this occurs at the very end of the first iteration through fmin_bfgs--just before any values are returned or any calls to callback.
Configuration:
Windows Vista
Python 3.2.2
SciPy 0.10
IDE = Eclipse with PyDev
Detailed Description:
I am using the scipy.optimize.fmin_bfgs to minimize the cost of a simple logistic regression implementation (converting from Octave to Python/SciPy). Basically, the cost function is named cost_arr function and the gradient descent is in gradient_descent_arr function.
I have manually tested and fully verified that cost_arr and gradient_descent_arr work properly and return all values properly. I also tested to verify that the proper parameters are passed to the fmin_bfgs function. Nevertheless, when run, I get the ValueError: matrices are not aligned. According to the source review, the exact error occurs in the
def line_search_wolfe1
function in # Minpack's Wolfe line and scalar searches as supplied by the scipy packages.
Notably, if I use scipy.optimize.fmin instead, the fmin function runs to completion.
Exact Error:
File "D:\Users\Shannon\Programming\Eclipse\workspace\SBML\sbml\LogisticRegression.py", line 395, in fminunc_opt
    optcost = scipy.optimize.fmin_bfgs(self.cost_arr, initialtheta, fprime=self.gradient_descent_arr, args=myargs, maxiter=maxnumit, callback=self.callback_fmin_bfgs, retall=True)
File "C:\Python32x32\lib\site-packages\scipy\optimize\optimize.py", line 533, in fmin_bfgs
    old_fval,old_old_fval)
File "C:\Python32x32\lib\site-packages\scipy\optimize\linesearch.py", line 76, in line_search_wolfe1
    derphi0 = np.dot(gfk, pk)
ValueError: matrices are not aligned
I call the optimization function with:
optcost = scipy.optimize.fmin_bfgs(self.cost_arr, initialtheta, fprime=self.gradient_descent_arr, args=myargs, maxiter=maxnumit, callback=self.callback_fmin_bfgs, retall=True)
I have spent a few days trying to fix this and cannot seem to determine what is causing the matrices are not aligned error.
ADDENDUM: 2012-01-08
I worked with this a lot more and seem to have narrowed the issues (but am baffled on how to fix them). First, fmin (using just fmin) works using these functions (cost, gradient). Second, the cost and the gradient functions both accurately return expected values when tested in a single iteration in a manual implementation (NOT using fmin_bfgs). Third, I added error code to optimize.linesearch and the error seems to be thrown at def line_search_wolfe1 in line: derphi0 = np.dot(gfk, pk).
Here, according to my tests:
scipy.optimize.optimize pk = [[ 12.00921659] [ 11.26284221]], pk type =
scipy.optimize.optimize gfk = [[-12.00921659] [-11.26284221]], gfk type =
Note: according to my tests, the error is thrown on the very first iteration through fmin_bfgs (i.e., fmin_bfgs never even completes a single iteration or update).
I appreciate ANY guidance or insights.
My Code Below (logging, documentation removed):
Assume theta = 2x1 ndarray (Actual: theta Info Size=(2, 1) Type = )
Assume X = 100x2 ndarray (Actual: X Info Size=(2, 100) Type = )
Assume y = 100x1 ndarray (Actual: y Info Size=(100, 1) Type = )
def cost_arr(self, theta, X, y):
    theta = scipy.resize(theta,(2,1))
    m = scipy.shape(X)
    m = 1 / m[1] # Use m[1] because this is the length of X
    logging.info(__name__ + "cost_arr reports m = " + str(m))
    z = scipy.dot(theta.T, X) # Must transpose the vector theta
    hypthetax = self.sigmoid(z)
    yones = scipy.ones(scipy.shape(y))
    hypthetaxones = scipy.ones(scipy.shape(hypthetax))
    costright = scipy.dot((yones - y).T, ((scipy.log(hypthetaxones - hypthetax)).T))
    costleft = scipy.dot((-1 * y).T, ((scipy.log(hypthetax)).T))
def gradient_descent_arr(self, theta, X, y):
    theta = scipy.resize(theta,(2,1))
    m = scipy.shape(X)
    m = 1 / m[1] # Use m[1] because this is the length of X
    x = scipy.dot(theta.T, X) # Must transpose the vector theta
    sig = self.sigmoid(x)
    sig = sig.T - y
    grad = scipy.dot(X,sig)
    grad = m * grad
    return grad
def fminunc_opt_bfgs(self, initialtheta, X, y, maxnumit):
    myargs= (X,y)
    optcost = scipy.optimize.fmin_bfgs(self.cost_arr, initialtheta, fprime=self.gradient_descent_arr, args=myargs, maxiter=maxnumit, retall=True, full_output=True)
    return optcost
In case anyone else encounters this problem ....
1) ERROR 1: As noted in the comments, I incorrectly returned the value from my gradient as a multidimensional array (m,n) or (m,1). fmin_bfgs seems to require a 1d array output from the gradient (that is, you must return a (m,) array and NOT a (m,1) array). Use scipy.shape(myarray) to check the dimensions if you are unsure of the return value.
The fix involved adding:
grad = numpy.ndarray.flatten(grad)
just before returning the gradient from your gradient function. This "flattens" the array from (m,1) to (m,). fmin_bfgs can take this as input.
2) ERROR 2: Remember, the fmin_bfgs seems to work with NONlinear functions. In my case, the sample that I was initially working with was a LINEAR function. This appears to explain some of the anomalous results even after the flatten fix mentioned above. For LINEAR functions, fmin, rather than fmin_bfgs, may work better.
QED
With current SciPy versions you do not need to pass the fprime argument: if it is omitted, the gradient is approximated numerically by finite differences. You can also use scipy.optimize.minimize with method='BFGS', again without providing the gradient as an argument.
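The shape requirement discussed in this thread can be illustrated with a small logistic regression fitted through scipy.optimize.minimize. This is a sketch under assumptions, not the asker's code: the data is synthetic, the function names cost and gradient are chosen here, and X is laid out as (samples, features), so theta stays a 1-D array throughout and the gradient comes back with shape (n,).

```python
import numpy as np
from scipy.optimize import minimize


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def cost(theta, X, y):
    """Mean logistic loss; theta has shape (n,), X (m, n), y (m,)."""
    h = sigmoid(X @ theta)
    eps = 1e-12                      # guards against log(0)
    return -np.mean(y * np.log(h + eps) + (1 - y) * np.log(1 - h + eps))


def gradient(theta, X, y):
    """Gradient of the mean logistic loss, returned as a 1-D array of shape (n,)."""
    h = sigmoid(X @ theta)
    return (X.T @ (h - y)) / y.size


# Synthetic data (assumption): an intercept column plus one noisy feature.
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(100), rng.normal(size=100)])
y = (X[:, 1] + 0.5 * rng.normal(size=100) > 0).astype(float)

theta0 = np.zeros(2)
print(gradient(theta0, X, y).shape)  # (2,), not (2, 1)

result = minimize(cost, theta0, args=(X, y), jac=gradient, method="BFGS")
print(result.x, result.fun)
```

Dropping the jac argument makes minimize fall back on a finite-difference approximation of the gradient, which is the behaviour described in the last answer.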
