Periodically log gradients without requiring two functions (or slowdown) in Theano - theano

For diagnostic purposes, I am grabbing the gradients of the network periodically. One way to do this is to return the gradients as output of the theano function. However, copying the gradients from the GPU to CPU memory every time may be costly so I would prefer to do it only periodically. At the moment, I am achieving this by creating two function objects, one which returns the gradient and one which doesn't.
However, I do not know whether this is optimal and am looking for a more elegant way to achieve the same thing.

Your first function obviously executes a training step and updates all your parameters.
The second function must return the gradients of your parameters.
The fastest way to do what you are asking is to add the training-step updates to the second function as well. On the iterations where you log the gradients, call only the second function; on all other iterations, call only the first.
gradients = [ ... ]
train_f = theano.function([x, y], [], updates=updates)
train_grad_f = theano.function([x, y], gradients, updates=updates)
num_iters = 1000
grad_array = []
for i in range(num_iters):
    # every 10 training steps keep log of gradients
    if i % 10 == 0:
        grad_array.append(train_grad_f(...))
    else:
        train_f(...)
Update
If you wish to have a single function to do this, you can do the following:
import theano
import theano.tensor as T
from theano.ifelse import ifelse
no_grad = T.iscalar('no_grad')
example_gradient = T.grad(example_cost, example_variable)
# if no_grad is > 0 then return the gradient, otherwise return zeros array
out_grad = ifelse(T.gt(no_grad,0), example_gradient, T.zeros_like(example_variable))
train_f = theano.function([x, y, no_grad], [out_grad], updates=updates)
So when you want to retrieve the gradients you call
train_f(x_data, y_data, 1)
otherwise
train_f(x_data, y_data, 0)

Related

How to improve this toy Jax optimizer code with while loops and saved history?

I'm writing a custom optimizer I want JIT-able with Jax which features 1) breaking on maximum steps reached 2) breaking on a tolerance reached, and 3) saving the history of the steps taken. I'm relatively new to some of this stuff in Jax, but reading the docs I have this solution:
import jax, jax.numpy as jnp
#jax.jit
def optimizer(x, tol = 1, max_steps = 5):
    def cond(arg):
        step, x, history = arg
        return (step < max_steps) & (x > tol)
    def body(arg):
        step, x, history = arg
        x = x / 2 # simulate taking an optimizer step
        history = history.at[step].set(x) # simulate saving current step
        return (step + 1, x, history)
    return jax.lax.while_loop(
        cond,
        body,
        (0, x, jnp.full(max_steps, jnp.nan))
    )
optimizer(10.) # works
My question is whether this can be improved in some way. In particular, is there a way to avoid pre-allocating the history? This isn't ideal since the real thing is a lot more complicated than a single array, and there's obviously the potential for wasted memory if the tolerance is reached well before the maximum number of steps.
is there a way to avoid pre-allocating the history?
No, as far as I understand JAX.
In JAX, the type of an array includes its shape, so the value returned by the body function of while_loop MUST have the same shape as its input. If you grow the history dynamically, say with jnp.vstack((history, x)), the shape changes on every iteration and JAX rejects the loop.
There is a way, if you think the tolerance will be often reached before the maximum number of steps.
JAX implements sparse matrices (and pytrees of them), in jax.experimental.sparse. They will have the same shape as the maximum history size, and therefore satisfy the "fixed size" requirement for XLA, but of course, will only store nonzero elements in memory.
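If the pre-allocated history is kept, the unused tail can at least be dropped once the loop has finished. The following is only a minimal sketch of that idea: it reuses the toy optimizer from the question, marks max_steps as static via static_argnames (an assumption, the question does not say how the function is jitted), and trims the buffer outside the jitted function using the returned step count.

```python
from functools import partial

import jax
import jax.numpy as jnp


# max_steps is declared static (an assumption) because it sets the
# shape of the pre-allocated history buffer.
@partial(jax.jit, static_argnames="max_steps")
def optimizer(x, tol=1.0, max_steps=5):
    def cond(carry):
        step, x, _ = carry
        return (step < max_steps) & (x > tol)

    def body(carry):
        step, x, history = carry
        x = x / 2  # stand-in for a real optimizer step
        history = history.at[step].set(x)
        return step + 1, x, history

    init = (0, x, jnp.full(max_steps, jnp.nan))
    return jax.lax.while_loop(cond, body, init)


steps, x_final, history = optimizer(10.0)

# Outside of jit the step count is a concrete value, so ordinary
# slicing can discard the slots that were never written.
used_history = history[: int(steps)]
```

The buffer inside the loop still has the full max_steps length; only the result handed back to the caller is shortened.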

Do gradient descent on function with no input using pytorch

What's the correct way to do gradient descent on an arbitrary function with no input using Pytorch?
x = torch.tensor(x_init, requires_grad=True)
opt = torch.optim.Adam([x])
cost_fnx = cost(x)
for iteration_count in range(100):
    opt.zero_grad()
    cost_fnx.backward()
    opt.step()
When I tried the above, I got this error:
RuntimeError: Trying to backward through the graph a second time (or directly access saved variables after they have already been freed). Saved intermediate values of the graph are freed when you call .backward() or autograd.grad(). Specify retain_graph=True if you need to backward through the graph a second time or if you need to access saved variables after calling backward.
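The error occurs because you are trying to backpropagate through the same graph multiple times: cost_fnx is computed once, before the loop, and the first call to .backward() frees the graph's saved intermediate values. You most likely need to recompute the cost value inside the loop (the cost function only takes the parameters as input) so that every iteration backpropagates through a fresh graph. Something like: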
x = x_init.requires_grad_(True)
opt = torch.optim.Adam([x])
for iteration_count in range(2):
    cost_fnx = cost(x)
    opt.zero_grad()
    cost_fnx.backward()
    opt.step()
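For a complete picture, here is a small self-contained sketch of the same pattern. The cost function, the starting point and the learning rate are made up for illustration; the only point is that the cost is re-evaluated in every iteration, so each backward() call sees a fresh graph.

```python
import torch


def cost(x):
    # Hypothetical cost: squared distance from the point (3, -1).
    target = torch.tensor([3.0, -1.0])
    return ((x - target) ** 2).sum()


x = torch.zeros(2, requires_grad=True)
opt = torch.optim.Adam([x], lr=0.1)  # lr chosen arbitrarily for the example

initial_loss = cost(x).item()

for iteration_count in range(500):
    opt.zero_grad()
    loss = cost(x)   # rebuild the graph on every iteration
    loss.backward()  # backpropagate through the new graph
    opt.step()

final_loss = cost(x).item()
```

Passing retain_graph=True to backward() would also silence the error, but the cost would then never reflect the updated x, so recomputing it is usually what is wanted.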

How to fix 'TypeError: Object arrays are not currently supported' error in numpy python 3 (matrix multiplication)

I'm trying to make my own neural network "library" (if you can call it that) for myself to use, since I am hobby-learning about them.
I wrote this code that makes a propagatable neural network by feeding it a structure of the desired network, and it worked pretty well.
But then when I tried giving the model a different number of nodes, the code broke.
I've already tried to edit the number of nodes in each layer to see where that takes me, and I've found out that I only get this error when the first and the second layer have the same number of nodes, but the output layer has a different number. I've also tried to do the matrix multiplication of the structure that triggers the bug on paper, and it gave me an actual result (which I've double-checked many times). So now I know that the problem is practical and not theoretical.
There's clearly something wrong with the matrix multiplication, I think.
The script's functions
I had to include these functions in the question, so you can get a better insight into how this code works.
is_iterable()
This function returns a boolean value that describes if the input is iterable
def is_iterable(x):
    try:
        x[0]
        return True
    except:
        return False
blueprint()
This function returns a copy of the input array but changes the elements that aren't iterable to 0's
def blueprint(x):
    return [blueprint(e) if is_iterable(e) else 0 for e in x]
build()
This function takes a model of your desired neural network structure as input, and outputs suitable randomized biases and weights separated into two different arrays.
The 'randomize()' function returns a copy of the input array but changes the elements that aren't iterable to random floats between -1 and 1.
The 'build_weighs()' function returns randomized weights based on a model of a neural network.
def build(x):
    def randomize(x):
        return np.array([randomize(n) if type(n) is list else random.uniform(-1, 1) for n in x])
    def build_weighs(x):
        y = []
        for i, l in enumerate(x):
            if i == len(x) - 1:
                break
            y.append([randomize(x[i + 1]) for n in l])
        return np.array(y)
    return (randomize(x), build_weighs(x))
apply_funcs()
This function applies a list of functions to a list of values, element by element, and then returns the results. If the function list contains a 0, the value at the same position in the other list is passed through without any function being applied to it.
def apply_funcs(x, f):
    y = x
    i = 0
    for xj, fj in zip(x, f):
        if fj == 0:
            y[i] = xj
        else:
            y[i] = fj(xj)
        i += 1
    return y
nn()
This is the class for making a neural network.
You can see that it has a function named, 'prop' for the forward propagation of the network.
class nn:
    def __init__(self, structure, a_funcs=None):
        self.structure = structure
        self.b = np.array(structure[0])
        self.w = np.array(structure[1])
        if a_funcs is None:
            a_funcs = blueprint(self.b)
        self.a_funcs = np.array(a_funcs)
    def prop(self, x):
        y = np.array(x)
        if y.shape != self.b[0].shape:
            raise ValueError("The input needs to be intact with the Input Nodes\nInput: {} != Input Nodes: {}".format(blueprint(y), blueprint(self.b[0])))
        wi = 0
        # A loop through the layers of the neural network
        for i in range(len(self.b)):
            # This if statement is here so that the weights get applied in the right order
            if i != 0:
                y = np.matmul(y, self.w[wi])
                wi += 1
            # Applying the biases of layer i to the current information
            y = np.add(y, self.b[i])
            # Applying the activation functions to the current information
            y = apply_funcs(y, self.a_funcs[i])
        return y
Defining a neural network structure and propagating it
n contains the structure, which is a 3-layer network with 2 nodes, 2 nodes and 3 nodes respectively.
n = [[0] * 2, [0] * 2, [0] * 3]
bot = nn(build(n))
print(bot.prop([1] * 2))
When I do this I expect the code to output an array of three semi-random numbers like this:
[-0.55889818 0.62762604 0.59222784]
but instead I get an error from numpy saying this:
File "C:\Users\Black\git\Changbot\oper.py.py", line 78, in prop
y = np.matmul(y, self.w[wi])
TypeError: Object arrays are not currently supported
And the weirdest thing about this is that (as I said earlier) I only get this error when the first and the second layer have the same amount of nodes in them, but the output layer has a different amount. All the other times I get the expected output...
I have now again checked the values that are causing this error and I don't see any objects other than a list. It's the same when it's not bugging...
So I added this try-except statement:
try:
    y = np.matmul(np.array(y), self.w[wi])
except TypeError:
    print("y:{}\nself.w[wi]:{}".format(y, self.w[wi]))
It then outputs this:
y:[1.6888437]
self.w[wi]:[array([-0.19013173])]
Which should have the ability to be multiplied with each other
I have even tried copy pasting the values into an interpreter and multiplying them there, and it works there...
NOTE: THIS IS A VERY BAD TEST AS THE COPY PASTE ARRAYS DOESN'T HAVE THE SAME DTYPES AS THE ACTUAL ARRAYS
np.matmul([1.6888437], [np.array([-0.19013173])])
Output for the above:
[-0.32110277]
After looking at the answers
Okay. I have now found out that the object dtype arrays lie in the structure of the neural network, by doing this at the end of the script:
print("STRUCTURE:{}".format(n))
It then outputs this:
STRUCTURE:(array([array([0.6888437]), array([ 0.51590881, -0.15885684]),
array([-0.4821665 , 0.02254944, -0.19013173])], dtype=object), array([list([array([ 0.56759718, -0.39337455])]),
list([array([-0.04680609, 0.16676408, 0.81622577]), array([ 0.00937371, -0.43632431, 0.51160841])])],
dtype=object))
Solving the bug
I understand from one of the answers to this post that np.array() tries to create as high a dimensional array as it can, and failing that falls back on object dtype (or for some combinations of inputs raises an error).
The object dtype gets created in the build() function, so I tried to remove all np.array() calls in it. Actually, I removed all of them from the whole script. And guess what? It worked! Thanks a thousand times to you contributors!
Btw Happy New Year
Regarding your copy-paste testing:
In [55]: np.matmul([1.6888437], [np.array([-0.19013173])])
Out[55]: array([-0.32110277])
But this isn't what your code is using. Instead we have to make arrays that match in dtype.
In [59]: x = np.array([1.6888437]); y = np.array([np.array([-0.19013173]),None])[:1]
In [60]: x
Out[60]: array([1.6888437])
In [61]: y
Out[61]: array([array([-0.19013173])], dtype=object)
I used the None funny business to force it to create an object dtype containing an array, which will print as [array([-0.19013173])].
Now I get your error:
In [62]: np.matmul(x,y)
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-62-b6212b061655> in <module>()
----> 1 np.matmul(x,y)
TypeError: Object arrays are not currently supported
Even if it did work, as it does with dot
In [66]: np.dot(x,y)
Out[66]: array([-0.32110277])
the calculations with object dtype arrays are slower.
I won't try to figure out why you have an object dtype array at this point. But I think you should avoid those in code where speed matters.
If you construct an array from arrays or lists that differ in size, the result is likely to be object dtype with a lower number of dimensions. np.array tries to create as high a dimensional array as it can, and failing that falls back on object dtype (or for some combinations of inputs raises an error).
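One way to stay clear of object dtype is to keep one float array per layer in a plain Python list and never wrap the whole ragged structure in np.array. The sketch below is only an illustration under assumptions: the layer sizes, the seeded random generator and the simplified prop function (no activation functions) are not taken from the question's script.

```python
import numpy as np

rng = np.random.default_rng(0)
layer_sizes = [2, 2, 3]  # hypothetical structure: 2, 2 and 3 nodes

# One 1-D float array of biases per layer, stored in a Python list.
biases = [rng.uniform(-1, 1, size=n) for n in layer_sizes]

# One 2-D float array of weights per pair of consecutive layers.
weights = [
    rng.uniform(-1, 1, size=(n_in, n_out))
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])
]


def prop(x):
    # Bias of the input layer first, then matmul plus bias per layer.
    y = np.asarray(x, dtype=float) + biases[0]
    for w, b in zip(weights, biases[1:]):
        y = np.matmul(y, w) + b
    return y


out = prop([1, 1])

# Every array that reaches np.matmul has a numeric dtype.
dtypes = {a.dtype for a in biases + weights}
```

Because each element of the lists is a regular float64 array, the layers may have any combination of sizes without np.matmul ever seeing an object array.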

Running out of memory during evaluation in Pytorch

I'm training a model in PyTorch. Every 10 epochs, I'm evaluating the train and test error on the entire train and test dataset. For some reason the evaluation function is causing out-of-memory errors on my GPU. This is strange because I have the same batch size for training and evaluation. I believe it's due to the net.forward() method being called repeatedly and having all the hidden values stored in memory, but I'm not sure how to get around this.
def evaluate(self, data):
    correct = 0
    total = 0
    loader = self.train_loader if data == "train" else self.test_loader
    for step, (story, question, answer) in enumerate(loader):
        story = Variable(story)
        question = Variable(question)
        answer = Variable(answer)
        _, answer = torch.max(answer, 1)
        if self.config.cuda:
            story = story.cuda()
            question = question.cuda()
            answer = answer.cuda()
        pred_prob = self.mem_n2n(story, question)[0]
        _, output_max_index = torch.max(pred_prob, 1)
        toadd = (answer == output_max_index).float().sum().data[0]
        correct = correct + toadd
        total = total + captions.size(0)
    acc = correct / total
    return acc
I think it fails during Validation because you don't use optimizer.zero_grad(). The zero_grad executes detach, making the tensor a leaf. It is commonly used every epoch in the training part.
The volatile flag on Variable was removed in PyTorch 0.4.0.
Ref - migration_guide_to_0.4.0
Starting from 0.4.0, to avoid the gradient being computed during validation, use torch.no_grad()
Code example from the migration guide.
# evaluate
with torch.no_grad(): # operations inside don't track history
    for input, target in test_loader:
        ...
For 0.3.X, using volatile should work.
I would suggest setting the volatile flag to True for all variables used during the evaluation:
story = Variable(story, volatile=True)
question = Variable(question, volatile=True)
answer = Variable(answer, volatile=True)
Thus, the gradients and operation history are not stored and you will save a lot of memory.
Also, you could delete references to those variables at the end of the batch processing:
del story, question, answer, pred_prob
Don't forget to set the model to the evaluation mode (and back to the train mode after you finished the evaluation). For instance, like this
model.eval()
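Putting these pieces together for PyTorch 0.4.0 and later, an evaluation loop might look roughly like the sketch below. The model, the data and the helper name evaluate are placeholders invented for the example, not the memory network from the question.

```python
import torch
from torch import nn


def evaluate(model, loader):
    """Return accuracy without storing any autograd history."""
    was_training = model.training
    model.eval()  # switch layers such as dropout to evaluation behaviour
    correct, total = 0, 0
    with torch.no_grad():  # no graph is recorded inside this block
        for inputs, targets in loader:
            preds = model(inputs).argmax(dim=1)
            # .item() turns the count into a plain Python number
            correct += (preds == targets).sum().item()
            total += targets.size(0)
    model.train(was_training)  # restore the previous mode
    return correct / total


# Placeholder model and data, only there to make the sketch runnable.
model = nn.Linear(4, 3)
loader = [(torch.randn(8, 4), torch.randint(0, 3, (8,))) for _ in range(5)]
acc = evaluate(model, loader)
```

Accumulating plain Python numbers rather than tensors also avoids keeping references to GPU memory between batches.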

How to add a confusion matrix to Theano examples?

I want to make use of Theano's logistic regression classifier, but I would like to make an apples-to-apples comparison with previous studies I've done to see how deep learning stacks up. I recognize this is probably a fairly simple task if I was more proficient in Theano, but this is what I have so far. From the tutorials on the website, I have the following code:
def errors(self, y):
    # check if y has same dimension of y_pred
    if y.ndim != self.y_pred.ndim:
        raise TypeError(
            'y should have the same shape as self.y_pred',
            ('y', y.type, 'y_pred', self.y_pred.type)
        )
    # check if y is of the correct datatype
    if y.dtype.startswith('int'):
        # the T.neq operator returns a vector of 0s and 1s, where 1
        # represents a mistake in prediction
        return T.mean(T.neq(self.y_pred, y))
I'm pretty sure this is where I need to add the functionality, but I'm not certain how to go about it. What I need is either access to y_pred and y for each and every run (to update my confusion matrix in python) or to have the C++ code handle the confusion matrix and return it at some point along the way. I don't think I can do the former, and I'm unsure how to do the latter. I've done some messing around with an update function along the lines of:
def confuMat(self, y):
    x=T.vector('x')
    classes = T.scalar('n_classes')
    onehot = T.eq(x.dimshuffle(0,'x'),T.arange(classes).dimshuffle('x',0))
    oneHot = theano.function([x,classes],onehot)
    yMat = T.matrix('y')
    yPredMat = T.matrix('y_pred')
    confMat = T.dot(yMat.T,yPredMat)
    confusionMatrix = theano.function(inputs=[yMat,yPredMat],outputs=confMat)
    def confusion_matrix(x,y,n_class):
        return confusionMatrix(oneHot(x,n_class),oneHot(y,n_class))
    t = np.asarray(confusion_matrix(y,self.y_pred,self.n_out))
    print (t)
But I'm not completely clear on how to get this to interface with the function in question and give me a numpy array I can work with.
I'm quite new to Theano, so hopefully this is an easy fix for one of you. I'd like to use this classifier as my output layer in a number of configurations, so I could use the confusion matrix with other architectures.
I suggest a brute-force approach. You need an output for the predictions first, so create a function for it.
prediction = theano.function(
    inputs = [index],
    outputs = MLPlayers.predicts,
    givens={
        x: test_set_x[index * batch_size: (index + 1) * batch_size]})
In your test loop, gather the predictions...
labels = labels + test_set_y.eval().tolist()
for mini_batch in xrange(n_test_batches):
    wrong = wrong + int(test_model(mini_batch))
    predictions = predictions + prediction(mini_batch).tolist()
Now create confusion matrix this way:
correct = 0
confusion = numpy.zeros((outs,outs), dtype = int)
for index in xrange(len(predictions)):
    # compare values with ==, not identity with "is"
    if labels[index] == predictions[index]:
        correct = correct + 1
    confusion[int(predictions[index]),int(labels[index])] = confusion[int(predictions[index]),int(labels[index])] + 1
You can find this kind of an implementation in this repository.
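Once the labels and predictions have been gathered as plain lists, the counting loop can also be written in vectorized form with NumPy. This is a small sketch, not part of the Theano tutorial code: the function name and the sample data are made up, and it keeps the same orientation as the loop above (rows are predictions, columns are true labels).

```python
import numpy as np


def confusion_matrix(labels, predictions, n_classes):
    confusion = np.zeros((n_classes, n_classes), dtype=int)
    rows = np.asarray(predictions, dtype=int)  # predicted class
    cols = np.asarray(labels, dtype=int)       # true class
    # np.add.at accumulates correctly when an index pair repeats.
    np.add.at(confusion, (rows, cols), 1)
    return confusion


# Made-up example data with three classes.
labels = [0, 1, 2, 2, 1]
predictions = [0, 2, 2, 2, 1]

confusion = confusion_matrix(labels, predictions, 3)
accuracy = np.trace(confusion) / confusion.sum()
```

The diagonal holds the correctly classified samples, so the trace divided by the total count gives the accuracy.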

Resources