I am new to PyTorch and have been trying some examples with autograd to see if I understand it. I am confused about why the following code does not work:
import torch

def Loss(a):
    return a ** 2

a = torch.tensor(3.0, requires_grad=True)
L = Loss(a)
L.backward()
with torch.no_grad():
    a = a + 1.0
L = Loss(a)
L.backward()
print(a.grad)
Instead of outputting 8.0, we get "RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn".
There are two things to note regarding your code:
You are performing two backpropagations up to the leaf a, which means the gradients accumulate. In other words, you should get a gradient equal to da²/da + d(a+1)²/da, which is 2a + 2(a+1), i.e. 2(2a + 1). If a = 3, then a.grad will be equal to 14.
You are using a torch.no_grad context manager, which means you will be unable to perform backpropagation from any resulting tensor, i.e. here a itself.
Here is a snippet which yields the desired result, i.e. 14, the accumulation of both gradients:
>>> L = Loss(a)
>>> L.backward()
>>> a.grad
tensor(6.)
>>> L = Loss(a+1)
>>> L.backward()
>>> a.grad
tensor(14.) # as 6 + 8
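If you actually wanted 8.0 from the second pass rather than the accumulated 14, a minimal sketch (reusing the Loss above) is to zero the accumulated gradient between the two backward passes with a.grad.zero_():

import torch

def Loss(a):
    return a ** 2

a = torch.tensor(3.0, requires_grad=True)
Loss(a).backward()
a.grad.zero_()           # discard the first gradient (2a = 6)
Loss(a + 1).backward()   # d(a+1)²/da = 2(a+1) = 8 at a = 3
print(a.grad)            # tensor(8.)

Note that a + 1 is passed directly to Loss instead of reassigning a, so a stays a leaf tensor with requires_grad=True.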
Related
How to find the bigger of two PyTorch tensors based on size
>>> tensor1 = torch.empty(0)
>>> tensor2 = torch.empty(1)
>>> tensor1
tensor([])
>>> tensor2
tensor([5.9555e-34])
torch.maximum is returning the empty tensor as the biggest tensor
>>> torch.maximum(tensor1,tensor2)
tensor([])
Is there a way to find the biggest tensor among two tensors (mostly 1d), based on the number of elements in the tensor?
Why not compare their first dimension sizes? To do so you can use any of these equivalents: x.size(0), x.shape[0], and len(x). To return the tensor with the longest size, you can use the built-in max function with the key argument:
>>> max((tensor1, tensor2), key=len)
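Since the question asks about the number of elements rather than just the first dimension, a variant sketch using Tensor.numel() (which counts all elements) also handles multi-dimensional tensors:

>>> max((tensor1, tensor2), key=lambda t: t.numel())
tensor([5.9555e-34])

For 1d tensors this is equivalent to the len-based version above.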
Is it possible to run map_fn on a tensor with a single value?
The following works:
import tensorflow as tf
a = tf.constant(1.0, shape=[3])
tf.map_fn(lambda x: x+1, a)
#output: [2.0, 2.0, 2.0]
However this does not:
import tensorflow as tf
b = tf.constant(1.0)
tf.map_fn(lambda x: x+1, b)
#expected output: 2.0
Is it possible at all?
What am I doing wrong?
Any hints will be greatly appreciated!
Well, I see you accepted an answer, which correctly states that tf.map_fn() applies a function to the elements of a tensor, and a scalar tensor has no elements. But it is not impossible to do this for a scalar tensor; you just have to tf.reshape() it before and after, like this code (tested):
import tensorflow as tf

b = tf.constant(1.0)

if b.get_shape() == ():
    # Scalar: lift to shape (1,), map, then reshape back to a scalar.
    c = tf.reshape(tf.map_fn(lambda x: x + 1, tf.reshape(b, (1,))), ())
else:
    c = tf.map_fn(lambda x: x + 1, b)

# expected output: 2.0
with tf.Session() as sess:
    print(sess.run(c))
will output:
2.0
as desired.
This way you can factor the logic into an agnostic function that accepts both scalar and non-scalar tensors as its argument.
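As a minimal sketch (the wrapper name map_fn_any is hypothetical), that agnostic function could look like:

import tensorflow as tf

def map_fn_any(fn, t):
    # Scalars have no elements to map over, so lift them to shape (1,),
    # apply fn, and reshape the result back to a scalar.
    if t.get_shape() == ():
        return tf.reshape(tf.map_fn(fn, tf.reshape(t, (1,))), ())
    return tf.map_fn(fn, t)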
No, this is not possible. As you probably saw, it throws an error:
ValueError: elems must be a 1+ dimensional Tensor, not a scalar
The point of map_fn is to apply a function to each element of a tensor, so it makes no sense to use this for a scalar (single-element) tensor.
As to "what you are doing wrong": This is difficult to say without knowing what you're trying to achieve.
I am pre-processing a numpy array and want to feed it in as a TensorFlow Variable. I've tried following other Stack Exchange advice, but so far without success. I would like to see if I'm doing something uniquely wrong here.
import numpy as np
import tensorflow as tf

npW = np.zeros((784, 10))
npW[0, 0] = 20
W = tf.Variable(tf.convert_to_tensor(npW, dtype=tf.float32))

sess = tf.InteractiveSession()
tf.global_variables_initializer().run()

print("npsum", np.sum(npW))
print(tf.reduce_sum(W))
And this is the result.
npsum 20.0
Tensor("Sum:0", shape=(), dtype=float32)
I don't know why the reduced sum of the W variable remains zero. Am I missing something here?
You need to understand that TensorFlow differs from traditional computing. First, you declare a computational graph. Then, you run operations through the graph.
Taking your example, you have your numpy variables:
npW = np.zeros((784,10))
npW[0,0] = 20
Next, these instructions define TensorFlow nodes in the computational graph:
W = tf.Variable(tf.convert_to_tensor(npW, dtype = tf.float32))
sum = tf.reduce_sum(W)
And to be able to compute the operation, you need to run the op through the graph, with a session, i.e.:
sess = tf.InteractiveSession()
tf.global_variables_initializer().run()
result = sess.run(sum)
print(result) # print 20
Another way is to call eval() instead of sess.run():
print(sum.eval()) # print 20
So I tested it a bit differently and found out that the variable is getting assigned properly, but the reduce_sum function isn't working as expected. If anyone has an explanation for that, it would be much appreciated.
import numpy as np
import tensorflow as tf

npW = np.zeros((2, 2))
npW[0, 0] = 20
W = tf.Variable(npW, dtype=tf.float32)
A = tf.constant([[20, 0], [0, 0]])

sess = tf.InteractiveSession()
tf.global_variables_initializer().run()

print("npsum", np.sum(npW))
x = tf.reduce_sum(W, 0)
print(x)
print(tf.reduce_sum(A))
print(W.eval())
print(A.eval())
This had the output:
npsum 20.0
Tensor("Sum:0", shape=(2,), dtype=float32)
Tensor("Sum_1:0", shape=(), dtype=int32)
[[ 20.   0.]
 [  0.   0.]]
[[20  0]
 [ 0  0]]
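As the previous answer explains, reduce_sum is in fact working; the prints only show the symbolic Tensor objects. Evaluating them inside the InteractiveSession (a minimal sketch continuing the snippet above) returns the expected values:

print(x.eval())                 # [20.  0.] -- the column sums of W
print(tf.reduce_sum(A).eval())  # 20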
import tensorflow as tf

x = tf.constant(35, name='x')
y = tf.Variable(x + 5, name='y')

model = tf.initialize_all_variables()
with tf.Session() as session:
    session.run(model)
    print(session.run(y))
This code generates an error saying TypeError: List of tensors when single tensor expected.
What could be the problem?
System details: VirtualBox (Ubuntu 16.04 Xenial), TensorFlow 0.9.0, Python 3.5
It looks like you are missing a conceptual aspect of tensorflow.
First let me start with a code example
import tensorflow as tf

x = tf.constant(35, name='x')
y = tf.Variable(5, name='y')

add = tf.add(x, y)
update = tf.assign(y, add)

model = tf.initialize_all_variables()
with tf.Session() as session:
    session.run(model)
    print(session.run(y))
    print(session.run([add, y]))
    print(session.run([update, y]))
    print(session.run([update, y]))
This will print the following
5
[40, 5]
[40, 40]
[75, 75]
So what is going on? Firstly, x and y are not 35 and 5. They are TensorFlow objects that contain data and can interact with the TensorFlow graph. x is a constant and will provide the graph with the value 35 when requested, but it is not itself equal to 35. y is a variable that can be assigned a value and updated by TensorFlow while it is running.
In your example you set the value of y to be a variable with an initial value of x + 5, but x is not 35; x is a TensorFlow object.
In the example above, we assign the value 5 to the variable y. When we run the session and get the value of y, it is 5. When we get the value of add, it is 35 + 5, but y hasn't changed. When we execute update, we find that the value of y has been updated to 40. Finally, when we update again, we see that y has been incremented by 35 once more and is now 75.
I hope this explains the difference between classic Python variables and constants and TensorFlow variables and constants.
I don't understand why we need the tensor.reshape() function in Theano. The documentation says:
Returns a view of this tensor that has been reshaped as in
numpy.reshape.
As far as I understand, theano.tensor.var.TensorVariable is an entity used for building computation graphs, and it is absolutely independent of shapes. For instance, when you call your compiled function you can pass in a 2x2 matrix or a 100x200 matrix. I thought reshape somehow restricts this variety, but it does not. Consider the following example:
import numpy
import theano
import theano.tensor as tensor

X = tensor.matrix('X')
X_resh = X.reshape((3, 3))
Y = X_resh ** 2
f = theano.function([X_resh], Y)
print(f(numpy.array([[1, 2], [3, 4]])))
As I understood it, this should give an error since I passed a 2x2 matrix, not a 3x3 one, but it computes the element-wise squares perfectly.
So what is the shape of the theano tensor variable and where should we use it?
There is an error in the provided code, though Theano fails to point this out.
Instead of
f = theano.function([X_resh], Y)
you should really use
f = theano.function([X], Y)
Using the original code you are actually providing the tensor after the reshape, so the reshape command never gets executed. This can be seen by adding
theano.printing.debugprint(f)
which prints
Elemwise{sqr,no_inplace} [id A] '' 0
|<TensorType(float64, matrix)> [id B]
Note that there is no reshape operation in this compiled execution graph.
If one changes the code so that X is used as the input instead of X_resh, then Theano throws an error including the message
ValueError: total size of new array must be unchanged
Apply node that caused the error: Reshape{2}(X, TensorConstant{(2L,) of 3})
This is expected because one cannot reshape a tensor with shape (2, 2) (i.e. 4 elements) into a tensor with shape (3, 3) (i.e. 9 elements).
To address the broader question: we can use symbolic expressions in the target shape, and those expressions can be functions of the input tensor's symbolic shape. Here are some examples:
import numpy
import theano
import theano.tensor
X = theano.tensor.matrix('X')
X_vector = X.reshape((X.shape[0] * X.shape[1],))
X_row = X.reshape((1, X.shape[0] * X.shape[1]))
X_column = X.reshape((X.shape[0] * X.shape[1], 1))
X_3d = X.reshape((-1, X.shape[0], X.shape[1]))
f = theano.function([X], [X_vector, X_row, X_column, X_3d])
for output in f(numpy.array([[1, 2], [3, 4]])):
    print(output.shape, output)
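For a 2x2 input this should print shapes (4,), (1, 4), (4, 1), and (1, 2, 2), each holding the same four values; the -1 in the last reshape asks Theano to infer that dimension's size (here 1) from the total number of elements.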