Using scatter_nd to project an attention distribution onto another distribution, essentially creating an distribution that references a vocabulary.
indices = tf.stack((batch_nums, encoder_batch), axis=2)
shape = [batch_size, vocab_size]
attn_dists_projected = [tf.scatter_nd(indices, copy_distribution, shape) for copy_distribution in attn_dists]
When attempting to run this with placeholders with largely undefined dimensions, I ran into the following error:
ValueError: The inner 0 dimensions of output.shape=[?,?] must match the inner 1
dimensions of updates.shape=[128,128,?]: Shapes must be equal rank, but are 0 and 1
for 'final_distribution/ScatterNd' (op: 'ScatterNd') with input shapes:
[128,?,2], [128,128,?], [2].
This in the context of seq2seq, so the model placeholders' shapes need to be partially undefined. Additionally, my data batches are not consistent in size, which necessitates variable batch sizes as well.
Related
Lets say i have an CNN intermediate layer output tensor call it X with shape (B,C,H,W) batch, channels, height and width. I extract the regions of interest (ROIs) from the tensor based on some manually chosen criteria i.e i have box coordinates. Assume all the ROIs have same shape (B,N,C,h,w). N is number of ROIs, h is height, and w is width of ROI respectively. Lets call the ROI tensor Y. Now i perform a differentiable operation on Y (assume convolution), this operation does not alter the dimension or shape of the ROIs. Lets call the modified ROI tensor as Y’(shape: B,N,C,h,w).
Now i want to replace the locations where Y are extracted from X with Y’. This modified X is further processed in the subsequent layers of the model. So essentially if i do the following things
Y = X[location criteria]
Y’ = some_operation(Y)
X[location criteria] = Y’
The above operation mentioned has inplace change of X, pytorch computational graph cannot keep track of it. How to modify the value of X without causing error?
I'm working with time-variant graph embedding, where at each time step, the adjacency matrix of the graph changes. The main idea is to perform the node embedding of each timestep of the graph by looking to a set of node features and the adjacency matrix. The node embedding step is long and complicated, and is not part of the core of the problem, so I will skip this part. Suffice it to say that I use Graph Convolutional Network to embed the nodes.
Consider that I have a stack of B adjacency matrices A with sizes NxN, where B = batch size and N = number of nodes in the graph. Also, the matrices are stacked according to a time series, where matrix in index i comes before matrix in index i+1. I have already embedded the nodes of the graph, which results in a matrix of dimensions B x N x E, where E = size of the embedding (parameter). Note that the model has to deal with any graph, therefore, N is not a parameter. Another important comment is that each batch contains adjacency matrices from the same graph, and therefore all matrices of a batch have the same number of node, but the matrices of other batches may have different number of nodes.
I now need to pass these embedding through an LSTM cell. I never used Keras before, so I'm having a hard time making the Keras LSTM blend in my Tensorflow code. What I want to do is: pass each node embedding through an LSTM such that the number of timesteps = B and the LSTM batch size = N, that is, the input to my LSTM has the shape [N, B, E], where N and B are only known through execution time. I want the output of my LSTM to have the shape of [B, E*E]. The embedding matrix is called here self.embed_mat. Here is my code:
def _LSTM_layer(self):
with tf.variable_scope(self.scope, reuse=tf.AUTO_REUSE), tf.device(self.device):
in_shape = tf.shape(self.embed_mat)
lstm_input = tf.reshape(self.embed_mat, [in_shape[1], in_shape[0], EMBED_SIZE]) #lstm = [N, B, E]
input_plh = K.placeholder(name="lstm_input", shape=(None, None, EMBED_SIZE))
lstm = LSTM(EMBED_SIZE*EMBED_SIZE, input_shape=(None, None, EMBED_SIZE))
get_output = K.function(inputs=[input_plh], outputs=[lstm(input_plh)])
h = get_output([lstm_input])
I am a bit lost with the K.function part. All I want is the output tensor of the LSTM cell. I've seen that in order to get that with Keras, we need to use K.function, but I don't quite get it what it does. When I call get_output([lstm_input]), I get the following error:
tensorflow.python.framework.errors_impl.InvalidArgumentError: You must feed a value for placeholder tensor 'worker_global/A/shape' with dtype int64 and shape [?]
Here, A is the stacked adjacency matrices with dimension BxNxN. What is going on here? Does the value of N needs to be known during graph building step? I think I made some dumb mistake with the LSTM cell, but I can't get what it is.
Thanks in advance!
If you want to get the output of your LSTM layer "out" given input of "inp" in a keras Sequential() model called "model," where "inp" is your first / input layer and "out" is an LSTM layer that happens to be, for the sake of this example, in the 4th position in your sequential model, you would obtain the output of that LSTM layer from the data you call "lstm_input" above with the following code:
inp = model.layers[0].input
out = model.layers[3].output
inp_to_out = K.function([inp], [out])
output = inp_to_out([lstm_input])
I used python 3 and when i insert transform random crop size 224 it gives miss match error.
here my code
what did i wrong ?
Your code makes variations on resnet: you changed the number of channels, the number of bottlenecks at each "level", and you removed a "level" entirely. As a result, the dimension of the feature map you have at the end of layer3 is not 64: you have a larger spatial dimension than you anticipated by the nn.AvgPool2d(8). The error message you got actually tells you that the output of level3 is of shape 64x56x56 and after avg pooling with kernel and stride 8 you have 64x7x7=3136 dimensional feature vector, instead of only 64 you are expecting.
What can you do?
As opposed to "standard" resnet, you removed stride from conv1 and you do not have max pool after conv1. Moreover, you removed layer4 which also have a stride. Therefore, You can add pooling to your net to reduce the spatial dimensions of layer3.
Alternatively, you can replace nn.AvgPool(8) with nn.AdaptiveAvgPool2d([1, 1]) an avg pool that outputs only one feature regardless of the spatial dimensions of the input feature map.
For a function approximation problem I'm trying to accumulate gradients but I find that sometimes some of these gradients are nan(i.e. undefined) even though the loss is always real. I think this might be due to numerical instabilities and I'm basically looking for a simple method for removing the nans from the computed gradients.
Starting with the solution to this question I tried doing the following:
# Optimizer definition - nothing different from any classical example
opt = tf.train.AdamOptimizer()
## Retrieve all trainable variables you defined in your graph
tvs = tf.trainable_variables()
## Creation of a list of variables with the same shape as the trainable ones
# initialized with 0s
accum_vars = [tf.Variable(tf.zeros_like(tv.initialized_value()), trainable=False) for tv in tvs]
zero_ops = [tv.assign(tf.zeros_like(tv)) for tv in accum_vars]
## Calls the compute_gradients function of the optimizer to obtain... the list of gradients
gvs_ = opt.compute_gradients(rmse, tvs)
gvs =tf.where(tf.is_nan(gvs_), tf.zeros_like(gvs_), gvs_)
## Adds to each element from the list you initialized earlier with zeros its gradient (works because accum_vars and gvs are in the same order)
accum_ops = [accum_vars[i].assign_add(gv[0]) for i, gv in enumerate(gvs)]
## Define the training step (part with variable value update)
train_step = opt.apply_gradients([(accum_vars[i], gv[1]) for i, gv in enumerate(gvs)])
So basically, the key idea is this line:
gvs =tf.where(tf.is_nan(gvs_), tf.zeros_like(gvs_), gvs_)
But when I apply this idea I obtain the following error:
ValueError: Tried to convert 'x' to a tensor and failed. Error:
Dimension 1 in both shapes must be equal, but are 30 and 9. Shapes are
[2,30] and [2,9]. From merging shape 2 with other shapes. for
'IsNan/packed' (op: 'Pack') with input shapes: [2,9,30], [2,30,9],
[2,30], [2,9].
compute_gradients returns a list of tensors in your case. You may want to do:
gvs_ = [(tf.where(tf.is_nan(grad), tf.zeros_like(grad), grad), val) for grad,val in gvs_]
I'm trying to randomly subsample the prediction and target array for my loss calculation.
idx = torch.randperm(target.shape[0])
target = target.index_select(0, idx[0, sample_size]
However I'm getting this error message.
index_select(): argument 'index' (position 2) must be Variable, not torch.LongTensor
Does anyone know how to fix this?
Edit:
I got one step closer. It seems like torch.randperm does not return a torch variable, so one has to explicitly convert the output:
idx = torch.randperm(target.shape[0])
idx = Variable(idx).cuda()
target = target.index_select(0, idx[0, sample_size]
only problem is now that the backpropagation fails. Seems like the operation of randomly subsampling is causing an issue with the dimensions.
However the dimensions seem to be fine when calculating the loss:
loss = F.nll_loss(prediction, target.view(-1)) # prediction shape is [Nx12] and target shape is N
Unfortunately when calling loss.backward() I get this error message:
RuntimeError: The expanded size of the tensor (12) must match the existing size (217456) at non-singleton dimension 1