TransformerEncoder with a padding mask - pytorch

I'm trying to implement torch.nn.TransformerEncoder with a src_key_padding_mask that is not None. Imagine the input has the shape src = [20, 95] and the binary padding mask has the shape src_mask = [20, 95], with 1 at the padded token positions and 0 everywhere else. I build a transformer encoder with 6 layers, each of which contains self-attention with 8 heads and a hidden dimension of 256:
layer = torch.nn.TransformerEncoderLayer(256, 8, 256, 0.1)
encoder = torch.nn.TransformerEncoder(layer, 6)
embed = torch.nn.Embedding(80000, 256)
src = torch.randint(0, 1000, (20, 95))
src = embed(src)
src_mask = torch.randint(0, 2, (20, 95))
output = encoder(src, src_mask)
But I get the following error:
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
<ipython-input-107-31bf7ab8384b> in <module>
----> 1 output = encoder(src, src_mask)
~/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
545 result = self._slow_forward(*input, **kwargs)
546 else:
--> 547 result = self.forward(*input, **kwargs)
548 for hook in self._forward_hooks.values():
549 hook_result = hook(self, input, result)
~/anaconda3/lib/python3.7/site-packages/torch/nn/modules/transformer.py in forward(self, src, mask, src_key_padding_mask)
165 for i in range(self.num_layers):
166 output = self.layers[i](output, src_mask=mask,
--> 167 src_key_padding_mask=src_key_padding_mask)
168
169 if self.norm:
~/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
545 result = self._slow_forward(*input, **kwargs)
546 else:
--> 547 result = self.forward(*input, **kwargs)
548 for hook in self._forward_hooks.values():
549 hook_result = hook(self, input, result)
~/anaconda3/lib/python3.7/site-packages/torch/nn/modules/transformer.py in forward(self, src, src_mask, src_key_padding_mask)
264 """
265 src2 = self.self_attn(src, src, src, attn_mask=src_mask,
--> 266 key_padding_mask=src_key_padding_mask)[0]
267 src = src + self.dropout1(src2)
268 src = self.norm1(src)
~/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
545 result = self._slow_forward(*input, **kwargs)
546 else:
--> 547 result = self.forward(*input, **kwargs)
548 for hook in self._forward_hooks.values():
549 hook_result = hook(self, input, result)
~/anaconda3/lib/python3.7/site-packages/torch/nn/modules/activation.py in forward(self, query, key, value, key_padding_mask, need_weights, attn_mask)
781 training=self.training,
782 key_padding_mask=key_padding_mask, need_weights=need_weights,
--> 783 attn_mask=attn_mask)
784
785
~/anaconda3/lib/python3.7/site-packages/torch/nn/functional.py in multi_head_attention_forward(query, key, value, embed_dim_to_check, num_heads, in_proj_weight, in_proj_bias, bias_k, bias_v, add_zero_attn, dropout_p, out_proj_weight, out_proj_bias, training, key_padding_mask, need_weights, attn_mask, use_separate_proj_weight, q_proj_weight, k_proj_weight, v_proj_weight, static_k, static_v)
3250 if attn_mask is not None:
3251 attn_mask = attn_mask.unsqueeze(0)
-> 3252 attn_output_weights += attn_mask
3253
3254 if key_padding_mask is not None:
RuntimeError: The size of tensor a (20) must match the size of tensor b (95) at non-singleton dimension 2
I was wondering if somebody could help me figure out this problem.
Thanks

The required shapes are shown in nn.Transformer.forward - Shape (all building blocks of the transformer refer to it). The relevant ones for the encoder are:
src: (S, N, E)
src_mask: (S, S)
src_key_padding_mask: (N, S)
where S is the sequence length, N the batch size and E the embedding dimension (number of features).
The padding mask should have shape [95, 20], not [20, 95]. This assumes that your batch size is 95 and the sequence length is 20, but if that is the other way around, you would have to transpose the src instead.
Furthermore, when calling the encoder, you are not specifying the src_key_padding_mask, but rather the src_mask, as the signature of torch.nn.TransformerEncoder.forward is:
forward(src, mask=None, src_key_padding_mask=None)
The padding mask must be specified as the keyword argument src_key_padding_mask, not as the second positional argument. And to avoid confusion, your src_mask should be renamed to src_key_padding_mask.
src_key_padding_mask = torch.randint(0,2,(95, 20))
output = encoder(src, src_key_padding_mask=src_key_padding_mask)
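Putting the pieces together, here is a minimal sketch of the corrected call (assuming the batch size really is 95 and the sequence length 20; the .bool() cast is not in the original snippet, but recent PyTorch versions expect a boolean key padding mask):
import torch

embed = torch.nn.Embedding(80000, 256)
layer = torch.nn.TransformerEncoderLayer(256, 8, 256, 0.1)
encoder = torch.nn.TransformerEncoder(layer, 6)

src = torch.randint(0, 1000, (20, 95))    # (S, N) = (20, 95) token ids
src = embed(src)                          # (S, N, E) = (20, 95, 256)

# one row per batch element, one column per time step -> (N, S) = (95, 20)
# True (or 1) marks padded positions that attention should ignore
src_key_padding_mask = torch.randint(0, 2, (95, 20)).bool()

output = encoder(src, src_key_padding_mask=src_key_padding_mask)  # (20, 95, 256)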

Related

How to reconstruct the decoder from an LSTM-AE?

I have a trained LSTM-AE whose architecture is as follows:
In brief, it is an LSTM-AE of depth 3; the numbers of cells in the encoder LSTM layers are [120, 80, 50] (and symmetric for the decoder). I built the model using the code shown on this page. For information, because I want to train the LSTM-AE directly on variable-length time series, I didn't specify the timesteps in the input layer, which means the model is trained on batches of size 1 (one time series per batch).
I can extract the encoder just fine, but I cannot do the same for the decoder. My goal is to check, given a vector of 50 features (extracted by the encoder), whether the decoder can reconstruct the input series.
Here's my attempt so far:
from tensorflow.keras.layers import Input
from tensorflow.keras.models import Model, load_model

# load the full autoencoder
model = load_model(path_to_model)

# reconstruct the decoder
in_layer = Input(shape=(None, 50))
time_dist = model.layers[-1]
dec_1 = model.layers[-2]
dec_2 = model.layers[-3]
dec_3 = model.layers[-4]
rep_vec = model.layers[-5]

out_layer = time_dist(dec_1(dec_2(dec_3(rep_vec(in_layer)))))
decoder = Model(in_layer, out_layer, name='decoder')

res = decoder(input_feature)  # input_feature has shape (50,)
I obtained this error:
InvalidArgumentError: slice index 1 of dimension 0 out of bounds. [Op:StridedSlice] name: decoder/repeat/strided_slice/
If you are interested in the full error log...
---------------------------------------------------------------------------
InvalidArgumentError Traceback (most recent call last)
Input In [86], in <module>
13 out_layer = time_dist(dec_1(dec_2(dec_3(rep_vec(in_layer)))))
14 decoder = Model(in_layer, out_layer, name='decoder')
---> 15 res = decoder(input_feature)
File ~/venv/lib/python3.8/site-packages/tensorflow/python/keras/engine/base_layer.py:1030, in Layer.__call__(self, *args, **kwargs)
1026 inputs = self._maybe_cast_inputs(inputs, input_list)
1028 with autocast_variable.enable_auto_cast_variables(
1029 self._compute_dtype_object):
-> 1030 outputs = call_fn(inputs, *args, **kwargs)
1032 if self._activity_regularizer:
1033 self._handle_activity_regularization(inputs, outputs)
File ~/venv/lib/python3.8/site-packages/tensorflow/python/keras/engine/functional.py:420, in Functional.call(self, inputs, training, mask)
401 #doc_controls.do_not_doc_inheritable
402 def call(self, inputs, training=None, mask=None):
403 """Calls the model on new inputs.
404
405 In this case `call` just reapplies
(...)
418 a list of tensors if there are more than one outputs.
419 """
--> 420 return self._run_internal_graph(
421 inputs, training=training, mask=mask)
File ~/venv/lib/python3.8/site-packages/tensorflow/python/keras/engine/functional.py:556, in Functional._run_internal_graph(self, inputs, training, mask)
553 continue # Node is not computable, try skipping.
555 args, kwargs = node.map_arguments(tensor_dict)
--> 556 outputs = node.layer(*args, **kwargs)
558 # Update tensor_dict.
559 for x_id, y in zip(node.flat_output_ids, nest.flatten(outputs)):
File ~/venv/lib/python3.8/site-packages/tensorflow/python/keras/engine/base_layer.py:1030, in Layer.__call__(self, *args, **kwargs)
1026 inputs = self._maybe_cast_inputs(inputs, input_list)
1028 with autocast_variable.enable_auto_cast_variables(
1029 self._compute_dtype_object):
-> 1030 outputs = call_fn(inputs, *args, **kwargs)
1032 if self._activity_regularizer:
1033 self._handle_activity_regularization(inputs, outputs)
File ~/venv/lib/python3.8/site-packages/tensorflow/python/keras/layers/core.py:919, in Lambda.call(self, inputs, mask, training)
915 return var
917 with backprop.GradientTape(watch_accessed_variables=True) as tape,\
918 variable_scope.variable_creator_scope(_variable_creator):
--> 919 result = self.function(inputs, **kwargs)
920 self._check_variables(created_variables, tape.watched_variables())
921 return result
File D:/PhD/Code/feature_learning/train_models/train_lstmae.py:30, in repeat_vector(args)
File ~/venv/lib/python3.8/site-packages/tensorflow/python/util/dispatch.py:206, in add_dispatch_support.<locals>.wrapper(*args, **kwargs)
204 """Call target, and fall back on dispatchers if there is a TypeError."""
205 try:
--> 206 return target(*args, **kwargs)
207 except (TypeError, ValueError):
208 # Note: convert_to_eager_tensor currently raises a ValueError, not a
209 # TypeError, when given unexpected types. So we need to catch both.
210 result = dispatch(wrapper, args, kwargs)
File ~/venv/lib/python3.8/site-packages/tensorflow/python/ops/array_ops.py:1040, in _slice_helper(tensor, slice_spec, var)
1038 var_empty = constant([], dtype=dtypes.int32)
1039 packed_begin = packed_end = packed_strides = var_empty
-> 1040 return strided_slice(
1041 tensor,
1042 packed_begin,
1043 packed_end,
1044 packed_strides,
1045 begin_mask=begin_mask,
1046 end_mask=end_mask,
1047 shrink_axis_mask=shrink_axis_mask,
1048 new_axis_mask=new_axis_mask,
1049 ellipsis_mask=ellipsis_mask,
1050 var=var,
1051 name=name)
File ~/venv/lib/python3.8/site-packages/tensorflow/python/util/dispatch.py:206, in add_dispatch_support.<locals>.wrapper(*args, **kwargs)
204 """Call target, and fall back on dispatchers if there is a TypeError."""
205 try:
--> 206 return target(*args, **kwargs)
207 except (TypeError, ValueError):
208 # Note: convert_to_eager_tensor currently raises a ValueError, not a
209 # TypeError, when given unexpected types. So we need to catch both.
210 result = dispatch(wrapper, args, kwargs)
File ~/venv/lib/python3.8/site-packages/tensorflow/python/ops/array_ops.py:1213, in strided_slice(input_, begin, end, strides, begin_mask, end_mask, ellipsis_mask, new_axis_mask, shrink_axis_mask, var, name)
1210 if strides is None:
1211 strides = ones_like(begin)
-> 1213 op = gen_array_ops.strided_slice(
1214 input=input_,
1215 begin=begin,
1216 end=end,
1217 strides=strides,
1218 name=name,
1219 begin_mask=begin_mask,
1220 end_mask=end_mask,
1221 ellipsis_mask=ellipsis_mask,
1222 new_axis_mask=new_axis_mask,
1223 shrink_axis_mask=shrink_axis_mask)
1225 parent_name = name
1227 if var is not None:
File ~/venv/lib/python3.8/site-packages/tensorflow/python/ops/gen_array_ops.py:10505, in strided_slice(input, begin, end, strides, begin_mask, end_mask, ellipsis_mask, new_axis_mask, shrink_axis_mask, name)
10503 return _result
10504 except _core._NotOkStatusException as e:
> 10505 _ops.raise_from_not_ok_status(e, name)
10506 except _core._FallbackException:
10507 pass
File ~/venv/lib/python3.8/site-packages/tensorflow/python/framework/ops.py:6897, in raise_from_not_ok_status(e, name)
6895 message = e.message + (" name: " + name if name is not None else "")
6896 # pylint: disable=protected-access
-> 6897 six.raise_from(core._status_to_exception(e.code, message), None)
File <string>:3, in raise_from(value, from_value)
InvalidArgumentError: slice index 1 of dimension 0 out of bounds. [Op:StridedSlice] name: decoder/repeat/strided_slice/
I appreciate very much any advice you would give me!
Edit
Here is the code I used to build the model:
import tensorflow as tf
from tensorflow.keras.layers import *
from tensorflow.keras.initializers import GlorotUniform
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.models import Sequential, Model
from tensorflow.keras.backend import shape

def repeat_vector(args):
    """Builds the repeat vector layer dynamically from the size of the input series"""
    layer_to_repeat = args[0]
    sequence_layer = args[1]
    return RepeatVector(shape(sequence_layer)[1])(layer_to_repeat)

n_atts = 3               # time series of 3 measurements
n_units = [120, 80, 50]  # encoder - 1st layer: 120, 2nd layer: 80, 3rd layer: 50 (and symmetric for decoder)
n_layers = len(n_units)

init = GlorotUniform(seed=420)
reg = None
optimizer = Adam(learning_rate=0.0001)
activ = 'tanh'
loss_metric = 'mse'

inputs = Input(shape=(None, n_atts), name='input_layer')

# the encoder
encoded = LSTM(n_units[0], name='encoder_1', return_sequences=(n_layers != 1), kernel_initializer=init,
               kernel_regularizer=reg, activation=activ)(inputs)
for i in range(1, n_layers):
    if i != n_layers - 1:
        encoded = LSTM(n_units[i], name='encoder_{}'.format(i + 1), return_sequences=(n_layers != 1),
                       kernel_initializer=init, kernel_regularizer=reg, activation=activ)(encoded)
    else:
        encoded = LSTM(n_units[i], name='encoder_{}'.format(i + 1), return_sequences=False,
                       kernel_initializer=init, kernel_regularizer=reg, activation=activ)(encoded)

# repeat the vector (plug the encoder into the decoder)
repeated = Lambda(repeat_vector, output_shape=(None, n_units[-1]), name='repeat')([encoded, inputs])

# the decoder
decoded = LSTM(n_units[n_layers - 1], return_sequences=True, name='decoder_1',
               kernel_initializer=init, kernel_regularizer=reg, activation=activ)(repeated)  # first layer
for i in range(1, n_layers):
    decoded = LSTM(n_units[n_layers - 1 - i], return_sequences=True, name='decoder_{}'.format(i + 1),
                   kernel_initializer=init, kernel_regularizer=reg, activation=activ)(decoded)

# last layer
tdist = TimeDistributed(Dense(n_atts))(decoded)

# compile the model
model = Model(inputs, tdist, name='lstm-ae')
model.compile(optimizer=optimizer, loss=loss_metric)
For information, I use TensorFlow 2.5. Because the number of units is read from a config file, I wrote the code this way to add the layers programmatically.

layers compatibility between attention layer and CONV1D in keras

I am building a model in a BiLSTM-attention-Conv1D fashion (I want to use multiple Conv1D layers with different kernel sizes), and I am facing a layer-incompatibility issue between the attention layer and the Conv1D layer. I have tried the Reshape function but it is not working. My model is as follows:
sequence_input = Input(shape=(maxlen,), dtype="int32")
embedded_sequences = Embedding(50000, output_dim=output_dim)(sequence_input)
lstm = Bidirectional(LSTM(RNN_CELL_SIZE, return_sequences=True), name="bi_lstm_0")(embedded_sequences)

# Getting our LSTM outputs
(lstm, forward_h, forward_c, backward_h, backward_c) = Bidirectional(
    LSTM(RNN_CELL_SIZE, return_sequences=True, return_state=True), name="bi_lstm_1")(lstm)

state_h = Concatenate()([forward_h, backward_h])
state_c = Concatenate()([forward_c, backward_c])
context_vector, attention_weights = Attention(10)(lstm, state_h)

x = Reshape((maxlen, output_dim, 1))(context_vector)

kernel_sizes = [1, 2, 3, 4, 5]
convs = []
for kernel_size in range(len(kernel_sizes)):
    conv = Conv1D(128, kernel_size, activation='relu')(x)
    convs.append(conv)

avg_pool = GlobalAveragePooling1D()(convs)
max_pool = GlobalMaxPooling1D()(convs)
conc = concatenate([avg_pool, max_pool])
output = Dense(50, activation="sigmoid")(conc)

model = keras.Model(inputs=sequence_input, outputs=output)
print(model.summary())
My code gives me the following error:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-114-8e5c0c75e84a> in <module>()
13 context_vector, attention_weights = Attention(10)(lstm, state_h)
14
---> 15 x = Reshape((maxlen, output_dim, 1))(context_vector)
16
17
6 frames
/usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/engine/base_layer.py in __call__(self, *args, **kwargs)
950 if _in_functional_construction_mode(self, inputs, args, kwargs, input_list):
951 return self._functional_construction_call(inputs, args, kwargs,
--> 952 input_list)
953
954 # Maintains info about the `Layer.call` stack.
/usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/engine/base_layer.py in _functional_construction_call(self, inputs, args, kwargs, input_list)
1089 # Check input assumptions set after layer building, e.g. input shape.
1090 outputs = self._keras_tensor_symbolic_call(
-> 1091 inputs, input_masks, args, kwargs)
1092
1093 if outputs is None:
/usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/engine/base_layer.py in _keras_tensor_symbolic_call(self, inputs, input_masks, args, kwargs)
820 return nest.map_structure(keras_tensor.KerasTensor, output_signature)
821 else:
--> 822 return self._infer_output_signature(inputs, args, kwargs, input_masks)
823
824 def _infer_output_signature(self, inputs, args, kwargs, input_masks):
/usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/engine/base_layer.py in _infer_output_signature(self, inputs, args, kwargs, input_masks)
861 # TODO(kaftan): do we maybe_build here, or have we already done it?
862 self._maybe_build(inputs)
--> 863 outputs = call_fn(inputs, *args, **kwargs)
864
865 self._handle_activity_regularization(inputs, outputs)
/usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/layers/core.py in call(self, inputs)
555 # Set the static shape for the result since it might lost during array_ops
556 # reshape, eg, some `None` dim in the result could be inferred.
--> 557 result.set_shape(self.compute_output_shape(inputs.shape))
558 return result
559
/usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/layers/core.py in compute_output_shape(self, input_shape)
546 output_shape = [input_shape[0]]
547 output_shape += self._fix_unknown_dimension(input_shape[1:],
--> 548 self.target_shape)
549 return tensor_shape.TensorShape(output_shape)
550
/usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/layers/core.py in _fix_unknown_dimension(self, input_shape, output_shape)
534 output_shape[unknown] = original // known
535 elif original != known:
--> 536 raise ValueError(msg)
537 return output_shape
538
ValueError: total size of new array must be unchanged, input_shape = [256], output_shape = [2500, 100, 1]
kindly help me

ValueError: Target size (torch.Size([26])) must be the same as input size (torch.Size([66]))

ValueError Traceback (most recent call last)
<ipython-input-28-10509ec63b58> in <module>
11 start_time = time.time()
12
---> 13 train_loss, train_acc = train(model, train_iterator, optimizer, criterion)
14 valid_loss, valid_acc = evaluate(model, valid_iterator, criterion)
15
<ipython-input-25-ecea5e6d8ce8> in train(model, iterator, optimizer, criterion)
12 predictions = model(batch.t).squeeze(1)
13
---> 14 loss = criterion(predictions, batch.l)
15
16 acc = binary_accuracy(predictions, batch.l)
~\anaconda3\lib\site-packages\torch\nn\modules\module.py in __call__(self, *input, **kwargs)
530 result = self._slow_forward(*input, **kwargs)
531 else:
--> 532 result = self.forward(*input, **kwargs)
533 for hook in self._forward_hooks.values():
534 hook_result = hook(self, input, result)
~\anaconda3\lib\site-packages\torch\nn\modules\loss.py in forward(self, input, target)
599 self.weight,
600 pos_weight=self.pos_weight,
--> 601 reduction=self.reduction)
602
603
~\anaconda3\lib\site-packages\torch\nn\functional.py in binary_cross_entropy_with_logits(input, target, weight, size_average, reduce, reduction, pos_weight)
2122
2123 if not (target.size() == input.size()):
-> 2124 raise ValueError("Target size ({}) must be the same as input size ({})".format(target.size(), input.size()))
2125
2126 return torch.binary_cross_entropy_with_logits(input, target, weight, pos_weight, reduction_enum)
ValueError: Target size (torch.Size([26])) must be the same as input size (torch.Size([66]))
I'm training a CNN on my own Indonesian dataset to do sentiment analysis. This is my code. I used criterion = nn.BCEWithLogitsLoss() and optimizer = optim.RMSprop. I do not understand where I need to make changes to correct the mistake.

Non-Stateful LSTM Issues with Keras

Good day,
I am trying to create an LSTM model (stateful or non-stateful) but am running into several issues.
I am attempting to add a layer using:
model = Sequential()
...
model.add(LSTM(c['num_rnn_unit'],
               activation=c['rnn_activation'],
               dropout=c['dropout_rnn_input'],
               recurrent_dropout=c['dropout_rnn_recurrent'],
               return_sequences=True,
               stateful=False,
               #batch_input_shape=(c['batch_size'], c['num_steps'], c['input_dim'])
               ))
where:
'num_rnn_unit': np.random.choice([16, 32, 64, 128, 256, 512, 1024])
'rnn_activation': np.random.choice(['tanh', 'sigmoid'])
'dropout_rnn_input': 0
'batch_size': np.random.choice([64, 128])
'num_steps':np.random.choice([5, 10, 15])
'input_dim': 64
I experimented with stateful=True and used the commented-out batch_input_shape, but this caused additional errors, which others have run into as well without finding a workable solution.
So I stuck with trying to make stateful=False work, but it yields the error below.
Any thoughts on why this error is coming up? Thanks in advance!
Here is the traceback:
~\AppData\Local\Continuum\anaconda3\envs\env_py3\lib\site-packages\keras\engine\sequential.py in add(self, layer)
180 self.inputs = network.get_source_inputs(self.outputs[0])
181 elif self.outputs:
--> 182 output_tensor = layer(self.outputs[0])
183 if isinstance(output_tensor, list):
184 raise TypeError('All layers in a Sequential model '
~\AppData\Local\Continuum\anaconda3\envs\env_py3\lib\site-packages\keras\layers\recurrent.py in __call__(self, inputs, initial_state, constants, **kwargs)
539
540 if initial_state is None and constants is None:
--> 541 return super(RNN, self).__call__(inputs, **kwargs)
542
543 # If any of `initial_state` or `constants` are specified and are Keras
~\AppData\Local\Continuum\anaconda3\envs\env_py3\lib\site-packages\keras\backend\tensorflow_backend.py in symbolic_fn_wrapper(*args, **kwargs)
73 if _SYMBOLIC_SCOPE.value:
74 with get_graph().as_default():
---> 75 return func(*args, **kwargs)
76 else:
77 return func(*args, **kwargs)
~\AppData\Local\Continuum\anaconda3\envs\env_py3\lib\site-packages\keras\engine\base_layer.py in __call__(self, inputs, **kwargs)
487 # Actually call the layer,
488 # collecting output(s), mask(s), and shape(s).
--> 489 output = self.call(inputs, **kwargs)
490 output_mask = self.compute_mask(inputs, previous_mask)
491
~\AppData\Local\Continuum\anaconda3\envs\env_py3\lib\site-packages\keras\layers\recurrent.py in call(self, inputs, mask, training, initial_state)
1689 mask=mask,
1690 training=training,
-> 1691 initial_state=initial_state)
1692
1693 #property
~\AppData\Local\Continuum\anaconda3\envs\env_py3\lib\site-packages\keras\layers\recurrent.py in call(self, inputs, mask, training, initial_state, constants)
635 mask = mask[0]
636
--> 637 if len(initial_state) != len(self.states):
638 raise ValueError('Layer has ' + str(len(self.states)) +
639 ' states but was passed ' +
~\AppData\Local\Continuum\anaconda3\envs\env_py3\lib\site-packages\keras\layers\recurrent.py in states(self)
436 num_states = 1
437 else:
--> 438 num_states = len(self.cell.state_size)
439 return [None for _ in range(num_states)]
440 return self._states
TypeError: object of type 'numpy.int32' has no len()
Could this be caused by the first layer's "input_shape", with batch_normalization=True:
if c['batch_normalization']:
    model.add(BatchNormalization(input_shape=(c['num_steps'], c['input_dim'])))
model.add(TimeDistributed(Dropout(c['dropout_input']),
                          input_shape=(c['num_steps'], c['input_dim'])))

PyTorch: DecoderRNN: RuntimeError: input must have 3 dimensions, got 2

I am building a DecoderRNN using PyTorch (This is an image-caption decoder):
class DecoderRNN(nn.Module):
    def __init__(self, embed_size, hidden_size, vocab_size):
        super(DecoderRNN, self).__init__()
        self.hidden_size = hidden_size
        self.gru = nn.GRU(embed_size, hidden_size, hidden_size)
        self.softmax = nn.LogSoftmax(dim=1)

    def forward(self, features, captions):
        print(features.shape)
        print(captions.shape)
        output, hidden = self.gru(features, captions)
        output = self.softmax(self.out(output[0]))
        return output, hidden
The data have the following shapes:
torch.Size([10, 200]) <- features.shape (10 for batch size)
torch.Size([10, 12]) <- captions.shape (10 for batch size)
Then I got the following errors. Any ideas what I missed here? Thanks!
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
<ipython-input-2-76e05ba08b1d> in <module>()
44 # Pass the inputs through the CNN-RNN model.
45 features = encoder(images)
---> 46 outputs = decoder(features, captions)
47
48 # Calculate the batch loss.
/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
323 for hook in self._forward_pre_hooks.values():
324 hook(self, input)
--> 325 result = self.forward(*input, **kwargs)
326 for hook in self._forward_hooks.values():
327 hook_result = hook(self, input, result)
/home/workspace/model.py in forward(self, features, captions)
37 print (captions.shape)
38 # features = features.unsqueeze(1)
---> 39 output, hidden = self.gru(features, captions)
40 output = self.softmax(self.out(output[0]))
41 return output, hidden
/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
323 for hook in self._forward_pre_hooks.values():
324 hook(self, input)
--> 325 result = self.forward(*input, **kwargs)
326 for hook in self._forward_hooks.values():
327 hook_result = hook(self, input, result)
/opt/conda/lib/python3.6/site-packages/torch/nn/modules/rnn.py in forward(self, input, hx)
167 flat_weight=flat_weight
168 )
--> 169 output, hidden = func(input, self.all_weights, hx)
170 if is_packed:
171 output = PackedSequence(output, batch_sizes)
/opt/conda/lib/python3.6/site-packages/torch/nn/_functions/rnn.py in forward(input, *fargs, **fkwargs)
383 return hack_onnx_rnn((input,) + fargs, output, args, kwargs)
384 else:
--> 385 return func(input, *fargs, **fkwargs)
386
387 return forward
/opt/conda/lib/python3.6/site-packages/torch/autograd/function.py in _do_forward(self, *input)
326 self._nested_input = input
327 flat_input = tuple(_iter_variables(input))
--> 328 flat_output = super(NestedIOFunction, self)._do_forward(*flat_input)
329 nested_output = self._nested_output
330 nested_variables = _unflatten(flat_output, self._nested_output)
/opt/conda/lib/python3.6/site-packages/torch/autograd/function.py in forward(self, *args)
348 def forward(self, *args):
349 nested_tensors = _map_variable_tensor(self._nested_input)
--> 350 result = self.forward_extended(*nested_tensors)
351 del self._nested_input
352 self._nested_output = result
/opt/conda/lib/python3.6/site-packages/torch/nn/_functions/rnn.py in forward_extended(self, input, weight, hx)
292 hy = tuple(h.new() for h in hx)
293
--> 294 cudnn.rnn.forward(self, input, hx, weight, output, hy)
295
296 self.save_for_backward(input, hx, weight, output)
/opt/conda/lib/python3.6/site-packages/torch/backends/cudnn/rnn.py in forward(fn, input, hx, weight, output, hy)
206 if (not is_input_packed and input.dim() != 3) or (is_input_packed and input.dim() != 2):
207 raise RuntimeError(
--> 208 'input must have 3 dimensions, got {}'.format(input.dim()))
209 if fn.input_size != input.size(-1):
210 raise RuntimeError('input.size(-1) must be equal to input_size. Expected {}, got {}'.format(
RuntimeError: input must have 3 dimensions, got 2
Your GRU input needs to be 3-dimensional:
input of shape (seq_len, batch, input_size): tensor containing the features of the input sequence.
Furthermore, you need to provide the hidden state (the last encoder hidden state in this case) as the second parameter:
self.gru(input, h_0)
where input is your actual input and h_0 is the hidden state, which needs to be 3-dimensional as well:
h_0 of shape (num_layers * num_directions, batch, hidden_size): tensor containing the initial hidden state for each element in the batch. Defaults to zero if not provided.
https://pytorch.org/docs/master/nn.html#torch.nn.GRU
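To illustrate the two points above with the shapes printed in the question, here is a minimal sketch (the hidden_size of 512 is an assumed value, and the zero h_0 is only a stand-in for whatever hidden state you actually want to pass):
import torch
import torch.nn as nn

batch_size, embed_size, hidden_size = 10, 200, 512  # embed_size from features.shape; hidden_size is assumed

gru = nn.GRU(embed_size, hidden_size)           # single layer, unidirectional

features = torch.randn(batch_size, embed_size)  # (batch, input_size), as printed in the question
inp = features.unsqueeze(0)                     # -> (seq_len=1, batch, input_size)

h_0 = torch.zeros(1, batch_size, hidden_size)   # (num_layers * num_directions, batch, hidden_size)

output, hidden = gru(inp, h_0)                  # output: (1, batch, hidden_size)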
The answer by @MBT is correct, but I thought maybe what you were originally looking to do was to use a GRUCell instead of a GRU (which processes the input step by step instead of the whole sequence at once).
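For completeness, a minimal sketch of the GRUCell variant (all sizes are illustrative, not taken from the question): the cell consumes one time step at a time and you carry the hidden state yourself:
import torch
import torch.nn as nn

batch_size, input_size, hidden_size, seq_len = 10, 200, 512, 12  # illustrative sizes

cell = nn.GRUCell(input_size, hidden_size)

inputs = torch.randn(seq_len, batch_size, input_size)  # a whole sequence, fed one step at a time
h = torch.zeros(batch_size, hidden_size)               # GRUCell hidden state is 2-D: (batch, hidden_size)

outputs = []
for t in range(seq_len):
    h = cell(inputs[t], h)       # one step of the recurrence
    outputs.append(h)

outputs = torch.stack(outputs)   # (seq_len, batch, hidden_size)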
