Autograd.grad() for Tensor in pytorch - pytorch

I want to compute the gradient between two tensors in a net. The input tensor X (batch size x m) is sent through a set of convolutional layers, which give me back an output tensor Y (batch size x n).
I’m creating a new loss and I would like to know the gradient of Y w.r.t. X. Something that in tensorflow would be like:
tf.gradients(ys=Y, xs=X)
Unfortunately, I’ve been testing torch.autograd.grad(), but I could not figure out how to do it. I get errors like: “RuntimeError: grad can be implicitly created only for scalar outputs”.
What should be the inputs in torch.autograd.grad() if I want to know the gradient of Y w.r.t. X?

Let's start from a simple working example with a plain loss function and a regular backward pass. We will build a short computational graph and do some grad computations on it.
Code:
import torch
from torch.autograd import grad
import torch.nn as nn
# Create some dummy data.
x = torch.ones(2, 2, requires_grad=True)
gt = torch.ones_like(x) * 16 - 0.5 # "ground-truths"
# We will use MSELoss as an example.
loss_fn = nn.MSELoss()
# Do some computations.
v = x + 2
y = v ** 2
# Compute loss.
loss = loss_fn(y, gt)
print(f'Loss: {loss}')
# Now compute gradients:
d_loss_dx = grad(outputs=loss, inputs=x)
print(f'dloss/dx:\n {d_loss_dx}')
Output:
Loss: 42.25
dloss/dx:
(tensor([[-19.5000, -19.5000], [-19.5000, -19.5000]]),)
OK, this works! Now let's try to reproduce the error "grad can be implicitly created only for scalar outputs". As you may notice, the loss in the previous example is a scalar. By default, backward() and grad() deal with a single scalar value: loss.backward() is equivalent to loss.backward(torch.tensor(1.)). If you try to pass a tensor with more values, you will get an error.
Code:
v = x + 2
y = v ** 2
try:
    dy_hat_dx = grad(outputs=y, inputs=x)
except RuntimeError as err:
    print(err)
Output:
grad can be implicitly created only for scalar outputs
Therefore, when using grad() you need to specify grad_outputs parameter as follows:
Code:
v = x + 2
y = v ** 2
dy_dx = grad(outputs=y, inputs=x, grad_outputs=torch.ones_like(y))
print(f'dy/dx:\n {dy_dx}')
dv_dx = grad(outputs=v, inputs=x, grad_outputs=torch.ones_like(v))
print(f'dv/dx:\n {dv_dx}')
Output:
dy/dx:
(tensor([[6., 6.],[6., 6.]]),)
dv/dx:
(tensor([[1., 1.], [1., 1.]]),)
NOTE: If you are using backward() instead, simply do y.backward(torch.ones_like(y)).
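To make the role of grad_outputs concrete: passing a tensor w as grad_outputs yields a vector-Jacobian product, which is the same as differentiating the scalar (w * y).sum(). The sketch below checks this on the same toy graph; the weights w are arbitrary values chosen only for illustration.

```python
import torch
from torch.autograd import grad

x = torch.ones(2, 2, requires_grad=True)
y = (x + 2) ** 2

# Arbitrary weights, one per element of y (illustrative values).
w = torch.tensor([[1.0, 2.0], [3.0, 4.0]])

# Vector-Jacobian product: each dy/dx entry (2 * (x + 2) = 6) is scaled by w.
vjp = grad(outputs=y, inputs=x, grad_outputs=w, retain_graph=True)[0]

# Equivalent scalar formulation.
via_scalar = grad(outputs=(w * y).sum(), inputs=x)[0]

print(vjp)
print(torch.allclose(vjp, via_scalar))
```

With grad_outputs=torch.ones_like(y) every weight is 1, which is why the earlier call returns 6 everywhere.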

The above solution is not totally correct. It's only correct in a special case where output dimension is 1.
As mentioned in the docs, the output of torch.autograd.grad is related to derivatives but it's not actually dy/dx. For example, assume you have a neural network that inputs a tensor of shape (batch_size, input_dim) and outputs a tensor with shape (batch_size, output_dim). The derivatives of the output w.r.t. input should be of shape (batch_size, output_dim, input_dim) but what you get from torch.autograd.grad has shape (batch_size, input_dim), which is the sum of the real derivatives over the output dimension. If you want the correct derivatives you should use torch.autograd.functional.jacobian as follows:
>>> import torch
>>> torch.__version__
'1.10.1+cu111'
#!/usr/bin/env python
# coding: utf-8
import torch
from torch import nn
import numpy as np
batch_size = 10
hidden_dim = 20
input_dim = 3
output_dim = 2
model = nn.Sequential(nn.Linear(input_dim, hidden_dim), nn.Tanh(), nn.Linear(hidden_dim, output_dim)).double()
x = torch.rand(batch_size, input_dim, requires_grad=True, dtype=torch.float64) #(batch_size, input_dim)
y = model(x) #y: (batch_size, output_dim)
#using torch.autograd.grad
dydx1 = torch.autograd.grad(y, x, retain_graph=True, grad_outputs=torch.ones_like(y))[0] #dydx1: (batch_size, input_dim)
print(f' using grad dydx1: {dydx1.shape}')
#using torch.autograd.functional.jacobian
j = torch.autograd.functional.jacobian(lambda t: model(t), x) #j: (batch_size, output_dim, batch_size, input_dim)
#the off-diagonal elements of 0th and 2nd dimension are all zero. So we remove them
dydx2 = torch.diagonal(j, offset=0, dim1=0, dim2=2) #dydx2: (output_dim, input_dim, batch_size)
dydx2 = dydx2.permute(2, 0, 1) #dydx2: (batch_size, output_dim, input_dim)
print(f' using jacobian dydx2: {dydx2.shape}')
#round to 14 decimal digits to avoid noise
print(np.round((dydx2.sum(dim=1)).numpy(), 14) == np.round(dydx1.numpy(), 14))
Output:
>using grad dydx1: torch.Size([10, 3])
>using jacobian dydx2: torch.Size([10, 2, 3])
#dydx2.sum(dim=1) == dydx1
>[[ True True True]
[ True True True]
[ True True True]
[ True True True]
[ True True True]
[ True True True]
[ True True True]
[ True True True]
[ True True True]
[ True True True]]
In fact autograd.grad returns the sum of the dydx over output dimension.
If you really want to use torch.autograd.grad there is an inefficient way to do that:
dydx3 = torch.tensor([], dtype=torch.float64)
for i in range(output_dim):
    l = torch.zeros_like(y)
    l[:, i] = 1.
    d = torch.autograd.grad(y, x, retain_graph=True, grad_outputs=l)[0] #dydx: (batch_size, input_dim)
    dydx3 = torch.concat((dydx3, d.unsqueeze(dim=1)), dim=1)
print(f' dydx3: {dydx3.shape}')
print(np.round(dydx3.numpy(), 14) == np.round(dydx2.numpy(), 14))
Output:
dydx3: torch.Size([10, 2, 3])
[[[ True True True]
[ True True True]]
[[ True True True]
[ True True True]]
[[ True True True]
[ True True True]]
[[ True True True]
[ True True True]]
[[ True True True]
[ True True True]]
[[ True True True]
[ True True True]]
[[ True True True]
[ True True True]]
[[ True True True]
[ True True True]]
[[ True True True]
[ True True True]]
[[ True True True]
[ True True True]]]
I hope it helps.
P.S. I used retain_graph=True because of multiple backward calls.
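On recent PyTorch versions the per-sample Jacobian can also be computed without building the (batch_size, output_dim, batch_size, input_dim) tensor first. The following is a sketch that assumes PyTorch 2.0 or later, where torch.func.jacrev and torch.func.vmap are available (the answer above was written against 1.10.1); the small model is redefined here only so the snippet is self-contained.

```python
import torch
from torch import nn
from torch.func import jacrev, vmap

torch.manual_seed(0)
batch_size, input_dim, hidden_dim, output_dim = 10, 3, 20, 2
model = nn.Sequential(nn.Linear(input_dim, hidden_dim), nn.Tanh(),
                      nn.Linear(hidden_dim, output_dim)).double()
x = torch.rand(batch_size, input_dim, dtype=torch.float64)

# jacrev(model) maps one sample (input_dim,) to its Jacobian (output_dim, input_dim);
# vmap applies it independently to every row of x.
per_sample_jac = vmap(jacrev(model))(x)  # (batch_size, output_dim, input_dim)

# Cross-check against autograd.grad with grad_outputs=ones (the sum over output_dim).
x_req = x.clone().requires_grad_(True)
y = model(x_req)
summed = torch.autograd.grad(y, x_req, grad_outputs=torch.ones_like(y))[0]

print(per_sample_jac.shape)
print(torch.allclose(per_sample_jac.sum(dim=1), summed))
```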

Related

Bi-LSTM with Keras : dimensions must be equal but are 7 and 300

I am creating a BiLSTM with Keras for the first time, but I am having difficulties. So that you understand, here are the steps I have done:
I created an embedding matrix with GloVe for my x:
def create_embeddings(fichier,dictionnaire,dictionnaire_tokens):
    with open(fichier) as file:
        line = file.readline()
        max_words = max(dictionnaire_tokens.values())+1 #1032
        max_size_dimensions = 300
        emb_matrix = np.zeros((max_words,max_size_dimensions))
        for item,count in dictionnaire_tokens.items():
            try:
                vecteur = dictionnaire[item]
            except:
                pass
            if vecteur is not None:
                emb_matrix[count]= vecteur
    return emb_matrix
I did some one hot encoding with my y's;
def one_hot_encoding(file):
    with open(file) as file:
        line = file.readline()
        liste = []
        while line:
            tag = line.split(" ")[1]
            tag = [tag]
            line = file.readline()
            liste.append(tag)
    one_hot = MultiLabelBinarizer()
    array = one_hot.fit_transform(liste)
    return array
I compiled my model with keras
from tensorflow.keras.layers import Bidirectional
model = Sequential()
embedding_layer = Embedding(input_dim=1031 + 1,
output_dim=300,
weights=[embedding_matrix],
trainable=False)
model.add(embedding_layer)
bilstm_layer = Bidirectional(LSTM(units=300, return_sequences=True))
model.add(bilstm_layer)
model.add(Dense(300, activation="relu"))
#crf_layer = CRF(units=len(self.tags), sparse_target=True)
#model.add(crf_layer)
model.compile(optimizer="adam", loss='binary_crossentropy', metrics='acc')
model.summary()
Input of my embedding layer (embedding matrix) :
[[ 0. 0. 0. ... 0. 0. 0. ]
[ 0. 0. 0. ... 0. 0. 0. ]
[ 0. 0. 0. ... 0. 0. 0. ]
...
[-0.068577 -0.71314 0.3898 ... -0.077923 -1.0469 0.56874 ]
[ 0.32461 0.50463 0.72544 ... 0.17634 -0.28961 0.29007 ]
[-0.33771 -0.24912 -0.032685 ... -0.033254 -0.45513 -0.13319 ]]
However, when I try to train the model, I get the following error: ValueError: Dimensions must be equal, but are 7 and 300 for '{{node binary_crossentropy/mul}} = Mul[T=DT_FLOAT](binary_crossentropy/Cast, binary_crossentropy/Log)' with input shapes: [?,7], [?,300,300].
My embedding matrix was made with GloVe 300d, so it has 300 dimensions, while I have only 7 labels. So I have to make my x and y have the same dimensions, but how? Thank you!
keras.backend.clear_session()
from tensorflow.keras.layers import Bidirectional
model = Sequential()
_input = keras.layers.Input(shape=(300,1))
model.add(_input)
bilstm_layer = Bidirectional(LSTM(units=300, return_sequences=False))
model.add(bilstm_layer)
model.add(Dense(7, activation="relu")) #here 7 is the number of classes you have and None is the batch_size
#crf_layer = CRF(units=len(self.tags), sparse_target=True)
#model.add(crf_layer)
model.compile(optimizer="adam", loss='binary_crossentropy', metrics='acc')
model.summary()

How to setup a base model in inference mode?

The Keras documentation about fine-tuning states that it is important to "keep the BatchNormalization layers in inference mode by passing training=False when calling the base model." (Interestingly, every non-official example that I've found on the topic ignores this setting.)
Documentation follows up with example:
from tensorflow import keras
from keras.applications.xception import Xception
base_model = keras.applications.Xception(
weights='imagenet', # Load weights pre-trained on ImageNet.
input_shape=(150, 150, 3),
include_top=False) # Do not include the ImageNet classifier at the top.
base_model.trainable = False
inputs = keras.Input(shape=(150, 150, 3))
scale_layer = keras.layers.Rescaling(scale=1 / 127.5, offset=-1)
x = scale_layer(inputs)
# We make sure that the base_model is running in inference mode here,
# by passing `training=False`. This is important for fine-tuning, as you will
# learn in a few paragraphs.
x = base_model(x, training=False)
x = keras.layers.GlobalAveragePooling2D()(x)
outputs = keras.layers.Dense(1)(x)
model = keras.Model(inputs , outputs)
The thing is that the example adds preprocessing in front of the base model, whereas my model (EfficientNetB3) already includes preprocessing, and I don't know how to call my base_model with `training=False` without prepending an additional layer:
base_model = EfficientNetB3(weights='imagenet', include_top=False, input_shape=input_shape)
base_model.trainable=False
model = Sequential()
model.add(base_model) # How to set base_model training=False?
model.add(GlobalAveragePooling2D())
model.add(Dropout(0.2))
model.add(Dense(10, activation="softmax", name="classifier"))
How to prove that training=False or training=True has an effect:
@Frightera explained to me how to "lock" the model's state, and I wanted to prove to myself that the lock happens by checking the BatchNormalization non-trainable variables. My understanding is that if I call the model with training=True, then it should update the variables. However, this is not the case, or am I missing something?
import tensorflow as tf
from tensorflow import keras
from keras.applications.efficientnet import EfficientNetB3
import numpy as np
class WrappedEffNet(keras.layers.Layer):
    def __init__(self, **kwargs):
        super(WrappedEffNet, self).__init__(**kwargs)
        self.model = EfficientNetB3(weights='imagenet',
                                    include_top=False,
                                    input_shape=(224, 224, 3))
        self.model.trainable=False
    def call(self, x, training=False):
        return self.model(x, training=training) # Modified to pass also True.
base_model_wrapped = WrappedEffNet()
random_vector = tf.random.uniform((1, 224, 224, 3))
o1 = base_model_wrapped(random_vector)
o2 = base_model_wrapped(random_vector, training = False)
# Getting all non-trainable variable values from all BatchNormalization layers.
array_a = np.array([])
for layer in base_model_wrapped.model.layers:
    if hasattr(layer, 'moving_mean'):
        v = layer.moving_mean.numpy()
        np.concatenate([array_a, v])
        v = layer.moving_variance.numpy()
        np.concatenate([array_a, v])
o3 = base_model_wrapped(random_vector, training = True) # Changing to True, shouldn't this update BatchNormalization non-trainable variables?
array_b = np.array([])
for layer in base_model_wrapped.model.layers:
    if hasattr(layer, 'moving_mean'):
        v = layer.moving_mean.numpy()
        np.concatenate([array_b, v])
        v = layer.moving_variance.numpy()
        np.concatenate([array_b, v])
print(np.allclose(array_a, array_b)) # Shouldn't this be False?
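One detail in the check above is independent of the training flag: np.concatenate returns a new array and does not modify its arguments, so array_a and array_b stay empty, and two empty arrays always compare as close. A NumPy-only sketch of that behaviour (the names collected, values and kept are illustrative):

```python
import numpy as np

collected = np.array([])
values = np.array([1.0, 2.0, 3.0])

# The return value is discarded here, so `collected` is still empty afterwards.
np.concatenate([collected, values])
print(collected.size)

# Two empty arrays trivially satisfy allclose.
print(np.allclose(np.array([]), np.array([])))

# Keeping the returned array accumulates the values as intended.
kept = np.concatenate([collected, values])
print(kept)
```

Assigning the result back (array_a = np.concatenate([array_a, v])) would make the comparison meaningful.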
In a Sequential model it is not possible to invoke the call method of the base model the way you can in the Functional API. However, you can treat the model as if it were a custom layer:
class WrappedEffNet(tf.keras.layers.Layer):
    def __init__(self, **kwargs):
        super(WrappedEffNet, self).__init__(**kwargs)
        self.model = keras.applications.EfficientNetB3(weights='imagenet',
                                                       include_top=False,
                                                       input_shape=(224, 224, 3))
        self.model.trainable=False
    def call(self, x, training):
        return self.model(x, training=False)
Sanity check:
base_model_wrapped = WrappedEffNet()
random_vector = tf.random.uniform((1, 224, 224, 3))
o1 = base_model_wrapped(random_vector)
o2 = base_model_wrapped(random_vector, training = False)
o3 = base_model_wrapped(random_vector, training = True)
np.allclose(o1, o2), np.allclose(o1, o3), np.allclose(o2, o3)
# (True, True, True)
It is inference mode regardless of the value of training.
Model summary is the same as Sequential:
Layer (type)                                        Output Shape         Param #
=================================================================
wrapped_eff_net (WrappedEffNet)                     (1, 7, 7, 1536)      10783535
global_average_pooling2d (GlobalAveragePooling2D)   (1, 1536)            0
dropout (Dropout)                                   (1, 1536)            0
classifier (Dense)                                  (1, 10)              15370
=================================================================
Total params: 10,798,905
Trainable params: 15,370
Non-trainable params: 10,783,535
_________________________________________________________________
Edit: In order to see difference of BatchNormalization:
import tensorflow as tf
import numpy as np
x = np.random.randn(1, 2) * 20 + 0.1
bn = tf.keras.layers.BatchNormalization()
input_layer = tf.keras.layers.Input((x.shape[-1], ))
output = bn(input_layer )
model = tf.keras.Model(inputs=input_layer , outputs=output)
model.trainable = False:
model.trainable = False
for i in range(2):
    print('Input:', x)
    print('Moving mean:', model.layers[1].moving_mean.numpy())
    print('training = True -->', model(x, training = True).numpy())
    print('training = False -->', model(x, training = False).numpy())
    print()
Input: [[ 2.50317905 12.44406219]]
Moving mean: [0. 0.]
training = True --> [[ 2.5019286 12.437845 ]]
training = False --> [[ 2.5019286 12.437845 ]]
Input: [[ 2.50317905 12.44406219]]
Moving mean: [0. 0.]
training = True --> [[ 2.5019286 12.437845 ]]
training = False --> [[ 2.5019286 12.437845 ]]
model.trainable = True, training = True:
model.trainable = True
for i in range(2):
    print('Input:', x)
    print('Moving mean:', model.layers[1].moving_mean.numpy())
    print('training = True -->', model(x, training = True).numpy())
    print()
Input: [[ 2.50317905 12.44406219]]
Moving mean: [0. 0.]
training = True --> [[0. 0.]]
Input: [[ 2.50317905 12.44406219]]
Moving mean: [0.02503179 0.12444062]
training = True --> [[0. 0.]]
model.trainable = True, training = False:
model.trainable = True
for i in range(2):
    print('Input:', x)
    print('Moving mean:', model.layers[1].moving_mean.numpy())
    print('training = False -->', model(x, training = False).numpy())
    print()
Input: [[ 2.50317905 12.44406219]]
Moving mean: [0.04981326 0.24763682]
training = False --> [[ 2.476884 12.313342]]
Input: [[ 2.50317905 12.44406219]]
Moving mean: [0.04981326 0.24763682]
training = False --> [[ 2.476884 12.313342]]
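The printed numbers can be reproduced by hand. The NumPy sketch below assumes the Keras BatchNormalization defaults (momentum=0.99, epsilon=1e-3, gamma=1, beta=0) and mimics the two training-mode calls followed by an inference-mode call; it illustrates the update rule and is not the Keras implementation.

```python
import numpy as np

momentum, eps = 0.99, 1e-3  # Keras BatchNormalization defaults (assumed)
x = np.array([[2.50317905, 12.44406219]])
moving_mean, moving_var = np.zeros(2), np.ones(2)

def bn_inference(x, mean, var):
    # gamma = 1 and beta = 0 at initialisation, so they are omitted.
    return (x - mean) / np.sqrt(var + eps)

# Inference output before any statistics update.
before = bn_inference(x, moving_mean, moving_var)
print(before)

# Two calls with trainable=True and training=True update the moving statistics.
for _ in range(2):
    batch_mean, batch_var = x.mean(axis=0), x.var(axis=0)
    moving_mean = momentum * moving_mean + (1 - momentum) * batch_mean
    moving_var = momentum * moving_var + (1 - momentum) * batch_var
    print(moving_mean)

# Inference output after the updates.
after = bn_inference(x, moving_mean, moving_var)
print(after)
```

With a batch of one sample the batch variance is zero and the normalised training output is exactly zero, which matches the [[0. 0.]] rows above.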

CrossEntropyLoss on sequences

I need to compute the torch.nn.CrossEntropyLoss on sequences.
The output tensor y_est has shape: [batch_size, sequence_length, embedding_dim]. The values are embedded as one-hot vectors with embedding_dim dimensions (y_est is not binary however).
The target tensor y has shape: [batch_size, sequence_length] and contains the integer index of the correct class in the range [0, embedding_dim).
If I compute the loss on the two inputs with the shapes described above, I get the error shown at [1].
What I would like to do is described by the loop at [2]. For each sequence in the batch, I would like the sum of the losses computed on each element in the sequence.
After reading the documentation of torch.nn.CrossEntropyLoss I came up with the solution [3], which seems to compute exactly what I want: the losses computed at [2] and [3] are equal.
However, since .permute(.) returns a view of the original tensor, I am afraid it might mess up the backward propagation on the loss. Somewhere (I do not remember where, sorry) I have read that views should not be used in computing the loss.
Is my solution correct?
import torch
batch_size = 5
seq_len = 10
emb_dim = 100
y_est = torch.randn( (batch_size, seq_len, emb_dim))
y = torch.randint(0, emb_dim, (batch_size, seq_len) )
print("y_est, batch x seq x emb:", y_est.shape)
print("y, batch x seq", y.shape)
loss_fn = torch.nn.CrossEntropyLoss(reduction="none")
# [1]
# loss = loss_fn(y_est, y)
# error:
# RuntimeError: Expected target size [5, 100], got [5, 10]
# [2]
loss = 0
for i in range(y_est.shape[1]):
    loss += loss_fn ( y_est[:, i, :], y[:, i]).sum()
print(loss)
# [3]
y_est_2 = torch.permute( y_est, (0, 2, 1))
print("y_est_2", y_est_2.shape)
loss2 = loss_fn(y_est_2, y).sum()
print(loss2)
whose output is:
y_est, batch x seq x emb: torch.Size([5, 10, 100])
y, batch x seq torch.Size([5, 10])
tensor(253.9994)
y_est_2 torch.Size([5, 100, 10])
tensor(253.9994)
Is the solution correct (also for what concerns the backward pass)? Is there a better way?
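If y_est holds per-class scores and you want to compute the loss of a categorical output at each timestep of a sequence against class-probability targets, then y and y_est must have the same shape. To get there, the class indices in y can be expanded to the same dimensions as y_est with one-hot encoding. CrossEntropyLoss expects the class dimension at dim 1, so both tensors are permuted to (batch, classes, seq) before the call; probability targets require PyTorch 1.10 or later.
import torch
batch_size = 5
seq_len = 10
emb_dim = 100
y_est = torch.randn( (batch_size, seq_len, emb_dim))
y = torch.randint(0, emb_dim, (batch_size, seq_len) )
y = torch.nn.functional.one_hot(y, num_classes=emb_dim).type(torch.float)
loss_fn = torch.nn.CrossEntropyLoss()
# Move the class dimension to dim 1: (batch, seq, classes) -> (batch, classes, seq)
loss = loss_fn(y_est.permute(0, 2, 1), y.permute(0, 2, 1))
print(loss)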
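On the backward-pass concern from the question: permute returns a view, but autograd tracks views like any other operation, so gradients flow through it normally. A small check, assuming the same shapes as in the question, compares the gradients of the loop version [2] and the permute version [3]:

```python
import torch

torch.manual_seed(0)
batch_size, seq_len, emb_dim = 5, 10, 100
y = torch.randint(0, emb_dim, (batch_size, seq_len))
loss_fn = torch.nn.CrossEntropyLoss(reduction="none")

# Loop version ([2] in the question).
a = torch.randn(batch_size, seq_len, emb_dim, requires_grad=True)
loss_loop = sum(loss_fn(a[:, i, :], y[:, i]).sum() for i in range(seq_len))
loss_loop.backward()

# Permute version ([3] in the question), same values in a separate leaf tensor.
b = a.detach().clone().requires_grad_(True)
loss_perm = loss_fn(b.permute(0, 2, 1), y).sum()
loss_perm.backward()

print(torch.allclose(loss_loop, loss_perm))
print(torch.allclose(a.grad, b.grad, atol=1e-6))
```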

Pytorch: How to compute IoU (Jaccard Index) for semantic segmentation

Can someone provide a toy example of how to compute IoU (intersection over union) for semantic segmentation in pytorch?
As of 2021, there's no need to implement your own IoU, as torchmetrics comes equipped with it.
It is named torchmetrics.JaccardIndex (previously torchmetrics.IoU) and calculates what you want.
It works with PyTorch and PyTorch Lightning, also with distributed training.
From the documentation:
torchmetrics.JaccardIndex(num_classes, ignore_index=None, absent_score=0.0, threshold=0.5, multilabel=False, reduction='elementwise_mean', compute_on_step=None, **kwargs)
Computes Intersection over union, or Jaccard index calculation:
J(A,B) = \frac{|A\cap B|}{|A\cup B|}
Where: A and B are both tensors of the same size, containing integer class values. They may be subject to conversion from input data (see description below). Note that it is different from box IoU.
Works with binary, multiclass and multi-label data. Accepts probabilities from a model output or integer class values in prediction. Works with multi-dimensional preds and target.
Forward accepts
preds (float or long tensor): (N, ...) or (N, C, ...) where C is the number of classes
target (long tensor): (N, ...) If preds and target
are the same shape and preds is a float tensor, we use the
self.threshold argument to convert into integer labels. This is the case for binary and multi-label probabilities.
If preds has an extra dimension as in the case of multi-class scores we perform an argmax on dim=1.
Official example:
>>> from torchmetrics import JaccardIndex
>>> target = torch.randint(0, 2, (10, 25, 25))
>>> pred = torch.tensor(target)
>>> pred[2:5, 7:13, 9:15] = 1 - pred[2:5, 7:13, 9:15]
>>> jaccard = JaccardIndex(num_classes=2)
>>> jaccard(pred, target)
tensor(0.9660)
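For intuition about what the metric measures, here is a hand-rolled toy version in plain PyTorch. It is only a sketch, not the torchmetrics implementation, and iou_per_class is a hypothetical helper name.

```python
import torch

def iou_per_class(pred, target, n_classes):
    """Per-class IoU for integer label maps of identical shape (hypothetical helper)."""
    ious = []
    for cls in range(n_classes):
        pred_mask = pred == cls
        target_mask = target == cls
        intersection = (pred_mask & target_mask).sum().item()
        union = (pred_mask | target_mask).sum().item()
        # A class absent from both maps has no defined IoU.
        ious.append(float("nan") if union == 0 else intersection / union)
    return torch.tensor(ious)

pred = torch.tensor([[0, 1], [1, 1]])
target = torch.tensor([[0, 1], [0, 1]])
ious = iou_per_class(pred, target, n_classes=2)
print(ious)         # class 0: 1/2, class 1: 2/3
print(ious.mean())  # mean IoU over the two classes
```

Class 0 overlaps on one pixel out of a union of two, and class 1 on two pixels out of a union of three.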
I found this somewhere and adapted it for my use. I'll post the link if I can find it again. Sorry in case this is a duplicate.
The key function here is the function called iou. The wrapping function evaluate_performance is not universal, but it shows that one needs to iterate over all results before computing IoU.
import numpy as np
import torch
import pandas as pd # For filelist reading
import myPytorchDatasetClass # Custom dataset class, inherited from torch.utils.data.dataset
def iou(pred, target, n_classes = 12):
    ious = []
    pred = pred.view(-1)
    target = target.view(-1)
    # Ignore IoU for background class ("0")
    for cls in range(1, n_classes): # This goes from 1:n_classes-1 -> class "0" is ignored
        pred_inds = pred == cls
        target_inds = target == cls
        intersection = pred_inds[target_inds].long().sum().item() # Cast to long to prevent overflows
        union = pred_inds.long().sum().item() + target_inds.long().sum().item() - intersection
        if union == 0:
            ious.append(float('nan')) # If there is no ground truth, do not include in evaluation
        else:
            ious.append(float(intersection) / float(max(union, 1)))
    return np.array(ious)
def evaluate_performance(net):
    # Dataloader for test data
    batch_size = 1
    filelist_name_test = '/path/to/my/test/filelist.txt'
    data_root_test = '/path/to/my/data/'
    dset_test = myPytorchDatasetClass.CustomDataset(filelist_name_test, data_root_test)
    test_loader = torch.utils.data.DataLoader(dataset=dset_test,
                                              batch_size=batch_size,
                                              shuffle=False,
                                              pin_memory=True)
    data_info = pd.read_csv(filelist_name_test, header=None)
    num_test_files = data_info.shape[0]
    sample_size = num_test_files
    # Containers for results
    preds = torch.zeros((sample_size, 60, 36, 60))
    gts = torch.zeros((sample_size, 60, 36, 60))
    dataiter = iter(test_loader)
    for i in range(sample_size):
        images, labels, filename = next(dataiter)
        images = images.cuda()
        gts[i:i+batch_size, :, :, :] = labels
        with torch.no_grad(): # No gradients are needed for evaluation
            outputs = net(images)
        outputs = outputs.permute(0, 2, 3, 4, 1).contiguous()
        val, pred = torch.max(outputs, 4)
        preds[i:i+batch_size, :, :, :] = pred.cpu()
    acc = iou(preds, gts)
    return acc
Say your outputs are of shape [32, 256, 256] (32 is the minibatch size and 256x256 is the image's height and width), and the labels have the same shape.
Then you can use sklearn's jaccard_score after some reshaping (it was called jaccard_similarity_score in older releases; that name was deprecated in scikit-learn 0.21 and later removed).
If both are torch tensors, then:
lbl = labels.cpu().numpy().reshape(-1)
target = output.cpu().numpy().reshape(-1)
Now:
from sklearn.metrics import jaccard_score as jsc
print(jsc(target, lbl, average='macro')) # an explicit average is needed for multiclass labels

Embedding 3D data in Pytorch

I want to implement character-level embedding.
This is usual word embedding.
Word Embedding
Input: [ [‘who’, ‘is’, ‘this’] ]
-> [ [3, 8, 2] ] # (batch_size, sentence_len)
-> // Embedding(Input)
# (batch_size, seq_len, embedding_dim)
This is what I want to do.
Character Embedding
Input: [ [ [‘w’, ‘h’, ‘o’, 0], [‘i’, ‘s’, 0, 0], [‘t’, ‘h’, ‘i’, ‘s’] ] ]
-> [ [ [2, 3, 9, 0], [ 11, 4, 0, 0], [21, 10, 8, 9] ] ] # (batch_size, sentence_len, word_len)
-> // Embedding(Input) # (batch_size, sentence_len, word_len, embedding_dim)
-> // sum each character embeddings # (batch_size, sentence_len, embedding_dim)
The final output shape is the same as for the word embedding, because I want to concatenate them later.
Although I tried it, I am not sure how to implement a 3-D embedding. Do you know how to implement this for such data?
def forward(self, x):
    print('x', x.size()) # (N, seq_len, word_len)
    bs = x.size(0)
    seq_len = x.size(1)
    word_len = x.size(2)
    embd_list = []
    for i, elm in enumerate(x):
        tmp = torch.zeros(1, word_len, self.embd_size)
        for chars in elm:
            tmp = torch.add(tmp, 1.0, self.embedding(chars.unsqueeze(0)))
Above code got an error because output of self.embedding is Variable.
TypeError: torch.add received an invalid combination of arguments - got (torch.FloatTensor, float, Variable), but expected one of:
* (torch.FloatTensor source, float value)
* (torch.FloatTensor source, torch.FloatTensor other)
* (torch.FloatTensor source, torch.SparseFloatTensor other)
* (torch.FloatTensor source, float value, torch.FloatTensor other)
didn't match because some of the arguments have invalid types: (torch.FloatTensor, float, Variable)
* (torch.FloatTensor source, float value, torch.SparseFloatTensor other)
didn't match because some of the arguments have invalid types: (torch.FloatTensor, float, Variable)
Update
I could do this, but the for loops are not efficient for batches. Do you know a more efficient way?
def forward(self, x):
    print('x', x.size()) # (N, seq_len, word_len)
    bs = x.size(0)
    seq_len = x.size(1)
    word_len = x.size(2)
    embd = Variable(torch.zeros(bs, seq_len, self.embd_size))
    for i, elm in enumerate(x): # every sample
        for j, chars in enumerate(elm): # every sentence. [ [‘w’, ‘h’, ‘o’, 0], [‘i’, ‘s’, 0, 0], [‘t’, ‘h’, ‘i’, ‘s’] ]
            chars_embd = self.embedding(chars.unsqueeze(0)) # (N, word_len, embd_size) [‘w’,‘h’,‘o’,0]
            chars_embd = torch.sum(chars_embd, 1) # (N, embd_size). sum each char's embedding
            embd[i,j] = chars_embd[0] # set char_embd as word-like embedding
    x = embd # (N, seq_len, embd_dim)
Update2
This is my final code. Thank you, Wasi Ahmad!
def forward(self, x):
    # x: (N, seq_len, word_len)
    input_shape = x.size()
    bs = x.size(0)
    seq_len = x.size(1)
    word_len = x.size(2)
    x = x.view(-1, word_len) # (N*seq_len, word_len)
    x = self.embedding(x) # (N*seq_len, word_len, embd_size)
    x = x.view(*input_shape, -1) # (N, seq_len, word_len, embd_size)
    x = x.sum(2) # (N, seq_len, embd_size)
    return x
I am assuming you have a 3d tensor of shape BxSxW where:
B = Batch size
S = Sentence length
W = Word length
And you have declared embedding layer as follows.
self.embedding = nn.Embedding(dict_size, emsize)
Where:
dict_size = No. of unique characters in the training corpus
emsize = Expected size of embeddings
So, now you need to convert the 3d tensor of shape BxSxW to a 2d tensor of shape BSxW and give it to the embedding layer.
emb = self.embedding(input_rep.view(-1, input_rep.size(2)))
The shape of emb will be BSxWxE where E is the embedding size. You can convert the resulting 3d tensor to a 4d tensor as follows.
emb = emb.view(*input_rep.size(), -1)
The final shape of emb will be BxSxWxE which is what you are expecting.
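In current PyTorch releases nn.Embedding accepts index tensors of arbitrary shape, so the reshape round-trip is optional. A minimal sketch with made-up sizes, using padding index 0 (as in the question's example) so that padded positions contribute zeros to the sum:

```python
import torch
from torch import nn

torch.manual_seed(0)
dict_size, emsize = 30, 8   # illustrative sizes
B, S, W = 1, 3, 4           # batch, sentence length, word length

embedding = nn.Embedding(dict_size, emsize, padding_idx=0)
chars = torch.tensor([[[2, 3, 9, 0], [11, 4, 0, 0], [21, 10, 8, 9]]])  # (B, S, W)

direct = embedding(chars)                                  # (B, S, W, E)
via_view = embedding(chars.view(-1, W)).view(B, S, W, -1)  # reshape route from the answer

word_like = direct.sum(dim=2)                              # (B, S, E)
print(direct.shape, word_like.shape)
print(torch.equal(direct, via_view))
```

Summing over dim=2 gives the word-like (B, S, E) tensor the question asks for.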
What you are looking for is implemented in allennlp TimeDistributed layer
Here is a demonstration:
from allennlp.modules.time_distributed import TimeDistributed
batch_size = 16
sent_len = 30
word_len = 5
Consider a sentence in input:
sentence = torch.randint(0, char_vocab_size, (batch_size, sent_len, word_len)) # suppose this is your data: integer character indices
Define a char embedding layer (suppose you have also the input padded):
char_embedding = torch.nn.Embedding(char_vocab_size, char_emd_dim, padding_idx=char_pad_idx)
Wrap it!
embedding_sentence = TimeDistributed(char_embedding)(sentence) # shape: batch_size, sent_len, word_len, char_emb_dim
embedding_sentence has shape batch_size, sent_len, word_len, char_emb_dim
Actually, you can easily redefine a module in PyTorch to do this.

Resources