What is the meaning of "p_y_given_x_sentence = s[:, 0, :]" in rnnslu.py - theano

What does p_y_given_x_sentence = s[:, 0, :] in file rnnslu.py mean?
I assume that s is two dimensional. Where does s[:, 0, :] come from?
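For context, a minimal NumPy sketch (not taken from rnnslu.py itself) of what that slice implies: s must actually be three-dimensional, and s[:, 0, :] selects index 0 along the second axis, dropping that axis.
import numpy as np
s = np.arange(24).reshape(4, 2, 3)  # e.g. (n_steps, mini_batch, n_classes) -- names assumed for illustration
print(s.shape)                      # (4, 2, 3)
print(s[:, 0, :].shape)             # (4, 3): index 0 picked along the middle axis, which disappears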

Related

torchmetrics represent uncertainty

I am using torchmetrics to calculate metrics such as F1 score, recall, precision and accuracy in a multilabel classification setting. With randomly initialized weights the softmax output (i.e. the prediction) might look like this with a batch size of 8:
import torch
y_pred = torch.tensor([[0.1944, 0.1931, 0.2184, 0.1968, 0.1973],
                       [0.2182, 0.1932, 0.1945, 0.1973, 0.1968],
                       [0.2182, 0.1932, 0.1944, 0.1973, 0.1969],
                       [0.2182, 0.1931, 0.1945, 0.1973, 0.1968],
                       [0.2184, 0.1931, 0.1944, 0.1973, 0.1968],
                       [0.2181, 0.1932, 0.1941, 0.1970, 0.1976],
                       [0.2183, 0.1932, 0.1944, 0.1974, 0.1967],
                       [0.2182, 0.1931, 0.1945, 0.1973, 0.1968]])
With the correct labels (one-hot encoded):
y_true = torch.tensor([[0, 0, 1, 0, 1],
                       [0, 1, 0, 0, 1],
                       [0, 1, 0, 0, 1],
                       [0, 0, 1, 1, 0],
                       [0, 0, 1, 1, 0],
                       [0, 1, 0, 1, 0],
                       [0, 1, 0, 1, 0],
                       [0, 0, 1, 0, 1]])
And I can calculate the metrics by taking argmax:
import torchmetrics
torchmetrics.functional.f1_score(y_pred.argmax(-1), y_true.argmax(-1))
output:
tensor(0.1250)
The first prediction happens to be correct while the rest are wrong. However, none of the predictive probabilities are above 0.3, which means that the model is generally uncertain about the predictions. I would like to encode this and say that the f1 score should be 0.0 because none of the predictive probabilities are above a 0.3 threshold.
Is this possible with torchmetrics or sklearn library?
Is this common practice?
You need to threshold your predictions before passing them to your torchmetrics metrics, for example:
t0, t1, mask_gt = batch
mask_pred = self.forward(t0, t1)
loss = self.criterion(mask_pred.squeeze().float(), mask_gt.squeeze().float())
mask_pred = torch.sigmoid(mask_pred).squeeze()
mask_pred = torch.where(mask_pred > 0.5, 1, 0)
# integers to comply with metrics input type
mask_pred = mask_pred.long()
mask_gt = mask_gt.long()
f1_score = self.f1(mask_pred, mask_gt)
precision = self.precision_(mask_pred, mask_gt)
recall = self.recall(mask_pred, mask_gt)
jaccard = self.jaccard(mask_pred, mask_gt)
The torchmetrics objects are defined as:
self.f1 = F1Score(num_classes=2, average='macro', mdmc_average='samplewise')
self.recall = Recall(num_classes=2, average='macro', mdmc_average='samplewise')
self.precision_ = Precision(num_classes=2, average='macro', mdmc_average='samplewise') # self.precision exists in torch.nn.Module. Hence '_' symbol
self.jaccard = JaccardIndex(num_classes=2)
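Applied to the multi-class softmax example from the question, here is a minimal sketch of the same idea using sklearn (the 0.3 cut-off and the -1 sentinel label are illustrative assumptions, not part of the answer above): rows whose maximum probability does not exceed the threshold are mapped to a label that never occurs in y_true, so they count as errors.
from sklearn.metrics import f1_score

threshold = 0.3                                # assumed confidence cut-off from the question
pred_labels = y_pred.argmax(-1)
low_conf = y_pred.max(-1).values <= threshold  # rows the model is not confident about
pred_labels[low_conf] = -1                     # sentinel label: guaranteed mismatch with y_true
print(f1_score(y_true.argmax(-1).numpy(), pred_labels.numpy(), average='micro'))  # 0.0 here, since every row is below the threshold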

Parameters of my network on PyTorch are not updated

I want to build an auto-calibration system using PyTorch.
I am trying to use a homogeneous transform matrix as the weights of a neural network.
I wrote the code below referring to the PyTorch tutorials, but my custom parameters are not updated after the backward method is called.
When I print the 'grad' attribute of each parameter, it is None.
My code is below. Is there anything wrong?
Please give me any advice. Thank you.
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import numpy as np

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.params = nn.Parameter(torch.rand(6))
        self.rx, self.ry, self.rz = self.params[0], self.params[1], self.params[2]
        self.tx, self.ty, self.tz = self.params[3], self.params[4], self.params[5]

    def forward(self, x):
        tr_mat = torch.tensor([[1, 0, 0, self.params[3]],
                               [0, 1, 0, self.params[4]],
                               [0, 0, 1, self.params[5]],
                               [0, 0, 0, 1]], requires_grad=True)
        rz_mat = torch.tensor([[torch.cos(self.params[2]), -torch.sin(self.params[2]), 0, 0],
                               [torch.sin(self.params[2]), torch.cos(self.params[2]), 0, 0],
                               [0, 0, 1, 0],
                               [0, 0, 0, 1]], requires_grad=True)
        ry_mat = torch.tensor([[torch.cos(self.params[1]), 0, torch.sin(self.params[1]), 0],
                               [0, 1, 0, 0],
                               [-torch.sin(self.params[1]), 0, torch.cos(self.params[1]), 0],
                               [0, 0, 0, 1]], requires_grad=True)
        rx_mat = torch.tensor([[1, 0, 0, 0],
                               [0, torch.cos(self.params[0]), -torch.sin(self.params[0]), 0],
                               [0, torch.sin(self.params[0]), torch.cos(self.params[0]), 0],
                               [0, 0, 0, 1]], requires_grad=True)
        tf1 = torch.matmul(tr_mat, rz_mat)
        tf2 = torch.matmul(tf1, ry_mat)
        tf3 = torch.matmul(tf2, rx_mat)
        tr_local = torch.tensor([[1, 0, 0, x[0]],
                                 [0, 1, 0, x[1]],
                                 [0, 0, 1, x[2]],
                                 [0, 0, 0, 1]])
        tf_output = torch.matmul(tf3, tr_local)
        output = tf_output[:3, 3]
        return output

    def get_loss(self, output):
        pass

model = Net()

input_ex = np.array([[-0.01, 0.05, 0.92],
                     [-0.06, 0.03, 0.94]])
output_ex = np.array([[-0.3, 0.4, 0.09],
                      [-0.5, 0.2, 0.07]])

print(list(model.parameters()))

optimizer = optim.Adam(model.parameters(), 0.001)
criterion = nn.MSELoss()

for input_np, label_np in zip(input_ex, output_ex):
    input_tensor = torch.from_numpy(input_np).float()
    label_tensor = torch.from_numpy(label_np).float()
    output = model(input_tensor)
    optimizer.zero_grad()
    loss = criterion(output, label_tensor)
    loss.backward()
    optimizer.step()

print(list(model.parameters()))
What happens
Your problem is related to PyTorch's implicit conversion of torch.tensor to float. Let's say you have this:
tr_mat = torch.tensor(
[
[1, 0, 0, self.params[3]],
[0, 1, 0, self.params[4]],
[0, 0, 1, self.params[5]],
[0, 0, 0, 1],
],
requires_grad=True,
)
torch.tensor can only be constructed from a list of Python-like values; it cannot have torch.tensor objects inside it. What happens under the hood is that every element of self.params which can be converted to float is converted (in this case all of them can, e.g. self.params[3], self.params[4], self.params[5]).
When a tensor's value is cast to float, its value is copied into the Python counterpart, hence it is not part of the computational graph anymore; it is a new, pure Python value (which obviously cannot be backpropagated).
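A quick way to see this (a small sketch, not from the original post): the matrix built with torch.tensor has no grad_fn, while an ordinary operation on the parameter does.
import torch

p = torch.nn.Parameter(torch.rand(6))
m = torch.tensor([[1., 0., 0., p[3]],
                  [0., 1., 0., p[4]],
                  [0., 0., 1., p[5]],
                  [0., 0., 0., 1.]])
print(m.grad_fn)        # None: the parameter values were copied out as plain floats
print((p * 2).grad_fn)  # a MulBackward0 node: normal tensor ops stay in the graph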
Solution
What you can do is choose elements of your self.params and insert them into eye matrices so the gradient flows. You can see a rewrite of your forward method taking this into account:
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.params = nn.Parameter(torch.randn(6))

    def forward(self, x):
        # precompute sines and cosines of the six parameters
        sinus = torch.sin(self.params)
        cosinus = torch.cos(self.params)

        tr_mat = torch.eye(4)
        tr_mat[:-1, -1] = self.params[3:]

        rz_mat = torch.eye(4)
        rz_mat[0, 0] = cosinus[2]
        rz_mat[0, 1] = -sinus[2]
        rz_mat[1, 0] = sinus[2]
        rz_mat[1, 1] = cosinus[2]

        ry_mat = torch.eye(4)
        ry_mat[0, 0] = cosinus[1]
        ry_mat[0, 2] = sinus[1]
        ry_mat[2, 0] = -sinus[1]
        ry_mat[2, 2] = cosinus[1]

        rx_mat = torch.eye(4)
        rx_mat[1, 1] = cosinus[0]
        rx_mat[1, 2] = -sinus[0]
        rx_mat[2, 1] = sinus[0]
        rx_mat[2, 2] = cosinus[0]

        tf1 = torch.matmul(tr_mat, rz_mat)
        tf2 = torch.matmul(tf1, ry_mat)
        tf3 = torch.matmul(tf2, rx_mat)

        tr_local = torch.tensor(
            [[1, 0, 0, x[0]], [0, 1, 0, x[1]], [0, 0, 1, x[2]], [0, 0, 0, 1]],
        )

        tf_output = torch.matmul(tf3, tr_local)
        output = tf_output[:3, 3]
        return output
(You may want to double-check this rewrite, but the idea holds.)
Also notice that tr_local can be built "your way", since none of its values needs to carry a gradient.
requires_grad
You can see that requires_grad wasn't used anywhere in the code. That's because what requires a gradient is not the whole eye matrix (we will not optimize the 0s and 1s), but the parameters which are inserted into it. Usually you don't need requires_grad at all in your neural network code, because:
input tensors are not optimized (usually; they could be, e.g. when you are doing adversarial attacks)
nn.Parameter requires gradient by default (unless frozen)
layers and other neural-network-specific modules require gradient by default (unless frozen)
values which don't need gradient (input tensors) going through layers which do require it (or parameters, etc.) can still be backpropagated through
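As a quick sanity check of the rewrite (a sketch under the same setup as above, not part of the original answer), the parameter gradient is no longer None after a backward pass:
model = Net()                                   # the rewritten Net from above
out = model(torch.tensor([-0.01, 0.05, 0.92]))  # one sample from input_ex
out.sum().backward()                            # any scalar loss will do for the check
print(model.params.grad)                        # a 6-element tensor, no longer None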

Cannot get the expected target shape in keras

I am new to sequential models. I am working on an image caption generator with an attention model in Keras.
I keep getting an error about the expected target shape in Keras. I have worked on basic models before, and for this kind of error there is usually a mistake in the way I process my dataset.
However, in this case I have tried adjusting the shape of my 'y' array by unpacking it, and also tried packing my 'outputs' list, but neither changes the error message.
def model(photo_shape, max_len, n_s, vocab_size):
    outputs = list()
    seq = Input(shape=(max_len,), name='inseq')  # max_len is 33
    x = Embedding(vocab_size, 300, mask_zero=True)(seq)
    p = Input(shape=(photo_shape[0], photo_shape[1]), name='picture')
    s, _, c = LSTM(n_s, return_state=True)(p)
    for t in range(max_len):
        context = attention(p, s)
        word = Lambda(lambda x: x[:, t, :])(x)
        context = concat([word, context])
        context = reshape(context)
        s, _, c = lstm(context)
        out = den2(s)  # return a dense layer with vocab_size units (none, 2791)
        outputs.append(out)
    # print(np.array(outputs).shape) => returns (33,)
    model = Model(inputs=[seq, p], outputs=outputs)
    return model
# the following method goes to a generator function.
def build_sequences(tokenizer, max_length, desc_list, photo):
    X1, X2, y = list(), list(), list()
    desc = desc_list[0]
    seq = tokenizer.texts_to_sequences([desc])[0]
    l = len(seq)
    in_seq, out_seq = seq[:l-1], seq[1:l]
    in_seq = pad_sequences([in_seq], padding='post', maxlen=max_length)[0]
    out_seq = [to_categorical([w], num_classes=vocab_size)[0] for w in out_seq]
    out_seq = pad_sequences([out_seq], padding='post', maxlen=max_length)[0]
    X1.append(in_seq)
    X2.append(photo)
    y.append(out_seq)
    return np.array(X1), np.array(X2), np.array(y)
The error I get is this.
ValueError: Error when checking model target: the list of Numpy arrays that you are passing to your model is not the size the model expected. Expected to see 33 array(s), but instead got the following list of 1 arrays: [array([[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0],
...,
[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0,...
The debug messages show the following shapes for my inputs and outputs.
in_seq: (1, 33)
photo: (1, 196, 512)
out_seq: (1, 33, 2971)
The first dimension is the batch size, which should not be taken into consideration. So I do not know why the 33 is not visible to Keras. I have tried modifying this shape, but logically, should this not work?
Please let me know if this is a data processing error or a problem with my model structure!
Let me know if more code is required.
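For reference, here is a minimal sketch (an illustration of what the error message asks for, not a confirmed fix) of the target format a Keras model with a list of 33 outputs expects: a list of 33 arrays, one per time step, rather than a single (1, 33, 2971) array.
import numpy as np

y = np.zeros((1, 33, 2971))                       # shape reported in the debug output above
y_list = [y[:, t, :] for t in range(y.shape[1])]  # 33 arrays of shape (1, 2971)
print(len(y_list), y_list[0].shape)               # 33 (1, 2971)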

Encounter the RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation

I'm getting the following error when calling .backward():
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation
Here's the code:
for i, j, k in zip(X, Y, Z):
    A[:, i, j] = A[:, i, j] + k
I've tried .clone(), torch.add(), and so on.
Please help!
After the comments I'm a bit confused about what you want to accomplish. The code you gave gives me an error using the dimensions you provided in the comments:
Traceback (most recent call last):
A[:, i, j] = A[:, i, j] + k
RuntimeError: The size of tensor a (32) must match the size of tensor b (200) at non-singleton dimension 0
But here's what I think you want to do, please correct me in the comments if this is wrong...
Given tensors X, Y, and Z, each entry of X, Y, and Z corresponds to a coordinate (x, y) and a value z. What you want is to add z to A at coordinate (x, y). In most cases the batch dimension is kept independent, although it's not clear that's the case in the code you posted. For now that's what I'll assume you want to do.
For example, let's say A has shape 3x4x5, X and Y have shape 3x3, and Z has shape 3x3x1. For this example, assume A contains all zeros to start, and X, Y, and Z have the following values:
X = tensor([[1, 2, 3],
[1, 2, 3],
[2, 2, 2]])
Y = tensor([[1, 2, 3],
[1, 2, 3],
[1, 1, 1]])
Z = tensor([[[0.1], [0.2], [0.3]],
[[0.4], [0.5], [0.6]],
[[0.7], [0.8], [0.9]]])
Then we would expect A to have the following values after the operation
A = tensor([[[0, 0, 0, 0, 0],
[0, 0.1, 0, 0, 0],
[0, 0, 0.2, 0, 0],
[0, 0, 0, 0.3, 0]],
[[0, 0, 0, 0, 0],
[0, 0.4, 0, 0, 0],
[0, 0, 0.5, 0, 0],
[0, 0, 0, 0.6, 0]],
[[0, 0, 0, 0, 0],
[0, 0, 0, 0, 0],
[0, 2.4, 0, 0, 0],
[0, 0, 0, 0, 0]]])
To accomplish this we can make use of the index_add function, which allows us to add values at a list of indices. Since it only supports 1-dimensional indexing, we first need to convert X and Y to linear indices into the flattened tensor A. Afterwards we can un-flatten back to the original shape.
layer_size = A.shape[1] * A.shape[2]
index_offset = torch.arange(0, A.shape[0] * layer_size, layer_size).unsqueeze(1)
indices = (X * A.shape[2] + Y) + index_offset
A = A.view(-1).index_add(0, indices.view(-1), Z.view(-1)).view(A.shape)
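For completeness, a small end-to-end sketch that builds the example tensors from above and runs the snippet (only the literal values are restated here):
import torch

A = torch.zeros(3, 4, 5)
X = torch.tensor([[1, 2, 3], [1, 2, 3], [2, 2, 2]])
Y = torch.tensor([[1, 2, 3], [1, 2, 3], [1, 1, 1]])
Z = torch.tensor([[[0.1], [0.2], [0.3]],
                  [[0.4], [0.5], [0.6]],
                  [[0.7], [0.8], [0.9]]])

layer_size = A.shape[1] * A.shape[2]
index_offset = torch.arange(0, A.shape[0] * layer_size, layer_size).unsqueeze(1)
indices = (X * A.shape[2] + Y) + index_offset
A = A.view(-1).index_add(0, indices.view(-1), Z.view(-1)).view(A.shape)
print(A)  # matches the expected tensor above, including the accumulated 2.4 at [2, 2, 1]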

Not able to use Stratified-K-Fold on multi label classifier

The following code is used to do K-Fold validation, but I am unable to train the model as it is throwing the error
ValueError: Error when checking target: expected dense_14 to have shape (7,) but got array with shape (1,)
My target variable has 7 classes. I am using LabelEncoder to encode the classes into numbers.
Seeing this error, if I switch to MultiLabelBinarizer to encode the classes instead, I get the following error
ValueError: Supported target types are: ('binary', 'multiclass'). Got 'multilabel-indicator' instead.
The following is the code for KFold validation
skf = StratifiedKFold(n_splits=10, shuffle=True)
scores = np.zeros(10)
idx = 0
for index, (train_indices, val_indices) in enumerate(skf.split(X, y)):
    print("Training on fold " + str(index+1) + "/10...")
    # Generate batches from indices
    xtrain, xval = X[train_indices], X[val_indices]
    ytrain, yval = y[train_indices], y[val_indices]
    model = None
    model = load_model()  # defined above
    scores[idx] = train_model(model, xtrain, ytrain, xval, yval)
    idx += 1
print(scores)
print(scores.mean())
I don't know what to do. I want to use Stratified K Fold on my model. Please help me.
MultiLabelBinarizer returns a vector whose length is your number of classes.
If you look at how StratifiedKFold splits your dataset, you will see that it only accepts a one-dimensional target variable, whereas you are trying to pass a target variable with dimensions [n_samples, n_classes].
A stratified split basically preserves your class distribution, and if you think about it, that does not make a lot of sense for a multi-label classification problem.
If you want to preserve the distribution in terms of the different combinations of classes in your target variable, then the answer here explains two ways in which you can define your own stratified split function.
UPDATE:
The logic is something like this:
Assume you have n classes and your target variable is a combination of these n classes. Then there are (2^n) - 1 possible combinations (not counting all zeros). You can now create a new target variable by treating each combination as a new label.
For example, if n=3, you will have 7 unique combinations:
1. [1, 0, 0]
2. [0, 1, 0]
3. [0, 0, 1]
4. [1, 1, 0]
5. [1, 0, 1]
6. [0, 1, 1]
7. [1, 1, 1]
Map all your labels to this new target variable. You can now look at your problem as simple multi-class classification, instead of multi-label classification.
Now you can directly use StratifiedKFold with y_new as your target. Once the splits are done, you can map your labels back.
Code sample:
import numpy as np
np.random.seed(1)
y = np.random.randint(0, 2, (10, 7))
y = y[np.where(y.sum(axis=1) != 0)[0]]
OUTPUT:
array([[1, 1, 0, 0, 1, 1, 1],
[1, 1, 0, 0, 1, 0, 1],
[1, 0, 0, 1, 0, 0, 0],
[1, 0, 0, 1, 0, 0, 0],
[1, 0, 0, 0, 1, 1, 1],
[1, 1, 0, 0, 0, 1, 1],
[1, 1, 1, 1, 0, 1, 1],
[0, 0, 1, 0, 0, 1, 1],
[1, 0, 1, 0, 0, 1, 1],
[0, 1, 1, 1, 1, 0, 0]])
Label encode your class vectors:
from sklearn.preprocessing import LabelEncoder
def get_new_labels(y):
    y_new = LabelEncoder().fit_transform([''.join(str(l)) for l in y])
    return y_new
y_new = get_new_labels(y)
OUTPUT:
array([7, 6, 3, 3, 2, 5, 8, 0, 4, 1])
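A minimal sketch of the last step (the feature matrix X_feats and the fold count are illustrative assumptions, not from the question): split on the combination labels y_new, then index the original multi-label y for training. Note that each combination must occur at least n_splits times for a clean stratified split, so the tiny 10-row example above would trigger a warning.
import numpy as np
from sklearn.model_selection import StratifiedKFold

X_feats = np.random.rand(len(y), 20)  # placeholder features, not part of the question
skf = StratifiedKFold(n_splits=2, shuffle=True, random_state=1)
for train_idx, val_idx in skf.split(X_feats, y_new):
    xtrain, xval = X_feats[train_idx], X_feats[val_idx]
    ytrain, yval = y[train_idx], y[val_idx]  # map back to the original one-hot rows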

Resources