Suppose I have the following embeddings emb_user = torch.randn(64, 128, 256). From the second dimension (of length 128), I wish to pick out 16 at random at each instance. I was wondering if there was a more efficient way of doing the following:
idx = torch.multinomial(torch.ones(64, 128), 16)
sampled_emb_user = emb_user[torch.arange(len(emb_user)).unsqueeze(-1), idx]
What I also find curios is that the above multinomial would not work if the weight matrix (torch.ones(64, 128)) exceeded more than 2 dimensions.
Since in your case you want an uniform distribution you could speed it up with
idx = torch.sort(torch.randint(
0, 128 - 15, (64, 16), device=device
), axis=1).values + torch.arange(0, 16, device=device).reshape(1, -1)
sampled_emb_user = emb_user[torch.arange(len(emb_user)).unsqueeze(-1), idx]
Instead of
idx = torch.multinomial(torch.ones(64, 128, device=device), 16)
sampled_emb_user = emb_user[torch.arange(len(emb_user)).unsqueeze(-1), idx]
The runtimes on my machine are 427 µs and 784 µs with device='cpu'; 135 µs and 260 µs and 469 µs with device='cuda'.
How it works?
The sorted randint gives the indices for a multinomial distribution with replacement. That is increasing, adding the arange term makes it strictly increasing, thus eliminates the replacements.
Illustrating with a small case
idx = torch.sort(torch.randint(0, 7, (4,))).values
print('Indices with replacement in the range from 0 to 6: ', idx)
print('Indices without replacement in the slice: ', idx + torch.arange(4))
Indices with replacement in the range from 0 to 6: tensor([0, 5, 5, 6])
Indices without replacement in the slice: tensor([0, 6, 7, 9])
A possibly faster solution, but not from exactly the same distribution is the following:
idx = torch.cumsum(torch.diff(
torch.sort(torch.randint(
0, 128 - 16, (64, 17), device=device
), axis=1).values
, axis=1) + 1, axis=1) - 1
sampled_emb_user = emb_user[torch.arange(len(emb_user)).unsqueeze(-1), idx]
One more way, I expect to be closer to the exact method, not very rigorously analyzed.
# 1-rand() to include 1 and exclude zero.
d = torch.cumsum(1 - torch.rand(64, 17, device=device
), axis=1)
# this produces a sorted tensor with values in the range [0:128-16]
d = (((128 - 15) * d[:, :-1]) / d[:, -1:]).to(torch.long)
idx = d + torch.arange(0, 16, device=device).reshape(1, -1)
But in the end it tends to be slower than the method using sort.
Hi I'm student who just started for deep learning.
For example, I have 1-D tensor x = [ 1 , 2]. From this one, I hope to make 2D tensor y whose (i,j)th element has value (x[i] - x[j]), i.e y[0,:] = [0 , 1] , y[1,:]=[ -1 , 0].
Is there built-in function like this in pytorch library?
Thanks.
Here you need right dim of tensor to get expected result which you can get using torch.unsqueeze
x = torch.tensor([1 , 2])
y = x - x.unsqueeze(1)
y
tensor([[ 0, 1],
[-1, 0]])
There are a few ways you could get this result, the cleanest I can think of is using broadcasting semantics.
x = torch.tensor([1, 2])
y = x.view(-1, 1) - x.view(1, -1)
which produces
y = tensor([[0, -1],
[1, 0]])
Note I'll try to edit this answer and remove this note if the original question is clarified.
In your question you ask for y[i, j] = x[i] - x[j], which the above code produces.
You also say that you expect y to have values
y = tensor([[ 0, 1],
[-1, 0]])
which is actually y[i, j] = x[j] - x[i] as was posted in Dishin's answer. If you instead wanted the latter then you can use
y = x.view(1, -1) - x.view(-1, 1)
I am using a DNN provided by tflearn to learn from some data. My data variable has a shape of (6605, 32) and my labels data has a shape of (6605,) which I reshape in the code below to (6605, 1)...
# Target label used for training
labels = np.array(data[label], dtype=np.float32)
# Reshape target label from (6605,) to (6605, 1)
labels = tf.reshape(labels, shape=[-1, 1])
# Data for training minus the target label.
data = np.array(data.drop(label, axis=1), dtype=np.float32)
# DNN
net = tflearn.input_data(shape=[None, 32])
net = tflearn.fully_connected(net, 32)
net = tflearn.fully_connected(net, 32)
net = tflearn.fully_connected(net, 1, activation='softmax')
net = tflearn.regression(net)
# Define model.
model = tflearn.DNN(net)
model.fit(data, labels, n_epoch=10, batch_size=16, show_metric=True)
This gives me a couple of errors, the first is...
tensorflow.python.framework.errors_impl.InvalidArgumentError: Shape must be rank 1 but is rank 2 for 'strided_slice' (op: 'StridedSlice') with input shapes: [6605,1], [1,16], [1,16], [1].
...and the second is...
During handling of the above exception, another exception occurred:
ValueError: Shape must be rank 1 but is rank 2 for 'strided_slice' (op: 'StridedSlice') with input shapes: [6605,1], [1,16], [1,16], [1].
I have no idea what rank 1 and rank 2 are, so I do not have an idea as to how to fix this issue.
In Tensorflow, rank is the number of dimensions of a tensor (not similar to the matrix rank). As an example, following tensor has a rank of 2.
t1 = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(t1.shape) # prints (3, 3)
Moreover, following tensor has a rank of 3.
t2 = np.array([[[1, 1, 1], [2, 2, 2]], [[3, 3, 3], [4, 4, 4]]])
print(t2.shape) # prints (2, 2, 3)
Since tflearn is build on top of Tensorflow, inputs should not be tensors. I have modified your code as follows and commented where necessary.
# Target label used for training
labels = np.array(data[label], dtype=np.float32)
# Reshape target label from (6605,) to (6605, 1)
labels =np.reshape(labels,(-1,1)) #makesure the labels has the shape of (?,1)
# Data for training minus the target label.
data = np.array(data.drop(label, axis=1), dtype=np.float32)
data = np.reshape(data,(-1,32)) #makesure the data has the shape of (?,32)
# DNN
net = tflearn.input_data(shape=[None, 32])
net = tflearn.fully_connected(net, 32)
net = tflearn.fully_connected(net, 32)
net = tflearn.fully_connected(net, 1, activation='softmax')
net = tflearn.regression(net)
# Define model.
model = tflearn.DNN(net)
model.fit(data, labels, n_epoch=10, batch_size=16, show_metric=True)
Hope this helps.
I am trying to implement a sequence-to-sequence task using LSTM by Keras with the TensorFlow backend. The inputs are English sentences with variable lengths. To construct a dataset with 2-D shape [batch_number, max_sentence_length], I add EOF at the end of the line and pad each sentence with enough placeholders, e.g. #. And then each character in the sentence is transformed into a one-hot vector, so that the dataset has 3-D shape [batch_number, max_sentence_length, character_number]. After LSTM encoder and decoder layers, softmax cross-entropy between output and target is computed.
To eliminate the padding effect in model training, masking could be used on input and loss function. Mask input in Keras can be done by using layers.core.Masking. In TensorFlow, masking on loss function can be done as follows: custom masked loss function in TensorFlow.
However, I don't find a way to realize it in Keras, since a user-defined loss function in Keras only accepts parameters y_true and y_pred. So how to input true sequence_lengths to loss function and mask?
Besides, I find a function _weighted_masked_objective(fn) in \keras\engine\training.py. Its definition is
Adds support for masking and sample-weighting to an objective function.
But it seems that the function can only accept fn(y_true, y_pred). Is there a way to use this function to solve my problem?
To be specific, I modify the example of Yu-Yang.
from keras.models import Model
from keras.layers import Input, Masking, LSTM, Dense, RepeatVector, TimeDistributed, Activation
import numpy as np
from numpy.random import seed as random_seed
random_seed(123)
max_sentence_length = 5
character_number = 3 # valid character 'a, b' and placeholder '#'
input_tensor = Input(shape=(max_sentence_length, character_number))
masked_input = Masking(mask_value=0)(input_tensor)
encoder_output = LSTM(10, return_sequences=False)(masked_input)
repeat_output = RepeatVector(max_sentence_length)(encoder_output)
decoder_output = LSTM(10, return_sequences=True)(repeat_output)
output = Dense(3, activation='softmax')(decoder_output)
model = Model(input_tensor, output)
model.compile(loss='categorical_crossentropy', optimizer='adam')
model.summary()
X = np.array([[[0, 0, 0], [0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 1, 0]],
[[0, 0, 0], [0, 1, 0], [1, 0, 0], [0, 1, 0], [0, 1, 0]]])
y_true = np.array([[[0, 0, 1], [0, 0, 1], [1, 0, 0], [0, 1, 0], [0, 1, 0]], # the batch is ['##abb','#babb'], padding '#'
[[0, 0, 1], [0, 1, 0], [1, 0, 0], [0, 1, 0], [0, 1, 0]]])
y_pred = model.predict(X)
print('y_pred:', y_pred)
print('y_true:', y_true)
print('model.evaluate:', model.evaluate(X, y_true))
# See if the loss computed by model.evaluate() is equal to the masked loss
import tensorflow as tf
logits=tf.constant(y_pred, dtype=tf.float32)
target=tf.constant(y_true, dtype=tf.float32)
cross_entropy = tf.reduce_mean(-tf.reduce_sum(target * tf.log(logits),axis=2))
losses = -tf.reduce_sum(target * tf.log(logits),axis=2)
sequence_lengths=tf.constant([3,4])
mask = tf.reverse(tf.sequence_mask(sequence_lengths,maxlen=max_sentence_length),[0,1])
losses = tf.boolean_mask(losses, mask)
masked_loss = tf.reduce_mean(losses)
with tf.Session() as sess:
c_e = sess.run(cross_entropy)
m_c_e=sess.run(masked_loss)
print("tf unmasked_loss:", c_e)
print("tf masked_loss:", m_c_e)
The output in Keras and TensorFlow are compared as follows:
As shown above, masking is disabled after some kinds of layers. So how to mask the loss function in Keras when those layers are added?
If there's a mask in your model, it'll be propagated layer-by-layer and eventually applied to the loss. So if you're padding and masking the sequences in a correct way, the loss on the padding placeholders would be ignored.
Some Details:
It's a bit involved to explain the whole process, so I'll just break it down to several steps:
In compile(), the mask is collected by calling compute_mask() and applied to the loss(es) (irrelevant lines are ignored for clarity).
weighted_losses = [_weighted_masked_objective(fn) for fn in loss_functions]
# Prepare output masks.
masks = self.compute_mask(self.inputs, mask=None)
if masks is None:
masks = [None for _ in self.outputs]
if not isinstance(masks, list):
masks = [masks]
# Compute total loss.
total_loss = None
with K.name_scope('loss'):
for i in range(len(self.outputs)):
y_true = self.targets[i]
y_pred = self.outputs[i]
weighted_loss = weighted_losses[i]
sample_weight = sample_weights[i]
mask = masks[i]
with K.name_scope(self.output_names[i] + '_loss'):
output_loss = weighted_loss(y_true, y_pred,
sample_weight, mask)
Inside Model.compute_mask(), run_internal_graph() is called.
Inside run_internal_graph(), the masks in the model is propagated layer-by-layer from the model's inputs to outputs by calling Layer.compute_mask() for each layer iteratively.
So if you're using a Masking layer in your model, you shouldn't worry about the loss on the padding placeholders. The loss on those entries will be masked out as you've probably already seen inside _weighted_masked_objective().
A Small Example:
max_sentence_length = 5
character_number = 2
input_tensor = Input(shape=(max_sentence_length, character_number))
masked_input = Masking(mask_value=0)(input_tensor)
output = LSTM(3, return_sequences=True)(masked_input)
model = Model(input_tensor, output)
model.compile(loss='mae', optimizer='adam')
X = np.array([[[0, 0], [0, 0], [1, 0], [0, 1], [0, 1]],
[[0, 0], [0, 1], [1, 0], [0, 1], [0, 1]]])
y_true = np.ones((2, max_sentence_length, 3))
y_pred = model.predict(X)
print(y_pred)
[[[ 0. 0. 0. ]
[ 0. 0. 0. ]
[-0.11980877 0.05803877 0.07880752]
[-0.00429189 0.13382857 0.19167568]
[ 0.06817091 0.19093043 0.26219055]]
[[ 0. 0. 0. ]
[ 0.0651961 0.10283815 0.12413475]
[-0.04420842 0.137494 0.13727818]
[ 0.04479844 0.17440712 0.24715884]
[ 0.11117355 0.21645413 0.30220413]]]
# See if the loss computed by model.evaluate() is equal to the masked loss
unmasked_loss = np.abs(1 - y_pred).mean()
masked_loss = np.abs(1 - y_pred[y_pred != 0]).mean()
print(model.evaluate(X, y_true))
0.881977558136
print(masked_loss)
0.881978
print(unmasked_loss)
0.917384
As can be seen from this example, the loss on the masked part (the zeroes in y_pred) is ignored, and the output of model.evaluate() is equal to masked_loss.
EDIT:
If there's a recurrent layer with return_sequences=False, the mask stop propagates (i.e., the returned mask is None). In RNN.compute_mask():
def compute_mask(self, inputs, mask):
if isinstance(mask, list):
mask = mask[0]
output_mask = mask if self.return_sequences else None
if self.return_state:
state_mask = [None for _ in self.states]
return [output_mask] + state_mask
else:
return output_mask
In your case, if I understand correctly, you want a mask that's based on y_true, and whenever the value of y_true is [0, 0, 1] (the one-hot encoding of "#") you want the loss to be masked. If so, you need to mask the loss values in a somewhat similar way to Daniel's answer.
The main difference is the final average. The average should be taken over the number of unmasked values, which is just K.sum(mask). And also, y_true can be compared to the one-hot encoded vector [0, 0, 1] directly.
def get_loss(mask_value):
mask_value = K.variable(mask_value)
def masked_categorical_crossentropy(y_true, y_pred):
# find out which timesteps in `y_true` are not the padding character '#'
mask = K.all(K.equal(y_true, mask_value), axis=-1)
mask = 1 - K.cast(mask, K.floatx())
# multiply categorical_crossentropy with the mask
loss = K.categorical_crossentropy(y_true, y_pred) * mask
# take average w.r.t. the number of unmasked entries
return K.sum(loss) / K.sum(mask)
return masked_categorical_crossentropy
masked_categorical_crossentropy = get_loss(np.array([0, 0, 1]))
model = Model(input_tensor, output)
model.compile(loss=masked_categorical_crossentropy, optimizer='adam')
The output of the above code then shows that the loss is computed only on the unmasked values:
model.evaluate: 1.08339476585
tf unmasked_loss: 1.08989
tf masked_loss: 1.08339
The value is different from yours because I've changed the axis argument in tf.reverse from [0,1] to [1].
If you're not using masks as in Yu-Yang's answer, you can try this.
If you have your target data Y with length and padded with the mask value, you can:
import keras.backend as K
def custom_loss(yTrue,yPred):
#find which values in yTrue (target) are the mask value
isMask = K.equal(yTrue, maskValue) #true for all mask values
#since y is shaped as (batch, length, features), we need all features to be mask values
isMask = K.all(isMask, axis=-1) #the entire output vector must be true
#this second line is only necessary if the output features are more than 1
#transform to float (0 or 1) and invert
isMask = K.cast(isMask, dtype=K.floatx())
isMask = 1 - isMask #now mask values are zero, and others are 1
#multiply this by the inputs:
#maybe you might need K.expand_dims(isMask) to add the extra dimension removed by K.all
yTrue = yTrue * isMask
yPred = yPred * isMask
return someLossFunction(yTrue,yPred)
If you have padding only for the input data, or if Y has no length, you can have your own mask outside the function:
masks = [
[1,1,1,1,1,1,0,0,0],
[1,1,1,1,0,0,0,0,0],
[1,1,1,1,1,1,1,1,0]
]
#shape (samples, length). If it fails, make it (samples, length, 1).
import keras.backend as K
masks = K.constant(masks)
Since masks depend on your input data, you can use your mask value to know where to put zeros, such as:
masks = np.array((X_train == maskValue).all(), dtype='float64')
masks = 1 - masks
#here too, if you have a problem with dimensions in the multiplications below
#expand masks dimensions by adding a last dimension = 1.
And make your function taking masks from outside of it (you must recreate the loss function if you change the input data):
def customLoss(yTrue,yPred):
yTrue = masks*yTrue
yPred = masks*yPred
return someLossFunction(yTrue,yPred)
Does anyone know if keras automatically masks the loss function??
Since it provides a Masking layer and says nothing about the outputs, maybe it does it automatically?
I took both anwers and imporvised a way for Multiple Timesteps, single Missing target Values, Loss for LSTM(or other RecurrentNN) with return_sequences=True.
Daniels Answer would not suffice for multiple targets, due to isMask = K.all(isMask, axis=-1). Removing this aggregation made the function undifferentiable, probably. I do not know for shure, since I never run the pure function and cannot tell if its able to fit a model.
I fused Yu-Yangs's and Daniels answer together and it worked.
from tensorflow.keras.layers import Layer, Input, LSTM, Dense, TimeDistributed
from tensorflow.keras import Model, Sequential
import tensorflow.keras.backend as K
import numpy as np
mask_Value = -2
def get_loss(mask_value):
mask_value = K.variable(mask_value)
def masked_loss(yTrue,yPred):
#find which values in yTrue (target) are the mask value
isMask = K.equal(yTrue, mask_Value) #true for all mask values
#transform to float (0 or 1) and invert
isMask = K.cast(isMask, dtype=K.floatx())
isMask = 1 - isMask #now mask values are zero, and others are 1
isMask
#multiply this by the inputs:
#maybe you might need K.expand_dims(isMask) to add the extra dimension removed by K.all
yTrue = yTrue * isMask
yPred = yPred * isMask
# perform a root mean square error, whereas the mean is in respect to the mask
mean_loss = K.sum(K.square(yPred - yTrue))/K.sum(isMask)
loss = K.sqrt(mean_loss)
return loss
#RootMeanSquaredError()(yTrue,yPred)
return masked_loss
# define timeseries data
n_sample = 10
timesteps = 5
feat_inp = 2
feat_out = 2
X = np.random.uniform(0,1, (n_sample, timesteps, feat_inp))
y = np.random.uniform(0,1, (n_sample,timesteps, feat_out))
# define model
model = Sequential()
model.add(LSTM(50, activation='relu',return_sequences=True, input_shape=(timesteps, feat_inp)))
model.add(Dense(feat_out))
model.compile(optimizer='adam', loss=get_loss(mask_Value))
model.summary()
# %%
model.fit(X, y, epochs=50, verbose=0)
Note that Yu-Yang's answer does not appear to work on Tensorflow Keras 2.7.0
Surprisingly, model.evaluate does not compute masked_loss or unmasked_loss. Instead, it assumes that the loss from all masked input steps is zero (but still includes those steps in the mean() calculation). This means that every masked timestep actually reduces the calculated error!
#%% Yu-yang's example
# https://stackoverflow.com/a/47060797/3580080
import tensorflow as tf
import tensorflow.keras as keras
import numpy as np
# Fix the random seed for repeatable results
np.random.seed(5)
tf.random.set_seed(5)
max_sentence_length = 5
character_number = 2
input_tensor = keras.Input(shape=(max_sentence_length, character_number))
masked_input = keras.layers.Masking(mask_value=0)(input_tensor)
output = keras.layers.LSTM(3, return_sequences=True)(masked_input)
model = keras.Model(input_tensor, output)
model.compile(loss='mae', optimizer='adam')
X = np.array([[[0, 0], [0, 0], [1, 0], [0, 1], [0, 1]],
[[0, 0], [0, 1], [1, 0], [0, 1], [0, 1]]])
y_true = np.ones((2, max_sentence_length, 3))
y_pred = model.predict(X)
print(y_pred)
# See if the loss computed by model.evaluate() is equal to the masked loss
unmasked_loss = np.abs(1 - y_pred).mean()
masked_loss = np.abs(1 - y_pred[y_pred != 0]).mean()
print(f"model.evaluate= {model.evaluate(X, y_true)}")
print(f"masked loss= {masked_loss}")
print(f"unmasked loss= {unmasked_loss}")
Prints:
[[[ 0. 0. 0. ]
[ 0. 0. 0. ]
[ 0.05340272 -0.06415359 -0.11803789]
[ 0.08775083 0.00600774 -0.10454659]
[ 0.11212641 0.07632366 -0.04133942]]
[[ 0. 0. 0. ]
[ 0.05394626 0.08956442 0.03843312]
[ 0.09092357 -0.02743799 -0.10386454]
[ 0.10791279 0.04083341 -0.08820333]
[ 0.12459432 0.09971555 -0.02882453]]]
1/1 [==============================] - 1s 658ms/step - loss: 0.6865
model.evaluate= 0.6864957213401794
masked loss= 0.9807082414627075
unmasked loss= 0.986495852470398
(This is intended as a comment rather than an answer).