Related
I am trying to do a multivariate linear regression and I am having some issues. Namely, I am getting the following error:
ValueError: Cannot feed value of shape (3,) for Tensor 'X:0', which has shape '(1, 3)'
I have 3 feature variables, which I call trainX and 1 label, which I call trainY. Their shapes are the following (they are numpy arrays):
trainX.shape:
(2500, 3)
trainY.shape:
(2500,)
The following piece of code defines the tensors that I use to compute the model:
X = tf.compat.v1.placeholder("float", [1, 3], name="X")
Y = tf.compat.v1.placeholder("float", [1], name="Y")
W = tf.Variable(tf.zeros([3, 1]), name="W")
b = tf.Variable(tf.zeros([1]), name="b")
I calculate the predicted label and the cost function and the optimizer by doing:
predicted_y = tf.matmul(X, W) + b
cost = tf.reduce_sum(tf.pow(predicted_y-Y, 2)) / (2 * n)
optimizer = tf.compat.v1.train.GradientDescentOptimizer(learning_rate).minimize(cost)
I am getting the error in the tensor-flow session, namely in the following piece of code:
with tf.Session() as sess:
sess.run(init)
for epoch in range(training_epochs):
for (_x, _y) in zip(trainX, trainY):
sess.run(optimizer, feed_dict={X: _x, Y: _y})
if (epoch + 1) % 100 == 0:
c = sess.run(cost, feed_dict={X: trainX, Y: trainY})
print("Epoch", (epoch + 1), ": cost =", c, "W =", sess.run(W), "b =", sess.run(b))
# Storing necessary values to be used outside the Session
training_cost = sess.run(cost, feed_dict={X: trainX, Y: trainY})
weight = sess.run(W)
bias = sess.run(b)
Any help would be greatly appreciated.
The problem is that _x is a vector with 3 elements, while X expects a matrix with 1 line and 3 columns. One possible solution would be to reshape _x:
_x = np.reshape(_x, [1, 3])
Another possibility would be to change the placeholder to the input shape:
X = tf.compat.v1.placeholder("float", [3], name="X")
Often one wants to train on more than one example. In this case, you may want to define the placeholders to allow any number of inputs:
X = tf.compat.v1.placeholder("float", [None, 3], name="X")
Y = tf.compat.v1.placeholder("float", [None], name="Y")
Then we could for example we use batches of 100:
with tf.Session() as sess:
sess.run(init)
for epoch in range(training_epochs):
for i in range(trainX.shape[0] % 100):
sess.run(optimizer, feed_dict={X: trainX[i*100:(i+1)*100, ...], Y: trainY[i*100:(i+1)*100]})
I have found an inconsistency in the predict function of the SVM model for multiclass problems. I have trained a model with SKlearn SVM.SVC function for a multiclass prediction problem (see plot below).
But on some occasions, the predict functions gives me different results when I did the prediction instead with the argmax of the decision function. One can see that the inconsistency is close to the decision boundary.
This inconsistency vanishes when I use the OneVsRestClassifier directly. Does the predict function of the SVM.SVC classes some corrections or why does it differ from the argmax prediction?
Here is the code to reproduce the result:
import numpy as np
from sklearn import svm, datasets
from sklearn.multiclass import OneVsRestClassifier
from scipy.linalg import cho_solve, cho_factor
def create_data(n_samples, noise):
# 4 gaussian blobs with different means and variances
sample_per_cls = np.int(n_samples/4)
sample_per_cls_rest = sample_per_cls + n_samples - 4*sample_per_cls #puts the rest of the samples into the last class
x1 = np.random.multivariate_normal([20, 18], np.array([[2, 3], [3, 7]])*4*noise, sample_per_cls, 'warn')
x2 = np.random.multivariate_normal([13, 27], np.array([[10, 3], [3, 2]])*4*noise, sample_per_cls, 'warn')
x3 = np.random.multivariate_normal([9, 13], np.array([[6, 1], [1, 5]])*4*noise, sample_per_cls, 'warn')
x4 = np.random.multivariate_normal([14, 20], np.array([[4, 0.2], [0.2, 7]])*4*noise, sample_per_cls_rest, 'warn')
X = np.vstack([x1,x2,x3,x4])
#define the labels for each class
Y = np.empty([n_samples], dtype=np.int)
Y[0:sample_per_cls] = 0
Y[sample_per_cls:2*sample_per_cls] = 1
Y[2*sample_per_cls:3*sample_per_cls] = 2
Y[3*sample_per_cls:] = 3
#shuffle the data set
rand_int = np.arange(n_samples)
np.random.shuffle(rand_int)
X = X[rand_int]
Y = Y[rand_int]
return X, Y
X, Y = create_data(n_samples=800, noise=0.15)
clf = svm.SVC(C=0.5, kernel='rbf', gamma=0.1, decision_function_shape='ovr', cache_size=8000)
#the classifier below is consistent
#clf = OneVsRestClassifier(svm.SVC(C=0.5, kernel='rbf', gamma=0.1, decision_function_shape='ovr', cache_size=8000))
clf.fit(X,Y)
Xs = np.linspace(np.min(X[:,0] - 1), np.max(X[:,0] + 1), 150)
Ys = np.linspace(np.min(X[:,1] - 1), np.max(X[:,1] + 1), 150)
XX, YY = np.meshgrid(Xs, Ys)
test_set = np.stack([XX, YY], axis=2).reshape(-1,2)
#prediction via argmax of the decision function
pred = np.argmax(clf.decision_function(test_set), axis=1)
#prediction with sklearn function
pred_1 = clf.predict(test_set)
diff = np.equal(pred, pred_1)
error = np.where(diff == False)[0]
print(error)
import matplotlib.pyplot as plt
plt.rcParams['figure.figsize'] = [16, 10]
plt.contourf(XX, YY, pred_1.reshape(XX.shape), alpha=0.5, cmap='seismic')
plt.colorbar()
plt.scatter(X[:,0], X[:,1], c=Y, s=20, marker='o', edgecolors='k')
plt.scatter(test_set[error, 0], test_set[error, 1], c=pred_1[error], s=120, marker='^', edgecolors='k')
plt.show()
Triangles are marking the inconsistent points:
My network transposes an image, with size 62*71, to a vector of 124 outputs. In the test, I got the same output for each input. I checked 4000 cases.
I cannot seem to signify the problem because the learning seems to be fine, there is an improvement of the error and get a relatively low error.
Someone maybe knows what is the problem?
#load data
data_in= np.transpose(np.loadtxt("images_in_10000.csv", delimiter=',',dtype=np.float32))
data_out= np.transpose(np.loadtxt("out_to_image_10000.csv", delimiter=',',dtype=np.float32))
x_train = data_in[0:6000, :]
x_test = data_in[6000:10001,:]
y_train = data_out[0:6000, :]
y_test = data_out[6000:10001, :]
#parametersa
batch=100
epochs=7
learning_rate=0.01
n = x_test.shape[1] #4392
m = x_train.shape[0] #6000
d = y_test.shape[1] #124
l = y_test.shape[0] #4000
trainX = tf.placeholder(tf.float32, [batch, n])
trainY = tf.placeholder(tf.float32, [batch, d])
testX = tf.placeholder(tf.float32, [l, n])
testY = tf.placeholder(tf.float32, [l, d])
W_c1= tf.Variable(tf.random_normal([5, 5, 1, 32]))
W_c2= tf.Variable(tf.random_normal([5, 5, 32, 64]))
W_fc= tf.Variable(tf.random_normal([18 * 16 * 64, 128]))
W_out= tf.Variable(tf.random_normal([128, d]))
b_c1= tf.Variable(tf.random_normal([32]))
b_c2=tf.Variable(tf.random_normal([64]))
b_fc=tf.Variable(tf.random_normal([128]))
b_out=tf.Variable(tf.random_normal([d]))
def conv2d(x, W):
return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')
def maxpool2d(x):
return tf.nn.max_pool(x, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')
def convolutional_neural_network(x):
x = tf.reshape(x, shape=[-1,61,72, 1])
conv1 = tf.nn.relu(conv2d(x, W_c1) + b_c1)
conv1 = maxpool2d(conv1)
conv2 = tf.nn.relu(conv2d(conv1, W_c2) + b_c2)
conv2 = maxpool2d(conv2)
fc = tf.reshape(conv2, [-1, 18 * 16 * 64])
fc = tf.nn.relu(tf.matmul(fc, W_fc) + b_fc)
output = tf.matmul(fc, W_out) + b_out
return output
prediction = convolutional_neural_network(trainX)
cost =tf.reduce_mean(tf.pow(prediction-trainY,2))
optimizer = tf.train.AdamOptimizer(learning_rate).minimize(cost)
prediction_t = convolutional_neural_network(testX)
losstest = tf.reduce_mean(tf.pow(prediction_t - testY, 2))
k=0
a = np.linspace(0, m - batch, m / batch, dtype=np.int32)
costshow = [0] * (len(a) * epochs)
with tf.Session() as sess:
sess.run(tf.global_variables_initializer())
for epoch in range(epochs):
epoch_loss = 0
for i in (np.linspace(0,m - batch, m / batch, dtype=np.int32)):
x = x_train[i:i + batch, :]
y = y_train[i:i + batch, :]
sess.run(optimizer, feed_dict={trainX: x, trainY: y})
cost_val = sess.run(cost, feed_dict={trainX: x, trainY: y})
costshow[k]=cost_val
print("Epoch=", '%04d' % (epoch + 1), "loss=", " {:.9f}".format(cost_val))
k = k + 1
print("finsh train-small ")
result = sess.run(prediction_t, feed_dict={testX: x_test})
test_loss = sess.run(losstest, feed_dict={testX: np.asarray(x_test), testY: np.asarray(y_test)})
print("Testing loss=", test_loss)
The metric behind a picture is clearly defined. The values of an image often ranges from 0-1 or 0-255. For CNN's you should normalize your input values (0-1).
Thus you have to be careful with your weight initialization. For example, if your have a bias of 0.6 and a value of 0.6, you get a 1.2 as image value and your plotting program thinks you are in the 0-255 range and everything is black.
So try to use the glorot-initializer for the weights and zero-initializer for the bias initializer:
Weights:
tf.get_variable("weight", shape=[5, 5, 1, 32], initializer=tf.glorot_uniform_initializer())
Bias:
tf.get_variable("bias", shape=[32], initializer=tf.zeros_initializer())
Furthermore, tf.Variabel is deprecated. It is better to use tf.get_variable.
My network starts to learn and looks okay on the first batch and then suddenly stops with TypeError on the second batch only! Why was it okay on the first batch then? Or why did it break after the first? Stupefying error... Here are the details:
I have built a CNN that is trying to predict 124 features for each image. The images are of size 61 x 72 pixels and the output vector of numbers are of size 124 x 1.
The images are floating point matrices with numbers between 1 and -1.
The information I'm trying to predict is in a CSV file, with each line describing an image. When I load the data for the training process I process each line and reshape them, also get the pictures the network is learning.
When I run my program, I get the following error on the second batch, however:
"TypeError: Fetch argument 2.7674865e+09 has invalid type , must be a string or Tensor. (Can not convert a float32 into a Tensor or Operation.)"
Can you please help pinpoint what the problem is? Here's my code:
import tensorflow as tf
import numpy as np
data_in=np.loadtxt(open("images.csv"), delimiter=',',dtype=np.float32);
data_out=np.loadtxt(open("outputmix-124.csv"),
delimiter=',',dtype=np.float32);
x_train = data_in[0:6000, :]
x_test = data_in[6000:10000,:]
y_train = data_out[0:6000, :]
y_test = data_out[6000:10000, :]
batch=600
epochs=10
n = x_test.shape[1] #4392
m = x_train.shape[0] #6000
d = y_test.shape[1] #124
l = y_test.shape[0] #4000
trainX= tf.placeholder(tf.float32, [batch, n], name="X")
trainY = tf.placeholder(tf.float32, [batch, d])
def conv2d(x, W):
return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')
def maxpool2d(x):
return tf.nn.max_pool(x, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1],
padding='SAME')
def convolutional_neural_network(x):
weights = {'W_c1': tf.Variable(tf.random_normal([5, 5, 1, 32])),
'W_c2': tf.Variable(tf.random_normal([5, 5, 32, 64])),
'W_fc': tf.Variable(tf.random_normal([18 * 16 * 64, 1024])),
'out': tf.Variable(tf.random_normal([1024, d]))}
biases = {'b_c1': tf.Variable(tf.random_normal([32])),
'b_c2': tf.Variable(tf.random_normal([64])),
'b_fc': tf.Variable(tf.random_normal([1024])),
'out': tf.Variable(tf.random_normal([d]))}
x = tf.reshape(x, shape=[-1,61,72, 1])
conv1 = tf.nn.relu(conv2d(x, weights['W_c1']) + biases['b_c1'])
conv1 = maxpool2d(conv1)
conv2 = tf.nn.relu(conv2d(conv1, weights['W_c2']) + biases['b_c2'])
conv2 = maxpool2d(conv2)
fc = tf.reshape(conv2, [-1, 18 * 16 * 64])
fc = tf.nn.relu(tf.matmul(fc, weights['W_fc']) + biases['b_fc'])
fc = tf.nn.dropout(fc, keep_rate)
output = tf.matmul(fc, weights['out']) + biases['out']
return output
def train_neural_network(x):
prediction = convolutional_neural_network(x)
cost =tf.reduce_mean(tf.pow(prediction-trainY,2))
optimizer = tf.train.AdamOptimizer().minimize(cost)
with tf.Session() as sess:
sess.run(tf.global_variables_initializer())
for epoch in range(epochs):
epoch_loss = 0
for i in (np.linspace(0,m - batch, m / batch, dtype=np.int32)):
x = x_train[i:i + batch, :]
y = y_train[i:i + batch, :]
sess.run(optimizer, feed_dict={trainX: x, trainY: y})
cost = sess.run(cost, feed_dict={trainX: x, trainY: y})
print("Epoch=", '%04d' % (epoch + 1), "loss=", "
{:.9f}".format(cost))
epoch_loss += cost
print('Epoch', epoch, 'completed out of', epochs, 'loss:',
epoch_loss)
train_neural_network(trainX)
This is a fairly typical mistake. The problem is with the variable cost. First you assign the loss calculation tensor to it in the second line of the function train_neural_network():
cost =tf.reduce_mean(tf.pow(prediction-trainY,2))
Then when you run the training and the cost calculation, you do this, and this is where it gets messed up:
cost = sess.run(cost, feed_dict={trainX: x, trainY: y})
because you assign the value of the loss to cost, which is now a simple floating point number, instead of a Tensor. The next time around sess.run() gets a floating point number instead of a tensor as a first argument, and the error above is printed.
Use something like cost_val for storing the value of the loss and leave cost to store the reference to the tensor. You of course need to update the lines that print the value as well, so these three lines I've changed:
cost_val = sess.run(cost, feed_dict={trainX: x, trainY: y})
print("Epoch=", '%04d' % (epoch + 1), "loss=", " {:.9f}".format(cost_val))
epoch_loss += cost_val
I'm posting the full revised version here (tested code; note I've generated test data instead of loading; this is a loadable and testable example for anyone but you need to change it back to load your actual data):
import tensorflow as tf
import numpy as np
keep_rate = 0.8
#data_in=np.loadtxt(open("images.csv"), delimiter=',',dtype=np.float32);
#data_out=np.loadtxt(open("outputmix-124.csv"),
# delimiter=',',dtype=np.float32);
data_in = np.random.normal( size = ( 10000, 4392 ) )
data_out = np.random.normal( size = ( 10000, 124 ) )
x_train = data_in[0:6000, :]
x_test = data_in[6000:10000,:]
y_train = data_out[0:6000, :]
y_test = data_out[6000:10000, :]
batch=600
epochs=10
n = x_test.shape[1] #4392
m = x_train.shape[0] #6000
d = y_test.shape[1] #124
l = y_test.shape[0] #4000
trainX = tf.placeholder(tf.float32, [batch, n], name="X")
trainY = tf.placeholder(tf.float32, [batch, d])
def conv2d(x, W):
return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')
def maxpool2d(x):
return tf.nn.max_pool(x, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1],
padding='SAME')
def convolutional_neural_network(x):
weights = {'W_c1': tf.Variable(tf.random_normal([5, 5, 1, 32])),
'W_c2': tf.Variable(tf.random_normal([5, 5, 32, 64])),
'W_fc': tf.Variable(tf.random_normal([18 * 16 * 64, 1024])),
'out': tf.Variable(tf.random_normal([1024, d]))}
biases = {'b_c1': tf.Variable(tf.random_normal([32])),
'b_c2': tf.Variable(tf.random_normal([64])),
'b_fc': tf.Variable(tf.random_normal([1024])),
'out': tf.Variable(tf.random_normal([d]))}
x = tf.reshape(x, shape=[-1,61,72, 1])
conv1 = tf.nn.relu(conv2d(x, weights['W_c1']) + biases['b_c1'])
conv1 = maxpool2d(conv1)
conv2 = tf.nn.relu(conv2d(conv1, weights['W_c2']) + biases['b_c2'])
conv2 = maxpool2d(conv2)
fc = tf.reshape(conv2, [-1, 18 * 16 * 64])
fc = tf.nn.relu(tf.matmul(fc, weights['W_fc']) + biases['b_fc'])
fc = tf.nn.dropout(fc, keep_rate)
output = tf.matmul(fc, weights['out']) + biases['out']
return output
def train_neural_network(x):
prediction = convolutional_neural_network(x)
cost =tf.reduce_mean(tf.pow(prediction-trainY,2))
optimizer = tf.train.AdamOptimizer().minimize(cost)
with tf.Session() as sess:
sess.run(tf.global_variables_initializer())
for epoch in range(epochs):
epoch_loss = 0
for i in (np.linspace(0,m - batch, m / batch, dtype=np.int32)):
x = x_train[i:i + batch, :]
y = y_train[i:i + batch, :]
sess.run(optimizer, feed_dict={trainX: x, trainY: y})
cost_val = sess.run(cost, feed_dict={trainX: x, trainY: y})
print("Epoch=", '%04d' % (epoch + 1), "loss=", " {:.9f}".format(cost_val))
epoch_loss += cost_val
print('Epoch', epoch, 'completed out of', epochs, 'loss:',
epoch_loss)
train_neural_network(trainX)
I am trying to implement a sequence-to-sequence task using LSTM by Keras with the TensorFlow backend. The inputs are English sentences with variable lengths. To construct a dataset with 2-D shape [batch_number, max_sentence_length], I add EOF at the end of the line and pad each sentence with enough placeholders, e.g. #. And then each character in the sentence is transformed into a one-hot vector, so that the dataset has 3-D shape [batch_number, max_sentence_length, character_number]. After LSTM encoder and decoder layers, softmax cross-entropy between output and target is computed.
To eliminate the padding effect in model training, masking could be used on input and loss function. Mask input in Keras can be done by using layers.core.Masking. In TensorFlow, masking on loss function can be done as follows: custom masked loss function in TensorFlow.
However, I don't find a way to realize it in Keras, since a user-defined loss function in Keras only accepts parameters y_true and y_pred. So how to input true sequence_lengths to loss function and mask?
Besides, I find a function _weighted_masked_objective(fn) in \keras\engine\training.py. Its definition is
Adds support for masking and sample-weighting to an objective function.
But it seems that the function can only accept fn(y_true, y_pred). Is there a way to use this function to solve my problem?
To be specific, I modify the example of Yu-Yang.
from keras.models import Model
from keras.layers import Input, Masking, LSTM, Dense, RepeatVector, TimeDistributed, Activation
import numpy as np
from numpy.random import seed as random_seed
random_seed(123)
max_sentence_length = 5
character_number = 3 # valid character 'a, b' and placeholder '#'
input_tensor = Input(shape=(max_sentence_length, character_number))
masked_input = Masking(mask_value=0)(input_tensor)
encoder_output = LSTM(10, return_sequences=False)(masked_input)
repeat_output = RepeatVector(max_sentence_length)(encoder_output)
decoder_output = LSTM(10, return_sequences=True)(repeat_output)
output = Dense(3, activation='softmax')(decoder_output)
model = Model(input_tensor, output)
model.compile(loss='categorical_crossentropy', optimizer='adam')
model.summary()
X = np.array([[[0, 0, 0], [0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 1, 0]],
[[0, 0, 0], [0, 1, 0], [1, 0, 0], [0, 1, 0], [0, 1, 0]]])
y_true = np.array([[[0, 0, 1], [0, 0, 1], [1, 0, 0], [0, 1, 0], [0, 1, 0]], # the batch is ['##abb','#babb'], padding '#'
[[0, 0, 1], [0, 1, 0], [1, 0, 0], [0, 1, 0], [0, 1, 0]]])
y_pred = model.predict(X)
print('y_pred:', y_pred)
print('y_true:', y_true)
print('model.evaluate:', model.evaluate(X, y_true))
# See if the loss computed by model.evaluate() is equal to the masked loss
import tensorflow as tf
logits=tf.constant(y_pred, dtype=tf.float32)
target=tf.constant(y_true, dtype=tf.float32)
cross_entropy = tf.reduce_mean(-tf.reduce_sum(target * tf.log(logits),axis=2))
losses = -tf.reduce_sum(target * tf.log(logits),axis=2)
sequence_lengths=tf.constant([3,4])
mask = tf.reverse(tf.sequence_mask(sequence_lengths,maxlen=max_sentence_length),[0,1])
losses = tf.boolean_mask(losses, mask)
masked_loss = tf.reduce_mean(losses)
with tf.Session() as sess:
c_e = sess.run(cross_entropy)
m_c_e=sess.run(masked_loss)
print("tf unmasked_loss:", c_e)
print("tf masked_loss:", m_c_e)
The output in Keras and TensorFlow are compared as follows:
As shown above, masking is disabled after some kinds of layers. So how to mask the loss function in Keras when those layers are added?
If there's a mask in your model, it'll be propagated layer-by-layer and eventually applied to the loss. So if you're padding and masking the sequences in a correct way, the loss on the padding placeholders would be ignored.
Some Details:
It's a bit involved to explain the whole process, so I'll just break it down to several steps:
In compile(), the mask is collected by calling compute_mask() and applied to the loss(es) (irrelevant lines are ignored for clarity).
weighted_losses = [_weighted_masked_objective(fn) for fn in loss_functions]
# Prepare output masks.
masks = self.compute_mask(self.inputs, mask=None)
if masks is None:
masks = [None for _ in self.outputs]
if not isinstance(masks, list):
masks = [masks]
# Compute total loss.
total_loss = None
with K.name_scope('loss'):
for i in range(len(self.outputs)):
y_true = self.targets[i]
y_pred = self.outputs[i]
weighted_loss = weighted_losses[i]
sample_weight = sample_weights[i]
mask = masks[i]
with K.name_scope(self.output_names[i] + '_loss'):
output_loss = weighted_loss(y_true, y_pred,
sample_weight, mask)
Inside Model.compute_mask(), run_internal_graph() is called.
Inside run_internal_graph(), the masks in the model is propagated layer-by-layer from the model's inputs to outputs by calling Layer.compute_mask() for each layer iteratively.
So if you're using a Masking layer in your model, you shouldn't worry about the loss on the padding placeholders. The loss on those entries will be masked out as you've probably already seen inside _weighted_masked_objective().
A Small Example:
max_sentence_length = 5
character_number = 2
input_tensor = Input(shape=(max_sentence_length, character_number))
masked_input = Masking(mask_value=0)(input_tensor)
output = LSTM(3, return_sequences=True)(masked_input)
model = Model(input_tensor, output)
model.compile(loss='mae', optimizer='adam')
X = np.array([[[0, 0], [0, 0], [1, 0], [0, 1], [0, 1]],
[[0, 0], [0, 1], [1, 0], [0, 1], [0, 1]]])
y_true = np.ones((2, max_sentence_length, 3))
y_pred = model.predict(X)
print(y_pred)
[[[ 0. 0. 0. ]
[ 0. 0. 0. ]
[-0.11980877 0.05803877 0.07880752]
[-0.00429189 0.13382857 0.19167568]
[ 0.06817091 0.19093043 0.26219055]]
[[ 0. 0. 0. ]
[ 0.0651961 0.10283815 0.12413475]
[-0.04420842 0.137494 0.13727818]
[ 0.04479844 0.17440712 0.24715884]
[ 0.11117355 0.21645413 0.30220413]]]
# See if the loss computed by model.evaluate() is equal to the masked loss
unmasked_loss = np.abs(1 - y_pred).mean()
masked_loss = np.abs(1 - y_pred[y_pred != 0]).mean()
print(model.evaluate(X, y_true))
0.881977558136
print(masked_loss)
0.881978
print(unmasked_loss)
0.917384
As can be seen from this example, the loss on the masked part (the zeroes in y_pred) is ignored, and the output of model.evaluate() is equal to masked_loss.
EDIT:
If there's a recurrent layer with return_sequences=False, the mask stop propagates (i.e., the returned mask is None). In RNN.compute_mask():
def compute_mask(self, inputs, mask):
if isinstance(mask, list):
mask = mask[0]
output_mask = mask if self.return_sequences else None
if self.return_state:
state_mask = [None for _ in self.states]
return [output_mask] + state_mask
else:
return output_mask
In your case, if I understand correctly, you want a mask that's based on y_true, and whenever the value of y_true is [0, 0, 1] (the one-hot encoding of "#") you want the loss to be masked. If so, you need to mask the loss values in a somewhat similar way to Daniel's answer.
The main difference is the final average. The average should be taken over the number of unmasked values, which is just K.sum(mask). And also, y_true can be compared to the one-hot encoded vector [0, 0, 1] directly.
def get_loss(mask_value):
mask_value = K.variable(mask_value)
def masked_categorical_crossentropy(y_true, y_pred):
# find out which timesteps in `y_true` are not the padding character '#'
mask = K.all(K.equal(y_true, mask_value), axis=-1)
mask = 1 - K.cast(mask, K.floatx())
# multiply categorical_crossentropy with the mask
loss = K.categorical_crossentropy(y_true, y_pred) * mask
# take average w.r.t. the number of unmasked entries
return K.sum(loss) / K.sum(mask)
return masked_categorical_crossentropy
masked_categorical_crossentropy = get_loss(np.array([0, 0, 1]))
model = Model(input_tensor, output)
model.compile(loss=masked_categorical_crossentropy, optimizer='adam')
The output of the above code then shows that the loss is computed only on the unmasked values:
model.evaluate: 1.08339476585
tf unmasked_loss: 1.08989
tf masked_loss: 1.08339
The value is different from yours because I've changed the axis argument in tf.reverse from [0,1] to [1].
If you're not using masks as in Yu-Yang's answer, you can try this.
If you have your target data Y with length and padded with the mask value, you can:
import keras.backend as K
def custom_loss(yTrue,yPred):
#find which values in yTrue (target) are the mask value
isMask = K.equal(yTrue, maskValue) #true for all mask values
#since y is shaped as (batch, length, features), we need all features to be mask values
isMask = K.all(isMask, axis=-1) #the entire output vector must be true
#this second line is only necessary if the output features are more than 1
#transform to float (0 or 1) and invert
isMask = K.cast(isMask, dtype=K.floatx())
isMask = 1 - isMask #now mask values are zero, and others are 1
#multiply this by the inputs:
#maybe you might need K.expand_dims(isMask) to add the extra dimension removed by K.all
yTrue = yTrue * isMask
yPred = yPred * isMask
return someLossFunction(yTrue,yPred)
If you have padding only for the input data, or if Y has no length, you can have your own mask outside the function:
masks = [
[1,1,1,1,1,1,0,0,0],
[1,1,1,1,0,0,0,0,0],
[1,1,1,1,1,1,1,1,0]
]
#shape (samples, length). If it fails, make it (samples, length, 1).
import keras.backend as K
masks = K.constant(masks)
Since masks depend on your input data, you can use your mask value to know where to put zeros, such as:
masks = np.array((X_train == maskValue).all(), dtype='float64')
masks = 1 - masks
#here too, if you have a problem with dimensions in the multiplications below
#expand masks dimensions by adding a last dimension = 1.
And make your function taking masks from outside of it (you must recreate the loss function if you change the input data):
def customLoss(yTrue,yPred):
yTrue = masks*yTrue
yPred = masks*yPred
return someLossFunction(yTrue,yPred)
Does anyone know if keras automatically masks the loss function??
Since it provides a Masking layer and says nothing about the outputs, maybe it does it automatically?
I took both anwers and imporvised a way for Multiple Timesteps, single Missing target Values, Loss for LSTM(or other RecurrentNN) with return_sequences=True.
Daniels Answer would not suffice for multiple targets, due to isMask = K.all(isMask, axis=-1). Removing this aggregation made the function undifferentiable, probably. I do not know for shure, since I never run the pure function and cannot tell if its able to fit a model.
I fused Yu-Yangs's and Daniels answer together and it worked.
from tensorflow.keras.layers import Layer, Input, LSTM, Dense, TimeDistributed
from tensorflow.keras import Model, Sequential
import tensorflow.keras.backend as K
import numpy as np
mask_Value = -2
def get_loss(mask_value):
mask_value = K.variable(mask_value)
def masked_loss(yTrue,yPred):
#find which values in yTrue (target) are the mask value
isMask = K.equal(yTrue, mask_Value) #true for all mask values
#transform to float (0 or 1) and invert
isMask = K.cast(isMask, dtype=K.floatx())
isMask = 1 - isMask #now mask values are zero, and others are 1
isMask
#multiply this by the inputs:
#maybe you might need K.expand_dims(isMask) to add the extra dimension removed by K.all
yTrue = yTrue * isMask
yPred = yPred * isMask
# perform a root mean square error, whereas the mean is in respect to the mask
mean_loss = K.sum(K.square(yPred - yTrue))/K.sum(isMask)
loss = K.sqrt(mean_loss)
return loss
#RootMeanSquaredError()(yTrue,yPred)
return masked_loss
# define timeseries data
n_sample = 10
timesteps = 5
feat_inp = 2
feat_out = 2
X = np.random.uniform(0,1, (n_sample, timesteps, feat_inp))
y = np.random.uniform(0,1, (n_sample,timesteps, feat_out))
# define model
model = Sequential()
model.add(LSTM(50, activation='relu',return_sequences=True, input_shape=(timesteps, feat_inp)))
model.add(Dense(feat_out))
model.compile(optimizer='adam', loss=get_loss(mask_Value))
model.summary()
# %%
model.fit(X, y, epochs=50, verbose=0)
Note that Yu-Yang's answer does not appear to work on Tensorflow Keras 2.7.0
Surprisingly, model.evaluate does not compute masked_loss or unmasked_loss. Instead, it assumes that the loss from all masked input steps is zero (but still includes those steps in the mean() calculation). This means that every masked timestep actually reduces the calculated error!
#%% Yu-yang's example
# https://stackoverflow.com/a/47060797/3580080
import tensorflow as tf
import tensorflow.keras as keras
import numpy as np
# Fix the random seed for repeatable results
np.random.seed(5)
tf.random.set_seed(5)
max_sentence_length = 5
character_number = 2
input_tensor = keras.Input(shape=(max_sentence_length, character_number))
masked_input = keras.layers.Masking(mask_value=0)(input_tensor)
output = keras.layers.LSTM(3, return_sequences=True)(masked_input)
model = keras.Model(input_tensor, output)
model.compile(loss='mae', optimizer='adam')
X = np.array([[[0, 0], [0, 0], [1, 0], [0, 1], [0, 1]],
[[0, 0], [0, 1], [1, 0], [0, 1], [0, 1]]])
y_true = np.ones((2, max_sentence_length, 3))
y_pred = model.predict(X)
print(y_pred)
# See if the loss computed by model.evaluate() is equal to the masked loss
unmasked_loss = np.abs(1 - y_pred).mean()
masked_loss = np.abs(1 - y_pred[y_pred != 0]).mean()
print(f"model.evaluate= {model.evaluate(X, y_true)}")
print(f"masked loss= {masked_loss}")
print(f"unmasked loss= {unmasked_loss}")
Prints:
[[[ 0. 0. 0. ]
[ 0. 0. 0. ]
[ 0.05340272 -0.06415359 -0.11803789]
[ 0.08775083 0.00600774 -0.10454659]
[ 0.11212641 0.07632366 -0.04133942]]
[[ 0. 0. 0. ]
[ 0.05394626 0.08956442 0.03843312]
[ 0.09092357 -0.02743799 -0.10386454]
[ 0.10791279 0.04083341 -0.08820333]
[ 0.12459432 0.09971555 -0.02882453]]]
1/1 [==============================] - 1s 658ms/step - loss: 0.6865
model.evaluate= 0.6864957213401794
masked loss= 0.9807082414627075
unmasked loss= 0.986495852470398
(This is intended as a comment rather than an answer).