Compute gradient between a scalar and vector in PyTorch

I am trying to port code that was written using Theano to PyTorch. In the code, the author computes the gradient using
import theano.tensor as T
gparams = T.grad(cost, params)
and the shape of gparams is (256, 240)
I have tried using backward(), but it doesn't seem to return anything. Is there an equivalent of grad in PyTorch?
Assume this is my input,
import torch
from torch.autograd import Variable
cost = torch.tensor(1.6019)
params = Variable(torch.rand(1, 73, 240))

cost needs to be the result of an operation involving params. You can't compute a gradient knowing only the values of two tensors; you also need to know the relationship between them. This is why PyTorch builds a computation graph when you perform tensor operations. For example, say the relationship is
cost = torch.sum(params)
then we would expect the gradient of cost with respect to params to be a tensor of ones with the same shape as params, regardless of the values in params.
That could be computed as follows. Notice that you need to set the requires_grad flag to tell PyTorch that you want backward to populate the gradient when called.
# Initialize the independent variable. Make sure to set requires_grad=True.
# The values must be floating point; integer tensors cannot require gradients.
params = torch.tensor((1., 73., 240.), requires_grad=True)
# Compute cost; this implicitly builds a computation graph which records
# how cost was computed with respect to params.
cost = torch.sum(params)
# Zero the gradient of params in case it already has something in it.
# This step is optional in this example but good to do in practice to
# ensure you're not adding gradients to existing gradients.
if params.grad is not None:
    params.grad.zero_()
# Perform back propagation. This is where the gradient is actually
# computed. By default it also frees the computation graph.
cost.backward()
# The gradient of cost w.r.t. params is now stored in params.grad.
print(params.grad)
which prints
tensor([1., 1., 1.])
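For something closer to Theano's T.grad(cost, params), torch.autograd.grad returns the gradient directly instead of storing it in params.grad. The following is a minimal sketch, assuming the same illustrative relationship cost = torch.sum(params) and the (1, 73, 240) shape from the question; the random values are placeholders.

```python
import torch

# Shape taken from the question; the values are random placeholders.
params = torch.rand(1, 73, 240, requires_grad=True)

# Illustrative relationship only; the real cost must be computed from params.
cost = torch.sum(params)

# torch.autograd.grad returns a tuple with one gradient per input tensor.
(gparams,) = torch.autograd.grad(cost, params)

print(gparams.shape)  # torch.Size([1, 73, 240])
```

Unlike backward(), this call does not accumulate anything into params.grad, so there is nothing to zero beforehand.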


Keras ImageDataGenerator sample_weight with data augmentation

I have a question about the use of the sample_weight parameter in the context of data augmentation in Keras with the ImageDataGenerator. Let's say I have a series of simple images with just one class of objects. So, for each image, I will have a corresponding mask with pixels = 0 for the background and 1 for where the object is labeled.
However, this dataset is unbalanced because a significant number of these images are empty, meaning their masks contain only 0.
If I understand correctly, the 'sample_weight' parameter of the flow method of ImageDataGenerator is there to put the focus on the samples of my dataset that I find more interesting, i.e. where my object is present.
My question is: what is the concrete influence of this sample_weight parameter on the training of my model? Does it influence the data augmentation? If I use the 'validation_split' parameter, does it influence the way validation sets are generated?
Here is the part of my code my question refers to:
data_gen_args = dict(rotation_range=90,
                     rescale=1. / 255,
                     # (remaining arguments elided)
                     )
image_datagen = ImageDataGenerator(**data_gen_args)
imf = image_datagen.flow(
    # (data and other arguments elided)
    sample_weight=sample_weight,
    save_to_dir='traindir',
    save_prefix='train_'
)
valf = image_datagen.flow(
    # (data and other arguments elided)
    sample_weight=sample_weight,
    save_to_dir='valdir',
    save_prefix='val_'
)
model = unet.UNet2(numberOfClasses, imshape, '', learningRate, depth=4)
history = model.fit_generator(generator=imf,
                              # (remaining arguments elided)
                              )
Thank you in advance for your attention.
As of Keras 2.2.5 with Preprocessing 1.1.0, sample_weight is passed along with the samples and applied during processing. When calling .fit_generator, the model is trained on batches, each batch using its sample weights:
model.train_on_batch(x, y,
                     sample_weight=sample_weight,
                     # (remaining arguments elided)
                     )
In the source code of .train_on_batch, the documentation states: "sample_weight: Optional array of the same length as x, containing weights to apply to the model's loss for each sample. (...)". The actual application of weights happens when calculating loss on each batch. When compiling a model, Keras generates a "weighted loss" function out of the desired loss function. The weighted computation is stated in the code as:
def weighted(y_true, y_pred, weights, mask=None):
    """Wrapper function.
    # Arguments
        y_true: `y_true` argument of `fn`.
        y_pred: `y_pred` argument of `fn`.
        weights: Weights tensor.
        mask: Mask tensor.
    # Returns
        Scalar tensor.
    """
    # score_array has ndim >= 2
    score_array = fn(y_true, y_pred)
    if mask is not None:
        # Cast the mask to floatX to avoid float64 upcasting in Theano
        mask = K.cast(mask, K.floatx())
        # mask should have the same shape as score_array
        score_array *= mask
        # the loss per batch should be proportional
        # to the number of unmasked samples.
        score_array /= K.mean(mask) + K.epsilon()
    # apply sample weighting
    if weights is not None:
        # reduce score_array to same ndim as weight array
        ndim = K.ndim(score_array)
        weight_ndim = K.ndim(weights)
        score_array = K.mean(score_array,
                             axis=list(range(weight_ndim, ndim)))
        score_array *= weights
        score_array /= K.mean(K.cast(K.not_equal(weights, 0), K.floatx()))
    return K.mean(score_array)
This wrapper shows that it first calculates the desired loss (the call to fn(y_true, y_pred)), then applies weighting if weights were passed (either with sample_weight or class_weight).
With this context in mind:
what is the concrete influence of this sample_weight parameter on the training of my model?
The weights are multiplied into the per-sample loss and then normalized. Samples with "heavy" weights (greater than 1) contribute more loss and therefore larger gradients. "Light" weights (less than 1) reduce the importance of the sample and lead to smaller gradients.
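To make the arithmetic concrete, here is a small NumPy sketch that mirrors only the weighting branch of the wrapper quoted above. The function name weighted_mean_loss and the numbers are made up for illustration; this is not Keras code.

```python
import numpy as np

def weighted_mean_loss(per_sample_loss, weights):
    """NumPy mirror of the sample-weighting step in the quoted wrapper."""
    score = per_sample_loss * weights
    # Normalize by the fraction of samples whose weight is non-zero.
    score = score / np.mean((weights != 0).astype(float))
    return score.mean()

per_sample_loss = np.array([0.5, 0.5, 0.5, 0.5])
uniform = np.ones(4)
emphasized = np.array([1.0, 1.0, 1.0, 3.0])  # last sample weighted 3x

print(weighted_mean_loss(per_sample_loss, uniform))     # 0.5
print(weighted_mean_loss(per_sample_loss, emphasized))  # 0.75
```

Note that the normalization divides by the fraction of non-zero weights, not by their sum, so raising a weight raises the batch loss instead of just redistributing it.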
Does it influence the data augmentation?
It depends on what you mean. Here is what I can say from experience, where I perform augmentation before feeding a Keras data generator (I do so because there were issues in preprocessing which, as far as I know, still exist in Preprocessing 1.1.0):
When feeding already augmented data to the generator, the .flow call requires a list of sample weights as long as the input data. So the influence of weighting on augmentation depends on how the weights are chosen. A data point augmented N times may give each augmentation the same weight as the original, or 1/N of it, depending on the intent.
The default behaviour in Keras seems to be to assign the same weight to each augmentation (transform) performed by Keras. The code looks pretty clear, although I have never relied on it.
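For the pre-augmented case, the two weighting choices mentioned above could be built like this. It is only a sketch: base_weights and n_aug are hypothetical, and it assumes the augmented copies of each image are stored next to each other.

```python
import numpy as np

base_weights = np.array([1.0, 4.0])  # hypothetical per-image weights
n_aug = 3                            # hypothetical number of augmented copies per image

# Option 1: every augmented copy inherits the full weight of its source image.
same_weight = np.repeat(base_weights, n_aug)

# Option 2: the copies share the source weight, so each image keeps its total influence.
split_weight = np.repeat(base_weights / n_aug, n_aug)

print(same_weight)         # [1. 1. 1. 4. 4. 4.]
print(split_weight.sum())  # approximately 5.0, the sum of base_weights
```

Either array has one entry per augmented sample, which is the length .flow expects for sample_weight.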
If I use the 'validation_split' parameter, does it influence the way validation sets are generated?
The sample_weight parameter does not seem to interfere with validation_split. I have not looked into the code specifically, but splitting basically takes the input data and holds out a portion for validation, whatever the data is. When sample_weight is added, what changes is each data point: without weights, a data point is (x, y); with weights, it becomes (x, y, weight).

How to define custom cost function that depends on input when using ImageDataGenerator in Keras?

I would like to define a custom cost function
def custom_objective(y_true, y_pred):
    return L
that will depend not only on y_true and y_pred, but on some feature of the corresponding x that produced y_pred. The only way I can think of doing this is to "hide" the relevant features in y_true, so that y_true = [usual_y_true, relevant_x_features], or something like that.
There are two main problems I am having with implementing this:
1) Changing the shape of y_true means I need to pad y_pred with some garbage so that their shapes are the same. I can do this by modifying the last layer of my model.
2) I used data augmentation like so:
datagen = ImageDataGenerator(preprocessing_function=my_augmenter)
where my_augmenter() is the function that should also give me the relevant x features to use in custom_objective() above. However, training with
model.fit_generator(datagen.flow(x_train, y_train, batch_size=1), ...)
doesn't seem to give me access to the features calculated with my_augmenter.
I suppose I could hide the features in the augmented x_train, copy them right away in my model setup, and then feed them directly into y_true or something like that, but surely there must be a better way to do this?
Maybe you could create a two part model with:
Inner model: original model that predicts desired outputs
Outer model:
Takes y_true data as inputs
Takes features as inputs
Outputs the loss itself (instead of predicted data)
So, suppose you already have the originalModel defined. Let's define the outer model.
#this model has three inputs:
originalInputs = originalModel.input
yTrueInputs = Input(shape_of_y_train)
featureInputs = Input(shape_of_features)
#the original outputs will become an input for a custom loss layer
originalOutputs = originalModel.output
#this layer contains our custom loss
loss = Lambda(innerLoss)([originalOutputs, yTrueInputs, featureInputs])
#outer model
outerModel = Model([originalInputs, yTrueInputs, featureInputs], loss)
Now, our custom inner loss:
def innerLoss(x):
    y_pred = x[0]
    y_true = x[1]
    features = x[2]
    .... calculate and return loss here ....
Now, this model already contains a custom loss "inside" it, so we don't actually want a final loss function. But since Keras demands one, we will use a final loss that simply returns y_pred:
def finalLoss(true,pred):
    return pred
This will allow us to train while passing just a dummy y_true.
But of course, we also need a custom generator, otherwise we can't get the features.
Assuming you already have originalGenerator = datagen.flow(x_train, y_train, batch_size=1) defined:
def customGenerator(originalGenerator):
    while True: #keras needs infinite generators
        x, y = next(originalGenerator)
        features = ____extract features here____(x)
        #the three model inputs go in a list
        yield [x, y, features], y
        #the last y will be a dummy output, necessary but not used
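As a rough, runnable illustration of this wrapper pattern, the sketch below replaces the Keras generator with a NumPy stand-in. Both fake_original_generator and extract_features (mean pixel intensity) are hypothetical placeholders; substitute the real datagen.flow(...) and whatever features the loss needs.

```python
import numpy as np

def extract_features(x):
    # Hypothetical feature: mean intensity of each image in the batch.
    return x.mean(axis=(1, 2, 3))

def fake_original_generator(batch_size=1):
    # Stand-in for datagen.flow(x_train, y_train, batch_size=1).
    rng = np.random.default_rng(0)
    while True:
        x = rng.random((batch_size, 8, 8, 3))
        y = rng.random((batch_size, 1))
        yield x, y

def custom_generator(original_generator):
    while True:
        x, y = next(original_generator)
        features = extract_features(x)
        # Inputs go in a list; the trailing y is the dummy target.
        yield [x, y, features], y

inputs, dummy_target = next(custom_generator(fake_original_generator()))
print([a.shape for a in inputs])  # [(1, 8, 8, 3), (1, 1), (1,)]
```

Computing the features inside the wrapper, from the already augmented x, keeps them consistent with whatever the augmenter did to that batch.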
You could also, if you want the extra functionality of randomizing batch order and use multiprocessing, implement a class CustomGenerator(keras.utils.Sequence) following the same logic. The help page shows how.
So, let's compile and train the outer model (this also trains the inner model so you can use it later for predicting):
outerModel.compile(optimizer=..., loss=finalLoss)
outerModel.fit_generator(customGenerator(originalGenerator), batchesInOriginalGenerator, ...)

How to use the result of embedding with mask_zero=True in keras

In Keras, I want to calculate the mean of the embedding outputs for the nonzero inputs only.
I wonder what the difference is between mask_zero=True and mask_zero=False in the Embedding layer.
I tried the code below:
input_data = Input(shape=(5,), dtype='int32', name='input')
embedding_layer = Embedding(1000, 24, input_length=5,mask_zero=True,name='embedding')
out = embedding_layer(input_data)
def antirectifier(x):
    x = K.mean(x, axis=1, keepdims=True)
    return x
def antirectifier_output_shape(input_shape):
    shape = list(input_shape)
    return tuple(shape)
out = Lambda(antirectifier, output_shape=antirectifier_output_shape,name='lambda')(out)
But it seems that the result is the mean of all the elements. How can I calculate the mean over only the nonzero inputs?
From the Embedding layer's documentation:
"If this is True then all subsequent layers in the model need to support masking"
Your lambda function doesn't support masking. Recurrent layers in Keras, for example, do. If you set mask_zero=True in your embedding layer, then all the 0 indices that you feed to it will be propagated as "masked", and the following layers that understand the mask will use that information.
Basically, if you build a "mean" layer that grabs the mask and computes the average only for non-masked values, then you will get the desired results.
You can find here a way to build your lambda layers that support masking
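The arithmetic such a masked "mean" layer has to perform can be sketched in NumPy as follows. This is not a Keras layer: in Keras the mask would arrive through the layer's mask argument, whereas here it is derived from hypothetical token_ids purely for illustration.

```python
import numpy as np

def masked_mean(embeddings, token_ids):
    """Average embedding vectors over time steps whose token id is non-zero."""
    mask = (token_ids != 0).astype(embeddings.dtype)            # (batch, steps)
    summed = (embeddings * mask[..., None]).sum(axis=1)         # (batch, dim)
    counts = np.maximum(mask.sum(axis=1, keepdims=True), 1.0)   # avoid division by zero
    return summed / counts

token_ids = np.array([[3, 7, 0, 0, 0]])
embeddings = np.ones((1, 5, 24))
embeddings[:, 2:, :] = 100.0  # values at padded positions must be ignored

print(masked_mean(embeddings, token_ids)[0, :3])  # [1. 1. 1.]
```

The explicit multiplication by the mask matters because the embedding layer still emits a vector for index 0; the mask only tells later layers to ignore it.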
I hope it helps.

Random results from pre-trained InceptionV3 CNN

I'm trying to create an InceptionV3 CNN which has previously been trained on ImageNet. While the creation and the loading of the checkpoint seem to be working correctly, the result seems to be random: every time I run the script, I get a different result, even though I don't change anything. The network is recreated from scratch, the same unchanged checkpoint is loaded and the same image is classified (which to my understanding should still lead to the same result, even if it can't decide what the image actually is).
I just noticed that even if I try to classify the same image multiple times within the same execution of the script, I end up with a random result.
I create the CNN like this:
from tensorflow.contrib.slim.nets import inception as nn_architecture
from tensorflow.contrib import slim
with slim.arg_scope([slim.conv2d, slim.fully_connected], normalizer_fn=slim.batch_norm,
                    normalizer_params={'updates_collections': None}):  ## this is a fix for an issue where the model doesn't fit the checkpoint
    logits, endpoints = nn_architecture.inception_v3(input,  # input
        1001,  # NUM_CLASSES, num classes
        # num classes: maybe set to 0 or None to omit the logit layer and return the input for the logit layer instead.
        True,  # is training (dropout = zero if False, for eval)
        0.8,  # dropout keep rate
        16,  # min depth
        1.0,  # depth multiplier
        layers_lib.softmax,  # prediction function
        True,  # spatial squeeze
        # reuse, use get variable to get variables directly... probably
        'InceptionV3')  # scope
afterwards I load the imagenet trained checkpoint like this
saver = tf.train.Saver()
saver.restore(sess, CHECKPOINT_PATH)
Then I verify that it is working by classifying an image (data/car.jpg), which I squish from its original resolution to 299x299, the input size the network requires:
import numpy as np
from scipy.ndimage import zoom
from skimage import io
car = io.imread("data/car.jpg")
car_scaled = zoom(car, [299 / car.shape[0], 299 / car.shape[1], 1])
car_cnnable = np.array([car_scaled])
Then I try to classify the image and print which class the image belongs to most likely and with what likelihood.
predictions = sess.run(..., feed_dict={images: car_cnnable})  # the fetched tensor is elided in the source
predictions = np.squeeze(predictions)  # shape (1, 1001) to shape (1001)
The class is (or seems to be) random and the likelihood varies as well.
My last few executions were:
Class - likelihood
899 - 0.98858
660 - 0.887204
734 - 0.904047
675 - 0.886952
Since I set the is_training argument to True, dropout was applied every time the network was used. I was under the impression that this only happened during back propagation.
To get it to work correctly, the code should be
logits, endpoints = nn_architecture.inception_v3(input,  # input
    1001,  # NUM_CLASSES, num classes
    # num classes: maybe set to 0 or None to omit the logit layer and return the input for the logit layer instead.
    False,  # is training (dropout = zero if False, for eval)
    0.8,  # dropout keep rate
    16,  # min depth
    1.0,  # depth multiplier
    layers_lib.softmax,  # prediction function
    True,  # spatial squeeze
    # reuse, use get variable to get variables directly... probably
    'InceptionV3')  # scope
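Why the flag matters can be seen in a small NumPy sketch of inverted dropout. This is an illustration under assumptions, not the slim implementation: the dropout function below is hypothetical, and only the 0.8 keep rate is taken from the call above.

```python
import numpy as np

def dropout(x, keep_prob, is_training, rng):
    """Inverted dropout: active only when is_training is True."""
    if not is_training:
        return x
    keep = rng.random(x.shape) < keep_prob
    return x * keep / keep_prob

rng = np.random.default_rng(0)
activations = np.ones(1000)

train_a = dropout(activations, 0.8, True, rng)
train_b = dropout(activations, 0.8, True, rng)
eval_a = dropout(activations, 0.8, False, rng)
eval_b = dropout(activations, 0.8, False, rng)

print(np.array_equal(train_a, train_b))  # False: each call draws a new random mask
print(np.array_equal(eval_a, eval_b))    # True: evaluation mode is deterministic
```

With the flag set to True, every forward pass zeroes a different random subset of activations, which matches the changing class and likelihood seen for the same image.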

Restore graph in TensorFlow - Restore the value in a Tensor

I have built an artificial neural network to predict values for life insurance data. When I restore the graph, I can import my predict tensor to see my values.
sess = tf.Session()
new_saver = tf.train.import_meta_graph('model.ckpt.meta')
new_saver.restore(sess, tf.train.latest_checkpoint('./'))
graph = tf.get_default_graph()
inputs = graph.get_tensor_by_name("inputs:0")
predict_restore = graph.get_tensor_by_name("predicted:0")
train_data = pd.read_csv(r"C:\...\tensorflow-1.3.1\tensorflow\train_titanic.csv")
train_predict_restore = train_data.drop(["Survived"], axis=1)
In the feed_dict I put the client attributes into the inputs tensor. Now I want to build a function that takes the attributes of a customer and looks up their respective probability of survival (prob). Is there a function in TensorFlow to search for one or more values in a tensor (in my situation, the inputs tensor)?
I believe train_predict_restore has shape [num_customers, num_attributes]. Therefore, row i of train_predict_restore holds the attributes of the ith customer.
You can do something like this (.values converts the DataFrame to an array, so that [i] selects a row rather than a column):
feed_dict={inputs: [train_predict_restore.values[i]]}  # changed train_predict_restore to [train_predict_restore.values[i]]
Here, the output is the probability for the ith customer.
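Since inputs is a placeholder that only holds whatever you feed it, it is probably easier to search the data before feeding it than to search the tensor. Below is a minimal NumPy sketch of finding a customer's row by attribute values; the customers array, the probabilities array and the helper find_customer_rows are all hypothetical stand-ins.

```python
import numpy as np

# Hypothetical attribute matrix, one row per customer
# (stand-in for train_predict_restore.values).
customers = np.array([[25.0, 1.0, 7.25],
                      [40.0, 0.0, 71.28],
                      [31.0, 1.0, 8.05]])
# Hypothetical model outputs, aligned row by row with `customers`.
probabilities = np.array([0.12, 0.87, 0.20])

def find_customer_rows(data, attributes):
    """Return indices of rows whose attributes all match the query."""
    return np.flatnonzero(np.all(np.isclose(data, attributes), axis=1))

rows = find_customer_rows(customers, [40.0, 0.0, 71.28])
print(rows)                 # [1]
print(probabilities[rows])  # [0.87]
```

The returned index can then be used as i in the feed_dict above, or to index predictions that were already computed for the whole table.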
Hope this helps.
